Source MySQL #
The following steps refer the extracted Arcion self-hosted CLI download as the $REPLICANT_HOME directory.
Prerequisites #
I. Install mysqlbinlog utility on Replicant host
#
Install an appropriate version of the mysqlbinlog utility on the machine where Replicant runs. The utility must be compatible with the source MySQL server.
To ensure that you possess the appropriate version of mysqlbinlog utility, install the same MySQL server version as your source MySQL system. After installation, stop the MySQL server running on Replicant’s host:
sudo systemctl stop mysql
II. Enable binary logging in MySQL server #
-
Open the MySQL option file
var/lib/my.cnf(create the file if it doesn’t already exist). Add the following lines in the file:[mysqld] log-bin=mysql-log.bin binlog_format=ROW binlog_row_image=fullThe preceding option file specifies the following binary logging options:
-
The first line specifies the base name to use for binary log files.
-
The second line sets the binary logging format.
-
The third line specifies how the server writes row images to the binary log. In
fullmode, the server logs all columns in both the before image and the after image.
-
-
Export
$MYSQL_HOMEpath:export MYSQL_HOME=/var/lib/mysql -
Restart MySQL service:
sudo systemctl restart mysql -
Verify that you have successfully enabled binary logging:
mysql -u root -pmysql> show variables like "%log_bin%"; +---------------------------------+--------------------------------+ | Variable_name | Value | +---------------------------------+--------------------------------+ | log_bin | ON | | log_bin_basename | /var/lib/mysql/mysql-bin | | log_bin_compress | OFF | | log_bin_compress_min_len | 256 | | log_bin_index | /var/lib/mysql/mysql-bin.index | | log_bin_trust_function_creators | OFF | | sql_log_bin | ON | +---------------------------------+--------------------------------+ 7 rows in set (0.011 sec)
III. Set up MySQL user for Replicant #
- Create MySQL user:
CREATE USER 'USERNAME'@'REPLICANT_HOST' IDENTIFIED BY 'PASSWORD'; - Grant the following privileges on all tables relevant to the replication:
GRANT SELECT ON "USER_DATABASE"."TABLE_NAME" TO 'USERNAME'@'REPLICANT_HOST'; - Grant the
REPLICATIONprivileges:GRANT REPLICATION CLIENT ON *.* TO 'USERNAME'@'REPLICANT_HOST'; GRANT REPLICATION SLAVE ON *.* TO 'USERNAME'@'REPLICANT_HOST'; - Verify that this new user can access binary logs:
mysql> show binary logs; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000001 | 351 | | mysql-bin.000002 | 4635 | | mysql-bin.000003 | 628 | | mysql-bin.000004 | 195038185 | +------------------+-----------+ 4 rows in set (0.001 sec)
Replace the following:
USERNAME: the username for the userREPLICANT_HOST: the hostname of the machine running ReplicantPASSWORD: the user passwordUSER_DATABASE: the MySQL database name to grant privilege toTABLE_NAME: the relevant table name inUSER_DATABASE
I. Set up connection configuration #
Specify your MySQL connection details to Replicant with a connection configuration file. You can find a sample connection configuration file mysql.yaml in the $REPLICANT_HOME/conf/conn directory.
To connect to MySQL, you can choose between two methods for an authenticated connection:
Connect with username and password without SSL #
To connect to MySQL with basic username and password authentication, you have these two options:
You can specify your credentials in plain form in the connection configuration file like the folowing sample:
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
password: "PASSWORD"
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
Replace the following:
HOSTNAME_OR_IP: the MySQL hostname or IP addressPORT_NUMBER: the port number of MySQL hostUSERNAME: the MySQL username to connect to the MySQL server (theuserpart in MySQL account name<’user’>@<’host’>)PASSWORD: the password associated withUSERNAME
Connect using SSL #
To connect to MySQL using SSL, specify the SSL configuration under the ssl field in the connection configuration file. When using this method, make sure you provide the username and don’t provide password.
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
ssl:
enable: true
root-cert: PATH_TO_CA_PEM_FILE
ssl-cert: PATH_TO_CLIENT_CERT_PEM_FILE
ssl-key: PATH_TO_CLIENT_KEY_PEM_FILE
hostname-verification: {true|false}
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
Replace the following:
HOSTNAME_OR_IP: the MySQL hostname or IP addressPORT_NUMBER: the port number of MySQL hostUSERNAME: the MySQL username to connect to the MySQL server (theuserpart in MySQL account name<’user’>@<’host’>)PATH_TO_CA_PEM_FILE: path to the certificate authority (CA) certificate PEM file–for example,ca.pem.PATH_TO_CLIENT_CERT_PEM_FILE: path to the client SSL public key certificate file in PEM format—for example,client-cert.pem.PATH_TO_CLIENT_KEY_PEM_FILE: path to the client private key—for example,client-key.pem.
hostname-verification enables hostname verification against the server identity according to the specification in the server’s certificate. It defaults to true.
Tip: You can use SSL with TrustStore and KeyStore for snapshot replication.
Connect to Amazon RDS for MySQL using SSL #
To connect to RDS for MySQL using SSL, follow these steps:
- Download a certificate bundle.
- Set the
root-certparameter of the connection configuration file to the location of the PEM certificate bundle.
The following sample shows how to define a connection for RDS for MySQL:
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
password: "PASSWORD"
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
ssl:
enable: true
root-cert: PATH_TO_PEM_BUNDLE
hostname-verification: {true|false}
Replace the following:
HOSTNAME_OR_IP: the RDS for MySQL hostname or IP addressPORT_NUMBER: the port number of RDS for MySQL hostUSERNAME: the username to connect to the RDS for MySQL server (theuserpart in the account name<’user’>@<’host’>)PASSWORD: the password associated withUSERNAMEPATH_TO_PEM_BUNDLE: full path to the PEM certificate bundle—for example,/home/arcion/us-east-1-bundle.pem
hostname-verification enables hostname verification against the server identity according to the specification in the server’s certificate. It defaults to true.
II. Set up filter configuration (optional) #
If you want to define filter rules for source MySQL, specify the them in the filter configuration file. You can find a sample filter configuration file in the filter/ directory of your Arcion self-hosted CLI download.
For example:
allow:
catalog: "tpch"
types: [TABLE]
allow:
NATION:
allow: ["US, AUS"]
ORDERS:
allow: ["product", "service"]
conditions: "o_orderkey < 5000"
PART:
The preceding sample consists of the following elements:
- Data of object type
TABLEin the schematpchgoes through replication. - From catalog
tpch, only theNATION,ORDERS, andPARTtables go through replication. - From
NATIONtable, only theUSandAUScolumns go through replication. - From the
ORDERStable, only theproductandservicecolumns go through replication as long as those columns meet the condition inconditions.
Note: Unless you specify, Replicant replicates all tables in the catalog.
The following illustrates the format you must follow:
allow:
catalog: <your_catalog_name>
types: <your_object_type>
allow:
<your_table_name>:
allow: ["your_column_name"]
condtions: "your_condition"
<your_table_name>:
allow: ["your_column_name"]
conditions: "your_condition"
<your_table_name>:
For a thorough explanation of filters, see Filter Reference.
III. Set up Extractor configuration #
To configure replication according to your requirements, specify your configuration in the Extractor configuration file. You can find a sample Extractor configuration file mysql.yaml in the $REPLICANT_HOME/conf/src directory. For a thorough explanation of the configuration parameters in the Extractor file, see Extractor Reference.
You can configure the following replication modes by specifying the parameters under their respective sections in the configuration file:
snapshotrealtime
See the following sections for more information.
Configure snapshot replication
#
The following illustrates a sample configuration for operating in snapshot mode:
snapshot:
threads: 16
fetch-size-rows: 15_000
per-table-config:
- catalog: tpch
tables:
ORDERS:
num-jobs: 1
LINEITEM:
row-identifier-key: [L_ORDERKEY]
split-key: l_orderkey
For more information about the configuration parameters for snapshot mode, see Snapshot Mode.
Configure realtime replication
#
To operate in realtime mode, follow these steps:
-
For real-time replication, you must create a heartbeat table in the source MySQL. To create a heartbeat table in the catalog or schema you want to replicate, use the following DDL:
CREATE TABLE `<user_database>`.`replicate_io_cdc_heartbeat`( timestamp BIGINT NOT NULL, PRIMARY KEY(timestamp));Replace
<user_database>with the name of your specific database—for example,tpch. -
Grant
INSERT,UPDATE, andDELETEprivileges to the user for the heartbeat table. -
Specify your configuration under the
realtimesection of the Extractor configuration file. For example:realtime: threads: 4 fetch-size-rows: 10_000 fetch-duration-per-extractor-slot-s: 3 _traceDBTasks: true heartbeat: enable: true catalog: tpch table-name: replicate_io_cdc_heartbeat column-name: timestampIn the preceding example, notice the following about the
heartbeatconfiguration corresponding to the heartbeat table in the first step:-
tpchrepresents the name of the database that contains the heartbeat table. -
replicate_io_cdc_heartbeatrepresents the heartbeat table’s name. -
timestamprepresents the heartbeat table’s column name.
-
For more information about the configuration parameters for realtime mode, see Realtime Mode.
Additional realtime parameters
#
bin-log-idle-timeout-s-
In some cases, a
mysqlbinlogprocess might not produce any output. This parameter specifies after how many seconds the replication restarts such amysqlbinlogprocess.Default:
600.
Support for DDL replication #
Replicant supports DDL replication for real-time MySQL source. For more information, contact us.
Note: If you want to use the Source Column Transformation feature of Replicant for a MySQL-to-Databricks pipeline, see Source Column Transformation.
Replication of generated columns #
Replicant supports replication of generated columns from MySQL to either a different database platform, or another MySQL database.
Arcion supports replication of generated columns for the following Replicant modes:
The behavior of replicating generated columns depends on the type of replication pipeline and your usage of the configuration parameters. See the following sections for more information.
Configuration parameters #
Use the following Extractor configuration parameters and CLI option to control replication of generated columns.
Warning: If you specifycreate-sqlorfetch-create-sql, only use schema name or database name in Mapper file. Any Mapper rule with column names, table names, or others raises Exception.
Extractor parameters #
You can specify these parameters in the Extractor configuration file of MySQL.
-
computed-columns -
Whether to block or unblock replication of generated columns. It can take one of the following two values:
BLOCKUNBLOCK
The behavior of
BLOCKandUNBLOCKdepends on the type of replication pipeline and how you use thecreate-sqlorfetch-create-sqlparameters. -
fetch-create-sql -
A boolean parameter supporting the following two values:
-
true. Replicant replicates the generated columns to the target without skipping data types and functions. This means that the tables possess the same definitions as the source. -
false. Replicant replicates generated columns with the following characteristics:- Replicant only replicates the data type and the data.
- Replicant skips replicating functions. The target database table possesses different definition than the one on the source. Replicant treats generated columns as ordinary columns.
Warning: Replicant doesn’t support
fetch-create-sqlfor heterogeneous pipelines. Any usage offetch-create-sqlin a heterogeneous pipeline raises Exception. -
- Replicant only replicates the data type and the data.
- Replicant skips replicating functions. The target database table possesses different definition than the one on the source. Replicant treats generated columns as ordinary columns.
-
With
fetch-create-sqlorcreate-sqlenabled -
If you set
fetch-create-sqltotrueor specify thecreate-sqlCLI option, Replicant replicates data and corresponding data types as well as the functions. This means that table definition on the target stays the same as source. The value ofcomputed-columnsholds no effect in this scenario. -
With
fetch-create-sqlorcreate-sqldisabled -
If you set
fetch-create-sqltofalseor omit thecreate-sqlCLI option, the behavior follows the same pattern as a heterogeneous pipeline. -
Import the certificate authority (CA) certificate PEM file (for example
ca.pem):keytool -importcert -alias MySQLCACert -file /path/to/ca.pem \ -keystore TRUSTSTORE_LOCATION \ -storepass TRUSTORE_PASSWORD -nopromptReplace the following:
TRUSTSTORE_LOCATION: The TrustStore location. It corresponds to thessl.trust-store.pathparameter in the SSL configuration.TRUSTORE_PASSWORD: The TrustStore password. It corresponds to thessl.trust-store.passwordparameter in the SSL configuration.
The
ca.pemfile corresponds to thessl.root-certfield in the SSL configuration. -
Once you have the client private key (for example,
client-key.pem) and certificate files (for example,client-cert.pem) you want to use, import them into a Java KeyStore:-
Convert the client key and certificate files to a PKCS #12 archive:
openssl pkcs12 -export -in /path/to/client-cert.pem -inkey /path/to/client-key.pem \ -name "NAME" -passout pass:PASSWORD \ -out client-keystore_src.p12Replace the following:
PASSWORD: the password source for output filesNAME: a name for the certificate and key
For more information, see the
openssl-pkcs12manpage.The
client-key.pemandclient-cert.pemfiles correspond to thessl.ssl-keyandssl.ssl-certparameters respectively in the SSL configuration. -
Import the client key and certificate into a Java KeyStore:
keytool -importkeystore -srckeystore client-keystore_src.p12 \ -srcstoretype pkcs12 -srcstorepass SRC_KEYSTORE_PASSWORD \ -destkeystore NAME_OF_THE_DST_KEYSTORE_FILE -deststoretype JKS \ -deststorepass DST_KEYSTORE_PASSWORDReplace the following:
SRC_KEYSTORE_PASSWORD: Source KeyStore password.NAME_OF_THE_DST_KEYSTORE_FILE: Name of the destination KeyStore file. Corresponds to thessl.key-store.pathparameter in the SSL configuration.DST_KEYSTORE_PASSWORD: Destination KeyStore password. Corresponds to thessl.key-store.passwordparameter in the SSL configuration.
If you get an error with the preceding command, make sure to use the same password for both
srcstorepassanddeststorepass. For more information, seekeytool-importkeystoredocumentation.
-
HOSTNAME_OR_IP: the MySQL hostname or IP addressPORT_NUMBER: the port number of MySQL hostUSERNAME: the MySQL username to connect to the MySQL server (theuserpart in MySQL account name<’user’>@<’host’>)PATH_TO_TRUSTSTORE: path to the TrustStore.TRUSTSTORE_PASSWORD: the TrustStore password.PATH_TO_KEYSTORE: path to the Java KeyStore.KEYSTORE_PASSWORD: the KeyStore password.
The following shows a sample Extractor configuration for a homogeneous pipeline:
snapshot:
threads: 16
fetch-size-rows: 15_000
min-job-size-rows: 1_000_000
max-jobs-per-chunk: 32
computed-columns: UNBLOCK
fetch-user-roles: true
fetch-create-sql : true
The preceding sample instructs Replicant to replicate generated columns with data, corresponding data types, and functions.
CLI option #
Replicant self-hosted CLI exposes a CLI option create-sql. create-sql yields the same outcome as setting fetch-create-sql to true.
create-sql holds a higher precedence than the Extractor parameter fetch-create-sql. If you run Replicant with the create-sql option, Replicant ignores the value of fetch-create-sql.
The following shows a sample command for running Replicant:
./bin/replicant full conf/conn/mysql_src.yaml conf/conn/mysql_dst.yaml \
--filter filter/mysql_filter.yaml \
--extractor conf/src/mysql.yaml \
--metadata conf/metadata/postgresql.yaml \
--replace-existing --id mf2 \
--overwrite --create-sql
The preceding command instructs Replicant to replicate the generated columns with data, the corresponding data types, and functions.
Warning: Replicant doesn’t supportcreate-sqlfor heterogeneous pipelines. Any usage ofcreate-sqlin a heterogeneous pipeline raises Exception.
Replication of generated columns in heterogeneous pipeline #
A heterogeneous pipeline means replication between two different database platforms. For example, a MySQL-to-PostgreSQL replication pipeline.
For a heterogeneous pipeline, set the computed-columns property to one of the following values in your Extractor configuration file:
BLOCK
#
Replicant skips replicating generated columns. No generated columns from your source MySQL exists in the target database you’re replicating to.
UNBLOCK
#
Replicant replicates generated columns from source MySQL with the following caveats:
Replication of generated columns in homogeneous pipeline #
A homogeneous pipeline means replication between two identical database platforms. For example, a MySQL-to-MySQL replication pipeline.
For a homogeneous pipeline, computed-columns behaves in the following manner. The behavior depends on the usage of the create-sql or fetch-create-sql parameters:
Use SSL with TrustStore and KeyStore in snapshot replication #
You can use TrustStore and KeyStore that contains the necessary SSL certificates for SSL connection in snapshot mode replication. If you use this approach, don’t specify the ssl.root-cert, ssl.ssl-cert, and ssl.ssl-key parameters in the connection configuration file.
To use SSL with TrustStore and KeyStore in snapshot replication, follow these instructions:
Create the TrustStore and KeyStore on the host running Replicant #
The following message appears after you execute the preceding commands successfully:
Entry for alias MySQLCACert successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Specify SSL configuration in the connection configuration file #
Specify the SSL configuration under the ssl field in the connection configuration file. When using this method, make sure you provide the username and don’t provide password.:
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
ssl:
enable: true
hostname-verification: {true|false}
trust-store:
path: PATH_TO_TRUSTORE
password: "TRUSTSTORE_PASSWORD"
key-store:
path: PATH_TO_KEYSTORE
password: "KEYSTORE_PASSWORD"
Replace the following:
hostname-verification enables hostname verification against the server identity according to the specification in the server’s certificate. It defaults to true.