Source MySQL #
The following steps refer the extracted Arcion self-hosted CLI download as the $REPLICANT_HOME
directory.
Prerequisites #
I. Install mysqlbinlog
utility on Replicant host
#
Install an appropriate version of the mysqlbinlog
utility on the machine where Replicant runs. The utility must be compatible with the source MySQL server.
To ensure that you possess the appropriate version of mysqlbinlog
utility, install the same MySQL server version as your source MySQL system. After installation, stop the MySQL server running on Replicant’s host:
sudo systemctl stop mysql
II. Enable binary logging in MySQL server #
-
Open the MySQL option file
var/lib/my.cnf
(create the file if it doesn’t already exist). Add the following lines in the file:[mysqld] log-bin=mysql-log.bin binlog_format=ROW binlog_row_image=full
The preceding option file specifies the following binary logging options:
-
The first line specifies the base name to use for binary log files.
-
The second line sets the binary logging format.
-
The third line specifies how the server writes row images to the binary log. In
full
mode, the server logs all columns in both the before image and the after image.
-
-
Export
$MYSQL_HOME
path:export MYSQL_HOME=/var/lib/mysql
-
Restart MySQL service:
sudo systemctl restart mysql
-
Verify that you have successfully enabled binary logging:
mysql -u root -p
mysql> show variables like "%log_bin%"; +---------------------------------+--------------------------------+ | Variable_name | Value | +---------------------------------+--------------------------------+ | log_bin | ON | | log_bin_basename | /var/lib/mysql/mysql-bin | | log_bin_compress | OFF | | log_bin_compress_min_len | 256 | | log_bin_index | /var/lib/mysql/mysql-bin.index | | log_bin_trust_function_creators | OFF | | sql_log_bin | ON | +---------------------------------+--------------------------------+ 7 rows in set (0.011 sec)
III. Set up MySQL user for Replicant #
- Create MySQL user:
CREATE USER 'USERNAME'@'REPLICANT_HOST' IDENTIFIED BY 'PASSWORD';
- Grant the following privileges on all tables relevant to the replication:
GRANT SELECT ON "USER_DATABASE"."TABLE_NAME" TO 'USERNAME'@'REPLICANT_HOST';
- Grant the
REPLICATION
privileges:GRANT REPLICATION CLIENT ON *.* TO 'USERNAME'@'REPLICANT_HOST'; GRANT REPLICATION SLAVE ON *.* TO 'USERNAME'@'REPLICANT_HOST';
- Verify that this new user can access binary logs:
mysql> show binary logs; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000001 | 351 | | mysql-bin.000002 | 4635 | | mysql-bin.000003 | 628 | | mysql-bin.000004 | 195038185 | +------------------+-----------+ 4 rows in set (0.001 sec)
Replace the following:
USERNAME
: the username for the userREPLICANT_HOST
: the hostname of the machine running ReplicantPASSWORD
: the user passwordUSER_DATABASE
: the MySQL database name to grant privilege toTABLE_NAME
: the relevant table name inUSER_DATABASE
I. Set up connection configuration #
Specify your MySQL connection details to Replicant with a connection configuration file. You can find a sample connection configuration file mysql.yaml
in the $REPLICANT_HOME/conf/conn
directory.
To connect to MySQL, you can choose between two methods for an authenticated connection:
Connect with username and password without SSL #
To connect to MySQL with basic username and password authentication, you have these two options:
You can specify your credentials in plain form in the connection configuration file like the folowing sample:
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
password: "PASSWORD"
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
Replace the following:
HOSTNAME_OR_IP
: the MySQL hostname or IP addressPORT_NUMBER
: the port number of MySQL hostUSERNAME
: the MySQL username to connect to the MySQL server (theuser
part in MySQL account name<’user’>@<’host’>
)PASSWORD
: the password associated withUSERNAME
Connect using SSL #
To connect to MySQL using SSL, specify the SSL configuration under the ssl
field in the connection configuration file. When using this method, make sure you provide the username
and don’t provide password
.
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
ssl:
enable: true
root-cert: PATH_TO_CA_PEM_FILE
ssl-cert: PATH_TO_CLIENT_CERT_PEM_FILE
ssl-key: PATH_TO_CLIENT_KEY_PEM_FILE
hostname-verification: {true|false}
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
Replace the following:
HOSTNAME_OR_IP
: the MySQL hostname or IP addressPORT_NUMBER
: the port number of MySQL hostUSERNAME
: the MySQL username to connect to the MySQL server (theuser
part in MySQL account name<’user’>@<’host’>
)PATH_TO_CA_PEM_FILE
: path to the certificate authority (CA) certificate PEM file–for example,ca.pem
.PATH_TO_CLIENT_CERT_PEM_FILE
: path to the client SSL public key certificate file in PEM format—for example,client-cert.pem
.PATH_TO_CLIENT_KEY_PEM_FILE
: path to the client private key—for example,client-key.pem
.
hostname-verification
enables hostname verification against the server identity according to the specification in the server’s certificate. It defaults to true
.
Tip: You can use SSL with TrustStore and KeyStore for snapshot replication.
Connect to Amazon RDS for MySQL using SSL #
To connect to RDS for MySQL using SSL, follow these steps:
- Download a certificate bundle.
- Set the
root-cert
parameter of the connection configuration file to the location of the PEM certificate bundle.
The following sample shows how to define a connection for RDS for MySQL:
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
password: "PASSWORD"
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
ssl:
enable: true
root-cert: PATH_TO_PEM_BUNDLE
hostname-verification: {true|false}
Replace the following:
HOSTNAME_OR_IP
: the RDS for MySQL hostname or IP addressPORT_NUMBER
: the port number of RDS for MySQL hostUSERNAME
: the username to connect to the RDS for MySQL server (theuser
part in the account name<’user’>@<’host’>
)PASSWORD
: the password associated withUSERNAME
PATH_TO_PEM_BUNDLE
: full path to the PEM certificate bundle—for example,/home/arcion/us-east-1-bundle.pem
hostname-verification
enables hostname verification against the server identity according to the specification in the server’s certificate. It defaults to true
.
II. Set up filter configuration (optional) #
If you want to define filter rules for source MySQL, specify the them in the filter configuration file. You can find a sample filter configuration file in the filter/
directory of your Arcion self-hosted CLI download.
For example:
allow:
catalog: "tpch"
types: [TABLE]
allow:
NATION:
allow: ["US, AUS"]
ORDERS:
allow: ["product", "service"]
conditions: "o_orderkey < 5000"
PART:
The preceding sample consists of the following elements:
- Data of object type
TABLE
in the schematpch
goes through replication. - From catalog
tpch
, only theNATION
,ORDERS
, andPART
tables go through replication. - From
NATION
table, only theUS
andAUS
columns go through replication. - From the
ORDERS
table, only theproduct
andservice
columns go through replication as long as those columns meet the condition inconditions
.
Note: Unless you specify, Replicant replicates all tables in the catalog.
The following illustrates the format you must follow:
allow:
catalog: <your_catalog_name>
types: <your_object_type>
allow:
<your_table_name>:
allow: ["your_column_name"]
condtions: "your_condition"
<your_table_name>:
allow: ["your_column_name"]
conditions: "your_condition"
<your_table_name>:
For a thorough explanation of filters, see Filter Reference.
III. Set up Extractor configuration #
To configure replication according to your requirements, specify your configuration in the Extractor configuration file. You can find a sample Extractor configuration file mysql.yaml
in the $REPLICANT_HOME/conf/src
directory. For a thorough explanation of the configuration parameters in the Extractor file, see Extractor Reference.
You can configure the following replication modes by specifying the parameters under their respective sections in the configuration file:
snapshot
realtime
See the following sections for more information.
Configure snapshot
replication
#
The following illustrates a sample configuration for operating in snapshot
mode:
snapshot:
threads: 16
fetch-size-rows: 15_000
per-table-config:
- catalog: tpch
tables:
ORDERS:
num-jobs: 1
LINEITEM:
row-identifier-key: [L_ORDERKEY]
split-key: l_orderkey
For more information about the configuration parameters for snapshot
mode, see Snapshot Mode.
Configure realtime
replication
#
To operate in realtime
mode, follow these steps:
-
For real-time replication, you must create a heartbeat table in the source MySQL. To create a heartbeat table in the catalog or schema you want to replicate, use the following DDL:
CREATE TABLE `<user_database>`.`replicate_io_cdc_heartbeat`( timestamp BIGINT NOT NULL, PRIMARY KEY(timestamp));
Replace
<user_database>
with the name of your specific database—for example,tpch
. -
Grant
INSERT
,UPDATE
, andDELETE
privileges to the user for the heartbeat table. -
Specify your configuration under the
realtime
section of the Extractor configuration file. For example:realtime: threads: 4 fetch-size-rows: 10_000 fetch-duration-per-extractor-slot-s: 3 _traceDBTasks: true heartbeat: enable: true catalog: tpch table-name: replicate_io_cdc_heartbeat column-name: timestamp
In the preceding example, notice the following about the
heartbeat
configuration corresponding to the heartbeat table in the first step:-
tpch
represents the name of the database that contains the heartbeat table. -
replicate_io_cdc_heartbeat
represents the heartbeat table’s name. -
timestamp
represents the heartbeat table’s column name.
-
For more information about the configuration parameters for realtime
mode, see Realtime Mode.
Additional realtime
parameters
#
bin-log-idle-timeout-s
-
In some cases, a
mysqlbinlog
process might not produce any output. This parameter specifies after how many seconds the replication restarts such amysqlbinlog
process.Default:
600
.
Support for DDL replication #
Replicant supports DDL replication for real-time MySQL source. For more information, contact us.
Note: If you want to use the Source Column Transformation feature of Replicant for a MySQL-to-Databricks pipeline, see Source Column Transformation.
Replication of generated columns #
Replicant supports replication of generated columns from MySQL to either a different database platform, or another MySQL database.
Arcion supports replication of generated columns for the following Replicant modes:
The behavior of replicating generated columns depends on the type of replication pipeline and your usage of the configuration parameters. See the following sections for more information.
Configuration parameters #
Use the following Extractor configuration parameters and CLI option to control replication of generated columns.
Warning: If you specifycreate-sql
orfetch-create-sql
, only use schema name or database name in Mapper file. Any Mapper rule with column names, table names, or others raises Exception.
Extractor parameters #
You can specify these parameters in the Extractor configuration file of MySQL.
-
computed-columns
-
Whether to block or unblock replication of generated columns. It can take one of the following two values:
BLOCK
UNBLOCK
The behavior of
BLOCK
andUNBLOCK
depends on the type of replication pipeline and how you use thecreate-sql
orfetch-create-sql
parameters. -
fetch-create-sql
-
A boolean parameter supporting the following two values:
-
true
. Replicant replicates the generated columns to the target without skipping data types and functions. This means that the tables possess the same definitions as the source. -
false
. Replicant replicates generated columns with the following characteristics:- Replicant only replicates the data type and the data.
- Replicant skips replicating functions. The target database table possesses different definition than the one on the source. Replicant treats generated columns as ordinary columns.
Warning: Replicant doesn’t support
fetch-create-sql
for heterogeneous pipelines. Any usage offetch-create-sql
in a heterogeneous pipeline raises Exception. -
- Replicant only replicates the data type and the data.
- Replicant skips replicating functions. The target database table possesses different definition than the one on the source. Replicant treats generated columns as ordinary columns.
-
With
fetch-create-sql
orcreate-sql
enabled -
If you set
fetch-create-sql
totrue
or specify thecreate-sql
CLI option, Replicant replicates data and corresponding data types as well as the functions. This means that table definition on the target stays the same as source. The value ofcomputed-columns
holds no effect in this scenario. -
With
fetch-create-sql
orcreate-sql
disabled -
If you set
fetch-create-sql
tofalse
or omit thecreate-sql
CLI option, the behavior follows the same pattern as a heterogeneous pipeline. -
Import the certificate authority (CA) certificate PEM file (for example
ca.pem
):keytool -importcert -alias MySQLCACert -file /path/to/ca.pem \ -keystore TRUSTSTORE_LOCATION \ -storepass TRUSTORE_PASSWORD -noprompt
Replace the following:
TRUSTSTORE_LOCATION
: The TrustStore location. It corresponds to thessl.trust-store.path
parameter in the SSL configuration.TRUSTORE_PASSWORD
: The TrustStore password. It corresponds to thessl.trust-store.password
parameter in the SSL configuration.
The
ca.pem
file corresponds to thessl.root-cert
field in the SSL configuration. -
Once you have the client private key (for example,
client-key.pem
) and certificate files (for example,client-cert.pem
) you want to use, import them into a Java KeyStore:-
Convert the client key and certificate files to a PKCS #12 archive:
openssl pkcs12 -export -in /path/to/client-cert.pem -inkey /path/to/client-key.pem \ -name "NAME" -passout pass:PASSWORD \ -out client-keystore_src.p12
Replace the following:
PASSWORD
: the password source for output filesNAME
: a name for the certificate and key
For more information, see the
openssl-pkcs12
manpage.The
client-key.pem
andclient-cert.pem
files correspond to thessl.ssl-key
andssl.ssl-cert
parameters respectively in the SSL configuration. -
Import the client key and certificate into a Java KeyStore:
keytool -importkeystore -srckeystore client-keystore_src.p12 \ -srcstoretype pkcs12 -srcstorepass SRC_KEYSTORE_PASSWORD \ -destkeystore NAME_OF_THE_DST_KEYSTORE_FILE -deststoretype JKS \ -deststorepass DST_KEYSTORE_PASSWORD
Replace the following:
SRC_KEYSTORE_PASSWORD
: Source KeyStore password.NAME_OF_THE_DST_KEYSTORE_FILE
: Name of the destination KeyStore file. Corresponds to thessl.key-store.path
parameter in the SSL configuration.DST_KEYSTORE_PASSWORD
: Destination KeyStore password. Corresponds to thessl.key-store.password
parameter in the SSL configuration.
If you get an error with the preceding command, make sure to use the same password for both
srcstorepass
anddeststorepass
. For more information, seekeytool-importkeystore
documentation.
-
HOSTNAME_OR_IP
: the MySQL hostname or IP addressPORT_NUMBER
: the port number of MySQL hostUSERNAME
: the MySQL username to connect to the MySQL server (theuser
part in MySQL account name<’user’>@<’host’>
)PATH_TO_TRUSTSTORE
: path to the TrustStore.TRUSTSTORE_PASSWORD
: the TrustStore password.PATH_TO_KEYSTORE
: path to the Java KeyStore.KEYSTORE_PASSWORD
: the KeyStore password.
The following shows a sample Extractor configuration for a homogeneous pipeline:
snapshot:
threads: 16
fetch-size-rows: 15_000
min-job-size-rows: 1_000_000
max-jobs-per-chunk: 32
computed-columns: UNBLOCK
fetch-user-roles: true
fetch-create-sql : true
The preceding sample instructs Replicant to replicate generated columns with data, corresponding data types, and functions.
CLI option #
Replicant self-hosted CLI exposes a CLI option create-sql
. create-sql
yields the same outcome as setting fetch-create-sql
to true
.
create-sql
holds a higher precedence than the Extractor parameter fetch-create-sql
. If you run Replicant with the create-sql
option, Replicant ignores the value of fetch-create-sql
.
The following shows a sample command for running Replicant:
./bin/replicant full conf/conn/mysql_src.yaml conf/conn/mysql_dst.yaml \
--filter filter/mysql_filter.yaml \
--extractor conf/src/mysql.yaml \
--metadata conf/metadata/postgresql.yaml \
--replace-existing --id mf2 \
--overwrite --create-sql
The preceding command instructs Replicant to replicate the generated columns with data, the corresponding data types, and functions.
Warning: Replicant doesn’t supportcreate-sql
for heterogeneous pipelines. Any usage ofcreate-sql
in a heterogeneous pipeline raises Exception.
Replication of generated columns in heterogeneous pipeline #
A heterogeneous pipeline means replication between two different database platforms. For example, a MySQL-to-PostgreSQL replication pipeline.
For a heterogeneous pipeline, set the computed-columns
property to one of the following values in your Extractor configuration file:
BLOCK
#
Replicant skips replicating generated columns. No generated columns from your source MySQL exists in the target database you’re replicating to.
UNBLOCK
#
Replicant replicates generated columns from source MySQL with the following caveats:
Replication of generated columns in homogeneous pipeline #
A homogeneous pipeline means replication between two identical database platforms. For example, a MySQL-to-MySQL replication pipeline.
For a homogeneous pipeline, computed-columns
behaves in the following manner. The behavior depends on the usage of the create-sql
or fetch-create-sql
parameters:
Use SSL with TrustStore and KeyStore in snapshot replication #
You can use TrustStore and KeyStore that contains the necessary SSL certificates for SSL connection in snapshot
mode replication. If you use this approach, don’t specify the ssl.root-cert
, ssl.ssl-cert
, and ssl.ssl-key
parameters in the connection configuration file.
To use SSL with TrustStore and KeyStore in snapshot replication, follow these instructions:
Create the TrustStore and KeyStore on the host running Replicant #
The following message appears after you execute the preceding commands successfully:
Entry for alias MySQLCACert successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Specify SSL configuration in the connection configuration file #
Specify the SSL configuration under the ssl
field in the connection configuration file. When using this method, make sure you provide the username
and don’t provide password
.:
type: MYSQL
host: HOSTNAME_OR_IP
port: PORT_NUMBER
username: "USERNAME"
slave-server-ids: [1]
max-connections: 30
max-retries: 10
retry-wait-duration-ms: 1000
ssl:
enable: true
hostname-verification: {true|false}
trust-store:
path: PATH_TO_TRUSTORE
password: "TRUSTSTORE_PASSWORD"
key-store:
path: PATH_TO_KEYSTORE
password: "KEYSTORE_PASSWORD"
Replace the following:
hostname-verification
enables hostname verification against the server identity according to the specification in the server’s certificate. It defaults to true
.