Source MySQL #
The extracted replicant-cli
will be referred to as the $REPLICANT_HOME
directory.
I. Install mysqlbinlog
utility on Replicant host
#
Install an appropriate version of the mysqlbinlog
utility on the machine where Replicant runs. The utility must be compatible with the source MySQL server.
To ensure that you possess the appropriate version of mysqlbinlog
utility, install the same MySQL server version as your source MySQL system. After installation, you can stop the MySQL server running on Replicant’s host using the following command:
sudo systemctl stop mysql
II. Enable binary logging in MySQL server #
-
Open the MySQL option file
var/lib/my.cnf
(create the file if it doesn’t already exist). Add the following lines in the file:[mysqld] log-bin=mysql-log.bin binlog_format=ROW binlog_row_image=full
The preceding option file specifies the following binary logging options:
-
The first line specifies the base name to use for binary log files.
-
The second line sets the binary logging format.
-
The third line specifies how the server writes row images to the binary log. In
full
mode, the server logs all columns in both the before image and the after image.
-
-
Export
$MYSQL_HOME
path with the following command:export MYSQL_HOME=/var/lib/mysql
-
Restart MySQL with the following command:
sudo systemctl restart mysql
-
Verify that you successfully enabled binary logging with the following command:
mysql -u root -p
mysql> show variables like "%log_bin%"; +---------------------------------+--------------------------------+ | Variable_name | Value | +---------------------------------+--------------------------------+ | log_bin | ON | | log_bin_basename | /var/lib/mysql/mysql-bin | | log_bin_compress | OFF | | log_bin_compress_min_len | 256 | | log_bin_index | /var/lib/mysql/mysql-bin.index | | log_bin_trust_function_creators | OFF | | sql_log_bin | ON | +---------------------------------+--------------------------------+ 7 rows in set (0.011 sec)
III. Set up MySQL user for Replicant #
- Create MySQL user
CREATE USER 'username'@'replicate_host' IDENTIFIED BY 'password';
- Grant the following privileges on all tables involved in replication
GRANT SELECT ON "<user_database>"."<table_name>" TO 'username'@'replicate_host';
- Grant the following Replication privileges
GRANT REPLICATION CLIENT ON *.* TO 'username'@'replicate_host'; GRANT REPLICATION SLAVE ON *.* TO 'username'@'replicate_host';
- Verify if created user can access bin logs
mysql> show binary logs; +------------------+-----------+ | Log_name | File_size | +------------------+-----------+ | mysql-bin.000001 | 351 | | mysql-bin.000002 | 4635 | | mysql-bin.000003 | 628 | | mysql-bin.000004 | 195038185 | +------------------+-----------+ 4 rows in set (0.001 sec)
IV. Set up connection configuration #
-
From
$REPLICANT_HOME
, navigate to the connection configuration filevi conf/conn/mysql_src.yaml
-
If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.
Otherwise, you can put your credentials like usernames and passwords in plain form like the following sample:
type: MYSQL host: 127.0.0.1 #Replace 127.0.0.1 with your MySQL server host name port: 3306 #Replace the 3306 with the port of your host username: "replicant" #Replace replicant with your username of the user that connects to your MySQL server password: "Replicant#123" #Replace Replicant#123 with the your user's password slave-server-ids: [1] max-connections: 30 #Maximum number of connections replicant can open in MySQL
V. Set up filter configuration #
-
From
$REPLICANT_HOME
, navigate to the filter configuration filevi filter/mysql_filter.yaml
-
According to your requirements, specify the data you want to replicate. Use the following format:
allow: catalog: "tpch" types: [TABLE] allow: NATION: allow: ["US, AUS"] ORDERS: allow: ["product", "service"] conditions: "o_orderkey < 5000" PART:
The preceding sample consists of the following elements:
- Data of object type
TABLE
in the schematpch
goes through replication. - From catalog
tpch
, only theNATION
,ORDERS
, andPART
tables go through replication. - From
NATION
table, only theUS
andAUS
columns go through replication. - From the
ORDERS
table, only theproduct
andservice
columns go through replication as long as those columns meet the condition inconditions
.
Note: Unless you specify, Replicant replicates all tables in the catalog.
The following illustrates the format you must follow:
allow: catalog: <your_catalog_name> types: <your_object_type> allow: <your_table_name>: allow: ["your_column_name"] condtions: "your_condition" <your_table_name>: allow: ["your_column_name"] conditions: "your_condition" <your_table_name>:
- Data of object type
For a thorough explanation of the configuration parameters in the filter file, see Filter Reference.
VI. Set up Extractor configuration #
To configure replication according to your requirements, specify your configuration in the Extractor configuration file. You can find a sample Extractor configuration file mysql.yaml
in the $REPLICANT_HOME/conf/src
directory. For a thorough explanation of the configuration parameters in the Extractor file, see Extractor Reference.
You can configure the following replication modes by specifying the parameters under their respective sections in the configuration file:
snapshot
realtime
See the following sections for more information.
Configure snapshot
replication
#
The following illustrates a sample configuration for operating in snapshot
mode:
snapshot:
threads: 16
fetch-size-rows: 15_000
per-table-config:
- catalog: tpch
tables:
ORDERS:
num-jobs: 1
LINEITEM:
row-identifier-key: [L_ORDERKEY]
split-key: l_orderkey
For more information about the configuration parameters for snapshot
mode, see Snapshot Mode.
Configure realtime
replication
#
To operate in realtime
mode, follow these steps:
-
For real-time replication, you must create a heartbeat table in the source MySQL. To create a heartbeat table in the catalog or schema you want to replicate, use the following DDL:
CREATE TABLE `<user_database>`.`replicate_io_cdc_heartbeat`( timestamp BIGINT NOT NULL, PRIMARY KEY(timestamp));
Replace
<user_database>
with the name of your specific database—for example,tpch
. -
Grant
INSERT
,UPDATE
, andDELETE
privileges to the user for the heartbeat table. -
Specify your configuration under the
realtime
section of the Extractor configuration file. For example:realtime: threads: 4 fetch-size-rows: 10_000 fetch-duration-per-extractor-slot-s: 3 _traceDBTasks: true heartbeat: enable: true catalog: tpch table-name: replicate_io_cdc_heartbeat column-name: timestamp
In the preceding example, notice the following about the
heartbeat
configuration corresponding to the heartbeat table in the first step:-
tpch
represents the name of the database that contains the heartbeat table. -
replicate_io_cdc_heartbeat
represents the heartbeat table’s name. -
timestamp
represents the heartbeat table’s column name.
-
Additional realtime
parameters
#
bin-log-idle-timeout-s
-
In some cases, a
mysqlbinlog
process might not produce any output. This parameter specifies after how many seconds the replication restarts such amysqlbinlog
process.Default:
600
.
For more information about the configuration parameters for realtime
mode, see Realtime Mode.
If you want to use the Source Column Transformation feature of Replicant for a MySQL-to-Databricks pipeline, see Source Column Transformation.
Replication of generated columns #
Replicant supports replication of generated columns from MySQL to either a different database platform, or another MySQL database.
Arcion supports replication of generated columns for the following Replicant modes:
The behavior of replicating generated columns depends on the type of replication pipeline and your usage of the configuration parameters. See the following sections for more information.
Configuration parameters #
Use the following Extractor configuration parameters and CLI option to control replication of generated columns.
Warning: If you specifycreate-sql
orfetch-create-sql
, only use schema name or database name in Mapper file. Any Mapper rule with column names, table names, or others raises Exception.
Extractor parameters #
You can specify these parameters in the Extractor configuration file of MySQL.
-
computed-columns
-
Whether to block or unblock replication of generated columns. It can take one of the following two values:
BLOCK
UNBLOCK
The behavior of
BLOCK
andUNBLOCK
depends on the type of replication pipeline and how you use thecreate-sql
orfetch-create-sql
parameters. -
fetch-create-sql
-
A boolean parameter supporting the following two values:
-
true
. Replicant replicates the generated columns to the target without skipping data types and functions. This means that the tables possess the same definitions as the source. -
false
. Replicant replicates generated columns with the following characteristics:- Replicant only replicates the data type and the data.
- Replicant skips replicating functions. The target database table possesses different definition than the one on the source. Replicant treats generated columns as ordinary columns.
Warning: Replicant doesn’t support
fetch-create-sql
for heterogeneous pipelines. Any usage offetch-create-sql
in a heterogeneous pipeline raises Exception. -
- Replicant only replicates the data type and the data.
- Replicant skips replicating functions. The target database table possesses different definition than the one on the source. Replicant treats generated columns as ordinary columns.
-
With
fetch-create-sql
orcreate-sql
enabled -
If you set
fetch-create-sql
totrue
or specify thecreate-sql
CLI option, Replicant replicates data and corresponding data types as well as the functions. This means that table definition on the target stays the same as source. The value ofcomputed-columns
holds no effect in this scenario. -
With
fetch-create-sql
orcreate-sql
disabled -
If you set
fetch-create-sql
tofalse
or omit thecreate-sql
CLI option, the behavior follows the same pattern as a heterogeneous pipeline.
The following shows a sample Extractor configuration for a homogeneous pipeline:
snapshot:
threads: 16
fetch-size-rows: 15_000
min-job-size-rows: 1_000_000
max-jobs-per-chunk: 32
computed-columns: UNBLOCK
fetch-user-roles: true
fetch-create-sql : true
The preceding sample instructs Replicant to replicate generated columns with data, corresponding data types, and functions.
CLI option #
Replicant self-hosted CLI exposes a CLI option create-sql
. create-sql
yields the same outcome as setting fetch-create-sql
to true
.
create-sql
holds a higher precedence than the Extractor parameter fetch-create-sql
. If you run Replicant with the create-sql
option, Replicant ignores the value of fetch-create-sql
.
The following shows a sample command for running Replicant:
./bin/replicant full conf/conn/mysql_src.yaml conf/conn/mysql_dst.yaml \
--filter filter/mysql_filter.yaml \
--extractor conf/src/mysql.yaml \
--metadata conf/metadata/postgresql.yaml \
--replace-existing --id mf2 \
--overwrite --create-sql
The preceding command instructs Replicant to replicate the generated columns with data, the corresponding data types, and functions.
Warning: Replicant doesn’t supportcreate-sql
for heterogeneous pipelines. Any usage ofcreate-sql
in a heterogeneous pipeline raises Exception.
Replication of generated columns in heterogeneous pipeline #
A heterogeneous pipeline means replication between two different database platforms. For example, a MySQL-to-PostgreSQL replication pipeline.
For a heterogeneous pipeline, set the computed-columns
property to one of the following values in your Extractor configuration file:
BLOCK
#
Replicant skips replicating generated columns. No generated columns from your source MySQL exists in the target database you’re replicating to.
UNBLOCK
#
Replicant replicates generated columns from source MySQL with the following caveats:
Replication of generated columns in homogeneous pipeline #
A homogeneous pipeline means replication between two identical database platforms. For example, a MySQL-to-MySQL replication pipeline.
For a homogeneous pipeline, computed-columns
behaves in the following manner. The behavior depends on the usage of the create-sql
or fetch-create-sql
parameters: