Source SingleStore
The extracted replicant-cli will be referred to as the $REPLICANT_HOME directory in the following steps.
I. Set up Connection Configuration
- From $REPLICANT_HOME, navigate to the sample connection configuration file:

      vi conf/conn/singlestore.yaml
- If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.

  Otherwise, you can put your credentials, such as usernames and passwords, in plain text as in the sample below:

      type: SINGLESTORE

      host: localhost #Replace localhost with address to your SingleStore host
      port: 3306 #Replace default port 3306 if needed

      username: 'replicant' #Replace replicant with your SingleStore user
      password: 'Replicant#123' #Replace Replicant#123 with your user's password

      max-connections: 30 #Maximum number of connections replicant can open in SingleStore
      max-retries: 10
      retry-wait-duration-ms: 1000
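Before starting Replicant, it can help to sanity-check the values you entered. The following is a minimal sketch in Python, assuming the configuration has already been loaded into a dictionary that mirrors the sample above. The function name `check_connection_config` and the rules it applies (which keys are required, the allowed port range) are illustrative assumptions, not Replicant's own validation logic.

```python
# Hypothetical pre-flight check. The rules below are assumptions made for
# illustration; they are not taken from Replicant itself.
REQUIRED_KEYS = {"type", "host", "port", "username", "password"}


def check_connection_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the mapping looks sane."""
    problems = []
    # Report every required key that is absent.
    for key in sorted(REQUIRED_KEYS - cfg.keys()):
        problems.append(f"missing key: {key}")
    if cfg.get("type") != "SINGLESTORE":
        problems.append("type should be SINGLESTORE")
    port = cfg.get("port")
    if not isinstance(port, int) or not (1 <= port <= 65535):
        problems.append("port must be an integer between 1 and 65535")
    # Optional tuning values, when present, should be non-negative integers.
    for key in ("max-connections", "max-retries", "retry-wait-duration-ms"):
        value = cfg.get(key)
        if value is not None and (not isinstance(value, int) or value < 0):
            problems.append(f"{key} must be a non-negative integer")
    return problems


# Values copied from the sample connection configuration.
sample = {
    "type": "SINGLESTORE",
    "host": "localhost",
    "port": 3306,
    "username": "replicant",
    "password": "Replicant#123",
    "max-connections": 30,
    "max-retries": 10,
    "retry-wait-duration-ms": 1000,
}
print(check_connection_config(sample))  # prints []
```

A check like this only catches typos and missing fields; it does not confirm that the host is reachable or that the credentials are accepted by SingleStore.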
II. Set up Extractor Configuration
- From $REPLICANT_HOME, navigate to the Extractor configuration file:

      vi conf/src/singlestore.yaml
a. For snapshot mode, make the necessary changes as follows in the snapshot section of the configuration file:

    snapshot:
      #threads: 32
      #fetch-size-rows: 10_000
      #min-job-size-rows: 1_000_000
      max-jobs-per-chunk: 32
      #verify-row-count: false
      _traceDBTasks: true

      per-table-config:
      - catalog: tpch
        tables:
          # testTable
          #   split-key: split-key-column
          part:
            split-key: partkey
          partsupp:
            split-key: partkey
          supplier:
          orders:
            split-key: orderkey
          lineitem:
            row-identifier-key: [l_orderkey, l_linenumber]
            split-key: l_orderkey
            # split-hints:
            #   row-count-estimate: 15000
            #   split-key-min-value: 1
            #   split-key-max-value: 60_000
b. For delta snapshot mode, you can add a delta-snapshot-key column to SingleStore tables, if not already present, with the following ALTER (and UPDATE) statements per table:

    ALTER TABLE tpch.LINEITEM_CS ADD COLUMN replicate_io_delta_snapshot_key TIMESTAMP NOT NULL DEFAULT current_timestamp ON UPDATE current_timestamp;
    UPDATE tpch.LINEITEM_CS SET replicate_io_delta_snapshot_key = current_timestamp;
Then make the necessary changes as follows in the delta-snapshot section of the configuration file:

    delta-snapshot:
      #threads: 32
      #fetch-size-rows: 10_000
      #min-job-size-rows: 1_000_000
      max-jobs-per-chunk: 32
      _max-delete-jobs-per-chunk: 32

      delta-snapshot-key: last_update_time
      delta-snapshot-interval: 10
      delta-snapshot-delete-interval: 10
      _traceDBTasks: true
      replicate-deletes: false

      per-table-config:
      - schema: tpch
        tables:
          # testTable
          #   split-key: split-key-column  # Any numeric/timestamp column with sufficiently large number of distincts
          #   split-hints:
          #     row-count-estimate: 100000  # Estimated row count, if supplied replicant will leverage
          #     split-key-min-value: 1  #Lower bound of split key value
          #     split-key-max-value: 60_000  #Upper bound of split key value, if supplied replicant will leverage and avoid querying source database for the same
          #   delta-snapshot-key: delta-snapshot-key-column  # A monotonic increasing numeric/timestamp column which gets new value on each INSERT/UPDATE
          #   row-identifier-key: [col1, col2]  # A set of columns which uniquely identify a row
          #   update-key: [col1, col2]  # A set of columns which replicant should use to perform deletes/updates during incremental replication
          part:
            split-key: partkey
          partsupp:
            split-key: partkey
          supplier:
          orders:
            split-key: orderkey
          parts_view:
            update-key: [partkey]
            delta-snapshot-key: last_update_time
            split-key: last_update_time
          partsupp_macro:
            update-key: [partkey]
            delta-snapshot-key: last_update_time
            split-key: last_update_time
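Step b asks for one ALTER (and UPDATE) statement per table, which becomes repetitive when many tables are involved. The sketch below generates the same two statements for a list of tables. The helper name `delta_key_statements` and the table `ORDERS_CS` are hypothetical and used only for illustration; the column name and statement text are taken from the example in step b.

```python
# Column name taken from the ALTER/UPDATE example in step b.
DELTA_KEY = "replicate_io_delta_snapshot_key"


def delta_key_statements(schema: str, table: str, column: str = DELTA_KEY) -> list:
    """Build the ALTER and UPDATE statements for one table.

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    alter = (
        f"ALTER TABLE {schema}.{table} ADD COLUMN {column} "
        "TIMESTAMP NOT NULL DEFAULT current_timestamp ON UPDATE current_timestamp;"
    )
    update = f"UPDATE {schema}.{table} SET {column} = current_timestamp;"
    return [alter, update]


# LINEITEM_CS comes from the example above; ORDERS_CS is a hypothetical table.
for table in ["LINEITEM_CS", "ORDERS_CS"]:
    for statement in delta_key_statements("tpch", table):
        print(statement)
```

Review the generated statements before running them against SingleStore, and skip tables that already have a suitable delta-snapshot-key column.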
For a detailed explanation of configuration parameters in the extractor file, read Extractor Reference.
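For intuition about what split-key, split-key-min-value, and split-key-max-value make possible, the sketch below divides an inclusive key range into contiguous, non-overlapping sub-ranges that independent jobs could read in parallel. This is a generic range-partitioning illustration under assumed integer keys; how Replicant actually schedules jobs is not described here, and the function name `split_ranges` is hypothetical.

```python
def split_ranges(min_value: int, max_value: int, jobs: int) -> list:
    """Divide [min_value, max_value] (inclusive) into at most `jobs` ranges.

    Generic illustration only; not Replicant's actual algorithm.
    """
    if jobs < 1 or max_value < min_value:
        raise ValueError("need jobs >= 1 and max_value >= min_value")
    total = max_value - min_value + 1
    jobs = min(jobs, total)  # never create more ranges than there are key values
    base, extra = divmod(total, jobs)
    ranges = []
    start = min_value
    for i in range(jobs):
        # The first `extra` ranges take one additional value each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Bounds borrowed from the commented split-hints in the sample configuration.
print(split_ranges(1, 60_000, 4))
# prints [(1, 15000), (15001, 30000), (30001, 45000), (45001, 60000)]
```

Even ranges only balance the work when key values are spread fairly uniformly, which is consistent with the sample's advice to pick a numeric or timestamp column with a sufficiently large number of distinct values.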