Source IBM Db2 with Native LUW #
You may want to use db2ReadLog API to read log records from the Db2 database logs, or query the Log Manager for current log state information. This page describes how to do that in Arcion when using Db2 as source.
I. Check Permissions #
The user should have read access on all the databases, schemas, and tables to be replicated.
The user should have read access to following system tables and views:
The user should have execute permissions on the following system procedures:
Users need these permissions only once at the start of a fresh replication.
II. Create the Heartbeat Table #
For CDC replication, you must create the heartbeat table on the Source database with the following DDL:
CREATE TABLE "tpch"."replicate_io_cdc_heartbeat"("timestamp" BIGINT NOT NULL, CONSTRAINT "cdc_heartbeat_id_default_default" PRIMARY KEY("timestamp"))
III. Enable CDC-based Replication #
If you’re performing CDC-based replication from the source Db2 server, please follow the steps below:
On system running source Db2 server: #
For any tables being replicated, run the following command:
ALTER TABLE <TABLE> DATA CAPTURE CHANGES
Check if the database is recoverable by running the followingg command:
db2 get db cfg for <DATABASE> show detail | grep -i "logarch"
LOGARCHMETH2is set, the database is already recoverable.
Skip the next step if the database is already recoverable.
Update the db2 logging method by running the following command:
db2 update db cfg for <DATABASE> using LOGARCHMETH1 LOGRETAIN
Updating the logging methods leaves the database in a Backup Pending state. To recover from this, you need to backup the database with:
db2 backup db <DATABASE> to <DESTINATION>
On system running Arcion Replicant: #
Skip steps 2-6 if Replicant is running from the same system as the source Db2 database.
JAVA_OPTSenvironment variable with:
Install Db2 Data Server Client Prerequisites by running the following commands:
sudo dpkg --add-architecture i386 sudo apt install libaio1 libstdc++6:i386 libpam0g:i386 sudo apt install binutils
Install Db2 Data Server Client:
a. Download latest version of Db2 Data Server Client from IBM.
b. Extract and start the installer by running
c. Select Custom installation.
d. Check the Base Application Development tools option on page 2 of the installation wizard.
e. Leave remaining options as default and complete the installation.
Catalog the source Db2 Server node by running the following:
db2 catalog tcpip node <NODE_NAME> remote <REMOTE> server <PORT>
Catalog the source Db2 database:
db2 catalog database <DATABASE> at node <NODE_NAME>
Finally, test the connection with:
db2 connect to <DATABASE> user <USER>
IV. Configure Replicant #
You also need to configure Replicant’s Db2 connection configuration file:
If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.
Otherwise, you can put your credentials like usernames and passwords in plain form like the sample below:
type: DB2 database: tpch host: localhost port: 50002 node: db2inst1 username: replicant password: "Replicant#123" max-connections: 30 max-retries: 10 retry-wait-duration-ms: 1000
Notice the property called
node. It represents the name of the Db2 node you’re connecting to. It can go anywhere in the root of the file. The default node name is the Db2 user’s name.
Set the value of
READ_LOGin the connection configuration file. This tells Replicant that you want to use the native db2ReadLog as the CDC log reader:
cdc-log-config: cdc-log-storage: READ_LOG
V. Set up Extractor Configuration #
$REPLICANT_HOME, navigate to the Extractor configuration file:
The Extractor configuration file has three parts:
- Parameters related to snapshot mode.
- Parameters related to delta snapshot mode.
- Parameters related to realtime mode.
Parameters related to snapshot mode #
For snapshot mode, you can make use of the following sample:
snapshot: threads: 16 fetch-size-rows: 5_000 _traceDBTasks: true #fetch-schemas-from-system-tables: true per-table-config: - catalog: tpch schema: db2user tables: lineitem: row-identifier-key: [l_orderkey, l_linenumber]
Parameters related to delta snapshot mode #
For delta delta snapshot mode, you can make use of the following sample:
delta-snapshot: #threads: 32 #fetch-size-rows: 10_000 #min-job-size-rows: 1_000_000 max-jobs-per-chunk: 32 _max-delete-jobs-per-chunk: 32 delta-snapshot-key: last_update_time delta-snapshot-interval: 10 delta-snapshot-delete-interval: 10 _traceDBTasks: true replicate-deletes: false per-table-config: - catalog: tpch schema: db2user tables: # testTable # split-key: split-key-column # split-hints: # row-count-estimate: 100000 # split-key-min-value: 1 # split-key-max-value: 60_000 # delta-snapshot-key: delta-snapshot-key-column # row-identifier-key: [col1, col2] # update-key: [col1, col2] partsupp: split-key: partkey
Parameters related to realtime mode #
For realtime mode, you can make use of the following sample:
realtime: #threads: 1 #fetch-size-rows: 10000 _traceDBTasks: true #fetch-interval-s: 0 replicate-empty-string-as-null: true # start-position: # commit-time: '2020-08-24 08:16:38.019002' # idempotent-replay: false heartbeat: enable: true catalog: tpch schema: db2user #table-name: replicate_io_cdc_heartbeat #column-name: timestamp interval-ms: 10000
In the sample above, notice the following details:
commit-timespecifies the timestamp in UTC under
start-position, which indicates the starting log position for realtime replication. To get a timestamp in UTC, you can execute the following query:
SELECT CURRENT TIMESTAMP - CURRENT TIMEZONE AS UTC_TIMESTAMP FROM SYSIBM.SYSDUMMY1
If you’ve set
For a detailed explanation of configuration parameters in the Extractor file, read Extractor Reference.