Native LUW

Source IBM Db2 with Native LUW #

You may want to use db2ReadLog API to read log records from the Db2 database logs, or query the Log Manager for current log state information. This page describes how to do that in Arcion when using Db2 as source.

I. Check Permissions #

  1. The user should have read access on all the databases, schemas, and tables to be replicated.

  2. The user should have read access to following system tables and views:

    a. SYSIBM.SYSTABLES

    b. SYSIBM.SQLTABLETYPES

    c. SYSIBM.SYSCOLUMNS

    d. SYSIBM.SYSTABCONST

    e. SYSIBM.SQLCOLUMNS

    f. SYSCAT.COLUMNS (required for fetch-schemas mode).

  3. The user should have execute permissions on the following system procedures:

    a. SYSIBM.SQLTABLES

    b. SYSIBM.SQLCOLUMNS

    c. SYSIBM.SQLPRIMARYKEYS

    d. SYSIBM.SQLSTATISTICS

Users need these permissions only once at the start of a fresh replication.

II. Create the Heartbeat Table #

For CDC replication, you must create the heartbeat table on the Source database with the following DDL:

CREATE TABLE "tpch"."replicate_io_cdc_heartbeat"("timestamp" BIGINT NOT NULL, 
CONSTRAINT "cdc_heartbeat_id_default_default" PRIMARY KEY("timestamp"))

III. Enable CDC-based Replication #

If you’re performing CDC-based replication from the source Db2 server, please follow the steps below:

On system running source Db2 server: #

  1. For any tables being replicated, run the following command:

    ALTER TABLE <TABLE> DATA CAPTURE CHANGES
    
  2. Check if the database is recoverable by running the followingg command:

    db2 get db cfg for <DATABASE> show detail | grep -i "logarch"
    

    If either LOGARCHMETH1 or LOGARCHMETH2 is set, the database is already recoverable.

    Skip the next step if the database is already recoverable.
  3. Update the db2 logging method by running the following command:

    db2 update db cfg for <DATABASE> using LOGARCHMETH1 LOGRETAIN
    

    Updating the logging methods leaves the database in a Backup Pending state. To recover from this, you need to backup the database with:

    db2 backup db <DATABASE> to <DESTINATION>
    

On system running Arcion Replicant: #

Skip steps 2-6 if Replicant is running from the same system as the source Db2 database.
  1. Configure the JAVA_OPTS environment variable with:

    export JAVA_OPTS=-Djava.library.path=lib
    
  2. Install Db2 Data Server Client Prerequisites by running the following commands:

    sudo dpkg --add-architecture i386
    sudo apt install libaio1 libstdc++6:i386 libpam0g:i386
    sudo apt install binutils
    
  3. Install Db2 Data Server Client:

    a. Download latest version of Db2 Data Server Client from IBM.

    b. Extract and start the installer by running db2_setup.

    c. Select Custom installation.

    d. Check the Base Application Development tools option on page 2 of the installation wizard.

    e. Leave remaining options as default and complete the installation.

  4. Catalog the source Db2 Server node by running the following:

    db2 catalog tcpip node <NODE_NAME> remote <REMOTE> server <PORT>
    
  5. Catalog the source Db2 database:

    db2 catalog database <DATABASE> at node <NODE_NAME>
    
  6. Finally, test the connection with:

    db2 connect to <DATABASE> user <USER>
    

IV. Configure Replicant #

You also need to configure Replicant’s Db2 connection configuration file:

  1. If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.

    Otherwise, you can put your credentials like usernames and passwords in plain form like the sample below:

    type: DB2
    
    database: tpch
    host: localhost
    port: 50002
    
    node: db2inst1
    
    username: replicant
    password: "Replicant#123"
    
    max-connections: 30
    
    max-retries: 10
    retry-wait-duration-ms: 1000
    

    Notice the property called node. It represents the name of the Db2 node you’re connecting to. It can go anywhere in the root of the file. The default node name is the Db2 user’s name.

  2. Set the value of cdc-log-storage to READ_LOG in the connection configuration file. This tells Replicant that you want to use the native db2ReadLog as the CDC log reader:

    cdc-log-config:
        cdc-log-storage: READ_LOG
    

V. Set up Extractor Configuration #

  1. From $REPLICANT_HOME, navigate to the Extractor configuration file:

    vi conf/src/db2.yaml
    
  2. The Extractor configuration file has three parts:

    • Parameters related to snapshot mode.
    • Parameters related to delta snapshot mode.
    • Parameters related to realtime mode.

    For snapshot mode, you can make use of the following sample:

    snapshot:
      threads: 16
      fetch-size-rows: 5_000
    
      _traceDBTasks: true
      #fetch-schemas-from-system-tables: true
    
      per-table-config:
      - catalog: tpch
        schema: db2user
        tables:
          lineitem:
            row-identifier-key: [l_orderkey, l_linenumber]
    

    For delta delta snapshot mode, you can make use of the following sample:

    delta-snapshot:
      #threads: 32
      #fetch-size-rows: 10_000
    
      #min-job-size-rows: 1_000_000
      max-jobs-per-chunk: 32
      _max-delete-jobs-per-chunk: 32
    
      delta-snapshot-key: last_update_time
      delta-snapshot-interval: 10
      delta-snapshot-delete-interval: 10
      _traceDBTasks: true
      replicate-deletes: false
    
      per-table-config:
        - catalog: tpch
          schema: db2user
          tables:
            #      testTable
            #        split-key: split-key-column
            #        split-hints:
            #          row-count-estimate: 100000 
            #          split-key-min-value: 1 
            #          split-key-max-value: 60_000
            #        delta-snapshot-key: delta-snapshot-key-column
            #        row-identifier-key: [col1, col2]
            #        update-key: [col1, col2]
            partsupp:
              split-key: partkey
    

    For realtime mode, you can make use of the following sample:

    realtime:
      #threads: 1
      #fetch-size-rows: 10000
      _traceDBTasks: true
      #fetch-interval-s: 0
      replicate-empty-string-as-null: true
    
    #  start-position:
    #    commit-time: '2020-08-24 08:16:38.019002'
    # idempotent-replay: false
    
      heartbeat:
        enable: true
        catalog: tpch
        schema: db2user
        #table-name: replicate_io_cdc_heartbeat
        #column-name: timestamp
        interval-ms: 10000
    

    In the sample above, notice the following details:

    • The parameter commit-time specifies the timestamp in UTC under start-position, which indicates the starting log position for realtime replication. To get a timestamp in UTC, you can execute the following query:

      SELECT CURRENT TIMESTAMP - CURRENT TIMEZONE AS UTC_TIMESTAMP FROM SYSIBM.SYSDUMMY1
      
    • If you’ve set message-format to DELIMITED, set replicate-empty-string-as-null to true.

For a detailed explanation of configuration parameters in the Extractor file, read Extractor Reference.