This page describes the requirements for using IBM Informix as Source and how to achieve CDC replication.
Enabling CDC Replication #
-
First, follow the instructions described here in IBM Informix Documentation.
-
The user should have read access on all the databases, schemas and tables to be replicated.
-
You must set the log mode of the source Informix to
UNBUFFERED
orANSI.
For more details, see alter logmode argument: Change the database logging mode on the IBM Informix Documentation. For example, to set log mode toUNBUFFERED
, use the following command:EXECUTE FUNCTION task("alter logmode","<dbname>","u");
You can check the current log mode using SQL:
SELECT * FROM sysmaster:sysdatabases WHERE name = '<dbname>'
Logical Log Configuration Guidelines #
You must perform Logical log configuration in such manner that Informix logical logs do not get overwritten fast enough before Arcion Replicant has finished consuming them. To know about the general Informix guidelines for managing logical logs, see Managing logical-log files on IBM Informix Documentation.
Arcion replicant recommends the following:
-
It is important to have a sufficiently large
DBSPACE
(s) configured to hold logical logs corresponding to the transaction activity that is done for a few days to a week duration. The higher the logical log space configured, the more is the resiliency of CDC based replication for the failures with the error code CDC_E_LSN ( Data at the requested log sequence number is unavailable for capture). E.g. A DBSPACE can be added with following usingDBACCESS
:EXECUTE FUNCTION task("create dbspace", "logical_log_space", $INFORMIXDIR/tmp/dbspace3", "100000 M", 0);
-
A sufficiently large number of initial logical log files of appropriate size should be added in this dbspace. For example:
onparams -a -d dbspace3 -s 1000000
See Performance considerations on IBM Informix Documentation for an overview of all the performance considerations. It’ll help you determine the number of log files and size of each logical file. For CDC-based replication, we highly recommend you use the maximum possible
LOGSIZE
to avoid frequentLOG FULL
situations. -
The more the threads specified in the realtime section of the extractor configuration file (refer
conf/src/informix.yaml
in the release) the more is the number of CDC sessions replicant creates to pull logs from the source Informix database. For example, if there are 128 tables to be replicated and if number of realtime extractor threads is specified as 16 then replicant will divide 128 tables amongst these 16 extractor slots ( cdc sessions) each pulling CDC logs for 8 tables). This will help replicant to consume the logs aggressively by maximizing the available bandwidth of log consumption. -
Once Arcion replicant has been started and it finishes snapshot and enters realtime mode, it is possible to monitor the consumption rate in different ways as described below
-
Replicant has an in-built heartbeat mechanism by which replicant can precisely figure out how much is the replication lag at any point. For example, if an
INSERT
/UPDATE
/DELETE
operation was executed on the source Informix server at time X and it was applied to the target database by Arcion replicant at time X + 25 then replicant computes this replication lag as 25 seconds and shows it up on the dashboard.Replicant continuously computes and updates the replication lag value in real time and displays it on the Replicant dashboard. It is a true reliable metric to understand how much Replicant is lagging behind from the source system in terms of elapsed time.
-
On Informix server, it is possible to continuously monitor how fast Arcion Replicant is consuming the CDC logs buffers. To do so, use the following command:
[informix@62881888bf5b informix]$ onstat -g cdc bufm
This will yield an output like below:
IBM Informix Dynamic Server Version 14.10.FC3DE -- On-Line -- Up 03:21:20 -- 181044 Kbytes CDC subsystem structure at 0x4410cc98 CDC session structure at 0x4e387d28 CDC session id: 46137383 (0x2c00027) Bufer Manager at 0x4e38aa20 Number of allocated bufers high watermark: 22 Number of currently allocated bufers: 14 Minimum prepend for alloced bufers: 172 CDC session structure at 0x4e1a1d28 CDC session id: 45613095 (0x2b80027) Bufer Manager at 0x4e1a3a20 Number of allocated bufers high watermark: 22 Number of currently allocated bufers: 14 Minimum prepend for alloced bufers: 172 CDC session structure at 0x4e5add28 CDC session id: 45088807 (0x2b00027) Bufer Manager at 0x4e5afa20 Number of allocated bufers high watermark: 268 Number of currently allocated bufers: 14 Minimum prepend for alloced bufers: 172 CDC session structure at 0x4de7ad28 CDC session id: 44040231 (0x2a00027) Bufer Manager at 0x4e042a20 Number of allocated bufers high watermark: 22 Number of currently allocated bufers: 14 Minimum prepend for alloced bufers: 17
In this output, the more the number of currently allocated buffers for each CDC session, the more is the amount of unconsumed log by Arcion Replicant.
The value of this allocated buffers < 20 as above indicates all the logs have been consumed by Arcion repicant. When the value is much higher, it is an indication of Arcion replicant not being able to catch up with the high throughput of log generation. It is not possible to have record level granularity of records not yet fetched by Replicant. But once logs are fetched and are waiting to be applied we show “Buffered Oper” count on the replicant dashboard.
-
Informix does not offer any direct way for CDC consumers (like Arcion Replicant) to correlate the CDC record sequence number with the LSN in actual logical log files. (To know more, see the section CDC Record Sequence Number in IBM Informix Change Data Capture API Programmer’s Guide). Meaning, there is no direct way/API to know which logical log file Arcion Replicant is currently consuming.
However, Arcion replicant still offers one way to do this as below :
-
One of the metadata tables (in the metadata database) created by Arcion Replicant is
replicate_io_cdc_extractor_metadata_<id>_<id>
. The columncommitted_cursor
in this table gives the exact CDC cursor information for each CDC session that Arcion Replicant checkpoints and updates continuously. For example, notice the following SQL:select committed_cursor from blitzz.replicate_io_cdc_extractor_metadata_repl1_repl1;
An example output looks ike below:
{ "extractorId": 0, "timestamp": 1594378633405, "transactionNum": 45, "seqNum": 738842783900, "startSeqNumForOldestUncommitedTxn": 738842783768, "v": 1 } { "extractorId": 0, "timestamp": 1594378633405, "transactionNum": 45, "seqNum": 738842783900, "startSeqNumForOldestUncommitedTxn": 738842783768, "v": 1 } { "extractorId": 2, "timestamp": 1594378633405, "transactionNum": 45, "seqNum": 738842783900, "startSeqNumForOldestUncommitedTxn": 738842783768, "v": 1 } { "extractorId": 3, "timestamp": 1594378633405, "transactionNum": 45, "seqNum": 738842783900, "startSeqNumForOldestUncommitedTxn": 738842783768, "v": 1 } (4 rows)
Each value is a JSON string is with a timestamp field in it. This timestamp is the start time of the Informix transaction that Arcion Replicant has successfully replicated.
The minimum of all these values will give the oldest transaction’s start timestamp (say it is X ) which has been replayed by Replicant.This timestamp is in milliseconds (time since epoch) and it can be easily converted to a UTC timestamp.
-
Informix provides an event alarm mechanism where all event alarms are continuously logged in a
ph_alert
table. For information regarding all Informix events, see Event alarm IDs on IBM Informix Documentation.Following event is generated when a logical log file is full
Class ID: 23 Event ID: 23001 Class message: Logical log 'number' complete Specific message: Logical Log log_number Complete, timestamp: timestamp. The logical log is full, and no more transactions can be written to it. Online log: Message indicating that the logical log is full. Server state: Online. User action: None.
As soon as a logical log file gets full, an alarm is generated for this event. You can query for all such alarms in the
ph_alerts
table using the below query (on sysadmin database):SELECT * FROM ph_alerts WHERE alert_object_type='ALARM' AND alert_object_info = 23001;
This would give an output like the following:
alert_id 171 run_id 348 task_id 20 task_name post_alarm_message task_description System function to post alerts alert_type INFO alert_color YELLOW alert_time 2020-07-10 08:42:46 alert_state NEW alert_object_type ALARM alert_object_name 23 alert_message Logical Log 162 Complete, timestamp: 0x5b6bba. alert_action_dbs sysadmin alert_action alert_object_info 23001
The alert time here is a UTC time. Let us call it
Y
.It is possible to write an external monitoring script which can continuously query the following:
-
The Arcion replicant metadata table
replicate_io_cdc_extractor_metadata_repl1_repl1
to get the start time of the oldest transaction which has been successfully replicated by replicant (calling itX
).\ -
The
ph_alerts
table to get the most recent logical log full time and logical log file number (calling itY
). -
By comparing
X
withY
of each logical log file which becomes full, you can deduce which exact logical logs Replicant didn’t consume at any point. Based on that observation, you can take appropriate action. For example:- If X is TS 120
- Log FIle 1 became full at TS 50
- Log File 2 became full at TS 100
- Log File 3 became full at TS 150
- Log File 4 became full at TS 200 etc.
Then, we can deduce that replicant will need all log files starting from Log file 2, 3, 4 and so on.
-
-
-
When you deduce that Replicant is not able to consume CDC logs and more and more logical logs are getting full, you should take the following actions:
- Add new
DBSPACE
(s) and new logical log files dynamically to avoid the cyclic overwriting of the unconsumed logs.- Make sure that Replicant is running and not failing for any other external factors. These factors could be hardware issues on replication host, network connection failures, target system down, or slowness. If you can sort out any of such causes in a timely manner, you should be able to run Replicant again so that it can consume the pending CDC activity.