Replicant System Configuration

General system configuration of Replicant #

You can optionally configure several system parameters of Replicant. These parameters control various aspects of Replicant’s behavior during replication, such as tracing, logging, and the information dashboard.

This feature is available in version 20.07.16.5 and later.

To configure Replicant’s system parameters:

  1. Specify the general configuration parameters in a YAML configuration file.
  2. Run Replicant with the --general option and provide the full path to the general configuration file.

System configuration parameters #

liveness-monitor #

Controls liveness checks of Replicant. This allows you to configure how Replicant stops and resumes replication in different situations.

enable

{true|false}

Enable liveness monitoring.

inactive-timeout-ms

Specifies the replication inactivity time in milliseconds. If the liveness monitor detects no replication activity in this time period, Replicant stops and resumes replication.

Default: 900_000 (15 minutes).

snapshot-extractor-inactive-timeout-ms

Specifies the snapshot extraction inactivity time in milliseconds. If the liveness monitor detects no snapshot extraction activity in this time period, Replicant stops and resumes replication. If you don’t specify this parameter, it takes the value of inactive-timeout-ms.

Default: The value of inactive-timeout-ms.

snapshot-applier-inactive-timeout-ms

Specifies the snapshot Applier inactivity time in milliseconds. If the liveness monitor detects no snapshot Applier activity in this time period, Replicant stops and resumes replication. If you don’t specify this parameter, it takes the value of inactive-timeout-ms.

Default: The value of inactive-timeout-ms.

realtime-extractor-inactive-timeout-ms

Specifies the realtime extraction inactivity time in milliseconds. If the liveness monitor detects no realtime extraction activity in this time period, Replicant stops and resumes replication. If you don’t specify this parameter, it takes the value of inactive-timeout-ms.

Default: The value of inactive-timeout-ms.

realtime-applier-inactive-timeout-ms

Specifies the realtime Applier inactivity time in milliseconds. If the liveness monitor detects no realtime Applier activity in this time period, Replicant stops and resumes replication. If you don’t specify this parameter, it takes the value of inactive-timeout-ms.

Default: The value of inactive-timeout-ms.

min-free-memory-threshold-percent

Specifies the minimum free memory threshold as a percentage. If free memory drops below this threshold, Replicant stops and resumes operation.

liveness-check-interval-ms

Specifies the time interval between two successive liveness checks in milliseconds.
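
As a rough illustration, the liveness-monitor parameters above could be combined as follows. The timeout values are arbitrary examples rather than recommendations, and the nesting under liveness-monitor follows the sample configuration later in this document.

```yaml
liveness-monitor:
  enable: true
  # Fallback inactivity timeout for every stage (15 minutes)
  inactive-timeout-ms: 900000
  # Illustrative override: tolerate up to 30 minutes without snapshot extraction activity
  snapshot-extractor-inactive-timeout-ms: 1800000
  # Illustrative override: tolerate only 5 minutes without realtime Applier activity
  realtime-applier-inactive-timeout-ms: 300000
  min-free-memory-threshold-percent: 5
  liveness-check-interval-ms: 60000
```

In this sketch, the stages without their own timeout (the snapshot Applier and the realtime extractor) fall back to the value of inactive-timeout-ms.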

schema-validation [v20.09.14.8] #

Enables and configures schema validation. Replicant displays schema validation errors in the information dashboard.

enable

{true|false}.

Enables schema validation. Replicant validates the target schema against the source schema.

error-types

Specifies the error types in an array. The following error types are supported:

  • ALL
  • ERRORS
  • WARNINGS
  • COL_CNT_MISMATCH
  • COL_TYPE_MISMATCH

Default: [ALL].

warning-as-error

{true|false}.

Whether to consider warnings as errors.

Default: false.

dump-schema-mapping

{true|false}.

Controls whether or not Replicant dumps the mapping between source and target schemas.
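
A minimal sketch of a schema-validation block is shown below. It assumes the sub-parameters nest under schema-validation in the same way as enable does in the sample configuration, and that the error-types array can be written as a YAML flow sequence. The chosen values are illustrative only.

```yaml
schema-validation:
  enable: true
  # Report only column count and column type mismatches instead of the default [ALL]
  error-types: [COL_CNT_MISMATCH, COL_TYPE_MISMATCH]
  # Keep warnings separate from errors (the documented default)
  warning-as-error: false
  # Dump the mapping between source and target schemas
  dump-schema-mapping: true
```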

permission-validation #

Validates whether the user has the permissions required to read table data in a particular database. This parameter works in snapshot and full mode replication.

permission-validation shows expected behavior for the following databases:

  • Microsoft SQL Server
  • MySQL
  • Oracle
  • Snowflake

enable

{true|false}.

Enables permission validation.

Default: true for Databricks and Snowflake targets, false otherwise.

fencing [v20.10.07.3] #

This parameter allows you to prevent multiple instances of Replicant from executing simultaneously. Consider the situation when the same replication gets resumed twice, leading to two replication processes trying to perform the same job. Fencing ensures that the older replication process terminates as soon as a new replication process starts.

Replicant achieves this functionality by using validation tokens. A validation token consists of a monotonically increasing counter. Each replication obtains this counter at the start. Before each action on the respective storage, the replication job performs validation against this counter. Thus a validation token acts as a fence around the metadata and destination storage.

Fencing works in the following manner depending on the configuration:

  • In DDL fencing, Replicant embeds the validation token into the table name.
  • In DML fencing, Replicant embeds the validation token into a row value and keeps it in the respective fencing table.
  • Specifying NONE disables fencing for the respective storage.

enable-metadata-fence

Enables and specifies metadata fencing.

The following values are supported:

  • DDL
  • DML
  • NONE

Default: DDL for JDBC metadata databases.

enable-dst-fence

Enables and specifies fencing on the destination database.

The following values are supported:

  • DDL
  • DML
  • NONE

Default: DDL for JDBC databases.

enable-dst-query-fence [v20.02.01.13]

Enables and specifies query fencing on the destination database.

The following values are supported:

  • DDL
  • DML
  • NONE

Default: DDL for JDBC databases.

heartbeat-interval-ms

Specifies the time interval between successive heartbeat signals in milliseconds.

Default: 30_000 (30 seconds).
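
The fencing parameters might be set along the following lines. This is a sketch under two assumptions that the text above does not confirm: that the top-level key is named fencing, and that the four parameters nest directly under it like the sub-parameters of other sections. The chosen fence modes are illustrative only.

```yaml
# Assumed layout: sub-parameters nested under a top-level "fencing" key
fencing:
  # Embed the validation token into the table name in the metadata storage
  enable-metadata-fence: DDL
  # Keep the validation token as a row value in a fencing table on the destination
  enable-dst-fence: DML
  # Disable query fencing on the destination
  enable-dst-query-fence: NONE
  # Send a heartbeat signal every 30 seconds (the documented default)
  heartbeat-interval-ms: 30000
```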

data-dir [v20.12.04.4] #

Specifies the directory to store temporary files related to bulk loading.

If you don’t specify trace-dir, data-dir also stores the trace.log file.

Default: data/.

trace-dir [v20.12.04.4] #

Specifies the directory location for the trace.log file.

If you set data-dir, Replicant creates trace-dir inside the data-dir directory.

Default: data-dir/default.

error-trace-dir #

Specifies the directory location for the error-trace.log file. If you set data-dir, Replicant creates error-trace-dir inside the data-dir directory.

Default: data-dir/default.

trace-time-zone [v20.12.04.8] #

The trace.log file contains timestamps in a specific timezone. This parameter allows you to specify the timezone to use.

For example, with trace-time-zone: Asia/Kolkata, trace.log contains timestamps such as 2021-01-07 19:08:24.530 IST.

Default: UTC. For example, 2021-01-07 13:40:23.462.

trace-level [v20.12.04.12] #

Specifies the level of logback tracing. You can choose among the following trace levels:

  • DEBUG
  • INFO
  • ERROR
  • WARNING

Default: DEBUG.

archive-trace [v20.12.04.12] #

{true|false}.

Archives trace logs daily into timestamped files.

Default: true.

purge-trace-before-days [v20.12.04.12] #

Specifies the number of days to keep trace.log archives. Replicant automatically deletes older trace logs.

Default: 0.
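
The trace parameters described above could be combined as in the following sketch. It assumes these parameters sit at the top level of the general configuration file, as archive-trace and purge-trace-before-days do in the sample configuration. The values are examples, not recommendations.

```yaml
# Write trace.log timestamps in Indian Standard Time instead of UTC
trace-time-zone: Asia/Kolkata
# Log at INFO level instead of the default DEBUG
trace-level: INFO
# Archive trace logs daily into timestamped files
archive-trace: true
# Keep archived trace logs for 30 days
purge-trace-before-days: 30
```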

sensitive-info-trace-dir [v20.12.04.16] #

{true|false}.

If true, Replicant logs sensitive trace messages into a separate file in the sensitive_trace_directory directory.

Default: true.

dashboard-dump-file [v21.04.06.1] #

Replicant can dump the contents of the information dashboard to a file. This parameter allows you to configure that behavior.

enable

{true|false}.

Default: false.

storage

{FILE|SQLITE}.

Default: FILE.

location

Directory location for the dashboard dump file. Replicant periodically updates the dump file.

format

{TEXT|JSON}.

Specifies the file format for the dashboard dump file.

Default: TEXT.

interval-ms

Specifies the time interval for updating the dashboard dump file in milliseconds.

Default: 1000.
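
As an illustration, a dashboard-dump-file block might look like the following. The nesting is assumed to match the other sections of the configuration file, and the location path is a hypothetical placeholder.

```yaml
dashboard-dump-file:
  enable: true
  storage: FILE
  # Hypothetical directory; replace with a path that exists on your host
  location: /path/to/dashboard-dump
  # Write the dump as JSON instead of the default TEXT
  format: JSON
  # Refresh the dump file every 5 seconds instead of the default 1000 ms
  interval-ms: 5000
```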

license-path [v21.05.04.3] #

Specifies the location of the license file.

db-connection-tracing [v21.05.04.6] #

{true|false}.

Replicant can collect diagnostics on database connection usage. This parameter allows you to enable stack trace dumps during these diagnostics.

Default: false.

metadata #

This parameter allows you to reuse metadata tables.

reuse-metadata-tables

{true|false}.

Whether to reuse metadata tables instead of creating new ones.

Default: false.

report-dir #

Controls where Replicant stores reports, such as the permission validation report.

log-pattern #

Allows you to change the log format—for example, "%d{HH:mm:ss.SSS} [%t] [replicant] %-5level %logger{35} - %msg %n". For more information, see the logback documentation.
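
In the configuration file, the pattern above could be set as shown in this sketch, which assumes log-pattern is a top-level parameter. The double quotes matter in YAML: a plain (unquoted) scalar cannot start with the % character.

```yaml
# Quoted so that YAML reads the whole pattern as a single string
log-pattern: "%d{HH:mm:ss.SSS} [%t] [replicant] %-5level %logger{35} - %msg %n"
```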

enable-console-logger #

Enables logging in the console. Normally, these logs go to the trace.log file.

cleanup-dst #

{true|false}.

Whether to clean up target metadata tables.

ntp-server #

Allows you to specify the time server for license validation.

Default: time.google.com.

Sample configuration #

You can find a sample Replicant system configuration file inside the conf/general directory of your Arcion self-hosted download.

The following shows a sample configuration:

liveness-monitor:
  enable: true
  inactive-timeout-ms: 900000
  min-free-memory-threshold-percent: 5
  liveness-check-interval-ms: 60000

schema-validation:
  enable: false

permission-validation:
  enable: false

archive-trace: true
purge-trace-before-days: 30

fetch-schema:
  skip-tables-on-failure: true

metadata:
  reuse-metadata-tables: true

Run Replicant with your configuration #

After configuring the system parameters, run Replicant with the --general option and give it the full path to your configuration file. For example:

./bin/replicant delta-snapshot conf/conn/oracle.yaml conf/conn/singlestore.yaml \
--general conf/general/general.yaml \
--extractor conf/src/oracle.yaml \
--applier conf/dst/singlestore.yaml \
--replace-existing --overwrite --id repl1 --resume