Source MongoDB

This page refers to the directory of the extracted replicant-cli as $REPLICANT_HOME.

I. Set up connection configuration

From $REPLICANT_HOME, navigate to the connection configuration file:
```
vi conf/conn/mongodb.yaml
```
To connect to the MongoDB server with authentication, choose one of the following methods:

Connect using MongoDB connection string URI

To connect to MongoDB using a connection string URI, specify your credentials in plain text in the connection configuration file, as in the following sample:
```
type: MONGODB

url: "mongodb://localhost:27019/?w=majority"

max-connections: 30

replica-sets:
  mongors1:
    url: "mongodb://localhost:27017/?w=majority&replicaSet=mongors1"
  mongors2:
    url: "mongodb://localhost:27027/?w=majority&replicaSet=mongors2"
```
If you have multiple replica sets, specify each of them under replica-sets using the preceding format.

Replicant monitors the oplog of each replica set to carry out real-time replication. The url of each replica set must point to a host:port that belongs to that replica set, and it must contain the replicaSet=<replicaSet_name> option so that the URL identifies a replica set.
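As an illustration of this rule, the following Python sketch flags replica-set entries whose url lacks a non-empty replicaSet option. The helper missing_replica_set_option is hypothetical and not part of Replicant; it only inspects the query string and does not contact the server.

```python
from urllib.parse import urlsplit, parse_qs


def missing_replica_set_option(replica_sets: dict[str, str]) -> list[str]:
    """Return the keys whose url lacks a non-empty replicaSet=<name> option.

    Hypothetical pre-check for values in conf/conn/mongodb.yaml.
    """
    missing = []
    for key, url in replica_sets.items():
        # parse_qs maps each query option to a list of values and drops blank values.
        options = parse_qs(urlsplit(url).query)
        if not options.get("replicaSet", [""])[0]:
            missing.append(key)
    return missing


replica_sets = {
    "mongors1": "mongodb://localhost:27017/?w=majority&replicaSet=mongors1",
    "mongors2": "mongodb://localhost:27027/?w=majority",  # replicaSet omitted on purpose
}
print(missing_replica_set_option(replica_sets))  # ['mongors2']
```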

You can specify additional connection options in the url string using the MongoDB connection string syntax. For example, you can specify the number of connections, read concern options, write concern options, and so on. For more information, see Connection String Options.
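The following Python sketch appends extra options to an existing url. maxPoolSize, readConcernLevel, and wtimeoutMS are standard MongoDB connection string options; the helper with_options and the values chosen here are illustrative only, so check the Connection String Options reference for settings that suit your deployment.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_options(url: str, **options: str) -> str:
    """Return url with extra connection string options appended to its query.

    Hypothetical helper; option names must follow MongoDB's connection string syntax.
    """
    parts = urlsplit(url)
    extra = urlencode(options)  # keyword order is preserved
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


url = with_options(
    "mongodb://localhost:27017/?w=majority&replicaSet=mongors1",
    maxPoolSize="30",
    readConcernLevel="majority",
    wtimeoutMS="5000",
)
print(url)
```

The existing options stay in place and the new ones are appended, so the replicaSet option required above is preserved.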

Connect using SSL

To connect to MongoDB using SSL, specify the SSL connection parameters in the ssl section of the connection configuration file:
```
ssl:
  key-store:
    path: 'PATH_TO_KEYSTORE'
    password: 'KEYSTORE_PASSWORD'
  trust-store:
    path: 'PATH_TO_TRUST_STORE'
    password: 'TRUSTSTORE_PASSWORD'
```
Replace the following:
- PATH_TO_KEYSTORE: Path to your KeyStore file.
- KEYSTORE_PASSWORD: Your KeyStore password.
- PATH_TO_TRUST_STORE: Path to your TrustStore file.
- TRUSTSTORE_PASSWORD: Your TrustStore password.

Kerberos authentication

Arcion Replicant supports replication from Kerberized MongoDB clusters. For Kerberos authentication, you can use either of the following methods.
Using host, port, and user principal
In this method, lay out the connection configuration file as follows:
1. Specify the hostname and port number of your Kerberized MongoDB cluster using the host and port parameters, respectively. If you have replica sets, specify them under replica-sets using the same structure described in the Connect using MongoDB connection string URI section. The hostname must be a fully qualified domain name (FQDN), not an IP address.
2. Under the kerberos section of the connection configuration file, specify the following:
  - The path to the krb5.conf Kerberos configuration file (kerberosConfigFilePath)
  - The user principal (user-principal)
  The user principal name follows this format, with the @ written percent-encoded as %40:
```
<USERNAME>%40<KERBEROS_REALM>
```
The following shows a sample connection configuration:
```
type: MONGODB
host: "routerdst.replicant.io" 
port: 27017
max-connections: 3

replica-sets:
  mongors1:
    host: "shard1dst.replicant.io"
    port: 27017
  mongors2:
    host: "shard0dst.replicant.io"
    port: 27017

kerberos:
  kerberosConfigFilePath: /etc/krb5.conf
  user-principal: "replicant%40REPLICANT.IO" 
```
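Two details above are easy to get wrong: the @ in the user principal is written as %40, and the host must be an FQDN rather than an IP address. The following Python sketch covers both. encode_principal and looks_like_fqdn are hypothetical helpers, and the FQDN test is only a heuristic: it rejects IP addresses and single-label names without consulting DNS.

```python
import ipaddress
from urllib.parse import quote


def encode_principal(username: str, realm: str) -> str:
    """Build <USERNAME>%40<KERBEROS_REALM> by percent-encoding the '@'."""
    return quote(f"{username}@{realm}", safe="")


def looks_like_fqdn(host: str) -> bool:
    """Heuristic check: reject IP addresses and single-label names (no DNS lookup)."""
    try:
        ipaddress.ip_address(host)
        return False  # an IP address, which the Kerberos setup does not accept
    except ValueError:
        pass
    labels = host.rstrip(".").split(".")
    return len(labels) > 1 and all(labels)


print(encode_principal("replicant", "REPLICANT.IO"))  # replicant%40REPLICANT.IO
print(looks_like_fqdn("routerdst.replicant.io"))       # True
print(looks_like_fqdn("10.0.0.5"))                     # False
```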
Using connection string URI
In this method, you specify the necessary connection parameters inside the MongoDB connection string URI and set it as the url value in the connection configuration file. This applies to both the Kerberized MongoDB cluster URI and the individual URIs of the replica sets.

For more information on the different components of a MongoDB connection string URI, see Connection string URI components.

The connection string follows this format, where USER_PRINCIPAL uses the same percent-encoded format as in the preceding method:
```
mongodb://<USER_PRINCIPAL>@HOST[:PORT]/?authSource=$external&authMechanism=GSSAPI
```
The following shows a sample configuration using this method:
```
type: MONGODB
url: "mongodb://replicant%40REPLICANT.IO@routersrc.replicant.io:27017/?authSource=$external&authMechanism=GSSAPI"
max-connections: 3

replica-sets:
  mongors1:
    url: "mongodb://replicant%40REPLICANT.IO@shard1src.replicant.io:27017/?authSource=$external&authMechanism=GSSAPI"
  mongors2:
    url: "mongodb://replicant%40REPLICANT.IO@shard0src.replicant.io:27017/?authSource=$external&authMechanism=GSSAPI"

max-retries: 1
retry-wait-duration-ms: 1000
```
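Because the same principal and query options repeat across the router and every replica set, a small Python sketch can assemble these URIs consistently. gssapi_uri is a hypothetical helper; it writes the query string literally so that $external is not percent-encoded, and for the hosts in the preceding sample it reproduces the url values shown there.

```python
from urllib.parse import quote


def gssapi_uri(username: str, realm: str, host: str, port: int = 27017) -> str:
    """Assemble mongodb://<USER_PRINCIPAL>@HOST:PORT/?authSource=$external&authMechanism=GSSAPI."""
    principal = quote(f"{username}@{realm}", safe="")  # '@' becomes %40
    # The query is written literally so that $external keeps its '$'.
    return f"mongodb://{principal}@{host}:{port}/?authSource=$external&authMechanism=GSSAPI"


for host in ["routersrc.replicant.io", "shard1src.replicant.io", "shard0src.replicant.io"]:
    print(gssapi_uri("replicant", "REPLICANT.IO", host))
```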

II. Set up filter configuration

From $REPLICANT_HOME, navigate to the filter configuration file:
```
vi filter/mongodb_filter.yaml
```

According to your replication needs, specify the data to be replicated. Use the format of the following example:

```
allow:
- schema: "tpch"
  types: [TABLE]

  allow:
    lineitem:
      allow: ["item_one", "item_two"]

    ng_test:
      #Within ng_test, only the test_one and test_two columns will be replicated as long as they meet the condition $and: [{c1: {$gt : 1}}, {c1: {$lt : 9}}]
      allow: ["test_one", "test_two"]
      conditions: "{$and: [{c1: {$gt : 1}}, {c1: {$lt : 9}}]}"

    usertable: #All columns in the table usertable will be replicated without any predicates
```

The preceding sample consists of the following elements:
- Data of object type TABLE in the schema tpch will be replicated.
- From schema tpch, only the lineitem, ng_test, and usertable tables will be replicated.
  Note: If you do not specify any tables, all tables in the catalog will be replicated.
- Within lineitem, only the item_one and item_two columns will be replicated.
- From the ng_test table, only the test_one and test_two columns will be replicated, and only for data that meets the condition specified in conditions.
- All columns of usertable will be replicated without any predicates.
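To make the conditions semantics concrete, the following Python sketch evaluates the sample condition against a few hypothetical documents. It is a toy evaluator for a small subset of MongoDB query syntax ($and, $gt, $lt, and plain equality), written only for illustration; it is not how Replicant applies filters. Only documents with 1 < c1 < 9 pass.

```python
from typing import Any

# Operators this sketch understands; real MongoDB supports many more.
_COMPARATORS = {
    "$gt": lambda field, operand: field > operand,
    "$lt": lambda field, operand: field < operand,
}


def matches(doc: dict[str, Any], cond: dict[str, Any]) -> bool:
    """Evaluate a small subset of MongoDB query syntax against one document."""
    for key, value in cond.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in value):
                return False
        elif isinstance(value, dict):
            if key not in doc:
                return False  # a missing field fails $gt/$lt comparisons
            for op, operand in value.items():
                if op not in _COMPARATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not _COMPARATORS[op](doc[key], operand):
                    return False
        elif doc.get(key) != value:
            return False  # plain equality match
    return True


# The sample condition {$and: [{c1: {$gt : 1}}, {c1: {$lt : 9}}]} written as a dict.
condition = {"$and": [{"c1": {"$gt": 1}}, {"c1": {"$lt": 9}}]}
docs = [{"c1": 0}, {"c1": 1}, {"c1": 5}, {"c1": 9}, {"test_one": "x"}]
print([d for d in docs if matches(d, condition)])  # [{'c1': 5}]
```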

The preceding sample uses the following format. You must adhere to this format when you specify your filters.

```
allow:
- schema: SCHEMA_NAME
  types: [OBJECT_TYPE]

  allow:
    <your_table_name>:
      allow: ["COLUMN_NAME"]
      conditions: "CONDITION"

    <your_table_name>:
      allow: ["COLUMN_NAME"]
      conditions: "CONDITION"

    <your_table_name>:
      allow: ["COLUMN_NAME"]
      conditions: "CONDITION"
```

Replace the following:
- SCHEMA_NAME: name of your MongoDB schema.
- OBJECT_TYPE: object type of the data, for example, TABLE.
- <your_table_name>: name of a table to replicate.
- COLUMN_NAME: column name.
- CONDITION: the condition that must be satisfied for the specified columns to be replicated.

Using the same format, specify the databases, collections, or documents under the global-filter section to carry out distributed replication across multiple nodes. The global filter is the sum of all tables, including the tables in the local snapshot filters. For example:

```
global-filter:
  allow:
  - schema: "tpch"
    types: [TABLE]
    allow:
      nation:
      region:
      part:
      supplier:
      partsupp:
      orders:
      customer:
      lineitem:
      ng_test:
        conditions: "{$and: [{c1: {$gt : 1}}, {c1: {$lt : 9}}]}"
      usertable:
```

For a detailed explanation of configuration parameters in the Filter file, see Filter Reference.
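Read literally, the global-filter rule means that every table allowed by any local filter should also appear in the global filter. Under that interpretation (an assumption, not a documented Replicant check), the following Python sketch compares the two sets of table names. The local filters and the web_sales table are hypothetical.

```python
def tables_missing_from_global(
    global_tables: set[str], local_filters: list[set[str]]
) -> set[str]:
    """Return tables allowed by some local filter but absent from the global filter."""
    # set().union() with no arguments returns an empty set, so an empty list is fine.
    return set().union(*local_filters) - global_tables


# Tables listed under global-filter in the sample.
global_tables = {
    "nation", "region", "part", "supplier", "partsupp",
    "orders", "customer", "lineitem", "ng_test", "usertable",
}
# Hypothetical local filters for two nodes; web_sales is deliberately absent globally.
local_filters = [{"lineitem", "ng_test", "usertable"}, {"orders", "web_sales"}]
print(tables_missing_from_global(global_tables, local_filters))  # {'web_sales'}
```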

III. Set up Extractor configuration

From $REPLICANT_HOME, navigate to the Extractor configuration file:
```
vi conf/src/mongodb.yaml
```

The configuration file has two parts:

- Parameters related to snapshot mode.
- Parameters related to realtime mode.

For snapshot mode, make the necessary changes as follows in the snapshot section of the configuration file:

```
snapshot:
  threads: 16
  fetch-size-rows: 5000

  min-job-size-rows: 1_000
  max-jobs-per-chunk: 32

  split-key: _id
  _traceDBTasks: true
#  fetch-user-roles: true
#  fetch-system-tables: true
  normalize:
    enable: true
    de-duplicate: false
#     extract-upto-depth: 2
  per-table-config:
  - schema: tpch
    tables:
      t1:
        num-jobs: 1
      usertable1:
        split-key: field1
      lineitem:
        normalize:
          de-duplicate: false
          extract-upto-depth: 3
#       extraction-priority: 2  #Higher value is higher priority. Both positive and negative values are allowed. Default priority is 0 if unspecified.
```
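The commented-out extraction-priority line states that a higher value means higher priority, that negative values are allowed, and that the default is 0. Assuming tables are simply extracted in descending priority order, the following Python sketch shows the resulting order. The orders entry is hypothetical, and the tie-breaking (listed order, via Python's stable sort) is an assumption rather than documented Replicant behavior.

```python
def extraction_order(tables: list[str], priorities: dict[str, int]) -> list[str]:
    """Order tables by extraction-priority, highest first; unspecified means 0.

    sorted() is stable, so tables with equal priority keep their listed order.
    """
    return sorted(tables, key=lambda t: -priorities.get(t, 0))


tables = ["t1", "usertable1", "lineitem", "orders"]
print(extraction_order(tables, {"lineitem": 2, "orders": -1}))
# ['lineitem', 't1', 'usertable1', 'orders']
```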

To operate in realtime mode, specify your configuration in the realtime section. For example:

```
realtime:
  threads: 4
  fetch-size-rows: 10000
  fetch-duration-per-extractor-slot-s: 3
  _traceDBTasks: true
#   heartbeat:
#     enable: false
#     schema: io_replicate
#     interval-ms: 10_000

  replicate-ddl: true      #use for replicaSet only, not for sharded cluster

#   start-position:
#     increment: 1
#     timestamp-ms: 1598619575000

  normalize:
    enable: true
#     extract-upto-depth: 2
```
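The commented-out start-position block takes an increment and a timestamp-ms. The parameter name suggests Unix epoch milliseconds; MongoDB oplog timestamps pair a seconds value with an increment counter that orders operations within the same second, but how Replicant maps these two fields onto an oplog position is an assumption here. The following Python sketch converts between a UTC datetime and epoch milliseconds, showing that the sample value 1598619575000 corresponds to 2020-08-28 12:59:35 UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_ms(moment: datetime) -> int:
    """Milliseconds since the Unix epoch, computed exactly with timedelta division."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_timestamp_ms(ms: int) -> datetime:
    """Inverse of to_timestamp_ms, returned as a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


print(to_timestamp_ms(datetime(2020, 8, 28, 12, 59, 35, tzinfo=timezone.utc)))  # 1598619575000
print(from_timestamp_ms(1598619575000).isoformat())  # 2020-08-28T12:59:35+00:00
```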

For a detailed explanation of configuration parameters in the Extractor file, see Extractor Reference.
