Destination Microsoft SQL Server #

The extracted replicant-cli will be referred to as the $REPLICANT_HOME directory in the following steps.

I. Set up connection configuration #

Specify the connection details of your SQL Server instance to Replicant with a connection configuration file. You can find a sample connection configuration file sqlserver.yaml in $REPLICANT_HOME/conf/conn directory. The following configuration parameters are available:

`type` #

The connection type representing the database. In this case, it’s SQLServer.

`host` #

The hostname of your SQL Server system.

`port` #

The port number to connect to the host.

`username` #

The username credential to access the SQL Server system.

`password` #

The password associated with username.

`max-connections` #

The maximum number of connections Replicant uses to load data into the SQL Server system.

For Arcion self-hosted only #

You can store your connection credentials in a secrets management service and tell Replicant to retrieve the credentials. For more information, see Secrets management.

The following is a sample connection configuration:

type: SQLSERVER

host: 192.168.0.166 
port: 1433

username: 'replicate'
password: 'Replicate#123'

max-connections: 30

II. Configure mapper file (optional) #

If you want to define data mapping from source to your target SQL Server, specify the mapping rules in the mapper file. For more information on how to define the mapping rules and run Replicant CLI with the mapper file, see Mapper Configuration.

III. Set up Applier configuration #

To configure replication mode according to your requirements, specify your configuration in the Applier configuration file. You can find a sample Applier configuration file sqlserver.yaml in the $REPLICANT_HOME/conf/dst directory.

Arcion Replicant supports the following Replicant modes for SQL Server as target:

snapshot
full
realtime
delta-snapshot

For more information about different Replicant modes, see Running Replicant.

You can configure replication modes by specifying the parameters under their respective sections in the configuration file. See the following sections for more information.

Configure `snapshot` mode #

The following is a sample configuration for operating in snapshot mode:

snapshot:
  threads: 32
  batch-size-rows: 10_000
  txn-size-rows: 1_000_000
  identity-column-insert: false

Additional `snapshot` parameters #

identity-insert-column

true or false.

Default: false.

Controls how Replicant handles identity columns in the following manner:

true. Replicant copies the values of identity columns from source to target.
false. Replicant doesn’t copy the values of identity columns from source to target. Instead, Replicant generates the values of identity columns in the target.

You can use this parameter both globally and in per-table-configs. For example, the following sample disables identity-insert-column globally and enables identity-insert-column for a table tableName:

snapshot:
  threads: 32
  batch-size-rows: 10_000
  txn-size-rows: 1_000_000

  identity-column-insert: false

  per-table-config:
  - catalog: catalogName
    schema: schemaName
    tables:
      tableName:
        identity-column-insert: true

Use bulk loading #

If you want to use bulk loading in snapshot mode, use the bulk-load section to specify your configuration. For example:

bulk-load:
  enable: true
  type: FILE
  bulk-file-location: \\192.168.0.199\Replicant

Important:

If you enable bulk loading, make sure to specify the bulk-file-location parameter. This is a Samba URL. Using this URL, SQL Server (running on Windows on some network) can access the files Replicant (running on the same network) generates and ingests them. If you disable bulk loading by setting enable to false, giving any value to bulk-file-location will throw parsing error.

The Samba URL must be a valid path so that the Windows machine can access the directory shared by the machine Replicant is running on.

For information on how to set up the shared directory with Samba, see Set up shared directory for bulk loading with Samba.

Replicant only supports FILE type bulk loading.

For more information about the configuration parameters for snapshot mode, see Snapshot Mode.

Configure `realtime` mode #

The following is a sample configuration for operating in realtime mode:

realtime:
  threads: 8
  txn-size-rows: 10000
  batch-size-rows: 1000
  identity-column-insert: false

Additional `realtime` parameters #

identity-insert-column

true or false.

Default: true.

Controls how Replicant handles identity columns in the following manner:

true. Replicant copies the values of identity columns from source to target.
false. Replicant doesn’t copy the values of identity columns from source to target. Instead, Replicant generates the values of identity columns in the target.

You can use this parameter both globally and in per-table-configs. For example, the following sample enables identity-insert-column globally and disables identity-insert-column for a table tableName:

realtime:
  threads: 8
  txn-size-rows: 10000
  batch-size-rows: 1000

  identity-column-insert: true

  per-table-config:
    - catalog: catalogName
      schema: schemaName
      tables:
        tableName:
          identity-column-insert: false

Important: Make sure to set identity-insert-column to true when you use identity column as primary key in realtime mode.

For more information about the configuration parameters for realtime mode, see Realtime Mode.

Delta-snapshot modes #

Arcion Replicant supports the following delta-snapshot modes:

UPDATE_INSERT

This mode works in the following manner:

Data is inserted into a temporary table.
Non-matching data is inserted from the temporary table into the original table.
Matching data is updated from the temporary table to the original table.

DELETE_INSERT

This mode works in the following manner:

Data is inserted into a temporary table.
Matching data is deleted from the original table.
Data is copied from the temporary table to the original table.

SINGLE_DELETE_INSERT

This mode works in same way as DELETE_INSERT except that it works in parallel for multiple Applier threads.

Enable recovery #

To enable recovery, you must run Replicant CLI with the --replace option. For more information, see Running Replicant.

Set up shared directory for bulk loading with Samba #

We use Samba to share a directory between the host running Replicant and the Windows VM running SQL Server. When using bulk loading, this shared directory allows the Windows VM access to the data files Replicant generates. Follow these steps for sharing directories across network without the hassle of passwords:

Install Samba with the following command:
```
samba sudo apt install samba
```

Back up your original Samba configuration file:

cp /etc/samba/smb.conf /etc/samba/smb.conf.orig

Open /etc/samba/smb.conf and replace its content with the following content:
```
[Replicant]
path = PATH_TO_SHARED_DIR
force user = USERNAME
force group = GROUPNAME
create mask = 0664
force create mode = 0664
directory mask = 0775
force directory mode = 0775
public = yes
writable = yes
```
Replace the following:
- PATH_TO_SHARED_DIR: the path to the shared directory on the network—for example, /home/replicant/arcion_work/replicant-core/data/{replication_id}/tmp_
- USERNAME: the effective username for a user accessing path
- GROUPNAME: the effective group for a user accessing path
Other machine on the same network or VM sees the path with the Replicant section name (specified in square brackets). Make sure that force user and force group both have permissions to read files from the shared directory.
Finally, restart the systemd service:
```
sudo systemtcl restart smbd.service
```

Now you can run Replicant with bulk loading enabled.

Caution: Whenever you change the replication ID, you need to make changes to smb.cnf and then restart the systemd service manually.

Destination Microsoft SQL Server #

I. Set up connection configuration #

type #

host #

port #

username #

password #

max-connections #

For Arcion self-hosted only #