Filter Configuration

You can instruct Replicant which data collections, tables, or files to replicate to best suit your replication strategy.

The filter configuration file #

The filter configuration file contains a set of filter rules. Replicant follows these rules while carrying out replication. You can find sample filter configuration files for different source databases inside the filter directory of your Arcion self-hosted download.

Use the following configuration parameters to lay out your filters:

allow #

Specifies which databases, collections, or documents to replicate. This is the entry point of the filter configuration file. The following parameters are available under allow:

catalog
The source catalog which needs to be replicated. Note that if a source system supports the concept of catalogs or databases, then you need to specify this configuration and its value should be the same as the database configuration value specified in the source system’s connection configuration file.
schema
The source database schema that needs to be replicated. Each schema must have a separate entry.
types
The data types to replicate from the source catalog, enclosed in square brackets, for example [TABLE]. You can specify multiple data types.
allow
The entry point for defining which collections or tables from catalog get replicated. It has the following parameters available:
TABLE_NAME
Specify the collection or table names that should be replicated from catalog. Note that each collection within the database must be a separate entry.
allow
A list of columns in the table TABLE_NAME which should be replicated. If you don't specify anything, all columns are replicated.
conditions
A predicate for filtering the data while extracting from the source. If the source system is an SQL system, you can specify the exact SQL predicate that Replicant should use while extracting data. Note that the same predicate is executed on both the source and target systems to achieve end-to-end filtering of data during replication.
src-conditions
Sometimes the source and target systems support different query languages and different mechanisms for specifying predicates. For example, an Oracle source supports SQL predicates, while a MongoDB target supports JSON predicates. In that case, you must specify the same filtering condition in both languages: in src-conditions for the source system and in dst-conditions for the target system.
dst-conditions
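The target-side counterpart of src-conditions: the same filtering condition, written in the target system's predicate language.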
allow-update-any [v20.05.12.3]
This option is relevant for realtime (CDC-based) replication. It contains a list of columns. Replicant publishes update operations on this table only if any of the columns you specify here have been modified. Replicant looks for modifications in the UPDATE logs it receives from the source system.
allow-update-all [v20.05.12.3]
This option is relevant for realtime (CDC-based) replication. It contains a list of columns. Replicant publishes update operations on this table only if all of the columns you specify here have been modified. Replicant looks for modifications in the UPDATE logs it receives from the source system.
Important: We recommend that you create an index on the columns of the target table which are part of dst-conditions.
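
As a rough illustration of how the parameters above nest, a filter file might look like the following sketch. The catalog, schema, table, and column names and all predicates are hypothetical, and the exact layout (for instance, whether the top-level entries form a YAML list) is an assumption. Compare it with the sample files in the filter directory before relying on it.

```yaml
# Hypothetical filter file; every name and predicate below is a placeholder.
allow:
- catalog: "sales_db"      # assumed to match the database in the source connection file
  schema: "public"         # each schema gets its own entry
  types: [TABLE]
  allow:
    ORDERS:
      # Replicate only these columns; leave this key out to replicate all columns.
      allow: ["order_id", "status", "total"]
      # SQL predicate executed on both the source and the target.
      conditions: "order_id < 5000"
    CUSTOMERS:
      # Realtime (CDC) only: publish an update if any of these columns changed.
      allow-update-any: ["email", "phone"]
    LINEITEM:
      # Use this pair when source and target speak different predicate languages.
      src-conditions: "l_quantity > 10"
      # Illustrative JSON-style predicate; check the syntax your target expects.
      dst-conditions: '{"l_quantity": {"$gt": 10}}'
```

In this sketch, a table that lists no columns under its own allow key (such as CUSTOMERS) would have all of its columns replicated, as described for the column-level allow parameter.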

Run Replicant #

After you have a filter file ready, run Replicant with the --filter option, providing it the path to the filter file. For example:

./bin/replicant full \
conf/conn/oracle_src.yaml conf/conn/databricks.yaml \
--extractor conf/src/oracle.yaml \
--applier conf/dst/databricks.yaml \
--filter filter/oracle_filter.yaml