Filter Configuration
You can tell Replicant which collections, tables, or files to replicate so that replication suits your strategy.
The filter configuration file
The filter configuration file contains a set of filter rules. Replicant follows these rules while carrying out replication. You can filter tables, views, and queries.
Sample filter configuration files for different source databases are available in the `filter` directory of your Arcion self-hosted download.
You can use the following configuration parameters to lay out your filters:
allow
Defines which databases, collections, or documents to replicate. `allow` marks the entry point of the filter configuration file. You can specify the following parameters under `allow`:
- `catalog`: The source catalog to replicate. If the source system supports catalogs or databases, you must specify this parameter, and its value must match the database value in the source system's connection configuration file.
- `schema`: The source database schema to replicate. Each schema must have a separate entry.
- `types`: The data types to replicate from `catalog`, enclosed in square brackets. Arcion Replicant supports `TABLE`, `VIEW`, and `QUERY`. You can specify multiple types, for example `[TABLE, QUERY]`.
- `allow`: The entry point for defining which collections or tables from `catalog` get replicated. It has the following parameter:
  - `TABLE_NAME`: The name of a collection or table to replicate from `catalog`. Each collection within the database must be a separate entry. Under each `TABLE_NAME`, you can specify the following parameters:
    - `allow`: A list of columns in the table `TABLE_NAME` to replicate. If you don't specify any columns, Replicant replicates all columns.
    - `conditions`: A predicate that filters data during extraction from the source. If the source system supports SQL, you can specify the exact SQL predicate that Replicant uses when extracting data. Replicant executes the same predicate on both the source and target systems to filter data end to end during replication.
    - `src-conditions`: Source and target systems sometimes use different query languages and different ways of specifying predicates, for example, an Oracle source that uses SQL predicates and a MongoDB target that uses JSON predicates. In that case, you must specify the same filtering condition in both languages: in `src-conditions` for the source and in `dst-conditions` for the target.
    - `dst-conditions`: The target-side counterpart of `src-conditions`, containing the filtering condition written in the target system's predicate language.
    - `allow-update-any` [v20.05.12.3]: Applies to realtime (CDC-based) replication. Contains a list of columns. Replicant publishes update operations on this table only if any of the columns you specify here have been modified. Replicant looks for modifications in the UPDATE logs it receives from the source system.
    - `allow-update-all` [v20.05.12.3]: Applies to realtime (CDC-based) replication. Contains a list of columns. Replicant publishes update operations on this table only if all of the columns you specify here have been modified. Replicant looks for modifications in the UPDATE logs it receives from the source system.
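To see how the parameters nest, here is a minimal sketch of a filter file. It assumes a source that supports catalogs, with a catalog named `tpch`, a schema named `public`, and tables `ORDERS` and `NATION`; all of these names, the columns, and the predicate are hypothetical placeholders. The top-level `allow` list selects the catalog, schema, and types, and the nested `allow` map holds one entry per table.

```yaml
# Hypothetical filter file: catalog, schema, table, and column names are placeholders.
allow:
- catalog: "tpch"        # must match the database value in the source connection file
  schema: "public"
  types: [TABLE, VIEW]

  # Nested allow: one entry per table (TABLE_NAME) to replicate
  allow:
    ORDERS:
      # Replicate only these columns; omit this list to replicate all columns
      allow: ["O_ORDERKEY", "O_CUSTKEY", "O_ORDERSTATUS", "O_TOTALPRICE"]
      # SQL predicate that Replicant applies on both source and target
      conditions: "O_ORDERKEY < 5000"
      # CDC only: publish an UPDATE if any of these columns changed
      allow-update-any: ["O_ORDERSTATUS", "O_TOTALPRICE"]

    NATION:
      # No column list, so all columns are replicated
      # CDC only: publish an UPDATE only if all of these columns changed
      allow-update-all: ["N_NAME", "N_REGIONKEY"]
```

With this sketch, Replicant would replicate only the four listed `ORDERS` columns, only for rows where `O_ORDERKEY < 5000`, and all columns of `NATION`. During CDC, an update to `ORDERS` is published if either `O_ORDERSTATUS` or `O_TOTALPRICE` changed, while an update to `NATION` is published only if both `N_NAME` and `N_REGIONKEY` changed.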
Important: We recommend that you create an index on the target table columns when those columns are part of `dst-conditions`.
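When the source and target use different predicate languages, a table entry uses `src-conditions` and `dst-conditions` to express the same condition in each language. The following sketch assumes an Oracle source replicating to a MongoDB target; the schema, table, and column names are hypothetical, and the JSON form of the `dst-conditions` value is an assumption, so compare it against the sample files in the `filter` directory before use. Following the recommendation above, you would also index `O_ORDERSTATUS` on the target.

```yaml
# Hypothetical Oracle -> MongoDB filter; names and predicate syntax are placeholders.
allow:
- schema: "TPCH"
  types: [TABLE]

  allow:
    ORDERS:
      allow: ["O_ORDERKEY", "O_ORDERSTATUS", "O_TOTALPRICE"]
      # Source side: SQL predicate evaluated against Oracle
      src-conditions: "O_ORDERSTATUS = 'F'"
      # Target side: the same condition as a JSON predicate for MongoDB (assumed format)
      dst-conditions: '{"O_ORDERSTATUS": "F"}'
```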
Run Replicant
After your filter file is ready, run Replicant with the `--filter` option and pass it the path to the filter file. For example:
```shell
./bin/replicant full \
  conf/conn/oracle_src.yaml conf/conn/databricks.yaml \
  --extractor conf/src/oracle.yaml \
  --applier conf/dst/databricks.yaml \
  --filter filter/oracle_filter.yaml
```
Filter queries
If you replicate source queries, you must allowlist them in the filter file. To do so, include the special tag `QUERY` in your filter `types` list. The `QUERY` tag instructs Replicant to allow all queries under that catalog or schema. For example:
```yaml
allow:
- schema: "tpch"
  types: [QUERY]
```
If the filter `types` list contains any type besides `QUERY`, you must explicitly list the logical names under the `allow` field. You define these logical names in the `src-queries` configuration file as `MACRO_NAME` and `QUERY_NAME`. For example, the following sample lists the `ng_test_tbd_sql` query and the tables under the `allow` field:
```yaml
allow:
- schema: "tpch"
  types: [TABLE, QUERY]
  allow:
    nation:
    region:
    ng_test_tbd_sql:
    supplier:
```