Mapper Configuration #
While replicating data between storages of different type, Replicant by default attempts to transfer the data as fetched from the source while maintaining its structure. But there maybe situations where you need more control over Source data mapping. That’s where a Mapper file comes in.
By using the Mapper file, you can precisely define how the data retrieved from the Source is applied to the Target/Destination.
You can specify the Mapper file when running Replicant in the folliwng way :
./bin/replicant snapshot conf/conn/oracle.yaml \
conf/conn/singlestore.yaml \
--map sqlserver_to_singlestore_map.yaml
Overview #
The Mapper file contains a map of rules where each rule applies to a single Target catalog or schema (namespace).
- For databases that support both catalog and schema, each rule applies to a single schema. The schema must be prefixed with the catalog (fully qualified).
- For each Target namespace, it’s possible to define a list of Source namespaces, the contents of which will then be mapped into the Target namespace.
When you need more control, you can define additional rules for the tables and columns. Table rules are defined in the similar fashion as namespace rules—using the tables map. In the tables map, you can specify multiple source tables for each destination tables.
Note: Target tables are defined using only their name, while Source table names need to be fully qualified with each table’s respective catalog and schema.
For each table mapping, it’s also possible to map columns based on their names.
Structure of Mapper Rules #
Following is a template of the structure of Mapper rules:
rules:
[<target_namespace>]:
source:
- [<source_namespace>]:
.
.
.
tables:
<target_table_name>:
source:
- [<src_namespace>, <src_table_name>]:
<target_col_name>: <src_col_name>
.
.
.
.
.
.
.
.
.
.
.
.
Example #
Below is a sample Mapper file for a SQL Server-to-SingleStore pipeline:
rules:
[testdb_s]:
source:
- [testdb, dbo]:
tables:
testTB1_s:
source:
[testdb, dbo, TestTB1]:
col1_s: col1
In the above example:
testdb_s
is the Target database name that you want to change to.- Under
source
parameter, the flow sequence containingtestdb
anddbo
represents Source database name and schema name (SQL Server default schema name isdbo
) respectively. testTB1_s
is the name of the table at Target that you want to change to.- The flow sequence
[testdb, dbo, TestTB1]
undersource
represents the Source database name, Source schema name, and Source table name respectively. Pay attention to case sensitivity. - The key-value pair
col1_s: col1
maps Target column name to Source column name
You can also find sample Mapper files for different pipelines in the mapper
directory of your Arcion Self-hosted download.
Dynamic trimming options #
There maybe cases where the Target database does not support the name length of tables in the Source database. In those cases, you can make use of the following options inside a mapper file to enable trimming of table and column names:
dynamic-identifier-name-trimming
[v20.12.04.5]
#
true
of false
.
When set to true
, table or column names are trimmed so that it fits the length the target supports.
identifier-mapping-table-namespace
[v20.12.04.5]
#
Represents the namespace where the mappings of table or column are stored in the target database.
catalog
: Specify catalog name if applicable.schema
: Specify schema name if applicable.
Convert letter case for database objects #
In cases where the target database requires database object names (tables or columns) to be in particular letter case, you can use the convert-case
option inside the mapper file:
convert-case
[v21.06.14.3]
#
The letter case to use for table or column names while mapping. The following values are allowed:
DEFAULT
LOWERCASE
UPPERCASE
When set to LOWERCASE
or UPPERCASE
, Replicant converts table or column names to the appropriate letter case while mapping. When set to DEFAULT
, Replicant maintains the letter case of source object names.
Note: If the table is explicitly mapped, then that mapping will override the convert-case behaviour. If a source table namedREGION
is mapped to destination tableregion_lowercase
, then this mapping will override theconvert-case
rule for this table. So, on target the table name becomesregion_lowercase
instead of uppercaseREGION
.
Delimiter in Kafka topic and Redis stream names #
Replicant supports either dot (.
), or underscore (_
) as the delimiter in Kafka topic and Redis stream names. This allows you to map your source database object names to the appropriate format in Kafka and Redis Streams.
To set the delimiter, set the object-name-concat-delimiter
parameter to one of the following values in the Mapper file:
-
DOT
-
Use dot (
.
) as the delimiter.For example, source table name
<catalog>.<schema>.<table>
is mapped to<catalog>.<schema>.<table>
. -
UNDERSCORE
-
Use underscore (
_
) as the delimiter. Underscore is the default delimiter for Kafka topic and Redis stream names.For example, source table name
<catalog>.<schema>.<table>
is mapped to<catalog>_<schema>_<table>
. - Schemas (databases)
- Tables
- Catalogs
- Schemas (databases)
- Tables
For example, notice the following Mapper sample for MySQL-to-Kafka pipeline:
rules:
[topic_prefix_r]:
source:
- [io,replicate]
[topic_prefix_s]:
source:
- [tpch_scale_0_01]
object-name-concat-delimiter: DOT
Mapper configuration in Databricks #
The Mapper file configuration in Databricks differs from other database platforms. The mapping system also varies between Legacy Databricks and Unity Catalog.
Mapping in Legacy Databricks #
Legacy Databricks provides a two-level namespace:
By default, Replicant maps catalog and schema of source database to the database (schema) of Legacy Databricks. If source supports both catalog and schema, then Replicant adds schema name as suffix in the database name of Legacy Databricks.
Sources that support catalog only #
For example, consider a MySQL table tpch.lineitem
, meaning table lineitem
in catalog tpch
. Replicant maps this table to tpch.lineitem
in Legacy Databricks.
Sources that support schema only #
For example, consider an Oracle table tpch.lineitem
, meaning table lineitem
in schema tpch
. Replicant maps this table to tpch.lineitem
in Legacy Databricks.
Sources that support both catalog and schema #
For example, consider a PostgreSQL table tpch.scale_0_01.lineitem
. Replicant maps this table to tpch_scale_0_01.lineitem
in Legacy Databricks, meaning table lineitem
in database tpch_scale_0_01
. Replicant adds schema scale_0_01
as a suffix to the database name.
Use Mapper file to change default namespace mapping in Legacy Databricks #
The precedding sections discuss the default mapping beahvior of Replicant. However, it’s possible to map source namespace to a custom namespace using the Mapper configuration file. For example:
rules:
[arcion]:
source:
- tpch
- io_blitzz
In the preceding Mapper sample, Replicant maps source catalogs tpch
and io_blitzz
to arcion
. So Replicant creates each source table under tpch
and io_blitzz
catalogs inside arcion
database.
Mapping in Unity Catalog #
Unity Catalog provides a three-level namespace:
By default, Replicant maps catalog and schema of source database to the catalog and schema of Unity Catalog respectively.
If source database doesn’t support catalog or schema, then Replicant maps to the default catalog main
or the default schema default
respectively. The following three sections describe how Replicant performs mapping for different scenarios.
Sources that support catalog only #
For example, consider a MySQL table tpch.lineitem
, meaning table lineitem
in catalog tpch
. In this case, Replicant maps tpch.lineitem
to tpch.default.lineitem
in Unity Catalog. Since MySQL doesn’t support schema, Replicant uses the default schema default
.
Sources that support schema only #
For example, consider an Oracle table tpch.lineitem
, meaning table lineitem
in schema tpch
. In this case, Replicant maps tpch.lineitem
to main.tpch.lineitem
in Unity Catalog. Since Oracle doesn’t support catalog, Replicant uses the default catalog main
.
Sources that support both catalog and schema #
For example, let’s consider a PostgreSQL table tpch.scale_0_01.lineitem
. Replicant maps this table to tpch.scale_0_01.lineitem
in Unity Catalog.
Use Mapper file to change default namespace mapping in Unity Catalog #
The precedding sections discuss the default mapping beahvior of Replicant. However, it’s possible to map source namespace to a custom namespace using the Mapper configuration file. For example:
rules:
[arcion, io]:
source:
- tpch
- io_blitzz
In the preceding Mapper sample, Replicant maps source catalogs tpch
and io_blitzz
to arcion.io
. So Replicant creates each source table under tpch
and io_blitzz
catalogs inside arcion.io
.