Replay strategies for BigQuery and Databricks targets #
Replay strategies are how Arcion implements CDC changes and applies them in realtime to the target.
Replay strategies apply to the following targets:
Replicant automatically chooses the best replay strategy to use. So you don’t always have to explicitly specify it.
When a set of upcoming operations arrives, the Applier buffers the operations and checks the following criteria:
- The upcoming operation and the buffered operation has different operation types.
- The upcoming operation depends on the buffered operation.
The Applier applies the buffered batch if the operations meet one of the preceeding criteria. The Applier applies the buffered operations using the
MERGE statement on target.
Requires FULL logging for after and before images. In this strategy, Replicant takes the following approach to operations:
- Every insert is applied as an insert.
- Every delete is applied as a delete.
- Every update or replace is broken down into insert + delete.
Replicant also introduces a special column called
REPLICATE_IO_VERSION_METADATA. The insert-delete approach helps preserve only the latest version of each row. It also enables batching even if operations depend on the same row.
INSERT_DELETE but deletes are deleted using
MERGE statement instead of
This strategy applies only to tables with valid row-identifier keys, primary keys, or unique keys.
IN_MEMORY_MERGE requires only updated columns for logging.
In this strategy, Replicant calculates the checksum of row identifier keys, primary keys, or unique keys and persists the checksum in memory. Replicant uses this checksum to resolve conflicting operations. For example, update over insert or insert over delete. These conflicting operations are meant to apply to the same row. So Replicant resolves them in memory. After resolving, the Applier applies them using the
There is always a single row-identifier key (might have multiple columns) and a single primary key (might have multiple columns) for a table. If a table has multiple unique keys, we consider only one unique key in the absence of a row-identifier key.