Destination Apache Spark
The extracted replicant-cli directory is referred to as $REPLICANT_HOME in the following steps.
I. Set Up Connection Configuration
- From $REPLICANT_HOME, navigate to the sample connection configuration file:

    vi conf/conn/spark.yaml
- You can store your connection credentials in a secrets management service and tell Replicant to retrieve them. For more information, see Secrets management.

  Otherwise, you can specify credentials such as usernames and passwords in plain text, as in the sample below:
    type: SPARK

    host: local #Replace local with your Apache Spark host

    storage-location: "/tmp/parquet"
    storage-type: PARQUET

    max-retries: 10 #Enter the maximum number of times Replicant can re-attempt a failed operation
    retry-wait-duration-ms: 1000 #Enter the time Replicant should wait between each re-try of a failed operation
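To make the meaning of max-retries and retry-wait-duration-ms concrete, here is a minimal Python sketch of a fixed-wait retry loop. It only illustrates the general pattern those two settings describe. The function name run_with_retries and its parameters are hypothetical, and nothing here is taken from Replicant's actual implementation.

```python
import time


def run_with_retries(operation, max_retries=10, retry_wait_duration_ms=1000,
                     sleep=time.sleep):
    """Call operation(); after a failure, wait and retry up to max_retries times.

    Hypothetical helper for illustration only. The defaults mirror the
    sample values of max-retries and retry-wait-duration-ms.
    """
    retries_used = 0
    while True:
        try:
            return operation()
        except Exception:
            if retries_used >= max_retries:
                # All retries are exhausted: surface the last failure.
                raise
            retries_used += 1
            # Fixed wait between attempts, converted from milliseconds.
            sleep(retry_wait_duration_ms / 1000.0)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_write():
        # Fails on the first two calls, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("transient failure")
        return "written"

    print(run_with_retries(flaky_write, max_retries=10, retry_wait_duration_ms=10))
```

Under this reading, max-retries: 10 allows up to 11 attempts in total (the first attempt plus 10 retries). Whether Replicant counts the first attempt separately is an assumption of this sketch.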
II. Set Up Applier Configuration
- From $REPLICANT_HOME, navigate to the applier configuration file:

    vi conf/dst/spark.yaml
- Make the necessary changes as follows:
    snapshot:
      threads: 16 #Maximum number of threads Replicant should use for writing to the target

      # batch-size-rows: 5_000
      txn-size-rows: 1_000_000

      #If bulk-load is used, Replicant will use the native bulk-loading capabilities of the target database
      bulk-load:
        enable: true
        type: FILE # PIPE, FILE
        serialize: true #Set to true to apply the generated files serially, false to apply them in parallel
For a detailed explanation of configuration parameters in the applier file, read Applier Reference.
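As a quick sanity check before running Replicant, the snapshot settings shown in the sample can be verified with a few lines of Python. This is a rough sketch: the function check_snapshot_settings and the rules it applies are hypothetical, derived only from the comments in the sample, and do not reflect Replicant's own validation. The configuration is written as a plain dictionary to keep the example dependency-free. Loading it from conf/dst/spark.yaml would require a YAML parser.

```python
# Allowed values taken from the "# PIPE, FILE" comment in the sample.
ALLOWED_BULK_LOAD_TYPES = {"PIPE", "FILE"}


def check_snapshot_settings(applier_config):
    """Return a list of problems in the 'snapshot' section (empty if none).

    Hypothetical helper for illustration only.
    """
    snapshot = applier_config.get("snapshot")
    if not isinstance(snapshot, dict):
        return ["missing 'snapshot' section"]

    problems = []

    threads = snapshot.get("threads")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(threads, bool) or not isinstance(threads, int) or threads < 1:
        problems.append("snapshot.threads must be a positive integer")

    bulk_load = snapshot.get("bulk-load", {})
    if bulk_load.get("enable"):
        if bulk_load.get("type") not in ALLOWED_BULK_LOAD_TYPES:
            problems.append("snapshot.bulk-load.type must be PIPE or FILE")
        if not isinstance(bulk_load.get("serialize", False), bool):
            problems.append("snapshot.bulk-load.serialize must be true or false")

    return problems


if __name__ == "__main__":
    # Mirrors the sample applier configuration.
    sample = {
        "snapshot": {
            "threads": 16,
            "txn-size-rows": 1_000_000,
            "bulk-load": {"enable": True, "type": "FILE", "serialize": True},
        }
    }
    print(check_snapshot_settings(sample))
```

An empty list means the sketch found nothing to flag. It does not guarantee that Replicant will accept the file.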