Source Greenplum #
The extracted replicant-cli
will be referred to as the $REPLICANT_HOME
directory in the proceeding steps.
I. Set up Connection Configuration #
-
From
$REPLICANT_HOME
, navigate to the sample connection configuration file:vi conf/conn/greenplum.yaml
-
You can store your connection credentials in a secrets management service and tell Replicant to retrieve the credentials. For more information, see Secrets management.
Otherwise, you can put your credentials like usernames and passwords in plain form like the sample below:
type: GREENPLUM host: HOSTNAME port: PORT_NUMBER database: 'DATABASE_NAME' username: 'USERNAME' password: 'PASSWORD' max-connections: 30 socket-timeout-s: 60 max-retries: 10 retry-wait-duration-ms: 1000
Replace the following:
HOSTNAME
: the hostname of your Greenplum hostPORT_NUMBER
: the relevant port number of the Greenplum clusterDATABASE_NAME
: the name of the Greenplum databaseUSERNAME
: the Greenplum database role name to connect asPASSWORD
: the password associated withUSERNAME
II. Set up Extractor Configuration #
-
From
$REPLICANT_HOME
, navigate to the Extractor configuration file:vi conf/src/greenplum.yaml
-
The configuration file has two parts:
- Parameters related to snapshot mode.
- Parameters related to delta snapshot mode.
Parameters related to snapshot mode #
For snapshot mode, you can make use of the following sample:
snapshot: threads: 32 fetch-size-rows: 10_000 min-job-size-rows: 1_000_000 max-jobs-per-chunk: 32 _traceDBTasks: true lock: enable: false per-table-config: - catalog: tpch schema: public tables: lineitem: row-identifier-key: [l_orderkey, l_linenumber] split-key: l_orderkey split-hints: row-count-estimate: 15000 split-key-min-value: 1 split-key-max-value: 60_000
- In the absence of a
split-key
, we use a generated keygp_segment_id
for tables. This allows Replicant to split a table into multiple jobs, increasing parallelism. - In case of views or any other non-table object type, we don’t support extraction using generated key. Replicant honors
split-key
only if the user explicitly provides it in the Extractor configuration file.
-
If you want to enable objecting locking, you can do so by setting the
enable
field oflock
totrue
and providing the locking details in the following way:lock: enable: {true|false} scope: {TABLE|DATABASE} force: {true|false} timeout-sec: TIMEOUT_IN_SECONDS
Replace
TIMEOUT_IN_SECONDS
with the number of seconds you want the locking timeout to be—for example,5
. -
To specify extraction method for Source Greenplum, set the parameter
extraction-method
to any of the following two extraction methods:QUERY
COPY
.
-
You can also specify details for native extraction method—for example the type of compression to use. For Source Greenplum, Arcion currently supports only the
GZIP
compression type, which generates extracted files in compressed.gz
format. Notice the following sample:extraction-method: COPY native-extract-options: compression-type: "GZIP"
Important: Use
GZIP
as thecompression-type
only when you’ve set theextraction-method
toCOPY
. Otherwise, setcompression-type
toNONE
. -
You can also choose to specify the
extraction-method
andnative-extract-options
parameters in theper-table-config
section to more finely tune your table-specific requirements. For example:per-table-config: - catalog: tpch schema: public tables: lineitem: extraction-method: QUERY native-extract-options: compression-type: "NONE"
Parameters related to delta snapshot mode #
If you want to operate in delta snapshot mode, you can use the
delta-snapshot
section to specify your configuration. For example:delta-snapshot: threads: 32 fetch-size-rows: 10_000 min-job-size-rows: 1_000_000 max-jobs-per-chunk: 32 delta-snapshot-key: last_update_time delta-snapshot-interval: 10 delta-snapshot-delete-interval: 10 _traceDBTasks: true #replicate-deletes: false per-table-config: - catalog: tpch schema: public tables: part: delta-snapshot-key: last_update_time lineitem: delta-snapshot-key: last_update_time row-identifier-key: [l_orderkey, l_linenumber]
-
To specify extraction method, set the parameter
extraction-method
to any of the following two extraction methods:QUERY
COPY
.
-
You can also specify details for native extraction method—for example the type of compression to use. For Source Greenplum, Arcion currently supports only the
GZIP
compression type, which generates extracted files in compressed.gz
format. Notice the following sample:extraction-method: COPY native-extract-options: compression-type: "GZIP"
Important: Use
GZIP
as thecompression-type
only when you’ve setextraction-method
toCOPY
. Otherwise, setcompression-type
toNONE
. -
You can also choose to specify the
extraction-method
andnative-extract-options
parameters in theper-table-config
section to more finely tune your table-specific requirements. For example:per-table-config: - catalog: tpch schema: public tables: testTable: split-key: split-key-column delta-snapshot-key: delta-snapshot-key-column delta-snapshot-key-offset: '2019-10-05 13:23:45.890000' extraction-method: QUERY native-extract-options: compression-type: "NONE"
In the sample above, the
delta-snapshot-key-offset
parameter indicates the delta snapshot key to start replication from.
For a detailed explanation of configuration parameters in the Extractor file, see Extractor Reference.