Source Google BigQuery
The replicant-cli directory will be referred to as the
$REPLICANT_HOME directory in the following steps.
I. Obtain the JDBC Driver for Google BigQuery
Replicant requires the JDBC driver for Google BigQuery as a dependency. To obtain the appropriate driver, follow the steps below:
- Go to the JDBC drivers for BigQuery page.
- From there, download the latest JDBC 4.2-compatible driver ZIP.
- From the downloaded ZIP, locate and extract the
- Put the
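To locate the driver JAR inside the downloaded archive without unpacking everything, a short script can list the `.jar` entries it contains. This is a minimal sketch using Python's standard `zipfile` module; the function name `list_jars` is hypothetical, and the actual JAR name depends on the driver release you downloaded.

```python
import zipfile


def list_jars(zip_path):
    """Return the names of all .jar entries in the given ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare case-insensitively so entries such as "Driver.JAR" are included.
        return [name for name in archive.namelist() if name.lower().endswith(".jar")]
```

Calling `list_jars("<path_to_downloaded_driver_zip>")` returns the archive-relative paths of the JAR files, which you can then extract with your usual ZIP tool.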
II. Set up Connection Configuration
From $REPLICANT_HOME, navigate to the sample connection configuration file:
If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.
Otherwise, you can specify credentials such as usernames and passwords in plain text, as in the sample below:
type: BIGQUERY
host: https://www.googleapis.com/bigquery/v2
port: 443
project-id: <bigquery_projectID>
auth-type: 0
o-auth-service-acc-email: <your_service_account@your_project.iam.gserviceaccount.com>
o-auth-pvt-key-path: <path_to_oauth_private_key>
location: US
timeout: 500
username: "<your_username>"
password: "<your_password>"
max-connections: 20
max-retries: 10
retry-wait-duration-ms: 1000
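Because the sample ships with `<...>` placeholders, it can be worth checking that none are left before starting Replicant. The following is a minimal sketch, not part of Replicant itself: the helper `find_unfilled` is hypothetical, and it assumes a flat file with one `key: value` pair per line, as in the sample above.

```python
import re

# Matches values that still contain a template placeholder such as <your_username>.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_unfilled(config_text):
    """Return the keys whose values still contain a <placeholder>."""
    unfilled = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and YAML comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Split at the first colon only, so URLs in values stay intact.
        key, sep, value = stripped.partition(":")
        if sep and PLACEHOLDER.search(value):
            unfilled.append(key.strip())
    return unfilled


sample = """\
type: BIGQUERY
project-id: <bigquery_projectID>
location: US
username: "<your_username>"
password: "s3cret"
"""

print(find_unfilled(sample))  # ['project-id', 'username']
```

The check is purely textual, so it does not validate that the values themselves are correct; it only flags entries that were never edited.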
III. Set up Extractor Configuration
From $REPLICANT_HOME, navigate to the Extractor configuration file:
Arcion currently supports only snapshot mode for BigQuery as a source. Make the necessary changes in the
snapshot section of the configuration file, as follows:
snapshot:
  threads: 32
  fetch-size-rows: 10_000
  min-job-size-rows: 1_000_000
  # max-jobs-per-chunk: 32
  per-table-config:
  - schema: tpch
    tables:
      partsupp:
        split-key: ps_partkey
      supplier:
        split-key: s_suppkey
      orders:
        split-key: o_orderkey
      nation:
        split-key: n_regionkey
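When many tables need a split key, writing the per-table-config entries by hand gets repetitive. The sketch below generates that fragment from a table-to-column mapping. The helper `render_per_table_config` is hypothetical, and the nesting it emits is an assumption inferred from the sample above, so compare the output with the Extractor Reference before using it.

```python
def render_per_table_config(schema, split_keys, prefix="  "):
    """Render a per-table-config YAML fragment for one schema.

    split_keys maps each table name to the column used as its split-key.
    prefix is the indentation of the fragment inside the snapshot section.
    """
    lines = [
        f"{prefix}per-table-config:",
        f"{prefix}- schema: {schema}",
        f"{prefix}  tables:",
    ]
    for table, column in split_keys.items():
        lines.append(f"{prefix}    {table}:")
        lines.append(f"{prefix}      split-key: {column}")
    return "\n".join(lines)


print(render_per_table_config("tpch", {
    "partsupp": "ps_partkey",
    "supplier": "s_suppkey",
    "orders": "o_orderkey",
    "nation": "n_regionkey",
}))
```

The generated text can be pasted under the snapshot section; tables keep the order in which they appear in the mapping.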
For a detailed explanation of configuration parameters in the Extractor file, read Extractor Reference.