Google BigQuery

Source Google BigQuery

The following steps refer to the extracted replicant-cli directory as $REPLICANT_HOME.

I. Obtain the JDBC Driver for Google BigQuery

Replicant requires the JDBC driver for Google BigQuery as a dependency. To obtain the appropriate driver, follow the steps below:

II. Set up Connection Configuration

  1. From $REPLICANT_HOME, open the sample connection configuration file:

    vi conf/conn/bigquery_src.yaml
    
  2. If you store your connection credentials in AWS Secrets Manager, you can tell Replicant to retrieve them. For more information, see Retrieve credentials from AWS Secrets Manager.

    Otherwise, specify your credentials, such as username and password, in plain text, as in the sample below:

    type: BIGQUERY
    
    host: https://www.googleapis.com/bigquery/v2
    port: 443
    project-id: <bigquery_projectID>
    auth-type: 0
    o-auth-service-acc-email: <your_service_account@your_project.iam.gserviceaccount.com>
    o-auth-pvt-key-path: <path_to_oauth_private_key>
    location: US
    timeout: 500
    
    username: "<your_username>"
    password: "<your_password>"
    
    max-connections: 20
    
    max-retries: 10
    retry-wait-duration-ms: 1000
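
Because the sample file ships with placeholders such as `<bigquery_projectID>`, it can help to check the edited file before starting Replicant. The following is a minimal sketch, not part of Replicant: the function names `parse_flat_yaml` and `check_connection_config` are hypothetical, the list of expected keys is an assumption taken from the sample above, and the parser only handles flat `key: value` lines rather than general YAML. The email address and key path in the demo are illustrative values.

```python
import re

# Keys copied from the sample connection file above. Treating all of them
# as required is an assumption made for this sketch.
EXPECTED_KEYS = [
    "type", "host", "port", "project-id", "auth-type",
    "o-auth-service-acc-email", "o-auth-pvt-key-path",
]


def parse_flat_yaml(text):
    """Parse top-level `key: value` lines. This is not a general YAML parser."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"^([A-Za-z0-9_-]+):\s*(.*)$", stripped)
        if match:
            # Drop surrounding whitespace and double quotes from the value.
            settings[match.group(1)] = match.group(2).strip().strip('"')
    return settings


def check_connection_config(text):
    """Return a list of human-readable problems found in the config text."""
    settings = parse_flat_yaml(text)
    problems = []
    for key in EXPECTED_KEYS:
        if settings.get(key, "") == "":
            problems.append(f"missing value for '{key}'")
    for key, value in settings.items():
        # Values of the form <...> are placeholders left over from the sample.
        if re.fullmatch(r"<.*>", value):
            problems.append(f"placeholder left in '{key}'")
    return problems


if __name__ == "__main__":
    # Illustrative content; in practice, read conf/conn/bigquery_src.yaml.
    sample = """\
type: BIGQUERY
host: https://www.googleapis.com/bigquery/v2
port: 443
project-id: <bigquery_projectID>
auth-type: 0
o-auth-service-acc-email: svc@example-project.iam.gserviceaccount.com
o-auth-pvt-key-path: /path/to/key-file
"""
    for problem in check_connection_config(sample):
        print(problem)
```

Run against the demo text, the script reports the one placeholder that was left in `project-id`. An empty result means only that no placeholders or missing keys were found. It does not confirm that the credentials are valid.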
    

III. Set up Extractor Configuration

  1. From $REPLICANT_HOME, open the Extractor configuration file:

    vi conf/src/bigquery.yaml
    
  2. Arcion currently supports only snapshot mode for BigQuery as a source, so make the necessary changes in the snapshot section of the configuration file, as in the sample below:

    snapshot:
      threads: 32
      fetch-size-rows: 10_000
    
      min-job-size-rows: 1_000_000
      # max-jobs-per-chunk: 32
      per-table-config:
        - schema: tpch
          tables:
            partsupp:
              split-key: ps_partkey
            supplier:
              split-key: s_suppkey
            orders:
              split-key: o_orderkey
            nation:
              split-key: n_regionkey
    

For a detailed explanation of the configuration parameters in the Extractor file, see the Extractor Reference.
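
To build intuition for how `split-key` and `min-job-size-rows` in the sample above might interact, the sketch below divides a table's key range into contiguous ranges, one per parallel job. This is an illustration only: the text above does not describe Replicant's actual splitting algorithm, and the sketch assumes a numeric, roughly evenly distributed split key. The function `plan_jobs` and its `max_jobs` cap are hypothetical names introduced here.

```python
def plan_jobs(row_count, key_min, key_max, min_job_size_rows, max_jobs):
    """Split the inclusive key range [key_min, key_max] into per-job ranges.

    Assumes key_min <= key_max and that rows are spread evenly over the keys.
    """
    span = key_max - key_min + 1
    # Use as many jobs as the cap allows while keeping each job at or above
    # the minimum size, and never more jobs than there are distinct keys.
    jobs = max(1, min(max_jobs, row_count // min_job_size_rows, span))

    ranges = []
    start = key_min
    for i in range(jobs):
        # Spread the remainder of the division over the first few ranges.
        width = span // jobs + (1 if i < span % jobs else 0)
        end = start + width - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Illustrative numbers: 6 million rows with keys 1..1,500,000.
    for low, high in plan_jobs(6_000_000, 1, 1_500_000, 1_000_000, 32):
        print(f"WHERE split_key BETWEEN {low} AND {high}")
```

With these illustrative numbers the table is planned as six jobs of 250,000 keys each, while a table smaller than `min_job_size_rows` stays a single job. Under this reading, a split key with few distinct values limits how many jobs a table can be divided into.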