Use Snowflake Iceberg Tables

Note: This feature is only available in Arcion self-hosted CLI.

Starting with version 23.01.05.3, Arcion supports Snowflake Iceberg tables as a target for both snapshot-based and realtime replication. To use Snowflake Iceberg tables as a target, follow the instructions below.

Prerequisites

  1. Create an Amazon S3 bucket if it doesn’t exist.

  2. Create an external volume in Snowflake for your Amazon S3 bucket using the CREATE EXTERNAL VOLUME command:

    CREATE EXTERNAL VOLUME <volume_name>
        STORAGE_LOCATIONS =
        (
            (
            NAME = '<volume_name>'
            STORAGE_PROVIDER = 'S3'
            STORAGE_AWS_ROLE_ARN = '<iam_role>'
            STORAGE_BASE_URL = 's3://<bucket>[/<path>/]'
            )
        ); 
    

    Replace the following:

    • <volume_name>: the name of the new external volume
    • <iam_role>: the Amazon Resource Name (ARN) of the IAM role that grants access to your S3 bucket
    • <bucket>: the name of your Amazon S3 bucket
    • <path>: an optional path that provides granular control over objects in the bucket

For more information on granting Snowflake access to your Amazon S3 bucket, see Accessing Amazon S3 Using External Volumes.
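If you create external volumes for several buckets, you can generate the statement above from a small template. The following is a minimal sketch and is not part of Arcion or Snowflake tooling. The function name `render_create_external_volume` and all sample values are hypothetical. It only builds the SQL text under the placeholder rules listed above, and leaves execution to whichever Snowflake client you use.

```python
import re
from typing import Optional

# Unquoted Snowflake identifiers: a letter or underscore first, then
# letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def render_create_external_volume(
    volume_name: str,
    iam_role_arn: str,
    bucket: str,
    path: Optional[str] = None,
) -> str:
    """Return the CREATE EXTERNAL VOLUME statement as a string.

    Hypothetical helper for illustration only: it performs basic input
    checks and never connects to Snowflake.
    """
    if not _IDENTIFIER.fullmatch(volume_name):
        raise ValueError(f"not a simple identifier: {volume_name!r}")
    for value in (iam_role_arn, bucket, path or ""):
        # Reject single quotes so a value cannot end the SQL string literal.
        if "'" in value:
            raise ValueError(f"unexpected quote in value: {value!r}")

    # Mirrors the documented form s3://<bucket>[/<path>/].
    base_url = f"s3://{bucket}"
    if path:
        base_url += "/" + path.strip("/") + "/"

    return "\n".join([
        f"CREATE EXTERNAL VOLUME {volume_name}",
        "    STORAGE_LOCATIONS =",
        "    (",
        "        (",
        f"        NAME = '{volume_name}'",
        "        STORAGE_PROVIDER = 'S3'",
        f"        STORAGE_AWS_ROLE_ARN = '{iam_role_arn}'",
        f"        STORAGE_BASE_URL = '{base_url}'",
        "        )",
        "    );",
    ])


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    print(render_create_external_volume(
        "my_iceberg_volume",
        "arn:aws:iam::123456789012:role/my-snowflake-role",
        "my-bucket",
        "iceberg/data",
    ))
```

Review the generated text before running it in Snowflake. The checks here are deliberately minimal and do not validate that the role or bucket exists.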

Specify Iceberg as table type in Applier configuration file

In your Applier configuration file, set the table-type property to ICEBERG under per-table-config. The following sample Applier configuration shows the setting:

snapshot:
  threads: 8

  batch-size-rows: 600_000
  txn-size-rows: 600_000
  per-table-config:
  - catalog: "CATALOG"
    schema: "SCHEMA"
    tables:
      TABLE_NAME:
        table-type: ICEBERG

  bulk-load:
    enable: true
    type: FILE
    save-file-on-error: true

Attention: In realtime replication, Replicant first creates the destination tables and takes a one-time data snapshot to transfer all existing data from the source. During this “snapshot phase”, Replicant needs to know in advance whether you’re using Iceberg tables. For this reason, always specify your per-table-config parameters, including the value of table-type, in the snapshot section of the Applier configuration file, even when you run realtime replication. For more information about how the different Replicant modes work, see Running Replicant.
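Because table-type must sit under the snapshot section, a quick check before starting Replicant can catch a misplaced setting. The following is a minimal sketch, not an Arcion utility. The function name `iceberg_tables_in_snapshot` is hypothetical, and it assumes the Applier configuration file has already been parsed into a Python dictionary (for example with PyYAML's `yaml.safe_load`). It also assumes the value is written exactly as `ICEBERG`, as in the sample above.

```python
from typing import Any, Dict, List, Optional, Tuple

TableRef = Tuple[Optional[str], Optional[str], str]


def iceberg_tables_in_snapshot(config: Dict[str, Any]) -> List[TableRef]:
    """List (catalog, schema, table) entries marked ICEBERG under snapshot.

    Hypothetical helper: only snapshot.per-table-config is inspected,
    because that is where the setting must be specified.
    """
    snapshot = config.get("snapshot") or {}
    found: List[TableRef] = []
    for entry in snapshot.get("per-table-config") or []:
        tables = entry.get("tables") or {}
        for table_name, table_cfg in tables.items():
            # A table may have no overrides, so tolerate a None value.
            if (table_cfg or {}).get("table-type") == "ICEBERG":
                found.append(
                    (entry.get("catalog"), entry.get("schema"), table_name)
                )
    return found


if __name__ == "__main__":
    # Dictionary form of the sample Applier configuration.
    sample = {
        "snapshot": {
            "threads": 8,
            "per-table-config": [
                {
                    "catalog": "CATALOG",
                    "schema": "SCHEMA",
                    "tables": {"TABLE_NAME": {"table-type": "ICEBERG"}},
                }
            ],
        }
    }
    tables = iceberg_tables_in_snapshot(sample)
    if not tables:
        raise SystemExit("No Iceberg tables declared under snapshot")
    print(tables)
```

An empty result means no table is declared as Iceberg in the snapshot section, which is worth resolving before you start a realtime run.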