BigQuery and Apache Iceberg

Starting with the dbt-bigquery 1.10 release, dbt supports materializing Iceberg tables on BigQuery via catalog integrations.

Creating Iceberg Tables

dbt supports creating Iceberg tables for two of the BigQuery materializations:

  • Table
  • Incremental

BigQuery Iceberg catalogs

BigQuery currently supports Iceberg tables through its built-in catalog, BigLake Metastore. No setup is needed to access BigLake Metastore itself. However, you will need a Cloud Storage bucket and the required BigQuery roles configured before creating an Iceberg table (see the sketch below for one way to set these up).
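
A minimal setup sketch using the gcloud CLI; the bucket name, project, and service account are illustrative assumptions, and the exact roles depend on how your dbt connection authenticates to BigQuery and Cloud Storage:

# Create the Cloud Storage bucket that will hold the Iceberg table data
gcloud storage buckets create gs://mydbtbucket --location=US

# Grant the service account that dbt connects with access to the bucket
# (the role shown is an assumption; align it with your organization's access policy)
gcloud storage buckets add-iam-policy-binding gs://mydbtbucket \
  --member="serviceAccount:dbt-runner@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"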

dbt Catalog integration configurations for BigQuery

The following fields are required to set up a catalog integration for BigLake Iceberg tables in BigQuery:

  • name: the name of the catalog integration, referenced from model configs via catalog_name
  • active_write_integration: the write integration dbt uses when building tables; must match the name of an entry under write_integrations
  • write_integrations: a list of write integration definitions, each containing the fields below
  • external_volume: the Cloud Storage bucket URI where table data is written
  • table_format: the table format, set to iceberg
  • file_format: the underlying data file format, set to parquet
  • catalog_type: the catalog type, set to biglake_metastore for BigQuery's built-in catalog

Configure catalog integration for managed Iceberg tables

  1. Create a catalogs.yml at the top level of your dbt project.

    An example:

catalogs:
  - name: my_bigquery_iceberg_catalog
    active_write_integration: biglake_metastore
    write_integrations:
      - name: biglake_metastore
        external_volume: 'gs://mydbtbucket'
        table_format: iceberg
        file_format: parquet
        catalog_type: biglake_metastore

  2. Apply the catalog configuration at either the model, folder, or project level (a folder- or project-level sketch follows the model example below):
iceberg_model.sql

{{
    config(
        materialized='table',
        catalog_name='my_bigquery_iceberg_catalog'
    )
}}

select * from {{ ref('jaffle_shop_customers') }}
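
To apply the catalog at the folder or project level instead, a minimal dbt_project.yml sketch might look like the following; the project and folder names are illustrative assumptions:

models:
  my_project:
    marts:
      +materialized: table
      +catalog_name: my_bigquery_iceberg_catalog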

  3. Execute the dbt model with dbt run -s iceberg_model.

Limitations

BigQuery does not currently support connecting to external Iceberg catalogs. For details on supported SQL operations and table management features, refer to the BigQuery documentation.

Base location

Rationale

By default, dbt manages the full storage_uri on behalf of users for ease of use. The base_location parameter specifies the location within the storage bucket where the data will be written. Without guardrails (for example, if the user forgets to provide a base location root), it's possible for BigQuery to reuse the same path across multiple tables.

This behavior could result in future technical debt because it will limit the ability to:

  • Navigate the underlying object store
  • Read Iceberg tables via an object-store integration
  • Grant schema-specific access to tables via object store
  • Use a crawler pointed at the tables within the external storage to build a new catalog with another tool

To maintain best practices, dbt enforces a base location and, by default, writes your tables under a _dbt/{SCHEMA_NAME}/{TABLE_NAME} prefix to keep the object store easier to navigate and audit.
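
For example, with the external_volume from the catalog configuration above and an illustrative jaffle_shop schema, the iceberg_model table data would be written under a path like:

gs://mydbtbucket/_dbt/jaffle_shop/iceberg_model/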
