* test(s3tables): expand Dremio Iceberg catalog test coverage
Restructure TestDremioIcebergCatalog into subtests and add three new
checks that go beyond a connectivity smoke test:
- ColumnProjection: SELECT id, label proves Dremio parsed the schema
served by the SeaweedFS REST catalog (the previous SELECT COUNT(*)
passed without exercising any column metadata).
- InformationSchemaColumns: verifies the table's columns are listed in
Dremio's INFORMATION_SCHEMA.COLUMNS in the expected ordinal order.
- InformationSchemaTables: verifies the table is registered in
INFORMATION_SCHEMA.TABLES.
All subtests share a single Dremio container startup, so total
runtime is unchanged.
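The ordinal-order check behind InformationSchemaColumns could look roughly like the sketch below. It is illustrative only: `columnRow` and `checkColumnOrder` are hypothetical names, not helpers from the test harness, and the rows are assumed to have already been decoded from the Dremio job results.

```go
package main

import (
	"fmt"
	"sort"
)

// columnRow is a hypothetical decoded row from INFORMATION_SCHEMA.COLUMNS,
// holding only COLUMN_NAME and ORDINAL_POSITION.
type columnRow struct {
	Name    string
	Ordinal int
}

// checkColumnOrder sorts rows by ordinal position and compares the resulting
// column names with want. It returns nil only when both match exactly.
func checkColumnOrder(rows []columnRow, want []string) error {
	sorted := append([]columnRow(nil), rows...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Ordinal < sorted[j].Ordinal })
	if len(sorted) != len(want) {
		return fmt.Errorf("got %d columns, want %d", len(sorted), len(want))
	}
	for i, name := range want {
		if sorted[i].Name != name {
			return fmt.Errorf("column %d is %q, want %q", i+1, sorted[i].Name, name)
		}
	}
	return nil
}

func main() {
	// Rows may arrive in any order; the check sorts them by ordinal first.
	rows := []columnRow{{"label", 2}, {"id", 1}}
	if err := checkColumnOrder(rows, []string{"id", "label"}); err != nil {
		fmt.Println("unexpected column order:", err)
		return
	}
	fmt.Println("columns are in the expected order")
}
```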
* test(s3tables): exercise multi-level Iceberg namespaces from Dremio
Seed a 2-level Iceberg namespace (and a table inside it) via the REST
catalog before bootstrapping Dremio, then add a MultiLevelNamespace
subtest that scans the nested table by its dot-separated reference.
This relies on isRecursiveAllowedNamespaces=true (already set in the
Dremio source config) to surface the nested levels as folders. A
regression in either the SeaweedFS namespace path encoding (#8959-style)
or Dremio's recursive-namespace discovery would surface here.
Adds two helpers to keep the existing single-level call sites unchanged:
- createIcebergNamespaceLevels: namespace creation with []string levels
- createIcebergTableInLevels: table creation with []string levels and
unit-separator (0x1F) URL encoding for the namespace path component
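For reference, the unit-separator encoding can be sketched as follows. This is a minimal illustration rather than the helper added in this change: `encodeNamespacePath` and `tablesEndpoint` are hypothetical names, and `base` is assumed to already include `/v1` and any catalog prefix.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// namespaceSeparator is the unit separator (0x1F) that joins namespace levels
// in the namespace path component.
const namespaceSeparator = "\x1f"

// encodeNamespacePath joins the levels with 0x1F and percent-encodes the
// result so it can be used as a single URL path segment.
func encodeNamespacePath(levels []string) string {
	return url.PathEscape(strings.Join(levels, namespaceSeparator))
}

// tablesEndpoint is a hypothetical helper that builds the table-creation URL
// for a (possibly multi-level) namespace. base is assumed to already contain
// "/v1" plus any catalog prefix.
func tablesEndpoint(base string, levels []string) string {
	return strings.TrimRight(base, "/") + "/namespaces/" + encodeNamespacePath(levels) + "/tables"
}

func main() {
	levels := []string{"analytics", "daily"}
	fmt.Println(encodeNamespacePath(levels))
	fmt.Println(tablesEndpoint("http://localhost:8181/v1", levels))
}
```

A single-level namespace passes through unchanged, which is why the existing single-level call sites do not need the new helpers.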
* test(s3tables): verify Dremio reads PyIceberg-written rows
The previous Dremio subtests only scanned empty tables, so they exercised
only the catalog/metadata path, not the data path. Add a
PyIceberg-based writer that materializes parquet files plus a snapshot
on a separate table before Dremio bootstraps, and two new subtests:
- ReadWrittenDataCount: SELECT COUNT(*) returns 3.
- ReadWrittenDataValues: SELECT id, label ORDER BY id returns the three
written rows with the expected (id, label) pairs.
The writer runs in a small image (Dockerfile.writer) built locally on
demand. It pip-installs pyiceberg+pyarrow once and reuses the layer
cache on subsequent runs. The CI workflow pre-pulls python:3.11-slim
to keep cold runs predictable.
The writer authenticates via the OAuth2 client_credentials flow that
SeaweedFS already exposes at /v1/oauth/tokens, mirroring the Go-side
helper used for REST-API table creation.
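A client_credentials token request against /v1/oauth/tokens can be sketched in Go as below. This is not the Go-side helper itself: `tokenForm` and `fetchToken` are hypothetical names, the client ID and secret are placeholders, and the response is assumed to be the usual OAuth2 JSON body with an `access_token` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// tokenForm builds the form body for an OAuth2 client_credentials request.
func tokenForm(clientID, clientSecret string) url.Values {
	return url.Values{
		"grant_type":    {"client_credentials"},
		"client_id":     {clientID},
		"client_secret": {clientSecret},
	}
}

// fetchToken posts the form to <catalogURL>/v1/oauth/tokens and returns the
// access_token from the JSON response (assumed response shape).
func fetchToken(client *http.Client, catalogURL, clientID, clientSecret string) (string, error) {
	endpoint := strings.TrimRight(catalogURL, "/") + "/v1/oauth/tokens"
	resp, err := client.PostForm(endpoint, tokenForm(clientID, clientSecret))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("token endpoint returned %s", resp.Status)
	}
	var body struct {
		AccessToken string `json:"access_token"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	if body.AccessToken == "" {
		return "", fmt.Errorf("token response has no access_token")
	}
	return body.AccessToken, nil
}

func main() {
	// Print the encoded form only; calling fetchToken needs a running catalog.
	fmt.Println(tokenForm("example-id", "example-secret").Encode())
}
```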
* test(s3tables): fix Dremio writer required-field schema mismatch
PyIceberg's append() compatibility check rejects an arrow column whose
nullability does not match the Iceberg field. The table schema declares
id as `required long`, but the default pyarrow int64 column is nullable,
so the writer failed with:
1: id: required long vs. 1: id: optional long
Declare an explicit pyarrow schema with nullable=False on id and
nullable=True on label to match the Iceberg side.
Dremio Iceberg Catalog Integration Test
This directory contains a Dremio integration smoke test for SeaweedFS's Iceberg REST Catalog implementation.
What It Tests
TestDremioIcebergCatalog verifies the Dremio path end to end:
- Starts a local SeaweedFS mini cluster with S3 Tables and Iceberg REST enabled.
- Creates a SeaweedFS table bucket.
- Creates an Iceberg namespace and an empty table through the SeaweedFS REST catalog, authenticating via its OAuth flow.
- Starts dremio/dremio-oss:25.2.0.
- Bootstraps a Dremio admin user and logs in.
- Creates a Dremio RESTCATALOG source that points at the SeaweedFS catalog.
- Submits Dremio SQL through /api/v3/sql, polls the job API, and reads job results.
- Runs subtests against the SeaweedFS-backed Iceberg table:
  - BasicSelect: Dremio is alive and answering SQL.
  - CountEmptyTable: catalog-to-table resolution and a scan of an empty table.
  - ColumnProjection: SELECT id, label succeeds and the response schema reports both columns. Failure here means Dremio could not parse the schema returned by the SeaweedFS catalog.
  - InformationSchemaColumns: the table's columns are exposed through Dremio's metadata layer with the expected ordinal order.
  - InformationSchemaTables: the table is registered in Dremio's INFORMATION_SCHEMA.
  - MultiLevelNamespace: a 2-level Iceberg namespace (created via the REST API) is exposed by Dremio as nested folders, and a table inside it is queryable with dot-separated identifiers.
  - ReadWrittenDataCount and ReadWrittenDataValues: a separate table is populated with three rows by a PyIceberg writer container (Dockerfile.writer + append_rows.py) before Dremio bootstraps; Dremio reads the data back and the values are verified. This exercises the actual data path, not just metadata.
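The submit-then-poll pattern used for Dremio SQL jobs can be sketched as follows. This is a simplified illustration, not the harness code: `waitForJob` is a hypothetical name, and `fetchState` stands in for a call that reads the job state from the Dremio job API. The state names COMPLETED, FAILED, and CANCELED are assumed to be the terminal states reported by that API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForJob polls fetchState until the job reaches a terminal state or the
// attempt limit is hit. COMPLETED is success; FAILED and CANCELED are errors.
func waitForJob(fetchState func() (string, error), maxAttempts int, interval time.Duration) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		state, err := fetchState()
		if err != nil {
			return "", err
		}
		switch state {
		case "COMPLETED":
			return state, nil
		case "FAILED", "CANCELED":
			return state, fmt.Errorf("job ended in state %s", state)
		}
		// Any other state is treated as still in progress.
		time.Sleep(interval)
	}
	return "", errors.New("job did not finish before the attempt limit")
}

func main() {
	// Simulated job that is still running for two polls, then completes.
	states := []string{"RUNNING", "RUNNING", "COMPLETED"}
	next := 0
	fetch := func() (string, error) {
		s := states[next]
		next++
		return s, nil
	}
	state, err := waitForJob(fetch, 10, 10*time.Millisecond)
	fmt.Println(state, err)
}
```

Passing the state lookup in as a function keeps the polling logic independent of HTTP, so it can be exercised without a running Dremio container.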
The PyIceberg writer image is built on demand via Docker layer caching. The first build pulls python:3.11-slim and pip-installs PyIceberg + PyArrow (~1-2 min in CI); subsequent invocations are cheap.
Running Locally
Build or install weed, then run:
cd test/s3tables/catalog_dremio
go test -v -timeout 20m .
The test requires Docker. The GitHub Actions job runs on ubuntu-22.04 and executes the test for pull requests.
Configuration
The test uses these fixed credentials for the local SeaweedFS IAM config:
- S3 access key: AKIAIOSFODNN7EXAMPLE
- S3 secret key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- Region: us-west-2
- Warehouse bucket: iceberg-tables
The Dremio source is configured via POST /api/v3/catalog; it is not configured in dremio.conf.
The Dremio container starts with the plugins.restcatalog.enabled support key enabled, which is required for the Iceberg REST Catalog source in Dremio OSS 25.2.
Troubleshooting
- Ensure Docker is running: docker version
- Ensure weed is built or available on PATH.
- Check host-gateway routing if Dremio cannot reach SeaweedFS: docker run --add-host host.docker.internal:host-gateway --rm alpine getent hosts host.docker.internal
- Check Dremio logs from the failed test output; the harness prints the Dremio container tail on Dremio startup, source setup, or job failures.