mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-06-13 23:36:45 +03:00

Chris Lu fc75f16c30 test(s3tables): expand Dremio Iceberg catalog test coverage (#9303)

* test(s3tables): expand Dremio Iceberg catalog test coverage

Restructure TestDremioIcebergCatalog into subtests and add three new
checks that go beyond a connectivity smoke test:

- ColumnProjection: SELECT id, label proves Dremio parsed the schema
  served by the SeaweedFS REST catalog (the previous SELECT COUNT(*)
  passed without exercising any column metadata).
- InformationSchemaColumns: verifies the table's columns are listed in
  Dremio's INFORMATION_SCHEMA.COLUMNS in the expected ordinal order.
- InformationSchemaTables: verifies the table is registered in
  INFORMATION_SCHEMA.TABLES.

All subtests share a single Dremio container startup, so total
runtime is unchanged.
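
As a rough illustration of the ordinal-order check, the sketch below sorts metadata rows by ordinal position and compares the resulting names with the expected list. The `columnRow` type and both function names are hypothetical and are not taken from dremio_catalog_test.go; the real test may structure this differently.

```go
package main

import (
	"fmt"
	"sort"
)

// columnRow is a hypothetical stand-in for one row returned by a query
// against INFORMATION_SCHEMA.COLUMNS.
type columnRow struct {
	Name    string
	Ordinal int
}

// columnNamesByOrdinal returns the column names sorted by ordinal position.
// It copies the input so the caller's slice is left untouched.
func columnNamesByOrdinal(rows []columnRow) []string {
	sorted := append([]columnRow(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Ordinal < sorted[j].Ordinal
	})
	names := make([]string, len(sorted))
	for i, row := range sorted {
		names[i] = row.Name
	}
	return names
}

// sameOrder reports whether got and want hold the same names in the same order.
func sameOrder(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	// Rows deliberately arrive out of order to show that the check
	// depends on the ordinal value, not on result ordering.
	rows := []columnRow{{Name: "label", Ordinal: 2}, {Name: "id", Ordinal: 1}}
	names := columnNamesByOrdinal(rows)
	fmt.Println(names, sameOrder(names, []string{"id", "label"}))
}
```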

* test(s3tables): exercise multi-level Iceberg namespaces from Dremio

Seed a 2-level Iceberg namespace (and a table inside it) via the REST
catalog before bootstrapping Dremio, then add a MultiLevelNamespace
subtest that scans the nested table by its dot-separated reference.
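
A dot-separated reference for a nested table could be assembled roughly as follows. This is a sketch only: the helper names, the source name, and the namespace and table names are all hypothetical, and the quote-doubling rule is the general SQL convention for delimited identifiers rather than something confirmed by this change.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdentifier wraps one path component in double quotes and doubles any
// embedded double quote, following the usual SQL delimited-identifier rule.
func quoteIdentifier(part string) string {
	return `"` + strings.ReplaceAll(part, `"`, `""`) + `"`
}

// tableReference joins the source name, each namespace level, and the table
// name into a single dot-separated reference.
func tableReference(source string, levels []string, table string) string {
	parts := make([]string, 0, len(levels)+2)
	parts = append(parts, quoteIdentifier(source))
	for _, level := range levels {
		parts = append(parts, quoteIdentifier(level))
	}
	parts = append(parts, quoteIdentifier(table))
	return strings.Join(parts, ".")
}

func main() {
	// All names below are hypothetical placeholders.
	ref := tableReference("seaweedfs", []string{"analytics", "events"}, "clicks")
	fmt.Println("SELECT COUNT(*) FROM " + ref)
}
```

Quoting every component keeps the reference valid even when a namespace level contains a dot or a reserved word.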

This relies on isRecursiveAllowedNamespaces=true (already set in the
Dremio source config) to surface the nested levels as folders. A
regression in either the SeaweedFS namespace path encoding (#8959-style)
or Dremio's recursive-namespace discovery would surface here.

Adds two helpers to keep the existing single-level call sites unchanged:

- createIcebergNamespaceLevels: namespace creation with []string levels
- createIcebergTableInLevels: table creation with []string levels and
  unit-separator (0x1F) URL encoding for the namespace path component
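
The unit-separator encoding can be sketched as below: the namespace levels are joined with the 0x1F byte and the result is percent-encoded as one path segment. The function names and the `/v1/namespaces/.../tables` path shape are assumptions made for illustration; the actual helpers in the test and the catalog's route prefix may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// unitSeparator is the ASCII unit separator (0x1F) used to join the levels
// of a multi-level namespace into one path component.
const unitSeparator = "\x1f"

// encodeNamespace joins the levels with 0x1F and percent-encodes the result
// so it is safe to place in a single URL path segment.
func encodeNamespace(levels []string) string {
	return url.PathEscape(strings.Join(levels, unitSeparator))
}

// tablesPath builds a hypothetical table-collection path for the namespace.
// The route shape is an assumption, not taken from the SeaweedFS source.
func tablesPath(levels []string) string {
	return "/v1/namespaces/" + encodeNamespace(levels) + "/tables"
}

func main() {
	// Hypothetical 2-level namespace.
	fmt.Println(tablesPath([]string{"analytics", "events"}))
}
```

Because the separator is a control character, it is always emitted as `%1F`, so a level that itself contains a dot is not confused with a level boundary.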

* test(s3tables): verify Dremio reads PyIceberg-written rows

The previous Dremio subtests only scanned empty tables, so they did not
exercise the data path - just the catalog/metadata path. Add a
PyIceberg-based writer that materializes parquet files plus a snapshot
on a separate table before Dremio bootstraps, and two new subtests:

- ReadWrittenDataCount: SELECT COUNT(*) returns 3.
- ReadWrittenDataValues: SELECT id, label ORDER BY id returns the three
  written rows with the expected (id, label) pairs.

The writer runs in a small image (Dockerfile.writer) built locally on
demand. It pip-installs pyiceberg+pyarrow once and reuses the layer
cache on subsequent runs. The CI workflow pre-pulls python:3.11-slim
to keep cold runs predictable.

The writer authenticates via the OAuth2 client_credentials flow that
SeaweedFS already exposes at /v1/oauth/tokens, mirroring the Go-side
helper used for REST-API table creation.
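
For reference, a client_credentials token request against /v1/oauth/tokens might be built along these lines. This is a sketch under assumptions: the function name, the base URL, and the credential values are hypothetical, and the form field names follow the common OAuth2 convention rather than anything confirmed by the SeaweedFS source.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newTokenRequest builds (but does not send) a form-encoded OAuth2
// client_credentials request for the catalog's token endpoint.
func newTokenRequest(catalogURL, clientID, clientSecret string) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("client_id", clientID)
	form.Set("client_secret", clientSecret)

	endpoint := strings.TrimRight(catalogURL, "/") + "/v1/oauth/tokens"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	// The URL and credentials are placeholders, not values from the test.
	req, err := newTokenRequest("http://localhost:8181", "example-id", "example-secret")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Content-Type"))
}
```

The request would then be sent with an `http.Client`, and the access token read from the JSON response body.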

* test(s3tables): fix Dremio writer required-field schema mismatch

PyIceberg's append() compatibility check rejects an arrow column whose
nullability does not match the Iceberg field. The table schema declares
id as `required long`, but the default pyarrow int64 column is nullable
- so the writer failed with:

    1: id: required long  vs.  1: id: optional long

Declare an explicit pyarrow schema with nullable=False on id and
nullable=True on label to match the Iceberg side.

2026-05-03 00:17:16 -07:00

Files:

- append_rows.py
- Dockerfile.writer
- dremio_catalog_test.go
- README.md

README.md

Dremio Iceberg Catalog Integration Test

This directory contains a Dremio integration smoke test for SeaweedFS's Iceberg REST Catalog implementation.

What It Tests

TestDremioIcebergCatalog verifies the Dremio path end to end:

1. Starts a local SeaweedFS mini cluster with S3 Tables and Iceberg REST enabled.
2. Creates a SeaweedFS table bucket.
3. Creates an Iceberg namespace and an empty table through the SeaweedFS REST catalog, authenticating with its OAuth flow.
4. Starts dremio/dremio-oss:25.2.0.
5. Bootstraps a Dremio admin user and logs in.
6. Creates a Dremio RESTCATALOG source that points at the SeaweedFS catalog.
7. Submits Dremio SQL through /api/v3/sql, polls the job API, and reads job results.
8. Runs subtests against the SeaweedFS-backed Iceberg table:
   - BasicSelect: Dremio is alive and answering SQL.
   - CountEmptyTable: catalog-to-table resolution and a scan of an empty table.
   - ColumnProjection: SELECT id, label succeeds and the response schema reports both columns. A failure here means Dremio could not parse the schema returned by the SeaweedFS catalog.
   - InformationSchemaColumns: the table's columns are exposed through Dremio's metadata layer in the expected ordinal order.
   - InformationSchemaTables: the table is registered in Dremio's INFORMATION_SCHEMA.TABLES.
   - MultiLevelNamespace: a 2-level Iceberg namespace (created via the REST API) is exposed by Dremio as nested folders, and a table inside it is queryable with dot-separated identifiers.
   - ReadWrittenDataCount and ReadWrittenDataValues: a separate table is populated with three rows by a PyIceberg writer container (Dockerfile.writer + append_rows.py) before Dremio bootstraps; Dremio reads the data back and the values are verified. This exercises the actual data path, not just metadata.
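
The submit-then-poll pattern used for Dremio SQL jobs can be sketched as a loop that checks the job state until it is terminal. The function names here are hypothetical, the state fetcher is a scripted fake rather than an HTTP call, and the state strings (COMPLETED, FAILED, CANCELED) are an assumption about what the job API reports.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForJob calls fetchState until the job reaches a terminal state, the
// fetcher fails, or the context is done. interval must be positive.
func waitForJob(ctx context.Context, fetchState func() (string, error), interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		state, err := fetchState()
		if err != nil {
			return err
		}
		switch state {
		case "COMPLETED":
			return nil
		case "FAILED", "CANCELED":
			return fmt.Errorf("job ended in state %s", state)
		}
		// Any other state is treated as still in progress.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// pollScripted runs waitForJob against a fixed sequence of states, standing
// in for repeated calls to the job API.
func pollScripted(states []string) error {
	next := 0
	fetch := func() (string, error) {
		if next >= len(states) {
			return "", errors.New("scripted states exhausted")
		}
		state := states[next]
		next++
		return state, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return waitForJob(ctx, fetch, time.Millisecond)
}

func main() {
	fmt.Println(pollScripted([]string{"RUNNING", "RUNNING", "COMPLETED"}))
	fmt.Println(pollScripted([]string{"RUNNING", "FAILED"}))
}
```

Passing the fetcher as a function keeps the loop independent of the HTTP client, and the context bounds the total wait so a stuck job cannot hang the test.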

The PyIceberg writer image is built on demand and relies on Docker layer caching. The first build pulls python:3.11-slim and pip-installs PyIceberg and PyArrow (about 1-2 minutes in CI); later builds reuse the cached layers and are cheap.

Running Locally

Build or install weed, then run:

    cd test/s3tables/catalog_dremio
    go test -v -timeout 20m .

The test requires Docker. The GitHub Actions job runs on ubuntu-22.04 and executes the test for pull requests.

Configuration

The test uses these fixed credentials for the local SeaweedFS IAM config:

- S3 access key: AKIAIOSFODNN7EXAMPLE
- S3 secret key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
- Region: us-west-2
- Warehouse bucket: iceberg-tables

The Dremio source is configured via POST /api/v3/catalog; it is not configured in dremio.conf. The Dremio container starts with the plugins.restcatalog.enabled support key enabled, which is required for the Iceberg REST Catalog source in Dremio OSS 25.2.

Troubleshooting

- Ensure Docker is running: docker version
- Ensure weed is built or available on PATH.
- Check host-gateway routing if Dremio cannot reach SeaweedFS: docker run --add-host host.docker.internal:host-gateway --rm alpine getent hosts host.docker.internal
- Check the Dremio logs in the failed test output; the harness prints the tail of the Dremio container log when Dremio startup, source setup, or a job fails.