updated table buckets based on new tests

Chris Lu
2026-05-03 00:19:30 -07:00
parent 5d9c0cb6b8
commit f54074f133
5 changed files with 101 additions and 5 deletions
+94
@@ -0,0 +1,94 @@
# Dremio Iceberg Integration
Dremio connects to SeaweedFS Iceberg tables using a `RESTCATALOG` source that points to the SeaweedFS Iceberg REST Catalog. Authentication to S3 uses standard access/secret keys; Dremio reaches the data layer over the S3A filesystem.
This page reflects the integration verified by the [Dremio OSS 25.2.0 catalog test](https://github.com/seaweedfs/seaweedfs/tree/master/test/s3tables/catalog_dremio).
## Prerequisites
- SeaweedFS running with the Iceberg REST Catalog enabled (port `8181` by default) and the S3 API on `8333`
- A table bucket created via `weed shell` or the S3 Tables API
- Dremio OSS 25.2.0 or later with the experimental REST catalog plugin enabled
In Dremio OSS 25.2 the Iceberg REST catalog source is gated behind a support key. Pass it to the JVM via `DREMIO_JAVA_EXTRA_OPTS`:
```bash
DREMIO_JAVA_EXTRA_OPTS="-Ddremio.debug.sysopt.plugins.restcatalog.enabled=true"
```
## Configuration
Iceberg REST catalog sources in Dremio are not configured through `dremio.conf`. After Dremio is up, register the source by `POST`ing to `/api/v3/catalog`:
```bash
curl -X POST http://localhost:9047/api/v3/catalog \
  -H "Authorization: _dremio$DREMIO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "entityType": "source",
    "name": "iceberg",
    "type": "RESTCATALOG",
    "config": {
      "restEndpointUri": "http://host.docker.internal:8181",
      "enableAsync": true,
      "isCachingEnabled": false,
      "maxCacheSpacePct": 100,
      "isRecursiveAllowedNamespaces": true,
      "propertyList": [
        {"name": "warehouse", "value": "s3://my-table-bucket"},
        {"name": "scope", "value": "PRINCIPAL_ROLE:ALL"},
        {"name": "fs.s3a.aws.credentials.provider", "value": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"},
        {"name": "fs.s3a.endpoint", "value": "host.docker.internal:8333"},
        {"name": "fs.s3a.path.style.access", "value": "true"},
        {"name": "fs.s3a.connection.ssl.enabled", "value": "false"},
        {"name": "fs.s3a.endpoint.region", "value": "us-east-1"},
        {"name": "dremio.s3.compat", "value": "true"},
        {"name": "dremio.s3.region", "value": "us-east-1"},
        {"name": "dremio.bucket.discovery.enabled", "value": "false"},
        {"name": "fs.s3a.audit.enabled", "value": "false"},
        {"name": "fs.s3a.create.file-status-check", "value": "false"}
      ],
      "secretPropertyList": [
        {"name": "fs.s3a.access.key", "value": "YOUR_ACCESS_KEY"},
        {"name": "fs.s3a.secret.key", "value": "YOUR_SECRET_KEY"},
        {"name": "credential", "value": "YOUR_ACCESS_KEY:YOUR_SECRET_KEY"}
      ]
    }
  }'
```
Key settings:
- `restEndpointUri` points at the SeaweedFS Iceberg REST Catalog (default `:8181`).
- `warehouse` is `s3://<table-bucket-name>`. SeaweedFS maps this to the bucket of the same name.
- `dremio.s3.compat=true` and `dremio.bucket.discovery.enabled=false` are required for non-AWS S3 endpoints.
- `fs.s3a.path.style.access=true` and `fs.s3a.connection.ssl.enabled=false` match a typical local SeaweedFS deployment.
If Dremio runs in a container and SeaweedFS runs on the host, use `host.docker.internal` in both `restEndpointUri` and `fs.s3a.endpoint`. On Linux, start the Dremio container with `--add-host host.docker.internal:host-gateway` so the name resolves.
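When the endpoint, bucket, or credentials differ between environments, it can be easier to generate the request body than to edit the JSON by hand. The following is a minimal sketch, not part of the integration test: `build_source_payload` and its parameter names are hypothetical, and the property names and values are copied from the `curl` example above.

```python
import json


def build_source_payload(name, catalog_uri, s3_endpoint, bucket,
                         access_key, secret_key, region="us-east-1"):
    """Return the body for POST /api/v3/catalog as a plain dict.

    Hypothetical helper: it only mirrors the settings shown in the
    curl example on this page and adds no new options.
    """
    properties = {
        "warehouse": f"s3://{bucket}",
        "scope": "PRINCIPAL_ROLE:ALL",
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
        "fs.s3a.endpoint": s3_endpoint,
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "false",
        "fs.s3a.endpoint.region": region,
        "dremio.s3.compat": "true",
        "dremio.s3.region": region,
        "dremio.bucket.discovery.enabled": "false",
        "fs.s3a.audit.enabled": "false",
        "fs.s3a.create.file-status-check": "false",
    }
    secrets = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "credential": f"{access_key}:{secret_key}",
    }
    return {
        "entityType": "source",
        "name": name,
        "type": "RESTCATALOG",
        "config": {
            "restEndpointUri": catalog_uri,
            "enableAsync": True,
            "isCachingEnabled": False,
            "maxCacheSpacePct": 100,
            "isRecursiveAllowedNamespaces": True,
            # Dremio expects lists of {"name": ..., "value": ...} objects.
            "propertyList": [
                {"name": k, "value": v} for k, v in properties.items()
            ],
            "secretPropertyList": [
                {"name": k, "value": v} for k, v in secrets.items()
            ],
        },
    }


if __name__ == "__main__":
    payload = build_source_payload(
        name="iceberg",
        catalog_uri="http://host.docker.internal:8181",
        s3_endpoint="host.docker.internal:8333",
        bucket="my-table-bucket",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
    )
    print(json.dumps(payload, indent=2))
```

The printed JSON can be piped into the same `curl` call by replacing the inline body with `-d @-`, which makes `curl` read the request body from standard input.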
## Example SQL
Dremio accepts SQL over its REST API: `POST` the query to `/api/v3/sql`, which returns a job ID; poll `/api/v3/job/<id>` for completion and fetch rows from `/api/v3/job/<id>/results`.
The integration test exercises the read path against tables created through the SeaweedFS REST catalog and populated by PyIceberg. Tables written by other Iceberg writers, such as Spark or Trino, should be readable the same way:
```sql
SELECT * FROM iceberg.my_namespace.events;
SELECT COUNT(*) FROM iceberg.my_namespace.events;
```
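The submit, poll, and fetch flow described in this section can be sketched with the Python standard library. This is an illustrative sketch rather than the code used by the integration test: `http_json` and `run_sql` are hypothetical names, and the response fields (`id`, `jobState`, `rows`) and the terminal state names are assumptions about the Dremio v3 API that should be checked against the Dremio version in use. The sketch reads only the first page of results.

```python
import json
import time
import urllib.request


def http_json(method, url, token, body=None):
    """Send one JSON request to Dremio and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            # Same header format as the curl example on this page.
            "Authorization": f"_dremio{token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_sql(base_url, token, sql, request=http_json,
            poll_seconds=1.0, max_polls=60):
    """Submit a query, wait for the job to finish, return the first page of rows.

    `request` is injectable so the flow can be tested without a server.
    """
    # 1. Submit: the response is assumed to carry the job ID under "id".
    job_id = request("POST", f"{base_url}/api/v3/sql", token, {"sql": sql})["id"]

    # 2. Poll: "jobState" and the state names below are assumptions.
    for _ in range(max_polls):
        job = request("GET", f"{base_url}/api/v3/job/{job_id}", token)
        state = job["jobState"]
        if state == "COMPLETED":
            # 3. Fetch: rows are assumed to be under "rows".
            results = request(
                "GET", f"{base_url}/api/v3/job/{job_id}/results", token
            )
            return results["rows"]
        if state in ("FAILED", "CANCELED"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    import os

    # DREMIO_TOKEN is the same token used in the curl example.
    rows = run_sql(
        "http://localhost:9047",
        os.environ["DREMIO_TOKEN"],
        "SELECT COUNT(*) FROM iceberg.my_namespace.events",
    )
    print(rows)
```

Passing the transport in as `request` keeps the polling logic separate from HTTP, so the loop can be exercised with a stub before pointing it at a live Dremio instance.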
Write paths from Dremio (`CREATE TABLE`, `INSERT`) are not exercised by the SeaweedFS test suite as of Dremio OSS 25.2.0. Treat Dremio primarily as a reader against tables produced by Spark, Trino, or other writers.
## Multi-Level Namespaces
SeaweedFS exposes multi-level Iceberg namespaces (e.g. `analytics.web`) through dot-separated namespace names in REST catalog calls. Dremio surfaces them as nested folders under the source. The Dremio integration test exercises this path; no extra source configuration is required beyond `isRecursiveAllowedNamespaces: true` shown above.
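To query a table inside a nested namespace, each namespace level becomes one component of the Dremio path. The sketch below builds such a path, assuming that every level maps to exactly one folder under the source and that identifiers are double-quoted with standard SQL quote doubling. `dremio_table_path` is a hypothetical helper, and the `events` table under `analytics.web` is an illustrative name.

```python
def dremio_table_path(source, namespace, table):
    """Build a quoted Dremio path from a dot-separated Iceberg namespace.

    Assumption: each namespace level appears as one folder under the source.
    """
    parts = [source, *namespace.split("."), table]
    # Quote every component; embedded double quotes are doubled.
    return ".".join('"' + part.replace('"', '""') + '"' for part in parts)


if __name__ == "__main__":
    path = dremio_table_path("iceberg", "analytics.web", "events")
    print(f"SELECT COUNT(*) FROM {path}")
```

Quoting every component avoids surprises when a namespace or table name collides with a reserved word.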
## Anonymous Access
When SeaweedFS runs without IAM (e.g. `weed mini` with no `-s3.config`), the REST catalog accepts unsigned requests. The Dremio source still needs S3 credentials for the data path, so leave `fs.s3a.access.key` / `fs.s3a.secret.key` set; SeaweedFS accepts any value when IAM is disabled.
## See Also
- [[SeaweedFS Iceberg Catalog]] - Architecture and concepts
- [[S3 Table Bucket]] - Managing table buckets
- [Dremio integration test](https://github.com/seaweedfs/seaweedfs/tree/master/test/s3tables/catalog_dremio) - End-to-end reference
+1 -1
@@ -437,6 +437,6 @@ Each operation returns structured metrics in the job's `OutputValues` map, keyed
- [[S3 Table Bucket]] -- Table bucket concepts and file layout
- [[S3 Table Bucket Commands]] -- CLI examples for creating and managing table buckets
- [[SeaweedFS Iceberg Catalog]] -- Using the Iceberg REST Catalog with Spark and Trino
- [[SeaweedFS Iceberg Catalog]] -- Using the Iceberg REST Catalog with Spark, Trino, Dremio, and other engines
- [[S3 Tables Security]] -- IAM permissions for table buckets
- [[Worker]] -- Worker framework overview
+3 -3
@@ -4,7 +4,7 @@
SeaweedFS supports **Amazon S3 Tables**, providing a dedicated interface for managing structured datasets using the **Apache Iceberg** table format. Unlike standard S3 buckets that store unstructured objects, S3 Table Buckets are optimized for analytics workloads, offering a hierarchical structure of **Namespaces** and **Tables**.
This feature implements the **Iceberg REST Catalog API**, allowing direct integration with analytics engines like Apache Spark, Trino, and Flink without needing an external catalog service.
This feature implements the **Iceberg REST Catalog API**, allowing direct integration with analytics engines like Apache Spark, Trino, Dremio, DuckDB, and RisingWave without needing an external catalog service.
## Key Features
@@ -76,7 +76,7 @@ graph TD
## Workflows
The following diagrams illustrate how clients (like Spark or Trino) interact with SeaweedFS S3 Tables.
The following diagrams illustrate how clients (like Spark, Trino, or Dremio) interact with SeaweedFS S3 Tables.
### Write (Commit) Workflow
@@ -125,6 +125,6 @@ sequenceDiagram
## Documentation
- [S3 Table Bucket Commands](S3-Table-Bucket-Commands) - Examples of how to create and manage table buckets.
- [SeaweedFS Iceberg Catalog](SeaweedFS-Iceberg-Catalog) - Usage with Apache Spark and Trino.
- [SeaweedFS Iceberg Catalog](SeaweedFS-Iceberg-Catalog) - Usage with Apache Spark, Trino, Dremio, and other engines.
- [Iceberg Table Maintenance](Iceberg-Table-Maintenance) - Automated compaction, snapshot expiration, orphan removal, and manifest rewriting.
- [S3 Tables Security](S3-Tables-Security) - IAM permissions and policy reference.
+2 -1
@@ -8,7 +8,7 @@ The SeaweedFS S3 Tables feature implements the **Iceberg REST Catalog API**. Thi
- **Iceberg REST Catalog**: Available on a dedicated port (default `8181`)
- **S3 Data Access**: Available on the S3 port (default `8333`)
- **Authentication**: SigV4 (Spark, Trino, RisingWave) or OAuth2 (DuckDB)
- **Authentication**: SigV4 (Spark, Trino, RisingWave), OAuth2 (DuckDB), or unsigned REST + S3 access keys (Dremio)
## Catalog and Bucket Relationship
@@ -48,6 +48,7 @@ See the integration guide for your engine below.
|--------|------------|-------|
| **Apache Spark** | SigV4 | [[Spark Iceberg Integration]] |
| **Trino** | SigV4 | [[Trino Iceberg Integration]] |
| **Dremio** | S3 access keys (REST source) | [[Dremio Iceberg Integration]] |
| **DuckDB** | OAuth2 | [[DuckDB Iceberg Integration]] |
| **RisingWave** | SigV4 | [[RisingWave Iceberg Integration]] |
| **Lakekeeper** | STS + SigV4 | [[Lakekeeper Iceberg Integration]] |
+1
@@ -103,6 +103,7 @@
### Iceberg Integrations
* [[Spark Iceberg Integration]]
* [[Trino Iceberg Integration]]
* [[Dremio Iceberg Integration]]
* [[DuckDB Iceberg Integration]]
* [[RisingWave Iceberg Integration]]
* [[Lakekeeper Iceberg Integration]]