Dremio Iceberg Integration
Dremio connects to SeaweedFS Iceberg tables using a RESTCATALOG source that points to the SeaweedFS Iceberg REST Catalog. Authentication to S3 uses standard access/secret keys; Dremio reaches the data layer over the S3A filesystem.
This page reflects the integration verified by the Dremio OSS 25.2.0 catalog test.
Prerequisites
- Dremio OSS 25.2.0 or later with the experimental REST catalog plugin enabled
- SeaweedFS started as shown in Setup below
In Dremio OSS 25.2 the Iceberg REST catalog source is gated behind a support key. Pass it to the JVM via DREMIO_JAVA_EXTRA_OPTS:
DREMIO_JAVA_EXTRA_OPTS="-Ddremio.debug.sysopt.plugins.restcatalog.enabled=true"
Setup
Start weed mini, passing the S3 credentials and the name of the table bucket to pre-create as environment variables:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export S3_TABLE_BUCKET=my-table-bucket
weed mini -dir ~/data
This brings up:
- the Iceberg REST Catalog on http://localhost:8181
- the S3 endpoint on http://localhost:8333
- an admin S3 identity using the AWS env vars (used as the Dremio source's S3 credentials below)
- the table bucket my-table-bucket, pre-created
Configuration
Iceberg REST catalog sources in Dremio are not configured through dremio.conf. After Dremio is up, register the source by POSTing to /api/v3/catalog:
curl -X POST http://localhost:9047/api/v3/catalog \
  -H "Authorization: _dremio${DREMIO_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "entityType": "source",
    "name": "iceberg",
    "type": "RESTCATALOG",
    "config": {
      "restEndpointUri": "http://host.docker.internal:8181",
      "enableAsync": true,
      "isCachingEnabled": false,
      "maxCacheSpacePct": 100,
      "isRecursiveAllowedNamespaces": true,
      "propertyList": [
        {"name": "warehouse", "value": "s3://my-table-bucket"},
        {"name": "scope", "value": "PRINCIPAL_ROLE:ALL"},
        {"name": "fs.s3a.aws.credentials.provider", "value": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"},
        {"name": "fs.s3a.endpoint", "value": "host.docker.internal:8333"},
        {"name": "fs.s3a.path.style.access", "value": "true"},
        {"name": "fs.s3a.connection.ssl.enabled", "value": "false"},
        {"name": "fs.s3a.endpoint.region", "value": "us-east-1"},
        {"name": "dremio.s3.compat", "value": "true"},
        {"name": "dremio.s3.region", "value": "us-east-1"},
        {"name": "dremio.bucket.discovery.enabled", "value": "false"},
        {"name": "fs.s3a.audit.enabled", "value": "false"},
        {"name": "fs.s3a.create.file-status-check", "value": "false"}
      ],
      "secretPropertyList": [
        {"name": "fs.s3a.access.key", "value": "AKIAIOSFODNN7EXAMPLE"},
        {"name": "fs.s3a.secret.key", "value": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"},
        {"name": "credential", "value": "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"}
      ]
    }
  }'
Key settings:
- restEndpointUri points at the SeaweedFS Iceberg REST Catalog (default :8181).
- warehouse is s3://<table-bucket-name>. SeaweedFS maps this to the bucket of the same name.
- dremio.s3.compat=true and dremio.bucket.discovery.enabled=false are required for non-AWS S3 endpoints.
- fs.s3a.path.style.access=true and fs.s3a.connection.ssl.enabled=false match a typical local SeaweedFS deployment.
If Dremio runs in a container and SeaweedFS runs on the host, use host.docker.internal (with --add-host host.docker.internal:host-gateway on Linux).
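The request body above repeats the host, bucket, and credentials in several places, so it can be convenient to generate it. The following is a minimal sketch that rebuilds the same JSON from a handful of parameters; build_source_payload and its arguments are illustrative names, not part of Dremio or SeaweedFS, and the fallback values are the example credentials from the Setup section.

```python
import json
import os


def build_source_payload(name, bucket, access_key, secret_key,
                         host="host.docker.internal",
                         catalog_port=8181, s3_port=8333):
    """Return the /api/v3/catalog request body for a RESTCATALOG source.

    Hypothetical helper: it only mirrors the JSON document shown above.
    """
    properties = {
        "warehouse": f"s3://{bucket}",
        "scope": "PRINCIPAL_ROLE:ALL",
        "fs.s3a.aws.credentials.provider":
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
        "fs.s3a.endpoint": f"{host}:{s3_port}",
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "false",
        "fs.s3a.endpoint.region": "us-east-1",
        "dremio.s3.compat": "true",
        "dremio.s3.region": "us-east-1",
        "dremio.bucket.discovery.enabled": "false",
        "fs.s3a.audit.enabled": "false",
        "fs.s3a.create.file-status-check": "false",
    }
    secrets = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # The catalog credential is the access key and secret joined by a colon.
        "credential": f"{access_key}:{secret_key}",
    }
    return {
        "entityType": "source",
        "name": name,
        "type": "RESTCATALOG",
        "config": {
            "restEndpointUri": f"http://{host}:{catalog_port}",
            "enableAsync": True,
            "isCachingEnabled": False,
            "maxCacheSpacePct": 100,
            "isRecursiveAllowedNamespaces": True,
            "propertyList": [
                {"name": k, "value": v} for k, v in properties.items()
            ],
            "secretPropertyList": [
                {"name": k, "value": v} for k, v in secrets.items()
            ],
        },
    }


# Read the same variables exported in Setup, falling back to the doc's examples.
payload = build_source_payload(
    name="iceberg",
    bucket=os.environ.get("S3_TABLE_BUCKET", "my-table-bucket"),
    access_key=os.environ.get("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE"),
    secret_key=os.environ.get(
        "AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    ),
)
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and passed to curl with -d @<file> in place of the inline body. When Dremio and SeaweedFS run on the same host without containers, passing host="localhost" is the likely adjustment.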
Example SQL
Clients submit SQL to Dremio with a POST to /api/v3/sql, which returns a job ID; poll /api/v3/job/<id> until the job completes, then fetch rows from /api/v3/job/<id>/results.
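The submit, poll, and fetch sequence can be sketched in Python using only the standard library. This is an illustration under assumptions rather than a reference client: run_sql and http_json are hypothetical helper names, the base URL assumes Dremio on localhost:9047, the token is assumed to have been obtained separately, and the job is treated as finished once jobState is COMPLETED, FAILED, or CANCELED.

```python
import json
import time
import urllib.request

DREMIO_URL = "http://localhost:9047"  # assumption: Dremio's default REST port
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def http_json(method, url, token, body=None):
    """Send one JSON request to Dremio and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            # Same header format as the curl call in Configuration.
            "Authorization": f"_dremio{token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_sql(sql, token, base_url=DREMIO_URL, call=http_json,
            poll_seconds=1.0, timeout_seconds=60.0):
    """Submit a query, wait for the job to finish, and return its rows."""
    job_id = call("POST", f"{base_url}/api/v3/sql", token, {"sql": sql})["id"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        state = call("GET", f"{base_url}/api/v3/job/{job_id}", token)["jobState"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"job {job_id} still in state {state}")
        time.sleep(poll_seconds)

    if state != "COMPLETED":
        raise RuntimeError(f"job {job_id} ended in state {state}")

    # Only the first page of results is read here.
    results = call("GET", f"{base_url}/api/v3/job/{job_id}/results", token)
    return results["rows"]
```

A call such as run_sql("SELECT COUNT(*) FROM iceberg.my_namespace.events", token) would then return the result rows as a list of dictionaries. The call parameter exists so the HTTP layer can be swapped out, for example with a stub in tests.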
The integration test exercises the read path against tables produced by the SeaweedFS REST catalog and populated by PyIceberg (and by extension any other Iceberg writer such as Spark or Trino):
SELECT * FROM iceberg.my_namespace.events;
SELECT COUNT(*) FROM iceberg.my_namespace.events;
Write paths from Dremio (CREATE TABLE, INSERT) are not exercised by the SeaweedFS test suite as of Dremio OSS 25.2.0. Treat Dremio primarily as a reader against tables produced by Spark, Trino, or other writers.
Multi-Level Namespaces
SeaweedFS exposes multi-level Iceberg namespaces (e.g. analytics.web) through dot-separated namespace names in REST catalog calls. Dremio surfaces them as nested folders under the source. The Dremio integration test exercises this path; no extra source configuration is required beyond isRecursiveAllowedNamespaces: true shown above.
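Because each namespace level appears as a folder, a table in analytics.web would be addressed in Dremio SQL with one path component per level. The sketch below builds such a path, assuming Dremio's usual source.folder.table addressing with double-quoted identifiers; dremio_table_path is a hypothetical helper and the table name events is only an example.

```python
def quote_identifier(part: str) -> str:
    """Wrap one path component in double quotes, doubling embedded quotes."""
    return '"' + part.replace('"', '""') + '"'


def dremio_table_path(source: str, namespace: str, table: str) -> str:
    """Map a dot-separated Iceberg namespace to a quoted Dremio table path."""
    levels = namespace.split(".")
    return ".".join(quote_identifier(p) for p in [source, *levels, table])


print(dremio_table_path("iceberg", "analytics.web", "events"))
# "iceberg"."analytics"."web"."events"
```

Quoting every component is a defensive choice: it keeps the path valid when a namespace level or table name collides with a reserved word or contains characters that would otherwise need escaping.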
Anonymous Access
When SeaweedFS runs without IAM (e.g. weed mini with no -s3.config), the REST catalog accepts unsigned requests. The Dremio source still needs S3 credentials for the data path, so leave fs.s3a.access.key / fs.s3a.secret.key set; SeaweedFS accepts any value when IAM is disabled.
See Also
- SeaweedFS Iceberg Catalog - Architecture and concepts
- S3 Table Bucket - Managing table buckets
- Dremio integration test - End-to-end reference