DuckDB Iceberg Integration
DuckDB can query Iceberg tables stored in SeaweedFS using the Iceberg extension and the built-in Iceberg REST Catalog.
Prerequisites
- SeaweedFS running with the Iceberg REST Catalog enabled (port 8181 by default)
- A table bucket created via weed shell or the S3 Tables API
- DuckDB v1.1.0+ with the Iceberg extension
Quick Start
1. Start SeaweedFS with credentials and a table bucket
weed mini can be configured entirely through environment variables. The exported AWS credentials become an admin S3 identity, and S3_TABLE_BUCKET pre-creates the Iceberg table bucket, so no IAM config file or weed shell step is needed:
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export S3_TABLE_BUCKET=my-catalog
weed mini -dir ~/data
This brings up:
- Iceberg REST Catalog on http://localhost:8181
- S3 endpoint on http://localhost:8333
- An admin S3 identity using the AWS env vars above (used as DuckDB's CLIENT_ID/CLIENT_SECRET)
- Table bucket my-catalog pre-created
S3_TABLE_BUCKET (or the -tableBucket flag) accepts a comma-separated list (my-catalog,other-catalog); existing buckets are left alone. S3_BUCKET / -bucket does the same for plain (non-table) buckets.
2. Connect DuckDB
INSTALL iceberg;
LOAD iceberg;
-- Iceberg catalog secret (OAuth2 against the REST Catalog).
-- CLIENT_ID / CLIENT_SECRET must match the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY exported above.
CREATE SECRET iceberg_secret (
    TYPE ICEBERG,
    ENDPOINT 'http://localhost:8181',
    CLIENT_ID 'AKIAIOSFODNN7EXAMPLE',
    CLIENT_SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
);
-- S3 secret for reading data files. Same credentials.
CREATE SECRET s3_secret (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    ENDPOINT 'localhost:8333',
    URL_STYLE 'path',
    USE_SSL false
);
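Because both secrets must carry the same key pair that was exported before starting weed mini, it can help to generate the two statements from those environment variables instead of pasting the values by hand. The following is a minimal sketch in Python; the helper names `sql_quote`, `secret_statements`, and `statements_from_env` are illustrative and not part of DuckDB or SeaweedFS, and the default endpoints are simply the ones used in this Quick Start.

```python
import os


def sql_quote(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def secret_statements(key_id, secret,
                      catalog="http://localhost:8181", s3="localhost:8333"):
    """Return the ICEBERG and S3 CREATE SECRET statements for one key pair."""
    iceberg_sql = (
        "CREATE SECRET iceberg_secret (TYPE ICEBERG, "
        f"ENDPOINT {sql_quote(catalog)}, "
        f"CLIENT_ID {sql_quote(key_id)}, CLIENT_SECRET {sql_quote(secret)});"
    )
    s3_sql = (
        "CREATE SECRET s3_secret (TYPE S3, "
        f"KEY_ID {sql_quote(key_id)}, SECRET {sql_quote(secret)}, "
        f"ENDPOINT {sql_quote(s3)}, URL_STYLE 'path', USE_SSL false);"
    )
    return [iceberg_sql, s3_sql]


def statements_from_env(env=os.environ):
    """Build both statements from the variables exported for weed mini."""
    return secret_statements(env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"])
```

Each returned string can be pasted into the DuckDB shell, or passed to `execute` on a connection from the `duckdb` Python package after `INSTALL iceberg; LOAD iceberg;` has run.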
Authentication
SeaweedFS supports OAuth2 client_credentials flow for the Iceberg REST Catalog, which is what DuckDB uses. Your S3 access key and secret key are used as the CLIENT_ID and CLIENT_SECRET.
When DuckDB creates an Iceberg secret, it automatically:
- Sends POST /v1/oauth/tokens with your credentials
- Receives a bearer token
- Uses the bearer token for all subsequent catalog requests
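The token exchange can also be reproduced outside DuckDB to check whether the catalog accepts a given key pair. The following is a minimal sketch in Python using only the standard library. It assumes the credentials are sent as form fields in the request body, as the Iceberg REST specification describes for the client_credentials grant, and that the response carries the token in an `access_token` field. The helper names `build_token_request` and `fetch_token` are illustrative, not part of SeaweedFS or DuckDB.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(endpoint, client_id, client_secret):
    """Build the OAuth2 client_credentials request for POST /v1/oauth/tokens."""
    # Assumption: credentials travel as form fields in the request body.
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/v1/oauth/tokens",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(endpoint, client_id, client_secret):
    """Send the request and return the bearer token from the JSON response."""
    request = build_token_request(endpoint, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # Assumption: the response uses the usual OAuth2 "access_token" field.
    return payload["access_token"]
```

Calling `fetch_token("http://localhost:8181", key, secret)` against a running server should return a token string. A rejected key pair surfaces as `urllib.error.HTTPError`.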
With IAM Credentials
When weed mini is started with AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY exported (or with -s3.config=iam.json for richer multi-identity setups), the CLIENT_ID / CLIENT_SECRET passed to CREATE SECRET must match a registered access key / secret key pair. The catalog rejects unknown credentials with 401 invalid_client.
Anonymous Access (Development)
When SeaweedFS runs without any IAM configuration (e.g., weed mini with no -s3.config and no AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY exported), the catalog accepts any credentials. DuckDB still requires the CLIENT_ID and CLIENT_SECRET fields, but any non-empty values will work:
CREATE SECRET (
    TYPE ICEBERG,
    ENDPOINT 'http://localhost:8181',
    CLIENT_ID 'admin',
    CLIENT_SECRET 'admin'
);
Scoping Secrets to a Catalog
Use the SCOPE parameter to bind a secret to a specific table bucket:
CREATE SECRET iceberg_secret (
    TYPE ICEBERG,
    ENDPOINT 'http://localhost:8181',
    CLIENT_ID 'your-access-key',
    CLIENT_SECRET 'your-secret-key',
    SCOPE 's3://my-catalog/'
);
CREATE SECRET s3_secret (
    TYPE S3,
    KEY_ID 'your-access-key',
    SECRET 'your-secret-key',
    ENDPOINT 'localhost:8333',
    URL_STYLE 'path',
    USE_SSL false,
    SCOPE 's3://my-catalog/'
);
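To see why scoping matters when several secrets exist, it helps to model how a scope is matched against a path. DuckDB's secret manager documents that the secret with the longest matching scope prefix is chosen. The sketch below is a simplified Python model of that rule only, not DuckDB's actual implementation; `pick_secret` and the secret names are hypothetical.

```python
def pick_secret(path, scopes):
    """Return the name of the secret whose scope is the longest prefix of path.

    scopes maps a secret name to its SCOPE string. Returns None when no
    scope is a prefix of the path.
    """
    matches = [
        (len(scope), name)
        for name, scope in scopes.items()
        if path.startswith(scope)
    ]
    # max() compares the prefix length first, so the longest scope wins.
    return max(matches)[1] if matches else None


# Hypothetical setup: one broad S3 secret and one scoped to the table bucket.
scopes = {"default_s3": "s3://", "catalog_s3": "s3://my-catalog/"}
chosen = pick_secret("s3://my-catalog/my-namespace/my-table/data.parquet", scopes)
```

In this model, a path under s3://my-catalog/ resolves to the scoped secret, while any other s3:// path falls back to the broader one.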
Querying Tables
Once secrets are configured, you can query Iceberg tables:
-- Scan an Iceberg table by its S3 path
SELECT * FROM iceberg_scan('s3://my-catalog/my-namespace/my-table');
-- With metadata path
SELECT * FROM iceberg_scan('s3://my-catalog/my-namespace/my-table/metadata/v1.metadata.json');
Configuration Reference
| Parameter | Description | Default |
|---|---|---|
| Iceberg REST port | --port.iceberg (standalone) or -s3.port.iceberg (mini) | 8181 |
| S3 port | --port (standalone) or -s3.port (mini) | 8333 |
| Disable Iceberg | Set port to 0 | Enabled |
Troubleshooting
"HTTP NotFound_404" on /v1/oauth/tokens
Upgrade SeaweedFS. Older releases lack the OAuth2 token endpoint, which was added to support DuckDB's authentication flow.
"access denied" when creating secrets
Ensure your CLIENT_ID and CLIENT_SECRET match a valid IAM identity configured in SeaweedFS. Check your -s3.config file, or the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY values exported when weed mini was started.
DuckDB can connect but cannot read data
Make sure you have an S3 secret configured in addition to the Iceberg secret. The Iceberg secret handles catalog operations (listing namespaces, table metadata), while the S3 secret is needed for reading the actual Parquet data files.
-- Both secrets are needed:
-- 1. ICEBERG secret -> talks to the catalog API on port 8181
-- 2. S3 secret -> reads data files from S3 on port 8333
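Before debugging the secrets themselves, it can be worth confirming that both endpoints are reachable from the machine running DuckDB. The following is a minimal Python sketch using only the standard library; `port_is_open` and `check_endpoints` are illustrative helper names, and the default ports are the ones listed in the Configuration Reference. It only shows that something is listening on each port and does not validate credentials.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_endpoints(host="localhost", catalog_port=8181, s3_port=8333):
    """Report reachability of the catalog and S3 ports used by the two secrets."""
    return {
        "iceberg_catalog": port_is_open(host, catalog_port),
        "s3": port_is_open(host, s3_port),
    }
```

If `check_endpoints()` reports the catalog as reachable but S3 as unreachable, catalog operations may succeed while reads of the data files fail, which matches the symptom described above.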
See Also
- SeaweedFS Iceberg Catalog - Architecture and Spark/Trino integration
- S3 Table Bucket - Managing table buckets
- S3 Tables Security - IAM policies for table access