Lakekeeper Iceberg Integration
Lakekeeper is an open-source Iceberg catalog that can use SeaweedFS as its storage backend via S3 Tables and STS-vended credentials.
Architecture
In a Lakekeeper setup with SeaweedFS:
- Lakekeeper acts as the Iceberg catalog, managing namespace and table metadata
- SeaweedFS provides S3-compatible storage for data files (Parquet) and metadata
- STS (Security Token Service) issues temporary credentials that Lakekeeper vends to clients
This architecture supports credential vending: Lakekeeper assumes an IAM role via STS and passes short-lived credentials to query engines, so long-lived secrets never need to be distributed.
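Because vended credentials are short-lived, whatever holds them has to notice when they are about to lapse and ask for new ones. The following is a minimal sketch of that bookkeeping only. The `VendedCredentials` class and the five-minute refresh margin are hypothetical and are not part of Lakekeeper or SeaweedFS; the 12-hour lifetime mirrors the `tokenDuration` used in the IAM config on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VendedCredentials:
    """Hypothetical holder for one set of temporary credentials."""
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime  # timezone-aware expiry time reported by STS

    def needs_refresh(self, now: datetime,
                      margin: timedelta = timedelta(minutes=5)) -> bool:
        # Refresh slightly early so in-flight requests do not fail mid-query.
        return now >= self.expiration - margin


issued = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
creds = VendedCredentials("EXAMPLEKEYID", "EXAMPLESECRET", "EXAMPLETOKEN",
                          issued + timedelta(hours=12))
print(creds.needs_refresh(issued))                        # False
print(creds.needs_refresh(issued + timedelta(hours=12)))  # True
```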
Prerequisites
- SeaweedFS running with IAM and STS enabled
- A table bucket created via weed shell or the S3 Tables API
- Lakekeeper configured to use SeaweedFS as its storage
SeaweedFS IAM Configuration
Lakekeeper requires STS support for credential vending. Configure SeaweedFS with an IAM config that includes STS settings and an assumable role:
{
  "identities": [
    {
      "name": "admin",
      "credentials": [
        {
          "accessKey": "admin",
          "secretKey": "admin"
        }
      ],
      "actions": ["Admin", "Read", "List", "Tagging", "Write"]
    }
  ],
  "sts": {
    "tokenDuration": "12h",
    "maxSessionLength": "24h",
    "issuer": "seaweedfs-sts",
    "signingKey": "BASE64_ENCODED_SIGNING_KEY"
  },
  "roles": [
    {
      "roleName": "LakekeeperVendedRole",
      "roleArn": "arn:aws:iam::000000000000:role/LakekeeperVendedRole",
      "trustPolicy": {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "sts:AssumeRole"
          }
        ]
      },
      "attachedPolicies": ["FullAccess"]
    }
  ],
  "policies": [
    {
      "name": "FullAccess",
      "document": {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
          }
        ]
      }
    }
  ]
}
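The `BASE64_ENCODED_SIGNING_KEY` placeholder has to be replaced with a real value, and every name in `attachedPolicies` should match an entry under `policies`. The sketch below shows one way to handle both before starting the server. It assumes the signing key is simply base64-encoded random bytes; the 32-byte length and the helper names `new_signing_key` and `undefined_policies` are illustrative choices, not requirements stated by SeaweedFS.

```python
import base64
import json
import secrets


def new_signing_key(num_bytes: int = 32) -> str:
    """Return random bytes, base64-encoded, for the "signingKey" field.

    The 32-byte length is an assumption, not a documented requirement.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def undefined_policies(config: dict) -> list[str]:
    """List policy names that roles attach but "policies" never defines."""
    defined = {policy["name"] for policy in config.get("policies", [])}
    missing = []
    for role in config.get("roles", []):
        for name in role.get("attachedPolicies", []):
            if name not in defined and name not in missing:
                missing.append(name)
    return missing


# Trimmed copy of the config above, keeping only the fields checked here.
config = json.loads("""
{
  "sts": {"signingKey": "BASE64_ENCODED_SIGNING_KEY"},
  "roles": [{"roleName": "LakekeeperVendedRole",
             "attachedPolicies": ["FullAccess"]}],
  "policies": [{"name": "FullAccess", "document": {}}]
}
""")
config["sts"]["signingKey"] = new_signing_key()
print(undefined_policies(config))  # [] when every attached policy is defined
```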
Start SeaweedFS with IAM enabled:
weed mini \
  -s3.config /path/to/iam_config.json \
  -s3.iam.config /path/to/iam_config.json \
  -s3.iam.readOnly=false
STS Credential Vending
Lakekeeper uses STS AssumeRole to obtain temporary credentials for accessing SeaweedFS S3:
POST http://localhost:8333/?Action=AssumeRole
&RoleArn=arn:aws:iam::000000000000:role/LakekeeperVendedRole
&RoleSessionName=lakekeeper-session
&Version=2011-06-15
The response includes temporary AccessKeyId, SecretAccessKey, and SessionToken that Lakekeeper vends to query engines.
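To make the request and response shapes concrete, here is a rough sketch that builds the AssumeRole query string shown above and pulls the three credential fields out of a response. It neither signs nor sends the request. The sample XML follows the AWS STS response layout with made-up values; that SeaweedFS returns the same layout is an assumption. The function names are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def assume_role_query(role_arn: str, session_name: str) -> str:
    """Build the AssumeRole query string; ':' and '/' in the ARN get escaped."""
    return urllib.parse.urlencode({
        "Action": "AssumeRole",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "Version": "2011-06-15",
    })


def parse_credentials(xml_text: str) -> dict:
    """Extract credential fields from an STS-style XML response.

    Matches on local tag names, so it works with or without an XML namespace.
    """
    wanted = {"AccessKeyId", "SecretAccessKey", "SessionToken", "Expiration"}
    found = {}
    for element in ET.fromstring(xml_text).iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in wanted:
            found[local_name] = (element.text or "").strip()
    return found


# Assumed response layout (AWS STS style); all values are placeholders.
SAMPLE = """<AssumeRoleResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <AssumeRoleResult>
    <Credentials>
      <AccessKeyId>EXAMPLEKEYID</AccessKeyId>
      <SecretAccessKey>EXAMPLESECRET</SecretAccessKey>
      <SessionToken>EXAMPLETOKEN</SessionToken>
      <Expiration>2024-01-01T12:00:00Z</Expiration>
    </Credentials>
  </AssumeRoleResult>
</AssumeRoleResponse>"""

query = assume_role_query(
    "arn:aws:iam::000000000000:role/LakekeeperVendedRole", "lakekeeper-session")
print("http://localhost:8333/?" + query)
print(parse_credentials(SAMPLE)["AccessKeyId"])  # EXAMPLEKEYID
```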
S3 Tables Operations
Lakekeeper interacts with SeaweedFS via the S3 Tables REST API using SigV4 signing with the s3tables service name:
# Create a table bucket
PUT /buckets
Content-Type: application/x-amz-json-1.1
{"name": "iceberg-tables"}
# Create a namespace
PUT /namespaces/{bucketARN}
{"namespace": ["my_namespace"]}
# Create a table
PUT /tables/{bucketARN}/{namespace}
{"name": "my_table", "format": "ICEBERG"}
All requests must be signed with SigV4 using the s3tables service name and the appropriate region.
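The service name matters because SigV4 mixes it into both the credential scope and the derived signing key, so a request signed for `s3` will not validate as an `s3tables` request. The sketch below shows the standard SigV4 key derivation using only the Python standard library. It stops short of building the canonical request and the final signature, and the date, region, and `admin` secret are example values taken from this page.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def credential_scope(date_stamp: str, region: str, service: str) -> str:
    """Scope string that appears in the SigV4 Authorization header."""
    return f"{date_stamp}/{region}/{service}/aws4_request"


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key by chaining HMAC-SHA256 over the scope parts."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


tables_key = signing_key("admin", "20240101", "us-east-1", "s3tables")
data_key = signing_key("admin", "20240101", "us-east-1", "s3")
print(credential_scope("20240101", "us-east-1", "s3tables"))
# 20240101/us-east-1/s3tables/aws4_request
print(tables_key != data_key)  # True: each service gets its own signing key
```

In practice an AWS SDK or a SigV4 library would do this derivation; the point here is only that the service name has to be set per API.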
Key Configuration Parameters
| Parameter | Value |
|---|---|
| S3 endpoint | http://localhost:8333 |
| STS endpoint | http://localhost:8333 (same as S3) |
| Region | us-east-1 |
| SigV4 service (S3 Tables) | s3tables |
| SigV4 service (S3 data) | s3 |
| Role ARN | arn:aws:iam::000000000000:role/LakekeeperVendedRole |
See Also
- SeaweedFS Iceberg Catalog - Architecture and concepts
- S3 Tables Security - IAM policies for table access
- S3 Table Bucket - Managing table buckets
- STS Integration Tests - STS API details