S3 Lifecycle — Architecture
High-level overview of the lifecycle worker. For implementation detail, see weed/s3api/s3lifecycle/DESIGN.md.
At a glance
The lifecycle worker runs as a scheduled job. Each invocation:
             ┌──────────────────────────────────────────┐
             │  dailyrun.Run (one filer subscription)   │
             │                                          │
meta-log ──→ │   reader ──→ fan-out ──→ per-shard       │
             │                          channels        │
             │                                          │
             │   ┌──────────────────────────────────┐   │
             │   │  16 shard goroutines             │   │
             │   │   ┌──────────────────────────┐   │   │
             │   │   │ walker(view, shardID)?   │   │   │
             │   │   │ drainShardEvents         │   │   │
             │   │   │ saveCursorAndPublish     │   │   │
             │   │   └──────────────────────────┘   │   │
             │   └──────────────────────────────────┘   │
             │                                          │
             │   summary heartbeat + exit               │
             └──────────────────────────────────────────┘
One filer SubscribeMetadata stream covers every shard in this worker's set. A fan-out goroutine routes events to per-shard channels by ev.ShardID = sha256(bucket || "/" || key) >> 252. Each shard's goroutine independently runs the walker (when due), drains events, and persists its cursor.
Once every shard's goroutine returns, the worker tears down the subscription, emits a summary heartbeat, and exits.
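The routing rule above can be sketched in a few lines of Go. This is an illustration only, not the worker's actual code: the `event` type, the `fanOut` helper, and the channel buffer sizes are hypothetical, and `||` is read as byte concatenation. Shifting a 256-bit digest right by 252 keeps its top 4 bits, which is the high nibble of the first digest byte and yields a shard ID in 0..15.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// numShards matches the 16 shard goroutines in the diagram.
const numShards = 16

// event is a hypothetical stand-in for a meta-log event.
type event struct {
	Bucket, Key string
	TsNs        int64
}

// shardID returns sha256(bucket + "/" + key) >> 252, i.e. the top 4 bits
// of the 256-bit digest, which is the high nibble of its first byte.
func shardID(bucket, key string) int {
	sum := sha256.Sum256([]byte(bucket + "/" + key))
	return int(sum[0] >> 4)
}

// fanOut routes each event to its shard's channel, then closes every
// channel so the shard goroutines can finish draining and return.
func fanOut(in <-chan event, shards []chan event) {
	for ev := range in {
		shards[shardID(ev.Bucket, ev.Key)] <- ev
	}
	for _, ch := range shards {
		close(ch)
	}
}

func main() {
	in := make(chan event, 3)
	shards := make([]chan event, numShards)
	for i := range shards {
		shards[i] = make(chan event, 3) // buffered so this demo needs no consumers
	}
	in <- event{Bucket: "logs", Key: "2024/01/a.gz", TsNs: 1}
	in <- event{Bucket: "logs", Key: "2024/01/b.gz", TsNs: 2}
	in <- event{Bucket: "media", Key: "cat.png", TsNs: 3}
	close(in)
	fanOut(in, shards)
	for id, ch := range shards {
		for ev := range ch {
			fmt.Printf("shard %02d <- %s/%s\n", id, ev.Bucket, ev.Key)
		}
	}
}
```

Because the shard ID depends only on the bucket and key, every event for a given object lands on the same shard, so each shard can keep its own cursor without coordinating with the others.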
Per-shard state
Each shard owns a cursor file on the filer at /etc/s3/lifecycle/daily-cursors/shard-NN.json:
TsNs — last meta-log event for which all matches dispatched successfully
RuleSetHash — ReplayContentHash of the rule set when this cursor was written
PromotedHash — PromotedHash(retentionWindow) at write time
LastWalkedNs — wall-clock of the last successful walker fire
The two hashes together detect every situation that invalidates the cursor: a replay-rule edit (RuleSetHash changes) or a partition flip (PromotedHash changes). On mismatch, the next pass triggers a recovery walk over RecoveryView(snap) to catch already-due objects across the full rule set, then rewinds the cursor.
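As a rough sketch of the cursor and its validity check, the following assumes string-typed hashes and snake_case JSON tags; neither is confirmed by this page, and `needsRecovery` is a hypothetical helper name. It only shows the comparison described above: a mismatch on either hash means the cursor was written under a different rule partition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Cursor mirrors the four fields listed above. The JSON tags and the
// string type of the two hashes are assumptions for illustration.
type Cursor struct {
	TsNs         int64  `json:"ts_ns"`
	RuleSetHash  string `json:"rule_set_hash"`
	PromotedHash string `json:"promoted_hash"`
	LastWalkedNs int64  `json:"last_walked_ns"`
}

// needsRecovery reports whether the persisted cursor was written under a
// different replay rule set (RuleSetHash) or partition (PromotedHash)
// than the one currently in effect.
func needsRecovery(c Cursor, ruleSetHash, promotedHash string) bool {
	return c.RuleSetHash != ruleSetHash || c.PromotedHash != promotedHash
}

func main() {
	c := Cursor{TsNs: 1700000000000000000, RuleSetHash: "aaa", PromotedHash: "p1"}
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))

	fmt.Println(needsRecovery(c, "aaa", "p1")) // false: both hashes match
	fmt.Println(needsRecovery(c, "bbb", "p1")) // true: replay rule edited
	fmt.Println(needsRecovery(c, "aaa", "p2")) // true: partition flipped
}
```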
Replay vs walker
The lifecycle rule space splits two ways:
| Path | Action kinds | Why this path |
|---|---|---|
| Replay (meta-log) | ExpirationDays, NoncurrentDays, AbortMPU | DueTime is monotonic in event TsNs. The done early-stop works. |
| Walker (bucket list) | ExpirationDate, ExpiredObjectDeleteMarker, NewerNoncurrent | DueTime depends on current sibling/version state, not event age. |
The engine's RulesForShard(shardID, retentionWindow) returns two snapshot views (replay, walk); each is a clone of the base snapshot with the action map masked to the right partition. router.Route consumes the replay view per event; the walker consumes the walk view per bucket.
A rule promoted to scan-only because its TTL exceeds meta-log retention moves from replay to walk — visible via PromotedHash.
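The partition, including promotion, might look roughly like this. The `Rule` type, the string-valued kinds, and `pathFor` are hypothetical names for illustration; the real engine masks an action map per view rather than classifying rules one at a time.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// Path identifies which side of the partition handles a rule.
type Path int

const (
	Replay Path = iota // driven by meta-log events
	Walk               // driven by bucket listing
)

// Rule is a hypothetical, simplified lifecycle rule.
type Rule struct {
	Kind string        // action kind, e.g. "ExpirationDays"
	TTL  time.Duration // delay before the action is due (age-based kinds only)
}

// pathFor places age-based kinds on the replay path unless their TTL
// exceeds meta-log retention, in which case the rule is promoted to
// scan-only. All other kinds always go to the walker.
func pathFor(r Rule, retention time.Duration) Path {
	switch r.Kind {
	case "ExpirationDays", "NoncurrentDays", "AbortMPU":
		if r.TTL > retention {
			return Walk // the triggering event would already be gone from the meta-log
		}
		return Replay
	default:
		return Walk
	}
}

func main() {
	retention := 30 * day
	rules := []Rule{
		{Kind: "ExpirationDays", TTL: 7 * day},
		{Kind: "ExpirationDays", TTL: 365 * day},
		{Kind: "ExpiredObjectDeleteMarker"},
	}
	for _, r := range rules {
		fmt.Println(r.Kind, r.TTL, pathFor(r, retention) == Replay)
	}
}
```

Changing the retention window can move a rule across the partition without any edit to the rule itself, which is why the cursor records PromotedHash alongside RuleSetHash.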
Cadence layers
Three independent cadences shape worker behavior:
| Cadence | Set by | Default |
|---|---|---|
| Worker invocation | Admin scheduler DetectionIntervalMinutes | 1440 (daily) |
| Walker fire | walker_interval_minutes admin config | 0 (every invocation) |
| Cursor save | After each runShard | n/a |
The walker throttle decouples walker firing from invocation rate. CI invokes the worker every 2s; production invokes once per day. Both can use the same code with appropriate walker_interval_minutes.
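A minimal sketch of the throttle decision, assuming it compares the cursor's LastWalkedNs against the configured interval. The `walkerDue` name and the treatment of a zero LastWalkedNs as "never walked" are assumptions, not confirmed behavior.

```go
package main

import (
	"fmt"
	"time"
)

// minuteNs is one minute expressed in nanoseconds.
const minuteNs = int64(time.Minute)

// walkerDue reports whether the walker should fire on this invocation.
// An interval of 0 disables the throttle (fire on every invocation).
// A zero lastWalkedNs is treated as "never walked", which is an assumption.
func walkerDue(lastWalkedNs int64, intervalMinutes int64, nowNs int64) bool {
	if intervalMinutes <= 0 || lastWalkedNs == 0 {
		return true
	}
	return nowNs-lastWalkedNs >= intervalMinutes*minuteNs
}

func main() {
	now := time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC).UnixNano()
	last := now - 30*minuteNs // walker last fired 30 minutes ago

	fmt.Println(walkerDue(last, 0, now))  // true: throttle disabled
	fmt.Println(walkerDue(last, 60, now)) // false: only 30 of 60 minutes elapsed
	fmt.Println(walkerDue(last, 30, now)) // true: interval has elapsed
}
```

With a check of this shape, a CI driver invoking every 2s and a daily production schedule differ only in configuration: the first sets a nonzero interval to keep the walker off the filer, the second leaves it at 0.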
Failure model
- Worker crash mid-run. Cursor only advances past events whose matches all succeeded. On restart, the next pass resumes at the same cursor. Identity-CAS makes redundant deletes no-ops.
- Transient delete failure. Pass halts at the failing event, cursor persists. Next pass retries from there. Head-of-line blocking is intentional — surfaces real problems instead of silently retrying forever.
- Rule edits. Replay-rule edits trigger one-time recovery walk over RecoveryView. Walker-only rule edits don't change either hash; walker reads the new rules on its next steady-state fire.
- Object overwritten between event and delete. LifecycleDelete RPC's identity-CAS returns NOOP_RESOLVED; cursor advances normally.
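The crash and transient-failure cases both follow from one rule: the cursor only moves past an event after that event's dispatch succeeded. A simplified sketch, with hypothetical `Event`, `drain`, and `errTransient` names and a single dispatch per event standing in for "all matches":

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient simulates a delete that fails on this pass.
var errTransient = errors.New("transient delete failure")

// Event is a hypothetical meta-log event with its timestamp.
type Event struct {
	TsNs int64
	Key  string
}

// drain dispatches events in order and returns the new cursor position.
// It halts at the first failure, leaving the cursor at the last event that
// fully succeeded, so the next pass retries from the failing event.
func drain(cursor int64, events []Event, dispatch func(Event) error) (int64, error) {
	for _, ev := range events {
		if ev.TsNs <= cursor {
			continue // already handled by an earlier pass
		}
		if err := dispatch(ev); err != nil {
			return cursor, err // head-of-line blocking: do not skip ahead
		}
		cursor = ev.TsNs
	}
	return cursor, nil
}

func main() {
	events := []Event{{10, "a"}, {20, "b"}, {30, "c"}}
	failB := true
	dispatch := func(ev Event) error {
		if ev.Key == "b" && failB {
			return errTransient
		}
		return nil
	}

	cur, err := drain(0, events, dispatch)
	fmt.Println(cur, err) // 10 transient delete failure

	failB = false // the fault clears before the next pass
	cur, err = drain(cur, events, dispatch)
	fmt.Println(cur, err) // 30 <nil>
}
```

Re-dispatching an event after a crash is safe only because the delete itself is idempotent, which is the role identity-CAS plays in the bullets above.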
Rate limiting
Cluster-wide cap allocated per worker at job dispatch:
per_worker_rate = cluster_deletes_per_second / count(active_s3_lifecycle_workers)
Each worker shares one rate.Limiter across all shard goroutines. dispatchWithRetry calls limiter.Wait(ctx) before each LifecycleDelete RPC.
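The allocation is plain division, worked through below. The function names are hypothetical, and the guard against a zero worker count is an assumption. The `rate.Limiter` mentioned above presumably comes from golang.org/x/time/rate; this sketch stays in the standard library and only derives the numbers such a limiter would be configured with.

```go
package main

import (
	"fmt"
	"time"
)

// perWorkerRate splits the cluster-wide delete budget evenly across the
// active lifecycle workers. Treating fewer than one worker as one is an
// assumption made to avoid dividing by zero.
func perWorkerRate(clusterDeletesPerSecond float64, activeWorkers int) float64 {
	if activeWorkers < 1 {
		activeWorkers = 1
	}
	return clusterDeletesPerSecond / float64(activeWorkers)
}

// minSpacing is the average gap between deletes that a limiter running at
// the given rate enforces across all of a worker's shard goroutines.
func minSpacing(ratePerSecond float64) time.Duration {
	return time.Duration(float64(time.Second) / ratePerSecond)
}

func main() {
	r := perWorkerRate(100, 4)
	fmt.Println(r)             // 25
	fmt.Println(minSpacing(r)) // 40ms
}
```

Since the limiter is shared per worker rather than per shard, a worker whose 16 shards are all busy still issues at most its allotted share, and a single hot shard can use the whole share when the others are idle.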
Components
| Path | Role |
|---|---|
| engine/ | Rule compilation, partition views (RulesForShard, RecoveryView) |
| evaluate.go | Per-event rule evaluation (EvaluateAction) |
| due_at.go | Per-(rule, kind, info) due-time computation |
| router/router.go | Per-event match emission (calls engine.Action and EvaluateAction) |
| reader/reader.go | Meta-log subscribe with ShardPredicate |
| bootstrap/walker.go | Bucket-walker with RunForShard filter |
| dailyrun/run.go | Main orchestrator: subscription, fan-out, per-shard runShard |
| dailyrun/cursor.go | Cursor type + filer JSON serializer |
| dailyrun/walker_dispatcher.go | Walker-to-LifecycleDelete adapter |
What it's not
- Not a streaming dispatcher. The earlier model kept a long-running goroutine per shard with an in-memory match heap. That code is gone. Worker is now "start, do today's work, stop."
- Not event-time accurate. Latency from PUT to delete is bounded by the worker invocation cadence plus the walker interval — typically up to 24h, not seconds.
- Not a general-purpose scheduler. The two action paths (replay, walker) are specific to lifecycle semantics. Don't add new event sources or actions without thinking through which path they belong on.
Why this shape
Each design choice points back to a specific failure mode of the prior streaming worker:
| Choice | Replaces |
|---|---|
| Per-pass run + exit | Long-running goroutines with ticker drift, leak risk, restart pain |
| Cursor file per shard | Per-key freeze state, retry counters, in-memory heap on every restart |
| Identity-CAS at dispatch time | Pre-dispatch consistency checks at schedule time, racing object updates |
| Recovery branch over RecoveryView | Implicit "is this rule new" tracking with bookkeeping flags |
| Walker throttle independent of invocation | Walker hammering filer when test driver invokes every 2s |
| Single subscription per pass | 16x filer load with 16 per-shard subscriptions |
The result is a worker the operator can reason about by reading 2 metrics and a heartbeat line, with a state machine small enough to fit in one design doc.