Files
seaweedfs/seaweed-volume
Chris Lu 49e83a26cb feat(seaweed-volume): auto-load EC shards on startup (#9212) (#9251)
* feat(seaweed-volume): auto-load EC shards on startup

The Rust volume server's load_existing_volumes only scanned .dat
files; EC shards on disk stayed invisible until something explicitly
issued VolumeEcShardsMount. Strict superset of the issue
seaweedfs/seaweedfs#9212 reports for Go: after a fresh restart, every
local EC shard was missing from the master's view.

Port loadAllEcShards from weed/storage/disk_location_ec.go:

- DiskLocation::load_all_ec_shards walks Directory (and IdxDirectory
  if separate) sorted, groups .ec?? shard files by (collection, vid),
  validates and mounts each group when its matching .ecx is found.
- handle_found_ecx_file: validate_ec_volume + mount_ec_shards path,
  with cleanup when .dat exists and validation fails (incomplete
  encoding) or load fails.
- check_orphaned_shards: cleans up shard remnants whose .ecx never
  arrived AND whose stale .dat is still present (interrupted
  encoding); leaves them on disk otherwise so cross-disk
  reconciliation / operator recovery can find them.
- check_dat_file_exists / parse_collection_volume_id /
  parse_ec_shard_extension: small helpers mirroring Go's checkDatFileExists,
  parseCollectionVolumeId, and the `\.ec\d{2,3}` regex.
- Wire through load_existing_volumes after the .dat scan; failures
  log but don't fail the disk's startup.

Tests:
- test_parse_ec_shard_extension covers .ec00–.ec255 and the rejection
  of .ec0, .ec999, .ecx, .ecj, .dat, and missing leading dot.
- test_load_all_ec_shards_mounts_pairs_with_ecx: shards + .ecx + .vif
  on disk get mounted into ec_volumes after load_existing_volumes.
- test_load_all_ec_shards_keeps_orphan_shards_when_no_dat: orphan
  shards (no .ecx, no .dat) stay on disk untouched
  (distributed-EC scenario).
- test_load_all_ec_shards_cleans_orphan_shards_when_dat_exists:
  orphan shards alongside a stale .dat get cleaned up
  (interrupted-encoding scenario).

Prerequisite for porting the cross-disk orphan-shard reconciliation
in seaweedfs/seaweedfs#9244 to Rust.

* fix(seaweed-volume): dedupe filenames when scanning data + idx dirs

load_all_ec_shards scans both `directory` and `idx_directory` (when
they differ) so the loop can pair `.ec??` shards with their `.ecx`
regardless of which dir owns the index. If the same filename is
present in both — possible in idempotent legacy layouts that
pre-date `-dir.idx` — the previous implementation processed it
twice. mount_ec_shards increments the per-shard `ec_shards` metric
inside the loop, so a duplicated `.ec??` entry would double-count
the gauge.

Use a HashSet<String> while accumulating entries so each filename
is processed exactly once.

Reported in PR #9251 review by @gemini-code-assist.

* fix(seaweed-volume): drive partial-mount cleanup through unmount_ec_shards

handle_found_ecx_file calls mount_ec_shards which adds shards one at
a time. mount_ec_shards increments the `ec_shards` gauge per shard
that successfully attaches. If mount fails halfway, plain
ec_volumes.remove(vid) drops the EcVolume but leaves the gauge
incremented for whatever did mount.

Drive the cleanup branches through unmount_ec_shards instead — it
mirror-decrements the gauge per shard and only then drops the
EcVolume. Same shape applied to both .dat-exists and distributed-EC
fallbacks.

Reported in PR #9251 review by @gemini-code-assist.

* docs(seaweed-volume): clarify parse_ec_shard_extension shard-id range

Doc previously said `.ec00`–`.ec999` but the implementation rejects
any shard id > 255 (matches the `EcVolumeShard` u8 typed shard id
and Go's `strconv.ParseInt(... 10, 64)` + `> 255` guard). Fix the
doc to say `.ec00`–`.ec255` and explain why the 3-digit form is
still recognised.

Reported in PR #9251 review by @coderabbitai.
2026-04-27 16:41:46 -07:00
..

SeaweedFS Volume Server (Rust)

A drop-in replacement for the SeaweedFS Go volume server, rewritten in Rust. It uses binary-compatible storage formats (.dat, .idx, .vif) and speaks the same HTTP and gRPC protocols, so it works with an unmodified Go master server.

Building

Requires Rust 1.75+ (2021 edition).

cd seaweed-volume
cargo build --release

The binary is produced at target/release/seaweed-volume.

Running

Start a Go master server first, then point the Rust volume server at it:

# Minimal
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7

# Multiple data directories
seaweed-volume --port 8080 --master localhost:9333 \
  --dir /mnt/ssd1,/mnt/ssd2 --max 100,100 --disk ssd

# With datacenter/rack topology
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --dataCenter dc1 --rack rack1

# With JWT authentication
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --securityFile /etc/seaweedfs/security.toml

# With TLS (configured in security.toml via [https.volume] and [grpc.volume] sections)
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --securityFile /etc/seaweedfs/security.toml

Common flags

Flag Default Description
--port 8080 HTTP listen port
--port.grpc port+10000 gRPC listen port
--master localhost:9333 Comma-separated master server addresses
--dir /tmp Comma-separated data directories
--max 8 Max volumes per directory (comma-separated)
--ip auto-detect Server IP / identifier
--ip.bind same as --ip Bind address
--dataCenter Datacenter name
--rack Rack name
--disk Disk type tag: hdd, ssd, or custom
--index memory Needle map type: memory, leveldb, leveldbMedium, leveldbLarge
--readMode proxy Non-local read mode: local, proxy, redirect
--fileSizeLimitMB 256 Max upload file size
--minFreeSpace 1 (percent) Min free disk space before marking volumes read-only
--securityFile Path to security.toml for JWT keys and TLS certs
--metricsPort 0 (disabled) Prometheus metrics endpoint port
--whiteList Comma-separated IPs with write permission
--preStopSeconds 10 Graceful drain period before shutdown
--compactionMBps 0 (unlimited) Compaction I/O rate limit
--pprof false Enable pprof HTTP handlers

Set RUST_LOG=debug (or trace, info, warn) for log level control. Set SEAWEED_WRITE_QUEUE=1 to enable batched async write processing.

Features

  • Binary compatible -- reads and writes the same .dat/.idx/.vif files as the Go server; seamless migration with no data conversion.
  • HTTP + gRPC -- full implementation of the volume server HTTP API and all gRPC RPCs including streaming operations (copy, tail, incremental copy, vacuum).
  • Master heartbeat -- bidirectional streaming heartbeat with the Go master server; volume and EC shard registration, leader failover, graceful shutdown deregistration.
  • JWT authentication -- signing key configuration via security.toml with token source precedence (query > header > cookie), file_id claims validation, and separate read/write keys.
  • TLS -- HTTPS for the HTTP API and mTLS for gRPC, configured through security.toml.
  • Erasure coding -- Reed-Solomon EC shard management: mount/unmount, read, rebuild, copy, delete, and shard-to-volume reconstruction.
  • S3 remote storage -- FetchAndWriteNeedle reads from any S3-compatible backend (AWS, MinIO, Wasabi, Backblaze, etc.) and writes locally. Supports VolumeTierMoveDatToRemote/FromRemote for tiered storage.
  • Needle map backends -- in-memory HashMap, LevelDB (via rusty-leveldb), or redb (pure Rust disk-backed) needle maps.
  • Image processing -- on-the-fly resize/crop, JPEG EXIF orientation auto-fix, WebP support.
  • Streaming reads -- large files (>1MB) are streamed via spawn_blocking to avoid blocking the async runtime.
  • Auto-compression -- compressible file types (text, JSON, CSS, JS, SVG, etc.) are gzip-compressed on upload.
  • Prometheus metrics -- counters, histograms, and gauges exported at a dedicated metrics port; optional push gateway support.
  • Graceful shutdown -- SIGINT/SIGTERM handling with configurable preStopSeconds drain period.

Testing

Rust unit tests

cd seaweed-volume
cargo test

Go integration tests

The Go test suite can target either the Go or Rust volume server via the VOLUME_SERVER_IMPL environment variable:

# Run all HTTP + gRPC integration tests against the Rust server
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 1200s \
  ./test/volume_server/grpc/... ./test/volume_server/http/...

# Run a single test
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 60s \
  -run "TestName" ./test/volume_server/http/...

# Run S3 remote storage tests
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 180s \
  -run "TestFetchAndWriteNeedle" ./test/volume_server/grpc/...

Load testing

A load test harness is available at test/volume_server/loadtest/. See that directory for usage instructions and scenarios.

Architecture

The server runs three listeners concurrently:

  • HTTP (Axum 0.7) -- admin and public routers for file upload/download, status, and stats endpoints.
  • gRPC (Tonic 0.12) -- all VolumeServer RPCs from the SeaweedFS protobuf definition.
  • Metrics (optional) -- Prometheus scrape endpoint on a separate port.

Key source modules:

Path Description
src/main.rs Entry point, server startup, signal handling
src/config.rs CLI parsing and configuration resolution
src/server/volume_server.rs HTTP router setup and middleware
src/server/handlers.rs HTTP request handlers (read, write, delete, status)
src/server/grpc_server.rs gRPC service implementation
src/server/heartbeat.rs Master heartbeat loop
src/storage/volume.rs Volume read/write/delete logic
src/storage/needle.rs Needle (file entry) serialization
src/storage/store.rs Multi-volume store management
src/security.rs JWT validation and IP whitelist guard
src/remote_storage/ S3 remote storage backend

See DEV_PLAN.md for the full development history and feature checklist.