Files
seaweedfs/seaweed-volume
Chris Lu 3a8389cd68 fix(ec): verify full shard set before deleting source volume (#9490) (#9493)
* fix(ec): verify full shard set before deleting source volume (#9490)

Before this change, both the worker EC task and the shell ec.encode
command would delete the source .dat as soon as MountEcShards returned —
even if distribute/mount failed partway, leaving fewer than 14 shards
in the cluster. The deletion was logged at V(2), so by the time someone
noticed missing data the only trace was a 0-byte .dat synthesized by
disk_location at next restart.

- Worker path adds Step 6: poll VolumeEcShardsInfo on every destination,
  union the bitmaps, and refuse to call deleteOriginalVolume unless all
  TotalShardsCount distinct shard ids are observed. A failed gate leaves
  the source readonly so the next detection scan can retry.
- Shell ec.encode adds the same gate after EcBalance, walking the master
  topology with collectEcNodeShardsInfo.
- VolumeDelete RPC success and .dat/.idx unlinks now log at V(0) so any
  source destruction is traceable in default-verbosity production logs.

The EC-balance-vs-in-flight-encode race is intentionally left for a
follow-up; balance should refuse to move shards for a volume whose
encode job is not in Completed state.

* fix(ec): trim doc comments on the new shard-verification path

Drop WHAT-describing godoc on freshly added helpers; keep only the WHY
notes (query-error policy in VerifyShardsAcrossServers, the #9490
reference at the call sites).

* fix(ec): drop issue-number anchors from new comments

Issue references age poorly — the why behind each comment already
stands on its own.

* fix(ec): parametrize RequireFullShardSet on totalShards

Take totalShards as an argument instead of reading the package-level
TotalShardsCount constant. The OSS callers continue to pass 14, but the
helper is now usable with any DataShards+ParityShards ratio.

* test(plugin_workers): make fake volume server respond to VolumeEcShardsInfo

The new pre-delete verification gate calls VolumeEcShardsInfo on every
destination after mount, and the fake server's UnimplementedVolumeServer
returns Unimplemented — the verifier read that as zero shards on every
node and aborted source deletion. Build the response from recorded
mount requests so the integration test exercises the gate end-to-end.

* fix(rust/volume): log .dat/.idx unlink with size in remove_volume_files

Mirror the Go-side change in weed/storage/volume_write.go: stat each
file before removing and emit an info-level log for .dat/.idx so a
destructive call is always traceable. The OSS Rust crate previously
unlinked them silently.

* fix(ec/decode): verify regenerated .dat before deleting EC shards

After mountDecodedVolume succeeds, the previous code immediately
unmounts and deletes every EC shard. A silent failure in generate or
mount could leave the cluster with neither shards nor a valid normal
volume. Probe ReadVolumeFileStatus on the target and refuse to proceed
if dat or idx is 0 bytes.

Also make the fake volume server's VolumeEcShardsInfo reflect whichever
shard files exist on disk (seeded for tests as well as mounted via
RPC), so the new gate can be exercised end-to-end.

* fix(ec): address PR review nits in verification + fake server

- Drop unused ServerShardInventory.Sizes field.
- Skip shard ids >= MaxShardCount before bitmap Set so the ShardBits
  bound is explicit (Set already no-ops on overflow, this is for
  clarity).
- Nil-guard the fake server's VolumeEcShardsInfo so a malformed call
  doesn't panic the test process.
2026-05-13 19:29:24 -07:00
..

SeaweedFS Volume Server (Rust)

A drop-in replacement for the SeaweedFS Go volume server, rewritten in Rust. It uses binary-compatible storage formats (.dat, .idx, .vif) and speaks the same HTTP and gRPC protocols, so it works with an unmodified Go master server.

Building

Requires Rust 1.75+ (2021 edition).

cd seaweed-volume
cargo build --release

The binary is produced at target/release/seaweed-volume.

Running

Start a Go master server first, then point the Rust volume server at it:

# Minimal
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7

# Multiple data directories
seaweed-volume --port 8080 --master localhost:9333 \
  --dir /mnt/ssd1,/mnt/ssd2 --max 100,100 --disk ssd

# With datacenter/rack topology
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --dataCenter dc1 --rack rack1

# With JWT authentication
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --securityFile /etc/seaweedfs/security.toml

# With TLS (configured in security.toml via [https.volume] and [grpc.volume] sections)
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --securityFile /etc/seaweedfs/security.toml

Common flags

Flag Default Description
--port 8080 HTTP listen port
--port.grpc port+10000 gRPC listen port
--master localhost:9333 Comma-separated master server addresses
--dir /tmp Comma-separated data directories
--max 8 Max volumes per directory (comma-separated)
--ip auto-detect Server IP / identifier
--ip.bind same as --ip Bind address
--dataCenter Datacenter name
--rack Rack name
--disk Disk type tag: hdd, ssd, or custom
--index memory Needle map type: memory, leveldb, leveldbMedium, leveldbLarge
--readMode proxy Non-local read mode: local, proxy, redirect
--fileSizeLimitMB 256 Max upload file size
--minFreeSpace 1 (percent) Min free disk space before marking volumes read-only
--securityFile Path to security.toml for JWT keys and TLS certs
--metricsPort 0 (disabled) Prometheus metrics endpoint port
--whiteList Comma-separated IPs with write permission
--preStopSeconds 10 Graceful drain period before shutdown
--compactionMBps 0 (unlimited) Compaction I/O rate limit
--pprof false Enable pprof HTTP handlers

Set RUST_LOG=debug (or trace, info, warn) for log level control. Set SEAWEED_WRITE_QUEUE=1 to enable batched async write processing.

Features

  • Binary compatible -- reads and writes the same .dat/.idx/.vif files as the Go server; seamless migration with no data conversion.
  • HTTP + gRPC -- full implementation of the volume server HTTP API and all gRPC RPCs including streaming operations (copy, tail, incremental copy, vacuum).
  • Master heartbeat -- bidirectional streaming heartbeat with the Go master server; volume and EC shard registration, leader failover, graceful shutdown deregistration.
  • JWT authentication -- signing key configuration via security.toml with token source precedence (query > header > cookie), file_id claims validation, and separate read/write keys.
  • TLS -- HTTPS for the HTTP API and mTLS for gRPC, configured through security.toml.
  • Erasure coding -- Reed-Solomon EC shard management: mount/unmount, read, rebuild, copy, delete, and shard-to-volume reconstruction.
  • S3 remote storage -- FetchAndWriteNeedle reads from any S3-compatible backend (AWS, MinIO, Wasabi, Backblaze, etc.) and writes locally. Supports VolumeTierMoveDatToRemote/FromRemote for tiered storage.
  • Needle map backends -- in-memory HashMap, LevelDB (via rusty-leveldb), or redb (pure Rust disk-backed) needle maps.
  • Image processing -- on-the-fly resize/crop, JPEG EXIF orientation auto-fix, WebP support.
  • Streaming reads -- large files (>1MB) are streamed via spawn_blocking to avoid blocking the async runtime.
  • Auto-compression -- compressible file types (text, JSON, CSS, JS, SVG, etc.) are gzip-compressed on upload.
  • Prometheus metrics -- counters, histograms, and gauges exported at a dedicated metrics port; optional push gateway support.
  • Graceful shutdown -- SIGINT/SIGTERM handling with configurable preStopSeconds drain period.

Testing

Rust unit tests

cd seaweed-volume
cargo test

Go integration tests

The Go test suite can target either the Go or Rust volume server via the VOLUME_SERVER_IMPL environment variable:

# Run all HTTP + gRPC integration tests against the Rust server
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 1200s \
  ./test/volume_server/grpc/... ./test/volume_server/http/...

# Run a single test
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 60s \
  -run "TestName" ./test/volume_server/http/...

# Run S3 remote storage tests
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 180s \
  -run "TestFetchAndWriteNeedle" ./test/volume_server/grpc/...

Load testing

A load test harness is available at test/volume_server/loadtest/. See that directory for usage instructions and scenarios.

Architecture

The server runs three listeners concurrently:

  • HTTP (Axum 0.7) -- admin and public routers for file upload/download, status, and stats endpoints.
  • gRPC (Tonic 0.12) -- all VolumeServer RPCs from the SeaweedFS protobuf definition.
  • Metrics (optional) -- Prometheus scrape endpoint on a separate port.

Key source modules:

Path Description
src/main.rs Entry point, server startup, signal handling
src/config.rs CLI parsing and configuration resolution
src/server/volume_server.rs HTTP router setup and middleware
src/server/handlers.rs HTTP request handlers (read, write, delete, status)
src/server/grpc_server.rs gRPC service implementation
src/server/heartbeat.rs Master heartbeat loop
src/storage/volume.rs Volume read/write/delete logic
src/storage/needle.rs Needle (file entry) serialization
src/storage/store.rs Multi-volume store management
src/security.rs JWT validation and IP whitelist guard
src/remote_storage/ S3 remote storage backend

See DEV_PLAN.md for the full development history and feature checklist.