mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-06-13 23:36:45 +03:00

Chris Lu c4e1885053 fix(ec): honor disk_id in ReceiveFile so EC shards respect admin placement (#9184) (#9185)

* test(volume_server): reproduce #9184 EC ReceiveFile disk-placement bug

The plugin-worker EC task sends shards via ReceiveFile, which picks
Locations[0] as the target directory regardless of the admin planner's
TargetDisk assignment. ReceiveFileInfo has no disk_id field, so there
is no wire channel to honor the plan.

Adds StartSingleVolumeClusterWithDataDirs to the integration framework
so tests can launch a volume server with N data directories. The new
repro asserts the current (buggy) behavior: sending three distinct EC
shards via ReceiveFile leaves all three files in dir[0] and the other
dirs empty. When the fix adds disk_id to ReceiveFileInfo, this
assertion must flip to verify the planned placement is respected.

* fix(ec): honor disk_id in ReceiveFile so EC shards respect admin placement

Before this change, VolumeServer.ReceiveFile for EC shards always
selected the first HDD location (Locations[0]). The plugin-worker EC
task had no way to pass the admin planner's per-shard disk
assignment — ReceiveFileInfo carried no disk_id field — so every
received EC shard piled onto a single disk per destination server.
On multi-disk servers this caused uneven load (one disk absorbing all
EC shard I/O), frequent ENOSPC retries, and a growing EC backlog
under sustained ingest (see issue #9184).

Changes:
- proto: add disk_id to ReceiveFileInfo, mirroring
  VolumeEcShardsCopyRequest.disk_id.
- worker: DistributeEcShards tracks the planner-assigned disk per
  shard; sendShardFileToDestination forwards that disk id. Metadata
  files (ecx/ecj/vif) inherit the disk of the first data shard
  targeting the same node so they land next to the shards.
- server: ReceiveFile honors disk_id when > 0 with bounds
  validation; disk_id=0 (unset) falls back to the same
  auto-selection pattern as VolumeEcShardsCopy (prefer disk that
  already has shards for this volume, then any HDD with free space,
  then any location with free space).
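The server-side selection order described above can be sketched roughly as follows. This is a minimal illustration, not the actual SeaweedFS code: the `Location` type, its fields, and `pickLocation` are hypothetical names, and the sketch assumes disk_id is a direct index into the server's location list, as the tests below suggest.

```go
package main

import (
	"errors"
	"fmt"
)

// Location is a hypothetical stand-in for one data directory on a volume server.
type Location struct {
	Dir          string
	IsHDD        bool
	HasFreeSpace bool
	EcVolumes    map[uint32]bool // volume ids that already have EC shards here
}

var errNoSpace = errors.New("no location with free space")

// pickLocation returns the index of the location that should receive an EC
// shard. diskID > 0 is an explicit request; diskID == 0 means "unset".
func pickLocation(locs []Location, diskID, volumeID uint32) (int, error) {
	if diskID > 0 {
		// Bounds check in the uint32 domain before indexing.
		if diskID >= uint32(len(locs)) {
			return -1, fmt.Errorf("disk_id %d out of range (%d locations)", diskID, len(locs))
		}
		return int(diskID), nil
	}
	// 1. Prefer a disk that already holds shards for this volume.
	for i, l := range locs {
		if l.EcVolumes[volumeID] {
			return i, nil
		}
	}
	// 2. Then any HDD with free space.
	for i, l := range locs {
		if l.IsHDD && l.HasFreeSpace {
			return i, nil
		}
	}
	// 3. Then any location with free space.
	for i, l := range locs {
		if l.HasFreeSpace {
			return i, nil
		}
	}
	return -1, errNoSpace
}

func main() {
	locs := []Location{
		{Dir: "/data/volume0", IsHDD: true, HasFreeSpace: true},
		{Dir: "/data/volume1", IsHDD: true, HasFreeSpace: true},
		{Dir: "/data/volume2", IsHDD: true, HasFreeSpace: true},
	}
	for _, id := range []uint32{1, 2, 0, 9} {
		idx, err := pickLocation(locs, id, 7)
		fmt.Printf("disk_id=%d -> index=%d err=%v\n", id, idx, err)
	}
}
```

Because disk_id=0 doubles as "unset", location 0 can only be reached through auto-selection in this sketch.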

Tests updated:
- TestReceiveFileEcShardHonorsDiskID asserts three shards sent with
  disk_id={1,2,0} land on data dirs 1, 2, and 0 respectively.
- TestReceiveFileEcShardRejectsInvalidDiskID pins the out-of-range
  disk_id rejection path.

* fix(volume-rust): honor disk_id in ReceiveFile for EC shards

Mirror the Go-side change: when disk_id > 0 place the EC shard on the
requested disk; when unset, auto-select with the same preference order
as volume_ec_shards_copy (disk already holding shards, then any HDD,
then any disk).

* fix(volume): compare disk_id as uint32 to avoid 32-bit overflow

On 32-bit Go builds `int(fileInfo.DiskId) >= len(Locations)` can wrap a
high-bit uint32 to a negative int, bypassing the bounds check before the
index operation. Compare in the uint32 domain instead.
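A small sketch of why the comparison domain matters. Since `int` is 64 bits on most development machines, the example uses `int32` explicitly to reproduce what `int` does on a 32-bit build; the function names are illustrative only.

```go
package main

import "fmt"

// inRangeNarrow mimics the pre-fix check on a platform where int is 32 bits.
// Converting a high-bit uint32 to a signed 32-bit integer wraps to a negative
// value, which then compares as "less than the length".
func inRangeNarrow(diskID uint32, n int) bool {
	return int32(diskID) < int32(n)
}

// inRangeUint32 performs the comparison in the uint32 domain, so no value of
// diskID can wrap.
func inRangeUint32(diskID uint32, n int) bool {
	return diskID < uint32(n)
}

func main() {
	var diskID uint32 = 0x80000000 // high bit set
	fmt.Println(inRangeNarrow(diskID, 3)) // true: the bounds check is bypassed
	fmt.Println(inRangeUint32(diskID, 3)) // false: correctly rejected
}
```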

* test(ec): fail invalid-disk_id test on transport error

Previously a transport-level error from CloseAndRecv silently passed the
test by returning early, masking any real gRPC failure. Fail loudly so
only the structured ReceiveFileResponse rejection path counts as a pass.

* docs(test): explain why DiskId=0 auto-selects dir 0 in EC placement test

Documents the load-bearing assumption that shards are never mounted in
this test, so loc.FindEcVolume always returns false and auto-select
falls through to the first HDD. Saves future readers from re-deriving
the expected directory for the DiskId=0 case.

* fix(test): preserve baseDir/volume path for single-dir clusters

StartSingleVolumeClusterWithDataDirs started naming the data directory
volume0 even in the dataDirCount=1 case, which broke Scrub tests that
reach into baseDir/volume via CorruptDatFile / CorruptEcShardFile /
CorruptEcxFile. Keep the legacy name for single-dir clusters; only use
the indexed "volumeN" layout when multiple disks are requested.

2026-04-22 10:30:13 -07:00

README.md

SeaweedFS Volume Server (Rust)

A drop-in replacement for the SeaweedFS Go volume server, rewritten in Rust. It uses binary-compatible storage formats (.dat, .idx, .vif) and speaks the same HTTP and gRPC protocols, so it works with an unmodified Go master server.

Building

Requires Rust 1.75+ (2021 edition).

cd seaweed-volume
cargo build --release

The binary is produced at target/release/seaweed-volume.

Running

Start a Go master server first, then point the Rust volume server at it:

# Minimal
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7

# Multiple data directories
seaweed-volume --port 8080 --master localhost:9333 \
  --dir /mnt/ssd1,/mnt/ssd2 --max 100,100 --disk ssd

# With datacenter/rack topology
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --dataCenter dc1 --rack rack1

# With JWT authentication
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --securityFile /etc/seaweedfs/security.toml

# With TLS (configured in security.toml via [https.volume] and [grpc.volume] sections)
seaweed-volume --port 8080 --master localhost:9333 --dir /data/vol1 --max 7 \
  --securityFile /etc/seaweedfs/security.toml

Common flags

| Flag | Default | Description |
|---|---|---|
| `--port` | `8080` | HTTP listen port |
| `--port.grpc` | `port+10000` | gRPC listen port |
| `--master` | `localhost:9333` | Comma-separated master server addresses |
| `--dir` | `/tmp` | Comma-separated data directories |
| `--max` | `8` | Max volumes per directory (comma-separated) |
| `--ip` | auto-detect | Server IP / identifier |
| `--ip.bind` | same as `--ip` | Bind address |
| `--dataCenter` | | Datacenter name |
| `--rack` | | Rack name |
| `--disk` | | Disk type tag: `hdd`, `ssd`, or custom |
| `--index` | `memory` | Needle map type: `memory`, `leveldb`, `leveldbMedium`, `leveldbLarge` |
| `--readMode` | `proxy` | Non-local read mode: `local`, `proxy`, `redirect` |
| `--fileSizeLimitMB` | `256` | Max upload file size |
| `--minFreeSpace` | `1` (percent) | Min free disk space before marking volumes read-only |
| `--securityFile` | | Path to `security.toml` for JWT keys and TLS certs |
| `--metricsPort` | `0` (disabled) | Prometheus metrics endpoint port |
| `--whiteList` | | Comma-separated IPs with write permission |
| `--preStopSeconds` | `10` | Graceful drain period before shutdown |
| `--compactionMBps` | `0` (unlimited) | Compaction I/O rate limit |
| `--pprof` | `false` | Enable pprof HTTP handlers |
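To illustrate how the comma-separated `--dir` and `--max` values pair up, and how the gRPC port default is derived, here is a rough Go sketch. It is not the server's actual parser: `DiskConfig`, `parseDisks`, and `grpcPort` are hypothetical, and applying a single `--max` value to every directory is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// DiskConfig pairs one data directory with its volume limit (hypothetical type).
type DiskConfig struct {
	Dir        string
	MaxVolumes int
}

// parseDisks pairs each entry of --dir with the matching entry of --max.
// Assumption: a single --max value is applied to every directory.
func parseDisks(dirFlag, maxFlag string) ([]DiskConfig, error) {
	dirs := strings.Split(dirFlag, ",")
	maxes := strings.Split(maxFlag, ",")
	if len(maxes) == 1 {
		for len(maxes) < len(dirs) {
			maxes = append(maxes, maxes[0])
		}
	}
	if len(maxes) != len(dirs) {
		return nil, fmt.Errorf("%d directories but %d max values", len(dirs), len(maxes))
	}
	out := make([]DiskConfig, 0, len(dirs))
	for i, d := range dirs {
		n, err := strconv.Atoi(strings.TrimSpace(maxes[i]))
		if err != nil {
			return nil, fmt.Errorf("invalid --max value %q: %w", maxes[i], err)
		}
		out = append(out, DiskConfig{Dir: strings.TrimSpace(d), MaxVolumes: n})
	}
	return out, nil
}

// grpcPort applies the documented default: HTTP port + 10000 when unset (0).
func grpcPort(httpPort, grpcFlag int) int {
	if grpcFlag != 0 {
		return grpcFlag
	}
	return httpPort + 10000
}

func main() {
	disks, err := parseDisks("/mnt/ssd1,/mnt/ssd2", "100,100")
	fmt.Println(disks, err)
	fmt.Println(grpcPort(8080, 0))
}
```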

Set RUST_LOG=debug (or trace, info, warn) for log level control. Set SEAWEED_WRITE_QUEUE=1 to enable batched async write processing.

Features

Binary compatible -- reads and writes the same .dat/.idx/.vif files as the Go server; seamless migration with no data conversion.
HTTP + gRPC -- full implementation of the volume server HTTP API and all gRPC RPCs including streaming operations (copy, tail, incremental copy, vacuum).
Master heartbeat -- bidirectional streaming heartbeat with the Go master server; volume and EC shard registration, leader failover, graceful shutdown deregistration.
JWT authentication -- signing key configuration via security.toml with token source precedence (query > header > cookie), file_id claims validation, and separate read/write keys.
TLS -- HTTPS for the HTTP API and mTLS for gRPC, configured through security.toml.
Erasure coding -- Reed-Solomon EC shard management: mount/unmount, read, rebuild, copy, delete, and shard-to-volume reconstruction.
S3 remote storage -- FetchAndWriteNeedle reads from any S3-compatible backend (AWS, MinIO, Wasabi, Backblaze, etc.) and writes locally. Supports VolumeTierMoveDatToRemote/FromRemote for tiered storage.
Needle map backends -- in-memory HashMap, LevelDB (via rusty-leveldb), or redb (pure Rust disk-backed) needle maps.
Image processing -- on-the-fly resize/crop, JPEG EXIF orientation auto-fix, WebP support.
Streaming reads -- large files (>1MB) are streamed via spawn_blocking to avoid blocking the async runtime.
Auto-compression -- compressible file types (text, JSON, CSS, JS, SVG, etc.) are gzip-compressed on upload.
Prometheus metrics -- counters, histograms, and gauges exported at a dedicated metrics port; optional push gateway support.
Graceful shutdown -- SIGINT/SIGTERM handling with configurable preStopSeconds drain period.

Testing

Rust unit tests

cd seaweed-volume
cargo test

Go integration tests

The Go test suite can target either the Go or Rust volume server via the VOLUME_SERVER_IMPL environment variable:

# Run all HTTP + gRPC integration tests against the Rust server
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 1200s \
  ./test/volume_server/grpc/... ./test/volume_server/http/...

# Run a single test
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 60s \
  -run "TestName" ./test/volume_server/http/...

# Run S3 remote storage tests
VOLUME_SERVER_IMPL=rust go test -v -count=1 -timeout 180s \
  -run "TestFetchAndWriteNeedle" ./test/volume_server/grpc/...

Load testing

A load test harness is available at test/volume_server/loadtest/. See that directory for usage instructions and scenarios.

Architecture

The server runs three listeners concurrently:

HTTP (Axum 0.7) -- admin and public routers for file upload/download, status, and stats endpoints.
gRPC (Tonic 0.12) -- all VolumeServer RPCs from the SeaweedFS protobuf definition.
Metrics (optional) -- Prometheus scrape endpoint on a separate port.

Key source modules:

| Path | Description |
|---|---|
| `src/main.rs` | Entry point, server startup, signal handling |
| `src/config.rs` | CLI parsing and configuration resolution |
| `src/server/volume_server.rs` | HTTP router setup and middleware |
| `src/server/handlers.rs` | HTTP request handlers (read, write, delete, status) |
| `src/server/grpc_server.rs` | gRPC service implementation |
| `src/server/heartbeat.rs` | Master heartbeat loop |
| `src/storage/volume.rs` | Volume read/write/delete logic |
| `src/storage/needle.rs` | Needle (file entry) serialization |
| `src/storage/store.rs` | Multi-volume store management |
| `src/security.rs` | JWT validation and IP whitelist guard |
| `src/remote_storage/` | S3 remote storage backend |

See DEV_PLAN.md for the full development history and feature checklist.