seaweedfs/weed/storage at 18cdb3819bea91a2ab3e48f5af8b3a0b69928f90 - seaweedfs - Mediatoday GIT repository

viaprog/seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-06-13 23:36:45 +03:00

Chris Lu 18cdb3819b fix(ec): crash-safe ecx-journal fold and shard rebuild (fsync before publish, no short-read-as-success) (#9938)

* fix(ec): make ecx-journal fold and shard rebuild crash-safe

Two EC rebuild paths could silently lose or corrupt data:

RebuildEcxFile folded the .ecj deletion journal into .ecx (in-place
WriteAt tombstones) and then unlinked the journal without flushing the
.ecx writes first. A crash could persist the unlink ahead of the
tombstones, resurrecting deleted needles on the next load. It also read
journal records with a bare n!=size break, so a torn tail silently
dropped the remaining tombstones before the unlink. Now: read records
with io.ReadFull (io.EOF ends cleanly, a torn tail aborts and leaves
.ecj in place for retry), fsync .ecx before removing the journal.

rebuildEcFiles treated a zero/short ReadAt as a clean end-of-input and
discarded the read error, so a truncated or unreadable input shard
produced truncated regenerated shards that were then published as
restored redundancy; the regenerated shards were also never fsynced on
the no-sidecar path. Now: derive the expected shard size from the
present inputs up front (rejecting a divergent/zero-size input), drive
the loop by that size, fail on any short read or short write, and fsync
every regenerated shard before it is mounted/renamed.

Rust volume server mirrors the rebuild fix: rebuild_ec_files now checks
the read_at byte count (it previously discarded it, the same truncation
bug). The Rust ecx fold already synced .ecx before removing the journal.

Custom EC ratios are unaffected: the shard size derives from the input
shards and the loop uses the .vif-resolved data/parity counts, never a
hardcoded 10+4.

* storage: close ecx journal files via defer in RebuildEcxFile

Per review: a single deferred Close per file replaces the per-error-path
manual closes, so new early returns cannot leak descriptors. The journal
is still closed explicitly before its unlink since Windows cannot delete
an open file; the deferred second Close is a harmless no-op.

2026-06-12 22:28:56 -07:00

..

chore(weed/storage/backend/s3_backend): remove unused function (#9715 )

2026-05-27 22:14:45 -07:00

fix(ec): crash-safe ecx-journal fold and shard rebuild (fsync before publish, no short-read-as-success) (#9938 )

2026-06-12 22:28:56 -07:00

chore: remove ~50k lines of unreachable dead code (#8913 )

2026-04-03 16:04:27 -07:00

fix(needle): use discovered file content type (#9851 )

2026-06-07 11:50:34 -07:00

go fix

2026-02-20 18:42:00 -08:00

EC placement: shared replica-placement resolver, snapshot + Place core, capacity fixes, tiering (#9621 )

2026-05-22 20:22:09 -07:00

Have volume scrubs account for zero-sized volumes. (#9609 )

2026-05-21 09:42:07 -07:00

fix(master): register EC shards per physical disk on full heartbeat sync (#9212 ) (#9219 )

2026-04-24 14:01:09 -07:00

…

fix(ec): bring ec.encode worker and EC/volume helpers to parity with shell (#9599 )

2026-05-21 02:16:28 -07:00

disk_location_ec_realworld_test.go

EC bitrot detection: per-shard checksum sidecars (#9761 )

2026-05-31 18:52:44 -07:00

disk_location_ec_shard_size_test.go

fix(volume_server): load orphan EC shards across disks on startup (#9212 ) (#9244 )

2026-04-27 16:01:10 -07:00

disk_location_ec_test.go

fix(storage): refuse to load .vif-only entry as regular volume when .ecx exists (#9448 ) (#9461 )

2026-05-12 09:30:42 -07:00

disk_location_ec.go

fix(storage): never let an empty .dat delete healthy distributed EC shards (#9930 )

2026-06-11 20:26:20 -07:00

disk_location_test.go

…

disk_location.go

fix(storage): never let an empty .dat delete healthy distributed EC shards (#9930 )

2026-06-11 20:26:20 -07:00

needle_map_leveldb_test.go

fix(volume): keep vacuum running past dangling .idx entries (#9115 )

2026-04-16 22:01:34 -07:00

needle_map_leveldb.go

fix(volume): keep vacuum running past dangling .idx entries (#9115 )

2026-04-16 22:01:34 -07:00

needle_map_memory.go

fix(volume): keep vacuum running past dangling .idx entries (#9115 )

2026-04-16 22:01:34 -07:00

needle_map_metric_test.go

fix(volume): keep vacuum running past dangling .idx entries (#9115 )

2026-04-16 22:01:34 -07:00

needle_map_metric.go

fix(volume): keep vacuum running past dangling .idx entries (#9115 )

2026-04-16 22:01:34 -07:00

needle_map_sorted_file_test.go

fix(volume): seed indexFileOffset in SortedFileNeedleMap so Delete appends (#9483 )

2026-05-13 10:22:01 -07:00

needle_map_sorted_file.go

fix(volume): seed indexFileOffset in SortedFileNeedleMap so Delete appends (#9483 )

2026-05-13 10:22:01 -07:00

needle_map.go

fix(volume): keep vacuum running past dangling .idx entries (#9115 )

2026-04-16 22:01:34 -07:00

remote_tier_integration_test.go

fix(seaweed-volume): stop EC shard deletion from phantom .dat on restart (#9874 )

2026-06-08 22:10:16 -07:00

store_disk_space_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_ec_9plus3_reboot_repro_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_ec_delete.go

admin: report file and delete counts for EC volumes (#9060 )

2026-04-13 21:10:36 -07:00

store_ec_disk_type_test.go

batch drain delta heartbeat messages (#9914 )

2026-06-10 13:33:45 -07:00

store_ec_hybrid_repro_test.go

batch drain delta heartbeat messages (#9914 )

2026-06-10 13:33:45 -07:00

store_ec_mirror_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_ec_mirror.go

fix(ec): mirror EC sidecars onto every shard-bearing disk at startup (#9525 )

2026-05-17 19:55:15 -07:00

store_ec_mount_cross_disk_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_ec_orphan_shard_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_ec_phantom_dat_test.go

fix(storage): never let an empty .dat delete healthy distributed EC shards (#9930 )

2026-06-11 20:26:20 -07:00

store_ec_reconcile.go

batch drain delta heartbeat messages (#9914 )

2026-06-10 13:33:45 -07:00

store_ec_recovery_test.go

…

store_ec_scrub.go

fix(ec): don't mix EC shards from different encode runs (#9880 )

2026-06-10 22:31:18 -07:00

store_ec_target_location_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_ec.go

fix(ec): don't mix EC shards from different encode runs (#9880 )

2026-06-10 22:31:18 -07:00

store_load_balancing_simple_test.go

…

store_load_balancing_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_maxvolume_deadzone_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

store_state.go

chore: remove ~50k lines of unreachable dead code (#8913 )

2026-04-03 16:04:27 -07:00

store_vacuum_test.go

…

store_vacuum.go

Process .ecj deletions during EC decode and vacuum decoded volume (#8863 )

2026-04-01 01:15:26 -07:00

store.go

batch drain delta heartbeat messages (#9914 )

2026-06-10 13:33:45 -07:00

volume_backup_test.go

…

volume_backup.go

…

volume_checking_test.go

fix(volume): stop flipping volumes read-only on a non-append-ordered .idx (#9726 )

2026-05-28 18:04:31 -07:00

volume_checking.go

fix(volume): stop flipping volumes read-only on a non-append-ordered .idx (#9726 )

2026-05-28 18:04:31 -07:00

volume_destroy_ec_vif_test.go

fix(storage): keep EC .vif when deleting a coexisting regular volume (#9723 )

2026-05-28 15:39:31 -07:00

volume_info_test.go

…

volume_info.go

…

volume_io_error_test.go

fix(volume): sticky EIO quarantine; track streamed reads (#9384 )

2026-05-09 09:55:02 -07:00

volume_loading_corrupt_idx_test.go

fix(volume): avoid nil-deref when needle map loader errors (#9694 ) (#9697 )

2026-05-26 16:56:49 -07:00

volume_loading.go

fix(volume): avoid nil-deref when needle map loader errors (#9694 ) (#9697 )

2026-05-26 16:56:49 -07:00

volume_mark_writable_test.go

fix(volume): reopen .idx writable after MarkVolumeWritable (fixes #9515 ) (#9526 )

2026-05-18 20:51:04 -07:00

volume_read_all.go

…

volume_read_test.go

fix(volume): don't panic on read when needle map is nil (#9342 )

2026-05-06 18:23:06 -07:00

volume_read.go

fix(volume): sticky EIO quarantine; track streamed reads (#9384 )

2026-05-09 09:55:02 -07:00

volume_super_block.go

…

volume_tier.go

…

volume_vacuum_test.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00

volume_vacuum.go

volume: keep volume writable after a deletion-tail compaction (#9776 )

2026-06-01 13:15:08 -07:00

volume_write_test.go

fix(balance): don't move remote-tiered volumes; don't fatal on missing .idx (#9335 )

2026-05-06 15:19:43 -07:00

volume_write.go

fix(storage): keep EC .vif when deleting a coexisting regular volume (#9723 )

2026-05-28 15:39:31 -07:00

volume.go

[CheckDisk]: implement disk health detection (#9560 )

2026-06-02 09:02:05 -07:00