* doc: P14 S8 final bounded close — evidence matrix + P15 handoff
Adds the six S8 closure deliverables consolidating S4-S7 evidence,
classifying V2 scenarios, and mapping residual product gaps onto
canonical P15 tracks (per v3-phase-15-product-plan.md §4).
New docs:
- v3-phase-14-s8-assignment.md — S8 execution contract.
- v3-phase-14-s8-final-bounded-close.md — bounded P14 target,
accepted topology, reject conditions.
- v3-phase-14-s8-evidence-matrix.md — 16 claims × {L0, L1, L2, L3,
Status, Residual}. 15 PROVEN, 1 PARTIAL (Claim 15 fence
quantitative bound, P14 internal follow-up). Rounds 2-3 architect
corrections: Claim 10 / 12 L2 narrowed; Claim 6 refresh gap closed
by the new L1 test (see companion commit in seaweed_block).
- v3-phase-14-s8-v2-scenario-classification.md — every V2 scenario
mapped to RUNNABLE-P14 / BLOCKED-FRONTEND / BLOCKED-OPS /
BLOCKED-HA / BLOCKED-PERF / PORT-MECHANISM; scenario YAMLs kept
as L3 shape, not executed evidence.
- v3-phase-14-s8-p15-handoff.md — 11 rows (10 canonical P15 tracks
+ 1 P14 internal follow-up anchored to Claim 15 PARTIAL); §4
integrity check split by row class.
- v3-phase-14-s8-closure.md — final P14 closure statement matching
the close doc §10 wording; explicit non-goals; all 9 P15 tracks
named with canonical numbering.
No claim of CSI / frontend / migration / security / performance /
production readiness. Every product gap is handed off with a
concrete first-proof gate.
Companion: seaweed_block commit adds the IntentRefreshEndpoint L1
route test that closes Claim 6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* doc: P14 S8 — resolve port-now doc conflict (CodeRabbit #4)
final-bounded-close.md §7 previously said "Port now: testrunner,
scenarios, component harness, qa_block, learn/test" while
v2-scenario-classification.md §2 says S8 does NOT port testrunner
machinery and defers all actual porting to P15.
Align final-bounded-close.md §7 with classification: section
renamed "Classify now (S8 scope), port deferred to P15". Every
item now states which P15 track actually owns the port (Final Gate
or T1 Frontend + Data Path as applicable).
No scope expansion; no new handoff gap. Pure doc-consistency fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8.8 KiB
V3 Phase 14 S8 Closure
Date: 2026-04-20
Status: draft — pending 14A final pass + architect + tester sign-off
Purpose: the final P14 closure statement per v3-phase-14-s8-final-bounded-close.md §10
1. What P14 Now Owns
After S4-S7, P14 closes one bounded single-active-master internal topology/control-plane truth loop for the accepted topology:
-
Observation institution (S4): heartbeats and inventory flow through
ObservationHostinto a synthesizedClusterSnapshot+SupportabilityReport. Observation never mints authority. Partial / conflicting / duplicate / unknown / expired inventory becomes explicitVolumeUnsupportedEvidence; pending state is distinct from unsupported; per-volume isolation holds. -
Durable authority institution (S5): current authority line is durable, single-owner, atomically written, and reloaded synchronously at publisher boot. One record per volume. Corrupt records per-volume-fail-closed and surface as structured
ReloadSkips. Process lock is exclusive and idempotent. Store never mints authority (boundary-guard test). -
Convergence institution (S6): desired assignments have bounded fate — confirmed (cleared), publish-not-observed (stuck-evidence, no re-mint, no churn), or superseded (by newer publisher line or different decision). Passive retry only; no active re-drive. Normal observation lag does NOT supersede. Per-volume retry clock; no global state.
-
Real-route restart closure (S7): the full
ObservationHost → TopologyController → Publisher(reloaded) → VolumeBridge → VolumeReplicaAdaptercomposes correctly at restart. Restart recomputes desired from durable authority + fresh observation (no durable desired store). Old-slot per-replica state is NOT revived from the store (S5 one-record-per-volume rule); live-route old-slot delivery via bridge is rejected by the adapter's monotonic guard. Stale observation cannot move authority backward (4 simulation shapes). Unsupported topology after restart records evidence, not silent idle. L2 process smoke proves real-binary restart × 2 preserves durable truth with no backward mint, via pinned structured JSON. -
Accepted topology (S8 §4): single active master; multiple volumes; three replica slots per volume on distinct servers; one primary + two candidates; publisher-owned
Epoch/EndpointVersion; durable current line; passive convergence; realVolumeBridgeinto adapter.
2. What P14 Explicitly Does NOT Own
- CSI lifecycle (Create/Delete/Publish/Node*).
- iSCSI / NVMe-oF frontend data path.
- External volume management API (REST / gRPC).
- Security / auth (authn, CHAP, encryption).
- Operator diagnostics surface (dashboards, alerts,
/metrics). - Operator workflow (drain, planned failover, manual reassign, supervised rebuild).
- V2 ↔ V3 migration and coexistence.
- Deployment / packaging / hardening (systemd, container, backup of durable store).
- Performance / soak / release-hardening claims.
- Multi-master / leader election / distributed authority (explicit non-goal, not just deferred).
- V2
HandleAssignment/promote/demotesemantics (explicit non-goal). - Heartbeat-as-authority (explicit non-goal).
Every item 1–9 is mapped to a P15 track in v3-phase-14-s8-p15-handoff.md §2. Items 10–12 are permanent non-goals.
3. Evidence At L0 / L1 / L2 / L3
Full matrix: v3-phase-14-s8-evidence-matrix.md §3. Summary of coverage:
| Claim | L0 | L1 | L2 | L3 |
|---|---|---|---|---|
| 1. Observation system-fed | ✓ | ✓ | — (via L1) | L3 shape only |
| 2. Supportability explicit | ✓ | ✓ | — | L3 shape only |
| 3. Durable authority / reload | ✓ (13 tests) | ✓ | ✓ (subprocess) | L3 shape only |
| 4. No old-slot durable revival | ✓ | ✓ | ✓ | L3 shape only |
| 5. Live-route old-slot rejected | ✓ | ✓ | N/A (live-route) | N/A |
| 6. Controller-driven bind/reassign/refresh | ✓ (7 tests) | ✓ | ✓ (Bind only — Reassign / RefreshEndpoint are L1-only; subprocess does not ingest heartbeats) | L3 shape only |
| 7. Confirmation clears desired | ✓ | ✓ | N/A | L3 shape only |
| 8. Stuck evidence bounded / no churn | ✓ (5 tests) | ✓ | N/A | L3 shape only |
| 9. Supersede (2 modes) | ✓ (3 tests) | ✓ | N/A | L3 shape only |
| 10. Restart re-anchors via VolumeBridge | ✓ | ✓ | subclaim only — L2 proves durable Publisher reload (Claim 3); the bridge/adapter re-anchor is NOT at L2 because sparrow subprocess does not construct ObservationHost / controller / bridge / adapter |
L3 shape only |
| 11. Stale observation cannot go backward | ✓ | ✓ (4 sub-cases) | — | L3 shape only |
| 12. Unsupported → evidence | ✓ | ✓ | not at L2 — ReloadSkips is S5 durable-corruption evidence, not observation-layer unsupported-topology. Controller LastUnsupported is L1 |
L3 shape only |
| 13. Per-volume isolation | ✓ (4 tests) | ✓ | N/A | L3 shape only |
| 14. Placement / rebalance | ✓ (5 tests) | ✓ | N/A | L3 shape only |
| 15. Fence route bounded | ✓ (adapter) | PARTIAL | N/A | L3 shape only |
| 16. Boundary guards | ✓ | N/A (structural) | N/A | N/A |
Sixteen claims; fifteen PROVEN at their target level; one (Claim 15) PARTIAL — adapter-package follow-up per §5 below. L3 is classification-only at P14 (v3-phase-14-s8-v2-scenario-classification.md) — L3 entries are scenario SHAPES, not executed evidence. Runnable L3 is P15 Final Gate (Cluster Validation Agent). Claims 6, 10, and 12 have NARROWED L2 cells where the subprocess binary cannot carry the full claim surface; each is handed off to the correct canonical P15 track.
4. Test Baseline (2026-04-20)
$ go test ./core/engine ./core/adapter ./core/authority ./cmd/sparrow -count=1
ok github.com/seaweedfs/seaweed-block/core/engine 0.016s
ok github.com/seaweedfs/seaweed-block/core/adapter 2.344s
ok github.com/seaweedfs/seaweed-block/core/authority 4.569s
ok github.com/seaweedfs/seaweed-block/cmd/sparrow 2.264s
$ go test ./... -count=1
(14 packages, all PASS)
193 Go tests across the P14 scope. L2 subprocess smoke (TestS7Process_RealSubprocessRestartSmoke) spawns a real sparrow binary twice on the same store directory and asserts the pinned JSON pass-line schema.
5. Residual Risks
| # | Risk | Scope / carry-forward |
|---|---|---|
| 1 | Windows os.RemoveAll on t.TempDir() can race with lock-file release during teardown. No correctness impact. |
S7 sketch §8.5 — logged as t.Logf, not a failure. |
| 2 | Claim 15 PARTIAL: fence quantitative timeout-bound is adapter-package-owned, depends on the uncommitted fence-watchdog branch committing. | v3-phase-14-s8-p15-handoff.md item 11 — P14 internal follow-up, NOT P15. Single adapter test replaces PARTIAL status. |
| 3 | L2 subprocess drives mint via StaticDirective; it does NOT construct ObservationHost / controller / bridge / adapter. Full heartbeat-ingress and real-binary adapter route are L1-only in S8. |
P15 T1 Frontend + Data Path (handoff items #1 and #2). |
No S8-blocking correctness residuals. One adapter-owned quantitative fence proof (Claim 15) remains as P14-internal follow-up per row 11 of the handoff table — not a P15 track, not an S8 gate. Zero backward-authority-mint residuals. Zero silent-idle residuals.
6. P15 Track Owners
Handoff complete per v3-phase-14-s8-p15-handoff.md, aligned to canonical v3-phase-15-product-plan.md §4. Every P14 residual product gap has a named P15 track and a concrete first-proof gate:
- T1 Frontend + Data Path Contract (includes subprocess heartbeat ingress)
- T2 CSI / External Lifecycle Surface
- T3 External Control API
- T4 Security And Auth Posture
- T5 Diagnostics And Explainability
- T6 Operator Workflow
- T7 V2/V3 Coexistence And Migration
- T8 Deployment / Upgrade / Release Hardening
- Final Gate — Cluster Validation Agent
7. Final P14 Claim
P14 has closed one bounded single-active-master internal topology/control-plane truth loop for the accepted topology set. Observation is system-fed; authority is durable, single-owner, and restart-recovered; convergence has bounded fate; the real
VolumeBridge → adapterroute re-anchors across restart without backward mint or old-slot revival.P14 has NOT closed CSI, external API, frontend data path, migration, security, deployment, performance, or production readiness. Those nine tracks are handed off to P15 with concrete first-proof gates. Multi-master, V2
HandleAssignment/promote/demotesemantics, and heartbeat-as-authority are explicit non-goals — not deferrals.
8. Sign-off State
- Architect — pending review of this matrix + handoff.
- Tester — pending reproducibility check (commands in §4).
- 14A final targeted pass — pending (scope in
v3-phase-14a-checklist.md).
S8 cannot be accepted until all three sign off. After that, P14 is closed and the next active work is P15 T1 Frontend.