* doc: P14 S8 final bounded close — evidence matrix + P15 handoff
Adds the six S8 closure deliverables consolidating S4-S7 evidence,
classifying V2 scenarios, and mapping residual product gaps onto
canonical P15 tracks (per v3-phase-15-product-plan.md §4).
New docs:
- v3-phase-14-s8-assignment.md — S8 execution contract.
- v3-phase-14-s8-final-bounded-close.md — bounded P14 target,
accepted topology, reject conditions.
- v3-phase-14-s8-evidence-matrix.md — 16 claims × {L0, L1, L2, L3,
Status, Residual}. 15 PROVEN, 1 PARTIAL (Claim 15 fence
quantitative bound, P14 internal follow-up). Rounds 2-3 architect
corrections: Claim 10 / 12 L2 narrowed; Claim 6 refresh gap closed
by the new L1 test (see companion commit in seaweed_block).
- v3-phase-14-s8-v2-scenario-classification.md — every V2 scenario
mapped to RUNNABLE-P14 / BLOCKED-FRONTEND / BLOCKED-OPS /
BLOCKED-HA / BLOCKED-PERF / PORT-MECHANISM; scenario YAMLs kept
as L3 shape, not executed evidence.
- v3-phase-14-s8-p15-handoff.md — 11 rows (10 canonical P15 tracks
+ 1 P14 internal follow-up anchored to Claim 15 PARTIAL); §4
integrity check split by row class.
- v3-phase-14-s8-closure.md — final P14 closure statement matching
the close doc §10 wording; explicit non-goals; all 9 P15 tracks
named with canonical numbering.
No claim of CSI / frontend / migration / security / performance /
production readiness. Every product gap is handed off with a
concrete first-proof gate.
Companion: seaweed_block commit adds the IntentRefreshEndpoint L1
route test that closes Claim 6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* doc: P14 S8 — resolve port-now doc conflict (CodeRabbit #4)
final-bounded-close.md §7 previously said "Port now: testrunner,
scenarios, component harness, qa_block, learn/test" while
v2-scenario-classification.md §2 says S8 does NOT port testrunner
machinery and defers all actual porting to P15.
Align final-bounded-close.md §7 with classification: section
renamed "Classify now (S8 scope), port deferred to P15". Every
item now states which P15 track actually owns the port (Final Gate
or T1 Frontend + Data Path as applicable).
No scope expansion; no new handoff gap. Pure doc-consistency fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
10 KiB
V3 Phase 14 S8 — P15 Handoff
Date: 2026-04-20
Status: draft (S8 P15 handoff, round-2 aligned to canonical P15 plan)
Purpose: map every P14-residual product gap to the canonical P15 track in v3-phase-15-product-plan.md, so P15 execution starts from the correct ownership map
1. Handoff Contract
Per v3-phase-14-s8-assignment.md §3.F: every gap that P14 does NOT close must appear here with four columns — the gap, why it is not P14, which canonical P15 track owns it, and the minimum first proof gate.
Canonical P15 track numbering (from v3-phase-15-product-plan.md §4):
T1Frontend + Data Path ContractT2CSI / External Lifecycle SurfaceT3External Control APIT4Security And Auth PostureT5Diagnostics And ExplainabilityT6Operator WorkflowT7V2/V3 Coexistence And MigrationT8Deployment / Upgrade / Release HardeningFinal GateCluster Validation Agent
S8 does NOT invent new tracks, renumber existing ones, or prescribe P15 implementation — only the acceptance gate each P15 track must pass for the handed-off gap.
2. Handoff Table
| # | Gap | Why not P14 | P15 owner track | Required first proof |
|---|---|---|---|---|
| 1 | Frontend data path (attach → real read → real write → stale-primary fence at frontend boundary) | P14 stops at adapter/engine projection; there is no user-data I/O path in P14, and adapter.PublishHealthy is a control-plane signal, not a data-path handshake. |
P15 T1 Frontend + Data Path | L2: V3 frontend backend attaches to an internal volume, writes data, reads it back, triggers P14 reassignment/failover, and verifies the stale primary can no longer serve writes. Covered by T1 §5 Acceptance. |
| 2 | Subprocess heartbeat ingress — the L2 subprocess smoke (TestS7Process_RealSubprocessRestartSmoke) drives mint via StaticDirective, not a real heartbeat wire. The full heartbeat → observation → controller path is L1-only in the subprocess. |
Real heartbeat ingress is a frontend/transport surface. P14 has no intention of adding one. | P15 T1 Frontend + Data Path (observation ingress is part of the frontend/transport contract) | L2: subprocess accepts heartbeats over a real transport; observation store mutates; controller sees the change; adapter advances — end-to-end in a real process. |
| 3 | CSI driver lifecycle (CreateVolume / DeleteVolume / ControllerPublish / NodeStage / NodePublish / NodeUnpublish / ControllerUnpublish) | P14 has no orchestrator-facing lifecycle surface; CSI sits above the P14 authority/adapter route. | P15 T2 CSI / External Lifecycle Surface | L2: CSI sidecar ↔ sparrow process, full Create → Publish → use → Unpublish → Delete cycle via real CSI gRPC against V3 authority (T2 §5 Acceptance). |
| 4 | External control API (REST / gRPC verbs for list / inspect / create / delete / resize / pause / resume) | P14 exposes hidden test flags on sparrow only. An external API is a net-new surface. |
P15 T3 External Control API | L2: OpenAPI / gRPC verbs backed by V3 authority state, exercised by an external client that does NOT construct AssignmentInfo directly. |
| 5 | Security / auth posture (API authn/authz, CHAP for iSCSI, TLS/mTLS where applicable, audit trail) | P14 has no external authn surface. | P15 T4 Security And Auth | L1+L2: CHAP positive+negative for any iSCSI target shipped; API authn positive+negative; tenant-scoped listing; audit log for mutating verbs. |
| 6 | Operator diagnostics surface (health dashboard, structured readout of LastUnsupported / LastConvergenceStuck / ReloadSkips, alerts) |
P14 records internal evidence (per-volume maps + structured JSON pass-line in S7 subprocess smoke) but does NOT expose it to operators. | P15 T5 Diagnostics And Explainability | L2: structured /healthz + /metrics surfacing P14 internal states; L3 cp85-metrics-verify.yaml scenario class reshaped for V3. |
| 7 | Operator workflow (planned failover, graceful drain, supervised reassign, supervised rebuild) | P14 rejects V2 promote/demote/HandleAssignment. Operator-initiated actions MUST route through the publisher mint path as ordinary AssignmentAsks. A new workflow surface is required. |
P15 T6 Operator Workflow | L2: operator-triggered reassign lands in TopologyController as an AssignmentAsk; publisher mints new epoch; adapter converges. Manual promote is explicitly reshaped from V2 semantics. |
| 8 | V2 ↔ V3 migration / coexistence (online migrate a V2 volume to V3 authority; rollback; mixed V2+V3 cluster) | Migration is a product-level transition requiring both T1 and T6. P14 is V3-only. | P15 T7 V2/V3 Coexistence And Migration | L3 + L4: op-upgrade-rollback.yaml scenario class on a mixed cluster; explicit no-split-brain check under migration. |
| 9 | Deployment / hardening (systemd / container packaging, lockfile+storedir pathing, crash policy, log rotation, store backup) | sparrow is currently a test binary with hidden smoke flags; no deployment story. |
P15 T8 Deployment / Upgrade / Release Hardening | L2: deployable package starts sparrow with the durable store correctly mounted; crash-restart cycle preserves the P14 restart truth. L4: release-hardening soak (cp84-soak-4h.yaml, cp85-soak-24h.yaml). |
| 10 | Final cluster validation (end-to-end: operator creates volume → data written → primary killed → failover → operator reads data back → no loss, on real hardware) | P14 covers the control plane only; full cluster validation needs T1 + T2 + T3 + T4 + T5 + T6 all landed. | P15 Final Gate Cluster Validation Agent | Composite L3/L4 pack: smoke-block-api + ha-restart-recovery + ha-failover + ha-io-continuity + fault-partition in one run on real hardware, with evidence artifacts in learn/test/. |
| 11 | Adapter fence-watchdog quantitative timeout bound (watchdog fires at exact fence deadline + structured watchdog event for fence lineage) | S7 L1 covers "Fence command reaches executor via bridge + withheld callback does not produce Healthy"; the quantitative bound is adapter-package-owned and depends on the currently-uncommitted fence-watchdog branch. S7 sketch §10.1 Path B forbids S7 from reconfiguring the fence path. | P14 internal follow-up (NOT a P15 track) | Single adapter-package test replacing the PARTIAL row for evidence-matrix Claim 15. Lands when the fence-watchdog branch commits. Not an S8 gate. |
3. Out-Of-Scope (NOT Handed Off)
These are explicit non-goals for both P14 and current P15 bounds — not failures, not deferrals, permanent exclusions.
| Non-goal | Reason |
|---|---|
| Multi-master / leader election / distributed authority store | v3-phase-14-s8-final-bounded-close.md §3.6 and v3-phase-15-product-plan.md §2 both reject. Single-active-master is the accepted topology. Any multi-master claim would require a separate phase with a distributed-authority institution. |
V2 HandleAssignment / promote / demote authority-owning semantic |
Rejected at S2, reaffirmed S3-S8. Not ported under any P15 track. Operator promote surface in T6 is reshaped, not ported. |
| V2 heartbeat-as-authority | Rejected at S4. Heartbeats are observation inputs only under any P14 or P15 surface. |
4. Handoff Integrity Check
The table splits into two distinct row classes, each with its own integrity rule:
Rows 1-10 — P15 product-surface gaps. Each of these must satisfy all four checks as of 2026-04-20:
- Gap is not proven at any P14 level. Cross-reference
v3-phase-14-s8-evidence-matrix.md§3 — none of rows 1-10 appear as PROVEN or PARTIAL rows of the internal matrix. These are genuinely product-surface gaps P14 never attempted. - Gap is assigned to a canonical P15 track. Numbering matches
v3-phase-15-product-plan.md§4. - First-proof gate is concrete, not a hand-wave. Every row names a specific test layer and a specific scenario or surface shape.
- Gap does not silently depend on a P14 claim being wider than is actually proven. Example: P15 T1 is allowed to depend on P14 Claim 10 (restart re-anchors via VolumeBridge, PROVEN at L1) but NOT on any data-path claim — P14 makes none.
Row 11 — P14 internal follow-up (exception to rule #1). Row 11 is NOT a P15 gap; it is a targeted P14 internal completion that lands when the fence-watchdog branch commits. Its integrity rules are narrower:
- Row 11 MUST correspond to an existing PARTIAL or subclaim-only row of the internal matrix — the whole point of the row is to name the follow-up that closes that partial. In this case row 11 corresponds to evidence-matrix Claim 15 PARTIAL (fence quantitative timeout bound). This is the inverse of rule #1 above: row 11 is listed here precisely BECAUSE it appears as PARTIAL in the matrix.
- Row 11 MUST NOT be assigned to any P15 track — it is explicitly labeled "P14 internal follow-up (NOT a P15 track)" in the gap table.
- Row 11's first-proof gate MUST be a single-package test that closes the matrix PARTIAL row — not a multi-track composite.
Splitting the integrity check this way keeps rule #1 for P15 gaps strict (any cross-reference to PROVEN/PARTIAL in the matrix is a bug there) while making row 11's purpose explicit and auditable (every P14 internal follow-up row should have a matrix PARTIAL anchor).
5. Per-Track Dependency Summary
| P15 Track | Depends on P14 claims | Notes |
|---|---|---|
| T1 Frontend + Data Path | Claims 3, 6, 10, 14 (durable reload, controller-driven, restart re-anchor, placement/rebalance) | Core product dependency. |
| T2 CSI / Lifecycle | T1 + Claims 6, 10 | CSI needs a working data path. |
| T3 External API | Claims 6, 14 | API translates external verbs to AssignmentAsks. |
| T4 Security | T1 + T3 | Authn/authz sits above the surfaces T1 and T3 expose. |
| T5 Diagnostics | Claims 2, 8, 11, 12 (supportability / stuck / unsupported evidence) | Surfaces what P14 already records internally. |
| T6 Operator Workflow | Claims 6, 9, 14 (controller-driven, supersede, placement) | Operator actions are AssignmentAsks through the controller. |
| T7 Migration | T1 + T6 | Needs both frontend and operator surfaces. |
| T8 Deployment | Claim 3 (durable reload) + T1 | Depends on durable store shape and frontend. |
| Final Gate | All P14 claims + all P15 tracks | Composite. |
6. Closure
Every product gap that S8 identifies is listed here with a canonical P15 track or explicit P14 follow-up. Track numbering matches v3-phase-15-product-plan.md. P15 execution starts from this table without needing to re-derive the ownership map.