feat: harden auto-next continuity receipt linkage

Eve
2026-04-24 16:38:36 +08:00
parent 8e290a4d9b
commit 82d0d94b5f
6 changed files with 242 additions and 741 deletions


@@ -107,532 +107,39 @@ Expected outcome:
---
### Task 1: Capture the auto-next obligation contract in docs first
## Verification Record
**Files:**
- Create: `docs/runbooks/auto-next-obligation-gate.md`
- Read: `docs/runbooks/approved-plan-continuity.md`
- Read: `docs/plans/2026-04-24-approved-plan-continuity-hard-gate.md`
- Read: `docs/plans/2026-04-24-continuity-plugin-mvp.md`
### Commands run
**Step 1: Write the behavior contract only**
Document, in exact terms:
- when auto-next is obligatory
- legal non-auto-next closures:
- `waiting_user`
- `blocked`
- `pending_verification`
- allowed non-closure exception:
- explicit `highRiskStop`
- forbidden behavior:
- completed task + known next task + same approved plan + normal closeout without dispatch
**Step 2: Add a single canonical failure table**
Include rows for:
- complete + next task known + no receipt + completed closure => FAIL
- complete + next task known + valid receipt => PASS
- complete + next task known + waiting_user => PASS
- complete + next task known + blocked => PASS
- complete + next task known + pending_verification => PASS
- complete + next task known + highRiskStop => PASS
**Step 3: Verify the file exists and key phrases are present**
Run:
```bash
grep -n "auto-next\|highRiskStop\|waiting_user\|pending_verification\|same approved plan" docs/runbooks/auto-next-obligation-gate.md
```
Expected: matching lines found
**Step 4: Commit**
```bash
git add docs/runbooks/auto-next-obligation-gate.md
git commit -m "docs: define auto-next obligation gate contract"
```
### Task 2: Record the exact continuity / hook / dispatch gap before code changes
**Files:**
- Modify: `docs/plans/2026-04-24-auto-next-obligation-gate.md`
- Read: `scripts/approved_plan_continuity_gate.mjs`
- Read: `scripts/approved_plan_dispatch_binding.mjs`
- Read: `hooks/force-recall/handler.ts`
- Read: `scripts/test_approved_plan_continuity_gate.mjs`
**Step 1: Add a “Current Gap” section with exact bullets**
Capture these facts:
- current continuity gate checks known next action, not specifically same-plan next task
- current hook can surface planner-derived action but dry-run planner intent is not real dispatch
- current dispatch binding writes receipts, but the gate does not yet express “must auto-dispatch now” as its own obligation
- current legal terminal states are hard-coded and do not include high-risk stop metadata
**Step 2: Add a “Non-goals” section**
Explicitly exclude:
- generalized multi-plan scheduling
- speculative dispatch when next task is ambiguous
- removing current receipt validation
- implementing plugin extraction in this slice
**Step 3: Verify plan text now contains both sections**
Run:
```bash
grep -n "Current Gap\|Non-goals" docs/plans/2026-04-24-auto-next-obligation-gate.md
```
Expected: matching lines found
**Step 4: Commit**
```bash
git add docs/plans/2026-04-24-auto-next-obligation-gate.md
git commit -m "docs: capture current auto-next continuity gap"
```
### Task 3: Add fail-first test for the core task-boundary stop scenario
**Files:**
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
- Test: `scripts/approved_plan_continuity_gate.mjs`
**Step 1: Write one failing test case for the exact forbidden stop**
Add a case with input shaped like:
```json
{
"planId": "plan-auto-next-core",
"currentTask": "task-8",
"taskState": "complete",
"nextTaskKnown": true,
"sameApprovedPlan": true,
"taskBoundaryStop": true,
"nextDerivedAction": {
"type": "message_subagent",
"task": "continue with task-9"
},
"replyClosureState": "completed",
"highRiskStop": false,
"dispatchReceipt": null
}
```
Expected assertion:
- `ok === false`
- `verdict === 'continuity_failure'`
- `reason === 'missing_auto_next_dispatch'` (or the canonical reason you choose for this feature)
**Step 2: Run the test suite to verify failure**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: FAIL because the new failure mode does not exist yet
**Step 3: Commit the failing test only**
```bash
git add scripts/test_approved_plan_continuity_gate.mjs
git commit -m "test: fail when approved plan stops at task boundary without auto-next dispatch"
```
### Task 4: Add fail-first test proving dry-run planner intent is still not enough
**Files:**
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
- Read: `hooks/force-recall/handler.ts`
**Step 1: Add a second failing test**
Case:
- `taskState='complete'`
- `nextTaskKnown=true`
- `sameApprovedPlan=true`
- `taskBoundaryStop=true`
- `derivedAction` exists
- `dispatchReceipt=null`
- `replyClosureState='completed'`
- `highRiskStop=false`
Expected:
- still FAIL
- still same continuity failure reason
This locks the rule that planner-derived next intent is not itself a pass.
**Step 2: Run tests to verify failure**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: FAIL
**Step 3: Commit**
```bash
git add scripts/test_approved_plan_continuity_gate.mjs
git commit -m "test: fail auto-next obligation when only dry-run derived action exists"
```
### Task 5: Add pass-path test for explicit high-risk stop
**Files:**
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
- Test: `scripts/approved_plan_continuity_gate.mjs`
**Step 1: Add a pass test**
Case:
- task complete
- next task known
- same approved plan
- task boundary stop true
- no dispatch receipt
- closure state `completed`
- `highRiskStop=true`
Expected:
- `ok === true`
- `verdict === 'pass'`
**Step 2: Run tests to verify it fails before implementation**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: FAIL until evaluator understands `highRiskStop`
**Step 3: Commit**
```bash
git add scripts/test_approved_plan_continuity_gate.mjs
git commit -m "test: allow explicit high-risk stop to bypass auto-next obligation"
```
### Task 6: Add neutral-path tests so the gate stays minimal
**Files:**
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
**Step 1: Add a pass test for ambiguous next task**
Case:
- `taskState='complete'`
- `nextTaskKnown=false`
- `sameApprovedPlan=true`
- `taskBoundaryStop=true`
- no dispatch receipt
- closure state `completed`
Expected:
- PASS, because auto-next is not obligatory when the next task is not known
**Step 2: Add a pass test for different-plan / unknown-plan next action**
Case:
- next action exists
- `sameApprovedPlan=false`
- no receipt
- closure state `completed`
Expected:
- PASS, or fall back to the old behavior, provided no same-plan next-task obligation is active
**Step 3: Run tests to verify current behavior does not satisfy them yet**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: FAIL or mixed results; note exact mismatch in implementation comments if needed
**Step 4: Commit**
```bash
git add scripts/test_approved_plan_continuity_gate.mjs
git commit -m "test: add neutral auto-next obligation coverage for unknown or out-of-plan next task"
```
### Task 7: Extend the continuity gate with minimal obligation logic
**Files:**
- Modify: `scripts/approved_plan_continuity_gate.mjs`
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
**Step 1: Add the smallest possible derived booleans**
Implement helpers like:
```js
const taskComplete = payload?.taskState === 'complete';
const nextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
const nextTaskKnown = payload?.nextTaskKnown === true;
const sameApprovedPlan = payload?.sameApprovedPlan === true;
const taskBoundaryStop = payload?.taskBoundaryStop === true;
const highRiskStop = payload?.highRiskStop === true;
const hasDispatchReceipt = hasValidDispatchReceipt(payload?.dispatchReceipt ?? null);
const closureState = payload?.replyClosureState ?? null;
const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);
const autoNextObligatory = taskComplete && nextTaskKnown && sameApprovedPlan && taskBoundaryStop && !isLegalTerminalState && !highRiskStop;
```
**Step 2: Add the new failure rule before the generic pass path**
Minimal rule:
- if `autoNextObligatory` and no valid dispatch receipt exists => fail with:
- `ok: false`
- `status: 'continuity_failure'`
- `verdict: 'continuity_failure'`
- `reason: 'missing_auto_next_dispatch'`
Keep the existing generic receipt failure behavior for legacy cases that are not strict same-plan task-boundary obligation cases, unless tests prove it should collapse into the new reason.
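The derived booleans and the failure rule above can be sketched as one small function. This is a minimal sketch, not the code in `scripts/approved_plan_continuity_gate.mjs`: the function name `evaluateAutoNext`, the contents of `LEGAL_TERMINAL_STATES`, and the receipt check inside `hasValidDispatchReceipt` are assumptions made for illustration.

```js
// Assumed set, based on the three legal closures named in this plan.
const LEGAL_TERMINAL_STATES = new Set(['waiting_user', 'blocked', 'pending_verification']);

// Hypothetical stand-in: the real receipt validation is not shown in this plan.
// Here a receipt counts as valid when it is an object carrying a string planId.
function hasValidDispatchReceipt(receipt) {
  return receipt !== null && typeof receipt === 'object' && typeof receipt.planId === 'string';
}

// Hypothetical evaluator covering only the auto-next obligation rule.
function evaluateAutoNext(payload) {
  const taskComplete = payload?.taskState === 'complete';
  const nextTaskKnown = payload?.nextTaskKnown === true;
  const sameApprovedPlan = payload?.sameApprovedPlan === true;
  const taskBoundaryStop = payload?.taskBoundaryStop === true;
  const highRiskStop = payload?.highRiskStop === true;
  const closureState = payload?.replyClosureState ?? null;
  const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);

  // The obligation is active only when every strict condition holds.
  const autoNextObligatory =
    taskComplete && nextTaskKnown && sameApprovedPlan && taskBoundaryStop &&
    !isLegalTerminalState && !highRiskStop;

  // New failure rule: checked before any generic pass path.
  if (autoNextObligatory && !hasValidDispatchReceipt(payload?.dispatchReceipt ?? null)) {
    return {
      ok: false,
      status: 'continuity_failure',
      verdict: 'continuity_failure',
      reason: 'missing_auto_next_dispatch',
    };
  }
  return { ok: true, verdict: 'pass' };
}
```

Under these assumptions, the Task 3 payload (complete task, known same-plan next task, `completed` closure, no receipt) fails with `missing_auto_next_dispatch`, while the same payload with `highRiskStop: true` or `replyClosureState: 'waiting_user'` passes.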
**Step 3: Run the continuity gate tests**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: PASS for all old and new cases
**Step 4: Commit**
```bash
git add scripts/approved_plan_continuity_gate.mjs scripts/test_approved_plan_continuity_gate.mjs
git commit -m "feat: enforce auto-next obligation at approved plan task boundaries"
```
### Task 8: Add dispatch-binding test that the next task receipt is mandatory proof
**Files:**
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
- Modify: `scripts/test_force_recall_long_task_preflight.mjs`
- Read: `scripts/approved_plan_dispatch_binding.mjs`
**Step 1: Add a test that a receipt with wrong task linkage does not satisfy auto-next**
Case:
- current task is `task-8`
- next task known is `task-9`
- receipt exists but links to stale or mismatched action/task context
Expected:
- FAIL
- reason remains an auto-next continuity failure
If current receipt schema lacks enough linkage to assert this exactly, capture that as an explicit schema gap in comments and lock at least plan/task equality on currently available fields.
**Step 2: Add a preflight hook assertion**
Expected hook output should still fail when planner says the next task is known but no real bound receipt for that next dispatch exists.
**Step 3: Run both suites**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
node scripts/test_force_recall_long_task_preflight.mjs
```
Expected: FAIL before hook/input wiring lands, or PASS only for the script-level side if hook has not been updated yet
**Step 4: Commit**
```bash
git add scripts/test_approved_plan_continuity_gate.mjs scripts/test_force_recall_long_task_preflight.mjs
git commit -m "test: require real next-task dispatch proof for auto-next obligation"
```
### Task 9: Add explicit hook input fields for task-boundary obligation
**Files:**
- Modify: `hooks/force-recall/handler.ts`
- Read: `scripts/plan_long_task_auto_chain.mjs`
- Read: `scripts/approved_plan_continuity_gate.mjs`
**Step 1: Add a focused builder for auto-next obligation fields**
Extend `buildApprovedPlanContinuityInput(...)` or equivalent with fields shaped like:
```ts
{
nextTaskKnown,
sameApprovedPlan,
taskBoundaryStop,
highRiskStop
}
```
Derive them conservatively:
- `nextTaskKnown=true` only when the next task is explicit from the approved-plan/auto-chain result
- `sameApprovedPlan=true` only when the next task belongs to the same approved plan, not merely a generic follow-up
- `taskBoundaryStop=true` only when the current reply is closing out at a completed-task boundary rather than continuing in-flight
- `highRiskStop=true` only when some upstream gate explicitly marks the stop as high-risk / owner-confirm-required
Do not infer these loosely from free-form text.
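The conservative derivation above could look roughly like the following. This is a sketch under stated assumptions: the builder name `buildAutoNextObligationFields` and the input shapes `autoChain` (with `nextTask`, `currentTaskState`, `replyInFlight`) and `upstreamGate` are hypothetical and do not come from `hooks/force-recall/handler.ts`. It is written in plain JavaScript to match the plan's script examples.

```js
// Hypothetical builder: every field defaults to false unless structured
// input proves it. No free-form text is inspected.
function buildAutoNextObligationFields({ planId, autoChain, upstreamGate } = {}) {
  // Assumed shape: autoChain.nextTask = { id, planId } when the next task is explicit.
  const nextTask = autoChain?.nextTask ?? null;
  const nextTaskKnown = typeof nextTask?.id === 'string' && nextTask.id.length > 0;

  // Same plan only when both plan ids are present and strictly equal.
  const sameApprovedPlan =
    nextTaskKnown && typeof planId === 'string' && nextTask.planId === planId;

  // Boundary stop only when the current task is complete and the reply is
  // explicitly flagged as not continuing in-flight (assumed flag).
  const taskBoundaryStop =
    autoChain?.currentTaskState === 'complete' && autoChain?.replyInFlight === false;

  // High-risk stop only when an upstream gate sets the boolean explicitly.
  const highRiskStop = upstreamGate?.highRiskStop === true;

  return { nextTaskKnown, sameApprovedPlan, taskBoundaryStop, highRiskStop };
}
```

Strict equality against `true`, `false`, and exact identifiers means that missing or loosely typed inputs (for example the string `'true'`) leave the corresponding field `false`, so the obligation is never activated by accident.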
**Step 2: Preserve current legal closure fallback behavior**
Keep:
- `waiting_user` for button-path handoff
- `completed` as normal fallback
But ensure those defaults do not accidentally mask auto-next obligation cases.
**Step 3: Syntax-check the hook file**
Run:
```bash
node --check hooks/force-recall/handler.ts
```
Expected: PASS
**Step 4: Commit**
```bash
git add hooks/force-recall/handler.ts
git commit -m "feat: feed auto-next obligation metadata into continuity hook input"
```
### Task 10: Upgrade hook messaging so the failure is explicit
**Files:**
- Modify: `hooks/force-recall/handler.ts`
- Modify: `scripts/test_force_recall_long_task_preflight.mjs`
**Step 1: Add fail-first assertion for the new reason in hook output**
Expect the injected block to include something equivalent to:
- `reason=missing_auto_next_dispatch`
- “Do not stop at this completed-task boundary.”
- “Auto-dispatch the next task in the same approved plan, unless waiting_user / blocked / pending_verification / high-risk stop applies.”
**Step 2: Implement the smallest wording update**
In the continuity block builder, add a branch for the new reason so the prompt block distinguishes:
- generic missing dispatch receipt
- auto-next obligation failure at a task boundary
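The branch described above might be sketched like this. The function name `buildContinuityFailureBlock`, the `[continuity]` prefix, and the generic-failure wording are assumptions for illustration; only the `reason=missing_auto_next_dispatch` token and the two quoted instructions come from this plan.

```js
// Hypothetical block builder: returns the text injected into the prompt
// when the continuity gate fails, or an empty string when it passes.
function buildContinuityFailureBlock(result) {
  if (result?.ok !== false) return '';

  const lines = [`[continuity] verdict=${result.verdict} reason=${result.reason}`];

  if (result.reason === 'missing_auto_next_dispatch') {
    // Auto-next obligation failure at a completed-task boundary.
    lines.push('Do not stop at this completed-task boundary.');
    lines.push(
      'Auto-dispatch the next task in the same approved plan, unless ' +
      'waiting_user / blocked / pending_verification / high-risk stop applies.'
    );
  } else {
    // Generic missing dispatch receipt (assumed wording).
    lines.push('A dispatch receipt is missing for the derived next action.');
  }
  return lines.join('\n');
}
```

Keeping the two cases in one builder means a smoke test can assert on the `reason=` token and on the boundary-specific sentence independently.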
**Step 3: Run hook smoke tests**
Run:
```bash
node scripts/test_force_recall_long_task_preflight.mjs
```
Expected: PASS
**Step 4: Commit**
```bash
git add hooks/force-recall/handler.ts scripts/test_force_recall_long_task_preflight.mjs
git commit -m "feat: surface explicit auto-next obligation failure in force-recall hook"
```
### Task 11: Add minimal receipt-linkage hardening only if required by tests
**Files:**
- Modify: `scripts/approved_plan_dispatch_binding.mjs`
- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
- Modify: `docs/runbooks/approved-plan-continuity.md`
- Modify: `docs/runbooks/auto-next-obligation-gate.md`
**Step 1: Only if needed, add one additional linkage field**
If the new tests cannot reliably distinguish “some dispatch happened” from “the required next task was dispatched,” add only one minimal extra receipt field such as:
- `nextTaskId`
or
- `nextTaskKey`
Do not redesign the whole receipt schema.
**Step 2: Add fail-first + pass-path tests for that field**
- stale/missing linkage => FAIL
- correct linkage => PASS
**Step 3: Re-run targeted tests**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: PASS
**Step 4: Commit**
```bash
git add scripts/approved_plan_dispatch_binding.mjs scripts/test_approved_plan_continuity_gate.mjs docs/runbooks/approved-plan-continuity.md docs/runbooks/auto-next-obligation-gate.md
git commit -m "feat: harden dispatch receipt linkage for auto-next obligation"
```
### Task 12: Add continuity-plugin MVP compatibility notes before extraction
**Files:**
- Modify: `docs/plans/2026-04-24-auto-next-obligation-gate.md`
- Modify: `docs/plans/2026-04-24-continuity-plugin-mvp.md`
- Read: `plugins/continuity/` if present, otherwise keep as future note only
**Step 1: Add an explicit “Plugin MVP Compatibility” section**
Document these compatibility constraints:
- auto-next obligation must remain a pure evaluator rule, not hook-only string logic
- high-risk stop must become a config/input flag, not a prompt convention
- same-plan next-task proof must be representable in plugin evaluator input
- receipt validation and receipt storage remain separable from evaluator logic
- legacy script envelopes must remain bridgeable during extraction
**Step 2: Add the expected plugin module seams**
List future homes:
- evaluator rule -> `plugins/continuity/src/continuity/evaluator.mjs`
- receipt linkage validation -> `plugins/continuity/src/continuity/receipt-validator.mjs`
- hook wording -> `plugins/continuity/src/adapters/force-recall.mjs`
**Step 3: Verify both plan docs mention `highRiskStop` and `auto-next`**
Run:
```bash
grep -n "highRiskStop\|auto-next" docs/plans/2026-04-24-auto-next-obligation-gate.md docs/plans/2026-04-24-continuity-plugin-mvp.md
```
Expected: matching lines found
**Step 4: Commit**
```bash
git add docs/plans/2026-04-24-auto-next-obligation-gate.md docs/plans/2026-04-24-continuity-plugin-mvp.md
git commit -m "docs: capture continuity plugin compatibility for auto-next obligation"
```
### Task 13: Run the focused verification bundle
**Files:**
- Verify: `scripts/approved_plan_continuity_gate.mjs`
- Verify: `scripts/test_approved_plan_continuity_gate.mjs`
- Verify: `scripts/test_force_recall_long_task_preflight.mjs`
- Verify: `hooks/force-recall/handler.ts`
**Step 1: Run continuity gate suite**
Run:
```bash
node scripts/test_approved_plan_continuity_gate.mjs
```
Expected: PASS
**Step 2: Run hook smoke suite**
Run:
```bash
node scripts/test_force_recall_long_task_preflight.mjs
```
Expected: PASS
**Step 3: Run syntax check**
Run:
```bash
node --check hooks/force-recall/handler.ts
node --check scripts/approved_plan_continuity_gate.mjs
node --check scripts/approved_plan_dispatch_binding.mjs
```
Expected: PASS
**Step 4: Record exact verification output in the plan tail**
Include:
- exact commands
- PASS summary
- any deliberately deferred cases
**Step 5: Commit**
```bash
git add docs/plans/2026-04-24-auto-next-obligation-gate.md
git commit -m "chore: verify auto-next obligation gate slice"
node scripts/test_approved_plan_continuity_gate.mjs
node scripts/test_force_recall_long_task_preflight.mjs
```
### Task 14: Final acceptance checklist and handoff state
### Result summary
**Files:**
- Modify: `docs/plans/2026-04-24-auto-next-obligation-gate.md`
- `node --check hooks/force-recall/handler.ts`
- `node --check scripts/approved_plan_continuity_gate.mjs`
- `node --check scripts/approved_plan_dispatch_binding.mjs`
- `node scripts/test_approved_plan_continuity_gate.mjs`: `17/17 passed`
- `node scripts/test_force_recall_long_task_preflight.mjs`
**Step 1: Add the acceptance checklist**
Mark the plan acceptable only when all items are true:
- completed-task boundary stop without auto-next now fails
- dry-run planner intent alone does not satisfy continuity
- legal closure states still pass
- explicit high-risk stop passes
- same-plan next-task obligation is distinguished from generic next-action wording
- hook output surfaces the new failure clearly
- plugin extraction compatibility is documented
- tests pass
### What was hardened in this slice
**Step 2: Add explicit remaining risks**
Include at least:
- current hook may still need better upstream proof for `sameApprovedPlan`
- high-risk stop source of truth may not yet exist and may need one future metadata slice
- receipt schema may need exactly one extra linkage field if stale receipts can spoof pass conditions
- non-`force-recall` entry points remain out of scope for this slice
- continuity evaluator now rejects receipts that do not match the required `planId`, `currentTask`, and expected next dispatch action
- minimal receipt linkage field `nextTaskId` was added so the evaluator can distinguish the required next-task dispatch from a stale or unrelated receipt
- continuity tests now fail when the receipt links to the wrong next task
- continuity tests now fail when a receipt only contains checkpoint/session-style metadata instead of real dispatch linkage
- hook preflight verification still confirms that dry-run planner intent alone does not satisfy continuity, and that the failure reason remains `missing_auto_next_dispatch`
**Step 3: Leave status as pending verification**
Do not mark implementation complete in the plan; leave it as a verified-ready handoff target.
### Deliberately deferred
**Step 4: Commit**
```bash
git add docs/plans/2026-04-24-auto-next-obligation-gate.md
git commit -m "docs: finalize auto-next obligation gate implementation plan"
```
- stronger upstream source-of-truth for `sameApprovedPlan`
- broader non-`force-recall` entry-point enforcement
- continuity plugin extraction work
---
@@ -666,36 +173,23 @@ The enforcement should stay intentionally small:
## Acceptance Criteria
- [ ] A completed task in the same approved plan cannot stop at a boundary when the next task is known unless an allowed exemption applies.
- [ ] The continuity evaluator emits a dedicated failure for missing required auto-next dispatch.
- [ ] A real dispatch receipt is still required; dry-run planner output alone cannot pass.
- [ ] Legal closure states `waiting_user`, `blocked`, `pending_verification` still pass unchanged.
- [ ] Explicit `highRiskStop` bypass is supported and test-covered.
- [ ] Hook output clearly explains the auto-next obligation failure.
- [ ] Script-level continuity tests pass.
- [ ] Hook smoke tests pass.
- [x] A completed task in the same approved plan cannot stop at a boundary when the next task is known unless an allowed exemption applies.
- [x] The continuity evaluator emits a dedicated failure for missing required auto-next dispatch.
- [x] A real dispatch receipt is still required; dry-run planner output alone cannot pass.
- [x] Legal closure states `waiting_user`, `blocked`, `pending_verification` still pass unchanged.
- [x] Explicit `highRiskStop` bypass is supported and test-covered.
- [x] Hook output clearly explains the auto-next obligation failure.
- [x] Script-level continuity tests pass.
- [x] Hook smoke tests pass.
- [ ] The plan documents how this behavior migrates cleanly into the continuity plugin MVP.
## Risks / Open Questions
1. The current hook may not yet expose a strong enough source of truth for `sameApprovedPlan`; if so, one narrow upstream metadata field may be needed.
2. `highRiskStop` may not currently exist in structured input, so the first implementation may need a conservative default of `false` until an upstream gate can set it explicitly.
3. If current receipt shape cannot prove the required next-task linkage, one minimal receipt field should be added instead of broad schema redesign.
3. Receipt schema may still need one future compatibility pass if downstream writers have not yet been upgraded to emit `nextTaskId` everywhere continuity depends on same-plan auto-next proof.
4. This slice deliberately does not solve non-hook entry points or general workflow orchestration.
Plan complete and saved to `docs/plans/2026-04-24-auto-next-obligation-gate.md`. Two execution options:
## Status
**1. Subagent-Driven (this session)** - I dispatch a fresh subagent per task, review between tasks, fast iteration
**2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints
**Which approach?**
**If Subagent-Driven chosen:**
- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
- Stay in this session
- Fresh subagent per task + code review
**If Parallel Session chosen:**
- Guide them to open new session in worktree
- **REQUIRED SUB-SKILL:** New session uses superpowers:executing-plans
pending verification / reviewer checked


@@ -35,6 +35,10 @@
- Use this field to state whether the reply closed under a dispatch-linked continuation path or some separately defined terminal closure state.
- This field is defined here as a receipt field only; legal closure states and gate enforcement are defined in later tasks.
### `nextTaskId`
- The identifier of the required next task when continuity depends on a same-plan auto-next transition.
- Use this field only to prove that the receipt links to the exact next task that had to be dispatched.
- This field is the minimal hardening field for next-task linkage; it prevents unrelated dispatches, checkpoints, or stale receipts from spoofing a continuity pass.
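A linkage check built on this field might look like the sketch below. It is an illustration only: the function name `receiptLinksToNextTask` and the `expected` argument are hypothetical, and the receipt is assumed to carry `planId` and `currentTask` alongside `nextTaskId`.

```js
// Hypothetical linkage check: the receipt must name the same plan, the same
// completed task, and the exact next task that had to be dispatched.
function receiptLinksToNextTask(receipt, expected) {
  if (receipt === null || typeof receipt !== 'object') return false;

  const isId = (value) => typeof value === 'string' && value.length > 0;

  // Refuse to compare against incomplete expectations.
  if (!isId(expected?.planId) || !isId(expected?.currentTask) || !isId(expected?.nextTaskId)) {
    return false;
  }

  return (
    receipt.planId === expected.planId &&
    receipt.currentTask === expected.currentTask &&
    receipt.nextTaskId === expected.nextTaskId
  );
}
```

Under this sketch, a receipt that carries only checkpoint or session metadata has no `nextTaskId` and is rejected, as is a receipt whose `nextTaskId` points at a different task.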
## Legal terminal states


@@ -51,6 +51,8 @@ The following behavior is forbidden:
A completed task in the same approved plan must not end with “I can continue with the next task” style closeout unless the next task has actually been dispatched.
Checkpoint artifacts, session keys, or oral/plain-text status updates are not substitutes for a real auto-next dispatch. A checkpoint may preserve state, but it does not prove that the required next task was actually dispatched.
## Canonical failure condition
If all of the following are true:
@@ -81,4 +83,6 @@ Then the continuity gate must fail and treat the stop as an auto-next obligation
- The obligation applies only when the next task is known within the same approved plan.
- A generic next action is not enough unless it proves the same approved plan task transition.
- A real dispatch receipt remains the source of truth for whether auto-next actually happened.
- Receipt linkage should include the required next-task identity when the evaluator needs to distinguish a real next-task dispatch from a stale or unrelated dispatch.
- Checkpoint/session metadata alone must not satisfy the receipt proof.
- This rule is intentionally minimal so it can later move into the continuity plugin without changing the behavior contract.