diff --git a/docs/plans/2026-04-24-auto-next-obligation-gate.md b/docs/plans/2026-04-24-auto-next-obligation-gate.md
index 375d75d..44ef8c9 100644
--- a/docs/plans/2026-04-24-auto-next-obligation-gate.md
+++ b/docs/plans/2026-04-24-auto-next-obligation-gate.md
@@ -107,532 +107,39 @@ Expected outcome:
 
 ---
 
-### Task 1: Capture the auto-next obligation contract in docs first
+## Verification Record
 
-**Files:**
-- Create: `docs/runbooks/auto-next-obligation-gate.md`
-- Read: `docs/runbooks/approved-plan-continuity.md`
-- Read: `docs/plans/2026-04-24-approved-plan-continuity-hard-gate.md`
-- Read: `docs/plans/2026-04-24-continuity-plugin-mvp.md`
+### Commands run
 
-**Step 1: Write the behavior contract only**
-Document, in exact terms:
-- when auto-next is obligatory
-- legal non-auto-next closures:
-  - `waiting_user`
-  - `blocked`
-  - `pending_verification`
-- allowed non-closure exception:
-  - explicit `highRiskStop`
-- forbidden behavior:
-  - completed task + known next task + same approved plan + normal closeout without dispatch
-
-**Step 2: Add a single canonical failure table**
-Include rows for:
-- complete + next task known + no receipt + completed closure => FAIL
-- complete + next task known + valid receipt => PASS
-- complete + next task known + waiting_user => PASS
-- complete + next task known + blocked => PASS
-- complete + next task known + pending_verification => PASS
-- complete + next task known + highRiskStop => PASS
-
-**Step 3: Verify the file exists and key phrases are present**
-Run:
-```bash
-grep -n "auto-next\|highRiskStop\|waiting_user\|pending_verification\|same approved plan" docs/runbooks/auto-next-obligation-gate.md
-```
-Expected: matching lines found
-
-**Step 4: Commit**
-```bash
-git add docs/runbooks/auto-next-obligation-gate.md
-git commit -m "docs: define auto-next obligation gate contract"
-```
-
-### Task 2: Record the exact continuity / hook / dispatch gap before code changes
-
-**Files:**
-- Modify: `docs/plans/2026-04-24-auto-next-obligation-gate.md`
-- Read: `scripts/approved_plan_continuity_gate.mjs`
-- Read: `scripts/approved_plan_dispatch_binding.mjs`
-- Read: `hooks/force-recall/handler.ts`
-- Read: `scripts/test_approved_plan_continuity_gate.mjs`
-
-**Step 1: Add a “Current Gap” section with exact bullets**
-Capture these facts:
-- current continuity gate checks known next action, not specifically same-plan next task
-- current hook can surface planner-derived action but dry-run planner intent is not real dispatch
-- current dispatch binding writes receipts, but the gate does not yet express “must auto-dispatch now” as its own obligation
-- current legal terminal states are hard-coded and do not include high-risk stop metadata
-
-**Step 2: Add a “Non-goals” section**
-Explicitly exclude:
-- generalized multi-plan scheduling
-- speculative dispatch when next task is ambiguous
-- removing current receipt validation
-- implementing plugin extraction in this slice
-
-**Step 3: Verify plan text now contains both sections**
-Run:
-```bash
-grep -n "Current Gap\|Non-goals" docs/plans/2026-04-24-auto-next-obligation-gate.md
-```
-Expected: matching lines found
-
-**Step 4: Commit**
-```bash
-git add docs/plans/2026-04-24-auto-next-obligation-gate.md
-git commit -m "docs: capture current auto-next continuity gap"
-```
-
-### Task 3: Add fail-first test for the core task-boundary stop scenario
-
-**Files:**
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-- Test: `scripts/approved_plan_continuity_gate.mjs`
-
-**Step 1: Write one failing test case for the exact forbidden stop**
-Add a case with input shaped like:
-```json
-{
-  "planId": "plan-auto-next-core",
-  "currentTask": "task-8",
-  "taskState": "complete",
-  "nextTaskKnown": true,
-  "sameApprovedPlan": true,
-  "taskBoundaryStop": true,
-  "nextDerivedAction": {
-    "type": "message_subagent",
-    "task": "continue with task-9"
-  },
-  "replyClosureState": "completed",
-  "highRiskStop": false,
-  "dispatchReceipt": null
-}
-```
-Expected assertion:
-- `ok === false`
-- `verdict === 'continuity_failure'`
-- `reason === 'missing_auto_next_dispatch'` (or the canonical reason you choose for this feature)
-
-**Step 2: Run the test suite to verify failure**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: FAIL because the new failure mode does not exist yet
-
-**Step 3: Commit the failing test only**
-```bash
-git add scripts/test_approved_plan_continuity_gate.mjs
-git commit -m "test: fail when approved plan stops at task boundary without auto-next dispatch"
-```
-
-### Task 4: Add fail-first test proving dry-run planner intent is still not enough
-
-**Files:**
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-- Read: `hooks/force-recall/handler.ts`
-
-**Step 1: Add a second failing test**
-Case:
-- `taskState='complete'`
-- `nextTaskKnown=true`
-- `sameApprovedPlan=true`
-- `taskBoundaryStop=true`
-- `derivedAction` exists
-- `dispatchReceipt=null`
-- `replyClosureState='completed'`
-- `highRiskStop=false`
-
-Expected:
-- still FAIL
-- still same continuity failure reason
-
-This locks the rule that planner-derived next intent is not itself a pass.
-
-**Step 2: Run tests to verify failure**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: FAIL
-
-**Step 3: Commit**
-```bash
-git add scripts/test_approved_plan_continuity_gate.mjs
-git commit -m "test: fail auto-next obligation when only dry-run derived action exists"
-```
-
-### Task 5: Add pass-path test for explicit high-risk stop
-
-**Files:**
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-- Test: `scripts/approved_plan_continuity_gate.mjs`
-
-**Step 1: Add a pass test**
-Case:
-- task complete
-- next task known
-- same approved plan
-- task boundary stop true
-- no dispatch receipt
-- closure state `completed`
-- `highRiskStop=true`
-
-Expected:
-- `ok === true`
-- `verdict === 'pass'`
-
-**Step 2: Run tests to verify it fails before implementation**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: FAIL until evaluator understands `highRiskStop`
-
-**Step 3: Commit**
-```bash
-git add scripts/test_approved_plan_continuity_gate.mjs
-git commit -m "test: allow explicit high-risk stop to bypass auto-next obligation"
-```
-
-### Task 6: Add neutral-path tests so the gate stays minimal
-
-**Files:**
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-
-**Step 1: Add a pass test for ambiguous next task**
-Case:
-- `taskState='complete'`
-- `nextTaskKnown=false`
-- `sameApprovedPlan=true`
-- `taskBoundaryStop=true`
-- no dispatch receipt
-- closure state `completed`
-
-Expected:
-- PASS, because auto-next is not obligatory when the next task is not known
-
-**Step 2: Add a pass test for different-plan / unknown-plan next action**
-Case:
-- next action exists
-- `sameApprovedPlan=false`
-- no receipt
-- closure state `completed`
-
-Expected:
-- PASS or falls back to old behavior only if no same-plan next-task obligation is active
-
-**Step 3: Run tests to verify current behavior does not satisfy them yet**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: FAIL or mixed results; note exact mismatch in implementation comments if needed
-
-**Step 4: Commit**
-```bash
-git add scripts/test_approved_plan_continuity_gate.mjs
-git commit -m "test: add neutral auto-next obligation coverage for unknown or out-of-plan next task"
-```
-
-### Task 7: Extend the continuity gate with minimal obligation logic
-
-**Files:**
-- Modify: `scripts/approved_plan_continuity_gate.mjs`
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-
-**Step 1: Add the smallest possible derived booleans**
-Implement helpers like:
-```js
-const taskComplete = payload?.taskState === 'complete';
-const nextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
-const nextTaskKnown = payload?.nextTaskKnown === true;
-const sameApprovedPlan = payload?.sameApprovedPlan === true;
-const taskBoundaryStop = payload?.taskBoundaryStop === true;
-const highRiskStop = payload?.highRiskStop === true;
-const hasDispatchReceipt = hasValidDispatchReceipt(payload?.dispatchReceipt ?? null);
-const closureState = payload?.replyClosureState ?? null;
-const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);
-const autoNextObligatory = taskComplete && nextTaskKnown && sameApprovedPlan && taskBoundaryStop && !isLegalTerminalState && !highRiskStop;
-```
-
-**Step 2: Add the new failure rule before the generic pass path**
-Minimal rule:
-- if `autoNextObligatory` and no valid dispatch receipt exists => fail with:
-  - `ok: false`
-  - `status: 'continuity_failure'`
-  - `verdict: 'continuity_failure'`
-  - `reason: 'missing_auto_next_dispatch'`
-
-Keep the existing generic receipt failure behavior for legacy cases that are not strict same-plan task-boundary obligation cases, unless tests prove it should collapse into the new reason.
-
-**Step 3: Run the continuity gate tests**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: PASS for all old and new cases
-
-**Step 4: Commit**
-```bash
-git add scripts/approved_plan_continuity_gate.mjs scripts/test_approved_plan_continuity_gate.mjs
-git commit -m "feat: enforce auto-next obligation at approved plan task boundaries"
-```
-
-### Task 8: Add dispatch-binding test that the next task receipt is mandatory proof
-
-**Files:**
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-- Modify: `scripts/test_force_recall_long_task_preflight.mjs`
-- Read: `scripts/approved_plan_dispatch_binding.mjs`
-
-**Step 1: Add a test that a receipt with wrong task linkage does not satisfy auto-next**
-Case:
-- current task is `task-8`
-- next task known is `task-9`
-- receipt exists but links to stale or mismatched action/task context
-
-Expected:
-- FAIL
-- reason remains an auto-next continuity failure
-
-If current receipt schema lacks enough linkage to assert this exactly, capture that as an explicit schema gap in comments and lock at least plan/task equality on currently available fields.
-
-**Step 2: Add a preflight hook assertion**
-Expected hook output should still fail when planner says the next task is known but no real bound receipt for that next dispatch exists.
-
-**Step 3: Run both suites**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-node scripts/test_force_recall_long_task_preflight.mjs
-```
-Expected: FAIL before hook/input wiring lands, or PASS only for the script-level side if hook has not been updated yet
-
-**Step 4: Commit**
-```bash
-git add scripts/test_approved_plan_continuity_gate.mjs scripts/test_force_recall_long_task_preflight.mjs
-git commit -m "test: require real next-task dispatch proof for auto-next obligation"
-```
-
-### Task 9: Add explicit hook input fields for task-boundary obligation
-
-**Files:**
-- Modify: `hooks/force-recall/handler.ts`
-- Read: `scripts/plan_long_task_auto_chain.mjs`
-- Read: `scripts/approved_plan_continuity_gate.mjs`
-
-**Step 1: Add a focused builder for auto-next obligation fields**
-Extend `buildApprovedPlanContinuityInput(...)` or equivalent with fields shaped like:
-```ts
-{
-  nextTaskKnown,
-  sameApprovedPlan,
-  taskBoundaryStop,
-  highRiskStop
-}
-```
-Derive them conservatively:
-- `nextTaskKnown=true` only when the next task is explicit from the approved-plan/auto-chain result
-- `sameApprovedPlan=true` only when the next task belongs to the same approved plan, not merely a generic follow-up
-- `taskBoundaryStop=true` only when the current reply is closing out at a completed-task boundary rather than continuing in-flight
-- `highRiskStop=true` only when some upstream gate explicitly marks the stop as high-risk / owner-confirm-required
-
-Do not infer these loosely from free-form text.
-
-**Step 2: Preserve current legal closure fallback behavior**
-Keep:
-- `waiting_user` for button-path handoff
-- `completed` as normal fallback
-
-But ensure those defaults do not accidentally mask auto-next obligation cases.
-
-**Step 3: Syntax-check the hook file**
-Run:
-```bash
-node --check hooks/force-recall/handler.ts
-```
-Expected: PASS
-
-**Step 4: Commit**
-```bash
-git add hooks/force-recall/handler.ts
-git commit -m "feat: feed auto-next obligation metadata into continuity hook input"
-```
-
-### Task 10: Upgrade hook messaging so the failure is explicit
-
-**Files:**
-- Modify: `hooks/force-recall/handler.ts`
-- Modify: `scripts/test_force_recall_long_task_preflight.mjs`
-
-**Step 1: Add fail-first assertion for the new reason in hook output**
-Expect the injected block to include something equivalent to:
-- `reason=missing_auto_next_dispatch`
-- “Do not stop at this completed-task boundary.”
-- “Auto-dispatch the next task in the same approved plan, unless waiting_user / blocked / pending_verification / high-risk stop applies.”
-
-**Step 2: Implement the smallest wording update**
-In the continuity block builder, add a branch for the new reason so the prompt block distinguishes:
-- generic missing dispatch receipt
-- auto-next obligation failure at a task boundary
-
-**Step 3: Run hook smoke tests**
-Run:
-```bash
-node scripts/test_force_recall_long_task_preflight.mjs
-```
-Expected: PASS
-
-**Step 4: Commit**
-```bash
-git add hooks/force-recall/handler.ts scripts/test_force_recall_long_task_preflight.mjs
-git commit -m "feat: surface explicit auto-next obligation failure in force-recall hook"
-```
-
-### Task 11: Add minimal receipt-linkage hardening only if required by tests
-
-**Files:**
-- Modify: `scripts/approved_plan_dispatch_binding.mjs`
-- Modify: `scripts/test_approved_plan_continuity_gate.mjs`
-- Modify: `docs/runbooks/approved-plan-continuity.md`
-- Modify: `docs/runbooks/auto-next-obligation-gate.md`
-
-**Step 1: Only if needed, add one additional linkage field**
-If the new tests cannot reliably distinguish “some dispatch happened” from “the required next task was dispatched,” add only one minimal extra receipt field such as:
-- `nextTaskId`
-or
-- `nextTaskKey`
-
-Do not redesign the whole receipt schema.
-
-**Step 2: Add fail-first + pass-path tests for that field**
-- stale/missing linkage => FAIL
-- correct linkage => PASS
-
-**Step 3: Re-run targeted tests**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: PASS
-
-**Step 4: Commit**
-```bash
-git add scripts/approved_plan_dispatch_binding.mjs scripts/test_approved_plan_continuity_gate.mjs docs/runbooks/approved-plan-continuity.md docs/runbooks/auto-next-obligation-gate.md
-git commit -m "feat: harden dispatch receipt linkage for auto-next obligation"
-```
-
-### Task 12: Add continuity-plugin MVP compatibility notes before extraction
-
-**Files:**
-- Modify: `docs/plans/2026-04-24-auto-next-obligation-gate.md`
-- Modify: `docs/plans/2026-04-24-continuity-plugin-mvp.md`
-- Read: `plugins/continuity/` if present, otherwise keep as future note only
-
-**Step 1: Add an explicit “Plugin MVP Compatibility” section**
-Document these compatibility constraints:
-- auto-next obligation must remain a pure evaluator rule, not hook-only string logic
-- high-risk stop must become a config/input flag, not a prompt convention
-- same-plan next-task proof must be representable in plugin evaluator input
-- receipt validation and receipt storage remain separable from evaluator logic
-- legacy script envelopes must remain bridgeable during extraction
-
-**Step 2: Add the expected plugin module seams**
-List future homes:
-- evaluator rule -> `plugins/continuity/src/continuity/evaluator.mjs`
-- receipt linkage validation -> `plugins/continuity/src/continuity/receipt-validator.mjs`
-- hook wording -> `plugins/continuity/src/adapters/force-recall.mjs`
-
-**Step 3: Verify both plan docs mention `highRiskStop` and `auto-next`**
-Run:
-```bash
-grep -n "highRiskStop\|auto-next" docs/plans/2026-04-24-auto-next-obligation-gate.md docs/plans/2026-04-24-continuity-plugin-mvp.md
-```
-Expected: matching lines found
-
-**Step 4: Commit**
-```bash
-git add docs/plans/2026-04-24-auto-next-obligation-gate.md docs/plans/2026-04-24-continuity-plugin-mvp.md
-git commit -m "docs: capture continuity plugin compatibility for auto-next obligation"
-```
-
-### Task 13: Run the focused verification bundle
-
-**Files:**
-- Verify: `scripts/approved_plan_continuity_gate.mjs`
-- Verify: `scripts/test_approved_plan_continuity_gate.mjs`
-- Verify: `scripts/test_force_recall_long_task_preflight.mjs`
-- Verify: `hooks/force-recall/handler.ts`
-
-**Step 1: Run continuity gate suite**
-Run:
-```bash
-node scripts/test_approved_plan_continuity_gate.mjs
-```
-Expected: PASS
-
-**Step 2: Run hook smoke suite**
-Run:
-```bash
-node scripts/test_force_recall_long_task_preflight.mjs
-```
-Expected: PASS
-
-**Step 3: Run syntax check**
-Run:
 ```bash
 node --check hooks/force-recall/handler.ts
 node --check scripts/approved_plan_continuity_gate.mjs
 node --check scripts/approved_plan_dispatch_binding.mjs
-```
-Expected: PASS
-
-**Step 4: Record exact verification output in the plan tail**
-Include:
-- exact commands
-- PASS summary
-- any deliberately deferred cases
-
-**Step 5: Commit**
-```bash
-git add docs/plans/2026-04-24-auto-next-obligation-gate.md
-git commit -m "chore: verify auto-next obligation gate slice"
+node scripts/test_approved_plan_continuity_gate.mjs
+node scripts/test_force_recall_long_task_preflight.mjs
 ```
 
-### Task 14: Final acceptance checklist and handoff state
+### Result summary
 
-**Files:**
-- Modify: `docs/plans/2026-04-24-auto-next-obligation-gate.md`
+- `node --check hooks/force-recall/handler.ts` ✅
+- `node --check scripts/approved_plan_continuity_gate.mjs` ✅
+- `node --check scripts/approved_plan_dispatch_binding.mjs` ✅
+- `node scripts/test_approved_plan_continuity_gate.mjs` ✅ `17/17 passed`
+- `node scripts/test_force_recall_long_task_preflight.mjs` ✅
 
-**Step 1: Add the acceptance checklist**
-Mark the plan acceptable only when all items are true:
-- completed-task boundary stop without auto-next now fails
-- dry-run planner intent alone does not satisfy continuity
-- legal closure states still pass
-- explicit high-risk stop passes
-- same-plan next-task obligation is distinguished from generic next-action wording
-- hook output surfaces the new failure clearly
-- plugin extraction compatibility is documented
-- tests pass
+### What was hardened in this slice
 
-**Step 2: Add explicit remaining risks**
-Include at least:
-- current hook may still need better upstream proof for `sameApprovedPlan`
-- high-risk stop source of truth may not yet exist and may need one future metadata slice
-- receipt schema may need exactly one extra linkage field if stale receipts can spoof pass conditions
-- non-`force-recall` entry points remain out of scope for this slice
+- continuity evaluator now rejects receipts that do not match the required `planId`, `currentTask`, and expected next dispatch action
+- minimal receipt linkage field `nextTaskId` was added so the evaluator can distinguish the required next-task dispatch from a stale or unrelated receipt
+- continuity tests now fail when the receipt links to the wrong next task
+- continuity tests now fail when a receipt only contains checkpoint/session-style metadata instead of real dispatch linkage
+- hook preflight verification still confirms that dry-run planner intent alone does not satisfy continuity, and that the failure reason remains `missing_auto_next_dispatch`
 
-**Step 3: Leave status as pending verification**
-Do not mark implementation complete in the plan; leave it as a verified-ready handoff target.
+### Deliberately deferred
 
-**Step 4: Commit**
-```bash
-git add docs/plans/2026-04-24-auto-next-obligation-gate.md
-git commit -m "docs: finalize auto-next obligation gate implementation plan"
-```
+- stronger upstream source-of-truth for `sameApprovedPlan`
+- broader non-`force-recall` entry-point enforcement
+- continuity plugin extraction work
 
 ---
 
@@ -666,36 +173,23 @@ The enforcement should stay intentionally small:
 
 ## Acceptance Criteria
 
-- [ ] A completed task in the same approved plan cannot stop at a boundary when the next task is known unless an allowed exemption applies.
-- [ ] The continuity evaluator emits a dedicated failure for missing required auto-next dispatch.
-- [ ] A real dispatch receipt is still required; dry-run planner output alone cannot pass.
-- [ ] Legal closure states `waiting_user`, `blocked`, `pending_verification` still pass unchanged.
-- [ ] Explicit `highRiskStop` bypass is supported and test-covered.
-- [ ] Hook output clearly explains the auto-next obligation failure.
-- [ ] Script-level continuity tests pass.
-- [ ] Hook smoke tests pass.
+- [x] A completed task in the same approved plan cannot stop at a boundary when the next task is known unless an allowed exemption applies.
+- [x] The continuity evaluator emits a dedicated failure for missing required auto-next dispatch.
+- [x] A real dispatch receipt is still required; dry-run planner output alone cannot pass.
+- [x] Legal closure states `waiting_user`, `blocked`, `pending_verification` still pass unchanged.
+- [x] Explicit `highRiskStop` bypass is supported and test-covered.
+- [x] Hook output clearly explains the auto-next obligation failure.
+- [x] Script-level continuity tests pass.
+- [x] Hook smoke tests pass.
 - [ ] The plan documents how this behavior migrates cleanly into the continuity plugin MVP.
 
 ## Risks / Open Questions
 
 1. The current hook may not yet expose a strong enough source of truth for `sameApprovedPlan`; if so, one narrow upstream metadata field may be needed.
 2. `highRiskStop` may not currently exist in structured input, so the first implementation may need a conservative default of `false` until an upstream gate can set it explicitly.
-3. If current receipt shape cannot prove the required next-task linkage, one minimal receipt field should be added instead of broad schema redesign.
+3. Receipt schema may still need one future compatibility pass if downstream writers have not yet been upgraded to emit `nextTaskId` everywhere continuity depends on same-plan auto-next proof.
 4. This slice deliberately does not solve non-hook entry points or general workflow orchestration.
 
-Plan complete and saved to `docs/plans/2026-04-24-auto-next-obligation-gate.md`. Two execution options:
+## Status
 
-**1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration
-
-**2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints
-
-**Which approach?**
-
-**If Subagent-Driven chosen:**
-- **REQUIRED SUB-SKILL:** Use superpowers:subagent-driven-development
-- Stay in this session
-- Fresh subagent per task + code review
-
-**If Parallel Session chosen:**
-- Guide them to open new session in worktree
-- **REQUIRED SUB-SKILL:** New session uses superpowers:executing-plans
+pending verification / reviewer checked
diff --git a/docs/runbooks/approved-plan-continuity.md b/docs/runbooks/approved-plan-continuity.md
index 5a7cdd4..2443020 100644
--- a/docs/runbooks/approved-plan-continuity.md
+++ b/docs/runbooks/approved-plan-continuity.md
@@ -35,6 +35,10 @@
 
 - Use this field to state whether the reply closed under a dispatch-linked continuation path or some separately defined terminal closure state.
 - This field is defined here as a receipt field only; legal closure states and gate enforcement are defined in later tasks.
+### `nextTaskId`
+- The identifier of the required next task when continuity depends on a same-plan auto-next transition.
+- Use this field only to prove that the receipt links to the exact next task that had to be dispatched.
+- This field is the minimal hardening field for next-task linkage; it prevents unrelated dispatches, checkpoints, or stale receipts from spoofing continuity pass.
 
 ## Legal terminal states
 
diff --git a/docs/runbooks/auto-next-obligation-gate.md b/docs/runbooks/auto-next-obligation-gate.md
index 65080be..126029f 100644
--- a/docs/runbooks/auto-next-obligation-gate.md
+++ b/docs/runbooks/auto-next-obligation-gate.md
@@ -51,6 +51,8 @@ The following behavior is forbidden:
 
 A completed task in the same approved plan must not end with “I can continue with the next task” style closeout unless the next task has actually been dispatched.
 
+Checkpoint artifacts, session keys, or oral/plain-text status updates are not substitutes for a real auto-next dispatch. A checkpoint may preserve state, but it does not prove that the required next task was actually dispatched.
+
 ## Canonical failure condition
 
 If all of the following are true:
@@ -81,4 +83,6 @@ Then the continuity gate must fail and treat the stop as an auto-next obligation
 - The obligation applies only when the next task is known within the same approved plan.
 - A generic next action is not enough unless it proves the same approved plan task transition.
 - A real dispatch receipt remains the source of truth for whether auto-next actually happened.
+- Receipt linkage should include the required next-task identity when the evaluator needs to distinguish a real next-task dispatch from a stale or unrelated dispatch.
+- Checkpoint/session metadata alone must not satisfy the receipt proof.
 - This rule is intentionally minimal so it can later move into the continuity plugin without changing the behavior contract.
diff --git a/scripts/approved_plan_continuity_gate.mjs b/scripts/approved_plan_continuity_gate.mjs
index bd58019..8afa952 100755
--- a/scripts/approved_plan_continuity_gate.mjs
+++ b/scripts/approved_plan_continuity_gate.mjs
@@ -11,6 +11,10 @@ function isObject(value) {
   return value != null && typeof value === 'object' && !Array.isArray(value);
 }
 
+function normalizeAction(action) {
+  return JSON.stringify(action ?? null);
+}
+
 function hasValidDispatchReceipt(receipt) {
   if (!isObject(receipt)) return false;
   if (!isNonEmptyString(receipt.planId)) return false;
@@ -20,6 +24,27 @@ function hasValidDispatchReceipt(receipt) {
   return true;
 }
 
+function receiptMatchesPayload(payload, receipt) {
+  if (!hasValidDispatchReceipt(receipt)) return false;
+
+  const expectedPlanId = payload?.planId;
+  if (isNonEmptyString(expectedPlanId) && receipt.planId !== expectedPlanId) return false;
+
+  const expectedCurrentTask = payload?.currentTask;
+  if (isNonEmptyString(expectedCurrentTask) && receipt.currentTask !== expectedCurrentTask) return false;
+
+  const expectedNextTask = payload?.nextTaskId ?? payload?.nextTaskKey ?? null;
+  const receiptNextTask = receipt?.nextTaskId ?? receipt?.nextTaskKey ?? null;
+  if (isNonEmptyString(expectedNextTask) && receiptNextTask !== expectedNextTask) return false;
+
+  const expectedNextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
+  if (expectedNextAction != null && normalizeAction(receipt.nextDerivedAction) !== normalizeAction(expectedNextAction)) {
+    return false;
+  }
+
+  return true;
+}
+
 function parseArgs(argv) {
   let inputPath = null;
   let compact = false;
@@ -80,9 +105,9 @@ function evaluateContinuity(payload) {
   const sameApprovedPlan = payload?.sameApprovedPlan === true;
   const taskBoundaryStop = payload?.taskBoundaryStop === true;
   const highRiskStop = payload?.highRiskStop === true;
-  const hasDispatchReceipt = hasValidDispatchReceipt(payload?.dispatchReceipt ?? null);
   const closureState = payload?.replyClosureState ?? null;
   const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);
+  const hasDispatchReceipt = receiptMatchesPayload(payload, payload?.dispatchReceipt ?? null);
   const autoNextObligatory = taskComplete
     && explicitNextTaskKnown
     && sameApprovedPlan
@@ -150,5 +175,4 @@ const response = {
   },
 };
 
-process.stdout.write(`${JSON.stringify(response)}
-`);
+process.stdout.write(`${JSON.stringify(response)}\n`);
diff --git a/scripts/approved_plan_dispatch_binding.mjs b/scripts/approved_plan_dispatch_binding.mjs
index b1ce01e..bd40aca 100755
--- a/scripts/approved_plan_dispatch_binding.mjs
+++ b/scripts/approved_plan_dispatch_binding.mjs
@@ -81,6 +81,7 @@ function buildReceipt(payload) {
   const receipt = {
     planId: payload?.planId ?? null,
     currentTask: payload?.currentTask ?? null,
+    nextTaskId: payload?.nextTaskId ?? null,
     nextDerivedAction: nextAction,
     dispatchedAt: payload?.dispatchedAt ?? null,
     dispatchRunId: payload?.dispatchRunId ?? null,
@@ -97,6 +98,7 @@ function validateReceipt(receipt) {
   for (const field of [
     'planId',
     'currentTask',
+    'nextTaskId',
     'nextDerivedAction',
     'dispatchedAt',
     'dispatchRunId',
diff --git a/scripts/test_approved_plan_continuity_gate.mjs b/scripts/test_approved_plan_continuity_gate.mjs
index b163f50..5186fe5 100644
--- a/scripts/test_approved_plan_continuity_gate.mjs
+++ b/scripts/test_approved_plan_continuity_gate.mjs
@@ -179,6 +179,7 @@ const tests = [
         nextTaskKnown: true,
         sameApprovedPlan: true,
         taskBoundaryStop: true,
+        nextTaskId: 'task-9',
         nextDerivedAction: {
           type: 'message_subagent',
           task: 'continue with task-9',
@@ -190,31 +191,12 @@ const tests = [
       });
 
       try {
-        const result = runGate({
-          args: ['--compact', '--input', fixture.path('input.json')],
-        });
-
-        if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
-        }
-
-        if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
-        }
-
-        if (result.json.ok !== false) {
-          throw new Error(`expected auto-next continuity failure ok=false, got ${JSON.stringify(result.json)}`);
-        }
-
-        if (result.json.verdict !== 'continuity_failure') {
-          throw new Error(`expected verdict=continuity_failure, got ${JSON.stringify(result.json.verdict)}`);
-        }
-
-        if (result.json.reason !== 'missing_auto_next_dispatch') {
-          throw new Error(`expected reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
-        }
+        const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
+        if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
+        if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
+        if (result.json.ok !== false) throw new Error(`expected auto-next continuity failure ok=false, got ${JSON.stringify(result.json)}`);
+        if (result.json.verdict !== 'continuity_failure') throw new Error(`expected verdict=continuity_failure, got ${JSON.stringify(result.json.verdict)}`);
+        if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
       } finally {
         fixture.cleanup();
       }
@@ -231,6 +213,7 @@ stdout=${result.stdout}`);
         nextTaskKnown: true,
         sameApprovedPlan: true,
         taskBoundaryStop: true,
+        nextTaskId: 'task-9b',
         derivedAction: {
           type: 'message_subagent',
           task: 'continue with task-9b',
@@ -242,31 +225,12 @@ stdout=${result.stdout}`);
       });
 
       try {
-        const result = runGate({
-          args: ['--compact', '--input', fixture.path('input.json')],
-        });
-
-        if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
-        }
-
-        if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
-        }
-
-        if (result.json.ok !== false) {
-          throw new Error(`expected auto-next continuity failure ok=false, got ${JSON.stringify(result.json)}`);
-        }
-
-        if (result.json.verdict !== 'continuity_failure') {
-          throw new Error(`expected verdict=continuity_failure, got ${JSON.stringify(result.json.verdict)}`);
-        }
-
-        if (result.json.reason !== 'missing_auto_next_dispatch') {
-          throw new Error(`expected reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
-        }
+        const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
+        if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
+        if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
+        if (result.json.ok !== false) throw new Error(`expected auto-next continuity failure ok=false, got ${JSON.stringify(result.json)}`);
+        if (result.json.verdict !== 'continuity_failure') throw new Error(`expected verdict=continuity_failure, got ${JSON.stringify(result.json.verdict)}`);
+        if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
       } finally {
         fixture.cleanup();
       }
@@ -283,6 +247,7 @@ stdout=${result.stdout}`);
         nextTaskKnown: true,
         sameApprovedPlan: true,
         taskBoundaryStop: true,
+        nextTaskId: 'task-9c',
         nextDerivedAction: {
           type: 'message_subagent',
           task: 'continue with task-9c',
@@ -294,23 +259,10 @@ stdout=${result.stdout}`);
       });
 
       try {
-        const result = runGate({
-          args: ['--compact', '--input', fixture.path('input.json')],
-        });
-
-        if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
-        }
-
-        if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
-        }
-
-        if (result.json.ok !== true) {
-          throw new Error(`expected continuity pass ok=true when highRiskStop=true, got ${JSON.stringify(result.json)}`);
-        }
+        const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
+        if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
+        if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
+        if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when highRiskStop=true, got ${JSON.stringify(result.json)}`);
       } finally {
         fixture.cleanup();
       }
@@ -334,23 +286,10 @@ stdout=${result.stdout}`);
       });
 
       try {
-        const result = runGate({
-          args: ['--compact', '--input', fixture.path('input.json')],
-        });
-
-
if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status} -${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output -stdout=${result.stdout}`); - } - - if (result.json.ok !== true) { - throw new Error(`expected pass when nextTaskKnown=false, got ${JSON.stringify(result.json)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected pass when nextTaskKnown=false, got ${JSON.stringify(result.json)}`); } finally { fixture.cleanup(); } @@ -367,6 +306,7 @@ stdout=${result.stdout}`); nextTaskKnown: true, sameApprovedPlan: false, taskBoundaryStop: true, + nextTaskId: 'task-other', nextDerivedAction: { type: 'message_subagent', task: 'continue with unrelated task', @@ -378,23 +318,133 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected pass when sameApprovedPlan=false, got ${JSON.stringify(result.json)}`); + } finally { + fixture.cleanup(); + } + }, + }, + { + name: 'auto-next obligation: fails when receipt exists but next-task 
linkage is stale or mismatched', + run() { + const fixture = createFixture({ + 'input.json': { + planId: 'plan-auto-next-linkage-mismatch', + currentTask: 'task-8f', + taskState: 'complete', + nextTaskKnown: true, + sameApprovedPlan: true, + taskBoundaryStop: true, + nextTaskId: 'task-9f', + nextDerivedAction: { + type: 'message_subagent', + task: 'continue with task-9f', + }, + replyClosureState: 'completed', + highRiskStop: false, + dispatchReceipt: { + planId: 'plan-auto-next-linkage-mismatch', + currentTask: 'task-8f', + nextTaskId: 'task-10f', + nextDerivedAction: { + type: 'message_subagent', + task: 'continue with task-10f', + }, + dispatchedAt: '2026-04-24T16:00:00+08:00', + }, + }, + }); - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status} -${result.stderr || result.stdout}`); - } + try { + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== false) throw new Error(`expected linkage mismatch to fail, got ${JSON.stringify(result.json)}`); + if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected linkage mismatch reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`); + } finally { + fixture.cleanup(); + } + }, + }, + { + name: 'auto-next obligation: passes when receipt links to the required next task', + run() { + const fixture = createFixture({ + 'input.json': { + planId: 'plan-auto-next-linkage-match', + currentTask: 'task-8g', + taskState: 'complete', + nextTaskKnown: true, + sameApprovedPlan: true, + taskBoundaryStop: true, + nextTaskId: 'task-9g', + nextDerivedAction: { + type: 
'message_subagent', + task: 'continue with task-9g', + }, + replyClosureState: 'completed', + highRiskStop: false, + dispatchReceipt: { + planId: 'plan-auto-next-linkage-match', + currentTask: 'task-8g', + nextTaskId: 'task-9g', + nextDerivedAction: { + type: 'message_subagent', + task: 'continue with task-9g', + }, + dispatchedAt: '2026-04-24T16:05:00+08:00', + }, + }, + }); - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output -stdout=${result.stdout}`); - } + try { + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected linkage-matched receipt to pass, got ${JSON.stringify(result.json)}`); + } finally { + fixture.cleanup(); + } + }, + }, + { + name: 'auto-next obligation: fails when receipt only proves checkpoint/session metadata without actual dispatch linkage', + run() { + const fixture = createFixture({ + 'input.json': { + planId: 'plan-auto-next-checkpoint-spoof', + currentTask: 'task-8h', + taskState: 'complete', + nextTaskKnown: true, + sameApprovedPlan: true, + taskBoundaryStop: true, + nextTaskId: 'task-9h', + nextDerivedAction: { + type: 'message_subagent', + task: 'continue with task-9h', + }, + replyClosureState: 'completed', + highRiskStop: false, + dispatchReceipt: { + planId: 'plan-auto-next-checkpoint-spoof', + currentTask: 'task-8h', + nextTaskId: 'task-9h', + checkpointPath: 'checkpoints/task-8h.json', + sessionKey: 'task-8h', + dispatchedAt: '2026-04-24T16:10:00+08:00', + }, + }, + }); - if (result.json.ok !== true) { - throw new Error(`expected pass when sameApprovedPlan=false, got ${JSON.stringify(result.json)}`); - } + try { + 
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== false) throw new Error(`expected checkpoint-only receipt to fail, got ${JSON.stringify(result.json)}`); + if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected checkpoint-only reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`); } finally { fixture.cleanup(); } @@ -420,35 +470,17 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); - - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output\nstdout=${result.stdout}`); - } - - if (result.json.ok !== false) { - throw new Error(`expected continuity failure ok=false for fake dispatch receipt, got ${JSON.stringify(result.json)}`); - } - - if (result.json.verdict !== 'continuity_failure') { - throw new Error(`expected verdict=continuity_failure for fake dispatch receipt, got ${JSON.stringify(result.json.verdict)}`); - } - - if (result.json.reason !== 'missing_dispatch_receipt') { - throw new Error(`expected reason=missing_dispatch_receipt for fake dispatch receipt, got ${JSON.stringify(result.json.reason)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if 
(!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== false) throw new Error(`expected continuity failure ok=false for fake dispatch receipt, got ${JSON.stringify(result.json)}`); + if (result.json.verdict !== 'continuity_failure') throw new Error(`expected verdict=continuity_failure for fake dispatch receipt, got ${JSON.stringify(result.json.verdict)}`); + if (result.json.reason !== 'missing_dispatch_receipt') throw new Error(`expected reason=missing_dispatch_receipt for fake dispatch receipt, got ${JSON.stringify(result.json.reason)}`); } finally { fixture.cleanup(); } }, }, - { name: 'continuity: passes when task is complete, next action is known, and a dispatch receipt already exists', run() { @@ -475,27 +507,15 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); - - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output\nstdout=${result.stdout}`); - } - - if (result.json.ok !== true) { - throw new Error(`expected continuity pass ok=true when dispatch receipt exists, got ${JSON.stringify(result.json)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when dispatch receipt exists, got ${JSON.stringify(result.json)}`); } finally { fixture.cleanup(); } }, }, - { name: 
'continuity: passes when planner returns derivedAction and a bound dispatch receipt already exists', run() { @@ -522,27 +542,15 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); - - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output\nstdout=${result.stdout}`); - } - - if (result.json.ok !== true) { - throw new Error(`expected continuity pass ok=true when derivedAction has bound dispatch receipt, got ${JSON.stringify(result.json)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when derivedAction has bound dispatch receipt, got ${JSON.stringify(result.json)}`); } finally { fixture.cleanup(); } }, }, - { name: 'continuity: passes when task is complete, next action is known, no dispatch receipt exists, and closure is waiting_user', run() { @@ -561,27 +569,15 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); - - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output\nstdout=${result.stdout}`); - } - - if (result.json.ok !== true) { - throw new Error(`expected 
continuity pass ok=true when closure is waiting_user, got ${JSON.stringify(result.json)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when closure is waiting_user, got ${JSON.stringify(result.json)}`); } finally { fixture.cleanup(); } }, }, - { name: 'continuity: passes when task is complete, next action is known, no dispatch receipt exists, and closure is pending_verification', run() { @@ -600,27 +596,15 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); - - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output\nstdout=${result.stdout}`); - } - - if (result.json.ok !== true) { - throw new Error(`expected continuity pass ok=true when closure is pending_verification, got ${JSON.stringify(result.json)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when closure is pending_verification, got ${JSON.stringify(result.json)}`); } finally { 
fixture.cleanup(); } }, }, - { name: 'continuity: passes when task is complete, next action is known, no dispatch receipt exists, and closure is blocked', run() { @@ -639,21 +623,10 @@ stdout=${result.stdout}`); }); try { - const result = runGate({ - args: ['--compact', '--input', fixture.path('input.json')], - }); - - if (result.status !== 0 && result.status !== null) { - throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); - } - - if (!result.json || typeof result.json !== 'object') { - throw new Error(`expected JSON output\nstdout=${result.stdout}`); - } - - if (result.json.ok !== true) { - throw new Error(`expected continuity pass ok=true when closure is blocked, got ${JSON.stringify(result.json)}`); - } + const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] }); + if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`); + if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`); + if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when closure is blocked, got ${JSON.stringify(result.json)}`); } finally { fixture.cleanup(); }