feat: sync auto-next obligation gate hardening

2026-04-24 16:41:48 +08:00
parent 7c362dedf8
commit cb34935b28
9 changed files with 741 additions and 155 deletions

114
README.md

@@ -2,78 +2,102 @@
 ## Chinese Description (中文說明)
-This repo is a focused workflow export extracted from a larger OpenClaw workspace, covering:
+This repo currently focuses on two deliverables directly related to continuity:
 - **approved plan continuity hard-gate**
-- **dispatch receipt binding**
-- **anti-blackhole / completion-delivery watchdog groundwork**
+- **auto-next obligation gate**
-The goal is to stop two classes of problems from recurring:
+The goal is to prevent the following two failures:
 1. **continuity failure / auto-next break**
-2. **subagent anti-blackhole / fake timeout**
+   - the task is complete
+   - the next step is known
+   - but the next task is never actually dispatched
+   - yet the flow is still treated as a normal closeout
+2. **task-boundary stop / verbal-only continuation**
+   - within the same approved plan, auto-next should actually happen
+   - but the main agent stops at the task boundary
+   - and substitutes a checkpoint / verbal report / session metadata for a real dispatch
 ## Completed So Far
 ### A. Continuity hard-gate
 - continuity evaluator
-- dispatch receipt binding groundwork
-- `derivedAction` continuity binding
+- receipt validator: minimum-field validation
+- `derivedAction` / `nextDerivedAction` included in the continuity decision
 - `dry_run_dispatch` may not pose as a real receipt
-- fake receipt authority minimally tightened
-- hook integration wired in
-### B. Anti-blackhole watchdog recovery
-- watchdog status recompute
-- minimal closed-loop recovery decision:
-  - `fetch_history`
-  - `respawn`
-  - `blocked`
-- owner-visible reporting payload
-- scenario matrix tests
+- fake receipts are not allowed through
+- hook integration wired into `hooks/force-recall/handler.ts`
+### B. Auto-next obligation gate
+- new failure reason: `missing_auto_next_dispatch`
+- within the same approved plan, if:
+  - the current task is complete
+  - the next task is known
+  - `sameApprovedPlan=true`
+  - `taskBoundaryStop=true`
+  - closure is not `waiting_user` / `blocked` / `pending_verification`
+  - `highRiskStop` is not set
+  - and no real next dispatch receipt exists
+  - ⇒ fail immediately; the flow must not stop at the boundary and wait for the owner to say "continue"
+- receipt linkage hardening: a receipt must now match the required next-task handoff instead of passing merely because it exists
+- new minimal linkage field: `nextTaskId`
+- checkpoint / session metadata / stale receipt / dry-run planner intent may not pose as auto-next dispatch proof
+## Validation Status
+- `node scripts/test_approved_plan_continuity_gate.mjs` → `17 passed / 0 failed`
+- `node scripts/test_force_recall_long_task_preflight.mjs` → PASS
+- `node --check hooks/force-recall/handler.ts` → PASS
+- `node --check scripts/approved_plan_continuity_gate.mjs` → PASS
+- `node --check scripts/approved_plan_dispatch_binding.mjs` → PASS
 ## Current Limitations
-- continuity still leans toward a prompt-level hard-gate integration
-- watchdog recovery is currently validated as a decision / reporting / test slice, not a live integration
+- enforcement is still mainly locked to the continuity / force-recall path, not all entry points.
+- upstream evidence for `sameApprovedPlan` can still be hardened further.
+- the continuity plugin MVP is still being productized and has not yet been packaged as a plugin that other OpenClaw installations can install directly.
-## Suggested Next Steps
-1. continuity runtime enforcement hardening
-2. watchdog live recovery integration
-3. escalation / receipt contract hardening
+## Next Steps
+1. continuity wrap-up review
+2. return to the continuity plugin MVP
+3. extract the current continuity core into a plugin MVP that is installable, configurable, testable, and adoptable by following the bilingual README
 ---
 ## English Description
-This repository is a focused export from a larger OpenClaw workspace covering:
+This repository currently focuses on two continuity-related hardening slices:
 - **approved plan continuity hard-gate**
-- **anti-blackhole / completion-delivery watchdog recovery**
+- **auto-next obligation gate**
+It prevents two core failure classes:
+1. **continuity failure / auto-next break**
+2. **task-boundary stop disguised as progress**
 ## Current State
 ### A. Continuity hard-gate
 - continuity evaluator
-- dispatch receipt binding groundwork
-- `derivedAction` continuity binding
-- `dry_run_dispatch` no longer accepted as a real receipt
-- fake receipt authority tightened
-- hook integration present
-### B. Anti-blackhole watchdog recovery
-- watchdog status recompute
-- minimal recovery-decision loop:
-  - `fetch_history`
-  - `respawn`
-  - `blocked`
-- owner-visible reporting payload
-- scenario matrix tests
+- minimum receipt validation
+- `derivedAction` / `nextDerivedAction` continuity handling
+- `dry_run_dispatch` rejected as a real receipt
+- fake receipts rejected
+- hook integration in `hooks/force-recall/handler.ts`
+### B. Auto-next obligation gate
+- explicit failure reason: `missing_auto_next_dispatch`
+- task-boundary stop is now treated as a continuity failure when same-plan auto-next is obligatory
+- receipt linkage hardening via `nextTaskId`
+- checkpoint / session metadata / stale receipt / dry-run intent can no longer stand in for real auto-next dispatch proof
+## Validation
+- continuity gate tests passing
+- force-recall preflight passing
+- syntax checks passing
 ## Current Limitations
-- continuity remains prompt-level rather than engine-level
-- watchdog recovery is validated as a decision/reporting/test slice, not live execution integration
-## Suggested Next Steps
-1. continuity runtime enforcement hardening
-2. watchdog live recovery integration
-3. escalation / receipt contract hardening
+- scoped mainly to the continuity / force-recall path
+- upstream `sameApprovedPlan` evidence can still be hardened further
+- plugin packaging is still pending

@@ -0,0 +1,195 @@
# Auto-Next Obligation Gate Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Enforce that an approved plan may not stop at a task boundary when the current task is complete and the next task is already known, unless the closure is explicitly `waiting_user`, `blocked`, `pending_verification`, or a separately declared high-risk stop point; otherwise the flow must auto-dispatch the next task, and any stop is a continuity failure.
**Architecture:** Extend the current approved-plan continuity gate from a passive “missing dispatch receipt” detector into an obligation gate that evaluates task-boundary stops as first-class failures. Keep the design minimal: preserve the current receipt-based truth model, add explicit stop-intent / high-risk-stop metadata, fail first in tests, then wire the hook path so dry-run planner intent is no longer enough when the next task is deterministically known. Design the slices so the same evaluator can later be extracted into the continuity plugin MVP without changing the behavior contract.
**Tech Stack:** Node.js ESM scripts, TypeScript hook integration, JSON input/output envelopes, file-backed dispatch receipts, script-level tests, continuity plugin MVP compatibility layer
---
## Context Baseline
The current repo already has a partial continuity hard gate:
- `scripts/approved_plan_continuity_gate.mjs`
  - fails only when `taskState=complete` + next action known + no valid `dispatchReceipt` + closure not in a legal terminal state
- `scripts/approved_plan_dispatch_binding.mjs`
  - writes receipt files once a dispatch is actually bound
- `hooks/force-recall/handler.ts`
  - builds continuity input from wrapper/planner state and injects the continuity block
- `scripts/test_approved_plan_continuity_gate.mjs`
  - already covers missing receipt, fake receipt, valid receipt, and legal terminal states
- `docs/plans/2026-04-24-continuity-plugin-mvp.md`
  - already assumes this continuity behavior will later be extracted into a plugin
The remaining gap is narrower and more specific:
1. The current gate says “don’t close if a known next action exists but no real receipt exists.”
2. But it does not yet model the stronger obligation: **if the next task in the same approved plan is already known and not blocked by an allowed stop condition, the system must auto-next dispatch instead of pausing at the boundary.**
3. This means the failure is not only “missing receipt,” but also **“stopped at a task boundary when auto-next was obligatory.”**
4. We need a minimal extension that preserves existing receipt truth, avoids speculative dispatch, and remains compatible with continuity-plugin extraction.
## Target Behavior Contract
When all of the following are true:
- current workflow is inside the same approved plan
- current task is complete
- the next task is known / derivable as a concrete next task
- closure state is not `waiting_user`, `blocked`, or `pending_verification`
- no explicit high-risk stop point is active
Then:
- the system must not stop at the task boundary
- the execution layer must auto-dispatch the next task
- a real dispatch receipt must exist for the next task handoff
- otherwise the reply/hook path must produce a continuity failure
When any of the following are true, auto-next is not obligatory:
- closure state is `waiting_user`
- closure state is `blocked`
- closure state is `pending_verification`
- an explicit high-risk stop point is active
- no next task is known
- current task is not complete
- plan scope is absent or ambiguous
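The contract above reduces to one predicate plus a receipt check. Below is a minimal JavaScript sketch, assuming an input object with the field names used in this plan (`taskState`, `nextTaskKnown`, `sameApprovedPlan`, `replyClosureState`, `highRiskStop`, `nextTaskId`, `dispatchReceipt`). The helpers `isAutoNextObligatory` and `checkTaskBoundary` are hypothetical and are not the repo's `evaluateContinuity`; an absent or ambiguous plan scope is modeled here as `sameApprovedPlan !== true`.

```javascript
// Hypothetical sketch of the target behavior contract (not the repo's evaluator).
const LEGAL_CLOSURES = new Set(["waiting_user", "blocked", "pending_verification"]);

// True only when every "must auto-next" condition holds. Any exemption, an
// unknown next task, an incomplete task, or an unproven plan scope returns false.
function isAutoNextObligatory(input) {
  return (
    input.sameApprovedPlan === true &&
    input.taskState === "complete" &&
    input.nextTaskKnown === true &&
    !LEGAL_CLOSURES.has(input.replyClosureState) &&
    input.highRiskStop !== true
  );
}

// When auto-next is obligatory, only a receipt linked to the next task passes.
// Planner intent such as `derivedAction` is deliberately never consulted.
function checkTaskBoundary(input) {
  if (!isAutoNextObligatory(input)) return { ok: true };
  const receipt = input.dispatchReceipt;
  const linked =
    receipt != null &&
    typeof input.nextTaskId === "string" &&
    receipt.nextTaskId === input.nextTaskId;
  return linked ? { ok: true } : { ok: false, reason: "missing_auto_next_dispatch" };
}

// Task 8 is complete, Task 9 is known, and only a dry-run planner intent exists.
const base = {
  taskState: "complete",
  nextTaskKnown: true,
  sameApprovedPlan: true,
  replyClosureState: "completed",
  highRiskStop: false,
  nextTaskId: "task-9",
  derivedAction: { type: "message_subagent", task: "continue with task-9" },
  dispatchReceipt: null,
};

console.log(checkTaskBoundary(base)); // { ok: false, reason: 'missing_auto_next_dispatch' }
console.log(checkTaskBoundary({ ...base, replyClosureState: "blocked" })); // { ok: true }
console.log(checkTaskBoundary({ ...base, dispatchReceipt: { nextTaskId: "task-9" } })); // { ok: true }
```

The `base` fixture mirrors the Task 8 -> Task 9 case described in this plan: planner intent is present, yet the check keeps failing until a receipt linked to the next task appears.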
## Required New Concepts
To keep the design minimal, introduce only the following new concepts:
- `nextTaskKnown`: boolean or derivable fact that the next task in the same approved plan is known
- `sameApprovedPlan`: boolean proving the next task belongs to the same approved plan, not merely a generic next action
- `taskBoundaryStop`: boolean indicating the system is trying to end the current reply at a completed-task boundary instead of dispatching onward
- `highRiskStop`: boolean indicating an allowed explicit stop point outside the normal legal closure states
- `autoNextObligatory`: derived evaluator result when auto-next must happen now
- `reason=missing_auto_next_dispatch` (or equivalent canonical reason) for the new failure mode
Do not widen this into a generalized workflow engine or arbitrary planner ontology in this slice.
## Current Gap
- the current continuity gate checks for a known next action, but it does not require that action to be the next task in the same approved plan
- current hook can surface planner-derived action from dry-run planning, but planner intent is not a real dispatch and does not prove continuity actually happened
- current dispatch binding writes receipts once dispatch is actually bound, but the gate does not yet express "must auto-dispatch now" as its own obligation at the task boundary
- current legal terminal states are hard-coded and do not include explicit `highRiskStop` metadata
## Non-goals
- generalized multi-plan scheduling
- speculative dispatch when the next task is ambiguous
- removing current receipt validation
- implementing continuity-plugin extraction in this slice
## Canonical Task-Boundary Stop Scenario
This is the scenario the implementation must lock down:
1. Approved plan has ordered tasks, e.g. Task 8 -> Task 9.
2. Task 8 just completed.
3. Task 9 is already known from the same approved plan.
4. The agent emits a normal closeout / handoff / “next I can continue with Task 9” style response.
5. No real auto-dispatch receipt exists for Task 9.
6. Closure is not `waiting_user`, `blocked`, `pending_verification`.
7. No high-risk stop point is active.
Expected outcome:
- continuity gate fails
- hook output explicitly forbids stopping at this task boundary
- system must route to auto-next dispatch path or continuity failure path
- dry-run planner intent alone does not satisfy the obligation
---
## Verification Record
### Commands run
```bash
node --check hooks/force-recall/handler.ts
node --check scripts/approved_plan_continuity_gate.mjs
node --check scripts/approved_plan_dispatch_binding.mjs
node scripts/test_approved_plan_continuity_gate.mjs
node scripts/test_force_recall_long_task_preflight.mjs
```
### Result summary
- `node --check hooks/force-recall/handler.ts` → PASS
- `node --check scripts/approved_plan_continuity_gate.mjs` → PASS
- `node --check scripts/approved_plan_dispatch_binding.mjs` → PASS
- `node scripts/test_approved_plan_continuity_gate.mjs` → `17/17 passed`
- `node scripts/test_force_recall_long_task_preflight.mjs` → PASS
### What was hardened in this slice
- continuity evaluator now rejects receipts that do not match the required `planId`, `currentTask`, and expected next dispatch action
- minimal receipt linkage field `nextTaskId` was added so the evaluator can distinguish the required next-task dispatch from a stale or unrelated receipt
- continuity tests now fail when the receipt links to the wrong next task
- continuity tests now fail when a receipt only contains checkpoint/session-style metadata instead of real dispatch linkage
- hook preflight verification still confirms that dry-run planner intent alone does not satisfy continuity, and that the failure reason remains `missing_auto_next_dispatch`
### Deliberately deferred
- stronger upstream source-of-truth for `sameApprovedPlan`
- broader non-`force-recall` entry-point enforcement
- continuity plugin extraction work
---
## Minimal Enforcement Design Summary
The enforcement should stay intentionally small:
1. **Keep receipt truth model**
   - a real dispatch receipt remains the pass proof
   - planner intent alone is not proof
2. **Add one stronger evaluator branch**
   - when the next task in the same approved plan is known and the current reply is stopping at a completed-task boundary, auto-next becomes obligatory
   - missing receipt in this branch is a dedicated continuity failure
3. **Allow only narrow exemptions**
   - `waiting_user`
   - `blocked`
   - `pending_verification`
   - `highRiskStop=true`
4. **Keep hook integration thin**
   - hook computes structured booleans
   - evaluator makes the decision
   - hook renders the reason-specific block
5. **Preserve plugin extraction path**
   - no hook-only business logic
   - no receipt-store / evaluator coupling
   - no prompt-only policy with no machine-checkable input
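Item 4 can be illustrated with three small functions. This is a hypothetical sketch: `computeGateInput`, `decide`, and `renderBlock` are illustrative names, the opening `[APPROVED_PLAN_CONTINUITY_GATE]` tag is assumed by analogy with the hook's closing tag, and receipt presence stands in for the full linkage check. Keeping `decide` free of I/O is what would let the same function move into a plugin without changing the behavior contract.

```javascript
// Hypothetical sketch of "hook computes, evaluator decides, hook renders".
const LEGAL_CLOSURES = new Set(["waiting_user", "blocked", "pending_verification"]);

// 1. Hook: reduce raw wrapper state to structured booleans, nothing more.
function computeGateInput(wrapper) {
  return {
    taskComplete: wrapper.taskState === "complete",
    nextTaskKnown: wrapper.nextTaskKnown === true,
    sameApprovedPlan: wrapper.sameApprovedPlan === true,
    taskBoundaryStop: wrapper.taskBoundaryStop === true,
    legalClosure: LEGAL_CLOSURES.has(wrapper.replyClosureState),
    highRiskStop: wrapper.highRiskStop === true,
    // Simplified: a real gate would also check receipt linkage, not just presence.
    hasReceipt: wrapper.dispatchReceipt != null,
  };
}

// 2. Evaluator: a pure function with no I/O or receipt-store access.
function decide(input) {
  const obligatory =
    input.taskComplete && input.nextTaskKnown && input.sameApprovedPlan &&
    input.taskBoundaryStop && !input.legalClosure && !input.highRiskStop;
  return obligatory && !input.hasReceipt
    ? { ok: false, reason: "missing_auto_next_dispatch" }
    : { ok: true };
}

// 3. Hook: render a reason-specific block (opening tag name is assumed).
function renderBlock(result) {
  const lines = ["[APPROVED_PLAN_CONTINUITY_GATE]"];
  if (result.ok === false && result.reason === "missing_auto_next_dispatch") {
    lines.push("- HARD_GATE: Do not stop at this completed-task boundary.");
  }
  lines.push("[/APPROVED_PLAN_CONTINUITY_GATE]");
  return lines.join("\n");
}

const wrapper = {
  taskState: "complete", nextTaskKnown: true, sameApprovedPlan: true,
  taskBoundaryStop: true, replyClosureState: "completed", highRiskStop: false,
  dispatchReceipt: null,
};
console.log(renderBlock(decide(computeGateInput(wrapper))));
```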
## Acceptance Criteria
- [x] A completed task in the same approved plan cannot stop at a boundary when the next task is known unless an allowed exemption applies.
- [x] The continuity evaluator emits a dedicated failure for missing required auto-next dispatch.
- [x] A real dispatch receipt is still required; dry-run planner output alone cannot pass.
- [x] Legal closure states `waiting_user`, `blocked`, `pending_verification` still pass unchanged.
- [x] Explicit `highRiskStop` bypass is supported and test-covered.
- [x] Hook output clearly explains the auto-next obligation failure.
- [x] Script-level continuity tests pass.
- [x] Hook smoke tests pass.
- [ ] The plan documents how this behavior migrates cleanly into the continuity plugin MVP.
## Risks / Open Questions
1. The current hook may not yet expose a strong enough source of truth for `sameApprovedPlan`; if so, one narrow upstream metadata field may be needed.
2. `highRiskStop` may not currently exist in structured input, so the first implementation may need a conservative default of `false` until an upstream gate can set it explicitly.
3. Receipt schema may still need one future compatibility pass if downstream writers have not yet been upgraded to emit `nextTaskId` everywhere continuity depends on same-plan auto-next proof.
4. This slice deliberately does not solve non-hook entry points or general workflow orchestration.
## Status
pending verification / reviewer checked

@@ -35,6 +35,10 @@
 - Use this field to state whether the reply closed under a dispatch-linked continuation path or some separately defined terminal closure state.
 - This field is defined here as a receipt field only; legal closure states and gate enforcement are defined in later tasks.
+### `nextTaskId`
+- The identifier of the required next task when continuity depends on a same-plan auto-next transition.
+- Use this field only to prove that the receipt links to the exact next task that had to be dispatched.
+- This field is the minimal hardening field for next-task linkage; it prevents unrelated dispatches, checkpoints, or stale receipts from spoofing a continuity pass.
 ## Legal terminal states

@@ -0,0 +1,88 @@
# Auto-Next Obligation Gate
## Purpose
This runbook defines the approved-plan continuity rule that a workflow may not stop at a completed-task boundary when the next task in the same approved plan is already known and continuation is still allowed.
## When auto-next is obligatory
Auto-next is obligatory when all of the following are true:
- the current workflow is inside the same approved plan
- the current task is complete
- the next task is known
- the system is attempting a task-boundary stop instead of continuing execution
- reply closure state is not `waiting_user`
- reply closure state is not `blocked`
- reply closure state is not `pending_verification`
- `highRiskStop` is not active
In this state, the system must auto-dispatch the next task and record a real dispatch receipt. A dry-run planner result or stated intent to continue is not enough.
## Legal non-auto-next closures
The following are legal non-auto-next closures even when a next task exists:
- `waiting_user`
- `blocked`
- `pending_verification`
These states are the only normal closure states that can stop without auto-next dispatch.
## Allowed non-closure exception
The following explicit exception may bypass auto-next obligation without using the normal legal terminal closure states:
- `highRiskStop`
`highRiskStop` means the workflow is intentionally stopping at an explicit high-risk stop point and therefore does not have to auto-dispatch the next task yet.
## Forbidden behavior
Stopping is forbidden when all of the following hold at once:
- completed task
- next task known
- same approved plan
- normal closeout or handoff language
- no real dispatch receipt for the next task
- no legal closure state
- no `highRiskStop`
A completed task in the same approved plan must not end with “I can continue with the next task” style closeout unless the next task has actually been dispatched.
Checkpoint artifacts, session keys, or oral/plain-text status updates are not substitutes for a real auto-next dispatch. A checkpoint may preserve state, but it does not prove that the required next task was actually dispatched.
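One way to make the checkpoint rule machine-checkable is to require dispatch-specific fields before anything counts as proof. The sketch assumes the receipt fields that `buildReceipt` writes in this commit (`planId`, `currentTask`, `nextTaskId`, `nextDerivedAction`, `dispatchedAt`, `dispatchRunId`); `isDispatchProof` and the checkpoint object's `sessionKey` / `savedState` fields are hypothetical.

```javascript
// Hypothetical proof check: only an object recording an actual dispatch counts.
// Field list assumed from the receipt built by `buildReceipt` in this commit.
const REQUIRED_DISPATCH_FIELDS = [
  "planId",
  "currentTask",
  "nextTaskId",
  "nextDerivedAction",
  "dispatchedAt",
  "dispatchRunId",
];

function isDispatchProof(candidate) {
  if (candidate == null || typeof candidate !== "object" || Array.isArray(candidate)) {
    return false;
  }
  // Every dispatch field must be present and non-empty; checkpoint-style
  // metadata (session keys, saved state) never supplies these.
  return REQUIRED_DISPATCH_FIELDS.every(
    (field) => candidate[field] != null && candidate[field] !== ""
  );
}

const checkpoint = { planId: "plan-a", currentTask: "task-8", sessionKey: "sess-123", savedState: {} };
const receipt = {
  planId: "plan-a",
  currentTask: "task-8",
  nextTaskId: "task-9",
  nextDerivedAction: { type: "message_subagent", task: "continue with task-9" },
  dispatchedAt: "2026-04-24T16:00:00+08:00",
  dispatchRunId: "run-1",
};

console.log(isDispatchProof(checkpoint)); // false
console.log(isDispatchProof(receipt)); // true
```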
## Canonical failure condition
If all of the following are true:
- task is complete
- next task is known
- next task belongs to the same approved plan
- the system is stopping at a task boundary
- no valid dispatch receipt exists
- closure is not `waiting_user`, `blocked`, or `pending_verification`
- `highRiskStop` is false
Then the continuity gate must fail and treat the stop as an auto-next obligation violation.
## Canonical failure table
| Task complete | Next task known | Same approved plan | Boundary stop | Receipt | Closure / exception | Expected |
| --- | --- | --- | --- | --- | --- | --- |
| yes | yes | yes | yes | no | completed closure | FAIL |
| yes | yes | yes | yes | valid receipt | completed closure | PASS |
| yes | yes | yes | yes | no | `waiting_user` | PASS |
| yes | yes | yes | yes | no | `blocked` | PASS |
| yes | yes | yes | yes | no | `pending_verification` | PASS |
| yes | yes | yes | yes | no | `highRiskStop` | PASS |
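The table maps directly onto a table-driven test. In this hypothetical sketch, `gate` is a stand-in evaluator rather than the repo's script: every row shares the first four columns (task complete, next task known, same plan, boundary stop), so only the closure, the `highRiskStop` flag, and receipt presence vary. Encoding the `highRiskStop` row with a `completed` closure is an assumption about how that row is meant to be read.

```javascript
// Hypothetical table-driven check of the canonical failure table.
const LEGAL_CLOSURES = new Set(["waiting_user", "blocked", "pending_verification"]);

// All rows share: task complete, next task known, same plan, boundary stop.
function gate({ closure, highRiskStop, receipt }) {
  const obligatory = !LEGAL_CLOSURES.has(closure) && !highRiskStop;
  return obligatory && !receipt ? "FAIL" : "PASS";
}

const rows = [
  { closure: "completed", highRiskStop: false, receipt: false, expected: "FAIL" },
  { closure: "completed", highRiskStop: false, receipt: true, expected: "PASS" },
  { closure: "waiting_user", highRiskStop: false, receipt: false, expected: "PASS" },
  { closure: "blocked", highRiskStop: false, receipt: false, expected: "PASS" },
  { closure: "pending_verification", highRiskStop: false, receipt: false, expected: "PASS" },
  { closure: "completed", highRiskStop: true, receipt: false, expected: "PASS" },
];

const failures = rows.filter((row) => gate(row) !== row.expected);
console.log(`${rows.length - failures.length}/${rows.length} rows match`); // 6/6 rows match
```

Adding a row is then a one-line change, which keeps the table in this runbook and the test suite easy to keep in sync.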
## Notes for implementation
- The obligation applies only when the next task is known within the same approved plan.
- A generic next action is not enough unless it proves the same approved plan task transition.
- A real dispatch receipt remains the source of truth for whether auto-next actually happened.
- Receipt linkage should include the required next-task identity when the evaluator needs to distinguish a real next-task dispatch from a stale or unrelated dispatch.
- Checkpoint/session metadata alone must not satisfy the receipt proof.
- This rule is intentionally minimal so it can later move into the continuity plugin without changing the behavior contract.
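A minimal sketch of the linkage rule: a receipt counts only when it names the same plan, source task, and required next task. `receiptLinksTo` is a hypothetical helper; the field names follow the `nextTaskId` receipt contract in this commit.

```javascript
// Hypothetical linkage check: presence of a receipt is not enough; it must
// point at the exact plan, source task, and required next task.
function receiptLinksTo(receipt, expected) {
  if (receipt == null || typeof receipt !== "object") return false;
  return (
    receipt.planId === expected.planId &&
    receipt.currentTask === expected.currentTask &&
    typeof receipt.nextTaskId === "string" &&
    receipt.nextTaskId === expected.nextTaskId
  );
}

const expected = { planId: "plan-a", currentTask: "task-8", nextTaskId: "task-9" };

console.log(receiptLinksTo({ ...expected }, expected)); // true
console.log(receiptLinksTo({ ...expected, nextTaskId: "task-10" }, expected)); // false (stale or unrelated)
console.log(receiptLinksTo({ planId: "plan-a", currentTask: "task-8" }, expected)); // false (no linkage field)
```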

@@ -356,6 +356,11 @@ function buildApprovedPlanContinuityInput(wrapperResult: any, autoChainPlanResul
     : (wrapperResult?.handoff?.mode === "button_path" ? "waiting_user" : "completed");
   const dispatchReceipt = wrapperResult?.dispatchReceipt ?? null;
+  const nextTaskKnown = wrapperResult?.nextTaskKnown === true
+    || (plannerDerivedAction != null && typeof autoChainPlanResult?.derivedAction === 'string' && autoChainPlanResult.derivedAction !== 'none');
+  const sameApprovedPlan = wrapperResult?.sameApprovedPlan === true || plannerDerivedAction != null;
+  const taskBoundaryStop = wrapperResult?.taskBoundaryStop === true || replyClosureState === 'completed';
+  const highRiskStop = wrapperResult?.highRiskStop === true;
   return {
     planId: wrapperResult?.planId ?? "hook-preflight-approved-plan",
@@ -364,6 +369,10 @@ function buildApprovedPlanContinuityInput(wrapperResult: any, autoChainPlanResul
     nextDerivedAction,
     replyClosureState,
     dispatchReceipt,
+    nextTaskKnown,
+    sameApprovedPlan,
+    taskBoundaryStop,
+    highRiskStop,
   };
 }
@@ -387,7 +396,12 @@ function buildApprovedPlanContinuityBlock(result: ApprovedPlanContinuityResult |
   if (result.ok === false) {
     lines.push("- HARD_GATE: Do not close out this reply as normal completion.");
-    lines.push("- HARD_GATE: Route back to continuity failure until a real next dispatch receipt exists, unless closure state is waiting_user, blocked, or pending_verification.");
+    if (result.reason === 'missing_auto_next_dispatch') {
+      lines.push("- HARD_GATE: Do not stop at this completed-task boundary.");
+      lines.push("- HARD_GATE: Auto-dispatch the next task in the same approved plan, unless waiting_user, blocked, pending_verification, or high-risk stop applies.");
+    } else {
+      lines.push("- HARD_GATE: Route back to continuity failure until a real next dispatch receipt exists, unless closure state is waiting_user, blocked, or pending_verification.");
+    }
   }
   lines.push("[/APPROVED_PLAN_CONTINUITY_GATE]", "");

@@ -11,6 +11,10 @@ function isObject(value) {
   return value != null && typeof value === 'object' && !Array.isArray(value);
 }
+function normalizeAction(action) {
+  return JSON.stringify(action ?? null);
+}
 function hasValidDispatchReceipt(receipt) {
   if (!isObject(receipt)) return false;
   if (!isNonEmptyString(receipt.planId)) return false;
@@ -20,6 +24,27 @@ function hasValidDispatchReceipt(receipt) {
   return true;
 }
+function receiptMatchesPayload(payload, receipt) {
+  if (!hasValidDispatchReceipt(receipt)) return false;
+  const expectedPlanId = payload?.planId;
+  if (isNonEmptyString(expectedPlanId) && receipt.planId !== expectedPlanId) return false;
+  const expectedCurrentTask = payload?.currentTask;
+  if (isNonEmptyString(expectedCurrentTask) && receipt.currentTask !== expectedCurrentTask) return false;
+  const expectedNextTask = payload?.nextTaskId ?? payload?.nextTaskKey ?? null;
+  const receiptNextTask = receipt?.nextTaskId ?? receipt?.nextTaskKey ?? null;
+  if (isNonEmptyString(expectedNextTask) && receiptNextTask !== expectedNextTask) return false;
+  const expectedNextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
+  if (expectedNextAction != null && normalizeAction(receipt.nextDerivedAction) !== normalizeAction(expectedNextAction)) {
+    return false;
+  }
+  return true;
+}
 function parseArgs(argv) {
   let inputPath = null;
   let compact = false;
@@ -76,11 +101,39 @@ function evaluateContinuity(payload) {
   const taskComplete = payload?.taskState === 'complete';
   const nextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
   const nextActionKnown = nextAction != null;
-  const hasDispatchReceipt = hasValidDispatchReceipt(payload?.dispatchReceipt ?? null);
+  const explicitNextTaskKnown = payload?.nextTaskKnown === true;
+  const sameApprovedPlan = payload?.sameApprovedPlan === true;
+  const taskBoundaryStop = payload?.taskBoundaryStop === true;
+  const highRiskStop = payload?.highRiskStop === true;
   const closureState = payload?.replyClosureState ?? null;
   const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);
+  const hasDispatchReceipt = receiptMatchesPayload(payload, payload?.dispatchReceipt ?? null);
+  const autoNextObligatory = taskComplete
+    && explicitNextTaskKnown
+    && sameApprovedPlan
+    && taskBoundaryStop
+    && !isLegalTerminalState
+    && !highRiskStop;
-  if (taskComplete && nextActionKnown && !hasDispatchReceipt && !isLegalTerminalState) {
+  if (autoNextObligatory && !hasDispatchReceipt) {
+    return {
+      ok: false,
+      status: 'continuity_failure',
+      verdict: 'continuity_failure',
+      reason: 'missing_auto_next_dispatch',
+    };
+  }
+  if (taskComplete && nextActionKnown && !hasDispatchReceipt && !isLegalTerminalState && !highRiskStop && !('sameApprovedPlan' in (payload ?? {}))) {
+    return {
+      ok: false,
+      status: 'continuity_failure',
+      verdict: 'continuity_failure',
+      reason: 'missing_dispatch_receipt',
+    };
+  }
+  if (taskComplete && nextActionKnown && !hasDispatchReceipt && !isLegalTerminalState && !highRiskStop && sameApprovedPlan && !taskBoundaryStop && !explicitNextTaskKnown) {
     return {
       ok: false,
       status: 'continuity_failure',
@@ -122,5 +175,4 @@ const response = {
   },
 };
-process.stdout.write(`${JSON.stringify(response)}
-`);
+process.stdout.write(`${JSON.stringify(response)}\n`);
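`normalizeAction` compares actions with `JSON.stringify`, which follows key insertion order, so two structurally equal actions whose keys were written in a different order will not match. If receipts can come from writers that order keys differently (an assumption; this commit does not say so), a key-sorted serializer is one possible alternative. `stableStringify` below is hypothetical and handles JSON-like data only.

```javascript
// Hypothetical key-order-insensitive serializer for comparing derived actions.
// Handles JSON-like values only; undefined is serialized as null.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    const parts = keys.map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value ?? null);
}

const a = { type: "message_subagent", task: "continue with task-9" };
const b = { task: "continue with task-9", type: "message_subagent" };

console.log(JSON.stringify(a) === JSON.stringify(b)); // false
console.log(stableStringify(a) === stableStringify(b)); // true
```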

@@ -81,6 +81,7 @@ function buildReceipt(payload) {
   const receipt = {
     planId: payload?.planId ?? null,
     currentTask: payload?.currentTask ?? null,
+    nextTaskId: payload?.nextTaskId ?? null,
     nextDerivedAction: nextAction,
     dispatchedAt: payload?.dispatchedAt ?? null,
     dispatchRunId: payload?.dispatchRunId ?? null,
@@ -97,6 +98,7 @@ function validateReceipt(receipt) {
   for (const field of [
     'planId',
     'currentTask',
+    'nextTaskId',
     'nextDerivedAction',
     'dispatchedAt',
     'dispatchRunId',

@@ -168,6 +168,288 @@ const tests = [
    }
  },
},
{
name: 'auto-next obligation: fails when approved plan stops at completed-task boundary without auto-next dispatch',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-core',
currentTask: 'task-8',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: true,
taskBoundaryStop: true,
nextTaskId: 'task-9',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-9',
},
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: null,
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== false) throw new Error(`expected auto-next continuity failure ok=false, got ${JSON.stringify(result.json)}`);
if (result.json.verdict !== 'continuity_failure') throw new Error(`expected verdict=continuity_failure, got ${JSON.stringify(result.json.verdict)}`);
if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: fails when only dry-run derived action exists at completed-task boundary',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-dry-run-only',
currentTask: 'task-8b',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: true,
taskBoundaryStop: true,
nextTaskId: 'task-9b',
derivedAction: {
type: 'message_subagent',
task: 'continue with task-9b',
},
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: null,
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== false) throw new Error(`expected auto-next continuity failure ok=false, got ${JSON.stringify(result.json)}`);
if (result.json.verdict !== 'continuity_failure') throw new Error(`expected verdict=continuity_failure, got ${JSON.stringify(result.json.verdict)}`);
if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: passes when explicit high-risk stop is active',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-high-risk-stop',
currentTask: 'task-8c',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: true,
taskBoundaryStop: true,
nextTaskId: 'task-9c',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-9c',
},
replyClosureState: 'completed',
highRiskStop: true,
dispatchReceipt: null,
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when highRiskStop=true, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: passes when next task is not known',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-unknown-next-task',
currentTask: 'task-8d',
taskState: 'complete',
nextTaskKnown: false,
sameApprovedPlan: true,
taskBoundaryStop: true,
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: null,
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected pass when nextTaskKnown=false, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: passes when next action is not in the same approved plan',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-other-plan',
currentTask: 'task-8e',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: false,
taskBoundaryStop: true,
nextTaskId: 'task-other',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with unrelated task',
},
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: null,
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected pass when sameApprovedPlan=false, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: fails when receipt exists but next-task linkage is stale or mismatched',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-linkage-mismatch',
currentTask: 'task-8f',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: true,
taskBoundaryStop: true,
nextTaskId: 'task-9f',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-9f',
},
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: {
planId: 'plan-auto-next-linkage-mismatch',
currentTask: 'task-8f',
nextTaskId: 'task-10f',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-10f',
},
dispatchedAt: '2026-04-24T16:00:00+08:00',
},
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== false) throw new Error(`expected linkage mismatch to fail, got ${JSON.stringify(result.json)}`);
if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected linkage mismatch reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: passes when receipt links to the required next task',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-linkage-match',
currentTask: 'task-8g',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: true,
taskBoundaryStop: true,
nextTaskId: 'task-9g',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-9g',
},
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: {
planId: 'plan-auto-next-linkage-match',
currentTask: 'task-8g',
nextTaskId: 'task-9g',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-9g',
},
dispatchedAt: '2026-04-24T16:05:00+08:00',
},
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected linkage-matched receipt to pass, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'auto-next obligation: fails when receipt only proves checkpoint/session metadata without actual dispatch linkage',
run() {
const fixture = createFixture({
'input.json': {
planId: 'plan-auto-next-checkpoint-spoof',
currentTask: 'task-8h',
taskState: 'complete',
nextTaskKnown: true,
sameApprovedPlan: true,
taskBoundaryStop: true,
nextTaskId: 'task-9h',
nextDerivedAction: {
type: 'message_subagent',
task: 'continue with task-9h',
},
replyClosureState: 'completed',
highRiskStop: false,
dispatchReceipt: {
planId: 'plan-auto-next-checkpoint-spoof',
currentTask: 'task-8h',
nextTaskId: 'task-9h',
checkpointPath: 'checkpoints/task-8h.json',
sessionKey: 'task-8h',
dispatchedAt: '2026-04-24T16:10:00+08:00',
},
},
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== false) throw new Error(`expected checkpoint-only receipt to fail, got ${JSON.stringify(result.json)}`);
if (result.json.reason !== 'missing_auto_next_dispatch') throw new Error(`expected checkpoint-only reason=missing_auto_next_dispatch, got ${JSON.stringify(result.json.reason)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'continuity: fails when dispatchReceipt is a fake non-null object without minimum receipt fields',
run() {
@@ -188,35 +470,17 @@ const tests = [
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== false) throw new Error(`expected continuity failure ok=false for fake dispatch receipt, got ${JSON.stringify(result.json)}`);
if (result.json.verdict !== 'continuity_failure') throw new Error(`expected verdict=continuity_failure for fake dispatch receipt, got ${JSON.stringify(result.json.verdict)}`);
if (result.json.reason !== 'missing_dispatch_receipt') throw new Error(`expected reason=missing_dispatch_receipt for fake dispatch receipt, got ${JSON.stringify(result.json.reason)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'continuity: passes when task is complete, next action is known, and a dispatch receipt already exists',
run() {
@@ -243,27 +507,15 @@ const tests = [
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when dispatch receipt exists, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'continuity: passes when planner returns derivedAction and a bound dispatch receipt already exists',
run() {
@@ -290,27 +542,15 @@ const tests = [
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when derivedAction has bound dispatch receipt, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'continuity: passes when task is complete, next action is known, no dispatch receipt exists, and closure is waiting_user',
run() {
@@ -329,27 +569,15 @@ const tests = [
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when closure is waiting_user, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'continuity: passes when task is complete, next action is known, no dispatch receipt exists, and closure is pending_verification',
run() {
@@ -368,27 +596,15 @@ const tests = [
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when closure is pending_verification, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}
},
},
{
name: 'continuity: passes when task is complete, next action is known, no dispatch receipt exists, and closure is blocked',
run() {
@@ -407,21 +623,10 @@ const tests = [
});
try {
const result = runGate({ args: ['--compact', '--input', fixture.path('input.json')] });
if (result.status !== 0 && result.status !== null) throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
if (!result.json || typeof result.json !== 'object') throw new Error(`expected JSON output\nstdout=${result.stdout}`);
if (result.json.ok !== true) throw new Error(`expected continuity pass ok=true when closure is blocked, got ${JSON.stringify(result.json)}`);
} finally {
fixture.cleanup();
}

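The auto-next obligation tests above pin down a small decision table: the gate is only obligated to see a next dispatch when the task is complete, the next task is known, it belongs to the same approved plan, the agent stopped at the task boundary, and no exemption (`waiting_user`, `blocked`, `pending_verification`, high-risk stop) applies; when obligated, only a receipt linked to the required next task counts. The JavaScript below is a minimal sketch of that logic, assuming the evaluator receives the same input shape as the fixtures. `evaluateAutoNext`, `receiptLinksToNextTask`, and `EXEMPT_CLOSURES` are hypothetical names, not the gate's actual exports, and requiring a matching `nextDerivedAction` plus a non-empty `dispatchedAt` is an assumption inferred from the linkage-match and checkpoint-spoof fixtures. The real gate also reports a `verdict` and, per the README, rejects `dry_run_dispatch` as a receipt; this sketch does not model either.

```javascript
'use strict';

// Closure states that release the gate from the auto-next obligation.
// Assumption: taken from the README exception list, not from the gate source.
const EXEMPT_CLOSURES = new Set(['waiting_user', 'blocked', 'pending_verification']);

// Hypothetical helper: true only if the receipt proves a real handoff
// to the required next task in the same plan.
function receiptLinksToNextTask(input) {
  const receipt = input.dispatchReceipt;
  if (!receipt || typeof receipt !== 'object') return false;

  // The receipt must come from this plan and this completed task.
  if (receipt.planId !== input.planId) return false;
  if (receipt.currentTask !== input.currentTask) return false;

  // A stale or mismatched next task (e.g. task-10f instead of task-9f) does not count.
  if (receipt.nextTaskId !== input.nextTaskId) return false;

  // checkpointPath / sessionKey alone are not dispatch proof:
  // the receipt must carry the dispatched action itself.
  const sent = receipt.nextDerivedAction;
  if (!sent || typeof sent !== 'object') return false;
  const wanted = input.nextDerivedAction;
  if (wanted && (sent.type !== wanted.type || sent.task !== wanted.task)) return false;

  return typeof receipt.dispatchedAt === 'string' && receipt.dispatchedAt !== '';
}

// Hypothetical evaluator: is the gate obligated to see a next dispatch,
// and if so, is there a receipt linked to the required next task?
function evaluateAutoNext(input) {
  const obligated =
    input.taskState === 'complete' &&
    input.nextTaskKnown === true &&
    input.sameApprovedPlan === true &&
    input.taskBoundaryStop === true &&
    !EXEMPT_CLOSURES.has(input.replyClosureState) &&
    input.highRiskStop !== true;

  if (!obligated || receiptLinksToNextTask(input)) {
    return { ok: true, reason: null };
  }
  return { ok: false, reason: 'missing_auto_next_dispatch' };
}

// Usage: the linkage-mismatch fixture from the tests above.
const mismatchInput = {
  planId: 'plan-auto-next-linkage-mismatch',
  currentTask: 'task-8f',
  taskState: 'complete',
  nextTaskKnown: true,
  sameApprovedPlan: true,
  taskBoundaryStop: true,
  nextTaskId: 'task-9f',
  nextDerivedAction: { type: 'message_subagent', task: 'continue with task-9f' },
  replyClosureState: 'completed',
  highRiskStop: false,
  dispatchReceipt: {
    planId: 'plan-auto-next-linkage-mismatch',
    currentTask: 'task-8f',
    nextTaskId: 'task-10f',
    nextDerivedAction: { type: 'message_subagent', task: 'continue with task-10f' },
    dispatchedAt: '2026-04-24T16:00:00+08:00',
  },
};

console.log(JSON.stringify(evaluateAutoNext(mismatchInput)));
// prints {"ok":false,"reason":"missing_auto_next_dispatch"}
```

Keeping the obligation predicate separate from the linkage check mirrors how the tests split their cases: the exemption tests never reach the receipt, while the mismatch and checkpoint-spoof tests exercise only the linkage side.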

@@ -312,8 +312,10 @@ async function main() {
assert.match(passInjected, /\[APPROVED_PLAN_CONTINUITY_GATE\]/, 'hook pass-path should emit approved-plan continuity gate block');
assert.match(passInjected, /status=continuity_failure/, 'hook pass-path should fail continuity when planner only returns dry-run dispatch without a real receipt');
assert.match(passInjected, /verdict=continuity_failure/, 'hook pass-path should expose continuity failure verdict when no real dispatch receipt exists');
assert.match(passInjected, /reason=missing_auto_next_dispatch/, 'hook pass-path should require auto-next dispatch proof instead of treating dry-run dispatch as enough');
assert.match(passInjected, /Do not stop at this completed-task boundary/, 'hook pass-path should explicitly forbid stopping at the completed-task boundary');
assert.match(passInjected, /Auto-dispatch the next task in the same approved plan, unless waiting_user, blocked, pending_verification, or high-risk stop applies/, 'hook pass-path should explain the auto-next obligation exceptions');
assert.match(passInjected, /Do not stop at this completed-task boundary/, 'hook pass-path should hard-gate the completed-task boundary');
assert.doesNotMatch(passInjected, /\[APPROVED_PLAN_CONTINUITY_GATE\][\s\S]*status=pass/, 'hook pass-path should not let approved-plan continuity pass on dry-run dispatch alone');
const failInjected = await withPatchedWrapper(buildWrapperScript({