From 6572f0b5d560929ed3273b47397fc587e5cce437 Mon Sep 17 00:00:00 2001
From: "openclaw@cowbay.org"
Date: Fri, 24 Apr 2026 14:58:22 +0800
Subject: [PATCH] feat: sync latest continuity hard-gate integration

---
 README.md                                     | 104 ++--
 hooks/force-recall/handler.ts                 |  73 ++-
 scripts/approved_plan_continuity_gate.mjs     |  19 +-
 .../test_approved_plan_continuity_gate.mjs    |  86 ++-
 .../test_force_recall_long_task_preflight.mjs | 489 ++++++++++++++++++
 5 files changed, 706 insertions(+), 65 deletions(-)
 create mode 100755 scripts/test_force_recall_long_task_preflight.mjs

diff --git a/README.md b/README.md
index c671adc..630fc65 100644
--- a/README.md
+++ b/README.md
@@ -15,34 +15,50 @@
 - 但沒有真的 dispatch 下一個 task
 - 最後流程卻還是收尾,造成 **auto-next break / continuity failure**
 
-目前 repo 內主要包含:
+## 目前已完成
 
-- continuity gate 規則與測試
-- dispatch receipt binding 骨架與最小 receipt writer
-- anti-blackhole / delivery watchdog 的前置設計與基礎腳本
-- 相關 runbook / plan / state shape
+目前這個 repo 已經包含並驗證以下能力:
 
-### 目前重點
+1. **continuity evaluator**
+   - task 完成、next action 已知、但沒有 next dispatch receipt,且 closure 狀態又不是 `waiting_user` / `blocked` / `pending_verification` 時,會判定 `continuity_failure`。
 
-這個 repo 目前著重於把以下能力拆成可測試、可逐步落地的切片:
+2. **dispatch receipt binding groundwork**
+   - 已有 continuity receipt storage 定義
+   - 已有最小 dispatch receipt writer
+   - 已有 continuity gate / dispatch binding 對應測試
 
-1. **continuity hard-gate**
-   - approved plan 內 task 完成後,若沒有 next dispatch receipt,且狀態又不是 `waiting_user` / `blocked` / `pending_verification`,則不應允許正常收尾。
+3. **`derivedAction` 與 `nextDerivedAction` 一致納入 continuity 判定**
+   - 不再只有 `nextDerivedAction` 才受 gate 約束。
 
-2. **dispatch receipt binding**
-   - 不只知道 planner 推導出下一步,而是要真的留下可驗證的 dispatch receipt。
+4. **`dry_run_dispatch` 不得冒充真 receipt**
+   - planner 的 dry-run 結果不再被 handler fallback 當成真實 dispatch receipt。
 
-3. **anti-blackhole groundwork**
-   - 為後續的 subagent completion-delivery watchdog 提供 receipt / state / failure mode 基礎。
+5. **fake receipt authority 已補強**
+   - continuity gate 不再接受任意 non-null `dispatchReceipt`
+   - 現在至少要求最小 receipt 欄位:
+     - `planId`
+     - `currentTask`
+     - `nextDerivedAction`
+     - `dispatchedAt`
 
-### 說明
+6. **hook integration 已接入**
+   - continuity gate 已接進 `hooks/force-recall/handler.ts`
+   - 目前會透過 `[APPROVED_PLAN_CONTINUITY_GATE]` block 注入現行 flow
 
-這是一個**聚焦匯出 repo**,不是整個原 workspace 的完整鏡像。目的是先把相關修補線獨立出來,方便:
+## 目前限制
 
-- 追蹤修補進度
-- 做 code review
-- 補 README / runbook / 測試
-- 後續再整合回主要流程
+這條線雖然已經接入現行 flow,但目前仍偏向 **prompt-level hard-gate integration**,而不是 engine-level abort。也就是說:
+
+- 已經不是只有規則文件
+- 已經不是只有獨立腳本測試
+- 但也還不是最底層 runtime/core 的絕對阻斷器
+
+## 下一步建議
+
+下一階段最合理的方向有兩條:
+
+1. **把 continuity hard-gate 再往更硬的 runtime enforcement 推進**
+2. **回頭補完 anti-blackhole / completion-delivery watchdog recovery 閉環**
 
 ---
 
@@ -61,31 +77,43 @@ The goal is to prevent this failure mode:
 - but the next task is never actually dispatched,
 - and the flow still closes out as if continuity were preserved.
 
-The repository currently includes:
+## Current State
 
-- continuity gate rules and tests
-- dispatch receipt binding skeletons and a minimal receipt writer
-- groundwork for anti-blackhole / delivery-watchdog handling
-- related runbooks, plans, and state-shape definitions
+The repo now includes and validates the following capabilities:
 
-### Current Focus
+1. **Continuity evaluator**
+   - When a task is complete, the next action is known, and there is no next dispatch receipt, and the closure state is not `waiting_user`, `blocked`, or `pending_verification`, the flow is classified as `continuity_failure`.
 
-This repo is currently structured to make the following capabilities testable and incrementally adoptable:
+2. **Dispatch receipt binding groundwork**
+   - continuity receipt storage shape
+   - minimal dispatch receipt writer
+   - continuity gate / dispatch binding tests
 
-1. **Continuity hard-gate**
-   - If a task inside an approved plan is marked complete, and there is no next dispatch receipt, and the closure state is not `waiting_user`, `blocked`, or `pending_verification`, the flow should not be allowed to close normally.
+3. **`derivedAction` is treated as a real next-action source**
+   - The gate no longer depends only on `nextDerivedAction`.
 
-2. **Dispatch receipt binding**
-   - It is not enough for the planner to derive a next action; that action must also produce verifiable dispatch evidence.
+4. **`dry_run_dispatch` is no longer accepted as a real receipt**
+   - Planner dry-run output is no longer promoted into a real dispatch receipt by handler fallback logic.
 
-3. **Anti-blackhole groundwork**
-   - The repo also lays groundwork for future subagent completion-delivery watchdog logic through receipt/state/failure-mode definitions.
+5. **Fake receipt authority has been tightened**
+   - The continuity gate no longer accepts any arbitrary non-null `dispatchReceipt`
+   - It now requires at least these minimum fields:
+     - `planId`
+     - `currentTask`
+     - `nextDerivedAction`
+     - `dispatchedAt`
 
-### Note
+6. **Hook integration is now present**
+   - The continuity gate is integrated into `hooks/force-recall/handler.ts`
+   - It currently enters the live flow through the `[APPROVED_PLAN_CONTINUITY_GATE]` injected block
 
-This is a **focused export repository**, not a full mirror of the original workspace. The intent is to isolate the relevant hardening work so it can be:
+## Current Limitation
 
-- reviewed more easily
-- iterated independently
-- documented clearly
-- integrated back into the main flow later
+
+This workstream is now beyond pure documentation and beyond isolated script-level testing, but it still behaves more like a **prompt-level hard-gate integration** than a true engine-level abort mechanism.
+
+## Suggested Next Steps
+
+Two reasonable follow-up directions remain:
+
+1. **push continuity hard-gate further toward stronger runtime enforcement**
+2. **return to anti-blackhole / completion-delivery watchdog recovery closure**
diff --git a/hooks/force-recall/handler.ts b/hooks/force-recall/handler.ts
index 13df26a..3a51abd 100644
--- a/hooks/force-recall/handler.ts
+++ b/hooks/force-recall/handler.ts
@@ -8,6 +8,7 @@ const execFileAsync = promisify(execFile);
 const LONG_TASK_WRAPPER_TIMEOUT_MS = 8000;
 const LONG_TASK_GATE_LOCK_TIMEOUT_MS = 8000;
 const LONG_TASK_AUTO_CHAIN_PLANNER_TIMEOUT_MS = 8000;
+const APPROVED_PLAN_CONTINUITY_TIMEOUT_MS = 8000;
 
 type AutoChainPlanResult = {
   plannerStatus: string;
@@ -30,6 +31,14 @@ type GateLockResult = {
   allowedResponseModes?: string[];
 };
 
+type ApprovedPlanContinuityResult = {
+  ok: boolean;
+  status: string;
+  verdict: string;
+  reason?: string;
+  gate?: string;
+};
+
 function clamp(s: string, max = 1200): string {
   if (!s) return s;
   if (s.length <= max) return s;
@@ -328,6 +337,63 @@ async function runAutoChainPlanner(workspaceDir: string, gateLockResult: GateLoc
   return runJsonScript(plannerPath, workspaceDir, input, LONG_TASK_AUTO_CHAIN_PLANNER_TIMEOUT_MS);
 }
 
+function buildApprovedPlanContinuityInput(wrapperResult: any, autoChainPlanResult: AutoChainPlanResult | null): Record | null {
+  if (!wrapperResult || wrapperResult.classification !== "long_task") return null;
+
+  const wrapperNextAction = wrapperResult?.nextDerivedAction ?? wrapperResult?.derivedAction ?? null;
+  const plannerDerivedAction = autoChainPlanResult?.derivedAction && autoChainPlanResult.derivedAction !== "none"
+    ? {
+        type: autoChainPlanResult.dispatchMode ?? "no_dispatch",
+        action: autoChainPlanResult.derivedAction,
+      }
+    : null;
+  const nextDerivedAction = wrapperNextAction ?? plannerDerivedAction;
+
+  if (nextDerivedAction == null) return null;
+
+  const replyClosureState = typeof wrapperResult?.replyClosureState === "string"
+    ? wrapperResult.replyClosureState
+    : (wrapperResult?.handoff?.mode === "button_path" ? "waiting_user" : "completed");
+
+  const dispatchReceipt = wrapperResult?.dispatchReceipt ?? null;
+
+  return {
+    planId: wrapperResult?.planId ?? "hook-preflight-approved-plan",
+    currentTask: wrapperResult?.currentTask ?? wrapperResult?.requiredNextAction ?? "hook-preflight-task",
+    taskState: wrapperResult?.taskState ?? (plannerDerivedAction ? "complete" : null),
+    nextDerivedAction,
+    replyClosureState,
+    dispatchReceipt,
+  };
+}
+
+async function runApprovedPlanContinuityGate(workspaceDir: string, wrapperResult: any, autoChainPlanResult: AutoChainPlanResult | null): Promise {
+  const continuityPath = path.join(workspaceDir, "scripts", "approved_plan_continuity_gate.mjs");
+  const input = buildApprovedPlanContinuityInput(wrapperResult, autoChainPlanResult);
+  if (!input) return null;
+  return runJsonScript(continuityPath, workspaceDir, input, APPROVED_PLAN_CONTINUITY_TIMEOUT_MS);
+}
+
+function buildApprovedPlanContinuityBlock(result: ApprovedPlanContinuityResult | null): string {
+  if (!result) return "";
+
+  const lines = [
+    "[APPROVED_PLAN_CONTINUITY_GATE]",
+    `status=${result.status}`,
+    `verdict=${result.verdict}`,
+  ];
+
+  if (result.reason) lines.push(`reason=${result.reason}`);
+
+  if (result.ok === false) {
+    lines.push("- HARD_GATE: Do not close out this reply as normal completion.");
+    lines.push("- HARD_GATE: Route back to continuity failure until a real next dispatch receipt exists, unless closure state is waiting_user, blocked, or pending_verification.");
+  }
+
+  lines.push("[/APPROVED_PLAN_CONTINUITY_GATE]", "");
+  return lines.join("\n");
+}
+
 function buildAutoChainPlanBlock(planResult: AutoChainPlanResult | null): string {
   if (!planResult) {
     return [
@@ -473,8 +539,11 @@ const forceRecall = async (event: any) => {
   ]);
   const gateLockResult = wrapperResult ? await runLongTaskGateLock(workspaceDir, wrapperResult) : null;
   const autoChainPlanResult = wrapperResult ? await runAutoChainPlanner(workspaceDir, gateLockResult, wrapperResult) : null;
+  const approvedPlanContinuityResult = wrapperResult
+    ? await runApprovedPlanContinuityGate(workspaceDir, wrapperResult, autoChainPlanResult)
+    : null;
 
-  if (!rulebook && !soul && !wrapperResult && !gateLockResult && !autoChainPlanResult) return;
+  if (!rulebook && !soul && !wrapperResult && !gateLockResult && !autoChainPlanResult && !approvedPlanContinuityResult) return;
 
   const wrapperBlock = wrapperResult
     ? [
@@ -500,6 +569,7 @@ const forceRecall = async (event: any) => {
 
   const gateLockBlock = buildGateLockBlock(gateLockResult);
   const autoChainPlanBlock = buildAutoChainPlanBlock(autoChainPlanResult);
+  const approvedPlanContinuityBlock = buildApprovedPlanContinuityBlock(approvedPlanContinuityResult);
 
   const recallBlock = [
     "[RECALL_GATE] Mandatory recall before ANY technical action/tool use.",
@@ -509,6 +579,7 @@ const forceRecall = async (event: any) => {
     wrapperBlock || null,
     gateLockBlock,
     autoChainPlanBlock,
+    approvedPlanContinuityBlock || null,
     rulebook ? `RULEBOOK (source: ${rulebookPath}):\n${clamp(rulebook, 1200)}` : null,
     soul ? `SOUL (source: ${soulPath}):\n${clamp(soul, 1200)}` : null,
     "[/RECALL_GATE]",
diff --git a/scripts/approved_plan_continuity_gate.mjs b/scripts/approved_plan_continuity_gate.mjs
index 38df9bd..8e93082 100755
--- a/scripts/approved_plan_continuity_gate.mjs
+++ b/scripts/approved_plan_continuity_gate.mjs
@@ -3,6 +3,23 @@ import fs from 'node:fs';
 
 const LEGAL_TERMINAL_STATES = new Set(['waiting_user', 'blocked', 'pending_verification']);
 
+function isNonEmptyString(value) {
+  return typeof value === 'string' && value.trim().length > 0;
+}
+
+function isObject(value) {
+  return value != null && typeof value === 'object' && !Array.isArray(value);
+}
+
+function hasValidDispatchReceipt(receipt) {
+  if (!isObject(receipt)) return false;
+  if (!isNonEmptyString(receipt.planId)) return false;
+  if (!isNonEmptyString(receipt.currentTask)) return false;
+  if (!isObject(receipt.nextDerivedAction)) return false;
+  if (!isNonEmptyString(receipt.dispatchedAt)) return false;
+  return true;
+}
+
 function parseArgs(argv) {
   let inputPath = null;
   let compact = false;
@@ -59,7 +76,7 @@ function evaluateContinuity(payload) {
   const taskComplete = payload?.taskState === 'complete';
   const nextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
   const nextActionKnown = nextAction != null;
-  const hasDispatchReceipt = payload?.dispatchReceipt != null;
+  const hasDispatchReceipt = hasValidDispatchReceipt(payload?.dispatchReceipt ?? null);
   const closureState = payload?.replyClosureState ?? null;
   const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);
 
diff --git a/scripts/test_approved_plan_continuity_gate.mjs b/scripts/test_approved_plan_continuity_gate.mjs
index 97a1f37..bb8ac25 100644
--- a/scripts/test_approved_plan_continuity_gate.mjs
+++ b/scripts/test_approved_plan_continuity_gate.mjs
@@ -149,13 +149,11 @@ const tests = [
         });
 
         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }
 
         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }
 
         if (result.json.ok !== false) {
@@ -170,6 +168,54 @@ stdout=${result.stdout}`);
       }
     },
   },
+  {
+    name: 'continuity: fails when dispatchReceipt is a fake non-null object without minimum receipt fields',
+    run() {
+      const fixture = createFixture({
+        'input.json': {
+          planId: 'plan-fake-dispatch-receipt',
+          currentTask: 'task-6fake',
+          taskState: 'complete',
+          nextDerivedAction: {
+            type: 'message_subagent',
+            task: 'continue with task-7fake',
+          },
+          replyClosureState: 'completed',
+          dispatchReceipt: {
+            fake: true,
+          },
+        },
+      });
+
+      try {
+        const result = runGate({
+          args: ['--compact', '--input', fixture.path('input.json')],
+        });
+
+        if (result.status !== 0 && result.status !== null) {
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
+        }
+
+        if (!result.json || typeof result.json !== 'object') {
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
+        }
+
+        if (result.json.ok !== false) {
+          throw new Error(`expected continuity failure ok=false for fake dispatch receipt, got ${JSON.stringify(result.json)}`);
+        }
+
+        if (result.json.verdict !== 'continuity_failure') {
+          throw new Error(`expected verdict=continuity_failure for fake dispatch receipt, got ${JSON.stringify(result.json.verdict)}`);
+        }
+
+        if (result.json.reason !== 'missing_dispatch_receipt') {
+          throw new Error(`expected reason=missing_dispatch_receipt for fake dispatch receipt, got ${JSON.stringify(result.json.reason)}`);
+        }
+      } finally {
+        fixture.cleanup();
+      }
+    },
+  },
   {
     name:
       'continuity: passes when task is complete, next action is known, and a dispatch receipt already exists',
@@ -202,13 +248,11 @@ stdout=${result.stdout}`);
         });
 
         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }
 
         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }
 
         if (result.json.ok !== true) {
@@ -236,7 +280,7 @@ stdout=${result.stdout}`);
           dispatchReceipt: {
             planId: 'plan-derived-action-with-bound-dispatch',
             currentTask: 'task-6c',
-            derivedAction: {
+            nextDerivedAction: {
               type: 'message_subagent',
               task: 'continue with task-7c',
             },
@@ -251,13 +295,11 @@ stdout=${result.stdout}`);
         });
 
         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }
 
         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }
 
         if (result.json.ok !== true) {
@@ -292,13 +334,11 @@ stdout=${result.stdout}`);
         });
 
         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }
 
         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }
 
         if (result.json.ok !== true) {
@@ -333,13 +373,11 @@ stdout=${result.stdout}`);
         });
 
         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }
 
         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }
 
         if (result.json.ok !== true) {
@@ -374,13 +412,11 @@ stdout=${result.stdout}`);
         });
 
         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }
 
         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }
 
         if (result.json.ok !== true) {
diff --git a/scripts/test_force_recall_long_task_preflight.mjs b/scripts/test_force_recall_long_task_preflight.mjs
new file mode 100755
index 0000000..8a48e4f
--- /dev/null
+++ b/scripts/test_force_recall_long_task_preflight.mjs
@@ -0,0 +1,489 @@
+#!/usr/bin/env node
+import assert from 'node:assert/strict';
+import fs from 'node:fs/promises';
+import os from 'node:os';
+import path from 'node:path';
+import { pathToFileURL } from 'node:url';
+import { execFile as execFileCallback } from 'node:child_process';
+import { promisify } from 'node:util';
+import { stripTypeScriptTypes } from 'node:module';
+
+const __dirname = path.dirname(new URL(import.meta.url).pathname);
+const repoRoot = path.resolve(__dirname, '..');
+const handlerPath = path.join(repoRoot, 'hooks', 'force-recall', 'handler.ts');
+const wrapperPath = path.join(repoRoot, 'scripts', 'long_task_governor_wrapper.mjs');
+const gateLockPath = path.join(repoRoot, 'scripts', 'long_task_gate_lock.mjs');
+const plannerPath = path.join(repoRoot, 'scripts', 'plan_long_task_auto_chain.mjs');
+const continuityGatePath = path.join(repoRoot, 'scripts', 'approved_plan_continuity_gate.mjs');
+const execFileAsync = promisify(execFileCallback);
+
+async function importTsModule(tsPath) {
+  const source = await fs.readFile(tsPath, 'utf8');
+  const jsSource = stripTypeScriptTypes(source, { mode: 'strip' });
+  const dataUrl = `data:text/javascript;charset=utf-8,${encodeURIComponent(jsSource)}\n//# sourceURL=${encodeURIComponent(pathToFileURL(tsPath).href)}`;
+  return import(dataUrl);
+}
+
+function escapeRegex(snippet) {
+  return snippet.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
+async function runScenario(forceRecall, requestText, workspaceDir = repoRoot) {
+  const event = {
+    type: 'message',
+    action: 'preprocessed',
+    context: {
+      workspaceDir,
+      body: requestText,
+      bodyForAgent: requestText,
+    },
+  };
+
+  await forceRecall(event);
+  const injected = event.context?.bodyForAgent;
+  assert.equal(typeof injected, 'string', 'event.context.bodyForAgent should be a string after handler runs');
+  return injected;
+}
+
+async function prepareTempWorkspace() {
+  const tempWorkspace = await fs.mkdtemp(path.join(os.tmpdir(), 'force-recall-workspace-'));
+  await fs.mkdir(path.join(tempWorkspace, 'scripts'), { recursive: true });
+  await fs.mkdir(path.join(tempWorkspace, 'hooks', 'force-recall'), { recursive: true });
+  await fs.mkdir(path.join(tempWorkspace, 'docs'), { recursive: true });
+
+  const copies = [
+    [wrapperPath, path.join(tempWorkspace, 'scripts', 'long_task_governor_wrapper.mjs')],
+    [gateLockPath, path.join(tempWorkspace, 'scripts', 'long_task_gate_lock.mjs')],
+    [plannerPath, path.join(tempWorkspace, 'scripts', 'plan_long_task_auto_chain.mjs')],
+    [continuityGatePath, path.join(tempWorkspace, 'scripts', 'approved_plan_continuity_gate.mjs')],
+    [handlerPath, path.join(tempWorkspace, 'hooks', 'force-recall', 'handler.ts')],
+    [path.join(repoRoot, 'docs', 'RULEBOOK.md'), path.join(tempWorkspace, 'docs', 'RULEBOOK.md')],
+    [path.join(repoRoot, 'SOUL.md'), path.join(tempWorkspace, 'SOUL.md')],
+  ];
+
+  for (const [src, dest] of copies) {
+    await fs.copyFile(src, dest);
+  }
+
+  return tempWorkspace;
+}
+
+async function withPatchedWrapper(tempContent, callback) {
+  const originalWrapper = await fs.readFile(wrapperPath, 'utf8');
+  await fs.writeFile(wrapperPath, tempContent, 'utf8');
+  try {
+    return await callback();
+  } finally {
+    await fs.writeFile(wrapperPath, originalWrapper, 'utf8');
+  }
+}
+
+async function withPatchedWrapperWorkspace(wrapperResult, callback) {
+  const tempWorkspace = await prepareTempWorkspace();
+  const wrapperScriptPath = path.join(tempWorkspace, 'scripts', 'long_task_governor_wrapper.mjs');
+  await fs.writeFile(wrapperScriptPath, buildWrapperScript(wrapperResult), 'utf8');
+
+  if (typeof wrapperResult.externalizedCheckpointPath === 'string' && wrapperResult.externalizedCheckpointPath.trim()) {
+    const checkpointPath = path.join(tempWorkspace, wrapperResult.externalizedCheckpointPath);
+    await fs.mkdir(path.dirname(checkpointPath), { recursive: true });
+    await fs.writeFile(checkpointPath, JSON.stringify({
+      kind: 'long_task_checkpoint',
+      currentStep: 'patched-wrapper-test',
+      nextStep: 'patched-wrapper-test-next',
+      verificationResult: 'checkpoint artifact readable in temp workspace',
+    }, null, 2) + '\n', 'utf8');
+  }
+
+  try {
+    return await callback(tempWorkspace);
+  } finally {
+    await fs.rm(tempWorkspace, { recursive: true, force: true });
+  }
+}
+
+function buildWrapperScript(wrapperResult) {
+  return `#!/usr/bin/env node\nprocess.stdout.write(JSON.stringify(${JSON.stringify(wrapperResult)}, null, 0) + "\\n");\n`;
+}
+
+async function main() {
+  await Promise.all([fs.access(wrapperPath), fs.access(gateLockPath), fs.access(plannerPath), fs.access(continuityGatePath)]);
+  const { default: forceRecall } = await importTsModule(handlerPath);
+  assert.equal(typeof forceRecall, 'function', 'force-recall handler should export default function');
+
+  const requestText = [
+    'Please inspect the workspace files and verify the hook injection path.',
+    'I need you to review the behavior, choose the final accept/reject decision,',
+    'and continue in background with a follow-up later.',
+  ].join(' ');
+  const plannerOnlyRequestText = [
+    'Please inspect the workspace files and verify the hook injection path.',
+    'Summarize the current dry-run planner state for technical inspection only.',
+  ].join(' ');
+
+  const checkpointWorkspace = await prepareTempWorkspace();
+  let realWrapperInjected;
+  try {
+    realWrapperInjected = await runScenario(forceRecall, 'Dispatch a subagent to inspect logs and wait for the result.', checkpointWorkspace);
+    const wrapperInputPath = path.join(checkpointWorkspace, 'wrapper-input.json');
+    await fs.writeFile(wrapperInputPath, JSON.stringify({
+      requestText: 'Dispatch a subagent to inspect logs and wait for the result.',
+      hasFilesOrSystems: false,
+      needsWaiting: false,
+      needsSubagent: false,
+      needsOwnerDecision: false,
+      canReplyNow: false,
+      taskName: 'Hook preflight classification',
+      currentStep: 'Classifying request at preprocessed hook',
+      nextStep: 'Carry governor recommendation into prompt context',
+      nextReportCondition: 'At next meaningful milestone',
+      waitingOn: 'none',
+      blocker: 'none',
+      checkpointTrigger: '',
+      externalizedTrigger: '',
+      triggerKind: '',
+    }), 'utf8');
+    const wrapperRaw = await fs.readFile(path.join(checkpointWorkspace, 'scripts', 'long_task_governor_wrapper.mjs'), 'utf8');
+    assert.ok(wrapperRaw.length > 0, 'temp workspace should contain wrapper script');
+    const { stdout: wrapperStdout } = await execFileAsync('node', [path.join(checkpointWorkspace, 'scripts', 'long_task_governor_wrapper.mjs'), '--compact', '--input', wrapperInputPath], { cwd: checkpointWorkspace, encoding: 'utf8' });
+    const wrapperOutput = JSON.parse(wrapperStdout);
+    const checkpointPath = path.join(checkpointWorkspace, wrapperOutput.externalizedCheckpointPath);
+    const checkpointBody = await fs.readFile(checkpointPath, 'utf8');
+    assert.ok(checkpointBody.trim().length > 0, 'real wrapper integration should emit readable checkpoint artifact');
+    assert.doesNotMatch(checkpointBody, /Hook preflight classification/, 'real wrapper artifact should not fall back to taskRecord.task_name');
+  } finally {
+    await fs.rm(checkpointWorkspace, { recursive: true, force: true });
+  }
+  assert.match(realWrapperInjected, /classification=long_task/, 'real wrapper integration should classify subagent wait as long_task');
+  assert.match(realWrapperInjected, /gateStatus=pass/, 'real wrapper integration should pass gate with real progress evidence');
+  assert.match(realWrapperInjected, /allowedResponseMode=silent_continuation/, 'real wrapper integration should preserve silent continuation allowance');
+  assert.doesNotMatch(realWrapperInjected, /reason=claimed progression without concrete progress evidence is forbidden/, 'real wrapper integration should not fail for missing progress evidence');
+  assert.doesNotMatch(realWrapperInjected, /requiredEvidence=progressEvidence/, 'real wrapper integration should not require synthetic progressEvidence repair');
+  assert.doesNotMatch(realWrapperInjected, /task_name/, 'real wrapper integration should not leak taskRecord.task_name fallback into gate/preflight text');
+
+  const injected = await runScenario(forceRecall, requestText);
+
+  const expectedSnippets = [
+    '[LONG_TASK_GOVERNOR_PREFLIGHT]',
+    'classification=long_task',
+    'silentLaunchOk=false',
+    'handoff.mode=button_path',
+    '[LONG_TASK_GATE_LOCK]',
+    'gateStatus=fail',
+    '[LONG_TASK_AUTO_CHAIN_PLAN]',
+    'plannerStatus=blocked_by_gate',
+    'derivedAction=none',
+    'dispatchMode=no_dispatch',
+    'autoChainAllowed=false',
+    'reason=gateStatus must pass before auto-chain planning can proceed',
+    'requiredEvidence=gateStatus=pass',
+    'requiredEvidence=externalizedCheckpoint',
+    'requiredEvidence=concreteNextAction',
+    'requiredEvidence=buttonPathMode',
+    'reason=silent long-task cannot continue without externalized checkpoint path',
+    'reason=claimed execution requires evidence of a concrete next action',
+    'reason=owner decision flow must end in button-path, not plain text',
+    'ENFORCEMENT: Hook input should include progressEvidence (or equivalent concrete fields) whenever a progression claim is present.',
+    'HARD_GATE: Block any plain-text handoff or silent-continuation claim when externalized checkpoint evidence is missing.',
+    'HARD_GATE: If owner decision is involved, do not replace button-path closure with plain-text handoff.',
+    'ENFORCEMENT: Forbidden path: plain-text handoff that pretends the long task is already continuing without an externalized checkpoint.',
+    'ENFORCEMENT: Forbidden path: stating you have already entered the next task/step when the record only contains planning language and no concrete execution evidence.',
+  ];
+
+  const unexpectedSnippets = [
+    'reason=claimed progression without concrete progress evidence is forbidden',
+    'requiredEvidence=progressEvidence',
+  ];
+
+  for (const snippet of expectedSnippets) {
+    assert.match(injected, new RegExp(escapeRegex(snippet)), `missing snippet: ${snippet}`);
+  }
+  for (const snippet of unexpectedSnippets) {
+    assert.doesNotMatch(injected, new RegExp(escapeRegex(snippet)), `unexpected snippet present: ${snippet}`);
+  }
+
+  const { evaluateGate } = await import(pathToFileURL(gateLockPath).href + `?t=${Date.now()}`);
+  assert.equal(typeof evaluateGate, 'function', 'long_task_gate_lock should export evaluateGate for direct tests');
+
+  const passResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'dispatch_follow_up_subagent',
+    autoChainNextAction: 'dispatch_follow_up_subagent',
+    autoChainDispatchEvidence: {
+      action: 'dispatch_follow_up_subagent',
+      dispatched: true,
+      event: 'dispatch',
+    },
+    progressionClaim: 'already progressing to the next step in background',
+    progressEvidence: { sessionKey: 'task-123' },
+  });
+  assert.equal(passResult.gateStatus, 'pass', 'pass-path should pass with concrete progressEvidence');
+
+  const failResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'dispatch_follow_up_subagent',
+    autoChainNextAction: 'dispatch_follow_up_subagent',
+    progressionClaim: 'already progressing to the next step in background',
+    executionEvidence: { concreteNextAction: 'dispatch_follow_up_subagent' },
+  });
+  assert.equal(failResult.gateStatus, 'fail', 'fail-path should fail when explicit auto-chain action lacks dispatch evidence');
+  assert.match(JSON.stringify(failResult), /autoChainDispatchEvidence/, 'fail-path should require autoChainDispatchEvidence');
+
+  const neutralResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'summarize findings for reply',
+    executionEvidence: { concreteNextAction: 'summarize findings for reply' },
+  });
+  assert.equal(neutralResult.gateStatus, 'pass', 'neutral-path should pass when there is no explicit auto-chain next action');
+  assert.doesNotMatch(JSON.stringify(neutralResult), /autoChainDispatchEvidence/, 'neutral-path should not require auto-chain dispatch evidence');
+
+  const directAutoChainFailResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'dispatch_follow_up_subagent',
+    autoChainNextAction: 'dispatch_follow_up_subagent',
+  });
+  assert.equal(directAutoChainFailResult.gateStatus, 'fail', 'direct evaluator should fail when explicit auto-chain action has no dispatch evidence');
+  assert.match(JSON.stringify(directAutoChainFailResult), /explicit auto-chain next action requires dispatched-action evidence/, 'direct evaluator fail-path should mention missing dispatched-action evidence');
+
+  const mismatchedDispatchEvidenceResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'dispatch_follow_up_subagent',
+    autoChainNextAction: 'dispatch_follow_up_subagent',
+    autoChainDispatchEvidence: {
+      action: 'dispatch_other_subagent',
+      dispatched: true,
+      event: 'dispatch',
+    },
+  });
+  assert.equal(mismatchedDispatchEvidenceResult.gateStatus, 'fail', 'mismatched dispatch evidence should fail');
+  assert.match(JSON.stringify(mismatchedDispatchEvidenceResult), /autoChainDispatchEvidence/, 'mismatched dispatch evidence should still require matching autoChainDispatchEvidence');
+
+  const fakeCheckpointDispatchEvidenceResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'dispatch_follow_up_subagent',
+    autoChainNextAction: 'dispatch_follow_up_subagent',
+    autoChainDispatchEvidence: {
+      sessionKey: 'task-123',
+      checkpointPath: 'checkpoints/task-123.json',
+    },
+  });
+  assert.equal(fakeCheckpointDispatchEvidenceResult.gateStatus, 'fail', 'checkpoint/session-only dispatch evidence should fail');
+  assert.match(JSON.stringify(fakeCheckpointDispatchEvidenceResult), /explicit auto-chain next action requires dispatched-action evidence/, 'checkpoint/session-only dispatch evidence should be rejected as fake dispatch evidence');
+
+  const neutralSnakeCaseResult = evaluateGate({
+    classification: 'long_task',
+    claimedExecution: true,
+    concreteNextAction: 'summarize findings for reply',
+    autoChainNextAction: 'checkpoint_session_metadata_only',
+    executionEvidence: { concreteNextAction: 'summarize findings for reply' },
+  });
+  assert.equal(neutralSnakeCaseResult.gateStatus, 'pass', 'neutral snake_case non-dispatch action should not trigger dispatch-evidence requirement');
+  assert.doesNotMatch(JSON.stringify(neutralSnakeCaseResult), /autoChainDispatchEvidence/, 'neutral snake_case non-dispatch action should not mention dispatch-evidence requirement');
+
+  const passInjected = await withPatchedWrapperWorkspace({
+    classification: 'long_task',
+    silentCandidate: true,
+    needsCheckpoint: true,
+    needsSubagent: false,
+    needsOwnerDecision: false,
+    silentLaunchOk: true,
+    silentLaunchReason: 'checkpoint established',
+    requiredNextAction: 'dispatch_follow_up_subagent',
+    autoChainDispatchEvidence: {
+      action: 'dispatch_follow_up_subagent',
+      dispatched: true,
+      event: 'dispatch',
+    },
+    progressEvidence: { sessionKey: 'task-123' },
+    externalizedCheckpointPath: 'checkpoints/task-123.json',
+    handoff: { mode: 'direct_reply' },
+  }, async (workspaceDir) => runScenario(forceRecall, requestText, workspaceDir));
+  assert.match(passInjected, /gateStatus=pass/, 'hook pass-path should pass when wrapper provides concrete progressEvidence');
+  assert.match(passInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook pass-path should emit auto-chain plan block');
+  assert.match(passInjected, /plannerStatus=pass/, 'hook pass-path should expose planner pass result');
+  assert.match(passInjected, /derivedAction=dispatch_spec_review/, 'hook pass-path should derive dry-run spec review dispatch');
+  assert.match(passInjected, /dispatchMode=dry_run_dispatch/, 'hook pass-path should stay in dry-run dispatch mode');
+  assert.match(passInjected, /autoChainAllowed=true/, 'hook pass-path should allow auto-chain in dry-run planner output');
+  assert.match(passInjected, /\[APPROVED_PLAN_CONTINUITY_GATE\]/, 'hook pass-path should emit approved-plan continuity gate block');
+  assert.match(passInjected, /status=continuity_failure/, 'hook pass-path should fail continuity when planner only returns dry-run dispatch without a real receipt');
+  assert.match(passInjected, /verdict=continuity_failure/, 'hook pass-path should expose continuity failure verdict when no real dispatch receipt exists');
+  assert.match(passInjected, /reason=missing_dispatch_receipt/, 'hook pass-path should require a real dispatch receipt instead of treating dry-run dispatch as one');
+  assert.match(passInjected, /Route back to continuity failure until a real next dispatch receipt exists/, 'hook pass-path should hard-gate normal closeout until a real receipt exists');
+  assert.doesNotMatch(passInjected, /\[APPROVED_PLAN_CONTINUITY_GATE\][\s\S]*status=pass/, 'hook pass-path should not let approved-plan continuity pass on dry-run dispatch alone');
+
+  const failInjected = await withPatchedWrapper(buildWrapperScript({
+    classification: 'long_task',
+    silentCandidate: false,
+    needsCheckpoint: false,
+    needsSubagent: false,
+    needsOwnerDecision: false,
+    silentLaunchOk: false,
+    requiredNextAction: 'dispatch_follow_up_subagent',
+    handoff: { mode: 'direct_reply' },
+  }), async () => runScenario(forceRecall, requestText));
+  assert.match(failInjected, /gateStatus=fail/, 'hook fail-path should fail when wrapper exposes explicit auto-chain action without dispatch evidence');
+  assert.match(failInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook fail-path should emit auto-chain plan block');
+  assert.match(failInjected, /plannerStatus=blocked_by_gate/, 'hook fail-path should report planner blocked by gate');
+  assert.match(failInjected, /derivedAction=none/, 'hook fail-path should not derive a dry-run action');
+  assert.match(failInjected, /dispatchMode=no_dispatch/, 'hook fail-path should remain no-dispatch');
+  assert.match(failInjected, /autoChainAllowed=false/, 'hook fail-path should not allow auto-chain');
+  assert.match(failInjected, /reason=explicit auto-chain next action requires dispatched-action
evidence/, 'hook fail-path should mention missing dispatched-action evidence'); + assert.match(failInjected, /requiredEvidence=autoChainDispatchEvidence/, 'hook fail-path should require autoChainDispatchEvidence'); + + const neutralInjected = await withPatchedWrapper(buildWrapperScript({ + classification: 'long_task', + silentCandidate: false, + needsCheckpoint: false, + needsSubagent: false, + needsOwnerDecision: false, + silentLaunchOk: false, + requiredNextAction: 'summarize findings for reply', + handoff: { mode: 'direct_reply' }, + }), async () => runScenario(forceRecall, requestText)); + assert.match(neutralInjected, /gateStatus=pass/, 'hook neutral-path should pass when wrapper does not expose an explicit auto-chain action'); + assert.match(neutralInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook neutral-path should emit auto-chain plan block'); + assert.match(neutralInjected, /plannerStatus=none/, 'hook neutral-path should report no derived auto-chain action'); + assert.match(neutralInjected, /derivedAction=none/, 'hook neutral-path should keep derivedAction as none'); + assert.match(neutralInjected, /dispatchMode=no_dispatch/, 'hook neutral-path should remain no-dispatch'); + assert.match(neutralInjected, /autoChainAllowed=false/, 'hook neutral-path should keep auto-chain disabled'); + assert.doesNotMatch(neutralInjected, /reason=explicit auto-chain next action requires dispatched-action evidence/, 'hook neutral-path should not fail on auto-chain evidence when no explicit tool action exists'); + + const fakeProgressEvidenceInjected = await withPatchedWrapper(buildWrapperScript({ + classification: 'long_task', + silentCandidate: true, + needsCheckpoint: true, + needsSubagent: false, + needsOwnerDecision: false, + silentLaunchOk: true, + silentLaunchReason: 'task name exists but no externalized artifact', + taskRecord: { task_name: 'descriptive-task-name-only' }, + handoff: { mode: 'direct_reply' }, + }), async () => runScenario(forceRecall, requestText)); + 
assert.match(fakeProgressEvidenceInjected, /gateStatus=fail/, 'hook fake-progress-evidence path should fail when only task_name exists'); + assert.match(fakeProgressEvidenceInjected, /reason=claimed progression without concrete progress evidence is forbidden/, 'hook fake-progress-evidence path should mention missing concrete progress evidence'); + assert.match(fakeProgressEvidenceInjected, /requiredEvidence=progressEvidence/, 'hook fake-progress-evidence path should require progressEvidence'); + assert.match(fakeProgressEvidenceInjected, /reason=silent long-task cannot continue without externalized checkpoint path/, 'hook fake-progress-evidence path should also require real checkpoint evidence'); + + const specReviewWithoutEvidenceInjected = await withPatchedWrapper(buildWrapperScript({ + classification: 'long_task', + silentCandidate: false, + needsCheckpoint: false, + needsSubagent: false, + needsOwnerDecision: false, + silentLaunchOk: true, + requiredNextAction: 'dispatch_code_quality_review', + autoChainDispatchEvidence: { + action: 'dispatch_code_quality_review', + dispatched: true, + event: 'dispatch', + }, + progressEvidence: { sessionKey: 'task-spec-review-missing-evidence' }, + externalizedCheckpointPath: 'checkpoints/task-spec-review-missing-evidence.json', + handoff: { mode: 'direct_reply' }, + }), async () => runScenario(forceRecall, plannerOnlyRequestText)); + assert.match(specReviewWithoutEvidenceInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook spec-review missing-evidence path should emit auto-chain plan block'); + assert.match(specReviewWithoutEvidenceInjected, /plannerStatus=blocked_by_evidence/, 'hook spec-review missing-evidence path should block on missing evidence'); + assert.match(specReviewWithoutEvidenceInjected, /derivedAction=none/, 'hook spec-review missing-evidence path should not derive a dry-run action'); + assert.match(specReviewWithoutEvidenceInjected, /dispatchMode=no_dispatch/, 'hook spec-review missing-evidence path should stay 
no-dispatch'); + assert.match(specReviewWithoutEvidenceInjected, /autoChainAllowed=false/, 'hook spec-review missing-evidence path should not allow auto-chain'); + assert.match(specReviewWithoutEvidenceInjected, /reason=review pass evidence missing for code quality review transition/, 'hook spec-review missing-evidence path should mention missing review evidence'); + assert.match(specReviewWithoutEvidenceInjected, /requiredEvidence=reviewEvidence/, 'hook spec-review missing-evidence path should require reviewEvidence'); + + const fixSliceWithoutEvidenceInjected = await withPatchedWrapper(buildWrapperScript({ + classification: 'long_task', + silentCandidate: false, + needsCheckpoint: false, + needsSubagent: false, + needsOwnerDecision: false, + silentLaunchOk: true, + silentLaunchReason: 'review blocked by findings', + requiredNextAction: 'dispatch_fix_slice', + autoChainDispatchEvidence: { + action: 'dispatch_fix_slice', + dispatched: true, + event: 'dispatch', + }, + progressEvidence: { sessionKey: 'task-fix-slice-missing-evidence' }, + externalizedCheckpointPath: 'checkpoints/task-fix-slice-missing-evidence.json', + handoff: { mode: 'direct_reply' }, + }), async () => runScenario(forceRecall, plannerOnlyRequestText)); + assert.match(fixSliceWithoutEvidenceInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook fix-slice missing-evidence path should emit auto-chain plan block'); + assert.match(fixSliceWithoutEvidenceInjected, /plannerStatus=blocked_by_evidence/, 'hook fix-slice missing-evidence path should block on missing evidence'); + assert.match(fixSliceWithoutEvidenceInjected, /derivedAction=none/, 'hook fix-slice missing-evidence path should not derive a dry-run action'); + assert.match(fixSliceWithoutEvidenceInjected, /dispatchMode=no_dispatch/, 'hook fix-slice missing-evidence path should stay no-dispatch'); + assert.match(fixSliceWithoutEvidenceInjected, /autoChainAllowed=false/, 'hook fix-slice missing-evidence path should not allow auto-chain'); + 
assert.match(fixSliceWithoutEvidenceInjected, /reason=blocker evidence missing for retry\/fix transition/, 'hook fix-slice missing-evidence path should mention missing blocker evidence'); + assert.match(fixSliceWithoutEvidenceInjected, /requiredEvidence=blockerEvidence/, 'hook fix-slice missing-evidence path should require blockerEvidence'); + + const specReviewWithoutImplementationEvidenceInjected = await withPatchedWrapper(buildWrapperScript({ + classification: 'long_task', + silentCandidate: false, + needsCheckpoint: false, + needsSubagent: false, + needsOwnerDecision: false, + silentLaunchOk: true, + requiredNextAction: 'dispatch_spec_review', + autoChainDispatchEvidence: { + action: 'dispatch_spec_review', + dispatched: true, + event: 'dispatch', + }, + progressEvidence: { sessionKey: 'task-implementation-missing-evidence' }, + externalizedCheckpointPath: 'checkpoints/task-implementation-missing-evidence.json', + handoff: { mode: 'direct_reply' }, + }), async () => runScenario(forceRecall, plannerOnlyRequestText)); + assert.match(specReviewWithoutImplementationEvidenceInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook implementation missing-evidence path should emit auto-chain plan block'); + assert.match(specReviewWithoutImplementationEvidenceInjected, /plannerStatus=blocked_by_evidence/, 'hook implementation missing-evidence path should block on missing evidence'); + assert.match(specReviewWithoutImplementationEvidenceInjected, /derivedAction=none/, 'hook implementation missing-evidence path should not derive a dry-run action'); + assert.match(specReviewWithoutImplementationEvidenceInjected, /dispatchMode=no_dispatch/, 'hook implementation missing-evidence path should stay no-dispatch'); + assert.match(specReviewWithoutImplementationEvidenceInjected, /autoChainAllowed=false/, 'hook implementation missing-evidence path should not allow auto-chain'); + assert.match(specReviewWithoutImplementationEvidenceInjected, /reason=implementation evidence missing for 
review-required next action/, 'hook implementation missing-evidence path should mention missing implementation evidence'); + assert.match(specReviewWithoutImplementationEvidenceInjected, /requiredEvidence=executionEvidence/, 'hook implementation missing-evidence path should require executionEvidence'); + + const originalGateLock = await fs.readFile(gateLockPath, 'utf8'); + const tempDir = await fs.mkdtemp(path.join(os.tmpdir(), 'force-recall-gate-lock-')); + const backupPath = path.join(tempDir, path.basename(gateLockPath)); + await fs.writeFile(backupPath, originalGateLock, 'utf8'); + await fs.writeFile(gateLockPath, '#!/usr/bin/env node\nprocess.exit(1);\n', 'utf8'); + + let degradedInjected; + try { + degradedInjected = await runScenario(forceRecall, requestText); + } finally { + const backup = await fs.readFile(backupPath, 'utf8'); + await fs.writeFile(gateLockPath, backup, 'utf8'); + await fs.rm(tempDir, { recursive: true, force: true }); + } + + const degradedExpectedSnippets = [ + '[LONG_TASK_GATE_LOCK]', + 'gateStatus=degraded', + 'gateRequired=unknown', + 'HARD_GATE: Evaluator unavailable is not permission to claim silent continuation or next-task progression without verifiable progress evidence.', + 'HARD_GATE: Fall back to a non-silent, evidence-preserving follow-up if you cannot prove checkpoint state or concrete execution.', + ]; + + for (const snippet of degradedExpectedSnippets) { + assert.match(degradedInjected, new RegExp(escapeRegex(snippet)), `missing degraded snippet: ${snippet}`); + } + + process.stdout.write(JSON.stringify({ + ok: true, + gatePaths: { + pass: passResult.gateStatus, + fail: failResult.gateStatus, + neutral: neutralResult.gateStatus, + }, + bodyPreview: injected.split('\n').slice(0, 35), + }, null, 2) + '\n'); +} + +main().catch((error) => { + console.error(error); + process.exitCode = 1; +});
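The direct-evaluator assertions in this test pin down a specific dispatch-evidence rule: an explicit `dispatch_*` next action fails unless the evidence names the same action, is marked `dispatched: true`, and records `event: 'dispatch'`. Session keys or checkpoint paths alone are not enough. A neutral action such as `checkpoint_session_metadata_only` or `summarize findings for reply` does not trigger the requirement at all. The sketch below restates that rule under stated assumptions. The helper names `requiresDispatchEvidence`, `hasMatchingDispatchEvidence`, and `checkAutoChainDispatch` are hypothetical and are not the repository's `evaluateGate`. The `dispatch_` prefix test is likewise an inference from the fixtures, not a confirmed detail of the real evaluator.

```javascript
// Hypothetical sketch inferred from the test expectations; not the repo's evaluateGate.

const MISSING_EVIDENCE_REASON =
  'explicit auto-chain next action requires dispatched-action evidence';

// Assumption: only snake_case actions prefixed with "dispatch_" count as
// explicit auto-chain dispatch actions that need dispatched-action evidence.
function requiresDispatchEvidence(action) {
  return typeof action === 'string' && /^dispatch_[a-z0-9_]+$/.test(action);
}

// Evidence counts only if it names the same action, says it was dispatched,
// and records a 'dispatch' event. A bare sessionKey/checkpointPath does not.
function hasMatchingDispatchEvidence(action, evidence) {
  return Boolean(
    evidence &&
      evidence.action === action &&
      evidence.dispatched === true &&
      evidence.event === 'dispatch'
  );
}

// Returns a small gate result shaped loosely like the fields the tests inspect.
function checkAutoChainDispatch({ autoChainNextAction, autoChainDispatchEvidence }) {
  if (!requiresDispatchEvidence(autoChainNextAction)) {
    return { gateStatus: 'pass', reasons: [] };
  }
  if (hasMatchingDispatchEvidence(autoChainNextAction, autoChainDispatchEvidence)) {
    return { gateStatus: 'pass', reasons: [] };
  }
  return {
    gateStatus: 'fail',
    reasons: [MISSING_EVIDENCE_REASON],
    requiredEvidence: 'autoChainDispatchEvidence',
  };
}

// Usage: checkpoint/session-only "evidence" is rejected, as in the fake-checkpoint case.
const fakeCheckpoint = checkAutoChainDispatch({
  autoChainNextAction: 'dispatch_follow_up_subagent',
  autoChainDispatchEvidence: { sessionKey: 'task-123', checkpointPath: 'checkpoints/task-123.json' },
});
console.log(fakeCheckpoint.gateStatus); // prints "fail"
```

Keeping the "does this action need evidence?" check separate from the "is this evidence real?" check mirrors how the tests treat the neutral path. Neutral actions never reach the evidence check, so they cannot fail on it.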