feat: sync latest continuity hard-gate integration
104
README.md
@@ -15,34 +15,50 @@
 - but the next task is never actually dispatched
 - and the flow still wraps up in the end, causing an **auto-next break / continuity failure**

-The repo currently contains:
+## Completed So Far

-- continuity gate rules and tests
-- dispatch receipt binding skeleton and a minimal receipt writer
-- preliminary design and base scripts for anti-blackhole / delivery watchdog
-- related runbook / plan / state shape
+This repo now includes and has validated the following capabilities:

-### Current Focus
+1. **continuity evaluator**
+   - When the task is complete and the next action is known, but there is no next dispatch receipt and the closure state is not `waiting_user` / `blocked` / `pending_verification`, the result is `continuity_failure`.

-This repo currently focuses on splitting the following capabilities into testable slices that can land incrementally:
+2. **dispatch receipt binding groundwork**
+   - continuity receipt storage definition is in place
+   - a minimal dispatch receipt writer is in place
+   - matching continuity gate / dispatch binding tests are in place

-1. **continuity hard-gate**
-   - After a task in an approved plan completes, if there is no next dispatch receipt and the state is not `waiting_user` / `blocked` / `pending_verification`, a normal close-out should not be allowed.
+3. **`derivedAction` and `nextDerivedAction` are both covered by the continuity check**
+   - `nextDerivedAction` is no longer the only field subject to the gate.

-2. **dispatch receipt binding**
-   - It is not enough to know that the planner derived a next step; a verifiable dispatch receipt must actually be left behind.
+4. **`dry_run_dispatch` must not pass as a real receipt**
+   - The planner's dry-run result is no longer treated as a real dispatch receipt by the handler fallback.

-3. **anti-blackhole groundwork**
-   - Provides the receipt / state / failure-mode basis for a later subagent completion-delivery watchdog.
+5. **fake receipt authority has been hardened**
+   - The continuity gate no longer accepts an arbitrary non-null `dispatchReceipt`
+   - It now requires at least these minimum receipt fields:
+     - `planId`
+     - `currentTask`
+     - `nextDerivedAction`
+     - `dispatchedAt`

-### Note
+6. **hook integration is wired in**
+   - The continuity gate is wired into `hooks/force-recall/handler.ts`
+   - It is currently injected into the live flow through the `[APPROVED_PLAN_CONTINUITY_GATE]` block

-This is a **focused export repo**, not a full mirror of the original workspace. The purpose is to isolate the relevant fix line first, making it easier to:
+## Current Limitations

-- track fix progress
-- do code review
-- fill in README / runbook / tests
-- integrate back into the main flow later
+Although this line of work is already wired into the live flow, it still leans toward **prompt-level hard-gate integration** rather than an engine-level abort. In other words:
+
+- it is no longer just rule documents
+- it is no longer just standalone script tests
+- but it is not yet an absolute blocker at the lowest runtime/core level

+## Suggested Next Steps
+
+There are two most reasonable directions for the next phase:
+
+1. **Push the continuity hard-gate further toward harder runtime enforcement**
+2. **Go back and close the anti-blackhole / completion-delivery watchdog recovery loop**
+
 ---

@@ -61,31 +77,43 @@ The goal is to prevent this failure mode:
 - but the next task is never actually dispatched,
 - and the flow still closes out as if continuity were preserved.

-The repository currently includes:
+## Current State

-- continuity gate rules and tests
-- dispatch receipt binding skeletons and a minimal receipt writer
-- groundwork for anti-blackhole / delivery-watchdog handling
-- related runbooks, plans, and state-shape definitions
+The repo now includes and validates the following capabilities:

-### Current Focus
+1. **Continuity evaluator**
+   - When a task is complete, the next action is known, and there is no next dispatch receipt, and the closure state is not `waiting_user`, `blocked`, or `pending_verification`, the flow is classified as `continuity_failure`.

-This repo is currently structured to make the following capabilities testable and incrementally adoptable:
+2. **Dispatch receipt binding groundwork**
+   - continuity receipt storage shape
+   - minimal dispatch receipt writer
+   - continuity gate / dispatch binding tests

-1. **Continuity hard-gate**
-   - If a task inside an approved plan is marked complete, and there is no next dispatch receipt, and the closure state is not `waiting_user`, `blocked`, or `pending_verification`, the flow should not be allowed to close normally.
+3. **`derivedAction` is treated as a real next-action source**
+   - The gate no longer depends only on `nextDerivedAction`.

-2. **Dispatch receipt binding**
-   - It is not enough for the planner to derive a next action; that action must also produce verifiable dispatch evidence.
+4. **`dry_run_dispatch` is no longer accepted as a real receipt**
+   - Planner dry-run output is no longer promoted into a real dispatch receipt by handler fallback logic.

-3. **Anti-blackhole groundwork**
-   - The repo also lays groundwork for future subagent completion-delivery watchdog logic through receipt/state/failure-mode definitions.
+5. **Fake receipt authority has been tightened**
+   - The continuity gate no longer accepts any arbitrary non-null `dispatchReceipt`
+   - It now requires at least these minimum fields:
+     - `planId`
+     - `currentTask`
+     - `nextDerivedAction`
+     - `dispatchedAt`

-### Note
+6. **Hook integration is now present**
+   - The continuity gate is integrated into `hooks/force-recall/handler.ts`
+   - It currently enters the live flow through the `[APPROVED_PLAN_CONTINUITY_GATE]` injected block

-This is a **focused export repository**, not a full mirror of the original workspace. The intent is to isolate the relevant hardening work so it can be:
+## Current Limitation

-- reviewed more easily
-- iterated independently
-- documented clearly
-- integrated back into the main flow later
+This workstream is now beyond pure documentation and beyond isolated script-level testing, but it still behaves more like a **prompt-level hard-gate integration** than a true engine-level abort mechanism.
+
+## Suggested Next Steps
+
+Two reasonable follow-up directions remain:
+
+1. **push continuity hard-gate further toward stronger runtime enforcement**
+2. **return to anti-blackhole / completion-delivery watchdog recovery closure**
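
The continuity rule stated in the README changes can be read as a small pure function. The sketch below only illustrates the stated conditions and is not the repo's `evaluateContinuity`. The names `receiptLooksValid` and `sketchContinuityVerdict`, the sample payloads, and the passing verdict label `continuity_ok` are assumptions made for illustration. The failure verdict and reason strings are the ones the gate tests in this commit assert on.

```javascript
// Closure states that may legitimately end a reply without a next dispatch (per the README).
const LEGAL_TERMINAL_STATES = new Set(['waiting_user', 'blocked', 'pending_verification']);

// Hypothetical helper: checks the four minimum receipt fields named in the README.
function receiptLooksValid(receipt) {
  if (receipt == null || typeof receipt !== 'object' || Array.isArray(receipt)) return false;
  const nonEmpty = (v) => typeof v === 'string' && v.trim().length > 0;
  const action = receipt.nextDerivedAction;
  return nonEmpty(receipt.planId)
    && nonEmpty(receipt.currentTask)
    && action != null && typeof action === 'object' && !Array.isArray(action)
    && nonEmpty(receipt.dispatchedAt);
}

// Illustrative verdict function; 'continuity_ok' is an assumed label for the passing case.
function sketchContinuityVerdict(payload) {
  const taskComplete = payload?.taskState === 'complete';
  const nextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
  const closedLegally = LEGAL_TERMINAL_STATES.has(payload?.replyClosureState ?? null);
  const failed = taskComplete
    && nextAction != null
    && !receiptLooksValid(payload?.dispatchReceipt)
    && !closedLegally;
  return failed
    ? { ok: false, verdict: 'continuity_failure', reason: 'missing_dispatch_receipt' }
    : { ok: true, verdict: 'continuity_ok' };
}

// Sample payloads (made up for this sketch).
const base = { taskState: 'complete', nextDerivedAction: { type: 'message_subagent', task: 'next' } };
const failing = sketchContinuityVerdict({ ...base, replyClosureState: 'completed', dispatchReceipt: { fake: true } });
const waiting = sketchContinuityVerdict({ ...base, replyClosureState: 'waiting_user', dispatchReceipt: null });
console.log(failing.verdict, waiting.verdict);
```

The second payload passes even without a receipt because `waiting_user` is one of the three closure states the README exempts from the rule.
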
@@ -8,6 +8,7 @@ const execFileAsync = promisify(execFile);
 const LONG_TASK_WRAPPER_TIMEOUT_MS = 8000;
 const LONG_TASK_GATE_LOCK_TIMEOUT_MS = 8000;
 const LONG_TASK_AUTO_CHAIN_PLANNER_TIMEOUT_MS = 8000;
+const APPROVED_PLAN_CONTINUITY_TIMEOUT_MS = 8000;

 type AutoChainPlanResult = {
   plannerStatus: string;
@@ -30,6 +31,14 @@ type GateLockResult = {
   allowedResponseModes?: string[];
 };

+type ApprovedPlanContinuityResult = {
+  ok: boolean;
+  status: string;
+  verdict: string;
+  reason?: string;
+  gate?: string;
+};
+
 function clamp(s: string, max = 1200): string {
   if (!s) return s;
   if (s.length <= max) return s;
@@ -328,6 +337,63 @@ async function runAutoChainPlanner(workspaceDir: string, gateLockResult: GateLoc
   return runJsonScript(plannerPath, workspaceDir, input, LONG_TASK_AUTO_CHAIN_PLANNER_TIMEOUT_MS);
 }

+function buildApprovedPlanContinuityInput(wrapperResult: any, autoChainPlanResult: AutoChainPlanResult | null): Record<string, unknown> | null {
+  if (!wrapperResult || wrapperResult.classification !== "long_task") return null;
+
+  const wrapperNextAction = wrapperResult?.nextDerivedAction ?? wrapperResult?.derivedAction ?? null;
+  const plannerDerivedAction = autoChainPlanResult?.derivedAction && autoChainPlanResult.derivedAction !== "none"
+    ? {
+        type: autoChainPlanResult.dispatchMode ?? "no_dispatch",
+        action: autoChainPlanResult.derivedAction,
+      }
+    : null;
+  const nextDerivedAction = wrapperNextAction ?? plannerDerivedAction;
+
+  if (nextDerivedAction == null) return null;
+
+  const replyClosureState = typeof wrapperResult?.replyClosureState === "string"
+    ? wrapperResult.replyClosureState
+    : (wrapperResult?.handoff?.mode === "button_path" ? "waiting_user" : "completed");
+
+  const dispatchReceipt = wrapperResult?.dispatchReceipt ?? null;
+
+  return {
+    planId: wrapperResult?.planId ?? "hook-preflight-approved-plan",
+    currentTask: wrapperResult?.currentTask ?? wrapperResult?.requiredNextAction ?? "hook-preflight-task",
+    taskState: wrapperResult?.taskState ?? (plannerDerivedAction ? "complete" : null),
+    nextDerivedAction,
+    replyClosureState,
+    dispatchReceipt,
+  };
+}
+
+async function runApprovedPlanContinuityGate(workspaceDir: string, wrapperResult: any, autoChainPlanResult: AutoChainPlanResult | null): Promise<ApprovedPlanContinuityResult | null> {
+  const continuityPath = path.join(workspaceDir, "scripts", "approved_plan_continuity_gate.mjs");
+  const input = buildApprovedPlanContinuityInput(wrapperResult, autoChainPlanResult);
+  if (!input) return null;
+  return runJsonScript(continuityPath, workspaceDir, input, APPROVED_PLAN_CONTINUITY_TIMEOUT_MS);
+}
+
+function buildApprovedPlanContinuityBlock(result: ApprovedPlanContinuityResult | null): string {
+  if (!result) return "";
+
+  const lines = [
+    "[APPROVED_PLAN_CONTINUITY_GATE]",
+    `status=${result.status}`,
+    `verdict=${result.verdict}`,
+  ];
+
+  if (result.reason) lines.push(`reason=${result.reason}`);
+
+  if (result.ok === false) {
+    lines.push("- HARD_GATE: Do not close out this reply as normal completion.");
+    lines.push("- HARD_GATE: Route back to continuity failure until a real next dispatch receipt exists, unless closure state is waiting_user, blocked, or pending_verification.");
+  }
+
+  lines.push("[/APPROVED_PLAN_CONTINUITY_GATE]", "");
+  return lines.join("\n");
+}
+
 function buildAutoChainPlanBlock(planResult: AutoChainPlanResult | null): string {
   if (!planResult) {
     return [
@@ -473,8 +539,11 @@ const forceRecall = async (event: any) => {
   ]);
   const gateLockResult = wrapperResult ? await runLongTaskGateLock(workspaceDir, wrapperResult) : null;
   const autoChainPlanResult = wrapperResult ? await runAutoChainPlanner(workspaceDir, gateLockResult, wrapperResult) : null;
+  const approvedPlanContinuityResult = wrapperResult
+    ? await runApprovedPlanContinuityGate(workspaceDir, wrapperResult, autoChainPlanResult)
+    : null;

-  if (!rulebook && !soul && !wrapperResult && !gateLockResult && !autoChainPlanResult) return;
+  if (!rulebook && !soul && !wrapperResult && !gateLockResult && !autoChainPlanResult && !approvedPlanContinuityResult) return;

   const wrapperBlock = wrapperResult
     ? [
@@ -500,6 +569,7 @@ const forceRecall = async (event: any) => {

   const gateLockBlock = buildGateLockBlock(gateLockResult);
   const autoChainPlanBlock = buildAutoChainPlanBlock(autoChainPlanResult);
+  const approvedPlanContinuityBlock = buildApprovedPlanContinuityBlock(approvedPlanContinuityResult);

   const recallBlock = [
     "[RECALL_GATE] Mandatory recall before ANY technical action/tool use.",
@@ -509,6 +579,7 @@ const forceRecall = async (event: any) => {
     wrapperBlock || null,
     gateLockBlock,
     autoChainPlanBlock,
+    approvedPlanContinuityBlock || null,
     rulebook ? `RULEBOOK (source: ${rulebookPath}):\n${clamp(rulebook, 1200)}` : null,
     soul ? `SOUL (source: ${soulPath}):\n${clamp(soul, 1200)}` : null,
     "[/RECALL_GATE]",
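
To see what the hook actually hands to the model, the following sketch runs a types-stripped copy of `buildApprovedPlanContinuityBlock` from the handler diff on a sample failing result. The sample object is hypothetical: in particular the `status` value `fail` is an assumption, since the gate script's status strings are not shown in this commit.

```javascript
// Types-stripped copy of buildApprovedPlanContinuityBlock from the handler diff.
function buildApprovedPlanContinuityBlock(result) {
  if (!result) return '';

  const lines = [
    '[APPROVED_PLAN_CONTINUITY_GATE]',
    `status=${result.status}`,
    `verdict=${result.verdict}`,
  ];

  if (result.reason) lines.push(`reason=${result.reason}`);

  if (result.ok === false) {
    lines.push('- HARD_GATE: Do not close out this reply as normal completion.');
    lines.push('- HARD_GATE: Route back to continuity failure until a real next dispatch receipt exists, unless closure state is waiting_user, blocked, or pending_verification.');
  }

  lines.push('[/APPROVED_PLAN_CONTINUITY_GATE]', '');
  return lines.join('\n');
}

// Hypothetical gate result; `status: 'fail'` is an assumed value for illustration.
const sample = {
  ok: false,
  status: 'fail',
  verdict: 'continuity_failure',
  reason: 'missing_dispatch_receipt',
};

const block = buildApprovedPlanContinuityBlock(sample);
console.log(block);
```

The result is plain text appended to the recall block, so enforcement depends on the model honoring the `HARD_GATE` lines. That matches the README's description of this change as a prompt-level integration rather than an engine-level abort.
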
@@ -3,6 +3,23 @@ import fs from 'node:fs';

 const LEGAL_TERMINAL_STATES = new Set(['waiting_user', 'blocked', 'pending_verification']);

+function isNonEmptyString(value) {
+  return typeof value === 'string' && value.trim().length > 0;
+}
+
+function isObject(value) {
+  return value != null && typeof value === 'object' && !Array.isArray(value);
+}
+
+function hasValidDispatchReceipt(receipt) {
+  if (!isObject(receipt)) return false;
+  if (!isNonEmptyString(receipt.planId)) return false;
+  if (!isNonEmptyString(receipt.currentTask)) return false;
+  if (!isObject(receipt.nextDerivedAction)) return false;
+  if (!isNonEmptyString(receipt.dispatchedAt)) return false;
+  return true;
+}
+
 function parseArgs(argv) {
   let inputPath = null;
   let compact = false;
@@ -59,7 +76,7 @@ function evaluateContinuity(payload) {
   const taskComplete = payload?.taskState === 'complete';
   const nextAction = payload?.nextDerivedAction ?? payload?.derivedAction ?? null;
   const nextActionKnown = nextAction != null;
-  const hasDispatchReceipt = payload?.dispatchReceipt != null;
+  const hasDispatchReceipt = hasValidDispatchReceipt(payload?.dispatchReceipt ?? null);
   const closureState = payload?.replyClosureState ?? null;
   const isLegalTerminalState = LEGAL_TERMINAL_STATES.has(closureState);

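
A short sketch of how the new receipt check behaves on a few inputs may help when reading the test changes that follow. The three helpers are copied from the diff above. The sample receipts, including the timestamp string, are made up for illustration.

```javascript
// Helpers copied from the continuity gate diff.
function isNonEmptyString(value) {
  return typeof value === 'string' && value.trim().length > 0;
}

function isObject(value) {
  return value != null && typeof value === 'object' && !Array.isArray(value);
}

function hasValidDispatchReceipt(receipt) {
  if (!isObject(receipt)) return false;
  if (!isNonEmptyString(receipt.planId)) return false;
  if (!isNonEmptyString(receipt.currentTask)) return false;
  if (!isObject(receipt.nextDerivedAction)) return false;
  if (!isNonEmptyString(receipt.dispatchedAt)) return false;
  return true;
}

// Hypothetical receipt carrying all four minimum fields.
const complete = {
  planId: 'plan-1',
  currentTask: 'task-1',
  nextDerivedAction: { type: 'message_subagent', task: 'continue' },
  dispatchedAt: '2025-01-01T00:00:00Z',
};
const { nextDerivedAction, ...rest } = complete;

const checks = {
  complete: hasValidDispatchReceipt(complete),
  fakeObject: hasValidDispatchReceipt({ fake: true }),
  // Same data, but the action is stored under the old `derivedAction` key.
  legacyKey: hasValidDispatchReceipt({ ...rest, derivedAction: nextDerivedAction }),
  blankTimestamp: hasValidDispatchReceipt({ ...complete, dispatchedAt: '   ' }),
  array: hasValidDispatchReceipt([complete]),
};
console.log(checks);
```

The `legacyKey` case is why an existing test fixture in this commit renames the receipt's `derivedAction` key to `nextDerivedAction`. The check validates shape only: `dispatchedAt` needs to be a non-empty string and is not parsed as a date, so the function says nothing about whether a dispatch really happened.
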
@@ -149,13 +149,11 @@ const tests = [
         });

         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }

         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }

         if (result.json.ok !== false) {
@@ -170,6 +168,54 @@ stdout=${result.stdout}`);
       }
     },
   },
+  {
+    name: 'continuity: fails when dispatchReceipt is a fake non-null object without minimum receipt fields',
+    run() {
+      const fixture = createFixture({
+        'input.json': {
+          planId: 'plan-fake-dispatch-receipt',
+          currentTask: 'task-6fake',
+          taskState: 'complete',
+          nextDerivedAction: {
+            type: 'message_subagent',
+            task: 'continue with task-7fake',
+          },
+          replyClosureState: 'completed',
+          dispatchReceipt: {
+            fake: true,
+          },
+        },
+      });
+
+      try {
+        const result = runGate({
+          args: ['--compact', '--input', fixture.path('input.json')],
+        });
+
+        if (result.status !== 0 && result.status !== null) {
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
+        }
+
+        if (!result.json || typeof result.json !== 'object') {
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
+        }
+
+        if (result.json.ok !== false) {
+          throw new Error(`expected continuity failure ok=false for fake dispatch receipt, got ${JSON.stringify(result.json)}`);
+        }
+
+        if (result.json.verdict !== 'continuity_failure') {
+          throw new Error(`expected verdict=continuity_failure for fake dispatch receipt, got ${JSON.stringify(result.json.verdict)}`);
+        }
+
+        if (result.json.reason !== 'missing_dispatch_receipt') {
+          throw new Error(`expected reason=missing_dispatch_receipt for fake dispatch receipt, got ${JSON.stringify(result.json.reason)}`);
+        }
+      } finally {
+        fixture.cleanup();
+      }
+    },
+  },

   {
     name: 'continuity: passes when task is complete, next action is known, and a dispatch receipt already exists',
@@ -202,13 +248,11 @@ stdout=${result.stdout}`);
         });

         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }

         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }

         if (result.json.ok !== true) {
@@ -236,7 +280,7 @@ stdout=${result.stdout}`);
           dispatchReceipt: {
             planId: 'plan-derived-action-with-bound-dispatch',
             currentTask: 'task-6c',
-            derivedAction: {
+            nextDerivedAction: {
               type: 'message_subagent',
               task: 'continue with task-7c',
             },
@@ -251,13 +295,11 @@ stdout=${result.stdout}`);
         });

         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }

         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }

         if (result.json.ok !== true) {
@@ -292,13 +334,11 @@ stdout=${result.stdout}`);
         });

         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }

         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }

         if (result.json.ok !== true) {
@@ -333,13 +373,11 @@ stdout=${result.stdout}`);
         });

         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }

         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }

         if (result.json.ok !== true) {
@@ -374,13 +412,11 @@ stdout=${result.stdout}`);
         });

         if (result.status !== 0 && result.status !== null) {
-          throw new Error(`expected controlled execution, got status=${result.status}
-${result.stderr || result.stdout}`);
+          throw new Error(`expected controlled execution, got status=${result.status}\n${result.stderr || result.stdout}`);
         }

         if (!result.json || typeof result.json !== 'object') {
-          throw new Error(`expected JSON output
-stdout=${result.stdout}`);
+          throw new Error(`expected JSON output\nstdout=${result.stdout}`);
         }

         if (result.json.ok !== true) {
489
scripts/test_force_recall_long_task_preflight.mjs
Executable file
@@ -0,0 +1,489 @@
+#!/usr/bin/env node
+import assert from 'node:assert/strict';
+import fs from 'node:fs/promises';
+import os from 'node:os';
+import path from 'node:path';
+import { pathToFileURL } from 'node:url';
+import { execFile as execFileCallback } from 'node:child_process';
+import { promisify } from 'node:util';
+import { stripTypeScriptTypes } from 'node:module';
+
+const __dirname = path.dirname(new URL(import.meta.url).pathname);
+const repoRoot = path.resolve(__dirname, '..');
+const handlerPath = path.join(repoRoot, 'hooks', 'force-recall', 'handler.ts');
+const wrapperPath = path.join(repoRoot, 'scripts', 'long_task_governor_wrapper.mjs');
+const gateLockPath = path.join(repoRoot, 'scripts', 'long_task_gate_lock.mjs');
+const plannerPath = path.join(repoRoot, 'scripts', 'plan_long_task_auto_chain.mjs');
+const continuityGatePath = path.join(repoRoot, 'scripts', 'approved_plan_continuity_gate.mjs');
+const execFileAsync = promisify(execFileCallback);
+
+async function importTsModule(tsPath) {
+  const source = await fs.readFile(tsPath, 'utf8');
+  const jsSource = stripTypeScriptTypes(source, { mode: 'strip' });
+  const dataUrl = `data:text/javascript;charset=utf-8,${encodeURIComponent(jsSource)}\n//# sourceURL=${encodeURIComponent(pathToFileURL(tsPath).href)}`;
+  return import(dataUrl);
+}
+
+function escapeRegex(snippet) {
+  return snippet.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
+async function runScenario(forceRecall, requestText, workspaceDir = repoRoot) {
+  const event = {
+    type: 'message',
+    action: 'preprocessed',
+    context: {
+      workspaceDir,
+      body: requestText,
+      bodyForAgent: requestText,
+    },
+  };
+
+  await forceRecall(event);
+  const injected = event.context?.bodyForAgent;
+  assert.equal(typeof injected, 'string', 'event.context.bodyForAgent should be a string after handler runs');
+  return injected;
+}
+
+async function prepareTempWorkspace() {
+  const tempWorkspace = await fs.mkdtemp(path.join(os.tmpdir(), 'force-recall-workspace-'));
+  await fs.mkdir(path.join(tempWorkspace, 'scripts'), { recursive: true });
+  await fs.mkdir(path.join(tempWorkspace, 'hooks', 'force-recall'), { recursive: true });
+  await fs.mkdir(path.join(tempWorkspace, 'docs'), { recursive: true });
+
+  const copies = [
+    [wrapperPath, path.join(tempWorkspace, 'scripts', 'long_task_governor_wrapper.mjs')],
+    [gateLockPath, path.join(tempWorkspace, 'scripts', 'long_task_gate_lock.mjs')],
+    [plannerPath, path.join(tempWorkspace, 'scripts', 'plan_long_task_auto_chain.mjs')],
+    [continuityGatePath, path.join(tempWorkspace, 'scripts', 'approved_plan_continuity_gate.mjs')],
+    [handlerPath, path.join(tempWorkspace, 'hooks', 'force-recall', 'handler.ts')],
+    [path.join(repoRoot, 'docs', 'RULEBOOK.md'), path.join(tempWorkspace, 'docs', 'RULEBOOK.md')],
+    [path.join(repoRoot, 'SOUL.md'), path.join(tempWorkspace, 'SOUL.md')],
+  ];
+
+  for (const [src, dest] of copies) {
+    await fs.copyFile(src, dest);
+  }
+
+  return tempWorkspace;
+}
+
+async function withPatchedWrapper(tempContent, callback) {
+  const originalWrapper = await fs.readFile(wrapperPath, 'utf8');
+  await fs.writeFile(wrapperPath, tempContent, 'utf8');
+  try {
+    return await callback();
+  } finally {
+    await fs.writeFile(wrapperPath, originalWrapper, 'utf8');
+  }
+}
+
+async function withPatchedWrapperWorkspace(wrapperResult, callback) {
+  const tempWorkspace = await prepareTempWorkspace();
+  const wrapperScriptPath = path.join(tempWorkspace, 'scripts', 'long_task_governor_wrapper.mjs');
+  await fs.writeFile(wrapperScriptPath, buildWrapperScript(wrapperResult), 'utf8');
+
+  if (typeof wrapperResult.externalizedCheckpointPath === 'string' && wrapperResult.externalizedCheckpointPath.trim()) {
+    const checkpointPath = path.join(tempWorkspace, wrapperResult.externalizedCheckpointPath);
+    await fs.mkdir(path.dirname(checkpointPath), { recursive: true });
+    await fs.writeFile(checkpointPath, JSON.stringify({
+      kind: 'long_task_checkpoint',
+      currentStep: 'patched-wrapper-test',
+      nextStep: 'patched-wrapper-test-next',
+      verificationResult: 'checkpoint artifact readable in temp workspace',
+    }, null, 2) + '\n', 'utf8');
+  }
+
+  try {
+    return await callback(tempWorkspace);
+  } finally {
+    await fs.rm(tempWorkspace, { recursive: true, force: true });
+  }
+}
+
+function buildWrapperScript(wrapperResult) {
+  return `#!/usr/bin/env node\nprocess.stdout.write(JSON.stringify(${JSON.stringify(wrapperResult)}, null, 0) + "\\n");\n`;
+}
+
|
async function main() {
|
||||||
|
await Promise.all([fs.access(wrapperPath), fs.access(gateLockPath), fs.access(plannerPath), fs.access(continuityGatePath)]);
|
||||||
|
const { default: forceRecall } = await importTsModule(handlerPath);
|
||||||
|
assert.equal(typeof forceRecall, 'function', 'force-recall handler should export default function');
|
||||||
|
|
||||||
|
const requestText = [
|
||||||
|
'Please inspect the workspace files and verify the hook injection path.',
|
||||||
|
'I need you to review the behavior, choose the final accept/reject decision,',
|
||||||
|
'and continue in background with a follow-up later.',
|
||||||
|
].join(' ');
|
||||||
|
const plannerOnlyRequestText = [
|
||||||
|
'Please inspect the workspace files and verify the hook injection path.',
|
||||||
|
'Summarize the current dry-run planner state for technical inspection only.',
|
||||||
|
].join(' ');
|
||||||
|
|
||||||
|
  const checkpointWorkspace = await prepareTempWorkspace();
  let realWrapperInjected;
  try {
    realWrapperInjected = await runScenario(forceRecall, 'Dispatch a subagent to inspect logs and wait for the result.', checkpointWorkspace);
    const wrapperInputPath = path.join(checkpointWorkspace, 'wrapper-input.json');
    await fs.writeFile(wrapperInputPath, JSON.stringify({
      requestText: 'Dispatch a subagent to inspect logs and wait for the result.',
      hasFilesOrSystems: false,
      needsWaiting: false,
      needsSubagent: false,
      needsOwnerDecision: false,
      canReplyNow: false,
      taskName: 'Hook preflight classification',
      currentStep: 'Classifying request at preprocessed hook',
      nextStep: 'Carry governor recommendation into prompt context',
      nextReportCondition: 'At next meaningful milestone',
      waitingOn: 'none',
      blocker: 'none',
      checkpointTrigger: '',
      externalizedTrigger: '',
      triggerKind: '',
    }), 'utf8');
    const wrapperRaw = await fs.readFile(path.join(checkpointWorkspace, 'scripts', 'long_task_governor_wrapper.mjs'), 'utf8');
    assert.ok(wrapperRaw.length > 0, 'temp workspace should contain wrapper script');
    const { stdout: wrapperStdout } = await execFileAsync('node', [path.join(checkpointWorkspace, 'scripts', 'long_task_governor_wrapper.mjs'), '--compact', '--input', wrapperInputPath], { cwd: checkpointWorkspace, encoding: 'utf8' });
    const wrapperOutput = JSON.parse(wrapperStdout);
    const checkpointPath = path.join(checkpointWorkspace, wrapperOutput.externalizedCheckpointPath);
    const checkpointBody = await fs.readFile(checkpointPath, 'utf8');
    assert.ok(checkpointBody.trim().length > 0, 'real wrapper integration should emit readable checkpoint artifact');
    assert.doesNotMatch(checkpointBody, /Hook preflight classification/, 'real wrapper artifact should not fall back to taskRecord.task_name');
  } finally {
    await fs.rm(checkpointWorkspace, { recursive: true, force: true });
  }
  assert.match(realWrapperInjected, /classification=long_task/, 'real wrapper integration should classify subagent wait as long_task');
  assert.match(realWrapperInjected, /gateStatus=pass/, 'real wrapper integration should pass gate with real progress evidence');
  assert.match(realWrapperInjected, /allowedResponseMode=silent_continuation/, 'real wrapper integration should preserve silent continuation allowance');
  assert.doesNotMatch(realWrapperInjected, /reason=claimed progression without concrete progress evidence is forbidden/, 'real wrapper integration should not fail for missing progress evidence');
  assert.doesNotMatch(realWrapperInjected, /requiredEvidence=progressEvidence/, 'real wrapper integration should not require synthetic progressEvidence repair');
  assert.doesNotMatch(realWrapperInjected, /task_name/, 'real wrapper integration should not leak taskRecord.task_name fallback into gate/preflight text');

  const injected = await runScenario(forceRecall, requestText);

  const expectedSnippets = [
    '[LONG_TASK_GOVERNOR_PREFLIGHT]',
    'classification=long_task',
    'silentLaunchOk=false',
    'handoff.mode=button_path',
    '[LONG_TASK_GATE_LOCK]',
    'gateStatus=fail',
    '[LONG_TASK_AUTO_CHAIN_PLAN]',
    'plannerStatus=blocked_by_gate',
    'derivedAction=none',
    'dispatchMode=no_dispatch',
    'autoChainAllowed=false',
    'reason=gateStatus must pass before auto-chain planning can proceed',
    'requiredEvidence=gateStatus=pass',
    'requiredEvidence=externalizedCheckpoint',
    'requiredEvidence=concreteNextAction',
    'requiredEvidence=buttonPathMode',
    'reason=silent long-task cannot continue without externalized checkpoint path',
    'reason=claimed execution requires evidence of a concrete next action',
    'reason=owner decision flow must end in button-path, not plain text',
    'ENFORCEMENT: Hook input should include progressEvidence (or equivalent concrete fields) whenever a progression claim is present.',
    'HARD_GATE: Block any plain-text handoff or silent-continuation claim when externalized checkpoint evidence is missing.',
    'HARD_GATE: If owner decision is involved, do not replace button-path closure with plain-text handoff.',
    'ENFORCEMENT: Forbidden path: plain-text handoff that pretends the long task is already continuing without an externalized checkpoint.',
    'ENFORCEMENT: Forbidden path: stating you have already entered the next task/step when the record only contains planning language and no concrete execution evidence.',
  ];

  const unexpectedSnippets = [
    'reason=claimed progression without concrete progress evidence is forbidden',
    'requiredEvidence=progressEvidence',
  ];

  for (const snippet of expectedSnippets) {
    assert.match(injected, new RegExp(escapeRegex(snippet)), `missing snippet: ${snippet}`);
  }
  for (const snippet of unexpectedSnippets) {
    assert.doesNotMatch(injected, new RegExp(escapeRegex(snippet)), `unexpected snippet present: ${snippet}`);
  }

  const { evaluateGate } = await import(pathToFileURL(gateLockPath).href + `?t=${Date.now()}`);
  assert.equal(typeof evaluateGate, 'function', 'long_task_gate_lock should export evaluateGate for direct tests');

  // Pass path: matching dispatch evidence plus concrete progressEvidence.
  const passResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'dispatch_follow_up_subagent',
    autoChainNextAction: 'dispatch_follow_up_subagent',
    autoChainDispatchEvidence: {
      action: 'dispatch_follow_up_subagent',
      dispatched: true,
      event: 'dispatch',
    },
    progressionClaim: 'already progressing to the next step in background',
    progressEvidence: { sessionKey: 'task-123' },
  });
  assert.equal(passResult.gateStatus, 'pass', 'pass-path should pass with concrete progressEvidence');

  // Fail path: explicit auto-chain action with no autoChainDispatchEvidence.
  const failResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'dispatch_follow_up_subagent',
    autoChainNextAction: 'dispatch_follow_up_subagent',
    progressionClaim: 'already progressing to the next step in background',
    executionEvidence: { concreteNextAction: 'dispatch_follow_up_subagent' },
  });
  assert.equal(failResult.gateStatus, 'fail', 'fail-path should fail when explicit auto-chain action lacks dispatch evidence');
  assert.match(JSON.stringify(failResult), /autoChainDispatchEvidence/, 'fail-path should require autoChainDispatchEvidence');

  // Neutral path: no autoChainNextAction, so dispatch evidence is not required.
  const neutralResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'summarize findings for reply',
    executionEvidence: { concreteNextAction: 'summarize findings for reply' },
  });
  assert.equal(neutralResult.gateStatus, 'pass', 'neutral-path should pass when there is no explicit auto-chain next action');
  assert.doesNotMatch(JSON.stringify(neutralResult), /autoChainDispatchEvidence/, 'neutral-path should not require auto-chain dispatch evidence');

  // Direct evaluator call: explicit auto-chain action with no evidence fields at all.
  const directAutoChainFailResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'dispatch_follow_up_subagent',
    autoChainNextAction: 'dispatch_follow_up_subagent',
  });
  assert.equal(directAutoChainFailResult.gateStatus, 'fail', 'direct evaluator should fail when explicit auto-chain action has no dispatch evidence');
  assert.match(JSON.stringify(directAutoChainFailResult), /explicit auto-chain next action requires dispatched-action evidence/, 'direct evaluator fail-path should mention missing dispatched-action evidence');

  // Dispatch evidence whose action differs from autoChainNextAction must fail.
  const mismatchedDispatchEvidenceResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'dispatch_follow_up_subagent',
    autoChainNextAction: 'dispatch_follow_up_subagent',
    autoChainDispatchEvidence: {
      action: 'dispatch_other_subagent',
      dispatched: true,
      event: 'dispatch',
    },
  });
  assert.equal(mismatchedDispatchEvidenceResult.gateStatus, 'fail', 'mismatched dispatch evidence should fail');
  assert.match(JSON.stringify(mismatchedDispatchEvidenceResult), /autoChainDispatchEvidence/, 'mismatched dispatch evidence should still require matching autoChainDispatchEvidence');

  // Session/checkpoint metadata alone must not count as dispatch evidence.
  const fakeCheckpointDispatchEvidenceResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'dispatch_follow_up_subagent',
    autoChainNextAction: 'dispatch_follow_up_subagent',
    autoChainDispatchEvidence: {
      sessionKey: 'task-123',
      checkpointPath: 'checkpoints/task-123.json',
    },
  });
  assert.equal(fakeCheckpointDispatchEvidenceResult.gateStatus, 'fail', 'checkpoint/session-only dispatch evidence should fail');
  assert.match(JSON.stringify(fakeCheckpointDispatchEvidenceResult), /explicit auto-chain next action requires dispatched-action evidence/, 'checkpoint/session-only dispatch evidence should be rejected as fake dispatch evidence');

  // A snake_case autoChainNextAction that is not a dispatch must stay neutral.
  const neutralSnakeCaseResult = evaluateGate({
    classification: 'long_task',
    claimedExecution: true,
    concreteNextAction: 'summarize findings for reply',
    autoChainNextAction: 'checkpoint_session_metadata_only',
    executionEvidence: { concreteNextAction: 'summarize findings for reply' },
  });
  assert.equal(neutralSnakeCaseResult.gateStatus, 'pass', 'neutral snake_case non-dispatch action should not trigger dispatch-evidence requirement');
  assert.doesNotMatch(JSON.stringify(neutralSnakeCaseResult), /autoChainDispatchEvidence/, 'neutral snake_case non-dispatch action should not mention dispatch-evidence requirement');

  // Hook pass path: gate and planner pass, but a dry-run dispatch is not a real
  // receipt, so the approved-plan continuity gate must still report failure.
  const passInjected = await withPatchedWrapperWorkspace({
    classification: 'long_task',
    silentCandidate: true,
    needsCheckpoint: true,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: true,
    silentLaunchReason: 'checkpoint established',
    requiredNextAction: 'dispatch_follow_up_subagent',
    autoChainDispatchEvidence: {
      action: 'dispatch_follow_up_subagent',
      dispatched: true,
      event: 'dispatch',
    },
    progressEvidence: { sessionKey: 'task-123' },
    externalizedCheckpointPath: 'checkpoints/task-123.json',
    handoff: { mode: 'direct_reply' },
  }, async (workspaceDir) => runScenario(forceRecall, requestText, workspaceDir));
  assert.match(passInjected, /gateStatus=pass/, 'hook pass-path should pass when wrapper provides concrete progressEvidence');
  assert.match(passInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook pass-path should emit auto-chain plan block');
  assert.match(passInjected, /plannerStatus=pass/, 'hook pass-path should expose planner pass result');
  assert.match(passInjected, /derivedAction=dispatch_spec_review/, 'hook pass-path should derive dry-run spec review dispatch');
  assert.match(passInjected, /dispatchMode=dry_run_dispatch/, 'hook pass-path should stay in dry-run dispatch mode');
  assert.match(passInjected, /autoChainAllowed=true/, 'hook pass-path should allow auto-chain in dry-run planner output');
  assert.match(passInjected, /\[APPROVED_PLAN_CONTINUITY_GATE\]/, 'hook pass-path should emit approved-plan continuity gate block');
  assert.match(passInjected, /status=continuity_failure/, 'hook pass-path should fail continuity when planner only returns dry-run dispatch without a real receipt');
  assert.match(passInjected, /verdict=continuity_failure/, 'hook pass-path should expose continuity failure verdict when no real dispatch receipt exists');
  assert.match(passInjected, /reason=missing_dispatch_receipt/, 'hook pass-path should require a real dispatch receipt instead of treating dry-run dispatch as one');
  assert.match(passInjected, /Route back to continuity failure until a real next dispatch receipt exists/, 'hook pass-path should hard-gate normal closeout until a real receipt exists');
  assert.doesNotMatch(passInjected, /\[APPROVED_PLAN_CONTINUITY_GATE\][\s\S]*status=pass/, 'hook pass-path should not let approved-plan continuity pass on dry-run dispatch alone');

  // Hook fail path: wrapper exposes an explicit auto-chain action without dispatch evidence.
  const failInjected = await withPatchedWrapper(buildWrapperScript({
    classification: 'long_task',
    silentCandidate: false,
    needsCheckpoint: false,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: false,
    requiredNextAction: 'dispatch_follow_up_subagent',
    handoff: { mode: 'direct_reply' },
  }), async () => runScenario(forceRecall, requestText));
  assert.match(failInjected, /gateStatus=fail/, 'hook fail-path should fail when wrapper exposes explicit auto-chain action without dispatch evidence');
  assert.match(failInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook fail-path should emit auto-chain plan block');
  assert.match(failInjected, /plannerStatus=blocked_by_gate/, 'hook fail-path should report planner blocked by gate');
  assert.match(failInjected, /derivedAction=none/, 'hook fail-path should not derive a dry-run action');
  assert.match(failInjected, /dispatchMode=no_dispatch/, 'hook fail-path should remain no-dispatch');
  assert.match(failInjected, /autoChainAllowed=false/, 'hook fail-path should not allow auto-chain');
  assert.match(failInjected, /reason=explicit auto-chain next action requires dispatched-action evidence/, 'hook fail-path should mention missing dispatched-action evidence');
  assert.match(failInjected, /requiredEvidence=autoChainDispatchEvidence/, 'hook fail-path should require autoChainDispatchEvidence');

  // Hook neutral path: the required next action is not an explicit auto-chain action.
  const neutralInjected = await withPatchedWrapper(buildWrapperScript({
    classification: 'long_task',
    silentCandidate: false,
    needsCheckpoint: false,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: false,
    requiredNextAction: 'summarize findings for reply',
    handoff: { mode: 'direct_reply' },
  }), async () => runScenario(forceRecall, requestText));
  assert.match(neutralInjected, /gateStatus=pass/, 'hook neutral-path should pass when wrapper does not expose an explicit auto-chain action');
  assert.match(neutralInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook neutral-path should emit auto-chain plan block');
  assert.match(neutralInjected, /plannerStatus=none/, 'hook neutral-path should report no derived auto-chain action');
  assert.match(neutralInjected, /derivedAction=none/, 'hook neutral-path should keep derivedAction as none');
  assert.match(neutralInjected, /dispatchMode=no_dispatch/, 'hook neutral-path should remain no-dispatch');
  assert.match(neutralInjected, /autoChainAllowed=false/, 'hook neutral-path should keep auto-chain disabled');
  assert.doesNotMatch(neutralInjected, /reason=explicit auto-chain next action requires dispatched-action evidence/, 'hook neutral-path should not fail on auto-chain evidence when no explicit tool action exists');

  // Fake progress evidence: taskRecord.task_name alone must not satisfy the gate.
  const fakeProgressEvidenceInjected = await withPatchedWrapper(buildWrapperScript({
    classification: 'long_task',
    silentCandidate: true,
    needsCheckpoint: true,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: true,
    silentLaunchReason: 'task name exists but no externalized artifact',
    taskRecord: { task_name: 'descriptive-task-name-only' },
    handoff: { mode: 'direct_reply' },
  }), async () => runScenario(forceRecall, requestText));
  assert.match(fakeProgressEvidenceInjected, /gateStatus=fail/, 'hook fake-progress-evidence path should fail when only task_name exists');
  assert.match(fakeProgressEvidenceInjected, /reason=claimed progression without concrete progress evidence is forbidden/, 'hook fake-progress-evidence path should mention missing concrete progress evidence');
  assert.match(fakeProgressEvidenceInjected, /requiredEvidence=progressEvidence/, 'hook fake-progress-evidence path should require progressEvidence');
  assert.match(fakeProgressEvidenceInjected, /reason=silent long-task cannot continue without externalized checkpoint path/, 'hook fake-progress-evidence path should also require real checkpoint evidence');

  // Code-quality review transition without reviewEvidence must be blocked by the planner.
  const specReviewWithoutEvidenceInjected = await withPatchedWrapper(buildWrapperScript({
    classification: 'long_task',
    silentCandidate: false,
    needsCheckpoint: false,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: true,
    requiredNextAction: 'dispatch_code_quality_review',
    autoChainDispatchEvidence: {
      action: 'dispatch_code_quality_review',
      dispatched: true,
      event: 'dispatch',
    },
    progressEvidence: { sessionKey: 'task-spec-review-missing-evidence' },
    externalizedCheckpointPath: 'checkpoints/task-spec-review-missing-evidence.json',
    handoff: { mode: 'direct_reply' },
  }), async () => runScenario(forceRecall, plannerOnlyRequestText));
  assert.match(specReviewWithoutEvidenceInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook spec-review missing-evidence path should emit auto-chain plan block');
  assert.match(specReviewWithoutEvidenceInjected, /plannerStatus=blocked_by_evidence/, 'hook spec-review missing-evidence path should block on missing evidence');
  assert.match(specReviewWithoutEvidenceInjected, /derivedAction=none/, 'hook spec-review missing-evidence path should not derive a dry-run action');
  assert.match(specReviewWithoutEvidenceInjected, /dispatchMode=no_dispatch/, 'hook spec-review missing-evidence path should stay no-dispatch');
  assert.match(specReviewWithoutEvidenceInjected, /autoChainAllowed=false/, 'hook spec-review missing-evidence path should not allow auto-chain');
  assert.match(specReviewWithoutEvidenceInjected, /reason=review pass evidence missing for code quality review transition/, 'hook spec-review missing-evidence path should mention missing review evidence');
  assert.match(specReviewWithoutEvidenceInjected, /requiredEvidence=reviewEvidence/, 'hook spec-review missing-evidence path should require reviewEvidence');

  // Fix-slice transition without blockerEvidence must be blocked by the planner.
  const fixSliceWithoutEvidenceInjected = await withPatchedWrapper(buildWrapperScript({
    classification: 'long_task',
    silentCandidate: false,
    needsCheckpoint: false,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: true,
    silentLaunchReason: 'review blocked by findings',
    requiredNextAction: 'dispatch_fix_slice',
    autoChainDispatchEvidence: {
      action: 'dispatch_fix_slice',
      dispatched: true,
      event: 'dispatch',
    },
    progressEvidence: { sessionKey: 'task-fix-slice-missing-evidence' },
    externalizedCheckpointPath: 'checkpoints/task-fix-slice-missing-evidence.json',
    handoff: { mode: 'direct_reply' },
  }), async () => runScenario(forceRecall, plannerOnlyRequestText));
  assert.match(fixSliceWithoutEvidenceInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook fix-slice missing-evidence path should emit auto-chain plan block');
  assert.match(fixSliceWithoutEvidenceInjected, /plannerStatus=blocked_by_evidence/, 'hook fix-slice missing-evidence path should block on missing evidence');
  assert.match(fixSliceWithoutEvidenceInjected, /derivedAction=none/, 'hook fix-slice missing-evidence path should not derive a dry-run action');
  assert.match(fixSliceWithoutEvidenceInjected, /dispatchMode=no_dispatch/, 'hook fix-slice missing-evidence path should stay no-dispatch');
  assert.match(fixSliceWithoutEvidenceInjected, /autoChainAllowed=false/, 'hook fix-slice missing-evidence path should not allow auto-chain');
  assert.match(fixSliceWithoutEvidenceInjected, /reason=blocker evidence missing for retry\/fix transition/, 'hook fix-slice missing-evidence path should mention missing blocker evidence');
  assert.match(fixSliceWithoutEvidenceInjected, /requiredEvidence=blockerEvidence/, 'hook fix-slice missing-evidence path should require blockerEvidence');

  // Spec-review transition without executionEvidence must be blocked by the planner.
  const specReviewWithoutImplementationEvidenceInjected = await withPatchedWrapper(buildWrapperScript({
    classification: 'long_task',
    silentCandidate: false,
    needsCheckpoint: false,
    needsSubagent: false,
    needsOwnerDecision: false,
    silentLaunchOk: true,
    requiredNextAction: 'dispatch_spec_review',
    autoChainDispatchEvidence: {
      action: 'dispatch_spec_review',
      dispatched: true,
      event: 'dispatch',
    },
    progressEvidence: { sessionKey: 'task-implementation-missing-evidence' },
    externalizedCheckpointPath: 'checkpoints/task-implementation-missing-evidence.json',
    handoff: { mode: 'direct_reply' },
  }), async () => runScenario(forceRecall, plannerOnlyRequestText));
  assert.match(specReviewWithoutImplementationEvidenceInjected, /\[LONG_TASK_AUTO_CHAIN_PLAN\]/, 'hook implementation missing-evidence path should emit auto-chain plan block');
  assert.match(specReviewWithoutImplementationEvidenceInjected, /plannerStatus=blocked_by_evidence/, 'hook implementation missing-evidence path should block on missing evidence');
  assert.match(specReviewWithoutImplementationEvidenceInjected, /derivedAction=none/, 'hook implementation missing-evidence path should not derive a dry-run action');
  assert.match(specReviewWithoutImplementationEvidenceInjected, /dispatchMode=no_dispatch/, 'hook implementation missing-evidence path should stay no-dispatch');
  assert.match(specReviewWithoutImplementationEvidenceInjected, /autoChainAllowed=false/, 'hook implementation missing-evidence path should not allow auto-chain');
  assert.match(specReviewWithoutImplementationEvidenceInjected, /reason=implementation evidence missing for review-required next action/, 'hook implementation missing-evidence path should mention missing implementation evidence');
  assert.match(specReviewWithoutImplementationEvidenceInjected, /requiredEvidence=executionEvidence/, 'hook implementation missing-evidence path should require executionEvidence');

  // Back up the gate-lock script, then replace it with a stub that always exits 1.
  const originalGateLock = await fs.readFile(gateLockPath, 'utf8');
  const tempDir = await fs.mkdtemp(path.join(os.tmpdir(), 'force-recall-gate-lock-'));
  const backupPath = path.join(tempDir, path.basename(gateLockPath));
  await fs.writeFile(backupPath, originalGateLock, 'utf8');
  await fs.writeFile(gateLockPath, '#!/usr/bin/env node\nprocess.exit(1);\n', 'utf8');

  // Run the scenario against the broken evaluator; always restore the original script.
  let degradedInjected;
  try {
    degradedInjected = await runScenario(forceRecall, requestText);
  } finally {
    const backup = await fs.readFile(backupPath, 'utf8');
    await fs.writeFile(gateLockPath, backup, 'utf8');
    await fs.rm(tempDir, { recursive: true, force: true });
  }

  const degradedExpectedSnippets = [
    '[LONG_TASK_GATE_LOCK]',
    'gateStatus=degraded',
    'gateRequired=unknown',
    'HARD_GATE: Evaluator unavailable is not permission to claim silent continuation or next-task progression without verifiable progress evidence.',
    'HARD_GATE: Fall back to a non-silent, evidence-preserving follow-up if you cannot prove checkpoint state or concrete execution.',
  ];

  for (const snippet of degradedExpectedSnippets) {
    assert.match(degradedInjected, new RegExp(escapeRegex(snippet)), `missing degraded snippet: ${snippet}`);
  }

  process.stdout.write(JSON.stringify({
    ok: true,
    gatePaths: {
      pass: passResult.gateStatus,
      fail: failResult.gateStatus,
      neutral: neutralResult.gateStatus,
    },
    bodyPreview: injected.split('\n').slice(0, 35),
  }, null, 2) + '\n');
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});