approved-plan-continuity-ha…/docs/plans/2026-04-24-subagent-anti-blackhole-watchdog.md

Subagent Anti-Blackhole / Completion-Delivery Watchdog Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Prevent B-class fake timeouts where a subagent finishes, stalls, or loses its return path off-thread and the main conversation never receives a trustworthy completion update.

Architecture: Build this in very small layers: first define receipts and states, then pin the blackhole cases with fail-first tests, then implement deterministic receipt-state logic, then add done-but-not-forwarded recovery decisions, then add owner-visible reporting rules and scenario simulations. Keep all early slices file-backed and test-driven before touching any live-session integration.

Tech Stack: Node.js, MJS test runners, file-backed JSON state, OpenClaw subagent/session concepts, docs/runbooks


Task 1: Define dispatch receipt fields

Files:

  • Modify: docs/runbooks/subagent-anti-blackhole.md

Step 1: Write the receipt field list

  • Define only dispatch fields:
    • runId
    • childSessionKey
    • dispatchAt
    • expectedBy
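
The four dispatch fields above could be sketched as a small builder. This is only an illustration: the helper name makeDispatchReceipt, the slaMs parameter, and the use of ISO 8601 strings for timestamps are assumptions, not something the plan specifies.

```javascript
// Hypothetical helper: builds a dispatch receipt with the four fields from Step 1.
// Assumptions: timestamps are ISO 8601 strings; expectedBy = dispatchAt + slaMs.
function makeDispatchReceipt({ runId, childSessionKey, now, slaMs }) {
  if (!runId || !childSessionKey) {
    throw new Error("runId and childSessionKey are required");
  }
  const dispatchMs = now.getTime();
  return {
    runId,
    childSessionKey,
    dispatchAt: new Date(dispatchMs).toISOString(),
    expectedBy: new Date(dispatchMs + slaMs).toISOString(),
  };
}

const receipt = makeDispatchReceipt({
  runId: "run-001",
  childSessionKey: "child-abc",
  now: new Date("2026-04-24T10:00:00.000Z"),
  slaMs: 5 * 60 * 1000, // illustrative 5-minute SLA
});
console.log(JSON.stringify(receipt, null, 2));
```

Passing the clock in as now, rather than calling Date.now() inside the helper, keeps later tests deterministic.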

Step 2: Verify file contains the new field names

Run: grep -n "runId\|childSessionKey\|dispatchAt\|expectedBy" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found

Step 3: Commit

git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define subagent dispatch receipt fields"

Task 2: Define completion receipt fields

Files:

  • Modify: docs/runbooks/subagent-anti-blackhole.md

Step 1: Write the completion field list

  • Define only completion fields:
    • completionReceivedAt
    • forwardedToMain
    • resultSource
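
As a rough sketch, the completion fields above could be merged onto an existing dispatch record like this. The function name applyCompletionReceipt and the two resultSource values are hypothetical; the plan names the field but does not enumerate its values.

```javascript
// Hypothetical resultSource values; the plan does not list the allowed ones.
const RESULT_SOURCES = ["completion_event", "history_fetch"];

// Returns a new record with only the three completion fields added or updated.
function applyCompletionReceipt(record, { receivedAt, forwardedToMain, resultSource }) {
  if (!RESULT_SOURCES.includes(resultSource)) {
    throw new Error(`unknown resultSource: ${resultSource}`);
  }
  return {
    ...record,
    completionReceivedAt: receivedAt.toISOString(),
    forwardedToMain: Boolean(forwardedToMain),
    resultSource,
  };
}

const dispatched = { runId: "run-001", childSessionKey: "child-abc" };
const completedRecord = applyCompletionReceipt(dispatched, {
  receivedAt: new Date("2026-04-24T10:03:00.000Z"),
  forwardedToMain: true,
  resultSource: "completion_event",
});
console.log(completedRecord);
```

Returning a copy instead of mutating the input keeps the dispatch fields untouched, which matches the "define only completion fields" scope of this task.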

Step 2: Verify file contains the new field names

Run: grep -n "completionReceivedAt\|forwardedToMain\|resultSource" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found

Step 3: Commit

git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define subagent completion receipt fields"

Task 3: Define watchdog statuses

Files:

  • Modify: docs/runbooks/subagent-anti-blackhole.md

Step 1: Add the status enum

  • Define:
    • active
    • suspect_delivery_failure
    • done_but_not_forwarded
    • completed
    • recovered
    • blocked
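
One possible way to pin the six statuses in code is a frozen object plus a membership check. The constant and function names here are illustrative assumptions; only the status strings come from the list above.

```javascript
// The six status strings from Step 1; the identifier names are hypothetical.
const WatchdogStatus = Object.freeze({
  ACTIVE: "active",
  SUSPECT_DELIVERY_FAILURE: "suspect_delivery_failure",
  DONE_BUT_NOT_FORWARDED: "done_but_not_forwarded",
  COMPLETED: "completed",
  RECOVERED: "recovered",
  BLOCKED: "blocked",
});

const VALID_STATUSES = new Set(Object.values(WatchdogStatus));

// True only for one of the six known status strings.
function isWatchdogStatus(value) {
  return VALID_STATUSES.has(value);
}

console.log(isWatchdogStatus("recovered"));
console.log(isWatchdogStatus("timeout"));
```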

Step 2: Verify status names exist

Run: grep -n "suspect_delivery_failure\|done_but_not_forwarded\|recovered" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found

Step 3: Commit

git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define subagent watchdog statuses"

Task 4: Define B-class failure modes

Files:

  • Modify: docs/runbooks/subagent-anti-blackhole.md

Step 1: Write the failure mode bullets

  • Add:
    • done but not forwarded
    • no completion event received
    • session exists but no result bounce
    • unclear slow-run vs delivery failure

Step 2: Verify phrases exist

Run: grep -n "done but not forwarded\|completion event\|result bounce\|delivery failure" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found

Step 3: Commit

git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define B-class subagent failure modes"

Task 5: Create watchdog script skeleton

Files:

  • Create: scripts/subagent_delivery_watchdog.mjs

Step 1: Create the script shell

  • Add CLI parsing and a placeholder JSON response.
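
A minimal sketch of what the skeleton might contain, assuming only the two flags used in the verification command (--compact and --input). The function names and the shape of the placeholder response are assumptions.

```javascript
// Parses the two flags the plan's Step 2 command uses; anything else is rejected.
function parseCliArgs(argv) {
  const options = { compact: false, input: null };
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    if (arg === "--compact") {
      options.compact = true;
    } else if (arg === "--input") {
      const value = argv[i + 1];
      if (value === undefined) throw new Error("--input requires a path");
      options.input = value;
      i += 1; // skip the consumed path
    } else {
      throw new Error(`unknown argument: ${arg}`);
    }
  }
  return options;
}

// Placeholder response (shape is hypothetical) until real logic lands in later tasks.
function renderPlaceholder(options) {
  const response = { ok: true, status: "placeholder", input: options.input };
  return options.compact
    ? JSON.stringify(response)
    : JSON.stringify(response, null, 2);
}

// In the real script this would be parseCliArgs(process.argv.slice(2)).
const cliOptions = parseCliArgs(["--compact", "--input", "/dev/null"]);
console.log(renderPlaceholder(cliOptions));
```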

Step 2: Verify it runs

Run: node scripts/subagent_delivery_watchdog.mjs --compact --input /dev/null || true
Expected: the script exists and runs well enough to support the next test tasks

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs
git commit -m "chore: add subagent delivery watchdog skeleton"

Task 6: Create watchdog test skeleton

Files:

  • Create: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Create the test shell

  • Add basic harness structure and fixture runner.
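
The harness could be as small as a function that runs named cases against whatever function is under test. The case shape ({ name, input, expected }) and the runCases name are assumptions for illustration; the stub engine stands in for watchdog logic that does not exist yet.

```javascript
// Runs each case through `evaluate` and records pass/fail without aborting early.
function runCases(cases, evaluate) {
  const results = cases.map((testCase) => {
    let actual;
    let error = null;
    try {
      actual = evaluate(testCase.input);
    } catch (err) {
      error = err.message;
    }
    const pass = error === null && actual === testCase.expected;
    return { name: testCase.name, pass, actual, error };
  });
  const failed = results.filter((result) => !result.pass).length;
  return { total: results.length, failed, results };
}

// Stub for the not-yet-written watchdog logic, so the first run fails as intended.
const notImplemented = () => {
  throw new Error("not implemented");
};

const summary = runCases(
  [{ name: "active before SLA", input: {}, expected: "active" }],
  notImplemented,
);
console.log(`${summary.total - summary.failed}/${summary.total} passed`);
// A real runner would likely also set process.exitCode = summary.failed > 0 ? 1 : 0.
```

Collecting failures rather than throwing on the first one keeps the fail-first steps in Tasks 7-9 readable: each new test shows up as one additional failing line.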

Step 2: Verify test file executes

Run: node scripts/test_subagent_delivery_watchdog.mjs || true
Expected: test runner executes, even if failing

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: add subagent watchdog test skeleton"

Task 7: Add active-before-SLA test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write the test

  • dispatch exists
  • no completion receipt yet
  • current time still before SLA
  • expect active

Step 2: Run test to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on missing logic

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: require active status before SLA breach"

Task 8: Add suspect-delivery-failure test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write the test

  • dispatch exists
  • no completion receipt
  • current time beyond SLA
  • expect suspect_delivery_failure

Step 2: Run test to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on new assertion

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: detect suspected delivery failure after SLA"

Task 9: Add completed-status test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write the test

  • dispatch exists
  • completion receipt exists
  • expect completed

Step 2: Run test to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on completed path

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: close watchdog on completion receipt"

Task 10: Add state shape fixture

Files:

  • Create: state/subagent-delivery-watchdog/README.md
  • Create: state/subagent-delivery-watchdog/.gitkeep

Step 1: Define the state JSON shape in README

  • Include receipt fields and status fields.
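
For illustration, a state record combining the receipt fields from Tasks 1-2 with a status from Task 3 might look like the object below, with a small check for missing fields. Using null for not-yet-known completion values and the helper name missingStateFields are assumptions, not part of the plan.

```javascript
// Receipt fields (Tasks 1-2) plus status (Task 3).
const REQUIRED_FIELDS = [
  "runId",
  "childSessionKey",
  "dispatchAt",
  "expectedBy",
  "completionReceivedAt",
  "forwardedToMain",
  "resultSource",
  "status",
];

// Example record right after dispatch; null marks values not known yet (assumption).
const exampleState = {
  runId: "run-001",
  childSessionKey: "child-abc",
  dispatchAt: "2026-04-24T10:00:00.000Z",
  expectedBy: "2026-04-24T10:05:00.000Z",
  completionReceivedAt: null,
  forwardedToMain: false,
  resultSource: null,
  status: "active",
};

// Returns the names of required fields that are absent from a state record.
function missingStateFields(state) {
  return REQUIRED_FIELDS.filter((field) => !(field in state));
}

// Round-trip through JSON to mimic reading the file-backed state.
const reloaded = JSON.parse(JSON.stringify(exampleState));
console.log(missingStateFields(reloaded));
```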

Step 2: Verify files exist

Run: test -f state/subagent-delivery-watchdog/README.md && test -f state/subagent-delivery-watchdog/.gitkeep && echo OK
Expected: OK

Step 3: Commit

git add state/subagent-delivery-watchdog/README.md state/subagent-delivery-watchdog/.gitkeep
git commit -m "docs: define watchdog state storage shape"

Task 11: Implement dispatch receipt write

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add a function to write dispatch receipt state

  • Only handle a new dispatch record.

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: some tests still fail, but dispatch state path exists

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs
git commit -m "feat: write subagent dispatch receipt state"

Task 12: Implement completion receipt write

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add a function to write completion receipt state

  • Only update completion-related fields.

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: some tests still fail, but completion data path exists

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs
git commit -m "feat: write subagent completion receipt state"

Task 13: Implement status recompute for active/completed/suspect

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add status recompute logic

  • Implement only:
    • active
    • suspect_delivery_failure
    • completed
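
A minimal sketch of the three-status recompute, under the assumptions that expectedBy is an ISO 8601 string and that a run sitting exactly at its deadline still counts as active. The function name recomputeStatus is hypothetical.

```javascript
// Deterministic recompute for the three basic statuses only.
// Assumption: "beyond SLA" means strictly after expectedBy.
function recomputeStatus(record, now) {
  if (record.completionReceivedAt) return "completed";
  const deadlineMs = Date.parse(record.expectedBy);
  if (Number.isNaN(deadlineMs)) {
    throw new Error("expectedBy is not a valid timestamp");
  }
  return now.getTime() > deadlineMs ? "suspect_delivery_failure" : "active";
}

const pending = { runId: "run-001", expectedBy: "2026-04-24T10:05:00.000Z" };
console.log(recomputeStatus(pending, new Date("2026-04-24T10:04:00.000Z")));
console.log(recomputeStatus(pending, new Date("2026-04-24T10:06:00.000Z")));
console.log(
  recomputeStatus(
    { ...pending, completionReceivedAt: "2026-04-24T10:03:00.000Z" },
    new Date("2026-04-24T10:06:00.000Z"),
  ),
);
```

Checking the completion receipt first means a late-arriving receipt still closes the watchdog as completed, which is what the Task 9 test expects.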

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: the tests from Tasks 7-9 pass

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: recompute basic watchdog statuses"

Task 14: Add done-but-not-forwarded test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write the test

  • child run marked done
  • no completion receipt in main thread
  • expect done_but_not_forwarded

Step 2: Run tests to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on new assertion

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: detect done but not forwarded state"

Task 15: Implement done-but-not-forwarded state

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add done-but-not-forwarded detection

  • Use child-done signal + missing completion receipt.
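
The detection rule above could be sketched as follows. The childDone field is a hypothetical name for the child-done signal, and checking it before the SLA comparison is an assumed precedence; the plan does not say which wins when both apply.

```javascript
// Assumption: `childDone` is how the child-done signal is stored on the record.
function recomputeWithChildSignal(record, now) {
  if (record.completionReceivedAt) return "completed";
  // Child finished but the main thread holds no completion receipt.
  if (record.childDone === true) return "done_but_not_forwarded";
  const overdue = now.getTime() > Date.parse(record.expectedBy);
  return overdue ? "suspect_delivery_failure" : "active";
}

const stuck = {
  runId: "run-001",
  expectedBy: "2026-04-24T10:05:00.000Z",
  childDone: true,
  completionReceivedAt: null,
};
console.log(recomputeWithChildSignal(stuck, new Date("2026-04-24T10:02:00.000Z")));
```

With this ordering, a finished child is flagged immediately rather than only after the SLA expires, so the recovery path can start earlier.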

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: done-but-not-forwarded test passes

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: detect done without forwarded completion"

Task 16: Add first recovery-action test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write fetch-history recovery test

  • done but not forwarded
  • no prior recovery action
  • expect recovery decision fetch_history

Step 2: Run tests to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on recovery decision

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: fetch history after missing forwarded completion"

Task 17: Implement fetch-history recovery decision

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add minimal recovery decision logic

  • Return fetch_history for first-time done-but-not-forwarded.

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: fetch-history recovery test passes

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: recover with history fetch first"

Task 18: Add respawn-escalation test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write the respawn test

  • recovery already attempted once
  • still no forwarded completion
  • expect respawn

Step 2: Run tests to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on respawn decision

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: escalate to respawn after failed recovery"

Task 19: Implement respawn decision

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add respawn logic

  • Return respawn when fetch-history path did not recover delivery.

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: respawn test passes

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: respawn after failed delivery recovery"

Task 20: Add blocked-escalation test

Files:

  • Modify: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Write the blocked test

  • repeated recovery failure
  • expect blocked plus owner-visible reporting requirement

Step 2: Run tests to verify it fails

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on blocked escalation

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: escalate repeated delivery failures to blocked"

Task 21: Implement blocked escalation

Files:

  • Modify: scripts/subagent_delivery_watchdog.mjs

Step 1: Add blocked escalation logic

  • repeated recovery failure -> blocked
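
Taken together, Tasks 17, 19, and 21 describe an escalation ladder: fetch_history first, then respawn, then blocked. One way to sketch it, assuming failed attempts are tracked in a hypothetical recoveryAttempts array on the record:

```javascript
// Recovery actions in the order the plan prefers them.
const RECOVERY_LADDER = ["fetch_history", "respawn"];

// Assumption: `recoveryAttempts` lists actions already tried without success.
function decideRecovery(record) {
  if (record.status !== "done_but_not_forwarded") {
    return { action: null, status: record.status, ownerReportRequired: false };
  }
  const attempts = record.recoveryAttempts || [];
  const next = RECOVERY_LADDER.find((action) => !attempts.includes(action));
  if (next) {
    // Done-but-not-forwarded is itself owner-visible per the reporting tasks.
    return { action: next, status: record.status, ownerReportRequired: true };
  }
  // Every rung has failed: stop retrying and escalate.
  return { action: null, status: "blocked", ownerReportRequired: true };
}

const base = { runId: "run-001", status: "done_but_not_forwarded" };
console.log(decideRecovery(base));
console.log(decideRecovery({ ...base, recoveryAttempts: ["fetch_history"] }));
console.log(decideRecovery({ ...base, recoveryAttempts: ["fetch_history", "respawn"] }));
```

Keeping the ladder as data makes the ordering rule (history fetch before respawn) something a reviewer can check at a glance.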

Step 2: Run tests

Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: blocked escalation test passes

Step 3: Commit

git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: block repeated subagent delivery failures"

Task 22: Add owner-visible reporting rule for suspect state

Files:

  • Modify: WORKFLOW.md
  • Modify: AGENTS.md
  • Modify: docs/runbooks/subagent-anti-blackhole.md

Step 1: Add suspect-state reporting rule

  • If SLA is crossed with no completion receipt, the owner must be informed.

Step 2: Verify text exists

Run: grep -RIn "SLA\|suspect_delivery_failure" WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found

Step 3: Commit

git add WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: require reporting on suspect delivery failure"

Task 23: Add owner-visible reporting rule for done-but-not-forwarded

Files:

  • Modify: WORKFLOW.md
  • Modify: AGENTS.md
  • Modify: docs/runbooks/subagent-anti-blackhole.md

Step 1: Add done-but-not-forwarded reporting rule

  • Must state that result exists but did not bounce back.
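
If the reporting rules from Tasks 22 and 23 were ever enforced in code rather than only in docs, a message builder might look like this sketch. The function name ownerReportFor and the exact wording are assumptions; only the two triggering statuses and the "did not bounce back" phrasing come from the plan.

```javascript
// Returns an owner-visible message for the two statuses that require one, else null.
function ownerReportFor(record) {
  switch (record.status) {
    case "suspect_delivery_failure":
      return `Subagent run ${record.runId} crossed its SLA (${record.expectedBy}) with no completion receipt.`;
    case "done_but_not_forwarded":
      return `Subagent run ${record.runId}: result exists but did not bounce back to the main conversation.`;
    default:
      return null;
  }
}

console.log(
  ownerReportFor({
    runId: "run-001",
    status: "done_but_not_forwarded",
    expectedBy: "2026-04-24T10:05:00.000Z",
  }),
);
```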

Step 2: Verify text exists

Run: grep -RIn "done but not forwarded\|did not bounce back" WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found

Step 3: Commit

git add WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: require reporting on missing forwarded completion"

Task 24: Add rule to fetch history before respawn

Files:

  • Modify: WORKFLOW.md
  • Modify: docs/runbooks/subagent-delivery-recovery.md

Step 1: Add the history-first rule

  • Done-but-not-forwarded should prefer fetch_history before respawn.

Step 2: Verify text exists

Run: grep -RIn "fetch_history\|before respawn" WORKFLOW.md docs/runbooks/subagent-delivery-recovery.md
Expected: matching lines found

Step 3: Commit

git add WORKFLOW.md docs/runbooks/subagent-delivery-recovery.md
git commit -m "docs: prefer history fetch before respawn"

Task 25: Add no-silent-waiting-after-SLA rule

Files:

  • Modify: WORKFLOW.md
  • Modify: AGENTS.md

Step 1: Add the no-silent-waiting rule

  • Once SLA is crossed, silent waiting is forbidden.

Step 2: Verify text exists

Run: grep -RIn "silent waiting\|SLA" WORKFLOW.md AGENTS.md
Expected: matching lines found

Step 3: Commit

git add WORKFLOW.md AGENTS.md
git commit -m "docs: forbid silent waiting after subagent SLA"

Task 26: Create blackhole scenario test shell

Files:

  • Create: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Create the scenario test shell

  • Add empty scenario harness.
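
A scenario harness could replay a list of events through an engine function and compare the resulting status trace with an expected one. Everything named here (runScenario, the step shape, the stub engine) is a hypothetical illustration; the real engine would be the watchdog logic from the earlier tasks.

```javascript
// Replays steps through `engine(state, step)` and records the status after each one.
function runScenario(scenario, engine) {
  let state = { ...scenario.initial };
  const trace = [];
  for (const step of scenario.steps) {
    state = engine(state, step);
    trace.push(state.status);
  }
  const pass = JSON.stringify(trace) === JSON.stringify(scenario.expectedTrace);
  return { name: scenario.name, trace, pass };
}

// Stub engine so the harness can run before the real watchdog is wired in.
const stubEngine = (state, step) =>
  step.type === "completion_receipt"
    ? { ...state, status: "completed" }
    : { ...state, status: "active" };

const scenarioResult = runScenario(
  {
    name: "normal completion",
    initial: { status: null },
    steps: [{ type: "dispatch" }, { type: "completion_receipt" }],
    expectedTrace: ["active", "completed"],
  },
  stubEngine,
);
console.log(scenarioResult);
```

Passing the engine in as a parameter lets the scenarios in Tasks 27-31 be written first and pointed at the real logic later.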

Step 2: Verify file runs

Run: node scripts/test_subagent_blackhole_scenarios.mjs || true
Expected: file executes, even if not complete

Step 3: Commit

git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add subagent blackhole scenario harness"

Task 27: Add normal-completion scenario

Files:

  • Modify: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Write the scenario

  • dispatch -> completion receipt -> completed

Step 2: Run tests

Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: the scenario may still fail until engine wiring is ready

Step 3: Commit

git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add normal subagent completion scenario"

Task 28: Add slow-but-active scenario

Files:

  • Modify: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Write the scenario

  • dispatch before SLA -> active

Step 2: Run tests

Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured

Step 3: Commit

git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add slow but active subagent scenario"

Task 29: Add done-but-not-forwarded scenario

Files:

  • Modify: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Write the scenario

  • child done -> no completion receipt -> fetch_history

Step 2: Run tests

Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured

Step 3: Commit

git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add done but not forwarded scenario"

Task 30: Add missing-completion-event scenario

Files:

  • Modify: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Write the scenario

  • no bounce, no completion receipt, beyond SLA -> suspect delivery failure

Step 2: Run tests

Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured

Step 3: Commit

git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add missing completion event scenario"

Task 31: Add repeated-failure escalation scenario

Files:

  • Modify: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Write the scenario

  • fetch_history fails -> respawn fails -> blocked

Step 2: Run tests

Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured

Step 3: Commit

git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add repeated blackhole escalation scenario"

Task 32: Run the full local watchdog test set

Files:

  • Modify if needed: scripts/test_subagent_delivery_watchdog.mjs
  • Modify if needed: scripts/test_subagent_blackhole_scenarios.mjs

Step 1: Run the combined tests

Run:

  • node scripts/test_subagent_delivery_watchdog.mjs
  • node scripts/test_subagent_blackhole_scenarios.mjs

Expected: PASS

Step 2: Fix only the minimal wiring needed for all tests to pass

  • Keep changes scoped to watchdog logic/tests.

Step 3: Commit

git add scripts/test_subagent_delivery_watchdog.mjs scripts/test_subagent_blackhole_scenarios.mjs scripts/subagent_delivery_watchdog.mjs
git commit -m "test: pass full subagent blackhole watchdog suite"

Task 33: Peer review watchdog state logic

Files:

  • Review: scripts/subagent_delivery_watchdog.mjs
  • Review: scripts/test_subagent_delivery_watchdog.mjs

Step 1: Request reviewer focus on receipt state logic

  • Verify statuses and transitions match B-class failure goals.

Step 2: Record reviewer verdict

  • Include commands and findings.

Step 3: Commit any follow-up fixes if needed

# only if reviewer requests changes
git add <changed-files>
git commit -m "fix: address watchdog state review feedback"

Task 34: Peer review recovery decisions

Files:

  • Review: scripts/subagent_delivery_watchdog.mjs
  • Review: docs/runbooks/subagent-delivery-recovery.md

Step 1: Request reviewer focus on recovery ordering

  • Verify fetch-history before respawn and blocked escalation.

Step 2: Record reviewer verdict

  • Include commands and findings.

Step 3: Commit any follow-up fixes if needed

# only if reviewer requests changes
git add <changed-files>
git commit -m "fix: address recovery decision review feedback"

Task 35: Peer review scenario coverage and handoff

Files:

  • Review: scripts/test_subagent_blackhole_scenarios.mjs
  • Review: docs/runbooks/subagent-anti-blackhole.md
  • Review: docs/runbooks/subagent-delivery-recovery.md

Step 1: Request reviewer focus on blackhole realism

  • Confirm this targets fake timeout / no-bounce cases, not just slow work.

Step 2: Record verification output

  • Include exact commands and reviewer verdict.

Step 3: Final state

  • Leave task in pending_verification; do not mark complete.