Subagent Anti-Blackhole / Completion-Delivery Watchdog Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Prevent B-class fake timeouts where a subagent finishes, stalls, or loses its return path off-thread and the main conversation never receives a trustworthy completion update.
Architecture: Build this in very small layers: first define receipts and states, then pin the blackhole cases with fail-first tests, then implement deterministic receipt-state logic, then add done-but-not-forwarded recovery decisions, then add owner-visible reporting rules and scenario simulations. Keep all early slices file-backed and test-driven before touching any live-session integration.
Tech Stack: Node.js, MJS test runners, file-backed JSON state, OpenClaw subagent/session concepts, docs/runbooks
Task 1: Define dispatch receipt fields
Files:
- Modify:
docs/runbooks/subagent-anti-blackhole.md
Step 1: Write the receipt field list
- Define only dispatch fields:
runId, childSessionKey, dispatchAt, expectedBy
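A minimal sketch of what a dispatch receipt with these four fields could look like. The helper name createDispatchReceipt, the slaMs parameter, and the choice of ISO-8601 strings for timestamps are assumptions for illustration; the plan itself only names the fields.

```javascript
// Hypothetical helper: builds a dispatch receipt holding only the four dispatch fields.
// Assumptions: timestamps are ISO-8601 strings; expectedBy is dispatchAt plus an SLA in ms.
function createDispatchReceipt({ runId, childSessionKey, dispatchAt, slaMs }) {
  if (!runId || !childSessionKey) {
    throw new Error("runId and childSessionKey are required");
  }
  const dispatchedMs = Date.parse(dispatchAt);
  if (Number.isNaN(dispatchedMs)) {
    throw new Error("dispatchAt must be a parseable timestamp");
  }
  return {
    runId,
    childSessionKey,
    dispatchAt: new Date(dispatchedMs).toISOString(),
    expectedBy: new Date(dispatchedMs + slaMs).toISOString(),
  };
}

const receipt = createDispatchReceipt({
  runId: "run-001",
  childSessionKey: "child-abc",
  dispatchAt: "2025-01-01T00:00:00.000Z",
  slaMs: 10 * 60 * 1000, // ten-minute SLA, chosen only for the example
});
console.log(JSON.stringify(receipt, null, 2));
```

Deriving expectedBy at dispatch time keeps the later SLA check a plain timestamp comparison.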
Step 2: Verify file contains the new field names
Run: grep -n "runId\|childSessionKey\|dispatchAt\|expectedBy" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found
Step 3: Commit
git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define subagent dispatch receipt fields"
Task 2: Define completion receipt fields
Files:
- Modify:
docs/runbooks/subagent-anti-blackhole.md
Step 1: Write the completion field list
- Define only completion fields:
completionReceivedAt, forwardedToMain, resultSource
Step 2: Verify file contains the new field names
Run: grep -n "completionReceivedAt\|forwardedToMain\|resultSource" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found
Step 3: Commit
git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define subagent completion receipt fields"
Task 3: Define watchdog statuses
Files:
- Modify:
docs/runbooks/subagent-anti-blackhole.md
Step 1: Add the status enum
- Define:
active, suspect_delivery_failure, done_but_not_forwarded, completed, recovered, blocked
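One way the status enum might be represented in the watchdog script is sketched below. The names WatchdogStatus and isWatchdogStatus are hypothetical; only the six status strings come from this plan.

```javascript
// The six watchdog statuses named in this task, frozen so they cannot be mutated.
const WatchdogStatus = Object.freeze({
  ACTIVE: "active",
  SUSPECT_DELIVERY_FAILURE: "suspect_delivery_failure",
  DONE_BUT_NOT_FORWARDED: "done_but_not_forwarded",
  COMPLETED: "completed",
  RECOVERED: "recovered",
  BLOCKED: "blocked",
});

// Hypothetical guard for validating status values read back from file-backed state.
function isWatchdogStatus(value) {
  return Object.values(WatchdogStatus).includes(value);
}

console.log(isWatchdogStatus("recovered")); // true
console.log(isWatchdogStatus("timed_out")); // false
```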
Step 2: Verify status names exist
Run: grep -n "suspect_delivery_failure\|done_but_not_forwarded\|recovered" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found
Step 3: Commit
git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define subagent watchdog statuses"
Task 4: Define B-class failure modes
Files:
- Modify:
docs/runbooks/subagent-anti-blackhole.md
Step 1: Write the failure mode bullets
- Add:
- done but not forwarded
- no completion event received
- session exists but no result bounce
- unclear slow-run vs delivery failure
Step 2: Verify phrases exist
Run: grep -n "done but not forwarded\|completion event\|result bounce\|delivery failure" docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found
Step 3: Commit
git add docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: define B-class subagent failure modes"
Task 5: Create watchdog script skeleton
Files:
- Create:
scripts/subagent_delivery_watchdog.mjs
Step 1: Create the script shell
- Add CLI parsing and a placeholder JSON response.
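The script shell could start out roughly like the sketch below. Only the --compact and --input flags appear in this plan (in the verification command); the function names and the placeholder response shape are assumptions.

```javascript
// Hypothetical argument parser for the two flags used in the verification command.
function parseArgs(argv) {
  const options = { compact: false, input: null };
  for (let i = 0; i < argv.length; i += 1) {
    if (argv[i] === "--compact") {
      options.compact = true;
    } else if (argv[i] === "--input") {
      if (i + 1 >= argv.length) throw new Error("--input requires a path");
      options.input = argv[i + 1];
      i += 1; // skip the value we just consumed
    } else {
      throw new Error("Unknown argument: " + argv[i]);
    }
  }
  return options;
}

// Placeholder response until real receipt logic exists (the shape is an assumption).
function placeholderResponse(options) {
  const body = { ok: true, status: "not_implemented", input: options.input };
  return options.compact ? JSON.stringify(body) : JSON.stringify(body, null, 2);
}

// In the real script this would be parseArgs(process.argv.slice(2)).
const out = placeholderResponse(parseArgs(["--compact", "--input", "/dev/null"]));
console.log(out);
```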
Step 2: Verify it runs
Run: node scripts/subagent_delivery_watchdog.mjs --compact --input /dev/null || true
Expected: the script exists and runs well enough to support the test tasks that follow
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs
git commit -m "chore: add subagent delivery watchdog skeleton"
Task 6: Create watchdog test skeleton
Files:
- Create:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Create the test shell
- Add basic harness structure and fixture runner.
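A basic harness structure might look like the following sketch. The createHarness API is hypothetical, and fixture loading is left out; this only shows registering cases, running them, and summarizing failures.

```javascript
// Hypothetical harness: register cases, run them, and report pass/fail counts.
function createHarness() {
  const cases = [];
  return {
    test(name, fn) {
      cases.push({ name, fn });
    },
    run() {
      const results = cases.map(({ name, fn }) => {
        try {
          fn();
          return { name, ok: true };
        } catch (error) {
          return { name, ok: false, message: error.message };
        }
      });
      const failed = results.filter((r) => !r.ok).length;
      return { total: results.length, failed, results };
    },
  };
}

const harness = createHarness();
harness.test("placeholder passes", () => {});
harness.test("placeholder fails until logic exists", () => {
  throw new Error("not implemented");
});

const summary = harness.run();
for (const r of summary.results) {
  const suffix = r.ok ? "" : ": " + r.message;
  console.log((r.ok ? "PASS " : "FAIL ") + r.name + suffix);
}
// In the real runner, a non-zero exit code would signal failure:
// process.exitCode = summary.failed > 0 ? 1 : 0;
```

Catching errors per case lets the runner execute every test even while the fail-first tests from later tasks are still red.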
Step 2: Verify test file executes
Run: node scripts/test_subagent_delivery_watchdog.mjs || true
Expected: test runner executes, even if failing
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: add subagent watchdog test skeleton"
Task 7: Add active-before-SLA test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write the test
- dispatch exists
- no completion receipt yet
- current time still before SLA
- expect
active
Step 2: Run test to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on missing logic
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: require active status before SLA breach"
Task 8: Add suspect-delivery-failure test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write the test
- dispatch exists
- no completion receipt
- current time beyond SLA
- expect
suspect_delivery_failure
Step 2: Run test to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on new assertion
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: detect suspected delivery failure after SLA"
Task 9: Add completed-status test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write the test
- dispatch exists
- completion receipt exists
- expect
completed
Step 2: Run test to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on completed path
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: close watchdog on completion receipt"
Task 10: Add state shape fixture
Files:
- Create:
state/subagent-delivery-watchdog/README.md
- Create:
state/subagent-delivery-watchdog/.gitkeep
Step 1: Define the state JSON shape in README
- Include receipt fields and status fields.
Step 2: Verify files exist
Run: test -f state/subagent-delivery-watchdog/README.md && test -f state/subagent-delivery-watchdog/.gitkeep && echo OK
Expected: OK
Step 3: Commit
git add state/subagent-delivery-watchdog/README.md state/subagent-delivery-watchdog/.gitkeep
git commit -m "docs: define watchdog state storage shape"
Task 11: Implement dispatch receipt write
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add a function to write dispatch receipt state
- Only handle a new dispatch record.
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: some tests still fail, but dispatch state path exists
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs
git commit -m "feat: write subagent dispatch receipt state"
Task 12: Implement completion receipt write
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add a function to write completion receipt state
- Only update completion-related fields.
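The "only update completion-related fields" constraint can be sketched as a pure update that copies the existing record and overwrites just the three completion fields. The function name and the example resultSource value are assumptions; the plan does not enumerate result sources.

```javascript
// Hypothetical update: returns a new record, touching only the completion fields.
function applyCompletionReceipt(record, { completionReceivedAt, forwardedToMain, resultSource }) {
  return {
    ...record,
    completionReceivedAt,
    forwardedToMain: Boolean(forwardedToMain),
    resultSource,
  };
}

const before = {
  runId: "run-001",
  childSessionKey: "child-abc",
  dispatchAt: "2025-01-01T00:00:00.000Z",
  expectedBy: "2025-01-01T00:10:00.000Z",
};
const after = applyCompletionReceipt(before, {
  completionReceivedAt: "2025-01-01T00:04:00.000Z",
  forwardedToMain: true,
  resultSource: "completion_event", // assumed value for illustration only
});
console.log(after);
```

Returning a new object instead of mutating the input keeps the dispatch fields provably unchanged, which is easy to assert in the test runner.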
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: some tests still fail, but completion data path exists
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs
git commit -m "feat: write subagent completion receipt state"
Task 13: Implement status recompute for active/completed/suspect
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add status recompute logic
- Implement only:
active, suspect_delivery_failure, completed
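A minimal sketch of the three-status recompute, assuming the SLA deadline is the expectedBy field and that "completion receipt exists" means completionReceivedAt is set. The function name computeBasicStatus and the explicit now parameter are illustrative choices, not part of the plan.

```javascript
// Hypothetical recompute covering only active, suspect_delivery_failure, and completed.
// Assumption: timestamps are ISO-8601 strings and expectedBy is the SLA deadline.
function computeBasicStatus(record, nowIso) {
  if (record.completionReceivedAt) return "completed";
  const now = Date.parse(nowIso);
  const deadline = Date.parse(record.expectedBy);
  // Exactly at the deadline still counts as active; only strictly later is suspect.
  return now > deadline ? "suspect_delivery_failure" : "active";
}

const dispatched = { runId: "run-001", expectedBy: "2025-01-01T00:10:00.000Z" };
const completed = { ...dispatched, completionReceivedAt: "2025-01-01T00:04:00.000Z" };

console.log(computeBasicStatus(dispatched, "2025-01-01T00:05:00.000Z")); // active
console.log(computeBasicStatus(dispatched, "2025-01-01T00:15:00.000Z")); // suspect_delivery_failure
console.log(computeBasicStatus(completed, "2025-01-01T00:15:00.000Z")); // completed
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the logic deterministic for the tests from Tasks 7-9.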
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: Task 7-9 tests pass
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: recompute basic watchdog statuses"
Task 14: Add done-but-not-forwarded test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write the test
- child run marked done
- no completion receipt in main thread
- expect
done_but_not_forwarded
Step 2: Run tests to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on new assertion
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: detect done but not forwarded state"
Task 15: Implement done-but-not-forwarded state
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add done-but-not-forwarded detection
- Use child-done signal + missing completion receipt.
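The detection rule above can be sketched as a single predicate. The childDone flag is a hypothetical name for the child-done signal; the plan does not say how that signal is stored.

```javascript
// Hypothetical predicate: the child reports done, but no completion receipt reached main.
function isDoneButNotForwarded(record) {
  return record.childDone === true && !record.completionReceivedAt;
}

console.log(isDoneButNotForwarded({ childDone: true })); // true
console.log(isDoneButNotForwarded({ childDone: true, completionReceivedAt: "2025-01-01T00:04:00.000Z" })); // false
console.log(isDoneButNotForwarded({ childDone: false })); // false
```

In a full recompute this check would presumably run before the SLA comparison, so that a finished child is not misreported as a suspected delivery failure.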
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: done-but-not-forwarded test passes
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: detect done without forwarded completion"
Task 16: Add first recovery-action test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write fetch-history recovery test
- done but not forwarded
- no prior recovery action
- expect recovery decision
fetch_history
Step 2: Run tests to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on recovery decision
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: fetch history after missing forwarded completion"
Task 17: Implement fetch-history recovery decision
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add minimal recovery decision logic
- Return
fetch_history for first-time done-but-not-forwarded.
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: fetch-history recovery test passes
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: recover with history fetch first"
Task 18: Add respawn-escalation test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write the respawn test
- recovery already attempted once
- still no forwarded completion
- expect
respawn
Step 2: Run tests to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on respawn decision
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: escalate to respawn after failed recovery"
Task 19: Implement respawn decision
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add respawn logic
- Return
respawn when fetch-history path did not recover delivery.
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: respawn test passes
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: respawn after failed delivery recovery"
Task 20: Add blocked-escalation test
Files:
- Modify:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Write the blocked test
- repeated recovery failure
- expect
blocked plus owner-visible reporting requirement
Step 2: Run tests to verify it fails
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: FAIL on blocked escalation
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs
git commit -m "test: escalate repeated delivery failures to blocked"
Task 21: Implement blocked escalation
Files:
- Modify:
scripts/subagent_delivery_watchdog.mjs
Step 1: Add blocked escalation logic
- repeated recovery failure ->
blocked
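Taken together with the fetch_history and respawn decisions from the earlier tasks, the escalation ladder might be expressed as below. The recoveryAttempts array and the ownerReportRequired flag are assumptions about how attempts and the reporting requirement could be tracked; reporting for the earlier states is not modeled here.

```javascript
// Hypothetical decision function for a run that is still done but not forwarded.
// Assumption: recoveryAttempts lists the recovery actions already tried for this run.
function decideRecovery(record) {
  const attempts = record.recoveryAttempts ?? [];
  if (!attempts.includes("fetch_history")) {
    return { status: "done_but_not_forwarded", action: "fetch_history" };
  }
  if (!attempts.includes("respawn")) {
    return { status: "done_but_not_forwarded", action: "respawn" };
  }
  // Both recovery paths were tried without success: stop retrying and escalate.
  return { status: "blocked", action: null, ownerReportRequired: true };
}

console.log(decideRecovery({ recoveryAttempts: [] }).action); // fetch_history
console.log(decideRecovery({ recoveryAttempts: ["fetch_history"] }).action); // respawn
console.log(decideRecovery({ recoveryAttempts: ["fetch_history", "respawn"] }).status); // blocked
```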
Step 2: Run tests
Run: node scripts/test_subagent_delivery_watchdog.mjs
Expected: blocked escalation test passes
Step 3: Commit
git add scripts/subagent_delivery_watchdog.mjs scripts/test_subagent_delivery_watchdog.mjs
git commit -m "feat: block repeated subagent delivery failures"
Task 22: Add owner-visible reporting rule for suspect state
Files:
- Modify:
WORKFLOW.md
- Modify:
AGENTS.md
- Modify:
docs/runbooks/subagent-anti-blackhole.md
Step 1: Add suspect-state reporting rule
- If SLA is crossed with no completion receipt, the owner must be informed.
Step 2: Verify text exists
Run: grep -RIn "SLA\|suspect_delivery_failure" WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found
Step 3: Commit
git add WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: require reporting on suspect delivery failure"
Task 23: Add owner-visible reporting rule for done-but-not-forwarded
Files:
- Modify:
WORKFLOW.md
- Modify:
AGENTS.md
- Modify:
docs/runbooks/subagent-anti-blackhole.md
Step 1: Add done-but-not-forwarded reporting rule
- Must state that result exists but did not bounce back.
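To make the two reporting rules concrete, here is a hedged sketch of how owner-visible messages could be generated per status. The function name and the exact wording are hypothetical; the only requirements taken from this plan are that the suspect state is reported once the SLA is crossed and that the done-but-not-forwarded report says the result exists but did not bounce back.

```javascript
// Hypothetical formatter for owner-visible reports; wording is illustrative only.
function ownerReport(record, status) {
  switch (status) {
    case "suspect_delivery_failure":
      return (
        "Subagent run " + record.runId + " passed its SLA (" + record.expectedBy +
        ") with no completion receipt; treating this as a suspect delivery failure."
      );
    case "done_but_not_forwarded":
      return (
        "Subagent run " + record.runId +
        " is done but not forwarded: the result exists but did not bounce back."
      );
    default:
      return null; // no mandatory report modeled for other statuses in this sketch
  }
}

const record = { runId: "run-001", expectedBy: "2025-01-01T00:10:00.000Z" };
console.log(ownerReport(record, "suspect_delivery_failure"));
console.log(ownerReport(record, "done_but_not_forwarded"));
```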
Step 2: Verify text exists
Run: grep -RIn "done but not forwarded\|did not bounce back" WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
Expected: matching lines found
Step 3: Commit
git add WORKFLOW.md AGENTS.md docs/runbooks/subagent-anti-blackhole.md
git commit -m "docs: require reporting on missing forwarded completion"
Task 24: Add rule to fetch history before respawn
Files:
- Modify:
WORKFLOW.md
- Modify:
docs/runbooks/subagent-delivery-recovery.md
Step 1: Add the history-first rule
- Done-but-not-forwarded should prefer
fetch_history before respawn.
Step 2: Verify text exists
Run: grep -RIn "fetch_history\|before respawn" WORKFLOW.md docs/runbooks/subagent-delivery-recovery.md
Expected: matching lines found
Step 3: Commit
git add WORKFLOW.md docs/runbooks/subagent-delivery-recovery.md
git commit -m "docs: prefer history fetch before respawn"
Task 25: Add no-silent-waiting-after-SLA rule
Files:
- Modify:
WORKFLOW.md
- Modify:
AGENTS.md
Step 1: Add the no-silent-waiting rule
- Once SLA is crossed, silent waiting is forbidden.
Step 2: Verify text exists
Run: grep -RIn "silent waiting\|SLA" WORKFLOW.md AGENTS.md
Expected: matching lines found
Step 3: Commit
git add WORKFLOW.md AGENTS.md
git commit -m "docs: forbid silent waiting after subagent SLA"
Task 26: Create blackhole scenario test shell
Files:
- Create:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Create the scenario test shell
- Add empty scenario harness.
Step 2: Verify file runs
Run: node scripts/test_subagent_blackhole_scenarios.mjs || true
Expected: file executes, even if not complete
Step 3: Commit
git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add subagent blackhole scenario harness"
Task 27: Add normal-completion scenario
Files:
- Modify:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Write the scenario
- dispatch -> completion receipt -> completed
Step 2: Run tests
Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: the scenario may still fail until engine wiring is ready
Step 3: Commit
git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add normal subagent completion scenario"
Task 28: Add slow-but-active scenario
Files:
- Modify:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Write the scenario
- dispatch before SLA -> active
Step 2: Run tests
Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured
Step 3: Commit
git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add slow but active subagent scenario"
Task 29: Add done-but-not-forwarded scenario
Files:
- Modify:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Write the scenario
- child done -> no completion receipt -> fetch_history
Step 2: Run tests
Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured
Step 3: Commit
git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add done but not forwarded scenario"
Task 30: Add missing-completion-event scenario
Files:
- Modify:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Write the scenario
- no bounce, no completion receipt, beyond SLA -> suspect delivery failure
Step 2: Run tests
Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured
Step 3: Commit
git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add missing completion event scenario"
Task 31: Add repeated-failure escalation scenario
Files:
- Modify:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Write the scenario
- fetch_history fails -> respawn fails -> blocked
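This scenario could be simulated without any live session by feeding canned outcomes into a small runner, as in the sketch below. The runner, its outcomes input, and the choice to end in recovered when a step succeeds are assumptions for illustration.

```javascript
// Hypothetical scenario runner: walks the recovery ladder using canned outcomes.
// outcomes maps each action name to true (delivery recovered) or false (still missing).
function runEscalationScenario(outcomes) {
  const ladder = ["fetch_history", "respawn"];
  const trace = [];
  for (const action of ladder) {
    const recovered = outcomes[action] === true;
    trace.push({ action, recovered });
    if (recovered) return { finalStatus: "recovered", trace };
  }
  // Every recovery step failed, so the run escalates to blocked.
  return { finalStatus: "blocked", trace };
}

const result = runEscalationScenario({ fetch_history: false, respawn: false });
console.log(result.finalStatus); // blocked
console.log(result.trace.map((step) => step.action).join(" -> ")); // fetch_history -> respawn
```

Recording the trace alongside the final status lets the scenario test assert the ordering (history fetch before respawn) as well as the outcome.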
Step 2: Run tests
Run: node scripts/test_subagent_blackhole_scenarios.mjs
Expected: scenario result captured
Step 3: Commit
git add scripts/test_subagent_blackhole_scenarios.mjs
git commit -m "test: add repeated blackhole escalation scenario"
Task 32: Run the full local watchdog test set
Files:
- Modify if needed:
scripts/test_subagent_delivery_watchdog.mjs
- Modify if needed:
scripts/test_subagent_blackhole_scenarios.mjs
Step 1: Run the combined tests
Run:
node scripts/test_subagent_delivery_watchdog.mjs
node scripts/test_subagent_blackhole_scenarios.mjs
Expected: PASS
Step 2: Fix only minimal wiring needed for all-pass
- Keep changes scoped to watchdog logic/tests.
Step 3: Commit
git add scripts/test_subagent_delivery_watchdog.mjs scripts/test_subagent_blackhole_scenarios.mjs scripts/subagent_delivery_watchdog.mjs
git commit -m "test: pass full subagent blackhole watchdog suite"
Task 33: Peer review watchdog state logic
Files:
- Review:
scripts/subagent_delivery_watchdog.mjs
- Review:
scripts/test_subagent_delivery_watchdog.mjs
Step 1: Request reviewer focus on receipt state logic
- Verify statuses and transitions match B-class failure goals.
Step 2: Record reviewer verdict
- Include commands and findings.
Step 3: Commit any follow-up fixes if needed
# only if reviewer requests changes
git add <changed-files>
git commit -m "fix: address watchdog state review feedback"
Task 34: Peer review recovery decisions
Files:
- Review:
scripts/subagent_delivery_watchdog.mjs
- Review:
docs/runbooks/subagent-delivery-recovery.md
Step 1: Request reviewer focus on recovery ordering
- Verify fetch-history before respawn and blocked escalation.
Step 2: Record reviewer verdict
- Include commands and findings.
Step 3: Commit any follow-up fixes if needed
# only if reviewer requests changes
git add <changed-files>
git commit -m "fix: address recovery decision review feedback"
Task 35: Peer review scenario coverage and handoff
Files:
- Review:
scripts/test_subagent_blackhole_scenarios.mjs
- Review:
docs/runbooks/subagent-anti-blackhole.md
- Review:
docs/runbooks/subagent-delivery-recovery.md
Step 1: Request reviewer focus on blackhole realism
- Confirm this targets fake timeout / no-bounce cases, not just slow work.
Step 2: Record verification output
- Include exact commands and reviewer verdict.
Step 3: Final state
- Leave task in
pending_verification; do not mark complete.