Watchdog B v3 notification layer

Single source of truth for owner-facing policy: ~/.config/openclaw/watchdog-b.env

Runtime auto-detection source: scripts/openclaw_runtime_probe.py

This directory now contains:

  • check_openclaw_state.sh — tri-state checker (running / stalled / idle)
  • run_watchdog_b.sh — dispatcher + notification runner
  • notify_watchdog_b.py — minimal notification integration layer

Configuration source

Priority order is now:

  1. process env already set by caller/systemd
  2. WATCHDOG_B_CONFIG_FILE if set
  3. fallback ~/.config/openclaw/watchdog-b.env
  4. code defaults

This means both run_watchdog_b.sh and notify_watchdog_b.py can be invoked manually and still resolve the same owner-facing channel / target / mode / wording.
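The priority order above could be sketched roughly as follows. This is a minimal illustration, not the bundled implementation: `parse_env_file`, `resolve_setting`, and `CODE_DEFAULTS` are hypothetical names, and the sketch assumes that an explicit WATCHDOG_B_CONFIG_FILE replaces the fallback file rather than layering on top of it.

```python
import os
from pathlib import Path

# Hypothetical code defaults, for illustration only.
CODE_DEFAULTS = {"WATCHDOG_B_RUNNING_REPORT_MODE": "manual"}
DEFAULT_CONFIG = Path.home() / ".config" / "openclaw" / "watchdog-b.env"


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; skip blanks and '#' comments."""
    values = {}
    path = Path(path)
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(name, environ=None):
    """Resolve one setting using the documented priority order."""
    environ = os.environ if environ is None else environ
    # 1. process env already set by the caller or systemd
    if name in environ:
        return environ[name]
    # 2. WATCHDOG_B_CONFIG_FILE if set, otherwise 3. the fallback file
    config_path = environ.get("WATCHDOG_B_CONFIG_FILE") or DEFAULT_CONFIG
    file_values = parse_env_file(config_path)
    if name in file_values:
        return file_values[name]
    # 4. code defaults
    return CODE_DEFAULTS.get(name)
```

Because both entry points would call the same resolver, a manual invocation and a systemd invocation end up with the same values unless the caller's environment overrides them.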

For Node/OpenClaw runtime paths, the bundled scripts now resolve in this order:

  1. explicit env overrides: WATCHDOG_B_NODE_BIN, WATCHDOG_B_OPENCLAW_MJS, WATCHDOG_B_OPENCLAW_ENTRY
  2. PATH lookup for node and openclaw
  3. common install roots scan: nvm, pnpm global, npm-global, /usr/local, /usr, Volta-style trees
  4. fail with an operator-facing error that tells you which env vars to set manually
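As a rough sketch of that resolution order for the node binary only: the scan roots and glob patterns below are assumptions chosen for illustration, and `find_node` is a hypothetical helper rather than the actual interface of openclaw_runtime_probe.py.

```python
import os
import shutil
from pathlib import Path

# Hypothetical scan roots and patterns; the real probe's list may differ.
SCAN_ROOTS = ("~/.nvm", "~/.npm-global", "/usr/local", "/usr", "~/.volta")
CANDIDATE_PATTERNS = ("bin/node", "*/bin/node", "versions/node/*/bin/node")


def find_node(environ=None, scan_roots=SCAN_ROOTS):
    """Return a path to an executable node binary, or raise."""
    environ = os.environ if environ is None else environ
    # 1. explicit env override
    override = environ.get("WATCHDOG_B_NODE_BIN")
    if override and os.access(override, os.X_OK):
        return override
    # 2. PATH lookup
    found = shutil.which("node", path=environ.get("PATH", os.defpath))
    if found:
        return found
    # 3. scan common install roots
    for root in scan_roots:
        base = Path(root).expanduser()
        if not base.is_dir():
            continue
        for pattern in CANDIDATE_PATTERNS:
            for candidate in sorted(base.glob(pattern)):
                if candidate.is_file() and os.access(candidate, os.X_OK):
                    return str(candidate)
    # 4. fail with an operator-facing hint
    raise FileNotFoundError(
        "node not found; set WATCHDOG_B_NODE_BIN to the node executable"
    )
```

The same pattern would apply to the OpenClaw entry point, with WATCHDOG_B_OPENCLAW_MJS or WATCHDOG_B_OPENCLAW_ENTRY as the override.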

Repo template to copy from:

  • ops/systemd/user/watchdog-b.env.example

Install example:

mkdir -p ~/.config/openclaw
cp ~/.openclaw/workspace/ops/systemd/user/watchdog-b.env.example ~/.config/openclaw/watchdog-b.env
$EDITOR ~/.config/openclaw/watchdog-b.env

Notification strategy

1) running

Default: manual / queue-ready only.

Why:

  • a healthy runtime every 10 minutes should not spam Eric
  • owner-facing reporting should remain explicit and auditable

Behavior:

  • WATCHDOG_B_RUNNING_REPORT_MODE=manual (default)
    • does not create external messages
    • returns a concrete hint for how to enable queue creation
  • WATCHDOG_B_RUNNING_REPORT_MODE=enqueue
    • creates a real pending owner report in ~/.clawteam/owner-reports/pending/
    • does not auto-send it
  • WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain
    • creates a pending owner report and immediately delivers that exact pending item through owner_report_driver.py
    • the send path is a direct OpenClaw Discord send to the env-configured owner-facing destination
    • this keeps queue/audit semantics but avoids depending on the wrapper watchdog's destination/visibility behavior

Throttle:

  • WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS default 3600
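The interaction between the report mode and the throttle might look like the sketch below. `plan_running_report` is a hypothetical helper, and it assumes the minimum interval gates queue creation in both enqueue modes. The README does not spell that detail out.

```python
import time

VALID_MODES = {"manual", "enqueue", "enqueue-and-drain"}


def plan_running_report(mode="manual", last_sent=None, now=None, min_interval=3600):
    """Decide what to do for a 'running' observation."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown WATCHDOG_B_RUNNING_REPORT_MODE: {mode!r}")
    now = time.time() if now is None else now
    if mode == "manual":
        # No external message; return a hint on how to enable queue creation.
        return {
            "enqueue": False,
            "deliver": False,
            "reason": "manual mode; set WATCHDOG_B_RUNNING_REPORT_MODE=enqueue",
        }
    if last_sent is not None and now - last_sent < min_interval:
        # Assumed: the throttle suppresses the report entirely.
        return {"enqueue": False, "deliver": False, "reason": "throttled"}
    return {
        "enqueue": True,
        # Only enqueue-and-drain delivers the pending item immediately.
        "deliver": mode == "enqueue-and-drain",
        "reason": "ok",
    }
```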

2) stalled

Default: nudge the main agent first, then escalate to Eric only after repeated stalled observations.

Behavior:

  • call internal OpenClaw agent route:
    • node .../openclaw.mjs agent --agent main --message ...
  • maintain local notify state under state/watchdog-b/notify-state.json
  • after repeated observations (WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER, default 2), enqueue an owner report

Throttle:

  • WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS default 900

Owner escalation mode:

  • WATCHDOG_B_STALLED_OWNER_MODE=escalate (default)
  • WATCHDOG_B_STALLED_OWNER_MODE=always
  • WATCHDOG_B_STALLED_OWNER_MODE=never

Owner delivery mode after enqueue:

  • WATCHDOG_B_OWNER_DELIVERY_MODE=enqueue-only (default)
  • WATCHDOG_B_OWNER_DELIVERY_MODE=direct-discord

When direct-discord is enabled, watchdog-b still enqueues first, then directly delivers that same pending report via owner_report_driver.py to the env-configured Discord target.
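The stalled-path decisions can be summarized in a small sketch. `plan_stalled` is a hypothetical helper; the meanings of always and never are inferred from their names, and "after repeated observations" is assumed to mean a count at or above the threshold.

```python
def plan_stalled(
    observations,
    seconds_since_nudge,
    owner_mode="escalate",
    escalation_after=2,
    nudge_min_interval=900,
    delivery_mode="enqueue-only",
):
    """Decide nudge, owner enqueue, and direct delivery for a stalled state."""
    if owner_mode not in {"escalate", "always", "never"}:
        raise ValueError(f"unknown WATCHDOG_B_STALLED_OWNER_MODE: {owner_mode!r}")
    # Nudge the main agent unless the last nudge was too recent.
    nudge = seconds_since_nudge is None or seconds_since_nudge >= nudge_min_interval
    if owner_mode == "always":
        escalate = True
    elif owner_mode == "never":
        escalate = False
    else:
        # Assumed: escalate once the observation count reaches the threshold.
        escalate = observations >= escalation_after
    return {
        "nudge_main_agent": nudge,
        "enqueue_owner_report": escalate,
        # Direct delivery always follows an enqueue, never replaces it.
        "direct_discord": escalate and delivery_mode == "direct-discord",
    }
```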

Owner-facing message style

The owner-facing Discord message is now compact and conclusion-first:

  • first line: headline (🔔 [watchdog-b] <worker>)
  • second line: concise conclusion with emoji, e.g. ✅ 主程序仍在運行 ("the main process is still running")
  • third line: actionable next step, prefixed with
  • last line: compact technical metadata (task=... | status=... | progress=... | source=...)

Style knobs live in watchdog-b.env:

  • WATCHDOG_B_RUNNING_EMOJI, WATCHDOG_B_RUNNING_SUMMARY
  • WATCHDOG_B_STALLED_EMOJI, WATCHDOG_B_STALLED_SUMMARY
  • WATCHDOG_B_IDLE_EMOJI, WATCHDOG_B_IDLE_SUMMARY
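A minimal sketch of how the four-line layout might be assembled is shown here. `format_owner_message` is a hypothetical function, the emoji and summary would come from the style knobs above, and the next-step prefix is left as a caller-supplied parameter because this README does not show it.

```python
def format_owner_message(worker, emoji, summary, next_step, metadata, next_step_prefix=""):
    """Build the compact, conclusion-first owner-facing message."""
    lines = [
        # first line: headline
        f"🔔 [watchdog-b] {worker}",
        # second line: concise conclusion with emoji
        f"{emoji} {summary}",
        # third line: actionable next step (prefix supplied by the caller)
        f"{next_step_prefix}{next_step}",
        # last line: compact technical metadata
        " | ".join(f"{key}={value}" for key, value in metadata.items()),
    ]
    return "\n".join(lines)
```

Keeping the metadata on a single trailing line leaves the conclusion visible in a Discord notification preview.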

3) idle

Default: same pattern as stalled, but slower.

Behavior:

  • nudge main agent first
  • only escalate to Eric after repeated idle detections

Throttle:

  • WATCHDOG_B_IDLE_NUDGE_MIN_INTERVAL_SECONDS default 1800

Owner escalation threshold:

  • WATCHDOG_B_IDLE_OWNER_ESCALATION_AFTER default 2

Safety defaults

  • WATCHDOG_B_NOTIFY_DRY_RUN=1 by default in run_watchdog_b.sh
  • owner-facing send path keeps the existing owner-reporting-system queue/artifact/audit flow, but direct Discord delivery no longer depends on the cron wrapper's default destination semantics
  • local state tracks last send time / count to reduce spam
  • a failed notifier does not crash the dispatcher; it emits a warning and preserves artifacts
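The "failed notifier does not crash the dispatcher" behavior follows a common pattern, sketched below in Python for illustration. The actual dispatcher is a shell script, and `run_notifier` is a hypothetical wrapper, not part of the bundled scripts.

```python
import subprocess
import sys


def run_notifier(cmd, timeout=60):
    """Run the notifier; report failures as warnings instead of raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary, permission problem, or timeout: warn and continue.
        return {"ok": False, "warning": f"notifier could not run: {exc}", "stdout": ""}
    if proc.returncode != 0:
        # Keep whatever output was produced so it can be saved as an artifact.
        return {
            "ok": False,
            "warning": f"notifier exited with {proc.returncode}",
            "stdout": proc.stdout,
        }
    return {"ok": True, "warning": None, "stdout": proc.stdout}


if __name__ == "__main__":
    result = run_notifier([sys.executable, "-c", "print('{}')"])
    print(result["ok"], result["warning"])
```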

Key artifacts

Under state/watchdog-b/:

  • last-output.txt — rendered dispatcher output
  • last-notify-output.txt — notifier JSON result
  • last-state.txt — last state
  • history.tsv — state history
  • notify-state.json — throttle / repetition tracking

Manual test examples

Dry-run running path

WATCHDOG_B_NOTIFY_DRY_RUN=1 \
WATCHDOG_B_RUNNING_REPORT_MODE=manual \
./scripts/watchdog-b/notify_watchdog_b.py --state running --dry-run

Real queue creation for running (no send)

WATCHDOG_B_NOTIFY_DRY_RUN=0 \
WATCHDOG_B_RUNNING_REPORT_MODE=enqueue \
./scripts/watchdog-b/notify_watchdog_b.py --state running
ls -l ~/.clawteam/owner-reports/pending

Runtime probe only

python3 ./scripts/watchdog-b/openclaw_runtime_probe.py --pretty

Single-shot enqueue + direct Discord delivery

WATCHDOG_B_NOTIFY_DRY_RUN=0 \
WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain \
./scripts/watchdog-b/notify_watchdog_b.py --state running

Dry-run stalled nudge

WATCHDOG_B_NOTIFY_DRY_RUN=1 \
./scripts/watchdog-b/notify_watchdog_b.py --state stalled --dry-run

If runtime auto-detection fails on a host with a custom install layout, set one or more of:

  • WATCHDOG_B_NODE_BIN
  • WATCHDOG_B_OPENCLAW_MJS
  • WATCHDOG_B_OPENCLAW_ENTRY

Full dispatcher dry-run with fixture overrides

OPENCLAW_PID_FILE=$PWD/tests/fixtures/watchdog-b/running/host-runtime/openclaw.pid \
OPENCLAW_LOG_FILE=$PWD/tests/fixtures/watchdog-b/running/logs/openclaw.log \
WATCHDOG_B_ARTIFACT_DIR=$PWD/state/watchdog-b-test-running-v3 \
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
./scripts/watchdog-b/run_watchdog_b.sh

What is truly wired vs not

Truly wired now

  • state detection
  • notifier invocation from dispatcher
  • main-agent internal nudge command construction and execution path
  • owner-report queue creation via existing producer
  • optional direct delivery of an enqueued owner report through owner_report_driver.py
  • throttling / repetition state persisted locally

Still conditional / not claimed as universally proven

  • successful main-agent wake-up depends on local OpenClaw CLI/runtime being callable from this environment
  • successful owner-facing delivery still depends on valid local OpenClaw Discord routing on the host
  • direct watchdog-b owner delivery now targets channel:1480577550445969541 by default and bypasses the wrapper watchdog destination logic
  • the cron wrapper remains available for generic queue draining, but watchdog-b no longer needs to rely on it for single-shot owner-facing verification