# Watchdog B v3 notification layer
Single source of truth for owner-facing policy: `~/.config/openclaw/watchdog-b.env`

Runtime auto-detection source: `scripts/openclaw_runtime_probe.py`

This directory now contains:
- `check_openclaw_state.sh` — tri-state checker (`running` / `stalled` / `idle`)
- `run_watchdog_b.sh` — dispatcher + notification runner
- `notify_watchdog_b.py` — minimal notification integration layer
## Configuration source
Priority order is now:
1. process env already set by caller/systemd
2. `WATCHDOG_B_CONFIG_FILE` if set
3. fallback `~/.config/openclaw/watchdog-b.env`
4. code defaults
This means both `run_watchdog_b.sh` and `notify_watchdog_b.py` can be invoked manually and still resolve the same owner-facing channel / target / mode / wording.
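
The priority order above can be sketched in Python as follows. This is a minimal illustration, not the code in `notify_watchdog_b.py`: the helper names `parse_env_file` and `resolve_setting` are hypothetical, the config file is assumed to hold plain `KEY=VALUE` lines, and the sketch assumes that `WATCHDOG_B_CONFIG_FILE`, when set, replaces the fallback path rather than layering on top of it.

```python
import os
from pathlib import Path

# Fallback config location named in this README.
DEFAULT_CONFIG = Path("~/.config/openclaw/watchdog-b.env").expanduser()


def parse_env_file(path):
    """Parse simple KEY=VALUE lines (hypothetical helper).

    Blank lines, '#' comments, and lines without '=' are ignored.
    A missing file yields an empty mapping.
    """
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for raw in p.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(name, code_default, environ=None):
    """Resolve one setting: process env > config file > code default."""
    environ = os.environ if environ is None else environ
    # 1. A value already set by the caller or systemd wins.
    if name in environ:
        return environ[name]
    # 2./3. Explicit config file if set, otherwise the fallback path
    #       (assumption: the explicit file replaces the fallback).
    config_path = environ.get("WATCHDOG_B_CONFIG_FILE") or DEFAULT_CONFIG
    file_values = parse_env_file(config_path)
    if name in file_values:
        return file_values[name]
    # 4. Code default.
    return code_default
```

Passing `environ` explicitly keeps the lookup testable without touching the real process environment.
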
For Node/OpenClaw runtime paths, the bundled scripts now resolve in this order:
1. explicit env overrides: `WATCHDOG_B_NODE_BIN`, `WATCHDOG_B_OPENCLAW_MJS`, `WATCHDOG_B_OPENCLAW_ENTRY`
2. PATH lookup for `node` and `openclaw`
3. common install roots scan: nvm, pnpm global, npm-global, `/usr/local`, `/usr`, Volta-style trees
4. fail with an operator-facing error that tells you which env vars to set manually
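
A rough sketch of that lookup order for the Node binary, under stated assumptions: this is not the logic in `openclaw_runtime_probe.py`, the function name `resolve_node_bin` and the `extra_roots` parameter are hypothetical, and each install root is assumed to expose the binary at `<root>/bin/node`.

```python
import os
import shutil


def resolve_node_bin(environ=None, extra_roots=()):
    """Locate node: env override > PATH lookup > install-root scan > error."""
    environ = os.environ if environ is None else environ

    # 1. Explicit override wins.
    override = environ.get("WATCHDOG_B_NODE_BIN")
    if override:
        return override

    # 2. PATH lookup.
    found = shutil.which("node", path=environ.get("PATH"))
    if found:
        return found

    # 3. Scan candidate install roots (hypothetical layout: <root>/bin/node).
    for root in extra_roots:
        candidate = os.path.join(root, "bin", "node")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate

    # 4. Fail with an operator-facing hint.
    raise RuntimeError(
        "could not locate node; set WATCHDOG_B_NODE_BIN to the binary path"
    )
```
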
Repo template to copy from:
- `ops/systemd/user/watchdog-b.env.example`
Install example:
```bash
mkdir -p ~/.config/openclaw
cp ~/.openclaw/workspace/ops/systemd/user/watchdog-b.env.example ~/.config/openclaw/watchdog-b.env
$EDITOR ~/.config/openclaw/watchdog-b.env
```
## Notification strategy
### 1) `running`
Default: **manual / queue-ready only**.
Why:
- a healthy runtime every 10 minutes should not spam Eric
- owner-facing reporting should remain explicit and auditable
Behavior:
- `WATCHDOG_B_RUNNING_REPORT_MODE=manual` (default)
  - does not create external messages
  - returns a concrete hint for how to enable queue creation
- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue`
  - creates a real pending owner report in `~/.clawteam/owner-reports/pending/`
  - does **not** auto-send it
- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain`
  - creates a pending owner report and immediately delivers that exact pending item through `owner_report_driver.py`
  - send path is direct OpenClaw Discord send using the env-configured owner-facing destination
  - this keeps queue/audit semantics but avoids depending on the wrapper watchdog's destination/visibility behavior
Throttle:
- `WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS` default `3600`
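
One way to picture how the report mode and the throttle interact is the sketch below. It is an illustration only: the function name `plan_running_report` and the returned labels `hint-only` and `throttled` are hypothetical, and it assumes the throttle applies to both enqueue modes and compares against the time of the last report.

```python
import time

ENQUEUE_MODES = ("enqueue", "enqueue-and-drain")


def plan_running_report(mode, last_sent_epoch, min_interval_seconds=3600, now=None):
    """Decide what a `running` observation should do.

    Returns "hint-only" for manual mode, "throttled" when the minimum
    interval has not elapsed, otherwise the mode to execute.
    """
    now = time.time() if now is None else now
    if mode == "manual":
        # Manual mode never creates external messages.
        return "hint-only"
    if mode not in ENQUEUE_MODES:
        raise ValueError(f"unknown WATCHDOG_B_RUNNING_REPORT_MODE: {mode!r}")
    # Assumption: a missing timestamp means nothing was ever sent.
    if last_sent_epoch is not None and (now - last_sent_epoch) < min_interval_seconds:
        return "throttled"
    return mode
```
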
### 2) `stalled`
Default: **nudge main agent first**, then escalate to Eric only after repetition.
Behavior:
- call internal OpenClaw agent route:
  - `node .../openclaw.mjs agent --agent main --message ...`
- maintain local notify state under `state/watchdog-b/notify-state.json`
- after repeated observations (`WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER`, default `2`), enqueue an owner report
Throttle:
- `WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS` default `900`
Owner escalation mode:
- `WATCHDOG_B_STALLED_OWNER_MODE=escalate` (default)
- `WATCHDOG_B_STALLED_OWNER_MODE=always`
- `WATCHDOG_B_STALLED_OWNER_MODE=never`
Owner delivery mode after enqueue:
- `WATCHDOG_B_OWNER_DELIVERY_MODE=enqueue-only` (default)
- `WATCHDOG_B_OWNER_DELIVERY_MODE=direct-discord`
When `direct-discord` is enabled, watchdog-b still enqueues first, then directly delivers that same pending report via `owner_report_driver.py` to the env-configured Discord target.
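
The escalation decision can be sketched as below. The README names the three owner modes without defining `always` and `never`, so their behavior here is an assumption inferred from the names: `always` escalates on the first observation and `never` suppresses owner reports entirely. The function name `should_escalate_to_owner` is hypothetical, and the threshold is assumed to be inclusive.

```python
def should_escalate_to_owner(owner_mode, consecutive_observations, escalation_after=2):
    """Return True when a stalled observation should enqueue an owner report.

    owner_mode mirrors WATCHDOG_B_STALLED_OWNER_MODE; escalation_after
    mirrors WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER.
    """
    if owner_mode == "never":
        # Assumed: owner is never notified, only the main agent is nudged.
        return False
    if owner_mode == "always":
        # Assumed: escalate as soon as the state is observed at all.
        return consecutive_observations >= 1
    if owner_mode == "escalate":
        # Default: escalate only after repeated observations.
        return consecutive_observations >= escalation_after
    raise ValueError(f"unknown WATCHDOG_B_STALLED_OWNER_MODE: {owner_mode!r}")
```

Whether the resulting report is only queued or also delivered would then depend on `WATCHDOG_B_OWNER_DELIVERY_MODE`.
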
## Owner-facing message style
The owner-facing Discord message is compact and conclusion-first:
- first line: headline (`🔔 [watchdog-b] <worker>`)
- second line: concise conclusion with emoji, e.g. `✅ 主程序仍在運行` ("main program is still running")
- third line: actionable next step, prefixed with `→`
- last line: compact technical metadata (`task=... | status=... | progress=... | source=...`)
Style knobs live in `watchdog-b.env`:
- `WATCHDOG_B_RUNNING_EMOJI`, `WATCHDOG_B_RUNNING_SUMMARY`
- `WATCHDOG_B_STALLED_EMOJI`, `WATCHDOG_B_STALLED_SUMMARY`
- `WATCHDOG_B_IDLE_EMOJI`, `WATCHDOG_B_IDLE_SUMMARY`
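
The four-line layout described above could be rendered roughly like this. The function name `render_owner_message` and its parameters are hypothetical; only the line order, the `→` prefix, and the `key=value | key=value` metadata shape come from this README.

```python
def render_owner_message(worker, emoji, summary, next_step, metadata):
    """Build the compact, conclusion-first owner message.

    metadata is a mapping rendered in insertion order as
    `key=value | key=value` on the last line.
    """
    meta_line = " | ".join(f"{key}={value}" for key, value in metadata.items())
    return "\n".join(
        [
            f"🔔 [watchdog-b] {worker}",  # headline
            f"{emoji} {summary}",         # conclusion
            f"→ {next_step}",             # actionable next step
            meta_line,                    # technical metadata
        ]
    )
```

In this sketch the emoji and summary arguments would come from the matching `WATCHDOG_B_*_EMOJI` and `WATCHDOG_B_*_SUMMARY` values for the detected state.
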
### 3) `idle`
Default: same pattern as stalled, but slower.
Behavior:
- nudge main agent first
- only escalate to Eric after repeated idle detections
Throttle:
- `WATCHDOG_B_IDLE_NUDGE_MIN_INTERVAL_SECONDS` default `1800`
Owner escalation threshold:
- `WATCHDOG_B_IDLE_OWNER_ESCALATION_AFTER` default `2`
## Safety defaults
- `WATCHDOG_B_NOTIFY_DRY_RUN=1` by default in `run_watchdog_b.sh`
- owner-facing send path keeps the existing `owner-reporting-system` queue/artifact/audit flow, but direct Discord delivery no longer depends on the cron wrapper's default destination semantics
- local state tracks last send time / count to reduce spam
- a failed notifier does not crash the dispatcher; it emits a warning and preserves artifacts
## Key artifacts
Under `state/watchdog-b/`:
- `last-output.txt` — rendered dispatcher output
- `last-notify-output.txt` — notifier JSON result
- `last-state.txt` — last state
- `history.tsv` — state history
- `notify-state.json` — throttle / repetition tracking
## Manual test examples
### Dry-run running path
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
WATCHDOG_B_RUNNING_REPORT_MODE=manual \
./scripts/watchdog-b/notify_watchdog_b.py --state running --dry-run
```
### Real queue creation for running (no send)
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=0 \
WATCHDOG_B_RUNNING_REPORT_MODE=enqueue \
./scripts/watchdog-b/notify_watchdog_b.py --state running
ls -l ~/.clawteam/owner-reports/pending
```
### Runtime probe only
```bash
python3 ./scripts/watchdog-b/openclaw_runtime_probe.py --pretty
```
### Single-shot enqueue + direct Discord delivery
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=0 \
WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain \
./scripts/watchdog-b/notify_watchdog_b.py --state running
```
### Dry-run stalled nudge
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
./scripts/watchdog-b/notify_watchdog_b.py --state stalled --dry-run
```
If runtime auto-detection fails on a host with a custom install layout, set one or more of:
- `WATCHDOG_B_NODE_BIN`
- `WATCHDOG_B_OPENCLAW_MJS`
- `WATCHDOG_B_OPENCLAW_ENTRY`
### Full dispatcher dry-run with fixture overrides
```bash
OPENCLAW_PID_FILE=$PWD/tests/fixtures/watchdog-b/running/host-runtime/openclaw.pid \
OPENCLAW_LOG_FILE=$PWD/tests/fixtures/watchdog-b/running/logs/openclaw.log \
WATCHDOG_B_ARTIFACT_DIR=$PWD/state/watchdog-b-test-running-v3 \
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
./scripts/watchdog-b/run_watchdog_b.sh
```
## What is truly wired vs not
### Truly wired now
- state detection
- notifier invocation from dispatcher
- main-agent internal nudge command construction and execution path
- owner-report queue creation via existing producer
- optional direct delivery of an enqueued owner report through `owner_report_driver.py`
- throttling / repetition state persisted locally
### Still conditional / not claimed as universally proven
- successful main-agent wake-up depends on local OpenClaw CLI/runtime being callable from this environment
- successful owner-facing delivery still depends on valid local OpenClaw Discord routing on the host
- direct watchdog-b owner delivery now targets `channel:1480577550445969541` by default and bypasses the wrapper watchdog destination logic
- the cron wrapper remains available for generic queue draining, but watchdog-b no longer needs to rely on it for single-shot owner-facing verification