Initial import of watchdog-discord-route skill

Alice
2026-04-22 08:33:51 +08:00
commit 8138fb011d
22 changed files with 2447 additions and 0 deletions


@@ -0,0 +1,242 @@
# Owner Report Operator Manual
## Purpose
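The owner-report system turns "checkpoints that need proactive reporting" into a notification pipeline that is observable, verifiable, and re-sendable.
It is suited to:
- Long-running, multi-step tasks
- Progress reports from ClawTeam, workers, and background processes
- Situations where a verbal promise to "report back later" is not enough
It does not aim to be a complex event platform; the goal is to be **simple, reliable, and never report a failure as a success**.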
## Architecture
Minimal pipeline:
General queue drain:
`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`
watchdog-b single-shot direct send:
`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`
Components:
- `scripts/owner_report_producer.py`
  - Writes explicit checkpoint fields to `~/.clawteam/owner-reports/pending/<report_id>.md`
- `scripts/owner_report_consumer.py`
  - Reads a pending report and converts it to standard JSON
- `scripts/owner_report_driver.py`
  - Calls the external send command; **a report moves to** `sent/` **only after a successful send**
- `scripts/owner_report_watchdog.py`
  - Scans pending once per invocation; by default processes 1 report, oldest first
- `scripts/run_owner_report_watchdog.sh`
  - Local wrapper that pins the send command, target, and max-count for use by cron
Directories:
- pending: `~/.clawteam/owner-reports/pending/`
- sent: `~/.clawteam/owner-reports/sent/`
- cron log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`
## When to use
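Use it for:
- Multi-step tasks that span time
- Work with clear checkpoints or status changes
- Work that uses ClawTeam, subagents, watchdog, or cron
- Tasks where Eric has explicitly asked that no report be missed
Do not use it for:
- Short Q&A that can be answered in one reply
- Small changes and low-risk single-step operations
- Ordinary work that needs no proactive notification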
## Normal flow
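1. A checkpoint worth reporting occurs in the task
2. `owner_report_producer.py` creates one pending report
3. cron runs `run_owner_report_watchdog.sh` every minute
4. The watchdog picks the oldest report from `pending/`
5. The driver sends it using `OWNER_REPORT_SEND_CMD`
6. On success: the report moves to `sent/`
7. On failure: the report stays in `pending/` and the run stops
### Failure semantics
- **A failed send is never archived**
- **The backlog is processed oldest-first**
- **Processing stops at the first failure**, so later successes cannot be mistaken for an overall healthy state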
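The three rules above can be sketched as a single drain loop. This is a minimal illustration, not the shipped `owner_report_watchdog.py`: the function name `drain_pending`, the `send` callback, and the use of file modification time to define "oldest" are all assumptions.

```python
from pathlib import Path
from typing import Callable, List


def drain_pending(
    pending_dir: Path,
    sent_dir: Path,
    send: Callable[[Path], bool],
    max_count: int = 1,
) -> List[Path]:
    """Send up to max_count reports, oldest first; stop at the first failure."""
    sent_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: "oldest" means smallest modification time; the name breaks ties.
    queue = sorted(
        pending_dir.glob("*.md"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    archived: List[Path] = []
    for report in queue[:max_count]:
        if not send(report):
            # Failed send: leave the file in pending/ and end this run.
            break
        target = sent_dir / report.name
        report.rename(target)  # archive only after a confirmed send
        archived.append(target)
    return archived
```

Because the loop breaks instead of skipping, a stuck report at the head of the queue stays visible in `pending/` rather than being hidden behind newer successes.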
## Common commands
### 1) Inspect the queue
```bash
ls -l ~/.clawteam/owner-reports/pending
ls -l ~/.clawteam/owner-reports/sent
```
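The same check can be scripted when a number is more useful than a listing. A minimal sketch, assuming reports are `*.md` files and that age is measured by file modification time; `queue_summary` is a hypothetical helper, not one of the shipped scripts.

```python
import time
from pathlib import Path
from typing import Optional

# Queue root as documented above.
QUEUE_ROOT = Path.home() / ".clawteam" / "owner-reports"


def queue_summary(root: Path = QUEUE_ROOT, now: Optional[float] = None) -> dict:
    """Count pending and sent reports and report the age of the oldest pending one."""
    now = time.time() if now is None else now
    pending = sorted(
        (root / "pending").glob("*.md"),
        key=lambda p: p.stat().st_mtime,
    )
    sent_count = sum(1 for _ in (root / "sent").glob("*.md"))
    # None means the pending queue is empty.
    oldest_age = now - pending[0].stat().st_mtime if pending else None
    return {
        "pending": len(pending),
        "sent": sent_count,
        "oldest_pending_age_seconds": oldest_age,
    }
```

A growing `oldest_pending_age_seconds` alongside a non-empty queue is a hint that the head of the queue keeps failing.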
### 2) Manually create a report
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team clawteam \
--worker backend-a \
--task-id example-task \
--progress 80% \
--done 'export complete' \
--next 'wait for aggregation' \
--status normal \
--source checkpoint-complete
```
### 3) producer dry-run
```bash
uv run python owner_report_producer.py \
--team clawteam \
--worker backend-a \
--task-id example-task \
--progress 80% \
--done 'export complete' \
--next 'wait for aggregation' \
--status normal \
--dry-run
```
### 4) watchdog dry-run
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
### 5) Run the watchdog manually right now
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### 6) Process more of the backlog in one run
```bash
OWNER_REPORT_MAX_COUNT=20 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### 7) Check cron
```bash
crontab -l
```
### 8) Read the watchdog log
```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```
## Debugging checklist
When a report was not sent, check in this order:
1. **Is there a file in pending?**
```bash
ls -l ~/.clawteam/owner-reports/pending
```
2. **Does the report content look reasonable?**
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_consumer.py <report_id_or_path>
```
3. **Does the driver dry-run work?**
```bash
uv run python owner_report_driver.py <report_id_or_path> --dry-run
```
4. **Which report does the watchdog dry-run pick?**
```bash
./run_owner_report_watchdog.sh --dry-run
```
5. **Does the cron entry exist?**
```bash
crontab -l
```
6. **Does the cron log show errors?**
```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```
7. **Do the Node / openclaw entry paths exist?**
   - `run_owner_report_watchdog.sh` checks:
     - `NODE_BIN`
     - `OPENCLAW_ENTRY`
8. **Can the send command deliver successfully?**
   - Both the wrapper and the watchdog-b direct send rely on `OWNER_REPORT_SEND_CMD` / `--send-cmd`
   - The current owner-facing default target is Discord `channel:1480577550445969541`
   - For watchdog-b single-shot verification, call the driver directly: `owner_report_driver.py <report_id> --send-cmd '...message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'`
9. **If the backlog is stuck**
   - Fix the oldest failing report first; the watchdog is oldest-first and stops on failure
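For item 8, it can help to reproduce in isolation how a send command receives the message. The send commands shown in this manual read the text from `$OWNER_REPORT_MESSAGE`, so the sketch below runs a command through the shell with that variable set. Treating exit code 0 as success and the `timeout` parameter are assumptions; `run_send_command` is a hypothetical name, not the driver's actual function.

```python
import os
import subprocess


def run_send_command(send_cmd: str, message: str, timeout: float = 60.0) -> bool:
    """Run send_cmd through the shell with the message exposed as an env var."""
    # The command can reference "$OWNER_REPORT_MESSAGE", as in the examples above.
    env = dict(os.environ, OWNER_REPORT_MESSAGE=message)
    try:
        completed = subprocess.run(send_cmd, shell=True, env=env, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumption: a hung sender counts as a failed send.
        return False
    # Assumption: only exit code 0 means the message was delivered.
    return completed.returncode == 0
```

Passing the message through the environment instead of interpolating it into the command string avoids quoting problems when the report text contains quotes or newlines.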
## Cron / watchdog behavior
Current local setup:
- schedule: `* * * * *`
- command: `run_owner_report_watchdog.sh`
- target: Discord `channel:1480577550445969541` by default
- max backlog per run: `5` by default
The wrapper:
1. Pins `OWNER_REPORT_SEND_CMD`
2. Pins the owner-facing channel/target (defaults: `OWNER_REPORT_CHANNEL=discord`, `OWNER_REPORT_TARGET=channel:1480577550445969541`; both can be overridden)
3. Calls `owner_report_watchdog.py --max-count "$OWNER_REPORT_MAX_COUNT"`
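The wrapper itself is a shell script; the default-plus-override behavior it applies can be sketched in Python as follows. The default values are the ones stated above, while `resolve_wrapper_settings` and the choice to treat an empty variable as unset are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Defaults as documented for the local wrapper.
DEFAULTS = {
    "OWNER_REPORT_CHANNEL": "discord",
    "OWNER_REPORT_TARGET": "channel:1480577550445969541",
    "OWNER_REPORT_MAX_COUNT": "5",
}


def resolve_wrapper_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the effective settings: caller-provided env wins over defaults."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated like an unset one.
    settings = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    settings["OWNER_REPORT_MAX_COUNT"] = int(settings["OWNER_REPORT_MAX_COUNT"])
    return settings
```

Calling `resolve_wrapper_settings({"OWNER_REPORT_MAX_COUNT": "20"})` mirrors the one-off override shown in the common commands: the count changes while channel and target keep their defaults.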
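Properties of the watchdog itself:
- Not a daemon
- Not a resident process
- No retry / backoff
- Performs a single scan per invocation
- Processes 1 report by default; the wrapper raises the default to 5
- `--all` scans the entire current backlog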
## Caveats
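- This is not a message bus, nor a full job system
- There is no built-in retry, dedupe, or database
- If the oldest pending report keeps failing, everything behind it is blocked
- `sent/` means "the send command succeeded"; it does not guarantee that a human has read the message
- The producer depends on explicit field input; it does not interpret arbitrary logs
- cron / PATH / Node version problems are reduced as far as possible by having the wrapper pin Node and the OpenClaw entry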
## Examples
### Example A: General task checkpoint
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team general-task \
--worker alice \
--task-id manual-checkpoint-1 \
--progress 50% \
--done 'phase 1 complete' \
--next 'waiting for phase 2 results' \
--status normal \
--source manual-checkpoint
```
### Example B: Validate only, do not send
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
### Example C: watchdog-b single-shot direct send to the owner-facing Discord
First use the probe to locate the local runtime:
```bash
python3 /home/chchang/.openclaw/workspace/skills/watchdog-discord-route/scripts/openclaw_runtime_probe.py --pretty
```
Then pass in the detected `node` / `dist/entry.js`:
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
python3 owner_report_driver.py <report_id> \
--send-cmd '"<detected-node>" "<detected-entry.js>" message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'
```
### Example D: Temporarily send to a different channel / target
```bash
OWNER_REPORT_CHANNEL=telegram \
OWNER_REPORT_TARGET=123456789 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### Example E: Drain the backlog quickly
```bash
OWNER_REPORT_MAX_COUNT=10 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
## Operator summary
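If you remember only three things:
1. **A report must land in `pending/` before there is anything to send**
2. **A report moves to `sent/` only after a successful send**
3. **When a notification is missing, check pending, the cron log, and the oldest failing report first**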


@@ -0,0 +1,80 @@
# Owner Reporting System
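This is a global, owner-facing proactive reporting flow; it does not belong to any single project.
Its purpose is to turn long-running, multi-step work that must not miss a report into a notification pipeline that is observable, verifiable, and never reports a failure as a success.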
## Core flow
General queue drain path:
`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`
Watchdog-b single-shot direct path:
`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`
Components:
- `scripts/owner_report_producer.py`
- `scripts/owner_report_consumer.py`
- `scripts/owner_report_driver.py`
- `scripts/owner_report_watchdog.py`
- `scripts/run_owner_report_watchdog.sh`
- `OWNER_REPORT_OPERATOR_MANUAL.md`
## Scope
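Applies to:
- ClawTeam / subagent / background-process checkpoints
- Multi-step technical tasks
- Assignments with an explicit requirement that no report be missed
- Notification pipelines that need oldest-first / success-only archive / stop-on-failure semantics
Does not apply to:
- One-off short Q&A
- Small changes that need no proactive notification
- Low-risk tasks that can be answered in a single reply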
## Queue paths
- pending: `~/.clawteam/owner-reports/pending/`
- sent: `~/.clawteam/owner-reports/sent/`
## Local integration
Locally, the user crontab currently runs the watchdog wrapper once per minute:
- wrapper: `/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh`
- log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`
- default target: `OWNER_REPORT_CHANNEL=discord` + `OWNER_REPORT_TARGET=channel:1480577550445969541`
- backlog per run: `OWNER_REPORT_MAX_COUNT=5` by default
In addition, watchdog-b owner-facing single-shot verification can now go directly through `owner_report_driver.py`, without relying on the wrapper watchdog's target/display semantics.
## Common commands
```bash
# produce one checkpoint report
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team general-task \
--worker alice \
--task-id example-task \
--progress 50% \
--done 'phase 1 complete' \
--next 'waiting for phase 2 results' \
--status normal \
--source manual-checkpoint
# dry-run watchdog
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
# process backlog immediately
OWNER_REPORT_MAX_COUNT=20 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
# temporarily override destination
OWNER_REPORT_CHANNEL=telegram \
OWNER_REPORT_TARGET=864811879 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
For fuller operating, debugging, and failure-semantics details, see `OWNER_REPORT_OPERATOR_MANUAL.md`.


@@ -0,0 +1,192 @@
# Watchdog B v3 notification layer
Single source of truth for owner-facing policy: `~/.config/openclaw/watchdog-b.env`
Runtime auto-detection source: `scripts/openclaw_runtime_probe.py`
This directory now contains:
- `check_openclaw_state.sh` — tri-state checker (`running` / `stalled` / `idle`)
- `run_watchdog_b.sh` — dispatcher + notification runner
- `notify_watchdog_b.py` — minimal notification integration layer
## Configuration source
Priority order is now:
1. process env already set by caller/systemd
2. `WATCHDOG_B_CONFIG_FILE` if set
3. fallback `~/.config/openclaw/watchdog-b.env`
4. code defaults
This means both `run_watchdog_b.sh` and `notify_watchdog_b.py` can be invoked manually and still resolve the same owner-facing channel / target / mode / wording.
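The priority order can be sketched as a layered merge. This is an illustration under stated assumptions, not the code in `notify_watchdog_b.py`: it assumes the env file is plain `KEY=VALUE` lines, that `WATCHDOG_B_CONFIG_FILE` replaces the fallback path rather than being merged with it, and the helper names `parse_env_file` and `resolve_config` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

FALLBACK_CONFIG = Path.home() / ".config" / "openclaw" / "watchdog-b.env"


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Naive quote handling: strip surrounding quote characters only.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_config(
    defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge settings so that process env > config file > code defaults."""
    env = os.environ if env is None else env
    explicit = env.get("WATCHDOG_B_CONFIG_FILE")
    config_path = Path(explicit) if explicit else FALLBACK_CONFIG
    resolved = dict(defaults)                      # lowest priority: code defaults
    resolved.update(parse_env_file(config_path))   # then the config file
    resolved.update(                               # highest: caller/systemd env
        {k: v for k, v in env.items() if k.startswith("WATCHDOG_B_")}
    )
    return resolved
```

Applying the layers from lowest to highest priority keeps the rule easy to audit: whichever source is written last wins.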
For Node/OpenClaw runtime paths, the bundled scripts now resolve in this order:
1. explicit env overrides: `WATCHDOG_B_NODE_BIN`, `WATCHDOG_B_OPENCLAW_MJS`, `WATCHDOG_B_OPENCLAW_ENTRY`
2. PATH lookup for `node` and `openclaw`
3. common install roots scan: nvm, pnpm global, npm-global, `/usr/local`, `/usr`, Volta-style trees
4. fail with an operator-facing error that tells you which env vars to set manually
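The four-step lookup can be sketched for the Node binary alone. This is not the logic of `openclaw_runtime_probe.py`; the `bin/node` layout under each search root, the function name `resolve_node_bin`, and the error wording are assumptions chosen for illustration.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, Mapping, Optional


def resolve_node_bin(
    env: Optional[Mapping[str, str]] = None,
    search_roots: Iterable[Path] = (),
) -> str:
    """Resolve the Node binary: env override, then PATH, then known roots."""
    env = os.environ if env is None else env
    # 1. Explicit override always wins.
    override = env.get("WATCHDOG_B_NODE_BIN")
    if override:
        return override
    # 2. PATH lookup, limited to the PATH of the given environment.
    found = shutil.which("node", path=env.get("PATH", ""))
    if found:
        return found
    # 3. Scan install roots (assumed layout: <root>/bin/node).
    for root in search_roots:
        candidate = Path(root) / "bin" / "node"
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    # 4. Fail with a message that names the variable to set.
    raise FileNotFoundError(
        "node not found; set WATCHDOG_B_NODE_BIN to the Node binary path"
    )
```

Under cron, `PATH` is usually minimal, which is why the scan and the explicit override matter more there than in an interactive shell.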
Repo template to copy from:
- `ops/systemd/user/watchdog-b.env.example`
Install example:
```bash
mkdir -p ~/.config/openclaw
cp ~/.openclaw/workspace/ops/systemd/user/watchdog-b.env.example ~/.config/openclaw/watchdog-b.env
$EDITOR ~/.config/openclaw/watchdog-b.env
```
## Notification strategy
### 1) `running`
Default: **manual / queue-ready only**.
Why:
- a healthy runtime should not spam Eric every 10 minutes
- owner-facing reporting should remain explicit and auditable
Behavior:
- `WATCHDOG_B_RUNNING_REPORT_MODE=manual` (default)
  - does not create external messages
  - returns a concrete hint for how to enable queue creation
- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue`
  - creates a real pending owner report in `~/.clawteam/owner-reports/pending/`
  - does **not** auto-send it
- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain`
  - creates a pending owner report and immediately delivers that exact pending item through `owner_report_driver.py`
  - send path is direct OpenClaw Discord send using the env-configured owner-facing destination
  - this keeps queue/audit semantics but avoids depending on the wrapper watchdog's destination/visibility behavior
Throttle:
- `WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS` default `3600`
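A minimum-interval throttle of this kind can be sketched as follows. The state is modeled as a plain dict keyed by event name; the real layout of `notify-state.json` is not documented here, so the key names and the helper `allow_and_record` are hypothetical.

```python
def allow_and_record(
    state: dict,
    key: str,
    now: float,
    min_interval_seconds: float = 3600,
) -> bool:
    """Return True and record `now` if enough time has passed since the last send."""
    last_sent = state.get(key)
    if last_sent is not None and now - last_sent < min_interval_seconds:
        # Still inside the quiet window: suppress and keep the old timestamp.
        return False
    state[key] = now
    return True
```

Keeping the old timestamp on suppression means the window is measured from the last message actually sent, not from the last attempt.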
### 2) `stalled`
Default: **nudge main agent first**, then escalate to Eric only after repetition.
Behavior:
- call internal OpenClaw agent route:
  - `node .../openclaw.mjs agent --agent main --message ...`
- maintain local notify state under `state/watchdog-b/notify-state.json`
- after repeated observations (`WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER`, default `2`), enqueue an owner report
Throttle:
- `WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS` default `900`
Owner escalation mode:
- `WATCHDOG_B_STALLED_OWNER_MODE=escalate` (default)
- `WATCHDOG_B_STALLED_OWNER_MODE=always`
- `WATCHDOG_B_STALLED_OWNER_MODE=never`
Owner delivery mode after enqueue:
- `WATCHDOG_B_OWNER_DELIVERY_MODE=enqueue-only` (default)
- `WATCHDOG_B_OWNER_DELIVERY_MODE=direct-discord`
When `direct-discord` is enabled, watchdog-b still enqueues first, then directly delivers that same pending report via `owner_report_driver.py` to the env-configured Discord target.
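The escalation and delivery knobs combine roughly as sketched below. The mode names come from the text above, but their exact semantics are assumptions here: `always` is read as "escalate on every stalled observation", `never` as "do not escalate", and the threshold as "escalate once the count reaches it". Both function names are hypothetical.

```python
from typing import List


def should_escalate_to_owner(
    consecutive_observations: int,
    mode: str = "escalate",
    threshold: int = 2,
) -> bool:
    """Decide whether a stalled state should produce an owner report."""
    if mode == "never":
        return False
    if mode == "always":
        return True
    if mode == "escalate":
        # Assumption: escalation starts when the count reaches the threshold.
        return consecutive_observations >= threshold
    raise ValueError(f"unknown WATCHDOG_B_STALLED_OWNER_MODE: {mode!r}")


def owner_delivery_steps(delivery_mode: str = "enqueue-only") -> List[str]:
    """List the steps taken once escalation is decided; enqueue always comes first."""
    steps = ["enqueue"]
    if delivery_mode == "direct-discord":
        steps.append("deliver-via-driver")
    return steps
```

Enqueueing first in both modes preserves the `pending/` to `sent/` audit trail even when delivery is direct.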
## Owner-facing message style
Owner-facing Discord message is now compact and conclusion-first:
- first line: headline (`🔔 [watchdog-b] <worker>`)
- second line: concise conclusion with emoji, e.g. `✅ 主程序仍在運行` ("the main process is still running")
- third line: actionable next step, prefixed with `→`
- last line: compact technical metadata (`task=... | status=... | progress=... | source=...`)
Style knobs live in `watchdog-b.env`:
- `WATCHDOG_B_RUNNING_EMOJI`, `WATCHDOG_B_RUNNING_SUMMARY`
- `WATCHDOG_B_STALLED_EMOJI`, `WATCHDOG_B_STALLED_SUMMARY`
- `WATCHDOG_B_IDLE_EMOJI`, `WATCHDOG_B_IDLE_SUMMARY`
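The four-line layout can be sketched as a small formatter. The line order and the metadata fields follow the description above; the function name `render_owner_message` and its parameters are hypothetical, and the real notifier may add or trim fields.

```python
def render_owner_message(
    worker: str,
    emoji: str,
    summary: str,
    next_step: str,
    task: str,
    status: str,
    progress: str,
    source: str,
) -> str:
    """Build the compact, conclusion-first owner message."""
    lines = [
        f"🔔 [watchdog-b] {worker}",   # headline
        f"{emoji} {summary}",          # conclusion first
        f"→ {next_step}",              # actionable next step
        f"task={task} | status={status} | progress={progress} | source={source}",
    ]
    return "\n".join(lines)
```

The emoji and summary arguments correspond to the per-state `*_EMOJI` and `*_SUMMARY` knobs listed above.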
### 3) `idle`
Default: same pattern as stalled, but slower.
Behavior:
- nudge main agent first
- only escalate to Eric after repeated idle detections
Throttle:
- `WATCHDOG_B_IDLE_NUDGE_MIN_INTERVAL_SECONDS` default `1800`
Owner escalation threshold:
- `WATCHDOG_B_IDLE_OWNER_ESCALATION_AFTER` default `2`
## Safety defaults
- `WATCHDOG_B_NOTIFY_DRY_RUN=1` by default in `run_watchdog_b.sh`
- owner-facing send path keeps the existing `owner-reporting-system` queue/artifact/audit flow, but direct Discord delivery no longer depends on the cron wrapper's default destination semantics
- local state tracks last send time / count to reduce spam
- a failed notifier does not crash the dispatcher; it emits warning + preserves artifacts
## Key artifacts
Under `state/watchdog-b/`:
- `last-output.txt` — rendered dispatcher output
- `last-notify-output.txt` — notifier JSON result
- `last-state.txt` — last state
- `history.tsv` — state history
- `notify-state.json` — throttle / repetition tracking
## Manual test examples
### Dry-run running path
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
WATCHDOG_B_RUNNING_REPORT_MODE=manual \
./scripts/watchdog-b/notify_watchdog_b.py --state running --dry-run
```
### Real queue creation for running (no send)
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=0 \
WATCHDOG_B_RUNNING_REPORT_MODE=enqueue \
./scripts/watchdog-b/notify_watchdog_b.py --state running
ls -l ~/.clawteam/owner-reports/pending
```
### Runtime probe only
```bash
python3 ./scripts/watchdog-b/openclaw_runtime_probe.py --pretty
```
### Single-shot enqueue + direct Discord delivery
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=0 \
WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain \
./scripts/watchdog-b/notify_watchdog_b.py --state running
```
### Dry-run stalled nudge
```bash
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
./scripts/watchdog-b/notify_watchdog_b.py --state stalled --dry-run
```
If runtime auto-detection fails on a host with a custom install layout, set one or more of:
- `WATCHDOG_B_NODE_BIN`
- `WATCHDOG_B_OPENCLAW_MJS`
- `WATCHDOG_B_OPENCLAW_ENTRY`
### Full dispatcher dry-run with fixture overrides
```bash
OPENCLAW_PID_FILE=$PWD/tests/fixtures/watchdog-b/running/host-runtime/openclaw.pid \
OPENCLAW_LOG_FILE=$PWD/tests/fixtures/watchdog-b/running/logs/openclaw.log \
WATCHDOG_B_ARTIFACT_DIR=$PWD/state/watchdog-b-test-running-v3 \
WATCHDOG_B_NOTIFY_DRY_RUN=1 \
./scripts/watchdog-b/run_watchdog_b.sh
```
## What is truly wired vs not
### Truly wired now
- state detection
- notifier invocation from dispatcher
- main-agent internal nudge command construction and execution path
- owner-report queue creation via existing producer
- optional direct delivery of an enqueued owner report through `owner_report_driver.py`
- throttling / repetition state persisted locally
### Still conditional / not claimed as universally proven
- successful main-agent wake-up depends on local OpenClaw CLI/runtime being callable from this environment
- successful owner-facing delivery still depends on valid local OpenClaw Discord routing on the host
- direct watchdog-b owner delivery now targets `channel:1480577550445969541` by default and bypasses the wrapper watchdog destination logic
- the cron wrapper remains available for generic queue draining, but watchdog-b no longer needs to rely on it for single-shot owner-facing verification