Initial import of watchdog-discord-route skill

Alice
2026-04-22 08:33:51 +08:00
commit 8138fb011d
22 changed files with 2447 additions and 0 deletions

@@ -0,0 +1,242 @@
# Owner Report Operator Manual
## Purpose
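The owner-report system turns every checkpoint that needs a proactive report into a notification pipeline that is observable, verifiable, and re-sendable.
It is meant for:
- long-running, multi-step tasks
- progress reports from ClawTeam, workers, or background processes
- situations where a verbal promise to "report back later" is not enough
It does not try to be a complex event platform; the goal is to be **simple, reliable, and never report a failure as a success**.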
## Architecture
Minimal pipeline:
Normal queue drain:
`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`
watchdog-b single-shot direct send:
`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`
Components:
- `scripts/owner_report_producer.py`
  - Writes explicit checkpoint fields to `~/.clawteam/owner-reports/pending/<report_id>.md`
- `scripts/owner_report_consumer.py`
  - Reads a pending report and converts it to standard JSON
- `scripts/owner_report_driver.py`
  - Calls the external send command; **a report is moved to** `sent/` **only after the send succeeds**
- `scripts/owner_report_watchdog.py`
  - Scans pending once per invocation; by default processes 1 report, oldest first
- `scripts/run_owner_report_watchdog.sh`
  - Local wrapper that pins the send command, target, and max-count; intended for cron
Directories:
- pending: `~/.clawteam/owner-reports/pending/`
- sent: `~/.clawteam/owner-reports/sent/`
- cron log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`
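
The driver's "archive only after a successful send" rule can be sketched as follows. This is an illustrative sketch, not the actual `owner_report_driver.py`: the function name `send_and_archive` is hypothetical, and it assumes the send command runs through a shell with the report text exported as `OWNER_REPORT_MESSAGE` and that exit code 0 means success.

```python
import os
import shutil
import subprocess
from pathlib import Path


def send_and_archive(report: Path, sent_dir: Path, send_cmd: str) -> bool:
    """Run send_cmd with the report text exported; archive only on exit code 0."""
    # Assumption: the send command reads the message from $OWNER_REPORT_MESSAGE.
    env = dict(os.environ, OWNER_REPORT_MESSAGE=report.read_text(encoding="utf-8"))
    result = subprocess.run(send_cmd, shell=True, env=env)
    if result.returncode != 0:
        # Leave the report in pending/ so the failure stays visible.
        return False
    sent_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(report), str(sent_dir / report.name))
    return True
```

Because the move happens strictly after the exit-code check, a file in `sent/` can only mean the send command reported success.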
## When to use
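Use it for:
- multi-step tasks that span time
- work with clear checkpoints or status changes
- work that uses ClawTeam, subagents, the watchdog, or cron
- tasks where Eric has explicitly asked that no report be missed
Do not use it for:
- short Q&A that is answered in one reply
- small edits and low-risk single-step operations
- ordinary work that needs no proactive notification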
## Normal flow
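1. A task reaches a checkpoint worth reporting
2. `owner_report_producer.py` creates a pending report
3. cron runs `run_owner_report_watchdog.sh` every minute
4. The watchdog picks the oldest report in `pending/`
5. The driver sends it with `OWNER_REPORT_SEND_CMD`
6. On success: the report is moved to `sent/`
7. On failure: the report stays in `pending/` and the current run stops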
### Failure semantics
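- **A failed send is never archived**
- **The backlog is processed oldest-first**
- **A run stops at the first failure**, so later successes cannot be mistaken for the whole pipeline being healthy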
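
These three rules amount to a small loop. The sketch below is a hypothetical illustration, not the real `owner_report_watchdog.py`: `drain_once` is an invented name, `send` stands in for whatever delivers one report and returns `True` on success, and "oldest" is assumed to mean the earliest file modification time, which the manual does not specify.

```python
from pathlib import Path
from typing import Callable, List


def drain_once(pending_dir: Path, send: Callable[[Path], bool], max_count: int = 1) -> List[str]:
    """One scan: oldest-first, at most max_count reports, stop at the first failure."""
    # Assumption: age is judged by mtime; the file name only breaks ties.
    reports = sorted(pending_dir.glob("*.md"), key=lambda p: (p.stat().st_mtime, p.name))
    delivered = []
    for report in reports[:max_count]:
        if not send(report):
            break  # do not skip ahead; later reports wait for the next run
        delivered.append(report.name)
    return delivered
```

Stopping instead of skipping is what keeps a stuck report visible: nothing newer can be delivered until the oldest failure is resolved.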
## Common commands
### 1) Inspect the queue
```bash
ls -l ~/.clawteam/owner-reports/pending
ls -l ~/.clawteam/owner-reports/sent
```
### 2) Manually create a report
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team clawteam \
--worker backend-a \
--task-id example-task \
--progress 80% \
--done 'export complete' \
--next 'wait for aggregation' \
--status normal \
--source checkpoint-complete
```
### 3) producer dry-run
```bash
uv run python owner_report_producer.py \
--team clawteam \
--worker backend-a \
--task-id example-task \
--progress 80% \
--done 'export complete' \
--next 'wait for aggregation' \
--status normal \
--dry-run
```
### 4) watchdog dry-run
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
### 5) Run the watchdog manually right now
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### 6) Drain more backlog in one run
```bash
OWNER_REPORT_MAX_COUNT=20 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### 7) Check cron
```bash
crontab -l
```
### 8) View the watchdog log
```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```
## Debugging checklist
When a report was not sent, check in this order:
1. **Are there files in pending?**
```bash
ls -l ~/.clawteam/owner-reports/pending
```
2. **Does the report content look reasonable?**
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_consumer.py <report_id_or_path>
```
3. **Does the driver dry-run work?**
```bash
uv run python owner_report_driver.py <report_id_or_path> --dry-run
```
4. **Which report does the watchdog dry-run pick?**
```bash
./run_owner_report_watchdog.sh --dry-run
```
5. **Does the cron entry exist?**
```bash
crontab -l
```
6. **Does the cron log show errors?**
```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```
7. **Do Node and the openclaw entry exist?**
   - `run_owner_report_watchdog.sh` checks:
     - `NODE_BIN`
     - `OPENCLAW_ENTRY`
8. **Can the send command actually deliver?**
   - Both the wrapper and the watchdog-b direct send rely on `OWNER_REPORT_SEND_CMD` / `--send-cmd`
   - The current owner-facing default target is Discord `channel:1480577550445969541`
   - For a watchdog-b single-shot check, run the driver directly: `owner_report_driver.py <report_id> --send-cmd '...message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'`
9. **Is a backlog stuck?**
   - Fix the oldest failing report first; the watchdog is oldest-first and stops on failure
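
When a backlog is stuck, it helps to see which report is oldest and how long it has been waiting. The helper below is a hypothetical sketch and not one of the shipped scripts; `pending_by_age` is an invented name, and age is assumed to be derived from file modification time.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def pending_by_age(pending_dir: Path, now: Optional[float] = None) -> List[Tuple[str, int]]:
    """Return (file name, age in seconds) for each pending report, oldest first."""
    if not pending_dir.is_dir():
        return []
    now = time.time() if now is None else now
    # Assumption: the oldest report is the one with the earliest mtime.
    files = sorted(pending_dir.glob("*.md"), key=lambda p: (p.stat().st_mtime, p.name))
    return [(p.name, int(now - p.stat().st_mtime)) for p in files]


if __name__ == "__main__":
    pending = Path.home() / ".clawteam" / "owner-reports" / "pending"
    for name, age in pending_by_age(pending):
        print(f"{age:>8}s  {name}")
```

The first line of output is the report to debug with the consumer and the driver dry-run; with a once-a-minute cron, an age well beyond a few minutes suggests that report keeps failing.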
## Cron / watchdog behavior
Current local setup:
- schedule: `* * * * *`
- command: `run_owner_report_watchdog.sh`
- target: Discord `channel:1480577550445969541` by default
- max backlog per run: `5` by default
The wrapper:
1. Pins `OWNER_REPORT_SEND_CMD`
2. Pins the owner-facing channel and target (defaults: `OWNER_REPORT_CHANNEL=discord`, `OWNER_REPORT_TARGET=channel:1480577550445969541`; both can be overridden)
3. Calls `owner_report_watchdog.py --max-count "$OWNER_REPORT_MAX_COUNT"`
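
The wrapper's default-with-override behavior can be modeled roughly like this. The real wrapper is a shell script whose source is not shown here, so this Python sketch only mirrors the documented defaults: `resolve_settings` is a hypothetical name, and treating an empty variable as unset (like the shell's `${VAR:-default}`) is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as documented for the local wrapper.
DEFAULTS = {
    "OWNER_REPORT_CHANNEL": "discord",
    "OWNER_REPORT_TARGET": "channel:1480577550445969541",
    "OWNER_REPORT_MAX_COUNT": "5",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Apply environment overrides on top of the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: an empty value falls back to the default, as ${VAR:-default} would.
    settings: Dict[str, object] = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    settings["OWNER_REPORT_MAX_COUNT"] = int(str(settings["OWNER_REPORT_MAX_COUNT"]))
    return settings
```

For example, `resolve_settings({"OWNER_REPORT_MAX_COUNT": "20"})` keeps the Discord channel and target but raises the per-run limit to 20.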
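The watchdog itself:
- is not a daemon
- does not stay resident
- does no retry or backoff
- performs a single scan per invocation
- processes 1 report by default; the wrapper raises that default to 5
- with `--all`, scans the entire current backlog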
## Caveats
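- This is not a message bus, nor a full job system
- There is no built-in retry, dedupe, or database
- If the oldest pending report keeps failing, everything behind it is blocked
- `sent/` means "the send command succeeded"; it does not guarantee that a human has read the message
- The producer relies on explicit field input and does not interpret arbitrary logs
- cron / PATH / Node version problems are reduced as far as possible by having the wrapper pin Node and the OpenClaw entry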
## Examples
### Example A: Checkpoint for a general task
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team general-task \
--worker alice \
--task-id manual-checkpoint-1 \
--progress 50% \
--done 'phase 1 complete' \
--next 'wait for phase 2 results' \
--status normal \
--source manual-checkpoint
```
### Example B: Verify without sending
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
### Example C: watchdog-b single-shot direct send to the owner-facing Discord
First use the probe to locate the local runtime:
```bash
python3 /home/chchang/.openclaw/workspace/skills/watchdog-discord-route/scripts/openclaw_runtime_probe.py --pretty
```
Then pass in the detected `node` / `dist/entry.js`:
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
python3 owner_report_driver.py <report_id> \
--send-cmd '"<detected-node>" "<detected-entry.js>" message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'
```
### Example D: Temporarily send to a different channel / target
```bash
OWNER_REPORT_CHANNEL=telegram \
OWNER_REPORT_TARGET=123456789 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### Example E: Drain the backlog quickly
```bash
OWNER_REPORT_MAX_COUNT=10 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
## Operator summary
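If you remember only three things:
1. **A report must land in `pending/` before there is anything to send**
2. **A report moves to `sent/` only after the send succeeds**
3. **When a notification is missing, check pending, the cron log, and the oldest failing report first**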