# Owner Report Operator Manual
## Purpose
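The owner-report system turns "checkpoints that require a proactive report" into a notification pipeline that is observable, verifiable, and resendable.
It is suited to:
- Long-running, multi-step tasks
- Progress reports from ClawTeam, workers, and background processes
- Situations where a verbal promise to "report back later" is not enough
It does not aim to be a complex event platform; the goal is to be **simple, reliable, and never to report a failure as a success**.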
## Architecture
Minimal pipeline:
Standard queue drain:
`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`
watchdog-b single-shot direct send:
`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`
Components:
- `scripts/owner_report_producer.py`
  - Writes explicit checkpoint fields to `~/.clawteam/owner-reports/pending/<report_id>.md`
- `scripts/owner_report_consumer.py`
  - Reads a pending report and converts it to standard JSON
- `scripts/owner_report_driver.py`
  - Calls the external send command; **moves the report to** `sent/` **only if the send succeeds**
- `scripts/owner_report_watchdog.py`
  - Scans pending once per invocation; by default processes 1 report, oldest first
- `scripts/run_owner_report_watchdog.sh`
  - Local wrapper that pins the send command, target, and max-count; intended for cron
Directories:
- pending: `~/.clawteam/owner-reports/pending/`
- sent: `~/.clawteam/owner-reports/sent/`
- cron log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`
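
The driver's "archive only on success" rule can be sketched as follows. This is a minimal illustration, not the actual `owner_report_driver.py`: the function name `deliver` is hypothetical, and it assumes the report text reaches the send command through the `OWNER_REPORT_MESSAGE` environment variable, as the `--send-cmd` examples in this manual suggest.

```python
import os
import shutil
import subprocess
from pathlib import Path


def deliver(report: Path, sent_dir: Path, send_cmd: str) -> bool:
    """Run send_cmd for one report; archive the report only on exit code 0.

    Hypothetical helper: the real driver may differ in details.
    """
    # Assumption: the message is exposed to the command as an env variable.
    env = dict(os.environ, OWNER_REPORT_MESSAGE=report.read_text(encoding="utf-8"))
    result = subprocess.run(send_cmd, shell=True, env=env)
    if result.returncode != 0:
        # Failed send: leave the report in pending/ so it can be retried later.
        return False
    sent_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(report), str(sent_dir / report.name))
    return True
```

Because the move happens only after a zero exit code, a file in `sent/` always corresponds to a send command that reported success.
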
## When to use
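Use it for:
- Multi-step tasks that span time
- Work with clear checkpoints or status changes
- Work that involves ClawTeam, subagents, the watchdog, or cron
- Tasks where Eric has explicitly asked that no report be missed
Do not use it for:
- Short questions that are answered in a single reply
- Small changes and low-risk single-step operations
- Ordinary work that needs no proactive notification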
## Normal flow
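1. A checkpoint worth reporting occurs in the task
2. `owner_report_producer.py` creates a pending report
3. cron runs `run_owner_report_watchdog.sh` every minute
4. The watchdog picks the oldest report from `pending/`
5. The driver sends it using `OWNER_REPORT_SEND_CMD`
6. On success: the report is moved to `sent/`
7. On failure: the report stays in `pending/` and the current run stops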
### Failure semantics
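- **A failed send is never archived**
- **The backlog is processed oldest-first**
- **Processing stops at the first failure**, so that later successes are not mistaken for an overall healthy state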
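
These three rules can be sketched as a single scan loop. This is an illustration under stated assumptions, not the actual `owner_report_watchdog.py`: the function name `drain` and the `send` callback are hypothetical, and "oldest" is assumed here to mean the smallest file modification time, which the manual does not specify.

```python
from pathlib import Path
from typing import Callable, List


def drain(pending_dir: Path, send: Callable[[Path], bool], max_count: int = 1) -> List[Path]:
    """Process up to max_count pending reports, oldest first, stopping at the first failure.

    Hypothetical sketch; `send` returns True only when delivery succeeded.
    """
    # Assumption: age is judged by mtime, with the file name as a tie-breaker.
    reports = sorted(pending_dir.glob("*.md"), key=lambda p: (p.stat().st_mtime, p.name))
    delivered: List[Path] = []
    for report in reports[:max_count]:
        if not send(report):
            # Stop immediately: later reports must not leapfrog a failing one.
            break
        delivered.append(report)
    return delivered
```

With `max_count=1` this matches the watchdog's default of one report per run; a larger value corresponds to `--max-count`.
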
## Common commands
### 1) Inspect the queue
```bash
ls -l ~/.clawteam/owner-reports/pending
ls -l ~/.clawteam/owner-reports/sent
```
### 2) Manually create a report
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team clawteam \
--worker backend-a \
--task-id example-task \
--progress 80% \
--done 'export complete' \
--next 'wait for aggregation' \
--status normal \
--source checkpoint-complete
```
### 3) producer dry-run
```bash
uv run python owner_report_producer.py \
--team clawteam \
--worker backend-a \
--task-id example-task \
--progress 80% \
--done 'export complete' \
--next 'wait for aggregation' \
--status normal \
--dry-run
```
### 4) watchdog dry-run
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
### 5) Run the watchdog manually right now
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### 6) Drain more of the backlog in one run
```bash
OWNER_REPORT_MAX_COUNT=20 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### 7) Check cron
```bash
crontab -l
```
### 8) View the watchdog log
```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```
## Debugging checklist
When a report was not sent, check the following in order:
1. **Whether `pending/` contains any files**
```bash
ls -l ~/.clawteam/owner-reports/pending
```
2. **Whether the report content looks reasonable**
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_consumer.py <report_id_or_path>
```
3. **Whether the driver dry-run works**
```bash
uv run python owner_report_driver.py <report_id_or_path> --dry-run
```
4. **Which report the watchdog dry-run picks**
```bash
./run_owner_report_watchdog.sh --dry-run
```
5. **Whether the cron entry exists**
```bash
crontab -l
```
6. **Whether the cron log shows errors**
```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```
7. **Whether the Node / openclaw entry exists**
   - `run_owner_report_watchdog.sh` checks:
     - `NODE_BIN`
     - `OPENCLAW_ENTRY`
8. **Whether the send command can actually deliver**
   - Both the wrapper and the watchdog-b direct send rely on `OWNER_REPORT_SEND_CMD` / `--send-cmd`
   - The current default owner-facing target is Discord `channel:1480577550445969541`
   - For a watchdog-b single-shot check, you can run `owner_report_driver.py <report_id> --send-cmd '...message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'` directly
9. **If the backlog is stuck**
   - Fix the oldest failing report first; the watchdog is oldest-first and stops at the first failure
## Cron / watchdog behavior
Current local setup:
- schedule: `* * * * *`
- command: `run_owner_report_watchdog.sh`
- target: Discord `channel:1480577550445969541` by default
- max backlog per run: `5` by default
The wrapper:
1. Pins `OWNER_REPORT_SEND_CMD`
2. Pins the owner-facing channel and target (defaults: `OWNER_REPORT_CHANNEL=discord`, `OWNER_REPORT_TARGET=channel:1480577550445969541`; both can be overridden)
3. Calls `owner_report_watchdog.py --max-count "$OWNER_REPORT_MAX_COUNT"`
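
The wrapper's defaults and overrides amount to "use the environment value if set, otherwise the documented default". The wrapper itself is a shell script; the Python sketch below only illustrates that resolution logic. The function name `resolve_settings` is hypothetical; the variable names and default values are the ones listed in this manual.

```python
import os
from typing import Dict, Mapping, Optional, Union


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Union[str, int]]:
    """Resolve channel, target, and max-count from the environment with documented defaults.

    Hypothetical illustration of the wrapper's behavior, not its actual code.
    """
    env = os.environ if env is None else env
    return {
        "channel": env.get("OWNER_REPORT_CHANNEL", "discord"),
        "target": env.get("OWNER_REPORT_TARGET", "channel:1480577550445969541"),
        # The wrapper's documented default is 5 reports per run.
        "max_count": int(env.get("OWNER_REPORT_MAX_COUNT", "5")),
    }
```

Calling `resolve_settings({"OWNER_REPORT_CHANNEL": "telegram", "OWNER_REPORT_TARGET": "123456789"})` mirrors the override shown in Example D while keeping the default max-count.
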
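Characteristics of the watchdog itself:
- Not a daemon
- Does not stay resident
- No retry / backoff
- Performs a single scan per invocation
- Processes 1 report by default; the wrapper raises the default to 5
- `--all` drains the entire current backlog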
## Caveats
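- This is neither a message bus nor a full job system
- There is no built-in retry, dedupe, or database
- If the oldest pending report keeps failing, everything behind it is blocked
- `sent/` means "the send command succeeded"; it does not guarantee that a human has read the message
- The producer depends on explicitly supplied fields; it does not interpret arbitrary logs
- cron / PATH / node version problems are mitigated as far as possible by having the wrapper pin Node and the OpenClaw entry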
## Examples
### Example A: General task checkpoint
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
--team general-task \
--worker alice \
--task-id manual-checkpoint-1 \
--progress 50% \
--done 'phase 1 complete' \
--next 'waiting for phase 2 results' \
--status normal \
--source manual-checkpoint
```
### Example B: Validate without sending
```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```
### Example C: watchdog-b single-shot direct send to the owner-facing Discord
First, use the probe to locate the local runtime:
```bash
python3 /home/chchang/.openclaw/workspace/skills/watchdog-discord-route/scripts/openclaw_runtime_probe.py --pretty
```
Then pass in the detected `node` / `dist/entry.js`:
```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
python3 owner_report_driver.py <report_id> \
--send-cmd '"<detected-node>" "<detected-entry.js>" message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'
```
### Example D: Temporarily send to a different channel / target
```bash
OWNER_REPORT_CHANNEL=telegram \
OWNER_REPORT_TARGET=123456789 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
### Example E: Quickly drain the backlog
```bash
OWNER_REPORT_MAX_COUNT=10 \
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```
## Operator summary
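If you remember only three things:
1. **A report must land in `pending/` before there is anything to send**
2. **A report moves to `sent/` only when the send succeeds**
3. **When a notification does not show up, check pending, the cron log, and the oldest failing report first**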