# Owner Report Operator Manual
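
## Purpose

The owner-report system turns "checkpoints that must be reported proactively" into a notification pipeline that is observable, verifiable, and able to catch up on unsent reports.

It is a good fit for:

- Long-running, multi-step tasks
- Progress reporting from ClawTeam, workers, or background processes
- Situations where a verbal promise to "report back later" is not enough

It does not try to be a complex event platform. The goal is **simple, reliable, and never reporting a failure as a success**.
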
## Architecture

Minimal pipelines:

Regular queue drain:

`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`

watchdog-b single-shot direct send:

`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`

Components:

- `scripts/owner_report_producer.py`
  - Writes explicit checkpoint fields to `~/.clawteam/owner-reports/pending/<report_id>.md`
- `scripts/owner_report_consumer.py`
  - Reads a pending report and converts it to standard JSON
- `scripts/owner_report_driver.py`
  - Calls an external send command; the report is **moved to `sent/` only after a successful send**
- `scripts/owner_report_watchdog.py`
  - Scans `pending/` once per run; by default it processes 1 report, oldest first
- `scripts/run_owner_report_watchdog.sh`
  - Local wrapper for cron that pins the send command, target, and max-count

Directories:

- pending: `~/.clawteam/owner-reports/pending/`
- sent: `~/.clawteam/owner-reports/sent/`
- cron log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`

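## When to use

Use it for:

- Multi-step tasks that span a long period
- Work with clear checkpoints or status changes
- Work that involves ClawTeam, subagents, watchdogs, or cron
- Tasks where Eric has explicitly asked that no report be missed

Do not use it for:

- Short Q&A that can be answered in a single reply
- Small changes or low-risk, single-step operations
- Ordinary work that needs no proactive notification
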
## Normal flow

1. A task reaches a checkpoint worth reporting
2. `owner_report_producer.py` creates a pending report
3. cron runs `run_owner_report_watchdog.sh` every minute
4. The watchdog picks the oldest report in `pending/`
5. The driver sends it with `OWNER_REPORT_SEND_CMD`
6. On success, the report is moved to `sent/`
7. On failure, the report stays in `pending/` and the current run stops

### Failure semantics

- **A failed send is never archived**
- **The backlog is processed oldest-first**
- **Processing stops at the first failure**, so later successes are not mistaken for overall health

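The normal flow and failure semantics above can be modeled in a few lines of Bash. This is a hypothetical sketch for reasoning about queue behavior, not the code of `owner_report_watchdog.py` or `owner_report_driver.py`: the function name `drain_pending` is made up, "oldest" is assumed to mean oldest modification time, and the send command is assumed to read the report text from `OWNER_REPORT_MESSAGE`, as the `--send-cmd` examples in this manual imply.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the drain rules; NOT the real watchdog/driver code.
# Assumptions: "oldest" = oldest mtime, and the send command reads the report
# text from $OWNER_REPORT_MESSAGE (as in this manual's --send-cmd examples).
# Usage: drain_pending <queue_root> <send_cmd> [max_count]
drain_pending() {
  local root=$1 send_cmd=$2 max_count=${3:-1}
  local sent=0 report
  local reports=()
  local ordered=()

  # Collect pending reports; skip the literal glob when pending/ is empty.
  for report in "$root"/pending/*.md; do
    if [[ -e $report ]]; then reports+=("$report"); fi
  done
  if (( ${#reports[@]} == 0 )); then return 0; fi

  # ls -tr sorts the given paths by modification time, oldest first.
  while IFS= read -r report; do
    ordered+=("$report")
  done < <(ls -1tr -- "${reports[@]}")

  mkdir -p "$root/sent"
  for report in "${ordered[@]}"; do
    if (( sent >= max_count )); then break; fi
    if OWNER_REPORT_MESSAGE=$(cat "$report") sh -c "$send_cmd"; then
      mv -- "$report" "$root/sent/"   # archive only after a successful send
      sent=$((sent + 1))
    else
      echo "send failed, left in pending/: $report" >&2
      return 1                        # stop at the first failure
    fi
  done
  return 0
}
```

Because it moves files, try it on a scratch directory (for example one created with `mktemp -d` that contains a `pending/` subdirectory) rather than the live queue. With a send command that fails on the second report, the first report lands in `sent/`, the second and everything after it stay in `pending/`, and the function returns 1, which is the "stop at the first failure" rule.
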
## Common commands

### 1) Inspect the queue

```bash
ls -l ~/.clawteam/owner-reports/pending
ls -l ~/.clawteam/owner-reports/sent
```

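For a quick summary instead of two full listings, a small helper can count both directories and show which pending report is oldest. The function name `owner_report_queue_status` is hypothetical, and "oldest" is assumed to mean the oldest modification time; the watchdog dry-run remains the authoritative way to see which report it will pick next.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of owner-reporting-system.
# Assumes "oldest" means oldest mtime; `run_owner_report_watchdog.sh --dry-run`
# is the authoritative answer to "which report goes next".
owner_report_queue_status() {
  local root=${1:-$HOME/.clawteam/owner-reports}
  local pending=()
  local sent_count=0 f

  # Count files, ignoring the literal glob pattern when a directory is empty.
  for f in "$root"/pending/*.md; do
    if [[ -e $f ]]; then pending+=("$f"); fi
  done
  for f in "$root"/sent/*.md; do
    if [[ -e $f ]]; then sent_count=$((sent_count + 1)); fi
  done

  echo "pending: ${#pending[@]}"
  echo "sent: $sent_count"
  if (( ${#pending[@]} > 0 )); then
    # ls -tr lists the given paths oldest-first by modification time
    echo "oldest pending: $(ls -1tr -- "${pending[@]}" | head -n 1)"
  fi
}
```

Called with no argument it inspects `~/.clawteam/owner-reports`; pass another root directory to inspect a copy of the queue.
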
### 2) Create a report manually

```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
  --team clawteam \
  --worker backend-a \
  --task-id example-task \
  --progress 80% \
  --done 'export complete' \
  --next 'wait for aggregation' \
  --status normal \
  --source checkpoint-complete
```

### 3) Producer dry-run

```bash
# Run from the same scripts/ directory as in 2)
uv run python owner_report_producer.py \
  --team clawteam \
  --worker backend-a \
  --task-id example-task \
  --progress 80% \
  --done 'export complete' \
  --next 'wait for aggregation' \
  --status normal \
  --dry-run
```

### 4) Watchdog dry-run

```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```

### 5) Run the watchdog manually right now

```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```

### 6) Drain more of the backlog in one run

```bash
OWNER_REPORT_MAX_COUNT=20 \
  /home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```

### 7) Check cron

```bash
crontab -l
```

### 8) Read the watchdog log

```bash
tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
```

## Debugging checklist

When a report "did not get sent", check the following in order:

1. **Are there files in `pending/`?**

   ```bash
   ls -l ~/.clawteam/owner-reports/pending
   ```

2. **Does the report content look reasonable?**

   ```bash
   cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
   uv run python owner_report_consumer.py <report_id_or_path>
   ```

3. **Does a driver dry-run work?**

   ```bash
   uv run python owner_report_driver.py <report_id_or_path> --dry-run
   ```

4. **Which report does the watchdog dry-run pick?**

   ```bash
   ./run_owner_report_watchdog.sh --dry-run
   ```

5. **Does the cron entry exist?**

   ```bash
   crontab -l
   ```

6. **Does the cron log show errors?**

   ```bash
   tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
   ```

7. **Do the Node binary and the OpenClaw entry exist?**
   - `run_owner_report_watchdog.sh` checks:
     - `NODE_BIN`
     - `OPENCLAW_ENTRY`

8. **Can the send command actually deliver?**
   - Both the wrapper and watchdog-b direct sends depend on `OWNER_REPORT_SEND_CMD` / `--send-cmd`
   - The current owner-facing default target is Discord `channel:1480577550445969541`
   - For a watchdog-b single-shot check, call the driver directly: `owner_report_driver.py <report_id> --send-cmd '...message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'`

9. **If the backlog is stuck**
   - Fix the oldest failing report first; the watchdog is oldest-first and stops at the first failure

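Checklist item 7 can be pre-checked by hand before digging into cron. The sketch below is hypothetical (the function name `check_owner_report_runtime` is made up): it assumes `NODE_BIN` and `OPENCLAW_ENTRY` are set in your shell to the same paths the wrapper uses, for example the ones reported by `openclaw_runtime_probe.py`, and it only checks that the files exist, not that a send succeeds.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for checklist item 7; not the wrapper's own code.
# Assumes NODE_BIN and OPENCLAW_ENTRY hold the same paths the wrapper uses.
check_owner_report_runtime() {
  local status=0
  # An unset or empty NODE_BIN fails the -x test as well.
  if [[ ! -x ${NODE_BIN:-} ]]; then
    echo "NODE_BIN is unset or not executable: '${NODE_BIN:-}'" >&2
    status=1
  fi
  if [[ ! -f ${OPENCLAW_ENTRY:-} ]]; then
    echo "OPENCLAW_ENTRY is unset or not a file: '${OPENCLAW_ENTRY:-}'" >&2
    status=1
  fi
  if (( status == 0 )); then
    echo "runtime OK: $NODE_BIN $OPENCLAW_ENTRY"
  fi
  return "$status"
}
```

If both checks pass, running `"$NODE_BIN" --version` confirms that the Node binary actually starts.
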
## Cron / watchdog behavior

Current local setup:

- schedule: `* * * * *`
- command: `run_owner_report_watchdog.sh`
- target: Discord `channel:1480577550445969541` by default
- max backlog per run: `5` by default

The wrapper:

1. Pins `OWNER_REPORT_SEND_CMD`
2. Pins the owner-facing channel and target (defaults: `OWNER_REPORT_CHANNEL=discord`, `OWNER_REPORT_TARGET=channel:1480577550445969541`; both can be overridden)
3. Calls `owner_report_watchdog.py --max-count "$OWNER_REPORT_MAX_COUNT"`

Properties of the watchdog itself:

- Not a daemon; it does not stay resident
- No retry or backoff
- Performs a single scan per invocation
- Processes 1 report by default; the wrapper raises this to 5
- `--all` drains the entire current backlog

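The "pinned but overridable" behavior of the wrapper matches the standard shell `${VAR:-default}` pattern, which is also why the examples that set `OWNER_REPORT_CHANNEL`, `OWNER_REPORT_TARGET`, or `OWNER_REPORT_MAX_COUNT` on the command line work. The function below is a hypothetical sketch using the defaults listed above (the name `resolve_owner_report_settings` is made up); it is not the contents of `run_owner_report_watchdog.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "pinned defaults, overridable from the environment";
# the real run_owner_report_watchdog.sh may be written differently.
resolve_owner_report_settings() {
  # ${VAR:-default} keeps a caller-supplied value and falls back otherwise.
  OWNER_REPORT_CHANNEL=${OWNER_REPORT_CHANNEL:-discord}
  OWNER_REPORT_TARGET=${OWNER_REPORT_TARGET:-channel:1480577550445969541}
  OWNER_REPORT_MAX_COUNT=${OWNER_REPORT_MAX_COUNT:-5}
  export OWNER_REPORT_CHANNEL OWNER_REPORT_TARGET OWNER_REPORT_MAX_COUNT
  echo "channel=$OWNER_REPORT_CHANNEL target=$OWNER_REPORT_TARGET max_count=$OWNER_REPORT_MAX_COUNT"
}
```

With this pattern, `OWNER_REPORT_MAX_COUNT=10` on the command line wins over the pinned `5`, while an unset variable falls back to the default. Note that `:-` also replaces a variable that is set but empty.
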
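## Caveats

- This is not a message bus, nor a full job system
- No built-in retry, dedupe, or database
- If the oldest pending report keeps failing, everything behind it is blocked
- `sent/` means "the send command succeeded", not that a human has read the message
- The producer relies on explicit field input; it does not interpret arbitrary logs
- cron / PATH / Node version problems are reduced as far as possible by having the wrapper pin the Node binary and the OpenClaw entry
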
## Examples

### Example A: Checkpoint for a regular task

```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
uv run python owner_report_producer.py \
  --team general-task \
  --worker alice \
  --task-id manual-checkpoint-1 \
  --progress 50% \
  --done 'phase 1 complete' \
  --next 'waiting for phase 2 results' \
  --status normal \
  --source manual-checkpoint
```

### Example B: Validate without sending

```bash
/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
```

### Example C: watchdog-b single-shot direct send to the owner-facing Discord channel

First, use the probe to locate the local runtime:

```bash
python3 /home/chchang/.openclaw/workspace/skills/watchdog-discord-route/scripts/openclaw_runtime_probe.py --pretty
```

Then substitute the detected `node` and `dist/entry.js`:

```bash
cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
python3 owner_report_driver.py <report_id> \
  --send-cmd '"<detected-node>" "<detected-entry.js>" message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'
```

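In the command above, each `'\''` closes the single-quoted `--send-cmd` string, adds a literal single quote, and reopens it, so the string the driver receives contains `--target 'channel:1480577550445969541'`. `$OWNER_REPORT_MESSAGE` stays unexpanded because it sits inside single quotes; presumably the driver sets it when it runs the command. If the nested quoting becomes hard to get right, one alternative is to build the string with Bash's `printf '%q'`. This is a sketch under assumptions: the function name `build_owner_report_send_cmd` is hypothetical, and the driver is assumed to run the string through a shell, as the `$OWNER_REPORT_MESSAGE` placeholder suggests.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the --send-cmd string without nested '\'' quoting.
# %q shell-escapes each argument (Bash-specific); the $OWNER_REPORT_MESSAGE
# text stays literal because the format string is single-quoted.
# Usage: build_owner_report_send_cmd <node_bin> <entry_js> [target]
build_owner_report_send_cmd() {
  local node_bin=$1 entry_js=$2
  local target=${3:-channel:1480577550445969541}
  printf '%q %q message send --channel discord --target %q --message "$OWNER_REPORT_MESSAGE"' \
    "$node_bin" "$entry_js" "$target"
}
```

For example: `python3 owner_report_driver.py <report_id> --send-cmd "$(build_owner_report_send_cmd "<detected-node>" "<detected-entry.js>")"`. For paths made only of ordinary characters, `%q` emits them unchanged; paths with spaces get backslash escapes, which any POSIX shell understands.
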
### Example D: Temporarily send to a different channel / target

```bash
OWNER_REPORT_CHANNEL=telegram \
OWNER_REPORT_TARGET=123456789 \
  /home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```

### Example E: Drain the backlog quickly

```bash
OWNER_REPORT_MAX_COUNT=10 \
  /home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
```

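## Operator summary

If you remember only three things:

1. **A report must be in `pending/` first; otherwise there is nothing to send**
2. **A report moves to `sent/` only after a successful send**
3. **If you do not see a notification, check `pending/`, the cron log, and the oldest failing report first**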