commit 8138fb011d5a583e5cc10f7a327eec86e2f722a6
Author: Alice
Date:   Wed Apr 22 08:33:51 2026 +0800

    Initial import of watchdog-discord-route skill

diff --git a/SKILL.md b/SKILL.md
new file mode 100644
index 0000000..f396f4b
--- /dev/null
+++ b/SKILL.md
@@ -0,0 +1,274 @@
+---
+name: watchdog-discord-route
+description: Install, reset, verify, or operate the OpenClaw watchdog-b owner-facing Discord route. Use when setting up or repairing watchdog-b -> owner-report -> Discord delivery, cleaning old watchdog test residue, enabling the systemd --user timer, running end-to-end verification, or adjusting the Discord-facing notification path/target/template.
+---
+
+# Watchdog Discord Route
+
+Use this skill when the task is about the **watchdog-b owner-facing notification path to Discord**.
+
+This skill covers four recurring jobs:
+
+1. **Reset / clean** old watchdog test residue without deleting live audit assets.
+2. **Verify** the end-to-end path from watchdog-b to Discord.
+3. **Install / enable** the live watchdog schedule.
+4. **Repair / adjust** the Discord-facing route, target, or message format.
+
+## What the current canonical path is
+
+Preferred owner-facing path:
+
+`watchdog-b -> notify_watchdog_b.py -> owner_report_producer.py -> owner_report_driver.py -> OpenClaw Discord send -> sent archive`
+
+For watchdog-b single-shot owner-facing delivery, prefer the **direct Discord driver path** over relying on the generic wrapper watchdog destination semantics.
+
+Current default Discord target in this workspace is a validated example, but the portable skill should treat target as host-local configuration.
+
+For portable installs, set:
+
+- `WATCHDOG_B_OWNER_REPORT_TARGET=channel:REPLACE_ME`
+
+## Bundled skill resources
+
+This skill now carries a portable bundle under its own directory.
+Prefer these bundled files first when adapting or reusing on another host:
+
+### scripts/
+- `scripts/check_openclaw_state.sh`
+- `scripts/notify_watchdog_b.py`
+- `scripts/run_watchdog_b.sh`
+- `scripts/verify_watchdog_b_e2e.sh`
+- `scripts/owner_report_consumer.py`
+- `scripts/owner_report_producer.py`
+- `scripts/owner_report_driver.py`
+- `scripts/install_watchdog_bundle.sh`
+- `scripts/bootstrap_watchdog_bundle.sh`
+- `scripts/openclaw-watchdog-b.service`
+- `scripts/openclaw-watchdog-b.timer`
+- `scripts/openclaw_runtime_probe.py`
+- `scripts/watchdog-b.env.example`
+
+### references/
+- `references/watchdog-b-readme.md`
+- `references/owner-reporting-system.md`
+- `references/owner-report-operator-manual.md`
+
+If working in this workspace, you may still inspect the live workspace files too, but the bundled skill files are the portable baseline.
+
+## When resetting / cleaning
+
+Before touching anything, inventory:
+
+- `state/watchdog-b/`
+- `state/watchdog-b-test-*`
+- `state/watchdog-b-verify-e2e/`
+- `state/archive/`
+- `~/.clawteam/owner-reports/pending/`
+- `~/.clawteam/owner-reports/sent/`
+- user crontab entries related to owner-report/watchdog
+- `~/.config/systemd/user/openclaw-watchdog-b.*`
+
+Rules:
+
+- **Do not delete** `~/.clawteam/owner-reports/sent/` history unless explicitly asked.
+- **Do not delete** live `state/watchdog-b/notify-state.json` unless the user explicitly wants a hard reset.
+- Prefer **archiving** old `state/watchdog-b-test-*` into `state/archive//`.
+- Distinguish clearly between:
+  - live state
+  - old test residue
+  - repo templates
+  - live installed units
+
+## When doing end-to-end verification
+
+Default verification script:
+
+- bundled: `scripts/verify_watchdog_b_e2e.sh`
+- workspace live path: `scripts/watchdog-b/verify_watchdog_b_e2e.sh`
+
+This should be treated as the **single-source verification path** unless there is a specific reason to bypass it.
+
+Minimum success evidence:
+
+- Discord send success with a new message id
+- pending report created then moved to sent
+- sent file exists under `~/.clawteam/owner-reports/sent/`
+- verification artifacts under `state/watchdog-b-verify-e2e//`
+
+Useful artifacts:
+
+- `verify.log`
+- `run-output.txt`
+- `queue-before.txt`
+- `queue-after.txt`
+- `sent-head.txt`
+- `state/notify-state.json`
+
+Do not claim human-visible success unless either:
+
+- the user confirms visibility, or
+- you can read back the message from the exact Discord channel and match the message id/content.
+
+## When installing the live schedule
+
+Preferred scheduler: **systemd --user timer**.
+
+Before claiming a host is ready, prefer to run the bundled bootstrap checker first:
+
+- `scripts/bootstrap_watchdog_bundle.sh`
+
+When installing or refreshing the portable bundle into live paths, prefer the bundled installer:
+
+- `scripts/install_watchdog_bundle.sh --install-env-example`
+
+Use bootstrap when you need a quick host-readiness answer without changing the host.
+Use install when you need to copy the bundled scripts/service/timer/env example into the live workspace and user config paths.
+
+Use it when:
+
+- `systemd --user` is available
+- linger/user services are supported
+- you want journal/status/list-timers visibility
+
+Live install paths:
+
+- `~/.config/systemd/user/openclaw-watchdog-b.service`
+- `~/.config/systemd/user/openclaw-watchdog-b.timer`
+- `~/.config/openclaw/watchdog-b.env`
+
+Portable install sources carried by this skill:
+
+- `scripts/check_openclaw_state.sh`
+- `scripts/run_watchdog_b.sh`
+- `scripts/notify_watchdog_b.py`
+- `scripts/owner_report_consumer.py`
+- `scripts/owner_report_producer.py`
+- `scripts/owner_report_driver.py`
+- `scripts/install_watchdog_bundle.sh`
+- `scripts/bootstrap_watchdog_bundle.sh`
+- `scripts/openclaw-watchdog-b.service`
+- `scripts/openclaw-watchdog-b.timer`
+- `scripts/openclaw_runtime_probe.py`
+- `scripts/watchdog-b.env.example`
+
+Expected environment for live route:
+
+- `WATCHDOG_B_NOTIFY_DRY_RUN=0`
+- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain`
+- `WATCHDOG_B_OWNER_DELIVERY_MODE=direct-discord`
+- `WATCHDOG_B_OWNER_REPORT_CHANNEL=discord`
+- `WATCHDOG_B_OWNER_REPORT_TARGET=channel:REPLACE_ME`
+- optional: `WATCHDOG_B_MAIN_AGENT_ID=`
+
+Minimum installation verification:
+
+- `systemctl --user daemon-reload`
+- `systemctl --user enable --now openclaw-watchdog-b.timer`
+- `systemctl --user start openclaw-watchdog-b.service`
+- `systemctl --user status openclaw-watchdog-b.timer --no-pager`
+- `systemctl --user status openclaw-watchdog-b.service --no-pager`
+- `systemctl --user list-timers --all | rg openclaw-watchdog-b`
+- `journalctl --user -u openclaw-watchdog-b.service -n 50 --no-pager`
+
+## When repairing Discord delivery
+
+Check these in order:
+
+1. Is the target channel correct?
+   - Validate the host-local configured form such as `channel:`
+2. Is the send path using direct driver delivery or the generic wrapper?
+   - For watchdog-b owner-facing single-shot delivery, prefer direct driver delivery.
+3. If stalled/idle tries to nudge a main agent, is `WATCHDOG_B_MAIN_AGENT_ID` set to a valid agent id on that host?
+   - If not, leave it unset so main-agent nudge is skipped instead of failing on `Unknown agent id`.
+4. Does the message actually appear in channel readback?
+5. Is the message merely transport-accepted, or human-visible?
+
+Be precise in language:
+
+- "transport accepted" = send returned success / message id
+- "channel-readable" = message appears in channel readback
+- "human-confirmed" = Eric says he saw it
+
+Do not collapse these into one claim.
+
+## Message formatting preference
+
+Eric reported that visible does not necessarily mean prominent.
+
+When adjusting Discord-facing watchdog messages, prefer:
+
+- short first line
+- conclusion first
+- minimize raw diagnostic fields in the user-facing body
+- keep machine/audit detail in artifacts, not in the first visible lines
+
+Preferred shape:
+
+- `🔔 WATCHDOG|<結論>` (headline, where `結論` is the conclusion)
+- `任務:...` (task)
+- `結論:...` (conclusion)
+- `你現在不用做事` ("no action needed from you right now") or `需要你介入:...` ("needs your intervention: ...")
+
+## Suggested execution pattern
+
+For most tasks in this skill:
+
+1. Inventory live state and installed schedule.
+2. Decide whether the task is:
+   - reset/cleanup
+   - route repair
+   - schedule install
+   - e2e verify
+   - portability / migration
+3. Preserve audit assets.
+4. Prefer the bundled skill scripts as the reusable baseline.
+5. Use direct driver Discord route for owner-facing verification.
+6. Attach evidence paths and exact commands before claiming success.
+
+## Portability note
+
+This skill is now suitable to copy to another OpenClaw workspace, but portability still depends on the target host having:
+
+- a working OpenClaw install
+- Discord routing available
+- a valid destination channel/target configured in `watchdog-b.env`
+- systemd --user if live scheduling is desired
+
+When moving to another host, recommended order is:
+
+1. Run `scripts/install_watchdog_bundle.sh --install-env-example` to populate live paths from the bundled copies.
+2. If `~/.config/openclaw/watchdog-b.env` does not exist, create it from the example:
+   - `mkdir -p ~/.config/openclaw`
+   - `cp ~/.config/openclaw/watchdog-b.env.example ~/.config/openclaw/watchdog-b.env`
+3. Edit `~/.config/openclaw/watchdog-b.env` and set at least:
+   - `WATCHDOG_B_OWNER_REPORT_TARGET=channel:YOUR_DISCORD_CHANNEL_ID`
+4. Run `scripts/bootstrap_watchdog_bundle.sh`.
+5. Bootstrap should now validate the installed live bundle under `scripts/watchdog-b/` and should not require a separate `owner-reporting-system/` live tree.
+6. Re-run bootstrap until it passes, then enable/start systemd --user units.
+
+When moving to another host, runtime detection now follows this order:
+
+1. explicit env overrides: `WATCHDOG_B_NODE_BIN`, `WATCHDOG_B_OPENCLAW_MJS`, `WATCHDOG_B_OPENCLAW_ENTRY`
+2. PATH discovery: `node`, then `openclaw`-adjacent install roots
+3. common install roots scan: nvm, pnpm global, npm-global, `/usr/local`, `/usr`, and Volta-style locations
+4. hard failure with a message telling the operator which env vars to set manually
+
+So on most hosts you should only need to update:
+
+- `WATCHDOG_B_OWNER_REPORT_TARGET`
+- `WATCHDOG_B_WORKSPACE` if the workspace is not the default `~/.openclaw/workspace`
+
+Only set `WATCHDOG_B_NODE_BIN` / `WATCHDOG_B_OPENCLAW_ENTRY` / `WATCHDOG_B_OPENCLAW_MJS` when auto-detection fails or you intentionally want to pin a non-default runtime.
+
+If another agent reports that `~/.config/openclaw/watchdog-b.env` does not exist, the correct response is to create it from `watchdog-b.env.example` before expecting bootstrap to pass.
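The runtime detection order described above (explicit override, then PATH, then an install-root scan, then a hard failure) can be sketched roughly as follows. This is a minimal illustration only, not the bundled `openclaw_runtime_probe.py`: the function name `resolve_node_bin`, its parameters, and the `<root>/bin/node` layout are assumptions made for the example.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Iterable, Mapping, Optional


def resolve_node_bin(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
    scan_roots: Iterable[Path] = (),
) -> str:
    """Resolve a node binary: env override, then PATH, then a root scan."""
    # 1. An explicit override wins whenever it is set and non-empty.
    override = env.get("WATCHDOG_B_NODE_BIN", "").strip()
    if override:
        return override

    # 2. PATH discovery.
    found = which("node")
    if found:
        return found

    # 3. Scan candidate install roots (hypothetical layout: <root>/bin/node).
    for root in scan_roots:
        candidate = Path(root) / "bin" / "node"
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)

    # 4. Hard failure that names the variable the operator should set.
    raise RuntimeError("node runtime not found; set WATCHDOG_B_NODE_BIN explicitly")
```

Passing `env` and `which` as parameters keeps the sketch testable without touching the real environment; a caller on a live host would typically pass `os.environ` and leave `which` at its default.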
+
+## Success checklist
+
+Do not say done unless you have all applicable evidence:
+
+- changed file paths
+- exact command(s) run
+- status output or send result
+- sent archive path when queue/driver is involved
+- channel/message evidence when claiming delivery
+- rollback instructions when you changed live scheduling
diff --git a/references/owner-report-operator-manual.md b/references/owner-report-operator-manual.md
new file mode 100644
index 0000000..0c72681
--- /dev/null
+++ b/references/owner-report-operator-manual.md
@@ -0,0 +1,242 @@
+# Owner Report Operator Manual
+
+## Purpose
+The owner-report system turns checkpoints that require proactive reporting into a notification pipeline that is observable, verifiable, and resendable.
+
+It is suited to:
+- long-running, multi-step tasks
+- progress reports from ClawTeam, workers, or background processes
+- situations where a verbal promise to "report back later" is not enough
+
+It does not aim to be a complex event platform; the goal is to be **simple, reliable, and never report a failure as success**.
+
+## Architecture
+Minimal pipeline:
+
+General queue drain:
+`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`
+
+watchdog-b single-shot direct send:
+`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`
+
+Components:
+- `scripts/owner_report_producer.py`
+  - writes explicit checkpoint fields to `~/.clawteam/owner-reports/pending/.md`
+- `scripts/owner_report_consumer.py`
+  - reads a pending report and converts it to standard JSON
+- `scripts/owner_report_driver.py`
+  - calls the external send command; **moves the report only after a successful send** to `sent/`
+- `scripts/owner_report_watchdog.py`
+  - scans pending once; by default processes 1 report, oldest first
+- `scripts/run_owner_report_watchdog.sh`
+  - local wrapper that pins the send command / target / max-count, for use by cron
+
+Directories:
+- pending: `~/.clawteam/owner-reports/pending/`
+- sent: `~/.clawteam/owner-reports/sent/`
+- cron log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`
+
+## When to use
+Use it for:
+- multi-step tasks that span time
+- work with clear checkpoints / status changes
+- work that uses ClawTeam, subagents, watchdog, or cron
+- tasks where Eric explicitly asked that no report be missed
+
+Do not use it for:
+- short Q&A that can be answered in one reply
+- small changes and low-risk single-step operations
+- ordinary work that needs no proactive notification
+
+## Normal flow
+1. A task reaches a checkpoint worth reporting
+2. `owner_report_producer.py` creates one pending report
+3. cron runs `run_owner_report_watchdog.sh` every minute
+4. watchdog picks the oldest report from `pending/`
+5. driver actually sends it using `OWNER_REPORT_SEND_CMD`
+6. On success: the report moves to `sent/`
+7. On failure: the report stays in `pending/` and this round stops
+
+### Failure semantics
+- **A failed send is never archived**
+- **The backlog is processed oldest-first**
+- **Processing stops at the first failure**, so later successes are not mistaken for overall health
+
+## Common commands
+### 1) Inspect the queue
+```bash
+ls -l ~/.clawteam/owner-reports/pending
+ls -l ~/.clawteam/owner-reports/sent
+```
+
+### 2) Create a report manually
+```bash
+cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
+uv run python owner_report_producer.py \
+  --team clawteam \
+  --worker backend-a \
+  --task-id example-task \
+  --progress 80% \
+  --done 'export complete' \
+  --next 'wait for aggregation' \
+  --status normal \
+  --source checkpoint-complete
+```
+
+### 3) producer dry-run
+```bash
+uv run python owner_report_producer.py \
+  --team clawteam \
+  --worker backend-a \
+  --task-id example-task \
+  --progress 80% \
+  --done 'export complete' \
+  --next 'wait for aggregation' \
+  --status normal \
+  --dry-run
+```
+
+### 4) watchdog dry-run
+```bash
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
+```
+
+### 5) Run the watchdog manually right now
+```bash
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
+```
+
+### 6) Process more backlog items in one run
+```bash
+OWNER_REPORT_MAX_COUNT=20 \
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
+```
+
+### 7) Check cron
+```bash
+crontab -l
+```
+
+### 8) View the watchdog log
+```bash
+tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
+```
+
+## Debugging checklist
+When a report "was not sent", check in this order:
+
+1. **Are there files in pending?**
+   ```bash
+   ls -l ~/.clawteam/owner-reports/pending
+   ```
+2. **Is the report content sane?**
+   ```bash
+   cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
+   uv run python owner_report_consumer.py
+   ```
+3. **Does the driver dry-run work?**
+   ```bash
+   uv run python owner_report_driver.py --dry-run
+   ```
+4. **Which report does the watchdog dry-run pick?**
+   ```bash
+   ./run_owner_report_watchdog.sh --dry-run
+   ```
+5. **Does the cron entry exist?**
+   ```bash
+   crontab -l
+   ```
+6. **Are there errors in the cron log?**
+   ```bash
+   tail -n 200 /opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out
+   ```
+7. **Do Node and the openclaw entry exist?**
+   - `run_owner_report_watchdog.sh` checks:
+     - `NODE_BIN`
+     - `OPENCLAW_ENTRY`
+8. **Can the send command actually deliver?**
+   - both the wrapper and the watchdog-b direct send rely on `OWNER_REPORT_SEND_CMD` / `--send-cmd`
+   - the current owner-facing default target is Discord `channel:1480577550445969541`
+   - a watchdog-b single-shot verification can call the driver directly: `owner_report_driver.py --send-cmd '...message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'`
+9. **If the backlog is stuck**
+   - fix the oldest failing report first; the watchdog is oldest-first and stops on failure
+
+## Cron / watchdog behavior
+Current state on this host:
+- schedule: `* * * * *`
+- command: `run_owner_report_watchdog.sh`
+- target: defaults to Discord `channel:1480577550445969541`
+- max backlog per run: defaults to `5`
+
+The wrapper:
+1. pins `OWNER_REPORT_SEND_CMD`
+2. pins the owner-facing channel/target (defaults: `OWNER_REPORT_CHANNEL=discord`, `OWNER_REPORT_TARGET=channel:1480577550445969541`; both can be overridden)
+3. calls `owner_report_watchdog.py --max-count "$OWNER_REPORT_MAX_COUNT"`
+
+Properties of the watchdog itself:
+- not a daemon
+- not resident
+- no retry / backoff
+- one scan pass per invocation
+- processes 1 report by default; the wrapper raises the default to 5
+- `--all` drains the entire current backlog
+
+## Caveats
+- This is not a message bus, nor a full job system
+- There is no built-in retry / dedupe / database
+- If the oldest pending report keeps failing, everything behind it is blocked
+- `sent/` means "the send command succeeded", not that a human has read the message
+- The producer depends on explicit field input; it does not interpret arbitrary logs
+- cron / PATH / node version problems are mitigated by having the wrapper pin Node and the OpenClaw entry
+
+## Examples
+### Example A: general task checkpoint
+```bash
+cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
+uv run python owner_report_producer.py \
+  --team general-task \
+  --worker alice \
+  --task-id manual-checkpoint-1 \
+  --progress 50% \
+  --done 'phase one complete' \
+  --next 'waiting for phase two results' \
+  --status normal \
+  --source manual-checkpoint
+```
+
+### Example B: verify only, send nothing
+```bash
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
+```
+
+### Example C: watchdog-b single-shot direct send to owner-facing Discord
+First use the probe to find the local runtime:
+```bash
+python3 /home/chchang/.openclaw/workspace/skills/watchdog-discord-route/scripts/openclaw_runtime_probe.py --pretty
+```
+
+Then pass in the detected `node` / `dist/entry.js`:
+```bash
+cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
+python3 owner_report_driver.py \
+  --send-cmd '"" "" message send --channel discord --target '\''channel:1480577550445969541'\'' --message "$OWNER_REPORT_MESSAGE"'
+```
+
+### Example D: temporarily send to a different channel / target
+```bash
+OWNER_REPORT_CHANNEL=telegram \
+OWNER_REPORT_TARGET=123456789 \
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
+```
+
+### Example E: drain the backlog quickly
+```bash
+OWNER_REPORT_MAX_COUNT=10 \
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
+```
+
+## Operator summary
+If you remember only three things:
+1. **A report must land in `pending/` before there is anything to send**
+2. **Only a successful send moves a report to `sent/`**
+3. **When a notification is missing, check pending, the cron log, and the oldest failing report first**
diff --git a/references/owner-reporting-system.md b/references/owner-reporting-system.md
new file mode 100644
index 0000000..0ee69f3
--- /dev/null
+++ b/references/owner-reporting-system.md
@@ -0,0 +1,80 @@
+# Owner Reporting System
+
+This is a global owner-facing proactive reporting flow; it does not belong to any single project.
+
+Its purpose is to give long-running, multi-step work whose reports must not be missed a notification pipeline that is observable, verifiable, and never reports a failure as success.
+
+## Core flow
+
+General queue drain path:
+`producer -> pending/*.md -> watchdog -> driver -> Discord -> sent/*.md`
+
+Watchdog-b single-shot direct path:
+`producer -> pending/*.md -> driver -> Discord channel:1480577550445969541 -> sent/*.md`
+
+Components:
+- `scripts/owner_report_producer.py`
+- `scripts/owner_report_consumer.py`
+- `scripts/owner_report_driver.py`
+- `scripts/owner_report_watchdog.py`
+- `scripts/run_owner_report_watchdog.sh`
+- `OWNER_REPORT_OPERATOR_MANUAL.md`
+
+## Scope
+
+Applies to:
+- ClawTeam / subagent / background-process checkpoints
+- multi-step technical tasks
+- assignments that explicitly require no missed reports
+- notification pipelines that need oldest-first / success-only archive / stop-on-failure semantics
+
+Does not apply to:
+- one-off short Q&A
+- small changes that need no proactive notification
+- low-risk tasks that can be answered in one reply
+
+## Queue paths
+
+- pending: `~/.clawteam/owner-reports/pending/`
+- sent: `~/.clawteam/owner-reports/sent/`
+
+## Local integration
+
+On this host the user crontab currently runs the watchdog wrapper once per minute:
+
+- wrapper: `/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh`
+- log: `/opt/workspace_auditing_report/logs/owner_report_watchdog_cron.out`
+- default target: defaults to `OWNER_REPORT_CHANNEL=discord` + `OWNER_REPORT_TARGET=channel:1480577550445969541`
+- backlog per run: defaults to `OWNER_REPORT_MAX_COUNT=5`
+
+In addition, watchdog-b owner-facing single-shot verification can now go directly through `owner_report_driver.py`, without relying on the wrapper watchdog's destination/visibility semantics.
+
+## Common commands
+
+```bash
+# produce one checkpoint report
+cd /home/chchang/.openclaw/workspace/owner-reporting-system/scripts
+uv run python owner_report_producer.py \
+  --team general-task \
+  --worker alice \
+  --task-id example-task \
+  --progress 50% \
+  --done 'phase one complete' \
+  --next 'waiting for phase two results' \
+  --status normal \
+  --source manual-checkpoint
+
+# dry-run watchdog
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
+
+# process backlog immediately
+OWNER_REPORT_MAX_COUNT=20 \
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh
+
+# temporarily override destination
+OWNER_REPORT_CHANNEL=telegram \
+OWNER_REPORT_TARGET=864811879 \
+/home/chchang/.openclaw/workspace/owner-reporting-system/scripts/run_owner_report_watchdog.sh --dry-run
+```
+
+For fuller operating, debugging, and failure-semantics details, see `OWNER_REPORT_OPERATOR_MANUAL.md`.
diff --git a/references/watchdog-b-readme.md b/references/watchdog-b-readme.md
new file mode 100644
index 0000000..df42427
--- /dev/null
+++ b/references/watchdog-b-readme.md
@@ -0,0 +1,192 @@
+# Watchdog B v3 notification layer
+
+Single source of truth for owner-facing policy: `~/.config/openclaw/watchdog-b.env`
+
+Runtime auto-detection source: `scripts/openclaw_runtime_probe.py`
+
+This directory now contains:
+
+- `check_openclaw_state.sh` — tri-state checker (`running` / `stalled` / `idle`)
+- `run_watchdog_b.sh` — dispatcher + notification runner
+- `notify_watchdog_b.py` — minimal notification integration layer
+
+## Configuration source
+
+Priority order is now:
+1. process env already set by caller/systemd
+2. `WATCHDOG_B_CONFIG_FILE` if set
+3. fallback `~/.config/openclaw/watchdog-b.env`
+4. code defaults
+
+This means both `run_watchdog_b.sh` and `notify_watchdog_b.py` can be invoked manually and still resolve the same owner-facing channel / target / mode / wording.
+
+For Node/OpenClaw runtime paths, the bundled scripts now resolve in this order:
+1. explicit env overrides: `WATCHDOG_B_NODE_BIN`, `WATCHDOG_B_OPENCLAW_MJS`, `WATCHDOG_B_OPENCLAW_ENTRY`
+2. PATH lookup for `node` and `openclaw`
+3. common install roots scan: nvm, pnpm global, npm-global, `/usr/local`, `/usr`, Volta-style trees
+4. fail with an operator-facing error that tells you which env vars to set manually
+
+Repo template to copy from:
+- `ops/systemd/user/watchdog-b.env.example`
+
+Install example:
+```bash
+mkdir -p ~/.config/openclaw
+cp ~/.openclaw/workspace/ops/systemd/user/watchdog-b.env.example ~/.config/openclaw/watchdog-b.env
+$EDITOR ~/.config/openclaw/watchdog-b.env
+```
+
+## Notification strategy
+
+### 1) `running`
+Default: **manual / queue-ready only**.
+
+Why:
+- a healthy runtime every 10 minutes should not spam Eric
+- owner-facing reporting should remain explicit and auditable
+
+Behavior:
+- `WATCHDOG_B_RUNNING_REPORT_MODE=manual` (default)
+  - does not create external messages
+  - returns a concrete hint for how to enable queue creation
+- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue`
+  - creates a real pending owner report in `~/.clawteam/owner-reports/pending/`
+  - does **not** auto-send it
+- `WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain`
+  - creates a pending owner report and immediately delivers that exact pending item through `owner_report_driver.py`
+  - send path is direct OpenClaw Discord send using the env-configured owner-facing destination
+  - this keeps queue/audit semantics but avoids depending on the wrapper watchdog's destination/visibility behavior
+
+Throttle:
+- `WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS` default `3600`
+
+### 2) `stalled`
+Default: **nudge main agent first**, then escalate to Eric only after repetition.
+
+Behavior:
+- call internal OpenClaw agent route:
+  - `node .../openclaw.mjs agent --agent main --message ...`
+- maintain local notify state under `state/watchdog-b/notify-state.json`
+- after repeated observations (`WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER`, default `2`), enqueue an owner report
+
+Throttle:
+- `WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS` default `900`
+
+Owner escalation mode:
+- `WATCHDOG_B_STALLED_OWNER_MODE=escalate` (default)
+- `WATCHDOG_B_STALLED_OWNER_MODE=always`
+- `WATCHDOG_B_STALLED_OWNER_MODE=never`
+
+Owner delivery mode after enqueue:
+- `WATCHDOG_B_OWNER_DELIVERY_MODE=enqueue-only` (default)
+- `WATCHDOG_B_OWNER_DELIVERY_MODE=direct-discord`
+
+When `direct-discord` is enabled, watchdog-b still enqueues first, then directly delivers that same pending report via `owner_report_driver.py` to the env-configured Discord target.
+
+## Owner-facing message style
+
+Owner-facing Discord message is now compact and conclusion-first:
+- first line: headline (`🔔 [watchdog-b] `)
+- second line: concise conclusion with emoji, e.g. `✅ 主程序仍在運行` ("main process still running")
+- third line: actionable next step, prefixed with `→`
+- last line: compact technical metadata (`task=... | status=... | progress=... | source=...`)
+
+Style knobs live in `watchdog-b.env`:
+- `WATCHDOG_B_RUNNING_EMOJI`, `WATCHDOG_B_RUNNING_SUMMARY`
+- `WATCHDOG_B_STALLED_EMOJI`, `WATCHDOG_B_STALLED_SUMMARY`
+- `WATCHDOG_B_IDLE_EMOJI`, `WATCHDOG_B_IDLE_SUMMARY`
+
+### 3) `idle`
+Default: same pattern as stalled, but slower.
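The nudge-then-escalate pattern shared by `stalled` and `idle` can be sketched as a small decision function. This is an illustrative sketch only, not the logic in `notify_watchdog_b.py`: the names `NudgePolicy` and `decide` are hypothetical, and the reading of the `always` / `never` owner modes is an assumption. The defaults mirror the documented stalled values (`900` seconds, escalation after `2` observations).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NudgePolicy:
    # Mirrors WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS (default 900).
    nudge_min_interval_seconds: int = 900
    # Mirrors WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER (default 2).
    owner_escalation_after: int = 2
    # Mirrors WATCHDOG_B_STALLED_OWNER_MODE: escalate | always | never.
    owner_mode: str = "escalate"


def decide(
    policy: NudgePolicy,
    now: float,
    last_nudge_at: Optional[float],
    observations: int,
) -> Tuple[bool, bool]:
    """Return (nudge_main_agent, enqueue_owner_report) for one observation.

    `observations` counts repeated detections including the current one.
    """
    # Nudge only if no nudge was sent yet or the throttle window has elapsed.
    nudge = (
        last_nudge_at is None
        or (now - last_nudge_at) >= policy.nudge_min_interval_seconds
    )
    # Assumed semantics: "always" escalates every time, "never" suppresses
    # escalation, and "escalate" waits for the repetition threshold.
    if policy.owner_mode == "always":
        escalate = True
    elif policy.owner_mode == "never":
        escalate = False
    else:
        escalate = observations >= policy.owner_escalation_after
    return nudge, escalate
```

An `idle` policy would presumably reuse the same function with slower values, for example `NudgePolicy(nudge_min_interval_seconds=1800)`.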
+
+Behavior:
+- nudge main agent first
+- only escalate to Eric after repeated idle detections
+
+Throttle:
+- `WATCHDOG_B_IDLE_NUDGE_MIN_INTERVAL_SECONDS` default `1800`
+
+Owner escalation threshold:
+- `WATCHDOG_B_IDLE_OWNER_ESCALATION_AFTER` default `2`
+
+## Safety defaults
+
+- `WATCHDOG_B_NOTIFY_DRY_RUN=1` by default in `run_watchdog_b.sh`
+- owner-facing send path keeps the existing `owner-reporting-system` queue/artifact/audit flow, but direct Discord delivery no longer depends on the cron wrapper's default destination semantics
+- local state tracks last send time / count to reduce spam
+- a failed notifier does not crash the dispatcher; it emits warning + preserves artifacts
+
+## Key artifacts
+
+Under `state/watchdog-b/`:
+
+- `last-output.txt` — rendered dispatcher output
+- `last-notify-output.txt` — notifier JSON result
+- `last-state.txt` — last state
+- `history.tsv` — state history
+- `notify-state.json` — throttle / repetition tracking
+
+## Manual test examples
+
+### Dry-run running path
+```bash
+WATCHDOG_B_NOTIFY_DRY_RUN=1 \
+WATCHDOG_B_RUNNING_REPORT_MODE=manual \
+./scripts/watchdog-b/notify_watchdog_b.py --state running --dry-run
+```
+
+### Real queue creation for running (no send)
+```bash
+WATCHDOG_B_NOTIFY_DRY_RUN=0 \
+WATCHDOG_B_RUNNING_REPORT_MODE=enqueue \
+./scripts/watchdog-b/notify_watchdog_b.py --state running
+ls -l ~/.clawteam/owner-reports/pending
+```
+
+### Runtime probe only
+```bash
+python3 ./scripts/watchdog-b/openclaw_runtime_probe.py --pretty
+```
+
+### Single-shot enqueue + direct Discord delivery
+```bash
+WATCHDOG_B_NOTIFY_DRY_RUN=0 \
+WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain \
+./scripts/watchdog-b/notify_watchdog_b.py --state running
+```
+
+### Dry-run stalled nudge
+```bash
+WATCHDOG_B_NOTIFY_DRY_RUN=1 \
+./scripts/watchdog-b/notify_watchdog_b.py --state stalled --dry-run
+```
+
+If runtime auto-detection fails on a host with a custom install layout, set one or more of:
+- `WATCHDOG_B_NODE_BIN`
+- `WATCHDOG_B_OPENCLAW_MJS`
+- `WATCHDOG_B_OPENCLAW_ENTRY`
+
+### Full dispatcher dry-run with fixture overrides
+```bash
+OPENCLAW_PID_FILE=$PWD/tests/fixtures/watchdog-b/running/host-runtime/openclaw.pid \
+OPENCLAW_LOG_FILE=$PWD/tests/fixtures/watchdog-b/running/logs/openclaw.log \
+WATCHDOG_B_ARTIFACT_DIR=$PWD/state/watchdog-b-test-running-v3 \
+WATCHDOG_B_NOTIFY_DRY_RUN=1 \
+./scripts/watchdog-b/run_watchdog_b.sh
+```
+
+## What is truly wired vs not
+
+### Truly wired now
+- state detection
+- notifier invocation from dispatcher
+- main-agent internal nudge command construction and execution path
+- owner-report queue creation via existing producer
+- optional direct delivery of an enqueued owner report through `owner_report_driver.py`
+- throttling / repetition state persisted locally
+
+### Still conditional / not claimed as universally proven
+- successful main-agent wake-up depends on local OpenClaw CLI/runtime being callable from this environment
+- successful owner-facing delivery still depends on valid local OpenClaw Discord routing on the host
+- direct watchdog-b owner delivery now targets `channel:1480577550445969541` by default and bypasses the wrapper watchdog destination logic
+- the cron wrapper remains available for generic queue draining, but watchdog-b no longer needs to rely on it for single-shot owner-facing verification
diff --git a/scripts/__pycache__/notify_watchdog_b.cpython-312.pyc b/scripts/__pycache__/notify_watchdog_b.cpython-312.pyc
new file mode 100644
index 0000000..360593f
Binary files /dev/null and b/scripts/__pycache__/notify_watchdog_b.cpython-312.pyc differ
diff --git a/scripts/__pycache__/openclaw_runtime_probe.cpython-312.pyc b/scripts/__pycache__/openclaw_runtime_probe.cpython-312.pyc
new file mode 100644
index 0000000..62c5f74
Binary files /dev/null and b/scripts/__pycache__/openclaw_runtime_probe.cpython-312.pyc differ
diff --git a/scripts/__pycache__/owner_report_consumer.cpython-312.pyc b/scripts/__pycache__/owner_report_consumer.cpython-312.pyc
new file mode 100644
index 0000000..c4d861f
Binary files /dev/null and b/scripts/__pycache__/owner_report_consumer.cpython-312.pyc differ
diff --git a/scripts/__pycache__/owner_report_driver.cpython-312.pyc b/scripts/__pycache__/owner_report_driver.cpython-312.pyc
new file mode 100644
index 0000000..8f72d72
Binary files /dev/null and b/scripts/__pycache__/owner_report_driver.cpython-312.pyc differ
diff --git a/scripts/__pycache__/owner_report_producer.cpython-312.pyc b/scripts/__pycache__/owner_report_producer.cpython-312.pyc
new file mode 100644
index 0000000..2c0eb2c
Binary files /dev/null and b/scripts/__pycache__/owner_report_producer.cpython-312.pyc differ
diff --git a/scripts/bootstrap_watchdog_bundle.sh b/scripts/bootstrap_watchdog_bundle.sh
new file mode 100755
index 0000000..303add1
--- /dev/null
+++ b/scripts/bootstrap_watchdog_bundle.sh
@@ -0,0 +1,174 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
+SKILL_DIR="$(cd -- "$SCRIPT_DIR/.." && pwd)"
+HOME_DIR="${HOME:?HOME is required}"
+WORKSPACE_DEFAULT="$HOME_DIR/.openclaw/workspace"
+WORKSPACE="${WATCHDOG_B_WORKSPACE:-$WORKSPACE_DEFAULT}"
+LIVE_SCRIPT_DIR="${WATCHDOG_B_LIVE_SCRIPT_DIR:-$WORKSPACE/scripts/watchdog-b}"
+SYSTEMD_USER_DIR="${WATCHDOG_B_SYSTEMD_USER_DIR:-$HOME_DIR/.config/systemd/user}"
+CONFIG_DIR="${WATCHDOG_B_CONFIG_DIR:-$HOME_DIR/.config/openclaw}"
+CONFIG_FILE="${WATCHDOG_B_CONFIG_FILE:-$CONFIG_DIR/watchdog-b.env}"
+PROBE_SCRIPT="${WATCHDOG_B_RUNTIME_PROBE:-$SCRIPT_DIR/openclaw_runtime_probe.py}"
+NODE_BIN_RAW="${WATCHDOG_B_NODE_BIN:-}"
+OPENCLAW_MJS="${WATCHDOG_B_OPENCLAW_MJS:-}"
+OPENCLAW_ENTRY="${WATCHDOG_B_OPENCLAW_ENTRY:-}"
+OWNER_REPORT_PRODUCER="${WATCHDOG_B_OWNER_PRODUCER:-$LIVE_SCRIPT_DIR/owner_report_producer.py}"
+OWNER_REPORT_DRIVER="${WATCHDOG_B_OWNER_DRIVER:-$LIVE_SCRIPT_DIR/owner_report_driver.py}"
+OWNER_REPORT_CONSUMER_DEFAULT="$LIVE_SCRIPT_DIR/owner_report_consumer.py"
+OWNER_REPORT_CONSUMER="${WATCHDOG_B_OWNER_REPORT_CONSUMER:-$OWNER_REPORT_CONSUMER_DEFAULT}"
+FAILURES=0
+
+pass() { echo "[PASS] $*"; }
+warn() { echo "[WARN] $*"; }
+fail() { echo "[FAIL] $*"; FAILURES=$((FAILURES+1)); }
+
+check_exists() {
+  local path="$1" label="$2"
+  if [[ -e "$path" ]]; then
+    pass "$label: $path"
+  else
+    fail "$label missing: $path"
+  fi
+}
+
+check_exec_path() {
+  local raw="$1" label="$2"
+  local resolved=""
+  if [[ "$raw" == */* ]]; then
+    resolved="$raw"
+    if [[ -x "$resolved" ]]; then
+      pass "$label executable: $resolved"
+    else
+      fail "$label not executable: $resolved"
+    fi
+    return
+  fi
+  if resolved="$(command -v "$raw" 2>/dev/null)"; then
+    pass "$label on PATH: $resolved"
+  else
+    fail "$label not found on PATH: $raw"
+  fi
+}
+
+check_systemd_user() {
+  if ! command -v systemctl >/dev/null 2>&1; then
+    fail "systemctl not found"
+    return
+  fi
+  if systemctl --user --version >/dev/null 2>&1; then
+    pass "systemd --user command available"
+  else
+    fail "systemd --user unavailable"
+  fi
+  if systemctl --user show-environment >/dev/null 2>&1; then
+    pass "systemd --user bus reachable"
+  else
+    warn "systemd --user bus not reachable in current session"
+  fi
+}
+
+check_env_target() {
+  if [[ ! -f "$CONFIG_FILE" ]]; then
+    warn "config file not present yet: $CONFIG_FILE"
+    return
+  fi
+  local target=""
+  target="$(awk -F= '/^WATCHDOG_B_OWNER_REPORT_TARGET=/{print $2}' "$CONFIG_FILE" | tail -n 1 | tr -d '[:space:]' || true)"
+  if [[ -z "$target" ]]; then
+    fail "WATCHDOG_B_OWNER_REPORT_TARGET missing in $CONFIG_FILE"
+  elif [[ "$target" == "channel:REPLACE_ME" ]]; then
+    fail "WATCHDOG_B_OWNER_REPORT_TARGET still placeholder in $CONFIG_FILE"
+  elif [[ "$target" == channel:* || "$target" == user:* ]]; then
+    pass "WATCHDOG_B_OWNER_REPORT_TARGET looks configured: $target"
+  else
+    warn "WATCHDOG_B_OWNER_REPORT_TARGET present but format is unusual: $target"
+  fi
+}
+
+probe_runtime() {
+  if [[ ! -f "$PROBE_SCRIPT" ]]; then
+    fail "runtime probe missing: $PROBE_SCRIPT"
+    return
+  fi
+
+  local probe_output=""
+  if ! probe_output="$(python3 "$PROBE_SCRIPT" --shell 2>/dev/null)"; then
+    fail "runtime probe failed; set WATCHDOG_B_NODE_BIN / WATCHDOG_B_OPENCLAW_MJS / WATCHDOG_B_OPENCLAW_ENTRY explicitly"
+    return
+  fi
+
+  while IFS='=' read -r key value; do
+    case "$key" in
+      WATCHDOG_B_NODE_BIN) NODE_BIN_RAW="$value" ;;
+      WATCHDOG_B_OPENCLAW_MJS) OPENCLAW_MJS="$value" ;;
+      WATCHDOG_B_OPENCLAW_ENTRY) OPENCLAW_ENTRY="$value" ;;
+    esac
+  done <<< "$probe_output"
+
+  pass "runtime probe resolved node/openclaw paths"
+}
+
+check_message_cli() {
+  probe_runtime
+  if [[ -n "$OPENCLAW_ENTRY" && -f "$OPENCLAW_ENTRY" ]]; then
+    pass "openclaw entry present: $OPENCLAW_ENTRY"
+  else
+    fail "openclaw entry missing: ${OPENCLAW_ENTRY:-}"
+  fi
+  if [[ -n "$OPENCLAW_MJS" && -f "$OPENCLAW_MJS" ]]; then
+    pass "openclaw mjs present: $OPENCLAW_MJS"
+  else
+    fail "openclaw mjs missing: ${OPENCLAW_MJS:-}"
+  fi
+}
+
+echo "watchdog-discord-route bootstrap"
+echo "- skill_dir: $SKILL_DIR"
+echo "- workspace: $WORKSPACE"
+echo "- live_script_dir: $LIVE_SCRIPT_DIR"
+echo "- systemd_user_dir: $SYSTEMD_USER_DIR"
+echo "- config_file: $CONFIG_FILE"
+
+echo
+echo "[bundle]"
+check_exists "$SCRIPT_DIR/check_openclaw_state.sh" "bundled checker"
+check_exists "$SCRIPT_DIR/run_watchdog_b.sh" "bundled runner"
+check_exists "$SCRIPT_DIR/notify_watchdog_b.py" "bundled notifier"
+check_exists "$SCRIPT_DIR/openclaw_runtime_probe.py" "bundled runtime probe"
+check_exists "$SCRIPT_DIR/openclaw-watchdog-b.service" "bundled service"
+check_exists "$SCRIPT_DIR/openclaw-watchdog-b.timer" "bundled timer"
+check_exists "$SCRIPT_DIR/watchdog-b.env.example" "bundled env example"
+
+echo
+echo "[workspace/live paths]"
+check_exists "$WORKSPACE" "workspace"
+check_exists "$LIVE_SCRIPT_DIR" "live script dir"
+check_exists "$OWNER_REPORT_CONSUMER" "live owner_report_consumer.py"
+check_exists "$OWNER_REPORT_PRODUCER" "live owner_report_producer.py"
+check_exists "$OWNER_REPORT_DRIVER" "live owner_report_driver.py"
+
+echo
+echo "[runtime]"
+check_message_cli
+if [[ -n "$NODE_BIN_RAW" ]]; then
+  check_exec_path "$NODE_BIN_RAW" "node"
+else
+  fail "node runtime unresolved"
+fi
+check_exec_path "python3" "python3"
+check_systemd_user
+
+echo
+echo "[discord-route minimal config]"
+check_env_target
+
+if [[ $FAILURES -gt 0 ]]; then
+  echo
+  fail "bootstrap failed with $FAILURES issue(s)"
+  exit 1
+fi
+
+echo
+pass "bootstrap checks passed"
diff --git a/scripts/check_openclaw_state.sh b/scripts/check_openclaw_state.sh
new file mode 100755
index 0000000..82aa1a3
--- /dev/null
+++ b/scripts/check_openclaw_state.sh
@@ -0,0 +1,68 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Watchdog B MVP tri-state checker for OpenClaw main runtime.
+# Output (stdout): exactly one token: running | stalled | idle
+#
+# Heuristic (MVP):
+# - If openclaw.pid exists and process is alive => running unless logs are stale.
+# - If process alive but log file hasn't changed for STALL_AFTER_SECONDS => stalled.
+# - Otherwise => idle.
+#
+# Future extension point:
+# - Replace/augment log-freshness with real main-agent session/ledger signals.
+
+PID_FILE_DEFAULT="${OPENCLAW_PID_FILE:-/home/chchang/.openclaw/workspace/host-runtime/openclaw.pid}"
+LOG_FILE_DEFAULT="${OPENCLAW_LOG_FILE:-/home/chchang/.openclaw/workspace/logs/openclaw.log}"
+
+STALL_AFTER_SECONDS="${STALL_AFTER_SECONDS:-1200}" # 20 minutes default
+NOW_EPOCH="$(date +%s)"
+
+pid_file="$PID_FILE_DEFAULT"
+log_file="$LOG_FILE_DEFAULT"
+
+get_mtime_epoch() {
+  # GNU stat: %Y; BSD stat: -f %m
+  local path="$1"
+  if stat -c %Y "$path" >/dev/null 2>&1; then
+    stat -c %Y "$path"
+  else
+    stat -f %m "$path"
+  fi
+}
+
+proc_alive() {
+  local pid="$1"
+  [[ -n "$pid" ]] || return 1
+  [[ "$pid" =~ ^[0-9]+$ ]] || return 1
+  kill -0 "$pid" >/dev/null 2>&1
+}
+
+# No pid file => idle
+if [[ ! -f "$pid_file" ]]; then
+  echo "idle"
+  exit 0
+fi
+
+pid="$(tr -d ' \t\n\r' < "$pid_file" || true)"
+
+# PID file exists but process not alive => idle
+if ! 
proc_alive "$pid"; then + echo "idle" + exit 0 +fi + +# Process alive. If no log file, assume running (can't assess stall) +if [[ ! -f "$log_file" ]]; then + echo "running" + exit 0 +fi + +log_mtime="$(get_mtime_epoch "$log_file")" +age=$(( NOW_EPOCH - log_mtime )) + +if (( age > STALL_AFTER_SECONDS )); then + echo "stalled" +else + echo "running" +fi diff --git a/scripts/install_watchdog_bundle.sh b/scripts/install_watchdog_bundle.sh new file mode 100755 index 0000000..bb56a0c --- /dev/null +++ b/scripts/install_watchdog_bundle.sh @@ -0,0 +1,136 @@ +#!/usr/bin/env bash +set -euo pipefail + +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +SKILL_DIR="$(cd -- "$SCRIPT_DIR/.." && pwd)" +HOME_DIR="${HOME:?HOME is required}" +WORKSPACE_DEFAULT="$HOME_DIR/.openclaw/workspace" +WORKSPACE="${WATCHDOG_B_WORKSPACE:-$WORKSPACE_DEFAULT}" +SYSTEMD_USER_DIR="${WATCHDOG_B_SYSTEMD_USER_DIR:-$HOME_DIR/.config/systemd/user}" +CONFIG_DIR="${WATCHDOG_B_CONFIG_DIR:-$HOME_DIR/.config/openclaw}" +LIVE_SCRIPT_DIR="${WATCHDOG_B_LIVE_SCRIPT_DIR:-$WORKSPACE/scripts/watchdog-b}" +INSTALL_ENV_EXAMPLE=0 +FORCE=0 + +usage() { + cat </scripts/watchdog-b) + --install-env-example Also install watchdog-b.env.example to /watchdog-b.env.example + --force Overwrite existing files in live paths + -h, --help Show this help +EOF +} + +while [[ $# -gt 0 ]]; do + case "$1" in + --workspace) + WORKSPACE="$2"; shift 2 ;; + --systemd-user-dir) + SYSTEMD_USER_DIR="$2"; shift 2 ;; + --config-dir) + CONFIG_DIR="$2"; shift 2 ;; + --live-script-dir) + LIVE_SCRIPT_DIR="$2"; shift 2 ;; + --install-env-example) + INSTALL_ENV_EXAMPLE=1; shift ;; + --force) + FORCE=1; shift ;; + -h|--help) + usage; exit 0 ;; + *) + echo "unknown argument: $1" >&2 + usage >&2 + exit 2 ;; + esac +done + +mkdir -p "$LIVE_SCRIPT_DIR" "$SYSTEMD_USER_DIR" "$CONFIG_DIR" + +copy_file() { + local src="$1" + local dest="$2" + if [[ -e "$dest" && "$FORCE" != "1" ]]; then + echo "skip existing: $dest" + return 0 + fi + install -m 
0644 "$src" "$dest" + echo "installed: $dest" +} + +copy_exec() { + local src="$1" + local dest="$2" + if [[ -e "$dest" && "$FORCE" != "1" ]]; then + echo "skip existing: $dest" + return 0 + fi + install -m 0755 "$src" "$dest" + echo "installed: $dest" +} + +render_service() { + local src="$SCRIPT_DIR/openclaw-watchdog-b.service" + local dest="$SYSTEMD_USER_DIR/openclaw-watchdog-b.service" + if [[ -e "$dest" && "$FORCE" != "1" ]]; then + echo "skip existing: $dest" + return 0 + fi + sed \ + -e "s#%h/.openclaw/workspace/scripts/watchdog-b#${LIVE_SCRIPT_DIR//\#/\\#}#g" \ + -e "s#%h/.openclaw/workspace#${WORKSPACE//\#/\\#}#g" \ + -e "s#%h/.config/openclaw#${CONFIG_DIR//\#/\\#}#g" \ + "$src" > "$dest" + chmod 0644 "$dest" + echo "installed: $dest" +} + +copy_exec "$SCRIPT_DIR/check_openclaw_state.sh" "$LIVE_SCRIPT_DIR/check_openclaw_state.sh" +copy_exec "$SCRIPT_DIR/run_watchdog_b.sh" "$LIVE_SCRIPT_DIR/run_watchdog_b.sh" +copy_exec "$SCRIPT_DIR/verify_watchdog_b_e2e.sh" "$LIVE_SCRIPT_DIR/verify_watchdog_b_e2e.sh" +copy_exec "$SCRIPT_DIR/notify_watchdog_b.py" "$LIVE_SCRIPT_DIR/notify_watchdog_b.py" +copy_exec "$SCRIPT_DIR/openclaw_runtime_probe.py" "$LIVE_SCRIPT_DIR/openclaw_runtime_probe.py" +copy_file "$SCRIPT_DIR/owner_report_consumer.py" "$LIVE_SCRIPT_DIR/owner_report_consumer.py" +copy_file "$SCRIPT_DIR/owner_report_driver.py" "$LIVE_SCRIPT_DIR/owner_report_driver.py" +copy_file "$SCRIPT_DIR/owner_report_producer.py" "$LIVE_SCRIPT_DIR/owner_report_producer.py" +copy_file "$SCRIPT_DIR/openclaw-watchdog-b.timer" "$SYSTEMD_USER_DIR/openclaw-watchdog-b.timer" +render_service + +if [[ "$INSTALL_ENV_EXAMPLE" == "1" ]]; then + copy_file "$SCRIPT_DIR/watchdog-b.env.example" "$CONFIG_DIR/watchdog-b.env.example" +fi + +cat < None: + if not path.exists(): + return + for raw_line in path.read_text(encoding="utf-8").splitlines(): + line = raw_line.strip() + if not line or line.startswith("#") or "=" not in line: + continue + key, value = line.split("=", 1) + key = key.strip() + 
if not key: + continue + value = value.strip() + if (value.startswith('"') and value.endswith('"')) or (value.startswith("'") and value.endswith("'")): + value = value[1:-1] + os.environ.setdefault(key, value) + + +load_env_file(CONFIG_FILE) + +STATE_DIR = Path(os.environ.get("WATCHDOG_B_ARTIFACT_DIR", str(WORKSPACE / "state" / "watchdog-b"))) +NOTIFY_STATE_PATH = STATE_DIR / "notify-state.json" +OWNER_PRODUCER = Path(os.environ.get("WATCHDOG_B_OWNER_PRODUCER", str(SCRIPT_DIR / "owner_report_producer.py"))) +OWNER_DRIVER = Path(os.environ.get("WATCHDOG_B_OWNER_DRIVER", str(SCRIPT_DIR / "owner_report_driver.py"))) +PYTHON_BIN = os.environ.get("WATCHDOG_B_PYTHON_BIN", sys.executable or "python3") +WATCHDOG_OWNER_REPORT_CHANNEL = os.environ.get("WATCHDOG_B_OWNER_REPORT_CHANNEL", "discord") +WATCHDOG_OWNER_REPORT_TARGET = os.environ.get("WATCHDOG_B_OWNER_REPORT_TARGET", "channel:REPLACE_ME") +WATCHDOG_MAIN_AGENT_ID = os.environ.get("WATCHDOG_B_MAIN_AGENT_ID", "").strip() +HOSTNAME = os.uname().nodename +UTC = timezone.utc +RUNTIME_PROBE = Path(os.environ.get("WATCHDOG_B_RUNTIME_PROBE", str(SCRIPT_DIR / "openclaw_runtime_probe.py"))) +RUNTIME_CACHE: dict[str, Path] | None = None + +DEFAULTS = { + "running_min_interval_seconds": 3600, + "stalled_nudge_min_interval_seconds": 900, + "idle_nudge_min_interval_seconds": 1800, + "stalled_owner_escalation_after": 2, + "idle_owner_escalation_after": 2, +} + + +def now_iso() -> str: + return datetime.now().astimezone().isoformat(timespec="seconds") + + +def path_or_none(value: str | None) -> Path | None: + if not value: + return None + return Path(value).expanduser() + + +def detect_runtime_paths() -> dict[str, Path]: + global RUNTIME_CACHE + if RUNTIME_CACHE is not None: + return RUNTIME_CACHE + + node_bin = path_or_none(os.environ.get("WATCHDOG_B_NODE_BIN")) + openclaw_mjs = path_or_none(os.environ.get("WATCHDOG_B_OPENCLAW_MJS")) + openclaw_entry = path_or_none(os.environ.get("WATCHDOG_B_OPENCLAW_ENTRY")) + + if node_bin and 
node_bin.exists() and os.access(node_bin, os.X_OK) and openclaw_mjs and openclaw_mjs.is_file() and openclaw_entry and openclaw_entry.is_file(): + RUNTIME_CACHE = { + "node": node_bin, + "openclaw_mjs": openclaw_mjs, + "openclaw_entry": openclaw_entry, + } + return RUNTIME_CACHE + + if RUNTIME_PROBE.exists(): + proc = subprocess.run([PYTHON_BIN, str(RUNTIME_PROBE)], text=True, capture_output=True) + if proc.returncode == 0: + payload = json.loads(proc.stdout) + detected = payload.get("detected", {}) + RUNTIME_CACHE = { + "node": Path(detected["node"]), + "openclaw_mjs": Path(detected["openclaw_mjs"]), + "openclaw_entry": Path(detected["openclaw_entry"]), + } + return RUNTIME_CACHE + + node_which = shutil.which("node") + if node_which: + node_bin = Path(node_which) + + missing = [] + if not node_bin or not node_bin.exists(): + missing.append("WATCHDOG_B_NODE_BIN") + if not openclaw_mjs or not openclaw_mjs.is_file(): + missing.append("WATCHDOG_B_OPENCLAW_MJS") + if not openclaw_entry or not openclaw_entry.is_file(): + missing.append("WATCHDOG_B_OPENCLAW_ENTRY") + raise RuntimeError( + "Unable to auto-detect watchdog runtime paths. 
Missing: " + ", ".join(missing) + ) + + +def load_state() -> dict[str, Any]: + if NOTIFY_STATE_PATH.exists(): + try: + return json.loads(NOTIFY_STATE_PATH.read_text(encoding="utf-8")) + except Exception: + pass + return {"events": {}} + + +def save_state(data: dict[str, Any]) -> None: + STATE_DIR.mkdir(parents=True, exist_ok=True) + NOTIFY_STATE_PATH.write_text(json.dumps(data, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + +def event_bucket(state: str) -> dict[str, Any]: + data = load_state() + events = data.setdefault("events", {}) + bucket = events.setdefault(state, {}) + return bucket + + +def get_bucket(data: dict[str, Any], state: str) -> dict[str, Any]: + events = data.setdefault("events", {}) + return events.setdefault(state, {}) + + +def should_send(bucket: dict[str, Any], min_interval_seconds: int, timestamp: datetime) -> tuple[bool, str]: + last_sent = bucket.get("last_sent_at") + if not last_sent: + return True, "first-send" + try: + prev = datetime.fromisoformat(last_sent) + except Exception: + return True, "state-corrupt-reset" + elapsed = (timestamp - prev).total_seconds() + if elapsed >= min_interval_seconds: + return True, f"interval-ok:{int(elapsed)}s" + return False, f"throttled:{int(elapsed)}s<{min_interval_seconds}s" + + +def mark_sent(bucket: dict[str, Any], channel: str, timestamp: str, detail: dict[str, Any] | None = None) -> None: + bucket["last_sent_at"] = timestamp + bucket["last_channel"] = channel + bucket["send_count"] = int(bucket.get("send_count", 0)) + 1 + bucket["last_detail"] = detail or {} + + +def build_owner_message(state: str, timestamp: str, detail: str) -> dict[str, str]: + emoji_default = { + "running": "✅", + "stalled": "⚠️", + "idle": "🛑", + } + summary_default = { + "running": "主程序仍在運行", + "stalled": "主程序疑似卡住", + "idle": "主程序目前未運行", + } + progress_default = { + "running": "running", + "stalled": "stalled", + "idle": "idle", + } + status_default = { + "running": "normal", + "stalled": "needs-attention", + 
"idle": "needs-attention", + } + source_default = { + "running": "watchdog-b-running", + "stalled": "watchdog-b-stalled-escalation", + "idle": "watchdog-b-idle-escalation", + } + detail_default = { + "running": f"checked_at={timestamp} host={HOSTNAME}", + "stalled": f"checked_at={timestamp} host={HOSTNAME}; stale activity detected while process still looked alive", + "idle": f"checked_at={timestamp} host={HOSTNAME}; no active main runtime detected", + } + return { + "progress": os.environ.get(f"WATCHDOG_B_{state.upper()}_PROGRESS_LABEL", progress_default[state]), + "done": f"{os.environ.get(f'WATCHDOG_B_{state.upper()}_EMOJI', emoji_default[state])} {os.environ.get(f'WATCHDOG_B_{state.upper()}_SUMMARY', summary_default[state])}", + "next": detail or os.environ.get(f"WATCHDOG_B_{state.upper()}_DETAIL", detail_default[state]), + "status": os.environ.get(f"WATCHDOG_B_{state.upper()}_STATUS", status_default[state]), + "source": os.environ.get(f"WATCHDOG_B_{state.upper()}_SOURCE", source_default[state]), + } + + +def enqueue_owner_report(*, state: str, timestamp: str, dry_run: bool, detail: str) -> dict[str, Any]: + msg = build_owner_message(state, timestamp, detail) + report_id = f"watchdog-b-{state}-{datetime.now(UTC).strftime('%Y%m%dT%H%M%SZ')}" + cmd = [ + PYTHON_BIN, + str(OWNER_PRODUCER), + "--team", + "watchdog-b", + "--worker", + HOSTNAME, + "--task-id", + f"openclaw-main-{state}", + "--progress", + msg["progress"], + "--done", + msg["done"], + "--next", + msg["next"], + "--status", + msg["status"], + "--source", + msg["source"], + "--report-id", + report_id, + ] + if dry_run: + cmd.append("--dry-run") + proc = subprocess.run(cmd, text=True, capture_output=True) + result = { + "kind": "owner-report-enqueue", + "ok": proc.returncode == 0, + "command": cmd, + "exit_code": proc.returncode, + "stdout": proc.stdout, + "stderr": proc.stderr, + "report_id": report_id, + "dry_run": dry_run, + } + if proc.returncode == 0 and not dry_run: + result["pending_path"] = 
str(Path.home() / ".clawteam" / "owner-reports" / "pending" / f"{report_id}.md") + return result + + +def build_owner_send_cmd() -> str: + runtime = detect_runtime_paths() + return ( + f'"{runtime["node"]}" "{runtime["openclaw_entry"]}" message send ' + f'--channel {WATCHDOG_OWNER_REPORT_CHANNEL} ' + f"--target '{WATCHDOG_OWNER_REPORT_TARGET}' " + f'--message "$OWNER_REPORT_MESSAGE"' + ) + + +def deliver_owner_report(*, report_id: str, dry_run: bool) -> dict[str, Any]: + send_cmd = build_owner_send_cmd() + cmd = [PYTHON_BIN, str(OWNER_DRIVER), report_id, "--send-cmd", send_cmd] + if dry_run: + cmd.append("--dry-run") + proc = subprocess.run(cmd, text=True, capture_output=True) + return { + "kind": "owner-report-direct-delivery", + "ok": proc.returncode == 0, + "command": cmd, + "send_cmd": send_cmd, + "exit_code": proc.returncode, + "stdout": proc.stdout, + "stderr": proc.stderr, + "dry_run": dry_run, + "report_id": report_id, + "target_channel": WATCHDOG_OWNER_REPORT_CHANNEL, + "target": WATCHDOG_OWNER_REPORT_TARGET, + } + + +def call_main_agent(*, state: str, timestamp: str, dry_run: bool) -> dict[str, Any]: + message = ( + f"[watchdog-b][{state}] {timestamp}\n" + f"Host: {HOSTNAME}\n" + f"Please confirm current task state, whether progress is blocked, and whether owner-facing escalation is needed." 
+ ) + if not WATCHDOG_MAIN_AGENT_ID: + return { + "kind": "main-agent-nudge", + "ok": True, + "skipped": True, + "reason": "WATCHDOG_B_MAIN_AGENT_ID not configured", + "dry_run": dry_run, + "message": message, + } + try: + runtime = detect_runtime_paths() + except Exception as exc: + return { + "kind": "main-agent-nudge", + "ok": False, + "dry_run": dry_run, + "error": str(exc), + "message": message, + } + cmd = [ + str(runtime["node"]), + str(runtime["openclaw_mjs"]), + "agent", + "--agent", + WATCHDOG_MAIN_AGENT_ID, + "--message", + message, + "--timeout", + os.environ.get("WATCHDOG_B_MAIN_AGENT_TIMEOUT", "120"), + ] + if dry_run: + return {"kind": "main-agent-nudge", "ok": True, "dry_run": True, "command": cmd, "message": message} + try: + proc = subprocess.run(cmd, text=True, capture_output=True, timeout=int(os.environ.get("WATCHDOG_B_MAIN_AGENT_TIMEOUT", "120")) + 10) + return { + "kind": "main-agent-nudge", + "ok": proc.returncode == 0, + "dry_run": False, + "command": cmd, + "exit_code": proc.returncode, + "stdout": proc.stdout, + "stderr": proc.stderr, + "message": message, + } + except subprocess.TimeoutExpired as e: + return { + "kind": "main-agent-nudge", + "ok": False, + "dry_run": False, + "command": cmd, + "timeout": True, + "stdout": e.stdout, + "stderr": e.stderr, + "message": message, + } + + +def maybe_running_report(data: dict[str, Any], bucket: dict[str, Any], timestamp: str, dry_run: bool) -> dict[str, Any]: + mode = os.environ.get("WATCHDOG_B_RUNNING_REPORT_MODE", "manual").lower() + min_interval = int(os.environ.get("WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS", str(DEFAULTS["running_min_interval_seconds"]))) + allowed, reason = should_send(bucket, min_interval, datetime.fromisoformat(timestamp)) + result: dict[str, Any] = { + "state": "running", + "route": "owner-report", + "mode": mode, + "allowed": allowed, + "reason": reason, + "dry_run": dry_run, + } + if mode not in {"manual", "enqueue", "enqueue-and-drain"}: + result.update({"ok": 
False, "error": f"unsupported running mode: {mode}"}) + return result + if mode == "manual": + result.update({ + "ok": True, + "action": "manual-only", + "hint": "set WATCHDOG_B_RUNNING_REPORT_MODE=enqueue to create a real pending item, or enqueue-and-drain to enqueue and directly deliver it to Discord", + }) + return result + if not allowed: + result.update({"ok": True, "action": "suppressed"}) + return result + enqueue = enqueue_owner_report(state="running", timestamp=timestamp, dry_run=dry_run, detail="Main runtime alive and log activity fresh.") + result["enqueue"] = enqueue + result["ok"] = enqueue.get("ok", False) + if enqueue.get("ok"): + mark_sent(bucket, "owner-report-enqueue", timestamp, {"report_id": enqueue.get("report_id")}) + if mode == "enqueue-and-drain" and enqueue.get("ok"): + deliver = deliver_owner_report(report_id=enqueue.get("report_id"), dry_run=dry_run) + result["deliver"] = deliver + result["ok"] = result["ok"] and deliver.get("ok", False) + if deliver.get("ok"): + mark_sent(bucket, "owner-report-direct-delivery", timestamp, {"report_id": enqueue.get("report_id")}) + return result + + +def maybe_nudge_and_escalate(data: dict[str, Any], bucket: dict[str, Any], *, state: str, timestamp: str, dry_run: bool) -> dict[str, Any]: + is_stalled = state == "stalled" + nudge_min = int(os.environ.get( + "WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS" if is_stalled else "WATCHDOG_B_IDLE_NUDGE_MIN_INTERVAL_SECONDS", + str(DEFAULTS["stalled_nudge_min_interval_seconds"] if is_stalled else DEFAULTS["idle_nudge_min_interval_seconds"]), + )) + escalation_after = int(os.environ.get( + "WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER" if is_stalled else "WATCHDOG_B_IDLE_OWNER_ESCALATION_AFTER", + str(DEFAULTS["stalled_owner_escalation_after"] if is_stalled else DEFAULTS["idle_owner_escalation_after"]), + )) + owner_mode = os.environ.get( + "WATCHDOG_B_STALLED_OWNER_MODE" if is_stalled else "WATCHDOG_B_IDLE_OWNER_MODE", + "escalate", + ).lower() + + 
bucket["seen_count"] = int(bucket.get("seen_count", 0)) + 1 + allowed, reason = should_send(bucket, nudge_min, datetime.fromisoformat(timestamp)) + result: dict[str, Any] = { + "state": state, + "route": "main-agent-then-owner", + "allowed": allowed, + "reason": reason, + "seen_count": bucket["seen_count"], + "owner_mode": owner_mode, + "dry_run": dry_run, + } + + if allowed: + nudge = call_main_agent(state=state, timestamp=timestamp, dry_run=dry_run) + result["main_agent_nudge"] = nudge + if nudge.get("ok"): + mark_sent(bucket, "main-agent", timestamp, {"state": state}) + result["ok"] = nudge.get("ok", False) + else: + result.update({"ok": True, "action": "nudge-suppressed"}) + + should_escalate = owner_mode in {"always", "escalate"} and bucket["seen_count"] >= escalation_after + if owner_mode == "never": + should_escalate = False + + if should_escalate: + owner_allowed, owner_reason = should_send(bucket, nudge_min, datetime.fromisoformat(timestamp)) + result["owner_escalation_gate"] = {"allowed": owner_allowed, "reason": owner_reason, "threshold": escalation_after} + if owner_allowed: + detail = "Main agent was nudged repeatedly; please review whether manual intervention is needed." 
+ enqueue = enqueue_owner_report(state=state, timestamp=timestamp, dry_run=dry_run, detail=detail) + result["owner_enqueue"] = enqueue + result["ok"] = result.get("ok", True) and enqueue.get("ok", False) + if enqueue.get("ok"): + mark_sent(bucket, "owner-report-enqueue", timestamp, {"report_id": enqueue.get("report_id"), "state": state}) + owner_delivery_mode = os.environ.get( + "WATCHDOG_B_OWNER_DELIVERY_MODE", + "enqueue-only", + ).lower() + result["owner_delivery_mode"] = owner_delivery_mode + if owner_delivery_mode == "direct-discord": + deliver = deliver_owner_report(report_id=enqueue.get("report_id"), dry_run=dry_run) + result["owner_deliver"] = deliver + result["ok"] = result.get("ok", True) and deliver.get("ok", False) + if deliver.get("ok"): + mark_sent(bucket, "owner-report-direct-delivery", timestamp, {"report_id": enqueue.get("report_id"), "state": state}) + return result + + +def main() -> int: + ap = argparse.ArgumentParser(description="Notification layer for watchdog-b") + ap.add_argument("--state", required=True, choices=["running", "stalled", "idle"]) + ap.add_argument("--timestamp", default=now_iso()) + ap.add_argument("--dry-run", action="store_true") + args = ap.parse_args() + + data = load_state() + bucket = get_bucket(data, args.state) + + if args.state == "running": + result = maybe_running_report(data, bucket, args.timestamp, args.dry_run) + else: + result = maybe_nudge_and_escalate(data, bucket, state=args.state, timestamp=args.timestamp, dry_run=args.dry_run) + + bucket["last_seen_at"] = args.timestamp + bucket["last_result"] = result + save_state(data) + print(json.dumps(result, ensure_ascii=False, indent=2)) + return 0 if result.get("ok", False) else 1 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/openclaw-watchdog-b.service b/scripts/openclaw-watchdog-b.service new file mode 100644 index 0000000..59d8cf9 --- /dev/null +++ b/scripts/openclaw-watchdog-b.service @@ -0,0 +1,17 @@ +# Template systemd --user 
unit for Watchdog B. +# Install to: ~/.config/systemd/user/openclaw-watchdog-b.service +# Optional env file: ~/.config/openclaw/watchdog-b.env + +[Unit] +Description=OpenClaw Watchdog B (verified direct Discord owner-facing path) +After=network-online.target +Wants=network-online.target + +[Service] +Type=oneshot +WorkingDirectory=%h/.openclaw/workspace +Environment=WATCHDOG_B_CONFIG_FILE=%h/.config/openclaw/watchdog-b.env +EnvironmentFile=-%h/.config/openclaw/watchdog-b.env +ExecStart=%h/.openclaw/workspace/scripts/watchdog-b/run_watchdog_b.sh +StandardOutput=journal +StandardError=journal diff --git a/scripts/openclaw-watchdog-b.timer b/scripts/openclaw-watchdog-b.timer new file mode 100644 index 0000000..3931085 --- /dev/null +++ b/scripts/openclaw-watchdog-b.timer @@ -0,0 +1,15 @@ +# Template systemd --user timer (DO NOT auto-install). +# Runs every 10 minutes. + +[Unit] +Description=Run OpenClaw Watchdog B every 10 minutes + +[Timer] +OnCalendar=*:0/10 +Persistent=true +# Optional jitter to avoid synchronized runs +RandomizedDelaySec=30 +Unit=openclaw-watchdog-b.service + +[Install] +WantedBy=timers.target diff --git a/scripts/openclaw_runtime_probe.py b/scripts/openclaw_runtime_probe.py new file mode 100644 index 0000000..8b5f11c --- /dev/null +++ b/scripts/openclaw_runtime_probe.py @@ -0,0 +1,200 @@ +#!/usr/bin/env python3 +from __future__ import annotations + +import argparse +import json +import os +import shutil +from pathlib import Path +from typing import Iterable + +HOME = Path.home() +ENV_KEYS = { + "node": "WATCHDOG_B_NODE_BIN", + "openclaw_mjs": "WATCHDOG_B_OPENCLAW_MJS", + "openclaw_entry": "WATCHDOG_B_OPENCLAW_ENTRY", +} + + +def dedupe(items: Iterable[Path]) -> list[Path]: + seen: set[str] = set() + out: list[Path] = [] + for item in items: + key = str(item) + if key in seen: + continue + seen.add(key) + out.append(item) + return out + + +def path_candidates() -> tuple[Path | None, list[Path], list[Path]]: + node_path = shutil.which("node") + 
openclaw_path = shutil.which("openclaw") + node_candidate = Path(node_path).resolve() if node_path else None + roots: list[Path] = [] + entry_candidates: list[Path] = [] + if openclaw_path: + op = Path(openclaw_path).resolve() + roots.extend([ + op.parent.parent / "lib" / "node_modules" / "openclaw", + op.parent.parent.parent / "lib" / "node_modules" / "openclaw", + ]) + entry_candidates.append(op.parent.parent / "lib" / "node_modules" / "openclaw" / "dist" / "entry.js") + if node_candidate: + roots.append(node_candidate.parent.parent / "lib" / "node_modules" / "openclaw") + return node_candidate, dedupe(roots), dedupe(entry_candidates) + + +def common_roots() -> list[Path]: + roots: list[Path] = [] + nvm_dir = Path(os.environ.get("NVM_DIR", HOME / ".nvm")).expanduser() + roots.extend([ + HOME / ".nvm" / "versions" / "node", + nvm_dir / "versions" / "node", + HOME / ".local" / "share" / "pnpm" / "global", + HOME / ".npm-global", + Path("/usr/local"), + Path("/usr"), + HOME / ".volta" / "tools" / "image", + ]) + return dedupe(roots) + + +def scan_openclaw_install_roots() -> list[Path]: + candidates: list[Path] = [] + for root in common_roots(): + if not root.exists(): + continue + if root.name == "node": + for child in sorted(root.glob("v*/lib/node_modules/openclaw"), reverse=True): + candidates.append(child) + continue + patterns = [ + "lib/node_modules/openclaw", + "node_modules/openclaw", + "*/lib/node_modules/openclaw", + "*/node_modules/openclaw", + ] + for pattern in patterns: + for child in sorted(root.glob(pattern), reverse=True): + candidates.append(child) + return dedupe(candidates) + + +def valid_node(path: Path | None) -> Path | None: + if path and path.exists() and os.access(path, os.X_OK): + return path + return None + + +def valid_file(path: Path | None) -> Path | None: + if path and path.is_file(): + return path + return None + + +def detect_runtime() -> dict[str, object]: + result: dict[str, object] = {"ok": False, "detected": {}, "sources": {}, 
"searched": {}} + detected: dict[str, str] = {} + sources: dict[str, str] = {} + searched: dict[str, list[str]] = {"node": [], "openclaw": []} + + env_node = os.environ.get(ENV_KEYS["node"]) + if env_node: + searched["node"].append(env_node) + node = valid_node(Path(env_node).expanduser()) + if node: + detected["node"] = str(node) + sources["node"] = f"env:{ENV_KEYS['node']}" + env_mjs = os.environ.get(ENV_KEYS["openclaw_mjs"]) + if env_mjs: + searched["openclaw"].append(env_mjs) + mjs = valid_file(Path(env_mjs).expanduser()) + if mjs: + detected["openclaw_mjs"] = str(mjs) + sources["openclaw_mjs"] = f"env:{ENV_KEYS['openclaw_mjs']}" + env_entry = os.environ.get(ENV_KEYS["openclaw_entry"]) + if env_entry: + searched["openclaw"].append(env_entry) + entry = valid_file(Path(env_entry).expanduser()) + if entry: + detected["openclaw_entry"] = str(entry) + sources["openclaw_entry"] = f"env:{ENV_KEYS['openclaw_entry']}" + + path_node, path_roots, path_entry_candidates = path_candidates() + if "node" not in detected and path_node: + searched["node"].append(str(path_node)) + node = valid_node(path_node) + if node: + detected["node"] = str(node) + sources["node"] = "path:node" + + install_roots = dedupe(path_roots + path_entry_candidates + scan_openclaw_install_roots()) + searched["openclaw"].extend(str(p) for p in install_roots) + + def fill_from_root(root: Path, source: str) -> None: + if root.is_file(): + candidate_entry = valid_file(root) + if candidate_entry and candidate_entry.name == "entry.js" and "openclaw_entry" not in detected: + detected["openclaw_entry"] = str(candidate_entry) + sources["openclaw_entry"] = source + root = candidate_entry.parent.parent + elif candidate_entry and candidate_entry.name == "openclaw.mjs" and "openclaw_mjs" not in detected: + detected["openclaw_mjs"] = str(candidate_entry) + sources["openclaw_mjs"] = source + root = candidate_entry.parent + else: + return + candidate_mjs = valid_file(root / "openclaw.mjs") + candidate_entry = 
valid_file(root / "dist" / "entry.js") + if candidate_mjs and "openclaw_mjs" not in detected: + detected["openclaw_mjs"] = str(candidate_mjs) + sources["openclaw_mjs"] = source + if candidate_entry and "openclaw_entry" not in detected: + detected["openclaw_entry"] = str(candidate_entry) + sources["openclaw_entry"] = source + + for root in install_roots: + source = "path:openclaw" if root in path_roots or root in path_entry_candidates else "scan:common-locations" + fill_from_root(root, source) + if all(k in detected for k in ("openclaw_mjs", "openclaw_entry")): + break + + result["detected"] = detected + result["sources"] = sources + result["searched"] = searched + result["ok"] = all(k in detected for k in ("node", "openclaw_mjs", "openclaw_entry")) + if not result["ok"]: + missing = [k for k in ("node", "openclaw_mjs", "openclaw_entry") if k not in detected] + result["missing"] = missing + result["error"] = ( + "Could not auto-detect: " + ", ".join(missing) + ". " + "Set WATCHDOG_B_NODE_BIN / WATCHDOG_B_OPENCLAW_MJS / WATCHDOG_B_OPENCLAW_ENTRY explicitly if this host uses a non-standard install path." 
+ ) + return result + + +def main() -> int: + parser = argparse.ArgumentParser(description="Detect node/openclaw runtime paths for watchdog-b scripts") + parser.add_argument("--shell", action="store_true", help="print shell export lines") + parser.add_argument("--pretty", action="store_true", help="pretty-print json") + args = parser.parse_args() + + result = detect_runtime() + if args.shell: + if not result["ok"]: + print(result["error"], flush=True) + return 1 + detected = result["detected"] + print(f'WATCHDOG_B_NODE_BIN={detected["node"]}') + print(f'WATCHDOG_B_OPENCLAW_MJS={detected["openclaw_mjs"]}') + print(f'WATCHDOG_B_OPENCLAW_ENTRY={detected["openclaw_entry"]}') + return 0 + + print(json.dumps(result, ensure_ascii=False, indent=2 if args.pretty else None)) + return 0 if result["ok"] else 1 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/owner_report_consumer.py b/scripts/owner_report_consumer.py new file mode 100644 index 0000000..35becbe --- /dev/null +++ b/scripts/owner_report_consumer.py @@ -0,0 +1,75 @@ +#!/usr/bin/env python3 +"""Minimal owner-report consumer. + +Reads a pending owner report markdown file with simple front-matter-like key/value +lines and emits normalized JSON to stdout. 
+""" + +from __future__ import annotations + +import argparse +import json +from pathlib import Path + +OWNER_REPORT_ROOT = Path.home() / ".clawteam" / "owner-reports" +PENDING_DIR = OWNER_REPORT_ROOT / "pending" + + +def parse_pending_report(path: Path) -> dict: + raw = path.read_text(encoding="utf-8") + data: dict[str, str] = {} + for line in raw.splitlines(): + line = line.strip() + if not line or ":" not in line: + continue + key, value = line.split(":", 1) + data[key.strip()] = value.strip() + + return { + "ok": True, + "path": str(path), + "filename": path.name, + "report_id": data.get("report_id") or path.stem, + "team": data.get("team"), + "source": data.get("source"), + "report_kind": data.get("report_kind") or "checkpoint", + "created_at": data.get("created_at"), + "message": _unquote(data.get("message", "")), + "raw": data, + } + + +def _unquote(value: str) -> str: + value = value.strip() + if len(value) >= 2 and value[0] == '"' and value[-1] == '"': + return value[1:-1] + return value + + +def resolve_input(name_or_path: str) -> Path: + p = Path(name_or_path).expanduser() + if p.exists(): + return p + candidate = PENDING_DIR / name_or_path + if candidate.exists(): + return candidate + if not candidate.suffix: + md_candidate = candidate.with_suffix(".md") + if md_candidate.exists(): + return md_candidate + raise FileNotFoundError(f"pending report not found: {name_or_path}") + + +def main() -> int: + ap = argparse.ArgumentParser(description="Emit JSON for a pending owner report") + ap.add_argument("report", help="Pending report path, filename, or report_id") + args = ap.parse_args() + + path = resolve_input(args.report) + payload = parse_pending_report(path) + print(json.dumps(payload, ensure_ascii=False, indent=2)) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/owner_report_driver.py b/scripts/owner_report_driver.py new file mode 100644 index 0000000..7e20ce0 --- /dev/null +++ b/scripts/owner_report_driver.py 
@@ -0,0 +1,118 @@ +#!/usr/bin/env python3 +"""Minimal owner-report driver. + +Consumes one pending owner report, calls an external send command, and only moves +it to sent/ after the send command succeeds. + +This is a deliberately small manual driver for debugging the owner-report chain. +It does not watch directories, retry, or send anything by itself. +""" + +from __future__ import annotations + +import argparse +import json +import os +import subprocess +from pathlib import Path + +from owner_report_consumer import OWNER_REPORT_ROOT, PENDING_DIR, parse_pending_report, resolve_input + +SENT_DIR = OWNER_REPORT_ROOT / "sent" + + +def _build_send_env(payload: dict) -> dict[str, str]: + env = os.environ.copy() + env.update( + { + "OWNER_REPORT_JSON": json.dumps(payload, ensure_ascii=False), + "OWNER_REPORT_ID": str(payload.get("report_id") or ""), + "OWNER_REPORT_TEAM": str(payload.get("team") or ""), + "OWNER_REPORT_SOURCE": str(payload.get("source") or ""), + "OWNER_REPORT_KIND": str(payload.get("report_kind") or "checkpoint"), + "OWNER_REPORT_CREATED_AT": str(payload.get("created_at") or ""), + "OWNER_REPORT_MESSAGE": str(payload.get("message") or ""), + "OWNER_REPORT_PATH": str(payload.get("path") or ""), + } + ) + return env + + +def _sent_path(src: Path) -> Path: + SENT_DIR.mkdir(parents=True, exist_ok=True) + return SENT_DIR / src.name + + +def _finalize_successful_send(src: Path) -> dict[str, object]: + dest = _sent_path(src) + if src.exists(): + src.rename(dest) + return {"moved": True, "already_archived": False, "final_path": str(dest)} + + if dest.exists(): + return {"moved": False, "already_archived": True, "final_path": str(dest)} + + raise FileNotFoundError( + f"successful send completed but pending report disappeared before archiving: pending={src} sent={dest}" + ) + + +def main() -> int: + ap = argparse.ArgumentParser(description="Send one pending owner report via external command") + ap.add_argument("report", help="Pending report path, filename, or 
report_id") + ap.add_argument( + "--send-cmd", + help="Shell command used to send the report. Can also come from OWNER_REPORT_SEND_CMD.", + ) + ap.add_argument("--dry-run", action="store_true", help="Print what would be sent and do not move files") + args = ap.parse_args() + + src = resolve_input(args.report) + payload = parse_pending_report(src) + + send_cmd = args.send_cmd or os.environ.get("OWNER_REPORT_SEND_CMD") + if not send_cmd and not args.dry_run: + raise SystemExit("missing send command: use --send-cmd or OWNER_REPORT_SEND_CMD") + + if args.dry_run: + print(json.dumps({ + "ok": True, + "dry_run": True, + "action": "would_send", + "pending_path": str(src), + "sent_path": str(_sent_path(src)), + "payload": payload, + "send_cmd": send_cmd, + }, ensure_ascii=False, indent=2)) + return 0 + + proc = subprocess.run( + ["bash", "-lc", send_cmd], + text=True, + capture_output=True, + env=_build_send_env(payload), + ) + + result = { + "ok": proc.returncode == 0, + "dry_run": False, + "pending_path": str(src), + "sent_path": str(_sent_path(src)), + "send_cmd": send_cmd, + "exit_code": proc.returncode, + "stdout": proc.stdout, + "stderr": proc.stderr, + "payload": payload, + } + + if proc.returncode != 0: + print(json.dumps(result, ensure_ascii=False, indent=2)) + return proc.returncode + + result.update(_finalize_successful_send(src)) + print(json.dumps(result, ensure_ascii=False, indent=2)) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/owner_report_producer.py b/scripts/owner_report_producer.py new file mode 100644 index 0000000..fde6883 --- /dev/null +++ b/scripts/owner_report_producer.py @@ -0,0 +1,143 @@ +#!/usr/bin/env python3 +"""Minimal owner-report producer for ClawTeam-style worker checkpoints. + +Writes ~/.clawteam/owner-reports/pending/.md using explicit checkpoint +fields and a human-readable message suitable for direct Telegram delivery. 
+ +This intentionally stays tiny: +- no daemon +- no event bus +- no parser for arbitrary logs +- just explicit fields in -> pending markdown out +""" + +from __future__ import annotations + +import argparse +import json +import re +from datetime import datetime, timezone +from pathlib import Path + +from owner_report_consumer import OWNER_REPORT_ROOT + +PENDING_DIR = OWNER_REPORT_ROOT / "pending" + + +def _slug(value: str) -> str: + slug = re.sub(r"[^a-zA-Z0-9._-]+", "-", value.strip()).strip("-._") + return slug or "report" + + +def _now_iso() -> str: + return datetime.now().astimezone().isoformat(timespec="seconds") + + +def build_message(*, team: str, worker: str, task_id: str, progress: str, done: str, next_step: str, status: str, source: str | None, report_kind: str) -> str: + headline = f"🔔 [{team}] {worker}" + if report_kind == "leader-final": + headline = f"✅ [{team}] final" + + lines = [ + headline, + done, + ] + + if next_step.strip(): + lines.append(f"→ {next_step}") + + tech = [ + f"task={task_id}", + f"status={status}", + f"progress={progress}", + ] + if source: + tech.append(f"source={source}") + lines.append(" | ".join(tech)) + return "\n".join(lines) + + +def build_report_body(*, report_id: str, team: str, worker: str, task_id: str, progress: str, done: str, next_step: str, status: str, source: str | None, created_at: str, message: str, report_kind: str) -> str: + fields: list[tuple[str, str | None]] = [ + ("report_id", report_id), + ("team", team), + ("worker", worker), + ("task_id", task_id), + ("progress", progress), + ("done", done), + ("next", next_step), + ("status", status), + ("report_kind", report_kind), + ("source", source), + ("created_at", created_at), + ("message", json.dumps(message, ensure_ascii=False)), + ] + return "\n".join(f"{k}: {v}" for k, v in fields if v is not None) + "\n" + + +def main() -> int: + ap = argparse.ArgumentParser(description="Create one pending owner report from explicit checkpoint fields") + 
ap.add_argument("--team", required=True) + ap.add_argument("--worker", required=True) + ap.add_argument("--task-id", required=True) + ap.add_argument("--progress", required=True) + ap.add_argument("--done", required=True) + ap.add_argument("--next", dest="next_step", required=True) + ap.add_argument("--status", required=True) + ap.add_argument("--source") + ap.add_argument("--report-kind", choices=["checkpoint", "leader-final"], default="checkpoint") + ap.add_argument("--report-id", help="Optional explicit report_id / filename stem") + ap.add_argument("--created-at", default=_now_iso()) + ap.add_argument("--dry-run", action="store_true") + args = ap.parse_args() + + report_id = args.report_id or f"{_slug(args.team)}-{_slug(args.worker)}-{_slug(args.task_id)}-{_slug(args.report_kind)}" + message = build_message( + team=args.team, + worker=args.worker, + task_id=args.task_id, + progress=args.progress, + done=args.done, + next_step=args.next_step, + status=args.status, + source=args.source, + report_kind=args.report_kind, + ) + body = build_report_body( + report_id=report_id, + team=args.team, + worker=args.worker, + task_id=args.task_id, + progress=args.progress, + done=args.done, + next_step=args.next_step, + status=args.status, + source=args.source, + created_at=args.created_at, + message=message, + report_kind=args.report_kind, + ) + + path = PENDING_DIR / f"{report_id}.md" + + result = { + "ok": True, + "report_id": report_id, + "path": str(path), + "message": message, + "dry_run": args.dry_run, + } + + if args.dry_run: + result["body"] = body + print(json.dumps(result, ensure_ascii=False, indent=2)) + return 0 + + PENDING_DIR.mkdir(parents=True, exist_ok=True) + path.write_text(body, encoding="utf-8") + print(json.dumps(result, ensure_ascii=False, indent=2)) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/run_watchdog_b.sh b/scripts/run_watchdog_b.sh new file mode 100755 index 0000000..6ef1412 --- /dev/null +++ 
b/scripts/run_watchdog_b.sh @@ -0,0 +1,141 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Watchdog B v2 dispatcher/runner. +# Unified entrypoint for timer/service/manual runs. +# +# Flow: +# 1) Call check_openclaw_state.sh to get one of: running | stalled | idle +# 2) Emit a human-readable action template for the detected state +# 3) Invoke the notification layer (dry-run/manual by default, configurable) +# 4) Persist rendered output for local verification / future integrations +# +# Notification behavior is intentionally conservative: +# - running: defaults to a manual/queue-ready owner report path +# - stalled/idle: nudge main agent first, then optionally escalate to owner report +# - outbound owner messaging reuses the existing owner-reporting-system queue + +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +SKILL_DIR="$(cd -- "$SCRIPT_DIR/.." && pwd)" +WATCHDOG_B_CONFIG_FILE_DEFAULT="$HOME/.config/openclaw/watchdog-b.env" +WATCHDOG_B_CONFIG_FILE="${WATCHDOG_B_CONFIG_FILE:-$WATCHDOG_B_CONFIG_FILE_DEFAULT}" +if [[ -f "$WATCHDOG_B_CONFIG_FILE" ]]; then + set -a + # shellcheck disable=SC1090 + . "$WATCHDOG_B_CONFIG_FILE" + set +a +fi + +WORKSPACE_DEFAULT="$HOME/.openclaw/workspace" +WORKSPACE_DIR="${WATCHDOG_B_WORKSPACE:-$WORKSPACE_DEFAULT}" +CHECKER="${WATCHDOG_B_CHECKER:-$SCRIPT_DIR/check_openclaw_state.sh}" +ARTIFACT_DIR="${WATCHDOG_B_ARTIFACT_DIR:-$WORKSPACE_DIR/state/watchdog-b}" +TIMESTAMP="$(date '+%Y-%m-%dT%H:%M:%S%z')" +HOSTNAME_VALUE="$(hostname 2>/dev/null || echo unknown-host)" +NOTIFIER="${WATCHDOG_B_NOTIFIER:-$SCRIPT_DIR/notify_watchdog_b.py}" +NOTIFY_DRY_RUN="${WATCHDOG_B_NOTIFY_DRY_RUN:-1}" + +mkdir -p "$ARTIFACT_DIR" + +if [[ ! 
-x "$CHECKER" ]]; then + echo "watchdog-b error: checker not executable: $CHECKER" >&2 + exit 1 +fi + +STATE="$($CHECKER)" + +emit_running() { + cat <&2 + exit 2 + ;; +esac + +printf '%s\n' "$OUTPUT" + +NOTIFY_OUTPUT="" +if [[ -x "$NOTIFIER" ]]; then + NOTIFY_CMD=("$NOTIFIER" --state "$STATE" --timestamp "$TIMESTAMP") + if [[ "$NOTIFY_DRY_RUN" == "1" ]]; then + NOTIFY_CMD+=(--dry-run) + fi + if NOTIFY_OUTPUT="$(WATCHDOG_B_ARTIFACT_DIR="$ARTIFACT_DIR" "${NOTIFY_CMD[@]}" 2>&1)"; then + printf '%s\n' "$NOTIFY_OUTPUT" + else + printf '%s\n' "$NOTIFY_OUTPUT" + echo "watchdog-b warning: notifier returned non-zero for state=$STATE" >&2 + fi +else + echo "watchdog-b warning: notifier not executable: $NOTIFIER" >&2 +fi + +printf '%s\n' "$OUTPUT" > "$ARTIFACT_DIR/last-output.txt" +printf '%s\n' "$NOTIFY_OUTPUT" > "$ARTIFACT_DIR/last-notify-output.txt" +printf '%s\t%s\n' "$TIMESTAMP" "$STATE" >> "$ARTIFACT_DIR/history.tsv" +printf '%s\n' "$STATE" > "$ARTIFACT_DIR/last-state.txt" diff --git a/scripts/verify_watchdog_b_e2e.sh b/scripts/verify_watchdog_b_e2e.sh new file mode 100755 index 0000000..c46fc04 --- /dev/null +++ b/scripts/verify_watchdog_b_e2e.sh @@ -0,0 +1,65 @@ +#!/usr/bin/env bash +set -euo pipefail + +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +WORKSPACE="$(cd -- "$SCRIPT_DIR/../.." 
&& pwd)" +ARTIFACT_ROOT="${WATCHDOG_B_VERIFY_ROOT:-$WORKSPACE/state/watchdog-b-verify-e2e}" +RUN_ID="${RUN_ID:-$(date +%Y%m%dT%H%M%S)}" +RUN_DIR="$ARTIFACT_ROOT/$RUN_ID" +FIXTURE_DIR="$RUN_DIR/fixture" +LOG="$RUN_DIR/verify.log" +STATE_DIR="$RUN_DIR/state" +QUEUE_SNAPSHOT="$RUN_DIR/queue-before.txt" +QUEUE_AFTER="$RUN_DIR/queue-after.txt" +mkdir -p "$FIXTURE_DIR/host-runtime" "$FIXTURE_DIR/logs" "$STATE_DIR" "$RUN_DIR" + +exec > >(tee -a "$LOG") 2>&1 + +echo "[verify] run_id=$RUN_ID" +echo "[verify] workspace=$WORKSPACE" +date -Iseconds + +echo "[verify] snapshot owner-report queue before" +find "$HOME/.clawteam/owner-reports" -maxdepth 2 -type f | sort > "$QUEUE_SNAPSHOT" || true + +sleep 180 & +FAKE_PID=$! +trap 'kill "$FAKE_PID" 2>/dev/null || true' EXIT +printf '%s\n' "$FAKE_PID" > "$FIXTURE_DIR/host-runtime/openclaw.pid" +touch "$FIXTURE_DIR/logs/openclaw.log" + +echo "[verify] run watchdog-b direct E2E (enqueue + direct delivery)" +OPENCLAW_PID_FILE="$FIXTURE_DIR/host-runtime/openclaw.pid" \ +OPENCLAW_LOG_FILE="$FIXTURE_DIR/logs/openclaw.log" \ +STALL_AFTER_SECONDS=1200 \ +WATCHDOG_B_ARTIFACT_DIR="$STATE_DIR" \ +WATCHDOG_B_NOTIFY_DRY_RUN=0 \ +WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain \ +WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS=0 \ +"$WORKSPACE/scripts/watchdog-b/run_watchdog_b.sh" | tee "$RUN_DIR/run-output.txt" + +echo "[verify] snapshot owner-report queue after" +find "$HOME/.clawteam/owner-reports" -maxdepth 2 -type f | sort > "$QUEUE_AFTER" || true + +echo "[verify] summarize" +REPORT_ID="$(python3 - <<'PY' "$STATE_DIR/notify-state.json" +import json,sys +p=sys.argv[1] +with open(p,'r',encoding='utf-8') as f: + data=json.load(f) +print(data['events']['running']['last_result']['enqueue']['report_id']) +PY +)" + +echo "REPORT_ID=$REPORT_ID" | tee "$RUN_DIR/result.env" +SENT_PATH="$HOME/.clawteam/owner-reports/sent/$REPORT_ID.md" +echo "SENT_PATH=$SENT_PATH" | tee -a "$RUN_DIR/result.env" +if [[ ! 
-f "$SENT_PATH" ]]; then + echo "[verify] ERROR: sent file missing: $SENT_PATH" >&2 + exit 1 +fi + +echo "[verify] sent file found" +sed -n '1,120p' "$SENT_PATH" | tee "$RUN_DIR/sent-head.txt" + +echo "[verify] done" diff --git a/scripts/watchdog-b.env.example b/scripts/watchdog-b.env.example new file mode 100644 index 0000000..bbe993d --- /dev/null +++ b/scripts/watchdog-b.env.example @@ -0,0 +1,40 @@ +# Single source of truth for watchdog-b owner-facing policy. +# Preferred location: ~/.config/openclaw/watchdog-b.env +# Can also be loaded manually by: +# WATCHDOG_B_CONFIG_FILE=... ./scripts/watchdog-b/run_watchdog_b.sh +# WATCHDOG_B_CONFIG_FILE=... ./scripts/watchdog-b/notify_watchdog_b.py --state running + +# --- delivery / runtime policy --- +WATCHDOG_B_NOTIFY_DRY_RUN=0 +WATCHDOG_B_RUNNING_REPORT_MODE=enqueue-and-drain +WATCHDOG_B_RUNNING_REPORT_MIN_INTERVAL_SECONDS=3600 +WATCHDOG_B_OWNER_DELIVERY_MODE=direct-discord +WATCHDOG_B_OWNER_REPORT_CHANNEL=discord +WATCHDOG_B_OWNER_REPORT_TARGET=channel:REPLACE_ME + +# --- non-running escalation policy --- +# Set this only if the host actually has a valid OpenClaw agent id to nudge. +# If left unset, stalled/idle paths skip main-agent nudge and can still escalate owner-facing reports. +# WATCHDOG_B_MAIN_AGENT_ID=main +# WATCHDOG_B_STALLED_OWNER_MODE=escalate +# WATCHDOG_B_IDLE_OWNER_MODE=escalate +# WATCHDOG_B_STALLED_OWNER_ESCALATION_AFTER=2 +# WATCHDOG_B_IDLE_OWNER_ESCALATION_AFTER=2 +# WATCHDOG_B_STALLED_NUDGE_MIN_INTERVAL_SECONDS=900 +# WATCHDOG_B_IDLE_NUDGE_MIN_INTERVAL_SECONDS=1800 + +# --- owner-facing message style --- +WATCHDOG_B_RUNNING_EMOJI=✅ +WATCHDOG_B_RUNNING_SUMMARY=主程序仍在運行 +WATCHDOG_B_STALLED_EMOJI=⚠️ +WATCHDOG_B_STALLED_SUMMARY=主程序疑似卡住 +WATCHDOG_B_IDLE_EMOJI=🛑 +WATCHDOG_B_IDLE_SUMMARY=主程序目前未運行 + +# Optional overrides for the compact technical line. 
+# WATCHDOG_B_RUNNING_PROGRESS_LABEL=running +# WATCHDOG_B_STALLED_PROGRESS_LABEL=stalled +# WATCHDOG_B_IDLE_PROGRESS_LABEL=idle +# WATCHDOG_B_RUNNING_STATUS=normal +# WATCHDOG_B_STALLED_STATUS=needs-attention +# WATCHDOG_B_IDLE_STATUS=needs-attention
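The env example above ships with `WATCHDOG_B_OWNER_REPORT_TARGET=channel:REPLACE_ME`, so a host that copies it unchanged would be configured with a placeholder target. The following is a minimal sketch of a pre-flight check, assuming the file keeps the plain `KEY=VALUE` layout with `#` comments shown in the example. The helper names `load_env_file` and `check_owner_target` are hypothetical and are not part of the bundled scripts; `run_watchdog_b.sh` itself loads the file by sourcing it in the shell.

```python
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Hypothetical helper: it does not handle quoting or shell expansion.
    """
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_owner_target(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the target looks configured."""
    problems: list[str] = []
    target = values.get("WATCHDOG_B_OWNER_REPORT_TARGET", "")
    if not target:
        problems.append("WATCHDOG_B_OWNER_REPORT_TARGET is not set")
    elif target.endswith("REPLACE_ME"):
        problems.append(
            "WATCHDOG_B_OWNER_REPORT_TARGET still holds the REPLACE_ME placeholder"
        )
    return problems


if __name__ == "__main__":
    # Default location taken from the env example's "Preferred location" comment.
    config_path = Path.home() / ".config" / "openclaw" / "watchdog-b.env"
    if config_path.exists():
        for problem in check_owner_target(load_env_file(config_path)):
            print(f"config warning: {problem}")
    else:
        print(f"config file not found: {config_path}")
```

Running a check like this before enabling the timer would surface an unconfigured target early, rather than at the first live delivery attempt.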