chore: disable long-task watchdog

Eve
2026-04-21 18:41:40 +08:00
parent ee9d05a4ad
commit 5fb6c28a9b
2 changed files with 346 additions and 0 deletions

308
CRAYFISH_STATUS.md Normal file

@@ -0,0 +1,308 @@
# Crayfish Status (Alice)
> Whether a task counts as "done" is decided by Eric (the supervisor); this file only tracks assignments and progress. Pending Verification
---
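## In progress
- [ ] backend evolution meeting
- [ ] Diagnose Discord agent meeting relay
- [ ] Check static Wiki update status
- [ ] Mattermost docker-compose deployment + NPM reverse proxy
- [ ] Diagnose kanban.service 200/CHDIR
- [ ] KAL research: hermes memory lcm / hermes-lcm plugin
- [ ] Set up hourly rsnapshot backups of `.openclaw` / `.hermes`
- [ ] Strengthen checkpoint state-machine rules
- [ ] Checkpoint workflow dry-run test
- [ ] Research, install, and test paperclip
- [ ] Diagnose and fix approval workflow issues
- [ ] Upgrade the long task rules into a watchdog patrol mechanism
- [ ] Investigate watchdog reports not delivered to the current conversation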
### Details
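#### Synology temperature dashboard
<details>
<summary>Click to view details</summary>
**Status**: M2 Verified (dashboard deployed and running; SMS switched to Infobip; the short-message copy was tested and delivered; awaiting the supervisor's confirmation of completion)
**Lobby record**: telegram:-5079485182 msgId=2387 / msgId=2483
**Entry point**: <http://192.168.17.123:8787/>
**Service**: `systemctl --user status synology-temp-dashboard.service`
</details>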
#### Remove deer-flow
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2691
</details>
#### Investigate software installation permissions
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2686
</details>
#### Install gh CLI
<details>
<summary>Click to view details</summary>
**Status**: M2 Verified (gh 2.45.0 installed; awaiting the supervisor's confirmation)
**Lobby record**: telegram:-5079485182 msgId=2699 / msgId=2701
</details>
#### backend evolution meeting
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2738
</details>
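#### Diagnose Discord agent meeting relay
<details>
<summary>Click to view details</summary>
**Status**: M2 Verified (root cause and viable options identified; awaiting the supervisor's confirmation)
**Lobby record**: telegram:-5079485182 msgId=2765
</details>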
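#### Migrate to Discord-first working mode (cancelled)
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2772
**Note**: The supervisor has decided to convene meetings directly from the Discord side; this path is no longer being pursued.
</details>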
#### Adjust gpt-5.4 context and Synology dashboard
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2804
**Details**:
- `cowbay/gpt-5.4` contextWindow set to `1048576`
- Synology dashboard now has a high-temperature alert card (reads `WARNING_TEMPER_MAX`)
</details>
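#### Restructure the kanban
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2831 / msgId=2869 / msgId=2939
**Goal**: The main list keeps only task names; details move into each task's body; tidy up status fields, naming, and the archive area.
</details>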
#### Check static Wiki update status
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2831
</details>
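#### Mattermost docker-compose deployment + NPM reverse proxy
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Goal**: Deploy Mattermost on this host with docker-compose (including Postgres, persistent volumes, and an upgradable workflow), and configure a reverse proxy through the existing Nginx Proxy Manager (NPM), covering WebSocket, headers, and upload size.
**Environment inventory**: NPM already runs in Docker on this host (80/443 in use; admin UI 8181→81); 8065 and host 5432 are free.
**Needed from the supervisor**: the public domain (chat.ai.cowbay.org) and whether to use Let's Encrypt.
</details>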
#### Review visual suggestions for the Synology dashboard
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2835
</details>
#### Diagnose kanban.service 200/CHDIR
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Goal**: Read-only diagnosis of why `kanban.service` reports `status=200/CHDIR`; check the unit's `WorkingDirectory`, `ExecStart`, permissions, and recent journal entries.
**Current blocker**: Following Lobby-First, the attempt to post the Lobby record failed: `Telegram send failed: chat not found (chat_id=-5079485182)`
</details>
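
In systemd, exit status `200/CHDIR` means the service manager could not change into the unit's working directory before running `ExecStart`, so `WorkingDirectory` is usually the first thing to check. The following is a minimal read-only sketch of that check, not the diagnosis actually performed for this task. The helper names and the sample path are hypothetical, and it assumes the check runs as the same user the service runs as.

```python
import os
from typing import Optional


def working_directory(unit_text: str) -> Optional[str]:
    """Return the last WorkingDirectory= value found in unit file text, or None."""
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", ";")):  # unit file comment lines
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "WorkingDirectory":
            value = val.strip()  # a later assignment overrides an earlier one
    return value


def chdir_problem(path: str) -> Optional[str]:
    """Explain why changing into `path` would fail, or return None if it looks fine."""
    path = path.lstrip("-")  # a leading "-" tells systemd to tolerate a missing directory
    if path == "~":          # "~" means the home directory of the service user
        path = os.path.expanduser("~")
    if not os.path.isdir(path):
        return f"{path}: not an existing directory"
    if not os.access(path, os.X_OK):
        return f"{path}: no search (execute) permission for the current user"
    return None


if __name__ == "__main__":
    # Hypothetical unit text; in practice paste the output of `systemctl cat kanban.service`.
    sample = "[Service]\nWorkingDirectory=/srv/kanban-example\nExecStart=/usr/bin/true\n"
    wd = working_directory(sample)
    print(wd, "->", chdir_problem(wd) if wd else "no WorkingDirectory set")
```

The script only reads the filesystem, which keeps it within the read-only scope of the task. Recent journal entries would still need to be read separately, for example with `journalctl -u kanban.service`.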
#### KAL research: hermes memory lcm / hermes-lcm plugin
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=1762
**Goal**: Following the KAL intake, research the `hermes memory lcm` / `hermes-lcm` plugin to understand its design reasoning and how it actually works.
**Brief**:
- topic: `hermes memory lcm`
- seed: `https://github.com/stephenschoettler/hermes-lcm`
- goal: `the plugin's design logic and its actual approach`
- scope: `none`
**Lead investigator**: subagent research (runId=bcf46305-9ef8-4ded-80eb-26c9e6483c6e)
**Peer review**: subagent reviewer (runId=b322bc44-c5d6-49cb-9bb0-f6f89f390c04)
</details>
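#### Set up hourly rsnapshot backups of `.openclaw` / `.hermes`
<details>
<summary>Click to view details</summary>
**Status**: M2 Verified (a standalone rsnapshot config is in place and `hourly.0` was created manually; automatic scheduling still needs final confirmation; awaiting the supervisor's decision)
**Lobby record**: telegram:-5079485182 msgId=1982 / msgId=1997 / msgId=2038
**Goal**: Install and configure rsnapshot on this host to back up `/home/alice/.openclaw` and `/home/alice/.hermes` every hour to `/opt/agents_backup`, keeping only 12 hourly snapshots.
**Lead investigator**: subagent rsnapshot-backup-setup-retry (runId=283a2d90-9a7f-4a7b-8d0f-a823c88d50ec)
**Peer review**: to be assigned
**Verified**:
- Standalone config file: `/home/alice/.config/rsnapshot-agents.conf`
- `retain hourly 12`
- `rsnapshot configtest`: Syntax OK
- Created: `/opt/agents_backup/hourly.0/openclaw`
- Created: `/opt/agents_backup/hourly.0/hermes`
**Outstanding**: final rollout of the automatic schedule (cron) and read-back confirmation
</details>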
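
The settings listed above can be illustrated with a small generator for the relevant config lines and a matching cron entry. This is a hedged sketch, not the contents of the real `rsnapshot-agents.conf`. The function names are hypothetical, the `openclaw/` and `hermes/` destination names are inferred from the verified `hourly.0` paths, and running at minute 0 of every hour is an assumption. rsnapshot requires tabs between fields, which is why the lines are generated rather than typed by hand.

```python
from typing import Iterable, Tuple


def rsnapshot_conf(snapshot_root: str, level: str, keep: int,
                   sources: Iterable[Tuple[str, str]]) -> str:
    """Render a fragment of an rsnapshot config; fields are separated by tabs."""
    lines = [
        "config_version\t1.2",
        f"snapshot_root\t{snapshot_root.rstrip('/')}/",
        f"retain\t{level}\t{keep}",
    ]
    for src, dest in sources:
        # rsnapshot expects directory paths with a trailing slash.
        lines.append(f"backup\t{src.rstrip('/')}/\t{dest.strip('/')}/")
    return "\n".join(lines) + "\n"


def hourly_cron_line(conf_path: str, minute: int = 0) -> str:
    """Build a crontab entry that runs the `hourly` level once per hour."""
    return f"{minute} * * * * rsnapshot -c {conf_path} hourly"


if __name__ == "__main__":
    conf = rsnapshot_conf(
        "/opt/agents_backup", "hourly", 12,
        [("/home/alice/.openclaw", "openclaw"), ("/home/alice/.hermes", "hermes")],
    )
    print(conf, end="")
    print(hourly_cron_line("/home/alice/.config/rsnapshot-agents.conf"))
```

A complete config file needs further settings, such as `cmd_rsync`, so the output here is only the part that corresponds to the verified items. After installing a cron entry, `crontab -l` is one way to do the read-back confirmation that is still outstanding.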
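#### Strengthen checkpoint state-machine rules
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2126
**Note**: The supervisor has approved putting this rule in place; the goal is to fix the frequent stalls after a checkpoint.
</details>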
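#### Checkpoint workflow dry-run test
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2145
**Note**: The supervisor approved a small long task test; the acceptance criterion is dispatch → checkpoint → continue execution → wrap-up report, with no stall after the checkpoint along the way.
</details>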
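#### Research, install, and test paperclip
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned (switched to the docker-compose path)
**Lobby record**: telegram:-5079485182 msgId=2167 / msgId=2199
**Note**: The supervisor has instructed that the native install route be stopped; instead, use the docker-compose files in the repo to deploy and verify in an isolated directory.
</details>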
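#### Diagnose and fix approval workflow issues
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2225
**Note**: The supervisor has instructed that the approval issue be resolved first; the goal is to find the root cause of, and a workable fix for, exec/docker commands frequently requiring approval and approvals expiring too easily.
</details>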
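#### Upgrade the long task rules into a watchdog patrol mechanism
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned (2026-04-21: recurring watchdog turned off per the supervisor's instruction)
**Lobby record**: telegram:-5079485182 msgId=2283 / msgId=2300 / msgId=2465
**Note**: The supervisor approved rewriting the long task / checkpoint rules and then building an external watchdog MVP (state file + recurring cron); later, per the supervisor's latest instruction, `long-task-watchdog-10m` was disabled and `watchdog-implementation-watchdog` was set to paused.
</details>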
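#### Investigate watchdog reports not delivered to the current conversation
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2372
**Note**: The supervisor asked for clarification on why, after the watchdog fired and a checkpoint was generated, the result was `delivered=false / not-delivered` and it never reached this Telegram conversation.
</details>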
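#### Implement the top three visual improvements for the Synology dashboard
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=2850
**Details**:
- Unify button / input / label styles
- Header + status pill
- Color rules for temperature card states
</details>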
---
## Trash
- [ ] TG / OpenClaw no-response diagnosis
- [ ] Forum v1 / openclaw-bot-review
- [ ] /start diag:1149
- [ ] compaction trigger diagnosis
### Details
#### TG / OpenClaw no-response diagnosis
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=1614
**Lead investigator**: subagent backend (runId=79a799b2-6ce1-4ee0-bd37-a8136447effb)
**Peer review**: subagent test (runId=f43d56d7-976b-4a04-b49e-678f6ae6b211)
</details>
#### Forum v1 / openclaw-bot-review
<details>
<summary>Click to view details</summary>
**Status**: M2 Verified (feature works; awaiting the supervisor's next instruction)
**Entry point**: <http://192.168.17.123:3077/forum>
- Sample PR: <http://192.168.17.123:3077/forum/1>
- System service: `systemctl --user status openclaw-bot-review.service`
**PORT**=3077 / HOSTNAME=0.0.0.0
**Verified**: GET /api/forum/pulls/1 ✅
</details>
#### /start diag:1149
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lobby record**: telegram:-5079485182 msgId=1403 (supplement: msgId=1412; already unresponsive before the 13:28 restart)
**Lead investigator**: subagent backend (runId=f1c4cd0c-b39b-4931-8436-1001cf595525)
**Peer review**: subagent docs (runId=1f441f2d-9909-49b0-aadd-b37a40e16429)
**Additionally assigned**: subagent diag-no-response-primary (runId=13f08bee-4c95-4949-8dea-4733e003406f)
</details>
#### compaction trigger diagnosis
<details>
<summary>Click to view details</summary>
**Status**: M1 Assigned
**Lead investigator**: subagent diag-compaction-trigger-primary (runId=6eb2bf3f-88bb-4e40-990a-221b27073646)
**Lead investigator**: subagent backend (runId=db9407a6-a69c-4d4f-b751-bc0f543b80e9)
**Peer review**: subagent docs (runId=390d7e08-1041-4d2a-a4d0-c97781f00871)
</details>
---
- [ ] Migrate to Discord-first working mode (cancelled)
## Archive (for reference)
- [x] agent forum / gitea verification records
- [x] no-response / compaction historical review records
### Notes
**Lobby record**: telegram:-5079485182 msgId=1669 / msgId=1432 / msgId=1170
- GITEA_BASE_URL=https://gitea.cowbay.org
- GITEA_FORUM_REPO=openclaw/agent-forum
**Configured**: GITEA_TOKEN / FORUM_ADMIN_KEY
**Verified**:
- GET /api/forum/pulls ✅
- GET /api/forum/pulls/1/labels ✅
- GET /api/forum/pulls/1/comments ✅
**Peer review**: subagent diag-compaction-trigger-peerreview (runId=9d52596d-1d95-4406-bb6c-ef1f25159a16)
**Peer review**: subagent diag-no-response-peerreview (runId=70a13b2e-c4b6-4c7e-8131-a0da0190a37b)
**Additionally assigned**: subagent diag-no-response-pre-restart (runId=ed5f2b90-560d-479f-9dc9-6da8346a6fc8)
---
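- [x] Investigate software installation permissions
- [x] Repair existing memory management
- [x] Review visual suggestions for the Synology dashboard
- [x] Remove deer-flow
- [x] Install gh CLI
## Done
- [x] Deploy AstrBot Docker Telegram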
### Notes
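#### Repair existing memory management
<details>
<summary>Click to view details</summary>
**Description**: Repair the existing memory management (without swapping packages): fix the openclaw-mem0 slot/config issue, restore the built-in memory ingest/index, and verify write/index/query.
</details>
#### Deploy AstrBot Docker Telegram
<details>
<summary>Click to view details</summary>
**Description**: Set up AstrBot with Docker / docker-compose and connect it to a Telegram Bot; start with an isolated deployment and basic verification.
</details>
- [x] Implement the top three visual improvements for the Synology dashboard
- [x] Restructure the kanban
- [x] Adjust gpt-5.4 context and Synology dashboard
- [x] Synology temperature dashboard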