Add silent long-task checkpoint gate

Eve
2026-04-22 14:19:48 +08:00
parent 450b99fa5b
commit 83be99a6bb
4 changed files with 68 additions and 11 deletions

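@@ -26,6 +26,16 @@ After a subagent is assigned, **if there is no result within 5 minutes**
 - Must comply with the no-fake-progress and stop-clock gates
 - If the work had already entered a non-ordinary-chat workflow before the reply, yet the reply still wraps it up directly in "ordinary chat" style, treat it as a workflow violation.
+## Silent Long-Task Rule
+- If a long-task, once started, **will not naturally and immediately produce the next output to the owner**, it is a `silent long-task`
+- Every silent long-task must define, at startup:
+  - the first reporting checkpoint (time / stage / event)
+  - what to report if the task is not yet finished
+  - the status transition if there is no new evidence (`paused` / `blocked`)
+  - how the handoff will work if the owner must make the final call (e.g. button-path)
+- If a silent long-task starts without this forced reporting checkpoint and the question "why is there no update?" later comes up, treat it as a workflow violation, not merely a delay.
 ## Checkpoint Rule
 - A checkpoint is **not closure**; it is only a stage report within a long task and does not mean the task may simply stop after it is sent.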
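A silent long-task's startup declaration can be checked mechanically before any work begins. The following is a minimal Python sketch under stated assumptions: `SilentTaskDeclaration`, `validate`, and every field name are hypothetical and do not appear in the workflow files; the record only mirrors the four items the Silent Long-Task Rule requires.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical record of what a silent long-task declares at startup.
# Field names are illustrative assumptions, not taken from the workflow files.
@dataclass
class SilentTaskDeclaration:
    first_checkpoint: str          # time, stage, or event that forces the first report
    unfinished_report: str         # what to report if the task is not done by then
    no_evidence_status: str        # fallback status: "paused" or "blocked"
    handoff: Optional[str] = None  # e.g. "button-path" when the owner must decide

def validate(decl: SilentTaskDeclaration, decision_expected: bool) -> List[str]:
    """List what is missing; an empty list means the task may start."""
    problems = []
    if not decl.first_checkpoint.strip():
        problems.append("no first checkpoint trigger")
    if not decl.unfinished_report.strip():
        problems.append("no report defined for an unfinished task")
    if decl.no_evidence_status not in ("paused", "blocked"):
        problems.append("no-evidence status must be paused or blocked")
    if decision_expected and not decl.handoff:
        problems.append("no owner handoff although a decision is expected")
    return problems

if __name__ == "__main__":
    decl = SilentTaskDeclaration("after stage 1 of the regression run",
                                 "stages completed so far and the next step",
                                 "paused", "button-path")
    print(validate(decl, decision_expected=True))  # -> []
```

Running the check at launch, rather than at the first checkpoint, is the point of the rule: a missing trigger is caught before the task has a chance to go dark.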

@@ -31,6 +31,31 @@ This especially applies to:
 If the endpoint is predictably a user decision, the assistant should structure the run so that the final user-facing handoff is already prepared as a button interaction.
+### Silent Long-Task Checkpoint Gate
+If a long-task is started and it will **not naturally produce an immediate next user-visible message**, it must define a forced reporting checkpoint at startup.
+This applies to any silent long-task pattern, including but not limited to:
+- research
+- investigation
+- debugging
+- delegation / waiting on subagents
+- background execution
+- staged analysis
+- long-running verification
+- full tests / regression tests
+- any "I'll go do this and report back" workflow
+At startup, the task must define:
+1. the **first checkpoint trigger** (time, stage, or event)
+2. what to report if the task is **not yet finished** by that checkpoint
+3. how to downgrade status if there is **no new evidence** (`paused` / `blocked`)
+4. how final owner handoff will work if a user decision is expected
+### Failure rule
+If a silent long-task was started without a forced reporting checkpoint, and the user later has to ask "why is there no update?", treat that as a workflow failure / checkpoint-lost condition.
 ### Button-driven test rule
 If a test or validation flow is known in advance to end in a Telegram pass/fail, accept/reject, or rerun/stop decision, do not start that test in the ordinary text-reply lane.
@@ -53,6 +78,7 @@ These are violations when used as the closing interaction on Telegram:
 - saying buttons will be used, but not actually sending them
 - sending explanation first, and only later sending buttons after being corrected
 - running a full test/report in plain text even though the result is obviously heading toward a pass/fail owner decision
+- starting a silent long-task without any explicit forced reporting checkpoint
 ### Required interpretation
@@ -63,14 +89,17 @@ The gate applies to:
 - next-step choices
 - accept / rerun / stop style decisions
 - pass / fail verdict requests
+- silent long-task launches
 ### Violation standard
 If the assistant reaches a user-decision closure and no real inline buttons were delivered first, treat it as a workflow violation even if the reply mentioned buttons in text.
+If a silent long-task goes dark because no forced reporting checkpoint was defined, treat it as a workflow violation even if the task later resumes.
 ### Corrective rule
 If this violation happens:
 - acknowledge the violation plainly
-- immediately send the real button message
+- immediately send the real button message or status recovery update
 - record the lesson into workflow / memory if it exposed a missing rule
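When the forced checkpoint fires, the gate's startup items and its failure rule reduce to a small decision. The sketch below is hypothetical: `checkpoint_action`, `is_checkpoint_lost`, and the returned action strings are illustrative names, not an API of this workflow.

```python
# Hypothetical sketch of the Silent Long-Task Checkpoint Gate at checkpoint time.
# Action names are illustrative; the workflow defines the rules, not these strings.

FALLBACK_STATUSES = ("paused", "blocked")

def checkpoint_action(finished: bool, new_evidence: bool,
                      decision_expected: bool, fallback: str = "paused") -> str:
    """Return what the task must emit when its forced checkpoint fires."""
    if fallback not in FALLBACK_STATUSES:
        raise ValueError("fallback must be 'paused' or 'blocked'")
    if finished:
        # A predictable owner decision goes out as real inline buttons, first.
        return "send-buttons" if decision_expected else "send-final-report"
    if new_evidence:
        # Not finished but with real progress: a stage report, not closure.
        return "send-checkpoint-report"
    # Not finished and nothing new: downgrade instead of staying cosmetically active.
    return "report-and-set-" + fallback

def is_checkpoint_lost(checkpoint_defined: bool, user_had_to_ask: bool) -> bool:
    """Failure rule: the user had to chase an update and no checkpoint was declared."""
    return user_had_to_ask and not checkpoint_defined

if __name__ == "__main__":
    print(checkpoint_action(finished=False, new_evidence=False,
                            decision_expected=True))  # -> report-and-set-paused
    print(is_checkpoint_lost(checkpoint_defined=False,
                             user_had_to_ask=True))   # -> True
```

Every branch of `checkpoint_action` produces an outgoing message, which is what keeps a checkpoint from turning into a silent stall.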

@@ -52,3 +52,4 @@
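 - Later reinforced at a more direct operational layer: added the `Reply Closure Button Gate` concept, which states explicitly that whenever the final actionable part of a reply needs the owner to decide, confirm, approve, stop, continue, rerun, or choose a next step, it is not enough to say in text that buttons will be used; real inline buttons must actually be sent (or the most reasonable next step executed directly).
 - After the shortest regression test, a more upstream root cause was confirmed: having buttons is not enough; **if buttons are needed, they must come first**, rather than sending plain text first and adding buttons afterward. The rule was therefore tightened into an ordering rule: prefer sending real buttons with the `message` tool, then reply `NO_REPLY`.
 - The full long-task test then exposed an even more upstream root cause: if it is foreseeable from the outset that the run will end in an owner pass/fail or accept/reject judgement, the flow itself must switch to `button-path` early, instead of remembering to use buttons only at the final stage.
+- Today the owner pointed out a higher-level abstraction: the problem is not limited to "tests"; any **silent long-task** can go dark. The rule has therefore been generalized into the `Silent Long-Task Checkpoint Gate`: any task that, once started, will not naturally and immediately produce the next output to the owner must define, at the outset, its first forced reporting checkpoint and the status transition to use on failure.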

@@ -135,7 +135,22 @@ Use `checkpoint-template.md` in this skill directory.
 ---
-## 5. No-fake-progress rule
+## 5. Silent long-task governance
+If a long-task will not naturally emit an immediate next user-visible message, treat it as a **silent long-task**.
+A silent long-task must define at startup:
+- the first forced checkpoint trigger
+- what to report if not yet complete
+- what status transition to use if there is no new evidence
+- how final owner handoff will happen if a decision is expected
+Silent long-tasks must not rely on memory, intention, or implied future follow-up.
+If the user later has to ask where the update went, the flow is considered failed.
+---
+## 6. No-fake-progress rule
 Only count as real progress if there is at least one of:
 - new file change
@@ -154,7 +169,7 @@ If you only have status sync, say it is status sync. Do not dress it up as progr
 ---
-## 6. Stop-clock gate
+## 7. Stop-clock gate
 If repeated checkpoints show no new evidence, do **not** keep the task cosmetically active.
 You must choose one:
@@ -168,21 +183,22 @@ Without that, default to `paused`.
 ---
-## 7. How to use this skill in practice
+## 8. How to use this skill in practice
 When this skill applies:
 1. decide if the request is ordinary chat or long task
 2. if long task, create/update a task record
 3. choose one of the five valid states
-4. if reporting progress, use the 5-part checkpoint structure
-5. before claiming progress, check for real evidence
-6. if no evidence and no concrete action, stop the clock
-7. if the run is clearly heading toward a user pass/fail or accept/reject judgement on Telegram, prepare a button-path before the final handoff
-8. if the entire test itself exists to validate Telegram decision closure, run it as a button-driven flow rather than a normal long plain-text report
+4. if the task is silent, define the first forced checkpoint before proceeding
+5. if reporting progress, use the 5-part checkpoint structure
+6. before claiming progress, check for real evidence
+7. if no evidence and no concrete action, stop the clock
+8. if the run is clearly heading toward a user pass/fail or accept/reject judgement on Telegram, prepare a button-path before the final handoff
+9. if the entire test itself exists to validate Telegram decision closure, run it as a button-driven flow rather than a normal long plain-text report
 ---
-## 8. Integration guidance
+## 9. Integration guidance
 This skill should be paired with:
 - the current session `WORKFLOW.md`
@@ -207,10 +223,11 @@ This prevents governed long-task flows from degrading back into ambiguous text-o
 ---
-## 9. Success criteria
+## 10. Success criteria
 This skill is working correctly when:
 - non-chat work always enters a governed state
+- silent long-tasks never go dark without a predeclared checkpoint
 - checkpoints no longer cause silent stalls
 - no-evidence updates are not mislabeled as progress
 - stalled work becomes `paused` / `blocked` instead of fake-`active`
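The practice checklist in this skill can be read as an ordered decision procedure. The sketch assumes a hypothetical `Request` record of boolean flags and a hypothetical `practice_steps` helper; neither exists in the skill, and the step labels paraphrase the numbered list rather than quote it.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical flags describing a request; the skill itself defines no such record.
@dataclass
class Request:
    long_task: bool
    silent: bool = False
    reporting_progress: bool = False
    real_evidence: bool = False
    concrete_action: bool = False
    ends_in_owner_verdict: bool = False       # pass/fail or accept/reject on Telegram
    validates_decision_closure: bool = False  # the test exists to check button closure

def practice_steps(req: Request) -> List[str]:
    """Walk the practice checklist in order and return the steps that apply."""
    if not req.long_task:
        return ["ordinary chat: reply normally"]
    steps = ["create/update task record", "choose one of the valid states"]
    if req.silent:
        steps.append("define the first forced checkpoint before proceeding")
    if req.reporting_progress:
        steps.append("use the 5-part checkpoint structure")
        if not req.real_evidence:
            # No-fake-progress: without evidence this is a status sync, not progress.
            steps.append("label the update as status sync")
    if not req.real_evidence and not req.concrete_action:
        steps.append("stop the clock (default: paused)")
    if req.ends_in_owner_verdict:
        steps.append("prepare a button-path before the final handoff")
    if req.validates_decision_closure:
        steps.append("run the test as a button-driven flow")
    return steps

if __name__ == "__main__":
    for step in practice_steps(Request(long_task=True, silent=True,
                                       concrete_action=True,
                                       ends_in_owner_verdict=True)):
        print(step)
```

For a silent run that heads toward an owner verdict, the printed steps are the task record, the state choice, the forced checkpoint, and the button-path, in that order.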