Add silent long-task checkpoint gate
10
WORKFLOW.md
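@@ -26,6 +26,16 @@ After a subagent is assigned, if there is **no result within 5 minutes**:
- Must comply with the no-fake-progress and stop-clock gates
- If the work had already entered a non-chat workflow before the reply, yet the reply was still finished in "ordinary chat" style, treat it as a workflow violation.

## Silent Long-Task Rule

- If a long-task, once started, **will not naturally and immediately produce the next output to the owner**, it is a `silent long-task`.
- Every silent long-task must define the following at startup:
  - the first reporting checkpoint (time / stage / event)
  - what to report if the task is not yet finished
  - the status transition when there is no new evidence (`paused` / `blocked`)
  - the handoff method if the owner must make the final judgement (e.g. button-path)
- If a silent long-task is started without this forced reporting checkpoint, a later "why is there no news?" counts as a workflow violation, not a mere delay.

## Checkpoint Rule

- A checkpoint is **not closure**; it is only a stage report within a long task and does not mean work may simply stop once it is sent.
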
@@ -31,6 +31,31 @@ This especially applies to:

If the endpoint is predictably a user decision, the assistant should structure the run so that the final user-facing handoff is already prepared as a button interaction.

### Silent Long-Task Checkpoint Gate

If a long-task is started and it will **not naturally produce an immediate next user-visible message**, it must define a forced reporting checkpoint at startup.

This applies to any silent long-task pattern, including but not limited to:
- research
- investigation
- debugging
- delegation / waiting on subagents
- background execution
- staged analysis
- long-running verification
- full tests / regression tests
- any "I’ll go do this and report back" workflow

At startup, the task must define:
1. the **first checkpoint trigger** (time, stage, or event)
2. what to report if the task is **not yet finished** by that checkpoint
3. how to downgrade status if there is **no new evidence** (`paused` / `blocked`)
4. how final owner handoff will work if a user decision is expected
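
The four startup requirements above can be sketched as a small record that refuses to launch until every required field is filled in. This is a minimal illustration in Python, not part of the workflow itself; `SilentTaskPlan`, its field names, and `missing_fields` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SilentTaskPlan:
    """Hypothetical record of what a silent long-task defines at startup."""

    first_checkpoint_trigger: Optional[str] = None  # time, stage, or event
    unfinished_report: Optional[str] = None         # what to say if not done yet
    no_evidence_status: Optional[str] = None        # "paused" or "blocked"
    decision_expected: bool = False                 # will the owner have to decide?
    owner_handoff: Optional[str] = None             # e.g. "button-path"

    def missing_fields(self) -> List[str]:
        """Return the names of startup definitions that are still absent."""
        missing = []
        if not self.first_checkpoint_trigger:
            missing.append("first_checkpoint_trigger")
        if not self.unfinished_report:
            missing.append("unfinished_report")
        if self.no_evidence_status not in ("paused", "blocked"):
            missing.append("no_evidence_status")
        # The handoff is only required when a user decision is expected.
        if self.decision_expected and not self.owner_handoff:
            missing.append("owner_handoff")
        return missing

    def may_start(self) -> bool:
        """The gate: start only when nothing required is missing."""
        return not self.missing_fields()
```

A caller would build the plan first and check `may_start()` before launching; an incomplete plan lists exactly which definitions are missing.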

### Failure rule

If a silent long-task was started without a forced reporting checkpoint, and the user later has to ask "why is there no update?", treat that as a workflow failure / checkpoint-lost condition.

### Button-driven test rule

If a test or validation flow is known in advance to end in a Telegram pass/fail, accept/reject, or rerun/stop decision, do not start that test in the ordinary text-reply lane.
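
As a rough sketch of this rule, the lane can be chosen before the test starts, based on how the run is expected to end. The function name `choose_lane`, the lane labels, and the grouping of endings into a set are illustrative assumptions drawn from the wording above, not an existing API.

```python
# Decision endings named in the rule above; collecting them in a set is an assumption.
BUTTON_ENDINGS = {"pass/fail", "accept/reject", "rerun/stop"}


def choose_lane(expected_ending: str) -> str:
    """Pick the delivery lane before starting a test or validation flow.

    Returns "button-path" when the flow is known in advance to end in an
    owner decision, otherwise "text-reply".
    """
    # Normalize spacing and case so "Pass / Fail" matches "pass/fail".
    normalized = expected_ending.strip().lower().replace(" ", "")
    return "button-path" if normalized in BUTTON_ENDINGS else "text-reply"
```

The point of deciding up front is that the lane is fixed at launch rather than remembered at the final step.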

@@ -53,6 +78,7 @@ These are violations when used as the closing interaction on Telegram:
- saying buttons will be used, but not actually sending them
- sending explanation first, and only later sending buttons after being corrected
- running a full test/report in plain text even though the result is obviously heading toward a pass/fail owner decision
- starting a silent long-task without any explicit forced reporting checkpoint

### Required interpretation

@@ -63,14 +89,17 @@ The gate applies to:
- next-step choices
- accept / rerun / stop style decisions
- pass / fail verdict requests
- silent long-task launches

### Violation standard

If the assistant reaches a user-decision closure and no real inline buttons were delivered first, treat it as a workflow violation even if the reply mentioned buttons in text.

If a silent long-task goes dark because no forced reporting checkpoint was defined, treat it as a workflow violation even if the task later resumes.

### Corrective rule

If this violation happens:
- acknowledge the violation plainly
- immediately send the real button message
- immediately send the real button message or status recovery update
- record the lesson into workflow / memory if it exposed a missing rule

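@@ -52,3 +52,4 @@
- Later reinforced further at a more direct operational layer: added the `Reply Closure Button Gate` concept, which states explicitly that whenever the final actionable part of a reply requires the owner to decide, confirm, approve, stop, continue, rerun, or choose a next step, it is not enough to say in text that buttons will be used; real inline buttons must actually be sent, or the most reasonable next step executed directly.
- After the shortest regression test, confirmed an earlier root cause: it is not enough to "have buttons"; rather, **if buttons are needed, the buttons must appear first**, and plain text must not be sent first with buttons added afterwards. The rule was therefore tightened into an ordering rule: send the real buttons with the `message` tool first, then reply `NO_REPLY`.
- After the full long-task test, identified a root cause further upstream: if it is foreseeable from the start that the run will end in the owner's pass/fail or accept/reject judgement, the flow itself must switch to `button-path` early, rather than remembering to use buttons only in the final stage.
- Today the owner pointed out a higher level of abstraction: the problem is not limited to "tests"; every **silent long-task** can become a black hole. The rule was therefore promoted to the general `Silent Long-Task Checkpoint Gate`: any task that will not naturally and immediately produce the next output to the owner after startup must define, at the very start, the first forced reporting checkpoint and the status transition on failure.
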
@@ -135,7 +135,22 @@ Use `checkpoint-template.md` in this skill directory.

---

## 5. No-fake-progress rule
## 5. Silent long-task governance

If a long-task will not naturally emit an immediate next user-visible message, treat it as a **silent long-task**.

A silent long-task must define at startup:
- the first forced checkpoint trigger
- what to report if not yet complete
- what status transition to use if there is no new evidence
- how final owner handoff will happen if a decision is expected

Silent long-tasks must not rely on memory, intention, or implied future follow-up.
If the user later has to ask where the update went, the flow is considered failed.
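
One way to avoid relying on memory or intention is to make the checkpoint a function of recorded state. The sketch below is a hedged illustration: `checkpoint_action`, its parameters, and the returned strings are hypothetical, and the time-based trigger is only one of the trigger kinds (time, stage, or event) that the rule allows.

```python
def checkpoint_action(elapsed_s: float, trigger_s: float, finished: bool,
                      new_evidence: bool, no_evidence_status: str = "paused") -> str:
    """Decide what a silent long-task must do at this moment.

    elapsed_s: seconds since the task started (assumes a time-based trigger).
    trigger_s: the first forced checkpoint, declared at startup.
    no_evidence_status: the predeclared transition, "paused" or "blocked".
    """
    if no_evidence_status not in ("paused", "blocked"):
        raise ValueError("no-evidence status must be 'paused' or 'blocked'")
    if finished:
        return "handoff"   # deliver the result or the owner handoff
    if elapsed_s < trigger_s:
        return "continue"  # checkpoint not yet due
    if new_evidence:
        return "report"    # send the predeclared not-yet-complete report
    return no_evidence_status  # no new evidence: downgrade the status
```

Because the trigger and the fallback status are arguments fixed at startup, a missed update becomes a detectable state rather than a forgotten intention.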

---

## 6. No-fake-progress rule

Only count as real progress if there is at least one of:
- new file change

@@ -154,7 +169,7 @@ If you only have status sync, say it is status sync. Do not dress it up as progr

---

## 6. Stop-clock gate
## 7. Stop-clock gate

If repeated checkpoints show no new evidence, do **not** keep the task cosmetically active.
You must choose one:

@@ -168,21 +183,22 @@ Without that, default to `paused`.

---

## 7. How to use this skill in practice
## 8. How to use this skill in practice

When this skill applies:
1. decide if the request is ordinary chat or long task
2. if long task, create/update a task record
3. choose one of the five valid states
4. if reporting progress, use the 5-part checkpoint structure
5. before claiming progress, check for real evidence
6. if no evidence and no concrete action, stop the clock
7. if the run is clearly heading toward a user pass/fail or accept/reject judgement on Telegram, prepare a button-path before the final handoff
8. if the entire test itself exists to validate Telegram decision closure, run it as a button-driven flow rather than a normal long plain-text report
4. if the task is silent, define the first forced checkpoint before proceeding
5. if reporting progress, use the 5-part checkpoint structure
6. before claiming progress, check for real evidence
7. if no evidence and no concrete action, stop the clock
8. if the run is clearly heading toward a user pass/fail or accept/reject judgement on Telegram, prepare a button-path before the final handoff
9. if the entire test itself exists to validate Telegram decision closure, run it as a button-driven flow rather than a normal long plain-text report
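
Read as a procedure, the updated steps might look like the sketch below. It is a simplified illustration only: `TaskContext`, `governed_steps`, and the step labels are hypothetical, and it omits the state choice in step 3 because the five valid states are not listed in this excerpt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskContext:
    """Hypothetical snapshot of a request, used only for this example."""

    is_long_task: bool
    is_silent: bool = False
    has_real_evidence: bool = False
    has_concrete_action: bool = False
    ends_in_owner_decision: bool = False


def governed_steps(ctx: TaskContext) -> List[str]:
    """Return the ordered actions the steps above would require."""
    if not ctx.is_long_task:
        return ["reply as ordinary chat"]
    steps = ["create/update task record"]
    if ctx.is_silent:
        # Defined before proceeding, so the task cannot go dark unnoticed.
        steps.append("define first forced checkpoint")
    if ctx.has_real_evidence:
        steps.append("report progress with checkpoint structure")
    elif not ctx.has_concrete_action:
        steps.append("stop the clock")
    if ctx.ends_in_owner_decision:
        steps.append("prepare button-path")
    return steps
```

For example, a silent task with no evidence and no concrete next action that will end in an owner verdict yields: task record, forced checkpoint, stop the clock, button-path.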

---

## 8. Integration guidance
## 9. Integration guidance

This skill should be paired with:
- the current session `WORKFLOW.md`

@@ -207,10 +223,11 @@ This prevents governed long-task flows from degrading back into ambiguous text-o

---

## 9. Success criteria
## 10. Success criteria

This skill is working correctly when:
- non-chat work always enters a governed state
- silent long-tasks never go dark without a predeclared checkpoint
- checkpoints no longer cause silent stalls
- no-evidence updates are not mislabeled as progress
- stalled work becomes `paused` / `blocked` instead of fake-`active`
