Clarify externalized silent long-task policy

2026-04-22 14:46:11 +08:00
parent 83be99a6bb
commit 52f7f0a557
3 changed files with 26 additions and 17 deletions
--- a/WORKFLOW.md
+++ b/WORKFLOW.md
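@@ -34,6 +34,8 @@ If a subagent has **no result within 5 minutes** of being assigned:
  - what to report if the task is not yet finished
  - the status transition if there is no new evidence (`paused` / `blocked`)
  - if an owner decision is needed at the end, how the handoff works (for example, button-path)
 - No silent long-task may be sustained only by internal memory and verbal promises; prefer binding an externalized checkpoint / reminder / cron-style trigger.
 - If no externalized trigger can be bound, the task **must not be launched in silent mode** and should stay in immediate follow-up mode instead.
 - If a silent long-task starts without this forced reporting checkpoint and the user later has to ask "why is there no update?", treat it as a workflow violation, not a mere delay.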
 ## Checkpoint Rule
--- a/WORKFLOW_GATES.md
+++ b/WORKFLOW_GATES.md
@@ -31,9 +31,18 @@ This especially applies to:
 If the endpoint is predictably a user decision, the assistant should structure the run so that the final user-facing handoff is already prepared as a button interaction.
-### Silent Long-Task Checkpoint Gate
+### Externalized Silent Long-Task Gate
-If a long-task is started and it will **not naturally produce an immediate next user-visible message**, it must define a forced reporting checkpoint at startup.
+If a long-task is started and it will **not naturally produce an immediate next user-visible message**, it is a silent long-task and must not rely only on assistant memory.
 A silent long-task must be externalized at startup by defining or binding:
 1. the **first forced checkpoint trigger** (time, stage, or event)
 2. what to report if the task is **not yet finished** by that checkpoint
 3. how to downgrade status if there is **no new evidence** (`paused` / `blocked`)
 4. how final owner handoff will work if a user decision is expected
 5. whether an actual external trigger should be bound (for example cron/reminder) or whether the task must remain non-silent
 If no externalized checkpoint mechanism exists, the task must **not** be launched as silent. It must stay in immediate follow-up mode instead.
 This applies to any silent long-task pattern, including but not limited to:
 - research
@@ -46,15 +55,9 @@ This applies to any silent long-task pattern, including but not limited to:
 - full tests / regression tests
 - any "I’ll go do this and report back" workflow
 At startup, the task must define:
 1. the **first checkpoint trigger** (time, stage, or event)
 2. what to report if the task is **not yet finished** by that checkpoint
 3. how to downgrade status if there is **no new evidence** (`paused` / `blocked`)
 4. how final owner handoff will work if a user decision is expected
 ### Failure rule
-If a silent long-task was started without a forced reporting checkpoint, and the user later has to ask "why is there no update?", treat that as a workflow failure / checkpoint-lost condition.
+If a silent long-task was started without an externalized checkpoint path, and the user later has to ask "why is there no update?", treat that as a workflow failure / checkpoint-lost condition.
 ### Button-driven test rule
@@ -78,7 +81,7 @@ These are violations when used as the closing interaction on Telegram:
 - saying buttons will be used, but not actually sending them
 - sending explanation first, and only later sending buttons after being corrected
 - running a full test/report in plain text even though the result is obviously heading toward a pass/fail owner decision
- starting a silent long-task without any explicit forced reporting checkpoint
+- starting a silent long-task without any explicit externalized checkpoint path
 ### Required interpretation
@@ -95,7 +98,7 @@ The gate applies to:
 If the assistant reaches a user-decision closure and no real inline buttons were delivered first, treat it as a workflow violation even if the reply mentioned buttons in text.
-If a silent long-task goes dark because no forced reporting checkpoint was defined, treat it as a workflow violation even if the task later resumes.
+If a silent long-task goes dark because no externalized checkpoint path was defined, treat it as a workflow violation even if the task later resumes.
 ### Corrective rule
--- a/skills/long-task-governor/SKILL.md
+++ b/skills/long-task-governor/SKILL.md
@@ -144,10 +144,13 @@ A silent long-task must define at startup:
 - what to report if not yet complete
 - what status transition to use if there is no new evidence
 - how final owner handoff will happen if a decision is expected
 - whether an externalized checkpoint mechanism is actually bound, or whether the task must remain non-silent
 Silent long-tasks must not rely on memory, intention, or implied future follow-up.
 If the user later has to ask where the update went, the flow is considered failed.
 If no externalized checkpoint path can be created safely, do not launch the task in silent mode.
 ---
 ## 6. No-fake-progress rule
@@ -190,11 +193,12 @@ When this skill applies:
 2. if long task, create/update a task record
 3. choose one of the five valid states
 4. if the task is silent, define the first forced checkpoint before proceeding
-5. if reporting progress, use the 5-part checkpoint structure
+5. if the task is silent, externalize the checkpoint path or keep the task non-silent
-6. before claiming progress, check for real evidence
+6. if reporting progress, use the 5-part checkpoint structure
-7. if no evidence and no concrete action, stop the clock
+7. before claiming progress, check for real evidence
-8. if the run is clearly heading toward a user pass/fail or accept/reject judgement on Telegram, prepare a button-path before the final handoff
+8. if no evidence and no concrete action, stop the clock
-9. if the entire test itself exists to validate Telegram decision closure, run it as a button-driven flow rather than a normal long plain-text report
+9. if the run is clearly heading toward a user pass/fail or accept/reject judgement on Telegram, prepare a button-path before the final handoff
 10. if the entire test itself exists to validate Telegram decision closure, run it as a button-driven flow rather than a normal long plain-text report
 ---
@@ -227,7 +231,7 @@ This prevents governed long-task flows from degrading back into ambiguous text-o
 This skill is working correctly when:
 - non-chat work always enters a governed state
- silent long-tasks never go dark without a predeclared checkpoint
+- silent long-tasks never go dark without a predeclared and externalized checkpoint path
 - checkpoints no longer cause silent stalls
 - no-evidence updates are not mislabeled as progress
 - stalled work becomes `paused` / `blocked` instead of fake-`active`
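The launch rule introduced by this commit can be sketched as a small startup check. This is a minimal Python sketch under stated assumptions, not the repository's implementation: `Mode`, `SilentTaskPlan`, `choose_launch_mode`, and `status_at_checkpoint` are hypothetical names, and `external_trigger_bound` stands in for whatever cron/reminder mechanism is actually available.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    SILENT = "silent"
    IMMEDIATE_FOLLOW_UP = "immediate_follow_up"


VALID_DOWNGRADES = ("paused", "blocked")


@dataclass
class SilentTaskPlan:
    """Hypothetical startup declaration for a silent long-task."""
    first_checkpoint_trigger: Optional[str]  # time, stage, or event
    not_finished_report: Optional[str]       # what to report if unfinished at the checkpoint
    no_evidence_status: Optional[str]        # "paused" or "blocked"
    owner_handoff: Optional[str]             # e.g. "button-path"; only required if a decision is expected
    external_trigger_bound: bool             # a cron/reminder-style trigger is actually registered
    decision_expected: bool = False


def choose_launch_mode(plan: SilentTaskPlan) -> Mode:
    """Allow silent mode only when the checkpoint path is fully declared and externalized."""
    if not plan.first_checkpoint_trigger or not plan.not_finished_report:
        return Mode.IMMEDIATE_FOLLOW_UP
    if plan.no_evidence_status not in VALID_DOWNGRADES:
        return Mode.IMMEDIATE_FOLLOW_UP
    if plan.decision_expected and not plan.owner_handoff:
        return Mode.IMMEDIATE_FOLLOW_UP
    if not plan.external_trigger_bound:
        # Memory, intention, or an implied follow-up does not count as externalized.
        return Mode.IMMEDIATE_FOLLOW_UP
    return Mode.SILENT


def status_at_checkpoint(current: str, has_new_evidence: bool, downgrade_to: str) -> str:
    """At a forced checkpoint, downgrade instead of reporting progress without evidence."""
    if downgrade_to not in VALID_DOWNGRADES:
        raise ValueError(f"downgrade status must be one of {VALID_DOWNGRADES}")
    return current if has_new_evidence else downgrade_to


if __name__ == "__main__":
    plan = SilentTaskPlan(
        first_checkpoint_trigger="end of stage 1",
        not_finished_report="stage reached, next step, blockers",
        no_evidence_status="paused",
        owner_handoff="button-path",
        external_trigger_bound=False,  # nothing registered yet
        decision_expected=True,
    )
    print(choose_launch_mode(plan))  # Mode.IMMEDIATE_FOLLOW_UP
    print(status_at_checkpoint("active", has_new_evidence=False, downgrade_to="paused"))  # paused
```

The check deliberately fails closed: any missing startup field, or an unbound external trigger, keeps the task in immediate follow-up mode rather than letting it start silent on a promise to report back.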