
Four Layers of Compression

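As a conversation grows, it eventually approaches the model's context window limit. Claude Code implements a multi-layer compression strategy, ranging from lightweight content trimming to full conversation summarization, so that it can keep running without losing critical context.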

graph TD
    A[Full Conversation Context] -->|approaching limit| B{Which layer?}
    B -->|"lightweight"| C[Layer 1: Snip<br/>Truncate large tool results]
    B -->|"targeted"| D[Layer 2: Microcompact<br/>Cache-aware inline editing]
    B -->|"structural"| E[Layer 3: Context Collapse<br/>Drop or summarize old turns]
    B -->|"full reset"| F[Layer 4: Auto Compact<br/>Summarize entire conversation]

    style C fill:#e8f5e9
    style D fill:#fff3e0
    style E fill:#fce4ec
    style F fill:#e3f2fd

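Each layer is more aggressive than the one before it. The system tries the lighter approaches first and escalates only when necessary.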
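The escalation order can be sketched as a loop that applies each layer in turn and stops as soon as the context fits. This is only an illustration: the `Layer` type, `estimateTokens`, `compressUntilFits`, and the four-characters-per-token estimate are assumptions made for the example, not names or values from the Claude Code source.

```typescript
// Hypothetical sketch: apply compression layers from lightest to heaviest,
// stopping as soon as the context fits the budget.
type Message = { role: "user" | "assistant"; content: string };
type Layer = { name: string; apply: (messages: Message[]) => Message[] };

// Rough estimate (assumption): about 4 characters per token.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function compressUntilFits(
  messages: Message[],
  layers: Layer[],
  tokenBudget: number,
): { messages: Message[]; applied: string[] } {
  const applied: string[] = [];
  let current = messages;
  for (const layer of layers) {
    if (estimateTokens(current) <= tokenBudget) break; // already fits, stop escalating
    current = layer.apply(current);
    applied.push(layer.name);
  }
  return { messages: current, applied };
}
```

Passing the layers as an ordered list keeps the "lightest first" policy in one place; a heavier layer runs only if the lighter ones did not free enough space.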

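Snip (layer 1) is the lightest form of compression. Tool results that exceed a size limit (file reads, grep output, bash output) are truncated inline while the message is being built, and a truncation notice is appended.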

// Conceptual — tool results are trimmed before being sent to the API
// "Output truncated. Total: 45000 chars. Showing first 30000 chars."

The key insight: most tool output is far larger than the model needs. A 10,000-line grep result usually has only 5-10 relevant matches buried in noise.
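A minimal sketch of this kind of truncation, assuming a character-based limit. The function name `snipToolResult` and the default of 30000 characters are hypothetical; the default simply mirrors the example notice above and is not a confirmed limit.

```typescript
// Hypothetical sketch of layer 1: truncate an oversized tool result and
// append a notice. The limit is an assumption for illustration only.
function snipToolResult(output: string, maxChars: number = 30000): string {
  if (output.length <= maxChars) return output; // small results pass through unchanged
  const notice =
    `Output truncated. Total: ${output.length} chars. ` +
    `Showing first ${maxChars} chars.`;
  return `${output.slice(0, maxChars)}\n\n${notice}`;
}
```

Keeping the head of the output and stating the original size lets the model decide whether to rerun the tool with a narrower query.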

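Microcompact (layer 2) operates at the API cache level: it uses cache edits to delete or shrink content that is already cached, without resending the entire context. The feature is gated by the CACHED_MICROCOMPACT feature flag.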

// src/constants/prompts.ts — conditional import
const getCachedMCConfigForFRC = feature('CACHED_MICROCOMPACT')
  ? require('../services/compact/cachedMCConfig.js').getCachedMCConfig
  : null

The cacheDeletionsPending flag in the cache-break detection system tracks when microcompact has sent deletions, so that the resulting drop in cache-read tokens is not misreported as a cache break.
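To make the role of the flag concrete, here is a hedged sketch of a detector that tolerates one expected drop after deletions are sent. Only the name `cacheDeletionsPending` comes from the text above; the class, its methods, and the "any decrease counts as a drop" rule are assumptions for illustration.

```typescript
// Hypothetical sketch: a cache-break detector that ignores the drop in
// cache-read tokens caused by microcompact deletions.
class CacheBreakDetector {
  private lastCacheReadTokens: number | null = null;
  private cacheDeletionsPending = false;

  // Called when microcompact sends cache deletions.
  notifyDeletionsSent(): void {
    this.cacheDeletionsPending = true;
  }

  // Returns true only when cache-read tokens drop with no deletion pending.
  recordResponse(cacheReadTokens: number): boolean {
    const previous = this.lastCacheReadTokens;
    const dropped = previous !== null && cacheReadTokens < previous;
    const isBreak = dropped && !this.cacheDeletionsPending;
    this.lastCacheReadTokens = cacheReadTokens;
    this.cacheDeletionsPending = false; // the expected drop has been accounted for
    return isBreak;
  }
}
```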

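What microcompact handles:

  • FILE_UNCHANGED_STUB: when a file is re-read and its content has not changed, the full content is replaced with a placeholder
  • Old tool results that are no longer referenced
  • Stale search results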
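The first item in the list above can be sketched as follows. The constant name FILE_UNCHANGED_STUB appears in the text, but its value here, the `FileRead` shape, and `stubUnchangedReads` are assumptions made for the example.

```typescript
// Hypothetical sketch: replace repeated, unchanged file reads with a stub.
// The stub text below is a placeholder value, not the real one.
const FILE_UNCHANGED_STUB = "[file unchanged since last read]";

type FileRead = { path: string; content: string };

function stubUnchangedReads(reads: FileRead[]): FileRead[] {
  const lastSeen = new Map<string, string>(); // path -> content of the last full read
  return reads.map((read) => {
    if (lastSeen.get(read.path) === read.content) {
      // Same file, same content: keep the entry but drop the payload.
      return { ...read, content: FILE_UNCHANGED_STUB };
    }
    lastSeen.set(read.path, read.content);
    return read;
  });
}
```

The first read of each file version stays intact, so the model can still find the full content earlier in the context.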

Layer 3: Partial Compact (Context Collapse)


Partial compaction summarizes part of the conversation while keeping recent messages verbatim. There are two variants:

The first variant summarizes the oldest messages and keeps the most recent messages verbatim:

graph LR
    subgraph "Before partial compact"
        A[Old turns 1-50] --> B[Recent turns 51-80]
    end
    subgraph "After partial compact"
        C[Summary of turns 1-50] --> D[Recent turns 51-80<br/>preserved verbatim]
    end

// src/services/compact/prompt.ts
// This prompt targets the recent portion; the earlier messages are retained as-is.
const PARTIAL_COMPACT_PROMPT = `Your task is to create a detailed summary of the
RECENT portion of the conversation — the messages that follow earlier retained
context. The earlier messages are being kept intact and do NOT need to be summarized.
Focus your summary on what was discussed, learned, and accomplished in the recent
messages only.`

The other variant summarizes everything up to a boundary point; the resulting summary is placed before the retained recent messages:

const PARTIAL_COMPACT_UP_TO_PROMPT = `Your task is to create a detailed summary of this
conversation. This summary will be placed at the start of a continuing session; newer
messages that build on this context will follow after your summary.`

The summary is inserted as a SystemCompactBoundaryMessage, which marks the transition from summarized content to verbatim content.
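A minimal sketch of the "up to" variant, assuming a numeric pivot index and an injected summarizer. The message shapes, the `compact_boundary` tag, and `partialCompactUpTo` are hypothetical and do not reflect the real SystemCompactBoundaryMessage type.

```typescript
// Hypothetical sketch: summarize everything before a pivot index and keep
// the rest verbatim, with a boundary message marking the transition.
type ChatMessage = { type: "user" | "assistant"; text: string };
type BoundaryMessage = { type: "compact_boundary"; summary: string };

function partialCompactUpTo(
  messages: ChatMessage[],
  pivot: number,
  summarize: (older: ChatMessage[]) => string,
): Array<ChatMessage | BoundaryMessage> {
  const older = messages.slice(0, pivot);
  const recent = messages.slice(pivot);
  if (older.length === 0) return recent; // nothing to summarize
  const boundary: BoundaryMessage = {
    type: "compact_boundary",
    summary: summarize(older),
  };
  return [boundary, ...recent];
}
```

In practice the summarizer would be a model call; injecting it as a function keeps the splitting logic easy to test on its own.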

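Auto Compact (layer 4) is the most aggressive compression. When context usage reaches a critical threshold, the entire conversation is summarized into a single structured document. The prompt template in src/services/compact/prompt.ts defines exactly what the summary must cover: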

// src/services/compact/prompt.ts — BASE_COMPACT_PROMPT (structure)
const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the
conversation so far...
Your summary should include the following sections:
1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections (with full code snippets)
4. Errors and fixes
5. Problem Solving
6. All user messages (non-tool-result)
7. Pending Tasks
8. Current Work
9. Optional Next Step`

The compact prompt strongly discourages the summarizing model from calling tools:

const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.`

It repeats the point at the end:

const NO_TOOLS_TRAILER =
'\n\nREMINDER: Do NOT call any tools. Respond with plain text only — ' +
'an <analysis> block followed by a <summary> block.'

The compact process uses a two-phase output format:

<analysis>
[Detailed chronological analysis — a drafting scratchpad]
</analysis>
<summary>
1. Primary Request and Intent: ...
2. Key Technical Concepts: ...
...
</summary>

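The <analysis> block improves summary quality by forcing the model to work through the conversation before summarizing. formatCompactSummary() then strips it before the summary enters the conversation: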

// src/services/compact/prompt.ts
export function formatCompactSummary(summary: string): string {
  let formattedSummary = summary
  // Strip analysis scratchpad
  formattedSummary = formattedSummary.replace(/<analysis>[\s\S]*?<\/analysis>/, '')
  // Extract and format summary section
  const summaryMatch = formattedSummary.match(/<summary>([\s\S]*?)<\/summary>/)
  if (summaryMatch) {
    // Capture group 1 holds the text between the <summary> tags
    const content = summaryMatch[1] || ''
    formattedSummary = formattedSummary.replace(
      /<summary>[\s\S]*?<\/summary>/,
      `Summary:\n${content.trim()}`,
    )
  }
  return formattedSummary.trim()
}

The compacted summary is injected as a user message, with a note about where it came from:

export function getCompactUserSummaryMessage(
  summary: string,
  suppressFollowUpQuestions?: boolean,
  transcriptPath?: string,
): string {
  // Assumed: the raw summary is cleaned with formatCompactSummary() first
  const formattedSummary = formatCompactSummary(summary)
  let baseSummary = `This session is being continued from a previous conversation
that ran out of context. The summary below covers the earlier portion.
${formattedSummary}`
  if (transcriptPath) {
    baseSummary += `\n\nIf you need specific details from before compaction,
read the full transcript at: ${transcriptPath}`
  }
  if (suppressFollowUpQuestions) {
    baseSummary += `\nContinue the conversation from where it left off without
asking the user any further questions.`
  }
  return baseSummary
}

Users can supply custom instructions to steer what the summary focuses on:

export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}

Examples from the prompt:

  • “When summarizing focus on typescript code changes and mistakes”
  • “Focus on test output and code changes. Include file reads verbatim.”

sequenceDiagram
    participant L as Agentic Loop
    participant C as Compact System
    participant A as API (Summarizer)
    participant S as Session State

    L->>L: Check context usage
    L->>C: Context threshold exceeded
    C->>C: Choose compact strategy<br/>(full vs partial)
    C->>C: Build compact prompt<br/>(NO_TOOLS + template)
    C->>A: Send conversation + compact prompt<br/>(maxTurns: 1)
    A-->>C: <analysis>...</analysis><br/><summary>...</summary>
    C->>C: formatCompactSummary()<br/>(strip analysis)
    C->>S: Insert CompactBoundaryMessage
    C->>S: Notify cache detection system
    L->>L: Resume with compressed context
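
The flow in the diagram can be sketched end to end with an injected summarizer. Everything here is a simplified assumption: `Turn`, `Summarizer`, `stripAnalysis`, and `runCompact` are hypothetical names, the summarizer is synchronous to keep the example short (a real model call would be asynchronous), and boundary insertion and cache notification are left out.

```typescript
// Hypothetical sketch of the compact flow. The summarizer is injected so the
// example stays self-contained; no real API client is assumed.
type Turn = { role: "user" | "assistant"; content: string };
type Summarizer = (conversation: Turn[], prompt: string) => string;

// Simplified stand-in for the analysis-stripping step.
function stripAnalysis(raw: string): string {
  const withoutAnalysis = raw.replace(/<analysis>[\s\S]*?<\/analysis>/, "");
  const match = withoutAnalysis.match(/<summary>([\s\S]*?)<\/summary>/);
  const text = match ? `Summary:\n${(match[1] ?? "").trim()}` : withoutAnalysis;
  return text.trim();
}

function runCompact(
  conversation: Turn[],
  compactPrompt: string,
  summarize: Summarizer,
): Turn[] {
  // One summarizer call over the whole conversation (text only, single turn).
  const raw = summarize(conversation, compactPrompt);
  const summary = stripAnalysis(raw);
  // The compacted context restarts from the summary, injected as a user message.
  return [{ role: "user", content: summary }];
}
```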

Pre-compact and post-compact hooks let external systems observe and respond to compaction events:

// src/services/compact/compact.ts
import { executePostCompactHooks, executePreCompactHooks } from '../../utils/hooks.js'

This enables integrations such as:

  • Logging compaction events for analytics
  • Saving pre-compact state for debugging
  • Triggering memory extraction before context is lost
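
As an illustration of how such integrations might plug in, here is a minimal hook registry. The signatures of the real executePreCompactHooks and executePostCompactHooks are not shown in this document, so `CompactEvent`, `CompactHook`, and `CompactHooks` are hypothetical, and collecting errors instead of rethrowing them is a design choice of this sketch only.

```typescript
// Hypothetical sketch: a minimal registry for compaction hooks.
type CompactEvent = { phase: "pre" | "post"; messageCount: number };
type CompactHook = (event: CompactEvent) => void;

class CompactHooks {
  private hooks: CompactHook[] = [];

  register(hook: CompactHook): void {
    this.hooks.push(hook);
  }

  // Runs every hook in registration order and collects error messages,
  // so one failing hook does not stop the others in this sketch.
  run(event: CompactEvent): string[] {
    const errors: string[] = [];
    for (const hook of this.hooks) {
      try {
        hook(event);
      } catch (err) {
        errors.push(err instanceof Error ? err.message : String(err));
      }
    }
    return errors;
  }
}
```

A logging hook, for example, could record `event.messageCount` in the pre phase and again in the post phase to measure how much each compaction removed.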