
Four-Layer Compression


As conversations grow, they eventually approach the model’s context window limit. Claude Code implements a multi-layered compression strategy — from lightweight content trimming to full conversation summarization — to keep operating without losing critical context.

graph TD
A[Full Conversation Context] -->|approaching limit| B{Which layer?}
B -->|"lightweight"| C[Layer 1: Snip<br/>Truncate large tool results]
B -->|"targeted"| D[Layer 2: Microcompact<br/>Cache-aware inline editing]
B -->|"structural"| E[Layer 3: Context Collapse<br/>Drop or summarize old turns]
B -->|"full reset"| F[Layer 4: Auto Compact<br/>Summarize entire conversation]
style C fill:#e8f5e9
style D fill:#fff3e0
style E fill:#fce4ec
style F fill:#e3f2fd

Each layer is progressively more aggressive. The system tries lighter approaches first and escalates only when needed.
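The escalation can be sketched as a simple threshold check. Note that the thresholds and layer names below are illustrative only — the actual dispatch logic and cutoffs are not shown in this document:

```typescript
// Illustrative sketch only — real thresholds and dispatch differ
type CompressionLayer = 'snip' | 'microcompact' | 'partial-compact' | 'auto-compact'

// `usage` is context consumption as a fraction of the context window
function chooseLayer(usage: number): CompressionLayer | null {
  if (usage < 0.5) return null               // plenty of room, do nothing
  if (usage < 0.7) return 'snip'             // trim oversized tool results
  if (usage < 0.85) return 'microcompact'    // cache-aware inline edits
  if (usage < 0.95) return 'partial-compact' // summarize old turns
  return 'auto-compact'                      // summarize everything
}
```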

Layer 1: Snip (Tool Result Truncation)

The lightest compression. Large tool results (file reads, grep outputs, bash outputs) are truncated to a maximum size. This happens inline during message construction — tool results that exceed the limit are cut with a truncation notice.

// Conceptual — tool results are trimmed before being sent to the API
// "Output truncated. Total: 45000 chars. Showing first 30000 chars."

The key insight: most tool output is far larger than what the model needs. A 10,000-line grep result usually contains 5-10 relevant matches buried in noise.
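A minimal sketch of this kind of inline truncation — the function name and the character limit here are illustrative, not taken from the source:

```typescript
// Illustrative sketch — limit and naming are hypothetical
const MAX_TOOL_RESULT_CHARS = 30_000

function truncateToolResult(output: string): string {
  if (output.length <= MAX_TOOL_RESULT_CHARS) return output
  const kept = output.slice(0, MAX_TOOL_RESULT_CHARS)
  return `${kept}\n\n[Output truncated. Total: ${output.length} chars. ` +
    `Showing first ${MAX_TOOL_RESULT_CHARS} chars.]`
}
```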

Layer 2: Microcompact (Cache-Aware Inline Editing)

Microcompact operates at the API cache level — it uses cache edits to delete or shrink content that’s already been cached, without re-sending the entire context. This is gated behind the CACHED_MICROCOMPACT feature flag.

// src/constants/prompts.ts — conditional import
const getCachedMCConfigForFRC = feature('CACHED_MICROCOMPACT')
  ? require('../services/compact/cachedMCConfig.js').getCachedMCConfig
  : null

The cacheDeletionsPending flag in the cache break detection system tracks when microcompact sends deletions, so the resulting drop in cache read tokens isn’t misreported as a cache break.
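That bookkeeping can be sketched roughly as follows — only the cacheDeletionsPending name comes from the source; the function names and surrounding logic are illustrative:

```typescript
// Rough sketch — only `cacheDeletionsPending` is from the source
let cacheDeletionsPending = false

// Called when microcompact sends cache deletions to the API
function onCacheDeletionsSent(): void {
  cacheDeletionsPending = true
}

// A drop in cache read tokens normally signals a cache break —
// unless pending microcompact deletions explain it
function isCacheBreak(prevCacheReadTokens: number, currCacheReadTokens: number): boolean {
  if (currCacheReadTokens >= prevCacheReadTokens) return false
  if (cacheDeletionsPending) {
    cacheDeletionsPending = false // expected drop, consume the flag
    return false
  }
  return true
}
```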

What gets microcompacted:

  • FILE_UNCHANGED_STUB — when a file is re-read and hasn’t changed, the full content is replaced with a stub
  • Old tool results that are no longer referenced
  • Stale search results
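The file-stub case can be sketched like this — the types and stub text are simplified stand-ins, not the actual source definitions:

```typescript
// Simplified stand-ins for the real types
interface FileReadResult {
  path: string
  content: string
  mtimeMs: number // modification time at read
}

// If a file is re-read and unchanged, replace the repeated content with a stub
function maybeStubFileRead(prev: FileReadResult, next: FileReadResult): string {
  const unchanged = prev.path === next.path && prev.mtimeMs === next.mtimeMs
  return unchanged
    ? `[FILE_UNCHANGED_STUB: ${next.path} has not changed since the previous read]`
    : next.content
}
```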

Layer 3: Partial Compact (Context Collapse)


Partial compaction summarizes one portion of the conversation while keeping the rest intact:

graph LR
subgraph "Before partial compact"
A[Old turns 1-50] --> B[Recent turns 51-80]
end
subgraph "After partial compact"
C[Summary of turns 1-50] --> D[Recent turns 51-80<br/>preserved verbatim]
end

Two prompt variants exist. The first keeps earlier retained context intact and summarizes the recent messages that follow it:
src/services/compact/prompt.ts
const PARTIAL_COMPACT_PROMPT = `Your task is to create a detailed summary of the
RECENT portion of the conversation — the messages that follow earlier retained
context. The earlier messages are being kept intact and do NOT need to be summarized.
Focus your summary on what was discussed, learned, and accomplished in the recent
messages only.`

The second variant summarizes everything up to a boundary point, producing a summary that precedes the kept recent messages:

const PARTIAL_COMPACT_UP_TO_PROMPT = `Your task is to create a detailed summary of this
conversation. This summary will be placed at the start of a continuing session; newer
messages that build on this context will follow after your summary.`

The summary is inserted as a SystemCompactBoundaryMessage that marks the transition from summarized to verbatim content.
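Conceptually, the insertion looks like this — the message shapes below are simplified stand-ins for the real types:

```typescript
// Simplified stand-ins for the real message types
type Message =
  | { type: 'user' | 'assistant'; content: string }
  | { type: 'compact_boundary'; summary: string }

// Replace everything before `keepFrom` with a boundary carrying the summary
function applyPartialCompact(messages: Message[], keepFrom: number, summary: string): Message[] {
  return [{ type: 'compact_boundary', summary }, ...messages.slice(keepFrom)]
}
```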

Layer 4: Auto Compact (Full Conversation Summarization)

The most aggressive compression. When context usage hits a critical threshold, the entire conversation is summarized into a single structured document. The prompt template in src/services/compact/prompt.ts defines exactly what the summary must capture:

// src/services/compact/prompt.ts — BASE_COMPACT_PROMPT (structure)
const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the
conversation so far...
Your summary should include the following sections:
1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections (with full code snippets)
4. Errors and fixes
5. Problem Solving
6. All user messages (non-tool-result)
7. Pending Tasks
8. Current Work
9. Optional Next Step`

The compact prompt aggressively prevents the summarization model from calling tools:

const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.`

And reinforced at the end:

const NO_TOOLS_TRAILER =
'\n\nREMINDER: Do NOT call any tools. Respond with plain text only — ' +
'an <analysis> block followed by a <summary> block.'

The compact process uses a two-phase output format:

<analysis>
[Detailed chronological analysis — a drafting scratchpad]
</analysis>
<summary>
1. Primary Request and Intent: ...
2. Key Technical Concepts: ...
...
</summary>

The <analysis> block improves summary quality by forcing the model to think through the conversation before summarizing. It is then stripped by formatCompactSummary() before the summary enters the conversation:

src/services/compact/prompt.ts
export function formatCompactSummary(summary: string): string {
  // Strip the <analysis> drafting scratchpad
  let formattedSummary = summary.replace(/<analysis>[\s\S]*?<\/analysis>/, '')
  // Extract the <summary> block and reformat its contents as plain text
  const summaryMatch = formattedSummary.match(/<summary>([\s\S]*?)<\/summary>/)
  if (summaryMatch) {
    formattedSummary = formattedSummary.replace(
      /<summary>[\s\S]*?<\/summary>/,
      `Summary:\n${summaryMatch[1].trim()}`,
    )
  }
  return formattedSummary.trim()
}

The compacted summary is injected as a user message with context about where it came from:

export function getCompactUserSummaryMessage(
  summary: string,
  suppressFollowUpQuestions?: boolean,
  transcriptPath?: string,
): string {
  const formattedSummary = formatCompactSummary(summary)
  let baseSummary = `This session is being continued from a previous conversation
that ran out of context. The summary below covers the earlier portion.
${formattedSummary}`
  if (transcriptPath) {
    baseSummary += `\n\nIf you need specific details from before compaction,
read the full transcript at: ${transcriptPath}`
  }
  if (suppressFollowUpQuestions) {
    baseSummary += `\nContinue the conversation from where it left off without
asking the user any further questions.`
  }
  return baseSummary
}

Users can provide custom instructions that guide what the summary should focus on:

export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}

Example custom instructions:

  • “When summarizing focus on typescript code changes and mistakes”
  • “Focus on test output and code changes. Include file reads verbatim.”

The full auto-compact flow, end to end:

sequenceDiagram
participant L as Agentic Loop
participant C as Compact System
participant A as API (Summarizer)
participant S as Session State
L->>L: Check context usage
L->>C: Context threshold exceeded
C->>C: Choose compact strategy<br/>(full vs partial)
C->>C: Build compact prompt<br/>(NO_TOOLS + template)
C->>A: Send conversation + compact prompt<br/>(maxTurns: 1)
A-->>C: <analysis>...</analysis><br/><summary>...</summary>
C->>C: formatCompactSummary()<br/>(strip analysis)
C->>S: Insert CompactBoundaryMessage
C->>S: Notify cache detection system
L->>L: Resume with compressed context

Pre-compact and post-compact hooks allow external systems to observe and react to compaction events:

src/services/compact/compact.ts
import { executePostCompactHooks, executePreCompactHooks } from '../../utils/hooks.js'

This enables integrations like:

  • Logging compaction events for analytics
  • Saving pre-compact state for debugging
  • Triggering memory extraction before context is lost
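A minimal sketch of such a hook registry — the event shape and registration function below are illustrative, not the actual hooks API:

```typescript
// Illustrative hook registry — not the actual hooks API
type PreCompactEvent = { trigger: 'auto' | 'manual'; messageCount: number }

const preCompactHooks: Array<(e: PreCompactEvent) => void> = []

function registerPreCompactHook(hook: (e: PreCompactEvent) => void): void {
  preCompactHooks.push(hook)
}

function executePreCompactHooks(event: PreCompactEvent): void {
  for (const hook of preCompactHooks) hook(event)
}

// Example: log compaction events for analytics
registerPreCompactHook((e) =>
  console.log(`pre-compact (${e.trigger}): ${e.messageCount} messages`),
)
```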