# Four-Layer Compression
As conversations grow, they eventually approach the model’s context window limit. Claude Code implements a multi-layered compression strategy — from lightweight content trimming to full conversation summarization — to keep operating without losing critical context.
## The Compression Layers

```mermaid
graph TD
    A[Full Conversation Context] -->|approaching limit| B{Which layer?}
    B -->|"lightweight"| C[Layer 1: Snip<br/>Truncate large tool results]
    B -->|"targeted"| D[Layer 2: Microcompact<br/>Cache-aware inline editing]
    B -->|"structural"| E[Layer 3: Context Collapse<br/>Drop or summarize old turns]
    B -->|"full reset"| F[Layer 4: Auto Compact<br/>Summarize entire conversation]

    style C fill:#e8f5e9
    style D fill:#fff3e0
    style E fill:#fce4ec
    style F fill:#e3f2fd
```

Each layer is progressively more aggressive. The system tries lighter approaches first and escalates only when needed.
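The escalation order can be pictured as a simple dispatcher. The following is a hypothetical sketch, not the actual source: the function name `chooseLayer` and the specific usage thresholds are illustrative assumptions.

```typescript
// Hypothetical sketch of layer escalation. Names and thresholds are
// illustrative, not taken from the Claude Code source.
type CompressionLayer = 'snip' | 'microcompact' | 'partial' | 'auto-compact'

function chooseLayer(usedTokens: number, limitTokens: number): CompressionLayer | null {
  const usage = usedTokens / limitTokens
  if (usage < 0.5) return null            // plenty of room: do nothing
  if (usage < 0.7) return 'snip'          // lightest: trim oversized tool results
  if (usage < 0.85) return 'microcompact' // targeted cache-aware edits
  if (usage < 0.95) return 'partial'      // summarize old turns, keep recent ones
  return 'auto-compact'                   // full reset: summarize everything
}
```

The point of the ordering is that each layer is only tried when the cheaper ones above it would not free enough context.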
## Layer 1: Snip (Tool Result Truncation)

The lightest compression. Large tool results (file reads, grep outputs, bash outputs) are truncated to a maximum size. This happens inline during message construction — tool results that exceed the limit are cut with a truncation notice.

```ts
// Conceptual — tool results are trimmed before being sent to the API
// "Output truncated. Total: 45000 chars. Showing first 30000 chars."
```

The key insight: most tool output is far larger than what the model needs. A 10,000-line grep result usually contains 5-10 relevant matches buried in noise.
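A minimal sketch of what such a truncation step might look like, assuming a character limit and a notice format modeled on the message above (the function name and default limit are assumptions):

```typescript
// Hypothetical sketch of tool-result truncation. The 30000-char default
// and the notice wording are assumptions modeled on the notice above.
function snipToolResult(output: string, maxChars = 30000): string {
  if (output.length <= maxChars) return output
  // Keep the head of the output and append a truncation notice so the
  // model knows how much was cut.
  return (
    output.slice(0, maxChars) +
    `\n[Output truncated. Total: ${output.length} chars. ` +
    `Showing first ${maxChars} chars.]`
  )
}
```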
## Layer 2: Cached Microcompact

Microcompact operates at the API cache level — it uses cache edits to delete or shrink content that’s already been cached, without re-sending the entire context. This is gated behind the `CACHED_MICROCOMPACT` feature flag.

```ts
// src/constants/prompts.ts — conditional import
const getCachedMCConfigForFRC = feature('CACHED_MICROCOMPACT')
  ? require('../services/compact/cachedMCConfig.js').getCachedMCConfig
  : null
```

The `cacheDeletionsPending` flag in the cache break detection system tracks when microcompact sends deletions, so the resulting drop in cache read tokens isn’t misreported as a cache break.
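A sketch of how that deletion-aware check might work. This is an illustrative reconstruction under stated assumptions: the `CacheState` shape and `isCacheBreak` function are hypothetical, only the `cacheDeletionsPending` flag name comes from the text above.

```typescript
// Hypothetical sketch: distinguish a genuine cache break from the token
// drop that follows a microcompact deletion. Only the flag name
// cacheDeletionsPending is from the source; the rest is illustrative.
interface CacheState {
  lastCacheReadTokens: number
  cacheDeletionsPending: boolean
}

function isCacheBreak(state: CacheState, cacheReadTokens: number): boolean {
  if (state.cacheDeletionsPending) {
    // Microcompact just deleted cached content, so a drop is expected
    // and must not be reported as a break.
    state.cacheDeletionsPending = false
    state.lastCacheReadTokens = cacheReadTokens
    return false
  }
  const broke = cacheReadTokens < state.lastCacheReadTokens
  state.lastCacheReadTokens = cacheReadTokens
  return broke
}
```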
What gets microcompacted:

- `FILE_UNCHANGED_STUB` — when a file is re-read and hasn’t changed, the full content is replaced with a stub
- Old tool results that are no longer referenced
- Stale search results
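The first case can be sketched as a pass over the tool results that stubs out repeated reads of an unchanged file. This is a hypothetical illustration: the `ToolResult` shape, the stub text, and `stubUnchangedReads` are assumptions; only the `FILE_UNCHANGED_STUB` name appears in the source.

```typescript
// Illustrative sketch: when the same file is read again with the same
// content hash, replace the repeated full content with a short stub.
// Shapes and stub text are assumptions; only the constant name is real.
const FILE_UNCHANGED_STUB = '[File unchanged since last read; content elided]'

interface ToolResult {
  path: string
  contentHash: string
  content: string
}

function stubUnchangedReads(results: ToolResult[]): ToolResult[] {
  const lastHash = new Map<string, string>()
  return results.map((r) => {
    const prev = lastHash.get(r.path)
    lastHash.set(r.path, r.contentHash)
    // A re-read whose hash matches the previous read carries no new
    // information, so its content is replaced with the stub.
    return prev === r.contentHash ? { ...r, content: FILE_UNCHANGED_STUB } : r
  })
}
```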
## Layer 3: Partial Compact (Context Collapse)

Partial compaction summarizes a portion of the conversation while keeping recent messages intact. Two variants exist:
### “from” Direction (Default)

Summarizes the oldest messages, keeping the most recent ones verbatim:

```mermaid
graph LR
    subgraph "Before partial compact"
        A[Old turns 1-50] --> B[Recent turns 51-80]
    end
    subgraph "After partial compact"
        C[Summary of turns 1-50] --> D[Recent turns 51-80<br/>preserved verbatim]
    end
```

```ts
const PARTIAL_COMPACT_PROMPT = `Your task is to create a detailed summary of the
RECENT portion of the conversation — the messages that follow earlier retained
context. The earlier messages are being kept intact and do NOT need to be summarized.
Focus your summary on what was discussed, learned, and accomplished in the recent
messages only.`
```
### “up_to” Direction

Summarizes everything up to a boundary point, producing a summary that precedes the kept recent messages:

```ts
const PARTIAL_COMPACT_UP_TO_PROMPT = `Your task is to create a detailed summary of this
conversation. This summary will be placed at the start of a continuing session; newer
messages that build on this context will follow after your summary.`
```

The summary is inserted as a `SystemCompactBoundaryMessage` that marks the transition from summarized to verbatim content.
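The splice itself can be sketched as replacing everything before the boundary with a single marker message. This is an illustrative sketch: the `Message` shape, the marker text, and `applyUpToCompact` are assumptions, standing in for the real `SystemCompactBoundaryMessage` handling.

```typescript
// Illustrative sketch of an "up_to" partial compact splice. The Message
// shape and boundary marker are assumptions, not the real data types.
interface Message {
  role: 'user' | 'assistant' | 'system'
  content: string
}

function applyUpToCompact(messages: Message[], boundary: number, summary: string): Message[] {
  // Everything before `boundary` is replaced by the summary, wrapped in a
  // boundary message so later code can tell summarized from verbatim content.
  const boundaryMessage: Message = {
    role: 'system',
    content: `[compact boundary]\n${summary}`,
  }
  return [boundaryMessage, ...messages.slice(boundary)]
}
```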
## Layer 4: Full Auto Compact

The most aggressive compression. When context usage hits a critical threshold, the entire conversation is summarized into a single structured document. The prompt template in `src/services/compact/prompt.ts` defines exactly what the summary must capture:
```ts
// src/services/compact/prompt.ts — BASE_COMPACT_PROMPT (structure)
const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the
conversation so far...

Your summary should include the following sections:

1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections (with full code snippets)
4. Errors and fixes
5. Problem Solving
6. All user messages (non-tool-result)
7. Pending Tasks
8. Current Work
9. Optional Next Step`
```
### Anti-Tool-Use Enforcement

The compact prompt aggressively prevents the summarization model from calling tools:
```ts
const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.

- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.`
```

And reinforced at the end:
```ts
const NO_TOOLS_TRAILER =
  '\n\nREMINDER: Do NOT call any tools. Respond with plain text only — ' +
  'an <analysis> block followed by a <summary> block.'
```
### Analysis-Then-Summary Pattern

The compact process uses a two-phase output format:
```xml
<analysis>
[Detailed chronological analysis — a drafting scratchpad]
</analysis>

<summary>
1. Primary Request and Intent: ...
2. Key Technical Concepts: ...
...
</summary>
```

The `<analysis>` block improves summary quality by forcing the model to think through the conversation before summarizing. It is then stripped by `formatCompactSummary()` before the summary enters the conversation:
```ts
export function formatCompactSummary(summary: string): string {
  // Strip analysis scratchpad
  let formattedSummary = summary.replace(/<analysis>[\s\S]*?<\/analysis>/, '')

  // Extract and format summary section
  const summaryMatch = formattedSummary.match(/<summary>([\s\S]*?)<\/summary>/)
  if (summaryMatch) {
    const content = summaryMatch[1]
    formattedSummary = formattedSummary.replace(
      /<summary>[\s\S]*?<\/summary>/,
      `Summary:\n${content.trim()}`,
    )
  }
  return formattedSummary.trim()
}
```
## Post-Compaction Message

The compacted summary is injected as a user message with context about where it came from:
```ts
export function getCompactUserSummaryMessage(
  summary: string,
  suppressFollowUpQuestions?: boolean,
  transcriptPath?: string,
): string {
  let baseSummary = `This session is being continued from a previous conversation
that ran out of context. The summary below covers the earlier portion.

${summary}`

  if (transcriptPath) {
    baseSummary += `\n\nIf you need specific details from before compaction,
read the full transcript at: ${transcriptPath}`
  }

  if (suppressFollowUpQuestions) {
    baseSummary += `\nContinue the conversation from where it left off without
asking the user any further questions.`
  }

  return baseSummary
}
```
## Custom Compact Instructions

Users can provide custom instructions that guide what the summary should focus on:
```ts
export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  prompt += NO_TOOLS_TRAILER
  return prompt
}
```

Examples from the prompt:
- “When summarizing focus on typescript code changes and mistakes”
- “Focus on test output and code changes. Include file reads verbatim.”
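To make the assembly concrete, here is a self-contained sketch of how a custom instruction lands between the base template and the trailer. The constant bodies below are abbreviated stand-ins for the real prompt text, not the actual strings.

```typescript
// Self-contained sketch of the prompt assembly. Constant bodies are
// abbreviated stand-ins, not the real prompt text.
const NO_TOOLS_PREAMBLE = 'CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.\n'
const BASE_COMPACT_PROMPT = 'Your task is to create a detailed summary...\n'
const NO_TOOLS_TRAILER = '\n\nREMINDER: Do NOT call any tools.'

function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT
  if (customInstructions && customInstructions.trim() !== '') {
    // Custom focus areas are appended after the base template...
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }
  // ...and the anti-tool-use reminder always comes last.
  prompt += NO_TOOLS_TRAILER
  return prompt
}

const prompt = getCompactPrompt('Focus on test output and code changes.')
```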
## Compact Lifecycle

```mermaid
sequenceDiagram
    participant L as Agentic Loop
    participant C as Compact System
    participant A as API (Summarizer)
    participant S as Session State

    L->>L: Check context usage
    L->>C: Context threshold exceeded
    C->>C: Choose compact strategy<br/>(full vs partial)
    C->>C: Build compact prompt<br/>(NO_TOOLS + template)
    C->>A: Send conversation + compact prompt<br/>(maxTurns: 1)
    A-->>C: <analysis>...</analysis><br/><summary>...</summary>
    C->>C: formatCompactSummary()<br/>(strip analysis)
    C->>S: Insert CompactBoundaryMessage
    C->>S: Notify cache detection system
    L->>L: Resume with compressed context
```
## Hooks Integration

Pre-compact and post-compact hooks allow external systems to observe and react to compaction events:
```ts
import { executePostCompactHooks, executePreCompactHooks } from '../../utils/hooks.js'
```

This enables integrations like:
- Logging compaction events for analytics
- Saving pre-compact state for debugging
- Triggering memory extraction before context is lost
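A minimal sketch of hook dispatch around the summarization step, assuming a simple registry model. The hook names echo the import above, but the event payload and `runCompaction` function are assumptions, not the actual Claude Code interfaces.

```typescript
// Illustrative hook dispatch around compaction. The registry and event
// payload shapes are assumptions, not the real Claude Code interfaces.
type CompactEvent = { trigger: 'auto' | 'manual'; messageCount: number }
type Hook = (event: CompactEvent) => void

const preCompactHooks: Hook[] = []
const postCompactHooks: Hook[] = []

function runCompaction(event: CompactEvent, summarize: () => string): string {
  // Pre-compact hooks run while the full context is still available,
  // e.g. to save state or extract memories before it is lost.
  preCompactHooks.forEach((hook) => hook(event))
  const summary = summarize()
  // Post-compact hooks observe the result, e.g. for analytics logging.
  postCompactHooks.forEach((hook) => hook(event))
  return summary
}
```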