Prompt Cache Strategy
Anthropic’s API supports prompt caching — when consecutive API calls share the same prefix, the cached portion is served from memory at reduced cost and latency. Claude Code invests significant engineering effort into maximizing cache hit rates, because every cache miss means re-processing thousands of tokens of system prompt and conversation history.
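To make the mechanism concrete, here is a minimal sketch of a Messages API request whose system prompt carries a cache breakpoint. This is not Claude Code's actual client code; the model id is illustrative, and the shapes assume the standard Anthropic Messages API request format. Everything up to and including the marked block can be served from cache on the next call, provided the bytes match exactly.

```ts
// Sketch only: a request body with a cache_control breakpoint on the
// system prompt. Any byte-level change in the prefix breaks the cache.
type ContentBlock = {
  type: 'text'
  text: string
  cache_control?: { type: 'ephemeral' }
}

function buildRequest(systemPrompt: string, userMessage: string) {
  return {
    model: 'claude-sonnet-4-5', // illustrative model id
    max_tokens: 1024,
    system: [
      // Stable prefix: cached across calls as long as it is byte-identical.
      { type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } },
    ] satisfies ContentBlock[],
    messages: [{ role: 'user', content: userMessage }],
  }
}
```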
Cache Break Detection
The core monitoring system lives in `src/services/api/promptCacheBreakDetection.ts`. It tracks every factor that can cause a cache break and logs detailed diagnostics when one occurs.
Tracked State
```ts
type PreviousState = {
  systemHash: number                    // Hash of system prompt (without cache_control)
  toolsHash: number                     // Hash of tool schemas
  cacheControlHash: number              // Hash including cache_control markers
  toolNames: string[]                   // Ordered list of tool names
  perToolHashes: Record<string, number> // Per-tool schema hash for diffing
  systemCharCount: number
  model: string
  fastMode: boolean
  globalCacheStrategy: string           // 'tool_based' | 'system_prompt' | 'none'
  betas: string[]                       // Beta header list
  autoModeActive: boolean               // AFK mode beta
  isUsingOverage: boolean               // Subscription overage state
  effortValue: string                   // Resolved effort level
  extraBodyHash: number                 // Hash of extra API body params
  callCount: number
  prevCacheReadTokens: number | null
  cacheDeletionsPending: boolean        // Expected drops from microcompact
}
```

Change Detection
When any tracked value changes between API calls, the system captures the delta:
```ts
type PendingChanges = {
  systemPromptChanged: boolean
  toolSchemasChanged: boolean
  modelChanged: boolean
  fastModeChanged: boolean
  cacheControlChanged: boolean
  globalCacheStrategyChanged: boolean
  betasChanged: boolean
  addedToolCount: number
  removedToolCount: number
  systemCharDelta: number
  addedTools: string[]
  removedTools: string[]
  changedToolSchemas: string[] // Which tools had schema changes
  // ...
}
```

Cache Scoping Strategy
Claude Code uses two cache scoping strategies based on session configuration:
Global Scope
When `shouldUseGlobalCacheScope()` is true, the system prompt is split at `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`:
```mermaid
graph TD
  subgraph "scope: global (shared across all users)"
    A[Static system prompt sections]
  end
  B["SYSTEM_PROMPT_DYNAMIC_BOUNDARY"]
  subgraph "scope: org (per-organization)"
    C[Dynamic system prompt sections]
  end
  subgraph "No cache control"
    D[Conversation messages]
  end
  A --> B --> C --> D
```

Everything before the boundary gets `cache_control: { type: 'ephemeral', scope: 'global' }` — shared across all users on the platform. This maximizes cache sharing for the largest, most stable prompt sections.
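A sketch of the split, assuming the prompt is a single string containing a literal boundary marker. The real `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` value is internal, so a placeholder is used, and the `scope` field is as described in this document (it may not be part of the public API):

```ts
// Placeholder for the real internal boundary marker.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '<<<DYNAMIC_BOUNDARY>>>'

function buildSystemBlocks(fullPrompt: string) {
  const idx = fullPrompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  if (idx === -1) {
    // No boundary found: fall back to a single breakpoint on the whole prompt.
    return [{ type: 'text', text: fullPrompt, cache_control: { type: 'ephemeral' } }]
  }
  const staticPart = fullPrompt.slice(0, idx)
  const dynamicPart = fullPrompt.slice(idx + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length)
  return [
    // Shared across all users on the platform.
    { type: 'text', text: staticPart, cache_control: { type: 'ephemeral', scope: 'global' } },
    // Shared only within the organization.
    { type: 'text', text: dynamicPart, cache_control: { type: 'ephemeral', scope: 'org' } },
  ]
}
```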
Tool-Based Scope
The alternative strategy places cache breakpoints at tool definitions rather than the system prompt boundary. This is used when global scoping isn’t available.
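Marking the last tool definition is the usual way to anchor a cached prefix on tool schemas; a minimal sketch, with invented names (`Tool`, `markToolCacheBreakpoint`), assuming the standard tool-definition shape:

```ts
type Tool = {
  name: string
  input_schema: object
  cache_control?: { type: 'ephemeral' }
}

// Put the cache breakpoint on the last tool definition so the large,
// stable tool schemas anchor the cached prefix.
function markToolCacheBreakpoint(tools: Tool[]): Tool[] {
  if (tools.length === 0) return tools
  return tools.map((t, i) =>
    i === tools.length - 1 ? { ...t, cache_control: { type: 'ephemeral' } } : t,
  )
}
```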
Strategy Transitions
When MCP tools are discovered or removed mid-session, the `globalCacheStrategy` can flip between `'tool_based'`, `'system_prompt'`, and `'none'`. The detection system tracks these transitions as a known cache break cause.
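The exact selection conditions are internal, but a hypothetical selector (all names and conditions invented here) illustrates the kind of transition being tracked: when any input flips mid-session, the returned strategy can change, and with it the position of every cache breakpoint.

```ts
type CacheStrategy = 'tool_based' | 'system_prompt' | 'none'

// Hypothetical: the real logic lives elsewhere and checks different inputs.
function pickStrategy(opts: {
  cachingEnabled: boolean
  globalScopeAvailable: boolean
  toolCount: number
}): CacheStrategy {
  if (!opts.cachingEnabled) return 'none'
  if (opts.globalScopeAvailable) return 'system_prompt'
  // Without global scope, fall back to breakpoints on tool definitions.
  return opts.toolCount > 0 ? 'tool_based' : 'system_prompt'
}
```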
Sources of Cache Breaks
The detection system enumerates the known cache break sources, in order of frequency:
| Source | Cause | Mitigation |
|---|---|---|
| Tool schema changes | AgentTool/SkillTool embed dynamic lists | Per-tool hashing identifies the culprit |
| MCP connect/disconnect | New tools added/removed mid-session | Moving to delta-based MCP instructions |
| System prompt changes | Dynamic sections recompute | Section registry caches stable values |
| Model changes | User switches model mid-session | Detected but unavoidable |
| Beta header changes | Feature flags toggle | Sticky-on latching prevents flip-flop |
| Effort value changes | User changes effort setting | Detected, flows into output_config |
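The first mitigation in the table can be sketched as follows. Helper names are invented; `hashString` stands in for whatever stable hash the real code uses. Hashing each tool schema separately turns an opaque "toolsHash changed" into an actionable "this tool's schema changed":

```ts
// FNV-1a, 32-bit: a simple deterministic string hash for illustration.
function hashString(s: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

type ToolDef = { name: string; schema: unknown }

function hashTools(tools: ToolDef[]): Record<string, number> {
  return Object.fromEntries(
    tools.map(t => [t.name, hashString(JSON.stringify(t.schema))]),
  )
}

// Tools present in both snapshots whose schema bytes differ: the
// "schema changed without add/remove" case the per-tool hashing exists to catch.
function findChangedSchemas(
  prev: Record<string, number>,
  next: Record<string, number>,
): string[] {
  return Object.keys(next).filter(name => name in prev && prev[name] !== next[name])
}
```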
Latching Mechanisms
Several sources of instability are mitigated with sticky-on latches — once a flag is set, it stays set for the session:
```ts
// Examples of latched values (conceptual, from claude.ts)
// AFK_MODE_BETA_HEADER — once auto mode activates, the beta header stays on
// Cached MC enabled — once cache editing is used, it stays enabled
// Overage state — once overage is detected, eligibility is latched session-stable
```

These prevent flip-flopping that would break the cache every other turn.
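A sticky-on latch is a few lines of code; this is a minimal sketch (names invented), not the actual implementation. Once the session-level flag flips on, it never flips back off, so the request prefix stays stable for the rest of the session:

```ts
// Returns a function that reports true forever once it has seen true.
function makeLatch() {
  let latched = false
  return (currentValue: boolean): boolean => {
    if (currentValue) latched = true
    return latched
  }
}

const afkBetaActive = makeLatch()
afkBetaActive(false) // false: not yet activated
afkBetaActive(true)  // true: latch engages
afkBetaActive(false) // true: stays on even though the input dropped
```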
Diff Generation for Debugging
When a cache break is detected, the system generates a diff for debugging:
```ts
function getCacheBreakDiffPath(): string {
  return join(getClaudeTempDir(), `cache-break-${randomSuffix}.diff`)
}
```

The diff file uses the `createPatch` function from the `diff` library to show exactly what changed in the system prompt or tool schemas between turns.
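A self-contained version of the path generation using only Node built-ins, with `os.tmpdir()` standing in for the real `getClaudeTempDir()` and `crypto.randomBytes` standing in for however `randomSuffix` is actually produced:

```ts
import { join } from 'node:path'
import { tmpdir } from 'node:os'
import { randomBytes } from 'node:crypto'

// Stand-in for getClaudeTempDir(); the real directory is internal.
const getClaudeTempDir = () => tmpdir()

function getCacheBreakDiffPath(): string {
  // Stand-in for the real randomSuffix: a short unique hex token.
  const randomSuffix = randomBytes(4).toString('hex')
  return join(getClaudeTempDir(), `cache-break-${randomSuffix}.diff`)
}
```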
Fork Cache Sharing
The fork mechanism (see Fork Mechanism) is designed specifically for cache efficiency. All fork children from the same parent turn share a byte-identical API prefix:
```
[system prompt | tool definitions | conversation history | assistant turn | placeholder tool_results... | per-child directive]
←———————————————————— cache-shared across all forks ————————————————————→ ←      varies      →
```

The `FORK_PLACEHOLDER_RESULT` constant ensures all `tool_result` blocks in the fork prefix are identical:
```ts
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'
```

Notification on Compaction
When context is compacted (compressed), the cache detection system is explicitly notified:
```ts
import { notifyCompaction } from '../api/promptCacheBreakDetection.js'
```

Compaction legitimately changes the conversation content, so the detection system needs to know it happened — otherwise it would report a false cache break.
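A hypothetical sketch of the bookkeeping behind that notification, built around the `cacheDeletionsPending` flag from the tracked state above (the classification logic and names here are invented): the flag lets the detector treat the next drop in cache-read tokens as expected rather than as an unexplained break.

```ts
let cacheDeletionsPending = false

// Called when compaction rewrites the conversation content.
function notifyCompaction(): void {
  cacheDeletionsPending = true
}

// Classify a turn-over-turn change in cache_read_input_tokens.
function classifyCacheDrop(
  prevCacheRead: number,
  currCacheRead: number,
): 'expected' | 'unexpected' | 'none' {
  if (currCacheRead >= prevCacheRead) return 'none'
  if (cacheDeletionsPending) {
    cacheDeletionsPending = false // consume the one-shot flag
    return 'expected'
  }
  return 'unexpected'
}
```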
Monitoring and Analytics
Cache break events are logged to analytics with detailed metadata:
- Which fields changed (system prompt, tools, model, etc.)
- Exact tool names added/removed
- Character count delta in system prompt
- Whether the break was expected (e.g., from compaction or MCP changes)
- Previous cache read token count (to measure impact)
This data feeds back into optimization — the per-tool schema hashing feature was added after analytics showed that tool schema changes (without tool addition/removal) caused 77% of tool-related cache breaks.