
Prompt Cache Strategy


Anthropic’s API supports prompt caching — when consecutive API calls share the same prefix, the cached portion is served from memory at reduced cost and latency. Claude Code invests significant engineering effort into maximizing cache hit rates, because every cache miss means re-processing thousands of tokens of system prompt and conversation history.
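
As background, here is a minimal sketch of what a cache-aware request body looks like. The model name and prompt text are illustrative placeholders; `cache_control: { type: 'ephemeral' }` is the documented caching marker:

```typescript
// Sketch of an Anthropic Messages API request body using prompt caching.
// The cache_control marker on the last system block asks the API to cache
// everything up to and including that block for subsequent calls.
const requestBody = {
  model: 'claude-example-model', // illustrative placeholder, not a real model ID
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: '...large, stable system prompt...',
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'hello' }],
}
```

On a subsequent call that shares this exact prefix, the reused tokens are billed as cache reads rather than fresh input tokens, which is the saving the detection system protects.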

The core monitoring system lives in `src/services/api/promptCacheBreakDetection.ts`. It tracks every factor that can cause a cache break and logs detailed diagnostics when one occurs.

src/services/api/promptCacheBreakDetection.ts

```ts
type PreviousState = {
  systemHash: number // Hash of system prompt (without cache_control)
  toolsHash: number // Hash of tool schemas
  cacheControlHash: number // Hash including cache_control markers
  toolNames: string[] // Ordered list of tool names
  perToolHashes: Record<string, number> // Per-tool schema hash for diffing
  systemCharCount: number
  model: string
  fastMode: boolean
  globalCacheStrategy: string // 'tool_based' | 'system_prompt' | 'none'
  betas: string[] // Beta header list
  autoModeActive: boolean // AFK mode beta
  isUsingOverage: boolean // Subscription overage state
  effortValue: string // Resolved effort level
  extraBodyHash: number // Hash of extra API body params
  callCount: number
  prevCacheReadTokens: number | null
  cacheDeletionsPending: boolean // Expected drops from microcompact
}
```

When any tracked value changes between API calls, the system captures the delta:

```ts
type PendingChanges = {
  systemPromptChanged: boolean
  toolSchemasChanged: boolean
  modelChanged: boolean
  fastModeChanged: boolean
  cacheControlChanged: boolean
  globalCacheStrategyChanged: boolean
  betasChanged: boolean
  addedToolCount: number
  removedToolCount: number
  systemCharDelta: number
  addedTools: string[]
  removedTools: string[]
  changedToolSchemas: string[] // Which tools had schema changes
  // ...
}
```
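
To make the comparison concrete, here is a hypothetical sketch of diffing two snapshots into a delta. It uses the field shapes shown above but covers only a few representative fields; the real implementation's logic is not quoted here:

```typescript
// Hypothetical sketch: compare two per-call state snapshots and report what
// changed. Per-tool hashes let the diff name the exact tool whose schema
// drifted, not just "tools changed".
type Snapshot = {
  systemHash: number
  toolNames: string[]
  perToolHashes: Record<string, number>
  model: string
}

function diffSnapshots(prev: Snapshot, next: Snapshot) {
  const prevSet = new Set(prev.toolNames)
  const nextSet = new Set(next.toolNames)
  const addedTools = next.toolNames.filter((t) => !prevSet.has(t))
  const removedTools = prev.toolNames.filter((t) => !nextSet.has(t))
  // Tools present in both snapshots whose schema hash changed:
  const changedToolSchemas = next.toolNames.filter(
    (t) => prevSet.has(t) && prev.perToolHashes[t] !== next.perToolHashes[t],
  )
  return {
    systemPromptChanged: prev.systemHash !== next.systemHash,
    modelChanged: prev.model !== next.model,
    addedToolCount: addedTools.length,
    removedToolCount: removedTools.length,
    addedTools,
    removedTools,
    changedToolSchemas,
  }
}
```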

Claude Code uses two cache scoping strategies based on session configuration:

When `shouldUseGlobalCacheScope()` returns true, the system prompt is split at `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`:

```mermaid
graph TD
  subgraph "scope: global (shared across all users)"
    A[Static system prompt sections]
  end
  B["SYSTEM_PROMPT_DYNAMIC_BOUNDARY"]
  subgraph "scope: org (per-organization)"
    C[Dynamic system prompt sections]
  end
  subgraph "No cache control"
    D[Conversation messages]
  end
  A --> B --> C --> D
```

Everything before the boundary gets `cache_control: { type: 'ephemeral', scope: 'global' }` and is shared across all users on the platform. This maximizes cache sharing for the largest, most stable prompt sections.
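
A hypothetical sketch of that split. The boundary marker value and function name here are stand-ins; the `scope` values follow the scheme described above:

```typescript
// Hypothetical sketch: split the full system prompt at the boundary marker.
// The static prefix becomes a globally scoped cached block; the dynamic
// remainder becomes an org-scoped block. Marker value is illustrative.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '\u0000DYNAMIC_BOUNDARY\u0000'

function buildSystemBlocks(fullPrompt: string) {
  const idx = fullPrompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  const staticPart = idx >= 0 ? fullPrompt.slice(0, idx) : fullPrompt
  const dynamicPart =
    idx >= 0 ? fullPrompt.slice(idx + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length) : ''
  const blocks = [
    // Stable sections: shared across all users on the platform
    { type: 'text', text: staticPart, cache_control: { type: 'ephemeral', scope: 'global' } },
  ]
  if (dynamicPart) {
    // Dynamic sections: cached per organization
    blocks.push({ type: 'text', text: dynamicPart, cache_control: { type: 'ephemeral', scope: 'org' } })
  }
  return blocks
}
```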

The alternative strategy places cache breakpoints at tool definitions rather than the system prompt boundary. This is used when global scoping isn’t available.

When MCP tools are discovered or removed mid-session, the `globalCacheStrategy` can flip between `'tool_based'`, `'system_prompt'`, and `'none'`. The detection system tracks these transitions as a known cache break cause.
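
A hypothetical sketch of how that selection might branch. The real conditions live behind `shouldUseGlobalCacheScope()` and session state; the parameter names here are assumptions:

```typescript
// Hypothetical sketch of strategy selection (not the actual decision logic).
type CacheStrategy = 'tool_based' | 'system_prompt' | 'none'

function pickStrategy(globalScopeAvailable: boolean, hasCacheableTools: boolean): CacheStrategy {
  if (globalScopeAvailable) return 'system_prompt' // breakpoint at the prompt boundary
  if (hasCacheableTools) return 'tool_based' // breakpoint at tool definitions
  return 'none'
}
```

Because the chosen strategy determines where the `cache_control` breakpoints sit, any mid-session flip invalidates the cached prefix, which is why the transition itself is tracked.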

The detection system identifies the following known cache break sources, in order of frequency:

| Source | Cause | Mitigation |
| --- | --- | --- |
| Tool schema changes | AgentTool/SkillTool embed dynamic lists | Per-tool hashing identifies the culprit |
| MCP connect/disconnect | New tools added/removed mid-session | Moving to delta-based MCP instructions |
| System prompt changes | Dynamic sections recompute | Section registry caches stable values |
| Model changes | User switches model mid-session | Detected but unavoidable |
| Beta header changes | Feature flags toggle | Sticky-on latching prevents flip-flop |
| Effort value changes | User changes effort setting | Detected, flows into output_config |

Several sources of instability are mitigated with sticky-on latches — once a flag is set, it stays set for the session:

```ts
// Examples of latched values (conceptual, from claude.ts)
// AFK_MODE_BETA_HEADER — once auto mode activates, the beta header stays on
// Cached MC enabled — once cache editing is used, it stays enabled
// Overage state — once overage is detected, eligibility is latched session-stable
```

These prevent flip-flopping that would break the cache every other turn.
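
A minimal sketch of a session-scoped sticky-on latch, assuming this shape rather than quoting the actual claude.ts code:

```typescript
// Minimal sketch of a sticky-on latch: once the flag goes on, it never
// goes back off for the remainder of the session.
function makeStickyLatch() {
  let latched = false
  return {
    update(active: boolean): boolean {
      if (active) latched = true // sticky: only transitions off -> on
      return latched
    },
    get(): boolean {
      return latched
    },
  }
}

// Without the latch, a flag that toggles each turn would change the beta
// header list and break the cache on every other request.
const afkBeta = makeStickyLatch()
afkBeta.update(true) // auto mode activates: header goes on
afkBeta.update(false) // auto mode deactivates: header stays on anyway
```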

When a cache break is detected, the system generates a diff for debugging:

src/services/api/promptCacheBreakDetection.ts

```ts
function getCacheBreakDiffPath(): string {
  return join(getClaudeTempDir(), `cache-break-${randomSuffix}.diff`)
}
```

The diff file uses the `createPatch` function from the `diff` library to show exactly what changed in the system prompt or tool schemas between turns.
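
To illustrate the kind of output involved, here is a deliberately naive, self-contained stand-in for that diffing step. The real code uses `createPatch` from the `diff` npm package, which produces proper unified diffs with context lines; this sketch only marks removed and added lines:

```typescript
// Simplified stand-in for createPatch: a naive line-set diff that marks
// lines missing from the new text with '-' and new lines with '+'.
// Unlike a real unified diff, it ignores ordering and context.
function naiveLineDiff(oldText: string, newText: string): string {
  const oldLines = oldText.split('\n')
  const newLines = newText.split('\n')
  const oldSet = new Set(oldLines)
  const newSet = new Set(newLines)
  const out: string[] = []
  for (const line of oldLines) if (!newSet.has(line)) out.push(`-${line}`)
  for (const line of newLines) if (!oldSet.has(line)) out.push(`+${line}`)
  return out.join('\n')
}
```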

The fork mechanism (see Fork Mechanism) is designed specifically for cache efficiency. All fork children from the same parent turn share a byte-identical API prefix:

```text
[system prompt | tool definitions | conversation history | assistant turn | placeholder tool_results... | per-child directive]
←—————————————————————— cache-shared across all forks ——————————————————————→                            ←— varies —→
```

The `FORK_PLACEHOLDER_RESULT` constant ensures all `tool_result` blocks in the fork prefix are identical:

```ts
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'
```
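
A hypothetical sketch of how a fork child's message tail could be assembled so that the prefix stays byte-identical across siblings (the function and field layout here are assumptions, not the actual fork code):

```typescript
// Hypothetical sketch: every fork child gets the same placeholder text in
// its tool_result blocks, so the serialized prefix is identical across
// siblings and the API cache covers everything before the directive.
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'

function buildForkTail(toolUseIds: string[], childDirective: string) {
  const placeholderResults = toolUseIds.map((id) => ({
    type: 'tool_result',
    tool_use_id: id,
    content: FORK_PLACEHOLDER_RESULT, // identical for every child
  }))
  return [
    { role: 'user', content: placeholderResults }, // cache-shared
    { role: 'user', content: childDirective }, // the only part that varies
  ]
}
```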

When context is compacted (compressed), the cache detection system is explicitly notified:

src/services/compact/compact.ts

```ts
import { notifyCompaction } from '../api/promptCacheBreakDetection.js'
```

Compaction legitimately changes the conversation content, so the detection system needs to know it happened — otherwise it would report a false cache break.
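
An assumed sketch of how the detection side might consume that notification: a one-shot flag marks the next cache-read drop as expected, so the detector suppresses exactly one false-positive report.

```typescript
// Assumed sketch (not the actual implementation): notifyCompaction() arms a
// one-shot flag; the next observed drop in cache-read tokens consumes it
// and is classified as expected rather than reported as a cache break.
let compactionPending = false

function notifyCompaction(): void {
  compactionPending = true
}

function isExpectedCacheDrop(prevCacheRead: number, currCacheRead: number): boolean {
  if (currCacheRead < prevCacheRead && compactionPending) {
    compactionPending = false // consume the one-shot flag
    return true
  }
  return false
}
```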

Cache break events are logged to analytics with detailed metadata:

  • Which fields changed (system prompt, tools, model, etc.)
  • Exact tool names added/removed
  • Character count delta in system prompt
  • Whether the break was expected (e.g., from compaction or MCP changes)
  • Previous cache read token count (to measure impact)
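
The metadata listed above suggests an event shape roughly like the following. The field names are assumptions for illustration, not the actual telemetry schema:

```typescript
// Illustrative shape of a cache-break analytics event covering the fields
// listed above; names are assumed, not the real schema.
type CacheBreakEvent = {
  changedFields: string[] // e.g. ['toolSchemas', 'betas']
  addedTools: string[]
  removedTools: string[]
  systemCharDelta: number
  expected: boolean // e.g. from compaction or MCP changes
  prevCacheReadTokens: number | null // measures tokens lost to the break
}

const example: CacheBreakEvent = {
  changedFields: ['toolSchemas'],
  addedTools: [],
  removedTools: [],
  systemCharDelta: 0,
  expected: false,
  prevCacheReadTokens: 41230,
}
```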

This data feeds back into optimization — the per-tool schema hashing feature was added after analytics showed that tool schema changes (without tool addition/removal) caused 77% of tool-related cache breaks.