
Prompt Cache Strategy


Anthropic’s API supports prompt caching — when consecutive API calls share the same prefix, the cached portion is served from memory at reduced cost and latency. Claude Code invests significant engineering effort into maximizing cache hit rates, because every cache miss means re-processing thousands of tokens of system prompt and conversation history.
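
As background, here is a minimal sketch of what a cache-aware request body looks like. The model name and prompt text are illustrative placeholders; `cache_control: { type: 'ephemeral' }` is the documented caching marker:

```typescript
// Sketch of an Anthropic Messages API request body using prompt caching.
// The cache_control marker on the last system block asks the API to cache
// everything up to and including that block for subsequent calls.
const requestBody = {
  model: 'claude-example-model', // illustrative placeholder, not a real model ID
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: '...large, stable system prompt...',
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: [{ role: 'user', content: 'hello' }],
}
```

On a subsequent call that shares this exact prefix, the reused tokens are billed as cache reads rather than fresh input tokens, which is the saving the detection system protects.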

The core monitoring system lives in `src/services/api/promptCacheBreakDetection.ts`. It tracks every factor that can cause a cache break and logs detailed diagnostics when one occurs.

src/services/api/promptCacheBreakDetection.ts

```ts
type PreviousState = {
  systemHash: number // Hash of system prompt (without cache_control)
  toolsHash: number // Hash of tool schemas
  cacheControlHash: number // Hash including cache_control markers
  toolNames: string[] // Ordered list of tool names
  perToolHashes: Record<string, number> // Per-tool schema hash for diffing
  systemCharCount: number
  model: string
  fastMode: boolean
  globalCacheStrategy: string // 'tool_based' | 'system_prompt' | 'none'
  betas: string[] // Beta header list
  autoModeActive: boolean // AFK mode beta
  isUsingOverage: boolean // Subscription overage state
  effortValue: string // Resolved effort level
  extraBodyHash: number // Hash of extra API body params
  callCount: number
  prevCacheReadTokens: number | null
  cacheDeletionsPending: boolean // Expected drops from microcompact
}
```

When any tracked value changes between API calls, the system captures the delta:

```ts
type PendingChanges = {
  systemPromptChanged: boolean
  toolSchemasChanged: boolean
  modelChanged: boolean
  fastModeChanged: boolean
  cacheControlChanged: boolean
  globalCacheStrategyChanged: boolean
  betasChanged: boolean
  addedToolCount: number
  removedToolCount: number
  systemCharDelta: number
  addedTools: string[]
  removedTools: string[]
  changedToolSchemas: string[] // Which tools had schema changes
  // ...
}
```
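
To make the comparison concrete, here is a hypothetical sketch of diffing two snapshots into a delta. It uses the field shapes shown above but covers only a few representative fields; the real implementation's logic is not quoted here:

```typescript
// Hypothetical sketch: compare two per-call state snapshots and report what
// changed. Per-tool hashes let the diff name the exact tool whose schema
// drifted, not just "tools changed".
type Snapshot = {
  systemHash: number
  toolNames: string[]
  perToolHashes: Record<string, number>
  model: string
}

function diffSnapshots(prev: Snapshot, next: Snapshot) {
  const prevSet = new Set(prev.toolNames)
  const nextSet = new Set(next.toolNames)
  const addedTools = next.toolNames.filter((t) => !prevSet.has(t))
  const removedTools = prev.toolNames.filter((t) => !nextSet.has(t))
  // Tools present in both snapshots whose schema hash changed:
  const changedToolSchemas = next.toolNames.filter(
    (t) => prevSet.has(t) && prev.perToolHashes[t] !== next.perToolHashes[t],
  )
  return {
    systemPromptChanged: prev.systemHash !== next.systemHash,
    modelChanged: prev.model !== next.model,
    addedToolCount: addedTools.length,
    removedToolCount: removedTools.length,
    addedTools,
    removedTools,
    changedToolSchemas,
  }
}
```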

Claude Code uses two cache scoping strategies based on session configuration:

When `shouldUseGlobalCacheScope()` returns true, the system prompt is split at `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`:

```mermaid
graph TD
  subgraph "scope: global (shared across all users)"
    A[Static system prompt sections]
  end
  B["SYSTEM_PROMPT_DYNAMIC_BOUNDARY"]
  subgraph "scope: org (per-organization)"
    C[Dynamic system prompt sections]
  end
  subgraph "No cache control"
    D[Conversation messages]
  end
  A --> B --> C --> D
```

Everything before the boundary gets `cache_control: { type: 'ephemeral', scope: 'global' }` and is shared across all users on the platform. This maximizes cache sharing for the largest, most stable prompt sections.
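
A hypothetical sketch of that split. The boundary marker value and function name here are stand-ins; the `scope` values follow the scheme described above:

```typescript
// Hypothetical sketch: split the full system prompt at the boundary marker.
// The static prefix becomes a globally scoped cached block; the dynamic
// remainder becomes an org-scoped block. Marker value is illustrative.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '\u0000DYNAMIC_BOUNDARY\u0000'

function buildSystemBlocks(fullPrompt: string) {
  const idx = fullPrompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  const staticPart = idx >= 0 ? fullPrompt.slice(0, idx) : fullPrompt
  const dynamicPart =
    idx >= 0 ? fullPrompt.slice(idx + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length) : ''
  const blocks = [
    // Stable sections: shared across all users on the platform
    { type: 'text', text: staticPart, cache_control: { type: 'ephemeral', scope: 'global' } },
  ]
  if (dynamicPart) {
    // Dynamic sections: cached per organization
    blocks.push({ type: 'text', text: dynamicPart, cache_control: { type: 'ephemeral', scope: 'org' } })
  }
  return blocks
}
```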

The alternative strategy places cache breakpoints at tool definitions rather than the system prompt boundary. This is used when global scoping isn’t available.

When MCP tools are discovered or removed mid-session, the `globalCacheStrategy` can flip between `'tool_based'`, `'system_prompt'`, and `'none'`. The detection system tracks these transitions as a known cache break cause.
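
A hypothetical sketch of how that selection might branch. The real conditions live behind `shouldUseGlobalCacheScope()` and session state; the parameter names here are assumptions:

```typescript
// Hypothetical sketch of strategy selection (not the actual decision logic).
type CacheStrategy = 'tool_based' | 'system_prompt' | 'none'

function pickStrategy(globalScopeAvailable: boolean, hasCacheableTools: boolean): CacheStrategy {
  if (globalScopeAvailable) return 'system_prompt' // breakpoint at the prompt boundary
  if (hasCacheableTools) return 'tool_based' // breakpoint at tool definitions
  return 'none'
}
```

Because the chosen strategy determines where the `cache_control` breakpoints sit, any mid-session flip invalidates the cached prefix, which is why the transition itself is tracked.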

The detection system identifies the following known cache break sources, in order of frequency:

| Source | Cause | Mitigation |
| --- | --- | --- |
| Tool schema changes | AgentTool/SkillTool embed dynamic lists | Per-tool hashing identifies the culprit |
| MCP connect/disconnect | New tools added/removed mid-session | Moving to delta-based MCP instructions |
| System prompt changes | Dynamic sections recompute | Section registry caches stable values |
| Model changes | User switches model mid-session | Detected but unavoidable |
| Beta header changes | Feature flags toggle | Sticky-on latching prevents flip-flop |
| Effort value changes | User changes effort setting | Detected, flows into output_config |

Several sources of instability are mitigated with sticky-on latches — once a flag is set, it stays set for the session:

```ts
// Examples of latched values (conceptual, from claude.ts)
// AFK_MODE_BETA_HEADER — once auto mode activates, the beta header stays on
// Cached MC enabled — once cache editing is used, it stays enabled
// Overage state — once overage is detected, eligibility is latched session-stable
```

These prevent flip-flopping that would break the cache every other turn.
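
A minimal sketch of a session-scoped sticky-on latch, assuming this shape rather than quoting the actual claude.ts code:

```typescript
// Minimal sketch of a sticky-on latch: once the flag goes on, it never
// goes back off for the remainder of the session.
function makeStickyLatch() {
  let latched = false
  return {
    update(active: boolean): boolean {
      if (active) latched = true // sticky: only transitions off -> on
      return latched
    },
    get(): boolean {
      return latched
    },
  }
}

// Without the latch, a flag that toggles each turn would change the beta
// header list and break the cache on every other request.
const afkBeta = makeStickyLatch()
afkBeta.update(true) // auto mode activates: header goes on
afkBeta.update(false) // auto mode deactivates: header stays on anyway
```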

When a cache break is detected, the system generates a diff for debugging:

src/services/api/promptCacheBreakDetection.ts

```ts
function getCacheBreakDiffPath(): string {
  return join(getClaudeTempDir(), `cache-break-${randomSuffix}.diff`)
}
```

The diff file uses the `createPatch` function from the `diff` library to show exactly what changed in the system prompt or tool schemas between turns.
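
To illustrate the kind of output involved, here is a deliberately naive, self-contained stand-in for that diffing step. The real code uses `createPatch` from the `diff` npm package, which produces proper unified diffs with context lines; this sketch only marks removed and added lines:

```typescript
// Simplified stand-in for createPatch: a naive line-set diff that marks
// lines missing from the new text with '-' and new lines with '+'.
// Unlike a real unified diff, it ignores ordering and context.
function naiveLineDiff(oldText: string, newText: string): string {
  const oldLines = oldText.split('\n')
  const newLines = newText.split('\n')
  const oldSet = new Set(oldLines)
  const newSet = new Set(newLines)
  const out: string[] = []
  for (const line of oldLines) if (!newSet.has(line)) out.push(`-${line}`)
  for (const line of newLines) if (!oldSet.has(line)) out.push(`+${line}`)
  return out.join('\n')
}
```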

The fork mechanism (see Fork Mechanism) is designed specifically for cache efficiency. All fork children from the same parent turn share a byte-identical API prefix:

```text
[system prompt | tool definitions | conversation history | assistant turn | placeholder tool_results... | per-child directive]
←—————————————————————— cache-shared across all forks ——————————————————————→                            ←— varies —→
```

The `FORK_PLACEHOLDER_RESULT` constant ensures all `tool_result` blocks in the fork prefix are identical:

```ts
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'
```
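
A hypothetical sketch of how a fork child's message tail could be assembled so that the prefix stays byte-identical across siblings (the function and field layout here are assumptions, not the actual fork code):

```typescript
// Hypothetical sketch: every fork child gets the same placeholder text in
// its tool_result blocks, so the serialized prefix is identical across
// siblings and the API cache covers everything before the directive.
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'

function buildForkTail(toolUseIds: string[], childDirective: string) {
  const placeholderResults = toolUseIds.map((id) => ({
    type: 'tool_result',
    tool_use_id: id,
    content: FORK_PLACEHOLDER_RESULT, // identical for every child
  }))
  return [
    { role: 'user', content: placeholderResults }, // cache-shared
    { role: 'user', content: childDirective }, // the only part that varies
  ]
}
```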

When context is compacted (compressed), the cache detection system is explicitly notified:

src/services/compact/compact.ts

```ts
import { notifyCompaction } from '../api/promptCacheBreakDetection.js'
```

Compaction legitimately changes the conversation content, so the detection system needs to know it happened — otherwise it would report a false cache break.
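
An assumed sketch of how the detection side might consume that notification: a one-shot flag marks the next cache-read drop as expected, so the detector suppresses exactly one false-positive report.

```typescript
// Assumed sketch (not the actual implementation): notifyCompaction() arms a
// one-shot flag; the next observed drop in cache-read tokens consumes it
// and is classified as expected rather than reported as a cache break.
let compactionPending = false

function notifyCompaction(): void {
  compactionPending = true
}

function isExpectedCacheDrop(prevCacheRead: number, currCacheRead: number): boolean {
  if (currCacheRead < prevCacheRead && compactionPending) {
    compactionPending = false // consume the one-shot flag
    return true
  }
  return false
}
```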

Cache break events are logged to analytics with detailed metadata:

  • Which fields changed (system prompt, tools, model, etc.)
  • Exact tool names added/removed
  • Character count delta in system prompt
  • Whether the break was expected (e.g., from compaction or MCP changes)
  • Previous cache read token count (to measure impact)
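
The metadata listed above suggests an event shape roughly like the following. The field names are assumptions for illustration, not the actual telemetry schema:

```typescript
// Illustrative shape of a cache-break analytics event covering the fields
// listed above; names are assumed, not the real schema.
type CacheBreakEvent = {
  changedFields: string[] // e.g. ['toolSchemas', 'betas']
  addedTools: string[]
  removedTools: string[]
  systemCharDelta: number
  expected: boolean // e.g. from compaction or MCP changes
  prevCacheReadTokens: number | null // measures tokens lost to the break
}

const example: CacheBreakEvent = {
  changedFields: ['toolSchemas'],
  addedTools: [],
  removedTools: [],
  systemCharDelta: 0,
  expected: false,
  prevCacheReadTokens: 41230,
}
```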

This data feeds back into optimization — the per-tool schema hashing feature was added after analytics showed that tool schema changes (without tool addition/removal) caused 77% of tool-related cache breaks.