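Prompt Cache Strategy

The Anthropic API supports prompt caching: when consecutive API calls share an identical prefix, the cached portion is served at lower cost and lower latency. Claude Code invests substantial engineering effort in maximizing the cache hit rate, because every cache miss means reprocessing thousands of tokens of system prompt and conversation history.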

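Cache Break Detection

The core monitoring system lives in src/services/api/promptCacheBreakDetection.ts. It tracks every factor that can cause a cache break and logs detailed diagnostics when one occurs.

Tracked State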

type PreviousState = {
  systemHash: number          // Hash of system prompt (without cache_control)
  toolsHash: number           // Hash of tool schemas
  cacheControlHash: number    // Hash including cache_control markers
  toolNames: string[]         // Ordered list of tool names
  perToolHashes: Record<string, number>  // Per-tool schema hash for diffing
  systemCharCount: number
  model: string
  fastMode: boolean
  globalCacheStrategy: string  // 'tool_based' | 'system_prompt' | 'none'
  betas: string[]             // Beta header list
  autoModeActive: boolean     // AFK mode beta
  isUsingOverage: boolean     // Subscription overage state
  effortValue: string         // Resolved effort level
  extraBodyHash: number       // Hash of extra API body params
  callCount: number
  prevCacheReadTokens: number | null
  cacheDeletionsPending: boolean  // Expected drops from microcompact
}
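As a rough illustration of how fields such as toolsHash and perToolHashes could be populated, the sketch below hashes each tool's serialized schema. The source does not show the actual hash function, so hashString (an FNV-1a-style 32-bit hash), ToolSchema, and computePerToolHashes are hypothetical names used only for this example.

```typescript
// Hypothetical tool shape; the real schema type is not shown in the source.
type ToolSchema = { name: string; description: string; input_schema: unknown };

// Assumption: an FNV-1a-style 32-bit hash over UTF-16 code units.
// Any stable string hash would serve the same purpose.
function hashString(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// One hash per tool, keyed by tool name, so that a later comparison can
// name the exact tool whose schema changed.
// Note: JSON.stringify depends on key insertion order, so schemas must be
// built deterministically for the hashes to be stable.
function computePerToolHashes(tools: ToolSchema[]): Record<string, number> {
  const hashes: Record<string, number> = {};
  for (const tool of tools) {
    hashes[tool.name] = hashString(JSON.stringify(tool));
  }
  return hashes;
}
```

With per-tool hashes, editing one tool's description changes only that tool's entry, which is what makes a targeted diff possible.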

Change Detection

When any tracked value changes between API calls, the system captures the difference:

type PendingChanges = {
  systemPromptChanged: boolean
  toolSchemasChanged: boolean
  modelChanged: boolean
  fastModeChanged: boolean
  cacheControlChanged: boolean
  globalCacheStrategyChanged: boolean
  betasChanged: boolean
  addedToolCount: number
  removedToolCount: number
  systemCharDelta: number
  addedTools: string[]
  removedTools: string[]
  changedToolSchemas: string[]  // Which tools had schema changes
  // ...
}
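The tool-related fields of PendingChanges can be derived by comparing two snapshots. The following is a minimal sketch under stated assumptions: ToolSnapshot, ToolChanges, and diffTools are hypothetical names, and the real comparison logic may differ (for example, this version ignores pure reordering of tools).

```typescript
// Hypothetical subset of the tracked state needed for a tool diff.
type ToolSnapshot = {
  toolNames: string[];
  perToolHashes: Record<string, number>;
};

// Hypothetical subset of PendingChanges covering tools only.
type ToolChanges = {
  addedTools: string[];
  removedTools: string[];
  changedToolSchemas: string[];
  addedToolCount: number;
  removedToolCount: number;
  toolSchemasChanged: boolean;
};

function diffTools(prev: ToolSnapshot, next: ToolSnapshot): ToolChanges {
  const prevNames = new Set(prev.toolNames);
  const nextNames = new Set(next.toolNames);

  const addedTools = next.toolNames.filter((name) => !prevNames.has(name));
  const removedTools = prev.toolNames.filter((name) => !nextNames.has(name));
  // A tool present in both snapshots whose hash differs had its schema edited.
  const changedToolSchemas = next.toolNames.filter(
    (name) =>
      prevNames.has(name) &&
      prev.perToolHashes[name] !== next.perToolHashes[name],
  );

  return {
    addedTools,
    removedTools,
    changedToolSchemas,
    addedToolCount: addedTools.length,
    removedToolCount: removedTools.length,
    toolSchemasChanged:
      addedTools.length > 0 ||
      removedTools.length > 0 ||
      changedToolSchemas.length > 0,
  };
}
```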

Cache Scope Strategies

Claude Code uses one of two cache scope strategies, depending on session configuration:

Global Scope

When shouldUseGlobalCacheScope() returns true, the system prompt is split at SYSTEM_PROMPT_DYNAMIC_BOUNDARY:

graph TD
    subgraph "scope: global (shared across all users)"
        A[Static system prompt sections]
    end
    B["SYSTEM_PROMPT_DYNAMIC_BOUNDARY"]
    subgraph "scope: org (per-organization)"
        C[Dynamic system prompt sections]
    end
    subgraph "No cache control"
        D[Conversation messages]
    end
    A --> B --> C --> D

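Everything before the boundary gets cache_control: { type: 'ephemeral', scope: 'global' } and is shared across all users of the platform. This maximizes cache sharing for the largest and most stable part of the prompt.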
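The split can be pictured with the sketch below. It is an illustration only: the sentinel value assigned to SYSTEM_PROMPT_DYNAMIC_BOUNDARY, the SystemBlock type, and splitSystemPrompt are assumptions, and the fallback for a missing boundary is a guess rather than documented behavior. The 'org' scope on the dynamic block follows the diagram above.

```typescript
// Assumption: the boundary is a sentinel entry in the list of prompt sections.
// The real value of this constant is not shown in the source.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__';

type CacheControl = { type: 'ephemeral'; scope: 'global' | 'org' };
type SystemBlock = { type: 'text'; text: string; cache_control?: CacheControl };

function splitSystemPrompt(sections: string[]): SystemBlock[] {
  const boundary = sections.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  if (boundary === -1) {
    // Assumed fallback: no boundary means no scoped caching is applied here.
    return [{ type: 'text', text: sections.join('\n') }];
  }

  const staticText = sections.slice(0, boundary).join('\n');
  const dynamicText = sections.slice(boundary + 1).join('\n');

  const blocks: SystemBlock[] = [];
  if (staticText.length > 0) {
    // Static sections: identical for every user, so shareable globally.
    blocks.push({
      type: 'text',
      text: staticText,
      cache_control: { type: 'ephemeral', scope: 'global' },
    });
  }
  if (dynamicText.length > 0) {
    // Dynamic sections: cached per organization, as in the diagram.
    blocks.push({
      type: 'text',
      text: dynamicText,
      cache_control: { type: 'ephemeral', scope: 'org' },
    });
  }
  return blocks;
}
```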

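Tool-Based Scope

The alternative strategy places cache breakpoints at the tool definitions rather than at the system prompt boundary. It is used when global scope is unavailable.

Strategy Switching

When MCP tools are discovered or removed mid-session, globalCacheStrategy can switch between 'tool_based', 'system_prompt', and 'none'. The detection system tracks these switches as a known cause of cache breaks.

Cache Break Sources

According to the detection system, the known sources of cache breaks, ordered by frequency, are: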

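Source	Cause	Mitigation
Tool schema changes	AgentTool/SkillTool embed dynamic lists	Per-tool hashing pinpoints the culprit
MCP connect/disconnect	Tools added or removed mid-session	Migration to delta-based MCP instructions
System prompt changes	Dynamic sections are recomputed	Section registry caches stable values
Model switch	User switches models mid-session	Detectable but unavoidable
Beta header changes	Feature flag toggles	Sticky-on latches prevent flip-flopping
Effort value changes	User changes the effort setting	Detectable; flows into output_config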

Latching Mechanism

Several sources of instability are mitigated with sticky-on latches. Once a flag is set, it stays set for the rest of the session:

// Examples of latched values (conceptual, from claude.ts)
// AFK_MODE_BETA_HEADER — once auto mode activates, the beta header stays on
// Cached MC enabled — once cache editing is used, it stays enabled
// Overage state — once overage is detected, eligibility is latched session-stable

This prevents flip-flopping that would otherwise break the cache every other turn.
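A sticky-on latch is simple to express in code. The class below is a minimal sketch; StickyLatch is a hypothetical name and not taken from claude.ts.

```typescript
// Hypothetical helper: once any observation is true, the latch reports true
// for the remainder of its lifetime (here, one session).
class StickyLatch {
  private latched = false;

  // Feed the latest raw observation; returns the stabilized value.
  update(observed: boolean): boolean {
    if (observed) {
      this.latched = true;
    }
    return this.latched;
  }
}

// Example: a raw flag that flips every turn becomes stable after the
// first time it is seen as true.
const autoModeLatch = new StickyLatch();
const rawFlagPerTurn = [false, true, false, true, false];
const stabilized = rawFlagPerTurn.map((raw) => autoModeLatch.update(raw));
// stabilized is [false, true, true, true, true]
```

Because the stabilized value changes at most once per session, a header derived from it can invalidate the cached prefix at most once.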

Diff Generation for Debugging

When a cache break is detected, the system generates a diff file for debugging:

function getCacheBreakDiffPath(): string {
  return join(getClaudeTempDir(), `cache-break-${randomSuffix}.diff`)
}

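The diff file is produced with the createPatch function from the diff library and shows exactly what changed in the system prompt or tool schemas between two consecutive turns.

Fork Cache Sharing

The fork mechanism (see Fork Mechanism) is designed for cache efficiency. All fork children spawned from the same parent turn share a byte-identical API prefix: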

[system prompt | tool definitions | conversation history | assistant turn | placeholder tool_results... | per-child directive]
 ←————————————————— cache-shared across all forks ————————————————————————————————————————————————→  ← varies →

The FORK_PLACEHOLDER_RESULT constant ensures that every tool_result block in the fork prefix is identical:

const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'
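To show why a shared placeholder matters, here is a sketch of how the final message for each fork child might be assembled. The types and buildForkMessage are hypothetical, and the placeholder text is passed in as a parameter rather than assumed; only the idea of identical tool_result blocks followed by a per-child directive comes from the text above.

```typescript
// Hypothetical message shapes, loosely modeled on tool_result content blocks.
type ToolUse = { id: string; name: string };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type TextBlock = { type: 'text'; text: string };
type UserMessage = { role: 'user'; content: Array<ToolResultBlock | TextBlock> };

// Every child gets the same placeholder results in the same order; only the
// trailing directive differs, so everything before it can be served from cache.
function buildForkMessage(
  toolUses: ToolUse[],
  placeholder: string,
  directive: string,
): UserMessage {
  const results = toolUses.map(
    (toolUse): ToolResultBlock => ({
      type: 'tool_result',
      tool_use_id: toolUse.id,
      content: placeholder,
    }),
  );
  return { role: 'user', content: [...results, { type: 'text', text: directive }] };
}
```

If each child instead received its own real tool result, the prefixes would diverge at the first tool_result block and no child could reuse the cache entry written by a sibling.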

Notification on Compaction

When the context is compacted, the cache detection system is notified explicitly:

import { notifyCompaction } from '../api/promptCacheBreakDetection.js'

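Compaction legitimately changes the conversation content, so the detection system needs to know about the event; otherwise it would report a false cache break.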
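The effect of the notification can be sketched as follows. Only the name notifyCompaction comes from the source; the class, the check method, and the rule that a drop in cache-read tokens signals a break are assumptions made for illustration.

```typescript
type Verdict = 'ok' | 'expected_drop' | 'cache_break';

// Hypothetical detector: compares cache-read token counts between calls and
// treats a drop as a break unless a compaction was announced beforehand.
class CacheBreakDetectorSketch {
  private prevCacheReadTokens: number | null = null;
  private compactionPending = false;

  // Called by the compaction code path before the next API call.
  notifyCompaction(): void {
    this.compactionPending = true;
  }

  // Called after each API response with the reported cache-read token count.
  check(cacheReadTokens: number): Verdict {
    const prev = this.prevCacheReadTokens;
    const compacted = this.compactionPending;
    this.prevCacheReadTokens = cacheReadTokens;
    this.compactionPending = false; // the notification covers one call only

    if (prev === null || cacheReadTokens >= prev) {
      return 'ok';
    }
    return compacted ? 'expected_drop' : 'cache_break';
  }
}
```

Without the notification, the post-compaction call would be indistinguishable from an accidental prefix change and would be logged as an unexpected break.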

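Monitoring and Analytics

Cache break events are logged to the analytics system along with detailed metadata:

Which fields changed (system prompt, tools, model, and so on)
The exact names of tools added or removed
The change in system prompt character count
Whether the break was expected (for example, caused by compaction or an MCP change)
The number of tokens read from cache on the previous turn (used to assess impact)

This data feeds back into optimization work: per-tool schema hashing was added only after analytics showed that tool schema changes (with no tools added or removed) accounted for 77% of tool-related cache breaks.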