# Pattern: Fork & Cache Reuse
## Pattern Essence

When a parent agent spawns a sub-agent (fork), the sub-agent typically needs the same system prompt, conversation context, and tool definitions. Naively, each fork would send an independent API request — paying full price for the identical prefix.
Fork & Cache Reuse exploits Claude’s Prompt Cache by ensuring all sub-agents share the same system prompt prefix, allowing them to reuse cached tokens instead of re-processing them.
```mermaid
graph TB
  subgraph "Without Cache Reuse"
    P1["Parent Agent<br/>System: 4K tokens<br/>Context: 20K tokens"]
    F1["Fork A<br/>System: 4K tokens ← re-processed<br/>Context: 20K tokens ← re-processed"]
    F2["Fork B<br/>System: 4K tokens ← re-processed<br/>Context: 20K tokens ← re-processed"]
    P1 --> F1
    P1 --> F2
  end

  subgraph "With Cache Reuse"
    P2["Parent Agent<br/>System: 4K tokens<br/>Context: 20K tokens"]
    F3["Fork A<br/>System: 4K tokens ← CACHED ✅<br/>Context: 20K tokens ← CACHED ✅"]
    F4["Fork B<br/>System: 4K tokens ← CACHED ✅<br/>Context: 20K tokens ← CACHED ✅"]
    P2 --> F3
    P2 --> F4
  end

  style P1 fill:#94a3b8
  style F1 fill:#fca5a5
  style F2 fill:#fca5a5
  style P2 fill:#4ade80
  style F3 fill:#4ade80
  style F4 fill:#4ade80
```

## How Prompt Caching Works
Claude’s Prompt Cache operates on a prefix-matching basis:
```
Request A: [System Prompt][Context A][User Message A]
                          ↑ cache breakpoint

Request B: [System Prompt][Context B][User Message B]
           ^^^^^^^^^^^^^^^ cached if identical prefix
```

The cache matches from the beginning of the request. Any shared prefix is cached; once the content diverges, caching stops. This has a critical implication for fork design:
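The matching behavior can be sketched in miniature (toy TypeScript, not the Anthropic API; real caching operates on tokens, server-side): the reusable portion is the longest identical leading span of two requests.

```typescript
// Illustrative: the reusable portion is the longest identical leading
// span of two requests (real caching matches tokens, not characters).
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const requestA = '[System Prompt][Context A][User Message A]';
const requestB = '[System Prompt][Context B][User Message B]';

// The shared span is '[System Prompt][Context ' — matching stops at the
// first divergent character; everything after it is processed fresh.
console.log(sharedPrefixLength(requestA, requestB)); // 24
```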
### The Prefix Rule

```typescript
// ✅ Cache-friendly: identical prefix
const parentRequest = {
  system: SHARED_SYSTEM_PROMPT, // Identical across all forks
  messages: [
    ...sharedContext, // Shared conversation history
    { role: 'user', content: 'Fork-specific task A' }, // Diverges here
  ],
};

const forkRequest = {
  system: SHARED_SYSTEM_PROMPT, // ← Same prefix: CACHED
  messages: [
    ...sharedContext, // ← Same prefix: CACHED
    { role: 'user', content: 'Fork-specific task B' }, // Diverges here
  ],
};
```

```typescript
// ❌ Cache-hostile: different prefix
const badForkRequest = {
  system: SHARED_SYSTEM_PROMPT + '\nYou are Sub-Agent B.', // Different!
  messages: [
    { role: 'user', content: 'Fork-specific preamble' }, // Different!
    ...sharedContext, // Too late — prefix already diverged
  ],
};
```

## Economic Analysis
### Scenario: Code Review with 5 File Forks

A parent agent reviews a PR and forks sub-agents to analyze each changed file independently.
```
Shared prefix:
- System prompt:       4,000 tokens
- Project context:     8,000 tokens
- PR description:      2,000 tokens
- Shared instructions: 1,000 tokens
Total shared prefix:  15,000 tokens

Fork-specific suffix:
- File content:    ~3,000 tokens each
- Analysis prompt:   ~500 tokens each
Total per-fork:    ~3,500 tokens
```

Cost Comparison (Claude Sonnet pricing):
| Metric | Without Cache | With Cache | Savings |
|---|---|---|---|
| Parent request | 15,000 tokens | 15,000 tokens | — |
| Fork A input | 18,500 tokens | 3,500 fresh + 15,000 cached | 90% on 15K |
| Fork B input | 18,500 tokens | 3,500 fresh + 15,000 cached | 90% on 15K |
| Fork C input | 18,500 tokens | 3,500 fresh + 15,000 cached | 90% on 15K |
| Fork D input | 18,500 tokens | 3,500 fresh + 15,000 cached | 90% on 15K |
| Fork E input | 18,500 tokens | 3,500 fresh + 15,000 cached | 90% on 15K |
| Total input | 107,500 tokens | 15,000 + 17,500 fresh + 75,000 cached | — |
| Effective cost | 107,500 × $3/M | 32,500 × $3/M + 75,000 × $0.30/M | ~$0.32 → ~$0.12 |
| Savings | — | — | ~60% |
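The table’s totals can be reproduced with a small cost model — a sketch under stated assumptions: $3/M for fresh input, cache reads at 10% of that, and the cache-write surcharge ignored because the parent request warms the cache anyway.

```typescript
// Assumed Sonnet-class prices; adjust for your actual model.
const FRESH_PER_MTOK = 3.0;  // $ per million fresh input tokens
const CACHED_PER_MTOK = 0.3; // $ per million cached input tokens (10%)

function forkScenarioCost(sharedPrefix: number, perFork: number, forks: number) {
  // Without caching, every fork re-pays the full shared prefix.
  const uncached =
    ((sharedPrefix + forks * (sharedPrefix + perFork)) * FRESH_PER_MTOK) / 1e6;
  // With caching, only the parent request and the fork-specific suffixes are fresh.
  const cached =
    ((sharedPrefix + forks * perFork) * FRESH_PER_MTOK) / 1e6 +
    (forks * sharedPrefix * CACHED_PER_MTOK) / 1e6;
  return { uncached, cached, savings: 1 - cached / uncached };
}

const { uncached, cached, savings } = forkScenarioCost(15_000, 3_500, 5);
// uncached ≈ $0.32, cached ≈ $0.12, savings ≈ 63%
```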
## Implementation Pattern

### Step 1: Design a Cache-Friendly System Prompt

```typescript
// The system prompt is structured for maximum cache reuse
function buildSystemPrompt(config: AgentConfig): string {
  // STATIC section — identical for all agents and forks
  const staticSection = [
    IDENTITY_PROMPT,     // "You are Claude, made by Anthropic..."
    CAPABILITIES_PROMPT, // Tool descriptions, behavior rules
    SAFETY_PROMPT,       // Security policies
  ].join('\n\n');

  // SEMI-STATIC section — changes per project but shared across forks
  const projectSection = [
    `Project: ${config.projectName}`,
    `Working directory: ${config.cwd}`,
    config.claudeMdContent, // CLAUDE.md content
  ].join('\n\n');

  // The fork-specific part goes in the messages, NOT the system prompt
  return `${staticSection}\n\n${projectSection}`;
}
```

### Step 2: Fork with Shared Prefix
```typescript
interface ForkOptions {
  task: string;
  parentMessages: Message[];
  sharedPrefixLength: number; // How many messages form the shared prefix
}

function createFork(
  parentSystemPrompt: string,
  options: ForkOptions,
): APIRequest {
  // Share the parent's message prefix for cache reuse
  const sharedMessages = options.parentMessages.slice(0, options.sharedPrefixLength);

  return {
    system: parentSystemPrompt, // Identical — will be cached
    messages: [
      ...sharedMessages, // Identical prefix — will be cached
      {
        role: 'user',
        content: `Sub-task: ${options.task}`, // Fork-specific — not cached
      },
    ],
  };
}
```

### Step 3: Orchestrate Forks for Cache Warming
```typescript
async function executeForksWithCacheReuse(
  systemPrompt: string,
  sharedMessages: Message[],
  tasks: string[],
): Promise<ForkResult[]> {
  // Step 1: The parent's last request already warmed the cache
  // (The system prompt + shared messages are now in the cache)

  // Step 2: Fire all forks — they all share the cached prefix
  const forkPromises = tasks.map(task =>
    callAPI({
      system: systemPrompt, // Cache hit on this
      messages: [
        ...sharedMessages, // Cache hit on this
        { role: 'user', content: `Analyze: ${task}` },
      ],
    })
  );

  // Step 3: All forks execute with ~90% cached input
  return Promise.all(forkPromises);
}
```

## Cache Alignment Strategies
### Strategy 1: Static Prefix + Dynamic Suffix

```
[CACHED] System Prompt → Project Context → Shared History
[FRESH]  Fork-specific task description
```

This is the simplest and most effective strategy; it works when all forks share the same conversation context.
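The strategy only pays off if forks really do serialize to an identical prefix. A hypothetical guard (not part of any SDK; names are illustrative) can catch accidental drift before tokens are spent:

```typescript
type Message = { role: string; content: string };
interface ForkRequest { system: string; messages: Message[] }

// Hypothetical guard: every fork must share a byte-identical prefix,
// i.e. the system prompt plus all but the final, fork-specific message.
function assertSharedPrefix(forks: ForkRequest[]): void {
  const prefixOf = (f: ForkRequest) =>
    JSON.stringify({ system: f.system, shared: f.messages.slice(0, -1) });
  const expected = prefixOf(forks[0]);
  for (const fork of forks) {
    if (prefixOf(fork) !== expected) {
      throw new Error('Fork prefix diverged: cache reuse will silently fail');
    }
  }
}
```

Run it over the batch right before dispatch: a divergent system prompt or a reordered shared message throws instead of silently billing full price.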
### Strategy 2: Checkpoint-Based Caching

```typescript
// Create explicit cache checkpoints at conversation milestones
function createCacheCheckpoint(messages: Message[]): CacheCheckpoint {
  return {
    messages: [...messages],
    tokenCount: countTokens(messages),
    timestamp: Date.now(),
  };
}

// Forks reference the checkpoint instead of the live conversation
function forkFromCheckpoint(
  checkpoint: CacheCheckpoint,
  task: string,
): APIRequest {
  return {
    system: systemPrompt,
    messages: [
      ...checkpoint.messages, // Identical to other forks using same checkpoint
      { role: 'user', content: task },
    ],
  };
}
```

### Strategy 3: Layered Caching
```mermaid
graph LR
  L1["Layer 1: Identity<br/>~2K tokens<br/>Cache: ALL requests"] --> L2["Layer 2: Project<br/>~5K tokens<br/>Cache: Same project"]
  L2 --> L3["Layer 3: Conversation<br/>~10K tokens<br/>Cache: Same session forks"]
  L3 --> L4["Layer 4: Fork Task<br/>~2K tokens<br/>NOT cached"]

  style L1 fill:#4ade80
  style L2 fill:#a3e635
  style L3 fill:#facc15
  style L4 fill:#94a3b8
```

## Limitations and Trade-offs
### Cache TTL

Prompt caches have a time-to-live (typically 5 minutes). If forks are launched too far apart, earlier cache entries may expire.
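The API does not expose cache state, so the best a client can do is estimate: record when the shared prefix was last sent and compare against the TTL. A hypothetical helper, assuming the 5-minute default:

```typescript
const CACHE_TTL_MS = 5 * 60 * 1000; // assumed 5-minute TTL

// Estimate only: tracks when the shared prefix was last sent,
// since actual cache expiry is server-side and not observable.
function cacheLikelyWarm(lastPrefixSentAt: number, now: number = Date.now()): boolean {
  return now - lastPrefixSentAt < CACHE_TTL_MS;
}
```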
```typescript
// Mitigation: Launch forks as close together as possible

// ✅ Good: All forks launched within milliseconds
async function launchForksQuickly(tasks: string[]) {
  return Promise.all(tasks.map(t => launchFork(t)));
}

// ❌ Bad: Sequential with delays — later forks may miss the cache
async function launchForksSlowly(tasks: string[]) {
  for (const task of tasks) {
    await launchFork(task); // 30s per fork = 5 forks take 2.5 min
    await delay(30_000);
  }
}
```

### Prefix Rigidity
Any change to the shared prefix invalidates the cache. This creates tension between personalization and caching:

```typescript
// ❌ Cache-hostile: per-fork customization in the system prompt
system: `${BASE_PROMPT}\nYou are analyzing file: $(unknown)`

// ✅ Cache-friendly: customization in messages instead
system: BASE_PROMPT,
messages: [...shared, { role: 'user', content: `Analyze file: $(unknown)` }]
```

### Minimum Cache Size
Claude’s Prompt Cache has a minimum cacheable prefix length (1,024 tokens for Sonnet and Opus; 2,048 for Haiku). Very short system prompts won’t benefit.
### Cost Overhead

Cache writes carry a small surcharge (~25% on the first request). The savings only materialize when the cache is read by subsequent requests, so single-fork scenarios may actually cost slightly more.
| Scenario | Cache Benefit |
|---|---|
| 1 fork | ❌ Net cost increase (write overhead) |
| 2 forks | ⚠️ Break-even |
| 3+ forks | ✅ Significant savings |
| 5+ forks | ✅✅ Major savings (60%+) |
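The break-even logic in the table can be encoded as a conservative guard. Thresholds here simply mirror the table above, and `minCacheableTokens` is the assumed Sonnet-class minimum:

```typescript
// Conservative heuristic from the table: a single fork never recoups the
// write surcharge, and prefixes below the cacheable minimum never cache.
// Two forks are roughly break-even, so caching from 2 up is never worse.
function shouldUseCache(
  forkCount: number,
  sharedPrefixTokens: number,
  minCacheableTokens: number = 1024,
): boolean {
  if (sharedPrefixTokens < minCacheableTokens) return false;
  return forkCount >= 2;
}
```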
## Reusable Template

```typescript
// ============================================
// Fork & Cache Reuse Manager
// ============================================

interface CacheAwareForkManager {
  warmCache(): Promise<void>;
  fork(task: string): Promise<ForkResult>;
  forkAll(tasks: string[]): Promise<ForkResult[]>;
}

function createForkManager(
  apiClient: APIClient,
  systemPrompt: string,
  sharedMessages: Message[],
): CacheAwareForkManager {
  let cacheWarmed = false;

  return {
    async warmCache() {
      // Make a lightweight request to populate the cache
      await apiClient.complete({
        system: systemPrompt,
        messages: [...sharedMessages, { role: 'user', content: 'Acknowledge.' }],
        maxTokens: 10,
      });
      cacheWarmed = true;
    },

    async fork(task: string) {
      return apiClient.complete({
        system: systemPrompt,
        messages: [
          ...sharedMessages,
          { role: 'user', content: task },
        ],
      });
    },

    async forkAll(tasks: string[]) {
      if (!cacheWarmed) await this.warmCache();

      // Launch all forks simultaneously to maximize cache hits
      return Promise.all(tasks.map(task => this.fork(task)));
    },
  };
}
```

## Decision Guide
```mermaid
graph TD
  A["Need sub-agents?"] -->|Yes| B["How many forks?"]
  A -->|No| Z["No cache benefit needed"]

  B -->|"1"| C["Skip cache optimization<br/>Write overhead not worth it"]
  B -->|"2"| D["Use if shared prefix > 10K tokens"]
  B -->|"3+"| E["Always use Fork & Cache"]

  E --> F["Shared prefix > 1024 tokens?"]
  F -->|Yes| G["✅ Implement pattern"]
  F -->|No| H["Extend system prompt<br/>to reach minimum"]

  G --> I["Launch forks within<br/>cache TTL window"]
```