
Pattern: Fork & Cache Reuse

When a parent agent spawns a sub-agent (fork), the sub-agent typically needs the same system prompt, conversation context, and tool definitions. Naively, each fork would send an independent API request — paying full price for the identical prefix.

Fork & Cache Reuse exploits Claude’s Prompt Cache by ensuring all sub-agents share the same system prompt prefix, allowing them to reuse cached tokens instead of re-processing them.

graph TB
subgraph "Without Cache Reuse"
P1["Parent Agent<br/>System: 4K tokens<br/>Context: 20K tokens"]
F1["Fork A<br/>System: 4K tokens ← re-processed<br/>Context: 20K tokens ← re-processed"]
F2["Fork B<br/>System: 4K tokens ← re-processed<br/>Context: 20K tokens ← re-processed"]
P1 --> F1
P1 --> F2
end
subgraph "With Cache Reuse"
P2["Parent Agent<br/>System: 4K tokens<br/>Context: 20K tokens"]
F3["Fork A<br/>System: 4K tokens ← CACHED ✅<br/>Context: 20K tokens ← CACHED ✅"]
F4["Fork B<br/>System: 4K tokens ← CACHED ✅<br/>Context: 20K tokens ← CACHED ✅"]
P2 --> F3
P2 --> F4
end
style P1 fill:#94a3b8
style F1 fill:#fca5a5
style F2 fill:#fca5a5
style P2 fill:#4ade80
style F3 fill:#4ade80
style F4 fill:#4ade80

Claude’s Prompt Cache operates on a prefix-matching basis:

Request A: [System Prompt][Context A][User Message A]
                         ↑ cache breakpoint
Request B: [System Prompt][Context B][User Message B]
           ^^^^^^^^^^^^^^^ cached if identical prefix

The cache matches from the beginning of the request. Any shared prefix is cached; once the content diverges, caching stops. This has a critical implication for fork design:

// ✅ Cache-friendly: identical prefix
const parentRequest = {
  system: SHARED_SYSTEM_PROMPT, // Identical across all forks
  messages: [
    ...sharedContext, // Shared conversation history
    { role: 'user', content: 'Fork-specific task A' }, // Diverges here
  ],
};

const forkRequest = {
  system: SHARED_SYSTEM_PROMPT, // ← Same prefix: CACHED
  messages: [
    ...sharedContext, // ← Same prefix: CACHED
    { role: 'user', content: 'Fork-specific task B' }, // Diverges here
  ],
};

// ❌ Cache-hostile: different prefix
const badForkRequest = {
  system: SHARED_SYSTEM_PROMPT + '\nYou are Sub-Agent B.', // Different!
  messages: [
    { role: 'user', content: 'Fork-specific preamble' }, // Different!
    ...sharedContext, // Too late — prefix already diverged
  ],
};
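In the Anthropic Messages API, the cache boundary is marked explicitly by attaching `cache_control: { type: 'ephemeral' }` to the last content block of the shared prefix. A minimal sketch of a cache-friendly fork request builder (the prompt text, model name, and `buildForkRequest` helper are illustrative, not from the original):

```typescript
// Sketch: marking the shared prefix with an explicit cache breakpoint.
// `cache_control: { type: 'ephemeral' }` goes on the LAST block of the shared
// prefix; everything up to and including that block becomes cacheable.
type SystemBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};
type Message = { role: 'user' | 'assistant'; content: string };

// Illustrative constant — not from the original document
const SHARED_SYSTEM_PROMPT = 'You are a code-review agent.';

function buildForkRequest(sharedContext: Message[], task: string) {
  const system: SystemBlock[] = [
    {
      type: 'text',
      text: SHARED_SYSTEM_PROMPT,
      cache_control: { type: 'ephemeral' }, // ← cache breakpoint
    },
  ];
  return {
    model: 'claude-sonnet-4-5', // any cache-capable model
    max_tokens: 1024,
    system,
    messages: [...sharedContext, { role: 'user' as const, content: task }],
  };
}
```

Every fork that passes the same system block and shared context hits the same cache entry; only the final task message is billed at the fresh-input rate.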

A worked example: a parent agent reviews a PR and forks sub-agents to analyze each changed file independently.

Shared prefix:
- System prompt: 4,000 tokens
- Project context: 8,000 tokens
- PR description: 2,000 tokens
- Shared instructions: 1,000 tokens
Total shared prefix: 15,000 tokens
Fork-specific suffix:
- File content: ~3,000 tokens each
- Analysis prompt: ~500 tokens each
Total per-fork: ~3,500 tokens

Cost Comparison (Claude Sonnet pricing: $3/M input, $0.30/M cache reads):

| Metric         | Without Cache          | With Cache                               | Savings    |
| -------------- | ---------------------- | ---------------------------------------- | ---------- |
| Parent request | 15,000 tokens          | 15,000 tokens                            | —          |
| Fork A input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork B input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork C input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork D input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork E input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Total input    | 107,500 tokens         | 32,500 fresh + 75,000 cached             |            |
| Effective cost | 107,500 × $3/M ≈ $0.32 | 32,500 × $3/M + 75,000 × $0.30/M ≈ $0.12 | ~60%       |
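The arithmetic behind these figures can be checked in a few lines (prices are the published Sonnet rates at the time of writing; the sketch ignores the one-time cache-write surcharge, as the table does):

```typescript
// Sketch: token cost of N forks with and without prefix caching.
const PRICE_PER_TOKEN = 3 / 1_000_000;        // $3 per million fresh input tokens
const CACHE_READ_PER_TOKEN = 0.3 / 1_000_000; // $0.30 per million cached tokens

function forkCosts(sharedPrefix: number, perFork: number, forks: number) {
  // Every request re-processes the full prefix without caching
  const withoutCache =
    (sharedPrefix + forks * (sharedPrefix + perFork)) * PRICE_PER_TOKEN;
  const freshTokens = sharedPrefix + forks * perFork; // parent + fork suffixes
  const cachedTokens = forks * sharedPrefix;          // prefix re-read per fork
  const withCache =
    freshTokens * PRICE_PER_TOKEN + cachedTokens * CACHE_READ_PER_TOKEN;
  return { withoutCache, withCache };
}

const { withoutCache, withCache } = forkCosts(15_000, 3_500, 5);
// withoutCache ≈ $0.32, withCache ≈ $0.12
```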

Step 1: Design a Cache-Friendly System Prompt

// The system prompt is structured for maximum cache reuse
function buildSystemPrompt(config: AgentConfig): string {
  // STATIC section — identical for all agents and forks
  const staticSection = [
    IDENTITY_PROMPT,     // "You are Claude, made by Anthropic..."
    CAPABILITIES_PROMPT, // Tool descriptions, behavior rules
    SAFETY_PROMPT,       // Security policies
  ].join('\n\n');

  // SEMI-STATIC section — changes per project but shared across forks
  const projectSection = [
    `Project: ${config.projectName}`,
    `Working directory: ${config.cwd}`,
    config.claudeMdContent, // CLAUDE.md content
  ].join('\n\n');

  // The fork-specific part goes in the messages, NOT the system prompt
  return `${staticSection}\n\n${projectSection}`;
}

Step 2: Build Fork Requests with a Shared Prefix

interface ForkOptions {
  task: string;
  parentMessages: Message[];
  sharedPrefixLength: number; // How many messages form the shared prefix
}

function createFork(
  parentSystemPrompt: string,
  options: ForkOptions,
): APIRequest {
  // Share the parent's message prefix for cache reuse
  const sharedMessages = options.parentMessages.slice(0, options.sharedPrefixLength);
  return {
    system: parentSystemPrompt, // Identical — will be cached
    messages: [
      ...sharedMessages, // Identical prefix — will be cached
      {
        role: 'user',
        content: `Sub-task: ${options.task}`, // Fork-specific — not cached
      },
    ],
  };
}

Step 3: Orchestrate Forks for Cache Warming

async function executeForksWithCacheReuse(
  systemPrompt: string,
  sharedMessages: Message[],
  tasks: string[],
): Promise<ForkResult[]> {
  // Step 1: The parent's last request already warmed the cache
  // (The system prompt + shared messages are now in the cache)

  // Step 2: Fire all forks — they all share the cached prefix
  const forkPromises = tasks.map(task =>
    callAPI({
      system: systemPrompt, // Cache hit on this
      messages: [
        ...sharedMessages, // Cache hit on this
        { role: 'user', content: `Analyze: ${task}` },
      ],
    })
  );

  // Step 3: All forks execute with ~90% cached input
  return Promise.all(forkPromises);
}

Strategy 1: Static Prefix + Dynamic Suffix

[CACHED] System Prompt → Project Context → Shared History
[FRESH] Fork-specific task description

This is the simplest and most effective strategy, and it applies whenever all forks share the same conversation context.

Strategy 2: Checkpoint-Based Forking

// Create explicit cache checkpoints at conversation milestones
function createCacheCheckpoint(messages: Message[]): CacheCheckpoint {
  return {
    messages: [...messages],
    tokenCount: countTokens(messages),
    timestamp: Date.now(),
  };
}

// Forks reference the checkpoint instead of the live conversation
function forkFromCheckpoint(
  systemPrompt: string,
  checkpoint: CacheCheckpoint,
  task: string,
): APIRequest {
  return {
    system: systemPrompt,
    messages: [
      ...checkpoint.messages, // Identical to other forks using same checkpoint
      { role: 'user', content: task },
    ],
  };
}

Strategy 3: Layered Prefix Hierarchy

graph LR
L1["Layer 1: Identity<br/>~2K tokens<br/>Cache: ALL requests"] --> L2["Layer 2: Project<br/>~5K tokens<br/>Cache: Same project"]
L2 --> L3["Layer 3: Conversation<br/>~10K tokens<br/>Cache: Same session forks"]
L3 --> L4["Layer 4: Fork Task<br/>~2K tokens<br/>NOT cached"]
style L1 fill:#4ade80
style L2 fill:#a3e635
style L3 fill:#facc15
style L4 fill:#94a3b8

Prompt caches have a time-to-live (typically 5 minutes, refreshed each time the cached prefix is read). If forks are launched too far apart, earlier caches may expire.

// Mitigation: Launch forks as close together as possible

// ✅ Good: All forks launched within milliseconds of each other
async function launchForksInParallel(tasks: string[]) {
  return Promise.all(tasks.map(t => launchFork(t)));
}

// ❌ Bad: Sequential with delays — later forks may find the cache expired
async function launchForksSequentially(tasks: string[]) {
  for (const task of tasks) {
    await launchFork(task); // ~30s per fork + 30s delay ≈ 5 min for 5 forks —
    await delay(30_000);    // the last fork lands right at the TTL boundary
  }
}
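If provider rate limits rule out firing every fork in one `Promise.all`, a small concurrency limiter still keeps all launches inside the TTL window. A self-contained sketch (the `mapWithConcurrency` helper and the cap value are illustrative):

```typescript
// Sketch: run fork launches with bounded concurrency so rate limits are
// respected while every request still starts well inside the cache TTL.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index; claiming is synchronous
  // between awaits, so no two workers process the same item.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Usage might look like `await mapWithConcurrency(tasks, 3, launchFork)`: three forks in flight at a time, all started back-to-back rather than spaced out by fixed delays.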

Any change to the shared prefix invalidates the cache. This creates tension between personalization and caching:

// ❌ Cache-hostile: per-fork customization in the system prompt
system: `${BASE_PROMPT}\nYou are analyzing file: ${fileName}`

// ✅ Cache-friendly: the same customization in the messages instead
system: BASE_PROMPT,
messages: [...shared, { role: 'user', content: `Analyze file: ${fileName}` }]

Claude’s Prompt Cache has a minimum cacheable prefix length (typically 1,024 tokens for Sonnet and Opus models, 2,048 for Haiku). Very short system prompts won’t benefit.
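A simple guard can skip the cache breakpoint when the shared prefix is below the model's minimum. The thresholds below reflect Anthropic's published figures at the time of writing; verify them against current documentation before relying on them:

```typescript
// Sketch: only request caching when the shared prefix clears the minimum
// cacheable length for the model family. Values are assumptions — check
// the provider's current documentation.
const MIN_CACHEABLE_TOKENS: Record<string, number> = {
  sonnet: 1024,
  opus: 1024,
  haiku: 2048,
};

function shouldCachePrefix(modelFamily: string, prefixTokens: number): boolean {
  const min = MIN_CACHEABLE_TOKENS[modelFamily] ?? 1024;
  return prefixTokens >= min;
}
```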

Cache write has a small surcharge (~25% on first request). The savings only materialize when the cache is read by subsequent requests. Single-fork scenarios may actually cost slightly more.

| Scenario | Cache Benefit                         |
| -------- | ------------------------------------- |
| 1 fork   | ❌ Net cost increase (write overhead) |
| 2 forks  | ⚠️ Break-even                         |
| 3+ forks | ✅ Significant savings                |
| 5+ forks | ✅✅ Major savings (60%+)             |

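Under the pricing used earlier ($3/M input, 1.25× for cache writes, 0.1× for cache reads), the net effect of caching for n forks can be computed directly. A sketch, with the assumption (mine, not the document's) that each fork reads the cached prefix exactly once:

```typescript
// Sketch: relative savings from caching a shared prefix that is written once
// (1.25x the base input price) and read once per fork (0.1x the base price).
// Costs are expressed in token-equivalents at the base price, so the
// absolute price cancels out of the ratio.
function cacheSavingsFraction(
  prefixTokens: number,
  perForkTokens: number,
  forks: number,
): number {
  const withoutCache = prefixTokens + forks * (prefixTokens + perForkTokens);
  const withCache =
    prefixTokens * 1.25 +                         // parent request writes the cache
    forks * (prefixTokens * 0.1 + perForkTokens); // each fork reads it
  return 1 - withCache / withoutCache;
}
```

For the PR-review example (15K prefix, 3.5K per fork, 5 forks) this yields roughly 59% savings, consistent with the ~60% figure in the cost table once the write surcharge is accounted for.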
// ============================================
// Fork & Cache Reuse Manager
// ============================================
interface CacheAwareForkManager {
  warmCache(): Promise<void>; // System prompt and shared messages are captured at creation
  fork(task: string): Promise<ForkResult>;
  forkAll(tasks: string[]): Promise<ForkResult[]>;
}

function createForkManager(
  apiClient: APIClient,
  systemPrompt: string,
  sharedMessages: Message[],
): CacheAwareForkManager {
  let cacheWarmed = false;
  return {
    async warmCache() {
      // Make a lightweight request to populate the cache
      await apiClient.complete({
        system: systemPrompt,
        messages: [...sharedMessages, { role: 'user', content: 'Acknowledge.' }],
        maxTokens: 10,
      });
      cacheWarmed = true;
    },
    async fork(task: string) {
      return apiClient.complete({
        system: systemPrompt,
        messages: [
          ...sharedMessages,
          { role: 'user', content: task },
        ],
      });
    },
    async forkAll(tasks: string[]) {
      if (!cacheWarmed) await this.warmCache();
      // Launch all forks simultaneously to maximize cache hits
      return Promise.all(tasks.map(task => this.fork(task)));
    },
  };
}
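Usage follows a warm-then-fan-out shape: one cheap request writes the prefix into the cache, then every fork reuses it. A self-contained sketch with a stub client (the stub and all names here are illustrative, not the document's APIClient):

```typescript
// Sketch: warm the cache once, then fan out. The stub client records the
// final user message of each request so the ordering can be observed.
type Msg = { role: string; content: string };
type Req = { system: string; messages: Msg[] };

function createStubClient(log: string[]) {
  return {
    async complete(req: Req): Promise<string> {
      log.push(req.messages[req.messages.length - 1].content);
      return `done: ${req.messages.length} messages`;
    },
  };
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const client = createStubClient(log);
  const system = 'shared system prompt';
  const shared: Msg[] = [{ role: 'user', content: 'shared context' }];

  // Warm: one cheap request writes the prefix into the cache
  await client.complete({
    system,
    messages: [...shared, { role: 'user', content: 'Acknowledge.' }],
  });
  // Fan out: every fork repeats the identical prefix, then diverges
  await Promise.all(
    ['task A', 'task B'].map(task =>
      client.complete({ system, messages: [...shared, { role: 'user', content: task }] }),
    ),
  );
  return log;
}
```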

Decision Guide

graph TD
A["Need sub-agents?"] -->|Yes| B["How many forks?"]
A -->|No| Z["No cache benefit needed"]
B -->|"1"| C["Skip cache optimization<br/>Write overhead not worth it"]
B -->|"2"| D["Use if shared prefix > 10K tokens"]
B -->|"3+"| E["Always use Fork & Cache"]
E --> F["Shared prefix > 1024 tokens?"]
F -->|Yes| G["✅ Implement pattern"]
F -->|No| H["Extend system prompt<br/>to reach minimum"]
G --> I["Launch forks within<br/>cache TTL window"]