
Pattern: Fork & Cache Reuse

When a parent agent spawns a sub-agent (fork), the sub-agent typically needs the same system prompt, conversation context, and tool definitions. Naively, each fork would send an independent API request — paying full price for the identical prefix.

Fork & Cache Reuse exploits Claude’s Prompt Cache by ensuring all sub-agents share the same system prompt prefix, allowing them to reuse cached tokens instead of re-processing them.

graph TB
subgraph "Without Cache Reuse"
P1["Parent Agent<br/>System: 4K tokens<br/>Context: 20K tokens"]
F1["Fork A<br/>System: 4K tokens ← re-processed<br/>Context: 20K tokens ← re-processed"]
F2["Fork B<br/>System: 4K tokens ← re-processed<br/>Context: 20K tokens ← re-processed"]
P1 --> F1
P1 --> F2
end
subgraph "With Cache Reuse"
P2["Parent Agent<br/>System: 4K tokens<br/>Context: 20K tokens"]
F3["Fork A<br/>System: 4K tokens ← CACHED ✅<br/>Context: 20K tokens ← CACHED ✅"]
F4["Fork B<br/>System: 4K tokens ← CACHED ✅<br/>Context: 20K tokens ← CACHED ✅"]
P2 --> F3
P2 --> F4
end
style P1 fill:#94a3b8
style F1 fill:#fca5a5
style F2 fill:#fca5a5
style P2 fill:#4ade80
style F3 fill:#4ade80
style F4 fill:#4ade80

Claude’s Prompt Cache operates on a prefix-matching basis:

Request A: [System Prompt][Context A][User Message A]
                         ↑ cache breakpoint
Request B: [System Prompt][Context B][User Message B]
           ^^^^^^^^^^^^^^^ cached if identical prefix

The cache matches from the beginning of the request. Any shared prefix is cached; once the content diverges, caching stops. This has a critical implication for fork design:

// ✅ Cache-friendly: identical prefix
const parentRequest = {
  system: SHARED_SYSTEM_PROMPT, // Identical across all forks
  messages: [
    ...sharedContext, // Shared conversation history
    { role: 'user', content: 'Fork-specific task A' }, // Diverges here
  ],
};

const forkRequest = {
  system: SHARED_SYSTEM_PROMPT, // ← Same prefix: CACHED
  messages: [
    ...sharedContext, // ← Same prefix: CACHED
    { role: 'user', content: 'Fork-specific task B' }, // Diverges here
  ],
};

// ❌ Cache-hostile: different prefix
const badForkRequest = {
  system: SHARED_SYSTEM_PROMPT + '\nYou are Sub-Agent B.', // Different!
  messages: [
    { role: 'user', content: 'Fork-specific preamble' }, // Different!
    ...sharedContext, // Too late — prefix already diverged
  ],
};
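In the Anthropic Messages API, the cache boundary is marked explicitly by attaching `cache_control: { type: 'ephemeral' }` to the last content block of the shared prefix. A minimal sketch of a cache-friendly fork request builder (the prompt text, model name, and `buildForkRequest` helper are illustrative, not from the original):

```typescript
// Sketch: marking the shared prefix with an explicit cache breakpoint.
// `cache_control: { type: 'ephemeral' }` goes on the LAST block of the shared
// prefix; everything up to and including that block becomes cacheable.
type SystemBlock = {
  type: 'text';
  text: string;
  cache_control?: { type: 'ephemeral' };
};
type Message = { role: 'user' | 'assistant'; content: string };

// Illustrative constant — not from the original document
const SHARED_SYSTEM_PROMPT = 'You are a code-review agent.';

function buildForkRequest(sharedContext: Message[], task: string) {
  const system: SystemBlock[] = [
    {
      type: 'text',
      text: SHARED_SYSTEM_PROMPT,
      cache_control: { type: 'ephemeral' }, // ← cache breakpoint
    },
  ];
  return {
    model: 'claude-sonnet-4-5', // any cache-capable model
    max_tokens: 1024,
    system,
    messages: [...sharedContext, { role: 'user' as const, content: task }],
  };
}
```

Every fork that passes the same system block and shared context hits the same cache entry; only the final task message is billed at the fresh-input rate.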

A worked example: a parent agent reviews a PR and forks sub-agents to analyze each changed file independently.

Shared prefix:
- System prompt: 4,000 tokens
- Project context: 8,000 tokens
- PR description: 2,000 tokens
- Shared instructions: 1,000 tokens
Total shared prefix: 15,000 tokens
Fork-specific suffix:
- File content: ~3,000 tokens each
- Analysis prompt: ~500 tokens each
Total per-fork: ~3,500 tokens

Cost Comparison (Claude Sonnet pricing: $3/M input, $0.30/M cache reads):

| Metric         | Without Cache          | With Cache                               | Savings    |
| -------------- | ---------------------- | ---------------------------------------- | ---------- |
| Parent request | 15,000 tokens          | 15,000 tokens                            | —          |
| Fork A input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork B input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork C input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork D input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Fork E input   | 18,500 tokens          | 3,500 fresh + 15,000 cached              | 90% on 15K |
| Total input    | 107,500 tokens         | 32,500 fresh + 75,000 cached             |            |
| Effective cost | 107,500 × $3/M ≈ $0.32 | 32,500 × $3/M + 75,000 × $0.30/M ≈ $0.12 | ~60%       |
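The arithmetic behind these figures can be checked in a few lines (prices are the published Sonnet rates at the time of writing; the sketch ignores the one-time cache-write surcharge, as the table does):

```typescript
// Sketch: token cost of N forks with and without prefix caching.
const PRICE_PER_TOKEN = 3 / 1_000_000;        // $3 per million fresh input tokens
const CACHE_READ_PER_TOKEN = 0.3 / 1_000_000; // $0.30 per million cached tokens

function forkCosts(sharedPrefix: number, perFork: number, forks: number) {
  // Every request re-processes the full prefix without caching
  const withoutCache =
    (sharedPrefix + forks * (sharedPrefix + perFork)) * PRICE_PER_TOKEN;
  const freshTokens = sharedPrefix + forks * perFork; // parent + fork suffixes
  const cachedTokens = forks * sharedPrefix;          // prefix re-read per fork
  const withCache =
    freshTokens * PRICE_PER_TOKEN + cachedTokens * CACHE_READ_PER_TOKEN;
  return { withoutCache, withCache };
}

const { withoutCache, withCache } = forkCosts(15_000, 3_500, 5);
// withoutCache ≈ $0.32, withCache ≈ $0.12
```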

Step 1: Design a Cache-Friendly System Prompt

// The system prompt is structured for maximum cache reuse
function buildSystemPrompt(config: AgentConfig): string {
  // STATIC section — identical for all agents and forks
  const staticSection = [
    IDENTITY_PROMPT,     // "You are Claude, made by Anthropic..."
    CAPABILITIES_PROMPT, // Tool descriptions, behavior rules
    SAFETY_PROMPT,       // Security policies
  ].join('\n\n');

  // SEMI-STATIC section — changes per project but shared across forks
  const projectSection = [
    `Project: ${config.projectName}`,
    `Working directory: ${config.cwd}`,
    config.claudeMdContent, // CLAUDE.md content
  ].join('\n\n');

  // The fork-specific part goes in the messages, NOT the system prompt
  return `${staticSection}\n\n${projectSection}`;
}

Step 2: Build Fork Requests with a Shared Prefix

interface ForkOptions {
  task: string;
  parentMessages: Message[];
  sharedPrefixLength: number; // How many messages form the shared prefix
}

function createFork(
  parentSystemPrompt: string,
  options: ForkOptions,
): APIRequest {
  // Share the parent's message prefix for cache reuse
  const sharedMessages = options.parentMessages.slice(0, options.sharedPrefixLength);
  return {
    system: parentSystemPrompt, // Identical — will be cached
    messages: [
      ...sharedMessages, // Identical prefix — will be cached
      {
        role: 'user',
        content: `Sub-task: ${options.task}`, // Fork-specific — not cached
      },
    ],
  };
}

Step 3: Orchestrate Forks for Cache Warming

async function executeForksWithCacheReuse(
  systemPrompt: string,
  sharedMessages: Message[],
  tasks: string[],
): Promise<ForkResult[]> {
  // Step 1: The parent's last request already warmed the cache
  // (The system prompt + shared messages are now in the cache)

  // Step 2: Fire all forks — they all share the cached prefix
  const forkPromises = tasks.map(task =>
    callAPI({
      system: systemPrompt, // Cache hit on this
      messages: [
        ...sharedMessages, // Cache hit on this
        { role: 'user', content: `Analyze: ${task}` },
      ],
    })
  );

  // Step 3: All forks execute with ~90% cached input
  return Promise.all(forkPromises);
}

Strategy 1: Static Prefix + Dynamic Suffix

[CACHED] System Prompt → Project Context → Shared History
[FRESH] Fork-specific task description

This is the simplest and most effective strategy, and it applies whenever all forks share the same conversation context.

Strategy 2: Checkpoint-Based Forking

// Create explicit cache checkpoints at conversation milestones
function createCacheCheckpoint(messages: Message[]): CacheCheckpoint {
  return {
    messages: [...messages],
    tokenCount: countTokens(messages),
    timestamp: Date.now(),
  };
}

// Forks reference the checkpoint instead of the live conversation
function forkFromCheckpoint(
  systemPrompt: string,
  checkpoint: CacheCheckpoint,
  task: string,
): APIRequest {
  return {
    system: systemPrompt,
    messages: [
      ...checkpoint.messages, // Identical to other forks using same checkpoint
      { role: 'user', content: task },
    ],
  };
}

Strategy 3: Layered Prefix Hierarchy

graph LR
L1["Layer 1: Identity<br/>~2K tokens<br/>Cache: ALL requests"] --> L2["Layer 2: Project<br/>~5K tokens<br/>Cache: Same project"]
L2 --> L3["Layer 3: Conversation<br/>~10K tokens<br/>Cache: Same session forks"]
L3 --> L4["Layer 4: Fork Task<br/>~2K tokens<br/>NOT cached"]
style L1 fill:#4ade80
style L2 fill:#a3e635
style L3 fill:#facc15
style L4 fill:#94a3b8

Prompt caches have a time-to-live (typically 5 minutes, refreshed each time the cached prefix is read). If forks are launched too far apart, earlier caches may expire.

// Mitigation: Launch forks as close together as possible

// ✅ Good: All forks launched within milliseconds of each other
async function launchForksInParallel(tasks: string[]) {
  return Promise.all(tasks.map(t => launchFork(t)));
}

// ❌ Bad: Sequential with delays — later forks may find the cache expired
async function launchForksSequentially(tasks: string[]) {
  for (const task of tasks) {
    await launchFork(task); // ~30s per fork + 30s delay ≈ 5 min for 5 forks —
    await delay(30_000);    // the last fork lands right at the TTL boundary
  }
}
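If provider rate limits rule out firing every fork in one `Promise.all`, a small concurrency limiter still keeps all launches inside the TTL window. A self-contained sketch (the `mapWithConcurrency` helper and the cap value are illustrative):

```typescript
// Sketch: run fork launches with bounded concurrency so rate limits are
// respected while every request still starts well inside the cache TTL.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index; claiming is synchronous
  // between awaits, so no two workers process the same item.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Usage might look like `await mapWithConcurrency(tasks, 3, launchFork)`: three forks in flight at a time, all started back-to-back rather than spaced out by fixed delays.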

Any change to the shared prefix invalidates the cache. This creates tension between personalization and caching:

// ❌ Cache-hostile: per-fork customization in the system prompt
system: `${BASE_PROMPT}\nYou are analyzing file: ${fileName}`

// ✅ Cache-friendly: the same customization in the messages instead
system: BASE_PROMPT,
messages: [...shared, { role: 'user', content: `Analyze file: ${fileName}` }]

Claude’s Prompt Cache has a minimum cacheable prefix length (typically 1,024 tokens for Sonnet and Opus models, 2,048 for Haiku). Very short system prompts won’t benefit.
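A simple guard can skip the cache breakpoint when the shared prefix is below the model's minimum. The thresholds below reflect Anthropic's published figures at the time of writing; verify them against current documentation before relying on them:

```typescript
// Sketch: only request caching when the shared prefix clears the minimum
// cacheable length for the model family. Values are assumptions — check
// the provider's current documentation.
const MIN_CACHEABLE_TOKENS: Record<string, number> = {
  sonnet: 1024,
  opus: 1024,
  haiku: 2048,
};

function shouldCachePrefix(modelFamily: string, prefixTokens: number): boolean {
  const min = MIN_CACHEABLE_TOKENS[modelFamily] ?? 1024;
  return prefixTokens >= min;
}
```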

Cache write has a small surcharge (~25% on first request). The savings only materialize when the cache is read by subsequent requests. Single-fork scenarios may actually cost slightly more.

| Scenario | Cache Benefit                         |
| -------- | ------------------------------------- |
| 1 fork   | ❌ Net cost increase (write overhead) |
| 2 forks  | ⚠️ Break-even                         |
| 3+ forks | ✅ Significant savings                |
| 5+ forks | ✅✅ Major savings (60%+)             |

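Under the pricing used earlier ($3/M input, 1.25× for cache writes, 0.1× for cache reads), the net effect of caching for n forks can be computed directly. A sketch, with the assumption (mine, not the document's) that each fork reads the cached prefix exactly once:

```typescript
// Sketch: relative savings from caching a shared prefix that is written once
// (1.25x the base input price) and read once per fork (0.1x the base price).
// Costs are expressed in token-equivalents at the base price, so the
// absolute price cancels out of the ratio.
function cacheSavingsFraction(
  prefixTokens: number,
  perForkTokens: number,
  forks: number,
): number {
  const withoutCache = prefixTokens + forks * (prefixTokens + perForkTokens);
  const withCache =
    prefixTokens * 1.25 +                         // parent request writes the cache
    forks * (prefixTokens * 0.1 + perForkTokens); // each fork reads it
  return 1 - withCache / withoutCache;
}
```

For the PR-review example (15K prefix, 3.5K per fork, 5 forks) this yields roughly 59% savings, consistent with the ~60% figure in the cost table once the write surcharge is accounted for.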
// ============================================
// Fork & Cache Reuse Manager
// ============================================
interface CacheAwareForkManager {
  warmCache(): Promise<void>; // System prompt and shared messages are captured at creation
  fork(task: string): Promise<ForkResult>;
  forkAll(tasks: string[]): Promise<ForkResult[]>;
}

function createForkManager(
  apiClient: APIClient,
  systemPrompt: string,
  sharedMessages: Message[],
): CacheAwareForkManager {
  let cacheWarmed = false;
  return {
    async warmCache() {
      // Make a lightweight request to populate the cache
      await apiClient.complete({
        system: systemPrompt,
        messages: [...sharedMessages, { role: 'user', content: 'Acknowledge.' }],
        maxTokens: 10,
      });
      cacheWarmed = true;
    },
    async fork(task: string) {
      return apiClient.complete({
        system: systemPrompt,
        messages: [
          ...sharedMessages,
          { role: 'user', content: task },
        ],
      });
    },
    async forkAll(tasks: string[]) {
      if (!cacheWarmed) await this.warmCache();
      // Launch all forks simultaneously to maximize cache hits
      return Promise.all(tasks.map(task => this.fork(task)));
    },
  };
}
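Usage follows a warm-then-fan-out shape: one cheap request writes the prefix into the cache, then every fork reuses it. A self-contained sketch with a stub client (the stub and all names here are illustrative, not the document's APIClient):

```typescript
// Sketch: warm the cache once, then fan out. The stub client records the
// final user message of each request so the ordering can be observed.
type Msg = { role: string; content: string };
type Req = { system: string; messages: Msg[] };

function createStubClient(log: string[]) {
  return {
    async complete(req: Req): Promise<string> {
      log.push(req.messages[req.messages.length - 1].content);
      return `done: ${req.messages.length} messages`;
    },
  };
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const client = createStubClient(log);
  const system = 'shared system prompt';
  const shared: Msg[] = [{ role: 'user', content: 'shared context' }];

  // Warm: one cheap request writes the prefix into the cache
  await client.complete({
    system,
    messages: [...shared, { role: 'user', content: 'Acknowledge.' }],
  });
  // Fan out: every fork repeats the identical prefix, then diverges
  await Promise.all(
    ['task A', 'task B'].map(task =>
      client.complete({ system, messages: [...shared, { role: 'user', content: task }] }),
    ),
  );
  return log;
}
```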

Decision Guide

graph TD
A["Need sub-agents?"] -->|Yes| B["How many forks?"]
A -->|No| Z["No cache benefit needed"]
B -->|"1"| C["Skip cache optimization<br/>Write overhead not worth it"]
B -->|"2"| D["Use if shared prefix > 10K tokens"]
B -->|"3+"| E["Always use Fork & Cache"]
E --> F["Shared prefix > 1024 tokens?"]
F -->|Yes| G["✅ Implement pattern"]
F -->|No| H["Extend system prompt<br/>to reach minimum"]
G --> I["Launch forks within<br/>cache TTL window"]