Lessons for Agent Builders

After a deep dive into 512K+ lines of Claude Code’s architecture, here are the 10 most actionable lessons for anyone building AI agent systems. Each lesson is distilled from a specific architectural decision and comes with concrete implementation guidance.

Lesson 1: Use Generators for Agent Loops, Not Recursion

Recursive agent loops accumulate stack frames, can’t be observed mid-execution, and have no natural cancellation point.

Async generators provide linear readability, built-in backpressure, native pause/resume, and zero stack growth.

```ts
// ✅ The generator approach
async function* agentLoop(messages: Message[]): AsyncGenerator<Event> {
  while (shouldContinue) {
    const response = await callLLM(messages);
    yield { type: 'response', data: response }; // Observable
    for (const tool of response.toolCalls) {
      yield { type: 'tool_start', tool };       // Pausable
      const result = await executeTool(tool);
      yield { type: 'tool_end', result };       // Resumable
    }
  }
}

// Consumer controls the pace
for await (const event of agentLoop(messages)) {
  if (event.type === 'tool_start' && needsPermission(event.tool)) {
    const ok = await askUser();
    if (!ok) break; // Graceful cancellation
  }
}
```

Replace your `while (true) { ... }` agent loop with an `async function*`. Every yield point becomes a free observation/interception point for logging, permission checks, and UI updates.

Lesson 2: Tool Permissions Need Layered Checks

A single permission check is a single point of failure. If a prompt injection bypasses one check, the tool executes unconstrained.

Multiple independent security layers: any layer can deny, but no single layer can approve on its own.

```ts
// Five categories of checks, each independent
const securityPipeline = [
  syntacticCheck, // Is the input structurally valid?
  semanticCheck,  // What does it intend to do?
  scopeCheck,     // Does it stay within allowed boundaries?
  policyCheck,    // Do configured rules allow this?
  userCheck,      // Does the human approve?
];

// ALL must pass — one deny is final
for (const layer of securityPipeline) {
  const result = await layer(toolInput); // each check is a plain (possibly async) function
  if (result.verdict === 'deny') return { allowed: false };
}
```

At minimum, implement three layers: input validation (is it well-formed?), scope check (is it within bounds?), and user confirmation (does the human agree?). Run them in application code, not as model instructions.
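
Those three minimum layers can be sketched as plain functions composed into a pipeline. This is an illustrative sketch, not Claude Code’s actual implementation — the type names (`ToolCall`, `Verdict`, `Layer`) and the specific checks are assumptions:

```typescript
// Hypothetical shapes — adapt to your own tool-call representation.
type ToolCall = { tool: string; input: { path?: string; command?: string } };
type Verdict = { verdict: 'allow' | 'deny'; reason?: string };
type Layer = (call: ToolCall) => Verdict;

// Layer 1: input validation — is the call well-formed?
const validateInput: Layer = (call) =>
  call.tool && typeof call.input === 'object'
    ? { verdict: 'allow' }
    : { verdict: 'deny', reason: 'malformed tool call' };

// Layer 2: scope check — does the path stay inside the project root?
const checkScope: Layer = (call) =>
  call.input.path && call.input.path.startsWith('..')
    ? { verdict: 'deny', reason: 'path escapes project root' }
    : { verdict: 'allow' };

// Layer 3: user confirmation — stubbed here as "shell commands always need approval".
const confirmWithUser: Layer = (call) =>
  call.tool === 'Bash'
    ? { verdict: 'deny', reason: 'needs explicit approval' }
    : { verdict: 'allow' };

// ALL layers must pass; the first deny is final.
function evaluate(call: ToolCall, layers: Layer[]): Verdict {
  for (const layer of layers) {
    const result = layer(call);
    if (result.verdict === 'deny') return result;
  }
  return { verdict: 'allow' };
}
```

Because each layer is an independent function, adding a fourth or fifth check is just appending to the array — no layer needs to know about the others.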

Lesson 3: Context Compression is the Lifeline for Long Conversations

Without compression, every agent conversation has a hard ceiling (~15-20 minutes). Token accumulation from tool results, file reads, and conversation history inevitably exceeds the context window.

Progressive 4-layer compression: Snip → Microcompact → Auto Compact → Hard Truncate. Start early (60% capacity), not when it’s too late.

```ts
// The critical insight: start compressing well before the limit
const THRESHOLDS = {
  snip: 0.4,          // Snip large tool results at 40%
  microcompact: 0.6,  // Summarize old results at 60%
  autoCompact: 0.8,   // Summarize conversation at 80%
  hardTruncate: 0.95, // Emergency drop at 95%
};
```

Implement at least Snip (truncate large tool outputs to head + tail) and Auto Compact (summarize old messages). These two layers alone extend session lifetime from ~20 minutes to potentially hours.
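
A head + tail snip is only a few lines. This is a minimal sketch (the function name, marker text, and character-based budget are assumptions — a production version would budget in tokens):

```typescript
// Hypothetical sketch: keep the head and tail of an oversized tool result.
function snip(text: string, maxChars: number, marker = '\n…[snipped]…\n'): string {
  if (text.length <= maxChars) return text; // small results pass through untouched
  const budget = maxChars - marker.length;
  const head = Math.ceil(budget / 2);
  const tail = Math.floor(budget / 2);
  // Head keeps the command/context; tail keeps the final output, where errors usually live.
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}
```

Keeping both ends matters: build logs and test runs typically put the failure summary at the tail, while the head identifies what was run.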

Lesson 4: Prompt Cache is Your #1 Cost Control Lever

API costs scale with input tokens. An agent making 20+ API calls per session, each with a large system prompt, accumulates significant costs.

Structure your system prompt for maximum cache prefix reuse. Put static content first, dynamic content last.

```
System prompt layout (cache-optimized):
┌──────────────────────────────────┐
│ Static: Identity, capabilities   │ ← Cached across ALL sessions
│ Static: Tool definitions         │ ← Cached across ALL sessions
│ Semi-static: Project context     │ ← Cached within session
│ Dynamic: Current task context    │ ← NOT cached (changes each call)
└──────────────────────────────────┘
```

Audit your system prompt. Move everything that doesn’t change between API calls to the beginning. Even reordering sections to maximize the shared prefix can save 30-60% on input token costs.
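
In code, the layout above amounts to concatenating sections in a fixed order so the static prefix stays byte-identical across calls (most provider caches match on an exact prefix). A minimal sketch — the constant names and prompt text are placeholders, not Claude Code’s actual prompt:

```typescript
// Hypothetical sketch: static sections first, dynamic content last.
const STATIC_IDENTITY = 'You are a coding agent with file and shell access.'; // never changes
const STATIC_TOOLS = 'Available tools: Read, Write, Bash, Grep.';             // never changes

function buildSystemPrompt(projectContext: string, taskContext: string): string {
  return [
    STATIC_IDENTITY, // cached across ALL sessions
    STATIC_TOOLS,    // cached across ALL sessions
    projectContext,  // semi-static: cached within a session
    taskContext,     // dynamic: goes last so it never invalidates the prefix
  ].join('\n\n');
}
```

Two calls with different tasks then share everything up to the task section, which is exactly the prefix the cache can reuse.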

Lesson 5: Sub-Agents Should Reuse Context, Not Rebuild It

Spawning a sub-agent that builds its own system prompt and context from scratch wastes tokens and loses valuable cached state.

Fork sub-agents from the parent’s existing context, sharing the system prompt prefix for cache reuse.

```ts
// ❌ Bad: Each sub-agent builds context independently
const subAgent = createAgent({
  system: buildNewSystemPrompt(),              // Different from parent
  messages: [{ role: 'user', content: task }], // No shared history
});

// ✅ Good: Sub-agent extends parent's cached context
const subAgent = createAgent({
  system: parentAgent.systemPrompt,  // Same prefix → cache hit
  messages: [
    ...parentAgent.messages,         // Shared history → cache hit
    { role: 'user', content: task }, // Only this is new
  ],
});
```

When designing fork/sub-agent mechanisms, pass the parent’s system prompt and message history as a shared prefix. With 3+ forks, this can reduce input token cost by 60%+.

Lesson 6: Start with Single Agent, Scale Multi-Agent on Demand

Over-engineering multi-agent systems for tasks that a single agent handles perfectly. Coordination overhead outweighs any parallelism benefit for simple tasks.

Progressive escalation: single agent → fork → coordinator → team.

```mermaid
graph LR
  Q["Task complexity?"]
  Q -->|"Simple"| S["Single Agent<br/>80% of tasks"]
  Q -->|"Parallel subtasks"| F["Fork<br/>15% of tasks"]
  Q -->|"Dependencies"| C["Coordinator<br/>4% of tasks"]
  Q -->|"Collaboration"| T["Team<br/>1% of tasks"]
  style S fill:#4ade80
  style F fill:#a3e635
  style C fill:#facc15
  style T fill:#fb923c
```

Build your single-agent loop first. Only add multi-agent coordination when you have concrete evidence that single-agent is insufficient for specific task types. The 80/20 rule applies: 80% of tasks work fine with one agent.
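
The escalation ladder in the diagram can be expressed as one routing function that defaults to a single agent. A sketch under assumed names (`TaskShape`, `chooseTopology`) — the actual signals you route on will depend on your system:

```typescript
// Hypothetical escalation sketch: route by observed task shape, defaulting to one agent.
type TaskShape = {
  parallelSubtasks: number;  // independent pieces of work
  hasDependencies: boolean;  // subtasks depend on each other's output
  needsDebate: boolean;      // agents must critique or negotiate
};
type Topology = 'single' | 'fork' | 'coordinator' | 'team';

function chooseTopology(t: TaskShape): Topology {
  if (t.needsDebate) return 'team';            // rare: genuine collaboration
  if (t.hasDependencies) return 'coordinator'; // ordered, dependent subtasks
  if (t.parallelSubtasks > 1) return 'fork';   // independent parallel work
  return 'single';                             // the default — most tasks end here
}
```

The point of encoding it this way is that escalation stays a deliberate, observable decision instead of a default architecture.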

Lesson 7: Security Checks Must Have an Unbypassable Baseline

If security checks are implemented as model instructions (“never run dangerous commands”), prompt injection can override them.

Implement security as application code that runs outside the model’s execution boundary. The model’s output is input to your security pipeline — it cannot modify the pipeline itself.

```ts
// The model generates a tool call
const toolCall = model.generateToolCall(); // Potentially compromised

// Security runs in YOUR code — model cannot influence this
const allowed = securityPipeline.evaluate(toolCall); // Immune to injection
if (!allowed) {
  // Hard reject — no amount of prompt injection changes this
  return { error: 'Denied by security pipeline' };
}
```

Identify your hard security boundaries — the checks that must NEVER be bypassed regardless of what the model outputs. Implement these in application code with no prompt-controllable escape hatches.

Lesson 8: Streaming Makes Experience an Order of Magnitude Better

Traditional request-response: user waits 3-10 seconds seeing nothing, then gets a wall of text. This feels slow and unresponsive.

Stream everything: API responses token-by-token, tool execution progress, and even start executing tools before the full response is received.

```ts
// The key insight: yield progress at every opportunity
async function* streamingExperience(): AsyncGenerator<UIEvent> {
  yield { type: 'thinking' }; // "Claude is thinking..."
  for await (const token of apiStream) {
    yield { type: 'token', text: token }; // Real-time text display
  }
  yield { type: 'tool_start', name: 'Bash' }; // "Running command..."
  for await (const line of toolOutput) {
    yield { type: 'output_line', text: line }; // Live command output
  }
  yield { type: 'complete' };
}
```

If your agent system shows a blank screen for more than 500ms, you’re losing users. Stream the first token within 200ms, show tool execution progress, and overlap API streaming with tool execution when possible.

Lesson 9: Configuration Needs Multi-Layer Source Merging

A single configuration file doesn’t work for tools that operate across different projects, teams, and environments. Users need project-level, user-level, and enterprise-level settings.

Multiple configuration sources with clear precedence: flags > env vars > project config > user config > enterprise config > defaults.

```ts
// Configuration precedence (highest to lowest)
const config = mergeConfigs([
  flagOverrides,    // --model opus
  envVarConfig,     // CLAUDE_MODEL=opus
  projectConfig,    // .claude/settings.json
  userConfig,       // ~/.claude/settings.json
  enterpriseConfig, // /etc/claude/settings.json
  defaults,         // Hardcoded sensible defaults
]);

// Each layer only specifies what it needs to override
// projectConfig: { "model": "sonnet", "tools": { "bash": { "allowedCommands": ["npm", "git"] } } }
// userConfig:    { "theme": "dark" }
// Result: { "model": "sonnet", "theme": "dark", "tools": { "bash": { ... } } }
```

Design your config system with at least 3 layers: project (checked into git), user (personal preferences), and defaults. Use deep merge, not shallow override, so partial configs at any layer work correctly.
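
A minimal deep merge plus precedence fold fits in a few lines. This sketch assumes plain-JSON config objects and treats arrays as scalars (replaced wholesale, not concatenated) — a deliberate simplification:

```typescript
// Minimal deep-merge sketch: nested objects merge key by key, scalars replace.
type Json = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function deepMerge(base: Json, override: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(override)) {
    out[key] =
      isPlainObject(value) && isPlainObject(out[key])
        ? deepMerge(out[key] as Json, value) // merge nested objects recursively
        : value;                             // scalars and arrays replace wholesale
  }
  return out;
}

// mergeConfigs([highest, …, lowest]) — mirrors the precedence list above.
function mergeConfigs(sources: Json[]): Json {
  // Fold from lowest precedence up, so higher layers overwrite lower ones.
  return sources.reduceRight((acc, layer) => deepMerge(acc, layer), {} as Json);
}
```

With this shape, a project config that only sets `model` and a user config that only sets `theme` coexist instead of clobbering each other.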

Lesson 10: Hook Systems Provide the Best Extensibility

Plugin APIs are powerful but complex. Most users want to customize a specific behavior, not build an entire plugin.

A hook system that lets users intercept specific lifecycle events with simple scripts or functions.

```jsonc
// .claude/hooks.json — simple, declarative, powerful
{
  "onToolStart": [
    {
      "matcher": { "tool": "Bash" },
      "action": "log",
      "config": { "file": "./claude-commands.log" }
    }
  ],
  "onToolEnd": [
    {
      "matcher": { "tool": "WriteFile", "path": "*.test.ts" },
      "action": "exec",
      "config": { "command": "npm test" }
    }
  ]
}
```

Before building a plugin API, consider whether a hook system covers 90% of use cases with 10% of the complexity. Hooks are declarative (easy to understand), composable (multiple hooks per event), and safe (run in your process, not theirs).
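
The dispatch side of such a hook system is small. A sketch under assumptions: hooks are in-process callbacks rather than the declarative JSON above, and only a leading-`*` suffix glob (enough for `*.test.ts`) is supported:

```typescript
// Hypothetical hook-dispatch sketch: run every hook whose matcher fits the event.
type HookEvent = { tool: string; path?: string };
type Hook = {
  matcher: { tool: string; path?: string };
  action: (event: HookEvent) => void;
};

// '*.test.ts'-style suffix globs; anything else is an exact match.
function globMatch(pattern: string, value: string): boolean {
  return pattern.startsWith('*')
    ? value.endsWith(pattern.slice(1))
    : pattern === value;
}

function dispatch(hooks: Hook[], event: HookEvent): number {
  let fired = 0;
  for (const hook of hooks) {
    if (hook.matcher.tool !== event.tool) continue;
    if (hook.matcher.path && !(event.path && globMatch(hook.matcher.path, event.path))) continue;
    hook.action(event); // hooks compose: every matching hook runs
    fired++;
  }
  return fired;
}
```

Loading the declarative JSON then reduces to translating each `"action"` string into one of these callbacks — the matching logic stays identical.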

| # | Lesson | Key Metric |
| - | ------ | ---------- |
| 1 | Generators for agent loops | 0 stack overflow risk |
| 2 | Layered security checks | 5 independent layers minimum |
| 3 | Progressive context compression | Start at 60% capacity |
| 4 | Prompt cache optimization | 90% savings on cached tokens |
| 5 | Sub-agent context reuse | 60%+ cost reduction with 3+ forks |
| 6 | Progressive agent scaling | 80% of tasks need only 1 agent |
| 7 | Unbypassable security baseline | 0 prompt-controllable escape hatches |
| 8 | Stream everything | < 200ms to first visible output |
| 9 | Multi-layer configuration | 3+ config source layers |
| 10 | Hook-based extensibility | 90% of customizations, 10% complexity |