# Lessons for Agent Builders

## Overview

After deep-diving into 512K+ lines of Claude Code’s architecture, these are the 10 most actionable lessons for anyone building AI agent systems. Each lesson is distilled from a specific architectural decision and includes concrete implementation guidance.
## Lesson 1: Use Generators for Agent Loops, Not Recursion

### The Problem

Recursive agent loops accumulate stack frames, can’t be observed mid-execution, and have no natural cancellation point.

### The Solution

Async generators provide linear readability, built-in backpressure, native pause/resume, and zero stack growth.
```typescript
// ✅ The generator approach
async function* agentLoop(messages: Message[]): AsyncGenerator<Event> {
  while (shouldContinue) {
    const response = await callLLM(messages);
    yield { type: 'response', data: response }; // Observable

    for (const tool of response.toolCalls) {
      yield { type: 'tool_start', tool };       // Pausable
      const result = await executeTool(tool);
      yield { type: 'tool_end', result };       // Resumable
    }
  }
}

// Consumer controls the pace
for await (const event of agentLoop(messages)) {
  if (event.type === 'tool_start' && needsPermission(event.tool)) {
    const ok = await askUser();
    if (!ok) break; // Graceful cancellation
  }
}
```

### Actionable Takeaway

Replace your `while (true) { ... }` agent loop with an `async function*`. Every yield point becomes a free observation/interception point for logging, permission checks, and UI updates.
## Lesson 2: Tool Permissions Need Layered Checks

### The Problem

A single permission check is a single point of failure. If a prompt injection bypasses one check, the tool executes unconstrained.

### The Solution

Multiple independent security layers, where any layer can deny but no single layer can approve alone.
```typescript
// Five categories of checks, each independent
const securityPipeline = [
  syntacticCheck, // Is the input structurally valid?
  semanticCheck,  // What does it intend to do?
  scopeCheck,     // Does it stay within allowed boundaries?
  policyCheck,    // Do configured rules allow this?
  userCheck,      // Does the human approve?
];

// ALL must pass — one deny is final
for (const layer of securityPipeline) {
  const result = layer.check(toolInput);
  if (result.verdict === 'deny') return { allowed: false };
}
```

### Actionable Takeaway

At minimum, implement three layers: input validation (is it well-formed?), scope check (is it within bounds?), and user confirmation (does the human agree?). Run them in application code, not as model instructions.
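A minimal synchronous sketch of those three layers. All names, the tool allowlist, and the stubbed user confirmation are illustrative assumptions, not Claude Code’s actual implementation:

```typescript
// Any layer can deny; no single layer can approve alone.
type Verdict = { allowed: boolean; reason?: string };
interface ToolInput { tool: string; args: Record<string, unknown>; }
type Layer = (input: ToolInput) => Verdict;

// Layer 1: is the input well-formed?
const validateInput: Layer = ({ tool, args }) =>
  tool.length > 0 && typeof args === 'object' && args !== null
    ? { allowed: true }
    : { allowed: false, reason: 'malformed tool input' };

// Layer 2: is the tool within configured bounds? (illustrative allowlist)
const ALLOWED_TOOLS = new Set(['Read', 'Grep']);
const checkScope: Layer = ({ tool }) =>
  ALLOWED_TOOLS.has(tool)
    ? { allowed: true }
    : { allowed: false, reason: `tool ${tool} outside allowed scope` };

// Layer 3: does the human agree? (stub; a real version would prompt and await)
const confirmWithUser: Layer = () => ({ allowed: true });

function evaluate(input: ToolInput, layers: Layer[]): Verdict {
  for (const layer of layers) {
    const verdict = layer(input);
    if (!verdict.allowed) return verdict; // one deny is final
  }
  return { allowed: true };
}
```

Because `evaluate` short-circuits on the first deny, adding a new layer can only make the pipeline stricter, never looser.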
## Lesson 3: Context Compression is the Lifeline for Long Conversations

### The Problem

Without compression, every agent conversation has a hard ceiling (~15-20 minutes). Token accumulation from tool results, file reads, and conversation history inevitably exceeds the context window.

### The Solution

Progressive 4-layer compression: Snip → Microcompact → Auto Compact → Hard Truncate. Start early (60% capacity), not when it’s too late.
```typescript
// The critical insight: start compressing well before the limit
const THRESHOLDS = {
  snip: 0.4,          // Snip large tool results at 40%
  microcompact: 0.6,  // Summarize old results at 60%
  autoCompact: 0.8,   // Summarize conversation at 80%
  hardTruncate: 0.95, // Emergency drop at 95%
};
```

### Actionable Takeaway

Implement at least Snip (truncate large tool outputs to head + tail) and Auto Compact (summarize old messages). These two layers alone extend session lifetime from ~20 minutes to potentially hours.
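The Snip layer described above can be sketched in a few lines. The character budget and head/tail split are illustrative defaults, not Claude Code’s actual limits:

```typescript
// Keep the head and tail of an oversized tool result, replacing the
// middle with a marker that tells the model content was removed.
function snip(output: string, maxChars = 4000, headRatio = 0.5): string {
  if (output.length <= maxChars) return output; // small results pass through

  const headLen = Math.floor(maxChars * headRatio);
  const tailLen = maxChars - headLen;
  const dropped = output.length - maxChars;

  return (
    output.slice(0, headLen) +
    `\n... [${dropped} characters snipped] ...\n` +
    output.slice(-tailLen)
  );
}
```

Head + tail works well in practice because tool output tends to front-load context (the command, the first errors) and back-load conclusions (exit status, summaries); the middle is usually the most expendable.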
## Lesson 4: Prompt Cache is Your #1 Cost Control Lever

### The Problem

API costs scale with input tokens. An agent making 20+ API calls per session, each with a large system prompt, accumulates significant costs.

### The Solution

Structure your system prompt for maximum cache prefix reuse. Put static content first, dynamic content last.
```text
System prompt layout (cache-optimized):
┌─────────────────────────────────┐
│ Static: Identity, capabilities  │ ← Cached across ALL sessions
│ Static: Tool definitions        │ ← Cached across ALL sessions
│ Semi-static: Project context    │ ← Cached within session
│ Dynamic: Current task context   │ ← NOT cached (changes each call)
└─────────────────────────────────┘
```

### Actionable Takeaway

Audit your system prompt. Move everything that doesn’t change between API calls to the beginning. Even reordering sections to maximize the shared prefix can save 30-60% on input token costs.
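One way to enforce that ordering mechanically is to tag each prompt section with its volatility and sort before assembly. This is a sketch under assumed names (`PromptSection`, `buildSystemPrompt` are not from Claude Code):

```typescript
// Assemble the system prompt so stable content forms the longest
// possible shared prefix across API calls.
interface PromptSection {
  content: string;
  volatility: 'static' | 'semi-static' | 'dynamic';
}

const ORDER = { 'static': 0, 'semi-static': 1, 'dynamic': 2 } as const;

function buildSystemPrompt(sections: PromptSection[]): string {
  return sections
    .map((s, i) => [s, i] as const)
    // Sort by volatility; the index tiebreak preserves authoring order
    // within each tier so the prefix stays byte-identical across calls.
    .sort((a, b) => ORDER[a[0].volatility] - ORDER[b[0].volatility] || a[1] - b[1])
    .map(([s]) => s.content)
    .join('\n\n');
}
```

The key property: as long as the static and semi-static sections are byte-identical between calls, every provider that does prefix-based caching can reuse them; one stray timestamp near the top invalidates everything after it.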
## Lesson 5: Sub-Agents Should Reuse Context, Not Rebuild It

### The Problem

Spawning a sub-agent that builds its own system prompt and context from scratch wastes tokens and loses valuable cached state.

### The Solution

Fork sub-agents from the parent’s existing context, sharing the system prompt prefix for cache reuse.
```typescript
// ❌ Bad: Each sub-agent builds context independently
const subAgent = createAgent({
  system: buildNewSystemPrompt(),              // Different from parent
  messages: [{ role: 'user', content: task }], // No shared history
});

// ✅ Good: Sub-agent extends parent's cached context
const subAgent = createAgent({
  system: parentAgent.systemPrompt,  // Same prefix → cache hit
  messages: [
    ...parentAgent.messages,         // Shared history → cache hit
    { role: 'user', content: task }, // Only this is new
  ],
});
```

### Actionable Takeaway

When designing fork/sub-agent mechanisms, pass the parent’s system prompt and message history as a shared prefix. With 3+ forks, this can reduce input token cost by 60%+.
## Lesson 6: Start with a Single Agent, Scale to Multi-Agent on Demand

### The Problem

Multi-agent systems are often over-engineered for tasks that a single agent handles perfectly. For simple tasks, coordination overhead outweighs any parallelism benefit.

### The Solution

Progressive escalation: single agent → fork → coordinator → team.
```mermaid
graph LR
  Q["Task complexity?"]
  Q -->|"Simple"| S["Single Agent<br/>80% of tasks"]
  Q -->|"Parallel subtasks"| F["Fork<br/>15% of tasks"]
  Q -->|"Dependencies"| C["Coordinator<br/>4% of tasks"]
  Q -->|"Collaboration"| T["Team<br/>1% of tasks"]

  style S fill:#4ade80
  style F fill:#a3e635
  style C fill:#facc15
  style T fill:#fb923c
```

### Actionable Takeaway

Build your single-agent loop first. Only add multi-agent coordination when you have concrete evidence that a single agent is insufficient for specific task types. The 80/20 rule applies: 80% of tasks work fine with one agent.
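The escalation ladder in the diagram can be made explicit as a decision function. The `TaskProfile` shape and thresholds here are hypothetical, intended only to show that each rung is the lightest topology that satisfies the task:

```typescript
// Pick the cheapest topology that can handle the task.
type Topology = 'single' | 'fork' | 'coordinator' | 'team';

interface TaskProfile {
  subtasks: number;         // independent units of work
  hasDependencies: boolean; // subtasks consume each other's output
  needsDebate: boolean;     // agents must critique or negotiate
}

function chooseTopology(t: TaskProfile): Topology {
  if (t.needsDebate) return 'team';          // rarest, most expensive
  if (t.hasDependencies) return 'coordinator';
  if (t.subtasks > 1) return 'fork';         // parallel but independent
  return 'single';                           // the default for most tasks
}
```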
## Lesson 7: Security Checks Must Have an Unbypassable Baseline

### The Problem

If security checks are implemented as model instructions (“never run dangerous commands”), prompt injection can override them.

### The Solution

Implement security as application code that runs outside the model’s execution boundary. The model’s output is input to your security pipeline — it cannot modify the pipeline itself.
```typescript
// The model generates a tool call
const toolCall = model.generateToolCall(); // Potentially compromised

// Security runs in YOUR code — model cannot influence this
const allowed = securityPipeline.evaluate(toolCall); // Immune to injection

if (!allowed) {
  // Hard reject — no amount of prompt injection changes this
  return { error: 'Denied by security pipeline' };
}
```

### Actionable Takeaway

Identify your hard security boundaries — the checks that must NEVER be bypassed regardless of what the model outputs. Implement these in application code with no prompt-controllable escape hatches.
## Lesson 8: Streaming Makes the Experience an Order of Magnitude Better

### The Problem

Traditional request-response: the user waits 3-10 seconds seeing nothing, then gets a wall of text. This feels slow and unresponsive.

### The Solution

Stream everything: API responses token-by-token, tool execution progress, and even start executing tools before the full response is received.
```typescript
// The key insight: yield progress at every opportunity
async function* streamingExperience(): AsyncGenerator<UIEvent> {
  yield { type: 'thinking' }; // "Claude is thinking..."

  for await (const token of apiStream) {
    yield { type: 'token', text: token }; // Real-time text display
  }

  yield { type: 'tool_start', name: 'Bash' }; // "Running command..."

  for await (const line of toolOutput) {
    yield { type: 'output_line', text: line }; // Live command output
  }

  yield { type: 'complete' };
}
```

### Actionable Takeaway

If your agent system shows a blank screen for more than 500ms, you’re losing users. Stream the first token within 200ms, show tool execution progress, and overlap API streaming with tool execution when possible.
## Lesson 9: Configuration Needs Multi-Layer Source Merging

### The Problem

A single configuration file doesn’t work for tools that operate across different projects, teams, and environments. Users need project-level, user-level, and enterprise-level settings.

### The Solution

Multiple configuration sources with clear precedence: flags > env vars > project config > user config > enterprise config > defaults.
```typescript
// Configuration precedence (highest to lowest)
const config = mergeConfigs([
  flagOverrides,    // --model opus
  envVarConfig,     // CLAUDE_MODEL=opus
  projectConfig,    // .claude/settings.json
  userConfig,       // ~/.claude/settings.json
  enterpriseConfig, // /etc/claude/settings.json
  defaults,         // Hardcoded sensible defaults
]);

// Each layer only specifies what it needs to override
// projectConfig: { "model": "sonnet", "tools": { "bash": { "allowedCommands": ["npm", "git"] } } }
// userConfig:    { "theme": "dark" }
// Result:        { "model": "sonnet", "theme": "dark", "tools": { "bash": { ... } } }
```

### Actionable Takeaway

Design your config system with at least 3 layers: project (checked into git), user (personal preferences), and defaults. Use deep merge, not shallow override, so partial configs at any layer work correctly.
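The deep-merge requirement deserves a concrete sketch, since it is the step most implementations get wrong by reaching for `Object.assign` or spread. This is a minimal version (the `mergeConfigs` helper above is assumed to work roughly like `resolveConfig` here):

```typescript
// Layered config resolution: higher-precedence sources win per key, but
// nested objects are merged rather than replaced wholesale.
type Config = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Config {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function deepMerge(high: Config, low: Config): Config {
  const out: Config = { ...low };
  for (const [key, value] of Object.entries(high)) {
    out[key] =
      isPlainObject(value) && isPlainObject(low[key])
        ? deepMerge(value, low[key] as Config) // recurse into shared objects
        : value; // higher-precedence scalars and arrays win outright
  }
  return out;
}

// Resolve a list of sources ordered highest-precedence first.
const resolveConfig = (sources: Config[]): Config =>
  sources.reduceRight((acc, src) => deepMerge(src, acc), {});
```

With shallow override, a user config containing any `tools` key would wipe out the project’s entire `tools` subtree; deep merge lets each layer contribute only the leaves it cares about.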
## Lesson 10: Hook Systems Provide the Best Extensibility

### The Problem

Plugin APIs are powerful but complex. Most users want to customize a specific behavior, not build an entire plugin.

### The Solution

A hook system that lets users intercept specific lifecycle events with simple scripts or functions.
```jsonc
// .claude/hooks.json — simple, declarative, powerful
{
  "onToolStart": [
    {
      "matcher": { "tool": "Bash" },
      "action": "log",
      "config": { "file": "./claude-commands.log" }
    }
  ],
  "onToolEnd": [
    {
      "matcher": { "tool": "WriteFile", "path": "*.test.ts" },
      "action": "exec",
      "config": { "command": "npm test" }
    }
  ]
}
```

### Actionable Takeaway

Before building a plugin API, consider whether a hook system covers 90% of use cases with 10% of the complexity. Hooks are declarative (easy to understand), composable (multiple hooks per event), and safe (run in your process, not theirs).
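The host side of such a system reduces to matching declared hooks against each lifecycle event. A sketch, with assumed field names mirroring the JSON above (the glob support here is deliberately minimal, handling only `*`):

```typescript
// Match declarative hooks against a lifecycle event.
interface Hook {
  matcher: { tool?: string; path?: string };
  action: string;
}

// Convert a simple glob like "*.test.ts" into a RegExp:
// escape regex metacharacters, then treat "*" as ".*".
const globToRegExp = (glob: string) =>
  new RegExp(
    '^' +
      glob
        .split('*')
        .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
        .join('.*') +
      '$'
  );

function matchingHooks(hooks: Hook[], event: { tool: string; path?: string }): Hook[] {
  return hooks.filter(h => {
    if (h.matcher.tool && h.matcher.tool !== event.tool) return false;
    if (h.matcher.path && (!event.path || !globToRegExp(h.matcher.path).test(event.path))) return false;
    return true; // every specified matcher field agreed
  });
}
```

Because matchers are pure data, the host can validate them at load time, list them for the user, and run each hook's action with whatever sandboxing policy it chooses.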
## Quick Reference Card

| # | Lesson | Key Metric |
|---|---|---|
| 1 | Generators for agent loops | 0 stack overflow risk |
| 2 | Layered security checks | 5 independent layers minimum |
| 3 | Progressive context compression | Start at 60% capacity |
| 4 | Prompt cache optimization | 90% savings on cached tokens |
| 5 | Sub-agent context reuse | 60%+ cost reduction with 3+ forks |
| 6 | Progressive agent scaling | 80% of tasks need only 1 agent |
| 7 | Unbypassable security baseline | 0 prompt-controllable escape hatches |
| 8 | Stream everything | < 200ms to first visible output |
| 9 | Multi-layer configuration | 3+ config source layers |
| 10 | Hook-based extensibility | 90% of customizations, 10% complexity |