Lessons for Agent Builders

After a deep dive into 512K+ lines of Claude Code’s architecture, here are the 10 most actionable lessons for anyone building AI agent systems. Each lesson is distilled from a specific architectural decision and comes with concrete implementation guidance.

Lesson 1: Use Generators for Agent Loops, Not Recursion

Recursive agent loops accumulate stack frames, can’t be observed mid-execution, and have no natural cancellation point.

Async generators provide linear readability, built-in backpressure, native pause/resume, and zero stack growth.

```ts
// ✅ The generator approach
async function* agentLoop(messages: Message[]): AsyncGenerator<Event> {
  while (shouldContinue) {
    const response = await callLLM(messages);
    yield { type: 'response', data: response }; // Observable
    for (const tool of response.toolCalls) {
      yield { type: 'tool_start', tool };       // Pausable
      const result = await executeTool(tool);
      yield { type: 'tool_end', result };       // Resumable
    }
  }
}

// Consumer controls the pace
for await (const event of agentLoop(messages)) {
  if (event.type === 'tool_start' && needsPermission(event.tool)) {
    const ok = await askUser();
    if (!ok) break; // Graceful cancellation
  }
}
```

Replace your `while (true) { ... }` agent loop with an `async function*`. Every yield point becomes a free observation/interception point for logging, permission checks, and UI updates.

Lesson 2: Tool Permissions Need Layered Checks

A single permission check is a single point of failure. If a prompt injection bypasses one check, the tool executes unconstrained.

Multiple independent security layers: any layer can deny, but no single layer can approve on its own.

```ts
// Five categories of checks, each independent
const securityPipeline = [
  syntacticCheck, // Is the input structurally valid?
  semanticCheck,  // What does it intend to do?
  scopeCheck,     // Does it stay within allowed boundaries?
  policyCheck,    // Do configured rules allow this?
  userCheck,      // Does the human approve?
];

// ALL must pass — one deny is final
for (const layer of securityPipeline) {
  const result = await layer(toolInput); // each check is a plain (possibly async) function
  if (result.verdict === 'deny') return { allowed: false };
}
```

At minimum, implement three layers: input validation (is it well-formed?), scope check (is it within bounds?), and user confirmation (does the human agree?). Run them in application code, not as model instructions.
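
Those three minimum layers can be sketched as plain functions composed into a pipeline. This is an illustrative sketch, not Claude Code’s actual implementation — the type names (`ToolCall`, `Verdict`, `Layer`) and the specific checks are assumptions:

```typescript
// Hypothetical shapes — adapt to your own tool-call representation.
type ToolCall = { tool: string; input: { path?: string; command?: string } };
type Verdict = { verdict: 'allow' | 'deny'; reason?: string };
type Layer = (call: ToolCall) => Verdict;

// Layer 1: input validation — is the call well-formed?
const validateInput: Layer = (call) =>
  call.tool && typeof call.input === 'object'
    ? { verdict: 'allow' }
    : { verdict: 'deny', reason: 'malformed tool call' };

// Layer 2: scope check — does the path stay inside the project root?
const checkScope: Layer = (call) =>
  call.input.path && call.input.path.startsWith('..')
    ? { verdict: 'deny', reason: 'path escapes project root' }
    : { verdict: 'allow' };

// Layer 3: user confirmation — stubbed here as "shell commands always need approval".
const confirmWithUser: Layer = (call) =>
  call.tool === 'Bash'
    ? { verdict: 'deny', reason: 'needs explicit approval' }
    : { verdict: 'allow' };

// ALL layers must pass; the first deny is final.
function evaluate(call: ToolCall, layers: Layer[]): Verdict {
  for (const layer of layers) {
    const result = layer(call);
    if (result.verdict === 'deny') return result;
  }
  return { verdict: 'allow' };
}
```

Because each layer is an independent function, adding a fourth or fifth check is just appending to the array — no layer needs to know about the others.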

Lesson 3: Context Compression is the Lifeline for Long Conversations

Without compression, every agent conversation has a hard ceiling (~15-20 minutes). Token accumulation from tool results, file reads, and conversation history inevitably exceeds the context window.

Progressive 4-layer compression: Snip → Microcompact → Auto Compact → Hard Truncate. Start early (60% capacity), not when it’s too late.

```ts
// The critical insight: start compressing well before the limit
const THRESHOLDS = {
  snip: 0.4,          // Snip large tool results at 40%
  microcompact: 0.6,  // Summarize old results at 60%
  autoCompact: 0.8,   // Summarize conversation at 80%
  hardTruncate: 0.95, // Emergency drop at 95%
};
```

Implement at least Snip (truncate large tool outputs to head + tail) and Auto Compact (summarize old messages). These two layers alone extend session lifetime from ~20 minutes to potentially hours.
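
A head + tail snip is only a few lines. This is a minimal sketch (the function name, marker text, and character-based budget are assumptions — a production version would budget in tokens):

```typescript
// Hypothetical sketch: keep the head and tail of an oversized tool result.
function snip(text: string, maxChars: number, marker = '\n…[snipped]…\n'): string {
  if (text.length <= maxChars) return text; // small results pass through untouched
  const budget = maxChars - marker.length;
  const head = Math.ceil(budget / 2);
  const tail = Math.floor(budget / 2);
  // Head keeps the command/context; tail keeps the final output, where errors usually live.
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}
```

Keeping both ends matters: build logs and test runs typically put the failure summary at the tail, while the head identifies what was run.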

Lesson 4: Prompt Cache is Your #1 Cost Control Lever

API costs scale with input tokens. An agent making 20+ API calls per session, each with a large system prompt, accumulates significant costs.

Structure your system prompt for maximum cache prefix reuse. Put static content first, dynamic content last.

```
System prompt layout (cache-optimized):
┌──────────────────────────────────┐
│ Static: Identity, capabilities   │ ← Cached across ALL sessions
│ Static: Tool definitions         │ ← Cached across ALL sessions
│ Semi-static: Project context     │ ← Cached within session
│ Dynamic: Current task context    │ ← NOT cached (changes each call)
└──────────────────────────────────┘
```

Audit your system prompt. Move everything that doesn’t change between API calls to the beginning. Even reordering sections to maximize the shared prefix can save 30-60% on input token costs.
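
In code, the layout above amounts to concatenating sections in a fixed order so the static prefix stays byte-identical across calls (most provider caches match on an exact prefix). A minimal sketch — the constant names and prompt text are placeholders, not Claude Code’s actual prompt:

```typescript
// Hypothetical sketch: static sections first, dynamic content last.
const STATIC_IDENTITY = 'You are a coding agent with file and shell access.'; // never changes
const STATIC_TOOLS = 'Available tools: Read, Write, Bash, Grep.';             // never changes

function buildSystemPrompt(projectContext: string, taskContext: string): string {
  return [
    STATIC_IDENTITY, // cached across ALL sessions
    STATIC_TOOLS,    // cached across ALL sessions
    projectContext,  // semi-static: cached within a session
    taskContext,     // dynamic: goes last so it never invalidates the prefix
  ].join('\n\n');
}
```

Two calls with different tasks then share everything up to the task section, which is exactly the prefix the cache can reuse.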

Lesson 5: Sub-Agents Should Reuse Context, Not Rebuild It

Spawning a sub-agent that builds its own system prompt and context from scratch wastes tokens and loses valuable cached state.

Fork sub-agents from the parent’s existing context, sharing the system prompt prefix for cache reuse.

```ts
// ❌ Bad: Each sub-agent builds context independently
const subAgent = createAgent({
  system: buildNewSystemPrompt(),              // Different from parent
  messages: [{ role: 'user', content: task }], // No shared history
});

// ✅ Good: Sub-agent extends parent's cached context
const subAgent = createAgent({
  system: parentAgent.systemPrompt,  // Same prefix → cache hit
  messages: [
    ...parentAgent.messages,         // Shared history → cache hit
    { role: 'user', content: task }, // Only this is new
  ],
});
```

When designing fork/sub-agent mechanisms, pass the parent’s system prompt and message history as a shared prefix. With 3+ forks, this can reduce input token cost by 60%+.

Lesson 6: Start with Single Agent, Scale Multi-Agent on Demand

Over-engineering multi-agent systems for tasks that a single agent handles perfectly. Coordination overhead outweighs any parallelism benefit for simple tasks.

Progressive escalation: single agent → fork → coordinator → team.

```mermaid
graph LR
  Q["Task complexity?"]
  Q -->|"Simple"| S["Single Agent<br/>80% of tasks"]
  Q -->|"Parallel subtasks"| F["Fork<br/>15% of tasks"]
  Q -->|"Dependencies"| C["Coordinator<br/>4% of tasks"]
  Q -->|"Collaboration"| T["Team<br/>1% of tasks"]
  style S fill:#4ade80
  style F fill:#a3e635
  style C fill:#facc15
  style T fill:#fb923c
```

Build your single-agent loop first. Only add multi-agent coordination when you have concrete evidence that single-agent is insufficient for specific task types. The 80/20 rule applies: 80% of tasks work fine with one agent.
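
The escalation ladder in the diagram can be expressed as one routing function that defaults to a single agent. A sketch under assumed names (`TaskShape`, `chooseTopology`) — the actual signals you route on will depend on your system:

```typescript
// Hypothetical escalation sketch: route by observed task shape, defaulting to one agent.
type TaskShape = {
  parallelSubtasks: number;  // independent pieces of work
  hasDependencies: boolean;  // subtasks depend on each other's output
  needsDebate: boolean;      // agents must critique or negotiate
};
type Topology = 'single' | 'fork' | 'coordinator' | 'team';

function chooseTopology(t: TaskShape): Topology {
  if (t.needsDebate) return 'team';            // rare: genuine collaboration
  if (t.hasDependencies) return 'coordinator'; // ordered, dependent subtasks
  if (t.parallelSubtasks > 1) return 'fork';   // independent parallel work
  return 'single';                             // the default — most tasks end here
}
```

The point of encoding it this way is that escalation stays a deliberate, observable decision instead of a default architecture.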

Lesson 7: Security Checks Must Have an Unbypassable Baseline

If security checks are implemented as model instructions (“never run dangerous commands”), prompt injection can override them.

Implement security as application code that runs outside the model’s execution boundary. The model’s output is input to your security pipeline — it cannot modify the pipeline itself.

```ts
// The model generates a tool call
const toolCall = model.generateToolCall(); // Potentially compromised

// Security runs in YOUR code — model cannot influence this
const allowed = securityPipeline.evaluate(toolCall); // Immune to injection
if (!allowed) {
  // Hard reject — no amount of prompt injection changes this
  return { error: 'Denied by security pipeline' };
}
```

Identify your hard security boundaries — the checks that must NEVER be bypassed regardless of what the model outputs. Implement these in application code with no prompt-controllable escape hatches.

Lesson 8: Streaming Makes Experience an Order of Magnitude Better

Traditional request-response: user waits 3-10 seconds seeing nothing, then gets a wall of text. This feels slow and unresponsive.

Stream everything: API responses token-by-token, tool execution progress, and even start executing tools before the full response is received.

```ts
// The key insight: yield progress at every opportunity
async function* streamingExperience(): AsyncGenerator<UIEvent> {
  yield { type: 'thinking' }; // "Claude is thinking..."
  for await (const token of apiStream) {
    yield { type: 'token', text: token }; // Real-time text display
  }
  yield { type: 'tool_start', name: 'Bash' }; // "Running command..."
  for await (const line of toolOutput) {
    yield { type: 'output_line', text: line }; // Live command output
  }
  yield { type: 'complete' };
}
```

If your agent system shows a blank screen for more than 500ms, you’re losing users. Stream the first token within 200ms, show tool execution progress, and overlap API streaming with tool execution when possible.

Lesson 9: Configuration Needs Multi-Layer Source Merging

A single configuration file doesn’t work for tools that operate across different projects, teams, and environments. Users need project-level, user-level, and enterprise-level settings.

Multiple configuration sources with clear precedence: flags > env vars > project config > user config > enterprise config > defaults.

```ts
// Configuration precedence (highest to lowest)
const config = mergeConfigs([
  flagOverrides,    // --model opus
  envVarConfig,     // CLAUDE_MODEL=opus
  projectConfig,    // .claude/settings.json
  userConfig,       // ~/.claude/settings.json
  enterpriseConfig, // /etc/claude/settings.json
  defaults,         // Hardcoded sensible defaults
]);

// Each layer only specifies what it needs to override
// projectConfig: { "model": "sonnet", "tools": { "bash": { "allowedCommands": ["npm", "git"] } } }
// userConfig:    { "theme": "dark" }
// Result: { "model": "sonnet", "theme": "dark", "tools": { "bash": { ... } } }
```

Design your config system with at least 3 layers: project (checked into git), user (personal preferences), and defaults. Use deep merge, not shallow override, so partial configs at any layer work correctly.
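
A minimal deep merge plus precedence fold fits in a few lines. This sketch assumes plain-JSON config objects and treats arrays as scalars (replaced wholesale, not concatenated) — a deliberate simplification:

```typescript
// Minimal deep-merge sketch: nested objects merge key by key, scalars replace.
type Json = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function deepMerge(base: Json, override: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(override)) {
    out[key] =
      isPlainObject(value) && isPlainObject(out[key])
        ? deepMerge(out[key] as Json, value) // merge nested objects recursively
        : value;                             // scalars and arrays replace wholesale
  }
  return out;
}

// mergeConfigs([highest, …, lowest]) — mirrors the precedence list above.
function mergeConfigs(sources: Json[]): Json {
  // Fold from lowest precedence up, so higher layers overwrite lower ones.
  return sources.reduceRight((acc, layer) => deepMerge(acc, layer), {} as Json);
}
```

With this shape, a project config that only sets `model` and a user config that only sets `theme` coexist instead of clobbering each other.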

Lesson 10: Hook Systems Provide the Best Extensibility

Plugin APIs are powerful but complex. Most users want to customize a specific behavior, not build an entire plugin.

A hook system that lets users intercept specific lifecycle events with simple scripts or functions.

```jsonc
// .claude/hooks.json — simple, declarative, powerful
{
  "onToolStart": [
    {
      "matcher": { "tool": "Bash" },
      "action": "log",
      "config": { "file": "./claude-commands.log" }
    }
  ],
  "onToolEnd": [
    {
      "matcher": { "tool": "WriteFile", "path": "*.test.ts" },
      "action": "exec",
      "config": { "command": "npm test" }
    }
  ]
}
```

Before building a plugin API, consider whether a hook system covers 90% of use cases with 10% of the complexity. Hooks are declarative (easy to understand), composable (multiple hooks per event), and safe (run in your process, not theirs).
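
The dispatch side of such a hook system is small. A sketch under assumptions: hooks are in-process callbacks rather than the declarative JSON above, and only a leading-`*` suffix glob (enough for `*.test.ts`) is supported:

```typescript
// Hypothetical hook-dispatch sketch: run every hook whose matcher fits the event.
type HookEvent = { tool: string; path?: string };
type Hook = {
  matcher: { tool: string; path?: string };
  action: (event: HookEvent) => void;
};

// '*.test.ts'-style suffix globs; anything else is an exact match.
function globMatch(pattern: string, value: string): boolean {
  return pattern.startsWith('*')
    ? value.endsWith(pattern.slice(1))
    : pattern === value;
}

function dispatch(hooks: Hook[], event: HookEvent): number {
  let fired = 0;
  for (const hook of hooks) {
    if (hook.matcher.tool !== event.tool) continue;
    if (hook.matcher.path && !(event.path && globMatch(hook.matcher.path, event.path))) continue;
    hook.action(event); // hooks compose: every matching hook runs
    fired++;
  }
  return fired;
}
```

Loading the declarative JSON then reduces to translating each `"action"` string into one of these callbacks — the matching logic stays identical.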

| # | Lesson | Key Metric |
| - | ------ | ---------- |
| 1 | Generators for agent loops | 0 stack overflow risk |
| 2 | Layered security checks | 5 independent layers minimum |
| 3 | Progressive context compression | Start at 60% capacity |
| 4 | Prompt cache optimization | 90% savings on cached tokens |
| 5 | Sub-agent context reuse | 60%+ cost reduction with 3+ forks |
| 6 | Progressive agent scaling | 80% of tasks need only 1 agent |
| 7 | Unbypassable security baseline | 0 prompt-controllable escape hatches |
| 8 | Stream everything | < 200ms to first visible output |
| 9 | Multi-layer configuration | 3+ config source layers |
| 10 | Hook-based extensibility | 90% of customizations, 10% complexity |