
The Loop

The agentic loop is the beating heart of Claude Code. It is the engine that transforms a single user message into a multi-step, tool-using conversation with the Claude API. This chapter is a deep dive into the while(true) loop inside src/query.ts — the most critical ~1700 lines in the entire codebase.

Claude Code’s agentic loop is split across two layers:

| Layer | File | Responsibility |
| --- | --- | --- |
| Outer | src/QueryEngine.ts | Session lifecycle, message persistence, budget enforcement, SDK message yield |
| Inner | src/query.ts | The while(true) loop — API calls, streaming, tool dispatch, continue decisions |

The outer layer (QueryEngine.submitMessage) is an AsyncGenerator that drives the inner layer (query()) and post-processes its output for SDK consumers. The inner layer is where the actual agentic reasoning happens.
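The two-layer shape can be sketched as a pair of nested async generators — a minimal illustration, not the real QueryEngine or query() signatures; all names here are stand-ins:

```typescript
// Inner layer: the agentic loop, yielding raw events (stand-in for query()).
type InnerEvent = { type: "assistant" | "tool_result"; text: string };

async function* innerLoop(turns: string[]): AsyncGenerator<InnerEvent> {
  for (const text of turns) {
    yield { type: "assistant", text };
  }
}

// Outer layer: session-level concerns wrap every inner yield
// (stand-in for QueryEngine.submitMessage).
async function* outerEngine(turns: string[]): AsyncGenerator<string> {
  for await (const event of innerLoop(turns)) {
    // Persistence and budget checks would live here, once per yielded event.
    yield `sdk:${event.type}:${event.text}`;
  }
}
```

Because the outer generator pulls from the inner one, every message passes through exactly one post-processing point before reaching the SDK consumer.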

graph TD
A[User Message] --> B[QueryEngine.submitMessage]
B --> C[processUserInput]
C --> D[query loop entry]
D --> E{while true}
E --> F[Message Build & Compact]
F --> G[API Call & Stream]
G --> H{needsFollowUp?}
H -- yes --> I[Tool Dispatch]
I --> J[Result Collection]
J --> K[Attachment Injection]
K --> L{maxTurns reached?}
L -- no --> E
L -- yes --> M[Return max_turns]
H -- no --> N{Stop Hooks}
N --> O[Return completed]
B --> P[Yield SDK Messages]

The core loop lives in queryLoop() inside src/query.ts. It is a literal while (true) — there is no iteration count, no for-loop bound. The loop runs until an explicit return statement fires.

// src/query.ts — the core loop structure
async function* queryLoop(
  params: QueryParams,
  consumedCommandUuids: string[],
): AsyncGenerator<
  StreamEvent | RequestStartEvent | Message | TombstoneMessage | ToolUseSummaryMessage,
  Terminal
> {
  const { systemPrompt, userContext, systemContext, canUseTool, ... } = params;
  let state: State = {
    messages: params.messages,
    toolUseContext: params.toolUseContext,
    autoCompactTracking: undefined,
    maxOutputTokensRecoveryCount: 0,
    hasAttemptedReactiveCompact: false,
    turnCount: 1,
    transition: undefined,
    // ...
  };
  // eslint-disable-next-line no-constant-condition
  while (true) {
    let { toolUseContext } = state;
    const { messages, turnCount, ... } = state;
    // === STAGE 1-7 happen here ===
    state = next; // transition to next iteration
  }
}

The loop cannot be expressed as a bounded iteration because:

  1. Unpredictable tool chains: The model may call 1 tool or 50 tools before finishing.
  2. Recovery loops: max_output_tokens hit → inject recovery message → re-enter. This can happen up to 3 times per turn.
  3. Reactive compaction: If the context is too large, compact and retry — but only once.
  4. Stop hooks: A hook can block the response, injecting an error message and forcing another iteration.
  5. Token budget continuation: When enabled, the loop auto-continues if the model stopped early relative to its token budget.
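Although unbounded, the loop still terminates deterministically: every exit path is an explicit return of a terminal value. A minimal sketch of that shape — the variant names match the returns shown later in this chapter, but the helper function is illustrative:

```typescript
// Sketch of the terminal result the loop's return statements produce.
type Terminal =
  | { reason: "completed" }
  | { reason: "max_turns"; turnCount: number };

// Illustrative helper: the only way out of while(true) is one of these.
function describeTerminal(t: Terminal): string {
  switch (t.reason) {
    case "completed":
      return "model finished with no pending tool results";
    case "max_turns":
      return `stopped after ${t.turnCount} turns`;
  }
}
```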

Each pass through the while(true) loop executes 7 conceptual stages:

Stage 1: Message Build & Context Preparation


The first stage prepares the message array for the API call. This involves several sub-steps:

// 1a. Get messages after the last compact boundary
let messagesForQuery = [...getMessagesAfterCompactBoundary(messages)];

// 1b. Apply tool result budget (persist large results to disk)
messagesForQuery = await applyToolResultBudget(
  messagesForQuery,
  toolUseContext.contentReplacementState,
  // ...
);

// 1c. Apply snip compaction (HISTORY_SNIP feature)
if (feature('HISTORY_SNIP')) {
  const snipResult = snipModule!.snipCompactIfNeeded(messagesForQuery);
  messagesForQuery = snipResult.messages;
  snipTokensFreed = snipResult.tokensFreed;
}

// 1d. Apply microcompact (remove stale tool results)
const microcompactResult = await deps.microcompact(
  messagesForQuery, toolUseContext, querySource
);
messagesForQuery = microcompactResult.messages;

// 1e. Apply context collapse (if enabled)
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
  const collapseResult = await contextCollapse.applyCollapsesIfNeeded(
    messagesForQuery, toolUseContext, querySource
  );
  messagesForQuery = collapseResult.messages;
}

// 1f. Build the full system prompt
const fullSystemPrompt = asSystemPrompt(
  appendSystemContext(systemPrompt, systemContext)
);

// 1g. Auto-compact if needed
const { compactionResult } = await deps.autocompact(
  messagesForQuery, toolUseContext, /* ... */
);

The message preparation pipeline has a strict ordering: tool result budget → snip → microcompact → context collapse → autocompact. Each stage can reduce the token count, potentially preventing the next stage from triggering.
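The strict ordering can be modeled as a reducer pipeline — a sketch of the idea, not the real src/query.ts helpers; the stage implementations below are illustrative:

```typescript
// Each stage may shrink the message list; it runs only on the output of
// the previous stage, so an early, cheap stage can keep a later, more
// destructive one (like autocompact) from triggering at all.
type Msg = { tokens: number };
type Reducer = (msgs: Msg[]) => Msg[];

function runPipeline(msgs: Msg[], stages: Reducer[]): Msg[] {
  return stages.reduce((acc, stage) => stage(acc), msgs);
}

// Illustrative stages: persist oversized tool results first, then
// fall back to a full compact only if the total is still too large.
const budget: Reducer = msgs =>
  msgs.map(m => (m.tokens > 100 ? { tokens: 10 } : m));
const compact: Reducer = msgs =>
  msgs.reduce((n, m) => n + m.tokens, 0) > 50 ? [{ tokens: 20 }] : msgs;
```

In this toy model, shrinking one oversized result under the budget stage keeps the compact stage from collapsing the whole history.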

Stage 2: API Call Configuration

After messages are prepared, the loop configures and fires the API request:

for await (const message of deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  thinkingConfig: toolUseContext.options.thinkingConfig,
  tools: toolUseContext.options.tools,
  signal: toolUseContext.abortController.signal,
  options: {
    model: currentModel,
    fallbackModel,
    querySource,
    maxOutputTokensOverride,
    taskBudget: params.taskBudget && { /* ... */ },
    // ...
  },
})) {
  // Stage 3: Stream processing
}

deps.callModel defaults to queryModelWithStreaming in src/services/api/claude.ts, which wraps the Anthropic SDK’s messages.stream().
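Because callModel is just an async-generator-valued dependency, the streaming client can be swapped for a stub in tests. A sketch of that seam under assumed shapes — the event type and stub below are illustrative, not the real queryModelWithStreaming:

```typescript
// Assumed event shape for illustration.
type ModelEvent = { type: "text"; text: string } | { type: "stop" };

// The dependency seam: any async generator of events will do.
type CallModel = (req: { prompt: string }) => AsyncGenerator<ModelEvent>;

// Stub standing in for the real streaming wrapper around the SDK.
const stubModel: CallModel = async function* (req) {
  yield { type: "text", text: `echo:${req.prompt}` };
  yield { type: "stop" };
};

// Consumers iterate the same way regardless of which implementation
// is injected.
async function collect(call: CallModel, prompt: string): Promise<ModelEvent[]> {
  const events: ModelEvent[] = [];
  for await (const e of call({ prompt })) events.push(e);
  return events;
}
```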

Stage 3: Stream Processing

As content blocks stream in from the API, the loop processes each one:

for await (const message of deps.callModel({ /* ... */ })) {
  // Handle streaming fallback (model switch mid-stream)
  if (streamingFallbackOccured) {
    assistantMessages.length = 0;
    toolResults.length = 0;
    // Reset and retry with fallback model
  }

  // Withhold recoverable errors (prompt-too-long, max-output-tokens)
  let withheld = false;
  if (reactiveCompact?.isWithheldPromptTooLong(message)) withheld = true;
  if (isWithheldMaxOutputTokens(message)) withheld = true;
  if (!withheld) yield yieldMessage;

  // Track assistant messages and detect tool_use blocks
  if (message.type === 'assistant') {
    assistantMessages.push(message);
    const msgToolUseBlocks = message.message.content.filter(
      content => content.type === 'tool_use'
    );
    if (msgToolUseBlocks.length > 0) {
      toolUseBlocks.push(...msgToolUseBlocks);
      needsFollowUp = true;
    }

    // Feed tool blocks to StreamingToolExecutor for parallel execution
    if (streamingToolExecutor) {
      for (const toolBlock of msgToolUseBlocks) {
        streamingToolExecutor.addTool(toolBlock, message);
      }
    }
  }
}

Key insight: tool execution can start while the API is still streaming. The StreamingToolExecutor receives tool blocks as they arrive and begins executing concurrency-safe tools immediately.
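The overlap trick reduces to a simple pattern: start each tool's promise the moment its block arrives, and await only the still-running tail after the stream ends. A minimal sketch (illustrative, not the real StreamingToolExecutor):

```typescript
type ToolBlock = { id: string; input: number };
type ToolOutcome = { id: string; result: number };

class MiniStreamingExecutor {
  private pending: Promise<ToolOutcome>[] = [];

  // Called as each tool_use block streams in: work begins immediately,
  // concurrently with the ongoing API stream.
  addTool(block: ToolBlock, run: (n: number) => Promise<number>): void {
    this.pending.push(
      run(block.input).then(result => ({ id: block.id, result })),
    );
  }

  // Called after the stream ends: tools that finished during the stream
  // resolve instantly; only the still-running ones add latency here.
  getRemainingResults(): Promise<ToolOutcome[]> {
    return Promise.all(this.pending);
  }
}
```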

Stage 4: Tool Dispatch & Result Collection

After the API response completes, the remaining tool results are collected:

const toolUpdates = streamingToolExecutor
  ? streamingToolExecutor.getRemainingResults()
  : runTools(toolUseBlocks, assistantMessages, canUseTool, toolUseContext);

for await (const update of toolUpdates) {
  if (update.message) {
    yield update.message;
    toolResults.push(
      ...normalizeMessagesForAPI([update.message], tools).filter(_ => _.type === 'user')
    );
  }
  if (update.newContext) {
    updatedToolUseContext = { ...update.newContext, queryTracking };
  }
}

There are two execution paths:

  • StreamingToolExecutor (default): Tools start executing as soon as their tool_use block streams in. Concurrent-safe tools run in parallel.
  • runTools (fallback): Traditional sequential/batch execution from toolOrchestration.ts.

Stage 5: Attachment Injection

After tools complete, the loop collects additional context to inject:

// Get queued commands (btw messages, task notifications)
const queuedCommandsSnapshot = getCommandsByMaxPriority(
  sleepRan ? 'later' : 'next'
);

// Inject file change attachments, memory attachments, skill discovery
for await (const attachment of getAttachmentMessages(
  null, updatedToolUseContext, null, queuedCommandsSnapshot,
  [...messagesForQuery, ...assistantMessages, ...toolResults],
  querySource,
)) {
  yield attachment;
  toolResults.push(attachment);
}

// Memory prefetch consume
if (pendingMemoryPrefetch?.settledAt !== null) {
  const memoryAttachments = filterDuplicateMemoryAttachments(
    await pendingMemoryPrefetch.promise,
    toolUseContext.readFileState,
  );
  // yield memory attachments
}

Stage 6: Continue Decision

This is the critical fork point. After all tool results are collected:

if (!needsFollowUp) {
  // No tool_use blocks → model is done (or errored)
  // Handle recoverable errors: prompt-too-long, max-output-tokens
  // Handle stop hooks
  // Handle token budget continuation
  return { reason: 'completed' };
}

// Tool results exist → check limits and continue
const nextTurnCount = turnCount + 1;
if (maxTurns && nextTurnCount > maxTurns) {
  yield createAttachmentMessage({ type: 'max_turns_reached', maxTurns, turnCount: nextTurnCount });
  return { reason: 'max_turns', turnCount: nextTurnCount };
}

// Prepare next iteration
state = {
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  toolUseContext: toolUseContextWithQueryTracking,
  turnCount: nextTurnCount,
  transition: { reason: 'next_turn' },
  // ...
};
// implicit continue → back to while(true)

Stage 7: State Transition

The loop body ends by writing a new State object and falling through to the next while(true) iteration. The State type captures everything that varies between iterations:

type State = {
  messages: Message[];
  toolUseContext: ToolUseContext;
  autoCompactTracking: AutoCompactTrackingState | undefined;
  maxOutputTokensRecoveryCount: number;
  hasAttemptedReactiveCompact: boolean;
  maxOutputTokensOverride: number | undefined;
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined;
  stopHookActive: boolean | undefined;
  turnCount: number;
  transition: Continue | undefined; // why we continued
};

The transition field records why the loop continued — invaluable for debugging and testing:

| Transition Reason | Meaning |
| --- | --- |
| next_turn | Normal: tool results need a follow-up API call |
| reactive_compact_retry | Prompt was too long, compacted and retrying |
| collapse_drain_retry | Context collapse drained staged collapses |
| max_output_tokens_recovery | Hit output token limit, injected recovery message |
| max_output_tokens_escalate | Escalated from 8K to 64K max output tokens |
| stop_hook_blocking | Stop hook injected a blocking error |
| token_budget_continuation | Model stopped early, budget says continue |
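The transition reasons above form a natural discriminated union — a sketch under assumed shapes (the reason strings are from the table; any extra structure is illustrative):

```typescript
// Sketch of the transition record; reason strings match the table above.
type Transition =
  | { reason: "next_turn" }
  | { reason: "reactive_compact_retry" }
  | { reason: "collapse_drain_retry" }
  | { reason: "max_output_tokens_recovery" }
  | { reason: "max_output_tokens_escalate" }
  | { reason: "stop_hook_blocking" }
  | { reason: "token_budget_continuation" };

// In tests, asserting on transition.reason pins down *why* the loop
// continued, not just that it did.
function isRecovery(t: Transition): boolean {
  return t.reason !== "next_turn";
}
```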

The queryLoop function is an AsyncGenerator, meaning it yields messages back to the caller as they become available. This is a critical design choice — it allows:

  1. Real-time streaming: SDK consumers see assistant text as it streams in, not after the full turn.
  2. Backpressure: The caller controls consumption speed. If the SDK consumer is slow, the generator pauses.
  3. Early termination: The caller can .return() the generator to abort the loop at any point.
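The three properties can be demonstrated with plain JavaScript generator semantics — a self-contained sketch, unrelated to the actual query() implementation:

```typescript
// The producer only advances when the consumer pulls (backpressure),
// and breaking out of for-await calls .return(), running finally blocks.
async function* producer(log: string[]): AsyncGenerator<number> {
  try {
    for (let i = 0; ; i++) {
      log.push(`produce:${i}`);
      yield i; // suspends here until the consumer asks for more
    }
  } finally {
    log.push("cleanup"); // runs when the consumer terminates early
  }
}

async function consumeTwo(log: string[]): Promise<number[]> {
  const gen = producer(log);
  const out: number[] = [];
  for await (const n of gen) {
    out.push(n);
    if (out.length === 2) break; // break triggers gen.return() → finally
  }
  return out;
}
```

This is the same mechanism that lets an SDK consumer abort an agentic turn mid-flight without leaking in-progress work.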

The key yield points are:

yield { type: 'stream_request_start' }; // Turn boundary marker
yield yieldMessage; // Streamed content blocks
yield result; // Completed tool results
yield attachment; // Context attachments
yield createAttachmentMessage({ type: 'max_turns_reached' }); // Limit hit

The outer QueryEngine.submitMessage consumes these yields and transforms them for SDK output:

for await (const message of query({
  messages, systemPrompt, userContext, systemContext,
  canUseTool, toolUseContext, /* ... */
})) {
  switch (message.type) {
    case 'assistant':
      this.mutableMessages.push(message);
      yield* normalizeMessage(message);
      break;
    case 'stream_event':
      if (message.event.type === 'message_start') {
        currentMessageUsage = updateUsage(currentMessageUsage, message.event.message.usage);
      }
      if (message.event.type === 'message_stop') {
        this.totalUsage = accumulateUsage(this.totalUsage, currentMessageUsage);
      }
      break;
    // ...
  }

  // Check USD budget after each message
  if (maxBudgetUsd !== undefined && getTotalCost() >= maxBudgetUsd) {
    yield { type: 'result', subtype: 'error_max_budget_usd', /* ... */ };
    return;
  }
}

State is managed through a mutable State object that is fully replaced at each continue site. There are 7 continue sites in the loop, each building a fresh State:

graph LR
A[next_turn] --> S[State]
B[reactive_compact_retry] --> S
C[collapse_drain_retry] --> S
D[max_output_tokens_recovery] --> S
E[max_output_tokens_escalate] --> S
F[stop_hook_blocking] --> S
G[token_budget_continuation] --> S

Each continue site explicitly constructs the full state, ensuring no stale values leak between recovery paths. For example, reactive_compact_retry resets autoCompactTracking to undefined but preserves hasAttemptedReactiveCompact: true to prevent infinite retry loops.
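The "fresh state per continue site" pattern can be sketched with a small field subset — illustrative names, not the real State construction:

```typescript
// Each continue site builds a complete new state object; nothing is
// mutated in place, so no stale field survives a recovery path.
type LoopState = {
  turnCount: number;
  hasAttemptedReactiveCompact: boolean;
  maxOutputTokensRecoveryCount: number;
  transition: { reason: string } | undefined;
};

// Sketch of the reactive_compact_retry site described above.
function reactiveCompactRetry(prev: LoopState): LoopState {
  return {
    turnCount: prev.turnCount, // a retry, not a new turn
    hasAttemptedReactiveCompact: true, // guards against infinite compact loops
    maxOutputTokensRecoveryCount: prev.maxOutputTokensRecoveryCount,
    transition: { reason: "reactive_compact_retry" },
  };
}
```

Because the object is rebuilt field by field, forgetting a field is a type error rather than a silent stale value.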

The loop has sophisticated error recovery built into its control flow:

Prompt-too-long recovery:

  1. API returns prompt-too-long error
  2. Error is WITHHELD from SDK stream
  3. Try context-collapse drain (cheap, preserves granular context)
  4. If still failing, try reactive compact (full summary)
  5. If both fail, surface the withheld error and return

Max output tokens recovery:

  1. API returns max_output_tokens stop reason
  2. Error is WITHHELD from SDK stream
  3. First: try escalating from 8K to 64K (single retry, no user message)
  4. If still hitting limit: inject "Resume directly — no recap" message
  5. Allow up to 3 recovery attempts
  6. If exhausted, surface the withheld error

Streaming fallback recovery:

  1. API stream fails mid-response
  2. FallbackTriggeredError caught
  3. Tombstone orphaned messages (invalid thinking signatures)
  4. Switch to fallback model
  5. Retry the entire request
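The bounded-retry shape shared by these flows can be sketched as follows — the retry cap comes from the steps above, but the function itself is illustrative:

```typescript
// Maximum recovery attempts before surfacing the withheld error.
const MAX_RECOVERIES = 3;

type Attempt = { hitLimit: boolean; text: string };

// Walk a sequence of attempts: succeed on the first clean response,
// otherwise retry up to the cap, then give up with the original error.
function recover(attempts: Attempt[]): { text: string } | { error: string } {
  let recoveries = 0;
  for (const attempt of attempts) {
    if (!attempt.hitLimit) return { text: attempt.text };
    recoveries++;
    if (recoveries >= MAX_RECOVERIES) break;
    // Here the real loop would inject a recovery message and re-enter.
  }
  // Exhausted: surface the withheld error instead of retrying forever.
  return { error: "max_output_tokens" };
}
```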

Putting it all together, here is the complete lifecycle of a single iteration through the loop:

sequenceDiagram
participant L as Loop Entry
participant P as Message Prep
participant A as API Call
participant S as Stream
participant T as Tool Dispatch
participant R as Result Collection
participant D as Continue Decision
L->>P: Destructure state
P->>P: Tool result budget
P->>P: Snip compaction
P->>P: Microcompact
P->>P: Context collapse
P->>P: Auto-compact
P->>A: Configure API params
A->>S: Stream response
S->>S: Yield content blocks
S->>T: Feed tool_use to StreamingToolExecutor
S-->>S: Collect completed results during stream
Note over S: Stream ends
T->>T: Await remaining tool results
T->>R: Yield tool results
R->>R: Inject attachments
R->>R: Memory prefetch consume
R->>D: Check needsFollowUp
alt No tool use
D->>D: Run stop hooks
D-->>L: return {reason: 'completed'}
else Has tool results
D->>D: Check maxTurns
D-->>L: state = next; continue
end

Several invariants are maintained across the loop:

  1. Every tool_use gets a tool_result: Even on abort, synthetic error tool_result blocks are generated. The API requires matching pairs.

  2. Messages are append-only within an iteration: messagesForQuery is built fresh each iteration, but within an iteration, messages only grow via push.

  3. Compaction is at most once per iteration: The hasAttemptedReactiveCompact flag prevents infinite compact→retry→compact loops.

  4. Stop hooks run at most once per terminal position: The stopHookActive flag prevents re-triggering on the retry after a hook-injected blocking error.

  5. Budget checks are post-yield: The QueryEngine checks maxBudgetUsd after each yielded message, not inside the loop. This keeps budget enforcement in one place.
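Invariant 1 can be sketched concretely: on abort, every dangling tool_use id still receives a synthetic, error-flagged tool_result. The block shapes follow the Anthropic content-block convention, but the helper itself is illustrative:

```typescript
type ToolUse = { type: "tool_use"; id: string };
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  is_error: boolean;
  content: string;
};

// Pair every unanswered tool_use with a synthetic error result so the
// API's matching-pair requirement always holds.
function closeDanglingToolUses(
  uses: ToolUse[],
  results: ToolResult[],
): ToolResult[] {
  const answered = new Set(results.map(r => r.tool_use_id));
  const synthetic = uses
    .filter(u => !answered.has(u.id))
    .map(u => ({
      type: "tool_result" as const,
      tool_use_id: u.id,
      is_error: true,
      content: "Tool execution aborted",
    }));
  return [...results, ...synthetic];
}
```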

| Metric | Typical Value | Notes |
| --- | --- | --- |
| Iterations per user message | 2-15 | Depends on task complexity |
| Time per iteration | 2-30s | Dominated by API latency |
| Tool overlap with streaming | 40-80% | StreamingToolExecutor starts tools during stream |
| Compact frequency | Every 5-20 turns | Depends on context window usage |
| Memory per iteration | ~2-5MB | Message array is the dominant cost |

The streaming tool executor is the single biggest performance optimization in the loop. By starting tool execution while the API is still streaming, it can save 1-5 seconds per iteration for multi-tool responses.