
The Loop

The agentic loop is the beating heart of Claude Code. It is the engine that transforms a single user message into a multi-step, tool-using conversation with the Claude API. This chapter is a deep dive into the while(true) loop inside src/query.ts — the most critical ~1700 lines in the entire codebase.

Claude Code’s agentic loop is split across two layers:

| Layer | File | Responsibility |
| --- | --- | --- |
| Outer | src/QueryEngine.ts | Session lifecycle, message persistence, budget enforcement, SDK message yield |
| Inner | src/query.ts | The while(true) loop — API calls, streaming, tool dispatch, continue decisions |

The outer layer (QueryEngine.submitMessage) is an AsyncGenerator that drives the inner layer (query()) and post-processes its output for SDK consumers. The inner layer is where the actual agentic reasoning happens.
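The two-layer shape can be sketched as a pair of nested async generators — a minimal illustration, not the real QueryEngine or query() signatures; all names here are stand-ins:

```typescript
// Inner layer: the agentic loop, yielding raw events (stand-in for query()).
type InnerEvent = { type: "assistant" | "tool_result"; text: string };

async function* innerLoop(turns: string[]): AsyncGenerator<InnerEvent> {
  for (const text of turns) {
    yield { type: "assistant", text };
  }
}

// Outer layer: session-level concerns wrap every inner yield
// (stand-in for QueryEngine.submitMessage).
async function* outerEngine(turns: string[]): AsyncGenerator<string> {
  for await (const event of innerLoop(turns)) {
    // Persistence and budget checks would live here, once per yielded event.
    yield `sdk:${event.type}:${event.text}`;
  }
}
```

Because the outer generator pulls from the inner one, every message passes through exactly one post-processing point before reaching the SDK consumer.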

graph TD
A[User Message] --> B[QueryEngine.submitMessage]
B --> C[processUserInput]
C --> D[query loop entry]
D --> E{while true}
E --> F[Message Build & Compact]
F --> G[API Call & Stream]
G --> H{needsFollowUp?}
H -- yes --> I[Tool Dispatch]
I --> J[Result Collection]
J --> K[Attachment Injection]
K --> L{maxTurns reached?}
L -- no --> E
L -- yes --> M[Return max_turns]
H -- no --> N{Stop Hooks}
N --> O[Return completed]
B --> P[Yield SDK Messages]

The core loop lives in queryLoop() inside src/query.ts. It is a literal while (true) — there is no iteration count, no for-loop bound. The loop runs until an explicit return statement fires.

// src/query.ts — the core loop structure
async function* queryLoop(
  params: QueryParams,
  consumedCommandUuids: string[],
): AsyncGenerator<
  StreamEvent | RequestStartEvent | Message | TombstoneMessage | ToolUseSummaryMessage,
  Terminal
> {
  const { systemPrompt, userContext, systemContext, canUseTool, ... } = params;
  let state: State = {
    messages: params.messages,
    toolUseContext: params.toolUseContext,
    autoCompactTracking: undefined,
    maxOutputTokensRecoveryCount: 0,
    hasAttemptedReactiveCompact: false,
    turnCount: 1,
    transition: undefined,
    // ...
  };
  // eslint-disable-next-line no-constant-condition
  while (true) {
    let { toolUseContext } = state;
    const { messages, turnCount, ... } = state;
    // === STAGE 1-7 happen here ===
    state = next; // transition to next iteration
  }
}

The loop cannot be expressed as a bounded iteration because:

  1. Unpredictable tool chains: The model may call 1 tool or 50 tools before finishing.
  2. Recovery loops: max_output_tokens hit → inject recovery message → re-enter. This can happen up to 3 times per turn.
  3. Reactive compaction: If the context is too large, compact and retry — but only once.
  4. Stop hooks: A hook can block the response, injecting an error message and forcing another iteration.
  5. Token budget continuation: When enabled, the loop auto-continues if the model stopped early relative to its token budget.
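Although unbounded, the loop still terminates deterministically: every exit path is an explicit return of a terminal value. A minimal sketch of that shape — the variant names match the returns shown later in this chapter, but the helper function is illustrative:

```typescript
// Sketch of the terminal result the loop's return statements produce.
type Terminal =
  | { reason: "completed" }
  | { reason: "max_turns"; turnCount: number };

// Illustrative helper: the only way out of while(true) is one of these.
function describeTerminal(t: Terminal): string {
  switch (t.reason) {
    case "completed":
      return "model finished with no pending tool results";
    case "max_turns":
      return `stopped after ${t.turnCount} turns`;
  }
}
```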

Each pass through the while(true) loop executes 7 conceptual stages:

Stage 1: Message Build & Context Preparation


The first stage prepares the message array for the API call. This involves several sub-steps:

// 1a. Get messages after the last compact boundary
let messagesForQuery = [...getMessagesAfterCompactBoundary(messages)];

// 1b. Apply tool result budget (persist large results to disk)
messagesForQuery = await applyToolResultBudget(
  messagesForQuery,
  toolUseContext.contentReplacementState,
  // ...
);

// 1c. Apply snip compaction (HISTORY_SNIP feature)
if (feature('HISTORY_SNIP')) {
  const snipResult = snipModule!.snipCompactIfNeeded(messagesForQuery);
  messagesForQuery = snipResult.messages;
  snipTokensFreed = snipResult.tokensFreed;
}

// 1d. Apply microcompact (remove stale tool results)
const microcompactResult = await deps.microcompact(
  messagesForQuery, toolUseContext, querySource
);
messagesForQuery = microcompactResult.messages;

// 1e. Apply context collapse (if enabled)
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
  const collapseResult = await contextCollapse.applyCollapsesIfNeeded(
    messagesForQuery, toolUseContext, querySource
  );
  messagesForQuery = collapseResult.messages;
}

// 1f. Build the full system prompt
const fullSystemPrompt = asSystemPrompt(
  appendSystemContext(systemPrompt, systemContext)
);

// 1g. Auto-compact if needed
const { compactionResult } = await deps.autocompact(
  messagesForQuery, toolUseContext, /* ... */
);

The message preparation pipeline has a strict ordering: tool result budget → snip → microcompact → context collapse → autocompact. Each stage can reduce the token count, potentially preventing the next stage from triggering.
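The strict ordering can be modeled as a reducer pipeline — a sketch of the idea, not the real src/query.ts helpers; the stage implementations below are illustrative:

```typescript
// Each stage may shrink the message list; it runs only on the output of
// the previous stage, so an early, cheap stage can keep a later, more
// destructive one (like autocompact) from triggering at all.
type Msg = { tokens: number };
type Reducer = (msgs: Msg[]) => Msg[];

function runPipeline(msgs: Msg[], stages: Reducer[]): Msg[] {
  return stages.reduce((acc, stage) => stage(acc), msgs);
}

// Illustrative stages: persist oversized tool results first, then
// fall back to a full compact only if the total is still too large.
const budget: Reducer = msgs =>
  msgs.map(m => (m.tokens > 100 ? { tokens: 10 } : m));
const compact: Reducer = msgs =>
  msgs.reduce((n, m) => n + m.tokens, 0) > 50 ? [{ tokens: 20 }] : msgs;
```

In this toy model, shrinking one oversized result under the budget stage keeps the compact stage from collapsing the whole history.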

Stage 2: API Call Configuration

After messages are prepared, the loop configures and fires the API request:

for await (const message of deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  thinkingConfig: toolUseContext.options.thinkingConfig,
  tools: toolUseContext.options.tools,
  signal: toolUseContext.abortController.signal,
  options: {
    model: currentModel,
    fallbackModel,
    querySource,
    maxOutputTokensOverride,
    taskBudget: params.taskBudget && { /* ... */ },
    // ...
  },
})) {
  // Stage 3: Stream processing
}

deps.callModel defaults to queryModelWithStreaming in src/services/api/claude.ts, which wraps the Anthropic SDK’s messages.stream().
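Because callModel is just an async-generator-valued dependency, the streaming client can be swapped for a stub in tests. A sketch of that seam under assumed shapes — the event type and stub below are illustrative, not the real queryModelWithStreaming:

```typescript
// Assumed event shape for illustration.
type ModelEvent = { type: "text"; text: string } | { type: "stop" };

// The dependency seam: any async generator of events will do.
type CallModel = (req: { prompt: string }) => AsyncGenerator<ModelEvent>;

// Stub standing in for the real streaming wrapper around the SDK.
const stubModel: CallModel = async function* (req) {
  yield { type: "text", text: `echo:${req.prompt}` };
  yield { type: "stop" };
};

// Consumers iterate the same way regardless of which implementation
// is injected.
async function collect(call: CallModel, prompt: string): Promise<ModelEvent[]> {
  const events: ModelEvent[] = [];
  for await (const e of call({ prompt })) events.push(e);
  return events;
}
```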

Stage 3: Stream Processing

As content blocks stream in from the API, the loop processes each one:

for await (const message of deps.callModel({ /* ... */ })) {
  // Handle streaming fallback (model switch mid-stream)
  if (streamingFallbackOccured) {
    assistantMessages.length = 0;
    toolResults.length = 0;
    // Reset and retry with fallback model
  }

  // Withhold recoverable errors (prompt-too-long, max-output-tokens)
  let withheld = false;
  if (reactiveCompact?.isWithheldPromptTooLong(message)) withheld = true;
  if (isWithheldMaxOutputTokens(message)) withheld = true;
  if (!withheld) yield yieldMessage;

  // Track assistant messages and detect tool_use blocks
  if (message.type === 'assistant') {
    assistantMessages.push(message);
    const msgToolUseBlocks = message.message.content.filter(
      content => content.type === 'tool_use'
    );
    if (msgToolUseBlocks.length > 0) {
      toolUseBlocks.push(...msgToolUseBlocks);
      needsFollowUp = true;
    }

    // Feed tool blocks to StreamingToolExecutor for parallel execution
    if (streamingToolExecutor) {
      for (const toolBlock of msgToolUseBlocks) {
        streamingToolExecutor.addTool(toolBlock, message);
      }
    }
  }
}

Key insight: tool execution can start while the API is still streaming. The StreamingToolExecutor receives tool blocks as they arrive and begins executing concurrency-safe tools immediately.
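The overlap trick reduces to a simple pattern: start each tool's promise the moment its block arrives, and await only the still-running tail after the stream ends. A minimal sketch (illustrative, not the real StreamingToolExecutor):

```typescript
type ToolBlock = { id: string; input: number };
type ToolOutcome = { id: string; result: number };

class MiniStreamingExecutor {
  private pending: Promise<ToolOutcome>[] = [];

  // Called as each tool_use block streams in: work begins immediately,
  // concurrently with the ongoing API stream.
  addTool(block: ToolBlock, run: (n: number) => Promise<number>): void {
    this.pending.push(
      run(block.input).then(result => ({ id: block.id, result })),
    );
  }

  // Called after the stream ends: tools that finished during the stream
  // resolve instantly; only the still-running ones add latency here.
  getRemainingResults(): Promise<ToolOutcome[]> {
    return Promise.all(this.pending);
  }
}
```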

Stage 4: Tool Dispatch & Result Collection

After the API response completes, the remaining tool results are collected:

const toolUpdates = streamingToolExecutor
  ? streamingToolExecutor.getRemainingResults()
  : runTools(toolUseBlocks, assistantMessages, canUseTool, toolUseContext);

for await (const update of toolUpdates) {
  if (update.message) {
    yield update.message;
    toolResults.push(
      ...normalizeMessagesForAPI([update.message], tools).filter(_ => _.type === 'user')
    );
  }
  if (update.newContext) {
    updatedToolUseContext = { ...update.newContext, queryTracking };
  }
}

There are two execution paths:

  • StreamingToolExecutor (default): Tools start executing as soon as their tool_use block streams in. Concurrent-safe tools run in parallel.
  • runTools (fallback): Traditional sequential/batch execution from toolOrchestration.ts.

Stage 5: Attachment Injection

After tools complete, the loop collects additional context to inject:

// Get queued commands (btw messages, task notifications)
const queuedCommandsSnapshot = getCommandsByMaxPriority(
  sleepRan ? 'later' : 'next'
);

// Inject file change attachments, memory attachments, skill discovery
for await (const attachment of getAttachmentMessages(
  null, updatedToolUseContext, null, queuedCommandsSnapshot,
  [...messagesForQuery, ...assistantMessages, ...toolResults],
  querySource,
)) {
  yield attachment;
  toolResults.push(attachment);
}

// Memory prefetch consume
if (pendingMemoryPrefetch?.settledAt !== null) {
  const memoryAttachments = filterDuplicateMemoryAttachments(
    await pendingMemoryPrefetch.promise,
    toolUseContext.readFileState,
  );
  // yield memory attachments
}

Stage 6: Continue Decision

This is the critical fork point. After all tool results are collected:

if (!needsFollowUp) {
  // No tool_use blocks → model is done (or errored)
  // Handle recoverable errors: prompt-too-long, max-output-tokens
  // Handle stop hooks
  // Handle token budget continuation
  return { reason: 'completed' };
}

// Tool results exist → check limits and continue
const nextTurnCount = turnCount + 1;
if (maxTurns && nextTurnCount > maxTurns) {
  yield createAttachmentMessage({ type: 'max_turns_reached', maxTurns, turnCount: nextTurnCount });
  return { reason: 'max_turns', turnCount: nextTurnCount };
}

// Prepare next iteration
state = {
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  toolUseContext: toolUseContextWithQueryTracking,
  turnCount: nextTurnCount,
  transition: { reason: 'next_turn' },
  // ...
};
// implicit continue → back to while(true)

Stage 7: State Transition

The loop body ends by writing a new State object and falling through to the next while(true) iteration. The State type captures everything that varies between iterations:

type State = {
  messages: Message[];
  toolUseContext: ToolUseContext;
  autoCompactTracking: AutoCompactTrackingState | undefined;
  maxOutputTokensRecoveryCount: number;
  hasAttemptedReactiveCompact: boolean;
  maxOutputTokensOverride: number | undefined;
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined;
  stopHookActive: boolean | undefined;
  turnCount: number;
  transition: Continue | undefined; // why we continued
};

The transition field records why the loop continued — invaluable for debugging and testing:

| Transition Reason | Meaning |
| --- | --- |
| next_turn | Normal: tool results need a follow-up API call |
| reactive_compact_retry | Prompt was too long, compacted and retrying |
| collapse_drain_retry | Context collapse drained staged collapses |
| max_output_tokens_recovery | Hit output token limit, injected recovery message |
| max_output_tokens_escalate | Escalated from 8K to 64K max output tokens |
| stop_hook_blocking | Stop hook injected a blocking error |
| token_budget_continuation | Model stopped early, budget says continue |
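The transition reasons above form a natural discriminated union — a sketch under assumed shapes (the reason strings are from the table; any extra structure is illustrative):

```typescript
// Sketch of the transition record; reason strings match the table above.
type Transition =
  | { reason: "next_turn" }
  | { reason: "reactive_compact_retry" }
  | { reason: "collapse_drain_retry" }
  | { reason: "max_output_tokens_recovery" }
  | { reason: "max_output_tokens_escalate" }
  | { reason: "stop_hook_blocking" }
  | { reason: "token_budget_continuation" };

// In tests, asserting on transition.reason pins down *why* the loop
// continued, not just that it did.
function isRecovery(t: Transition): boolean {
  return t.reason !== "next_turn";
}
```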

The queryLoop function is an AsyncGenerator, meaning it yields messages back to the caller as they become available. This is a critical design choice — it allows:

  1. Real-time streaming: SDK consumers see assistant text as it streams in, not after the full turn.
  2. Backpressure: The caller controls consumption speed. If the SDK consumer is slow, the generator pauses.
  3. Early termination: The caller can .return() the generator to abort the loop at any point.
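The three properties can be demonstrated with plain JavaScript generator semantics — a self-contained sketch, unrelated to the actual query() implementation:

```typescript
// The producer only advances when the consumer pulls (backpressure),
// and breaking out of for-await calls .return(), running finally blocks.
async function* producer(log: string[]): AsyncGenerator<number> {
  try {
    for (let i = 0; ; i++) {
      log.push(`produce:${i}`);
      yield i; // suspends here until the consumer asks for more
    }
  } finally {
    log.push("cleanup"); // runs when the consumer terminates early
  }
}

async function consumeTwo(log: string[]): Promise<number[]> {
  const gen = producer(log);
  const out: number[] = [];
  for await (const n of gen) {
    out.push(n);
    if (out.length === 2) break; // break triggers gen.return() → finally
  }
  return out;
}
```

This is the same mechanism that lets an SDK consumer abort an agentic turn mid-flight without leaking in-progress work.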

The key yield points are:

yield { type: 'stream_request_start' }; // Turn boundary marker
yield yieldMessage; // Streamed content blocks
yield result; // Completed tool results
yield attachment; // Context attachments
yield createAttachmentMessage({ type: 'max_turns_reached' }); // Limit hit

The outer QueryEngine.submitMessage consumes these yields and transforms them for SDK output:

for await (const message of query({
  messages, systemPrompt, userContext, systemContext,
  canUseTool, toolUseContext, /* ... */
})) {
  switch (message.type) {
    case 'assistant':
      this.mutableMessages.push(message);
      yield* normalizeMessage(message);
      break;
    case 'stream_event':
      if (message.event.type === 'message_start') {
        currentMessageUsage = updateUsage(currentMessageUsage, message.event.message.usage);
      }
      if (message.event.type === 'message_stop') {
        this.totalUsage = accumulateUsage(this.totalUsage, currentMessageUsage);
      }
      break;
    // ...
  }

  // Check USD budget after each message
  if (maxBudgetUsd !== undefined && getTotalCost() >= maxBudgetUsd) {
    yield { type: 'result', subtype: 'error_max_budget_usd', /* ... */ };
    return;
  }
}

State is managed through a mutable State object that is fully replaced at each continue site. There are 7 continue sites in the loop, each building a fresh State:

graph LR
A[next_turn] --> S[State]
B[reactive_compact_retry] --> S
C[collapse_drain_retry] --> S
D[max_output_tokens_recovery] --> S
E[max_output_tokens_escalate] --> S
F[stop_hook_blocking] --> S
G[token_budget_continuation] --> S

Each continue site explicitly constructs the full state, ensuring no stale values leak between recovery paths. For example, reactive_compact_retry resets autoCompactTracking to undefined but preserves hasAttemptedReactiveCompact: true to prevent infinite retry loops.
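The "fresh state per continue site" pattern can be sketched with a small field subset — illustrative names, not the real State construction:

```typescript
// Each continue site builds a complete new state object; nothing is
// mutated in place, so no stale field survives a recovery path.
type LoopState = {
  turnCount: number;
  hasAttemptedReactiveCompact: boolean;
  maxOutputTokensRecoveryCount: number;
  transition: { reason: string } | undefined;
};

// Sketch of the reactive_compact_retry site described above.
function reactiveCompactRetry(prev: LoopState): LoopState {
  return {
    turnCount: prev.turnCount, // a retry, not a new turn
    hasAttemptedReactiveCompact: true, // guards against infinite compact loops
    maxOutputTokensRecoveryCount: prev.maxOutputTokensRecoveryCount,
    transition: { reason: "reactive_compact_retry" },
  };
}
```

Because the object is rebuilt field by field, forgetting a field is a type error rather than a silent stale value.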

The loop has sophisticated error recovery built into its control flow:

Prompt-too-long recovery:

  1. API returns prompt-too-long error
  2. Error is WITHHELD from SDK stream
  3. Try context-collapse drain (cheap, preserves granular context)
  4. If still failing, try reactive compact (full summary)
  5. If both fail, surface the withheld error and return

Max output tokens recovery:

  1. API returns max_output_tokens stop reason
  2. Error is WITHHELD from SDK stream
  3. First: try escalating from 8K to 64K (single retry, no user message)
  4. If still hitting limit: inject "Resume directly — no recap" message
  5. Allow up to 3 recovery attempts
  6. If exhausted, surface the withheld error

Streaming fallback recovery:

  1. API stream fails mid-response
  2. FallbackTriggeredError caught
  3. Tombstone orphaned messages (invalid thinking signatures)
  4. Switch to fallback model
  5. Retry the entire request
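The bounded-retry shape shared by these flows can be sketched as follows — the retry cap comes from the steps above, but the function itself is illustrative:

```typescript
// Maximum recovery attempts before surfacing the withheld error.
const MAX_RECOVERIES = 3;

type Attempt = { hitLimit: boolean; text: string };

// Walk a sequence of attempts: succeed on the first clean response,
// otherwise retry up to the cap, then give up with the original error.
function recover(attempts: Attempt[]): { text: string } | { error: string } {
  let recoveries = 0;
  for (const attempt of attempts) {
    if (!attempt.hitLimit) return { text: attempt.text };
    recoveries++;
    if (recoveries >= MAX_RECOVERIES) break;
    // Here the real loop would inject a recovery message and re-enter.
  }
  // Exhausted: surface the withheld error instead of retrying forever.
  return { error: "max_output_tokens" };
}
```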

Putting it all together, here is the complete lifecycle of a single iteration through the loop:

sequenceDiagram
participant L as Loop Entry
participant P as Message Prep
participant A as API Call
participant S as Stream
participant T as Tool Dispatch
participant R as Result Collection
participant D as Continue Decision
L->>P: Destructure state
P->>P: Tool result budget
P->>P: Snip compaction
P->>P: Microcompact
P->>P: Context collapse
P->>P: Auto-compact
P->>A: Configure API params
A->>S: Stream response
S->>S: Yield content blocks
S->>T: Feed tool_use to StreamingToolExecutor
S-->>S: Collect completed results during stream
Note over S: Stream ends
T->>T: Await remaining tool results
T->>R: Yield tool results
R->>R: Inject attachments
R->>R: Memory prefetch consume
R->>D: Check needsFollowUp
alt No tool use
D->>D: Run stop hooks
D-->>L: return {reason: 'completed'}
else Has tool results
D->>D: Check maxTurns
D-->>L: state = next; continue
end

Several invariants are maintained across the loop:

  1. Every tool_use gets a tool_result: Even on abort, synthetic error tool_result blocks are generated. The API requires matching pairs.

  2. Messages are append-only within an iteration: messagesForQuery is built fresh each iteration, but within an iteration, messages only grow via push.

  3. Compaction is at most once per iteration: The hasAttemptedReactiveCompact flag prevents infinite compact→retry→compact loops.

  4. Stop hooks run at most once per terminal position: The stopHookActive flag prevents re-triggering on the retry after a hook-injected blocking error.

  5. Budget checks are post-yield: The QueryEngine checks maxBudgetUsd after each yielded message, not inside the loop. This keeps budget enforcement in one place.
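Invariant 1 can be sketched concretely: on abort, every dangling tool_use id still receives a synthetic, error-flagged tool_result. The block shapes follow the Anthropic content-block convention, but the helper itself is illustrative:

```typescript
type ToolUse = { type: "tool_use"; id: string };
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  is_error: boolean;
  content: string;
};

// Pair every unanswered tool_use with a synthetic error result so the
// API's matching-pair requirement always holds.
function closeDanglingToolUses(
  uses: ToolUse[],
  results: ToolResult[],
): ToolResult[] {
  const answered = new Set(results.map(r => r.tool_use_id));
  const synthetic = uses
    .filter(u => !answered.has(u.id))
    .map(u => ({
      type: "tool_result" as const,
      tool_use_id: u.id,
      is_error: true,
      content: "Tool execution aborted",
    }));
  return [...results, ...synthetic];
}
```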

| Metric | Typical Value | Notes |
| --- | --- | --- |
| Iterations per user message | 2-15 | Depends on task complexity |
| Time per iteration | 2-30s | Dominated by API latency |
| Tool overlap with streaming | 40-80% | StreamingToolExecutor starts tools during stream |
| Compact frequency | Every 5-20 turns | Depends on context window usage |
| Memory per iteration | ~2-5MB | Message array is the dominant cost |

The streaming tool executor is the single biggest performance optimization in the loop. By starting tool execution while the API is still streaming, it can save 1-5 seconds per iteration for multi-tool responses.