The Loop
The agentic loop is the beating heart of Claude Code. It is the engine that transforms a single user message into a multi-step, tool-using conversation with the Claude API. This chapter is a deep dive into the `while (true)` loop inside `src/query.ts` — the most critical ~1700 lines in the entire codebase.
Architecture Overview
Claude Code’s agentic loop is split across two layers:
| Layer | File | Responsibility |
|---|---|---|
| Outer | `src/QueryEngine.ts` | Session lifecycle, message persistence, budget enforcement, SDK message yield |
| Inner | `src/query.ts` | The `while (true)` loop — API calls, streaming, tool dispatch, continue decisions |
The outer layer (`QueryEngine.submitMessage`) is an `AsyncGenerator` that drives the inner layer (`query()`) and post-processes its output for SDK consumers. The inner layer is where the actual agentic reasoning happens.
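As a toy sketch of this two-layer split (names like `innerLoop` and `outerLayer` are hypothetical stand-ins for `query()` and `QueryEngine.submitMessage`), the outer generator drains the inner one, post-processing both its yields and its terminal return value:

```typescript
// Illustrative only: an outer async generator driving an inner one.
async function* innerLoop(): AsyncGenerator<string, { reason: string }> {
  yield 'assistant:hello';
  yield 'tool_result:ok';
  return { reason: 'completed' }; // terminal value, not a yield
}

async function* outerLayer(): AsyncGenerator<string> {
  const gen = innerLoop();
  while (true) {
    const res = await gen.next();
    if (res.done) {
      // The inner loop's return value becomes an SDK-facing result message.
      yield `result:${res.value.reason}`;
      return;
    }
    // Each inner yield is post-processed for SDK consumers.
    yield `sdk:${res.value}`;
  }
}
```

The key property this models: the terminal reason travels through the generator's return channel, separate from the streamed messages.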
```mermaid
graph TD
  A[User Message] --> B[QueryEngine.submitMessage]
  B --> C[processUserInput]
  C --> D[query loop entry]
  D --> E{while true}
  E --> F[Message Build & Compact]
  F --> G[API Call & Stream]
  G --> H{needsFollowUp?}
  H -- yes --> I[Tool Dispatch]
  I --> J[Result Collection]
  J --> K[Attachment Injection]
  K --> L{maxTurns reached?}
  L -- no --> E
  L -- yes --> M[Return max_turns]
  H -- no --> N{Stop Hooks}
  N --> O[Return completed]
  B --> P[Yield SDK Messages]
```

The while(true) Pattern
The core loop lives in `queryLoop()` inside `src/query.ts`. It is a literal `while (true)` — there is no iteration count, no for-loop bound. The loop runs until an explicit `return` statement fires.
```typescript
// src/query.ts — the core loop structure
async function* queryLoop(
  params: QueryParams,
  consumedCommandUuids: string[],
): AsyncGenerator<
  StreamEvent | RequestStartEvent | Message | TombstoneMessage | ToolUseSummaryMessage,
  Terminal
> {
  const { systemPrompt, userContext, systemContext, canUseTool, ... } = params;

  let state: State = {
    messages: params.messages,
    toolUseContext: params.toolUseContext,
    autoCompactTracking: undefined,
    maxOutputTokensRecoveryCount: 0,
    hasAttemptedReactiveCompact: false,
    turnCount: 1,
    transition: undefined,
    // ...
  };

  // eslint-disable-next-line no-constant-condition
  while (true) {
    let { toolUseContext } = state;
    const { messages, turnCount, ... } = state;

    // === STAGES 1-7 happen here ===

    state = next; // transition to next iteration
  }
}
```

Why while(true)?
The loop cannot be expressed as a bounded iteration because:
- Unpredictable tool chains: The model may call 1 tool or 50 tools before finishing.
- Recovery loops: a `max_output_tokens` hit → inject recovery message → re-enter. This can happen up to 3 times per turn.
- Reactive compaction: If the context is too large, compact and retry — but only once.
- Stop hooks: A hook can block the response, injecting an error message and forcing another iteration.
- Token budget continuation: When enabled, the loop auto-continues if the model stopped early relative to its token budget.
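A minimal toy model of this control flow, with all names invented for illustration, shows why termination is a set of explicit `return` sites rather than a loop bound:

```typescript
// Toy sketch, not the real implementation: the loop ends only at
// explicit return sites (model finished, or maxTurns exceeded).
type Terminal = { reason: 'completed' | 'max_turns'; turnCount: number };

function runLoop(
  // Each entry simulates one API turn: how many tool calls the model
  // made that turn; 0 means it produced a final answer.
  turns: number[],
  maxTurns?: number,
): Terminal {
  let turnCount = 1;
  let i = 0;
  while (true) {
    const toolUses = turns[i++] ?? 0;
    const needsFollowUp = toolUses > 0;
    if (!needsFollowUp) {
      return { reason: 'completed', turnCount }; // model is done
    }
    const nextTurnCount = turnCount + 1;
    if (maxTurns && nextTurnCount > maxTurns) {
      return { reason: 'max_turns', turnCount: nextTurnCount }; // limit hit
    }
    turnCount = nextTurnCount; // implicit continue, back to while(true)
  }
}
```

Because the number of tool-calling turns is decided by the model at runtime, only the two return sites bound the loop.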
The 7 Stages Per Iteration
Each pass through the `while (true)` loop executes 7 conceptual stages:
Stage 1: Message Build & Context Preparation
The first stage prepares the message array for the API call. This involves several sub-steps:
```typescript
// 1a. Get messages after the last compact boundary
let messagesForQuery = [...getMessagesAfterCompactBoundary(messages)];

// 1b. Apply tool result budget (persist large results to disk)
messagesForQuery = await applyToolResultBudget(
  messagesForQuery,
  toolUseContext.contentReplacementState,
  // ...
);

// 1c. Apply snip compaction (HISTORY_SNIP feature)
if (feature('HISTORY_SNIP')) {
  const snipResult = snipModule!.snipCompactIfNeeded(messagesForQuery);
  messagesForQuery = snipResult.messages;
  snipTokensFreed = snipResult.tokensFreed;
}

// 1d. Apply microcompact (remove stale tool results)
const microcompactResult = await deps.microcompact(
  messagesForQuery,
  toolUseContext,
  querySource,
);
messagesForQuery = microcompactResult.messages;

// 1e. Apply context collapse (if enabled)
if (feature('CONTEXT_COLLAPSE') && contextCollapse) {
  const collapseResult = await contextCollapse.applyCollapsesIfNeeded(
    messagesForQuery,
    toolUseContext,
    querySource,
  );
  messagesForQuery = collapseResult.messages;
}

// 1f. Build the full system prompt
const fullSystemPrompt = asSystemPrompt(
  appendSystemContext(systemPrompt, systemContext),
);

// 1g. Auto-compact if needed
const { compactionResult } = await deps.autocompact(
  messagesForQuery,
  toolUseContext,
  /* ... */
);
```

The message preparation pipeline has a strict ordering: tool result budget → snip → microcompact → context collapse → autocompact. Each stage can reduce the token count, potentially preventing the next stage from triggering.
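The ordering constraint can be sketched as a chain of token-reducing passes, where an earlier stage's savings can keep a later stage from firing at all. Stage names echo the pipeline above, but the thresholds and savings here are invented for illustration:

```typescript
// Illustrative sketch: each stage fires only if the running token
// estimate is still above its (hypothetical) threshold.
type Stage = { name: string; threshold: number; savings: number };

function runPipeline(initialTokens: number, stages: Stage[]) {
  let tokens = initialTokens;
  const fired: string[] = [];
  for (const stage of stages) {
    if (tokens > stage.threshold) {
      tokens = Math.max(0, tokens - stage.savings);
      fired.push(stage.name);
    }
  }
  return { tokens, fired };
}

// Hypothetical numbers, in pipeline order.
const stages: Stage[] = [
  { name: 'toolResultBudget', threshold: 50_000, savings: 30_000 },
  { name: 'snip',             threshold: 60_000, savings: 20_000 },
  { name: 'microcompact',     threshold: 60_000, savings: 25_000 },
  { name: 'contextCollapse',  threshold: 70_000, savings: 15_000 },
  { name: 'autocompact',      threshold: 80_000, savings: 50_000 },
];
```

With these numbers, a 100K-token history triggers only the first two stages; their savings bring the estimate below every later threshold, so the more expensive compaction passes never run.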
Stage 2: API Call Configuration
After messages are prepared, the loop configures and fires the API request:
```typescript
for await (const message of deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  thinkingConfig: toolUseContext.options.thinkingConfig,
  tools: toolUseContext.options.tools,
  signal: toolUseContext.abortController.signal,
  options: {
    model: currentModel,
    fallbackModel,
    querySource,
    maxOutputTokensOverride,
    taskBudget: params.taskBudget && { /* ... */ },
    // ...
  },
})) {
  // Stage 3: Stream processing
}
```

`deps.callModel` defaults to `queryModelWithStreaming` in `src/services/api/claude.ts`, which wraps the Anthropic SDK’s `messages.stream()`.
Stage 3: Stream Processing
As content blocks stream in from the API, the loop processes each one:
```typescript
for await (const message of deps.callModel({ /* ... */ })) {
  // Handle streaming fallback (model switch mid-stream)
  if (streamingFallbackOccured) {
    assistantMessages.length = 0;
    toolResults.length = 0;
    // Reset and retry with fallback model
  }

  // Withhold recoverable errors (prompt-too-long, max-output-tokens)
  let withheld = false;
  if (reactiveCompact?.isWithheldPromptTooLong(message)) withheld = true;
  if (isWithheldMaxOutputTokens(message)) withheld = true;
  if (!withheld) yield yieldMessage;

  // Track assistant messages and detect tool_use blocks
  if (message.type === 'assistant') {
    assistantMessages.push(message);
    const msgToolUseBlocks = message.message.content.filter(
      content => content.type === 'tool_use'
    );
    if (msgToolUseBlocks.length > 0) {
      toolUseBlocks.push(...msgToolUseBlocks);
      needsFollowUp = true;
    }

    // Feed tool blocks to StreamingToolExecutor for parallel execution
    if (streamingToolExecutor) {
      for (const toolBlock of msgToolUseBlocks) {
        streamingToolExecutor.addTool(toolBlock, message);
      }
    }
  }
}
```

Key insight: tool execution can start while the API is still streaming. The `StreamingToolExecutor` receives tool blocks as they arrive and begins executing concurrency-safe tools immediately.
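The overlap idea can be reduced to a toy executor (this is not the real `StreamingToolExecutor` API, just an illustration): each tool's promise is started the moment its block arrives, and only the unfinished tail is awaited after the stream ends:

```typescript
// Illustrative sketch of start-during-stream tool execution.
type ToolUse = { id: string };

class ToyStreamingExecutor {
  private pending: Promise<string>[] = [];
  constructor(private run: (t: ToolUse) => Promise<string>) {}

  // Called per block as it streams in; execution starts immediately.
  addTool(block: ToolUse): void {
    this.pending.push(this.run(block));
  }

  // Called after the stream ends; already-finished tools cost nothing here.
  getRemainingResults(): Promise<string[]> {
    return Promise.all(this.pending);
  }
}

async function demo(): Promise<string[]> {
  const executor = new ToyStreamingExecutor(async t => `result:${t.id}`);
  // Simulated stream: blocks arrive one at a time.
  for (const id of ['a', 'b', 'c']) {
    executor.addTool({ id });
  }
  return executor.getRemainingResults();
}
```

The design choice this captures: by the time the stream finishes, much of the tool work is already in flight, so the post-stream await only covers the remainder.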
Stage 4: Tool Dispatch
After the API response completes, the remaining tool results are collected:
```typescript
const toolUpdates = streamingToolExecutor
  ? streamingToolExecutor.getRemainingResults()
  : runTools(toolUseBlocks, assistantMessages, canUseTool, toolUseContext);

for await (const update of toolUpdates) {
  if (update.message) {
    yield update.message;
    toolResults.push(
      ...normalizeMessagesForAPI([update.message], tools).filter(
        _ => _.type === 'user'
      )
    );
  }
  if (update.newContext) {
    updatedToolUseContext = { ...update.newContext, queryTracking };
  }
}
```

There are two execution paths:
- `StreamingToolExecutor` (default): Tools start executing as soon as their `tool_use` block streams in. Concurrency-safe tools run in parallel.
- `runTools` (fallback): Traditional sequential/batch execution from `toolOrchestration.ts`.
Stage 5: Result Collection & Attachments
After tools complete, the loop collects additional context to inject:
```typescript
// Get queued commands (btw messages, task notifications)
const queuedCommandsSnapshot = getCommandsByMaxPriority(
  sleepRan ? 'later' : 'next'
);

// Inject file change attachments, memory attachments, skill discovery
for await (const attachment of getAttachmentMessages(
  null,
  updatedToolUseContext,
  null,
  queuedCommandsSnapshot,
  [...messagesForQuery, ...assistantMessages, ...toolResults],
  querySource,
)) {
  yield attachment;
  toolResults.push(attachment);
}

// Memory prefetch consume
if (pendingMemoryPrefetch?.settledAt !== null) {
  const memoryAttachments = filterDuplicateMemoryAttachments(
    await pendingMemoryPrefetch.promise,
    toolUseContext.readFileState,
  );
  // yield memory attachments
}
```

Stage 6: Continue Decision
This is the critical fork point. After all tool results are collected:
```typescript
if (!needsFollowUp) {
  // No tool_use blocks → model is done (or errored)

  // Handle recoverable errors: prompt-too-long, max-output-tokens
  // Handle stop hooks
  // Handle token budget continuation

  return { reason: 'completed' };
}

// Tool results exist → check limits and continue
const nextTurnCount = turnCount + 1;
if (maxTurns && nextTurnCount > maxTurns) {
  yield createAttachmentMessage({
    type: 'max_turns_reached',
    maxTurns,
    turnCount: nextTurnCount,
  });
  return { reason: 'max_turns', turnCount: nextTurnCount };
}

// Prepare next iteration
state = {
  messages: [...messagesForQuery, ...assistantMessages, ...toolResults],
  toolUseContext: toolUseContextWithQueryTracking,
  turnCount: nextTurnCount,
  transition: { reason: 'next_turn' },
  // ...
};
// implicit continue → back to while(true)
```

Stage 7: State Transition
The loop body ends by writing a new `State` object and falling through to the next `while (true)` iteration. The `State` type captures everything that varies between iterations:
```typescript
type State = {
  messages: Message[];
  toolUseContext: ToolUseContext;
  autoCompactTracking: AutoCompactTrackingState | undefined;
  maxOutputTokensRecoveryCount: number;
  hasAttemptedReactiveCompact: boolean;
  maxOutputTokensOverride: number | undefined;
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined;
  stopHookActive: boolean | undefined;
  turnCount: number;
  transition: Continue | undefined; // why we continued
};
```

The `transition` field records why the loop continued — invaluable for debugging and testing:
| Transition Reason | Meaning |
|---|---|
| `next_turn` | Normal: tool results need a follow-up API call |
| `reactive_compact_retry` | Prompt was too long, compacted and retrying |
| `collapse_drain_retry` | Context collapse drained staged collapses |
| `max_output_tokens_recovery` | Hit output token limit, injected recovery message |
| `max_output_tokens_escalate` | Escalated from 8K to 64K max output tokens |
| `stop_hook_blocking` | Stop hook injected a blocking error |
| `token_budget_continuation` | Model stopped early, budget says continue |
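One way to see why this helps testing: if `transition` is modeled as a discriminated union, tests can assert on the exact continue reason. This sketch is hypothetical; in particular, the `attempt` payload on the recovery variant is invented here:

```typescript
// Hypothetical modeling of the Continue type as a discriminated union.
type Continue =
  | { reason: 'next_turn' }
  | { reason: 'reactive_compact_retry' }
  | { reason: 'collapse_drain_retry' }
  | { reason: 'max_output_tokens_recovery'; attempt: number } // attempt is invented
  | { reason: 'max_output_tokens_escalate' }
  | { reason: 'stop_hook_blocking' }
  | { reason: 'token_budget_continuation' };

// The switch narrows on `reason`, so variant payloads are type-safe.
function describeTransition(t: Continue): string {
  switch (t.reason) {
    case 'max_output_tokens_recovery':
      return `recovery attempt ${t.attempt}`;
    default:
      return t.reason;
  }
}
```

A test can then assert not just that the loop continued, but which of the seven continue sites fired.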
Async Generator Yield Points
The `queryLoop` function is an `AsyncGenerator`, meaning it yields messages back to the caller as they become available. This is a critical design choice — it allows:
- Real-time streaming: SDK consumers see assistant text as it streams in, not after the full turn.
- Backpressure: The caller controls consumption speed. If the SDK consumer is slow, the generator pauses.
- Early termination: The caller can `.return()` the generator to abort the loop at any point.
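The early-termination property can be demonstrated with a toy generator: breaking out of `for await` calls the generator's `.return()`, which runs its `finally` block, the natural home for abort and cleanup logic. All names here are illustrative:

```typescript
// Illustrative: an unbounded generator that the caller can stop early.
async function* toyLoop(): AsyncGenerator<string, void> {
  let turn = 0;
  try {
    while (true) {
      turn += 1;
      yield `turn:${turn}`; // pauses here until the caller asks for more
    }
  } finally {
    // Runs when the caller breaks / calls .return(): cleanup would go here.
  }
}

async function consumeTwo(): Promise<string[]> {
  const gen = toyLoop();
  const seen: string[] = [];
  for await (const msg of gen) {
    seen.push(msg);
    if (seen.length === 2) break; // break triggers gen.return() under the hood
  }
  return seen;
}
```

Backpressure falls out of the same mechanism: the generator is suspended at each `yield` until the consumer requests the next value.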
The key yield points are:
```typescript
yield { type: 'stream_request_start' };  // Turn boundary marker
yield yieldMessage;                      // Streamed content blocks
yield result;                            // Completed tool results
yield attachment;                        // Context attachments
yield createAttachmentMessage({ type: 'max_turns_reached' }); // Limit hit
```

The outer `QueryEngine.submitMessage` consumes these yields and transforms them for SDK output:
```typescript
for await (const message of query({
  messages,
  systemPrompt,
  userContext,
  systemContext,
  canUseTool,
  toolUseContext,
  /* ... */
})) {
  switch (message.type) {
    case 'assistant':
      this.mutableMessages.push(message);
      yield* normalizeMessage(message);
      break;
    case 'stream_event':
      if (message.event.type === 'message_start') {
        currentMessageUsage = updateUsage(
          currentMessageUsage,
          message.event.message.usage,
        );
      }
      if (message.event.type === 'message_stop') {
        this.totalUsage = accumulateUsage(this.totalUsage, currentMessageUsage);
      }
      break;
    // ...
  }

  // Check USD budget after each message
  if (maxBudgetUsd !== undefined && getTotalCost() >= maxBudgetUsd) {
    yield { type: 'result', subtype: 'error_max_budget_usd', /* ... */ };
    return;
  }
}
```

State Management Across Iterations
State is managed through a mutable `State` object that is fully replaced at each continue site. There are 7 continue sites in the loop, each building a fresh `State`:
```mermaid
graph LR
  A[next_turn] --> S[State]
  B[reactive_compact_retry] --> S
  C[collapse_drain_retry] --> S
  D[max_output_tokens_recovery] --> S
  E[max_output_tokens_escalate] --> S
  F[stop_hook_blocking] --> S
  G[token_budget_continuation] --> S
```

Each continue site explicitly constructs the full state, ensuring no stale values leak between recovery paths. For example, `reactive_compact_retry` resets `autoCompactTracking` to `undefined` but preserves `hasAttemptedReactiveCompact: true` to prevent infinite retry loops.
Error Recovery Paths
The loop has sophisticated error recovery built into its control flow:
Prompt Too Long (413)
1. API returns prompt-too-long error
2. Error is WITHHELD from SDK stream
3. Try context-collapse drain (cheap, preserves granular context)
4. If still failing, try reactive compact (full summary)
5. If both fail, surface the withheld error and return

Max Output Tokens
1. API returns max_output_tokens stop reason
2. Error is WITHHELD from SDK stream
3. First: try escalating from 8K to 64K (single retry, no user message)
4. If still hitting limit: inject "Resume directly — no recap" message
5. Allow up to 3 recovery attempts
6. If exhausted, surface the withheld error

Streaming Fallback
1. API stream fails mid-response
2. FallbackTriggeredError caught
3. Tombstone orphaned messages (invalid thinking signatures)
4. Switch to fallback model
5. Retry the entire request

The Complete Flow
Putting it all together, here is the complete lifecycle of a single iteration through the loop:
```mermaid
sequenceDiagram
  participant L as Loop Entry
  participant P as Message Prep
  participant A as API Call
  participant S as Stream
  participant T as Tool Dispatch
  participant R as Result Collection
  participant D as Continue Decision

  L->>P: Destructure state
  P->>P: Tool result budget
  P->>P: Snip compaction
  P->>P: Microcompact
  P->>P: Context collapse
  P->>P: Auto-compact
  P->>A: Configure API params
  A->>S: Stream response
  S->>S: Yield content blocks
  S->>T: Feed tool_use to StreamingToolExecutor
  S-->>S: Collect completed results during stream
  Note over S: Stream ends
  T->>T: Await remaining tool results
  T->>R: Yield tool results
  R->>R: Inject attachments
  R->>R: Memory prefetch consume
  R->>D: Check needsFollowUp
  alt No tool use
    D->>D: Run stop hooks
    D-->>L: return {reason: 'completed'}
  else Has tool results
    D->>D: Check maxTurns
    D-->>L: state = next; continue
  end
```

Key Invariants
Several invariants are maintained across the loop:
- Every `tool_use` gets a `tool_result`: Even on abort, synthetic error `tool_result` blocks are generated. The API requires matching pairs.
- Messages are append-only within an iteration: `messagesForQuery` is built fresh each iteration, but within an iteration, messages only grow via `push`.
- Compaction is at most once per iteration: The `hasAttemptedReactiveCompact` flag prevents infinite compact→retry→compact loops.
- Stop hooks run at most once per terminal position: The `stopHookActive` flag prevents re-triggering on the retry after a hook-injected blocking error.
- Budget checks are post-yield: The `QueryEngine` checks `maxBudgetUsd` after each yielded message, not inside the loop. This keeps budget enforcement in one place.
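The last invariant (budget checks are post-yield) can be sketched as a consuming generator that forwards each message before tallying its cost, so the message that crosses the budget still reaches the consumer. All names here are illustrative, not the real `QueryEngine` API:

```typescript
// Illustrative: budget enforcement lives in the consumer, after the yield.
async function* withBudget(
  messages: AsyncIterable<{ costUsd: number }>,
  maxBudgetUsd: number,
): AsyncGenerator<{ costUsd: number } | { type: 'error_max_budget_usd' }> {
  let total = 0;
  for await (const m of messages) {
    yield m;            // the message reaches the consumer first...
    total += m.costUsd; // ...then its cost is tallied...
    if (total >= maxBudgetUsd) {
      yield { type: 'error_max_budget_usd' }; // ...and the check fires
      return;
    }
  }
}
```

Putting the check after the yield means the inner loop never needs to know about budgets, and the message that tipped the total over is still delivered before the error result.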
Performance Characteristics
| Metric | Typical Value | Notes |
|---|---|---|
| Iterations per user message | 2-15 | Depends on task complexity |
| Time per iteration | 2-30s | Dominated by API latency |
| Tool overlap with streaming | 40-80% | StreamingToolExecutor starts tools during stream |
| Compact frequency | Every 5-20 turns | Depends on context window usage |
| Memory per iteration | ~2-5MB | Message array is the dominant cost |
The streaming tool executor is the single biggest performance optimization in the loop. By starting tool execution while the API is still streaming, it can save 1-5 seconds per iteration for multi-tool responses.
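The claimed saving follows from simple arithmetic: whatever fraction of the tool work overlaps the stream is subtracted from the serial total. A back-of-envelope helper (all numbers illustrative, not measurements):

```typescript
// Illustrative: serial time minus the tool work overlapped with the stream.
function iterationTime(
  streamSeconds: number,
  toolSeconds: number,
  overlapFraction: number, // 0..1 of tool work done during the stream
): number {
  const overlapped = Math.min(toolSeconds, streamSeconds) * overlapFraction;
  return streamSeconds + toolSeconds - overlapped;
}
```

For example, a 10 s stream with 5 s of tool work at 80% overlap takes about 11 s instead of the serial 15 s, a saving in the 1-5 second range cited above.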