Turn Lifecycle

A “turn” in Claude Code is a single round-trip: the model receives messages, produces a response (possibly with tool calls), tools execute, and results are collected. This chapter traces the complete lifecycle of one turn, from message construction to the continue/stop decision.

sequenceDiagram
participant QE as QueryEngine
participant Q as queryLoop
participant API as claude.ts
participant TE as Tool Executor
participant ATT as Attachments
QE->>Q: for await (query({messages, ...}))
Q->>Q: Prepare messagesForQuery
Q->>API: callModel(messages, systemPrompt, tools, ...)
API-->>Q: Stream assistant content blocks
Q->>TE: Feed tool_use blocks
TE-->>Q: Yield completed tool results (during stream)
Note over API: Stream ends
Q->>TE: getRemainingResults()
TE-->>Q: Remaining tool results
Q->>ATT: getAttachmentMessages(...)
ATT-->>Q: File changes, memory, skill discovery
Q->>Q: Continue decision
Q-->>QE: Yield messages

The system prompt is assembled in QueryEngine.submitMessage before the loop begins, then held constant across iterations:

src/QueryEngine.ts
const { defaultSystemPrompt, userContext, systemContext } =
  await fetchSystemPromptParts({
    tools,
    mainLoopModel: initialMainLoopModel,
    additionalWorkingDirectories: Array.from(
      initialAppState.toolPermissionContext.additionalWorkingDirectories.keys(),
    ),
    mcpClients,
    customSystemPrompt: customPrompt,
  });
const systemPrompt = asSystemPrompt([
  ...(customPrompt !== undefined ? [customPrompt] : defaultSystemPrompt),
  ...(memoryMechanicsPrompt ? [memoryMechanicsPrompt] : []),
  ...(appendSystemPrompt ? [appendSystemPrompt] : []),
]);

The system prompt is a layered structure:

  1. The default system prompt, or a custom system prompt from SDK callers (which replaces the default entirely)
  2. The memory mechanics prompt (when CLAUDE_COWORK_MEMORY_PATH_OVERRIDE is set)
  3. The append system prompt (additional instructions layered at the end)
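The layering can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the real helpers: `asSystemPrompt` and the option names mirror the snippet above, but the block shape is a guess.

```typescript
// Illustrative sketch of the system-prompt layering shown above.
type SystemPromptBlock = { type: "text"; text: string };

function asSystemPrompt(parts: string[]): SystemPromptBlock[] {
  // Each non-empty part becomes its own text block, preserving layer order.
  return parts
    .filter((p) => p.length > 0)
    .map((text) => ({ type: "text", text }));
}

function buildSystemPrompt(opts: {
  defaultSystemPrompt: string[];
  customPrompt?: string;
  memoryMechanicsPrompt?: string;
  appendSystemPrompt?: string;
}): SystemPromptBlock[] {
  return asSystemPrompt([
    // Layer 1: a custom prompt replaces the default entirely.
    ...(opts.customPrompt !== undefined
      ? [opts.customPrompt]
      : opts.defaultSystemPrompt),
    // Layer 2: memory mechanics, only when configured.
    ...(opts.memoryMechanicsPrompt ? [opts.memoryMechanicsPrompt] : []),
    // Layer 3: appended instructions.
    ...(opts.appendSystemPrompt ? [opts.appendSystemPrompt] : []),
  ]);
}
```

Note that a custom prompt suppresses the default but not the memory or append layers, which still stack on top of it.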

At API call time, user context and system context are injected:

// src/query.ts — inside the loop
const fullSystemPrompt = asSystemPrompt(
  appendSystemContext(systemPrompt, systemContext),
);
// User context is prepended to the message array, not the system prompt
deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  // ...
});
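A plausible shape for `prependUserContext`, assuming user context simply rides along as a leading user message when present (the message type here is simplified):

```typescript
// Hypothetical sketch: user context becomes the first user message in the
// array sent to the API, rather than another system-prompt layer.
type APIMsg = { role: "user" | "assistant"; content: string };

function prependUserContext(
  messages: APIMsg[],
  userContext: string | undefined,
): APIMsg[] {
  if (!userContext) return messages;
  return [{ role: "user", content: userContext }, ...messages];
}
```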

The messages sent to the API follow a strict structure defined by the Anthropic Messages API:

[system prompt]
[user context block] ← prepended via prependUserContext()
[user message] ← the original prompt
[assistant message] ← model's response with tool_use blocks
[user message] ← tool_result blocks
[assistant message] ← model's next response
... ← repeating pattern
[user message] ← latest tool results + attachments

Messages are normalized before being sent to the API via normalizeMessagesForAPI(), which:

  • Strips internal metadata fields
  • Ensures alternating user/assistant turns
  • Removes system-only messages
  • Handles tool result pairing

If auto-compaction has occurred, only messages after the last compact boundary are sent:

let messagesForQuery = [...getMessagesAfterCompactBoundary(messages)];

The compact boundary is a special system message that marks where compaction summarized older history. Everything before it is replaced by the summary.
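A sketch of the boundary scan, under the assumption that the boundary is a system message tagged with a subtype field (the name `compact_boundary` is illustrative):

```typescript
// Hypothetical sketch: find the LAST boundary marker and keep only what
// follows it, since everything earlier was folded into the summary.
type SessionMsg = { type: string; subtype?: string };

function getMessagesAfterCompactBoundary(messages: SessionMsg[]): SessionMsg[] {
  const idx = messages
    .map((m) => m.subtype)
    .lastIndexOf("compact_boundary");
  return idx === -1 ? messages : messages.slice(idx + 1);
}
```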

The API call is configured with model-specific parameters:

src/query.ts
for await (const message of deps.callModel({
  messages: prependUserContext(messagesForQuery, userContext),
  systemPrompt: fullSystemPrompt,
  thinkingConfig: toolUseContext.options.thinkingConfig,
  tools: toolUseContext.options.tools,
  signal: toolUseContext.abortController.signal,
  options: {
    model: currentModel,
    fastMode: appState.fastMode,
    fallbackModel,
    querySource,
    maxOutputTokensOverride,
    agentId: toolUseContext.agentId,
    effortValue: appState.effortValue,
    taskBudget: params.taskBudget && {
      total: params.taskBudget.total,
      ...(taskBudgetRemaining !== undefined && {
        remaining: taskBudgetRemaining,
      }),
    },
  },
}))

Key configuration parameters:

| Parameter | Source | Purpose |
| --- | --- | --- |
| `model` | `getRuntimeMainLoopModel()` | Which Claude model to use |
| `thinkingConfig` | `{ type: 'adaptive' }` or `{ type: 'disabled' }` | Extended thinking control |
| `tools` | `toolUseContext.options.tools` | Available tool definitions |
| `maxOutputTokensOverride` | Set during escalation recovery | Override the default 8K cap |
| `taskBudget` | SDK caller config | Server-side token budget |
| `effortValue` | User `/effort` command | Controls reasoning depth |
| `fallbackModel` | Config | Model to try if the primary fails |

Tools are converted to API schema format in toolToAPISchema() from src/utils/api.ts. Each tool’s Zod schema is transformed into JSON Schema for the API:

// Simplified from src/utils/api.ts
async function toolToAPISchema(
  tool: Tool,
  input: unknown,
  options: ToolOptions,
): Promise<BetaToolUnion> {
  return {
    name: tool.name,
    description: await tool.description(input, options),
    input_schema: tool.inputJSONSchema ?? zodToJsonSchema(tool.inputSchema),
  };
}

The API response streams as a sequence of server-sent events. The claude.ts module processes these into typed Message objects:

graph LR
A[SSE Events] --> B[claude.ts]
B --> C[message_start]
B --> D[content_block_start]
B --> E[content_block_delta]
B --> F[content_block_stop]
B --> G[message_delta]
B --> H[message_stop]
C --> I[Reset usage counters]
D --> J[Create AssistantMessage]
E --> K[Stream text/tool_use deltas]
F --> L[Yield completed block]
G --> M[Capture stop_reason, final usage]
H --> N[Accumulate total usage]

The stream processing in queryLoop distinguishes between:

  • assistant messages: Pushed to assistantMessages[], tool_use blocks extracted
  • stream_event messages: Usage tracking (message_start, message_delta, message_stop)
  • Withheld errors: prompt_too_long and max_output_tokens are captured but NOT yielded yet
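These three cases can be sketched as a small classifier. The message shapes below are simplified, and the error channel is illustrative; in the real loop, usage tracking and withheld-error handling are interleaved with streaming:

```typescript
// Illustrative classifier over the three stream message kinds described above.
type StreamMsg =
  | { type: "assistant"; content: { type: string }[] }
  | { type: "stream_event"; event: { type: "message_start" | "message_delta" | "message_stop" } }
  | { type: "error"; reason: "prompt_too_long" | "max_output_tokens" };

function classify(messages: StreamMsg[]) {
  const assistantMessages: StreamMsg[] = [];
  let withheldError: string | undefined;
  let toolUseCount = 0;
  for (const m of messages) {
    if (m.type === "assistant") {
      assistantMessages.push(m);
      // tool_use blocks are extracted for the tool executor.
      toolUseCount += m.content.filter((b) => b.type === "tool_use").length;
    } else if (m.type === "error") {
      // prompt_too_long / max_output_tokens are captured, not yielded,
      // so the recovery logic can decide what to do.
      withheldError = m.reason;
    }
    // stream_event messages feed usage tracking (omitted here).
  }
  return { assistantMessages, toolUseCount, withheldError };
}
```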

Before yielding assistant messages, tool_use inputs are backfilled for observability:

if (block.type === 'tool_use' && tool?.backfillObservableInput) {
  const inputCopy = { ...originalInput };
  tool.backfillObservableInput(inputCopy);
  // Only clone when backfill ADDED fields (not overwrites)
  const addedFields = Object.keys(inputCopy).some(k => !(k in originalInput));
  if (addedFields) {
    clonedContent ??= [...message.message.content];
    clonedContent[i] = { ...block, input: inputCopy };
  }
}

This adds legacy/derived fields for hooks and SDK consumers without mutating the original API-bound message (which would break prompt caching).

After tool execution, results are normalized for the API:

for await (const update of toolUpdates) {
  if (update.message) {
    yield update.message;
    toolResults.push(
      ...normalizeMessagesForAPI(
        [update.message],
        toolUseContext.options.tools,
      ).filter(_ => _.type === 'user'),
    );
  }
}

Each tool result becomes a user message containing a tool_result content block:

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01abc...",
      "content": "File written successfully",
      "is_error": false
    }
  ]
}

Error results set is_error: true and wrap the error in <tool_use_error> tags.
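A sketch of building an error result in that shape; the helper name is hypothetical, but the `is_error` flag and `<tool_use_error>` wrapping follow the description above:

```typescript
// Hypothetical helper: turn a tool failure into a tool_result user message.
function toolErrorResult(toolUseId: string, errorText: string) {
  return {
    role: "user" as const,
    content: [
      {
        type: "tool_result" as const,
        tool_use_id: toolUseId,
        // Wrapping lets the model distinguish a tool failure from a tool
        // that succeeded but happened to print error-like text.
        content: `<tool_use_error>${errorText}</tool_use_error>`,
        is_error: true,
      },
    ],
  };
}
```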

Phase 5: Token Counting and Budget Tracking
Token tracking happens at two levels:

// src/QueryEngine.ts — inside the for-await loop
case 'stream_event':
  if (message.event.type === 'message_start') {
    currentMessageUsage = EMPTY_USAGE;
    currentMessageUsage = updateUsage(currentMessageUsage, message.event.message.usage);
  }
  if (message.event.type === 'message_delta') {
    currentMessageUsage = updateUsage(currentMessageUsage, message.event.usage);
  }
  if (message.event.type === 'message_stop') {
    this.totalUsage = accumulateUsage(this.totalUsage, currentMessageUsage);
  }

The addToTotalSessionCost() function in src/cost-tracker.ts maintains running totals:

export function addToTotalSessionCost(
  cost: number,
  usage: Usage,
  model: string,
): void {
  const modelUsage = addToTotalModelUsage(cost, usage, model);
  addToTotalCostState(cost, modelUsage, model);
  // Also tracks advisor model usage recursively
}

The API returns these token counts per response:

| Field | Description |
| --- | --- |
| `input_tokens` | Tokens in the prompt (non-cached) |
| `output_tokens` | Tokens generated by the model |
| `cache_creation_input_tokens` | Tokens written to the prompt cache |
| `cache_read_input_tokens` | Tokens read from the prompt cache |

These are accumulated in NonNullableUsage:

src/services/api/logging.ts
export type NonNullableUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
};
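Accumulation over this type is plain field-wise addition. The sketch below is illustrative; the real `updateUsage`/`accumulateUsage` signatures may differ, and the `?? 0` guards assume the API can omit fields on some events:

```typescript
// Illustrative field-wise accumulation over the four usage counters.
type NonNullableUsage = {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
};

const EMPTY_USAGE: NonNullableUsage = {
  input_tokens: 0,
  output_tokens: 0,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 0,
};

function accumulateUsage(
  total: NonNullableUsage,
  delta: Partial<NonNullableUsage>,
): NonNullableUsage {
  return {
    input_tokens: total.input_tokens + (delta.input_tokens ?? 0),
    output_tokens: total.output_tokens + (delta.output_tokens ?? 0),
    cache_creation_input_tokens:
      total.cache_creation_input_tokens + (delta.cache_creation_input_tokens ?? 0),
    cache_read_input_tokens:
      total.cache_read_input_tokens + (delta.cache_read_input_tokens ?? 0),
  };
}
```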

The final phase determines whether the loop should continue or terminate. The decision tree is:

graph TD
A{needsFollowUp?} -- no --> B{Is API error?}
B -- yes --> C{Recoverable?}
C -- prompt_too_long --> D[Try collapse drain]
D -- success --> E[Continue: collapse_drain_retry]
D -- fail --> F[Try reactive compact]
F -- success --> G[Continue: reactive_compact_retry]
F -- fail --> H[Return error]
C -- max_output_tokens --> I[Try escalate 8K→64K]
I -- first time --> J[Continue: max_output_tokens_escalate]
I -- already escalated --> K{Recovery count < 3?}
K -- yes --> L[Inject resume message]
L --> M[Continue: max_output_tokens_recovery]
K -- no --> N[Surface error]
B -- no --> O{Stop hooks?}
O -- blocking --> P[Continue: stop_hook_blocking]
O -- prevent --> Q[Return stop_hook_prevented]
O -- pass --> R{Token budget?}
R -- continue --> S[Continue: token_budget_continuation]
R -- stop --> T[Return completed]
A -- yes --> U{maxTurns reached?}
U -- yes --> V[Return max_turns]
U -- no --> W[Continue: next_turn]
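The `max_output_tokens` branch of this tree can be written as a pure decision function. The names and return shape below are illustrative, but the logic follows the diagram: escalate once, then retry up to three recoveries, then surface the error:

```typescript
// Illustrative encoding of the max_output_tokens recovery branch above.
type Recovery =
  | { action: "continue"; reason: "max_output_tokens_escalate" | "max_output_tokens_recovery" }
  | { action: "stop"; reason: "max_output_tokens" };

function decideMaxOutputTokens(
  alreadyEscalated: boolean,
  recoveryCount: number,
): Recovery {
  if (!alreadyEscalated) {
    // First hit: raise the output cap (8K -> 64K per the diagram) and retry.
    return { action: "continue", reason: "max_output_tokens_escalate" };
  }
  if (recoveryCount < 3) {
    // Already at the higher cap: inject a resume message and retry.
    return { action: "continue", reason: "max_output_tokens_recovery" };
  }
  // Out of retries: surface the error to the caller.
  return { action: "stop", reason: "max_output_tokens" };
}
```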

The turnCount starts at 1 and increments each time tool results produce a follow-up:

// Each tool-result follow-up is a new turn
const nextTurnCount = turnCount + 1;
if (maxTurns && nextTurnCount > maxTurns) {
yield createAttachmentMessage({
type: 'max_turns_reached',
maxTurns,
turnCount: nextTurnCount,
});
return { reason: 'max_turns', turnCount: nextTurnCount };
}

In the outer QueryEngine, there’s a separate turnCount that increments on each user message yielded from the inner loop:

if (message.type === 'user') {
  turnCount++;
}

The difference: the inner turnCount counts API round-trips, while the outer one counts user-visible turn boundaries.