跳转到内容

Cost Tracking

此内容尚不支持你的语言。

Claude Code tracks every token and every dollar across the entire session. This chapter covers how costs flow from API responses through the tracking system to the TUI status line.

graph TD
A[API Response] --> B[claude.ts - message_delta event]
B --> C[usage: input_tokens, output_tokens, cache_*]
C --> D[addToTotalSessionCost]
D --> E[addToTotalModelUsage]
D --> F[addToTotalCostState - global state]
D --> G[OpenTelemetry counters]
F --> H[getTotalCostUSD]
F --> I[getModelUsage]
H --> J[QueryEngine budget check]
H --> K[formatTotalCost - TUI display]
I --> K

Each API response includes a usage object with these fields:

// From @anthropic-ai/sdk
type BetaUsage = {
input_tokens: number;
output_tokens: number;
cache_creation_input_tokens?: number;
cache_read_input_tokens?: number;
server_tool_use?: {
web_search_requests?: number;
};
};

The fields map to:

FieldMeaningBilling
input_tokensNon-cached prompt tokensStandard input rate
output_tokensGenerated tokensStandard output rate
cache_creation_input_tokensTokens written to prompt cache1.25× input rate
cache_read_input_tokensTokens read from prompt cache0.1× input rate
web_search_requestsServer-side web searchesPer-request fee

Usage is captured at two points in the stream:

// src/services/api/claude.ts — inside queryModelWithStreaming
case 'message_start':
// Initial token count (just the input)
usage = event.message.usage;
break;
case 'message_delta':
// Final token count (after generation)
usage = updateUsage(usage, event.usage);
// event.usage has output_tokens and possibly updated cache tokens
break;

The message_stop event in QueryEngine triggers cost accumulation:

src/QueryEngine.ts
if (message.event.type === 'message_stop') {
this.totalUsage = accumulateUsage(this.totalUsage, currentMessageUsage);
}

Costs are calculated using per-model pricing defined in src/utils/modelCost.ts:

src/utils/modelCost.ts
export type ModelCosts = {
inputTokens: number; // per 1M tokens
outputTokens: number; // per 1M tokens
promptCacheWriteTokens: number;
promptCacheReadTokens: number;
webSearchRequests: number; // per request
};
// Standard Sonnet pricing: $3/$15 per Mtok
export const COST_TIER_3_15 = {
inputTokens: 3,
outputTokens: 15,
promptCacheWriteTokens: 3.75,
promptCacheReadTokens: 0.3,
webSearchRequests: 0.01,
};
// Opus pricing: $15/$75 per Mtok
export const COST_TIER_15_75 = {
inputTokens: 15,
outputTokens: 75,
promptCacheWriteTokens: 18.75,
promptCacheReadTokens: 1.5,
webSearchRequests: 0.01,
};

The calculateUSDCost function computes the dollar cost:

export function calculateUSDCost(model: string, usage: Usage): number {
const costs = getModelCosts(model);
if (!costs) {
setHasUnknownModelCost(true);
return 0; // unknown model → track as $0 but flag it
}
return (
(usage.input_tokens * costs.inputTokens) / 1_000_000 +
(usage.output_tokens * costs.outputTokens) / 1_000_000 +
((usage.cache_creation_input_tokens ?? 0) * costs.promptCacheWriteTokens) / 1_000_000 +
((usage.cache_read_input_tokens ?? 0) * costs.promptCacheReadTokens) / 1_000_000 +
((usage.server_tool_use?.web_search_requests ?? 0) * costs.webSearchRequests)
);
}

This is the central accumulation point in src/cost-tracker.ts:

export function addToTotalSessionCost(
cost: number,
usage: Usage,
model: string,
): number {
// 1. Accumulate per-model usage
const modelUsage = addToTotalModelUsage(cost, usage, model);
// 2. Update global state
addToTotalCostState(cost, modelUsage, model);
// 3. Emit to OpenTelemetry
getCostCounter()?.add(cost, attrs);
getTokenCounter()?.add(usage.input_tokens, { ...attrs, type: 'input' });
getTokenCounter()?.add(usage.output_tokens, { ...attrs, type: 'output' });
getTokenCounter()?.add(usage.cache_read_input_tokens ?? 0, { ...attrs, type: 'cacheRead' });
getTokenCounter()?.add(usage.cache_creation_input_tokens ?? 0, { ...attrs, type: 'cacheCreation' });
// 4. Recursively add advisor model usage
let totalCost = cost;
for (const advisorUsage of getAdvisorUsage(usage)) {
totalCost += addToTotalSessionCost(
calculateUSDCost(advisorUsage.model, advisorUsage),
advisorUsage,
advisorUsage.model,
);
}
return totalCost;
}

The addToTotalModelUsage function maintains per-model running totals:

function addToTotalModelUsage(cost: number, usage: Usage, model: string): ModelUsage {
const modelUsage = getUsageForModel(model) ?? {
inputTokens: 0, outputTokens: 0,
cacheReadInputTokens: 0, cacheCreationInputTokens: 0,
webSearchRequests: 0, costUSD: 0,
contextWindow: 0, maxOutputTokens: 0,
};
modelUsage.inputTokens += usage.input_tokens;
modelUsage.outputTokens += usage.output_tokens;
modelUsage.cacheReadInputTokens += usage.cache_read_input_tokens ?? 0;
modelUsage.cacheCreationInputTokens += usage.cache_creation_input_tokens ?? 0;
modelUsage.webSearchRequests += usage.server_tool_use?.web_search_requests ?? 0;
modelUsage.costUSD += cost;
modelUsage.contextWindow = getContextWindowForModel(model, getSdkBetas());
modelUsage.maxOutputTokens = getModelMaxOutputTokens(model).default;
return modelUsage;
}

All cost counters live in src/bootstrap/state.ts as module-level variables. This centralizes state for the entire process:

// src/bootstrap/state.ts (conceptual)
let totalCostUSD = 0;
let totalAPIDuration = 0;
let totalToolDuration = 0;
let totalLinesAdded = 0;
let totalLinesRemoved = 0;
let modelUsage: Record<string, ModelUsage> = {};

Enforced in QueryEngine.submitMessage after each yielded message:

src/QueryEngine.ts
if (maxBudgetUsd !== undefined && getTotalCost() >= maxBudgetUsd) {
yield {
type: 'result',
subtype: 'error_max_budget_usd',
is_error: true,
errors: [`Reached maximum budget ($${maxBudgetUsd})`],
total_cost_usd: getTotalCost(),
usage: this.totalUsage,
};
return;
}

The task budget is passed to the API and enforced server-side:

src/query.ts
taskBudget: params.taskBudget && {
total: params.taskBudget.total,
...(taskBudgetRemaining !== undefined && {
remaining: taskBudgetRemaining,
}),
},

The remaining field is decremented as tokens are consumed. When exhausted, the API returns a stop reason.

Cost state is saved to project config for resume support:

src/cost-tracker.ts
export function saveCurrentSessionCosts(fpsMetrics?: FpsMetrics): void {
saveCurrentProjectConfig(current => ({
...current,
lastCost: getTotalCostUSD(),
lastAPIDuration: getTotalAPIDuration(),
lastToolDuration: getTotalToolDuration(),
lastDuration: getTotalDuration(),
lastLinesAdded: getTotalLinesAdded(),
lastLinesRemoved: getTotalLinesRemoved(),
lastTotalInputTokens: getTotalInputTokens(),
lastTotalOutputTokens: getTotalOutputTokens(),
lastTotalCacheCreationInputTokens: getTotalCacheCreationInputTokens(),
lastTotalCacheReadInputTokens: getTotalCacheReadInputTokens(),
lastTotalWebSearchRequests: getTotalWebSearchRequests(),
lastModelUsage: Object.fromEntries(
Object.entries(getModelUsage()).map(([model, usage]) => [model, {
inputTokens: usage.inputTokens,
outputTokens: usage.outputTokens,
cacheReadInputTokens: usage.cacheReadInputTokens,
cacheCreationInputTokens: usage.cacheCreationInputTokens,
webSearchRequests: usage.webSearchRequests,
costUSD: usage.costUSD,
}]),
),
lastSessionId: getSessionId(),
}));
}

When restoring a session, the costs are loaded back:

export function restoreCostStateForSession(sessionId: string): boolean {
const data = getStoredSessionCosts(sessionId);
if (!data) return false;
setCostStateForRestore(data);
return true;
}

This only restores if the sessionId matches — preventing cross-session cost contamination.

The formatTotalCost() function produces the cost summary shown on exit:

export function formatTotalCost(): string {
const costDisplay = formatCost(getTotalCostUSD()) +
(hasUnknownModelCost()
? ' (costs may be inaccurate due to usage of unknown models)'
: '');
return chalk.dim(
`Total cost: ${costDisplay}\n` +
`Total duration (API): ${formatDuration(getTotalAPIDuration())}\n` +
`Total duration (wall): ${formatDuration(getTotalDuration())}\n` +
`Total code changes: ${getTotalLinesAdded()} lines added, ${getTotalLinesRemoved()} lines removed\n` +
`${formatModelUsage()}`
);
}

Per-model breakdown uses canonical (short) names:

function formatModelUsage(): string {
const usageByShortName = {};
for (const [model, usage] of Object.entries(getModelUsage())) {
const shortName = getCanonicalName(model);
// Accumulate by short name (e.g., "sonnet" combines all sonnet variants)
usageByShortName[shortName].inputTokens += usage.inputTokens;
// ...
}
let result = 'Usage by model:';
for (const [shortName, usage] of Object.entries(usageByShortName)) {
result += `\n${shortName}: ${formatNumber(usage.inputTokens)} input, ` +
`${formatNumber(usage.outputTokens)} output, ` +
`${formatNumber(usage.cacheReadInputTokens)} cache read, ` +
`${formatNumber(usage.cacheCreationInputTokens)} cache write` +
` ($${usage.costUSD.toFixed(4)})`;
}
return result;
}

Example output:

Total cost: $0.0847
Total duration (API): 45.2s
Total duration (wall): 1m 12s
Total code changes: 42 lines added, 7 lines removed
Usage by model:
sonnet: 125,432 input, 8,291 output, 98,100 cache read, 27,332 cache write ($0.0847)

The formatCost function uses adaptive precision:

function formatCost(cost: number, maxDecimalPlaces: number = 4): string {
return `$${cost > 0.5
? round(cost, 100).toFixed(2) // > $0.50 → 2 decimal places
: cost.toFixed(maxDecimalPlaces) // ≤ $0.50 → 4 decimal places
}`;
}
  • $0.0012 → shows 4 decimal places (cheap queries)
  • $3.47 → shows 2 decimal places (expensive sessions)