Error Handling

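Claude Code's error handling is elaborate because the tool has to degrade gracefully across a wide range of API failures: transient rate limits, authentication errors, and full API outages. The system centers on withRetry.ts and errors.ts in src/services/api/.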

graph TB
    subgraph "API Call"
        CALL["claude.ts → API request"]
    end

    subgraph "withRetry.ts"
        RETRY["Retry loop (max 10 attempts)"]
        CLASS["Error classification"]
        BACKOFF["Exponential backoff + jitter"]
        FASTMODE["Fast mode fallback"]
        FALLBACK["Model fallback (529)"]
        PERSIST["Persistent retry (unattended)"]
    end

    subgraph "errors.ts"
        MSG["User-facing error messages"]
        CLASSIFY["Error type classification"]
        ANALYTICS["Analytics tagging"]
    end

    subgraph "Consumer"
        UI["REPL UI display"]
        SDK_OUT["SDK error messages"]
    end

    CALL --> RETRY
    RETRY -->|Retryable| BACKOFF
    RETRY -->|Fast mode| FASTMODE
    RETRY -->|Repeated 529| FALLBACK
    RETRY -->|Unattended 429/529| PERSIST
    RETRY -->|Non-retryable| CLASS
    CLASS --> MSG
    MSG --> UI
    MSG --> SDK_OUT
    BACKOFF --> RETRY
    FASTMODE --> RETRY

The withRetry() function in src/services/api/withRetry.ts is an async generator that wraps an API call with retry logic:

src/services/api/withRetry.ts
export async function* withRetry<T>(
  getClient: () => Promise<Anthropic>,
  operation: (client: Anthropic, attempt: number, context: RetryContext) => Promise<T>,
  options: RetryOptions,
): AsyncGenerator<SystemAPIErrorMessage, T> {
  const maxRetries = getMaxRetries(options) // Default: 10
  const retryContext: RetryContext = {
    model: options.model,
    thinkingConfig: options.thinkingConfig,
    ...(isFastModeEnabled() && { fastMode: options.fastMode }),
  }
  let client: Anthropic | null = null
  let consecutive529Errors = options.initialConsecutive529Errors ?? 0
  // One initial attempt plus up to maxRetries retries
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    if (options.signal?.aborted) throw new APIUserAbortError()
    try {
      if (client === null || /* auth error on last attempt */) {
        client = await getClient()
      }
      return await operation(client, attempt, retryContext)
    } catch (error) {
      // Classification and retry logic...
    }
  }
  throw new CannotRetryError(lastError, retryContext)
}

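Key design point: withRetry yields SystemAPIErrorMessage values while it waits between attempts (for display in the UI) and delivers the final result through the generator's return value.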
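The yield-then-return shape can be easy to misuse, because a plain `for await` loop discards a generator's return value. The sketch below shows one way a consumer might read both channels. `StatusMessage`, `retrying`, and `collect` are hypothetical names invented for this illustration; they are not part of Claude Code.

```typescript
// Hypothetical stand-in for SystemAPIErrorMessage.
type StatusMessage = { attempt: number; retryInMs: number }

// A toy generator with the same shape as withRetry: it yields a status
// message after each failed attempt and returns the final result.
async function* retrying<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts: number,
): AsyncGenerator<StatusMessage, T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation(attempt)
    } catch (error) {
      if (attempt >= maxAttempts) throw error
      yield { attempt, retryInMs: 0 } // a real loop would sleep before retrying
    }
  }
}

// Drive the generator by hand: call next() until `done` is true, forwarding
// yielded messages and reading the final result from the last `value`.
async function collect<T>(
  gen: AsyncGenerator<StatusMessage, T>,
  onStatus: (message: StatusMessage) => void,
): Promise<T> {
  while (true) {
    const step = await gen.next()
    if (step.done) return step.value
    onStatus(step.value)
  }
}
```

A caller would pass a UI callback as `onStatus` and await `collect(...)` for the result.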

Claude Code classifies errors by HTTP status code and handles each class differently:

// Rate limit handling depends on subscription type
if (error.status === 429) {
  // ClaudeAI subscribers (Max/Pro): don't retry (wait could be hours)
  // Enterprise subscribers: retry (typically PAYG, short limits)
  // API key users: retry with backoff
  return !isClaudeAISubscriber() || isEnterpriseSubscriber()
}

Rate limit responses carry headers that guide this behavior:

// Headers checked for rate limiting
'anthropic-ratelimit-unified-representative-claim' // 'five_hour' | 'seven_day'
'anthropic-ratelimit-unified-overage-status' // 'allowed' | 'rejected'
'anthropic-ratelimit-unified-reset' // Unix timestamp
'anthropic-ratelimit-unified-overage-disabled-reason' // Why extra usage is blocked
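To make the header semantics concrete, here is a minimal sketch that gathers the four headers into one typed value. The header names and allowed values come from the list above; `HeaderReader`, `UnifiedRateLimitInfo`, and `readUnifiedRateLimit` are hypothetical names, and the real code may read the headers quite differently.

```typescript
// Minimal accessor so the sketch does not depend on a specific HTTP client.
interface HeaderReader {
  get(name: string): string | null
}

interface UnifiedRateLimitInfo {
  claim: 'five_hour' | 'seven_day' | null
  overageAllowed: boolean | null
  resetAt: Date | null
  overageDisabledReason: string | null
}

// Hypothetical helper: unknown or missing values become null.
function readUnifiedRateLimit(headers: HeaderReader): UnifiedRateLimitInfo {
  const prefix = 'anthropic-ratelimit-unified-'
  const claim = headers.get(prefix + 'representative-claim')
  const overage = headers.get(prefix + 'overage-status')
  const reset = Number(headers.get(prefix + 'reset') ?? NaN)
  return {
    claim: claim === 'five_hour' || claim === 'seven_day' ? claim : null,
    overageAllowed: overage === 'allowed' ? true : overage === 'rejected' ? false : null,
    // The reset header is a Unix timestamp in seconds.
    resetAt: Number.isFinite(reset) ? new Date(reset * 1000) : null,
    overageDisabledReason: headers.get(prefix + 'overage-disabled-reason'),
  }
}
```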

src/services/api/withRetry.ts
export function is529Error(error: unknown): boolean {
  if (!(error instanceof APIError)) return false
  return (
    error.status === 529 ||
    // SDK sometimes fails to pass 529 status during streaming
    (error.message?.includes('"type":"overloaded_error"') ?? false)
  )
}

529 errors get special handling based on the query source; only foreground queries retry:

// Only foreground sources retry on 529 to avoid amplification
const FOREGROUND_529_RETRY_SOURCES = new Set<QuerySource>([
  'repl_main_thread',
  'sdk',
  'agent:custom',
  'agent:default',
  'compact',
  'auto_mode',
  // Background sources (summaries, titles, classifiers) bail immediately
])

function shouldRetry529(querySource: QuerySource | undefined): boolean {
  return querySource === undefined || FOREGROUND_529_RETRY_SOURCES.has(querySource)
}

Design rationale: during a capacity cascade, each retry multiplies load by 3-10x. Background queries (title generation, suggestions) are ones the user never sees, so they should fail silently rather than make the cascade worse.

// Always retry internal server errors
if (error.status && error.status >= 500) return true

if (error.status === 401) {
  // Clear cached API key and retry
  clearApiKeyHelperCache()
  // For OAuth: force token refresh
  if (lastError instanceof APIError && lastError.status === 401) {
    const failedAccessToken = getClaudeAIOAuthTokens()?.accessToken
    if (failedAccessToken) {
      await handleOAuth401Error(failedAccessToken)
    }
  }
  return true // Retry with refreshed credentials
}

function isStaleConnectionError(error: unknown): boolean {
  if (!(error instanceof APIConnectionError)) return false
  const details = extractConnectionErrorDetails(error)
  return details?.code === 'ECONNRESET' || details?.code === 'EPIPE'
}

// On stale connection: disable keep-alive and reconnect
if (isStaleConnection) {
  disableKeepAlive()
  client = await getClient() // Force new connection
}

src/services/api/withRetry.ts
export const BASE_DELAY_MS = 500

export function getRetryDelay(
  attempt: number,
  retryAfterHeader?: string | null,
  maxDelayMs = 32000,
): number {
  // Honor server's Retry-After header if present
  if (retryAfterHeader) {
    const seconds = parseInt(retryAfterHeader, 10)
    if (!isNaN(seconds)) return seconds * 1000
  }
  // Exponential backoff: 500ms, 1s, 2s, 4s, 8s, 16s, 32s (capped)
  const baseDelay = Math.min(
    BASE_DELAY_MS * Math.pow(2, attempt - 1),
    maxDelayMs,
  )
  // Add 25% jitter to prevent thundering herd
  const jitter = Math.random() * 0.25 * baseDelay
  return baseDelay + jitter
}

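Retry delay sequence with the default settings:

| Attempt | Base delay | With jitter (approx.) |
| --- | --- | --- |
| 1 | 500ms | 500-625ms |
| 2 | 1,000ms | 1,000-1,250ms |
| 3 | 2,000ms | 2,000-2,500ms |
| 4 | 4,000ms | 4,000-5,000ms |
| 5 | 8,000ms | 8,000-10,000ms |
| 6 | 16,000ms | 16,000-20,000ms |
| 7+ | 32,000ms | 32,000-40,000ms |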
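The delay bounds can be reproduced with a short sketch that uses the same backoff formula as getRetryDelay but omits the Retry-After branch. Passing the random source as a parameter is an assumption made here so the bounds are deterministic; `backoffDelay` is a hypothetical name.

```typescript
const BASE_DELAY_MS = 500

// Exponential backoff with up to 25% jitter. `random` defaults to Math.random;
// tests can inject a fixed value instead.
function backoffDelay(
  attempt: number,
  random: () => number = Math.random,
  maxDelayMs = 32000,
): number {
  const baseDelay = Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), maxDelayMs)
  return baseDelay + random() * 0.25 * baseDelay
}

// Print the jitter range for attempts 1 through 8. Math.random() never
// returns exactly 1, so the upper value is an exclusive bound.
for (let attempt = 1; attempt <= 8; attempt++) {
  const low = backoffDelay(attempt, () => 0)
  const high = backoffDelay(attempt, () => 1)
  console.log(`attempt ${attempt}: ${low}-${high}ms`)
}
```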

When fast mode is active, a 429 or 529 error triggers a fallback mechanism:

src/services/api/withRetry.ts
const SHORT_RETRY_THRESHOLD_MS = 20 * 1000 // 20 seconds
const MIN_COOLDOWN_MS = 10 * 60 * 1000 // 10 minutes
const DEFAULT_FAST_MODE_FALLBACK_HOLD_MS = 30 * 60 * 1000 // 30 minutes

if (wasFastModeActive && (error.status === 429 || is529Error(error))) {
  const retryAfterMs = getRetryAfterMs(error)
  if (retryAfterMs !== null && retryAfterMs < SHORT_RETRY_THRESHOLD_MS) {
    // Short retry-after (<20s): wait and retry with fast mode still active
    // Preserves prompt cache (same model name)
    await sleep(retryAfterMs, options.signal)
    continue
  }
  // Long or unknown retry-after: enter cooldown
  const cooldownMs = Math.max(
    retryAfterMs ?? DEFAULT_FAST_MODE_FALLBACK_HOLD_MS,
    MIN_COOLDOWN_MS,
  )
  triggerFastModeCooldown(Date.now() + cooldownMs, cooldownReason)
  retryContext.fastMode = false
  continue
}

graph TB
    A["429/529 in Fast Mode"] --> B{Retry-After < 20s?}
    B -->|Yes| C["Wait & retry<br/>(keep fast mode)"]
    B -->|No| D["Enter cooldown<br/>(switch to normal)"]
    D --> E["Cooldown for<br/>max(retryAfter, 10min)"]
    E --> F["Retry with<br/>normal mode model"]

    A --> G{Overage disabled?}
    G -->|Yes| H["Permanently disable<br/>fast mode"]
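The wait-or-cooldown branch can be written as a pure function, which makes the thresholds easy to check. The three constants are taken from the excerpt above; `FastModeDecision` and `decideFastMode` are hypothetical names used only in this sketch.

```typescript
const SHORT_RETRY_THRESHOLD_MS = 20 * 1000
const MIN_COOLDOWN_MS = 10 * 60 * 1000
const DEFAULT_FAST_MODE_FALLBACK_HOLD_MS = 30 * 60 * 1000

type FastModeDecision =
  | { kind: 'wait_and_retry'; waitMs: number }
  | { kind: 'cooldown'; cooldownMs: number }

// Pure version of the branch: no sleeping and no side effects.
function decideFastMode(retryAfterMs: number | null): FastModeDecision {
  if (retryAfterMs !== null && retryAfterMs < SHORT_RETRY_THRESHOLD_MS) {
    return { kind: 'wait_and_retry', waitMs: retryAfterMs }
  }
  // An unknown retry-after falls back to the 30-minute default hold;
  // a known one is raised to at least the 10-minute minimum.
  const cooldownMs = Math.max(
    retryAfterMs ?? DEFAULT_FAST_MODE_FALLBACK_HOLD_MS,
    MIN_COOLDOWN_MS,
  )
  return { kind: 'cooldown', cooldownMs }
}
```

Under these constants, a 60-second retry-after still produces a 10-minute cooldown, while a missing retry-after produces 30 minutes.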

After 3 consecutive 529 errors, Claude Code can switch from the primary model to a fallback model:

src/services/api/withRetry.ts
const MAX_529_RETRIES = 3

if (is529Error(error)) {
  consecutive529Errors++
  if (consecutive529Errors >= MAX_529_RETRIES) {
    if (options.fallbackModel) {
      // Throw special error; the caller catches it and retries with the fallback model
      throw new FallbackTriggeredError(options.model, options.fallbackModel)
    }
    // External users with no fallback: give up
    if (process.env.USER_TYPE === 'external') {
      throw new CannotRetryError(
        new Error(REPEATED_529_ERROR_MESSAGE),
        retryContext,
      )
    }
  }
}

FallbackTriggeredError is caught by query.ts, which reissues the API call with the fallback model.
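The catch-and-reissue pattern might look roughly like the sketch below. It is not the code in query.ts: the `FallbackTriggeredError` class here is a local stand-in with assumed field names, and `callWithFallback` is a hypothetical helper.

```typescript
// Local stand-in for the error thrown by withRetry; field names are assumed.
class FallbackTriggeredError extends Error {
  constructor(
    readonly originalModel: string,
    readonly fallbackModel: string,
  ) {
    super(`Falling back from ${originalModel} to ${fallbackModel}`)
    this.name = 'FallbackTriggeredError'
  }
}

// Run the call once; if the fallback error surfaces, run it exactly once
// more with the fallback model. Any other error propagates unchanged.
async function callWithFallback<T>(
  model: string,
  call: (model: string) => Promise<T>,
): Promise<T> {
  try {
    return await call(model)
  } catch (error) {
    if (error instanceof FallbackTriggeredError) {
      return await call(error.fallbackModel)
    }
    throw error
  }
}
```

Carrying the fallback model inside the error keeps the retry loop unaware of how the caller rebuilds the request.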

For unattended (headless) sessions, Claude Code supports persistent retry: it retries indefinitely and emits keep-alive heartbeats while it waits:

src/services/api/withRetry.ts
const PERSISTENT_MAX_BACKOFF_MS = 5 * 60 * 1000 // 5 minutes max backoff
const PERSISTENT_RESET_CAP_MS = 6 * 60 * 60 * 1000 // 6 hours max wait
const HEARTBEAT_INTERVAL_MS = 30_000 // 30 second heartbeats

function isPersistentRetryEnabled(): boolean {
  return isEnvTruthy(process.env.CLAUDE_CODE_UNATTENDED_RETRY)
}

When persistent retry is active:

if (persistent) {
  // Chunk long sleeps to emit heartbeats
  let remaining = delayMs
  while (remaining > 0) {
    if (options.signal?.aborted) throw new APIUserAbortError()
    // Yield status message as heartbeat
    yield createSystemAPIErrorMessage(error, remaining, reportedAttempt, maxRetries)
    const chunk = Math.min(remaining, HEARTBEAT_INTERVAL_MS)
    await sleep(chunk, options.signal)
    remaining -= chunk
  }
  // Clamp attempt counter so the for-loop never terminates
  if (attempt >= maxRetries) attempt = maxRetries
}

Why heartbeats? The host environment (a CI system or an orchestrator) may kill a session that looks idle. Each yielded SystemAPIErrorMessage produces stdout activity via QueryEngine, which keeps the session alive.
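The chunking arithmetic is easy to check in isolation. The sketch below reproduces only the splitting of one long delay into heartbeat-sized sleeps; `heartbeatChunks` is a hypothetical helper, and the sleeping and message emission are left out.

```typescript
const HEARTBEAT_INTERVAL_MS = 30_000

// Splits one long delay into heartbeat-sized sleeps. Each yielded value is
// the length of the next sleep; a real loop would emit a status message
// before each one.
function* heartbeatChunks(
  delayMs: number,
  intervalMs = HEARTBEAT_INTERVAL_MS,
): Generator<number> {
  let remaining = delayMs
  while (remaining > 0) {
    const chunk = Math.min(remaining, intervalMs)
    yield chunk
    remaining -= chunk
  }
}

// A 75-second delay becomes two full intervals plus a 15-second remainder.
const chunks = [...heartbeatChunks(75_000)]
console.log(chunks)
```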

For 429 errors that carry a rate limit reset header, persistent retry waits until the reset time the server reports, capped at PERSISTENT_RESET_CAP_MS (6 hours):

function getRateLimitResetDelayMs(error: APIError): number | null {
  const resetHeader = error.headers?.get?.('anthropic-ratelimit-unified-reset')
  if (!resetHeader) return null
  const resetUnixSec = Number(resetHeader)
  const delayMs = resetUnixSec * 1000 - Date.now()
  return Math.min(delayMs, PERSISTENT_RESET_CAP_MS)
}
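A worked version of the reset arithmetic, with the clock passed in so the numbers are reproducible. `resetDelayMs` is a hypothetical variant, not the function above: it also returns null for unparseable or already-past timestamps, which is an extra guard assumed for this sketch and not shown in the excerpt.

```typescript
const PERSISTENT_RESET_CAP_MS = 6 * 60 * 60 * 1000 // 6 hours

// Converts a reset header (Unix seconds) into a wait in milliseconds.
// Returns null when there is nothing sensible to wait for.
function resetDelayMs(resetHeader: string | null, nowMs: number): number | null {
  if (!resetHeader) return null
  const resetUnixSec = Number(resetHeader)
  if (!Number.isFinite(resetUnixSec)) return null // assumed guard: bad header
  const delayMs = resetUnixSec * 1000 - nowMs
  if (delayMs <= 0) return null // assumed guard: reset time already passed
  return Math.min(delayMs, PERSISTENT_RESET_CAP_MS)
}

const now = 1_700_000_000_000
console.log(resetDelayMs('1700000090', now)) // reset 90 seconds ahead
console.log(resetDelayMs('1700100000', now)) // reset far ahead, so the cap applies
```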

The classifyAPIError() function in errors.ts maps errors to standardized tags:

src/services/api/errors.ts
export function classifyAPIError(error: unknown): string {
  // `error` is unknown, so read the message only when it is an Error
  const message = error instanceof Error ? error.message : ''
  if (error instanceof APIConnectionTimeoutError) return 'api_timeout'
  if (message.includes(REPEATED_529_ERROR_MESSAGE)) return 'repeated_529'
  if (error instanceof APIError && error.status === 429) return 'rate_limit'
  if (error instanceof APIError && error.status === 529) return 'server_overload'
  if (message.includes('prompt is too long')) return 'prompt_too_long'
  if (message.includes('x-api-key')) return 'invalid_api_key'
  if (error instanceof APIError && error.status >= 500) return 'server_error'
  if (error instanceof APIConnectionError) {
    const details = extractConnectionErrorDetails(error)
    if (details?.isSSLError) return 'ssl_cert_error'
    return 'connection_error'
  }
  return 'unknown'
}

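Full classification taxonomy:

| Error type | HTTP status | Description |
| --- | --- | --- |
| api_timeout | | Connection timeout |
| rate_limit | 429 | Rate limit |
| server_overload | 529 | API overloaded |
| repeated_529 | 529 | 3+ consecutive 529s |
| prompt_too_long | 400 | Input exceeds the context window |
| pdf_too_large | 400 | PDF exceeds the page limit |
| image_too_large | 400 | Image exceeds the size limit |
| tool_use_mismatch | 400 | tool_use/tool_result pairing error |
| invalid_model | 400 | Model name not recognized |
| credit_balance_low | | Insufficient balance |
| invalid_api_key | 401 | Invalid API key |
| token_revoked | 403 | OAuth token revoked |
| auth_error | 401/403 | Generic authentication failure |
| server_error | 500+ | Internal server error |
| connection_error | | Network connection |
| ssl_cert_error | | SSL/TLS certificate problem |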

The getAssistantMessageFromError() function converts API errors into user-friendly messages:

src/services/api/errors.ts
export function getAssistantMessageFromError(
  error: unknown,
  model: string,
): AssistantMessage {
  // Timeout → "Request timed out"
  // Image too large → "Image was too large. Try resizing..."
  // Prompt too long → "Prompt is too long"
  // 429 with headers → Specific rate limit message with reset time
  // 401 → "Please run /login" or "Invalid API key"
  // 403 OAuth revoked → "OAuth token revoked · Please run /login"
  // 529 → "Repeated 529 Overloaded errors"
  // Generic → "API Error: {message}"
}

Error messages adapt to the execution context:

// Interactive mode gets UI hints
'PDF too large. Double press esc to go back and try again'
// SDK/headless mode gets actionable advice
'PDF too large. Try reading the file a different way (e.g., extract text with pdftotext).'
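One way to express this adaptation is a small selector keyed on the execution mode. The two message strings are the ones quoted above; `ExecutionMode` and `pdfTooLargeMessage` are hypothetical names, and how Claude Code actually detects the mode is not shown in this section.

```typescript
type ExecutionMode = 'interactive' | 'headless'

// Same problem statement, different follow-up: a keyboard hint for the
// interactive UI, an actionable alternative for SDK/headless callers.
function pdfTooLargeMessage(mode: ExecutionMode): string {
  const hint =
    mode === 'interactive'
      ? 'Double press esc to go back and try again'
      : 'Try reading the file a different way (e.g., extract text with pdftotext).'
  return `PDF too large. ${hint}`
}
```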

flowchart TD
    ERR["API Error"] --> IS_ABORT{Aborted?}
    IS_ABORT -->|Yes| THROW_ABORT["Throw APIUserAbortError"]
    IS_ABORT -->|No| IS_FAST{Fast mode active?}

    IS_FAST -->|Yes| FAST_429{429/529?}
    FAST_429 -->|Short retry| FAST_RETRY["Wait, keep fast mode"]
    FAST_429 -->|Long retry| FAST_COOL["Cooldown, switch normal"]

    IS_FAST -->|No| IS_529{529?}
    IS_529 -->|Yes| FG{Foreground query?}
    FG -->|No| DROP["Drop immediately<br/>(no amplification)"]
    FG -->|Yes| COUNT{3+ consecutive?}
    COUNT -->|Yes| FALLBACK["FallbackTriggeredError<br/>(switch model)"]
    COUNT -->|No| RETRY_529["Retry with backoff"]

    IS_529 -->|No| IS_429{429?}
    IS_429 -->|Yes| SUB{Subscriber type?}
    SUB -->|ClaudeAI Max/Pro| NO_RETRY["Show rate limit message"]
    SUB -->|Enterprise/API| RETRY_429["Retry with backoff"]

    IS_429 -->|No| IS_AUTH{401/403?}
    IS_AUTH -->|Yes| REFRESH["Refresh credentials, retry"]
    IS_AUTH -->|No| IS_5XX{5xx?}
    IS_5XX -->|Yes| RETRY_5XX["Retry with backoff"]
    IS_5XX -->|No| CANNOT_RETRY["CannotRetryError"]
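The flowchart can be read as a single decision function. The sketch below is a simplified model of the diagram, not the real control flow: `Action`, `ErrorSituation`, and `decide` are hypothetical, the fallback branch ignores whether a fallback model is configured, and a fast-mode error that is neither 429 nor 529 is assumed to fall through to the remaining checks, since the diagram does not show that edge.

```typescript
type Action =
  | 'abort' | 'fast_mode_wait' | 'fast_mode_cooldown' | 'drop'
  | 'fallback_model' | 'retry_with_backoff' | 'show_rate_limit_message'
  | 'refresh_credentials' | 'cannot_retry'

interface ErrorSituation {
  aborted: boolean
  status: number // 529 here also stands for the "overloaded_error" message case
  fastModeActive: boolean
  shortRetryAfter: boolean // Retry-After present and under 20 seconds
  foreground: boolean
  consecutive529: number // count including the current error
  subscriber: 'claude_ai' | 'enterprise' | 'api_key'
}

function decide(s: ErrorSituation): Action {
  if (s.aborted) return 'abort'
  const is429or529 = s.status === 429 || s.status === 529
  if (s.fastModeActive && is429or529) {
    return s.shortRetryAfter ? 'fast_mode_wait' : 'fast_mode_cooldown'
  }
  if (s.status === 529) {
    if (!s.foreground) return 'drop'
    return s.consecutive529 >= 3 ? 'fallback_model' : 'retry_with_backoff'
  }
  if (s.status === 429) {
    return s.subscriber === 'claude_ai' ? 'show_rate_limit_message' : 'retry_with_backoff'
  }
  if (s.status === 401 || s.status === 403) return 'refresh_credentials'
  if (s.status >= 500) return 'retry_with_backoff'
  return 'cannot_retry'
}
```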