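Error Handling
Error handling in Claude Code is elaborate because the tool has to cope gracefully with a wide range of API failures, from transient rate limits and authentication errors to full API outages. The system is built around `withRetry.ts` and `errors.ts` in `src/services/api/`.
Error Handling Architecture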
```mermaid
graph TB
    subgraph "API Call"
        CALL["claude.ts → API request"]
    end
    subgraph "withRetry.ts"
        RETRY["Retry loop (max 10 attempts)"]
        CLASS["Error classification"]
        BACKOFF["Exponential backoff + jitter"]
        FASTMODE["Fast mode fallback"]
        FALLBACK["Model fallback (529)"]
        PERSIST["Persistent retry (unattended)"]
    end
    subgraph "errors.ts"
        MSG["User-facing error messages"]
        CLASSIFY["Error type classification"]
        ANALYTICS["Analytics tagging"]
    end
    subgraph "Consumer"
        UI["REPL UI display"]
        SDK_OUT["SDK error messages"]
    end
    CALL --> RETRY
    RETRY -->|Retryable| BACKOFF
    RETRY -->|Fast mode| FASTMODE
    RETRY -->|Repeated 529| FALLBACK
    RETRY -->|Unattended 429/529| PERSIST
    RETRY -->|Non-retryable| CLASS
    CLASS --> MSG
    MSG --> UI
    MSG --> SDK_OUT
    BACKOFF --> RETRY
    FASTMODE --> RETRY
```
The withRetry Pattern
The `withRetry()` function in `src/services/api/withRetry.ts` is an async generator that wraps an API call in retry logic:

```typescript
export async function* withRetry<T>(
  getClient: () => Promise<Anthropic>,
  operation: (client: Anthropic, attempt: number, context: RetryContext) => Promise<T>,
  options: RetryOptions,
): AsyncGenerator<SystemAPIErrorMessage, T> {
  const maxRetries = getMaxRetries(options) // Default: 10
  const retryContext: RetryContext = {
    model: options.model,
    thinkingConfig: options.thinkingConfig,
    ...(isFastModeEnabled() && { fastMode: options.fastMode }),
  }

  let client: Anthropic | null = null
  let consecutive529Errors = options.initialConsecutive529Errors ?? 0

  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    if (options.signal?.aborted) throw new APIUserAbortError()

    try {
      // The client is also recreated after an auth error on the previous attempt (elided here)
      if (client === null /* || auth error on last attempt */) {
        client = await getClient()
      }
      return await operation(client, attempt, retryContext)
    } catch (error) {
      // Classification and retry logic...
    }
  }
  throw new CannotRetryError(lastError, retryContext)
}
```

Key design point: `withRetry` yields `SystemAPIErrorMessage` values while it waits (for display in the UI) and delivers the final result through the generator's return value.
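Because the result travels through the generator's return value, a plain `for await` loop would discard it. The sketch below shows one way a caller might drive such a generator by hand. It is illustrative only: `StatusMessage`, `fakeWithRetry`, and `drain` are hypothetical stand-ins, not names from Claude Code.

```typescript
// Hypothetical stand-in for SystemAPIErrorMessage.
type StatusMessage = { attempt: number; retryInMs: number }

// Toy generator with the same shape as withRetry: it yields one status
// message per simulated failure, then returns the final result.
async function* fakeWithRetry(failures: number): AsyncGenerator<StatusMessage, string> {
  for (let attempt = 1; attempt <= failures; attempt++) {
    yield { attempt, retryInMs: 0 }
  }
  return 'final response'
}

// Call next() manually so the return value is not lost.
async function drain<Y, R>(
  gen: AsyncGenerator<Y, R>,
  onStatus: (status: Y) => void,
): Promise<R> {
  while (true) {
    const step = await gen.next()
    if (step.done) return step.value // the generator's return value
    onStatus(step.value)             // a yielded status message
  }
}

async function demo(): Promise<void> {
  const result = await drain(fakeWithRetry(2), status => {
    console.log(`retrying after attempt ${status.attempt}`)
  })
  console.log(result)
}

demo()
```

Splitting the two channels this way lets a UI render progress as it happens while the caller still receives a single typed result.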
Distinguishing HTTP Errors
Claude Code classifies errors by HTTP status code and handles each class differently:
429: Rate Limits

```typescript
// Rate limit handling depends on subscription type
if (error.status === 429) {
  // ClaudeAI subscribers (Max/Pro): don't retry (wait could be hours)
  // Enterprise subscribers: retry (typically PAYG, short limits)
  // API key users: retry with backoff
  return !isClaudeAISubscriber() || isEnterpriseSubscriber()
}
```

Rate-limit responses include headers that guide this behavior:
```typescript
// Headers checked for rate limiting
'anthropic-ratelimit-unified-representative-claim' // 'five_hour' | 'seven_day'
'anthropic-ratelimit-unified-overage-status' // 'allowed' | 'rejected'
'anthropic-ratelimit-unified-reset' // Unix timestamp
'anthropic-ratelimit-unified-overage-disabled-reason' // Why extra usage is blocked
```

529: Server Overloaded
```typescript
export function is529Error(error: unknown): boolean {
  if (!(error instanceof APIError)) return false
  return (
    error.status === 529 ||
    // SDK sometimes fails to pass 529 status during streaming
    (error.message?.includes('"type":"overloaded_error"') ?? false)
  )
}
```

529 errors are additionally gated by query source; only foreground queries retry:

```typescript
// Only foreground sources retry on 529 to avoid amplification
const FOREGROUND_529_RETRY_SOURCES = new Set<QuerySource>([
  'repl_main_thread',
  'sdk',
  'agent:custom',
  'agent:default',
  'compact',
  'auto_mode',
  // Background sources (summaries, titles, classifiers) bail immediately
])

function shouldRetry529(querySource: QuerySource | undefined): boolean {
  return querySource === undefined || FOREGROUND_529_RETRY_SOURCES.has(querySource)
}
```

Design rationale: during a capacity cascade, each retry amplifies load by 3-10x. Background queries (title generation, suggestions) are never visible to the user, so they should fail silently rather than feed the cascade.
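To put rough numbers on retry amplification, here is a back-of-the-envelope model. It is not code from Claude Code: it assumes every attempt fails independently with the same probability and that a client stops after `maxRetries` retries. The function name `expectedAttempts` is hypothetical.

```typescript
// Expected number of requests sent for one logical call when each attempt
// fails independently with probability `failureRate` and the client makes at
// most `maxRetries` retries after the first attempt.
// Geometric sum: 1 + p + p^2 + ... + p^maxRetries.
function expectedAttempts(failureRate: number, maxRetries: number): number {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError('failureRate must be in [0, 1]')
  }
  if (failureRate === 1) return maxRetries + 1 // total outage: every retry is sent
  return (1 - Math.pow(failureRate, maxRetries + 1)) / (1 - failureRate)
}

console.log(expectedAttempts(0.9, 10).toFixed(2)) // 6.86
console.log(expectedAttempts(1, 10))              // 11
console.log(expectedAttempts(0.9, 0))             // 1 (a source that never retries)
```

Under this simplified model, a 90% failure rate with 10 retries turns each call into roughly seven requests, while a background source that bails immediately stays at one.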
500+: Server Errors

```typescript
// Always retry internal server errors
if (error.status && error.status >= 500) return true
```

401: Authentication Errors
```typescript
if (error.status === 401) {
  // Clear cached API key and retry
  clearApiKeyHelperCache()

  // For OAuth: force token refresh
  if (lastError instanceof APIError && lastError.status === 401) {
    const failedAccessToken = getClaudeAIOAuthTokens()?.accessToken
    if (failedAccessToken) {
      await handleOAuth401Error(failedAccessToken)
    }
  }

  return true // Retry with refreshed credentials
}
```

Connection Errors (ECONNRESET/EPIPE)
```typescript
function isStaleConnectionError(error: unknown): boolean {
  if (!(error instanceof APIConnectionError)) return false
  const details = extractConnectionErrorDetails(error)
  return details?.code === 'ECONNRESET' || details?.code === 'EPIPE'
}

// On stale connection: disable keep-alive and reconnect
if (isStaleConnection) {
  disableKeepAlive()
  client = await getClient() // Force new connection
}
```

Exponential Backoff and Jitter
```typescript
export const BASE_DELAY_MS = 500

export function getRetryDelay(
  attempt: number,
  retryAfterHeader?: string | null,
  maxDelayMs = 32000,
): number {
  // Honor server's Retry-After header if present
  if (retryAfterHeader) {
    const seconds = parseInt(retryAfterHeader, 10)
    if (!isNaN(seconds)) return seconds * 1000
  }

  // Exponential backoff: 500ms, 1s, 2s, 4s, 8s, 16s, 32s (capped)
  const baseDelay = Math.min(
    BASE_DELAY_MS * Math.pow(2, attempt - 1),
    maxDelayMs,
  )
  // Add up to 25% jitter to prevent thundering herd
  const jitter = Math.random() * 0.25 * baseDelay
  return baseDelay + jitter
}
```

Retry delay sequence with the default settings:
| Attempt | Base delay | With jitter (approx.) |
|---|---|---|
| 1 | 500ms | 500-625ms |
| 2 | 1,000ms | 1,000-1,250ms |
| 3 | 2,000ms | 2,000-2,500ms |
| 4 | 4,000ms | 4,000-5,000ms |
| 5 | 8,000ms | 8,000-10,000ms |
| 6 | 16,000ms | 16,000-20,000ms |
| 7+ | 32,000ms | 32,000-40,000ms |
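As a worked check on the table, the sketch below adds up the waits across a full run of retries. It assumes the default of 10 retries, the default 32-second cap, and no `Retry-After` header; `baseDelayMs` and `totalBaseDelayMs` are hypothetical helper names that mirror the capped doubling in `getRetryDelay`.

```typescript
const BASE_DELAY_MS = 500
const MAX_DELAY_MS = 32_000

// Base delay before jitter: doubles per attempt, capped at MAX_DELAY_MS.
function baseDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS)
}

// Sum of the base delays over `retries` consecutive retries.
function totalBaseDelayMs(retries: number): number {
  let total = 0
  for (let attempt = 1; attempt <= retries; attempt++) {
    total += baseDelayMs(attempt)
  }
  return total
}

const minTotalMs = totalBaseDelayMs(10) // no jitter at all
const maxTotalMs = minTotalMs * 1.25    // upper bound: full 25% jitter on every delay
console.log(minTotalMs, maxTotalMs) // 159500 199375
```

So under these assumptions a call that exhausts all 10 retries spends between about 2 min 40 s and 3 min 20 s waiting, on top of the request time itself.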
Fast Mode → Normal Mode Fallback
When fast mode is active, a 429 or 529 error triggers a fallback mechanism:

```typescript
const SHORT_RETRY_THRESHOLD_MS = 20 * 1000 // 20 seconds
const MIN_COOLDOWN_MS = 10 * 60 * 1000 // 10 minutes
const DEFAULT_FAST_MODE_FALLBACK_HOLD_MS = 30 * 60 * 1000 // 30 minutes

if (wasFastModeActive && (error.status === 429 || is529Error(error))) {
  const retryAfterMs = getRetryAfterMs(error)

  if (retryAfterMs !== null && retryAfterMs < SHORT_RETRY_THRESHOLD_MS) {
    // Short retry-after (<20s): wait and retry with fast mode still active
    // Preserves prompt cache (same model name)
    await sleep(retryAfterMs, options.signal)
    continue
  }

  // Long or unknown retry-after: enter cooldown
  const cooldownMs = Math.max(
    retryAfterMs ?? DEFAULT_FAST_MODE_FALLBACK_HOLD_MS,
    MIN_COOLDOWN_MS,
  )
  triggerFastModeCooldown(Date.now() + cooldownMs, cooldownReason)
  retryContext.fastMode = false
  continue
}
```

```mermaid
graph TB
    A["429/529 in Fast Mode"] --> B{Retry-After < 20s?}
    B -->|Yes| C["Wait & retry<br/>(keep fast mode)"]
    B -->|No| D["Enter cooldown<br/>(switch to normal)"]
    D --> E["Cooldown for<br/>max(retryAfter, 10min)"]
    E --> F["Retry with<br/>normal mode model"]
    A --> G{Overage disabled?}
    G -->|Yes| H["Permanently disable<br/>fast mode"]
```
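The branch above can be restated as a pure function, which makes the three outcomes easy to check by hand. This is a sketch rather than the actual implementation: the constants are copied from the excerpt, while `FastModeDecision` and `decideFastModeAction` are hypothetical names, and the sleeping and cooldown side effects are left out.

```typescript
const SHORT_RETRY_THRESHOLD_MS = 20 * 1000
const MIN_COOLDOWN_MS = 10 * 60 * 1000
const DEFAULT_FAST_MODE_FALLBACK_HOLD_MS = 30 * 60 * 1000

type FastModeDecision =
  | { kind: 'retry_fast'; waitMs: number }
  | { kind: 'cooldown'; cooldownMs: number }

// Decide what to do with a 429/529 received while fast mode is active.
// `retryAfterMs` is null when the server sent no usable Retry-After value.
function decideFastModeAction(retryAfterMs: number | null): FastModeDecision {
  if (retryAfterMs !== null && retryAfterMs < SHORT_RETRY_THRESHOLD_MS) {
    return { kind: 'retry_fast', waitMs: retryAfterMs }
  }
  const cooldownMs = Math.max(
    retryAfterMs ?? DEFAULT_FAST_MODE_FALLBACK_HOLD_MS,
    MIN_COOLDOWN_MS,
  )
  return { kind: 'cooldown', cooldownMs }
}

console.log(decideFastModeAction(5_000))  // { kind: 'retry_fast', waitMs: 5000 }
console.log(decideFastModeAction(60_000)) // { kind: 'cooldown', cooldownMs: 600000 }
console.log(decideFastModeAction(null))   // { kind: 'cooldown', cooldownMs: 1800000 }
```

Note the asymmetry: a known one-minute `Retry-After` is raised to the 10-minute minimum, whereas an unknown one falls back to the 30-minute default hold.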
Model Fallback (529 → Sonnet)
After 3 consecutive 529 errors, Claude Code can switch from the primary model to a fallback model:

```typescript
const MAX_529_RETRIES = 3

if (is529Error(error)) {
  consecutive529Errors++
  if (consecutive529Errors >= MAX_529_RETRIES) {
    if (options.fallbackModel) {
      // Throw a special error; the caller catches it and retries with the fallback model
      throw new FallbackTriggeredError(options.model, options.fallbackModel)
    }

    // External users with no fallback: give up
    if (process.env.USER_TYPE === 'external') {
      throw new CannotRetryError(
        new Error(REPEATED_529_ERROR_MESSAGE),
        retryContext,
      )
    }
  }
}
```

`FallbackTriggeredError` is caught in `query.ts`, which reissues the API call with the fallback model.
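The catch-and-reissue step on the caller's side might look roughly like the sketch below. It does not reproduce `query.ts`: the `FallbackTriggeredError` class here is a hypothetical stand-in with assumed field names, and `callWithFallback` is an illustrative helper.

```typescript
// Hypothetical stand-in for the error class named above; field names are assumed.
class FallbackTriggeredError extends Error {
  readonly originalModel: string
  readonly fallbackModel: string

  constructor(originalModel: string, fallbackModel: string) {
    super(`Falling back from ${originalModel} to ${fallbackModel}`)
    this.name = 'FallbackTriggeredError'
    this.originalModel = originalModel
    this.fallbackModel = fallbackModel
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

// Run `call` with the primary model. If it signals a fallback, run it once
// more with the fallback model. Any other error propagates unchanged.
async function callWithFallback<T>(
  primaryModel: string,
  call: (model: string) => Promise<T>,
): Promise<T> {
  try {
    return await call(primaryModel)
  } catch (error) {
    if (error instanceof FallbackTriggeredError) {
      return call(error.fallbackModel)
    }
    throw error
  }
}
```

Signalling the switch with a dedicated error type keeps the retry loop unaware of how the caller rebuilds the request for a different model.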
Persistent Retry for Unattended Sessions
For unattended (headless) sessions, Claude Code supports persistent retry: it retries indefinitely and emits keep-alive heartbeats while it waits:

```typescript
const PERSISTENT_MAX_BACKOFF_MS = 5 * 60 * 1000 // 5 minutes max backoff
const PERSISTENT_RESET_CAP_MS = 6 * 60 * 60 * 1000 // 6 hours max wait
const HEARTBEAT_INTERVAL_MS = 30_000 // 30 second heartbeats

function isPersistentRetryEnabled(): boolean {
  return isEnvTruthy(process.env.CLAUDE_CODE_UNATTENDED_RETRY)
}
```

When persistent retry is active:

```typescript
if (persistent) {
  // Chunk long sleeps to emit heartbeats
  let remaining = delayMs
  while (remaining > 0) {
    if (options.signal?.aborted) throw new APIUserAbortError()

    // Yield status message as heartbeat
    yield createSystemAPIErrorMessage(error, remaining, reportedAttempt, maxRetries)

    const chunk = Math.min(remaining, HEARTBEAT_INTERVAL_MS)
    await sleep(chunk, options.signal)
    remaining -= chunk
  }

  // Clamp the attempt counter so the for-loop never terminates
  if (attempt >= maxRetries) attempt = maxRetries
}
```

Why heartbeats? The host environment (a CI system or an orchestrator) may kill a session that looks idle. Each yielded `SystemAPIErrorMessage` produces stdout activity through QueryEngine, which keeps the session alive.
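The chunking arithmetic can be isolated from the sleeping and yielding. The sketch below is a simplified illustration with a hypothetical helper name, `heartbeatChunks`; it only computes the sleep lengths that the loop above would use.

```typescript
const HEARTBEAT_INTERVAL_MS = 30_000

// Split one long delay into sleeps of at most `intervalMs` each, so a caller
// can emit one status message before every sleep.
function* heartbeatChunks(
  delayMs: number,
  intervalMs: number = HEARTBEAT_INTERVAL_MS,
): Generator<number> {
  let remaining = delayMs
  while (remaining > 0) {
    const chunk = Math.min(remaining, intervalMs)
    yield chunk
    remaining -= chunk
  }
}

console.log(Array.from(heartbeatChunks(75_000))) // [ 30000, 30000, 15000 ]
console.log(Array.from(heartbeatChunks(5 * 60 * 1000)).length) // 10
```

With the 5-minute maximum backoff, a single wait therefore produces ten heartbeats, one every 30 seconds.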
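For a 429 error that carries a rate-limit reset header, persistent retry honors the reset time from the header, capped at 6 hours:

```typescript
function getRateLimitResetDelayMs(error: APIError): number | null {
  const resetHeader = error.headers?.get?.('anthropic-ratelimit-unified-reset')
  if (!resetHeader) return null
  // As excerpted, there is no guard for a non-numeric header or for a reset
  // time that is already in the past (which yields a negative delay)
  const resetUnixSec = Number(resetHeader)
  const delayMs = resetUnixSec * 1000 - Date.now()
  return Math.min(delayMs, PERSISTENT_RESET_CAP_MS)
}
```

Error Classification for Analytics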
The `classifyAPIError()` function in `errors.ts` maps errors to normalized tags:

```typescript
export function classifyAPIError(error: unknown): string {
  // Narrow `unknown` before reading `.message`
  const message = error instanceof Error ? error.message : ''

  if (error instanceof APIConnectionTimeoutError) return 'api_timeout'
  if (message.includes(REPEATED_529_ERROR_MESSAGE)) return 'repeated_529'
  if (error instanceof APIError && error.status === 429) return 'rate_limit'
  if (error instanceof APIError && error.status === 529) return 'server_overload'
  if (message.includes('prompt is too long')) return 'prompt_too_long'
  if (message.includes('x-api-key')) return 'invalid_api_key'
  if (error instanceof APIError && error.status >= 500) return 'server_error'
  if (error instanceof APIConnectionError) {
    const details = extractConnectionErrorDetails(error)
    if (details?.isSSLError) return 'ssl_cert_error'
    return 'connection_error'
  }
  return 'unknown'
}
```

The full classification taxonomy:
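| Error type | HTTP status | Description |
|---|---|---|
| `api_timeout` | n/a | Connection timed out |
| `rate_limit` | 429 | Rate limited |
| `server_overload` | 529 | API overloaded |
| `repeated_529` | 529 | 3 or more consecutive 529s |
| `prompt_too_long` | 400 | Input exceeds the context window |
| `pdf_too_large` | 400 | PDF exceeds the page limit |
| `image_too_large` | 400 | Image exceeds the size limit |
| `tool_use_mismatch` | 400 | tool_use/tool_result pairing error |
| `invalid_model` | 400 | Model name not recognized |
| `credit_balance_low` | n/a | Insufficient credit balance |
| `invalid_api_key` | 401 | Invalid API key |
| `token_revoked` | 403 | OAuth token revoked |
| `auth_error` | 401/403 | Generic authentication failure |
| `server_error` | 500+ | Internal server error |
| `connection_error` | n/a | Network connection failure |
| `ssl_cert_error` | n/a | SSL/TLS certificate problem |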
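One way a consumer of these tags might keep them in check is a string-literal union, so a misspelled tag fails at compile time. The sketch below is an assumption about usage, not part of `errors.ts`: `API_ERROR_TAGS`, `APIErrorTag`, `isAPIErrorTag`, and `tallyByTag` are hypothetical names. The tag values come from the table, plus the `'unknown'` fallback that `classifyAPIError` returns.

```typescript
const API_ERROR_TAGS = [
  'api_timeout', 'rate_limit', 'server_overload', 'repeated_529',
  'prompt_too_long', 'pdf_too_large', 'image_too_large', 'tool_use_mismatch',
  'invalid_model', 'credit_balance_low', 'invalid_api_key', 'token_revoked',
  'auth_error', 'server_error', 'connection_error', 'ssl_cert_error',
  'unknown',
] as const

type APIErrorTag = (typeof API_ERROR_TAGS)[number]

// Runtime check that a plain string is one of the known tags.
function isAPIErrorTag(value: string): value is APIErrorTag {
  return (API_ERROR_TAGS as readonly string[]).indexOf(value) !== -1
}

// Count events per tag, folding anything unrecognized into 'unknown'.
function tallyByTag(tags: string[]): Map<APIErrorTag, number> {
  const counts = new Map<APIErrorTag, number>()
  for (const raw of tags) {
    const tag: APIErrorTag = isAPIErrorTag(raw) ? raw : 'unknown'
    counts.set(tag, (counts.get(tag) ?? 0) + 1)
  }
  return counts
}

console.log(tallyByTag(['rate_limit', 'rate_limit', 'not_a_tag']))
```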
User-Facing Error Messages
The `getAssistantMessageFromError()` function turns API errors into user-friendly messages:

```typescript
export function getAssistantMessageFromError(
  error: unknown,
  model: string,
): AssistantMessage {
  // Timeout → "Request timed out"
  // Image too large → "Image was too large. Try resizing..."
  // Prompt too long → "Prompt is too long"
  // 429 with headers → Specific rate limit message with reset time
  // 401 → "Please run /login" or "Invalid API key"
  // 403 OAuth revoked → "OAuth token revoked · Please run /login"
  // 529 → "Repeated 529 Overloaded errors"
  // Generic → "API Error: {message}"
}
```

Context-Aware Messages
Error messages adapt to the execution context:

```typescript
// Interactive mode gets UI hints
'PDF too large. Double press esc to go back and try again'

// SDK/headless mode gets actionable advice
'PDF too large. Try reading the file a different way (e.g., extract text with pdftotext).'
```

Error Handling Flow Summary
```mermaid
flowchart TD
    ERR["API Error"] --> IS_ABORT{Aborted?}
    IS_ABORT -->|Yes| THROW_ABORT["Throw APIUserAbortError"]
    IS_ABORT -->|No| IS_FAST{Fast mode active?}
    IS_FAST -->|Yes| FAST_429{429/529?}
    FAST_429 -->|Short retry| FAST_RETRY["Wait, keep fast mode"]
    FAST_429 -->|Long retry| FAST_COOL["Cooldown, switch normal"]
    IS_FAST -->|No| IS_529{529?}
    IS_529 -->|Yes| FG{Foreground query?}
    FG -->|No| DROP["Drop immediately<br/>(no amplification)"]
    FG -->|Yes| COUNT{3+ consecutive?}
    COUNT -->|Yes| FALLBACK["FallbackTriggeredError<br/>(switch model)"]
    COUNT -->|No| RETRY_529["Retry with backoff"]
    IS_529 -->|No| IS_429{429?}
    IS_429 -->|Yes| SUB{Subscriber type?}
    SUB -->|ClaudeAI Max/Pro| NO_RETRY["Show rate limit message"]
    SUB -->|Enterprise/API| RETRY_429["Retry with backoff"]
    IS_429 -->|No| IS_AUTH{401/403?}
    IS_AUTH -->|Yes| REFRESH["Refresh credentials, retry"]
    IS_AUTH -->|No| IS_5XX{5xx?}
    IS_5XX -->|Yes| RETRY_5XX["Retry with backoff"]
    IS_5XX -->|No| CANNOT_RETRY["CannotRetryError"]
```
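The flowchart can also be read as a single decision function. The sketch below is a simplified restatement, not the code in `withRetry.ts`: `ErrorContext`, `Action`, and `decideAction` are hypothetical names, the input is a plain object rather than an SDK error, and two points the chart leaves open are assumptions here (a fast-mode error that is not a 429/529 falls through to the normal checks, and a fallback model is taken to be configured).

```typescript
type Subscriber = 'claude_ai' | 'enterprise' | 'api_key'

interface ErrorContext {
  aborted: boolean
  status: number | undefined  // HTTP status, if the error has one
  fastModeActive: boolean
  retryAfterMs: number | null // null when no Retry-After was provided
  foreground: boolean         // foreground vs. background query source
  consecutive529: number      // count including the current error
  subscriber: Subscriber
}

type Action =
  | 'throw_abort' | 'fast_retry' | 'fast_cooldown' | 'drop'
  | 'fallback_model' | 'retry_with_backoff' | 'show_rate_limit'
  | 'refresh_and_retry' | 'cannot_retry'

function decideAction(ctx: ErrorContext): Action {
  if (ctx.aborted) return 'throw_abort'

  const is429or529 = ctx.status === 429 || ctx.status === 529
  if (ctx.fastModeActive && is429or529) {
    const short = ctx.retryAfterMs !== null && ctx.retryAfterMs < 20_000
    return short ? 'fast_retry' : 'fast_cooldown'
  }

  if (ctx.status === 529) {
    if (!ctx.foreground) return 'drop'
    return ctx.consecutive529 >= 3 ? 'fallback_model' : 'retry_with_backoff'
  }
  if (ctx.status === 429) {
    return ctx.subscriber === 'claude_ai' ? 'show_rate_limit' : 'retry_with_backoff'
  }
  if (ctx.status === 401 || ctx.status === 403) return 'refresh_and_retry'
  if (ctx.status !== undefined && ctx.status >= 500) return 'retry_with_backoff'
  return 'cannot_retry'
}
```

Keeping the decision separate from the side effects (sleeping, refreshing tokens, switching models) makes each branch of the chart straightforward to unit-test.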