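Systems that call APIs heavily will inevitably come under pressure: rate limits, server overload, network instability, or quota exhaustion. Graceful Degradation detects that pressure and automatically switches from a fast, optimistic mode to a slow, conservative one, then switches back once the pressure subsides.
The key difference from simple retry logic is that degradation is modal: the whole system adjusts its behavior, not just a single request.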
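To make "modal" concrete, here is a minimal sketch in which the mode is shared state that every caller consults. The names `SharedMode` and `allowedConcurrency` are hypothetical, and the concurrency values 5 and 1 are borrowed from the mode configurations described in this section.

```typescript
// Hypothetical names; a sketch of the idea, not the section's controller.
type Mode = 'fast' | 'degraded';

class SharedMode {
  private mode: Mode = 'fast';

  // Any caller can report an outcome; a pressure signal flips the mode for everyone.
  report(httpStatus: number): void {
    if (httpStatus === 429 || httpStatus === 529) {
      this.mode = 'degraded';
    }
  }

  current(): Mode {
    return this.mode;
  }
}

// Callers derive how aggressive they may be from the shared mode,
// not from their own request history.
function allowedConcurrency(shared: SharedMode): number {
  return shared.current() === 'fast' ? 5 : 1;
}

const sharedMode = new SharedMode();
const concurrencyBefore = allowedConcurrency(sharedMode); // caller B, before any failure

sharedMode.report(429); // caller A is rate limited
const concurrencyAfter = allowedConcurrency(sharedMode);  // caller B never failed, yet slows down

console.log({ concurrencyBefore, concurrencyAfter });
```

With plain per-request retries, caller B would keep running at full parallelism while caller A backs off. With a shared mode, one pressure signal changes the behavior of every caller.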
stateDiagram-v2
[*] --> Fast
Fast --> Degraded: Error rate > threshold
Fast --> Degraded: Rate limit hit
Fast --> Degraded: Latency spike
Degraded --> Cooldown: Consecutive errors
Degraded --> Fast: Cooldown expires + success
Cooldown --> Degraded: Cooldown timer expires
Cooldown --> Cooldown: Still failing
note right of Fast: Full parallelism\nAggressive prefetch\nOptimistic caching
note right of Degraded: Sequential execution\nNo prefetch\nConservative timeouts
note right of Cooldown: Pause new requests\nWait for recovery
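The diagram can also be read as a transition table. The sketch below encodes exactly the edges drawn above. The trigger names are hypothetical labels invented for this example, one per edge label, and any trigger the diagram does not list for a state is assumed to leave that state unchanged.

```typescript
type ModeState = 'Fast' | 'Degraded' | 'Cooldown';

// Hypothetical trigger names, one per edge label in the diagram.
type Trigger =
  | 'errorRateAboveThreshold'
  | 'rateLimitHit'
  | 'latencySpike'
  | 'consecutiveErrors'
  | 'cooldownExpiredAndSuccess'
  | 'cooldownTimerExpired'
  | 'stillFailing';

const TRANSITIONS: Record<ModeState, Partial<Record<Trigger, ModeState>>> = {
  Fast: {
    errorRateAboveThreshold: 'Degraded',
    rateLimitHit: 'Degraded',
    latencySpike: 'Degraded',
  },
  Degraded: {
    consecutiveErrors: 'Cooldown',
    cooldownExpiredAndSuccess: 'Fast',
  },
  Cooldown: {
    cooldownTimerExpired: 'Degraded',
    stillFailing: 'Cooldown',
  },
};

// Triggers that have no edge from the current state leave it unchanged (assumption).
function nextState(current: ModeState, trigger: Trigger): ModeState {
  return TRANSITIONS[current][trigger] ?? current;
}

// Walk one full cycle through the diagram.
const cycle: Trigger[] = [
  'rateLimitHit',
  'consecutiveErrors',
  'cooldownTimerExpired',
  'cooldownExpiredAndSuccess',
];
let cursor: ModeState = 'Fast';
const visited: ModeState[] = [cursor];
for (const trigger of cycle) {
  cursor = nextState(cursor, trigger);
  visited.push(cursor);
}
console.log(visited.join(' → ')); // Fast → Degraded → Cooldown → Degraded → Fast
```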
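interface FastModeConfig {
  maxConcurrency: 5;      // parallel API calls
  prefetchEnabled: true;  // speculatively prefetch data that may be needed
  retryCount: 1;          // one quick retry on transient failures
  retryDelay: 500;        // 500 ms between retries
  timeout: 30_000;        // 30 s timeout per request
  batchSize: 10;          // process 10 items at a time
}

In Fast mode the system is optimistic: it issues requests in parallel, prefetches speculatively, and uses short timeouts. This is the default state when everything is healthy.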
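interface DegradedModeConfig {
  maxConcurrency: 1;       // sequential execution only
  prefetchEnabled: false;  // do not spend quota on speculation
  retryCount: 3;           // more retries (with backoff)
  retryDelay: 2_000;       // 2 s between retries
  timeout: 60_000;         // 60 s timeout per request (more patient)
  batchSize: 1;            // process one item at a time
}

Degraded mode conserves resources: sequential execution, no prefetching, longer timeouts, and more patient retries. The agent is slower but still usable.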
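The two interfaces above pin their settings as literal types. One way to use the same values at runtime is a shared shape plus two constants, sketched below. `ModeConfig`, `FAST_CONFIG`, and `DEGRADED_CONFIG` are names that the controller code in this section refers to without defining; the shape given here is an assumption, and `toBatches` is a hypothetical helper that shows `batchSize` in use.

```typescript
// Assumed shared shape; the field names and values come from the interfaces above.
interface ModeConfig {
  maxConcurrency: number;
  prefetchEnabled: boolean;
  retryCount: number;
  retryDelay: number; // ms
  timeout: number;    // ms
  batchSize: number;
}

const FAST_CONFIG: ModeConfig = {
  maxConcurrency: 5,
  prefetchEnabled: true,
  retryCount: 1,
  retryDelay: 500,
  timeout: 30_000,
  batchSize: 10,
};

const DEGRADED_CONFIG: ModeConfig = {
  maxConcurrency: 1,
  prefetchEnabled: false,
  retryCount: 3,
  retryDelay: 2_000,
  timeout: 60_000,
  batchSize: 1,
};

// Hypothetical helper: split work into batches sized for the active mode.
function toBatches<T>(items: T[], config: ModeConfig): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += config.batchSize) {
    batches.push(items.slice(i, i + config.batchSize));
  }
  return batches;
}

const workItems = Array.from({ length: 12 }, (_, i) => `item-${i}`);
const fastBatches = toBatches(workItems, FAST_CONFIG);         // batch sizes 10 and 2
const degradedBatches = toBatches(workItems, DEGRADED_CONFIG); // 12 batches of 1

console.log(fastBatches.map(b => b.length), degradedBatches.length);
```

Because both constants share one shape, calling code can switch modes by swapping the config object without branching on the mode name.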
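interface CooldownConfig {
  pauseDuration: 30_000;         // pause 30 s before probing
  probeInterval: 10_000;         // health check every 10 s
  requiredSuccesses: 3;          // 3 successful probes needed to recover
  maxCooldownDuration: 300_000;  // cooldown lasts at most 5 minutes
}

Cooldown mode pauses new work and periodically probes the API with lightweight requests to detect recovery.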
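The four cooldown settings interact, so it can help to work out the timing they imply. The sketch below assumes that the first probe fires as soon as `pauseDuration` has elapsed, that later probes follow every `probeInterval`, and that probe latency is negligible. `CooldownTiming`, `probeOffsets`, and `earliestRecovery` are hypothetical names.

```typescript
// Same fields as CooldownConfig above, typed as plain numbers for runtime use.
interface CooldownTiming {
  pauseDuration: number;
  probeInterval: number;
  requiredSuccesses: number;
  maxCooldownDuration: number;
}

const COOLDOWN_TIMING: CooldownTiming = {
  pauseDuration: 30_000,
  probeInterval: 10_000,
  requiredSuccesses: 3,
  maxCooldownDuration: 300_000,
};

// Offsets (ms since entering cooldown) at which probes would fire,
// assuming the first probe follows the pause and probe latency is ignored.
function probeOffsets(cfg: CooldownTiming): number[] {
  const offsets: number[] = [];
  for (let t = cfg.pauseDuration; t < cfg.maxCooldownDuration; t += cfg.probeInterval) {
    offsets.push(t);
  }
  return offsets;
}

// Earliest offset at which the required run of successful probes can complete.
function earliestRecovery(cfg: CooldownTiming): number {
  return cfg.pauseDuration + (cfg.requiredSuccesses - 1) * cfg.probeInterval;
}

const offsets = probeOffsets(COOLDOWN_TIMING);
console.log(offsets.length, offsets[0], earliestRecovery(COOLDOWN_TIMING)); // 27 30000 50000
```

Under these assumptions a cooldown lasts at least 50 s even if the API recovers immediately, and at most 27 probes are sent before the 5-minute ceiling.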
// ModeConfig, FAST_CONFIG, DEGRADED_CONFIG and COOLDOWN_CONFIG are assumed to be
// defined elsewhere with the values from the interfaces above.
class DegradationController {
  private mode: 'fast' | 'degraded' | 'cooldown' = 'fast';
  private errorWindow: number[] = [];  // timestamps of recent errors
  private successCount = 0;            // consecutive successes since the last error
  private cooldownStart = 0;

  private readonly ERROR_WINDOW_MS = 60_000;  // 1-minute sliding window
  private readonly ERROR_THRESHOLD = 3;       // 3 errors in the window → degraded
  private readonly COOLDOWN_THRESHOLD = 5;    // 5 errors in the window → cooldown
  private readonly RECOVERY_SUCCESSES = 3;    // 3 consecutive successes → recover

  recordSuccess() {
    this.successCount++;

    if (this.mode === 'cooldown' && this.successCount >= this.RECOVERY_SUCCESSES) {
      this.transitionTo('fast');
    } else if (this.mode === 'degraded') {
      // In degraded mode, require a longer success streak before returning to fast
      if (this.successCount >= this.RECOVERY_SUCCESSES * 2) {
        this.transitionTo('fast');
      }
    }
  }

  recordError(error: Error) {
    const now = Date.now();
    this.successCount = 0;

    // Add to the sliding window and drop entries older than the window
    this.errorWindow.push(now);
    this.errorWindow = this.errorWindow.filter(t => now - t < this.ERROR_WINDOW_MS);

    if (this.mode === 'fast' && this.errorWindow.length >= this.ERROR_THRESHOLD) {
      this.transitionTo('degraded');
    } else if (this.mode === 'degraded' && this.errorWindow.length >= this.COOLDOWN_THRESHOLD) {
      this.transitionTo('cooldown');
    }
  }

  private transitionTo(newMode: 'fast' | 'degraded' | 'cooldown') {
    const oldMode = this.mode;
    this.mode = newMode;

    if (newMode === 'cooldown') {
      this.cooldownStart = Date.now();
    }
    if (newMode === 'fast') {
      // A fresh start: forget earlier errors and successes
      this.errorWindow = [];
      this.successCount = 0;
    }

    console.log(`[Degradation] ${oldMode} → ${newMode}`);
  }

  getMode() {
    return this.mode;
  }

  getConfig(): ModeConfig {
    switch (this.mode) {
      case 'fast': return FAST_CONFIG;
      case 'degraded': return DEGRADED_CONFIG;
      case 'cooldown': return COOLDOWN_CONFIG;
    }
  }
}

Not all errors warrant the same retry behavior:
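interface RetryStrategy {
  shouldRetry: boolean;
  delay: number;         // milliseconds to wait before retrying
  degradeMode: boolean;  // whether this error should push the system toward degraded mode
}

// APIError is assumed to expose the HTTP status and the response headers.
function classifyError(error: APIError): RetryStrategy {
  switch (error.status) {
    // Transient errors: retry quickly
    case 500: // Internal Server Error
    case 502: // Bad Gateway
    case 503: // Service Unavailable
      return { shouldRetry: true, delay: 1_000, degradeMode: false };

    // Rate limit: back off before retrying, and degrade
    case 429: {
      // Retry-After is read as a number of seconds; a missing header or the
      // HTTP-date form falls back to 30 s.
      const seconds = Number.parseInt(error.headers['retry-after'] ?? '', 10);
      const retryAfter = Number.isNaN(seconds) ? 30_000 : seconds * 1000;
      return { shouldRetry: true, delay: retryAfter, degradeMode: true };
    }

    // Overloaded (529 is a non-standard status): long backoff, must degrade
    case 529:
      return { shouldRetry: true, delay: 60_000, degradeMode: true };

    // Client errors: do not retry
    case 400: // Bad Request
    case 401: // Unauthorized
    case 403: // Forbidden
      return { shouldRetry: false, delay: 0, degradeMode: false };

    // Unknown: retry conservatively
    default:
      return { shouldRetry: true, delay: 5_000, degradeMode: false };
  }
}

function calculateBackoff(attempt: number, baseDelay: number): number {
  // Exponential growth: with baseDelay = 1000 this gives 1s, 2s, 4s, 8s, 16s...
  const exponential = baseDelay * Math.pow(2, attempt);

  // Cap at 60 seconds (applied before jitter, so the final value can reach 75 s)
  const capped = Math.min(exponential, 60_000);

  // Add ±25% jitter to prevent a thundering herd
  const jitter = capped * (0.75 + Math.random() * 0.5);

  return Math.floor(jitter);
}

// Example progression (baseDelay = 1000):
// attempt 0: 1000 ms (±250 ms jitter)
// attempt 1: 2000 ms (±500 ms jitter)
// attempt 2: 4000 ms (±1000 ms jitter)
// attempt 3: 8000 ms (±2000 ms jitter)
// attempt 4: 16000 ms (±4000 ms jitter)
// attempt 5: 32000 ms (±8000 ms jitter)
// attempt 6 and beyond: 60000 ms (capped, ±15000 ms jitter)
sequenceDiagram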
participant Agent
participant Controller
participant API
Note over Agent,API: Fast Mode
Agent->>API: Request 1 ✅
Agent->>API: Request 2 ✅
Agent->>API: Request 3 ❌ 429
Agent->>Controller: recordError()
Agent->>API: Request 4 ❌ 429
Agent->>Controller: recordError()
Agent->>API: Request 5 ❌ 529
Agent->>Controller: recordError()
Controller-->>Agent: Mode → Degraded
Note over Agent,API: Degraded Mode (Sequential)
Agent->>API: Request 6 ❌ 529
Agent->>API: Request 7 ❌ 529
Controller-->>Agent: Mode → Cooldown
Note over Agent,API: Cooldown (30s pause)
Note over Agent: Waiting...
Agent->>API: Health probe ❌
Note over Agent: Wait 10s...
Agent->>API: Health probe ✅
Agent->>API: Health probe ✅
Agent->>API: Health probe ✅
Controller-->>Agent: Mode → Fast
Note over Agent,API: Fast Mode (Recovered)
Agent->>API: Request 8 ✅
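The sequence above can be replayed against a compact stand-in for the controller to check that the thresholds produce the same mode changes. `MiniController` is a hypothetical reduction of `DegradationController` with the same thresholds (3 errors in a 60 s window to degrade, 5 to enter cooldown, 3 consecutive successes to leave cooldown) and an injectable clock. The timings in `timeline` are invented for the replay.

```typescript
type Mode = 'fast' | 'degraded' | 'cooldown';

// Hypothetical compact controller with the thresholds used in this section.
class MiniController {
  mode: Mode = 'fast';
  private errorTimes: number[] = [];
  private successStreak = 0;
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  recordSuccess(): void {
    this.successStreak++;
    const needed = this.mode === 'cooldown' ? 3 : 6; // degraded needs a longer streak
    if (this.mode !== 'fast' && this.successStreak >= needed) this.enter('fast');
  }

  recordError(): void {
    const t = this.now();
    this.successStreak = 0;
    // Keep only errors from the last 60 seconds
    this.errorTimes = [...this.errorTimes, t].filter(e => t - e < 60_000);
    if (this.mode === 'fast' && this.errorTimes.length >= 3) this.enter('degraded');
    else if (this.mode === 'degraded' && this.errorTimes.length >= 5) this.enter('cooldown');
  }

  private enter(mode: Mode): void {
    this.mode = mode;
    if (mode === 'fast') {
      this.errorTimes = [];
      this.successStreak = 0;
    }
  }
}

let clockMs = 0;
const controller = new MiniController(() => clockMs);

// [outcome, ms elapsed before the call]; timings are made up for illustration.
const timeline: Array<['ok' | 'err', number]> = [
  ['ok', 0], ['ok', 100], ['err', 100], ['err', 100], ['err', 100], // requests 1-5
  ['err', 2_000], ['err', 2_000],                                    // requests 6-7, sequential
  ['err', 30_000],                                                   // first probe after the pause
  ['ok', 10_000], ['ok', 10_000], ['ok', 10_000],                    // recovery probes
];

const modes: Mode[] = [];
for (const [outcome, elapsed] of timeline) {
  clockMs += elapsed;
  if (outcome === 'ok') controller.recordSuccess();
  else controller.recordError();
  modes.push(controller.mode);
}

console.log(modes.join(' '));
// fast fast fast fast degraded degraded cooldown cooldown cooldown cooldown fast
```

The third error flips the system to degraded, the fifth to cooldown, and the third consecutive successful probe returns it to fast, matching the diagram.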
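// delay and timeout are small helpers used by the code in this section.
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));
const timeout = (ms: number) =>
  new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms),
  );

async function cooldownProbe(
  apiClient: APIClient,
  controller: DegradationController,
  config: CooldownConfig,
): Promise<void> {
  const start = Date.now();
  let consecutiveSuccesses = 0;

  // Pause before the first probe so the API has time to recover
  await delay(config.pauseDuration);

  while (
    controller.getMode() === 'cooldown' &&
    Date.now() - start < config.maxCooldownDuration
  ) {
    try {
      // Lightweight probe: minimize token usage
      await apiClient.complete({
        messages: [{ role: 'user', content: 'ping' }],
        maxTokens: 1,
      });
      consecutiveSuccesses++;
      controller.recordSuccess();

      if (consecutiveSuccesses >= config.requiredSuccesses) {
        return; // the controller moves to fast mode once its own success threshold is met
      }
    } catch (error) {
      consecutiveSuccesses = 0;
      controller.recordError(error as Error);
    }

    await delay(config.probeInterval);
  }
}

// ============================================
// Reusable Graceful Degradation wrapper
// ============================================

interface DegradableClient<T> {
  execute(request: T): Promise<unknown>;
  getMode(): 'fast' | 'degraded' | 'cooldown';
  getStats(): DegradationStats;
}

interface DegradationStats {
  mode: string;
  totalRequests: number;
  totalErrors: number;
  modeTransitions: number;
  averageLatency: number;
}

function withGracefulDegradation<T>(
  // The client must also satisfy APIClient so that cooldownProbe can call complete()
  client: APIClient & { execute: (req: T) => Promise<unknown> },
  options?: Partial<DegradationOptions>, // reserved for overrides; unused here
): DegradableClient<T> {
  const controller = new DegradationController();
  const stats = { totalRequests: 0, totalErrors: 0, modeTransitions: 0, latencies: [] as number[] };

  // Count a transition whenever the controller's mode differs from the last one seen
  let lastMode = controller.getMode();
  const trackMode = () => {
    const mode = controller.getMode();
    if (mode !== lastMode) {
      stats.modeTransitions++;
      lastMode = mode;
    }
  };

  return {
    async execute(request: T) {
      // Honor cooldown before sending anything
      if (controller.getMode() === 'cooldown') {
        await cooldownProbe(client, controller, COOLDOWN_CONFIG);
        trackMode();
        if (controller.getMode() === 'cooldown') {
          throw new Error('API still unavailable after the maximum cooldown');
        }
      }

      // Read the mode-specific config after any cooldown so it reflects the current mode
      const config = controller.getConfig();
      let lastError: Error | null = null;
      let nextDelay = 0;

      for (let attempt = 0; attempt <= config.retryCount; attempt++) {
        if (attempt > 0) {
          await delay(nextDelay);
        }

        const start = performance.now();
        stats.totalRequests++;

        try {
          const result = await Promise.race([
            client.execute(request),
            timeout(config.timeout),
          ]);

          stats.latencies.push(performance.now() - start);
          controller.recordSuccess();
          trackMode();

          return result;
        } catch (error) {
          lastError = error as Error;
          stats.totalErrors++;

          const strategy = classifyError(error as APIError);
          controller.recordError(error as Error);
          trackMode();

          if (!strategy.shouldRetry) throw error;

          // Wait at least as long as the error class asks for (for example Retry-After)
          nextDelay = Math.max(strategy.delay, calculateBackoff(attempt, config.retryDelay));
        }
      }

      throw lastError;
    },

    getMode() {
      return controller.getMode();
    },

    getStats() {
      const count = stats.latencies.length;
      return {
        mode: controller.getMode(),
        totalRequests: stats.totalRequests,
        totalErrors: stats.totalErrors,
        modeTransitions: stats.modeTransitions,
        averageLatency: count > 0 ? stats.latencies.reduce((a, b) => a + b, 0) / count : 0,
      };
    },
  };
}

| Dimension | Short-term (transient) | Long-term (sustained) |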
|---|---|---|
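| Trigger | 1-3 errors within 1 minute | 5 or more errors within 5 minutes |
| Action | Retry with backoff | Switch to degraded mode |
| Recovery | Automatic after the next success | Requires N consecutive successes |
| Duration | Seconds | Minutes to hours |
| Impact | Barely noticeable to users | Users see a slower but working system |
| Example | Network jitter, 502 | Rate limit exhausted, service outage |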
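The trigger row of the table can be sketched as a small classifier over recent error timestamps. `classifyPressure` and `actionFor` are hypothetical names. The table leaves error counts between its two bands unspecified, so this sketch treats anything below the sustained threshold as transient, which is an assumption.

```typescript
type Pressure = 'none' | 'transient' | 'sustained';

const ONE_MINUTE_MS = 60_000;
const FIVE_MINUTES_MS = 5 * ONE_MINUTE_MS;

// errorTimes: timestamps (ms) of recent errors; nowMs: the current time.
function classifyPressure(errorTimes: number[], nowMs: number): Pressure {
  const countWithin = (windowMs: number) =>
    errorTimes.filter(t => t <= nowMs && nowMs - t < windowMs).length;

  if (countWithin(FIVE_MINUTES_MS) >= 5) return 'sustained'; // 5 or more errors in 5 minutes
  if (countWithin(ONE_MINUTE_MS) >= 1) return 'transient';   // some errors in the last minute
  return 'none';
}

// Map each class to the action column of the table.
function actionFor(pressure: Pressure): string {
  switch (pressure) {
    case 'sustained': return 'switch to degraded mode';
    case 'transient': return 'retry with backoff';
    case 'none': return 'proceed normally';
  }
}

const nowMs = 1_000_000;
const burst = [nowMs - 20_000, nowMs - 10_000];
const outage = [nowMs - 240_000, nowMs - 180_000, nowMs - 120_000, nowMs - 60_000, nowMs - 1_000];

console.log(actionFor(classifyPressure(burst, nowMs)));  // retry with backoff
console.log(actionFor(classifyPressure(outage, nowMs))); // switch to degraded mode
```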
// rawApiClient, systemPrompt, essentialToolsOnly, executeTool and isUnrecoverable,
// along with the Message, Tool, AgentEvent and AgentResponse types, are assumed to
// be provided by the surrounding application.
async function* agentLoopWithDegradation(
  messages: Message[],
  tools: Tool[],
): AsyncGenerator<AgentEvent> {
  const apiClient = withGracefulDegradation(rawApiClient);

  while (true) {
    const mode = apiClient.getMode();

    // Adjust behavior to the current mode
    if (mode === 'degraded') {
      yield { type: 'status', message: '⚠️ Operating in degraded mode (slower but functional)' };
    }

    if (mode === 'cooldown') {
      yield { type: 'status', message: '⏸️ API cooling down, will resume shortly...' };
    }

    try {
      const response = (await apiClient.execute({
        system: systemPrompt,
        messages,
        tools: mode === 'fast' ? tools : essentialToolsOnly(tools),
      })) as AgentResponse;

      yield { type: 'response', data: response };

      const toolCalls = response.toolCalls ?? [];
      if (toolCalls.length === 0) return; // no tool calls: the turn is complete

      if (mode === 'fast') {
        // Fast mode: run tools in parallel
        const results = await Promise.all(toolCalls.map(call => executeTool(call)));
        for (const result of results) {
          yield { type: 'tool_result', data: result };
        }
      } else {
        // Degraded or cooldown: run tools one at a time
        for (const call of toolCalls) {
          const result = await executeTool(call);
          yield { type: 'tool_result', data: result };
        }
      }

      // Append the response and tool results to `messages` here before the next
      // iteration; the message shape depends on the API in use.
    } catch (error) {
      yield { type: 'error', data: error };
      if (isUnrecoverable(error)) return;
    }
  }
}

LLM API clients
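Any application that calls OpenAI, Anthropic, or another LLM API with rate limits and occasional outages.
Microservice systems
Service-mesh degradation when a downstream dependency slows down or fails.
Real-time data pipelines
Stream-processing systems that need to handle backpressure from slow sinks.
Mobile applications
Apps that must keep working on unreliable networks by automatically reducing data usage and feature richness.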