API-intensive systems inevitably face pressure: rate limits, server overload, network instability, or quota exhaustion. Graceful Degradation automatically shifts from a fast/optimistic mode to a slower/conservative mode when pressure is detected, and shifts back when pressure subsides.
The key distinction from simple retry logic: degradation is modal — the entire system adjusts its behavior, not just individual requests.
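To make the modal idea concrete, here is a minimal sketch (the `ModalClient` name, thresholds, and the "recover on one success" rule are illustrative simplifications, not part of the implementation that follows): every request reads one shared mode, and every outcome can shift that mode for all future requests, which is exactly what stateless per-request retry logic cannot do.

```typescript
// Minimal sketch of modal degradation. The class name, thresholds, and the
// "recover on one success" rule are illustrative simplifications.
type Mode = 'fast' | 'degraded';

class ModalClient {
  private mode: Mode = 'fast';
  private consecutiveErrors = 0;

  // Every request consults the SHARED mode...
  concurrencyFor(): number {
    return this.mode === 'fast' ? 5 : 1;
  }

  // ...and every outcome can shift that mode for ALL future requests.
  report(ok: boolean): Mode {
    if (ok) {
      this.consecutiveErrors = 0;
      this.mode = 'fast';
    } else if (++this.consecutiveErrors >= 3) {
      this.mode = 'degraded';
    }
    return this.mode;
  }
}
```

A retry wrapper would see each of those errors in isolation; the modal client sees them accumulate and changes how every subsequent request behaves.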
```mermaid
stateDiagram-v2
    [*] --> Fast

    Fast --> Degraded: Error rate > threshold
    Fast --> Degraded: Rate limit hit
    Fast --> Degraded: Latency spike

    Degraded --> Cooldown: Consecutive errors
    Degraded --> Fast: Sustained successes

    Cooldown --> Fast: Required probe successes
    Cooldown --> Cooldown: Still failing

    note right of Fast: Full parallelism\nAggressive prefetch\nOptimistic caching
    note right of Degraded: Sequential execution\nNo prefetch\nConservative timeouts
    note right of Cooldown: Pause new requests\nWait for recovery
```

```typescript
interface FastModeConfig {
  maxConcurrency: 5;     // Parallel API calls
  prefetchEnabled: true; // Speculatively fetch likely-needed data
  retryCount: 1;         // Quick retry on transient failures
  retryDelay: 500;       // 500ms between retries
  timeout: 30_000;       // 30s per request
  batchSize: 10;         // Process 10 items at once
}
```

In fast mode, the system is optimistic: it makes parallel requests, speculatively prefetches, and uses short timeouts. This is the default when everything is working.
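The config interfaces say what the knobs are but not how they are consumed. A hedged sketch of a concurrency-limited runner (the `runWithConcurrency` helper is an assumption for illustration, not part of the pattern's API) shows how `maxConcurrency` might gate parallel API calls:

```typescript
// Hypothetical helper showing how maxConcurrency might be consumed.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++; // claim an index synchronously (single-threaded event loop)
      results[i] = await fn(items[i]);
    }
  };

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In fast mode the call site would pass `maxConcurrency: 5`; in degraded mode the same call site passes `1`, collapsing to sequential execution with no other code change.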
```typescript
interface DegradedModeConfig {
  maxConcurrency: 1;      // Sequential only
  prefetchEnabled: false; // Don't waste quota on speculation
  retryCount: 3;          // More retries (with backoff)
  retryDelay: 2_000;      // 2s between retries
  timeout: 60_000;        // 60s per request (more patient)
  batchSize: 1;           // One item at a time
}
```

Degraded mode conserves resources: sequential execution, no prefetching, longer timeouts, and more patient retries. The agent is slower but still functional.
```typescript
interface CooldownConfig {
  pauseDuration: 30_000;        // 30s pause before probing
  probeInterval: 10_000;        // Health check every 10s
  requiredSuccesses: 3;         // Need 3 successful probes to recover
  maxCooldownDuration: 300_000; // Max 5 minutes in cooldown
}
```

Cooldown mode halts new work and periodically probes the API with lightweight requests to detect recovery.
```typescript
class DegradationController {
  private mode: 'fast' | 'degraded' | 'cooldown' = 'fast';
  private errorWindow: number[] = []; // Timestamps of recent errors
  private successCount = 0;
  private cooldownStart = 0;

  private readonly ERROR_WINDOW_MS = 60_000;  // 1-minute sliding window
  private readonly ERROR_THRESHOLD = 3;       // 3 errors in window → degrade
  private readonly COOLDOWN_THRESHOLD = 5;    // 5 errors in window → cooldown
  private readonly RECOVERY_SUCCESSES = 3;    // 3 successes → recover

  recordSuccess() {
    this.successCount++;

    if (this.mode === 'cooldown' && this.successCount >= this.RECOVERY_SUCCESSES) {
      this.transitionTo('fast');
    } else if (this.mode === 'degraded') {
      // In degraded mode, require a longer run of successes before recovering
      if (this.successCount >= this.RECOVERY_SUCCESSES * 2) {
        this.transitionTo('fast');
      }
    }
  }

  recordError(error: Error) {
    const now = Date.now();
    this.successCount = 0;

    // Add to the sliding window, dropping entries older than the window
    this.errorWindow.push(now);
    this.errorWindow = this.errorWindow.filter(t => now - t < this.ERROR_WINDOW_MS);

    if (this.mode === 'fast' && this.errorWindow.length >= this.ERROR_THRESHOLD) {
      this.transitionTo('degraded');
    } else if (this.mode === 'degraded' && this.errorWindow.length >= this.COOLDOWN_THRESHOLD) {
      this.transitionTo('cooldown');
    }
  }

  private transitionTo(newMode: 'fast' | 'degraded' | 'cooldown') {
    const oldMode = this.mode;
    this.mode = newMode;

    if (newMode === 'cooldown') {
      this.cooldownStart = Date.now();
    }
    if (newMode === 'fast') {
      this.errorWindow = [];
      this.successCount = 0;
    }

    console.log(`[Degradation] ${oldMode} → ${newMode}`);
  }

  getMode() {
    return this.mode;
  }

  // FAST_CONFIG, DEGRADED_CONFIG, and COOLDOWN_CONFIG are instances of the
  // interfaces above; ModeConfig is their union type.
  getConfig(): ModeConfig {
    switch (this.mode) {
      case 'fast': return FAST_CONFIG;
      case 'degraded': return DEGRADED_CONFIG;
      case 'cooldown': return COOLDOWN_CONFIG;
    }
  }
}
```

Not all errors deserve the same retry behavior:
```typescript
interface RetryStrategy {
  shouldRetry: boolean;
  delay: number;
  degradeMode: boolean;
}

function classifyError(error: APIError): RetryStrategy {
  switch (error.status) {
    // Transient — retry quickly
    case 500: // Internal server error
    case 502: // Bad gateway
    case 503: // Service unavailable
      return { shouldRetry: true, delay: 1_000, degradeMode: false };

    // Rate limited — retry with backoff, degrade mode
    case 429: {
      const retryAfter = error.headers['retry-after']
        ? parseInt(error.headers['retry-after'], 10) * 1000
        : 30_000;
      return { shouldRetry: true, delay: retryAfter, degradeMode: true };
    }

    // Overloaded — long backoff, definitely degrade
    case 529:
      return { shouldRetry: true, delay: 60_000, degradeMode: true };

    // Client errors — don't retry
    case 400: // Bad request
    case 401: // Unauthorized
    case 403: // Forbidden
      return { shouldRetry: false, delay: 0, degradeMode: false };

    // Unknown — retry once conservatively
    default:
      return { shouldRetry: true, delay: 5_000, degradeMode: false };
  }
}

function calculateBackoff(attempt: number, baseDelay: number): number {
  // Exponential: 1s, 2s, 4s, 8s, 16s...
  const exponential = baseDelay * Math.pow(2, attempt);

  // Cap at 60 seconds
  const capped = Math.min(exponential, 60_000);

  // Add jitter (±25%) to prevent thundering herd
  const jitter = capped * (0.75 + Math.random() * 0.5);

  return Math.floor(jitter);
}
```
```typescript
// Example progression (baseDelay = 1000):
// Attempt 0:  1000ms (± 250ms jitter)
// Attempt 1:  2000ms (± 500ms jitter)
// Attempt 2:  4000ms (± 1000ms jitter)
// Attempt 3:  8000ms (± 2000ms jitter)
// Attempt 4: 16000ms (± 4000ms jitter)
// Attempt 5: 32000ms (± 8000ms jitter)
// Attempt 6+: 60000ms (capped, ± 15000ms jitter)
```

```mermaid
sequenceDiagram
    participant Agent
    participant Controller
    participant API

    Note over Agent,API: Fast Mode
    Agent->>API: Request 1 ✅
    Agent->>API: Request 2 ✅
    Agent->>API: Request 3 ❌ 429
    Agent->>Controller: recordError()
    Agent->>API: Request 4 ❌ 429
    Agent->>Controller: recordError()
    Agent->>API: Request 5 ❌ 529
    Agent->>Controller: recordError()
    Controller-->>Agent: Mode → Degraded

    Note over Agent,API: Degraded Mode (Sequential)
    Agent->>API: Request 6 ❌ 529
    Agent->>API: Request 7 ❌ 529
    Controller-->>Agent: Mode → Cooldown

    Note over Agent,API: Cooldown (30s pause)
    Note over Agent: Waiting...
    Agent->>API: Health probe ❌
    Note over Agent: Wait 10s...
    Agent->>API: Health probe ✅
    Agent->>API: Health probe ✅
    Agent->>API: Health probe ✅
    Controller-->>Agent: Mode → Fast

    Note over Agent,API: Fast Mode (Recovered)
    Agent->>API: Request 8 ✅
```

```typescript
// delay(ms): simple setTimeout-based helper used throughout
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function cooldownProbe(
  apiClient: APIClient,
  controller: DegradationController,
  config: CooldownConfig,
): Promise<void> {
  const start = Date.now();
  let consecutiveSuccesses = 0;

  while (
    controller.getMode() === 'cooldown' &&
    Date.now() - start < config.maxCooldownDuration
  ) {
    await delay(config.probeInterval);

    try {
      // Lightweight probe — minimal token usage
      await apiClient.complete({
        messages: [{ role: 'user', content: 'ping' }],
        maxTokens: 1,
      });
      consecutiveSuccesses++;
      controller.recordSuccess();

      if (consecutiveSuccesses >= config.requiredSuccesses) {
        return; // Controller will transition to fast mode
      }
    } catch (error) {
      consecutiveSuccesses = 0;
      controller.recordError(error as Error);
    }
  }
}
```

A reusable graceful-degradation wrapper ties the controller, error classification, and backoff together:
```typescript
interface DegradableClient<T> {
  execute(request: T): Promise<unknown>;
  getMode(): 'fast' | 'degraded' | 'cooldown';
  getStats(): DegradationStats;
}

interface DegradationStats {
  mode: string;
  totalRequests: number;
  totalErrors: number;
  modeTransitions: number;
  averageLatency: number;
}

// DegradationOptions (custom thresholds/configs) is assumed to be defined
// elsewhere; its wiring is omitted here for brevity.
function withGracefulDegradation<T>(
  client: { execute: (req: T) => Promise<unknown> },
  options?: Partial<DegradationOptions>,
): DegradableClient<T> {
  const controller = new DegradationController();
  const stats = {
    totalRequests: 0,
    totalErrors: 0,
    modeTransitions: 0,
    latencies: [] as number[],
  };

  // Track mode changes so getStats() can report transitions
  let lastMode = controller.getMode();
  const noteTransition = () => {
    const mode = controller.getMode();
    if (mode !== lastMode) {
      stats.modeTransitions++;
      lastMode = mode;
    }
  };

  return {
    async execute(request: T) {
      // Respect cooldown before starting new work
      if (controller.getMode() === 'cooldown') {
        await cooldownProbe(client as unknown as APIClient, controller, COOLDOWN_CONFIG);
      }

      // Read the config AFTER any cooldown, so a recovery takes effect immediately
      const config = controller.getConfig();

      let lastError: Error | null = null;

      for (let attempt = 0; attempt <= config.retryCount; attempt++) {
        if (attempt > 0) {
          await delay(calculateBackoff(attempt, config.retryDelay));
        }

        const start = performance.now();
        stats.totalRequests++;

        try {
          // timeout(ms): assumed helper returning a Promise that rejects after ms
          const result = await Promise.race([
            client.execute(request),
            timeout(config.timeout),
          ]);

          const latency = performance.now() - start;
          stats.latencies.push(latency);
          controller.recordSuccess();
          noteTransition();

          return result;
        } catch (error) {
          lastError = error as Error;
          stats.totalErrors++;

          const strategy = classifyError(error as APIError);
          controller.recordError(error as Error);
          noteTransition();

          if (!strategy.shouldRetry) throw error;
        }
      }

      throw lastError;
    },

    getMode() {
      return controller.getMode();
    },

    getStats() {
      return {
        mode: controller.getMode(),
        totalRequests: stats.totalRequests,
        totalErrors: stats.totalErrors,
        modeTransitions: stats.modeTransitions,
        averageLatency:
          stats.latencies.reduce((a, b) => a + b, 0) / stats.latencies.length || 0,
      };
    },
  };
}
```

| Aspect | Short-Term (Transient) | Long-Term (Sustained) |
|---|---|---|
| Trigger | 1-3 errors in 1 minute | 5+ errors in 5 minutes |
| Action | Retry with backoff | Switch to degraded mode |
| Recovery | Automatic on next success | Requires N consecutive successes |
| Duration | Seconds | Minutes to hours |
| Impact | User barely notices | User sees slower but working system |
| Example | Network blip, 502 | Rate limit exhaustion, outage |
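The table's transient/sustained split falls directly out of sliding-window counting. A stripped-down model (using the same illustrative 60-second window and threshold of 3 as the controller above) shows how the same number of errors reads differently depending on how they cluster in time:

```typescript
// Stripped-down sliding-window error counter mirroring the controller above.
// Window size and threshold are the same illustrative values (60s, 3 errors).
class ErrorWindow {
  private times: number[] = [];

  constructor(
    private windowMs = 60_000,
    private degradeAt = 3,
  ) {}

  // Record an error at the given timestamp; report whether to degrade.
  record(nowMs: number): 'ok' | 'degrade' {
    this.times.push(nowMs);
    this.times = this.times.filter(t => nowMs - t < this.windowMs);
    return this.times.length >= this.degradeAt ? 'degrade' : 'ok';
  }
}
```

Three errors spread across five minutes never trip the threshold, while three errors inside one minute do, which is what separates the two columns of the table.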
```typescript
// systemPrompt, rawApiClient, essentialToolsOnly, executeTool, isUnrecoverable,
// and the AgentResponse type are application-specific and assumed defined elsewhere.
async function* agentLoopWithDegradation(
  messages: Message[],
  tools: Tool[],
): AsyncGenerator<AgentEvent> {
  const apiClient = withGracefulDegradation(rawApiClient);

  while (true) {
    const mode = apiClient.getMode();

    // Surface the current mode to the user
    if (mode === 'degraded') {
      yield { type: 'status', message: '⚠️ Operating in degraded mode (slower but functional)' };
    }
    if (mode === 'cooldown') {
      yield { type: 'status', message: '⏸️ API cooling down, will resume shortly...' };
    }

    try {
      const response = (await apiClient.execute({
        system: systemPrompt,
        messages,
        // In non-fast modes, offer a reduced tool set to save tokens
        tools: mode === 'fast' ? tools : essentialToolsOnly(tools),
      })) as AgentResponse;

      yield { type: 'response', data: response };

      if (mode === 'degraded') {
        // In degraded mode, don't do parallel tool execution
        for (const call of response.toolCalls) {
          const result = await executeTool(call);
          yield { type: 'tool_result', data: result };
        }
      } else {
        // Fast mode: parallel execution
        const results = await Promise.all(
          response.toolCalls.map(call => executeTool(call)),
        );
        for (const result of results) {
          yield { type: 'tool_result', data: result };
        }
      }
    } catch (error) {
      yield { type: 'error', data: error };
      if (isUnrecoverable(error)) return;
    }
  }
}
```

LLM API Clients
Any application calling OpenAI, Anthropic, or other LLM APIs that have rate limits and occasional outages.
Microservice Systems
Service mesh degradation when downstream dependencies slow down or fail.
Real-Time Data Pipelines
Stream processing systems that need to handle backpressure from slow sinks.
Mobile Applications
Apps that must function on flaky networks by automatically reducing data usage and feature richness.