Pattern: Graceful Degradation

API-intensive systems inevitably face pressure: rate limits, server overload, network instability, or quota exhaustion. Graceful Degradation automatically shifts from a fast/optimistic mode to a slower/conservative mode when pressure is detected, and shifts back when pressure subsides.

The key distinction from simple retry logic: degradation is modal — the entire system adjusts its behavior, not just individual requests.
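A minimal sketch of that distinction (all names hypothetical): a retry wrapper re-sends the one failing call, while a modal controller changes the settings that every subsequent call reads.

```typescript
// Hypothetical sketch: one pressure signal throttles the whole system,
// not just the request that observed it.
type Mode = 'fast' | 'degraded';

class ModalDial {
  private mode: Mode = 'fast';

  // Read by ALL requests before they are scheduled
  concurrency(): number {
    return this.mode === 'fast' ? 5 : 1;
  }

  onPressure(): void {
    this.mode = 'degraded'; // Flips behavior globally
  }
}
```

A per-request retry would leave `concurrency()` at 5; the modal switch drops it to 1 for everything that follows.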

```mermaid
stateDiagram-v2
    [*] --> Fast
    Fast --> Degraded: Error rate > threshold
    Fast --> Degraded: Rate limit hit
    Fast --> Degraded: Latency spike
    Degraded --> Cooldown: Consecutive errors
    Degraded --> Fast: Sustained successes
    Cooldown --> Fast: Required probe successes
    Cooldown --> Cooldown: Still failing
    note right of Fast: Full parallelism\nAggressive prefetch\nOptimistic caching
    note right of Degraded: Sequential execution\nNo prefetch\nConservative timeouts
    note right of Cooldown: Pause new requests\nWait for recovery
```
```typescript
interface FastModeConfig {
  maxConcurrency: 5;     // Parallel API calls
  prefetchEnabled: true; // Speculatively fetch likely-needed data
  retryCount: 1;         // Quick retry on transient failures
  retryDelay: 500;       // 500ms between retries
  timeout: 30_000;       // 30s per request
  batchSize: 10;         // Process 10 items at once
}
```

In fast mode, the system is optimistic: it makes parallel requests, speculatively prefetches, and uses short timeouts. This is the default when everything is working.

```typescript
interface DegradedModeConfig {
  maxConcurrency: 1;      // Sequential only
  prefetchEnabled: false; // Don't waste quota on speculation
  retryCount: 3;          // More retries (with backoff)
  retryDelay: 2_000;      // 2s between retries
  timeout: 60_000;        // 60s per request (more patient)
  batchSize: 1;           // One item at a time
}
```

Degraded mode conserves resources: sequential execution, no prefetching, longer timeouts, and more patient retries. The agent is slower but still functional.

```typescript
interface CooldownConfig {
  pauseDuration: 30_000;       // 30s pause before probing
  probeInterval: 10_000;       // Health check every 10s
  requiredSuccesses: 3;        // Need 3 successful probes to recover
  maxCooldownDuration: 300_000; // Max 5 minutes in cooldown
}
```

Cooldown mode halts new work and periodically probes the API with lightweight requests to detect recovery.

```typescript
class DegradationController {
  private mode: 'fast' | 'degraded' | 'cooldown' = 'fast';
  private errorWindow: number[] = []; // Timestamps of recent errors
  private successCount = 0;
  private cooldownStart = 0;

  private readonly ERROR_WINDOW_MS = 60_000;  // 1-minute sliding window
  private readonly ERROR_THRESHOLD = 3;       // 3 errors in window → degrade
  private readonly COOLDOWN_THRESHOLD = 5;    // 5 errors in window → cooldown
  private readonly RECOVERY_SUCCESSES = 3;    // 3 successes → recover

  recordSuccess() {
    this.successCount++;
    if (this.mode === 'cooldown' && this.successCount >= this.RECOVERY_SUCCESSES) {
      this.transitionTo('fast');
    } else if (this.mode === 'degraded') {
      // In degraded mode, require a longer run of successes before speeding up
      if (this.successCount >= this.RECOVERY_SUCCESSES * 2) {
        this.transitionTo('fast');
      }
    }
  }

  recordError(error: Error) {
    const now = Date.now();
    this.successCount = 0;

    // Add to sliding window and drop entries older than the window
    this.errorWindow.push(now);
    this.errorWindow = this.errorWindow.filter(t => now - t < this.ERROR_WINDOW_MS);

    if (this.mode === 'fast' && this.errorWindow.length >= this.ERROR_THRESHOLD) {
      this.transitionTo('degraded');
    } else if (this.mode === 'degraded' && this.errorWindow.length >= this.COOLDOWN_THRESHOLD) {
      this.transitionTo('cooldown');
    }
  }

  private transitionTo(newMode: 'fast' | 'degraded' | 'cooldown') {
    const oldMode = this.mode;
    this.mode = newMode;
    if (newMode === 'cooldown') {
      this.cooldownStart = Date.now();
    }
    if (newMode === 'fast') {
      this.errorWindow = [];
      this.successCount = 0;
    }
    console.log(`[Degradation] ${oldMode} → ${newMode}`);
  }

  getMode() { return this.mode; }

  // FAST_CONFIG, DEGRADED_CONFIG, COOLDOWN_CONFIG are instances of the
  // mode configs shown above; ModeConfig is their union
  getConfig(): ModeConfig {
    switch (this.mode) {
      case 'fast': return FAST_CONFIG;
      case 'degraded': return DEGRADED_CONFIG;
      case 'cooldown': return COOLDOWN_CONFIG;
    }
  }
}
```

Not all errors deserve the same retry behavior:

```typescript
interface RetryStrategy {
  shouldRetry: boolean;
  delay: number;
  degradeMode: boolean;
}

function classifyError(error: APIError): RetryStrategy {
  switch (error.status) {
    // Transient: retry quickly
    case 500: // Internal server error
    case 502: // Bad gateway
    case 503: // Service unavailable
      return { shouldRetry: true, delay: 1_000, degradeMode: false };

    // Rate limited: retry with backoff, switch to degraded mode
    case 429: {
      const retryAfter = error.headers['retry-after']
        ? parseInt(error.headers['retry-after'], 10) * 1000
        : 30_000;
      return { shouldRetry: true, delay: retryAfter, degradeMode: true };
    }

    // Overloaded: long backoff, definitely degrade
    case 529:
      return { shouldRetry: true, delay: 60_000, degradeMode: true };

    // Client errors: don't retry
    case 400: // Bad request
    case 401: // Unauthorized
    case 403: // Forbidden
      return { shouldRetry: false, delay: 0, degradeMode: false };

    // Unknown: retry once, conservatively
    default:
      return { shouldRetry: true, delay: 5_000, degradeMode: false };
  }
}
```
```typescript
function calculateBackoff(attempt: number, baseDelay: number): number {
  // Exponential: 1s, 2s, 4s, 8s, 16s...
  const exponential = baseDelay * Math.pow(2, attempt);
  // Cap at 60 seconds
  const capped = Math.min(exponential, 60_000);
  // Add jitter (±25%) to prevent thundering herd
  const jitter = capped * (0.75 + Math.random() * 0.5);
  return Math.floor(jitter);
}

// Example progression (baseDelay = 1000):
// Attempt 0:  1000ms (± 250ms jitter)
// Attempt 1:  2000ms (± 500ms jitter)
// Attempt 2:  4000ms (± 1000ms jitter)
// Attempt 3:  8000ms (± 2000ms jitter)
// Attempt 4: 16000ms (± 4000ms jitter)
// Attempt 5: 32000ms (± 8000ms jitter)
// Attempt 6+: 60000ms (capped, ± 15000ms jitter)
```
```mermaid
sequenceDiagram
    participant Agent
    participant Controller
    participant API
    Note over Agent,API: Fast Mode
    Agent->>API: Request 1 ✅
    Agent->>API: Request 2 ✅
    Agent->>API: Request 3 ❌ 429
    Agent->>Controller: recordError()
    Agent->>API: Request 4 ❌ 429
    Agent->>Controller: recordError()
    Agent->>API: Request 5 ❌ 529
    Agent->>Controller: recordError()
    Controller-->>Agent: Mode → Degraded
    Note over Agent,API: Degraded Mode (Sequential)
    Agent->>API: Request 6 ❌ 529
    Agent->>API: Request 7 ❌ 529
    Controller-->>Agent: Mode → Cooldown
    Note over Agent,API: Cooldown (30s pause)
    Note over Agent: Waiting...
    Agent->>API: Health probe ❌
    Note over Agent: Wait 10s...
    Agent->>API: Health probe ✅
    Agent->>API: Health probe ✅
    Agent->>API: Health probe ✅
    Controller-->>Agent: Mode → Fast
    Note over Agent,API: Fast Mode (Recovered)
    Agent->>API: Request 8 ✅
```
```typescript
async function cooldownProbe(
  apiClient: APIClient,
  controller: DegradationController,
  config: CooldownConfig,
): Promise<void> {
  const start = Date.now();
  let consecutiveSuccesses = 0;

  while (
    controller.getMode() === 'cooldown' &&
    Date.now() - start < config.maxCooldownDuration
  ) {
    await delay(config.probeInterval);
    try {
      // Lightweight probe: minimal token usage
      await apiClient.complete({
        messages: [{ role: 'user', content: 'ping' }],
        maxTokens: 1,
      });
      consecutiveSuccesses++;
      controller.recordSuccess();
      if (consecutiveSuccesses >= config.requiredSuccesses) {
        return; // Controller will transition to fast mode
      }
    } catch (error) {
      consecutiveSuccesses = 0;
      controller.recordError(error as Error);
    }
  }
}
```
```typescript
// ============================================
// Reusable Graceful Degradation Wrapper
// ============================================
interface DegradableClient<T> {
  execute(request: T): Promise<unknown>;
  getMode(): 'fast' | 'degraded' | 'cooldown';
  getStats(): DegradationStats;
}

interface DegradationStats {
  mode: string;
  totalRequests: number;
  totalErrors: number;
  modeTransitions: number;
  averageLatency: number;
}

function withGracefulDegradation<T>(
  client: { execute: (req: T) => Promise<unknown> },
): DegradableClient<T> {
  const controller = new DegradationController();
  const stats = { totalRequests: 0, totalErrors: 0, modeTransitions: 0, latencies: [] as number[] };

  // Count a transition whenever a record* call changes the controller's mode
  const recording = (fn: () => void) => {
    const before = controller.getMode();
    fn();
    if (controller.getMode() !== before) stats.modeTransitions++;
  };

  return {
    async execute(request: T) {
      const config = controller.getConfig();

      // Respect cooldown: probe until the controller recovers
      if (controller.getMode() === 'cooldown') {
        await cooldownProbe(client as any, controller, COOLDOWN_CONFIG);
      }

      // Apply mode-specific configuration
      let lastError: Error | null = null;
      for (let attempt = 0; attempt <= config.retryCount; attempt++) {
        if (attempt > 0) {
          await delay(calculateBackoff(attempt, config.retryDelay));
        }
        const start = performance.now();
        stats.totalRequests++;
        try {
          // timeout(ms) is a helper that rejects after ms milliseconds
          const result = await Promise.race([
            client.execute(request),
            timeout(config.timeout),
          ]);
          stats.latencies.push(performance.now() - start);
          recording(() => controller.recordSuccess());
          return result;
        } catch (error) {
          lastError = error as Error;
          stats.totalErrors++;
          const strategy = classifyError(error as APIError);
          recording(() => controller.recordError(error as Error));
          if (!strategy.shouldRetry) throw error;
        }
      }
      throw lastError;
    },

    getMode() { return controller.getMode(); },

    getStats() {
      return {
        mode: controller.getMode(),
        totalRequests: stats.totalRequests,
        totalErrors: stats.totalErrors,
        modeTransitions: stats.modeTransitions,
        averageLatency: stats.latencies.reduce((a, b) => a + b, 0) / stats.latencies.length || 0,
      };
    },
  };
}
```
| Aspect | Short-Term (Transient) | Long-Term (Sustained) |
|---|---|---|
| Trigger | 1-3 errors in 1 minute | 5+ errors in 5 minutes |
| Action | Retry with backoff | Switch to degraded mode |
| Recovery | Automatic on next success | Requires N consecutive successes |
| Duration | Seconds | Minutes to hours |
| Impact | User barely notices | User sees slower but working system |
| Example | Network blip, 502 | Rate limit exhaustion, outage |
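One way to sketch those triggers (thresholds and names hypothetical): count error timestamps in two sliding windows, treating a small recent burst as transient and a larger sustained count as long-term pressure.

```typescript
// Hypothetical sketch: classify pressure from error timestamps (in ms).
type Pressure = 'none' | 'transient' | 'sustained';

function classifyPressure(errorTimes: number[], now: number): Pressure {
  // Count errors that fall within the last `ms` milliseconds
  const inWindow = (ms: number) =>
    errorTimes.filter(t => now - t < ms).length;

  if (inWindow(5 * 60_000) >= 5) return 'sustained'; // 5+ errors in 5 minutes
  if (inWindow(60_000) >= 1) return 'transient';     // a recent blip
  return 'none';
}
```

A transient result maps to the left column (retry with backoff); a sustained result maps to the right (switch modes).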
```typescript
async function* agentLoopWithDegradation(
  messages: Message[],
  tools: Tool[],
): AsyncGenerator<AgentEvent> {
  const apiClient = withGracefulDegradation(rawApiClient);

  while (true) {
    const mode = apiClient.getMode();

    // Surface the current mode to the user
    if (mode === 'degraded') {
      yield { type: 'status', message: '⚠️ Operating in degraded mode (slower but functional)' };
    }
    if (mode === 'cooldown') {
      yield { type: 'status', message: '⏸️ API cooling down, will resume shortly...' };
    }

    try {
      const response = await apiClient.execute({
        system: systemPrompt,
        messages,
        tools: mode === 'fast' ? tools : essentialToolsOnly(tools),
      });
      yield { type: 'response', data: response };

      if (mode === 'degraded') {
        // Degraded mode: no parallel tool execution
        for (const call of response.toolCalls) {
          const result = await executeTool(call);
          yield { type: 'tool_result', data: result };
        }
      } else {
        // Fast mode: parallel execution
        const results = await Promise.all(
          response.toolCalls.map(call => executeTool(call))
        );
        for (const result of results) {
          yield { type: 'tool_result', data: result };
        }
      }
    } catch (error) {
      yield { type: 'error', data: error };
      if (isUnrecoverable(error)) return;
    }
  }
}
```

LLM API Clients

Any application calling OpenAI, Anthropic, or other LLM APIs that have rate limits and occasional outages.

Microservice Systems

Service mesh degradation when downstream dependencies slow down or fail.

Real-Time Data Pipelines

Stream processing systems that need to handle backpressure from slow sinks.

Mobile Applications

Apps that must function on flaky networks by automatically reducing data usage and feature richness.

  1. Modes, not just retries: Degradation changes the system’s entire behavior pattern, not just individual request retry counts
  2. Automatic transitions: Mode changes based on measured error rates, not manual intervention
  3. Hysteresis: Recovery requires sustained success (3+ consecutive), preventing flapping between modes
  4. Transparency: The user is informed when the system degrades (“Operating in degraded mode”)
  5. Continuity: Even in the worst case (cooldown), the system eventually recovers rather than failing permanently
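The hysteresis point can be sketched in isolation (all names and thresholds hypothetical): degrading after N errors but recovering only after M consecutive successes means a single success amid failures never flips the mode back.

```typescript
// Hypothetical sketch of hysteresis: asymmetric thresholds prevent flapping.
class HysteresisSwitch {
  private degraded = false;
  private errors = 0;
  private successes = 0;

  // Record one outcome; returns whether the system is currently degraded
  record(ok: boolean): boolean {
    if (ok) {
      this.successes++;
      this.errors = 0;
      if (this.degraded && this.successes >= 3) this.degraded = false;
    } else {
      this.errors++;
      this.successes = 0; // A single failure resets recovery progress
      if (!this.degraded && this.errors >= 3) this.degraded = true;
    }
    return this.degraded;
  }
}
```

Because the success counter resets on every failure, an unstable API that alternates success and failure stays safely in degraded mode instead of flapping.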