API-intensive systems inevitably face pressure: rate limits, server overload, network instability, or quota exhaustion. Graceful Degradation automatically shifts from a fast/optimistic mode to a slower/conservative mode when pressure is detected, and shifts back when pressure subsides.
The key distinction from simple retry logic: degradation is modal — the entire system adjusts its behavior, not just individual requests.
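To make the modal idea concrete, here is a minimal sketch (the `ModalClient` name, thresholds, and the "recover on one success" rule are illustrative simplifications, not part of the implementation that follows): every request reads one shared mode, and every outcome can shift that mode for all future requests, which is exactly what stateless per-request retry logic cannot do.

```typescript
// Minimal sketch of modal degradation. The class name, thresholds, and the
// "recover on one success" rule are illustrative simplifications.
type Mode = 'fast' | 'degraded';

class ModalClient {
  private mode: Mode = 'fast';
  private consecutiveErrors = 0;

  // Every request consults the SHARED mode...
  concurrencyFor(): number {
    return this.mode === 'fast' ? 5 : 1;
  }

  // ...and every outcome can shift that mode for ALL future requests.
  report(ok: boolean): Mode {
    if (ok) {
      this.consecutiveErrors = 0;
      this.mode = 'fast';
    } else if (++this.consecutiveErrors >= 3) {
      this.mode = 'degraded';
    }
    return this.mode;
  }
}
```

A retry wrapper would see each of those errors in isolation; the modal client sees them accumulate and changes how every subsequent request behaves.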
```mermaid
stateDiagram-v2
    [*] --> Fast

    Fast --> Degraded: Error rate > threshold
    Fast --> Degraded: Rate limit hit
    Fast --> Degraded: Latency spike

    Degraded --> Cooldown: Consecutive errors
    Degraded --> Fast: Sustained successes

    Cooldown --> Fast: Required probe successes
    Cooldown --> Cooldown: Still failing

    note right of Fast: Full parallelism\nAggressive prefetch\nOptimistic caching
    note right of Degraded: Sequential execution\nNo prefetch\nConservative timeouts
    note right of Cooldown: Pause new requests\nWait for recovery
```

```typescript
interface FastModeConfig {
  maxConcurrency: 5;     // Parallel API calls
  prefetchEnabled: true; // Speculatively fetch likely-needed data
  retryCount: 1;         // Quick retry on transient failures
  retryDelay: 500;       // 500ms between retries
  timeout: 30_000;       // 30s per request
  batchSize: 10;         // Process 10 items at once
}
```

In fast mode, the system is optimistic: it makes parallel requests, speculatively prefetches, and uses short timeouts. This is the default when everything is working.
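The config interfaces say what the knobs are but not how they are consumed. A hedged sketch of a concurrency-limited runner (the `runWithConcurrency` helper is an assumption for illustration, not part of the pattern's API) shows how `maxConcurrency` might gate parallel API calls:

```typescript
// Hypothetical helper showing how maxConcurrency might be consumed.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++; // claim an index synchronously (single-threaded event loop)
      results[i] = await fn(items[i]);
    }
  };

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In fast mode the call site would pass `maxConcurrency: 5`; in degraded mode the same call site passes `1`, collapsing to sequential execution with no other code change.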
```typescript
interface DegradedModeConfig {
  maxConcurrency: 1;      // Sequential only
  prefetchEnabled: false; // Don't waste quota on speculation
  retryCount: 3;          // More retries (with backoff)
  retryDelay: 2_000;      // 2s between retries
  timeout: 60_000;        // 60s per request (more patient)
  batchSize: 1;           // One item at a time
}
```

Degraded mode conserves resources: sequential execution, no prefetching, longer timeouts, and more patient retries. The agent is slower but still functional.
```typescript
interface CooldownConfig {
  pauseDuration: 30_000;        // 30s pause before probing
  probeInterval: 10_000;        // Health check every 10s
  requiredSuccesses: 3;         // Need 3 successful probes to recover
  maxCooldownDuration: 300_000; // Max 5 minutes in cooldown
}
```

Cooldown mode halts new work and periodically probes the API with lightweight requests to detect recovery.
```typescript
class DegradationController {
  private mode: 'fast' | 'degraded' | 'cooldown' = 'fast';
  private errorWindow: number[] = []; // Timestamps of recent errors
  private successCount = 0;
  private cooldownStart = 0;

  private readonly ERROR_WINDOW_MS = 60_000;  // 1-minute sliding window
  private readonly ERROR_THRESHOLD = 3;       // 3 errors in window → degrade
  private readonly COOLDOWN_THRESHOLD = 5;    // 5 errors in window → cooldown
  private readonly RECOVERY_SUCCESSES = 3;    // 3 successes → recover

  recordSuccess() {
    this.successCount++;

    if (this.mode === 'cooldown' && this.successCount >= this.RECOVERY_SUCCESSES) {
      this.transitionTo('fast');
    } else if (this.mode === 'degraded') {
      // In degraded mode, require a longer run of successes before recovering
      if (this.successCount >= this.RECOVERY_SUCCESSES * 2) {
        this.transitionTo('fast');
      }
    }
  }

  recordError(error: Error) {
    const now = Date.now();
    this.successCount = 0;

    // Add to the sliding window, dropping entries older than the window
    this.errorWindow.push(now);
    this.errorWindow = this.errorWindow.filter(t => now - t < this.ERROR_WINDOW_MS);

    if (this.mode === 'fast' && this.errorWindow.length >= this.ERROR_THRESHOLD) {
      this.transitionTo('degraded');
    } else if (this.mode === 'degraded' && this.errorWindow.length >= this.COOLDOWN_THRESHOLD) {
      this.transitionTo('cooldown');
    }
  }

  private transitionTo(newMode: 'fast' | 'degraded' | 'cooldown') {
    const oldMode = this.mode;
    this.mode = newMode;

    if (newMode === 'cooldown') {
      this.cooldownStart = Date.now();
    }
    if (newMode === 'fast') {
      this.errorWindow = [];
      this.successCount = 0;
    }

    console.log(`[Degradation] ${oldMode} → ${newMode}`);
  }

  getMode() {
    return this.mode;
  }

  // FAST_CONFIG, DEGRADED_CONFIG, and COOLDOWN_CONFIG are instances of the
  // interfaces above; ModeConfig is their union type.
  getConfig(): ModeConfig {
    switch (this.mode) {
      case 'fast': return FAST_CONFIG;
      case 'degraded': return DEGRADED_CONFIG;
      case 'cooldown': return COOLDOWN_CONFIG;
    }
  }
}
```

Not all errors deserve the same retry behavior:
```typescript
interface RetryStrategy {
  shouldRetry: boolean;
  delay: number;
  degradeMode: boolean;
}

function classifyError(error: APIError): RetryStrategy {
  switch (error.status) {
    // Transient — retry quickly
    case 500: // Internal server error
    case 502: // Bad gateway
    case 503: // Service unavailable
      return { shouldRetry: true, delay: 1_000, degradeMode: false };

    // Rate limited — retry with backoff, degrade mode
    case 429: {
      const retryAfter = error.headers['retry-after']
        ? parseInt(error.headers['retry-after'], 10) * 1000
        : 30_000;
      return { shouldRetry: true, delay: retryAfter, degradeMode: true };
    }

    // Overloaded — long backoff, definitely degrade
    case 529:
      return { shouldRetry: true, delay: 60_000, degradeMode: true };

    // Client errors — don't retry
    case 400: // Bad request
    case 401: // Unauthorized
    case 403: // Forbidden
      return { shouldRetry: false, delay: 0, degradeMode: false };

    // Unknown — retry once conservatively
    default:
      return { shouldRetry: true, delay: 5_000, degradeMode: false };
  }
}

function calculateBackoff(attempt: number, baseDelay: number): number {
  // Exponential: 1s, 2s, 4s, 8s, 16s...
  const exponential = baseDelay * Math.pow(2, attempt);

  // Cap at 60 seconds
  const capped = Math.min(exponential, 60_000);

  // Add jitter (±25%) to prevent thundering herd
  const jitter = capped * (0.75 + Math.random() * 0.5);

  return Math.floor(jitter);
}
```
```typescript
// Example progression (baseDelay = 1000):
// Attempt 0:  1000ms (± 250ms jitter)
// Attempt 1:  2000ms (± 500ms jitter)
// Attempt 2:  4000ms (± 1000ms jitter)
// Attempt 3:  8000ms (± 2000ms jitter)
// Attempt 4: 16000ms (± 4000ms jitter)
// Attempt 5: 32000ms (± 8000ms jitter)
// Attempt 6+: 60000ms (capped, ± 15000ms jitter)
```

```mermaid
sequenceDiagram
    participant Agent
    participant Controller
    participant API

    Note over Agent,API: Fast Mode
    Agent->>API: Request 1 ✅
    Agent->>API: Request 2 ✅
    Agent->>API: Request 3 ❌ 429
    Agent->>Controller: recordError()
    Agent->>API: Request 4 ❌ 429
    Agent->>Controller: recordError()
    Agent->>API: Request 5 ❌ 529
    Agent->>Controller: recordError()
    Controller-->>Agent: Mode → Degraded

    Note over Agent,API: Degraded Mode (Sequential)
    Agent->>API: Request 6 ❌ 529
    Agent->>API: Request 7 ❌ 529
    Controller-->>Agent: Mode → Cooldown

    Note over Agent,API: Cooldown (30s pause)
    Note over Agent: Waiting...
    Agent->>API: Health probe ❌
    Note over Agent: Wait 10s...
    Agent->>API: Health probe ✅
    Agent->>API: Health probe ✅
    Agent->>API: Health probe ✅
    Controller-->>Agent: Mode → Fast

    Note over Agent,API: Fast Mode (Recovered)
    Agent->>API: Request 8 ✅
```

```typescript
// delay(ms): simple setTimeout-based helper used throughout
const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

async function cooldownProbe(
  apiClient: APIClient,
  controller: DegradationController,
  config: CooldownConfig,
): Promise<void> {
  const start = Date.now();
  let consecutiveSuccesses = 0;

  while (
    controller.getMode() === 'cooldown' &&
    Date.now() - start < config.maxCooldownDuration
  ) {
    await delay(config.probeInterval);

    try {
      // Lightweight probe — minimal token usage
      await apiClient.complete({
        messages: [{ role: 'user', content: 'ping' }],
        maxTokens: 1,
      });
      consecutiveSuccesses++;
      controller.recordSuccess();

      if (consecutiveSuccesses >= config.requiredSuccesses) {
        return; // Controller will transition to fast mode
      }
    } catch (error) {
      consecutiveSuccesses = 0;
      controller.recordError(error as Error);
    }
  }
}
```

A reusable graceful-degradation wrapper ties the controller, error classification, and backoff together:
```typescript
interface DegradableClient<T> {
  execute(request: T): Promise<unknown>;
  getMode(): 'fast' | 'degraded' | 'cooldown';
  getStats(): DegradationStats;
}

interface DegradationStats {
  mode: string;
  totalRequests: number;
  totalErrors: number;
  modeTransitions: number;
  averageLatency: number;
}

// DegradationOptions (custom thresholds/configs) is assumed to be defined
// elsewhere; its wiring is omitted here for brevity.
function withGracefulDegradation<T>(
  client: { execute: (req: T) => Promise<unknown> },
  options?: Partial<DegradationOptions>,
): DegradableClient<T> {
  const controller = new DegradationController();
  const stats = {
    totalRequests: 0,
    totalErrors: 0,
    modeTransitions: 0,
    latencies: [] as number[],
  };

  // Track mode changes so getStats() can report transitions
  let lastMode = controller.getMode();
  const noteTransition = () => {
    const mode = controller.getMode();
    if (mode !== lastMode) {
      stats.modeTransitions++;
      lastMode = mode;
    }
  };

  return {
    async execute(request: T) {
      // Respect cooldown before starting new work
      if (controller.getMode() === 'cooldown') {
        await cooldownProbe(client as unknown as APIClient, controller, COOLDOWN_CONFIG);
      }

      // Read the config AFTER any cooldown, so a recovery takes effect immediately
      const config = controller.getConfig();

      let lastError: Error | null = null;

      for (let attempt = 0; attempt <= config.retryCount; attempt++) {
        if (attempt > 0) {
          await delay(calculateBackoff(attempt, config.retryDelay));
        }

        const start = performance.now();
        stats.totalRequests++;

        try {
          // timeout(ms): assumed helper returning a Promise that rejects after ms
          const result = await Promise.race([
            client.execute(request),
            timeout(config.timeout),
          ]);

          const latency = performance.now() - start;
          stats.latencies.push(latency);
          controller.recordSuccess();
          noteTransition();

          return result;
        } catch (error) {
          lastError = error as Error;
          stats.totalErrors++;

          const strategy = classifyError(error as APIError);
          controller.recordError(error as Error);
          noteTransition();

          if (!strategy.shouldRetry) throw error;
        }
      }

      throw lastError;
    },

    getMode() {
      return controller.getMode();
    },

    getStats() {
      return {
        mode: controller.getMode(),
        totalRequests: stats.totalRequests,
        totalErrors: stats.totalErrors,
        modeTransitions: stats.modeTransitions,
        averageLatency:
          stats.latencies.reduce((a, b) => a + b, 0) / stats.latencies.length || 0,
      };
    },
  };
}
```

| Aspect | Short-Term (Transient) | Long-Term (Sustained) |
|---|---|---|
| Trigger | 1-3 errors in 1 minute | 5+ errors in 5 minutes |
| Action | Retry with backoff | Switch to degraded mode |
| Recovery | Automatic on next success | Requires N consecutive successes |
| Duration | Seconds | Minutes to hours |
| Impact | User barely notices | User sees slower but working system |
| Example | Network blip, 502 | Rate limit exhaustion, outage |
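The table's transient/sustained split falls directly out of sliding-window counting. A stripped-down model (using the same illustrative 60-second window and threshold of 3 as the controller above) shows how the same number of errors reads differently depending on how they cluster in time:

```typescript
// Stripped-down sliding-window error counter mirroring the controller above.
// Window size and threshold are the same illustrative values (60s, 3 errors).
class ErrorWindow {
  private times: number[] = [];

  constructor(
    private windowMs = 60_000,
    private degradeAt = 3,
  ) {}

  // Record an error at the given timestamp; report whether to degrade.
  record(nowMs: number): 'ok' | 'degrade' {
    this.times.push(nowMs);
    this.times = this.times.filter(t => nowMs - t < this.windowMs);
    return this.times.length >= this.degradeAt ? 'degrade' : 'ok';
  }
}
```

Three errors spread across five minutes never trip the threshold, while three errors inside one minute do, which is what separates the two columns of the table.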
```typescript
// systemPrompt, rawApiClient, essentialToolsOnly, executeTool, isUnrecoverable,
// and the AgentResponse type are application-specific and assumed defined elsewhere.
async function* agentLoopWithDegradation(
  messages: Message[],
  tools: Tool[],
): AsyncGenerator<AgentEvent> {
  const apiClient = withGracefulDegradation(rawApiClient);

  while (true) {
    const mode = apiClient.getMode();

    // Surface the current mode to the user
    if (mode === 'degraded') {
      yield { type: 'status', message: '⚠️ Operating in degraded mode (slower but functional)' };
    }
    if (mode === 'cooldown') {
      yield { type: 'status', message: '⏸️ API cooling down, will resume shortly...' };
    }

    try {
      const response = (await apiClient.execute({
        system: systemPrompt,
        messages,
        // In non-fast modes, offer a reduced tool set to save tokens
        tools: mode === 'fast' ? tools : essentialToolsOnly(tools),
      })) as AgentResponse;

      yield { type: 'response', data: response };

      if (mode === 'degraded') {
        // In degraded mode, don't do parallel tool execution
        for (const call of response.toolCalls) {
          const result = await executeTool(call);
          yield { type: 'tool_result', data: result };
        }
      } else {
        // Fast mode: parallel execution
        const results = await Promise.all(
          response.toolCalls.map(call => executeTool(call)),
        );
        for (const result of results) {
          yield { type: 'tool_result', data: result };
        }
      }
    } catch (error) {
      yield { type: 'error', data: error };
      if (isUnrecoverable(error)) return;
    }
  }
}
```

LLM API Clients
Any application calling OpenAI, Anthropic, or other LLM APIs that have rate limits and occasional outages.
Microservice Systems
Service mesh degradation when downstream dependencies slow down or fail.
Real-Time Data Pipelines
Stream processing systems that need to handle backpressure from slow sinks.
Mobile Applications
Apps that must function on flaky networks by automatically reducing data usage and feature richness.