Adaptive’s provider resiliency system ensures your applications stay online even when individual AI providers experience outages. With intelligent failover mechanisms and circuit breaker patterns, you get enterprise-grade reliability.
Cost Consideration: Fallback is disabled by default to control costs. Enable it when you need maximum reliability and can absorb the potentially higher cost of calling multiple providers.

How Resiliency Works

1. Health Monitoring: Continuous monitoring of provider availability, response times, and error rates
2. Failure Detection: Instant detection of timeouts, rate limits, service errors, and degraded performance
3. Automatic Failover: Seamless switching to backup providers based on your configured fallback strategy
4. Recovery Tracking: Automatic re-integration of recovered providers back into the rotation

Failover Strategies

Race Mode (Fastest, Higher Cost)

Send requests to multiple providers simultaneously and use the first successful response:
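// `openai` is an already-configured OpenAI SDK client.
// `fallback` is not a standard OpenAI parameter; it is the Adaptive field described here.
const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Hello!" }],
  fallback: {
    mode: "race" // Try multiple providers simultaneously
  }
});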

Benefits

Ultra-low latency: Get responses from the fastest available provider
Maximum reliability: Multiple providers increase success probability

Trade-offs

Higher costs: Multiple API calls are made simultaneously
Resource usage: Increased bandwidth and compute utilization
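
To make the mechanics concrete, here is a minimal client-side sketch of the race pattern using simulated providers. It is not Adaptive's implementation: `ProviderCall`, `fakeProvider`, and `raceProviders` are hypothetical names used only for illustration, and the delays stand in for real network calls.

```typescript
// Hypothetical types and helpers for illustration; not part of the Adaptive API.
type ProviderCall = () => Promise<string>;

// Simulates a provider that answers (or fails) after `delayMs`.
function fakeProvider(name: string, delayMs: number, fails = false): ProviderCall {
  return () =>
    new Promise<string>((resolve, reject) => {
      setTimeout(() => {
        if (fails) reject(new Error(`${name} failed`));
        else resolve(name);
      }, delayMs);
    });
}

// Starts every call at once and settles with the first success.
// Rejects only when every provider has failed.
function raceProviders(calls: ProviderCall[]): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (calls.length === 0) {
      reject(new Error("no providers configured"));
      return;
    }
    let failures = 0;
    for (const call of calls) {
      call().then(resolve, () => {
        failures += 1;
        if (failures === calls.length) reject(new Error("all_providers_failed"));
      });
    }
  });
}
```

Note that every call is issued even though only one response is used, which is why cost in this mode scales with the number of providers.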

Sequential Mode (Cost-Effective)

Try providers one after another until one succeeds:
const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Hello!" }],
  fallback: {
    mode: "sequential" // Try providers one by one
  }
});

Benefits

Lower costs: Only pay for successful requests
Predictable: Clear understanding of provider order and costs

Trade-offs

Higher latency: Additional delay when primary provider fails
Sequential delays: Each failed attempt adds to total response time
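
The sequential trade-off can be sketched in a few lines. This is an illustrative client-side loop, not Adaptive's implementation; `Attempt`, `SequentialResult`, and `trySequentially` are hypothetical names.

```typescript
// Hypothetical types for illustration; not part of the Adaptive API.
type Attempt = { name: string; call: () => Promise<string> };

interface SequentialResult {
  provider: string; // provider that produced the response
  response: string;
  attempts: number; // how many providers were tried, including the winner
}

// Tries providers in order and returns on the first success.
async function trySequentially(providers: Attempt[]): Promise<SequentialResult> {
  let attempts = 0;
  for (const { name, call } of providers) {
    attempts += 1;
    try {
      const response = await call();
      return { provider: name, response, attempts };
    } catch {
      // Fall through to the next provider; each failure adds latency.
    }
  }
  throw new Error("all_providers_failed");
}
```

Because a later provider is only called after the earlier ones have failed, the healthy path costs a single call, while the failure path pays for each attempt in time.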

Disabled (Default)

Omit the fallback field to leave fallback disabled, which keeps costs down:
const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Hello!" }]
  // No fallback configuration = disabled
});

Circuit Breaker Patterns

Automatic Circuit Breaking

Adaptive implements intelligent circuit breakers to prevent cascading failures:

Failure Threshold

5 failures within 60 seconds trigger circuit breaker activation

Recovery Time

30-second cooldown before attempting to use the provider again

Health Checks

Continuous monitoring to detect when providers recover

Circuit Breaker States

State: Closed (all requests flow through normally)
Condition: Provider is healthy and responding successfully
Behavior: No restrictions on request routing
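
As a rough illustration of how such a breaker can work, the sketch below uses the thresholds quoted above (5 failures in 60 seconds, 30-second cooldown). The class and method names are hypothetical, and the open and half-open states follow the standard circuit breaker pattern, which is an assumption here rather than documented Adaptive behavior. Timestamps are passed in explicitly so the logic is easy to test.

```typescript
// Illustrative sketch only; class and method names are hypothetical.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failureTimes: number[] = [];
  private openedAt: number | null = null;

  constructor(
    private readonly failureThreshold = 5, // failures that trip the breaker
    private readonly windowMs = 60_000, // window in which failures are counted
    private readonly cooldownMs = 30_000, // wait before probing the provider again
  ) {}

  state(now: number): BreakerState {
    if (this.openedAt === null) return "closed";
    return now - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Requests are allowed unless the breaker is open.
  allowRequest(now: number): boolean {
    return this.state(now) !== "open";
  }

  recordSuccess(): void {
    // A success (including a half-open probe) closes the breaker.
    this.failureTimes = [];
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    if (this.state(now) === "half-open") {
      // Failed probe: reopen and restart the cooldown.
      this.openedAt = now;
      return;
    }
    // Keep only failures inside the counting window, then add this one.
    this.failureTimes = this.failureTimes.filter((t) => now - t < this.windowMs);
    this.failureTimes.push(now);
    if (this.failureTimes.length >= this.failureThreshold) this.openedAt = now;
  }
}
```

A caller would check `allowRequest` before routing to a provider and report the outcome with `recordSuccess` or `recordFailure`.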

Reliability Metrics

Uptime: 99.95% across all providers
Failover Speed: <500ms detection and switch time
Recovery Time: <30s for provider re-integration
Success Rate: 99.9% with fallback enabled

Configuration Options

Basic Fallback Configuration

fallback (object): Configuration for provider fallback behavior

Advanced Configuration

const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Critical business request" }],
  fallback: {
    mode: "sequential",
    providers: ["openai", "anthropic", "deepseek"], // Custom provider order
    timeout_ms: 45000, // Extended timeout for critical requests
    max_retries: 3 // Maximum retry attempts per provider
  }
});
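
If you build the fallback object in TypeScript, a small type plus a sanity check can catch mistakes before a request is sent. The field names below mirror the example above; the types, the optional markers, and the validation rules are assumptions made for illustration, not a published schema, and `validateFallback` is a hypothetical helper.

```typescript
// Field names mirror the example above; everything else is an assumption.
type FallbackMode = "race" | "sequential";

interface FallbackConfig {
  mode: FallbackMode;
  providers?: string[]; // custom provider order
  timeout_ms?: number; // request timeout in milliseconds
  max_retries?: number; // retry attempts per provider
}

// Returns a list of problems; an empty list means the config looks sane.
function validateFallback(config: FallbackConfig): string[] {
  const problems: string[] = [];
  if (config.providers !== undefined && config.providers.length === 0) {
    problems.push("providers must not be empty when set");
  }
  if (config.timeout_ms !== undefined && config.timeout_ms <= 0) {
    problems.push("timeout_ms must be positive");
  }
  if (
    config.max_retries !== undefined &&
    (!Number.isInteger(config.max_retries) || config.max_retries < 0)
  ) {
    problems.push("max_retries must be a non-negative integer");
  }
  return problems;
}
```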

Error Handling

Comprehensive Error Management

try {
  const completion = await openai.chat.completions.create({
    model: "",
    messages: [{ role: "user", content: "Hello!" }],
    fallback: {
      mode: "sequential"
    }
  });
  
  // Check which provider was used
  console.log(`Used provider: ${completion.provider}`);
  
} catch (error) {
  if (error.code === 'all_providers_failed') {
    console.error('All configured providers are currently unavailable');
    // Implement your fallback strategy (cached response, error message, etc.)
  } else if (error.code === 'timeout') {
    console.error('Request timed out across all providers');
  } else {
    console.error('Unexpected error:', error.message);
  }
}

Error Codes

all_providers_failed

Description: All configured providers returned errors or are unavailable
Action: Implement application-level fallback (cached responses, error messages)

timeout

Description: Request timed out across all attempted providers
Action: Consider increasing timeout_ms or checking network connectivity

rate_limit_exceeded

Description: Rate limits hit across all providers simultaneously
Action: Implement request queuing or backoff strategies

insufficient_quota

Description: Credit/quota exhausted across all providers
Action: Check billing and quota limits on provider accounts
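
One way to keep these codes and actions together in application code is a lookup table. The codes and action text come from the list above; `ResiliencyErrorCode`, `RECOMMENDED_ACTIONS`, and `recommendedAction` are hypothetical names, and the sketch assumes the error object exposes a string `code` property as in the earlier example.

```typescript
// Codes and actions are taken from the list above; all names are hypothetical.
type ResiliencyErrorCode =
  | "all_providers_failed"
  | "timeout"
  | "rate_limit_exceeded"
  | "insufficient_quota";

const RECOMMENDED_ACTIONS: Record<ResiliencyErrorCode, string> = {
  all_providers_failed: "Implement application-level fallback (cached responses, error messages)",
  timeout: "Consider increasing timeout_ms or checking network connectivity",
  rate_limit_exceeded: "Implement request queuing or backoff strategies",
  insufficient_quota: "Check billing and quota limits on provider accounts",
};

const UNKNOWN_ACTION = "Log the error and surface a generic failure message";

// Maps a caught error to the recommended action, tolerating unknown shapes.
function recommendedAction(error: unknown): string {
  const code =
    typeof error === "object" && error !== null
      ? (error as { code?: unknown }).code
      : undefined;
  if (
    typeof code === "string" &&
    Object.prototype.hasOwnProperty.call(RECOMMENDED_ACTIONS, code)
  ) {
    return RECOMMENDED_ACTIONS[code as ResiliencyErrorCode];
  }
  return UNKNOWN_ACTION;
}
```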

Monitoring and Observability

Real-time Metrics

Track resiliency performance in your Adaptive dashboard:

Provider Health

Real-time status: Availability, response times, and error rates for each provider

Failover Events

Event tracking: When, why, and how often failovers occur

Circuit Breaker Status

State monitoring: Current state and history of circuit breakers

Success Rates

Reliability metrics: Success rates with and without fallback enabled

Alerts and Notifications

1. Provider Outages: Automatic alerts when providers go down or experience degraded performance
2. Failover Events: Notifications when automatic failover is triggered for your requests
3. Recovery Events: Updates when providers recover and are re-integrated into rotation
4. Quota Warnings: Proactive alerts before hitting rate limits or quota exhaustion

Best Practices

When to Enable Fallback

Critical Applications

High-availability needs: Customer-facing applications, real-time systems

Production Workloads

Business-critical: Revenue-generating applications, SLA requirements

Batch Processing

Large-scale operations: Long-running jobs that can’t afford to fail

Emergency Systems

Zero-downtime requirements: Safety-critical or emergency response systems

When to Keep Disabled

Cost-Sensitive Apps

Budget constraints: Development environments, cost-optimized applications

Non-Critical Workloads

Testing environments: Experimental features, internal tools

Batch Jobs

Delay-tolerant: Operations that can retry later without business impact

Development

Local development: Testing and debugging scenarios

Performance Impact

Race Mode Performance

Latency

Best case: 50ms faster than single provider
Worst case: Same as the slowest provider (when every faster provider fails)

Cost

Typical: 2-3x single provider cost
Maximum: N providers × base cost

Reliability

Failure rate: Decreases exponentially with the number of providers, assuming their failures are independent
Uptime: 99.99%+ effective availability

Sequential Mode Performance

Latency

Best case: Same as single provider
Worst case: Sum of all timeouts

Cost

Typical: 1.1-1.3x single provider
Maximum: Same as race mode on full failures

Reliability

Failure rate: Significantly decreased
Uptime: 99.9%+ effective availability
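
The failure-rate and cost figures for both modes can be sanity-checked with simple arithmetic. The sketch below assumes every provider fails independently with the same probability `p` and that each call costs the same; both are simplifying assumptions (real outages can be correlated), and the function names are hypothetical.

```typescript
// Assumes independent failures with the same probability `p` per provider
// and equal per-call cost. Both are simplifying assumptions.

// Probability that every one of `n` providers fails.
function allFailProbability(p: number, n: number): number {
  return Math.pow(p, n);
}

// Expected number of calls in sequential mode: provider k is called only
// if the providers before it failed, so the sum is 1 + p + ... + p^(n-1).
function expectedSequentialCalls(p: number, n: number): number {
  let total = 0;
  for (let k = 0; k < n; k++) total += Math.pow(p, k);
  return total;
}

// Race mode always calls all `n` providers.
function raceCalls(n: number): number {
  return n;
}
```

For example, with `p = 0.1` and three providers, the chance that all three fail is about 0.001, sequential mode averages about 1.11 calls per request, and race mode always makes 3 calls. Under these assumptions the results line up with the cost multipliers listed for each mode.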
