Cost Consideration: Fallback is disabled by default to control costs. Enable it when you need maximum reliability and can accept potentially higher costs from multiple provider calls.
How Resiliency Works
1. Health Monitoring
Continuous monitoring of provider availability, response times, and error rates
2. Failure Detection
Instant detection of timeouts, rate limits, service errors, and degraded performance
3. Automatic Failover
Seamless switching to backup providers based on your configured fallback strategy
4. Recovery Tracking
Automatic re-integration of recovered providers back into the rotation
Failover Strategies
Race Mode (Fastest, Higher Cost)
Send requests to multiple providers simultaneously and use the first successful response.
Benefits
Ultra-low latency: Get responses from the fastest available provider
Maximum reliability: Multiple providers increase success probability
Trade-offs
Higher costs: Multiple API calls are made simultaneously
Resource usage: Increased bandwidth and compute utilization
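The race behavior described above can be sketched client-side with asyncio. This is a minimal illustration of the pattern, not Adaptive's implementation: the `Provider` type, the `race` function, and the demo providers are all hypothetical names introduced here.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical provider type: an async callable that takes a prompt and
# returns text, raising an exception on failure.
Provider = Callable[[str], Awaitable[str]]


async def race(providers: Sequence[Provider], prompt: str) -> str:
    """Call every provider at once and return the first successful result."""
    pending = {asyncio.ensure_future(p(prompt)) for p in providers}
    errors: List[BaseException] = []
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                errors.append(task.exception())
        # Every provider finished and none succeeded.
        raise RuntimeError(f"all_providers_failed: {errors!r}")
    finally:
        # Cancel the slower calls so they stop consuming resources.
        for task in pending:
            task.cancel()


async def demo() -> str:
    async def slow(prompt: str) -> str:
        await asyncio.sleep(0.2)
        return "slow:" + prompt

    async def fast(prompt: str) -> str:
        await asyncio.sleep(0.01)
        return "fast:" + prompt

    async def broken(prompt: str) -> str:
        raise ConnectionError("provider down")

    return await race([slow, broken, fast], "hi")
```

Running `asyncio.run(demo())` returns the result from `fast`, because the failed provider is skipped and the slow one is cancelled. Cancelling a call locally does not guarantee that the provider will not bill for it, which is one reason race mode costs more.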
Sequential Mode (Cost-Effective)
Try providers one after another until one succeeds.
Benefits
Lower costs: Only pay for successful requests
Predictable: Clear understanding of provider order and costs
Trade-offs
Higher latency: Additional delay when primary provider fails
Sequential delays: Each failed attempt adds to total response time
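Sequential fallback amounts to a loop over an ordered provider list. The sketch below assumes synchronous providers that raise on failure; `call_sequentially` and the `(name, callable)` pair shape are hypothetical, chosen only to illustrate the pattern.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider type: takes a prompt, returns text, raises on failure.
Provider = Callable[[str], str]


def call_sequentially(
    providers: Sequence[Tuple[str, Provider]], prompt: str
) -> Tuple[str, str, List[str]]:
    """Try providers in order.

    Returns (name of the provider that answered, its response,
    names of the providers that failed before it).
    """
    failed: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(prompt), failed
        except Exception:
            # Record the failure and move on to the next provider in line.
            failed.append(name)
    raise RuntimeError("all_providers_failed: " + ", ".join(failed))
```

Because each attempt starts only after the previous one has failed, the order of the list determines both the cost and the added latency of a failover.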
Disabled (Default)
Fallback is disabled by default to control costs.
Circuit Breaker Patterns
Automatic Circuit Breaking
Adaptive implements intelligent circuit breakers to prevent cascading failures:
Failure Threshold
5 failures within 60 seconds trigger circuit breaker activation
Recovery Time
30-second cooldown before attempting to use the provider again
Health Checks
Continuous monitoring to detect when providers recover
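To make the numbers above concrete, here is a minimal sliding-window circuit breaker that uses the stated threshold (5 failures in 60 seconds) and cooldown (30 seconds). It is a sketch of the general pattern, not Adaptive's internal code; the class and method names are assumptions, and the injectable `clock` exists only so the behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    """Opens after `threshold` failures within `window_s` seconds."""

    def __init__(
        self,
        threshold: int = 5,
        window_s: float = 60.0,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures: Deque[float] = deque()
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if the provider may be tried right now."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, let requests through again as probes.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        """A success closes the breaker and clears the failure history."""
        self.failures.clear()
        self.opened_at = None

    def record_failure(self) -> None:
        now = self.clock()
        self.failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self.opened_at = now
```

A router would call `allow_request()` before picking a provider and report each outcome with `record_success()` or `record_failure()`. A failed probe after the cooldown reopens the breaker for another cooldown period.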
Circuit Breaker States
State: All requests flow through normally
Condition: Provider is healthy and responding successfully
Behavior: No restrictions on request routing
Reliability Metrics
Uptime: 99.95% (across all providers)
Failover Speed: <500ms (detection and switch time)
Recovery Time: <30s (provider re-integration)
Success Rate: 99.9% (with fallback enabled)
Configuration Options
Basic Fallback Configuration
Configuration for provider fallback behavior
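As an illustration only, a fallback configuration could be modeled as a small validated object. Everything in this sketch is an assumption made for the example: the `fallback` key, the `enabled` and `mode` field names, and the 30000 ms default are placeholders, and only `timeout_ms` is named elsewhere on this page. Check the Adaptive API reference for the actual request shape.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict

# The two fallback strategies described on this page.
VALID_MODES = ("race", "sequential")


@dataclass(frozen=True)
class FallbackConfig:
    """Hypothetical fallback settings; field names are illustrative."""

    enabled: bool = False      # fallback is disabled by default
    mode: str = "sequential"   # "race" or "sequential"
    timeout_ms: int = 30_000   # placeholder value, not a documented default

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}")
        if self.timeout_ms <= 0:
            raise ValueError("timeout_ms must be positive")

    def to_request_fragment(self) -> Dict[str, Any]:
        """Return a dict that could be merged into a request body."""
        return {"fallback": asdict(self)}
```

Validating the mode and timeout on the client catches typos such as `mode="parallel"` before a request is sent.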
Advanced Configuration
Error Handling
Comprehensive Error Management
Error Codes
all_providers_failed
Description: All configured providers returned errors or are unavailable
Action: Implement application-level fallback (cached responses, error messages)
timeout
Description: Request timed out across all attempted providers
Action: Consider increasing timeout_ms or checking network connectivity
rate_limit_exceeded
Description: Rate limits hit across all providers simultaneously
Action: Implement request queuing or backoff strategies
insufficient_quota
Description: Credit/quota exhausted across all providers
Action: Check billing and quota limits on provider accounts
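The recommended actions above can be combined into one application-level wrapper. The sketch assumes a hypothetical `AdaptiveError` exception that carries the error code as a string; how the real client surfaces these codes may differ. Retryable codes get exponential backoff with jitter, `all_providers_failed` falls back to a cached response when one is available, and `insufficient_quota` is re-raised immediately because retrying cannot fix it.

```python
import random
import time
from typing import Callable, Optional


class AdaptiveError(Exception):
    """Hypothetical exception carrying one of the error codes listed above."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


# Codes for which waiting and retrying is a reasonable response.
RETRYABLE = {"rate_limit_exceeded", "timeout"}


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """Exponential backoff with full jitter; `attempt` starts at 0."""
    return random.uniform(0.0, min(cap_s, base_s * 2 ** attempt))


def call_with_handling(
    send: Callable[[], str],
    cached: Optional[str] = None,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send`, retrying or falling back according to the error code."""
    for attempt in range(max_attempts):
        try:
            return send()
        except AdaptiveError as err:
            if err.code in RETRYABLE and attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
                continue
            if err.code == "all_providers_failed" and cached is not None:
                return cached  # application-level fallback
            raise
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so that the retry logic can be exercised without real delays.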
Monitoring and Observability
Real-time Metrics
Track resiliency performance in your Adaptive dashboard:
Provider Health
Real-time status: Availability, response times, and error rates for each provider
Failover Events
Event tracking: When, why, and how often failovers occur
Circuit Breaker Status
State monitoring: Current state and history of circuit breakers
Success Rates
Reliability metrics: Success rates with and without fallback enabled
Alerts and Notifications
1. Provider Outages
Automatic alerts when providers go down or experience degraded performance
2. Failover Events
Notifications when automatic failover is triggered for your requests
3. Recovery Events
Updates when providers recover and are re-integrated into rotation
4. Quota Warnings
Proactive alerts before hitting rate limits or quota exhaustion
Best Practices
When to Enable Fallback
Critical Applications
High-availability needs: Customer-facing applications, real-time systems
Production Workloads
Business-critical: Revenue-generating applications, SLA requirements
Batch Processing
Large-scale operations: Long-running jobs that can’t afford to fail
Emergency Systems
Zero-downtime requirements: Safety-critical or emergency response systems
When to Keep Disabled
Cost-Sensitive Apps
Budget constraints: Development environments, cost-optimized applications
Non-Critical Workloads
Testing environments: Experimental features, internal tools
Batch Jobs
Delay-tolerant: Operations that can retry later without business impact
Development
Local development: Testing and debugging scenarios
Performance Impact
Race Mode Performance
Latency
Best case: 50ms faster than single provider
Worst case: Same as slowest provider
Cost
Typical: 2-3x single provider cost
Maximum: N providers × base cost
Reliability
Failure rate: Exponentially decreased
Uptime: 99.99%+ effective availability
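The "exponentially decreased" failure rate follows from simple probability, provided provider failures are independent (an assumption that correlated outages can break). If provider i is available with probability a_i, a request fails only when every provider fails, so the combined availability is 1 minus the product of the individual failure probabilities. The helper name below is illustrative.

```python
from math import prod
from typing import Sequence


def combined_availability(availabilities: Sequence[float]) -> float:
    """Probability that at least one provider succeeds.

    Assumes provider failures are statistically independent.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


# Two providers at 99% each: the failure probability drops from 1% to 0.01%.
print(round(combined_availability([0.99, 0.99]), 6))  # 0.9999
```

Each additional independent provider multiplies the failure probability by that provider's own failure rate, which is where the exponential improvement comes from.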
Sequential Mode Performance
Latency
Best case: Same as single provider
Worst case: Sum of all timeouts
Cost
Typical: 1.1-1.3x single provider
Maximum: Same as race mode on full failures
Reliability
Failure rate: Significantly decreased
Uptime: 99.9%+ effective availability
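The sequential-mode figures above can be estimated with two small helpers. Both are illustrative sketches with hypothetical names, and `expected_attempts` assumes that provider failures are independent.

```python
from typing import Sequence


def worst_case_latency_ms(timeouts_ms: Sequence[int]) -> int:
    """Every provider times out in turn, so the delays add up."""
    return sum(timeouts_ms)


def expected_attempts(failure_probs: Sequence[float]) -> float:
    """Expected number of providers tried per request.

    Provider k is tried only if providers 0..k-1 all failed, so its
    contribution is the product of the earlier failure probabilities.
    """
    expected, p_reached = 0.0, 1.0
    for p in failure_probs:
        expected += p_reached
        p_reached *= p
    return expected
```

For example, three providers with timeouts of 5000, 5000, and 10000 ms give a worst-case latency of 20000 ms, which is why reducing timeout_ms shortens failover. With a 10% failure rate per provider, the expected number of attempts is 1 + 0.1 + 0.01 = 1.11.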
Troubleshooting
Common Issues
All Providers Failing
Symptoms: Consistent all_providers_failed errors
Possible Causes:
- Network connectivity issues
- API key problems across multiple providers
- Widespread provider outages
- Request format issues
Solutions:
- Check network connectivity and DNS resolution
- Verify API keys and quotas for all providers
- Check provider status pages for outages
- Review request format and parameters
High Latency with Sequential Mode
Symptoms: Slow responses when fallback is enabled
Possible Causes:
- Primary provider consistently failing
- Long timeout values
- Network latency to backup providers
Solutions:
- Review provider health metrics
- Reduce timeout_ms for faster failover
- Consider switching to race mode for critical requests
- Check provider selection order
Unexpected Costs
Symptoms: Higher than expected API costs
Possible Causes:
- Race mode calling multiple providers
- Frequent failovers due to provider issues
- Misconfigured fallback settings
Solutions:
- Review fallback mode configuration
- Monitor provider health to identify problematic providers
- Consider sequential mode for cost optimization
- Set appropriate timeout values