Automatic: Both semantic and provider caching work without configuration for most applications.
How It Works
Adaptive checks caches in order of speed:- Provider Cache: Exact matches at provider infrastructure (microseconds)
- Semantic Cache: AI-powered similar meaning matches (<100ms)
- Fresh Request: Standard API call when no cache hit
Key Benefits
Speed
Microseconds - 100ms across cache levels
Savings
60-95% cost reduction
Intelligence
Exact + semantic matching
Integration
Provider native caching
Real-World Examples
Customer Support: “How do I reset my password?” matches “I forgot my password”, “Password reset steps”, “Help me recover my account” Technical Docs: “How to implement JWT authentication” matches “JWT auth implementation guide”, “Setting up JSON Web Token auth” FAQ Systems: “What are your business hours?” matches “When are you open?”, “Office hours information”Cache Priority
- Provider Cache: Exact matches at provider infrastructure
- Semantic Cache: AI-powered similar meaning matches
- Fresh Request: Standard API call when no cache hit
Provider Prompt Caching
Adaptive automatically leverages provider-level caching when available. Key providers:| Provider | Cache Reads | Configuration | Notes |
|---|---|---|---|
| OpenAI | 0.25x-0.50x cost | Automatic | 1024+ tokens minimum |
| Anthropic | 0.1x cost | cache_control required | 4 breakpoints max, 5min TTL |
| Google Gemini | 0.25x cost | Automatic or explicit | 1028-2048 tokens minimum |
| DeepSeek | 0.1x cost | Automatic | - |
| Grok | 0.25x cost | Automatic | - |
| Groq | Reduced rate | Automatic | Kimi K2 models |
Provider Cache Examples
Provider vs Semantic Cache
| Type | Matching | Speed | Savings |
|---|---|---|---|
| Provider | Exact prompts only | Microseconds | 90-95% |
| Semantic | Similar meaning | <100ms | 60-90% |
| Combined | Best available | Automatic | Maximum |
Monitoring Provider Cache Hits
Provider cache hits are indicated differently than semantic cache hits:Provider Cache Best Practices
1
Use Consistent System Prompts
Keep system messages identical across similar requests for provider cache hits
2
Maintain Conversation Context
Provider caches work best with ongoing conversations using the same context
3
Monitor Response Times
Very fast responses (<50ms) often indicate provider cache hits
4
Combine with Semantic Cache
Let Adaptive automatically choose the best caching strategy
SDK Setup
Configuration
Default: Semantic cache enabled automatically - no configuration needed. Custom Threshold:Custom Threshold Settings
Configuration Parameters
enabled: Enable/disable semantic caching (default: true)threshold: Similarity threshold 0.0-1.0 (default: 0.85)
Threshold Guide
- 0.7-0.8: Loose matching for FAQ/customer support (high hit rate)
- 0.8-0.9: Balanced (default) for most applications
- 0.9+: Strict matching for technical/legal content (high precision)
Cache Performance Tracking
Every response includes cache information:Cache Tier Values
semantic_exact
Perfect semantic match: 90%+ savings, <50ms response
semantic_similar
Similar semantic match: 60-90% savings, <100ms response
provider_cache
Provider infrastructure cache: Microsecond response, high savings
none
No cache hit: Standard latency and costs
Performance Characteristics
Performance: Microseconds-100ms latency, 60-80% hit rate, 60-95% cost savingsUse Cases
Perfect for: Customer support, FAQ systems, documentation search, educational content Avoid for: Real-time data, personalized content, time-sensitive informationBest Practices
- Use default settings (0.85 threshold) for most applications
- Monitor
cache_tierin responses to track effectiveness - Lower thresholds for FAQ systems, higher for technical content
- Group similar content to maximize cache hits
Error Handling
Cache failures automatically fallback to fresh API calls. No interruption to your requests.Monitoring
Track cache performance in your Adaptive dashboard: hit rates, cost savings, and response times.Technical Details
- Multi-layer: Provider infrastructure + AI semantic caching
- Embeddings: Sentence transformers for semantic matching
- Similarity: Cosine similarity with configurable thresholds
- Storage: Dual systems with automatic fallback
Enhanced Cache Performance Tracking
Monitoring Cache Effectiveness
Check cache performance in API responses:Cache Tier Values
"semantic_exact": Perfect semantic match (90%+ savings, <50ms)"semantic_similar": Similar semantic match (60-90% savings, <100ms)"provider_cache": Provider infrastructure cache (microseconds, high savings)"none": No cache hit (standard costs)
Tracking Cache Hit Rates
Calculate cache effectiveness:Streaming with Cache Data
Cache information is available in streaming responses:Production Cache Monitoring
Simple cache tracking for production:Cost Savings Calculator
Next Steps
- Performance Overview - All optimization features
- Intelligent Routing - Routing + caching



