💡 Quick Start: Same as the OpenAI API, but use `model: ""` for intelligent routing and automatic cost savings.
30-Second Setup
1. Authentication: Use your Adaptive API key (either header works)

```
X-Stainless-API-Key: your-adaptive-api-key
# OR
Authorization: Bearer your-adaptive-api-key
```
2. Model Selection: Leave the model empty for smart routing

```json
{
  "model": "",        // ← This enables intelligent routing
  "messages": [ ... ]
}
```
That’s it! Your requests automatically save 60-90% while maintaining quality.
Essential Parameters
`model` (string)
For intelligent routing: use `""` (empty string) to automatically select the best model for cost and quality.
For specific models: use the provider:model format, like "anthropic:claude-sonnet-4-5" or "openai:gpt-5-mini".
`messages` (array)
Array of message objects. Same format as OpenAI:

```json
[
  { "role": "system", "content": "You are a helpful assistant" },
  { "role": "user", "content": "Hello!" }
]
```

Roles: system, user, assistant, tool
`temperature` (number)
Creativity level: 0 = focused, 1 = balanced, 2 = creative. Default: 1

`max_completion_tokens` (integer)
Maximum response length in tokens. Leave unset for automatic sizing.

`stream` (boolean)
Enable streaming responses. Default: false
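Putting the essential parameters together, a minimal request looks like this (a sketch using the client setup from the quick start):

```javascript
const completion = await openai.chat.completions.create({
  model: '', // empty string = intelligent routing
  messages: [
    { role: 'system', content: 'You are a helpful assistant' },
    { role: 'user', content: 'Hello!' }
  ],
  temperature: 1, // 0 = focused, 2 = creative
  stream: false   // set true for streaming
});
```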
Smart Routing & Cost Control
`model_router` (object)
Control intelligent routing to optimize cost and performance:

```javascript
// Prefer cost savings (80% cheaper on average)
model_router: { cost_bias: 0.1 }

// Balanced cost and quality
model_router: { cost_bias: 0.5 }

// Prefer best performance
model_router: { cost_bias: 0.9 }

// Limit to specific providers
model_router: {
  models: [
    "openai:gpt-5-mini",
    "anthropic:claude-sonnet-4-5"
  ]
}
```
`model_router.cost_bias` (number)
Balance cost vs. performance: 0 = cheapest, 1 = best quality. Default: 0.5

`model_router.models` (array)
Allowed providers/models. Examples:
- "openai:gpt-5-mini" - specific model
- "anthropic:claude-sonnet-4-5" - specific model
- "gemini:gemini-2.5-flash-lite" - specific model

`model_router.complexity_threshold` (number)
Override automatic complexity detection (0-1)

`model_router.token_threshold` (integer)
Override the automatic token-count threshold
`fallback` (object)
Provider backup when the primary fails:

```javascript
// Try providers one by one (cheaper)
fallback: { mode: "sequential" }

// Try multiple providers at once (faster)
fallback: { mode: "race" }
```

`fallback.mode`: "sequential" (cheaper) or "race" (faster)
📋 All Standard OpenAI Parameters
Core Parameters

`max_tokens`: Deprecated. Maximum number of tokens to generate; use `max_completion_tokens` instead.
`max_completion_tokens`: Maximum number of tokens that can be generated for the completion, including reasoning tokens.
`stream`: Whether to stream the response. Default: false
`top_p`: Nucleus sampling parameter between 0 and 1. Default: 1
`frequency_penalty`: Penalty for token frequency. Range: -2.0 to 2.0. Default: 0
`presence_penalty`: Penalty for token presence. Range: -2.0 to 2.0. Default: 0
`n`: Number of chat completion choices to generate. Default: 1
`seed`: Seed for deterministic sampling. Helps ensure reproducible results.
`stop`: Up to 4 sequences where the API will stop generating tokens.
`user`: Unique identifier for the end user, to help detect abuse and improve caching.
Advanced Parameters

`logprobs`: Whether to return log probabilities of output tokens. Default: false
`top_logprobs`: Number of most likely tokens to return at each position (0-20). Requires logprobs: true.
`logit_bias`: Modify the likelihood of specified tokens. Maps token IDs to bias values (-100 to 100).
`response_format`: Format for model output. Supports JSON schema for structured outputs (see the sketch after this list).
  - `type`: Either json_object or json_schema
  - `json_schema`: JSON schema definition when using the json_schema type
`service_tier`: Latency tier for processing. Options: auto, default, flex
`store`: Whether to store output for model distillation or evals. Default: false
`metadata`: Set of 16 key-value pairs for storing additional information about the request.
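As a sketch, a structured-output request using json_schema might look like this (the schema name and shape are illustrative):

```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Extract the city from: "I live in Paris."' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'city_extraction', // illustrative schema name
      schema: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city']
      }
    }
  }
});

// The message content is a JSON string matching the schema
const data = JSON.parse(completion.choices[0].message.content);
```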
Audio and Multimodal

`modalities`: Output types to generate. Options: ["text"], ["audio"], or ["text", "audio"]
`audio`: Parameters for audio output when modalities includes "audio".
Reasoning Models (o-series)

`reasoning_effort`: o-series models only. Effort level for reasoning: low, medium, or high
Function Calling

`tools`: Array of tool definitions for function calling. Maximum 128 functions.
  - `type`: Tool type; currently only function is supported
  - `function`: Function definition with name, description, and parameters schema
`tool_choice`: Controls tool usage: none, auto, required, or a specific tool selection
`parallel_tool_calls`: Whether to enable parallel function calling. Default: true
`function_call`: Deprecated. Use tool_choice instead. Controls function-calling behavior.
Web Search

`web_search_options`: Options for web-search tool functionality.

Streaming Options

`stream_options`: Additional options for streaming responses.
  - `include_usage`: Whether to include usage statistics in the streaming response

Prediction and Caching

`prediction`: Static predicted output content for regeneration scenarios.
Adaptive-Specific Parameters

`model_router`: Configuration for intelligent routing and provider selection.

`model_router.models`: Array of model capabilities to consider for routing. Entries can be simplified or detailed.

Simple formats:
- "openai:gpt-5-mini" - use a specific OpenAI model
- "anthropic:claude-sonnet-4-5" - use a specific Anthropic model

Custom models require all parameters (a complete example follows below):
- `provider`: Provider name: "openai", "anthropic", "gemini", "z-ai"
- `model_name`: Specific model identifier (e.g., "gpt-5-mini", "claude-sonnet-4-5"). Required for specific model selection.
- `cost_per_1m_input_tokens`: Cost per 1 million input tokens in USD.
- `cost_per_1m_output_tokens`: Cost per 1 million output tokens in USD.
- `context_length`: Maximum context window size in tokens. Replaces the deprecated max_context_tokens.
- `max_completion_tokens`: Maximum output tokens the model can generate. Replaces the deprecated max_output_tokens.
- `supported_parameters`: List of supported API parameters (e.g., ["temperature", "top_p", "tools"]). Replaces the deprecated supports_tool_calling boolean.
- A human-readable description of the model
- An array of supported language codes
- Model size information (e.g., "7B", "70B")
- Expected latency: "low", "medium", "high"
- `task_type`: Optimal task type: "Open QA", "Closed QA", "Summarization", "Text Generation", "Code Generation", "Chatbot", "Classification", "Rewrite", "Brainstorming", "Extraction", "Other"
- `complexity`: Model complexity tier: "low", "medium", "high"
`model_router.cost_bias`: Bias toward cost optimization. Range: 0.0-1.0, where 0.0 = cheapest and 1.0 = best performance.
`model_router.complexity_threshold`: Threshold for task-complexity routing decisions. Range: 0.0-1.0
`model_router.token_threshold`: Token-count threshold for model selection. Positive integer.
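For instance, a fully specified custom model entry (mirroring the required fields above; the provider and model names are illustrative) looks like this:

```javascript
model_router: {
  models: [{
    provider: "my-provider",        // custom provider name (illustrative)
    model_name: "my-model-v1",      // illustrative
    cost_per_1m_input_tokens: 2.0,
    cost_per_1m_output_tokens: 6.0,
    context_length: 16000,
    max_completion_tokens: 4000,
    supported_parameters: ["temperature", "top_p"],
    task_type: "Text Generation",
    complexity: "medium"
  }]
}
```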
`fallback`: Configuration for provider fallback behavior. Fallback is disabled by default (empty/omitted) and enabled when a mode is specified.
  - `mode`: Fallback strategy: "sequential" or "race".
  - Timeout in milliseconds for fallback operations.
  - Maximum number of retry attempts.
Response

`id`: Unique identifier for the completion
`object`: Object type, always chat.completion
`created`: Unix timestamp of creation
`model`: Model used for the completion
`provider`: Adaptive addition: which provider was selected (e.g., "openai", "anthropic")
`choices`: Array of completion choices
  - `message`: The generated message
    - `role`: Role of the message, always "assistant"
    - `content`: The content of the message
    - `tool_calls`: Tool calls made by the model (if any)
  - `finish_reason`: Reason the completion finished: stop, length, tool_calls, or content_filter
`usage`: Token usage statistics
  - `prompt_tokens`: Number of tokens in the prompt
  - `completion_tokens`: Number of tokens in the completion
  - `cache_tier`: Adaptive addition: cache tier used for this response. Possible values:
    - "semantic_exact" - exact semantic cache match
    - "semantic_similar" - similar semantic cache match
    - Omitted when no cache is used
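Putting those fields together, a typical non-streaming response body looks roughly like this (all values are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gemini-2.5-flash-lite",
  "provider": "gemini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 9,
    "total_tokens": 17
  }
}
```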
Live Examples
💡 Try These Examples: Copy-paste-ready code that works immediately. Each example shows the cost savings in action.
1. Simple Chat → 97% Cost Savings
Cost Comparison: A simple question routes to Gemini Flash

- OpenAI Direct: $3.00 per 1M input tokens
- Adaptive Smart: Gemini Flash ($0.075/1M) + overhead ($0.10/1M input, $0.20/1M output)
- Savings: 97% (total ~$0.10/1M vs $3.00/1M)
JavaScript - Copy & Run

```javascript
const completion = await openai.chat.completions.create({
  model: '', // ← Smart routing enabled
  messages: [
    { role: 'user', content: 'Explain quantum computing simply' }
  ],
});

console.log(completion.choices[0].message.content);
console.log(`Provider used: ${completion.provider}`); // See which provider was chosen
console.log(`Cache tier: ${completion.usage.cache_tier || 'none'}`);
```
2. Complex Analysis → 85% Cost Savings
Cost Comparison: A complex task routes to DeepSeek Reasoner

- OpenAI Direct: $15.00 per 1M input tokens
- Adaptive Smart: DeepSeek ($1.00/1M) + overhead ($0.10/1M input, $0.20/1M output)
- Savings: 85% (total ~$1.30/1M vs $15.00/1M)
JavaScript - Advanced Prompt

```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: 'Analyze the economic implications of quantum computing on cryptocurrency security, considering both short-term disruptions and long-term adaptations. Include specific recommendations for blockchain protocols.'
    }
  ],
});

// Complex prompts automatically route to premium models when needed
console.log(`Routed to: ${completion.provider}`); // Likely Claude or DeepSeek
```
With Intelligent Routing Configuration
```javascript
// Simple provider selection
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list' }
  ],
  model_router: {
    models: [
      "anthropic:claude-sonnet-4-5", // Premium Anthropic option
      "openai:gpt-5-mini"            // Specific OpenAI model
    ],
    cost_bias: 0.2, // Prefer cost savings
    complexity_threshold: 0.3,
    token_threshold: 1000
  },
  fallback: {
    mode: 'sequential' // Enabled by specifying a mode
  }
});
```
Customizing Standard Providers
You can also customize standard providers (OpenAI, Anthropic, etc.) with custom base URLs, API keys, and settings:
```javascript
// Override standard provider configuration
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Hello from custom OpenAI endpoint!' }
  ],
  model_router: {
    models: [
      "openai:gpt-5-mini",          // Will use the custom config below
      "anthropic:claude-sonnet-4-5" // Will also use the custom config
    ]
  },
  // Custom configurations for standard providers
  provider_configs: {
    "openai": {
      base_url: "https://my-custom-openai-proxy.com/v1",
      api_key: "sk-my-custom-openai-key",
      timeout_ms: 60000,
      headers: {
        "X-Proxy-Key": "proxy-auth-123"
      }
    },
    "anthropic": {
      base_url: "https://my-anthropic-proxy.com/v1",
      api_key: "sk-ant-custom-key",
      timeout_ms: 45000
    }
  }
});
```
Streaming Response
```javascript
const stream = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Tell me a story about space exploration' }
  ],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
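If you also want token usage for a streamed request, you can set `stream_options.include_usage`; under the standard OpenAI streaming contract the usage arrives on a final chunk, so a sketch looks like:

```javascript
const stream = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Summarize the moon landing in one line' }],
  stream: true,
  stream_options: { include_usage: true },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
  if (chunk.usage) {
    // The final chunk carries usage when include_usage is set
    console.log(`\nTokens used: ${chunk.usage.total_tokens}`);
  }
}
```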
Function Calling
```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: "What's the weather like in San Francisco?" }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: {
              type: 'string',
              description: 'City and state, e.g. San Francisco, CA'
            }
          },
          required: ['location']
        }
      }
    }
  ]
});
```
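When the model decides to call the tool, the response carries tool_calls instead of text. A minimal handling sketch following the standard OpenAI tool loop (getWeather here is a hypothetical helper you implement yourself):

```javascript
const message = completion.choices[0].message;

if (message.tool_calls) {
  const toolCall = message.tool_calls[0];
  const args = JSON.parse(toolCall.function.arguments);

  // Run your own implementation of the tool (hypothetical helper)
  const weather = await getWeather(args.location);

  // Send the result back so the model can produce a final answer
  const followUp = await openai.chat.completions.create({
    model: '',
    messages: [
      { role: 'user', content: "What's the weather like in San Francisco?" },
      message, // assistant message containing the tool call
      {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(weather),
      },
    ],
  });

  console.log(followUp.choices[0].message.content);
}
```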
Vision (Multimodal)
```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: "What's in this image?" },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg'
          }
        }
      ]
    }
  ],
  modalities: ['text'] // Can also include 'audio' for supported models
});
```
Advanced Configuration with All Parameters
```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain machine learning concepts' }
  ],

  // Core parameters
  temperature: 0.7,
  max_completion_tokens: 1000,
  top_p: 0.9,
  frequency_penalty: 0.1,
  presence_penalty: 0.1,
  n: 1,
  seed: 12345,
  stop: ['\n\n'],
  user: 'user-123',

  // Advanced parameters
  logprobs: true,
  top_logprobs: 5,
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'explanation',
      schema: {
        type: 'object',
        properties: {
          concept: { type: 'string' },
          explanation: { type: 'string' }
        }
      }
    }
  },
  service_tier: 'auto',
  store: false,
  metadata: {
    session_id: 'abc123',
    user_type: 'premium'
  },

  // Reasoning models (o-series)
  reasoning_effort: 'medium',

  // Function calling
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge',
        description: 'Search the knowledge base for information',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string' }
          }
        }
      }
    }
  ],
  tool_choice: 'auto',
  parallel_tool_calls: true,

  // Streaming
  stream: false,
  stream_options: {
    include_usage: true
  },

  // Adaptive-specific
  model_router: {
    models: [
      "openai:gpt-5-mini",          // Use the OpenAI gpt-5 family
      "anthropic:claude-sonnet-4-5" // Specific model
    ],
    cost_bias: 0.3,
    complexity_threshold: 0.5,
    token_threshold: 2000
  },
  fallback: {
    mode: 'sequential' // Enabled by specifying a mode
  }
});
```
Response Examples
Cache Tier Tracking
The `usage.cache_tier` field shows which cache served your response:
```json
// Semantic cache hit
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18,
    "cache_tier": "semantic_exact"
  }
}

// No cache used
{
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 10,
    "total_tokens": 18
    // cache_tier omitted
  }
}
```
Error Handling & Troubleshooting
🛠️ Quick Fix Guide : Most issues have simple solutions. Here’s how to resolve them fast.
⚡ Instant Fixes
🔑 Authentication Error (401)
Problem: {"error": {"message": "Invalid API key", "type": "authentication_error"}}

Instant Solutions:
1. Check the header format:

```javascript
// ✅ Correct
headers: { "X-Stainless-API-Key": "your-adaptive-key" }
// OR
headers: { "Authorization": "Bearer your-adaptive-key" }

// ❌ Wrong
headers: { "X-API-Key": "your-key" } // Wrong header name
```

2. Verify your key: copy-paste it from the llmadaptive.uk dashboard

3. Check environment variables:

```bash
echo $ADAPTIVE_API_KEY # Should print your key
```
Working Example:

```javascript
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY, // ← Make sure this is set
  baseURL: 'https://api.llmadaptive.uk/v1'
});
```
📝 Invalid Request Error (400)

Problem: {"error": {"message": "Invalid request", "type": "invalid_request_error"}}

Common Causes & Fixes:
1. Empty messages array:

```javascript
// ❌ Wrong
messages: []

// ✅ Correct
messages: [{ role: "user", content: "Hello!" }]
```

2. Missing required fields:

```javascript
// ❌ Wrong
{ role: "user" } // Missing content

// ✅ Correct
{ role: "user", content: "Your message here" }
```

3. Invalid model_router config:

```javascript
// ❌ Wrong - model missing required fields
model_router: {
  models: [{ provider: "unknown-provider" }] // Provider not supported
}

// ✅ Correct - use supported providers
model_router: {
  models: ["openai:gpt-5-mini", "anthropic:claude-sonnet-4-5"]
}
```
⏱️ Rate Limit Error (429)

Problem: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}

Immediate Actions:

1. Wait and retry: rate limits reset every minute

2. Implement exponential backoff:
```javascript
async function callWithRetry(requestFn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
      } else {
        throw error;
      }
    }
  }
}
```
3. Upgrade your plan at llmadaptive.uk for higher limits

4. Use caching to reduce requests:

```javascript
semantic_cache: { enabled: true }
```
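The retry helper from step 2 can then wrap any call, for example:

```javascript
// Retries automatically on 429 responses
const completion = await callWithRetry(() =>
  openai.chat.completions.create({
    model: '',
    messages: [{ role: 'user', content: 'Hello!' }],
  })
);
```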
🔌 Custom Provider Issues

Problem: Custom provider not working or failing

Checklist:

1. The provider configuration must be complete:
```javascript
provider_configs: {
  "my-provider": {
    base_url: "https://api.example.com/v1", // ✅ Required
    api_key: "sk-your-key",                 // ✅ Required
    auth_type: "bearer",                    // ✅ Good practice
    timeout_ms: 30000                       // ✅ Recommended
  }
}
```
2. The model definition must include all required fields:

```javascript
model_router: {
  models: [{
    provider: "my-provider",
    model_name: "model-name",        // ✅ Required
    cost_per_1m_input_tokens: 2.0,   // ✅ Required
    cost_per_1m_output_tokens: 6.0,  // ✅ Required
    context_length: 16000,           // ✅ Required
    max_completion_tokens: 4000,     // ✅ Required
    supported_parameters: ["temperature", "top_p"], // ✅ Required
    task_type: "Text Generation",    // ✅ Required
    complexity: "medium"             // ✅ Required
  }]
}
```
3. Test the provider directly first:

```bash
curl https://api.your-provider.com/v1/chat/completions \
  -H "Authorization: Bearer your-key" \
  -d '{"model": "model-name", "messages": [...]}'
```
Standard error object format:

```json
{
  "error": {
    "message": "Human-readable error description",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```

`message`: Clear description of what went wrong
`type`: Error category: invalid_request_error, authentication_error, permission_error, rate_limit_error, server_error
`code`: Specific error code for programmatic handling
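Because type and code are stable identifiers, you can branch on them programmatically. A sketch, assuming your client surfaces the parsed error body (as the OpenAI SDK's error classes do):

```javascript
try {
  await openai.chat.completions.create({
    model: '',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
} catch (error) {
  switch (error.type) {
    case 'authentication_error':
      console.error('Check your X-Stainless-API-Key or Authorization header');
      break;
    case 'rate_limit_error':
      console.error('Back off and retry; limits reset every minute');
      break;
    case 'invalid_request_error':
      console.error(`Bad request: ${error.message}`);
      break;
    default:
      throw error; // permission_error, server_error, network issues, etc.
  }
}
```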
🚨 Emergency Troubleshooting
Rate Limits
| Plan       | Requests per Minute | Tokens per Minute |
|------------|---------------------|-------------------|
| Free       | 100                 | 10,000            |
| Pro        | 1,000               | 100,000           |
| Enterprise | Custom              | Custom            |
Rate limits are applied per API key and reset every minute.
Best Practices

- Model Selection: use an empty string "" for model to enable intelligent routing and cost savings
- Cost Control: use the cost_bias parameter to balance cost vs. performance for your use case
- Custom Providers: when using custom providers, always include their configuration in provider_configs
- Error Handling: always implement proper error handling for network and API failures