> 💡 **Quick Start**: Same as the OpenAI API, but set `model: ""` for intelligent routing and automatic cost savings.
## 30-Second Setup

1. **Authentication**: Use your Adaptive API key (either header format works):

   ```text
   X-Stainless-API-Key: your-adaptive-api-key
   # OR
   Authorization: Bearer your-adaptive-api-key
   ```

2. **Model Selection**: Leave the model empty for smart routing:

   ```json
   {
     "model": "",  // ← This enables intelligent routing
     "messages": [ ... ]
   }
   ```

That's it! Your requests automatically save 60-80% while maintaining quality.
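A minimal end-to-end sketch with the OpenAI Node SDK, using the base URL shown in the troubleshooting examples further below:

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,
  baseURL: 'https://www.llmadaptive.uk/api/v1'
});

const completion = await openai.chat.completions.create({
  model: '', // empty string = intelligent routing
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(completion.choices[0].message.content);
```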
## Essential Parameters

**`model`** (string, required)
- For intelligent routing: use `""` (empty string) to automatically select the best model for cost and quality
- For specific models: use the `provider:model` format, like `"anthropic:claude-3-sonnet"` or `"openai:gpt-4"`

**`messages`** (array, required)
Array of message objects, in the same format as OpenAI:

```json
[
  { "role": "system", "content": "You are a helpful assistant" },
  { "role": "user", "content": "Hello!" }
]
```

Roles: `system`, `user`, `assistant`, `tool`

**`temperature`** (number, optional)
Creativity level: `0` = focused, `1` = balanced, `2` = creative. Default: `1`

**`max_completion_tokens`** (integer, optional)
Maximum response length in tokens. Leave unset for automatic sizing.

**`stream`** (boolean, optional)
Enable streaming responses. Default: `false`
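A raw-HTTP sketch tying these essentials together; this assumes the endpoint path mirrors OpenAI's `/chat/completions` under the base URL used in the SDK examples:

```bash
curl https://www.llmadaptive.uk/api/v1/chat/completions \
  -H "Authorization: Bearer $ADAPTIVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 1,
    "stream": false
  }'
```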
## Smart Routing & Cost Control

**`model_router`** (object, optional)
Control intelligent routing to optimize cost and performance:

```javascript
// Prefer cost savings (80% cheaper on average)
model_router: { cost_bias: 0.1 }

// Balanced cost and quality
model_router: { cost_bias: 0.5 }

// Prefer best performance
model_router: { cost_bias: 0.9 }

// Limit to specific providers
model_router: {
  models: [
    { provider: "openai" },
    { provider: "anthropic" }
  ]
}
```
**`cost_bias`** (number, optional)
Balance cost vs. performance: `0` = cheapest, `1` = best quality. Default: `0.5`

**`models`** (array, optional)
Allowed providers/models. Examples:
- `{ provider: "openai" }` - All OpenAI models
- `{ provider: "anthropic", model_name: "claude-3-sonnet" }` - Specific model

**`complexity_threshold`** (number, optional)
Override automatic complexity detection (0-1)

**`token_threshold`** (integer, optional)
Override the automatic token-counting threshold
**`fallback`** (object, optional)
Provider backup when the primary fails:

```javascript
// Try providers one by one (cheaper)
fallback: { mode: "sequential" }

// Try multiple providers at once (faster)
fallback: { mode: "race" }
```

**`mode`**: `"sequential"` (cheaper) or `"race"` (faster)
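The fallback object also accepts a timeout and retry count (see the parameter reference below). A combined sketch; the field names `timeout_ms` and `max_retries` follow conventions used elsewhere in this doc, so treat them as assumptions if your SDK version differs:

```javascript
fallback: {
  mode: "race",      // try providers in parallel
  timeout_ms: 10000, // give up on a fallback attempt after 10s (assumed field name)
  max_retries: 2     // retry attempts before failing (assumed field name)
}
```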
**`prompt_response_cache`** (object, optional)
Semantic caching for similar requests (faster responses, lower costs):

```javascript
prompt_response_cache: {
  enabled: true,
  semantic_threshold: 0.85 // How similar a request must be for a cache hit
}
```
## 📋 All Standard OpenAI Parameters

### Core Parameters

**`max_tokens`** (integer, optional)
*Deprecated* - Maximum number of tokens to generate. Use `max_completion_tokens` instead.

**`max_completion_tokens`** (integer, optional)
Maximum number of tokens that can be generated for the completion, including reasoning tokens.

**`stream`** (boolean, optional)
Whether to stream the response. Default: `false`

**`top_p`** (number, optional)
Nucleus sampling parameter between 0 and 1. Default: `1`

**`frequency_penalty`** (number, optional)
Penalty for token frequency. Range: -2.0 to 2.0. Default: `0`

**`presence_penalty`** (number, optional)
Penalty for token presence. Range: -2.0 to 2.0. Default: `0`

**`n`** (integer, optional)
Number of chat completion choices to generate. Default: `1`

**`seed`** (integer, optional)
Seed for deterministic sampling. Helps ensure reproducible results.

**`stop`** (string or array, optional)
Up to 4 sequences where the API will stop generating tokens.

**`user`** (string, optional)
Unique identifier for the end user, to help detect abuse and improve caching.

### Advanced Parameters

**`logprobs`** (boolean, optional)
Whether to return log probabilities of output tokens. Default: `false`

**`top_logprobs`** (integer, optional)
Number of most likely tokens to return at each position (0-20). Requires `logprobs: true`.

**`logit_bias`** (object, optional)
Modify the likelihood of specified tokens. Maps token IDs to bias values (-100 to 100).

**`response_format`** (object, optional)
Format for model output. Supports JSON schema for structured outputs.
- `type`: Either `json_object` or `json_schema`
- `json_schema`: JSON schema definition when using the `json_schema` type

**`service_tier`** (string, optional)
Latency tier for processing. Options: `auto`, `default`, `flex`

**`store`** (boolean, optional)
Whether to store output for model distillation or evals. Default: `false`

**`metadata`** (object, optional)
Set of up to 16 key-value pairs for storing additional information about the request.

### Audio and Multimodal

**`modalities`** (array, optional)
Output types to generate. Options: `["text"]`, `["audio"]`, or `["text", "audio"]`

**`audio`** (object, optional)
Parameters for audio output when `modalities` includes `"audio"`.

### Reasoning Models (o-series)

**`reasoning_effort`** (string, optional)
*o-series models only* - Effort level for reasoning: `low`, `medium`, or `high`

### Function Calling

**`tools`** (array, optional)
Array of tool definitions for function calling. Maximum 128 functions.
- `type`: Tool type; currently only `function` is supported
- `function`: Function definition with name, description, and parameters schema

**`tool_choice`** (string or object, optional)
Controls tool usage: `none`, `auto`, `required`, or a specific tool selection

**`parallel_tool_calls`** (boolean, optional)
Whether to enable parallel function calling. Default: `true`

**`function_call`** (deprecated)
*Deprecated* - Use `tool_choice` instead. Controls function-calling behavior.

### Web Search

**`web_search_options`** (object, optional)
Options for web search tool functionality.

### Streaming Options

**`stream_options`** (object, optional)
Additional options for streaming responses.
- `include_usage`: Whether to include usage statistics in the streaming response

### Prediction and Caching

**`prediction`** (object, optional)
Static predicted output content for regeneration scenarios.
## Adaptive-Specific Parameters

**`model_router`** (object, optional)
Configuration for intelligent routing and provider selection.

**`model_router.models`** (array, optional)
Array of model capabilities to consider for routing. Entries can be simplified or detailed.

Simple formats:
- `{ "provider": "openai" }` - Use all OpenAI models
- `{ "provider": "anthropic", "model_name": "claude-3-sonnet-20240229" }` - Use a specific model

Custom models require all parameters:

- **`provider`** (string, required): Provider name: `"openai"`, `"anthropic"`, `"google"`, `"groq"`, `"deepseek"`, `"mistral"`, `"grok"`, `"huggingface"`
- **`model_name`** (string): Specific model identifier (e.g., `"gpt-4"`, `"claude-3-sonnet"`). Required for custom models or specific model selection
- **`cost_per_1m_input_tokens`** (number): Cost per 1 million input tokens in USD. Required for custom models
- **`cost_per_1m_output_tokens`** (number): Cost per 1 million output tokens in USD. Required for custom models
- **`max_context_tokens`** (integer): Maximum context window size in tokens. Required for custom models
- **`max_output_tokens`** (integer): Maximum output tokens the model can generate. Required for custom models
- **`supports_tool_calling`** (boolean): Whether the model supports function/tool calling. Required for custom models
- **`task_type`** (string, optional): Optimal task type: `"Open QA"`, `"Closed QA"`, `"Summarization"`, `"Text Generation"`, `"Code Generation"`, `"Chatbot"`, `"Classification"`, `"Rewrite"`, `"Brainstorming"`, `"Extraction"`, `"Other"`
- **`complexity`** (string, optional): Model complexity tier: `"low"`, `"medium"`, `"high"`
- Optional metadata: a human-readable description of the model, an array of supported language codes, model size information (e.g., `"7B"`, `"70B"`), and expected latency (`"low"`, `"medium"`, `"high"`)

**`model_router.cost_bias`** (number, optional)
Bias towards cost optimization. Range: 0.0-1.0, where 0.0 = cheapest and 1.0 = best performance

**`model_router.complexity_threshold`** (number, optional)
Threshold for task-complexity routing decisions. Range: 0.0-1.0

**`model_router.token_threshold`** (integer, optional)
Token-count threshold for model selection. Positive integer.

**`prompt_response_cache`** (object, optional)
Configuration for semantic prompt-response caching to improve response times and reduce costs. Disabled by default; enabled per request.
- **`enabled`** (boolean): Whether to enable prompt-response caching for this request
- **`semantic_threshold`** (number): Similarity threshold for cache hits. Range: 0.0-1.0; higher values require more similarity
- Cache duration in seconds. Default: 3600 (1 hour)

**`fallback`** (object, optional)
Configuration for provider fallback behavior. Fallback is disabled by default (empty/omitted) and enabled when a mode is specified.
- **`mode`** (string): Fallback strategy: `"sequential"` or `"race"`. Empty/omitted = disabled; specified = enabled.
- Timeout in milliseconds for fallback operations
- Maximum number of retry attempts

**`provider_configs`** (object, optional)
Configuration for custom providers. Required when using custom providers in your model list. Each entry supports:
- **`base_url`** (string, required): API base URL for the custom provider (e.g., `"https://api.custom.com/v1"`)
- **`api_key`** (string, required): Full API key for authentication with the custom provider
- **`auth_type`** (string, optional): Authentication type: `"bearer"`, `"api_key"`, `"basic"`, or `"custom"`. Default: `"bearer"`
- Custom authentication header name. Default: `"Authorization"`
- **`headers`** (object, optional): Additional headers to send with requests to the custom provider
- **`timeout_ms`** (integer, optional): Request timeout in milliseconds. Range: 1000-120000. Default: 30000
- **`rate_limit_rpm`** (integer, optional): Rate limit in requests per minute. Range: 1-100000
- Health check endpoint for monitoring provider availability
- Custom retry configuration for failed requests
## Advanced Configuration

🔧 **Custom Providers & Enterprise Features**
## Response

**`id`** (string): Unique identifier for the completion

**`object`** (string): Object type, always `chat.completion`

**`created`** (integer): Unix timestamp of creation

**`model`** (string): Model used for the completion

**`provider`** (string): *Adaptive addition* - Which provider was selected (e.g., `"openai"`, `"anthropic"`)

**`choices`** (array): Array of completion choices
- **`message`**: The generated message
  - **`role`**: Role of the message, always `"assistant"`
  - **`content`**: The content of the message
  - **`tool_calls`**: Tool calls made by the model (if any)
- **`finish_reason`**: Reason the completion finished: `stop`, `length`, `tool_calls`, or `content_filter`

**`usage`** (object): Token usage statistics
- **`prompt_tokens`**: Number of tokens in the prompt
- **`completion_tokens`**: Number of tokens in the completion
- **`cache_tier`**: *Adaptive addition* - Cache tier used for this response. Possible values:
  - `"semantic_exact"` - Exact semantic cache match
  - `"semantic_similar"` - Similar semantic cache match
  - `"prompt_response"` - Prompt response cache hit
  - Omitted when no cache is used
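A representative response body with the fields above (illustrative values; the exact model string and provider vary with routing):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gemini-flash",
  "provider": "google",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 40,
    "total_tokens": 52
  }
}
```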
## Live Examples

💡 **Try These Examples**: Copy-paste ready code that works immediately. Each example shows the cost savings in action.

### 1. Simple Chat → 97% Cost Savings

**Cost Comparison**: A simple question routes to Gemini Flash.
- OpenAI Direct: $3.00 per 1M tokens
- Adaptive Smart: $0.10 per 1M tokens
- **Savings: 97%**
**JavaScript - Copy & Run**
```javascript
const completion = await openai.chat.completions.create({
  model: '', // ← Smart routing enabled
  messages: [
    { role: 'user', content: 'Explain quantum computing simply' }
  ],
});

console.log(completion.choices[0].message.content);
console.log(`Provider used: ${completion.provider}`); // See which provider was chosen
console.log(`Cache tier: ${completion.usage.cache_tier || 'none'}`);
```
### 2. Complex Analysis → 85% Cost Savings

**Cost Comparison**: A complex task routes to DeepSeek Reasoner.
- OpenAI Direct: $15.00 per 1M tokens
- Adaptive Smart: $2.19 per 1M tokens
- **Savings: 85%**
**JavaScript - Advanced Prompt**
```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: 'Analyze the economic implications of quantum computing on cryptocurrency security, considering both short-term disruptions and long-term adaptations. Include specific recommendations for blockchain protocols.'
    }
  ],
});

// Complex prompts automatically route to premium models when needed
console.log(`Routed to: ${completion.provider}`); // Likely Claude or DeepSeek
```
### With Intelligent Routing Configuration

```javascript
// Simple provider selection
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list' }
  ],
  model_router: {
    models: [
      { provider: "anthropic" },                  // All Anthropic models
      { provider: "openai", model_name: "gpt-4" } // Specific OpenAI model
    ],
    cost_bias: 0.2, // Prefer cost savings
    complexity_threshold: 0.3,
    token_threshold: 1000
  },
  fallback: {
    mode: 'sequential' // Enabled by specifying a mode
  }
});
```
### Using Custom Providers

```javascript
// Custom provider example
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Explain machine learning concepts' }
  ],
  model_router: {
    models: [
      { provider: "openai" }, // Standard provider
      {
        provider: "my-custom-llm", // Custom provider
        model_name: "custom-model-v1",
        cost_per_1m_input_tokens: 2.0,
        cost_per_1m_output_tokens: 6.0,
        max_context_tokens: 16000,
        max_output_tokens: 4000,
        supports_tool_calling: true,
        task_type: "Text Generation",
        complexity: "medium"
      }
    ],
    cost_bias: 0.5
  },
  // Configure each custom provider
  provider_configs: {
    "my-custom-llm": {
      base_url: "https://api.mycustom.com/v1",
      api_key: "sk-custom-api-key-here",
      auth_type: "bearer",
      headers: {
        "X-Custom-Header": "value"
      },
      timeout_ms: 45000
    }
  }
});
```
### Customizing Standard Providers

You can also customize standard providers (OpenAI, Anthropic, etc.) with custom base URLs, API keys, and settings:

```javascript
// Override standard provider configuration
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Hello from custom OpenAI endpoint!' }
  ],
  model_router: {
    models: [
      { provider: "openai" },   // Will use the custom config below
      { provider: "anthropic" } // Will also use the custom config
    ]
  },
  // Custom configurations for standard providers
  provider_configs: {
    "openai": {
      base_url: "https://my-custom-openai-proxy.com/v1",
      api_key: "sk-my-custom-openai-key",
      timeout_ms: 60000,
      headers: {
        "X-Proxy-Key": "proxy-auth-123"
      }
    },
    "anthropic": {
      base_url: "https://my-anthropic-proxy.com/v1",
      api_key: "sk-ant-custom-key",
      timeout_ms: 45000
    }
  }
});
```
### Streaming Response

```javascript
const stream = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Tell me a story about space exploration' }
  ],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
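To also receive token counts with a stream, combine `stream` with the `stream_options` parameter documented above. A sketch, assuming usage arrives on the final chunk as in the OpenAI API:

```javascript
const stream = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Summarize the plot of Dune' }],
  stream: true,
  stream_options: { include_usage: true } // final chunk carries usage stats
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
  if (chunk.usage) {
    // Present only on the last chunk when include_usage is set
    console.log(`\nTokens used: ${chunk.usage.total_tokens}`);
  }
}
```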
### Function Calling

```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: "What's the weather like in San Francisco?" }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: {
              type: 'string',
              description: 'City and state, e.g. San Francisco, CA'
            }
          },
          required: ['location']
        }
      }
    }
  ]
});
```
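A sketch of the follow-up round trip in the standard OpenAI-style flow: read `tool_calls` from the response, run your function, and send the result back with a `tool` role message. `getWeather` is a hypothetical local implementation, not part of the API:

```javascript
const message = completion.choices[0].message;

if (message.tool_calls) {
  const call = message.tool_calls[0];
  const args = JSON.parse(call.function.arguments);
  const result = await getWeather(args.location); // hypothetical local function

  const followUp = await openai.chat.completions.create({
    model: '',
    messages: [
      { role: 'user', content: "What's the weather like in San Francisco?" },
      message, // the assistant message containing the tool call
      {
        role: 'tool',
        tool_call_id: call.id,
        content: JSON.stringify(result)
      }
    ]
  });

  console.log(followUp.choices[0].message.content);
}
```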
### Vision (Multimodal)

```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: "What's in this image?" },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg'
          }
        }
      ]
    }
  ],
  modalities: ['text'] // Can also include 'audio' for supported models
});
```
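A sketch of requesting audio output via the `audio` parameter documented above, following the OpenAI-style shape; the `voice` and `format` values are assumptions, and supported values depend on the routed model:

```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Read this welcome message aloud: Hello and welcome!' }
  ],
  modalities: ['text', 'audio'],
  audio: { voice: 'alloy', format: 'wav' } // assumed OpenAI-style audio options
});
```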
### Advanced Configuration with All Parameters

```javascript
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain machine learning concepts' }
  ],

  // Core parameters
  temperature: 0.7,
  max_completion_tokens: 1000,
  top_p: 0.9,
  frequency_penalty: 0.1,
  presence_penalty: 0.1,
  n: 1,
  seed: 12345,
  stop: ['\n\n'],
  user: 'user-123',

  // Advanced parameters
  logprobs: true,
  top_logprobs: 5,
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'explanation',
      schema: {
        type: 'object',
        properties: {
          concept: { type: 'string' },
          explanation: { type: 'string' }
        }
      }
    }
  },
  service_tier: 'auto',
  store: false,
  metadata: {
    session_id: 'abc123',
    user_type: 'premium'
  },

  // Reasoning models (o-series)
  reasoning_effort: 'medium',

  // Function calling
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge',
        description: 'Search knowledge base for information',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string' }
          }
        }
      }
    }
  ],
  tool_choice: 'auto',
  parallel_tool_calls: true,

  // Streaming
  stream: false,
  stream_options: {
    include_usage: true
  },

  // Adaptive-specific
  model_router: {
    models: [
      { provider: "openai" }, // Use all OpenAI models
      { provider: "anthropic", model_name: "claude-3-sonnet-20240229" }, // Specific model
      // Custom model example (all params required):
      {
        provider: "my-custom-provider",
        model_name: "custom-model-v1",
        cost_per_1m_input_tokens: 5.0,
        cost_per_1m_output_tokens: 10.0,
        max_context_tokens: 32000,
        max_output_tokens: 2048,
        supports_tool_calling: false,
        task_type: "Text Generation",
        complexity: "medium"
      }
    ],
    cost_bias: 0.3,
    complexity_threshold: 0.5,
    token_threshold: 2000
  },
  provider_configs: {
    "my-custom-provider": {
      base_url: "https://api.custom.com/v1",
      api_key: "sk-custom-key-123",
      auth_type: "bearer",
      headers: {
        "Custom-Header": "custom-value"
      },
      timeout_ms: 30000,
      rate_limit_rpm: 1000
    }
  },
  prompt_response_cache: {
    enabled: true,
    semantic_threshold: 0.85
  },
  fallback: {
    mode: 'sequential' // Enabled by specifying a mode
  }
});
```
## Response Examples

### Cache Tier Tracking

The `usage.cache_tier` field shows which cache served your response:
```json
// Semantic cache hit
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18,
    "cache_tier": "semantic_exact"
  }
}

// Prompt cache hit
{
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 4,
    "total_tokens": 9,
    "cache_tier": "prompt_response"
  }
}

// No cache used
{
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 10,
    "total_tokens": 18
    // cache_tier omitted
  }
}
```
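A small sketch for acting on this field in client code:

```javascript
const tier = completion.usage.cache_tier;
if (tier) {
  console.log(`Served from cache (${tier}) - faster and cheaper`);
} else {
  console.log('Fresh generation - full token cost applies');
}
```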
## Error Handling & Troubleshooting

🛠️ **Quick Fix Guide**: Most issues have simple solutions. Here's how to resolve them fast.

### ⚡ Instant Fixes

#### 🔑 Authentication Error (401)

**Problem**: `{"error": {"message": "Invalid API key", "type": "authentication_error"}}`

**Instant Solutions**:

1. Check the header format:

   ```javascript
   // ✅ Correct
   headers: { "X-Stainless-API-Key": "your-adaptive-key" }
   // OR
   headers: { "Authorization": "Bearer your-adaptive-key" }

   // ❌ Wrong
   headers: { "X-API-Key": "your-key" } // Wrong header name
   ```

2. Verify your key: copy-paste it from the llmadaptive.uk dashboard

3. Check environment variables:

   ```bash
   echo $ADAPTIVE_API_KEY # Should show your key
   ```

**Working Example**:

```javascript
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY, // ← Make sure this is set
  baseURL: 'https://www.llmadaptive.uk/api/v1'
});
```
#### 📝 Invalid Request Error (400)

**Problem**: `{"error": {"message": "Invalid request", "type": "invalid_request_error"}}`

**Common Causes & Fixes**:

1. Empty messages array:

   ```javascript
   // ❌ Wrong
   messages: []

   // ✅ Correct
   messages: [{ role: "user", content: "Hello!" }]
   ```

2. Missing required fields:

   ```javascript
   // ❌ Wrong
   { role: "user" } // Missing content

   // ✅ Correct
   { role: "user", content: "Your message here" }
   ```

3. Invalid model_router config:

   ```javascript
   // ❌ Wrong - custom model missing required fields
   model_router: {
     models: [{ provider: "custom-provider" }] // Missing details
   }

   // ✅ Correct - all required fields for custom models
   model_router: {
     models: [{
       provider: "custom-provider",
       model_name: "model-v1",
       cost_per_1m_input_tokens: 2.0,
       cost_per_1m_output_tokens: 6.0,
       max_context_tokens: 16000,
       max_output_tokens: 4000,
       supports_tool_calling: true,
       task_type: "Text Generation",
       complexity: "medium"
     }]
   }
   ```
#### ⏱️ Rate Limit Error (429)

**Problem**: `{"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}`

**Immediate Actions**:

1. Wait and retry: rate limits reset every minute

2. Implement exponential backoff (usage sketch after this list):

   ```javascript
   async function callWithRetry(requestFn, maxRetries = 3) {
     for (let i = 0; i < maxRetries; i++) {
       try {
         return await requestFn();
       } catch (error) {
         if (error.status === 429 && i < maxRetries - 1) {
           // Wait 1s, 2s, 4s, ... between attempts
           await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
         } else {
           throw error;
         }
       }
     }
   }
   ```

3. Upgrade your plan at llmadaptive.uk for higher limits

4. Use caching to reduce requests:

   ```javascript
   prompt_response_cache: { enabled: true }
   ```
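Usage sketch for the retry helper above, wrapping a standard completion call:

```javascript
const completion = await callWithRetry(() =>
  openai.chat.completions.create({
    model: '',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
);
```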
#### 🔌 Custom Provider Issues

**Problem**: Custom provider not working or failing

**Checklist**:

1. Provider configuration must be complete:

   ```javascript
   provider_configs: {
     "my-provider": {
       base_url: "https://api.example.com/v1", // ✅ Required
       api_key: "sk-your-key",                 // ✅ Required
       auth_type: "bearer",                    // ✅ Good practice
       timeout_ms: 30000                       // ✅ Recommended
     }
   }
   ```

2. Model definition must include all fields:

   ```javascript
   models: [{
     provider: "my-provider",
     model_name: "model-name",       // ✅ Required
     cost_per_1m_input_tokens: 2.0,  // ✅ Required
     cost_per_1m_output_tokens: 6.0, // ✅ Required
     max_context_tokens: 16000,      // ✅ Required
     max_output_tokens: 4000,        // ✅ Required
     supports_tool_calling: false,   // ✅ Required
     task_type: "Text Generation",   // ✅ Required
     complexity: "medium"            // ✅ Required
   }]
   ```

3. Test the provider directly first:

   ```bash
   curl https://api.your-provider.com/v1/chat/completions \
     -H "Authorization: Bearer your-key" \
     -d '{"model": "model-name", "messages": [...]}'
   ```
### Error Response Format

Standard error object format:

```json
{
  "error": {
    "message": "Human-readable error description",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```

- **`message`**: Clear description of what went wrong
- **`type`**: Error category: `invalid_request_error`, `authentication_error`, `permission_error`, `rate_limit_error`, `server_error`
- **`code`**: Specific error code for programmatic handling
## Rate Limits

| Plan | Requests per Minute | Tokens per Minute |
|------------|---------|---------|
| Free | 100 | 10,000 |
| Pro | 1,000 | 100,000 |
| Enterprise | Custom | Custom |

Rate limits are applied per API key and reset every minute.
## Best Practices

- **Model Selection**: Use an empty string `""` for `model` to enable intelligent routing and cost savings
- **Cost Control**: Use the `cost_bias` parameter to balance cost vs. performance for your use case
- **Custom Providers**: When using custom providers, always include their configuration in `provider_configs`
- **Error Handling**: Always implement proper error handling for network and API failures