Get Adaptive’s intelligent model selection without using our inference. Provider-agnostic by design: works with any model, any provider, any infrastructure.
Why Use This?
Use Adaptive’s intelligence, run inference wherever you want:
“I have my own OpenAI/Anthropic accounts” - Get optimal model selection, pay your providers directly
“I run models on-premise” - Get routing decisions for your local infrastructure
“I have enterprise contracts” - Use your existing provider relationships with intelligent routing
“I need data privacy” - Keep inference local while getting smart model selection
Request
Provider-agnostic format: send your available models and prompt, get an intelligent selection back.
models (array, required): The models available for selection. For known models (GPT-4, Claude, Gemini, etc.), just specify provider and model_name; Adaptive knows the rest. Only provide full specs for custom/unknown models.
Model specification options:
Provider + model (for known models): { "provider": "openai", "model_name": "gpt-4o-mini" }
Provider-only (let Adaptive choose the best model): { "provider": "anthropic" }
Model-only (if the provider is obvious): { "model_name": "gpt-4o-mini" }
Full specification (for custom models):
provider (string, required): Provider name (e.g., “openai”, “anthropic”, “local”, “custom”)
model_name (string): Model identifier (required unless provider-only)
cost_per_1m_input_tokens (number): Cost per 1M input tokens (auto-filled for known models)
cost_per_1m_output_tokens (number): Cost per 1M output tokens (auto-filled for known models)
max_context_tokens (number): Maximum context window size (auto-filled for known models)
supports_tool_calling (boolean): Whether the model supports tool/function calling (auto-filled for known models)
max_output_tokens (number): Maximum output tokens (optional)
complexity (string): Model complexity tier: “low”, “medium”, “high” (optional)
task_type (string): Optimized task type (optional)
prompt (string, required): The prompt text to analyze for optimal model selection.
User identifier (string, optional): Enables user-specific cache hits for caching optimization.
cost_bias (number, optional): Cost optimization preference (0.0 = cheapest, 1.0 = best performance). Defaults to the server configuration; override to prioritize cost savings or performance for this specific selection.
tools (array, optional): Available tool definitions for function-calling detection. Tool definitions help Adaptive understand whether your prompt requires function-calling capabilities, steering selection toward models that support tools.
Tool definition properties:
type (string): Type of tool (always “function”)
function (object): Function definition, containing:
name (string): Name of the function
description (string): Description of what the function does
parameters (object): JSON Schema object defining the function parameters
Current tool call (optional): The tool call being made, if any. If this request is part of a tool-calling sequence, provide the current tool call context to help optimize model selection. A hedged request sketch follows the property list below.
Tool call properties:
id (string): Unique identifier for the tool call
type (string): Type of tool call (always “function”)
function (object): Function call details, containing:
name (string): Name of the function being called
arguments (string): JSON string containing the function arguments
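A minimal sketch of a request carrying tool context. The top-level tool_call key is an assumption inferred from the property names above (check the API reference for the exact key); the nested shape mirrors the documented properties:
// Hedged sketch: selection request with tool definitions plus the in-flight
// tool call. "tool_call" as the top-level key is an assumption; the nested
// id/type/function shape follows the properties documented above.
const response = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': apiKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [
      { provider: 'openai', model_name: 'gpt-4o-mini' },
      { provider: 'anthropic', model_name: 'claude-3-5-sonnet' }
    ],
    prompt: 'Get the weather, then summarize it in one sentence.',
    tools: [ /* same format as the function-calling example below */ ],
    tool_call: {                // assumed field name
      id: 'call_abc123',
      type: 'function',
      function: {
        name: 'get_weather',
        arguments: '{"location": "San Francisco"}'
      }
    }
  })
});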
Semantic cache configuration (optional): Per-request semantic cache settings. See the hedged sketch below this list.
Semantic cache properties:
Enabled override (boolean): Whether to use semantic caching for this specific request (overrides the server configuration)
Similarity threshold override (number): Similarity threshold for cache hits (0.0-1.0; higher = stricter matching)
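A hedged sketch of a per-request cache override. The semantic_cache, enabled, and similarity_threshold names are hypothetical illustrations; the docs above describe the behavior but not the exact keys:
// Hedged sketch: per-request semantic cache override. "semantic_cache",
// "enabled", and "similarity_threshold" are hypothetical field names; only
// the behavior (on/off override plus a 0.0-1.0 threshold) is documented.
const response = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': apiKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [{ provider: 'openai', model_name: 'gpt-4o-mini' }],
    prompt: 'Hello, how are you?',
    semantic_cache: {               // hypothetical key
      enabled: true,                // override the server configuration
      similarity_threshold: 0.95    // stricter matching = fewer cache hits
    }
  })
});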
Response
provider (string): Selected provider name, i.e., the provider chosen for this prompt (e.g., “openai”, “anthropic”, “local”)
model (string): Selected model identifier, i.e., the specific model chosen (e.g., “gpt-4”, “claude-3-5-sonnet”, “llama-3-8b”)
alternatives (array, optional): Alternative provider/model combinations to fall back on if the primary selection is unavailable; each entry has a provider and model
Quick Examples
"Known models - just specify what you have"
# Mix and match specification styles
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {"provider": "openai", "model_name": "gpt-4o-mini"},
      {"model_name": "claude-3-5-sonnet"},
      {"provider": "google"}
    ],
    "prompt": "Hello, how are you?"
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi
# Success response:
{
  "provider": "openai",
  "model": "gpt-4o-mini"
}
"Just specify providers - let Adaptive choose"
# Even simpler - just say what providers you have access to
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {"provider": "openai"},
      {"provider": "anthropic"}
    ],
    "prompt": "Write a complex analysis of market trends"
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi
# Success response:
{
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "alternatives": [
    { "provider": "openai", "model": "gpt-4o" }
  ]
}
"Custom models - specify full details"
# Only specify details for custom/unknown models
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {"provider": "openai", "model_name": "gpt-4o-mini"},
      {
        "provider": "local",
        "model_name": "my-custom-llama-fine-tune",
        "cost_per_1m_input_tokens": 0.0,
        "cost_per_1m_output_tokens": 0.0,
        "max_context_tokens": 4096,
        "supports_tool_calling": false,
        "complexity": "medium"
      }
    ],
    "prompt": "Hello, how are you?"
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi
# Known models use Adaptive's specs, custom models use yours
"Test cost optimization"
// Will cost_bias actually pick cheaper models?
const response = await fetch('/api/v1/select-model', {
  method: 'POST',
  headers: {
    'X-Stainless-API-Key': apiKey,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    models: [
      {
        provider: "openai",
        model_name: "gpt-4o-mini",
        cost_per_1m_input_tokens: 0.15,
        cost_per_1m_output_tokens: 0.6,
        max_context_tokens: 128000,
        supports_tool_calling: true
      },
      {
        provider: "openai",
        model_name: "gpt-4o",
        cost_per_1m_input_tokens: 2.5,
        cost_per_1m_output_tokens: 10.0,
        max_context_tokens: 128000,
        supports_tool_calling: true
      }
    ],
    prompt: "Analyze this complex dataset and provide insights...",
    cost_bias: 0.1 // Maximize cost savings
  })
});

if (!response.ok) {
  const errorBody = await response.text();
  throw new Error(`HTTP ${response.status}: ${errorBody}`);
}

const result = await response.json();
console.log(result);
// Check if it picked the cheaper model despite complexity
"Function calling optimization"
# Models with function calling will be prioritized when tools are provided
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      {"provider": "openai", "model_name": "gpt-4o-mini"},
      {"provider": "anthropic", "model_name": "claude-3-haiku"},
      {"provider": "openai", "model_name": "gpt-3.5-turbo"}
    ],
    "prompt": "What is the weather like in San Francisco?",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi
# Success response - will prefer models that support function calling:
{
  "provider": "openai",
  "model": "gpt-4o-mini",
  "alternatives": [
    { "provider": "openai", "model": "gpt-3.5-turbo" }
  ]
}
"Compare different configurations”
import requests
import os

# Configuration
BASE_URL = "https://api.yourdomain.com"  # Replace with your actual domain
API_TOKEN = os.getenv("ADAPTIVE_API_TOKEN", "your-api-token-here")  # Set via environment variable
TIMEOUT = 30  # Request timeout in seconds

# Define available models
models = [
    {
        "provider": "openai",
        "model_name": "gpt-4o-mini",
        "cost_per_1m_input_tokens": 0.15,
        "cost_per_1m_output_tokens": 0.6,
        "max_context_tokens": 128000,
        "supports_tool_calling": True,
        "complexity": "low"
    },
    {
        "provider": "openai",
        "model_name": "gpt-4o",
        "cost_per_1m_input_tokens": 2.5,
        "cost_per_1m_output_tokens": 10.0,
        "max_context_tokens": 128000,
        "supports_tool_calling": True,
        "complexity": "high"
    }
]

base_request = {
    "models": models,
    "prompt": "Write Python code to analyze customer data"
}

# Headers for authentication
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}

# Test cost-focused vs performance-focused
configs = [
    {"cost_bias": 0.1, "name": "cost-optimized"},
    {"cost_bias": 0.9, "name": "performance-focused"}
]

for config in configs:
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/select-model",
            json={**base_request, "cost_bias": config["cost_bias"]},
            headers=headers,
            timeout=TIMEOUT
        )

        # Check if request was successful
        if response.ok:
            result = response.json()
            print(f"{config['name']}: {result['provider']}/{result['model']}")
        else:
            print(f"Error for {config['name']}: HTTP {response.status_code} - {response.text}")
    except requests.exceptions.Timeout:
        print(f"Timeout error for {config['name']}: request took longer than {TIMEOUT} seconds")
    except requests.exceptions.ConnectionError:
        print(f"Connection error for {config['name']}: unable to connect to {BASE_URL}")
    except requests.exceptions.RequestException as e:
        print(f"Request error for {config['name']}: {e}")
    except Exception as e:
        print(f"Unexpected error for {config['name']}: {e}")
Real-World Integration Patterns
1. Use Your Own Provider Accounts
// Define your available models with your own pricing
const availableModels = [
  {
    provider: "openai",
    model_name: "gpt-4o-mini",
    cost_per_1m_input_tokens: 0.15,
    cost_per_1m_output_tokens: 0.6,
    max_context_tokens: 128000,
    supports_tool_calling: true
  },
  {
    provider: "anthropic",
    model_name: "claude-3-5-sonnet-20241022",
    cost_per_1m_input_tokens: 3.0,
    cost_per_1m_output_tokens: 15.0,
    max_context_tokens: 200000,
    supports_tool_calling: true
  }
];

// Get intelligent selection
const selection = await fetch('/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: availableModels,
    prompt: userMessage
  })
});
const result = await selection.json();

// Route to your own provider accounts
if (result.provider === "openai") {
  const completion = await yourOpenAI.chat.completions.create({
    model: result.model,
    messages: [{ role: "user", content: userMessage }]
  });
} else if (result.provider === "anthropic") {
  const completion = await yourAnthropic.messages.create({
    model: result.model,
    messages: [{ role: "user", content: userMessage }],
    max_tokens: 4096
  });
}
2. On-Premise Model Routing
// Tell Adaptive about your local models (plus a cloud fallback)
const res = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [
      { provider: "local", model_name: "llama-3-8b" },
      { provider: "local", model_name: "llama-3-70b" },
      { provider: "openai", model_name: "gpt-4" } // Cloud fallback
    ],
    prompt: userMessage
  })
});
const selection = await res.json();

// Route to the right infrastructure using provider/model
if (selection.provider === "local" && selection.model === "llama-3-8b") {
  await yourLocalServer.infer({ model: selection.model, messages: [{ role: "user", content: userMessage }] });
} else if (selection.provider === "openai") {
  await yourOpenAI.chat.completions.create({ model: selection.model, messages: [{ role: "user", content: userMessage }] });
}
3. Enterprise Contract Optimization
// Maximize usage of your enterprise contracts
const res = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [
      { provider: "anthropic" }, // Your enterprise contract
      { provider: "openai" },    // Your enterprise contract
      { provider: "google" }     // Pay-per-use fallback
    ],
    prompt: userMessage,
    cost_bias: 0.8
  })
});
const selection = await res.json();

// Always use your own accounts
const client = yourProviderClients[selection.provider];
const completion = await client.create({
  model: selection.model,
  messages: [{ role: "user", content: userMessage }]
});
4. Data Privacy & Compliance
// Keep sensitive data local while getting smart routing
const selection = await selectModel({
  models: [
    { provider: "local", model_name: "llama-3-70b" },
    { provider: "local", model_name: "llama-3-8b" }
  ],
  prompt: "NON_SENSITIVE_TASK_DESCRIPTION"
  // Don't send actual sensitive data to Adaptive
});

// Run inference on your secure infrastructure
if (selection.model === "llama-3-70b") {
  // Use your high-end local model
  const result = await yourLocalGPU.infer(actualSensitiveData);
} else {
  // Use your efficient local model
  const result = await yourLocalCPU.infer(actualSensitiveData);
}
Understanding the Response
What You Get Back
{
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "alternatives": [
    { "provider": "openai", "model": "gpt-4o" }
  ]
}
Key Insights
provider - Which API service to call
model - The specific model identifier to use with that provider
alternatives - Fallback options if the primary selection is unavailable
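A hedged sketch of consuming alternatives as retry fallbacks. callProvider is a hypothetical wrapper around your own provider clients; only the response shape comes from the fields above:
// Hedged sketch: try the primary selection, then each alternative in order.
// callProvider() is a hypothetical stand-in for your own client routing.
async function completeWithFallback(selection, userMessage) {
  const candidates = [
    { provider: selection.provider, model: selection.model },
    ...(selection.alternatives ?? [])
  ];
  for (const candidate of candidates) {
    try {
      return await callProvider(candidate.provider, candidate.model, userMessage);
    } catch (err) {
      console.warn(`${candidate.provider}/${candidate.model} failed, trying next`, err);
    }
  }
  throw new Error("All selected models failed");
}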
Common Patterns
Before/After Comparison
// See what changes with different parameters
const baseline = await selectModel(request);
const withConstraints = await selectModel({
  ...request,
  cost_bias: 0.1
});

console.log(`Baseline: ${baseline.model}`);
console.log(`Cost-optimized: ${withConstraints.model}`);
Validate Your Setup
// Make sure your routing rules work
const shouldUseCheap = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [{ provider: "openai", model_name: "gpt-4o-mini" }, { provider: "openai", model_name: "gpt-4o" }],
    prompt: "Hi"
  })
}).then(r => r.json());

const shouldUseExpensive = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [{ provider: "openai", model_name: "gpt-4o-mini" }, { provider: "openai", model_name: "gpt-4o" }],
    prompt: "Analyze this complex dataset..."
  })
}).then(r => r.json());

// Verify different complexity tasks get different models
Authentication
Same as chat completions:
# Any of these work
-H "X-Stainless-API-Key: your-key"
-H "Authorization: Bearer your-key"
No Inference = Fast & Cheap
This endpoint:
✅ Fast - No LLM inference, just routing logic
✅ Cheap - Doesn’t count against token usage
✅ Accurate - Uses the exact same selection logic as real completions
Perfect for testing, debugging, and cost planning without burning through your budget.
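Because selection calls are cheap, you can dry-run routing over a sample of real prompts to forecast spend before shifting traffic. A hedged sketch; the prompts are illustrative, and only the request/response shapes come from this page:
// Hedged sketch: tally which models Adaptive would pick for sample prompts.
// The sample prompts are illustrative, not official figures.
const samplePrompts = ["Hi", "Summarize this contract...", "Write a SQL query..."];
const tally = {};

for (const prompt of samplePrompts) {
  const selection = await fetch('https://llmadaptive.uk/api/v1/select-model', {
    method: 'POST',
    headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      models: [
        { provider: "openai", model_name: "gpt-4o-mini" },
        { provider: "openai", model_name: "gpt-4o" }
      ],
      prompt
    })
  }).then(r => r.json());
  tally[selection.model] = (tally[selection.model] ?? 0) + 1;
}

console.log("Routing distribution:", tally);
// Combine the distribution with your providers' per-token prices and average
// token counts to estimate spend for each cost_bias setting.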