POST /api/v1/select-model

Select Model
curl --request POST \
  --url https://www.llmadaptive.uk/api/v1/select-model \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "models": [
    {}
  ],
  "prompt": "<string>",
  "user": "<string>",
  "cost_bias": 123,
  "tools": [
    {
      "type": "<string>",
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {}
      }
    }
  ],
  "tool_call": {
    "id": "<string>",
    "type": "<string>",
    "function": {
      "name": "<string>",
      "arguments": "<string>"
    }
  },
  "model_router_cache": {
    "enabled": true,
    "semantic_threshold": 123
  }
}'
{
  "provider": "<string>",
  "model": "<string>",
  "alternatives": [
    {
      "provider": "<string>",
      "model": "<string>"
    }
  ]
}
Get Adaptive’s intelligent model selection without using our inference. Provider-agnostic design - works with any models, any providers, any infrastructure.

Why Use This?

Use Adaptive’s intelligence, run inference wherever you want:
  • “I have my own OpenAI/Anthropic accounts” - Get optimal model selection, pay your providers directly
  • “I run models on-premise” - Get routing decisions for your local infrastructure
  • “I have enterprise contracts” - Use your existing provider relationships with intelligent routing
  • “I need data privacy” - Keep inference local while getting smart model selection

Request

Provider-agnostic format - send your available models and prompt, get intelligent selection back.
models
array
required
Array of available models. For known models (GPT-4, Claude, Gemini, etc.), just specify provider and model_name - Adaptive knows the rest. Only provide full specs for custom/unknown models.
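
For example, all three levels of detail can be mixed in a single request. A minimal JavaScript sketch (the local model here is hypothetical):

// Three levels of detail, mixed freely in one models array
const models = [
  { provider: "openai" },                                     // provider only: Adaptive picks the model
  { provider: "anthropic", model_name: "claude-3-5-sonnet" }, // known model: Adaptive fills in the specs
  {
    // Custom model: supply the full spec yourself (hypothetical example)
    provider: "local",
    model_name: "my-llama-fine-tune",
    cost_per_1m_input_tokens: 0.0,
    cost_per_1m_output_tokens: 0.0,
    max_context_tokens: 4096,
    supports_tool_calling: false
  }
];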
prompt
string
required
The prompt text to analyze for optimal model selection
user
string
Optional user identifier for caching optimization (enables user-specific cache hits)
cost_bias
number
Cost optimization preference (0.0 = cheapest, 1.0 = best performance). Default: uses the server configuration. Override to prioritize cost savings or performance for this specific selection.
tools
object[]
Available tool definitions for function calling detection. Tool definitions help Adaptive understand if your prompt requires function calling capabilities, influencing model selection toward models that support tools.
tool_call
object
Current tool call being made (if any). If this request is part of a tool-calling sequence, provide the current tool call context to help optimize model selection.
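
A minimal sketch of the tool_call shape from the request schema above; the id and arguments values are illustrative:

// Tool-call context for a request that is mid-sequence (values are made up)
const tool_call = {
  id: "call_abc123",
  type: "function",
  function: {
    name: "get_weather",
    arguments: '{"location": "San Francisco"}'
  }
};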
model_router_cache
object
Semantic cache configuration for this request
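
A sketch of a request body that enables the semantic cache, with a user id for user-specific cache hits; the 0.85 threshold is an assumed 0-to-1 similarity cutoff, not a documented default:

// Request body with semantic caching enabled
const body = {
  models: [{ provider: "openai", model_name: "gpt-4o-mini" }],
  prompt: "Hello, how are you?",
  user: "user-1234",          // optional: enables user-specific cache hits
  model_router_cache: {
    enabled: true,
    semantic_threshold: 0.85  // assumed similarity cutoff; tune for your workload
  }
};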

Response

provider
string
Selected provider name. The provider that was chosen for this prompt (e.g., “openai”, “anthropic”, “local”)
model
string
Selected model identifier. The specific model that was chosen (e.g., “gpt-4”, “claude-3-5-sonnet”, “llama-3-8b”)
alternatives
array
Alternative provider/model combinations (optional). Fallback options if the primary selection is unavailable

Quick Examples

"Known models - just specify what you have"

# Mix and match specification styles
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -d '{
    "models": [
      {"provider": "openai", "model_name": "gpt-4o-mini"},
      {"model_name": "claude-3-5-sonnet"},
      {"provider": "google"}
    ],
    "prompt": "Hello, how are you?"
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi

# Success response:
{
  "provider": "openai", 
  "model": "gpt-4o-mini"
}

"Just specify providers - let Adaptive choose"

# Even simpler - just say what providers you have access to
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -d '{
    "models": [
      {"provider": "openai"},
      {"provider": "anthropic"}
    ],
    "prompt": "Write a complex analysis of market trends"
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi

# Success response:
{
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "alternatives": [
    {"provider": "openai", "model": "gpt-4o"}
  ]
}

"Custom models - specify full details"

# Only specify details for custom/unknown models
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -d '{
    "models": [
      {"provider": "openai", "model_name": "gpt-4o-mini"},
      {
        "provider": "local",
        "model_name": "my-custom-llama-fine-tune",
        "cost_per_1m_input_tokens": 0.0,
        "cost_per_1m_output_tokens": 0.0,
        "max_context_tokens": 4096,
        "supports_tool_calling": false,
        "complexity": "medium"
      }
    ],
    "prompt": "Hello, how are you?"
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi

# Known models use Adaptive's specs, custom models use yours

"Test cost optimization"

// Will cost_bias actually pick cheaper models?
const response = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': apiKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [
      {
        provider: "openai",
        model_name: "gpt-4o-mini",
        cost_per_1m_input_tokens: 0.15,
        cost_per_1m_output_tokens: 0.6,
        max_context_tokens: 128000,
        supports_tool_calling: true
      },
      {
        provider: "openai",
        model_name: "gpt-4o", 
        cost_per_1m_input_tokens: 2.5,
        cost_per_1m_output_tokens: 10.0,
        max_context_tokens: 128000,
        supports_tool_calling: true
      }
    ],
    prompt: "Analyze this complex dataset and provide insights...",
    cost_bias: 0.1  // Maximize cost savings
  })
});

if (!response.ok) {
  const errorBody = await response.text();
  throw new Error(`HTTP ${response.status}: ${errorBody}`);
}

const result = await response.json();
console.log(result);
// Check if it picked the cheaper model despite complexity

"Function calling optimization"

# Models with function calling will be prioritized when tools are provided
response=$(curl -s -w "\n%{http_code}" https://llmadaptive.uk/api/v1/select-model \
  -H "X-Stainless-API-Key: $API_KEY" \
  -d '{
    "models": [
      {"provider": "openai", "model_name": "gpt-4o-mini"},
      {"provider": "anthropic", "model_name": "claude-3-haiku"},
      {"provider": "openai", "model_name": "gpt-3.5-turbo"}
    ],
    "prompt": "What is the weather like in San Francisco?",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }')

http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)

if [ "$http_code" -ge 200 ] && [ "$http_code" -lt 300 ]; then
  echo "Success: $response_body"
else
  echo "Error $http_code: $response_body" >&2
  exit 1
fi

# Success response - will prefer models that support function calling:
{
  "provider": "openai",
  "model": "gpt-4o-mini",
  "alternatives": [
    {"provider": "openai", "model": "gpt-3.5-turbo"}
  ]
}

"Compare different configurations”

import requests
import os

# Configuration
BASE_URL = "https://llmadaptive.uk"  # Adaptive API base URL (override for self-hosted deployments)
API_TOKEN = os.getenv("ADAPTIVE_API_TOKEN", "your-api-token-here")  # Set via environment variable
TIMEOUT = 30  # Request timeout in seconds

# Define available models
models = [
    {
        "provider": "openai",
        "model_name": "gpt-4o-mini",
        "cost_per_1m_input_tokens": 0.15,
        "cost_per_1m_output_tokens": 0.6,
        "max_context_tokens": 128000,
        "supports_tool_calling": True,
        "complexity": "low"
    },
    {
        "provider": "openai", 
        "model_name": "gpt-4o",
        "cost_per_1m_input_tokens": 2.5,
        "cost_per_1m_output_tokens": 10.0,
        "max_context_tokens": 128000,
        "supports_tool_calling": True,
        "complexity": "high"
    }
]

base_request = {
    "models": models,
    "prompt": "Write Python code to analyze customer data"
}

# Headers for authentication
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}

# Test cost-focused vs performance-focused
configs = [
    {"cost_bias": 0.1, "name": "cost-optimized"},
    {"cost_bias": 0.9, "name": "performance-focused"}
]

for config in configs:
    try:
        response = requests.post(
            f"{BASE_URL}/api/v1/select-model",
            json={
                **base_request,
                "cost_bias": config["cost_bias"]
            },
            headers=headers,
            timeout=TIMEOUT
        )
        
        # Check if request was successful
        if response.ok:
            result = response.json()
            print(f"{config['name']}: {result['provider']}/{result['model']}")
        else:
            print(f"Error for {config['name']}: HTTP {response.status_code} - {response.text}")
            
    except requests.exceptions.Timeout:
        print(f"Timeout error for {config['name']}: Request took longer than {TIMEOUT} seconds")
    except requests.exceptions.ConnectionError:
        print(f"Connection error for {config['name']}: Unable to connect to {BASE_URL}")
    except requests.exceptions.RequestException as e:
        print(f"Request error for {config['name']}: {e}")
    except Exception as e:
        print(f"Unexpected error for {config['name']}: {e}")

Real-World Integration Patterns

1. Use Your Own Provider Accounts

// Define your available models with your own pricing
const availableModels = [
  {
    provider: "openai",
    model_name: "gpt-4o-mini", 
    cost_per_1m_input_tokens: 0.15,
    cost_per_1m_output_tokens: 0.6,
    max_context_tokens: 128000,
    supports_tool_calling: true
  },
  {
    provider: "anthropic",
    model_name: "claude-3-5-sonnet-20241022",
    cost_per_1m_input_tokens: 3.0,
    cost_per_1m_output_tokens: 15.0,
    max_context_tokens: 200000,
    supports_tool_calling: true
  }
];

// Get intelligent selection
const selection = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: availableModels,
    prompt: userMessage
  })
});

const result = await selection.json();

// Route to your own provider accounts
if (result.provider === "openai") {
  const completion = await yourOpenAI.chat.completions.create({
    model: result.model,
    messages: [{ role: "user", content: userMessage }]
  });
} else if (result.provider === "anthropic") {
  const completion = await yourAnthropic.messages.create({
    model: result.model,
    messages: [{ role: "user", content: userMessage }],
    max_tokens: 4096
  });
}

2. On-Premise Model Routing

// Tell Adaptive about your local models (plus a cloud fallback)
const res = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [
      { provider: "local", model_name: "llama-3-8b" },
      { provider: "local", model_name: "llama-3-70b" },
      { provider: "openai", model_name: "gpt-4" } // Cloud fallback
    ],
    prompt: userMessage
  })
});
const selection = await res.json();

// Route to the right infrastructure by provider
if (selection.provider === "local") {
  await yourLocalServer.infer({ model: selection.model, messages: [{ role: "user", content: userMessage }] });
} else if (selection.provider === "openai") {
  await yourOpenAI.chat.completions.create({ model: selection.model, messages: [{ role: "user", content: userMessage }] });
}

3. Enterprise Contract Optimization

// Maximize usage of your enterprise contracts
const res = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [
      { provider: "anthropic" }, // Your enterprise contract
      { provider: "openai" },    // Your enterprise contract  
      { provider: "google" }     // Pay-per-use fallback
    ],
    prompt: userMessage,
    cost_bias: 0.8
  })
});
const selection = await res.json();

// Always use your own accounts
const client = yourProviderClients[selection.provider];
const completion = await client.create({
  model: selection.model,
  messages: [{ role: "user", content: userMessage }]
});

4. Data Privacy & Compliance
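
This pattern and the Common Patterns section below call a selectModel helper that isn't defined elsewhere on this page; a minimal sketch, assuming the same endpoint and auth header as the earlier examples:

// Minimal selectModel helper (a sketch, not an official SDK)
async function selectModel(request) {
  const res = await fetch('https://llmadaptive.uk/api/v1/select-model', {
    method: 'POST',
    headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
    body: JSON.stringify(request)
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  return res.json(); // { provider, model, alternatives? }
}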

// Keep sensitive data local while getting smart routing
const selection = await selectModel({
  models: [
    { provider: "local", model_name: "llama-3-70b" },
    { provider: "local", model_name: "llama-3-8b" }
  ],
  prompt: "NON_SENSITIVE_TASK_DESCRIPTION",
  // Don't send actual sensitive data to Adaptive
});

// Run inference on your secure infrastructure
if (selection.model === "llama-3-70b") {
  // Use your high-end local model
  const result = await yourLocalGPU.infer(actualSensitiveData);
} else {
  // Use your efficient local model
  const result = await yourLocalCPU.infer(actualSensitiveData);
}

Understanding the Response

What You Get Back

{
  "provider": "anthropic",
  "model": "claude-3-5-sonnet-20241022",
  "alternatives": [
    { "provider": "openai", "model": "gpt-4o" }
  ]
}

Key Insights

  • provider - Which API service should be called
  • model - The specific model identifier to use with that provider
  • alternatives - Fallback options if the primary selection is unavailable (see the sketch below)
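
A sketch of walking the alternatives as fallbacks; callProvider is a hypothetical router over your own provider clients (pattern 3 above shows one way to build it):

// Try the primary selection, then each alternative, until one succeeds
const candidates = [{ provider: result.provider, model: result.model }, ...(result.alternatives ?? [])];
let completion;
for (const candidate of candidates) {
  try {
    // callProvider is hypothetical: route to your own client for this provider
    completion = await callProvider(candidate.provider, candidate.model, userMessage);
    break; // first provider that responds wins
  } catch (err) {
    console.warn(`${candidate.provider}/${candidate.model} failed, trying next`, err);
  }
}
if (!completion) throw new Error('All providers failed');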

Common Patterns

Before/After Comparison

// See what changes with different parameters
// ('request' is your base { models, prompt } payload)
const baseline = await selectModel(request);
const withConstraints = await selectModel({
  ...request,
  cost_bias: 0.1
});

console.log(`Baseline: ${baseline.model}`);
console.log(`Cost-optimized: ${withConstraints.model}`);

Validate Your Setup

// Make sure your routing rules work
const shouldUseCheap = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [{ provider: "openai", model_name: "gpt-4o-mini" }, { provider: "openai", model_name: "gpt-4o" }],
    prompt: "Hi"
  })
}).then(r => r.json());

const shouldUseExpensive = await fetch('https://llmadaptive.uk/api/v1/select-model', {
  method: 'POST',
  headers: { 'X-Stainless-API-Key': adaptiveKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    models: [{ provider: "openai", model_name: "gpt-4o-mini" }, { provider: "openai", model_name: "gpt-4o" }],
    prompt: "Analyze this complex dataset..."
  })
}).then(r => r.json());

// Verify different complexity tasks get different models

Authentication

Same as chat completions:
# Any of these work
-H "X-Stainless-API-Key: your-key"
-H "Authorization: Bearer your-key"

No Inference = Fast & Cheap

This endpoint:
  • Fast - No LLM inference, just routing logic
  • Cheap - Doesn’t count against token usage
  • Accurate - Uses the exact same selection logic as real completions
Perfect for testing, debugging, and cost planning without burning through your budget.