Adaptive’s AI-powered routing engine analyzes every request and automatically selects the optimal model from multiple providers based on complexity, cost, and performance requirements.

How It Works

Adaptive’s model routing is powered by a sophisticated evaluation system that analyzes model performance across diverse benchmarks:
1. Benchmark Clustering: We take top benchmarks and cluster their questions by embedding each one, creating semantic clusters that group similar tasks together.
2. Model Evaluation: Each LLM is evaluated on every cluster, generating performance profiles that show which models excel at specific types of tasks.
3. Inference Routing: When a prompt arrives, we embed it to find its matching cluster, then select the best-performing model for that cluster (see the sketch below).
4. Continuous Learning: The system continuously updates performance profiles as new models and benchmarks become available. Benchmarks are updated based on production workloads, and models are continuously re-evaluated to guard against performance degradation.
Coming Soon: Custom evaluations for each user, allowing you to define your own benchmarks and evaluation criteria for personalized model routing.
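
Conceptually, the inference-routing step works like the sketch below: embed the prompt, match it to the nearest cluster, then pick the model with the best profile for that cluster. This is a minimal illustration, not Adaptive's actual implementation; the cluster names, scores, and cosine helper are invented for the example.

// Sketch only: the cluster centroids, performance profiles, and scores
// below are invented for illustration; they are not Adaptive's internals.

// Per-cluster performance profiles, built offline from benchmark evaluation.
const profiles = {
  casual_chat:     { "gemini/gemini-2.5-flash-lite": 0.92, "openai/gpt-5-mini": 0.90 },
  code_generation: { "deepseek/deepseek-coder": 0.88, "openai/gpt-5-mini": 0.85 }
};

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function routePrompt(promptEmbedding, centroids) {
  // 1. Match the prompt to its closest semantic cluster.
  let best = null;
  for (const [cluster, centroid] of Object.entries(centroids)) {
    const score = cosine(promptEmbedding, centroid);
    if (!best || score > best.score) best = { cluster, score };
  }
  // 2. Return the model with the highest score on that cluster.
  const ranked = Object.entries(profiles[best.cluster])
    .sort(([, a], [, b]) => b - a);
  return ranked[0][0];
}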

Quick Start

Simply leave the model field empty to enable model routing:
const completion = await openai.chat.completions.create({
  model: "", // Empty enables model routing
  messages: [{ role: "user", content: "Hello!" }]
});

console.log(`Used provider: ${completion.provider}`);
Costs shown include Adaptive's routing overhead ($0.10 per 1M input tokens + $0.20 per 1M output tokens). With BYOK (your own API keys), you pay only the overhead. For example, a request using 2,000 input tokens and 500 output tokens incurs $0.0002 + $0.0001 = $0.0003 in overhead.

Simple Greeting

“Hello, how are you?”
Routes to: Gemini Flash
Cost: $0.10 per 1M tokens
Savings: 97% vs GPT-4

Code Generation

“Write a React component…”
Routes to: DeepSeek Coder
Cost: $0.34 per 1M tokens
Savings: 87% vs GPT-4

Complex Analysis

“Analyze this dataset…”
Routes to: Claude Sonnet
Cost: $2.19 per 1M tokens
Savings: 72% vs GPT-4

Function Calling

“What’s the weather?” + tools
Routes to: GPT-5 Mini
Prioritizes function calling support
Smart tool-capable model selection

Configuration Options

Function Calling Support

When tools are provided, Adaptive automatically prioritizes models with function calling capabilities:
const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" }
        },
        required: ["location"]
      }
    }
  }]
});

// Automatically routes to models that support function calling

Control Cost vs Performance

Balance between cost savings and response quality:
const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Explain quantum physics" }],
  cost_bias: 0.3 // 0 = cheapest, 0.5 = balanced, 1 = best performance
});

Limit Available Providers

Restrict routing to specific providers or models:
const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Write a story" }],
  model_router: {
    models: ["openai/gpt-5-mini", "anthropic/claude-sonnet-4-5"] // Specify allowed models
  }
});

Routing Performance

Accuracy

94% accurate model selection based on prompt analysis

Speed

<1ms routing decision time, adding negligible latency to each request

Reliability

99.9% uptime with automatic failover mechanisms

Preview Routing Decisions

Want to see which model would be selected before making the request? Use our model selection preview:
// Preview which model would be selected
const response = await fetch('https://api.llmadaptive.uk/v1/select-model', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer apk_123456',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    prompt: 'Complex data analysis task',
    models: [
      'openai/gpt-5-mini',
      'anthropic/claude-sonnet-4-5',
      'gemini/gemini-2.5-flash-lite'
    ],
    cost_bias: 0.5
  })
});

const result = await response.json();
console.log(`Would select: ${result.selected_model.author}/${result.selected_model.model_name}`);
console.log(`Alternatives: ${JSON.stringify(result.alternatives)}`);
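
Once you have a preview result, one way to act on it is to restrict the real request to the previewed model. This is a sketch that reuses the model_router option shown above:

// Sketch: pin the real request to the previewed model by passing it as
// the only allowed model via the model_router option.
const selected = `${result.selected_model.author}/${result.selected_model.model_name}`;

const completion = await openai.chat.completions.create({
  model: "",
  messages: [{ role: "user", content: "Complex data analysis task" }],
  model_router: { models: [selected] }
});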

Response Information

Every response includes provider information:
{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "message": { "content": "Hello! How can I help you today?" }
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  },
  "provider": "gemini", // Which provider was selected
  "model": "gemini-flash" // Specific model used
}
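
Because every response carries provider, model, and usage, you can attribute spend per provider in your own code. The sketch below does exactly that; the rate table is a hypothetical placeholder, not Adaptive's pricing:

// Sketch: per-provider cost attribution from response metadata.
// The rate table is a hypothetical placeholder; substitute your real rates.
const ratesPerMillion = {
  gemini: { input: 0.10, output: 0.40 },
  openai: { input: 0.25, output: 2.00 }
};

const spendByProvider = {};

function recordUsage(completion) {
  const rate = ratesPerMillion[completion.provider] ?? { input: 0, output: 0 };
  const cost =
    (completion.usage.prompt_tokens / 1e6) * rate.input +
    (completion.usage.completion_tokens / 1e6) * rate.output;
  spendByProvider[completion.provider] =
    (spendByProvider[completion.provider] ?? 0) + cost;
}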

Advanced Use Cases

Enterprise Optimization

Custom provider contracts: Use model routing with your own API keys and enterprise pricing

Local Deployment

On-premise inference: Get cloud-quality routing decisions for local model deployments

A/B Testing

Model comparison: Preview different routing strategies before implementing them

Cost Monitoring

Budget control: Set cost thresholds and optimize spending automatically

Best Practices

Tip: Start with cost_bias: 0.3 for most applications. This provides excellent cost savings while maintaining high-quality responses.
Important: Always handle the case where no suitable model is found. The API will return an error with suggested alternatives.
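
A sketch of that error handling follows; the exact error shape depends on your SDK, so treat the fields read in the catch block as assumptions to adapt:

// Sketch: handle the case where routing finds no suitable model.
// The error fields used here are assumptions; adapt them to the error
// shape your SDK actually surfaces.
try {
  const completion = await openai.chat.completions.create({
    model: "",
    messages: [{ role: "user", content: "Write a story" }],
    model_router: { models: ["openai/gpt-5-mini"] }
  });
  console.log(completion.choices[0].message.content);
} catch (err) {
  console.error("Routing failed:", err.message);
  // Retry with unrestricted routing, or surface the error (and any
  // suggested alternatives it carries) to the caller.
}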

Next Steps