Chat Completions
POST /v1/chat/completions
curl --request POST \
  --url https://api.llmadaptive.uk/v1/chat/completions \
  --header 'Authorization: Bearer your-adaptive-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 1,
  "max_completion_tokens": 256,
  "stream": false,
  "model_router": {
    "cost_bias": 0.5,
    "models": ["openai:gpt-5-mini", "anthropic:claude-sonnet-4-5"],
    "complexity_threshold": 0.5,
    "token_threshold": 1000
  },
  "fallback": {
    "mode": "sequential"
  }
}'
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "model": "<string>",
  "provider": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "<string>",
        "content": "<string>",
        "tool_calls": [
          {}
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "cache_tier": "<string>"
  },
  "error": {
    "message": "<string>",
    "type": "<string>",
    "code": "<string>"
  }
}
💡 Quick Start: Same as the OpenAI API, but use model: "" for intelligent routing and automatic cost savings

30-Second Setup

1. Authentication: Use your Adaptive API key (either format works)
X-Stainless-API-Key: your-adaptive-api-key
# OR
Authorization: Bearer your-adaptive-api-key
2. Model Selection: Leave empty for smart routing
{
  "model": "",  // ← This enables intelligent routing
  "messages": [...]
}
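
Putting both steps together, a minimal working setup with the OpenAI SDK (assuming ADAPTIVE_API_KEY is set in your environment):

import OpenAI from 'openai';

// Point the standard OpenAI SDK at Adaptive's endpoint
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,
  baseURL: 'https://api.llmadaptive.uk/v1'
});

// Empty model string enables intelligent routing
const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(completion.choices[0].message.content);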
That’s it! Your requests automatically save 60-90% while maintaining quality.

Essential Parameters

model
string
required
For intelligent routing: Use "" (empty string) to automatically select the best model for cost and quality. For specific models: Use provider:model format like "anthropic:claude-sonnet-4-5" or "openai:gpt-5-mini".
messages
array
required
Array of message objects. Same format as OpenAI.
temperature
number
Creativity level: 0 = focused, 1 = balanced, 2 = creative. Default: 1
max_completion_tokens
integer
Maximum response length in tokens. Leave unset for automatic sizing.
stream
boolean
Enable streaming responses. Default: false

Smart Routing & Cost Control

model_router
object
Control intelligent routing to optimize cost and performance
fallback
object
Automatic fallback to backup providers when the primary provider fails

Core Parameters

max_tokens
integer
Deprecated - Maximum number of tokens to generate. Use max_completion_tokens instead.
max_completion_tokens
integer
Maximum number of tokens that can be generated for completion, including reasoning tokens.
stream
boolean
Whether to stream the response. Default: false
top_p
number
Nucleus sampling parameter between 0 and 1. Default: 1
frequency_penalty
number
Penalty for token frequency. Range: -2.0 to 2.0. Default: 0
presence_penalty
number
Penalty for token presence. Range: -2.0 to 2.0. Default: 0
n
integer
Number of chat completion choices to generate. Default: 1
seed
integer
Seed for deterministic sampling. Helps ensure reproducible results.
stop
string | array
Up to 4 sequences where the API will stop generating tokens.
user
string
Unique identifier for end-user to help detect abuse and improve caching.

Advanced Parameters

logprobs
boolean
Whether to return log probabilities of output tokens. Default: false
top_logprobs
integer
Number of most likely tokens to return at each position (0-20). Requires logprobs: true.
logit_bias
object
Modify likelihood of specified tokens. Maps token IDs to bias values (-100 to 100). See the sketch at the end of this section.
response_format
object
Format for model output. Supports JSON schema for structured outputs.
service_tier
string
Latency tier for processing. Options: auto, default, flex
store
boolean
Whether to store output for model distillation or evals. Default: false
metadata
object
Set of up to 16 key-value pairs for storing additional information about the request.
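
A combined sketch of the log-probability and bias parameters above. The token ID in logit_bias is hypothetical; real IDs are tokenizer-specific, so look them up for the model you target:

const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Pick a color' }],
  logprobs: true,                // return log probabilities for output tokens
  top_logprobs: 3,               // top 3 alternatives per position (requires logprobs: true)
  logit_bias: { '12345': -100 }  // hypothetical token ID; -100 effectively bans the token
});

// Inspect the alternatives considered for the first output token
console.log(completion.choices[0].logprobs?.content?.[0]?.top_logprobs);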

Audio and Multimodal

modalities
array
Output types to generate. Options: ["text"], ["audio"], or ["text", "audio"]
audio
object
Parameters for audio output when modalities includes "audio".
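
For audio output, a minimal sketch assuming OpenAI-style audio parameters and that routing selects an audio-capable model (the voice and format values follow OpenAI's conventions):

const completion = await openai.chat.completions.create({
  model: '',
  modalities: ['text', 'audio'],             // request text plus audio output
  audio: { voice: 'alloy', format: 'wav' },  // OpenAI-style audio options
  messages: [{ role: 'user', content: 'Say hello' }]
});

// Audio arrives base64-encoded alongside the text response
console.log(completion.choices[0].message.audio?.data?.length);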

Reasoning Models (o-series)

reasoning_effort
string
o-series models only - Effort level for reasoning: low, medium, or high
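
A short sketch that pins a reasoning model explicitly instead of routing (the model name is illustrative; any provider:model string pointing at an o-series model works):

const completion = await openai.chat.completions.create({
  model: 'openai:o3-mini',   // illustrative o-series model in provider:model format
  reasoning_effort: 'high',  // low | medium | high
  messages: [{ role: 'user', content: 'Prove that the square root of 2 is irrational.' }]
});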

Function Calling

tools
array
Array of tool definitions for function calling. Maximum 128 functions.
tool_choice
string | object
Controls tool usage: none, auto, required, or specific tool selection
parallel_tool_calls
boolean
Whether to enable parallel function calling. Default: true
function_call
string | object
Deprecated - Use tool_choice instead. Controls function calling behavior.
web_search_options
object
Options for web search tool functionality.

Streaming Options

stream_options
object
Additional options for streaming responses.
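
For example, include_usage adds a final chunk carrying token usage, which streaming responses otherwise omit (a sketch following OpenAI's streaming semantics):

const stream = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Summarize streaming in one line' }],
  stream: true,
  stream_options: { include_usage: true }  // emit a final usage-only chunk
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
  if (chunk.usage) {
    console.log(`\nTokens used: ${chunk.usage.total_tokens}`);  // present only on the final chunk
  }
}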

Prediction and Caching

prediction
object
Static predicted output content for regeneration scenarios.
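
A sketch assuming OpenAI-style predicted outputs: when regenerating content that will mostly match an existing draft, passing the draft as a prediction can reduce latency (provided the routed model supports it):

const existingCode = 'function add(a, b) { return a + b; }';

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: `Rename the function to sum, change nothing else:\n${existingCode}` }
  ],
  prediction: { type: 'content', content: existingCode }  // draft the output is expected to largely match
});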

Adaptive-Specific Parameters

model_router
object
Configuration for intelligent routing and provider selection.
fallback
object
Configuration for provider fallback behavior. Fallback is disabled by default (empty or omitted) and enabled when a mode is specified.

Response

id
string
Unique identifier for the completion
object
string
Object type, always chat.completion
created
integer
Unix timestamp of creation
model
string
Model used for the completion
provider
string
Adaptive addition: Which provider was selected (e.g., “openai”, “anthropic”)
choices
array
Array of completion choices
usage
object
Token usage statistics

Live Examples

💡 Try These Examples: Copy-paste ready code that works immediately. Each example shows the cost savings in action.

1. Simple Chat → 97% Cost Savings

Cost Comparison: Simple question routes to Gemini Flash. OpenAI Direct: $3.00 per 1M input tokens. Adaptive Smart: Gemini Flash ($0.075/1M) + overhead ($0.10/1M input, $0.20/1M output). Savings: 97% (total ~$0.10/1M vs $3.00/1M).
const completion = await openai.chat.completions.create({
  model: '',  // ← Smart routing enabled
  messages: [
    { role: 'user', content: 'Explain quantum computing simply' }
  ],
});

console.log(completion.choices[0].message.content);
console.log(`Provider used: ${completion.provider}`);  // See which was chosen
console.log(`Cache tier: ${completion.usage.cache_tier || 'none'}`);

2. Complex Analysis → 85% Cost Savings

Cost Comparison: Complex task routes to DeepSeek Reasoner. OpenAI Direct: $15.00 per 1M input tokens. Adaptive Smart: DeepSeek ($1.00/1M) + overhead ($0.10/1M input, $0.20/1M output). Savings: 85% (total ~$1.30/1M vs $15.00/1M).
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { 
      role: 'user', 
      content: 'Analyze the economic implications of quantum computing on cryptocurrency security, considering both short-term disruptions and long-term adaptations. Include specific recommendations for blockchain protocols.' 
    }
  ],
});

// Complex prompts automatically route to premium models when needed
console.log(`Routed to: ${completion.provider}`);  // Likely Claude or DeepSeek

With Intelligent Routing Configuration

// Simple provider selection
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list' }
  ],
  model_router: {
    models: [
      "anthropic:claude-sonnet-4-5", // Premium Anthropic option
      "openai:gpt-5-mini" // Specific OpenAI model
    ],
    cost_bias: 0.2, // Prefer cost savings
    complexity_threshold: 0.3,
    token_threshold: 1000
  },
  fallback: {
    mode: 'sequential'  // Enabled by specifying mode
  }
});

Customizing Standard Providers

You can also customize standard providers (OpenAI, Anthropic, etc.) with custom base URLs, API keys, and settings:
// Override standard provider configuration
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Hello from custom OpenAI endpoint!' }
  ],
  model_router: {
      models: [
        "openai:gpt-5-mini", // Will use custom config below
        "anthropic:claude-sonnet-4-5" // Will also use custom config
      ]
  },
  
  // Custom configurations for standard providers
  provider_configs: {
    "openai": {
      base_url: "https://my-custom-openai-proxy.com/v1",
      api_key: "sk-my-custom-openai-key",
      timeout_ms: 60000,
      headers: {
        "X-Proxy-Key": "proxy-auth-123"
      }
    },
    "anthropic": {
      base_url: "https://my-anthropic-proxy.com/v1",
      api_key: "sk-ant-custom-key",
      timeout_ms: 45000
    }
  }
});

Streaming Response

const stream = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Tell me a story about space exploration' }
  ],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Function Calling

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'What\'s the weather like in San Francisco?' }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: {
              type: 'string',
              description: 'City and state, e.g. San Francisco, CA'
            }
          },
          required: ['location']
        }
      }
    }
  ]
});

Vision (Multimodal)

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What\'s in this image?' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg'
          }
        }
      ]
    }
  ],
  modalities: ['text'] // Can also include 'audio' for supported models
});

Advanced Configuration with All Parameters

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain machine learning concepts' }
  ],
  
  // Core parameters
  temperature: 0.7,
  max_completion_tokens: 1000,
  top_p: 0.9,
  frequency_penalty: 0.1,
  presence_penalty: 0.1,
  n: 1,
  seed: 12345,
  stop: ['\n\n'],
  user: 'user-123',
  
  // Advanced parameters
  logprobs: true,
  top_logprobs: 5,
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'explanation',
      schema: {
        type: 'object',
        properties: {
          concept: { type: 'string' },
          explanation: { type: 'string' }
        }
      }
    }
  },
  service_tier: 'auto',
  store: false,
  metadata: {
    session_id: 'abc123',
    user_type: 'premium'
  },
  
  // Reasoning models (o-series)
  reasoning_effort: 'medium',
  
  // Function calling
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge',
        description: 'Search knowledge base for information',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string' }
          }
        }
      }
    }
  ],
  tool_choice: 'auto',
  parallel_tool_calls: true,
  
  // Streaming
  stream: false,
  stream_options: {
    include_usage: true
  },
  
  // Adaptive-specific
  model_router: {
    models: [
      "openai:gpt-5-mini", // Use OpenAI gpt-5 family
      "anthropic:claude-sonnet-4-5", // Specific model
    ],
    cost_bias: 0.3,
    complexity_threshold: 0.5,
    token_threshold: 2000
  },
  
  fallback: {
    mode: 'sequential'  // Enabled by specifying mode
  }
});

Response Examples

Cache Tier Tracking

The usage.cache_tier field shows which cache served your response:
// Semantic cache hit
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8, 
    "total_tokens": 18,
    "cache_tier": "semantic_exact"
  }
}

// No cache used
{
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 10,
    "total_tokens": 18
    // cache_tier omitted
  }
}

Error Handling & Troubleshooting

🛠️ Quick Fix Guide: Most issues have simple solutions. Here’s how to resolve them fast.

⚡ Instant Fixes

Problem: {"error": {"message": "Invalid API key", "type": "authentication_error"}}Instant Solutions:
  1. Check header format:
    // ✅ Correct
    headers: { "X-Stainless-API-Key": "your-adaptive-key" }
    // OR
    headers: { "Authorization": "Bearer your-adaptive-key" }
    
    // ❌ Wrong
    headers: { "X-API-Key": "your-key" }  // Wrong header name
    
  2. Verify your key: Copy-paste from llmadaptive.uk dashboard
  3. Check environment variables:
    echo $ADAPTIVE_API_KEY  # Should show your key
    
Working Example:
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,  // ← Make sure this is set
  baseURL: 'https://api.llmadaptive.uk/v1'
});
Problem: {"error": {"message": "Invalid request", "type": "invalid_request_error"}}Common Causes & Fixes:
  1. Empty messages array:
    // ❌ Wrong
    messages: []
    
    // ✅ Correct  
    messages: [{ role: "user", content: "Hello!" }]
    
  2. Missing required fields:
    // ❌ Wrong
    { role: "user" }  // Missing content
    
    // ✅ Correct
    { role: "user", content: "Your message here" }
    
  3. Invalid model_router config:
    // ❌ Wrong - model missing required fields
    model_router: {
      models: [{ provider: "unknown-provider" }]  // Provider not supported
    }
    
    // ✅ Correct - use supported providers
    model_router: {
      models: ["openai:gpt-5-mini", "anthropic:claude-sonnet-4-5"]
    }
    
Problem: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}Immediate Actions:
  1. Wait and retry: Rate limits reset every minute
  2. Implement exponential backoff:
    async function callWithRetry(requestFn, maxRetries = 3) {
      for (let i = 0; i < maxRetries; i++) {
        try {
          return await requestFn();
        } catch (error) {
          if (error.status === 429 && i < maxRetries - 1) {
            await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
          } else {
            throw error;
          }
        }
      }
    }
    
  3. Upgrade your plan at llmadaptive.uk for higher limits
  4. Use caching to reduce requests:
    semantic_cache: { enabled: true }
    
Problem: Custom provider not working or failing
Checklist:
  1. Provider configuration must be complete:
    provider_configs: {
      "my-provider": {
        base_url: "https://api.example.com/v1",  // ✅ Required
        api_key: "sk-your-key",                  // ✅ Required
        auth_type: "bearer",                     // ✅ Good practice
        timeout_ms: 30000                       // ✅ Recommended
      }
    }
    
  2. Model definition must include all fields:
    model_router: {
      models: [{
        provider: "my-provider",
        model_name: "model-name",                    // ✅ Required
        cost_per_1m_input_tokens: 2.0,              // ✅ Required
        cost_per_1m_output_tokens: 6.0,             // ✅ Required
        context_length: 16000,                      // ✅ Required
        max_completion_tokens: 4000,                // ✅ Required
        supported_parameters: ["temperature", "top_p"],  // ✅ Required
        task_type: "Text Generation",               // ✅ Required
        complexity: "medium"                        // ✅ Required
      }]
    }
    
  3. Test the provider directly first:
    curl https://api.your-provider.com/v1/chat/completions \
      -H "Authorization: Bearer your-key" \
      -d '{"model": "model-name", "messages": [...]}'
    

Error Response Format

error
object
Standard error object format

🚨 Emergency Troubleshooting

Service Down?
  1. Check our status page: status.llmadaptive.uk
  2. Join our Discord: discord.gg/adaptive
  3. Email support: info@llmadaptive.uk

Rate Limits

Plan         Requests per Minute    Tokens per Minute
Free         100                    10,000
Pro          1,000                  100,000
Enterprise   Custom                 Custom
Rate limits are applied per API key and reset every minute.

Best Practices

Model Selection

Use empty string "" for model to enable intelligent routing and cost savings

Cost Control

Use cost_bias parameter to balance cost vs performance for your use case
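
A sketch of tuning cost_bias per workload, assuming values run from 0 (prioritize cost) to 1 (prioritize capability), consistent with the cost_bias: 0.2 "prefer cost savings" example above:

// High-volume, low-stakes work: lean toward cheaper models
const summary = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Summarize this support ticket in one sentence.' }],
  model_router: { cost_bias: 0.1 }  // near 0 = favor cost
});

// Quality-critical work: lean toward stronger models
const analysis = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Review this contract clause for ambiguity.' }],
  model_router: { cost_bias: 0.9 }  // near 1 = favor capability
});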

Custom Providers

When using custom providers, always include their configuration in provider_configs

Error Handling

Always implement proper error handling for network and API failures
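
A minimal pattern, sketched with the OpenAI SDK's error class and the client configured earlier; adjust the status handling to your retry policy:

import OpenAI from 'openai';

try {
  const completion = await openai.chat.completions.create({
    model: '',
    messages: [{ role: 'user', content: 'Hello!' }]
  });
  console.log(completion.choices[0].message.content);
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    // API failure: the SDK surfaces the standard error fields
    console.error(error.status, error.type, error.message);
  } else {
    // Network or other non-API failure
    console.error('Request failed:', error);
  }
}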