Chat Completions
POST /v1/chat/completions
curl --request POST \
  --url https://api.llmadaptive.uk/v1/chat/completions \
  --header 'Authorization: Bearer your-adaptive-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 1,
  "max_completion_tokens": 256,
  "stream": false,
  "model_router": {
    "cost_bias": 0.5,
    "models": ["openai:gpt-5-mini", "anthropic:claude-sonnet-4-5"],
    "complexity_threshold": 0.5,
    "token_threshold": 1000
  },
  "fallback": {
    "mode": "sequential"
  }
}'
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "model": "<string>",
  "provider": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "<string>",
        "content": "<string>",
        "tool_calls": [
          {}
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "cache_tier": "<string>"
  },
  "error": {
    "message": "<string>",
    "type": "<string>",
    "code": "<string>"
  }
}
💡 Quick Start: Same as the OpenAI API, but use model: "" for intelligent routing and automatic cost savings

30-Second Setup

1. Authentication: Use your Adaptive API key (either format works)
X-Stainless-API-Key: your-adaptive-api-key
# OR
Authorization: Bearer your-adaptive-api-key
2. Model Selection: Leave empty for smart routing
{
  "model": "",  // ← This enables intelligent routing
  "messages": [...]
}
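
Putting both steps together, a minimal working setup with the OpenAI SDK (assuming ADAPTIVE_API_KEY is set in your environment):

import OpenAI from 'openai';

// Point the standard OpenAI SDK at Adaptive's endpoint
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,
  baseURL: 'https://api.llmadaptive.uk/v1'
});

// Empty model string enables intelligent routing
const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(completion.choices[0].message.content);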
That’s it! Your requests automatically save 60-90% while maintaining quality.

Essential Parameters

model
string
required
For intelligent routing: Use "" (empty string) to automatically select the best model for cost and quality. For specific models: Use provider:model format like "anthropic:claude-sonnet-4-5" or "openai:gpt-5-mini".
messages
array
required
Array of message objects. Same format as OpenAI.
temperature
number
Creativity level: 0 = focused, 1 = balanced, 2 = creative. Default: 1
max_completion_tokens
integer
Maximum response length in tokens. Leave unset for automatic sizing.
stream
boolean
Enable streaming responses. Default: false

Smart Routing & Cost Control

model_router
object
Control intelligent routing to optimize cost and performance
fallback
object
Automatic fallback to backup providers when the primary provider fails

Core Parameters

max_tokens
integer
Deprecated - Maximum number of tokens to generate. Use max_completion_tokens instead.
max_completion_tokens
integer
Maximum number of tokens that can be generated for completion, including reasoning tokens.
stream
boolean
Whether to stream the response. Default: false
top_p
number
Nucleus sampling parameter between 0 and 1. Default: 1
frequency_penalty
number
Penalty for token frequency. Range: -2.0 to 2.0. Default: 0
presence_penalty
number
Penalty for token presence. Range: -2.0 to 2.0. Default: 0
n
integer
Number of chat completion choices to generate. Default: 1
seed
integer
Seed for deterministic sampling. Helps ensure reproducible results.
stop
string | array
Up to 4 sequences where the API will stop generating tokens.
user
string
Unique identifier for end-user to help detect abuse and improve caching.

Advanced Parameters

logprobs
boolean
Whether to return log probabilities of output tokens. Default: false
top_logprobs
integer
Number of most likely tokens to return at each position (0-20). Requires logprobs: true.
logit_bias
object
Modify likelihood of specified tokens. Maps token IDs to bias values (-100 to 100). See the sketch at the end of this section.
response_format
object
Format for model output. Supports JSON schema for structured outputs.
service_tier
string
Latency tier for processing. Options: auto, default, flex
store
boolean
Whether to store output for model distillation or evals. Default: false
metadata
object
Set of up to 16 key-value pairs for storing additional information about the request.
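
A combined sketch of the log-probability and bias parameters above. The token ID in logit_bias is hypothetical; real IDs are tokenizer-specific, so look them up for the model you target:

const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Pick a color' }],
  logprobs: true,                // return log probabilities for output tokens
  top_logprobs: 3,               // top 3 alternatives per position (requires logprobs: true)
  logit_bias: { '12345': -100 }  // hypothetical token ID; -100 effectively bans the token
});

// Inspect the alternatives considered for the first output token
console.log(completion.choices[0].logprobs?.content?.[0]?.top_logprobs);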

Audio and Multimodal

modalities
array
Output types to generate. Options: ["text"], ["audio"], or ["text", "audio"]
audio
object
Parameters for audio output when modalities includes "audio".
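
For audio output, a minimal sketch assuming OpenAI-style audio parameters and that routing selects an audio-capable model (the voice and format values follow OpenAI's conventions):

const completion = await openai.chat.completions.create({
  model: '',
  modalities: ['text', 'audio'],             // request text plus audio output
  audio: { voice: 'alloy', format: 'wav' },  // OpenAI-style audio options
  messages: [{ role: 'user', content: 'Say hello' }]
});

// Audio arrives base64-encoded alongside the text response
console.log(completion.choices[0].message.audio?.data?.length);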

Reasoning Models (o-series)

reasoning_effort
string
o-series models only - Effort level for reasoning: low, medium, or high
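
A short sketch that pins a reasoning model explicitly instead of routing (the model name is illustrative; any provider:model string pointing at an o-series model works):

const completion = await openai.chat.completions.create({
  model: 'openai:o3-mini',   // illustrative o-series model in provider:model format
  reasoning_effort: 'high',  // low | medium | high
  messages: [{ role: 'user', content: 'Prove that the square root of 2 is irrational.' }]
});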

Function Calling

tools
array
Array of tool definitions for function calling. Maximum 128 functions.
tool_choice
string | object
Controls tool usage: none, auto, required, or specific tool selection
parallel_tool_calls
boolean
Whether to enable parallel function calling. Default: true
function_call
string | object
Deprecated - Use tool_choice instead. Controls function calling behavior.
web_search_options
object
Options for web search tool functionality.

Streaming Options

stream_options
object
Additional options for streaming responses.
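
For example, include_usage adds a final chunk carrying token usage, which streaming responses otherwise omit (a sketch following OpenAI's streaming semantics):

const stream = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Summarize streaming in one line' }],
  stream: true,
  stream_options: { include_usage: true }  // emit a final usage-only chunk
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
  if (chunk.usage) {
    console.log(`\nTokens used: ${chunk.usage.total_tokens}`);  // present only on the final chunk
  }
}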

Prediction and Caching

prediction
object
Static predicted output content for regeneration scenarios.
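
A sketch assuming OpenAI-style predicted outputs: when regenerating content that will mostly match an existing draft, passing the draft as a prediction can reduce latency (provided the routed model supports it):

const existingCode = 'function add(a, b) { return a + b; }';

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: `Rename the function to sum, change nothing else:\n${existingCode}` }
  ],
  prediction: { type: 'content', content: existingCode }  // draft the output is expected to largely match
});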

Adaptive-Specific Parameters

model_router
object
Configuration for intelligent routing and provider selection.
fallback
object
Configuration for provider fallback behavior. Fallback is disabled by default (empty or omitted) and enabled when a mode is specified.

Response

id
string
Unique identifier for the completion
object
string
Object type, always chat.completion
created
integer
Unix timestamp of creation
model
string
Model used for the completion
provider
string
Adaptive addition: Which provider was selected (e.g., “openai”, “anthropic”)
choices
array
Array of completion choices
usage
object
Token usage statistics

Live Examples

💡 Try These Examples: Copy-paste ready code that works immediately. Each example shows the cost savings in action.

1. Simple Chat → 97% Cost Savings

Cost Comparison: Simple question routes to Gemini Flash. OpenAI Direct: $3.00 per 1M input tokens. Adaptive Smart: Gemini Flash ($0.075/1M) + overhead ($0.10/1M input, $0.20/1M output). Savings: 97% (total ~$0.10/1M vs $3.00/1M).
const completion = await openai.chat.completions.create({
  model: '',  // ← Smart routing enabled
  messages: [
    { role: 'user', content: 'Explain quantum computing simply' }
  ],
});

console.log(completion.choices[0].message.content);
console.log(`Provider used: ${completion.provider}`);  // See which was chosen
console.log(`Cache tier: ${completion.usage.cache_tier || 'none'}`);

2. Complex Analysis → 85% Cost Savings

Cost Comparison: Complex task routes to DeepSeek Reasoner. OpenAI Direct: $15.00 per 1M input tokens. Adaptive Smart: DeepSeek ($1.00/1M) + overhead ($0.10/1M input, $0.20/1M output). Savings: 85% (total ~$1.30/1M vs $15.00/1M).
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { 
      role: 'user', 
      content: 'Analyze the economic implications of quantum computing on cryptocurrency security, considering both short-term disruptions and long-term adaptations. Include specific recommendations for blockchain protocols.' 
    }
  ],
});

// Complex prompts automatically route to premium models when needed
console.log(`Routed to: ${completion.provider}`);  // Likely Claude or DeepSeek

With Intelligent Routing Configuration

// Simple provider selection
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list' }
  ],
  model_router: {
    models: [
      "anthropic:claude-sonnet-4-5", // Premium Anthropic option
      "openai:gpt-5-mini" // Specific OpenAI model
    ],
    cost_bias: 0.2, // Prefer cost savings
    complexity_threshold: 0.3,
    token_threshold: 1000
  },
  fallback: {
    mode: 'sequential'  // Enabled by specifying mode
  }
});

Customizing Standard Providers

You can also customize standard providers (OpenAI, Anthropic, etc.) with custom base URLs, API keys, and settings:
// Override standard provider configuration
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Hello from custom OpenAI endpoint!' }
  ],
  model_router: {
      models: [
        "openai:gpt-5-mini", // Will use custom config below
        "anthropic:claude-sonnet-4-5" // Will also use custom config
      ]
  },
  
  // Custom configurations for standard providers
  provider_configs: {
    "openai": {
      base_url: "https://my-custom-openai-proxy.com/v1",
      api_key: "sk-my-custom-openai-key",
      timeout_ms: 60000,
      headers: {
        "X-Proxy-Key": "proxy-auth-123"
      }
    },
    "anthropic": {
      base_url: "https://my-anthropic-proxy.com/v1",
      api_key: "sk-ant-custom-key",
      timeout_ms: 45000
    }
  }
});

Streaming Response

const stream = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Tell me a story about space exploration' }
  ],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Function Calling

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'What\'s the weather like in San Francisco?' }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: {
              type: 'string',
              description: 'City and state, e.g. San Francisco, CA'
            }
          },
          required: ['location']
        }
      }
    }
  ]
});

Vision (Multimodal)

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What\'s in this image?' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg'
          }
        }
      ]
    }
  ],
  modalities: ['text'] // Can also include 'audio' for supported models
});

Advanced Configuration with All Parameters

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain machine learning concepts' }
  ],
  
  // Core parameters
  temperature: 0.7,
  max_completion_tokens: 1000,
  top_p: 0.9,
  frequency_penalty: 0.1,
  presence_penalty: 0.1,
  n: 1,
  seed: 12345,
  stop: ['\n\n'],
  user: 'user-123',
  
  // Advanced parameters
  logprobs: true,
  top_logprobs: 5,
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'explanation',
      schema: {
        type: 'object',
        properties: {
          concept: { type: 'string' },
          explanation: { type: 'string' }
        }
      }
    }
  },
  service_tier: 'auto',
  store: false,
  metadata: {
    session_id: 'abc123',
    user_type: 'premium'
  },
  
  // Reasoning models (o-series)
  reasoning_effort: 'medium',
  
  // Function calling
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge',
        description: 'Search knowledge base for information',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string' }
          }
        }
      }
    }
  ],
  tool_choice: 'auto',
  parallel_tool_calls: true,
  
  // Streaming
  stream: false,
  stream_options: {
    include_usage: true
  },
  
  // Adaptive-specific
  model_router: {
    models: [
      "openai:gpt-5-mini", // Use OpenAI gpt-5 family
      "anthropic:claude-sonnet-4-5", // Specific model
    ],
    cost_bias: 0.3,
    complexity_threshold: 0.5,
    token_threshold: 2000
  },
  
  fallback: {
    mode: 'sequential'  // Enabled by specifying mode
  }
});

Response Examples

Cache Tier Tracking

The usage.cache_tier field shows which cache served your response:
// Semantic cache hit
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8, 
    "total_tokens": 18,
    "cache_tier": "semantic_exact"
  }
}

// No cache used
{
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 10,
    "total_tokens": 18
    // cache_tier omitted
  }
}

Error Handling & Troubleshooting

🛠️ Quick Fix Guide: Most issues have simple solutions. Here’s how to resolve them fast.

⚡ Instant Fixes

Problem: {"error": {"message": "Invalid API key", "type": "authentication_error"}}Instant Solutions:
  1. Check header format:
    // ✅ Correct
    headers: { "X-Stainless-API-Key": "your-adaptive-key" }
    // OR
    headers: { "Authorization": "Bearer your-adaptive-key" }
    
    // ❌ Wrong
    headers: { "X-API-Key": "your-key" }  // Wrong header name
    
  2. Verify your key: Copy-paste from llmadaptive.uk dashboard
  3. Check environment variables:
    echo $ADAPTIVE_API_KEY  # Should show your key
    
Working Example:
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,  // ← Make sure this is set
  baseURL: 'https://api.llmadaptive.uk/v1'
});
Problem: {"error": {"message": "Invalid request", "type": "invalid_request_error"}}Common Causes & Fixes:
  1. Empty messages array:
    // ❌ Wrong
    messages: []
    
    // ✅ Correct  
    messages: [{ role: "user", content: "Hello!" }]
    
  2. Missing required fields:
    // ❌ Wrong
    { role: "user" }  // Missing content
    
    // ✅ Correct
    { role: "user", content: "Your message here" }
    
  3. Invalid model_router config:
    // ❌ Wrong - model missing required fields
    model_router: {
      models: [{ provider: "unknown-provider" }]  // Provider not supported
    }
    
    // ✅ Correct - use supported providers
    model_router: {
      models: ["openai:gpt-5-mini", "anthropic:claude-sonnet-4-5"]
    }
    
Problem: {"error": {"message": "Rate limit exceeded", "type": "rate_limit_error"}}Immediate Actions:
  1. Wait and retry: Rate limits reset every minute
  2. Implement exponential backoff:
    async function callWithRetry(requestFn, maxRetries = 3) {
      for (let i = 0; i < maxRetries; i++) {
        try {
          return await requestFn();
        } catch (error) {
          if (error.status === 429 && i < maxRetries - 1) {
            await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
          } else {
            throw error;
          }
        }
      }
    }
    
  3. Upgrade your plan at llmadaptive.uk for higher limits
  4. Use caching to reduce requests:
    semantic_cache: { enabled: true }
    
Problem: Custom provider not working or failing
Checklist:
  1. Provider configuration must be complete:
    provider_configs: {
      "my-provider": {
        base_url: "https://api.example.com/v1",  // ✅ Required
        api_key: "sk-your-key",                  // ✅ Required
        auth_type: "bearer",                     // ✅ Good practice
        timeout_ms: 30000                       // ✅ Recommended
      }
    }
    
  2. Model definition must include all fields:
    model_router: {
      models: [{
        provider: "my-provider",
        model_name: "model-name",                    // ✅ Required
        cost_per_1m_input_tokens: 2.0,              // ✅ Required
        cost_per_1m_output_tokens: 6.0,             // ✅ Required
        context_length: 16000,                      // ✅ Required
        max_completion_tokens: 4000,                // ✅ Required
        supported_parameters: ["temperature", "top_p"],  // ✅ Required
        task_type: "Text Generation",               // ✅ Required
        complexity: "medium"                        // ✅ Required
      }]
    }
    
  3. Test the provider directly first:
    curl https://api.your-provider.com/v1/chat/completions \
      -H "Authorization: Bearer your-key" \
      -d '{"model": "model-name", "messages": [...]}'
    

Error Response Format

error
object
Standard error object format

🚨 Emergency Troubleshooting

Service Down?
  1. Check our status page: status.llmadaptive.uk
  2. Join our Discord: discord.gg/adaptive
  3. Email support: info@llmadaptive.uk

Rate Limits

Plan         Requests per Minute    Tokens per Minute
Free         100                    10,000
Pro          1,000                  100,000
Enterprise   Custom                 Custom
Rate limits are applied per API key and reset every minute.

Best Practices

Model Selection

Use empty string "" for model to enable intelligent routing and cost savings

Cost Control

Use cost_bias parameter to balance cost vs performance for your use case
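
A sketch of tuning cost_bias per workload, assuming values run from 0 (prioritize cost) to 1 (prioritize capability), consistent with the cost_bias: 0.2 "prefer cost savings" example above:

// High-volume, low-stakes work: lean toward cheaper models
const summary = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Summarize this support ticket in one sentence.' }],
  model_router: { cost_bias: 0.1 }  // near 0 = favor cost
});

// Quality-critical work: lean toward stronger models
const analysis = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Review this contract clause for ambiguity.' }],
  model_router: { cost_bias: 0.9 }  // near 1 = favor capability
});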

Custom Providers

When using custom providers, always include their configuration in provider_configs

Error Handling

Always implement proper error handling for network and API failures
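
A minimal pattern, sketched with the OpenAI SDK's error class and the client configured earlier; adjust the status handling to your retry policy:

import OpenAI from 'openai';

try {
  const completion = await openai.chat.completions.create({
    model: '',
    messages: [{ role: 'user', content: 'Hello!' }]
  });
  console.log(completion.choices[0].message.content);
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    // API failure: the SDK surfaces the standard error fields
    console.error(error.status, error.type, error.message);
  } else {
    // Network or other non-API failure
    console.error('Request failed:', error);
  }
}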