POST /api/v1/chat/completions
Chat Completions
curl --request POST \
  --url https://llmadaptive.uk/api/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "<string>",
  "messages": [
    {}
  ],
  "temperature": 123,
  "max_completion_tokens": 123,
  "stream": true,
  "model_router": {
    "cost_bias": 123,
    "models": [
      {}
    ],
    "complexity_threshold": 123,
    "token_threshold": 123
  },
  "fallback": {
    "mode": "<string>"
  },
  "prompt_response_cache": {}
}'
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "model": "<string>",
  "provider": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "<string>",
        "content": "<string>",
        "tool_calls": [
          {}
        ]
      },
      "finish_reason": "<string>"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123,
    "cache_tier": "<string>"
  },
  "error": {
    "message": "<string>",
    "type": "<string>",
    "code": "<string>"
  }
}
💡 Quick Start: Same as OpenAI API, but use model: "" for intelligent routing and automatic cost savings

30-Second Setup

1. Authentication: Use your Adaptive API key (either format works)
X-Stainless-API-Key: your-adaptive-api-key
# OR
Authorization: Bearer your-adaptive-api-key
2. Model Selection: Leave empty for smart routing
{
  "model": "",  // ← This enables intelligent routing
  "messages": [...]
}
That’s it! Your requests automatically save 60-80% while maintaining quality.
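
Putting both steps together, here is a minimal sketch using the official OpenAI Node SDK pointed at Adaptive (the base URL comes from the curl example above; the environment variable name is just a convention):

import OpenAI from 'openai';

// Reuse the standard OpenAI SDK; only the base URL and API key change
const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,      // your Adaptive API key
  baseURL: 'https://llmadaptive.uk/api/v1',  // Adaptive endpoint from the curl example
});

const completion = await openai.chat.completions.create({
  model: '',  // empty string enables intelligent routing
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(completion.choices[0].message.content);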

Essential Parameters

model
string
required
For intelligent routing: use "" (empty string) to automatically select the best model for cost and quality.
For specific models: use provider:model format, like "anthropic:claude-3-sonnet" or "openai:gpt-4".
messages
array
required
Array of message objects. Same format as OpenAI.
temperature
number
Creativity level: 0 = focused, 1 = balanced, 2 = creative. Default: 1
max_completion_tokens
integer
Maximum response length in tokens. Leave unset for automatic sizing.
stream
boolean
Enable streaming responses. Default: false
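
A quick sketch combining the essential parameters above (values are illustrative; assumes a client configured as in the 30-Second Setup):

const completion = await openai.chat.completions.create({
  model: '',                   // intelligent routing
  messages: [{ role: 'user', content: 'Summarize the plot of Hamlet in two sentences.' }],
  temperature: 0.3,            // lower = more focused
  max_completion_tokens: 500,  // cap the response length
  stream: false,               // set true to stream tokens as they arrive
});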

Smart Routing & Cost Control

model_router
object
Control intelligent routing to optimize cost and performance
fallback
object
Provider backup when primary fails
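
Both are ordinary request fields. A minimal sketch (the sequential fallback mode and cost_bias semantics follow the full examples later on this page):

const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'Hello!' }],
  model_router: {
    cost_bias: 0.2,  // lower values prefer cheaper models (see examples below)
  },
  fallback: {
    mode: 'sequential',  // try backup providers in order if the primary fails
  },
});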

Performance & Caching

prompt_response_cache
object
Semantic caching for similar requests (faster responses, lower costs)
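
Semantic caching is enabled per request. A minimal sketch, using the enabled and semantic_threshold fields shown in the full example later on this page:

const completion = await openai.chat.completions.create({
  model: '',
  messages: [{ role: 'user', content: 'What is semantic caching?' }],
  prompt_response_cache: {
    enabled: true,             // turn on semantic caching
    semantic_threshold: 0.85,  // similarity score required for a cache hit
  },
});

// usage.cache_tier reports which cache (if any) served the response
console.log(completion.usage.cache_tier || 'none');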

Response

id
string
Unique identifier for the completion
object
string
Object type, always chat.completion
created
integer
Unix timestamp of creation
model
string
Model used for the completion
provider
string
Adaptive addition: Which provider was selected (e.g., “openai”, “anthropic”)
choices
array
Array of completion choices
usage
object
Token usage statistics

Live Examples

💡 Try These Examples: Copy-paste ready code that works immediately. Each example shows the cost savings in action.

1. Simple Chat → 97% Cost Savings

Cost Comparison: Simple question routes to Gemini Flash
**OpenAI Direct:** $3.00 per 1M tokens
**Adaptive Smart:** $0.10 per 1M tokens
Savings: 97%
const completion = await openai.chat.completions.create({
  model: '',  // ← Smart routing enabled
  messages: [
    { role: 'user', content: 'Explain quantum computing simply' }
  ],
});

console.log(completion.choices[0].message.content);
console.log(`Provider used: ${completion.provider}`);  // See which was chosen
console.log(`Cache tier: ${completion.usage.cache_tier || 'none'}`);

2. Complex Analysis → 85% Cost Savings

Cost Comparison: Complex task routes to DeepSeek Reasoner
**OpenAI Direct:** $15.00 per 1M tokens
**Adaptive Smart:** $2.19 per 1M tokens
Savings: 85%
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { 
      role: 'user', 
      content: 'Analyze the economic implications of quantum computing on cryptocurrency security, considering both short-term disruptions and long-term adaptations. Include specific recommendations for blockchain protocols.' 
    }
  ],
});

// Complex prompts automatically route to premium models when needed
console.log(`Routed to: ${completion.provider}`);  // Likely Claude or DeepSeek

With Intelligent Routing Configuration

// Simple provider selection
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Write a Python function to sort a list' }
  ],
  model_router: {
    models: [
      { provider: "anthropic" }, // All Anthropic models
      { provider: "openai", model_name: "gpt-4" } // Specific OpenAI model
    ],
    cost_bias: 0.2, // Prefer cost savings
    complexity_threshold: 0.3,
    token_threshold: 1000
  },
  fallback: {
    mode: 'sequential'  // Enabled by specifying mode
  }
});

Using Custom Providers

// Custom provider example
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Explain machine learning concepts' }
  ],
  model_router: {
    models: [
      { provider: "openai" }, // Standard provider
      { 
        provider: "my-custom-llm", // Custom provider
        model_name: "custom-model-v1",
        cost_per_1m_input_tokens: 2.0,
        cost_per_1m_output_tokens: 6.0,
        max_context_tokens: 16000,
        max_output_tokens: 4000,
        supports_tool_calling: true,
        task_type: "Text Generation",
        complexity: "medium"
      }
    ],
    cost_bias: 0.5
  },
  
  // Configure each custom provider
  provider_configs: {
    "my-custom-llm": {
      base_url: "https://api.mycustom.com/v1",
      api_key: "sk-custom-api-key-here",
      auth_type: "bearer",
      headers: {
        "X-Custom-Header": "value"
      },
      timeout_ms: 45000
    }
  }
});

Customizing Standard Providers

You can also customize standard providers (OpenAI, Anthropic, etc.) with custom base URLs, API keys, and settings:
// Override standard provider configuration
const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Hello from custom OpenAI endpoint!' }
  ],
  model_router: {
    models: [
      { provider: "openai" }, // Will use custom config below
      { provider: "anthropic" } // Will also use custom config
    ]
  },
  
  // Custom configurations for standard providers
  provider_configs: {
    "openai": {
      base_url: "https://my-custom-openai-proxy.com/v1",
      api_key: "sk-my-custom-openai-key",
      timeout_ms: 60000,
      headers: {
        "X-Proxy-Key": "proxy-auth-123"
      }
    },
    "anthropic": {
      base_url: "https://my-anthropic-proxy.com/v1",
      api_key: "sk-ant-custom-key",
      timeout_ms: 45000
    }
  }
});

Streaming Response

const stream = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'Tell me a story about space exploration' }
  ],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Function Calling

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'user', content: 'What\'s the weather like in San Francisco?' }
  ],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get current weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: {
              type: 'string',
              description: 'City and state, e.g. San Francisco, CA'
            }
          },
          required: ['location']
        }
      }
    }
  ]
});

Vision (Multimodal)

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What\'s in this image?' },
        {
          type: 'image_url',
          image_url: {
            url: 'https://example.com/image.jpg'
          }
        }
      ]
    }
  ],
  modalities: ['text'] // Can also include 'audio' for supported models
});

Advanced Configuration with All Parameters

const completion = await openai.chat.completions.create({
  model: '',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain machine learning concepts' }
  ],
  
  // Core parameters
  temperature: 0.7,
  max_completion_tokens: 1000,
  top_p: 0.9,
  frequency_penalty: 0.1,
  presence_penalty: 0.1,
  n: 1,
  seed: 12345,
  stop: ['\n\n'],
  user: 'user-123',
  
  // Advanced parameters
  logprobs: true,
  top_logprobs: 5,
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'explanation',
      schema: {
        type: 'object',
        properties: {
          concept: { type: 'string' },
          explanation: { type: 'string' }
        }
      }
    }
  },
  service_tier: 'auto',
  store: false,
  metadata: {
    session_id: 'abc123',
    user_type: 'premium'
  },
  
  // Reasoning models (o-series)
  reasoning_effort: 'medium',
  
  // Function calling
  tools: [
    {
      type: 'function',
      function: {
        name: 'search_knowledge',
        description: 'Search knowledge base for information',
        parameters: {
          type: 'object',
          properties: {
            query: { type: 'string' }
          }
        }
      }
    }
  ],
  tool_choice: 'auto',
  parallel_tool_calls: true,
  
  // Streaming
  stream: false,
  stream_options: {
    include_usage: true
  },
  
  // Adaptive-specific
  model_router: {
    models: [
      { provider: "openai" }, // Use all OpenAI models
      { provider: "anthropic", model_name: "claude-3-sonnet-20240229" }, // Specific model
      // Custom model example (all params required):
      {
        provider: "my-custom-provider",
        model_name: "custom-model-v1",
        cost_per_1m_input_tokens: 5.0,
        cost_per_1m_output_tokens: 10.0,
        max_context_tokens: 32000,
        max_output_tokens: 2048,
        supports_tool_calling: false,
        task_type: "Text Generation",
        complexity: "medium"
      }
    ],
    cost_bias: 0.3,
    complexity_threshold: 0.5,
    token_threshold: 2000
  },
  
  provider_configs: {
    "my-custom-provider": {
      base_url: "https://api.custom.com/v1",
      api_key: "sk-custom-key-123",
      auth_type: "bearer",
      headers: {
        "Custom-Header": "custom-value"
      },
      timeout_ms: 30000,
      rate_limit_rpm: 1000
    }
  },
  
  prompt_response_cache: {
    enabled: true,
    semantic_threshold: 0.85
  },
  fallback: {
    mode: 'sequential'  // Enabled by specifying mode
  }
});

Response Examples

Cache Tier Tracking

The usage.cache_tier field shows which cache served your response:
// Semantic cache hit
{
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8, 
    "total_tokens": 18,
    "cache_tier": "semantic_exact"
  }
}

// Prompt cache hit  
{
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 4,
    "total_tokens": 9,
    "cache_tier": "prompt_response"
  }
}

// No cache used
{
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 10,
    "total_tokens": 18
    // cache_tier omitted
  }
}

Error Handling & Troubleshooting

🛠️ Quick Fix Guide: Most issues have simple solutions. Here’s how to resolve them fast.

Error Response Format

error
object
Standard error object format
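
For illustration, an error response carries message, type, and code fields (the values below are hypothetical):

{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}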

🚨 Emergency Troubleshooting

Service Down?
  1. Check our status page: status.llmadaptive.uk
  2. Join our Discord: discord.gg/adaptive
  3. Email support: info@llmadaptive.uk

Rate Limits

| Plan       | Requests per Minute | Tokens per Minute |
|------------|---------------------|-------------------|
| Free       | 100                 | 10,000            |
| Pro        | 1,000               | 100,000           |
| Enterprise | Custom              | Custom            |
Rate limits are applied per API key and reset every minute.
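
When you hit a limit, back off and retry after the window resets. A minimal sketch, assuming the OpenAI Node SDK surfaces rate-limit failures as APIError with a 429 status:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.ADAPTIVE_API_KEY,
  baseURL: 'https://llmadaptive.uk/api/v1',
});

async function createWithRetry(params, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await openai.chat.completions.create(params);
    } catch (err) {
      // Retry only rate-limit errors, with exponential backoff (1s, 2s, 4s, ...)
      if (err instanceof OpenAI.APIError && err.status === 429 && attempt < maxRetries) {
        await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000));
        continue;
      }
      throw err;  // anything else propagates to the caller
    }
  }
}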

Best Practices

Model Selection

Use empty string "" for model to enable intelligent routing and cost savings

Cost Control

Use cost_bias parameter to balance cost vs performance for your use case

Custom Providers

When using custom providers, always include their configuration in provider_configs

Error Handling

Always implement proper error handling for network and API failures
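
For example, a sketch using the OpenAI Node SDK's error class (adapt to whatever client you use):

try {
  const completion = await openai.chat.completions.create({
    model: '',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
  console.log(completion.choices[0].message.content);
} catch (err) {
  if (err instanceof OpenAI.APIError) {
    // API-level failure: status plus the message/type/code error fields
    console.error(`API error ${err.status}: ${err.message}`);
  } else {
    // Network or other unexpected failure
    console.error('Request failed:', err);
  }
}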