Supported Models

Adaptive supports the latest models from all major AI providers, automatically updated with new releases. Our intelligent routing system selects the optimal model based on your prompt, cost preferences, and performance requirements.
New Models Available: We’ve added support for GPT-5, GPT-4.1, Claude Opus 4, Gemini 2.5, DeepSeek-V3.1, Grok 4, and other leading 2025 releases. GPT-5 is now available as OpenAI’s most advanced model series.

OpenAI Models

GPT-5 Series (Latest 2025)

OpenAI’s most advanced models featuring unified reasoning and superior intelligence across all domains:
  • gpt-5 - The flagship model with state-of-the-art performance across coding, math, writing, health, and visual perception
  • gpt-5-mini - Faster, cost-efficient version of GPT-5 for well-defined tasks
  • gpt-5-nano - Fastest, most cost-efficient version for simple tasks and high-volume usage
  • gpt-5-chat-latest - GPT-5 variant optimized for ChatGPT (API access available)
Key Features:
  • Context window: Up to 272,000 tokens input / 128,000 tokens output
  • Unified system with fast and deeper reasoning modes
  • 94.6% accuracy on AIME 2025 math problems
  • 74.9% on SWE-bench Verified coding tasks
  • 45% fewer factual errors vs GPT-4o, 80% fewer vs o3 when reasoning
  • Support for custom tools with plaintext instead of JSON
  • Reasoning effort levels: minimal, low, medium, high
  • Advanced multimodal capabilities (text + vision)
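The reasoning effort levels above can be selected per request. The sketch below builds a request body for Adaptive’s OpenAI-compatible endpoint; the `reasoning_effort` field follows OpenAI’s chat-completions parameter of the same name, but confirm support in the Adaptive API reference before relying on it.

```python
# Hypothetical helper: build a GPT-5 request with an explicit reasoning
# effort level. The "reasoning_effort" values mirror the levels listed
# above (minimal, low, medium, high).
def build_gpt5_request(prompt: str, effort: str = "medium") -> dict:
    allowed = {"minimal", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"reasoning effort must be one of {sorted(allowed)}")
    return {
        "model": "openai:gpt-5",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_gpt5_request("Prove that sqrt(2) is irrational", effort="high")
print(request["reasoning_effort"])  # high
```

Use "minimal" or "low" for latency-sensitive tasks and "high" for problems that benefit from deeper deliberation.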
Pricing:
  • GPT-5: $1.25 input / $10.00 output per 1M tokens
  • GPT-5-mini: $0.25 input / $2.00 output per 1M tokens
  • GPT-5-nano: $0.05 input / $0.40 output per 1M tokens
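Applying the per-million-token rates above to actual token counts is simple arithmetic; the small estimator below encodes the listed GPT-5 prices (verify them against OpenAI’s current pricing page before budgeting).

```python
# Per-1M-token rates (input, output) in USD, mirroring the prices listed
# above; treat these as a snapshot, not an authoritative price source.
GPT5_PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts."""
    in_rate, out_rate = GPT5_PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: 10K input + 2K output tokens on gpt-5-mini
print(round(estimate_cost("gpt-5-mini", 10_000, 2_000), 6))  # 0.0065
```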

GPT-4.1 Series (2025)

Previous generation models with significant improvements in coding and reasoning:
  • gpt-4.1 - The flagship model outperforming GPT-4o across all benchmarks
  • gpt-4.1-mini - 83% cheaper than GPT-4o with near-GPT-4 performance
  • gpt-4.1-nano - Fastest and cheapest with 1M token context window
Key Features:
  • Context window: Up to 1M tokens
  • Knowledge cutoff: June 2024
  • Major coding improvements (54.6% vs 33.2% on SWE-bench)
  • Exceptional instruction following

GPT-4o Series

Multimodal models integrating text and images:
  • gpt-4o - Main multimodal model matching GPT-4 Turbo performance
  • gpt-4o-mini - Cost-efficient alternative to GPT-3.5 Turbo
  • gpt-4o-audio models - Audio input and output capabilities

GPT-4 Turbo

  • gpt-4-turbo - Fast, cost-efficient variant for text tasks
  • gpt-4-turbo-preview - Preview version with latest features

Reasoning Models (o-series)

  • o3 - Most powerful reasoning model for logical and technical tasks
  • o3-pro - Extended reasoning time for complex problems
  • o4-mini - Enhanced reasoning with improved performance

Anthropic Claude Models

Claude 4 Family (Latest 2025)

The newest generation setting new standards for AI capabilities:
  • claude-opus-4.1 - World’s best coding model (72.5% on SWE-bench)
  • claude-opus-4 - Advanced reasoning and AI agent capabilities
  • claude-sonnet-4 - Improved coding with 72.7% on SWE-bench
Pricing: Consistent with previous Opus/Sonnet models
  • Opus 4: $15 input / $75 output per 1M tokens
  • Sonnet 4: $3 input / $15 output per 1M tokens

Claude 3.7 Family

  • claude-sonnet-3.7 - Most intelligent model with extended thinking capabilities

Claude 3.5 Family

  • claude-3-5-sonnet-20241022 - Enhanced performance across all tasks
  • claude-3-5-haiku-20241022 - Fastest model surpassing Claude 3 Opus benchmarks

Claude 3 Family (Original)

  • claude-3-opus-20240229 - Most intelligent with best-in-market complex task performance
  • claude-3-sonnet-20240229 - Balanced intelligence and speed for enterprise
  • claude-3-haiku-20240307 - Fastest, most compact for near-instant responses

Google Gemini Models

Gemini 2.5 Series (Latest)

Google’s most advanced thinking models with adaptive capabilities:
  • gemini-2.5-pro - State-of-the-art thinking model with adaptive reasoning
  • gemini-2.5-flash - Best price-performance model with thinking capabilities
  • gemini-2.5-flash-lite - Most cost-efficient and fastest 2.5 model
Special Features:
  • Adaptive thinking mode shows reasoning process
  • Superior code, math, and STEM reasoning
  • Long context for large datasets and documents

Gemini 2.0 Series

  • gemini-2.0-flash - Next-gen features with 1M token context
  • gemini-2.0-flash-live - Low-latency voice and video interactions

Gemini 1.5 Series (Deprecated – Removed May 2025)

Gemini 1.5 models have been fully deprecated and removed from the Adaptive platform; they can no longer be selected for new or existing projects. Please migrate to Gemini 2.x or later models.

DeepSeek Models

DeepSeek-V3.1 (Latest Hybrid 2025)

The most advanced hybrid model combining reasoning and efficiency:
  • deepseek-chat (V3.1) - Hybrid model with thinking/non-thinking modes
  • deepseek-reasoner (V3.1) - Enhanced reasoning mode for complex problems
  • deepseek-v3-0324 - Improved post-training with better reasoning
Key Features:
  • 671B total parameters (37B activated)
  • 128K context window
  • Dual-mode operation (thinking vs direct)
  • Outperforms GPT-4.5 in math and coding

DeepSeek Specialized Models

  • deepseek-coder-v2 - 338 programming languages, 128K context
  • deepseek-r1 - Dedicated reasoning model for complex logic
  • deepseek-r1-0528 - Advanced reasoning with 23K token reasoning chains

Available Sizes

  • 1.5B, 7B, 8B, 14B, 32B, 70B - Distilled models for various deployment needs

Groq Models (Ultra-Fast Inference)

Latest Llama Models on Groq

High-performance inference with Groq’s LPU™ technology:
  • llama-3.3-70b-versatile - Flagship model with exceptional speed
  • llama-3.1-8b-instant - Exceptional price-performance ratio
  • llama-3-groq-70b-tool-use - Specialized for function calling
  • deepseek-r1-distill-llama-70b - Reasoning optimized with 128K context
Performance Benefits:
  • 5-15x faster than other API providers
  • Up to 814 tokens/second
  • Sub-second response times

Additional Groq Models

  • gemma2-9b-it - Google’s efficient model (being deprecated)
  • llama-guard-4-12b - AI content moderation
  • gpt-oss, kimi-k2, qwen3-32b - Various open-source options

xAI Grok Models

Grok 4 Series (Latest 2025)

xAI’s most intelligent models with real-time capabilities:
  • grok-4 - “Most intelligent model in the world” with native tool use
  • grok-4-heavy - Most powerful version of Grok 4
  • grok-4-fast - Cost-efficient reasoning with 2M token context
  • grok-code-fast-1 - Specialized for agentic coding tasks
Features:
  • Real-time X/web search integration
  • 256K context window (2M for fast variants)
  • Native multimodal understanding

Grok 3 Series

  • grok-3 - Superior reasoning with extensive knowledge
  • grok-3-mini - Efficient model for standard tasks
  • grok-3-reasoning - Enhanced logical reasoning capabilities

Perplexity Sonar Models

Latest Sonar (2025)

Built on Llama 3.3 70B with search optimization:
  • sonar-latest - Latest Sonar model optimized for answer quality
  • llama-3.1-sonar-large-128k-online - Large online search model
  • llama-3.1-sonar-small-128k-online - Efficient online model
Deprecation: llama-3.1-sonar-large-128k-online will be discontinued February 22, 2025.
Performance: 1200 tokens/second with Cerebras infrastructure

Together AI & HuggingFace Models

Qwen3 Series (2025)

Advanced reasoning models with dual-mode capabilities:
  • qwen3-235b-a22b - Large MoE model (235B total, 22B active)
  • qwen3-30b-a3b - Smaller MoE model (30B total, 3B active)
  • qwen3-coder-480b-a35b - Largest open-source coding model
  • qwen2.5-vl - Visual reasoning and video understanding
Key Features:
  • Dual-mode: Instant responses vs deep reasoning
  • Apache 2.0 license
  • Outperforms OpenAI o3 on key benchmarks

Llama Models via Together AI

  • llama-3.3-70b-instruct-turbo - Recommended general-purpose model
  • llama-4-scout-17b - Vision model for multimodal tasks
  • Various fine-tuned and specialized variants

HuggingFace Models

Access to 200+ open-source models including:
  • meta-llama/Llama-3.1-8B-Instruct - Efficient general-purpose
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B - Reasoning optimized
  • Custom and fine-tuned models for specialized domains

Model Selection Intelligence

Automatic Routing

Adaptive’s AI system automatically selects the optimal model based on:
  • Task Type: Code, math, creative writing, analysis, etc.
  • Complexity: Simple queries vs complex reasoning tasks
  • Cost Preference: Your cost_bias setting (0.0 = cheapest, 1.0 = best)
  • Context Length: Required context window size
  • Tool Use: Function calling capabilities when needed
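The `cost_bias` knob above ranges from 0.0 (cheapest) to 1.0 (best). A minimal sketch of a request body combining it with the provider list from the Getting Started section follows; the exact schema is an assumption here, so check the Adaptive API reference for the authoritative shape.

```python
# Hypothetical helper: build an Adaptive routing payload. The "models" and
# "cost_bias" fields follow the request shape shown in this document.
def build_routing_payload(prompt: str, cost_bias: float = 0.5) -> dict:
    if not 0.0 <= cost_bias <= 1.0:
        raise ValueError("cost_bias must be between 0.0 (cheapest) and 1.0 (best)")
    return {
        "models": [
            {"provider": "openai"},
            {"provider": "anthropic"},
            {"provider": "google"},
        ],
        "cost_bias": cost_bias,
        "messages": [{"role": "user", "content": prompt}],
    }

# A low cost_bias steers routing toward cheaper models.
payload = build_routing_payload("Summarize this report", cost_bias=0.2)
print(payload["cost_bias"])  # 0.2
```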

Cost Optimization

Our intelligent routing typically saves 60-80% on costs by:
  • Using efficient models for simple tasks
  • Reserving premium models for complex reasoning
  • Automatic fallback when providers are unavailable
  • Real-time cost-performance analysis
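The automatic fallback behavior above can be pictured as trying providers in preference order and moving on when one fails. This is an illustrative sketch, not Adaptive’s actual implementation; `fake_send` is a stub standing in for a real API call.

```python
# Illustrative provider-fallback loop: return the first successful
# response, collecting errors from providers that fail.
def call_with_fallback(providers, send):
    """`send(provider)` returns a response dict or raises on failure."""
    errors = {}
    for provider in providers:
        try:
            return provider, send(provider)
        except Exception as exc:  # real code would catch specific error types
            errors[provider] = exc
    raise RuntimeError(f"All providers failed: {errors}")

# Stubbed sender: "openai" is unavailable, "anthropic" succeeds.
def fake_send(provider):
    if provider == "openai":
        raise TimeoutError("provider unavailable")
    return {"provider": provider, "content": "ok"}

winner, resp = call_with_fallback(["openai", "anthropic", "google"], fake_send)
print(winner)  # anthropic
```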

Performance Tiers

  • Best for: Simple queries, basic tasks, high-volume usage
  • Models: GPT-5-nano, DeepSeek-Chat, GPT-4.1-nano, Grok-3-mini, Groq Llama models
  • Typical Cost: $0.15-$2.50 per 1M tokens

Getting Started

Using Supported Models

You can specify models in three ways:
  1. Let Adaptive choose (recommended):
    {
      "models": [
        {"provider": "openai"},
        {"provider": "anthropic"},
        {"provider": "google"}
      ]
    }
    
  2. Specify exact models:
    {
      "models": [
        {"provider": "openai", "model_name": "gpt-5-mini"},
        {"provider": "anthropic", "model_name": "claude-sonnet-4-20250514"}
      ]
    }
    
  3. OpenAI-compatible direct calls:
    cURL
    curl -sS -X POST https://llmadaptive.uk/api/v1/chat/completions \
      -H "Authorization: Bearer $ADAPTIVE_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "openai:gpt-5-mini", "messages": [{"role": "user", "content": "Hello"}]}'
    
    Python
    import os
    import requests
    
    response = requests.post(
        "https://llmadaptive.uk/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {os.getenv('ADAPTIVE_API_KEY')}",
            "Content-Type": "application/json"
        },
        json={
            "model": "openai:gpt-5-mini",
            "messages": [{"role": "user", "content": "Hello"}]
        },
        timeout=30
    )
    
    if not response.ok:
        raise Exception(f"HTTP {response.status_code}: {response.text}")
    
    print(response.json())
    
    JavaScript (Node 18+)
    const response = await fetch("https://llmadaptive.uk/api/v1/chat/completions", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.ADAPTIVE_API_KEY}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({
        model: "openai:gpt-5-mini",
        messages: [{ role: "user", content: "Hello" }]
      })
    });
    
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${await response.text()}`);
    }
    
    console.log(await response.json());
    

Model Updates

Automatic Updates: New models are added automatically as providers release them
Backward Compatibility: Existing model names continue to work with automatic fallbacks
Performance Monitoring: We continuously monitor model performance and update recommendations

Next Steps