Supported Models
Adaptive supports the latest models from all major AI providers, automatically updated with new releases. Our intelligent routing system selects the optimal model based on your prompt, cost preferences, and performance requirements.New Models Available: We’ve added support for GPT-5, GPT-4.1, Claude Opus 4, Gemini 2.5, DeepSeek-V3.1, Grok 4, and many more latest 2025 models. GPT-5 is now available as OpenAI’s most advanced model series.
OpenAI Models
GPT-5 Series (Latest 2025)
OpenAI’s most advanced models featuring unified reasoning and superior intelligence across all domains:GPT-5 Models
GPT-5 Models
- gpt-5 - The flagship model with state-of-the-art performance across coding, math, writing, health, and visual perception
- gpt-5-mini - Faster, cost-efficient version of GPT-5 for well-defined tasks
- gpt-5-nano - Fastest, most cost-efficient version for simple tasks and high-volume usage
- gpt-5-chat-latest - GPT-5 variant optimized for ChatGPT (API access available)
- Context window: Up to 272,000 tokens input / 128,000 tokens output
- Unified system with fast and deeper reasoning modes
- 94.6% accuracy on AIME 2025 math problems
- 74.9% on SWE-bench Verified coding tasks
- 45% fewer factual errors vs GPT-4o, 80% fewer vs o3 when reasoning
- Support for custom tools with plaintext instead of JSON
- Reasoning effort levels: minimal, low, medium, high
- Advanced multimodal capabilities (text + vision)
- GPT-5: 10 per million tokens (input/output)
- GPT-5-mini: 2 per million tokens
- GPT-5-nano: 0.4 per million tokens
GPT-4.1 Series (2025)
Previous generation models with significant improvements in coding and reasoning:GPT-4.1 Models
GPT-4.1 Models
- gpt-4.1 - The flagship model outperforming GPT-4o across all benchmarks
- gpt-4.1-mini - 83% cheaper than GPT-4o with near-GPT-4 performance
- gpt-4.1-nano - Fastest and cheapest with 1M token context window
- Context window: Up to 1M tokens
- Knowledge cutoff: June 2024
- Major coding improvements (54.6% vs 33.2% on SWE-bench)
- Exceptional instruction following
GPT-4o Series
Multimodal models integrating text and images:- gpt-4o - Main multimodal model matching GPT-4 Turbo performance
- gpt-4o-mini - Cost-efficient alternative to GPT-3.5 Turbo
- gpt-4o-audio models - Speech-to-text capabilities
GPT-4 Turbo
- gpt-4-turbo - Fast, cost-efficient variant for text tasks
- gpt-4-turbo-preview - Preview version with latest features
Reasoning Models (o-series)
- o3 - Most powerful reasoning model for logical and technical tasks
- o3-pro - Extended reasoning time for complex problems
- o4-mini - Enhanced reasoning with improved performance
Anthropic Claude Models
Claude 4 Family (Latest 2025)
The newest generation setting new standards for AI capabilities:Claude 4 Models
Claude 4 Models
- claude-opus-4.1 - World’s best coding model (72.5% on SWE-bench)
- claude-opus-4 - Advanced reasoning and AI agent capabilities
- claude-sonnet-4 - Improved coding with 72.7% on SWE-bench
- Opus 4: 75 per million tokens (input/output)
- Sonnet 4: 15 per million tokens (input/output)
Claude 3.7 Family
- claude-sonnet-3.7 - Most intelligent model with extended thinking capabilities
Claude 3.5 Family
- claude-3-5-sonnet-20241022 - Enhanced performance across all tasks
- claude-3-5-haiku-20241022 - Fastest model surpassing Claude 3 Opus benchmarks
Claude 3 Family (Original)
- claude-3-opus-20240229 - Most intelligent with best-in-market complex task performance
- claude-3-sonnet-20240229 - Balanced intelligence and speed for enterprise
- claude-3-haiku-20240307 - Fastest, most compact for near-instant responses
Google Gemini Models
Gemini 2.5 Series (Latest)
Google’s most advanced thinking models with adaptive capabilities:Gemini 2.5 Models
Gemini 2.5 Models
- gemini-2.5-pro - State-of-the-art thinking model with adaptive reasoning
- gemini-2.5-flash - Best price-performance model with thinking capabilities
- gemini-2.5-flash-lite - Most cost-efficient and fastest 2.5 model
- Adaptive thinking mode shows reasoning process
- Superior code, math, and STEM reasoning
- Long context for large datasets and documents
Gemini 2.0 Series
- gemini-2.0-flash - Next-gen features with 1M token context
- gemini-2.0-flash-live - Low-latency voice and video interactions
Gemini 1.5 Series (Deprecated – Removed May 2025)
Gemini 1.5 models have been fully deprecated and removed from the Adaptive platform. Please migrate to Gemini 2.x or later models.Gemini 1.5 models are no longer supported and cannot be selected for new or existing projects.
DeepSeek Models
DeepSeek-V3.1 (Latest Hybrid 2025)
The most advanced hybrid model combining reasoning and efficiency:DeepSeek Latest Models
DeepSeek Latest Models
- deepseek-chat (V3.1) - Hybrid model with thinking/non-thinking modes
- deepseek-reasoner (V3.1) - Enhanced reasoning mode for complex problems
- deepseek-v3-0324 - Improved post-training with better reasoning
- 671B total parameters (37B activated)
- 128K context window
- Dual-mode operation (thinking vs direct)
- Outperforms GPT-4.5 in math and coding
DeepSeek Specialized Models
- deepseek-coder-v2 - 338 programming languages, 128K context
- deepseek-r1 - Dedicated reasoning model for complex logic
- deepseek-r1-0528 - Advanced reasoning with 23K token reasoning chains
Available Sizes
- 1.5B, 7B, 8B, 14B, 32B, 70B - Distilled models for various deployment needs
Groq Models (Ultra-Fast Inference)
Latest Llama Models on Groq
High-performance inference with Groq’s LPU™ technology:Groq High-Speed Models
Groq High-Speed Models
- llama-3.3-70b-versatile - Flagship model with exceptional speed
- llama-3.1-8b-instant - Exceptional price-performance ratio
- llama-3-groq-70b-tool-use - Specialized for function calling
- deepseek-r1-distill-llama-70b - Reasoning optimized with 128K context
- 5-15x faster than other API providers
- Up to 814 tokens/second
- Sub-second response times
Additional Groq Models
- gemma2-9b-it - Google’s efficient model (being deprecated)
- llama-guard-4-12b - AI content moderation
- gpt-oss, kimi-k2, qwen3-32b - Various open-source options
xAI Grok Models
Grok 4 Series (Latest 2025)
xAI’s most intelligent models with real-time capabilities:Grok 4 Models
Grok 4 Models
- grok-4 - “Most intelligent model in the world” with native tool use
- grok-4-heavy - Most powerful version of Grok 4
- grok-4-fast - Cost-efficient reasoning with 2M token context
- grok-code-fast-1 - Specialized for agentic coding tasks
- Real-time X/web search integration
- 256K context window (2M for fast variants)
- Native multimodal understanding
Grok 3 Series
- grok-3 - Superior reasoning with extensive knowledge
- grok-3-mini - Efficient model for standard tasks
- grok-3-reasoning - Enhanced logical reasoning capabilities
Perplexity Sonar Models
Latest Sonar (2025)
Built on Llama 3.3 70B with search optimization:- sonar-latest - Latest Sonar model optimized for answer quality
- llama-3.1-sonar-large-128k-online - Large online search model
- llama-3.1-sonar-small-128k-online - Efficient online model
Deprecation: llama-3.1-sonar-large-128k-online will be discontinued February 22, 2025.
Together AI & HuggingFace Models
Qwen3 Series (2025)
Advanced reasoning models with dual-mode capabilities:Qwen3 Models
Qwen3 Models
- qwen3-235b-a22b - Large MoE model (235B total, 22B active)
- qwen3-30b-a3b - Smaller MoE model (30B total, 3B active)
- qwen3-coder-480b-a35b - Largest open-source coding model
- qwen2.5-vl - Visual reasoning and video understanding
- Dual-mode: Instant responses vs deep reasoning
- Apache 2.0 license
- Outperforms OpenAI O3 on key benchmarks
Llama Models via Together AI
- llama-3.3-70b-instruct-turbo - Recommended general-purpose model
- llama-4-scout-17b - Vision model for multimodal tasks
- Various fine-tuned and specialized variants
HuggingFace Models
Access to 200+ open-source models including:- meta-llama/Llama-3.1-8B-Instruct - Efficient general-purpose
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B - Reasoning optimized
- Custom and fine-tuned models for specialized domains
Model Selection Intelligence
Automatic Routing
Adaptive’s AI system automatically selects the optimal model based on:- Task Type: Code, math, creative writing, analysis, etc.
- Complexity: Simple queries vs complex reasoning tasks
- Cost Preference: Your cost_bias setting (0.0 = cheapest, 1.0 = best)
- Context Length: Required context window size
- Tool Use: Function calling capabilities when needed
Cost Optimization
Our intelligent routing typically saves 60-80% on costs by:- Using efficient models for simple tasks
- Reserving premium models for complex reasoning
- Automatic fallback when providers are unavailable
- Real-time cost-performance analysis
Performance Tiers
Best for: Simple queries, basic tasks, high-volume usageModels: GPT-5-nano, DeepSeek-Chat, GPT-4.1-nano, Grok-3-mini, Groq Llama modelsTypical Cost: $0.15-2.50 per 1M tokens
Getting Started
Using Supported Models
You can specify models in three ways:-
Let Adaptive choose (recommended):
-
Specify exact models:
-
OpenAI-compatible direct calls:
cURLPythonJavaScript (Node 18+)
Model Updates
Automatic Updates: New models are added automatically as providers release them
Backward Compatibility: Existing model names continue to work with automatic fallbacks
Performance Monitoring: We continuously monitor model performance and update recommendations