Supported Models

Adaptive supports the latest models from all major AI providers, automatically updated with new releases. Our intelligent routing system selects the optimal model based on your prompt, cost preferences, and performance requirements.
Latest October 2025 Models: GPT-5 series with unified reasoning, Claude Haiku 4.5 (now free tier), Gemini 2.5 Pro, DeepSeek V3.2-Exp with 50% lower pricing, Grok 4 Fast, Llama 4 Scout & Maverick on Groq. All models automatically available through intelligent routing.

OpenAI Models

GPT-5 Series (Latest 2025)

OpenAI’s most advanced models featuring unified reasoning and superior intelligence across all domains:
  • gpt-5 - The flagship model with state-of-the-art performance across coding, math, writing, health, and visual perception
  • gpt-5-mini - Faster, cost-efficient version of GPT-5 for well-defined tasks
  • gpt-5-nano - Fastest, most cost-efficient version for simple tasks and high-volume usage
  • gpt-5-chat-latest - GPT-5 variant optimized for ChatGPT (API access available)
Key Features:
  • Context window: Up to 272,000 tokens input / 128,000 tokens output
  • Unified system with fast and deeper reasoning modes
  • 94.6% accuracy on AIME 2025 math problems
  • 74.9% on SWE-bench Verified coding tasks
  • 45% fewer factual errors vs GPT-4o, 80% fewer vs o3 when reasoning
  • Support for custom tools with plaintext instead of JSON
  • Reasoning effort levels: minimal, low, medium, high
  • Advanced multimodal capabilities (text + vision)
  • Semantic caching reduces cached input costs by 90% ($0.125 per million cached tokens)
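The limits and effort levels above can be checked before a request is sent. The following is a minimal sketch, not Adaptive's or OpenAI's client code: the function name and the payload field names (`reasoning_effort`, `max_output_tokens`) are assumptions made for illustration, and only the numeric limits and the four effort levels come from the list above.

```python
# Limits and effort levels quoted in the Key Features list above.
GPT5_MAX_INPUT_TOKENS = 272_000
GPT5_MAX_OUTPUT_TOKENS = 128_000
REASONING_EFFORTS = ("minimal", "low", "medium", "high")


def build_gpt5_request(prompt_tokens, max_output_tokens, effort="medium", model="gpt-5"):
    """Validate a request against the documented limits and return a payload dict.

    The payload field names are hypothetical; check the API reference
    for the real ones before sending anything.
    """
    if effort not in REASONING_EFFORTS:
        raise ValueError(f"effort must be one of {REASONING_EFFORTS}, got {effort!r}")
    if prompt_tokens > GPT5_MAX_INPUT_TOKENS:
        raise ValueError("prompt exceeds the 272,000-token input limit")
    if max_output_tokens > GPT5_MAX_OUTPUT_TOKENS:
        raise ValueError("requested output exceeds the 128,000-token output limit")
    return {
        "model": model,
        "reasoning_effort": effort,
        "max_output_tokens": max_output_tokens,
    }
```

Rejecting an oversized prompt locally avoids paying for a request that would fail at the provider.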
Pricing:
  • GPT-5: $1.25 / $10 per million tokens (input/output), $0.125 per million cached input tokens
  • GPT-5-mini: $0.25 / $2 per million tokens (input/output)
  • GPT-5-nano: $0.05 / $0.40 per million tokens (input/output)
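To see how these rates combine, here is a rough cost estimator. It is a sketch under stated assumptions: the function and dictionary names are hypothetical, cached tokens are assumed to be a subset of the input tokens, and a cached rate is included only for gpt-5 because that is the only one listed above.

```python
# USD per million tokens, copied from the pricing list above.
GPT5_PRICES = {
    "gpt-5": {"input": 1.25, "output": 10.0, "cached_input": 0.125},
    "gpt-5-mini": {"input": 0.25, "output": 2.0},
    "gpt-5-nano": {"input": 0.05, "output": 0.4},
}


def estimate_cost_usd(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate one request's cost; cached_tokens is the cached part of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    prices = GPT5_PRICES[model]
    if cached_tokens and "cached_input" not in prices:
        raise ValueError(f"no cached-input rate is listed for {model}")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * prices["input"]
        + cached_tokens * prices.get("cached_input", 0.0)
        + output_tokens * prices["output"]
    )
    return total / 1_000_000
```

For example, a gpt-5 request with 100,000 input tokens (80,000 of them cached) and 2,000 output tokens comes to about $0.055 under these assumptions.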

Anthropic Claude Models

Claude 4 Family (Latest 2025)

The newest generation setting new standards for AI capabilities:
  • claude-haiku-4.5 - Latest October 2025 release - Fast, affordable, available on free tier ($1 / $5 per million tokens)
  • claude-opus-4.1 - World’s best coding model (72.5% on SWE-bench)
  • claude-opus-4 - Advanced reasoning and AI agent capabilities
  • claude-sonnet-4.5 - Anthropic’s best Sonnet model for complex agents and coding with vision support
  • claude-sonnet-4 - Improved coding with 72.7% on SWE-bench
Pricing:
  • Haiku 4.5: $1 / $5 per million tokens (input/output) - Now available free on Claude.ai
  • Sonnet 4 and 4.5: $3 / $15 per million tokens (input/output)
  • Opus 4.1: $20 / $80 per million tokens (input/output) + $40 per million thinking tokens

Claude 3.5 Family (Previous Generation)

  • claude-3-5-sonnet - Enhanced performance across all tasks
  • claude-3-5-haiku - Previous Haiku version (superseded by 4.5)

Google Gemini Models

Gemini 2.5 Series (Latest)

Google’s most advanced thinking models with adaptive capabilities:
  • gemini-2.5-pro - State-of-the-art thinking model with adaptive reasoning
  • gemini-2.5-flash - Best price-performance model with thinking capabilities
  • gemini-2.5-flash-lite - Most cost-efficient and fastest 2.5 model ($0.02 per million tokens)
Pricing:
  • 2.5 Pro: $1.25 / $10 per million tokens (input/output, up to 200K context), $2.50 / $15 (over 200K)
  • 2.5 Flash: Similar pricing to 2.5 Flash-Lite
  • 2.5 Flash-Lite: $0.02 per million tokens
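The two-tier Gemini 2.5 Pro pricing can be sketched as follows. This assumes the tier is chosen by the prompt's input token count and that the higher rates then apply to the whole request; the list above does not spell out either detail, and the function name is hypothetical.

```python
# Tier boundary and USD-per-million-token rates from the 2.5 Pro pricing line above.
TIER_THRESHOLD_TOKENS = 200_000
STANDARD_RATES = (1.25, 10.0)      # (input, output) up to 200K context
LONG_CONTEXT_RATES = (2.50, 15.0)  # (input, output) over 200K context


def gemini_25_pro_cost_usd(input_tokens, output_tokens):
    """Estimate cost, assuming the prompt size alone selects the tier."""
    if input_tokens <= TIER_THRESHOLD_TOKENS:
        input_rate, output_rate = STANDARD_RATES
    else:
        input_rate, output_rate = LONG_CONTEXT_RATES
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

Under these assumptions, crossing the 200K boundary doubles the input rate for every token in the prompt, so trimming a prompt to just under the threshold can matter more than the token count alone suggests.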
Special Features:
  • Adaptive thinking mode shows reasoning process
  • Superior code, math, and STEM reasoning
  • Long context for large datasets and documents

Gemini 2.0 Series

  • gemini-2.0-flash - Next-gen features with 1M token context ($0.10 / $0.40 per million tokens, input/output)
  • gemini-2.0-flash-live - Low-latency voice and video interactions

Gemini 1.5 Series (Deprecated – Removed May 2025)

Gemini 1.5 models have been fully deprecated and removed from the Adaptive platform. They are no longer supported and cannot be selected for new or existing projects. Please migrate to Gemini 2.x or later models.

DeepSeek Models

DeepSeek V3.2-Exp (Latest September 2025)

The most affordable and advanced hybrid model with 50% price reduction:
  • deepseek-v3.2-exp - Latest experimental model with 50% lower pricing ($0.028 per million input tokens)
  • deepseek-chat (V3) - Production-ready chat model ($0.27 input / $1.10 output per million tokens)
  • deepseek-reasoner - Specialized reasoning mode with enhanced thinking capabilities
  • deepseek-v3 - Standard V3 model ($0.14 input / $0.28 output per million tokens)
Pricing:
  • V3.2-Exp: $0.028 per million input tokens (50% cheaper than V3)
  • DeepSeek-Chat: $0.07 (cache hit) / $0.27 (cache miss) input, $1.10 output per million tokens
  • DeepSeek V3: $0.14 / $0.28 per million tokens (input/output)
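The cache hit and cache miss input rates for DeepSeek-Chat mean the effective input price depends on how much of the prompt is served from cache. A minimal sketch, assuming a known cache hit ratio (the function name and the `cache_hit_ratio` parameter are hypothetical):

```python
# USD per million tokens for DeepSeek-Chat, from the pricing list above.
CACHE_HIT_INPUT_RATE = 0.07
CACHE_MISS_INPUT_RATE = 0.27
OUTPUT_RATE = 1.10


def deepseek_chat_cost_usd(input_tokens, output_tokens, cache_hit_ratio=0.0):
    """Estimate cost when a fraction of the input tokens hits the cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (
        hit_tokens * CACHE_HIT_INPUT_RATE
        + miss_tokens * CACHE_MISS_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000
```

With 1,000,000 input tokens at a 50% hit ratio and 100,000 output tokens, the estimate is about $0.28, compared with about $0.38 when nothing is cached.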
Key Features:
  • 671B total parameters (MoE; 37B active)
  • 128K context window
  • Dual-mode operation (thinking vs direct)
  • Most affordable frontier model pricing

DeepSeek Specialized Models

  • deepseek-coder-v2 - 338 programming languages, 128K context
  • deepseek-r1 - Dedicated reasoning model for complex logic
  • deepseek-r1-0528 - Advanced reasoning with 23K token reasoning chains

Available Sizes

  • 1.5B, 7B, 8B, 14B, 32B, 70B - Distilled models for various deployment needs

Groq Models (Ultra-Fast Inference)

Latest Models on Groq (2025)

High-performance inference with Groq’s LPU™ technology:
  • llama-4-scout - Latest April 2025 Llama 4 model (17B active, 109B total parameters)
  • llama-4-maverick - Larger Llama 4 variant (17B active, 400B total parameters)
  • gpt-oss-120b - OpenAI open-source model on Groq (500+ tokens/sec, 128K context)
  • gpt-oss-20b - Smaller OpenAI OSS model (1000+ tokens/sec)
  • llama-3.3-70b-versatile - Flagship Llama 3 model with exceptional speed
  • llama-3-groq-70b-tool-use - Specialized for function calling
  • llama-3-groq-8b-tool-use - Efficient tool use variant
  • kimi-k2-0905 - 256K context window with agentic coding capabilities
Pricing:
  • gpt-oss-120B: $0.15 / $0.75 per million tokens (input/output)
  • gpt-oss-20B: $0.10 / $0.50 per million tokens (input/output)
Performance Benefits:
  • 5-15× faster than other API providers
  • Up to 1000+ tokens/second
  • Sub-second response times with LPU acceleration
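As a back-of-the-envelope illustration of what these throughput figures mean, the sketch below converts tokens per second into generation time. It deliberately ignores network latency, queueing, and time to first token, and the function name is hypothetical.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Time to stream output_tokens at a steady decode rate (no network overhead)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Throughput figures quoted in the model list above (lower bounds of "500+" and "1000+").
print(generation_seconds(2_000, 500))    # gpt-oss-120b
print(generation_seconds(2_000, 1_000))  # gpt-oss-20b
```

At the quoted rates, a 2,000-token answer takes roughly 4 seconds on gpt-oss-120b and roughly 2 seconds on gpt-oss-20b, before any network overhead.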

Additional Groq Models

  • llama-guard-4-12b - AI content moderation
  • Compound on GroqCloud - Production-ready agentic AI with research, code execution, and browser control

xAI Grok Models

Grok 4 Series (Latest 2025)

xAI’s most intelligent models with real-time capabilities:
  • grok-4 - “Most intelligent model in the world” with native tool use
  • grok-4-heavy - Most powerful version of Grok 4
  • grok-4-fast - Cost-efficient reasoning with 2M token context
  • grok-code-fast-1 - Specialized for agentic coding tasks
Pricing:
  • Grok 4: $3.00 / $15.00 per million tokens (input/output), $0.75 per million cached input tokens
  • Grok 4 Fast: $0.20-$0.40 input (tiered), $0.50-$1.00 output, $0.05 per million cached input tokens
Features:
  • Real-time X/web search integration
  • 256K context window (2M for fast variants)
  • Native multimodal understanding and tool use

Z.AI GLM Models

GLM-4 Family (Latest 2025)

Z.AI’s advanced language models with competitive pricing and prompt caching:
  • glm-4.6 - Latest flagship model with enhanced capabilities
  • glm-4.5 - Advanced model for complex tasks
  • glm-4.5v - Vision-enabled multimodal model
  • glm-4.5-x - Extra-large model for demanding applications
  • glm-4.5-air - Fast and cost-efficient for everyday tasks
  • glm-4.5-airx - Balanced performance and cost
  • glm-4-32b-0414-128k - 128K context window, ultra-low pricing
  • glm-4.5-flash - Free tier model with zero cost
Pricing (per 1M tokens):
  • GLM-4.6: $0.6 / $2.2 (input/output), Cached: $0.11
  • GLM-4.5: $0.6 / $2.2 (input/output), Cached: $0.11
  • GLM-4.5V: $0.6 / $1.8 (input/output), Cached: $0.11
  • GLM-4.5-X: $2.2 / $8.9 (input/output), Cached: $0.45
  • GLM-4.5-Air: $0.2 / $1.1 (input/output), Cached: $0.03
  • GLM-4.5-AirX: $1.1 / $4.5 (input/output), Cached: $0.22
  • GLM-4-32B-0414-128K: $0.1 / $0.1 (input/output)
  • GLM-4.5-Flash: Free (input/output/cached)
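With this many price points, comparing models for a given workload is easier in code. The sketch below ranks GLM models by estimated cost alone. It is not how Adaptive's router chooses a model (routing also weighs the prompt and performance requirements), it ignores cached-input rates, and the function and dictionary names are hypothetical.

```python
# (input, output) USD per million tokens, copied from the pricing list above.
GLM_PRICES = {
    "glm-4.6": (0.6, 2.2),
    "glm-4.5": (0.6, 2.2),
    "glm-4.5v": (0.6, 1.8),
    "glm-4.5-x": (2.2, 8.9),
    "glm-4.5-air": (0.2, 1.1),
    "glm-4.5-airx": (1.1, 4.5),
    "glm-4-32b-0414-128k": (0.1, 0.1),
    "glm-4.5-flash": (0.0, 0.0),
}


def rank_by_cost(input_tokens, output_tokens, candidates=None):
    """Return (model, estimated USD) pairs sorted from cheapest to most expensive."""
    names = list(candidates) if candidates is not None else list(GLM_PRICES)
    costs = []
    for name in names:
        input_rate, output_rate = GLM_PRICES[name]
        cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
        costs.append((name, cost))
    return sorted(costs, key=lambda pair: pair[1])
```

Passing a `candidates` list restricts the comparison to models that meet other requirements first, such as vision support or a minimum context window.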
Special Features:
  • Prompt caching for reduced input costs (storage currently free)
  • Built-in web search tool ($0.01 per use)
  • 128K context window on select models
  • Multimodal capabilities (GLM-4.5V)

Perplexity Sonar Models

Latest Sonar (2025)

Built on Llama 3.3 70B with search optimization:
  • sonar-latest - Latest Sonar model optimized for answer quality
  • llama-3.1-sonar-large-128k-online - Large online search model
  • llama-3.1-sonar-small-128k-online - Efficient online model
Deprecation: llama-3.1-sonar-large-128k-online was scheduled for discontinuation on February 22, 2025.
Performance: 1200 tokens/second with Cerebras infrastructure

Together AI & HuggingFace Models

Qwen3 Series (2025)

Advanced reasoning models with dual-mode capabilities:
  • qwen3-235b-a22b - Large MoE model (235B total, 22B active)
  • qwen3-30b-a3b - Smaller MoE model (30B total, 3B active)
  • qwen3-coder-480b-a35b - Largest open-source coding model
  • qwen2.5-vl - Visual reasoning and video understanding
Key Features:
  • Dual-mode: Instant responses vs deep reasoning
  • Apache 2.0 license
  • Outperforms OpenAI o3 on key benchmarks

Llama Models via Together AI

  • llama-3.3-70b-instruct-turbo - Recommended general-purpose model
  • llama-4-scout-17b - Vision model for multimodal tasks
  • Various fine-tuned and specialized variants

HuggingFace Models

Access to 200+ open-source models including:
  • meta-llama/Llama-3.1-8B-Instruct - Efficient general-purpose
  • deepseek-ai/DeepSeek-R1-Distill-Qwen-14B - Reasoning optimized
  • Custom and fine-tuned models for specialized domains
The remainder of this page (Model Selection Intelligence, Cost Optimization, Performance Tiers, Getting Started, Model Updates, Next Steps) remains unchanged.