Intelligent model selection with automatic routing. Works as a drop-in
replacement for Qwen Code’s API backend, providing access to multiple AI
providers (Claude, GPT-4, etc.) with cost optimization and intelligent routing
built-in.
Benefits of Using Qwen Code with Adaptive
When you integrate Qwen Code with Adaptive, you unlock powerful capabilities:
- Multi-Provider Access: Access Claude, GPT-4, Gemini, and other providers through a single interface
- Intelligent Model Selection: Adaptive automatically routes requests to the optimal model based on task complexity, language, and context
- Cost Optimization: Save 60-80% on API costs through intelligent routing and model selection
- Higher Reliability: Automatic fallbacks across providers ensure consistent responses
- Enhanced Performance: Load balancing and circuit breakers for optimal throughput
- Usage Analytics: Monitor model usage, costs, and performance in real-time
Get Your Adaptive API Key
Visit llmadaptive.uk to create an account and generate your API key.

Quick Setup
Run Automated Installer
- Install Qwen Code if not present (via npm)
- Configure OpenAI-compatible environment variables for Adaptive
- Add configuration to your shell profile (~/.bashrc, ~/.zshrc, etc.)
- Verify the installation
Verify Configuration
Start Using
Manual Installation
If you prefer to set up Qwen Code manually or need more control over the installation process:

Step 1: Install Qwen Code
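Using the npm package name referenced later in the Troubleshooting section, a global install looks like:

```shell
# Install the Qwen Code CLI globally via npm
npm install -g @qwen-code/qwen-code
```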
Qwen Code requires Node.js 20 or higher. Check your version with `node --version`.

Step 2: Configure Environment Variables
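For Adaptive, the OpenAI-compatible variables would typically look like the following sketch. The `OPENAI_API_KEY` name is an assumption based on the OpenAI-compatible convention; `OPENAI_BASE_URL` and `OPENAI_MODEL` are the names used elsewhere in this guide.

```shell
# qwen-code: Adaptive configuration (add to ~/.bashrc or ~/.zshrc)
export OPENAI_API_KEY="your-adaptive-api-key"               # from llmadaptive.uk
export OPENAI_BASE_URL="https://www.llmadaptive.uk/api/v1"
export OPENAI_MODEL=""                                      # empty = intelligent routing
```

After adding these lines, reload your shell profile with `source ~/.bashrc` (or `source ~/.zshrc`).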
Qwen Code uses an OpenAI-compatible API configuration.

Step 3: Apply Configuration
Step 4: Verify Installation
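Assuming the CLI installs a `qwen` binary (the binary name is an assumption; the npm package is `@qwen-code/qwen-code`), you can verify the installation with:

```shell
# Confirm the binary is on PATH and print its version
command -v qwen && qwen --version || echo "qwen not found on PATH"
```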
Alternative Setup Methods
Advanced Configuration
Model Selection with Adaptive
Configure which provider and model to use by default.

Intelligent Routing

When `OPENAI_MODEL` is set to `"intelligent-routing"` or left empty, Adaptive intelligently selects the best model for each task based on:
- Task Complexity: Analyzes prompt complexity to select the optimal model
- Language & Framework: Matches model strengths to programming languages
- Code Context: Understands codebase size and complexity
- Performance Requirements: Balances speed and quality
- Cost Optimization: Automatically minimizes costs while maintaining quality
- Provider Availability: Automatic fallback if a provider is unavailable
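Per the description above, enabling intelligent routing only requires the model variable:

```shell
# Let Adaptive pick the model per request
export OPENAI_MODEL="intelligent-routing"   # or leave empty: export OPENAI_MODEL=""
```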
Available Model Providers
| Provider | Models | Best For | Speed | Cost |
|---|---|---|---|---|
| Qwen | qwen-plus, qwen-turbo | Code generation, Asian languages | Fast | Low |
| Anthropic | Claude Sonnet 4 | Complex reasoning, refactoring | Medium | Medium |
| OpenAI | GPT-4, GPT-4 Turbo | General coding, documentation | Medium | Higher |
| Google | Gemini Pro, Flash | Code review, analysis | Fast | Medium |
| DeepSeek | deepseek-coder | Code completion, debugging | Fast | Low |
Usage Examples
Code Understanding & Editing
Workflow Automation
Session Management
Control your token usage with configurable session limits. Note that the session token limit applies to a single conversation, not to cumulative API calls.
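As a sketch, a session limit might be set in `.qwen/settings.json` like this; the `sessionTokenLimit` key name and the value are assumptions, so check the Qwen Code documentation for the exact field:

```shell
# Create an example settings file (key name and value are illustrative)
mkdir -p .qwen
cat > .qwen/settings.json <<'EOF'
{
  "sessionTokenLimit": 32000
}
EOF
```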
Vision Model Support
Qwen Code includes automatic vision model detection for image analysis.

Integration with Adaptive Features
Cost Optimization
Intelligent Model Routing
Adaptive automatically routes your requests to the most cost-effective model that meets quality requirements.

Before Adaptive (fixed model costs):
- GPT-4: $0.06/1K tokens (output)
- Claude Sonnet: $0.015/1K tokens (output)

With Adaptive (intelligent routing):
- Simple queries → Qwen Turbo: $0.0008/1K tokens
- Moderate tasks → Qwen Plus: $0.002/1K tokens
- Complex reasoning → Claude Sonnet: $0.003/1K tokens

Example savings:
- 1M tokens/month without Adaptive: ~$45
- 1M tokens/month with Adaptive: ~$12
- Monthly savings: $33 (73% reduction)
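As a quick sanity check, the arithmetic behind those savings figures works out:

```shell
# Back-of-envelope check of the monthly savings figures above
without=45; with=12
savings=$((without - with))
percent=$((savings * 100 / without))
echo "Savings: \$${savings}/month (${percent}%)"   # Savings: $33/month (73%)
```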
Semantic Caching
Adaptive caches similar requests to reduce API calls.

How it works:
- Semantic similarity detection for code queries
- Automatic cache hits for similar questions
- Configurable cache TTL and similarity threshold
Typical results:
- Cache hit rate: 30-40% for typical dev workflows
- Additional savings: 20-30% on top of intelligent routing
- Near-zero latency for cached responses
Load Balancing
Distribute requests across providers for optimal performance.

Benefits:
- Higher rate limits through multi-provider distribution
- Automatic failover if one provider is down
- Geographic routing for lower latency
- Cost-optimized provider selection
Typical results:
- 99.9% uptime with automatic failover
- 50% higher effective rate limits
- 20-30% latency reduction with geographic routing
Troubleshooting
Installation Issues
Problem: Qwen Code installation fails

Solutions:
- Ensure Node.js 20+ is installed: `node --version`
- Install Node.js if needed
- Check npm permissions: `npm config get prefix`
- Try with sudo (not recommended): `sudo npm install -g @qwen-code/qwen-code`
- Clear the npm cache: `npm cache clean --force`
Authentication Errors
Problem: “Unauthorized” or “Invalid API key” errors

Solutions:
- Verify your API key at llmadaptive.uk/dashboard
- Check that the environment variables are set
- Ensure the variables are exported in your shell config
- Restart your terminal if changes were made to the shell config
- Verify the base URL is correct: https://www.llmadaptive.uk/api/v1
- Check for the `# qwen-code` comment to confirm the correct environment variables are in place
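To check the variables, print them; an empty value means the variable is not exported in the current shell:

```shell
# Show the Adaptive-related variables currently exported
for var in OPENAI_API_KEY OPENAI_BASE_URL OPENAI_MODEL; do
  printf '%s=%s\n' "$var" "$(printenv "$var")"
done
```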
Connection Errors
Problem: Cannot connect to the Adaptive API

Solutions:
- Check internet connectivity
- Verify the base URL is correct: `echo $OPENAI_BASE_URL`
- Test the API directly
- Check if your network/firewall blocks the API endpoint
- Try using a different network or VPN
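A direct request with curl shows whether the endpoint is reachable; the `/models` path is an assumption based on OpenAI-compatible APIs:

```shell
# Print the HTTP status from the Adaptive endpoint (000 = no connection)
curl -sS -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://www.llmadaptive.uk/api/v1/models" || echo "request failed"
```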
Model Routing Issues
Problem: Requests are not routing to the expected models

Solutions:
- Check the current model configuration
- Use intelligent routing for automatic selection
- Verify the provider:model format
- Check the Adaptive dashboard for routing logs and model availability
- Check that model names match the supported providers
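Illustrative provider:model values are shown below; the exact model identifiers are examples drawn from the provider table, not a definitive list:

```shell
# Pin a specific provider and model (identifier is illustrative)
export OPENAI_MODEL="anthropic:claude-sonnet-4"
echo "$OPENAI_MODEL"

# Or return to automatic selection
export OPENAI_MODEL="intelligent-routing"
```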
Performance Issues
Problem: Slow response times or timeouts

Solutions:
- Check the Adaptive dashboard for provider status
- Verify rate limits aren’t exceeded
- Use faster models for simple tasks
- Enable semantic caching for repeated queries
- Check your internet connection speed
- Review model selection: Qwen Turbo and Flash models are faster
- Consider the load balancing configuration in the Adaptive dashboard
Session Token Limits
Problem: Hitting token limits in long sessions

Solutions:
- Configure higher session limits in `.qwen/settings.json`
- Use session compression to reduce token usage
- Clear the conversation history and start fresh
- Monitor token usage
- Break large tasks into smaller sessions
Uninstallation
If you need to remove Qwen Code or revert the configuration:

1. Remove Qwen Code
2. Remove Environment Variables: edit your shell config file and remove the Adaptive-related lines
3. Reload Shell Configuration
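Taken together, the steps look like this (adjust the profile file for your shell):

```shell
# 1. Remove the global package
npm uninstall -g @qwen-code/qwen-code

# 2. Unset the variables in the current session, and delete the
#    "# qwen-code" block from ~/.bashrc or ~/.zshrc
unset OPENAI_API_KEY OPENAI_BASE_URL OPENAI_MODEL

# 3. Reload: source ~/.bashrc (or source ~/.zshrc)
```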
Next Steps
Monitor Usage & Savings
Track your cost savings and usage analytics in real-time
API Documentation
Learn about Adaptive’s API capabilities and advanced features
More CLI Tools
Explore other CLI tools with Adaptive integration
Advanced Routing
Learn about intelligent model routing and load balancing
Contact us at info@llmadaptive.uk for feedback or assistance with your Qwen Code integration.