Overview
The Gemini Generate Content endpoint provides a Google Gemini API-compatible interface for generating text, code, and structured content. Use this endpoint with the official @google/genai SDK or any Gemini-compatible client.
This endpoint is fully compatible with Google’s Gemini API, allowing you to use the official Google Gen AI SDK while benefiting from Adaptive’s intelligent routing, cost optimization, and multi-provider support.
Authentication
x-goog-api-key: Your Adaptive API key. Also supported: Authorization: Bearer, X-API-Key, and api-key headers.
Path Parameters
model: The model to use for generation. Supports Gemini model names and Adaptive's intelligent routing. Examples:
- gemini-2.5-pro: Latest Gemini Pro model
- gemini-2.5-flash: Fast Gemini Flash model
- gemini-1.5-pro: Gemini 1.5 Pro model
- Custom model aliases configured in Adaptive
Request Body
contents: An array of content parts representing the conversation history or prompt.

"contents": [
  {
    "role": "user",
    "parts": [
      { "text": "Explain quantum computing in simple terms" }
    ]
  }
]
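The contents array can be built programmatically before sending a request. A minimal sketch; the Part/Content types and the userTurn helper are illustrative, not part of the SDK:

```typescript
// Minimal shape of a Gemini-style content entry (illustrative types).
type Part = { text: string };
type Content = { role: 'user' | 'model'; parts: Part[] };

// Hypothetical helper that wraps a plain string as a user turn.
function userTurn(text: string): Content {
  return { role: 'user', parts: [{ text }] };
}

const contents: Content[] = [
  userTurn('Explain quantum computing in simple terms')
];
console.log(JSON.stringify({ contents }));
```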
generationConfig: Generation configuration parameters.
- temperature: Controls randomness in generation (0.0 to 2.0). Default: 1.0
- topP: Nucleus sampling parameter (0.0 to 1.0). Default: 0.95
- topK: Top-K sampling parameter. Default: 40
- maxOutputTokens: Maximum tokens to generate. Default: 8192
- stopSequences: Sequences that stop generation when encountered.
- candidateCount: Number of response candidates to generate. Default: 1
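The documented ranges can be enforced client-side before a request is sent. A sketch assuming the standard Gemini generationConfig field names; the validateConfig helper is illustrative, not part of the SDK:

```typescript
interface GenerationConfig {
  temperature?: number;     // 0.0 to 2.0, default 1.0
  topP?: number;            // 0.0 to 1.0, default 0.95
  topK?: number;            // default 40
  maxOutputTokens?: number; // default 8192
  candidateCount?: number;  // default 1
  stopSequences?: string[];
}

// Illustrative client-side check of the documented parameter ranges.
function validateConfig(cfg: GenerationConfig): string[] {
  const errors: string[] = [];
  if (cfg.temperature !== undefined && (cfg.temperature < 0 || cfg.temperature > 2)) {
    errors.push('temperature must be in [0.0, 2.0]');
  }
  if (cfg.topP !== undefined && (cfg.topP < 0 || cfg.topP > 1)) {
    errors.push('topP must be in [0.0, 1.0]');
  }
  return errors;
}
```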
provider_configs (Adaptive Extension): Provider-specific configuration overrides.

"provider_configs": {
  "anthropic": { "temperature": 0.7 },
  "openai": { "temperature": 0.8 }
}
model_router (Adaptive Extension): Control intelligent routing behavior.
- Enable/disable intelligent routing. Default: true
- model_router.fallback_models: List of fallback models if the primary model fails.
- model_router.cost_optimization: Enable cost-based model selection. Default: true
semantic_cache (Adaptive Extension): Semantic caching configuration.

"semantic_cache": {
  "enabled": true,
  "similarity_threshold": 0.95
}

prompt_cache (Adaptive Extension): Prompt caching configuration.

"prompt_cache": {
  "enabled": true,
  "ttl": 3600
}

fallback (Adaptive Extension): Fallback configuration for provider failures.

"fallback": {
  "enabled": true,
  "max_retries": 3
}
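Putting the extensions together, a full request body combines the standard Gemini fields with Adaptive's extension blocks. A sketch using the field names documented above; the values are examples only:

```typescript
// Example request body: standard Gemini fields plus Adaptive extensions.
const requestBody = {
  contents: [{ role: 'user', parts: [{ text: 'Hello' }] }],
  generationConfig: { temperature: 0.7, maxOutputTokens: 1024 },
  provider_configs: { anthropic: { temperature: 0.7 } },
  semantic_cache: { enabled: true, similarity_threshold: 0.95 },
  prompt_cache: { enabled: true, ttl: 3600 },
  fallback: { enabled: true, max_retries: 3 },
};
console.log(JSON.stringify(requestBody, null, 2));
```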
Response
candidates: Array of generated response candidates. Each candidate carries the generated content.

"content": {
  "parts": [
    { "text": "Quantum computing uses quantum..." }
  ],
  "role": "model"
}
finishReason: Reason the generation stopped: STOP, MAX_TOKENS, SAFETY, RECITATION, or OTHER.
safetyRatings: Safety classification ratings for the generated content.
citationMetadata: Citation information for referenced sources.
usageMetadata: Token usage information.
- promptTokenCount: Number of tokens in the prompt.
- candidatesTokenCount: Number of tokens in the generated response.
- totalTokenCount: Total tokens used (prompt + completion).
- cache_tier (Adaptive Extension): Cache tier used (none, prompt, or semantic).
modelVersion: The actual model version used for generation.
provider (Adaptive Extension): The provider that handled the request (e.g., google, anthropic, openai).
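A response with these fields can be unpacked as follows. The sample object is illustrative; the token-count field names assume the standard Gemini usageMetadata shape, while cache_tier and provider are the Adaptive extensions described above:

```typescript
// Illustrative response shape based on the fields documented above.
const sampleResponse = {
  candidates: [
    {
      content: { parts: [{ text: 'Quantum computing uses quantum...' }], role: 'model' },
      finishReason: 'STOP',
    },
  ],
  usageMetadata: { promptTokenCount: 8, candidatesTokenCount: 42, totalTokenCount: 50, cache_tier: 'semantic' },
  provider: 'google',
};

// Pull out the generated text and the token accounting.
const text = sampleResponse.candidates[0].content.parts[0].text;
const total = sampleResponse.usageMetadata.totalTokenCount;
console.log(text, total);
```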
Code Examples
TypeScript (Google Gen AI SDK)
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({
  apiKey: process.env.GEMINI_API_KEY,
  httpOptions: {
    baseUrl: 'https://www.llmadaptive.uk/api/v1beta'
  }
});

const response = await ai.models.generateContent({
  model: 'gemini-2.5-pro',
  contents: [
    {
      role: 'user',
      parts: [
        { text: 'Explain quantum computing in simple terms' }
      ]
    }
  ],
  config: {
    temperature: 0.7,
    maxOutputTokens: 1024
  }
});

console.log(response.candidates[0].content.parts[0].text);
console.log('Provider:', response.provider);
console.log('Tokens used:', response.usageMetadata.totalTokenCount);
Advanced Examples
Multi-Turn Conversation
const response = await ai.models.generateContent({
  model: 'gemini-2.5-pro',
  contents: [
    {
      role: 'user',
      parts: [{ text: 'What is the capital of France?' }]
    },
    {
      role: 'model',
      parts: [{ text: 'The capital of France is Paris.' }]
    },
    {
      role: 'user',
      parts: [{ text: 'What is its population?' }]
    }
  ]
});
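For longer conversations, the history can be kept as an array of alternating turns and resent as the contents field on each request. A sketch; the history holder and addTurn helper are illustrative, not the SDK's chat API:

```typescript
type Part = { text: string };
type Content = { role: 'user' | 'model'; parts: Part[] };

// Illustrative history holder: append each user turn and model reply,
// then send the whole array as `contents` on the next request.
const history: Content[] = [];
function addTurn(role: 'user' | 'model', text: string): void {
  history.push({ role, parts: [{ text }] });
}

addTurn('user', 'What is the capital of France?');
addTurn('model', 'The capital of France is Paris.');
addTurn('user', 'What is its population?');
```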
With Adaptive Extensions
const response = await ai.models.generateContent({
  model: 'gemini-2.5-pro',
  contents: [
    {
      role: 'user',
      parts: [{ text: 'Write a sorting algorithm in Python' }]
    }
  ],
  config: {
    temperature: 0.3,
    maxOutputTokens: 2048
  },
  // Adaptive-specific features
  semantic_cache: {
    enabled: true,
    similarity_threshold: 0.95
  },
  fallback: {
    enabled: true,
    max_retries: 3
  },
  model_router: {
    cost_optimization: true,
    fallback_models: ['claude-sonnet-4-20250514', 'gpt-4o']
  }
});

console.log('Cache tier:', response.usageMetadata.cache_tier);
console.log('Provider:', response.provider);
Error Responses
error: Error information when the request fails.
- code: HTTP status code (400, 401, 429, 500, etc.)
- message: Human-readable error message.
- status: Error status: INVALID_ARGUMENT, UNAUTHENTICATED, PERMISSION_DENIED, RESOURCE_EXHAUSTED, or INTERNAL.
Common Errors
{
  "error": {
    "code": 401,
    "message": "API key required. Provide it via x-goog-api-key, Authorization: Bearer, X-API-Key, or api-key header",
    "status": "UNAUTHENTICATED"
  }
}

Solution: Provide a valid API key in the x-goog-api-key header or one of the other supported headers.
{
  "error": {
    "code": 400,
    "message": "Invalid request format",
    "status": "INVALID_ARGUMENT"
  }
}

Solution: Check your request body format. Ensure the contents array is properly formatted with valid roles and parts.
{
  "error": {
    "code": 429,
    "message": "Rate limit exceeded",
    "status": "RESOURCE_EXHAUSTED"
  }
}

Solution: Reduce your request rate or upgrade your plan for higher limits. Adaptive's load balancing helps distribute requests across providers.
{
  "error": {
    "code": 500,
    "message": "Internal server error",
    "status": "INTERNAL"
  }
}

Solution: Temporary server issue. Adaptive's fallback system will automatically retry with alternative providers.
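A client can branch on the status field to decide whether a failed request is worth retrying. A minimal sketch; the retry classification is an illustrative policy, not part of the API:

```typescript
// Statuses worth retrying: rate limits (RESOURCE_EXHAUSTED) and transient
// server errors (INTERNAL). Client-side errors should be fixed, not retried.
function shouldRetry(status: string): boolean {
  return status === 'RESOURCE_EXHAUSTED' || status === 'INTERNAL';
}
```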
Features & Benefits
- Google SDK Compatible: Drop-in replacement for Google's Gemini API; use the official @google/genai SDK without changes.
- Multi-Provider Routing: Access models from Google, Anthropic, OpenAI, and more through a single endpoint.
- Intelligent Caching: Semantic and prompt caching reduce costs by up to 90% for similar requests.
- Automatic Fallbacks: Provider failures automatically route to alternative models for high reliability.
- Cost Optimization: Intelligent routing selects the most cost-effective model for each request.
- Usage Analytics: Detailed token usage, costs, and performance metrics in the dashboard.
SDK Integration
For full SDK integration guide with code examples and best practices, see: