The `provider` object is accepted on Chat Completions, Anthropic Messages, and Gemini Generate requests and is enforced consistently across non-streaming and streaming executions (including fallback paths).
## Request Shape
`provider.allow_fallbacks` simply toggles the new `fallback.enabled` flag. Use the `fallback` object when you need finer control over mode, retries, or circuit breakers.
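Here is a minimal sketch of a request carrying a `provider` object, assuming a standard OpenAI-compatible Chat Completions deployment. The base URL, API key handling, bearer-token auth, and provider tags (`groq`, `together`) are placeholders, not values from this project.

```python
import requests

BASE_URL = "https://adaptive.example.com"  # placeholder deployment URL
API_KEY = "YOUR_API_KEY"                   # placeholder credential

payload = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Summarize RFC 9110 in two sentences."}],
    # Provider constraints ride alongside the standard Chat Completions fields.
    "provider": {
        "order": ["groq", "together"],  # illustrative provider tags
        "allow_fallbacks": True,        # shorthand for fallback.enabled
    },
}

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```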
## Field Reference

- `order`: Explicit list of provider tags to try first. When omitted, Adaptive's heuristics determine the initial ordering.
- `only`: Whitelist of providers/endpoint tags. Requests are rejected if no allowed provider remains.
- `ignore`: Blacklist of providers/endpoint tags to skip even when the router selects them.
- `sort`: Secondary ordering when `order` is absent. `price`, `throughput`, and `latency` map to cost, capacity, and responsiveness heuristics.
- `quantizations`: Require specific quantization levels (e.g., `["int8", "fp8"]`). Endpoint metadata is used; models lacking the requested format are filtered out.
- `require_parameters`: When true, the model must advertise support for every parameter implied by the request (tools, `response_format`, etc.).
- `data_collection`: `allow` (default) or `deny`. When `deny`, only providers marked as non-retentive in the registry remain. (Falls back to current metadata; future registry updates will make this stricter.)
- `zdr`: Restrict routing to Zero Data Retention endpoints.
- `enforce_distillable_text`: Filter to models whose publishers have opted into distillable outputs.
- `allow_fallbacks`: Convenience flag that maps to `fallback.enabled`. Set to `false` to disable provider retries entirely.
- `max_price`: Ceilings for prompt/completion/request/image pricing. Providers lacking explicit pricing are treated as exceeding the cap.
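To make the field semantics concrete, here is a hedged example of a `provider` block that combines several of the controls above. The provider tags and price numbers are illustrative, and the nested shape of `max_price` is an assumption based on the prompt/completion/request/image ceilings described above; check your registry for the tags and units it actually exposes.

```python
# Illustrative provider block combining several fields from the reference above.
provider = {
    "only": ["openai", "fireworks"],   # whitelist; request is rejected if none remain
    "ignore": ["slow-provider"],       # blacklist, even if the router would pick it
    "sort": "price",                   # secondary ordering since "order" is omitted
    "quantizations": ["fp8"],          # drop endpoints without fp8 metadata
    "require_parameters": True,        # model must support tools, response_format, etc.
    "data_collection": "deny",         # keep only non-retentive providers
    "max_price": {
        "prompt": 1.0,                 # providers without explicit pricing are
        "completion": 2.0,             # treated as exceeding these ceilings
    },
}
```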
## Intelligent Routing + Provider Constraints
- Logical model selection still happens through the Adaptive router (unless you hard-code `model`).
- Provider constraints (`order`/`only`/quantization/price/etc.) are applied when building the physical execution plan.
- Fallback now respects `fallback.enabled`. When disabled, the first provider failure surfaces directly.
- Constraints apply to the whole plan: if you request `quantizations: ["fp8"]`, every provider in the execution plan satisfies that requirement.
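A short sketch of how these points interact, showing only the request body (the HTTP plumbing from the earlier example is omitted): pinning `model` skips logical selection, the quantization filter shapes the physical plan, and disabling fallback surfaces the first provider failure.

```python
# Hard-coding the model skips logical routing; provider constraints still
# shape the physical execution plan; disabled fallback returns the first
# provider failure to the caller instead of retrying elsewhere.
payload = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "ping"}],
    "provider": {"quantizations": ["fp8"]},  # every planned provider must offer fp8
    "fallback": {"enabled": False},          # no retries across providers
}
```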
## Nitro / Floor Shortcuts
Suffix shortcuts work out of the box:

- Append `:nitro` to any model slug to imply `provider.sort = "throughput"`.
- Append `:floor` to imply `provider.sort = "price"`.
The suffix is appended to `model` directly (e.g., `meta-llama/llama-3.1-70b-instruct:nitro`).
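Under the suffix rules above, the two request bodies below should be equivalent; the surrounding fields are the usual Chat Completions ones and the messages content is arbitrary.

```python
# Shorthand: the ":nitro" suffix implies provider.sort = "throughput".
shorthand = {
    "model": "meta-llama/llama-3.1-70b-instruct:nitro",
    "messages": [{"role": "user", "content": "Fast answer, please."}],
}

# Explicit equivalent.
explicit = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Fast answer, please."}],
    "provider": {"sort": "throughput"},
}
```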
## Endpoint Coverage
| Endpoint | Support |
|---|---|
| `/v1/chat/completions` | Full `provider` object + `fallback.enabled` |
| `/v1/messages` | Same `provider` fields + fallback toggle |
| `/v1/models/:generate` | Same `provider` fields + fallback toggle |
The Gemini streaming API now builds the same provider execution plan as the non-streaming route, so ordering and filtering are consistent everywhere.
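For illustration, the same `provider` and `fallback` fields attached to an Anthropic-style Messages request. The `x-api-key` header name, the `max_tokens` requirement, and the model slug follow the upstream Anthropic API and are assumptions about this deployment; the base URL and provider tag are placeholders.

```python
import requests

BASE_URL = "https://adaptive.example.com"  # placeholder deployment URL
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "claude-sonnet-4-20250514",   # illustrative slug
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
    # Same provider fields as on /v1/chat/completions.
    "provider": {"sort": "latency", "ignore": ["slow-provider"]},
    "fallback": {"enabled": True},
}

resp = requests.post(
    f"{BASE_URL}/v1/messages",
    headers={"x-api-key": API_KEY},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```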
## Migration Tips
- Existing code: No changes required unless you want to leverage the new controls. Previous behavior (no `provider` object) is unchanged.
- Fallback: If you relied on "unset mode = disabled," switch to `fallback.enabled = false` (or `provider.allow_fallbacks = false`).
- Registry metadata: Some filters (data collection, ZDR, distillable text) depend on registry tags. They currently act as "best effort" switches and will grow stricter as the registry schema expands.
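If you previously left the fallback mode unset to mean "disabled," a sketch of the explicit replacement, using either the `fallback` object or the `provider.allow_fallbacks` shorthand described above:

```python
# Before: leaving fallback unset no longer implies "disabled".
# After: disable provider retries explicitly, with either equivalent form.
explicit = {"fallback": {"enabled": False}}
shorthand = {"provider": {"allow_fallbacks": False}}

payload = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    **explicit,  # or **shorthand
}
```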



