[New] Fallback Management Endpoints
Dedicated endpoints for managing model fallbacks separately from the general configuration.
Overview​
These endpoints allow you to configure, retrieve, and delete fallback models without modifying the entire proxy configuration. This provides a cleaner and safer way to manage fallbacks compared to using the /config/update endpoint.
Prerequisites​
- Database storage must be enabled: Set
STORE_MODEL_IN_DB=Truein your environment - Models must exist in the router before configuring fallbacks
Endpoints​
POST /fallback​
Create or update fallbacks for a specific model.
Request Body:
{
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4", "claude-3-haiku"],
"fallback_type": "general"
}
Parameters:
model(string, required): The primary model name to configure fallbacks forfallback_models(array of strings, required): List of fallback model names in priority orderfallback_type(string, optional): Type of fallback. Options:"general"(default): Standard fallbacks for any error"context_window": Fallbacks for context window exceeded errors"content_policy": Fallbacks for content policy violations
Response:
{
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4", "claude-3-haiku"],
"fallback_type": "general",
"message": "Fallback configuration created successfully"
}
Example using cURL:
curl -X POST "http://localhost:4000/fallback" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4", "claude-3-haiku"],
"fallback_type": "general"
}'
Example using Python:
import requests
response = requests.post(
"http://localhost:4000/fallback",
headers={
"Authorization": "Bearer sk-1234",
"Content-Type": "application/json"
},
json={
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4", "claude-3-haiku"],
"fallback_type": "general"
}
)
print(response.json())
GET /fallback/{model}​
Get fallback configuration for a specific model.
Parameters:
model(path parameter, required): The model name to get fallbacks forfallback_type(query parameter, optional): Type of fallback to retrieve (default: "general")
Response:
{
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4", "claude-3-haiku"],
"fallback_type": "general"
}
Example using cURL:
curl -X GET "http://localhost:4000/fallback/gpt-3.5-turbo?fallback_type=general" \
-H "Authorization: Bearer sk-1234"
Example using Python:
import requests
response = requests.get(
"http://localhost:4000/fallback/gpt-3.5-turbo",
headers={"Authorization": "Bearer sk-1234"},
params={"fallback_type": "general"}
)
print(response.json())
DELETE /fallback/{model}​
Delete fallback configuration for a specific model.
Parameters:
model(path parameter, required): The model name to delete fallbacks forfallback_type(query parameter, optional): Type of fallback to delete (default: "general")
Response:
{
"model": "gpt-3.5-turbo",
"fallback_type": "general",
"message": "Fallback configuration deleted successfully"
}
Example using cURL:
curl -X DELETE "http://localhost:4000/fallback/gpt-3.5-turbo?fallback_type=general" \
-H "Authorization: Bearer sk-1234"
Example using Python:
import requests
response = requests.delete(
"http://localhost:4000/fallback/gpt-3.5-turbo",
headers={"Authorization": "Bearer sk-1234"},
params={"fallback_type": "general"}
)
print(response.json())
Test fallback​
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "ping"
}
],
"mock_testing_fallbacks": true
}
'
Validation​
The endpoints perform the following validations:
- Model Existence: Verifies that the primary model exists in the router
- Fallback Model Existence: Ensures all fallback models exist in the router
- No Self-Fallback: Prevents a model from being its own fallback
- No Duplicates: Ensures no duplicate models in the fallback list
- Database Enabled: Requires
STORE_MODEL_IN_DB=Trueto be set
Error Responses​
400 Bad Request​
{
"detail": {
"error": "Invalid fallback models: ['non-existent-model']",
"available_models": ["gpt-3.5-turbo", "gpt-4", "claude-3-haiku"]
}
}
404 Not Found​
{
"detail": {
"error": "Model 'gpt-3.5-turbo' not found in router",
"available_models": ["gpt-4", "claude-3-haiku"]
}
}
500 Internal Server Error​
{
"detail": {
"error": "Router not initialized"
}
}
Fallback Types Explained​
General Fallbacks​
Used for any type of error that occurs during model invocation. This is the most common type of fallback.
Use Case: When a model is unavailable, rate-limited, or returns an error.
{
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4", "claude-3-haiku"],
"fallback_type": "general"
}
Context Window Fallbacks​
Specifically triggered when a context window exceeded error occurs.
Use Case: When the input is too long for the primary model, fallback to a model with a larger context window.
{
"model": "gpt-3.5-turbo",
"fallback_models": ["gpt-4-32k", "claude-3-opus"],
"fallback_type": "context_window"
}
Content Policy Fallbacks​
Specifically triggered when content policy violations occur.
Use Case: When the primary model rejects content due to safety filters, fallback to a model with different content policies.
{
"model": "gpt-4",
"fallback_models": ["claude-3-haiku"],
"fallback_type": "content_policy"
}
Benefits Over /config/update​
- Safety: Only modifies fallback configuration, won't accidentally change other settings
- Simplicity: Focused API with clear validation messages
- Granularity: Manage fallbacks per model and per type
- Validation: Comprehensive checks ensure configuration is valid before applying
- Clarity: Clear error messages with available models listed
Notes​
- Fallbacks are triggered after the configured number of retries fails
- Fallbacks are attempted in the order specified in
fallback_models - The maximum number of fallbacks attempted is controlled by the router's
max_fallbackssetting - Changes take effect immediately and are persisted to the database