
Auto Routing

LiteLLM can automatically select the best model for a request based on rules you define.

LiteLLM Python SDK

Auto routing allows you to define routing rules that automatically select the best model for a request based on the input content. This is useful for directing different types of queries to specialized models.

Setup

  1. Create a router configuration file (e.g., router.json):
{
  "encoder_type": "openai",
  "encoder_name": "text-embedding-3-large",
  "routes": [
    {
      "name": "litellm-gpt-4.1",
      "utterances": [
        "litellm is great"
      ],
      "description": "positive affirmation",
      "function_schemas": null,
      "llm": null,
      "score_threshold": 0.5,
      "metadata": {}
    },
    {
      "name": "litellm-claude-35",
      "utterances": [
        "how to code a program in [language]"
      ],
      "description": "coding assistant",
      "function_schemas": null,
      "llm": null,
      "score_threshold": 0.5,
      "metadata": {}
    }
  ]
}
  2. Configure the Router with auto routing models:
from litellm import Router
import os

router = Router(
    model_list=[
        # Embedding models for routing
        {
            "model_name": "custom-text-embedding-model",
            "litellm_params": {
                "model": "text-embedding-3-large",
                "api_key": os.getenv("OPENAI_API_KEY"),
            },
        },
        # Your target models
        {
            "model_name": "litellm-gpt-4.1",
            "litellm_params": {
                "model": "gpt-4.1",
            },
            "model_info": {"id": "openai-id"},
        },
        {
            "model_name": "litellm-claude-35",
            "litellm_params": {
                "model": "claude-3-5-sonnet-latest",
            },
            "model_info": {"id": "claude-id"},
        },
        # Auto router configuration
        {
            "model_name": "auto_router1",
            "litellm_params": {
                "model": "auto_router/auto_router_1",
                "auto_router_config_path": "router.json",
                "auto_router_default_model": "gpt-4o-mini",
                "auto_router_embedding_model": "custom-text-embedding-model",
            },
        },
    ],
)

Usage

Once configured, use the auto router by calling it with your auto router model name:

import asyncio

async def main():
    # This request will be routed to gpt-4.1 based on the utterance match
    response = await router.acompletion(
        model="auto_router1",
        messages=[{"role": "user", "content": "litellm is great"}],
    )

    # This request will be routed to claude-3-5-sonnet-latest for coding queries
    response = await router.acompletion(
        model="auto_router1",
        messages=[{"role": "user", "content": "how to code a program in python"}],
    )

# `router` is the Router instance configured in Setup
asyncio.run(main())

Configuration Parameters

  • auto_router_config_path: Path to your router.json configuration file
  • auto_router_default_model: Fallback model when no route matches
  • auto_router_embedding_model: Model used for generating embeddings to match against utterances

Router Configuration Schema

The router.json file supports the following structure:

  • encoder_type: Type of encoder (e.g., "openai")
  • encoder_name: Name of the embedding model
  • routes: Array of routing rules with:
    • name: Target model name (must match a model in your model_list)
    • utterances: Example phrases/patterns to match against
    • description: Human-readable description of the route
    • score_threshold: Minimum similarity score to trigger this route (0.0-1.0)
    • metadata: Additional metadata for the route
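The schema above can be sanity-checked before handing the file to the Router. The following is a minimal sketch, not part of LiteLLM: `check_router_config` is a hypothetical helper that only verifies the fields described in this section (top-level keys, route names matching your model_list, non-empty utterances, and a score_threshold in the 0.0-1.0 range).

```python
def check_router_config(config: dict, model_names: set) -> list:
    """Return human-readable problems found in a router.json-style dict.

    Hypothetical helper for illustration; LiteLLM does not ship this function.
    """
    problems = []
    for key in ("encoder_type", "encoder_name", "routes"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    for route in config.get("routes", []):
        name = route.get("name", "<unnamed>")
        # Each route name must match a model_name in the Router's model_list.
        if name not in model_names:
            problems.append(f"route {name!r} does not match any model_name")
        if not route.get("utterances"):
            problems.append(f"route {name!r} has no utterances")
        threshold = route.get("score_threshold")
        if not (isinstance(threshold, (int, float)) and 0.0 <= threshold <= 1.0):
            problems.append(f"route {name!r} has score_threshold outside 0.0-1.0")
    return problems


# Same shape as the router.json shown in Setup (optional fields omitted).
example_config = {
    "encoder_type": "openai",
    "encoder_name": "text-embedding-3-large",
    "routes": [
        {"name": "litellm-gpt-4.1", "utterances": ["litellm is great"],
         "description": "positive affirmation", "score_threshold": 0.5},
        {"name": "litellm-claude-35",
         "utterances": ["how to code a program in [language]"],
         "description": "coding assistant", "score_threshold": 0.5},
    ],
}
known_models = {"litellm-gpt-4.1", "litellm-claude-35"}

for problem in check_router_config(example_config, known_models):
    print(problem)
```

To check a file on disk, load it with `json.load` and pass the resulting dict along with the set of model_name values from your model_list.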

LiteLLM Proxy Server

Setup

Navigate to the LiteLLM UI and go to Models+Endpoints > Add Model > Auto Router Tab.

Configure the following required fields:

  • Auto Router Name - The model name that developers will use when making LLM API requests to LiteLLM
  • Default Model - The fallback model used when no route is matched (e.g., if set to "gpt-4o-mini", unmatched requests will be routed to gpt-4o-mini)
  • Embedding Model - The model used to generate embeddings for input messages. These embeddings are used to semantically match input against the utterances defined in your routes

Route Configuration

Click Add Route to create a new routing rule. Each route consists of utterances that are matched against input messages to determine the target model.

Configure each route with:

  • Utterances - Example phrases that will trigger this route. Use placeholders in brackets for variables:
"how to code a program in [language]",
"can you explain this [language] code",
"can you explain this [language] script",
"can you convert this [language] code to [target_language]"
  • Description - A human-readable description of what this route handles
  • Score Threshold - The minimum similarity score (0.0-1.0) required to trigger this route

Usage

Once the auto router is added, developers use it by setting the model field of their LLM API request to the Auto Router Name (here, auto_router1).

import openai

client = openai.OpenAI(
    api_key="sk-1234",  # replace with your LiteLLM API key
    base_url="http://localhost:4000",
)

# This request will be auto-routed based on the content
response = client.chat.completions.create(
    model="auto_router1",
    messages=[
        {
            "role": "user",
            "content": "how to code a program in python",
        }
    ],
)

print(response)

How It Works

  1. When a request comes in, LiteLLM generates embeddings for the input message
  2. It compares these embeddings against all utterances defined across all your routes simultaneously
  3. It identifies the route with the highest similarity score. If that score exceeds the route's defined threshold, the request is routed to that model. (Because the router selects the global maximum score rather than stopping at the first match, the order of routes in your configuration does not affect which route is selected.)
  4. If no route's maximum score meets its threshold, the request goes to the default model
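The four steps above can be sketched in a few lines of plain Python. This is an illustration of the described selection logic, not LiteLLM's implementation: `cosine`, `pick_route`, and the `utterance_vectors` field are hypothetical names, the vectors are hand-written stand-ins for real embeddings, and treating a score equal to the threshold as a match is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_route(query_vec, routes, default_model):
    """Pick the route whose best utterance is most similar to the query.

    Every utterance of every route is scored, and only the global maximum
    is compared against its route's threshold, so route order is irrelevant.
    """
    best_name, best_score, best_threshold = None, -1.0, 0.0
    for route in routes:
        for vec in route["utterance_vectors"]:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_name = route["name"]
                best_score = score
                best_threshold = route["score_threshold"]
    # Assumption: a score equal to the threshold counts as a match.
    if best_name is not None and best_score >= best_threshold:
        return best_name
    return default_model


# Toy 3-dimensional "embeddings"; a real setup would call the embedding model.
routes = [
    {"name": "litellm-gpt-4.1", "score_threshold": 0.5,
     "utterance_vectors": [[1.0, 0.0, 0.0]]},
    {"name": "litellm-claude-35", "score_threshold": 0.5,
     "utterance_vectors": [[0.0, 1.0, 0.0]]},
]

print(pick_route([0.9, 0.1, 0.0], routes, "gpt-4o-mini"))  # close to the first route
print(pick_route([0.0, 0.0, 1.0], routes, "gpt-4o-mini"))  # matches nothing, falls back
```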

Complexity Router

The Complexity Router is an alternative to semantic routing. It uses rule-based scoring to classify requests by complexity and route them to appropriate models, with zero external API calls and sub-millisecond latency.

When to Use

| Feature | Semantic Auto Router | Complexity Router |
|---|---|---|
| Classification | Embedding-based matching | Rule-based scoring |
| Latency | ~100-500ms (embedding API) | <1ms |
| API Calls | Requires embedding model | None |
| Training | Requires utterance examples | Works out of the box |
| Best For | Intent-based routing | Cost optimization |

Use Complexity Router when you want to:

  • Route simple queries to cheaper/faster models (e.g., gpt-4o-mini)
  • Route complex queries to more capable models (e.g., claude-sonnet-4)
  • Minimize latency overhead from routing decisions
  • Avoid additional API costs for embeddings

LiteLLM Python SDK

from litellm import Router

router = Router(
    model_list=[
        # Target models for each tier
        {
            "model_name": "gpt-4o-mini",
            "litellm_params": {"model": "gpt-4o-mini"},
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {"model": "gpt-4o"},
        },
        {
            "model_name": "claude-sonnet",
            "litellm_params": {"model": "claude-sonnet-4-20250514"},
        },
        {
            "model_name": "o1-preview",
            "litellm_params": {"model": "o1-preview"},
        },
        # Complexity router configuration
        {
            "model_name": "smart-router",
            "litellm_params": {
                "model": "auto_router/complexity_router",
                "complexity_router_config": {
                    "tiers": {
                        "SIMPLE": "gpt-4o-mini",
                        "MEDIUM": "gpt-4o",
                        "COMPLEX": "claude-sonnet",
                        "REASONING": "o1-preview",
                    },
                },
                "complexity_router_default_model": "gpt-4o",
            },
        },
    ],
)

Usage

import asyncio

async def main():
    # Simple query → routes to gpt-4o-mini
    response = await router.acompletion(
        model="smart-router",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )

    # Complex technical query → routes to claude-sonnet or higher
    response = await router.acompletion(
        model="smart-router",
        messages=[{"role": "user", "content": "Design a distributed microservice architecture with Kubernetes orchestration"}],
    )

    # Reasoning request → routes to o1-preview
    response = await router.acompletion(
        model="smart-router",
        messages=[{"role": "user", "content": "Think step by step and reason through this problem carefully..."}],
    )

# `router` is the Router instance configured above
asyncio.run(main())

LiteLLM Proxy Server

Add the complexity router to your config.yaml:

model_list:
  # Target models
  - model_name: gpt-4o-mini
    litellm_params:
      model: gpt-4o-mini

  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o

  - model_name: claude-sonnet
    litellm_params:
      model: claude-sonnet-4-20250514

  - model_name: o1-preview
    litellm_params:
      model: o1-preview

  # Complexity router
  - model_name: smart-router
    litellm_params:
      model: auto_router/complexity_router
      complexity_router_config:
        tiers:
          SIMPLE: gpt-4o-mini
          MEDIUM: gpt-4o
          COMPLEX: claude-sonnet
          REASONING: o1-preview
      complexity_router_default_model: gpt-4o

Configuration Options

Tier Boundaries

Customize the score thresholds for each tier:

complexity_router_config:
  tiers:
    SIMPLE: gpt-4o-mini
    MEDIUM: gpt-4o
    COMPLEX: claude-sonnet
    REASONING: o1-preview
  tier_boundaries:
    simple_medium: 0.15      # Below 0.15 → SIMPLE
    medium_complex: 0.35     # 0.15-0.35 → MEDIUM
    complex_reasoning: 0.60  # 0.35-0.60 → COMPLEX, above → REASONING
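
As a rough illustration of how these boundaries partition the score range, the sketch below maps a complexity score to a tier name. `tier_for_score` is a hypothetical helper, not a LiteLLM function, and how a score that lands exactly on a boundary is handled is an assumption (here it goes to the higher tier).

```python
DEFAULT_BOUNDARIES = {
    "simple_medium": 0.15,
    "medium_complex": 0.35,
    "complex_reasoning": 0.60,
}


def tier_for_score(score, boundaries=None):
    """Map a complexity score to a tier name using the three boundaries.

    Assumption: a score exactly on a boundary falls into the higher tier.
    """
    b = boundaries or DEFAULT_BOUNDARIES
    if score < b["simple_medium"]:
        return "SIMPLE"
    if score < b["medium_complex"]:
        return "MEDIUM"
    if score < b["complex_reasoning"]:
        return "COMPLEX"
    return "REASONING"


for score in (0.05, 0.20, 0.50, 0.90):
    print(score, tier_for_score(score))
```

Raising simple_medium sends more traffic to the SIMPLE tier's model, which is the lever to reach for when optimizing cost.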

Token Thresholds

Adjust when prompts are considered "short" or "long":

complexity_router_config:
  token_thresholds:
    simple: 15    # Prompts under 15 tokens are penalized (simple indicator)
    complex: 400  # Prompts over 400 tokens get complexity boost

Dimension Weights

Customize how much each signal contributes to the complexity score:

complexity_router_config:
  dimension_weights:
    tokenCount: 0.10          # Prompt length
    codePresence: 0.30        # Code-related keywords
    reasoningMarkers: 0.25    # "step by step", "think through", etc.
    technicalTerms: 0.25      # Domain-specific complexity
    simpleIndicators: 0.05    # "what is", "define", greetings
    multiStepPatterns: 0.03   # "first...then", numbered steps
    questionComplexity: 0.02  # Multiple questions

How Complexity Routing Works

The router scores each request across 7 dimensions:

| Dimension | What It Detects | Effect |
|---|---|---|
| Token Count | Short (<15) or long (>400) prompts | Short = simple, long = complex |
| Code Presence | "function", "class", "api", "database", etc. | Increases complexity |
| Reasoning Markers | "step by step", "think through", "analyze" | Triggers REASONING tier |
| Technical Terms | "architecture", "distributed", "encryption" | Increases complexity |
| Simple Indicators | "what is", "define", "hello" | Decreases complexity |
| Multi-Step Patterns | "first...then", "1. 2. 3." | Increases complexity |
| Question Complexity | Multiple question marks | Increases complexity |

Special behavior: If 2+ reasoning markers are detected in the user message, the request automatically routes to the REASONING tier regardless of the weighted score.
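
The scoring and the special case can be pictured with a small sketch. This is a guess at the general shape, not the router's actual code: `weighted_score` and `forces_reasoning` are hypothetical names, the per-dimension signal values are assumed to be numbers supplied by the caller (how LiteLLM derives them is not covered here), and the marker list contains only the three examples from the table.

```python
# Default weights as listed under Dimension Weights.
WEIGHTS = {
    "tokenCount": 0.10,
    "codePresence": 0.30,
    "reasoningMarkers": 0.25,
    "technicalTerms": 0.25,
    "simpleIndicators": 0.05,
    "multiStepPatterns": 0.03,
    "questionComplexity": 0.02,
}

# Only the example markers from the table; the real list is likely longer.
REASONING_MARKERS = ("step by step", "think through", "analyze")


def weighted_score(signals):
    """Combine per-dimension signals into one score via a weighted sum.

    Assumption: each signal is a number supplied by the caller, and
    dimensions missing from `signals` contribute nothing.
    """
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def forces_reasoning(user_message):
    """True when 2+ reasoning markers appear, which bypasses the weighted score."""
    lowered = user_message.lower()
    hits = sum(1 for marker in REASONING_MARKERS if marker in lowered)
    return hits >= 2


print(weighted_score({"codePresence": 1.0, "technicalTerms": 0.5}))  # roughly 0.425
print(forces_reasoning("Think step by step and analyze this proof"))
print(forces_reasoning("What is 2+2?"))
```

With the default weights, code and technical vocabulary dominate the score, while question count and multi-step patterns act mostly as tie-breakers.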