A/B Testing - Traffic Mirroring
Traffic mirroring allows you to "mimic" production traffic to a secondary (silent) model for evaluation purposes. The silent model's response is gathered in the background and does not affect the latency or result of the primary request.
This is useful for:
- Testing a new model's performance on production prompts before switching.
- Comparing costs and latency between different providers.
- Debugging issues by mirroring traffic to a more verbose model.
Quick Start​
To enable traffic mirroring, add silent_model to the litellm_params of a deployment.
- SDK
- Proxy
from litellm import Router
model_list = [
{
"model_name": "gpt-3.5-turbo",
"litellm_params": {
"model": "azure/chatgpt-v-2",
"api_key": "...",
"silent_model": "gpt-4" # 👈 Mirror traffic to gpt-4
},
},
{
"model_name": "gpt-4",
"litellm_params": {
"model": "openai/gpt-4",
"api_key": "..."
},
}
]
router = Router(model_list=model_list)
# The request to "gpt-3.5-turbo" will trigger a background call to "gpt-4"
response = await router.acompletion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "How does traffic mirroring work?"}]
)
Add silent_model to your config.yaml:
model_list:
- model_name: primary-model
litellm_params:
model: azure/gpt-35-turbo
api_key: os.environ/AZURE_API_KEY
silent_model: evaluation-model # 👈 Mirror traffic here
- model_name: evaluation-model
litellm_params:
model: openai/gpt-4o
api_key: os.environ/OPENAI_API_KEY
How it works​
- Request Received: A request is made to a model group (e.g.
primary-model). - Deployment Picked: LiteLLM picks a deployment from the group.
- Primary Call: LiteLLM makes the call to the primary deployment.
- Mirroring: If
silent_modelis present, LiteLLM triggers a background call to that model.- For Sync calls: Uses a shared thread pool.
- For Async calls: Uses
asyncio.create_task.
- Isolation: The background call uses a
deepcopyof the original request parameters and setsmetadata["is_silent_experiment"] = True. It also strips out logging IDs to prevent collisions in usage tracking.
Key Features​
- Latency Isolation: The primary request returns as soon as it's ready. The background (silent) call does not block.
- Unified Logging: Background calls are processed via the Router, meaning they are automatically logged to your configured observability tools (Langfuse, S3, etc.).
- Evaluation: Use the
is_silent_experiment: Trueflag in your logs to filter and compare results between the primary and mirrored calls.