v1.80.5-stable - Gemini 3.0 Support
Deploy this version​
Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.80.5-stable
```

Pip:

```shell
pip install litellm==1.80.5
```
Key Highlights​
- Gemini 3 - Day-0 support for Gemini 3 models with thought signatures
- Prompt Management - Full prompt versioning support with UI for editing, testing, and version history
- MCP Hub - Publish and discover MCP servers within your organization
- Model Compare UI - Side-by-side model comparison interface for testing
- Batch API Spend Tracking - Granular spend tracking with custom metadata for batch and file creation requests
- AWS IAM Secret Manager - IAM role authentication support for AWS Secret Manager
- Logging Callback Controls - Admin-level controls to prevent callers from disabling logging callbacks in compliance environments
- Proxy CLI JWT Authentication - Enable developers to authenticate to LiteLLM AI Gateway using the Proxy CLI
- Batch API Routing - Route batch operations to different provider accounts using model-specific credentials from your config.yaml
Prompt Management​
This release introduces LiteLLM Prompt Studio - a comprehensive prompt management solution built directly into the LiteLLM UI. Create, test, and version your prompts without leaving your browser.
You can now do the following in LiteLLM Prompt Studio:
- Create & Test Prompts: Build prompts with developer messages (system instructions) and test them in real-time with an interactive chat interface
- Dynamic Variables: Use `{{variable_name}}` syntax to create reusable prompt templates with automatic variable detection
- Version Control: Automatic versioning for every prompt update with complete version history tracking and rollback capabilities
- Prompt Studio: Edit prompts in a dedicated studio environment with live testing and preview
API Integration:
Use your prompts in any application with simple API calls:
```python
import openai

# Point the OpenAI SDK at your LiteLLM proxy (example base_url and key)
client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-1234")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],  # the stored prompt is applied by the proxy
    extra_body={
        "prompt_id": "your-prompt-id",
        "prompt_version": 2,  # Optional: specify version
        "prompt_variables": {"name": "value"},  # Optional: pass variables
    },
)
```
Get started here: LiteLLM Prompt Management Documentation
Performance – /realtime 182× Lower p99 Latency​
This update reduces /realtime latency by removing redundant encodings on the hot path, reusing shared SSL contexts, and caching formatting strings that were being regenerated twice per request despite rarely changing.
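The SSL-context change follows the caching pattern sketched below. This is an illustrative sketch only, not LiteLLM's internal code, and `cached_ssl_context` is a made-up helper name.

```python
import ssl
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_ssl_context(ca_bundle: str | None = None) -> ssl.SSLContext:
    # Building an SSLContext is expensive; create one per CA bundle and reuse it
    # across connections instead of rebuilding it on every request.
    return ssl.create_default_context(cafile=ca_bundle)
```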
Results​
| Metric | Before | After | Improvement |
|---|---|---|---|
| Median latency | 2,200 ms | 59 ms | −97% (~37× faster) |
| p95 latency | 8,500 ms | 67 ms | −99% (~127× faster) |
| p99 latency | 18,000 ms | 99 ms | −99% (~182× faster) |
| Average latency | 3,214 ms | 63 ms | −98% (~51× faster) |
| RPS | 165 | 1,207 | +631% (~7.3× increase) |
Test Setup​
| Category | Specification |
|---|---|
| Load Testing | Locust: 1,000 concurrent users, 500 ramp-up |
| System | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances |
| Database | PostgreSQL (Redis unused) |
| Configuration | config.yaml |
| Load Script | no_cache_hits.py |
Model Compare UI​
A new interactive playground UI enables side-by-side comparison of multiple LLM models, making it easy to evaluate and compare model responses.
Features:
- Compare responses from multiple models in real-time
- Side-by-side view with synchronized scrolling
- Support for all LiteLLM-supported models
- Cost tracking per model
- Response time comparison
- Pre-configured prompts for quick and easy testing
Details:
- Parameterization: Configure API keys, endpoints, models, and model parameters, as well as interaction types (chat completions, embeddings, etc.)
- Model Comparison: Compare up to 3 different models simultaneously with side-by-side response views
- Comparison Metrics: View detailed comparison information, including:
  - Time To First Token
  - Input / Output / Reasoning Tokens
  - Total Latency
  - Cost (if enabled in config)
- Safety Filters: Configure and test guardrails (safety filters) directly in the playground interface
Get Started with Model Compare
New Providers and Endpoints​
New Providers​
| Provider | Supported Endpoints | Description |
|---|---|---|
| Docker Model Runner | /v1/chat/completions | Run LLM models in Docker containers |
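For reference, a minimal sketch of calling the new provider through the Python SDK is shown below. The `docker_model_runner/` model prefix, the model name, and the local api_base are assumptions here, so check the Docker Model Runner provider docs for the exact values.

```python
import litellm

# Illustrative sketch only: provider prefix, model name, and api_base are assumptions.
response = litellm.completion(
    model="docker_model_runner/ai/smollm2",        # hypothetical model identifier
    api_base="http://localhost:12434/engines/v1",  # hypothetical local endpoint
    messages=[{"role": "user", "content": "Hello from a local container!"}],
)
print(response.choices[0].message.content)
```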
New Models / Updated Models​
New Model Support​
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Azure | azure/gpt-5.1 | 272K | $1.38 | $11.00 | Reasoning, vision, PDF input, responses API |
| Azure | azure/gpt-5.1-2025-11-13 | 272K | $1.38 | $11.00 | Reasoning, vision, PDF input, responses API |
| Azure | azure/gpt-5.1-codex | 272K | $1.38 | $11.00 | Responses API, reasoning, vision |
| Azure | azure/gpt-5.1-codex-2025-11-13 | 272K | $1.38 | $11.00 | Responses API, reasoning, vision |
| Azure | azure/gpt-5.1-codex-mini | 272K | $0.275 | $2.20 | Responses API, reasoning, vision |
| Azure | azure/gpt-5.1-codex-mini-2025-11-13 | 272K | $0.275 | $2.20 | Responses API, reasoning, vision |
| Azure EU | azure/eu/gpt-5-2025-08-07 | 272K | $1.375 | $11.00 | Reasoning, vision, PDF input |
| Azure EU | azure/eu/gpt-5-mini-2025-08-07 | 272K | $0.275 | $2.20 | Reasoning, vision, PDF input |
| Azure EU | azure/eu/gpt-5-nano-2025-08-07 | 272K | $0.055 | $0.44 | Reasoning, vision, PDF input |
| Azure EU | azure/eu/gpt-5.1 | 272K | $1.38 | $11.00 | Reasoning, vision, PDF input, responses API |
| Azure EU | azure/eu/gpt-5.1-codex | 272K | $1.38 | $11.00 | Responses API, reasoning, vision |
| Azure EU | azure/eu/gpt-5.1-codex-mini | 272K | $0.275 | $2.20 | Responses API, reasoning, vision |
| Gemini | gemini-3-pro-preview | 2M | $1.25 | $5.00 | Reasoning, vision, function calling |
| Gemini | gemini-3-pro-image | 2M | $1.25 | $5.00 | Image generation, reasoning |
| OpenRouter | openrouter/deepseek/deepseek-v3p1-terminus | 164K | $0.20 | $0.40 | Function calling, reasoning |
| OpenRouter | openrouter/moonshot/kimi-k2-instruct | 262K | $0.60 | $2.50 | Function calling, web search |
| OpenRouter | openrouter/gemini/gemini-3-pro-preview | 2M | $1.25 | $5.00 | Reasoning, vision, function calling |
| XAI | xai/grok-4.1-fast | 2M | $0.20 | $0.50 | Reasoning, function calling |
| Together AI | together_ai/z-ai/glm-4.6 | 203K | $0.40 | $1.75 | Function calling, reasoning |
| Cerebras | cerebras/gpt-oss-120b | 131K | $0.60 | $0.60 | Function calling |
| Bedrock | anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.00 | $15.00 | Computer use, reasoning, vision |
Features​
- Gemini (Google AI Studio + Vertex AI) (see the example after this list)
  - Add Day 0 gemini-3-pro-preview support - PR #16719
  - Add support for Gemini 3 Pro Image model - PR #16938
  - Add reasoning_content to streaming responses with tools enabled - PR #16854
  - Add includeThoughts=True for Gemini 3 reasoning_effort - PR #16838
  - Support thought signatures for Gemini 3 in responses API - PR #16872
  - Correct wrong system message handling for gemma - PR #16767
  - Gemini 3 Pro Image: capture image_tokens and support cost_per_output_image - PR #16912
  - Fix missing costs for gemini-2.5-flash-image - PR #16882
  - Gemini 3 thought signatures in tool call id - PR #16895
- Snowflake provider support: added embeddings, PAT, account_id - PR #15727
- Add oci_endpoint_id parameter for OCI Dedicated Endpoints - PR #16723
- Add support for Grok 4.1 Fast models - PR #16936
- Add GLM 4.6 from together.ai - PR #16942
- Fix Cerebras GPT-OSS-120B model name - PR #16939
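For the Gemini 3 items above, here is a minimal sketch of calling the new model through the Python SDK. The prompt text and the reasoning_effort level are illustrative, and the exact fields surfaced for thoughts and thought signatures depend on the model response.

```python
import litellm

# Minimal sketch: Gemini 3 via the Google AI Studio route.
# reasoning_effort is illustrative; Gemini 3 also returns thought signatures
# that LiteLLM now carries through (see the PRs above).
response = litellm.completion(
    model="gemini/gemini-3-pro-preview",
    messages=[{"role": "user", "content": "Explain thought signatures in one paragraph."}],
    reasoning_effort="high",
)
print(response.choices[0].message.content)
```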
Bug Fixes​
- General
LLM API Endpoints​
Features​
- Search APIs: fix "Invalid request body" error in firecrawl-search - PR #16943
- Fix videos tagging - PR #16770
Bugs​
- General
Management Endpoints / UI​
Features​
- Proxy CLI Auth
  - Allow using JWTs for signing in with Proxy CLI - PR #16756
- Virtual Keys
  - Fix Key Model Alias Not Working - PR #16896
- Models + Endpoints
- Teams
  - Teams table empty state - PR #16738
- Fallbacks
  - Fallbacks icon button tooltips and delete with friction - PR #16737
- MCP Servers
  - Delete user and MCP Server Modal, MCP Table Tooltips - PR #16751
- Callbacks
- Usage & Analytics
  - Allow partial matches for user ID in User Table - PR #16952
- General UI
Bugs​
- UI Fixes
  - Fix flaky tests due to antd Notification Manager - PR #16740
  - Fix UI MCP Tool Test Regression - PR #16695
  - Fix edit logging settings not appearing - PR #16798
  - Add CSS to truncate long request IDs in the request viewer - PR #16665
  - Remove azure/ prefix in placeholder for Azure in Add Model - PR #16597
  - Remove UI Session Token from user/info return - PR #16851
  - Remove console logs and errors from model tab - PR #16455
  - Change Bulk Invite User Roles to match backend - PR #16906
  - Mock Tremor's Tooltip to fix flaky UI tests - PR #16786
  - Fix e2e UI Playwright test - PR #16799
  - Fix tests in CI/CD - PR #16972
- SSO
- Auth
- Swagger UI
  - Fix Swagger UI resolver errors for chat completion endpoints caused by Pydantic v2 `$defs` not being properly exposed in the OpenAPI schema - PR #16784
AI Integrations​
Logging​
- Filter secret fields from Langfuse - PR #16842
- General
Guardrails
- Fix IBM Guardrails optional params, add extra_headers field - PR #16771
- Grayswan guardrail passthrough on flagged - PR #16891
- General Guardrails
  - Fix prompt injection not working - PR #16701
Prompt Management​
- Prompt Management
  - Allow specifying just prompt_id in a request to a model - PR #16834
  - Add support for versioning prompts - PR #16836
  - Allow storing prompt version in DB - PR #16848
  - Add UI for editing prompts - PR #16853
  - Allow testing prompts with the Chat UI - PR #16898
  - Allow viewing version history - PR #16901
  - Allow specifying prompt version in code - PR #16929
  - UI: allow seeing model and prompt ID for a prompt - PR #16932
  - Show "get code" section for prompt management, plus minor polish of the version history view - PR #16941
Secret Managers​
- AWS Secrets Manager
  - Adds IAM role assumption support for AWS Secret Manager - PR #16887
MCP Gateway​
- MCP Hub - Publish/discover MCP Servers within a company - PR #16857
- MCP Resources - MCP resources support - PR #16800
- MCP OAuth - Docs - mcp oauth flow details - PR #16742
- MCP Lifecycle - Drop MCPClient.connect and use run_with_session lifecycle - PR #16696
- MCP Server IDs - Add mcp server ids - PR #16904
- MCP URL Format - Fix mcp url format - PR #16940
Performance / Loadbalancing / Reliability improvements​
- Realtime Endpoint Performance - Fix bottlenecks degrading realtime endpoint performance - PR #16670
- SSL Context Caching - Cache SSL contexts to prevent excessive memory allocation - PR #16955
- Cache Optimization - Fix cache cooldown key generation - PR #16954
- Router Cache - Fix routing for requests with same cacheable prefix but different user messages - PR #16951
- Redis Event Loop - Fix redis event loop closed at first call - PR #16913
- Dependency Management - Upgrade pydantic to version 2.11.0 - PR #16909
Documentation Updates​
- Provider Documentation
- API Documentation
- General Documentation
  - Add mini-swe-agent to Projects built on LiteLLM - PR #16971
Infrastructure / CI/CD​
- UI Testing
- Dependency Management
- Migration
  - Migration job labels - PR #16831
- Config
  - This yaml actually works - PR #16757
- Release Notes
- Investigation
  - Investigate issue root cause - PR #16859
New Contributors​
- @mattmorgis made their first contribution in PR #16371
- @mmandic-coatue made their first contribution in PR #16732
- @Bradley-Butcher made their first contribution in PR #16725
- @BenjaminLevy made their first contribution in PR #16757
- @CatBraaain made their first contribution in PR #16767
- @tushar8408 made their first contribution in PR #16831
- @nbsp1221 made their first contribution in PR #16845
- @idola9 made their first contribution in PR #16832
- @nkukard made their first contribution in PR #16864
- @alhuang10 made their first contribution in PR #16852
- @sebslight made their first contribution in PR #16838
- @TsurumaruTsuyoshi made their first contribution in PR #16905
- @cyberjunk made their first contribution in PR #16492
- @colinlin-stripe made their first contribution in PR #16895
- @sureshdsk made their first contribution in PR #16883
- @eiliyaabedini made their first contribution in PR #16875
- @justin-tahara made their first contribution in PR #16957
- @wangsoft made their first contribution in PR #16913
- @dsduenas made their first contribution in PR #16891

