# v1.81.3-stable - Performance - 25% CPU Usage Reduction
## Deploy this version

**Docker**

```bash
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:v1.81.3.rc.2
```

**Pip**

```bash
pip install litellm==1.81.3.rc.2
```
## New Models / Updated Models

### New Model Support
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Deprecation Date |
|---|---|---|---|---|---|
| OpenAI | gpt-audio, gpt-audio-2025-08-28 | 128K | $32/1M audio tokens, $2.5/1M text tokens | $64/1M audio tokens, $10/1M text tokens | - |
| OpenAI | gpt-audio-mini, gpt-audio-mini-2025-08-28 | 128K | $10/1M audio tokens, $0.6/1M text tokens | $20/1M audio tokens, $2.4/1M text tokens | - |
| Deepinfra, Vertex AI, Google AI Studio, OpenRouter, Vercel AI Gateway | gemini-2.0-flash-001, gemini-2.0-flash | - | - | - | 2026-03-31 |
| Groq | groq/openai/gpt-oss-120b | 131K | $0.075/1M cache read | $0.6/1M output tokens | - |
| Groq | groq/openai/gpt-oss-20b | 131K | $0.0375/1M cache read, $0.075/1M text tokens | $0.3/1M output tokens | - |
| Vertex AI | gemini-2.5-computer-use-preview-10-2025 | 128K | $1.25 | $10 | - |
| Azure AI | claude-haiku-4-5 | - | $1.25/1M cache write ($2/1M above 1 hr), $0.1/1M text tokens | $5/1M output tokens | - |
| Azure AI | claude-sonnet-4-5 | - | $3.75/1M cache write ($6/1M above 1 hr), $3/1M text tokens | $15/1M output tokens | - |
| Azure AI | claude-opus-4-5 | - | $6.25/1M cache write ($10/1M above 1 hr), $0.5/1M text tokens | $25/1M output tokens | - |
| Azure AI | claude-opus-4-1 | - | $18.75/1M cache write ($30/1M above 1 hr), $1.5/1M text tokens | $75/1M output tokens | - |
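The new models are callable through the standard completion interface once you upgrade. A minimal sketch using one of the new Groq models above (the model string comes from the table; `GROQ_API_KEY` is assumed to be set in your environment):

```python
import litellm

# Minimal sketch: call one of the newly added models from the table above.
# Assumes GROQ_API_KEY is set in the environment.
response = litellm.completion(
    model="groq/openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```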
### Features

- Docs - Google Workload Identity Federation (WIF) support - PR #19320
- **Bedrock**
    - Fix streaming issues with AWS Bedrock AgentCore where responses would stop after the first chunk, particularly affecting OAuth-enabled agents - PR #17141
    - Support output format for Bedrock Invoke via `/v1/messages` - PR #19560
- **Gemini (Vertex AI, Google AI Studio)**
    - Use `responseJsonSchema` for Gemini 2.0+ models - PR #19314 (see the sketch after this list)
- **Volcengine**
    - Support the Volcengine Responses API - PR #18508
- New search provider - PR #19433
- **Sarvam AI**
    - Add support for new Sarvam models - PR #19479
- **GMI Cloud**
    - Add GMI Cloud provider support - PR #19376
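To illustrate the Gemini `responseJsonSchema` item above: LiteLLM maps an OpenAI-style `json_schema` response format onto Gemini's `responseJsonSchema` for 2.0+ models. A minimal sketch, assuming `GEMINI_API_KEY` is set in the environment; the schema name and fields are illustrative:

```python
import litellm

# Sketch: an OpenAI-style json_schema response_format is translated into
# Gemini's responseJsonSchema for Gemini 2.0+ models. Assumes GEMINI_API_KEY.
response = litellm.completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Name a city and its country."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",  # illustrative schema
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON matching the schema
```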
### Bug Fixes

- Expose Stability models via the `/image_edits` endpoint and ensure proper request transformation - PR #19323
- Fix Claude Code x Bedrock Invoke failing with `advanced-tool-use-2025-11-20` - PR #19373
- Deduplicate tool calls in assistant history - PR #19324
- Correct `us.anthropic.claude-opus-4-5` in-region pricing - PR #19310
- Fix request validation errors when using Claude 4 via Bedrock Invoke - PR #19381
- Handle thinking with tool calls for Claude 4 models - PR #19506 (see the sketch after this list)
- Correct the streaming choice index for tool calls - PR #19506
- Fix tool call errors with improved message extraction - PR #19369
- **Vertex AI**
    - Remove the optional `vertex_count_tokens_location` param before the request is sent to Vertex - PR #19359
- **Azure AI**
    - Fix Azure AI costs for Anthropic models - PR #19530
- Add tool choice mapping - PR #19645
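For the "thinking with tool calls" fix above, the pattern being exercised looks roughly like this. A sketch only: it assumes AWS credentials configured for Bedrock, an illustrative Claude 4-family model id, and LiteLLM's `thinking` parameter for Anthropic extended thinking:

```python
import litellm

# Illustrative tool definition.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Sketch: extended thinking alongside tool calls on a Claude 4-family model.
# Assumes AWS credentials for Bedrock; the model id is illustrative.
response = litellm.completion(
    model="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    thinking={"type": "enabled", "budget_tokens": 1024},
    max_tokens=2048,  # must exceed the thinking budget
)
print(response.choices[0].message.tool_calls)
```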
## AI API Endpoints (LLMs, MCP, Agents)

### Features

- Add managed files support when `load_balancing` is `True` - PR #19338
- Add a self-hosted Claude Code Plugin Marketplace - PR #19378

### Bugs

- Fix duplicate messages during MCP streaming tool execution - PR #19317
- Fix pickle error when using OpenAI's Responses API with `stream=True` and a `tool_choice` of type `allowed_tools` (an OpenAI-native parameter) - PR #17205
- Stream tool call events for non-OpenAI models - PR #19368
- Preserve tool output ordering for Gemini in the Responses bridge - PR #19360
- Add ID caching to prevent ID mismatches between `text-start` and `text-delta` - PR #19390
- Include `output_item`, `reasoning_summary_text_done`, and `reasoning_summary_part_done` events for non-OpenAI models - PR #19472
- Fix `drop_params` not dropping `prompt_cache_key` for non-OpenAI providers - PR #19346 (see the sketch after this list)
- Disable SSL for `ws://` WebSocket connections - PR #19345
- Log actual user input when Google GenAI/Vertex endpoints are called client-side - PR #19156
- **`/messages/count_tokens` Anthropic Token Counting**
    - Ensure it works for Anthropic and Azure AI Anthropic on the AI Gateway - PR #19432
- Forward `static_headers` to MCP servers - PR #19366
- Fix empty generation config for batch requests - PR #19556
- Always re-update the registry - PR #19420
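The `drop_params` fix above restores this behavior: `prompt_cache_key` is OpenAI-specific, so with `drop_params=True` LiteLLM strips it before calling other providers instead of erroring. A minimal sketch, assuming `ANTHROPIC_API_KEY` is set; the model name is illustrative:

```python
import litellm

# Sketch: prompt_cache_key is OpenAI-only; with drop_params=True LiteLLM
# drops it for non-OpenAI providers rather than failing the request.
# Assumes ANTHROPIC_API_KEY is set; the model name is illustrative.
response = litellm.completion(
    model="anthropic/claude-haiku-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    prompt_cache_key="my-cache-key",  # silently dropped for Anthropic
    drop_params=True,
)
print(response.choices[0].message.content)
```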
## Management Endpoints / UI

### Features

- **Cost Estimator**
    - Fix the model dropdown - PR #19529
- **Claude Code Plugins**
    - Allow adding Claude Code Plugins via the UI - PR #19387
- **General**
    - Respect custom authentication header overrides - PR #19276
- **MCP Servers**
    - Fix the MCP Tools tab resetting to Overview - PR #19468
- **Logs**
    - Include tool arguments in the spend logs table - PR #19640

### Bugs

- **Playground**
    - Increase the model selector width in the Playground Compare view - PR #19423
- **Virtual Keys**
    - Fix sorting showing incorrect entries - PR #19534
- **SSO**
    - Fix SSO user roles not updating for existing users - PR #19621
- **Guardrails**
    - Ensure guardrail patterns persist on edit and mode toggle - PR #19265
## AI Integrations

### Logging

- **Langfuse OTEL**
    - Ignore service logs and fix callback shadowing - PR #19298
- **Responses API Logging**
    - Fix a Pydantic serialization error - PR #19486
- **Arize Phoenix**
    - Add OpenInference span kinds to Arize Phoenix - PR #19267
- **Prometheus**
    - Add new Prometheus metrics for user count and team count - PR #19520

### Guardrails

- **Bedrock Guardrails**
    - Ensure `post_call` guardrails check input + output - PR #19151
- **Prompt Security**
    - Fix Prompt Security's guardrail implementation - PR #19374
- **Presidio**
    - Fix a crash in the Presidio guardrail when running in background threads (`logging_hook`) - PR #19714
- **Pillar Security**
    - Migrate Pillar Security to the generic guardrail API - PR #19364
- **Policy Engine**
    - New LiteLLM Policy Engine - create policies to manage guardrails, conditions, and permissions per key and team - PR #19612
- **General**
    - Add case-insensitive support for guardrail modes and actions - PR #19480

### Prompt Management

- **General**
    - Fix prompt info lookup and delete to use the correct IDs - PR #19358

### Secret Manager

- **AWS Secret Manager**
    - Ensure auto-rotation updates the existing AWS secret instead of creating a new one - PR #19455
- **Hashicorp Vault**
    - Ensure key rotations work with Vault - PR #19634
## Performance / Loadbalancing / Reliability improvements

- **General**
    - Fix date overflow / division-by-zero in proxy utils - PR #19527
    - Fix in-flight request termination on SIGTERM when the health check runs in a separate process - PR #19427
    - Fix pass-through routes to work with the server root path - PR #19383
    - Fix a logging error on StopIteration - PR #19649
    - Prevent retrying 4xx client errors - PR #19275
    - Add better error handling for misconfigured health checks - PR #19441
- **Dockerfile**
    - Redis Semantic Caching - add the missing `redisvl` dependency to requirements.txt - PR #19417
    - Bump OTEL versions to support the a2a dependency - resolves a `ModuleNotFoundError` for Microsoft Agents, by @Harshit28j - PR #18991
- **DB**
    - Handle PostgreSQL cached plan errors during rolling deployments - PR #19424
- **Timeouts**
    - Fix total timeout not being respected - PR #19389
- **Performance**
    - Cut `chat_completion` latency by ~21% by reducing pre-call processing time - PR #19535
    - Optimize `strip_trailing_slash` with an O(1) index check - PR #19679
    - Optimize `use_custom_pricing_for_model` with a set intersection - PR #19677
    - Skip `pattern_router.route()` for non-wildcard models - PR #19664
    - Add LRU caching to `get_model_info` for faster cost lookups - PR #19606 (see the sketch below)
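The `get_model_info` caching matters because cost tracking consults model info on every request. A minimal sketch of the lookup it speeds up, using the documented `litellm.get_model_info` fields:

```python
import litellm

# Repeated lookups like this hit the get_model_info path that is now
# LRU-cached, which is what speeds up per-request cost calculations.
info = litellm.get_model_info(model="gpt-4o")
print(info["input_cost_per_token"], info["output_cost_per_token"])
```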
## General Proxy Improvements

### Doc Improvements

- New tutorial for adding MCPs to Cursor via LiteLLM - PR #19317
- Fix `vertex_region` to `vertex_location` in the Vertex AI pass-through docs - PR #19380
- Clarify the Gemini and Vertex AI model prefix in the JSON file - PR #19443
- Update the Claude Code integration guides - PR #19415
- Adjust the opencode tutorial - PR #19605
- Add spend-queue troubleshooting docs - PR #19659
- Add the `litellm-enterprise` requirement for managed files - PR #19689

### Helm

- Add support for KEDA in the Helm chart - PR #19337
- Sync the Helm chart version with the LiteLLM release version - PR #19438
- Enable PreStop hook configuration in values.yaml - PR #19613

### General

- Add health check scripts and parallel execution support - PR #19295
## New Contributors

- @dushyantzz made their first contribution in PR #19158
- @obod-mpw made their first contribution in PR #19133
- @msexxeta made their first contribution in PR #19030
- @rsicart made their first contribution in PR #19337
- @cluebbehusen made their first contribution in PR #19311
- @Lucky-Lodhi2004 made their first contribution in PR #19315
- @binbandit made their first contribution in PR #19324
- @flex-myeonghyeon made their first contribution in PR #19381
- @Lrakotoson made their first contribution in PR #18321
- @bensi94 made their first contribution in PR #18787
- @victorigualada made their first contribution in PR #19368
- @VedantMadane made their first contribution in PR #19266
- @stiyyagura0901 made their first contribution in PR #19276
- @kamilio made their first contribution in PR #19447
- @jonathansampson made their first contribution in PR #19433
- @rynecarbone made their first contribution in PR #19416
- @jayy-77 made their first contribution in PR #19366
- @davida-ps made their first contribution in PR #19374
- @joaodinissf made their first contribution in PR #19506
- @ecao310 made their first contribution in PR #19520
- @mpcusack-altos made their first contribution in PR #19577
- @milan-berri made their first contribution in PR #19602
- @xqe2011 made their first contribution in PR #19621

