
v1.81.3-stable - Performance - 25% CPU Usage Reduction

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:v1.81.3.rc.2
```
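Once the container is up, the proxy exposes an OpenAI-compatible API on port 4000. A minimal sketch of building a request against it with only the standard library — the virtual key `sk-1234` and model name `gpt-4o` are placeholders for whatever you have configured, and the request is only constructed here, not sent:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    # Build an OpenAI-style /chat/completions request for the LiteLLM proxy.
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("http://localhost:4000", "sk-1234", "gpt-4o", "Hello!")
# urllib.request.urlopen(req)  # uncomment to send once the proxy is running
```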

New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Deprecation Date |
| --- | --- | --- | --- | --- | --- |
| OpenAI | gpt-audio, gpt-audio-2025-08-28 | 128K | $32/1M audio tokens, $2.5/1M text tokens | $64/1M audio tokens, $10/1M text tokens | - |
| OpenAI | gpt-audio-mini, gpt-audio-mini-2025-08-28 | 128K | $10/1M audio tokens, $0.6/1M text tokens | $20/1M audio tokens, $2.4/1M text tokens | - |
| Deepinfra, Vertex AI, Google AI Studio, OpenRouter, Vercel AI Gateway | gemini-2.0-flash-001, gemini-2.0-flash | - | - | - | 2026-03-31 |
| Groq | openai/gpt-oss-120b | 131K | $0.075/1M cache read | $0.6/1M output tokens | - |
| Groq | groq/openai/gpt-oss-20b | 131K | $0.0375/1M cache read, $0.075/1M text tokens | $0.3/1M output tokens | - |
| Vertex AI | gemini-2.5-computer-use-preview-10-2025 | 128K | $1.25 | $10 | - |
| Azure AI | claude-haiku-4-5 | - | $1.25/1M cache read, $2/1M cache read above 1 hr, $0.1/1M text tokens | $5/1M output tokens | - |
| Azure AI | claude-sonnet-4-5 | - | $3.75/1M cache read, $6/1M cache read above 1 hr, $3/1M text tokens | $15/1M output tokens | - |
| Azure AI | claude-opus-4-5 | - | $6.25/1M cache read, $10/1M cache read above 1 hr, $0.5/1M text tokens | $25/1M output tokens | - |
| Azure AI | claude-opus-4-1 | - | $18.75/1M cache read, $30/1M cache read above 1 hr, $1.5/1M text tokens | $75/1M output tokens | - |
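Token prices above scale linearly per million tokens. As a quick sanity check on the gpt-audio row, a small illustrative cost calculator (not LiteLLM's internal cost-tracking code):

```python
# Prices from the gpt-audio row above, in USD per 1M tokens.
GPT_AUDIO_PRICES = {
    "audio_in": 32.0, "text_in": 2.5,
    "audio_out": 64.0, "text_out": 10.0,
}

def gpt_audio_cost(audio_in: int, text_in: int, audio_out: int, text_out: int) -> float:
    # Linear per-token pricing: tokens / 1M * price-per-1M, summed per modality.
    p = GPT_AUDIO_PRICES
    return (audio_in * p["audio_in"] + text_in * p["text_in"]
            + audio_out * p["audio_out"] + text_out * p["text_out"]) / 1_000_000

# e.g. 10k audio + 2k text tokens in, 5k audio + 1k text tokens out
print(round(gpt_audio_cost(10_000, 2_000, 5_000, 1_000), 4))  # 0.655
```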

Features

  • OpenAI

    • Add gpt-audio and gpt-audio-mini models to pricing - PR #19509
    • Correct audio token costs for gpt-4o-audio-preview models - PR #19500
    • Limit stop sequences per the OpenAI spec (ensures JetBrains IDE compatibility) - PR #19562
  • VertexAI

    • Docs - Google Workload Identity Federation (WIF) support - PR #19320
  • AgentCore

    • Fix streaming issues with AWS Bedrock AgentCore where responses stopped after the first chunk, particularly affecting OAuth-enabled agents - PR #17141
  • ChatGPT

    • Add support for calling ChatGPT subscriptions via LiteLLM - PR #19030
    • Add Responses API bridge support for the ChatGPT subscription provider - PR #19030
  • Bedrock

    • Support output format for Bedrock Invoke via /v1/messages - PR #19560
  • Azure

    • Add support for Azure OpenAI v1 API - PR #19313
    • Preserve content_policy_violation details for images (#19328) - PR #19372
    • Support OpenAI-format nested tool definitions for Responses API - PR #19526
  • Gemini (Vertex AI, Google AI Studio)

    • Use responseJsonSchema for Gemini 2.0+ models - PR #19314
  • Volcengine

    • Support the Volcengine Responses API - PR #18508
  • Anthropic

    • Add Support for calling Claude Code Max subscriptions via LiteLLM - PR #19453
    • Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse - PR #19545
  • Brave Search

  • Sarvam AI

    • Add support for new Sarvam models - PR #19479
  • GMI
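The stop-sequence change above (PR #19562) reflects the OpenAI spec, which allows at most 4 stop sequences per request. A minimal sketch of that kind of clamping — not LiteLLM's actual implementation:

```python
MAX_STOP_SEQUENCES = 4  # the OpenAI chat spec allows at most 4 stop sequences

def clamp_stop(stop):
    # Sketch only: normalize and truncate the `stop` parameter so strict
    # OpenAI-compatible clients (e.g. JetBrains IDEs) accept the request.
    if stop is None:
        return None
    if isinstance(stop, str):
        return stop  # a single string is always within the limit
    return list(stop)[:MAX_STOP_SEQUENCES]

print(clamp_stop(["\n", "###", "END", "STOP", "extra"]))
```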

Bug Fixes

  • Anthropic

    • Fix anthropic-beta sent client side being overridden instead of appended to - PR #19343
    • Filter out unsupported fields from JSON schema for Anthropic's output_format API - PR #19482
  • Bedrock

    • Expose stability models via /image_edits endpoint and ensure proper request transformation - PR #19323
    • Fix Claude Code x Bedrock Invoke failing with advanced-tool-use-2025-11-20 - PR #19373
    • Deduplicate tool calls in assistant history - PR #19324
    • Correct us.anthropic.claude-opus-4-5 in-region pricing - PR #19310
    • Fix request validation errors when using Claude 4 via bedrock invoke - PR #19381
    • Handle thinking with tool calls for Claude 4 models - PR #19506
    • Correct streaming choice index for tool calls - PR #19506
  • Ollama

    • Fix tool call errors with improved message extraction - PR #19369
  • VertexAI

    • Remove optional vertex_count_tokens_location param before the request is sent to Vertex - PR #19359
  • Gemini (Vertex AI, Google AI Studio)

    • Support setting media_resolution and fps parameters per video file when using Gemini video understanding - PR #19273
    • Handle reasoning_effort as a dict from the OpenAI Agents SDK - PR #19419
    • Add file content support in tool results - PR #19416
  • Azure

    • Fix Azure AI costs for Anthropic models - PR #19530
  • Giga Chat
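The Bedrock fix for duplicated tool calls in assistant history (PR #19324) comes down to keeping the first occurrence of each tool-call id. A simple sketch of that idea, not the actual LiteLLM code:

```python
def dedupe_tool_calls(tool_calls: list[dict]) -> list[dict]:
    # Keep the first occurrence of each tool-call id and drop repeats,
    # preserving the original order.
    seen: set[str] = set()
    out: list[dict] = []
    for call in tool_calls:
        if call["id"] not in seen:
            seen.add(call["id"])
            out.append(call)
    return out

calls = [{"id": "a", "name": "get_weather"},
         {"id": "a", "name": "get_weather"},
         {"id": "b", "name": "get_time"}]
print([c["id"] for c in dedupe_tool_calls(calls)])  # ['a', 'b']
```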


AI API Endpoints (LLMs, MCP, Agents)

Features

  • Files API

    • Add managed files support when load_balancing is True - PR #19338
  • Claude Plugin Marketplace

    • Add self hosted Claude Code Plugin Marketplace - PR #19378
  • MCP

    • Add MCP Protocol version 2025-11-25 support - PR #19379
    • Log MCP tool calls and list tools in the LiteLLM Spend Logs table for easier debugging - PR #19469
  • Vertex AI

    • Ensure only anthropic betas are forwarded down to LLM API (by default) - PR #19542
    • Allow overriding so that incoming headers are forwarded down to the target - PR #19524
  • Chat/Completions

    • Add MCP tools response to chat completions - PR #19552
    • Add custom Vertex AI finish reasons to the output - PR #19558
    • Return MCP execution in /chat/completions before model output during streaming - PR #19623

Management Endpoints / UI

Features

  • Cost Estimator

  • Claude Code Plugins

    • Allow Adding Claude Code Plugins via UI - PR #19387
  • Guardrails

    • New Policy management UI - PR #19668
    • Allow adding policies on Keys/Teams + Viewing on Info panels - PR #19688
  • General

    • Respect custom authentication header override - PR #19276
  • Playground

    • Button to Fill Custom API Base - PR #19440
    • Display MCP output on the Playground - PR #19553
  • Models

  • MCP Servers

    • Fix MCP Tools tab resetting to Overview - PR #19468
  • Organizations

    • Prevent org admin from creating a new user with proxy_admin permissions - PR #19296
    • Edit Page: Reusable Model Select - PR #19601
  • Teams

    • Reusable Model Select - PR #19543
    • [Fix] Team Update with Organization having All Proxy Models - PR #19604
  • Logs

    • Include tool arguments in spend logs table - PR #19640
  • Fallbacks / Loadbalancing

Bugs

  • Playground

    • Increase model selector width in the Playground Compare view - PR #19423
  • Virtual Keys

    • Fix sorting showing incorrect entries - PR #19534
  • General

    • Fix UI 404 error when SERVER_ROOT_PATH is set - PR #19467
    • Redirect to ui/login on expired JWT - PR #19687
  • SSO

    • Fix SSO user roles not updating for existing users - PR #19621
  • Guardrails

    • Ensure guardrail patterns persist on edit and mode toggle - PR #19265

AI Integrations

Logging

  • General Logging
    • Prevent printing duplicate StandardLoggingPayload logs - PR #19325
    • Fix: log duplication when json_logs is enabled - PR #19705
  • Langfuse OTEL
    • Ignore service logs and fix callback shadowing - PR #19298
  • Langfuse
    • Send litellm_trace_id - PR #19528
    • Add Langfuse mock mode for testing without API calls - PR #19676
  • GCS Bucket
    • Prevent unbounded queue growth caused by slow API calls - PR #19297
    • Add GCS mock mode for testing without API calls - PR #19683
  • Responses API Logging
    • Fix pydantic serialization error - PR #19486
  • Arize Phoenix
    • Add OpenInference span kinds to Arize Phoenix - PR #19267
  • Prometheus
    • Add new Prometheus metrics for user count and team count - PR #19520
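The Langfuse and GCS "mock mode" entries above follow a common testing pattern: when the flag is set, payloads are captured locally instead of sent over the network. A generic sketch of that pattern — the class and field names here are illustrative, not LiteLLM's actual logger API:

```python
class MockableLogger:
    """Generic sketch of a logging callback with a test-only mock mode."""

    def __init__(self, mock_mode: bool = False):
        self.mock_mode = mock_mode
        self.captured: list[dict] = []

    def log(self, payload: dict) -> None:
        if self.mock_mode:
            # In mock mode, record the payload locally so tests can assert
            # on it without any network I/O.
            self.captured.append(payload)
            return
        self._send(payload)

    def _send(self, payload: dict) -> None:
        raise NotImplementedError("real network upload elided in this sketch")

logger = MockableLogger(mock_mode=True)
logger.log({"model": "gpt-4o", "spend": 0.002})
print(len(logger.captured))  # 1
```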

Guardrails

  • Bedrock Guardrails
    • Ensure post_call guardrail checks input+output - PR #19151
  • Prompt Security
    • Fix Prompt Security's guardrail implementation - PR #19374
  • Presidio
    • Fix crash in Presidio Guardrail when running in background threads (logging_hook) - PR #19714
  • Pillar Security
    • Migrate Pillar Security to Generic Guardrail API - PR #19364
  • Policy Engine
    • New LiteLLM Policy Engine: create policies to manage guardrails, conditions, and permissions per Key and Team - PR #19612
  • General
    • Add case-insensitive support for guardrail mode and actions - PR #19480
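The case-insensitivity fix above (PR #19480) amounts to normalizing mode strings before comparison. A trivial sketch — the set of valid modes shown here is illustrative, not the definitive list:

```python
# Illustrative set of guardrail modes; check the LiteLLM docs for the real list.
VALID_MODES = {"pre_call", "post_call", "during_call", "logging_only"}

def normalize_guardrail_mode(mode: str) -> str:
    # Lowercase and trim so "PRE_CALL", "Pre_Call", and "pre_call" all match.
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError(f"unknown guardrail mode: {mode!r}")
    return normalized

print(normalize_guardrail_mode("PRE_CALL"))  # pre_call
```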

Prompt Management

  • General
    • Fix prompt info lookup and delete to use the correct IDs - PR #19358

Secret Manager

  • AWS Secret Manager
    • Ensure auto-rotation updates the existing AWS secret instead of creating a new one - PR #19455
  • Hashicorp Vault
    • Ensure key rotations work with Vault - PR #19634

Spend Tracking, Budgets and Rate Limiting

  • Pricing Updates
    • Add openai/dall-e base pricing entries - PR #19133
    • Add input_cost_per_video_per_second in ModelInfoBase - PR #19398
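The new input_cost_per_video_per_second field above implies a linear per-second cost model for video input. An illustrative calculation — the rate used is made up, not a real model's price:

```python
def video_input_cost(duration_seconds: float,
                     input_cost_per_video_per_second: float) -> float:
    # Linear per-second pricing for video input, as the field name suggests.
    return duration_seconds * input_cost_per_video_per_second

# Hypothetical rate of $0.0001/second for a 90-second clip:
print(video_input_cost(90, 0.0001))
```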

Performance / Loadbalancing / Reliability improvements

  • General

    • Fix date overflow/division by zero in proxy utils - PR #19527
    • Fix in-flight request termination on SIGTERM when health-check runs in a separate process - PR #19427
    • Fix pass-through routes to work with the server root path - PR #19383
    • Fix logging error for stop iteration - PR #19649
    • Prevent retrying 4xx client errors - PR #19275
    • Add better error handling for misconfigured health checks - PR #19441
  • Router

    • Fix Azure RPM calculation formula - PR #19513
    • Persist scheduler request queue to redis - PR #19304
    • Pass search_tools to Router during DB-triggered initialization - PR #19388
    • Fixed PromptCachingCache to correctly handle messages where cache_control is a sibling key of string content - PR #19266
  • Memory Leaks/OOM

    • Prevent OOM with nested $defs in tool schemas - PR #19112
    • Fix HTTP client memory leaks in Presidio, OpenAI, and Gemini - PR #19190
  • Non-root

    • Fix supervisor logfile and pidfile for non-root environments - PR #17267
    • Resolve "Read-only file system" error in non-root images - PR #19449
  • Dockerfile

    • Redis Semantic Caching - add missing redisvl dependency to requirements.txt - PR #19417
    • Bump OTEL versions to support the a2a dependency; resolves ModuleNotFoundError for Microsoft Agents (by @Harshit28j) - PR #18991
  • DB

    • Handle PostgreSQL cached plan errors during rolling deployments - PR #19424
  • Timeouts

    • Fix: total timeout is not respected - PR #19389
  • SDK

    • Add field-existence checks to type classes to prevent attribute errors - PR #18321
    • Add google-cloud-aiplatform as an optional dependency with a clear error message - PR #19437
    • Make grpc dependency optional - PR #19447
    • Add support for retry policies - PR #19645
  • Performance

    • Cut chat_completion latency by ~21% by reducing pre-call processing time - PR #19535
    • Optimize strip_trailing_slash with O(1) index check - PR #19679
    • Optimize use_custom_pricing_for_model with set intersection - PR #19677
    • Skip pattern_router.route() for non-wildcard models - PR #19664
    • Add LRU caching to get_model_info for faster cost lookups - PR #19606
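The LRU-caching change above (PR #19606) follows the standard memoization pattern. A sketch using functools.lru_cache with a made-up model table — not LiteLLM's real get_model_info:

```python
from functools import lru_cache

# Illustrative stand-in for the model-info database; real lookups are costlier.
_MODEL_DB = {
    "gpt-4o": {"input_cost_per_token": 2.5e-06, "output_cost_per_token": 1e-05},
}

@lru_cache(maxsize=1024)
def get_model_info(model: str) -> tuple:
    # Return hashable, immutable info so cached values can't be mutated by callers.
    return tuple(sorted(_MODEL_DB[model].items()))

get_model_info("gpt-4o")
get_model_info("gpt-4o")  # served from the cache
print(get_model_info.cache_info().hits)  # 1
```

Returning a tuple instead of the raw dict keeps cached entries immutable, so one caller mutating the result cannot corrupt every later lookup.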

General Proxy Improvements

Doc Improvements

  • New tutorial for adding MCPs to Cursor via LiteLLM - PR #19317
  • Fix vertex_region to vertex_location in Vertex AI pass-through docs - PR #19380
  • Clarify Gemini and Vertex AI model prefixes in the JSON file - PR #19443
  • Update Claude Code integration guides - PR #19415
  • Adjust opencode tutorial - PR #19605
  • Add spend-queue troubleshooting docs - PR #19659
  • Add litellm-enterprise requirement for managed files - PR #19689

Helm

  • Add support for KEDA in the Helm chart - PR #19337
  • Sync Helm chart version with the LiteLLM release version - PR #19438
  • Enable PreStop hook configuration in values.yaml - PR #19613

General

  • Add health check scripts and parallel execution support - PR #19295

New Contributors

  • @dushyantzz made their first contribution in PR #19158
  • @obod-mpw made their first contribution in PR #19133
  • @msexxeta made their first contribution in PR #19030
  • @rsicart made their first contribution in PR #19337
  • @cluebbehusen made their first contribution in PR #19311
  • @Lucky-Lodhi2004 made their first contribution in PR #19315
  • @binbandit made their first contribution in PR #19324
  • @flex-myeonghyeon made their first contribution in PR #19381
  • @Lrakotoson made their first contribution in PR #18321
  • @bensi94 made their first contribution in PR #18787
  • @victorigualada made their first contribution in PR #19368
  • @VedantMadane made their first contribution in #19266
  • @stiyyagura0901 made their first contribution in #19276
  • @kamilio made their first contribution in PR #19447
  • @jonathansampson made their first contribution in PR #19433
  • @rynecarbone made their first contribution in PR #19416
  • @jayy-77 made their first contribution in #19366
  • @davida-ps made their first contribution in PR #19374
  • @joaodinissf made their first contribution in PR #19506
  • @ecao310 made their first contribution in PR #19520
  • @mpcusack-altos made their first contribution in PR #19577
  • @milan-berri made their first contribution in PR #19602
  • @xqe2011 made their first contribution in #19621

Full Changelog

View complete changelog on GitHub