Deploy this version

- Docker

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4-stable
```

- Pip

```shell
pip install litellm==1.65.4.post1
```
v1.65.4-stable is live. Here are the improvements since v1.65.0-stable.
Key Highlights
- Preventing DB Deadlocks: Fixes a high-traffic issue where multiple instances writing to the DB at the same time caused deadlocks.
- New Usage Tab: Enables viewing spend by model and customizing the date range.
Let's dive in.
Preventing DB Deadlocks
This release fixes the DB deadlocking issue that users faced under high traffic (10K+ RPS), ensuring user/key/team spend tracking works at that scale.
Read more about the new architecture here
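The linked architecture doc describes the fix as buffering spend increments in Redis and committing them to the DB from one place, so proxy instances stop contending for the same rows. A minimal sketch of that pattern; the key, table, and function names here are illustrative, not LiteLLM's internals:

```python
import redis

r = redis.Redis()

def record_spend(key_hash: str, spend: float) -> None:
    # Each proxy instance increments a Redis counter instead of updating
    # the Postgres row directly, so no DB row locks are taken on the hot path.
    r.hincrbyfloat("spend_buffer", key_hash, spend)

def flush_spend_to_db(db_conn) -> None:
    # One periodic writer atomically drains the buffer and commits a single
    # batch, so concurrent instances never contend on the same rows.
    pipe = r.pipeline()  # MULTI/EXEC: read-and-clear happens atomically
    pipe.hgetall("spend_buffer")
    pipe.delete("spend_buffer")
    buffered, _ = pipe.execute()
    if not buffered:
        return
    with db_conn.cursor() as cur:
        for key_hash, spend in buffered.items():
            cur.execute(
                "UPDATE keys SET spend = spend + %s WHERE token = %s",  # illustrative schema
                (float(spend), key_hash.decode()),
            )
    db_conn.commit()
```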
New Usage Tab
The new Usage tab brings the ability to track daily spend by model. Combined with the ability to view successful requests and token usage, this makes it easier to catch spend tracking or token counting errors.
To test this out, go to Experimental > New Usage > Activity.
New Models / Updated Models
- Databricks - claude-3-7-sonnet cost tracking PR
- VertexAI - gemini-2.5-pro-exp-03-25 cost tracking PR
- VertexAI - gemini-2.0-flash cost tracking PR
- Groq - add whisper ASR models to model cost map PR
- IBM - Add watsonx/ibm/granite-3-8b-instruct to model cost map PR
- Google AI Studio - add gemini/gemini-2.5-pro-preview-03-25 to model cost map PR
LLM Translation
- Vertex AI - Support anyOf param for OpenAI json schema translation Get Started
- Anthropic - response_format + thinking param support (works across Anthropic API, Bedrock, Vertex) Get Started (see the sketch after this list)
- Anthropic - if a thinking budget is specified and max_tokens is not, ensure the max_tokens sent to Anthropic is higher than the thinking budget (works across Anthropic API, Bedrock, Vertex) PR
- Bedrock - latency optimized inference support Get Started (see the sketch after this list)
- Sagemaker - handle special tokens + multibyte character code in response Get Started
- MCP - add support for using SSE MCP servers Get Started
- Anthropic - new litellm.messages.create interface for calling Anthropic's /v1/messages via passthrough Get Started
- Anthropic - support 'file' content type in message param (works across Anthropic API, Bedrock, Vertex) Get Started
- Anthropic - map openai 'reasoning_effort' to anthropic 'thinking' param (works across Anthropic API, Bedrock, Vertex) Get Started
- Google AI Studio (Gemini) - [BETA] /v1/files upload support Get Started
- Azure - fix o-series tool calling Get Started
- Unified file id - [ALPHA] allow calling multiple providers with same file id PR
  - This is experimental, and not recommended for production use.
  - We plan to have a production-ready implementation by next week.
- Google AI Studio (Gemini) - return logprobs PR
- Anthropic - Support prompt caching for Anthropic tool calls Get Started
- OpenRouter - unwrap extra body on OpenRouter calls PR
- VertexAI - fix credential caching issue PR
- XAI - filter out 'name' param for XAI PR
- Gemini - image generation output support Get Started
- Databricks - support claude-3-7-sonnet w/ thinking + response_format Get Started
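The Anthropic reasoning items above (response_format + thinking, and the reasoning_effort mapping) are exposed through the standard completion call. A minimal sketch, assuming the model name and token budget shown here; the linked Get Started docs have the exact parameter details:

```python
import litellm

# Explicit Anthropic-style thinking param together with response_format.
response = litellm.completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Return the capital of France as a JSON object."}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)

# Or let LiteLLM map OpenAI's reasoning_effort onto a thinking budget
# (works across Anthropic API, Bedrock, Vertex).
response = litellm.completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    reasoning_effort="low",
)
```

Note that per the max_tokens fix above, you don't need to set max_tokens yourself when passing a thinking budget.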
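For the Bedrock latency optimized inference item, Bedrock itself takes a performanceConfig field on the request; the sketch below assumes LiteLLM forwards it as a provider-specific param, and the model name is just an example (the linked Get Started doc is authoritative):

```python
import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": "One-line summary: LiteLLM proxies 100+ LLM APIs."}],
    performanceConfig={"latency": "optimized"},  # Bedrock latency-optimized inference
)
```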
Spend Tracking Improvements
- Reliability fix - Check sent and received model for cost calculation PR
- Vertex AI - Multimodal embedding cost tracking Get Started, PR (see the sketch below)
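A quick way to sanity-check the multimodal embedding cost tracking is to read the computed cost off the response's hidden params, which is where LiteLLM typically reports per-call cost. The model name below is an example:

```python
import litellm

response = litellm.embedding(
    model="vertex_ai/multimodalembedding@001",  # example model name
    input=["a photo of a golden retriever"],
)
# LiteLLM attaches the computed cost to the response's hidden params.
print(response._hidden_params["response_cost"])
```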
Management Endpoints / UI
- New Usage Tab
  - Report 'total_tokens' + report success/failure calls
  - Remove double bars on scroll
  - Ensure 'daily spend' chart ordered from earliest to latest date
  - Show spend per model per day
  - Show key alias on usage tab
  - Allow non-admins to view their activity
  - Add date picker to new usage tab
- Virtual Keys Tab
  - Remove 'default key' on user signup
  - Fix showing user models available for personal key creation
- Test Key Tab
  - Allow testing image generation models
- Models Tab
  - Fix bulk adding models
  - Support reusable credentials for passthrough endpoints
  - Allow team members to see team models
- Teams Tab
  - Fix json serialization error on update team metadata
- Request Logs Tab
  - Add reasoning_content token tracking across all providers on streaming
- API
  - Return key alias on /user/daily/activity Get Started (see the example after this list)
- SSO
  - Allow assigning SSO users to teams on MSFT SSO PR
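The /user/daily/activity endpoint can be called directly against the proxy with a virtual key. In this sketch the base URL, key, and query param names are placeholder assumptions; the linked Get Started doc has the real schema:

```python
import requests

resp = requests.get(
    "http://localhost:4000/user/daily/activity",  # your proxy base URL
    headers={"Authorization": "Bearer sk-1234"},  # a virtual key
    params={"start_date": "2025-03-01", "end_date": "2025-03-31"},  # assumed param names
)
print(resp.json())
```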
Logging / Guardrail Integrations
- Console Logs - Add json formatting for uncaught exceptions PR
- Guardrails - AIM Guardrails support for virtual key based policies Get Started
- Logging - fix completion start time tracking PR
- Prometheus
Performance / Loadbalancing / Reliability improvements
- Preventing Deadlocks
- Reduce DB Deadlocks by storing spend updates in Redis and then committing to DB PR
- Ensure no deadlocks occur when updating DailyUserSpendTransaction PR
- High Traffic fix - ensure new DB + Redis architecture accurately tracks spend PR
- Use Redis for PodLock Manager instead of PG (ensures no deadlocks occur) PR
- v2 DB Deadlock Reduction Architecture - Add Max Size for In-Memory Queue + Backpressure Mechanism PR (illustrative sketch after this list)
- Prisma Migrations Get Started
  - Connects litellm proxy to litellm's prisma migration files
  - Handle db schema updates from new litellm-proxy-extras sdk
- Redis - support password for sync sentinel clients PR
- Fix "Circular reference detected" error when max_parallel_requests = 0 PR
- Code QA - Ban hardcoded numbers PR
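On the v2 deadlock-reduction item flagged above, a bounded in-memory queue with backpressure is the standard pattern: once the queue hits its max size, producers wait rather than letting pending updates grow without limit. An illustrative sketch of the mechanism, not LiteLLM's actual implementation:

```python
import asyncio

# Bounded queue: hitting maxsize applies backpressure to request handlers.
spend_updates: asyncio.Queue = asyncio.Queue(maxsize=1000)

async def enqueue_spend_update(update: dict) -> None:
    # Blocks when the queue is full instead of growing memory unboundedly.
    await spend_updates.put(update)

async def flush_worker(commit_batch) -> None:
    # Single consumer drains updates in batches and commits each batch once.
    while True:
        batch = [await spend_updates.get()]
        while not spend_updates.empty() and len(batch) < 100:
            batch.append(spend_updates.get_nowait())
        await commit_batch(batch)
```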
Helm
- fix: wrong indentation of ttlSecondsAfterFinished in chart PR
General Proxy Improvements
- Fix - only apply service_account_settings.enforced_params on service accounts PR
- Fix - handle metadata null on /chat/completion PR
- Fix - Move daily user transaction logging outside of 'disable_spend_logs' flag, as they're unrelated PR
Demo
Try this on the demo instance today
Complete Git Diff
See the complete git diff since v1.65.0-stable here