Deploy this version

- Docker

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4-stable
```

- Pip

```shell
pip install litellm==1.65.4.post1
```
v1.65.4-stable is live. Here are the improvements since v1.65.0-stable.
Key Highlights
- Preventing DB Deadlocks: Fixes a high-traffic issue where multiple instances writing to the DB at the same time caused deadlocks.
- New Usage Tab: Enables viewing spend by model and customizing the date range.
Let's dive in.
Preventing DB Deadlocks
This release fixes the DB deadlocking issue that users faced under high traffic (10K+ RPS), ensuring user/key/team spend tracking works at that scale.
Read more about the new architecture here
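The linked architecture doc describes the fix as buffering spend increments in Redis and committing them to the DB from one place, so proxy instances stop contending for the same rows. A minimal sketch of that pattern; the key, table, and function names here are illustrative, not LiteLLM's internals:

```python
import redis

r = redis.Redis()

def record_spend(key_hash: str, spend: float) -> None:
    # Each proxy instance increments a Redis counter instead of updating
    # the Postgres row directly, so no DB row locks are taken on the hot path.
    r.hincrbyfloat("spend_buffer", key_hash, spend)

def flush_spend_to_db(db_conn) -> None:
    # One periodic writer atomically drains the buffer and commits a single
    # batch, so concurrent instances never contend on the same rows.
    pipe = r.pipeline()  # MULTI/EXEC: read-and-clear happens atomically
    pipe.hgetall("spend_buffer")
    pipe.delete("spend_buffer")
    buffered, _ = pipe.execute()
    if not buffered:
        return
    with db_conn.cursor() as cur:
        for key_hash, spend in buffered.items():
            cur.execute(
                "UPDATE keys SET spend = spend + %s WHERE token = %s",  # illustrative schema
                (float(spend), key_hash.decode()),
            )
    db_conn.commit()
```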
New Usage Tab
The new Usage tab brings the ability to track daily spend by model. Combined with the ability to view successful requests and token usage, this makes it easier to catch spend tracking or token counting errors.
To test this out, go to Experimental > New Usage > Activity.
New Models / Updated Models
- Databricks - claude-3-7-sonnet cost tracking PR
- VertexAI - gemini-2.5-pro-exp-03-25 cost tracking PR
- VertexAI - gemini-2.0-flash cost tracking PR
- Groq - add whisper ASR models to model cost map PR
- IBM - Add watsonx/ibm/granite-3-8b-instruct to model cost map PR
- Google AI Studio - add gemini/gemini-2.5-pro-preview-03-25 to model cost map PR
LLM Translation
- Vertex AI - Support anyOf param for OpenAI json schema translation Get Started
- Anthropic - response_format + thinking param support (works across Anthropic API, Bedrock, Vertex) Get Started (see the sketch after this list)
- Anthropic - if a thinking budget is specified and max_tokens is not, ensure the max_tokens sent to Anthropic is higher than the thinking budget (works across Anthropic API, Bedrock, Vertex) PR
- Bedrock - latency optimized inference support Get Started (see the sketch after this list)
- Sagemaker - handle special tokens + multibyte character code in response Get Started
- MCP - add support for using SSE MCP servers Get Started
- Anthropic - new litellm.messages.create interface for calling Anthropic's /v1/messages via passthrough Get Started
- Anthropic - support 'file' content type in message param (works across Anthropic API, Bedrock, Vertex) Get Started
- Anthropic - map openai 'reasoning_effort' to anthropic 'thinking' param (works across Anthropic API, Bedrock, Vertex) Get Started
- Google AI Studio (Gemini) - [BETA] /v1/files upload support Get Started
- Azure - fix o-series tool calling Get Started
- Unified file id - [ALPHA] allow calling multiple providers with same file id PR
  - This is experimental, and not recommended for production use.
  - We plan to have a production-ready implementation by next week.
- Google AI Studio (Gemini) - return logprobs PR
- Anthropic - Support prompt caching for Anthropic tool calls Get Started
- OpenRouter - unwrap extra body on OpenRouter calls PR
- VertexAI - fix credential caching issue PR
- XAI - filter out 'name' param for XAI PR
- Gemini - image generation output support Get Started
- Databricks - support claude-3-7-sonnet w/ thinking + response_format Get Started
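The Anthropic reasoning items above (response_format + thinking, and the reasoning_effort mapping) are exposed through the standard completion call. A minimal sketch, assuming the model name and token budget shown here; the linked Get Started docs have the exact parameter details:

```python
import litellm

# Explicit Anthropic-style thinking param together with response_format.
response = litellm.completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "Return the capital of France as a JSON object."}],
    thinking={"type": "enabled", "budget_tokens": 1024},
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)

# Or let LiteLLM map OpenAI's reasoning_effort onto a thinking budget
# (works across Anthropic API, Bedrock, Vertex).
response = litellm.completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    reasoning_effort="low",
)
```

Note that per the max_tokens fix above, you don't need to set max_tokens yourself when passing a thinking budget.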
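For the Bedrock latency optimized inference item, Bedrock itself takes a performanceConfig field on the request; the sketch below assumes LiteLLM forwards it as a provider-specific param, and the model name is just an example (the linked Get Started doc is authoritative):

```python
import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-3-5-haiku-20241022-v1:0",
    messages=[{"role": "user", "content": "One-line summary: LiteLLM proxies 100+ LLM APIs."}],
    performanceConfig={"latency": "optimized"},  # Bedrock latency-optimized inference
)
```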
Spend Tracking Improvements
- Reliability fix - Check sent and received model for cost calculation PR
- Vertex AI - Multimodal embedding cost tracking Get Started, PR (see the sketch below)
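A quick way to sanity-check the multimodal embedding cost tracking is to read the computed cost off the response's hidden params, which is where LiteLLM typically reports per-call cost. The model name below is an example:

```python
import litellm

response = litellm.embedding(
    model="vertex_ai/multimodalembedding@001",  # example model name
    input=["a photo of a golden retriever"],
)
# LiteLLM attaches the computed cost to the response's hidden params.
print(response._hidden_params["response_cost"])
```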
Management Endpoints / UI
- New Usage Tab
  - Report 'total_tokens' + report success/failure calls
  - Remove double bars on scroll
  - Ensure 'daily spend' chart ordered from earliest to latest date
  - Show spend per model per day
  - Show key alias on usage tab
  - Allow non-admins to view their activity
  - Add date picker to new usage tab
- Virtual Keys Tab
  - Remove 'default key' on user signup
  - Fix showing user models available for personal key creation
- Test Key Tab
  - Allow testing image generation models
- Models Tab
  - Fix bulk adding models
  - Support reusable credentials for passthrough endpoints
  - Allow team members to see team models
- Teams Tab
  - Fix json serialization error on update team metadata
- Request Logs Tab
  - Add reasoning_content token tracking across all providers on streaming
- API
  - Return key alias on /user/daily/activity Get Started (see the example after this list)
- SSO
  - Allow assigning SSO users to teams on MSFT SSO PR
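The /user/daily/activity endpoint can be called directly against the proxy with a virtual key. In this sketch the base URL, key, and query param names are placeholder assumptions; the linked Get Started doc has the real schema:

```python
import requests

resp = requests.get(
    "http://localhost:4000/user/daily/activity",  # your proxy base URL
    headers={"Authorization": "Bearer sk-1234"},  # a virtual key
    params={"start_date": "2025-03-01", "end_date": "2025-03-31"},  # assumed param names
)
print(resp.json())
```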
Logging / Guardrail Integrations
- Console Logs - Add json formatting for uncaught exceptions PR
- Guardrails - AIM Guardrails support for virtual key based policies Get Started
- Logging - fix completion start time tracking PR
- Prometheus
Performance / Loadbalancing / Reliability improvements
- Preventing Deadlocks
- Reduce DB Deadlocks by storing spend updates in Redis and then committing to DB PR
- Ensure no deadlocks occur when updating DailyUserSpendTransaction PR
- High Traffic fix - ensure new DB + Redis architecture accurately tracks spend PR
- Use Redis for PodLock Manager instead of PG (ensures no deadlocks occur) PR
- v2 DB Deadlock Reduction Architecture - Add Max Size for In-Memory Queue + Backpressure Mechanism PR (illustrative sketch after this list)
- Prisma Migrations Get Started
  - Connects litellm proxy to litellm's prisma migration files
  - Handle db schema updates from new litellm-proxy-extras sdk
- Redis - support password for sync sentinel clients PR
- Fix "Circular reference detected" error when max_parallel_requests = 0 PR
- Code QA - Ban hardcoded numbers PR
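On the v2 deadlock-reduction item flagged above, a bounded in-memory queue with backpressure is the standard pattern: once the queue hits its max size, producers wait rather than letting pending updates grow without limit. An illustrative sketch of the mechanism, not LiteLLM's actual implementation:

```python
import asyncio

# Bounded queue: hitting maxsize applies backpressure to request handlers.
spend_updates: asyncio.Queue = asyncio.Queue(maxsize=1000)

async def enqueue_spend_update(update: dict) -> None:
    # Blocks when the queue is full instead of growing memory unboundedly.
    await spend_updates.put(update)

async def flush_worker(commit_batch) -> None:
    # Single consumer drains updates in batches and commits each batch once.
    while True:
        batch = [await spend_updates.get()]
        while not spend_updates.empty() and len(batch) < 100:
            batch.append(spend_updates.get_nowait())
        await commit_batch(batch)
```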
Helm
- fix: wrong indentation of ttlSecondsAfterFinished in chart PR
General Proxy Improvements
- Fix - only apply service_account_settings.enforced_params on service accounts PR
- Fix - handle metadata null on /chat/completion PR
- Fix - Move daily user transaction logging outside of 'disable_spend_logs' flag, as they're unrelated PR
Demo
Try this on the demo instance today
Complete Git Diff
See the complete git diff since v1.65.0-stable here