v1.65.4-stable

Krrish Dholakia
Ishaan Jaffer

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.65.4-stable

v1.65.4-stable is live. Here are the improvements since v1.65.0-stable.

Key Highlights

  • Preventing DB Deadlocks: Fixes deadlocks that occurred under high traffic when multiple instances wrote to the DB at the same time.
  • New Usage Tab: Enables viewing spend by model and customizing the date range.

Let's dive in.

Preventing DB Deadlocks

This release fixes the DB deadlocks that users hit under high traffic (10K+ RPS), so user/key/team spend tracking now works at that scale.

Read more about the new architecture here
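
As a rough illustration of the idea behind this change (buffer spend increments, then commit them in batches instead of issuing one DB write per request), here is a minimal sketch. It is not LiteLLM's implementation: the class name `SpendUpdateBuffer`, the `commit` callback, and the `max_size` default are hypothetical, and the Redis layer that the release places between the instances and the DB is left out to keep the example self-contained.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

# (entity_type, entity_id), e.g. ("key", "sk-1") or ("team", "t-1"); hypothetical shape.
SpendKey = Tuple[str, str]


class SpendUpdateBuffer:
    """Accumulates spend increments in memory and writes them in one batch."""

    def __init__(
        self,
        commit: Callable[[Dict[SpendKey, float]], None],
        max_size: int = 1000,
    ) -> None:
        self._commit = commit      # hypothetical DB writer, called once per flush
        self._max_size = max_size  # cap on distinct pending entries
        self._pending: Dict[SpendKey, float] = defaultdict(float)

    def add(self, entity_type: str, entity_id: str, spend: float) -> None:
        key = (entity_type, entity_id)
        # Backpressure: once the buffer is full, flush before accepting a new entry.
        if key not in self._pending and len(self._pending) >= self._max_size:
            self.flush()
        self._pending[key] += spend

    def flush(self) -> None:
        if not self._pending:
            return
        batch = dict(self._pending)
        self._pending.clear()
        # Sorting gives every writer the same row order. This is a general way to
        # avoid lock-order deadlocks, not something the release notes confirm.
        self._commit(dict(sorted(batch.items())))


written = []
buf = SpendUpdateBuffer(commit=written.append, max_size=2)
buf.add("key", "sk-1", 0.25)
buf.add("key", "sk-1", 0.5)   # merged with the previous increment
buf.add("team", "t-1", 1.0)
buf.flush()                   # one batched write instead of three row updates
```

Merging increments per entity before writing means concurrent instances touch each row less often, which is where the contention comes from in the first place.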

New Usage Tab

The new Usage tab tracks daily spend by model. Combined with the views of successful requests and token usage, this makes it easier to catch spend tracking or token counting errors.

To test this out, just go to Experimental > New Usage > Activity.
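
To make the "spend per model per day" view concrete, the sketch below groups request records by date and model. The record fields (`date`, `model`, `spend`, `total_tokens`, `success`) and the model name are assumptions made for illustration; they are not the schema the Usage tab reads from.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def daily_spend_by_model(rows: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Sum spend, tokens, and successful requests per (date, model) pair."""
    totals: Dict[Tuple[str, str], dict] = defaultdict(
        lambda: {"spend": 0.0, "total_tokens": 0, "successful_requests": 0}
    )
    for row in rows:
        bucket = totals[(row["date"], row["model"])]
        bucket["spend"] += row["spend"]
        bucket["total_tokens"] += row["total_tokens"]
        bucket["successful_requests"] += 1 if row["success"] else 0
    return dict(totals)


# Hypothetical request records.
rows = [
    {"date": "2025-04-01", "model": "model-a", "spend": 0.5, "total_tokens": 100, "success": True},
    {"date": "2025-04-01", "model": "model-a", "spend": 0.25, "total_tokens": 200, "success": False},
    {"date": "2025-04-02", "model": "model-a", "spend": 1.0, "total_tokens": 0, "success": True},
]
summary = daily_spend_by_model(rows)
```

In a breakdown like this, a bucket with non-zero spend but zero tokens (the second day above) is the kind of mismatch that points to a token counting or cost tracking error.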

New Models / Updated Models

  1. Databricks - claude-3-7-sonnet cost tracking PR
  2. VertexAI - gemini-2.5-pro-exp-03-25 cost tracking PR
  3. VertexAI - gemini-2.0-flash cost tracking PR
  4. Groq - add whisper ASR models to model cost map PR
  5. IBM - Add watsonx/ibm/granite-3-8b-instruct to model cost map PR
  6. Google AI Studio - add gemini/gemini-2.5-pro-preview-03-25 to model cost map PR

LLM Translation

  1. Vertex AI - Support anyOf param for OpenAI json schema translation Get Started
  2. Anthropic - response_format + thinking param support (works across Anthropic API, Bedrock, Vertex) Get Started
  3. Anthropic - if thinking tokens are specified and max tokens is not, ensure the max tokens sent to Anthropic is higher than the thinking tokens (works across Anthropic API, Bedrock, Vertex) PR
  4. Bedrock - latency optimized inference support Get Started
  5. Sagemaker - handle special tokens + multibyte character code in response Get Started
  6. MCP - add support for using SSE MCP servers Get Started
  7. Anthropic - new litellm.messages.create interface for calling Anthropic /v1/messages via passthrough Get Started
  8. Anthropic - support ‘file’ content type in message param (works across Anthropic API, Bedrock, Vertex) Get Started
  9. Anthropic - map openai 'reasoning_effort' to anthropic 'thinking' param (works across Anthropic API, Bedrock, Vertex) Get Started
  10. Google AI Studio (Gemini) - [BETA] /v1/files upload support Get Started
  11. Azure - fix o-series tool calling Get Started
  12. Unified file id - [ALPHA] allow calling multiple providers with same file id PR
    • This is experimental, and not recommended for production use.
    • We plan to have a production-ready implementation by next week.
  13. Google AI Studio (Gemini) - return logprobs PR
  14. Anthropic - Support prompt caching for Anthropic tool calls Get Started
  15. OpenRouter - unwrap extra body on OpenRouter calls PR
  16. VertexAI - fix credential caching issue PR
  17. XAI - filter out 'name' param for XAI PR
  18. Gemini - image generation output support Get Started
  19. Databricks - support claude-3-7-sonnet w/ thinking + response_format Get Started
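
Items 3 and 9 above interact: mapping 'reasoning_effort' to a 'thinking' budget only produces a valid Anthropic request if max tokens ends up larger than that budget. The sketch below shows one way such a translation step could look. The function name, the effort-to-budget values, and the default headroom are illustrative assumptions, not values taken from this release; only the `{"type": "enabled", "budget_tokens": ...}` shape follows Anthropic's thinking parameter.

```python
from typing import Optional

# Illustrative budgets only; the actual mapping is not stated in these notes.
EFFORT_TO_BUDGET = {"low": 1024, "medium": 2048, "high": 4096}
# Hypothetical headroom reserved for the visible answer.
DEFAULT_ANSWER_TOKENS = 4096


def build_anthropic_params(
    reasoning_effort: Optional[str] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    """Translate OpenAI-style arguments into Anthropic-style request params."""
    params: dict = {}
    budget: Optional[int] = None
    if reasoning_effort is not None:
        budget = EFFORT_TO_BUDGET[reasoning_effort]
        params["thinking"] = {"type": "enabled", "budget_tokens": budget}
    if max_tokens is not None:
        # The caller chose a limit explicitly; pass it through unchanged.
        params["max_tokens"] = max_tokens
    elif budget is not None:
        # No limit given: pick one strictly above the thinking budget.
        params["max_tokens"] = budget + DEFAULT_ANSWER_TOKENS
    return params


params = build_anthropic_params(reasoning_effort="low")
```

The sketch leaves an explicit `max_tokens` untouched, so a caller who sets it below the thinking budget would still need to handle the provider's validation error.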

Spend Tracking Improvements

  1. Reliability fix - Check sent and received model for cost calculation PR
  2. Vertex AI - Multimodal embedding cost tracking Get Started, PR

Management Endpoints / UI

  1. New Usage Tab
    • Report 'total_tokens' + report success/failure calls
    • Remove double bars on scroll
    • Ensure ‘daily spend’ chart ordered from earliest to latest date
    • Show spend per model per day
    • Show key alias on Usage tab
    • Allow non-admins to view their activity
    • Add date picker to new usage tab
  2. Virtual Keys Tab
    • Remove 'default key' on user signup
    • Fix showing user models available for personal key creation
  3. Test Key Tab
    • Allow testing image generation models
  4. Models Tab
    • Fix bulk adding models
    • Support reusable credentials for passthrough endpoints
    • Allow team members to see team models
  5. Teams Tab
    • Fix json serialization error on update team metadata
  6. Request Logs Tab
    • Add reasoning_content token tracking across all providers on streaming
  7. API
  8. SSO
    • Allow assigning SSO users to teams on MSFT SSO PR

Logging / Guardrail Integrations

  1. Console Logs - Add json formatting for uncaught exceptions PR
  2. Guardrails - AIM Guardrails support for virtual key based policies Get Started
  3. Logging - fix completion start time tracking PR
  4. Prometheus
    • Allow adding authentication on Prometheus /metrics endpoints PR
    • Distinguish LLM Provider Exception vs. LiteLLM Exception in metric naming PR
    • Emit operational metrics for new DB Transaction architecture PR

Performance / Loadbalancing / Reliability improvements

  1. Preventing Deadlocks
    • Reduce DB Deadlocks by storing spend updates in Redis and then committing to DB PR
    • Ensure no deadlocks occur when updating DailyUserSpendTransaction PR
    • High Traffic fix - ensure new DB + Redis architecture accurately tracks spend PR
    • Use Redis for PodLock Manager instead of PG (ensures no deadlocks occur) PR
    • v2 DB Deadlock Reduction Architecture – Add Max Size for In-Memory Queue + Backpressure Mechanism PR
  2. Prisma Migrations Get Started
    • Connects the LiteLLM proxy to LiteLLM's Prisma migration files
    • Handles DB schema updates from the new litellm-proxy-extras SDK
  3. Redis - support password for sync sentinel clients PR
  4. Fix "Circular reference detected" error when max_parallel_requests = 0 PR
  5. Code QA - Ban hardcoded numbers PR
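
The Redis-based pod lock mentioned under item 1 can be pictured as a "set if absent, with expiry" operation: only one pod holds the key at a time, and a crashed holder's lock times out on its own. The sketch below is an assumption-laden illustration, not LiteLLM's PodLock Manager: `InMemoryStore`, `PodLock`, and the key name are hypothetical, and the store is an in-process stand-in that mimics the semantics of Redis `SET key value NX EX ttl` so the example runs without a server.

```python
import time
from typing import Dict, Optional, Tuple


class InMemoryStore:
    """Stand-in for Redis that mimics `SET key value NX EX ttl` semantics."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_if_absent(self, key: str, value: str, ttl_seconds: float) -> bool:
        now = time.monotonic()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # another holder's lock has not expired yet
        self._data[key] = (value, now + ttl_seconds)
        return True

    def get(self, key: str) -> Optional[str]:
        current = self._data.get(key)
        if current is None or current[1] <= time.monotonic():
            return None
        return current[0]

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class PodLock:
    """Lets exactly one pod at a time perform the batched DB write."""

    def __init__(
        self,
        store: InMemoryStore,
        pod_id: str,
        key: str = "spend_flush_lock",  # hypothetical key name
        ttl_seconds: float = 60.0,
    ) -> None:
        self._store = store
        self._pod_id = pod_id
        self._key = key
        self._ttl = ttl_seconds

    def acquire(self) -> bool:
        return self._store.set_if_absent(self._key, self._pod_id, self._ttl)

    def release(self) -> None:
        # Only the current holder may release the lock.
        if self._store.get(self._key) == self._pod_id:
            self._store.delete(self._key)


store = InMemoryStore()
pod_a, pod_b = PodLock(store, "pod-a"), PodLock(store, "pod-b")
a_got_lock = pod_a.acquire()   # pod-a takes the lock
b_got_lock = pod_b.acquire()   # pod-b is refused while pod-a holds it
pod_a.release()
```

Against a real Redis server, the check-then-delete in `release` would need to run atomically (for example in a server-side script), since two separate commands leave a window in which the lock can expire and be taken by another pod.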

Helm

  1. fix: wrong indentation of ttlSecondsAfterFinished in chart PR

General Proxy Improvements

  1. Fix - only apply service_account_settings.enforced_params on service accounts PR
  2. Fix - handle metadata null on /chat/completion PR
  3. Fix - Move daily user transaction logging outside of 'disable_spend_logs' flag, as they’re unrelated PR

Demo

Try this on the demo instance today

Complete Git Diff

See the complete git diff since v1.65.0-stable, here