Skip to main content

[PRE-RELEASE] v1.73.6-stable

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM
warning

Known Issues​

The non-root docker image has a known issue around the UI not loading. If you use the non-root docker image we recommend waiting before upgrading to this version. We will post a patch fix for this.

Deploy this version​

This release is not out yet. The pre-release will be live on Sunday and the stable release will be live on Wednesday.


Key Highlights​

Claude on gemini-cli​


This release brings support for using gemini-cli with LiteLLM.

You can use claude-sonnet-4, gemini-2.5-flash (Vertex AI & Google AI Studio), gpt-4.1 and any LiteLLM supported model on gemini-cli.

When you use gemini-cli with LiteLLM you get the following benefits:

Developer Benefits:

  • Universal Model Access: Use any LiteLLM supported model (Anthropic, OpenAI, Vertex AI, Bedrock, etc.) through the gemini-cli interface.
  • Higher Rate Limits & Reliability: Load balance across multiple models and providers to avoid hitting individual provider limits, with fallbacks to ensure you get responses even if one provider fails.

Proxy Admin Benefits:

  • Centralized Management: Control access to all models through a single LiteLLM proxy instance without giving your developers API Keys to each provider.
  • Budget Controls: Set spending limits and track costs across all gemini-cli usage.

Get Started


Batch API Cost Tracking​


v1.73.6 brings cost tracking for LiteLLM Managed Batch API calls to LiteLLM. Previously, this was not being done for Batch API calls using LiteLLM Managed Files. Now, LiteLLM will store the status of each batch call in the DB and poll incomplete batch jobs in the background, emitting a spend log for cost tracking once the batch is complete.

There is no new flag / change needed on your end. Over the next few weeks we hope to extend this to cover batch cost tracking for the Anthropic passthrough as well.

Get Started


New Models / Updated Models​

Pricing / Context Window Updates​

ProviderModelContext WindowInput ($/1M tokens)Output ($/1M tokens)Type
Azure OpenAIazure/o3-pro200k$20.00$80.00New
OpenRouteropenrouter/mistralai/mistral-small-3.2-24b-instruct32k$0.1$0.3New
OpenAIo3-deep-research200k$10.00$40.00New
OpenAIo3-deep-research-2025-06-26200k$10.00$40.00New
OpenAIo4-mini-deep-research200k$2.00$8.00New
OpenAIo4-mini-deep-research-2025-06-26200k$2.00$8.00New
Deepseekdeepseek-r165k$0.55$2.19New
Deepseekdeepseek-v365k$0.27$0.07New

Updated Models​

Bugs​

  • Sambanova
  • Azure
    • support Azure Authentication method (azure ad token, api keys) on Responses API - PR s/o @hsuyuming
    • Map ‘image_url’ str as nested dict - PR s/o @davis-featherstone
  • Watsonx
    • Set ‘model’ field to None when model is part of a custom deployment - fixes error raised by WatsonX in those cases - PR s/o @cbjuan
  • Perplexity
    • Support web_search_options - PR
    • Support citation token and search queries cost calculation - PR
  • Anthropic
    • Null value in usage block handling - PR
  • Gemini (Google AI Studio + VertexAI)
    • Only use accepted format values (enum and datetime) - else gemini raises errors - PR
    • Cache tools if passed alongside cached content (else gemini raises an error) - PR
    • Json schema translation improvement: Fix unpack_def handling of nested $ref inside anyof items - PR
  • Mistral
    • Fix thinking prompt to match hugging face recommendation - PR
    • Add supports_response_schema: true for all mistral models except codestral-mamba - PR
  • Ollama
    • Fix unnecessary await on embedding calls - PR

Features​

  • Azure OpenAI
    • Check if o-series model supports reasoning effort (enables drop_params to work for o1 models)
    • Assistant + tool use cost tracking - PR
  • Nvidia Nim
    • Add ‘response_format’ param support - PR @shagunb-acn 
  • ElevenLabs
    • New STT provider - PR

LLM API Endpoints​

Features​

  • /mcp
    • Send appropriate auth string value to /tool/call endpoint with x-mcp-auth - PR s/o @wagnerjt
  • /v1/messages
  • /chat/completions
    • Azure Responses API via chat completion support - PR
  • /responses
    • Add reasoning content support for non-openai providers - PR
  • [NEW] /generateContent
    • New endpoints for gemini cli support - PR
    • Support calling Google AI Studio / VertexAI Gemini models in their native format - PR
    • Add logging + cost tracking for stream + non-stream vertex/google ai studio routes - PR
    • Add Bridge from generateContent to /chat/completions - PR
  • /batches
    • Filter deployments to only those where managed file was written to - PR
    • Save all model / file id mappings in db (previously it was just the first one) - enables ‘true’ loadbalancing - PR
    • Support List Batches with target model name specified - PR

Spend Tracking / Budget Improvements​

Features​

  • Passthrough
    • Bedrock - cost tracking (/invoke + /converse routes) on streaming + non-streaming - PR
    • VertexAI - anthropic cost calculation support - PR
  • Batches
    • Background job for cost tracking LiteLLM Managed batches - PR

Management Endpoints / UI​

Bugs​

  • General UI
    • Fix today selector date mutation in dashboard components - PR
  • Usage
    • Aggregate usage data across all pages of paginated endpoint - PR
  • Teams
    • De-duplicate models in team settings dropdown - PR
  • Models
    • Preserve public model name when selecting ‘test connect’ with azure model (previously would reset) - PR
  • Invitation Links
    • Ensure Invite links email contain the correct invite id when using tf provider - PR

Features​

  • Models
    • Add ‘last success’ column to health check table - PR
  • MCP
    • New UI component to support auth types: api key, bearer token, basic auth - PR s/o @wagnerjt
    • Ensure internal users can access /mcp and /mcp/ routes - PR
  • SCIM
    • Ensure default_internal_user_params are applied for new users - PR
  • Team
    • Support default key expiry for team member keys - PR
    • Expand team member add check to cover user email - PR
  • UI
    • Restrict UI access by SSO group - PR
  • Keys
    • Add new new_key param for regenerating key - PR
  • Test Keys
    • New ‘get code’ button for getting runnable python code snippet based on ui configuration - PR

Logging / Guardrail Integrations​

Bugs​

  • Braintrust
    • Adds model to metadata to enable braintrust cost estimation - PR

Features​

  • Callbacks
    • (Enterprise) - disable logging callbacks in request headers - PR
    • Add List Callbacks API Endpoint - PR
  • Bedrock Guardrail
    • Don't raise exception on intervene action - PR
    • Ensure PII Masking is applied on response streaming or non streaming content when using post call - PR
  • [NEW] Palo Alto Networks Prisma AIRS Guardrail
  • ElasticSearch
    • New Elasticsearch Logging Tutorial - PR
  • Message Redaction
    • Preserve usage / model information for Embedding redaction - PR

Performance / Loadbalancing / Reliability improvements​

Bugs​

  • Team-only models
    • Filter team-only models from routing logic for non-team calls
  • Context Window Exceeded error
    • Catch anthropic exceptions - PR

Features​

  • Router
    • allow using dynamic cooldown time for a specific deployment - PR
    • handle cooldown_time = 0 for deployments - PR
  • Redis
    • Add better debugging to see what variables are set - PR

General Proxy Improvements​

Bugs​

  • aiohttp
    • Check HTTP_PROXY vars in networking requests
    • Allow using HTTP_ Proxy settings with trust_env

Features​

  • Docs
    • Add recommended spec - PR
  • Swagger
    • Introduce new environment variable NO_REDOC to opt-out Redoc - PR

New Contributors​

Git Diff​