v1.73.6-stable

June 28, 2025

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaffer

CTO, LiteLLM

Deploy this version

Docker

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:v1.73.6-stable.patch.1

Pip

pip install litellm==1.73.6.post1

Key Highlights

Claude on gemini-cli

This release brings support for using gemini-cli with LiteLLM.

You can use claude-sonnet-4, gemini-2.5-flash (Vertex AI & Google AI Studio), gpt-4.1, and any other LiteLLM-supported model on gemini-cli.

When you use gemini-cli with LiteLLM you get the following benefits:

Developer Benefits:

Universal Model Access: Use any LiteLLM-supported model (Anthropic, OpenAI, Vertex AI, Bedrock, etc.) through the gemini-cli interface.
Higher Rate Limits & Reliability: Load balance across multiple models and providers to avoid hitting individual provider limits, with fallbacks to ensure you get responses even if one provider fails.

Proxy Admin Benefits:

Centralized Management: Control access to all models through a single LiteLLM proxy instance, without giving your developers API keys for each provider.
Budget Controls: Set spending limits and track costs across all gemini-cli usage.
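
The load balancing and fallback behavior described above can be pictured with a small, self-contained sketch. This is a conceptual illustration only, not LiteLLM's implementation: `Deployment`, `ProviderError`, `call_with_fallbacks`, and the two stub deployments are hypothetical names invented for the example.

```python
import random
from typing import Callable, Optional, Sequence

Deployment = Callable[[str], str]  # hypothetical: takes a prompt, returns text


class ProviderError(Exception):
    """Stand-in for a provider failure such as a rate limit or outage."""


def call_with_fallbacks(
    prompt: str,
    primary: Sequence[Deployment],
    fallbacks: Sequence[Deployment],
    rng: Optional[random.Random] = None,
) -> str:
    """Try primary deployments in shuffled order, then fallbacks in order."""
    rng = rng or random.Random()
    shuffled = list(primary)
    rng.shuffle(shuffled)  # simple load balancing across the primary group

    last_error: Optional[ProviderError] = None
    for deployment in [*shuffled, *fallbacks]:
        try:
            return deployment(prompt)
        except ProviderError as err:
            last_error = err  # remember the failure and move to the next one
    raise ProviderError("all deployments failed") from last_error


def rate_limited(prompt: str) -> str:
    """Hypothetical deployment that always hits its provider limit."""
    raise ProviderError("429: rate limit reached")


def healthy(prompt: str) -> str:
    """Hypothetical deployment that always answers."""
    return f"echo: {prompt}"


print(call_with_fallbacks("hello", primary=[rate_limited], fallbacks=[healthy]))
# prints "echo: hello"
```

The caller sees a single response even though the first deployment failed, which is the property a gemini-cli user benefits from when the proxy sits in front of several providers.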

Batch API Cost Tracking

v1.73.6 adds cost tracking for LiteLLM Managed Batch API calls. Previously, batch calls that used LiteLLM Managed Files were not cost tracked. LiteLLM now stores the status of each batch call in the DB and polls incomplete batch jobs in the background, emitting a spend log for cost tracking once the batch is complete.

No new flag or configuration change is needed on your end. Over the next few weeks we hope to extend batch cost tracking to the Anthropic passthrough as well.
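
As a rough mental model of the background job described above, the sketch below polls stored batch records and emits one spend log per completed batch. It is a simplified illustration under stated assumptions: `BatchRecord`, `SpendLog`, `poll_incomplete_batches`, the status strings, and the in-memory dict standing in for the DB are all hypothetical and do not reflect LiteLLM's actual schema or code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed terminal states that never produce a spend log (hypothetical).
TERMINAL_WITHOUT_COST = {"failed", "expired", "cancelled"}


@dataclass
class BatchRecord:
    batch_id: str
    status: str = "in_progress"
    finished: bool = False  # True once the batch no longer needs polling


@dataclass
class SpendLog:
    batch_id: str
    cost_usd: float


def poll_incomplete_batches(
    db: Dict[str, BatchRecord],
    fetch_status: Callable[[str], str],
    compute_cost: Callable[[str], float],
) -> List[SpendLog]:
    """Refresh every unfinished batch and emit a spend log when it completes."""
    logs: List[SpendLog] = []
    for record in db.values():
        if record.finished:
            continue  # already handled, so the cost is never logged twice
        record.status = fetch_status(record.batch_id)
        if record.status == "completed":
            logs.append(SpendLog(record.batch_id, compute_cost(record.batch_id)))
            record.finished = True
        elif record.status in TERMINAL_WITHOUT_COST:
            record.finished = True
    return logs


# Usage with stubbed provider responses.
statuses = {"batch_1": "completed", "batch_2": "in_progress"}
db = {batch_id: BatchRecord(batch_id) for batch_id in statuses}
logs = poll_incomplete_batches(db, statuses.__getitem__, lambda _id: 1.25)
print([log.batch_id for log in logs])  # prints ['batch_1']
```

Running the poll a second time returns no new log for `batch_1`, while `batch_2` keeps being checked until it reaches a terminal state.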

New Models / Updated Models

Pricing / Context Window Updates

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Type |
| --- | --- | --- | --- | --- | --- |
| Azure OpenAI | `azure/o3-pro` | 200k | $20.00 | $80.00 | New |
| OpenRouter | `openrouter/mistralai/mistral-small-3.2-24b-instruct` | 32k | $0.10 | $0.30 | New |
| OpenAI | `o3-deep-research` | 200k | $10.00 | $40.00 | New |
| OpenAI | `o3-deep-research-2025-06-26` | 200k | $10.00 | $40.00 | New |
| OpenAI | `o4-mini-deep-research` | 200k | $2.00 | $8.00 | New |
| OpenAI | `o4-mini-deep-research-2025-06-26` | 200k | $2.00 | $8.00 | New |
| Deepseek | `deepseek-r1` | 65k | $0.55 | $2.19 | New |
| Deepseek | `deepseek-v3` | 65k | $0.27 | $0.07 | New |
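
To make the per-1M-token prices above concrete, here is a small worked calculation. It is a sketch that only multiplies token counts by the listed input and output prices; `PRICES_PER_1M` and `request_cost` are hypothetical names, and real cost tracking may account for additional factors (for example cached or reasoning tokens) that are not modeled here.

```python
from typing import Dict, Tuple

# (input $/1M tokens, output $/1M tokens), copied from the table above.
PRICES_PER_1M: Dict[str, Tuple[float, float]] = {
    "azure/o3-pro": (20.00, 80.00),
    "o3-deep-research": (10.00, 40.00),
    "o4-mini-deep-research": (2.00, 8.00),
    "deepseek-r1": (0.55, 2.19),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request from token counts and list prices."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 1,000 input tokens and 500 output tokens on azure/o3-pro:
# (1000 * 20 + 500 * 80) / 1,000,000 = 0.06
print(request_cost("azure/o3-pro", input_tokens=1_000, output_tokens=500))
# prints 0.06
```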

Updated Models

Bugs

Sambanova
- Handle float timestamps - PR s/o @neubig
Azure
- Support Azure authentication methods (Azure AD token, API keys) on the Responses API - PR s/o @hsuyuming
- Map ‘image_url’ str as nested dict - PR s/o @davis-featherstone
Watsonx
- Set ‘model’ field to None when model is part of a custom deployment - fixes error raised by WatsonX in those cases - PR s/o @cbjuan
Perplexity
- Support web_search_options - PR
- Support citation token and search queries cost calculation - PR
Anthropic
- Null value in usage block handling - PR
Gemini (Google AI Studio + VertexAI)
- Only use accepted format values (enum and datetime), since Gemini raises errors otherwise - PR
- Cache tools if passed alongside cached content (else Gemini raises an error) - PR
- JSON schema translation improvement: Fix unpack_def handling of nested $ref inside anyOf items - PR
Mistral
- Fix thinking prompt to match Hugging Face recommendation - PR
- Add supports_response_schema: true for all mistral models except codestral-mamba - PR
Ollama
- Fix unnecessary await on embedding calls - PR
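
The Gemini fix for `format` values listed above can be illustrated with a short schema-cleaning sketch. This is not LiteLLM's code: `drop_unsupported_formats` and `ACCEPTED_STRING_FORMATS` are hypothetical names, and the accepted set assumes that "datetime" in the note refers to the JSON Schema spelling `date-time`.

```python
from typing import Any

# Assumption: only these string formats are accepted; everything else is dropped.
ACCEPTED_STRING_FORMATS = {"enum", "date-time"}


def drop_unsupported_formats(schema: Any) -> Any:
    """Return a copy of a JSON schema without unsupported `format` keywords."""
    if isinstance(schema, list):
        return [drop_unsupported_formats(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        # Only a string-valued `format` is a keyword; a dict value would be a
        # property that happens to be named "format", so it is kept.
        if key == "format" and isinstance(value, str):
            if value not in ACCEPTED_STRING_FORMATS:
                continue
        cleaned[key] = drop_unsupported_formats(value)
    return cleaned


schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"},
        "created": {"type": "string", "format": "date-time"},
    },
}
cleaned = drop_unsupported_formats(schema)
print(cleaned["properties"]["email"])  # prints {'type': 'string'}
```

The `created` property keeps its `date-time` format, while the unsupported `email` format is removed before the schema would be sent upstream.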

Features

Azure OpenAI
- Check if o-series model supports reasoning effort (enables drop_params to work for o1 models)
- Assistant + tool use cost tracking - PR
Nvidia Nim
- Add ‘response_format’ param support - PR s/o @shagunb-acn
ElevenLabs
- New STT provider - PR

LLM API Endpoints

Features

/mcp
- Send appropriate auth string value to /tool/call endpoint with x-mcp-auth - PR s/o @wagnerjt
/v1/messages
- Custom LLM support - PR
/chat/completions
- Azure Responses API via chat completion support - PR
/responses
- Add reasoning content support for non-openai providers - PR
[NEW] /generateContent
- New endpoints for gemini cli support - PR
- Support calling Google AI Studio / VertexAI Gemini models in their native format - PR
- Add logging + cost tracking for stream + non-stream vertex/google ai studio routes - PR
- Add Bridge from generateContent to /chat/completions - PR
/batches
- Filter deployments to only those where managed file was written to - PR
- Save all model / file id mappings in the DB (previously only the first one was saved) - enables ‘true’ load balancing - PR
- Support List Batches with target model name specified - PR
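
The /batches items above about file mappings and deployment filtering can be sketched as follows. The data shapes and the name `deployments_for_file` are hypothetical and chosen only for illustration; they are not LiteLLM's internal structures.

```python
from typing import Dict, List

# Hypothetical shape: managed file id -> {deployment id -> provider file id}.
FileMappings = Dict[str, Dict[str, str]]


def deployments_for_file(
    managed_file_id: str,
    file_mappings: FileMappings,
    deployments: List[str],
) -> List[str]:
    """Keep only the deployments that the managed file was written to."""
    written_to = file_mappings.get(managed_file_id, {})
    return [deployment for deployment in deployments if deployment in written_to]


file_mappings: FileMappings = {
    "managed-file-1": {
        "azure-eastus": "file-abc",
        "azure-westus": "file-def",
    }
}
candidates = ["azure-eastus", "azure-westus", "openai-default"]
print(deployments_for_file("managed-file-1", file_mappings, candidates))
# prints ['azure-eastus', 'azure-westus']
```

Because every mapping is stored rather than only the first, more than one deployment stays eligible for a batch, which is what makes load balancing across them possible.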

Spend Tracking / Budget Improvements

Features

Passthrough
- Bedrock - cost tracking (/invoke + /converse routes) on streaming + non-streaming - PR
- VertexAI - anthropic cost calculation support - PR
Batches
- Background job for cost tracking LiteLLM Managed batches - PR

Management Endpoints / UI

Bugs

General UI
- Fix today selector date mutation in dashboard components - PR
Usage
- Aggregate usage data across all pages of paginated endpoint - PR
Teams
- De-duplicate models in team settings dropdown - PR
Models
- Preserve public model name when selecting ‘test connect’ with an Azure model (previously it would reset) - PR
Invitation Links
- Ensure invite link emails contain the correct invite ID when using the tf provider - PR

Features

Models
- Add ‘last success’ column to health check table - PR
MCP
- New UI component to support auth types: api key, bearer token, basic auth - PR s/o @wagnerjt
- Ensure internal users can access /mcp and /mcp/ routes - PR
SCIM
- Ensure default_internal_user_params are applied for new users - PR
Team
- Support default key expiry for team member keys - PR
- Expand team member add check to cover user email - PR
UI
- Restrict UI access by SSO group - PR
Keys
- Add a new_key param for regenerating a key - PR
Test Keys
- New ‘get code’ button for getting a runnable Python code snippet based on the UI configuration - PR

Logging / Guardrail Integrations

Bugs

Braintrust
- Add model to metadata to enable Braintrust cost estimation - PR

Features

Callbacks
- (Enterprise) Disable logging callbacks via request headers - PR
- Add List Callbacks API Endpoint - PR
Bedrock Guardrail
- Don't raise exception on intervene action - PR
- Ensure PII masking is applied to both streaming and non-streaming response content when using post call - PR
[NEW] Palo Alto Networks Prisma AIRS Guardrail
- PR
ElasticSearch
- New Elasticsearch Logging Tutorial - PR
Message Redaction
- Preserve usage / model information for Embedding redaction - PR

Performance / Loadbalancing / Reliability improvements

Bugs

Team-only models
- Filter team-only models from routing logic for non-team calls
Context Window Exceeded error
- Catch anthropic exceptions - PR

Features

Router
- Allow using a dynamic cooldown time for a specific deployment - PR
- Handle cooldown_time = 0 for deployments - PR
Redis
- Add better debugging to see what variables are set - PR
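
The Router cooldown items above can be pictured with the sketch below, which shows a per-deployment cooldown override and one plausible reason `cooldown_time = 0` needs explicit handling: a truthiness check such as `cooldown_time or default` would silently replace 0 with the default. `CooldownTracker` and its default value are hypothetical and are not the Router's actual implementation.

```python
import time
from typing import Callable, Dict, Optional


class CooldownTracker:
    """Track which deployments are cooling down after a failure."""

    def __init__(
        self,
        default_cooldown: float = 5.0,  # hypothetical default, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._default = default_cooldown
        self._clock = clock
        self._cooldown_until: Dict[str, float] = {}

    def record_failure(
        self, deployment_id: str, cooldown_time: Optional[float] = None
    ) -> None:
        # Compare against None so that an explicit 0 is respected.
        seconds = self._default if cooldown_time is None else cooldown_time
        if seconds <= 0:
            return  # cooldown_time = 0 means this deployment never cools down
        self._cooldown_until[deployment_id] = self._clock() + seconds

    def is_available(self, deployment_id: str) -> bool:
        return self._clock() >= self._cooldown_until.get(deployment_id, 0.0)


# Usage with a fake clock so the behavior is deterministic.
now = [100.0]
tracker = CooldownTracker(default_cooldown=5.0, clock=lambda: now[0])
tracker.record_failure("default-cooldown")
tracker.record_failure("no-cooldown", cooldown_time=0)
tracker.record_failure("long-cooldown", cooldown_time=30)
print(tracker.is_available("default-cooldown"))  # prints False
print(tracker.is_available("no-cooldown"))       # prints True
```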

General Proxy Improvements

Bugs

aiohttp
- Check HTTP_PROXY vars in networking requests
- Allow using HTTP_PROXY settings with trust_env
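
For context on the proxy items above: aiohttp's client session accepts a `trust_env` flag that makes it read proxy settings from environment variables. The sketch below uses only the Python standard library to show how such variables are discovered; it does not use aiohttp or LiteLLM, and the proxy URL and the `proxy_for` helper are hypothetical.

```python
import os
import urllib.request
from typing import Optional


def proxy_for(scheme: str) -> Optional[str]:
    """Return the proxy URL configured for a URL scheme, if any."""
    return urllib.request.getproxies().get(scheme)


# Hypothetical proxy address, set only for this demonstration.
os.environ.pop("http_proxy", None)  # lowercase variables take precedence
os.environ["HTTP_PROXY"] = "http://proxy.example.internal:3128"

print(proxy_for("http"))
```

A client that trusts the environment would route plain HTTP requests through the address reported here.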

Features

Docs
- Add recommended spec - PR
Swagger
- Introduce new environment variable NO_REDOC to opt out of Redoc - PR

New Contributors

@mukesh-dream11 made their first contribution in https://github.com/BerriAI/litellm/pull/11969
@cbjuan made their first contribution in https://github.com/BerriAI/litellm/pull/11854
@ryan-castner made their first contribution in https://github.com/BerriAI/litellm/pull/12055
@davis-featherstone made their first contribution in https://github.com/BerriAI/litellm/pull/12075
@Gum-Joe made their first contribution in https://github.com/BerriAI/litellm/pull/12068
@jroberts2600 made their first contribution in https://github.com/BerriAI/litellm/pull/12116
@ohmeow made their first contribution in https://github.com/BerriAI/litellm/pull/12022
@amarrella made their first contribution in https://github.com/BerriAI/litellm/pull/11942
@zhangyoufu made their first contribution in https://github.com/BerriAI/litellm/pull/12092
@bougou made their first contribution in https://github.com/BerriAI/litellm/pull/12088
@codeugar made their first contribution in https://github.com/BerriAI/litellm/pull/11972
@glgh made their first contribution in https://github.com/BerriAI/litellm/pull/12133
