This is a pre-release version.
The production version will be released on Wednesday.
Deploy this version

- Docker

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.72.6.rc
```

- Pip
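For the Pip tab, a hedged equivalent; the exact pre-release version string on PyPI is an assumption inferred from the Docker tag above:

```shell
# Assumed pre-release version string; confirm the exact version on PyPI
pip install litellm==1.72.6rc
```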
This version is not out yet.
TLDR

- Why Upgrade
  - Codex-mini on Claude Code: You can now use `codex-mini` (OpenAI’s code assistant model) via Claude Code.
  - MCP Permissions Management: Manage permissions for MCP Servers by Keys, Teams, Organizations (entities) on LiteLLM.
  - UI: Turn on/off auto refresh on logs view.
  - Rate Limiting: Support for output token-only rate limiting.
- Who Should Read
  - Teams using the `/v1/messages` API (Claude Code)
  - Teams using MCP
  - Teams giving access to self-hosted models and setting rate limits
- Risk of Upgrade
  - Low
    - No major changes to existing functionality or package updates.
Key Highlights

MCP Permissions Management

This release brings support for managing permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM. When an MCP client attempts to list tools, LiteLLM will only return the tools the entity has permission to access.

This is great for use cases that require access to restricted data (e.g. a Jira MCP server) that you don't want everyone to use.

For Proxy Admins, this enables centralized management of all MCP Servers with access control. For developers, this means you'll only see the MCP tools assigned to you.
Codex-mini on Claude Code

This release brings support for calling `codex-mini` (OpenAI’s code assistant model) via Claude Code.

This works because LiteLLM now allows any Responses API model (including `o3-pro`) to be called via the `/chat/completions` and `/v1/messages` endpoints. This includes:

- Streaming calls
- Non-streaming calls
- Cost tracking on success + failure for Responses API models

Here's how to use it today:
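As a rough sketch of the Claude Code route (the proxy URL, virtual key, and model alias below are illustrative assumptions, and `codex-mini` is assumed to be configured in the proxy's model list):

```shell
# Point Claude Code at the LiteLLM proxy's /v1/messages bridge
# (URL and key are placeholder values for a local proxy)
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-1234"

# Claude Code's Anthropic-style requests are now translated by LiteLLM
# to the Responses API model behind the scenes
claude --model codex-mini
```

The same bridge applies to plain `/chat/completions` calls against the proxy, e.g.:

```shell
# Illustrative values: proxy URL, key, and model name are assumptions
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codex-mini",
    "messages": [{"role": "user", "content": "Write a haiku about rate limits"}]
  }'
```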
New / Updated Models

Pricing / Context Window Updates

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Type |
|---|---|---|---|---|---|
| VertexAI | vertex_ai/claude-opus-4 | 200K | $15.00 | $75.00 | New |
| OpenAI | gpt-4o-audio-preview-2025-06-03 | 128K | $2.50 (text), $40.00 (audio) | $10.00 (text), $80.00 (audio) | New |
| OpenAI | o3-pro | 200K | $20.00 | $80.00 | New |
| OpenAI | o3-pro-2025-06-10 | 200K | $20.00 | $80.00 | New |
| OpenAI | o3 | 200K | $2.00 | $8.00 | Updated |
| OpenAI | o3-2025-04-16 | 200K | $2.00 | $8.00 | Updated |
| Azure | azure/gpt-4o-mini-transcribe | 16K | $1.25 (text), $3.00 (audio) | $5.00 (text) | New |
| Mistral | mistral/magistral-medium-latest | 40K | $2.00 | $5.00 | New |
| Mistral | mistral/magistral-small-latest | 40K | $0.50 | $1.50 | New |
- Deepgram: `nova-3` cost per second pricing is now supported.
Updated Models

Bugs

- Watsonx
  - Ignore space id on Watsonx deployments (throws json errors) - PR
- Ollama
  - Set tool call id for streaming calls - PR
- Gemini (VertexAI + Google AI Studio)
- Custom LLM
- Huggingface
  - Add /chat/completions to endpoint url when missing - PR
- Deepgram
  - Support async httpx calls - PR
- Anthropic
  - Append prefix (if set) to assistant content start - PR
Features

- VertexAI
- Anthropic
  - ‘none’ tool choice param support - PR, Get Started
- Perplexity
  - Add ‘reasoning_effort’ support - PR, Get Started (see the sketch after this list)
- Mistral
  - Add mistral reasoning support - PR, Get Started
- SGLang
  - Map context window exceeded error for proper handling - PR
- Deepgram
  - Provider specific params support - PR
- Azure
  - Return content safety filter results - PR
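A hedged illustration of the new Perplexity `reasoning_effort` support via the proxy; the proxy URL, key, and model name below are assumptions, not values from this release note:

```shell
# Illustrative request: pass reasoning_effort through to a Perplexity reasoning model
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "perplexity/sonar-reasoning",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    "reasoning_effort": "high"
  }'
```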
LLM API Endpoints

Bugs

- Chat Completion
  - Streaming - Ensure consistent ‘created’ across chunks - PR

Features

Spend Tracking

Bugs

- End Users
- Custom Pricing
  - Convert scientific notation str to int - PR
Management Endpoints / UI

Bugs

Features

- Leftnav
  - Show remaining Enterprise users on UI
- MCP
- Models
  - Add deepgram models on UI
  - Model Access Group support on UI - PR
- Keys
  - Trim long user id’s - PR
- Logs
Logging / Guardrails Integrations

Bugs

- Arize
- Prometheus
  - Fix total requests increment - PR

Features

- Lasso Guardrails
  - [NEW] Lasso Guardrails support - PR
- Users
  - New `organizations` param on `/user/new` - allows adding users to orgs on creation - PR (see the sketch after this list)
- Prevent double logging when using bridge logic - PR
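A hedged sketch of the new `organizations` param on `/user/new`; the admin key, user email, and organization id are placeholder values:

```shell
# Create a user and attach them to an existing organization in one call
curl -X POST 'http://localhost:4000/user/new' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_email": "new-user@example.com",
    "organizations": ["org-placeholder-id"]
  }'
```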
Performance / Reliability Improvements

Bugs

- Tag based routing
  - Do not consider ‘default’ models when request specifies a tag - PR (s/o thiagosalvatore)
Features

General Proxy Improvements

Bugs

- aiohttp
  - Fixes for transfer encoding error on aiohttp transport - PR
Features

- aiohttp
- CLI
  - Make all commands show server URL - PR
- Uvicorn
  - Allow setting keep alive timeout - PR
- Experimental Rate Limiting v2 (enable via `EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True"`) (see the sketch after this list)
- Helm
  - Support extraContainers in migrations-job.yaml - PR
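To try the experimental rate limiter, the documented environment variable can be added to the same Docker command shown in the deploy section (everything else is unchanged):

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -e EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True" \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.72.6.rc
```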
New Contributors
- @laurien16 made their first contribution in https://github.com/BerriAI/litellm/pull/8460
- @fengbohello made their first contribution in https://github.com/BerriAI/litellm/pull/11547
- @lapinek made their first contribution in https://github.com/BerriAI/litellm/pull/11570
- @yanwork made their first contribution in https://github.com/BerriAI/litellm/pull/11586
- @dhs-shine made their first contribution in https://github.com/BerriAI/litellm/pull/11575
- @ElefHead made their first contribution in https://github.com/BerriAI/litellm/pull/11450
- @idootop made their first contribution in https://github.com/BerriAI/litellm/pull/11616
- @stevenaldinger made their first contribution in https://github.com/BerriAI/litellm/pull/11649
- @thiagosalvatore made their first contribution in https://github.com/BerriAI/litellm/pull/11454
- @vanities made their first contribution in https://github.com/BerriAI/litellm/pull/11595
- @alvarosevilla95 made their first contribution in https://github.com/BerriAI/litellm/pull/11661
Demo Instance

Here's a Demo Instance to test changes:

- Instance: https://demo.litellm.ai/
- Login Credentials:
  - Username: admin
  - Password: sk-1234