
[PRE-RELEASE] v1.72.6-stable

Krrish Dholakia
Ishaan Jaffer

This is a pre-release version.

The production version will be released on Wednesday.

Deploy this version​

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.72.6.rc
```

TLDR​

  • Why Upgrade
    • Codex-mini on Claude Code: You can now use codex-mini (OpenAI’s code assistant model) via Claude Code.
    • MCP Permissions Management: Manage permissions for MCP Servers by Keys, Teams, Organizations (entities) on LiteLLM.
    • UI: Turn on/off auto refresh on logs view.
    • Rate Limiting: Support for output token-only rate limiting.
  • Who Should Read
    • Teams using /v1/messages API (Claude Code)
    • Teams using MCP
    • Teams giving access to self-hosted models and setting rate limits
  • Risk of Upgrade
    • Low
      • No major changes to existing functionality or package updates.

Key Highlights​

MCP Permissions Management​

This release brings support for managing permissions for MCP Servers by Keys, Teams, Organizations (entities) on LiteLLM. When an MCP client attempts to list tools, LiteLLM only returns the tools the entity has permission to access.

This is great for use cases involving MCP servers that expose restricted data (e.g., a Jira MCP server) that you don't want everyone to access.

For Proxy Admins, this enables centralized management of all MCP Servers with access control. For developers, this means you'll only see the MCP tools assigned to you.
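
As a rough client-side sketch of this behavior: assuming a proxy at localhost:4000 serving MCP over streamable HTTP at /mcp and authenticated via an x-litellm-api-key header (both are assumptions here, not confirmed by this release note), listing tools with a given key returns only that entity's permitted tools:

```python
# Sketch: list the MCP tools visible to a given LiteLLM key, using the official
# MCP Python SDK. The /mcp/ path and x-litellm-api-key header are assumptions --
# check the MCP Permission Management docs linked below for the exact setup.
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def list_permitted_tools(litellm_key: str) -> list[str]:
    async with streamablehttp_client(
        "http://localhost:4000/mcp/",
        headers={"x-litellm-api-key": f"Bearer {litellm_key}"},
    ) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.list_tools()
            return [tool.name for tool in result.tools]


# A key on a team with Jira MCP access sees the Jira tools; other keys don't.
print(asyncio.run(list_permitted_tools("sk-placeholder-key")))
```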

Codex-mini on Claude Code​

This release brings support for calling codex-mini (OpenAI’s code assistant model) via Claude Code.

Under the hood, LiteLLM now lets any Responses API model (including o3-pro) be called via the /chat/completions and /v1/messages endpoints. This includes:

  • Streaming calls
  • Non-streaming calls
  • Cost Tracking on success + failure for Responses API models

Here's how to use it today:
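
Below is a minimal sketch using the Anthropic SDK pointed at a LiteLLM proxy; the proxy URL, virtual key, and codex-mini model alias are placeholders rather than confirmed names from this release.

```python
# Sketch: call codex-mini through LiteLLM's /v1/messages bridge.
# Assumes a LiteLLM proxy at localhost:4000 with a model named "codex-mini"
# configured; the key and model alias below are placeholders.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:4000",  # your LiteLLM proxy
    api_key="sk-1234",                 # your LiteLLM virtual key
)

response = client.messages.create(
    model="codex-mini",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.content[0].text)
```

Since the bridge also covers /chat/completions, the same model can be called with the OpenAI SDK against the proxy.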


New / Updated Models​

Pricing / Context Window Updates​

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Type |
|----------|-------|----------------|---------------------|----------------------|------|
| VertexAI | vertex_ai/claude-opus-4 | 200K | $15.00 | $75.00 | New |
| OpenAI | gpt-4o-audio-preview-2025-06-03 | 128k | $2.50 (text), $40.00 (audio) | $10.00 (text), $80.00 (audio) | New |
| OpenAI | o3-pro | 200k | $20.00 | $80.00 | New |
| OpenAI | o3-pro-2025-06-10 | 200k | $20.00 | $80.00 | New |
| OpenAI | o3 | 200k | $2.00 | $8.00 | Updated |
| OpenAI | o3-2025-04-16 | 200k | $2.00 | $8.00 | Updated |
| Azure | azure/gpt-4o-mini-transcribe | 16k | $1.25 (text), $3.00 (audio) | $5.00 (text) | New |
| Mistral | mistral/magistral-medium-latest | 40k | $2.00 | $5.00 | New |
| Mistral | mistral/magistral-small-latest | 40k | $0.50 | $1.50 | New |

Updated Models​

Bugs​

  • Watsonx
    • Ignore space_id on Watsonx deployments (it was throwing JSON errors) - PR
  • Ollama
    • Set tool call id for streaming calls - PR
  • Gemini (VertexAI + Google AI Studio)
    • Fix tool call indexes - PR
    • Handle empty string for arguments in function calls - PR
    • Add audio/ogg mime type support when inferring mime types from file URLs - PR
  • Custom LLM
    • Fix passing api_base, api_key, litellm_params_dict to custom_llm embedding methods - PR (s/o ElefHead)
  • Huggingface
    • Add /chat/completions to endpoint url when missing - PR
  • Deepgram
    • Support async httpx calls - PR
  • Anthropic
    • Append prefix (if set) to assistant content start - PR

Features​

  • VertexAI
    • Support vertex credentials set via env var on passthrough - PR
    • Support for choosing ‘global’ region when model is only available there - PR
    • Anthropic passthrough cost calculation + token tracking - PR
    • Support ‘global’ vertex region on passthrough - PR
  • Anthropic
  • Perplexity
  • Mistral
  • SGLang
    • Map context window exceeded error for proper handling - PR
  • Deepgram
    • Provider specific params support - PR
  • Azure
    • Return content safety filter results - PR

LLM API Endpoints​

Bugs​

  • Chat Completion
    • Streaming - Ensure consistent ‘created’ across chunks - PR

Features​

  • MCP
    • Add controls for MCP Permission Management - PR, Docs
    • Add permission management for MCP List + Call Tool operations - PR, Docs
    • Streamable HTTP server support - PR, PR, Docs
    • Use experimental dedicated REST endpoints for listing and calling MCP tools - PR
  • Responses API
    • NEW API Endpoint - List input items - PR
    • Background mode for OpenAI + Azure OpenAI - PR
    • Langfuse (and other) logging support on Responses API requests - PR
  • Chat Completions
    • Bridge for Responses API - allows calling codex-mini via /chat/completions and /v1/messages - PR, PR


Management Endpoints / UI​

Bugs​

  • Users
    • /user/info - fix passing user IDs containing a + character
    • Add admin-initiated password reset flow - PR
    • Fix default user settings UI rendering error - PR
  • Budgets
    • Correct success message when new user budget is created - PR

Features​

  • Leftnav
    • Show remaining Enterprise users on UI
  • MCP
    • New server add form - PR
    • Allow editing mcp servers - PR
  • Models
    • Add Deepgram models on UI
    • Model Access Group support on UI - PR
  • Keys
    • Trim long user IDs - PR
  • Logs
    • Add live tail feature to logs view, letting users disable auto refresh in high-traffic environments - PR
    • Audit Logs - preview screenshot - PR

Logging / Guardrails Integrations​

Features​

  • Lasso Guardrails
    • [NEW] Lasso Guardrails support - PR
  • Users
    • New organizations param on /user/new - allows adding users to orgs on creation - PR
  • Prevent double logging when using bridge logic - PR

Performance / Reliability Improvements​

Features​

  • Caching
    • New optional ‘litellm[caching]’ pip install for adding disk cache dependencies - PR (see the sketch below)
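
A minimal sketch of using the new extra, assuming the default disk cache behavior (the cache directory below is arbitrary):

```python
# Sketch: enable LiteLLM's disk cache after `pip install "litellm[caching]"`,
# which pulls in the disk cache dependencies. The cache directory is arbitrary.
import litellm
from litellm.caching.caching import Cache

litellm.cache = Cache(type="disk", disk_cache_dir="/tmp/litellm-cache")

# Repeated identical calls are served from the on-disk cache.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is LiteLLM?"}],
)
print(response.choices[0].message.content)
```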

General Proxy Improvements​

Bugs​

  • aiohttp
    • Fixes for transfer-encoding error on aiohttp transport - PR

Features​

  • aiohttp
    • Enable System Proxy Support for aiohttp transport - PR (s/o idootop)
  • CLI
    • Make all commands show server URL - PR
  • Uvicorn
    • Allow setting keep alive timeout - PR
  • Experimental Rate Limiting v2 (enable via EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True")
    • Support specifying rate limit by output_tokens only - PR (see the sketch after this list)
    • Decrement parallel requests on call failure - PR
    • In-memory only rate limiting support - PR
    • Return remaining rate limits by key/user/team - PR
  • Helm
    • Support extraContainers in migrations-job.yaml - PR
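
As a hedged sketch of the output-token-only rate limit: the output_tpm_limit field below is a hypothetical name (check the linked PR for the exact parameter), and the proxy is assumed to be started with EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True".

```python
# Hypothetical sketch: issue a key whose rate limit counts output tokens only.
# "output_tpm_limit" is a placeholder field name -- see the linked PR for the
# real one. Assumes an admin key and a proxy at localhost:4000.
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # admin key (placeholder)
    json={"output_tpm_limit": 10_000},            # hypothetical output-tokens/minute cap
)
print(resp.json()["key"])
```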


Demo Instance​

Here's a Demo Instance to test changes:

Git Diff​