# [Preview] v1.78.0-stable - MCP Gateway: Control Tool Access by Team, Key
## Deploy this version

**Docker**

```shell
docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:v1.78.0.rc.2
```

**Pip**

```shell
pip install litellm==1.78.0.rc.2
```
## Key Highlights

- **MCP Gateway - Control Tool Access by Team, Key** - Grant teams and keys selective access to tools on the same MCP server
- **Performance Improvements** - 70% lower p99 latency on the LiteLLM AI Gateway
- **GPT-5 Pro & GPT-Image-1-Mini** - Day 0 support for OpenAI's GPT-5 Pro (400K context) and gpt-image-1-mini image generation
- **EnkryptAI Guardrails** - New guardrail integration for content moderation
- **Tag-Based Budgets** - Support for setting budgets based on request tags
## MCP Gateway - Control Tool Access by Team, Key

Proxy admins can now control MCP tool access by team or key, making it easy to grant different teams selective access to tools from the same MCP server.

For example, you can give your Engineering team access to the `list_repositories`, `create_issue`, and `search_code` tools, while Sales only gets `search_code` and `close_issue`.
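As a rough sketch of how an admin might scope a virtual key to specific tools (the `allowed_tools` field below is a placeholder, not the confirmed request schema; see the MCP Gateway PRs later in these notes for the actual interface):

```python
import requests

# Illustrative sketch: issue a virtual key limited to two MCP tools.
# "allowed_tools" is a placeholder field name, not the confirmed schema.
resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # proxy admin key
    json={"allowed_tools": ["search_code", "close_issue"]},
)
print(resp.json()["key"])
```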
## Performance - 70% Lower p99 Latency
This release cuts p99 latency by 70% on LiteLLM AI Gateway, making it even better for low-latency use cases.
These gains come from two key enhancements:
**Reliable Sessions**

Added support for shared sessions with aiohttp. The `shared_session` parameter is now used consistently across all calls, enabling connection pooling.
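To illustrate why this matters (a minimal aiohttp sketch, not LiteLLM's internal code): a shared `ClientSession` keeps a connection pool, so repeated requests to the same host skip new TCP and TLS handshakes.

```python
import asyncio

import aiohttp

async def main() -> None:
    # One shared session: connections are pooled and reused,
    # instead of opening a fresh connection per request.
    async with aiohttp.ClientSession() as session:
        for _ in range(3):
            async with session.get("https://example.com") as resp:
                await resp.read()

asyncio.run(main())
```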
**Faster Routing**

A new `model_name_to_deployment_indices` hash map replaces the O(n) list scans in `_get_all_deployments()` with O(1) hash lookups, improving routing performance and scalability.
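Conceptually, the change looks like this (an illustrative sketch under assumed data shapes, not the actual router code):

```python
# Before: every lookup scans the full deployment list, O(n).
def get_deployments_scan(deployments: list[dict], model_name: str) -> list[dict]:
    return [d for d in deployments if d["model_name"] == model_name]

# After: build an index once; each lookup is then O(1) on average.
def build_index(deployments: list[dict]) -> dict[str, list[int]]:
    index: dict[str, list[int]] = {}  # model_name -> deployment indices
    for i, deployment in enumerate(deployments):
        index.setdefault(deployment["model_name"], []).append(i)
    return index

def get_deployments_indexed(
    deployments: list[dict], index: dict[str, list[int]], model_name: str
) -> list[dict]:
    return [deployments[i] for i in index.get(model_name, [])]
```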
As a result, performance improved across all latency percentiles:
- Median latency: 110 ms → 100 ms (↓9.1%)
- p95 latency: 440 ms → 150 ms (↓65.9%)
- p99 latency: 810 ms → 240 ms (↓70.4%)
- Average latency: 310 ms → 111.73 ms (↓64.0%)
### Test Setup

**Locust**

- Concurrent users: 1,000
- Ramp-up: 500

**System Specs**

- Database was used
- CPU: 4 vCPUs
- Memory: 8 GB RAM
- LiteLLM Workers: 4
- Instances: 4

**Configuration (config.yaml)**

View the complete configuration: gist.github.com/AlexsanderHamir/config.yaml

**Load Script (no_cache_hits.py)**

View the complete load testing script: gist.github.com/AlexsanderHamir/no_cache_hits.py
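For reference, a minimal Locust user class consistent with this setup might look like the following (the model name and key are placeholders; the actual script is in the gist above):

```python
from locust import HttpUser, task

class LiteLLMUser(HttpUser):
    @task
    def chat_completion(self) -> None:
        # Placeholder model and key; see the linked gist for the real script.
        self.client.post(
            "/v1/chat/completions",
            headers={"Authorization": "Bearer sk-1234"},
            json={
                "model": "fake-openai-endpoint",
                "messages": [{"role": "user", "content": "hello"}],
            },
        )
```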
## New Models / Updated Models

### New Model Support
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-5-pro | 400K | $15.00 | $120.00 | Responses API, reasoning, vision, function calling, prompt caching, web search |
| OpenAI | gpt-5-pro-2025-10-06 | 400K | $15.00 | $120.00 | Responses API, reasoning, vision, function calling, prompt caching, web search |
| OpenAI | gpt-image-1-mini | - | $2.00/img | - | Image generation and editing |
| OpenAI | gpt-realtime-mini | 128K | $0.60 | $2.40 | Realtime audio, function calling |
| Azure AI | azure_ai/Phi-4-mini-reasoning | 131K | $0.08 | $0.32 | Function calling |
| Azure AI | azure_ai/Phi-4-reasoning | 32K | $0.125 | $0.50 | Function calling, reasoning |
| Azure AI | azure_ai/MAI-DS-R1 | 128K | $1.35 | $5.40 | Reasoning, function calling |
| Bedrock | au.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.30 | $16.50 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | global.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | global.anthropic.claude-sonnet-4-20250514-v1:0 | 1M | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | cohere.embed-v4:0 | 128K | $0.12 | - | Embeddings, image input support |
| OCI | oci/cohere.command-latest | 128K | $1.56 | $1.56 | Function calling |
| OCI | oci/cohere.command-a-03-2025 | 256K | $1.56 | $1.56 | Function calling |
| OCI | oci/cohere.command-plus-latest | 128K | $1.56 | $1.56 | Function calling |
| Together AI | together_ai/moonshotai/Kimi-K2-Instruct-0905 | 262K | $1.00 | $3.00 | Function calling |
| Together AI | together_ai/Qwen/Qwen3-Next-80B-A3B-Instruct | 262K | $0.15 | $1.50 | Function calling |
| Together AI | together_ai/Qwen/Qwen3-Next-80B-A3B-Thinking | 262K | $0.15 | $1.50 | Function calling |
| Vertex AI | MedGemma models | Varies | Varies | Varies | Medical-focused Gemma models on custom endpoints |
| Watson X | 27 new foundation models | Varies | Varies | Varies | Granite, Llama, Mistral families |
### Features

- **OpenAI**
    - Add GPT-5 Pro model configuration and documentation - PR #15258
    - Add `stop` parameter to non-supported params for GPT-5 - PR #15244
    - Day 0 support: add gpt-image-1-mini - PR #15259
    - Add gpt-realtime-mini support - PR #15283
    - Add gpt-5-pro-2025-10-06 to model costs - PR #15344
    - Minimal fix: GPT-5 models should not go on cooldown when called with `temperature != 1` - PR #15330
- **Snowflake**
    - Add function calling support for Snowflake Cortex REST API - PR #15221
- **Gemini / Vertex AI**
    - Fix header forwarding for Gemini/Vertex AI providers in proxy mode - PR #15231
- **Bedrock**
    - Add Global Cross-Region Inference - PR #15210
    - Add Cohere Embed v4 support for AWS Bedrock - PR #15298
    - Fix (Bedrock): include cacheWriteInputTokens in prompt_tokens calculation - PR #15292
    - Add Bedrock AU Cross-Region Inference for Claude Sonnet 4.5 - PR #15402
    - Fix: Converse → /v1/messages streaming doesn't handle parallel tool calls with Claude models - PR #15315
- **OCI**
    - Add OCI Cohere support with tool calling and streaming capabilities - PR #15365
- **Together AI**
    - Add new Together AI models - PR #15383
### Bug Fixes
- General
## LLM API Endpoints

### Features

- Feat(files): add `@client` decorator to file operations - PR #15339
- Fix Gemini CLI by actually streaming the response - PR #15264
- Azure - passthrough support with router models - PR #15240
### Bugs

- General
    - Fix `x-litellm-cache-key` header not being returned on cache hit - PR #15348
## Management Endpoints / UI

### Features

- **Proxy CLI Auth**
    - Proxy CLI - don't store existing key in the URL, store it in the state param - PR #15290
- **Models + Endpoints**
    - Make PATCH `/model/{model_id}/update` handle `team_id` consistently with POST `/model/new` - PR #15297
    - Feature: adds Infinity as a provider in the UI - PR #15285
    - Fix: Models + Endpoints page crash when config file contains `router_settings.model_group_alias` - PR #15308
    - Models & Endpoints initial refactor - PR #15435
    - LiteLLM UI API Reference page updates - PR #15438
- **Teams**
- **UI Infrastructure**
    - Added Prettier to autoformat frontend - PR #15215
    - Adds Turbopack to the `npm run dev` command in UI to build faster during development - PR #15250
    - (perf) fix: replaces bloated key list calls with lean key aliases endpoint - PR #15252
    - Potentially fixes a UI spasm issue with an expired cookie - PR #15309
    - LiteLLM UI refactor infrastructure - PR #15236
    - Enforces removal of unused imports from UI - PR #15416
    - Fix: Usage page > Model Activity > spend per day graph: y-axis clipping on large spend values - PR #15389
    - Updates guardrail provider logos - PR #15421
- **Admin Settings**
- **SSO**
    - SSO - support Entra ID app roles - PR #15351
## Logging / Guardrail / Prompt Management Integrations

### Features

#### Guardrails
## Spend Tracking, Budgets and Rate Limiting

- **Tag Management**
    - Tag Management - Add support for setting tag-based budgets - PR #15433
- **Dynamic Rate Limiter v3**
- **Shared Health Check**
    - Implement shared health check state across pods - PR #15380
## MCP Gateway

- **Tool Control**
    - MCP Gateway - UI - select allowed tools for keys, teams - PR #15241
    - MCP Gateway - Backend - allow storing allowed tools by team/key - PR #15243
    - MCP Gateway - fine-grained database object storage control - PR #15255
    - MCP Gateway - LiteLLM MCP fixes for team control - PR #15304
    - MCP Gateway - QA/fixes - ensure team/key level enforcement works for MCPs - PR #15305
    - Feature: include server_name in `/v1/mcp/server/health` endpoint response - PR #15431
- **OpenAPI Integration**
- **Configuration**
## Performance / Loadbalancing / Reliability improvements

- **Router Optimizations**
- **Session Management**
- **SSL/TLS Performance**
    - Perf: optimize SSL/TLS handshake performance with prioritized ciphers - PR #15398
- **Dependencies**
    - Upgrades tenacity version to 8.5.0 - PR #15303
- **Data Masking**
    - Fix - SensitiveDataMasker converts lists to string - PR #15420
## General AI Gateway Improvements

### Security

- General
    - Fix: redact AWS credentials when `redact_user_api_key_info` enabled - PR #15321
## Documentation Updates

- **Provider Documentation**
- **Deployment**
    - Deletion of docker-compose buggy comment that caused `config.yaml`-based startup to fail - PR #15425
## New Contributors
- @Gal-bloch made their first contribution in PR #15219
- @lcfyi made their first contribution in PR #15315
- @ashengstd made their first contribution in PR #15362
- @vkolehmainen made their first contribution in PR #15363
- @jlan-nl made their first contribution in PR #15330
- @BCook98 made their first contribution in PR #15402
- @PabloGmz96 made their first contribution in PR #15425