v1.81.9 - Control which MCP Servers are exposed on the Internet
Deploy this version
Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.81.9
```

Pip:

```shell
pip install litellm==1.81.9
```
Key Highlights
- Claude Opus 4.6 - Full support across Anthropic, AWS Bedrock, Azure AI, and Vertex AI with adaptive thinking and 1M context window
- A2A Agent Gateway - Call A2A (Agent-to-Agent) registered agents through the standard `/chat/completions` API
- Expose MCP Servers on the Public Internet - Launch MCP servers with public/private visibility and IP-based access control for internet-facing deployments
- UI Team Soft Budget Alerts - Set soft budgets on teams and receive email alerts when spending crosses the threshold, without blocking requests
- Performance Optimizations - Multiple performance improvements including ~40% Prometheus CPU reduction, LRU caching, and optimized logging paths
- LiteLLM Observatory - Automated 24-hour load tests
LiteLLM Observatory
LiteLLM Observatory is a long-running release-validation system we built to catch regressions before they reach users. We run 24-hour load tests against our production deployments before every release, surfacing issues like resource lifecycle bugs, OOMs, and CPU regressions that only appear under sustained load. The system is built to be extensible: you can add new tests, configure models and failure thresholds, and queue runs against any deployment. Our goal is to reach 100% coverage of LiteLLM functionality through these tests.
MCP Servers on the Public Internet
This release makes it safe to expose MCP servers on the public internet by adding public/private visibility and IP-based access control. You can now run internet-facing MCP services while restricting access to trusted networks and keeping internal tools private.
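The access check behind a "private" server follows a standard IP-allowlist pattern. Here is a rough sketch of that idea using Python's stdlib `ipaddress` module; the function and the CIDR list are illustrative assumptions, not LiteLLM's actual internals or config keys:

```python
import ipaddress

# Hypothetical trusted networks; a real deployment would load these
# from the MCP server's access-control configuration.
ALLOWED_CIDRS = ["10.0.0.0/8", "203.0.113.0/24"]

def is_request_allowed(client_ip: str, server_is_public: bool) -> bool:
    """Public servers accept any caller; private ones require a trusted CIDR."""
    if server_is_public:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ALLOWED_CIDRS)
```

The useful property of this design is that "private" is the safe default: a caller is rejected unless its address matches an explicitly trusted range.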
UI Team Soft Budget Alerts
Set a soft budget on any team to receive email alerts when spending crosses the threshold, without blocking any requests. Configure the threshold and alerting emails directly from the Admin UI, with no proxy restart needed.
Let's dive in.
New Models / Updated Models
New Model Support (13 new models)
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
| Anthropic | claude-opus-4-6 | 1M | $5.00 | $25.00 |
| AWS Bedrock | anthropic.claude-opus-4-6-v1 | 1M | $5.00 | $25.00 |
| Azure AI | azure_ai/claude-opus-4-6 | 200K | $5.00 | $25.00 |
| Vertex AI | vertex_ai/claude-opus-4-6 | 1M | $5.00 | $25.00 |
| Google Gemini | gemini/deep-research-pro-preview-12-2025 | 65K | $2.00 | $12.00 |
| Vertex AI | vertex_ai/deep-research-pro-preview-12-2025 | 65K | $2.00 | $12.00 |
| Moonshot | moonshot/kimi-k2.5 | 262K | $0.60 | $3.00 |
| OpenRouter | openrouter/qwen/qwen3-235b-a22b-2507 | 262K | $0.07 | $0.10 |
| OpenRouter | openrouter/qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.11 | $0.60 |
| Together AI | together_ai/zai-org/GLM-4.7 | 200K | $0.45 | $2.00 |
| Together AI | together_ai/moonshotai/Kimi-K2.5 | 256K | $0.50 | $2.80 |
| ElevenLabs | elevenlabs/eleven_v3 | - | $0.18/1K chars | - |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | - | $0.18/1K chars | - |
Features
- Add 1hr tiered caching costs for long-context models - PR #20214
- Support TTL (1h) field in prompt caching for Bedrock Claude 4.5 models - PR #20338
- Add Nova Sonic speech-to-speech model support - PR #20244
- Fix empty assistant message for Converse API - PR #20390
- Fix content blocked handling - PR #20606
- Add Gemini Deep Research model support - PR #20406
- Fix Vertex AI Gemini streaming content_filter handling - PR #20105
- Allow using OpenAI-style tools for `web_search` with Vertex AI/Gemini models - PR #20280
- Fix `supports_native_streaming` for Gemini and Vertex AI models - PR #20408
- Add mapping for responses tools in file IDs - PR #20402
- Support `dimensions` param for Cohere embed v4 - PR #20235
- Add reasoning param support for GPT OSS Cerebras - PR #20258
- Add Kimi K2.5 model entries - PR #20273
- Add Qwen3-235B models - PR #20455
- Add GLM-4.7 and Kimi-K2.5 models - PR #20319
- Add `eleven_v3` and `eleven_multilingual_v2` TTS models - PR #20522
- Add missing capability flags to models - PR #20276
- Fix system prompts being dropped and auto-add required Copilot headers - PR #20113
- Fix incorrect merging of consecutive user messages for GigaChat provider - PR #20341
- Add xAI `/realtime` API support - works with LiveKit SDK - PR #20381
- Add `gpt-5-search-api` model and docs clarifications - PR #20512
Bug Fixes
- Fix "extra inputs not permitted" error for `provider_specific_fields` - PR #20334
- Fix Managed Batches inconsistent state management for list and cancel batches - PR #20331
- Fix `open_ai_embedding_models` to have `custom_llm_provider` None - PR #20253
LLM API Endpoints
Features
- Filter unsupported Claude Code beta headers for non-Anthropic providers - PR #20578
- Fix inconsistent response format in `anthropic.messages.acreate()` when using non-Anthropic providers - PR #20442
- Fix 404 on `/api/event_logging/batch` endpoint that caused Claude Code "route not found" errors - PR #20504
- Add support for delete and GET via file_id for Gemini - PR #20329
Management Endpoints / UI
Features
- SSO Configuration
- Auth / SDK
  - Add `proxy_auth` for auto OAuth2/JWT token management in SDK - PR #20238
- Virtual Keys
- Teams & Budgets
- UI Improvements
  - Default Team Settings: Migrate to use Reusable Model Select - PR #20310
  - Navbar: Option to Hide Community Engagement Buttons - PR #20308
  - Show team alias on Models health page - PR #20359
  - Admin Settings: Add option for Authentication for public AI Hub - PR #20444
  - Adjust daily spend date filtering for user timezone - PR #20472
- SCIM
  - Add base `/scim/v2` endpoint for SCIM resource discovery - PR #20301
- Proxy CLI
  - CLI arguments for RDS IAM auth - PR #20437
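The `proxy_auth` item above is, at its core, automatic token lifecycle management: fetch a credential, reuse it until shortly before expiry, then refresh. A toy sketch of that pattern, where `fetch_token` is an assumed stand-in for the real OAuth2/JWT exchange (this is not LiteLLM SDK code):

```python
import time

class TokenManager:
    """Cache a bearer token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, refresh_margin: float = 30.0):
        self._fetch = fetch_token        # callable returning (token, expires_at_epoch)
        self._margin = refresh_margin    # refresh this many seconds before expiry
        self._token, self._expires_at = None, 0.0

    def get_token(self) -> str:
        # Refresh if we have no token or it is inside the expiry margin.
        if time.time() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token
```

Callers just ask for a token on every request; the manager decides when a round-trip to the identity provider is actually needed.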
Bugs
- Fix: Remove unnecessary key blocking on UI login that prevented access - PR #20210
- UI - Team Settings: Disable Global Guardrail Persistence - PR #20307
- UI - Model Info Page: Fix Input and Output Labels - PR #20462
- UI - Model Page: Column Resizing on Smaller Screens - PR #20599
- Fix `/key/list` `user_id` Empty String Edge Case - PR #20623
- Add array type checks for model, agent, and MCP hub data to prevent UI crashes - PR #20469
- Fix unique constraint on daily tables + logging when updates fail - PR #20394
Logging / Guardrail / Prompt Management Integrations
Bug Fixes (3 fixes)
- Fix Langfuse OTEL trace export failing when spans contain null attributes - PR #20382
- Fix incorrect failure metrics labels causing miscounted error rates - PR #20152
- Fix Slack alert delivery failing for certain budget threshold configurations - PR #20257
Guardrails (7 updates)
- Custom Code Guardrails
- Team-Based Guardrails
  - Implement team-based isolation guardrails management - PR #20318
- Ensure OpenAI Moderations Guard works with OpenAI Embeddings - PR #20523
- Fix fail-open for GraySwan and pass metadata to Cygnal API endpoint - PR #19837
- General
Spend Tracking, Budgets and Rate Limiting
- Support 0 cost models - Allow zero-cost model entries for internal/free-tier models - PR #20249
MCP Gateway (9 updates)
- MCP Semantic Filtering - Filter MCP tools using semantic similarity to reduce tool sprawl for LLM calls - PR #20296, PR #20316
- UI - MCP Semantic Filtering - Add support for MCP Semantic Filtering configuration on UI - PR #20454
- MCP IP-Based Access Control - Set MCP servers as private/public available on internet with IP-based restrictions - PR #20607, PR #20620
- Fix MCP "Session not found" error on VSCode reconnect - PR #20298
- Fix OAuth2 'Capabilities: none' bug for upstream MCP servers - PR #20602
- Include Config Defined Search Tools in `/search_tools/list` - PR #20371
- UI - Search Tools: Show Config Defined Search Tools - PR #20436
- Ensure MCP permissions are enforced when using JWT Auth - PR #20383
- Fix `gcs_bucket_name` not being passed correctly for MCP server storage configuration - PR #20491
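MCP Semantic Filtering (above) ranks tools by how similar their descriptions are to the user's query and keeps only the top matches, so the LLM isn't handed every registered tool. A self-contained sketch of the core ranking step, using cosine similarity over precomputed embeddings (the toy vectors and function names are illustrative assumptions; a real deployment would embed descriptions with an embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def filter_tools(query_vec, tools, top_k=2):
    """tools: list of (name, embedding). Return the top_k most similar names."""
    ranked = sorted(tools, key=lambda t: cosine(query_vec, t[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

Capping the tool list this way both shrinks the prompt and reduces the chance the model picks an irrelevant tool.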
Performance / Loadbalancing / Reliability improvements (14 improvements)
- Prometheus ~40% CPU reduction - Parallelize budget metrics, fix caching bug, reduce CPU usage - PR #20544
- Prevent closed client errors by reverting httpx client caching - PR #20025
- Avoid unnecessary Router creation when no models or search tools are configured - PR #20661
- Optimize `wrapper_async` with `CallTypes` caching and reduced lookups - PR #20204
- Cache `_get_relevant_args_to_use_for_logging()` at module level - PR #20077
- LRU cache for `normalize_request_route` - PR #19812
- Optimize `get_standard_logging_metadata` with set intersection - PR #19685
- Early-exit guards in `completion_cost` for unused features - PR #20020
- Optimize `get_litellm_params` with sparse kwargs extraction - PR #19884
- Guard debug log f-strings and remove redundant dict copies - PR #19961
- Replace enum construction with frozenset lookup - PR #20302
- Guard debug f-string in `update_environment_variables` - PR #20360
- Warn when budget lookup fails to surface silent caching misses - PR #20545
- Add INFO-level session reuse logging per request for better observability - PR #20597
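Several of the wins above come from memoizing pure, hot-path helpers: a proxy sees the same handful of routes over and over, so recomputing a normalization per request is wasted CPU. The LRU-caching pattern applied to route normalization looks roughly like this (the regex and names are illustrative, not LiteLLM's actual code):

```python
from functools import lru_cache
import re

# Match path segments that look like hex IDs (8+ chars of [0-9a-f-]).
_ID_SEGMENT = re.compile(r"/[0-9a-f-]{8,}")

@lru_cache(maxsize=1024)
def normalize_route(route: str) -> str:
    """Collapse ID-like path segments so metrics aggregate per endpoint."""
    return _ID_SEGMENT.sub("/{id}", route)
```

Because the function is pure and its input space is small in practice, repeat calls become dictionary lookups instead of regex substitutions.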
Database Changes
Schema Updates
| Table | Change Type | Description | PR | Migration |
|---|---|---|---|---|
| LiteLLM_TeamTable | New Column | Added `allow_team_guardrail_config` boolean field for team-based guardrail isolation | PR #20318 | Migration |
| LiteLLM_DeletedTeamTable | New Column | Added `allow_team_guardrail_config` boolean field | PR #20318 | Migration |
| LiteLLM_TeamTable | New Column | Added `soft_budget` (double precision) for soft budget alerting | PR #20530 | Migration |
| LiteLLM_DeletedTeamTable | New Column | Added `soft_budget` (double precision) | PR #20653 | Migration |
| LiteLLM_MCPServerTable | New Column | Added `available_on_public_internet` boolean for MCP IP-based access control | PR #20607 | Migration |
Documentation Updates (14 updates)
- Add FAQ for setting up and verifying LITELLM_LICENSE - PR #20284
- Model request tags documentation - PR #20290
- Add Prisma migration troubleshooting guide - PR #20300
- MCP Semantic Filtering documentation - PR #20316
- Add CopilotKit SDK doc as supported agents SDK - PR #20396
- Add documentation for Nova Sonic - PR #20320
- Update Vertex AI Text to Speech doc to show use of audio - PR #20255
- Improve Okta SSO setup guide with step-by-step instructions - PR #20353
- Langfuse doc update - PR #20443
- Expose MCPs on public internet documentation - PR #20626
- Add blog post: Achieving Sub-Millisecond Proxy Overhead - PR #20309
- Add blog post about litellm-observatory - PR #20622
- Update Opus 4.6 blog with adaptive thinking - PR #20637
- `gpt-5-search-api` docs clarifications - PR #20512
New Contributors
- @Quentin-M made their first contribution in PR #19818
- @amirzaushnizer made their first contribution in PR #20235
- @cscguochang made their first contribution in PR #20214
- @krauckbot made their first contribution in PR #20273
- @agrattan0820 made their first contribution in PR #19784
- @nina-hu made their first contribution in PR #20472
- @swayambhu94 made their first contribution in PR #20469
- @ssadedin made their first contribution in PR #20566

