
v1.81.9 - Control which MCP Servers are exposed on the Internet

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.81.9

Key Highlights


LiteLLM Observatory

LiteLLM Observatory is a long-running release-validation system we built to catch regressions before they reach users. The system is built to be extensible: you can add new tests, configure models and failure thresholds, and queue runs against any deployment. Our goal is to achieve 100% coverage of LiteLLM functionality through these tests. We run 24-hour load tests against our production deployments before all releases, surfacing issues like resource lifecycle bugs, OOMs, and CPU regressions that only appear under sustained load.


MCP Servers on the Public Internet

This release makes it safe to expose MCP servers on the public internet by adding public/private visibility and IP-based access control. You can now run internet-facing MCP services while restricting access to trusted networks and keeping internal tools private.
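The combination of public/private visibility and IP-based access control can be pictured with a small sketch. This is not LiteLLM's implementation: the `TRUSTED_NETWORKS` list and the `is_trusted` and `can_access` helpers are hypothetical names chosen for illustration, and the rule shown (public servers reachable from anywhere, private servers only from trusted networks) is an assumption about how such a check might work.

```python
import ipaddress

# Hypothetical allowlist of trusted networks (illustrative only, not LiteLLM config).
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_trusted(client_ip: str) -> bool:
    """Return True if client_ip falls inside any trusted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def can_access(server_is_public: bool, client_ip: str) -> bool:
    """Assumed rule: public servers are open; private ones need a trusted IP."""
    return server_is_public or is_trusted(client_ip)
```

Under these assumptions, `can_access(False, "10.1.2.3")` allows the request because the caller is on a trusted network, while `can_access(False, "8.8.8.8")` rejects it.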


UI Team Soft Budget Alerts

Set a soft budget on any team to receive email alerts when spending crosses the threshold, without blocking any requests. Configure the threshold and alerting emails directly from the Admin UI, with no proxy restart needed.
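The difference between a soft budget and a hard limit is that crossing it only triggers a notification. A minimal sketch of that behavior follows. The `TeamBudget` record and `check_soft_budget` function are hypothetical, and treating "crosses the threshold" as `spend >= soft_budget` is an assumption rather than LiteLLM's documented rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TeamBudget:
    # Hypothetical record; field names echo the release notes, not LiteLLM's schema.
    soft_budget: Optional[float]
    alerting_emails: List[str] = field(default_factory=list)


def check_soft_budget(team: TeamBudget, spend: float) -> List[str]:
    """Return the emails to alert. The request itself is never blocked."""
    if team.soft_budget is None or spend < team.soft_budget:
        return []
    return list(team.alerting_emails)
```

The function only returns recipients. Nothing in it raises or rejects, which mirrors the "without blocking any requests" behavior described above.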


Let's dive in.


New Models / Updated Models

New Model Support (13 new models)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
| Anthropic | claude-opus-4-6 | 1M | $5.00 | $25.00 |
| AWS Bedrock | anthropic.claude-opus-4-6-v1 | 1M | $5.00 | $25.00 |
| Azure AI | azure_ai/claude-opus-4-6 | 200K | $5.00 | $25.00 |
| Vertex AI | vertex_ai/claude-opus-4-6 | 1M | $5.00 | $25.00 |
| Google Gemini | gemini/deep-research-pro-preview-12-2025 | 65K | $2.00 | $12.00 |
| Vertex AI | vertex_ai/deep-research-pro-preview-12-2025 | 65K | $2.00 | $12.00 |
| Moonshot | moonshot/kimi-k2.5 | 262K | $0.60 | $3.00 |
| OpenRouter | openrouter/qwen/qwen3-235b-a22b-2507 | 262K | $0.07 | $0.10 |
| OpenRouter | openrouter/qwen/qwen3-235b-a22b-thinking-2507 | 262K | $0.11 | $0.60 |
| Together AI | together_ai/zai-org/GLM-4.7 | 200K | $0.45 | $2.00 |
| Together AI | together_ai/moonshotai/Kimi-K2.5 | 256K | $0.50 | $2.80 |
| ElevenLabs | elevenlabs/eleven_v3 | - | $0.18/1K chars | - |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | - | $0.18/1K chars | - |
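
As a worked example of reading the per-1M-token prices above, the sketch below estimates the cost of a single request. The `PRICES` dictionary copies two rows from the table, and `estimate_cost` is a hypothetical helper. It assumes flat per-token pricing and ignores anything the table does not state, such as tiered or cached-token rates.

```python
from typing import Dict, Tuple

# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES: Dict[str, Tuple[float, float]] = {
    "claude-opus-4-6": (5.00, 25.00),
    "moonshot/kimi-k2.5": (0.60, 3.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Flat-rate estimate: tokens times price, scaled from per-1M to per-token."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For instance, 200,000 input tokens and 10,000 output tokens on `claude-opus-4-6` come to (200,000 × $5.00 + 10,000 × $25.00) / 1,000,000 = $1.25.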

Features

Bug Fixes


LLM API Endpoints

Features

  • Messages API

    • Filter unsupported Claude Code beta headers for non-Anthropic providers - PR #20578
    • Fix inconsistent response format in anthropic.messages.acreate() when using non-Anthropic providers - PR #20442
    • Fix 404 on /api/event_logging/batch endpoint that caused Claude Code "route not found" errors - PR #20504
  • A2A Agent Gateway

    • Allow calling A2A agents through LiteLLM /chat/completions API - PR #20358
    • Use A2A registered agents with /chat/completions - PR #20362
    • Fix A2A agents deployed with localhost/internal URLs in their agent cards - PR #20604
  • Files API

    • Add support for delete and GET via file_id for Gemini - PR #20329
  • General

    • Add User-Agent customization support - PR #19881
    • Fix search tools not found when using per-request routers - PR #19818
    • Forward extra headers in chat - PR #20386

Management Endpoints / UI

Features

  • SSO Configuration

    • SSO Config Team Mappings - PR #20111
    • UI - SSO: Add Team Mappings - PR #20299
    • Extract user roles from JWT access token for Keycloak compatibility - PR #20591
  • Auth / SDK

    • Add proxy_auth for auto OAuth2/JWT token management in SDK - PR #20238
  • Virtual Keys

    • Key reset_spend endpoint - PR #20305
    • UI - Keys: Allowed Routes to Key Info and Edit Pages - PR #20369
    • Add Key info endpoint object permission data - PR #20407
    • Keys and Teams Router Setting + Allow Override of Router Settings - PR #20205
  • Teams & Budgets

    • Add soft_budget to Team Table + Create/Update Endpoints - PR #20530
    • Team Soft Budget Email Alerts - PR #20553
    • UI - Team Settings: Soft Budget + Alerting Emails - PR #20634
    • UI - User Budget Page: Unlimited Budget Checkbox - PR #20380
    • /user/update allow for max_budget resets - PR #20375
  • UI Improvements

    • Default Team Settings: Migrate to use Reusable Model Select - PR #20310
    • Navbar: Option to Hide Community Engagement Buttons - PR #20308
    • Show team alias on Models health page - PR #20359
    • Admin Settings: Add option for Authentication for public AI Hub - PR #20444
    • Adjust daily spend date filtering for user timezone - PR #20472
  • SCIM

    • Add base /scim/v2 endpoint for SCIM resource discovery - PR #20301
  • Proxy CLI

Bugs

  • Fix: Remove unnecessary key blocking on UI login that prevented access - PR #20210
  • UI - Team Settings: Disable Global Guardrail Persistence - PR #20307
  • UI - Model Info Page: Fix Input and Output Labels - PR #20462
  • UI - Model Page: Column Resizing on Smaller Screens - PR #20599
  • Fix /key/list user_id Empty String Edge Case - PR #20623
  • Add array type checks for model, agent, and MCP hub data to prevent UI crashes - PR #20469
  • Fix unique constraint on daily tables + logging when updates fail - PR #20394

Logging / Guardrail / Prompt Management Integrations

Bug Fixes (3 fixes)

  • Langfuse

    • Fix Langfuse OTEL trace export failing when spans contain null attributes - PR #20382
  • Prometheus

    • Fix incorrect failure metrics labels causing miscounted error rates - PR #20152
  • Slack Alerts

    • Fix Slack alert delivery failing for certain budget threshold configurations - PR #20257

Guardrails (7 updates)

  • Custom Code Guardrails

    • Add HTTP support to custom code guardrails + Unified guardrails for MCP + Agent guardrail support - PR #20619
    • Custom Code Guardrails UI Playground - PR #20377
  • Team-Based Guardrails

    • Implement team-based isolation guardrails management - PR #20318
  • OpenAI Moderations

    • Ensure OpenAI Moderations Guard works with OpenAI Embeddings - PR #20523
  • GraySwan / Cygnal

    • Fix fail-open for GraySwan and pass metadata to Cygnal API endpoint - PR #19837
  • General

    • Check for model_response_choices before guardrail input - PR #19784
    • Preserve streaming content on guardrail-sampled chunks - PR #20027

Spend Tracking, Budgets and Rate Limiting

  • Support 0 cost models - Allow zero-cost model entries for internal/free-tier models - PR #20249

MCP Gateway (9 updates)

  • MCP Semantic Filtering - Filter MCP tools using semantic similarity to reduce tool sprawl for LLM calls - PR #20296, PR #20316
  • UI - MCP Semantic Filtering - Add support for MCP Semantic Filtering configuration on UI - PR #20454
  • MCP IP-Based Access Control - Mark MCP servers as private or as publicly available on the internet, with IP-based restrictions - PR #20607, PR #20620
  • Fix MCP "Session not found" error on VSCode reconnect - PR #20298
  • Fix OAuth2 'Capabilities: none' bug for upstream MCP servers - PR #20602
  • Include Config Defined Search Tools in /search_tools/list - PR #20371
  • UI - Search Tools: Show Config Defined Search Tools - PR #20436
  • Ensure MCP permissions are enforced when using JWT Auth - PR #20383
  • Fix gcs_bucket_name not being passed correctly for MCP server storage configuration - PR #20491
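
To illustrate the idea behind the MCP Semantic Filtering item above, here is a toy sketch that ranks tools by similarity to the user's query and keeps only the top matches. It is not LiteLLM's implementation: `embed`, `cosine`, and `filter_tools` are hypothetical helpers, and word-count vectors stand in for a real embedding model purely so the example runs without dependencies.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_tools(query: str, tools: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:top_k]
```

Passing only the highest-ranked tools to the model is what reduces tool sprawl. A production version would presumably swap `embed` for a real embedding model.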

Performance / Loadbalancing / Reliability improvements (14 improvements)

  • Prometheus ~40% CPU reduction - Parallelize budget metrics, fix caching bug, reduce CPU usage - PR #20544
  • Prevent closed client errors by reverting httpx client caching - PR #20025
  • Avoid unnecessary Router creation when no models or search tools are configured - PR #20661
  • Optimize wrapper_async with CallTypes caching and reduced lookups - PR #20204
  • Cache _get_relevant_args_to_use_for_logging() at module level - PR #20077
  • LRU cache for normalize_request_route - PR #19812
  • Optimize get_standard_logging_metadata with set intersection - PR #19685
  • Early-exit guards in completion_cost for unused features - PR #20020
  • Optimize get_litellm_params with sparse kwargs extraction - PR #19884
  • Guard debug log f-strings and remove redundant dict copies - PR #19961
  • Replace enum construction with frozenset lookup - PR #20302
  • Guard debug f-string in update_environment_variables - PR #20360
  • Warn when budget lookup fails to surface silent caching misses - PR #20545
  • Add INFO-level session reuse logging per request for better observability - PR #20597
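
Two of the techniques named in the list above, an LRU cache on route normalization and guarding debug f-strings, follow a common Python pattern. The sketch below shows that general pattern and is not the code from the linked PRs. `normalize_route`, `handle`, and the `_ID_SEGMENT` regex are hypothetical.

```python
import functools
import logging
import re

logger = logging.getLogger("proxy_sketch")

# Hypothetical rule: treat long hex/dash path segments as IDs.
_ID_SEGMENT = re.compile(r"/[0-9a-f-]{8,}(?=/|$)")


@functools.lru_cache(maxsize=1024)
def normalize_route(path: str) -> str:
    """Collapse ID-like segments; repeated paths are served from the cache."""
    return _ID_SEGMENT.sub("/{id}", path)


def handle(path: str, payload: dict) -> str:
    route = normalize_route(path)
    # Guard: the f-string is only built when DEBUG logging is enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug(f"route={route} payload={payload!r}")
    return route
```

The cache avoids re-running the regex for hot paths, and the guard avoids formatting large payloads into strings that would be discarded at higher log levels.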

Database Changes

Schema Updates

| Table | Change Type | Description | PR | Migration |
|---|---|---|---|---|
| LiteLLM_TeamTable | New Column | Added allow_team_guardrail_config boolean field for team-based guardrail isolation | PR #20318 | Migration |
| LiteLLM_DeletedTeamTable | New Column | Added allow_team_guardrail_config boolean field | PR #20318 | Migration |
| LiteLLM_TeamTable | New Column | Added soft_budget (double precision) for soft budget alerting | PR #20530 | Migration |
| LiteLLM_DeletedTeamTable | New Column | Added soft_budget (double precision) | PR #20653 | Migration |
| LiteLLM_MCPServerTable | New Column | Added available_on_public_internet boolean for MCP IP-based access control | PR #20607 | Migration |

Documentation Updates (14 updates)

  • Add FAQ for setting up and verifying LITELLM_LICENSE - PR #20284
  • Model request tags documentation - PR #20290
  • Add Prisma migration troubleshooting guide - PR #20300
  • MCP Semantic Filtering documentation - PR #20316
  • Add CopilotKit SDK doc as supported agents SDK - PR #20396
  • Add documentation for Nova Sonic - PR #20320
  • Update Vertex AI Text to Speech doc to show use of audio - PR #20255
  • Improve Okta SSO setup guide with step-by-step instructions - PR #20353
  • Langfuse doc update - PR #20443
  • Expose MCPs on public internet documentation - PR #20626
  • Add blog post: Achieving Sub-Millisecond Proxy Overhead - PR #20309
  • Add blog post about litellm-observatory - PR #20622
  • Update Opus 4.6 blog with adaptive thinking - PR #20637
  • gpt-5-search-api docs clarifications - PR #20512

New Contributors

  • @Quentin-M made their first contribution in PR #19818
  • @amirzaushnizer made their first contribution in PR #20235
  • @cscguochang made their first contribution in PR #20214
  • @krauckbot made their first contribution in PR #20273
  • @agrattan0820 made their first contribution in PR #19784
  • @nina-hu made their first contribution in PR #20472
  • @swayambhu94 made their first contribution in PR #20469
  • @ssadedin made their first contribution in PR #20566

Full Changelog

v1.81.6-nightly...v1.81.9