v1.85.0 - Realtime GA, MCP Gateway Expansion & Hardened Multi-Tenancy
Deploy this version
Docker:
```bash
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:1.85.0
```
Pip:
```bash
pip install litellm==1.85.0
```
Key Highlights
- OpenAI Realtime GA: first-class support for the GA OpenAI Realtime API (plus beta compatibility), including `gpt-realtime-2` pricing and `/openai/v1/realtime` logging.
- Hardened multi-tenancy: a large sweep of per-tenant scoping fixes across keys, projects, batches, files, MCP servers, and analytics endpoints (project-hijack/key-org isolation, service-account resource isolation, per-entity team/agent activity scoping).
- MCP Gateway expansion: org-level MCP server/toolset permissions, OBO (on-behalf-of) MCP auth, `delegate_auth_to_upstream` PKCE passthrough, and MCP access-group name namespacing.
- Observability overhaul: broad Prometheus fixes (label-count correctness, end-user cardinality cap, PromQL escaping), OTEL handler isolation + GenAI message-content capture, and decoupled S3 audit-log config.
- New models: xAI `grok-4.3`/`grok-4.3-latest`, OpenAI `gpt-realtime-2`, OpenRouter `qwen/qwen3.6-plus`, SambaNova `MiniMax-M2.7`, and Bedrock Z.AI `GLM-5`.
New Models / Updated Models
New Model Support (5 new models)
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-realtime-2 | 32K | $4.00 (audio in $32.00) | $16.00 (audio out $64.00) | Realtime (/v1/realtime), audio in/out, function calling, parallel tool calls |
| xAI | xai/grok-4.3 | 1M | $1.25 (>200K: $2.50) | $2.50 (>200K: $5.00) | Reasoning, vision, prompt caching, response schema, web search, tool calling |
| xAI | xai/grok-4.3-latest | 1M | $1.25 (>200K: $2.50) | $2.50 (>200K: $5.00) | Reasoning, vision, prompt caching, response schema, web search, tool calling |
| OpenRouter | openrouter/qwen/qwen3.6-plus | 1M | $0.325 | $1.95 | Reasoning, vision, function calling, tool choice |
| SambaNova | sambanova/MiniMax-M2.7 | 204.8K | $0.30 | $1.20 | Reasoning, function calling, tool choice |
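The `>200K` figures for `xai/grok-4.3` describe a second, long-context price tier. The sketch below estimates the cost of one request from the table's per-1M-token prices. It rests on an assumption these notes do not confirm: once input exceeds 200,000 tokens, the higher input and output rates apply to the whole request, not only to the tokens past the threshold. The function name is hypothetical; LiteLLM's own cost calculation remains the authoritative source.

```bash
# Hypothetical helper: estimate the USD cost of one xai/grok-4.3 request
# from the per-1M-token prices in the table above.
# ASSUMPTION: when input tokens exceed 200,000, the ">200K" rates apply to
# the entire request (input and output), not just the overflow tokens.
grok43_cost() {
  local input_tokens=$1 output_tokens=$2
  awk -v in_tok="$input_tokens" -v out_tok="$output_tokens" 'BEGIN {
    in_rate = 1.25; out_rate = 2.50      # $/1M tokens, standard tier
    if (in_tok > 200000) {               # long-context tier
      in_rate = 2.50; out_rate = 5.00
    }
    printf "%.6f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000
  }'
}

grok43_cost 100000 10000   # prints 0.150000 (standard tier)
grok43_cost 300000 10000   # prints 0.800000 (long-context tier)
```

The same arithmetic applies to the other rows by swapping in their rates; rows without a second tier simply skip the threshold check.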
Pricing/metadata also updated for existing entries: Gemini multimodal-embedding pricing repointed to the Vertex pricing source with image/audio/video per-unit costs, audio-token cost reductions on realtime/Gemini entries, and a gemini-embedding-2-preview cost alignment.
- xAI grok-4.3 / grok-4.3-latest metadata - PR #27154, PR #27396
- OpenAI gpt-realtime-2 pricing - PR #27653
- OpenRouter Qwen 3.6 Plus metadata - PR #27486
- New chat model metadata + Bedrock Z.AI GLM-5 - PR #27313, PR #24338
- GPT-4o-Transcribe pricing fix - PR #27875
Features
- Anthropic
- Bedrock
- Vertex AI
- Gemini
- xAI
- Add `parallel_tool_calls` to supported params - PR #25106
- Azure
- General
Bug Fixes
- OpenRouter
- Strip `openrouter/` prefix from model names - PR #24282
- Azure
- Anthropic / Vertex
- Fireworks AI
- Strip `thinking_blocks` from chat messages before the Fireworks API call - PR #27881
- hosted vLLM
- Normalize custom tools for chat completions - PR #25763
- General
- Decode unified `file_id` when `model_file_id_mapping` is unavailable - PR #27406
- Pass `output_config` through to backends that accept it - PR #26439
- Resolve provider from deployment for multi-provider default config - PR #27517
- Return `503` from `/health` when the targeted model is unhealthy or the DB is disconnected - PR #27003
- Guard URL-valued model destinations and align resource-model auth checks - PR #26915, PR #26963
LLM API Endpoints
Features
- Realtime API
- Responses API
- Persist and replay streamed Responses API requests from cache - PR #24580
- Route `gpt-5.4+` chat-without-tools requests to the Responses API - PR #27618
- Preserve `cache_control` in the Responses / Chat Completion transformation - PR #27727
- Normalize chat `tool_choice` for the completions / responses bridge - PR #27634
- Batches
- Embeddings
- Audio Transcription
- Add NVIDIA Riva STT provider - PR #27185
- Vector Stores
Bugs
- General
- Preserve `compact_20260112` context management on Bedrock `/v1/messages` - PR #27534
- Fix managed file `model_mappings` when the router resolves a single deployment dict - PR #26950
- Omit `model` from Azure deployment image-gen / image-edit bodies - PR #27103
- Fix Bedrock passthrough call-ID headers - PR #27412
- Pin Responses API affinity to the Azure resource on model-group switch - PR #27703
- Align `vertex_ai/gemini-embedding-2-preview` cost with Vertex multimodal pricing - PR #27848
- Consolidate batch + dynamic limiter check/increment - PR #26954
Management Endpoints / UI
Features
- Virtual Keys
- Teams & Models
- Search teams by team ID alongside name - PR #27684
- Add a "Your Usage" view for admin users on the usage page - PR #26746
- Add Vertex AI Search as a vector-store provider in the UI - PR #27790
- "Last Minute" quick-select on the Logs time range - PR #27446
- Add missing Z.AI (`zai`) provider to the Add-Model dropdown - PR #26419
- SSO / Auth
Bugs: access scoping & correctness
- Multi-tenancy isolation
- Scope project, key-org, team, and agent-activity lookups per entity; reject `user_id=None` on non-admin analytics; re-validate `user_id` after `/user/info` re-parses the query - PR #27011, PR #27014, PR #26929, PR #27009
- Constrain cloud-storage file paths and batch-file model access - PR #27019, PR #27015
- Isolate managed resources for service-account API keys - PR #27004
- Tighten resource-ownership checks and sensitive public-endpoint guards - PR #26951, PR #26912
- Authorization hardening
- Block missing write routes for proxy admin viewers; restore admin-viewer read parity on Logs + Settings - PR #27007, PR #26846
- Encode upstream URL path identifiers; require a trusted proxy for header-identity auth - PR #26860, PR #26825
- Bind generic SSO state to a session cookie; allow non-admin compliance-path reads - PR #26944, PR #27234
- Keys / Teams / SCIM
- Honor key `access_group_ids` when a team restricts models; resolve access-group names in team filtering and same-name deployment routing - PR #26275, PR #25224, PR #26161
- Revoke virtual keys when SCIM deprovisions a user; fix SCIM user-lookup filters - PR #26861, PR #27308
- Key-rotation bug fix; honor `team_member_permissions` on `/key/list` - PR #27756, PR #27026
- `/config/update` targeted per-section writes (drop `store_model_in_db` gate) - PR #26643
- Scope CLI stored token to `base_url`; redact Gemini API key from URL query params in error traces - PR #26945, PR #24943
- UI fixes
- Remove the insecure `?token=` URL handler from the login page; clear admin session cookies before establishing an invited user's session; URL-encode `team_id` in `teamInfoCall` - PR #26924, PR #27227, PR #27466
- Fix the empty project dropdown for internal users (3 bugs); remove the blank leading entry from the access-group model dropdown; omit `allowed_routes` from key edit save when unchanged - PR #26664, PR #27521, PR #27553
- Member/team access-group fix; team model test-connection authorization - PR #27317, PR #27487
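Two of the fixes above (encoding upstream URL path identifiers, and URL-encoding `team_id` in `teamInfoCall`) apply the same technique: percent-encode an identifier before interpolating it into a URL, so characters such as `/` or `?` cannot change the path or start a query string. The Bash sketch below shows that general technique using the RFC 3986 unreserved set; the function name is hypothetical and this is not the code LiteLLM ships.

```bash
# Hypothetical helper: percent-encode one URL path segment (RFC 3986).
# Bytes in the unreserved set [A-Za-z0-9._~-] pass through; everything else,
# including '/', '?', '=', and spaces, becomes %XX so the value stays one segment.
urlencode_segment() {
  local LC_ALL=C          # iterate over bytes, not multibyte characters
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;   # byte value as two hex digits
    esac
  done
  printf '%s\n' "$out"
}

urlencode_segment 'team a/../admin?x=1'   # prints team%20a%2F..%2Fadmin%3Fx%3D1
```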
AI Integrations
Logging
- Prometheus
- Fix custom-metadata label counts, cap end-user metric cardinality, fix remaining-metric zero values, escape `api_key` for PromQL string literals, emit `litellm_remaining_tokens_metric` for Bedrock & Vertex - PR #27268, PR #27272, PR #27348, PR #27013, PR #27705
- Fix `/metrics` hang when `require_auth_for_metrics_endpoint` is true and auth succeeds; point the `/metrics` 401 at the opt-out flag; fix metric labels for litellm-side rejects - PR #25980, PR #27502, PR #26947
- OpenTelemetry
- Arize / LangSmith
- General
- Decouple S3 audit-log config via `s3_audit_callback_params` - PR #27222
- Set `verbose_logger` level when `LITELLM_LOG=INFO`; require a team-management role on `/team/{id}/callback`; close callback-config and observability-credential side channels; guard dynamic integration hosts - PR #26401, PR #26819, PR #27081, PR #26921
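One of the Prometheus fixes above escapes `api_key` before it is embedded in a PromQL string literal. PromQL follows Go's escaping rules for double-quoted strings, so at minimum backslashes and double quotes must be escaped, and backslashes must go first so the escapes added for quotes are not doubled a second time. The Bash sketch below illustrates this under those assumptions; it handles single-line values only, the function name is hypothetical, and `some_metric` is a placeholder rather than a real LiteLLM metric.

```bash
# Hypothetical helper: escape a value for use inside a double-quoted PromQL
# string literal. Order matters: escape '\' before '"', otherwise the
# backslashes added for quotes would themselves be escaped again.
# Single-line values only (sed processes its input line by line).
promql_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# A key containing '"}' can no longer close the label matcher early.
key='sk-abc"} or vector(1) #'
printf 'some_metric{api_key="%s"}\n' "$(promql_escape "$key")"
```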
Guardrails
- General
- Add Qohash Nexus guardrail hook - PR #24927
- Run model-level `post_call` guardrails on streaming requests; ensure the post-call guardrail fires exactly once - PR #26922, PR #27012, PR #26109
- Preserve Responses event streams in Presidio output masking - PR #26878
- Cover multimodal + Responses-API content shapes; tighten tool-permission checks; optionally skip tool messages in unified guardrail inputs - PR #26957, PR #26969, PR #27441
- Handle legacy dict shape for `metadata.guardrails` in the Team UI - PR #27224
Prompt Management
- General
Secret Managers
- General
- Audit-log `/cache/settings` and `/config_overrides/hashicorp_vault` mutations - PR #26953
Spend Tracking, Budgets and Rate Limiting
- Rate Limiting
- Budgets
- Skip the personal-budget hook when a reservation covers the counter - PR #27021
- Treat a `0` `team_member_budget` as no cap; enforce team-member budget without a user row; reset org/tag/proxy budgets correctly - PR #27133, PR #27273, PR #27326, PR #27488
- Flush virtual-key `model_max` budget spend to Redis after success logging; tighten budget spend admission - PR #27334, PR #26845
- Tag Budgets & Routing
- Enforce tag budgets on `x-litellm-tags` header requests; tag-budget reset drops stale management-cache entries; union `x-litellm-tags` with static team/key tags; fix internal tag-usage scoping; always merge caller-supplied tags into request metadata - PR #27573, PR #27568, PR #27247, PR #27315, PR #27784
- Tag-routing test preventing header-regex bypass for strict plain-text tags - PR #26805
- Spend Logs / Cost
- Pass `service_tier` through Azure and Azure AI cost calculation - PR #24926
- Opt-in suppression of stack traces in spend-tracking error logs; keep spend-log cleanup running after batch failures; redact echoed prompts in `error_information`; prevent `secret_fields` from leaking into spend logs; drop client-supplied pricing fields from request bodies - PR #26899, PR #27303, PR #27689, PR #27143, PR #27071
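The tag changes above union `x-litellm-tags` with a team's or key's static tags. The Bash sketch below shows one way to compute such a union: order-preserving, de-duplicated, with surrounding whitespace trimmed. It assumes tags arrive as comma-separated lists (for example an `x-litellm-tags: prod,team-a` header); the function name is hypothetical, it needs Bash 4+ for associative arrays, and it is not LiteLLM's implementation.

```bash
# Hypothetical helper (Bash 4+): order-preserving union of comma-separated
# tag lists, e.g. request-header tags followed by static team/key tags.
# Whitespace around each tag is trimmed; empty entries and duplicates are dropped.
union_tags() {
  local -A seen=()
  local -a out=() parts=()
  local list tag
  for list in "$@"; do
    IFS=',' read -ra parts <<< "$list"
    for tag in "${parts[@]}"; do
      tag=${tag#"${tag%%[![:space:]]*}"}   # trim leading whitespace
      tag=${tag%"${tag##*[![:space:]]}"}   # trim trailing whitespace
      [[ -z $tag || -n ${seen[$tag]+x} ]] && continue
      seen[$tag]=1
      out+=("$tag")
    done
  done
  local IFS=','
  printf '%s\n' "${out[*]}"                # join with commas
}

union_tags "prod, team-a" "team-a,billing"   # prints prod,team-a,billing
```

Keeping first-seen order makes the merged list deterministic for logging; a set with no ordering would also satisfy "union".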
MCP Gateway
- Features
- Bugs
- Sanitize tool names to Anthropic's `[a-zA-Z0-9_-]{1,128}` pattern - PR #26788
- Require a trusted-proxy gate before honoring `X-Forwarded-*` on OAuth discovery; preserve oauth2 m2m auth for tools routes; run `pre_call_tool_check` on the OpenAPI/local-registry path - PR #26841, PR #26871, PR #27016
- Redact MCP server URL/headers for non-admin viewers; replace user-API-key auth with authorization-or-cookie for MCP server creation - PR #27027, PR #27190
- Fix MCP DB reload partial failures; surface upstream 401 for token-forwarding MCP servers - PR #27314, PR #27847
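PR #26788 sanitizes MCP tool names to the `[a-zA-Z0-9_-]{1,128}` pattern. A straightforward way to do that, sketched below in Bash, is to replace every disallowed character with `_` and truncate to 128 characters. This is an illustration only: the replacement character, the empty-name fallback, and the function name are assumptions, and a real implementation also has to handle two different names collapsing to the same sanitized name.

```bash
# Hypothetical helper: coerce an arbitrary tool name into [a-zA-Z0-9_-]{1,128}.
# ASSUMPTIONS: disallowed characters become '_', and an empty result falls
# back to '_' so the 1-character minimum still holds.
sanitize_tool_name() {
  local LC_ALL=C                       # non-ASCII input is handled byte by byte
  local name=${1//[^a-zA-Z0-9_-]/_}    # replace every disallowed character
  name=${name:0:128}                   # enforce the 128-character maximum
  printf '%s\n' "${name:-_}"
}

sanitize_tool_name "github.create issue/v2"   # prints github_create_issue_v2
```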
Performance / Loadbalancing / Reliability improvements
- Routing & Reliability
- Trigger fallbacks on mid-stream `httpx.TimeoutException` - PR #26998
- Register cooldowns on failure + fail fast on stale `encrypted_content` (Responses) - PR #27820
- Register model info under the `responses/`-stripped variant - PR #27531
- Fix Redis Sentinel client handling for authenticated Sentinel setups - PR #26302
- Proxy hot path
- Token-verification query optimization - PR #26202
- Run daily activity aggregation off the event loop - PR #27264
- Shared IAM cache + static credentials in `BaseAWSLLM` - PR #27125
- Isolate semantic cache entries; stable Redis key generation across working directories; remove a duplicate in-memory cache-size constant - PR #26990, PR #27025, PR #26385
- Early proxy request-size enforcement; coerce non-str `x-litellm-*` header values to avoid an httpx `TypeError` - PR #27311, PR #27504
- Separate DB read and write endpoints - PR #27493
- Health checks
- Config / startup robustness
- Packaging / Docker / Helm / CI
- Pin Wolfi & uv to multi-arch index digests; remove the hardcoded Prisma binary target for multi-arch builds; clear flagged OS-package advisories on the Docker image; refresh dependency locks - PR #27123, PR #27170, PR #27225, PR #27126
- Helm: skip startup `prisma db push` when a migrations Job is enabled; increase default probe timeouts, disable debug logging by default - PR #27200, PR #27237
- CI: Rerun Failed Tests for all pytest jobs, block PRs that drop coverage, Redis-backed VCR replay caches, reduce cassette bloat, mutation-testing workflow, dev-tag detection in the release workflow, Playwright apt-install skip - PR #27155, PR #27340, PR #26838, PR #27159, PR #27409, PR #27576, PR #26966, PR #27169
- Remove legacy deployment artifacts and litellm-js packages; remove a redundant backup pricing file; misc test/import cleanup - PR #27541, PR #16590, PR #27699, PR #27633
- Tighten router-settings-override and mock-testing trust; drop blank-text fallback for empty Bedrock Converse thinking blocks - PR #26968, PR #27850
Documentation Updates
- Update the Greptile README logo to a higher-quality image - PR #25385
- Add a `BudgetManager.reset_cost` docstring - PR #27867
- Add a `_LoopWrapper` class docstring - PR #27870
New Contributors
- @kimimgo made their first contribution in #24282
- @shubham-arora-clear made their first contribution in #24644
- @ohnoah made their first contribution in #24580
- @ushiromiya-lion made their first contribution in #25106
- @gowtham2809 made their first contribution in #25224
- @he-yufeng made their first contribution in #26401
- @MackDing made their first contribution in #26419
- @dgu1-godaddy made their first contribution in #26834
- @Vedanshu7 made their first contribution in #24943
- @dennishenry made their first contribution in #27190
- @SHARP155 made their first contribution in #27466
- @mats852 made their first contribution in #24927
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.84.0...v1.85.0
05/16/2026 (v1.85.0)
Counts cover PRs new in `v1.85.0` relative to `v1.84.0` stable. 14 PRs that were backported into `v1.84.0` stable (and documented in the v1.84.0 release notes) are excluded here to avoid double-counting.
- New Models / Updated Models: 43
- LLM API Endpoints: 24
- Management Endpoints / UI: 54
- AI Integrations (Logging / Guardrails / Prompt Mgmt / Secret Managers): 32
- Spend Tracking, Budgets and Rate Limiting: 23
- MCP Gateway: 12
- Performance / Loadbalancing / Reliability improvements: 41
- Documentation Updates: 3
Total: 232 PRs