Skip to main content

v1.85.0 - Realtime GA, MCP Gateway Expansion & Hardened Multi-Tenancy

Deploy this version​

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.85.0

Key Highlights​

  • OpenAI Realtime GA β€” first-class support for the GA OpenAI Realtime API (plus beta compatibility), including gpt-realtime-2 pricing and /openai/v1/realtime logging.
  • Hardened multi-tenancy β€” a large sweep of per-tenant scoping fixes across keys, projects, batches, files, MCP servers, and analytics endpoints (project-hijack/key-org isolation, service-account resource isolation, per-entity team/agent activity scoping).
  • MCP Gateway expansion β€” org-level MCP server/toolset permissions, OBO (on-behalf-of) MCP auth, delegate_auth_to_upstream PKCE passthrough, and MCP access-group name namespacing.
  • Observability overhaul β€” broad Prometheus fixes (label-count correctness, end-user cardinality cap, PromQL escaping), OTEL handler isolation + GenAI message-content capture, and decoupled S3 audit-log config.
  • New models β€” xAI grok-4.3 / grok-4.3-latest, OpenAI gpt-realtime-2, OpenRouter qwen/qwen3.6-plus, SambaNova MiniMax-M2.7, and Bedrock Z.AI GLM-5.

New Models / Updated Models​

New Model Support (5 new models)​

ProviderModelContext WindowInput ($/1M tokens)Output ($/1M tokens)Features
OpenAIgpt-realtime-232K$4.00 (audio in $32.00)$16.00 (audio out $64.00)Realtime (/v1/realtime), audio in/out, function calling, parallel tool calls
xAIxai/grok-4.31M$1.25 (>200K: $2.50)$2.50 (>200K: $5.00)Reasoning, vision, prompt caching, response schema, web search, tool calling
xAIxai/grok-4.3-latest1M$1.25 (>200K: $2.50)$2.50 (>200K: $5.00)Reasoning, vision, prompt caching, response schema, web search, tool calling
OpenRouteropenrouter/qwen/qwen3.6-plus1M$0.325$1.95Reasoning, vision, function calling, tool choice
SambaNovasambanova/MiniMax-M2.7204.8K$0.30$1.20Reasoning, function calling, tool choice

Pricing/metadata also updated for existing entries: Gemini multimodal-embedding pricing repointed to the Vertex pricing source with image/audio/video per-unit costs, audio-token cost reductions on realtime/Gemini entries, and a gemini-embedding-2-preview cost alignment.

Features​

  • Anthropic
    • Forward output_config.effort, reject garbage reasoning_effort with 400, and omit thinking/output_config when reasoning_effort="none" - PR #27074, PR #27039
    • Add Bedrock Claude Platform route - PR #27678
    • Inject dummy tool without modify_params - PR #27620
  • Bedrock
    • Add Z.AI GLM-5 model support - PR #24338
    • Handle document content blocks in Converse API message conversion - PR #24644
    • Refactor response stream shape handling - PR #27257
  • Vertex AI
    • Model Garden OpenAPI support for publisher model IDs - PR #26076
    • Omit system_instruction/tools/toolConfig when cachedContent set - PR #26077
  • Gemini
    • Follow provider defaults for Gemini 3 thinking - PR #25764
    • Handle Gemini Files API URIs without fetching - PR #24922
    • Normalize response_schema on native generateContent - PR #27775
  • xAI
    • Add parallel_tool_calls to supported params - PR #25106
  • Azure
    • Authenticate to Azure with a token - PR #27556
    • Azure Sentinel audit-log support - PR #27280
  • General
    • gpt-5.5 reasoning-effort capability flags + supports_low_reasoning_effort - PR #26456
    • Match litellm.completion supported params with proxy model info - PR #27720

Bug Fixes​

  • OpenRouter
    • Strip openrouter/ prefix from model names - PR #24282
  • Azure
    • Forward api_version to aembedding() for Azure AI Foundry v1 endpoints - PR #24911
    • Route Azure container file requests by decoded deployment - PR #26402
  • Anthropic / Vertex
    • Fix Vertex Anthropic streaming status-error hangs - PR #27310
    • Fix Anthropic streaming reasoning token usage - PR #27319
  • Fireworks AI
    • Strip thinking_blocks from chat messages before the Fireworks API call - PR #27881
  • hosted vLLM
    • Normalize custom tools for chat completions - PR #25763
  • General
    • Decode unified file_id when model_file_id_mapping is unavailable - PR #27406
    • Pass output_config through to backends that accept it - PR #26439
    • Resolve provider from deployment for multi-provider default config - PR #27517
    • Return 503 from /health when the targeted model is unhealthy or DB is disconnected - PR #27003
    • Guard URL-valued model destinations and align resource-model auth checks - PR #26915, PR #26963

LLM API Endpoints​

Features​

Bugs​

  • General
    • Preserve compact_20260112 context management on Bedrock /v1/messages - PR #27534
    • Fix managed file model_mappings when the router resolves a single deployment dict - PR #26950
    • Omit model from Azure deployment image-gen / image-edit bodies - PR #27103
    • Fix Bedrock passthrough call-ID headers - PR #27412
    • Pin Responses API affinity to the Azure resource on model-group switch - PR #27703
    • Align vertex_ai/gemini-embedding-2-preview cost with Vertex multimodal pricing - PR #27848
    • Consolidate batch + dynamic limiter check/increment - PR #26954

Management Endpoints / UI​

Features​

  • Virtual Keys
    • Bulk key updates for a team - PR #26468
    • Rename "Default" key type to "Full Access" and reorder dropdown - PR #27218
    • Add Expires to key Overview header; merge User into one field - PR #27696
  • Teams & Models
    • Search teams by team ID alongside name - PR #27684
    • Add a "Your Usage" view for admin users on the usage page - PR #26746
    • Add Vertex AI Search as a vector-store provider in the UI - PR #27790
    • "Last Minute" quick-select on the Logs time range - PR #27446
    • Add missing Z.AI (zai) provider to the Add-Model dropdown - PR #26419
  • SSO / Auth
    • JWT scope + wildcard support for routing overrides, issuer verification with unscoped warning - PR #26325, PR #27008
    • Grafana Cloud Pyroscope authentication - PR #26902
    • Show full IdP claims in /sso/debug/callback - PR #27498

Bugs β€” access scoping & correctness​

  • Multi-tenancy isolation
    • Scope project, key-org, team, and agent-activity lookups per entity; reject user_id=None on non-admin analytics; re-validate user_id after /user/info re-parses query - PR #27011, PR #27014, PR #26929, PR #27009
    • Constrain cloud-storage file paths and batch-file model access - PR #27019, PR #27015
    • Isolate managed resources for service-account API keys - PR #27004
    • Tighten resource-ownership checks and sensitive public-endpoint guards - PR #26951, PR #26912
  • Authorization hardening
    • Block missing write routes for proxy admin viewers; restore admin-viewer read parity on Logs + Settings - PR #27007, PR #26846
    • Encode upstream URL path identifiers; require a trusted proxy for header-identity auth - PR #26860, PR #26825
    • Bind generic SSO state to a session cookie; allow non-admin compliance-path reads - PR #26944, PR #27234
  • Keys / Teams / SCIM
    • Honor key access_group_ids when a team restricts models; resolve access-group names in team filtering and same-name deployment routing - PR #26275, PR #25224, PR #26161
    • Revoke virtual keys when SCIM deprovisions a user; fix SCIM user-lookup filters - PR #26861, PR #27308
    • Key-rotation bug fix; honor team_member_permissions on /key/list - PR #27756, PR #27026
    • /config/update targeted per-section writes (drop store_model_in_db gate) - PR #26643
    • Scope CLI stored token to base_url; redact Gemini API key from URL query params in error traces - PR #26945, PR #24943
  • UI fixes
    • Remove the insecure ?token= URL handler from the login page; clear admin session cookies before establishing an invited user's session; URL-encode team_id in teamInfoCall - PR #26924, PR #27227, PR #27466
    • Project dropdown empty for internal users (3 bugs); remove blank leading entry from access-group model dropdown; omit allowed_routes from key edit save when unchanged - PR #26664, PR #27521, PR #27553
    • Member/team access-group fix; team model test-connection authorization - PR #27317, PR #27487

AI Integrations​

Logging​

  • Prometheus
    • Fix custom-metadata label counts, cap end-user metric cardinality, fix remaining-metric zero values, escape api_key for PromQL string literals, emit litellm_remaining_tokens_metric for Bedrock & Vertex - PR #27268, PR #27272, PR #27348, PR #27013, PR #27705
    • Fix /metrics hang when require_auth_for_metrics_endpoint is true and auth succeeds; point /metrics 401 at the opt-out flag; fix metric labels for litellm-side rejects - PR #25980, PR #27502, PR #26947
  • OpenTelemetry
    • Isolate dual OTEL handlers; honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT; fix proxy-integration tracing bugs - PR #27018, PR #27403, PR #27757
  • Arize / LangSmith
    • Arize _set_usage_outputs handles raw OpenAI Pydantic CompletionUsage; remove unwanted metadata info from LangSmith - PR #26506, PR #26894
  • General
    • Decouple S3 audit-log config via s3_audit_callback_params - PR #27222
    • Set verbose_logger level when LITELLM_LOG=INFO; require a team-management role on /team/{id}/callback; close callback-config and observability-credential side channels; guard dynamic integration hosts - PR #26401, PR #26819, PR #27081, PR #26921

Guardrails​

  • General
    • Add Qohash Nexus guardrail hook - PR #24927
    • Run model-level post_call guardrails on streaming requests; ensure post-call guardrail fires exactly once - PR #26922, PR #27012, PR #26109
    • Preserve Responses event streams in Presidio output masking - PR #26878
    • Cover multimodal + Responses-API content shapes; tighten tool-permission checks; optional skip of tool message in unified guardrail inputs - PR #26957, PR #26969, PR #27441
    • Handle legacy dict shape for metadata.guardrails in the Team UI - PR #27224

Prompt Management​

  • General
    • Block path-traversal in BitBucket / Arize Phoenix / AssemblyAI clients; sandbox jinja2 in the GitLab/Arize/BitBucket prompt managers - PR #26943, PR #27043

Secret Managers​

  • General
    • Audit-log /cache/settings and /config_overrides/hashicorp_vault mutations - PR #26953

Spend Tracking, Budgets and Rate Limiting​

  • Rate Limiting
    • Atomic TPM rate limit; include model name + configured TPM/RPM in priority rate-limit 429 errors - PR #27001, PR #27216
    • Load team-member RPM/TPM from membership budget in the combined view - PR #24925
  • Budgets
    • Skip the personal-budget hook when a reservation covers the counter - PR #27021
    • Treat 0 team_member_budget as no cap; enforce team-member budget without a user row; reset org/tag/proxy budgets correctly - PR #27133, PR #27273, PR #27326, PR #27488
    • Flush virtual-key model_max budget spend to Redis after success logging; tighten budget spend admission - PR #27334, PR #26845
  • Tag Budgets & Routing
    • Enforce tag budgets on x-litellm-tags header requests; tag-budget reset drops stale management-cache entries; union x-litellm-tags with static team/key tags; fix internal tag-usage scoping; always merge caller-supplied tags into request metadata - PR #27573, PR #27568, PR #27247, PR #27315, PR #27784
    • Tag-routing test preventing header-regex bypass for strict plain-text tags - PR #26805
  • Spend Logs / Cost
    • Pass service_tier through Azure and Azure AI cost calculation - PR #24926
    • Opt-in suppression of stack traces in spend-tracking error logs; keep spend-log cleanup running after batch failures; redact echoed prompts in error_information; prevent secret_fields from leaking into spend logs; drop client-supplied pricing fields from request bodies - PR #26899, PR #27303, PR #27689, PR #27143, PR #27071

MCP Gateway​

  • Features
    • Org-level MCP server and toolset permissions - PR #26960
    • OBO (on-behalf-of) MCP auth - PR #27421
    • delegate_auth_to_upstream flag for PKCE passthrough - PR #27834
    • Support MCP access-group names in URL-based namespacing - PR #27726
  • Bugs
    • Sanitize tool names to Anthropic's [a-zA-Z0-9_-]{1,128} pattern - PR #26788
    • Require a trusted-proxy gate before honoring X-Forwarded-* on OAuth discovery; preserve oauth2 m2m auth for tools routes; run pre_call_tool_check on the OpenAPI/local-registry path - PR #26841, PR #26871, PR #27016
    • Redact MCP server URL/headers for non-admin viewers; replace user-API-key auth with authorization-or-cookie for MCP server creation - PR #27027, PR #27190
    • Fix MCP DB reload partial failures; surface upstream 401 for token-forwarding MCP servers - PR #27314, PR #27847

Performance / Loadbalancing / Reliability improvements​

  • Routing & Reliability
    • Trigger fallbacks on mid-stream httpx.TimeoutException - PR #26998
    • Register cooldowns on failure + fail fast on stale encrypted_content (Responses) - PR #27820
    • Register model info under the responses/-stripped variant - PR #27531
    • Fix Redis Sentinel client handling for authenticated Sentinel setups - PR #26302
  • Proxy hot path
    • Token-verification query optimization - PR #26202
    • Run daily activity aggregation off the event loop - PR #27264
    • Shared IAM cache + static credentials in BaseAWSLLM - PR #27125
    • Isolate semantic cache entries; stable Redis key generation across working directories; remove a duplicate in-memory cache-size constant - PR #26990, PR #27025, PR #26385
    • Early proxy request-size enforcement; coerce non-str x-litellm-* header values to avoid an httpx TypeError - PR #27311, PR #27504
    • Separate DB read and write endpoints - PR #27493
  • Health checks
    • Shared health-check polling; health_check_reasoning_effort for model health checks; skip disable_background_health_check models on GET /health; scope /health response to the caller's models; remove the separate health app - PR #26434, PR #27115, PR #27716, PR #26935, PR #27430
  • Config / startup robustness
    • Hot-reload config YAML when --reload is set; break the managed-resources import cycle on Python 3.13; reject bare-str file-input sinks (local-file read hardening) - PR #27274, PR #27160, PR #27762
  • Packaging / Docker / Helm / CI
    • Pin Wolfi & uv to multi-arch index digests; remove the hardcoded Prisma binary target for multi-arch builds; clear flagged OS-package advisories on the Docker image; refresh dependency locks - PR #27123, PR #27170, PR #27225, PR #27126
    • Helm: skip startup prisma db push when a migrations Job is enabled; increase default probe timeouts, disable debug logging by default - PR #27200, PR #27237
    • CI: Rerun Failed Tests for all pytest jobs, block PRs that drop coverage, Redis-backed VCR replay caches, reduce cassette bloat, mutation-testing workflow, dev-tag detection in the release workflow, Playwright apt-install skip - PR #27155, PR #27340, PR #26838, PR #27159, PR #27409, PR #27576, PR #26966, PR #27169
    • Remove legacy deployment artifacts and litellm-js packages; remove a redundant backup pricing file; misc test/import cleanup - PR #27541, PR #16590, PR #27699, PR #27633
    • Tighten router-settings-override and mock-testing trust; drop blank-text fallback for empty Bedrock Converse thinking blocks - PR #26968, PR #27850

Documentation Updates​

  • Update the Greptile README logo to a higher-quality image - PR #25385
  • Add a BudgetManager.reset_cost docstring - PR #27867
  • Add a _LoopWrapper class docstring - PR #27870

New Contributors​

  • @kimimgo made their first contribution in #24282
  • @shubham-arora-clear made their first contribution in #24644
  • @ohnoah made their first contribution in #24580
  • @ushiromiya-lion made their first contribution in #25106
  • @gowtham2809 made their first contribution in #25224
  • @he-yufeng made their first contribution in #26401
  • @MackDing made their first contribution in #26419
  • @dgu1-godaddy made their first contribution in #26834
  • @Vedanshu7 made their first contribution in #24943
  • @dennishenry made their first contribution in #27190
  • @SHARP155 made their first contribution in #27466
  • @mats852 made their first contribution in #24927

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.84.0...v1.85.0


05/16/2026 (v1.85.0)​

Counts cover PRs new in v1.85.0 relative to v1.84.0 stable. 14 PRs that were backported into v1.84.0 stable (and documented in the v1.84.0 release notes) are excluded here to avoid double-counting.

  • New Models / Updated Models: 43
  • LLM API Endpoints: 24
  • Management Endpoints / UI: 54
  • AI Integrations (Logging / Guardrails / Prompt Mgmt / Secret Managers): 32
  • Spend Tracking, Budgets and Rate Limiting: 23
  • MCP Gateway: 12
  • Performance / Loadbalancing / Reliability improvements: 41
  • Documentation Updates: 3

Total: 232 PRs