
v1.91.0rc1 - MCP OAuth v2, Rust OCR Gateway & Realtime Performance

Deploy this version

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:1.91.0-rc.1

Key Highlights

v1.91.0rc1 is the current release candidate for 1.91.0.

  • MCP Gateway OAuth 2.0 v2 resolver - a new shared OAuth token foundation with cross-replica single-flight refresh, an outbound-credentials package with typed results, and the first authorization_code migration onto the v2 resolver.
  • Rust OCR gateway - a new LiteLLM Rust workspace ships an async-first Mistral OCR bridge, packaged directly into the LiteLLM wheel, alongside an experimental Axum-based realtime AI gateway.
  • Realtime API performance - upstream connection-pool pre-warming and client-disconnect cancellation cut session-establishment latency and stop wasted upstream work.
  • Least-privilege MCP defaults - team keys can opt in to a least-privilege default for MCP access, a key can be scoped to zero MCP servers via a sentinel, and client-IP resolution is hardened with a trusted X-Forwarded-For hop count.
  • ~48 new models - a large Cloudflare Workers AI batch, Gemini 3 image models, Mistral Medium 3.5 / OCR 3 & 4, GLM/zai, SambaNova, and AI/ML image models.
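The trusted X-Forwarded-For hop count mentioned in the highlights follows a common reverse-proxy pattern. The sketch below shows the general idea only; it is not LiteLLM's implementation, and the function name `resolve_client_ip` and its parameters are hypothetical. It assumes each trusted proxy appends the address it received the request from, so only the right-most entries can be believed.

```python
from ipaddress import ip_address
from typing import Optional


def resolve_client_ip(peer_ip: str, xff_header: Optional[str], num_trusted_hops: int) -> str:
    """Pick the client IP while trusting only the last `num_trusted_hops` proxies.

    Hypothetical helper for illustration; the real setting may behave differently.
    """
    if num_trusted_hops <= 0 or not xff_header:
        # No trusted proxies (or no header): the header is client-controlled, so ignore it.
        return peer_ip

    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    # The chain as the server sees it: header entries followed by the socket peer.
    chain = hops + [peer_ip]
    # Skip the trusted proxies counting from the right; clamp at the left edge.
    index = max(len(chain) - 1 - num_trusted_hops, 0)
    candidate = chain[index]
    try:
        # Normalize and validate; a malformed entry falls back to the socket peer.
        return str(ip_address(candidate))
    except ValueError:
        return peer_ip
```

With one trusted hop, a header of `6.6.6.6, 203.0.113.7` arriving from proxy `10.0.0.1` resolves to `203.0.113.7`: the left-most entry was supplied by the client and is never consulted.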

New Providers and Endpoints

New Providers (2 new providers)

| Provider | Supported LiteLLM Endpoints | Description |
| --- | --- | --- |
| Amazon Bedrock Mantle (bedrock_mantle) | Chat Completions | Bedrock Mantle support with VPC endpoint routing via api_base, surfaced as its own Add Model provider - PR #31034, PR #31141 |
| OpenSandbox (opensandbox) | Sandbox / code interpreter | New sandbox provider for the code-interpreter loop - PR #31024 |

New LLM API Endpoints

| Capability | Description | Documentation |
| --- | --- | --- |
| Rust OCR (Mistral) | A new LiteLLM Rust workspace ships an async-first Mistral OCR bridge, packaged into the LiteLLM wheel - PR #31033, PR #31253, PR #31267 | OCR |
| Code interpreter | Sandbox code-interpreter interceptor on the Responses API and a chat-completions code-interpreter loop - PR #30905, PR #31027 | Sandbox |

New Models / Updated Models

New Model Support (~48 new models)

| Provider | Model | Context | Input ($/1M) | Output ($/1M) | Features |
| --- | --- | --- | --- | --- | --- |
| Gemini / Vertex AI | gemini-3-pro-image, gemini-3.1-flash-image (+ gemini/, vertex_ai/ variants) | 1M | per-image | per-image | Image generation, GA pricing |
| AI/ML | aiml/openai/gpt-image-2 | - | per-image | per-image | Image generation |
| Cloudflare Workers AI | ~28 text-generation models (Llama 3.x/4, Qwen 2.5/3/QwQ, GLM 4.7/5.2, Kimi K2.6/K2.7, gpt-oss 20b/120b, Gemma, Granite, Nemotron, DeepSeek-R1 distill, Mistral, Llama Guard) | varies | varies | varies | Native Workers AI via OpenAI-compatible endpoint |
| Mistral | mistral-medium-2508, mistral-medium-2604, mistral-medium-latest (Medium 3.5), mistral-ocr-2512 (OCR 3), mistral-ocr-4-0 (OCR 4) | varies | varies | varies | Chat, OCR |
| SambaNova | sambanova/DeepSeek-V3.2, sambanova/gemma-4-31B-it | varies | varies | varies | Chat |
| zai / OpenRouter | zai/glm-4.7-flash, zai/glm-5.1, openrouter/z-ai/glm-5.1 | varies | varies | varies | Chat |
| Bedrock | amazon.titan-embed-g1-text-02 | - | embedding | - | Embeddings |
| Darkbloom | darkbloom/gemma-4-26b, darkbloom/gpt-oss-20b | varies | varies | varies | Chat |

Exact per-model context windows and prices are in model_prices_and_context_window.json.

Features

  • Fireworks AI
    • Sync chat completions endpoint with the full Fireworks API surface - PR #30885
  • Cloudflare
    • Add current Workers AI text-generation models to the cost map - PR #31051
    • Route the native Workers AI provider through the OpenAI-compatible endpoint - PR #31053
  • Mistral
    • Support Mistral OCR 4 (mistral-ocr-4-0) - PR #31353
    • Add mistral/mistral-ocr-2512 (OCR 3) to the cost map - PR #31463
    • Retarget mistral-medium-latest to Medium 3.5 and add date-pinned aliases - PR #31373
  • AI/ML
    • Add the openai/gpt-image-2 image model - PR #31323
  • Rerank
    • Rerank transformation refresh across ~15 providers (Cohere v1/v2, Voyage, Jina, Vertex, Bedrock, Hugging Face, hosted vLLM, DashScope, DeepInfra, NVIDIA NIM, Fireworks, Watsonx) - PR #31185
  • DeepSeek / GitHub Copilot / Moonshot

Bug Fixes

  • Anthropic
    • Sanitize tool_use ids on the native /v1/messages path - PR #31094
    • Drop the unsupported speed param under drop_params - PR #31152
    • Normalize the Messages system role and adaptive-thinking for Claude Invoke - PR #31364
  • Bedrock
    • Only expand config-sourced AWS credential references - PR #30867
    • Prevent key-level metadata.tags from leaking into the Bedrock passthrough body - PR #30985
    • Surface web-identity token aud/iss on InvalidIdentityToken - PR #31412
  • Vertex AI
    • Prevent a stale Vertex bearer token from causing a /v1/messages 401 after token expiry - PR #31276
    • Append the rawPredict suffix for a custom api_base - PR #31529

LLM API Endpoints

Features

  • Responses API
    • Code-interpreter interceptor (sandbox) on the Responses API - PR #30905
    • Chat-completions code-interpreter loop - PR #31027
  • Realtime API
    • Add an OpenAI realtime translation layer to litellm-rust (1/2) - PR #31129
    • Add a minimal Rust router + Axum AI-gateway calling router.realtime (2/2) - PR #31135
  • OCR
  • Batches
    • Stream OpenAI to Vertex batch JSONL uploads - PR #31036
  • Pass-through
    • Forward all multipart files with repeated field names - PR #31391
    • Schedule spend logging via the durable logging worker - PR #31485
  • Web Search
    • Sync tool_choice when converting web_search tools - PR #31375
    • Wrap the agentic-loop response in a fake stream for streaming requests - PR #31484

Bugs

  • Realtime API
    • Fix post-tool-call function_response id omission - PR #30446
    • Stop revalidating realtime events at the logging boundary - PR #31054
  • General
    • Skip the model override when the response has no model field - PR #31183
    • Recover cost on interrupted and agentic Anthropic streams - PR #31035

Management Endpoints / UI

Features

  • Virtual Keys & Teams
    • Scope team BYOK models by key team_id in /model/info - PR #31009
    • Restore wildcard expansion in /v1/model/info - PR #31444
    • Expand the all-proxy-models sentinel in direct-access lookup - PR #31153
    • Persist budget_duration on /team/member_add member budgets - PR #31443
    • Persist budget-window deletion on virtual keys - PR #31107
  • SCIM
    • Ingest enterprise-extension attributes into user metadata - PR #30893
    • Drive the global proxy role from a SCIM admin group - PR #30895
  • Proxy CLI / Auth
    • Mint a per-session agent credential on lite login - PR #31072
  • Config & Plugins
    • LiteLLM plugin architecture v2 - PR #30688
    • Persist the global retry_policy via /config/update - PR #29540
    • Tighten role-based visibility of config and MCP fields - PR #30587
  • UI
    • Show an agent's attached virtual key in the UI - PR #29619
    • Add Amazon Bedrock Mantle to the Add Model provider dropdown - PR #31034
    • Clarify OpenAI-compatible provider dropdown labels (chat vs legacy completions) - PR #31046
    • Render logos under a custom server_root_path - PR #31156

Bugs

  • UI
    • Keep team Organization optional for proxy admins in single-org setups - PR #30861
    • Stop per-model usage export from duplicating user spend across models - PR #30980
    • Resolve user_id to email in the Spend Per User usage chart - PR #30992
    • Label the request-logs column "Key Alias" to match the filter - PR #31037
    • Stop listing bedrock_mantle models under the Bedrock provider - PR #31478
  • Auth & Management
    • Resolve caller identity once into a Principal at the auth seam - PR #30887
    • Cache the auth-path team object under the canonical team_id key - PR #31418
    • Honor user_api_key_cache_ttl for management-object cache writes - PR #31504
    • Reject model_list in the proxy body and gate advisor client credentials - PR #30585
    • Redact the API key from key/info client error messages - PR #31342
    • Stop double-decrypting email/slack alerting env vars in get_config - PR #31117
    • Serialize team budget_limits to JSON in jsonify_team_object - PR #31045
    • Block a server credential leak to a caller-supplied api_base - PR #30682

AI Integrations

Logging

  • Prometheus
    • Add a requested_model label to spend and request metrics - PR #31410
    • Add a per-team litellm_team_members_metric gauge - PR #31506
  • OpenTelemetry
    • Resolve the LITELLM_OTEL_V2 flag once instead of rebuilding settings per call - PR #30989
    • Use a hashable scope for _emit_once when guardrail_mode is a list - PR #31262
    • Point the AgentOps OTLP exporter at otlp.agentops.ai - PR #31490
  • General
    • Add POST /v1/callbacks/logs to replay logging payloads through callbacks - PR #31134

Guardrails

  • Bedrock Guardrails
    • Select the latest user message by original role in apply_guardrail - PR #30482
  • General
    • Add a headroom guardrail for message compression - PR #31407
    • Instrument during-call and post-call guardrail latency - PR #31414
    • Match the policy-pipeline block response to a direct guardrail attachment - PR #31421
    • Make the Generic Guardrail resilient to built-in tools and errors - PR #31461

Spend Tracking, Budgets and Rate Limiting

  • Cost tracking
    • Store litellm_call_id on spend logs for DB-to-trace correlation - PR #31344
    • Preserve Anthropic server_tool_use web-search usage in cost tracking - PR #31355
    • Restore per-query Gemini 3.x web-search billing - PR #31363
    • Preserve Gemini Embedding 2 usageMetadata for cost tracking - PR #31354
    • Restrict the regional processing uplift to the gpt-5.4/5.5 series only - PR #31136
    • Isolate all per-deployment pricing overrides from sibling deployments - PR #31021
  • Spend UI and endpoints
    • Fold the logs-tab total into the page query to avoid a separate COUNT(*) - PR #31423
    • Spend-management endpoint and OpenAI image-generation cost-calculator updates - PR #31185

MCP Gateway

  • OAuth 2.0 v2 resolver
    • Shared OAuth token foundation: challenge, store seam, expiry-aware cache, single-flight refresh - PR #31275
    • Scaffold the outbound_credentials package with a typed Result - PR #31047
    • Add a resolve_credentials dispatch skeleton - PR #31056
    • Graft the v2 resolver onto _create_mcp_client (none + api_key static family) - PR #31058
    • Migrate authorization_code MCP to the v2 resolver (single-replica) [1/2] - PR #31473
    • Cross-replica single-flight refresh for the v2 per-user OAuth store [2/2] - PR #31493
    • Challenge delegate-auth OAuth servers with upstream resource_metadata - PR #31255
  • Access control
    • Opt-in least-privilege default for team-key MCP access - PR #31380
    • Scope a key to zero MCP servers with a no-mcp-servers sentinel - PR #31029
    • Allow llm_api_routes virtual keys to list MCP tools via /v1/mcp/tools - PR #31031
    • Let proxy admins assign MCP servers to teamless keys - PR #31126
    • Resolve config-defined servers in per-user credential and env-var endpoints - PR #31171
  • X-Forwarded-For hardening
    • Add mcp_xff_num_trusted_hops to harden X-Forwarded-For client-IP resolution - PR #31257
    • Correct the misleading no-trusted-proxy warning for XFF access control - PR #31264
    • Warn loudly when X-Forwarded-For is present but use_x_forwarded_for is off - PR #31266
  • Bug fixes
    • Stop exposing MCP server URLs on the AI Hub and public hub API - PR #30902
    • Stop auth failures on the /mcp path surfacing as cancelled tool calls - PR #31011
    • Resolve toolset tools by the server's known prefix - PR #31254
    • Stop logging tool-call input in the MCP client - PR #31393
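The single-flight refresh listed under the OAuth 2.0 v2 resolver is a general concurrency pattern: when several requests find the same token expired, only one performs the refresh and the rest wait for its result. The sketch below is an in-process illustration with asyncio, not the resolver's actual code; `SingleFlight`, `refresh_token`, and the key format are hypothetical. A cross-replica variant would additionally need a shared lock or store, which is out of scope here.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class SingleFlight:
    """Coalesce concurrent calls for the same key into one in-flight task."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Task[str]"] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None or task.done():
            # The first caller becomes the leader and starts the refresh.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task

            def _cleanup(finished: "asyncio.Task[str]") -> None:
                # Remove the entry only if it still points at this task.
                if self._inflight.get(key) is finished:
                    del self._inflight[key]

            task.add_done_callback(_cleanup)
        # Shield so that one cancelled waiter does not cancel the shared refresh.
        return await asyncio.shield(task)


async def demo() -> Tuple[int, List[str]]:
    calls = 0

    async def refresh_token() -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for the token endpoint round trip
        return f"token-{calls}"

    sf = SingleFlight()
    # Five concurrent callers ask for the same (user, server) credential.
    tokens = await asyncio.gather(
        *(sf.do("user-1:server-a", refresh_token) for _ in range(5))
    )
    return calls, list(tokens)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, all five callers share one refresh call and receive the same token, which is the property that keeps a token endpoint from being hit once per waiting request.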

Performance / Loadbalancing / Reliability improvements

  • Streaming and realtime
    • Pre-warm the upstream realtime connection pool to cut session-establishment latency - PR #31163
    • Cancel the upstream LLM stream when the client disconnects during time-to-first-token - PR #31499
    • Word-sliced cache replay for stream=true cache hits - PR #30216
    • Stop the O(n^2) re-parse of accumulated Gemini stream JSON - PR #31297
    • Surface a clean RateLimitError on a mid-stream 429 with no fallbacks - PR #31298
  • Router and timeouts
    • Honor litellm_settings.request_timeout as an independent per-attempt timeout - PR #31119
    • Guard num_retries=None in async_function_with_retries - PR #30036
  • Caching and proxy
    • Apply the Redis namespace to all key operations - PR #31288
    • Loop-scope async Lua script registration - PR #31501
    • Memoize _get_all_llm_api_params, which was previously rebuilt on every request - PR #31430
    • Precompute service-tier cost-key suffixes - PR #31431
    • Bound event-loop blocking from oversized requests - PR #31497
    • Stop the pass-through route registry growing on every reload - PR #31314
    • Strip NUL bytes in safe_dumps only when present - PR #31424
    • Semantic-caching (Redis/Qdrant) and embedding-router updates - PR #31305
  • Supply chain and build
    • Bump osv-flagged dependencies to clear known CVEs - PR #31122
    • Bump the wolfi-base digest to patch openssl CVE-2026-34182 - PR #31133
    • Add a Grype image scan for OS and library CVEs - PR #31151
    • Harden cargo fetches during maturin builds - PR #31348
    • Build the Admin UI from source in a build-platform-pinned stage - PR #31130
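As one reading of the word-sliced cache replay item above, a cached completion can be replayed to a stream=true client as a sequence of word-sized deltas instead of one large chunk. The sketch below assumes that interpretation; `replay_in_word_slices` and `words_per_chunk` are hypothetical names, and the actual slicing rules in LiteLLM may differ.

```python
import re
from typing import Iterator


def replay_in_word_slices(cached_text: str, words_per_chunk: int = 1) -> Iterator[str]:
    """Yield a cached completion as word-sized deltas.

    Joining the yielded chunks reproduces the cached text exactly.
    """
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be >= 1")
    # A token is a word plus its trailing whitespace; leading whitespace
    # (which has no preceding word) becomes a token of its own.
    tokens = re.findall(r"\S+\s*|\s+", cached_text)
    for start in range(0, len(tokens), words_per_chunk):
        yield "".join(tokens[start:start + words_per_chunk])


if __name__ == "__main__":
    for delta in replay_in_word_slices("Hello, streaming  world!\n"):
        print(repr(delta))
```

Keeping whitespace attached to the preceding word means a client that concatenates deltas sees the same text as a non-streaming cache hit.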

Documentation Updates

  • Add MCP server change guidelines - PR #31038

New Contributors

This release candidate contains changes from existing maintainers only; there are no new contributors in this window.

Full Changelog

v1.90.0...v1.91.0-rc.1