v1.91.0rc1 - MCP OAuth v2, Rust OCR Gateway & Realtime Performance
Deploy this version​
- Docker
- Pip
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.91.0-rc.1
pip install litellm==1.91.0rc1
Key Highlights​
v1.91.0rc1 is the current release candidate for 1.91.0.
- MCP Gateway OAuth 2.0 v2 resolver - a new shared OAuth token foundation with cross-replica single-flight refresh, an outbound-credentials package with typed results, and the first authorization_code migration onto the v2 resolver.
- Rust OCR gateway - a new LiteLLM Rust workspace ships an async-first Mistral OCR bridge, packaged directly into the LiteLLM wheel, alongside an experimental Axum-based realtime AI gateway.
- Realtime API performance - upstream connection-pool pre-warming and client-disconnect cancellation cut session-establishment latency and stop wasted upstream work.
- Least-privilege MCP defaults - team keys can now default to least-privilege MCP access, scope to zero MCP servers via a sentinel, and harden client-IP resolution with trusted X-Forwarded-For hop counts.
- ~48 new models - a large Cloudflare Workers AI batch, Gemini 3 image models, Mistral Medium 3.5 / OCR 3 & 4, GLM/zai, SambaNova, and AI/ML image models.
New Providers and Endpoints​
New Providers (2 new providers)​
| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
Amazon Bedrock Mantle (bedrock_mantle) | Chat Completions | Bedrock Mantle support with VPC endpoint routing via api_base, surfaced as its own Add Model provider - PR #31034, PR #31141 |
OpenSandbox (opensandbox) | Sandbox / code interpreter | New sandbox provider for the code-interpreter loop - PR #31024 |
New LLM API Endpoints​
| Capability | Description | Documentation |
|---|---|---|
| Rust OCR (Mistral) | A new LiteLLM Rust workspace ships an async-first Mistral OCR bridge, packaged into the LiteLLM wheel - PR #31033, PR #31253, PR #31267 | OCR |
| Code interpreter | Sandbox code-interpreter interceptor on the Responses API and a chat-completions code-interpreter loop - PR #30905, PR #31027 | Sandbox |
New Models / Updated Models​
New Model Support (~48 new models)​
| Provider | Model | Context | Input ($/1M) | Output ($/1M) | Features |
|---|---|---|---|---|---|
| Gemini / Vertex AI | gemini-3-pro-image, gemini-3.1-flash-image (+ gemini/, vertex_ai/ variants) | 1M | per-image | per-image | Image generation, GA pricing |
| AI/ML | aiml/openai/gpt-image-2 | - | per-image | per-image | Image generation |
| Cloudflare Workers AI | ~28 text-generation models (Llama 3.x/4, Qwen 2.5/3/QwQ, GLM 4.7/5.2, Kimi K2.6/K2.7, gpt-oss 20b/120b, Gemma, Granite, Nemotron, DeepSeek-R1 distill, Mistral, Llama Guard) | varies | varies | varies | Native Workers AI via OpenAI-compatible endpoint |
| Mistral | mistral-medium-2508, mistral-medium-2604, mistral-medium-latest (Medium 3.5), mistral-ocr-2512 (OCR 3), mistral-ocr-4-0 (OCR 4) | varies | varies | varies | Chat, OCR |
| SambaNova | sambanova/DeepSeek-V3.2, sambanova/gemma-4-31B-it | varies | varies | varies | Chat |
| zai / OpenRouter | zai/glm-4.7-flash, zai/glm-5.1, openrouter/z-ai/glm-5.1 | varies | varies | varies | Chat |
| Bedrock | amazon.titan-embed-g1-text-02 | - | embedding | - | Embeddings |
| Darkbloom | darkbloom/gemma-4-26b, darkbloom/gpt-oss-20b | varies | varies | varies | Chat |
Exact per-model context windows and prices are in model_prices_and_context_window.json.
Features​
- Fireworks AI
- Sync chat completions endpoint with the full Fireworks API surface - PR #30885
- Cloudflare
- Mistral
- AI/ML
- Add the
openai/gpt-image-2image model - PR #31323
- Add the
- Rerank
- Rerank transformation refresh across ~15 providers (Cohere v1/v2, Voyage, Jina, Vertex, Bedrock, Hugging Face, hosted vLLM, DashScope, DeepInfra, NVIDIA NIM, Fireworks, Watsonx) - PR #31185
- DeepSeek / GitHub Copilot / Moonshot
- Chat transformation updates - PR #31185
Bug Fixes​
LLM API Endpoints​
Features​
- Responses API
- Realtime API
- OCR
- Batches
- Stream OpenAI to Vertex batch JSONL uploads - PR #31036
- Pass-through
- Web Search
Bugs​
- Realtime API
- General
Management Endpoints / UI​
Features​
- Virtual Keys & Teams
- Scope team BYOK models by key
team_idin/model/info- PR #31009 - Restore wildcard expansion in
/v1/model/info- PR #31444 - Expand the all-proxy-models sentinel in direct-access lookup - PR #31153
- Persist
budget_durationon/team/member_addmember budgets - PR #31443 - Persist budget-window deletion on virtual keys - PR #31107
- Scope team BYOK models by key
- SCIM
- Proxy CLI / Auth
- Mint a per-session agent credential on
lite login- PR #31072
- Mint a per-session agent credential on
- Config & Plugins
- UI
Bugs​
- UI
- Keep team Organization optional for proxy admins in single-org setups - PR #30861
- Stop per-model usage export from duplicating user spend across models - PR #30980
- Resolve
user_idto email in the Spend Per User usage chart - PR #30992 - Label the request-logs column "Key Alias" to match the filter - PR #31037
- Stop listing
bedrock_mantlemodels under the Bedrock provider - PR #31478
- Auth & Management
- Resolve caller identity once into a Principal at the auth seam - PR #30887
- Cache the auth-path team object under the canonical
team_idkey - PR #31418 - Honor
user_api_key_cache_ttlfor management-object cache writes - PR #31504 - Reject
model_listin the proxy body and gate advisor client credentials - PR #30585 - Redact the API key from
key/infoclient error messages - PR #31342 - Stop double-decrypting email/slack alerting env vars in
get_config- PR #31117 - Serialize team
budget_limitsto JSON injsonify_team_object- PR #31045 - Block a server credential leak to a caller-supplied
api_base- PR #30682
AI Integrations​
Logging​
- Prometheus
- OpenTelemetry
- General
- Add
POST /v1/callbacks/logsto replay logging payloads through callbacks - PR #31134
- Add
Guardrails​
- Bedrock Guardrails
- Select the latest user message by original role in
apply_guardrail- PR #30482
- Select the latest user message by original role in
- General
Spend Tracking, Budgets and Rate Limiting​
- Cost tracking
- Store
litellm_call_idon spend logs for DB-to-trace correlation - PR #31344 - Preserve Anthropic
server_tool_useweb-search usage in cost tracking - PR #31355 - Restore per-query Gemini 3.x web-search billing - PR #31363
- Preserve Gemini Embedding 2
usageMetadatafor cost tracking - PR #31354 - Correct the regional processing uplift to the gpt-5.4/5.5 series only - PR #31136
- Isolate all per-deployment pricing overrides from sibling deployments - PR #31021
- Store
- Spend UI and endpoints
MCP Gateway​
- OAuth 2.0 v2 resolver
- Shared OAuth token foundation: challenge, store seam, expiry-aware cache, single-flight refresh - PR #31275
- Scaffold the
outbound_credentialspackage with a typed Result - PR #31047 - Add a
resolve_credentialsdispatch skeleton - PR #31056 - Graft the v2 resolver onto
_create_mcp_client(none + api_key static family) - PR #31058 - Migrate authorization_code MCP to the v2 resolver (single-replica) [1/2] - PR #31473
- Cross-replica single-flight refresh for the v2 per-user OAuth store [2/2] - PR #31493
- Challenge delegate-auth OAuth servers with upstream
resource_metadata- PR #31255
- Access control
- Opt-in least-privilege default for team-key MCP access - PR #31380
- Scope a key to zero MCP servers with a no-mcp-servers sentinel - PR #31029
- Allow
llm_api_routesvirtual keys to list MCP tools via/v1/mcp/tools- PR #31031 - Let proxy admins assign MCP servers to teamless keys - PR #31126
- Resolve config-defined servers in per-user credential and env-var endpoints - PR #31171
- X-Forwarded-For hardening
- Bug fixes
Performance / Loadbalancing / Reliability improvements​
- Streaming and realtime
- Pre-warm the upstream realtime connection pool to cut session-establishment latency - PR #31163
- Cancel the upstream LLM stream when the client disconnects during time-to-first-token - PR #31499
- Word-sliced cache replay for
stream=truecache hits - PR #30216 - Stop the O(n^2) re-parse of accumulated Gemini stream JSON - PR #31297
- Surface a clean
RateLimitErroron a mid-stream 429 with no fallbacks - PR #31298
- Router and timeouts
- Caching and proxy
- Apply the Redis namespace to all key operations - PR #31288
- Loop-scope async Lua script registration - PR #31501
- Memoize
_get_all_llm_api_params, rebuilt per request - PR #31430 - Precompute service-tier cost-key suffixes - PR #31431
- Bound event-loop blocking from oversized requests - PR #31497
- Stop the pass-through route registry growing on every reload - PR #31314
- Strip NUL bytes in
safe_dumpsonly when present - PR #31424 - Semantic-caching (Redis/Qdrant) and embedding-router updates - PR #31305
- Supply chain and build
- Bump osv-flagged dependencies to clear known CVEs - PR #31122
- Bump the wolfi-base digest to patch openssl CVE-2026-34182 - PR #31133
- Add a Grype image scan for OS and library CVEs - PR #31151
- Harden cargo fetches during maturin builds - PR #31348
- Build the Admin UI from source in a build-platform-pinned stage - PR #31130
Documentation Updates​
- Add MCP server change guidelines - PR #31038
New Contributors​
This release candidate contains changes from existing maintainers only; there are no new contributors in this window.