v1.87.0rc1 - OCI Generative AI Provider, Gemini 3.5 Flash Day-0, MCP UI for OAuth Servers
Deploy this versionโ
- Docker
- Pip
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.87.0-rc.1
pip install litellm==1.87.0rc1
Key Highlightsโ
- OCI Generative AI as a first-class provider โ production-ready chat, embeddings, streaming, reasoning and tool use across Cohere Command-A, Meta Llama 3.1/3.2/3.3/4, xAI Grok 3/4, Google Gemini 2.5, and OpenAI GPT-5 hosted on OCI; full model-pricing catalog included.
- Gemini 3.5 Flash Day-0 support โ
gemini-3.5-flashandgemini-3.1-flash-liteship on Vertex AI, Google AI Studio, and OpenRouter with full pricing, function calling, web search, code execution, and managed-agents support. - MCP UI for OAuth tool calls โ the dashboard now resolves tool list and tool call against OAuth-protected MCP servers directly, plus native MCP OAuth support for Cursor and clearer OAuth error messages.
- Codex CLI auth hardening โ JWT-derived team aliases and SSO form-URL flow for the OpenAI Codex CLI, plus allowlisted OIDC-claim persistence across the CLI SSO poll.
- Anthropic streaming hot-path perf โ ~90% lower TTFT overhead and higher sustained throughput on the proxy's Anthropic
/v1/messagesSSE path, measured on a real 4-pod deployment against both Anthropic and Bedrock Invoke (wire output is parity-tested); plus lazy-loaded response streaming for Bedrock SageMaker.
New Providers and Endpointsโ
New Providers (1 new provider)โ
| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
| OCI Generative AI | /v1/chat/completions, /v1/embeddings | Official Oracle Cloud Infrastructure Generative AI integration. Production-ready support for chat, streaming, reasoning, tool calling, and embeddings across Cohere Command-A (incl. Reasoning + Vision), Meta Llama 3.1 / 3.2 / 3.3 / 4, xAI Grok 3 / 4, Google Gemini 2.5, and OpenAI GPT-5. Includes full model-pricing catalog. - PR #28223 |
New Models / Updated Modelsโ
New Model Support (22 new models)โ
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Azure | azure/speech/azure-stt | โ | $0.000278/sec | โ | Audio transcription |
| Fireworks AI | fireworks_ai/glm-5p1 | 202,800 | $1.40 | $4.40 | Reasoning |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-5p1 | 202,800 | $1.40 | $4.40 | Reasoning |
| Gemini | gemini/gemini-3.5-flash | 1,048,576 | $1.50 | $9.00 | Audio input, function calling, parallel function calling, PDF input, prompt caching, reasoning, response schema, system messages, tool choice, URL context, video input, vision, web search, service tier |
| Gemini | gemini/gemini-3.1-flash-lite | 1,048,576 | $0.25 | $1.50 | Audio input, code execution, file search, function calling, parallel function calling, PDF input, prompt caching, reasoning, response schema, system messages, tool choice, URL context, video input, vision, web search, service tier |
| Vertex AI | vertex_ai/gemini-3.5-flash | 1,048,576 | $1.50 | $9.00 | Same as Gemini direct |
| Vertex AI | vertex_ai/gemini-3.1-flash-lite | 1,048,576 | $0.25 | $1.50 | Same as Gemini direct |
| Mistral | mistral/ministral-8b-2512 | 262,144 | $0.15 | $0.15 | Assistant prefill, function calling, response schema, tool choice, vision |
| OCI | oci/openai.gpt-5 | 272,000 | $1.25 | $10.00 | Function calling, reasoning, response schema, vision |
| OCI | oci/openai.gpt-5-mini | 272,000 | $0.25 | $2.00 | Function calling, reasoning, response schema, vision |
| OCI | oci/openai.gpt-5-nano | 272,000 | $0.05 | $0.40 | Function calling, reasoning, response schema, vision |
| OCI | oci/cohere.command-a-reasoning | 256,000 | $1.56 | $1.56 | Reasoning, native streaming |
| OCI | oci/cohere.command-a-vision | 256,000 | $1.56 | $1.56 | Function calling, vision, native streaming |
| OCI | oci/cohere.embed-multilingual-image-v3.0 | 512 | $0.10 | โ | Embeddings, vision |
| OCI | oci/meta.llama-3.1-8b-instruct | 128,000 | $0.72 | $0.72 | Function calling, native streaming |
| OpenRouter | openrouter/google/gemini-3.1-flash-lite | 1,048,576 | $0.25 | $1.50 | Audio input, code execution, file search, function calling, parallel function calling, PDF input, prompt caching, reasoning, response schema, system messages, tool choice, URL context, video input, vision, web search |
| OpenRouter | openrouter/xiaomi/mimo-v2.5 | 1,048,576 | $0.40 | $2.00 | Function calling, reasoning, vision, audio input, video input, response schema, prompt caching |
| OpenRouter | openrouter/xiaomi/mimo-v2.5-pro | 1,048,576 | $1.00 | $3.00 | Function calling, reasoning, response schema, prompt caching |
| Reducto | reducto/parse-v3 | โ | โ | โ | OCR |
| Reducto | reducto/parse-legacy | โ | โ | โ | OCR |
Plus a Vertex / Anthropic supports_output_config flag flip on all claude-opus-4-6, claude-opus-4-7, and claude-sonnet-4-6 regional variants, and an oci/* supports_native_streaming flip across Cohere, Gemini, Meta, and xAI catalog entries.
Featuresโ
- Gemini
- Azure
- Add Azure Speech STT config support - PR #27482
- OpenRouter
Bug Fixesโ
- Vertex AI
- Bedrock
- SageMaker
- Send the native Cohere embed payload to Cohere SageMaker endpoints - PR #28613
- DeepSeek
- Use the native
/anthropic/v1/messagesendpoint and sanitize tools - PR #28200
- Use the native
- Azure
- vLLM
- Fix Anthropic tool-call transformation on vLLM deployments - PR #28549
LLM API Endpointsโ
Featuresโ
- Interactions API
- Migrate to the Google Interactions API steps schema (May 2026 revision) - PR #28153
- Google-native passthrough
- Decode bytes and pass through SSE for Google-native
streamGenerateContent(no moreb'...'literals on the wire) - PR #28213
- Decode bytes and pass through SSE for Google-native
Bugsโ
- Responses API
- Forward
timeouton the completion-transformation path for Anthropic, Bedrock, and Vertex - PR #28133 - Accept dict-shape
reasoning_effortfrom the Anthropic Responses bridge - PR #28201 - Wrap
aresponsesstreaming iterator for mid-stream router fallbacks - PR #28215 - Unblock staging โ mypy + coverage for
aresponsesstreaming fallback - PR #28318 - Strip Anthropic
cache_controlfrom OpenAI Responses API requests - PR #28431 - Use the OpenAI
SSEDecoderfor Responses API streaming - PR #28566 - Replay
openai/responsesbridge cache hits as chat streams - PR #28158
- Forward
- Interactions API
- Never drop streamed text deltas; always emit the terminal completion - PR #28394
- Batch API
- Normalize batch file IDs before the
ManagedObjectTablewrite - PR #28339
- Normalize batch file IDs before the
Management Endpoints / UIโ
Featuresโ
- Models + Endpoints
- Add a pause/resume Switch on the models table - PR #28151
- Spend Logs
- Consolidate filter state and extract components in the UI - PR #25847
- Playground
- Interactions API endpoint in the Playground with SSE streaming - PR #28156
- Passthrough Routes
- Auth / Codex CLI
- Virtual Keys
- Encrypt
callback_varsin key/team metadata at rest in the DB - PR #27141
- Encrypt
Bugsโ
- Auth / Discovery
- Hydrate wildcard discovery credentials so OIDC discovery works against wildcarded providers - PR #28284
- Spend Logs
- Restore the log-filter loading indicator - PR #28282
- End-User Logs
- Fix end-user logs surfacing - PR #27758
AI Integrationsโ
Loggingโ
Guardrailsโ
- Microsoft Purview DLP
- New guardrail integration for Microsoft Purview DLP - PR #24966
Spend Tracking, Budgets and Rate Limitingโ
- Spend Counter โ Seed the Redis counter via
SET NXto prevent cross-pod double-seed on cold start - PR #27854 - Cost Tracking โ Recalculate cost after router retry failures so the logged cost reflects the actual attempt that succeeded - PR #28476
- Cost Tracking โ Treat
litellm_provider=Noneas a wildcard in_check_provider_matchso cost lookup works for catalog entries that omit the provider field - PR #28523
MCP Gatewayโ
- OAuth in the UI โ Add tool-call and tool-list support via the dashboard for OAuth-protected MCP servers - PR #28454
- Cursor OAuth โ Allow native MCP OAuth support for Cursor - PR #28327
- Auth Resolution โ JWT on
tools/listand RESTtools/callserver resolution - PR #28227 - Cold-Start Init โ Forward upstream
initializeinstructions on cold gateway init - PR #28231 - OAuth Errors โ Add
error_descriptionand hint to OAuth flow error responses - PR #28471 - Inspector โ Trim whitespace from MCP inspector tool-call inputs - PR #28203
Performance / Loadbalancing / Reliability improvementsโ
- Anthropic
/v1/messagesstreaming hot path โ cut per-request and per-chunk overhead on the proxy's Anthropic streaming path, with byte-identical wire output guaranteed by parity tests that diff the logged and billed payloads between the fast and legacy paths. Measured on a real 4-podm7i.xlargedeployment (no HPA) streaming 256text_deltachunks per request, against both Anthropic and Bedrock Invoke โ TTFT overhead ~90% lower with higher sustained throughput (full numbers below) - PR #28289- Skip work that's a no-op in the default config: the per-chunk Datadog span when tracing is off, the per-chunk streaming hook when no callback / guardrail / cost-injection is active, and the agentic post-processing wrapper when no callback overrides its hook (it otherwise buffers every chunk and rebuilds the response from SSE just to call hooks that all return
(False, {})). - Stop doing the same work twice per request: serialize the request body once and reuse it for the pre-call log and the wire, memoize the optional-params type-hint resolution (~80ยตs/request), and skip the redundant
strip_empty_text_blocksscan when the async wrapper already sanitized. - Cheaper end-of-stream reconstruction: collapse the homogeneous run of
content_block_deltatext events into a single equivalent SSE event beforestream_chunk_builder, removing O(output-token)ModelResponseStreamconstructions; tool-use / thinking / citations streams fall back to the unchanged legacy path. - Cheaper hot-path logging: gate debug f-string evaluation behind
isEnabledFor(DEBUG), hoistcost_injection_activeout of the per-chunk loop, and drop one async-generator layer per chunk inasync_sse_data_generator.
- Skip work that's a no-op in the default config: the per-chunk Datadog span when tracing is off, the per-chunk streaming hook when no callback / guardrail / cost-injection is active, and the agentic post-processing wrapper when no callback overrides its hook (it otherwise buffers every chunk and rebuilds the response from SSE just to call hooks that all return
Anthropic /v1/messages streaming, 256 text_delta chunks/request โ 4 pods on m7i.xlarge (4 vCPU / 16 GB), no HPA:
| Metric | Baseline (v1.87.0-dev.1) | Patched (#28289) | Change |
|---|---|---|---|
| TPM (p50 / p95 / p99) | 2634 / 2808 / 2867 | 2952 / 2968 / 2971 | +12% / +6% / +4% |
| TTFT overhead % (p50 / p95 / p99) | 2220 / 3057 / 3111 | 165 / 316 / 328 | ~90% lower |
- Bedrock / SageMaker โ Switch to lazy loading for response streaming - PR #28189
- Granian ASGI โ Add Granian as a supported ASGI server for better throughput stability - PR #26027
- Prisma โ Expose Prisma idle/connect timeout + extra DB URL params so production deployments can tune connection pools - PR #28395
- Proxy auth โ Strict media-type match for form bodies (defensive against ambiguous
Content-Type) - PR #27939 - Proxy auth โ Carry the ASGI path into the WebSocket auth synthetic Request so auth resolves the right route - PR #27940
- Docker โ Restore
npmto the non-root builder image so UI builds run there - PR #28519 - Helm โ Drop the
main-prefix from the default image tag - PR #28710 - License check โ Read PEP 639
license-expressionmetadata incheck_licenses- PR #28529
Documentation Updatesโ
- Fix the incorrect
/v1/agentsrequest example - PR #28131 - Fix misleading credential-passing examples in Gemini-agents GET/DELETE docstrings - PR #28293
General Proxy Improvementsโ
Testing, CI & build hardening:
- Behavior-pinning harness + Key Tier-1 matrix (and tier-2/3 + team management endpoints + phase-4 payload matrix) - PR #28321, PR #28441, PR #28620, PR #28681
- Stabilize image-edit VCR cassettes to stop live
gpt-image-1spend - PR #28110 - Migrate realtime + rerank tests off shut-down upstream models; replace
gpt-4o-audio-previewwithgpt-audio-1.5; expectsession.createdas xAI realtime initial event - PR #28191, PR #28281, PR #28424 - Harden the flaky proxy callback-leak detector - PR #28195
- E2E runner migrated to
uv; add an "All Proxy Models" key test - PR #28313 - UI-e2e: admin key creation with a specific proxy model; forward
LITELLM_LICENSEto the UI e2e proxy - PR #28365, PR #28398 - Vertex AI grounding test tolerates transient 500; streaming test tolerates Vertex 429 wrapped in
MidStreamFallbackError- PR #28503, PR #28669 - Bump black to 26.3.1 and reapply formatting; one-shot lint fix - PR #28525, PR #28639
- Allow
audio_transcription_configin the model-prices schema - PR #28708 - Remove the dead old Playwright e2e suite - PR #28632
- Routine dependency/CI bumps - PR #28287, PR #28524, PR #28528, PR #27665, PR #28296, PR #28303, PR #28707
PR roll-up by ownership areaโ
PRs by ownership area (total: 93)
- Other (CI / tests / build hardening): 25
- Models & Providers (incl. new provider): 18
- UI / Auth & Management: 12
- LLM API Endpoints: 11
- Performance: 9
- Logging: 6
- MCP: 6
- Spend / Budgets / Rate Limits: 3
- Docs: 2
- Guardrails: 1
New Contributorsโ
- @IshaMeera made their first contribution in #28131
- @TorvaldUtne made their first contribution in #27700
- @adityasingh2400 made their first contribution in #28523
- @cwang-otto made their first contribution in #28133
- @ro31337 made their first contribution in #28280
- @withomasmicrosoft made their first contribution in #28490
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.86.0-rc.1...v1.87.0-rc.1
05/23/2026 (v1.87.0rc1)โ
- New Providers: 1
- New Models / Updated Models: 17
- LLM API Endpoints: 11
- Management Endpoints / UI: 12
- AI Integrations (Logging / Guardrails / Secret Managers): 7
- Spend Tracking, Budgets and Rate Limiting: 3
- MCP Gateway: 6
- Performance / Loadbalancing / Reliability improvements: 9
- General Proxy Improvements (testing / CI / build): 25
- Documentation Updates: 2
Total: 93 PRs