v1.90.0rc1 - Six New Providers, OpenTelemetry v2 Parity & Streaming Reliability
Deploy this version​
- Docker

      docker run \
        -e STORE_MODEL_IN_DB=True \
        -p 4000:4000 \
        docker.litellm.ai/berriai/litellm:1.90.0-rc.1

- Pip

      pip install litellm==1.90.0rc1
Key Highlights​
v1.90.0rc1 is the current release candidate for 1.90.0.
- Six new providers - ModelScope, LibertAI, Parasail, Pinstripes, TinyFish (search), and FastCRW (search) - plus a new e2b code-execution sandbox primitive.
- 91 new models across Fireworks AI, Scaleway, Tensormesh, LibertAI, Azure AI (including `gpt-5.5` and DeepSeek V4), and Bedrock Mantle.
- OpenTelemetry v2 reaches metrics parity with v1, emitting the six `gen_ai.client.*` metrics, stamping input/output message content, and scoping OTLP credentials per tenant.
- A broad streaming-reliability sweep: upstream connections are now released when the client disconnects mid-stream (Gemini, aiohttp), requests are cancelled cleanly, and partial spend is recorded on interrupted streams.
- Two new guardrails (Cisco AI Defense, Repello Argus) and a large Next.js App Router UI migration covering the models, teams, users, organizations, api-keys, and usage pages.
New Providers and Endpoints​
New Providers (6 new providers)​
| Provider | Supported LiteLLM Endpoints | Description |
|---|---|---|
| ModelScope (`modelscope`) | Chat Completions | OpenAI-compatible provider for ModelScope-hosted models - PR #28460 |
| LibertAI (`libertai`) | Chat Completions, Embeddings | JSON-configured OpenAI-compatible provider; ships 12 catalog models including bge-m3 embeddings - PR #30203 |
| TinyFish (`tinyfish`) | Search | Web search provider - PR #30634 |
| FastCRW (`fastcrw`) | Search | Web search provider - PR #30434 |
| Parasail (`parasail`) | Chat Completions | OpenAI-compatible provider |
| Pinstripes (`pinstripes`) | Chat Completions | New chat provider; ships 6 catalog models |
New LLM API Endpoints​
| Capability | Description | Documentation |
|---|---|---|
| Code execution (e2b) | New sandbox / code-interpreter primitive for running model-generated code - PR #30898 | Sandbox |
New Models / Updated Models​
New Model Support (91 new models)​
| Provider | Model | Context | Input ($/1M) | Output ($/1M) | Features |
|---|---|---|---|---|---|
| Azure AI | azure_ai/gpt-5.5 | 1,050,000 | $5 | $30 | reasoning, function calling, prompt caching, pdf, vision |
| Azure AI | azure_ai/gpt-5.5-2026-04-23 | 1,050,000 | $5 | $30 | reasoning, function calling, prompt caching, pdf, vision |
| Azure AI | azure_ai/deepseek-v4-flash | 1,000,000 | $0.19 | $0.51 | reasoning, function calling |
| Azure AI | azure_ai/deepseek-v4-pro | 1,000,000 | $1.74 | $3.48 | reasoning, function calling |
| Azure AI | azure_ai/deepseek-v3.1 | 131,072 | $1.23 | $4.94 | reasoning, function calling |
| Azure AI | azure_ai/MAI-Image-2.5 | - | $5 | - | image generation |
| Azure AI | azure_ai/MAI-Image-2.5-Flash | - | $1.75 | - | image generation |
| Azure AI | azure_ai/MAI-Image-2e | - | $5 | - | image generation |
| Azure | azure/gpt-realtime-whisper | - | - | - | audio transcription |
| OpenAI | gpt-realtime-whisper | - | - | - | audio transcription |
| DeepSeek | deepseek-v4-flash / deepseek/deepseek-v4-flash | 1,000,000 | $0.14 | $0.28 | function calling, prompt caching |
| DeepSeek | deepseek-v4-pro / deepseek/deepseek-v4-pro | 1,000,000 | $0.43 | $0.87 | function calling, prompt caching |
| Mistral | mistral/mistral-medium-3-5 | 262,144 | $1.50 | $7.50 | function calling, vision |
| GitHub Copilot | github_copilot/mai-code-1-flash | 128,000 | $0.75 | $4.50 | function calling |
| Fireworks AI | 24 models incl. deepseek-v4-pro, glm-5p2, kimi-k2p6/kimi-k2p7-code, minimax-m3, qwen3p7-plus, gpt-oss-120b/gpt-oss-20b | up to 1,048,576 | $0.07-$2.80 | $0.28-$8.80 | function calling, reasoning, vision |
| Bedrock Mantle | bedrock_mantle/google.gemma-4-26b-a4b / gemma-4-31b / gemma-4-e2b | 128k-256k | $0.04-$0.14 | $0.08-$0.40 | function calling, reasoning, vision |
| LibertAI | 12 models incl. qwen3.6-35b-a3b(-thinking), gemma-4-31b-it(-thinking), deepseek-v4-flash, bge-m3 | up to 262,144 | $0.01-$0.25 | free-$1.75 | function calling, reasoning, vision, embedding |
| Pinstripes | 6 models incl. ps/minimax-m2.7, ps/qwen3.6-35b-a3b, ps/glm-4.5-air, ps/deepseek-v4-flash | up to 1,000,192 | $0.09-$0.30 | $0.20-$0.60 | function calling, reasoning |
| Scaleway | 17 models incl. qwen3.5-397b-a17b, mistral-medium-3.5-128b, gemma-4-26b-a4b-it, gpt-oss-120b, whisper-large-v3 | up to 256,000 | free-$1.50 | free-$7.50 | function calling, reasoning, vision, audio, embedding |
| Tensormesh | 10 models incl. Qwen3-Coder-480B-A35B-FP8, Qwen3.5-397B-A17B-FP8, Kimi-K2.6, DeepSeek-V4-Flash, gpt-oss-120b/gpt-oss-20b | up to 262,144 | $0.07-$1.40 | $0.28-$4.40 | function calling, reasoning, prompt caching |
| Soniox | soniox/stt-async-v5 | 8,000 | - | - | audio transcription |
| TinyFish | tinyfish/search | - | - | - | search |
The 91 new entries also include the full fireworks_ai/accounts/... model and router paths. Claude Fable 5 already shipped in v1.89.0, so it is not counted here. Full diff: model_prices_and_context_window.json.
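
As a worked example of reading the price columns, the sketch below estimates the cost of a single request from per-million-token rates. The rates are copied from the table; the `estimate_cost` helper and the token counts are illustrative assumptions, not LiteLLM's cost calculator, and the estimate ignores prompt-caching discounts and service tiers.

```python
from decimal import Decimal

# Per-1M-token rates in USD (input, output), copied from the table above.
RATES = {
    "azure_ai/gpt-5.5": (Decimal("5"), Decimal("30")),
    "azure_ai/deepseek-v4-flash": (Decimal("0.19"), Decimal("0.51")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: flat-rate cost with no caching or tier adjustments."""
    input_rate, output_rate = RATES[model]
    million = Decimal(1_000_000)
    return (input_tokens * input_rate + output_tokens * output_rate) / million


# 12,000 input tokens and 800 output tokens on azure_ai/gpt-5.5
print(estimate_cost("azure_ai/gpt-5.5", 12_000, 800))  # prints 0.084
```

`Decimal` is used so that small per-token prices do not pick up binary floating-point rounding noise.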
Features​
- Anthropic
- OpenRouter
  - Map reasoning `max` level to `xhigh` - PR #28881
- Bedrock
- DashScope
  - Add Responses API support - PR #30286
- OCI
  - Make Cohere `{{trace}}` judges work (tool param types + agentic tool-calling continuation) - PR #30646
Bug Fixes​
- Anthropic
  - Apply `cache_control_injection_points` on the `/v1/messages` path - PR #30341
  - Strip LiteLLM-injected `total_tokens` from `/v1/messages` responses - PR #30382
  - Cap cache_control injection at 4 blocks - PR #30480
  - Drop orphaned `server_tool_use` on multi-turn replay from generic OpenAI clients - PR #30486
  - Don't leak tool `type` into OpenAI function parameters schema - PR #30618
- Bedrock
  - Preserve `cache_control` for ARN models in the `/v1/messages` adapter - PR #29823
  - Handle `role: "system"` inside the messages array on `/v1/messages` - PR #30443
  - Use a unique function-call id for Bedrock Mantle responses->chat tool calls - PR #30426
  - Add SigV4 fallback to Bedrock Mantle chat completions auth - PR #30714
- Gemini / Vertex AI
- OpenAI-compatible
- WatsonX
  - Wrap string embedding input in an array for the WatsonX API - PR #30897
- Pricing / Cost map
  - Add cost mapping for `deepseek-v4-flash` / `deepseek-v4-pro` - PR #27056
  - Add `mistral-medium-3-5` to the cost map - PR #29303
  - Add `azure_ai/gpt-5.5` to the model cost map - PR #30428
  - Add GitHub Copilot MAI Code Flash pricing - PR #30415
  - Sync the Fireworks AI model registry with the current platform catalog - PR #30616
  - Add `soniox/stt-async-v5` - PR #30672
  - Correct swapped input/output token costs for `command-r7b-12-2024` - PR #30413
  - Add 1h cache-write cost for Anthropic Sonnet 4.5/4.6 - PR #30474
  - Route Volcengine (Doubao) tiered-pricing models to the tiered cost handler - PR #30357; sort tiered thresholds numerically - PR #30375; treat a DashScope explicit `0.0` tier cost as a real price - PR #30653
  - Drop synthesized zero costs in `register_model` to preserve sparse entries - PR #30201
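
Two of the tiered-pricing fixes above address common pitfalls: ordering thresholds as strings rather than numbers, and treating an explicit `0.0` price as if it were missing. The sketch below illustrates both under an assumed data shape; the `tiers` layout and the `pick_tier_cost` helper are hypothetical and do not mirror LiteLLM's internal cost handler.

```python
from typing import Optional


def pick_tier_cost(tiers: dict[str, Optional[float]], tokens: int, default: float) -> float:
    """Return the per-token cost of the first tier whose upper bound covers `tokens`.

    `tiers` maps an upper token bound (a string key, as it would arrive from
    JSON) to a per-token cost. Hypothetical shape, for illustration only.
    """
    # Sort numerically: compared as strings, "128000" sorts before "32000".
    for bound in sorted(tiers, key=int):
        if tokens <= int(bound):
            cost = tiers[bound]
            # `is None` keeps an explicit 0.0 (a free tier) as a real price;
            # `cost or default` would wrongly replace it with the default.
            return default if cost is None else cost
    return default


tiers = {"128000": 2e-6, "32000": 0.0, "256000": None}
print(pick_tier_cost(tiers, 10_000, default=1e-6))   # explicit free tier
print(pick_tier_cost(tiers, 100_000, default=1e-6))  # second tier
print(pick_tier_cost(tiers, 200_000, default=1e-6))  # tier with no price set
```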
LLM API Endpoints​
Features​
- Responses API
  - Propagate `completed_response` through `FallbackResponsesStreamWrapper` for streaming `/v1/responses` container ownership - PR #30213
- /v1/models
- Realtime
  - Allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes - PR #30089
- Files
  - Attach existing OpenAI file ids - PR #30628
Bugs​
- General
  - Token counter: handle Anthropic `tool_reference` blocks to stop dropped spend logs - PR #30302
  - Streaming: guard `raise_on_model_repetition` against empty choices - PR #30485
  - Audio: don't override an explicit `response_format` with `verbose_json` - PR #30599
  - Validate the resolved model in `/realtime/client_secrets` for non-transcription sessions - PR #30710
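
The empty-choices guard in the streaming fix above can be pictured with a small sketch. OpenAI-style streams can include chunks whose `choices` list is empty (for example a trailing usage-only chunk), so any check that indexes `choices[0]` needs a guard. Everything below is a hypothetical illustration: `delta_text`, `check_repetition`, and `RepetitionError` are assumed names, and the repetition rule is a stand-in rather than the logic behind `raise_on_model_repetition`.

```python
from typing import Iterable


class RepetitionError(Exception):
    """Raised when the same non-empty delta repeats too many times in a row."""


def delta_text(chunk: dict) -> str:
    # Guard: indexing choices[0] on an empty list would raise IndexError.
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return (choices[0].get("delta") or {}).get("content") or ""


def check_repetition(chunks: Iterable[dict], max_repeats: int = 3) -> int:
    """Return the number of text chunks seen; raise on a long identical run."""
    last, run, seen = None, 0, 0
    for chunk in chunks:
        text = delta_text(chunk)
        if not text:
            continue  # nothing to compare; an empty chunk does not break a run
        seen += 1
        run = run + 1 if text == last else 1
        last = text
        if run > max_repeats:
            raise RepetitionError(f"{text!r} repeated {run} times")
    return seen


stream = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": []},  # e.g. a trailing usage-only chunk
    {"choices": [{"delta": {"content": " world"}}]},
]
print(check_repetition(stream))  # prints 2
```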
Management Endpoints / UI​
Features​
- App Router migration - models - PR #30677, teams - PR #30343, users - PR #30334, organizations - PR #30336, api-keys - PR #30699, usage report - PR #30694, agents + router-settings - PR #30323
- UI cleanup - remove the unreachable `/chat` page - PR #30178, dead UI components - PR #30340, orphaned pass-through-settings route - PR #30692; remove in-product survey and feedback nudges - PR #30773
- Virtual Keys - expose per-model budget usage in `/key/info` - PR #30394; grace-period key rotation returns the deprecated-key lookup result on 401 - PR #30327
- Teams / Orgs - add `key_limit` query param to `/team/info` - PR #30006; list public team model names in `/v1/models` - PR #30588
- Proxy CLI Auth - add `verification_uri_complete` to the CLI SSO device flow - PR #30571
- Proxy - configurable response headers and login-page hint - PR #30792; gate the "Default Credentials" hint on `/ui/login` behind an env flag - PR #30234
Bugs​
- Access control / keys
  - `/key/list` now does exact `user_id` / `key_alias` matching by default, preventing cross-user key disclosure - PR #30593
  - Restrict `/customer/daily/activity` to admin-only - PR #28849
  - `org_admin` sees all org teams when the UI sends its own `user_id` - PR #30247
  - Allow internal roles to access vector store CRUD routes - PR #30503
  - Require premium only when enabling premium metadata fields - PR #30506
  - Guard `check_and_fix_namespace` against a `None` key - PR #30435
  - Warn at startup when `custom_auth` skips `common_checks` enforcement - PR #30665
  - Resolve list-files credentials from team BYOK deployments - PR #30495; preserve `azure_ad_token` through `CredentialLiteLLMParams` for `/v1/files` + batches - PR #30241
  - Enforce budget for models not in the cost map - PR #24949
- UI
  - Stop the Virtual Keys page from an infinite render loop - PR #30397
  - Source api-keys identity from `useAuthorized` to stop "User ID is not set" - PR #30903
  - Warn that team models are deleted in the delete-team modal - PR #29990
  - Three small fixes - Gemini `api_base`, credential form reset, Mode badge - PR #30419
  - Repoint the dead usage-guide link to cost-tracking docs - PR #30859
- Proxy
  - Support SMTP implicit SSL (port 465) - PR #30395
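
To see why exact matching on `/key/list` matters for disclosure, consider the sketch below. It assumes the earlier behavior was some form of partial match, which the release note implies but does not spell out; `KeyRow` and `list_keys` are hypothetical names, not the proxy's actual query code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRow:
    key_alias: str
    user_id: str


def list_keys(rows, user_id: str, exact: bool = True):
    """Hypothetical filter: exact equality by default, substring only on opt-in."""
    if exact:
        return [row for row in rows if row.user_id == user_id]
    return [row for row in rows if user_id in row.user_id]


rows = [KeyRow("ci", "alice"), KeyRow("prod", "alice-admin"), KeyRow("dev", "bob")]
print([row.key_alias for row in list_keys(rows, "alice")])               # ['ci']
print([row.key_alias for row in list_keys(rows, "alice", exact=False)])  # ['ci', 'prod']
```

With a partial match, a query for `alice` also returns the key owned by `alice-admin`, which is the kind of cross-user leak the exact-match default avoids.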
AI Integrations​
Logging​
- OpenTelemetry
  - Emit the six `gen_ai.client.*` metrics at v1 parity in v2 - PR #30326
  - One v2 logger owns the global provider; scope tenant OTLP creds per exporter - PR #30590
  - Export v2 gen_ai client metrics to the configured meter provider - PR #30549
  - Stamp `gen_ai.input/output.messages` on v2 spans - PR #30548
  - Cap metric attribute cardinality with include/exclude lists - PR #30257
  - Record the full error message on the standard exception event in v2 - PR #30380
  - Accept `UPPER_SNAKE_CASE` `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` in v2 - PR #30562
- General
  - Preserve `error_message` on `ProxyException` failures in spend logs - PR #30381
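
Capping metric attribute cardinality with include/exclude lists amounts to filtering attribute keys before a data point is recorded. The sketch below shows that idea in isolation; `filter_attributes` and its parameters are hypothetical and do not reflect the actual configuration keys introduced by the PR.

```python
from typing import Any, Iterable, Mapping, Optional


def filter_attributes(
    attrs: Mapping[str, Any],
    include: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = (),
) -> dict[str, Any]:
    """Keep only allowed attribute keys before recording a metric point.

    Hypothetical helper: `include=None` means "no allow-list", and
    `exclude` always wins over `include`.
    """
    allowed = None if include is None else set(include)
    denied = set(exclude)
    return {
        key: value
        for key, value in attrs.items()
        if key not in denied and (allowed is None or key in allowed)
    }


attrs = {
    "gen_ai.request.model": "gpt-5.5",
    "gen_ai.operation.name": "chat",
    "request_id": "req-8f3a",  # unique per call, so unbounded cardinality
}
print(filter_attributes(attrs, exclude=["request_id"]))
print(filter_attributes(attrs, include=["gen_ai.request.model"]))
```

Dropping per-request identifiers keeps the number of distinct time series bounded by the number of models and operations rather than the number of requests.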
Guardrails​
- Cisco AI Defense - new integration - PR #28249
- Repello Argus - new integration - PR #30465
- Presidio - add missing UK PII entity types - PR #30537; don't mask the live request when the guardrail is `logging_only` - PR #30461
- AIM - return 400 not 500 when AIM blocks a request - PR #30573
- General
Secret Managers​
- AWS Secrets Manager - cross-region replication - PR #30368
Spend Tracking, Budgets and Rate Limiting​
- Service-tier pricing - apply the `service_tier` suffix to above-threshold cache rates and expose priority+threshold keys in `ModelInfo` - PR #30450; price and surface the Anthropic response `service_tier` in cost tracking - PR #30558; stop non-string `service_tier` from silently dropping cost tracking - PR #30690, PR #30706
- Budgets - enforce budgets against authoritative DB spend when the cross-pod counter is stale - PR #30684; release a budget reservation when a request is cancelled mid-flight - PR #30522; recompute `budget_reset_at` when `budget_duration` changes - PR #30555
- Rate limiting - prevent internal `parallel_request_limiter` fields from leaking to upstream providers - PR #30545
- Spend accuracy - record partial spend on the failure row for interrupted streams - PR #30788; recover output tokens for interrupted Anthropic streams - PR #30787; stop Perplexity double-billing reasoning tokens in the manual cost fallback - PR #30488; correct cached-token usage with `ChatCompletionUsageBlock` - PR #30422
- Usage aggregation - drain all daily-spend batches per flush cycle - PR #30505; show session-aggregate cost and duration in request logs - PR #30507; coalesce null aggregates for no-spend keys - PR #29945; remove timezone date expansion in daily-activity aggregation - PR #29569
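
Releasing a budget reservation on mid-flight cancellation is a reserve-then-settle pattern in which the cancellation path must also hand the reservation back. The sketch below shows one way to structure that with `asyncio`; the `Budget` class and `guarded_call` are hypothetical and are not LiteLLM's budget implementation.

```python
import asyncio


class Budget:
    """Hypothetical in-memory budget with pending reservations."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0
        self.reserved = 0.0

    def reserve(self, amount: float) -> None:
        if self.spent + self.reserved + amount > self.limit:
            raise RuntimeError("budget exceeded")
        self.reserved += amount

    def settle(self, reserved: float, actual: float) -> None:
        self.reserved -= reserved
        self.spent += actual


async def guarded_call(budget: Budget, estimate: float, call) -> float:
    budget.reserve(estimate)
    try:
        actual = await call()
    except BaseException:
        # Covers asyncio.CancelledError too: release the reservation,
        # otherwise a cancelled request would hold budget indefinitely.
        budget.settle(estimate, 0.0)
        raise
    budget.settle(estimate, actual)
    return actual


async def main() -> Budget:
    budget = Budget(limit=1.0)

    async def slow_request() -> float:
        await asyncio.sleep(10)
        return 0.4

    task = asyncio.create_task(guarded_call(budget, 0.5, slow_request))
    await asyncio.sleep(0)  # let the task start and take its reservation
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return budget


budget = asyncio.run(main())
print(budget.reserved, budget.spent)  # prints 0.0 0.0
```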
MCP Gateway​
- Make the MCP gateway name and description configurable via env vars - PR #30473
- Fail closed when the scope filter resolves to no servers - PR #30353
- Re-raise instead of silently dropping MCP team permissions - PR #30477
- Drop the phantom 401 span on delegated OAuth2 tool calls - PR #30494
- Default the Linear MCP registry entry to streamable HTTP - PR #30396
- Preserve native tools in the semantic filter hook - PR #26650
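
"Fail closed" in the scope-filter fix above means that a filter matching nothing should yield no servers rather than silently falling back to all of them. The sketch below contrasts that with the fail-open bug; `resolve_servers` and the server names are hypothetical and only illustrate the distinction between "no filter configured" and "filter matched nothing".

```python
from typing import Iterable, Optional


def resolve_servers(available: Iterable[str], scope: Optional[Iterable[str]]) -> list[str]:
    """Hypothetical resolver for which MCP servers a caller may reach."""
    servers = list(available)
    if scope is None:
        return servers  # no scope filter configured at all
    wanted = set(scope)
    # Fail closed: a filter that matches nothing yields no servers.
    # The fail-open bug would look like `return matched or servers`.
    return [name for name in servers if name in wanted]


available = ["github", "linear", "slack"]
print(resolve_servers(available, None))         # ['github', 'linear', 'slack']
print(resolve_servers(available, ["linear"]))   # ['linear']
print(resolve_servers(available, ["unknown"]))  # []
```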
Performance / Loadbalancing / Reliability improvements​
- Streaming connection hygiene - cancel the upstream Gemini request and release the httpx connection on client disconnect - PR #30075; close the upstream LLM stream when the client disconnects mid-stream - PR #30245; release the aiohttp connection when stream iteration ends abnormally - PR #30271; use `e.request_data` for `logging_obj` in `ModifyResponseException` streaming passthrough - PR #30800
- Caching - add a valkey-semantic cache backend and fix semantic-cache scope keys - PR #30675; url-encode the object name in the GCS cache GET path - PR #30378; allow `use_redis_transaction_buffer` without a Redis cache - PR #28764
- Router / fallbacks - resolve a list-unhashable crash on model alias - PR #30464; clean pattern_router state on upsert/delete - PR #29601; preserve the fallback model in SDK fallback responses - PR #28260; add `expose_router_debug_in_errors` (default True) to redact internal model_group/fallback names - PR #30418
- Startup / workers - fail fast on a non-PostgreSQL `DATABASE_URL` instead of hanging - PR #30366; add `--max_requests_before_restart_jitter` to stagger worker restarts - PR #30601; fix the IAM refresh-engine watcher race - PR #30183; release the cron pod-lock by matching `async_set_cache` JSON encoding - PR #30600
- Health checks - correct Bedrock embedding health checks - PR #30583; bump the health-check `max_tokens` default to 16 for GPT-5 compatibility - PR #30708, PR #26610
- Developer experience / CI - around 30 PRs hardening the lint and type-check gates (standardizing on basedpyright, dropping mypy, ratcheting any-discipline budgets), an osv-scanner lockfile workflow, zizmor PR gating, a local fake-OpenAI test endpoint replacing the shared mock, dependency bumps, and a pinned build toolchain.
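
Restart jitter staggers workers by giving each one a slightly different request threshold, so they do not all recycle at the same moment. The sketch below assumes the common scheme of adding a random offset in `[0, jitter]` to the base limit, similar in spirit to Gunicorn's `max_requests_jitter` setting; whether `--max_requests_before_restart_jitter` computes its offset exactly this way is an assumption, and `restart_threshold` is a hypothetical helper.

```python
import random


def restart_threshold(max_requests: int, jitter: int, rng: random.Random) -> int:
    """Hypothetical: per-worker request count after which the worker restarts."""
    return max_requests + rng.randint(0, jitter)


rng = random.Random(0)  # seeded only to make the illustration repeatable
thresholds = [restart_threshold(1000, 100, rng) for _ in range(4)]
print(thresholds)  # four values between 1000 and 1100 inclusive
```

With a jitter of zero every worker would hit its limit after the same number of requests, which under evenly balanced load tends to restart them all at once.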
Documentation Updates​
- Add 1-click AWS/GCP Terraform deploy buttons and fix README deploy-button rendering - PR #29879
- Strengthen the coding conventions in `CLAUDE.md` - PR #30333
- Clarify the Linear portion of the PR template - PR #30766
New Contributors​
@hannahmadison, @ayushh0110, @Dotify71, @munnr, @V-3604, @yrk111222, @Silvenga, @djmaze, @apshada, @HumphreySun98, @Harshxth, @tomoyat1, @S0ngRu1, @habonlaci, @moshemalawach, @nahrinoda, @Vedant-Agarwal, @lollinng, @anneheartrecord, @hdt12a1, @vineethsaivs, @krishvsoni, @rvishwas26, @santino18727-debug, @darktheorys, @songkuan-zheng, @Thijmen, @Kropiunig, @jay-tau, @KnyazSh, @koztkozt, @us, @Anuj7411, @zkryakgul, @lavish619, @EugeneLugovtsov, @Bochenski, @menardorama, @factnn, @semmons99, @nitishagar, @FadelT, @jho1-godaddy, @yucheng-berri, @ad1269, @shzdehmd, @vanika02, @Nithish-Yenaganti, @simantak-dabhade, @devYRPauli, @clpatterson, @tcconnally