v1.90.0rc1 - Six New Providers, OpenTelemetry v2 Parity & Streaming Reliability

Deploy this version

Docker

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.90.0-rc.1

Pip

pip install litellm==1.90.0rc1

Key Highlights

v1.90.0rc1 is the current release candidate for 1.90.0.

Six new providers - ModelScope, LibertAI, Parasail, Pinstripes, TinyFish (search), and FastCRW (search) - plus a new e2b code-execution sandbox primitive.
91 new models across Fireworks AI, Scaleway, Tensormesh, LibertAI, Azure AI (including gpt-5.5 and DeepSeek V4), and Bedrock Mantle.
OpenTelemetry v2 reaches metrics parity with v1, emitting the six gen_ai.client.* metrics, stamping input/output message content, and scoping OTLP credentials per tenant.
A broad streaming-reliability sweep: upstream connections are now released when the client disconnects mid-stream (Gemini, aiohttp), requests are cancelled cleanly, and partial spend is recorded on interrupted streams.
Two new guardrails (Cisco AI Defense, Repello Argus) and a large Next.js App Router UI migration covering the models, teams, users, organizations, api-keys, and usage pages.

New Providers and Endpoints

New Providers (6 new providers)

Provider	Supported LiteLLM Endpoints	Description
ModelScope (`modelscope`)	Chat Completions	OpenAI-compatible provider for ModelScope-hosted models - PR #28460
LibertAI (`libertai`)	Chat Completions, Embeddings	JSON-configured OpenAI-compatible provider; ships 12 catalog models including `bge-m3` embeddings - PR #30203
TinyFish (`tinyfish`)	Search	Web search provider - PR #30634
FastCRW (`fastcrw`)	Search	Web search provider - PR #30434
Parasail (`parasail`)	Chat Completions	OpenAI-compatible provider
Pinstripes (`pinstripes`)	Chat Completions	New chat provider; ships 6 catalog models

New LLM API Endpoints

Capability	Description	Documentation
Code execution (e2b)	New sandbox / code-interpreter primitive for running model-generated code - PR #30898	Sandbox

New Models / Updated Models

New Model Support (91 new models)

Provider	Model	Context	Input ($/1M)	Output ($/1M)	Features
Azure AI	`azure_ai/gpt-5.5`	1,050,000	$5	$30	reasoning, function calling, prompt caching, pdf, vision
Azure AI	`azure_ai/gpt-5.5-2026-04-23`	1,050,000	$5	$30	reasoning, function calling, prompt caching, pdf, vision
Azure AI	`azure_ai/deepseek-v4-flash`	1,000,000	$0.19	$0.51	reasoning, function calling
Azure AI	`azure_ai/deepseek-v4-pro`	1,000,000	$1.74	$3.48	reasoning, function calling
Azure AI	`azure_ai/deepseek-v3.1`	131,072	$1.23	$4.94	reasoning, function calling
Azure AI	`azure_ai/MAI-Image-2.5`	-	$5	-	image generation
Azure AI	`azure_ai/MAI-Image-2.5-Flash`	-	$1.75	-	image generation
Azure AI	`azure_ai/MAI-Image-2e`	-	$5	-	image generation
Azure	`azure/gpt-realtime-whisper`	-	-	-	audio transcription
OpenAI	`gpt-realtime-whisper`	-	-	-	audio transcription
DeepSeek	`deepseek-v4-flash` / `deepseek/deepseek-v4-flash`	1,000,000	$0.14	$0.28	function calling, prompt caching
DeepSeek	`deepseek-v4-pro` / `deepseek/deepseek-v4-pro`	1,000,000	$0.43	$0.87	function calling, prompt caching
Mistral	`mistral/mistral-medium-3-5`	262,144	$1.50	$7.50	function calling, vision
GitHub Copilot	`github_copilot/mai-code-1-flash`	128,000	$0.75	$4.50	function calling
Fireworks AI	24 models incl. `deepseek-v4-pro`, `glm-5p2`, `kimi-k2p6`/`kimi-k2p7-code`, `minimax-m3`, `qwen3p7-plus`, `gpt-oss-120b`/`gpt-oss-20b`	up to 1,048,576	$0.07-$2.80	$0.28-$8.80	function calling, reasoning, vision
Bedrock Mantle	`bedrock_mantle/google.gemma-4-26b-a4b` / `gemma-4-31b` / `gemma-4-e2b`	128k-256k	$0.04-$0.14	$0.08-$0.40	function calling, reasoning, vision
LibertAI	12 models incl. `qwen3.6-35b-a3b(-thinking)`, `gemma-4-31b-it(-thinking)`, `deepseek-v4-flash`, `bge-m3`	up to 262,144	$0.01-$0.25	free-$1.75	function calling, reasoning, vision, embedding
Pinstripes	6 models incl. `ps/minimax-m2.7`, `ps/qwen3.6-35b-a3b`, `ps/glm-4.5-air`, `ps/deepseek-v4-flash`	up to 1,000,192	$0.09-$0.30	$0.20-$0.60	function calling, reasoning
Scaleway	17 models incl. `qwen3.5-397b-a17b`, `mistral-medium-3.5-128b`, `gemma-4-26b-a4b-it`, `gpt-oss-120b`, `whisper-large-v3`	up to 256,000	free-$1.50	free-$7.50	function calling, reasoning, vision, audio, embedding
Tensormesh	10 models incl. `Qwen3-Coder-480B-A35B-FP8`, `Qwen3.5-397B-A17B-FP8`, `Kimi-K2.6`, `DeepSeek-V4-Flash`, `gpt-oss-120b`/`gpt-oss-20b`	up to 262,144	$0.07-$1.40	$0.28-$4.40	function calling, reasoning, prompt caching
Soniox	`soniox/stt-async-v5`	8,000	-	-	audio transcription
TinyFish	`tinyfish/search`	-	-	-	search

The 91 new entries also include the full fireworks_ai/accounts/... model and router paths. Claude Fable 5 already shipped in v1.89.0, so it is not counted here. Full diff: model_prices_and_context_window.json.
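The Input and Output columns above are USD per 1M tokens, so a request's list-price cost is tokens / 1,000,000 times the rate, summed over input and output. The sketch below checks that arithmetic with awk, using the `azure_ai/gpt-5.5` prices from the table ($5 input, $30 output) and arbitrary example token counts. It ignores prompt-caching discounts, reasoning-token handling, and tiered pricing, all of which LiteLLM's cost tracking may apply on top.

```shell
#!/bin/sh
# Estimate a request's list-price cost from per-1M-token prices.
# Token counts below are arbitrary examples; caching and tiers are ignored.
cost_usd() {
  # $1 = input tokens, $2 = output tokens
  # $3 = input price ($/1M), $4 = output price ($/1M)
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.4f\n", i / 1000000 * pi + o / 1000000 * po }'
}

# azure_ai/gpt-5.5: 10,000 input + 2,000 output tokens
cost_usd 10000 2000 5 30   # prints 0.1100
```

At these rates the output side dominates: 2,000 output tokens cost more than 10,000 input tokens.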

Features

Anthropic
- Surface compaction usage iterations data - PR #27065
- Serve Anthropic-native /v1/models for Claude Code gateway discovery - PR #30273
OpenRouter
- Map reasoning max level to xhigh - PR #28881
Bedrock
- Optionally forward multimodal content blocks in AgentCore InvokeAgentRuntime - PR #28885
- Support file content retrieval for batch output files - PR #30595
- Make Bedrock Mantle Responses routing data-driven per model - PR #30700
DashScope
- Add Responses API support - PR #30286
OCI
- Make Cohere {{trace}} judges work (tool param types + agentic tool-calling continuation) - PR #30646

Bug Fixes

Anthropic
- Apply cache_control_injection_points on the /v1/messages path - PR #30341
- Strip LiteLLM-injected total_tokens from /v1/messages responses - PR #30382
- Cap cache_control injection at 4 blocks - PR #30480
- Drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients - PR #30486
- Don't leak tool type into OpenAI function parameters schema - PR #30618
Bedrock
- Preserve cache_control for ARN models in the /v1/messages adapter - PR #29823
- Handle role: "system" inside the messages array on /v1/messages - PR #30443
- Use a unique function-call id for Bedrock Mantle responses->chat tool calls - PR #30426
- Add SigV4 fallback to Bedrock Mantle chat completions auth - PR #30714
Gemini / Vertex AI
- Use get_vertex_base_url for cachedContents host - PR #29707
- Buffer native Gemini SSE frames - PR #30225
- Map Gemini upstream-error body code 429 to RateLimitError - PR #30417
- Ensure capability checks report that gemini-3-flash-preview supports responseJsonSchema - PR #30696
OpenAI-compatible
- Preserve cache_control for OpenAI-compatible custom endpoints - PR #30387
- hosted_vllm: remove thinking_blocks and convert list content to strings - PR #30475
- Don't stack provider prefix on wildcard models with a custom prefix - PR #30360
WatsonX
- Wrap string embedding input in an array for the WatsonX API - PR #30897
Pricing / Cost map
- Add cost mapping for deepseek-v4-flash/deepseek-v4-pro - PR #27056
- Add mistral-medium-3-5 to the cost map - PR #29303
- Add azure_ai/gpt-5.5 to the model cost map - PR #30428
- Add GitHub Copilot MAI Code Flash pricing - PR #30415
- Sync the Fireworks AI model registry with the current platform catalog - PR #30616
- Add soniox/stt-async-v5 - PR #30672
- Correct swapped input/output token costs for command-r7b-12-2024 - PR #30413
- Add 1h cache-write cost for Anthropic Sonnet 4.5/4.6 - PR #30474
- Route Volcengine (Doubao) tiered-pricing models to the tiered cost handler - PR #30357; sort tiered thresholds numerically - PR #30375; treat a DashScope explicit 0.0 tier cost as a real price - PR #30653
- Drop synthesized zero costs in register_model to preserve sparse entries - PR #30201
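One of the tiered-pricing fixes above sorts thresholds numerically. The sketch below shows the general failure mode that guards against. It is an illustration, not LiteLLM's code, and the threshold values are made up. Compared as strings, a 1,000,000-token threshold sorts ahead of 128,000 and 200,000, so a lookup that walks tiers in order can select the wrong price.

```shell
#!/bin/sh
# Illustrative tier thresholds (token counts), deliberately unordered.
# $thresholds is left unquoted below on purpose so it splits into words.
thresholds="200000 1000000 128000"

# Lexicographic sort compares character by character: "1000000" < "128000".
lexical() { printf '%s\n' $thresholds | sort | paste -sd ' ' -; }

# Numeric sort compares values, giving the order a tier lookup expects.
numeric() { printf '%s\n' $thresholds | sort -n | paste -sd ' ' -; }

echo "lexical: $(lexical)"   # lexical: 1000000 128000 200000
echo "numeric: $(numeric)"   # numeric: 128000 200000 1000000
```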

LLM API Endpoints

Features

Responses API
- Propagate completed_response through FallbackResponsesStreamWrapper for streaming /v1/responses container ownership - PR #30213
/v1/models
- Surface max_input_tokens/max_output_tokens on /v1/models - PR #30272
- Include model group aliases in v1 model info - PR #30626
Realtime
- Allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes - PR #30089
Files
- Attach existing OpenAI file ids - PR #30628

Bugs

General
- Token counter: handle Anthropic tool_reference blocks to stop dropped spend logs - PR #30302
- Streaming: guard raise_on_model_repetition against empty choices - PR #30485
- Audio: don't override an explicit response_format with verbose_json - PR #30599
- Validate the resolved model in /realtime/client_secrets for non-transcription sessions - PR #30710

Management Endpoints / UI

Features

App Router migration - models - PR #30677, teams - PR #30343, users - PR #30334, organizations - PR #30336, api-keys - PR #30699, usage report - PR #30694, agents + router-settings - PR #30323
UI cleanup - remove the unreachable /chat page - PR #30178, dead UI components - PR #30340, orphaned pass-through-settings route - PR #30692; remove in-product survey and feedback nudges - PR #30773
Virtual Keys - expose per-model budget usage in /key/info - PR #30394; grace-period key rotation returns the deprecated-key lookup result on 401 - PR #30327
Teams / Orgs - add key_limit query param to /team/info - PR #30006; list public team model names in /v1/models - PR #30588
Proxy CLI Auth - add verification_uri_complete to the CLI SSO device flow - PR #30571
Proxy - configurable response headers and login-page hint - PR #30792; gate the "Default Credentials" hint on /ui/login behind an env flag - PR #30234

Bugs

Access control / keys
- /key/list now does exact user_id/key_alias matching by default, preventing cross-user key disclosure - PR #30593
- Restrict /customer/daily/activity to admin-only - PR #28849
- org_admin sees all org teams when the UI sends its own user_id - PR #30247
- Allow internal roles to access vector store CRUD routes - PR #30503
- Require premium only when enabling premium metadata fields - PR #30506
- Guard check_and_fix_namespace against a None key - PR #30435
- Warn at startup when custom_auth skips common_checks enforcement - PR #30665
- Resolve list-files credentials from team BYOK deployments - PR #30495; preserve azure_ad_token through CredentialLiteLLMParams for /v1/files + batches - PR #30241
- Enforce budget for models not in the cost map - PR #24949
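The /key/list change switches user_id/key_alias filtering to exact matching by default. Assuming the earlier behavior was a partial, substring-style match, the sketch below shows why that can disclose other users' keys. It uses hypothetical IDs and aliases and is not the proxy's query logic. A substring filter for `alice` also returns the row owned by `alice-admin`, while an exact match returns only Alice's own key.

```shell
#!/bin/sh
# Hypothetical key rows: "<user_id> <key_alias>".
rows='alice key-a
alice-admin key-b
bob key-c'

# Substring match on user_id: "alice" also matches "alice-admin".
substring_match() {
  printf '%s\n' "$rows" | awk -v u="$1" 'index($1, u) > 0 { print $2 }' | paste -sd ' ' -
}

# Exact match on user_id: only rows whose user_id equals the filter.
exact_match() {
  printf '%s\n' "$rows" | awk -v u="$1" '$1 == u { print $2 }' | paste -sd ' ' -
}

echo "substring: $(substring_match alice)"   # substring: key-a key-b
echo "exact:     $(exact_match alice)"       # exact:     key-a
```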
UI
- Stop the Virtual Keys page from an infinite render loop - PR #30397
- Source api-keys identity from useAuthorized to stop "User ID is not set" - PR #30903
- Warn that team models are deleted in the delete-team modal - PR #29990
- Three small fixes - Gemini api_base, credential form reset, Mode badge - PR #30419
- Repoint the dead usage-guide link to cost-tracking docs - PR #30859
Proxy
- Support SMTP implicit SSL (port 465) - PR #30395

AI Integrations

Logging

OpenTelemetry
- Emit the six gen_ai.client.* metrics at v1 parity in v2 - PR #30326
- One v2 logger owns the global provider; scope tenant OTLP creds per exporter - PR #30590
- Export v2 gen_ai client metrics to the configured meter provider - PR #30549
- Stamp gen_ai.input/output.messages on v2 spans - PR #30548
- Cap metric attribute cardinality with include/exclude lists - PR #30257
- Record the full error message on the standard exception event in v2 - PR #30380
- Accept UPPER_SNAKE_CASE OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT in v2 - PR #30562
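The v2 logger now accepts OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT in UPPER_SNAKE_CASE. Assuming this refers to mode values such as SPAN_ONLY, the spelling used by OpenTelemetry's GenAI instrumentation utilities, the idea reduces to normalizing case before matching. The parser below is hypothetical. Its value set (true/false plus the SPAN_ONLY-style mode names) is an assumption, not LiteLLM's accepted list.

```shell
#!/bin/sh
# Sketch only: case-insensitive parsing of a content-capture setting.
# Accepted values are assumed (true/false and SPAN_ONLY-style mode names).
capture_mode() {
  # Lower-case the raw value so SPAN_ONLY, Span_Only and span_only all match.
  v=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$v" in
    true|span_only|event_only|span_and_event) echo "capture" ;;
    *) echo "no-capture" ;;  # false, no_content, empty or unrecognized
  esac
}

capture_mode SPAN_ONLY    # capture
capture_mode NO_CONTENT   # no-capture
capture_mode "${OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT:-}"
```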
General
- Preserve error_message on ProxyException failures in spend logs - PR #30381

Guardrails

Cisco AI Defense - new integration - PR #28249
Repello Argus - new integration - PR #30465
Presidio - add missing UK PII entity types - PR #30537; don't mask the live request when the guardrail is logging_only - PR #30461
AIM - return 400 not 500 when AIM blocks a request - PR #30573
General
- Stop re-initializing DB guardrails on every poll - PR #30542
- Run the pre_call hook once for model-level guardrails - PR #30543
- disable_global_guardrails overrides the team list - PR #28563
- Surface OpenAI moderation violation_categories on guardrail traces - PR #30659

Secret Managers

AWS Secrets Manager - cross-region replication - PR #30368

Spend Tracking, Budgets and Rate Limiting

Service-tier pricing - apply the service_tier suffix to above-threshold cache rates and expose priority+threshold keys in ModelInfo - PR #30450; price and surface the Anthropic response service_tier in cost tracking - PR #30558; stop non-string service_tier from silently dropping cost tracking - PR #30690, PR #30706
Budgets - enforce budgets against authoritative DB spend when the cross-pod counter is stale - PR #30684; release a budget reservation when a request is cancelled mid-flight - PR #30522; recompute budget_reset_at when budget_duration changes - PR #30555
Rate limiting - prevent internal parallel_request_limiter fields from leaking to upstream providers - PR #30545
Spend accuracy - record partial spend on the failure row for interrupted streams - PR #30788; recover output tokens for interrupted Anthropic streams - PR #30787; stop Perplexity double-billing reasoning tokens in the manual cost fallback - PR #30488; correct cached-token usage with ChatCompletionUsageBlock - PR #30422
Usage aggregation - drain all daily-spend batches per flush cycle - PR #30505; show session-aggregate cost and duration in request logs - PR #30507; coalesce null aggregates for no-spend keys - PR #29945; remove timezone date expansion in daily-activity aggregation - PR #29569
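The usage-aggregation fix drains every queued daily-spend batch in each flush cycle. The sketch below shows the difference using a FIFO of pending updates and a fixed batch size, both hypothetical and not LiteLLM's internals. The outer loop runs until the queue is empty. A one-batch-per-cycle flush would instead leave u4 through u7 waiting for later cycles.

```shell
#!/bin/sh
# Hypothetical batch size for flushing pending daily-spend updates.
BATCH_SIZE=3

# Drain every batch in one flush cycle instead of flushing only the first.
# Pending updates are passed as arguments and consumed with shift.
flush_all() {
  n=0
  while [ "$#" -gt 0 ]; do
    batch=""
    i=0
    while [ "$i" -lt "$BATCH_SIZE" ] && [ "$#" -gt 0 ]; do
      batch="$batch $1"
      shift
      i=$((i + 1))
    done
    n=$((n + 1))
    echo "batch $n:$batch"
  done
  echo "flushed $n batches"
}

flush_all u1 u2 u3 u4 u5 u6 u7
# batch 1: u1 u2 u3
# batch 2: u4 u5 u6
# batch 3: u7
# flushed 3 batches
```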

MCP Gateway

Make the MCP gateway name and description configurable via env vars - PR #30473
Fail closed when the scope filter resolves to no servers - PR #30353
Re-raise instead of silently dropping MCP team permissions - PR #30477
Drop the phantom 401 span on delegated OAuth2 tool calls - PR #30494
Default the Linear MCP registry entry to streamable HTTP - PR #30396
Preserve native tools in the semantic filter hook - PR #26650

Performance / Loadbalancing / Reliability improvements

Streaming connection hygiene - cancel the upstream Gemini request and release the httpx connection on client disconnect - PR #30075; close the upstream LLM stream when the client disconnects mid-stream - PR #30245; release the aiohttp connection when stream iteration ends abnormally - PR #30271; use e.request_data for logging_obj in ModifyResponseException streaming passthrough - PR #30800
Caching - add a valkey-semantic cache backend and fix semantic-cache scope keys - PR #30675; url-encode the object name in the GCS cache GET path - PR #30378; allow use_redis_transaction_buffer without a Redis cache - PR #28764
Router / fallbacks - resolve a list-unhashable crash on model alias - PR #30464; clean pattern_router state on upsert/delete - PR #29601; preserve the fallback model in SDK fallback responses - PR #28260; add expose_router_debug_in_errors (default True), which can be set to False to redact internal model_group/fallback names from errors - PR #30418
Startup / workers - fail fast on a non-PostgreSQL DATABASE_URL instead of hanging - PR #30366; add --max_requests_before_restart_jitter to stagger worker restarts - PR #30601; fix the IAM refresh-engine watcher race - PR #30183; release the cron pod-lock by matching async_set_cache JSON encoding - PR #30600
Health checks - correct Bedrock embedding health checks - PR #30583; bump the health-check max_tokens default to 16 for GPT-5 compatibility - PR #30708, PR #26610
Developer experience / CI - around 30 PRs hardening the lint and type-check gates (standardizing on basedpyright, dropping mypy, ratcheting any-discipline budgets), an osv-scanner lockfile workflow, zizmor PR gating, a local fake-OpenAI test endpoint replacing the shared mock, dependency bumps, and a pinned build toolchain.
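Two of the startup changes above describe simple mechanisms: rejecting a non-PostgreSQL DATABASE_URL up front, and staggering worker restarts with jitter. The sketch below illustrates both and is not LiteLLM's implementation. The accepted URL schemes, the base-plus-random-offset model for `--max_requests_before_restart_jitter`, and the example numbers are all assumptions.

```shell
#!/bin/sh
# 1) Fail fast when DATABASE_URL is not a PostgreSQL URL.
#    Accepted schemes (postgres://, postgresql://) are an assumption.
check_database_url() {
  case "$1" in
    postgres://*|postgresql://*) return 0 ;;
    *) echo "error: DATABASE_URL must be a PostgreSQL URL" >&2; return 1 ;;
  esac
}

# 2) Stagger worker restarts: each worker restarts after
#    base + a pseudo-random offset in [0, jitter] requests.
#    $1 = base, $2 = jitter, $3 = per-worker seed (e.g. worker id).
restart_threshold() {
  awk -v b="$1" -v j="$2" -v s="$3" \
    'BEGIN { srand(s); print b + int(rand() * (j + 1)) }'
}

check_database_url "${DATABASE_URL:-postgresql://localhost:5432/litellm}" && echo "DATABASE_URL ok"
for w in 1 2 3; do
  echo "worker $w restarts after $(restart_threshold 1000 100 "$w") requests"
done
```

Spreading restart thresholds this way keeps workers that started together from all recycling on the same request count.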

Documentation Updates

Add 1-click AWS/GCP Terraform deploy buttons and fix README deploy-button rendering - PR #29879
Strengthen the coding conventions in CLAUDE.md - PR #30333
Clarify the Linear portion of the PR template - PR #30766

New Contributors

@hannahmadison, @ayushh0110, @Dotify71, @munnr, @V-3604, @yrk111222, @Silvenga, @djmaze, @apshada, @HumphreySun98, @Harshxth, @tomoyat1, @S0ngRu1, @habonlaci, @moshemalawach, @nahrinoda, @Vedant-Agarwal, @lollinng, @anneheartrecord, @hdt12a1, @vineethsaivs, @krishvsoni, @rvishwas26, @santino18727-debug, @darktheorys, @songkuan-zheng, @Thijmen, @Kropiunig, @jay-tau, @KnyazSh, @koztkozt, @us, @Anuj7411, @zkryakgul, @lavish619, @EugeneLugovtsov, @Bochenski, @menardorama, @factnn, @semmons99, @nitishagar, @FadelT, @jho1-godaddy, @yucheng-berri, @ad1269, @shzdehmd, @vanika02, @Nithish-Yenaganti, @simantak-dabhade, @devYRPauli, @clpatterson, @tcconnally

Full Changelog

v1.89.0...v1.90.0-rc.1
