v1.90.0 - Six New Providers, OpenTelemetry v2 Parity & Streaming Reliability

Deploy this version

Docker
Pip

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.90.0

pip install litellm==1.90.0

Key Highlights

Six new providers - ModelScope, LibertAI, Parasail, Pinstripes, TinyFish (search), and FastCRW (search) - plus a new e2b code-execution sandbox primitive.
91 new models across Fireworks AI, Scaleway, Tensormesh, LibertAI, Azure AI (including gpt-5.5 and DeepSeek V4), and Bedrock Mantle.
OpenTelemetry v2 reaches metrics parity with v1, emitting the six gen_ai.client.* metrics, stamping input/output message content, and scoping OTLP credentials per tenant.
A broad streaming-reliability sweep: upstream connections are now released when the client disconnects mid-stream (Gemini, aiohttp), requests are cancelled cleanly, and partial spend is recorded on interrupted streams.
Two new guardrails (Cisco AI Defense, Repello Argus) and a large Next.js App Router UI migration covering the models, teams, users, organizations, api-keys, and usage pages.

App Router Routing

We're moving the Admin UI from query param based routing to the Nextjs App Router. The motivation is that routing now lives in the URL, so any view (a specific team, a filtered usage report, a single key) becomes a shareable link you can send to a teammate or bookmark instead of a piece of in-memory client state.

The motivation for this is twofold: it lays the groundwork for a lot of highly requested feaetures/improvements to the UI, and it lays the groundwork for contributing to LiteLLM easier and code more human reviewable for maintainers.

The biggest one here will be sharing links for different pages like specific logs pages, teams pages, and more.

New Providers and Endpoints

New Providers (6 new providers)

Provider	Supported LiteLLM Endpoints	Description
ModelScope (`modelscope`)	Chat Completions	OpenAI-compatible provider for ModelScope-hosted models - PR #28460
LibertAI (`libertai`)	Chat Completions, Embeddings	JSON-configured OpenAI-compatible provider; ships 12 catalog models including `bge-m3` embeddings - PR #30203
TinyFish (`tinyfish`)	Search	Web search provider - PR #30634
FastCRW (`fastcrw`)	Search	Web search provider - PR #30434
Parasail (`parasail`)	Chat Completions	OpenAI-compatible provider
Pinstripes (`pinstripes`)	Chat Completions	New chat provider; ships 6 catalog models

New LLM API Endpoints

Capability	Description	Documentation
Code execution (e2b)	New sandbox / code-interpreter primitive for running model-generated code - PR #30898	Sandbox

New Models / Updated Models

New Model Support (91 new models)

Provider	Model	Context	Input ($/1M)	Output ($/1M)	Features
Azure AI	`azure_ai/gpt-5.5`	1,050,000	$5	$30	reasoning, function calling, prompt caching, pdf, vision
Azure AI	`azure_ai/gpt-5.5-2026-04-23`	1,050,000	$5	$30	reasoning, function calling, prompt caching, pdf, vision
Azure AI	`azure_ai/deepseek-v4-flash`	1,000,000	$0.19	$0.51	reasoning, function calling
Azure AI	`azure_ai/deepseek-v4-pro`	1,000,000	$1.74	$3.48	reasoning, function calling
Azure AI	`azure_ai/deepseek-v3.1`	131,072	$1.23	$4.94	reasoning, function calling
Azure AI	`azure_ai/MAI-Image-2.5`	-	$5	-	image generation
Azure AI	`azure_ai/MAI-Image-2.5-Flash`	-	$1.75	-	image generation
Azure AI	`azure_ai/MAI-Image-2e`	-	$5	-	image generation
Azure	`azure/gpt-realtime-whisper`	-	-	-	audio transcription
OpenAI	`gpt-realtime-whisper`	-	-	-	audio transcription
DeepSeek	`deepseek-v4-flash` / `deepseek/deepseek-v4-flash`	1,000,000	$0.14	$0.28	function calling, prompt caching
DeepSeek	`deepseek-v4-pro` / `deepseek/deepseek-v4-pro`	1,000,000	$0.43	$0.87	function calling, prompt caching
Mistral	`mistral/mistral-medium-3-5`	262,144	$1.50	$7.50	function calling, vision
GitHub Copilot	`github_copilot/mai-code-1-flash`	128,000	$0.75	$4.50	function calling
Fireworks AI	24 models incl. `deepseek-v4-pro`, `glm-5p2`, `kimi-k2p6`/`kimi-k2p7-code`, `minimax-m3`, `qwen3p7-plus`, `gpt-oss-120b`/`gpt-oss-20b`	up to 1,048,576	$0.07-$2.80	$0.28-$8.80	function calling, reasoning, vision
Bedrock Mantle	`bedrock_mantle/google.gemma-4-26b-a4b` / `gemma-4-31b` / `gemma-4-e2b`	128k-256k	$0.04-$0.14	$0.08-$0.40	function calling, reasoning, vision
LibertAI	12 models incl. `qwen3.6-35b-a3b(-thinking)`, `gemma-4-31b-it(-thinking)`, `deepseek-v4-flash`, `bge-m3`	up to 262,144	$0.01-$0.25	free-$1.75	function calling, reasoning, vision, embedding
Pinstripes	6 models incl. `ps/minimax-m2.7`, `ps/qwen3.6-35b-a3b`, `ps/glm-4.5-air`, `ps/deepseek-v4-flash`	up to 1,000,192	$0.09-$0.30	$0.20-$0.60	function calling, reasoning
Scaleway	17 models incl. `qwen3.5-397b-a17b`, `mistral-medium-3.5-128b`, `gemma-4-26b-a4b-it`, `gpt-oss-120b`, `whisper-large-v3`	up to 256,000	free-$1.50	free-$7.50	function calling, reasoning, vision, audio, embedding
Tensormesh	10 models incl. `Qwen3-Coder-480B-A35B-FP8`, `Qwen3.5-397B-A17B-FP8`, `Kimi-K2.6`, `DeepSeek-V4-Flash`, `gpt-oss-120b`/`gpt-oss-20b`	up to 262,144	$0.07-$1.40	$0.28-$4.40	function calling, reasoning, prompt caching
Soniox	`soniox/stt-async-v5`	8,000	-	-	audio transcription
TinyFish	`tinyfish/search`	-	-	-	search

The 91 new entries also include the full fireworks_ai/accounts/... model and router paths. Claude Fable 5 already shipped in v1.89.0, so it is not counted here. Full diff: model_prices_and_context_window.json.

Features

Anthropic
- Surface compaction usage iterations data - PR #27065
- Serve Anthropic-native /v1/models for Claude Code gateway discovery - PR #30273
OpenRouter
- Map reasoning max level to xhigh - PR #28881
Bedrock
- Optionally forward multimodal content blocks in AgentCore InvokeAgentRuntime - PR #28885
- Support file content retrieval for batch output files - PR #30595
- Make Bedrock Mantle Responses routing data-driven per model - PR #30700
DashScope
- Add Responses API support - PR #30286
OCI
- Make Cohere {{trace}} judges work (tool param types + agentic tool-calling continuation) - PR #30646

Bug Fixes

Anthropic
- Apply cache_control_injection_points on the /v1/messages path - PR #30341
- Strip LiteLLM-injected total_tokens from /v1/messages responses - PR #30382
- Cap cache_control injection at 4 blocks - PR #30480
- Drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients - PR #30486
- Don't leak tool type into OpenAI function parameters schema - PR #30618
Bedrock
- Preserve cache_control for ARN models in the /v1/messages adapter - PR #29823
- Handle role: "system" inside the messages array on /v1/messages - PR #30443
- Use a unique function-call id for Bedrock Mantle responses->chat tool calls - PR #30426
- Add SigV4 fallback to Bedrock Mantle chat completions auth - PR #30714
Gemini / Vertex AI
- Use get_vertex_base_url for cachedContents host - PR #29707
- Buffer native Gemini SSE frames - PR #30225
- Map Gemini upstream-error body code 429 to RateLimitError - PR #30417
- Ensure checks show gemini-3-flash-preview supports responseJsonSchema - PR #30696
OpenAI-compatible
- Preserve cache_control for OpenAI-compatible custom endpoints - PR #30387
- hosted_vllm: remove thinking_blocks and convert list content to strings - PR #30475
- Don't stack provider prefix on wildcard models with a custom prefix - PR #30360
WatsonX
- Wrap string embedding input in an array for the WatsonX API - PR #30897
Pricing / Cost map
- Add cost mapping for deepseek-v4-flash/deepseek-v4-pro - PR #27056
- Add mistral-medium-3-5 to the cost map - PR #29303
- Add azure_ai/gpt-5.5 to the model cost map - PR #30428
- Add GitHub Copilot MAI Code Flash pricing - PR #30415
- Sync the Fireworks AI model registry with the current platform catalog - PR #30616
- Add soniox/stt-async-v5 - PR #30672
- Correct swapped input/output token costs for command-r7b-12-2024 - PR #30413
- Add 1h cache-write cost for Anthropic Sonnet 4.5/4.6 - PR #30474
- Route Volcengine (Doubao) tiered-pricing models to the tiered cost handler - PR #30357; sort tiered thresholds numerically - PR #30375; treat a DashScope explicit 0.0 tier cost as a real price - PR #30653
- Drop synthesized zero costs in register_model to preserve sparse entries - PR #30201

LLM API Endpoints

Features

Responses API
- Propagate completed_response through FallbackResponsesStreamWrapper for streaming /v1/responses container ownership - PR #30213
/v1/models
- Surface max_input_tokens/max_output_tokens on /v1/models - PR #30272
- Include model group aliases in v1 model info - PR #30626
Realtime
- Allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes - PR #30089
Files
- Attach existing OpenAI file ids - PR #30628

Bugs

General
- Token counter: handle Anthropic tool_reference blocks to stop dropped spend logs - PR #30302
- Streaming: guard raise_on_model_repetition against empty choices - PR #30485
- Audio: don't override an explicit response_format with verbose_json - PR #30599
- Validate the resolved model in /realtime/client_secrets for non-transcription sessions - PR #30710

Management Endpoints / UI

Features

App Router migration - models - PR #30677, teams - PR #30343, users - PR #30334, organizations - PR #30336, api-keys - PR #30699, usage report - PR #30694, agents + router-settings - PR #30323
UI cleanup - remove the unreachable /chat page - PR #30178, dead UI components - PR #30340, orphaned pass-through-settings route - PR #30692; remove in-product survey and feedback nudges - PR #30773
Virtual Keys - expose per-model budget usage in /key/info - PR #30394; grace-period key rotation returns the deprecated-key lookup result on 401 - PR #30327
Teams / Orgs - add key_limit query param to /team/info - PR #30006; list public team model names in /v1/models - PR #30588
Proxy CLI Auth - add verification_uri_complete to the CLI SSO device flow - PR #30571
Proxy - configurable response headers and login-page hint - PR #30792; gate the "Default Credentials" hint on /ui/login behind an env flag - PR #30234

Bugs

Access control / keys
- /key/list now does exact user_id/key_alias matching by default, preventing cross-user key disclosure - PR #30593
- Restrict /customer/daily/activity to admin-only - PR #28849
- org_admin sees all org teams when the UI sends its own user_id - PR #30247
- Allow internal roles to access vector store CRUD routes - PR #30503
- Require premium only when enabling premium metadata fields - PR #30506
- Guard check_and_fix_namespace against a None key - PR #30435
- Warn at startup when custom_auth skips common_checks enforcement - PR #30665
- Resolve list-files credentials from team BYOK deployments - PR #30495; preserve azure_ad_token through CredentialLiteLLMParams for /v1/files + batches - PR #30241
- Enforce budget for models not in the cost map - PR #24949
UI
- Stop the Virtual Keys page from an infinite render loop - PR #30397
- Source api-keys identity from useAuthorized to stop "User ID is not set" - PR #30903
- Render logos correctly under a custom server_root_path - PR #31156
- Warn that team models are deleted in the delete-team modal - PR #29990
- Three small fixes - Gemini api_base, credential form reset, Mode badge - PR #30419
- Repoint the dead usage-guide link to cost-tracking docs - PR #30859
Proxy
- Support SMTP implicit SSL (port 465) - PR #30395

AI Integrations

Logging

OpenTelemetry
- Emit the six gen_ai.client.* metrics at v1 parity in v2 - PR #30326
- One v2 logger owns the global provider; scope tenant OTLP creds per exporter - PR #30590
- Export v2 gen_ai client metrics to the configured meter provider - PR #30549
- Stamp gen_ai.input/output.messages on v2 spans - PR #30548
- Cap metric attribute cardinality with include/exclude lists - PR #30257
- Record the full error message on the standard exception event in v2 - PR #30380
- Accept UPPER_SNAKE_CASE OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT in v2 - PR #30562
General
- Preserve error_message on ProxyException failures in spend logs - PR #30381

Guardrails

Cisco AI Defense - new integration - PR #28249
Repello Argus - new integration - PR #30465
Presidio - add missing UK PII entity types - PR #30537; don't mask the live request when the guardrail is logging_only - PR #30461
AIM - return 400 not 500 when AIM blocks a request - PR #30573
General
- Stop re-initializing DB guardrails on every poll - PR #30542
- Run the pre_call hook once for model-level guardrails - PR #30543
- disable_global_guardrails overrides the team list - PR #28563
- Surface OpenAI moderation violation_categories on guardrail traces - PR #30659

Secret Managers

AWS Secrets Manager - cross-region replication - PR #30368

Spend Tracking, Budgets and Rate Limiting

Service-tier pricing - apply the service_tier suffix to above-threshold cache rates and expose priority+threshold keys in ModelInfo - PR #30450; price and surface the Anthropic response service_tier in cost tracking - PR #30558; stop non-string service_tier from silently dropping cost tracking - PR #30690, PR #30706
Budgets - enforce budgets against authoritative DB spend when the cross-pod counter is stale - PR #30684; release a budget reservation when a request is cancelled mid-flight - PR #30522; recompute budget_reset_at when budget_duration changes - PR #30555
Rate limiting - prevent internal parallel_request_limiter fields from leaking to upstream providers - PR #30545
Spend accuracy - record partial spend on the failure row for interrupted streams - PR #30788; recover output tokens for interrupted Anthropic streams - PR #30787; stop Perplexity double-billing reasoning tokens in the manual cost fallback - PR #30488; correct cached-token usage with ChatCompletionUsageBlock - PR #30422
Usage aggregation - drain all daily-spend batches per flush cycle - PR #30505; show session-aggregate cost and duration in request logs - PR #30507; coalesce null aggregates for no-spend keys - PR #29945; remove timezone date expansion in daily-activity aggregation - PR #29569

MCP Gateway

Make the MCP gateway name and description configurable via env vars - PR #30473
Fail closed when the scope filter resolves to no servers - PR #30353
Re-raise instead of silently dropping MCP team permissions - PR #30477
Drop the phantom 401 span on delegated OAuth2 tool calls - PR #30494
Challenge delegate-auth OAuth servers with the upstream resource_metadata - PR #31255
Default the Linear MCP registry entry to streamable HTTP - PR #30396
Preserve native tools in the semantic filter hook - PR #26650

Performance / Loadbalancing / Reliability improvements

Streaming connection hygiene - cancel the upstream Gemini request and release the httpx connection on client disconnect - PR #30075; close the upstream LLM stream when the client disconnects mid-stream - PR #30245; release the aiohttp connection when stream iteration ends abnormally - PR #30271; use e.request_data for logging_obj in ModifyResponseException streaming passthrough - PR #30800
Caching - add a valkey-semantic cache backend and fix semantic-cache scope keys - PR #30675; url-encode the object name in the GCS cache GET path - PR #30378; allow use_redis_transaction_buffer without a Redis cache - PR #28764
Router / fallbacks - resolve a list-unhashable crash on model alias - PR #30464; clean pattern_router state on upsert/delete - PR #29601; preserve the fallback model in SDK fallback responses - PR #28260; add expose_router_debug_in_errors (default True) to redact internal model_group/fallback names - PR #30418
Startup / workers - fail fast on a non-PostgreSQL DATABASE_URL instead of hanging - PR #30366; add --max_requests_before_restart_jitter to stagger worker restarts - PR #30601; fix the IAM refresh-engine watcher race - PR #30183; release the cron pod-lock by matching async_set_cache JSON encoding - PR #30600
Health checks - correct Bedrock embedding health checks - PR #30583; bump the health-check max_tokens default to 16 for GPT-5 compatibility - PR #30708, PR #26610
Developer experience / CI - around 30 PRs hardening the lint and type-check gates (standardizing on basedpyright, dropping mypy, ratcheting any-discipline budgets), an osv-scanner lockfile workflow, zizmor PR gating, a local fake-OpenAI test endpoint replacing the shared mock, dependency bumps, and a pinned build toolchain.

Documentation Updates

Add 1-click AWS/GCP Terraform deploy buttons and fix README deploy-button rendering - PR #29879
Strengthen the coding conventions in CLAUDE.md - PR #30333
Clarify the Linear portion of the PR template - PR #30766

New Contributors

@hannahmadison, @ayushh0110, @Dotify71, @munnr, @V-3604, @yrk111222, @Silvenga, @djmaze, @apshada, @HumphreySun98, @Harshxth, @tomoyat1, @S0ngRu1, @habonlaci, @moshemalawach, @nahrinoda, @Vedant-Agarwal, @lollinng, @anneheartrecord, @hdt12a1, @vineethsaivs, @krishvsoni, @rvishwas26, @santino18727-debug, @darktheorys, @songkuan-zheng, @Thijmen, @Kropiunig, @jay-tau, @KnyazSh, @koztkozt, @us, @Anuj7411, @zkryakgul, @lavish619, @EugeneLugovtsov, @Bochenski, @menardorama, @factnn, @semmons99, @nitishagar, @FadelT, @jho1-godaddy, @yucheng-berri, @ad1269, @shzdehmd, @vanika02, @Nithish-Yenaganti, @simantak-dabhade, @devYRPauli, @clpatterson, @tcconnally

Full Changelog

v1.89.0...v1.90.0

Deploy this version​

Key Highlights​

App Router Routing​

New Providers and Endpoints​

New Providers (6 new providers)​

New LLM API Endpoints​

New Models / Updated Models​

New Model Support (91 new models)​

Features​

Bug Fixes​

LLM API Endpoints​

Features​

Bugs​

Management Endpoints / UI​

Features​

Bugs​

AI Integrations​

Logging​

Guardrails​

Secret Managers​

Spend Tracking, Budgets and Rate Limiting​

MCP Gateway​

Performance / Loadbalancing / Reliability improvements​

Documentation Updates​

New Contributors​

Full Changelog​

Deploy this version

Key Highlights

App Router Routing

New Providers and Endpoints

New Providers (6 new providers)

New LLM API Endpoints

New Models / Updated Models

New Model Support (91 new models)

Features

Bug Fixes

LLM API Endpoints

Features

Bugs

Management Endpoints / UI

Features

Bugs

AI Integrations

Logging

Guardrails

Secret Managers

Spend Tracking, Budgets and Rate Limiting

MCP Gateway

Performance / Loadbalancing / Reliability improvements

Documentation Updates

New Contributors

Full Changelog