
v1.90.0rc1 - Six New Providers, OpenTelemetry v2 Parity & Streaming Reliability

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:1.90.0-rc.1

Key Highlights

v1.90.0rc1 is the current release candidate for 1.90.0.

  • Six new providers - ModelScope, LibertAI, Parasail, Pinstripes, TinyFish (search), and FastCRW (search) - plus a new e2b code-execution sandbox primitive.
  • 91 new models across Fireworks AI, Scaleway, Tensormesh, LibertAI, Azure AI (including gpt-5.5 and DeepSeek V4), and Bedrock Mantle.
  • OpenTelemetry v2 reaches metrics parity with v1, emitting the six gen_ai.client.* metrics, stamping input/output message content, and scoping OTLP credentials per tenant.
  • A broad streaming-reliability sweep: upstream connections are now released when the client disconnects mid-stream (Gemini, aiohttp), requests are cancelled cleanly, and partial spend is recorded on interrupted streams.
  • Two new guardrails (Cisco AI Defense, Repello Argus) and a large Next.js App Router UI migration covering the models, teams, users, organizations, api-keys, and usage pages.

New Providers and Endpoints

New Providers (6 new providers)

| Provider | Supported LiteLLM Endpoints | Description |
| --- | --- | --- |
| ModelScope (modelscope) | Chat Completions | OpenAI-compatible provider for ModelScope-hosted models - PR #28460 |
| LibertAI (libertai) | Chat Completions, Embeddings | JSON-configured OpenAI-compatible provider; ships 12 catalog models including bge-m3 embeddings - PR #30203 |
| TinyFish (tinyfish) | Search | Web search provider - PR #30634 |
| FastCRW (fastcrw) | Search | Web search provider - PR #30434 |
| Parasail (parasail) | Chat Completions | OpenAI-compatible provider |
| Pinstripes (pinstripes) | Chat Completions | New chat provider; ships 6 catalog models |

New LLM API Endpoints

| Capability | Description | Documentation |
| --- | --- | --- |
| Code execution (e2b) | New sandbox / code-interpreter primitive for running model-generated code - PR #30898 | Sandbox |

New Models / Updated Models

New Model Support (91 new models)

| Provider | Model | Context | Input ($/1M) | Output ($/1M) | Features |
| --- | --- | --- | --- | --- | --- |
| Azure AI | azure_ai/gpt-5.5 | 1,050,000 | $5 | $30 | reasoning, function calling, prompt caching, pdf, vision |
| Azure AI | azure_ai/gpt-5.5-2026-04-23 | 1,050,000 | $5 | $30 | reasoning, function calling, prompt caching, pdf, vision |
| Azure AI | azure_ai/deepseek-v4-flash | 1,000,000 | $0.19 | $0.51 | reasoning, function calling |
| Azure AI | azure_ai/deepseek-v4-pro | 1,000,000 | $1.74 | $3.48 | reasoning, function calling |
| Azure AI | azure_ai/deepseek-v3.1 | 131,072 | $1.23 | $4.94 | reasoning, function calling |
| Azure AI | azure_ai/MAI-Image-2.5 | - | $5 | - | image generation |
| Azure AI | azure_ai/MAI-Image-2.5-Flash | - | $1.75 | - | image generation |
| Azure AI | azure_ai/MAI-Image-2e | - | $5 | - | image generation |
| Azure | azure/gpt-realtime-whisper | - | - | - | audio transcription |
| OpenAI | gpt-realtime-whisper | - | - | - | audio transcription |
| DeepSeek | deepseek-v4-flash / deepseek/deepseek-v4-flash | 1,000,000 | $0.14 | $0.28 | function calling, prompt caching |
| DeepSeek | deepseek-v4-pro / deepseek/deepseek-v4-pro | 1,000,000 | $0.43 | $0.87 | function calling, prompt caching |
| Mistral | mistral/mistral-medium-3-5 | 262,144 | $1.50 | $7.50 | function calling, vision |
| GitHub Copilot | github_copilot/mai-code-1-flash | 128,000 | $0.75 | $4.50 | function calling |
| Fireworks AI | 24 models incl. deepseek-v4-pro, glm-5p2, kimi-k2p6/kimi-k2p7-code, minimax-m3, qwen3p7-plus, gpt-oss-120b/gpt-oss-20b | up to 1,048,576 | $0.07-$2.80 | $0.28-$8.80 | function calling, reasoning, vision |
| Bedrock Mantle | bedrock_mantle/google.gemma-4-26b-a4b / gemma-4-31b / gemma-4-e2b | 128k-256k | $0.04-$0.14 | $0.08-$0.40 | function calling, reasoning, vision |
| LibertAI | 12 models incl. qwen3.6-35b-a3b(-thinking), gemma-4-31b-it(-thinking), deepseek-v4-flash, bge-m3 | up to 262,144 | $0.01-$0.25 | free-$1.75 | function calling, reasoning, vision, embedding |
| Pinstripes | 6 models incl. ps/minimax-m2.7, ps/qwen3.6-35b-a3b, ps/glm-4.5-air, ps/deepseek-v4-flash | up to 1,000,192 | $0.09-$0.30 | $0.20-$0.60 | function calling, reasoning |
| Scaleway | 17 models incl. qwen3.5-397b-a17b, mistral-medium-3.5-128b, gemma-4-26b-a4b-it, gpt-oss-120b, whisper-large-v3 | up to 256,000 | free-$1.50 | free-$7.50 | function calling, reasoning, vision, audio, embedding |
| Tensormesh | 10 models incl. Qwen3-Coder-480B-A35B-FP8, Qwen3.5-397B-A17B-FP8, Kimi-K2.6, DeepSeek-V4-Flash, gpt-oss-120b/gpt-oss-20b | up to 262,144 | $0.07-$1.40 | $0.28-$4.40 | function calling, reasoning, prompt caching |
| Soniox | soniox/stt-async-v5 | 8,000 | - | - | audio transcription |
| TinyFish | tinyfish/search | - | - | - | search |
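To make the per-1M-token price columns concrete, here is a minimal sketch of turning them into a per-request cost. The request_cost helper and PRICES_PER_1M table are hypothetical illustrations, not LiteLLM's cost calculator; the sketch ignores prompt caching, tiered pricing, and service tiers, all of which the real cost map can apply.

```python
from decimal import Decimal

# Hypothetical price table: (input, output) USD per 1M tokens, copied from rows above.
PRICES_PER_1M = {
    "azure_ai/gpt-5.5": (Decimal("5"), Decimal("30")),
    "deepseek/deepseek-v4-flash": (Decimal("0.14"), Decimal("0.28")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request from per-1M-token list prices.

    Decimal keeps the money arithmetic exact instead of accumulating float error.
    """
    input_price, output_price = PRICES_PER_1M[model]
    return (input_price * input_tokens + output_price * output_tokens) / Decimal(1_000_000)


print(request_cost("azure_ai/gpt-5.5", 10_000, 2_000))  # 0.11
```

With the listed prices, 10,000 input tokens and 2,000 output tokens on azure_ai/gpt-5.5 come to $0.05 + $0.06 = $0.11.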

The 91 new entries also include the full fireworks_ai/accounts/... model and router paths. Claude Fable 5 already shipped in v1.89.0, so it is not counted here. Full diff: model_prices_and_context_window.json.

Features

  • Anthropic
    • Surface compaction usage iterations data - PR #27065
    • Serve Anthropic-native /v1/models for Claude Code gateway discovery - PR #30273
  • OpenRouter
    • Map reasoning max level to xhigh - PR #28881
  • Bedrock
    • Optionally forward multimodal content blocks in AgentCore InvokeAgentRuntime - PR #28885
    • Support file content retrieval for batch output files - PR #30595
    • Make Bedrock Mantle Responses routing data-driven per model - PR #30700
  • DashScope
  • OCI
    • Make Cohere {{trace}} judges work (tool param types + agentic tool-calling continuation) - PR #30646

Bug Fixes

  • Anthropic
    • Apply cache_control_injection_points on the /v1/messages path - PR #30341
    • Strip LiteLLM-injected total_tokens from /v1/messages responses - PR #30382
    • Cap cache_control injection at 4 blocks - PR #30480
    • Drop orphaned server_tool_use on multi-turn replay from generic OpenAI clients - PR #30486
    • Don't leak tool type into OpenAI function parameters schema - PR #30618
  • Bedrock
    • Preserve cache_control for ARN models in the /v1/messages adapter - PR #29823
    • Handle role: "system" inside the messages array on /v1/messages - PR #30443
    • Use a unique function-call id for Bedrock Mantle responses->chat tool calls - PR #30426
    • Add SigV4 fallback to Bedrock Mantle chat completions auth - PR #30714
  • Gemini / Vertex AI
    • Use get_vertex_base_url for cachedContents host - PR #29707
    • Buffer native Gemini SSE frames - PR #30225
    • Map Gemini upstream-error body code 429 to RateLimitError - PR #30417
    • Ensure capability checks report that gemini-3-flash-preview supports responseJsonSchema - PR #30696
  • OpenAI-compatible
    • Preserve cache_control for OpenAI-compatible custom endpoints - PR #30387
    • hosted_vllm: remove thinking_blocks and convert list content to strings - PR #30475
    • Don't stack provider prefix on wildcard models with a custom prefix - PR #30360
  • WatsonX
    • Wrap string embedding input in an array for the WatsonX API - PR #30897
  • Pricing / Cost map
    • Add cost mapping for deepseek-v4-flash/deepseek-v4-pro - PR #27056
    • Add mistral-medium-3-5 to the cost map - PR #29303
    • Add azure_ai/gpt-5.5 to the model cost map - PR #30428
    • Add GitHub Copilot MAI Code Flash pricing - PR #30415
    • Sync the Fireworks AI model registry with the current platform catalog - PR #30616
    • Add soniox/stt-async-v5 - PR #30672
    • Correct swapped input/output token costs for command-r7b-12-2024 - PR #30413
    • Add 1h cache-write cost for Anthropic Sonnet 4.5/4.6 - PR #30474
    • Route Volcengine (Doubao) tiered-pricing models to the tiered cost handler - PR #30357; sort tiered thresholds numerically - PR #30375; treat a DashScope explicit 0.0 tier cost as a real price - PR #30653
    • Drop synthesized zero costs in register_model to preserve sparse entries - PR #30201
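The 4-block cap on cache_control injection (PR #30480) lines up with Anthropic's documented limit of four cache breakpoints per request. A minimal sketch of injection that respects the cap, assuming OpenAI-style message dicts; inject_cache_control and MAX_CACHE_BREAKPOINTS are hypothetical names, not LiteLLM's cache_control_injection_points implementation.

```python
MAX_CACHE_BREAKPOINTS = 4  # Anthropic accepts at most 4 cache breakpoints per request


def inject_cache_control(messages: list, targets: list) -> list:
    """Mark the last content block of each targeted message, never exceeding the cap.

    Mutates and returns `messages`. Hypothetical helper for illustration only.
    """
    # Count breakpoints the caller already set, so injection never pushes past the cap.
    used = sum(
        1
        for msg in messages
        if isinstance(msg.get("content"), list)
        for block in msg["content"]
        if "cache_control" in block
    )
    for idx in targets:
        if used >= MAX_CACHE_BREAKPOINTS:
            break  # budget exhausted: leave remaining targets unmarked
        content = messages[idx].get("content")
        if isinstance(content, str):
            # Plain-string content must become a block list to carry cache_control.
            content = [{"type": "text", "text": content}]
            messages[idx]["content"] = content
        if content and "cache_control" not in content[-1]:
            content[-1]["cache_control"] = {"type": "ephemeral"}
            used += 1
    return messages
```

Counting pre-existing markers first matters: a client that already set one breakpoint leaves room for only three injected ones.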

LLM API Endpoints

Features

  • Responses API
    • Propagate completed_response through FallbackResponsesStreamWrapper for streaming /v1/responses container ownership - PR #30213
  • /v1/models
    • Surface max_input_tokens/max_output_tokens on /v1/models - PR #30272
    • Include model group aliases in v1 model info - PR #30626
  • Realtime
    • Allow non-admin virtual keys to call GA Realtime WebRTC HTTP routes - PR #30089
  • Files
    • Attach existing OpenAI file ids - PR #30628

Bugs

  • General
    • Token counter: handle Anthropic tool_reference blocks to stop dropped spend logs - PR #30302
    • Streaming: guard raise_on_model_repetition against empty choices - PR #30485
    • Audio: don't override an explicit response_format with verbose_json - PR #30599
    • Validate the resolved model in /realtime/client_secrets for non-transcription sessions - PR #30710
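Streaming responses can contain chunks whose choices list is empty; OpenAI, for example, sends a final usage-only chunk when stream_options include_usage is set. A rough sketch of a repetition check that skips such chunks instead of indexing choices[0]; check_repetition and ModelRepetitionError are hypothetical stand-ins for raise_on_model_repetition, whose actual logic may differ.

```python
class ModelRepetitionError(Exception):
    """Raised when the same content delta repeats too many times in a row (hypothetical)."""


def check_repetition(chunks: list, max_repeats: int = 3) -> None:
    last_text = None
    run = 0
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            # Usage-only or keep-alive chunks: skip rather than crash on choices[0].
            continue
        text = choices[0].get("delta", {}).get("content")
        if not text:
            continue
        # Extend the run on an identical delta, otherwise start a new run.
        run = run + 1 if text == last_text else 1
        last_text = text
        if run > max_repeats:
            raise ModelRepetitionError(f"chunk {text!r} repeated {run} times")
```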

Management Endpoints / UI

Features

  • App Router migration - models - PR #30677, teams - PR #30343, users - PR #30334, organizations - PR #30336, api-keys - PR #30699, usage report - PR #30694, agents + router-settings - PR #30323
  • UI cleanup - remove the unreachable /chat page - PR #30178, dead UI components - PR #30340, orphaned pass-through-settings route - PR #30692; remove in-product survey and feedback nudges - PR #30773
  • Virtual Keys - expose per-model budget usage in /key/info - PR #30394; grace-period key rotation returns the deprecated-key lookup result on 401 - PR #30327
  • Teams / Orgs - add key_limit query param to /team/info - PR #30006; list public team model names in /v1/models - PR #30588
  • Proxy CLI Auth - add verification_uri_complete to the CLI SSO device flow - PR #30571
  • Proxy - configurable response headers and login-page hint - PR #30792; gate the "Default Credentials" hint on /ui/login behind an env flag - PR #30234

Bugs

  • Access control / keys
    • /key/list now does exact user_id/key_alias matching by default, preventing cross-user key disclosure - PR #30593
    • Restrict /customer/daily/activity to admin-only - PR #28849
    • org_admin sees all org teams when the UI sends its own user_id - PR #30247
    • Allow internal roles to access vector store CRUD routes - PR #30503
    • Require premium only when enabling premium metadata fields - PR #30506
    • Guard check_and_fix_namespace against a None key - PR #30435
    • Warn at startup when custom_auth skips common_checks enforcement - PR #30665
    • Resolve list-files credentials from team BYOK deployments - PR #30495; preserve azure_ad_token through CredentialLiteLLMParams for /v1/files + batches - PR #30241
    • Enforce budget for models not in the cost map - PR #24949
  • UI
    • Fix an infinite render loop on the Virtual Keys page - PR #30397
    • Source api-keys identity from useAuthorized to stop "User ID is not set" - PR #30903
    • Warn that team models are deleted in the delete-team modal - PR #29990
    • Three small fixes - Gemini api_base, credential form reset, Mode badge - PR #30419
    • Repoint the dead usage-guide link to cost-tracking docs - PR #30859
  • Proxy
    • Support SMTP implicit SSL (port 465) - PR #30395
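Why exact matching matters for /key/list (PR #30593): a substring filter on user_id "alice" also matches "alice2", exposing another user's keys. A minimal sketch contrasting the two behaviors; filter_keys and the key-record dicts are hypothetical, not the proxy's query code.

```python
from typing import Optional


def _matches(value: Optional[str], wanted: Optional[str], exact: bool) -> bool:
    if wanted is None:
        return True  # no filter requested for this field
    if value is None:
        return False
    # Exact equality vs. the leaky substring behavior.
    return value == wanted if exact else wanted in value


def filter_keys(keys: list, user_id: Optional[str] = None,
                key_alias: Optional[str] = None, exact: bool = True) -> list:
    """Return key records matching every supplied filter (hypothetical helper)."""
    return [
        k for k in keys
        if _matches(k.get("user_id"), user_id, exact)
        and _matches(k.get("key_alias"), key_alias, exact)
    ]
```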

AI Integrations

Logging

  • OpenTelemetry
    • Emit the six gen_ai.client.* metrics at v1 parity in v2 - PR #30326
    • One v2 logger owns the global provider; scope tenant OTLP creds per exporter - PR #30590
    • Export v2 gen_ai client metrics to the configured meter provider - PR #30549
    • Stamp gen_ai.input/output.messages on v2 spans - PR #30548
    • Cap metric attribute cardinality with include/exclude lists - PR #30257
    • Record the full error message on the standard exception event in v2 - PR #30380
    • Accept UPPER_SNAKE_CASE OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT in v2 - PR #30562
  • General
    • Preserve error_message on ProxyException failures in spend logs - PR #30381
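Metric attributes with unbounded values (per-user or per-key identifiers, for instance) multiply the number of time series a metrics backend must store. A minimal sketch of include/exclude filtering applied before a metric is recorded; filter_metric_attributes and the key_hash attribute are hypothetical, and how PR #30257 names and configures its lists is not shown here.

```python
from typing import Iterable, Optional


def filter_metric_attributes(attributes: dict,
                             include: Optional[Iterable[str]] = None,
                             exclude: Optional[Iterable[str]] = None) -> dict:
    """Keep only allow-listed keys (if include is given), then drop deny-listed keys."""
    result = dict(attributes)
    if include is not None:
        allowed = set(include)
        result = {k: v for k, v in result.items() if k in allowed}
    if exclude is not None:
        denied = set(exclude)
        result = {k: v for k, v in result.items() if k not in denied}
    return result


attrs = {
    "gen_ai.request.model": "gpt-5.5",
    "gen_ai.operation.name": "chat",
    "key_hash": "abc123",  # hypothetical high-cardinality attribute
}
print(filter_metric_attributes(attrs, exclude=["key_hash"]))
```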

Guardrails

  • Cisco AI Defense - new integration - PR #28249
  • Repello Argus - new integration - PR #30465
  • Presidio - add missing UK PII entity types - PR #30537; don't mask the live request when the guardrail is logging_only - PR #30461
  • AIM - return 400 not 500 when AIM blocks a request - PR #30573
  • General
    • Stop re-initializing DB guardrails on every poll - PR #30542
    • Run the pre_call hook once for model-level guardrails - PR #30543
    • disable_global_guardrails overrides the team list - PR #28563
    • Surface OpenAI moderation violation_categories on guardrail traces - PR #30659
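One common way to avoid re-initializing unchanged guardrails on every DB poll (PR #30542) is to fingerprint each stored config and rebuild only entries whose fingerprint changed. The sketch below assumes that approach; GuardrailRegistry and its methods are hypothetical, and LiteLLM's actual change may detect changes differently.

```python
import hashlib
import json


class GuardrailRegistry:
    """Hypothetical registry that re-initializes a guardrail only when its DB config changes."""

    def __init__(self) -> None:
        self._fingerprints = {}  # guardrail name -> config hash
        self.init_count = 0

    @staticmethod
    def _fingerprint(config: dict) -> str:
        # sort_keys makes the hash independent of dict key order.
        return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

    def sync(self, db_rows: dict) -> None:
        """Called on every poll with {name: config} rows read from the DB."""
        for name, config in db_rows.items():
            fp = self._fingerprint(config)
            if self._fingerprints.get(name) != fp:
                self._initialize(name, config)
                self._fingerprints[name] = fp
        # Forget guardrails that were deleted from the DB.
        for name in set(self._fingerprints) - set(db_rows):
            del self._fingerprints[name]

    def _initialize(self, name: str, config: dict) -> None:
        self.init_count += 1  # stand-in for constructing the guardrail client
```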

Secret Managers

Spend Tracking, Budgets and Rate Limiting

  • Service-tier pricing - apply the service_tier suffix to above-threshold cache rates and expose priority+threshold keys in ModelInfo - PR #30450; price and surface the Anthropic response service_tier in cost tracking - PR #30558; stop non-string service_tier from silently dropping cost tracking - PR #30690, PR #30706
  • Budgets - enforce budgets against authoritative DB spend when the cross-pod counter is stale - PR #30684; release a budget reservation when a request is cancelled mid-flight - PR #30522; recompute budget_reset_at when budget_duration changes - PR #30555
  • Rate limiting - prevent internal parallel_request_limiter fields from leaking to upstream providers - PR #30545
  • Spend accuracy - record partial spend on the failure row for interrupted streams - PR #30788; recover output tokens for interrupted Anthropic streams - PR #30787; stop Perplexity double-billing reasoning tokens in the manual cost fallback - PR #30488; correct cached-token usage with ChatCompletionUsageBlock - PR #30422
  • Usage aggregation - drain all daily-spend batches per flush cycle - PR #30505; show session-aggregate cost and duration in request logs - PR #30507; coalesce null aggregates for no-spend keys - PR #29945; remove timezone date expansion in daily-activity aggregation - PR #29569
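The budget fix in PR #30684 falls back to authoritative DB spend when the cross-pod counter is stale. A minimal sketch of that decision, assuming a counter that records when it was last synced; SpendCounter, effective_spend, is_over_budget, and the 60-second staleness window are all hypothetical choices for illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpendCounter:
    """Hypothetical shared (cross-pod) spend counter, e.g. cached in Redis."""
    value: float
    updated_at: float  # unix timestamp of the last sync


def effective_spend(counter: SpendCounter, db_spend: float,
                    max_staleness_s: float = 60.0, now: Optional[float] = None) -> float:
    """Pick the spend figure to enforce a budget against."""
    now = time.time() if now is None else now
    if now - counter.updated_at > max_staleness_s:
        # Stale counter: trust the DB total, but never go below what the counter saw.
        return max(counter.value, db_spend)
    return counter.value


def is_over_budget(counter: SpendCounter, db_spend: float, max_budget: float,
                   now: Optional[float] = None) -> bool:
    return effective_spend(counter, db_spend, now=now) >= max_budget
```

Taking the max of the two figures is a conservative choice: a stale counter can only under-report, so it never lowers the spend that gets enforced.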

MCP Gateway

  • Make the MCP gateway name and description configurable via env vars - PR #30473
  • Fail closed when the scope filter resolves to no servers - PR #30353
  • Re-raise instead of silently dropping MCP team permissions - PR #30477
  • Drop the phantom 401 span on delegated OAuth2 tool calls - PR #30494
  • Default the Linear MCP registry entry to streamable HTTP - PR #30396
  • Preserve native tools in the semantic filter hook - PR #26650
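"Fail closed" (PR #30353) means a scope filter that matches nothing denies access rather than silently exposing every server. A minimal sketch, assuming servers are dicts with a scope field; select_mcp_servers and ScopeFilterError are hypothetical names, not the gateway's API.

```python
from typing import Iterable, Optional


class ScopeFilterError(PermissionError):
    """Raised when a configured scope filter leaves no MCP servers to expose."""


def select_mcp_servers(servers: list, allowed_scopes: Optional[Iterable[str]]) -> list:
    if allowed_scopes is None:
        return list(servers)  # no scope filter configured at all
    allowed = set(allowed_scopes)
    selected = [s for s in servers if s.get("scope") in allowed]
    if not selected:
        # Fail closed: an empty match means "deny", never "fall back to every server".
        raise ScopeFilterError("scope filter matched no MCP servers")
    return selected
```

The key distinction is between "no filter configured" (None) and "a filter that matched nothing" (empty result); only the first should return everything.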

Performance / Loadbalancing / Reliability improvements

  • Streaming connection hygiene - cancel the upstream Gemini request and release the httpx connection on client disconnect - PR #30075; close the upstream LLM stream when the client disconnects mid-stream - PR #30245; release the aiohttp connection when stream iteration ends abnormally - PR #30271; use e.request_data for logging_obj in ModifyResponseException streaming passthrough - PR #30800
  • Caching - add a valkey-semantic cache backend and fix semantic-cache scope keys - PR #30675; url-encode the object name in the GCS cache GET path - PR #30378; allow use_redis_transaction_buffer without a Redis cache - PR #28764
  • Router / fallbacks - resolve a list-unhashable crash on model alias - PR #30464; clean pattern_router state on upsert/delete - PR #29601; preserve the fallback model in SDK fallback responses - PR #28260; add expose_router_debug_in_errors (default True), which can be set to False to redact internal model_group/fallback names from errors - PR #30418
  • Startup / workers - fail fast on a non-PostgreSQL DATABASE_URL instead of hanging - PR #30366; add --max_requests_before_restart_jitter to stagger worker restarts - PR #30601; fix the IAM refresh-engine watcher race - PR #30183; release the cron pod-lock by matching async_set_cache JSON encoding - PR #30600
  • Health checks - correct Bedrock embedding health checks - PR #30583; bump the health-check max_tokens default to 16 for GPT-5 compatibility - PR #30708, PR #26610
  • Developer experience / CI - around 30 PRs hardening the lint and type-check gates (standardizing on basedpyright, dropping mypy, ratcheting any-discipline budgets), an osv-scanner lockfile workflow, zizmor PR gating, a local fake-OpenAI test endpoint replacing the shared mock, dependency bumps, and a pinned build toolchain.
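The streaming connection-hygiene fixes share one pattern: however the downstream stream ends (normal completion, client disconnect, or task cancellation), the upstream stream is closed in a finally block. A minimal asyncio sketch of that pattern; FakeUpstream, relay, and the send/is_disconnected callbacks are hypothetical stand-ins, not LiteLLM's or any web framework's API.

```python
import asyncio


class FakeUpstream:
    """Stand-in for an upstream provider stream that records whether it was closed."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.closed = False

    async def chunks(self):
        for i in range(self.n):
            await asyncio.sleep(0)
            yield f"chunk-{i}"

    async def aclose(self) -> None:
        self.closed = True  # a real client would release its HTTP connection here


async def relay(upstream: FakeUpstream, send, is_disconnected) -> None:
    stream = upstream.chunks()
    try:
        async for chunk in stream:
            if await is_disconnected():
                break  # client went away: stop pulling from upstream
            await send(chunk)
    finally:
        # Runs on completion, disconnect, errors, and cancellation alike.
        await stream.aclose()    # finalize the async generator deterministically
        await upstream.aclose()  # release the upstream connection
```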

Documentation Updates

  • Add 1-click AWS/GCP Terraform deploy buttons and fix README deploy-button rendering - PR #29879
  • Strengthen the coding conventions in CLAUDE.md - PR #30333
  • Clarify the Linear portion of the PR template - PR #30766

New Contributors

@hannahmadison, @ayushh0110, @Dotify71, @munnr, @V-3604, @yrk111222, @Silvenga, @djmaze, @apshada, @HumphreySun98, @Harshxth, @tomoyat1, @S0ngRu1, @habonlaci, @moshemalawach, @nahrinoda, @Vedant-Agarwal, @lollinng, @anneheartrecord, @hdt12a1, @vineethsaivs, @krishvsoni, @rvishwas26, @santino18727-debug, @darktheorys, @songkuan-zheng, @Thijmen, @Kropiunig, @jay-tau, @KnyazSh, @koztkozt, @us, @Anuj7411, @zkryakgul, @lavish619, @EugeneLugovtsov, @Bochenski, @menardorama, @factnn, @semmons99, @nitishagar, @FadelT, @jho1-godaddy, @yucheng-berri, @ad1269, @shzdehmd, @vanika02, @Nithish-Yenaganti, @simantak-dabhade, @devYRPauli, @clpatterson, @tcconnally

Full Changelog

v1.89.0...v1.90.0-rc.1