v1.89.0 - Claude Fable 5, A2A Agent Providers & MCP Per-Server Controls

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:v1.89.0

Key Highlights

v1.89.0 builds on v1.88.0.

  • Claude Fable 5 is supported across Anthropic, Bedrock, Azure AI, and Vertex at 1M-token context with adaptive thinking and computer use.
  • Agent-to-agent (A2A) gains two new agent providers - watsonx Orchestrate and LangFlow (with A2A session bridging) - plus OAuth M2M for Databricks Apps agents.
  • MCP gateway adds per-server environment variables with global and per-user scopes, per-server RPM rate limiting for keys and teams, OAuth passthrough with issuer-scoped JWT auth, and oauth2_flow persistence on server registration.
  • Observability gains OpenInference rendering parity for Arize/Phoenix (tool calls, cost, passthrough I/O, sessions, multimodal, cache tokens), MCP semantic conventions on the typed OTel v2 spans, and a Galileo logger that uses the ingest-traces API.
  • New search and transcription providers - APISerpent, You.com, and Soniox - join the gateway, alongside the dashboard's migration to fully typed, OpenAPI-generated API clients.

New Providers and Endpoints

New Providers (3 new providers)

| Provider | Supported LiteLLM Endpoints | Description |
| --- | --- | --- |
| APISerpent (apiserpent) | Search | Web search and deep-search API |
| You.com (you_com) | Search | You.com web search API |
| Soniox (soniox) | Audio Transcription | Async speech-to-text (stt-async-v4) |

New Models / Updated Models

New Model Support (selected)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
| --- | --- | --- | --- | --- | --- |
| Anthropic | claude-fable-5 | 1,000,000 | $10.00 | $50.00 | Adaptive thinking, computer use, function calling, prompt caching, vision |
| Vertex AI | vertex_ai/claude-fable-5 | 1,000,000 | $10.00 | $50.00 | Same as Anthropic direct |
| Azure AI | azure_ai/claude-fable-5 | 1,000,000 | $10.00 | $50.00 | Same as Anthropic direct |
| Bedrock | anthropic.claude-fable-5 (+ global. / us. / eu. routes) | 1,000,000 | $10.00 | $50.00 | Same as Anthropic direct |
| Bedrock Mantle | bedrock_mantle/openai.gpt-5.5 | 272,000 | $5.50 | $33.00 | Responses API, reasoning, function calling, prompt caching |
| Bedrock Mantle | bedrock_mantle/openai.gpt-5.4 | 272,000 | $2.75 | $16.50 | Responses API, reasoning, function calling, prompt caching |
| Azure AI | azure_ai/kimi-k2.6 | 262,144 | $0.95 | $4.00 | Reasoning, vision, function calling, tool choice |
| MiniMax | minimax/MiniMax-M3 | 512,000 | $0.60 | $2.40 | Reasoning, prompt caching, function calling |
| Inception | inception/mercury-2 (+ mercury-edit-2) | 128,000 | $0.25 | $0.75 | Function calling, prompt caching, response schema |
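As a rough illustration of how the per-million-token prices above translate into per-request cost, here is a minimal sketch. The prices are copied from the table; the `PRICES_PER_MILLION` dict and the `estimate_cost_usd` helper are hypothetical names used only for this example, not part of LiteLLM. The sketch assumes flat pricing and ignores prompt-caching discounts.

```python
# USD per 1M tokens as (input, output), copied from the table above.
# The dict and helper below are illustrative names, not LiteLLM APIs.
PRICES_PER_MILLION = {
    "claude-fable-5": (10.00, 50.00),
    "bedrock_mantle/openai.gpt-5.5": (5.50, 33.00),
    "bedrock_mantle/openai.gpt-5.4": (2.75, 16.50),
    "azure_ai/kimi-k2.6": (0.95, 4.00),
    "minimax/MiniMax-M3": (0.60, 2.40),
    "inception/mercury-2": (0.25, 0.75),
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimate request cost in USD, assuming flat per-token pricing."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, a `claude-fable-5` request with 200,000 input tokens and 4,000 output tokens comes to $2.00 + $0.20 = $2.20 under these assumptions.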

Additional model-map additions: fal.ai Nano Banana and Gemini 2.5 Flash Image generation - PR #29798; mistral/ministral-8b-latest - PR #29453; a batch of new Snowflake Cortex model entries (Claude, GPT, Llama, embeddings); vertex_ai/google/gemma-4-26b-a4b-it-maas; APISerpent, You.com, and Soniox catalog entries; and a jp. regional route for Claude Opus 4.7.

Features

  • Anthropic
    • Route future Claude models to the Anthropic provider via pattern matching - PR #29239
    • Route Claude Opus 4.8 through adaptive thinking - PR #29702
    • Emit a thinking block for reasoning_content-only streaming chunks in the Anthropic adapter - PR #29600
    • Inline legacy $ref defs in tool schemas (Anthropic and Fireworks) - PR #28646
  • Gemini
    • Support googleSearch with server-side tools and googleMaps JSON schema - PR #29582
    • Use GA event names for Pipecat 1.3.x compatibility on Gemini realtime - PR #29662
  • Vertex AI
    • Use a user-supplied api_base as-is for the Model Garden OpenAI-compatible path - PR #29530
    • Handle namespace tools and strip client_metadata for Codex compatibility on Vertex/Anthropic - PR #29489
  • Azure AI
    • Strip tool-level extra fields on a 400 and retry - PR #29479

Bug Fixes

  • General
    • Return a 400 (not 500) on Anthropic context overflow, and seed identity on failed auth - PR #29848
    • Omit the OpenAI [DONE] sentinel on google-genai streamGenerateContent - PR #29426

LLM API Endpoints

Features

  • Batches
    • Skip unnecessary batch input-file reads - PR #29114
    • Resolve credentials correctly when cancelling a managed batch - PR #29734
  • Vector Stores
    • Resolve vector-store file-list credentials from team deployments - PR #29739
    • Support an engines URL for Vertex AI Search - PR #27885
    • Forward per-request params to Vertex AI Search - PR #29459
  • Realtime
    • Track realtime audio token cost - PR #29722
    • Allow null transcripts in stream logging payloads - PR #29625
    • WebSocket connection improvements - PR #29563

Agents (A2A)

  • watsonx Orchestrate agent provider - PR #29410
  • LangFlow agent provider with A2A session bridging - PR #28963
  • OAuth M2M for Databricks Apps A2A agents - PR #29586
  • A2A bug fixes - PR #29566

Management Endpoints / UI

Features

  • Virtual Keys & Auth
    • JWT-to-virtual-key mapping - PR #28510
    • Let internal users view search tools - PR #29542
    • Expand the all-team-models sentinel in can_key_call_model for batch validation - PR #29746
  • Dashboard
    • Generate dashboard API types from the proxy OpenAPI spec - PR #29816
    • Centralize proxy base-URL resolution into a tested resolver - PR #29793
    • Route networking calls through a shared, location-pinned apiClient - PR #29723, PR #29806, PR #29815
    • Migrate ESLint to flat config and bump eslint-config-next to 16 - PR #29626

Bug Fixes

  • Use the resolved DB user_id for spend on legacy email match (JWT) - PR #29217
  • Preserve the 401 status for expired JWTs in OTel traces - PR #29510
  • Stop team BYOK model-name corruption on model edit - PR #29731
  • Drop a deleted team BYOK model name from team.models - PR #29820
  • Add default=None to LiteLLM_TeamMembership.litellm_budget_table - PR #29684
  • Require a new expiration when regenerating an expired key - PR #29838
  • Render caller-supplied filter options in caller order (LIT-3151) - PR #29462
  • Make A2A skill tags enterable and validated - PR #29512
  • Persist the Tools-tab MCP OAuth token to the DB - PR #29809
  • Route MCP playground auth by OAuth2 mode instead of token_url - PR #29714
  • Stop MCP playground tool calls from sending twice - PR #29821

AI Integrations

Logging

  • Arize / Phoenix
    • OpenInference rendering parity: tool calls, cost, passthrough I/O, session/user, multimodal, and cache tokens - PR #28800
  • Datadog
    • Split oversized batches on a 413 instead of re-queueing forever - PR #29444
  • Galileo
    • Use the ingest-traces API and the standard logging payload - PR #29651
  • OpenTelemetry
    • Allowlist team_metadata sub-keys promoted to baggage - PR #29442
    • Add MCP semantic conventions to OTel v2 - PR #29468
    • Capture 401 error details in management-endpoint spans - PR #29535
    • Emit the missing MCP span attributes - PR #29554
    • Emit a guardrail span on passthrough, including when a guardrail blocks - PR #29552, PR #29470
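The Datadog change above (splitting an oversized batch on a 413 rather than re-queueing it unchanged) can be sketched as follows. This illustrates the general technique only and is not the logger's actual code: `send_with_split` and the `post_batch` callable (assumed to return an HTTP status code) are hypothetical, and dropping a single event that is still too large is an assumption of this sketch.

```python
def send_with_split(events, post_batch):
    """Deliver events, halving any batch the endpoint rejects with 413.

    post_batch is a hypothetical callable: it takes a list of events and
    returns the HTTP status code of the upload attempt.
    Returns (delivered_count, dropped_events).
    """
    pending = [list(events)]
    delivered = 0
    dropped = []
    while pending:
        batch = pending.pop()
        if not batch:
            continue
        status = post_batch(batch)
        if 200 <= status < 300:
            delivered += len(batch)
        elif status == 413 and len(batch) > 1:
            # Too large: retry as two halves instead of re-queueing as-is.
            mid = len(batch) // 2
            pending.append(batch[mid:])
            pending.append(batch[:mid])  # first half is retried first
        elif status == 413:
            # A single event that is still too large cannot be split further.
            dropped.append(batch[0])
        else:
            raise RuntimeError(f"unexpected status {status}")
    return delivered, dropped
```

Because every 413 either shrinks the batch or drops a single event, the loop always terminates, which is the property the re-queueing behavior lacked.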

Guardrails

Spend Tracking, Budgets and Rate Limiting

  • Strip NUL bytes from spend-log payloads to prevent PostgreSQL 22P05 errors - PR #29515
  • Scope the session-token team-key budget exemption to a caller-supplied team_id - PR #29641
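PostgreSQL text and jsonb values cannot contain the NUL character, which is why a stray `\x00` in a logged payload can fail the insert. A minimal sketch of the stripping step, assuming the payload is a JSON-like structure of dicts, lists, and strings; `strip_nul` is a hypothetical helper name, not the function used in the PR:

```python
def strip_nul(value):
    """Recursively remove NUL characters from strings in a JSON-like value.

    Tuples come back as lists, which is fine for payloads that are about
    to be serialized as JSON. Non-string scalars are returned unchanged.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {
            (k.replace("\x00", "") if isinstance(k, str) else k): strip_nul(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [strip_nul(v) for v in value]
    return value
```

Stripping before serialization keeps the rest of the log entry intact instead of losing the whole row to a single bad character.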

MCP Gateway

  • Per-server environment variables with global and per-user scopes - PR #28917
  • Per-MCP-server RPM rate limiting for keys and teams - PR #29482
  • Support MCP OAuth passthrough and issuer-scoped JWT auth - PR #28356
  • Persist oauth2_flow on MCP server registration - PR #29690
  • Clear allowed_tools and tool overrides on MCP server edit - PR #29411
  • Gate /public/mcp_hub strictly on litellm.public_mcp_servers - PR #27764
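To make the per-server RPM limit concrete, here is a minimal in-memory sketch of a fixed-window limiter keyed by caller and MCP server. It shows the idea only: the class name, the `limits` mapping, and the fixed-window strategy are all assumptions of this example, and the gateway's real implementation and configuration may differ (a multi-instance deployment would need shared state, for instance).

```python
import time


class PerServerRpmLimiter:
    """Fixed-window requests-per-minute limiter (illustrative sketch only)."""

    def __init__(self, limits, clock=time.time):
        # limits: {(caller_id, server_name): max requests per minute}
        self.limits = limits
        self.clock = clock
        self.windows = {}  # {(caller_id, server_name): (window_index, count)}

    def allow(self, caller_id, server_name):
        key = (caller_id, server_name)
        limit = self.limits.get(key)
        if limit is None:
            return True  # no limit configured for this caller/server pair
        window = int(self.clock() // 60)
        seen_window, count = self.windows.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new minute has started; reset the counter
        if count >= limit:
            self.windows[key] = (window, count)
            return False
        self.windows[key] = (window, count + 1)
        return True
```

With a limit of 2 for `("key-1", "github")`, the third call inside the same minute is rejected, while calls from the same key to a different server are unaffected.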

Performance / Loadbalancing / Reliability improvements

  • Native /health/drain preStop hook for graceful shutdown - PR #29439
  • Disable proxy buffering on streaming SSE responses - PR #29557
  • Populate llm_provider on internal rate-limit errors - PR #27707
  • Hot-reload .env in dev when running with --reload - PR #29783
  • Enable the Helm backend deployment to mount the gateway config.yaml - PR #29605
  • Convert the AWS and GCP Terraform stacks into reusable modules - PR #28103
  • Terraform GCP: abandon the SQL user on destroy - PR #29855; prompt for image_registry in the DeployStack one-click - PR #29852
  • Dependency bumps - PR #29860

Documentation Updates

PR roll-up by ownership area

PRs by ownership area (visible, non-vehicle set; total: 101)
- UI / Dashboard: 22
- General Proxy (testing / CI / build): 22
- Models & Providers: 13
- Performance / Reliability: 10
- Logging: 9
- LLM API Endpoints: 8
- MCP: 6
- Auth & Management: 5
- Agents (A2A): 4
- Docs: 4
- Spend / Budgets / Rate Limits: 2
- Models & Providers (new providers): 3
- Guardrails: 1

New Contributors

Full Changelog

https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0


06/10/2026

  • New Models / Updated Models: 16
  • LLM API Endpoints: 12
  • Management Endpoints / UI: 22
  • AI Integrations (Logging / Guardrails): 10
  • Spend Tracking, Budgets and Rate Limiting: 2
  • MCP Gateway: 6
  • Performance / Loadbalancing / Reliability improvements: 10
  • General Proxy Improvements (testing / CI / build): 22
  • Documentation Updates: 4