
v1.83.3-stable - MCP Toolsets & Skills Marketplace

Deploy this version

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.83.3-stable
```
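The proxy also accepts a mounted `config.yaml` at startup. A minimal sketch (the model entry below is illustrative; substitute your own providers and keys):

```yaml
model_list:
  - model_name: gpt-4o               # public alias clients will call
    litellm_params:
      model: openai/gpt-4o           # provider/model the proxy routes to
      api_key: os.environ/OPENAI_API_KEY
```

Since `STORE_MODEL_IN_DB=True` is set above, models can also be added from the Admin UI instead of a config file.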

Key Highlights


Skills Marketplace

The Skills Marketplace gives teams a self-hosted catalog for discovering, installing, and publishing Claude Code skills. Skills are portable across Anthropic, Vertex AI, Azure, and Bedrock, so a skill published once works everywhere your gateway routes to.



Guardrail Fallbacks


Guardrail pipelines now support an optional on_error behavior. When a guardrail check fails or errors out, you can configure the pipeline to fall back gracefully, logging the failure and continuing the request, instead of returning a hard 500 to the caller. This is especially useful for non-critical guardrails where availability matters more than enforcement.
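As a sketch, the fallback would sit alongside the existing per-guardrail settings in the proxy config. The `on_error` value shown is an assumption based on the description above, not confirmed syntax:

```yaml
guardrails:
  - guardrail_name: pii-check
    litellm_params:
      guardrail: presidio          # example guardrail integration
      mode: pre_call
      on_error: log_and_continue   # assumed value: log the failure, keep serving the request
```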


Team Bring Your Own Guardrails

Teams can now attach guardrails directly from the team management UI. Admins configure available guardrails at the project or proxy level, and individual teams select which ones apply to their traffic; no config-file changes or proxy restarts are needed. This release also ships project-level guardrail support in the project create/edit flows.

MCP Toolsets

MCP Toolsets let AI platform admins create curated subsets of tools from one or more MCP servers and assign them to teams and keys with scoped permissions. Instead of granting access to an entire MCP server, you can now bundle specific tools into a named toolset, controlling exactly which tools each team or API key can invoke. Toolsets are fully managed through the UI (new Toolsets tab) and API, and work seamlessly with the Responses API and Playground.
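Conceptually, a toolset names a subset of tools drawn from one or more MCP servers. A hypothetical shape (field names are illustrative assumptions, not the actual API schema):

```yaml
toolsets:
  - name: issue-triage         # toolset you assign to teams/keys
    tools:
      - server: github_mcp     # MCP server alias on the gateway
        tool: search_issues    # only the listed tools are exposed
      - server: github_mcp
        tool: get_issue
```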




New Models / Updated Models

New Model Support (60 new models)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-5.4-mini | 272K | $0.75 | $4.50 | Chat, cache read, flex/batch/priority tiers |
| OpenAI | gpt-5.4-nano | 272K | $0.20 | - | Chat, flex/batch tiers |
| OpenAI | gpt-4-0314 | 8K | $30.00 | $60.00 | Re-added legacy entry (deprecation 2026-03-26) |
| Azure OpenAI | azure/gpt-5.4-mini | 1.05M | $0.75 | $4.50 | Chat completions, cache read |
| Azure OpenAI | azure/gpt-5.4-nano | - | - | - | Chat completions |
| AWS Bedrock | us.amazon.nova-canvas-v1:0 | 2.6K | - | $0.06 / image | Nova Canvas image edit support |
| AWS Bedrock | nvidia.nemotron-super-3-120b | 256K | $0.15 | $0.65 | Function calling, reasoning, system messages |
| AWS Bedrock | minimax.minimax-m2.5 (12 regions) | 1M | $0.30 | $1.20 | Function calling, reasoning, system messages |
| AWS Bedrock | zai.glm-5 | 200K | $1.00 | $3.20 | Function calling, reasoning |
| AWS Bedrock | bedrock/us-gov-{east,west}-1/anthropic.claude-haiku-4-5-20251001-v1:0 | 200K | $1.20 | $6.00 | GovCloud Claude Haiku 4.5 |
| Vertex AI | vertex_ai/claude-haiku-4-5 | 200K | $1.00 | $5.00 | Chat, cache creation/read |
| Gemini | gemini-3.1-flash-live-preview / gemini/gemini-3.1-flash-live-preview | 131K | $0.75 | - | Live audio/video/image/text |
| Gemini | gemini/lyria-3-pro-preview, gemini/lyria-3-clip-preview | 131K | - | - | Music generation preview |
| xAI | xai/grok-4.20-beta-0309-reasoning | 2M | $2.00 | $6.00 | Function calling, reasoning |
| xAI | xai/grok-4.20-beta-0309-non-reasoning | 2M | - | - | Function calling |
| xAI | xai/grok-4.20-multi-agent-beta-0309 | 2M | - | - | Multi-agent preview |
| OCI GenAI | oci/cohere.command-a-reasoning-08-2025, oci/cohere.command-a-vision-07-2025, oci/cohere.command-a-translate-08-2025, oci/cohere.command-r-08-2024, oci/cohere.command-r-plus-08-2024 | 256K | $1.56 | $1.56 | Cohere chat family on OCI |
| OCI GenAI | oci/meta.llama-3.1-70b-instruct, oci/meta.llama-3.2-11b-vision-instruct, oci/meta.llama-3.3-70b-instruct-fp8-dynamic | Varies | Varies | Varies | Llama chat family on OCI |
| OCI GenAI | oci/xai.grok-4-fast, oci/xai.grok-4.1-fast, oci/xai.grok-4.20, oci/xai.grok-4.20-multi-agent, oci/xai.grok-code-fast-1 | 131K | $3.00 | $15.00 | Grok family on OCI |
| OCI GenAI | oci/google.gemini-2.5-pro, oci/google.gemini-2.5-flash, oci/google.gemini-2.5-flash-lite | 1M+ | $1.25 | $10.00 | Gemini family on OCI |
| OCI GenAI | oci/cohere.embed-english-v3.0, oci/cohere.embed-english-light-v3.0, oci/cohere.embed-multilingual-v3.0, oci/cohere.embed-multilingual-light-v3.0, oci/cohere.embed-english-image-v3.0, oci/cohere.embed-english-light-image-v3.0, oci/cohere.embed-multilingual-light-image-v3.0, oci/cohere.embed-v4.0 | Varies | Varies | - | Embeddings on OCI |
| Volcengine | volcengine/doubao-seed-2-0-pro-260215, doubao-seed-2-0-lite-260215, doubao-seed-2-0-mini-260215, doubao-seed-2-0-code-preview-260215 | 256K | - | - | Doubao Seed 2.0 family |

Features

Bug Fixes

  • General
    • Fix gpt-5.4 pricing metadata - PR #24748
    • Fix gov pricing tests and Bedrock model test follow-ups - PR #24931, PR #24947, PR #25022
    • Fix thinking blocks null handling - PR #24070
    • Streaming tool-call finish reason with empty content - PR #23895
    • Ensure alternating roles in conversion paths - PR #24015
  • File → input_file mapping fix - PR #23618
    • File-search emulated alignment - PR #23969
    • Preserve final streaming attributes - PR #23530
    • Streaming metadata hidden params - PR #24220
    • Improve LLM repeated message detection performance - PR #18120

LLM API Endpoints

Features

  • Responses API

    • File Search support: Phase 1 native passthrough and Phase 2 emulated fallback for non-OpenAI models - PR #23969
    • Prompt management support for Responses API - PR #23999
    • Encrypted-content affinity across model versions - PR #23854, PR #24110
    • Round-trip Responses API reasoning_items in chat completions - PR #24690
    • Emit content_part.added streaming event for non-OpenAI models - PR #24445
    • Surface Anthropic code execution results as code_interpreter_call - PR #23784
    • Preserve Anthropic thinking.summary when routing to OpenAI Responses API - PR #21441
    • Auto-route Azure gpt-5.4+ tools + reasoning to Responses API - PR #23926
    • Preserve annotations in Azure AI Foundry Agents responses - PR #23939
    • API reference path routing updates - PR #24155
    • Map Chat Completion file type to Responses API input_file - PR #23618
    • Map file_url → file_id in Responses→Completions translation - PR #24874
  • Batch API

  • Token Counting

    • Bedrock: respect api_base and aws_bedrock_runtime_endpoint - PR #24199
    • Vertex: respect vertex_count_tokens_location for Claude - PR #23907
  • Audio / Transcription API

    • Mistral: preserve diarization segments in transcription response - PR #23925
  • Embeddings API

    • Gemini: convert task_type to camelCase taskType for Gemini API - PR #24191
  • Video Generation

    • New reusable video character endpoints (create / edit / extension / get) with router-first routing - PR #23737
  • Search API

    • Support self-hosted Firecrawl response format - PR #24866
  • A2A / MCP Gateway API

    • Preserve JSON-RPC envelope for AgentCore A2A-native agents - PR #25092
  • Pass-Through Endpoints

    • Support ANTHROPIC_AUTH_TOKEN / ANTHROPIC_BASE_URL env vars and custom api_base in experimental passthrough - PR #24140
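The Gemini embeddings fix in the list above converts a snake_case request param (`task_type`) to the camelCase form (`taskType`) the Gemini API expects. A minimal sketch of that kind of key normalization; the helper name and payload are illustrative, not LiteLLM's actual code:

```python
def snake_to_camel(key: str) -> str:
    """Convert a snake_case key (e.g. "task_type") to camelCase ("taskType")."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)

# Normalize an embeddings request payload before sending it upstream.
params = {"task_type": "RETRIEVAL_QUERY"}
converted = {snake_to_camel(k): v for k, v in params.items()}
print(converted)  # {'taskType': 'RETRIEVAL_QUERY'}
```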

Bugs

Management Endpoints / UI

Features

  • Virtual Keys

  • Teams + Organizations

    • Resolve access-group models / MCP servers / agents in team endpoints and UI - PR #25027, PR #25119
    • Allow changing team organization from team settings - PR #25095
    • Per-model rate limits in team edit/info views - PR #25144, PR #25156
    • Fix team model update 500 due to unsupported Prisma JSON path filter - PR #25152
    • Team model-group name routing fix - PR #24688
    • Modernize teams table - PR #24189
    • Team-member budget duration on create - PR #23484
    • Add missing team_member_budget_duration param to new_team docstring - PR #24243
    • Fix teams table refresh, infinite dropdown, and leftnav migration - PR #24342
  • Usage + Analytics

  • Models + Providers

    • Include access-group models in UI model listing - PR #24743
    • Expose Azure Entra ID credential fields in provider forms - PR #25137
    • Do not inject vector_store_ids: [] when editing a model - PR #25133
  • Guardrails UI

    • Project-level guardrails in project create/edit flows - PR #25100
    • Project-level guardrails support in the proxy - PR #25087
    • Allow adding team guardrails from the UI - PR #25038
  • MCP Toolsets UI

    • New Toolsets tab for curated MCP tool subsets with scoped permissions - PR #25155
  • Auth / SSO

  • UI Cleanup / Migration

    • Migrate Tremor Text/Badge to antd Tag and native spans - PR #24750
    • Migrate default user settings to antd - PR #23787
    • Migrate route preview Tremor → antd - PR #24485
    • Migrate antd message to context API - PR #24192
    • Extract useChatHistory hook - PR #24172
    • Left-nav external icon - PR #24069
    • Vitest coverage for UI - PR #24144

Bugs

AI Integrations

Logging

  • Langfuse

  • Prometheus

  • General

    • Centralize logging kwarg updates via a single update function - PR #23659
    • Fix failure callbacks silently skipped when customLogger is not initialized - PR #24826
    • Eliminate race condition in streaming guardrail_information logging - PR #24592
    • Use actual start_time in failed request spend logs - PR #24906
    • Harden credential redaction and stop logging raw sensitive auth values - PR #25151, PR #24305
    • Filter metadata by user_id - PR #24661
    • Batch metrics improvements - PR #24691
    • Filter metadata hidden params in streaming - PR #24220
    • Shared aiohttp session auto-recovery - PR #23808
    • Deferred guardrail logging v2 - PR #24135

Guardrails

  • Register DynamoAI guardrail initializer and enum entry - PR #23752
  • Extract helper methods in guardrail handlers to fix PLR0915 - PR #24802
  • Add optional on_error fallback for guardrail pipeline failures - PR #24831, PR #25150
  • Allow teams to attach/manage their own guardrails from team settings - PR #25038
  • Project-level guardrail config in create/edit flows - PR #25100
  • Return HTTP 400 (vs 500) for Model Armor streaming blocks - PR #24693
  • Deferred guardrail logging v2 - PR #24135
  • Eliminate race condition in streaming guardrail_information logging - PR #24592
  • Model-level guardrails on non-streaming post-call - PR #23774
  • Guardrail post-call logging fix - PR #23910
  • Missing guardrails docs - PR #24083

Prompt Management

  • Environment + user tracking for prompts (development/staging/production) in CRUD + UI flows - PR #24855, PR #25110
  • Prompt-to-responses integration - PR #23999

Secret Managers

  • No new secret manager provider additions in this release.

Spend Tracking, Budgets and Rate Limiting

  • Enforce budget for models not directly present in the cost map - PR #24949
  • Per-model rate limits in team settings/info UI - PR #25144, PR #25156
  • Prometheus organization budget metrics - PR #24449
  • Prometheus spend metadata - PR #24434
  • Fix unversioned Vertex Claude Haiku pricing entry to avoid $0.00 accounting - PR #25151
  • Fix budget/spend counters - PR #24682
  • Project ID tracking in spend logs - PR #24432
  • Dynamic rate-limit pre-ratelimit background refresh - PR #24106
  • Point72 limits changes - PR #24088
  • Model-level affinity in router - PR #24110

MCP Gateway

  • Introduce MCP Toolsets with DB types, CRUD APIs, scoped permissions, and UI management tab - PR #25155
  • Resolve toolset names and enforce toolset access correctly in Responses API and streamable MCP paths - PR #25155
  • Switch toolset permission caching to shared cache path and improve cache invalidation behavior - PR #25155
  • Allow JWT auth for /v1/mcp/server/* sub-paths - PR #24698, PR #25113
  • Add STS AssumeRole support for MCP SigV4 auth - PR #25151
  • Tag query fix + MCP metadata support cherry-pick - PR #25145
  • MCP REST M2M OAuth2 flow - PR #23468
  • Upgrade MCP SDK to 1.26.0 - PR #24179
  • Restore MCP server fields dropped by schema sync migration - PR #24078

Performance / Loadbalancing / Reliability improvements

  • Add control plane for multi-proxy worker management - PR #24217
  • Make DB migration failure exit opt-in via --enforce_prisma_migration_check - PR #23675
  • Return the picked model (not a comma-separated list) when batch completions is used - PR #24753
  • Fix mypy type errors in Responses transformation, spend tracking, and PagerDuty - PR #24803
  • Fix router code coverage CI failure for health check filter tests - PR #24812
  • Integrate router health-check failures with cooldown behavior and transient 429/408 handling - PR #24988, PR #25150
  • Add distributed lock for key rotation job execution - PR #23364, PR #23834, PR #25150
  • Improve team routing reliability with deterministic grouping, isolation fixes, stale alias controls, and order-based fallback - PR #25148, PR #25154
  • Regenerate GCP IAM token per async Redis cluster connection (fix token TTL failures) - PR #24426, PR #25155
  • Proxy server reliability hardening with bounded queue usage - PR #25155
  • Auto schema sync on startup - PR #24705
  • Kill orphaned Prisma engine on reconnect - PR #24149
  • Use dynamic DB URL - PR #24827
  • Migration corrections - PR #24105

Documentation Updates

Infrastructure / Security Notes

New Contributors

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.82.3-stable...v1.83.3-stable


04/04/2026

  • New Models / Updated Models: 59
  • LLM API Endpoints: 28
  • Management Endpoints / UI: 61
  • Logging / Guardrail / Prompt Management Integrations: 30
  • Spend Tracking, Budgets and Rate Limiting: 11
  • MCP Gateway: 8
  • Performance / Loadbalancing / Reliability improvements: 17
  • Documentation Updates: 24
  • Infrastructure / Security: 50