v1.83.10 - Claude Opus 4.7, Prompt Compression & Multi-Window Budgets
Deploy this version​
- Docker
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:main-v1.83.10-stable
- Pip
pip install litellm==1.83.10
Key Highlights​
- Claude Opus 4.7 day-0 support: Opus 4.7 across Anthropic, Bedrock, Vertex AI, Azure AI, and Perplexity, with reasoning, vision, prompt caching, computer use, and 1M-token context.
- `litellm.compress()`: BM25-based prompt compression with a retrieval tool for trimming long context before it hits the model.
- Multi-Threshold Budget Alerts: virtual keys can fire alerts at multiple configurable spend thresholds (e.g. 50% / 80% / 95%) instead of a single soft-budget level.
- Concurrent Budget Windows: keys and teams can run multiple budget periods (daily + monthly) simultaneously, each with its own reset cadence.
- Per-Team Guardrail Opt-Out: teams can opt out of specific global guardrails from team settings without touching config files.
- PromptGuard Guardrail Integration: first-class pre/post-call guardrail for prompt-injection detection.
- uv Packaging Migration: Poetry replaced by uv across packaging, CI, and Docker for faster, reproducible builds.
New Models / Updated Models​
New Model Support (10 new models)​
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Anthropic | claude-opus-4-7, claude-opus-4-7-20260416 | 1M | $5.00 | $25.00 | Chat, reasoning, vision, computer use, prompt caching, PDF input, xhigh reasoning effort |
| AWS Bedrock | anthropic.claude-opus-4-7, us.anthropic.claude-opus-4-7, eu.anthropic.claude-opus-4-7, au.anthropic.claude-opus-4-7, global.anthropic.claude-opus-4-7 | 1M | $5.50 | $27.50 | Chat, reasoning, vision, computer use, prompt caching, PDF input, native structured output |
| Vertex AI | vertex_ai/claude-opus-4-7, vertex_ai/claude-opus-4-7@default | 1M | $5.00 | $25.00 | Chat, reasoning, vision, computer use, prompt caching, PDF input |
| Azure AI | azure_ai/claude-opus-4-7 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, computer use, prompt caching, PDF input |
| Perplexity | perplexity/anthropic/claude-opus-4-7 | - | - | - | Web search, function calling (Responses mode) |
| Google Gemini | gemini/veo-3.1-lite-generate-preview | 1024 | - | $0.05 / sec | Video generation preview |
| OpenRouter | openrouter/google/gemini-3.1-flash-lite-preview | 1.05M | $0.25 | $1.50 | Chat, code execution, file search, function calling, prompt caching, reasoning, web search, vision, video/audio/PDF input |
| xAI | xai/grok-4.20-0309-reasoning | 2M | $2.00 | $6.00 | Function calling, reasoning, tool choice, vision, web search |
| W&B Inference | wandb/MiniMaxAI/MiniMax-M2.5 | 197K | $0.30 | $1.20 | Function calling, reasoning, response schema |
| W&B Inference | wandb/moonshotai/Kimi-K2.5 | 262K | $0.60 | $3.00 | Function calling, reasoning, response schema, vision |
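As a rough illustration of how the per-1M-token prices in the table translate into request cost, the sketch below prices a hypothetical request. `PRICES` and `estimate_cost` are illustrative names, not LiteLLM APIs, and the calculation deliberately ignores prompt caching discounts and any tiered pricing.

```python
# USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "claude-opus-4-7": (5.00, 25.00),
    "anthropic.claude-opus-4-7": (5.50, 27.50),
    "xai/grok-4.20-0309-reasoning": (2.00, 6.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# A request with 200K input tokens and 4K output tokens.
cost = estimate_cost("claude-opus-4-7", 200_000, 4_000)
print(f"${cost:.2f}")  # $1.10
```

The same request against the Bedrock entry comes to about $1.21, which reflects the higher Bedrock list price in the table.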
Features​
-
- Normalize custom tool JSON schema for both Invoke and Converse APIs - PR #25396
- Bedrock API response null-type handling - PR #25810, PR #24147
- Prevent negative streaming costs for start-only cache usage - PR #25846
- Accurate cache token cost breakdown in UI and SpendLogs - PR #25735
- Remove unresolved merge conflict markers in Bedrock test file - PR #25995
- Replace flaky Bedrock gpt-oss tool-call live test with request-body mock - PR #25739
- Mock Bedrock Moonshot tests + fix `TogetherAIConfig` recursion - PR #25920
- Remove dead Bedrock `clear_thinking` interleaved-thinking-beta assertion - PR #25913
-
- Veo 3.1 Lite pricing, video resolution usage, and tiered cost tracking - PR #25348
-
- Add `azure_ai/claude-opus-4-7` cost map entry - cost map
- Populate `standard_logging_object` for Azure passthrough via logging hook - PR #25679
-
- Add `xai/grok-4.20-0309-reasoning` cost map entry - PR #25930
-
- Preserve `cache_control` for explicit prompt caching - PR #25331
-
- Allow overriding the default GitHub Copilot authentication endpoint - PR #25915
-
- Add Kimi-K2.5 and MiniMax-M2.5 cost map entries - PR #25409
Bug Fixes​
-
- Return actual upstream status code from `/v1/messages/count_tokens` instead of always 200 - PR #21352
-
- Gemini `finish_reason` enum normalization (see Features above) - PR #25337
-
- Revert null-`encoding_format` omission after downstream regression - PR #25698
-
General
- Fix `version` shown in docs banner - PR #25875
LLM API Endpoints​
Features​
-
- Add Responses API params to cache key allow-list - PR #25673
-
- Veo 3.1 Lite resolution-aware tiered cost tracking - PR #25348
-
General: `litellm.compress()`
- New BM25-based prompt compression API with retrieval tool, exposed via `litellm.compress()` for trimming long prompts before model invocation - PR #25637
Bugs​
- General
- Tighten `api_key` value check in credential validation - PR #25917
- Tighten environment-reference handling in request parameters - PR #25592
- Harden request parameter handling - PR #25827
- Add shared path utilities and prevent directory traversal - PR #25834
- Add URL validation for user-supplied URLs - PR #25906
- Read guardrail config from admin metadata; fix tag-routing consistency - PR #25905
- Enforce organization boundaries in admin operations - PR #25904
- Resolve `prometheus_helpers` file/package shadow breaking `/global/spend/logs` - PR #26026
- Harden CORS credentials, `create_views` exception handling, and spend-log cleanup loop - PR #25559
- Prevent API key leaks in error tracebacks, logs, and alerts - PR #25117
- Remove leading space from license `public_key.pem` - PR #25339
- Cache invalidation: stop double-hashing token in bulk update and key rotation - PR #25552
- `model_max_budget` silently broken for routed models - PR #25549
- Bump 22 of 25 vulnerable dependabot-reported dependencies - PR #25442
- Fix `multiple values` `TypeError` in `get_cache_key` - PR #20261
- S3v2: use prepared URL for SigV4-signed S3 requests - PR #25074
- Health-check reasoning-token max-token precedence - PR #25936
- `BACKGROUND_HEALTH_CHECK_MAX_TOKENS` env var - PR #25344
- Batch-limit stale managed object cleanup to prevent 300K-row UPDATE - PR #25227
- Preserve provider response headers in `StandardLoggingPayload` - PR #25807
- Optimize DB query to prevent OOM during health checks - PR #25732
- `PodLockManager.release_lock` atomic compare-and-delete (re-land #21226) - PR #24466
- `routing_strategy_args` returns `None` when strategy is not latency-based - PR #25882
- `is_tool_name_prefixed` validates against known MCP server prefixes - PR #25085
- Persist default router end-budget across restarts - PR #25991
- Enforce team membership in team-scoped key management checks - PR #25686
- Agent endpoint and routing permission checks - PR #25922
- JWT-auth `key_alias=user_id` for Prometheus metrics (initial fix and revert) - PR #25340, PR #25438
- Gate post-custom-auth DB lookups behind opt-in flag - PR #25634
- Align field-level checks in user and key update endpoints - PR #25541
- `/spend/logs` filter handling aligned with user scoping - PR #25594
- Replace `custom_code` guardrail sandbox with RestrictedPython - PR #25818
- Presidio: use correct text positions in `anonymize_text` - PR #24998
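For readers unfamiliar with the directory-traversal class of bug mentioned in the list above, here is a minimal sketch of the usual defence: resolve the joined path and confirm it still sits under the base directory. `safe_join` is a hypothetical helper written for this note and does not reflect the shared path utilities actually added in PR #25834.

```python
from pathlib import Path


def safe_join(base_dir: str, user_path: str) -> Path:
    """Join user_path onto base_dir, rejecting anything that escapes base_dir.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments (and follows symlinks where they exist),
    # so the containment check runs against the final location.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


print(safe_join("/srv/app/uploads", "reports/q1.csv"))

try:
    safe_join("/srv/app/uploads", "../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

An absolute `user_path` such as `/etc/passwd` is rejected as well, because joining an absolute path replaces the base entirely and the containment check then fails.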
Management Endpoints / UI​
Features​
-
Virtual Keys
- Configurable multi-threshold budget alerts (e.g. 50% / 80% / 95%) - PR #25989
- Multiple concurrent budget windows per API key and team (#24883) - PR #25109
- Per-member model scope + team `default_team_member_models` - PR #24950
- Migrate regenerate key modal to AntD - PR #25406
- Strip empty premium fields from key update payload - PR #26023
- Default invite-user modal global role to least privilege - PR #25721
-
Teams
-
Models + Endpoints
- Claude Code BYOK support in UI Settings - PR #25998
- E2E tests for Add Model flow - PR #25590
- Pre-select backend default for boolean guardrail provider fields - PR #25700
- Render guardrail `optional_params` bool defaults in `Select` - PR #25806
- Use AntD `Select` for MCP `ToolTestPanel` boolean inputs - PR #25809
- Persist `extra_headers` on MCP server edit - PR #26003
- Migrate Guardrail Test Playground from `@tremor/react` to AntD - PR #25749
- Migrate router_settings page from Tremor to AntD - PR #25879
- Reduce Tremor usage in Guardrails Monitor layout - PR #25803
- Remove Chat UI link from Swagger docs message - PR #25727
- Delete policy attachments via controlled modal - PR #25324
-
Auth / SSO
-
Logs / Activity
-
Helm
- Add `tpl` support to `extraContainers` and `extraInitContainers` - PR #25494
Bugs​
- Strip empty premium fields from key update payload - PR #26023
- Tighten `api_key` value check in credential validation - PR #25917
- `extra_headers` not persisting on MCP server edit - PR #26003
- Logs team-filter dropdown leakage - PR #25716
- Add `getCookie` to `cookieUtils` mock in `user_dashboard` test - PR #25719
- Remove deprecated `tests/ui_e2e_tests/` suite - PR #25657
- Restrict `x-pass-` header forwarding - PR #25916
- Blog dark-mode text invisible on dark background - PR #25620
- Default invite-user role least-privilege - PR #25721
AI Integrations​
Logging​
-
- Populate `standard_logging_object` via logging hook - PR #25679
-
General
- Preserve provider response headers in `StandardLoggingPayload` - PR #25807
Guardrails​
-
- New PromptGuard guardrail integration for prompt-injection detection - PR #24268
-
- Replace `custom_code` sandbox with RestrictedPython - PR #25818
-
- Use correct text positions in `anonymize_text` - PR #24998
-
General
- Per-team opt-out for specific global guardrails - PR #25575
- UI: pre-select backend default for boolean guardrail provider fields - PR #25700
- UI: render guardrail `optional_params` boolean defaults in `Select` - PR #25806
- Read guardrail config from admin metadata and fix tag-routing consistency - PR #25905
Caching​
- Add Responses API params to cache key allow-list - PR #25673
- Prevent `multiple values` `TypeError` in `get_cache_key` - PR #20261
- S3v2: use prepared URL for SigV4-signed S3 requests - PR #25074
Prompt Management / Compression​
- New `litellm.compress()` BM25-based prompt compression API with retrieval tool - PR #25637
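To give a feel for what BM25-based compression means in practice, the sketch below scores context chunks against a query with the Okapi BM25 formula and keeps only the highest-scoring ones. It is a standalone, pure-Python illustration: `tokenize`, `bm25_scores`, and `compress_chunks` are hypothetical names, the `k1` and `b` values are common defaults chosen here as an assumption, and none of it reflects the actual signature or internals of `litellm.compress()`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every chunk against the query with Okapi BM25."""
    docs = [tokenize(chunk) for chunk in chunks]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs if n_docs else 0.0
    # Document frequency: in how many chunks each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    query_terms = set(tokenize(query))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


def compress_chunks(query: str, chunks: list[str], keep: int = 2) -> list[str]:
    """Keep the `keep` best-scoring chunks, preserving their original order."""
    scores = bm25_scores(query, chunks)
    best = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:keep]
    return [chunks[i] for i in sorted(best)]


chunks = [
    "The billing service retries failed invoices every hour.",
    "Our office dog is named Biscuit.",
    "Invoices that fail three times are escalated to support.",
    "Lunch is served at noon on Fridays.",
]
for chunk in compress_chunks("what happens to failed invoices", chunks, keep=2):
    print(chunk)
```

In a design that pairs compression with a retrieval tool, the dropped chunks would presumably stay available for the model to fetch on demand; that part is not modelled here.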
Secret Managers​
- No new secret manager provider additions in this release.
Spend Tracking, Budgets and Rate Limiting​
- Configurable multi-threshold budget alerts for virtual keys (e.g. 50% / 80% / 95%) - PR #25989
- Multiple concurrent budget windows per API key and team (#24883) - PR #25109
- Bedrock/Anthropic accurate cache token cost breakdown in UI and SpendLogs - PR #25735
- Bedrock: prevent negative streaming costs for start-only cache usage - PR #25846
- Fix virtual-key projected-spend soft budget alerts - PR #25838
- Enforce project-level model-specific rate limits in parallel-request limiter - PR #25994
- Persist default router end-budget across restarts - PR #25991
- Align reset times for legacy entities (Team Members, End Users) with the standardized calendar - PR #25440
- Batch-limit stale managed-object cleanup to prevent 300K-row UPDATE - PR #25227
- Cache invalidation: stop double-hashing token in bulk update and key rotation - PR #25552
- `model_max_budget` silently broken for routed models - PR #25549
- Expose reasoning-effort fields in `get_model_info` (and add `together_ai/gpt-oss-120b` to cost map) - PR #25263
- Veo 3.1 Lite resolution-aware tiered cost tracking - PR #25348
- Add `us-south1` region for Vertex `qwen3-235b-a22b-instruct-2507-maas` cost map - PR #25382
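The two budget features in this list can be pictured with a small model: an alert fires for each threshold that a spend update crosses, and each budget window tracks its own spend and reset independently. The sketch below is an assumption-laden illustration, not LiteLLM's implementation: `crossed_thresholds` and `BudgetWindow` are hypothetical names, and windows are fixed-length durations rather than calendar-aligned periods.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def crossed_thresholds(prev_spend, new_spend, max_budget, thresholds=(0.5, 0.8, 0.95)):
    """Return the thresholds crossed when spend moves from prev_spend to new_spend."""
    return [t for t in sorted(thresholds) if prev_spend < t * max_budget <= new_spend]


@dataclass
class BudgetWindow:
    """One budget period with its own limit and reset cadence (hypothetical model)."""
    name: str
    max_budget: float
    period: timedelta
    window_start: datetime
    spend: float = 0.0

    def record(self, amount: float, now: datetime) -> bool:
        """Add spend, resetting first if the window has elapsed. True if within budget."""
        if now - self.window_start >= self.period:
            self.window_start = now  # simplification: no calendar alignment
            self.spend = 0.0
        self.spend += amount
        return self.spend <= self.max_budget


# One spend update can cross several thresholds at once.
print(crossed_thresholds(40.0, 85.0, 100.0))  # [0.5, 0.8]

# A key with a daily and a monthly window running side by side.
start = datetime(2026, 4, 1, tzinfo=timezone.utc)
windows = [
    BudgetWindow("daily", 10.0, timedelta(days=1), start),
    BudgetWindow("monthly", 100.0, timedelta(days=30), start),
]
for hours, amount in [(1, 8.0), (2, 5.0), (26, 5.0)]:
    now = start + timedelta(hours=hours)
    status = {w.name: w.record(amount, now) for w in windows}
    print(hours, status)
```

In this run the second charge exceeds the daily window while the monthly window stays within budget, and the third charge lands after the daily reset, which is the behaviour that separate reset cadences are meant to allow.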
MCP Gateway​
- Validate `is_tool_name_prefixed` against the set of known MCP server prefixes - PR #25085
- Restore PKCE-triggering 401 when no stored per-user token exists - PR #26032
- Expose per-server `InitializeResult.instructions` from the MCP gateway - PR #25694
- Extract shared PKCE helpers into `utils/pkce.ts` - PR #25878
- UI: AntD `Select` for MCP `ToolTestPanel` boolean inputs - PR #25809
- UI: persist `extra_headers` on MCP server edit - PR #26003
Performance / Loadbalancing / Reliability improvements​
- Prometheus exporter performance improvements - PR #25934
- Optimize DB query to prevent OOM during health checks - PR #25732
- `PodLockManager.release_lock` atomic compare-and-delete (re-land of #21226) - PR #24466
- Health-check reasoning-token max-token precedence - PR #25936
- New `BACKGROUND_HEALTH_CHECK_MAX_TOKENS` environment variable - PR #25344
- Return `None` for `routing_strategy_args` when strategy is not latency-based - PR #25882
- Bump proxy dependencies; raise minimum Python to 3.10 - PR #26022
- Bump 22 of 25 vulnerable dependabot-reported dependencies - PR #25442
- Migrate packaging, CI, and Docker from Poetry to uv - PR #25007
- `[Infra]` Bump `llm_translation_testing` resource class to `xlarge` and tolerate worker restarts - PR #25887, PR #25898
- `[Infra]` Expand CI branch filters for non-`main` PR targets - PR #25819
- `[Infra]` Guard `main` to only accept PRs from staging and hotfix branches - PR #25733
- `[Infra]` Remove unused `publish_proxy_extras` and `prisma_schema_sync` jobs from CircleCI config - PR #25821
- `fix(ci)`: increase `test-server-root-path` timeout to 30m - PR #25741
- Remove non-existent `litellm_mcps_tests_coverage` from coverage combine - PR #25737
- Helm: add `tpl` support to `extraContainers` / `extraInitContainers` - PR #25494
- Advisor tool orchestration loop for non-Anthropic providers - PR #25579
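The compare-and-delete change listed above addresses a classic distributed-lock hazard: if a holder checks ownership and then deletes in two separate steps, the lock can expire and be taken by another pod in between, and the delete removes someone else's lock. The in-memory sketch below shows the intended semantics only. `InMemoryLockStore` is a hypothetical stand-in guarded by a mutex; it is not the `PodLockManager` code, and a production version would need the backing store itself to perform the check and delete as one atomic operation.

```python
import threading


class InMemoryLockStore:
    """Toy lock store illustrating atomic compare-and-delete (hypothetical)."""

    def __init__(self):
        self._owners = {}
        self._mutex = threading.Lock()

    def acquire(self, key: str, owner: str) -> bool:
        """Take the lock if nobody holds it."""
        with self._mutex:
            if key in self._owners:
                return False
            self._owners[key] = owner
            return True

    def release(self, key: str, owner: str) -> bool:
        """Delete the lock only if `owner` still holds it, as a single step."""
        with self._mutex:
            if self._owners.get(key) != owner:
                return False  # held by someone else, or already gone
            del self._owners[key]
            return True


store = InMemoryLockStore()
print(store.acquire("cron-job", "pod-a"))  # True
print(store.release("cron-job", "pod-b"))  # False: pod-b is not the holder
print(store.release("cron-job", "pod-a"))  # True
```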
Documentation Updates​
- Cost discrepancy debugging guide - PR #25622
- Week 2 onboarding checklist - PR #25452
- Add "Copy Page as Markdown" + `llms.txt` to docs site - PR #25975
- Docs announcement bar for Trivy compromise resolution - PR #25870
- Restyle docs.litellm.ai/blog to engineering blog aesthetic - PR #25580
- Ramp-style engineering blog restyle + Redis circuit breaker post - PR #25583
- Add back arrow to blog post pages - PR #25587
- Fallbacks image - PR #25731
- General docs update - PR #25736
- Backfill release notes for v1.83.3-stable and v1.83.7.rc.1 - PR #25723, PR #25726
- Fix version shown in docs - PR #25875
New Contributors​
- @hunterchris made their first contribution in https://github.com/BerriAI/litellm/pull/20261
- @Dmitry-Kucher made their first contribution in https://github.com/BerriAI/litellm/pull/24998
- @kulia26 made their first contribution in https://github.com/BerriAI/litellm/pull/25071
- @jaxhend made their first contribution in https://github.com/BerriAI/litellm/pull/23532
- @abhyudayareddy made their first contribution in https://github.com/BerriAI/litellm/pull/25337
- @avarga1 made their first contribution in https://github.com/BerriAI/litellm/pull/25263
- @acebot712 made their first contribution in https://github.com/BerriAI/litellm/pull/24268
- @meutsabdahal made their first contribution in https://github.com/BerriAI/litellm/pull/25395
- @shreyescodes made their first contribution in https://github.com/BerriAI/litellm/pull/25559
- @Lucas-Song-Dev made their first contribution in https://github.com/BerriAI/litellm/pull/25324
- @steromano87 made their first contribution in https://github.com/BerriAI/litellm/pull/25915
- @jlav made their first contribution in https://github.com/BerriAI/litellm/pull/25494
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.7-stable...v1.83.10-stable
04/27/2026​
- New Models / Updated Models: 23
- LLM API Endpoints: 18
- Management Endpoints / UI: 22
- AI Integrations (Logging / Guardrails / Caching / Prompt): 16
- Spend Tracking, Budgets and Rate Limiting: 13
- MCP Gateway: 6
- Performance / Loadbalancing / Reliability improvements: 17
- Documentation Updates: 11