v1.84.0-rc.1 - Reliability hardening + multi-pod budget accuracy
Deploy this version

- Docker

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.84.0-rc.1
```

- Pip

```shell
pip install litellm==1.84.0rc1
```
This is a release candidate cut on top of v1.83.14-stable. Validate on a staging proxy before promoting to the next stable tag.

Heads up: this is a large bundle of behavioral changes. This rc consolidates a lot of reliability and hardening work that shipped in tight sequence. The Important Behavior Changes section below covers everything that changes a default, removes a configuration shortcut, or alters a request/response shape, along with the opt-out you need to keep prior behavior. Read that section before upgrading a production deployment.
Key Highlights

- Pass-through endpoints are authenticated by default. The `auth` field on entries under `general_settings.pass_through_endpoints` now defaults to `true`. The previous "OSS gets unauthenticated forwarders by default; `auth: true` is enterprise-only" combination is gone: `auth: true` works on OSS, and operators who want an unauthenticated forwarder must set `auth: false` explicitly.
- Multi-pod budget enforcement is materially more accurate. `RedisCache.async_increment` gains a `refresh_ttl` opt-in, spend counters opt into it, and stale in-memory counters are skipped on a clean Redis miss. `ResetBudgetJob` invalidates Redis counters alongside DB resets so refreshed counters get reset too.
- Prisma DB reconnects no longer freeze the event loop. The reconnect path replaced `await self.db.disconnect()` (which called `subprocess.Popen.wait()` synchronously) with a SIGTERM → SIGKILL → fresh `Prisma()` + `connect()` sequence. Liveness probes stop failing during database flaps. A companion fix restores reconnect-and-retry on `PrismaClient.get_generic_data`.
- Memory footprint down ~700 MB on a two-worker Docker deployment via lazy-loaded feature routers and a lazy-loaded front page. The first request to a lazy route incurs the import cost; subsequent requests are unchanged.
- MCP OAuth + Azure Entra discovery support, an opt-in short-ID tool prefix to keep MCP tool names under the 60-char limit, and OAuth root-endpoint visibility now matches explicit server-name lookup.
- Durable agent workflow run tracking via a new `/v1/workflows/runs` REST surface backed by `LiteLLM_WorkflowRun` / `LiteLLM_WorkflowEvent` / `LiteLLM_WorkflowMessage` tables. Spend logs `session_id` joins give free cost attribution.
- Per-model routing strategies via Routing Groups. A new `router_settings.routing_groups` schema binds a list of `model_name`s to its own routing strategy (e.g. `latency-based-routing` for `gpt-4o`, `simple-shuffle` for cheaper models) within a single router. Configurable in `proxy_config.yaml` or from the LiteLLM dashboard under General Settings → Routing Groups; UI-managed groups persist and override the YAML values.
⚠️ Important Behavior Changes
This release tightens a number of defaults across auth, ingress, callbacks, MCP, and the UI. Each item below names the change and, where applicable, the exact configuration you need to restore prior behavior.
Auth & request ingress

Pass-through endpoints default to `auth: true`

- What changed: `PassThroughGenericEndpoint.auth` now defaults to `True`. The runtime dispatch in `user_api_key_auth.py` reads endpoints as raw dicts, so `endpoint.get("auth", True)` applies even when the dict has no explicit key. The `premium_user` gate on `auth: true` was also removed: OSS deployments can now use `auth: true`.
- Who is affected: Any pass-through entry in `general_settings.pass_through_endpoints` that omitted `auth:`. Prior to this rc that meant unauthenticated; it now means LiteLLM-key-authenticated.
- Restore prior behavior: Set `auth: false` explicitly on every pass-through entry that is meant to be public (e.g. webhook receivers):

```yaml
general_settings:
  pass_through_endpoints:
    - path: /webhook/something
      target: https://example.com/webhook
      auth: false  # was implicit before; must be explicit now
```
Clientside `api_base` / `base_url` are gated and credential-stripped

- What changed:
  - Clientside `api_base`/`base_url` are validated against `validate_url` when `litellm.user_url_validation` is enabled.
  - When a request redirects `api_base`/`base_url`, admin-configured provider credentials and per-deployment metadata (OCI signing keys, AWS / Azure / Vertex tokens, observability vars, every field on `CredentialLiteLLMParams`) are dropped before the call is forwarded.
  - The provider-inference matcher in `get_llm_provider_logic.py` no longer does an unanchored substring match; it now compares parsed URL hostname + segment-bounded path prefix.
  - The blocklist for clientside-overridable params adds `aws_bedrock_runtime_endpoint`, `langsmith_base_url`, `langfuse_host`, `posthog_host`, `braintrust_host`, `slack_webhook_url`, `s3_endpoint_url`, `sagemaker_base_url`, `deployment_url`. The old "blocklist is a no-op when `api_key` is non-empty" clause is removed.
- Who is affected: Anyone passing `api_base` (or any of the newly-blocked fields) at request time and relying on the implicit-`api_key` bypass to thread it through.
- Restore prior behavior: Use the documented BYOK paths instead of the bypass:
  - Proxy-wide: `general_settings.allow_client_side_credentials: true`
  - Per deployment: `litellm_params.configurable_clientside_auth_params: ["api_base", ...]`

  The 400 returned by the proxy on a blocked request names the offending field and points at the same two settings.
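To make the matcher change concrete, here is a minimal sketch of the anchored hostname + segment-bounded path comparison described above. This is a hypothetical helper for illustration, not LiteLLM's actual function; the point is that `/openai` matches `/openai/v1` but not `/openai-evil`, and a hostname embedded in a query string never matches.

```python
from urllib.parse import urlparse

def url_matches_deployment(candidate: str, deployment: str) -> bool:
    """Anchored match: exact parsed hostname + segment-bounded path prefix,
    instead of an unanchored substring test (illustrative sketch only)."""
    c, d = urlparse(candidate), urlparse(deployment)
    if c.hostname != d.hostname:
        return False
    c_path, d_path = c.path.rstrip("/"), d.path.rstrip("/")
    # "/openai" matches "/openai/v1" but not "/openai-evil"
    return c_path == d_path or c_path.startswith(d_path + "/")

print(url_matches_deployment("https://api.example.com/openai/v1",
                             "https://api.example.com/openai"))    # True
print(url_matches_deployment("https://api.example.com/openai-evil",
                             "https://api.example.com/openai"))    # False
```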
Master-key requests now propagate an alias instead of the master-key hash

- What changed: When a request authenticates with the master key, the `UserAPIKeyAuth.api_key`/`token` value handed to downstream code is now the constant `LITELLM_PROXY_MASTER_KEY_ALIAS = "litellm_proxy_master_key"`. The cache lookup is unchanged (still keyed on `hash_token(master_key)`). `_is_master_key` no longer accepts the SHA-256 hash form, only the raw master key.
- Who is affected: Anything joining or filtering on the prior master-key hash value, including custom dashboards over spend logs and Prometheus `/metrics` queries pinned to the hash literal.
- Restore prior behavior: None. Operators querying spend logs or metrics for master-key activity should switch their filter to the alias `"litellm_proxy_master_key"`.
Invite-link onboarding no longer mints a key from GET

- What changed: `GET /onboarding/get_token` returns a 15-minute signed onboarding JWT bound to invite + user id; it does not mint a `sk-...` virtual key. `POST /onboarding/claim_token` requires that JWT and atomically reserves the invite via `update_many(... is_accepted=False, ... → True)`.
- Who is affected: Any tooling that consumed `GET /onboarding/get_token` for an embedded `sk-...` and treated it as a usable session key before completing the password claim.
- Restore prior behavior: None. Clients must call `POST /onboarding/claim_token` to obtain the live key.
CLI SSO login flow uses a server-side session

- What changed: `litellm-proxy login` now starts a CLI SSO flow that returns a login id + polling secret + terminal verification code. The browser callback must confirm the terminal code before the polling endpoint returns the JWT.
- Who is affected: Anyone running an older `litellm-proxy` CLI against an upgraded proxy; the old caller-supplied-handle handoff is gone.
- Restore prior behavior: None. Upgrade the CLI alongside the proxy.
Team self-join (`_is_available_team`) only allows self-add as `role="user"`

- What changed:
  - `/team/member_add`: when the caller is not an admin and the team is "available," the request must add only the caller themselves with `role="user"`. Bulk shapes are checked the same way; lists mixing a valid self-entry with a `role="admin"` entry are rejected. Email-only members on the self-join path are rejected.
  - `/team/permissions_update`: the `_is_available_team` clause is removed entirely; only proxy/team/org admins can update `team_member_permissions`.
- Who is affected: Any flow that relied on the blanket bypass to either add an admin to an available team without admin privileges, or to mutate `team_member_permissions` from a non-admin context.
- Restore prior behavior: None. Perform admin-scoped operations with an admin key.
Guardrail modification permission gates on key presence

- What changed: The guardrail-modification authz check in `auth_checks.py` now gates on intent (whether the key is present in the request) rather than payload truthiness. Some previously-accepted shapes will now 403.
- Restore prior behavior: None. Flow updates are required for non-admin callers that previously slipped past on falsy payloads.
Untrusted root control fields are stripped from client requests

- What changed: `_UNTRUSTED_ROOT_CONTROL_FIELDS` in `litellm_pre_call_utils.py` includes `mock_response`, `mock_tool_calls`, redaction-bypass controls, and a few others. They are stripped from client requests unless the calling key/team carries `allow_client_mock_response: true` (for `mock_response`/`mock_tool_calls`) or the corresponding admin-opt-in metadata for the redaction bypass. Pillar guardrail caching headers and Bedrock dynamic evaluation overrides are also filtered when not explicitly allowed.
- Who is affected: Tests and tooling that pass `mock_response`/`mock_tool_calls` in `extra_body` to short-circuit completions.
- Restore prior behavior: Set `allow_client_mock_response: true` in the admin metadata of the test key (or the team owning it):

```python
client.keys.generate(
    key_alias="ci-mock-key",
    metadata={"allow_client_mock_response": True},
)
```
Error responses no longer leak re-raised local parameters

- What changed: Broad `except` handlers in the response-utils path used to render the captured request parameters into the re-raised error message. Those parameters can carry credentials, so they're now dropped from the rendered message.
- Who is affected: Any client that parsed credential-shaped fields out of a 5xx error body. The error response shape is otherwise unchanged.
- Restore prior behavior: None.
Vector stores

Credentials redacted; `/vector_store/update` is per-store gated

- What changed:
  - `/vector_store/list`, `/vector_store/info`, `/vector_store/update` redact credential-bearing values inside the persisted `litellm_params` (handles dicts, JSON-string-serialized params, and nested-dict shapes like `litellm_embedding_config`).
  - `/vector_store/update` is now gated by `_fetch_and_authorize_vector_store`, the same per-store access check `/vector_store/info` already had.
  - `SensitiveDataMasker` adds the plural `"credentials"` to its default sensitive-pattern set, so segment-exact matching catches `vertex_credentials`, `aws_credentials`, etc. (This is a latent fix that affects every default-instantiated masker, not just vector stores.)
  - `get_vector_store_info` and `update_vector_store` re-raise `HTTPException` instead of letting the catch-all downgrade `403`/`404` to `500`.
- Who is affected: Anything reading `litellm_params` off these responses to recover provider keys, or any non-store-admin caller mutating arbitrary vector stores via `/vector_store/update`.
- Restore prior behavior: None.
Logging callbacks & key/team metadata

`os.environ/*` callback refs in key/team metadata are no longer resolved

- What changed: `convert_key_logging_metadata_to_callback()` no longer resolves `os.environ/*` values from key/team metadata via `get_secret()`. Existing rows with such values are silently ignored at request setup instead of crashing the request. Trusted `config.yaml` team-callback env resolution in `add_team_based_callbacks_from_config()` is unchanged. New `AddTeamCallback` constructions from key/team logging metadata also reject `os.environ/*` callback vars.
- Who is affected: Any key/team that stored `os.environ/DATABASE_URL` (or similar) in its callback metadata to pick up a server env var at request time.
- Restore prior behavior: Configure those callback secrets through the trusted proxy `config.yaml` (`team_callbacks` / `model_list[*].litellm_params`) instead of putting `os.environ/*` references in DB-backed key or team metadata. The literal credential value can still be stored in metadata if absolutely necessary.
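For orientation, a minimal sketch of the trusted-config shape the bullet points at, where `os.environ/*` references are still resolved. The callback name and env-var names here are illustrative assumptions; consult your callback's own configuration keys.

```yaml
# Illustrative sketch: env refs stay in the trusted proxy config.yaml,
# where resolution is unchanged, not in DB-backed key/team metadata.
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY   # still resolved here
litellm_settings:
  success_callback: ["langfuse"]           # example callback name
```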
Team-callback admin mutations now emit audit logs

- What changed: `POST /team/{id}/callback` (`add_team_callbacks`) and `POST /team/{id}/disable_logging` (`disable_team_logging`) emit `LiteLLM_AuditLogs` rows when `litellm.store_audit_logs=True`. Additive when audit logging is enabled.
- Restore prior behavior: `litellm.store_audit_logs: false` (the default) suppresses the new rows.
MCP

Encrypted user-scoped MCP credentials at rest

- What changed: Writes to `LiteLLM_MCPUserCredentials.credential_b64` go through `encrypt_value_helper` (nacl SecretBox) instead of plain `urlsafe_b64encode`. The read path tries nacl decryption first and falls back to plain `urlsafe_b64decode` for legacy rows; existing rows stay readable.
- Who is affected: Operators reading the table directly; the column contents change shape on first re-write.
- Restore prior behavior: None. The backward-compat read path keeps legacy rows working until they are next written.
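The try-decrypt-then-fall-back read path can be sketched like this. `decrypt_value` is a hypothetical stand-in for the proxy's nacl-SecretBox helper; here it simply raises, simulating a legacy row that was never encrypted.

```python
import base64

def decrypt_value(blob: bytes) -> str:
    """Hypothetical stand-in for the nacl SecretBox decryption helper.
    Raising simulates a legacy, plain-b64 row that nacl cannot open."""
    raise ValueError("not an encrypted blob")

def read_credential(stored: str) -> str:
    """Try the encrypted read first, then fall back to legacy plain b64."""
    raw = stored.encode()
    try:
        return decrypt_value(raw)                      # new rows
    except Exception:
        return base64.urlsafe_b64decode(raw).decode()  # legacy rows

legacy_row = base64.urlsafe_b64encode(b"token-123").decode()
print(read_credential(legacy_row))  # token-123
```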
OAuth metadata discovery follows SSRF guard

- What changed: The two URLs MCP discovery follows (`resource_metadata` from `WWW-Authenticate`, and `authorization_servers[0]` from protected-resource metadata) are now subject to `async_safe_get`. Same-authority metadata fetches stay direct (with `follow_redirects=False`); cross-origin fetches are validated via the existing user URL validation policy. Public federated providers (Azure Entra, Google, Okta, GitHub) remain supported.
- Who is affected: Cross-origin internal/loopback/cloud-metadata OAuth metadata URLs.
- Restore prior behavior: Toggle `litellm.user_url_validation` and the existing URL validation controls per the proxy URL-validation docs to permit your specific internal targets.
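The class of targets an SSRF guard rejects can be illustrated with the standard library. This is a rough sketch, not LiteLLM's `async_safe_get` logic (which also handles DNS resolution and redirects); it shows why loopback, private-range, and link-local (cloud metadata) hosts are the ones affected.

```python
import ipaddress
from urllib.parse import urlparse

def is_plausibly_internal(url: str) -> bool:
    """Rough SSRF triage sketch: flag loopback / private / link-local IPs.
    Real guards also resolve hostnames before deciding; this one does not."""
    host = urlparse(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostname, not a literal IP
    return ip.is_loopback or ip.is_private or ip.is_link_local

print(is_plausibly_internal("http://169.254.169.254/latest/meta-data"))   # True
print(is_plausibly_internal("https://login.microsoftonline.com/common"))  # False
```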
MCP public-route detection no longer matches query strings; OAuth2 fallback no longer fail-opens

- What changed:
  - `MCPRequestHandler.process_mcp_request` checks `request.url.path.startswith("/.well-known/")` instead of `".well-known" in str(request.url)`. Query-string smuggling like `?.well-known` is rejected.
  - When an `Authorization` header fails LiteLLM-key validation, the handler no longer treats the failure as "OAuth2 passthrough" and returns an empty `UserAPIKeyAuth()`.
- Restore prior behavior: None.
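The path-only check is easy to demonstrate with the standard library; a sketch of the hardened predicate:

```python
from urllib.parse import urlsplit

def is_well_known(url: str) -> bool:
    """Match on the parsed path only, so a query string like '?.well-known'
    can no longer smuggle a request past the public-route check."""
    return urlsplit(url).path.startswith("/.well-known/")

print(is_well_known("https://proxy/.well-known/oauth-authorization-server"))  # True
print(is_well_known("https://proxy/mcp/tools?.well-known"))                   # False
```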
MCP OAuth root endpoint resolves with request visibility rules

- What changed: Root-endpoint fallback resolves the single OAuth2 server using the same visibility rules as explicit server-name lookup; non-visible servers are no longer selected via the fallback path. The callback redirect path validates the full client redirect URI carried in state and appends parameters without dropping an existing query string.
- Restore prior behavior: None. Adjust server visibility rather than relying on the fallback.
UI / static assets

`/get_image`, `/get_favicon`, `/get_logo_url`

- What changed:
  - Remote HTTP(S) `UI_LOGO_PATH`/`LITELLM_FAVICON_URL` are now browser-loaded via redirect; the proxy no longer fetches them server-side from these unauthenticated endpoints.
  - Local file paths still work in place, but the resolved file must have a supported image signature (`jpeg`, `png`, `gif`, `webp`, `ico`); non-image paths fall back to the bundled default.
  - `/get_logo_url` only returns HTTP(S) values; local filesystem paths are not disclosed.
  - Stale `cached_logo.jpg` files are no longer served by `/get_image`.
- Who is affected: Custom branding setups that pointed `UI_LOGO_PATH`/`LITELLM_FAVICON_URL` at non-image local files, or relied on `/get_logo_url` to surface a local path.
- Restore prior behavior: No new env vars required. Existing remote URLs continue to work; local image paths continue to work as long as the file is a recognized image type.
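Signature-based sniffing of the kind described above checks leading magic bytes rather than trusting the file extension. The format list mirrors the release notes; the byte patterns are the standard published signatures, not LiteLLM's actual implementation.

```python
# Standard magic-byte signatures for the supported formats (sketch).
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x00\x00\x01\x00": "ico",
}

def sniff_image(data: bytes) -> "str | None":
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":  # webp wraps a RIFF header
        return "webp"
    for magic, kind in _SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None

print(sniff_image(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # png
print(sniff_image(b"#!/bin/sh\n"))                      # None
```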
`/ui/chat` removed

- What changed: Static `chat.html`/`chat.txt`/`chat/` are gone; the route 404s. The chat UI was already removed from the nav; the dangling static build is now also gone.
- Restore prior behavior: None.
"Store Prompts in Spend Logs" toggle moved to Admin Settingsβ
- What changed: Both "Store Prompts in Spend Logs" and "Maximum Spend Logs Retention Period" moved from a gear-icon modal on the Logs page to Admin Settings β Logging Settings. The gear was visible to non-admins and surfaced 403s on save.
- Restore prior behavior: None β controls are admin-only as
/config/updateand/config/listalready required.
New Models / Updated Models

New Model Support (16 new models)
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-image-2, gpt-image-2-2026-04-21 | n/a (image) | $5.00 | $10.00 | vision, pdf input |
| Azure OpenAI | azure/gpt-image-2, azure/gpt-image-2-2026-04-21 | n/a (image) | $5.00 | $10.00 | vision, pdf input |
| AWS Bedrock | zai.glm-5 | 200,000 | $1.00 | $3.20 | function calling, reasoning, tool choice |
| Crusoe | crusoe/deepseek-ai/DeepSeek-R1-0528 | 163,840 | $3.00 | $7.00 | reasoning |
| Crusoe | crusoe/deepseek-ai/DeepSeek-V3-0324 | - | - | - | - |
| Crusoe | crusoe/google/gemma-3-12b-it | 131,072 | $0.10 | $0.10 | function calling, vision, tool choice |
| Crusoe | crusoe/meta-llama/Llama-3.3-70B-Instruct | 131,072 | $0.20 | $0.20 | function calling, tool choice |
| Crusoe | crusoe/moonshotai/Kimi-K2-Thinking | 262,144 | $2.50 | $2.50 | reasoning |
| Crusoe | crusoe/openai/gpt-oss-120b | 131,072 | $0.80 | $0.80 | function calling, tool choice |
| Crusoe | crusoe/Qwen/Qwen3-235B-A22B-Instruct-2507 | 262,144 | $3.00 | $3.00 | function calling, tool choice |
| Vertex AI | vertex_ai/xai/grok-4.1-fast-reasoning | 2,000,000 | $0.20 | $0.50 | function calling, vision, reasoning, response schema, tool choice |
| Vertex AI | vertex_ai/xai/grok-4.1-fast-non-reasoning | 2,000,000 | $0.20 | $0.50 | function calling, vision, response schema, tool choice |
| Vertex AI | vertex_ai/xai/grok-4.20-reasoning | 2,000,000 | $2.00 | $6.00 | function calling, vision, reasoning, response schema, tool choice |
| Vertex AI | vertex_ai/xai/grok-4.20-non-reasoning | 2,000,000 | $2.00 | $6.00 | function calling, vision, response schema, tool choice |
New Providers (2 new providers)
| Provider | Endpoints | Notes |
|---|---|---|
| AIHubMix | OpenAI-compatible chat completions | PR #24294 |
| Crusoe | chat completions across reasoning / instruct catalogs | catalog above |
Pricing updates

- OpenAI `gpt-5.5-pro` pricing corrected: it was 2× OpenAI's published rate. Cost-tracking output for `gpt-5.5-pro` will drop to half what it reported under previous releases; operators reconciling spend reports across the upgrade boundary should expect the discontinuity. - PR #26651
- AWS Bedrock Anthropic Claude 4.5 / 4.6 / 4.7 (Global + US): added `cache_creation_input_token_cost_above_1hr` (and the `_above_200k_tokens` LC variant for Sonnet 4.5). 1-hour-TTL prompt-cache writes on Bedrock now bill at the published 1.6× rate instead of falling back to the 5-minute rate (was undercounting by ~60%). - PR #26800
Features

- Bedrock
  - Preserve `cache_control` TTL on tools for Claude 4.5+ on the Converse path; sanitize `tools` blocks on the Invoke path - PR #25855
  - Translate OpenAI `file` content on the tool-result path (Bedrock Converse + direct Anthropic) - PR #26710
  - `retrievalConfiguration` passthrough for vector-store search via `extra_body` - PR #26685
- Vertex AI
- Google Native
- Anthropic
  - JSON `response_format` + user tools on non-streaming: filtered tool calls + structured JSON merged into `content`; internal `json_tool_call` no longer surfaces - PR #26222
- Ollama
  - Forward `tool_calls` on assistant messages and `tool_call_id` on `role: tool` messages; fixes the infinite tool-call loop on multi-turn agents - PR #26122
- Predibase
  - Migrate `transform_request`/`transform_response` into `transformation.py` (refactor, no behavior change) - PR #25249
- AIHubMix (new)
  - First-class OpenAI-compatible provider entry - PR #24294
Bug Fixes

- Vertex AI
  - Preserve `items` on the array branch of `anyOf` schemas with `null` (Vertex was rejecting with `INVALID_ARGUMENT`) - PR #26675
- Bedrock
  - `GET /v1/batches/{batch_id}` forwards `model` from the encoded id (was returning `LiteLLM doesn't support bedrock for 'create_batch'`) - PR #26814
  - Pass-through stream interruption now flushes spend tracking; `GeneratorExit` from client disconnect was dropping per-chunk usage values - PR #26719
  - Replace deprecated Claude 3.7 Sonnet test references with `claude-sonnet-4-5-20250929-v1:0` across 16 test files - PR #26721
- Router custom pricing
  - Propagate custom `cost_per_token` from DB `model_info` through the fallback path - PR #25888
LLM API Endpoints

Features

- Workflows API (new)
  - Durable agent workflow run tracking. New schema (`LiteLLM_WorkflowRun`, `LiteLLM_WorkflowEvent`, `LiteLLM_WorkflowMessage`) and 8 endpoints under `/v1/workflows/runs/...` (create, list, get, patch, append/list events, append/list messages). `session_id` joins to `LiteLLM_SpendLogs.session_id` for free cost attribution. - PR #26793
- Vector Stores
  - Bedrock `retrievalConfiguration` passthrough via `extra_body`, with explicit allow-listing per provider - PR #26685

Bugs

- Responses API
  - `DELETE /openai/responses/{id}` no longer sends `json={}`; Azure rejects the empty `{}` body with `unexpected_body` - PR #26949
- Pass-through endpoints
  - Invoke post-call guardrails on non-streaming pass-through responses (`/vertex_ai/*`, `/openai/*`, `/bedrock/*`); opt-in only when guardrails are configured for the route - PR #26262
  - Inherit caller identity from `litellm_params` metadata when fabricating `UserAPIKeyAuth` for managed-files passthrough batch creation (Anthropic + Vertex AI) - PR #26831
- Embedding cache
  - Preserve `prompt_tokens_details` (incl. `image_count`) through the cache round-trip; aggregate per-item details on retrieval; merge in `combine_usage()` for partial cache hits - PR #26653
- Streaming logging
  - Backfill streaming hidden response cost into the success log path - PR #26606
- Cost calculation
  - Unify `success_handler` typed and dict branches so spend rows stop logging `0` and the budget-overrun reports it caused - PR #26629
Management Endpoints / UI

Features

- Teams
  - Team-level search-tool credentials: new `search_tools` array on `LiteLLM_ObjectPermissionTable`; per-key permissions validated as a subset of the owning team's; UI selector under team management - PR #26691
- Routing Groups
  - New General Settings → Routing Groups page: create, edit, and delete per-model routing strategies from the dashboard without editing `proxy_config.yaml`. UI-managed groups are persisted and override values defined in YAML; per-group state is rebuilt on save - PR #27131
- Model Health
  - Pagination controls on the model health status page - PR #26826
- CLI / Workers
  - `--timeout_worker_healthcheck` CLI flag (env `TIMEOUT_WORKER_HEALTHCHECK`): forwards to the uvicorn 0.37.0+ Config kwarg; older uvicorn logs a warning and no-ops; gunicorn / hypercorn paths are untouched - PR #26622
- Memory / lazy loading
- Background jobs
  - Cleanup job for expired LiteLLM dashboard session keys - PR #26460
- MCP OAuth
  - Azure Entra discovery endpoint support - PR #26584

Bugs

- MCP UI
  - Tool Configuration panel on the MCP server edit page switched from `POST /mcp-rest/test/tools/list` (temp-session preview, requires inline creds) to `GET /mcp-rest/tools/list?server_id=...` (stored credentials). Saved servers with `auth_type` of `api_key`/`bearer_token`/`basic`/`authorization` now load tools without "Unable to load tools - Failed to connect to MCP server." - PR #26002
- Teams
  - Per-member rows with `max_budget=NULL` now fall through to team-level enforcement instead of silently disabling it - PR #26809
- Spend logs
  - Strip request data from spend-log error messages - PR #26662
- Vertex retrieve mocked tests
  - `is_redirect=False` set on mocked retrieve responses - PR #26844
AI Integrations

Logging

- General
  - Opt-in retry settings for the Generic API logger batch send: transient `litellm.Timeout`/`httpx.ConnectTimeout` failures retry instead of dropping the batch - PR #26645
  - Cache the GCP IAM token used for Redis (it was being regenerated per-connection; synchronous `google-auth` + `google-cloud-iam` calls were freezing the asyncio event loop, causing ~25 s `INCRBYFLOAT` Redis spans in production) - PR #26441
  - Backfill streaming hidden response cost - PR #26606

Guardrails

- CyCraft XecGuard (new)
  - First-class partner guardrail. Multi-policy prompt/response scanning (prompt injection, harmful content, PII, system-prompt enforcement, bias, skills protection) plus RAG context-grounding via `/grounding` - PR #26011
- Noma v2
  - `_build_scan_payload` no longer crashes during `post_call`/`during_call`/`during_mcp_call` on `deepcopy(request_data)` failures with unserializable objects (e.g. `uvloop.Loop`) - PR #26605
- Pass-through
  - Post-call guardrails on non-streaming pass-through responses (see LLM API Endpoints) - PR #26262
Spend Tracking, Budgets and Rate Limiting

- Multi-pod budget enforcement
  - `RedisCache.async_increment` gains a `refresh_ttl` opt-in (used by spend counters); `get_current_spend` and `SpendCounterReseed.coalesced` skip stale per-pod in-memory values on a clean Redis miss; `ResetBudgetJob` invalidates the Redis counter alongside every DB row reset (keys, users, teams, team members, budget-linked keys) - PR #26829
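The multi-pod counter semantics can be sketched in plain Python: a per-pod cache holds a copy of the shared (Redis) spend counter, and on a clean miss in the shared store the stale per-pod copy is skipped rather than trusted. All names here are illustrative, not LiteLLM's.

```python
shared: dict = {}   # stands in for Redis
local: dict = {}    # stands in for the per-pod in-memory cache

def increment(key: str, amount: float) -> float:
    """Increment the shared counter; a Redis INCR with refresh_ttl would
    also re-arm the key's expiry here (sketch only)."""
    new_total = shared.get(key, 0.0) + amount
    shared[key] = new_total
    local[key] = new_total
    return new_total

def current_spend(key: str) -> float:
    if key not in shared:      # clean miss: counter was reset/invalidated
        local.pop(key, None)   # do NOT fall back to the stale local copy
        return 0.0
    return shared[key]

increment("team:abc", 1.5)
shared.pop("team:abc")         # ResetBudgetJob invalidates the Redis counter
print(current_spend("team:abc"))  # 0.0, not the stale 1.5
```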
- Cost calc unification
  - `success_handler` typed + dict branches now compute cost the same way - PR #26629
- Per-member null budget
  - Per-member rows with `max_budget=NULL` fall through to team enforcement - PR #26809
- Bedrock 1-hour cache write pricing
  - Claude 4.5 / 4.6 / 4.7 Global + US entries gain `cache_creation_input_token_cost_above_1hr` (was undercounting ~60%) - PR #26800
- `gpt-5.5-pro` corrected pricing
  - Was double-priced - PR #26651
- Bedrock pass-through stream interruption
  - Spend tracking now flushes when a client disconnects mid-stream - PR #26719
MCP Gateway

- Tool prefix
  - Opt-in `LITELLM_USE_SHORT_MCP_TOOL_PREFIX` env var: switches the per-tool prefix from the human-readable server name (`github_onprem-get_repo`) to a deterministic 3-char base62 id derived from `server_id` (`Xy7-get_repo`). Lets long server names stay under the 60-char tool-name limit some model APIs enforce - PR #26733
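A deterministic short-id scheme of this kind can be sketched as follows. The hash-then-base62 derivation below is a hypothetical illustration; LiteLLM's actual derivation of the 3-char id may differ.

```python
import hashlib

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def short_prefix(server_id: str, length: int = 3) -> str:
    """Hypothetical sketch: hash the server_id, then encode a few base62
    digits, so the same server always yields the same short prefix."""
    digest = int.from_bytes(hashlib.sha256(server_id.encode()).digest(), "big")
    chars = []
    for _ in range(length):
        digest, rem = divmod(digest, 62)
        chars.append(BASE62[rem])
    return "".join(chars)

# Same server_id always yields the same stable 3-char prefix:
print(short_prefix("github_onprem"))
```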
- OAuth
- Azure Entra discovery endpoint support - PR #26584
- See Important Behavior Changes for public-route detection, OAuth root endpoint visibility, OAuth metadata SSRF guard, and user-scoped credential encryption.
Performance / Loadbalancing / Reliability improvements

- Routing Groups (per-model strategies)
  - New `router_settings.routing_groups` schema binds a list of `model_name`s to its own `routing_strategy` and optional `routing_strategy_args`; ungrouped models fall back to the top-level `routing_strategy` (the implicit `default` group; the name is reserved). Each `model_name` may belong to at most one group; overlap raises `ValueError` at init. Updatable at runtime via `Router.update_settings(routing_groups=[...])` or `/config/update`; per-group state is rebuilt on update - PR #27022
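A minimal sketch of what such a config could look like, built from the field names the text gives (`routing_groups`, `model_name`s, `routing_strategy`, `routing_strategy_args`); the group-name key is an assumption:

```yaml
# Illustrative routing_groups sketch; verify field names against the docs.
router_settings:
  routing_strategy: simple-shuffle        # fallback for ungrouped models
  routing_groups:
    - name: premium                       # group-name key is an assumption
      model_names: ["gpt-4o"]
      routing_strategy: latency-based-routing
      routing_strategy_args: {}
```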
- Database reconnect
  - Prisma reconnect no longer blocks the asyncio event loop. Replaces `await self.db.disconnect()` (which calls `subprocess.Popen.wait()` synchronously and freezes the loop for 30-120 s+ in production, failing K8s liveness probes) with SIGTERM → 0.5 s sleep → SIGKILL → fresh `Prisma()` + `connect()`. The direct-reconnect path delegates to `recreate_prisma_client` - PR #26225
  - A `call_with_db_reconnect_retry` helper centralizes the reconnect-and-retry-once pattern. Restores the self-heal that 1.83.x lost on `PrismaClient.get_generic_data` (issue #25143) and hardens the reconnect state machine - PR #26756
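The non-blocking kill sequence above can be sketched against a stand-in child process. This is an illustration of the SIGTERM → grace period → SIGKILL pattern, not LiteLLM's code; the fresh-client reconnect step is elided.

```python
import signal
import subprocess
import sys
import time

def force_restart(proc: subprocess.Popen) -> None:
    """SIGTERM, short grace period, then SIGKILL if still alive; always a
    bounded reap instead of an unbounded synchronous wait (sketch only)."""
    proc.send_signal(signal.SIGTERM)
    time.sleep(0.5)               # grace period
    if proc.poll() is None:       # still alive -> escalate
        proc.kill()               # SIGKILL
    proc.wait(timeout=5)          # bounded reap

# Stand-in for a stuck Prisma query engine: a child that sleeps forever.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
force_restart(child)
print(child.poll() is not None)  # True: the stuck process is gone
```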
- Redis IAM token caching
  - The GCP IAM token is no longer regenerated on every Redis connection; a single Redis `INCRBYFLOAT` was taking 25.6 s on a 28.4 s trace in production - PR #26441
- Config caching
  - DualCache config parameter reads are cached and batched. End-to-end on Docker, read load drops from 2.8 q/s to 0.7 q/s; the improvement scales with pod count. Note: config edits will take longer to propagate (until the cache is invalidated) - PR #26469
- Memory footprint
- Connection layer
  - Optional TCP `SO_KEEPALIVE` support on aiohttp's `TCPConnector` - PR #26730
- CLI
  - `--timeout_worker_healthcheck` flag for uvicorn worker triage (see Management Endpoints) - PR #26622
- Test stability
  - Scope the `test_model_alias_map` ERROR-log assertion to the LiteLLM logger so `asyncio` records (e.g. `Unclosed client session`) stop flunking the assertion intermittently - PR #26741
  - Replace the lazy-load subprocess startup-import diff with a static source scan (~13 s instead of timing out past two minutes) - PR #26934
  - Opt model-access E2E tests into `allow_client_mock_response: true` after the request-control hardening - PR #26941
- Validation
General Proxy Improvements

- CI / Tooling
  - Support CircleCI "Rerun failed tests" for `local_testing_part1`/`local_testing_part2`/`litellm_router_testing` jobs (was collecting 0 items + exit 123) - PR #26461
  - Correct the `min-release-age` value in `.npmrc` files: drop the `d` suffix to keep `npm install` from crashing on npm 11.x with `RangeError: Invalid time value` - PR #26850
- Pull request template
  - Add a Linear ticket field for internal contributors - PR #26655
New Contributors
- @xinrui-z made their first contribution in #24294
- @Jerry-SDE made their first contribution in #25249
- @Zerohertz made their first contribution in #25888
- @clyang made their first contribution in #26011
- @mverrilli made their first contribution in #26122
- @tuhinspatra made their first contribution in #26262
- @omriShukrun08 made their first contribution in #26605
- @lmcdonald-godaddy made their first contribution in #26651
- @minznerjosh made their first contribution in #26710
- @yassinkortam made their first contribution in #26730
- @sruthi-sixt-26 made their first contribution in #26814
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.14-stable...v1.84.0-rc.1
05/05/2026
- New Models / Updated Models: 19
- LLM API Endpoints: 6
- Management Endpoints / UI: 22
- AI Integrations (Logging / Guardrails): 3
- Spend Tracking, Budgets and Rate Limiting: 5
- MCP Gateway: 6
- Performance / Loadbalancing / Reliability improvements: 14
- General Proxy Improvements: 2
- Documentation Updates: 1
Total: 78 PRs