
v1.83.14.rc.1 - GPT-5.5, Prompt Compression & Memory API

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
docker.litellm.ai/berriai/litellm:main-v1.83.14.rc.1

This is a release candidate cut on top of v1.83.10-stable. Validate on a staging proxy before promoting to the next stable tag.

Key Highlights

  • Day-0 GPT-5.5 and GPT-5.5 Pro support: OpenAI and Azure variants ship with full pricing maps, dated snapshots, and Responses-mode routing for the Pro tier.
  • Server-side Prompt Compression: a first-class proxy callback that transparently compresses long-context inputs (Claude Code, RAG, document workloads) before they reach the upstream model, with no client opt-in required.
  • /v1/memory CRUD endpoints: the proxy now exposes a memory store API with Prisma-backed metadata, used by the new agent loop.
  • LLM-as-a-Judge guardrail: a model-graded post-call guardrail with configurable rubrics, joining the Bedrock / Lakera / Presidio / Noma family.
  • MCP OAuth hardening: the discoverable and BYOK authorize/token endpoints are tightened, temporary OAuth sessions are now shared across proxy instances via Redis, and per-server access policy is enforced uniformly across the proxy and broker.
  • Per-member team budgets in production: individual member budgets, per-member budget cycles surfaced in the Teams UI, and user/org spend checks aligned with the atomic counter pattern.
  • Adaptive routing: an opt-in router policy that weights deployments by recent latency/error history, on top of the existing wildcard fallback.

New Models / Updated Models

New Model Support (22 new models)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Mode |
| --- | --- | --- | --- | --- | --- |
| OpenAI | gpt-5.5, gpt-5.5-2026-04-23 | 1,050,000 | $5.00 | $30.00 | chat |
| OpenAI | gpt-5.5-pro, gpt-5.5-pro-2026-04-23 | 1,050,000 | $60.00 | $360.00 | responses |
| OpenAI | gpt-5.4-mini-2026-03-17 | 272,000 | $0.75 | $4.50 | chat |
| OpenAI | gpt-5.4-nano-2026-03-17 | 272,000 | $0.20 | $1.25 | chat |
| Azure OpenAI | azure/gpt-5.5, azure/gpt-5.5-2026-04-23 | 1,050,000 | $5.00 | $30.00 | chat |
| Azure OpenAI | azure/gpt-5.5-pro, azure/gpt-5.5-pro-2026-04-23 | 1,050,000 | $60.00 | $360.00 | responses |
| Azure OpenAI | azure/gpt-5.4-mini-2026-03-17 | 1,050,000 | $0.75 | $4.50 | chat |
| Azure OpenAI | azure/gpt-5.4-nano-2026-03-17 | 1,050,000 | $0.20 | $1.25 | chat |
| AWS Bedrock | anthropic.claude-mythos-preview | 1,000,000 | - | - | chat |
| AWS Bedrock | bedrock/us-east-1/zai.glm-5, bedrock/us-west-2/zai.glm-5 | 200,000 | $1.00 | $3.20 | chat |
| AWS Bedrock | bedrock/us-east-1/minimax.minimax-m2.5, bedrock/us-west-2/minimax.minimax-m2.5 | - | - | - | chat |
| Moonshot | moonshot/kimi-k2.6 | 262,144 | $0.95 | $4.00 | chat |
| OpenRouter | openrouter/anthropic/claude-opus-4.7 | 1,000,000 | $5.00 | $25.00 | chat |
| Gemini | gemini/gemini-embedding-2, gemini-embedding-2, vertex_ai/gemini-embedding-2 | 8,192 | $0.20 | - | embedding |
| DashScope | dashscope/qwen-image-2.0, dashscope/qwen-image-2.0-pro | - | - | - | image_generation |
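As a worked example of how the per-1M-token prices above translate into per-request cost, the sketch below multiplies token counts by the list prices from the table. It is plain arithmetic, not LiteLLM's cost calculator: `estimate_cost` and `PRICES_PER_1M` are hypothetical names, and the sketch ignores cached-input discounts, long-context pricing tiers, and anything else the model cost map may encode.

```python
# Hypothetical lookup: list prices (USD per 1M tokens) copied from the table above.
PRICES_PER_1M = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (60.00, 360.00),
    "gpt-5.4-mini-2026-03-17": (0.75, 4.50),
    "moonshot/kimi-k2.6": (0.95, 4.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return list-price USD cost for one request (no caching or tier adjustments)."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # 200k input tokens * $5/1M = $1.00; 4k output tokens * $30/1M = $0.12
    print(f"${estimate_cost('gpt-5.5', 200_000, 4_000):.2f}")  # $1.12
```

The same 200k-input request on gpt-5.5-pro would cost 12x as much on the input side, which is why the Pro tier is routed separately.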

Features

  • Bedrock
    • Add GLM-5 and Minimax M2.5 entries with regional aliases - PR #24423
    • Day-0 support for Claude Mythos Preview via the bedrock-mantle endpoint - PR #26196
    • Allowlist Bedrock Invoke body fields and filter all anthropic-beta values - PR #26148
  • OpenAI
    • Versioned GPT-5.4 mini / nano snapshots - PR #26115
    • Add gpt-5.5 and gpt-5.5-pro to the model cost map - PR #26345, PR #26348
    • Day-0 support for GPT-5.5 and GPT-5.5 Pro - PR #26449
  • Azure OpenAI
    • azure/gpt-5.5 + azure/gpt-5.5-pro entries with dated variants - PR #26361
  • Gemini
    • Gemini Embedding 2 GA: cost map, blog, and tests - PR #26391
    • Expand VideoMetadata support to all Gemini models - PR #25767
  • Vertex AI
    • Multi-region Vertex hosts (aiplatform.*.rep.googleapis.com) - PR #26281
  • DashScope
    • Image generation support for qwen-image-2.0 and qwen-image-2.0-pro - PR #25672
  • Moonshot
    • Add moonshot/kimi-k2.6 to the model registry - PR #26203
  • Anthropic
    • Migrate retired claude-3-haiku-20240307 references to claude-haiku-4-5-20251001 - PR #26139
  • General
    • Migrate 38 models from legacy max_tokens to max_input_tokens / max_output_tokens - PR #24422

Bug Fixes

  • Anthropic
    • Preserve tool_use input args in adapter streaming - PR #24355
    • Strip Gemini thought suffix from streaming tool_use id - PR #25935
    • Skip non-OpenAI file content blocks in file-id discovery helpers - PR #26228
    • Handle tool_choice type 'none' in messages API - PR #24457
  • Azure
    • Preserve role='assistant' in streaming with include_usage - PR #24354
  • Bedrock
    • Sort assistant content blocks so text precedes toolUse - PR #24368
    • Fix above-200k-token pricing for Claude Sonnet/Opus 4.6 and set Sonnet 4.6 max_input_tokens to 1M - PR #24164
  • Gemini
    • Filter params from embedding requests - PR #24370
    • Read web search cost from model_info instead of hardcoding - PR #24372
    • Include DOCUMENT modality tokens in cost calculation - PR #24410
  • Vertex AI
    • Forward dimensions parameter in multimodalembedding requests - PR #24415
  • Zhipu / GLM
    • Map non-standard finish_reason values - PR #24373
  • OVHcloud
  • Scaleway
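The Bedrock ordering fix (PR #24368) can be pictured as a stable sort over Converse-style content blocks, where each block is a dict keyed by its type (`text`, `toolUse`). This is a hedged sketch of the idea only: `order_assistant_blocks` is a hypothetical helper, and it assumes every non-text block simply keeps its relative order after the text blocks.

```python
from typing import Any, Dict, List


def order_assistant_blocks(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: move text blocks ahead of toolUse (and other) blocks.

    sorted() is stable, so blocks within each group keep their original order.
    """
    return sorted(blocks, key=lambda block: 0 if "text" in block else 1)


if __name__ == "__main__":
    blocks = [
        {"toolUse": {"toolUseId": "t1", "name": "get_weather", "input": {"city": "Paris"}}},
        {"text": "Let me check the weather."},
    ]
    # The text block now comes first, followed by the toolUse block.
    print(order_assistant_blocks(blocks))
```

A stable sort is the natural fit here: it fixes the one ordering rule without reshuffling multiple tool calls relative to each other.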

LLM API Endpoints

Features

  • Responses API
    • Extract shared format mapping between Responses API and Chat Completions bridges - PR #24417
    • use_chat_completions_api flag for openai/ models with custom api_base - PR #25346
    • route_all_chat_openai_to_responses global flag - PR #25359
    • Strip custom_tool_call namespace for all providers - PR #26221
  • Anthropic Messages API
    • Map reasoning_auto_summary to thinking.display for native /v1/messages - PR #25883
    • Normalize reasoning effort with graceful degradation - PR #26111
  • Memory API
  • General
    • Apply GPT-5 temperature validation in Responses API - PR #24371
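One way to read "normalize reasoning effort with graceful degradation" (PR #26111) is: honor the requested level if the target model supports it, otherwise step down to the nearest supported level, and only step up if nothing lower exists. The sketch below implements that policy under stated assumptions. The level ordering, `EFFORT_LEVELS`, and `normalize_effort` are hypothetical and are not taken from the PR.

```python
from typing import Optional, Set

# Assumed ordering of effort levels, lowest to highest (hypothetical).
EFFORT_LEVELS = ["minimal", "low", "medium", "high"]


def normalize_effort(requested: str, supported: Set[str]) -> Optional[str]:
    """Hypothetical policy: keep the request, else degrade downward, else step up."""
    if requested not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning effort: {requested!r}")
    idx = EFFORT_LEVELS.index(requested)
    # Try the requested level first, then each lower level in turn.
    for level in reversed(EFFORT_LEVELS[: idx + 1]):
        if level in supported:
            return level
    # Nothing at or below the request is supported: use the nearest higher level.
    for level in EFFORT_LEVELS[idx + 1 :]:
        if level in supported:
            return level
    # The model supports no effort levels; the caller would drop the parameter.
    return None
```

For example, `normalize_effort("high", {"low", "medium"})` degrades to `"medium"` instead of failing the request.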

Bugs

  • Responses API
  • Anthropic Messages API
    • Preserve anthropic_messages call type for /v1/messages logging - PR #26248
  • Image API
    • Forward litellm_params to validate_environment for Vertex AI credentials in image_edit - PR #26160
    • Enforce multipart-only file inputs on image edit endpoints - PR #26293
    • Align image URL fetch with the validated HTTP client (Bedrock + token counter paths) - PR #26272
  • Vector Stores
    • Restore BYOK key injection for vector store endpoints with team-scoped deployments - PR #25746
    • Respect object-level permissions for managed vector store endpoints - PR #26351
  • Memory API
    • JSONify metadata before Prisma writes on /v1/memory - PR #26536
  • General
    • Harden pass-through target URL construction - PR #26467
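Hardening pass-through URL construction (PR #26467) generally means ensuring a caller-supplied path can only extend the configured base URL, never replace its host or climb out of its path. The sketch below shows that pattern with the standard library's `urllib.parse`. `build_passthrough_url` is a hypothetical helper; it handles paths only (query strings are out of scope, so `?` and `#` get percent-encoded), and the proxy's actual checks may differ.

```python
from urllib.parse import quote, urlsplit


def build_passthrough_url(base_url: str, endpoint: str) -> str:
    """Hypothetical helper: append a caller-supplied path to a trusted base URL."""
    base = urlsplit(base_url)
    if base.scheme not in ("http", "https") or not base.netloc or base.query or base.fragment:
        raise ValueError("base_url must be an absolute http(s) URL without query/fragment")
    # Reject absolute URLs, scheme-relative URLs, and backslash tricks outright.
    if "://" in endpoint or endpoint.startswith("//") or "\\" in endpoint:
        raise ValueError("endpoint must be a relative path")
    segments = [s for s in endpoint.split("/") if s]
    # Dot segments could climb out of the base path after normalization.
    if any(s in (".", "..") for s in segments):
        raise ValueError("dot segments are not allowed in endpoint")
    # Percent-encode each segment; ':' and '@' are legal inside a path segment.
    path = "/".join(quote(s, safe=":@") for s in segments)
    url = f"{base_url.rstrip('/')}/{path}"
    # Defense in depth: scheme and host must be exactly what was configured.
    final = urlsplit(url)
    if (final.scheme, final.netloc) != (base.scheme, base.netloc):
        raise ValueError("constructed URL changed scheme or host")
    return url
```

For example, `build_passthrough_url("https://api.example.com/v1", "models/gemini:generateContent")` yields `https://api.example.com/v1/models/gemini:generateContent`, while `"../admin"` or `"https://evil.example/x"` raise `ValueError`.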

Management Endpoints / UI

Features

  • Virtual Keys / Auth
    • Refresh router after POST /model/update - PR #26427
    • Auto-add SSO team members to org on move (proxy admin only) - PR #26377
    • Apply team TPM/RPM + attribution for admins using x-litellm-team-id - PR #26438
    • Single-team DB fallback when JWT has no team_id - PR #26418
  • UI
    • "My User" tab on team info page - PR #26520
    • Send Invitation Email toggle on Users tab - PR #25808
    • UI setting to disable /key/generate for org admins - PR #26442
    • Sortable Model and TTFT columns on Spend Logs - PR #26488
    • Surface per-member budget cycle in Teams › Members tab - PR #26207
  • Refactor
    • Move projects management to enterprise package - PR #25677
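The single-team fallback (PR #26418) amounts to a small decision rule: trust the JWT's `team_id` when present, and otherwise fall back to the database only when the user belongs to exactly one team, since any other count would make the choice ambiguous. A minimal sketch, where `resolve_team_id` and its inputs are hypothetical:

```python
from typing import Mapping, Optional, Sequence


def resolve_team_id(jwt_claims: Mapping[str, object], user_team_ids: Sequence[str]) -> Optional[str]:
    """Hypothetical rule: prefer the JWT's team_id, else a single DB membership."""
    team_id = jwt_claims.get("team_id")
    if isinstance(team_id, str) and team_id:
        return team_id
    # Only an unambiguous membership (exactly one team) is safe to infer.
    if len(user_team_ids) == 1:
        return user_team_ids[0]
    # Zero or several teams: leave unresolved rather than guess.
    return None
```

Returning `None` for zero or multiple memberships keeps the fallback from silently attributing spend to the wrong team.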

Bugs

  • Virtual Keys / Auth
    • Centralize common_checks to close authorization bypass - PR #26279
    • Tighten caller-permission checks on key route fields - PR #26492
    • Extend caller-permission checks to service-account + tighten raw-body acceptance - PR #26493
    • Enforce upperbound_key_generate_params on /key/regenerate - PR #26340
    • Preserve service_account_id in metadata on /key/update - PR #26004
    • Restrict /global/spend/* routes to admin roles - PR #26490
    • Harden team metadata handling in /team/new and /team/update - PR #26464
    • Extend request body parameter restrictions to cloud provider auth fields - PR #26264
    • Enforce format constraints on provider URL parameters - PR #26287
    • Bind RAG ingestion config to stored credential values - PR #26512
    • Broaden RAG ingestion credential cleanup to AWS endpoint/identity fields - PR #26525
    • Harden /model/info redaction for plural credential field names - PR #26513
  • UI
    • Stop injecting $0 cost on model edit - PR #26001
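PR #26513 concerns redaction of plural credential field names, such as `api_keys` alongside `api_key`. The sketch below illustrates the general approach of matching key names by pattern (including plural forms) and masking their values recursively. The regex, the mask string, and `redact` are assumptions for illustration; note that a naive `tokens?` pattern would also mask harmless fields like `max_tokens`, which is why it is left out here.

```python
import re
from typing import Any

# Hypothetical pattern: credential-like key names, singular or plural.
# "tokens?" is deliberately absent so fields like "max_tokens" are not masked.
SENSITIVE_KEY = re.compile(
    r"(?:^|_)(?:api_keys?|secrets?|passwords?|credentials?|access_keys?)$",
    re.IGNORECASE,
)
MASK = "*****"


def redact(obj: Any) -> Any:
    """Return a copy of obj with values under sensitive keys replaced by MASK."""
    if isinstance(obj, dict):
        return {
            key: MASK if isinstance(key, str) and SENSITIVE_KEY.search(key) else redact(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj
```

Masking the whole value under a matched key (rather than inspecting it) means a list such as `api_keys: [...]` is hidden in one step.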

AI Integrations

Logging

Guardrails

  • Bedrock Guardrails
    • Use Bedrock OUTPUT source for apply_guardrail when scanning model responses - PR #26144
    • Dedupe post-call log entry when only post_call is configured - PR #26474
    • Hook mode + match redaction + streaming request_data for spend logs - PR #25854, PR #26266
  • LLM-as-a-Judge
  • General
    • Team-level guardrails and global policy guardrails can run together - PR #26466
    • Guardrail param handling in list and submission endpoints - PR #26390
    • Log guardrail_information on streaming post-call - PR #26448
    • Suppress deferred success log when post-call guardrail blocks - PR #26528
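PR #26466 lets team-level and global policy guardrails run together instead of one set replacing the other. A hedged sketch of the merge, assuming guardrails are identified by name, global policy entries run first, and duplicates run once; `effective_guardrails` is a hypothetical helper:

```python
from typing import Iterable, List


def effective_guardrails(global_policy: Iterable[str], team: Iterable[str]) -> List[str]:
    """Hypothetical merge: global policy guardrails first, then team guardrails.

    dict.fromkeys keeps first-seen order and drops duplicates, so a guardrail
    listed at both levels runs once instead of one list replacing the other.
    """
    return list(dict.fromkeys([*global_policy, *team]))
```

For example, merging `["pii-mask", "prompt-injection"]` with a team list of `["prompt-injection", "llm-judge"]` runs all three, with `prompt-injection` only once.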

Spend Tracking, Budgets and Rate Limiting

  • Per-member budgets
    • Individual team-member budgets - PR #26208
    • Track per-member total spend on team memberships - PR #26195
    • Fix per-team member budget bypass - PR #26204
  • Rate limiting
    • Reseed enforcement read path from DB on counter miss - PR #26459
  • Budgets
    • Align user and org budget spend checks with the atomic counter pattern - PR #26182
    • Fix budget window resets failing due to the Prisma Json? null filter - PR #26346
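The atomic counter pattern referenced in PR #26182 and the reseed-on-miss behavior from PR #26459 can be combined into one check-and-increment step: load the current spend from the database if the counter has no entry, reject the request if it would push spend past the budget, and otherwise record it, all under one lock. In the sketch below, an in-process dict guarded by `threading.Lock` stands in for whatever shared store the proxy actually uses, and `SpendCounter` and `load_spend_from_db` are hypothetical names.

```python
import threading
from typing import Callable, Dict


class SpendCounter:
    """Hypothetical in-memory spend counter that reseeds from the DB on a miss."""

    def __init__(self, load_spend_from_db: Callable[[str], float]) -> None:
        self._load = load_spend_from_db
        self._spend: Dict[str, float] = {}
        self._lock = threading.Lock()

    def try_spend(self, entity_id: str, cost: float, budget: float) -> bool:
        """Atomically check and increment; return False if the budget would be exceeded."""
        with self._lock:
            if entity_id not in self._spend:
                # Counter miss (restart, eviction): reseed from the source of truth.
                self._spend[entity_id] = self._load(entity_id)
            current = self._spend[entity_id]
            if current + cost > budget:
                return False
            self._spend[entity_id] = current + cost
            return True
```

Doing the check and the increment inside the same critical section is the point of the pattern: two concurrent requests cannot both read the same pre-increment total and slip past the budget together.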

MCP Gateway

  • OAuth
    • Harden OAuth authorize/token endpoints (BYOK + discoverable) - PR #26274
    • Share temporary MCP OAuth sessions across instances via Redis - PR #26162, PR #26318
    • Align MCP OAuth proxy endpoints with per-server access policy - PR #26516
    • MCP broker OAuth endpoint access controls - PR #26142
  • Permissions / routing
    • Resolve team/key MCP permissions by name or alias - PR #26338
    • Split MCP routes into inference vs. management (unblocks Admin UI on DISABLE_LLM_API_ENDPOINTS nodes) - PR #26367
  • Tool filtering
    • Match tools with client-side namespace prefix in mcp_semantic_tool_filter - PR #26117
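Resolving MCP permissions by name or alias (PR #26338) means a team or key allow-list may reference a server by its ID, name, or alias, and all three should map to the same server. A sketch under assumptions: `MCPServer`, `resolve_allowed_servers`, and the first-match-wins rule for colliding identifiers are hypothetical, and unknown entries are dropped so the result fails closed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class MCPServer:
    server_id: str
    name: str
    alias: Optional[str] = None


def resolve_allowed_servers(entries: Iterable[str], servers: List[MCPServer]) -> Set[str]:
    """Map allow-list entries (ID, name, or alias) to canonical server IDs."""
    lookup: Dict[str, str] = {}
    for server in servers:
        for key in (server.server_id, server.name, server.alias):
            if key:
                # First registration wins if two servers share an identifier.
                lookup.setdefault(key, server.server_id)
    # Unknown entries are ignored rather than granting anything.
    return {lookup[entry] for entry in entries if entry in lookup}
```

Normalizing everything to server IDs before the access check means the check itself only ever compares one kind of identifier.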

Performance / Loadbalancing / Reliability improvements

  • Routing
    • Adaptive routing - PR #26049
    • Wildcard order fallback to higher-order deployments - PR #25772
  • Prompt Compression
    • First-class server-side prompt compression callback - PR #25729
  • Reliability
    • Fix /health/readiness 503 loop when DB is unreachable - PR #26134
  • Developer ergonomics
    • --reload flag for uvicorn hot reload (dev only) - PR #25901
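Adaptive routing (PR #26049) is described as weighting deployments by recent latency and error history. The sketch below shows one common way to do that, not necessarily the router's: an exponentially weighted moving average (EWMA) of latency and error rate per deployment, turned into a sampling weight of `(1 - error_rate) / latency`. The smoothing factor, the 1-second latency prior, and all names (`AdaptiveSelector`, `DeploymentStats`) are assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DeploymentStats:
    latency_s: float = 1.0   # EWMA of observed latency in seconds (assumed 1s prior)
    error_rate: float = 0.0  # EWMA of failures, between 0 and 1


class AdaptiveSelector:
    """Hypothetical latency/error-weighted deployment picker."""

    def __init__(self, deployments: List[str], alpha: float = 0.2) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.stats: Dict[str, DeploymentStats] = {d: DeploymentStats() for d in deployments}

    def record(self, deployment: str, latency_s: float, ok: bool) -> None:
        s, a = self.stats[deployment], self.alpha
        s.latency_s = (1 - a) * s.latency_s + a * latency_s
        s.error_rate = (1 - a) * s.error_rate + a * (0.0 if ok else 1.0)

    def weights(self) -> Dict[str, float]:
        # Faster, more reliable deployments get larger weights; the floor keeps
        # every deployment selectable so its stats can recover.
        return {
            d: max(1e-6, (1 - s.error_rate) / max(s.latency_s, 1e-3))
            for d, s in self.stats.items()
        }

    def pick(self, rng: random.Random) -> str:
        w = self.weights()
        names = list(w)
        return rng.choices(names, weights=[w[n] for n in names], k=1)[0]
```

Sampling by weight instead of always taking the best deployment keeps some traffic flowing to slower deployments, so their history keeps updating and they can win traffic back once they recover.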

General Proxy Improvements

  • Build / Docker
    • Streamline Dockerfile.non_root build time - PR #26055
    • Use numeric UID 65534 in docker.non_root for K8s runAsNonRoot - PR #26268
    • Restore pre-uv Prisma cache path - PR #26201
  • Migrations
    • Opt-in v2 migration resolver - PR #26194
    • Freshness and destructive guards on migration workflow - PR #26185
  • CI / Infra
    • Migrate more CI jobs from CircleCI to GitHub Actions - PR #26261
    • CCI: cache, cleanup, anchors, install-path parity, Python 3.12, Ruby/Node pins - PR #26286
    • CircleCI config cleanup and consolidation - PR #26226
    • Speed up proxy unit tests and split proxy-utils into its own matrix entry - PR #26150
    • Remove CCI/GHA test duplication and semantically shard proxy DB tests - PR #26356
    • Standalone create-release-branch workflow + contents:write permission - PR #26342, PR #26359
    • Supply-chain guard to block fork PRs that modify dependencies - PR #26511
    • Use Postgres sidecar instead of shared DB for auth_ui_unit_tests - PR #26141
    • Fix e2e_ui_testing stale-bundle issue on Ubuntu (cp -r merge semantics) - PR #26047
    • Apply black formatting to fix CI lint failures - PR #26140
  • Test stability
    • Stabilize spend-accuracy tests + patch Redis buffer data-loss path - PR #26270
    • Stabilize spend-accuracy test transport flakes - PR #26290
    • Deflake spend-tracking tests - PR #26349
    • Drain logging worker in test_router_caching_ttl to fix flakiness - PR #26355
    • Isolate master_key/prisma_client module globals between proxy tests - PR #26362
  • Packaging / dependencies
    • Bump vulnerable dependencies - PR #26365
    • Declare MIT license in litellm-proxy-extras metadata - PR #26369
    • Declare proprietary license in litellm-enterprise metadata - PR #26457
  • UI
    • Fix Fetch button ignoring active filters on the Request Logs page - PR #25788
    • Fix stale filters being applied after sort/page/time changes on Request Logs - PR #25789
  • Misc
    • Replace substring check with startswith in is_model_gpt_5_model - PR #25793
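The `is_model_gpt_5_model` change (PR #25793) swaps a substring test for a prefix test. The difference is easy to see on a model name that merely contains `gpt-5`. The two functions below are hypothetical stand-ins, not the real helper, and how the real one treats provider prefixes such as `azure/` is not covered here.

```python
def is_gpt5_substring(model: str) -> bool:
    # Old-style check: matches "gpt-5" anywhere in the name.
    return "gpt-5" in model


def is_gpt5_prefix(model: str) -> bool:
    # Stricter check: only names that begin with "gpt-5".
    return model.startswith("gpt-5")


if __name__ == "__main__":
    name = "ft:gpt-4o:acme:gpt-5-distill"  # a GPT-4o fine-tune that mentions gpt-5
    print(is_gpt5_substring(name), is_gpt5_prefix(name))  # True False
```

With the substring check, such a model would wrongly pick up GPT-5-specific parameter handling (for example, the temperature validation noted under LLM API Endpoints).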

Documentation Updates

  • Add missing observability integrations to View All page - PR #24420
  • Clarify x-litellm-model-group vs. provider model id in proxy docs - PR #25497
  • Gemini 3 thinking_level defaults and release note - PR #25842
  • Align fenced code block padding on blog and doc pages - PR #25932
  • Add supported providers to prompt caching doc - PR #26124
  • Remove docs/my-website, point contributors to BerriAI/litellm-docs - PR #26454

New Contributors

  • @dongyu-turo made their first contribution in #24164
  • @Alpha-Zark made their first contribution in #25672
  • @vinhphamhuu-ct made their first contribution in #25767
  • @Bytechoreographer made their first contribution in #25788
  • @BraulioV made their first contribution in #25793
  • @Vigilans made their first contribution in #25883
  • @nhyy244 made their first contribution in #26110
  • @sakenuGOD made their first contribution in #26117
  • @Michael-RZ-Berri made their first contribution in #26124
  • @anmolg1997 made their first contribution in #26228

Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.10-stable...v1.83.14.rc.1


04/27/2026

  • New Models / Updated Models: 29
  • LLM API Endpoints: 18
  • Management Endpoints / UI: 23
  • AI Integrations (Logging / Guardrails): 11
  • Spend Tracking, Budgets and Rate Limiting: 6
  • MCP Gateway: 8
  • Performance / Loadbalancing / Reliability improvements: 5
  • General Proxy Improvements: 27
  • Documentation Updates: 6