v1.83.14.rc.1 - GPT-5.5, Prompt Compression & Memory API
Deploy this version

Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-v1.83.14.rc.1
```

Pip:

```shell
pip install litellm==1.83.14.rc1
```
This is a release candidate cut on top of v1.83.10-stable. Validate on a staging proxy before promoting to the next stable tag.
Key Highlights
- Day-0 GPT-5.5 and GPT-5.5 Pro support — OpenAI and Azure variants ship with full pricing maps, dated snapshots, and Responses-mode routing for the Pro tier.
- Server-side Prompt Compression — first-class proxy callback that transparently compresses long-context inputs (Claude Code, RAG, document workloads) before they hit the upstream model, with no client opt-in required.
- `/v1/memory` CRUD endpoints — the proxy now exposes a memory store API with Prisma-backed metadata, consumed by the new agent loop (see the sketch after this list).
- LLM-as-a-Judge guardrail — a model-graded post-call guardrail with configurable rubrics, joining the Bedrock / Lakera / Presidio / Noma family.
- MCP OAuth hardening — discoverable + BYOK authorize/token endpoints are tightened, temporary OAuth sessions are now shared across proxy instances via Redis, and per-server access policy is uniformly enforced across the proxy and broker.
- Per-member team budgets land in production — individual member budgets, per-member cycle surfacing in the Teams UI, and atomic counter alignment for user/org spend checks.
- Adaptive routing — opt-in router policy that weights deployments by recent latency/error history on top of the existing wildcard fallback.
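A minimal sketch of the new memory endpoints against a local proxy. The routes come from this release, but the payload and response fields used here (`content`, `metadata`, `id`) are illustrative assumptions; check the proxy's OpenAPI spec for the actual schema.

```python
import requests

PROXY_BASE = "http://localhost:4000"           # proxy from the deploy step above
HEADERS = {"Authorization": "Bearer sk-1234"}  # a virtual key with memory access

# Create a memory record (payload shape assumed, not confirmed).
created = requests.post(
    f"{PROXY_BASE}/v1/memory",
    headers=HEADERS,
    json={"content": "User prefers concise answers", "metadata": {"user": "u-42"}},
).json()

# Read it back (route shape assumed). Per this release, metadata is JSONified
# before the Prisma write (PR #26536).
memory = requests.get(f"{PROXY_BASE}/v1/memory/{created['id']}", headers=HEADERS)
print(memory.json())
```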
New Models / Updated Models
New Model Support (22 new models)
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Mode |
|---|---|---|---|---|---|
| OpenAI | gpt-5.5, gpt-5.5-2026-04-23 | 1,050,000 | $5.00 | $30.00 | chat |
| OpenAI | gpt-5.5-pro, gpt-5.5-pro-2026-04-23 | 1,050,000 | $60.00 | $360.00 | responses |
| OpenAI | gpt-5.4-mini-2026-03-17 | 272,000 | $0.75 | $4.50 | chat |
| OpenAI | gpt-5.4-nano-2026-03-17 | 272,000 | $0.20 | $1.25 | chat |
| Azure OpenAI | azure/gpt-5.5, azure/gpt-5.5-2026-04-23 | 1,050,000 | $5.00 | $30.00 | chat |
| Azure OpenAI | azure/gpt-5.5-pro, azure/gpt-5.5-pro-2026-04-23 | 1,050,000 | $60.00 | $360.00 | responses |
| Azure OpenAI | azure/gpt-5.4-mini-2026-03-17 | 1,050,000 | $0.75 | $4.50 | chat |
| Azure OpenAI | azure/gpt-5.4-nano-2026-03-17 | 1,050,000 | $0.20 | $1.25 | chat |
| AWS Bedrock | anthropic.claude-mythos-preview | 1,000,000 | - | - | chat |
| AWS Bedrock | bedrock/us-east-1/zai.glm-5, bedrock/us-west-2/zai.glm-5 | 200,000 | $1.00 | $3.20 | chat |
| AWS Bedrock | bedrock/us-east-1/minimax.minimax-m2.5, bedrock/us-west-2/minimax.minimax-m2.5 | - | - | - | chat |
| Moonshot | moonshot/kimi-k2.6 | 262,144 | $0.95 | $4.00 | chat |
| OpenRouter | openrouter/anthropic/claude-opus-4.7 | 1,000,000 | $5.00 | $25.00 | chat |
| Gemini | gemini/gemini-embedding-2, gemini-embedding-2, vertex_ai/gemini-embedding-2 | 8,192 | $0.20 | - | embedding |
| DashScope | dashscope/qwen-image-2.0, dashscope/qwen-image-2.0-pro | - | - | - | image_generation |
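For reference, calling the new models via the SDK; a minimal sketch that assumes `OPENAI_API_KEY` is set and the installed pricing map includes these entries. Note the Pro tier is registered in `responses` mode, so it goes through the Responses API rather than chat completions.

```python
import litellm

# Day-0 GPT-5.5: the alias and the dated snapshot both resolve via the pricing map.
response = litellm.completion(
    model="openai/gpt-5.5",  # or pin the snapshot: "openai/gpt-5.5-2026-04-23"
    messages=[{"role": "user", "content": "Summarize this release in one line."}],
)
print(response.choices[0].message.content)

# GPT-5.5 Pro is a responses-mode model, so use the Responses API surface.
pro = litellm.responses(
    model="openai/gpt-5.5-pro",
    input="Draft a rollout checklist for this release candidate.",
)
```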
Features
- Bedrock
- OpenAI
- Azure OpenAI
  - `azure/gpt-5.5` + `azure/gpt-5.5-pro` entries with dated variants - PR #26361
- Gemini
- Vertex AI
  - Multi-region Vertex hosts (`aiplatform.*.rep.googleapis.com`) - PR #26281
- DashScope
  - Image generation support for `qwen-image-2.0` and `qwen-image-2.0-pro` - PR #25672
- Moonshot
  - Add `moonshot/kimi-k2.6` to the model registry - PR #26203
- Anthropic
  - Migrate retired `claude-3-haiku-20240307` references to `claude-haiku-4-5-20251001` - PR #26139
- General
  - Migrate 38 models from legacy `max_tokens` to `max_input_tokens`/`max_output_tokens` (see the check after this list) - PR #24422
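The `max_tokens` migration is observable through `litellm.get_model_info`; a quick check, assuming the installed pricing map contains the migrated entries:

```python
import litellm

# Migrated entries (PR #24422) expose split limits instead of a single
# legacy max_tokens value.
info = litellm.get_model_info("gpt-5.5")
print(info["max_input_tokens"], info["max_output_tokens"])
```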
Bug Fixes
- Anthropic
- Azure
  - Preserve `role='assistant'` in streaming with `include_usage` - PR #24354
- Bedrock
- Gemini
- Vertex AI
  - Forward the `dimensions` parameter in `multimodalembedding` requests (see the sketch after this list) - PR #24415
- Zhipu / GLM
  - Map non-standard `finish_reason` values - PR #24373
- OVHcloud
  - Fix tool calling not working - PR #25948
- Scaleway
  - Add audio support - PR #26110
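A sketch of the fixed Vertex path: forwarding `dimensions` on a multimodal embedding call. It assumes Vertex credentials and project are already configured in your environment.

```python
import litellm

# With PR #24415, dimensions is forwarded to the upstream request instead of
# being dropped.
resp = litellm.embedding(
    model="vertex_ai/multimodalembedding@001",
    input=["a photo caption to embed"],
    dimensions=512,
)
print(len(resp.data[0]["embedding"]))
```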
LLM API Endpoints

Features
- Responses API
- Anthropic Messages API
- Memory API
- General
  - Apply GPT-5 temperature validation in the Responses API (see the sketch at the end of this section) - PR #24371

Bugs
- Responses API
  - Normalize the bridged object field - PR #26327
- Anthropic Messages API
  - Preserve the `anthropic_messages` call type for `/v1/messages` logging - PR #26248
- Image API
- Vector Stores
- Memory API
  - JSONify metadata before Prisma writes on `/v1/memory` - PR #26536
- General
  - Harden pass-through target URL construction - PR #26467
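On the temperature validation item above: GPT-5-family models only accept the default temperature, and that check now also runs on the Responses API path (PR #24371). A sketch; whether an unsupported value is dropped or rejected depends on your `drop_params` setting.

```python
import litellm

litellm.drop_params = True  # drop unsupported params instead of raising

resp = litellm.responses(
    model="openai/gpt-5.5",
    input="One-sentence status update, please.",
    temperature=0.2,  # dropped for GPT-5-family models, which pin temperature
)
```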
Management Endpoints / UI

Features
- Virtual Keys / Auth
- UI
- Refactor
  - Move projects management to the enterprise package - PR #25677

Bugs
- Virtual Keys / Auth
  - Centralize `common_checks` to close an authorization bypass - PR #26279
  - Tighten caller-permission checks on key route fields - PR #26492
  - Extend caller-permission checks to service accounts + tighten raw-body acceptance - PR #26493
  - Enforce `upperbound_key_generate_params` on `/key/regenerate` (see the sketch after this list) - PR #26340
  - Preserve `service_account_id` in metadata on `/key/update` - PR #26004
  - Restrict `/global/spend/*` routes to admin roles - PR #26490
  - Harden team metadata handling in `/team/new` and `/team/update` - PR #26464
  - Extend request body parameter restrictions to cloud provider auth fields - PR #26264
  - Enforce format constraints on provider URL parameters - PR #26287
  - Bind RAG ingestion config to stored credential values - PR #26512
  - Broaden RAG ingestion credential cleanup to AWS endpoint/identity fields - PR #26525
  - Harden `/model/info` redaction for plural credential field names - PR #26513
- UI
  - Stop injecting $0 cost on model edit - PR #26001
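For context on the key-route hardening above, a minimal key-generation call against the proxy; `max_budget` and `duration` are standard key params, and with PR #26340 the same `upperbound_key_generate_params` caps now also bind on `/key/regenerate`.

```python
import requests

PROXY_BASE = "http://localhost:4000"
ADMIN = {"Authorization": "Bearer sk-1234"}  # proxy master key

# Generate a scoped virtual key. A later /key/regenerate call can no longer
# escape the upperbound caps configured on the proxy (PR #26340).
key = requests.post(
    f"{PROXY_BASE}/key/generate",
    headers=ADMIN,
    json={"max_budget": 10.0, "duration": "30d"},
).json()
print(key["key"])
```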
AI Integrations

Logging
- General
  - Add `litellm_call_id` to `StandardLoggingPayload` and the OTel span (see the sketch at the end of this section) - PR #26133
- Vertex AI Passthrough
  - Log `:embedContent` and `:batchEmbedContents` responses - PR #26146

Guardrails
- Bedrock Guardrails
- LLM-as-a-Judge
  - Ship the LLM-as-a-Judge guardrail - PR #26360
- General
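With PR #26133 the call id rides along in `StandardLoggingPayload`, so joining proxy logs with OTel traces gets simpler. A minimal custom logger that surfaces it, using the stock `CustomLogger` hook:

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger


class CallIdLogger(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # standard_logging_object carries the StandardLoggingPayload;
        # litellm_call_id is the field added in PR #26133.
        payload = kwargs.get("standard_logging_object") or {}
        print("litellm_call_id:", payload.get("litellm_call_id"))


litellm.callbacks = [CallIdLogger()]
```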
Spend Tracking, Budgets and Rate Limiting
- Per-member budgets (see the sketch after this list)
- Rate limiting
  - Reseed the enforcement read path from the DB on a counter miss - PR #26459
- Budgets
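Per-member budgets in practice; `/team/member_add` with `max_budget_in_team` is the existing surface for this, sketched below against a local proxy:

```python
import requests

PROXY_BASE = "http://localhost:4000"
ADMIN = {"Authorization": "Bearer sk-1234"}

# Give one member their own budget inside the team. Per this release, the
# member's budget cycle surfaces in the Teams UI and spend checks use
# atomically aligned counters.
requests.post(
    f"{PROXY_BASE}/team/member_add",
    headers=ADMIN,
    json={
        "team_id": "team-research",
        "member": {"role": "user", "user_id": "u-42"},
        "max_budget_in_team": 25.0,
    },
)
```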
MCP Gateway
- OAuth
- Permissions / routing
- Tool filtering
  - Match tools carrying a client-side namespace prefix in `mcp_semantic_tool_filter` (see the sketch after this list) - PR #26117
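The tool-filtering fix (PR #26117) concerns tool names that arrive with a client-side server-namespace prefix. The helper below is purely illustrative of the matching rule, not the shipped implementation:

```python
def matches_filter(tool_name: str, allowed_tools: set[str]) -> bool:
    """Accept a tool whether or not the client prefixed it with an MCP
    server namespace, e.g. 'github/create_issue' vs. 'create_issue'.
    Hypothetical sketch only."""
    bare_name = tool_name.split("/", 1)[-1]
    return tool_name in allowed_tools or bare_name in allowed_tools


assert matches_filter("github/create_issue", {"create_issue"})
assert matches_filter("create_issue", {"create_issue"})
```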
Performance / Loadbalancing / Reliability improvements
- Routing
- Prompt Compression
  - First-class server-side prompt compression callback (see the sketch after this list) - PR #25729
- Reliability
  - Fix the `/health/readiness` 503 loop when the DB is unreachable - PR #26134
- Developer ergonomics
  - `--reload` flag for uvicorn hot reload (dev only) - PR #25901
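The shipped compression callback (PR #25729) is server-side and transparent to clients. To show where such a hook sits in the proxy lifecycle, here is a stand-in built on the public `async_pre_call_hook` extension point; the naive truncation below is illustrative only, not the real compressor.

```python
from litellm.integrations.custom_logger import CustomLogger


class NaivePromptCompressor(CustomLogger):
    """Illustrative stand-in for the shipped callback (PR #25729): shrink
    oversized message content before the request reaches the upstream model."""

    MAX_CHARS = 200_000

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        for message in data.get("messages", []):
            content = message.get("content")
            if isinstance(content, str) and len(content) > self.MAX_CHARS:
                message["content"] = content[: self.MAX_CHARS]
        return data
```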
General Proxy Improvements
- Build / Docker
- Migrations
- CI / Infra
  - Migrate more CI jobs from CircleCI to GitHub Actions - PR #26261
  - CircleCI: cache, cleanup, anchors, install-path parity, Python 3.12, Ruby/Node pins - PR #26286
  - CircleCI config cleanup and consolidation - PR #26226
  - Speed up proxy unit tests and split `proxy-utils` into its own matrix entry - PR #26150
  - Remove CircleCI/GitHub Actions test duplication and semantically shard proxy DB tests - PR #26356
  - Standalone `create-release-branch` workflow + `contents: write` permission - PR #26342, PR #26359
  - Supply-chain guard to block fork PRs that modify dependencies - PR #26511
  - Use a Postgres sidecar instead of a shared DB for `auth_ui_unit_tests` - PR #26141
  - Fix the `e2e_ui_testing` stale-bundle issue on Ubuntu (`cp -r` merge semantics) - PR #26047
  - Apply black formatting to fix CI lint failures - PR #26140
- Test stability
  - Stabilize spend-accuracy tests + patch a Redis buffer data-loss path - PR #26270
  - Stabilize spend-accuracy test transport flakes - PR #26290
  - Deflake spend-tracking tests - PR #26349
  - Drain the logging worker in `test_router_caching_ttl` to fix flakiness - PR #26355
  - Isolate `master_key`/`prisma_client` module globals between proxy tests - PR #26362
- Packaging / dependencies
- UI
- Misc
  - Replace a substring check with `startswith` in `is_model_gpt_5_model` - PR #25793
Documentation Updates
- Add missing observability integrations to the View All page - PR #24420
- Clarify `x-litellm-model-group` vs. the provider model id in the proxy docs - PR #25497
- Gemini 3 `thinking_level` defaults and release note - PR #25842
- Align fenced code block padding on blog and doc pages - PR #25932
- Add supported providers to the prompt caching doc - PR #26124
- Remove `docs/my-website`; point contributors to `BerriAI/litellm-docs` - PR #26454
New Contributors
- @dongyu-turo made their first contribution in #24164
- @Alpha-Zark made their first contribution in #25672
- @vinhphamhuu-ct made their first contribution in #25767
- @Bytechoreographer made their first contribution in #25788
- @BraulioV made their first contribution in #25793
- @Vigilans made their first contribution in #25883
- @nhyy244 made their first contribution in #26110
- @sakenuGOD made their first contribution in #26117
- @Michael-RZ-Berri made their first contribution in #26124
- @anmolg1997 made their first contribution in #26228
Full Changelog: https://github.com/BerriAI/litellm/compare/v1.83.10-stable...v1.83.14.rc.1
04/27/2026
- New Models / Updated Models: 29
- LLM API Endpoints: 18
- Management Endpoints / UI: 23
- AI Integrations (Logging / Guardrails): 11
- Spend Tracking, Budgets and Rate Limiting: 6
- MCP Gateway: 8
- Performance / Loadbalancing / Reliability improvements: 5
- General Proxy Improvements: 27
- Documentation Updates: 6