[Preview] v1.77.3-stable - Priority Based Rate Limiting
Deploy this versionโ
- Docker
- Pip
docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.77.3.rc.1
pip install litellm
pip install litellm==1.77.3
Key Highlightsโ
- +550 RPS Performance Improvements - Optimizations in request handling and object initialization.
- Priority Quota Reservation - Proxy admins can now reserve TPM/RPM capacity for specific keys.
Priority Quota Reservationโ
This release adds support for priority quota reservation. This allows Proxy Admins to reserve TPM/RPM capacity for keys based on metadata priority levels, ensuring critical production workloads get guaranteed access regardless of development traffic volume.
Get started here
New Models / Updated Modelsโ
New Model Supportโ
Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
---|---|---|---|---|---|
SambaNova | sambanova/deepseek-v3.1 | 128K | $0.90 | $0.90 | Chat completions |
SambaNova | sambanova/gpt-oss-120b | 128K | $0.72 | $0.72 | Chat completions |
OVHCloud | Various models | Varies | Contact provider | Contact provider | Chat completions |
CompactifAI | Various models | Varies | Contact provider | Contact provider | Chat completions |
TwelveLabs | twelvelabs/marengo-embed-2.7 | 32K | $0.12 | $0.00 | Embeddings |
Featuresโ
- OVHCloud AI Endpoints
- New provider support with comprehensive model catalog - PR #14494
- CompactifAI
- New provider integration - PR #14532
- SambaNova
- Added DeepSeek v3.1 and GPT-OSS-120B models - PR #14500
- Bedrock
- Cross-region inference profile cost calculation - PR #14566
- AWS external ID parameter support for authentication - PR #14582
- CountTokens API implementation - PR #14557
- Titan V2 encoding_format parameter support - PR #14687
- Nova Canvas image generation inference profiles - PR #14578
- Bedrock Batches API - batch processing support with file upload and request transformation - PR #14618
- Bedrock Twelve Labs embedding provider support - PR #14697
- Vertex AI
- Volcengine
- Fixed thinking parameters when disabled - PR #14569
- Cohere
- Handle Generate API deprecation, default to chat endpoints - PR #14676
- TwelveLabs
- Added Marengo Embed 2.7 embedding support - PR #14674
Bug Fixesโ
- Bedrock
- Empty arguments handling in tool call invocation - PR #14583
- Vertex AI
- Avoid deepcopy crash with non-pickleables in Gemini/Vertex - PR #14418
- XAI
- Fix unsupported stop parameter for grok-code models - PR #14565
- Gemini
New Provider Supportโ
- OVHCloud AI Endpoints
- Complete provider integration with model catalog and authentication - PR #14494
- CompactifAI
- New provider support with documentation - PR #14532
LLM API Endpointsโ
Featuresโ
- /responses
- General
Bugsโ
- /chat/completions
- /responses
- Fixed cost calculation - PR #14675
- General
- Rate limiter AttributeError fix - PR #14609
Spend Tracking, Budgets and Rate Limitingโ
- Responses API Cost Calculation fix - PR #14675
- Anthropic Cache Token Pricing - Separate 1-hour vs 5-minute cache creation costs - PR #14620, PR #14652
- Indochina Time Timezone support for budget resets - PR #14666
- Soft Budget Alert Cache Issues - Resolved soft budget alert cache issues - PR #14491
- Dynamic Rate Limiter v3 - Priority routing improvements - PR #14734
- Enhanced Rate Limit Errors - More detailed error messages - PR #14736
Management Endpoints / UIโ
Featuresโ
- Team Member Service Account Keys - Allow team members to view keys they create - PR #14619
- Default Budget for JWT Teams - Auto-assign budgets to generated teams - PR #14514
- SSO Access Control Groups - Enhanced token info endpoint integration - PR #14738
- Health Test Connect Protection - Restrict access based on model creation permissions - PR #14650
- Amazon Bedrock Guardrail Info View - Enhanced logging visualization - PR #14696
Bug Fixesโ
- SCIM v2 - Fix group PUSH and PUT operations for non-existent members - PR #14581
- Guardrail View/Edit/Delete behavior fixes - PR #14622
- In-Memory Guardrail update failures - PR #14653
Logging / Guardrail Integrationsโ
Featuresโ
- DataDog
- Langfuse
- Added logging support for Responses API - PR #14597
- Langsmith
- Langsmith Sampling Rate - Key/Team-level tracing configuration - PR #14740
- Prometheus
- Opik
- Fixed timezone issue - PR #14708
Bug Fixesโ
Guardrailsโ
- Tool Permission Guardrail - Fine-grained tool access control - PR #14519
- Bedrock Guardrails - Selective guarding support with runtime endpoint configuration - PR #14575, PR #14650
- Default Last Message in guardrails - PR #14640
- AWS exceptions handling despite 200 response - PR #14658
New Integrationโ
MCP Gatewayโ
- MCP Server Alias Parsing - Multi-part URL path support - PR #14558
- MCP Filter Recomputation - After server deletion - PR #14542
- MCP Gateway Tools List improvements - PR #14695
Performance / Loadbalancing / Reliability improvementsโ
- +500 RPS Performance Boost when sending the
user
field - PR #14616 - +50 RPS by removing iscoroutine from hot path - PR #14649
- 7% reduction in init overhead - PR #14689
- Generic Object Pool implementation for better resource management - PR #14702
General Proxy Improvementsโ
- Middle-Truncation for spend log payloads - PR #14637
Securityโ
- Security Update - Bump aiohttp==3.12.14, fix CVE-2025-53643 - PR #14638
New Contributorsโ
- @luisfucros made their first contribution in PR #14500
- @hanakannzashi made their first contribution in PR #14548
- @eliasto made their first contribution in PR #14494
- @Rasmusafj made their first contribution in PR #14491
- @LingXuanYin made their first contribution in PR #14569
- @ronaldpereira made their first contribution in PR #14613
- @hula-la made their first contribution in PR #14534
- @carlos-marchal-ph made their first contribution in PR #14610
- @akraines made their first contribution in PR #14637
- @mrFranklin made their first contribution in PR #14708
- @tcx4c70 made their first contribution in PR #14675
- @michaeltansg made their first contribution in PR #14666
- @tosi29 made their first contribution in PR #14725
- @gmdfalk made their first contribution in PR #14735
- @FelipeRodriguesGare made their first contribution in PR #14733
- @mritunjaysharma394 made their first contribution in PR #14678