
[Preview] v1.77.3-stable - Priority Based Rate Limiting

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.77.3.rc.1

Key Highlights

  • +550 RPS Performance Improvements - Optimizations in request handling and object initialization.
  • Priority Quota Reservation - Proxy admins can now reserve TPM/RPM capacity for specific keys.

Priority Quota Reservation

This release adds support for priority quota reservation. This allows Proxy Admins to reserve TPM/RPM capacity for keys based on metadata priority levels, ensuring critical production workloads get guaranteed access regardless of development traffic volume.

Get started here
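
To make the reservation idea concrete, here is a minimal sketch of the arithmetic only; it is not LiteLLM's implementation or configuration format. It assumes each priority level is given a fraction of a model's total TPM or RPM. The helper name `reserved_capacity`, the priority labels `prod` and `dev`, and the 90/10 split are all hypothetical.

```python
from typing import Dict


def reserved_capacity(total_limit: int, reservations: Dict[str, float]) -> Dict[str, int]:
    """Split a model-wide limit (TPM or RPM) across priority levels.

    `reservations` maps a priority label to the fraction of capacity
    reserved for keys carrying that label. Hypothetical helper, for
    illustration only.
    """
    if any(fraction < 0 for fraction in reservations.values()):
        raise ValueError("fractions must be non-negative")
    if sum(reservations.values()) > 1.0 + 1e-9:
        raise ValueError("reserved fractions must not exceed 1.0")
    return {
        priority: int(total_limit * fraction)
        for priority, fraction in reservations.items()
    }


# Assumed example: 100,000 TPM shared by production and development keys.
limits = reserved_capacity(100_000, {"prod": 0.9, "dev": 0.1})
```

Under these assumptions, heavy traffic on `dev` keys can only consume the share assigned to `dev`, which is how production capacity stays available.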

New Models / Updated Models

New Model Support

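Provider    | Model                         | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features
------------|-------------------------------|----------------|---------------------|----------------------|-----------------
SambaNova   | sambanova/deepseek-v3.1       | 128K           | $0.90               | $0.90                | Chat completions
SambaNova   | sambanova/gpt-oss-120b        | 128K           | $0.72               | $0.72                | Chat completions
OVHCloud    | Various models                | Varies         | Contact provider    | Contact provider     | Chat completions
CompactifAI | Various models                | Varies         | Contact provider    | Contact provider     | Chat completions
TwelveLabs  | twelvelabs/marengo-embed-2.7  | 32K            | $0.12               | $0.00                | Embeddings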

Features

Bug Fixes

New Provider Support


LLM API Endpoints

Features

  • /responses
    • Added cancel endpoint support for non-admin users - PR #14594
    • Improved response session handling and cold storage configuration with S3 - PR #14534
    • Added OpenAI & Azure /responses/cancel endpoint support - PR #14561
  • General
    • Enhanced rate limit error messages with details - PR #14736
    • Middle-truncation for spend log payloads - PR #14637

Bugs


Spend Tracking, Budgets and Rate Limiting

  • Responses API Cost Calculation fix - PR #14675
  • Anthropic Cache Token Pricing - Separate 1-hour vs 5-minute cache creation costs - PR #14620, PR #14652
  • Indochina Time Timezone support for budget resets - PR #14666
  • Soft Budget Alerts - Resolved cache issues affecting soft budget alerts - PR #14491
  • Dynamic Rate Limiter v3 - Priority routing improvements - PR #14734
  • Enhanced Rate Limit Errors - More detailed error messages - PR #14736

Management Endpoints / UI

Features

  • Team Member Service Account Keys - Allow team members to view keys they create - PR #14619
  • Default Budget for JWT Teams - Auto-assign budgets to generated teams - PR #14514
  • SSO Access Control Groups - Enhanced token info endpoint integration - PR #14738
  • Health Test Connect Protection - Restrict access based on model creation permissions - PR #14650
  • Amazon Bedrock Guardrail Info View - Enhanced logging visualization - PR #14696

Bug Fixes

  • SCIM v2 - Fix group PATCH and PUT operations for non-existent members - PR #14581
  • Guardrail View/Edit/Delete behavior fixes - PR #14622
  • In-Memory Guardrail update failures - PR #14653

Logging / Guardrail Integrations

Features

Bug Fixes

  • S3
    • Fixed 404 error when using s3_endpoint_url - PR #14559

Guardrails

  • Tool Permission Guardrail - Fine-grained tool access control - PR #14519
  • Bedrock Guardrails - Selective guarding support with runtime endpoint configuration - PR #14575, PR #14650
  • Default Last Message in guardrails - PR #14640
  • AWS exceptions handling despite 200 response - PR #14658

New Integration

  • PostHog - Complete observability integration for LiteLLM usage tracking and analytics - PR #14610

MCP Gateway

  • MCP Server Alias Parsing - Multi-part URL path support - PR #14558
  • MCP Filter Recomputation - After server deletion - PR #14542
  • MCP Gateway Tools List improvements - PR #14695

Performance / Loadbalancing / Reliability improvements

  • +500 RPS Performance Boost when sending the user field - PR #14616
  • +50 RPS by removing iscoroutine from hot path - PR #14649
  • 7% reduction in init overhead - PR #14689
  • Generic Object Pool implementation for better resource management - PR #14702
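
For readers unfamiliar with the pattern, the object pool mentioned in the last item can be sketched as follows. This is a generic illustration of the technique and not the code from PR #14702; the class name `ObjectPool` and its `factory`, `reset`, and `max_size` parameters are hypothetical. The idea is to reuse already-constructed objects instead of allocating a new one per request.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Reuse objects instead of constructing a new one per request.

    Hypothetical sketch: not thread-safe, and not LiteLLM's API.
    """

    def __init__(
        self,
        factory: Callable[[], T],
        reset: Callable[[T], None],
        max_size: int = 128,
    ) -> None:
        self._factory = factory    # builds a fresh object when the pool is empty
        self._reset = reset        # clears per-request state before reuse
        self._max_size = max_size  # upper bound on idle objects kept around
        self._free: Deque[T] = deque()

    def acquire(self) -> T:
        # Prefer an idle object; fall back to constructing a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj: T) -> None:
        # Reset first so no state leaks into the next user of the object.
        self._reset(obj)
        if len(self._free) < self._max_size:
            self._free.append(obj)


pool: ObjectPool[dict] = ObjectPool(factory=dict, reset=lambda d: d.clear(), max_size=2)
first = pool.acquire()
first["user"] = "alice"
pool.release(first)
second = pool.acquire()  # the same dict object as `first`, now emptied
```

Capping the pool with `max_size` keeps idle memory bounded; objects released beyond the cap are simply left for the garbage collector.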

General Proxy Improvements

  • Middle-Truncation for spend log payloads - PR #14637
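
As an illustration of what middle-truncation means, the sketch below keeps the start and end of an oversized payload and replaces the middle with a marker. It is an assumption about the general technique, not the implementation in PR #14637; the function name `truncate_middle`, the marker text, and the character-based limit are hypothetical.

```python
def truncate_middle(text: str, max_chars: int, marker: str = "...[truncated]...") -> str:
    """Shorten `text` to at most `max_chars` by cutting out its middle.

    Hypothetical helper: keeps the head and tail, which often carry the
    most useful context in a logged request or response.
    """
    if len(text) <= max_chars:
        return text
    if max_chars <= len(marker):
        # No room for the marker plus content; fall back to a plain cut.
        return text[:max_chars]
    keep = max_chars - len(marker)
    head = (keep + 1) // 2  # the head gets the extra character when keep is odd
    tail = keep // 2
    return text[:head] + marker + (text[-tail:] if tail else "")


payload = "a" * 50 + "b" * 50
shortened = truncate_middle(payload, 40)
```

Compared with cutting only the end, this keeps both the opening of a payload and its final lines visible in the stored log.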

Security

  • Security Update - Bump aiohttp==3.12.14, fix CVE-2025-53643 - PR #14638

New Contributors

  • @luisfucros made their first contribution in PR #14500
  • @hanakannzashi made their first contribution in PR #14548
  • @eliasto made their first contribution in PR #14494
  • @Rasmusafj made their first contribution in PR #14491
  • @LingXuanYin made their first contribution in PR #14569
  • @ronaldpereira made their first contribution in PR #14613
  • @hula-la made their first contribution in PR #14534
  • @carlos-marchal-ph made their first contribution in PR #14610
  • @akraines made their first contribution in PR #14637
  • @mrFranklin made their first contribution in PR #14708
  • @tcx4c70 made their first contribution in PR #14675
  • @michaeltansg made their first contribution in PR #14666
  • @tosi29 made their first contribution in PR #14725
  • @gmdfalk made their first contribution in PR #14735
  • @FelipeRodriguesGare made their first contribution in PR #14733
  • @mritunjaysharma394 made their first contribution in PR #14678

Full Changelog