[Preview] v1.77.7-stable - Claude Sonnet 4.5

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM
Alexsander Hamir
Backend Performance Engineer
Achintya Srivastava
Fullstack Engineer
Sameer Kankute
Backend Engineer (LLM Translation)

Deploy this version

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.77.7.rc.1
```

Key Highlights

  • Dynamic Rate Limiter v3 - Automatically maximizes throughput when capacity is available (< 80% saturation) by allowing lower-priority requests to use unused capacity, then switches to fair priority-based allocation under high load (≥ 80%) to prevent blocking
  • Major Performance Improvements - Router optimization reducing P99 latency by 62.5%, cache improvements from O(n*log(n)) to O(log(n))
  • Claude Sonnet 4.5 - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K+ context and tiered pricing
  • MCP Gateway Enhancements - Fine-grained tool control, server permissions, and forwardable headers
  • AMD Lemonade & Nvidia NIM - New provider support for AMD Lemonade and Nvidia NIM Rerank
  • GitLab Prompt Management - GitLab-based prompt management integration
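
The two-phase behavior of the Dynamic Rate Limiter v3 described above can be sketched as follows. This is an illustrative model of the logic, not LiteLLM's actual implementation; the threshold constant, priority weights, and function name are assumptions:

```python
# Illustrative sketch of saturation-aware rate limiting (not LiteLLM's code).
# Below the saturation threshold, any priority class may borrow unused
# capacity; at or above it, capacity is split by fixed priority weights.

SATURATION_THRESHOLD = 0.80  # assumed 80% cutoff from the notes above

def allowed_rpm(priority_weight: float, total_rpm: int,
                current_usage_rpm: int) -> int:
    """Return the requests-per-minute budget for one priority class."""
    saturation = current_usage_rpm / total_rpm
    if saturation < SATURATION_THRESHOLD:
        # Under-utilized: lower-priority traffic may use all remaining capacity.
        return total_rpm - current_usage_rpm
    # Saturated: fall back to fair, weight-proportional allocation.
    return int(total_rpm * priority_weight)

# Example: 1000 RPM total, a low-priority class with a 10% weight.
print(allowed_rpm(0.10, 1000, 500))  # headroom available -> 500
print(allowed_rpm(0.10, 1000, 900))  # saturated -> 100
```

The point of the switch is that idle capacity is never wasted on quiet systems, while heavy load degrades predictably by priority instead of starving high-priority callers.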

New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|----------|-------|----------------|---------------------|----------------------|----------|
| Anthropic | claude-sonnet-4-5 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Anthropic | claude-sonnet-4-5-20250929 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | eu.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Azure AI | azure_ai/grok-4 | 131K | $5.50 | $27.50 | Chat, reasoning, function calling, web search |
| Azure AI | azure_ai/grok-4-fast-reasoning | 131K | $0.43 | $1.73 | Chat, reasoning, function calling, web search |
| Azure AI | azure_ai/grok-4-fast-non-reasoning | 131K | $0.43 | $1.73 | Chat, function calling, web search |
| Azure AI | azure_ai/grok-code-fast-1 | 131K | $3.50 | $17.50 | Chat, function calling, web search |
| Groq | groq/moonshotai/kimi-k2-instruct-0905 | Varies | Varies | Varies | Chat, function calling |
| Ollama | Ollama Cloud models | Varies | Free | Free | Self-hosted models via Ollama Cloud |
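
Using the base prices from the table, the cost of a claude-sonnet-4-5 call can be estimated as below. This is a hand-rolled sketch for illustration; it ignores the tiered pricing that applies above 200K tokens and any prompt-caching discounts:

```python
# Estimate cost for a claude-sonnet-4-5 call from the table's base prices.
INPUT_PRICE_PER_1M = 3.00    # $ per 1M input tokens
OUTPUT_PRICE_PER_1M = 15.00  # $ per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost at the base (non-tiered) rates."""
    return (input_tokens * INPUT_PRICE_PER_1M
            + output_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

# e.g. a 10K-token prompt with a 1K-token completion:
print(f"${estimate_cost(10_000, 1_000):.3f}")  # $0.045
```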

Features

  • Anthropic
    • Add new claude-sonnet-4-5 model family with tiered pricing above 200K tokens - PR #15041
    • Add anthropic/claude-sonnet-4-5 to model price json with prompt caching support - PR #15049
    • Add 200K prices for Sonnet 4.5 - PR #15140
    • Add cost tracking for /v1/messages in streaming response - PR #15102
    • Add /v1/messages/count_tokens to Anthropic routes for non-admin user access - PR #15034
  • Gemini
    • Ignore type param for gemini tools - PR #15022
  • Vertex AI
    • Add LiteLLM Overhead metric for VertexAI - PR #15040
    • Support googlemap grounding in vertex ai - PR #15179
  • Azure
    • Add azure_ai grok-4 model family - PR #15137
    • Use the extra_query parameter for GET requests in Azure Batch - PR #14997
    • Use extra_query for download results (Batch API) - PR #15025
    • Add support for Azure AD token-based authorization - PR #14813
  • Ollama
  • Groq
    • Add groq/moonshotai/kimi-k2-instruct-0905 - PR #15079
  • OpenAI
    • Add support for GPT 5 codex models - PR #14841
  • DeepInfra
    • Update DeepInfra model data refresh with latest pricing - PR #14939
  • Bedrock
    • Add JP Cross-Region Inference - PR #15188
    • Add "eu.anthropic.claude-sonnet-4-5-20250929-v1:0" - PR #15181
    • Add twelvelabs bedrock Async Invoke Support - PR #14871
  • Nvidia NIM

Bug Fixes

  • VLLM
    • Fix response_format bug in hosted vllm audio_transcription - PR #15010
    • Fix passthrough of atranscription into kwargs going to upstream provider - PR #15005
  • OCI
    • Fix OCI Generative AI Integration when using Proxy - PR #15072
  • General
    • Fix: Authorization header to use correct "Bearer" capitalization - PR #14764
    • Bug fix: gpt-5-chat-latest has incorrect max_input_tokens value - PR #15116
    • Update request handling for original exceptions - PR #15013

New Provider Support


LLM API Endpoints

Features

  • Responses API

    • Return Cost for Responses API Streaming requests - PR #15053
  • /generateContent

    • Add full support for native Gemini API translation - PR #15029
  • Passthrough Gemini Routes

    • Add Gemini generateContent passthrough cost tracking - PR #15014
    • Add streamGenerateContent cost tracking in passthrough - PR #15199
  • Passthrough Vertex AI Routes

    • Add cost tracking for Vertex AI Passthrough /predict endpoint - PR #15019
    • Add cost tracking for Vertex AI Live API WebSocket Passthrough - PR #14956
  • General

    • Preserve Whitespace Characters in Model Response Streams - PR #15160
    • Add provider name to payload specification - PR #15130
    • Ensure query params are forwarded from origin url to downstream request - PR #15087

Management Endpoints / UI

Features

  • Virtual Keys

    • Ensure LLM_API_KEYs can access pass through routes - PR #15115
    • Support 'guaranteed_throughput' when setting limits on keys belonging to a team - PR #15120
  • Models + Endpoints

    • Ensure OCI secret fields not shared on /models and /v1/models endpoints - PR #15085
    • Add snowflake on UI - PR #15083
    • Make UI theme settings publicly accessible for custom branding - PR #15074
  • Admin Settings

  • MCP

    • Show health status of MCP servers - PR #15185
    • Allow setting extra headers on the UI - PR #15185
    • Allow editing allowed tools on the UI - PR #15185

Bug Fixes

  • Virtual Keys

    • (security) prevent user key from updating other user keys - PR #15201
    • (security) don't return all keys with blank key alias on /v2/key/info - PR #15201
    • Fix Session Token Cookie Infinite Logout Loop - PR #15146
  • Models + Endpoints

    • Make UI theme settings publicly accessible for custom branding - PR #15074
  • Teams

    • Fix failed copy-to-clipboard on the HTTP UI - PR #15195
  • Logs

    • Fix Logs page to render logs on filter lookup - PR #15195
    • Fix lookup of the end-user list (migrate to the more efficient /customers/list lookup) - PR #15195
  • Test key

    • Update the selected model on key change - PR #15197
  • Dashboard

    • Fix LiteLLM model name fallback in dashboard overview - PR #14998

Logging / Guardrail / Prompt Management Integrations

Features

Guardrails

  • Javelin
    • Add Javelin standalone guardrails integration for LiteLLM Proxy - PR #14983
    • Add logging for important status fields in guardrails - PR #15090
    • Don't run post_call guardrail if no text returned from Bedrock - PR #15106

Prompt Management


Spend Tracking, Budgets and Rate Limiting

  • Cost Tracking
    • Proxy: end user cost tracking in the responses API - PR #15124
  • Parallel Request Limiter v3
    • Use well known redis cluster hashing algorithm - PR #15052
    • Fixes to dynamic rate limiter v3 - add saturation detection - PR #15119
    • Dynamic Rate Limiter v3 - fixes for detecting saturation + fixes for post saturation behavior - PR #15192
  • Teams
    • Add model specific tpm/rpm limits to teams on LiteLLM - PR #15044
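
The "well known redis cluster hashing algorithm" adopted by the Parallel Request Limiter v3 refers to Redis Cluster's key-slot scheme: `slot = CRC16(key) % 16384`, using the CRC16-CCITT (XMODEM) variant. A minimal sketch of that scheme (LiteLLM's actual helper may differ, and this omits Redis's `{hash tag}` handling):

```python
# Redis Cluster assigns each key to one of 16384 hash slots:
# slot = CRC16(key) % 16384, with the CRC16-CCITT (XMODEM) polynomial 0x1021.

def crc16_xmodem(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash slot a Redis Cluster node would assign to this key."""
    return crc16_xmodem(key.encode()) % 16384

# Standard CRC16-XMODEM check value:
print(hex(crc16_xmodem(b"123456789")))  # 0x31c3
```

Hashing rate-limit counters to the same slots the cluster itself uses keeps related keys co-located and avoids cross-node lookups on the hot path.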

MCP Gateway

  • Server Configuration
    • Specify forwardable headers, specify allowed/disallowed tools for MCP servers - PR #15002
    • Enforce server permissions on call tools - PR #15044
    • MCP Gateway Fine-grained Tools Addition - PR #15153
  • Bug Fixes
    • Remove servername prefix mcp tools tests - PR #14986
    • Resolve regression with duplicate Mcp-Protocol-Version header - PR #15050
    • Fix test_mcp_server.py - PR #15183

Performance / Loadbalancing / Reliability improvements

  • Router Optimizations
    • +62.5% P99 Latency Improvement - Remove router inefficiencies (from O(M*N) to O(1)) - PR #15046
    • Remove hasattr checks in Router - PR #15082
    • Remove Double Lookups - PR #15084
    • Optimize _filter_cooldown_deployments from O(n×m + k×n) to O(n) - PR #15091
    • Optimize unhealthy deployment filtering in retry path (O(n*m) → O(n+m)) - PR #15110
  • Cache Optimizations
    • Reduce complexity of InMemoryCache.evict_cache from O(n*log(n)) to O(log(n)) - PR #15000
    • Avoiding expensive operations when cache isn't available - PR #15182
  • Worker Management
    • Add proxy CLI option to recycle workers after N requests - PR #15007
  • Metrics & Monitoring
    • LiteLLM Overhead metric tracking - Add support for tracking litellm overhead on cache hits - PR #15045

Documentation Updates

  • Provider Documentation
    • Update litellm docs from latest release - PR #15004
    • Add missing api_key parameter - PR #15058
  • General Documentation
    • Use docker compose instead of docker-compose - PR #15024
    • Add railtracks to projects that are using litellm - PR #15144
    • Perf: Last week improvement - PR #15193
    • Sync models GitHub documentation with Loom video and cross-reference - PR #15191

Security Fixes

  • JWT Token Security - Don't log JWT SSO token on .info() log - PR #15145

New Contributors

  • @herve-ves made their first contribution in PR #14998
  • @wenxi-onyx made their first contribution in PR #15008
  • @jpetrucciani made their first contribution in PR #15005
  • @abhijitjavelin made their first contribution in PR #14983
  • @ZeroClover made their first contribution in PR #15039
  • @cedarm made their first contribution in PR #15043
  • @Isydmr made their first contribution in PR #15025
  • @serializer made their first contribution in PR #15013
  • @eddierichter-amd made their first contribution in PR #14840
  • @malags made their first contribution in PR #15000
  • @henryhwang made their first contribution in PR #15029
  • @plafleur made their first contribution in PR #15111
  • @tyler-liner made their first contribution in PR #14799
  • @Amir-R25 made their first contribution in PR #15144
  • @georg-wolflein made their first contribution in PR #15124
  • @niharm made their first contribution in PR #15140
  • @anthony-liner made their first contribution in PR #15015
  • @rishiganesh2002 made their first contribution in PR #15153
  • @danielaskdd made their first contribution in PR #15160
  • @JVenberg made their first contribution in PR #15146
  • @speglich made their first contribution in PR #15072
  • @daily-kim made their first contribution in PR #14764

Full Changelog