[Preview] v1.77.7-stable - Claude Sonnet 4.5
Deploy this versionโ
- Docker
- Pip
docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.7.rc.1
pip install litellm
pip install litellm==1.77.7.rc.1
Key Highlightsโ
- Dynamic Rate Limiter v3 - Automatically maximizes throughput when capacity is available (< 80% saturation) by allowing lower-priority requests to use unused capacity, then switches to fair priority-based allocation under high load (โฅ 80%) to prevent blocking
- Major Performance Improvements - Router optimization reducing P99 latency by 62.5%, cache improvements from O(n*log(n)) to O(log(n))
- Claude Sonnet 4.5 - Support for Anthropic's new Claude Sonnet 4.5 model family with 200K+ context and tiered pricing
- MCP Gateway Enhancements - Fine-grained tool control, server permissions, and forwardable headers
- AMD Lemonade & Nvidia NIM - New provider support for AMD Lemonade and Nvidia NIM Rerank
- GitLab Prompt Management - GitLab-based prompt management integration
New Models / Updated Modelsโ
New Model Supportโ
Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
---|---|---|---|---|---|
Anthropic | claude-sonnet-4-5 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
Anthropic | claude-sonnet-4-5-20250929 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
Bedrock | eu.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
Azure AI | azure_ai/grok-4 | 131K | $5.50 | $27.50 | Chat, reasoning, function calling, web search |
Azure AI | azure_ai/grok-4-fast-reasoning | 131K | $0.43 | $1.73 | Chat, reasoning, function calling, web search |
Azure AI | azure_ai/grok-4-fast-non-reasoning | 131K | $0.43 | $1.73 | Chat, function calling, web search |
Azure AI | azure_ai/grok-code-fast-1 | 131K | $3.50 | $17.50 | Chat, function calling, web search |
Groq | groq/moonshotai/kimi-k2-instruct-0905 | Context varies | Pricing varies | Pricing varies | Chat, function calling |
Ollama | Ollama Cloud models | Varies | Free | Free | Self-hosted models via Ollama Cloud |
Featuresโ
- Anthropic
- Add new claude-sonnet-4-5 model family with tiered pricing above 200K tokens - PR #15041
- Add anthropic/claude-sonnet-4-5 to model price json with prompt caching support - PR #15049
- Add 200K prices for Sonnet 4.5 - PR #15140
- Add cost tracking for /v1/messages in streaming response - PR #15102
- Add /v1/messages/count_tokens to Anthropic routes for non-admin user access - PR #15034
- Gemini
- Ignore type param for gemini tools - PR #15022
- Vertex AI
- Azure
- Ollama
- Add ollama cloud models - PR #15008
- Groq
- Add groq/moonshotai/kimi-k2-instruct-0905 - PR #15079
- OpenAI
- Add support for GPT 5 codex models - PR #14841
- DeepInfra
- Update DeepInfra model data refresh with latest pricing - PR #14939
- Bedrock
- Nvidia NIM
- Add Nvidia NIM Rerank Support - PR #15152
Bug Fixesโ
New Provider Supportโ
- AMD Lemonade
- Add AMD Lemonade provider support - PR #14840
LLM API Endpointsโ
Featuresโ
-
- Return Cost for Responses API Streaming requests - PR #15053
-
- Add full support for native Gemini API translation - PR #15029
-
Passthrough Gemini Routes
-
Passthrough Vertex AI Routes
-
General
Management Endpoints / UIโ
Featuresโ
-
Virtual Keys
-
Models + Endpoints
-
Admin Settings
-
MCP
Bug Fixesโ
-
Virtual Keys
-
Models + Endpoints
- Make UI theme settings publicly accessible for custom branding - PR #15074
-
Teams
- fix failed copy to clipboard for http ui - PR #15195
-
Logs
-
Test key
- update selected model on key change - PR #15197
-
Dashboard
- Fix LiteLLM model name fallback in dashboard overview - PR #14998
Logging / Guardrail / Prompt Management Integrationsโ
Featuresโ
- OpenTelemetry
- Use generation_name for span naming in logging method - PR #14799
- Langfuse
- Prometheus
- support custom metadata labels on key/team - PR #15094
Guardrailsโ
Prompt Managementโ
Spend Tracking, Budgets and Rate Limitingโ
- Cost Tracking
- Proxy: end user cost tracking in the responses API - PR #15124
- Parallel Request Limiter v3
- Teams
- Add model specific tpm/rpm limits to teams on LiteLLM - PR #15044
MCP Gatewayโ
- Server Configuration
- Bug Fixes
Performance / Loadbalancing / Reliability improvementsโ
- Router Optimizations
- +62.5% P99 Latency Improvement - Remove router inefficiencies (from O(M*N) to O(1)) - PR #15046
- Remove hasattr checks in Router - PR #15082
- Remove Double Lookups - PR #15084
- Optimize _filter_cooldown_deployments from O(nรm + kรn) to O(n) - PR #15091
- Optimize unhealthy deployment filtering in retry path (O(n*m) โ O(n+m)) - PR #15110
- Cache Optimizations
- Worker Management
- Add proxy CLI option to recycle workers after N requests - PR #15007
- Metrics & Monitoring
- LiteLLM Overhead metric tracking - Add support for tracking litellm overhead on cache hits - PR #15045
Documentation Updatesโ
- Provider Documentation
- General Documentation
Security Fixesโ
- JWT Token Security - Don't log JWT SSO token on .info() log - PR #15145
New Contributorsโ
- @herve-ves made their first contribution in PR #14998
- @wenxi-onyx made their first contribution in PR #15008
- @jpetrucciani made their first contribution in PR #15005
- @abhijitjavelin made their first contribution in PR #14983
- @ZeroClover made their first contribution in PR #15039
- @cedarm made their first contribution in PR #15043
- @Isydmr made their first contribution in PR #15025
- @serializer made their first contribution in PR #15013
- @eddierichter-amd made their first contribution in PR #14840
- @malags made their first contribution in PR #15000
- @henryhwang made their first contribution in PR #15029
- @plafleur made their first contribution in PR #15111
- @tyler-liner made their first contribution in PR #14799
- @Amir-R25 made their first contribution in PR #15144
- @georg-wolflein made their first contribution in PR #15124
- @niharm made their first contribution in PR #15140
- @anthony-liner made their first contribution in PR #15015
- @rishiganesh2002 made their first contribution in PR #15153
- @danielaskdd made their first contribution in PR #15160
- @JVenberg made their first contribution in PR #15146
- @speglich made their first contribution in PR #15072
- @daily-kim made their first contribution in PR #14764