v1.77.2-stable - Bedrock Batches API
Deploy this version

Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.77.2
```

Pip:

```shell
pip install litellm==1.77.2
```
Key Highlights

- Bedrock Batches API - Support for creating Batch Inference Jobs on Bedrock using LiteLLM's unified, OpenAI-compatible batch API (see the sketch after this list)
- Qwen API Tiered Pricing - Cost tracking support for Dashscope (Qwen) models with multiple pricing tiers
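Because the batch surface is OpenAI compatible, the flow through a LiteLLM proxy looks like the standard OpenAI batch flow. A minimal sketch, assuming a proxy running at http://localhost:4000 whose config routes batch models to Bedrock; the base URL, API key, and file name are placeholders:

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy (placeholder URL and key).
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Upload a JSONL file of chat-completion requests for batch processing.
batch_file = client.files.create(
    file=open("bedrock_batch_requests.jsonl", "rb"),
    purpose="batch",
)

# Create the batch job; a proxy configured for Bedrock turns this
# into a Bedrock Batch Inference Job.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```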
New Models / Updated Models

New Model Support
| Provider | Model | Context Window | Pricing ($/1M tokens, input/output) | Features |
|---|---|---|---|---|
| DeepInfra | deepinfra/deepseek-ai/DeepSeek-R1 | 164K | $0.70 / $2.40 | Chat completions, tool calling |
| Heroku | heroku/claude-4-sonnet | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-7-sonnet | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-5-sonnet-latest | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-5-haiku | 4K | Contact provider for pricing | Function calling, tool choice |
| Dashscope | dashscope/qwen-plus-latest | 1M | Tiered: 0-256K tokens: $0.40 / $1.20; 256K-1M: $1.20 / $3.60 | Function calling, reasoning |
| Dashscope | dashscope/qwen3-max-preview | 262K | Tiered: 0-32K tokens: $1.20 / $6.00; 32K-128K: $2.40 / $12.00; 128K-252K: $3.00 / $15.00 | Function calling, reasoning |
| Dashscope | dashscope/qwen-flash | 1M | Tiered: 0-256K tokens: $0.05 / $0.40; 256K-1M: $0.25 / $2.00 | Function calling, reasoning |
| Dashscope | dashscope/qwen3-coder-plus | 1M | Tiered: 0-32K tokens: $1.00 / $5.00; 32K-128K: $1.80 / $9.00; 128K-256K: $3.00 / $15.00; 256K-1M: $6.00 / $60.00 | Function calling, reasoning, caching |
| Dashscope | dashscope/qwen3-coder-flash | 1M | Tiered: 0-32K tokens: $0.30 / $1.50; 32K-128K: $0.50 / $2.50; 128K-256K: $0.80 / $4.00; 256K-1M: $1.60 / $9.60 | Function calling, reasoning, caching |
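To make the tiered rates concrete, here is how a cost works out for dashscope/qwen-flash using the rates in the table above. This is illustrative arithmetic only, not LiteLLM's internal pricing code, and it assumes the tier is selected by the request's input-token count:

```python
# Rates for dashscope/qwen-flash from the table above:
# (input-token ceiling, input $/1M tokens, output $/1M tokens)
QWEN_FLASH_TIERS = [
    (256_000, 0.05, 0.40),
    (1_000_000, 0.25, 2.00),
]

def qwen_flash_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD, billing the whole request at the matching tier."""
    for ceiling, in_rate, out_rate in QWEN_FLASH_TIERS:
        if input_tokens <= ceiling:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the largest tier")

# 300K input tokens falls into the 256K-1M tier:
# (300_000 * 0.25 + 2_000 * 2.00) / 1e6 = $0.079
print(qwen_flash_cost(300_000, 2_000))
```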
Features

- Bedrock
  - Bedrock Batches API - Support for creating Batch Inference Jobs via the unified batch API (see Key Highlights above)
- VLLM
  - Added transcription endpoint support (see the sketch after this list) - PR #14523
- Ollama
  - ollama_chat/ - handle images, thinking, and content passed as a list - PR #14523
- General
  - New debug flag for detailed request/response logging - PR #14482
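For the new VLLM transcription route, the call shape mirrors LiteLLM's existing transcription API. A minimal sketch, where the model name, audio file, and api_base are placeholders for your own vLLM deployment:

```python
import litellm

# Transcription against a self-hosted vLLM server (placeholder values).
with open("audio.mp3", "rb") as audio_file:
    response = litellm.transcription(
        model="hosted_vllm/openai/whisper-large-v3",
        file=audio_file,
        api_base="http://localhost:8000/v1",
    )
print(response.text)
```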
Bug Fixes

- Azure OpenAI
  - Fixed extra_body injection causing payload rejection in image generation (see the sketch after this list) - PR #14475
- LM Studio
  - Resolved illegal Bearer header value issue - PR #14512
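For context on the Azure fix: image generation requests whose extra params were injected into the request body could previously produce a payload Azure rejected. A minimal sketch of the affected call shape, with placeholder deployment values:

```python
import litellm

# Image generation against an Azure OpenAI deployment (placeholder
# endpoint, version, and key). Extra provider-specific params passed
# here end up in the request body; the fix keeps that payload valid.
response = litellm.image_generation(
    model="azure/dall-e-3",  # azure/<your-deployment-name>
    prompt="a watercolor fox",
    api_base="https://my-endpoint.openai.azure.com",
    api_version="2024-02-01",
    api_key="azure-api-key",
)
print(response.data[0].url)
```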
LLM API Endpoints

Bug Fixes

- /messages
  - Don't send a content block after a message with a finish reason and usage block (see the sketch after this list) - PR #14477
- /generateContent
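The /messages fix applies to LiteLLM's Anthropic-compatible endpoint. A minimal sketch of streaming through it with the Anthropic SDK pointed at a proxy; the base URL, key, and model name are placeholders:

```python
import anthropic

# Anthropic SDK pointed at the LiteLLM proxy's /v1/messages route
# (placeholder URL, key, and model name).
client = anthropic.Anthropic(base_url="http://localhost:4000", api_key="sk-1234")

with client.messages.stream(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
```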
Spend Tracking, Budgets and Rate Limiting

Features

- Qwen API Tiered Pricing - Added comprehensive tiered cost tracking for Dashscope (Qwen) models - PR #14471, PR #14479
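With the tiered prices registered, cost tracking flows through LiteLLM's standard cost APIs. A minimal sketch, assuming DASHSCOPE_API_KEY is set in the environment:

```python
import litellm

# Completion against a Dashscope (Qwen) model; assumes
# DASHSCOPE_API_KEY is set in the environment.
response = litellm.completion(
    model="dashscope/qwen-flash",
    messages=[{"role": "user", "content": "Say hi"}],
)

# completion_cost picks the rate from the model's (tiered) pricing entry.
cost = litellm.completion_cost(completion_response=response)
print(f"cost: ${cost:.8f}")
```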
Bug Fixes

- Provider Budgets - Fixed provider budget calculations - PR #14459
Management Endpoints / UI

Features

- User Headers Mapping - New X-LiteLLM user headers mapping for enhanced user tracking - PR #14485
- Key Unblocking - Support for hashed tokens in the /key/unblock endpoint (see the sketch after this list) - PR #14477
- Model Group Header Forwarding - Enhanced wildcard model support with documentation - PR #14528
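A minimal sketch of unblocking a key by its hashed token via the management API; the proxy URL, master key, and hashed token are placeholders:

```python
import requests

# POST to the proxy's /key/unblock management endpoint
# (placeholder URL, master key, and hashed token).
response = requests.post(
    "http://localhost:4000/key/unblock",
    headers={"Authorization": "Bearer sk-master-key"},
    json={"key": "b8a90cbe"},  # hashed token, now accepted per PR #14477
)
print(response.json())
```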
Logging / Guardrail Integrations

Features

- Noma Integration - Added non-blocking monitor mode with support for anonymizing input - PR #14401
Performance / Loadbalancing / Reliability improvements

Performance

- Removed dynamic creation of static values - PR #14538
- Use _PROXY_MaxParallelRequestsHandler_v3 by default for improved throughput - PR #14450
- Improved execution context propagation into logging tasks - PR #14455
New Contributors
- @Sameerlite made their first contribution in PR #14460
- @holzman made their first contribution in PR #14459
- @sashank5644 made their first contribution in PR #14469
- @TomAlon made their first contribution in PR #14401
- @AlexsanderHamir made their first contribution in PR #14538