
v1.77.2-stable - Bedrock Batches API

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version​

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.2
```

Key Highlights​

  • Bedrock Batches API - Support for creating Batch Inference Jobs on Bedrock using LiteLLM's unified, OpenAI-compatible batch API; see the sketch after this list
  • Qwen API Tiered Pricing - Cost tracking support for Dashscope (Qwen) models with multiple pricing tiers; a worked cost example follows the model table below
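
The Bedrock batch workflow goes through the proxy's OpenAI-compatible files and batches endpoints. Below is a minimal sketch using the openai Python SDK against a locally running proxy; the "bedrock-claude" model alias and the "target_model_names" routing field are assumptions about your proxy configuration, not fixed names - check the batch docs for the exact field your setup expects.

```python
# Minimal sketch: create a Bedrock batch job through LiteLLM's
# OpenAI-compatible batch API. Assumes a proxy on localhost:4000 with a
# model alias (here "bedrock-claude") that routes to a Bedrock model.
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

# 1. Upload a JSONL file of chat completion requests.
#    "target_model_names" is assumed here as the routing hint for the
#    uploaded file -- verify the exact field name in your proxy's docs.
batch_file = client.files.create(
    file=open("bedrock_batch_requests.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": "bedrock-claude"},
)

# 2. Create the batch job against the uploaded file.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll for completion and fetch results later.
print(client.batches.retrieve(batch.id).status)
```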

New Models / Updated Models​

New Model Support​

| Provider | Model | Context Window | Pricing ($/1M tokens) | Features |
|----------|-------|----------------|-----------------------|----------|
| DeepInfra | deepinfra/deepseek-ai/DeepSeek-R1 | 164K | Input: $0.70, Output: $2.40 | Chat completions, tool calling |
| Heroku | heroku/claude-4-sonnet | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-7-sonnet | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-5-sonnet-latest | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-5-haiku | 4K | Contact provider for pricing | Function calling, tool choice |
| Dashscope | dashscope/qwen-plus-latest | 1M | Tiered (input / output): 0-256K tokens: $0.40 / $1.20; 256K-1M tokens: $1.20 / $3.60 | Function calling, reasoning |
| Dashscope | dashscope/qwen3-max-preview | 262K | Tiered (input / output): 0-32K tokens: $1.20 / $6.00; 32K-128K tokens: $2.40 / $12.00; 128K-252K tokens: $3.00 / $15.00 | Function calling, reasoning |
| Dashscope | dashscope/qwen-flash | 1M | Tiered (input / output): 0-256K tokens: $0.05 / $0.40; 256K-1M tokens: $0.25 / $2.00 | Function calling, reasoning |
| Dashscope | dashscope/qwen3-coder-plus | 1M | Tiered (input / output): 0-32K tokens: $1.00 / $5.00; 32K-128K tokens: $1.80 / $9.00; 128K-256K tokens: $3.00 / $15.00; 256K-1M tokens: $6.00 / $60.00 | Function calling, reasoning, caching |
| Dashscope | dashscope/qwen3-coder-flash | 1M | Tiered (input / output): 0-32K tokens: $0.30 / $1.50; 32K-128K tokens: $0.50 / $2.50; 128K-256K tokens: $0.80 / $4.00; 256K-1M tokens: $1.60 / $9.60 | Function calling, reasoning, caching |
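
To show how the tiered Dashscope pricing above translates into tracked spend, here is a small worked example for dashscope/qwen-plus-latest. It assumes the tier is selected by the request's prompt token count and that the whole request is billed at that tier's input/output rates; check the cost-tracking docs for the exact tier-selection rule.

```python
# Illustrative cost check for dashscope/qwen-plus-latest tiered pricing.
# Assumption: the tier is chosen from the prompt (input) token count, and
# the full request is billed at that tier's rates.
prompt_tokens = 300_000       # falls in the 256K-1M tier
completion_tokens = 2_000

input_rate = 1.20 / 1_000_000   # $ per input token in the 256K-1M tier
output_rate = 3.60 / 1_000_000  # $ per output token in the 256K-1M tier

cost = prompt_tokens * input_rate + completion_tokens * output_rate
print(f"${cost:.4f}")  # ~$0.3672
```

In practice you don't compute this by hand: litellm.completion_cost() returns the computed spend for an actual response object.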

Features​

  • Bedrock
    • Bedrock Batches API - batch processing support with file upload and request transformation - PR #14518, PR #14522
  • VLLM
    • Added transcription endpoint support (see the sketch after this list) - PR #14523
  • Ollama
    • ollama_chat/ - images, thinking, and content as list handling - PR #14523
  • General
    • New debug flag for detailed request/response logging - PR #14482
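
For the vLLM transcription support above, the sketch below calls litellm.transcription() against a self-hosted vLLM server; the Whisper model name and api_base are placeholders for your own deployment.

```python
# Minimal sketch: audio transcription against a self-hosted vLLM server.
# The model name and api_base are placeholders for your deployment.
import litellm

with open("meeting.mp3", "rb") as audio_file:
    response = litellm.transcription(
        model="hosted_vllm/openai/whisper-large-v3",
        file=audio_file,
        api_base="http://localhost:8000/v1",
    )

print(response.text)
```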

Bug Fixes​


LLM API Endpoints​

Bug Fixes​


Spend Tracking, Budgets and Rate Limiting​

Features​

Bug Fixes​

  • Provider Budgets - Fixed provider budget calculations - PR #14459

Management Endpoints / UI​

Features​

  • User Headers Mapping - New X-LiteLLM user headers mapping for enhanced user tracking - PR #14485
  • Key Unblocking - Support for hashed tokens in /key/unblock endpoint - PR #14477
  • Model Group Header Forwarding - Enhanced wildcard model support with documentation - PR #14528
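
With hashed-token support in /key/unblock, a key can now be unblocked using the hashed value stored in the database rather than the raw sk- key. A minimal sketch with requests; the proxy URL, admin key, and hashed token below are placeholders.

```python
# Minimal sketch: unblock a key by its hashed token via the proxy's
# /key/unblock endpoint. URL, admin key, and token are placeholders.
import requests

response = requests.post(
    "http://localhost:4000/key/unblock",
    headers={"Authorization": "Bearer sk-1234"},   # proxy admin key
    json={"key": "88dc28..."},                     # hashed token from the DB
)
print(response.json())
```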

Bug Fixes​


Logging / Guardrail Integrations​

Features​

  • Noma Integration - Added non-blocking monitor mode with anonymize input support - PR #14401

Performance / Loadbalancing / Reliability improvements​

Performance​

  • Removed dynamic creation of static values - PR #14538
  • Use _PROXY_MaxParallelRequestsHandler_v3 by default for improved throughput - PR #14450
  • Improved execution context propagation into logging tasks - PR #14455

New Contributors​

  • @Sameerlite made their first contribution in PR #14460
  • @holzman made their first contribution in PR #14459
  • @sashank5644 made their first contribution in PR #14469
  • @TomAlon made their first contribution in PR #14401
  • @AlexsanderHamir made their first contribution in PR #14538

Full Changelog​