
v1.77.2-stable - Bedrock Batches API

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version​

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.77.2
```

Key Highlights​

  • Bedrock Batches API - Support for creating Batch Inference Jobs on Bedrock using LiteLLM's unified, OpenAI-compatible batch API; see the sketch after this list
  • Qwen API Tiered Pricing - Cost tracking support for Dashscope (Qwen) models with multiple pricing tiers; a worked cost example follows the model table below
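
The Bedrock batch workflow goes through the proxy's OpenAI-compatible files and batches endpoints. Below is a minimal sketch using the openai Python SDK against a locally running proxy; the "bedrock-claude" model alias and the "target_model_names" routing field are assumptions about your proxy configuration, not fixed names - check the batch docs for the exact field your setup expects.

```python
# Minimal sketch: create a Bedrock batch job through LiteLLM's
# OpenAI-compatible batch API. Assumes a proxy on localhost:4000 with a
# model alias (here "bedrock-claude") that routes to a Bedrock model.
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

# 1. Upload a JSONL file of chat completion requests.
#    "target_model_names" is assumed here as the routing hint for the
#    uploaded file -- verify the exact field name in your proxy's docs.
batch_file = client.files.create(
    file=open("bedrock_batch_requests.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": "bedrock-claude"},
)

# 2. Create the batch job against the uploaded file.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll for completion and fetch results later.
print(client.batches.retrieve(batch.id).status)
```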

New Models / Updated Models​

New Model Support​

| Provider | Model | Context Window | Pricing ($/1M tokens) | Features |
|----------|-------|----------------|-----------------------|----------|
| DeepInfra | deepinfra/deepseek-ai/DeepSeek-R1 | 164K | Input: $0.70, Output: $2.40 | Chat completions, tool calling |
| Heroku | heroku/claude-4-sonnet | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-7-sonnet | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-5-sonnet-latest | 8K | Contact provider for pricing | Function calling, tool choice |
| Heroku | heroku/claude-3-5-haiku | 4K | Contact provider for pricing | Function calling, tool choice |
| Dashscope | dashscope/qwen-plus-latest | 1M | Tiered (input / output): 0-256K tokens: $0.40 / $1.20; 256K-1M tokens: $1.20 / $3.60 | Function calling, reasoning |
| Dashscope | dashscope/qwen3-max-preview | 262K | Tiered (input / output): 0-32K tokens: $1.20 / $6.00; 32K-128K tokens: $2.40 / $12.00; 128K-252K tokens: $3.00 / $15.00 | Function calling, reasoning |
| Dashscope | dashscope/qwen-flash | 1M | Tiered (input / output): 0-256K tokens: $0.05 / $0.40; 256K-1M tokens: $0.25 / $2.00 | Function calling, reasoning |
| Dashscope | dashscope/qwen3-coder-plus | 1M | Tiered (input / output): 0-32K tokens: $1.00 / $5.00; 32K-128K tokens: $1.80 / $9.00; 128K-256K tokens: $3.00 / $15.00; 256K-1M tokens: $6.00 / $60.00 | Function calling, reasoning, caching |
| Dashscope | dashscope/qwen3-coder-flash | 1M | Tiered (input / output): 0-32K tokens: $0.30 / $1.50; 32K-128K tokens: $0.50 / $2.50; 128K-256K tokens: $0.80 / $4.00; 256K-1M tokens: $1.60 / $9.60 | Function calling, reasoning, caching |
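
To show how the tiered Dashscope pricing above translates into tracked spend, here is a small worked example for dashscope/qwen-plus-latest. It assumes the tier is selected by the request's prompt token count and that the whole request is billed at that tier's input/output rates; check the cost-tracking docs for the exact tier-selection rule.

```python
# Illustrative cost check for dashscope/qwen-plus-latest tiered pricing.
# Assumption: the tier is chosen from the prompt (input) token count, and
# the full request is billed at that tier's rates.
prompt_tokens = 300_000       # falls in the 256K-1M tier
completion_tokens = 2_000

input_rate = 1.20 / 1_000_000   # $ per input token in the 256K-1M tier
output_rate = 3.60 / 1_000_000  # $ per output token in the 256K-1M tier

cost = prompt_tokens * input_rate + completion_tokens * output_rate
print(f"${cost:.4f}")  # ~$0.3672
```

In practice you don't compute this by hand: litellm.completion_cost() returns the computed spend for an actual response object.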

Features​

  • Bedrock
    • Bedrock Batches API - batch processing support with file upload and request transformation - PR #14518, PR #14522
  • VLLM
    • Added transcription endpoint support (see the sketch after this list) - PR #14523
  • Ollama
    • ollama_chat/ - images, thinking, and content as list handling - PR #14523
  • General
    • New debug flag for detailed request/response logging - PR #14482
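
For the vLLM transcription support above, the sketch below calls litellm.transcription() against a self-hosted vLLM server; the Whisper model name and api_base are placeholders for your own deployment.

```python
# Minimal sketch: audio transcription against a self-hosted vLLM server.
# The model name and api_base are placeholders for your deployment.
import litellm

with open("meeting.mp3", "rb") as audio_file:
    response = litellm.transcription(
        model="hosted_vllm/openai/whisper-large-v3",
        file=audio_file,
        api_base="http://localhost:8000/v1",
    )

print(response.text)
```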

Bug Fixes​


LLM API Endpoints​

Bug Fixes​


Spend Tracking, Budgets and Rate Limiting​

Features​

Bug Fixes​

  • Provider Budgets - Fixed provider budget calculations - PR #14459

Management Endpoints / UI​

Features​

  • User Headers Mapping - New X-LiteLLM user headers mapping for enhanced user tracking - PR #14485
  • Key Unblocking - Support for hashed tokens in /key/unblock endpoint - PR #14477
  • Model Group Header Forwarding - Enhanced wildcard model support with documentation - PR #14528
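
With hashed-token support in /key/unblock, a key can now be unblocked using the hashed value stored in the database rather than the raw sk- key. A minimal sketch with requests; the proxy URL, admin key, and hashed token below are placeholders.

```python
# Minimal sketch: unblock a key by its hashed token via the proxy's
# /key/unblock endpoint. URL, admin key, and token are placeholders.
import requests

response = requests.post(
    "http://localhost:4000/key/unblock",
    headers={"Authorization": "Bearer sk-1234"},   # proxy admin key
    json={"key": "88dc28..."},                     # hashed token from the DB
)
print(response.json())
```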

Bug Fixes​


Logging / Guardrail Integrations​

Features​

  • Noma Integration - Added non-blocking monitor mode with anonymize input support - PR #14401

Performance / Loadbalancing / Reliability improvements​

Performance​

  • Removed dynamic creation of static values - PR #14538
  • Use _PROXY_MaxParallelRequestsHandler_v3 by default for improved throughput - PR #14450
  • Improved execution context propagation into logging tasks - PR #14455

New Contributors​

  • @Sameerlite made their first contribution in PR #14460
  • @holzman made their first contribution in PR #14459
  • @sashank5644 made their first contribution in PR #14469
  • @TomAlon made their first contribution in PR #14401
  • @AlexsanderHamir made their first contribution in PR #14538

Full Changelog​