
v1.76.3-stable - Performance, Video Generation & CloudZero Integration

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.76.3

Key Highlights

  • Major Performance Improvements - +400 RPS when the proxy is run with the correct combination of workers and CPU cores
  • Video Generation Support - Added Google AI Studio and Vertex AI Veo video generation through LiteLLM pass-through routes (a rough sketch follows below)
  • CloudZero Integration - New cost tracking integration for exporting LiteLLM usage and spend data to CloudZero.
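
As a rough illustration of the video generation highlight, the sketch below shows what a Veo request through the proxy's Google AI Studio pass-through route might look like. The /gemini route prefix, the :predictLongRunning method, the request body shape, and the auth header are all assumptions based on the upstream Google API; check the LiteLLM pass-through docs for the exact route and payload.

import requests

# Hypothetical sketch: start a Veo video generation job through the LiteLLM
# proxy's Google AI Studio pass-through route. The route, method name,
# payload shape, and auth header are assumptions; consult the docs.
PROXY_BASE = "http://localhost:4000"

response = requests.post(
    f"{PROXY_BASE}/gemini/v1beta/models/veo-3.0-generate-preview:predictLongRunning",
    headers={"Authorization": "Bearer sk-1234"},  # your LiteLLM proxy key
    json={"instances": [{"prompt": "A timelapse of a city skyline at dusk"}]},
)
print(response.json())  # long-running operation handle for the video job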

Major Changes

  • Performance Optimization: LiteLLM Proxy now achieves +400 RPS when run with the correct number of CPU cores - PR #14153, PR #14242

    By default, LiteLLM now uses num_workers = os.cpu_count() for optimal performance (a short sketch of the resolution order follows after this list).

    Override Options:

    Set environment variable:

    DEFAULT_NUM_WORKERS_LITELLM_PROXY=1

    Or start LiteLLM Proxy with:

    litellm --num_workers 1
  • Security Fix: Fixed memory_usage_in_mem_cache cache endpoint vulnerability - PR #14229
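
For clarity, here is a minimal sketch of the worker-count resolution order described above. The resolve_num_workers helper is hypothetical and for illustration only; the precedence (CLI flag, then the DEFAULT_NUM_WORKERS_LITELLM_PROXY environment variable, then os.cpu_count()) is taken from the notes above.

import os
from typing import Optional

def resolve_num_workers(cli_num_workers: Optional[int] = None) -> int:
    """Hypothetical sketch of how the proxy's worker count is resolved.

    Precedence (per the release notes):
      1. --num_workers CLI flag
      2. DEFAULT_NUM_WORKERS_LITELLM_PROXY environment variable
      3. os.cpu_count() (the new default in v1.76.3)
    """
    if cli_num_workers is not None:
        return cli_num_workers
    env_value = os.getenv("DEFAULT_NUM_WORKERS_LITELLM_PROXY")
    if env_value is not None:
        return int(env_value)
    return os.cpu_count() or 1

print(resolve_num_workers())   # e.g. 8 on an 8-core machine
print(resolve_num_workers(1))  # pinned to a single worker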


Performance Improvements

This release includes significant performance optimizations. In our internal benchmarks, a single instance gained +400 RPS when run with the correct combination of workers and CPU cores.

  • +400 RPS Performance Boost - LiteLLM Proxy now uses the correct number of CPU cores for optimal performance - PR #14153
  • Default CPU Workers - Changed the DEFAULT_NUM_WORKERS_LITELLM_PROXY default to the number of CPUs - PR #14242

New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenRouter | openrouter/openai/gpt-4.1 | 1M | $2.00 | $8.00 | Chat completions with vision |
| OpenRouter | openrouter/openai/gpt-4.1-mini | 1M | $0.40 | $1.60 | Efficient chat completions |
| OpenRouter | openrouter/openai/gpt-4.1-nano | 1M | $0.10 | $0.40 | Ultra-efficient chat |
| Vertex AI | vertex_ai/openai/gpt-oss-20b-maas | 131K | $0.075 | $0.30 | Reasoning support |
| Vertex AI | vertex_ai/openai/gpt-oss-120b-maas | 131K | $0.15 | $0.60 | Advanced reasoning |
| Gemini | gemini/veo-3.0-generate-preview | 1K | - | $0.75/sec | Video generation |
| Gemini | gemini/veo-3.0-fast-generate-preview | 1K | - | $0.40/sec | Fast video generation |
| Gemini | gemini/veo-2.0-generate-001 | 1K | - | $0.35/sec | Video generation |
| Volcengine | doubao-embedding-large | 4K | Free | Free | 2048-dim embeddings |
| Together AI | together_ai/deepseek-ai/DeepSeek-V3.1 | 128K | $0.60 | $1.70 | Reasoning support |
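
As a quick usage sketch, the chat models above are callable through the standard litellm.completion interface. The model string is taken from the table; the example assumes OPENROUTER_API_KEY is set for the OpenRouter provider.

import litellm

# Minimal sketch: call one of the newly added OpenRouter models.
# Assumes OPENROUTER_API_KEY is set in the environment.
response = litellm.completion(
    model="openrouter/openai/gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarize this release in one line."}],
)
print(response.choices[0].message.content)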

Features

Bug Fixes

New Provider Support

  • Volcengine
    • Added a Volcengine embedding module with handler and transformation logic (a usage sketch follows below) - PR #14028
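
A minimal usage sketch, assuming the model is addressed with a volcengine/ prefix and credentials come from a VOLCENGINE_API_KEY environment variable (both assumptions; check the provider docs):

import litellm

# Minimal sketch: embeddings via the new Volcengine module.
# The "volcengine/" prefix and VOLCENGINE_API_KEY variable are assumptions.
response = litellm.embedding(
    model="volcengine/doubao-embedding-large",
    input=["LiteLLM proxies 100+ LLM APIs behind one interface."],
)
print(len(response.data[0]["embedding"]))  # expected 2048 dimensions per the table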

LLM API Endpoints

Features

Bugs

  • General
    • Remove "/" or ":" from the model name when it is used as an h11 header name - PR #14191
    • Bug fix for openai.gpt-oss when using the reasoning_effort parameter - PR #14300

Spend Tracking, Budgets and Rate Limiting

Features

  • Added header support for spend_logs_metadata (see the sketch below) - PR #14186
  • LiteLLM pass-through cost tracking for chat completions - PR #14256
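
A minimal sketch of attaching spend_logs_metadata via a request header when calling the proxy. The header name used here is an assumption for illustration; verify the exact name in PR #14186 or the spend-tracking docs.

import json
import openai

# Sketch: pass spend_logs_metadata as a request header to the proxy.
# The header name "x-litellm-spend-logs-metadata" is an assumption;
# the model alias is whatever is configured on your proxy.
client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "hello"}],
    extra_headers={
        "x-litellm-spend-logs-metadata": json.dumps({"team": "analytics", "ticket": "JIRA-123"})
    },
)
print(response.id)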

Bug Fixes

  • Fixed TPM rate limit bug - PR #14237
  • Fixed key budgets not resetting at the expected times - PR #14241

Management Endpoints / UI

Features

  • UI Improvements
    • Fixed Logs page screen sizing - PR #14135
    • Added a tooltip on successful organization creation - PR #14132
    • Changed the "Back to Keys" label to "Back to Logs" - PR #14134
    • Added client-side pagination on the All Models table - PR #14136
    • Improved the Model Filters UI - PR #14131
    • Removed the table filter on the user info page - PR #14169
    • Added a team name badge on the User Details page - PR #14003
    • Fixed a parameter-passing error on the Logs page - PR #14193
  • Authentication & Authorization
    • Support for ES256/ES384/ES512 and EdDSA JWT verification - PR #14118
    • Made team_id a required field when generating service account keys - PR #14270

Bugs

  • General
    • Validate the STORE_MODEL_IN_DB setting - PR #14269

Logging / Guardrail Integrations

Features

Guardrails

  • Added guardrail support to the Anthropic API endpoint - PR #14107

New Integration

  • CloudZero - New cost tracking integration for exporting LiteLLM usage and spend data to CloudZero.

Performance / Loadbalancing / Reliability improvements

Features

  • Performance
    • LiteLLM Proxy: +400 RPS when run with the correct number of CPU cores - PR #14153
    • Changed the DEFAULT_NUM_WORKERS_LITELLM_PROXY default to the number of CPUs - PR #14242
  • Monitoring
    • Added missing Prometheus metrics - PR #14139
  • Timeout
    • Stream Timeout Control - Allow using the x-litellm-stream-timeout header to set a per-request stream timeout (see the sketch after this list) - PR #14147
  • Routing
    • Fixed x-litellm-tags not routing with the Responses API - PR #14289
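
A minimal sketch of setting a per-request stream timeout through the x-litellm-stream-timeout header. The header name comes from the changelog; the value is assumed to be in seconds, and the model alias is a placeholder for whatever is configured on your proxy.

import openai

# Sketch: per-request stream timeout via the x-litellm-stream-timeout header.
# The value is assumed to be seconds; see PR #14147 for details.
client = openai.OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

stream = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Stream a short poem."}],
    stream=True,
    extra_headers={"x-litellm-stream-timeout": "30"},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")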

Bugs

  • Security
    • Fixed memory_usage_in_mem_cache cache endpoint vulnerability - PR #14229

General Proxy Improvements

Features

  • SCIM Support
    • Improved SCIM debugging - PR #14221
    • Bug fixes for handling SCIM group memberships - PR #14226
  • Kubernetes
    • Added an optional PodDisruptionBudget for the LiteLLM proxy - PR #14093
  • Error Handling
    • Added the model name to Azure error messages - PR #14294

New Contributors

  • @iabhi4 made their first contribution in PR #14093
  • @zainhas made their first contribution in PR #14087
  • @LifeDJIK made their first contribution in PR #14146
  • @retanoj made their first contribution in PR #14133
  • @zhxlp made their first contribution in PR #14193
  • @kayoch1n made their first contribution in PR #14191
  • @kutsushitaneko made their first contribution in PR #14171
  • @mjmendo made their first contribution in PR #14176
  • @HarshavardhanK made their first contribution in PR #14213
  • @eycjur made their first contribution in PR #14207
  • @22mSqRi made their first contribution in PR #14241
  • @onlylhf made their first contribution in PR #14028
  • @btpemercier made their first contribution in PR #11319
  • @tremlin made their first contribution in PR #14287
  • @TobiMayr made their first contribution in PR #14262
  • @Eitan1112 made their first contribution in PR #14252

Full Changelog