[PRE-RELEASE] v1.76.0-stable - RPS Improvements
LiteLLM is hiring a Founding Backend Engineer in San Francisco.
Apply here if you're interested!
Deploy this version
This release is not live yet.
New Models / Updated Models
Bugs
- OpenAI
- GPT-5 Chat: clarify that it does not support function calling - PR #13612, s/o @superpoussin22
- VertexAI
- Fix VertexAI batch file format by @thiagosalvatore in PR #13576
- LiteLLM Proxy
- Add support for calling image_edits + image_generations via SDK to Proxy - PR #13735 (see the sketch after this list)
- OpenRouter
- Fix max_output_tokens value for Anthropic Claude 4 - PR #13526
- Gemini
- Fix prompt caching cost calculation - PR #13742
- Azure
- Groq
- Fix streaming ASCII encoding issue - PR #13675
- Baseten
- Refactored the integration to use new OpenAI-compatible endpoints - PR #13783
- Bedrock
- Fix application inference profiles for Bedrock pass-through endpoints - PR #13881
- DataRobot
- Updated URL handling for the DataRobot provider - PR #13880
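For the proxy image_edits / image_generations support above, here is a minimal sketch of routing an image generation call through a running LiteLLM Proxy with the Python SDK; the proxy URL, virtual key, and `dall-e-3` model alias are placeholder assumptions for your own deployment.

```python
# Minimal sketch: image generation via the LiteLLM Python SDK, routed through a
# LiteLLM Proxy. The proxy URL, virtual key, and model alias below are placeholders.
import litellm

response = litellm.image_generation(
    model="litellm_proxy/dall-e-3",     # model alias configured on your proxy (assumed)
    prompt="A watercolor painting of a lighthouse at dusk",
    api_base="http://localhost:4000",   # your LiteLLM Proxy URL
    api_key="sk-1234",                  # a LiteLLM virtual key
)
print(response.data[0].url)
```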
Features
- Together AI
- Added cost tracking for Qwen3, DeepSeek R1 0528 Throughput, GLM 4.5, and GPT-OSS models - PR #13637, s/o @Tasmay-Tibrewal
- Fireworks AI
- Add fireworks_ai/accounts/fireworks/models/deepseek-v3-0324 - PR #13821
- VertexAI
- Anthropic
- Add long context support w/ cost tracking - PR #13759 (see the sketch after this list)
- DeepInfra
- Bedrock
- Ollama
- Handle Ollama null response when using tool calling with non-tool-trained models - PR #13902
- OpenRouter
- Add deepseek/deepseek-chat-v3.1 support - PR #13897
- Mistral
- Databricks
- Remove deprecated dbrx models (dbrx-instruct, llama 3.1) - PR #13843
- AI/ML API
- Image generation API support - PR #13893
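For the Anthropic long context + cost tracking item above, here is a minimal sketch of checking the tracked cost of a completion with the SDK; the model name and access to long-context pricing are assumptions.

```python
# Minimal sketch: cost tracking on an Anthropic completion via the LiteLLM SDK.
# The model name is an assumption; swap in a long-context-capable Claude model.
import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Summarize the attached report: ..."}],
)

# Returns the USD cost LiteLLM tracked for this response; with the change above,
# prompts beyond the long-context threshold are priced at the long-context rate.
print(litellm.completion_cost(completion_response=response))
```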
LLM API Endpoints
Bugs
MCP Gateway
Bugs
- Fix StreamableHTTPSessionManager.run() error - PR #13666
Vector Stores
Bugs
Management Endpoints / UI
Bugs
- Passthrough
- Fix query passthrough deletion - PR #13622
Features
- Models
- Notifications
- Add new notifications toast UI everywhere - PR #13813
- Keys
- Usage
- Fix 'Cannot read properties of undefined' exception on the user agent activity tab - PR #13892
- SSO
- Free SSO usage for up to 5 users - PR #13843
Logging / Guardrail Integrations
Bugs
- Bedrock Guardrails
- Add Bedrock API key support - PR #13835
Features
- Datadog LLM Observability
- Langfuse OTEL
- Allow using Key/Team-Based Logging - PR #13791 (see the sketch after this list)
- AIM
- Migrate to new firewall API - PR #13748
- OTEL
- Add OTEL tracing for the actual LLM API call - PR #13836
- MLFlow
- Include predicted output in MLflow tracing - PR #13795, s/o @TomeHirata
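For the Langfuse OTEL key-based logging item above, here is a minimal sketch of generating a virtual key whose traffic is logged to Langfuse; the `langfuse_otel` callback name, the metadata schema, and the credential variable names are assumptions based on LiteLLM's key-based logging pattern.

```python
# Minimal sketch: create a LiteLLM virtual key with its own Langfuse OTEL logging
# config. Proxy URL, admin key, callback name, and credential fields are assumed.
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",           # your LiteLLM Proxy
    headers={"Authorization": "Bearer sk-master"},   # proxy admin key
    json={
        "metadata": {
            "logging": [
                {
                    "callback_name": "langfuse_otel",  # assumed callback name
                    "callback_type": "success",
                    "callback_vars": {
                        "langfuse_public_key": "pk-lf-...",
                        "langfuse_secret_key": "sk-lf-...",
                        "langfuse_host": "https://cloud.langfuse.com",
                    },
                }
            ]
        }
    },
)
print(resp.json()["key"])
```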
Performance / Loadbalancing / Reliability improvements
Bugs
- Cooldowns
- Don't return raw Azure exceptions to the client (they can contain prompt leakage) - PR #13529
- Auto-router
- Ensure the relevant dependencies for the auto-router exist in the LiteLLM Docker image - PR #13788
- Model Alias
- Fix calling a key that has access to a model alias - PR #13830
Features
- S3 Caching
- Use namespace as prefix for S3 cache - PR #13704 (see the sketch after this list)
- Async S3 caching support (4x RPS improvement) - PR #13852, s/o @michal-otmianowski
- Model Group header forwarding
- Performance
- Improve LiteLLM Python SDK RPS by +200 RPS (braintrust import + aiohttp transport fixes) - PR #13839
- Use O(1) Set lookups for model routing - PR #13879
- Reduce significant CPU overhead from litellm_logging.py - PR #13895
- Improvements to the Async Success Handler (logging callbacks) - approx. +130 RPS - PR #13905
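For the S3 caching items above, here is a minimal sketch of enabling S3-backed caching in the Python SDK; the bucket name, region, and the `namespace` parameter acting as the S3 key prefix are assumptions for illustration.

```python
# Minimal sketch: S3-backed response caching in the LiteLLM SDK. Bucket, region,
# and the namespace-as-prefix behavior are assumptions for illustration.
import litellm
from litellm import Cache

litellm.cache = Cache(
    type="s3",
    s3_bucket_name="my-litellm-cache",   # placeholder bucket
    s3_region_name="us-east-1",
    namespace="team-a",                  # used as the cache key prefix per the change above
)

# Repeated identical calls should now be served from the S3 cache.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    caching=True,
)
```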
General Proxy Improvements
Bugs
- SDK
- Fix litellm compatibility with the newest OpenAI release (>v1.100.0) - PR #13728
- Helm
- Rate Limits
- Fix descriptor/response size mismatch in parallel_request_limiter_v3 - PR #13863, s/o @luizrennocosta
- Non-root