[PRE-RELEASE] v1.76.0-stable - RPS Improvements
LiteLLM is hiring a Founding Backend Engineer in San Francisco.
Apply here if you're interested!
Deploy this version
This release is not live yet.
New Models / Updated Models
Bugs
- OpenAI
- GPT-5 Chat: clarify that it does not support function calling - PR #13612, s/o @superpoussin22
- VertexAI
- Fix VertexAI batch file format by @thiagosalvatore in PR #13576
- LiteLLM Proxy
- Add support for calling image_edits + image_generations via SDK to Proxy - PR #13735 (see the sketch after this list)
- OpenRouter
- Fix max_output_tokens value for Anthropic Claude 4 - PR #13526
- Gemini
- Fix prompt caching cost calculation - PR #13742
- Azure
- Groq
- Fix streaming ASCII encoding issue - PR #13675
- Baseten
- Refactored the integration to use new OpenAI-compatible endpoints - PR #13783
- Bedrock
- Fix application inference profiles for Bedrock pass-through endpoints - PR #13881
- DataRobot
- Updated URL handling for the DataRobot provider - PR #13880
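For the proxy image_edits / image_generations support above, here is a minimal sketch of routing an image generation call through a running LiteLLM Proxy with the Python SDK; the proxy URL, virtual key, and `dall-e-3` model alias are placeholder assumptions for your own deployment.

```python
# Minimal sketch: image generation via the LiteLLM Python SDK, routed through a
# LiteLLM Proxy. The proxy URL, virtual key, and model alias below are placeholders.
import litellm

response = litellm.image_generation(
    model="litellm_proxy/dall-e-3",     # model alias configured on your proxy (assumed)
    prompt="A watercolor painting of a lighthouse at dusk",
    api_base="http://localhost:4000",   # your LiteLLM Proxy URL
    api_key="sk-1234",                  # a LiteLLM virtual key
)
print(response.data[0].url)
```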
Features
- Together AI
- Added cost tracking for Qwen3, DeepSeek R1 0528 Throughput, GLM 4.5, and GPT-OSS models - PR #13637, s/o @Tasmay-Tibrewal
- Fireworks AI
- Add fireworks_ai/accounts/fireworks/models/deepseek-v3-0324 - PR #13821
- VertexAI
- Anthropic
- Add long context support w/ cost tracking - PR #13759 (see the sketch after this list)
- DeepInfra
- Bedrock
- Ollama
- Handle Ollama null response when using tool calling with non-tool-trained models - PR #13902
- OpenRouter
- Add deepseek/deepseek-chat-v3.1 support - PR #13897
- Mistral
- Databricks
- Remove deprecated dbrx models (dbrx-instruct, llama 3.1) - PR #13843
- AI/ML API
- Image generation API support - PR #13893
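For the Anthropic long context + cost tracking item above, here is a minimal sketch of checking the tracked cost of a completion with the SDK; the model name and access to long-context pricing are assumptions.

```python
# Minimal sketch: cost tracking on an Anthropic completion via the LiteLLM SDK.
# The model name is an assumption; swap in a long-context-capable Claude model.
import litellm

response = litellm.completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Summarize the attached report: ..."}],
)

# Returns the USD cost LiteLLM tracked for this response; with the change above,
# prompts beyond the long-context threshold are priced at the long-context rate.
print(litellm.completion_cost(completion_response=response))
```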
LLM API Endpoints
Bugs
MCP Gateway
Bugs
- Fix StreamableHTTPSessionManager.run() error - PR #13666
Vector Stores
Bugs
Management Endpoints / UI
Bugs
- Passthrough
- Fix query passthrough deletion - PR #13622
Features
- Models
- Notifications
- Add new notifications toast UI everywhere - PR #13813
- Keys
- Usage
- Fix 'Cannot read properties of undefined' exception on the user agent activity tab - PR #13892
- SSO
- Free SSO usage for up to 5 users - PR #13843
Logging / Guardrail Integrations
Bugs
- Bedrock Guardrails
- Add Bedrock API key support - PR #13835
Features
- Datadog LLM Observability
- Langfuse OTEL
- Allow using Key/Team-Based Logging - PR #13791 (see the sketch after this list)
- AIM
- Migrate to new firewall API - PR #13748
- OTEL
- Add OTEL tracing for the actual LLM API call - PR #13836
- MLFlow
- Include predicted output in MLflow tracing - PR #13795, s/o @TomeHirata
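For the Langfuse OTEL key-based logging item above, here is a minimal sketch of generating a virtual key whose traffic is logged to Langfuse; the `langfuse_otel` callback name, the metadata schema, and the credential variable names are assumptions based on LiteLLM's key-based logging pattern.

```python
# Minimal sketch: create a LiteLLM virtual key with its own Langfuse OTEL logging
# config. Proxy URL, admin key, callback name, and credential fields are assumed.
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",           # your LiteLLM Proxy
    headers={"Authorization": "Bearer sk-master"},   # proxy admin key
    json={
        "metadata": {
            "logging": [
                {
                    "callback_name": "langfuse_otel",  # assumed callback name
                    "callback_type": "success",
                    "callback_vars": {
                        "langfuse_public_key": "pk-lf-...",
                        "langfuse_secret_key": "sk-lf-...",
                        "langfuse_host": "https://cloud.langfuse.com",
                    },
                }
            ]
        }
    },
)
print(resp.json()["key"])
```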
Performance / Loadbalancing / Reliability improvements
Bugs
- Cooldowns
- Don't return raw Azure exceptions to the client (they can contain prompt leakage) - PR #13529
- Auto-router
- Ensure the relevant dependencies for the auto-router exist in the LiteLLM Docker image - PR #13788
- Model Alias
- Fix calling a key that has access to a model alias - PR #13830
Features
- S3 Caching
- Use namespace as prefix for S3 cache - PR #13704 (see the sketch after this list)
- Async S3 caching support (4x RPS improvement) - PR #13852, s/o @michal-otmianowski
- Model Group header forwarding
- Performance
- Improve LiteLLM Python SDK RPS by +200 RPS (braintrust import + aiohttp transport fixes) - PR #13839
- Use O(1) Set lookups for model routing - PR #13879
- Reduce significant CPU overhead from litellm_logging.py - PR #13895
- Improvements to the Async Success Handler (logging callbacks) - approx. +130 RPS - PR #13905
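For the S3 caching items above, here is a minimal sketch of enabling S3-backed caching in the Python SDK; the bucket name, region, and the `namespace` parameter acting as the S3 key prefix are assumptions for illustration.

```python
# Minimal sketch: S3-backed response caching in the LiteLLM SDK. Bucket, region,
# and the namespace-as-prefix behavior are assumptions for illustration.
import litellm
from litellm import Cache

litellm.cache = Cache(
    type="s3",
    s3_bucket_name="my-litellm-cache",   # placeholder bucket
    s3_region_name="us-east-1",
    namespace="team-a",                  # used as the cache key prefix per the change above
)

# Repeated identical calls should now be served from the S3 cache.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    caching=True,
)
```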
General Proxy Improvements
Bugs
- SDK
- Fix litellm compatibility with the newest OpenAI release (>v1.100.0) - PR #13728
- Helm
- Rate Limits
- Fix descriptor/response size mismatch in parallel_request_limiter_v3 - PR #13863, s/o @luizrennocosta
- Non-root