v1.71.1-stable - 2x Higher Requests Per Second (RPS)

May 24, 2025

Krrish Dholakia

CEO, LiteLLM

Ishaan Jaffer

CTO, LiteLLM

Deploy this version

Docker

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.71.1-stable

Pip

pip install litellm==1.71.1

Key Highlights

LiteLLM v1.71.1-stable is live now. Here are the key highlights of this release:

Performance improvements: LiteLLM can now scale to 200 RPS per instance with a 74ms median response time.
File Permissions: Control file access across OpenAI, Azure, VertexAI.
MCP x OpenAI: Use MCP servers with OpenAI Responses API.

Performance Improvements

This release brings aiohttp support for all LLM API providers. This means that LiteLLM can now scale to 200 RPS per instance with a 40ms median latency overhead.

This doubles the RPS LiteLLM can sustain at this latency overhead.
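To check figures like these against your own deployment, it helps to measure RPS and median latency the same way on every run. The following is a minimal sketch of such a harness using only the standard library. `fake_llm_call` and `measure` are hypothetical names introduced here for illustration, and the stub simply sleeps; to measure a real setup you would replace it with a call to your own proxy or SDK client.

```python
import asyncio
import statistics
import time


async def fake_llm_call(delay_s: float) -> None:
    """Stand-in for one request (hypothetical); swap in a real client call."""
    await asyncio.sleep(delay_s)


async def measure(total_requests: int, concurrency: int, delay_s: float = 0.0):
    """Run total_requests calls with at most `concurrency` in flight.

    Returns a tuple (requests_per_second, median_latency_ms).
    """
    semaphore = asyncio.Semaphore(concurrency)
    latencies_ms = []

    async def one_request() -> None:
        async with semaphore:
            start = time.perf_counter()
            await fake_llm_call(delay_s)
            latencies_ms.append((time.perf_counter() - start) * 1000)

    wall_start = time.perf_counter()
    await asyncio.gather(*(one_request() for _ in range(total_requests)))
    elapsed_s = time.perf_counter() - wall_start

    return total_requests / elapsed_s, statistics.median(latencies_ms)


if __name__ == "__main__":
    rps, median_ms = asyncio.run(
        measure(total_requests=200, concurrency=20, delay_s=0.01)
    )
    print(f"{rps:.0f} RPS, median latency {median_ms:.1f} ms")
```

Running the same harness with the flag off and then on gives a like-for-like comparison, as long as the model, payload, and concurrency stay fixed between runs.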

You can opt into this by enabling the flag below. (We expect to make this the default in 1 week.)

Flag to enable

On LiteLLM Proxy

Set USE_AIOHTTP_TRANSPORT=True in your environment variables.

Environment Variable
export USE_AIOHTTP_TRANSPORT="True"

On LiteLLM Python SDK

Set litellm.use_aiohttp_transport = True to enable the aiohttp transport.

Python SDK
import litellm

litellm.use_aiohttp_transport = True # default is False, enable this to use aiohttp transport
result = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(result)
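If the same script needs to run with the transport both on and off, one option is to mirror the proxy's environment variable into the SDK attribute. The sketch below shows a small helper for that. `env_flag` is a hypothetical name, and the set of accepted truthy spellings is an assumption made here, not LiteLLM's own parsing rule.

```python
import os


def env_flag(name: str, default: bool = False, environ=None) -> bool:
    """Interpret an environment variable as a boolean (hypothetical helper).

    The accepted spellings below are an assumption for this sketch.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    enabled = env_flag("USE_AIOHTTP_TRANSPORT")
    print(f"aiohttp transport requested: {enabled}")
    # With the SDK installed, the result could then be applied as:
    #   import litellm
    #   litellm.use_aiohttp_transport = enabled
```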

File Permissions

This release brings support for File Permissions and Finetuning APIs to LiteLLM Managed Files. This is great for:

Proxy Admins: users can only view, edit, or delete files they’ve created, even when using shared OpenAI/Azure/Vertex deployments.
Developers: get a standard interface to use Files across Chat/Finetuning/Batch APIs.
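The ownership rule described above can be pictured with a toy in-memory store. This is only an illustration of the access model, not LiteLLM's implementation: `ManagedFileStore` and its methods are hypothetical names, and a real deployment would back this with the proxy database and the provider's Files API.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedFileStore:
    """Toy store mapping file_id -> (owner_user_id, content). Hypothetical."""

    _files: dict = field(default_factory=dict)

    def create(self, user_id: str, file_id: str, content: str) -> None:
        # The creator is recorded as the owner of the file.
        self._files[file_id] = (user_id, content)

    def _check_owner(self, user_id: str, file_id: str) -> str:
        owner, content = self._files[file_id]  # KeyError if the file is unknown
        if owner != user_id:
            raise PermissionError(f"{user_id} does not own {file_id}")
        return content

    def retrieve(self, user_id: str, file_id: str) -> str:
        return self._check_owner(user_id, file_id)

    def list_files(self, user_id: str) -> list:
        # Users only see files they created, even on a shared deployment.
        return [fid for fid, (owner, _) in self._files.items() if owner == user_id]

    def delete(self, user_id: str, file_id: str) -> None:
        self._check_owner(user_id, file_id)
        del self._files[file_id]


if __name__ == "__main__":
    store = ManagedFileStore()
    store.create("alice", "file-1", "training data")
    store.create("bob", "file-2", "batch input")
    print(store.list_files("alice"))
    try:
        store.retrieve("bob", "file-1")
    except PermissionError as err:
        print(f"denied: {err}")
```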

New Models / Updated Models

Gemini VertexAI, Google AI Studio
- New gemini models - PR 1, PR 2
  - gemini-2.5-flash-preview-tts
  - gemini-2.0-flash-preview-image-generation
  - gemini/gemini-2.5-flash-preview-05-20
  - gemini-2.5-flash-preview-05-20
Anthropic
- Claude-4 model family support - PR
Bedrock
- Claude-4 model family support - PR
- Support for reasoning_effort and thinking parameters for Claude-4 - PR
VertexAI
- Claude-4 model family support - PR
- Global endpoints support - PR
- authorized_user credentials type support - PR
xAI
- xai/grok-3 pricing information - PR
LM Studio
- Structured JSON schema outputs support - PR
SambaNova
- Updated models and parameters - PR
Databricks
- Llama 4 Maverick model cost - PR
- Claude 3.7 Sonnet output token cost correction - PR
Azure
- Mistral Medium 25.05 support - PR
- Certificate-based authentication support - PR
Mistral
- devstral-small-2505 model pricing and context window - PR
Ollama
- Wildcard model support - PR
CustomLLM
- Embeddings support added - PR
Featherless AI
- Access to 4200+ models - PR

LLM API Endpoints

Image Edits
- /v1/images/edits - Support for /images/edits endpoint - PR PR
- Content policy violation error mapping - PR
Responses API
- MCP support for Responses API - PR
Files API
- LiteLLM Managed Files support for finetuning - PR PR
- Validation for file operations (retrieve/list/delete) - PR
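For the MCP support on the Responses API listed above, a request body might look roughly like the sketch below. It assumes the tool entry follows OpenAI's published MCP tool shape (`type`, `server_label`, `server_url`, `require_approval`); the helper name `build_responses_request`, the server label, and the server URL are placeholders introduced here, and the LiteLLM docs remain the authority on what the proxy accepts.

```python
import json


def build_responses_request(model: str, prompt: str,
                            server_label: str, server_url: str) -> dict:
    """Build a Responses API request body with one MCP tool (hypothetical helper).

    The tool fields follow OpenAI's MCP tool shape as an assumption.
    """
    return {
        "model": model,
        "input": prompt,
        "tools": [
            {
                "type": "mcp",
                "server_label": server_label,
                "server_url": server_url,
                "require_approval": "never",
            }
        ],
    }


if __name__ == "__main__":
    body = build_responses_request(
        model="openai/gpt-4o",
        prompt="List the tools you can call.",
        server_label="example",                    # placeholder label
        server_url="https://mcp.example.com/mcp",  # placeholder URL
    )
    # This JSON would be sent as the body of a POST to the Responses endpoint.
    print(json.dumps(body, indent=2))
```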

Management Endpoints / UI

Teams
- Key and member count display - PR
- Spend rounded to 4 decimal points - PR
- Organization and team create buttons repositioned - PR
Keys
- Key reassignment and 'updated at' column - PR
- Show model access groups during creation - PR
Logs
- Model filter on logs - PR
- Passthrough endpoint error logs support - PR
Guardrails
- Config.yaml guardrails display - PR
Organizations/Users
- Spend rounded to 4 decimal points - PR
- Show clear error when adding a user to a team - PR
Audit Logs
- /list and /info endpoints for Audit Logs - PR

Logging / Alerting Integrations

Prometheus
- Track route on proxy_* metrics - PR
Langfuse
- Support for prompt_label parameter - PR
- Consistent modelParams logging - PR
DeepEval/ConfidentAI
- Logging enabled for proxy and SDK - PR
Logfire
- Fix otel proxy server initialization when using Logfire - PR

Authentication & Security

JWT Authentication
- Support for applying default internal user parameters when upserting a user via JWT authentication - PR
- Map a user to a team when upserting a user via JWT authentication - PR
Custom Auth
- Support for switching between custom auth and API key auth - PR

Performance / Reliability Improvements

aiohttp Transport
- 97% lower median latency (feature flagged) - PR PR
Background Health Checks
- Improved reliability - PR
Response Handling
- Better streaming status code detection - PR
- Response ID propagation improvements - PR
Thread Management
- Removed error-creating threads for reliability - PR

General Proxy Improvements

Proxy CLI
- Skip server startup flag - PR
- Avoid DATABASE_URL override when provided - PR
Model Management
- Clear cache and reload after model updates - PR
- Computer use support tracking - PR
Helm Chart
- LoadBalancer class support - PR

Bug Fixes

This release includes numerous bug fixes to improve stability and reliability:

LLM Provider Fixes
- VertexAI:
  - Fixed quota_project_id parameter issue - PR
  - Fixed credential refresh exceptions - PR
- Cohere: Fixes for adding Cohere models through LiteLLM UI - PR
- Anthropic:
  - Fixed streaming dict object handling for /v1/messages - PR
- OpenRouter:
  - Fixed stream usage ID issues - PR
Authentication & Users
- Fixed invitation email link generation - PR
- Fixed JWT authentication default role - PR
- Fixed user budget reset functionality - PR
- Fixed SSO user compatibility and email validation - PR
Database & Infrastructure
- Fixed DB connection parameter handling - PR
- Fixed email invitation link - PR
UI & Display
- Fixed MCP tool rendering when no arguments required - PR
- Fixed team model alias deletion - PR
- Fixed team viewer permissions - PR
Model & Routing
- Fixed team model mapping in route requests - PR
- Fixed standard optional parameter passing - PR

New Contributors

@DarinVerheijke made their first contribution in PR #10596
@estsauver made their first contribution in PR #10929
@mohittalele made their first contribution in PR #10665
@pselden made their first contribution in PR #10899
@unrealandychan made their first contribution in PR #10842
@dastaiger made their first contribution in PR #10946
@slytechnical made their first contribution in PR #10881
@daarko10 made their first contribution in PR #11006
@sorenmat made their first contribution in PR #10658
@matthid made their first contribution in PR #10982
@jgowdy-godaddy made their first contribution in PR #11032
@bepotp made their first contribution in PR #11008
@jmorenoc-o made their first contribution in PR #11031
@martin-liu made their first contribution in PR #11076
@gunjan-solanki made their first contribution in PR #11064
@tokoko made their first contribution in PR #10980
@spike-spiegel-21 made their first contribution in PR #10649
@kreatoo made their first contribution in PR #10927
@baejooc made their first contribution in PR #10887
@keykbd made their first contribution in PR #11114
@dalssoft made their first contribution in PR #11088
@jtong99 made their first contribution in PR #10853

Demo Instance

Here's a Demo Instance to test changes:

Instance: https://demo.litellm.ai/
Login Credentials:
- Username: admin
- Password: sk-1234
