[Pre-Release] v1.74.0

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.74.0.rc

Key Highlights

Azure Content Safety Guardrails

MCP Gateway: Segregate MCP tools

MCP Server Segregation is now supported on LiteLLM. You can set the x-mcp-servers header to control which servers tools are listed from. This is useful when you want to request tools from only a subset of configured servers, enabling curated toolsets and cleaner control.

Usage

cURL Example with Server Segregation
curl --location 'https://api.openai.com/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--data '{
    "model": "gpt-4o",
    "tools": [
        {
            "type": "mcp",
            "server_label": "litellm",
            "server_url": "<your-litellm-proxy-base-url>/mcp",
            "require_approval": "never",
            "headers": {
                "x-litellm-api-key": "Bearer YOUR_LITELLM_API_KEY",
                "x-mcp-servers": "Zapier_Gmail"
            }
        }
    ],
    "input": "Run available tools",
    "tool_choice": "required"
}'

In this example, the request will only have access to tools from the "Zapier_Gmail" MCP server.

Team / Key Based Logging on UI


This release brings support for Proxy Admins to configure Team/Key Based Logging Settings on the UI. This allows routing LLM request/response logs to different Langfuse/Arize projects based on the team or key.

For developers using LiteLLM, their logs are automatically routed to their specific Arize/Langfuse projects. In this release, we support the following integrations for key/team based logging (a sketch of configuring this via the API follows the list):

  • langfuse
  • arize
  • langsmith
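
Key-based logging can also be configured via the management API when generating a key. The snippet below is a rough sketch rather than an exact recipe: it assumes the metadata-based logging format described in the LiteLLM key-based logging docs, and the Langfuse credentials and host shown are placeholders.

cURL Example: Key-Based Logging to Langfuse
curl --location '<your-litellm-proxy-base-url>/key/generate' \
--header 'Authorization: Bearer <your-admin-key>' \
--header 'Content-Type: application/json' \
--data '{
    "metadata": {
        "logging": [
            {
                "callback_name": "langfuse",
                "callback_type": "success",
                "callback_vars": {
                    "langfuse_public_key": "pk-...",
                    "langfuse_secret_key": "sk-...",
                    "langfuse_host": "https://cloud.langfuse.com"
                }
            }
        ]
    }
}'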

Python SDK: 2.3 Seconds Faster Import Times

This release brings significant performance improvements to the Python SDK, with import times 2.3 seconds faster. We've refactored the initialization process to reduce startup overhead, making LiteLLM more efficient for applications that need quick initialization.
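
To verify the improvement in your own environment, you can time the import directly (results vary by machine and installed extras):

python3 -c "import time; start = time.perf_counter(); import litellm; print(f'import litellm: {time.perf_counter() - start:.2f}s')"

Python's built-in -X importtime flag (python3 -X importtime -c "import litellm") prints a per-module breakdown if you want to dig deeper.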


New Models / Updated Models

Pricing / Context Window Updates

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Type |
|----------|-------|----------------|---------------------|----------------------|------|
| Watsonx | watsonx/mistralai/mistral-large | 131k | $3.00 | $10.00 | New |
| Azure AI | azure_ai/cohere-rerank-v3.5 | 4k | $2.00/1k queries | - | New (Rerank) |
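
For example, assuming the new Watsonx model has been added to your proxy's model list under the same name, it can be called through the proxy's OpenAI-compatible endpoint:

curl --location '<your-litellm-proxy-base-url>/chat/completions' \
--header 'Authorization: Bearer YOUR_LITELLM_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "watsonx/mistralai/mistral-large",
    "messages": [{"role": "user", "content": "Hello"}]
}'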

Features

Bugs

  • Mistral
    • Fix transform_response handling for empty string content - PR
    • Turn Mistral to use llm_http_handler - PR
  • Gemini
    • Fix tool call sequence - PR
    • Fix custom api_base path preservation - PR
  • Anthropic
    • Fix user_id validation logic - PR
  • Bedrock
    • Support optional args for bedrock - PR
  • Ollama
    • Fix default parameters for ollama-chat - PR
  • VLLM
    • Add 'audio_url' message type support - PR

LLM API Endpoints

Features

  • /batches
    • Support batch retrieve with target model Query Param - PR
    • Anthropic completion bridge improvements - PR
  • /responses
    • Azure responses api bridge improvements - PR
    • Fix responses api error handling - PR
  • /mcp (MCP Gateway)
    • Add MCP url masking on frontend - PR
    • Add MCP servers header to scope - PR
    • Litellm mcp tool prefix - PR
    • Segregate MCP tools on connections using headers - PR
    • Added changes to mcp url wrapping - PR

Bugs

  • /v1/messages
    • Remove hardcoded model name on streaming - PR
    • Support lowest latency routing - PR
    • Non-anthropic models token usage returned - PR
  • /chat/completions
    • Support Cursor IDE tool_choice format {"type": "auto"} - PR (see the example after this list)
  • /generateContent
    • Allow passing litellm_params - PR
    • Only pass supported params when using OpenAI models - PR
    • Fix using gemini-cli with Vertex Anthropic Models - PR
  • Streaming
    • Fix Error code: 307 for LlamaAPI Streaming Chat - PR
    • Store finish reason even if is_finished - PR
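
As a sketch of the Cursor IDE tool_choice fix referenced above, /chat/completions now accepts the object form of tool_choice. The get_weather tool below is a hypothetical placeholder:

curl --location '<your-litellm-proxy-base-url>/chat/completions' \
--header 'Authorization: Bearer YOUR_LITELLM_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}}
            }
        }
    }],
    "tool_choice": {"type": "auto"}
}'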

Spend Tracking / Budget Improvements

Bugs

  • Fix allow strings in calculate cost - PR
  • VertexAI Anthropic streaming cost tracking with prompt caching fixes - PR

Management Endpoints / UI

Bugs

  • Team Management
    • Prevent team model reset on model add - PR
    • Return team-only models on /v2/model/info - PR
    • Render team member budget correctly - PR
  • UI Rendering
    • Fix rendering ui on non-root images - PR
    • Correctly display 'Internal Viewer' user role - PR
  • Configuration
    • Handle empty config.yaml - PR
    • Fix gemini /models - replace models/ as expected - PR

Features

  • Team Management
    • Allow adding team specific logging callbacks - PR
    • Add Arize Team Based Logging - PR
    • Allow Viewing/Editing Team Based Callbacks - PR
  • UI Improvements
    • Comma separated spend and budget display - PR
    • Add logos to callback list - PR
  • CLI
    • Add litellm-proxy cli login for starting to use litellm proxy - PR
  • Email Templates
    • Customizable Email template - Subject and Signature - PR

Logging / Guardrail Integrations

Features

Bugs

  • Security
    • Ensure only LLM API route failures get logged on Langfuse - PR
  • OpenMeter
    • Integration error handling fix - PR
  • Message Redaction
    • Ensure message redaction works for responses API logging - PR
  • Bedrock Guardrails
    • Fix bedrock guardrails post_call for streaming responses - PR

Performance / Loadbalancing / Reliability improvements

Features

  • Python SDK
    • 2 second faster import times - PR
    • Reduce python sdk import time by .3s - PR
  • Error Handling
    • Add error handling for MCP tools not found or invalid server - PR
  • SSL/TLS
    • Fix SSL certificate error - PR
    • Fix custom ca bundle support in aiohttp transport - PR

General Proxy Improvements

  • Startup
    • Add new banner on startup - PR
  • Dependencies
    • Update pydantic version - PR

New Contributors

Git Diff