# [Pre Release] v1.74.7
## Deploy this version

Docker:

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.74.7
```

Pip:

```shell
pip install litellm==1.74.7
```
## Key Highlights

- Vector Stores - Support for VertexAI RAG Engine, PG Vector, OpenAI & Azure OpenAI Vector Stores.
- Health Check Improvements - Separate health check app on a dedicated port for better Kubernetes liveness probes.
- New LLM Providers - Added Moonshot API (`moonshot`) and `v0` provider support.
### Vector Stores API
This release introduces support for using VertexAI RAG Engine, PG Vector, Bedrock Knowledge Bases, and OpenAI Vector Stores with LiteLLM.
This is ideal for use cases requiring external knowledge sources with LLMs.
This brings the following benefits for LiteLLM users:
Proxy Admin Benefits:
- Fine-grained access control: determine which Keys and Teams can access specific Vector Stores
- Complete usage tracking and monitoring across all vector store operations
Developer Benefits:
- Simple, unified interface for querying vector stores and using them with LLM API requests
- Consistent API experience across all supported vector store providers
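As a sketch of how a proxy admin might register one of these stores in the LiteLLM config (the registry key names and the `pg_vector` provider string below are illustrative; check the Vector Stores docs for the exact schema):

```yaml
vector_store_registry:
  - vector_store_name: "support-docs"         # name keys/teams are granted access to
    litellm_params:
      vector_store_id: "my-pg-collection"     # ID of the store in the backing provider
      custom_llm_provider: "pg_vector"        # or vertex_ai, bedrock, openai, azure
```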
## New Models / Updated Models

### Pricing / Context Window Updates
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|---|
Azure AI | azure_ai/grok-3 | 131k | $3.30 | $16.50 |
Azure AI | azure_ai/global/grok-3 | 131k | $3.00 | $15.00 |
Azure AI | azure_ai/global/grok-3-mini | 131k | $0.25 | $1.27 |
Azure AI | azure_ai/grok-3-mini | 131k | $0.275 | $1.38 |
Azure AI | azure_ai/jais-30b-chat | 8k | $3200 | $9710 |
Groq | groq/moonshotai-kimi-k2-instruct | 131k | $1.00 | $3.00 |
AI21 | jamba-large-1.7 | 256k | $2.00 | $8.00 |
AI21 | jamba-mini-1.7 | 256k | $0.20 | $0.40 |
Together.ai | together_ai/moonshotai/Kimi-K2-Instruct | 131k | $1.00 | $3.00 |
v0 | v0/v0-1.0-md | 128k | $3.00 | $15.00 |
v0 | v0/v0-1.5-md | 128k | $3.00 | $15.00 |
v0 | v0/v0-1.5-lg | 512k | $15.00 | $75.00 |
Moonshot | moonshot/moonshot-v1-8k | 8k | $0.20 | $2.00 |
Moonshot | moonshot/moonshot-v1-32k | 32k | $1.00 | $3.00 |
Moonshot | moonshot/moonshot-v1-128k | 131k | $2.00 | $5.00 |
Moonshot | moonshot/moonshot-v1-auto | 131k | $2.00 | $5.00 |
Moonshot | moonshot/kimi-k2-0711-preview | 131k | $0.60 | $2.50 |
Moonshot | moonshot/moonshot-v1-32k-0430 | 32k | $1.00 | $3.00 |
Moonshot | moonshot/moonshot-v1-128k-0430 | 131k | $2.00 | $5.00 |
Moonshot | moonshot/moonshot-v1-8k-0430 | 8k | $0.20 | $2.00 |
Moonshot | moonshot/kimi-latest | 131k | $2.00 | $5.00 |
Moonshot | moonshot/kimi-latest-8k | 8k | $0.20 | $2.00 |
Moonshot | moonshot/kimi-latest-32k | 32k | $1.00 | $3.00 |
Moonshot | moonshot/kimi-latest-128k | 131k | $2.00 | $5.00 |
Moonshot | moonshot/kimi-thinking-preview | 131k | $30.00 | $30.00 |
Moonshot | moonshot/moonshot-v1-8k-vision-preview | 8k | $0.20 | $2.00 |
Moonshot | moonshot/moonshot-v1-32k-vision-preview | 32k | $1.00 | $3.00 |
Moonshot | moonshot/moonshot-v1-128k-vision-preview | 131k | $2.00 | $5.00 |
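All prices above are dollars per 1M tokens, so the cost of a single request is a simple weighted sum; a small illustrative helper (not part of LiteLLM):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request, given per-1M-token rates from the table above."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# e.g. moonshot/kimi-k2-0711-preview at $0.60 input / $2.50 output per 1M tokens:
print(request_cost(12_000, 800, 0.60, 2.50))  # 0.0092
```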
### Features
- 🆕 Moonshot API (Kimi)
  - New LLM API integration for accessing Kimi models - PR #12592, Get Started
- 🆕 v0 Provider
  - New provider integration for v0.dev - PR #12751, Get Started
- OpenAI
  - Use OpenAI DeepResearch models with `litellm.completion` (`/chat/completions`) - PR #12627
  - Add `input_fidelity` parameter for OpenAI image generation - PR #12662, Get Started
- Azure OpenAI
- Anthropic
  - Tool cache control support - PR #12668
- Bedrock
  - Claude 4 `/invoke` route support - PR #12599, Get Started
  - Application inference profile tool choice support - PR #12599
- Gemini
- VertexAI
  - Added Vertex AI RAG Engine support (use with the OpenAI-compatible `/vector_stores` API) - PR #12752, Get Started
- vLLM
  - Added support for using rerank endpoints with vLLM - PR #12738, Get Started
- AI21
  - Added `ai21/jamba-1.7` model family pricing - PR #12593, Get Started
- Together.ai
  - [New Model] Add `together_ai/moonshotai/Kimi-K2-Instruct` - PR #12645, Get Started
- Groq
  - Add `groq/moonshotai-kimi-k2-instruct` model configuration - PR #12648, Get Started
- GitHub Copilot
  - Change system prompts to assistant prompts for GitHub Copilot - PR #12742, Get Started
### Bugs
- Anthropic
  - Fix streaming + response_format + tools bug - PR #12463
- XAI
  - grok-4 does not support the `stop` param - PR #12646
- AWS
  - Role chaining with web authentication for AWS Bedrock - PR #12607
- VertexAI
  - Add project_id to cached credentials - PR #12661
- Bedrock
  - Fix Bedrock Nova Micro and Nova Lite context window info - PR #12619
## LLM API Endpoints

### Features
- `/chat/completions`
  - Include tool calls in the output of `trim_messages` - PR #11517
- `/v1/vector_stores`
  - New OpenAI-compatible vector store endpoints - PR #12699, Get Started
  - Vector store search endpoint - PR #12749, Get Started
  - Support for using PG Vector as a vector store - PR #12667, Get Started
- `/streamGenerateContent`
  - Non-Gemini model support - PR #12647
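For the new search endpoint, the request shape should mirror OpenAI's `POST /v1/vector_stores/{id}/search`; a sketch of building such a request against a local proxy (the store ID, query, and `max_num_results` value are made up, and no request is actually sent here):

```python
import json

vector_store_id = "vs_123"  # hypothetical store ID
url = f"http://localhost:4000/v1/vector_stores/{vector_store_id}/search"
payload = {"query": "What is our refund policy?", "max_num_results": 3}

# Send with any HTTP client, plus an "Authorization: Bearer <key>" header.
print(url)
print(json.dumps(payload))
```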
### Bugs
- `/vector_stores`
  - Knowledge Base call returning an error when passed as `tools` - PR #12628
## MCP Gateway

### Features
- Access Groups
- Namespacing
- Gateway Features
  - Allow using MCPs with all LLM APIs (VertexAI, Gemini, Groq, etc.) when using `/responses` - PR #12546
### Bugs
- Update object permissions on key/team update and delete - PR #12701
- Include `/mcp` in the list of available routes on the proxy - PR #12612
## Management Endpoints / UI

### Features
- Keys
  - Regenerate Key state management improvements - PR #12729
- Models
- Usage Page
  - Fix Y-axis label overlap on the Spend per Tag chart - PR #12754
- Teams
- Users
  - New `/user/bulk_update` endpoint - PR #12720
- Logs Page
  - Add `end_user` filter on the UI Logs Page - PR #12663
- MCP Servers
  - Copy MCP Server name functionality - PR #12760
- Vector Stores
- General
  - Add copy-on-click for all IDs (Key, Team, Organization, MCP Server) - PR #12615
- SCIM
  - Add `GET /ServiceProviderConfig` endpoint - PR #12664
### Bugs
- Teams
## Logging / Guardrail Integrations

### Features
- Google Cloud Model Armor
  - New guardrails integration - PR #12492
- Bedrock Guardrails
  - Allow disabling the exception on the 'BLOCKED' action - PR #12693
- Guardrails AI
  - Support `llmOutput`-based guardrails as pre-call hooks - PR #12674
- DataDog LLM Observability
  - Track the correct span type based on the LLM endpoint used - PR #12652
- Custom Logging
  - Allow reading custom logger Python scripts from an S3 or GCS bucket - PR #12623
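Guardrails like the new Model Armor integration are typically wired up in the proxy config; a sketch using LiteLLM's generic guardrails schema (the `guardrail` value and parameter names here are illustrative; see the guardrails docs for the exact keys):

```yaml
guardrails:
  - guardrail_name: "google-model-armor"
    litellm_params:
      guardrail: model_armor    # illustrative provider key
      mode: pre_call            # run before the LLM call
```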
### Bugs
- General Logging
  - Track the custom LLM provider in StandardLoggingPayload on cache hits - PR #12652
- S3 Buckets
  - Fix S3 v2 log uploader crash when used with guardrails - PR #12733
## Performance / Loadbalancing / Reliability Improvements

### Features
- Health Checks
- Caching
  - Add Azure Blob cache support - PR #12587
- Router
  - Handle ZeroDivisionError with zero completion tokens in the `lowest_latency` strategy - PR #12734
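Conceptually, the `lowest_latency` fix guards the latency-per-token division when a deployment reports zero completion tokens (a sketch, not the actual router code):

```python
def latency_per_token(latency_s: float, completion_tokens: int) -> float:
    # Guard against ZeroDivisionError when no completion tokens were reported.
    if completion_tokens <= 0:
        return 0.0
    return latency_s / completion_tokens

print(latency_per_token(2.0, 0))    # 0.0
print(latency_per_token(2.0, 100))  # 0.02
```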
### Bugs
- Database
- Cache
  - Fix Redis caching for embedding response models - PR #12750
## Helm Chart
- DB Migration Hook: refactor to support `use_prisma_migrate` for the Helm hook - PR
- Add `envVars` and `extraEnvVars` support to the Helm migrations job - PR #12591
## General Proxy Improvements

### Features
- Control Plane + Data Plane Architecture
  - Control Plane + Data Plane support - PR #12601
- Proxy CLI
  - Add `keys import` command to the CLI - PR #12620
- Swagger Documentation
  - Add Swagger docs for LiteLLM `/chat/completions`, `/embeddings`, `/responses` - PR #12618
- Dependencies
  - Loosen the `rich` version pin from `==13.7.1` to `>=13.7.1` - PR #12704
### Bugs
- Fix verbose logging being enabled by default - PR #12596
- Add support for disabling callbacks in the request body - PR #12762
- Handle circular references in spend tracking metadata JSON serialization - PR #12643
## New Contributors
- @AntonioKL made their first contribution in https://github.com/BerriAI/litellm/pull/12591
- @marcelodiaz558 made their first contribution in https://github.com/BerriAI/litellm/pull/12541
- @dmcaulay made their first contribution in https://github.com/BerriAI/litellm/pull/12463
- @demoray made their first contribution in https://github.com/BerriAI/litellm/pull/12587
- @staeiou made their first contribution in https://github.com/BerriAI/litellm/pull/12631
- @stefanc-ai2 made their first contribution in https://github.com/BerriAI/litellm/pull/12622
- @RichardoC made their first contribution in https://github.com/BerriAI/litellm/pull/12607
- @yeahyung made their first contribution in https://github.com/BerriAI/litellm/pull/11795
- @mnguyen96 made their first contribution in https://github.com/BerriAI/litellm/pull/12619
- @rgambee made their first contribution in https://github.com/BerriAI/litellm/pull/11517
- @jvanmelckebeke made their first contribution in https://github.com/BerriAI/litellm/pull/12725
- @jlaurendi made their first contribution in https://github.com/BerriAI/litellm/pull/12704
- @doublerr made their first contribution in https://github.com/BerriAI/litellm/pull/12661