Deploy this version

- Docker

      docker run \
        -e STORE_MODEL_IN_DB=True \
        -p 4000:4000 \
        ghcr.io/berriai/litellm:main-v1.66.0-stable

- Pip

      pip install litellm==1.66.0.post1
v1.66.0-stable is live now. Here are the key highlights of this release.

Key Highlights
- Realtime API Cost Tracking: Track cost of realtime API calls
- Microsoft SSO Auto-sync: Auto-sync groups and group members from Azure Entra ID to LiteLLM
- xAI grok-3: Added support for `xai/grok-3` models
- Security Fixes: Fixed CVE-2025-0330 and CVE-2024-6825 vulnerabilities
Let's dive in.
Realtime API Cost Tracking
This release adds Realtime API logging + cost tracking.
- Logging: LiteLLM now logs the complete response from realtime calls to all logging integrations (DB, S3, Langfuse, etc.)
- Cost Tracking: You can now set `base_model` and custom pricing for realtime models. Custom Pricing
- Budgets: Your key/user/team budgets now work for realtime models as well.
Start here
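As a rough sketch of how per-token custom pricing turns realtime usage into tracked spend (the helper and the rates below are illustrative, not LiteLLM's actual API or prices):

```python
# Illustrative sketch of per-token cost tracking for a realtime call.
# The function name and the rates are hypothetical, not LiteLLM internals.
def realtime_call_cost(prompt_tokens: int, completion_tokens: int,
                       input_cost_per_token: float,
                       output_cost_per_token: float) -> float:
    """Spend = input tokens * input rate + output tokens * output rate."""
    return (prompt_tokens * input_cost_per_token
            + completion_tokens * output_cost_per_token)

# Example: 1,000 input and 500 output tokens at assumed per-token rates.
cost = realtime_call_cost(1000, 500, 5e-06, 2e-05)
```

Budget enforcement then compares this accumulated spend against the key/user/team budget.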
Microsoft SSO Auto-sync
Auto-sync groups and members from Azure Entra ID to LiteLLM
This release adds support for auto-syncing groups and members from Microsoft Entra ID to LiteLLM. This means LiteLLM proxy administrators can spend less time managing teams and members, as LiteLLM handles the following:
- Auto-create teams that exist on Microsoft Entra ID
- Sync team members on Microsoft Entra ID with LiteLLM teams
Get started with this here
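Conceptually, the auto-sync is a one-way reconciliation from Entra ID groups to LiteLLM teams. A minimal sketch of that reconciliation logic, using plain Python dicts (none of these names are LiteLLM internals):

```python
# Hedged sketch of one-way group -> team reconciliation; not LiteLLM's code.
def sync_teams(entra_groups: dict, litellm_teams: dict) -> dict:
    """entra_groups: {group_name: set of member ids} from the identity provider.
    litellm_teams: {team_name: set of member ids}, mutated in place."""
    for group, members in entra_groups.items():
        if group not in litellm_teams:
            # Auto-create teams that exist on Microsoft Entra ID
            litellm_teams[group] = set()
        # Sync team membership to match the identity provider
        litellm_teams[group] = set(members)
    return litellm_teams

teams = sync_teams({"eng": {"alice", "bob"}},
                   {"eng": {"alice"}, "ops": {"carol"}})
```

Teams not present in Entra ID (here, "ops") are left untouched in this sketch.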
New Models / Updated Models
xAI
- Added reasoning_effort support for `xai/grok-3-mini-beta` Get Started
- Added cost tracking for `xai/grok-3` models PR
Hugging Face
- Added inference providers support Get Started
Azure
- Added `azure/gpt-4o-realtime-audio` cost tracking PR
VertexAI
- Added enterpriseWebSearch tool support Get Started
- Moved to only passing keys accepted by the Vertex AI response schema PR
Google AI Studio
Azure
Databricks
General
- Added `litellm.supports_reasoning()` util to track if an LLM supports reasoning Get Started
- Function Calling - Handle Pydantic base models in message tool calls, handle `tools=[]`, and support fake streaming on tool calls for `meta.llama3-3-70b-instruct-v1:0` PR
- LiteLLM Proxy - Allow passing the `thinking` param to the LiteLLM proxy via the client SDK PR
- Fixed translation of the `thinking` param for litellm PR
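The `supports_reasoning()` util mentioned above is essentially a capability lookup. A hedged sketch of how such a lookup could be modeled, using a hand-rolled capability map (the entries and structure are illustrative, not LiteLLM's actual model data):

```python
# Hypothetical capability map; the model entries are illustrative only.
MODEL_CAPABILITIES = {
    "xai/grok-3-mini-beta": {"supports_reasoning": True},
    "gpt-3.5-turbo": {"supports_reasoning": False},
}

def supports_reasoning(model: str) -> bool:
    """Return True if the model is flagged as reasoning-capable.
    Unknown models default to False."""
    return MODEL_CAPABILITIES.get(model, {}).get("supports_reasoning", False)
```

A caller can use this to decide whether to attach params like `reasoning_effort` or `thinking` before dispatching a request.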
Spend Tracking Improvements
- OpenAI, Azure
- Realtime API Cost tracking with token usage metrics in spend logs Get Started
- Anthropic
- General
Management Endpoints / UI
Test Key Tab
Added rendering of reasoning content, TTFT, and usage metrics on the Test Key page PR
View input, output, and reasoning tokens, plus TTFT metrics.
Tag / Policy Management
Added Tag/Policy Management. Create routing rules based on request metadata. This allows you to enforce that requests with `tags="private"` only go to specific models. Get Started
Create and manage tags.
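Tag-based routing boils down to matching request tags against each deployment's tags. A minimal sketch under that assumption (generic Python, not LiteLLM's router implementation):

```python
# Hedged sketch of tag-based deployment filtering; not LiteLLM's router.
def route_by_tags(request_tags, deployments):
    """deployments: list of {"model": str, "tags": [str, ...]} dicts.
    Keep only deployments sharing at least one tag with the request."""
    wanted = set(request_tags)
    return [d for d in deployments if wanted & set(d.get("tags", []))]

deployments = [
    {"model": "private-gpt", "tags": ["private"]},
    {"model": "public-gpt", "tags": ["public"]},
]
# A request tagged "private" only reaches the private deployment.
eligible = route_by_tags(["private"], deployments)
```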
Redesigned Login Screen
- Polished login screen PR
Microsoft SSO Auto-Sync
- Added debug route to allow admins to debug SSO JWT fields PR
- Added ability to use MSFT Graph API to assign users to teams PR
- Connected litellm to Azure Entra ID Enterprise Application PR
- Added ability for admins to set `default_team_params` for when litellm SSO creates default teams PR
- Fixed MSFT SSO to use correct field for user email PR
- Added UI support for setting Default Team setting when litellm SSO auto creates teams PR
UI Bug Fixes
Logging / Guardrail Improvements
- Prometheus
- Emit Key and Team Budget metrics on a cron job schedule Get Started
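The cron-scheduled budget metrics amount to periodically computing remaining budget per key/team and exporting each value as a gauge. A hedged sketch of the computation step (plain Python with an assumed data shape; the Prometheus export itself is omitted):

```python
# Illustrative sketch of the per-entity budget computation a scheduled
# metrics job might run; the data shape is assumed, not LiteLLM's schema.
def remaining_budgets(entities):
    """entities: {name: {"max_budget": float, "spend": float}}.
    Returns gauge-ready remaining-budget values, floored at zero."""
    return {
        name: max(info["max_budget"] - info["spend"], 0.0)
        for name, info in entities.items()
    }

metrics = remaining_budgets({"team-a": {"max_budget": 100.0, "spend": 42.5}})
```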
Security Fixes
- Fixed CVE-2025-0330 - Leakage of Langfuse API keys in team exception handling PR
- Fixed CVE-2024-6825 - Remote code execution in post call rules PR
Helm
Demo
Try this on the demo instance today
Complete Git Diff
See the complete git diff since v1.65.4-stable here.