v1.80.5-stable - Gemini 3.0 Support

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version​

```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.80.5-stable
```

Key Highlights​


Prompt Management​



This release introduces LiteLLM Prompt Studio - a comprehensive prompt management solution built directly into the LiteLLM UI. Create, test, and version your prompts without leaving your browser.

You can now do the following on LiteLLM Prompt Studio:

  • Create & Test Prompts: Build prompts with developer messages (system instructions) and test them in real-time with an interactive chat interface
  • Dynamic Variables: Use {{variable_name}} syntax to create reusable prompt templates with automatic variable detection
  • Version Control: Automatic versioning for every prompt update with complete version history tracking and rollback capabilities
  • Prompt Studio: Edit prompts in a dedicated studio environment with live testing and preview

API Integration:

Use your prompts in any application with simple API calls:

```python
from openai import OpenAI

# Point the client at your LiteLLM proxy (URL and key are placeholders)
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi!"}],  # the proxy applies the stored prompt
    extra_body={
        "prompt_id": "your-prompt-id",
        "prompt_version": 2,                    # Optional: pin a specific version
        "prompt_variables": {"name": "value"},  # Optional: fill {{variable}} placeholders
    },
)
```
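
To make the variable flow concrete, here is a minimal sketch reusing the client from the snippet above; the prompt ID and template are hypothetical, shown only to illustrate how {{variable}} placeholders get filled:

```python
# Hypothetical template saved in Prompt Studio under ID "support-greeting":
#   "You are a support agent for {{company}}. Greet {{customer_name}} politely."
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hi, I need help with my order."}],
    extra_body={
        "prompt_id": "support-greeting",  # hypothetical prompt ID
        "prompt_variables": {"company": "Acme", "customer_name": "Alice"},
    },
)
# The proxy renders the template before the request reaches the model:
#   "You are a support agent for Acme. Greet Alice politely."
```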

Get started here: LiteLLM Prompt Management Documentation


Performance – /realtime 182× Lower p99 Latency​

This update reduces /realtime latency by removing redundant encoding steps on the hot path, reusing shared SSL contexts, and caching rarely-changing format strings that were previously regenerated twice per request.
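
The SSL-context change follows a standard caching pattern. As a minimal sketch (not the exact LiteLLM internals), the idea is to build each context once and reuse it across requests:

```python
import ssl
from functools import lru_cache

@lru_cache(maxsize=None)
def get_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build an SSLContext once per configuration and reuse it.

    Creating a context loads CA certificates from disk, which is expensive;
    doing it on every request burns CPU and memory on the hot path.
    """
    ctx = ssl.create_default_context()
    if not verify:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```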

Results​

| Metric | Before | After | Improvement |
|---|---|---|---|
| Median latency | 2,200 ms | 59 ms | −97% (~37× faster) |
| p95 latency | 8,500 ms | 67 ms | −99% (~127× faster) |
| p99 latency | 18,000 ms | 99 ms | −99% (~182× faster) |
| Average latency | 3,214 ms | 63 ms | −98% (~51× faster) |
| RPS | 165 | 1,207 | +631% (~7.3× increase) |

Test Setup​

| Category | Specification |
|---|---|
| Load Testing | Locust: 1,000 concurrent users, 500 ramp-up |
| System | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances |
| Database | PostgreSQL (Redis unused) |
| Configuration | config.yaml |
| Load Script | no_cache_hits.py |

Model Compare UI​

A new interactive playground UI enables side-by-side comparison of multiple LLM models, making it easy to evaluate and compare model responses.

Features:

  • Compare responses from multiple models in real-time
  • Side-by-side view with synchronized scrolling
  • Support for all LiteLLM-supported models
  • Cost tracking per model
  • Response time comparison
  • Pre-configured prompts for quick and easy testing

Details:

  • Parameterization: Configure API keys, endpoints, models, and model parameters, as well as interaction types (chat completions, embeddings, etc.)

  • Model Comparison: Compare up to 3 different models simultaneously with side-by-side response views

  • Comparison Metrics: View detailed comparison information including:

    • Time To First Token
    • Input / Output / Reasoning Tokens
    • Total Latency
    • Cost (if enabled in config)
  • Safety Filters: Configure and test guardrails (safety filters) directly in the playground interface

Get Started with Model Compare

New Providers and Endpoints​

New Providers​

| Provider | Supported Endpoints | Description |
|---|---|---|
| Docker Model Runner | /v1/chat/completions | Run LLM models in Docker containers |
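
As a minimal sketch of calling the new provider through the LiteLLM SDK; the provider prefix, model name, and local endpoint below are assumptions, so check the Docker Model Runner provider docs for the confirmed values:

```python
import litellm

response = litellm.completion(
    model="docker_model_runner/ai/smollm2",        # assumed provider prefix and model name
    api_base="http://localhost:12434/engines/v1",  # assumed local runner endpoint
    messages=[{"role": "user", "content": "Hello from a local container!"}],
)
print(response.choices[0].message.content)
```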

New Models / Updated Models​

New Model Support​

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Azure | azure/gpt-5.1 | 272K | $1.38 | $11.00 | Reasoning, vision, PDF input, responses API |
| Azure | azure/gpt-5.1-2025-11-13 | 272K | $1.38 | $11.00 | Reasoning, vision, PDF input, responses API |
| Azure | azure/gpt-5.1-codex | 272K | $1.38 | $11.00 | Responses API, reasoning, vision |
| Azure | azure/gpt-5.1-codex-2025-11-13 | 272K | $1.38 | $11.00 | Responses API, reasoning, vision |
| Azure | azure/gpt-5.1-codex-mini | 272K | $0.275 | $2.20 | Responses API, reasoning, vision |
| Azure | azure/gpt-5.1-codex-mini-2025-11-13 | 272K | $0.275 | $2.20 | Responses API, reasoning, vision |
| Azure EU | azure/eu/gpt-5-2025-08-07 | 272K | $1.375 | $11.00 | Reasoning, vision, PDF input |
| Azure EU | azure/eu/gpt-5-mini-2025-08-07 | 272K | $0.275 | $2.20 | Reasoning, vision, PDF input |
| Azure EU | azure/eu/gpt-5-nano-2025-08-07 | 272K | $0.055 | $0.44 | Reasoning, vision, PDF input |
| Azure EU | azure/eu/gpt-5.1 | 272K | $1.38 | $11.00 | Reasoning, vision, PDF input, responses API |
| Azure EU | azure/eu/gpt-5.1-codex | 272K | $1.38 | $11.00 | Responses API, reasoning, vision |
| Azure EU | azure/eu/gpt-5.1-codex-mini | 272K | $0.275 | $2.20 | Responses API, reasoning, vision |
| Gemini | gemini-3-pro-preview | 2M | $1.25 | $5.00 | Reasoning, vision, function calling |
| Gemini | gemini-3-pro-image | 2M | $1.25 | $5.00 | Image generation, reasoning |
| OpenRouter | openrouter/deepseek/deepseek-v3p1-terminus | 164K | $0.20 | $0.40 | Function calling, reasoning |
| OpenRouter | openrouter/moonshot/kimi-k2-instruct | 262K | $0.60 | $2.50 | Function calling, web search |
| OpenRouter | openrouter/gemini/gemini-3-pro-preview | 2M | $1.25 | $5.00 | Reasoning, vision, function calling |
| XAI | xai/grok-4.1-fast | 2M | $0.20 | $0.50 | Reasoning, function calling |
| Together AI | together_ai/z-ai/glm-4.6 | 203K | $0.40 | $1.75 | Function calling, reasoning |
| Cerebras | cerebras/gpt-oss-120b | 131K | $0.60 | $0.60 | Function calling |
| Bedrock | anthropic.claude-sonnet-4-5-20250929-v1:0 | 200K | $3.00 | $15.00 | Computer use, reasoning, vision |
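
As a quick sanity check, here is a minimal sketch calling one of the newly added models through the LiteLLM SDK; it assumes GEMINI_API_KEY is set in your environment, and the reasoning_effort behavior follows the Gemini notes below:

```python
import litellm

response = litellm.completion(
    model="gemini/gemini-3-pro-preview",
    messages=[{"role": "user", "content": "In one sentence, what is LiteLLM?"}],
    reasoning_effort="low",  # per this release, surfaced as includeThoughts for Gemini 3
)
print(response.choices[0].message.content)
```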

Features​

  • Gemini (Google AI Studio + Vertex AI)

    • Add Day 0 gemini-3-pro-preview support - PR #16719
    • Add support for Gemini 3 Pro Image model - PR #16938
    • Add reasoning_content to streaming responses with tools enabled - PR #16854
    • Add includeThoughts=True for Gemini 3 reasoning_effort - PR #16838
    • Support thought signatures for Gemini 3 in responses API - PR #16872
    • Fix incorrect system message handling for Gemma - PR #16767
    • Gemini 3 Pro Image: capture image_tokens and support cost_per_output_image - PR #16912
    • Fix missing costs for gemini-2.5-flash-image - PR #16882
    • Gemini 3 thought signatures in tool call id - PR #16895
  • Azure

    • Add azure gpt-5.1 models - PR #16817
    • Add Azure models 2025 11 to cost maps - PR #16762
    • Update Azure Pricing - PR #16371
    • Add SSML Support for Azure Text-to-Speech (AVA) - PR #16747
  • OpenAI

    • Support GPT-5.1 reasoning.effort='none' in proxy - PR #16745
    • Add gpt-5.1-codex and gpt-5.1-codex-mini models to documentation - PR #16735
    • Inherit BaseVideoConfig to enable async content response for OpenAI video - PR #16708
  • Anthropic

    • Add support for strict parameter in Anthropic tool schemas - PR #16725
    • Add image as url support to anthropic - PR #16868
    • Add thought signature support to v1/messages api - PR #16812
    • Anthropic - support Structured Outputs output_format for Claude 4.5 sonnet and Opus 4.1 - PR #16949
  • Bedrock

    • Haiku 4.5 correct Bedrock configs - PR #16732
    • Ensure consistent chunk IDs in Bedrock streaming responses - PR #16596
    • Add Claude 4.5 to US Gov Cloud - PR #16957
    • Fix images being dropped from tool results for bedrock - PR #16492
  • Vertex AI

    • Add Vertex AI Image Edit Support - PR #16828
    • Update veo 3 pricing and add prod models - PR #16781
    • Fix Video download for veo3 - PR #16875
  • Snowflake

    • Snowflake provider support: added embeddings, PAT, account_id - PR #15727
  • OCI

    • Add oci_endpoint_id Parameter for OCI Dedicated Endpoints - PR #16723
  • XAI

    • Add support for Grok 4.1 Fast models - PR #16936
  • Cerebras

    • Fix Cerebras GPT-OSS-120B model name - PR #16939

Bug Fixes​

  • OpenAI

    • Fix issue #16863: OpenAI conversion from Responses API to Chat Completions - PR #16864
    • Revert "Make all gpt-5 and reasoning models to responses by default" - PR #16849
  • General

    • Get custom_llm_provider from query param - PR #16731
    • Fix optional param mapping - PR #16852
    • Add None check for litellm_params - PR #16754

LLM API Endpoints​

Bugs​

  • General
    • Responses API cost tracking with custom deployment names - PR #16778
    • Trim logged response strings in spend-logs - PR #16654

Management Endpoints / UI​

Features​

  • Proxy CLI Auth

    • Allow using JWTs for signing in with Proxy CLI - PR #16756
  • Virtual Keys

    • Fix Key Model Alias Not Working - PR #16896
  • Models + Endpoints

    • Add additional model settings to chat models in test key - PR #16793
    • Deactivate delete button on model table for config models - PR #16787
    • Change Public Model Hub to use proxyBaseUrl - PR #16892
    • Add JSON Viewer to request/response panel - PR #16687
    • Standardize icon images - PR #16837
  • Fallbacks

    • Fallbacks icon button tooltips and delete with friction - PR #16737
  • MCP Servers

    • Delete user and MCP Server Modal, MCP Table Tooltips - PR #16751
  • Callbacks

    • Expose backend endpoint for callbacks settings - PR #16698
    • Edit add callbacks route to use data from backend - PR #16699
  • Usage & Analytics

    • Allow partial matches for user ID in User Table - PR #16952
  • General UI

    • Allow setting base_url in API reference docs - PR #16674
    • Change /public fields to honor server root path - PR #16930
    • Correct ui build - PR #16702
    • Enable automatic dark/light mode based on system preference - PR #16748

Bugs​

  • UI Fixes

    • Fix flaky tests due to antd Notification Manager - PR #16740
    • Fix UI MCP Tool Test Regression - PR #16695
    • Fix edit logging settings not appearing - PR #16798
    • Add css to truncate long request ids in request viewer - PR #16665
    • Remove azure/ prefix in Placeholder for Azure in Add Model - PR #16597
    • Remove UI Session Token from user/info return - PR #16851
    • Remove console logs and errors from model tab - PR #16455
    • Change Bulk Invite User Roles to Match Backend - PR #16906
    • Mock Tremor's Tooltip to Fix Flaky UI Tests - PR #16786
    • Fix e2e ui playwright test - PR #16799
    • Fix Tests in CI/CD - PR #16972
  • SSO

    • Ensure role from SSO provider is used when a user is inserted onto LiteLLM - PR #16794
    • Docs - SSO - Manage User Roles via Azure App Roles - PR #16796
  • Auth

    • Ensure Team Tags works when using JWT Auth - PR #16797
    • Fix key never expires - PR #16692
  • Swagger UI

    • Fix Swagger UI resolver errors for chat completion endpoints caused by Pydantic v2 $defs not being properly exposed in the OpenAPI schema - PR #16784

AI Integrations​

Prompt Management​

  • Prompt Management
    • Allow specifying just prompt_id in a request to a model - PR #16834
    • Add support for versioning prompts - PR #16836
    • Allow storing prompt version in DB - PR #16848
    • Add UI for editing the prompts - PR #16853
    • Allow testing prompts with Chat UI - PR #16898
    • Allow viewing version history - PR #16901
    • Allow specifying prompt version in code - PR #16929
    • UI: show the model and prompt ID for each prompt - PR #16932
    • Show "get code" section for prompt management and polish the version history view - PR #16941


MCP Gateway​

  • MCP Hub - Publish/discover MCP Servers within a company - PR #16857
  • MCP Resources - MCP resources support - PR #16800
  • MCP OAuth - Docs - mcp oauth flow details - PR #16742
  • MCP Lifecycle - Drop MCPClient.connect and use run_with_session lifecycle - PR #16696
  • MCP Server IDs - Add mcp server ids - PR #16904
  • MCP URL Format - Fix mcp url format - PR #16940

Performance / Loadbalancing / Reliability improvements​

  • Realtime Endpoint Performance - Fix bottlenecks degrading realtime endpoint performance - PR #16670
  • SSL Context Caching - Cache SSL contexts to prevent excessive memory allocation - PR #16955
  • Cache Optimization - Fix cache cooldown key generation - PR #16954
  • Router Cache - Fix routing for requests with same cacheable prefix but different user messages - PR #16951
  • Redis Event Loop - Fix redis event loop closed at first call - PR #16913
  • Dependency Management - Upgrade pydantic to version 2.11.0 - PR #16909

Documentation Updates​

  • Provider Documentation

    • Add missing details to benchmark comparison - PR #16690
    • Fix anthropic pass-through endpoint - PR #16883
    • Cleanup repo and improve AI docs - PR #16775
  • API Documentation

    • Add docs related to openai metadata - PR #16872
    • Update docs with all supported endpoints and cost tracking - PR #16872
  • General Documentation

    • Add mini-swe-agent to Projects built on LiteLLM - PR #16971

Infrastructure / CI/CD​

  • Dependency Management

    • Bump js-yaml from 3.14.1 to 3.14.2 in /tests/proxy_admin_ui_tests/ui_unit_tests - PR #16755
    • Bump js-yaml from 3.14.1 to 3.14.2 - PR #16802
  • Release Notes

    • Add perf improvements on embeddings to release notes - PR #16697
    • Docs - v1.80.0 - PR #16694

New Contributors​

  • @mattmorgis made their first contribution in PR #16371
  • @mmandic-coatue made their first contribution in PR #16732
  • @Bradley-Butcher made their first contribution in PR #16725
  • @BenjaminLevy made their first contribution in PR #16757
  • @CatBraaain made their first contribution in PR #16767
  • @tushar8408 made their first contribution in PR #16831
  • @nbsp1221 made their first contribution in PR #16845
  • @idola9 made their first contribution in PR #16832
  • @nkukard made their first contribution in PR #16864
  • @alhuang10 made their first contribution in PR #16852
  • @sebslight made their first contribution in PR #16838
  • @TsurumaruTsuyoshi made their first contribution in PR #16905
  • @cyberjunk made their first contribution in PR #16492
  • @colinlin-stripe made their first contribution in PR #16895
  • @sureshdsk made their first contribution in PR #16883
  • @eiliyaabedini made their first contribution in PR #16875
  • @justin-tahara made their first contribution in PR #16957
  • @wangsoft made their first contribution in PR #16913
  • @dsduenas made their first contribution in PR #16891

Full Changelog​

View complete changelog on GitHub