
[Preview] v1.80.0-stable - Agent Hub Support

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.80.0.rc.2

Key Highlights

  • 🆕 Agent Hub Support - Register and make agents public for your organization
  • RunwayML Provider - Complete video generation, image generation, and text-to-speech support
  • GPT-5.1 Family Support - Day-0 support for OpenAI's latest GPT-5.1 and GPT-5.1-Codex models
  • Prometheus OSS - Prometheus metrics now available in open-source version
  • Vector Store Files API - Complete OpenAI-compatible Vector Store Files API with full CRUD operations
  • Embeddings Performance - O(1) lookup optimization for router embeddings with shared sessions

Agent Hub

This release adds support for registering agents and making them public to your organization. This is useful for Proxy Admins who want a central place where agents built in their organization can be discovered by users.

Here's the flow:

  1. Add agent to litellm.
  2. Make it public.
  3. Allow anyone to discover it on the public AI Hub page.

Get Started with Agent Hub

Performance – /embeddings 13× Lower p95 Latency

This update significantly improves /embeddings latency by routing it through the same optimized pipeline as /chat/completions, benefiting from all previously applied networking optimizations.

Results

| Metric | Before | After | Improvement |
|---|---|---|---|
| p95 latency | 5,700 ms | 430 ms | −92% (~13× faster) |
| p99 latency | 7,200 ms | 780 ms | −89% |
| Average latency | 844 ms | 262 ms | −69% |
| Median latency | 290 ms | 230 ms | −21% |
| RPS | 1,216.7 | 1,219.7 | +0.25% |
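As a sanity check, the improvement column follows directly from the before/after figures; a few lines of Python reproduce it:

```python
# Reproduce the improvement column from the latency table above.
before = {"p95": 5700, "p99": 7200, "avg": 844, "median": 290}  # ms
after = {"p95": 430, "p99": 780, "avg": 262, "median": 230}     # ms

def pct_drop(b: int, a: int) -> int:
    """Percent reduction from b to a, rounded to the nearest whole percent."""
    return round((b - a) / b * 100)

print(pct_drop(before["p95"], after["p95"]))   # 92 -> the "-92%" in the table
print(round(before["p95"] / after["p95"], 1))  # 13.3 -> the "~13x faster" claim
```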

Test Setup

| Category | Specification |
|---|---|
| Load Testing | Locust: 1,000 concurrent users, 500 ramp-up |
| System | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances |
| Database | PostgreSQL (Redis unused) |
| Configuration | config.yaml |
| Load Script | no_cache_hits.py |

🆕 RunwayML

This release adds a complete integration for RunwayML's Gen-4 model family, supporting video generation, image generation, and text-to-speech.

Supported Endpoints:

  • /v1/videos - Video generation (Gen-4 Turbo, Gen-4 Aleph, Gen-3A Turbo)
  • /v1/images/generations - Image generation (Gen-4 Image, Gen-4 Image Turbo)
  • /v1/audio/speech - Text-to-speech (ElevenLabs Multilingual v2)

Quick Start:

Generate Video with RunwayML
curl --location 'http://localhost:4000/v1/videos' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-1234' \
--data '{
"model": "runwayml/gen4_turbo",
"prompt": "A high quality demo video of litellm ai gateway",
"input_reference": "https://example.com/image.jpg",
"seconds": 5,
"size": "1280x720"
}'
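Since RunwayML video models are billed per second of generated video, the cost of a request like the one above can be estimated up front. A minimal sketch; the per-second prices are hardcoded here from this release's pricing table:

```python
# Estimate RunwayML video generation cost from per-second pricing.
# Prices ($/second of generated video) come from this release's table.
COST_PER_SECOND = {
    "runwayml/gen4_turbo": 0.05,
    "runwayml/gen4_aleph": 0.15,
    "runwayml/gen3a_turbo": 0.05,
}

def estimate_video_cost(model: str, seconds: int) -> float:
    """Return the estimated USD cost for `seconds` of video from `model`."""
    return round(COST_PER_SECOND[model] * seconds, 4)

# The 5-second gen4_turbo request in the curl example above:
print(estimate_video_cost("runwayml/gen4_turbo", 5))  # 0.25
```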

Get Started with RunwayML


Prometheus Metrics - Open Source

Prometheus metrics are now available in the open-source version of LiteLLM, providing comprehensive observability for your AI Gateway without requiring an enterprise license.

Quick Start:

litellm_settings:
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
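Once enabled, the proxy exposes these metrics on its /metrics endpoint for a Prometheus server to scrape. A minimal scrape-config sketch, assuming the proxy runs at localhost:4000 as in this post's examples (the job name is illustrative):

```yaml
scrape_configs:
  - job_name: "litellm-proxy"        # illustrative job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]  # the proxy from this post's examples
```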

Get Started with Prometheus


Vector Store Files API

The OpenAI-compatible Vector Store Files API is now stable, enabling full file lifecycle management within vector stores.

Supported Endpoints:

  • POST /v1/vector_stores/{vector_store_id}/files - Create vector store file
  • GET /v1/vector_stores/{vector_store_id}/files - List vector store files
  • GET /v1/vector_stores/{vector_store_id}/files/{file_id} - Retrieve vector store file
  • GET /v1/vector_stores/{vector_store_id}/files/{file_id}/content - Retrieve file content
  • DELETE /v1/vector_stores/{vector_store_id}/files/{file_id} - Delete vector store file
  • DELETE /v1/vector_stores/{vector_store_id} - Delete vector store

Quick Start:

Create Vector Store File
curl --location 'http://localhost:4000/v1/vector_stores/vs_123/files' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-1234' \
--data '{
"file_id": "file_abc"
}'

Get Started with Vector Stores


New Providers and Endpoints

New Providers

| Provider | Supported Endpoints | Description |
|---|---|---|
| RunwayML | /v1/videos, /v1/images/generations, /v1/audio/speech | Gen-4 video generation, image generation, and text-to-speech |

New LLM API Endpoints

| Endpoint | Method | Description | Documentation |
|---|---|---|---|
| /v1/vector_stores/{vector_store_id}/files | POST | Create vector store file | Docs |
| /v1/vector_stores/{vector_store_id}/files | GET | List vector store files | Docs |
| /v1/vector_stores/{vector_store_id}/files/{file_id} | GET | Retrieve vector store file | Docs |
| /v1/vector_stores/{vector_store_id}/files/{file_id}/content | GET | Retrieve file content | Docs |
| /v1/vector_stores/{vector_store_id}/files/{file_id} | DELETE | Delete vector store file | Docs |
| /v1/vector_stores/{vector_store_id} | DELETE | Delete vector store | Docs |

New Models / Updated Models

New Model Support

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-5.1 | 272K | $1.25 | $10.00 | Reasoning, vision, PDF input, responses API |
| OpenAI | gpt-5.1-2025-11-13 | 272K | $1.25 | $10.00 | Reasoning, vision, PDF input, responses API |
| OpenAI | gpt-5.1-chat-latest | 128K | $1.25 | $10.00 | Reasoning, vision, PDF input |
| OpenAI | gpt-5.1-codex | 272K | $1.25 | $10.00 | Responses API, reasoning, vision |
| OpenAI | gpt-5.1-codex-mini | 272K | $0.25 | $2.00 | Responses API, reasoning, vision |
| Moonshot | moonshot/kimi-k2-thinking | 262K | $0.60 | $2.50 | Function calling, web search, reasoning |
| Mistral | mistral/magistral-medium-2509 | 40K | $2.00 | $5.00 | Reasoning, function calling |
| Vertex AI | vertex_ai/moonshotai/kimi-k2-thinking-maas | 256K | $0.60 | $2.50 | Function calling, web search |
| OpenRouter | openrouter/deepseek/deepseek-v3.2-exp | 164K | $0.20 | $0.40 | Function calling, prompt caching |
| OpenRouter | openrouter/minimax/minimax-m2 | 205K | $0.26 | $1.02 | Function calling, reasoning |
| OpenRouter | openrouter/z-ai/glm-4.6 | 203K | $0.40 | $1.75 | Function calling, reasoning |
| OpenRouter | openrouter/z-ai/glm-4.6:exacto | 203K | $0.45 | $1.90 | Function calling, reasoning |
| Voyage | voyage/voyage-3.5 | 32K | $0.06 | - | Embeddings |
| Voyage | voyage/voyage-3.5-lite | 32K | $0.02 | - | Embeddings |
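The $/1M-token prices above translate into per-request cost as input_tokens / 1M × input price + output_tokens / 1M × output price. A small worked example, with a few prices hardcoded from the table:

```python
# Per-request cost from the $/1M-token prices in the table above.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-5.1": (1.25, 10.00),
    "gpt-5.1-codex-mini": (0.25, 2.00),
    "moonshot/kimi-k2-thinking": (0.60, 2.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request given prompt and completion token counts."""
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# 10,000 prompt tokens + 2,000 completion tokens on gpt-5.1:
print(round(request_cost("gpt-5.1", 10_000, 2_000), 4))  # 0.0325
```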

Video Generation Models

| Provider | Model | Cost Per Second | Resolutions | Features |
|---|---|---|---|---|
| RunwayML | runwayml/gen4_turbo | $0.05 | 1280x720, 720x1280 | Text + image to video |
| RunwayML | runwayml/gen4_aleph | $0.15 | 1280x720, 720x1280 | Text + image to video |
| RunwayML | runwayml/gen3a_turbo | $0.05 | 1280x720, 720x1280 | Text + image to video |

Image Generation Models

| Provider | Model | Cost Per Image | Resolutions | Features |
|---|---|---|---|---|
| RunwayML | runwayml/gen4_image | $0.05 | 1280x720, 1920x1080 | Text + image to image |
| RunwayML | runwayml/gen4_image_turbo | $0.02 | 1280x720, 1920x1080 | Text + image to image |
| Fal.ai | fal_ai/fal-ai/flux-pro/v1.1 | $0.04 | - | Image generation |
| Fal.ai | fal_ai/fal-ai/flux/schnell | $0.003 | - | Fast image generation |
| Fal.ai | fal_ai/fal-ai/bytedance/seedream/v3/text-to-image | $0.03 | - | Image generation |
| Fal.ai | fal_ai/fal-ai/bytedance/dreamina/v3.1/text-to-image | $0.03 | - | Image generation |
| Fal.ai | fal_ai/fal-ai/ideogram/v3 | $0.06 | - | Image generation |
| Fal.ai | fal_ai/fal-ai/imagen4/preview/fast | $0.02 | - | Fast image generation |
| Fal.ai | fal_ai/fal-ai/imagen4/preview/ultra | $0.06 | - | High-quality image generation |

Audio Models

| Provider | Model | Cost | Features |
|---|---|---|---|
| RunwayML | runwayml/eleven_multilingual_v2 | $0.0003/char | Text-to-speech |

Features

  • OpenAI

    • Add GPT-5.1 family support with reasoning capabilities - PR #16598
    • Add support for reasoning_effort='none' for GPT-5.1 - PR #16658
    • Add verbosity parameter support for GPT-5 family models - PR #16660
    • Fix forward OpenAI organization for image generation - PR #16607
  • Gemini (Google AI Studio + Vertex AI)

    • Add support for reasoning_effort='none' for Gemini models - PR #16548
    • Add all Gemini image models support in image generation - PR #16526
    • Add Gemini image edit support - PR #16430
    • Fix preserve non-ASCII characters in function call arguments - PR #16550
    • Fix Gemini conversation format issue with MCP auto-execution - PR #16592
  • Bedrock

    • Add support for filtering knowledge base queries - PR #16543
    • Ensure correct aws_region is used when provided dynamically for embeddings - PR #16547
    • Add support for custom KMS encryption keys in Bedrock Batch operations - PR #16662
    • Add bearer token authentication support for AgentCore - PR #16556
    • Fix AgentCore SSE stream iterator to async for proper streaming support - PR #16293
  • Anthropic

    • Add context management param support - PR #16528
    • Fix preserve $defs for Anthropic tools input schema - PR #16648
    • Fix support Anthropic tool_use and tool_result in token counter - PR #16351
  • Vertex AI

    • Add Vertex Kimi-K2-Thinking support - PR #16671
    • Add vertex_credentials support to litellm.rerank() - PR #16479
  • Mistral

    • Fix Magistral streaming to emit reasoning chunks - PR #16434
  • Moonshot (Kimi)

    • Add Kimi K2 thinking model support - PR #16445
  • SambaNova

    • Fix SambaNova API rejecting requests when message content is passed as a list format - PR #16612
  • VLLM

    • Fix use vllm passthrough config for hosted vllm provider instead of raising error - PR #16537
    • Add headers to VLLM Passthrough requests with success event logging - PR #16532
  • Azure

    • Fix improve Azure auth parameter handling for None values - PR #14436
  • Groq

    • Fix parse failed chunks for Groq - PR #16595
  • Voyage

    • Add Voyage 3.5 and 3.5-lite embeddings pricing and doc update - PR #16641
  • Fal.ai

    • Add fal-ai/flux/schnell support - PR #16580
    • Add all Imagen4 variants of fal ai in model map - PR #16579

Bug Fixes

  • General
    • Fix sanitize null token usage in OpenAI-compatible responses - PR #16493
    • Fix apply provided timeout value to ClientTimeout.total - PR #16395
    • Fix raising wrong 429 error on wrong exception - PR #16482
    • Add new models, delete repeat models, update pricing - PR #16491
    • Update model logging format for custom LLM provider - PR #16485

LLM API Endpoints

Features

  • Video Generation API

    • Allow internal users to access video generation routes - PR #16472
  • Vector Stores API

    • Vector store files stable release with complete CRUD operations - PR #16643
      • POST /v1/vector_stores/{vector_store_id}/files - Create vector store file
      • GET /v1/vector_stores/{vector_store_id}/files - List vector store files
      • GET /v1/vector_stores/{vector_store_id}/files/{file_id} - Retrieve vector store file
      • GET /v1/vector_stores/{vector_store_id}/files/{file_id}/content - Retrieve file content
      • DELETE /v1/vector_stores/{vector_store_id}/files/{file_id} - Delete vector store file
      • DELETE /v1/vector_stores/{vector_store_id} - Delete vector store
    • Ensure users can access search_results for both stream + non-stream response - PR #16459


Management Endpoints / UI

Features

  • Proxy CLI Auth

    • Fix remove strict master_key check in add_deployment - PR #16453
  • Virtual Keys

    • UI - Add Tags To Edit Key Flow - PR #16500
    • UI - Test Key Page show models based on selected endpoint - PR #16452
    • UI - Expose user_alias in view and update path - PR #16669
  • Models + Endpoints

    • UI - Add LiteLLM Params to Edit Model - PR #16496
    • UI - Add Model use backend data - PR #16664
    • UI - Remove Description Field from LLM Credentials - PR #16608
    • UI - Add RunwayML on Admin UI supported models/providers - PR #16606
    • Infra - Migrate Add Model Fields to Backend - PR #16620
    • Add API Endpoint for creating model access group - PR #16663
  • Teams

    • UI - Invite User Searchable Team Select - PR #16454
    • Fix use user budget instead of key budget when creating new team - PR #16074
  • Budgets

    • UI - Move Budgets out of Experimental - PR #16544
  • Guardrails

    • UI - Config Guardrails should not be deletable from table - PR #16540
    • Fix remove enterprise restriction from guardrails list endpoint - PR #15333
  • Usage & Analytics

    • UI - Improve Usage Indicator - PR #16504
    • UI - Model Info Page Health Check - PR #16416
    • Infra - Show Deprecation Warning for Model Analytics Tab - PR #16417
    • Fix Litellm tags usage add request_id - PR #16111
  • Health Check

    • Add Langfuse OTEL and SQS to Health Check - PR #16514
  • General UI

    • UI - Normalize table action columns appearance - PR #16657
    • UI - Button Styles and Sizing in Settings Pages - PR #16600
    • UI - SSO Modal Cosmetic Changes - PR #16554
    • Fix UI logos loading with SERVER_ROOT_PATH - PR #16618
    • Fix remove misleading 'Custom' option mention from OpenAI endpoint tooltips - PR #16622

Bugs

  • Management Endpoints
    • Fix inconsistent error responses in customer management endpoints - PR #16450
    • Fix correct date range filtering in /spend/logs endpoint - PR #16443
    • Fix /spend/logs/ui Access Control - PR #16446
    • Add pagination for /spend/logs/session/ui endpoint - PR #16603
    • Fix LiteLLM Usage shows key_hash - PR #16471
    • Fix app_roles missing from jwt payload - PR #16448


Spend Tracking, Budgets and Rate Limiting

  • End User Budgets
    • Allow pointing max_end_user budget to an id, so the default ID applies to all end users - PR #16456

MCP Gateway

  • Configuration
    • Add dynamic OAuth2 metadata discovery for MCP servers - PR #16676
    • Fix allow tool call even when server name prefix is missing - PR #16425
    • Fix exclude unauthorized MCP servers from allowed server list - PR #16551
    • Fix unable to delete MCP server from permission settings - PR #16407
    • Fix avoid crashing when MCP server record lacks credentials - PR #16601


Performance / Loadbalancing / Reliability improvements

  • Embeddings Performance

    • Use router's O(1) lookup and shared sessions for embeddings - PR #16344
  • Router Reliability

    • Support default fallbacks for unknown models - PR #16419
  • Callback Management

    • Add atexit handlers to flush callbacks for async completions - PR #16487
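The router-reliability item above is configuration-driven: litellm's router accepts a default_fallbacks list that applies when a requested model has no model-specific fallback configured. A sketch, assuming a deployment defined in your own model_list (the name here is a placeholder):

```yaml
router_settings:
  default_fallbacks: ["my-general-model"]  # placeholder; must match a model_list entry
```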

General Proxy Improvements

  • Configuration Management
    • Fix update model_cost_map_url to use environment variable - PR #16429

Documentation Updates

  • Provider Documentation

    • Fix streaming example in README - PR #16461
    • Update broken Slack invite links to support page - PR #16546
    • Fix code block indentation for fallbacks page - PR #16542
    • Documentation code example corrections - PR #16502
    • Document reasoning_effort summary field options - PR #16549
  • API Documentation

    • Add docs on APIs for model access management - PR #16673
    • Add docs for showing how to auto reload new pricing data - PR #16675
    • LiteLLM Quick start - show how model resolution works - PR #16602
    • Add docs for tracking callback failure - PR #16474
  • General Documentation

    • Fix container api link in release page - PR #16440
    • Add softgen to projects that are using litellm - PR #16423

New Contributors

  • @artplan1 made their first contribution in PR #16423
  • @JehandadK made their first contribution in PR #16472
  • @vmiscenko made their first contribution in PR #16453
  • @mcowger made their first contribution in PR #16429
  • @yellowsubmarine372 made their first contribution in PR #16395
  • @Hebruwu made their first contribution in PR #16201
  • @jwang-gif made their first contribution in PR #15691
  • @AnthonyMonaco made their first contribution in PR #16502
  • @andrewm4894 made their first contribution in PR #16487
  • @f14-bertolotti made their first contribution in PR #16485
  • @busla made their first contribution in PR #16293
  • @MightyGoldenOctopus made their first contribution in PR #16537
  • @ultmaster made their first contribution in PR #14436
  • @bchrobot made their first contribution in PR #16542
  • @sep-grindr made their first contribution in PR #16622
  • @pnookala-godaddy made their first contribution in PR #16607
  • @dtunikov made their first contribution in PR #16592
  • @lukapecnik made their first contribution in PR #16648
  • @jyeros made their first contribution in PR #16618

Full Changelog

View complete changelog on GitHub