v1.68.0-stable

Krrish Dholakia
Ishaan Jaffer

Deploy this version

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.68.0-stable

Key Highlights

LiteLLM v1.68.0-stable will be live soon. Here are the key highlights of this release:

  • Bedrock Knowledge Base: You can now query your Bedrock Knowledge Base with all LiteLLM models via the /chat/completions or /responses API.
  • Rate Limits: This release brings accurate rate limiting across multiple instances, reducing spillover to at most 10 additional requests in high traffic.
  • Meta Llama API: Added support for the Meta Llama API. Get Started
  • LlamaFile: Added support for LlamaFile. Get Started

Bedrock Knowledge Base (Vector Store)


This release adds support for Bedrock vector stores (knowledge bases) in LiteLLM. With this update, you can:

  • Use Bedrock vector stores in the OpenAI /chat/completions spec with all LiteLLM-supported models.
  • View all available vector stores through the LiteLLM UI or API.
  • Configure vector stores to be always active for specific models.
  • Track vector store usage in LiteLLM Logs.

In the next release, we plan to allow setting key, user, team, and org permissions for vector stores.
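
As an example, a request through the LiteLLM proxy might look like the sketch below. It uses the OpenAI Python SDK with the file_search tool shape; the proxy URL, virtual key, model name, and knowledge base ID are placeholder assumptions, not values from this release:

# Sketch: query a Bedrock Knowledge Base through the LiteLLM proxy's
# OpenAI-compatible /chat/completions endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # LiteLLM proxy (placeholder)
    api_key="sk-1234",                 # virtual key (placeholder)
)

response = client.chat.completions.create(
    model="gpt-4o",  # any LiteLLM-supported model
    messages=[{"role": "user", "content": "What is our refund policy?"}],
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": ["BEDROCK_KB_ID"],  # placeholder knowledge base ID
        }
    ],
)
print(response.choices[0].message.content)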

Read more here

Rate Limiting

This release brings accurate multi-instance rate limiting across keys/users/teams. Key engineering changes are outlined below:

  • Change: Instances now increment the cache value instead of setting it. To avoid calling Redis on each request, increments are synced to Redis every 0.01s.
  • Accuracy: In testing, we saw a maximum spillover of 10 requests beyond the expected limit under high traffic (100 RPS, 3 instances), vs. a 189-request spillover with the current implementation.
  • Performance: Our load tests show this reduces median response time by 100ms in high traffic.

This is currently behind a feature flag, and we plan to make it the default by next week. To enable it today, just set this environment variable:

export LITELLM_RATE_LIMIT_ACCURACY=true
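
To illustrate the increment-and-sync approach described above, here is a minimal sketch (an illustrative toy, not LiteLLM's actual implementation; all names and the Redis key are assumptions): each instance counts admitted requests locally, and a background task flushes the delta to Redis with INCRBY on a short interval, so concurrent instances add to the shared counter rather than overwrite it.

# Minimal sketch of multi-instance rate limiting via periodic increment
# syncing. Names and interval are illustrative; this is not LiteLLM's code.
import asyncio

import redis.asyncio as redis

SYNC_INTERVAL = 0.01  # seconds between Redis syncs
r = redis.Redis()

local_delta = 0         # requests admitted since the last sync
known_global_count = 0  # last counter value read from Redis

def allow_request(limit: int) -> bool:
    """Admit a request if the best-known global count is under the limit."""
    global local_delta
    if known_global_count + local_delta >= limit:
        return False  # spillover is bounded by what arrives between syncs
    local_delta += 1
    return True

async def sync_loop(key: str) -> None:
    """Flush the local delta to Redis and refresh the global count."""
    global local_delta, known_global_count
    while True:
        await asyncio.sleep(SYNC_INTERVAL)
        delta, local_delta = local_delta, 0
        # INCRBY adds this instance's delta instead of setting the key,
        # so instances never clobber each other's counts. Window expiry
        # (per-minute TTLs, etc.) is omitted for brevity.
        known_global_count = await r.incrby(key, delta)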

Read more here

New Models / Updated Models

  • Gemini (VertexAI + Google AI Studio)
    • Handle more JSON schema to OpenAPI schema conversion edge cases PR
    • Tool calls - return finish_reason="tool_calls" on Gemini tool calling responses PR
  • VertexAI
    • Meta/llama-4 model support PR
    • Meta/llama3 - handle tool call result in content PR
    • Meta/* - return finish_reason="tool_calls" on tool calling responses PR
  • Bedrock
  • OpenAI
    • Support OPENAI_BASE_URL in addition to OPENAI_API_BASE PR
    • Correctly re-raise 504 timeout errors PR
    • Native gpt-4o-mini-tts support PR
  • 🆕 Meta Llama API provider PR
  • 🆕 LlamaFile provider PR

LLM API Endpoints

  • Responses API
    • Fix for handling multi turn sessions PR
  • Embeddings
    • Caching fixes - PR
      • Handle str -> list cache
      • Return usage tokens for cache hit
      • Combine usage tokens on partial cache hits (see the sketch after this list)
  • 🆕 Vector Stores
    • Allow defining Vector Store Configs - PR
    • New StandardLoggingPayload field for requests made when a vector store is used - PR
    • Show Vector Store / KB Request on LiteLLM Logs Page - PR
    • Allow using vector store in OpenAI API spec with tools - PR
  • MCP
    • Ensure Non-Admin virtual keys can access /mcp routes - PR

      Note: Currently, all Virtual Keys are able to access the MCP endpoints. We are working on a feature to allow restricting MCP access by keys/teams/users/orgs. Follow here for updates.

  • Moderations
    • Add logging callback support for /moderations API - PR
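
To illustrate the partial cache hit behavior from the embeddings caching fixes above, here is a rough sketch (not LiteLLM's internals; the cache shape and the embed_fn helper are assumptions): inputs found in the cache contribute their stored usage tokens, the misses are embedded fresh, and both usages are combined into a single total for the response.

# Sketch: combine usage tokens on a partial embedding cache hit.
from typing import Callable, Dict, List, Tuple

# input text -> (embedding vector, prompt_tokens it originally cost)
cache: Dict[str, Tuple[List[float], int]] = {}

def embed_batch(
    inputs: List[str],
    embed_fn: Callable[[List[str]], Tuple[List[List[float]], List[int]]],
) -> Tuple[list, int]:
    embeddings = [None] * len(inputs)
    total_prompt_tokens = 0
    misses = []

    for i, text in enumerate(inputs):
        if text in cache:
            embeddings[i], tokens = cache[text]
            total_prompt_tokens += tokens  # usage is returned even on cache hits
        else:
            misses.append(i)

    if misses:
        # embed_fn returns vectors and per-input token counts for the misses
        vectors, token_counts = embed_fn([inputs[i] for i in misses])
        for i, vec, tokens in zip(misses, vectors, token_counts):
            embeddings[i] = vec
            cache[inputs[i]] = (vec, tokens)
            total_prompt_tokens += tokens  # combined with the cached usage

    return embeddings, total_prompt_tokens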

Spend Tracking / Budget Improvements

Management Endpoints / UI

  • Virtual Keys
    • Fix filtering on key alias - PR
    • Support global filtering on keys - PR
    • Pagination - fix clicking on next/back buttons on table - PR
  • Models
    • Triton - Support adding model/provider on UI - PR
    • VertexAI - Fix adding vertex models with reusable credentials - PR
    • LLM Credentials - show existing credentials for easy editing - PR
  • Teams
    • Allow reassigning a team to another org - PR
  • Organizations
    • Fix showing org budget on table - PR

Logging / Guardrail Integrations

Performance / Loadbalancing / Reliability improvements

General Proxy Improvements