
v1.70.1-stable - Gemini Realtime API Support

Krrish Dholakia
Ishaan Jaffer

Deploy this version​

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.70.1-stable

Key Highlights​

LiteLLM v1.70.1-stable is live now. Here are the key highlights of this release:

  • Gemini Realtime API: You can now call Gemini's Live API via the OpenAI /v1/realtime API
  • Spend Logs Retention Period: Delete spend logs older than a configured retention period.
  • PII Masking 2.0: Configure masking or blocking of specific PII/PHI entities directly from the UI

Gemini Realtime API​

This release brings support for calling Gemini's realtime models (e.g. gemini-2.0-flash-live) via OpenAI's /v1/realtime API. Developers can now switch from OpenAI to Gemini by changing only the model name.

Key Highlights:

  • Support for text + audio input/output
  • Support for setting session configurations (modality, instructions, activity detection) in the OpenAI format
  • Support for logging + usage tracking for realtime sessions

This is currently supported via Google AI Studio. We plan to release VertexAI support over the coming week.
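
For a concrete feel of the interface, here is a minimal sketch of a realtime session against a locally running LiteLLM proxy. The proxy URL, API key, and the gemini-2.0-flash-live model alias are assumptions about your deployment; the events themselves follow the standard OpenAI realtime format.

```python
# Minimal sketch (assumptions: proxy at localhost:4000, key sk-1234,
# a model named gemini-2.0-flash-live configured on the proxy).
import asyncio
import json

import websockets  # pip install websockets


async def main():
    url = "ws://localhost:4000/v1/realtime?model=gemini-2.0-flash-live"
    headers = {"Authorization": "Bearer sk-1234"}

    # websockets < 14 uses `extra_headers=`; newer releases call it `additional_headers=`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Session configuration in the OpenAI realtime format.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["text"], "instructions": "You are a helpful assistant."},
        }))

        # Send a user message and request a response.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello over the realtime API!"}],
            },
        }))
        await ws.send(json.dumps({"type": "response.create"}))

        # Print server events until the response completes.
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break


asyncio.run(main())
```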

Read more

Spend Logs Retention Period​

This release enables deleting LiteLLM Spend Logs older than a certain period. Since we now enable storing the raw request/response in the logs, deleting old logs ensures the database remains performant in production.
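
To turn this on, you set a retention period in the proxy config. A minimal sketch is below; the exact setting name is an assumption on our part, so confirm it against the docs linked below.

```yaml
# Hedged sketch of a proxy config.yaml enabling spend log cleanup.
# The setting name is an assumption; verify it in the docs linked below ("Read more").
general_settings:
  maximum_spend_logs_retention_period: "30d"  # delete spend logs older than 30 days
```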

Read more

PII Masking 2.0​

This release brings improvements to our Presidio PII Integration. As a Proxy Admin, you now have the ability to:

  • Mask or block specific entities (e.g., block medical licenses while masking other entities like emails); see the config sketch after this list.
  • Monitor guardrails in production. LiteLLM Logs will now show you the guardrail run, the entities it detected, and the confidence score for each entity.
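
For illustration, the same per-entity behavior can be pinned in the proxy's guardrails config. A minimal sketch is below; the field names (in particular pii_entities_config) are assumptions based on the "Allow specifying PII Entities Config" change in this release, so confirm them against the docs linked below.

```yaml
# Hedged sketch: Presidio guardrail with per-entity actions.
# Field names are assumptions; verify against the PII masking docs linked below.
guardrails:
  - guardrail_name: "presidio-pii"
    litellm_params:
      guardrail: presidio
      mode: "pre_call"
      pii_entities_config:
        MEDICAL_LICENSE: "BLOCK"  # reject requests containing medical licenses
        EMAIL_ADDRESS: "MASK"     # mask emails before they reach the model
        CREDIT_CARD: "MASK"
```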

Read more

New Models / Updated Models​

  • Gemini (VertexAI + Google AI Studio)
    • /chat/completion
      • Handle audio input - PR
      • Fix maximum recursion depth issue with deeply nested response schemas on Vertex AI by increasing DEFAULT_MAX_RECURSE_DEPTH from 10 to 100 in constants - PR
      • Capture reasoning tokens in streaming mode - PR
  • Google AI Studio
    • /realtime
      • Gemini Multimodal Live API support
      • Audio input/output support, optional param mapping, accurate usage calculation - PR
  • VertexAI
    • /chat/completion
      • Fix Llama streaming error where the model response was nested in the returned streaming chunk - PR
  • Ollama
    • /chat/completion
      • Structured responses fix - PR
  • Bedrock
    • /chat/completion
      • Handle thinking_blocks when assistant.content is None - PR
      • Only allow accepted fields in the tool JSON schema - PR
      • Add Bedrock Sonnet prompt caching cost information
      • Mistral Pixtral support - PR
      • Tool caching support - PR
    • /messages
      • Allow using dynamic AWS params - PR
  • Nvidia NIM
    • /chat/completion
      • Add tools, tool_choice, parallel_tool_calls support - PR
  • Novita AI
    • New Provider added for /chat/completion routes - PR
  • Azure
  • Cohere
    • /embeddings
      • Migrate embedding to use /v2/embed - adds support for output_dimensions param - PR
  • Anthropic
  • VLLM
  • OpenAI

LLM API Endpoints​

  • Responses API
    • Fix delete API support - PR
  • Rerank API
    • /v2/rerank now registered as 'llm_api_route' - enabling non-admins to call it - PR

Spend Tracking Improvements​

  • /chat/completion, /messages
    • Anthropic - web search tool cost tracking - PR
    • Groq - update model max tokens + cost information - PR
  • /audio/transcription
    • Azure - Add gpt-4o-mini-tts pricing - PR
    • Proxy - Fix tracking spend by tag - PR
  • /embeddings
    • Azure AI - Add cohere embed v4 pricing - PR

Management Endpoints / UI​

Logging / Alerting Integrations​

Guardrails​

  • Guardrails
    • New /apply_guardrail endpoint for directly testing a guardrail - PR
  • Lakera
    • /v2 endpoints support - PR
  • Presidio
    • Fix handling of message content in the Presidio guardrail integration - PR
    • Allow specifying PII Entities Config - PR
  • Aim Security
    • Support for anonymization in AIM Guardrails - PR

Performance / Loadbalancing / Reliability improvements​

General Proxy Improvements​

  • Authentication
    • Handle Bearer $LITELLM_API_KEY in the x-litellm-api-key custom header (see the sketch after this list) - PR
  • New enterprise pip package, litellm-enterprise - fixes an issue where the enterprise folder was not found when using the pip package
  • Proxy CLI
    • Add models import command - PR
  • OpenWebUI
    • Configure LiteLLM to Parse User Headers from Open Web UI
  • LiteLLM Proxy w/ LiteLLM SDK
    • Option to force/always use the litellm proxy when calling via LiteLLM SDK
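
Relating to the authentication item above, here is a minimal sketch of a client authenticating through the x-litellm-api-key header (Bearer prefix included) rather than Authorization; the proxy URL and model name are placeholders.

```python
# Hedged sketch: authenticate to the proxy via the x-litellm-api-key header,
# keeping the "Bearer " prefix, which this release now handles.
import os

import requests  # pip install requests

resp = requests.post(
    "http://localhost:4000/v1/chat/completions",  # your proxy URL (assumption)
    headers={"x-litellm-api-key": f"Bearer {os.environ['LITELLM_API_KEY']}"},
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```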

New Contributors​

Demo Instance​

Here's a Demo Instance to test changes:

Git Diff​