Vertex AI Live API WebSocket Passthrough

LiteLLM now supports WebSocket passthrough for the Vertex AI Live API, enabling real-time bidirectional communication with Gemini models.

Overview

The Vertex AI Live API WebSocket passthrough allows you to:

  • Connect to Vertex AI Live API through LiteLLM proxy
  • Use existing Vertex AI authentication methods
  • Pass through all WebSocket messages bidirectionally
  • Support text, audio, video, and multimodal interactions
  • Track costs automatically for all usage types

Configuration

Environment Variables

Set the following environment variables for Vertex AI authentication:

# Required
DEFAULT_VERTEXAI_PROJECT=your-project-id
DEFAULT_VERTEXAI_LOCATION=us-central1

# Optional - use one of these for authentication
DEFAULT_GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# OR run: gcloud auth application-default login

Configuration File

Alternatively, configure in your config.yaml:

litellm_settings:
  default_vertex_config:
    vertex_project: "your-project-id"
    vertex_location: "us-central1"
    vertex_credentials: "os.environ/GOOGLE_APPLICATION_CREDENTIALS"
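
Then start the proxy pointing at this file:

litellm --config config.yaml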

Usage

WebSocket Endpoints

  • ws://your-proxy-host/v1/vertex-ai/live
  • ws://your-proxy-host/vertex-ai/live

Query Parameters

  • project_id (optional): Google Cloud project ID (can be set in config)
  • location (optional): Vertex AI location (can be set in config, default: us-central1)

Example Connection

// If project_id and location are set in config, you can connect without query params
const ws = new WebSocket('ws://localhost:4000/v1/vertex-ai/live');

// Or specify them explicitly
const ws = new WebSocket('ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id&location=us-central1');

Cost Tracking

The WebSocket passthrough automatically tracks costs for all usage types, based on Vertex AI pricing:

Supported Cost Tracking

  • Text: Character-based or token-based pricing depending on model
  • Audio: Per-second pricing for audio input/output
  • Video: Per-second pricing for video input
  • Images: Per-image pricing for image input

Cost Calculation

Costs are calculated using the same methods as for other Vertex AI models in LiteLLM:

  • Uses cost_per_character for Gemini models
  • Uses cost_per_token for partner models (Claude, Llama, etc.)
  • Includes audio, video, and image costs when applicable
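
To make the arithmetic concrete, here is a minimal sketch of a character- and duration-based estimate. The per-unit rates below are hypothetical placeholders; LiteLLM reads the actual prices from its model cost map.

# Minimal cost-estimation sketch. All rates are hypothetical placeholders;
# LiteLLM pulls real per-unit prices from its model cost map.
INPUT_COST_PER_CHAR = 1.25e-6    # hypothetical $/input character
OUTPUT_COST_PER_CHAR = 3.75e-6   # hypothetical $/output character
AUDIO_COST_PER_SECOND = 1.0e-4   # hypothetical $/second of audio

def estimate_session_cost(input_chars: int, output_chars: int,
                          audio_seconds: float = 0.0) -> float:
    """Estimate the total cost of one Live API session."""
    return (input_chars * INPUT_COST_PER_CHAR
            + output_chars * OUTPUT_COST_PER_CHAR
            + audio_seconds * AUDIO_COST_PER_SECOND)

# e.g. 1,200 input characters, 800 output characters, 45 s of audio
print(f"${estimate_session_cost(1200, 800, 45.0):.6f}")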

Cost Logging

Costs are automatically logged to:

  • LiteLLM proxy logs
  • Database (if configured)
  • Spend tracking system
  • Admin dashboard

Example log output:

Vertex AI Live WebSocket session cost: $0.001234 (input: $0.000800, output: $0.000434) tokens: 150, characters: 1200, duration: 45.2s

API Reference

Setup Message

Send this message first to initialize the session:

{
  "setup": {
    "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
    "generation_config": {
      "response_modalities": ["TEXT"]
    }
  }
}
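
The server replies with a setupComplete message once the session is ready; wait for it before sending content (both client examples below do this).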

Text Input

{
  "client_content": {
    "turns": [
      {
        "role": "user",
        "parts": [{"text": "Hello! How are you?"}]
      }
    ],
    "turn_complete": true
  }
}

Audio Input

{
  "realtime_input": {
    "media_chunks": [
      {
        "data": "base64-encoded-audio-data",
        "mime_type": "audio/pcm"
      }
    ]
  }
}
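
As a sketch, the Python snippet below builds such a message from raw 16-bit PCM bytes; the input.pcm path is a placeholder.

import base64
import json

# Read raw 16-bit PCM audio; "input.pcm" is a placeholder path.
with open("input.pcm", "rb") as f:
    pcm_bytes = f.read()

message = {
    "realtime_input": {
        "media_chunks": [
            {
                "data": base64.b64encode(pcm_bytes).decode("utf-8"),
                "mime_type": "audio/pcm",
            }
        ]
    }
}
payload = json.dumps(message)  # send this string over the WebSocket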

Supported Features

Response Modalities

  • TEXT: Text responses
  • AUDIO: Audio responses with voice synthesis

Tools

  • Function Calling: Define and use custom functions (see the setup sketch after this list)
  • Code Execution: Execute Python code
  • Google Search: Search the web
  • Voice Activity Detection: Detect when the user is speaking
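
Tools are declared in the setup message. The sketch below follows the Live API's tools field; the get_weather declaration is a hypothetical example.

{
  "setup": {
    "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
    "generation_config": {"response_modalities": ["TEXT"]},
    "tools": [
      {
        "function_declarations": [
          {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "OBJECT",
              "properties": {"city": {"type": "STRING"}},
              "required": ["city"]
            }
          }
        ]
      },
      {"google_search": {}},
      {"code_execution": {}}
    ]
  }
}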

Advanced Features

  • Audio Transcription: Transcribe input and output audio
  • Proactive Audio: Model responds only when relevant
  • Affective Dialog: Understand emotional expressions

Examples

Python Client

import asyncio
import json
import websockets

async def chat_with_gemini():
    uri = "ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id"

    async with websockets.connect(uri) as websocket:
        # Setup
        setup = {
            "setup": {
                "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
                "generation_config": {"response_modalities": ["TEXT"]}
            }
        }
        await websocket.send(json.dumps(setup))

        # Wait for setup response
        response = await websocket.recv()
        print(f"Setup: {response}")

        # Send message
        message = {
            "client_content": {
                "turns": [{"role": "user", "parts": [{"text": "Hello!"}]}],
                "turn_complete": True
            }
        }
        await websocket.send(json.dumps(message))

        # Receive responses until the turn is complete
        async for response in websocket:
            print(f"Response: {response}")
            data = json.loads(response)
            if data.get("serverContent", {}).get("turnComplete"):
                break

asyncio.run(chat_with_gemini())

JavaScript Client

const ws = new WebSocket('ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id');

ws.onopen = function() {
  // Send setup
  const setup = {
    setup: {
      model: "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
      generation_config: { response_modalities: ["TEXT"] }
    }
  };
  ws.send(JSON.stringify(setup));
};

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Received:', data);

  // Once setup is complete, send a message
  if (data.setupComplete) {
    const message = {
      client_content: {
        turns: [{ role: "user", parts: [{ text: "Hello!" }] }],
        turn_complete: true
      }
    };
    ws.send(JSON.stringify(message));
  }
};

Error Handling

The WebSocket connection may close with these codes; a client-side handling sketch follows the list:

  • 4001: Vertex AI credentials not configured
  • 4002: Project ID not provided
  • 1011: Internal server error
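
A minimal sketch using the Python websockets library (the attribute exposing the close frame varies slightly across library versions):

import asyncio
import websockets

async def run():
    uri = "ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id"
    try:
        async with websockets.connect(uri) as ws:
            async for msg in ws:
                print(msg)
    except websockets.exceptions.ConnectionClosed as exc:
        # Newer websockets versions expose the received close frame as `rcvd`
        code = exc.rcvd.code if exc.rcvd else None
        if code == 4001:
            print("Vertex AI credentials not configured on the proxy")
        elif code == 4002:
            print("Project ID not provided")
        elif code == 1011:
            print("Internal server error on the proxy")
        else:
            print(f"Connection closed with code {code}")

asyncio.run(run())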

Authentication

The WebSocket passthrough uses the same authentication as other LiteLLM endpoints:

  1. API Key: Pass an Authorization: Bearer your-api-key header (see the sketch after this list)
  2. Vertex AI Credentials: Set environment variables or config file
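
A minimal sketch of passing the API key from Python; note the header keyword differs across websockets versions:

import asyncio
import websockets

async def connect_authenticated():
    headers = {"Authorization": "Bearer your-api-key"}
    # Older websockets releases use `extra_headers=` instead of
    # `additional_headers=`.
    async with websockets.connect(
        "ws://localhost:4000/v1/vertex-ai/live",
        additional_headers=headers,
    ) as ws:
        # Proceed with the setup message as shown above
        ...

asyncio.run(connect_authenticated())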

Limitations

  • Requires valid Google Cloud project with Vertex AI API enabled
  • WebSocket connections are not persistent across server restarts
  • Rate limits apply based on your Google Cloud quotas

Troubleshooting

Common Issues

  1. Authentication Error: Ensure Vertex AI credentials are properly configured
  2. Project Not Found: Verify the project ID exists and has Vertex AI enabled
  3. Connection Refused: Check that the LiteLLM proxy server is running

Debug Mode

Enable debug logging to see detailed connection information:

export LITELLM_LOG=DEBUG
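
Alternatively, start the proxy with its verbose flag:

litellm --config config.yaml --detailed_debug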