# Vertex AI Live API WebSocket Passthrough
LiteLLM now supports WebSocket passthrough for the Vertex AI Live API, enabling real-time bidirectional communication with Gemini models.
## Overview
The Vertex AI Live API WebSocket passthrough allows you to:
- Connect to Vertex AI Live API through LiteLLM proxy
- Use existing Vertex AI authentication methods
- Pass through all WebSocket messages bidirectionally
- Support text, audio, video, and multimodal interactions
- Track costs automatically for all usage types
## Configuration

### Environment Variables

Set the following environment variables for Vertex AI authentication:
```bash
# Required
DEFAULT_VERTEXAI_PROJECT=your-project-id
DEFAULT_VERTEXAI_LOCATION=us-central1

# Optional - use one of these for authentication
DEFAULT_GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# OR run: gcloud auth application-default login
```
### Configuration File

Alternatively, configure in your `config.yaml`:
```yaml
litellm_settings:
  default_vertex_config:
    vertex_project: "your-project-id"
    vertex_location: "us-central1"
    vertex_credentials: "os.environ/GOOGLE_APPLICATION_CREDENTIALS"
```
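With the config file in place, start the proxy with `litellm --config config.yaml` and the passthrough will use these credentials.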
## Usage

### WebSocket Endpoints

```
ws://your-proxy-host/v1/vertex-ai/live
ws://your-proxy-host/vertex-ai/live
```
### Query Parameters

- `project_id` (optional): Google Cloud project ID (can be set in config)
- `location` (optional): Vertex AI location (can be set in config; default: `us-central1`)
### Example Connection

```javascript
// If project_id and location are set in config, you can connect without query params
const ws = new WebSocket('ws://localhost:4000/v1/vertex-ai/live');

// Or specify them explicitly
const ws = new WebSocket('ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id&location=us-central1');
```
## Cost Tracking

The WebSocket passthrough automatically tracks costs for all usage types, based on Vertex AI pricing:

### Supported Cost Tracking
- Text: Character-based or token-based pricing depending on model
- Audio: Per-second pricing for audio input/output
- Video: Per-second pricing for video input
- Images: Per-image pricing for image input
### Cost Calculation

Costs are calculated using the same methods as other Vertex AI models in LiteLLM:

- Uses `cost_per_character` for Gemini models
- Uses `cost_per_token` for partner models (Claude, Llama, etc.)
- Includes audio, video, and image costs when applicable
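For intuition, here is a minimal sketch of the character-based arithmetic; the rates below are placeholders, not actual Vertex AI prices:

```python
# Placeholder per-character rates -- NOT actual Vertex AI prices; look up
# the current rates for your model before relying on these numbers.
INPUT_COST_PER_CHAR = 1.25e-6
OUTPUT_COST_PER_CHAR = 3.75e-6

def estimate_text_cost(input_chars: int, output_chars: int) -> float:
    """Character-based cost arithmetic, as applied to Gemini text."""
    return input_chars * INPUT_COST_PER_CHAR + output_chars * OUTPUT_COST_PER_CHAR

# e.g. 1200 input characters + 800 output characters
print(f"${estimate_text_cost(1200, 800):.6f}")  # $0.004500
```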
### Cost Logging
Costs are automatically logged to:
- LiteLLM proxy logs
- Database (if configured)
- Spend tracking system
- Admin dashboard
Example log output:

```
Vertex AI Live WebSocket session cost: $0.001234 (input: $0.000800, output: $0.000434) tokens: 150, characters: 1200, duration: 45.2s
```
## API Reference

### Setup Message

Send this message first to initialize the session:
```json
{
  "setup": {
    "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
    "generation_config": {
      "response_modalities": ["TEXT"]
    }
  }
}
```
### Text Input
```json
{
  "client_content": {
    "turns": [
      {
        "role": "user",
        "parts": [{"text": "Hello! How are you?"}]
      }
    ],
    "turn_complete": true
  }
}
```
### Audio Input
```json
{
  "realtime_input": {
    "media_chunks": [
      {
        "data": "base64-encoded-audio-data",
        "mime_type": "audio/pcm"
      }
    ]
  }
}
```
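For programmatic use, a minimal sketch of building this message from raw PCM bytes (the 16-bit / 16 kHz mono assumption reflects common Live API input and may differ for your model):

```python
import base64
import json

def audio_chunk_message(pcm_bytes: bytes) -> str:
    """Wrap raw PCM audio in a realtime_input message like the one above.

    16-bit, 16 kHz mono PCM is a common Live API input format; confirm the
    expected sample rate for your model.
    """
    return json.dumps({
        "realtime_input": {
            "media_chunks": [
                {
                    "data": base64.b64encode(pcm_bytes).decode("ascii"),
                    "mime_type": "audio/pcm",
                }
            ]
        }
    })

# Usage (inside an open session): await websocket.send(audio_chunk_message(chunk))
```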
## Supported Features

### Response Modalities
- TEXT: Text responses
- AUDIO: Audio responses with voice synthesis (see the sketch after this list)
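A hedged sketch of requesting audio output in the setup message; the `speech_config` shape and the voice name `Puck` follow the public Gemini Live API and are assumptions here:

```python
# Assumed speech_config shape, following the public Gemini Live API docs.
setup_audio = {
    "setup": {
        "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
        "generation_config": {
            "response_modalities": ["AUDIO"],
            "speech_config": {
                "voice_config": {"prebuilt_voice_config": {"voice_name": "Puck"}}
            },
        },
    }
}
```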
### Tools

- Function Calling: Define and use custom functions (see the sketch after this list)
- Code Execution: Execute Python code
- Google Search: Search the web
- Voice Activity Detection: Detect when the user is speaking
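As referenced above, function tools are declared in the setup message. A minimal sketch, assuming the `tools`/`function_declarations` shape from the public Gemini API; the `get_weather` function is hypothetical:

```python
# Assumed tool declaration shape, following the public Gemini API.
setup_with_tools = {
    "setup": {
        "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
        "generation_config": {"response_modalities": ["TEXT"]},
        "tools": [
            {
                "function_declarations": [
                    {
                        "name": "get_weather",  # hypothetical function
                        "description": "Look up current weather for a city",
                        "parameters": {
                            "type": "OBJECT",
                            "properties": {"city": {"type": "STRING"}},
                            "required": ["city"],
                        },
                    }
                ]
            }
        ],
    }
}
```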
### Advanced Features

- Audio Transcription: Transcribe input and output audio (see the sketch after this list)
- Proactive Audio: Model responds only when relevant
- Affective Dialog: Understand emotional expressions
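As noted in the first item above, transcription is enabled in the setup message. A hedged sketch; the `input_audio_transcription` / `output_audio_transcription` field names follow the public Gemini Live API and are assumptions here:

```python
# Assumed setup fields for audio transcription, per the public Gemini Live API;
# verify against the Live API reference for your model version.
setup_with_transcription = {
    "setup": {
        "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
        "generation_config": {"response_modalities": ["AUDIO"]},
        "input_audio_transcription": {},   # transcribe what the user says
        "output_audio_transcription": {},  # transcribe what the model says
    }
}
```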
## Examples

### Python Client
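The example below uses the third-party `websockets` package (`pip install websockets`).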
```python
import asyncio
import json
import websockets

async def chat_with_gemini():
    uri = "ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id"

    async with websockets.connect(uri) as websocket:
        # Setup
        setup = {
            "setup": {
                "model": "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
                "generation_config": {"response_modalities": ["TEXT"]}
            }
        }
        await websocket.send(json.dumps(setup))

        # Wait for setup response
        response = await websocket.recv()
        print(f"Setup: {response}")

        # Send message
        message = {
            "client_content": {
                "turns": [{"role": "user", "parts": [{"text": "Hello!"}]}],
                "turn_complete": True
            }
        }
        await websocket.send(json.dumps(message))

        # Receive response
        async for response in websocket:
            print(f"Response: {response}")

            # Check if turn is complete
            data = json.loads(response)
            if data.get("serverContent", {}).get("turnComplete"):
                break

asyncio.run(chat_with_gemini())
```
### JavaScript Client
```javascript
const ws = new WebSocket('ws://localhost:4000/v1/vertex-ai/live?project_id=your-project-id');

ws.onopen = function() {
  // Send setup
  const setup = {
    setup: {
      model: "projects/your-project-id/locations/us-central1/publishers/google/models/gemini-2.0-flash-live-preview-04-09",
      generation_config: { response_modalities: ["TEXT"] }
    }
  };
  ws.send(JSON.stringify(setup));
};

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Received:', data);

  // Check if setup is complete
  if (data.setupComplete) {
    // Send a message
    const message = {
      client_content: {
        turns: [{ role: "user", parts: [{ text: "Hello!" }] }],
        turn_complete: true
      }
    };
    ws.send(JSON.stringify(message));
  }
};
```
## Error Handling

The WebSocket connection may close with these codes:

- `4001`: Vertex AI credentials not configured
- `4002`: Project ID not provided
- `1011`: Internal server error
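A minimal client-side sketch that surfaces these codes, using the `websockets` package (the hint strings are illustrative):

```python
import asyncio
import websockets

# Hints keyed by the close codes documented above.
CLOSE_CODE_HINTS = {
    4001: "Vertex AI credentials not configured -- check env vars or config.yaml",
    4002: "Project ID not provided -- pass ?project_id=... or set it in config",
    1011: "Internal server error -- check the LiteLLM proxy logs",
}

async def connect_with_diagnostics(uri: str) -> None:
    try:
        async with websockets.connect(uri) as ws:
            async for message in ws:  # normal session handling would go here
                print(message)
    except websockets.exceptions.ConnectionClosed as exc:
        # exc.rcvd holds the close frame received from the proxy, if any.
        code = exc.rcvd.code if exc.rcvd else 1006
        print(f"Closed with {code}: {CLOSE_CODE_HINTS.get(code, 'unexpected close code')}")

asyncio.run(connect_with_diagnostics("ws://localhost:4000/v1/vertex-ai/live"))
```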
## Authentication

The WebSocket passthrough uses the same authentication as other LiteLLM endpoints:

- API Key: Pass the `Authorization: Bearer your-api-key` header (see the sketch after this list)
- Vertex AI Credentials: Set environment variables or use the config file
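A sketch of sending the key from a Python client; note that the header parameter name varies across `websockets` releases, and browser WebSocket clients cannot set custom headers at all:

```python
import asyncio
import websockets

async def connect_with_key():
    # The "sk-..." key below is a placeholder for your LiteLLM API key.
    # Newer websockets releases name this parameter `additional_headers`;
    # older releases use `extra_headers`.
    async with websockets.connect(
        "ws://localhost:4000/v1/vertex-ai/live",
        additional_headers={"Authorization": "Bearer sk-your-litellm-key"},
    ) as ws:
        ...  # run the session here

asyncio.run(connect_with_key())
```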
## Limitations
- Requires valid Google Cloud project with Vertex AI API enabled
- WebSocket connections are not persistent across server restarts
- Rate limits apply based on your Google Cloud quotas
## Troubleshooting

### Common Issues
- Authentication Error: Ensure Vertex AI credentials are properly configured
- Project Not Found: Verify the project ID exists and has Vertex AI enabled
- Connection Refused: Check that the LiteLLM proxy server is running
### Debug Mode

Enable debug logging to see detailed connection information:

```bash
export LITELLM_LOG=DEBUG
```