
Vertex AI Text to Speech

| Property | Details |
|---|---|
| Description | Google Cloud Text-to-Speech with Chirp3 HD voices and Gemini TTS |
| Provider Route on LiteLLM | vertex_ai/chirp (Chirp), vertex_ai/gemini-*-tts (Gemini) |

Chirp3 HD Voices​

Google Cloud Text-to-Speech API with high-quality Chirp3 HD voices.

Quick Start​

LiteLLM Python SDK​

Chirp3 Quick Start
from litellm import speech
from pathlib import Path

speech_file_path = Path(__file__).parent / "speech.mp3"
response = speech(
    model="vertex_ai/chirp",
    voice="alloy",  # OpenAI voice name - automatically mapped
    input="Hello, this is Vertex AI Text to Speech",
    vertex_project="your-project-id",
    vertex_location="us-central1",
)
response.stream_to_file(speech_file_path)

LiteLLM AI Gateway​

1. Setup config.yaml

config.yaml
model_list:
  - model_name: vertex-tts
    litellm_params:
      model: vertex_ai/chirp
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json"

2. Start the proxy

Start LiteLLM Proxy
litellm --config /path/to/config.yaml

3. Make requests

Chirp3 Quick Start
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": "alloy",
"input": "Hello, this is Vertex AI Text to Speech"
}' \
--output speech.mp3
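
You can also call the proxy's /v1/audio/speech endpoint with the OpenAI Python SDK. A minimal sketch, assuming the proxy is running at http://0.0.0.0:4000 with the config above (the model name vertex-tts comes from config.yaml):

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

with client.audio.speech.with_streaming_response.create(
    model="vertex-tts",
    voice="alloy",  # mapped to a Google Cloud voice by LiteLLM
    input="Hello, this is Vertex AI Text to Speech",
) as response:
    response.stream_to_file("speech.mp3")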

Voice Mapping​

LiteLLM maps OpenAI voice names to Google Cloud voices. You can use either OpenAI voices or Google Cloud voices directly.

| OpenAI Voice | Google Cloud Voice |
|---|---|
| alloy | en-US-Studio-O |
| echo | en-US-Studio-M |
| fable | en-GB-Studio-B |
| onyx | en-US-Wavenet-D |
| nova | en-US-Studio-O |
| shimmer | en-US-Wavenet-F |
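
For example, the following two calls select the same underlying voice (a minimal sketch based on the mapping above):

from litellm import speech

# OpenAI-style name, mapped by LiteLLM to en-US-Studio-O
response = speech(
    model="vertex_ai/chirp",
    voice="alloy",
    input="Same voice, two names",
    vertex_project="your-project-id",
)

# Equivalent call using the Google Cloud voice name directly
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Studio-O",
    input="Same voice, two names",
    vertex_project="your-project-id",
)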

Using Google Cloud Voices Directly​

LiteLLM Python SDK​

Chirp3 HD Voice
from litellm import speech

# Pass Chirp3 HD voice name directly
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Chirp3-HD-Charon",
    input="Hello with a Chirp3 HD voice",
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")
Voice as Dict (Multilingual)
from litellm import speech

# Pass as dict for full control over language and voice
response = speech(
    model="vertex_ai/chirp",
    voice={
        "languageCode": "de-DE",
        "name": "de-DE-Chirp3-HD-Charon",
    },
    input="Hallo, dies ist ein Test",
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")

LiteLLM AI Gateway​

Chirp3 HD Voice
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": "en-US-Chirp3-HD-Charon",
"input": "Hello with a Chirp3 HD voice"
}' \
--output speech.mp3
Voice as Dict (Multilingual)
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": {"languageCode": "de-DE", "name": "de-DE-Chirp3-HD-Charon"},
"input": "Hallo, dies ist ein Test"
}' \
--output speech.mp3

Browse available voices: Google Cloud Text-to-Speech Console

Passing Raw SSML​

LiteLLM auto-detects SSML when your input contains <speak> tags and passes it through unchanged.

LiteLLM Python SDK​

SSML Input
from litellm import speech

ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Studio-O",
    input=ssml,  # Auto-detected as SSML
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")
Force SSML Mode
from litellm import speech

# Force SSML mode with use_ssml=True
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Studio-O",
    input="<speak><prosody rate='slow'>Speaking slowly</prosody></speak>",
    use_ssml=True,
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")

LiteLLM AI Gateway​

SSML Input
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": "en-US-Studio-O",
"input": "<speak><p>Hello!</p><break time=\"500ms\"/><p>How are you?</p></speak>"
}' \
--output speech.mp3

Supported Parameters​

| Parameter | Description | Values |
|---|---|---|
| voice | Voice selection | OpenAI voice, Google Cloud voice name, or dict |
| input | Text to convert | Plain text or SSML |
| speed | Speaking rate | 0.25 to 4.0 (default: 1.0) |
| response_format | Audio format | mp3, opus, wav, pcm, flac |
| use_ssml | Force SSML mode | True / False |
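
For example, speed and response_format can be passed directly to speech() alongside the other arguments (a minimal sketch; the parameter names follow the table above):

from litellm import speech

# Slower speech, returned as WAV instead of the default MP3
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Chirp3-HD-Charon",
    input="Speaking at three quarters speed",
    speed=0.75,             # speaking rate, 0.25 to 4.0
    response_format="wav",  # mp3, opus, wav, pcm, flac
    vertex_project="your-project-id",
)
response.stream_to_file("speech.wav")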

Async Usage​

Async Speech Generation
import asyncio
from litellm import aspeech

async def main():
    response = await aspeech(
        model="vertex_ai/chirp",
        voice="alloy",
        input="Hello from async",
        vertex_project="your-project-id",
    )
    response.stream_to_file("speech.mp3")

asyncio.run(main())

Gemini TTS​

Gemini models with audio output capabilities using the chat completions API.

warning

Limitations:

  • Only supports pcm16 audio format
  • Streaming not yet supported
  • Must set modalities: ["audio"]

Quick Start​

LiteLLM Python SDK​

Gemini TTS Quick Start
from litellm import completion
import json

# Load credentials
with open('path/to/service_account.json', 'r') as file:
    vertex_credentials = json.dumps(json.load(file))

response = completion(
    model="vertex_ai/gemini-2.5-flash-preview-tts",
    messages=[{"role": "user", "content": "Say hello in a friendly voice"}],
    modalities=["audio"],
    audio={
        "voice": "Kore",
        "format": "pcm16"
    },
    vertex_credentials=vertex_credentials
)
print(response)
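
The generated audio is returned inline in the completion response. A minimal sketch for saving it, assuming the response follows the OpenAI-style chat completions audio shape (base64-encoded bytes under choices[0].message.audio.data):

import base64

# Decode the base64 payload and write the raw PCM16 audio to disk
audio_b64 = response.choices[0].message.audio.data
with open("speech.pcm", "wb") as f:
    f.write(base64.b64decode(audio_b64))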

LiteLLM AI Gateway​

1. Setup config.yaml

config.yaml
model_list:
  - model_name: gemini-tts
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-preview-tts
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json"

2. Start the proxy

Start LiteLLM Proxy
litellm --config /path/to/config.yaml

3. Make requests

Gemini TTS Request
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-tts",
"messages": [{"role": "user", "content": "Say hello in a friendly voice"}],
"modalities": ["audio"],
"audio": {"voice": "Kore", "format": "pcm16"}
}'
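
The same request can be made with the OpenAI Python SDK pointed at the proxy (a minimal sketch; the modalities and audio parameters require a recent openai SDK version):

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gemini-tts",  # model_name from config.yaml
    messages=[{"role": "user", "content": "Say hello in a friendly voice"}],
    modalities=["audio"],
    audio={"voice": "Kore", "format": "pcm16"},
)
print(response)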

Supported Models​

  • vertex_ai/gemini-2.5-flash-preview-tts
  • vertex_ai/gemini-2.5-pro-preview-tts

See Gemini TTS documentation for available voices.

Advanced Usage​

Gemini TTS with System Prompt
from litellm import completion

response = completion(
    model="vertex_ai/gemini-2.5-pro-preview-tts",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that speaks clearly."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    modalities=["audio"],
    audio={"voice": "Charon", "format": "pcm16"},
    temperature=0.7,
    max_tokens=150,
    vertex_credentials=vertex_credentials  # loaded as in the Quick Start example above
)