
Vertex AI Text to Speech

| Property | Details |
|---|---|
| Description | Google Cloud Text-to-Speech with Chirp3 HD voices and Gemini TTS |
| Provider Route on LiteLLM | vertex_ai/chirp (Chirp), vertex_ai/gemini-*-tts (Gemini) |

Chirp3 HD Voices​

Google Cloud Text-to-Speech API with high-quality Chirp3 HD voices.

Quick Start​

LiteLLM Python SDK​

Chirp3 Quick Start
from litellm import speech
from pathlib import Path

speech_file_path = Path(__file__).parent / "speech.mp3"
response = speech(
    model="vertex_ai/chirp",
    voice="alloy",  # OpenAI voice name - automatically mapped
    input="Hello, this is Vertex AI Text to Speech",
    vertex_project="your-project-id",
    vertex_location="us-central1",
)
response.stream_to_file(speech_file_path)

LiteLLM AI Gateway​

1. Setup config.yaml

config.yaml
model_list:
  - model_name: vertex-tts
    litellm_params:
      model: vertex_ai/chirp
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json"

2. Start the proxy

Start LiteLLM Proxy
litellm --config /path/to/config.yaml

3. Make requests

Chirp3 Quick Start
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": "alloy",
"input": "Hello, this is Vertex AI Text to Speech"
}' \
--output speech.mp3
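
You can also call the proxy's /v1/audio/speech endpoint with the OpenAI Python SDK. A minimal sketch, assuming the proxy is running at http://0.0.0.0:4000 with the config above (the model name vertex-tts comes from config.yaml):

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

with client.audio.speech.with_streaming_response.create(
    model="vertex-tts",
    voice="alloy",  # mapped to a Google Cloud voice by LiteLLM
    input="Hello, this is Vertex AI Text to Speech",
) as response:
    response.stream_to_file("speech.mp3")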

Voice Mapping​

LiteLLM maps OpenAI voice names to Google Cloud voices. You can use either OpenAI voices or Google Cloud voices directly.

| OpenAI Voice | Google Cloud Voice |
|---|---|
| alloy | en-US-Studio-O |
| echo | en-US-Studio-M |
| fable | en-GB-Studio-B |
| onyx | en-US-Wavenet-D |
| nova | en-US-Studio-O |
| shimmer | en-US-Wavenet-F |
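
For example, the following two calls select the same underlying voice (a minimal sketch based on the mapping above):

from litellm import speech

# OpenAI-style name, mapped by LiteLLM to en-US-Studio-O
response = speech(
    model="vertex_ai/chirp",
    voice="alloy",
    input="Same voice, two names",
    vertex_project="your-project-id",
)

# Equivalent call using the Google Cloud voice name directly
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Studio-O",
    input="Same voice, two names",
    vertex_project="your-project-id",
)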

Using Google Cloud Voices Directly​

LiteLLM Python SDK​

Chirp3 HD Voice
from litellm import speech

# Pass Chirp3 HD voice name directly
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Chirp3-HD-Charon",
    input="Hello with a Chirp3 HD voice",
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")
Voice as Dict (Multilingual)
from litellm import speech

# Pass as dict for full control over language and voice
response = speech(
    model="vertex_ai/chirp",
    voice={
        "languageCode": "de-DE",
        "name": "de-DE-Chirp3-HD-Charon",
    },
    input="Hallo, dies ist ein Test",
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")

LiteLLM AI Gateway​

Chirp3 HD Voice
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": "en-US-Chirp3-HD-Charon",
"input": "Hello with a Chirp3 HD voice"
}' \
--output speech.mp3
Voice as Dict (Multilingual)
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": {"languageCode": "de-DE", "name": "de-DE-Chirp3-HD-Charon"},
"input": "Hallo, dies ist ein Test"
}' \
--output speech.mp3

Browse available voices: Google Cloud Text-to-Speech Console

Passing Raw SSML​

LiteLLM auto-detects SSML when your input contains <speak> tags and passes it through unchanged.

LiteLLM Python SDK​

SSML Input
from litellm import speech

ssml = """
<speak>
<p>Hello, world!</p>
<p>This is a test of the <break strength="medium" /> text-to-speech API.</p>
</speak>
"""

response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Studio-O",
    input=ssml,  # Auto-detected as SSML
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")
Force SSML Mode
from litellm import speech

# Force SSML mode with use_ssml=True
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Studio-O",
    input="<speak><prosody rate='slow'>Speaking slowly</prosody></speak>",
    use_ssml=True,
    vertex_project="your-project-id",
)
response.stream_to_file("speech.mp3")

LiteLLM AI Gateway​

SSML Input
curl http://0.0.0.0:4000/v1/audio/speech \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"model": "vertex-tts",
"voice": "en-US-Studio-O",
"input": "<speak><p>Hello!</p><break time=\"500ms\"/><p>How are you?</p></speak>"
}' \
--output speech.mp3

Supported Parameters​

| Parameter | Description | Values |
|---|---|---|
| voice | Voice selection | OpenAI voice, Google Cloud voice name, or dict |
| input | Text to convert | Plain text or SSML |
| speed | Speaking rate | 0.25 to 4.0 (default: 1.0) |
| response_format | Audio format | mp3, opus, wav, pcm, flac |
| use_ssml | Force SSML mode | True / False |
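
For example, speed and response_format can be passed directly to speech() alongside the other arguments (a minimal sketch; the parameter names follow the table above):

from litellm import speech

# Slower speech, returned as WAV instead of the default MP3
response = speech(
    model="vertex_ai/chirp",
    voice="en-US-Chirp3-HD-Charon",
    input="Speaking at three quarters speed",
    speed=0.75,             # speaking rate, 0.25 to 4.0
    response_format="wav",  # mp3, opus, wav, pcm, flac
    vertex_project="your-project-id",
)
response.stream_to_file("speech.wav")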

Async Usage​

Async Speech Generation
import asyncio
from litellm import aspeech

async def main():
    response = await aspeech(
        model="vertex_ai/chirp",
        voice="alloy",
        input="Hello from async",
        vertex_project="your-project-id",
    )
    response.stream_to_file("speech.mp3")

asyncio.run(main())

Gemini TTS​

Gemini models with audio output capabilities using the chat completions API.

warning

Limitations:

  • Only supports pcm16 audio format
  • Streaming not yet supported
  • Must set modalities: ["audio"]

Quick Start​

LiteLLM Python SDK​

Gemini TTS Quick Start
from litellm import completion
import json

# Load credentials
with open('path/to/service_account.json', 'r') as file:
    vertex_credentials = json.dumps(json.load(file))

response = completion(
    model="vertex_ai/gemini-2.5-flash-preview-tts",
    messages=[{"role": "user", "content": "Say hello in a friendly voice"}],
    modalities=["audio"],
    audio={
        "voice": "Kore",
        "format": "pcm16"
    },
    vertex_credentials=vertex_credentials
)
print(response)
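
The generated audio is returned inline in the completion response. A minimal sketch for saving it, assuming the response follows the OpenAI-style chat completions audio shape (base64-encoded bytes under choices[0].message.audio.data):

import base64

# Decode the base64 payload and write the raw PCM16 audio to disk
audio_b64 = response.choices[0].message.audio.data
with open("speech.pcm", "wb") as f:
    f.write(base64.b64decode(audio_b64))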

LiteLLM AI Gateway​

1. Setup config.yaml

config.yaml
model_list:
  - model_name: gemini-tts
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-preview-tts
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
      vertex_credentials: "/path/to/service_account.json"

2. Start the proxy

Start LiteLLM Proxy
litellm --config /path/to/config.yaml

3. Make requests

Gemini TTS Request
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gemini-tts",
"messages": [{"role": "user", "content": "Say hello in a friendly voice"}],
"modalities": ["audio"],
"audio": {"voice": "Kore", "format": "pcm16"}
}'
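
The same request can be made with the OpenAI Python SDK pointed at the proxy (a minimal sketch; the modalities and audio parameters require a recent openai SDK version):

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gemini-tts",  # model_name from config.yaml
    messages=[{"role": "user", "content": "Say hello in a friendly voice"}],
    modalities=["audio"],
    audio={"voice": "Kore", "format": "pcm16"},
)
print(response)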

Supported Models​

  • vertex_ai/gemini-2.5-flash-preview-tts
  • vertex_ai/gemini-2.5-pro-preview-tts

See Gemini TTS documentation for available voices.

Advanced Usage​

Gemini TTS with System Prompt
from litellm import completion

response = completion(
    model="vertex_ai/gemini-2.5-pro-preview-tts",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that speaks clearly."},
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    modalities=["audio"],
    audio={"voice": "Charon", "format": "pcm16"},
    temperature=0.7,
    max_tokens=150,
    vertex_credentials=vertex_credentials  # loaded as in the Quick Start example above
)