Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM
LiteLLM now supports multimodal embeddings with `gemini-embedding-2-preview`, mixing text, images, audio, video, and PDF content in a single request. Available via both the Gemini API (API key) and Vertex AI (GCP credentials).
- Gemini API (`gemini/...`): each input element returns its own embedding, indexed `0..N-1`, the same shape as OpenAI's `/embeddings`. LiteLLM routes to the `batchEmbedContents` endpoint with one `EmbedContentRequest` per input.
- Vertex AI (`vertex_ai/...`): all input elements are combined into a single unified embedding via `embedContent`. Vertex AI does not expose `batchEmbedContents` for Gemini embedding models, so N parts → 1 vector. To get one vector per item, call `embedding(...)` once per input.
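The shape difference matters when unpacking results. A minimal sketch of response handling, using hand-written dicts that mimic the OpenAI-compatible response shape (not real API output):

```python
# Both providers return an OpenAI-style embedding response, but the number of
# items in `data` differs. The dicts below are mocks of that shape.

def vectors(response: dict) -> list[list[float]]:
    """Extract embedding vectors in input order from an OpenAI-style response."""
    return [item["embedding"] for item in sorted(response["data"], key=lambda d: d["index"])]

# Gemini API path: 2 inputs -> 2 embeddings, indexed 0..N-1
gemini_resp = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
    ],
}

# Vertex AI path: the same 2 inputs fused -> 1 combined embedding
vertex_resp = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.25, 0.35]}],
}

assert len(vectors(gemini_resp)) == 2
assert len(vectors(vertex_resp)) == 1
```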
Supported Input Types
| Modality | Supported Formats |
|---|---|
| Text | Plain text |
| Image | PNG, JPEG |
| Audio | MP3, WAV |
| Video | MP4, MOV |
| Documents | PDF |
Input Formatsβ
LiteLLM accepts three input formats for multimodal content:
- Data URIs: Base64-encoded inline, e.g. `data:image/png;base64,<encoded_data>`
- GCS URLs: Cloud Storage paths (Vertex AI), e.g. `gs://bucket/path/to/file.png`
- Gemini File References: pre-uploaded files (Gemini API), e.g. `files/abc123`
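For local files, the data-URI form is usually built by Base64-encoding the raw bytes. A small illustrative helper (`to_data_uri` is not a LiteLLM function, just a sketch):

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a data URI suitable as a multimodal input element."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Toy bytes (the 8-byte PNG magic signature) standing in for a real image file.
uri = to_data_uri(b"\x89PNG\r\n\x1a\n", "image/png")
assert uri.startswith("data:image/png;base64,")
```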
Quick Start
**Gemini API**

```python
from litellm import embedding
import os

os.environ["GEMINI_API_KEY"] = "your-api-key"

# Text + Image (base64)
response = embedding(
    model="gemini/gemini-embedding-2-preview",
    input=[
        "The food was delicious and the waiter...",
        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ],
)
print(response)
```
**Vertex AI**

```python
import litellm
from litellm import embedding

litellm.vertex_project = "your-project-id"
litellm.vertex_location = "us-central1"

# Text + Image (GCS URL)
response = embedding(
    model="vertex_ai/gemini-embedding-2-preview",
    input=[
        "Describe this image",
        "gs://my-bucket/images/photo.png"
    ],
)
print(response)
```
**LiteLLM Proxy**

1. Config (`config.yaml`)

```yaml
model_list:
  - model_name: gemini-embedding-2-preview
    litellm_params:
      model: gemini/gemini-embedding-2-preview
      api_key: os.environ/GEMINI_API_KEY
  - model_name: vertex-gemini-embedding-2-preview
    litellm_params:
      model: vertex_ai/gemini-embedding-2-preview
      vertex_project: os.environ/VERTEXAI_PROJECT
      vertex_location: os.environ/VERTEXAI_LOCATION

general_settings:
  master_key: sk-1234
```

2. Start the proxy

```shell
litellm --config config.yaml
```
3. Call embeddings

```shell
curl -X POST http://localhost:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-2-preview",
    "input": [
      "The food was delicious and the waiter...",
      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ]
  }'
```
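Because the proxy exposes the OpenAI-compatible `/embeddings` route, the curl body is plain JSON that any HTTP client can send. A sketch that constructs the same request body in Python (`<encoded_data>` is a placeholder, not real Base64 content):

```python
import json

# Same request body as the curl example above.
payload = {
    "model": "gemini-embedding-2-preview",
    "input": [
        "The food was delicious and the waiter...",
        "data:image/png;base64,<encoded_data>",
    ],
}
body = json.dumps(payload)

# POST `body` to http://localhost:4000/embeddings with headers:
#   Authorization: Bearer sk-1234
#   Content-Type: application/json
# (e.g. via requests.post or the OpenAI SDK pointed at the proxy).
```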
Input Format Examples
| Format | Example | Provider |
|---|---|---|
| Data URI | `data:image/png;base64,...` | Gemini, Vertex AI |
| GCS URL | `gs://bucket/path/image.png` | Vertex AI |
| File reference | `files/abc123` | Gemini API only |
Supported MIME Types for Data URIs
- Images: `image/png`, `image/jpeg`
- Audio: `audio/mpeg`, `audio/wav`
- Video: `video/mp4`, `video/quicktime`
- Documents: `application/pdf`
GCS URL MIME Inference
For Vertex AI, MIME types are inferred from file extensions:
- `.png` → `image/png`
- `.jpg` / `.jpeg` → `image/jpeg`
- `.mp3` → `audio/mpeg`
- `.wav` → `audio/wav`
- `.mp4` → `video/mp4`
- `.mov` → `video/quicktime`
- `.pdf` → `application/pdf`
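The mapping above can be sketched as a simple lookup keyed on the file extension (illustrative only, not LiteLLM's actual implementation):

```python
from pathlib import Path

# Extension -> MIME mapping for GCS URLs, as described above.
GCS_MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".pdf": "application/pdf",
}

def infer_gcs_mime(gcs_url: str) -> str:
    """Infer the MIME type of a gs:// URL from its file extension."""
    ext = Path(gcs_url).suffix.lower()
    try:
        return GCS_MIME_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported extension for GCS input: {ext!r}")

assert infer_gcs_mime("gs://my-bucket/images/photo.PNG") == "image/png"
```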
Optional Parameters
| Parameter | Description | Maps to |
|---|---|---|
| `dimensions` | Output embedding size | `outputDimensionality` |
```python
response = embedding(
    model="gemini/gemini-embedding-2-preview",
    input=["text to embed"],
    dimensions=768,  # Optional: control output vector size
)
```
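Vectors are only comparable when they come from requests with the same `dimensions`. A toy sketch of the usual cosine-similarity comparison, with hand-picked 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; real embeddings would have e.g. 768 components.
assert abs(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]) - 1.0) < 1e-9
```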
Combined Embeddings (Gemini API, opt-in)
By default the Gemini API path returns one embedding per input element (OpenAI-compatible). To fuse several modalities into a single vector (e.g., a product represented by its name + photo), wrap them in a nested list:
```python
from litellm import embedding

# Default: 2 inputs -> 2 separate embeddings
embedding(
    model="gemini/gemini-embedding-2-preview",
    input=["a red shoe", "data:image/png;base64,..."],
)

# Combined: text + image fused into 1 embedding
embedding(
    model="gemini/gemini-embedding-2-preview",
    input=[["a red shoe", "data:image/png;base64,..."]],
)

# Mixed: 1 combined entity + 1 plain text -> 2 embeddings total
embedding(
    model="gemini/gemini-embedding-2-preview",
    input=[["a red shoe", "data:image/png;base64,..."], "just text"],
)
```
Useful for multi-modal retrieval where a single entity has more than one modality. See the embedding docs for details. On Vertex AI this opt-in is unnecessary: every request already returns one combined vector.
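For a whole catalog of such entities, the nested-list input can be built programmatically. An illustrative helper (not part of LiteLLM) that shapes (name, image) records into one fused-embedding request element per record:

```python
# `combined_inputs` is a hypothetical helper, not a LiteLLM API.
def combined_inputs(products: list[tuple[str, str]]) -> list[list[str]]:
    """Wrap each (name, image_uri) record in its own inner list, so the
    Gemini API path returns one fused embedding per record."""
    return [[name, image_uri] for name, image_uri in products]

catalog = [
    ("a red shoe", "data:image/png;base64,<encoded_data>"),
    ("a blue hat", "data:image/png;base64,<encoded_data>"),
]
inputs = combined_inputs(catalog)
assert len(inputs) == 2  # one fused embedding will come back per product
```

Pass the result as `input=inputs` to `embedding(...)` with a `gemini/` model.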
