2 posts tagged with "multimodal"

Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)

LiteLLM now supports multimodal embeddings with gemini-embedding-2-preview, so you can mix text, images, audio, video, and PDF content in a single request. The model is available through both the Gemini API (API key) and Vertex AI (GCP credentials).
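A minimal sketch of how the two routes differ when calling LiteLLM's `embedding()` function: only the model prefix and the credentials change. It assumes the Gemini API key is read from `GEMINI_API_KEY` and that Vertex AI accepts optional `vertex_project` / `vertex_location` overrides. `embedding_kwargs` is a hypothetical helper, not part of LiteLLM, and only text inputs are used because the encoding for media parts is not specified here.

```python
from __future__ import annotations

# Hypothetical helper (not part of LiteLLM): builds keyword arguments for
# litellm.embedding() so the same inputs can be sent through either route.
MODEL_NAME = "gemini-embedding-2-preview"


def embedding_kwargs(
    route: str,
    inputs: list[str],
    project: str | None = None,
    location: str | None = None,
) -> dict:
    """Return kwargs for litellm.embedding() on the "gemini" or "vertex_ai" route."""
    if route == "gemini":
        # Gemini API: API-key auth (assumed to be read from GEMINI_API_KEY).
        return {"model": f"gemini/{MODEL_NAME}", "input": inputs}
    if route == "vertex_ai":
        # Vertex AI: GCP credentials (e.g. Application Default Credentials).
        kwargs = {"model": f"vertex_ai/{MODEL_NAME}", "input": inputs}
        # Optional project/region overrides; when omitted, LiteLLM is assumed
        # to fall back to its own configuration or environment.
        if project is not None:
            kwargs["vertex_project"] = project
        if location is not None:
            kwargs["vertex_location"] = location
        return kwargs
    raise ValueError(f"unsupported route: {route!r}")
```

With LiteLLM installed and credentials configured, a call would look like `litellm.embedding(**embedding_kwargs("gemini", ["a red bicycle", "a blue car"]))`.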

Response shape differs by provider
  • Gemini API (gemini/...): each input element returns its own embedding, indexed 0..N-1, the same shape as OpenAI's /embeddings. LiteLLM routes the request to the batchEmbedContents endpoint with one EmbedContentRequest per input.
  • Vertex AI (vertex_ai/...): all input elements are combined into a single unified embedding via embedContent. Vertex AI does not expose batchEmbedContents for Gemini embedding models, so N input parts produce 1 vector. To get one vector per item, call embedding(...) once per input.
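The per-item workaround for Vertex AI can be sketched as a small wrapper that always returns one vector per input, whichever route is used. This is a sketch under stated assumptions: `embed_per_input` is a hypothetical helper, the embedding function is injected (pass `litellm.embedding` in real use), and response items are assumed to expose `index` and `embedding` either as dict keys or as attributes.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence


def _field(item: Any, name: str) -> Any:
    """Read a field from a response item that may be a dict or an object."""
    return item[name] if isinstance(item, dict) else getattr(item, name)


def embed_per_input(
    model: str,
    inputs: Sequence[Any],
    embed_fn: Callable[..., Any],
    **kwargs: Any,
) -> list[list[float]]:
    """Return exactly one embedding vector per element of `inputs`.

    `embed_fn` should behave like litellm.embedding(model=..., input=[...]) and
    return an object whose `.data` is a list of items with "index" and "embedding".
    Extra keyword arguments (e.g. vertex_project) are forwarded unchanged.
    """
    if model.startswith("vertex_ai/"):
        # Vertex AI merges all parts of one request into a single vector,
        # so send each input on its own and keep the one vector it returns.
        return [
            _field(embed_fn(model=model, input=[item], **kwargs).data[0], "embedding")
            for item in inputs
        ]

    # Gemini API: one batched call already yields one item per input.
    data = list(embed_fn(model=model, input=list(inputs), **kwargs).data)
    if len(data) != len(inputs):
        raise ValueError(f"expected {len(inputs)} embeddings, got {len(data)}")
    # Order by "index" so that vector i corresponds to inputs[i].
    data.sort(key=lambda item: _field(item, "index"))
    return [_field(item, "embedding") for item in data]
```

For example, `embed_per_input("vertex_ai/gemini-embedding-2-preview", texts, litellm.embedding)`. Injecting the call keeps the routing logic testable without credentials; the cost on Vertex AI is N requests instead of one.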