2 posts tagged with "multimodal"

Gemini Embedding 2 (GA): Multimodal Embeddings on LiteLLM

April 24, 2026

SWE @ LiteLLM (LLM Translation)

LiteLLM now fully supports Gemini Embedding 2 GA.

info

For end-to-end behavior, input shapes, and MIME types, see the Gemini Embedding 2 Preview walkthrough. This post focuses on GA naming and cost map coverage.

Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM

March 11, 2025

Sameer Kankute

SWE @ LiteLLM (LLM Translation)

LiteLLM now supports multimodal embeddings with gemini-embedding-2-preview, which can mix text, images, audio, video, and PDF content in a single request. The model is available via both the Gemini API (API key) and Vertex AI (GCP credentials).

Response shape differs by provider

Gemini API (gemini/...): each input element returns its own embedding, indexed 0..N-1 — same shape as OpenAI's /embeddings. LiteLLM routes to the batchEmbedContents endpoint with one EmbedContentRequest per input.
Vertex AI (vertex_ai/...): all input elements are combined into a single unified embedding via embedContent. Vertex AI does not expose batchEmbedContents for Gemini embedding models, so N parts → 1 vector. To get one vector per item, call embedding(...) once per input.
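
The per-input workaround for Vertex AI can be sketched as below. This is a minimal illustration, not LiteLLM's own code: `embed_each` is a hypothetical helper name, the model string `vertex_ai/gemini-embedding-2-preview` is assumed from the provider prefix and model name above, and the `response.data[0]["embedding"]` access assumes the OpenAI-style response shape. The `embed_fn` parameter is there only so the loop can be exercised with a stub instead of a live call.

```python
from typing import Any, Callable, List, Optional


def embed_each(
    inputs: List[Any],
    model: str,
    embed_fn: Optional[Callable[..., Any]] = None,
) -> List[List[float]]:
    """Return one vector per input by issuing one embedding call per item.

    Hypothetical helper: on Vertex AI, a multi-item input is combined into a
    single unified embedding, so each item is sent in its own request.
    """
    if embed_fn is None:
        # Imported lazily so the helper can be tested with a stub.
        import litellm

        embed_fn = litellm.embedding

    vectors: List[List[float]] = []
    for item in inputs:
        # Exactly one input element per request -> exactly one vector back.
        response = embed_fn(model=model, input=[item])
        # Assumption: OpenAI-style response with a `data` list of entries
        # that each carry an "embedding" field.
        vectors.append(response.data[0]["embedding"])
    return vectors


# Example call (requires GCP credentials configured for Vertex AI):
# vectors = embed_each(
#     ["first text", "second text"],
#     model="vertex_ai/gemini-embedding-2-preview",  # assumed model string
# )
# len(vectors) equals the number of inputs
```

With the Gemini API route (`gemini/...`), this loop should not be necessary, since a single call already returns one embedding per input element.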