
3 posts tagged with "embeddings"


Incident Report: vLLM Embeddings Broken by encoding_format Parameter

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

  • Date: Feb 16, 2026
  • Duration: ~3 hours
  • Severity: High (for vLLM embedding users)
  • Status: Resolved

Summary

A commit (dbcae4a) intended to fix OpenAI SDK behavior broke vLLM embeddings by explicitly passing `encoding_format=None` in API requests. vLLM rejects such requests with the error `unknown variant, expected float or base64` (a minimal reproduction sketch follows the impact list below).

Impact:
  • vLLM embedding calls: Complete failure - all requests rejected
  • Other providers: No impact - OpenAI and other providers functioned normally
  • Other vLLM functionality: No impact - only embeddings were affected
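
To make the failure mode concrete, here is a minimal sketch, assuming a plain requests call against a vLLM OpenAI-compatible endpoint; the URL and model name are placeholders, not LiteLLM's actual internals. The defensive fix is to omit unset optional parameters rather than serialize them as null:

```python
import requests

VLLM_BASE_URL = "http://localhost:8000/v1"  # placeholder local vLLM server

def embed(texts, encoding_format=None):
    payload = {
        "model": "intfloat/e5-mistral-7b-instruct",  # placeholder model
        "input": texts,
        "encoding_format": encoding_format,
    }
    # Bug: leaving the key in place serializes "encoding_format": null,
    # which vLLM's strict parser rejects
    # ("unknown variant, expected float or base64").
    # Fix: drop optional parameters that were never set before sending.
    payload = {k: v for k, v in payload.items() if v is not None}
    resp = requests.post(f"{VLLM_BASE_URL}/embeddings", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

vectors = embed(["hello world"])
```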

Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)

LiteLLM now supports multimodal embeddings with gemini-embedding-2-preview: mixing text, images, audio, video, and PDF content in a single request. Available via both the Gemini API (API key) and Vertex AI (GCP credentials).
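
As a quick start, here is a hedged sketch of a text-only call through the Gemini API path. The model string comes from the post; the inputs are illustrative, and the multimodal input schema (images, audio, PDFs) is provider-specific and not shown here:

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = "..."  # your Gemini API key

# Each input element returns its own embedding, indexed 0..N-1,
# in the same response shape as OpenAI's /embeddings.
resp = litellm.embedding(
    model="gemini/gemini-embedding-2-preview",
    input=["a photo of a cat", "quarterly sales report"],
)
for item in resp.data:
    print(item["index"], len(item["embedding"]))
```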

Response shape differs by provider
  • Gemini API (gemini/...): each input element returns its own embedding, indexed 0..N-1, the same shape as OpenAI's /embeddings. LiteLLM routes to the batchEmbedContents endpoint with one EmbedContentRequest per input.
  • Vertex AI (vertex_ai/...): all input elements are combined into a single unified embedding via embedContent. Vertex AI does not expose batchEmbedContents for Gemini embedding models, so N parts → 1 vector. To get one vector per item, call embedding(...) once per input, as in the sketch below.
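
A minimal sketch of that per-item workaround on the Vertex AI path, assuming GCP credentials are already configured; the project and location values are placeholders:

```python
import litellm

inputs = ["first document", "second document"]

# Vertex AI fuses all parts of a single request into one vector,
# so issue one embedding() call per item to keep vectors separate.
vectors = []
for item in inputs:
    resp = litellm.embedding(
        model="vertex_ai/gemini-embedding-2-preview",
        input=[item],
        vertex_project="my-gcp-project",  # placeholder GCP project
        vertex_location="us-central1",    # placeholder region
    )
    vectors.append(resp.data[0]["embedding"])
```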