
3 posts tagged with "embeddings"


Incident Report: vLLM Embeddings Broken by encoding_format Parameter

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Krrish Dholakia
CEO, LiteLLM
Ishaan Jaffer
CTO, LiteLLM

  • Date: Feb 16, 2026
  • Duration: ~3 hours
  • Severity: High (for vLLM embedding users)
  • Status: Resolved

Summary

A commit (dbcae4a) intended to fix OpenAI SDK behavior broke vLLM embeddings by explicitly passing `encoding_format=None` in API requests. vLLM rejects such requests with the error `unknown variant, expected float or base64` (a minimal reproduction sketch follows the impact list below).

Impact:
  • vLLM embedding calls: Complete failure - all requests rejected
  • Other providers: No impact - OpenAI and other providers functioned normally
  • Other vLLM functionality: No impact - only embeddings were affected
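
To make the failure mode concrete, here is a minimal sketch, assuming a plain requests call against a vLLM OpenAI-compatible endpoint; the URL and model name are placeholders, not LiteLLM's actual internals. The defensive fix is to omit unset optional parameters rather than serialize them as null:

```python
import requests

VLLM_BASE_URL = "http://localhost:8000/v1"  # placeholder local vLLM server

def embed(texts, encoding_format=None):
    payload = {
        "model": "intfloat/e5-mistral-7b-instruct",  # placeholder model
        "input": texts,
        "encoding_format": encoding_format,
    }
    # Bug: leaving the key in place serializes "encoding_format": null,
    # which vLLM's strict parser rejects
    # ("unknown variant, expected float or base64").
    # Fix: drop optional parameters that were never set before sending.
    payload = {k: v for k, v in payload.items() if v is not None}
    resp = requests.post(f"{VLLM_BASE_URL}/embeddings", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

vectors = embed(["hello world"])
```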

Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM

Sameer Kankute
SWE @ LiteLLM (LLM Translation)

LiteLLM now supports multimodal embeddings with gemini-embedding-2-preview: mixing text, images, audio, video, and PDF content in a single request. Available via both the Gemini API (API key) and Vertex AI (GCP credentials).
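
As a quick start, here is a hedged sketch of a text-only call through the Gemini API path. The model string comes from the post; the inputs are illustrative, and the multimodal input schema (images, audio, PDFs) is provider-specific and not shown here:

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = "..."  # your Gemini API key

# Each input element returns its own embedding, indexed 0..N-1,
# in the same response shape as OpenAI's /embeddings.
resp = litellm.embedding(
    model="gemini/gemini-embedding-2-preview",
    input=["a photo of a cat", "quarterly sales report"],
)
for item in resp.data:
    print(item["index"], len(item["embedding"]))
```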

Response shape differs by provider
  • Gemini API (gemini/...): each input element returns its own embedding, indexed 0..N-1, the same shape as OpenAI's /embeddings. LiteLLM routes to the batchEmbedContents endpoint with one EmbedContentRequest per input.
  • Vertex AI (vertex_ai/...): all input elements are combined into a single unified embedding via embedContent. Vertex AI does not expose batchEmbedContents for Gemini embedding models, so N parts → 1 vector. To get one vector per item, call embedding(...) once per input, as in the sketch below.
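
A minimal sketch of that per-item workaround on the Vertex AI path, assuming GCP credentials are already configured; the project and location values are placeholders:

```python
import litellm

inputs = ["first document", "second document"]

# Vertex AI fuses all parts of a single request into one vector,
# so issue one embedding() call per item to keep vectors separate.
vectors = []
for item in inputs:
    resp = litellm.embedding(
        model="vertex_ai/gemini-embedding-2-preview",
        input=[item],
        vertex_project="my-gcp-project",  # placeholder GCP project
        vertex_location="us-central1",    # placeholder region
    )
    vectors.append(resp.data[0]["embedding"])
```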