# Perplexity Embeddings

https://docs.perplexity.ai/docs/embeddings/quickstart

LiteLLM supports Perplexity's pplx-embed embedding models for web-scale text retrieval.

## API Key

```python
# env variable
import os
os.environ['PERPLEXITYAI_API_KEY']
```

## Sample Usage - Embedding

```python
from litellm import embedding
import os

os.environ['PERPLEXITYAI_API_KEY'] = ""

response = embedding(
    model="perplexity/pplx-embed-v1-0.6b",
    input=["good morning from litellm"],
)
print(response)
```

## Supported Parameters

Perplexity embeddings support the following optional parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `dimensions` | int | Output embedding dimensions. 128–1024 for 0.6b models, 128–2560 for 4b models. Defaults to the model's maximum. |
| `encoding_format` | string | `"base64_int8"` (default) or `"base64_binary"` for compressed output. |

## Example with Parameters

```python
from litellm import embedding
import os

os.environ['PERPLEXITYAI_API_KEY'] = ""

response = embedding(
    model="perplexity/pplx-embed-v1-4b",
    input=["Your text here"],
    dimensions=512,
)
print(f"Embedding dimensions: {len(response.data[0]['embedding'])}")
```

## Supported Models

All models listed on the Perplexity Embeddings docs are supported. Use `model=perplexity/<model-name>`.

| Model Name | Dimensions | Max Tokens | Price (per 1M tokens) | Function Call |
|------------|------------|------------|-----------------------|---------------|
| pplx-embed-v1-0.6b | 1024 | 32K | $0.004 | `embedding(model="perplexity/pplx-embed-v1-0.6b", input)` |
| pplx-embed-v1-4b | 2560 | 32K | $0.03 | `embedding(model="perplexity/pplx-embed-v1-4b", input)` |

## Key Specifications

- Max texts per request: 512
- Max tokens per input: 32,768
- Combined request limit: 120,000 tokens
- Matryoshka dimension reduction — reduce `dimensions` to as few as 128 for faster search and reduced storage
- No instruction prefix required — embed text directly
- Unnormalized embeddings — use cosine similarity for comparison
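Because the returned vectors are not unit-normalized, a raw dot product conflates direction with magnitude; cosine similarity divides the magnitudes out. A minimal pure-Python sketch (any vector pair works; the values here are synthetic):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two (possibly unnormalized) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors of different magnitudes still score ~1.0,
# even though their dot products differ:
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ≈ 1.0
```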