Nvidia NIM - Rerank

Use Nvidia NIM Rerank models through LiteLLM.

  • Description: Nvidia NIM provides high-performance reranking models for semantic search and retrieval-augmented generation (RAG)
  • Provider Doc: Nvidia NIM Rerank API ↗
  • Supported Endpoint: /rerank

Overview

Nvidia NIM rerank models help you:

  • Reorder search results by relevance to a query
  • Improve RAG (Retrieval-Augmented Generation) accuracy
  • Filter and rank large document sets efficiently

Supported Models:

  • All rerank models available on the Nvidia NIM platform

Tip: See the full list of LiteLLM-supported Nvidia NIM rerank models on Nvidia NIM.

Usage

LiteLLM Python SDK

import litellm
import os

os.environ['NVIDIA_NIM_API_KEY'] = "nvapi-..."

response = litellm.rerank(
    model="nvidia_nim/nvidia/llama-3_2-nv-rerankqa-1b-v2",
    query="What is the GPU memory bandwidth of H100 SXM?",
    documents=[
        "The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth.",
        "A100 provides up to 20X higher performance over the prior generation.",
        "Accelerated servers with H100 deliver 3 terabytes per second (TB/s) of memory bandwidth per GPU."
    ],
    top_n=3,
)

print(response)

Response:

{
  "results": [
    {
      "index": 2,
      "relevance_score": 6.828125,
      "document": {
        "text": "Accelerated servers with H100 deliver 3 terabytes per second (TB/s) of memory bandwidth per GPU."
      }
    },
    {
      "index": 0,
      "relevance_score": -1.564453125,
      "document": {
        "text": "The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth."
      }
    }
  ]
}
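
To map scores back to your inputs, iterate over results. A minimal sketch, assuming the dict shape shown in the JSON above (as in that example, results arrive sorted by descending relevance_score):

for result in response.results:
    # "index" points back into the documents list you passed in;
    # "document.text" echoes the ranked passage.
    print(result["index"], result["relevance_score"], result["document"]["text"])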

Usage with LiteLLM Proxy

1. Setup Config

Add Nvidia NIM rerank models to your proxy configuration:

model_list:
  - model_name: nvidia-rerank
    litellm_params:
      model: nvidia_nim/nvidia/llama-3_2-nv-rerankqa-1b-v2
      api_key: os.environ/NVIDIA_NIM_API_KEY

2. Start Proxy

litellm --config /path/to/config.yaml

3. Make Rerank Requests

curl -X POST http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia-rerank",
    "query": "What is the GPU memory bandwidth of H100?",
    "documents": [
      "H100 delivers 3TB/s memory bandwidth",
      "A100 has 2TB/s memory bandwidth",
      "V100 offers 900GB/s memory bandwidth"
    ],
    "top_n": 2
  }'
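
You can make the same request from Python. A minimal sketch that posts to the proxy's /rerank endpoint with the requests library, assuming the proxy runs at http://0.0.0.0:4000 with the virtual key sk-1234 from the curl example:

import requests

# Same payload as the curl example, sent to the LiteLLM proxy's /rerank route.
response = requests.post(
    "http://0.0.0.0:4000/rerank",
    headers={"Authorization": "Bearer sk-1234"},  # your proxy virtual key
    json={
        "model": "nvidia-rerank",
        "query": "What is the GPU memory bandwidth of H100?",
        "documents": [
            "H100 delivers 3TB/s memory bandwidth",
            "A100 has 2TB/s memory bandwidth",
            "V100 offers 900GB/s memory bandwidth",
        ],
        "top_n": 2,
    },
)
print(response.json())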

API Parameters

Required Parameters

  • model (string): The Nvidia NIM rerank model name, with the nvidia_nim/ prefix
  • query (string): The search query to rank documents against
  • documents (array): List of documents to rank (1-1000 documents)

Optional Parameters

  • top_n (integer, default: all documents): Number of top-ranked documents to return
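
For example, omitting top_n scores and returns every document. A minimal sketch, reusing the model from the examples above:

response = litellm.rerank(
    model="nvidia_nim/nvidia/llama-3_2-nv-rerankqa-1b-v2",
    query="GPU memory bandwidth",
    documents=["H100 delivers 3TB/s memory bandwidth", "A100 has 2TB/s memory bandwidth"],
    # no top_n: all documents are returned, ranked by relevance
)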

Nvidia-Specific Parameters

truncate: Controls how text is truncated if it exceeds the model's context window

  • "NONE": No truncation (request may fail if too long)
  • "END": Truncate from the end of the text

response = litellm.rerank(
    model="nvidia_nim/nvidia/llama-3_2-nv-rerankqa-1b-v2",
    query="GPU performance",
    documents=["High performance computing", "Fast GPU processing"],
    top_n=2,
    truncate="END",  # Nvidia-specific parameter
)

Authentication

Set your Nvidia NIM API key:

export NVIDIA_NIM_API_KEY="nvapi-..."

API Endpoint

The rerank endpoint uses a different base URL from the chat and embeddings endpoints:

  • Chat/Embeddings: https://integrate.api.nvidia.com/v1/
  • Rerank: https://ai.api.nvidia.com/v1/

LiteLLM automatically uses the correct endpoint for rerank requests.

Custom API Base URL

You can override the default base URL in several ways:

Option 1: Environment Variable

export NVIDIA_NIM_API_BASE="https://your-custom-endpoint.com"

Option 2: Pass as a parameter

response = litellm.rerank(
    model="nvidia_nim/nvidia/llama-3_2-nv-rerankqa-1b-v2",
    query="test",
    documents=["doc1"],
    api_base="https://your-custom-endpoint.com",
)

Option 3: Full URL (including model path)

If you have the complete endpoint URL, you can pass it directly:

response = litellm.rerank(
    model="nvidia_nim/nvidia/llama-3_2-nv-rerankqa-1b-v2",
    query="test",
    documents=["doc1"],
    api_base="https://your-custom-endpoint.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking",
)

LiteLLM will detect the full URL (by checking for /retrieval/ in the path) and use it as-is.
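
The rule is roughly as follows. This is an illustrative sketch of the detection described above, not LiteLLM's actual source:

def looks_like_full_rerank_url(api_base: str) -> bool:
    # If the base already contains "/retrieval/", treat it as a complete
    # endpoint URL and use it as-is; otherwise append the default rerank
    # path for the chosen model.
    return "/retrieval/" in api_base

print(looks_like_full_rerank_url("https://your-custom-endpoint.com"))  # False
print(looks_like_full_rerank_url(
    "https://your-custom-endpoint.com/v1/retrieval/nvidia/llama-3_2-nv-rerankqa-1b-v2/reranking"
))  # True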

How do I get an API key?

Get your Nvidia NIM API key from Nvidia's website.