/vector_stores/{vector_store_id}/search - Search Vector Store

Search a vector store for relevant chunks based on a query and file attributes filter. This is useful for retrieval-augmented generation (RAG) use cases.
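
For example, a typical RAG flow retrieves matching chunks from the vector store and passes them to a chat model as grounding context. The sketch below is illustrative only: it assumes the response shape documented later on this page (a data list of scored text chunks), uses a placeholder vector store ID, and picks gpt-4o purely as an example model.

RAG Flow Sketch
import asyncio
import litellm

async def answer_with_rag(question: str) -> str:
    # 1. Retrieve relevant chunks from the vector store
    search = await litellm.vector_stores.asearch(
        vector_store_id="vs_abc123",  # replace with your vector store ID
        query=question,
        max_num_results=5,
    )

    # 2. Flatten the retrieved text into a single context block
    context = "\n\n".join(
        part["text"] if isinstance(part, dict) else part.text
        for result in search.data
        for part in result.content
    )

    # 3. Ask the model to answer using only the retrieved context
    completion = await litellm.acompletion(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

print(asyncio.run(answer_with_rag("What is the capital of France?")))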

Overview

| Feature | Supported | Notes |
|---------|-----------|-------|
| Cost Tracking | ✅ | Tracked per search operation |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Supported LLM Providers | OpenAI, Azure OpenAI, Bedrock, Vertex RAG Engine | Full vector stores API support across providers |

Usage

LiteLLM Python SDK

Non-streaming example

Search Vector Store - Basic
import litellm

# Async usage - run inside an async function / event loop
response = await litellm.vector_stores.asearch(
    vector_store_id="vs_abc123",
    query="What is the capital of France?"
)
print(response)

Synchronous example

Search Vector Store - Sync
import litellm

response = litellm.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?"
)
print(response)

LiteLLM Proxy Server

  1. Setup config.yaml

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  # Vector store settings can be added here if needed

  2. Start proxy

litellm --config /path/to/config.yaml

  3. Test it with OpenAI SDK!

OpenAI SDK via LiteLLM Proxy
from openai import OpenAI

# Point OpenAI SDK to LiteLLM proxy
client = OpenAI(
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",  # Your LiteLLM API key
)

# Requires an openai SDK version where vector_stores is a top-level resource
search_results = client.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?",
    max_num_results=5
)
print(search_results)
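
You can also hit the proxy's REST endpoint directly. A minimal sketch with requests is shown below; it assumes the proxy exposes the path from this page's title (/vector_stores/{vector_store_id}/search) and reuses the sk-1234 key from above.

Direct REST Call via LiteLLM Proxy
import requests

# POST /vector_stores/{vector_store_id}/search on the LiteLLM proxy
resp = requests.post(
    "http://0.0.0.0:4000/vector_stores/vs_abc123/search",
    headers={
        "Authorization": "Bearer sk-1234",  # Your LiteLLM API key
        "Content-Type": "application/json",
    },
    json={
        "query": "What is the capital of France?",
        "max_num_results": 5,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())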

OpenAI SDK (Standalone)

OpenAI SDK Direct
from openai import OpenAI

client = OpenAI(api_key="your-openai-api-key")

# Requires an openai SDK version where vector_stores is a top-level resource
search_results = client.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?",
    max_num_results=5
)
print(search_results)

Request Format

The request body follows OpenAI's vector stores search API format.

Example request body

{
  "query": "What is the capital of France?",
  "filters": {
    "file_ids": ["file-abc123", "file-def456"]
  },
  "max_num_results": 5,
  "ranking_options": {
    "score_threshold": 0.7
  },
  "rewrite_query": true
}

Required Fields

  • query (string or array of strings): A query string or array for the search. The query is used to find relevant chunks in the vector store.

Optional Fields

  • filters (object): Optional filter to apply based on file attributes.
    • file_ids (array of strings): Filter chunks based on specific file IDs.
  • max_num_results (integer): Maximum number of results to return. Must be between 1 and 50. Default is 10.
  • ranking_options (object): Optional ranking options for search.
    • score_threshold (number): Minimum similarity score threshold for results.
  • rewrite_query (boolean): Whether to rewrite the natural language query for vector search optimization. Default is true.
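
Putting the optional fields together, a request through the Python SDK might look like the sketch below. It assumes the SDK accepts these fields as keyword arguments mirroring the request body; the IDs are placeholders.

Search With Optional Fields
import litellm

response = litellm.vector_stores.search(
    vector_store_id="vs_abc123",
    query="What is the capital of France?",
    filters={
        "file_ids": ["file-abc123", "file-def456"]  # restrict the search to these files
    },
    max_num_results=5,          # 1-50, defaults to 10
    ranking_options={
        "score_threshold": 0.7  # drop results below this similarity score
    },
    rewrite_query=True,         # rewrite the query for vector search optimization
)
print(response)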

Response Format

Example Response

{
  "object": "vector_store.search_results.page",
  "search_query": "What is the capital of France?",
  "data": [
    {
      "score": 0.95,
      "content": [
        {
          "type": "text",
          "text": "Paris is the capital and most populous city of France. With an official estimated population of 2,102,650 residents as of 1 January 2023 in an area of more than 105 km², Paris is the fourth-most populated city in the European Union and the 30th most densely populated city in the world in 2022."
        }
      ]
    },
    {
      "score": 0.87,
      "content": [
        {
          "type": "text",
          "text": "France, officially the French Republic, is a country located primarily in Western Europe. Its capital is Paris, one of the most important cultural and economic centers in Europe."
        }
      ]
    }
  ]
}

Response Fields

  • object (string): The object type, which is always vector_store.search_results.page.
  • search_query (string): The query that was used for the search.
  • data (array): An array of search result objects.
    • score (number): The similarity score of the search result, typically between 0 and 1, where 1 is the most similar.
    • content (array): Array of content objects containing the retrieved text.
      • type (string): The type of content, typically text.
      • text (string): The actual text content that was retrieved from the vector store.
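
To consume these fields in code, iterate over data and pull out the scores and text. The helper below is a sketch that tolerates results exposed either as attribute-style objects (as in the examples on this page) or as plain dicts.

Parsing Search Results
def extract_chunks(response):
    """Return (score, text) pairs, highest-scoring first."""
    chunks = []
    for result in response.data:
        score = result["score"] if isinstance(result, dict) else result.score
        content = result["content"] if isinstance(result, dict) else result.content
        for part in content:
            text = part["text"] if isinstance(part, dict) else part.text
            chunks.append((score, text))
    return sorted(chunks, key=lambda pair: pair[0], reverse=True)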

Mock Response Testing

For testing purposes, you can use mock responses:

Mock Response Example
import litellm

# Mock response for testing
mock_results = [
    {
        "score": 0.95,
        "content": [
            {
                "text": "Paris is the capital of France.",
                "type": "text"
            }
        ]
    },
    {
        "score": 0.87,
        "content": [
            {
                "text": "France is a country in Western Europe.",
                "type": "text"
            }
        ]
    }
]

response = await litellm.vector_stores.asearch(
    vector_store_id="vs_abc123",
    query="What is the capital of France?",
    mock_response=mock_results
)
print(response)

Error Handling

Common errors you might encounter:

Error Handling Example
import litellm

try:
    response = await litellm.vector_stores.asearch(
        vector_store_id="vs_invalid",
        query="What is the capital of France?"
    )
except litellm.NotFoundError as e:
    print(f"Vector store not found: {e}")
except litellm.RateLimitError as e:
    print(f"Rate limit exceeded: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Best Practices

  1. Query Optimization: Use clear, specific queries for better search results.
  2. Result Filtering: Use file_ids filter to limit search scope when needed.
  3. Score Thresholds: Set appropriate score thresholds to filter out irrelevant results.
  4. Batch Queries: Use array queries when searching for multiple related topics (see the array-query sketch at the end of this section).
  5. Error Handling: Always implement proper error handling for production use.
Best Practices Example
import litellm

async def search_documents(vector_store_id: str, user_query: str):
    """
    Search documents with best practices applied
    """
    try:
        response = await litellm.vector_stores.asearch(
            vector_store_id=vector_store_id,
            query=user_query,
            max_num_results=5,
            ranking_options={
                "score_threshold": 0.7  # Filter out low-relevance results
            },
            rewrite_query=True  # Optimize query for vector search
        )

        # Filter results by score for additional quality control
        high_quality_results = [
            result for result in response.data
            if result.score >= 0.8
        ]

        return high_quality_results

    except Exception as e:
        print(f"Search failed: {e}")
        return []

# Usage
results = await search_documents("vs_abc123", "What is the capital of France?")
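
Because query also accepts an array of strings, batch-style searches (best practice 4) can go out in a single call. A minimal sketch with placeholder IDs:

Batch Query Example
import litellm

# One search call covering several related questions
response = litellm.vector_stores.search(
    vector_store_id="vs_abc123",
    query=[
        "What is the capital of France?",
        "What is the population of Paris?",
    ],
    max_num_results=5,
)

for result in response.data:
    first_part = result.content[0]
    text = first_part["text"] if isinstance(first_part, dict) else first_part.text
    print(f"{result.score:.2f}  {text[:80]}")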