Skip to main content

Assistants API

Covers Threads, Messages, Assistants.

LiteLLM currently covers:

  • Create Assistants
  • Delete Assistants
  • Get Assistants
  • Create Thread
  • Get Thread
  • Add Messages
  • Get Messages
  • Run Thread

Supported Providers:​

Quick Start​

Call an existing Assistant.

  • Get the Assistant

  • Create a Thread when a user starts a conversation.

  • Add Messages to the Thread as the user asks questions.

  • Run the Assistant on the Thread to generate a response by calling the model and the tools.

SDK + PROXY​

Create an Assistant

import litellm
import os

# setup env
os.environ["OPENAI_API_KEY"] = "sk-.."

assistant = litellm.create_assistants(
custom_llm_provider="openai",
model="gpt-4-turbo",
instructions="You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
name="Math Tutor",
tools=[{"type": "code_interpreter"}],
)

### ASYNC USAGE ###
# assistant = await litellm.acreate_assistants(
# custom_llm_provider="openai",
# model="gpt-4-turbo",
# instructions="You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
# name="Math Tutor",
# tools=[{"type": "code_interpreter"}],
# )

Get the Assistant

from litellm import get_assistants, aget_assistants
import os

# setup env
os.environ["OPENAI_API_KEY"] = "sk-.."

assistants = get_assistants(custom_llm_provider="openai")

### ASYNC USAGE ###
# assistants = await aget_assistants(custom_llm_provider="openai")

Create a Thread

from litellm import create_thread, acreate_thread
import os

os.environ["OPENAI_API_KEY"] = "sk-.."

new_thread = create_thread(
custom_llm_provider="openai",
messages=[{"role": "user", "content": "Hey, how's it going?"}], # type: ignore
)

### ASYNC USAGE ###
# new_thread = await acreate_thread(custom_llm_provider="openai",messages=[{"role": "user", "content": "Hey, how's it going?"}])

Add Messages to the Thread

from litellm import create_thread, get_thread, aget_thread, add_message, a_add_message
import os

os.environ["OPENAI_API_KEY"] = "sk-.."

## CREATE A THREAD
_new_thread = create_thread(
custom_llm_provider="openai",
messages=[{"role": "user", "content": "Hey, how's it going?"}], # type: ignore
)

## OR retrieve existing thread
received_thread = get_thread(
custom_llm_provider="openai",
thread_id=_new_thread.id,
)

### ASYNC USAGE ###
# received_thread = await aget_thread(custom_llm_provider="openai", thread_id=_new_thread.id,)

## ADD MESSAGE TO THREAD
message = {"role": "user", "content": "Hey, how's it going?"}
added_message = add_message(
thread_id=_new_thread.id, custom_llm_provider="openai", **message
)

### ASYNC USAGE ###
# added_message = await a_add_message(thread_id=_new_thread.id, custom_llm_provider="openai", **message)

Run the Assistant on the Thread

from litellm import get_assistants, create_thread, add_message, run_thread, arun_thread
import os

os.environ["OPENAI_API_KEY"] = "sk-.."
assistants = get_assistants(custom_llm_provider="openai")

## get the first assistant ###
assistant_id = assistants.data[0].id

## GET A THREAD
_new_thread = create_thread(
custom_llm_provider="openai",
messages=[{"role": "user", "content": "Hey, how's it going?"}], # type: ignore
)

## ADD MESSAGE
message = {"role": "user", "content": "Hey, how's it going?"}
added_message = add_message(
thread_id=_new_thread.id, custom_llm_provider="openai", **message
)

## 🚨 RUN THREAD
response = run_thread(
custom_llm_provider="openai", thread_id=thread_id, assistant_id=assistant_id
)

### ASYNC USAGE ###
# response = await arun_thread(custom_llm_provider="openai", thread_id=thread_id, assistant_id=assistant_id)

print(f"run_thread: {run_thread}")

Streaming​

from litellm import run_thread_stream 
import os

os.environ["OPENAI_API_KEY"] = "sk-.."

message = {"role": "user", "content": "Hey, how's it going?"}

data = {"custom_llm_provider": "openai", "thread_id": _new_thread.id, "assistant_id": assistant_id, **message}

run = run_thread_stream(**data)
with run as run:
assert isinstance(run, AssistantEventHandler)
for chunk in run:
print(f"chunk: {chunk}")
run.until_done()

👉 Proxy API Reference​

Azure OpenAI​

config

assistant_settings:
custom_llm_provider: azure
litellm_params:
api_key: os.environ/AZURE_API_KEY
api_base: os.environ/AZURE_API_BASE

curl

curl -X POST "http://localhost:4000/v1/assistants" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"instructions": "You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
"name": "Math Tutor",
"tools": [{"type": "code_interpreter"}],
"model": "<my-azure-deployment-name>"
}'

OpenAI-Compatible APIs​

To call openai-compatible Assistants API's (eg. Astra Assistants API), just add openai/ to the model name:

config

assistant_settings:
custom_llm_provider: openai
litellm_params:
api_key: os.environ/ASTRA_API_KEY
api_base: os.environ/ASTRA_API_BASE

curl

curl -X POST "http://localhost:4000/v1/assistants" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"instructions": "You are a personal math tutor. When asked a question, write and run Python code to answer the question.",
"name": "Math Tutor",
"tools": [{"type": "code_interpreter"}],
"model": "openai/<my-astra-model-name>"
}'