/responses [Beta]

LiteLLM provides a BETA endpoint that follows the spec of OpenAI's /responses API.

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Supported LiteLLM Versions | 1.63.8+ | |
| Supported LLM providers | openai | |
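
The Fallbacks and Loadbalancing rows above apply when several deployments are registered on a LiteLLM Router under the same model name. The snippet below is a minimal sketch of that setup; the second deployment's api_base is a placeholder, and router.aresponses is assumed here to mirror litellm.responses rather than confirmed by this page.

import asyncio
from litellm import Router

# Two deployments share the model_name "gpt-4o", so requests for "gpt-4o"
# are load balanced between them; on failure the router can retry against
# the other deployment.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {"model": "openai/gpt-4o"},
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "openai/gpt-4o",
                # Placeholder api_base for a second deployment
                "api_base": "https://my-second-openai-endpoint.example.com/v1",
            },
        },
    ],
)

async def main():
    # Assumption: the Router exposes the responses API as `aresponses`,
    # mirroring litellm.responses.
    response = await router.aresponses(
        model="gpt-4o",
        input="Tell me a three sentence bedtime story about a unicorn.",
    )
    print(response)

asyncio.run(main())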

Usage

Create a model response

Non-streaming

import litellm

# Non-streaming response
response = litellm.responses(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
)

print(response)
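
Continuing from the example above, the returned object should mirror OpenAI's Responses spec, where the generated text lives in the content parts of each output item. The snippet below is a sketch under that assumption and guards each attribute access in case the shape differs.

# Sketch: pull the generated text out of the response, assuming it
# follows the OpenAI Responses shape (output -> content -> text).
for item in getattr(response, "output", []) or []:
    for part in getattr(item, "content", []) or []:
        text = getattr(part, "text", None)
        if text:
            print(text)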

Streaming

import litellm

# Streaming response
response = litellm.responses(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
)

for event in response:
    print(event)
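
Each streamed event carries a type. Under OpenAI's Responses streaming spec, incremental text arrives as response.output_text.delta events with a delta field; the sketch below assembles the full text under that assumption (the event type string comes from OpenAI's spec, not from this page).

import litellm

# Sketch: accumulate streamed text deltas into the final story.
stream = litellm.responses(
    model="gpt-4o",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
)

story = ""
for event in stream:
    # Assumption: delta events follow OpenAI's Responses streaming spec.
    if getattr(event, "type", None) == "response.output_text.delta":
        story += event.delta

print(story)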