Azure Responses API
| Property | Details |
|---|---|
| Description | Azure OpenAI Responses API |
| `custom_llm_provider` on LiteLLM | `azure/` |
| Supported Operations | `/v1/responses` |
| Link to Provider Doc | Azure OpenAI Responses API ↗ |
| Cost Tracking, Logging Support | ✅ LiteLLM will log and track cost for Responses API requests |
| Supported OpenAI Params | ✅ All OpenAI params are supported. See here |
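Cost tracking and logging go through LiteLLM's standard callback hooks, so Responses API calls show up the same way completion calls do. Below is a minimal sketch of a custom success callback that prints the tracked cost; it assumes the documented LiteLLM custom-callback signature and the `response_cost` field that LiteLLM's cost tracking attaches to `kwargs`.
import litellm

def track_cost(kwargs, completion_response, start_time, end_time):
    # LiteLLM attaches the computed cost to kwargs for successful calls
    cost = kwargs.get("response_cost")
    print(f"responses call cost: {cost}")

# Register the callback; it fires after each successful litellm.responses() call
litellm.success_callback = [track_cost]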
Usage
Create a model response
- LiteLLM SDK
- OpenAI SDK with LiteLLM Proxy
Non-streaming
Azure Responses API
import litellm
import os

# Non-streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)
print(response)
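Because all OpenAI Responses API parameters are passed through (see the table above), you can set additional options such as `instructions` in the same call. A minimal sketch; the parameter values here are illustrative, not required:
import litellm
import os

# Any OpenAI Responses API parameter can be passed through litellm.responses()
response = litellm.responses(
    model="azure/o1-pro",
    input="Summarize the plot of Hamlet in two sentences.",
    instructions="Answer concisely.",  # system-style guidance, an OpenAI Responses API param
    max_output_tokens=200,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)
print(response)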
Streaming
Azure Responses API
import litellm
import os

# Streaming response
response = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

for event in response:
    print(event)
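Each streamed item is a Responses API event object, so you can filter for the text deltas instead of printing raw events. A minimal sketch, assuming the events expose OpenAI-style `type` and `delta` fields:
import litellm
import os

# Stream and print only the incremental text deltas
stream = litellm.responses(
    model="azure/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com/",
    api_version="2023-03-15-preview",
)

for event in stream:
    # Event types mirror the OpenAI Responses API, e.g. "response.output_text.delta"
    if getattr(event, "type", None) == "response.output_text.delta":
        print(event.delta, end="", flush=True)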
First, add this to your litellm proxy config.yaml:
Azure Responses API
model_list:
  - model_name: o1-pro
    litellm_params:
      model: azure/o1-pro
      api_key: os.environ/AZURE_RESPONSES_OPENAI_API_KEY
      api_base: https://litellm8397336933.openai.azure.com/
      api_version: 2023-03-15-preview
Start your LiteLLM proxy:
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Then use the OpenAI SDK pointed to your proxy:
Non-streaming
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
Streaming
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
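If you are not using the OpenAI SDK, you can also call the proxy's `/v1/responses` route directly over HTTP. A minimal sketch using the `requests` library, reusing the placeholder URL and key from the examples above:
import requests

# POST to the proxy's OpenAI-compatible /v1/responses route
resp = requests.post(
    "http://localhost:4000/v1/responses",
    headers={"Authorization": "Bearer your-api-key"},  # Your proxy API key
    json={
        "model": "o1-pro",
        "input": "Tell me a three sentence bedtime story about a unicorn.",
    },
)
print(resp.json())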
Azure Codex Models
Codex models use Azure's new `/v1/preview` API, which provides ongoing access to the latest features with no need to update `api-version` each month. LiteLLM will send your requests to the `/v1/preview` endpoint when you set `api_version="preview"`.
- LiteLLM SDK
- OpenAI SDK with LiteLLM Proxy
Non-streaming
Azure Codex Models
import litellm
import os

# Non-streaming response with Codex models
response = litellm.responses(
    model="azure/codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com",
    api_version="preview",  # 👈 key difference
)
print(response)
Streaming
Azure Codex Models
import litellm
import os

# Streaming response with Codex models
response = litellm.responses(
    model="azure/codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True,
    api_key=os.getenv("AZURE_RESPONSES_OPENAI_API_KEY"),
    api_base="https://litellm8397336933.openai.azure.com",
    api_version="preview",  # 👈 key difference
)

for event in response:
    print(event)
First, add this to your litellm proxy config.yaml:
Azure Codex Models
model_list:
  - model_name: codex-mini
    litellm_params:
      model: azure/codex-mini
      api_key: os.environ/AZURE_RESPONSES_OPENAI_API_KEY
      api_base: https://litellm8397336933.openai.azure.com
      api_version: preview  # 👈 key difference
Start your LiteLLM proxy:
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Then use the OpenAI SDK pointed to your proxy:
Non-streaming
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
Streaming
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"             # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="codex-mini",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)