Getting Started
LiteLLM is an open-source library that gives you a single, unified interface to call 100+ LLMs (OpenAI, Anthropic, Vertex AI, Bedrock, and more) using the OpenAI format.
- Call any provider through the same `completion()` interface, with no new API to learn for each one
- Consistent output format regardless of which provider or model you use
- Built-in retry / fallback logic across multiple deployments via the Router
- Self-hosted LLM Gateway (Proxy) with virtual keys, cost tracking, and an admin UI
Installation

```shell
pip install litellm
```

To run the full Proxy Server (LLM Gateway):

```shell
pip install 'litellm[proxy]'
```
Quick Start
Make your first LLM call using the provider of your choice:
OpenAI:

```python
from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```

Anthropic:

```python
from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

response = completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```

Vertex AI:

```python
from litellm import completion
import os

# auth: run 'gcloud auth application-default login'
os.environ["VERTEXAI_PROJECT"] = "your-project-id"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/gemini-1.5-pro",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```

Bedrock:

```python
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret"
os.environ["AWS_REGION_NAME"] = "us-east-1"

response = completion(
    model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```

Ollama:

```python
from litellm import completion

response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    api_base="http://localhost:11434"
)
print(response.choices[0].message.content)
```

Azure OpenAI:

```python
from litellm import completion
import os

os.environ["AZURE_API_KEY"] = "your-key"
os.environ["AZURE_API_BASE"] = "https://your-resource.openai.azure.com"
os.environ["AZURE_API_VERSION"] = "2024-02-01"

response = completion(
    model="azure/your-deployment-name",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
```
Every response follows the OpenAI Chat Completions format, regardless of provider.
Response Format

Non-streaming responses return a `ModelResponse` object:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thanks for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 12,
    "total_tokens": 25
  }
}
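Because this shape is stable across providers, response handling can be written once. Here is a minimal sketch over a plain dict mirroring the fields shown above (a stand-in for a real response; actual `ModelResponse` objects also support attribute access like `response.choices[0].message.content`):

```python
# Plain dict with the same fields as the ModelResponse example above.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1677858242,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm doing well, thanks for asking.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 12, "total_tokens": 25},
}

def extract_reply(resp: dict) -> tuple[str, int]:
    """Return (assistant text, total tokens) from an OpenAI-format response."""
    text = resp["choices"][0]["message"]["content"]
    tokens = resp["usage"]["total_tokens"]
    return text, tokens

text, tokens = extract_reply(response)
print(tokens)  # 25
```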
Streaming responses (`stream=True`) yield `ModelResponseStream` chunks:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1677858242,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Hello"
      },
      "finish_reason": null
    }
  ]
}
```
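Since each chunk carries only a delta, a client reassembles the full message by concatenating `delta.content` across chunks, skipping the final chunk that has no content. A minimal sketch over dict-shaped chunks like the example above:

```python
# Simulated stream of chunk dicts shaped like the example above.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

def assemble(stream) -> str:
    """Concatenate delta.content across chunks; the final chunk carries none."""
    parts = []
    for chunk in stream:
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            parts.append(content)
    return "".join(parts)

print(assemble(chunks))  # Hello, world
```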
→ Full output format reference
New to LiteLLM?
Want to get started fast? Head to Tutorials for step-by-step walkthroughs: AI coding tools, agent SDKs, proxy setup, and more.
Need to understand a specific feature? Check Guides for streaming, function calling, prompt caching, and other how-tos.
Choose Your Path
- `completion()`, `embedding()`, `image_generation()`, and more
- Router with retry, fallback, and load balancing
- OpenAI-compatible exceptions across all providers
- Observability callbacks (Langfuse, MLflow, Heliconeβ¦)
- Virtual keys with per-key/team/user budgets
- Centralized logging, guardrails, and caching
- Admin UI for monitoring and management
- Drop-in replacement for any OpenAI-compatible client
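The Router's retry / fallback behavior boils down to a simple pattern: try a deployment, retry on failure, then move to the next one. Below is a simplified, dependency-free sketch of that pattern, not the actual `litellm.Router` implementation (which adds load balancing, cooldowns, and more); the function names here are illustrative:

```python
def call_with_fallbacks(deployments, request, max_retries=2):
    """Try each deployment in order, retrying each up to max_retries times.

    `deployments` is a list of callables that take the request and return a
    response (or raise on failure) -- stand-ins for configured deployments.
    """
    last_error = None
    for deploy in deployments:
        for _attempt in range(max_retries):
            try:
                return deploy(request)
            except Exception as e:
                last_error = e
    raise RuntimeError("all deployments failed") from last_error

# Example: the primary deployment always fails, the fallback succeeds.
def primary(req):
    raise ConnectionError("rate limited")

def secondary(req):
    return f"echo: {req}"

print(call_with_fallbacks([primary, secondary], "Hi"))  # echo: Hi
```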
LiteLLM Python SDK

Streaming

Add `stream=True` to receive chunks as they are generated:

```python
from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

for chunk in completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")
```
Exception Handling

LiteLLM maps every provider's errors to the OpenAI exception types, so your existing error handling works out of the box:

```python
import litellm

try:
    litellm.completion(
        model="anthropic/claude-instant-1",
        messages=[{"role": "user", "content": "Hey!"}]
    )
except litellm.AuthenticationError as e:
    print(f"Bad API key: {e}")
except litellm.RateLimitError as e:
    print(f"Rate limited: {e}")
except litellm.APIError as e:
    print(f"API error: {e}")
```
Logging & Observability

Send input/output to Langfuse, MLflow, Helicone, Lunary, and more with a single line:

```python
import litellm

litellm.success_callback = ["langfuse", "mlflow", "helicone"]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi!"}]
)
```
→ See all observability integrations
Track Costs & Usage

Use a callback to capture cost per response:

```python
import litellm

def track_cost(kwargs, completion_response, start_time, end_time):
    print("Cost:", kwargs.get("response_cost", 0))

litellm.success_callback = [track_cost]

litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
```
LiteLLM Proxy Server (LLM Gateway)

The proxy is a self-hosted, OpenAI-compatible gateway. Any client that works with OpenAI works with the proxy, with no code changes needed.
Step 1: Start the proxy

pip:

```shell
litellm --model huggingface/bigcode/starcoder
# Proxy running on http://0.0.0.0:4000
```

Docker, with a config file (e.g. litellm_config.yaml):

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/your-deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
```

```shell
docker run \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -e AZURE_API_KEY=your-key \
  -e AZURE_API_BASE=https://your-resource.openai.azure.com/ \
  -p 4000:4000 \
  docker.litellm.ai/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug
```
Step 2: Call it with the OpenAI client

```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem"}]
)
print(response.choices[0].message.content)
```
→ Full proxy quickstart with Docker
Use `/utils/transform_request` to inspect exactly what LiteLLM sends to any provider, which is useful for debugging prompt formatting, header issues, and provider-specific parameters.
→ Interactive API explorer (Swagger)
Agent & MCP Gateway
LiteLLM is a unified gateway for LLMs, agents, and MCP, so you don't need a separate agent or MCP gateway: one endpoint serves 100+ models, A2A agents, and MCP tools.