
Getting Started

LiteLLM is an open-source library that gives you a single, unified interface to call 100+ LLMs (OpenAI, Anthropic, Vertex AI, Bedrock, and more) using the OpenAI format.

  • Call any provider using the same completion() interface; no re-learning the API for each one
  • Consistent output format regardless of which provider or model you use
  • Built-in retry / fallback logic across multiple deployments via the Router
  • Self-hosted LLM Gateway (Proxy) with virtual keys, cost tracking, and an admin UI



Installation

pip install litellm

To run the full Proxy Server (LLM Gateway):

pip install 'litellm[proxy]'

Quick Start

Make your first LLM call using the provider of your choice:

from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)

Every response follows the OpenAI Chat Completions format, regardless of provider. ✅
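Because the interface is uniform, switching providers is just a change to the model string; the request body stays identical. A minimal sketch of that idea (build_request is a hypothetical helper for illustration, not part of LiteLLM, and the model names assume you have the matching API keys set):

```python
# With LiteLLM only the "provider/model" string changes between providers;
# the rest of the request body is identical.
def build_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# You would pass these kwargs straight to litellm.completion(**req):
req_openai = build_request("openai/gpt-4o", "Hello")
req_claude = build_request("anthropic/claude-3-haiku-20240307", "Hello")
print(req_openai["messages"] == req_claude["messages"])  # True
```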

Response Format

Non-streaming responses return a ModelResponse object:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thanks for asking."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 12,
    "total_tokens": 25
  }
}
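Reading from that shape, here is a quick sketch of pulling out the reply text and token counts. It uses a plain dict mirroring the JSON above; on a real ModelResponse you can use attribute access instead (e.g. response.choices[0].message.content):

```python
# A plain-dict copy of the non-streaming response shape shown above.
response = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm doing well, thanks for asking.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 12, "total_tokens": 25},
}

text = response["choices"][0]["message"]["content"]
total = response["usage"]["total_tokens"]
print(text)   # Hello! I'm doing well, thanks for asking.
print(total)  # 25
```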

Streaming responses (stream=True) yield ModelResponseStream chunks:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1677858242,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Hello"
      },
      "finish_reason": null
    }
  ]
}
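Each streamed chunk carries only a delta, so clients typically concatenate delta.content across chunks to rebuild the full message. A sketch over plain dicts mirroring the chunk shape above (a real stream yields ModelResponseStream objects with attribute access):

```python
# Simulated chunks in the shape shown above; the final chunk's delta is
# empty and only carries a finish_reason.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": " there"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

full_text = ""
for chunk in chunks:
    delta = chunk["choices"][0]["delta"]
    # content may be missing or None, so fall back to "".
    full_text += delta.get("content") or ""

print(full_text)  # Hello there
```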

📖 Full output format reference →


New to LiteLLM?

Want to get started fast? Head to Tutorials for step-by-step walkthroughs: AI coding tools, agent SDKs, proxy setup, and more.

Need to understand a specific feature? Check Guides for streaming, function calling, prompt caching, and other how-tos.


Choose Your Path


LiteLLM Python SDK

Streaming

Add stream=True to receive chunks as they are generated:

from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

for chunk in completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")

Exception Handling

LiteLLM maps every provider's errors to the OpenAI exception types, so your existing error handling works out of the box:

import litellm

try:
    litellm.completion(
        model="anthropic/claude-instant-1",
        messages=[{"role": "user", "content": "Hey!"}]
    )
except litellm.AuthenticationError as e:
    print(f"Bad API key: {e}")
except litellm.RateLimitError as e:
    print(f"Rate limited: {e}")
except litellm.APIError as e:
    print(f"API error: {e}")
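These uniform exception types are what make cross-provider retry and fallback logic simple. The Router does this for you, but the core idea can be sketched by hand: try models in order and fall through on failure. try_models and fake_call below are illustrative stand-ins, not LiteLLM APIs; in real code you would call litellm.completion and catch litellm's exception types rather than a bare Exception:

```python
def try_models(call, models, prompt):
    """Try each model in order; return the first success, re-raise the last error."""
    last_err = None
    for model in models:
        try:
            return call(model, prompt)
        except Exception as err:  # real code: litellm.RateLimitError, litellm.APIError, ...
            last_err = err
    raise last_err

# Stand-in for litellm.completion: fails for the first model, succeeds for the second.
def fake_call(model, prompt):
    if model == "openai/gpt-4o":
        raise RuntimeError("rate limited")
    return f"{model}: ok"

result = try_models(fake_call, ["openai/gpt-4o", "anthropic/claude-3-haiku-20240307"], "Hi")
print(result)  # anthropic/claude-3-haiku-20240307: ok
```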

Logging & Observability

Send input/output to Langfuse, MLflow, Helicone, Lunary, and more with a single line:

import litellm

litellm.success_callback = ["langfuse", "mlflow", "helicone"]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi!"}]
)

📖 See all observability integrations →

Track Costs & Usage

Use a callback to capture cost per response:

import litellm

def track_cost(kwargs, completion_response, start_time, end_time):
    print("Cost:", kwargs.get("response_cost", 0))

litellm.success_callback = [track_cost]

litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
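To aggregate spend across many calls, the same callback hook can add each response_cost to a running total. A stdlib-only sketch, invoking the callback by hand with the kind of kwargs dict LiteLLM passes to success callbacks (CostTracker is a hypothetical helper, and the cost figures are made-up examples):

```python
class CostTracker:
    """Accumulates response_cost values passed to a LiteLLM success callback."""

    def __init__(self):
        self.total = 0.0

    def __call__(self, kwargs, completion_response, start_time, end_time):
        # response_cost may be absent or None for providers without pricing data.
        self.total += kwargs.get("response_cost") or 0.0

tracker = CostTracker()
# In real code you would register it: litellm.success_callback = [tracker]
# Here we invoke it directly with example kwargs:
tracker({"response_cost": 0.0021}, None, None, None)
tracker({"response_cost": 0.0013}, None, None, None)
tracker({}, None, None, None)  # missing cost is treated as zero
print(round(tracker.total, 4))  # 0.0034
```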

📖 Custom callback docs →


LiteLLM Proxy Server (LLM Gateway)

The proxy is a self-hosted OpenAI-compatible gateway. Any client that works with OpenAI works with the proxy, with no code changes needed.

LiteLLM Proxy Dashboard

Step 1: Start the proxy

litellm --model huggingface/bigcode/starcoder
# Proxy running on http://0.0.0.0:4000

Step 2: Call it with the OpenAI client

import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a short poem"}]
)
print(response.choices[0].message.content)

👉 Full proxy quickstart with Docker →

Debugging tool

Use /utils/transform_request to inspect exactly what LiteLLM sends to any provider, which is useful for debugging prompt formatting, header issues, and provider-specific parameters.

🔗 Interactive API explorer (Swagger) →


Agent & MCP Gateway

LiteLLM is a unified gateway for LLMs, agents, and MCP, so you don't need a separate agent or MCP gateway. One endpoint for 100+ models, A2A agents, and MCP tools.


What to Explore Next