Quick Start

Quick start CLI, Config, Docker

LiteLLM Server manages:

$ pip install 'litellm[proxy]'

Quick Start - LiteLLM Proxy CLI

Run the following command to start the LiteLLM proxy:

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on


In a new shell, run the following command to send a test request to the proxy. Ensure you're using openai v1.0.0+.

litellm --test

The proxy will now automatically route any request for gpt-3.5-turbo to bigcode/starcoder, hosted on Hugging Face Inference Endpoints.

Supported LLMs

All LLMs supported by LiteLLM are supported on the Proxy. See all supported LLMs

$ litellm --model bedrock/anthropic.claude-v2

Quick Start - LiteLLM Proxy + Config.yaml

The config lets you define a model list and set per-model parameters such as api_base and max_tokens (any litellm params). See more details about the config here

Create a Config for LiteLLM Proxy

Example config

model_list:
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params: # all params accepted by litellm.completion()
      model: azure/<your-deployment-name>
      api_base: <your-azure-api-endpoint>
      api_key: <your-azure-api-key>
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_key: <your-azure-api-key>
  - model_name: vllm-model
    litellm_params:
      model: openai/<your-model-name>
      api_base: <your-api-base> # e.g.
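To make the alias idea concrete, here is a minimal Python sketch that mirrors the example config as plain dictionaries and groups the underlying models by their user-facing model_name. The helper name group_by_alias is hypothetical and not part of LiteLLM, and the sketch does not show how the proxy itself chooses between entries that share an alias.

```python
from collections import defaultdict

# Mirror of the example config as plain Python data (placeholders kept as-is).
model_list = [
    {"model_name": "gpt-3.5-turbo",
     "litellm_params": {"model": "azure/<your-deployment-name>"}},
    {"model_name": "gpt-3.5-turbo",
     "litellm_params": {"model": "azure/gpt-turbo-small-ca"}},
    {"model_name": "vllm-model",
     "litellm_params": {"model": "openai/<your-model-name>"}},
]


def group_by_alias(entries):
    """Hypothetical helper: map each user-facing alias to its underlying models."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["model_name"]].append(entry["litellm_params"]["model"])
    return dict(groups)


aliases = group_by_alias(model_list)
print(aliases["gpt-3.5-turbo"])
```

In this example the alias gpt-3.5-turbo is backed by two Azure entries, while vllm-model is backed by a single OpenAI-compatible entry.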

Run proxy with config

litellm --config your_config.yaml

Using LiteLLM Proxy - Curl Request, OpenAI Package, Langchain

curl --location '<your-proxy-base-url>/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'
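The same request can be sketched in Python using only the standard library. This is an illustrative sketch, not an official client: the helper build_chat_request is hypothetical, and PROXY_BASE_URL is an assumed address that you should replace with the one your proxy prints on startup. The request is built but not sent, so the snippet runs without a live proxy.

```python
import json
import urllib.request


def build_chat_request(base_url, model, content):
    """Build (but do not send) a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# PROXY_BASE_URL is an assumption; use the address your proxy prints on startup.
PROXY_BASE_URL = "http://localhost:8000"
req = build_chat_request(PROXY_BASE_URL, "gpt-3.5-turbo", "what llm are you")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```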

More Info

📖 Proxy Endpoints - Swagger Docs

  • POST /chat/completions - chat completions endpoint to call 100+ LLMs
  • POST /completions - completions endpoint
  • POST /embeddings - embedding endpoint for Azure, OpenAI, Huggingface endpoints
  • GET /models - available models on server
  • POST /key/generate - generate a key to access the proxy

Quick Start Docker Image: Github Container Registry

Pull the litellm ghcr docker image

See the latest available ghcr docker image here:

docker pull

Run the Docker Image

docker run

Run the Docker Image with LiteLLM CLI args

See all supported CLI args here:

Here's how you can run the docker image and pass your config to litellm

docker run --config your_config.yaml

Here's how you can run the docker image and start litellm on port 8002 with num_workers=8

docker run --port 8002 --num_workers 8

Run the Docker Image using docker compose

Step 1

Here's an example docker-compose.yml file

version: "3.9"
services:
  litellm:
    image: <your-litellm-image>
    ports:
      - "8000:8000" # Map the container port to the host, change the host port if necessary
    volumes:
      - ./litellm-config.yaml:/app/config.yaml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements or pass any other supported CLI argument. Make sure the port passed here matches the container port defined above in the `ports` value
    command: [ "--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8" ]

# ...rest of your docker-compose config if any
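The comment in the compose file asks that the port passed via --port match the container port in the ports mapping. A minimal Python sketch of that check follows; the helper names container_port and cli_port are hypothetical, and the sketch assumes simple "host:container" mappings without a protocol suffix or bind address.

```python
def container_port(mapping):
    """Return the container side of a "host:container" port mapping string."""
    return mapping.split(":")[-1]


def cli_port(command):
    """Return the value following "--port" in a CLI argument list, or None."""
    if "--port" in command:
        idx = command.index("--port")
        if idx + 1 < len(command):
            return command[idx + 1]
    return None


# Values copied from the example docker-compose.yml above.
ports = ["8000:8000"]
command = ["--config", "/app/config.yaml", "--port", "8000", "--num_workers", "8"]

# True when the port given to the CLI is one of the mapped container ports.
ports_match = cli_port(command) in {container_port(p) for p in ports}
print(ports_match)
```

If you change only the host side of the mapping (for example "9000:8000"), the check still passes, because the container port is unchanged.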

Step 2

Create a litellm-config.yaml file containing your LiteLLM config in the same directory as your docker-compose.yml file.

Check the config doc here

Step 3

Run docker-compose up or docker compose up, depending on your Docker installation.

Use the -d flag to run the container in detached mode (in the background), e.g. docker compose up -d

Your LiteLLM container should be running now on the defined port e.g. 8000.

Using with OpenAI compatible projects

Set base_url to the LiteLLM Proxy server

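import openai

client = openai.OpenAI(
    api_key="anything",  # placeholder value; replace with a proxy key if your proxy requires one
    base_url="<your-proxy-base-url>",
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(response)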


Debugging Proxy

Events that occur during normal operation

litellm --model gpt-3.5-turbo --debug

Detailed information

litellm --model gpt-3.5-turbo --detailed_debug

Set Debug Level using env variables

Events that occur during normal operation

export LITELLM_LOG=INFO

Detailed information

export LITELLM_LOG=DEBUG

No Logs

export LITELLM_LOG=None
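If you start the proxy from a Python script rather than a shell, the same environment variable can be passed to the child process. The sketch below only builds the command and environment; the helper proxy_launch_spec is hypothetical and not part of LiteLLM, and nothing is launched until you pass the result to subprocess yourself.

```python
import os


def proxy_launch_spec(model, log_level=None):
    """Hypothetical helper: return (argv, env) for starting the proxy with LITELLM_LOG set."""
    argv = ["litellm", "--model", model]
    env = dict(os.environ)  # copy so the current process environment is untouched
    if log_level is not None:
        env["LITELLM_LOG"] = log_level
    return argv, env


argv, env = proxy_launch_spec("gpt-3.5-turbo", log_level="None")
print(argv, env["LITELLM_LOG"])
# To actually start the proxy: subprocess.run(argv, env=env)
```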