
Getting Started - E2E Tutorial

End-to-End tutorial for LiteLLM Proxy to:

  • Add an Azure OpenAI model
  • Make a successful /chat/completion call
  • Generate a virtual key
  • Set RPM limit on virtual key

Pre-Requisites

  • Install LiteLLM Docker Image OR LiteLLM CLI (pip package)
docker pull ghcr.io/berriai/litellm:main-latest

See all docker images
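
If you installed the pip package instead of Docker, the CLI can start the proxy directly once you have a config — a minimal sketch (the [proxy] extra installs the proxy server dependencies):

pip install 'litellm[proxy]'
litellm --config /path/to/config.yaml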

1. Add a model

Control LiteLLM Proxy with a config.yaml file.

Set up your config.yaml with your Azure model.

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/my_azure_deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview" # [OPTIONAL] litellm uses the latest azure api_version by default

Model List Specification

  • model_name (str) - The model name that clients will send in requests to the proxy.
  • litellm_params (dict) - See All LiteLLM Params
    • model (str) - Specifies the model name to be sent to litellm.acompletion / litellm.aembedding, etc. This is the identifier LiteLLM uses to route to the correct model + provider logic on the backend.
    • api_key (str) - The API key required for authentication. It can be read from an environment variable using the os.environ/ prefix.
    • api_base (str) - The API base for your Azure deployment.
    • api_version (str) - The API version to use when calling Azure's OpenAI API. Get the latest Inference API version here.

2. Make a successful /chat/completion call

LiteLLM Proxy is 100% OpenAI-compatible. Test your Azure model via the /chat/completions route.

2.1 Start Proxy

Save your config.yaml from step 1 as litellm_config.yaml.

docker run \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -e AZURE_API_KEY=d6*********** \
  -e AZURE_API_BASE=https://openai-***********/ \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug

# RUNNING on http://0.0.0.0:4000

Confirm your config.yaml got mounted correctly

Loaded config YAML (api_key and environment_variables are not shown):
{
  "model_list": [
    {
      "model_name ...

2.2 Make Call

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful math tutor. Guide the user through the solution step by step."
      },
      {
        "role": "user",
        "content": "how can I solve 8x + 7 = -23"
      }
    ]
  }'

Expected Response

{
  "id": "chatcmpl-2076f062-3095-4052-a520-7c321c115c68",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I am gpt-3.5-turbo",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "created": 1724962831,
  "model": "gpt-3.5-turbo",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 20,
    "prompt_tokens": 10,
    "total_tokens": 30
  }
}
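
Because the proxy is OpenAI-compatible, standard OpenAI request parameters pass through unchanged. For example, the same call with streaming enabled (a sketch; "stream": true is the standard OpenAI parameter):

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-3.5-turbo",
    "stream": true,
    "messages": [
      {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ]
  }'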

3. Generate a virtual key

Track spend and control model access via virtual keys for the proxy.

3.1 Set up a Database

Requirements: a Postgres database (the database_url in the config below connects the proxy to it).

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/my_azure_deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview" # [OPTIONAL] litellm uses the latest azure api_version by default

general_settings:
  master_key: sk-1234
  database_url: "postgresql://<user>:<password>@<host>:<port>/<dbname>" # 👈 KEY CHANGE

Save config.yaml as litellm_config.yaml (used in 3.2).


What is general_settings?

These are settings for the LiteLLM Proxy Server.

See All General Settings here.

  1. master_key (str)

    • Description:
      • Set a master key; this is your Proxy Admin key - you can use it to create other keys (🚨 must start with sk-).
    • Usage:
      • Set on config.yaml: set your master key under general_settings:master_key, e.g. master_key: sk-1234
      • Set env variable: set LITELLM_MASTER_KEY in your env
  2. database_url (str)

    • Description:
      • Set a database_url; this is the connection to your Postgres DB, which LiteLLM uses for generating keys, users, and teams.
    • Usage:
      • Set on config.yaml: set your database_url under general_settings:database_url, e.g. database_url: "postgresql://..."
      • Set DATABASE_URL=postgresql://<user>:<password>@<host>:<port>/<dbname> in your env (see the sketch after this list)
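
As noted in the usage bullets above, both values can come from the environment instead of config.yaml. A sketch of the equivalent docker run (same placeholders as step 3.2):

docker run \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -e LITELLM_MASTER_KEY=sk-1234 \
  -e DATABASE_URL="postgresql://<user>:<password>@<host>:<port>/<dbname>" \
  -e AZURE_API_KEY=d6*********** \
  -e AZURE_API_BASE=https://openai-***********/ \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug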

3.2 Start Proxy

docker run \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -e AZURE_API_KEY=d6*********** \
  -e AZURE_API_BASE=https://openai-***********/ \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug

3.3 Create Key w/ RPM Limit

Create a key with rpm_limit: 1. This will only allow 1 request per minute for calls to the proxy with this key.

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "rpm_limit": 1
  }'

See full API Spec

Expected Response

{
  "key": "sk-12..."
}
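
To double-check that the limit was attached, you can look the key up (this assumes the proxy's /key/info endpoint; pass the generated key as the key query parameter):

curl 'http://0.0.0.0:4000/key/info?key=sk-12...' \
  -H 'Authorization: Bearer sk-1234'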

3.4 Test it!

Use your virtual key from step 3.3

1st call - Expect to work!

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-12...' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful math tutor. Guide the user through the solution step by step."
      },
      {
        "role": "user",
        "content": "how can I solve 8x + 7 = -23"
      }
    ]
  }'

Expected Response

{
  "id": "chatcmpl-2076f062-3095-4052-a520-7c321c115c68",
  "choices": [
    ...
}

2nd call - Expect to fail!

Why did this call fail?

We set the virtual key's requests per minute (RPM) limit to 1, and this second call exceeds it.

curl -X POST 'http://0.0.0.0:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-12...' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful math tutor. Guide the user through the solution step by step."
      },
      {
        "role": "user",
        "content": "how can I solve 8x + 7 = -23"
      }
    ]
  }'

Expected Response

{
  "error": {
    "message": "Max parallel request limit reached. Hit limit for api_key: daa1b272072a4c6841470a488c5dad0f298ff506e1cc935f4a181eed90c182ad. tpm_limit: 100, current_tpm: 29, rpm_limit: 1, current_rpm: 2.",
    "type": "None",
    "param": "None",
    "code": "429"
  }
}

Troubleshooting

Non-root docker image?

If you need to run the docker image as a non-root user, use this.

SSL Verification Issue / Connection Error.

If you see

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1006)

OR

Connection Error.

You can disable SSL verification with:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/my_azure_deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2024-07-01-preview"

litellm_settings:
  ssl_verify: false # 👈 KEY CHANGE

What is litellm_settings?

LiteLLM Proxy uses the LiteLLM Python SDK for handling LLM API calls.

litellm_settings are module-level params for the LiteLLM Python SDK (equivalent to setting litellm.<some_param> on the SDK). You can see all params here.

Support & Talk with founders

  • Chat on WhatsApp
  • Chat on Discord