💰 Budgets, Rate Limits

Requirements:

Set Budgets

You can set budgets at 5 levels:

  • For the proxy
  • For an internal user
  • For a customer (end-user)
  • For a key
  • For a key (model specific budgets)

Apply a budget across all calls on the proxy

Step 1. Modify config.yaml

general_settings:
  master_key: sk-1234

litellm_settings:
  # other litellm settings
  max_budget: 0 # (float) sets max budget as $0 USD
  budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

Step 2. Start proxy

litellm --config /path/to/config.yaml

Step 3. Send test call

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
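
The same test call can be prepared from Python. This is a minimal sketch using only the standard library; `build_chat_request` is a hypothetical helper name, and the URL and key are the placeholder values from the curl example above. The request is only built here, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's /chat/completions route."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("http://0.0.0.0:4000", "sk-1234", "what llm are you")
# To send it against a running proxy: urllib.request.urlopen(request)
```

Building the body with `json.dumps` avoids the hand-written JSON mistakes (trailing commas, unquoted values) that are easy to make in a shell string.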

Reset Budgets

Reset budgets across keys/internal users/teams/customers

budget_duration: The budget resets at the end of the specified duration. If not set, the budget never resets. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").

# 👈 KEY CHANGE: "budget_duration" in the request body
curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'
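
To make the duration format concrete, here is a small sketch that converts the four suffixes listed above into seconds. `duration_to_seconds` is a hypothetical helper written for illustration, not LiteLLM's own parser, and it only covers the units named in this section.

```python
import re

# Unit suffixes listed in this document; this is NOT LiteLLM's own parser.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(duration: str) -> int:
    """Convert a duration string such as "30d" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


print(duration_to_seconds("10s"))  # 10
print(duration_to_seconds("30d"))  # 2592000
```

Other suffixes that appear elsewhere on this page (for example "1mo") are deliberately rejected by this sketch, since their exact length is not defined here.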

Note: By default, the server checks for resets every 10 minutes, to minimize DB calls.

To change this, set proxy_budget_rescheduler_min_time and proxy_budget_rescheduler_max_time.

E.g., to check every second:

general_settings:
  proxy_budget_rescheduler_min_time: 1
  proxy_budget_rescheduler_max_time: 1

Set Rate Limits

You can set:

  • tpm limits (tokens per minute)
  • rpm limits (requests per minute)
  • max parallel requests
  • rpm / tpm limits per model for a given key

Use /team/new or /team/update to persist rate limits across multiple keys for a team.

curl --location 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"team_id": "my-prod-team", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}'

See Swagger

Expected Response

{
  "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
  "expires": "2024-01-19T01:21:12.816168",
  "team_id": "my-prod-team"
}
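
To illustrate what rpm and tpm limits mean in practice, the sketch below counts requests and tokens inside a 60-second window and rejects anything that would exceed either limit. It is an illustration only: `MinuteWindowLimiter` is a hypothetical class, the fixed-window strategy is an assumption rather than a description of how the proxy enforces limits, and max parallel requests are not modeled.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MinuteWindowLimiter:
    """Illustrative fixed-window counter; not LiteLLM's implementation."""

    rpm_limit: int
    tpm_limit: int
    window_start: float = field(default_factory=time.monotonic)
    requests: int = 0
    tokens: int = 0

    def allow(self, tokens: int, now: Optional[float] = None) -> bool:
        """Return True and record usage if the request fits both limits."""
        now = time.monotonic() if now is None else now
        if now - self.window_start >= 60:
            # A new minute has started: reset both counters.
            self.window_start, self.requests, self.tokens = now, 0, 0
        if self.requests + 1 > self.rpm_limit or self.tokens + tokens > self.tpm_limit:
            return False
        self.requests += 1
        self.tokens += tokens
        return True


# Limits taken from the /team/new example: tpm_limit=20, rpm_limit=4.
limiter = MinuteWindowLimiter(rpm_limit=4, tpm_limit=20, window_start=0.0)
print([limiter.allow(tokens=6, now=1.0) for _ in range(4)])  # [True, True, True, False]
```

With 6 tokens per request, the fourth call is rejected by the token limit (24 > 20) even though the request limit of 4 has not been exceeded.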

Set default budget for ALL internal users

Use this to set a default budget for users who you give keys to.

This will apply when a user has user_role="internal_user" (set this via /user/new or /user/update).

This will NOT apply if a key has a team_id (team budgets will apply then). Tell us how we can improve this!
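
The two conditions above can be summarized as a small predicate. This is a sketch of the stated rules only; `default_user_budget_applies` is a hypothetical function name, and the proxy's actual checks may involve more inputs.

```python
from typing import Optional


def default_user_budget_applies(user_role: Optional[str], team_id: Optional[str]) -> bool:
    """Return True when the default internal-user budget would apply to a key."""
    if team_id is not None:
        # The key belongs to a team, so team budgets apply instead.
        return False
    return user_role == "internal_user"


print(default_user_budget_applies("internal_user", None))            # True
print(default_user_budget_applies("internal_user", "my-prod-team"))  # False
```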

  1. Define max budget in your config.yaml
model_list:
  - model_name: "gpt-3.5-turbo"
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  max_internal_user_budget: 0 # amount in USD
  internal_user_budget_duration: "1mo" # reset every month
  2. Create key for user
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'

Expected Response:

{
...
"key": "sk-X53RdxnDhzamRwjKXR4IHg"
}
  3. Test it!
curl -L -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-X53RdxnDhzamRwjKXR4IHg' \
-d '{
  "model": "gpt-3.5-turbo",
  "messages": [{"role": "user", "content": "Hey, how is it going?"}]
}'

Expected Response:

{
  "error": {
    "message": "ExceededBudget: User=<user_id> over budget. Spend=3.7e-05, Budget=0.0",
    "type": "budget_exceeded",
    "param": null,
    "code": "400"
  }
}
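
A client can tell this error apart from other failures by inspecting the `type` field. The sketch below assumes the error body has the shape shown above; `is_budget_exceeded` is a hypothetical helper, not part of any LiteLLM client.

```python
import json


def is_budget_exceeded(body: str) -> bool:
    """Return True if a proxy error body reports an exhausted budget."""
    try:
        error = json.loads(body).get("error") or {}
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object: treat as "not a budget error".
        return False
    return isinstance(error, dict) and error.get("type") == "budget_exceeded"


sample = (
    '{"error": {"message": "ExceededBudget: User=<user_id> over budget. '
    'Spend=3.7e-05, Budget=0.0", "type": "budget_exceeded", '
    '"param": null, "code": "400"}}'
)
print(is_budget_exceeded(sample))  # True
```

Checking `type` rather than matching on the message text keeps the check independent of the exact spend figures in the message.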

Grant Access to new model

Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.).

What is the difference between doing this with /key/generate and /user/new? If you do it on /user/new, it persists across all keys generated for that user.

Step 1. Assign model, access group in config.yaml

model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group

Step 2. Create key with access group

# 👈 "models" takes the Model Access Group name
curl --location 'http://localhost:4000/user/new' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], "max_budget": 0}'

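The idea behind an access group is that one group name stands for every model tagged with it, so adding a model to the group later extends access without touching existing users. The sketch below illustrates that expansion on a trimmed copy of the Step 1 config; `models_for` is a hypothetical function, and how the proxy resolves groups internally is not described here.

```python
from typing import Dict, List

# Mirrors the model_list entry from Step 1, trimmed to the fields used here.
MODEL_LIST = [
    {
        "model_name": "text-embedding-ada-002",
        "model_info": {"access_groups": ["beta-models"]},
    },
]


def models_for(allowed: List[str], model_list: List[Dict]) -> List[str]:
    """Expand a list of model names / access groups into concrete model names."""
    resolved: List[str] = []
    for entry in model_list:
        name = entry["model_name"]
        groups = entry.get("model_info", {}).get("access_groups", [])
        matches = name in allowed or any(group in allowed for group in groups)
        if matches and name not in resolved:
            resolved.append(name)
    return resolved


print(models_for(["beta-models"], MODEL_LIST))  # ['text-embedding-ada-002']
```
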
Create new keys for existing internal user

Just include user_id in the /key/generate request.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish@berri.ai"}'

API Specification

GenericBudgetInfo

A Pydantic model that defines budget information with a time period and limit.

from pydantic import BaseModel


class GenericBudgetInfo(BaseModel):
    budget_limit: float  # The maximum budget amount in USD
    time_period: str  # Duration string like "1d", "30d", etc.

Fields:

  • budget_limit (float): The maximum budget amount in USD
  • time_period (str): Duration string specifying the time period for the budget. Supported formats:
    • Seconds: "30s"
    • Minutes: "30m"
    • Hours: "30h"
    • Days: "30d"

Example:

{
  "budget_limit": 0.0001,
  "time_period": "1d"
}