💰 Budgets, Rate Limits


Set Budgets

You can set budgets at 5 levels:

  • For the proxy
  • For an internal user
  • For a customer (end-user)
  • For a key
  • For a key (model specific budgets)

Apply a budget across all calls on the proxy

Step 1. Modify config.yaml

general_settings:
  master_key: sk-1234

litellm_settings:
  # other litellm settings
  max_budget: 0 # (float) sets max budget as $0 USD
  budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

Step 2. Start proxy

litellm --config /path/to/config.yaml

Step 3. Send test call

curl --location '' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
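
The same test call can be built from Python. This is a minimal sketch that only constructs the request and does not send it. The base URL and the `/chat/completions` path are assumptions, since the URL is left blank in the command above, and `build_test_call` is a hypothetical helper name.

```python
import json
import urllib.request


def build_test_call(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the proxy.

    base_url and the /chat/completions path are assumptions; replace them
    with the address your proxy actually listens on.
    """
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "what llm are you"}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`.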

Reset Budgets

Reset budgets across keys/internal users/teams/customers

budget_duration: The budget is reset at the end of the specified duration. If not set, the budget is never reset. You can set the duration in seconds ("30s"), minutes ("30m"), hours ("30h"), or days ("30d").
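
To make the duration format concrete, here is a minimal sketch of how such strings could be converted to seconds and used to compute a reset time. The function names are hypothetical and this is not the proxy's own parser; it only covers the four suffixes listed above.

```python
from datetime import datetime, timedelta

# Seconds per unit for the four suffixes named in the text.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_to_seconds(duration: str) -> int:
    """Convert a string such as "30s", "30m", "30h" or "30d" to seconds."""
    value, unit = duration[:-1], duration[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(value) * _UNIT_SECONDS[unit]


def next_reset(start: datetime, duration: str) -> datetime:
    """Return the time at which a budget that started at `start` resets."""
    return start + timedelta(seconds=duration_to_seconds(duration))
```

For example, `duration_to_seconds("30d")` gives 2592000, and a budget with `budget_duration` of "10s" that starts at midnight would reset ten seconds later.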

# 👈 KEY CHANGE: budget_duration is set to "10s"
curl '' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": "10s"
}'

Note: By default, the server checks for resets every 10 minutes, to minimize DB calls.

To change this, set proxy_budget_rescheduler_min_time and proxy_budget_rescheduler_max_time (both in seconds).

E.g., to check every second:

proxy_budget_rescheduler_min_time: 1
proxy_budget_rescheduler_max_time: 1
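
One way to read the two settings is as lower and upper bounds, in seconds, on a randomized delay between reset checks. That reading is an assumption about the mechanism, not something the text above confirms, and `next_check_interval` is a hypothetical name. A minimal sketch:

```python
import random


def next_check_interval(min_time: int, max_time: int) -> int:
    """Pick the delay, in seconds, before the next budget-reset check."""
    if min_time > max_time:
        raise ValueError("min_time must not exceed max_time")
    # randint is inclusive on both ends, so min == max gives a fixed delay.
    return random.randint(min_time, max_time)
```

Under this reading, setting both values to 1 pins the delay at exactly one second, which matches the example above.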

Set Rate Limits

You can set:

  • tpm limits (tokens per minute)
  • rpm limits (requests per minute)
  • max parallel requests

Use /user/new to persist rate limits across multiple keys.

curl --location '' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"user_id": "", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}'

See Swagger

Expected Response

{
  "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
  "expires": "2024-01-19T01:21:12.816168",
  "user_id": ""
}

Grant Access to new model

Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.).

What is the difference between doing this with /key/generate and with /user/new? If you do it on /user/new, the access persists across all keys generated for that user.

Step 1. Assign model, access group in config.yaml

model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group

Step 2. Create key with access group

# 👈 "models" takes the Model Access Group name
curl --location 'http://localhost:4000/user/new' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], "max_budget": 0}'
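
The sketch below shows how an access group name in a user's "models" list could resolve to concrete model names. It is an illustration under assumptions, not the proxy's own lookup: `allowed_models` is a hypothetical helper, and `model_groups` stands in for the model_name to access_groups mapping from config.yaml.

```python
from __future__ import annotations


def allowed_models(
    model_groups: dict[str, list[str]], granted: list[str]
) -> set[str]:
    """Resolve model names and access group names to concrete model names.

    model_groups maps each model_name to its access_groups list.
    granted is the "models" list attached to a user or key.
    """
    allowed = set()
    for model_name, groups in model_groups.items():
        # A model is allowed if it is named directly or shares a group.
        if model_name in granted or any(g in granted for g in groups):
            allowed.add(model_name)
    return allowed
```

Because the user holds the group name rather than individual model names, adding a new model to "beta-models" later extends that user's access without changing the user record.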

Create new keys for existing internal user

Just include user_id in the /key/generate request.

curl --location '' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": ""}'