
Grafana, Prometheus metrics [BETA]

LiteLLM exposes a /metrics endpoint for Prometheus to poll.

Quick Start

If you're running the LiteLLM CLI with litellm --config proxy_config.yaml, you need to run pip install prometheus_client==0.20.0 first. The package is already pre-installed on the litellm Docker image.

Add this to your proxy config.yaml

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]

Start the proxy

litellm --config config.yaml --debug

Test Request

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
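The same test request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the proxy is reachable at http://localhost:4000 and requires no authentication, as in the curl example. The helper names `build_chat_request` and `send_chat_request` are hypothetical and are not part of LiteLLM.

```python
import json
import urllib.request

# Assumption: the proxy listens on localhost:4000, as in the curl example.
PROXY_BASE_URL = "http://localhost:4000"


def build_chat_request(model, content, base_url=PROXY_BASE_URL):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_chat_request("gpt-3.5-turbo", "what llm are you")
    print(send_chat_request(req))
```

Building the request separately from sending it makes it possible to inspect the URL and payload without a running proxy.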

View metrics at <proxy_base_url>/metrics. For a proxy running locally, visit:

http://localhost:4000/metrics
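To check the endpoint from a script rather than a browser, the response can be fetched and parsed. The sketch below handles the common case of the Prometheus text exposition format (one sample per line, optional labels in braces). The function names `fetch_metrics` and `parse_metrics` are hypothetical, the base URL is an assumption taken from the example above, and escaped characters inside label values are left as-is rather than unescaped.

```python
import re
import urllib.request

# Matches lines like: name{label="value",...} 12.0   (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches a single label pair inside the braces, e.g. model="gpt-3.5-turbo"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # ignore anything that does not look like a sample
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url="http://localhost:4000"):
    """Download the raw /metrics text (assumes no authentication is needed)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for name, labels, value in parse_metrics(fetch_metrics()):
        if name.startswith("litellm_"):
            print(name, labels, value)
```

For production use, the parser shipped with the prometheus_client package is likely a better choice than a hand-written regular expression.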

Metrics Tracked

| Metric Name | Description |
|---|---|
| litellm_requests_metric | Number of requests made, per "user", "key", "model", "team", "end-user" |
| litellm_spend_metric | Total spend, per "user", "key", "model", "team", "end-user" |
| litellm_total_tokens | Input + output tokens, per "user", "key", "model", "team", "end-user" |
| litellm_llm_api_failed_requests_metric | Number of failed LLM API requests, per "user", "key", "model", "team", "end-user" |
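Because each metric is recorded per combination of labels, a total for a single dimension (for example, tokens per model) is obtained by summing across the other labels. The sketch below illustrates this on samples already parsed into (name, labels, value) tuples. The helper `sum_by_label`, the sample values, and the exact label keys are hypothetical; check your own /metrics output for the real names, since prometheus_client typically appends `_total` to counter names in the exposition output.

```python
from collections import defaultdict


def sum_by_label(samples, metric_name, label):
    """Sum the values of one metric, grouped by a single label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric_name:
            # Samples that lack the label are grouped under the empty string.
            totals[labels.get(label, "")] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical samples, not real proxy output.
    samples = [
        ("litellm_total_tokens", {"model": "gpt-3.5-turbo", "team": "a"}, 120.0),
        ("litellm_total_tokens", {"model": "gpt-3.5-turbo", "team": "b"}, 30.0),
        ("litellm_spend_metric", {"model": "gpt-3.5-turbo", "team": "a"}, 0.002),
    ]
    print(sum_by_label(samples, "litellm_total_tokens", "model"))
```

In a Prometheus or Grafana setup the same aggregation would normally be written as a query on the server side; the Python version is only meant to show what the per-label breakdown means.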

Monitor System Health

To monitor the health of services adjacent to LiteLLM (Redis / Postgres), add the following to your proxy config.yaml:

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
litellm_settings:
  service_callback: ["prometheus_system"]

| Metric Name | Description |
|---|---|
| litellm_redis_latency | Histogram of latency for Redis calls |
| litellm_redis_fails | Number of failed Redis calls |
| litellm_self_latency | Histogram of latency for successful LiteLLM API calls |
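A Prometheus histogram is exposed as several series: `<name>_bucket`, `<name>_sum`, and `<name>_count`. As a rough illustration of how the latency histograms above can be read, the sketch below derives a mean latency from the `_sum` and `_count` series. It assumes the samples are already available as (name, labels, value) tuples; the helper `mean_latency` and the sample values are hypothetical.

```python
def mean_latency(samples, histogram_name):
    """Mean observed value of a histogram, or None if nothing was observed.

    The result covers everything recorded since the process started;
    it is not a rate over a recent time window.
    """
    total = sum(v for n, _, v in samples if n == f"{histogram_name}_sum")
    count = sum(v for n, _, v in samples if n == f"{histogram_name}_count")
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical samples, not real proxy output.
    samples = [
        ("litellm_redis_latency_bucket", {"le": "+Inf"}, 3.0),
        ("litellm_redis_latency_sum", {}, 1.5),
        ("litellm_redis_latency_count", {}, 3.0),
    ]
    print(mean_latency(samples, "litellm_redis_latency"))
```

For percentiles or windowed averages, the bucket series are better queried in Prometheus itself than recomputed by hand.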