Your Middleware Could Be a Bottleneck
How we improved LiteLLM proxy latency and throughput by replacing a single, simple middleware base class
Our Setup
The LiteLLM proxy server has two middleware layers. The first is Starlette's CORSMiddleware (re-exported by FastAPI), which is a pure ASGI middleware. Then we have a simple BaseHTTPMiddleware subclass called PrometheusAuthMiddleware.
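Wiring order matters here: with FastAPI, the middleware added last sits outermost and runs first. A minimal sketch of that stack, assuming the PrometheusAuthMiddleware class shown below (the CORS options are illustrative, not LiteLLM's actual settings):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Added first, so it sits innermost and runs after CORS.
app.add_middleware(PrometheusAuthMiddleware)

# Added last, so it sits outermost and handles CORS before anything else.
app.add_middleware(CORSMiddleware, allow_origins=["*"])
```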
The job of PrometheusAuthMiddleware is to authenticate requests to the /metrics endpoint. It's not on by default; you enable it with a flag in your proxy config:
Proxy config flag

```yaml
litellm_settings:
  require_auth_for_metrics_endpoint: true
```
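With the flag on, scraping /metrics requires a valid LiteLLM API key. A quick sanity check, as a sketch (assuming the proxy is on localhost:4000 and uses its usual Bearer-token auth; the key is a placeholder):

```python
import httpx

BASE = "http://localhost:4000"

# No credentials: the middleware should return 401.
print(httpx.get(f"{BASE}/metrics").status_code)

# A valid LiteLLM key in the Authorization header: the scrape succeeds.
resp = httpx.get(
    f"{BASE}/metrics",
    headers={"Authorization": "Bearer sk-<your-key>"},
)
print(resp.status_code)
```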
The middleware checks two things: is the request hitting /metrics, and is auth even enabled? For the vast majority of requests the first check fails, so the middleware just passes the request through unchanged.
PrometheusAuthMiddleware source

```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse

# user_api_key_auth is LiteLLM's own auth helper.


class PrometheusAuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Auth only ever runs for /metrics, and only when the flag is set.
        if self._is_prometheus_metrics_endpoint(request):
            if self._should_run_auth_on_metrics_endpoint() is True:
                try:
                    await user_api_key_auth(request=request, api_key=...)
                except Exception as e:
                    return JSONResponse(status_code=401, content=...)
        # Every other request just passes through.
        response = await call_next(request)
        return response

    @staticmethod
    def _is_prometheus_metrics_endpoint(request: Request):
        if "/metrics" in request.url.path:
            return True
        return False
```
Looks harmless. Subclass BaseHTTPMiddleware, implement dispatch(), done. It's exactly the pattern you'll see in Starlette's documentation¹.
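For comparison, this is essentially the documented pattern (a close paraphrase of the custom-header example in Starlette's middleware docs):

```python
from starlette.middleware.base import BaseHTTPMiddleware


class CustomHeaderMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Run the rest of the app, then tweak the outgoing response.
        response = await call_next(request)
        response.headers["Custom"] = "Example"
        return response
```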

