Your Middleware Could Be a Bottleneck
How we improved LiteLLM proxy latency and throughput by replacing a single, simple middleware base class
Our Setup
The LiteLLM proxy server has two middleware layers. The first is Starlette's CORSMiddleware (re-exported by FastAPI), which is a pure ASGI middleware. Then we have a simple BaseHTTPMiddleware subclass called PrometheusAuthMiddleware.
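Wiring order matters here: with FastAPI, the middleware added last sits outermost and runs first. A minimal sketch of that stack, assuming the PrometheusAuthMiddleware class shown below (the CORS options are illustrative, not LiteLLM's actual settings):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Added first, so it sits innermost and runs after CORS.
app.add_middleware(PrometheusAuthMiddleware)

# Added last, so it sits outermost and handles CORS before anything else.
app.add_middleware(CORSMiddleware, allow_origins=["*"])
```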
The job of PrometheusAuthMiddleware is to authenticate requests to the /metrics endpoint. It's not on by default; you enable it with a flag in your proxy config:
Proxy config flag

```yaml
litellm_settings:
  require_auth_for_metrics_endpoint: true
```
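With the flag on, scraping /metrics requires a valid LiteLLM API key. A quick sanity check, as a sketch (assuming the proxy is on localhost:4000 and uses its usual Bearer-token auth; the key is a placeholder):

```python
import httpx

BASE = "http://localhost:4000"

# No credentials: the middleware should return 401.
print(httpx.get(f"{BASE}/metrics").status_code)

# A valid LiteLLM key in the Authorization header: the scrape succeeds.
resp = httpx.get(
    f"{BASE}/metrics",
    headers={"Authorization": "Bearer sk-<your-key>"},
)
print(resp.status_code)
```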
The middleware checks two things: is the request hitting /metrics, and is auth even enabled? For the vast majority of requests the first check fails, so the middleware just passes the request through unchanged.
PrometheusAuthMiddleware source

```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse

# user_api_key_auth is LiteLLM's own auth helper.


class PrometheusAuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Auth only ever runs for /metrics, and only when the flag is set.
        if self._is_prometheus_metrics_endpoint(request):
            if self._should_run_auth_on_metrics_endpoint() is True:
                try:
                    await user_api_key_auth(request=request, api_key=...)
                except Exception as e:
                    return JSONResponse(status_code=401, content=...)
        # Every other request just passes through.
        response = await call_next(request)
        return response

    @staticmethod
    def _is_prometheus_metrics_endpoint(request: Request):
        if "/metrics" in request.url.path:
            return True
        return False
```
Looks harmless. Subclass BaseHTTPMiddleware, implement dispatch(), done. It's exactly the pattern you'll see in Starlette's documentation¹.
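For comparison, this is essentially the documented pattern (a close paraphrase of the custom-header example in Starlette's middleware docs):

```python
from starlette.middleware.base import BaseHTTPMiddleware


class CustomHeaderMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Run the rest of the app, then tweak the outgoing response.
        response = await call_next(request)
        response.headers["Custom"] = "Example"
        return response
```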

