Skip to main content

Rate Limit Headers

When you make a request to the proxy, the proxy will return the following OpenAI-compatible headers:

  • x-ratelimit-remaining-requests - Optional[int]: The remaining number of requests that are permitted before exhausting the rate limit.
  • x-ratelimit-remaining-tokens - Optional[int]: The remaining number of tokens that are permitted before exhausting the rate limit.
  • x-ratelimit-limit-requests - Optional[int]: The maximum number of requests that are permitted before exhausting the rate limit.
  • x-ratelimit-limit-tokens - Optional[int]: The maximum number of tokens that are permitted before exhausting the rate limit.
  • x-ratelimit-reset-requests - Optional[int]: The time at which the rate limit will reset.
  • x-ratelimit-reset-tokens - Optional[int]: The time at which the rate limit will reset.

These headers are useful for clients to understand the current rate limit status and adjust their request rate accordingly.

How are these headers calculated?

If key has rate limits set

The proxy will return the remaining rate limits for that key.

If key does not have rate limits set

The proxy returns the remaining requests/tokens returned by the backend provider.

If the backend provider does not return these headers, the value will be None.