Response Headers

When you make a request to the proxy, it returns the following response headers:

Rate Limit Headers​

OpenAI-compatible headers:

| Header | Type | Description |
|---|---|---|
| `x-ratelimit-remaining-requests` | Optional[int] | The remaining number of requests permitted before the rate limit is exhausted |
| `x-ratelimit-remaining-tokens` | Optional[int] | The remaining number of tokens permitted before the rate limit is exhausted |
| `x-ratelimit-limit-requests` | Optional[int] | The maximum number of requests permitted before the rate limit is exhausted |
| `x-ratelimit-limit-tokens` | Optional[int] | The maximum number of tokens permitted before the rate limit is exhausted |
| `x-ratelimit-reset-requests` | Optional[int] | The time at which the request rate limit will reset |
| `x-ratelimit-reset-tokens` | Optional[int] | The time at which the token rate limit will reset |

How Rate Limit Headers work​

**If the key has rate limits set**

The proxy returns the remaining rate limits for that key.

**If the key does not have rate limits set**

The proxy returns the remaining requests/tokens reported by the backend provider. (LiteLLM standardizes the backend provider's response headers to match the OpenAI format.)

If the backend provider does not return these headers, the value will be None.

These headers are useful for clients to understand the current rate limit status and adjust their request rate accordingly.
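As a sketch of that client-side pattern, the helpers below read the rate limit headers from a response-headers dict and decide whether to back off. The header names come from the table above; the `should_throttle` helper, its threshold, and the sample values are illustrative, not part of LiteLLM.

```python
def remaining(headers: dict, name: str):
    """Return a rate limit header as an int, or None if the header is
    absent (e.g. the backend provider did not return it)."""
    value = headers.get(name)
    return int(value) if value is not None else None

def should_throttle(headers: dict, min_requests: int = 5) -> bool:
    """Back off when the remaining request budget drops below a threshold.
    When the header is missing (None), conservatively do not throttle."""
    left = remaining(headers, "x-ratelimit-remaining-requests")
    return left is not None and left < min_requests

# Example headers as a client might receive them (values illustrative):
sample = {
    "x-ratelimit-remaining-requests": "3",
    "x-ratelimit-limit-requests": "100",
}
print(should_throttle(sample))  # True: only 3 requests remain
```

In practice you would pass `response.headers` from your HTTP client in place of `sample`.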

Latency Headers​

| Header | Type | Description |
|---|---|---|
| `x-litellm-response-duration-ms` | float | Total duration in milliseconds from when a request reaches the LiteLLM Proxy to when the response is returned to the client |
| `x-litellm-overhead-duration-ms` | float | LiteLLM processing overhead in milliseconds |
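Since the total duration includes LiteLLM's own overhead, the time spent in the upstream provider can be derived by subtracting the two headers. A minimal sketch, assuming the headers arrive as strings in a dict; the sample values are illustrative:

```python
def upstream_latency_ms(headers: dict):
    """Approximate provider-side latency: total proxy duration minus
    LiteLLM's processing overhead. Returns None if either header is missing."""
    try:
        total = float(headers["x-litellm-response-duration-ms"])
        overhead = float(headers["x-litellm-overhead-duration-ms"])
    except KeyError:
        return None
    return total - overhead

sample = {
    "x-litellm-response-duration-ms": "842.3",
    "x-litellm-overhead-duration-ms": "15.1",
}
print(round(upstream_latency_ms(sample), 1))  # 827.2
```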

Retry, Fallback Headers​

| Header | Type | Description |
|---|---|---|
| `x-litellm-attempted-retries` | int | Number of retry attempts made |
| `x-litellm-attempted-fallbacks` | int | Number of fallback attempts made |
| `x-litellm-max-fallbacks` | int | Maximum number of fallback attempts allowed |

Cost Tracking Headers​

| Header | Type | Description | Available on Pass-Through Endpoints |
|---|---|---|---|
| `x-litellm-response-cost` | float | Cost of the API call | |
| `x-litellm-key-spend` | float | Total spend for the API key | ✅ |
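One way a client can use these headers is to keep a local running total of spend across calls. A minimal sketch; the header name is from the table above, while the accumulation pattern and sample values are illustrative, not a LiteLLM feature:

```python
def record_cost(headers: dict, running_total: float) -> float:
    """Add this call's reported cost to a client-side running total.
    Calls where the header was not returned leave the total unchanged."""
    cost = headers.get("x-litellm-response-cost")
    return running_total + float(cost) if cost is not None else running_total

total = 0.0
for resp_headers in [
    {"x-litellm-response-cost": "0.000214"},
    {"x-litellm-response-cost": "0.000531"},
    {},  # a call where the header was not returned
]:
    total = record_cost(resp_headers, total)
print(f"{total:.6f}")  # 0.000745
```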

LiteLLM Specific Headers​

| Header | Type | Description | Available on Pass-Through Endpoints |
|---|---|---|---|
| `x-litellm-call-id` | string | ID for this request | ✅ |
| `x-litellm-model-id` | string | Deployment ID (`model_info.id`) | |
| `x-litellm-model-api-base` | string | API base URL | ✅ |
| `x-litellm-version` | string | LiteLLM version | |
| `x-litellm-model-group` | string | The routed `model_list[].model_name` (the model name clients request) | |

Example​

```yaml
model_list:
  - model_name: my-chat-model # clients call this
    litellm_params:
      model: gpt-4o-mini # LiteLLM calls this upstream
    model_info:
      id: "7c9f2a1b3d8e4f0a2c6b5d9e1f3a7b8c" # optional; auto-generated if omitted
```

| Header | Example | Notes |
|---|---|---|
| `x-litellm-model-group` | `my-chat-model` | The `model_name` / request model; not `litellm_params.model` |
| `x-litellm-model-id` | `7c9f2a1b3d8e4f0a2c6b5d9e1f3a7b8c` | Identifies which deployment row was used; use with `/v1/model/info?litellm_model_id=...` |
| Response body `model` | often `my-chat-model` | Often restamped to match the client's request; the upstream model ID stays in the config |

More examples (illustrative)​

| Header | Example | Meaning |
|---|---|---|
| `x-litellm-response-cost` | 0.000214 | Cost of this call (USD) |
| `x-litellm-key-spend` | 12.847 | Key's total spend after this call |
| `x-litellm-response-duration-ms` | 842.3 | Proxy end-to-end duration (ms) |
| `x-litellm-overhead-duration-ms` | 15.1 | LiteLLM overhead (ms) |
| `x-litellm-attempted-retries` | 0 | Retries made |
| `x-litellm-attempted-fallbacks` | 1 | Fallbacks to another deployment |
| `x-litellm-call-id` | 019b2c4d-e5f6-7890-abcd-ef1234567890 | For logs / tracing |
| `x-litellm-version` | 1.55.3 | LiteLLM version |
| `x-litellm-model-api-base` | https://api.openai.com/v1 | Provider base URL (no query string) |

Response headers from LLM providers​

LiteLLM also returns the original response headers from the LLM provider. These headers are prefixed with `llm_provider-` to distinguish them from LiteLLM's own headers.

Example response headers:

```
llm_provider-openai-processing-ms: 256
llm_provider-openai-version: 2020-10-01
llm_provider-x-ratelimit-limit-requests: 30000
llm_provider-x-ratelimit-limit-tokens: 150000000
```
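Because of this prefix, a client can cleanly separate the provider's original headers from LiteLLM's own. A minimal sketch; the `split_headers` helper is illustrative, and the sample values mirror the example above:

```python
PREFIX = "llm_provider-"

def split_headers(headers: dict) -> tuple:
    """Return (provider_headers, litellm_headers). Keys in the provider
    dict have the prefix stripped back to the provider's original names."""
    provider = {k[len(PREFIX):]: v for k, v in headers.items() if k.startswith(PREFIX)}
    litellm = {k: v for k, v in headers.items() if not k.startswith(PREFIX)}
    return provider, litellm

sample = {
    "llm_provider-openai-processing-ms": "256",
    "llm_provider-x-ratelimit-limit-requests": "30000",
    "x-litellm-call-id": "019b2c4d-e5f6-7890-abcd-ef1234567890",
}
provider, litellm = split_headers(sample)
print(provider["openai-processing-ms"])  # 256
```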