Timeouts

The timeout set on the Router applies to the entire length of the call, and is passed down to the underlying completion() call as well.

Global Timeouts

from litellm import Router 

model_list = [{...}]

router = Router(model_list=model_list,
                timeout=30)  # raise a timeout error if the call takes > 30s

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}]
)
print(response)

Custom Timeouts & Stream Timeouts (Per Model)

For each model, you can set timeout and stream_timeout under litellm_params:

  • timeout → maximum time for the complete response.
    Use this to cap long-running completions.

  • stream_timeout → maximum time to wait for the first chunk (i.e., first token) in a streaming response.
    Use this to abort “hanging” providers (e.g., Bedrock slow start) and retry another model; a streaming sketch follows the example below.

from litellm import Router 
import asyncio
import os

model_list = [{
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-v-2",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
        "timeout": 300,        # sets a 5 minute timeout
        "stream_timeout": 30   # sets a 30s timeout for streaming calls
    }
}]

# init router
router = Router(model_list=model_list, routing_strategy="least-busy")

async def router_acompletion():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}]
    )
    print(response)
    return response

asyncio.run(router_acompletion())
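To see stream_timeout in action, here is a minimal sketch of a streaming call against the same router (the router_streaming wrapper is just for illustration): if the first chunk does not arrive within 30s, the call errors out and your retry/fallback logic can move on to another deployment.

async def router_streaming():
    # stream_timeout (30s above) bounds the wait for the first chunk
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Write a haiku about routers"}],
        stream=True
    )
    async for chunk in response:
        print(chunk)

asyncio.run(router_streaming())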

Setting Dynamic Timeouts - Per Request

LiteLLM supports setting a timeout per request; this overrides the Router-level timeout for that call.

Example Usage

from litellm import Router 

model_list = [{...}]
router = Router(model_list=model_list)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what color is red"}],
    timeout=1
)
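With timeout=1, the call above will often exceed its budget. A minimal sketch of handling that, assuming LiteLLM's exception mapping raises litellm.Timeout:

import litellm

try:
    response = router.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "what color is red"}],
        timeout=1
    )
except litellm.Timeout:
    # per-request timeout hit: retry, fall back, or surface the error
    print("request timed out after 1s")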

Testing Timeout Handling

To test whether your retry/fallback logic handles timeouts, set "mock_timeout": true on the request (the key change in the example below).

This is currently only supported on /chat/completions and /completions endpoints. Please let us know if you need this for other endpoints.

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
--data-raw '{
    "model": "gemini/gemini-1.5-flash",
    "messages": [
        {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ],
    "mock_timeout": true
}'
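The same test can be driven from the OpenAI Python SDK pointed at the proxy. A sketch, assuming the proxy above is running on port 4000; extra_body forwards mock_timeout through to LiteLLM, and the simulated timeout comes back as an API error:

import openai
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

try:
    client.chat.completions.create(
        model="gemini/gemini-1.5-flash",
        messages=[{"role": "user", "content": "hi"}],
        extra_body={"mock_timeout": True}  # forwarded to LiteLLM
    )
except openai.APIError as e:
    # mock_timeout simulates a timeout, so an error response is expected
    print("got simulated timeout:", e)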