'Thinking' / 'Reasoning Content'

info

Requires LiteLLM v1.63.0+

Supported Providers:

  • Deepseek (deepseek/)
  • Anthropic API (anthropic/)
  • Bedrock (Anthropic + Deepseek + GPT-OSS) (bedrock/)
  • Vertex AI (Anthropic) (vertexai/)
  • OpenRouter (openrouter/)
  • XAI (xai/)
  • Google AI Studio (google/)
  • Vertex AI (vertex_ai/)
  • Perplexity (perplexity/)
  • Mistral AI (Magistral models) (mistral/)
  • Groq (groq/)

LiteLLM will standardize the reasoning_content in the response and thinking_blocks in the assistant message.

Example response from litellm
"message": {
...
"reasoning_content": "The capital of France is Paris.",
"thinking_blocks": [ # only returned for Anthropic models
{
"type": "thinking",
"thinking": "The capital of France is Paris.",
"signature": "EqoBCkgIARABGAIiQL2UoU0b1OHYi+..."
}
]
}

Quick Start

from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = ""

response = completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
    reasoning_effort="low",
)
print(response.choices[0].message.content)

Expected Response

{
    "id": "3b66124d79a708e10c603496b363574c",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The capital of France is Paris.",
                "reasoning_content": "The user is asking for the capital of France, which is Paris.",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "created": 1723323084,
    "model": "anthropic/claude-3-7-sonnet-20250219",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 12,
        "prompt_tokens": 16,
        "total_tokens": 28
    },
    "service_tier": null
}

Tool Calling with thinking

Here's how to use Anthropic's thinking blocks with tool calling.

Important: OpenAI-Compatible API Limitations

Compatibility Notice

Anthropic extended thinking with tool calling is not fully compatible with OpenAI-compatible API clients. This is due to fundamental architectural differences between how OpenAI and Anthropic handle reasoning in multi-turn conversations.

When using Anthropic models with thinking enabled and tool calling, you must include thinking_blocks from the previous assistant response when sending tool results back. Failure to do so will result in a 400 Bad Request error.

OpenAI vs Anthropic Architecture:

Provider | API Architecture | Reasoning Storage | Multi-turn Handling
---------|------------------|-------------------|--------------------
OpenAI (o1, o3) | Responses API (Stateful) | Server-side | Server stores reasoning internally; client sends previous_response_id
Anthropic (Claude) | Messages API (Stateless) | Client-side | Client must store and resend thinking_blocks with every request
Why this breaks with OpenAI-compatible clients:

  1. OpenAI's Chat Completions spec has no field for thinking_blocks
  2. OpenAI-compatible clients (LibreChat, Open WebUI, Vercel AI SDK, etc.) ignore the thinking_blocks field in responses
  3. When these clients reconstruct the assistant message for the next turn, the thinking blocks are lost
  4. Anthropic rejects the request because the assistant message doesn't start with a thinking block
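
To make the difference in the table above concrete, here is a rough sketch of the two follow-up request shapes. The payloads are illustrative only (the response id, thinking blocks, and tool calls are placeholders, not exact provider schemas):

# OpenAI Responses API (stateful): reasoning stays server-side, so a follow-up
# turn only references the previous response by id.
openai_followup = {
    "model": "o3",
    "previous_response_id": "resp_abc123",  # placeholder id; server recalls its own reasoning
    "input": [{"role": "user", "content": "And what about Germany?"}],
}

# Anthropic Messages API (stateless): the client must store the thinking blocks
# it received and resend them in the assistant message on the next turn.
anthropic_followup_messages = [
    {"role": "user", "content": "What's the weather in Madrid?"},
    {
        "role": "assistant",
        "thinking_blocks": [...],  # copied verbatim from the previous response
        "tool_calls": [...],       # the tool calls the model made
    },
    {"role": "tool", "tool_call_id": "call_123", "content": "22°C sunny"},
]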
LiteLLM supports thinking_blocks

LiteLLM's completion() API does support sending thinking_blocks in assistant messages. If you're using LiteLLM directly (not through an OpenAI-compatible client), you can preserve and resend thinking_blocks and everything will work correctly.

Solutions:

  1. Use LiteLLM's built-in workaround (recommended): Set litellm.modify_params = True and LiteLLM will automatically handle this incompatibility by dropping the thinking param when thinking_blocks are missing (see below)
  2. For client developers: Explicitly handle and resend the thinking_blocks field (see example below)
  3. Disable extended thinking when using tools with OpenAI-compatible clients that don't support thinking_blocks
  4. Use Anthropic's native API directly instead of OpenAI-compatible endpoints

LiteLLM Built-in Workaround

LiteLLM can automatically handle this incompatibility when modify_params=True is set. If the client sends a request with thinking enabled but the assistant message with tool_calls is missing thinking_blocks, LiteLLM will automatically drop the thinking param for that turn to avoid the error.

import litellm

# Enable automatic parameter modification
litellm.modify_params = True

# Now this will work even if thinking_blocks are missing from the assistant message
response = litellm.completion(
    model="anthropic/claude-sonnet-4-20250514",
    thinking={"type": "enabled", "budget_tokens": 1024},
    tools=[...],
    messages=[
        {"role": "user", "content": "What's the weather in Madrid?"},
        {
            "role": "assistant",
            "tool_calls": [{"id": "call_123", "type": "function", "function": {"name": "get_weather", "arguments": '{"city": "Madrid"}'}}],
            # Note: thinking_blocks is missing here - LiteLLM will handle it
        },
        {"role": "tool", "tool_call_id": "call_123", "content": "22°C sunny"}
    ]
)
info

When modify_params=True and LiteLLM drops the thinking param, the model will not use extended thinking for that specific turn. The conversation will continue normally, but without reasoning for that response.

Correct way to include thinking_blocks:

# After receiving a response with tool_calls, include thinking_blocks when sending back:
assistant_message = {
    "role": "assistant",
    "content": response.choices[0].message.content,
    "tool_calls": [...],
    "thinking_blocks": response.choices[0].message.thinking_blocks  # ← Required!
}

Complete working example:

import json
import litellm

def get_current_weather(location, unit="fahrenheit"):
    # Hypothetical tool implementation - replace with a real weather lookup
    return json.dumps({"location": location, "temperature": "72", "unit": unit})

litellm._turn_on_debug()
litellm.modify_params = True
model = "anthropic/claude-3-7-sonnet-20250219"  # works across Anthropic, Bedrock, Vertex AI
# Step 1: send the conversation and available functions to the model
messages = [
    {
        "role": "user",
        "content": "What's the weather like in San Francisco, Tokyo, and Paris? - give me 3 responses",
    }
]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["location"],
            },
        },
    }
]
response = litellm.completion(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",  # auto is default, but we'll be explicit
    reasoning_effort="low",
)
print("Response\n", response)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls

print("Expecting there to be 3 tool calls")
assert (
    len(tool_calls) > 0
)  # this has to call the function for SF, Tokyo and Paris

# Step 2: check if the model wanted to call a function
print(f"tool_calls: {tool_calls}")
if tool_calls:
    # Step 3: call the function
    # Note: the JSON response may not always be valid; be sure to handle errors
    available_functions = {
        "get_current_weather": get_current_weather,
    }  # only one function in this example, but you can have multiple
    messages.append(
        response_message
    )  # extend conversation with assistant's reply (includes thinking_blocks)
    print("Response message\n", response_message)
    # Step 4: send the info for each function call and function response to the model
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        if function_name not in available_functions:
            # the model called a function that does not exist in available_functions - skip it
            continue
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(
            location=function_args.get("location"),
            unit=function_args.get("unit"),
        )
        messages.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )  # extend conversation with function response
    print(f"messages: {messages}")
    second_response = litellm.completion(
        model=model,
        messages=messages,
        seed=22,
        reasoning_effort="low",
        # tools=tools,
        drop_params=True,
    )  # get a new response from the model where it can see the function response
    print("second response\n", second_response)

Switching between Anthropic + Deepseek models

Set drop_params=True to drop the 'thinking' blocks when swapping from Anthropic to Deepseek models. Suggest improvements to this approach here.

litellm.drop_params = True # 👈 EITHER GLOBALLY or per request

# or per request
## Anthropic
response = litellm.completion(
    model="anthropic/claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    drop_params=True,
)

## Deepseek
response = litellm.completion(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    reasoning_effort="low",
    drop_params=True,
)

Spec

These fields can be accessed via response.choices[0].message.reasoning_content and response.choices[0].message.thinking_blocks.

  • reasoning_content - str: The reasoning content from the model. Returned across all providers.
  • thinking_blocks - Optional[List[Dict[str, str]]]: A list of thinking blocks from the model. Only returned for Anthropic models.
    • type - str: The type of thinking block.
    • thinking - str: The thinking from the model.
    • signature - str: The signature delta from the model.
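
For example, after the Quick Start call above, these fields can be read directly off the message. This is a minimal sketch; it guards against providers that don't return thinking_blocks:

message = response.choices[0].message

# Standardized across providers
print(message.reasoning_content)

# Only returned for Anthropic models
if getattr(message, "thinking_blocks", None):
    for block in message.thinking_blocks:
        print(block["type"], block["thinking"], block["signature"])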

Pass thinking to Anthropic models

You can also pass the thinking parameter to Anthropic models.

response = litellm.completion(
model="anthropic/claude-3-7-sonnet-20250219",
messages=[{"role": "user", "content": "What is the capital of France?"}],
thinking={"type": "enabled", "budget_tokens": 1024},
)

Checking if a model supports reasoning

Use litellm.supports_reasoning(model="") to check whether a model supports reasoning. It returns True if the model supports reasoning and False if not.

litellm.supports_reasoning() usage
import litellm 

# Example models that support reasoning
assert litellm.supports_reasoning(model="anthropic/claude-3-7-sonnet-20250219") == True
assert litellm.supports_reasoning(model="deepseek/deepseek-chat") == True

# Example models that do not support reasoning
assert litellm.supports_reasoning(model="openai/gpt-3.5-turbo") == False