Computer Use
Computer use allows models to interact with computer interfaces by taking screenshots and performing actions like clicking, typing, and scrolling. This enables AI models to autonomously operate desktop environments.
Supported Providers:
- Anthropic API (anthropic/)
- Bedrock (Anthropic) (bedrock/)
- Vertex AI (Anthropic) (vertex_ai/)
Supported Tool Types:
- computer - Computer interaction tool with display parameters
- bash - Bash shell tool
- text_editor - Text editor tool
- web_search - Web search tool
LiteLLM standardizes the computer use tool format across all supported providers.
Quick Start
LiteLLM Python SDK:
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

# Computer use tool
tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_height_px": 768,
        "display_width_px": 1024,
        "display_number": 0,
    }
]

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Take a screenshot and tell me what you see"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg=="
                }
            }
        ]
    }
]

response = completion(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=messages,
    tools=tools,
)
print(response)
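When the model decides to act, the requested action should come back as a tool call rather than plain text. The following is a minimal sketch of reading those calls, assuming they follow the OpenAI-style shape (a `function` entry with a `name` and a JSON-encoded `arguments` string). The helper name `parse_tool_calls` and the sample payload are hypothetical and only illustrate the idea.

```python
import json


def parse_tool_calls(tool_calls):
    """Turn OpenAI-style tool calls into (tool_name, arguments_dict) pairs."""
    actions = []
    for call in tool_calls or []:
        function = call["function"]
        # Arguments arrive as a JSON string; treat an empty string as "no arguments".
        arguments = json.loads(function.get("arguments") or "{}")
        actions.append((function["name"], arguments))
    return actions


# Hypothetical payload shaped like an OpenAI-style tool call, for illustration only.
sample_tool_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "computer", "arguments": '{"action": "screenshot"}'},
    }
]

for name, arguments in parse_tool_calls(sample_tool_calls):
    print(name, arguments)
```

With a live response, the same helper could be given the tool calls from `response.choices[0].message.tool_calls` after converting them to plain dicts.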
LiteLLM Proxy Server:
- Define computer use models on config.yaml
model_list:
  - model_name: claude-3-5-sonnet-latest # Anthropic claude-3-5-sonnet-latest
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-bedrock # Bedrock Anthropic model
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2
    model_info:
      supports_computer_use: True # set supports_computer_use to True so /model/info returns this attribute as True
- Run proxy server
litellm --config config.yaml
- Test it using the OpenAI Python SDK
import os
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",  # your litellm proxy api key
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="claude-3-5-sonnet-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Take a screenshot and tell me what you see"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg=="
                    }
                }
            ]
        }
    ],
    tools=[
        {
            "type": "computer_20241022",
            "name": "computer",
            "display_height_px": 768,
            "display_width_px": 1024,
            "display_number": 0,
        }
    ]
)
print(response)
Checking if a model supports computer use
LiteLLM Python SDK:
Use litellm.supports_computer_use(model=""), which returns True if the model supports computer use and False if not.
import litellm
assert litellm.supports_computer_use(model="anthropic/claude-3-5-sonnet-latest") == True
assert litellm.supports_computer_use(model="anthropic/claude-3-7-sonnet-20250219") == True
assert litellm.supports_computer_use(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0") == True
assert litellm.supports_computer_use(model="vertex_ai/claude-3-5-sonnet") == True
assert litellm.supports_computer_use(model="openai/gpt-4") == False
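The same check can drive model selection at runtime. Below is a minimal sketch that picks the first candidate passing a capability check. The function name `first_computer_use_model` and the stand-in check `fake_supports` are hypothetical; the check is injected so the sketch runs without LiteLLM installed.

```python
from typing import Callable, Iterable, Optional


def first_computer_use_model(
    candidates: Iterable[str],
    supports: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate for which the capability check passes."""
    for model in candidates:
        if supports(model):
            return model
    return None


# Stand-in capability check (an assumption for illustration only).
# With LiteLLM available, pass: lambda m: litellm.supports_computer_use(model=m)
def fake_supports(model: str) -> bool:
    return model.startswith("anthropic/")


chosen = first_computer_use_model(
    ["openai/gpt-4", "anthropic/claude-3-5-sonnet-latest"],
    supports=fake_supports,
)
print(chosen)  # anthropic/claude-3-5-sonnet-latest
```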
LiteLLM Proxy Server:
- Define computer use models on config.yaml
model_list:
  - model_name: claude-3-5-sonnet-latest # Anthropic claude-3-5-sonnet-latest
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-bedrock # Bedrock Anthropic model
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2
    model_info:
      supports_computer_use: True # set supports_computer_use to True so /model/info returns this attribute as True
- Run proxy server
litellm --config config.yaml
- Call /model_group/info to check if your model supports computer use
curl -X 'GET' \
'http://localhost:4000/model_group/info' \
-H 'accept: application/json' \
-H 'x-api-key: sk-1234'
Expected Response
{
  "data": [
    {
      "model_group": "claude-3-5-sonnet-latest",
      "providers": ["anthropic"],
      "max_input_tokens": 200000,
      "max_output_tokens": 8192,
      "mode": "chat",
      "supports_computer_use": true, # supports_computer_use is true
      "supports_vision": true,
      "supports_function_calling": true
    },
    {
      "model_group": "claude-bedrock",
      "providers": ["bedrock"],
      "max_input_tokens": 200000,
      "max_output_tokens": 8192,
      "mode": "chat",
      "supports_computer_use": true, # supports_computer_use is true
      "supports_vision": true,
      "supports_function_calling": true
    }
  ]
}
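Once the response is parsed, the flag can be read programmatically. This is a minimal sketch that filters a `/model_group/info` payload down to the groups reporting `supports_computer_use`. The function name `computer_use_model_groups` is hypothetical, and the `gpt-4` entry is an invented example added only to show a group being excluded.

```python
import json


def computer_use_model_groups(payload: dict) -> list:
    """List model groups whose entry reports supports_computer_use as true."""
    return [
        entry["model_group"]
        for entry in payload.get("data", [])
        if entry.get("supports_computer_use") is True
    ]


# Trimmed copy of the expected response above, plus one hypothetical group
# ("gpt-4") that does not support computer use.
raw = """
{
  "data": [
    {"model_group": "claude-3-5-sonnet-latest", "supports_computer_use": true},
    {"model_group": "claude-bedrock", "supports_computer_use": true},
    {"model_group": "gpt-4", "supports_computer_use": false}
  ]
}
"""

print(computer_use_model_groups(json.loads(raw)))
# ['claude-3-5-sonnet-latest', 'claude-bedrock']
```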
Different Tool Types
Computer use supports several different tool types for various interaction modes:
- Computer Tool
- Bash Tool
- Text Editor Tool
The computer_20241022 tool provides direct screen interaction capabilities.
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_height_px": 768,
        "display_width_px": 1024,
        "display_number": 0,
    }
]

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Click on the search button in the screenshot"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg=="
                }
            }
        ]
    }
]

response = completion(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=messages,
    tools=tools,
)
print(response)
The bash_20241022 tool provides command line interface access.
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

tools = [
    {
        "type": "bash_20241022",
        "name": "bash"
    }
]

messages = [
    {
        "role": "user",
        "content": "List the files in the current directory using bash"
    }
]

response = completion(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=messages,
    tools=tools,
)
print(response)
The text_editor_20250124 tool provides text file editing capabilities.
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

tools = [
    {
        "type": "text_editor_20250124",
        "name": "str_replace_editor"
    }
]

messages = [
    {
        "role": "user",
        "content": "Create a simple Python hello world script"
    }
]

response = completion(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=messages,
    tools=tools,
)
print(response)
Advanced Usage with Multiple Tools
You can combine different computer use tools in a single request:
import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_height_px": 768,
        "display_width_px": 1024,
        "display_number": 0,
    },
    {
        "type": "bash_20241022",
        "name": "bash"
    },
    {
        "type": "text_editor_20250124",
        "name": "str_replace_editor"
    }
]

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Take a screenshot, then create a file describing what you see, and finally use bash to show the file contents"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg=="
                }
            }
        ]
    }
]

response = completion(
    model="anthropic/claude-3-5-sonnet-latest",
    messages=messages,
    tools=tools,
)
print(response)
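With several tools in one request, the calling application has to route each requested tool call to the right executor. Here is a minimal sketch of that routing, keyed on the tool `name` values used above. The function `dispatch_tool_call`, the placeholder handlers, and the argument keys (`action`, `command`, `path`) are assumptions for illustration; real handlers would perform the action and return its result.

```python
from typing import Callable, Dict


def dispatch_tool_call(
    name: str,
    arguments: dict,
    handlers: Dict[str, Callable[[dict], str]],
) -> str:
    """Route one tool call to the handler registered under the tool's name."""
    handler = handlers.get(name)
    if handler is None:
        return f"error: no handler registered for tool '{name}'"
    return handler(arguments)


# Placeholder handlers: they only describe what a real executor would do.
handlers = {
    "computer": lambda args: f"would perform screen action: {args.get('action')}",
    "bash": lambda args: f"would run command: {args.get('command')}",
    "str_replace_editor": lambda args: f"would edit file: {args.get('path')}",
}

print(dispatch_tool_call("bash", {"command": "ls"}, handlers))
print(dispatch_tool_call("web_search", {}, handlers))
```

Returning an error string for unknown tools, instead of raising, keeps the loop alive so the failure can be reported back to the model.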
Spec
Computer Tool (computer_20241022)
{
  "type": "computer_20241022",
  "name": "computer",
  "display_height_px": 768, // Required: Screen height in pixels
  "display_width_px": 1024, // Required: Screen width in pixels
  "display_number": 0 // Optional: Display number (default: 0)
}
Bash Tool (bash_20241022)
{
  "type": "bash_20241022",
  "name": "bash" // Required: Tool name
}
Text Editor Tool (text_editor_20250124)
{
  "type": "text_editor_20250124",
  "name": "str_replace_editor" // Required: Tool name
}
Web Search Tool (web_search_20250305)
{
  "type": "web_search_20250305",
  "name": "web_search" // Required: Tool name
}
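The required fields listed in these specs can be checked before a request is sent. The following is a minimal sketch of such a check. The table `REQUIRED_FIELDS` and the function `missing_fields` are hypothetical helpers written from the specs above, not part of LiteLLM.

```python
# Required fields per tool type, taken from the specs above.
REQUIRED_FIELDS = {
    "computer_20241022": {"type", "name", "display_height_px", "display_width_px"},
    "bash_20241022": {"type", "name"},
    "text_editor_20250124": {"type", "name"},
    "web_search_20250305": {"type", "name"},
}


def missing_fields(tool: dict) -> set:
    """Return the required fields that are absent from a tool definition."""
    tool_type = tool.get("type")
    if tool_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown tool type: {tool_type!r}")
    return REQUIRED_FIELDS[tool_type] - set(tool)


complete = {"type": "bash_20241022", "name": "bash"}
incomplete = {"type": "computer_20241022", "name": "computer", "display_width_px": 1024}

print(missing_fields(complete))    # set()
print(missing_fields(incomplete))  # {'display_height_px'}
```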