Chutes
Overview​
| Property | Details |
|---|---|
| Description | Chutes is a cloud-native AI deployment platform that allows you to deploy, run, and scale LLM applications with OpenAI-compatible APIs using pre-built templates for popular frameworks like vLLM and SGLang. |
| Provider Route on LiteLLM | chutes/ |
| Link to Provider Doc | Chutes Website ↗ |
| Base URL | https://llm.chutes.ai/v1/ |
| Supported Operations | /chat/completions, Embeddings |
What is Chutes?​
Chutes is a powerful AI deployment and serving platform that provides:
- Pre-built Templates: Ready-to-use configurations for vLLM, SGLang, diffusion models, and embeddings
- OpenAI-Compatible APIs: Use standard OpenAI SDKs and clients
- Multi-GPU Scaling: Support for large models across multiple GPUs
- Streaming Responses: Real-time model outputs
- Custom Configurations: Override any parameter for your specific needs
- Performance Optimization: Pre-configured optimization settings
Required Variables​
Environment Variables
os.environ["CHUTES_API_KEY"] = "" # your Chutes API key
Get your Chutes API key from chutes.ai.
Usage - LiteLLM Python SDK​
Non-streaming​
Chutes Non-streaming Completion
import os
import litellm
from litellm import completion
os.environ["CHUTES_API_KEY"] = "" # your Chutes API key
messages = [{"content": "What is the capital of France?", "role": "user"}]
# Chutes call
response = completion(
model="chutes/model-name", # Replace with actual model name
messages=messages
)
print(response)
Streaming​
Chutes Streaming Completion
import os
import litellm
from litellm import completion
os.environ["CHUTES_API_KEY"] = "" # your Chutes API key
messages = [{"content": "Write a short poem about AI", "role": "user"}]
# Chutes call with streaming
response = completion(
model="chutes/model-name", # Replace with actual model name
messages=messages,
stream=True
)
for chunk in response:
print(chunk)
Usage - LiteLLM Proxy Server​
1. Save key in your environment​
export CHUTES_API_KEY=""
2. Start the proxy​
model_list:
- model_name: chutes-model
litellm_params:
model: chutes/model-name # Replace with actual model name
api_key: os.environ/CHUTES_API_KEY
Supported OpenAI Parameters​
Chutes supports all standard OpenAI-compatible parameters:
| Parameter | Type | Description |
|---|---|---|
messages | array | Required. Array of message objects with 'role' and 'content' |
model | string | Required. Model ID or HuggingFace model identifier |
stream | boolean | Optional. Enable streaming responses |
temperature | float | Optional. Sampling temperature |
top_p | float | Optional. Nucleus sampling parameter |
max_tokens | integer | Optional. Maximum tokens to generate |
frequency_penalty | float | Optional. Penalize frequent tokens |
presence_penalty | float | Optional. Penalize tokens based on presence |
stop | string/array | Optional. Stop sequences |
tools | array | Optional. List of available tools/functions |
tool_choice | string/object | Optional. Control tool/function calling |
response_format | object | Optional. Response format specification |
Support Frameworks​
Chutes provides optimized templates for popular AI frameworks: