
Day 0 Support: Claude Opus 4.6

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Ishaan Jaff
CTO, LiteLLM
Krrish Dholakia
CEO, LiteLLM

LiteLLM now supports Claude Opus 4.6 on Day 0. Use it across Anthropic, Azure, Vertex AI, and Bedrock through the LiteLLM AI Gateway.
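
If you use the LiteLLM Python SDK rather than the gateway, the same model IDs work there too. A minimal sketch, assuming a litellm version with Opus 4.6 support and ANTHROPIC_API_KEY set in your environment:

import litellm

# Call Claude Opus 4.6 on Anthropic directly through the LiteLLM SDK.
# The provider-prefixed model ID matches the config.yaml entries below.
response = litellm.completion(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)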

Docker Image

docker pull ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6

Usage - Anthropic

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: anthropic/claude-opus-4-6
      api_key: os.environ/ANTHROPIC_API_KEY

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
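
Since the proxy exposes an OpenAI-compatible endpoint, any OpenAI client also works against it. A minimal sketch with the official openai Python package (the key and base URL are placeholders for your deployment); the same client code works unchanged for the Azure, Vertex AI, and Bedrock setups below, since they all register the same model_name:

import openai

# Point the OpenAI client at the LiteLLM proxy.
client = openai.OpenAI(
    api_key="sk-1234",                 # your LiteLLM virtual key
    base_url="http://0.0.0.0:4000",
)

response = client.chat.completions.create(
    model="claude-opus-4-6",           # the model_name from config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)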

Usage - Azure

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/claude-opus-4-6
      api_key: os.environ/AZURE_AI_API_KEY
      api_base: os.environ/AZURE_AI_API_BASE # https://<resource>.services.ai.azure.com

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e AZURE_AI_API_KEY=$AZURE_AI_API_KEY \
-e AZURE_AI_API_BASE=$AZURE_AI_API_BASE \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'

Usage - Vertex AI

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: vertex_ai/claude-opus-4-6
      vertex_project: os.environ/VERTEX_PROJECT
      vertex_location: us-east5

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e VERTEX_PROJECT=$VERTEX_PROJECT \
-e GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json \
-v $(pwd)/config.yaml:/app/config.yaml \
-v $(pwd)/credentials.json:/app/credentials.json \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'

Usage - Bedrock

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: bedrock/anthropic.claude-opus-4-6-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'

Compaction

LiteLLM supports enabling compaction for the new claude-opus-4-6.

Enabling Compaction

To enable compaction, add the context_management parameter with the compact_20260112 edit type:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather in San Francisco?"
    }
  ],
  "context_management": {
    "edits": [
      {
        "type": "compact_20260112"
      }
    ]
  },
  "max_tokens": 100
}'

All context_management parameters supported by Anthropic can be passed through directly. LiteLLM automatically adds the compact-2026-01-12 beta header to the request.
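
If you call the proxy with the openai Python client, context_management is not part of the OpenAI schema, so pass it via extra_body; LiteLLM forwards it as-is. A minimal sketch:

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    max_tokens=100,
    # Non-OpenAI parameters go through extra_body; LiteLLM attaches the
    # compact-2026-01-12 beta header for you.
    extra_body={"context_management": {"edits": [{"type": "compact_20260112"}]}},
)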

Response with Compaction Block

The response will include the compaction summary in provider_specific_fields.compaction_blocks:

{
  "id": "chatcmpl-a6c105a3-4b25-419e-9551-c800633b6cb2",
  "created": 1770357619,
  "model": "claude-opus-4-6",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "I don't have access to real-time data, so I can't provide the current weather in San Francisco. To get up-to-date weather information, I'd recommend checking:\n\n- **Weather websites** like weather.com, accuweather.com, or wunderground.com\n- **Search engines** – just Google \"San Francisco weather\"\n- **Weather apps** on your phone (e.g., Apple Weather, Google Weather)\n- **National",
        "role": "assistant",
        "provider_specific_fields": {
          "compaction_blocks": [
            {
              "type": "compaction",
              "content": "Summary of the conversation: The user requested help building a web scraper..."
            }
          ]
        }
      }
    }
  ],
  "usage": {
    "completion_tokens": 100,
    "prompt_tokens": 86,
    "total_tokens": 186
  }
}

Using Compaction Blocks in Follow-up Requests

To continue the conversation with compaction, include the compaction block in the assistant message's provider_specific_fields:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "How can I build a web scraper?"
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "Certainly! To build a basic web scraper, you'll typically use a programming language like Python along with libraries such as `requests` (for fetching web pages) and `BeautifulSoup` (for parsing HTML). Here's a basic example:\n\n```python\nimport requests\nfrom bs4 import BeautifulSoup\n\nurl = 'https://example.com'\nresponse = requests.get(url)\nsoup = BeautifulSoup(response.text, 'html.parser')\n\n# Extract and print all text\ntext = soup.get_text()\nprint(text)\n```\n\nLet me know what you're interested in scraping or if you need help with a specific website!"
        }
      ],
      "provider_specific_fields": {
        "compaction_blocks": [
          {
            "type": "compaction",
            "content": "Summary of the conversation: The user asked how to build a web scraper, and the assistant gave an overview using Python with requests and BeautifulSoup."
          }
        ]
      }
    },
    {
      "role": "user",
      "content": "How do I use it to scrape product prices?"
    }
  ],
  "context_management": {
    "edits": [
      {
        "type": "compact_20260112"
      }
    ]
  },
  "max_tokens": 100
}'
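
Programmatically, the same round trip can be scripted. A minimal sketch with the openai Python client, assuming the response shape shown above; how you read fields outside the OpenAI schema depends on your client version, and model_extra (pydantic v2) is one way to reach them:

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")
compact = {"context_management": {"edits": [{"type": "compact_20260112"}]}}

# First turn: request compaction alongside the normal completion.
first = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "How can I build a web scraper?"}],
    extra_body=compact,
)
msg = first.choices[0].message

# Pull compaction_blocks off the response; the path mirrors the JSON above.
extra = msg.model_extra or {}
blocks = extra.get("provider_specific_fields", {}).get("compaction_blocks", [])

# Follow-up turn: echo the blocks back in the assistant message.
second = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[
        {"role": "user", "content": "How can I build a web scraper?"},
        {
            "role": "assistant",
            "content": msg.content,
            "provider_specific_fields": {"compaction_blocks": blocks},
        },
        {"role": "user", "content": "How do I use it to scrape product prices?"},
    ],
    extra_body=compact,
)
print(second.choices[0].message.content)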

Streaming Support

Compaction blocks are also supported in streaming mode. You'll receive:

  • compaction_start event when a compaction block begins
  • compaction_delta events with the compaction content
  • The accumulated compaction_blocks in provider_specific_fields
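
How these surface depends on your client. A minimal sketch with the openai Python client, assuming the accumulated compaction_blocks appear on the streamed delta's provider_specific_fields (mirroring the non-streaming shape above):

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

stream = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    extra_body={"context_management": {"edits": [{"type": "compact_20260112"}]}},
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    # Fields outside the OpenAI schema land in model_extra on pydantic-v2 clients.
    extra = delta.model_extra or {}
    blocks = extra.get("provider_specific_fields", {}).get("compaction_blocks")
    if blocks:
        print("\n[compaction]", blocks)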

Effort Levels

Four effort levels are available: low, medium, high (the default), and max. Pass them directly via the effort parameter:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data '{
  "model": "claude-opus-4-6",
  "messages": [
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ],
  "effort": "max"
}'
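
With the openai Python client, effort is likewise outside the OpenAI schema, so pass it via extra_body. A minimal sketch:

import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    # One of: "low", "medium", "high" (default), "max".
    extra_body={"effort": "max"},
)
print(response.choices[0].message.content)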

1M Token Context (Beta)

Opus 4.6 supports a 1M token context window. Premium pricing applies to prompts exceeding 200k tokens ($10 / $37.50 per million input/output tokens), and LiteLLM supports cost calculation for these long-context requests.

US-Only Inference

US-only inference is available at 1.1× standard token pricing, and LiteLLM supports this pricing model.
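
To make the pricing above concrete, here is a back-of-the-envelope sketch using only the numbers quoted in this post: the $10 / $37.50 per-million premium rates for prompts over 200k tokens, and the 1.1× US-only multiplier. It assumes the premium rate applies to the entire request once the prompt crosses 200k tokens; base rates under 200k are not listed here:

# Back-of-the-envelope cost for a long-context request, using only the
# rates quoted above. Rates are USD per one million tokens.
PREMIUM_INPUT_PER_M = 10.00    # prompts > 200k tokens
PREMIUM_OUTPUT_PER_M = 37.50
US_ONLY_MULTIPLIER = 1.1       # optional US-only inference

def premium_cost(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    """Cost in USD, assuming the premium rate covers the entire request."""
    cost = (
        input_tokens / 1_000_000 * PREMIUM_INPUT_PER_M
        + output_tokens / 1_000_000 * PREMIUM_OUTPUT_PER_M
    )
    return cost * (US_ONLY_MULTIPLIER if us_only else 1.0)

# e.g. a 500k-token prompt with an 8k-token answer:
print(f"${premium_cost(500_000, 8_000):.2f}")        # $5.30
print(f"${premium_cost(500_000, 8_000, True):.2f}")  # $5.83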