# Day 0 Support: MiniMax-M2.5
LiteLLM now supports MiniMax-M2.5 on Day 0. Use it across OpenAI-compatible and Anthropic-compatible APIs through the LiteLLM AI Gateway.
## Supported Models
LiteLLM supports the following MiniMax models:
| Model | Description | Input Cost | Output Cost | Context Window |
|---|---|---|---|---|
| MiniMax-M2.5 | Advanced reasoning and agentic capabilities | $0.30 per 1M tokens | $1.20 per 1M tokens | 1M tokens |
| MiniMax-M2.5-lightning | Faster and more agile (~100 tokens/sec) | $0.30 per 1M tokens | $2.40 per 1M tokens | 1M tokens |
## Features Supported
- Prompt Caching: Reduce costs with cached prompts ($0.03 per 1M tokens for cache reads, $0.375 per 1M tokens for cache writes)
- Function Calling: Built-in tool calling support (see the sketch after this list)
- Reasoning: Advanced reasoning capabilities with thinking support
- System Messages: Full system message support
- Cost Tracking: Automatic cost calculation for all requests
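Function calling is not demonstrated in the usage examples below, so here is a minimal sketch of tool calling with MiniMax-M2.5 through the LiteLLM SDK. The `get_weather` tool name and schema are illustrative placeholders, not part of MiniMax or LiteLLM.

```python
import litellm

# Hypothetical tool definition (OpenAI tool-calling schema); "get_weather" is a placeholder
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = litellm.completion(
    model="minimax/MiniMax-M2.5",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    api_key="your-minimax-api-key",
    api_base="https://api.minimax.io/v1",
)

# If the model chose to call a tool, the call appears on the message
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```

The same `tools` payload should also work against the proxy's `/v1/chat/completions` endpoint configured below.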
## Docker Image

```bash
docker pull litellm/litellm:v1.81.3-stable
```
## Usage - OpenAI Compatible API (/v1/chat/completions)
1. Setup config.yaml

```yaml
model_list:
  - model_name: minimax-m2-5
    litellm_params:
      model: minimax/MiniMax-M2.5
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
```
2. Start the proxy

```bash
docker run -d \
  -p 4000:4000 \
  -e MINIMAX_API_KEY=$MINIMAX_API_KEY \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:v1.81.3-stable \
  --config /app/config.yaml
```
3. Test it!

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $LITELLM_KEY" \
  --data '{
    "model": "minimax-m2-5",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
```
### With Reasoning Split
```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $LITELLM_KEY" \
  --data '{
    "model": "minimax-m2-5",
    "messages": [
      {
        "role": "user",
        "content": "Solve: 2+2=?"
      }
    ],
    "extra_body": {
      "reasoning_split": true
    }
  }'
```
## Usage - Anthropic Compatible API (/v1/messages)
1. Setup config.yaml

```yaml
model_list:
  - model_name: minimax-m2-5
    litellm_params:
      model: minimax/MiniMax-M2.5
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/anthropic/v1/messages
```
2. Start the proxy

```bash
docker run -d \
  -p 4000:4000 \
  -e MINIMAX_API_KEY=$MINIMAX_API_KEY \
  -v $(pwd)/config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:v1.81.3-stable \
  --config /app/config.yaml
```
3. Test it!

```bash
curl --location 'http://0.0.0.0:4000/v1/messages' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $LITELLM_KEY" \
  --data '{
    "model": "minimax-m2-5",
    "max_tokens": 1000,
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
```
### With Thinking
```bash
curl --location 'http://0.0.0.0:4000/v1/messages' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $LITELLM_KEY" \
  --data '{
    "model": "minimax-m2-5",
    "max_tokens": 1000,
    "thinking": {
      "type": "enabled",
      "budget_tokens": 1000
    },
    "messages": [
      {
        "role": "user",
        "content": "Solve: 2+2=?"
      }
    ]
  }'
```
## Usage - LiteLLM SDK

### OpenAI-compatible API
```python
import litellm

response = litellm.completion(
    model="minimax/MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
    api_key="your-minimax-api-key",
    api_base="https://api.minimax.io/v1"
)

print(response.choices[0].message.content)
```
### Anthropic-compatible API
```python
import asyncio
import litellm

async def main():
    response = await litellm.anthropic.messages.acreate(
        model="minimax/MiniMax-M2.5",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        api_key="your-minimax-api-key",
        api_base="https://api.minimax.io/anthropic/v1/messages",
        max_tokens=1000
    )
    # The response follows the Anthropic Messages format: a list of content blocks
    for block in response["content"]:
        if block.get("type") == "text":
            print(block["text"])

asyncio.run(main())
```
### With Thinking
```python
# With thinking enabled, reasoning comes back as "thinking" content blocks
async def solve():
    response = await litellm.anthropic.messages.acreate(
        model="minimax/MiniMax-M2.5",
        messages=[{"role": "user", "content": "Solve: 2+2=?"}],
        thinking={"type": "enabled", "budget_tokens": 1000},
        api_key="your-minimax-api-key"
    )
    # Access thinking content
    for block in response["content"]:
        if block.get("type") == "thinking":
            print(f"Thinking: {block['thinking']}")

asyncio.run(solve())
```
### With Reasoning Split (OpenAI API)
```python
response = litellm.completion(
    model="minimax/MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "Solve: 2+2=?"}
    ],
    extra_body={"reasoning_split": True},
    api_key="your-minimax-api-key",
    api_base="https://api.minimax.io/v1"
)

# Access thinking and response
if hasattr(response.choices[0].message, 'reasoning_details'):
    print(f"Thinking: {response.choices[0].message.reasoning_details}")
print(f"Response: {response.choices[0].message.content}")
```
## Cost Tracking

LiteLLM automatically tracks costs for MiniMax-M2.5 requests. The pricing is as follows, with a quick worked example after the list:

- Input: $0.30 per 1M tokens
- Output: $1.20 per 1M tokens
- Cache Read: $0.03 per 1M tokens
- Cache Write: $0.375 per 1M tokens
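As a rough sanity check on these rates, a hypothetical request with 10,000 input tokens and 2,000 output tokens (no cache hits) works out to roughly half a cent:

```python
# Hypothetical request: 10,000 input tokens, 2,000 output tokens, no cache hits
input_cost = 10_000 / 1_000_000 * 0.30   # $0.0030
output_cost = 2_000 / 1_000_000 * 1.20   # $0.0024
print(f"Estimated cost: ${input_cost + output_cost:.4f}")  # Estimated cost: $0.0054
```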
### Accessing Cost Information
```python
response = litellm.completion(
    model="minimax/MiniMax-M2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="your-minimax-api-key"
)

# Access cost information
print(f"Cost: ${response._hidden_params.get('response_cost', 0)}")
```
## Streaming Support

### OpenAI API
```python
response = litellm.completion(
    model="minimax/MiniMax-M2.5",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
    api_key="your-minimax-api-key",
    api_base="https://api.minimax.io/v1"
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
### Streaming with Reasoning Split
```python
stream = litellm.completion(
    model="minimax/MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "Tell me a story"},
    ],
    extra_body={"reasoning_split": True},
    stream=True,
    api_key="your-minimax-api-key",
    api_base="https://api.minimax.io/v1"
)

reasoning_buffer = ""
text_buffer = ""

for chunk in stream:
    # Reasoning deltas arrive under `reasoning_details`; print only the new portion
    if hasattr(chunk.choices[0].delta, "reasoning_details") and chunk.choices[0].delta.reasoning_details:
        for detail in chunk.choices[0].delta.reasoning_details:
            if "text" in detail:
                reasoning_text = detail["text"]
                new_reasoning = reasoning_text[len(reasoning_buffer):]
                if new_reasoning:
                    print(new_reasoning, end="", flush=True)
                reasoning_buffer = reasoning_text
    # Regular content deltas
    if chunk.choices[0].delta.content:
        content_text = chunk.choices[0].delta.content
        new_text = content_text[len(text_buffer):] if text_buffer else content_text
        if new_text:
            print(new_text, end="", flush=True)
        text_buffer = content_text
```
## Using with Native SDKs

### Anthropic SDK via LiteLLM Proxy
```python
import os

os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:4000"
os.environ["ANTHROPIC_API_KEY"] = "sk-1234"  # Your LiteLLM proxy key

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="minimax-m2-5",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hi, how are you?"
                }
            ]
        }
    ]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```
### OpenAI SDK via LiteLLM Proxy
```python
import os

os.environ["OPENAI_BASE_URL"] = "http://localhost:4000"
os.environ["OPENAI_API_KEY"] = "sk-1234"  # Your LiteLLM proxy key

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="minimax-m2-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, how are you?"},
    ],
    extra_body={"reasoning_split": True},
)

# Access thinking and response
if hasattr(response.choices[0].message, 'reasoning_details'):
    print(f"Thinking:\n{response.choices[0].message.reasoning_details[0]['text']}\n")
print(f"Text:\n{response.choices[0].message.content}\n")
```


