# GMI Cloud

## Overview

| Property | Details |
|---|---|
| Description | GMI Cloud is a GPU cloud infrastructure provider offering access to top AI models including Claude, GPT, DeepSeek, Gemini, and more through OpenAI-compatible APIs. |
| Provider Route on LiteLLM | `gmi/` |
| Link to Provider Doc | GMI Cloud Docs ↗ |
| Base URL | `https://api.gmi-serving.com/v1` |
| Supported Operations | `/chat/completions`, `/models` |

## What is GMI Cloud?

GMI Cloud is a venture-backed digital infrastructure company ($82M+ in funding) providing:

- **Top-tier GPU access**: NVIDIA H100 GPUs for AI workloads
- **Multiple AI models**: Claude, GPT, DeepSeek, Gemini, Kimi, Qwen, and more
- **OpenAI-compatible API**: a drop-in replacement for the OpenAI SDK
- **Global infrastructure**: data centers in the US (Colorado) and APAC (Taiwan)

## Required Variables

```python
# Environment variables
import os

os.environ["GMI_API_KEY"] = ""  # your GMI Cloud API key
```

Get your GMI Cloud API key from console.gmicloud.ai.

## Usage - LiteLLM Python SDK

### Non-streaming

```python
# GMI Cloud non-streaming completion
import os
from litellm import completion

os.environ["GMI_API_KEY"] = ""  # your GMI Cloud API key

messages = [{"content": "What is the capital of France?", "role": "user"}]

# GMI Cloud call
response = completion(
    model="gmi/deepseek-ai/DeepSeek-V3.2",
    messages=messages,
)

print(response)
# The generated text itself is at response.choices[0].message.content
```

### Streaming

```python
# GMI Cloud streaming completion
import os
from litellm import completion

os.environ["GMI_API_KEY"] = ""  # your GMI Cloud API key

messages = [{"content": "Write a short poem about AI", "role": "user"}]

# GMI Cloud call with streaming
response = completion(
    model="gmi/anthropic/claude-sonnet-4.5",
    messages=messages,
    stream=True,
)

for chunk in response:
    print(chunk)
```
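Each streamed chunk follows the OpenAI chunk shape, so the text deltas can be stitched back into the full reply. A minimal sketch (the helper name `collect_stream_text` is ours, not part of LiteLLM):

```python
def collect_stream_text(chunks) -> str:
    """Concatenate the content deltas from an OpenAI-style chunk stream."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None)
        if content:  # some chunks (e.g. the final one) carry no content
            parts.append(content)
    return "".join(parts)

# Usage with the streaming call above:
# response = completion(model="gmi/anthropic/claude-sonnet-4.5",
#                       messages=messages, stream=True)
# print(collect_stream_text(response))
```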

## Usage - LiteLLM Proxy Server

### 1. Save key in your environment

```shell
export GMI_API_KEY=""
```

### 2. Start the proxy

Save the following as `config.yaml`, then start the proxy with `litellm --config config.yaml`:

```yaml
model_list:
  - model_name: deepseek-v3
    litellm_params:
      model: gmi/deepseek-ai/DeepSeek-V3.2
      api_key: os.environ/GMI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: gmi/anthropic/claude-sonnet-4.5
      api_key: os.environ/GMI_API_KEY
```

## Supported Models

| Model | Model ID | Context Length |
|---|---|---|
| Claude Opus 4.5 | `gmi/anthropic/claude-opus-4.5` | 409K |
| Claude Sonnet 4.5 | `gmi/anthropic/claude-sonnet-4.5` | 409K |
| Claude Sonnet 4 | `gmi/anthropic/claude-sonnet-4` | 409K |
| Claude Opus 4 | `gmi/anthropic/claude-opus-4` | 409K |
| GPT-5.2 | `gmi/openai/gpt-5.2` | 409K |
| GPT-5.1 | `gmi/openai/gpt-5.1` | 409K |
| GPT-5 | `gmi/openai/gpt-5` | 409K |
| GPT-4o | `gmi/openai/gpt-4o` | 131K |
| GPT-4o-mini | `gmi/openai/gpt-4o-mini` | 131K |
| DeepSeek V3.2 | `gmi/deepseek-ai/DeepSeek-V3.2` | 163K |
| DeepSeek V3 0324 | `gmi/deepseek-ai/DeepSeek-V3-0324` | 163K |
| Gemini 3 Pro | `gmi/google/gemini-3-pro-preview` | 1M |
| Gemini 3 Flash | `gmi/google/gemini-3-flash-preview` | 1M |
| Kimi K2 Thinking | `gmi/moonshotai/Kimi-K2-Thinking` | 262K |
| MiniMax M2.1 | `gmi/MiniMaxAI/MiniMax-M2.1` | 196K |
| Qwen3-VL 235B | `gmi/Qwen/Qwen3-VL-235B-A22B-Instruct-FP8` | 262K |
| GLM-4.7 | `gmi/zai-org/GLM-4.7-FP8` | 202K |

## Supported OpenAI Parameters

GMI Cloud supports the standard OpenAI-compatible parameters:

| Parameter | Type | Description |
|---|---|---|
| `messages` | array | **Required.** Array of message objects with `role` and `content` |
| `model` | string | **Required.** Model ID from the table above |
| `stream` | boolean | Optional. Enable streaming responses |
| `temperature` | float | Optional. Sampling temperature |
| `top_p` | float | Optional. Nucleus sampling parameter |
| `max_tokens` | integer | Optional. Maximum number of tokens to generate |
| `frequency_penalty` | float | Optional. Penalize frequently repeated tokens |
| `presence_penalty` | float | Optional. Penalize tokens already present in the text |
| `stop` | string/array | Optional. Stop sequences |
| `response_format` | object | Optional. JSON mode with `{"type": "json_object"}` |
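For instance, JSON mode is requested by passing `response_format` alongside the other parameters. A sketch of the keyword arguments as they would be passed to `litellm.completion` (the prompt text and parameter values are illustrative):

```python
# Keyword arguments for a JSON-mode request, used as completion(**request)
request = {
    "model": "gmi/openai/gpt-4o",
    "messages": [{"role": "user", "content": "Return the capital of France as a JSON object."}],
    "response_format": {"type": "json_object"},  # ask the model for a JSON object
    "temperature": 0.2,
    "max_tokens": 100,
}

# response = completion(**request)          # with: from litellm import completion
# print(response.choices[0].message.content)  # a JSON string
```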

## Additional Resources