# Xinference [Xorbits Inference]

## Overview

| Property | Details |
|----------|---------|
| Description | Xinference is an open-source platform to run inference with any open-source LLMs, image generation models, and more. |
| Provider Route on LiteLLM | `xinference/` |
| Link to Provider Doc | [Xinference ↗](https://inference.readthedocs.io/en/latest/index.html) |
| Supported Operations | `/embeddings`, `/images/generations` |

LiteLLM supports Xinference Embedding + Image Generation calls.

## API Base, Key

```python
import os

# env variables
os.environ['XINFERENCE_API_BASE'] = "http://127.0.0.1:9997/v1"
os.environ['XINFERENCE_API_KEY'] = "anything"  # [optional] no API key required
```

## Sample Usage - Embedding

```python
from litellm import embedding
import os

os.environ['XINFERENCE_API_BASE'] = "http://127.0.0.1:9997/v1"

response = embedding(
    model="xinference/bge-base-en",
    input=["good morning from litellm"],
)
print(response)
```
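
The return value is an OpenAI-format embedding response, so the vector itself lives in `response.data`. Continuing from the snippet above, a minimal sketch of pulling it out (field names follow the OpenAI embeddings schema):

```python
# each entry in response.data corresponds to one input string
vector = response.data[0]["embedding"]

print(len(vector))      # embedding dimensionality
print(response.model)   # which model produced the vectors
print(response.usage)   # token usage, if the server reports it
```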

## Sample Usage - `api_base` param

```python
from litellm import embedding

response = embedding(
    model="xinference/bge-base-en",
    api_base="http://127.0.0.1:9997/v1",
    input=["good morning from litellm"],
)
print(response)
```
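
For async applications, LiteLLM also exposes an async counterpart, `aembedding`, which takes the same parameters. A minimal sketch:

```python
import asyncio
from litellm import aembedding

async def main():
    response = await aembedding(
        model="xinference/bge-base-en",
        api_base="http://127.0.0.1:9997/v1",
        input=["good morning from litellm"],
    )
    print(response)

asyncio.run(main())
```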

## Image Generation

### Usage - LiteLLM Python SDK

```python
from litellm import image_generation

# xinference image generation call
response = image_generation(
    model="xinference/stabilityai/stable-diffusion-3.5-large",
    prompt="A beautiful sunset over a calm ocean",
    api_base="http://127.0.0.1:9997/v1",
)
print(response)
```
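
The response mirrors the OpenAI images format, so by default each entry carries a URL to the generated image. Continuing from the snippet above, a sketch of downloading the first image to disk; the `requests` dependency and the `sunset.png` filename are illustrative choices, not part of LiteLLM:

```python
import requests

# by default, each entry in response.data carries a URL to the image
image_url = response.data[0].url

img = requests.get(image_url, timeout=60)
img.raise_for_status()
with open("sunset.png", "wb") as f:
    f.write(img.content)
```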

### Usage - LiteLLM Proxy Server

#### 1. Setup config.yaml

```yaml
model_list:
  - model_name: xinference-sd
    litellm_params:
      model: xinference/stabilityai/stable-diffusion-3.5-large
      api_base: http://127.0.0.1:9997/v1
      api_key: anything
    model_info:
      mode: image_generation

general_settings:
  master_key: sk-1234
```

#### 2. Start the proxy

```bash
litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000
```

#### 3. Test it

```bash
curl --location 'http://0.0.0.0:4000/v1/images/generations' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-1234' \
--data '{
    "model": "xinference-sd",
    "prompt": "A beautiful sunset over a calm ocean",
    "n": 1,
    "size": "1024x1024",
    "response_format": "url"
}'
```
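
Because the proxy exposes an OpenAI-compatible API, the same request can also be made with the official `openai` Python SDK pointed at the proxy. A minimal sketch, assuming the proxy is running with the config above:

```python
from openai import OpenAI

# point the OpenAI client at the LiteLLM proxy
client = OpenAI(
    api_key="sk-1234",          # the proxy's master_key
    base_url="http://0.0.0.0:4000",
)

response = client.images.generate(
    model="xinference-sd",      # model_name from config.yaml
    prompt="A beautiful sunset over a calm ocean",
    n=1,
    size="1024x1024",
    response_format="url",
)
print(response.data[0].url)
```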

### Advanced Usage - With Additional Parameters

```python
from litellm import image_generation
import os

os.environ['XINFERENCE_API_BASE'] = "http://127.0.0.1:9997/v1"

response = image_generation(
    model="xinference/stabilityai/stable-diffusion-3.5-large",
    prompt="A beautiful sunset over a calm ocean",
    n=1,                         # number of images
    size="1024x1024",            # image size
    response_format="b64_json",  # return format
)
print(response)
```
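
With `response_format="b64_json"` the image comes back inline as a base64 string instead of a URL, which avoids a second HTTP fetch. A minimal sketch of decoding the first image to a file, assuming each entry exposes a `b64_json` field as in the OpenAI images format (the `sunset.png` filename is illustrative):

```python
import base64

# each entry carries the image bytes as a base64-encoded string
b64_data = response.data[0].b64_json
with open("sunset.png", "wb") as f:
    f.write(base64.b64decode(b64_data))
```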

## Supported Image Generation Models

Xinference supports various Stable Diffusion models. Here are some examples:

| Model Name | Function Call |
|------------|---------------|
| stabilityai/stable-diffusion-3.5-large | `image_generation(model="xinference/stabilityai/stable-diffusion-3.5-large", prompt="...")` |
| stabilityai/stable-diffusion-xl-base-1.0 | `image_generation(model="xinference/stabilityai/stable-diffusion-xl-base-1.0", prompt="...")` |
| runwayml/stable-diffusion-v1-5 | `image_generation(model="xinference/runwayml/stable-diffusion-v1-5", prompt="...")` |

For a complete list of supported image generation models, see: https://inference.readthedocs.io/en/latest/models/builtin/image/index.html

## Supported Embedding Models

All embedding models listed at https://inference.readthedocs.io/en/latest/models/builtin/embedding/index.html are supported.

| Model Name | Function Call |
|------------|---------------|
| bge-base-en | `embedding(model="xinference/bge-base-en", input)` |
| bge-base-en-v1.5 | `embedding(model="xinference/bge-base-en-v1.5", input)` |
| bge-base-zh | `embedding(model="xinference/bge-base-zh", input)` |
| bge-base-zh-v1.5 | `embedding(model="xinference/bge-base-zh-v1.5", input)` |
| bge-large-en | `embedding(model="xinference/bge-large-en", input)` |
| bge-large-en-v1.5 | `embedding(model="xinference/bge-large-en-v1.5", input)` |
| bge-large-zh | `embedding(model="xinference/bge-large-zh", input)` |
| bge-large-zh-noinstruct | `embedding(model="xinference/bge-large-zh-noinstruct", input)` |
| bge-large-zh-v1.5 | `embedding(model="xinference/bge-large-zh-v1.5", input)` |
| bge-small-en-v1.5 | `embedding(model="xinference/bge-small-en-v1.5", input)` |
| bge-small-zh | `embedding(model="xinference/bge-small-zh", input)` |
| bge-small-zh-v1.5 | `embedding(model="xinference/bge-small-zh-v1.5", input)` |
| e5-large-v2 | `embedding(model="xinference/e5-large-v2", input)` |
| gte-base | `embedding(model="xinference/gte-base", input)` |
| gte-large | `embedding(model="xinference/gte-large", input)` |
| jina-embeddings-v2-base-en | `embedding(model="xinference/jina-embeddings-v2-base-en", input)` |
| jina-embeddings-v2-small-en | `embedding(model="xinference/jina-embeddings-v2-small-en", input)` |
| multilingual-e5-large | `embedding(model="xinference/multilingual-e5-large", input)` |