Weights & Biases Inference
https://weave-docs.wandb.ai/quickstart-inference
LiteLLM supports all models available from the W&B Inference service. To use a model, prefix its name with wandb/, i.e. set model=wandb/<any-model-on-wandb-inference-dashboard> in your LiteLLM requests.
The full list of supported models is available at https://docs.wandb.ai/guides/inference/models/
API Key
You can get an API key for W&B Inference at https://wandb.ai/authorize.
import os

# set your W&B API key as an environment variable
os.environ['WANDB_API_KEY'] = "your-wandb-api-key"
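If you'd rather not rely on environment variables, the key can also be passed per request; a minimal sketch, assuming LiteLLM's standard api_key parameter on completion:

from litellm import completion

# pass the W&B API key explicitly instead of reading WANDB_API_KEY from the environment
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="your-wandb-api-key",
)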
Sample Usage: Text Generation
from litellm import completion
import os
os.environ['WANDB_API_KEY'] = "insert-your-wandb-api-key"
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?",
        }
    ],
    max_tokens=10,
    response_format={"type": "json_object"},
    seed=123,
    temperature=0.6,  # either set temperature or `top_p`
    top_p=0.01,       # to get as deterministic results as possible
)
print(response)
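The response follows LiteLLM's OpenAI-compatible shape, so the generated text and token usage can be read off directly; a short sketch, assuming the standard ModelResponse fields:

# the generated text lives in the first choice's message
print(response.choices[0].message.content)

# token accounting, useful for tracking spend against your W&B quota
print(response.usage.total_tokens)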
Sample Usage - Streaming
from litellm import completion
import os
os.environ['WANDB_API_KEY'] = ""
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?",
        }
    ],
    stream=True,
    max_tokens=10,
    response_format={"type": "json_object"},
    seed=123,
    temperature=0.6,  # either set temperature or `top_p`
    top_p=0.01,       # to get as deterministic results as possible
)
for chunk in response:
    print(chunk)
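If you want the assembled text rather than raw chunks, the loop above can be replaced with something like the following; a minimal sketch, assuming the OpenAI-style chunk format LiteLLM emits:

full_text = ""
for chunk in response:
    # each streamed chunk carries a partial piece of the reply in choices[0].delta
    delta = chunk.choices[0].delta
    if delta and delta.content:
        full_text += delta.content
print(full_text)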
The above examples may not work if the model has been taken offline. Check the full list of available models at https://docs.wandb.ai/guides/inference/models/.
Usage with LiteLLM Proxy Server
Here's how to call a W&B Inference model with the LiteLLM Proxy Server
- Modify the config.yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: wandb/<your-model-name>   # add wandb/ prefix to use W&B Inference as the provider
      api_key: "your-wandb-api-key"    # your W&B API key
- Start the proxy
$ litellm --config /path/to/config.yaml
- Send Request to LiteLLM Proxy Server
- OpenAI Python v1.0.0+
- curl
import openai

client = openai.OpenAI(
    api_key="litellm-proxy-key",    # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?"
        }
    ],
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer litellm-proxy-key' \
--header 'Content-Type: application/json' \
--data '{
    "model": "my-model",
    "messages": [
        {
            "role": "user",
            "content": "What character was Wall-e in love with?"
        }
    ]
}'
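Streaming also works through the proxy, since it exposes an OpenAI-compatible endpoint; a minimal sketch, assuming the my-model alias from the config.yaml in the first step:

import openai

client = openai.OpenAI(api_key="litellm-proxy-key", base_url="http://0.0.0.0:4000")

# request a streamed response from the proxy and print deltas as they arrive
stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "What character was Wall-e in love with?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")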
Supported Parameters
The W&B Inference provider supports the following parameters:
Chat Completion Parameters
| Parameter | Type | Description |
|---|---|---|
| frequency_penalty | number | Penalizes new tokens based on their frequency in the text |
| function_call | string/object | Controls how the model calls functions |
| functions | array | List of functions for which the model may generate JSON inputs |
| logit_bias | map | Modifies the likelihood of specified tokens |
| max_tokens | integer | Maximum number of tokens to generate |
| n | integer | Number of completions to generate |
| presence_penalty | number | Penalizes tokens based on whether they appear in the text so far |
| response_format | object | Format of the response, e.g., {"type": "json_object"} |
| seed | integer | Sampling seed for deterministic results |
| stop | string/array | Sequences where the API will stop generating tokens |
| stream | boolean | Whether to stream the response |
| temperature | number | Controls randomness (0-2) |
| top_p | number | Controls nucleus sampling |
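Several of these parameters aren't shown in the earlier examples; a minimal sketch combining a few of them, assuming they behave like their OpenAI counterparts and that the chosen model honours them:

from litellm import completion

response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "List three Pixar movies."}],
    n=2,                     # generate two independent completions
    stop=["\n\n"],           # stop generating at the first blank line
    frequency_penalty=0.5,   # discourage repeated tokens
    max_tokens=64,
)
for choice in response.choices:
    print(choice.message.content)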
Error Handling
The integration uses standard LiteLLM error handling. In addition, here are commonly encountered errors from the W&B Inference API:
| Error Code | Message | Cause | Solution |
|---|---|---|---|
| 401 | Authentication failed | Your authentication credentials are incorrect, or your W&B project entity and/or name are incorrect. | Ensure you're using the correct API key and that your W&B project name and entity are correct. |
| 403 | Country, region, or territory not supported | Accessing the API from an unsupported location. | See Geographic restrictions. |
| 429 | Concurrency limit reached for requests | Too many concurrent requests. | Reduce the number of concurrent requests or increase your limits. For more information, see Usage information and limits. |
| 429 | You exceeded your current quota, please check your plan and billing details | Out of credits or monthly spending cap reached. | Get more credits or increase your limits. For more information, see Usage information and limits. |
| 429 | W&B Inference isn't available for personal accounts. | You're using a personal account, which doesn't have access to W&B Inference. | Switch to a non-personal account, or follow the instructions below for a workaround. |
| 500 | The server had an error while processing your request | Internal server error. | Retry after a brief wait and contact support if it persists. |
| 503 | The engine is currently overloaded, please try again later | Server is experiencing high traffic. | Retry your request after a short delay. |
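Because LiteLLM maps provider errors onto its OpenAI-style exception classes, these cases can also be handled in code; a minimal sketch, assuming LiteLLM's standard exception types:

import litellm
from litellm import completion

try:
    response = completion(
        model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except litellm.AuthenticationError:
    # 401: check WANDB_API_KEY and your W&B entity/project
    raise
except litellm.RateLimitError:
    # 429: back off and retry, or review your usage limits
    raise
except litellm.ServiceUnavailableError:
    # 503: the engine is overloaded; retry after a short delay
    raise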
Error 429: Personal entities unsupported
This error occurs when the user is on a personal account, which doesn't have access to W&B Inference. Switch to a non-personal account; if you don't have one, create a Team.
Once done, add the openai-project header to your request as shown below:
response = completion(
    model="...",
    extra_headers={"openai-project": "team_name/project_name"},
    # ...
)
For more information, see Personal entities unsupported.
You can find more ways of using custom headers with LiteLLM here - https://docs.litellm.ai/docs/proxy/request_headers.
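When calling through the LiteLLM Proxy Server, the same header can be attached from the OpenAI client; a minimal sketch, assuming the proxy is configured to forward client headers to W&B (see the request-headers docs linked above):

import openai

client = openai.OpenAI(api_key="litellm-proxy-key", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Hello!"}],
    # forwarded to W&B so the request is attributed to a non-personal team/project
    extra_headers={"openai-project": "team_name/project_name"},
)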