
Weights & Biases Inference

https://weave-docs.wandb.ai/quickstart-inference

tip

LiteLLM supports all models available through the W&B Inference service. To use one, prefix the model name with wandb/, i.e. set model=wandb/<any-model-on-wandb-inference-dashboard>. The full list of supported models is available at https://docs.wandb.ai/guides/inference/models/.

API Key

You can get an API key for W&B Inference at https://wandb.ai/authorize.

import os
# set your W&B API key as an environment variable
os.environ['WANDB_API_KEY'] = "your-wandb-api-key"

Sample Usage: Text Generation

from litellm import completion
import os

os.environ['WANDB_API_KEY'] = "insert-your-wandb-api-key"
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?",
        }
    ],
    max_tokens=10,
    response_format={"type": "json_object"},
    seed=123,
    temperature=0.6,  # either set temperature or `top_p`
    top_p=0.01,       # to get as deterministic results as possible
)
print(response)
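The call returns an OpenAI-style response object, so instead of printing the whole object you can pull out just the generated text and token usage. A minimal sketch, assuming the `response` from the example above:

# fields follow the standard OpenAI response schema
print(response.choices[0].message.content)  # generated text
print(response.usage.total_tokens)          # prompt + completion tokens
print(response.model)                       # model that served the request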

Sample Usage - Streaming

from litellm import completion
import os

os.environ['WANDB_API_KEY'] = "insert-your-wandb-api-key"
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?",
        }
    ],
    stream=True,
    max_tokens=10,
    response_format={"type": "json_object"},
    seed=123,
    temperature=0.6,  # either set temperature or `top_p`
    top_p=0.01,       # to get as deterministic results as possible
)

for chunk in response:
    print(chunk)
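Each chunk follows the OpenAI streaming format, so if you only want the generated text you can read the delta content instead of printing raw chunks. A sketch of an alternative loop body (note that delta.content can be None for some chunks, such as the final one):

full_text = ""
for chunk in response:
    # delta.content holds the newly generated text fragment, if any
    delta = chunk.choices[0].delta.content
    if delta is not None:
        full_text += delta
print(full_text)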
tip

The above examples may not work if the model has been taken offline. Check the full list of available models at https://docs.wandb.ai/guides/inference/models/.

Usage with LiteLLM Proxy Server

Here's how to call a W&B Inference model with the LiteLLM Proxy Server:

1. Modify the config.yaml

model_list:
  - model_name: my-model
    litellm_params:
      model: wandb/<your-model-name>  # add wandb/ prefix to use W&B Inference as provider
      api_key: api-key                # your W&B API key
2. Start the proxy
$ litellm --config /path/to/config.yaml
3. Send a request to the LiteLLM Proxy Server
import openai

client = openai.OpenAI(
    api_key="litellm-proxy-key",    # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm proxy base url
)

response = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?"
        }
    ],
)

print(response)
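Because the proxy exposes the standard OpenAI-compatible API, the async client works the same way. A minimal sketch, assuming the same proxy key and base URL as above:

import asyncio
import openai

async def main():
    # same proxy key and base URL as the synchronous example above
    client = openai.AsyncOpenAI(
        api_key="litellm-proxy-key",
        base_url="http://0.0.0.0:4000",
    )
    response = await client.chat.completions.create(
        model="my-model",
        messages=[{"role": "user", "content": "What character was Wall-e in love with?"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())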

Supported Parameters

The W&B Inference provider supports the following parameters:

Chat Completion Parameters

| Parameter | Type | Description |
|---|---|---|
| frequency_penalty | number | Penalizes new tokens based on their frequency in the text |
| function_call | string/object | Controls how the model calls functions |
| functions | array | List of functions for which the model may generate JSON inputs |
| logit_bias | map | Modifies the likelihood of specified tokens |
| max_tokens | integer | Maximum number of tokens to generate |
| n | integer | Number of completions to generate |
| presence_penalty | number | Penalizes tokens based on whether they appear in the text so far |
| response_format | object | Format of the response, e.g., {"type": "json_object"} |
| seed | integer | Sampling seed for deterministic results |
| stop | string/array | Sequences where the API will stop generating tokens |
| stream | boolean | Whether to stream the response |
| temperature | number | Controls randomness (0-2) |
| top_p | number | Controls nucleus sampling |
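As an illustration, several of these parameters can be combined in a single call. A sketch only; the values are arbitrary and the model name is the one used in the earlier examples:

from litellm import completion

# assumes WANDB_API_KEY is already set in the environment
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "List three facts about Pixar."}],
    max_tokens=200,
    n=1,                   # number of completions to generate
    presence_penalty=0.2,  # discourage tokens that already appear in the text
    stop=["\n\n"],         # stop generating at the first blank line
    temperature=0.7,
)
print(response.choices[0].message.content)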

Error Handling

The integration uses standard LiteLLM error handling. In addition, here is a list of errors commonly returned by the W&B Inference API (a handling sketch follows the table):

| Error Code | Message | Cause | Solution |
|---|---|---|---|
| 401 | Authentication failed | Your authentication credentials are incorrect, or your W&B project entity and/or name are incorrect. | Ensure you're using the correct API key and that your W&B project name and entity are correct. |
| 403 | Country, region, or territory not supported | Accessing the API from an unsupported location. | Please see Geographic restrictions. |
| 429 | Concurrency limit reached for requests | Too many concurrent requests. | Reduce the number of concurrent requests or increase your limits. For more information, see Usage information and limits. |
| 429 | You exceeded your current quota, please check your plan and billing details | Out of credits or reached monthly spending cap. | Get more credits or increase your limits. For more information, see Usage information and limits. |
| 429 | W&B Inference isn't available for personal accounts. Switch to a non-personal account. | You're using a personal W&B account, which doesn't have access to W&B Inference. | Follow the instructions below for a workaround. |
| 500 | The server had an error while processing your request | Internal server error. | Retry after a brief wait and contact support if it persists. |
| 503 | The engine is currently overloaded, please try again later | Server is experiencing high traffic. | Retry your request after a short delay. |
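Because LiteLLM maps provider errors to its standard exception types, a minimal handling sketch looks like the following (assuming LiteLLM's usual exception mapping; the classes shown live in litellm.exceptions):

import time
import litellm
from litellm import completion

try:
    response = completion(
        model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
        messages=[{"role": "user", "content": "What character was Wall-e in love with?"}],
    )
except litellm.exceptions.AuthenticationError:
    # 401: check WANDB_API_KEY and your W&B entity/project
    raise
except litellm.exceptions.RateLimitError:
    # 429: back off and retry, or reduce concurrency / raise your limits
    time.sleep(5)
except litellm.exceptions.ServiceUnavailableError:
    # 503: the service is overloaded; retry after a short delay
    time.sleep(5)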

Error 429: Personal entities unsupported

This error means you're on a personal account, which doesn't have access to W&B Inference. Switch to a non-personal account; if one isn't available, create a Team to get one.

Once done, add the openai-project header to your request as shown below:

response = completion(
    model="...",
    extra_headers={"openai-project": "team_name/project_name"},
    # ...
)

For more information, see Personal entities unsupported.

You can find more ways of using custom headers with LiteLLM at https://docs.litellm.ai/docs/proxy/request_headers.
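For example, when going through the LiteLLM Proxy Server, the OpenAI client can attach the same header on a single request. A sketch only; whether the proxy forwards the header on to W&B depends on your proxy's header-forwarding configuration (see the link above):

import openai

client = openai.OpenAI(
    api_key="litellm-proxy-key",
    base_url="http://0.0.0.0:4000",
)

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "What character was Wall-e in love with?"}],
    # extra_headers adds headers to this single request only
    extra_headers={"openai-project": "team_name/project_name"},
)
print(response.choices[0].message.content)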