
Custom LLM API-Endpoints

LiteLLM supports calling custom-deployed LLM API endpoints.

LiteLLM expects the following input and output formats for custom LLM API endpoints.

Model Details

For calls to your custom API base ensure:

  • Set api_base="your-api-base"
  • Add custom/ as a prefix to the model param. If your API expects meta-llama/Llama-2-13b-hf, set model=custom/meta-llama/Llama-2-13b-hf
Model Name                   Function Call
meta-llama/Llama-2-13b-hf    response = completion(model="custom/meta-llama/Llama-2-13b-hf", messages=messages, api_base="https://your-custom-inference-endpoint")
meta-llama/Llama-2-13b-hf    response = completion(model="custom/meta-llama/Llama-2-13b-hf", messages=messages, api_base="https://api.autoai.dev/inference")
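
The custom/ prefix is a routing hint: LiteLLM sends the request to your api_base with the prefix removed, so your endpoint sees the model name it expects. A minimal sketch of that prefix handling (resolve_custom_model is an illustrative helper, not part of LiteLLM's public API):

```python
def resolve_custom_model(model: str) -> str:
    """Strip the 'custom/' provider prefix to recover the upstream model name."""
    prefix = "custom/"
    if model.startswith(prefix):
        return model[len(prefix):]
    return model

# The name your endpoint receives:
print(resolve_custom_model("custom/meta-llama/Llama-2-13b-hf"))
# meta-llama/Llama-2-13b-hf
```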

Example Call to Custom LLM API using LiteLLM

from litellm import completion

response = completion(
    model="custom/meta-llama/Llama-2-13b-hf",
    messages=[{"content": "what is custom llama?", "role": "user"}],
    temperature=0.2,
    max_tokens=10,
    api_base="https://api.autoai.dev/inference",
    request_timeout=300,
)
print("got response\n", response)

Setting your Custom API endpoint

Requests to your custom LLM API base should follow this format:

import requests

resp = requests.post(
    "your-api-base",  # your custom inference endpoint URL
    json={
        "model": "meta-llama/Llama-2-13b-hf",  # model name
        "params": {
            "prompt": ["The capital of France is P"],
            "max_tokens": 32,
            "temperature": 0.7,
            "top_p": 1.0,
            "top_k": 40,
        },
    },
)
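
The request body above can be assembled with a small helper. This is a sketch under the format shown here; build_custom_request is a hypothetical name, and the defaults mirror the example values:

```python
def build_custom_request(model: str, prompt: str, max_tokens: int = 32,
                         temperature: float = 0.7, top_p: float = 1.0,
                         top_k: int = 40) -> dict:
    """Build the JSON body the custom LLM endpoint expects."""
    return {
        "model": model,
        "params": {
            # The endpoint takes a list of prompts, even for a single one.
            "prompt": [prompt],
            "max_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
            "top_k": top_k,
        },
    }

body = build_custom_request("meta-llama/Llama-2-13b-hf", "The capital of France is P")
print(body["params"]["prompt"])
```

Pass the result as the `json=` argument to `requests.post(...)` against your api_base.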

Responses from your custom LLM API base should follow this format:

{
    "data": [
        {
            "prompt": "The capital of France is P",
            "output": [
                "The capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France is PARIS.\nThe capital of France"
            ],
            "params": {
                "temperature": 0.7,
                "top_k": 40,
                "top_p": 1
            }
        }
    ],
    "message": "ok"
}
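
On the client side, the first generation can be pulled out of this envelope. A minimal sketch, assuming the "data"/"message" shape shown above (extract_first_output is an illustrative helper, not a LiteLLM function):

```python
def extract_first_output(resp_json: dict) -> str:
    """Return the first generated string from the custom endpoint's response."""
    if resp_json.get("message") != "ok":
        raise ValueError(f"endpoint returned an error: {resp_json.get('message')!r}")
    # 'data' holds one entry per prompt; 'output' holds the generations for it.
    return resp_json["data"][0]["output"][0]

sample = {
    "data": [
        {
            "prompt": "The capital of France is P",
            "output": ["The capital of France is PARIS."],
            "params": {"temperature": 0.7, "top_k": 40, "top_p": 1},
        }
    ],
    "message": "ok",
}
print(extract_first_output(sample))
```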