
Vertex AI Batch APIs

Add the following Vertex environment variables to the environment the LiteLLM proxy runs in.

# GCS Bucket settings, used to store batch prediction files in
export GCS_BUCKET_NAME="my-batch-bucket" # the bucket you want to store batch prediction files in
export GCS_PATH_SERVICE_ACCOUNT="/path/to/service_account.json" # path to your service account json file

# Vertex /batch endpoint settings, used for LLM API requests
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json" # path to your service account json file
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
export VERTEXAI_PROJECT="my-project"
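
If the proxy starts in a different shell or container, it can help to confirm these variables are actually visible to that process. A minimal sanity check (the variable names match the exports above; the filename is just a suggestion):

check_vertex_env.py
import os

# Variables required for Vertex batch support, per the exports above
required = [
    "GCS_BUCKET_NAME",
    "GCS_PATH_SERVICE_ACCOUNT",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "VERTEXAI_LOCATION",
    "VERTEXAI_PROJECT",
]

missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing Vertex batch env vars: {missing}")
print("All Vertex batch env vars are set")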

Usage

Follow this complete workflow: create a JSONL file → upload the file → create a batch → retrieve batch status → get file content

1. Create a JSONL file of batch requests

LiteLLM expects the file to follow the OpenAI Batch API input file format, with one JSON object per line.

The body of each request in the file should be an OpenAI API request.

Create a file called batch_requests.jsonl with your requests:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-2.5-flash-lite", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gemini-2.5-flash-lite", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 10}}

2. Upload the file

Upload your JSONL file. For vertex_ai, the file is stored in the GCS bucket specified by GCS_BUCKET_NAME.

upload_file.py
from openai import OpenAI

oai_client = OpenAI(
    api_key="sk-1234",  # litellm proxy API key
    base_url="http://localhost:4000",  # litellm proxy base url
)

file_obj = oai_client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch",
    extra_body={"custom_llm_provider": "vertex_ai"},
)

print(f"File uploaded with ID: {file_obj.id}")

Expected Response:

{
  "id": "gs://my-batch-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite/abc123-def4-5678-9012-34567890abcd",
  "bytes": 416,
  "created_at": 1758303684,
  "filename": "litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite/abc123-def4-5678-9012-34567890abcd",
  "object": "file",
  "purpose": "batch",
  "status": "uploaded",
  "expires_at": null,
  "status_details": null
}

3. Create a batch

Create a batch job using the uploaded file ID.

create_batch.py
batch_input_file_id = file_obj.id # from step 2
create_batch_response = oai_client.batches.create(
    completion_window="24h",
    endpoint="/v1/chat/completions",
    input_file_id=batch_input_file_id,  # e.g. "gs://my-batch-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite/abc123-def4-5678-9012-34567890abcd"
    extra_body={"custom_llm_provider": "vertex_ai"},
)

print(f"Batch created with ID: {create_batch_response.id}")

Expected Response:

{
  "id": "7814463557919047680",
  "completion_window": "24hrs",
  "created_at": 1758328011,
  "endpoint": "",
  "input_file_id": "gs://my-batch-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite/abc123-def4-5678-9012-34567890abcd",
  "object": "batch",
  "status": "validating",
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "error_file_id": null,
  "errors": null,
  "expired_at": null,
  "expires_at": null,
  "failed_at": null,
  "finalizing_at": null,
  "in_progress_at": null,
  "metadata": null,
  "output_file_id": "gs://my-batch-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite",
  "request_counts": null,
  "usage": null
}

4. Retrieve batch status

Check the status of your batch job. The batch will progress through states: validating → in_progress → completed.

retrieve_batch.py
retrieved_batch = oai_client.batches.retrieve(
    batch_id=create_batch_response.id,  # created batch id, e.g. "7814463557919047680"
    extra_body={"custom_llm_provider": "vertex_ai"},
)

print(f"Batch status: {retrieved_batch.status}")
if retrieved_batch.status == "completed":
    print(f"Output file: {retrieved_batch.output_file_id}")

Expected Response (when completed):

{
  "id": "7814463557919047680",
  "completion_window": "24hrs",
  "created_at": 1758328011,
  "endpoint": "",
  "input_file_id": "gs://my-batch-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite/abc123-def4-5678-9012-34567890abcd",
  "object": "batch",
  "status": "completed",
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "error_file_id": null,
  "errors": null,
  "expired_at": null,
  "expires_at": null,
  "failed_at": null,
  "finalizing_at": null,
  "in_progress_at": null,
  "metadata": null,
  "output_file_id": "gs://my-batch-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-flash-lite/prediction-model-2025-09-19T21:26:51.569037Z/predictions.jsonl",
  "request_counts": null,
  "usage": null
}
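
Batch jobs can take anywhere from minutes to hours. If you want to block until the job reaches a terminal state, a simple polling loop over the same batches.retrieve call works; the 60-second interval below is an arbitrary choice:

poll_batch.py
import time

terminal_states = {"completed", "failed", "cancelled", "expired"}

while True:
    retrieved_batch = oai_client.batches.retrieve(
        batch_id=create_batch_response.id,
        extra_body={"custom_llm_provider": "vertex_ai"},
    )
    print(f"Batch status: {retrieved_batch.status}")
    if retrieved_batch.status in terminal_states:
        break
    time.sleep(60)  # arbitrary poll interval; tune for your workload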

5. Get file content

Once the batch is completed, retrieve the results using the output_file_id from the batch response.

Important: The output_file_id must be URL encoded when used in the request path.

get_file_content.py
import urllib.parse
import json

output_file_id = retrieved_batch.output_file_id
# URL encode the file ID
encoded_file_id = urllib.parse.quote_plus(output_file_id)

# Get file content
file_content = oai_client.files.content(
    file_id=encoded_file_id,
    extra_body={"custom_llm_provider": "vertex_ai"}
)

# Process the results
for line in file_content.text.strip().split('\n'):
    result = json.loads(line)
    print(f"Request: {result['request']}")
    print(f"Response: {result['response']}")
    print("---")

Expected Response:

The response is in JSONL format, with one result per line:

{"status":"","processed_time":"2025-09-19T21:29:47.352+00:00","request":{"contents":[{"parts":[{"text":"Hello world!"}],"role":"user"}],"generationConfig":{"max_output_tokens":10},"system_instruction":{"parts":[{"text":"You are a helpful assistant."}]}},"response":{"candidates":[{"avgLogprobs":-0.48079710006713866,"content":{"parts":[{"text":"Hello there! It's nice to meet you"}],"role":"model"},"finishReason":"MAX_TOKENS"}],"createTime":"2025-09-19T21:29:47.484619Z","modelVersion":"gemini-2.5-flash-lite","responseId":"S8vNaIvKHdvshMIP_aOtuAg","usageMetadata":{"candidatesTokenCount":10,"candidatesTokensDetails":[{"modality":"TEXT","tokenCount":10}],"promptTokenCount":9,"promptTokensDetails":[{"modality":"TEXT","tokenCount":9}],"totalTokenCount":19,"trafficType":"ON_DEMAND"}}}
{"status":"","processed_time":"2025-09-19T21:29:47.358+00:00","request":{"contents":[{"parts":[{"text":"Hello world!"}],"role":"user"}],"generationConfig":{"max_output_tokens":10},"system_instruction":{"parts":[{"text":"You are an unhelpful assistant."}]}},"response":{"candidates":[{"avgLogprobs":-0.6168075137668185,"content":{"parts":[{"text":"I am unable to assist with this request."}],"role":"model"},"finishReason":"STOP"}],"createTime":"2025-09-19T21:29:47.470889Z","modelVersion":"gemini-2.5-flash-lite","responseId":"S8vNaOneHISShMIP28nA8QQ","usageMetadata":{"candidatesTokenCount":9,"candidatesTokensDetails":[{"modality":"TEXT","tokenCount":9}],"promptTokenCount":9,"promptTokensDetails":[{"modality":"TEXT","tokenCount":9}],"totalTokenCount":18,"trafficType":"ON_DEMAND"}}}