Experimental Microservices Helm

Run LiteLLM as three independently scalable services — a gateway for LLM traffic, a backend for the management/UI API, and a static ui — plus a one-shot migrations Job.

Reference architecture: LiteLLM on Amazon EKS — gateway, backend, and ui behind one ALB Ingress, with Aurora Postgres, ElastiCache Redis, S3/CloudWatch, and Secrets Manager

For the motivation behind splitting the proxy (why a slow control-plane query can otherwise recycle the pods serving inference), see the blog post One Slow Dashboard Query Shouldn't Take Down Your LLM Traffic.

Experimental

This chart is experimental and its values schema may change between releases. Pin --version and review the diff before upgrading. The single-image deployment (Docker, Helm, Terraform) remains the supported default.

Components

| Component | Port | Surface |
|-----------|------|---------|
| gateway | 4000 | LLM data plane: /chat/completions, /v1/messages, embeddings, audio, batches, passthroughs, /health, /metrics |
| backend | 4001 | Management/UI API: keys, users, teams, orgs, SSO, audit logs, spend & usage analytics |
| ui | 3000 | Next.js dashboard, static export served by nginx |
| migrations | Job | prisma migrate deploy, run once as a pre-install/pre-upgrade Helm hook |

Each component is its own Deployment with its own Service, liveness/readiness probes, and HorizontalPodAutoscaler — a failure or load spike on one is contained to that surface.

Prerequisites

  • A Kubernetes cluster and Helm 3.8+ (OCI registry support).
  • An external Postgres database (writer endpoint; optional read replica).
  • Optional: Redis for caching / rate limiting.

Install

Step 1 — Create the Secrets

Sensitive values are passed by Secret reference only — create them first:

kubectl create namespace litellm

kubectl -n litellm create secret generic litellm-master-key-secret \
  --from-literal=master-key="sk-..."

kubectl -n litellm create secret generic litellm-writer-secret \
  --from-literal=username=litellm --from-literal=password="..."

# Only if you use a read replica (see "Separate read and write databases")
kubectl -n litellm create secret generic litellm-reader-secret \
  --from-literal=username=litellm --from-literal=password="..."
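
If you do not already have a master key, one way to produce a random value that keeps the sk- prefix shown above is sketched here. The variable name MASTER_KEY and the 16-byte length are illustrative choices, not chart requirements.

```shell
# Build an "sk-" prefixed key from 16 random bytes (32 hex characters).
# od prints the bytes as space-separated hex; tr strips spaces and newlines.
MASTER_KEY="sk-$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')"
echo "generated a key of ${#MASTER_KEY} characters"

# Then pass the variable to the kubectl command instead of a typed literal:
#   kubectl -n litellm create secret generic litellm-master-key-secret \
#     --from-literal=master-key="$MASTER_KEY"
```

Generating the value into a variable keeps the key out of files on disk, although it can still appear in the shell's process list while kubectl runs.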

Step 2 — Minimal values.yaml

masterKey:
  secretName: litellm-master-key-secret
  secretKey: master-key

database:
  writer:
    host: litellm-pg-rw.litellm.svc
    port: 5432
    dbname: litellm
    passwordSecret:
      name: litellm-writer-secret
      usernameKey: username
      passwordKey: password

# Optional: front all three services behind one host
ingress:
  enabled: true
  className: alb
  host: aigateway.example.com

Step 3 — Install from the OCI registry

The chart is published to GitHub Container Registry: ghcr.io/berriai/litellm/chart/litellm.

helm upgrade --install litellm \
  oci://ghcr.io/berriai/litellm/chart/litellm \
  --version 1.86.0-dev \
  -n litellm \
  -f values.yaml

The chart runs prisma migrate deploy as a pre-install/pre-upgrade hook Job, then brings up the gateway, backend, and ui Deployments. With ingress.enabled=true a single host fronts all three: data-plane prefixes → gateway, UI assets → ui, catch-all → backend.

Configuration

Separate read and write databases

To route heavy analytics reads off the writer, configure the database.reader block. Set reader.host to enable it; if it is left empty, every query goes to the writer. Any reader field left unset falls back to the writer's value.

database:
  # Writer — all writes (spend logs, tokens, config) land here
  writer:
    host: litellm-pg-rw.litellm.svc
    port: 5432
    dbname: litellm
    passwordSecret:
      name: litellm-writer-secret
      usernameKey: username
      passwordKey: password

  # Reader — read-heavy ops (find_*, count, group_by, raw reads)
  reader:
    host: litellm-pg-ro.litellm.svc
    port: 5432
    dbname: litellm
    passwordSecret:
      name: litellm-reader-secret
      usernameKey: username
      passwordKey: password

The chart assembles DATABASE_URL and DATABASE_URL_READ_REPLICA from these pieces before the proxy starts. See Database Read Replica for how reads are routed.
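
As a rough illustration of that assembly and of the reader-to-writer fallback, the shell sketch below joins the pieces into connection URLs. The helper name build_pg_url, the placeholder passwords, and the postgresql://user:password@host:port/dbname layout are assumptions made for illustration; the chart's actual template may differ.

```shell
# Hypothetical helper: joins connection pieces into a Postgres URL.
# The exact string the chart renders is an assumption, not taken from the chart.
build_pg_url() {
  # $1=user $2=password $3=host $4=port $5=dbname
  printf 'postgresql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Writer values (placeholders mirroring the values.yaml above).
WRITER_HOST="litellm-pg-rw.litellm.svc"
WRITER_PORT="5432"
WRITER_DB="litellm"

# Reader: only the host is set; port and dbname are deliberately left empty.
READER_HOST="litellm-pg-ro.litellm.svc"
READER_PORT=""
READER_DB=""

DATABASE_URL=$(build_pg_url litellm 'writer-pass' \
  "$WRITER_HOST" "$WRITER_PORT" "$WRITER_DB")

# The replica URL is only produced when a reader host is configured.
# ${VAR:-default} substitutes the writer's value for an empty reader field.
DATABASE_URL_READ_REPLICA=""
if [ -n "$READER_HOST" ]; then
  DATABASE_URL_READ_REPLICA=$(build_pg_url litellm 'reader-pass' \
    "$READER_HOST" "${READER_PORT:-$WRITER_PORT}" "${READER_DB:-$WRITER_DB}")
fi

echo "DATABASE_URL=$DATABASE_URL"
echo "DATABASE_URL_READ_REPLICA=$DATABASE_URL_READ_REPLICA"
```

This sketch does no escaping: a password containing characters such as @ or / would need percent-encoding before it is placed in a URL.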

RDS / Aurora IAM auth — set useIAMAuth: true on database.writer (and optionally database.reader) to mint short-lived IAM tokens instead of referencing a password Secret:

database:
  writer:
    host: litellm.cluster-xxxx.us-east-1.rds.amazonaws.com
    dbname: litellm
    useIAMAuth: true
  reader:
    host: litellm.cluster-ro-xxxx.us-east-1.rds.amazonaws.com
    useIAMAuth: true # requires database.writer.useIAMAuth: true
serviceAccount:
  create: true
  name: litellm
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<acct>:role/litellm-db

Redis

Leave redis.host empty to disable. Set redis.cluster: true for Redis Cluster mode (e.g. ElastiCache Cluster) — the chart emits REDIS_CLUSTER_NODES from host/port as the seed and the client discovers the rest from CLUSTER SLOTS.

redis:
  cluster: true
  host: litellm-redis.litellm.svc
  port: 6379
  passwordSecret:
    name: litellm-redis-secret # leave empty for auth-less Redis
    passwordKey: password
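
To make the seed-node idea concrete, the sketch below renders a single host/port pair as the JSON list of nodes that LiteLLM reads from REDIS_CLUSTER_NODES. The helper name redis_cluster_seed is hypothetical, and the exact string the chart emits (spacing, quoting of the port) is an assumption.

```shell
# Hypothetical helper: renders one seed node as a JSON list of {host, port}.
# The precise formatting used by the chart is an assumption.
redis_cluster_seed() {
  # $1=host $2=port
  printf '[{"host": "%s", "port": "%s"}]\n' "$1" "$2"
}

REDIS_CLUSTER_NODES=$(redis_cluster_seed litellm-redis.litellm.svc 6379)
echo "$REDIS_CLUSTER_NODES"
```

Only one node has to be listed because, as described above, the client learns the rest of the topology from CLUSTER SLOTS after connecting to the seed.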

Per-component scaling and probes

Each of gateway, backend, and ui accepts image, resources, livenessProbe, readinessProbe, hpa, extraEnv, envConfigMaps, envSecrets, logLevel, nodeSelector, tolerations, and affinity. The gateway additionally takes numWorkers (uvicorn workers per pod, default 1) and config.proxy_config (rendered into a ConfigMap and mounted at /app/config/config.yaml).

Defaults size each surface for its own load profile:

gateway:
  numWorkers: 1
  hpa: { enabled: true, minReplicas: 1, maxReplicas: 10,
         targetCPUUtilizationPercentage: 70, targetMemoryUtilizationPercentage: 80 }

backend:
  hpa: { enabled: true, minReplicas: 1, maxReplicas: 4,
         targetCPUUtilizationPercentage: 70 }

ui:
  hpa: { enabled: false, minReplicas: 1, maxReplicas: 3 }
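
To get a feel for how these targets translate into pod counts, the sketch below applies the scaling rule from the Kubernetes HorizontalPodAutoscaler documentation: desired = ceil(current × observed / target), clamped to minReplicas and maxReplicas. The function name hpa_desired_replicas and the utilization figures are made up for illustration, and the sketch ignores the controller's tolerance band and stabilization windows.

```shell
# Approximates the HPA rule: ceil(current * observed / target), then clamp.
# Integer ceiling division: (a + b - 1) / b.
hpa_desired_replicas() {
  # $1=current replicas  $2=observed utilization %  $3=target utilization %
  # $4=minReplicas       $5=maxReplicas
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$desired" -lt "$4" ]; then desired=$4; fi
  if [ "$desired" -gt "$5" ]; then desired=$5; fi
  echo "$desired"
}

# gateway: 4 pods averaging 90% CPU against the 70% target, bounds 1..10
GATEWAY_DESIRED=$(hpa_desired_replicas 4 90 70 1 10)
# backend: 4 pods averaging 200% CPU against the 70% target, bounds 1..4
BACKEND_DESIRED=$(hpa_desired_replicas 4 200 70 1 4)

echo "gateway -> $GATEWAY_DESIRED replicas"
echo "backend -> $BACKEND_DESIRED replicas"
```

When several metrics are configured, as with CPU and memory on the gateway, the controller computes a replica count for each metric and uses the largest.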

Migrations Job

Enabled by default. Runs prisma migrate deploy against the writer database as a Helm pre-install/pre-upgrade hook, using a dedicated litellm-migrations image. Disable it if your pipeline runs migrations out-of-band:

migrationJob:
  enabled: true # set to false if migrations run out-of-band
  backoffLimit: 4
  ttlSecondsAfterFinished: 120
  # The v2 resolver is used by default. To opt back into v1:
  extraEnv:
    - name: USE_V2_MIGRATION_RESOLVER
      value: "false"

Ingress

Enable to wire the three Services behind one L7 entrypoint (required when serving the static UI over the network):

ingress:
  enabled: true
  className: alb
  host: aigateway.example.com
  annotations: {}
  tls: []

The chart routes UI paths to the ui pods, data-plane prefixes to gateway, and the catch-all (/key/*, /user/*, /spend/*, …) to backend.
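
The routing order can be pictured as a small path matcher: UI paths first, then data-plane prefixes, then everything else. The sketch below is only a mental model. The function name route_for_path is hypothetical, the /ui prefix is an assumption, and the gateway list contains only the paths named on this page, not the full set of rules the chart renders.

```shell
# Mental model of the Ingress rules; not the chart's actual path list.
route_for_path() {
  case "$1" in
    # Assumed UI prefix (not confirmed by this page).
    /ui|/ui/*) echo ui ;;
    # A few of the data-plane paths listed in the Components table.
    /chat/completions|/v1/messages|/health|/health/*|/metrics) echo gateway ;;
    # Catch-all: management API such as /key/*, /user/*, /spend/*.
    *) echo backend ;;
  esac
}

for p in /v1/messages /key/generate /ui/index.html /spend/logs; do
  echo "$p -> $(route_for_path "$p")"
done
```

Because the backend is the catch-all, a data-plane path missing from the gateway rules would land on the backend, which is worth checking if you add custom passthrough routes.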