Skip to main content

Deploy to Cloud (AWS, GCP, Azure)

Step-by-step guides for running LiteLLM proxy in production on AWS, Google Cloud, or Azure. There are two supported paths. If you run Kubernetes, deploy with Helm on EKS, GKE, or AKS; the install is the same on every cloud, only the data stores and ingress differ. If you do not run Kubernetes, AWS and GCP have official Terraform modules that stand up the entire stack; Azure has no Terraform module, so AKS with Helm is the supported path there.

Architecture​

Clients (OpenAI SDK, LangChain, curl)
Application Load BalancerHTTPS, path-based routing
ECS Fargategateway :4000, backend :4001, ui :3000
gatewaybackendui
Aurora PostgreSQLwriter + reader, IAM auth
ElastiCache Redismulti-AZ, TLS
Secrets Managermaster key, provider keys

LiteLLM provides two deployment modes:

  • Monolithic: one litellm image serves LLM traffic, management APIs, and the UI. This is what the litellm-helm chart runs, and the simplest to operate.
  • Microservices: a gateway (LLM traffic, port 4000), backend (management APIs and UI backend, port 4001), and ui (port 3000), each deployed and scaled independently. This is what the componentized litellm chart and both Terraform modules run; see Microservices Helm for the full reference.

The supporting infrastructure is identical in either mode:

ComponentPurposeNotes
LiteLLM servicesOne proxy deployment (monolithic) or gateway + backend + ui (microservices)Stateless; run 2+ replicas behind a load balancer
PostgreSQLKeys, teams, users, spend logs, configRequired for the proxy's auth and tracking features
RedisRate limiting, router state, caching across instancesRequired once you run more than one instance
Migrations jobApplies schema migrations against PostgresRuns once per upgrade; proxy instances set DISABLE_SCHEMA_UPDATE=true

Core configuration​

DATABASE_URL="postgresql://user:password@host:5432/litellm"
LITELLM_MASTER_KEY="sk-..." # admin key for the proxy
LITELLM_SALT_KEY="sk-..." # encrypts provider credentials stored in the DB. Set once, never change it
DISABLE_SCHEMA_UPDATE="true" # proxy instances never run migrations; the migrations job does
STORE_MODEL_IN_DB="True" # manage models from the Admin UI instead of config files

LITELLM_SALT_KEY cannot be rotated after you add models: it encrypts the provider credentials stored in your database, and changing it makes them unreadable. Generate a strong random value and store both keys in your cloud's secret manager.

Official images are published to ghcr.io/berriai and mirrored at docker.litellm.ai/berriai. Use ghcr.io/berriai/litellm-database for monolithic deployments with Postgres (it bundles the Prisma toolchain), and pin a version tag rather than latest or a moving tag, so rollbacks are deterministic.

Provision the data stores​

The Helm path needs a PostgreSQL database and a Redis reachable from your cluster. Use the managed services:

Provision RDS PostgreSQL and ElastiCache Redis in the same VPC as your EKS cluster, with security groups permitting the cluster's nodes on ports 5432 and 6379.

Deploy with Helm​

First create the secrets both charts consume:

kubectl create secret generic litellm-masterkey \
--from-literal=masterkey="sk-$(openssl rand -hex 24)"

kubectl create secret generic litellm-db \
--from-literal=username=litellm \
--from-literal=password="<database-password>"

kubectl create secret generic litellm-env \
--from-literal=LITELLM_SALT_KEY="sk-$(openssl rand -hex 24)" \
--from-literal=REDIS_PASSWORD="<redis-password>" \
--from-literal=OPENAI_API_KEY="<provider-key>"

Then pick a deployment mode:

values.yaml
replicaCount: 3

image:
repository: ghcr.io/berriai/litellm-database
tag: "v1.90.2" # pin your version

masterkeySecretName: litellm-masterkey
masterkeySecretKey: masterkey

db:
useExisting: true
deployStandalone: false
endpoint: "<postgres-endpoint>"
database: litellm
secret:
name: litellm-db
usernameKey: username
passwordKey: password

environmentSecrets:
- litellm-env

proxy_config:
model_list:
- model_name: gpt-4o
litellm_params:
model: openai/gpt-4o
api_key: os.environ/OPENAI_API_KEY
router_settings:
redis_host: "<redis-endpoint>"
redis_port: 6379
redis_password: os.environ/REDIS_PASSWORD
helm install litellm oci://ghcr.io/berriai/litellm-helm -f values.yaml

The chart lives at deploy/charts/litellm-helm; the published chart versions carry LiteLLM release numbers (for example 1.90.2), and helm show values oci://ghcr.io/berriai/litellm-helm lists every knob. Beyond the values above it supports autoscaling (autoscaling.* or keda.*), PodDisruptionBudgets (pdb.*), a Prometheus ServiceMonitor (serviceMonitor.*), read replica routing (db.readReplicaUrl, see Database Read Replica), graceful drain on shutdown (lifecycle), and ArgoCD or Helm hooks for the migrations job (migrationJob.hooks.*, see Helm PreSync hooks).

Both charts run the migrations job automatically and keep DISABLE_SCHEMA_UPDATE=true on the proxy pods. Expose the service through your cloud's ingress: the AWS Load Balancer Controller on EKS, GKE Ingress on GKE, or Application Gateway Ingress (AGIC) on AKS, with health checks on /health/readiness, then point your DNS record at the resulting load balancer. For secrets, prefer your cloud's secret manager over plain Kubernetes secrets (Key Vault CSI driver on AKS, for example); the charts consume whatever secret you mount.

Deploy with Terraform (AWS and GCP)​

The official modules deploy the full microservices stack (network, database, Redis, object storage, secrets, compute, load balancer, and a migrations job that runs before the services start) and are published to the Terraform Registry:

Provisions a VPC with public and private subnets, an Aurora PostgreSQL cluster (writer plus reader, IAM database auth), ElastiCache Redis (multi-AZ, encrypted), an S3 bucket, Secrets Manager entries, an Application Load Balancer, and ECS Fargate services.

main.tf
module "litellm" {
source = "BerriAI/litellm/aws"
version = "~> 1.90"

region = "us-east-1"
azs = ["us-east-1a", "us-east-1b"]
tenant = "acme"
env = "prod"

ui_password = var.ui_password
litellm_license = var.litellm_license # optional, omit for open source
acm_certificate_arn = var.acm_certificate_arn # TLS is required by default

proxy_config = {
model_list = [{
model_name = "gpt-4o"
litellm_params = {
model = "openai/gpt-4o"
api_key = "os.environ/OPENAI_API_KEY"
}
}]
}
gateway_extra_secrets = {
OPENAI_API_KEY = var.openai_key_secret_arn
}
}

Before you apply: provision the TLS certificate in AWS Certificate Manager (the module refuses a plaintext ALB unless you explicitly set allow_plaintext_alb = true), and create any provider-key secrets in Secrets Manager first, since gateway_extra_secrets takes their ARNs. After apply, point your DNS record at the ALB hostname.

The module auto-generates the master key into Secrets Manager if you do not supply one. The application connects to Aurora with short-lived IAM tokens, so its DATABASE_URL carries no password (the database master password itself is generated into Secrets Manager and never touches the application). Every resource is named <tenant>-litellm-<env>, and the module declares no provider, so you can for_each it to run one stack per tenant.

Verify the deployment​

Confirm the proxy is up and can reach its database:

curl -s https://llm.example.com/health/readiness

Then open the Admin UI at https://llm.example.com/ui and log in with your master key.

  1. Add a model. Go to Models + Endpoints and click Add Model: pick the provider, the model, and enter the provider credentials. With STORE_MODEL_IN_DB=True the model is saved to your database, so you manage models here rather than in config files.
  1. Create a key. Go to Virtual Keys and click Create New Key, scoping it to the model you just added.
  1. Send a request. Go to the Test Key playground, select your key and model, and send a message. A response here proves the full path: load balancer, proxy, database, and provider credentials.

Next steps​

Harden the deployment with the production checklist (worker counts, machine sizing, Redis settings, graceful degradation). Add regions with Multi-Region Deployment. For very high throughput (1000+ RPS), see resolving DB deadlocks.

🚅
LiteLLM Enterprise
SSO/SAML, audit logs, spend tracking, multi-team management, and guardrails — built for production.
Learn more →