✨ Enterprise Quickstart

Use this guide if you are on an Enterprise trial to evaluate LiteLLM as a unified LLM, MCP, and Agent gateway with enterprise controls and budget enforcement.

Deploy + Shared Setup

All gateway and budget tests share one deployment and one org/team/key. Do this section first.

Prerequisites

  • Docker + Docker Compose
  • Postgres: required for the Admin UI, virtual keys, MCP/Agent registries, and budget tracking
  • An LLM provider API key (OpenAI, Azure, Anthropic, etc.)
  • Your Enterprise license key from the 7-day trial

Deploy with Docker Compose

Follow the Docker Compose tab in the Getting Started Tutorial. Condensed steps:

docker pull ghcr.io/berriai/litellm-database:main-latest
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/docker-compose.yml

Create .env:

LITELLM_MASTER_KEY="sk-1234"
LITELLM_SALT_KEY="sk-salt-change-me"
LITELLM_LICENSE="eyJ..."
OPENAI_API_KEY="your-api-key"
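
The key values above are samples. A minimal sketch for generating random replacements with Python's standard library; the helper name `make_key` and the 32-byte length are assumptions for illustration, and the `sk-` prefix simply mirrors the sample values:

```python
import secrets


def make_key(prefix: str = "sk-", nbytes: int = 32) -> str:
    """Return the prefix followed by a random URL-safe token."""
    return prefix + secrets.token_urlsafe(nbytes)


# Print lines that can be pasted into .env in place of the sample values.
print(f'LITELLM_MASTER_KEY="{make_key()}"')
print(f'LITELLM_SALT_KEY="{make_key()}"')
```

If you change the master key, use the new value wherever this guide shows sk-1234.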

Create config.yaml:

model_list:
  - model_name: gpt-5.5
    litellm_params:
      model: openai/gpt-5.5
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  callbacks: ["prometheus"]

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
  store_model_in_db: true

docker compose up

Verify Enterprise Edition

Open http://localhost:4000/. The Swagger page should show "Enterprise Edition" in the description. See the Enterprise license FAQ.
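
The same check can be scripted. This is a sketch under one assumption that the guide does not confirm: that the text shown in Swagger is also exposed in the `info.description` field of an OpenAPI document served at /openapi.json. The function names are hypothetical.

```python
import json
import urllib.request


def is_enterprise(openapi_doc: dict) -> bool:
    """True if the API description mentions 'Enterprise Edition'."""
    description = openapi_doc.get("info", {}).get("description") or ""
    return "enterprise edition" in description.lower()


def fetch_openapi(base_url: str = "http://localhost:4000") -> dict:
    """Download the OpenAPI document (assumed path: /openapi.json)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


# Usage against a running proxy (not executed here):
#   print(is_enterprise(fetch_openapi()))
```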

Open the Admin UI at http://localhost:4000/ui and sign in with your master key.

Shared tenant setup

Complete these steps in the Admin UI before starting the gateway tracks.

| Step | Action | Why |
|------|--------|-----|
| 1 | Create an Organization and a Team | RBAC baseline for the PoC |
| 2 | Set team max_budget (e.g. $10, duration 30d) | Validates team-level spend envelope |
| 3 | Create a team-scoped virtual key with model access | All gateway tests use this key, not the master key |
| 4 | Note the team ID and virtual key in a scratchpad | Reused in Sections 1-4 |

→ Multi-tenant Architecture · Virtual Keys
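
The team and key from the table can also be created through the proxy's management API instead of the Admin UI. A minimal sketch, assuming the /team/new and /key/generate endpoints accept the fields shown; the helper names and the "enterprise-poc" alias are hypothetical, and organization creation is left to the Admin UI:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"
MASTER_KEY = "sk-1234"  # sample master key from .env


def team_payload(alias: str, max_budget: float, budget_duration: str) -> dict:
    """Body for POST /team/new: a team with a spend limit per budget period."""
    return {"team_alias": alias, "max_budget": max_budget, "budget_duration": budget_duration}


def key_payload(team_id: str, models: list) -> dict:
    """Body for POST /key/generate: a virtual key scoped to one team and its models."""
    return {"team_id": team_id, "models": models}


def post(path: str, body: dict) -> dict:
    """Send an authenticated JSON POST to the proxy and return the parsed response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {MASTER_KEY}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage against a running proxy (not executed here):
#   team = post("/team/new", team_payload("enterprise-poc", 10, "30d"))
#   key = post("/key/generate", key_payload(team["team_id"], ["gpt-5.5"]))
#   print(team["team_id"], key["key"])  # note both in your scratchpad
```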


1. Validate LLM Gateway

Prove LiteLLM routes LLM requests through your virtual key, tracks spend, and enforces RBAC.

Steps

  1. Confirm model gpt-5.5 (or your model) appears in model_list (config or Admin UI → Models).

  2. Test with your master key:

curl -X POST 'http://localhost:4000/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello from LiteLLM Enterprise Gateway"}]
  }'

  3. Use your team virtual key: repeat the same request with the key from shared setup.

  4. Verify response: expect 200 OK; assistant text is in choices[0].message.content.

  5. Verify logs: open the Logs tab; confirm key, team, model, latency, and spend appear.

  6. Verify team spend: open the Teams tab → select your team; confirm spend incremented toward max_budget.
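
The request and response check from the steps above can be sketched in Python with only the standard library. The function names are hypothetical, and sk-team-key stands in for your team virtual key:

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST /chat/completions request used in the steps above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text from choices[0].message.content."""
    return response["choices"][0]["message"]["content"]


def chat(api_key: str, model: str, prompt: str) -> str:
    """Send the request to a running proxy and return the assistant text."""
    with urllib.request.urlopen(build_chat_request(api_key, model, prompt), timeout=60) as resp:
        return extract_reply(json.load(resp))


# Usage against a running proxy (not executed here):
#   print(chat("sk-team-key", "gpt-5.5", "Hello from LiteLLM Enterprise Gateway"))
```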

→ Virtual Keys → Gateway Quickstart → Role-Based Access Control


2. MCP Gateway

Prove LiteLLM registers MCP servers, enforces per-key access, routes tool calls, and tracks MCP cost.

Steps

  1. Register the MCP server in Admin UI → MCP Servers → Add New MCP Server:

    • Name: deepwiki
    • URL: https://mcp.deepwiki.com/mcp
    • Transport: HTTP

    Or add to config.yaml:

mcp_servers:
  - server_name: deepwiki
    url: https://mcp.deepwiki.com/mcp
    transport: http
    available_on_public_internet: true

  2. Assign to team/key: under MCP Settings on the virtual key or team, allow the deepwiki server. See MCP Permission Management.

  3. List tools: confirm tools appear in Admin UI under MCP Servers → MCP Tools.

  4. Invoke via /v1/chat/completions:

curl -X POST 'http://localhost:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-team-key' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "TLDR of BerriAI/litellm repo"}],
    "tools": [{
      "type": "mcp",
      "server_url": "litellm_proxy/deepwiki",
      "server_label": "deepwiki",
      "require_approval": "never"
    }]
  }'

  5. Verify response: contains tool output and an assistant summary.

  6. Verify logs: the Logs tab shows the MCP tool call with namespaced tool name and cost.
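
When testing several registered servers, it can help to build the request body programmatically. A minimal sketch that reproduces the tools entry from the curl example; the helper names are hypothetical, and the litellm_proxy/<server_name> form is taken from that example rather than from a separate specification:

```python
import json


def mcp_tool(server_name: str, require_approval: str = "never") -> dict:
    """Build one MCP tool entry that points at a server registered in the proxy."""
    return {
        "type": "mcp",
        "server_url": f"litellm_proxy/{server_name}",
        "server_label": server_name,
        "require_approval": require_approval,
    }


def build_mcp_chat_body(model: str, prompt: str, server_names: list) -> dict:
    """Build the /v1/chat/completions body with one tool entry per MCP server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [mcp_tool(name) for name in server_names],
    }


body = build_mcp_chat_body("gpt-5.5", "TLDR of BerriAI/litellm repo", ["deepwiki"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d or sent with any HTTP client, using the team virtual key as the bearer token.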

→ MCP Overview · MCP Permission Management · Using your MCP


3. Agent Gateway

Prove LiteLLM registers A2A agents, enforces per-key access, invokes agents, and tracks agent-attributed spend.

Steps

  1. Deploy a sample agent: use Multi-agent collaboration using A2A (a simple deployable A2A agent with streaming support).

  2. Register in Admin UI: Agents tab → Add Agent → enter name and URL.

  3. Assign to team/key: under Agent Settings on the virtual key, allow the agent. See Agent Permission Management.

  4. List agents:

curl -H 'Authorization: Bearer sk-team-key' \
'http://localhost:4000/v1/agents'
  5. Invoke via the A2A SDK:

invoke_a2a_agent.py
import asyncio
from uuid import uuid4

import httpx
from a2a.client import A2ACardResolver, A2AClient
from a2a.types import MessageSendParams, SendMessageRequest

LITELLM_BASE_URL = "http://localhost:4000"
LITELLM_VIRTUAL_KEY = "sk-team-key"  # team virtual key from shared setup


async def main():
    headers = {"Authorization": f"Bearer {LITELLM_VIRTUAL_KEY}"}
    async with httpx.AsyncClient(headers=headers) as client:
        # Use the first agent this key is allowed to see.
        agents = (await client.get(f"{LITELLM_BASE_URL}/v1/agents")).json()
        agent_id = agents[0]["agent_id"]

        # Resolve the agent card through the LiteLLM A2A route.
        base_url = f"{LITELLM_BASE_URL}/a2a/{agent_id}"
        resolver = A2ACardResolver(httpx_client=client, base_url=base_url)
        a2a_client = A2AClient(
            httpx_client=client,
            agent_card=await resolver.get_agent_card(),
        )

        # Send a single user message to the agent.
        response = await a2a_client.send_message(
            SendMessageRequest(
                id=str(uuid4()),
                params=MessageSendParams(
                    message={
                        "role": "user",
                        "parts": [{"kind": "text", "text": "Hello, what can you do?"}],
                        "messageId": uuid4().hex,
                    }
                ),
            )
        )
        # Pretty-print the response as JSON.
        print(response.model_dump_json(exclude_none=True, indent=2))


asyncio.run(main())

  6. Verify logs: the Logs tab shows key, team, latency, and agent-attributed cost. Cost counts toward the team/key spend configured in Deploy + Shared Setup.

→ Agent Gateway Overview · Invoking A2A Agents · Agent Cost Tracking


4. Budgets & Spend

Budget enforcement runs on all three gateways through the same virtual key: one control plane governs LLM, MCP, and Agent spend.

4a. Key budget + rate limits

  1. Create a test key with a tight budget and RPM limit:
curl -X POST 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "max_budget": 0.01,
    "rpm_limit": 1,
    "team_id": "<your-team-id>"
  }'

  2. First request with the new key → 200 OK.
  3. Second request within the same minute → rate limit error (RPM exceeded).
  4. Confirm key spend in Admin UI under Virtual Keys.
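
The two-request check can be automated. A sketch assuming the rate limit error is returned as HTTP 429, which this guide does not state explicitly; the function names are hypothetical:

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List


def make_sender(api_key: str, model: str = "gpt-5.5",
                base_url: str = "http://localhost:4000") -> Callable[[], int]:
    """Return a function that sends one chat request and returns the HTTP status."""
    def send() -> int:
        body = {"model": model, "messages": [{"role": "user", "content": "ping"}]}
        req = urllib.request.Request(
            f"{base_url}/chat/completions",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {api_key}"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # urllib raises HTTPError for 4xx/5xx responses
    return send


def collect_statuses(send: Callable[[], int], attempts: int = 2) -> List[int]:
    """Call send() back to back and return the status codes in order."""
    return [send() for _ in range(attempts)]


def rpm_limit_enforced(statuses: List[int]) -> bool:
    """First call must succeed; a later call must be rejected with 429 (assumed)."""
    return len(statuses) >= 2 and statuses[0] == 200 and 429 in statuses[1:]


# Usage against a running proxy (not executed here):
#   statuses = collect_statuses(make_sender("<key-from-step-1>"))
#   print(statuses, rpm_limit_enforced(statuses))
```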

→ Virtual Keys · Docker Quick Start: RPM test

4b. Team budget

Team max_budget was set in Deploy + Shared Setup. After completing Sections 1-3:

  1. Open the Teams tab → select your PoC team.
  2. Confirm spend accumulated across LLM, MCP, and Agent calls.
  3. Optional negative test: set team max_budget very low (e.g. $0.0001), make one LLM call, and confirm a budget-exceeded error.
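
To reason about what the Teams tab should show, the budget arithmetic can be sketched as below. Treating spend at or above max_budget as exceeded is an assumption about the enforcement rule, and the function name is hypothetical:

```python
def budget_status(spend: float, max_budget: float) -> dict:
    """Summarize how much of a budget is used and whether it is exhausted."""
    if max_budget <= 0:
        raise ValueError("max_budget must be positive")
    return {
        "remaining": max(max_budget - spend, 0.0),
        "used_pct": 100.0 * spend / max_budget,
        "exceeded": spend >= max_budget,  # assumed rule: at or above the limit
    }


# A $10 team budget with $2.50 spent across LLM, MCP, and Agent calls.
print(budget_status(2.5, 10.0))
```

For the negative test, any accumulated spend above $0.0001 would put the team over the lowered limit before the next call is made.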

→ Multi-tenant Architecture

4c. Tag budget

  1. Add tag_budget_config to config.yaml and restart the proxy:
litellm_settings:
  tag_budget_config:
    poc:chat-app:
      max_budget: 0.000000000001
      budget_duration: 1d

  2. Make a tagged request:

curl -X POST 'http://localhost:4000/chat/completions' \
  -H 'Authorization: Bearer sk-team-key' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5.5",
    "messages": [{"role": "user", "content": "Hello"}],
    "metadata": {"tags": ["poc:chat-app"]}
  }'

  3. First call succeeds; second call with the same tag fails with a budget-exceeded error.

  4. Query tag spend:

curl -X GET 'http://localhost:4000/spend/tags' \
-H 'Authorization: Bearer sk-1234'

Verify: response lists poc:chat-app with total_spend and log_count.
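
A small helper can pull one tag out of the /spend/tags response. This sketch assumes the response is a JSON list of objects and that the tag name is stored under individual_request_tag; total_spend and log_count come from the verification note above. The sample rows are illustrative, not real output.

```python
from typing import Optional


def find_tag_spend(rows: list, tag: str) -> Optional[dict]:
    """Return total_spend and log_count for one tag, or None if the tag is absent."""
    for row in rows:
        # Assumed field name for the tag; adjust if your response differs.
        if row.get("individual_request_tag") == tag:
            return {"total_spend": row.get("total_spend"), "log_count": row.get("log_count")}
    return None


# Illustrative rows shaped like the assumed response.
sample_rows = [
    {"individual_request_tag": "other-tag", "log_count": 5, "total_spend": 0.01},
    {"individual_request_tag": "poc:chat-app", "log_count": 1, "total_spend": 0.0002},
]
print(find_tag_spend(sample_rows, "poc:chat-app"))
```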

Explore next: Projects · Temporary budget increases · Soft budget alerts · Spend reports · Budget Routing · Enterprise Spend Tracking


5. Enterprise Controls

Layer security and compliance on top of working gateways and budgets.

Audit logs

Enable audit logs by setting store_audit_logs: true under litellm_settings in your config.yaml. To test, delete a virtual key via the API or UI, then confirm the deletion appears in the Audit Logs tab.

→ Audit Logs

Team/key guardrails

  1. Guardrails → create a guardrail (secret detection or content moderation)
  2. Policies → attach the guardrail to a team or key
  3. Send a request that should be blocked; confirm the guardrail fires

→ Guardrail Policies → Guardrails Quick Start

SSO for Admin UI

SSO controls Admin UI login and is separate from API auth (virtual keys or JWT). Register this redirect URI in your IdP:

https://<your-proxy-base-url>/sso/callback

Then set the SSO environment variables (Google shown here):

GOOGLE_CLIENT_ID="<your-client-id>"
GOOGLE_CLIENT_SECRET="<your-client-secret>"
PROXY_BASE_URL="https://<your-proxy-base-url>"

Verify: sign in to the Admin UI through your identity provider.
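
A mismatch between PROXY_BASE_URL and the redirect URI registered in the IdP is an easy mistake to make. A minimal sketch that derives the redirect URI from the base URL; the function name and the litellm.example.com host are hypothetical:

```python
from urllib.parse import urlsplit


def sso_redirect_uri(proxy_base_url: str) -> str:
    """Return <proxy base URL>/sso/callback, tolerating a trailing slash."""
    parts = urlsplit(proxy_base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"PROXY_BASE_URL must be an absolute URL, got {proxy_base_url!r}")
    return proxy_base_url.rstrip("/") + "/sso/callback"


print(sso_redirect_uri("https://litellm.example.com/"))
```

Register the printed value in your IdP exactly as shown, including the scheme.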

Also available: Custom SSO · CLI SSO · SCIM provisioning

→ SSO for Admin UI

JWT/OIDC Auth

Authenticate application requests with your identity provider's JWT tokens instead of static virtual keys.

→ JWT-based Authentication

Secret manager

Point LiteLLM at your secret manager so provider keys are read from the vault instead of config files.

→ Secret Managers Overview


7. Additional Enterprise Value


8. Need Help?

Every Enterprise license includes a dedicated Slack or Teams channel with our engineering team. Reach out to us at support@berri.ai and we'll be happy to help.

See Professional Support.