OpenTelemetry v2 - Full-request tracing
OpenTelemetry v2 (OTel v2) is LiteLLM Proxy's next-generation tracing. It gives you one clean trace per request that shows the whole story of a request — the incoming HTTP call, authentication, guardrails, the LLM call itself, and the internal database/cache work — all nested in a single tree.
It follows standard OpenTelemetry GenAI semantic conventions, so the traces it produces are readable in any OTel backend (Grafana Tempo, Jaeger, Honeycomb, Datadog, …) and come with ready-made presets for popular LLM observability tools (Arize, Phoenix, Langfuse, Weave, Langtrace, Levo, AgentOps).
OTel v2 is off by default. Nothing in it runs until you set LITELLM_OTEL_V2=true. It is separate from the existing OpenTelemetry integration — pick one.
What you get
A single request to your proxy produces one trace that looks like this:
POST /v1/chat/completions ← HTTP request (server span)
├── auth /v1/chat/completions ← authentication
│ ├── postgres get_key_object ← DB lookups during auth
│ └── postgres get_team_membership
├── execute_guardrail presidio-pii ← each guardrail that runs
├── chat gpt-4o ← the LLM call (model, tokens, cost)
└── batch_write_to_db ← spend/usage written to DB
Highlights:
- One trace, end to end — the HTTP request, auth, guardrails, the LLM call, and DB writes all live in the same trace, correctly nested.
- Rich GenAI attributes — every LLM-call span carries
gen_ai.*attributes: model, provider, token usage, cost, finish reasons, request parameters, and more. - Standards-based — built on the official OpenTelemetry GenAI semantic conventions, so it works with any OTel-compatible backend.
- Vendor presets — one line to ship traces to Arize, Phoenix, Langfuse, Weave, Langtrace, Levo, or AgentOps in the format each tool expects.
- Safe by default — prompts and responses are not captured unless you explicitly opt in. Noisy routes (health checks, metrics scrapes, UI assets) are excluded automatically.
- Distributed tracing — if your client sends a
traceparentheader, LiteLLM's spans nest inside your existing trace.
Requirements
OTel v2 instruments the proxy's FastAPI app, so it needs the OpenTelemetry SDK plus the FastAPI instrumentation package:
pip install "litellm[proxy]" \
opentelemetry-api \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-fastapi
These packages ship with the proxy Docker image. You only need to install them manually for a
pip-based proxy.
Getting started
1. Send traces to any OTLP collector
Set the feature flag plus the standard OTEL_* environment variables. That's it — no config change needed.
- OTLP HTTP collector
- OTLP gRPC collector
- Print to console (testing)
LITELLM_OTEL_V2=true
OTEL_EXPORTER="otlp_http"
OTEL_ENDPOINT="http://localhost:4318"
LITELLM_OTEL_V2=true
OTEL_EXPORTER="otlp_grpc"
OTEL_ENDPOINT="http://localhost:4317"
gRPC export needs
grpcio. Install withpip install grpcio.
LITELLM_OTEL_V2=true
OTEL_EXPORTER="console"
Spans are printed to stdout — handy for verifying everything works before pointing at a real backend.
Pass auth headers your backend needs via OTEL_HEADERS:
OTEL_HEADERS="api-key=your-key,x-tenant=acme"
Then start the proxy as usual:
litellm --config config.yaml
Make a request, and you'll see one trace per request in your backend.
2. Send traces to a specific tool (presets)
For LLM observability tools, use a preset. A preset knows the tool's endpoint and emits attributes in the schema that tool expects. To enable one, add its name to callbacks in your config and set the tool's credentials as env vars.
- Arize
- Arize Phoenix
- Langfuse
- Weave (W&B)
- Langtrace
- Levo
- AgentOps
litellm_settings:
callbacks: ["arize"]
LITELLM_OTEL_V2=true
ARIZE_SPACE_ID="your-space-id"
ARIZE_API_KEY="your-api-key"
litellm_settings:
callbacks: ["arize_phoenix"]
LITELLM_OTEL_V2=true
PHOENIX_API_KEY="your-api-key"
PHOENIX_COLLECTOR_ENDPOINT="https://app.phoenix.arize.com/v1/traces"
PHOENIX_PROJECT_NAME="my-project" # optional
litellm_settings:
callbacks: ["langfuse_otel"]
LITELLM_OTEL_V2=true
LANGFUSE_PUBLIC_KEY="pk-..."
LANGFUSE_SECRET_KEY="sk-..."
LANGFUSE_HOST="https://cloud.langfuse.com" # or your self-hosted URL
litellm_settings:
callbacks: ["weave_otel"]
LITELLM_OTEL_V2=true
WANDB_API_KEY="your-api-key"
WANDB_PROJECT_ID="your-entity/your-project"
litellm_settings:
callbacks: ["langtrace"]
LITELLM_OTEL_V2=true
# Langtrace reads from your existing OTLP collector — point it at Langtrace:
OTEL_ENDPOINT="https://langtrace.ai/api/trace"
OTEL_HEADERS="api_key=your-langtrace-api-key"
litellm_settings:
callbacks: ["levo"]
LITELLM_OTEL_V2=true
LEVO_AUTH_TOKEN="your-token"
LEVO_ORG_ID="your-org"
litellm_settings:
callbacks: ["agentops"]
LITELLM_OTEL_V2=true
AGENTOPS_API_KEY="your-api-key"
Each preset adds its destination. List more than one callback (e.g. ["arize", "langfuse_otel"]) and your spans are shipped to all of them in parallel, each in the right format.
Capturing prompts & responses
By default, OTel v2 records metadata only (model, tokens, cost, timing) and never writes prompt or response text to your traces. This is intentional — it keeps sensitive content out of your observability backend.
To capture message content, opt in explicitly:
# no_content (default) — never capture prompts/responses
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="no_content"
# span_only — write prompts/responses as attributes on spans
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="span_only"
# span_and_event — write content to both spans and events
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="span_and_event"
The gate is enforced centrally, so it applies to every backend at once — a user request can never force its prompt into your backend while capture is disabled.
Which routes are traced
High-frequency, non-LLM routes are excluded by default so they don't flood your traces: health checks (/health*), the Prometheus scrape (/metrics), and static UI/docs assets (/ui, /docs, /redoc, /_next, /openapi.json, favicons, …).
To change the set, use the standard OpenTelemetry env var (comma-separated paths, substring-matched):
# Trace everything, including health checks
OTEL_PYTHON_FASTAPI_EXCLUDED_URLS=""
# Exclude only your own custom paths
OTEL_PYTHON_FASTAPI_EXCLUDED_URLS="/health,/internal"
Per-key / per-team destinations (multi-tenant)
Some presets (arize, langfuse_otel, weave_otel) support per-request credentials: if a request carries team- or key-scoped credentials, its spans are routed to that tenant's project automatically. This lets one proxy serve many tenants, each seeing only their own traces — no extra setup beyond configuring those credentials on the key/team.
Distributed tracing
If the incoming request has a W3C traceparent header, LiteLLM continues that trace instead of starting a new one. Your LiteLLM spans then appear inline inside whatever distributed trace your application already has — so you can follow a request from your app, through the proxy, to the LLM provider, in one view.
Configuration reference
All values are environment variables. Boolean flags accept true/false.
| Variable | Default | Purpose |
|---|---|---|
LITELLM_OTEL_V2 | false | Master switch. OTel v2 does nothing until this is true. |
OTEL_EXPORTER (alias OTEL_EXPORTER_OTLP_PROTOCOL) | console | Exporter kind: console, otlp_http, otlp_grpc. |
OTEL_ENDPOINT (alias OTEL_EXPORTER_OTLP_ENDPOINT) | none | OTLP collector URL. Setting an endpoint implies otlp_http unless you override OTEL_EXPORTER. |
OTEL_HEADERS (alias OTEL_EXPORTER_OTLP_HEADERS) | none | Comma-separated key=value auth headers for your backend. |
OTEL_SERVICE_NAME | litellm | service.name resource attribute shown in your backend. |
OTEL_ENVIRONMENT_NAME | none | deployment.environment resource attribute (e.g. production). |
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | no_content | Prompt/response capture: no_content, span_only, event_only, span_and_event. |
OTEL_PYTHON_FASTAPI_EXCLUDED_URLS | health/metrics/UI routes | Comma-separated paths to exclude from tracing (substring match). Set to "" to trace everything. |
LITELLM_OTEL_INTEGRATION_ENABLE_METRICS | false | Also emit GenAI client metrics (duration, token usage, cost). |
What's on an LLM-call span
Every chat <model> (LLM-call) span carries standard GenAI attributes, including:
| Attribute | Meaning |
|---|---|
gen_ai.operation.name | The operation, e.g. chat, embeddings. |
gen_ai.provider.name / gen_ai.system | The provider, e.g. openai, anthropic. |
gen_ai.request.model | The model requested. |
gen_ai.response.model | The model that answered. |
gen_ai.usage.input_tokens / output_tokens | Token counts. |
gen_ai.request.temperature, max_tokens, top_p, … | Request parameters, when set. |
gen_ai.response.finish_reasons | Why generation stopped. |
gen_ai.input.messages / gen_ai.output.messages | Prompt/response — only when content capture is enabled. |
Troubleshooting
No traces showing up?
- Confirm
LITELLM_OTEL_V2=trueis set in the proxy's environment. - Try
OTEL_EXPORTER="console"first — if spans print to stdout, the problem is your exporter endpoint/headers, not LiteLLM. - Make sure you hit an LLM route (e.g.
/v1/chat/completions). Health checks and UI routes are excluded by default. - Check that
opentelemetry-instrumentation-fastapiis installed (see Requirements).
Only see the LLM call but no auth/postgres/server span? Those server and DB spans require the FastAPI instrumentation package — install opentelemetry-instrumentation-fastapi.
I see metadata but no prompts/responses. That's the default. Set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_only to capture content.
Support
For questions, open an issue at BerriAI/litellm.