
OpenTelemetry v2 - Full-request tracing

OpenTelemetry v2 (OTel v2) is LiteLLM Proxy's next-generation tracing. It produces one trace per request that covers the incoming HTTP call, authentication, guardrails, the LLM call itself, and the internal database/cache work, all nested in a single tree.

It follows standard OpenTelemetry GenAI semantic conventions, so the traces it produces are readable in any OTel backend (Grafana Tempo, Jaeger, Honeycomb, Datadog, …) and come with ready-made presets for popular LLM observability tools (Arize, Phoenix, Langfuse, Weave, Langtrace, Levo, AgentOps).

Opt-in feature

OTel v2 is off by default. Nothing in it runs until you set LITELLM_OTEL_V2=true. It is separate from the existing OpenTelemetry integration, so enable one or the other, not both.

What you get

A single request to your proxy produces one trace that looks like this:

POST /v1/chat/completions                ← HTTP request (server span)
├── auth /v1/chat/completions            ← authentication
│   ├── postgres get_key_object          ← DB lookups during auth
│   └── postgres get_team_membership
├── execute_guardrail presidio-pii       ← each guardrail that runs
├── chat gpt-4o                          ← the LLM call (model, tokens, cost)
└── batch_write_to_db                    ← spend/usage written to DB

Highlights:

  • One trace, end to end — the HTTP request, auth, guardrails, the LLM call, and DB writes all live in the same trace, correctly nested.
  • Rich GenAI attributes — every LLM-call span carries gen_ai.* attributes: model, provider, token usage, cost, finish reasons, request parameters, and more.
  • Standards-based — built on the official OpenTelemetry GenAI semantic conventions, so it works with any OTel-compatible backend.
  • Vendor presets — one line to ship traces to Arize, Phoenix, Langfuse, Weave, Langtrace, Levo, or AgentOps in the format each tool expects.
  • Safe by default — prompts and responses are not captured unless you explicitly opt in. Noisy routes (health checks, metrics scrapes, UI assets) are excluded automatically.
  • Distributed tracing — if your client sends a traceparent header, LiteLLM's spans nest inside your existing trace.

Requirements

OTel v2 instruments the proxy's FastAPI app, so it needs the OpenTelemetry SDK plus the FastAPI instrumentation package:

pip install "litellm[proxy]" \
opentelemetry-api \
opentelemetry-sdk \
opentelemetry-exporter-otlp \
opentelemetry-instrumentation-fastapi

These packages ship with the proxy Docker image. You only need to install them manually for a pip-based proxy.

Getting started

1. Send traces to any OTLP collector

Set the feature flag plus the standard OTEL_* environment variables. That's it — no config change needed.

LITELLM_OTEL_V2=true
OTEL_EXPORTER="otlp_http"
OTEL_ENDPOINT="http://localhost:4318"

Pass auth headers your backend needs via OTEL_HEADERS:

OTEL_HEADERS="api-key=your-key,x-tenant=acme"
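
To make the OTEL_HEADERS format concrete, here is a minimal sketch of how a comma-separated key=value string maps to individual headers. The function name parse_otel_headers is hypothetical and this is not LiteLLM's own parser; it only illustrates the format described above.

```python
def parse_otel_headers(raw: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into a header dict (illustrative, not LiteLLM's parser)."""
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


print(parse_otel_headers("api-key=your-key,x-tenant=acme"))
```

Under this reading, a value that contains a comma cannot be expressed directly, so check your backend's documentation if its credentials include one.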

Then start the proxy as usual:

litellm --config config.yaml

Make a request, and you'll see one trace per request in your backend.

2. Send traces to a specific tool (presets)

For LLM observability tools, use a preset. A preset knows the tool's endpoint and emits attributes in the schema that tool expects. To enable one, add its name to callbacks in your config and set the tool's credentials as env vars.

config.yaml

litellm_settings:
  callbacks: ["arize"]

LITELLM_OTEL_V2=true
ARIZE_SPACE_ID="your-space-id"
ARIZE_API_KEY="your-api-key"

Send to several places at once

Each preset adds its destination. List more than one callback (e.g. ["arize", "langfuse_otel"]) and your spans are shipped to all of them in parallel, each in the right format.

Capturing prompts & responses

By default, OTel v2 records metadata only (model, tokens, cost, timing) and never writes prompt or response text to your traces. This is intentional — it keeps sensitive content out of your observability backend.

To capture message content, opt in explicitly by setting the variable to one of the following values:

# no_content (default): never capture prompts/responses
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="no_content"

# span_only: write prompts/responses as attributes on spans
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="span_only"

# event_only: write prompts/responses to events only
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="event_only"

# span_and_event: write content to both spans and events
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT="span_and_event"

The gate is enforced centrally, so it applies to every backend at once — a user request can never force its prompt into your backend while capture is disabled.
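
As a rough illustration of what a centrally enforced gate means, the sketch below filters span attributes through a single function before they are exported. The function and constant names are hypothetical, and LiteLLM's actual implementation may differ; only the mode names and the two gen_ai.* content attributes come from this page.

```python
# Attributes that carry prompt/response text (listed in the span reference on this page).
CONTENT_KEYS = {"gen_ai.input.messages", "gen_ai.output.messages"}

# Modes that allow content to appear on spans.
SPAN_CONTENT_MODES = {"span_only", "span_and_event"}


def gate_span_attributes(mode: str, attributes: dict) -> dict:
    """Return a copy of the attributes, without content keys unless the mode allows them."""
    if mode in SPAN_CONTENT_MODES:
        return dict(attributes)
    return {k: v for k, v in attributes.items() if k not in CONTENT_KEYS}


attrs = {
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.input.messages": "[{'role': 'user', 'content': 'hi'}]",
}
print(gate_span_attributes("no_content", attrs))
print(gate_span_attributes("span_only", attrs))
```

Because every exporter would receive attributes only after this one step, no per-backend setting or per-request flag can bypass it, which is the property the paragraph above describes.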

Which routes are traced

High-frequency, non-LLM routes are excluded by default so they don't flood your traces: health checks (/health*), the Prometheus scrape (/metrics), and static UI/docs assets (/ui, /docs, /redoc, /_next, /openapi.json, favicons, …).

To change the set, use the standard OpenTelemetry env var (comma-separated paths, substring-matched):

# Trace everything, including health checks
OTEL_PYTHON_FASTAPI_EXCLUDED_URLS=""

# Exclude only your own custom paths
OTEL_PYTHON_FASTAPI_EXCLUDED_URLS="/health,/internal"
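
The matching rule can be sketched as follows, assuming plain substring matching as described above. The helper name is_excluded is hypothetical. Note that the upstream OpenTelemetry FastAPI instrumentation documents these entries as regular expressions, so characters with special regex meaning may behave differently there than in this simplified version.

```python
def is_excluded(path: str, excluded_urls: str) -> bool:
    """True if any comma-separated entry occurs as a substring of the request path."""
    patterns = [p.strip() for p in excluded_urls.split(",") if p.strip()]
    return any(p in path for p in patterns)


setting = "/health,/internal"
for path in ("/health/liveliness", "/internal/debug", "/v1/chat/completions"):
    print(path, "excluded" if is_excluded(path, setting) else "traced")
```

An empty setting yields no patterns, so nothing is excluded and every route is traced.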

Per-key / per-team destinations (multi-tenant)

Some presets (arize, langfuse_otel, weave_otel) support per-request credentials: if a request carries team- or key-scoped credentials, its spans are routed to that tenant's project automatically. This lets one proxy serve many tenants, each seeing only their own traces — no extra setup beyond configuring those credentials on the key/team.

Distributed tracing

If the incoming request has a W3C traceparent header, LiteLLM continues that trace instead of starting a new one. Your LiteLLM spans then appear inline inside whatever distributed trace your application already has — so you can follow a request from your app, through the proxy, to the LLM provider, in one view.
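
For reference, a W3C traceparent value has four dash-separated hex fields: version, trace ID (32 hex characters), parent span ID (16 hex characters), and flags. The sketch below builds one by hand so the shape is visible; make_traceparent is a hypothetical helper. In a real application you would normally let your OpenTelemetry SDK's propagator inject the header from the current span rather than generating IDs yourself.

```python
import re
import secrets
from typing import Optional

# Shape of a version-00 traceparent header value.
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def make_traceparent(
    trace_id: Optional[str] = None,
    span_id: Optional[str] = None,
    sampled: bool = True,
) -> str:
    """Build a version-00 traceparent value, generating random IDs when none are given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 bytes  -> 16 hex chars
    flags = "01" if sampled else "00"             # 01 = sampled
    return f"00-{trace_id}-{span_id}-{flags}"


# Attach this dict to the HTTP request your application sends to the proxy.
headers = {"traceparent": make_traceparent()}
print(headers)
```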

Configuration reference

All values are environment variables. Boolean flags accept true/false.

| Variable | Default | Purpose |
| --- | --- | --- |
| LITELLM_OTEL_V2 | false | Master switch. OTel v2 does nothing until this is true. |
| OTEL_EXPORTER (alias OTEL_EXPORTER_OTLP_PROTOCOL) | console | Exporter kind: console, otlp_http, otlp_grpc. |
| OTEL_ENDPOINT (alias OTEL_EXPORTER_OTLP_ENDPOINT) | none | OTLP collector URL. Setting an endpoint implies otlp_http unless you override OTEL_EXPORTER. |
| OTEL_HEADERS (alias OTEL_EXPORTER_OTLP_HEADERS) | none | Comma-separated key=value auth headers for your backend. |
| OTEL_SERVICE_NAME | litellm | service.name resource attribute shown in your backend. |
| OTEL_ENVIRONMENT_NAME | none | deployment.environment resource attribute (e.g. production). |
| OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT | no_content | Prompt/response capture: no_content, span_only, event_only, span_and_event. |
| OTEL_PYTHON_FASTAPI_EXCLUDED_URLS | health/metrics/UI routes | Comma-separated paths to exclude from tracing (substring match). Set to "" to trace everything. |
| LITELLM_OTEL_INTEGRATION_ENABLE_METRICS | false | Also emit GenAI client metrics (duration, token usage, cost). |
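
The interaction between the exporter default and the endpoint can be sketched as below. This is a reading of the reference entries above, not LiteLLM's code: resolve_exporter is a hypothetical name, the order in which the endpoint and its alias are consulted is an assumption, and the OTEL_EXPORTER_OTLP_PROTOCOL alias is deliberately left out because this page does not say how its values map.

```python
def resolve_exporter(env: dict) -> str:
    """Pick the exporter kind from an environment mapping."""
    explicit = env.get("OTEL_EXPORTER")
    if explicit:
        return explicit  # an explicit choice always wins
    # Assumption: the LiteLLM name is checked before the standard OTLP alias.
    endpoint = env.get("OTEL_ENDPOINT") or env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if endpoint:
        return "otlp_http"  # an endpoint alone implies OTLP over HTTP
    return "console"  # default when nothing is configured


print(resolve_exporter({}))
print(resolve_exporter({"OTEL_ENDPOINT": "http://localhost:4318"}))
print(resolve_exporter({"OTEL_ENDPOINT": "http://localhost:4317", "OTEL_EXPORTER": "otlp_grpc"}))
```

To try it against the real environment, pass dict(os.environ) after importing os.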

What's on an LLM-call span

Every chat <model> (LLM-call) span carries standard GenAI attributes, including:

| Attribute | Meaning |
| --- | --- |
| gen_ai.operation.name | The operation, e.g. chat, embeddings. |
| gen_ai.provider.name / gen_ai.system | The provider, e.g. openai, anthropic. |
| gen_ai.request.model | The model requested. |
| gen_ai.response.model | The model that answered. |
| gen_ai.usage.input_tokens / gen_ai.usage.output_tokens | Token counts. |
| gen_ai.request.temperature, max_tokens, top_p, … | Request parameters, when set. |
| gen_ai.response.finish_reasons | Why generation stopped. |
| gen_ai.input.messages / gen_ai.output.messages | Prompt/response, only when content capture is enabled. |
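
If you post-process exported spans yourself, these attributes can be read like any other key/value pairs. The sketch below assumes a span's attributes are available as a plain dict; summarize_llm_span is a hypothetical helper and the sample values are made up for illustration.

```python
def summarize_llm_span(attrs: dict) -> str:
    """Produce a one-line summary from GenAI span attributes."""
    operation = attrs.get("gen_ai.operation.name", "unknown")
    model = attrs.get("gen_ai.request.model", "unknown")
    tokens_in = attrs.get("gen_ai.usage.input_tokens", 0)
    tokens_out = attrs.get("gen_ai.usage.output_tokens", 0)
    total = tokens_in + tokens_out
    return f"{operation} {model}: {tokens_in} in, {tokens_out} out, {total} total"


sample = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.usage.input_tokens": 12,
    "gen_ai.usage.output_tokens": 30,
}
print(summarize_llm_span(sample))
```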

Troubleshooting

No traces showing up?

  1. Confirm LITELLM_OTEL_V2=true is set in the proxy's environment.
  2. Try OTEL_EXPORTER="console" first — if spans print to stdout, the problem is your exporter endpoint/headers, not LiteLLM.
  3. Make sure you hit an LLM route (e.g. /v1/chat/completions). Health checks and UI routes are excluded by default.
  4. Check that opentelemetry-instrumentation-fastapi is installed (see Requirements).
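
For step 4, one quick way to check whether the packages are importable is a small script run with the same Python interpreter as the proxy. This is a sketch: missing_modules is a hypothetical helper, and the module names listed are the import paths these packages are expected to provide.

```python
import importlib.util

REQUIRED = [
    "opentelemetry.trace",                    # opentelemetry-api
    "opentelemetry.sdk.trace",                # opentelemetry-sdk
    "opentelemetry.instrumentation.fastapi",  # opentelemetry-instrumentation-fastapi
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (e.g. 'opentelemetry') is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("missing:", missing_modules(REQUIRED) or "none")
```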

Only see the LLM call but no auth, postgres, or server spans?

Those server and DB spans require the FastAPI instrumentation package. Install opentelemetry-instrumentation-fastapi (see Requirements).

See metadata but no prompts/responses?

That's the default. Set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_only to capture content.

Support

For questions, open an issue in the BerriAI/litellm repository on GitHub.