Helicone
Overview
| Property | Details |
|---|---|
| Description | Helicone is an AI gateway and observability platform that provides OpenAI-compatible endpoints with advanced monitoring, caching, and analytics capabilities. |
| Provider Route on LiteLLM | helicone/ |
| Link to Provider Doc | Helicone Documentation ↗ |
| Base URL | https://ai-gateway.helicone.ai/ |
| Supported Operations | /chat/completions, /completions, /embeddings |
We support ALL models available through Helicone's AI Gateway. Use helicone/ as a prefix when sending requests.
What is Helicone?
Helicone is an open-source observability platform for LLM applications that provides:
- Request Monitoring: Track all LLM requests with detailed metrics
- Caching: Reduce costs and latency with intelligent caching
- Rate Limiting: Control request rates per user/key
- Cost Tracking: Monitor spend across models and users
- Custom Properties: Tag requests with metadata for filtering and analysis
- Prompt Management: Version control for prompts
Required Variables
Environment Variables
import os

os.environ["HELICONE_API_KEY"] = ""  # your Helicone API key
Get your Helicone API key from your Helicone dashboard.
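If you prefer not to set an environment variable, the key can also be passed per request (a minimal sketch; litellm's completion accepts an api_key argument):
from litellm import completion

response = completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    api_key="your-helicone-key",  # overrides HELICONE_API_KEY for this call
)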
Usage - LiteLLM Python SDK
Non-streaming
Helicone Non-streaming Completion
import os
import litellm
from litellm import completion
os.environ["HELICONE_API_KEY"] = "" # your Helicone API key
messages = [{"content": "What is the capital of France?", "role": "user"}]
# Helicone call - routes through Helicone gateway to OpenAI
response = completion(
    model="helicone/gpt-4",
    messages=messages
)
print(response)
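The response object follows the OpenAI schema, so the generated text is available on the first choice:
print(response.choices[0].message.content)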
Streaming
Helicone Streaming Completion
import os
import litellm
from litellm import completion
os.environ["HELICONE_API_KEY"] = "" # your Helicone API key
messages = [{"content": "Write a short poem about AI", "role": "user"}]
# Helicone call with streaming
response = completion(
    model="helicone/gpt-4",
    messages=messages,
    stream=True
)

for chunk in response:
    print(chunk)
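Each chunk follows the OpenAI streaming format, with the incremental text in choices[0].delta.content. A variant of the loop above that assembles the full reply (a minimal sketch; the final chunk's delta may be None):
full_text = ""
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        full_text += delta
print(full_text)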
With Metadata (Helicone Custom Properties)
Helicone with Custom Properties
import os
import litellm
from litellm import completion
os.environ["HELICONE_API_KEY"] = "" # your Helicone API key
response = completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather like?"}],
    metadata={
        "Helicone-Property-Environment": "production",
        "Helicone-Property-User-Id": "user_123",
        "Helicone-Property-Session-Id": "session_abc"
    }
)
print(response)
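These properties then appear as filters in the Helicone dashboard, so you can segment requests by environment, user, or session.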
Text Completion
Helicone Text Completion
import os
import litellm
os.environ["HELICONE_API_KEY"] = "" # your Helicone API key
response = litellm.text_completion(
    model="helicone/gpt-4o-mini",
    prompt="Once upon a time"  # /completions-style call with a prompt instead of messages
)
print(response)
Retry and Fallback Mechanisms
Helicone Retry and Fallbacks
import litellm

litellm.api_base = "https://ai-gateway.helicone.ai/"

# Retry behavior is configured through Helicone headers
litellm.metadata = {
    "Helicone-Retry-Enabled": "true",  # enable retries
    "helicone-retry-num": "3",         # retry up to 3 times
    "helicone-retry-factor": "2",      # exponential backoff factor
}

response = litellm.completion(
    model="helicone/gpt-4o-mini/openai,claude-3-5-sonnet-20241022/anthropic",  # try OpenAI first, then fall back to Anthropic
    messages=[{"role": "user", "content": "Hello"}]
)
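The same retry headers can also be passed per request instead of as module-level globals, following the metadata pattern used throughout this page (a sketch under that assumption):
import litellm

response = litellm.completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "Helicone-Retry-Enabled": "true",
        "helicone-retry-num": "3",
        "helicone-retry-factor": "2",
    }
)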
Supported OpenAI Parameters
Helicone supports all standard OpenAI-compatible parameters:
| Parameter | Type | Description |
|---|---|---|
| messages | array | Required. Array of message objects with 'role' and 'content' |
| model | string | Required. Model ID (e.g., gpt-4, claude-3-opus, etc.) |
| stream | boolean | Optional. Enable streaming responses |
| temperature | float | Optional. Sampling temperature |
| top_p | float | Optional. Nucleus sampling parameter |
| max_tokens | integer | Optional. Maximum tokens to generate |
| frequency_penalty | float | Optional. Penalize frequent tokens |
| presence_penalty | float | Optional. Penalize tokens based on presence |
| stop | string/array | Optional. Stop sequences |
| n | integer | Optional. Number of completions to generate |
| tools | array | Optional. List of available tools/functions |
| tool_choice | string/object | Optional. Control tool/function calling |
| response_format | object | Optional. Response format specification |
| user | string | Optional. User identifier |
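These pass straight through litellm.completion like any other OpenAI-compatible call; a short sketch combining a few of them:
import litellm

response = litellm.completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "Name three prime numbers."}],
    temperature=0.2,   # near-deterministic sampling
    max_tokens=50,     # cap the response length
    n=1,               # single completion
    stop=["\n\n"]      # stop at the first blank line
)
print(response.choices[0].message.content)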
Helicone-Specific Headers
Pass these as metadata to leverage Helicone features:
| Header | Description |
|---|---|
| Helicone-Property-* | Custom properties for filtering (e.g., Helicone-Property-User-Id) |
| Helicone-Cache-Enabled | Enable caching for this request |
| Helicone-User-Id | User identifier for tracking |
| Helicone-Session-Id | Session identifier for grouping requests |
| Helicone-Prompt-Id | Prompt identifier for versioning |
| Helicone-Rate-Limit-Policy | Rate limiting policy name |
Example with headers:
Helicone with Custom Headers
import litellm
response = litellm.completion(
    model="helicone/gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "Helicone-Cache-Enabled": "true",
        "Helicone-Property-Environment": "production",
        "Helicone-Property-User-Id": "user_123",
        "Helicone-Session-Id": "session_abc",
        "Helicone-Prompt-Id": "prompt_v1"
    }
)
Advanced Usage
Using with Different Providers
Helicone acts as a gateway and supports multiple providers:
Helicone with Anthropic
import os
import litellm

# Only your Helicone key is needed; provider keys are configured in your Helicone dashboard
os.environ["HELICONE_API_KEY"] = "your-helicone-key"

response = litellm.completion(
    model="helicone/claude-3.5-haiku/anthropic",
    messages=[{"role": "user", "content": "Hello"}]
)
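Note the model string pattern: helicone/ plus the model name, optionally followed by /provider to pin a specific upstream provider (compare helicone/gpt-4 earlier with helicone/claude-3.5-haiku/anthropic here). The comma-separated form in the retry example above chains several model/provider pairs as fallbacks.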
Caching
Enable caching to reduce costs and latency:
Helicone Caching
import litellm

response = litellm.completion(
    model="helicone/gpt-4",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    metadata={
        "Helicone-Cache-Enabled": "true"
    }
)

# Subsequent identical requests will be served from cache
response2 = litellm.completion(
    model="helicone/gpt-4",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    metadata={
        "Helicone-Cache-Enabled": "true"
    }
)
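To observe the latency benefit, you can time the two calls; the second, cache-served request should return much faster (a sketch assuming HELICONE_API_KEY is set as above; actual timings will vary):
import time
import litellm

def timed_call():
    start = time.perf_counter()
    litellm.completion(
        model="helicone/gpt-4",
        messages=[{"role": "user", "content": "What is 2+2?"}],
        metadata={"Helicone-Cache-Enabled": "true"}
    )
    return time.perf_counter() - start

first = timed_call()   # served by the upstream provider
second = timed_call()  # identical request, served from Helicone's cache
print(f"uncached: {first:.2f}s, cached: {second:.2f}s")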
Features
Request Monitoring
- Track all requests with detailed metrics
- View request/response pairs
- Monitor latency and errors
- Filter by custom properties
Cost Tracking
- Per-model cost tracking
- Per-user cost tracking
- Cost alerts and budgets
- Historical cost analysis
Rate Limiting
- Per-user rate limits
- Per-API key rate limits
- Custom rate limit policies
- Automatic enforcement
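A policy can be attached per request via the Helicone-Rate-Limit-Policy header from the table above (a sketch; "my-policy" is a hypothetical policy name, and the exact value format is defined in Helicone's rate limiting docs):
import litellm

response = litellm.completion(
    model="helicone/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={
        "Helicone-Rate-Limit-Policy": "my-policy"  # hypothetical policy name; see Helicone docs for the value format
    }
)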
Analytics
- Request volume trends
- Cost trends
- Latency percentiles
- Error rates
Visit Helicone Pricing for details.