
/mcp [BETA] - Model Context Protocol

LiteLLM Proxy provides an MCP Gateway that allows you to use a fixed endpoint for all MCP tools and control MCP access by Key, Team, or Organization.

LiteLLM MCP Architecture: Use MCP tools with all LiteLLM supported models

Overview​

  • MCP Operations: List Tools, Call Tools
  • Supported MCP Transports: Streamable HTTP, SSE
  • LiteLLM Permission Management (✨ Enterprise Only): By Key, By Team, By Organization

Adding your MCP​

In the LiteLLM UI, navigate to "MCP Servers" and click "Add New MCP Server".

On this form, enter your MCP server URL and select the transport you want to use.

LiteLLM supports the following MCP transports:

  • Streamable HTTP
  • SSE (Server-Sent Events)
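
You can also register MCP servers in config.yaml via the mcp_servers block (the same block used in the cost tracking examples later on this page). A minimal sketch, assuming a Zapier SSE endpoint:

config.yaml
mcp_servers:
  zapier_server:
    url: "https://actions.zapier.com/mcp/sk-xxxxx/sse"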

Using your MCP​

Connect via OpenAI Responses API​

Use the OpenAI Responses API to connect to your LiteLLM MCP server:

cURL Example
curl --location 'https://api.openai.com/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--data '{
  "model": "gpt-4o",
  "tools": [
    {
      "type": "mcp",
      "server_label": "litellm",
      "server_url": "<your-litellm-proxy-base-url>/mcp",
      "require_approval": "never",
      "headers": {
        "x-litellm-api-key": "Bearer YOUR_LITELLM_API_KEY"
      }
    }
  ],
  "input": "Run available tools",
  "tool_choice": "required"
}'
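
The same request can also be made with the OpenAI Python SDK; this is a minimal sketch mirroring the cURL call above (fill in your proxy base URL and LiteLLM key):

OpenAI Python SDK Example
from openai import OpenAI

# Uses OPENAI_API_KEY from the environment
client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "mcp",
            "server_label": "litellm",
            "server_url": "<your-litellm-proxy-base-url>/mcp",
            "require_approval": "never",
            "headers": {
                "x-litellm-api-key": "Bearer YOUR_LITELLM_API_KEY"
            },
        }
    ],
    input="Run available tools",
    tool_choice="required",
)
print(response.output_text)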

Segregating MCP Server Access​

You can restrict a request to specific MCP servers, and list only their tools, using the x-mcp-servers header. This header allows you to:

  • Limit tool access to one or more specific MCP servers
  • Control which tools are available in different environments or use cases

The header accepts a comma-separated list of server names: "Zapier_Gmail,Server2,Server3"

Notes:

  • Server names with spaces should be replaced with underscores
  • If the header is not provided, tools from all available MCP servers will be accessible
cURL Example with Server Segregation
curl --location 'https://api.openai.com/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--data '{
  "model": "gpt-4o",
  "tools": [
    {
      "type": "mcp",
      "server_label": "litellm",
      "server_url": "<your-litellm-proxy-base-url>/mcp",
      "require_approval": "never",
      "headers": {
        "x-litellm-api-key": "Bearer YOUR_LITELLM_API_KEY",
        "x-mcp-servers": "Zapier_Gmail"
      }
    }
  ],
  "input": "Run available tools",
  "tool_choice": "required"
}'

In this example, the request will only have access to tools from the "Zapier_Gmail" MCP server.

Using your MCP with client side credentials​

Use this if you want to pass a client-side authentication token to LiteLLM, which LiteLLM will then forward to your MCP server for authentication.

You can specify your MCP auth token using the header x-mcp-auth. LiteLLM will forward this token to your MCP server for authentication.

Connect via OpenAI Responses API with MCP Auth​

Use the OpenAI Responses API and include the x-mcp-auth header for your MCP server authentication:

cURL Example with MCP Auth
curl --location 'https://api.openai.com/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $OPENAI_API_KEY" \
--data '{
  "model": "gpt-4o",
  "tools": [
    {
      "type": "mcp",
      "server_label": "litellm",
      "server_url": "<your-litellm-proxy-base-url>/mcp",
      "require_approval": "never",
      "headers": {
        "x-litellm-api-key": "Bearer YOUR_LITELLM_API_KEY",
        "x-mcp-auth": "YOUR_MCP_AUTH_TOKEN"
      }
    }
  ],
  "input": "Run available tools",
  "tool_choice": "required"
}'

Customize the MCP Auth Header Name​

By default, LiteLLM uses x-mcp-auth to pass your credentials to MCP servers. You can change this header name in one of the following ways:

  1. Set the LITELLM_MCP_CLIENT_SIDE_AUTH_HEADER_NAME environment variable

Environment Variable
export LITELLM_MCP_CLIENT_SIDE_AUTH_HEADER_NAME="authorization"

  2. Set mcp_client_side_auth_header_name under general_settings in your config.yaml file
config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-xxxxxxx

general_settings:
  mcp_client_side_auth_header_name: "authorization"

Using the authorization header​

In this example, the authorization header is passed to the MCP server for authentication.

cURL with authorization header
curl --location '<your-litellm-proxy-base-url>/v1/responses' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_API_KEY" \
--data '{
  "model": "gpt-4o",
  "tools": [
    {
      "type": "mcp",
      "server_label": "litellm",
      "server_url": "<your-litellm-proxy-base-url>/mcp",
      "require_approval": "never",
      "headers": {
        "x-litellm-api-key": "Bearer YOUR_LITELLM_API_KEY",
        "authorization": "Bearer sk-zapier-token-123"
      }
    }
  ],
  "input": "Run available tools",
  "tool_choice": "required"
}'

✨ MCP Cost Tracking​

LiteLLM provides two ways to track costs for MCP tool calls:

  • Config-based Cost Tracking: for simple cost tracking with fixed costs per tool/server; automatically tracks costs based on your configuration.
  • Custom Post-MCP Hook: for dynamic cost tracking with custom logic; allows custom cost calculations and response modifications.

Config-based Cost Tracking​

Configure fixed costs for MCP servers directly in your config.yaml:

config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-xxxxxxx

mcp_servers:
  zapier_server:
    url: "https://actions.zapier.com/mcp/sk-xxxxx/sse"
    mcp_info:
      mcp_server_cost_info:
        # Default cost for all tools in this server
        default_cost_per_query: 0.01
        # Custom cost for specific tools
        tool_name_to_cost_per_query:
          send_email: 0.05
          create_document: 0.03

  expensive_api_server:
    url: "https://api.expensive-service.com/mcp"
    mcp_info:
      mcp_server_cost_info:
        default_cost_per_query: 1.50

Custom Post-MCP Hook​

Use this when you need dynamic cost calculation or want to modify the MCP response before it's returned to the user.

1. Create a custom MCP hook file​

custom_mcp_hook.py
from typing import Optional
from litellm.integrations.custom_logger import CustomLogger
from litellm.types.mcp import MCPPostCallResponseObject


class CustomMCPCostTracker(CustomLogger):
    """
    Custom handler for MCP cost tracking and response modification
    """

    async def async_post_mcp_tool_call_hook(
        self,
        kwargs,
        response_obj: MCPPostCallResponseObject,
        start_time,
        end_time,
    ) -> Optional[MCPPostCallResponseObject]:
        """
        Called after each MCP tool call.
        Modify costs and response before returning to user.
        """

        # Extract tool information from kwargs
        tool_name = kwargs.get("name", "")
        server_name = kwargs.get("server_name", "")

        # Calculate custom cost based on your logic
        custom_cost = 42.00

        # Set the response cost
        response_obj.hidden_params.response_cost = custom_cost

        return response_obj


# Create instance for LiteLLM to use
custom_mcp_cost_tracker = CustomMCPCostTracker()

2. Configure in config.yaml​

config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-xxxxxxx

# Add your custom MCP hook
callbacks:
  - custom_mcp_hook.custom_mcp_cost_tracker

mcp_servers:
  zapier_server:
    url: "https://actions.zapier.com/mcp/sk-xxxxx/sse"

3. Start the proxy​

$ litellm --config /path/to/config.yaml 

When MCP tools are called, your custom hook will:

  1. Calculate costs based on your custom logic
  2. Modify the response if needed
  3. Track costs in LiteLLM's logging system

✨ MCP Permission Management​

LiteLLM supports managing permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM. When an MCP client attempts to list tools, LiteLLM only returns the tools the entity has permission to access.

When Creating a Key, Team, or Organization, you can select the allowed MCP Servers that the entity has access to.
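
If you manage entities via the API rather than the UI, the request below sketches how a key restricted to specific MCP servers might be created with /key/generate. The object_permission.mcp_servers field name is an assumption not confirmed on this page; verify the exact field against your LiteLLM version before relying on it:

cURL Example (sketch, field names assumed)
curl -X POST '<your-litellm-proxy-base-url>/key/generate' \
--header "Authorization: Bearer $LITELLM_MASTER_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "object_permission": {
    "mcp_servers": ["Zapier_Gmail"]
  }
}'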

LiteLLM Proxy - Walk through MCP Gateway​

LiteLLM exposes an MCP Gateway for admins to add all their MCP servers to LiteLLM. The key benefits of using LiteLLM Proxy with MCP are:

  1. Use a fixed endpoint for all MCP tools
  2. MCP Permission management by Key, Team, or User

This video demonstrates how you can onboard an MCP server to LiteLLM Proxy, use it and set access controls.

LiteLLM Python SDK MCP Bridge​

The LiteLLM Python SDK acts as an MCP bridge, letting you use MCP tools with all LiteLLM supported models. LiteLLM offers the following features for using MCP:

  • List Available MCP Tools: OpenAI clients can view all available MCP tools
    • litellm.experimental_mcp_client.load_mcp_tools to list all available MCP tools
  • Call MCP Tools: OpenAI clients can call MCP tools
    • litellm.experimental_mcp_client.call_openai_tool to call an OpenAI tool on an MCP server

1. List Available MCP Tools​

In this example we'll use litellm.experimental_mcp_client.load_mcp_tools to list all available MCP tools on any MCP server. This method can be used in two ways:

  • format="mcp" - (default) Return MCP tools
    • Returns: mcp.types.Tool
  • format="openai" - Return MCP tools converted to OpenAI API compatible tools. Allows using with OpenAI endpoints.
    • Returns: openai.types.chat.ChatCompletionToolParam
MCP Client List Tools
# Create server parameters for stdio connection
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import json
import os
import litellm
from litellm import experimental_mcp_client


server_params = StdioServerParameters(
    command="python3",
    # Make sure to update to the full absolute path to your mcp_server.py file
    args=["./mcp_server.py"],
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        # Initialize the connection
        await session.initialize()

        # Get tools
        tools = await experimental_mcp_client.load_mcp_tools(session=session, format="openai")
        print("MCP TOOLS: ", tools)

        messages = [{"role": "user", "content": "what's (3 + 5)"}]
        llm_response = await litellm.acompletion(
            model="gpt-4o",
            api_key=os.getenv("OPENAI_API_KEY"),
            messages=messages,
            tools=tools,
        )
        print("LLM RESPONSE: ", json.dumps(llm_response, indent=4, default=str))

2. List and Call MCP Tools​

In this example we'll use

  • litellm.experimental_mcp_client.load_mcp_tools to list all available MCP tools on any MCP server
  • litellm.experimental_mcp_client.call_openai_tool to call an OpenAI tool on an MCP server

The first LLM response returns a list of OpenAI tool calls. We take the first tool call from the LLM response and pass it to litellm.experimental_mcp_client.call_openai_tool to call the tool on the MCP server.

How litellm.experimental_mcp_client.call_openai_tool works​

  • Accepts an OpenAI Tool Call from the LLM response
  • Converts the OpenAI Tool Call to an MCP Tool
  • Calls the MCP Tool on the MCP server
  • Returns the result of the MCP Tool call
MCP Client List and Call Tools
# Create server parameters for stdio connection
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
import json
import os
import litellm
from litellm import experimental_mcp_client


server_params = StdioServerParameters(
    command="python3",
    # Make sure to update to the full absolute path to your mcp_server.py file
    args=["./mcp_server.py"],
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        # Initialize the connection
        await session.initialize()

        # Get tools
        tools = await experimental_mcp_client.load_mcp_tools(session=session, format="openai")
        print("MCP TOOLS: ", tools)

        messages = [{"role": "user", "content": "what's (3 + 5)"}]
        llm_response = await litellm.acompletion(
            model="gpt-4o",
            api_key=os.getenv("OPENAI_API_KEY"),
            messages=messages,
            tools=tools,
        )
        print("LLM RESPONSE: ", json.dumps(llm_response, indent=4, default=str))

        openai_tool = llm_response["choices"][0]["message"]["tool_calls"][0]
        # Call the tool using MCP client
        call_result = await experimental_mcp_client.call_openai_tool(
            session=session,
            openai_tool=openai_tool,
        )
        print("MCP TOOL CALL RESULT: ", call_result)

        # Send the tool result back to the LLM
        messages.append(llm_response["choices"][0]["message"])
        messages.append(
            {
                "role": "tool",
                "content": str(call_result.content[0].text),
                "tool_call_id": openai_tool["id"],
            }
        )
        print("final messages with tool result: ", messages)
        llm_response = await litellm.acompletion(
            model="gpt-4o",
            api_key=os.getenv("OPENAI_API_KEY"),
            messages=messages,
            tools=tools,
        )
        print(
            "FINAL LLM RESPONSE: ", json.dumps(llm_response, indent=4, default=str)
        )