IBM Guardrails
LiteLLM works with IBM's FMS Guardrails for content safety. You can use it to detect jailbreaks, PII, hate speech, and more.
What it doesโ
IBM Guardrails analyzes text and tells you if it contains things you want to avoid. It gives each detection a score. Higher scores mean it's more confident.
You can run these checks:
- Before sending to the LLM (on user input)
- After getting LLM response (on output)
- During the call (parallel to LLM)
Quick Startโ
1. Add to your config.yamlโ
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: openai/gpt-3.5-turbo
api_key: os.environ/OPENAI_API_KEY
guardrails:
- guardrail_name: ibm-jailbreak-detector
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true
default_on: true
optional_params:
score_threshold: 0.8
block_on_detection: true
2. Set your auth tokenโ
export IBM_GUARDRAILS_AUTH_TOKEN="your-token"
3. Start the proxyโ
litellm --config config.yaml --detailed_debug
4. Make a requestโ
curl -i http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"guardrails": ["ibm-jailbreak-detector"]
}'
Configurationโ
Required paramsโ
guardrail- str - Set toibm_guardrailsauth_token- str - Your IBM Guardrails auth token. Can useos.environ/IBM_GUARDRAILS_AUTH_TOKENbase_url- str - URL of your IBM Guardrails serverdetector_id- str - Which detector to use (e.g., "jailbreak-detector", "pii-detector")
Optional paramsโ
mode- str or list[str] - When to run. Options:pre_call,post_call,during_call. Default:pre_calldefault_on- bool - Run automatically without specifying in request. Default:falseis_detector_server- bool -truefor detector server,falsefor orchestrator. Default:trueverify_ssl- bool - Whether to verify SSL certificates. Default:true
optional_paramsโ
These go under optional_params:
detector_params- dict - Parameters to pass to your detectorscore_threshold- float - Only count detections above this score (0.0 to 1.0)block_on_detection- bool - Block the request when violations found. Default:true
Server Typesโ
IBM Guardrails has two APIs you can use:
Detector Server (recommended)โ
The simpler one. Sends all messages at once.
guardrails:
- guardrail_name: ibm-detector
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true # Use detector server
Orchestratorโ
If you're using the IBM FMS Guardrails Orchestrator, you can use this.
guardrails:
- guardrail_name: ibm-orchestrator
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-orchestrator-server.com"
detector_id: "jailbreak-detector"
is_detector_server: false # Use orchestrator
Examplesโ
Check for jailbreaks on inputโ
guardrails:
- guardrail_name: jailbreak-check
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true
default_on: true
optional_params:
score_threshold: 0.8
Check for PII in responsesโ
guardrails:
- guardrail_name: pii-check
litellm_params:
guardrail: ibm_guardrails
mode: post_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "pii-detector"
is_detector_server: true
optional_params:
score_threshold: 0.5 # Lower threshold for PII
block_on_detection: true
Run multiple detectorsโ
guardrails:
- guardrail_name: jailbreak-check
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true
- guardrail_name: pii-check
litellm_params:
guardrail: ibm_guardrails
mode: post_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "pii-detector"
is_detector_server: true
Then in your request:
curl -i http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}],
"guardrails": ["jailbreak-check", "pii-check"]
}'
How detection worksโ
When IBM Guardrails finds something, it returns details about what it found:
{
"start": 0,
"end": 31,
"text": "You are now in Do Anything Mode",
"detection_type": "jailbreak",
"score": 0.858
}
score- How confident it is (0.0 to 1.0)text- The specific text that triggered itdetection_type- What kind of violation
If the score is above your score_threshold, the request gets blocked (if block_on_detection is true).