Skip to main content

Gray Swan Cygnal Guardrail

Use Gray Swan Cygnal to continuously monitor conversations for policy violations, indirect prompt injection (IPI), jailbreak attempts, and other safety risks.

Cygnal returns a violation score between 0 and 1 (higher means more likely to violate policy), plus metadata such as violated rule indices, mutation detection, and IPI flags. LiteLLM can automatically block or monitor requests based on this signal.


Quick Start​

1. Obtain Credentials​

  1. Create a Gray Swan account and generate a Cygnal API key.
  2. Configure environment variables for the LiteLLM proxy host:
export GRAYSWAN_API_KEY="your-grayswan-key"

2. Configure config.yaml​

Add a guardrail entry that references the Gray Swan integration. Below is a balanced example that monitors both input and output but only blocks once the violation score reaches the configured threshold.

model_list:
- model_name: openai/gpt-4.1-mini
litellm_params:
model: openai/gpt-4.1-mini
api_key: os.environ/OPENAI_API_KEY

guardrails:
- guardrail_name: "cygnal-monitor"
litellm_params:
guardrail: grayswan
mode: [pre_call, post_call] # monitor both input and output
api_key: os.environ/GRAYSWAN_API_KEY
optional_params:
on_flagged_action: monitor # or "block"
violation_threshold: 0.5 # score >= threshold is flagged
reasoning_mode: hybrid # off | hybrid | thinking
categories:
safety: "Detect jailbreaks and policy violations"
policy_id: "your-cygnal-policy-id"
default_on: true

general_settings:
master_key: "your-litellm-master-key"

litellm_settings:
set_verbose: true

3. Launch the Proxy​

litellm --config config.yaml --port 4000

Choosing Guardrail Modes​

Gray Swan can run during pre_call, during_call, and post_call stages. Combine modes based on your latency and coverage requirements.

ModeWhen it RunsProtectsTypical Use Case
pre_callBefore LLM callUser input onlyBlock prompt injection before it reaches the model
during_callParallel to callUser input onlyLow-latency monitoring without blocking
post_callAfter responseFull conversationScan output for policy violations, leaked secrets, or IPI
guardrails:
- guardrail_name: "cygnal-monitor-only"
litellm_params:
guardrail: grayswan
mode: "during_call"
api_key: os.environ/GRAYSWAN_API_KEY
optional_params:
on_flagged_action: monitor
violation_threshold: 0.6
default_on: true

Best for visibility without blocking. Alerts are logged via LiteLLM’s standard logging callbacks.


Configuration Reference​

ParameterTypeDescription
api_keystringGray Swan Cygnal API key. Reads from GRAYSWAN_API_KEY if omitted.
modestring or listGuardrail stages (pre_call, during_call, post_call).
optional_params.on_flagged_actionstringmonitor (log only) or block (raise HTTPException).
.optional_params.violation_thresholdnumber (0-1)Scores at or above this value are considered violations.
optional_params.reasoning_modestringoff, hybrid, or thinking. Enables Cygnal’s reasoning capabilities.
optional_params.categoriesobjectMap of custom category names to descriptions.
optional_params.policy_idstringGray Swan policy identifier.