
[Preview] v1.81.12 - Guardrail Policy Templates & Action Builder

Krrish Dholakia
CEO, LiteLLM
Ishaan Jaff
CTO, LiteLLM

Deploy this version

docker run litellm
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.81.12.rc.1

Key Highlights


Add Semgrep & fix OOMs

This release fixes out-of-memory (OOM) risks from unbounded asyncio.Queue() usage. Log queues (e.g. GCS bucket) and DB spend-update queues were previously unbounded and could grow without limit under load. They now use a configurable max size (LITELLM_ASYNCIO_QUEUE_MAXSIZE, default 1000); when a queue is full, it flushes immediately to make room instead of accumulating entries in memory. A Semgrep rule (.semgrep/rules/python/unbounded-memory.yml) was added to flag similar unbounded-memory patterns in future code. PR #20912
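
The sketch below shows the general shape of that pattern in plain asyncio (illustrative names only, not LiteLLM's internals): the queue is capped at LITELLM_ASYNCIO_QUEUE_MAXSIZE, and a put against a full queue triggers an immediate flush rather than unbounded growth.

```python
import asyncio
import os

# Illustrative sketch of the bounded-queue pattern -- not LiteLLM's internal code.
MAX_QUEUE_SIZE = int(os.getenv("LITELLM_ASYNCIO_QUEUE_MAXSIZE", "1000"))


class BoundedLogQueue:
    def __init__(self, maxsize: int = MAX_QUEUE_SIZE):
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def add(self, item: dict) -> None:
        try:
            self.queue.put_nowait(item)
        except asyncio.QueueFull:
            # Queue is at capacity: flush immediately to make room, then enqueue.
            await self.flush()
            self.queue.put_nowait(item)

    async def flush(self) -> None:
        batch = []
        while not self.queue.empty():
            batch.append(self.queue.get_nowait())
        if batch:
            await self._send(batch)

    async def _send(self, batch: list) -> None:
        # Stand-in for the real sink (e.g. a GCS bucket write or a DB spend update).
        print(f"flushed {len(batch)} items")
```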


Guardrail Action Builder

This release adds a visual action builder for guardrail policies with conditional execution support. You can now chain guardrails into multi-step pipelines: if a simple guardrail fails, route to an advanced one instead of immediately blocking. Each step has configurable ON PASS and ON FAIL actions (Next Step, Block, or Allow), and you can test the full pipeline with a sample message before saving.

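Conceptually, each step in the builder pairs a guardrail with an ON PASS and an ON FAIL action. The plain-Python sketch below is only an illustration of those semantics (not the LiteLLM implementation): a cheap check escalates to an advanced one via Next Step instead of blocking outright.

```python
# Conceptual illustration of conditional guardrail execution -- not LiteLLM's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineStep:
    check: Callable[[str], bool]  # True means the message passes this guardrail
    on_pass: str                  # "next", "allow", or "block"
    on_fail: str


def run_pipeline(steps: List[PipelineStep], message: str) -> str:
    for step in steps:
        action = step.on_pass if step.check(message) else step.on_fail
        if action == "block":
            return "blocked"
        if action == "allow":
            return "allowed"
        # "next": fall through to the following step
    return "allowed"


# A simple keyword filter routes failures to a stricter check instead of blocking outright.
simple = PipelineStep(check=lambda m: "ssn" not in m.lower(), on_pass="allow", on_fail="next")
advanced = PipelineStep(check=lambda m: "123-45-6789" not in m, on_pass="allow", on_fail="block")
print(run_pipeline([simple, advanced], "My SSN is 123-45-6789"))  # -> blocked by the advanced step
```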

Access Groups

Access Groups simplify defining resource access across your organization. One group can grant access to models, MCP servers, and agents; simply attach it to a key or team. Create groups in the Admin UI, define which resources each group includes, then assign the group when creating keys or teams. Updates to a group apply automatically to all attached keys and teams.
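
As a rough sketch of the flow, you could reference a group when generating a key against the proxy. The "access_groups" request field and the "prod-llms" group name below are assumptions for illustration, not confirmed API; check the Access Groups docs for the exact request shape.

```python
import requests

# Hypothetical sketch: create a virtual key that inherits an access group.
# /key/generate is the proxy's key-creation endpoint; the "access_groups" field
# name and the "prod-llms" group are illustrative assumptions.
resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # proxy admin key
    json={"access_groups": ["prod-llms"]},
)
print(resp.json())
```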

New Providers and Endpoints

New Providers (2 new providers)

| Provider | Supported LiteLLM Endpoints | Description |
|----------|-----------------------------|-------------|
| Scaleway | /chat/completions | Scaleway Generative APIs for chat completions |
| Sarvam AI | /chat/completions, /audio/transcriptions, /audio/speech | Sarvam AI STT and TTS support for Indian languages |
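
For example, the new Sarvam AI chat model can be called through the SDK using the sarvam/ prefix listed in the model table below. The SARVAM_API_KEY variable name is an assumption here; see the provider docs for the exact setup.

```python
import litellm

# Sketch: chat completion against the new Sarvam AI provider.
# Assumes the provider key is exported, e.g. SARVAM_API_KEY (the exact
# variable name is an assumption -- check the Sarvam AI provider docs).
response = litellm.completion(
    model="sarvam/sarvam-m",
    messages=[{"role": "user", "content": "Namaste! Please reply in Hindi."}],
)
print(response.choices[0].message.content)
```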

New Models / Updated Models

New Model Support (19 highlighted models)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) |
|----------|-------|----------------|---------------------|----------------------|
| AWS Bedrock | deepseek.v3.2 | 164K | $0.62 | $1.85 |
| AWS Bedrock | minimax.minimax-m2.1 | 196K | $0.30 | $1.20 |
| AWS Bedrock | moonshotai.kimi-k2.5 | 262K | $0.60 | $3.00 |
| AWS Bedrock | moonshotai.kimi-k2-thinking | 262K | $0.73 | $3.03 |
| AWS Bedrock | qwen.qwen3-coder-next | 262K | $0.50 | $1.20 |
| AWS Bedrock | nvidia.nemotron-nano-3-30b | 262K | $0.06 | $0.24 |
| Azure AI | azure_ai/kimi-k2.5 | 262K | $0.60 | $3.00 |
| Vertex AI | vertex_ai/zai-org/glm-5-maas | 200K | $1.00 | $3.20 |
| MiniMax | minimax/MiniMax-M2.5 | 1M | $0.30 | $1.20 |
| MiniMax | minimax/MiniMax-M2.5-lightning | 1M | $0.30 | $2.40 |
| Dashscope | dashscope/qwen3-max | 258K | Tiered pricing | Tiered pricing |
| Perplexity | perplexity/preset/pro-search | - | Per-request | Per-request |
| Perplexity | perplexity/openai/gpt-4o | - | Per-request | Per-request |
| Perplexity | perplexity/openai/gpt-5.2 | - | Per-request | Per-request |
| Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-opus-4.6 | 200K | $5.00 | $25.00 |
| Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-sonnet-4 | 200K | $3.00 | $15.00 |
| Vercel AI Gateway | vercel_ai_gateway/anthropic/claude-haiku-4.5 | 200K | $1.00 | $5.00 |
| Sarvam AI | sarvam/sarvam-m | 8K | Free tier | Free tier |
| Anthropic | fast/claude-opus-4-6 | 1M | $30.00 | $150.00 |

Note: AWS Bedrock models are available across multiple regions (us-east-1, us-east-2, us-west-2, eu-central-1, eu-north-1, ap-northeast-1, ap-south-1, ap-southeast-3, sa-east-1). 54 regional model entries were added in total.
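
As a quick illustrative call against one of the new Bedrock entries, aws_region_name selects among the regions above. Treat the model string as a sketch; the exact Bedrock model ID available in your account may differ.

```python
import litellm

# Illustrative sketch: call a newly added Bedrock model from a specific region.
# Standard AWS credentials are read from the environment; the model string
# mirrors the table entry above and may need adjusting to your account's model ID.
response = litellm.completion(
    model="bedrock/deepseek.v3.2",
    messages=[{"role": "user", "content": "Write a haiku about autoscaling."}],
    aws_region_name="us-east-1",
)
print(response.choices[0].message.content)
```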

Features

  • Anthropic

    • Enable non-tool structured outputs on Claude Opus 4.5 and 4.6 using output_format param - PR #20548
    • Add support for anthropic_messages call type in prompt caching - PR #19233
    • Managing Anthropic Beta Headers with remote URL fetching - PR #20935, PR #21110
    • Remove x-anthropic-billing block - PR #20951
    • Use Authorization Bearer for OAuth tokens instead of x-api-key - PR #21039
    • Filter unsupported JSON schema constraints for structured outputs - PR #20813
    • New Claude Opus 4.6 features for /v1/messages - PR #20733
    • Fix reasoning_effort=None and "none" should return None for Opus 4.6 - PR #20800
  • AWS Bedrock

    • Extend model support with 4 new beta models - PR #21035
    • Add Claude Opus 4.6 to _supports_tool_search_on_bedrock - PR #21017
    • Correct Bedrock Claude Opus 4.6 model IDs (remove :0 suffix) - PR #20564, PR #20671
    • Add output_config as supported param - PR #20748
  • Vertex AI

    • Add Vertex GLM-5 model support - PR #21053
    • Propagate extra_headers anthropic-beta to request body - PR #20666
    • Preserve usageMetadata in _hidden_params - PR #20559
    • Map IMAGE_PROHIBITED_CONTENT to content_filter - PR #20524
    • Add RAG ingest for Vertex AI - PR #21120
  • OCI / Cohere

    • OCI Cohere responseFormat/Pydantic support - PR #20663
    • Fix OCI Cohere system messages by populating preambleOverride - PR #20958
  • Perplexity

    • Perplexity Research API support with preset search - PR #20860
  • MiniMax

    • Add MiniMax-M2.5 and MiniMax-M2.5-lightning models - PR #21054
  • Kimi / Moonshot

  • Dashscope

    • Add dashscope/qwen3-max model with tiered pricing - PR #20919
  • Vercel AI Gateway

    • Add new Vercel AI Anthropic models - PR #20745
  • Azure AI

    • Add azure_ai/kimi-k2.5 to Azure model DB - PR #20896
    • Support Azure AD token auth for non-Claude azure_ai models - PR #20981
    • Fix Azure batches issues - PR #21092
  • DeepSeek

    • Sync DeepSeek model metadata and add bare-name fallback - PR #20938
  • Gemini

    • Handle image in assistant message for Gemini - PR #20845
    • Add missing tpm/rpm for Gemini models - PR #21175
  • General

    • Add 30 missing models to pricing JSON - PR #20797
    • Cleanup 39 deprecated OpenRouter models - PR #20786
    • Standardize endpoint display_name naming convention - PR #20791
    • Fix and stabilize model cost map formatting - PR #20895
    • Export PermissionDeniedError from litellm.__init__ - PR #20960

Bug Fixes


LLM API Endpoints

Features

  • Responses API

    • Add server-side context management (compaction) support - PR #21058
    • Add Shell tool support for OpenAI Responses API - PR #21063
    • Preserve tool call argument deltas when streaming id is omitted - PR #20712
    • Preserve interleaved thinking/redacted_thinking blocks during streaming - PR #20702
  • Chat Completions

    • Add Web Search support using LiteLLM /search (web search interception hook) - PR #20483
    • Preserve nullable object fields by carrying schema properties - PR #19132
    • Support prompt_cache_key for OpenAI and Azure chat completions (see the sketch after this list) - PR #20989
  • Pass-Through Endpoints

    • Add support for langchain_aws via LiteLLM passthrough - PR #20843
    • Add custom_body parameter to endpoint_func in create_pass_through_route - PR #20849
  • Vector Stores

    • Add target_model_names for vector store endpoints - PR #21089
  • General

    • Add output_config as supported param - PR #20748
    • Add managed error file support - PR #20838
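
To ground one of the items above: prompt_cache_key support (PR #20989) lets repeated requests reuse a provider-side prompt cache. A minimal sketch, assuming an OpenAI key in the environment and that this release forwards the parameter as-is:

```python
import litellm

# Minimal sketch of passing prompt_cache_key through litellm.completion.
# Requests sharing the same key can hit the provider's prompt cache.
resp = litellm.completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a support assistant."},
        {"role": "user", "content": "Summarize our refund policy."},
    ],
    prompt_cache_key="support-bot-refund-policy",
)
print(resp.choices[0].message.content)
```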

Bugs

  • General
    • Stop leaking Python tracebacks in streaming SSE error responses - PR #20850
    • Fix video list pagination cursors not encoded with provider metadata - PR #20710
    • Handle metadata=None in SDK path retry/error logic - PR #20873
    • Fix Spend logs pickle error with Pydantic models and redaction - PR #20685
    • Remove duplicate PerplexityResponsesConfig from LLM_CONFIG_NAMES - PR #21105

Management Endpoints / UI

Features

  • Access Groups

    • New Access Groups feature for managing model, MCP server, and agent access - PR #21022
    • Access Groups table and details page UI - PR #21165
    • Refactor model_ids to model_names for backwards compatibility - PR #21166
  • Policies

    • Allow connecting Policies to Tags, simulating Policies, viewing key/team counts - PR #20904
    • Guardrail pipeline support for conditional sequential execution - PR #21177
    • Pipeline flow builder UI for guardrail policies - PR #21188
  • SSO / Auth

    • New Login With SSO Button - PR #20908
    • M2M OAuth2 UI Flow - PR #20794
    • Allow Organization and Team Admins to call /invitation/new - PR #20987
    • Invite User: Email Integration Alert - PR #20790
    • Populate identity fields in proxy admin JWT early-return path - PR #21169
  • Spend Logs

    • Show predefined error codes in filter with user-definable fallback - PR #20773
    • Paginated searchable model select - PR #20892
    • Sorting columns support - PR #21143
    • Allow sorting on /spend/logs/ui - PR #20991
  • UI Improvements

    • Navbar: Option to hide Usage Popup - PR #20910
    • Model Page: Improve Credentials Messaging - PR #21076
    • Fallbacks: Default configurable to 10 models - PR #21144
    • Fallback display with arrows and card structure - PR #20922
    • Team Info: Migrate to AntD Tabs + Table - PR #20785
    • AntD refactoring and 0 cost models fix - PR #20687
    • Zscaler AI Guard UI - PR #21077
    • Include Config Defined Pass Through Endpoints - PR #20898
    • Rename "HTTP" to "Streamable HTTP (Recommended)" in MCP server page - PR #21000
    • MCP server discovery UI - PR #21079
  • Virtual Keys

    • Allow Management keys to access user/daily/activity and team - PR #20124
    • Skip premium check for empty metadata fields on team/key update - PR #20598

Bugs

  • Logs: Fix Input and Output Copying - PR #20657
  • Teams: Fix Available Teams - PR #20682
  • Spend Logs: Reset Filters Resets Custom Date Range - PR #21149
  • Usage: Request Chart stack variant fix - PR #20894
  • Add Auto Router: Description Text Input Focus - PR #21004
  • Guardrail Edit: LiteLLM Content Filter Categories - PR #21002
  • Add null guard for models in API keys table - PR #20655
  • Show error details instead of 'Data Not Available' for failed requests - PR #20656
  • Fix Spend Management Tests - PR #21088
  • Fix JWT email domain validation error message - PR #21212

AI Integrations

Logging

  • PostHog

    • Fix JSON serialization error for non-serializable objects - PR #20668
  • Prometheus

    • Sanitize label values to prevent metric scrape failures - PR #20600
  • Langfuse

    • Prevent empty proxy request spans from being sent to Langfuse - PR #19935
  • OpenTelemetry

    • Auto-infer otlp_http exporter when endpoint is configured - PR #20438
  • CloudZero

    • Update CBF field mappings per LIT-1907 - PR #20906
  • General

    • Allow MAX_CALLBACKS override via env var - PR #20781
    • Add standard_logging_payload_excluded_fields config option - PR #20831
    • Enable verbose_logger when LITELLM_LOG=DEBUG - PR #20496
    • Guard against None litellm_metadata in batch logging path - PR #20832
    • Propagate model-level tags from config to SpendLogs - PR #20769

Guardrails

  • Policy Templates

    • New Policy Templates: pre-configured guardrail combinations for specific use-cases - PR #21025
    • Add NSFW policy template, toxic keywords in multiple languages, child safety content filter, JSON content viewer - PR #21205
    • Add toxic/abusive content filter guardrails - PR #20934
  • Pipeline Execution

    • Add guardrail pipeline support for conditional sequential execution - PR #21177
    • Agent Guardrails on streaming output - PR #21206
    • Pipeline flow builder UI - PR #21188
  • Zscaler AI Guard

    • Zscaler AI Guard bug fixes and support during post-call - PR #20801
    • Zscaler AI Guard UI - PR #21077
  • ZGuard

    • Add team policy mapping for ZGuard - PR #20608
  • General

    • Add logging to all unified guardrails + link to custom code guardrail templates - PR #20900
    • Forward request headers + litellm_version to generic guardrails - PR #20729
    • Empty guardrails/policies arrays should not trigger enterprise license check - PR #20567
    • Fix OpenAI moderation guardrails - PR #20718
    • Fix /v2/guardrails/list returning sensitive values - PR #20796
    • Fix guardrail status error - PR #20972
    • Reuse get_instance_fn in initialize_custom_guardrail - PR #20917

Spend Tracking, Budgets and Rate Limiting

  • Prevent shared backend model key from being polluted by per-deployment custom pricing - PR #20679
  • Avoid in-place mutation in SpendUpdateQueue aggregation - PR #20876

MCP Gateway (12 updates)

  • MCP M2M OAuth2 Support - Add support for machine-to-machine OAuth2 for MCP servers - PR #20788
  • MCP Server Discovery UI - Browse and discover available MCP servers from the UI - PR #21079
  • MCP Tracing - Add OpenTelemetry tracing for MCP calls running through AI Gateway - PR #21018
  • MCP OAuth2 Debug Headers - Client-side debug headers for OAuth2 troubleshooting - PR #21151
  • Fix MCP "Session not found" errors - Resolve session persistence issues - PR #21040
  • Fix MCP OAuth2 root endpoints returning "MCP server not found" - PR #20784
  • Fix MCP OAuth2 query param merging when authorization_url already contains params - PR #20968
  • Fix MCP SCOPES on Atlassian issue - PR #21150
  • Fix MCP StreamableHTTP backend - Use anyio.fail_after instead of asyncio.wait_for - PR #20891
  • Inject NPM_CONFIG_CACHE into STDIO MCP subprocess env - PR #21069
  • Block spaces and hyphens in MCP server names and aliases - PR #21074

Performance / Loadbalancing / Reliability improvements (8 improvements)

  • Remove orphan entries from queue - Fix memory leak in scheduler queue - PR #20866
  • Remove repeated provider parsing in budget limiter hot path - PR #21043
  • Use current retry exception for retry backoff instead of stale exception - PR #20725
  • Add Semgrep & fix OOMs - Static analysis rules and out-of-memory fixes - PR #20912
  • Add Pyroscope for continuous profiling and observability - PR #21167
  • Respect ssl_verify with shared aiohttp sessions - PR #20349
  • Fix shared health check serialization - PR #21119
  • Change model mismatch logs from WARNING to DEBUG - PR #20994

Database Changes

Schema Updates

| Table | Change Type | Description | PR | Migration |
|-------|-------------|-------------|----|-----------|
| LiteLLM_VerificationToken | New Indexes | Added indexes on user_id+team_id, team_id, and budget_reset_at+expires | PR #20736 | Migration |
| LiteLLM_PolicyAttachmentTable | New Column | Added tags text array for policy-to-tag connections | PR #21061 | Migration |
| LiteLLM_AccessGroupTable | New Table | Access groups for managing model, MCP server, and agent access | PR #21022 | Migration |
| LiteLLM_AccessGroupTable | Column Change | Renamed access_model_ids to access_model_names | PR #21166 | Migration |
| LiteLLM_ManagedVectorStoreTable | New Table | Managed vector store tracking with model mappings | - | Migration |
| LiteLLM_TeamTable, LiteLLM_VerificationToken | New Column | Added access_group_ids text array | PR #21022 | Migration |
| LiteLLM_GuardrailsTable | New Column | Added team_id text column | - | Migration |

Documentation Updates (14 updates)

  • LiteLLM Observatory section added to v1.81.9 release notes - PR #20675
  • Callback registration optimization added to release notes - PR #20681
  • Middleware performance blog post - PR #20677
  • UI Team Soft Budget documentation - PR #20669
  • UI Contributing and Troubleshooting guide - PR #20674
  • Reorganize Admin UI subsection - PR #20676
  • SDK proxy authentication (OAuth2/JWT auto-refresh) - PR #20680
  • Forward client headers to LLM API documentation fix - PR #20768
  • Add docs guide for using policies - PR #20914
  • Add native thinking param examples for Claude Opus 4.6 - PR #20799
  • Fix Claude Code MCP tutorial - PR #21145
  • Add API base URLs for Dashscope (International and China/Beijing) - PR #21083
  • Fix DEFAULT_NUM_WORKERS_LITELLM_PROXY default (1, not 4) - PR #21127
  • Correct ElevenLabs support status in README - PR #20643

New Contributors

  • @iver56 made their first contribution in PR #20643
  • @eliasaronson made their first contribution in PR #20666
  • @NirantK made their first contribution in PR #19656
  • @looksgood made their first contribution in PR #20919
  • @kelvin-tran made their first contribution in PR #20548
  • @bluet made their first contribution in PR #20873
  • @itayov made their first contribution in PR #20729
  • @CSteigstra made their first contribution in PR #20960
  • @rahulrd25 made their first contribution in PR #20569
  • @muraliavarma made their first contribution in PR #20598
  • @joaokopernico made their first contribution in PR #21039
  • @datzscaler made their first contribution in PR #21077
  • @atapia27 made their first contribution in PR #20922
  • @fpagny made their first contribution in PR #21121
  • @aidankovacic-8451 made their first contribution in PR #21119
  • @luisgallego-aily made their first contribution in PR #19935

Full Changelog

v1.81.9.rc.1...v1.81.12.rc.1