guardrails-ruby
Input/output validation and safety framework for LLM applications in Ruby.
Guardrails run before the LLM (input validation) and after (output validation). They catch prompt injection, PII leakage, toxic content, off-topic queries, hallucinated URLs, and more.
Installation
gem "guardrails-ruby"Or install directly:
gem install guardrails-ruby
Quick Start
require "guardrails_ruby"
guard = GuardrailsRuby::Guard.new do
input do
check :prompt_injection
check :pii, action: :redact
check :max_length, chars: 4096
end
output do
check :pii, action: :redact
check :hallucinated_urls, action: :warn
end
end
# Check input
result = guard.check_input("My SSN is 123-45-6789")
result.passed? # => false
result.sanitized # => "My SSN is [SSN REDACTED]"
# Wrap an LLM call
answer = guard.call(user_input) do |sanitized_input|
llm.chat(sanitized_input) # only runs if input checks pass
end
# output is automatically checked tooHow It Works
Input
│
▼
┌──────────────────┐
│ Input Checks │ deterministic first, then LLM-based
│ (in order) │
├──────────────────┤
│ :block → raise │
│ :redact → modify│
│ :warn → log │
│ :log → record │
└────────┬─────────┘
│ sanitized input
▼
┌──────────┐
│ LLM Call │
└────┬─────┘
│ raw output
▼
┌──────────────────┐
│ Output Checks │
│ (in order) │
└────────┬─────────┘
│
▼
Final Output
Built-in Checks
Input Checks
| Check | Type | Description |
|---|---|---|
prompt_injection |
Deterministic | Detect prompt injection / jailbreak attempts |
pii |
Deterministic | Detect SSN, credit cards, emails, phones, IPs, DOB |
toxic_language |
Deterministic | Detect threats, violence, harassment |
topic |
Deterministic | Restrict to allowed topics |
max_length |
Deterministic | Enforce input length limits |
encoding |
Deterministic | Reject malformed unicode, null bytes |
keyword_filter |
Deterministic | Blocklist/allowlist keyword filtering |
Output Checks
| Check | Type | Description |
|---|---|---|
pii |
Deterministic | Don't leak PII in responses |
hallucinated_urls |
Deterministic | Detect URLs not in source context |
hallucinated_emails |
Deterministic | Detect made-up email addresses |
format |
Deterministic | Validate output format (JSON, etc.) |
relevance |
Deterministic | Check answer addresses the question |
competitor_mention |
Deterministic | Redact competitor names |
Actions
Each check can be configured with an action:
-
:block— raisesGuardrailsRuby::Blocked(default) -
:redact— replaces detected content with placeholders -
:warn— passes but logs a warning -
:log— passes silently, records the violation
check :pii, action: :redact # replace PII with [SSN REDACTED], etc.
check :prompt_injection # defaults to :block
check :hallucinated_urls, action: :warnMiddleware
Wrap any LLM client transparently:
safe_llm = GuardrailsRuby::Middleware.new(my_llm_client) do
input do
check :prompt_injection
check :pii, action: :redact
end
output do
check :pii, action: :redact
end
end
response = safe_llm.chat("Tell me about account #12345")
# Input PII redacted before reaching LLM
# Output PII redacted before reaching userRails Integration
# config/initializers/guardrails.rb
GuardrailsRuby.configure do |config|
config.default_input_checks = [:prompt_injection, :pii, :max_length]
config.default_output_checks = [:pii, :hallucinated_urls]
config.on_violation = ->(v) { Rails.logger.warn("Guardrail: #{v}") }
end# app/controllers/chat_controller.rb
class ChatController < ApplicationController
include GuardrailsRuby::Controller
guardrails do
input do
check :prompt_injection
check :pii, action: :redact
end
output do
check :pii, action: :redact
end
end
def create
safe_input = guarded_input # reads params[:message]
answer = MyLLM.chat(safe_input)
render json: { answer: guarded_output(answer) }
rescue GuardrailsRuby::Blocked
render json: { error: "Request blocked." }, status: :unprocessable_entity
end
endCustom Checks
class ProfanityCheck < GuardrailsRuby::Check
check_name :profanity
direction :both
def call(text, context: {})
bad_words = @options.fetch(:words, %w[badword1 badword2])
found = bad_words.select { |w| text.downcase.include?(w) }
if found.any?
fail! "Profanity detected: #{found.join(', ')}",
matches: found,
sanitized: redact(text, found)
else
pass!
end
end
private
def redact(text, words)
result = text.dup
words.each { |w| result.gsub!(/#{Regexp.escape(w)}/i, "[REDACTED]") }
result
end
end
guard = GuardrailsRuby::Guard.new do
input { check :profanity, action: :redact }
endPII Detection
Built-in patterns detect:
| Type | Example | Redacted As |
|---|---|---|
| SSN | 123-45-6789 |
[SSN REDACTED] |
| Credit Card | 4111-1111-1111-1111 |
[CC REDACTED] |
user@example.com |
[EMAIL REDACTED] |
|
| Phone | (555) 123-4567 |
[PHONE REDACTED] |
| IP Address | 192.168.1.1 |
[IP REDACTED] |
| Date of Birth | DOB: 01/15/1990 |
[DOB REDACTED] |
Prompt Injection Detection
Detects common injection patterns:
- "Ignore all previous instructions..."
- "You are now a..."
- "Pretend you're..."
-
[system]/<system>markers - "STOP. Forget everything..."
- And more
Development
bundle install
bundle exec rake test
License
MIT License. See LICENSE.