# ai_safety_rails
A Ruby gem that adds (1) a guardrails/safety layer around LLM calls and (2) an evaluation/regression harness for prompts and models. Provider-agnostic: works with RubyLLM or any client that exposes a simple request/response interface.
## Features
### Part 1 – Guardrails (middleware-style)
- Input guardrails
  - PII redaction: mask or strip emails, phone numbers, and SSN-like patterns (configurable regexes; see the sketch after this list).
  - Optional prompt-injection heuristics (e.g. an "ignore previous instructions" blocklist).
  - Optional max input length and rate limiting (in-memory or Redis).
- Output guardrails
  - Schema validation: validate LLM output against a JSON Schema (via `json_schemer`).
  - Optional blocklist/allowlist for topics or sensitive keywords.
- Integration: wraps any callable client: input guardrails → call client → output guardrails → return. Sync (and optionally async) support.
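For intuition, here is a minimal sketch of the kind of regex-based masking the PII redactor performs. The patterns and placeholder tokens below are illustrative, not the gem's defaults:

```ruby
# Illustrative only: regex-based PII masking of the sort an input guardrail does.
EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
PHONE_RE = /\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/
SSN_RE   = /\b\d{3}-\d{2}-\d{4}\b/

def redact_pii(text)
  text.gsub(EMAIL_RE, "[EMAIL]")
      .gsub(PHONE_RE, "[PHONE]")
      .gsub(SSN_RE, "[SSN]")
end

redact_pii("Reach me at user@example.com or 555-123-4567.")
# => "Reach me at [EMAIL] or [PHONE]."
```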
### Part 2 – Evaluation harness
- Test sets: define evaluation sets (YAML/JSON) with an input plus an expected output or criteria (e.g. "must include key X", "valid JSON", "no PII").
- Runner: runs all examples and records latency, token usage (if exposed), and pass/fail per criterion.
- Regression: save a baseline to JSON; compare runs and exit non-zero if metrics regress beyond a threshold (see the sketch after this list).
- CI-friendly: CLI and Rake tasks (e.g. `bundle exec ai_safety_rails eval path/to/evals`).
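The regression gate boils down to a comparison like the following sketch; the baseline path and the `pass_rate` field are assumptions about the baseline format, not the gem's actual schema:

```ruby
require "json"

# Illustrative regression gate: fail CI when the pass rate drops more than
# the allowed threshold relative to the saved baseline.
baseline = JSON.parse(File.read("tmp/eval_baseline.json")) # e.g. {"pass_rate" => 0.95}
current_pass_rate = 0.90                                   # produced by the current run
threshold = 0.02                                           # allow up to a 2-point drop

drop = baseline["pass_rate"] - current_pass_rate
if drop > threshold
  warn format("pass rate regressed by %.1f points (threshold %.1f)", drop * 100, threshold * 100)
  exit 1
end
```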
### Part 3 – Rails-friendly (optional)
- Generators: `rails g ai_safety_rails:guardrail pii_redaction`, `rails g ai_safety_rails:eval_set support_tickets`.
- Config: optional `config/guardrails.yml` and eval sets under `config/llm_evals/`.
- Audit logging: optional hook to log when a guardrail fired (Rails logger or audit table; see the sketch after this list).
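Since guardrails are passed to the middleware as plain objects (assumed here to be callables that take and return text, per the usage example below), audit logging can also be done with an ordinary decorator. `LoggedGuardrail` is a hypothetical helper, not part of the gem:

```ruby
require "logger"

# Illustrative decorator: logs whenever the wrapped guardrail changes its
# input, i.e. whenever the guardrail "fired".
class LoggedGuardrail
  def initialize(guardrail, logger: Logger.new($stdout))
    @guardrail = guardrail
    @logger = logger
  end

  def call(text)
    result = @guardrail.call(text)
    @logger.info("guardrail fired: #{@guardrail.class}") if result != text
    result
  end
end
```

Wrapping, say, the PII redactor in `LoggedGuardrail.new(...)` before handing it to the middleware would record every redaction without touching the guardrail itself.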
## Installation
```ruby
# Gemfile
gem "ai_safety_rails"
```

Then install:

```sh
bundle install
```

## Usage
### Guardrails middleware
Wrap any LLM client (callable) with guardrails:
```ruby
client = ->(input) { YourLLMClient.chat(input) }

guarded = AiSafetyRails::Guardrails::Middleware.wrap(
  client,
  input_guardrails: [
    AiSafetyRails::Guardrails::Input::PiiRedactor.new
  ],
  output_guardrails: [
    AiSafetyRails::Guardrails::Output::SchemaValidator.new(schema: my_json_schema_hash)
  ]
)

response = guarded.call("Hello, my email is user@example.com")
# Input is redacted before calling the client; output is validated before returning.
```
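Here `my_json_schema_hash` is a plain JSON Schema expressed as a Ruby hash, which is what `json_schemer` consumes. The schema below is illustrative:

```ruby
# Illustrative JSON Schema: require an object with a string "category" key.
my_json_schema_hash = {
  "type"       => "object",
  "required"   => ["category"],
  "properties" => {
    "category" => { "type" => "string" }
  }
}
```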
### Evaluation harness

1. Define an eval set (e.g. `config/llm_evals/support_tickets.yaml`):

```yaml
name: support_tickets
description: Support ticket classification
examples:
  - id: 1
    input: "Customer cannot login"
    expectations:
      - type: valid_json
      - type: has_key
        key: category
```

2. Run evals via CLI or Rake:
```sh
bundle exec ai_safety_rails eval config/llm_evals
```
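The two expectation types used above behave roughly as in this Ruby sketch of the semantics (illustrative, not the gem's code):

```ruby
require "json"

# Rough semantics of the "valid_json" and "has_key" expectation types.
def expectation_passes?(expectation, output)
  case expectation["type"]
  when "valid_json"
    JSON.parse(output)
    true
  when "has_key"
    parsed = JSON.parse(output)
    parsed.is_a?(Hash) && parsed.key?(expectation["key"])
  else
    raise ArgumentError, "unknown expectation type: #{expectation["type"]}"
  end
rescue JSON::ParserError
  false
end

expectation_passes?({ "type" => "has_key", "key" => "category" }, '{"category":"auth"}') # => true
```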
### Rails

When Rails is present, generators and config are available:
```sh
rails g ai_safety_rails:guardrail pii_redaction
rails g ai_safety_rails:eval_set support_tickets
```

An optional `config/guardrails.yml` is loaded automatically.
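The file's exact schema is not documented here; a plausible shape, with every key hypothetical, might be:

```yaml
# config/guardrails.yml (illustrative; key names are hypothetical)
input:
  pii_redaction:
    enabled: true
    patterns:
      - email
      - phone
  max_input_length: 8000
output:
  schema_validation:
    enabled: true
```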
## Development
- Tests: `bundle exec rake test` (minitest)
- Eval CLI: `bundle exec exe/ai_safety_rails eval path/to/evals`
## License
MIT.