ai_safety_rails

A Ruby gem that adds (1) a guardrails/safety layer around LLM calls and (2) an evaluation/regression harness for prompts and models. Provider-agnostic: works with RubyLLM or any client that exposes a simple request/response interface.

Features

Part 1 – Guardrails (middleware-style)

  • Input guardrails
    • PII redaction: mask or strip emails, phone numbers, SSN-like patterns (configurable regexes).
    • Optional prompt-injection heuristics (e.g. "ignore previous instructions" blocklist).
    • Optional max input length and rate limiting (in-memory or Redis).
  • Output guardrails
    • Schema validation: validate LLM output against a JSON schema (via json_schemer).
    • Optional blocklist/allowlist for topics or sensitive keywords.
  • Integration: Wraps any callable client in a pipeline (input guardrails → call client → output guardrails → return), with synchronous support and optional async; see the sketch below.
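
The wrapped client is, conceptually, just this pipeline. A minimal sketch (not the gem's internals), assuming each guardrail responds to #call and returns the possibly transformed value, and that client, input_guardrails and output_guardrails are the objects handed to Middleware.wrap (see Usage below):

# Sketch only: each guardrail transforms or rejects the value before passing it on.
guarded = lambda do |input|
  safe_input = input_guardrails.reduce(input) { |value, guardrail| guardrail.call(value) }
  output     = client.call(safe_input)
  output_guardrails.reduce(output) { |value, guardrail| guardrail.call(value) }
end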

Part 2 – Evaluation harness

  • Test sets: Define evaluation sets (YAML/JSON) with input + expected output or criteria (e.g. "must include key X", "valid JSON", "no PII").
  • Runner: Runs all examples and records latency, token usage (when the client exposes it), and pass/fail per criterion.
  • Regression: Save a baseline to JSON, compare later runs against it, and exit non-zero if metrics regress beyond a threshold (see the sketch after this list).
  • CI-friendly: CLI and Rake tasks (e.g. bundle exec ai_safety_rails eval path/to/evals).
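
The regression check boils down to a threshold comparison against the saved baseline. A sketch with made-up file names and metrics, illustrative only and not the gem's API:

require "json"

baseline  = JSON.parse(File.read("tmp/evals_baseline.json"))  # e.g. {"pass_rate": 0.95}
current   = { "pass_rate" => 0.91 }                           # stand-in for the runner's results
threshold = 0.05

if current["pass_rate"] < baseline["pass_rate"] - threshold
  warn "Eval regression: pass rate #{baseline['pass_rate']} -> #{current['pass_rate']}"
  exit 1
end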

Part 3 – Rails-friendly (optional)

  • Generators: rails g ai_safety_rails:guardrail pii_redaction, rails g ai_safety_rails:eval_set support_tickets.
  • Config: Optional config/guardrails.yml and eval sets under config/llm_evals/.
  • Audit logging: Optional hook to log when a guardrail fired (Rails logger or audit table).
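
The audit hook's configuration surface is not documented here; purely as an illustration (AiSafetyRails.configure and on_guardrail_fired are hypothetical names), it could forward guardrail events to the Rails logger:

# Hypothetical callback name and event shape, for illustration only.
AiSafetyRails.configure do |config|
  config.on_guardrail_fired = lambda do |event|
    Rails.logger.info("[guardrail] #{event[:guardrail]} fired (#{event[:action]})")
  end
end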

Installation

# Gemfile
gem "ai_safety_rails"

Then run:

bundle install

Usage

Guardrails middleware

Wrap any LLM client (callable) with guardrails:

client = ->(input) { YourLLMClient.chat(input) }

guarded = AiSafetyRails::Guardrails::Middleware.wrap(client,
  input_guardrails: [
    AiSafetyRails::Guardrails::Input::PiiRedactor.new
  ],
  output_guardrails: [
    AiSafetyRails::Guardrails::Output::SchemaValidator.new(schema: my_json_schema_hash)
  ]
)

response = guarded.call("Hello, my email is user@example.com")
# Input is redacted before calling the client; output is validated before returning.
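
The my_json_schema_hash above is an ordinary JSON Schema expressed as a Ruby hash (json_schemer can validate against a hash directly). For example, to require a category string in the output:

my_json_schema_hash = {
  "type"       => "object",
  "required"   => ["category"],
  "properties" => {
    "category" => { "type" => "string" }
  }
}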

Evaluation harness

  1. Define an eval set (e.g. config/llm_evals/support_tickets.yaml):
name: support_tickets
description: Support ticket classification
examples:
  - id: 1
    input: "Customer cannot login"
    expectations:
      - type: valid_json
      - type: has_key
        key: category
  2. Run evals via CLI or Rake:
bundle exec ai_safety_rails eval config/llm_evals
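
For CI, a thin Rake wrapper around the CLI is enough to fail the build when an eval run exits non-zero (the task name and eval path are illustrative; the gem's own Rake tasks may differ):

# lib/tasks/llm_evals.rake
namespace :llm do
  desc "Run LLM eval sets; fails if the eval CLI exits non-zero"
  task :eval do
    sh "bundle exec ai_safety_rails eval config/llm_evals"
  end
end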

Rails

When Rails is present, generators and config are available:

rails g ai_safety_rails:guardrail pii_redaction
rails g ai_safety_rails:eval_set support_tickets

An optional config/guardrails.yml is loaded automatically when present.
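
The recognised keys are not documented here, so the ones below are illustrative only; such a file might look like:

# config/guardrails.yml (illustrative keys)
pii_redaction:
  enabled: true
  patterns: [email, phone, ssn]
prompt_injection:
  enabled: true
input:
  max_length: 8000
audit_logging:
  enabled: true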

Development

  • Tests: bundle exec rake test (minitest)
  • Eval CLI: bundle exec exe/ai_safety_rails eval path/to/evals

License

MIT.