# ai_safety_rails
A Ruby gem that adds (1) a guardrails/safety layer around LLM calls and (2) an evaluation/regression harness for prompts and models. Provider-agnostic: works with RubyLLM or any client that exposes a simple request/response interface.
## Features
### Part 1 – Guardrails (middleware-style)
- Input guardrails
  - PII redaction: mask or strip emails, phone numbers, and SSN-like patterns (configurable regexes; see the sketch after this list).
  - Optional prompt-injection heuristics (e.g. an "ignore previous instructions" blocklist).
  - Optional max input length and rate limiting (in-memory or Redis).
- Output guardrails
  - Schema validation: validate LLM output against a JSON Schema (via `json_schemer`).
  - Optional blocklist/allowlist for topics or sensitive keywords.
- Integration: wraps any callable client: input guardrails → call client → output guardrails → return. Sync (and optionally async) support.
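For intuition, here is a minimal sketch of the kind of regex-based masking the PII redactor performs. The patterns and placeholder tokens below are illustrative, not the gem's defaults:

```ruby
# Illustrative only: regex-based PII masking of the sort an input guardrail does.
EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
PHONE_RE = /\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/
SSN_RE   = /\b\d{3}-\d{2}-\d{4}\b/

def redact_pii(text)
  text.gsub(EMAIL_RE, "[EMAIL]")
      .gsub(PHONE_RE, "[PHONE]")
      .gsub(SSN_RE, "[SSN]")
end

redact_pii("Reach me at user@example.com or 555-123-4567.")
# => "Reach me at [EMAIL] or [PHONE]."
```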
### Part 2 – Evaluation harness
- Test sets: define evaluation sets (YAML/JSON) with an input plus an expected output or criteria (e.g. "must include key X", "valid JSON", "no PII").
- Runner: runs all examples and records latency, token usage (if exposed), and pass/fail per criterion.
- Regression: save a baseline to JSON; compare runs and exit non-zero if metrics regress beyond a threshold (see the sketch after this list).
- CI-friendly: CLI and Rake tasks (e.g. `bundle exec ai_safety_rails eval path/to/evals`).
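The regression gate boils down to a comparison like the following sketch; the baseline path and the `pass_rate` field are assumptions about the baseline format, not the gem's actual schema:

```ruby
require "json"

# Illustrative regression gate: fail CI when the pass rate drops more than
# the allowed threshold relative to the saved baseline.
baseline = JSON.parse(File.read("tmp/eval_baseline.json")) # e.g. {"pass_rate" => 0.95}
current_pass_rate = 0.90                                   # produced by the current run
threshold = 0.02                                           # allow up to a 2-point drop

drop = baseline["pass_rate"] - current_pass_rate
if drop > threshold
  warn format("pass rate regressed by %.1f points (threshold %.1f)", drop * 100, threshold * 100)
  exit 1
end
```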
### Part 3 – Rails-friendly (optional)
- Generators: `rails g ai_safety_rails:guardrail pii_redaction`, `rails g ai_safety_rails:eval_set support_tickets`.
- Config: optional `config/guardrails.yml` and eval sets under `config/llm_evals/`.
- Audit logging: optional hook to log when a guardrail fired (Rails logger or audit table; see the sketch after this list).
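Since guardrails are passed to the middleware as plain objects (assumed here to be callables that take and return text, per the usage example below), audit logging can also be done with an ordinary decorator. `LoggedGuardrail` is a hypothetical helper, not part of the gem:

```ruby
require "logger"

# Illustrative decorator: logs whenever the wrapped guardrail changes its
# input, i.e. whenever the guardrail "fired".
class LoggedGuardrail
  def initialize(guardrail, logger: Logger.new($stdout))
    @guardrail = guardrail
    @logger = logger
  end

  def call(text)
    result = @guardrail.call(text)
    @logger.info("guardrail fired: #{@guardrail.class}") if result != text
    result
  end
end
```

Wrapping, say, the PII redactor in `LoggedGuardrail.new(...)` before handing it to the middleware would record every redaction without touching the guardrail itself.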
## Installation
```ruby
# Gemfile
gem "ai_safety_rails"
```

Then install:

```sh
bundle install
```

## Usage
### Guardrails middleware
Wrap any LLM client (callable) with guardrails:
```ruby
client = ->(input) { YourLLMClient.chat(input) }

guarded = AiSafetyRails::Guardrails::Middleware.wrap(
  client,
  input_guardrails: [
    AiSafetyRails::Guardrails::Input::PiiRedactor.new
  ],
  output_guardrails: [
    AiSafetyRails::Guardrails::Output::SchemaValidator.new(schema: my_json_schema_hash)
  ]
)

response = guarded.call("Hello, my email is user@example.com")
# Input is redacted before calling the client; output is validated before returning.
```
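Here `my_json_schema_hash` is a plain JSON Schema expressed as a Ruby hash, which is what `json_schemer` consumes. The schema below is illustrative:

```ruby
# Illustrative JSON Schema: require an object with a string "category" key.
my_json_schema_hash = {
  "type"       => "object",
  "required"   => ["category"],
  "properties" => {
    "category" => { "type" => "string" }
  }
}
```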
### Evaluation harness

1. Define an eval set (e.g. `config/llm_evals/support_tickets.yaml`):

```yaml
name: support_tickets
description: Support ticket classification
examples:
  - id: 1
    input: "Customer cannot login"
    expectations:
      - type: valid_json
      - type: has_key
        key: category
```

2. Run evals via CLI or Rake:
```sh
bundle exec ai_safety_rails eval config/llm_evals
```
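The two expectation types used above behave roughly as in this Ruby sketch of the semantics (illustrative, not the gem's code):

```ruby
require "json"

# Rough semantics of the "valid_json" and "has_key" expectation types.
def expectation_passes?(expectation, output)
  case expectation["type"]
  when "valid_json"
    JSON.parse(output)
    true
  when "has_key"
    parsed = JSON.parse(output)
    parsed.is_a?(Hash) && parsed.key?(expectation["key"])
  else
    raise ArgumentError, "unknown expectation type: #{expectation["type"]}"
  end
rescue JSON::ParserError
  false
end

expectation_passes?({ "type" => "has_key", "key" => "category" }, '{"category":"auth"}') # => true
```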
### Rails

When Rails is present, generators and config are available:
```sh
rails g ai_safety_rails:guardrail pii_redaction
rails g ai_safety_rails:eval_set support_tickets
```

An optional `config/guardrails.yml` is loaded automatically.
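The file's exact schema is not documented here; a plausible shape, with every key hypothetical, might be:

```yaml
# config/guardrails.yml (illustrative; key names are hypothetical)
input:
  pii_redaction:
    enabled: true
    patterns:
      - email
      - phone
  max_input_length: 8000
output:
  schema_validation:
    enabled: true
```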
## Development
- Tests: `bundle exec rake test` (minitest)
- Eval CLI: `bundle exec exe/ai_safety_rails eval path/to/evals`
## License
MIT.