AiBouncer

AI-powered HTTP request classification for Ruby on Rails. Detect SQL injection, XSS, SSRF, and 15 other attack types using ML embeddings.

Features

  • Fast: ~2ms inference time (memory mode)
  • Lightweight: ~32MB total model size
  • 18 attack types: SQLi, XSS, SSRF, XXE, SSTI, Log4Shell, and more
  • Calibrated confidence: Platt scaling for meaningful probability scores
  • IP/path allowlist/blocklist: Skip or block by IP (CIDR) or path (wildcards)
  • Request caching: Thread-safe LRU cache with TTL
  • Structured logging: JSON or text format, compatible with ELK/Datadog
  • ActiveSupport::Notifications: classify.ai_bouncer and attack_detected.ai_bouncer events
  • Hot reload: AiBouncer.reload! or SIGUSR2 signal, with file watcher in development
  • Dashboard: Mountable Rails engine with real-time threat monitoring
  • Flexible storage: In-memory or PostgreSQL + pgvector
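Several of the features above have simple cores. The request cache, for instance, pairs LRU eviction with a TTL; a self-contained sketch of that combination (illustrative only, not the gem's internal cache — class and method names here are made up):

```ruby
require "monitor"

# Minimal thread-safe LRU cache with TTL (a sketch of the idea).
class TtlLruCache
  def initialize(max_size: 1000, ttl: 300)
    @max_size, @ttl = max_size, ttl
    @store = {}              # Ruby hashes preserve insertion order
    @lock = Monitor.new
  end

  def fetch(key)
    @lock.synchronize do
      entry = @store.delete(key)          # remove so a re-insert moves it to the back
      if entry && Time.now - entry[:at] < @ttl
        @store[key] = entry               # mark as most recently used
        return entry[:value]
      end
      value = yield                       # miss or expired: recompute
      @store[key] = { value: value, at: Time.now }
      @store.shift while @store.size > @max_size  # evict least recently used
      value
    end
  end
end
```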

Attack Types Detected

  • Injection: SQL Injection, NoSQL Injection, Command Injection, LDAP Injection, CRLF Injection
  • Client-side: Cross-Site Scripting (XSS), Open Redirect
  • Server-side: SSRF, XXE, SSTI, Log4Shell, Prototype Pollution
  • Protocol: HTTP Request Smuggling, Host Header Injection
  • Access: Path Traversal, Credential Stuffing
  • Recon: Vulnerability Scanners, Spam Bots

Requirements

  • Ruby >= 3.2 (required by onnxruntime)
  • Rails 6.1+ (optional, for middleware/concern integration)

Installation

Add to your Gemfile:

gem 'ai_bouncer'

Then run the installer:

bundle install
rails generate ai_bouncer:install

This creates config/initializers/ai_bouncer.rb. Model files (~32MB) are auto-downloaded on first request.

Configuration

# config/initializers/ai_bouncer.rb

AiBouncer.configure do |config|
  config.enabled = Rails.env.production?
  config.storage = :memory

  # Paths to protect (for middleware)
  config.protected_paths = ["/login", "/register", "/api/*"]

  # Action when attack detected
  config.action = :block  # :block, :challenge, or :log
  config.threshold = 0.3

  # Model files location
  config.model_path = Rails.root.join("vendor", "ai_bouncer")

  # --- IP/Path Allowlist/Blocklist ---
  config.allowlisted_ips = ["10.0.0.0/8", "172.16.0.0/12"]
  config.blocklisted_ips = ["1.2.3.4"]
  config.allowlisted_paths = ["/health", "/metrics"]
  config.blocklisted_paths = ["/wp-admin/*", "/xmlrpc.php"]

  # --- Caching ---
  config.cache_enabled = true
  config.cache_ttl = 300        # seconds
  config.cache_max_size = 1000  # entries

  # --- Structured Logging ---
  config.log_format = :json          # :json or :text
  config.log_classifications = false  # when true, log every classification, not just attacks

  # --- Hot Reload ---
  config.signal_reload = true       # reload on SIGUSR2
  config.watch_model_files = true   # dev mode file watcher

  # --- Dashboard ---
  config.dashboard_enabled = true
  config.dashboard_auth = -> { authenticate_or_request_with_http_basic { |user, pass| user == "admin" && pass == ENV["DASHBOARD_PW"] } }
  config.event_store_size = 10_000

  # --- Callbacks ---
  config.on_attack_detected = ->(request:, classification:, action:) {
    Rails.logger.warn "Attack: #{classification[:label]} from #{request.ip}"
  }
end
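The allowlist/blocklist entries above use CIDR ranges and shell-style path wildcards; Ruby's standard library covers both. A sketch of the matching semantics (my assumption about how such entries behave, not the gem's actual matcher):

```ruby
require "ipaddr"

# CIDR match: does a client IP fall inside an allow/block range?
def ip_in_range?(cidr, ip)
  IPAddr.new(cidr).include?(IPAddr.new(ip))
end

# Wildcard path match: "*" behaves like a shell glob
def path_matches?(pattern, path)
  File.fnmatch?(pattern, path)
end
```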

Usage

Option 1: Middleware (Automatic)

The middleware automatically protects configured paths:

# POST /login with body: username=admin'--&password=x
# => { label: "sqli", confidence: 0.94, is_attack: true }

Option 2: Controller Concern

class SessionsController < ApplicationController
  include AiBouncer::ControllerConcern

  protect_from_attacks only: [:create], threshold: 0.5, action: :block
end

Option 3: Manual Classification

result = AiBouncer.classify(
  AiBouncer.request_to_text(
    method: "POST",
    path: "/login",
    body: "username=admin'--&password=x",
    user_agent: "python-requests/2.28"
  )
)
# => { label: "sqli", confidence: 0.94, is_attack: true, latency_ms: 2.1 }
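The returned hash can drive your own policy. A small sketch of acting on it (the hash shape is taken from the example above; `action_for` is a hypothetical helper, and the thresholds are illustrative):

```ruby
# Map a classification result to an action, mirroring the gem's
# :block / :log options. Not part of the gem's API.
def action_for(result, threshold: 0.5)
  return :allow unless result[:is_attack]
  result[:confidence] >= threshold ? :block : :log
end
```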

Dashboard

Mount the dashboard engine to monitor threats in real-time:

# Dashboard is auto-mounted at /ai_bouncer when dashboard_enabled = true
# Or mount manually in routes.rb:
mount AiBouncer::Engine, at: "/ai_bouncer"

The dashboard shows:

  • Total requests, attack count, attack rate
  • Attack type distribution
  • Recent attacks with details (label, confidence, method, path, IP)
  • Recent requests with cache status

Hot Reload

Reload the model without restarting the server:

# Programmatically
AiBouncer.reload!

# Via signal (when signal_reload = true)
kill -USR2 <pid>

In development, the file watcher automatically reloads when vectors.bin or labels.json changes.
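The SIGUSR2 path relies on Ruby's standard Signal.trap mechanism. A self-contained sketch of the pattern, with a flag standing in for the gem's reload! call:

```ruby
# Trap SIGUSR2; in the gem the handler would call AiBouncer.reload!
reloaded = false
Signal.trap("USR2") { reloaded = true }

Process.kill("USR2", Process.pid)  # simulate `kill -USR2 <pid>`
sleep 0.1                          # give the handler a checkpoint to run at
```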

Structured Logging

JSON format output (compatible with ELK, Datadog, Splunk):

{"timestamp":"2026-02-05T12:00:00.000Z","event":"attack_detected","label":"sqli","confidence":0.95,"is_attack":true,"latency_ms":2.1,"method":"POST","path":"/login","ip":"1.2.3.4"}

ActiveSupport::Notifications

Subscribe to classification events:

ActiveSupport::Notifications.subscribe("classify.ai_bouncer") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  StatsD.increment("ai_bouncer.classify", tags: ["label:#{event.payload[:label]}"])
end

ActiveSupport::Notifications.subscribe("attack_detected.ai_bouncer") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  Sentry.capture_message("Attack detected: #{event.payload[:label]}")
end

Storage Modes

Memory Mode (Default)

Vectors are kept in memory. Fast and simple.

config.storage = :memory

Database Mode

Vectors are stored in PostgreSQL using pgvector. This scales to larger pattern sets and lets you add custom patterns at runtime.

config.storage = :database

Then generate the table, run the migration, and seed the bundled patterns:

rails generate ai_bouncer:migration
rails db:migrate
rails ai_bouncer:seed

Model Files

The model is hosted on Hugging Face: khasinski/ai-bouncer

File                  Size    Description
embedding_model.onnx  29 MB   Model2Vec ONNX model
vocab.json            550 KB  Tokenizer vocabulary
vectors.bin           3.2 MB  ~3,300 attack pattern vectors
labels.json           72 KB   Labels, metadata, calibration params

How It Works

  1. Tokenize: Request -> Unigram tokens (trie-based longest match)
  2. Embed: Tokens -> 256-dim vector (Model2Vec via ONNX)
  3. Search: Find k=5 nearest attack patterns (cosine similarity)
  4. Vote: Distance-weighted voting on attack type
  5. Calibrate: Platt scaling for meaningful confidence scores
  6. Decide: Block if confidence > threshold
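Steps 3–5 above can be sketched in a few lines of plain Ruby. This is a toy with 2-dim vectors and made-up Platt coefficients, not the gem's pipeline (which runs on 256-dim embeddings with calibration parameters loaded from labels.json):

```ruby
# Cosine similarity between two equal-length vectors
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

def classify(query, patterns, k: 5, platt_a: -4.0, platt_b: 2.0)
  # Step 3: find the k nearest attack patterns by cosine similarity
  neighbors = patterns
    .map { |label, vec| [label, cosine(query, vec)] }
    .max_by(k) { |_, sim| sim }

  # Step 4: distance-weighted vote (closer neighbors count more)
  votes = Hash.new(0.0)
  neighbors.each { |label, sim| votes[label] += sim }
  label, score = votes.max_by { |_, v| v }

  # Step 5: Platt scaling maps the raw score to a calibrated probability
  confidence = 1.0 / (1.0 + Math.exp(platt_a * score + platt_b))
  { label: label, confidence: confidence.round(2), is_attack: confidence > 0.5 }
end
```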

License

MIT License.

Contributing

  1. Fork it
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Report issues at github.com/khasinski/ai_bouncer