AiBouncer

AI-powered HTTP request classification for Ruby on Rails. Detect SQL injection, XSS, SSRF, and 15 other attack types using ML embeddings.

Features

  • Fast: ~2ms inference time (memory mode)
  • Lightweight: ~32MB total model size
  • 18 attack types: SQLi, XSS, SSRF, XXE, SSTI, Log4Shell, and more
  • Calibrated confidence: Platt scaling for meaningful probability scores
  • IP/path allowlist/blocklist: Skip or block by IP (CIDR) or path (wildcards)
  • Request caching: Thread-safe LRU cache with TTL
  • Structured logging: JSON or text format, compatible with ELK/Datadog
  • ActiveSupport::Notifications: classify.ai_bouncer and attack_detected.ai_bouncer events
  • Hot reload: AiBouncer.reload! or SIGUSR2 signal, with file watcher in development
  • Dashboard: Mountable Rails engine with real-time threat monitoring
  • Flexible storage: In-memory or PostgreSQL + pgvector
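Several of the features above have simple cores. The request cache, for instance, pairs LRU eviction with a TTL; a self-contained sketch of that combination (illustrative only, not the gem's internal cache — class and method names here are made up):

```ruby
require "monitor"

# Minimal thread-safe LRU cache with TTL (a sketch of the idea).
class TtlLruCache
  def initialize(max_size: 1000, ttl: 300)
    @max_size, @ttl = max_size, ttl
    @store = {}              # Ruby hashes preserve insertion order
    @lock = Monitor.new
  end

  def fetch(key)
    @lock.synchronize do
      entry = @store.delete(key)          # remove so a re-insert moves it to the back
      if entry && Time.now - entry[:at] < @ttl
        @store[key] = entry               # mark as most recently used
        return entry[:value]
      end
      value = yield                       # miss or expired: recompute
      @store[key] = { value: value, at: Time.now }
      @store.shift while @store.size > @max_size  # evict least recently used
      value
    end
  end
end
```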

Attack Types Detected

  • Injection: SQL Injection, NoSQL Injection, Command Injection, LDAP Injection, CRLF Injection
  • Client-side: Cross-Site Scripting (XSS), Open Redirect
  • Server-side: SSRF, XXE, SSTI, Log4Shell, Prototype Pollution
  • Protocol: HTTP Request Smuggling, Host Header Injection
  • Access: Path Traversal, Credential Stuffing
  • Recon: Vulnerability Scanners, Spam Bots

Requirements

  • Ruby >= 3.2 (required by onnxruntime)
  • Rails 6.1+ (optional, for middleware/concern integration)

Installation

Add to your Gemfile:

gem 'ai_bouncer'

Then run the installer:

bundle install
rails generate ai_bouncer:install

This creates config/initializers/ai_bouncer.rb. Model files (~32MB) are auto-downloaded on first request.

Configuration

# config/initializers/ai_bouncer.rb

AiBouncer.configure do |config|
  config.enabled = Rails.env.production?
  config.storage = :memory

  # Paths to protect (for middleware)
  config.protected_paths = ["/login", "/register", "/api/*"]

  # Action when attack detected
  config.action = :block  # :block, :challenge, or :log
  config.threshold = 0.3

  # Model files location
  config.model_path = Rails.root.join("vendor", "ai_bouncer")

  # --- IP/Path Allowlist/Blocklist ---
  config.allowlisted_ips = ["10.0.0.0/8", "172.16.0.0/12"]
  config.blocklisted_ips = ["1.2.3.4"]
  config.allowlisted_paths = ["/health", "/metrics"]
  config.blocklisted_paths = ["/wp-admin/*", "/xmlrpc.php"]

  # --- Caching ---
  config.cache_enabled = true
  config.cache_ttl = 300        # seconds
  config.cache_max_size = 1000  # entries

  # --- Structured Logging ---
  config.log_format = :json          # :json or :text
  config.log_classifications = false  # when true, log every classification, not just attacks

  # --- Hot Reload ---
  config.signal_reload = true       # reload on SIGUSR2
  config.watch_model_files = true   # dev mode file watcher

  # --- Dashboard ---
  config.dashboard_enabled = true
  config.dashboard_auth = -> { authenticate_or_request_with_http_basic { |user, pass| user == "admin" && pass == ENV["DASHBOARD_PW"] } }
  config.event_store_size = 10_000

  # --- Callbacks ---
  config.on_attack_detected = ->(request:, classification:, action:) {
    Rails.logger.warn "Attack: #{classification[:label]} from #{request.ip}"
  }
end
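The allowlist/blocklist entries above use CIDR ranges and shell-style path wildcards; Ruby's standard library covers both. A sketch of the matching semantics (my assumption about how such entries behave, not the gem's actual matcher):

```ruby
require "ipaddr"

# CIDR match: does a client IP fall inside an allow/block range?
def ip_in_range?(cidr, ip)
  IPAddr.new(cidr).include?(IPAddr.new(ip))
end

# Wildcard path match: "*" behaves like a shell glob
def path_matches?(pattern, path)
  File.fnmatch?(pattern, path)
end
```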

Usage

Option 1: Middleware (Automatic)

The middleware automatically protects configured paths:

# POST /login with body: username=admin'--&password=x
# => { label: "sqli", confidence: 0.94, is_attack: true }

Option 2: Controller Concern

class SessionsController < ApplicationController
  include AiBouncer::ControllerConcern

  protect_from_attacks only: [:create], threshold: 0.5, action: :block
end

Option 3: Manual Classification

result = AiBouncer.classify(
  AiBouncer.request_to_text(
    method: "POST",
    path: "/login",
    body: "username=admin'--&password=x",
    user_agent: "python-requests/2.28"
  )
)
# => { label: "sqli", confidence: 0.94, is_attack: true, latency_ms: 2.1 }
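The returned hash can drive your own policy. A small sketch of acting on it (the hash shape is taken from the example above; `action_for` is a hypothetical helper, and the thresholds are illustrative):

```ruby
# Map a classification result to an action, mirroring the gem's
# :block / :log options. Not part of the gem's API.
def action_for(result, threshold: 0.5)
  return :allow unless result[:is_attack]
  result[:confidence] >= threshold ? :block : :log
end
```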

Dashboard

Mount the dashboard engine to monitor threats in real-time:

# Dashboard is auto-mounted at /ai_bouncer when dashboard_enabled = true
# Or mount manually in routes.rb:
mount AiBouncer::Engine, at: "/ai_bouncer"

The dashboard shows:

  • Total requests, attack count, attack rate
  • Attack type distribution
  • Recent attacks with details (label, confidence, method, path, IP)
  • Recent requests with cache status

Hot Reload

Reload the model without restarting the server:

# Programmatically
AiBouncer.reload!

# Via signal (when signal_reload = true)
kill -USR2 <pid>

In development, the file watcher automatically reloads when vectors.bin or labels.json changes.
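The SIGUSR2 path relies on Ruby's standard Signal.trap mechanism. A self-contained sketch of the pattern, with a flag standing in for the gem's reload! call:

```ruby
# Trap SIGUSR2; in the gem the handler would call AiBouncer.reload!
reloaded = false
Signal.trap("USR2") { reloaded = true }

Process.kill("USR2", Process.pid)  # simulate `kill -USR2 <pid>`
sleep 0.1                          # give the handler a checkpoint to run at
```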

Structured Logging

JSON format output (compatible with ELK, Datadog, Splunk):

{"timestamp":"2026-02-05T12:00:00.000Z","event":"attack_detected","label":"sqli","confidence":0.95,"is_attack":true,"latency_ms":2.1,"method":"POST","path":"/login","ip":"1.2.3.4"}

ActiveSupport::Notifications

Subscribe to classification events:

ActiveSupport::Notifications.subscribe("classify.ai_bouncer") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  StatsD.increment("ai_bouncer.classify", tags: ["label:#{event.payload[:label]}"])
end

ActiveSupport::Notifications.subscribe("attack_detected.ai_bouncer") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  Sentry.capture_message("Attack detected: #{event.payload[:label]}")
end

Storage Modes

Memory Mode (Default)

Vectors are kept in memory. Fast and simple.

config.storage = :memory

Database Mode

Vectors are stored in PostgreSQL using pgvector. This scales to larger pattern sets and lets you add custom patterns at runtime.

config.storage = :database

Then generate the table, run the migration, and seed the bundled patterns:

rails generate ai_bouncer:migration
rails db:migrate
rails ai_bouncer:seed

Model Files

The model is hosted on Hugging Face: khasinski/ai-bouncer

File                  Size    Description
embedding_model.onnx  29 MB   Model2Vec ONNX model
vocab.json            550 KB  Tokenizer vocabulary
vectors.bin           3.2 MB  ~3,300 attack pattern vectors
labels.json           72 KB   Labels, metadata, calibration params

How It Works

  1. Tokenize: Request -> Unigram tokens (trie-based longest match)
  2. Embed: Tokens -> 256-dim vector (Model2Vec via ONNX)
  3. Search: Find k=5 nearest attack patterns (cosine similarity)
  4. Vote: Distance-weighted voting on attack type
  5. Calibrate: Platt scaling for meaningful confidence scores
  6. Decide: Block if confidence > threshold
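Steps 3–5 above can be sketched in a few lines of plain Ruby. This is a toy with 2-dim vectors and made-up Platt coefficients, not the gem's pipeline (which runs on 256-dim embeddings with calibration parameters loaded from labels.json):

```ruby
# Cosine similarity between two equal-length vectors
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

def classify(query, patterns, k: 5, platt_a: -4.0, platt_b: 2.0)
  # Step 3: find the k nearest attack patterns by cosine similarity
  neighbors = patterns
    .map { |label, vec| [label, cosine(query, vec)] }
    .max_by(k) { |_, sim| sim }

  # Step 4: distance-weighted vote (closer neighbors count more)
  votes = Hash.new(0.0)
  neighbors.each { |label, sim| votes[label] += sim }
  label, score = votes.max_by { |_, v| v }

  # Step 5: Platt scaling maps the raw score to a calibrated probability
  confidence = 1.0 / (1.0 + Math.exp(platt_a * score + platt_b))
  { label: label, confidence: confidence.round(2), is_attack: confidence > 0.5 }
end
```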

License

MIT License.

Contributing

  1. Fork it
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Report issues at github.com/khasinski/ai_bouncer