BotVerification

A Rails gem that verifies whether requests claiming to come from search engine bots (Google, Bing, etc.) or AI bots (GPTBot, PerplexityBot) actually originate from those services.

Why?

User agents are trivial to spoof, so a matching user agent alone is not proof of identity. This gem verifies bot requests in two steps:

IP Range Matching (fast) - Checks the request IP against known IP ranges from official sources
Reverse DNS Verification (authoritative) - Falls back to a DNS lookup if the IP range check fails
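
The two steps above can be sketched in plain Ruby using only the standard library. This is a minimal illustration, not the gem's implementation: the sample range, the hostname suffixes, and the method names (ip_in_known_range?, dns_verified?, sketch_verify) are all assumptions made for this example. The DNS step uses forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP.

```ruby
require "ipaddr"
require "resolv"

# Hypothetical sample data. The gem loads real ranges from official sources.
SAMPLE_RANGES = {
  google: ["66.249.64.0/19"].map { |cidr| IPAddr.new(cidr) }
}.freeze

# Hypothetical hostname suffixes accepted by the DNS fallback.
SAMPLE_DNS_SUFFIXES = {
  google: [".googlebot.com", ".google.com"]
}.freeze

# Step 1: fast check against known CIDR ranges.
def ip_in_known_range?(ip, bot_type, ranges: SAMPLE_RANGES)
  addr = IPAddr.new(ip)
  ranges.fetch(bot_type, []).any? { |range| range.include?(addr) }
rescue IPAddr::InvalidAddressError
  false
end

# Step 2: forward-confirmed reverse DNS.
# The resolver is injectable so the logic can be exercised without network access.
def dns_verified?(ip, bot_type, resolver: Resolv, suffixes: SAMPLE_DNS_SUFFIXES)
  hostname = resolver.getname(ip)
  return false unless suffixes.fetch(bot_type, []).any? { |s| hostname.end_with?(s) }

  # The forward lookup must map back to the original IP.
  resolver.getaddresses(hostname).include?(ip)
rescue Resolv::ResolvError
  false
end

# Range check first, DNS only as a fallback.
def sketch_verify(ip, bot_type, resolver: Resolv)
  ip_in_known_range?(ip, bot_type) || dns_verified?(ip, bot_type, resolver: resolver)
end
```

Checking the hostname suffix alone would not be enough, because anyone can set a PTR record for an IP they control. The forward lookup is what makes the DNS step authoritative.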

Supported Bots

Search Engines (IP + DNS verification)

Google (Googlebot, Google-Extended, etc.)
Bing (Bingbot, BingPreview)
Apple (Applebot)
Yandex (YandexBot)
Baidu (Baiduspider)

AI Bots (IP verification only)

OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot)
Perplexity (PerplexityBot)
Amazon (Amazonbot)

Social Bots (user agent only - no verification available)

Facebook, Twitter, LinkedIn, Slack, Discord, Telegram

Installation

Add to your Gemfile:

gem "bot_verification"

Run the installer:

bundle install
rails generate bot_verification:install
rails db:migrate

Fetch initial IP ranges:

rails bot_verification:refresh

Usage

In Controllers

class MyController < ApplicationController
  include BotVerification::ControllerConcern

  def show
    if verified_good_bot?
      # Request is from a verified search engine bot
      # Serve full content
    elsif verified_good_bot?(mode: :search_and_ai)
      # Request is from a verified AI bot (GPTBot, etc.);
      # verified search engine bots were already handled above
    else
      # Regular user or unverified bot
      # Apply rate limiting, require auth, etc.
    end
  end
end

Verification Modes

# Only verified search engine bots (default, most secure)
verified_good_bot?(mode: :search_engines)

# Search engines + verified AI bots
verified_good_bot?(mode: :search_and_ai)

# All known bot patterns (trusts user agent, least secure)
verified_good_bot?(mode: :all_known)
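
One way to picture the three modes is as progressively wider sets of bot categories. The sketch below is an assumption for illustration only: the mode names come from this README, but the category grouping, the bot type symbols, and the bot_allowed_in_mode? helper are hypothetical and do not describe the gem's internals.

```ruby
# Hypothetical grouping based on the "Supported Bots" lists in this README.
BOT_CATEGORIES = {
  search_engines: %i[google bing apple yandex baidu],
  ai:             %i[openai perplexity amazon],
  social:         %i[facebook twitter linkedin slack discord telegram]
}.freeze

# Each mode widens the set of categories that may count as a "good bot".
MODE_CATEGORIES = {
  search_engines: %i[search_engines],
  search_and_ai:  %i[search_engines ai],
  all_known:      %i[search_engines ai social]
}.freeze

# Returns true if the detected bot type is eligible under the given mode.
# Eligibility is only the first gate; IP/DNS verification would follow.
def bot_allowed_in_mode?(bot_type, mode: :search_engines)
  categories = MODE_CATEGORIES.fetch(mode) do
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
  categories.any? { |category| BOT_CATEGORIES.fetch(category).include?(bot_type) }
end
```

Raising on an unknown mode, rather than silently falling back to the widest one, keeps a typo from weakening the check.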

Check Specifically for AI Bots

if verified_ai_bot?
  # Verified AI bot (GPTBot, PerplexityBot, etc.)
end

# Include unverifiable AI bots like ClaudeBot
if verified_ai_bot?(strict: false)
  # Any recognized AI bot
end

Direct Service Usage

# Verify a request
BotVerification.verify(ip, user_agent)
BotVerification.verify(ip, user_agent, mode: :search_and_ai)

# Check specific IP
BotVerification.verify_ip("66.249.66.1", :google)

# Detect bot type from user agent
BotVerification.detect_bot("Googlebot/2.1") # => :google
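
Detection from the user agent is typically a pattern match. The following is a rough sketch of that idea, assuming a small table of regular expressions; the pattern table and the sketch_detect_bot name are hypothetical, and the gem's real patterns and return values may differ.

```ruby
# Hypothetical pattern table covering a few of the bots listed in this README.
SAMPLE_BOT_PATTERNS = {
  google:     /Googlebot|Google-Extended/i,
  bing:       /bingbot|BingPreview/i,
  openai:     /GPTBot|ChatGPT-User|OAI-SearchBot/i,
  perplexity: /PerplexityBot/i
}.freeze

# Returns the first matching bot type, or nil when nothing matches.
def sketch_detect_bot(user_agent)
  return nil if user_agent.nil? || user_agent.empty?

  SAMPLE_BOT_PATTERNS.each do |bot_type, pattern|
    return bot_type if user_agent.match?(pattern)
  end
  nil
end
```

A match here only says what the request claims to be. It is the input to verification, not a result of it.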

Performance

Verification uses a tiered approach, fastest first:

Tier	Method	Latency	When Used
1	Session cache	~1ms	Same session, same IP+UA
2	Rails cache	~1ms	Previously verified IP
3	IP range check	~5-10ms	First verification for IP
4	Reverse DNS	100-2000ms	IP range miss, only for bots

Important: DNS lookups only occur when both of the following are true:

The user agent matches a known bot pattern
The IP range check fails

Regular users never trigger DNS lookups.
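
The tiering can be sketched as a cached lookup that tries the cheap check before the expensive one. This is an illustrative simplification under stated assumptions: SketchCache is a hypothetical in-memory stand-in for Rails.cache, the session tier is omitted, and the two checks are passed in as callables rather than taken from the gem.

```ruby
# Hypothetical in-memory stand-in for Rails.cache, for this sketch only.
class SketchCache
  def initialize
    @store = {}
  end

  # Returns the cached value, or computes, stores, and returns it.
  # Uses key? so that cached false results are honored too.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

def tiered_verify(ip, bot_type, cache:, range_check:, dns_check:)
  # No bot pattern matched: regular users never reach the DNS tier.
  return false if bot_type.nil?

  cache.fetch([ip, bot_type]) do
    # The || short-circuits, so DNS runs only when the range check misses.
    range_check.call(ip, bot_type) || dns_check.call(ip, bot_type)
  end
end
```

Caching negative results matters as much as caching positive ones: without it, every request from a spoofed user agent would repeat the slow DNS lookup.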

Configuration

# config/initializers/bot_verification.rb

BotVerification.configure do |config|
  # Table name for storing bot IP ranges
  config.table_name = "bot_ip_ranges"

  # Skip DNS verification entirely (only use IP range matching)
  # Set to true if DNS lookups are unacceptable for your project.
  # Note: Apple, Yandex, Baidu don't publish IP ranges, so they won't
  # be verifiable when DNS is skipped.
  config.skip_dns_verification = false

  # Timeout for each DNS lookup (seconds)
  # Only applies when skip_dns_verification is false
  config.dns_timeout = 1.0

  # Total timeout for all DNS operations (seconds)
  # Only applies when skip_dns_verification is false
  config.dns_total_timeout = 2.0

  # How long to cache verification results
  config.cache_ttl = 24.hours

  # How long to cache in session
  config.session_cache_ttl = 1.hour

  # Custom model class (optional)
  # config.ip_range_model_name = "MyBotIpRange"

  # Error callback - integrate with error tracking (Airbrake, Sentry, etc.)
  config.on_error = ->(error, context) {
    Airbrake.notify(error, context)
    # or: Sentry.capture_exception(error, extra: context)
  }

  # Refresh complete callback - for monitoring/notifications
  config.on_refresh_complete = ->(results) {
    failures = results.reject { |_, r| r[:success] }
    if failures.any?
      Rails.logger.warn("BotVerification refresh failures: #{failures.keys}")
    end
  }
end
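
To illustrate how a per-lookup timeout and a total timeout might interact, here is a small sketch. It is an assumption about the intent of dns_timeout and dns_total_timeout, not the gem's code: run_with_dns_budget is a hypothetical helper, and it uses Ruby's Timeout module for brevity. Reverse DNS verification needs at least two lookups (reverse, then forward), which is why a separate total budget is useful.

```ruby
require "timeout"

# Runs each lookup (a callable) under a per-lookup timeout while also
# enforcing an overall deadline. Returns the results in order, or nil
# if any lookup times out or the total budget is exhausted.
def run_with_dns_budget(lookups, per_lookup: 1.0, total: 2.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + total
  results = []

  lookups.each do |lookup|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    return nil if remaining <= 0

    # Never wait longer than what is left of the total budget.
    results << Timeout.timeout([per_lookup, remaining].min) { lookup.call }
  end

  results
rescue Timeout::Error
  nil
end
```

With the defaults shown in the configuration above (1.0 and 2.0 seconds), two slow lookups can use the whole budget, and a third would be skipped rather than started.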

Error Handling

The gem reports errors through multiple channels:

Logger

All operations log to config.logger (defaults to Rails.logger):

INFO - Successful operations
WARN - Non-critical issues (e.g., no ranges fetched)
ERROR - Failures (HTTP errors, parse errors)

Error Callback

For integration with error tracking services:

config.on_error = ->(error, context) {
  # error: The exception object
  # context: Hash with :bot_type, :source, :url
  Airbrake.notify(error, context)
}

Refresh Results

refresh_ip_ranges! returns a hash with success/failure for each bot type:

results = BotVerification.refresh_ip_ranges!
# => {
#   google: { success: true, count: 142 },
#   bing: { success: true, count: 8 },
#   openai_gptbot: { success: false, error: "HTTP 503: Service Unavailable" }
# }

# Check for failures
failures = results.reject { |_, r| r[:success] }
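
For monitoring, it can help to reduce the results hash to a few numbers. The helper below is a small sketch that assumes only the hash shape shown above (:success, :count, :error); summarize_refresh is a hypothetical name and is not part of the gem.

```ruby
# Condenses a refresh results hash into totals and a per-bot error map.
def summarize_refresh(results)
  failed, succeeded = results.partition { |_, r| !r[:success] }.map(&:to_h)

  {
    total_ranges: succeeded.sum { |_, r| r[:count].to_i },
    failed_bots:  failed.keys,
    errors:       failed.transform_values { |r| r[:error] }
  }
end

sample = {
  google:        { success: true,  count: 142 },
  bing:          { success: true,  count: 8 },
  openai_gptbot: { success: false, error: "HTTP 503: Service Unavailable" }
}

summary = summarize_refresh(sample)
puts "Fetched #{summary[:total_ranges]} ranges, #{summary[:failed_bots].size} bot type(s) failed"
```

A summary like this could be logged or sent to a metrics service from the on_refresh_complete callback.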

Refreshing IP Ranges

IP ranges should be refreshed daily. Choose the method that fits your deployment:

Rake Task (Heroku Scheduler, cron, etc.)

# Refresh all bot types
rails bot_verification:refresh

# Refresh specific bot type
rails bot_verification:refresh_bot[google]

Heroku Scheduler: Add rake bot_verification:refresh as a daily job.

Background Job (Sidekiq, etc.)

The gem includes BotVerification::RefreshJob for background processing:

# Enqueue to run now
BotVerification::RefreshJob.perform_later

# Refresh specific bot type
BotVerification::RefreshJob.perform_later("google")

# With sidekiq-scheduler (config/sidekiq.yml)
:schedule:
  refresh_bot_ips:
    cron: '0 4 * * *'  # 4am daily
    class: BotVerification::RefreshJob

Custom Job

Subclass for custom queue, error handling, or notifications:

# app/jobs/refresh_bot_ip_ranges_job.rb
class RefreshBotIpRangesJob < BotVerification::RefreshJob
  queue_as :low

  def perform(bot_type = nil)
    super
  rescue => e
    Airbrake.notify(e)
    raise
  end
end

Direct Call

# In a script or console
BotVerification.refresh_ip_ranges!
BotVerification.refresh_ip_ranges!(:google)

Cron (via whenever gem)

# config/schedule.rb
every 1.day, at: '4:00 am' do
  rake "bot_verification:refresh"
end

Rake Tasks

# Refresh all IP ranges
rails bot_verification:refresh

# Refresh specific bot type
rails bot_verification:refresh_bot[google]

# Show statistics
rails bot_verification:stats

# Clear caches
rails bot_verification:clear_cache

# Verify an IP
rails bot_verification:verify_ip[66.249.66.1,google]

# Check if table exists
rails bot_verification:check_table

Using Your Own Model

If you want more control, you can use your own model:

# app/models/my_bot_ip_range.rb
class MyBotIpRange < ApplicationRecord
  include BotVerification::IpRangeModel

  self.table_name = "my_bot_ip_ranges"

  # Add custom methods...
end

# config/initializers/bot_verification.rb
BotVerification.configure do |config|
  config.ip_range_model_name = "MyBotIpRange"
end

Changelog

For a detailed list of changes for each version of this project, please see the CHANGELOG.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/webventures/bot_verification.

License

The gem is available as open source under the terms of the MIT License.