BotVerification
A Rails gem for verifying that requests claiming to be from search engine bots (Google, Bing, etc.) and AI bots (GPTBot, PerplexityBot) are actually from those services.
Why?
User agents can be easily spoofed. This gem verifies bot requests using:
- IP Range Matching (fast) - Checks against known IP ranges from official sources
- Reverse DNS Verification (authoritative) - Falls back to DNS verification if IP range check fails
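The two checks above can be sketched with Ruby's standard library alone. This is a minimal illustration, not the gem's implementation: the method names, the sample CIDR block, and the hostname suffixes are assumptions chosen for the example, and the resolver is injectable so the logic can be exercised without network access.

```ruby
require "ipaddr"
require "resolv"
require "timeout"

# Illustrative data only (assumptions): one sample CIDR block and two hostname
# suffixes. A real deployment would load these from each provider's sources.
SAMPLE_RANGES   = { google: [IPAddr.new("66.249.64.0/19")] }.freeze
SAMPLE_SUFFIXES = { google: [".googlebot.com", ".google.com"] }.freeze

# Fast path: is the address inside any known CIDR block for this bot type?
def ip_in_ranges?(ip, bot_type, ranges = SAMPLE_RANGES)
  addr = IPAddr.new(ip)
  ranges.fetch(bot_type, []).any? { |range| range.include?(addr) }
rescue IPAddr::InvalidAddressError
  false
end

# Slow path: forward-confirmed reverse DNS.
# 1. PTR lookup: IP -> hostname, which must end in an expected suffix.
# 2. Forward lookup: hostname -> addresses, which must include the original IP.
def reverse_dns_verified?(ip, bot_type, resolver: Resolv::DNS.new, timeout: 2.0)
  suffixes = SAMPLE_SUFFIXES.fetch(bot_type, [])
  Timeout.timeout(timeout) do
    host = resolver.getname(ip).to_s.downcase
    next false unless suffixes.any? { |suffix| host.end_with?(suffix) }

    resolver.getaddresses(host).map(&:to_s).include?(ip)
  end
rescue Resolv::ResolvError, Timeout::Error
  false
end
```

The forward lookup in step 2 matters because anyone who controls the reverse zone for their own IP can make the PTR record claim any hostname; only the owner of the real domain can make that hostname resolve back to the IP.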
Supported Bots
Search Engines (IP + DNS verification)
- Google (Googlebot, Google-Extended, etc.)
- Bing (Bingbot, BingPreview)
- Apple (Applebot)
- Yandex (YandexBot)
- Baidu (Baiduspider)
AI Bots (IP verification only)
- OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot)
- Perplexity (PerplexityBot)
- Amazon (Amazonbot)
Social Bots (user agent only - no verification available)
- Facebook, Twitter, LinkedIn, Slack, Discord, Telegram
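As a rough sketch of how the three groups above differ, the table below maps a user agent to a bot type and to the strongest verification available for it. The constant, the method names, and the regular expressions are hypothetical and cover only a few of the listed bots; the gem's own pattern list is not shown in this README.

```ruby
# Hypothetical pattern table (assumption): a few bots from each group above,
# tagged with the strongest verification the README lists for that group.
BOTS = {
  google:     { pattern: /Googlebot|Google-Extended/i,          verification: :ip_and_dns },
  bing:       { pattern: /bingbot|BingPreview/i,                verification: :ip_and_dns },
  openai:     { pattern: /GPTBot|ChatGPT-User|OAI-SearchBot/i,  verification: :ip_only },
  perplexity: { pattern: /PerplexityBot/i,                      verification: :ip_only },
  facebook:   { pattern: /facebookexternalhit/i,                verification: :user_agent_only }
}.freeze

# Returns the first bot type whose pattern matches, or nil for regular traffic.
def detect_bot_type(user_agent)
  return nil if user_agent.nil? || user_agent.empty?

  BOTS.each { |type, info| return type if info[:pattern].match?(user_agent) }
  nil
end

# Returns :ip_and_dns, :ip_only, :user_agent_only, or nil for non-bots.
def verification_level(user_agent)
  BOTS.dig(detect_bot_type(user_agent), :verification)
end
```

A match here only says what the request claims to be. For the first two groups the claim still has to be confirmed against the IP address; for social bots the user agent is all there is.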
Installation
Add to your Gemfile:
gem "bot_verification"
Run the installer:
bundle install
rails generate bot_verification:install
rails db:migrate
Fetch initial IP ranges:
rails bot_verification:refresh
Usage
In Controllers
class MyController < ApplicationController
  include BotVerification::ControllerConcern
  def show
    if verified_good_bot?
      # Request is from a verified search engine bot
      # Serve full content
    elsif verified_good_bot?(mode: :search_and_ai)
      # Also includes verified AI bots (GPTBot, etc.)
    else
      # Regular user or unverified bot
      # Apply rate limiting, require auth, etc.
    end
  end
end
Verification Modes
# Only verified search engine bots (default, most secure)
verified_good_bot?(mode: :search_engines)
# Search engines + verified AI bots
verified_good_bot?(mode: :search_and_ai)
# All known bot patterns (trusts user agent, least secure)
verified_good_bot?(mode: :all_known)
Check Specifically for AI Bots
if verified_ai_bot?
  # Verified AI bot (GPTBot, PerplexityBot, etc.)
end
# Include unverifiable AI bots like ClaudeBot
if verified_ai_bot?(strict: false)
  # Any recognized AI bot
end
Direct Service Usage
# Verify a request
BotVerification.verify(ip, user_agent)
BotVerification.verify(ip, user_agent, mode: :search_and_ai)
# Check specific IP
BotVerification.verify_ip("66.249.66.1", :google)
# Detect bot type from user agent
BotVerification.detect_bot("Googlebot/2.1") # => :google
Performance
Verification uses a tiered approach, fastest first:
| Tier | Method | Latency | When Used |
|---|---|---|---|
| 1 | Session cache | ~1ms | Same session, same IP+UA |
| 2 | Rails cache | ~1ms | Previously verified IP |
| 3 | IP range check | ~5-10ms | First verification for IP |
| 4 | Reverse DNS | 100-2000ms | IP range miss, only for bots |
Important: DNS lookups only occur when:
- User agent matches a known bot pattern, AND
- IP range check fails
Regular users never trigger DNS lookups.
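The gating described above can be sketched as follows. This is a simplified model under stated assumptions, not the gem's code: `TieredVerifier` and its three callables are hypothetical stand-ins for user agent detection, the IP range lookup, and reverse DNS, and a single in-memory Hash without expiry stands in for both cache tiers.

```ruby
# Simplified model of the tiered check (all names here are assumptions).
class TieredVerifier
  attr_reader :dns_lookups

  def initialize(detect:, ip_range_check:, dns_check:, skip_dns: false)
    @detect = detect                  # ->(user_agent) { bot_type or nil }
    @ip_range_check = ip_range_check  # ->(ip, bot_type) { true/false }
    @dns_check = dns_check            # ->(ip, bot_type) { true/false }
    @skip_dns = skip_dns
    @cache = {}
    @dns_lookups = 0                  # counter kept only to observe the gating
  end

  def verified?(ip, user_agent)
    bot_type = @detect.call(user_agent)
    return false if bot_type.nil?     # regular users stop here, before any DNS

    key = [ip, bot_type]
    return @cache[key] if @cache.key?(key)   # tiers 1-2: cached result

    # Tier 3 first; tier 4 runs only when the range check misses.
    @cache[key] = @ip_range_check.call(ip, bot_type) || dns_fallback(ip, bot_type)
  end

  private

  def dns_fallback(ip, bot_type)
    return false if @skip_dns

    @dns_lookups += 1
    @dns_check.call(ip, bot_type)
  end
end
```

Caching negative results as well as positive ones is a deliberate choice in this sketch: a spoofed user agent from the same IP would otherwise trigger a fresh DNS lookup on every request.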
Configuration
# config/initializers/bot_verification.rb
BotVerification.configure do |config|
  # Table name for storing bot IP ranges
  config.table_name = "bot_ip_ranges"
  # Skip DNS verification entirely (only use IP range matching)
  # Set to true if DNS lookups are unacceptable for your project.
  # Note: Apple, Yandex, Baidu don't publish IP ranges, so they won't
  # be verifiable when DNS is skipped.
  config.skip_dns_verification = false
  # Timeout for each DNS lookup (seconds)
  # Only applies when skip_dns_verification is false
  config.dns_timeout = 1.0
  # Total timeout for all DNS operations (seconds)
  # Only applies when skip_dns_verification is false
  config.dns_total_timeout = 2.0
  # How long to cache verification results
  config.cache_ttl = 24.hours
  # How long to cache in session
  config.session_cache_ttl = 1.hour
  # Custom model class (optional)
  # config.ip_range_model_name = "MyBotIpRange"
  # Error callback - integrate with error tracking (Airbrake, Sentry, etc.)
  config.on_error = ->(error, context) {
    Airbrake.notify(error, context)
    # or: Sentry.capture_exception(error, extra: context)
  }
  # Refresh complete callback - for monitoring/notifications
  config.on_refresh_complete = ->(results) {
    failures = results.select { |_, r| !r[:success] }
    if failures.any?
      Rails.logger.warn("BotVerification refresh failures: #{failures.keys}")
    end
  }
end
Error Handling
The gem reports errors through multiple channels:
Logger
All operations log to config.logger (defaults to Rails.logger):
- INFO - Successful operations
- WARN - Non-critical issues (e.g., no ranges fetched)
- ERROR - Failures (HTTP errors, parse errors)
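To make the three levels concrete, here is a hedged sketch of a refresh loop that logs at each level, invokes an error callback, and builds a per-bot result hash. `refresh_all` and the `fetchers` argument are hypothetical names for this example, and treating an empty fetch as a success with a count of zero is an assumption rather than documented gem behavior.

```ruby
require "logger"

# `fetchers` maps a bot type to a callable returning an Array of CIDR strings.
# Each bot type is handled independently so one failure does not stop the rest.
def refresh_all(fetchers, logger:, on_error: nil)
  fetchers.each_with_object({}) do |(bot_type, fetcher), results|
    begin
      ranges = fetcher.call
      if ranges.empty?
        logger.warn("#{bot_type}: no ranges fetched")      # non-critical issue
      else
        logger.info("#{bot_type}: fetched #{ranges.size} ranges")
      end
      results[bot_type] = { success: true, count: ranges.size }
    rescue StandardError => e
      logger.error("#{bot_type}: #{e.message}")            # failure
      on_error&.call(e, { bot_type: bot_type })
      results[bot_type] = { success: false, error: e.message }
    end
  end
end
```

Rescuing inside the loop, rather than around it, is what lets the result hash report a mix of successes and failures in one run.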
Error Callback
For integration with error tracking services:
config.on_error = ->(error, context) {
  # error: The exception object
  # context: Hash with :bot_type, :source, :url
  Airbrake.notify(error, context)
}
Refresh Results
refresh_ip_ranges! returns a hash with success/failure for each bot type:
results = BotVerification.refresh_ip_ranges!
# => {
# google: { success: true, count: 142 },
# bing: { success: true, count: 8 },
# openai_gptbot: { success: false, error: "HTTP 503: Service Unavailable" }
# }
# Check for failures
failures = results.select { |_, r| !r[:success] }
Refreshing IP Ranges
IP ranges should be refreshed daily. Choose the method that fits your deployment:
Rake Task (Heroku Scheduler, cron, etc.)
# Refresh all bot types
rails bot_verification:refresh
# Refresh specific bot type
rails bot_verification:refresh_bot[google]
Heroku Scheduler: Add rake bot_verification:refresh as a daily job.
Background Job (Sidekiq, etc.)
The gem includes BotVerification::RefreshJob for background processing:
# Enqueue to run now
BotVerification::RefreshJob.perform_later
# Refresh specific bot type
BotVerification::RefreshJob.perform_later("google")
# With Sidekiq-Cron (config/sidekiq.yml)
:schedule:
  refresh_bot_ips:
    cron: '0 4 * * *' # 4am daily
    class: BotVerification::RefreshJob
Custom Job
Subclass for custom queue, error handling, or notifications:
# app/jobs/refresh_bot_ip_ranges_job.rb
class RefreshBotIpRangesJob < BotVerification::RefreshJob
  queue_as :low
  def perform(bot_type = nil)
    super
  rescue => e
    Airbrake.notify(e)
    raise
  end
end
Direct Call
# In a script or console
BotVerification.refresh_ip_ranges!
BotVerification.refresh_ip_ranges!(:google)
Cron (via whenever gem)
# config/schedule.rb
every 1.day, at: '4:00 am' do
  rake "bot_verification:refresh"
end
Rake Tasks
# Refresh all IP ranges
rails bot_verification:refresh
# Refresh specific bot type
rails bot_verification:refresh_bot[google]
# Show statistics
rails bot_verification:stats
# Clear caches
rails bot_verification:clear_cache
# Verify an IP
rails bot_verification:verify_ip[66.249.66.1,google]
# Check if table exists
rails bot_verification:check_table
Using Your Own Model
If you want more control, you can use your own model:
# app/models/my_bot_ip_range.rb
class MyBotIpRange < ApplicationRecord
  include BotVerification::IpRangeModel
  self.table_name = "my_bot_ip_ranges"
  # Add custom methods...
end
# config/initializers/bot_verification.rb
BotVerification.configure do |config|
  config.ip_range_model_name = "MyBotIpRange"
end
References
Changelog
For a detailed list of changes for each version of this project, please see the CHANGELOG.
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/webventures/bot_verification.
License
The gem is available as open source under the terms of the MIT License.