CrawlerDetect is a Ruby version of PHP class @CrawlerDetect.
It helps to detect bots/crawlers/spiders via the user agent and other HTTP-headers. Currently able to detect 1,000's of bots/spiders/crawlers.
Comparing with other popular bot-detection gems:
|Number of bot-patterns||>1000||~280||~280|
|Number of checked HTTP-headers||10||1||1|
|Number of updates of bot-list (1st half of 2018)||14||1||7|
In order to remain up-to-date, this gem does not accept any crawler data updates – any PRs to edit the crawler data should be offered to the original JayBizzle/CrawlerDetect project.
Add this line to your application's Gemfile:
CrawlerDetect.is_crawler?("Bot user agent") => true
Or if you need crawler name:
detector = CrawlerDetect.new("Googlebot/2.1 (http://www.google.com/bot.html)") detector.is_crawler? # => true detector.crawler_name # => "Googlebot"
Optionally you can add additional methods for
request.is_crawler? # => false request.crawler_name # => nil
It's more flexible to use
request.is_crawler? rather than
CrawlerDetect.is_crawler? because it automatically checks 10 HTTP-headers, not only
Only one thing you have to do is to configure
class Application < Rails::Application # ... config.middleware.use Rack::CrawlerDetect end
In some cases you may want to use your own white-list, or black-list or list of http-headers to detect User-agent.
It is possible to do via
CrawlerDetect::Config. For example, you may have initializer like this:
CrawlerDetect.setup! do |config| config.raw_headers_path = File.expand_path("crawlers/MyHeaders.json", __dir__) config.raw_crawlers_path = File.expand_path("crawlers/MyCrawlers.json", __dir__) config.raw_exclusions_path = File.expand_path("crawlers/MyExclusions.json", __dir__) end
Make sure that your files are correct JSON files. Look at the raw files which are used by default for more information.