HttpBL
===========

HttpBL is drop-in IP-filtering middleware for Rails 2.3+ and other Rack-based applications. It looks up each request's source IP address with the Http:BL service at http://projecthoneypot.org, and denies access to clients whose IP addresses are associated with suspicious behavior such as impolite crawling, comment spamming, dictionary attacks, and email harvesting.

* Deny access to IP addresses associated with suspicious behavior that exceeds a customizable threshold.
* Stop blocking IPs that have not been associated with suspicious behavior for a customizable number of days.
* Identify common search engines by IP address (not User-Agent), and deny access to a specific subset of them.
* Optionally use memcached to avoid repeated look-ups within a client session.

Installation
------------

```
gem install httpbl
```

Basic Usage
------------

HttpBL is Rack middleware and can be used with any Rack-based application. First, you must obtain an API key for the Http:BL service at http://projecthoneypot.org.

To add HttpBL to your middleware stack, add the following to config.ru:

```ruby
require 'httpbl'
use HttpBL, :api_key => "YOUR API KEY"
```

For Rails 2.3+, add the following to environment.rb (the `config.middleware` line goes inside the `Rails::Initializer.run do |config|` block):

```ruby
require 'httpbl'
config.middleware.use HttpBL, :api_key => "YOUR API KEY"
```

Advanced Usage
-------------

To insert HttpBL at the top of the Rails Rack stack (run `rake middleware` to confirm that Rack::Lock is at the top of the stack):

```ruby
config.middleware.insert_before(Rack::Lock, HttpBL, :api_key => "YOUR API KEY")
```

To customize HttpBL's filtering behavior, use the available options:

```ruby
use HttpBL, :api_key => "YOUR API KEY",
            :deny_types => [1, 2, 4],
            :threat_level_threshold => 0,
            :age_threshold => 5,
            :blocked_search_engines => [0],
            :memcached_server => "127.0.0.1:11211",
            :memcached_options => {:namespace => 'my_app'} # see the memcache-client documentation
```

Available Options:

The following options (shown with their default values) are available to customize the behavior of the httpbl middleware
filter:

```
:deny_types => [1, 2, 4, 8, 16, 32, 64, 128]
```

Project Honeypot classifies suspicious behavior as belonging to certain types, which are identified in the API's response to each IP lookup. You can tell HttpBL to deny only certain kinds of behavior by setting this to a subset of the possible types. As of March 2009, only types 1, 2, and 4 have been specified, but additional types are reserved for the future, and by default HttpBL checks against all of the anticipated type codes. Thus, there may be a very small performance advantage to setting `:deny_types => [1, 2, 4]` simply to skip checks for codes that aren't (yet) in use; however, that setting will have to be updated if more codes come into use, whereas the default requires no further attention. The current types are:

* 1: Suspicious
* 2: Harvester
* 4: Comment Spammer

The type codes are bit flags, so a single response can combine them (for example, 5 means Suspicious and Comment Spammer).

```
:threat_level_threshold => 2
```

The threat level reported by Project Honeypot is based on a logarithmic scale, approximated by:

* 1: 1 spam
* 25: 100 spam
* 50: 10,000 spam
* 100: 1,000,000 spam

(where the plural of "spam" is "spam").

Choosing a threat level threshold can be tricky if you aren't sure how accurate the measure of threat is, since you don't want to block legitimate traffic by mistake. Because the email addresses that Project Honeypot uses as spam bait are unique, artificial, and well hidden, NO email should be sent to those addresses at all, so it is fair to assume that even the low threat level associated with just a few spam is still significant. With that in mind, the default threshold is 2; if you want to filter more aggressively, set `:threat_level_threshold => 0`.

```
:age_threshold => 10
```

This sets the number of days that an IP address associated with suspicious activity must wait to regain access after the suspicious activity has ceased. Keeping this at a sane value allows IPs that are reassigned or cleaned up to expire from the blacklist.
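Taken together, `:deny_types`, `:threat_level_threshold`, and `:age_threshold` decide whether a listed IP is denied. The sketch below shows one plausible way to combine them; it is not HttpBL's own code. It assumes the answer format from the Http:BL API documentation (`127.<days since last activity>.<threat level>.<type bit flags>`, with type 0 reserved for search engines) and assumes the comparisons "threat level strictly above the threshold" and "last activity fewer than `:age_threshold` days ago". The `deny?` method and `DEFAULTS` constant are hypothetical names.

```ruby
# Hypothetical defaults mirroring the documented option values.
DEFAULTS = {
  :deny_types => [1, 2, 4, 8, 16, 32, 64, 128],
  :threat_level_threshold => 2,
  :age_threshold => 10
}.freeze

# Sketch: decide whether an Http:BL answer (e.g. "127.3.5.1") should be denied.
# The comparison rules are assumptions, not HttpBL's actual implementation.
def deny?(answer, options = {})
  opts = DEFAULTS.merge(options)
  first, days, threat, type = answer.split('.').map(&:to_i)
  return false unless first == 127  # not a valid Http:BL answer
  return false if type.zero?        # type 0 is a search engine, handled separately

  # The type octet is a bit set, so test each configured flag with bitwise AND.
  type_matches = opts[:deny_types].any? { |bit| (type & bit) != 0 }
  type_matches &&
    threat > opts[:threat_level_threshold] &&
    days < opts[:age_threshold]
end
```

For example, `deny?("127.3.5.1")` returns `true` under the defaults, while `deny?("127.3.2.1")` returns `false` unless you pass `:threat_level_threshold => 0`.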
If you want `:age_threshold` to be more aggressive (require a longer cool-off period), set `:age_threshold => 30`; if you want to let IPs back in after just a few days, set `:age_threshold => 5`.

```
:blocked_search_engines => []
```

Because Project Honeypot identifies search engine traffic by IP address, this filter may be used to exclude certain robots from your site. If one presumes that request IPs are at least marginally more difficult to spoof than User-Agent strings, this filter may be marginally more effective than robot-detection systems that rely on User-Agent. If there are particular search engines that you would like to exclude from your site, set `:blocked_search_engines => [0, ... ]`, where the codes defined by http://projecthoneypot.org/httpbl_api.php are:

* 0: Undocumented
* 1: AltaVista
* 2: Ask
* 3: Baidu
* 4: Excite
* 5: Google
* 6: Looksmart
* 7: Lycos
* 8: MSN
* 9: Yahoo
* 10: Cuil
* 11: InfoSeek
* 12: Miscellaneous

```
:memcached_server => nil
:memcached_options => {}
```

When using httpbl in a production environment, it is *strongly* recommended that you configure httpbl to use memcached to temporarily store the blacklist status of client IP addresses. This greatly improves the efficiency of the filter, because it needs to look up each client IP address only once per session instead of once per request. It also reduces the load that a popular web application using httpbl places on Project Honeypot's API services. Set `:memcached_server` and `:memcached_options` according to the conventions of the memcache-client Ruby library; for example:

```
:memcached_server => '127.0.0.1:11211',
:memcached_options => {:namespace => 'my_app'}
```

memcache-client is included in Rails by default, but if you're using Rack without Rails, you will need to install and require the memcache-client gem.

```
:dns_timeout => 0.5
```

DNS requests to the Http:BL service shouldn't take this long (the value is presumably in seconds), but if they do, you can modify this setting to prevent the request from hanging until a system default timeout.
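The look-up that `:dns_timeout` bounds is an ordinary DNS query. The sketch below uses the query format from the Http:BL API documentation (`<api key>.<octet-reversed IP>.dnsbl.httpbl.org`) and Ruby's standard `resolv` and `timeout` libraries; the method names are hypothetical, and HttpBL's own look-up code may be structured differently. A timeout or a missing record is treated as "no answer", which is what lets the request through.

```ruby
require 'resolv'
require 'timeout'

HTTPBL_ZONE = 'dnsbl.httpbl.org'

# Build the query name: <api key>.<reversed IPv4 octets>.dnsbl.httpbl.org
def httpbl_query_name(api_key, ip)
  [api_key, *ip.split('.').reverse, HTTPBL_ZONE].join('.')
end

# Resolve the query, waiting at most dns_timeout seconds.
# Returns the answer address as a String, or nil if the IP is not listed
# (NXDOMAIN raises Resolv::ResolvError) or the look-up took too long.
def httpbl_lookup(api_key, ip, dns_timeout = 0.5)
  Timeout.timeout(dns_timeout) do
    Resolv.getaddress(httpbl_query_name(api_key, ip))
  end
rescue Timeout::Error, Resolv::ResolvError
  nil
end
```

With a (placeholder) key of `abcdefghijkl`, a request from `127.9.1.2` produces the query name `abcdefghijkl.2.1.9.127.dnsbl.httpbl.org`.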
Setting `:dns_timeout` too low, however, will effectively disable the filter if responses can't be returned from the API before the request is allowed through (and 0 is a bad idea). Best not to change it unless you know what you're doing; it's a safety mechanism.
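The once-per-session caching that `:memcached_server` enables follows a simple read-through pattern: check the cache, fall back to the Http:BL look-up on a miss, and store the result with an expiry. The sketch below is illustrative only. `HashCache` is a hypothetical in-memory stand-in exposing `get`/`set` methods in the style of memcache-client's `MemCache` (check that library's documentation before swapping it in), and `CachedLookup`, the `"httpbl:<ip>"` key format, and the one-hour TTL are assumptions rather than HttpBL's actual behavior.

```ruby
# Hypothetical in-memory stand-in for a memcached client.
class HashCache
  def initialize
    @store = {}
  end

  def get(key)
    value, expires_at = @store[key]
    return nil if value.nil?
    return nil if expires_at && Time.now > expires_at # entry has expired
    value
  end

  # expiry is in seconds; 0 means "never expire"
  def set(key, value, expiry = 0)
    @store[key] = [value, expiry.zero? ? nil : Time.now + expiry]
  end
end

# Read-through cache: each IP is resolved once, then served from the cache
# until its entry expires.
class CachedLookup
  attr_reader :lookups

  def initialize(cache, ttl = 3600, &resolver)
    @cache = cache
    @ttl = ttl
    @resolver = resolver # block returning true if the IP should be blocked
    @lookups = 0
  end

  def blocked?(ip)
    key = "httpbl:#{ip}"
    cached = @cache.get(key)
    return cached == 'blocked' unless cached.nil?

    @lookups += 1
    status = @resolver.call(ip) ? 'blocked' : 'allowed'
    @cache.set(key, status, @ttl)
    status == 'blocked'
  end
end
```

Storing the strings `'blocked'`/`'allowed'` rather than `true`/`false` keeps a cached "allowed" result distinguishable from a cache miss, which `get` reports as `nil`.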