A big hairy fuzzy spider that crawls your site, wreaking havoc

Tarantula

Purpose of this fork

* Add panmind/ssl_helper interaction support - if the crawler receives
  a `302` response redirecting to the very same requested URL, assume
  it is an SSL requirement and use `https!` to set the SSL request headers;

* Add panmind/jquery-ajax-nav support - if the crawler receives a `302`
  response with a `#` or `#!` in it, assume it is an XHR requirement
  and use `xhr` to re-send the request.
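
The two rules above amount to a small decision on each `302`. Here is a minimal sketch of that decision in plain Ruby. The helper name `redirect_strategy` and its return values are hypothetical and exist only for illustration; they are not part of Tarantula's API, and the fork's real logic lives inside the crawler.

```ruby
# Hypothetical helper for illustration only; the name and return values
# are assumptions, not part of Tarantula's API.
def redirect_strategy(requested_url, location)
  # Redirected straight back to the same URL: treat it as an SSL requirement.
  return :ssl if location == requested_url
  # A fragment ("#" or "#!") in the target: treat it as an XHR requirement.
  return :xhr if location.include?("#")
  # Anything else is an ordinary redirect.
  :follow
end

redirect_strategy("/account", "/account")      # => :ssl
redirect_strategy("/projects", "/#!/projects") # => :xhr
redirect_strategy("/login", "/dashboard")      # => :follow
```

In a real integration test, `:ssl` would correspond to calling `https!` before repeating the request, and `:xhr` to re-sending it with `xhr`.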

DESCRIPTION

Tarantula is a big fuzzy spider. It crawls your Rails application, fuzzing data to see what breaks.

Usage

Installation

The latest and greatest version is always available on GitHub. (See the Rakefile for dependencies, or just let RubyGems handle them.)

gem install tarantula

You can also grab it from RubyForge, where we push stable releases, though these may not be as bleeding-edge as the GitHub gem.

gem install tarantula

Project Setup

To set up Tarantula in your application, add the following line to either config/environment.rb or config/environments/test.rb (preferred). This assumes that you have Rails 2.1 or higher installed.

config.gem 'tarantula', :lib => 'relevance/tarantula'

Since Rails doesn’t (yet) support automatically loading rake tasks that live inside gems, you will need to update your Rakefile to load Tarantula’s rake tasks. The simplest approach is to start by vendoring Tarantula into your Rails app.

mkdir -p vendor/gems
cd vendor/gems
gem unpack tarantula

You can then add the following line into your Rakefile, which will allow your application to discover Tarantula’s rake tasks.

Dir[File.join(RAILS_ROOT, "vendor/gems/tarantula-*/tasks/*.rake")].each { |task| load task }

Crawling Your App

Use the included rake task to create a Rails integration test that will allow Tarantula to crawl your app.

#!sh
rake tarantula:setup

Take a moment to familiarize yourself with the generated test. If parts of your application require login, update the test to make sure Tarantula can access those parts of your app.

require "relevance/tarantula"

class TarantulaTest < ActionController::IntegrationTest
  # Load enough test data to ensure that there's a link to every page in your
  # application. Doing so allows Tarantula to follow those links and crawl
  # every page.  For many applications, you can load a decent data set by
  # loading all fixtures.
  fixtures :all

  def test_tarantula
    # If your application requires users to log in before accessing certain
    # pages, uncomment the lines below and update them to allow this test to
    # log in to your application.  Doing so allows Tarantula to crawl the
    # pages that are only accessible to logged-in users.
    #
    #   post '/session', :login => 'quentin', :password => 'monkey'
    #   follow_redirect!

    tarantula_crawl(self)
  end
end

If you want to set custom options, you can get access to the crawler and set properties before running it. For example, this would turn on HTMLTidy.

def test_tarantula
  post '/session', :login => 'kilgore', :password => 'trout'
  assert_response :redirect
  assert_redirected_to '/'
  follow_redirect!

  t = tarantula_crawler(self)
  t.handlers << Relevance::Tarantula::TidyHandler.new
  t.crawl '/'
end

Now it’s time to turn Tarantula loose on your app. Assuming your project is at /work/project/:

#!sh
cd /work/project
rake tarantula:test

Verbose Mode

If you run the test using the steps shown above, Tarantula will produce a report in tmp/tarantula. You can also set VERBOSE=true to see more detail as the test runs.

For more options, please see the test suite.

Allowed Errors

If, for example, a 404 is an appropriate response for some URLs, you can tell Tarantula to allow 404s for URLs matching a given regex:

t = tarantula_crawler(self)
t.allow_404_for %r{/users/\d+/}
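
To see which URLs a pattern like the one above would cover, here is a minimal stand-alone sketch of regex-based allow-listing. The `AllowedErrors` class and its methods are hypothetical stand-ins written for this example, not Tarantula's implementation; only the regex is taken from the snippet above.

```ruby
# Hypothetical stand-in for the crawler's allow-list; the class and
# method names here are assumptions, not Tarantula's API.
class AllowedErrors
  def initialize
    @patterns = Hash.new { |hash, code| hash[code] = [] }
  end

  # Register a URL pattern for which the given status code is acceptable.
  def allow(code, pattern)
    @patterns[code] << pattern
  end

  # True when some registered pattern for this status code matches the URL.
  def allowed?(code, url)
    @patterns[code].any? { |pattern| pattern =~ url }
  end
end

allowed = AllowedErrors.new
allowed.allow(404, %r{/users/\d+/})

allowed.allowed?(404, "/users/42/edit") # => true
allowed.allowed?(404, "/users/42")      # => false (no trailing slash)
allowed.allowed?(500, "/users/42/edit") # => false (only 404 was allowed)
```

Note that the trailing slash in `%r{/users/\d+/}` means a bare `/users/42` is not matched; drop the final `/` from the pattern if those URLs should be allowed too.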

Testing for Common Attacks

You can specify the attack strings that Tarantula throws at your application.

def test_tarantula
  t = tarantula_crawler(self)

  Relevance::Tarantula::FormSubmission.attacks << { 
    :name => :xss,
    :input => "<script>gotcha!</script>",
    :output => "<script>gotcha!</script>",
  }

  Relevance::Tarantula::FormSubmission.attacks << { 
    :name => :sql_injection,
    :input => "a'; DROP TABLE posts;",
  }

  t.handlers << Relevance::Tarantula::AttackHandler.new
  t.times_to_crawl = 2
  t.crawl "/posts"
end

This example adds custom attacks for both SQL injection and XSS. It also tells Tarantula to crawl the app twice. This matters for XSS attacks because the injected input will not appear in the results until the second crawl.
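
The role of the `:input` and `:output` keys can be illustrated without Rails. The sketch below is a simplified model of the idea, assuming an attack counts as successful when its `:output` string comes back verbatim in a response body. The `attack_succeeded?` helper and the two page strings are hypothetical, and this is not Tarantula's actual `AttackHandler` code.

```ruby
require "erb"

# Same shape as the attack hashes above: what to submit, and what to
# look for in later responses.
XSS_ATTACK = {
  :name   => :xss,
  :input  => "<script>gotcha!</script>",
  :output => "<script>gotcha!</script>"
}

# Hypothetical check: an attack "succeeded" if its :output string comes
# back in a response body verbatim.
def attack_succeeded?(attack, body)
  output = attack[:output]
  !output.nil? && body.include?(output)
end

# Two hypothetical views rendering a stored comment on a later crawl.
unsafe_page = "<p>#{XSS_ATTACK[:input]}</p>"
safe_page   = "<p>#{ERB::Util.html_escape(XSS_ATTACK[:input])}</p>"

attack_succeeded?(XSS_ATTACK, unsafe_page) # => true
attack_succeeded?(XSS_ATTACK, safe_page)   # => false
```

A view that escapes user input turns `<` and `>` into entities, so the raw `:output` string never appears and the check passes.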

Timeout

You can specify a timeout for each specific crawl that Tarantula runs. For example:

def test_tarantula
  t = tarantula_crawler(self)
  t.times_to_crawl = 2
  t.crawl_timeout = 5.minutes
  t.crawl "/"
end

The above will crawl your app twice, and each crawl will time out if it takes longer than 5 minutes. You may need a timeout to keep the Tarantula test time reasonable if your app is large or just happens to have a large number of ‘never-ending’ links, such as with any sort of “auto-admin” interface.
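
To show why a time budget helps with ‘never-ending’ links, here is a small self-contained sketch of a crawl loop with a deadline. Everything in it is hypothetical: `crawl_with_timeout`, the `links_for` callback, and the injectable `clock` are names invented for this example and do not reflect how Tarantula implements `crawl_timeout`.

```ruby
# Default clock: monotonic seconds, unaffected by system clock changes.
MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

# Hypothetical stand-in for a crawler: visits links breadth-first until
# the queue empties or the time budget (in seconds) runs out.
def crawl_with_timeout(start, timeout_seconds, links_for, clock: MONOTONIC_CLOCK)
  deadline = clock.call + timeout_seconds
  queue = [start]
  visited = []
  until queue.empty?
    break if clock.call > deadline # budget spent: stop crawling
    url = queue.shift
    next if visited.include?(url)  # skip pages we have already seen
    visited << url
    queue.concat(links_for.call(url))
  end
  visited
end

# An "auto-admin" style site where every page links to one more page.
endless = ->(url) { ["/admin/page/#{url[/\d+\z/].to_i + 1}"] }

# A fake clock that advances one "second" per reading keeps the demo fast.
ticks = 0
fake_clock = -> { ticks += 1 }

pages = crawl_with_timeout("/admin/page/1", 5, endless, clock: fake_clock)
pages.size # => 5
```

Without the deadline check, the loop above would never finish on the `endless` site, since each visited page adds a new unvisited link to the queue.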

Bugs/Requests

Please submit your bug reports, patches, or feature requests at Lighthouse:

relevance.lighthouseapp.com/projects/17868-tarantula/overview

You can view the continuous integration results for Tarantula, including results against all supported versions of Rails, on RunCodeRun here:

runcoderun.com/relevance/tarantula

License

Tarantula is released under the MIT license.