Project

redback

No commit activity in last 3 years
No release in over 3 years
Fetches a URL you give it and recursively searches for all URLs it can find, building up a list of unique URLs on the same hostname.
 Dependencies

Runtime

>= 0.8.6
>= 0.6.4
 Project Readme

redback

Redback is a Ruby spider (geddit?). Pass it a website, and it will begin its many-legged crawl, scurrying across the site to pull out all the unique URLs it can find.

Just like a terrifying real-life spider, redback aims to be fast: in particular, it sends requests in parallel so one slow page won't slow down your crawl.
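To make the idea concrete, here is a minimal sketch of a parallel, same-hostname crawl. It is not redback's actual implementation; the names `same_host_links`, `crawl`, and the injected `fetch` callable are hypothetical, and the regex-based link extraction is a simplification. Passing `fetch` in as an argument is an assumption made so the sketch can run without network access.

```ruby
require 'set'
require 'uri'

# Hypothetical helper (not part of redback): pull href values out of an HTML
# string and keep only http(s) URLs on the same host as the page itself.
def same_host_links(page_url, html)
  host = URI.parse(page_url).host
  html.scan(/href\s*=\s*["']([^"'#]+)/i).flatten.map do |href|
    begin
      link = URI.join(page_url, href) # resolve relative links against the page
    rescue URI::Error
      next # skip hrefs that are not valid URIs
    end
    link.to_s if link.host == host && %w[http https].include?(link.scheme)
  end.compact
end

# Hypothetical crawl: fetch every URL in the current frontier in its own
# thread, so one slow page does not hold up the others, then queue any
# same-host links that have not been seen yet.
def crawl(start_url, fetch)
  seen = Set[start_url]
  frontier = [start_url]
  until frontier.empty?
    threads = frontier.map { |url| Thread.new { [url, fetch.call(url)] } }
    pages = threads.map(&:value) # wait for every fetch in this round
    frontier = []
    pages.each do |url, html|
      same_host_links(url, html.to_s).each do |link|
        frontier << link if seen.add?(link) # add? returns nil for duplicates
      end
    end
  end
  seen.to_a
end
```

For a real crawl, `fetch` could be something like `->(url) { Net::HTTP.get(URI(url)) }` from Ruby's standard library (after `require 'net/http'`); a production crawler would also want timeouts, error handling, and a cap on the number of concurrent threads.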

Installation

$ gem install redback

Usage

Command line

$ redback http://example.com/

…in which case it will print all the URLs it finds within the site http://example.com/.

You can write the results to a file like this:

$ redback http://example.com > output.txt

Or feed them to another command line tool like this:

$ redback http://xkcd.com | grep xml

Within Ruby

It can also be used as a library:

require 'redback'

Redback.new("http://example.com") { |url| puts url }

The Redback.new method accepts a URL and a block; the block will be executed for each URL found.