Project

iron-crawler

0.0

No commit activity in last 3 years

No release in over 3 years

noqcks/iron-crawler

A generic web crawler that doesn't follow URLs outside the starting domain.

Popularity

18,312

1

0

0

Releases

Current version

1.2.1

7

2016-02-07

2016-02-08

Development

Primary Language

Ruby

Licenses

MIT

Average date of last 50 commits

2016-02-07

Reverse Dependencies

0

Dependencies

Development

~> 1.0

~> 1.0.2

flay

~> 2.7.0

flog

~> 4.3.2

inch

~> 0.7.0

~> 2.0.1

rdoc

~> 3.12

reek

~> 3.10.0

~> 5.0.0

~> 3.4.0

~> 0.37.0

>= 0

>= 0

yard

~> 0.8.0

Runtime

>= 0

Project Readme

Iron Crawler

A generic web crawler.

Features

Starting from a given URL, it follows every link on that page and prints a list of the URLs it visited.

Follows href attributes in tags that point to the same domain
Ignores href attributes in tags that point to other domains (even subdomains)
Captures the src attribute of script tags and the href attribute of link tags
Outputs a list of visited URLs
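
The same-domain rule above can be sketched with Ruby's standard URI library. This is a minimal illustration, not code taken from the gem: the helper name same_domain? is hypothetical, and it assumes "same domain" means an exact host match, which is consistent with subdomains being ignored.

```ruby
require "uri"

# Hypothetical helper (not part of iron-crawler's API): returns true when
# +link+, resolved against +base+, is an http(s) URL on exactly the same host.
# Subdomains have a different host string, so they are rejected.
def same_domain?(base, link)
  base_uri = URI.parse(base)
  target = URI.join(base, link) # resolves relative hrefs such as "/about"
  target.is_a?(URI::HTTP) && target.host == base_uri.host
rescue URI::Error
  # Malformed hrefs are treated as "not followable".
  false
end
```

Under this assumption, same_domain?("http://example.com/", "/about") accepts the link, while an href pointing at http://blog.example.com/ or a mailto: address is skipped.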

Getting Started

It's easy to get started!

Install

gem install iron-crawler

Run

iron-crawler <url>

The above command crawls the site at <url>, staying on the starting domain, and prints the list of URLs it visited.
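
To illustrate what such a crawl involves, here is a rough breadth-first sketch using only the standard library. It is an assumption about the general approach, not the gem's actual implementation: the names crawl, resolve and links_for are hypothetical, and page fetching is injected as a callable so the loop can be tried without network access.

```ruby
require "set"
require "uri"

# Hypothetical helper: resolve +href+ against +base+, or nil if malformed.
def resolve(base, href)
  URI.join(base, href)
rescue URI::Error
  nil
end

# Breadth-first crawl restricted to the starting host.
# +links_for+ is an assumed callable: given a URL string, it returns the raw
# href values found on that page.
def crawl(start_url, links_for)
  host = URI.parse(start_url).host
  visited = Set.new
  queue = [start_url]

  until queue.empty?
    url = queue.shift
    next unless visited.add?(url) # add? returns nil if url was already seen

    links_for.call(url).each do |href|
      target = resolve(url, href)
      # Skip malformed links, non-HTTP schemes, and other hosts (subdomains too).
      next unless target.is_a?(URI::HTTP) && target.host == host

      target.fragment = nil # "#section" anchors point at the same page
      queue << target.to_s
    end
  end

  visited.to_a # URLs in the order they were visited
end
```

With Mechanize, the callable could plausibly be something like ->(url) { agent.get(url).links.map(&:href).compact }, where agent is a Mechanize instance. In a test it can simply be a lambda backed by a hash of page URLs to href lists.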

TODO

concurrency (will probably have to move away from Mechanize)
test coverage with RSpec
set up a CI pipeline with Travis CI to automatically publish to RubyGems