No commit activity in last 3 years
No release in over 3 years
A flexible, modular web crawler
 Dependencies

Development

~> 5.0
~> 0.14
~> 0.7

Runtime

~> 0.14
>= 1.9
~> 1.5
~> 0.10
~> 0.6
 Project Readme

NewsCrawler

NewsCrawler is a flexible, modular web crawler intended to provide a framework for website analysis.


Installation

gem install news_crawler

Getting started

To crawl a site (e.g. www.example.com) with the default configuration and modules:

news_crawler www.example.com

You can resume a crawl by invoking the command without any arguments:

news_crawler
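If you want to start or resume a crawl from a Ruby script instead of typing the command, the two invocation forms above can be wrapped as in the sketch below. The helper name `news_crawler_command` is hypothetical and not part of NewsCrawler; the sketch only builds the argument list from the two documented forms and assumes the `news_crawler` executable is on your PATH.

```ruby
# Hypothetical helper (not part of NewsCrawler): builds the argument list
# for the two invocation forms documented above.
#   with a site    -> start crawling that site
#   without a site -> resume the previous crawl
def news_crawler_command(site = nil)
  site.nil? ? ['news_crawler'] : ['news_crawler', site]
end

# Example (uncomment to actually run the crawler; assumes the gem is installed):
# system(*news_crawler_command('www.example.com'))
# system(*news_crawler_command)
```

Passing the arguments to `system` as separate strings avoids shell interpolation of the site name.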

For more information about configuration and module development, see NewsCrawler's page.

Requirements

  • Ruby >= 1.9.3
  • MongoDB
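To check the Ruby requirement before installing, a minimal sketch like the following may help. The constant `MINIMUM_RUBY` and the helper `ruby_supported?` are illustrative names, not part of NewsCrawler; only the 1.9.3 minimum comes from the list above. The sketch does not check for MongoDB.

```ruby
# Minimum Ruby version, taken from the requirements list above.
MINIMUM_RUBY = Gem::Version.new('1.9.3')

# Hypothetical helper (not part of NewsCrawler): returns true when the
# given version string meets the minimum requirement.
def ruby_supported?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

puts "Ruby #{RUBY_VERSION} supported: #{ruby_supported?}"
```

`Gem::Version` compares version segments numerically, so a version such as 1.10.0 is correctly treated as newer than 1.9.3, which a plain string comparison would get wrong.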

Caution

This is a prerelease version, so the API may change significantly.

Copyright

Copyright (C) 2013 Hà Quang Dương contact@haqduong.net

NewsCrawler is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

NewsCrawler is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with NewsCrawler. If not, see http://www.gnu.org/licenses/.