Category: Web Content Scrapers

33%

56%

2013-01-26

link_thumbnailer (gottfrois/link_thumbnailer)

0.3

No release in over 3 years

Low commit activity in last 3 years

Ruby gem generating thumbnail images from a given URL.

Popularity

667,654

510

104

Releases

3.4.0

2012-08-19

2020-07-24

Activity

89%

79%

2019-02-26

data_miner (seamusabshere/data_miner)

0.12

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.

Popularity

472,605

301

Releases

3.0.0

115

2009-10-30

2014-02-04

Activity

63%

100%

2013-07-08

cobweb (stewartmckee/cobweb)

0.13

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

Cobweb is a web crawler that can use Resque to distribute crawls across a cluster, so extremely large sites can be crawled quickly; the project describes this as much more performant than multi-threaded crawlers. It can also run as a standalone crawler, and it includes a statistics monitoring interface for tracking crawl progress.

Popularity

323,132

228

Releases

1.2.1

2010-11-10

2021-01-09

Activity

50%

57%

2016-04-07

sinew (gurgeous/sinew)

0.09

Low commit activity in last 3 years

A long-lived project that still receives updates

Crawl websites easily using Ruby recipes, with caching and Nokogiri.

Popularity

52,785

254

Releases

4.0.1

2012-06-04

2023-08-19

Activity

100%

37%

2018-04-28

tanakai (glaucocustodio/tanakai)

0.09

Low commit activity in last 3 years

A maintained fork of Kimurai, a modern web-scraping framework written in Ruby and based on Capybara and Nokogiri.

Popularity

12,858

260

Releases

1.7.3

2022-08-13

2023-12-14

Activity

57%

2021-01-05

fletcher (hulihanapplications/fletcher)

0.03

No release in over 3 years

Low commit activity in last 3 years

Easily fetch product information from third party websites such as Amazon, Steam, eBay, etc.

Popularity

80,242

Releases

0.6.9

2011-12-07

2014-05-12

Activity

75%

23%

2014-01-04