Category: Web Content Scrapers

33%

56%

2013-01-26

link_thumbnailer gottfrois/link_thumbnailer

0.24

No release in over 3 years

Low commit activity in last 3 years

Ruby gem generating thumbnail images from a given URL.

Popularity

760,915

512

105

Releases

3.4.0

2012-08-19

2020-07-24

Activity

88%

80%

2020-08-27

cobweb stewartmckee/cobweb

0.1

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

Cobweb is a web crawler that can use Resque to cluster crawls, letting it crawl extremely large sites quickly and much more performantly than multi-threaded crawlers. It is also a standalone crawler with a sophisticated statistics interface for monitoring crawl progress.

Popularity

338,039

225

Releases

1.2.1

2010-11-10

2021-01-09

Activity

50%

57%

2016-04-07

data_miner seamusabshere/data_miner

0.09

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.

Popularity

492,739

306

Releases

3.0.0

115

2009-10-30

2014-02-04

Activity

63%

100%

2013-07-08

tanakai glaucocustodio/tanakai

0.08

Low commit activity in last 3 years

Maintained fork of Kimurai, a modern web scraping framework written in Ruby and based on Capybara/Nokogiri

Popularity

48,857

283

Releases

1.7.5

2022-08-13

2025-02-10

Activity

66%

66%

2021-12-23

sinew gurgeous/sinew

0.07

Low commit activity in last 3 years

No release in over a year

Crawl websites easily using Ruby recipes, with caching and Nokogiri.

Popularity

55,934

254

Releases

4.0.1

2012-06-04

2023-08-19

Activity

100%

37%

2018-04-28

boilerpipe-ruby gregors/boilerpipe-ruby

0.04

No commit activity in last 3 years

No release in over 3 years

A pure Ruby implementation of the boilerpipe web content extraction algorithm.

Popularity

1,394,448

Releases

0.5.0

2016-03-13

2021-02-15

Activity

42%

2019-05-06

fletcher hulihanapplications/fletcher

0.03

No release in over 3 years

Low commit activity in last 3 years

Easily fetch product information from third party websites such as Amazon, Steam, eBay, etc.

Popularity

82,970

Releases

0.6.9

2011-12-07

2014-05-12

Activity

75%

23%

2014-01-04

docparser jurriaan/docparser

0.01

No commit activity in last 3 years

No release in over 3 years

DocParser is a Ruby gem for web scraping.

Popularity

33,895

Releases

0.3.0

2013-04-11

2020-04-13

Activity

100%

2016-05-08

arachnid2 samnissen/arachnid2

0.01

No release in over 3 years

Low commit activity in last 3 years

A simple, fast web crawler

Popularity

31,813

Releases

0.4.0

2018-05-29

2020-07-15

Activity

100%

69%

2019-04-15

horsefield apa512/horsefield

0.0

Low commit activity in last 3 years

A long-lived project that still receives updates

It's a scraper

Popularity

99,646

Releases

0.7.1

2013-08-25

2025-05-17

Activity

2015-12-11