Category: Web Content Scrapers

Download, extract from a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, and other files into your ActiveRecord models. Uses Upsert internally for speed.

472,953

301

3.0.0

115

2009-10-30

2014-02-04

cobweb

0.13

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

cobweb stewartmckee/cobweb Homepage

Cobweb is a web crawler that can use Resque to cluster crawls, so it can crawl extremely large sites quickly and with much better performance than multi-threaded crawlers. It also runs as a standalone crawler with a sophisticated statistics interface for monitoring crawl progress.

323,607

228

1.2.1

2010-11-10

2021-01-09

link_thumbnailer

0.3

No release in over 3 years

Low commit activity in last 3 years

link_thumbnailer gottfrois/link_thumbnailer Homepage

Ruby gem generating thumbnail images from a given URL.

669,163

510

3.4.0

2012-08-19

2020-07-24

horsefield

0.0

No commit activity in last 3 years

No release in over 3 years

horsefield apa512/horsefield Homepage

It's a scraper

93,690

0.6.1

2013-08-25

2020-05-29

wombat

0.55

Low commit activity in last 3 years

There are a lot of open issues

No release in over a year

wombat felipecsl/wombat Homepage

Generic web crawler with a DSL that parses structured data from web pages

204,696

1,303

3.0.0

2011-12-27

2022-08-23

anemone

0.87

No commit activity in last 3 years

No release in over 3 years

anemone chriskite/anemone Homepage

Anemone web-spider framework

1,043,653

1,612

0.7.2

2009-04-14

2012-05-30

fletcher

0.03

No release in over 3 years

Low commit activity in last 3 years

fletcher hulihanapplications/fletcher Homepage

Easily fetch product information from third party websites such as Amazon, Steam, eBay, etc.

80,297

0.6.9

2011-12-07

2014-05-12

arachnid2

0.01

No release in over 3 years

Low commit activity in last 3 years

arachnid2 samnissen/arachnid2 Homepage

A simple, fast web crawler

28,975

0.4.0

2018-05-29

2020-07-15

sinew

0.09

Low commit activity in last 3 years

A long-lived project that still receives updates

sinew gurgeous/sinew Homepage

Crawl websites easily using Ruby recipes, with caching and Nokogiri.

52,856

254

4.0.1

2012-06-04

2023-08-19

pismo

0.34

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

pismo peterc/pismo Homepage

Pismo extracts content-related metadata from HTML pages and returns it in an organized form: a summary or first paragraph, body text, keywords, RSS feed URL, favicon, and so on.

230,362

746

0.7.4

2010-03-26

2010-12-19

docparser

0.01

No release in over 3 years

Low commit activity in last 3 years

docparser jurriaan/docparser Homepage

DocParser is a Ruby gem for web scraping

31,982

0.3.0

2013-04-11

2020-04-13

boilerpipe-ruby

0.05

No commit activity in last 3 years

No release in over 3 years

boilerpipe-ruby gregors/boilerpipe-ruby Homepage

A pure Ruby implementation of the boilerpipe web content extraction algorithm

1,384,005

0.5.0

2016-03-13

2021-02-15

kimurai

0.45

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

kimurai vifreefly/kimuraframework Homepage

Modern web scraping framework written in Ruby and based on Capybara/Nokogiri

154,144

999

1.4.0

2018-08-23

2019-01-30

tanakai

0.09

Low commit activity in last 3 years

tanakai glaucocustodio/tanakai Homepage

Maintained fork of Kimurai, a modern web scraping framework written in Ruby and based on Capybara/Nokogiri

13,428

260

1.7.3

2022-08-13

2023-12-14

wiki-api

0.0

Low commit activity in last 3 years

A long-lived project that still receives updates

wiki-api dblommesteijn/wiki-api Homepage

MediaWiki API and Page content parser for Headlines (nested), TextBlocks, ListItems, and Links.

9,177

0.1.2

2013-03-28

2023-12-20

url_scraper

0.0

No commit activity in last 3 years

No release in over 3 years

url_scraper super-engineer/url_scraper Homepage

A simple plugin for extracting information from a URL entered by a user (similar to what Facebook does). This gem is built on top of the opengraph gem created by Michael Bleigh.

9,020

0.0.5

2013-05-06

2013-07-23
