Web Content Scrapers

anemone

Anemone web-spider framework

Rubygem anemone

Total Downloads
160857
Releases
23
Current Version
0.7.2
Released
2012-05-30 00:00:00 UTC
First Release
2009-04-14 05:00:00 UTC

Github chriskite/anemone

Watchers
1024
Forks
223
Development activity
Inactive
Last commit
2012-05-30 19:34:21 UTC
Contributors
10

Pismo

Pismo extracts content-related metadata from HTML pages and returns it in an organized form: a summary or first paragraph, body text, keywords, RSS feed URL, favicon, and so on.

Rubygem pismo

Total Downloads
27637
Releases
13
Current Version
0.7.4
Released
2010-12-19 00:00:00 UTC
First Release
2010-03-26 00:00:00 UTC
Depending Gems
2

Github peterc/pismo

Watchers
501
Forks
85
Development activity
Less active
Last commit
2013-09-14 13:04:08 UTC
Contributors
11

data_miner

Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.

Rubygem data_miner

Total Downloads
150060
Releases
115
Current Version
3.0.0
Released
2014-02-04 00:00:00 UTC
First Release
2009-10-30 07:00:00 UTC

Github seamusabshere/data_miner

Watchers
229
Forks
18
Development activity
Less active
Last commit
2014-02-04 20:35:33 UTC
Contributors
5

metainspector

MetaInspector lets you scrape a web page and get its title, charset, links, and meta tags.

Rubygem metainspector

Total Downloads
41290
Releases
52
Current Version
2.1.0
Released
2014-02-16 00:00:00 UTC
First Release
2007-12-05 23:00:00 UTC
Depending Gems
3

Github jaimeiniesta/metainspector

Watchers
203
Forks
22
Development activity
Less active
Last commit
2014-03-25 08:45:34 UTC
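
MetaInspector's description above names four kinds of data: title, charset, links, and meta tags. The snippet below is a minimal, standard-library-only sketch of that kind of extraction. It is not MetaInspector's API: the helper name `extract_page_metadata` and the sample HTML are assumptions made for illustration, and the regular expressions only handle simple, well-formed markup.

```ruby
# Illustrative sketch only; NOT MetaInspector's API.
# extract_page_metadata is a hypothetical helper name.
def extract_page_metadata(html)
  # First <title> element, if any.
  title = html[/<title[^>]*>(.*?)<\/title>/mi, 1]

  # Charset from <meta charset="..."> or a Content-Type meta tag.
  charset = html[/<meta[^>]+charset=["']?([\w\-]+)/i, 1]

  # Collect name/property => content pairs from <meta> tags.
  meta = {}
  html.scan(/<meta\b[^>]*>/i) do |tag|
    name    = tag[/\b(?:name|property)=["']([^"']+)["']/i, 1]
    content = tag[/\bcontent=["']([^"']*)["']/i, 1]
    meta[name] = content if name && content
  end

  # href values of all anchor tags, in document order.
  links = html.scan(/<a\b[^>]*\bhref=["']([^"']+)["']/i).flatten

  { title: title&.strip, charset: charset, meta: meta, links: links }
end

# Hypothetical sample page, inlined so the sketch needs no network access.
sample = <<~HTML
  <html><head>
    <meta charset="utf-8">
    <title> Example page </title>
    <meta name="description" content="A test page">
  </head><body><a href="/about">About</a></body></html>
HTML

info = extract_page_metadata(sample)
puts info.inspect
```

A real scraper would fetch the page over HTTP and use an HTML parser rather than regular expressions; the sketch only shows the shape of the result.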

cobweb

Cobweb is a web crawler that can use Resque to distribute a crawl across a cluster of workers, which lets it crawl extremely large sites quickly and, according to its description, much more performantly than multi-threaded crawlers. It can also run as a standalone crawler, and it includes a sophisticated statistics interface for monitoring the progress of crawls.

Rubygem cobweb

Total Downloads
70164
Releases
81
Current Version
1.0.19
Released
2013-11-26 00:00:00 UTC
First Release
2010-11-10 00:00:00 UTC
Depending Gems
2

Github stewartmckee/cobweb

Watchers
74
Forks
19
Development activity
Less active
Last commit
2014-01-17 11:32:48 UTC
Contributors
5

sinew

Crawl web sites easily using Ruby recipes, with caching and Nokogiri.

Rubygem sinew

Total Downloads
3232
Releases
5
Current Version
1.0.4
Released
2013-11-10 00:00:00 UTC
First Release
2012-06-04 00:00:00 UTC
Depending Gems
0

Github gurgeous/sinew

Watchers
182
Forks
10
Development activity
Inactive
Last commit
2013-11-14 04:30:26 UTC
Contributors
1

fletcher

Easily fetch product information from third party websites such as Amazon, Steam, eBay, etc.

Rubygem fletcher

Total Downloads
10500
Releases
15
Current Version
0.6.6
Released
2013-12-01 00:00:00 UTC
First Release
2011-12-07 00:00:00 UTC
Depending Gems
0

Github hulihanapplications/fletcher

Watchers
33
Forks
6
Development activity
Less active
Last commit
2013-12-01 00:23:16 UTC
Contributors
2

DocParser

DocParser is a Ruby gem for web scraping.

Rubygem docparser

Total Downloads
3347
Releases
8
Current Version
0.2.0
Released
2014-03-08 00:00:00 UTC
First Release
2013-04-11 00:00:00 UTC
Depending Gems
0

Github jurriaan/docparser

Watchers
6
Forks
0
Development activity
Less active
Last commit
2013-10-19 18:17:45 UTC
Contributors
1

url_scraper

A simple plugin for extracting information from a URL entered by a user (similar to what Facebook does). This gem is built on top of the opengraph gem created by Michael Bleigh.

Rubygem url_scraper

Total Downloads
1251
Releases
3
Current Version
0.0.5
Released
2013-07-23 00:00:00 UTC
First Release
2013-05-06 00:00:00 UTC
Depending Gems
0

Github super-engineer/url_scraper

Watchers
5
Forks
0
Development activity
Less active
Last commit
2013-07-23 17:05:52 UTC
Contributors
0