Category

Web Content Scrapers

This category does not have a description yet. You can add one on GitHub!

0.17
Repository is gone
A long-lived project that still receives updates
MetaInspector lets you scrape a web page and get its links, images, texts, meta tags...
Popularity chart, 2005 to 2020
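To make the kind of extraction described for MetaInspector concrete (links, images, meta tags), here is a minimal sketch using only the Ruby standard library. It is not MetaInspector's API: `inspect_page` is a hypothetical helper, and the regular expressions are a crude stand-in for a real HTML parser, so treat it as an illustration of the idea only.

```ruby
require "uri"

# Hypothetical helper, NOT part of MetaInspector.
# Pulls the title, links, images, and named meta tags out of an HTML string
# with regular expressions. Real-world pages need a proper HTML parser.
def inspect_page(html, base_url)
  links  = html.scan(/<a\s[^>]*href=["']([^"']+)["']/i).flatten
  images = html.scan(/<img\s[^>]*src=["']([^"']+)["']/i).flatten
  # Only matches <meta name="..." content="..."> in that attribute order.
  meta   = html.scan(/<meta\s+name=["']([^"']+)["']\s+content=["']([^"']*)["']/i).to_h
  title  = html[/<title>(.*?)<\/title>/im, 1]

  {
    title:  title&.strip,
    # Resolve relative references against the page URL.
    links:  links.map  { |href| URI.join(base_url, href).to_s },
    images: images.map { |src|  URI.join(base_url, src).to_s },
    meta:   meta
  }
end

html = <<~HTML
  <html><head><title> Example </title>
  <meta name="description" content="A demo page"></head>
  <body><a href="/about">About</a><img src="logo.png"></body></html>
HTML

page = inspect_page(html, "https://example.com/docs/")
puts page[:title]
puts page[:links].inspect
```

The helper takes the HTML as a string so it can be tried without network access; fetching the page is left to the caller.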
0.14
Repository is gone
No release in over 3 years
Anemone web-spider framework
Popularity chart, 2005 to 2020
0.12
Repository is gone
A pure Ruby implementation of the boilerpipe web content extraction algorithm
Popularity chart, 2005 to 2020
0.06
Repository is gone
No release in over 3 years
Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.
Popularity chart, 2005 to 2020
0.04
Repository is gone
No release in over a year
Cobweb is a web crawler that can use Resque to cluster crawls, so extremely large sites can be crawled quickly; its authors describe this as much more performant than multi-threaded crawlers. It can also run as a standalone crawler and includes a statistics interface for monitoring crawl progress.
Popularity chart, 2005 to 2020
0.04
Repository is gone
No release in over a year
Ruby gem generating thumbnail images from a given URL.
Popularity chart, 2005 to 2020
0.03
Repository is gone
No release in over 3 years
Pismo extracts and retrieves content-related metadata from HTML pages - you can use the resulting data in an organized way, such as a summary/first paragraph, body text, keywords, RSS feed URL, favicon, etc.
Popularity chart, 2005 to 2020
0.02
Repository is gone
A long-lived project that still receives updates
Generic Web crawler with a DSL that parses structured data from web pages
Popularity chart, 2005 to 2020
0.01
Repository is gone
No release in over a year
Crawl web sites easily using Ruby recipes, with caching and Nokogiri.
Popularity chart, 2005 to 2020
0.01
Repository is gone
No release in over 3 years
Easily fetch product information from third-party websites such as Amazon, Steam, eBay, etc.
Popularity chart, 2005 to 2020
0.01
Repository is gone
No release in over a year
It's a scraper
Popularity chart, 2005 to 2020
0.0
Repository is gone
No release in over a year
Modern web scraping framework written in Ruby and based on Capybara/Nokogiri
Popularity chart, 2005 to 2020
0.0
Repository is gone
A simple, fast web crawler
Popularity chart, 2005 to 2020
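Several entries in this category are crawlers. As a rough illustration of what a simple crawler does, the following sketch performs a breadth-first crawl restricted to one host. It does not reflect the API of any gem listed here: `crawl`, its `fetch:` callable, and `max_pages:` are hypothetical names, and link extraction uses a crude regular expression rather than an HTML parser.

```ruby
require "set"
require "uri"

# Hypothetical sketch, not taken from any listed gem.
# Breadth-first crawl that stays on the start URL's host.
# `fetch` is any callable taking a URL string and returning HTML (or nil).
def crawl(start_url, fetch:, max_pages: 100)
  host    = URI.parse(start_url).host
  seen    = Set[start_url]   # URLs already queued, to avoid revisiting
  queue   = [start_url]
  visited = []

  until queue.empty? || visited.size >= max_pages
    url  = queue.shift
    html = fetch.call(url)
    next if html.nil?
    visited << url

    # Crude link extraction; fragment-only links are skipped.
    html.scan(/<a\s[^>]*href=["']([^"'#]+)/i).flatten.each do |href|
      begin
        link = URI.join(url, href).to_s
      rescue URI::Error
        next # ignore malformed links
      end
      next unless URI.parse(link).host == host
      queue << link if seen.add?(link)
    end
  end

  visited
end

# In-memory "site" so the sketch runs without network access.
pages = {
  "https://example.com/"  => '<a href="/a">A</a> <a href="https://other.test/x">X</a>',
  "https://example.com/a" => '<a href="/">home</a> <a href="/b">B</a>',
  "https://example.com/b" => '<a href="/a">A</a>'
}
fetch = ->(url) { pages[url] }

puts crawl("https://example.com/", fetch: fetch).inspect
```

Passing the fetcher in as a callable keeps the traversal logic separate from HTTP concerns, so a real HTTP client, a cache, or a job queue could be swapped in behind the same interface.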
0.0
Repository is gone
No release in over 3 years
A simple plugin for extracting information from a URL entered by the user (similar to what Facebook does). This gem is built on top of the opengraph gem created by Michael Bleigh.
Popularity chart, 2005 to 2020
0.0
Repository is gone
No release in over 3 years
DocParser is a Ruby gem for web scraping
Popularity chart, 2005 to 2020
0.0
Repository is gone
No release in over 3 years
MediaWiki API and Page content parser for Headlines (nested), TextBlocks, ListItems, and Links.
Popularity chart, 2005 to 2020