Category

Web Content Scrapers

This category does not have a description yet. You can add one on github!

0.49
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
Pismo extracts and retrieves content-related metadata from HTML pages - you can use the resulting data in an organized way, such as a summary/first paragraph, body text, keywords, RSS feed URL, favicon, etc.
0.19
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Cobweb is a web crawler that can use resque to cluster crawls to quickly crawl extremely large sites which is much more performant than multi-threaded crawlers. It is also a standalone crawler that has a sophisticated statistics monitoring interface to monitor the progress of the crawls.
0.19
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.
0.01
No commit activity in last 3 years
No release in over 3 years
A simple plugin for extracting information from url entered by user (Something like what facebook does). This gem is built on top of opengraph gem created by michael bleigh.