Web Content Scrapers
Tools designed to extract and process data from websites efficiently
0.61
Anemone web-spider framework
0.41
MetaInspector lets you scrape a web page and get its links, images, texts, meta tags...
0.39
Generic Web crawler with a DSL that parses structured data from web pages
0.33
Modern web scraping framework written in Ruby and based on Capybara/Nokogiri
0.23
Pismo extracts content-related metadata from HTML pages and returns it in an organized form: summary/first paragraph, body text, keywords, RSS feed URL, favicon, etc.
0.21
Ruby gem generating thumbnail images from a given URL.
0.09
Cobweb is a web crawler that can use Resque to cluster crawls, allowing it to crawl extremely large sites quickly and with much better performance than multi-threaded crawlers. It also works as a standalone crawler and includes a sophisticated statistics interface for monitoring crawl progress.
0.08
Download, extract from a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses Upsert internally for speed.
0.07
Maintained fork of Kimurai, a modern web scraping framework written in Ruby and based on Capybara/Nokogiri
0.06
Crawl websites easily using Ruby recipes, with caching and Nokogiri.
0.04
A pure Ruby implementation of the boilerpipe web content extraction algorithm
0.02
Easily fetch product information from third party websites such as Amazon, Steam, eBay, etc.
0.01
DocParser is a Ruby gem for web scraping
0.01
A simple, fast web crawler
0.0
It's a scraper
0.0
A simple plugin for extracting information from a URL entered by the user (similar to what Facebook does). This gem is built on top of the opengraph gem created by Michael Bleigh.
0.0
MediaWiki API and Page content parser for Headlines (nested), TextBlocks, ListItems, and Links.