HTML parsing

nokogiri

1359

959
80
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors. XML is like violence - if it doesn’t solve your problems, you are not using enough of it.
Last commit: Mon, 30 Aug 2010 16:02:24 +0000

gem install nokogiri

Downloads: 408036

v1.4.3.1
70026

hpricot

333

180
38
A swift, liberal HTML parser with a fantastic library.
Last commit: Mon, 30 Aug 2010 14:56:47 +0000

gem install hpricot

Downloads: 209697

v0.8.2
157032

scrubyt

259

254
29
scRUBYt! is an easy to learn and use, yet powerful and effective web scraping framework. Its most interesting part is a web-scraping DSL built on Hpricot and WWW::Mechanize, which lets you navigate to the page of interest, then extract and query data records with a few lines of code. It is hard to describe scRUBYt! in a few sentences - you have to see it for yourself!
Last commit: Mon, 25 May 2009 17:07:26 +0000

gem install scrubyt

Downloads: 4421

v0.4.06
3203

scrapi

57

74
3
scrAPI is an HTML scraping toolkit for Ruby. It uses CSS selectors to express simple, maintainable scraping rules that select, extract, and store data from HTML content.
Last commit: Mon, 25 Aug 2008 20:41:23 +0000

gem install scrapi

Downloads: 3469

v1.2.0
3350

libxml-ruby

22

11
3
The Libxml-Ruby project provides Ruby language bindings for the GNOME Libxml2 XML toolkit. It is free software, released under the MIT License. Libxml-Ruby's primary advantage over REXML is performance - if speed is your need, this is a good library to consider.
Last commit: Sun, 02 May 2010 21:38:42 +0000

gem install libxml-ruby

Downloads: 106188

v1.1.4
44191