Project

alacrity

0.0
No commit activity in last 3 years
No release in over 3 years
Web Page Scraper written in Ruby - Extracts any viable HTML DOM elements specified using CSS selectors into structured scraped data
 Dependencies

Runtime

~> 1.5.11
 Project Readme

ALACRITY

Alacrity is a simple Ruby scraper. Given the URL of a web page, Alacrity extracts the information you want from the elements you search for. Alacrity depends on the Nokogiri gem and uses Nokogiri's built-in CSS selector support.

Installation

Add gem 'alacrity' to Gemfile
Or
gem install alacrity

How to use Alacrity

Let's say the page at your source URL contains the following snippet:

<html>
    <body>
        <h3>I want to be scraped!</h3>
        <h3>Dont forget to scrap me too!</h3>
    </body>
</html>

Running Alacrity with a lookup for 'h3' elements will return something like this:

{"all_h3_tags" => {0 => "I want to be scraped!", 1 => "Dont forget to scrap me too!"}}

Sample Run:

get_me_info = Alacrity::Source.new("http://some_url.com") do
  fetch "all_h3_tags", :lookup=>'h3'
end

Custom procs and lambdas!

By default, Alacrity returns the text of every element it finds. You can also pass your own proc or lambda to shape the structured data however you like. Note that the 'elem' passed to your proc/lambda is a Nokogiri::XML::Element, so read the Nokogiri documentation to see which methods and attributes are available on Nokogiri::XML::Element.

Sample Run:

get_me_info = Alacrity::Source.new("http://www.infibeam.com/") do
  fetch "all_anchor_tags", :lookup=>'a', :post_fetch=>proc {|elem| elem.attributes["href"].value rescue nil}
end

get_me_info.structured_data["all_anchor_tags"] should give you the link (href value) of every anchor tag on the page.

Read more in the Wiki.