Project

web_parser

0.0
No commit activity in last 3 years
No release in over 3 years
Simple gem for easy web page parsing.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

~> 1.3
>= 0
>= 0

Runtime

>= 1.6.0
 Project Readme

WebParser Code Climate

Simple gem for easy information fetching from web pages.

Installation

Add this line to your application's Gemfile:

gem 'web_parser'

And then execute:

$ bundle

Or install it yourself as:

$ gem install web_parser

Example usage

Just write your own class and include WebParser::Doc.

class YahooSearch
  # include WebParser
  include WebParser::Doc

  # define recipes
  recipes do
    # simplest way to define recipe
    query          :xpath, '//input[@id="yschsp"]/@value'
    # You can use simple normalization as last parameter
    query_downcase :xpath, '//input[@id="yschsp"]/@value',
                   ->(value) { value.text.downcase }
    # You can also provide just method name for normalization
    page_number    :css, '#pg > strong', :to_i
    # Or you can do whatever you want to obtain value, just provide lambda as
    # parameter
    first_page?    :lambda, ->(doc) {
      doc.css('#pg > strong').text.to_i == 1
    }
    # Nesting
    right_links do
      sign_in :css, '#yucs-profile', :strip
      mail    :css, '#yucs-mail_link_id', :strip
    end
    # Array, usefull for example when parsing eshops
    results :css, '#web > ol > li' do
      name :css, '> .res h3'
      url  :xpath, './/h3[1]/a/@href'
    end
  end
end

Then just initialize your class and call parse.

require 'open-uri'

html_page = open('http://search.yahoo.com/search?p=Ruby').read

YahooSearch.new(html_page).parse
=> {:query=>"Ruby",
    :query_downcase=>"ruby",
    :page_number=>1,
    :first_page?=>true,
    :right_links=>{:sign_in=>"Sign In", :mail=>"Mail"},
    :results=>
      [
        {
          :name=>"Ruby Programming Language",
          :url=>"https://www.ruby-lang.org/en/" },
        {
          :name=>"Ruby - Wikipedia, the free encyclopedia",
          :url=>"http://en.wikipedia.org/wiki/Ruby"},
        {
          :name=>"Ruby - Image Results",
          :url=>"http://images.search.yahoo.com/search/images?_adv_prop=image&va=Ruby"},
        {
          :name=>"Ruby (programming language) - Wikipedia, the free
                encyclopedia",
          :url=>"http://en.wikipedia.org/wiki/Ruby_(programming_language)"},
        {
          :name=>"‘Ruby’ Today: Reality Star Dishes on Show’s Failure ...",
          :url=>"http://abcnews.go.com/blogs/entertainment/2013/01/ruby-today-reality-star-dishes-on-shows-failure/"},
        {
          :name=>"Download Ruby",
          :url=>"https://www.ruby-lang.org/en/downloads/"},
        {
          :name=>"Ruby: The gemstone Ruby information and pictures",
          :url=>"http://www.minerals.net/gemstone/ruby_gemstone.aspx"},
        {
          :name=>"Ruby - Gemstone",
          :url=>"http://www.gemstone.org/index.php?option=com_content&view=article&id=85:ruby&catid=1:gem-by-gem&Itemid=14"},
        {
          :name=>"Buy Loose Precious Ruby Gemstones at Wholesale Prices from ...",
          :url=>"http://www.gemselect.com/ruby/ruby.php"},
        {
          :name=>"Ruby on Rails",
          :url=>"http://rubyonrails.org/"},
        {
          :name=>"Ruby (Adventures) - Bulbapedia, the community-driven Pokémon ...",
          :url=>"http://bulbapedia.bulbagarden.net/wiki/Ruby_(Adventures)"}
      ]
    }

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request