
Horsefield

It's for scraping: declare the fields you want with CSS or XPath selectors, and Horsefield returns them as a hash.

Installation

Add this line to your application's Gemfile:

```ruby
gem 'horsefield'
```

And then execute:

```
$ bundle
```

Or install it yourself as:

```
$ gem install horsefield
```

Usage

Define a scraper:

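```ruby
require 'horsefield'

class RedditScraper
  include Horsefield::Scraper

  # `many` collects every match into an array; the block scrapes fields inside each match
  many :posts, '#siteTable .thing' do
    one :title, 'a.title'                                       # `one` yields a single value
    many :links, './/a[contains(@href, "reddit.com")]/@href'    # XPath works as well as CSS
  end

  many :trending, '.trending-subreddits-content > ul > li a'
end
```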
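To make the `one`/`many` semantics concrete, here is a toy re-creation of such a DSL built only on Ruby's REXML. It is a sketch under stated assumptions, not how Horsefield is implemented: `MiniScraper`, `Field`, `extract`, `text_of` and `PostScraper` are hypothetical names, selectors must be XPath (REXML has no CSS engine), and the input must be well-formed XHTML.

```ruby
require 'rexml/document'

# Hypothetical toy version of a `one`/`many` scraping DSL, NOT Horsefield's code.
# Assumptions: XPath selectors only, well-formed XHTML input.
module MiniScraper
  # One declared field: its name, XPath, :one or :many, and nested fields.
  Field = Struct.new(:name, :xpath, :arity, :children)

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def fields
      @fields ||= []
    end

    def one(name, xpath, &block)
      fields << Field.new(name, xpath, :one, nested(&block))
    end

    def many(name, xpath, &block)
      fields << Field.new(name, xpath, :many, nested(&block))
    end

    private

    # Run a nested block inside a throwaway class to collect its field list.
    def nested(&block)
      return [] unless block
      collector = Class.new { include MiniScraper }
      collector.class_eval(&block)
      collector.fields
    end
  end

  # Evaluate a field list against a node; fields with children become hashes.
  def self.extract(node, fields)
    fields.each_with_object({}) do |field, result|
      values = REXML::XPath.match(node, field.xpath).map do |match|
        field.children.empty? ? text_of(match) : extract(match, field.children)
      end
      result[field.name] = field.arity == :one ? values.first : values
    end
  end

  # Attributes yield their value; elements yield their stripped text.
  def self.text_of(match)
    case match
    when REXML::Attribute then match.value
    when REXML::Element then match.text.to_s.strip
    else match.to_s
    end
  end

  def initialize(html)
    @doc = REXML::Document.new(html)
  end

  def scrape
    MiniScraper.extract(@doc.root, self.class.fields)
  end
end

# Usage: the same shape as RedditScraper, rewritten with XPath selectors.
class PostScraper
  include MiniScraper

  many :posts, '//div[@class="thing"]' do
    one :title, './/a[@class="title"]'
    many :links, './/a/@href'
  end
end

html = <<~HTML
  <html><body>
    <div class="thing"><a class="title" href="/a">First</a></div>
    <div class="thing"><a class="title" href="/b">Second</a><a href="/c">more</a></div>
  </body></html>
HTML

# One hash per matching div, each with a :title string and a :links array.
result = PostScraper.new(html).scrape
```

The nested block is evaluated against a throwaway class so that its `one`/`many` calls record fields without touching the outer scraper's list; the resulting tree of `Field` structs is then walked once per document.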

Then use it with a URL or an HTML string:

```ruby
RedditScraper.new('http://www.reddit.com').scrape
```
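The README does not say how the constructor tells a URL from an HTML string. One plausible approach, sketched here with Ruby's standard `uri` and `net/http` libraries, is to fetch the argument only if it parses as an `http`/`https` URI and otherwise treat it as markup. The helpers `url?` and `html_from` are hypothetical and not part of Horsefield.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper, not part of Horsefield: true only for http(s) URLs.
def url?(source)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  URI.parse(source).is_a?(URI::HTTP)
rescue URI::Error
  # Strings that are not valid URIs, such as most HTML, land here.
  false
end

# Hypothetical helper: fetch URLs, pass raw HTML through unchanged.
def html_from(source)
  # Net::HTTP.get performs a single GET and does not follow redirects.
  url?(source) ? Net::HTTP.get(URI.parse(source)) : source
end

# Raw HTML is returned as-is; no request is made.
html_from('<html><body><p>hi</p></body></html>')
```

A real implementation would likely also follow redirects and handle errors; this sketch only shows the dispatch.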

Enjoy:

```ruby
{:posts=>
  [{:title=>"Chris Pratt, homeless, living in this van, holding the script to his first acting job",
    :links=>["http://www.reddit.com/user/Ripsaw99", "http://www.reddit.com/r/pics/", "http://www.reddit.com/r/pics/comments/2v16z9/chris_pratt_homeless_living_in_this_van_holding/"]},
   {:title=>"Cannot believe I got him to sit and stay for this.",
    :links=>["http://www.reddit.com/user/Hurevolution4lx", "http://www.reddit.com/r/aww/", "http://www.reddit.com/r/aww/comments/2v0tuh/cannot_believe_i_got_him_to_sit_and_stay_for_this/"]}
  ...
```
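Since the result is plain Ruby hashes and arrays, it can be post-processed with ordinary `Enumerable` methods. The sketch below uses the two posts from the sample output (the rest is elided there too) and picks out each post's comments permalink and subreddit URL; the variable names are illustrative.

```ruby
# Two entries copied from the sample output above.
result = {
  posts: [
    { title: "Chris Pratt, homeless, living in this van, holding the script to his first acting job",
      links: ["http://www.reddit.com/user/Ripsaw99",
              "http://www.reddit.com/r/pics/",
              "http://www.reddit.com/r/pics/comments/2v16z9/chris_pratt_homeless_living_in_this_van_holding/"] },
    { title: "Cannot believe I got him to sit and stay for this.",
      links: ["http://www.reddit.com/user/Hurevolution4lx",
              "http://www.reddit.com/r/aww/",
              "http://www.reddit.com/r/aww/comments/2v0tuh/cannot_believe_i_got_him_to_sit_and_stay_for_this/"] }
  ]
}

# Map each title to its comments permalink (the link containing "/comments/").
permalinks = result[:posts].to_h do |post|
  [post[:title], post[:links].find { |link| link.include?("/comments/") }]
end

# Each post's subreddit URL: the link that ends right after "/r/<name>/".
subreddits = result[:posts].map do |post|
  post[:links].find { |link| link.match?(%r{/r/\w+/\z}) }
end
```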

Contributing

  1. Fork it (https://github.com/[my-github-username]/horsefield/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request