slasherrb

This project is the Ruby version of slasherjs. Slasher is a library that extracts the main content of an HTML article document. The extraction relies on assumptions about the structure of the HTML document, so the result may be flawed if the document doesn't match a structure the library recognises. The library will be improved over time to cover such cases.

How To Install

Like any other gem, just run:

gem install slasher

or add this to your Gemfile:

gem 'slasher'

How To Use

To use the library, you need to have an HTML document first.

require 'net/http'
require 'slasher'

uri = URI("http://sea-games-2015.liputan6.com/read/2252937/all-indonesia-finals-ganda-putra-sumbang-emas")
html = Net::HTTP.get(uri)

slasher = Slasher.new(html)
content = slasher.slash

# The content variable now holds the main content of the HTML document (the article).
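
The snippet above assumes the request succeeds and that the page matches a structure the library recognises. A minimal sketch of a more defensive wrapper follows. The helper names fetch_html and extract_article are hypothetical and not part of slasher; the only slasher calls assumed are the Slasher.new(html) and slash shown above, and treating a blank result as "nothing extracted" is an assumption, not documented behaviour.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of slasher): fetch a page, following up to
# `limit` redirects, and return the response body as a String.
def fetch_html(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  uri = URI(url)
  response = Net::HTTP.get_response(uri)

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Resolve a possibly relative Location header against the current URI.
    fetch_html(URI.join(uri.to_s, response['location']).to_s, limit - 1)
  else
    raise "request failed: #{response.code} #{response.message}"
  end
end

# Hypothetical helper: run any extractor that follows the interface shown
# above (`new(html)` followed by `slash`). Returns nil when the input or the
# extracted content is blank (an assumption made for this sketch).
def extract_article(html, extractor_class)
  return nil if html.nil? || html.strip.empty?

  content = extractor_class.new(html).slash
  content.nil? || content.to_s.strip.empty? ? nil : content
end
```

With the gem loaded via require 'slasher', this could be called as extract_article(fetch_html(url), Slasher); a nil result would then signal that nothing usable was extracted. Passing the extractor class as an argument keeps the helper easy to exercise with a stub.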

Website Coverage

This library has been tested against a number of websites; you can see the complete list in this document.

TODO

Add more test cases: international websites
Performance analysis
Better API documentation
