Hiroiyomi
Hiroiyomi is an HTML parser and provides a filter feature by tag names.
Features
- Filter HTML page content by tag names
Synopsis
Hiroiyomi reads a HTML raw content from a specific url and builds DOM structures inside. After the structures are built, the DOM elements are filtered. Here are the steps of how to build a element from raw string.
e.g. <h1 class="title">Hiroiyomi</1>
- Check one char until
<appears. - After the '<', check next sequential chars and store them as element name until space char appears.
- After the space char, check next sequential chars and store them as attribute name until space char,
>,/,=,", or'appears. - After
>or/, there is no attribute value. After=,", or', check next sequential chars and store them as attribute value until space char,>,/,", or'appears. - After
/, the element does not have close tag. - After
>, check next sequential chars and store them as text of the element child until new<appears, - After
</, check next sequential chars and compare them with the element name whether the both are the same. - If the both the element name and the name after
</are the same, the element is build as DOM element in this case.
Installation
Add this line to your application's Gemfile:
gem 'hiroiyomi'And then execute:
$ bundle
Or install it yourself as:
$ gem install hiroiyomi
Usage
# @param [String] url URL
# @param [Array] filter of filtered by name list, e.g. [h1, h2, h3]
# @param [Boolean] is_deep Whether result is filtered into children
#
# @return [Array] of Hiroiyomi::Html::Element which has been filtered
Hiroiyomi.read('https://github.com', filter: %w[h1 h2 h3 a link], is_deep: true) Requirement
- Ruby 2.5.1+
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/tomosm/hiroiyomi.