Project

hashtml

0.0
No commit activity in last 3 years
No release in over 3 years
HashTML is a gem for parsing HTML documents to Ruby Hash-like objects.
 Dependencies

Development

~> 1.3
~> 10.1
~> 0.8

Runtime

~> 1.5
 Project Readme

hashtml

HashTML is a gem for parsing HTML documents into Ruby Hash-like objects.

Installation

HashTML is available as a RubyGem:

gem install hashtml
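
If you manage dependencies with Bundler, a Gemfile entry along these lines should work as well. This is a minimal sketch; add a version constraint if your project needs one.

```ruby
# Gemfile (sketch): pulls the hashtml gem from rubygems.org
source 'https://rubygems.org'

gem 'hashtml'
```

Then run bundle install.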

Usage

HashTML accepts a Nokogiri::HTML::Document, or any object whose to_s returns a string of valid HTML, and generates a HashTML object that mirrors the structure of that HTML.

Example:

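html = <<-HTML
<html>
  <body>
    <div id="d1" style="color: blue">
        <h1>hello world!</h1>
    </div>
    <div id="d2" style="color: green">
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
    </div>
  </body>
</html>
HTML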
hashtml = HashTML.new(html)
hashtml.inspect # => #<HashTML:0x00000001328650 @root_node=#<HashTML::Node:0x000000013283f8 @name="document", @attributes={}, @children=[#<HashTML::Node:0x00000001327ef8 @name="html", @attributes={}, @children=[#<HashTML::Node:0x00000001327a20 @name="body", @attributes={}, @children=[#<HashTML::Text:0x00000001326300 @text="\n    ">, #<HashTML::Node:0x00000001326288 @name="div", @attributes={"id"=>"d1", "style"=>"color: blue"}, @children=[#<HashTML::Text:0x0000000132c8b8 @text="\n        ">, #<HashTML::Node:0x0000000132c728 @name="h1", @attributes={}, @children=[#<HashTML::Text:0x0000000132b4e0 @text="hello world!">]>, #<HashTML::Text:0x0000000132a7c0 @text="\n    ">]>, #<HashTML::Text:0x00000001329a50 @text="\n    ">, #<HashTML::Node:0x000000013299d8 @name="div", @attributes={"id"=>"d2", "style"=>"color: green"}, @children=[#<HashTML::Text:0x000000013306c0 @text="\n        ">, #<HashTML::Node:0x00000001330620 @name="p", @attributes={}, @children=[#<HashTML::Text:0x0000000132ef00 @text="Lorem ipsum dolor sit amet, consectetur adipiscing elit.">]>, #<HashTML::Text:0x0000000132e5c8 @text="\n    ">]>, #<HashTML::Text:0x0000000132d6f0 @text="\n  ">]>]>]>>
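
To make the shape of that object easier to see, here is a stripped-down model of the tree in plain Ruby. It is an illustration only: SketchNode and SketchText are hypothetical stand-ins for the HashTML::Node (name, attributes, children) and HashTML::Text (text) objects in the inspect output, and the tree is built by hand instead of being parsed from HTML.

```ruby
# Hypothetical stand-ins for HashTML::Node and HashTML::Text;
# these are not the gem's actual classes.
SketchText = Struct.new(:text)
SketchNode = Struct.new(:name, :attributes, :children)

# Hand-built tree for: <div id="d1"><h1>hello world!</h1></div>
# (the gem builds this from parsed HTML; here it is written out by hand)
tree = SketchNode.new('document', {}, [
  SketchNode.new('div', { 'id' => 'd1' }, [
    SketchNode.new('h1', {}, [SketchText.new('hello world!')])
  ])
])

# Depth-first walk returning every element name in document order;
# text nodes contribute nothing.
def element_names(node)
  return [] if node.is_a?(SketchText)

  [node.name] + node.children.flat_map { |child| element_names(child) }
end

p element_names(tree) # => ["document", "div", "h1"]
```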

HashTML allows you to convert the object to a Ruby Hash with to_h.

Example:

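html = <<-HTML
<html>
  <body>
    <div id="d1" style="color: blue">
        <h1>hello world!</h1>
    </div>
    <div id="d2" style="color: green">
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
    </div>
  </body>
</html>
HTML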
hashtml = HashTML.new(html)
hashtml.to_h # => {"document"=>{:attributes=>{}, :children=>[{"html"=>{:attributes=>{}, :children=>[{"body"=>{:attributes=>{}, :children=>[{:text=>"\n    "}, {"div"=>{:attributes=>{"id"=>"d1", "style"=>"color: blue"}, :children=>[{:text=>"\n        "}, {"h1"=>{:attributes=>{}, :children=>[{:text=>"hello world!"}]}}, {:text=>"\n    "}]}}, {:text=>"\n    "}, {"div"=>{:attributes=>{"id"=>"d2", "style"=>"color: green"}, :children=>[{:text=>"\n        "}, {"p"=>{:attributes=>{}, :children=>[{:text=>"Lorem ipsum dolor sit amet, consectetur adipiscing elit."}]}}, {:text=>"\n    "}]}}, {:text=>"\n  "}]}}]}}]}}
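
The nesting of that Hash follows a simple rule: each element becomes { name => { attributes: ..., children: [...] } } and each text node becomes { text: ... }. A minimal recursive sketch of that rule, using hypothetical SketchNode/SketchText classes rather than the gem's own code:

```ruby
# Hypothetical minimal node types; the gem's own classes may differ.
SketchText = Struct.new(:text) do
  # Text nodes become { text: "..." }, as in the to_h output above.
  def to_h
    { text: text }
  end
end

SketchNode = Struct.new(:name, :attributes, :children) do
  # Elements become { name => { attributes: ..., children: [...] } },
  # recursing into every child.
  def to_h
    { name => { attributes: attributes, children: children.map(&:to_h) } }
  end
end

tree = SketchNode.new('div', { 'id' => 'd1' }, [
  SketchText.new("\n  "),
  SketchNode.new('h1', {}, [SketchText.new('hello world!')])
])

expected = {
  'div' => {
    attributes: { 'id' => 'd1' },
    children: [
      { text: "\n  " },
      { 'h1' => { attributes: {}, children: [{ text: 'hello world!' }] } }
    ]
  }
}

puts tree.to_h == expected # prints true
```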

You can access and modify elements simply by "navigating" through them, calling methods named after their tags. When you're done, regenerate your HTML with to_html!

Example:

html = <<-HTML
<html>
  <body>
    <div id="d1" style="color: blue">
        <h1>hello world!</h1>
    </div>
  </body>
</html>
HTML

hashtml = HashTML.new(html)
hashtml.document.html.body.div.inspect # => #<HashTML::Node:0x00000000b6c128 @name="div", @attributes={"id"=>"d1", "style"=>"color: blue"}, @children=[#<HashTML::Text:0x00000000b72528 @text="\n        ">, #<HashTML::Node:0x00000000b72348 @name="h1", @attributes={}, @children=[#<HashTML::Text:0x00000000b71268 @text="hello world!">]>, #<HashTML::Text:0x00000000b704a8 @text="\n    ">]>

hashtml.document.html.body.div.attributes['id'] = 'new_id1'
hashtml.document.html.body.div.inspect # => #<HashTML::Node:0x00000000b6c128 @name="div", @attributes={"id"=>"new_id1", "style"=>"color: blue"}, @children=[#<HashTML::Text:0x00000000b72528 @text="\n        ">, #<HashTML::Node:0x00000000b72348 @name="h1", @attributes={}, @children=[#<HashTML::Text:0x00000000b71268 @text="hello world!">]>, #<HashTML::Text:0x00000000b704a8 @text="\n    ">]>

hashtml.document.html.body.div.h1.text # => 'hello world!'
hashtml.document.html.body.div.h1.text = 'such edit! wow'
hashtml.document.html.body.div.h1.text # => 'such edit! wow'

hashtml.to_html # => <document><html><body>
                             <div id="new_id1" style="color: blue">
                                 <h1>such edit! wow</h1>
                             </div>
                         </body></html></document>
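
One way such tag-name navigation can be built in Ruby is with method_missing: an unknown method name is treated as a tag and resolved to the first matching child element. The sketch below assumes that approach; it has not been checked against the hashtml source, SketchNode and SketchText are hypothetical, and its to_html does no HTML escaping.

```ruby
# Hypothetical sketch of tag-name navigation, text editing and
# HTML regeneration; not the hashtml gem's actual implementation.
class SketchText
  attr_accessor :text

  def initialize(text)
    @text = text
  end

  def to_html
    text # no escaping in this sketch
  end
end

class SketchNode
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children
  end

  # node.div resolves to the first child element whose tag is "div".
  def method_missing(tag, *args)
    child_element(tag.to_s) || super
  end

  def respond_to_missing?(tag, include_private = false)
    !child_element(tag.to_s).nil? || super
  end

  # Concatenated content of the direct text children.
  def text
    children.grep(SketchText).map(&:text).join
  end

  # Drops the direct text children and appends one new text node;
  # child elements are kept.
  def text=(value)
    @children = children.grep_v(SketchText) << SketchText.new(value)
  end

  # Serializes the subtree back to an HTML string.
  def to_html
    attrs = attributes.map { |key, value| %( #{key}="#{value}") }.join
    "<#{name}#{attrs}>#{children.map(&:to_html).join}</#{name}>"
  end

  private

  def child_element(tag)
    children.find { |child| child.is_a?(SketchNode) && child.name == tag }
  end
end

doc = SketchNode.new('document', {}, [
  SketchNode.new('html', {}, [
    SketchNode.new('body', {}, [
      SketchNode.new('div', { 'id' => 'd1', 'style' => 'color: blue' }, [
        SketchNode.new('h1', {}, [SketchText.new('hello world!')])
      ])
    ])
  ])
])

doc.html.body.div.attributes['id'] = 'new_id1'
doc.html.body.div.h1.text = 'such edit! wow'
puts doc.to_html
# prints <document><html><body><div id="new_id1" style="color: blue"><h1>such edit! wow</h1></div></body></html></document>
```

A tag name that collides with a method Ruby objects already define publicly would never reach method_missing, which is one trade-off of this design.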

Worried about navigating when many elements at the same level share the same tag? That's not a problem! Just identify the node by its attributes, passing them as a Hash to the tag method.

Example:

html = <<-HTML
<html>
  <body>
    <div class="main">
      <span id="s1" style="color: blue">
        <h1>hello world!</h1>
      </span>
      <span id="s2" style="color: green">
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
      </span>
    </div>
  </body>
</html>
HTML

hashtml = HashTML.new(html)
hashtml.document.html.body.div.span({'id' => 's2'}).attributes['id'] = 'new_id2'
hashtml.document.html.body.div.span({'id' => 's1'}).h1.text = 'such edit! much navigation! wow'

hashtml.to_html # => <document><html><body>
                         <div class="main">
                           <span id="s1" style="color: blue">
                             <h1>such edit! much navigation! wow</h1>
                           </span>
                           <span id="new_id2" style="color: green">
                             <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
                           </span>
                         </div>
                       </body></html></document>
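
Passing a Hash to the tag method narrows the match to children whose attributes include every given pair. Continuing the method_missing assumption, here is a hypothetical sketch of that lookup; the class name and matching rule are illustrative, not taken from the gem.

```ruby
# Hypothetical sketch of attribute-filtered navigation; not the gem's code.
class SketchNode
  attr_reader :name, :attributes, :children

  def initialize(name, attributes = {}, children = [])
    @name = name
    @attributes = attributes
    @children = children
  end

  # node.span returns the first <span> child; node.span({ 'id' => 's2' })
  # returns the first <span> child whose attributes contain every given pair.
  def method_missing(tag, filter = {}, *args)
    find_child(tag.to_s, filter) || super
  end

  def respond_to_missing?(tag, include_private = false)
    !find_child(tag.to_s, {}).nil? || super
  end

  private

  def find_child(tag, filter)
    children.find do |child|
      child.is_a?(SketchNode) && child.name == tag &&
        filter.all? { |key, value| child.attributes[key] == value }
    end
  end
end

div = SketchNode.new('div', { 'class' => 'main' }, [
  SketchNode.new('span', { 'id' => 's1', 'style' => 'color: blue' }, [SketchNode.new('h1')]),
  SketchNode.new('span', { 'id' => 's2', 'style' => 'color: green' }, [SketchNode.new('p')])
])

div.span({ 'id' => 's2' }).attributes['id'] = 'new_id2'
puts div.span.attributes['id']                           # prints s1 (first <span>)
puts div.span({ 'id' => 'new_id2' }).children.first.name # prints p
```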