 Dependencies

Runtime

>= 1.14

nokogiri-html-ext

A Ruby gem extending Nokogiri with several useful HTML-centric features.


Key features

  • Resolves all relative URLs in a Nokogiri-parsed HTML document.
  • Adds helpers for getting and setting a document's <base> element's href attribute.
  • Supports Ruby 2.7 and newer.

Getting Started

Before installing and using nokogiri-html-ext, you'll want to have Ruby 2.7 (or newer) installed. Using a Ruby version management tool like rbenv, chruby, or rvm is recommended.

nokogiri-html-ext is developed using Ruby 2.7.8 and is tested against additional Ruby versions using GitHub Actions.

Installation

Add nokogiri-html-ext to your project's Gemfile and run bundle install:

source "https://rubygems.org"

gem "nokogiri-html-ext"

Usage

base_href

nokogiri-html-ext provides two helper methods for getting and setting a document's <base> element's href attribute. The first, base_href, retrieves the element's href attribute value if it exists.

require "nokogiri/html-ext"

doc = Nokogiri::HTML(%(<html><body>Hello, world!</body></html>))

doc.base_href
#=> nil

doc = Nokogiri::HTML(%(<html><head><base target="_top"><body>Hello, world!</body></html>))

doc.base_href
#=> nil

doc = Nokogiri::HTML(%(<html><head><base href="/foo"><body>Hello, world!</body></html>))

doc.base_href
#=> "/foo"

The base_href= method allows you to manipulate the document's <base> element.

require "nokogiri/html-ext"

doc = Nokogiri::HTML(%(<html><body>Hello, world!</body></html>))

doc.base_href = "/foo"
#=> "/foo"

doc.at_css("base").to_s
#=> "<base href=\"/foo\">"

doc = Nokogiri::HTML(%(<html><head><base href="/foo"><body>Hello, world!</body></html>))

doc.base_href = "/bar"
#=> "/bar"

doc.at_css("base").to_s
#=> "<base href=\"/bar\">"

resolve_relative_urls!

nokogiri-html-ext will resolve a document's relative URLs against a provided source URL. The source URL should be an absolute URL (e.g. https://jgarber.example) representing the location of the document being parsed. The source URL may be any String (or any Ruby object that responds to #to_s).

nokogiri-html-ext takes advantage of the Nokogiri::XML::Document.parse method's second positional argument to set the parsed document's URL. Nokogiri's source code is very complex, but in short: the Nokogiri::HTML method is an alias to the Nokogiri::HTML4 method which eventually winds its way to the aforementioned Nokogiri::XML::Document.parse method. Phew. 🥵

URL resolution uses Ruby's built-in URL parsing and normalizing capabilities. Absolute URLs will remain unmodified.
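
To give a feel for what resolution against a source URL looks like, here is a minimal sketch using only Ruby's standard uri library. It assumes the gem's behavior is comparable to URI.join, which this README does not confirm. The resolve helper and the /articles/index.html path are hypothetical and not part of nokogiri-html-ext.

```ruby
require "uri"

# Hypothetical helper (not part of nokogiri-html-ext): resolve a URL
# reference against a base URL using the standard library.
def resolve(url, base)
  URI.join(base.to_s, url.to_s).to_s
end

# Assumed source URL for illustration only.
source_url = "https://jgarber.example/articles/index.html"

# A root-relative reference replaces the base URL's entire path.
root_relative = resolve("/home", source_url)
#=> "https://jgarber.example/home"

# A path-relative reference is resolved from the base URL's directory.
path_relative = resolve("../foo.png", source_url)
#=> "https://jgarber.example/foo.png"

# An absolute URL is returned unmodified.
absolute = resolve("https://other.example/page", source_url)
#=> "https://other.example/page"
```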

Note: If the document's markup includes a <base> element whose href attribute is an absolute URL, that URL will take precedence when performing URL resolution.

An abbreviated example:

require "nokogiri/html-ext"

markup = <<-HTML
  <html>
  <body>
    <a href="/home">Home</a>
    <img src="/foo.png" srcset="../bar.png 720w">
  </body>
  </html>
HTML

doc = Nokogiri::HTML(markup, "https://jgarber.example")

doc.url
#=> "https://jgarber.example"

doc.base_href
#=> nil

doc.base_href = "/foo/bar/biz"
#=> "/foo/bar/biz"

doc.resolve_relative_urls!

doc.at_css("base")["href"]
#=> "https://jgarber.example/foo/bar/biz"

doc.at_css("a")["href"]
#=> "https://jgarber.example/home"

doc.at_css("img").to_s
#=> "<img src=\"https://jgarber.example/foo.png\" srcset=\"https://jgarber.example/foo/bar.png 720w\">"
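
The srcset result above differs from src because each srcset candidate pairs a URL with an optional descriptor such as 720w. The following sketch shows one way such a value could be resolved candidate by candidate with the standard library. It is not the gem's implementation: resolve_srcset is a hypothetical helper, the second candidate (/baz.png 2x) is made up for illustration, and splitting on commas is a simplification that breaks for URLs containing commas.

```ruby
require "uri"

# Hypothetical helper (not part of nokogiri-html-ext): resolve every URL
# in a srcset value against base_url, preserving each descriptor.
# Simplification: candidates are split on commas, so URLs that contain
# commas are not handled.
def resolve_srcset(srcset, base_url)
  resolved = srcset.split(",").map do |candidate|
    url, descriptor = candidate.strip.split(/\s+/, 2)
    [URI.join(base_url, url).to_s, descriptor].compact.join(" ")
  end

  resolved.join(", ")
end

# The effective base from the example above, after the <base> element's
# href has itself been resolved against the document's URL.
base_url = "https://jgarber.example/foo/bar/biz"

srcset = resolve_srcset("../bar.png 720w, /baz.png 2x", base_url)
#=> "https://jgarber.example/foo/bar.png 720w, https://jgarber.example/baz.png 2x"
```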

resolve_relative_url

You may also resolve an arbitrary String representing a relative URL against the document's URL (or <base> element's href attribute value):

doc = Nokogiri::HTML(%(<html><base href="/foo/bar"></html>), "https://jgarber.example")

doc.resolve_relative_url("biz/baz")
#=> "https://jgarber.example/foo/biz/baz"
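
The result above is consistent with a two-step resolution: the <base> element's href is resolved against the document's URL first, and the relative URL is then resolved against that effective base. Below is a rough sketch of that reasoning using the standard uri library. The effective_base helper and the https://cdn.example/assets/ URL are hypothetical, and the gem's actual internals may differ.

```ruby
require "uri"

# Hypothetical helper (not part of nokogiri-html-ext): compute the URL
# that relative references are resolved against. Falls back to the
# document's URL when there is no <base> href.
def effective_base(document_url, base_href = nil)
  return document_url.to_s if base_href.nil?

  URI.join(document_url.to_s, base_href.to_s).to_s
end

document_url = "https://jgarber.example"

# A relative <base> href is resolved against the document's URL.
with_relative_base = effective_base(document_url, "/foo/bar")
#=> "https://jgarber.example/foo/bar"

# An absolute <base> href takes precedence over the document's URL.
with_absolute_base = effective_base(document_url, "https://cdn.example/assets/")
#=> "https://cdn.example/assets/"

# Without a <base> href, the document's URL is used as-is.
without_base = effective_base(document_url)
#=> "https://jgarber.example"

# "biz/baz" replaces the last path segment ("bar") of the effective base.
resolved = URI.join(with_relative_base, "biz/baz").to_s
#=> "https://jgarber.example/foo/biz/baz"
```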

Acknowledgments

nokogiri-html-ext wouldn't exist without the Nokogiri project and its community.

nokogiri-html-ext is written and maintained by Jason Garber.

License

nokogiri-html-ext is freely available under the MIT License. Use it, learn from it, fork it, improve it, change it, tailor it to your needs.