A long-lived project that still receives updates
Rubygem that finds the canonical version of a URL by providing #canonicalize methods for the String, URI::HTTP, URI::HTTPS and Addressable::URI classes
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

 Project Readme

URLCanonicalize

Gem Version Gem downloads Build status Maintainability Coverage Status Dependency Status Security Codacy Badge

URLCanonicalize is a Ruby gem that finds the canonical version of a URL. It provides canonicalize methods for the String, URI::HTTP, URI::HTTPS and Addressable::URI classes.

Installation

Add this line to your application's Gemfile:

gem 'url_canonicalize'

Usage

'http://www.twitter.com'.canonicalize # => 'https://twitter.com/'

URI('http://www.twitter.com').canonicalize # => #<URI::HTTP:0x00000008767908 URL:https://twitter.com/>

Addressable::URI.canonicalize('http://www.twitter.com') # => #<Addressable::URI:0x43c9 URI:https://twitter.com/>

More Information

URLCanonical follows HTTP redirects and also looks for rel="canonical" hints in both the HTTP headers and the <head> section of the response HTML. The URL it returns will be both normalized and canonical. The intention is that whatever variant of a URL is supplied the result will always be the same. The intended use case is for applications that need to dedupe a list of URLs, for instance to check if a new URL is already present in a list. If the list is built from canonicalized URLs then the resulting set will have fewer URLs that point to the same ultimate resource.