0.0
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Twingly URL tools
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies

Development

~> 12
~> 3
~> 0

Runtime

>= 3.0.1, < 6.0
 Project Readme

Twingly::URL

GitHub Build Status

Twingly URL tools.

  • twingly/url - Parse and validate URLs
    • Twingly::URL.parse - Returns one or more Twingly::URL instance
  • twingly/url/hasher - Generate URL hashes suitable for primary keys
    • Twingly::URL::Hasher.taskdb_hash(url) - MD5 hexdigest
    • Twingly::URL::Hasher.documentdb_hash(url) - SHA256 unsigned long, native endian digest
    • Twingly::URL::Hasher.autopingdb_hash(url) - SHA256 64-bit signed, native endian digest
  • twingly/url/utilities - Utilities to work with URLs
    • Twingly::URL::Utilities.extract_valid_urls - Returns Array of valid Twingly::URL

Getting Started

Install the gem:

gem install twingly-url

Usage (this output was created with examples/url.rb):

require "twingly/url"

url = Twingly::URL.parse("http://www.twingly.co.uk/search")
url.scheme                    # => "http"
url.normalized.scheme         # => "http"
url.trd                       # => "www"
url.normalized.trd            # => "www"
url.sld                       # => "twingly"
url.normalized.sld            # => "twingly"
url.tld                       # => "co.uk"
url.normalized.tld            # => "co.uk"
url.ttld                      # => "uk"
url.normalized.ttld           # => "uk"
url.domain                    # => "twingly.co.uk"
url.normalized.domain         # => "twingly.co.uk"
url.host                      # => "www.twingly.co.uk"
url.normalized.host           # => "www.twingly.co.uk"
url.origin                    # => "http://www.twingly.co.uk"
url.normalized.origin         # => "http://www.twingly.co.uk"
url.path                      # => "/search"
url.normalized.path           # => "/search"
url.without_scheme            # => "//www.twingly.co.uk/search"
url.normalized.without_scheme # => "//www.twingly.co.uk/search"
url.userinfo                  # => ""
url.normalized.userinfo       # => ""
url.user                      # => ""
url.normalized.user           # => ""
url.password                  # => ""
url.normalized.password       # => ""
url.valid?                    # => "true"
url.normalized.valid?         # => "true"
url.to_s                      # => "http://www.twingly.co.uk/search"
url.normalized.to_s           # => "http://www.twingly.co.uk/search"

url = Twingly::URL.parse("http://räksmörgås.макдональдс.рф/foo")
url.scheme                    # => "http"
url.normalized.scheme         # => "http"
url.trd                       # => "räksmörgås"
url.normalized.trd            # => "xn--rksmrgs-5wao1o"
url.sld                       # => "макдональдс"
url.normalized.sld            # => "xn--80aalb1aicli8a5i"
url.tld                       # => "рф"
url.normalized.tld            # => "xn--p1ai"
url.ttld                      # => "рф"
url.normalized.ttld           # => "xn--p1ai"
url.domain                    # => "макдональдс.рф"
url.normalized.domain         # => "xn--80aalb1aicli8a5i.xn--p1ai"
url.host                      # => "räksmörgås.макдональдс.рф"
url.normalized.host           # => "xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.origin                    # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.normalized.origin         # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.path                      # => "/foo"
url.normalized.path           # => "/foo"
url.without_scheme            # => "//räksmörgås.макдональдс.рф/foo"
url.normalized.without_scheme # => "//xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo"
url.userinfo                  # => ""
url.normalized.userinfo       # => ""
url.user                      # => ""
url.normalized.user           # => ""
url.password                  # => ""
url.normalized.password       # => ""
url.valid?                    # => "true"
url.normalized.valid?         # => "true"
url.to_s                      # => "http://räksmörgås.макдональдс.рф/foo"
url.normalized.to_s           # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo"

url = Twingly::URL.parse("http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo")
url.scheme                    # => "http"
url.normalized.scheme         # => "http"
url.trd                       # => "xn--rksmrgs-5wao1o"
url.normalized.trd            # => "xn--rksmrgs-5wao1o"
url.sld                       # => "xn--80aalb1aicli8a5i"
url.normalized.sld            # => "xn--80aalb1aicli8a5i"
url.tld                       # => "xn--p1ai"
url.normalized.tld            # => "xn--p1ai"
url.ttld                      # => "xn--p1ai"
url.normalized.ttld           # => "xn--p1ai"
url.domain                    # => "xn--80aalb1aicli8a5i.xn--p1ai"
url.normalized.domain         # => "xn--80aalb1aicli8a5i.xn--p1ai"
url.host                      # => "xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.normalized.host           # => "xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.origin                    # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.normalized.origin         # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai"
url.path                      # => "/foo"
url.normalized.path           # => "/foo"
url.without_scheme            # => "//xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo"
url.normalized.without_scheme # => "//xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo"
url.userinfo                  # => ""
url.normalized.userinfo       # => ""
url.user                      # => ""
url.normalized.user           # => ""
url.password                  # => ""
url.normalized.password       # => ""
url.valid?                    # => "true"
url.normalized.valid?         # => "true"
url.to_s                      # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo"
url.normalized.to_s           # => "http://xn--rksmrgs-5wao1o.xn--80aalb1aicli8a5i.xn--p1ai/foo"

url = Twingly::URL.parse("https://admin:correcthorsebatterystaple@example.com/")
url.scheme                    # => "https"
url.normalized.scheme         # => "https"
url.trd                       # => ""
url.normalized.trd            # => "www"
url.sld                       # => "example"
url.normalized.sld            # => "example"
url.tld                       # => "com"
url.normalized.tld            # => "com"
url.ttld                      # => "com"
url.normalized.ttld           # => "com"
url.domain                    # => "example.com"
url.normalized.domain         # => "example.com"
url.host                      # => "example.com"
url.normalized.host           # => "www.example.com"
url.origin                    # => "https://example.com"
url.normalized.origin         # => "https://www.example.com"
url.path                      # => "/"
url.normalized.path           # => "/"
url.without_scheme            # => "//admin:correcthorsebatterystaple@example.com/"
url.normalized.without_scheme # => "//admin:correcthorsebatterystaple@www.example.com/"
url.userinfo                  # => "admin:correcthorsebatterystaple"
url.normalized.userinfo       # => "admin:correcthorsebatterystaple"
url.user                      # => "admin"
url.normalized.user           # => "admin"
url.password                  # => "correcthorsebatterystaple"
url.normalized.password       # => "correcthorsebatterystaple"
url.valid?                    # => "true"
url.normalized.valid?         # => "true"
url.to_s                      # => "https://admin:correcthorsebatterystaple@example.com/"
url.normalized.to_s           # => "https://admin:correcthorsebatterystaple@www.example.com/"

Dependencies

Only the gems listed in the Gem Specification.

Development

To inspect the Public Suffix List, this handy command can be used (also works in projects that use twingly-url as an dependency).

open $(bundle show public_suffix)/data/list.txt

Tests

Run tests with

bundle exec rake

Profiling

There's some profiling tasks available through Rake

cd profile/
bundle # Install dependencies
bundle exec rake -T # Show available tasks

Note that this isn't a benchmark, we're using ruby-prof and memory_profiler which will slow things down.

Release workflow

  • Update the examples in this README if needed, generate the output with

      ruby examples/url.rb
    
  • Bump the version in lib/twingly/version.rb in a commit, no need to push (the release task does that).

  • Ensure you are signed in to RubyGems.org as twingly with gem signin.

  • Build and publish the gem. This will create the proper tag in git, push the commit and tag and upload to RubyGems.

      bundle exec rake release
    
  • Update the changelog with GitHub Changelog Generator (gem install github_changelog_generator if you don't have it, set CHANGELOG_GITHUB_TOKEN to a personal access token to avoid rate limiting by GitHub). This command will update CHANGELOG.md. You need to commit and push manually.

      github_changelog_generator