 Dependencies

Runtime

~> 0.3
 Project Readme

Something Like That

See how closely two long, multi-word phrases match each other.

Something Like That is asymmetrical, meaning “Azkaban” will match “Harry Potter and the Prisoner of Azkaban” much more strongly than vice versa. Great for ordering search results gathered from diverse sources.

Based on the modified Monge-Elkan method described in Jimenez et al. (2009), using the amatch library’s implementation of the Jaro-Winkler similarity measure. Available on RubyGems.
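
To see where the asymmetry comes from, here is a minimal sketch of a Monge-Elkan-style score in plain Ruby. It is not the gem's code: monge_elkan_sketch and token_similarity are hypothetical names, and token_similarity is a crude exact-match stand-in for the Jaro-Winkler measure so that the example needs no dependencies. Each query token keeps its best score against any target token, and the best scores are then combined with a generalized mean.

```ruby
# Stand-in token similarity (assumption): 1.0 for a case-insensitive exact
# match, 0.0 otherwise. The gem itself uses Jaro-Winkler via amatch.
def token_similarity(a, b)
  a.casecmp?(b) ? 1.0 : 0.0
end

# Monge-Elkan-style score (hypothetical helper): for every token in the query,
# keep its best similarity against any token in the target, then combine the
# best scores with a generalized mean of exponent p.
def monge_elkan_sketch(query, target, p: 2)
  target_tokens = target.split
  best = query.split.map do |q|
    target_tokens.map { |t| token_similarity(q, t) }.max
  end
  (best.sum { |s| s**p } / best.size)**(1.0 / p)
end

short = 'Azkaban'
long  = 'Harry Potter and the Prisoner of Azkaban'

forward  = monge_elkan_sketch(short, long) # the single query token finds a match
backward = monge_elkan_sketch(long, short) # only 1 of 7 query tokens finds a match

puts forward
puts backward
```

Because the score is averaged over the query's tokens only, a short query that is fully contained in a long phrase scores high, while the long phrase scored against the short one is dragged down by its unmatched tokens.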

Installation

$ gem install something_like_that

Usage

>> require 'something_like_that'
=> true
>> query = SomethingLikeThat.new('Hannibal Lecter')
=> "Hannibal Lecter"
>> query.match('Hannibal Lecter Goes to Washington')
=> 1.0
>> query.match('Hannibal Buress')
=> 0.7071067811865476
>> query.match?('Hannibal Buress')
=> false
>> SomethingLikeThat::Scorer.threshold = 0.7
=> 0.7
>> query.match?('Hannibal Buress')
=> true

Config

This gem collects similarity scores for matching pairs of tokens (words) from two different phrases, then averages them together.

  • SomethingLikeThat::Scorer.threshold (default = 0.8)
    During the first (tokenwise) round of scoring, match scores below this value are dropped to 0. Once the resulting scores are averaged, this value determines whether #match? returns true or false.
  • SomethingLikeThat::Scorer.mean_exponent (default = 2)
    The method outlined by Jimenez et al. (2009) uses a generalized mean to favor matches over non-matches. For a two-word phrase, one exact match (1.0) and one non-match (0.0) produce an arithmetic mean (p = 1) of 0.5 and a quadratic mean (p = 2) of ~0.7. (For an in-depth analysis, see Section 3 of their paper.)
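
The interplay of the two settings can be sketched in a few lines of plain Ruby. This is an illustration under stated assumptions, not the gem's internals: generalized_mean and sketch_match are hypothetical helpers, the tokenwise scores are made up, and comparing the final score with >= is a guess at how the cutoff is applied.

```ruby
# Generalized (power) mean of a list of scores with exponent p.
def generalized_mean(scores, p)
  (scores.sum { |s| s**p } / scores.size.to_f)**(1.0 / p)
end

# Hypothetical helper: drop tokenwise scores below the threshold to 0,
# average what remains, then compare the result against the same threshold.
def sketch_match(token_scores, threshold: 0.8, mean_exponent: 2)
  kept  = token_scores.map { |s| s < threshold ? 0.0 : s }
  score = generalized_mean(kept, mean_exponent)
  [score, score >= threshold]
end

# Made-up tokenwise scores: one exact match and one weak match.
token_scores = [1.0, 0.4]

arithmetic, = sketch_match(token_scores, mean_exponent: 1)
quadratic, strict_match = sketch_match(token_scores)
_, lenient_match = sketch_match(token_scores, threshold: 0.7)

puts arithmetic    # arithmetic mean of [1.0, 0.0]
puts quadratic     # quadratic mean of [1.0, 0.0]
puts strict_match  # compared against the default threshold of 0.8
puts lenient_match # compared against a threshold of 0.7
```

With these inputs the weak match is dropped to 0 in both cases, so the quadratic mean lands at roughly 0.707: below the default threshold of 0.8 but above a lowered threshold of 0.7, which mirrors the Usage session above.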

License

The MIT License (MIT)

Copyright © 2017 Ryan Lue