The Levenshtein algorithm calculates the edit distance between two words. You can build a spell checker with it.
This gem attempts to compute the distance as fast as possible, at the expense of disallowing large strings and non-ASCII characters. Any string longer than 4096 characters will raise a RuntimeError.
As a workaround, you can use the pure Ruby method, which has no such limitations but is much slower.
Read the usage for more info.
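For readers unfamiliar with the algorithm, here is a minimal sketch of a Levenshtein distance in plain Ruby, using the common two-row dynamic-programming approach. It is only an illustration: `levenshtein_sketch` is a hypothetical name and is not claimed to be how this gem implements either of its methods.

```ruby
# Illustrative only: `levenshtein_sketch` is a hypothetical helper,
# not part of the string_dot_levenshtein gem.
def levenshtein_sketch(a, b)
  s = a.chars
  t = b.chars
  return t.length if s.empty?
  return s.length if t.empty?

  # Row 0: distance from the empty prefix of s to each prefix of t.
  prev = (0..t.length).to_a

  s.each_with_index do |sc, i|
    # First cell: deleting the first i + 1 characters of s.
    curr = [i + 1]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [
        prev[j + 1] + 1, # deletion
        curr[j] + 1,     # insertion
        prev[j] + cost   # substitution (or match)
      ].min
    end
    prev = curr
  end

  prev.last
end

puts levenshtein_sketch("sunday", "saturday") # prints 3
```

The sketch compares characters rather than bytes, so it behaves like a character-based distance on non-ASCII input.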
You need a C compiler installed to build the C extension. On Arch Linux:
# pacman -S gcc make
On Debian-based systems:
# apt install gcc build-essential
On Debian-based systems you might also need ruby-dev, which provides the ruby.h header file:
# apt install ruby-dev
Once these are installed and you can compile C code, proceed with the installation.
Add this line to your application's Gemfile:
gem 'string_dot_levenshtein'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install string_dot_levenshtein
To get the Levenshtein distance between two strings, use String#levenshtein(str). For example:
irb(main):001:0> require 'string_dot_levenshtein'
=> true
irb(main):002:0> "sunday".levenshtein("saturday")
=> 3
- The length of the string is limited to 4096 characters.
- Any non-ASCII character can return an incorrect result.
- It's the fastest.
To overcome the limitations of String#levenshtein(str), use the pure Ruby method String#levenshtein_rb(str):
irb(main):001:0> require 'string_dot_levenshtein'
=> true
irb(main):002:0> "sunday".levenshtein_rb("saturday")
=> 3
# This is the C Ruby method
irb(main):003:0> "a".levenshtein("🐻")
=> 4
# This is the pure Ruby method
irb(main):004:0> "a".levenshtein_rb("🐻")
=> 1
# This is the C Ruby method
irb(main):005:0> "cómo".levenshtein("como")
=> 2
# This is the pure Ruby method
irb(main):006:0> "cómo".levenshtein_rb("como")
=> 1
- It's 9 to 10x slower than String#levenshtein(str), as it's written in pure Ruby
- It can work with arbitrary-length strings
- It can work with emojis
- It can work with various other languages
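Note that comparisons are case-sensitive and strings are not stripped, so leading or trailing whitespace counts toward the distance. With String#levenshtein(str), 3- and 4-byte emojis are counted as 3- and 4-byte strings.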
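The difference between byte counts and character counts can be checked with Ruby's built-in String methods. The short sketch below also shows one possible way to normalise input before comparing; `normalize_word` is a hypothetical helper, not something the gem provides, and whether to downcase or strip at all depends on your application.

```ruby
# Characters vs. bytes in UTF-8 (escapes are used to avoid encoding ambiguity).
bear = "\u{1F43B}"        # the bear emoji
como = "c\u00F3mo"        # "como" with an accented o (precomposed)

puts bear.length          # number of characters
puts bear.bytesize        # number of bytes
puts como.length
puts como.bytesize
puts como.ascii_only?     # false: contains a non-ASCII character

# Hypothetical helper: the distance is case-sensitive and whitespace counts,
# so an application may want to normalise words before comparing them.
def normalize_word(word)
  word.strip.downcase
end

puts normalize_word("  Sunday\n")
```

A byte-based comparison sees the emoji as four units and the accented letter as two, which is consistent with the results of the irb session above.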
If you are building an application where the user is guaranteed to use only 1-byte ASCII characters, use the String#levenshtein(str) method; it's fast!
For supporting various languages and emoji (although they don't calculate the distance), use String#levenshtein_rb(str).
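If the input is not known in advance, the choice between the two methods can be made at run time. The following is a rough sketch under the limits stated in this README (ASCII only, at most 4096 characters for the C method); `MAX_C_LENGTH` and `levenshtein_method_for` are hypothetical names introduced here for illustration.

```ruby
# Assumption: 4096 is the length limit this README states for the C method.
MAX_C_LENGTH = 4096

# Hypothetical helper: returns the name of the method that should be safe
# for both inputs, without calling the gem itself.
def levenshtein_method_for(a, b)
  fast_ok = [a, b].all? { |s| s.ascii_only? && s.length <= MAX_C_LENGTH }
  fast_ok ? :levenshtein : :levenshtein_rb
end

puts levenshtein_method_for("sunday", "saturday")  # prints levenshtein
puts levenshtein_method_for("a", "\u{1F43B}")      # prints levenshtein_rb

# With the gem loaded, the call could then be dispatched as:
#   a.public_send(levenshtein_method_for(a, b), b)
```

Checking up front avoids relying on the RuntimeError for oversized strings and avoids silently wrong results for non-ASCII input.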
If you want to search a dictionary for matches, you can get a dictionary.com verified (scraped) wordlist here: https://github.com/Souravgoswami/wordlists/blob/master/dictionary.com_verified_words
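As a rough idea of how a dictionary search could look, the sketch below ranks candidate words by their distance to a misspelled word. Everything named here (`closest_words`, `PLAIN_DISTANCE`, the sample word list) is hypothetical; the stand-in distance lambda is plain Ruby so the example runs without the gem, and with the gem loaded you could pass a block that calls String#levenshtein instead.

```ruby
# Hypothetical helper: returns the `limit` dictionary words closest to `word`.
# The block receives two strings and must return their distance.
def closest_words(word, dictionary, limit: 3, &distance)
  dictionary
    .map { |candidate| [candidate, distance.call(word, candidate)] }
    .sort_by { |candidate, dist| [dist, candidate] } # ties broken alphabetically
    .first(limit)
    .map(&:first)
end

# Stand-in distance (single-row dynamic programming) so the sketch is
# self-contained. With the gem: ->(a, b) { a.levenshtein(b) }
PLAIN_DISTANCE = lambda do |a, b|
  row = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    prev_diag = row[0]
    row[0] = i
    b.each_char.with_index(1) do |cb, j|
      current = [row[j] + 1, row[j - 1] + 1, prev_diag + (ca == cb ? 0 : 1)].min
      prev_diag = row[j]
      row[j] = current
    end
  end
  row.last
end

# Sample data; a real list could be read from a file, assuming one word per line:
#   words = File.readlines(path, chomp: true)
words = %w[sunday monday saturday sundae sunny]

p closest_words("sundai", words, limit: 2, &PLAIN_DISTANCE)
```

Sorting the whole dictionary is fine for small lists; for a large wordlist you may prefer to keep only a running best match.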
After checking out the repo, run bin/setup to install dependencies.
To install this gem onto your local machine, run
Our main aim is to make the calculation as fast as possible without losing any accuracy. This is a small gem, and we don't care a bit about code readability or maintainability, so any suggestion or contribution that makes the gem faster is greatly welcome.
Bug reports and pull requests are welcome on GitHub at https://github.com/Souravgoswami/string_dot_levenshtein.
The gem is available as open-source under the terms of the MIT License.