Uses the fast Aho-Corasick string-matching algorithm to find occurrences of any string from a dictionary within an input string.

Aho-Corasick Matcher

A Ruby gem for finding strings in text using the Aho-Corasick string-matching algorithm.

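Aho-Corasick is O(n + m), plus the number of matches reported, where n is the length of the string to be searched and m is the total length of the terms in the dictionary. This makes it particularly well suited to searching for occurrences of words from a large dictionary: runtime grows only linearly with the text and the dictionary, rather than with the number of terms multiplied by the length of the text.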

It's quite memory-intensive, and building a matcher is expensive – but once it's been built, matching terms is very fast.
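
To make the build-once, match-many trade-off concrete, here is a minimal sketch of the general Aho-Corasick technique in plain Ruby. It is not this gem's source code: SketchMatcher, Node and every method name below are hypothetical and chosen only for illustration. The constructor does the expensive work (building a trie of the dictionary and linking each node to its longest matching suffix), and match then scans the text once.

```ruby
# Illustrative sketch only: NOT the gem's implementation.
# SketchMatcher, Node and all method names are hypothetical.
class SketchMatcher
  # One trie node: child nodes keyed by character, a failure link,
  # and the dictionary terms that end at this point in the text.
  class Node
    attr_reader :children, :outputs
    attr_accessor :failure

    def initialize
      @children = {}
      @outputs = []
    end
  end

  # Expensive step: build the trie, then compute the failure links.
  def initialize(dictionary)
    @root = Node.new
    dictionary.each { |term| insert(term) }
    link_failures
  end

  # Cheap step: a single pass over the text, one character at a time.
  def match(text)
    node = @root
    text.each_char.flat_map do |char|
      # On a mismatch, fall back to the longest suffix still in the trie.
      node = node.failure while node != @root && !node.children.key?(char)
      node = node.children.fetch(char, @root)
      node.outputs
    end
  end

  private

  def insert(term)
    node = @root
    term.each_char { |char| node = (node.children[char] ||= Node.new) }
    node.outputs << term
  end

  # Breadth-first walk, so shallower nodes are complete before deeper ones.
  def link_failures
    queue = @root.children.values.each { |child| child.failure = @root }
    until queue.empty?
      node = queue.shift
      node.children.each do |char, child|
        fallback = node.failure
        fallback = fallback.failure while fallback != @root && !fallback.children.key?(char)
        child.failure = fallback.children.fetch(char, @root)
        # Terms ending at the suffix node also end here.
        child.outputs.concat(child.failure.outputs)
        queue << child
      end
    end
  end
end
```

Each node holds a hash of children and an array of terms, which is where the memory cost of this approach comes from; in return, matching never rescans the text, however many terms the dictionary contains.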

Current version: 1.0.2
Supported Ruby versions: >= 2.7

Usage

require 'aho_corasick_matcher'

matcher = AhoCorasickMatcher.new(['a', 'b', 'ab'])
matcher.match('aba')
#=> ['a', 'ab', 'b', 'a']

matcher = AhoCorasickMatcher.new(['thistle', 'sift', 'thistles'])
matcher.match('Theophilus thistle, the successful thistle sifter, in sifting a sieve full of un-sifted thistles, thrust three thousand thistles through the thick of his thumb.')
#=> ["thistle", "thistle", "sift", "sift", "sift", "thistle", "thistles", "thistle", "thistles"]

Thanks

Loosely based on Tim Cowlishaw's implementation of the same algorithm.

License

Copyright © 2015-2024 Altmetric LLP

Distributed under the MIT License.