Uses the fast Aho-Corasick string-search algorithm to find occurrences of any string from a dictionary within an input string.

Aho-Corasick Matcher

A Ruby gem for finding strings in text using the Aho-Corasick string-matching algorithm.

Aho-Corasick runs in O(n + m + z) time, where n is the length of the string to be searched, m is the total length of the strings in the dictionary, and z is the number of matches found. This makes it particularly suited to searching for occurrences of words from large dictionaries, as the runtime grows only linearly.

It's quite memory-intensive, and building a matcher is expensive – but once it's been built, matching terms is very fast.
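
To show where the build cost and the fast matching come from, here is a minimal, self-contained sketch of the general Aho-Corasick technique. It is not this gem's implementation: SketchMatcher, Node and the helper method names are hypothetical and exist only for illustration. Construction builds a trie of the dictionary and then adds failure links; matching walks the text once.

```ruby
# Illustrative sketch only. SketchMatcher and Node are hypothetical names,
# not part of the aho_corasick_matcher gem.
class SketchMatcher
  # One trie node: child edges, a failure link, and the words ending here.
  class Node
    attr_accessor :fail
    attr_reader :children, :outputs

    def initialize
      @children = {}
      @outputs = []
    end
  end

  # The expensive part: build the trie, then the failure links.
  def initialize(dictionary)
    @root = Node.new
    build_trie(dictionary)
    build_failure_links
  end

  # The cheap part: a single pass over the text, following failure links
  # whenever the current node has no edge for the next character.
  def match(text)
    node = @root
    text.each_char.flat_map do |char|
      node = node.fail until node.equal?(@root) || node.children.key?(char)
      node = node.children.fetch(char, @root)
      node.outputs
    end
  end

  private

  def build_trie(dictionary)
    dictionary.each do |word|
      node = @root
      word.each_char { |char| node = (node.children[char] ||= Node.new) }
      node.outputs << word
    end
  end

  # Breadth-first pass: a node's failure link points to the longest proper
  # suffix of its path that is also a path in the trie. Words reachable
  # through that link are copied into the node's own output list.
  def build_failure_links
    queue = @root.children.values.each { |child| child.fail = @root }
    until queue.empty?
      node = queue.shift
      node.children.each do |char, child|
        fallback = node.fail
        fallback = fallback.fail until fallback.equal?(@root) || fallback.children.key?(char)
        child.fail = fallback.children.fetch(char, @root)
        child.outputs.concat(child.fail.outputs)
        queue << child
      end
    end
  end
end

sketch = SketchMatcher.new(['a', 'b', 'ab'])
sketch.match('aba')
#=> ["a", "ab", "b", "a"]
```

Because all of the trie and failure-link work happens in the constructor, a matcher like this is best built once and reused across many input strings.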

Current version: 1.0.2
Supported Ruby versions: >= 2.7

Usage

require 'aho_corasick_matcher'

matcher = AhoCorasickMatcher.new(['a', 'b', 'ab'])
matcher.match('aba')
#=> ['a', 'ab', 'b', 'a']

matcher = AhoCorasickMatcher.new(['thistle', 'sift', 'thistles'])
matcher.match('Theophilus thistle, the successful thistle sifter, in sifting a sieve full of un-sifted thistles, thrust three thousand thistles through the thick of his thumb.')
#=> ["thistle", "thistle", "sift", "sift", "sift", "thistle", "thistles", "thistle", "thistles"]

Thanks

Loosely based on Tim Cowlishaw's implementation of the same algorithm.

License

Copyright © 2015-2024 Altmetric LLP

Distributed under the MIT License.