A Ruby gem wrapping the legendary Rust implementation of the Aho-Corasick algorithm! Aho-Corasick is a powerful string searching algorithm that finds multiple patterns simultaneously in a text. Features include overlapping matches, case-insensitive search, find & replace, match positions, and configurable match strategies. Perfect for content filtering, tokenization, and multi-pattern search at lightning speed! (ノ◕ヮ◕)ノ*:・゚✧
Dependencies

Runtime

~> 0.9.117

Aho-Corasick Rust ✨


Blazing-fast multi-pattern string matching for Ruby! (ノ◕ヮ◕)ノ*:・゚✧

ahocorasick-rust is a Ruby wrapper for the Aho-Corasick algorithm implemented in Rust! 🦀💎

What is Aho-Corasick? 🤔

Aho-Corasick is a powerful string searching algorithm that can find multiple patterns simultaneously in a single pass through your text! Unlike traditional string matching that searches for one pattern at a time, Aho-Corasick builds a finite state machine from your dictionary of patterns and matches them all at once.
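To make the "finite state machine" idea concrete, here is a tiny pure-Ruby sketch of the classic construction: a trie of the patterns, failure links built breadth-first, then one left-to-right scan of the text. It is illustrative only and is not this gem's code (the gem delegates to Rust); the class name `TinyAhoCorasick` is hypothetical, and unlike the gem it reports character offsets and always returns every overlapping match.

```ruby
# Illustrative pure-Ruby Aho-Corasick automaton (hypothetical class, NOT the gem's internals).
class TinyAhoCorasick
  # One trie state: outgoing edges, a failure link, and the patterns that end here.
  class Node
    attr_reader :children, :outputs
    attr_accessor :failure

    def initialize
      @children = {}
      @outputs = []
      @failure = nil
    end
  end

  def initialize(patterns)
    @root = Node.new
    # Step 1: insert every pattern into a trie, one character per edge.
    patterns.each do |pattern|
      node = @root
      pattern.each_char { |ch| node = (node.children[ch] ||= Node.new) }
      node.outputs << pattern
    end
    build_failure_links
  end

  # Step 3: scan the text once. Returns [pattern, start, end] triples using
  # character offsets, including overlapping matches.
  def each_match(text)
    results = []
    node = @root
    text.each_char.with_index do |ch, i|
      # On a mismatch, fall back along failure links instead of rescanning the text.
      node = node.failure while node != @root && !node.children.key?(ch)
      node = node.children.fetch(ch, @root)
      node.outputs.each { |pattern| results << [pattern, i + 1 - pattern.length, i + 1] }
    end
    results
  end

  private

  # Step 2: breadth-first pass that links each node to the longest proper suffix
  # of its path that is also a trie path, and inherits that node's outputs.
  def build_failure_links
    queue = []
    @root.children.each_value do |child|
      child.failure = @root
      queue << child
    end
    until queue.empty?
      node = queue.shift
      node.children.each do |ch, child|
        fallback = node.failure
        fallback = fallback.failure while fallback != @root && !fallback.children.key?(ch)
        child.failure = fallback.children.fetch(ch, @root)
        child.outputs.concat(child.failure.outputs)
        queue << child
      end
    end
  end
end

matcher = TinyAhoCorasick.new(%w[he she his hers])
p matcher.each_match('ushers')
# => [["she", 1, 4], ["he", 2, 4], ["hers", 2, 6]]
```

All three patterns are found in a single pass over "ushers", including "he" and "hers", which overlap "she".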

Perfect for:

  • 🔍 Content filtering & moderation
  • 📝 Finding keywords in large documents
  • 🚫 Detecting prohibited words or phrases
  • 🏷️ Multi-pattern text analysis
  • ⚡ Any scenario where you need to search for many patterns efficiently!

Why this gem rocks:

  • 🦀 Powered by Rust for maximum speed
  • 💎 Clean, intuitive Ruby API with 7 search & replace methods
  • 🚀 Up to 67x faster than pure Ruby implementations
  • ✨ Precompiled binaries for major platforms
  • 🎯 Multiple search modes: overlapping, positioned, existence checks
  • 🔄 Find & replace with hash or block-based logic
  • 🌈 Works with Ruby 2.7+ and UTF-8/emoji

Installation 📦

Add this gem to your Gemfile:

gem 'ahocorasick-rust'

Then execute:

bundle install

Or install it yourself:

gem install ahocorasick-rust

Features ✨

  • Multiple search modes - Find all matches, overlapping matches, or just check existence
  • Position tracking - Get byte offsets for every match
  • Case-insensitive matching - Optional ASCII case-insensitive search
  • Match strategies - Control priority when patterns overlap
  • Find & replace - Replace patterns with strings or dynamic logic via blocks
  • Unicode support - Works seamlessly with UTF-8 text and emoji
  • Zero-copy where possible - Efficient memory usage

Quick Start 🎀

Basic Pattern Matching

require 'ahocorasick-rust'

# Create a matcher with your patterns
matcher = AhoCorasickRust.new(['cat', 'dog', 'fox'])

# Find all matches
matcher.lookup("The quick brown fox jumps over the lazy dog.")
# => ["fox", "dog"]

# Check if any pattern exists
matcher.match?("I have a cat")
# => true

Case-Insensitive Matching

matcher = AhoCorasickRust.new(['Ruby', 'Python'], case_insensitive: true)

matcher.lookup('I love RUBY and python!')
# => ["Ruby", "Python"]
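The Features list describes case-insensitive matching as ASCII-only. Assuming that holds, letters outside A-Z are not folded, so a pattern like 'café' would not be expected to match 'CAFÉ'. This core-Ruby sketch (the helper name is hypothetical, not part of the gem) shows the difference between ASCII-only and full Unicode lowercasing:

```ruby
# Sketch of ASCII-only case folding, assuming case_insensitive: true behaves as the
# Features list describes ("ASCII case-insensitive"). Not the gem's code.
# String#downcase(:ascii) lowercases only A-Z and leaves every other character alone.
def ascii_case_equal?(a, b)
  a.downcase(:ascii) == b.downcase(:ascii)
end

ascii_case_equal?('RUBY', 'Ruby') # => true
ascii_case_equal?('CAFÉ', 'café') # => false ('É' is not ASCII, so it is not folded)
'CAFÉ'.downcase == 'café'         # => true  (full Unicode lowercasing does match)
```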

Get Match Positions

matcher = AhoCorasickRust.new(['fox', 'dog'])

matcher.lookup_with_positions('The fox and dog')
# => [
#      { pattern: 'fox', start: 4, end: 7 },
#      { pattern: 'dog', start: 12, end: 15 }
#    ]
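According to the API overview, positions are byte offsets, which equal character indices only for ASCII text. A small sketch using core `String` methods to turn a match hash shaped like the one above back into text and a character index; the helper names are hypothetical, and the match hash for 'café fox' is written by hand to show what byte offsets for that string would be.

```ruby
# Recover the matched text from byte offsets (start inclusive, end exclusive).
def matched_text(text, match)
  text.byteslice(match[:start], match[:end] - match[:start])
end

# Convert a byte offset into a character index by counting the characters before it.
def char_index(text, byte_offset)
  text.byteslice(0, byte_offset).length
end

text = 'café fox'
# Hand-written match hash: 'é' takes 2 bytes in UTF-8, so 'fox' starts at byte 6.
match_hash = { pattern: 'fox', start: 6, end: 9 }

matched_text(text, match_hash)         # => "fox"
char_index(text, match_hash[:start])   # => 5
text[char_index(text, match_hash[:start]), 3] # => "fox"
```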

Find & Replace

matcher = AhoCorasickRust.new(['bad', 'worse', 'worst'])

# Replace with hash
matcher.replace_all('This is bad and worse', { 'bad' => 'good', 'worse' => 'better' })
# => "This is good and better"

# Replace with block
matcher.replace_all('This is bad and worse') { |word| '*' * word.length }
# => "This is *** and *****"
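For comparison (or for checking expectations in tests), the same replacements can be expressed with only the standard library. This is not how the gem works internally: `Regexp.union` tries alternatives in list order at each position, which for plain literal patterns behaves like the default leftmost-first matching. One caveat: `String#gsub` with a Hash replaces any match missing from the hash with an empty string; the README doesn't say what `replace_all` does in that case.

```ruby
# Pure-Ruby comparison using Regexp.union and String#gsub (standard library only).
patterns = ['bad', 'worse', 'worst']
regex    = Regexp.union(patterns) # => /bad|worse|worst/

text = 'This is bad and worse'

# Hash form: each matched string is looked up in the hash.
text.gsub(regex, 'bad' => 'good', 'worse' => 'better')
# => "This is good and better"

# Block form: the block receives each matched string.
text.gsub(regex) { |word| '*' * word.length }
# => "This is *** and *****"
```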

Overlapping Matches

matcher = AhoCorasickRust.new(['abc', 'bcd', 'cde'])

# Regular lookup finds non-overlapping matches
matcher.lookup('abcde')
# => ["abc"]

# Overlapping lookup finds all matches
matcher.lookup_overlapping('abcde')
# => ["abc", "bcd", "cde"]
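The overlapping mode can be pinned down with a brute-force reference that checks every pattern at every position. It is quadratic, so it is only useful for checking expectations on small inputs; the method name is hypothetical, and its results are ordered by start position, which may not match the order the gem returns.

```ruby
# Brute-force overlapping search (reference for semantics only, not for speed).
# Reports every pattern that occurs at every start position.
def naive_overlapping(patterns, text)
  (0...text.length).flat_map do |i|
    patterns.select { |p| !p.empty? && text[i, p.length] == p }
  end
end

naive_overlapping(['abc', 'bcd', 'cde'], 'abcde')
# => ["abc", "bcd", "cde"]
```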

Advanced: Match Strategies

# Prefer longest matches
matcher = AhoCorasickRust.new(
  ['test', 'testing'],
  match_kind: :leftmost_longest
)

matcher.lookup('testing')
# => ["testing"]  # chooses longer match over 'test'
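A brute-force sketch of non-overlapping leftmost matching shows how the two strategies differ: both pick the earliest starting position, then `:leftmost_first` takes whichever pattern appears first in the list, while `:leftmost_longest` takes the longest one. This mirrors how the Rust aho-corasick crate documents these match kinds, but the method is hypothetical and is not the gem's code. Under this sketch, the default `:leftmost_first` returns ["test"] for the example above, because 'test' is listed before 'testing'.

```ruby
# Brute-force non-overlapping leftmost matching (reference for semantics only).
#   :leftmost_first   -> among patterns starting at the leftmost position, take the one listed first
#   :leftmost_longest -> among those, take the longest
def naive_leftmost(patterns, text, kind = :leftmost_first)
  matches = []
  pos = 0
  while pos < text.length
    candidates = patterns.select { |p| !p.empty? && text[pos, p.length] == p }
    if candidates.empty?
      pos += 1
      next
    end
    chosen = kind == :leftmost_longest ? candidates.max_by(&:length) : candidates.first
    matches << chosen
    pos += chosen.length # resume after the match so results never overlap
  end
  matches
end

naive_leftmost(['test', 'testing'], 'testing')                    # => ["test"]
naive_leftmost(['test', 'testing'], 'testing', :leftmost_longest) # => ["testing"]
```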

Find First (Efficient for Existence Checks)

matcher = AhoCorasickRust.new(['foo', 'bar', 'baz'])

# Get just the first match (faster than getting all matches)
matcher.find_first('hello foo bar baz')
# => "foo"

# Or with position
matcher.find_first_with_position('hello foo bar')
# => { pattern: 'foo', start: 6, end: 9 }
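To sanity-check results, a standard-library stand-in for `find_first_with_position` can use `Regexp.union` (leftmost-first for plain literals) and convert `MatchData`'s character offset into a byte offset. The method name is hypothetical and this is not the gem's implementation; it assumes case-sensitive matching, so the matched text equals the pattern.

```ruby
# Hypothetical pure-Ruby stand-in for find_first_with_position (not the gem's code).
def first_match_with_position(patterns, text)
  m = Regexp.union(patterns).match(text)
  return nil if m.nil?

  # MatchData#begin counts characters; convert it to a byte offset.
  start = text[0, m.begin(0)].bytesize
  { pattern: m[0], start: start, end: start + m[0].bytesize }
end

first_match_with_position(%w[foo bar baz], 'hello foo bar')
# => { pattern: 'foo', start: 6, end: 9 }
first_match_with_position(%w[foo bar baz], 'café foo') # 'é' is 2 bytes
# => { pattern: 'foo', start: 6, end: 9 }
```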

API Overview 🔍

Constructor:

  • AhoCorasickRust.new(patterns, case_insensitive: false, match_kind: :leftmost_first)

Search Methods:

  • #lookup(text) - Find all non-overlapping matches
  • #lookup_overlapping(text) - Find all matches including overlaps
  • #lookup_with_positions(text) - Find matches with byte positions
  • #match?(text) - Check if any pattern exists (returns boolean)
  • #find_first(text) - Get first match only
  • #find_first_with_position(text) - Get first match with position

Replace Methods:

  • #replace_all(text, hash) - Replace with hash mapping
  • #replace_all(text) { |match| ... } - Replace with block

Documentation 📖

Want more examples? Check out our example script with content filtering, language detection, and more! 🌈

Benchmark 📊

Don't just take our word for it - check out these performance numbers! 🎉

Test Setup 1

  • Words: 500 patterns
  • Test cases: 2,000
  • Text length: 3,154 chars (avg), 23,676 (max)
       user     system      total        real
each&include  6.487059   0.185424   6.672483 (  6.791808)
ruby_ahoc     4.178672   0.138610   4.317282 (  4.547964)
rust_ahoc     0.157662   0.004847   0.162509 (  0.166964)

🎈 27.2x faster than the pure Ruby implementation (ruby_ahoc), measured in real time!

Test Setup 2

  • Words: 500 patterns
  • Test cases: 2,000
  • Text length: 49,162 chars (avg), 10,392,056 (max)
       user     system      total        real
each&include 27.903179   0.237389  28.140568 ( 28.563194)
ruby_ahoc    45.220535   0.363107  45.583642 ( 46.477702)
rust_ahoc     0.670583   0.007192   0.677775 (  0.686904)

🎈 67.7x faster than the pure Ruby implementation (ruby_ahoc), measured in real time!

The larger your text and the more patterns you have, the more this gem shines! ✨
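The headline figures divide the ruby_ahoc real time by the rust_ahoc real time. A quick Ruby check of that arithmetic, using only the numbers in the tables above, also gives the ratio against each&include (roughly 41x in both setups, since ruby_ahoc was slower than each&include on the larger texts):

```ruby
# Recompute the speedups from the "real" (wall-clock seconds) column above.
REAL_SECONDS = {
  setup1: { each_include: 6.791808, ruby_ahoc: 4.547964, rust_ahoc: 0.166964 },
  setup2: { each_include: 28.563194, ruby_ahoc: 46.477702, rust_ahoc: 0.686904 }
}.freeze

speedups = REAL_SECONDS.transform_values do |t|
  {
    vs_ruby_ahoc: (t[:ruby_ahoc] / t[:rust_ahoc]).round(1),
    vs_each_include: (t[:each_include] / t[:rust_ahoc]).round(1)
  }
end

speedups.each do |setup, s|
  puts "#{setup}: #{s[:vs_ruby_ahoc]}x vs ruby_ahoc, #{s[:vs_each_include]}x vs each&include"
end
# setup1: 27.2x vs ruby_ahoc, 40.7x vs each&include
# setup2: 67.7x vs ruby_ahoc, 41.6x vs each&include
```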

Platform Support 🌍

Precompiled binaries are available for:

  • 🍎 macOS (ARM64 & x86_64)
  • 🐧 Linux (ARM64 & x86_64)

If a precompiled binary isn't available for your platform, the gem will automatically compile the Rust extension during installation, so you'll need a Rust toolchain (for example, installed via rustup).

Development 🛠️

Want to contribute? Yay! 🎉

# Install dependencies
bundle install

# Compile the extension
fish -c "bundle exec rake dev compile"

# Run tests
fish -c "bundle exec rake test"

# Build the gem
gem build ahocorasick-rust.gemspec

Contributing 💝

Bug reports and pull requests are welcome on GitHub at https://github.com/jetpks/ahocorasick-rust-ruby!

License 📄

This gem is available as open source under the terms of the MIT License.


Made with 💖 and Rust 🦀 by Eric