Aho-Corasick Rust ✨
Blazing-fast multi-pattern string matching for Ruby! (ノ◕ヮ◕)ノ*:・゚✧
ahocorasick-rust is a Ruby wrapper for the Aho-Corasick algorithm implemented in Rust! 🦀💎
What is Aho-Corasick? 🤔
Aho-Corasick is a powerful string searching algorithm that can find multiple patterns simultaneously in a single pass through your text! Unlike traditional string matching that searches for one pattern at a time, Aho-Corasick builds a finite state machine from your dictionary of patterns and matches them all at once.
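To make the "finite state machine" idea concrete, here is a minimal pure-Ruby sketch of such an automaton. It is only an illustration of the algorithm, not how this gem works internally (the real matching happens in the Rust crate). The class name `TinyAhoCorasick` and its `scan` method are made up for this example, and it reports character indexes rather than byte offsets.

```ruby
# Illustrative pure-Ruby Aho-Corasick automaton (hypothetical; not the gem's code).
class TinyAhoCorasick
  def initialize(patterns)
    @goto = [{}]   # state => { char => next state }, state 0 is the root
    @fail = [0]    # state => state to fall back to on a mismatch
    @out  = [[]]   # state => patterns that end when this state is reached
    patterns.each { |pattern| insert(pattern) }
    build_failure_links
  end

  # Single pass over the text. Returns [pattern, start_char_index] pairs
  # for every occurrence, overlapping ones included.
  def scan(text)
    state = 0
    hits = []
    text.each_char.with_index do |ch, i|
      state = @fail[state] while state != 0 && !@goto[state].key?(ch)
      state = @goto[state].fetch(ch, 0)
      @out[state].each { |pattern| hits << [pattern, i - pattern.length + 1] }
    end
    hits
  end

  private

  # Add one pattern to the trie.
  def insert(pattern)
    state = 0
    pattern.each_char do |ch|
      unless @goto[state].key?(ch)
        @goto << {}
        @fail << 0
        @out << []
        @goto[state][ch] = @goto.size - 1
      end
      state = @goto[state][ch]
    end
    @out[state] << pattern
  end

  # Breadth-first walk that links each state to the longest proper suffix
  # of its path that is also a path in the trie.
  def build_failure_links
    queue = @goto[0].values   # depth-1 states keep failure link 0
    until queue.empty?
      state = queue.shift
      @goto[state].each do |ch, nxt|
        queue << nxt
        f = @fail[state]
        f = @fail[f] while f != 0 && !@goto[f].key?(ch)
        @fail[nxt] = @goto[f].fetch(ch, 0)
        @out[nxt] += @out[@fail[nxt]]
      end
    end
  end
end

ac = TinyAhoCorasick.new(%w[cat dog fox])
p ac.scan('The quick brown fox jumps over the lazy dog.')
# => [["fox", 16], ["dog", 40]]
```

The text is read once, no matter how many patterns are in the dictionary; the failure links are what let the matcher recover from a mismatch without rewinding.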
Perfect for:
- 🔍 Content filtering & moderation
- 📝 Finding keywords in large documents
- 🚫 Detecting prohibited words or phrases
- 🏷️ Multi-pattern text analysis
- ⚡ Any scenario where you need to search for many patterns efficiently!
Why this gem rocks:
- 🦀 Powered by Rust for maximum speed
- 💎 Clean, intuitive Ruby API with 7+ search methods
- 🚀 Up to 67x faster than pure Ruby implementations
- ✨ Precompiled binaries for major platforms
- 🎯 Multiple search modes: overlapping, positioned, existence checks
- 🔄 Find & replace with hash or block-based logic
- 🌈 Works with Ruby 2.7+ and UTF-8/emoji
Installation 📦
Add this gem to your Gemfile:
gem 'ahocorasick-rust'
Then execute:
bundle install
Or install it yourself:
gem install ahocorasick-rust
Features ✨
- Multiple search modes - Find all matches, overlapping matches, or just check existence
- Position tracking - Get byte offsets for every match
- Case-insensitive matching - Optional ASCII case-insensitive search
- Match strategies - Control priority when patterns overlap
- Find & replace - Replace patterns with strings or dynamic logic via blocks
- Unicode support - Works seamlessly with UTF-8 text and emoji
- Zero-copy where possible - Efficient memory usage
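Because positions are byte offsets, they can differ from character indexes as soon as the text contains multibyte UTF-8 characters. The sketch below uses plain Ruby only: the `hit` hash is built by hand to mirror the `{ pattern:, start:, end: }` shape shown in this README (with `end` exclusive, as in the examples), and is not output from the gem.

```ruby
text = 'héllo fox' # 'é' takes two bytes in UTF-8

# Hand-built stand-in for one position result (hypothetical values computed
# here, not returned by the gem).
byte_start = text.b.index('fox'.b) # byte offset of the match
hit = { pattern: 'fox', start: byte_start, end: byte_start + 'fox'.bytesize }

length = hit[:end] - hit[:start]

# Byte offsets need byteslice; String#[] counts characters instead.
by_bytes = text.byteslice(hit[:start], length)
by_chars = text[hit[:start], length]

p hit[:start] # => 7 (the character index would be 6)
p by_bytes    # => "fox"
p by_chars    # => "ox"
```

For pure-ASCII text the two kinds of index coincide, which is why the Quick Start examples look the same either way.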
Quick Start 🎀
Basic Pattern Matching
require 'ahocorasick-rust'
# Create a matcher with your patterns
matcher = AhoCorasickRust.new(['cat', 'dog', 'fox'])
# Find all matches
matcher.lookup("The quick brown fox jumps over the lazy dog.")
# => ["fox", "dog"]
# Check if any pattern exists
matcher.match?("I have a cat")
# => true
Case-Insensitive Matching
matcher = AhoCorasickRust.new(['Ruby', 'Python'], case_insensitive: true)
matcher.lookup('I love RUBY and python!')
# => ["Ruby", "Python"]
Get Match Positions
matcher = AhoCorasickRust.new(['fox', 'dog'])
matcher.lookup_with_positions('The fox and dog')
# => [
# { pattern: 'fox', start: 4, end: 7 },
# { pattern: 'dog', start: 12, end: 15 }
# ]
Find & Replace
matcher = AhoCorasickRust.new(['bad', 'worse', 'worst'])
# Replace with hash
matcher.replace_all('This is bad and worse', { 'bad' => 'good', 'worse' => 'better' })
# => "This is good and better"
# Replace with block
matcher.replace_all('This is bad and worse') { |word| '*' * word.length }
# => "This is *** and *****"
Overlapping Matches
matcher = AhoCorasickRust.new(['abc', 'bcd', 'cde'])
# Regular lookup finds non-overlapping matches
matcher.lookup('abcde')
# => ["abc"]
# Overlapping lookup finds all matches
matcher.lookup_overlapping('abcde')
# => ["abc", "bcd", "cde"]
Advanced: Match Strategies
# Prefer longest matches
matcher = AhoCorasickRust.new(
['test', 'testing'],
match_kind: :leftmost_longest
)
matcher.lookup('testing')
# => ["testing"] # chooses longer match over 'test'
Find First (Efficient for Existence Checks)
matcher = AhoCorasickRust.new(['foo', 'bar', 'baz'])
# Get just the first match (faster than getting all matches)
matcher.find_first('hello foo bar baz')
# => "foo"
# Or with position
matcher.find_first_with_position('hello foo bar')
# => { pattern: 'foo', start: 6, end: 9 }
API Overview 🔍
Constructor:
AhoCorasickRust.new(patterns, case_insensitive: false, match_kind: :leftmost_first)
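The `match_kind` option decides which pattern wins when several could match at the same spot. The sketch below is plain Ruby, not the gem: it assumes `:leftmost_first` and `:leftmost_longest` behave like the same-named `MatchKind` variants of the Rust aho-corasick crate (at the leftmost match position, the first-listed pattern wins versus the longest pattern wins). The helper `leftmost_match` is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper: returns the first match in `text` under the given
# strategy, or nil. Quadratic and for illustration only.
def leftmost_match(patterns, text, kind)
  (0...text.length).each do |i|
    # All patterns that match starting at position i, in dictionary order.
    candidates = patterns.select { |pattern| text[i, pattern.length] == pattern }
    next if candidates.empty?

    return kind == :leftmost_longest ? candidates.max_by(&:length) : candidates.first
  end
  nil
end

p leftmost_match(%w[test testing], 'testing', :leftmost_first)   # => "test"
p leftmost_match(%w[test testing], 'testing', :leftmost_longest) # => "testing"
p leftmost_match(%w[testing test], 'testing', :leftmost_first)   # => "testing"
```

Under that assumption, pattern order matters with `:leftmost_first`: listing longer patterns before their prefixes gives the same result as `:leftmost_longest` for this input.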
Search Methods:
- #lookup(text) - Find all non-overlapping matches
- #lookup_overlapping(text) - Find all matches including overlaps
- #lookup_with_positions(text) - Find matches with byte positions
- #match?(text) - Check if any pattern exists (returns boolean)
- #find_first(text) - Get first match only
- #find_first_with_position(text) - Get first match with position
Replace Methods:
- #replace_all(text, hash) - Replace with hash mapping
- #replace_all(text) { |match| ... } - Replace with block
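For comparison, both replace styles can be mimicked in plain Ruby with `Regexp.union` and `String#gsub`. This is only a reference for the results shown in the Quick Start examples, not the gem's implementation. One caveat: with `gsub`, a matched pattern that has no entry in the hash is replaced by an empty string; how the gem handles that case is not covered here.

```ruby
patterns = %w[bad worse worst]
union = Regexp.union(patterns) # alternation of the literal patterns
text = 'This is bad and worse'

# Hash style: each match is replaced by the value stored under it.
mapping = { 'bad' => 'good', 'worse' => 'better' }
with_hash = text.gsub(union, mapping)

# Block style: the replacement is computed from the matched word.
with_block = text.gsub(union) { |word| '*' * word.length }

puts with_hash  # => This is good and better
puts with_block # => This is *** and *****
```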
Documentation 📖
- API Reference - Complete method documentation with examples
- Match Kind Guide - Understanding match strategies
- Example Script - Real-world usage examples
Want more examples? Check out our example script with content filtering, language detection, and more! 🌈
Benchmark 📊
Don't just take our word for it - check out these performance numbers! 🎉
Test Setup 1
- Words: 500 patterns
- Test cases: 2,000
- Text length: 3,154 chars (avg), 23,676 (max)
                    user     system      total        real
each&include    6.487059   0.185424   6.672483 (  6.791808)
ruby_ahoc       4.178672   0.138610   4.317282 (  4.547964)
rust_ahoc       0.157662   0.004847   0.162509 (  0.166964)
🎈 27.2x faster than the pure Ruby Aho-Corasick implementation (ruby_ahoc)!
Test Setup 2
- Words: 500 patterns
- Test cases: 2,000
- Text length: 49,162 chars (avg), 10,392,056 (max)
                    user     system      total        real
each&include   27.903179   0.237389  28.140568 ( 28.563194)
ruby_ahoc      45.220535   0.363107  45.583642 ( 46.477702)
rust_ahoc       0.670583   0.007192   0.677775 (  0.686904)
🎈 67.7x faster than the pure Ruby Aho-Corasick implementation (ruby_ahoc)!
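The longer your text, the more this gem shines: both setups use 500 patterns, and the speedup over ruby_ahoc grows from 27.2x to 67.7x as the average text length rises from about 3K to about 49K characters! ✨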
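The speedup figures quoted for each setup are ratios of the "real" (wall-clock) column. Here is a small sketch that recomputes them from the numbers in the tables; the constant and method names are made up for this example.

```ruby
# Wall-clock seconds copied from the "real" column of each benchmark table.
REAL_SECONDS = {
  'Test Setup 1' => { 'each&include' => 6.791808, 'ruby_ahoc' => 4.547964, 'rust_ahoc' => 0.166964 },
  'Test Setup 2' => { 'each&include' => 28.563194, 'ruby_ahoc' => 46.477702, 'rust_ahoc' => 0.686904 }
}.freeze

# Ratio of each baseline's time to rust_ahoc's time, rounded to one decimal.
def speedups(times)
  rust = times.fetch('rust_ahoc')
  times.reject { |name, _| name == 'rust_ahoc' }
       .transform_values { |seconds| (seconds / rust).round(1) }
end

REAL_SECONDS.each do |setup, times|
  speedups(times).each do |name, factor|
    puts "#{setup}: rust_ahoc is #{factor}x faster than #{name}"
  end
end
# Test Setup 1: rust_ahoc is 40.7x faster than each&include
# Test Setup 1: rust_ahoc is 27.2x faster than ruby_ahoc
# Test Setup 2: rust_ahoc is 41.6x faster than each&include
# Test Setup 2: rust_ahoc is 67.7x faster than ruby_ahoc
```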
Platform Support 🌍
Precompiled binaries are available for:
- 🍎 macOS (ARM64 & x86_64)
- 🐧 Linux (ARM64 & x86_64)
If a precompiled binary isn't available for your platform, the gem will automatically compile the Rust extension during installation.
Development 🛠️
Want to contribute? Yay! 🎉
# Install dependencies
bundle install
# Compile the extension
fish -c "bundle exec rake dev compile"
# Run tests
fish -c "bundle exec rake test"
# Build the gem
gem build ahocorasick-rust.gemspec
References 📚
- Aho-Corasick (Rust) - The amazing Rust implementation we wrap
- Aho-Corasick Algorithm - Learn about the algorithm
- Original Ruby Implementation - Pure Ruby version for comparison
Contributing 💝
Bug reports and pull requests are welcome on GitHub at https://github.com/jetpks/ahocorasick-rust-ruby!
License 📄
This gem is available as open source under the terms of the MIT License.
Made with 💖 and Rust 🦀 by Eric