MultiStringReplace
A fast multiple string replace library for Ruby. It uses a C implementation of the Aho–Corasick algorithm based on https://github.com/morenice/ahocorasick, and adds a few performance enhancements and support for on-the-fly multiple string replacement.
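To make the algorithm concrete, here is a minimal pure-Ruby sketch of Aho–Corasick matching. It only illustrates the technique and is not the gem's C code. The class name AhoCorasickSketch is hypothetical. It reports every occurrence, including overlapping ones, in the same { key_index => [start offsets] } shape that the match example under Usage returns.

```ruby
# Minimal pure-Ruby Aho–Corasick sketch (illustration only; the gem itself uses C).
class AhoCorasickSketch
  # children: char => Node, failure: fallback node, outputs: indices of keys ending here
  Node = Struct.new(:children, :failure, :outputs)

  def initialize(keys)
    @keys = keys
    @root = Node.new({}, nil, [])

    # 1. Insert every key into a trie; the last node of each key records its index.
    keys.each_with_index do |key, index|
      node = key.each_char.reduce(@root) do |current, ch|
        current.children[ch] ||= Node.new({}, nil, [])
      end
      node.outputs << index
    end

    # 2. Breadth-first pass: each node's failure link points to the node for the
    #    longest proper suffix of its path that is also a path in the trie.
    queue = []
    @root.children.each_value do |child|
      child.failure = @root
      queue << child
    end
    until queue.empty?
      current = queue.shift
      current.children.each do |ch, child|
        fallback = current.failure
        fallback = fallback.failure while fallback && !fallback.children.key?(ch)
        child.failure = fallback ? fallback.children[ch] : @root
        # Keys ending at the failure target also end here (e.g. "he" inside "she").
        child.outputs.concat(child.failure.outputs)
        queue << child
      end
    end
  end

  # Returns { key_index => [start offsets] }, including overlapping matches.
  def match(text)
    result = {}
    node = @root
    text.each_char.with_index do |ch, pos|
      # Follow failure links until a node can consume ch (or we are back at the root).
      node = node.failure while !node.equal?(@root) && !node.children.key?(ch)
      node = node.children.fetch(ch, @root)
      node.outputs.each do |index|
        (result[index] ||= []) << (pos - @keys[index].length + 1)
      end
    end
    result
  end
end

ac = AhoCorasickSketch.new(['brown', 'fox'])
result = ac.match("The quick brown fox jumps over the lazy dog brown")
# result == { 0 => [10, 44], 1 => [16] }
```

The scan moves through the text once (failure-link hops are amortized), so its cost grows with the text length plus the number of matches, rather than with the text length times the number of keys.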
If you don't need regular expressions, this library offers significant performance advantages over String#gsub for large strings with a large number of tokens (about 6.5x faster than the multi gsub baseline in the Performance section).
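For comparison, this is one common way to do the same multi-token replacement with plain String#gsub: a single Regexp.union pattern plus a Hash of replacements. The helper gsub_multi_replace is hypothetical, and the benchmark's "multi gsub" baseline may be written differently.

```ruby
# One common gsub-based approach to multi-token replacement (hypothetical helper).
def gsub_multi_replace(text, replacements)
  # Longer keys come first so that a key which is a prefix of another
  # (e.g. "fox" vs "foxes") does not win by accident.
  pattern = Regexp.union(replacements.keys.sort_by { |key| -key.length })
  # String#gsub with a Hash replaces each matched text with hash[matched_text].
  text.gsub(pattern, replacements)
end

result = gsub_multi_replace("The quick brown fox jumps over the lazy dog brown",
                            { 'brown' => 'black', 'fox' => 'wolf' })
# result == "The quick black wolf jumps over the lazy dog black"
```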
Installation
Add this line to your application's Gemfile:
gem 'multi_string_replace'
And then execute:
$ bundle
Or install it yourself as:
$ gem install multi_string_replace
Usage
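match returns a hash that maps each token's index in the array to the start offsets where it occurs:
MultiStringReplace.match("The quick brown fox jumps over the lazy dog brown", ['brown', 'fox'])
# => { 0 => [10, 44], 1 => [16] }
replace substitutes every occurrence of each key with its value:
MultiStringReplace.replace("The quick brown fox jumps over the lazy dog brown", {'brown' => 'black', 'fox' => 'wolf'})
# => "The quick black wolf jumps over the lazy dog black"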
You can also pass a Proc as a replacement value; it is only evaluated when its token is encountered. The start and end positions of the match are passed to the proc.
MultiStringReplace.replace("The quick brown fox jumps over the lazy dog brown", {'brown' => 'black', 'fox' => ->(s, e) { "cat" }})
# => "The quick black cat jumps over the lazy dog black"
# Returning nil skips the substitution and leaves the token unchanged.
MultiStringReplace.replace("The quick brown fox jumps over the lazy dog brown", {'brown' => 'black', 'fox' => ->(s, e) { nil }})
# => "The quick black fox jumps over the lazy dog black"
# Returning an empty string removes the token.
MultiStringReplace.replace("The quick brown fox jumps over the lazy dog brown", {'brown' => 'black', 'fox' => ->(s, e) { "" }})
# => "The quick black  jumps over the lazy dog black"
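The value rules above can be summarized with a small pure-Ruby sketch. It is not the gem's implementation: sketch_replace is a hypothetical helper built on String#gsub, and treating the end offset as exclusive is an assumption made here, since the convention the gem uses is not stated above.

```ruby
# Pure-Ruby sketch of the replacement-value rules (hypothetical helper, not the gem):
#   String -> substituted as-is
#   Proc   -> called lazily with the match's start and end offsets
#             (end is treated as exclusive here, an assumption)
#   nil    -> returned by a Proc, keeps the original token
#   ""     -> deletes the token
def sketch_replace(text, replacements)
  pattern = Regexp.union(replacements.keys.sort_by { |key| -key.length })
  text.gsub(pattern) do |token|
    match = Regexp.last_match # the current match inside the gsub block
    value = replacements[token]
    value = value.call(match.begin(0), match.end(0)) if value.respond_to?(:call)
    value.nil? ? token : value.to_s
  end
end

text = "The quick brown fox jumps over the lazy dog brown"
sketch_replace(text, { 'brown' => 'black', 'fox' => ->(s, e) { "#{s}..#{e}" } })
# => "The quick black 16..19 jumps over the lazy dog black"
```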
This makes it straightforward to build fast, simple templating systems.
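As an example of the templating use case, this hypothetical render_template helper maps {{name}} placeholders to strings or lambdas and, like the Proc values above, only calls a lambda when its placeholder actually appears. It uses String#gsub as a stand-in for the gem, and its lambdas take no arguments, unlike the gem's procs, which receive the start and end positions.

```ruby
# Hypothetical template helper (not part of the gem): lambdas run only if used.
def render_template(template, values)
  # Turn { name: "Ada" } into { "{{name}}" => "Ada" }.
  replacements = values.to_h { |name, value| ["{{#{name}}}", value] }
  pattern = Regexp.union(replacements.keys.sort_by { |key| -key.length })
  template.gsub(pattern) do |placeholder|
    value = replacements[placeholder]
    value.respond_to?(:call) ? value.call.to_s : value.to_s
  end
end

calls = 0
expensive = -> { calls += 1; "computed" }
page = render_template("Hello {{name}}!", name: "Ada", footer: expensive)
# page == "Hello Ada!" and calls is still 0, because {{footer}} never appears
```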
The gem also adds an mreplace method to String that does the same thing:
"The quick brown fox jumps over the lazy dog brown".mreplace({'brown' => 'black', 'fox' => ->(_, _) { "cat" }})
# => "The quick black cat jumps over the lazy dog black"
Reuse a compiled automaton (faster for repeated calls)
When running many matches/replacements with the same set of keys, build the automaton once and reuse it:
ac = MultiStringReplace::Automaton.new(['brown', 'fox'])
ac.match("The quick brown fox")
ac.replace("The quick brown fox", { 'brown' => 'black', 'fox' => 'wolf' })
This avoids rebuilding the trie/failure links on every call and can improve throughput significantly for repeated workloads.
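The compile-once idea can be illustrated without the gem. CompiledReplacer below is hypothetical and caches a Regexp rather than an Aho–Corasick automaton, but the shape is the same: pay the construction cost once in initialize, then reuse it across many match/replace calls. Unlike an Aho–Corasick matcher, its match reports only non-overlapping, leftmost-longest occurrences.

```ruby
# Hypothetical compile-once helper: builds the pattern once, reuses it per call.
class CompiledReplacer
  def initialize(keys)
    # Longest keys first so the alternation prefers the longest match.
    @pattern = Regexp.union(keys.sort_by { |key| -key.length })
    @index_of = keys.each_with_index.to_h # "brown" => 0, "fox" => 1, ...
  end

  # { key_index => [start offsets] } for non-overlapping, leftmost-longest matches.
  def match(text)
    result = {}
    pos = 0
    while (m = @pattern.match(text, pos))
      (result[@index_of[m[0]]] ||= []) << m.begin(0)
      pos = [m.end(0), m.begin(0) + 1].max # always advance, even on an empty match
    end
    result
  end

  def replace(text, replacements)
    text.gsub(@pattern) { |token| replacements.fetch(token, token) }
  end
end

replacer = CompiledReplacer.new(['brown', 'fox'])
replacer.match("The quick brown fox")                                        # => {0=>[10], 1=>[16]}
replacer.replace("The quick brown fox", { 'brown' => 'black', 'fox' => 'wolf' }) # => "The quick black wolf"
```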
Performance
Token replacement on a 200K text file, repeated 100 times (times in seconds):
                          user     system      total        real
multi gsub            1.322510   0.000000   1.322510 (  1.344405)
MultiStringReplace    0.196823   0.007979   0.204802 (  0.207219)
mreplace              0.200593   0.004031   0.204624 (  0.205379)
Benchmark source: https://github.com/jedld/multi_string_replace/blob/master/bin/benchmark.rb
Run the benchmark locally
- Install dependencies
bundle install
- Compile the native extension (recommended for accurate numbers)
bundle exec rake compile
- Run the benchmark script
bundle exec ruby bin/benchmark.rb
Notes:
- The script reads the sample text from spec/fixtures/test.txt and writes results to replaced.txt and replaced2.txt in the repo root.
- To change the number of iterations or the input text, edit bin/benchmark.rb (look for the 100.times loop and the spec/fixtures/test.txt path).
- For repeated runs with the same keys, consider updating the benchmark to use MultiStringReplace::Automaton to showcase the speedup for batched workloads.
Advanced benchmark flags:
# choose input and iterations
bundle exec ruby bin/benchmark.rb -f spec/fixtures/test.txt -n 200
# enable Automaton (reuse compiled trie)
bundle exec ruby bin/benchmark.rb -A
Development
After checking out the repo, run bin/setup to install dependencies. Then run rake compile followed by rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/jedld/multi_string_replace. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the MultiStringReplace project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.