Neologdish::Normalizer for Ruby

A Japanese text normalization library for Ruby that follows the conventions of neologd/mecab-ipadic-neologd, with some performance optimizations and no external dependencies. It is designed to preprocess Japanese text before applying NLP techniques.

The specific rules are documented here: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja
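
To give a feel for the kind of rule listed on that page, here is a minimal sketch in plain Ruby covering only two simplified rules: converting full-width alphanumerics to half-width, and collapsing runs of spaces. `toy_normalize` is a hypothetical helper written for illustration. It is not this library's implementation and it ignores most of the actual rule set, so its output will differ from `Neologdish::Normalizer.normalize` on general input.

```ruby
# Hypothetical helper illustrating two simplified neologd-style rules.
# This is NOT the library's code and covers only a small part of the rule set.
def toy_normalize(text)
  text
    .tr("０-９Ａ-Ｚａ-ｚ", "0-9A-Za-z") # full-width alphanumerics -> half-width
    .gsub(/[ 　]+/, " ")               # runs of half-/full-width spaces -> one space
    .strip                             # drop leading and trailing spaces
end

puts toy_normalize("Ｈｅｌｌｏ　　Ｗｏｒｌｄ　１２３")
```

For real preprocessing, use the library itself, which applies the full set of conventions.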

Usage

require "neologdish-normalizer"

Neologdish::Normalizer.normalize("南アルプスの 天然水- Sparking* Lemon+ レモン一絞り")
# => 南アルプスの天然水-Sparking*Lemon+レモン一絞り

Benchmark

In the following comparison, this library runs roughly 3.6 times faster (by wall-clock time) than the official Ruby example (https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja#ruby-written-by-kimoto-and-overlast):

                           user     system      total        real
original normalizer:   4.200670   0.032004   4.232674 (  4.274573)
this library:          1.158801   0.005238   1.164039 (  1.170226)

The benchmark script is here: ./scripts/benchmark.rb
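
A comparison of this shape can be set up with Ruby's standard `benchmark` library. The sketch below is not the project's ./scripts/benchmark.rb. The two lambdas, the sample string, and the iteration count are assumptions chosen so that the example runs without the gem installed. For a real measurement, replace the stand-ins with the wiki's reference implementation and `->(s) { Neologdish::Normalizer.normalize(s) }`.

```ruby
require "benchmark"

# Stand-in callables (assumptions, not the implementations being compared above).
candidates = {
  "nfkc stand-in:" => ->(s) { s.unicode_normalize(:nfkc) },
  "tr stand-in:"   => ->(s) { s.tr("０-９Ａ-Ｚａ-ｚ", "0-9A-Za-z") },
}

sample = "ＰＲＭＬ　副読本　１２３"
iterations = 10_000 # arbitrary; raise it for more stable numbers

# Align the report columns to the longest label.
label_width = candidates.keys.map(&:length).max

# Benchmark.bm prints user/system/total/real columns and returns the timings.
results = Benchmark.bm(label_width) do |x|
  candidates.each do |label, normalizer|
    x.report(label) { iterations.times { normalizer.call(sample) } }
  end
end
```

Absolute numbers depend on the machine, the Ruby version, and the input text, so only the ratio between rows is meaningful.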

Installation

Install the gem and add it to the application's Gemfile by executing:

bundle add 'neologdish-normalizer'

If bundler is not being used to manage dependencies, install the gem by executing:

gem install 'neologdish-normalizer'

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/moznion/neologdish-normalizer-ruby.