Ruby port of the UEA-Lite stemmer, designed to normalize common English suffixes without aggressive stemming.

uea-stemmer

Ruby implementation of the UEA-Lite stemmer for conservative stemming in search and indexing workloads. The gem has no runtime dependencies.

UEA-Lite uses a rule set to normalize suffixes while avoiding aggressive stemming.

Behavior Notes

The stemmer operates on a single token at a time and returns a stemmed token.
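Because the stemmer takes one token per call, callers have to split text into tokens themselves. The following is a minimal sketch, assuming plain whitespace tokenization is good enough for your input. `stem_tokens` is a hypothetical helper, not part of the gem, and it accepts any object that responds to `stem`.

```ruby
# Hypothetical helper (not part of uea-stemmer): stems each
# whitespace-separated token of `text` with the given stemmer.
# `stemmer` is any object that responds to #stem(token), for
# example a UEAStemmer instance.
def stem_tokens(text, stemmer)
  text.split.map { |token| stemmer.stem(token) }
end
```

Going by the examples in this README, `stem_tokens("helpers dying", UEAStemmer.new)` should yield `["helper", "die"]`. Note that with contraction expansion enabled, a single input token can come back as several words (don't becomes do not), so the output may need re-splitting before indexing.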

Notable behavior of this implementation:

  • possessive apostrophes are removed

  • contractions are expanded by default (for example, don't becomes do not)

  • tokens beginning with uppercase letters are preserved, and pluralized acronyms ending in a lowercase s are singularized

  • pure numbers, and tokens containing hyphens/underscores, are passed through unchanged
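The pass-through rule in the last bullet can be approximated with a small predicate, which may be handy for predicting which tokens the stemmer will leave alone. This is an illustrative sketch only, not the gem's actual implementation. `passes_through?` is a hypothetical name, and it assumes that "pure number" means a token made up solely of ASCII digits.

```ruby
# Illustrative only: approximates the pass-through rule described above.
# This is NOT the gem's actual implementation.
# Assumption: a "pure number" consists solely of ASCII digits.
def passes_through?(token)
  token.match?(/\A\d+\z/) || token.match?(/[-_]/)
end

passes_through?("2005")        # => true  (digits only)
passes_through?("well-known")  # => true  (contains a hyphen)
passes_through?("snake_case")  # => true  (contains an underscore)
passes_through?("helpers")     # => false (ordinary word)
```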

This gem is a Ruby port of the Java port of the original Perl script by Marie-Claire Jenkins and Dr. Dan J. Smith at the University of East Anglia.

Installation

Requires Ruby 3.1 or newer.

Install the gem:

gem install uea-stemmer

Install from source:

git clone https://github.com/ealdent/uea-stemmer.git
cd uea-stemmer
gem build uea-stemmer.gemspec
gem install ./uea-stemmer-*.gem

Example Usage

Basic usage:

require "uea-stemmer"
stemmer = UEAStemmer.new

stemmer.stem("helpers")   # => "helper"
stemmer.stem("dying")     # => "die"
stemmer.stem("scarred")   # => "scar"

You can extract the matching rule with stem_with_rule:

result = stemmer.stem_with_rule("invited")
result.word      # => "invite"
result.rule_num  # => "22.3"
result.rule      # => #<UEAStemmer::Rule ...>
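One use for the rule number is checking which rules fire across a vocabulary. The sketch below tallies `rule_num` values over a word list. `rule_counts` is a hypothetical helper, not part of the gem. It assumes only what the example above shows: that `stem_with_rule` returns an object responding to `rule_num`.

```ruby
# Hypothetical helper (not part of uea-stemmer): counts how often each
# rule number is reported for a list of words. `stemmer` is any object
# whose #stem_with_rule(word) returns a result responding to #rule_num.
def rule_counts(words, stemmer)
  words.map { |word| stemmer.stem_with_rule(word).rule_num }.tally
end
```

Calling `rule_counts(%w[invited helpers dying], UEAStemmer.new)` would return a hash keyed by rule number, which makes it easy to spot rules that dominate, or never apply to, a given corpus.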

Disable contraction expansion:

UEAStemmer.new(nil, nil, skip_contractions: true).stem("don't")
# => "don't"

Use the singleton instance:

DefaultUEAStemmer.instance.stem("running")  # => "run"

Development

This project does not require Bundler or Rake for normal development. Run the tests directly:

ruby -Itest test/uea_stemmer_test.rb

Build the gem package:

gem build uea-stemmer.gemspec

GitHub Actions runs the test suite and gem build on supported Ruby versions.

Contributing

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add or update tests.

  • Run ruby -Itest test/uea_stemmer_test.rb.

  • Run gem build uea-stemmer.gemspec.

  • Send a pull request.

Relevant Web Pages

Copyright © 2005 by the University of East Anglia; authored by Marie-Claire Jenkins and Dr. Dan J. Smith. The Ruby port is by Jason Adams, based on the Java port by Richard Churchill.

This project is distributed under the Apache 2.0 License. See LICENSE for details.