uea-stemmer

Ruby implementation of the UEA-Lite stemmer for conservative stemming in search and indexing workloads.

UEA-Lite uses a rule set to normalize suffixes while avoiding aggressive stemming.

Behavior Notes

The stemmer operates on a single token at a time and returns a stemmed token.

Notable behavior of this implementation:

  • possessive apostrophes are removed

  • contractions are expanded by default (for example, don't becomes do not)

  • tokens beginning with uppercase letters are preserved, and pluralized acronyms ending in a lowercase s are singularized

  • pure numbers, and tokens containing hyphens/underscores, are passed through unchanged
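The pass-through and possessive rules above can be pictured with a small sketch. This is not the gem's implementation: the helper names passthrough_token? and strip_possessive are hypothetical, and the sketch assumes that "pure number" means digits only and that a possessive is a trailing 's or a trailing apostrophe.

```ruby
# Illustrative only: NOT the gem's code. Both helpers are hypothetical
# and mirror two of the behaviors listed above under stated assumptions.

# True for tokens documented as passed through unchanged:
# pure numbers (assumed here to mean digits only), or tokens
# containing a hyphen or an underscore.
def passthrough_token?(token)
  token.match?(/\A\d+\z/) || token.match?(/[-_]/)
end

# Removes a possessive apostrophe (assumed forms: trailing 's or
# a bare trailing apostrophe). Other apostrophes are left alone.
def strip_possessive(token)
  token.sub(/'s\z/, "").sub(/'\z/, "")
end

passthrough_token?("2024")              # => true
passthrough_token?("state-of-the-art")  # => true
passthrough_token?("helpers")           # => false
strip_possessive("dog's")               # => "dog"
```

The real rule set may treat edge cases (decimals, mixed case, curly apostrophes) differently, so check the gem's tests before relying on any of these assumptions.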

This gem is a Ruby port of the Java port of the original Perl script written by Marie-Claire Jenkins and Dr. Dan J. Smith at the University of East Anglia.

Installation

Install the gem:

gem install uea-stemmer

Install from source:

git clone https://github.com/ealdent/uea-stemmer.git
cd uea-stemmer
bundle install
bundle exec rake test
bundle exec rake install

Example Usage

Basic usage:

require "uea-stemmer"
stemmer = UEAStemmer.new

stemmer.stem("helpers")   # => "helper"
stemmer.stem("dying")     # => "die"
stemmer.stem("scarred")   # => "scar"

You can extract the matching rule with stem_with_rule:

result = stemmer.stem_with_rule("invited")
result.word      # => "invite"
result.rule_num  # => 22.3
result.rule      # => #<UEAStemmer::Rule ...>
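One possible use of stem_with_rule is checking which rules fire across a vocabulary. The sketch below is an assumption-laden illustration, not part of the gem: rule_counts is a hypothetical helper, and StubStemmer with its rule numbers 0 and 1 is a made-up stand-in so the example runs without the gem installed. It relies only on the result responding to word and rule_num, as shown above.

```ruby
# Hypothetical helper, not part of the gem: counts how often each rule
# number is reported for a list of tokens. Works with any object that
# responds to #stem_with_rule and returns something with #rule_num.
def rule_counts(tokens, stemmer)
  tokens.each_with_object(Hash.new(0)) do |token, counts|
    counts[stemmer.stem_with_rule(token).rule_num] += 1
  end
end

# Stand-in types so the sketch is self-contained. The rule numbers
# here (0 and 1) are invented and do not correspond to UEA-Lite rules.
StubResult = Struct.new(:word, :rule_num)

class StubStemmer
  def stem_with_rule(token)
    if token.end_with?("s")
      StubResult.new(token.chomp("s"), 1)
    else
      StubResult.new(token, 0)
    end
  end
end

rule_counts(%w[helpers documents index], StubStemmer.new)
# => {1=>2, 0=>1}
```

With the gem loaded, passing UEAStemmer.new in place of StubStemmer.new should give the same kind of tally keyed by the real rule numbers.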

Disable contraction expansion:

UEAStemmer.new(nil, nil, skip_contractions: true).stem("don't")
# => "don't"

Use the singleton instance:

DefaultUEAStemmer.instance.stem("running")  # => "run"
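Because the stemmer works on one token at a time, stemming a phrase means splitting it into tokens first. The following is a minimal sketch of that step. stem_tokens is a hypothetical helper, whitespace splitting is an assumed tokenization, and TrailingSStemmer is a toy stand-in that lets the example run without the gem.

```ruby
# Hypothetical helper, not part of the gem: stems each
# whitespace-separated token with any object that responds to #stem.
def stem_tokens(text, stemmer)
  text.split.map { |token| stemmer.stem(token) }
end

# Toy stand-in so the sketch is self-contained. It only strips a
# trailing "s"; pass UEAStemmer.new instead for real stemming.
class TrailingSStemmer
  def stem(token)
    token.sub(/s\z/, "")
  end
end

stem_tokens("helpers index documents", TrailingSStemmer.new)
# => ["helper", "index", "document"]
```

Real text usually needs punctuation handling before this step. That is left out here because the README does not say how the gem treats punctuation.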

Contributing

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add or update tests.

  • Run bundle exec rake test.

  • Send me a pull request. Bonus points for topic branches.

Relevant Web Pages

Copyright © 2005 by the University of East Anglia. Original work authored by Marie-Claire Jenkins and Dr. Dan J. Smith. This Ruby port was written by Jason Adams, based on the Java port by Richard Churchill.

This project is distributed under the Apache 2.0 License. See LICENSE for details.