No commit activity in last 3 years
No release in over 3 years
Harmonizes records based on fuzzy string/phrase matching. Built on Redis for speed.

HarmonizerRedis

HarmonizerRedis is a Ruby gem that aids the process of relabeling/grouping free text phrases to resolve the many ways people spell or describe something. It uses fuzzy string matching along with inverse term frequencies to score and rank similarities between phrases. The gem uses Redis for performance.
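To give a feel for how fuzzy matching and inverse term frequency can combine into one score, here is a minimal, self-contained sketch. It is not the gem's internal code: the helper names (`tokenize`, `word_similarity`, `idf_weights`, `phrase_similarity`), the bigram Dice coefficient, and the averaging of both directions are all assumptions chosen for illustration.

```ruby
require 'set'

# Illustrative only: hypothetical helpers, not HarmonizerRedis internals.

# Split a phrase into lowercase alphanumeric word tokens.
def tokenize(phrase)
  phrase.downcase.scan(/[a-z0-9]+/)
end

# Character bigrams of a word (a one-letter word is its own single "bigram").
def bigrams(word)
  return Set[word] if word.length < 2
  Set.new((0..word.length - 2).map { |i| word[i, 2] })
end

# Dice coefficient on bigram sets: 1.0 for identical words, 0.0 for no overlap.
def word_similarity(a, b)
  x = bigrams(a)
  y = bigrams(b)
  2.0 * (x & y).size / (x.size + y.size)
end

# Inverse frequency weights: words that occur in fewer phrases weigh more.
def idf_weights(phrases)
  doc_freq = Hash.new(0)
  phrases.each { |p| tokenize(p).uniq.each { |w| doc_freq[w] += 1 } }
  n = phrases.size.to_f
  doc_freq.transform_values { |df| Math.log(1 + n / df) }
end

# Weighted average of each word's best fuzzy match in the other phrase.
def directed_similarity(words_a, words_b, weights)
  default = weights.values.max || 1.0 # unseen words count as rare
  total = 0.0
  matched = 0.0
  words_a.each do |w|
    weight = weights.fetch(w, default)
    best = words_b.map { |v| word_similarity(w, v) }.max
    total += weight
    matched += weight * best
  end
  matched / total
end

# Symmetric phrase score in 0.0..1.0: the mean of both directions.
def phrase_similarity(a, b, weights)
  words_a = tokenize(a)
  words_b = tokenize(b)
  return 0.0 if words_a.empty? || words_b.empty?
  (directed_similarity(words_a, words_b, weights) +
   directed_similarity(words_b, words_a, weights)) / 2
end

phrases = ['Acme Corp', 'Acme Corporation', 'Acme Inc', 'Globex Corp']
weights = idf_weights(phrases)
close = phrase_similarity('Acme Corp', 'Acme Corporation', weights)
far   = phrase_similarity('Acme Corp', 'Globex Corp', weights)
```

In this sketch a word shared by many phrases, such as "acme" here, contributes less to the score than a rarer word, so two phrases that agree only on a frequent word rank lower than two that agree on a distinctive one.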

Usage

Configuration

Redis must be configured first. Refer to the [redis-rb documentation](https://github.com/redis/redis-rb) for more information. Set Redis.current to your Redis connection:

Redis.current = Redis.new

Adding an entry

HarmonizerRedis::Linkage represents the connection between your data structures and the gem. A linkage contains its string content, an id (a uniquely generated UUID), and a category_id that identifies the collection the entry belongs to.

my_category_id = 100
linkage = HarmonizerRedis::Linkage.new(content: 'harmonizer redis',
                                       category_id: my_category_id)
linkage.save
my_linkage_id = linkage.id # "520c488b-e9f8-4a6f-aaea-0d5e37b97644"

Retrieving an entry

my_linkage = HarmonizerRedis::Linkage.find(my_linkage_id)

Calculating and Retrieving Similarities

Calculate similarities for all the linkages in a category in a batch. New calculations will need to be performed if new linkages are added.

HarmonizerRedis.calculate_similarities(my_category_id)

To get an Array of similar phrases, call get_similarities. By default it returns the top 20 phrases. If new linkages have been added, or if similarities have not yet been computed for this linkage, they are computed automatically by this call.

my_linkage.get_similarities

Merging into groups, labeling groups, and getting recommended labels

Each entry in the array returned by get_similarities is itself an array in the following format: [text_label, group_label, similarity_score, phrase_id]
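As a rough illustration of working with entries in that format, the sketch below picks the strongest candidate above a cutoff. The sample data, the `best_candidate` helper, and the `min_score` threshold are hypothetical. It also assumes that a higher similarity_score means a closer match and that group_label may be nil for unlabeled groups, neither of which this README states.

```ruby
# Hypothetical sample shaped like the documented entries:
# [text_label, group_label, similarity_score, phrase_id]
similarities = [
  ['harmonizer-redis', 'HarmonizerRedis', 0.92, 'phrase-1'],
  ['harmoniser redis', nil,               0.85, 'phrase-2'],
  ['redis cache',      nil,               0.40, 'phrase-3']
]

# Return the highest-scoring entry at or above min_score, or nil if none qualify.
# Assumes a higher score means a closer match (an assumption, see above).
def best_candidate(entries, min_score: 0.8)
  entries
    .select { |_text, _group, score, _id| score >= min_score }
    .max_by { |_text, _group, score, _id| score }
end

text_label, group_label, score, phrase_id = best_candidate(similarities)
```

The destructured phrase_id is the value that would then be passed on when merging.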

After deciding which phrase the linkage should be combined with, use the accompanying phrase_id to merge the phrases into a group:

my_linkage.merge_with_phrase(phrase_id)

To label everything in the same group:

my_linkage.set_corrected_label('HarmonizerRedis')
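One way to picture the merge and label semantics described above is a small in-memory model. This is a hypothetical sketch, not the gem's Redis-backed implementation: the `PhraseGroups` class and its methods are invented for illustration, and the rule that a merged group keeps an existing label is an assumption.

```ruby
# Hypothetical in-memory model of phrase groups; not HarmonizerRedis code.
class PhraseGroups
  def initialize
    @group_of = {} # phrase_id => group_id
    @members  = {} # group_id  => phrase_ids in that group
    @labels   = {} # group_id  => corrected label
  end

  # Register a phrase in its own single-member group.
  def add(phrase_id)
    return if @group_of.key?(phrase_id)
    @group_of[phrase_id] = phrase_id
    @members[phrase_id] = [phrase_id]
  end

  # Merge the groups containing phrases a and b; returns the surviving group id.
  def merge(a, b)
    add(a)
    add(b)
    keep = @group_of[a]
    drop = @group_of[b]
    return keep if keep == drop
    @members.delete(drop).each do |id|
      @group_of[id] = keep
      @members[keep] << id
    end
    # Assumption: keep the surviving group's label, else inherit the other one.
    dropped_label = @labels.delete(drop)
    @labels[keep] ||= dropped_label
    keep
  end

  # The label belongs to the group, so every member shares it.
  def set_label(phrase_id, label)
    add(phrase_id)
    @labels[@group_of[phrase_id]] = label
  end

  def label_for(phrase_id)
    @labels[@group_of[phrase_id]]
  end
end

groups = PhraseGroups.new
groups.merge('p1', 'p2')
groups.set_label('p1', 'HarmonizerRedis')
groups.label_for('p2')
```

Because the label is stored on the group rather than on each phrase, labeling through any one member relabels all of them, which matches the behavior the text describes.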

To suggest labels for this group (suggestions improve the more HarmonizerRedis is used):

my_linkage.recommend_labels

Lastly, to get the final corrected label of a linkage:

my_linkage.corrected

Contributing

Feel free to fork this repo and change it as you wish. We prefer pull requests on GitHub, but you can also email us. All contributions need to be tested as well.

License

The gem is available as open source under the terms of the MIT License.