Classifier Reborn is a general classifier module that allows Bayesian and other types of classification. It is a fork of cardmagic/classifier under more active development. Currently, it implements a Bayesian classifier and a Latent Semantic Indexer (LSI).
Here is a quick illustration of the Bayesian classifier.
    $ gem install classifier-reborn
    $ irb
    irb(main):001:0> require 'classifier-reborn'
    irb(main):002:0> classifier = ClassifierReborn::Bayes.new 'Ham', 'Spam'
    irb(main):003:0> classifier.train "Ham", "Sunday is a holiday. Say no to work on Sunday!"
    irb(main):004:0> classifier.train "Spam", "You are the lucky winner! Claim your holiday prize."
    irb(main):005:0> classifier.classify "What's the plan for Sunday?"
    #=> "Ham"
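For intuition about what `train` and `classify` do with word counts, here is a minimal multinomial naive Bayes sketch in plain Ruby, trained on the same two sentences. It is a hypothetical illustration: the `TinyBayes` class, its tokenizer, and the add-one smoothing are choices made for this sketch, not ClassifierReborn's implementation (which, for example, also stems words).

```ruby
require "set"

# A minimal multinomial naive Bayes classifier with add-one (Laplace)
# smoothing. Hypothetical teaching sketch, not ClassifierReborn's code.
class TinyBayes
  def initialize(*categories)
    @word_counts = categories.to_h { |c| [c, Hash.new(0)] } # per-category word counts
    @total_words = categories.to_h { |c| [c, 0] }           # per-category token totals
    @doc_counts  = categories.to_h { |c| [c, 0] }           # training documents per category
    @vocabulary  = Set.new                                  # every distinct training word
  end

  # Lowercase the text and keep runs of letters, digits and apostrophes.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  def train(category, text)
    @doc_counts[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @total_words[category] += 1
      @vocabulary << word
    end
  end

  # Score each category by log P(category) + sum of log P(word | category),
  # working in log space to avoid floating-point underflow on long texts.
  def scores(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @word_counts.keys.map do |category|
      denominator = @total_words[category] + @vocabulary.size
      score = Math.log(@doc_counts[category] / total_docs)
      words.each do |word|
        score += Math.log((@word_counts[category][word] + 1.0) / denominator)
      end
      [category, score]
    end.to_h
  end

  # Pick the category with the highest log score.
  def classify(text)
    scores(text).max_by { |_category, score| score }.first
  end
end

bayes = TinyBayes.new("Ham", "Spam")
bayes.train("Ham", "Sunday is a holiday. Say no to work on Sunday!")
bayes.train("Spam", "You are the lucky winner! Claim your holiday prize.")
puts bayes.classify("What's the plan for Sunday?") # prints Ham
```

Here "sunday" appears twice in the Ham training text and never in the Spam text, which outweighs the single "the" shared with the Spam text, so Ham wins.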
Now, let's build an LSI, classify some text, and find a cluster of related documents.
    irb(main):006:0> lsi = ClassifierReborn::LSI.new
    irb(main):007:0> lsi.add_item "This text deals with dogs. Dogs.", :dog
    irb(main):008:0> lsi.add_item "This text involves dogs too. Dogs!", :dog
    irb(main):009:0> lsi.add_item "This text revolves around cats. Cats.", :cat
    irb(main):010:0> lsi.add_item "This text also involves cats. Cats!", :cat
    irb(main):011:0> lsi.add_item "This text involves birds. Birds.", :bird
    irb(main):012:0> lsi.classify "This text is about dogs!"
    #=> :dog
    irb(main):013:0> lsi.find_related("This text is around cats!", 2)
    #=> ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]
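LSI, in general, compares documents after projecting their term vectors into a lower-rank space computed with a singular value decomposition (SVD). The sketch below shows only the final comparison step: it ranks documents by cosine similarity of raw term-frequency vectors with no SVD at all, so it is plain vector-space retrieval, not LSI. The helper names `term_frequencies`, `cosine_similarity`, and `most_similar` are hypothetical and are not part of the gem's API.

```ruby
# Count how often each lowercase word occurs in the text.
def term_frequencies(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/) { |word| counts[word] += 1 }
  counts
end

# Cosine of the angle between two sparse term-frequency vectors.
def cosine_similarity(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm_a = Math.sqrt(a.values.sum { |c| c * c })
  norm_b = Math.sqrt(b.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Return the max_nearest documents most similar to the query, best first.
# Hypothetical helper; real LSI would compare vectors in the reduced SVD space.
def most_similar(documents, query, max_nearest)
  query_vector = term_frequencies(query)
  documents.max_by(max_nearest) do |doc|
    cosine_similarity(term_frequencies(doc), query_vector)
  end
end

documents = [
  "This text deals with dogs. Dogs.",
  "This text involves dogs too. Dogs!",
  "This text revolves around cats. Cats.",
  "This text also involves cats. Cats!",
  "This text involves birds. Birds."
]
p most_similar(documents, "This text is around cats!", 2)
# => ["This text revolves around cats. Cats.", "This text also involves cats. Cats!"]
```

On this tiny corpus the raw cosine ranking happens to return the same two cat documents as the LSI example. On larger corpora, the reduced space that LSI builds can also relate documents that share few or no exact words with the query, which raw term matching cannot do.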
There is much more that can be done with Bayes and LSI beyond these quick examples. For more information, read the following documentation topics.
- Installation and Dependencies
- Bayesian Classifier
- Latent Semantic Indexer (LSI)
- Classifier Validation
- Development and Contributions (Optional Docker instructions included)
Notes on JRuby support
To use the JRuby build, add the following line to your Gemfile:

    gem 'classifier-reborn-jruby', platforms: :java
Although experimental, this gem should work on JRuby without any additional changes. Unfortunately, you will not be able to use the C bindings to GNU GSL or similar performance-enhancing native code. Additionally, we do not use fast_stemmer but rather an implementation of the Porter stemming algorithm. Stemming will therefore differ between MRI and JRuby; however, you may choose to disable stemming and do your own manual preprocessing (or use some other popular Java library).
If you encounter a problem, please submit your issue with [JRuby] in the title.
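If you disable stemming and preprocess text yourself, a minimal plain-Ruby pipeline could look like the hypothetical sketch below. The stopword list is a small illustrative sample, and only step 1a of the Porter algorithm (plural suffixes) is implemented, so it produces different stems from a full Porter stemmer or from fast_stemmer. You would then pass the preprocessed string to `train` or `add_item`.

```ruby
# Small illustrative stopword list (hypothetical, not the gem's list).
STOPWORDS = %w[a an and are is of on the this to with].freeze

# Step 1a of the Porter stemming algorithm only: strip plural suffixes.
def porter_step_1a(word)
  case word
  when /sses\z/ then word.delete_suffix("es") # caresses -> caress
  when /ies\z/  then word.delete_suffix("es") # ponies   -> poni
  when /ss\z/   then word                     # caress   -> caress
  when /s\z/    then word.delete_suffix("s")  # cats     -> cat
  else word
  end
end

# Lowercase, tokenize, drop stopwords, then apply the partial stemmer.
def preprocess(text)
  text.downcase
      .scan(/[a-z]+/)
      .reject { |word| STOPWORDS.include?(word) }
      .map { |word| porter_step_1a(word) }
      .join(" ")
end

puts preprocess("This text revolves around cats. Cats!")
# prints "text revolve around cat cat"
```

Removing stopwords before stemming matters here: step 1a on its own would turn "is" into "i".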
Code of Conduct
In order to have a more open and welcoming community, Classifier Reborn adheres to a code of conduct adapted from the Ruby on Rails code of conduct.
Please adhere to this code of conduct in any interactions you have in the Classifier Reborn community.
If you encounter someone violating these terms, please let Chase Gilliam know and we will address it as soon as possible.
Authors and Contributors
- Lucas Carlson
- David Fayram II
- Cameron McBride
- Ivan Acosta-Rubio
- Parker Moore
- Chase Gilliam
- and many more...
The Classifier Reborn library is released under the terms of the GNU LGPL-2.1.