
harlequin

No commit activity in last 3 years
No release in over 3 years
harlequin is a Ruby wrapper around R's linear and quadratic discriminant analysis for statistical classification. It also supports means testing to assess the significance of discriminant variables.

About

Harlequin is a gem that provides easy access to R's linear and quadratic discriminant analysis functions. To use it, initialize a DiscriminantAnalysis object with an array of the variable names to analyze as the first argument and the name of the classification variable as the second, like so:

analysis = DiscriminantAnalysis.new([:weight, :height], :gender)

Training rows are given as hashes of variable_name => value pairs, including the classification variable. For example, we can add some rows to the analysis above with:

analysis.add_training_data(
                           { :weight => 200, :height => 72, :gender => 'male' },
                           { :weight => 205, :height => 71, :gender => 'male' },
                           { :weight => 140, :height => 63, :gender => 'female'},
                           { :weight => 130, :height => 61, :gender => 'female'}
                          )

(Note that each classification value must appear in more than one training row, and a variable's values must not be constant within a class.)
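The two constraints in the note above can be checked in plain Ruby before the rows are handed to harlequin. The sketch below is a hypothetical helper (`training_data_problems` is not part of harlequin's API); it assumes symbol-keyed hashes as in the example and returns a list of problems instead of raising.

```ruby
# Hypothetical helper, not part of harlequin: reports violations of the
# README's training-data constraints for rows given as symbol-keyed hashes.
def training_data_problems(rows, variables, class_var)
  problems = []
  # Group rows by their classification value, e.g. "male" / "female".
  rows.group_by { |row| row[class_var] }.each do |label, class_rows|
    # Each classification value must appear in more than one row.
    if class_rows.size < 2
      problems << "class #{label.inspect} has only #{class_rows.size} row"
      next
    end
    # No variable may be constant within a class (zero within-class variance).
    variables.each do |var|
      values = class_rows.map { |row| row[var] }
      problems << "#{var} is constant within class #{label.inspect}" if values.uniq.size == 1
    end
  end
  problems
end

rows = [
  { weight: 200, height: 72, gender: 'male' },
  { weight: 205, height: 71, gender: 'male' },
  { weight: 140, height: 63, gender: 'female' },
  { weight: 130, height: 61, gender: 'female' }
]
p training_data_problems(rows, [:weight, :height], :gender) # prints []
```

Adding a single row for a new class, or giving both male rows the same height, would make the helper report the corresponding problem.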

Initialize linear or quadratic analysis with #init_lda_analysis or #init_qda_analysis, respectively. Then we can predict the class of new rows, also given as hashes:

analysis.init_lda_analysis
analysis.predict(:weight => 180, :height => 68) # => {:class=>"male", :confidence=>0.9999999999666846}
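For intuition about what the :confidence value means, here is a minimal pure-Ruby sketch of linear discriminant analysis. It is an illustration under stated assumptions, not harlequin's code (harlequin delegates the fit to R): classes are assumed Gaussian with one shared covariance matrix, pooled with denominator n - k; priors equal the class proportions (the default in R's MASS::lda); each class gets the score x'S⁻¹μ - ½μ'S⁻¹μ + log(prior), and scores are turned into posterior probabilities with a softmax. The helper names (`solve`, `dot`, `lda_fit`, `lda_predict`) are hypothetical.

```ruby
# Illustrative LDA sketch, not harlequin's implementation.

# Solve a * x = b by Gaussian elimination with partial pivoting.
def solve(a, b)
  n = b.size
  m = a.each_with_index.map { |row, i| row.map(&:to_f) + [b[i].to_f] }
  n.times do |col|
    pivot = (col...n).max_by { |r| m[r][col].abs }
    m[col], m[pivot] = m[pivot], m[col]
    (col + 1...n).each do |r|
      factor = m[r][col] / m[col][col]
      (col..n).each { |c| m[r][c] -= factor * m[col][c] }
    end
  end
  x = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    x[i] = (m[i][n] - (i + 1...n).sum { |j| m[i][j] * x[j] }) / m[i][i]
  end
  x
end

def dot(u, v)
  u.zip(v).sum { |a, b| a * b }
end

# Fit: class means, priors (class proportions), pooled covariance (n - k).
def lda_fit(rows, variables, class_var)
  groups = rows.group_by { |r| r[class_var] }
  dim = variables.size
  scatter = Array.new(dim) { Array.new(dim, 0.0) }
  means = {}
  groups.each do |label, grp|
    xs = grp.map { |r| variables.map { |v| r[v].to_f } }
    mu = (0...dim).map { |j| xs.sum { |x| x[j] } / xs.size }
    means[label] = mu
    xs.each do |x|
      d = x.zip(mu).map { |a, b| a - b }
      dim.times { |i| dim.times { |j| scatter[i][j] += d[i] * d[j] } }
    end
  end
  cov = scatter.map { |row| row.map { |v| v / (rows.size - groups.size) } }
  priors = groups.transform_values { |grp| grp.size.to_f / rows.size }
  { variables: variables, means: means, priors: priors, cov: cov }
end

# Predict: linear discriminant scores converted to posterior probabilities.
def lda_predict(model, row)
  x = model[:variables].map { |v| row[v].to_f }
  scores = model[:means].map do |label, mu|
    a = solve(model[:cov], mu) # cov^-1 * mu
    [label, dot(x, a) - 0.5 * dot(mu, a) + Math.log(model[:priors][label])]
  end.to_h
  top = scores.values.max
  weights = scores.transform_values { |s| Math.exp(s - top) } # stable softmax
  best = weights.max_by { |_, w| w }.first
  { class: best, confidence: weights[best] / weights.values.sum }
end

rows = [
  { weight: 200, height: 72, gender: 'male' },
  { weight: 205, height: 71, gender: 'male' },
  { weight: 140, height: 63, gender: 'female' },
  { weight: 130, height: 61, gender: 'female' }
]
model = lda_fit(rows, [:weight, :height], :gender)
p lda_predict(model, { weight: 180, height: 68 })
```

On the four training rows from this README, the sketch classifies the new row as "male" with a posterior of about 0.99999999997, in line with the :confidence shown above. Several rows can be scored with `map`, e.g. `new_rows.map { |r| lda_predict(model, r) }`.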

Multiple predictions can be computed at once in the same way as adding multiple training rows.

To assess whether a variable is worth adding, the DiscriminantAnalysis class provides a two-sample t-test for the difference in that variable's mean between the classes. This currently works for binary classification only.

analysis.t_test(:weight) # => {:t_statistic=>12.0748, :degrees_of_freedom=>1.471, :p_value=>0.01898}
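The sample output above is consistent with Welch's unequal-variance t-test, which is the default of R's t.test: from the four training rows, the weight difference gives t ≈ 12.0748 with about 1.471 (non-integer) degrees of freedom. The sketch below reproduces those two numbers in plain Ruby. `welch_t` is a hypothetical helper, not harlequin's implementation, and it stops short of the p-value, which comes from Student's t-distribution with that many degrees of freedom (in R, `2 * pt(-abs(t), df)`).

```ruby
# Hypothetical sketch, not harlequin's code: Welch's two-sample t statistic
# and Welch-Satterthwaite degrees of freedom for one variable, comparing the
# two classes present in the rows (binary classification only).
def welch_t(rows, variable, class_var)
  groups = rows.group_by { |r| r[class_var] }.values
  raise ArgumentError, 'need exactly two classes' unless groups.size == 2

  # Per class: sample mean, sample variance (n - 1 denominator), size.
  stats = groups.map do |grp|
    xs = grp.map { |r| r[variable].to_f }
    mean = xs.sum / xs.size
    var = xs.sum { |x| (x - mean)**2 } / (xs.size - 1)
    [mean, var, xs.size]
  end
  (m1, v1, n1), (m2, v2, n2) = stats
  se1 = v1 / n1 # squared standard error of each class mean
  se2 = v2 / n2
  t = (m1 - m2) / Math.sqrt(se1 + se2)
  df = (se1 + se2)**2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
  { t_statistic: t, degrees_of_freedom: df }
end

rows = [
  { weight: 200, height: 72, gender: 'male' },
  { weight: 205, height: 71, gender: 'male' },
  { weight: 140, height: 63, gender: 'female' },
  { weight: 130, height: 61, gender: 'female' }
]
p welch_t(rows, :weight, :gender)
```

The sign of t depends on which class appears first in the rows; here the male rows come first, so t is positive.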

Requirements

A Ruby script using Harlequin requires an R instance, so make sure a working copy of R is installed on your system. R binaries for OS X are available from CRAN. See the RinRuby documentation for more details.

You will also need the additional R packages MASS and alr3. These can be installed from the R command line by first choosing a mirror with chooseCRANmirror() and then running install.packages(c("MASS", "alr3")).