Repository is archived
No release in over a year
HybridForest provides random forests built upon combinations of different decision tree algorithms to enable diverse tree ensembles. Until version 1.0.0, please expect breaking changes.
 Dependencies

Development

~> 13.0
~> 0.2.6
~> 3.0
~> 1.7
>= 0

Runtime

~> 13.0
~> 0.2.6
~> 3.0
~> 1.7
>= 0
 Project Readme

HybridForest

HybridForest lets you build hybrid random forests, i.e., ensembles whose base learners come from not one but several different decision tree algorithms. Two types of trees are currently supported:

  • CARTTree
    • Performs binary splits at each internal node.
    • Supports categorical and continuous features.
    • Supports binary classification problems.
    • Uses Gini impurity to find the most discriminatory feature.
    • Considers a random subset of features in each split.
    • Loosely based on the original CART algorithm (Breiman et al., 1984).
  • ID3Tree
    • Performs multiway (>=2) splits at each internal node.
    • Supports categorical and continuous features.
    • Supports binary classification problems.
    • Uses entropy to find the most discriminatory feature.
    • Considers each feature in at most one split.
    • Loosely based on the ID3 algorithm (Quinlan, 1986).
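For reference, the split criteria named above can be computed with the textbook formulas. This is a minimal, standalone sketch, not HybridForest's internal code: the method names `gini_impurity`, `entropy`, `weighted_impurity`, and `candidate_features` are hypothetical, and the feature-subset size of ceil(sqrt(n)) is an assumed common default rather than the gem's documented behavior.

```ruby
# Gini impurity of a node: 1 - sum over classes of p_k^2.
def gini_impurity(labels)
  return 0.0 if labels.empty?

  total = labels.size.to_f
  1.0 - labels.tally.values.sum { |count| (count / total)**2 }
end

# Shannon entropy of a node, in bits: -sum over classes of p_k * log2(p_k).
def entropy(labels)
  return 0.0 if labels.empty?

  total = labels.size.to_f
  weighted_log_sum = labels.tally.values.sum do |count|
    prob = count / total
    prob * Math.log2(prob)
  end
  -weighted_log_sum
end

# Weighted impurity of the child nodes produced by a candidate split.
# The split that lowers impurity most relative to the parent is preferred.
def weighted_impurity(groups, measure)
  total = groups.sum(&:size).to_f
  groups.sum { |group| (group.size / total) * measure.call(group) }
end

# Random feature subset for one split (CARTTree is described as doing this).
# The subset size ceil(sqrt(n)) is an assumption, not taken from the gem.
def candidate_features(features, rng)
  k = Math.sqrt(features.size).ceil
  features.sample(k, random: rng)
end
```

For the labels `%w[yes yes no no]`, both measures reach their two-class maximum: Gini impurity 0.5 and entropy 1.0 bit. A split into `%w[yes yes]` and `%w[no no]` has weighted impurity 0.0, so its information gain under entropy is the full 1.0 bit.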

The random forest itself is represented by the RandomForest class. A random forest classifier can be created with one of three base learner configurations.

  1. Hybrid mode
# Equivalent: :hybrid is the default ensemble type.
HybridForest::RandomForest.new(number_of_trees: 100, ensemble_type: :hybrid)
HybridForest::RandomForest.new(number_of_trees: 100)
  2. CART mode
HybridForest::RandomForest.new(number_of_trees: 100, ensemble_type: :cart)
  3. ID3 mode
HybridForest::RandomForest.new(number_of_trees: 100, ensemble_type: :id3)
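How the three modes might translate into a set of trees can be sketched in plain Ruby. This is a hypothetical illustration, not HybridForest's code: the README does not say how hybrid mode distributes CART and ID3 trees (simple alternation is assumed here), and training each tree on a bootstrap sample and combining trees by majority vote is the conventional random forest recipe, assumed rather than confirmed for this gem. `TREE_TYPES`, `tree_plan`, `bootstrap_sample`, and `majority_vote` are made-up names.

```ruby
# Hypothetical mapping from ensemble type to the tree algorithms it may use.
TREE_TYPES = { hybrid: %i[cart id3], cart: %i[cart], id3: %i[id3] }.freeze

# Assign a tree algorithm to each of the number_of_trees slots by cycling
# through the allowed types (an assumed strategy; raises KeyError if unknown).
def tree_plan(number_of_trees, ensemble_type = :hybrid)
  types = TREE_TYPES.fetch(ensemble_type)
  Array.new(number_of_trees) { |i| types[i % types.size] }
end

# Draw a bootstrap sample: same size as the input, sampled with replacement.
def bootstrap_sample(rows, rng)
  Array.new(rows.size) { rows[rng.rand(rows.size)] }
end

# Combine the per-tree predictions for one instance by majority vote.
# On a tie, the label that appeared first among the votes wins.
def majority_vote(votes)
  votes.tally.max_by { |_label, count| count }.first
end
```

Under these assumptions, `tree_plan(4)` yields `[:cart, :id3, :cart, :id3]`, `tree_plan(3, :cart)` yields three `:cart` entries, and `majority_vote([1, 0, 1])` returns `1`.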

The implementation is fairly naive, and several features that would be nice to have are not supported, including:

  • Pruning
  • Parallelization
  • Additional decision tree algorithms, e.g., C4.5 and CHAID
  • Additional hyperparameters

Installation

Add this line to your application's Gemfile:

gem 'hybridforest'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install hybridforest

Usage

require "hybridforest"

# Prepare data. 
# A dataset can be passed as a CSV file path, an array of hashes, or a hash of arrays.
training_set, test_set, actual_test_labels = HybridForest::Utils.train_test_split("data.csv")

# Create classifier.
hybrid_forest = HybridForest::RandomForest.new(number_of_trees: 100)

# Fit model.
hybrid_forest.fit(training_set)

# Predict.
predicted_labels = hybrid_forest.predict(test_set)

# Report metrics.
puts HybridForest::Utils.prediction_report(actual_test_labels, predicted_labels)
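The usage example notes that a dataset can be an array of hashes or a hash of arrays. The plain-Ruby sketch below shows the same two made-up rows in both shapes and converts between them; the column names are hypothetical, the helpers `to_rows` and `to_columns` are not part of HybridForest, and which column the gem treats as the label is not stated here.

```ruby
# Hypothetical data in column-oriented form (hash of arrays).
hash_of_arrays = {
  "outlook" => %w[sunny rain],
  "humidity" => [85, 70],
  "play" => %w[no yes]
}

# Hash of arrays -> array of hashes (one hash per row).
# Assumes every column array has the same length.
def to_rows(columns)
  keys = columns.keys
  columns.values.transpose.map { |values| keys.zip(values).to_h }
end

# Array of hashes -> hash of arrays. Assumes a non-empty list of rows
# that all share the keys of the first row.
def to_columns(rows)
  rows.first.keys.to_h { |key| [key, rows.map { |row| row[key] }] }
end

array_of_hashes = to_rows(hash_of_arrays)
```

Here `array_of_hashes` holds `{ "outlook" => "sunny", "humidity" => 85, "play" => "no" }` and `{ "outlook" => "rain", "humidity" => 70, "play" => "yes" }`, and `to_columns(array_of_hashes)` restores the original hash.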

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/hi-tech-jazz/hybridforest.

License

The gem is available as open source under the terms of the MIT License.