
ClassicBandit

A Ruby library for classic (non-contextual) multi-armed bandit algorithms including Thompson Sampling, UCB1, and Epsilon-Greedy.

Requirements

  • Ruby >= 3.0.0

Installation

Add this line to your application's Gemfile:

gem 'classic_bandit'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install classic_bandit

Usage

A/B Testing Example

require 'classic_bandit'

# Initialize banners for A/B testing
arms = [
  ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
  ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]

# Choose algorithm: Epsilon-Greedy with 10% exploration
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)

# In your application
selected_arm = bandit.select_arm
# Display the selected banner to user
show_banner(selected_arm.id)

# Update with user's response
# 1 for click, 0 for no click
bandit.update(selected_arm, 1)

Available Algorithms

Epsilon-Greedy

Balances exploration and exploitation with a fixed exploration rate.

bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
  • Simple to implement and reason about
  • Exploration is controlled explicitly via the ε parameter
  • Explores a random arm with probability ε, exploits the current best arm with probability 1-ε (see the sketch below)
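
In rough terms, the selection rule looks like the sketch below. This is illustrative only, not the gem's internal code; estimated_value stands in for an assumed per-arm statistic derived from past rewards.

# Illustrative sketch of epsilon-greedy selection (not the gem's internals).
def epsilon_greedy_select(arms, epsilon)
  if rand < epsilon
    arms.sample                     # explore: pick a random arm
  else
    arms.max_by(&:estimated_value)  # exploit: pick the best arm so far
  end
end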

UCB1

Upper Confidence Bound algorithm that automatically balances exploration and exploitation.

bandit = ClassicBandit::Ucb1.new(arms: arms)
  • No explicit exploration parameter to tune
  • Selects whichever arm has the highest upper confidence bound, so uncertain arms receive a built-in exploration bonus (see the sketch below)
  • Always tries untested arms first
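
The score behind UCB1 can be sketched as follows. This is illustrative only; mean_reward and pulls are assumed per-arm statistics, and total_pulls is the total number of selections made so far.

# Illustrative sketch of the UCB1 score (not the gem's internal code).
def ucb1_score(mean_reward, pulls, total_pulls)
  return Float::INFINITY if pulls.zero?  # untested arms are tried first
  mean_reward + Math.sqrt(2 * Math.log(total_pulls) / pulls)
end

At each step, the arm with the highest score is selected.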

Softmax

Temperature-based algorithm that selects arms according to their relative rewards.

bandit = ClassicBandit::Softmax.new(
  arms: arms,
  initial_temperature: 1.0,
  k: 0.5
)
  • Uses Boltzmann distribution for arm selection
  • Higher temperature leads to more exploration
  • Temperature decreases over time for better exploitation
  • Smooth probability distribution over arms
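
The selection probabilities follow a Boltzmann distribution, sketched below. This is illustrative only; values stands in for the per-arm reward estimates, and the exact schedule by which k cools the temperature is an implementation detail of the gem.

# Illustrative sketch of Boltzmann (softmax) selection probabilities.
def softmax_probabilities(values, temperature)
  exps  = values.map { |v| Math.exp(v / temperature) }
  total = exps.sum
  exps.map { |e| e / total }  # higher temperature => flatter distribution
end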

Thompson Sampling

Bayesian approach that maintains a probability distribution over each arm's expected reward.

bandit = ClassicBandit::ThompsonSampling.new(arms: arms)
  • Naturally balances exploration and exploitation
  • Uses a Beta distribution to model uncertainty about each arm's success rate
  • Performs well in practice with no tuning required
  • Adapts quickly to reward patterns
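
For 0/1 rewards the idea can be sketched as follows. This is illustrative only; successes and failures are assumed per-arm counters rather than the gem's actual attribute names.

# Illustrative sketch of Thompson Sampling for binary rewards.
# Each arm is scored by a draw from Beta(successes + 1, failures + 1)
# and the arm with the largest draw is played.
def sample_beta(alpha, beta)
  # With integer parameters, the alpha-th smallest of (alpha + beta - 1)
  # uniform draws follows a Beta(alpha, beta) distribution.
  Array.new(alpha + beta - 1) { rand }.sort[alpha - 1]
end

def thompson_select(arms)
  arms.max_by { |arm| sample_beta(arm.successes + 1, arm.failures + 1) }
end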

Common Interface

All algorithms share the same interface:

# Select an arm
arm = bandit.select_arm

# Update the arm with reward
bandit.update(arm, 1)  # Success
bandit.update(arm, 0)  # Failure
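
Because the interface is uniform, algorithms can be swapped without touching the surrounding code. A minimal simulation sketch is shown below; simulate_click is a hypothetical placeholder for real user feedback.

require 'classic_bandit'

arms = [
  ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
  ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]
bandit = ClassicBandit::ThompsonSampling.new(arms: arms)

1_000.times do
  arm = bandit.select_arm
  reward = simulate_click(arm.id) ? 1 : 0  # hypothetical feedback source
  bandit.update(arm, reward)
end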

Development

After checking out the repo, run:

$ bundle install
$ bundle exec rspec

To release a new version:

  1. Update the version number in version.rb
  2. Create a git tag for the version
  3. Push git commits and tags

License

The gem is available as open source under the terms of the MIT License.