ClassicBandit

A Ruby library for classic (non-contextual) multi-armed bandit algorithms including Thompson Sampling, UCB1, and Epsilon-Greedy.

Requirements

  • Ruby >= 3.0.0

Installation

Add this line to your application's Gemfile:

gem 'classic_bandit'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install classic_bandit

Usage

A/B Testing Example

require 'classic_bandit'

# Initialize banners for A/B testing
arms = [
  ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
  ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]

# Choose algorithm: Epsilon-Greedy with 10% exploration
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)

# In your application
selected_arm = bandit.select_arm
# Display the selected banner to user
show_banner(selected_arm.id)

# Update with user's response
# 1 for click, 0 for no click
bandit.update(selected_arm, 1)

Available Algorithms

Epsilon-Greedy

Balances exploration and exploitation with a fixed exploration rate.

bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
  • Explores a random arm with probability ε; exploits the best-known arm with probability 1−ε
  • Simple to implement and reason about
  • Exploration rate is controlled explicitly via the ε parameter
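The rule itself is small. Here is a self-contained sketch in plain Ruby (illustrative only, not the gem's internals; `SketchArm` and `epsilon_greedy_select` are hypothetical names):

```ruby
# Illustrative epsilon-greedy selection; SketchArm is a stand-in,
# not the gem's Arm class.
SketchArm = Struct.new(:id, :pulls, :rewards) do
  def mean
    pulls.zero? ? 0.0 : rewards.to_f / pulls
  end
end

def epsilon_greedy_select(arms, epsilon, rng: Random.new)
  if rng.rand < epsilon
    arms.sample(random: rng)   # explore: pick a uniformly random arm
  else
    arms.max_by(&:mean)        # exploit: pick the best empirical mean
  end
end
```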

UCB1

Upper Confidence Bound algorithm that automatically balances exploration and exploitation.

bandit = ClassicBandit::Ucb1.new(arms: arms)
  • No exploration parameter to tune
  • Scores each arm by an upper confidence bound on its estimated reward
  • Plays every untried arm once before applying the bound
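UCB1 scores each arm by mean + √(2 ln N / n), where N is the total pull count and n the arm's own pulls. A minimal sketch of that scoring rule (illustrative, not the gem's code; arms are plain Hashes here):

```ruby
# Illustrative UCB1 step; each arm is a Hash with :pulls and :rewards.
def ucb1_select(arms)
  untried = arms.find { |a| a[:pulls].zero? }
  return untried if untried                      # play every arm once first

  total_pulls = arms.sum { |a| a[:pulls] }
  arms.max_by do |a|
    mean  = a[:rewards].to_f / a[:pulls]
    bonus = Math.sqrt(2.0 * Math.log(total_pulls) / a[:pulls])
    mean + bonus                                 # upper confidence bound
  end
end
```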

Softmax

Temperature-based algorithm that selects arms according to their relative rewards.

bandit = ClassicBandit::Softmax.new(
  arms: arms,
  initial_temperature: 1.0,
  k: 0.5
)
  • Uses Boltzmann distribution for arm selection
  • Higher temperature leads to more exploration
  • Temperature decreases over time for better exploitation
  • Smooth probability distribution over arms
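Under the Boltzmann distribution, arm i is chosen with probability exp(μᵢ/τ) / Σⱼ exp(μⱼ/τ), where τ is the temperature. A sketch of those probabilities (illustrative; the gem's `initial_temperature`/`k` cooling schedule is not reproduced here):

```ruby
# Illustrative Boltzmann (softmax) probabilities over estimated means.
def softmax_probs(means, temperature)
  max  = means.max                               # shift for numerical stability
  exps = means.map { |m| Math.exp((m - max) / temperature) }
  total = exps.sum
  exps.map { |e| e / total }
end

# Draw an arm index according to those probabilities.
def softmax_select(means, temperature, rng: Random.new)
  r = rng.rand
  cumulative = 0.0
  softmax_probs(means, temperature).each_with_index do |p, i|
    cumulative += p
    return i if r <= cumulative
  end
  means.size - 1
end
```

Note how raising the temperature flattens the distribution, i.e. increases exploration.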

Thompson Sampling

Bayesian approach that maintains a probability distribution over each arm's rewards.

bandit = ClassicBandit::ThompsonSampling.new(arms: arms)
  • Naturally balances exploration and exploitation
  • Uses Beta distribution to model uncertainty
  • Performs well in practice with no tuning required
  • Adapts quickly to reward patterns
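With 0/1 rewards, each arm's click rate can be modeled as Beta(successes + 1, failures + 1); Thompson Sampling draws one sample from each posterior and plays the argmax. A sketch (illustrative, not the gem's implementation; since Ruby's stdlib has no Beta sampler, `beta_sample` uses the order-statistic identity, valid for positive integer parameters):

```ruby
# Beta(a, b) sample for positive integers a, b: the a-th smallest of
# (a + b - 1) uniform draws is Beta(a, b)-distributed.
def beta_sample(a, b, rng = Random.new)
  Array.new(a + b - 1) { rng.rand }.sort[a - 1]
end

# Illustrative Thompson Sampling step; arms are Hashes with counts.
def thompson_select(arms, rng = Random.new)
  arms.max_by { |arm| beta_sample(arm[:successes] + 1, arm[:failures] + 1, rng) }
end
```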

Common Interface

All algorithms share the same interface:

# Select an arm
arm = bandit.select_arm

# Update the arm with reward
bandit.update(arm, 1)  # Success
bandit.update(arm, 0)  # Failure
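Because every algorithm exposes the same two methods, a simulation harness can be written once and handed any of them. A self-contained sketch (`run_simulation` and the Bernoulli click-through rates are illustrative, not part of the gem):

```ruby
# Drives any bandit exposing select_arm/update against simulated
# Bernoulli rewards. true_ctrs maps each arm to its hypothetical
# click-through rate.
def run_simulation(bandit, true_ctrs, rounds:, rng: Random.new)
  clicks = 0
  rounds.times do
    arm = bandit.select_arm
    reward = rng.rand < true_ctrs.fetch(arm) ? 1 : 0
    bandit.update(arm, reward)
    clicks += reward
  end
  clicks
end
```

A harness like this makes it easy to compare cumulative clicks across algorithms on the same simulated traffic before deploying one.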

Development

After checking out the repo, run:

$ bundle install
$ bundle exec rspec

To release a new version:

  1. Update the version number in version.rb
  2. Create a git tag for the version
  3. Push git commits and tags

License

The gem is available as open source under the terms of the MIT License.