ClassicBandit
A Ruby library for classic (non-contextual) multi-armed bandit algorithms: Epsilon-Greedy, UCB1, Softmax, and Thompson Sampling.
Requirements
- Ruby >= 3.0.0
Installation
Add this line to your application's Gemfile:
gem 'classic_bandit'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install classic_bandit
Usage
A/B Testing Example
require 'classic_bandit'
# Initialize banners for A/B testing
arms = [
ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]
# Choose algorithm: Epsilon-Greedy with 10% exploration
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
# In your application
selected_arm = bandit.select_arm
# Display the selected banner to user
show_banner(selected_arm.id)
# Update with user's response
# 1 for click, 0 for no click
bandit.update(selected_arm, 1)
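In production, select_arm and update run for every impression, so the bandit's estimates keep improving as clicks accumulate.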
Available Algorithms
Epsilon-Greedy
Balances exploration and exploitation with a fixed exploration rate.
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
- Simple to implement and reason about
- Exploration is controlled explicitly via the ε parameter
- Explores randomly with probability ε, exploits the best-performing arm with probability 1-ε (see the sketch below)
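The selection rule itself is short. The following standalone sketch (plain hashes stand in for per-arm statistics; it is not the gem's internal code) shows the idea:

# Illustrative only, not the gem's implementation.
def epsilon_greedy_select(stats, epsilon)
  return stats.sample if rand < epsilon                  # explore: random arm
  stats.max_by { |s| s[:pulls].zero? ? 0.0 : s[:rewards].to_f / s[:pulls] }  # exploit
end

stats = [
  { id: 'banner_a', pulls: 40, rewards: 6 },
  { id: 'banner_b', pulls: 40, rewards: 9 }
]
puts epsilon_greedy_select(stats, 0.1)[:id]  # usually 'banner_b', occasionally random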
UCB1
Upper Confidence Bound algorithm that automatically balances exploration and exploitation.
bandit = ClassicBandit::Ucb1.new(arms: arms)
- No explicit exploration parameter needed
- Automatically balances exploration and exploitation
- Uses confidence bounds to select arms
- Tries every untested arm once before relying on confidence bounds (see the sketch below)
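As a rough illustration, the UCB1 score is the observed mean reward plus a confidence bonus that shrinks as an arm is pulled more often. The standalone sketch below uses plain hashes and is not the gem's internal code:

# Illustrative only, not the gem's implementation.
def ucb1_select(stats)
  untested = stats.find { |s| s[:pulls].zero? }
  return untested if untested                            # try every arm once first

  total_pulls = stats.sum { |s| s[:pulls] }
  stats.max_by do |s|
    mean = s[:rewards].to_f / s[:pulls]
    mean + Math.sqrt(2.0 * Math.log(total_pulls) / s[:pulls])  # mean + confidence bonus
  end
end

stats = [
  { id: 'banner_a', pulls: 30, rewards: 5 },
  { id: 'banner_b', pulls: 10, rewards: 2 }
]
puts ucb1_select(stats)[:id]  # the rarely pulled arm gets the larger bonus here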
Softmax
Temperature-based algorithm that selects arms according to their relative rewards.
bandit = ClassicBandit::Softmax.new(
arms: arms,
initial_temperature: 1.0,
k: 0.5
)
- Uses Boltzmann distribution for arm selection
- Higher temperature leads to more exploration
- Temperature decreases over time for better exploitation
- Smooth probability distribution over arms (see the sketch below)
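The sketch below illustrates Boltzmann selection and how a falling temperature shifts probability toward the better arm. The cooling schedule shown (temperature = initial_temperature / (1 + k * t)) is one common choice and is assumed here for illustration; the gem's exact use of k may differ:

# Illustrative only, not the gem's implementation.
def softmax_probabilities(means, temperature)
  weights = means.map { |m| Math.exp(m / temperature) }
  total = weights.sum
  weights.map { |w| w / total }
end

means = [0.15, 0.25]                 # observed click-through rates per arm
initial_temperature = 1.0
k = 0.5

[0, 10, 100].each do |t|             # t = number of selections made so far
  temperature = initial_temperature / (1.0 + k * t)   # assumed cooling schedule
  p softmax_probabilities(means, temperature)
end
# As the temperature drops, probability mass concentrates on the better arm.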
Thompson Sampling
Bayesian approach that maintains a probability distribution over each arm's rewards.
bandit = ClassicBandit::ThompsonSampling.new(arms: arms)
- Naturally balances exploration and exploitation
- Uses Beta distribution to model uncertainty
- Performs well in practice with no tuning required
- Adapts quickly to observed reward patterns (see the sketch below)
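The sketch below shows the idea with one Beta posterior per arm: draw a sample from each posterior and play the arm with the highest draw. The order-statistics trick for sampling a Beta with integer parameters keeps the example dependency-free; it is not necessarily how the gem draws its samples:

# Illustrative only, not the gem's implementation.
# For integer a and b, a Beta(a, b) draw equals the a-th smallest of
# (a + b - 1) uniform random numbers.
def sample_beta(a, b)
  Array.new(a + b - 1) { rand }.sort[a - 1]
end

def thompson_select(stats)
  stats.max_by { |s| sample_beta(s[:successes] + 1, s[:failures] + 1) }
end

stats = [
  { id: 'banner_a', successes: 5, failures: 35 },
  { id: 'banner_b', successes: 9, failures: 31 }
]
puts thompson_select(stats)[:id]  # 'banner_b' wins most draws, but not every time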
Common Interface
All algorithms share the same interface:
# Select an arm
arm = bandit.select_arm
# Update the arm with reward
bandit.update(arm, 1) # Success
bandit.update(arm, 0) # Failure
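Because the interface is shared, algorithms can be swapped without changing the surrounding loop. A small simulation sketch (the click-through rates below are made up for illustration):

require 'classic_bandit'

arms = [
  ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
  ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]
true_rates = { 'banner_a' => 0.05, 'banner_b' => 0.08 }  # hypothetical rates

bandit = ClassicBandit::Ucb1.new(arms: arms)  # swap in any of the algorithms above

1000.times do
  arm = bandit.select_arm
  reward = rand < true_rates[arm.id] ? 1 : 0  # simulate a click
  bandit.update(arm, reward)
end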
Development
After checking out the repo, run:
$ bundle install
$ bundle exec rspec
To release a new version:
- Update the version number in version.rb
- Create a git tag for the version
- Push git commits and tags
License
The gem is available as open source under the terms of the MIT License.