ClassicBandit
A Ruby library for classic (non-contextual) multi-armed bandit algorithms: Epsilon-Greedy, UCB1, Softmax, and Thompson Sampling.
Requirements
- Ruby >= 3.0.0
Installation
Add this line to your application's Gemfile:
gem 'classic_bandit'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install classic_bandit
Usage
A/B Testing Example
require 'classic_bandit'
# Initialize banners for A/B testing
arms = [
ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]
# Choose algorithm: Epsilon-Greedy with 10% exploration
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
# In your application
selected_arm = bandit.select_arm
# Display the selected banner to user
show_banner(selected_arm.id)
# Update with user's response
# 1 for click, 0 for no click
bandit.update(selected_arm, 1)
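In production, select_arm and update run for every impression, so the bandit's estimates keep improving as clicks accumulate.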
Available Algorithms
Epsilon-Greedy
Balances exploration and exploitation with a fixed exploration rate.
bandit = ClassicBandit::EpsilonGreedy.new(arms: arms, epsilon: 0.1)
- Simple to implement and reason about
- Exploration is controlled explicitly via the ε parameter
- Explores randomly with probability ε, exploits the best-performing arm with probability 1-ε (see the sketch below)
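The selection rule itself is short. The following standalone sketch (plain hashes stand in for per-arm statistics; it is not the gem's internal code) shows the idea:

# Illustrative only, not the gem's implementation.
def epsilon_greedy_select(stats, epsilon)
  return stats.sample if rand < epsilon                  # explore: random arm
  stats.max_by { |s| s[:pulls].zero? ? 0.0 : s[:rewards].to_f / s[:pulls] }  # exploit
end

stats = [
  { id: 'banner_a', pulls: 40, rewards: 6 },
  { id: 'banner_b', pulls: 40, rewards: 9 }
]
puts epsilon_greedy_select(stats, 0.1)[:id]  # usually 'banner_b', occasionally random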
UCB1
Upper Confidence Bound algorithm that automatically balances exploration and exploitation.
bandit = ClassicBandit::Ucb1.new(arms: arms)
- No explicit exploration parameter needed
- Automatically balances exploration and exploitation
- Uses confidence bounds to select arms
- Tries every untested arm once before relying on confidence bounds (see the sketch below)
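As a rough illustration, the UCB1 score is the observed mean reward plus a confidence bonus that shrinks as an arm is pulled more often. The standalone sketch below uses plain hashes and is not the gem's internal code:

# Illustrative only, not the gem's implementation.
def ucb1_select(stats)
  untested = stats.find { |s| s[:pulls].zero? }
  return untested if untested                            # try every arm once first

  total_pulls = stats.sum { |s| s[:pulls] }
  stats.max_by do |s|
    mean = s[:rewards].to_f / s[:pulls]
    mean + Math.sqrt(2.0 * Math.log(total_pulls) / s[:pulls])  # mean + confidence bonus
  end
end

stats = [
  { id: 'banner_a', pulls: 30, rewards: 5 },
  { id: 'banner_b', pulls: 10, rewards: 2 }
]
puts ucb1_select(stats)[:id]  # the rarely pulled arm gets the larger bonus here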
Softmax
Temperature-based algorithm that selects arms according to their relative rewards.
bandit = ClassicBandit::Softmax.new(
arms: arms,
initial_temperature: 1.0,
k: 0.5
)
- Uses Boltzmann distribution for arm selection
- Higher temperature leads to more exploration
- Temperature decreases over time for better exploitation
- Smooth probability distribution over arms (see the sketch below)
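The sketch below illustrates Boltzmann selection and how a falling temperature shifts probability toward the better arm. The cooling schedule shown (temperature = initial_temperature / (1 + k * t)) is one common choice and is assumed here for illustration; the gem's exact use of k may differ:

# Illustrative only, not the gem's implementation.
def softmax_probabilities(means, temperature)
  weights = means.map { |m| Math.exp(m / temperature) }
  total = weights.sum
  weights.map { |w| w / total }
end

means = [0.15, 0.25]                 # observed click-through rates per arm
initial_temperature = 1.0
k = 0.5

[0, 10, 100].each do |t|             # t = number of selections made so far
  temperature = initial_temperature / (1.0 + k * t)   # assumed cooling schedule
  p softmax_probabilities(means, temperature)
end
# As the temperature drops, probability mass concentrates on the better arm.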
Thompson Sampling
Bayesian approach that maintains a probability distribution over each arm's rewards.
bandit = ClassicBandit::ThompsonSampling.new(arms: arms)
- Naturally balances exploration and exploitation
- Uses Beta distribution to model uncertainty
- Performs well in practice with no tuning required
- Adapts quickly to observed reward patterns (see the sketch below)
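The sketch below shows the idea with one Beta posterior per arm: draw a sample from each posterior and play the arm with the highest draw. The order-statistics trick for sampling a Beta with integer parameters keeps the example dependency-free; it is not necessarily how the gem draws its samples:

# Illustrative only, not the gem's implementation.
# For integer a and b, a Beta(a, b) draw equals the a-th smallest of
# (a + b - 1) uniform random numbers.
def sample_beta(a, b)
  Array.new(a + b - 1) { rand }.sort[a - 1]
end

def thompson_select(stats)
  stats.max_by { |s| sample_beta(s[:successes] + 1, s[:failures] + 1) }
end

stats = [
  { id: 'banner_a', successes: 5, failures: 35 },
  { id: 'banner_b', successes: 9, failures: 31 }
]
puts thompson_select(stats)[:id]  # 'banner_b' wins most draws, but not every time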
Common Interface
All algorithms share the same interface:
# Select an arm
arm = bandit.select_arm
# Update the arm with reward
bandit.update(arm, 1) # Success
bandit.update(arm, 0) # Failure
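Because the interface is shared, algorithms can be swapped without changing the surrounding loop. A small simulation sketch (the click-through rates below are made up for illustration):

require 'classic_bandit'

arms = [
  ClassicBandit::Arm.new(id: 'banner_a', name: 'Spring Campaign'),
  ClassicBandit::Arm.new(id: 'banner_b', name: 'Summer Campaign')
]
true_rates = { 'banner_a' => 0.05, 'banner_b' => 0.08 }  # hypothetical rates

bandit = ClassicBandit::Ucb1.new(arms: arms)  # swap in any of the algorithms above

1000.times do
  arm = bandit.select_arm
  reward = rand < true_rates[arm.id] ? 1 : 0  # simulate a click
  bandit.update(arm, reward)
end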
Development
After checking out the repo, run:
$ bundle install
$ bundle exec rspec
To release a new version:
- Update the version number in version.rb
- Create a git tag for the version
- Push git commits and tags
License
The gem is available as open source under the terms of the MIT License.