The project is in a healthy, maintained state
Bloom filter implementation using a bit array with double hashing. Supports configurable expected items and false positive rate, merge, serialization, and memory usage reporting.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies
 Project Readme

philiprehberger-bloom_filter

Tests Gem Version Last updated

Space-efficient probabilistic set with configurable false positive rate

Requirements

  • Ruby >= 3.1

Installation

Add to your Gemfile:

gem "philiprehberger-bloom_filter"

Or install directly:

gem install philiprehberger-bloom_filter

Usage

require "philiprehberger/bloom_filter"

filter = Philiprehberger::BloomFilter.new(expected_items: 10_000, false_positive_rate: 0.01)
filter.add('hello')
filter.include?('hello')  # => true
filter.include?('world')  # => false

Merge Filters

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
a.merge(b)
a.include?('beta')  # => true

Bulk Add

filter = Philiprehberger::BloomFilter.new(expected_items: 10_000)
filter.bulk_add(%w[alpha beta gamma delta])
filter.include?('beta')  # => true

Cardinality Estimation

filter.count_estimate  # => ~4.0 (estimated unique items)

Intersection

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[shared only-a])
b.bulk_add(%w[shared only-b])

result = a.intersection(b)
result.include?('shared')  # => true
result.include?('only-a')  # => false

Fill Rate

filter.fill_rate  # => 0.023 (proportion of set bits)

Equality and Copying

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('hello')
b.add('hello')
a == b  # => true

clone = a.copy
clone.add('world')
a.include?('world')  # => false (independent copy)

False Positive Rate

filter = Philiprehberger::BloomFilter.new(expected_items: 1000, false_positive_rate: 0.01)
100.times { |i| filter.add("item-#{i}") }
filter.false_positive_rate  # => ~0.0001 (actual rate based on current fill)

Superset Check

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[alpha beta gamma])
b.add('alpha')
a.superset?(b)  # => true

Empty Check

filter = Philiprehberger::BloomFilter.new(expected_items: 1000)
filter.empty?  # => true
filter.add('hello')
filter.empty?  # => false

Union (Non-Mutating)

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')

result = a.union(b)
result.include?('alpha')  # => true
result.include?('beta')   # => true
a.include?('beta')        # => false (a is unchanged)

Subset Check

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.bulk_add(%w[alpha beta gamma])
a.subset?(b)  # => true

Operator Aliases

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')

union = a | b              # same as a.union(b)
intersection = a & b       # same as a.intersection(b)

Compatibility Check

a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.compatible?(b)  # => true, safe to merge / intersect / union

Saturation Check

filter = Philiprehberger::BloomFilter.new(expected_items: 100)
filter.saturated?  # => false
200.times { |i| filter.add("item-#{i}") }
filter.saturated?(threshold: 0.5)  # => true

Serialization

data = filter.serialize
restored = Philiprehberger::BloomFilter.deserialize(data)
restored.include?('hello')  # => true

JSON Serialization

json = filter.to_json
restored = Philiprehberger::BloomFilter.from_json(json)
restored.include?('hello')  # => true

API

Method Description
.new(expected_items:, false_positive_rate:) Create a new bloom filter
#add(item) Add an item to the filter
#include?(item) Check if an item might be in the filter
#merge(other) Merge another compatible filter into this one
#clear Reset the filter
#count Number of items added
#memory_usage Bit array size in bytes
#serialize Serialize to a hash
#bulk_add(items) Add all items from an enumerable
#count_estimate Estimate unique item count from fill rate
#intersection(other) Create filter matching items in both
#fill_rate Proportion of set bits (0.0 to 1.0)
#==(other) Structural equality comparison
#copy Create an independent deep clone
#false_positive_rate Actual FP rate based on current fill
#to_json Serialize to JSON string
#superset?(other) Check if self contains all bits of other
#empty? Check if no items have been added
#union(other) Non-mutating OR returning a new filter
#subset?(other) Check if every set bit in self is also set in other
#compatible?(other) Check structural compatibility with another filter
#saturated?(threshold:) True if fill rate is at/above threshold
#|(other) Operator alias for #union
#&(other) Operator alias for #intersection
#hash / #eql? Hash key support consistent with #==
#inspect Human-readable representation
.deserialize(data) Restore a filter from serialized data
.from_json(str) Restore a filter from a JSON string

Development

bundle install
bundle exec rspec
bundle exec rubocop

Support

If you find this project useful:

Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT