philiprehberger-bloom_filter
Space-efficient probabilistic set with configurable false positive rate
Requirements
- Ruby >= 3.1
Installation
Add to your Gemfile:
gem "philiprehberger-bloom_filter"Or install directly:
gem install philiprehberger-bloom_filterUsage
require "philiprehberger/bloom_filter"
filter = Philiprehberger::BloomFilter.new(expected_items: 10_000, false_positive_rate: 0.01)
filter.add('hello')
filter.include?('hello') # => true
filter.include?('world') # => falseMerge Filters
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
a.merge(b)
a.include?('beta') # => trueBulk Add
filter = Philiprehberger::BloomFilter.new(expected_items: 10_000)
filter.bulk_add(%w[alpha beta gamma delta])
filter.include?('beta') # => trueCardinality Estimation
filter.count_estimate # => ~4.0 (estimated unique items)Intersection
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[shared only-a])
b.bulk_add(%w[shared only-b])
result = a.intersection(b)
result.include?('shared') # => true
result.include?('only-a') # => falseFill Rate
filter.fill_rate # => 0.023 (proportion of set bits)Equality and Copying
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('hello')
b.add('hello')
a == b # => true
clone = a.copy
clone.add('world')
a.include?('world') # => false (independent copy)False Positive Rate
filter = Philiprehberger::BloomFilter.new(expected_items: 1000, false_positive_rate: 0.01)
100.times { |i| filter.add("item-#{i}") }
filter.false_positive_rate # => ~0.0001 (actual rate based on current fill)Superset Check
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.bulk_add(%w[alpha beta gamma])
b.add('alpha')
a.superset?(b) # => trueEmpty Check
filter = Philiprehberger::BloomFilter.new(expected_items: 1000)
filter.empty? # => true
filter.add('hello')
filter.empty? # => falseUnion (Non-Mutating)
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
result = a.union(b)
result.include?('alpha') # => true
result.include?('beta') # => true
a.include?('beta') # => false (a is unchanged)Subset Check
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.bulk_add(%w[alpha beta gamma])
a.subset?(b) # => trueOperator Aliases
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.add('alpha')
b.add('beta')
union = a | b # same as a.union(b)
intersection = a & b # same as a.intersection(b)Compatibility Check
a = Philiprehberger::BloomFilter.new(expected_items: 1000)
b = Philiprehberger::BloomFilter.new(expected_items: 1000)
a.compatible?(b) # => true, safe to merge / intersect / unionSaturation Check
filter = Philiprehberger::BloomFilter.new(expected_items: 100)
filter.saturated? # => false
200.times { |i| filter.add("item-#{i}") }
filter.saturated?(threshold: 0.5) # => trueSerialization
data = filter.serialize
restored = Philiprehberger::BloomFilter.deserialize(data)
restored.include?('hello') # => trueJSON Serialization
json = filter.to_json
restored = Philiprehberger::BloomFilter.from_json(json)
restored.include?('hello') # => trueAPI
| Method | Description |
|---|---|
.new(expected_items:, false_positive_rate:) |
Create a new bloom filter |
#add(item) |
Add an item to the filter |
#include?(item) |
Check if an item might be in the filter |
#merge(other) |
Merge another compatible filter into this one |
#clear |
Reset the filter |
#count |
Number of items added |
#memory_usage |
Bit array size in bytes |
#serialize |
Serialize to a hash |
#bulk_add(items) |
Add all items from an enumerable |
#count_estimate |
Estimate unique item count from fill rate |
#intersection(other) |
Create filter matching items in both |
#fill_rate |
Proportion of set bits (0.0 to 1.0) |
#==(other) |
Structural equality comparison |
#copy |
Create an independent deep clone |
#false_positive_rate |
Actual FP rate based on current fill |
#to_json |
Serialize to JSON string |
#superset?(other) |
Check if self contains all bits of other |
#empty? |
Check if no items have been added |
#union(other) |
Non-mutating OR returning a new filter |
#subset?(other) |
Check if every set bit in self is also set in other |
#compatible?(other) |
Check structural compatibility with another filter |
#saturated?(threshold:) |
True if fill rate is at/above threshold |
#|(other) |
Operator alias for #union
|
#&(other) |
Operator alias for #intersection
|
#hash / #eql?
|
Hash key support consistent with #==
|
#inspect |
Human-readable representation |
.deserialize(data) |
Restore a filter from serialized data |
.from_json(str) |
Restore a filter from a JSON string |
Development
bundle install
bundle exec rspec
bundle exec rubocopSupport
If you find this project useful: