0.01
No commit activity in last 3 years
No release in over 3 years
Ever needed to compactly store and query a set of mostly consecutive, non-negative integers? Probably not, but if you do this library may work for you. It's just about as fast as a Set and a lot smaller for numbers that stay close together.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies
 Project Readme

DNS

The DumbNumbSet (DNS) is intended to be somewhat efficient storage for mostly consecutive, positive integers. In particular, the serialized size should be small for sending over the network. Memory usage should be similarly less than a stock set or hash table.

Note that the starting value of a consecutive set of integers does not matter, nor do large gaps in between consecutive sets. But lots of sparse data does not work so well.

Installation

gem install dumb_numb_set

Optionally:

gem install msgpack

Usage

require 'dumb_numb_set'

dns = DumbNumbSet.new

dns << 1

dns.include? 1  #=> true

dns.remove 1

dns.include? 1  #=> false

dns.to_a        #=> [1]

Tests

ruby test/test.rb

Benchmarks

ruby test/bm.rb

Memory Test

 ruby test/mem.rb SIZE

Modify $multiplier and data for more fun.

Serialized Size

Perfectly Dense Data

This is the best case scenario for the DNS: a list of consecutive integers.

The values below are the length of the serialized data structure from Marshal.dump using ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-linux].

Items        Hash         DNS      %reduction
---------------------------------------------
  1k   |        4632  |       253   |  95%
 10k   |       49632  |      2211   |  96%
100k   |      534098  |     24254   |  95%
  1M   |     5934098  |    245565   |  96%
 10M   |    59934098  |   2557080   |  96%
100M   |   683156884  |  26163639   |  96%
  1B   |         ?    | 262229211   |   ?
---------------------------------------------

For even smaller serialized sizes, install the MessagePack and the DNS will automatically use it in conjuction with Marshal.

For 1 billion items, my machine ran out of memory (16GB).

Less Dense Data

The less dense the data is, the less benefit a DNS provides. In the worse case, DNS is equivalent to a hash table, i.e. one key per value stored.

In the tables below, random values were used in the range 0-(size * 30). This allows significant gaps to form in the data. For even sparser data, DNS starts to be less efficient than a hash table.

Items        Hash         DNS      %reduction
---------------------------------------------
  1k   |        4921 |      5121   |  -4% (worse)
 10k   |       57045 |     53621   |   6%
100k   |      588230 |    537184   |   9%
  1M   |     6334404 |   5758291   |   9%
 10M   |    68298120 |   58082016  |   15%
100M   |   751061842 | 609400600   |   29%
  1B   |          ?  |         ?   |   ?
---------------------------------------------

For 1 billion items, my machine ran out of memory (16GB).

Speed

Some quick benchmarks for insert, remove, and lookup speed. The DNS is expected to be a little slower, since it is trading off speed for memory conservation and is written in Ruby, not C/Java. However, it is nearly as fast for every operation except adding random numbers (and JRuby hashes are faster at removing values).

Tests are run with 1 million values.

ruby 1.9.3p448 (x86_64):

                               user     system      total        real
Hash add random            0.540000   0.020000   0.560000 (  0.549499)
DumbNumbSet add random     0.850000   0.020000   0.870000 (  0.864700)
Hash add in order          0.540000   0.020000   0.560000 (  0.556441)
DumbNumbSet add in order   0.490000   0.000000   0.490000 (  0.483713)
Hash add shuffled          0.570000   0.020000   0.590000 (  0.589316)
DumbNumbSet add shuffled   0.540000   0.010000   0.550000 (  0.538420)
Hash look up               0.930000   0.010000   0.940000 (  0.940849)
DNS look up                0.820000   0.000000   0.820000 (  0.818728)
Hash remove                0.980000   0.030000   1.010000 (  0.999362)
DNS remove                 0.950000   0.000000   0.950000 (  0.953170)

jruby 1.7.4:

                               user     system      total        real
Hash add random            0.590000   0.000000   0.590000 (  0.561000)
DumbNumbSet add random     1.670000   0.020000   1.690000 (  0.978000)
Hash add in order          2.580000   0.010000   2.590000 (  0.809000)
DumbNumbSet add in order   0.490000   0.000000   0.490000 (  0.319000)
Hash add shuffled          0.730000   0.000000   0.730000 (  0.591000)
DumbNumbSet add shuffled   0.440000   0.000000   0.440000 (  0.420000)
Hash look up               0.730000   0.000000   0.730000 (  0.568000)
DNS look up                0.740000   0.010000   0.750000 (  0.626000)
Hash remove                0.270000   0.000000   0.270000 (  0.201000)
DNS remove                 0.590000   0.000000   0.590000 (  0.564000)