With a regular queue, you expect "FIFO" behavior: first in, first out. With a stack you expect "LIFO": last in, first out. With a priority queue, you push elements along with a score, and the lowest-scored element is the first to be popped. Priority queues are often used for scheduling timers, bandwidth management, Huffman coding, and graph search algorithms such as Dijkstra's algorithm, A* search, or Prim's algorithm.
The d-ary heap data structure is a generalization of the binary heap, in
which the nodes have d children instead of 2. This allows for "decrease
priority" operations to be performed more quickly with the tradeoff of slower
delete minimum. Additionally, d-ary heaps can have better memory cache
behavior than binary heaps, allowing them to run more quickly in practice
despite slower worst-case time complexity. In the worst case, a d-ary heap takes `O(log n / log d)` to push, with the tradeoff that pop is `O(d log n / log d)`.
Although you should probably just use the default d value of 4 (see the analysis below), it's always advisable to benchmark your specific use case.
The basic API is:
- `heap << object` adds a value, using the value as its own score.
- `heap.push(score, value)` adds a value with an extrinsic score.
- `heap.pop` removes and returns the value with the minimum score.
- `heap.pop_lte(score)` pops only if the minimum score is `<=` the provided score.
- `heap.peek` returns the minimum-scored value without popping it.
The score must be a `Float` or convertible to a `Float` via `Float(score)` (i.e. it should implement `#to_f`). Constraining scores to numeric values gives a 40+% speedup under some benchmarks! `Integer` scores must have an absolute value that fits into `unsigned long long`. This is architecture dependent, but on an IA-64 system this is 64 bits, which gives a range of -18,446,744,073,709,551,615 to +18,446,744,073,709,551,615.
Comparing arbitrary objects via `a <=> b` was the original design and may be added back in a future version, if (and only if) it can be done without impacting the speed of numeric comparisons.
require "d_heap" Task = Struct.new(:id) # for demonstration heap = DHeap.new # defaults to a 4-heap # storing [score, value] tuples heap.push Time.now + 5*60, Task.new(1) heap.push Time.now + 30, Task.new(2) heap.push Time.now + 60, Task.new(3) heap.push Time.now + 5, Task.new(4) # peeking and popping (using last to get the task and ignore the time) heap.pop # => Task heap.pop # => Task heap.peek # => Task, but don't pop it from the heap heap.pop # => Task heap.pop # => Task heap.empty? # => true heap.pop # => nil
If your values behave as their own score, by being convertible via
Float(value), then you can use
#<< for implicit scoring. The score should
not change for as long as the value remains in the heap, since it will not be
re-evaluated after being pushed.
```ruby
heap.clear

# The score can be derived from the value by using to_f.
# "a <=> b" is *much* slower than comparing numbers, so it isn't used.
class Event
  include Comparable
  attr_reader :time, :payload
  alias_method :to_time, :time

  def initialize(time, payload)
    @time = time.to_time
    @payload = payload
    freeze
  end

  def to_f
    time.to_f
  end

  def <=>(other)
    to_f <=> other.to_f
  end
end

heap << comparable_max # sorts last, scored via to_f
heap << comparable_min # sorts first, scored via to_f
heap << comparable_mid # sorts in the middle, scored via to_f

heap.pop    # => comparable_min
heap.pop    # => comparable_mid
heap.pop    # => comparable_max
heap.empty? # => true
heap.pop    # => nil
```
You can also pass a value into `#pop(max)`, which will only pop if the minimum score is less than or equal to `max`.
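For example, a timer queue can drain only the entries that are already due, using the `pop_lte` API above (the scores and values here are just illustrative):

```ruby
heap = DHeap.new
heap.push 1.0, "due"
heap.push 2.0, "also due"
heap.push 9.9, "not due yet"

heap.pop_lte(5.0) # => "due"
heap.pop_lte(5.0) # => "also due"
heap.pop_lte(5.0) # => nil, because the minimum score (9.9) is above 5.0
```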
Read the API documentation for more detailed documentation and examples.
Add this line to your application's Gemfile:
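```ruby
gem "d_heap"
```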
And then execute:
```
$ bundle install
```
Or install it yourself as:
```
$ gem install d_heap
```
One naive approach to a priority queue is to maintain an array in sorted order. This can be very simply implemented using `Array#bsearch_index` + `Array#insert`. Popping the minimum from a sorted array is very fast—`O(1)`—but the worst case for insert is `O(n)`, because it may need to `memcpy` a significant portion of the array.
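As a concrete point of reference, here is a minimal sketch of that approach (the class and names below are illustrative, not the gem's benchmark implementation), keeping the array sorted in descending score order so that the minimum can be popped from the end:

```ruby
class SortedArrayPQ
  def initialize
    @entries = [] # [score, value] pairs, sorted by descending score
  end

  def push(score, value)
    index = @entries.bsearch_index { |(s, _)| s <= score } || @entries.size
    @entries.insert(index, [score, value]) # worst case O(n): memmove
    self
  end

  def pop
    entry = @entries.pop # O(1): the minimum lives at the end
    entry&.last
  end
end
```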
The standard way to implement a priority queue is with a binary heap. Although this increases the time for `pop`, it converts the amortized time per push + pop from `O(n)` to `O(log n)` (more generally, `O(d log n / log d)` for a d-ary heap).
However, I was surprised to find that—at least for some benchmarks—my pure ruby heap implementation was much slower than inserting into and popping from a fully sorted array. The reason for this surprising result: although it is `O(n)`, `memcpy` has a very small constant factor, while calling `<=>` from ruby code has a relatively much larger constant factor. If your queue contains only a few thousand items, the overhead of those extra calls to `<=>` is far more than the cost of occasionally calling `memcpy`. In the worst case, a d-heap will require `d + 1` times more comparisons for each push + pop than a `bsearch` + `insert` sorted array.
Moving the sift-up and sift-down code into C helps some. But much more helpful is optimizing the comparison of numeric scores, so `a <=> b` never needs to be called. I'm hopeful that MJIT will eventually obsolete this C-extension. JRuby or TruffleRuby may already run the pure ruby version at high speed. This can be hotspot code, and the basic ruby implementation should perform well if not for the high overhead of `<=>`.
There are two fundamental heap operations: sift-up (used by push) and sift-down (used by pop).
- Both sift operations can perform as many as `log n / log d` swaps, as the element may sift from the bottom of the tree to the top, or vice versa.
- Sift-up performs a single comparison per swap: `O(1)`. So pushing a new element is `O(log n / log d)`.
- Sift-down performs as many as `d` comparisons per swap: `O(d)`. So popping the min element is `O(d log n / log d)`.
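To make the two operations concrete, here is a minimal pure-ruby sketch of a d-ary heap with both sifts (the class and helper names are mine, not the gem's internals):

```ruby
class NaiveDHeap
  def initialize(d = 4)
    @d = d
    @heap = [] # a complete d-ary tree, stored in a flat array
  end

  def push(score, value)
    @heap << [score, value]
    sift_up(@heap.size - 1)
  end

  def pop
    return nil if @heap.empty?
    min = @heap.first
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      sift_down(0)
    end
    min.last
  end

  private

  # one comparison per swap => O(log n / log d)
  def sift_up(index)
    while index.positive?
      parent = (index - 1) / @d
      break if @heap[parent].first <= @heap[index].first
      @heap[parent], @heap[index] = @heap[index], @heap[parent]
      index = parent
    end
  end

  # up to d comparisons per swap => O(d log n / log d)
  def sift_down(index)
    loop do
      first_child = (@d * index) + 1
      break if first_child >= @heap.size
      last_child = [first_child + @d - 1, @heap.size - 1].min
      min_child = (first_child..last_child).min_by { |i| @heap[i].first }
      break if @heap[index].first <= @heap[min_child].first
      @heap[index], @heap[min_child] = @heap[min_child], @heap[index]
      index = min_child
    end
  end
end
```

With `d = 2` this behaves like a classic binary heap; the default of 4 mirrors the gem's default.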
Assuming every inserted element is eventually deleted from the root, a push + pop pair costs about `(1 + d) * log(n) / log(d)` comparisons, so the factor to compare across values of d is `(1 + d) / ln(d)`. d=4 requires the fewest comparisons for combined insert and delete:
- (1 + 2) / ln 2 ≈ 4.328085
- (1 + 3) / ln 3 ≈ 3.640957
- (1 + 4) / ln 4 ≈ 3.606738
- (1 + 5) / ln 5 ≈ 3.728010
- (1 + 6) / ln 6 ≈ 3.906774
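These factors are easy to reproduce in ruby (`Math.log` is the natural log):

```ruby
(2..6).each do |d|
  printf("d=%d: (1 + d) / ln(d) = %.6f\n", d, (1 + d) / Math.log(d))
end
# d=2: (1 + d) / ln(d) = 4.328085
# d=3: (1 + d) / ln(d) = 3.640957
# d=4: (1 + d) / ln(d) = 3.606738
# d=5: (1 + d) / ln(d) = 3.728010
# d=6: (1 + d) / ln(d) = 3.906774
```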
Leaf nodes require no comparisons to sift down, and higher values of d give a higher percentage of leaf nodes:
- d=2 has ~50% leaves,
- d=3 has ~67% leaves,
- d=4 has ~75% leaves,
- and so on...
See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
Space usage is linear, regardless of d. However, higher d values may provide better cache locality. Because the heap is a complete d-ary tree, the elements can be stored in a flat array, without the need for tree or list pointers.
Ruby can compare Numeric values much faster than other ruby objects, even if
those objects simply delegate comparison to internal Numeric values. And it is
often useful to use external scores for otherwise incomparable values. So DHeap uses twice as many entries (one for score and one for value) as an array which only stores values.
See `docs/benchmarks.txt`, as well as `docs/profile.txt`, for more details or updated results. These benchmarks were measured with v0.4.0 and ruby 2.7.2 without MJIT enabled.
These benchmarks use very simple implementations for a pure-ruby heap and an array that is kept sorted using `Array#bsearch_index` + `Array#insert`. For comparison, an alternate implementation simply scans an unsorted array to find the minimum (the `findmin` rows in the results below).
Three different scenarios are measured:
- push N values but never pop (clearing between each set of pushes).
- push N values and then pop N values. Although this could be used for heap sort, we're unlikely to choose heap sort over Ruby's quick sort implementation. I'm using this scenario to represent the amortized cost of creating a heap and (eventually) draining it.
- For a heap of size N, repeatedly push and pop while keeping a stable size (sketched below). This is a very simple approximation of how most scheduler/timer heaps would be used: usually, when a timer fires, it is quickly replaced by a new timer, and the overall count of timers remains roughly stable.
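For illustration, the stable-size scenario's inner loop has roughly this shape (a sketch only; the gem's actual benchmark code is `DHeap::Benchmarks::Scenarios#repeated_push_pop`, which shows up in the profiles below):

```ruby
# Keep the queue at a stable size: each iteration pushes one new
# randomly-scored element and then pops the current minimum.
def stable_size_push_pop(queue, iterations)
  iterations.times do
    queue.push(rand, :value)
    queue.pop
  end
end
```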
In these benchmarks, DHeap runs faster than all other implementations for every scenario and every value of N, although the difference is much more noticeable at higher values of N. The pure ruby heap implementation is faster for push alone at every value of N, but is significantly slower than bsearch + insert for push + pop until N is very large (somewhere between 10k and 100k)!
Even for very small values of N, DHeap runs faster than the other implementations for each scenario, although the difference is still relatively small. The pure ruby binary heap is 2x or more slower than bsearch + insert for the common push/pop scenario.
```
== push N (N=5) ==========================================================
push N (c_dheap):   1701338.1 i/s
push N (rb_heap):    971614.1 i/s - 1.75x slower
push N (bsearch):    946363.7 i/s - 1.80x slower

== push N then pop N (N=5) ===============================================
push N + pop N (c_dheap):   1087944.8 i/s
push N + pop N (findmin):    841708.1 i/s - 1.29x slower
push N + pop N (bsearch):    773252.7 i/s - 1.41x slower
push N + pop N (rb_heap):    471852.9 i/s - 2.31x slower

== Push/pop with pre-filled queue of size=N (N=5) ========================
push + pop (c_dheap):   5525418.8 i/s
push + pop (findmin):   5003904.8 i/s - 1.10x slower
push + pop (bsearch):   4320581.8 i/s - 1.28x slower
push + pop (rb_heap):   2207042.0 i/s - 2.50x slower
```
By N=21, DHeap has pulled significantly ahead of bsearch + insert for all scenarios, but the pure ruby heap is still slower than every other implementation—even re-sorting the array after every `#push`—in any scenario that uses `#pop`:
```
== push N (N=21) =========================================================
push N (c_dheap):   408307.0 i/s
push N (rb_heap):   212275.2 i/s - 1.92x slower
push N (bsearch):   169583.2 i/s - 2.41x slower

== push N then pop N (N=21) ==============================================
push N + pop N (c_dheap):   199435.5 i/s
push N + pop N (findmin):   162024.5 i/s - 1.23x slower
push N + pop N (bsearch):   146284.3 i/s - 1.36x slower
push N + pop N (rb_heap):    72289.0 i/s - 2.76x slower

== Push/pop with pre-filled queue of size=N (N=21) =======================
push + pop (c_dheap):   4836860.0 i/s
push + pop (findmin):   4467453.9 i/s - 1.08x slower
push + pop (bsearch):   3345458.4 i/s - 1.45x slower
push + pop (rb_heap):   1560476.3 i/s - 3.10x slower
```
At higher values of N, DHeap's logarithmic growth leads to very little slowdown of `DHeap#push`, while `insert`'s linear growth causes it to run slower and slower. But because `#pop` is O(1) for a sorted array and O(d log n / log d) for a d-heap, scenarios involving `#pop` remain relatively close even as N increases to 5k:
```
== Push/pop with pre-filled queue of size=N (N=5461) ==============
push + pop (c_dheap):   2718225.1 i/s
push + pop (bsearch):   1793546.4 i/s - 1.52x slower
push + pop (rb_heap):    707139.9 i/s - 3.84x slower
push + pop (findmin):    122316.0 i/s - 22.22x slower
```
Somewhat surprisingly, bsearch + insert still runs faster than a pure ruby heap for the repeated push/pop scenario, all the way up to N as high as 87k:
```
== push N (N=87381) ======================================================
push N (c_dheap):   92.8 i/s
push N (rb_heap):   43.5 i/s - 2.13x slower
push N (bsearch):    2.9 i/s - 31.70x slower

== push N then pop N (N=87381) ===========================================
push N + pop N (c_dheap):   22.6 i/s
push N + pop N (rb_heap):    5.5 i/s - 4.08x slower
push N + pop N (bsearch):    2.9 i/s - 7.90x slower

== Push/pop with pre-filled queue of size=N (N=87381) ====================
push + pop (c_dheap):   1815277.3 i/s
push + pop (bsearch):    762343.2 i/s - 2.38x slower
push + pop (rb_heap):    535913.6 i/s - 3.39x slower
push + pop (findmin):      2262.8 i/s - 802.24x slower
```
n.b. `Array#fetch` in the profiles below is reading the input data, external to the heap operations.
These benchmarks use integers for all scores, which enables significantly faster comparisons. If `a <=> b` were used instead, then the difference between push and pop would be much larger. And ruby's `TracePoint` impacts these different implementations differently, so we can't use these profiler results for comparisons between implementations. A sampling profiler would be needed for more accurate relative measurements.
It's informative to look at the `ruby-prof` results for a simple binary search + insert implementation, repeatedly pushing to and popping from a large queue. In particular, even with 1000 members, the linear `Array#insert` is still faster than the logarithmic `Array#bsearch_index`. At this scale, ruby comparisons are still (relatively) slow and `memcpy` is (relatively) quite fast!
```
 %self      total      self      wait     child     calls  name                                            location
 34.79      2.222     2.222     0.000     0.000   1000000  Array#insert
 32.59      2.081     2.081     0.000     0.000   1000000  Array#bsearch_index
 12.84      6.386     0.820     0.000     5.566         1  DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
 10.38      4.966     0.663     0.000     4.303   1000000  DHeap::Benchmarks::BinarySearchAndInsert#<<     d_heap/benchmarks/implementations.rb:61
  5.38      0.468     0.343     0.000     0.125   1000000  DHeap::Benchmarks::BinarySearchAndInsert#pop    d_heap/benchmarks/implementations.rb:70
  2.06      0.132     0.132     0.000     0.000   1000000  Array#fetch
  1.95      0.125     0.125     0.000     0.000   1000000  Array#pop
```
Contrast this with a simplistic pure-ruby implementation of a binary heap:
```
 %self      total      self      wait     child     calls  name                                            location
 48.52      8.487     8.118     0.000     0.369   1000000  DHeap::Benchmarks::NaiveBinaryHeap#pop          d_heap/benchmarks/implementations.rb:96
 42.94      7.310     7.184     0.000     0.126   1000000  DHeap::Benchmarks::NaiveBinaryHeap#<<           d_heap/benchmarks/implementations.rb:80
  4.80     16.732     0.803     0.000    15.929         1  DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
```
You can see that it spends somewhat more time in pop than it does in push. That is expected behavior for a heap: although both are O(log n), pop is significantly more complex, performing up to d comparisons per layer as it sifts down.
DHeap shows a similar comparison between push and pop, although it spends
half of its time in the benchmark code (which is written in ruby):
```
 %self      total      self      wait     child     calls  name                                            location
 43.09      1.685     0.726     0.000     0.959         1  DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
 26.05      0.439     0.439     0.000     0.000   1000000  DHeap#<<
 23.57      0.397     0.397     0.000     0.000   1000000  DHeap#pop
  7.29      0.123     0.123     0.000     0.000   1000000  Array#fetch
```
Additionally, when used to sort timers, we can reasonably assume that:
- New timers usually sort after most existing timers.
- Most timers will be canceled before executing.
- Canceled timers usually sort after most existing timers.
So, if we are able to delete an item without searching for it, by keeping a map of positions within the heap, most timers can be inserted and deleted in O(1) time. Canceling a non-leaf timer can be further optimized by marking it as canceled without immediately removing it from the heap. If the timer is rescheduled before we garbage collect, adjusting its position will usually be faster than a delete and re-insert.
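Here is a minimal sketch of the cancel-by-marking idea (the class and its methods are hypothetical, not the gem's API, and rescheduling a canceled timer is deliberately not handled):

```ruby
class LazyCancelQueue
  def initialize
    @heap = DHeap.new
    @canceled = {} # timer => true; consulted lazily during pop
  end

  def schedule(time, timer)
    @heap.push(time.to_f, timer) # new timers usually sift very little
  end

  # O(1): mark the timer as canceled instead of searching the heap.
  def cancel(timer)
    @canceled[timer] = true
  end

  # Pop the next live timer, discarding canceled entries as they surface.
  def pop
    while (timer = @heap.pop)
      next if @canceled.delete(timer)
      return timer
    end
    nil
  end
end
```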
Alternative data structures
As always, you should run benchmarks with your expected scenarios to determine which is right.
Depending on what you're doing, maintaining a sorted array with `#bsearch_index` + `#insert` might be just fine! As discussed above, although it is `O(n)` for insertions, `memcpy` is so fast on modern hardware that this may not matter. Also, if you can arrange for insertions to occur near the end of the array, that could reduce the `memcpy` overhead even more.
More complex heap variants, e.g. the Fibonacci heap, can allow heaps to be merged, as well as lowering the amortized time of some operations.
Hashed and Hierarchical Timing Wheels (or some variant in that family of data structures) can be constructed to run in effectively O(1) time in most cases. Although the implementation of that data structure is more complex than a heap, it may be necessary for enormous values of N.
A planned future version of this gem will include `DHeap::Set`, which augments the basic heap with an internal `Hash` that maps a set of values to scores.
A planned future version will also include `DHeap::Lazy`, which contains some features that are loosely inspired by go's timers, e.g. it lazily sifts its heap after deletions and adjustments, to achieve faster average runtime for add and cancel operations.
Additionally, I was inspired by reading go's "timer.go" implementation to experiment with a 4-ary heap instead of the traditional binary heap. In the case of timers, new timers are usually scheduled to run after most of the existing timers. And timers are usually canceled before they have a chance to run. While a binary heap holds 50% of its elements in its last layer, 75% of a 4-ary heap will have no children. That diminishes the extra comparison overhead during sift-down.
After checking out the repo, run
bin/setup to install dependencies. Then, run
rake spec to run the tests. You can also run
bin/console for an interactive
prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/nevans/d_heap. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the DHeap project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.