A heap is a great data structure for merging and sorting data. It’s implemented as a tree with the special heap property: a parent node is always less than or equal to its child nodes, according to some comparison operator. In particular, the top element of the heap is always its smallest element. The tree doesn’t have to be well balanced to guarantee quick retrieval and insertion. A leftist heap, for instance, is deliberately lopsided, with left branches always at least as large as the right ones.
The invariant of the leftist heap is expressed in terms of its right spine. The right spine of a tree is its rightmost path, and its length is called the rank of the tree. In a leftist heap the rank of the right child is always less than or equal to the rank of the left child — the tree is leaning left. Because of that, the rank can grow at most logarithmically with the number of elements: a tree of rank r must contain at least 2^r − 1 elements, so an n-element leftist heap has rank at most log₂(n + 1).
You can always merge two heaps by merging their right spines because they are just sorted linked lists. Since the right spines are at most logarithmically long, the merge can be done in logarithmic time. Moreover, it’s always possible to rotate nodes in the merged path to move heavier branches to the left and thus restore the leftist property.
With merging thus figured out, deletion from the top and insertion are trivial. After removing the top, you just merge left and right children. When inserting a new element, you create a singleton heap and merge it with the rest.
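Before diving in, here is an outline of the interface we’ll end up with (a sketch only; the member names match the implementation below):

template<class T>
class Heap
{
public:
    Heap();                  // empty heap
    explicit Heap(T x);      // singleton heap
    bool isEmpty() const;
    T front() const;         // the smallest element
    Heap insert(T x) const;  // merge with a singleton
    Heap pop_front() const;  // merge the two children
    static Heap merge(Heap const & h1, Heap const & h2);
};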
Implementation
The implementation of the functional leftist heap follows the same pattern we’ve seen before. We start with the definition:
A heap can either be empty or consist of a rank, a value, and two children: left and right heaps.
Let’s start with the definition of a non-empty heap as a private structure inside the Heap class:
#include <cassert>
#include <memory>

template<class T>
class Heap
{
private:
    struct Tree
    {
        Tree(T v) : _rank(1), _v(v) {}
        Tree(int rank
            , T v
            , std::shared_ptr<const Tree> const & left
            , std::shared_ptr<const Tree> const & right)
        : _rank(rank), _v(v), _left(left), _right(right)
        {}

        int _rank;
        T _v;
        std::shared_ptr<const Tree> _left;
        std::shared_ptr<const Tree> _right;
    };
    std::shared_ptr<const Tree> _tree;
    ...
};
The Heap data is just a shared_ptr<const Tree>. An empty shared_ptr encodes an empty heap; otherwise it points to a non-empty Tree.
We’ll make the constructor of a non-empty heap private, because not all combinations of its arguments create a valid heap — see the two assertions:
Heap(T x, Heap const & a, Heap const & b)
{
    assert(a.isEmpty() || x <= a.front());
    assert(b.isEmpty() || x <= b.front());
    // rank is the length of the right spine
    if (a.rank() >= b.rank())
        _tree = std::make_shared<const Tree>(
            b.rank() + 1, x, a._tree, b._tree);
    else
        _tree = std::make_shared<const Tree>(
            a.rank() + 1, x, b._tree, a._tree);
}
We’ll make sure these assertions are true whenever we call this constructor from inside Heap code. This constructor guarantees that, as long as the two arguments are leftist heaps, the result is also a leftist heap. It also calculates the rank of the resulting heap by adding one to the rank of its right, shorter, branch. We’ll set the rank of an empty heap to zero (see the implementation of rank).
As always with functional data structures, it’s important to point out that the construction takes constant time because the two subtrees are shared rather than copied. The sharing is thread-safe because, once constructed, the heaps are always immutable.
The clients of the heap will need an empty heap constructor:
Heap() {}
A singleton constructor might come in handy too:
explicit Heap(T x) : _tree(std::make_shared<const Tree>(x)) {}
They will need a few accessors as well:
bool isEmpty() const { return !_tree; }
int rank() const
{
    if (isEmpty()) return 0;
    else return _tree->_rank;
}
The top, smallest, element is accessed using front:
T front() const { return _tree->_v; }
As I explained, the removal of the top element is implemented by merging left and right children:
Heap pop_front() const { return merge(left(), right()); }
Again, this is a functional data structure, so we don’t mutate the original heap, we just return the new heap with the top removed. Because of the sharing, this is a cheap operation.
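One detail the listings gloss over: pop_front (and merge, below) call left() and right() accessors that aren’t shown. Here is a minimal sketch of what they might look like, assuming a private constructor that wraps a Tree pointer (both the constructor and the accessors are my reconstruction, not quoted from the listings):

private:
    explicit Heap(std::shared_ptr<const Tree> const & tree)
        : _tree(tree) {}

    // Wrap a subtree in a Heap; cheap, because the subtree
    // is shared, not copied.
    Heap left() const
    {
        assert(!isEmpty());
        return Heap(_tree->_left);
    }
    Heap right() const
    {
        assert(!isEmpty());
        return Heap(_tree->_right);
    }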
The insertion is also done using merging. We merge the original heap with a singleton heap:
Heap insert(T x) const { return merge(Heap(x), *this); }
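A quick usage sketch (the values are arbitrary, just for illustration):

#include <cassert>

int main()
{
    Heap<int> h;            // start with an empty heap
    h = h.insert(3);        // each insert returns a new heap
    h = h.insert(1);
    h = h.insert(2);
    assert(h.front() == 1); // the smallest element is on top
    h = h.pop_front();      // remove the top; 2 becomes smallest
    assert(h.front() == 2);
}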
The workhorse of the heap is the recursive merge algorithm below:
static Heap merge(Heap const & h1, Heap const & h2)
{
    if (h1.isEmpty())
        return h2;
    if (h2.isEmpty())
        return h1;
    if (h1.front() <= h2.front())
        return Heap(h1.front(), h1.left(), merge(h1.right(), h2));
    else
        return Heap(h2.front(), h2.left(), merge(h1, h2.right()));
}
If neither heap is empty, we compare the top elements. We create a new heap with the smaller element at the top. Now we have to do something with the two children of the smaller element and with the other heap. First we merge the right child with the other heap. This is the step I mentioned before: the merge follows the right spines of the heaps, guaranteeing logarithmic time. The left child is then combined with the result of that merge. Notice that the Heap constructor will automatically rotate the higher-rank tree to the left, thus preserving the leftist property. The code is surprisingly simple.
You might wonder how come we are not worried about the trees degenerating — turning into (left leaning) linked lists. Consider, however, that such a linked list, because of the heap property, would always be sorted. So the retrieval of the smallest element would still be very fast and require no restructuring. Insertion of an element smaller than the existing top would just prepend it to the list — a very cheap operation. Finally, the insertion of a larger element would turn this element into a length-one right spine — the right child of the top of the linked list. The degenerate case is actually our best case.
Turning an unsorted list of elements into a heap could naively be done in O(N*log(N)) time by inserting the elements one by one. But there is a better divide-and-conquer algorithm that does it in O(N) time (the proof that it’s O(N) is non-trivial though):
template<class Iter>
static Heap heapify(Iter b, Iter e)
{
    if (b == e)
        return Heap();
    if (e - b == 1)
        return Heap(*b);
    else
    {
        Iter mid = b + (e - b) / 2;
        return merge(heapify(b, mid), heapify(mid, e));
    }
}
This function is at the core of heap sort: you heapify a list and then extract its elements from the top, one by one. Since each extraction takes O(log(N)) time, you end up with a sort algorithm with worst-case performance of O(N*log(N)). On average, heapsort is slower than quicksort, but quicksort’s worst-case performance is O(N²), which might be a problem in some scenarios.
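Putting the pieces together, heap sort might look like the sketch below (heapSort is my name for it, and it assumes heapify is publicly accessible; neither is spelled out above):

#include <vector>

// Sort by heapifying the input in O(N), then popping the top
// N times at O(log(N)) each.
template<class T>
std::vector<T> heapSort(std::vector<T> const & v)
{
    std::vector<T> sorted;
    sorted.reserve(v.size());
    Heap<T> h = Heap<T>::heapify(v.begin(), v.end());
    while (!h.isEmpty())
    {
        sorted.push_back(h.front());
        h = h.pop_front();
    }
    return sorted;
}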
- Complete source code is available on GitHub.
- The previous blog post in this series was about functional data structures and concurrency.