April 21, 2014

Getting Lazy with C++

Posted by Bartosz Milewski under Atomics, C++, Concurrency, Functional Programming, Metaprogramming, Monads, Multithreading, Parallelism, Programming
[38] Comments

Lazy evaluation can be a powerful tool for structuring your code. For instance, it can let you turn your code inside out, inverting the flow of control. Many a Haskell program take advantage of laziness to express algorithms in clear succinct terms, turning them from recipes to declarations.

The question for today’s blog post is: How can we tap the power of lazy evaluation in an inherently eager language like C++? I’ll lead you through a simple coding example and gradually introduce the building blocks of lazy programming: the suspension, the lazy stream, and a whole slew of functional algorithms that let you operate on them. In the process we’ll discover some fundamental functional patterns like functors, monads, and monoids. I have discussed them already in my post about C++ futures. It’s very edifying to see them emerge in a completely different context.

The Problem

Let’s write a program that prints the first n Pythagorean triples. A Pythagorean triple consists of three integers, x, y, and z, that satisfy the relation x² + y² = z². Let’s not be fancy and just go with the brute force approach. Here’s the program in C:

void printNTriples(int n)
{
    int i = 0;
    for (int z = 1; ; ++z)
        for (int x = 1; x <= z; ++x)
            for (int y = x; y <= z; ++y)
                if (x*x + y*y == z*z) {
                    printf("%d, %d, %d\n", x, y, z);
                    if (++i == n)
                        return;
                }
}

Here, a single C function serves three distinct purposes: It

Generates Pythagorean triples,
Prints them,
Counts them; and when the count reaches n, breaks.

This is fine, as long as you don’t have to modify or reuse this code. But what if, for instance, instead of printing, you wanted to draw the triples as triangles? Or if you wanted to stop as soon as one of the numbers reached 100? The problem with this code is that it’s structured inside out: both the test and the sink for data are embedded in the innermost loop of the algorithm. A more natural and flexible approach would be to:

Generate the list of Pythagorean triples,
Take the first ten of them, and
Print them.

And that’s exactly how you’d write this program in Haskell:

main = print (take 10 triples)

triples = [(x, y, z) | z <- [1..]
                     , x <- [1..z]
                     , y <- [x..z]
                     , x^2 + y^2 == z^2]

This program reads: take 10 triples and print them. It declares triples as a list (square brackets mean a list) of triples (x, y, z), where (the vertical bar reads “where”) z is an element of the list of integers from 1 to infinity, x is from 1 to z, y is from x to z, and the sum of squares of x and y is equal to the square of z. This notation is called “list comprehension” and is characteristic of Haskell terseness.

You see the difference? Haskell let’s you abstract the notion of the list of Pythagorean triples so you can operate on it as one entity, whereas in C (or, for that matter, in C++) we were not able to disentangle the different, orthogonal, aspects of the program.

The key advantage of Haskell in this case is its ability to deal with infinite lists. And this ability comes from Haskell’s inherent laziness. Things are never evaluated in Haskell until they are absolutely needed. In the program above, it was the call to print that forced Haskell to actually do some work: take 10 elements from the list of triples. Since the triples weren’t there yet, it had to calculate them, but only as many as were requested and not a number more.

Suspension

We’ll start with the most basic building block of laziness: a suspended function. Here’s the first naive attempt:

template<class T>
class Susp {
public:
    explicit Susp(std::function<T()> f)
        : _f(f)
    {}
    T get() { return _f(); }
private:
    std::function<T()> _f;
};

We often create suspensions using lambda functions, as in:

int x = 2;
int y = 3;
Susp<int> sum([x, y]() { return x + y; });
...
int z = sum.get();

Notice that the suspended lambda may capture variables from its environment: here x and y. A lambda, and therefore a suspension, is a closure.

The trouble with this implementation is that the function is re-executed every time we call get. There are several problems with that: If the function is not pure, we may get different values each time; if the function has side effects, these may happen multiple times; and if the function is expensive, the performance will suffer. All these problems may be addressed by memoizing the value.

Here’s the idea: The first time the client calls get we should execute the function and store the returned value in a member variable. Subsequent calls should go directly to that variable. We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.

A thunk is a pointer to a free function taking a suspension (the this pointer) and returning a value (by const reference). The get method simply calls this thunk, passing it the this pointer.

Initially, the thunk is set to thunkForce, which calls the method setMemo. This method evaluates the function, stores the result in _memo, switches the thunk to thunkGet, and returns the memoized value. On subsequent calls get goes through the getMemo thunk which simply returns the memoized value.

template<class T>
class Susp
{
    // thunk
    static T const & thunkForce(Susp * susp) {
        return susp->setMemo();
    }
    // thunk
    static T const & thunkGet(Susp * susp) {
        return susp->getMemo();
    }
    T const & getMemo() {
        return _memo;
    }
    T const & setMemo() {
        _memo = _f();
        _thunk = &thunkGet;
        return getMemo();
    }
public:
    explicit Susp(std::function<T()> f)
        : _f(f), _thunk(&thunkForce), _memo(T())
    {}
    T const & get() {
        return _thunk(this);
    }
private:
    T const & (*_thunk)(Susp *);
    mutable T   _memo;

    std::function<T()> _f;
};

(By the way, the function pointer declaration of _thunk looks pretty scary in C++, doesn’t it?)

[Edit: I decided to remove the discussion of the thread safe implementation since it wasn’t ready for publication. The current implementation is not thread safe.]

You can find a lot more detail about the Haskell implementation of suspended functions in the paper by Tim Harris, Simon Marlow, and Simon Peyton Jones, Haskell on a Shared-Memory Multiprocessor.

Lazy Stream

The loop we used to produce Pythagorean triples in C worked on the push principle — data was pushed towards the sink. If we want to deal with infinite lists, we have to use the pull principle. It should be up to the client to control the flow of data. That’s the inversion of control I was talking about in the introduction.

We’ll use a lazy list and call it a stream. In C++ a similar idea is sometimes expressed in terms of input and forward iterators, although it is understood that an iterator itself is not the source or the owner of data — just an interface to one. So we’ll stick with the idea of a stream.

We’ll implement the stream in the functional style as a persistent data structure fashioned after persistent lists (see my series of blog post on persistent data structures). It means that a stream, once constructed, is never modified. To “advance” the stream, we’ll have to create a new one by calling the const method pop_front.

Let’s start with the definition: A stream is either empty or it contains a suspended cell. This immediately suggests the implementation as a (possibly null) pointer to a cell. Since the whole stream is immutable, the cell will be immutable too, so it’s perfectly safe to share it between copies of the stream. We can therefore use a shared pointer:

template<class T>
class Stream
{
private:
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;
};

Of course, because of reference counting and memoization, the stream is only conceptually immutable and, in the current implementation, not thread safe.

So what’s in the Cell? Remember, we want to be able to generate infinite sequences, so Stream must contain the DNA for not only producing the value of type T but also for producing the offspring — another (lazy) Stream of values. The Cell is just that: A value and a stream.

template<class T>
class Cell
{
public:
    Cell() {} // need default constructor for memoization
    Cell(T v, Stream<T> const & tail)
        : _v(v), _tail(tail)
    {}
    explicit Cell(T v) : _v(v) {}
    T val() const {
        return _v;
    }
    Stream<T> pop_front() const {
        return _tail;
    }
private:
    T _v;
    Stream<T> _tail;
};

This mutually recursive pair of data structures works together amazingly well.

template<class T>
class Stream
{
private:
    std::shared_ptr <Susp<Cell<T>>> _lazyCell;
public:
    Stream() {}
    Stream(std::function<Cell<T>()> f)
        : _lazyCell(std::make_shared<Susp<Cell<T>>>(f))
    {}
    Stream(Stream && stm)
        : _lazyCell(std::move(stm._lazyCell))
    {}
    Stream & operator=(Stream && stm)
    {
        _lazyCell = std::move(stm._lazyCell);
        return *this;
    }
    bool isEmpty() const
    {
        return !_lazyCell;
    }
    T get() const
    {
        return _lazyCell->get().val();
    }
    Stream<T> pop_front() const
    {
        return _lazyCell->get().pop_front();
    }
};

There are several things worth pointing out. The two constructors follow our formal definition of the Stream: one constructs an empty stream, the other constructs a suspended Cell. A suspension is created from a function returning Cell.

I also added a move constructor and a move assignment operator for efficiency. We’ll see it used in the implementation of forEach.

The magic happens when we call get for the first time. That’s when the suspended Cell comes to life. The value and the new stream are produced and memoized for later use. Or, this may happen if the first call is to pop_front. Notice that pop_front is a const method — the Stream itself is immutable. The method returns a new stream that encapsulates the rest of the sequence.

Let’s get our feet wet by constructing a stream of integers from n to infinity. The constructor of a Stream takes a function that returns a Cell. We’ll use a lambda that captures the value of n. It creates a Cell with that value and a tail, which it obtains by calling intsFrom with n+1:

Stream<int> intsFrom(int n)
{
    return Stream<int>([n]()
    {
        return Cell<int>(n, intsFrom(n + 1)); 
    });
}

It’s a recursive definition, but without the usual recursive function calls that eat up the stack. The call to the inner intsFrom is not made from the outer intsFrom. Instead it’s made the first time get is called on the emerging Stream.

Of course, we can also create finite streams, like this one, which produces integers from n to m:

Stream<int> ints(int n, int m)
{
    if (n > m)
        return Stream<int>();
    return Stream<int>([n, m]()
    {
        return Cell<int>(n, ints(n + 1, m));
    });
}

The trick is to capture the limit m as well as the recursion variable n. When the limit is reached, we simply return an empty Stream.

We’ll also need the method take, which creates a Stream containing the first n elements of the original stream:

Stream take(int n) const {
    if (n == 0 || isEmpty())
        return Stream();
    auto cell = _lazyCell;
    return Stream([cell, n]()
    {
        auto v = cell->get().val();
        auto t = cell->get().pop_front();
        return Cell<T>(v, t.take(n - 1));
    });
}

Here we are capturing the suspended cell and use it to lazily generate the elements of the new, truncated, Stream. Again, the key to understanding why this works is to keep in mind that Streams and Cells are conceptually immutable, and therefore can be shared by the implementation. This has some interesting side effects, which don’t influence the results, but change the performance. For instance, if the caller of take forces the evaluation of the first n elements — e.g., by passing them through the consuming forEach below — these elements will appear miraculously memoized in the original Stream.

Finally, we’ll need some way to iterate through streams. Here’s an implementation of forEach that consumes the stream while enumerating it and feeding its elements to a function.

template<class T, class F>
void forEach(Stream<T> strm, F f)
{
    while (!strm.isEmpty())
    {
        f(strm.get());
        strm = strm.pop_front();
    }
}

It’s the assignment:

strm = strm.pop_front();

which consumes the stream by decreasing the reference count of the head of the Stream. In particular, if you pass an rvalue Stream to forEach, its elements will be generated and deleted in lockstep. The algorithm will use constant memory, independent of the virtual length of the Stream. What Haskell accomplishes with garbage collection, we approximate in C++ with reference counting and shared_ptr.

Working with Streams

It’s not immediately obvious how to translate our Pythagorean triple program from nested loops to lazy streams, so we’ll have to take inspiration from the corresponding Haskell program. Let me first rewrite it using a slightly different notation:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

The general idea is this: Start with the stream of integers from 1 to infinity. For every such integer — call it z — create a stream from 1 to z. For each of those — call them x — create a stream from x to z. Filter out those which don’t satisfy the Pythagorean constraint. Finally, output a stream of tuples (x, y, z).

So far we’ve learned how to create a stream of integers — we have the function intsFrom. But now we’ll have to do something for each of these integers. We can’t just enumerate those integers and apply a function to each, because that would take us eternity. So we need a way to act on each element of a stream lazily.

In functional programming this is called mapping a function over a list. In general, a parameterized data structure that can be mapped over is called a functor. I’m going to show you that our Stream is a functor.

Stream as a Functor

The idea is simple: we want to apply a function to each element of a stream to get a new transformed stream (it’s very similar to the std::transform algorithm from STL). The catch is: We want to do it generically and lazily.

To make the algorithm — we’ll call it fmap — generic, we have to parameterize it over types. The algorithm starts with a Stream of elements of type T and a function from T to some other type U. The result should be a stream of U.

We don’t want to make U the template argument, because then the client would have to specify it explicitly. We want the compiler to deduce this type from the type of the function. We want, therefore, the function type F to be the parameter of our template (this will also allow us to call it uniformly with function pointers, function objects, and lambdas):

template<class T, class F>
auto fmap(Stream<T> stm, F f)

Without the use of concepts, we have no way of enforcing, or even specifying, that F be a type of a function from T to U. The best we can do is to statically assert it inside the function:

static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");

But what is U? We can get at it using decltype:

decltype(f(stm.get()));

Notice that decltype takes, as an argument, an expression that can be statically typed. Here, the expression is a function call of f. We also need a dummy argument for this function: we use the result of stm.get(). The argument to decltype is never evaluated, but it is type-checked at compile time.

One final problem is how to specify the return type of fmap. It’s supposed to be Stream<U>, but we don’t know U until we apply decltype to the arguments of fmap. We have to use the new auto function declaration syntax of C++11. So here are all the type-related preliminaries:

template<class T, class F>
auto fmap(Stream<T> stm, F f)->Stream<decltype(f(stm.get()))>
{
    using U = decltype(f(stm.get()));
    static_assert(std::is_convertible<F, std::function<U(T)>>::value,
        "fmap requires a function type U(T)");
    ...
}

Compared to that, the actual implementation of fmap seems rather straightforward:

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(stm.get()), fmap(stm.pop_front(), f));
    });

In words: If the stream is empty, we’re done — return an empty stream. Otherwise, create a new stream by suspending a lambda function. That function captures the original stream (by value) and the function f, and returns a Cell. That cell contains the value of f acting on the first element of the original stream, and a tail. The tail is created with fmap acting on the rest of the original stream.

Equipped with fmap, we can now attempt to take the first step towards generating our triples: apply the function ints(1, z) to each element of the stream intsFrom(1):

fmap(intsFrom(1), [](int z)
{
    return ints(1, z);
});

The result is a Stream of Streams of integers of the shape:

But now we are stuck. We’d like to apply ints(x, z) to each element of that sequence, but we don’t know how to get through two levels of Stream. Our fmap can only get through one layer. We need a way to flatten a Stream of Streams. That functionality is part of what functional programmers call a monad. So let me show you that Stream is indeed a monad.

Stream as a Monad

If you think of a Stream as a list, the flattening of a list of lists is just concatenation. Suppose for a moment that we know how to lazily concatenate two Streams (we’ll get to it later) and let’s implement a function mjoin that concatenates a whole Stream of Streams.

You might have noticed a pattern in the implementation of lazy functions on streams. We use some kind of recursion, which starts with “Are we done yet?” If not, we do an operation that involves one element of the stream and the result of a recursive call to the function itself.

The “Are we done yet?” question usually involves testing for an empty stream. But here we are dealing with a Stream of Streams, so we have to test two levels deep. This way we’ll ensure that the concatenation of a Stream of empty Streams immediately returns an empty Stream.

The recursive step in mjoin creates a Cell whose element is the head of the first stream, and whose tail is the concatenation of the tail of the first stream and the result of mjoin of the rest of the streams:

template<class T>
Stream<T> mjoin(Stream<Stream<T>> stm)
{
    while (!stm.isEmpty() && stm.get().isEmpty())
    {
        stm = stm.pop_front();
    }
    if (stm.isEmpty()) return Stream<T>();
    return Stream<T>([stm]()
    {
        Stream<T> hd = stm.get();
        return Cell<T>( hd.get()
                      , concat(hd.pop_front(), mjoin(stm.pop_front())));
    });
}

The combination of fmap and mjoin lets us compose function like intsFrom or ints that return Streams. In fact, this combination is so common that it deserves its own function, which we’ll call mbind:

template<class T, class F>
auto mbind(Stream<T> stm, F f) -> decltype(f(stm.get()))
{
    return mjoin(fmap(stm, f));
}

If we use mbind in place of fmap:

mbind(intsFrom(1), [](int z)
{
    return ints(1, z);
});

we can produce a flattened list:

1 1 2 1 2 3 1 2 3 4...

But it’s not just the list: Each element of the list comes with variables that are defined in its environment — here the variable z. We can keep chaining calls to mbind and capture more variables in the process:

mbind(intsFrom(1), [](int z)
{
    return mbind(ints(1, z), [z](int x)
    {
        return mbind(ints(x, z), [x, z](int y)
        {
            ...
        }
    }
}

At this point we have captured the triples x, y, z, and are ready for the Pythagorean testing. But before we do it, let’s define two additional functions that we’ll use later.

The first one is mthen which is a version of mbind that takes a function of no arguments. The idea is that such a function will be executed for each element of the stream, but it won’t use the value of that element. The important thing is that the function will not be executed when the input stream is empty. In that case, mthen will return an empty stream.

We implement mthen using a slightly modified version of fmap that takes a function f of no arguments:

template<class T, class F>
auto fmapv(Stream<T> stm, F f)->Stream<decltype(f())>
{
    using U = decltype(f());
    static_assert(std::is_convertible<F, std::function<U()>>::value,
        "fmapv requires a function type U()");

    if (stm.isEmpty()) return Stream<U>();
    return Stream<U>([stm, f]()
    {
        return Cell<U>(f(), fmapv(stm.pop_front(), f));
    });
}

We plug it into the definition of mthen the same way fmap was used in mbind:

template<class T, class F>
auto mthen(Stream<T> stm, F f) -> decltype(f())
{
    return mjoin(fmapv(stm, f));
}

The second useful function is mreturn, which simply turns a value of any type into a one-element Stream:

template<class T>
Stream<T> mreturn(T v)
{
    return Stream<T>([v]() {
        return Cell<T>(v);
    });
}

We’ll need mreturn to turn our triples into Streams.

It so happens that a parameterized type equipped with mbind and mreturn is called a monad (it must also satisfy some additional monadic laws, which I won’t talk about here). Our lazy Stream is indeed a monad.

Stream as a Monoid and a Monad Plus

When implementing mjoin we used the function concat to lazily concatenate two Streams. Its implementation follows the same recursive pattern we’ve seen so many times:

template<class T>
Stream<T> concat( Stream<T> lft
                , Stream<T> rgt)
{
    if (lft.isEmpty())
        return rgt;
    return Stream<T>([=]()
    {
        return Cell<T>(lft.get(), concat<T>(lft.pop_front(), rgt));
    });
}

What’s interesting is that the concatenation of streams puts them under yet another well known functional pattern: a monoid. A monoid is equipped with a binary operation, just like concat, which must be associative and possess a unit element. It’s easy to convince yourself that concatenation of Streams is indeed associative, and that the neutral element is an empty Stream. Concatenating an empty Stream, whether in front or in the back of any other Stream, doesn’t change the original Stream.

What’s even more interesting is that being a combination of a monoid and a monad makes Stream into a monad plus, and every monad plus defines a guard function — exactly what we need for the filtering of our triples. This function takes a Boolean argument and outputs a Stream. If the Boolean is false, the Stream is empty (the unit element of monad plus!), otherwise it’s a singleton Stream. We really don’t care what value sits in this Stream — we never use the result of guard for anything but the flow of control. In Haskell, there is a special “unit” value () — here I use a nullptr as its closest C++ analog.

Stream<void*> guard(bool b)
{
    if (b) return Stream<void*>(nullptr);
    else return Stream<void*>();
}

We can now pipe the result of guard into mthen, which will ignore the content of the Stream but won’t fire when the Stream is empty. When the Stream is not empty, we will call mreturn to output a singleton Stream with the result tuple:

Stream<std::tuple<int, int, int>> triples()
{
    return mbind(intsFrom(1), [](int z)
    {
        return mbind(ints(1, z), [z](int x)
        {
            return mbind(ints(x, z), [x, z](int y)
            {
                return mthen(guard(x*x + y*y == z*z), [x, y, z]()
                {
                    return mreturn(std::make_tuple(x, y, z));
                });
            });
        });
    });
}

These singletons will then be concatenated by the three levels of mbind to create one continuous lazy Stream of Pythagorean triples.

Compare this function with its Haskell counterpart:

triples = do
    z <- [1..]
    x <- [1..z]
    y <- [x..z]
    guard (x^2 + y^2 == z^2)
    return (x, y, z)

Now, the client can take 10 of those triples from the Stream — and the triples still won’t be evaluated!. It’s the consuming forEach that finally forces the evaluation:

void test()
{
    auto strm = triples().take(10);
    forEach(std::move(strm), [](std::tuple<int, int, int> const & t)
    {
        std::cout << std::get<0>(t) << ", " 
                  << std::get<1>(t) << ", " 
                  << std::get<2>(t) << std::endl;
    });
}

Conclusion

The generation of Pythagorean triples is a toy example, but it shows how lazy evaluation can be used to restructure code in order to make it more reusable. You can use the same function triples to print the values in one part of your program and draw triangles in another. You can filter the triples or impose different termination conditions. You can use the same trick to generate an infinite set of approximation to the solution of a numerical problem, and then use different strategies to truncate it. Or you can create an infinite set of animation frames, and so on.

The building blocks of laziness are also reusable. I have used them to implement the solution to the eight-queen problem and a conference scheduling program. Once they made thread safe, the combinators that bind them are thread safe too. This is, in general, the property of persistent data structures.

You might be concerned about the performance of lazy data structures, and rightly so. They use the heap heavily, so memory allocation and deallocation is a serious performance bottleneck. There are many situation, though, where code structure, reusability, maintenance, and correctness (especially in multithreaded code) are more important than performance. And there are some problems that might be extremely hard to express without the additional flexibility gained from laziness.

I made the sources to all code in this post available on GitHub.

38 Responses to “Getting Lazy with C++”

Mathias Gaunard Says:

April 21, 2014 at 11:52 am
The type erasure involved with std::function makes it too slow for any serious use.
You have at least a function call for each element you iterate.
Sergei Trofimovich Says:

April 21, 2014 at 1:54 pm
Really nice post 🙂

A minor typo:

Susp sum(x, y { return x + y; };
looks like the closing ‘)’ is mising.

Thanks!
Bartosz Milewski Says:

April 21, 2014 at 2:31 pm
@Mathias: Yes, type erasure is not cheap and I try to avoid it as much as possible. Unfortunately, I couldn’t figure out how to avoid it in this case. If the suspension were paremeterized by the full type of the lambda (the way mbind is), that type would leak to Stream and I wouldn’t be able to concatenate streams from different sources. Having said that, I don’t think this is the main bottleneck — memory management is. The cost of a function call per item creation fades in comparison with the cost of memory allocation and deallocation. I wouldn’t recommend using lazy streams in the innermost loop of your program. Anywhere else, though, I’d rather go with safe, maintainable, and reusable way of coding.
Bartosz Milewski Says:

April 21, 2014 at 2:32 pm
@Sergei: Thank you, fixed!
Mathias Gaunard Says:

April 21, 2014 at 2:55 pm
There is no hard requirement for type erasure here, you could just keep the original types as-is. This is similar to expression templates.
Concat could also be implemented this way. You would just return a more complex type than Stream.that would contain the two types of the inputs.

notlost Says:

April 21, 2014 at 7:11 pm

Why would you do any of this when an iterator already provides a lazy abstraction and is an order of magnitude more efficient?

	#include <boost/tuple/tuple.hpp>
	#include <boost/iterator/iterator_facade.hpp>

	struct triple_iterator: public boost::iterator_facade
	<triple_iterator, boost::tuple<int, int, int> const,
	boost::random_access_traversal_tag>
	{
	int m;
	value_type value;

	triple_iterator(): m(1), value{3, 4, 5} {
	}

	bool equal (triple_iterator const& o) const {
	return m == o.m;
	}

	difference_type distance_to (triple_iterator const& o) const {
	return o.m – m;
	}

	void increment() { ++m; calc(); }
	void decrement() { –m; calc(); }
	void advance (difference_type x) { m += x; calc(); }
	reference dereference() const { return value; }

	void calc() {
	auto const n = m + 1;
	auto const nn = n*n;
	auto const mm = m*m;
	auto const nm = n*m;
	value = value_type { nn – mm, 2 * nm, nn + mm };
	}
	};

	#include <algorithm>
	#include <iterator>
	#include <iostream>
	#include <boost/tuple/tuple_io.hpp>
	namespace tuples = boost::tuples;

	int main() {
	std::copy_n (triple_iterator(), 10, std::ostream_iterator
	<triple_iterator::value_type>(std::cout, "\n"));
	}

view raw

gistfile1.cpp

hosted with ❤ by GitHub

Laurent Giroud Says:

April 21, 2014 at 9:01 pm
I admit that I only carefully analyzed the first examples then skimmed through the rest of your article but looking at the final result in c++ suggests to me that Haskell is a much better lazy language than c++ can ever dream of being. I feel like I would rather abandon c++ entirely in favour of Haskell rather than jump through these hoops.

C++ looks like it would require an enormous dose of syntactic sugar or maybe lisp like macros to make this declarative style more appealing. As of now the signal to noise ratio of such code is pretty low, at least for the triples function. I do not think the standards committee has ever seriously considered reducing the language verbosity a priority so I doubt this aspect can be improved.

I wonder if it would be feasible and viable to use an Haskell DSL to write triple in a way similar to the Haskell code and generate the c++ code from it?

Also, did you do any performance comparisons between the Haskell and c++ versions?
Robert Obryk Says:

April 21, 2014 at 10:34 pm
There seems to be a problem with double evaluation in Susp: If we have two threads running setMemo concurrently and one of them has already written to _memo and _thunk, then the other one can make a write to _memo that races with a completely new thread executing getMemo.
Bartosz Milewski Says:

April 22, 2014 at 10:34 am
@notlost: In the reddit discussion jzwinck posted an iterator implementation that actually produces Pythagorean triples.

But this is not the point. You can come up with a handcrafted solution to every particular problem without ever noticing a pattern. My goal was to demonstrate this pattern — actually several patterns.

If you look at my github repository you’ll see lazy streams used in the implementation of a parallel solution to the 8 queen problem (LazyQueen project). A more advanced example is the conference timetable problem. The initial effort in understanding how lazy streams work pays back in producing simple elegant solutions to many diverse problems. And, most importantly, they are easy to parallelize, which you can’t say about any of the handcrafted solutions.
Bartosz Milewski Says:

April 22, 2014 at 11:06 am
@Robert: In principle, you’re right. But I don’t think you can race if you are overwriting with exactly the same value — hence the requirement for the function to be pure: it will always produce the same value. Read the Haskell paper I mentioned in the post for a more detailed discussion.
monkeybuddhaSean Says:

April 22, 2014 at 1:49 pm
You might be interested in a port of Python’s itertools library to C++11

https://github.com/ryanhaining/cppitertools

It provides, zip, takewhile, dropwhile, map, filter, etc.
pekh Says:

April 22, 2014 at 3:12 pm
Writing the same value in two threads to the same memory location is a data race. According to the standard we have undefined behavior in this case.

1.10.4
Two expression evaluations conﬂict if one of them modiﬁes a memory location (1.7) and the other one accesses or modiﬁes the same memory location.

1.10.21

The execution of a program contains a data race if it contains two conﬂicting actions in diﬀerent threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undeﬁned behavior.

It might work for very simple cases (T = int, …) on architectures playing nice but I would not count on it. The compiler will assume the code is race free and might do funny things that break the code.

Another issue is that _memo = f() transfers the control to user code (move/copy-assignment operator) which might contain data race. Consider T = std::string in your example: f() have no side-effects and return “example string” and both threads will try to move-assign into _memo causing problems. Another type that will fail here is std::shared_ptr, you can write to different shared_ptr instances even thought they are copies and share their reference count object. You are however not allowed to write to the same shared_ptr instance in multiple threads.

There is another race problem regarding _memo pointed out by Robert. Thread 1 stalls inside f(). Thread 2 executes f() and write to _memo and _thunk. Thread 2 or 3 now execute and tries to read from _memo and finds itself in a race with Thread 1 writing to _memo. This is however basically the same issue pointed out above.
Anon Coward Says:

April 22, 2014 at 3:16 pm
Won’t you run into trouble in the Cell(T v, Stream const & tail) constructor since the move constructor for Stream prevents the `tail’ parameter from being copyable by implicitly deleting the Stream copy constructor?
Bartosz Milewski Says:

April 22, 2014 at 4:33 pm
@Anon Coward: I haven’t run into this trouble, but you’re right. In fact, I believe now that the default move constructor should do the right thing anyway, so I didn’t have to define it explicitly. I’ll have to check that.
Bartosz Milewski Says:

April 22, 2014 at 5:04 pm
@pekh: Damn C++ memory model and processor architectures! You might be right. Even though it will work on the x86, it’s not portable C++. I’ll have to think about it some more. This is work in progress. (Of course, there’s always the version with the lock to fall back on.)
Germán Diago Says:

April 23, 2014 at 9:00 am
I find functional programming pretty interesting because of its properties. I took a look at Haskell and now I am learning Lisp. But I think the most natural way to approach this in c++ would be with some sort of coroutines. We cannot put this high reference-counting + heap abuse when one of the strong points of C++ is being fast. We have boost.coroutine for now, but it’s not standard c++. Just my two cents, a personal opinion.
ericboost Says:

April 23, 2014 at 2:26 pm
I think Mathias is right that this can be done without type-erasure, but I can’t be sure until I try to implement it. C++11 makes this hard since lambdas can’t appear in non-evaluated contexts (e.g. decltype), and so can’t be returned from functions. With C++14 and deduced function return types, that limitation is gone.

I would love to try to implement this without the type-erasure and dynamic allocation. If it can be made efficient, and if the right syntactic sugar can be found, this would make a wonderful addition to my range lib as a simple and powerful way to define input ranges.
ericboost Says:

April 24, 2014 at 10:01 am
Yup, doable without type-erasure or dynamic allocation:

https://github.com/ericniebler/range-v3/blob/master/libs/range/v3/scratch/range.cpp

I only needed to add a flatten (a.k.a. mjoin) range adaptor to my range lib to get this to work. (Well, and fix a few bugs.)

I’d like to find a nicer syntax for this. Hopefully, I can make it as concise and expressive as Python’s list comprehensions. Then we’ll really have something.
BaSz Says:

April 24, 2014 at 11:38 am
@Bartosz – pekh is definitely right – this is a data race. i guess the best mechanism to overcome all issues, related to computing value here, would be call_once:

http://en.cppreference.com/w/cpp/thread/call_once

a little bonus is that user’s f() would need not to be thread safe here, as call_once() would handle it as well.

Paul Says:

April 26, 2014 at 11:36 am

This also can already be done using Linq in C++ with something like this:

auto q = LINQ(
            from(z, boost::irange(1, std::numerical_limits::max()))
            from(x, boost::irange(1, z))
            from(y, boost::irange(x, z))
            where(xx + yy == z*z)
            select(boost::fusion::make_vector(x, y, z))
        );
for(auto&& x:q | linq::take(10)) std::cout << x << std::endl;

And doesn't require type erasure nor dynamic allocation. However, I don't how well it scales to concurrency since the where clause needs to be evaluated during iteration.

Bartosz Milewski Says:

April 27, 2014 at 4:41 pm
After careful consideration I decided to remove the discussion of thread safety. There are ways of making the suspension thread safe, but unfortunately they are expensive in terms of performance. A lock free solution is possible, and is used in Haskell but, as it’s usual with lock-free solutions, it requires very careful analysis. I hope to come back to it in the next blog post.
Bartosz Milewski Says:

April 27, 2014 at 4:48 pm
@Paul: Yes, I’m aware of LINQ’s approach. It is based on a monadic implementation with SQL-like syntax. C# guys who came up with LINQ are very well versed in functional programming.
Range Comprehensions | Eric Niebler Says:

April 27, 2014 at 5:10 pm
[…] Bartosz published a blog post about just this problem. In his post, Bartosz describes some pretty simple results from category […]
nosenseetal Says:

May 1, 2014 at 2:42 am
Bartosz where is LazyQueen on github? I cant find it… and i dont care enough to go through timetable code. 🙂
Bartosz Milewski Says:

May 1, 2014 at 10:43 am
@nosenseetal : Lazy Queue is in the Okasaki repository: https://github.com/BartoszMilewski/Okasaki/blob/master/LazyQueue/Queue.h . I’m still trying to figure out the best way to make the suspension thread safe though. Using call_once works, but it’s rather expensive.
nosenseetal Says:

May 1, 2014 at 11:18 pm
QuEEN, not QuEUe 🙂 “If you look at my github repository you’ll see lazy streams used in the implementation of a parallel solution to the 8 queen problem (LazyQueen project).”
Bartosz Milewski Says:

May 5, 2014 at 11:03 am
Oh, the 8 queens problem. Here’s the eager version: https://github.com/BartoszMilewski/Okasaki/blob/master/Queens/Queens.cpp, and here’s the lazy one: https://github.com/BartoszMilewski/Okasaki/blob/master/LazyQueens/Queens.cpp
nosenseetal Says:

May 17, 2014 at 7:52 am
Thank you, like I said simpler the example higher prob I will learn something. 🙂
C++ Ranges are Pure Monadic Goodness | Bartosz Milewski's Programming Cafe Says:

October 22, 2014 at 1:40 am
[…] purposefully re-formatted Haskell code to match C++ (A more realistic rendition of it is in my post Getting Lazy with C++). Bind operators >>= are normally used in infix position but here I wanted to match them […]
Range comprehensions with C++ lazy generators | a totally unnecessary blog Says:

March 15, 2015 at 2:50 pm
[…] very interesting articles on this topic and I’d heartily suggest reading Bartosz Milewski’s “Getting Lazy with C++” and Eric Niebler’s “Range […]

Kamil Says:

April 7, 2015 at 10:52 pm

this is my idea instead of thunks:

template<class T>
class Susp
{
public:
    explicit Susp(std::function<T()> f)
        : _f(f), _memo(T())
    {}
    T const & get() {
        call_once(init, [&] {_memo = _f(); });
        return _memo;
    }
private:
    T   _memo;

    once_flag init;
    std::function<T()> _f;
};

weberc2 Says:

July 22, 2015 at 8:51 pm
strm.take(-1) will return a copy of strm (because we’re checking n==0 instead of n<1) is this by design?
Bartosz Milewski Says:

July 23, 2015 at 12:27 pm
@weberc2: Calling take with -1 is an error. In production code I would either use an assertion or an exception.
Lazy evaluation with pair factory in C++ - BlogoSfera Says:

September 30, 2015 at 8:58 pm
[…] find https://bartoszmilewski.com/2014/04/21/getting-lazy-with-c/ and Lazy evaluation in C++ post but they dont help me. Base problem implement lazy because factory […]
Markus Says:

December 10, 2015 at 1:48 am
This is a really good post. Thank you! (I also enjoyed your talks available on youtube.)
There’s one point I don’t quite understand. You say
“We could implement this by setting a Boolean flag on the first call and then checking it on every subsequent call, but there’s a better implementation that uses thunks.”
What I don’t understand is this: In the case of the boolean flag I have to store a bool and a T. With thunks I have to store a pointer and a T. The latter needs more space. So in which way is the bool implementation worse than the one using thunks?
(Background info: I’m looking at template programming on microcontrollers. This area is still dominated by C and the general misconception in the field seems to be that C++ inevitably introduces bloat. That’s a shame because C++ has so much to offer.)
Bartosz Milewski Says:

December 10, 2015 at 11:34 am
I was more concerned about speed and concurrency than space. If you’re saying that a pointer requires more storage than a Boolean, then you’re probably using a custom-made compiler (which kind of makes sense for microcontrollers). Normal compilers will either use the same amount of storage, or pad the structure so the loads are word-aligned. Not to mention that a heap allocator can only deal with multiples of default size (usually pointer size plus a header).

Thread-safe (atomic) update of a single pointer to function is much faster that thread-safe manipulation of a bool and a value. This might not be an issue if your microcontroller is single-threaded.
Berserker Says:

November 5, 2016 at 2:41 pm
@Kamil: std::once_flag is neither copyable nor movable.
[Перевод] «Современный» C++: сеанс плача с причитаниями – CHEPA website Says:

January 31, 2019 at 1:55 am
[…] являлся ответом на статью Бартоша Милевского «Getting Lazy with C++», в котором простая сишная функция для распечатки […]

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Bartosz Milewski's Programming Cafe

Getting Lazy with C++

The Problem

Suspension

Lazy Stream

Working with Streams

Stream as a Functor

Stream as a Monad

Stream as a Monoid and a Monad Plus

Conclusion

38 Responses to “Getting Lazy with C++”

Leave a comment

Archived Entry

Bartosz Milewski's Programming Cafe

Getting Lazy with C++

The Problem

Suspension

Lazy Stream

Working with Streams

Stream as a Functor

Stream as a Monad

Stream as a Monoid and a Monad Plus

Conclusion

Share this:

Related

38 Responses to “Getting Lazy with C++”

Leave a comment

Archived Entry