Regular readers of this blog might notice that the Naiad team is rather fond of defining new kinds of dataflow. Just yesterday, we did it again with a paper that defines “timely dataflow”. In this post, I’m going to introduce timely dataflow at a high level, and follow-up posts will talk more about the distributed Naiad system that we have built on top of it. Read more…
We’ve just finished work on a paper that describes the distributed implementation of Naiad. I will be presenting it in November at the ACM Symposium on Operating Systems Principles (SOSP) in Nemacolin, PA.
Way back at the beginning of time, we had a post on performing PageRank on a 1.5B edge graph of who-follows-who on Twitter. We talked a bit about how several big data systems don’t do quite as well as a 40 line, single-threaded C# program. There was also a promise to show how to make things go much, much faster. So we’re going to do that today.
am going to be presenting presented our work on Naiad at the GraphLab workshop, this afternoon at 3:10pm. I’ll talk about the trade-offs between data-parallel and graph-parallel systems, and show how Naiad delivers the best of both worlds.
In this post we will cover a programming pattern frequently thought to be incompatible with data-parallel systems: sequentially updating a collection of values each as a function of the most recent values. In C#, this would look something like:
// sequentially updates array of values for (int i = 0; i < values.Length; i++) values[i] = updateBasedOn(values);
This type of iterative process is at the heart of several graph algorithms, from efficient pageranking, to graph coloring, to general loopy belief propagation. Although the logic looks very sequential, we will see that it can be made very parallel in the context of graph processing.
We’ve walked through a few example programs in previous posts, computing the connected components of a symmetric graph and the strongly connected components of a directed graph, but we didn’t say very much about how a computer (or computers) might execute these programs. In this post we’ll do that at a high level by introducing differential dataflow, the main feature distinguishing Naiad’s computational model from previous approaches to data-parallel computation. It is what lets Naiad perform iterative computation efficiently, and update these computations in millisecond timescales.