In this post we will cover a programming pattern frequently thought to be incompatible with data-parallel systems: sequentially updating a collection of values each as a function of the most recent values. In C#, this would look something like:
// sequentially updates array of values for (int i = 0; i < values.Length; i++) values[i] = updateBasedOn(values);
This type of iterative process is at the heart of several graph algorithms, from efficient pageranking, to graph coloring, to general loopy belief propagation. Although the logic looks very sequential, we will see that it can be made very parallel in the context of graph processing.
We’ve walked through a few example programs in previous posts, computing the connected components of a symmetric graph and the strongly connected components of a directed graph, but we didn’t say very much about how a computer (or computers) might execute these programs. In this post we’ll do that at a high level by introducing differential dataflow, the main feature distinguishing Naiad’s computational model from previous approaches to data-parallel computation. It is what lets Naiad perform iterative computation efficiently, and update these computations in millisecond timescales.
So in the last few posts we’ve hopefully whetted your appetite for using Naiad, but we haven’t said much about the secret sauce that enables it to compute results efficiently. Just before Thanksgiving, we finished work on the camera-ready version of our paper on differential dataflow, which will be presented early next year at the Conference on Innovative Data Systems Research (CIDR 2013). This paper sets out the key ideas and algorithms that underpin Naiad, and enable it to perform complex iterative analyses on real-world data in real-time.
The main idea in the paper is differential computation, which is a new form of incremental computation. The main idea of differential computation is that you can benefit from storing multiple previous versions of a computation’s state, if there exists some partial order describing the “reasons” for which the state changes. When applied to data-parallel dataflow, this allows a system like Naiad to disentangle the changes to a collection that are due to iteration, and those that are due to external inputs. In the next few weeks, we’ll post some more details about the math of differential computation, and discuss different ways of implementing it. In the meantime, have a read of the paper and get in touch if you have any questions!
Frank has been talking to the folks at Channel 9 about Naiad: giving some background on the project, showing how programs are written, and demonstrating what you can do with them. If you have a spare 35 minutes, I recommend that you watch the video:
All of the example source that Frank shows can be downloaded as part of the Naiad source code release. If you have any questions, leave a comment here or on Channel 9, or email us at naiadquestions at microsoft dot com.
One of the potentially game-changing advantages of incremental computation is that it allows you to react to incoming data when it is freshest, and most valuable. In particular, social networks are producing data at ever-increasing rates, and – because much of this data is conversational – we need the expressiveness of a system like Naiad to perform sophisticated analyses on it. In this post, I’ll show you how to connect a simple Naiad program to a stream of tweets from Twitter, and discuss some of the ways that Naiad can be used to analyze this type of data.
Naiad is designed to take advantage of distributed compute clusters: a large number of processor cores enables Naiad to work faster by processing data in parallel, while the aggregate RAM of a cluster enables Naiad to scale to larger problem instances. In this post, I’ll talk a bit about how data-parallelism in Naiad works from an architectural point of view, then go into the details of how to write and run your own distributed Naiad programs. To follow along with the steps, I’d recommend downloading the Naiad source release and running some of the example applications in the NaiadExamples project, as I’ll be using these in the step-by-step instructions below.
In the previous post we looked at writing an iterative algorithm for computing the connected component structure of an undirected graph. In this post we are going to expand on this, and show how to use that functionality as a subroutine in computing the strongly connected component structure of a directed graph. The code to follow is in NaiadExamples/StronglyConnectedComponents.cs in the Naiad download.
