Welcome to Naiad
Hello everybody and welcome to the first of several posts on Naiad, an incremental, iterative, and interactive data-parallel computing platform that we’re currently developing at Microsoft Research Silicon Valley. We’re doing these posts so we can entertain (or horrify) anybody interested in developments at the bleeding edge of so-called “big data computing”. We’ll cover a few topics, ranging from simple code examples to the research underpinnings that make Naiad unique, which we hope will appeal to a spectrum of readers from big data novices to active researchers. Much of our discussion is aligned with an associated source code release, available at the Naiad project page.
Let us start in a conventional fashion with a little bit of history. Back in the mists of time, our intrepid project founders, Frank and Michael, were having deep and meaningful thoughts about the next generation of data-parallel computing platforms. This didn’t happen out of the blue: Michael was largely responsible for Dryad, a distributed acyclic dataflow computation platform, while Frank had been a satisfied, and sometimes not-so-satisfied, user of DryadLINQ for many years. Their extensive experience led to a couple of observations (made by others, in the past):
- Many big-data algorithms contain loops, and these loops are often data-dependent, iterating until the answer doesn’t change any more.
- As the answer starts to stabilize, much of the work done in each iteration is redundant with work done in previous iterations, since much of the data are the same. Significant speed-ups are possible by considering just the changes rather than the entire state itself.
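To make the second observation concrete, here is a minimal sketch in plain Python. It is not Naiad code and says nothing about how Naiad is implemented; the function names and the `work` counter are hypothetical, chosen only for illustration. Both functions compute connected components by propagating the smallest node label until nothing changes. The first recomputes every node in every round, while the second only follows up on nodes whose label changed in the previous round.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def components_full(edges):
    """Recompute every node's label each round until no label changes."""
    adj = build_adjacency(edges)
    labels = {n: n for n in adj}
    work = 0  # number of neighbour labels inspected (illustrative metric)
    changed = True
    while changed:
        changed = False
        new_labels = {}
        for node, neighbours in adj.items():
            work += len(neighbours)
            best = min([labels[node]] + [labels[m] for m in neighbours])
            new_labels[node] = best
            if best != labels[node]:
                changed = True
        labels = new_labels
    return labels, work


def components_delta(edges):
    """Only push labels from nodes whose label changed in the last round."""
    adj = build_adjacency(edges)
    labels = {n: n for n in adj}
    work = 0
    frontier = set(adj)  # in the first round every label counts as "new"
    while frontier:
        next_frontier = set()
        for node in frontier:
            for m in adj[node]:
                work += 1
                if labels[node] < labels[m]:
                    labels[m] = labels[node]
                    next_frontier.add(m)
        frontier = next_frontier
    return labels, work


if __name__ == "__main__":
    # A six-node path plus one separate edge (made-up example data).
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (10, 11)]
    full_labels, full_work = components_full(edges)
    delta_labels, delta_work = components_delta(edges)
    print("same answer:", full_labels == delta_labels)
    print("work (full recomputation):", full_work)
    print("work (changes only):", delta_work)
```

Both versions loop until the answer stops changing, which is the data-dependent iteration from the first observation. The changes-only version does less work on this input because late rounds touch only the few nodes that are still settling.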
This sounded appealing, and not too complicated, so we set about modifying Dryad to support cyclic dataflow graphs capable of re-using previously computed results. However, it soon became apparent that extending Dryad wasn’t going to fly, and that Naiad required a completely different computational model.
Despite its suggestive name, Naiad is a completely new, stand-alone system, written in C#, with nothing at all in common with Dryad (other than its too-clever-by-half name). Coding was started around the middle of 2011, and we’ve now reached the point of having a working system we’re comfortable sharing with others. We think that Naiad offers functionality and benefits that aren’t currently found in any other data-parallel compute platform, and with the source code release we hope that other people might be interested in trying the system out. This blog is intended to help explain how it all works (it really is different!), as well as to share some of the interesting issues that we’ve run into over the last year and no doubt will encounter in the months to come.
If you are interested in the future of big data computing, then we hope you will find these posts entertaining, and we’d love to hear your thoughts.
We have prepared a rough outline of posts-to-come; several are ready to read now, and others are still to be written. We’ll link them as they come on-line. The first batch of posts is an introduction to writing programs in Naiad:
- My first Naiad program (interactive word counting).
- My second Naiad program (building a search index).
- My first iterative Naiad program (connected components, introducing FixedPoint).
- My last-ever Naiad program (interactive strongly connected components).
Naiad is also a distributed computation platform, and there are several things to say about running jobs in distributed mode.
- Running distributed Naiad programs.
- Naiad’s distributed system architecture.
- Naiad’s clock protocol.
- Scheduling, compaction, and completeness in Naiad.
Understanding what is actually going on in Naiad can be challenging, even when you are pretty familiar with it (us). We have a few things (at least) to say about that.
- Performance and Measurement.
- Optimizing Naiad’s memory footprint.
- Debugging in Naiad.
The technical details that make Naiad new and interesting are pretty neat, but strictly speaking they are not required reading.
- Differential Dataflow (how things work, aka “the math post”).
- Reachability and the clock protocol’s correctness.
Finally, it is important to understand Naiad’s place in the larger “big data” space. We have made many design decisions that run contrary to those of other systems, for better or for worse, and will try to explain them here.
- History, motivation, and related work.
We’ll probably come up with other things to say, but if we’ve missed something you’re very interested in hearing about, let us know at naiadquestions at microsoft dot com.