Naiad: A Timely Dataflow System
We’ve just finished work on a paper that describes the distributed implementation of Naiad. I will be presenting it in November at the ACM Symposium on Operating Systems Principles (SOSP) in Nemacolin, PA. Update: The video is now available to watch: (MP4) (Flash).
The paper describes how Naiad builds on a new model called timely dataflow. We have a forthcoming series of blog posts about timely dataflow, the low-level Naiad API, and some of the applications that use it. In the meantime, check out the abstract and download the full SOSP paper (PDF, 17 pages):
Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework.
A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism.
We show that many powerful high-level programming models can be built on Naiad’s low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining. Naiad outperforms specialized systems in their target application domains, and its unique features enable the development of new high-performance applications.
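To make the abstract's idea of timestamps as "logical points in the computation" a little more concrete, here is a minimal sketch in Python. It is not Naiad's API: the names `Timestamp`, `precedes`, and `ProgressTracker` are hypothetical, the two-field timestamp and its ordering are simplifying assumptions, and the tracker watches a single place in the computation rather than a whole dataflow graph. The point is only to show how timestamps can drive coordination: work at a given timestamp is known to be finished once nothing outstanding could still contribute to it.

```python
from collections import Counter
from typing import NamedTuple


class Timestamp(NamedTuple):
    """Hypothetical logical timestamp (an assumption, not Naiad's type)."""
    epoch: int      # which batch of input a record belongs to
    iteration: int  # how many times it has travelled around a loop


def precedes(a: Timestamp, b: Timestamp) -> bool:
    """Toy partial order: a <= b when both coordinates are <=."""
    return a.epoch <= b.epoch and a.iteration <= b.iteration


class ProgressTracker:
    """Counts outstanding records per timestamp at one point in the graph."""

    def __init__(self) -> None:
        self.outstanding: Counter = Counter()

    def sent(self, t: Timestamp) -> None:
        # A record with timestamp t is now in flight.
        self.outstanding[t] += 1

    def processed(self, t: Timestamp) -> None:
        # A record with timestamp t has been fully handled.
        if self.outstanding[t] <= 0:
            raise ValueError(f"no outstanding record at {t}")
        self.outstanding[t] -= 1
        if self.outstanding[t] == 0:
            del self.outstanding[t]

    def is_complete(self, t: Timestamp) -> bool:
        # t is complete when no outstanding record is at or before t.
        return not any(precedes(p, t) for p in self.outstanding)


tracker = ProgressTracker()
tracker.sent(Timestamp(0, 0))
tracker.sent(Timestamp(1, 0))
before = tracker.is_complete(Timestamp(0, 0))  # a record at (0, 0) is pending
tracker.processed(Timestamp(0, 0))
after = tracker.is_complete(Timestamp(0, 0))   # only (1, 0) remains
```

In this sketch the record from epoch 1 does not block completion of epoch 0, so later input can flow through while earlier results are finalized, which is the kind of lightweight coordination the abstract refers to.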
Update: tl;dr? Read the introductory blog post on timely dataflow.