So long and thanks for all the hits

As you might have heard, the Microsoft Research Silicon Valley lab is no more.

It’s been a great privilege to work with so many talented researchers over the past three years. I look forward to seeing what comes next from all of us.

Derek Murray (@mrry)

GraphLINQ: A graph library for Naiad

The most recent Naiad release contains GraphLINQ, a new library of LINQ-like operators full of graph-specific optimizations. GraphLINQ is an example of how one can build up domain-specific libraries with carefully tailored implementations, which nonetheless integrate smoothly with Naiad and the rest of its libraries. This post, the first of several, explains some of GraphLINQ’s methods by building up a PageRank example. In future posts, we will talk more about asynchronous graph processing and about how GraphLINQ is built.
Read more…

Building new frameworks on Naiad

Since we first announced Naiad back in 2012, the system has grown in many ways, including performance, robustness, and support for other platforms. From a developer’s point of view, the biggest change came in early 2013, when we split the project into two parts, so that our differential dataflow implementation became a library on top of the more general distributed Naiad runtime. The main consequence of this split is that you can now build new frameworks and DSLs on top of Naiad, and take full advantage of its high-performance runtime. In future posts, we will talk more about some of the frameworks that we have released with Naiad. Today I’m going to show how easy it is to build a new framework, and the opportunities that this opens up.

Read more…

Naiad on YARN and Azure HDInsight

One of the most commonly asked questions about Naiad is, “How on earth do you run it on a cluster?” When we first released Naiad, the only solution available was to grit your teeth and launch the distributed programs by hand, using scripts that were tailored to a particular cluster. In the meantime, Hadoop YARN has become a widely available framework for running data-processing applications on a cluster. This post describes how we ported the latest version of Naiad to run on top of Hadoop, and how this makes it easier to run Naiad programs on your data in Microsoft Azure.

Read more…

Announcing the Naiad 0.4 release

Since we released Naiad as open-source software on GitHub last October, we have been busy adding features that make it easier to use Naiad for your applications, in settings ranging from your laptop to a distributed cluster. In the coming weeks, we’ll be posting about what’s new. This post gathers together the things that have changed.

Read more…

Tuning the performance of Naiad. Part 1: the network

We have recently been talking about Naiad’s low latency (see, for example, Derek’s presentation at SOSP). If you have ever tried to coax good performance out of a distributed system, you may be wondering exactly how we get coordination latencies of less than 1 ms across 64 computers with 8 parallel workers per machine. In this series of posts we’re going to reveal to our curious – and possibly skeptical – users what we did in gory detail.
Read more…

Naiad available on GitHub

We are pleased to announce that Naiad is now available on GitHub under the Apache 2.0 open source license. Naiad builds and runs using the .NET CLR on Windows, and Mono on Windows, Linux, and OS X.

Read more…

An introduction to timely dataflow

Regular readers of this blog might notice that the Naiad team is rather fond of defining new kinds of dataflow. Just yesterday, we did it again with a paper that defines “timely dataflow”. In this post, I’m going to introduce timely dataflow at a high level, and follow-up posts will talk more about the distributed Naiad system that we have built on top of it.

Read more…

Naiad: A Timely Dataflow System

We’ve just finished work on a paper that describes the distributed implementation of Naiad. I will be presenting it in November at the ACM Symposium on Operating Systems Principles (SOSP) in Nemacolin, PA. UPDATED: The video is now available to watch: (MP4) (Flash).

Read more…

Graph Analysis and Hilbert Space-filling Curves

Way back at the beginning of time, we had a post on performing PageRank on a 1.5B-edge graph of who-follows-who on Twitter. We talked a bit about how several big data systems don’t do quite as well as a 40-line, single-threaded C# program. There was also a promise to show how to make things go much, much faster. So we’re going to do that today.

Read more…