
Hello world in Naiad

Posted on October 9, 2012

In this post we’ll cover the very basics of putting a Naiad program together, and learn a little bit about how Naiad works.

The first thing we’ll want to do is create a new Naiad project. To do this in Visual Studio, open the Naiad solution from the code release, and add a new project of type “Console application”. Once you’ve done that, add a reference to the Naiad project (which links in Naiad.dll), and add a using Naiad; line at the top of the program. In the following code, let’s imagine that we’ve already taken care of these details.

We’re about to write a simple interactive program that counts the frequencies of the words in the text we present to it. The example follows the code in NaiadExamples/WordCount.cs in the Naiad source download.

The first thing we do in a Naiad computation is to create a Naiad controller.

using (var controller = new Controller())
{

This starts up Naiad and provides the opportunity to supply execution parameters (as arguments to the Controller). These include things like how many threads Naiad should use, whether it should look for other processes (and if so, where), and many other parameters. For now, we’ll just let Naiad use its defaults.

The next thing we need to do is define the computation we want to perform. Unlike many systems where you start with some fixed input and transform it through a sequence of steps, Naiad’s interactive access currently requires us to first define the computation, then start feeding in data. So to start out, we declare a new source of input for Naiad.

  var myText = controller.NewInput<string>();

The NewInput method asks Naiad (specifically, the controller managing its execution) to create a new input into which we can put records. The type parameter says that we plan to put in strings, which lets Naiad use C#’s type system to infer and enforce the types of everything computed from this input.

The variable myText now represents a collection into which we can add records. Before we do that, we want to describe the computation that these records will be subjected to. Our plan is to write a program that splits each string into words, and then counts the frequency of each word, writing the frequencies back to the screen. Here is how we do that:

  myText.SelectMany(x => x.Split(' '))
        .Count(y => y, (k, c) => k + ":" + c)
        .Subscribe(l => { foreach (var x in l) Console.WriteLine(x); });

This is a bunch of lines all at once, so let’s walk through them.

The first method, SelectMany, is modeled on LINQ’s operator of the same name, a record-by-record transformation. SelectMany takes as a parameter a function from an input element to a list of output elements, and its result is the concatenation, or “flattening”, of the lists produced by applying the function to each input record. In this case, the function takes a string, x, and produces the list of strings returned by x.Split(' '). These are exactly the “words” in each input line of text (though a more robust implementation would take case and symbols into account). Now is a good time to point out that Naiad’s “collections” are unordered multisets: a record can appear multiple times, and the order of records is not preserved (or acknowledged). The output of SelectMany, like myText, is a handle to an (as yet empty) multiset of strings on which we can base further computation.
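
If you haven’t used SelectMany before, the flattening is easy to see with ordinary LINQ-to-Objects on an in-memory array. This sketch is plain C# and does not use Naiad; note that, unlike a Naiad collection, the array here does keep its order.

```csharp
using System;
using System.Linq;

// Two input lines, standing in for records added to myText (plain array, not Naiad).
string[] lines = { "hello world", "goodbye world" };

// Apply Split(' ') to each line and concatenate ("flatten") the resulting arrays.
string[] words = lines.SelectMany(x => x.Split(' ')).ToArray();

Console.WriteLine(string.Join(", ", words));
// prints: hello, world, goodbye, world

// Caveat: consecutive spaces make Split(' ') produce empty strings
// ("hello  world" gives "hello", "", "world"); RemoveEmptyEntries drops them.
string[] robust = "hello  world".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(robust.Length);
// prints: 2
```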

The second method, Count, is not the LINQ method of the same name: rather than counting the records in the whole collection, it counts the records in each subgroup of the collection. If you are familiar with LINQ, this is a lot like using GroupBy (LINQ’s grouping method) and reducing each group with the Count function. Here Count takes two parameters: first a key selector (what should we group the records by?) and then a result selector (given a key and the number of records with that key, what record should be output?). In our case, we group records using the identity function (we want to count each word by itself), and when we have a (word, count) pair, we join the two into one string such as “hello:1”.
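
To make the analogy concrete, here is the same grouping written with standard LINQ-to-Objects over a fixed array. It is a one-shot, non-incremental computation offered only as a comparison, not Naiad code. It uses the GroupBy overload that takes a key selector and a result selector, which lines up with the two arguments we pass to Naiad’s Count.

```csharp
using System;
using System.Linq;

// The words SelectMany would produce for the lines "hello world" and "hello".
string[] words = { "hello", "world", "hello" };

// Key selector: the word itself (identity). Result selector: join the key
// with the number of records in its group, as in the Naiad query above.
string[] counts = words
    .GroupBy(y => y, (k, group) => k + ":" + group.Count())
    .ToArray();

Console.WriteLine(string.Join(", ", counts));
// prints: hello:2, world:1
```

LINQ yields the groups in the order their keys first appear; Naiad makes no such promise, since its collections are unordered.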

The final method, Subscribe, is Naiad’s way of getting information back from the computation. Subscribe takes a callback function that responds to the outputs Naiad produces, and Naiad promises to invoke this function for each batch of input data submitted to Naiad, with the output corresponding to that batch. All records in Naiad, including the inputs to Naiad and the outputs from it, take the form of “differences”. Once we start up the example program (in a few paragraphs), we’ll see what these outputs look like and explain what they mean.

Now that we’ve built up the computation, we expect to see the counts for the words in the strings we supply to the program. Let’s see how to introduce some input.


  Console.WriteLine("Enter lines of text, or an empty line to exit.");
  var line = Console.ReadLine();
  while (!string.IsNullOrEmpty(line))  // an empty line, or end of input (null), ends the loop
  {
    myText.OnNext(line);
    line = Console.ReadLine();
  }

This bit of code prompts the user for input, and for each line the user supplies it submits the line as one batch of input. Naiad breaks the line apart into words, updates the counts associated with the words, and invokes our callback with the list of changes to the collection.

Before we start playing with the computation, let’s finish out the program. There are two more lines to politely shut down a Naiad computation.

  myText.OnCompleted();
  controller.Join();
}

The OnCompleted method tells Naiad that the input stream is now closed. This lets Naiad determine which parts of the computation are still outstanding, and in particular prepares it for the next call, Join. Calling Join on a controller blocks the calling thread until the controller has retired all of its inputs, so we are assured that the callback has been invoked for every batch submitted to the input. If you omit these two calls, you will often find that your program exits very quickly, before Naiad has even started to process any data.

Here are all the Naiad parts together, the whole program minus any C# boilerplate.

using (var controller = new Controller())
{
  // create an incrementally updateable collection
  var myText = controller.NewInput<string>();

  // segment strings, count, and print
  myText.SelectMany(x => x.Split(' '))
        .Count(y => y, (k, c) => k + ":" + c)
        .Subscribe(l => { foreach (var x in l) Console.WriteLine(x); });

  Console.WriteLine("Enter lines of text, or an empty line to exit.");

  var line = Console.ReadLine();
  while (!string.IsNullOrEmpty(line))  // an empty line, or end of input (null), ends the loop
  {
    myText.OnNext(line);
    line = Console.ReadLine();
  }

  myText.OnCompleted();
  controller.Join();
}

OK! Now let’s play a bit with this program. In the transcripts below, lines in square brackets are output from the program; the other lines after the prompt are the ones I typed.

Enter lines of text, or an empty line to exit.
hello world
[ hello:1, 1 ]  
[ world:1, 1 ]

What has happened here, and what are all these numbers? In response to my input, Naiad has printed two records. These are the “differences” we mentioned above, and they describe how the collection has changed. Each difference has a value and a frequency. In this output, “hello” and “world” are the two words extracted from the input string, and the number 1 following the colon is the current count of that word. The second 1 indicates that the frequency of the record “hello:1” has increased by one: it did not exist in the collection before, and now it does, once. The same is true of “world:1”. This may make more sense once we add some more input.

hello  
[ hello:1, -1 ]   
[ hello:2, 1 ]

Well, this is different! We added the word “hello” again, increasing its count from one to two. As a result, the record “hello:1”, which used to be in the output, is no longer there (so its frequency is decremented), whereas a new record, “hello:2”, has emerged.
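
One way to read these records is as the change between the query’s output before and after a batch. The sketch below is plain C# with no Naiad involved, and its helpers Freq and Diff are hypothetical: it represents each output as a multiset (a dictionary from record to frequency) and reports every record whose frequency changed. This illustrates what the differences mean, not how Naiad computes them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Query output as a multiset: record -> frequency. These snapshots are the
// output before and after the second "hello" in the session above.
var before = new Dictionary<string, int> { ["hello:1"] = 1, ["world:1"] = 1 };
var after = new Dictionary<string, int> { ["hello:2"] = 1, ["world:1"] = 1 };

// Frequency of a record in a multiset, 0 if absent (hypothetical helper).
int Freq(Dictionary<string, int> m, string r) => m.TryGetValue(r, out var f) ? f : 0;

// A record's difference is its new frequency minus its old frequency;
// records whose frequency is unchanged (here world:1) are omitted.
List<(string Record, int Delta)> Diff(Dictionary<string, int> oldOut, Dictionary<string, int> newOut) =>
    oldOut.Keys.Union(newOut.Keys)
        .Select(r => (Record: r, Delta: Freq(newOut, r) - Freq(oldOut, r)))
        .Where(d => d.Delta != 0)
        .ToList();

var diffs = Diff(before, after);
foreach (var (record, delta) in diffs)
    Console.WriteLine($"[ {record}, {delta} ]");
```

It reports the same two records Naiad printed, “hello:1” with -1 and “hello:2” with 1, and nothing for the unchanged “world:1”.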

goodbye world
[ world:1, -1 ]   
[ world:2, 1 ]   
[ goodbye:1, 1 ]

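Several things happened here. The count for “world” went up by one, so “world:1” was retracted and “world:2” added, and “goodbye” showed up as a new word. The count for “hello” did not change, so no difference mentions it. Fortunately, Naiad manages all of this for you.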

goodbye
[ goodbye:1, -1 ]   
[ goodbye:2, 1 ]

And a little more of the same. This covers most of what the example is meant to show, so we’ll finish there.
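
To see what Naiad is managing on our behalf, here is the same bookkeeping written by hand. This is a hypothetical sketch in plain C#, not Naiad code and not how Naiad implements it; the OnBatch helper and its dictionary of running counts are our own illustration. For each batch it touches only the words in that batch, retracting each word’s old “word:count” record (if there was one) and adding the new one. Replaying the four lines from the session above produces the same differences Naiad printed, though not necessarily in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Running count per word, kept across batches (hypothetical helper state).
var counts = new Dictionary<string, int>();

// Handle one line as one batch. Only words occurring in the batch are touched:
// each one's old "word:count" record is retracted (-1), if it existed,
// and its new record is added (+1).
List<(string Record, int Delta)> OnBatch(string text)
{
    var changes = new List<(string Record, int Delta)>();
    foreach (var group in text.Split(' ').GroupBy(w => w))
    {
        int old = counts.TryGetValue(group.Key, out var c) ? c : 0;
        int now = old + group.Count();
        counts[group.Key] = now;
        if (old > 0)
            changes.Add((group.Key + ":" + old, -1));
        changes.Add((group.Key + ":" + now, +1));
    }
    return changes;
}

// Replay the four lines typed in the session above.
foreach (var line in new[] { "hello world", "hello", "goodbye world", "goodbye" })
{
    Console.WriteLine(line);
    foreach (var (record, delta) in OnBatch(line))
        Console.WriteLine($"[ {record}, {delta} ]");
}
```

Grouping within each batch first means a line such as “hello hello” produces a single “hello:2” rather than a transient “hello:1”. Even for this tiny query, the incremental logic is separate code that must be kept in sync with the query itself.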

Although counting words incrementally is a very simple application, things get more interesting as we start to make more complex programs. The main point is that we did all of our programming here as if against a static dataset – without worrying about how to make it incremental – and Naiad sorted out all of the incremental stuff for us. For example, we could quite easily modify the WordCount.cs example to first load in a few gigabytes of text, and then work interactively with it. If everything is working correctly, you still have interactive access to word counts (and will be delighted that we don’t reprint all counts for all words with each batch of input…). Over the next few posts, we will see some more neat algorithms that you can write in Naiad.


2 Comments
  1. DiY

    Hi,
    I’m trying to use Naiad from Visual Studio F# Interactive (Script.fsx).

    #r @"C:\Naiad\bin\Debug\naiad.dll"
    open Naiad
    let c = new Controller()

    but the following error occurred:

    System.ArgumentNullException: Value cannot be null.
    Parameter name: source
    at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)
    at Naiad.Configuration.get_Endpoints() in C:\Naiad\Controller.cs:line 42
    at Naiad.Controller..ctor(Configuration config) in C:\Naiad\Controller.cs:line 557
    at Naiad.Controller..ctor() in C:\Naiad\Controller.cs:line 577
    at .$FSI_0006.main@() in C:\Test\Script.fsx:line 3
    Stopped due to error

    • Hi there,

      Sorry about that – there seems to be a bug in the default constructor for Controller, which we will fix in the next release.

      In the meantime, you can work around this by instead writing:

      > let mutable (args : string array) = Array.zeroCreate 0;;
      > let c = new Controller(Configuration.FromArgs(&args));;

      We’d be interested to hear about your experiences with Naiad and F#!
