How do we kick our synchronous addiction?

Asynchronous programming is superior both in memory usage and in overall throughput when compared to synchronous programming . We've known this fact for years. If we look at Django or Ruby on Rails, arguably the two most promising new web application frameworks to emerge in the past few years, both of them are written in such a way that synchronous programming is assumed. Why is it that even in 2010 we're still writing programs that rely on synchronous programming ?

The reason that we're stuck on synchronous programming is twofold. Firstly, the programming model required for straightforward asynchronous implementations is inconvenient. Secondly, popular and/or mainstream languages lack the built-in language constructs that are needed to implement a less-straightforward approach to asynchronous programming.

Asynchronous programming is too hard

Let's first examine the straightforward implementation: an event loop. In this programming model, we have a single process with a single loop that runs continuously. Functionality is achieved by writing functions to execute small tasks quickly, and inserting those functions into that event loop. One of those functions might read some bytes from a socket, while another function might write a few bytes to a file, and yet another function might do something computational like calculating an XOR on the data that's been buffered from that first socket.

The most important part about this event loop is that only one thing is ever happening at a time. That means that you really have to break your logic up into small chunks that can be performed incrementally. If any one of our functions blocks, it hogs the event loop and nothing else can execute during that time.

We have some really great frameworks geared towards making this event loop model easier to work with. In Python, there's Twisted and, more recently, Tornado. In Ruby there's EventMachine. In PERL there's POE. What these frameworks do is twofold: provide constructs for more easily working with an event loop (e.g. Deferreds or Promises), and provide asynchronous implementations of common tasks (e.g. HTTP clients and DNS resolution).

But these frameworks stop very short of making asynchronous programming easy for two reasons. The first reason is that we really do have to completely change our coding style. Consider what it would take to render a simple blog web page with comments. Here's some JavaScript code to demonstrate how this might work in a synchronous framework:

function handleBlogPostRequest(request, response, postSlug) {
    var db = new DBClient();
    var post = db.getBlogPost(postSlug);
    var comments = db.getComments(post.id);
    var html = template.render('blog/post.html',
        {'post': post, 'comments': comments});
    response.write(html);
    response.close();
}

Now here's some JavaScript code to demonstrate how this might look in an asynchronous framework. Note several things here: We've specifically written this in such a way that it doesn't become nested four levels deep. We've also written these callback functions inside of the handleBlogPostRequest function to take advantage of closure so as to retain access to the request and response objects, the template context, and the database client. Both the desire to avoid nesting and the closure are things that we need to think about as we write this code, that were not even considerations in the synchronous version:

function handleBlogPostRequest(request, response, postSlug) {
    var context = {};
    var db = new DBClient();
    function pageRendered(html) {
        response.write(html);
        response.close();
    }
    function gotComments(comments) {
        context['comments'] = comments;
        template.render('blog/post.html', context).addCallback(pageRendered);
    }
    function gotBlogPost(post) {
        context['post'] = post;
        db.getComments(post.id).addCallback(gotComments);
    }
    db.getBlogPost(postSlug).addCallback(gotBlogPost);
}

I've chosen JavaScript here to prove a point, by the way. People are very excited about node.js right now, and it's a very cool framework, but it doesn't hide all of the complexities involved in doing things asynchronously. It only hides some of the implementation details of the event loop.

The second reason why these frameworks fall short is because not all IO can be handled properly by a framework, and in these cases we have to resort to bad hacks. For example, MySQL does not offer an asynchronous database driver, so most of the major frameworks end up using threads to ensure that this communication happens out of band.

Given the inconvenient API, the added complexity, and the simple fact that most developers haven't switched to using this style of programming, leads us to the conclusion that this type of framework is not a desirable final solution to the problem (even though I do concede that you can get Real Work done today using these techniques, and many people do). That being the case, what other options do we have for asynchronous programming? Coroutines and lightweight processes, which brings us to our next major problem.

Languages don't support easier asynchronous paradigms

There are a few language constructs that, if implemented properly in modern programming languages, could pave the way for alternative methods of doing asynchronous programming that don't have the drawbacks of the event loop. These constructs are coroutines and lightweight processes.

A coroutine is a function that can suspend and resume its execution at certain, programmatically specified, locations. This simple concept can serve to transform blocking-looking code to be non-blocking. At certain critical points in your IO library code, the low-level functions that are doing IO can choose to "cooperate". That is, it can choose to suspend execution in order for another function to resume execution and continue on.

Here's an example (it's Python, but fairly understandable for all I hope):

def download_pages():
    google = urlopen('http://www.google.com/').read()
    yahoo = urlopen('http://www.yahoo.com/').read()

Normally the way this would work is that a socket would be opened, connected to Google, an HTTP request sent, and the full response would be read, buffered, and assigned to the google variable, and then in turn the same series of steps would be taken for the yahoo variable.

Ok, now imagine that the underlying socket implementation were built using coroutines that cooperated with each other. This time, just like before, the socket would be opened and a connection would be made to Google, and then a request would be fired off. This time, however, after sending the request, the socket implementation suspends its own execution.

Having suspended its execution (but not yet having returned a value), execution continues on to the next line. The same thing happens on the Yahoo line: once its request has been fired off, the Yahoo line suspends its execution. But now there's something else to cooperate with--there's actually some data ready to be read on the Google socket--so it resumes execution at that point. It reads some data from the Gooogle socket, and then suspends its execution again.

It jumps back and forth between the two coroutines until one has finished. Let's say that the Yahoo socket has finished, but the Google one has not. In this case, the Google socket just continues to read from its socket until it has completed, because there are no other coroutines to cooperate with. Once the Google socket is finally finished, the function returns with all of the buffered data.

Then the Yahoo line returns with all of its buffered data.

We've preserved the style of our blocking code, but we've used asynchronous programming to do it. Best of all, we've preserved our original program flow--the google variable is assigned first, and then the yahoo variable is assigned. In truth, we've got a smart event loop going on underneath the covers to control who gets to execute, but it's hidden from us due to the fact that coroutines are in play.

Languages like PHP, Python, Ruby, and Perl simply don't have built-in coroutines that are robust enough to implement this kind of behind-the-scenes transformation. So what about these lightweight processes?

Lightweight processes are what Erlang uses as its main concurrency primitive. Essentially these are processes that are mostly implemented in the Erlang VM itself. Each process has approximately 300 words of overhead and its execution is scheduled primarily by the Erlang VM, sharing no state at all amongst processes. Essentially, we don't have to think twice about spawning a process, as it's essentially free. The catch is that these processes can only communicate via message passing.

Implementing these lightweight processes at the VM level gets rid of the memory overhead, the context switching, and the relative sluggishness of interprocess communication provided by the operating system. Since the VM also has insight into the memory stack of each process, it can freely move or resize those processes and their stacks. That's something that the OS simply cannot do.

With this model of lightweight processes, it's possible to again revert back to the convenient model of using a separate process for all of our asynchronous programming needs. The question becomes this: can this notion of lightweight processes be implemented in languages other than Erlang? The answer to that is "I don't know." To my knowledge, Erlang takes advantage of some features of the language itself (such as having no mutable data structures) in its lightweight process implementation.

Where do we go from here?

The key to moving forward is to drop the notion that developers need to learn to think about all of their code in terms of callbacks and asynchrony, as the asynchronous event loop frameworks require them to do. Over the past ten years, we can see that most developers, when faced with that decision, simply choose to ignore it. They continue to use the inferior blocking methodologies of yesteryear.

We need to look at these alternative implementations like coroutines and lightweight processes, so that we can make asynchronous programming as easy as synchronous programming. Only then will we be able to kick this synchronous addiction.

Exploring Erlang's gen_server

Today I found myself at Super Happy Dev House, amongst a whole house full of geeks (I say this in the most friendly way possible). I thought for a while about working on Pinax project, which is what I really wanted to do. However, it didn't seem geeky enough for the occasion.

No, today the call of the geek was too strong. There was only one place to turn to achieve maximal geekiness: functional programming. Not only functional programming, but I thought this occasion called for a concurrency-oriented programming language. Yes, today was the perfect day for some Erlang.

I've done a few small projects with Erlang, but never before have I used OTP. So the goal is to write an OTP gen_server which would store a user's "presence" (think Twitter, Pownce, etc.) in memory. No, it's not a good idea. Yes, it's kind of fun. So here we go!

First we need to decide on an API:

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

post_presence(UserID, Presence) ->
    gen_server:cast(?SERVER, {post_presence, UserID, Presence}).

list_presence(UserID) ->
    gen_server:call(?SERVER, {list_presence, UserID}).

list_presence(UserID, Limit) ->
    gen_server:call(?SERVER, {list_presence, UserID, Limit}).

public_list(Limit) ->
    gen_server:call(?SERVER, {public_list, Limit}).

start_link will start the server. post_presence will post a user's current presence to the server. list_presence with one argument gets all of the specified user's presence information. With a second argument, it limits the amount of returned presence information to the specified number. Calling public_list means that you're getting the latest posts from anyone.

Before that, though, let's get some of the gen_server cruft out of the way:

-module(presence_database).
-behavior(gen_server).

-define(SERVER, ?MODULE).

%% gen_server API
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2,
    code_change/3]).

%% presence API
-export([start_link/0, post_presence/2, list_presence/1, list_presence/2,
    public_list/1]).

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVersion, State, _Extra) ->
    {ok, State}.

init([]) ->
    {ok, dict:new()}.

These are mostly things that we need to have in order for things to work. We define our module name, the behavior that this module mimics, and define a constant. Then, we export the required functions for gen_server and export our public presence api functions. Finally we implement really simple functions for those gen_server functions that we don't care much about.

Now since I couldn't figure out how to slice a list in Erlang, I wrote my own slice function. Make sure to flame me for this.

slice(List, Start, End) ->
    slice(List, Start, End, 0, []).

slice([], _Start, _End, _Index, Acc) ->
    lists:reverse(Acc);
slice([Item | Rest], Start, End, Index, Acc) ->
    case Index >= Start of
        true ->
            case Index < End of
                true ->
                    slice(Rest, Start, End, Index + 1, [Item | Acc]);
                false ->
                    slice(Rest, Start, End, Index + 1, Acc)
            end;
        _ ->
            slice(Rest, Start, End, Index + 1, Acc)
    end.

...and I'm pretty sure that using guard expressions this could be done in one case statement. Or maybe it could be done in a single list comprehension. I don't know. This works for our purposes.

Now we have to write functions called handle_cast and handle_call. handle_cast is essentially a server function which you don't expect a response from. Messages can queue up to this function and they will be handled sequentially but never return any value to the caller. handle_call is exactly the opposite. The caller is expecting a response, so this is a blocking operation.

Let's use handle_cast to accept new presence notifications:

handle_cast({post_presence, UserID, Presence}, State) ->
    case dict:is_key(UserID, State) of
        true ->
            {noreply, dict:append(UserID, {erlang:now(), Presence}, State)};
        _ ->
            {noreply, dict:store(UserID, [{erlang:now(), Presence}], State)}
    end;
handle_cast(_Msg, State) ->
    {noreply, State}.

In essence, we check to see if the user has registered their presence, and if so, we add their presence to their presence list. If they haven't submitted their presence before, we create a new presence list for them and add their submitted presence to that list. We have also generated a catchall version of the function for when the call doesn't match the signature {post_presence, UserID, Presence} for some reason.

Now let's do the harder one: the call to get various presence information from our server:

userfy_list(User, List) ->
    lists:map(fun({Time, Msg}) -> {User, Time, Msg} end, List).

handle_call({list_presence, UserID}, _From, State) ->
    case dict:find(UserID, State) of
        {ok, Value} ->
            {reply, Value, State};
        error ->
            {reply, {error, user_does_not_exist}, State}
    end;
handle_call({list_presence, UserID, Limit}, _From, State) ->
    case dict:find(UserID, State) of
        {ok, Value} ->
            {reply, lists:nthtail(Limit, Value), State};
        error ->
            {reply, {error, user_does_not_exist}, State}
    end;
handle_call({public_list, Limit}, _From, State) ->
    LatestEntries = lists:flatten([userfy_list(User, List) || {User, List} <- dict:to_list(State)]),
    Sorted = lists:sort(fun({_, A, _}, {_, B, _}) -> A > B end, LatestEntries),
    {reply, slice(Sorted, 0, Limit), State};
handle_call(_Request, _From, State) ->
    {reply, ok, State}.

The first version of handle_call is very straightforward. It simply looks up the list of presence information for a given user and returns that in the reply. The second version does a similar thing, but calls lists:nthtail on the value to limit the number of presence data that is included in that list.

The final version of handle_call is the most complicated, because we want to get information about everyone, order it by the date that it was posted, and limit the number of returned results by the specified limit. An added complexity is that we have to change the data format to include the name of the user who posted it.

First it uses a list comprehension to take the state dictionary and run our userfy_list function on every User/List pair. This will add the user to the presence tuple. Then we flatten that list to be just a single list of userfied tuples. Then, we sort that by the middle value (the timestamp of the presence). Finally, we use our newly-created slice function to take just the slice of information that we care about and return it to the user. Again, there is a catchall handle_call which discards incorrectly-formatted messages.

Demo

1> c(presence_database).
{ok,presence_database}
2> presence_database:start_link().
{ok,<0.37.0>}
3> presence_database:post_presence("ericflo", "I am at Super Happy Dev House").
ok
4> presence_database:post_presence("ericflo", "I am writing erlang code").
ok
5> presence_database:post_presence("dreid", "I am having fun at SHDH").
ok
6> presence_database:list_presence("ericflo").
[{{1226,816100,955},"I am at Super Happy Dev House"},
 {{1226,816113,249498},"I am writing erlang code"}]
7> presence_database:list_presence("dreid").
[{{1226,816135,937300},"I am having fun at SHDH"}]
8> presence_database:public_list(5).
[{"dreid",{1226,816135,937300},"I am having fun at SHDH"},
 {"ericflo",{1226,816113,249498},"I am writing erlang code"},
 {"ericflo",
  {1226,816100,955},
  "I am at Super Happy Dev House"}]
9> presence_database:public_list(2).
[{"dreid",{1226,816135,937300},"I am having fun at SHDH"},
 {"ericflo",{1226,816113,249498},"I am writing erlang code"}]

Conclusions

This really isn't robust. If we wanted it to be more robust, we should use some sort of persistent storage, we should use more than one process and do some sort of consistent hashing to distribute the load, and we should have a supervisor process to ensure that crashed processes restart correctly and reload their data.

All of that is the case, but yet through this simple exercise I've learned a ton about Erlang's gen_server module. I don't know if my explanations will do people any good, but hopefully they give a glimpse into the world of Erlang. So now that all of that is said, show me how to slice in Erlang!

EDIT: Commenter Mihai has made me aware of the lists:sublist function, which does exactly what I want. I can now discard my crappy slice function.

Search

Badges

  • django badge
  • apache badge
  • GeoURL
  • XFN Friendly
  • Valid HTML 4.01 Transitional