"Web Hooks"

A few months back GitHub rolled out its implementation of something that they call "Service Hooks". The idea behind these hooks is that when you commit some new piece of code to GitHub, they want to be able to alert other services that you have committed that code. For example, one of the service hooks is the ability to send a tweet to Twitter, and another of those hooks updates the Lighthouse ticket tracker.

I thought this was a really good idea when they rolled it out, so I did a bit of searching and found out that there is a larger body of work surrounding this idea, and that body of work is called Web Hooks. The central idea behind web hooks is that a user supplies a service that they use with a URL. Then, when that user performs an action on that service, the service agrees to send an HTTP POST request to the user's specified URL, with some information about the action that the user took on the service.
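To make that concrete, here is a minimal sketch of what the service's side of the bargain might look like in Python. Everything named here is made up for illustration: the `HOOK_URLS` registry, the `example.com` URL, and the JSON payload layout are assumptions, not how GitHub or anyone else actually does it.

```python
import json
import urllib.request

# Hypothetical registry: user name -> the URL that user supplied to the service.
HOOK_URLS = {"alice": "http://example.com/my-hook"}


def build_hook_request(user, action, details):
    """Build the POST request the service would send after a user action."""
    payload = json.dumps({"user": user, "action": action, "details": details})
    return urllib.request.Request(
        HOOK_URLS[user],
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(user, action, details, timeout=5):
    """Send the POST to the user's URL and return the HTTP status code."""
    request = build_hook_request(user, action, details)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The service would call something like `notify("alice", "commit", {"message": "fix typo"})` right after handling the action. Building the request is kept separate from sending it so the payload can be inspected without touching the network.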

SlideShare has an excellent presentation deck about this idea, which likens it to Unix pipes. That analogy makes a lot of sense if you think about it. With the standard model that most websites follow today, a client can only send requests. This means repeated polling until the client receives the information that it is interested in. With web hooks, however, the service is responsible for passing that action along to the next service. This simple yet powerful mechanism can allow for very advanced systems which interoperate very simply through chaining.

Let's explore a concrete example of what this might look like. A few months back I signed up for a pro account on Flickr, so that I could upload some of the pictures that I had stored on my computer. I uploaded some pictures with descriptions, and then I went and posted links to some of those pictures on Twitter. I also added that new Flickr account to FriendFeed so that others could see my pictures as well.

This was all a manual process. If both Flickr and Twitter supported web hooks, I could have simply set up their respective URLs and uploaded my pictures. The process might have happened like this: First, the pictures are uploaded. Then Flickr sends a POST request to Twitter, with the description of the picture and a link to the picture. Twitter sends a POST request to FriendFeed, adding the new item to my FriendFeed lifestream.

You could even write custom scripts to handle the web hooks. For example, let's say that I want any tweet containing the name 'Kevin' to be sent to my brother's email address. I could give Twitter a URL pointing to a script on my computer which scans the contents of each tweet. If the tweet has the name 'Kevin' in it, the script sends an email. If not, it does nothing.
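A script like that could be quite small. The sketch below is a guess at its shape, since Twitter offers no such hook: the form field name `text` is an assumption, and `send_email` is a placeholder that just records the tweet where a real script might use `smtplib`.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs

FORWARDED = []  # stand-in for an outbox, so the sketch runs without a mail server


def send_email(tweet_text):
    # Placeholder: a real script might send mail with smtplib here.
    FORWARDED.append(tweet_text)


def handle_tweet(tweet_text, notify=send_email):
    """Forward the tweet if it mentions Kevin; return whether it was forwarded."""
    if "Kevin" in tweet_text:
        notify(tweet_text)
        return True
    return False


class HookHandler(BaseHTTPRequestHandler):
    """Receives the POST from the service and hands the tweet to handle_tweet."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode("utf-8"))
        handle_tweet(fields.get("text", [""])[0])  # "text" is an assumed field name
        self.send_response(204)
        self.end_headers()
```

To actually listen for hooks, you would start it with something like `HTTPServer(("", 8000), HookHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and give the service that machine's URL.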

I think that this concept is very powerful not only in terms of rendering trivial the interoperability between disparate services, but also in terms of simply saving on bandwidth and computing power. Technologies which constantly poll resources hoping for updated content seem silly in comparison to the powerful simplicity that web hooks provide.

There are definitely some drawbacks to a system like this. Firstly, the name: I actually can't think of a worse name for this concept. Web hooks?! Let's come up with something better. All joking aside though, this type of system does face a serious problem when it comes to the question of reliability. If a script receives no POST, it could mean that either no event happened, or that the internet connection went down for a bit, or that the service is down, or any number of other possible things. I think the solution for this is a hybrid model of sparse polling in conjunction with web hooks.
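One way the hybrid model might work, as a rough sketch: let hooks deliver events as they happen, and fall back to a poll only after a long stretch of silence. The `HybridWatcher` class and its one-hour default are my own invention for illustration. Note that this only resolves prolonged silence; it does nothing for a single hook that gets lost between two that arrive.

```python
import time


class HybridWatcher:
    """Tracks when we last heard from the service, by hook or by poll."""

    def __init__(self, poll_interval=3600.0, clock=time.monotonic):
        self.poll_interval = poll_interval  # seconds of silence before polling
        self.clock = clock
        self.last_contact = clock()

    def on_hook(self):
        """Call whenever a web hook POST arrives."""
        self.last_contact = self.clock()

    def on_poll(self):
        """Call after a successful poll of the service."""
        self.last_contact = self.clock()

    def should_poll(self):
        """True once nothing has been heard for a full interval."""
        return self.clock() - self.last_contact >= self.poll_interval
```

A background loop would check `should_poll()` every so often and only hit the service when it returns true, so a healthy stream of hooks keeps the polling close to zero.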

Most of all, this technology just seems so underused. There are ridiculously few people who implement something like this, yet it seems like an undeniably useful service--especially given its relative simplicity to implement. Let's all try to encourage the services that we use on a daily basis to support web hooks, because by doing just that, we can make the web a lot better.

The internet is in immediate danger of collapse

Wow, that's quite a title! You're already probably queuing up all of your counterpoints and your rebuttals. In fact it's not quite that serious, but a seriously worrying trend is emerging that I'd like to address.

The Problem

Today many sites are so totally dependent on 3rd-party services that when certain services go down, a chain of outages ends up knocking out many of the sites that we use on a daily basis--some for critical business applications. But I'm being too abstract, so let's take a concrete example of this: Google Analytics.

Back in late 2007, Google's analytics and monitoring service went down with no explanation. Most sites at that time had installed a line of JavaScript in their HTML head element which made a call to document.write. This caused the rest of the page to stop and wait for Google's servers. Normally this is not a bad thing, because Google has some pretty fast and reliable web servers. But during this 24-hour period, anyone who had this code in their head element had an absolutely broken site. Users could not see the site at all. And it was no fault of either the site developers or of the users--just Google's fault.

Another example: Earlier this year, Amazon had an outage in their popular S3 file storage service for several hours. At the time, you didn't need to be very tech savvy or know much about computers to know that something was seriously wrong with the internet. Sites from across the net were throwing 500 errors, looking completely awful without their media files, and the internet simply became a pretty awful place to get things done. From one company. Having a problem with one service.

And since then we have become even more dependent on 3rd party services for even more widgets, "cloud computing", and more. Frankly this "cloud computing" craze scares the hell out of me. The more interconnected our various bits of HTML and HTTP are, the more chances there are for massive catastrophe. Just look at the credit default swaps problem we're having in the USA for another concrete example of how this type of interdependence can fail in catastrophic ways.

The Solution

Services like S3, Google Analytics, and even Twitter are great services. They add lots of value for larger businesses and even more for a startup, so there's a large incentive to use them. I think that's absolutely fine and is actually a good idea. That being said, we need to manage our use of these services in a responsible way. Instead of storing data directly to S3, store it on a server and asynchronously upload it to S3. That way, you can set up an S3-pinger and if it goes down you can have the server automatically switch to serving the media itself.
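Here is roughly what I have in mind for the pinger and the switch, as a sketch rather than a finished tool. The bucket name, the `media.example.com` host, and both function names are hypothetical, and it assumes the files already live on your own server before being copied up to S3.

```python
import urllib.error
import urllib.request

S3_BASE = "http://example-bucket.s3.amazonaws.com"  # hypothetical bucket
LOCAL_BASE = "http://media.example.com"             # hypothetical own server


def s3_is_up(probe_url, timeout=3):
    """Ping S3 with a HEAD request; any HTTP error or network failure counts as down."""
    request = urllib.request.Request(probe_url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


def media_url(path, s3_available):
    """Point at S3 when it is healthy, otherwise at the local copy."""
    base = S3_BASE if s3_available else LOCAL_BASE
    return base + "/" + path.lstrip("/")
```

The pinger would run on a timer against a small known file and store the result somewhere the templates can read it, so the switch costs nothing per request.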

We need to build standardized tools that fetch data from web services and store it locally, from where it is served to the user. We need to build systems that asynchronously sync data bidirectionally between all of these different web services and ensure that the integrity of our data on the web is sound. Right now this is a tedious and error-prone task, but we can do better. We can build cross-platform tools and libraries that will solve this problem, allow us to use 3rd-party services, and let us rest easy knowing that tomorrow, no matter what happens to Amazon, the internet will still be around.

DISCLAIMER: This is almost entirely a ripoff of a talk given by Timothy Fitz at Super Happy Dev House last month in San Francisco. While I think it's a really good point I can't take credit for having been the first to worry about it.
