12.1 Introducing Publish and Subscribe | Content Syndication with RSS

The problems of efficient delivery of information have existed for far longer than RSS has, as have techniques for addressing them. Indeed, both the systems described in this chapter follow the classic Gang of Four "Observer" design pattern. It is perhaps from the other title of this pattern that the name of the technique arises: Publish and Subscribe.

Let's think of the situation from above. We have a feed, and this feed has its users. The users take the feed and do what they will (we've already seen some examples of potential uses in the previous chapters), but each of the users depends on his copy of the feed (whether in memory, or converted to another format and saved) being up to date. For this to happen, the users subscribe to a system that watches the feed continuously. This system then publishes notifications to all of the users when the feed changes. The users are then responsible for updating their copies of the feed, ordinarily by requesting it from the server.
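The relationship just described can be sketched in a few lines of Python. This is only an in-process illustration of the Observer pattern, not part of either RSS mechanism; the class name `FeedWatcher`, its methods, and the example URLs are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: an in-process illustration only.
Callback = Callable[[str], None]


class FeedWatcher:
    """Keeps a list of subscribers per feed URL and notifies them of changes."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, feed_url: str, callback: Callback) -> None:
        """Register a callback to be called when feed_url changes."""
        self._subscribers.setdefault(feed_url, []).append(callback)

    def publish(self, feed_url: str) -> int:
        """Tell every subscriber of feed_url that it changed; return how many were told."""
        callbacks = self._subscribers.get(feed_url, [])
        for callback in callbacks:
            callback(feed_url)
        return len(callbacks)


watcher = FeedWatcher()
stale_feeds: List[str] = []
# The subscriber only records that its copy is stale; refetching the feed is its own job.
watcher.subscribe("http://example.org/index.rss", stale_feeds.append)
watcher.publish("http://example.org/index.rss")
print(stale_feeds)
```

Note that the notification carries only the feed's URL, not its content, which mirrors the division of labor described above: the watcher announces the change, and each user fetches the fresh copy.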

There are two ways of implementing such a system within RSS: one with the relevant elements included in RSS 0.92 and 2.0, and the other suggested by an RSS 1.0 module. We will deal with these in turn.

12.1.1 Publish and Subscribe Within RSS 0.92 and 2.0

RSS 0.92's Publish and Subscribe system is the result of work by Userland Software and its CEO, Dave Winer. It was introduced with, and perhaps inspired, the release of RSS 0.92, and features heavily in Userland products. Furthermore, its workings are based on XML-RPC and SOAP, two protocols that Winer was instrumental in starting. But that is not to say that RSS 0.92's Publish and Subscribe is in any way proprietary; it is not.

We have three characters to watch in this drama: the user, the feed, and the Publish and Subscribe system, better known as the cloud. Here is the process:

1. The user's system sends a message, via XML-RPC, SOAP, or HTTP-POST, to the cloud to subscribe. This message contains five parameters:
   - The name of the procedure that the cloud should call to notify the user of changes.
   - The TCP port on which the user's system is listening.
   - The path to the user's system. This is all that is needed, as the user's system will be running either XML-RPC or SOAP, both of which use HTTP as their transport layer, and the cloud can determine the IP address of the caller from the initial request. This has a nice security benefit: one user cannot make a registration call on behalf of another.
   - A string indicating which protocol to use, either XML-RPC or SOAP, when the cloud messages the user.
   - A list of URLs of RSS files to be watched.
   If the registration is successful, the cloud returns true; if not, it returns an error.
2. Somewhat later on, the cloud either detects or is informed of a change in the feed.
3. The cloud messages the user using the protocol requested, giving the URL of the changed feed.
4. The user requests the fresh feed from its server and does what she likes with it.
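As a rough illustration of the subscription message in the first step, the following Python sketch builds an XML-RPC request body carrying the five parameters in the order listed. It uses the standard library's `xmlrpc.client.dumps` so that nothing is sent over the network. The procedure names, port, path, and feed URL are all hypothetical; in practice the cloud's procedure name comes from the feed's cloud element.

```python
import xmlrpc.client


def build_registration_call(register_procedure, notify_procedure, port, path,
                            protocol, feed_urls):
    """Return the XML-RPC request body for a subscription call.

    The five parameters are passed in the order listed in the text.
    """
    params = (notify_procedure, port, path, protocol, list(feed_urls))
    return xmlrpc.client.dumps(params, methodname=register_procedure)


request_body = build_registration_call(
    register_procedure="exampleCloud.pleaseNotify",  # hypothetical cloud procedure
    notify_procedure="subscriber.feedChanged",       # hypothetical procedure on the user's system
    port=8080,                                       # hypothetical listening port
    path="/RPC2",                                    # hypothetical path to the user's listener
    protocol="xml-rpc",
    feed_urls=["http://example.org/index.rss"],      # hypothetical feed
)
print(request_body)
```

To send the call for real, one option is `xmlrpc.client.ServerProxy`, pointed at a URL assembled from the cloud's domain, port, and path.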

Note that by design, the RSS 0.92 Publish and Subscribe system expires subscriptions after 25 hours, forcing them to be renewed every day.
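A subscriber therefore needs some way to decide when to register again. The sketch below is one possible approach, assuming a renewal once every 24 hours so that there is an hour of slack before the 25-hour expiry; the renewal interval and the function names are assumptions, not part of the specification.

```python
from datetime import datetime, timedelta

SUBSCRIPTION_LIFETIME = timedelta(hours=25)  # expiry period stated in the text
RENEWAL_INTERVAL = timedelta(hours=24)       # assumption: renew daily, an hour before expiry


def needs_renewal(registered_at, now):
    """True once a day has passed since the last successful registration."""
    return now - registered_at >= RENEWAL_INTERVAL


def has_expired(registered_at, now):
    """True once the cloud will have dropped the subscription."""
    return now - registered_at >= SUBSCRIPTION_LIFETIME
```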

As we have already seen in Chapter 4, the RSS 0.92 Publish and Subscribe system is denoted in the RSS feed with the cloud element:

 <cloud domain="DOMAIN NAME or IP ADDRESS of the cloud"
        port="TCP port on which the cloud is listening"
        path="The path to the cloud's listener"
        registerProcedure="The procedure name to register with the cloud"
        protocol="either xml-rpc or soap, case-sensitive" />
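Before subscribing, a client has to read these attributes out of the feed. The following Python sketch shows one way to do so with the standard library's `xml.etree.ElementTree`. The sample feed, its attribute values, and the function name `read_cloud` are invented for illustration, and the sketch assumes that all five attributes are present.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: every value in the cloud element is made up for this example.
SAMPLE_FEED = """<rss version="0.92">
  <channel>
    <title>Example feed</title>
    <cloud domain="cloud.example.org" port="80" path="/RPC2"
           registerProcedure="exampleCloud.pleaseNotify" protocol="xml-rpc" />
  </channel>
</rss>"""


def read_cloud(feed_xml):
    """Return the cloud element's attributes as a dict, or None if the feed has no cloud."""
    cloud = ET.fromstring(feed_xml).find("channel/cloud")
    if cloud is None:
        return None
    settings = dict(cloud.attrib)
    settings["port"] = int(settings["port"])  # the port is a number; the rest stay as strings
    return settings


cloud_settings = read_cloud(SAMPLE_FEED)
print(cloud_settings)
```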

We have talked of passing messages via XML-RPC or SOAP, and these may require some explanation. The two protocols are the basis of the fashionable technology known collectively as web services. We have already used some web services in previous chapters: getting information from Google using SOAP, for example. With those examples, we were passing a query, encoded in XML, and getting a set of results back, also encoded in XML. In the case of RSS 0.92 Publish and Subscribe, we are doing something similar: passing XML-encoded messages between systems. What makes the RSS 0.92 Publish and Subscribe system different from the other uses of web services we have already seen is that the user's system has to not only pass a message and wait for a reply, but also continually listen for other systems trying to talk to it. For this reason, it cannot be used with machines that reside behind Network Address Translation (NAT) systems. The user's machine must be directly addressable from the rest of the Internet, and it must be listening for people trying to do so.
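The listening half of the user's system might look something like the following Python sketch, which uses the standard library's `xmlrpc.server.SimpleXMLRPCServer`. It is an illustration under stated assumptions rather than a complete subscriber: the procedure name `subscriber.feedChanged`, the port, and the choice simply to record the changed URL are all hypothetical.

```python
from xmlrpc.server import SimpleXMLRPCServer

changed_feeds = []  # URLs the cloud has reported as changed, awaiting a refetch


def feed_changed(feed_url):
    """Procedure the cloud calls; it only records the URL so the feed can be refetched later."""
    changed_feeds.append(feed_url)
    return True


def make_listener(host="0.0.0.0", port=8080):
    """Create (but do not start) an XML-RPC server exposing the notification procedure.

    The default request handler answers on the paths "/" and "/RPC2".
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(feed_changed, "subscriber.feedChanged")
    return server

# To run the listener until interrupted: make_listener().serve_forever()
```

The name registered here is the one the user would pass to the cloud as the notification procedure when subscribing, along with the port and path on which this server listens.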

To see how this works in practice, we will now build our own basic RSS 0.92 Publish and Subscribe system.

Learning XML-RPC and SOAP

O'Reilly publishes some very good books on the implementation of XML-RPC and SOAP. You may be interested in reading the following:

Programming Web Services with XML-RPC, by Simon St. Laurent, Joe Johnston, and Edd Dumbill
Web Services Essentials, by Ethan Cerami
Programming Web Services with SOAP, by Doug Tidwell, James Snell, and Pavel Kulchenko

RSS 0.92 Publish and Subscribe has, at the time of this writing, been publicly implemented twice. Userland Software incorporates it directly into their own products, and NewsIsFree (http://www.newsisfree.com), the RSS aggregator site run by Mike Krus, uses it as well. Both services produce feeds with a cloud element, and both run servers that will notify you of changes to the feeds.

These are both nice services, but what if you want to use your own Publish and Subscribe system to update, say, a web site's rendering of a feed? No problem: we just roll our own Publish and Subscribe system, as we will in a moment.

12.1.2 Publish and Subscribe with RSS 1.0

As it stands, the RSS 0.92 version of Publish and Subscribe is currently unavailable to standard RSS 1.0 users; it exists neither in the core specification nor in any published modules known to me. But that is not to say that you cannot create a module to import the cloud element. You are, of course, at liberty to do so.

Meanwhile, the rest of us may be looking at the mod_changedpage module. This Publish and Subscribe module, written by Aaron Swartz, is fundamentally different from the system used with RSS 0.92. Where the latter uses web services protocols to communicate between systems, mod_changedpage uses simple HTTP POST procedures to handle the same job.

It works like this: the RSS 1.0 feed's channel contains the single element cp:server, which contains simply the URL of the Publish and Subscribe service for the feed. A user who wants to subscribe to notifications for this feed sends an HTTP POST request to that URL, with two parameters:

- responder: the URL of the user's mod_changedpage system
- target: the URL of the feed we want to monitor

For example:

responder=http%3A%2F%2FURL.OFUSERSSYSTEM&target=http%3A%2F%2FURL.OF.FEED

(Notice that the :// of each URL within the parameters is percent-encoded, or URL-encoded, as %3A%2F%2F.)
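In Python, this encoding need not be done by hand. The sketch below uses `urllib.parse.urlencode` from the standard library to build the body of the subscription request; both URLs and the function name are hypothetical.

```python
from urllib.parse import urlencode


def build_subscription_body(responder_url, target_url):
    """Return the form-encoded body of a mod_changedpage subscription request."""
    return urlencode({"responder": responder_url, "target": target_url})


body = build_subscription_body(
    "http://subscriber.example.org/changedpage",  # hypothetical responder
    "http://example.org/index.rdf",               # hypothetical feed to monitor
)
print(body)
```

Note that `urlencode` also encodes the slashes in the path as %2F. To send the request, the body could be passed (encoded to bytes) as the `data` argument of `urllib.request.Request`, together with a Content-Type header of application/x-www-form-urlencoded.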

The original proposal document for this specification states:

Additional attributes should be allowed before or after the URL. Implementations should ignore attributes they don't understand. The order of the attributes is not significant. Content-Type should be specified as "application/x-www-form-urlencoded".

If this sounds familiar, it should. This is how HTML forms are encoded, as described in the HTML specification.

Either way, the URL is received at the Publish and Subscribe service, which returns an HTTP status code of 200 if the subscription is successful and an HTTP status code of 400 if the subscription fails (it is also recommended that a 400 code be accompanied by a plain-text file explaining the error as far as possible).
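The service's side of this exchange can be sketched as a single Python function that takes the POST body and returns a status code with a plain-text message. The function name, the in-memory dictionary of subscriptions, and the wording of the messages are assumptions made for illustration; a real service would sit behind a web server and would probably store subscriptions persistently.

```python
from urllib.parse import parse_qs


def handle_subscription(body, subscriptions):
    """Process a subscription POST body; return (HTTP status code, plain-text message).

    subscriptions maps each feed URL to the set of responder URLs watching it.
    Parameters other than responder and target are ignored.
    """
    fields = parse_qs(body)
    responder = fields.get("responder", [""])[0]
    target = fields.get("target", [""])[0]
    if not responder or not target:
        return 400, "Both 'responder' and 'target' are required.\n"
    subscriptions.setdefault(target, set()).add(responder)
    return 200, "Subscribed.\n"
```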

When the feed updates, the Publish and Subscribe system sends a similar HTTP POST request, with one parameter, the URL of the changed feed:

url=http%3A%2F%2FURL.OF.CHANGEDFEED

The user's system can then do what it likes with that feed.
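On the receiving end, the first job is to recover the feed's URL from the notification body. A minimal Python sketch, with a hypothetical function name and example URL, might look like this:

```python
from urllib.parse import parse_qs


def changed_feed_url(body):
    """Return the feed URL from a notification body, or None if the 'url' parameter is absent."""
    values = parse_qs(body).get("url")
    return values[0] if values else None


print(changed_feed_url("url=http%3A%2F%2Fexample.org%2Findex.rdf"))
```

`parse_qs` undoes the percent-encoding, so the result is an ordinary URL that can be fetched directly.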