Section 9.2. Publish and Subscribe | Developing Feeds with RSS and Atom

9.2. Publish and Subscribe

The traditional guidelines for the use of syndication feeds present a small problem. There is always going to be a balance between the need to be up to date and the need to refrain from abusing the feed publisher's server by requesting the feed every few seconds. While the norm is to request the feed a maximum of once an hour, many feeds deserve to be followed much more closely. Conversely, many feeds update only once a day, or less often. Requesting those once an hour is a waste of time and resources and potentially expensive for the publisher. On top of the compressed feeds and conditional GET already discussed, one other idea has long existed within the feed world: Publish and Subscribe.

Let's think of the earlier situation. We have a feed, and this feed has its users. The users take the feed and do what they will (I've presented some examples of potential uses in the previous chapters), but each user depends on his copy of the feed (whether in memory, or converted to another format and saved) being up to date. In order for this to happen, the users subscribe to a system that watches the feed continuously. This system then publishes notifications to all of the users when the feed changes. The users are then responsible for updating their copies of the feed, ordinarily by requesting it from the server.

Currently, there is no Publish and Subscribe mechanism for Atom, but there are two ways to implement such a system within RSS: one with the relevant elements included in RSS 2.0, and the other suggested by an RSS 1.0 module. We will deal with these in turn.

9.2.1. Publish and Subscribe Within RSS 2.0

RSS 2.0's Publish and Subscribe system is the result of work by Userland Software and its CEO Dave Winer. It was introduced with, and perhaps inspired, the release of RSS 0.92 and features heavily in Userland products. Furthermore, its workings are based on XML-RPC and SOAP, two protocols that Winer was instrumental in starting. But that is not to say that RSS 2.0's Publish and Subscribe is in any way proprietary; it is not.

There are three characters to watch in this drama: the user, the feed, and the Publish and Subscribe system, itself better known as the cloud. Here is the process:

1. The user's system sends a message, via XML-RPC, SOAP, or HTTP POST, to the cloud to subscribe. This message contains five parameters:
   - The name of the procedure that the cloud should call to notify the user of changes.
   - The TCP port on which the user's system is listening.
   - The path to the user's system. This is all that is needed because the user's system will be running either XML-RPC or SOAP, both of which use HTTP as their transport layer, and the cloud can determine the IP address of the caller from the initial request. This has a nice security benefit: one user cannot make a registration call on behalf of another.
   - A string indicating which protocol to use, either XML-RPC or SOAP, when the cloud messages the user.
   - A list of URLs of RSS files to be watched.
   If the registration is successful, the cloud returns true; if not, it returns an error.
2. Somewhat later on, the cloud either detects or is informed of a change in the feed.
3. The cloud messages the user using the protocol requested, giving the URL of the changed feed.
4. The user requests the fresh feed from its server and does what she likes with it.
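The registration message in the first step can be sketched with Python's standard xmlrpc.client module. This is a minimal illustration, not a definitive client: the parameter order simply follows the list above, and the procedure names, port, path, and feed URL are hypothetical placeholders. In practice the registration procedure name would be read from the feed's cloud element.

```python
import xmlrpc.client


def build_subscribe_request(register_procedure, notify_procedure,
                            port, path, protocol, feed_urls):
    """Serialize the five subscription parameters as an XML-RPC method call."""
    # Order follows the text: callback procedure, port, path, protocol, feeds.
    params = (notify_procedure, port, path, protocol, list(feed_urls))
    return xmlrpc.client.dumps(params, methodname=register_procedure)


# All values below are assumptions made up for this example.
payload = build_subscribe_request(
    register_procedure="cloud.pleaseNotify",
    notify_procedure="subscriber.feedChanged",
    port=8080,
    path="/RPC2",
    protocol="xml-rpc",
    feed_urls=["http://publisher.example/rss.xml"],
)
print(payload)
```

The sketch only builds the request body so that it runs offline; actually sending it would mean posting this XML to the cloud's listener, for instance through xmlrpc.client.ServerProxy.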

Note that by design, the RSS 2.0 Publish and Subscribe system expires subscriptions after 25 hours, forcing them to be renewed every day.
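A subscriber therefore needs some bookkeeping to renew in time. The following is a small sketch of that check; the 25-hour lifetime comes from the text, while the choice to renew every 24 hours is an assumption that merely leaves an hour of slack.

```python
from datetime import datetime, timedelta

SUBSCRIPTION_LIFETIME = timedelta(hours=25)  # expiry stated in the text
RENEWAL_INTERVAL = timedelta(hours=24)       # assumption: renew once a day


def needs_renewal(registered_at, now):
    """True once a day has passed since the last successful registration."""
    return now - registered_at >= RENEWAL_INTERVAL


def has_expired(registered_at, now):
    """True once the cloud would have dropped the subscription."""
    return now - registered_at >= SUBSCRIPTION_LIFETIME


registered = datetime(2005, 1, 1, 12, 0)
print(needs_renewal(registered, registered + timedelta(hours=24)))
print(has_expired(registered, registered + timedelta(hours=24)))
```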

As I described in Chapter 4, the RSS 2.0 Publish and Subscribe system is denoted in the RSS feed with the cloud element:

<cloud domain="Domain or IP address of the cloud"
       port="TCP port on which the cloud is listening"
       path="The path to the cloud's listener"
       registerProcedure="The procedure name to register with the cloud"
       protocol="Either xml-rpc or soap, case-sensitive" />
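To show how a subscriber might read these attributes, here is a minimal sketch using Python's standard xml.etree.ElementTree. The sample feed and every value in it (domain, port, path, procedure name) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; none of these values come from a real service.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://publisher.example/</link>
    <description>Feed used only for illustration</description>
    <cloud domain="cloud.example" port="80" path="/RPC2"
           registerProcedure="cloud.pleaseNotify" protocol="xml-rpc" />
  </channel>
</rss>"""


def read_cloud(feed_xml):
    """Return the cloud attributes as a dict, or None if the feed has no cloud."""
    cloud = ET.fromstring(feed_xml).find("channel/cloud")
    if cloud is None:
        return None
    info = dict(cloud.attrib)
    info["port"] = int(info["port"])  # the port is carried as text in the XML
    return info


print(read_cloud(SAMPLE_FEED))
```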

We have talked of passing messages via XML-RPC or SOAP, and these may require some explanation. The two protocols are the basis of the fashionable technology known collectively as web services. We have already used some web services in previous chapters: getting information from Google using SOAP, for example. With those examples, we were passing a query, encoded in XML, and getting a set of results back, also encoded in XML. In the case of RSS 2.0 Publish and Subscribe, we are doing something similar: passing XML-encoded messages between systems. What makes the RSS 2.0 Publish and Subscribe system different from the other uses of web services you have already seen is that the user's system has to not only pass a message and wait for a reply, but also continually listen for other systems trying to talk to it. For this reason, it cannot be used with machines that reside behind Network Address Translation (NAT) systems. The user's machine must be directly addressable from the rest of the Internet, and it must be listening for others trying to reach it.
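The listening half of the subscriber can be sketched with Python's standard xmlrpc.server module. This is an illustration under assumptions rather than a finished service: the procedure name subscriber.feedChanged and the port are hypothetical, and the handler only records the URL it is given instead of fetching the feed.

```python
from xmlrpc.server import SimpleXMLRPCServer

changed_feeds = []  # URLs the cloud has reported as changed


def feed_changed(feed_url):
    """Called remotely by the cloud; note the URL and acknowledge."""
    changed_feeds.append(feed_url)
    return True


def make_listener(port, procedure_name="subscriber.feedChanged", host="0.0.0.0"):
    """Create (but do not start) an XML-RPC server exposing the callback."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # The name registered here must match the one sent when subscribing.
    server.register_function(feed_changed, procedure_name)
    return server
```

Calling make_listener(8080).serve_forever() would start accepting notifications. The machine still has to be reachable from the cloud on that port, which is exactly the NAT limitation described above.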

O'Reilly publishes some very good books on the implementation of XML-RPC and SOAP. You may be interested in reading the following:

Programming Web Services with XML-RPC
Web Services Essentials
Programming Web Services with SOAP

RSS 2.0 Publish and Subscribe has been publicly implemented at least twice. Userland Software incorporates it directly into its own products, and NewsIsFree (http://www.newsisfree.com), the RSS aggregator site run by Mike Krus, uses it as well. Both services produce feeds with a cloud element, and both run servers that will notify you of changes to the feeds.

These are both nice services, but what if you want to use your own Publish and Subscribe system to update, say, a web site's rendering of a feed? No problem: we'll just roll our own Publish and Subscribe system (see Section 9.3).

9.2.2. Publish and Subscribe with RSS 1.0

The RSS 2.0 version of Publish and Subscribe is currently unavailable to standard RSS 1.0 users; it exists neither in the core specification nor in any published module. But that is not to say that you cannot create a module to import the cloud element. You are, of course, at liberty to do so.

Meanwhile, the rest of us may be looking at the mod_changedpage module. This Publish and Subscribe module, written by Aaron Swartz, is fundamentally different from the system used with RSS 2.0. Where the latter uses web services protocols to communicate between systems, mod_changedpage uses simple HTTP POST procedures to handle the same job.

It works like this: the RSS 1.0 feed's channel contains the single element cp:server, which contains simply the URL of the Publish and Subscribe service for the feed. A user who wants to subscribe to notifications for this feed sends an HTTP POST request to that URL, taking two parameters:

- responder: The URL of the user's mod_changedpage system
- target: The URL of the feed to be monitored

For example:

responder=http%3A%2F%2FURL.OF.USERS.SYSTEM&target=http%3A%2F%2FURL.OF.FEED

Note that the :// of each URL within the parameters is percent-encoded (URL-encoded) as %3A%2F%2F.
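As a sketch of how a subscriber might assemble this request, the following uses Python's standard urllib. The service, responder, and feed URLs are hypothetical, and the request object is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subscription(cp_server, responder, target):
    """Return the form-encoded body and a ready-to-send POST request."""
    # urlencode percent-encodes ':' and '/' in the values for us.
    body = urlencode([("responder", responder), ("target", target)])
    request = Request(
        cp_server,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    return body, request


# Every URL here is an assumption made up for the example.
body, request = build_subscription(
    cp_server="http://publisher.example/changedpage",
    responder="http://subscriber.example/notify",
    target="http://publisher.example/index.rdf",
)
print(body)
```

Passing the request to urllib.request.urlopen would submit it; the value of cp_server would normally come from the feed's cp:server element.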

The original proposal document for this specification states:

Additional attributes should be allowed before or after the URL. Implementations should ignore attributes they don't understand. The order of the attributes is not significant. Content-Type should be specified as "application/x-www-form-urlencoded."

If this sounds familiar, it should. This is how HTML forms are encoded, as described in the HTML spec.

The request is received at the Publish and Subscribe service, which returns an HTTP status code of 200 if the subscription succeeds and an HTTP status code of 400 if it fails (it is also recommended that a 400 code be accompanied by a plain-text explanation of the error that is as thorough as possible).
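On the service side, the handling of a subscription request might look like the sketch below. It assumes an in-memory store and treats a missing parameter as the only failure case; a real service would probably validate more. Unknown parameters are simply ignored, in line with the proposal quoted above.

```python
from urllib.parse import parse_qs

subscriptions = {}  # feed URL -> set of responder URLs (in-memory only)


def handle_subscription(body):
    """Return (status code, plain-text message) for a form-encoded request."""
    fields = parse_qs(body)  # also decodes the percent-encoding
    responder = fields.get("responder", [""])[0]
    target = fields.get("target", [""])[0]
    missing = [name for name, value in
               (("responder", responder), ("target", target)) if not value]
    if missing:
        # A 400 should come with a plain-text explanation of the error.
        return 400, "Missing parameter(s): " + ", ".join(missing)
    subscriptions.setdefault(target, set()).add(responder)
    return 200, "Subscribed"
```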

When the feed updates, the Publish and Subscribe system sends a similar HTTP POST request, with one parameter, the URL of the changed feed:

url=http%3A%2F%2FURL.OF.CHANGEDFEED

The user's system can then do what it likes with that feed.
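Both ends of that notification can be sketched in a few lines with Python's standard urllib.parse. The feed URL is a hypothetical placeholder, and what the responder does once it has the URL (typically refetching the feed) is left out.

```python
from urllib.parse import parse_qs, urlencode


def build_notification(feed_url):
    """Body the Publish and Subscribe service would POST to each responder."""
    return urlencode({"url": feed_url})


def read_notification(body):
    """Extract the changed feed's URL on the responder side, or None."""
    values = parse_qs(body).get("url")
    return values[0] if values else None


notification = build_notification("http://publisher.example/index.rdf")
print(notification)
print(read_notification(notification))
```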