Chapter 9. Feeds in the Wild


When you have to kill a man, it costs nothing to be polite.

Sir Winston Churchill

Now that we're fully capable of creating and consuming feeds, we can start to use them with a flourish. The final chapters of this book aim to help you do just that. This one deals with the place a feed has in the wider world of the Internet: how it should dress, talk, and hold itself in finer society.


    9.1. Once You Have Created Your Simple RSS Feed

    Once you have created your feed, there are just one or two more things to do. None of these are mandatory, but they are all so simple, and give so much to the richness of the Net, that you are encouraged to invest the little time needed. You can work through these one by one in about half an hour.

    9.1.1. Publish a Link

    Place a link to the RSS feed on your page! People regularly forget to do this and then wonder, after looking at their server logs, why no one is subscribed to their feed. Standard icons are emerging from each of the news aggregators and desktop readers; some of these are freely available for this use, but even a simple text link is better than nothing at all. The original icon for RSS (see Figure 9-1) was the white-on-orange XML logo, produced by Userland Software. Most of the other logos (see Figures 9-2 and 9-3) stem artistically from this one vision. With the advent of Atom, the wording on the button is beginning to change. My prediction is that it will change to the word "Feed" over time, but the arguments over the standard syndication feed button have been almost as fraught as those over the standards themselves. At time of writing, for example, it looks as if the next version of Apple's Mac OS X will contain an RSS reader that uses a version of the button colored blue. A schism looms on the horizon for those who care about such things.

    Figure 9-1. The original XML button


    Figure 9-2. The XML button with the Radio Userland coffee cup logo


    Figure 9-3. A further, very common development of an RSS button


    Either way, it really does pay to make your feeds as noticeable as possible. One way to do this, of course, is to take the human out of the equation entirely and use autodiscovery.

    9.1.2. Enabling Autodiscovery

    Any web page with a corresponding feed document should contain an autodiscovery link pointing to said feed. This allows programs to determine the location of the feed even when they are only given the URL of the page itself. So the user doesn't really need to know about feeds at all. She can just point her feed application at the page she likes, and the rest happens automatically. Some browsers also automatically display the RSS badge, or a variant thereof, whenever you're looking at a page with an autodiscovery link.

    To do this, simply add the relevant link from those in Example 9-1 to your HTML page inside the <head> section.

    Example 9-1. Autodiscovery links
    <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="url/to/rss2/file">
    <link rel="alternate" type="application/rdf+xml" title="RSS 1.0" href="url/to/rss1/file">
    <link rel="alternate" type="application/atom+xml" title="Atom" href="url/to/atom/file">

    The top of my own weblog's source code looks like Example 9-2.

    Example 9-2. Autodiscovery links in action
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Ben Hammersley's Dangerous Precedent - The Weblog</title>
    
    <link rel="alternate" type="application/atom+xml" title="Atom" href="http://www.benhammersley.com/atom.xml"/>
    <link rel="alternate" type="application/rdf+xml" title="RSS" href="http://www.benhammersley.com/index.rdf"/>

    Technically, the autodiscovery link is a standard element of the XHTML format and is used by many metadata standards to link documents to their XHTML partner. The rel="alternate" attribute cannot change in this context, but the others can. In particular, the type attribute shows the MIME type of the feed you are pointing to. If, in the future, the feed world decides on a different MIME type format, the autodiscovery link should be changed as well.
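
    To make the consuming side of autodiscovery concrete, here is a minimal sketch in Python of how a feed application might pull the feed URLs out of a page. It uses only the standard library. The names FeedLinkFinder and find_feeds are illustrative, not part of any existing tool, and the set of accepted MIME types is simply the three from Example 9-1.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types taken from Example 9-1.
FEED_TYPES = {
    "application/rss+xml",
    "application/rdf+xml",
    "application/atom+xml",
}


class FeedLinkFinder(HTMLParser):
    """Collect autodiscovery <link> elements from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # list of (mime_type, title, absolute_url)

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        mime_type = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rel_values and mime_type in FEED_TYPES and href:
            # Resolve relative hrefs against the URL of the page itself.
            self.feeds.append(
                (mime_type, attr.get("title", ""), urljoin(self.base_url, href))
            )


def find_feeds(html_text, base_url):
    """Return every feed advertised by the page, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feeds
```

    Given the text of a page and the URL it was fetched from, find_feeds returns one tuple per autodiscovery link, so an application can offer the user a choice of formats or simply take the first one it understands.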

    9.1.3. Serving a Feed Correctly

    If you have wide enough access privileges to your server's configuration, a lot can be done to make your position in the Internet ecosystem a great deal more fruitful. You can make life easier on yourself, and others, by setting the correct MIME type and by using the features of HTTP 1.1.

    9.1.3.1 MIME types

    Here we must tread carefully. As much as can be said definitively here, the correct MIME types for RSS and Atom are as follows:

    • RSS 1.0: application/rdf+xml

    • RSS 2.0: either application/rss+xml, application/xml, or text/xml

    • Atom: application/atom+xml

    As you can see, RSS 2.0 is tricky. The specification gives no help on this matter at all, and there is as yet no consensus within the community. The problem stems from the way certain browsers deal with these values, downloading the file instead of displaying it on the screen. This might cause confusion for users, and so some people claim that serving all of them with a MIME type of text/xml is the more sensible choice. I personally disagree, and go with application/rss+xml. The other two formats define their MIME types strictly, as given in the earlier list.

    To set this up on an Apache server, you can include an .htaccess file within the directory you are serving from. Assuming you are creating feeds in every format, called index.rdf, index.atom, and index.rss, you would want lines in your .htaccess file that look like Example 9-3.

    Example 9-3. An example .htaccess file to set the correct MIME types for feeds
    AddType application/rdf+xml rdf
    AddType application/atom+xml atom
    AddType application/rss+xml rss

    For more details on .htaccess files, see the official documentation at http://httpd.apache.org/docs/howto/htaccess.html.
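
    If your feeds are served by a script rather than as static files under Apache, the same mapping has to be made in code. The following is a rough sketch using Python's standard mimetypes module. The function names are hypothetical, and the extensions are the ones assumed in Example 9-3.

```python
import mimetypes

# Extension-to-type pairs mirroring the AddType lines in Example 9-3.
FEED_MIME_TYPES = {
    ".rdf": "application/rdf+xml",
    ".atom": "application/atom+xml",
    ".rss": "application/rss+xml",
}


def build_feed_types():
    """Return a MimeTypes table that knows the three feed extensions."""
    table = mimetypes.MimeTypes()
    for extension, mime_type in FEED_MIME_TYPES.items():
        table.add_type(mime_type, extension)
    return table


def content_type_for(filename, table=None):
    """Pick the Content-Type header value for a feed file name."""
    if table is None:
        table = build_feed_types()
    mime_type, _encoding = table.guess_type(filename)
    # Fall back to a generic binary type for unknown extensions.
    return mime_type or "application/octet-stream"
```

    A private MimeTypes table is used here so that the sketch does not alter the process-wide defaults; a real application might prefer to register the types globally.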

    9.1.3.2 HTTP 1.1

    HTTP Version 1.1 has two features that can greatly reduce the amount of bandwidth used by your feeds. Bandwidth is an enormous problem for feed providers. Already, many of the more popular sites are restricting the number of times an individual IP address can request the feed per day. Others are reducing the size of the feed they provide. Certainly most webloggers will find that their feed makes up over half of their served bandwidth after a while. It is worth, therefore, trying to reduce this figure. This can be done both by compressing the feed and by enabling conditional GET.

    9.1.3.2.1 Compression

    Under the HTTP 1.1 specification, data sent via the protocol can be compressed using the gzip algorithm. This works wonders with big text documents like feeds, making them smaller and so faster and cheaper to serve. You should make sure compression is enabled wherever possible.

    If your feeds are static files, served by Apache, you should install and use the necessary Apache modules. For Apache 1.x users, mod_gzip is found at http://sourceforge.net/projects/mod-gzip/, and good unofficial documentation for it sits at http://www.schroepl.net/projekte/mod_gzip/index.htm. For Apache 2.0 users, mod_deflate is included within the source package.

    If you use Perl, PHP, or Python to dynamically create your feeds, Mark Nottingham's cgi_buffer libraries take care of the compression for you. See http://www.mnot.net/cgi_buffer/ for more. It is highly recommended.
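
    For a sense of what such a module or library does on your behalf, here is a simplified sketch in Python of the compression decision for a single response. It is not how mod_gzip, mod_deflate, or cgi_buffer are implemented; the function names are made up for illustration, and the Accept-Encoding parsing deliberately ignores quality values such as gzip;q=0.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if the client's Accept-Encoding header lists gzip.

    Simplification: quality values (for example "gzip;q=0") are ignored.
    """
    codings = [
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
    ]
    return "gzip" in codings


def encode_body(body, accept_encoding=""):
    """Return (payload, extra_headers) for a feed body given as bytes."""
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(body)
        # Tell the client, and any cache in between, what was done.
        return payload, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return body, {}
```

    Because feeds are repetitive XML, the compressed payload is typically a small fraction of the original, which is where the bandwidth saving comes from.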

    9.1.3.2.2 Conditional GET

    The main problem with serving feed documents is that your consumers are liable to request the file whether it has changed or not. If all your readers are downloading the file once an hour, yet you're only updating it once a week, both you and they are wasting each other's bandwidth. This can cost you a lot of money.

    Happily, the authors of the HTTP 1.1 standard came up with an answer to this: conditional GET. It works like this: when you request an RSS feed that you have seen before, you tell the server some details about the last version you saw, and if it has changed, the server will give you the new version. If not, however, the server replies that nothing is new in the world, and you can go about your merry way. Doing that only takes a few hundred bytes, as opposed to the 20,000 or so that a full feed might. Happy savings in bandwidth and speed all round.
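
    From the consumer's side, the exchange can be sketched in Python as follows. The "details about the last version you saw" are the ETag and Last-Modified values the server sent with the previous response, returned in the If-None-Match and If-Modified-Since request headers. The function names are illustrative, and a real application would also need to store the validators between runs.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators saved from the last fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise
```

    On the first fetch, call fetch_if_changed with only the URL and keep the two validators it returns. On later fetches, pass them back in; a None body means the copy you already hold is still current.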

    Most people are already serving documents using conditional GET. The standard was written in 1999, and effectively every server application in use today supports it without your needing to worry about it. The problems come when you serve dynamically created feeds or write a feed-consuming application.

    The handling of conditional GET with the common scripting languages has been covered across the Web by specialists in the field, and it's more productive to point you to their examples. Therefore, have a look at:

    • Perl: Kellan Elliott-McCrea at http://laughingmeme.org/archives/000479.html

    • PHP: Simon Willison at http://simon.incutio.com/archive/2003/04/23/conditionalGet

    • Python: Jarno Virtanen at http://www.hole.fi/jajvirta/weblog/20030928T2101.html

    Mark Nottingham's cgi_buffer libraries mentioned earlier also take care of conditional GET.
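
    As a rough idea of what those examples and libraries do for a dynamically created feed, the server-side decision can be sketched in Python like this. It is a simplification under stated assumptions: the function names are hypothetical, request headers arrive as a plain dictionary with exactly these capitalizations, the ETag is just a hash of the feed body, and lists of several ETags or weak validators are not handled.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
import hashlib


def make_validators(body, modified_timestamp):
    """Derive an ETag and a Last-Modified value for a feed body (bytes)."""
    etag = '"%s"' % hashlib.sha256(body).hexdigest()
    last_modified = formatdate(modified_timestamp, usegmt=True)
    return etag, last_modified


def respond(request_headers, body, modified_timestamp):
    """Return (status, headers, payload) for one feed request."""
    etag, last_modified = make_validators(body, modified_timestamp)
    headers = {"ETag": etag, "Last-Modified": last_modified}

    client_etag = request_headers.get("If-None-Match")
    client_date = request_headers.get("If-Modified-Since")

    if client_etag is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        unchanged = client_etag.strip() in (etag, "*")
    elif client_date is not None:
        try:
            parsed = parsedate_to_datetime(client_date)
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)
            # HTTP dates only have one-second resolution.
            unchanged = parsed.timestamp() >= int(modified_timestamp)
        except (TypeError, ValueError):
            unchanged = False  # unparseable date: send the full feed
    else:
        unchanged = False

    if unchanged:
        return 304, headers, b""  # Not Modified: headers only, no body
    return 200, headers, body
```

    The 304 branch is the few-hundred-byte reply described earlier; everything else gets the full feed along with fresh validators to send back next time.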

    9.1.3.3 RSScache.com

    As I finish this book, http://www.rsscache.com is being launched. The site promises to cut the amount of bandwidth that your feed uses by caching it and serving it for you.

    Using the system is very simple. Just add the URL http://my.rsscache.com/ in front of your own feed. For example:

    http://my.rsscache.com/www.benhammersley.com/index.rdf
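
    If you publish several feeds, the rewriting is easy to script. This small Python sketch builds the cached address from an ordinary feed URL, following the pattern shown above. The function name is hypothetical, and carrying a query string across unchanged is an assumption on my part rather than documented behavior of the service.

```python
from urllib.parse import urlsplit

CACHE_PREFIX = "http://my.rsscache.com/"  # prefix given in the text


def cached_feed_url(feed_url):
    """Prefix a feed URL with the cache host, dropping the original scheme."""
    parts = urlsplit(feed_url)
    if not parts.netloc:
        raise ValueError("expected an absolute feed URL, got %r" % feed_url)
    rest = parts.netloc + parts.path
    if parts.query:
        # Assumption: the query string is passed through as-is.
        rest += "?" + parts.query
    return CACHE_PREFIX + rest
```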

    The system works very well; however, it removes any relationship between your server's log files and the real number of subscribers you have. If you don't care about knowing how many people read your feeds and are worried about bandwidth use, this seems like a great service.

    9.1.4. Registering with Aggregators

    Registering your feed at the major aggregators helps people and automatic services find your information. For example, most of the desktop news readers available today use the lists of feeds available at Syndic8 as a menu of feeds available to their users. Being part of this is a good thing. Here are a few of the major aggregators and their URLs:

    • Syndic8 (http://www.syndic8.com/suggest_start.php)

    • Feedster (http://www.feedster.com/add.php)

    • NewsIsFree (http://www.newsisfree.com/contact.php)

    A very full, and constantly maintained, list of aggregators that take submissions of feed URLs for their catalogs can be found at http://www.rss-specifications.com/rss-submission.htm.

    9.1.5. Metadata for the Main Page

    Along with the autodiscovery links mentioned earlier, there are other lines of metadata that can be usefully added to your site. Syndic8 has a few other built-in features that aid with its cataloging and require just this sort of metadata to be added to your page. These features deal with the geographical origin of the feed and its subject's place within the Open Directory at http://www.dmoz.org. If you are registering your feed with Open Directory, it is worthwhile to add these lines:

    <META NAME="dc.creator.e-mail" CONTENT="yourname@yourdomain.com">
    <META NAME="dc.creator.name" CONTENT="Your Name">

    Then find the correct place within the Open Directory (http://www.dmoz.org) and add it to your site's page like so (if your site is not in the Open Directory, take this opportunity to submit it!):

    <META NAME="dmoz.id" CONTENT="Computers/Internet/On_the_Web/Weblogs/Tools">

    Now visit the Getty Thesaurus of Geographical Names (TGN) at http://www.getty.edu/research/tools/vocabulary/tgn/ and find the location that best represents your web site's location. Make a note of the name of the place, the name of the nation, the latitude, the longitude, and the TGN number.

    Then go to the International Organization for Standardization (ISO) web site at http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html and find the country code of your nation. Use this information to add the following lines to your site (replace my information with yours):

    <meta name="tgn.id" content="7011781" />
    <meta name="tgn.name" content="London" />
    <meta name="tgn.nation" content="United Kingdom" />
       
    <meta name="geo.position" content="51.500;-0.167" />
    <meta name="geo.placename" content="London, England" />
    <meta name="geo.country" content="GB" />
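
    If your pages come out of a template or script, lines like these can be generated rather than typed by hand. A minimal Python sketch, assuming the metadata is held in a dictionary; the function name is illustrative, and values are escaped so that quotes or ampersands in a place name cannot break the markup.

```python
from html import escape


def meta_tags(metadata):
    """Render a dict of name -> content pairs as XHTML <meta> elements."""
    return "\n".join(
        '<meta name="%s" content="%s" />'
        % (escape(name, quote=True), escape(str(content), quote=True))
        for name, content in metadata.items()
    )
```

    Calling meta_tags with the keys tgn.id, tgn.name, and tgn.nation, for instance, yields the first three lines of the block above, one element per line and in the order the keys were given.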

    Now post a message to the Syndic8 mailing list (syndic8@yahoogroups.com) asking for a Syndic8 editor to flick the proverbial metadata switch on your feed. People will now be able to search for you by location and subject.

    9.1.6. Counting Hits and Clickthroughs

    New since the first edition of this book, and still in beta as I write this, the services at http://www.feedburner.com are proving immensely popular. One of FeedBurner's most popular features is its circulation data; the others are dealt with in Chapter 10.

    Circulation data is very hard to accurately determine on the Web. The layers of caching implemented within the larger ISPs mean that you may have many more readers than you think. But it's certainly useful to be able to get an accurate count, especially with information on which applications your readers are using.

    Using FeedBurner will give you what appear to be, more or less, accurate results and an idea of which links in your feed people are clicking on the most.