7.3. Producing Atom Feeds

Because it is still in its infancy, the Atom Syndication Format has few libraries that make generating it a simple matter. RSS has had years of development; Atom has not yet given people the time or opportunity to build Perl modules or the like. The few that do exist are invariably out of date, or will be by the time you read this book, so there is little point in detailing them here. By the time Atom reaches Version 1.0, there will be simpler alternatives, and we'll cover those in later editions of this book. Preorder now!

Not that you really need one: for creating Atom feeds, a library is overkill. For most simple uses, you're perfectly well off using a series of print statements or a templating system, just as you would when producing ordinary RSS.
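The following Python function is a minimal sketch of the print-statement approach. It assembles an Atom feed from plain strings. The function name `atom_feed` and the dictionary keys it expects are hypothetical, invented for this example. The sketch assumes the element names and namespace of Atom 1.0 (`updated`, `http://www.w3.org/2005/Atom`). The pre-1.0 drafts differ: Atom 0.3, for instance, used `modified` and the `http://purl.org/atom/ns#` namespace. The one thing the function takes care over is escaping, because a single stray `&` or `<` is enough to make the output not well-formed.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumption: targeting Atom 1.0. Pre-1.0 drafts used other names and namespaces.
ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_feed(title, feed_id, updated, author, entries):
    """Build an Atom document as a string, much as a run of print statements would.

    `entries` is a list of dicts with the (hypothetical) keys
    title, id, link, updated and summary. Dates are passed in
    already formatted as RFC 3339 strings, e.g. "2005-01-01T00:00:00Z".
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        f"<feed xmlns={quoteattr(ATOM_NS)}>",
        f"  <title>{escape(title)}</title>",
        f"  <id>{escape(feed_id)}</id>",
        f"  <updated>{escape(updated)}</updated>",
        f"  <author><name>{escape(author)}</name></author>",
    ]
    for e in entries:
        # escape() handles &, < and > in element text;
        # quoteattr() escapes and wraps attribute values in quotes.
        lines += [
            "  <entry>",
            f"    <title>{escape(e['title'])}</title>",
            f"    <id>{escape(e['id'])}</id>",
            f"    <link href={quoteattr(e['link'])}/>",
            f"    <updated>{escape(e['updated'])}</updated>",
            f"    <summary>{escape(e['summary'])}</summary>",
            "  </entry>",
        ]
    lines.append("</feed>")
    return "\n".join(lines)
```

Every value passes through `escape` or `quoteattr`. As a result, a title such as "Fish & Chips" comes out as `Fish &amp; Chips`. A templating system needs exactly the same care.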

Producing Atom with Perl

Perl is the one language that already has the beginnings of two Atom creation libraries: Ben Trott's XML::Atom and Tim Appnel's XML::Atom::Syndication. Both are very promising starts, but at the time of writing, each is either incomplete or out of date. Keep an eye on them, though, as they have both good beginnings and fine authors.


7.3.1. Validating Atom Feeds

Atom's strict structure, together with the fact that by the time the shouting is over there will be only one version to get people's knickers in a twist, means that validation is easy. The Feed Validator, at http://feedvalidator.org/, is the one-stop shop for such needs. Written by Sam Ruby and Mark Pilgrim, both leading brains of the syndication world, it produces extremely useful results. Test your feeds often.
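The Feed Validator remains the place to go for a proper verdict. Still, a quick local check can catch the grossest mistakes before you publish. The following Python sketch uses a hypothetical function, `quick_check`, and assumes Atom 1.0. It confirms only three things: the document is well-formed XML, its root is an Atom `feed`, and the feed and each entry carry `id`, `title`, and `updated`. It is in no sense a substitute for full validation.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace, in ElementTree's {namespace}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"
REQUIRED = ("id", "title", "updated")


def quick_check(xml_text):
    """Return a list of problems. An empty list means only that these few checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not well-formed: no point looking any further.
        return [f"not well-formed XML: {err}"]
    if root.tag != ATOM + "feed":
        return [f"root element is {root.tag!r}, not an Atom 1.0 feed"]
    problems = []
    for name in REQUIRED:
        if root.find(ATOM + name) is None:
            problems.append(f"feed is missing <{name}>")
    for i, entry in enumerate(root.findall(ATOM + "entry")):
        for name in REQUIRED:
            if entry.find(ATOM + name) is None:
                problems.append(f"entry {i} is missing <{name}>")
    return problems
```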


    Chapter 8. Parsing and Using Feeds

    The limits of my language mean the limits of my world.

    Ludwig Wittgenstein

    By now you should be fully up to speed with RSS 1.0, RSS 2.0, and Atom. You will also have seen, in Chapter 2, the most popular feed-reading applications. In this chapter, we deal with the consumption of feeds, either for display on a web page or for use within your own programs.


      8.1. Important Issues

      This is where it starts to get really messy. We have discussed the production of three standards in this book, but there are hundreds of variations in the wild. This chaos has arisen because of a combination of the confusion of standards development, common misunderstandings about what constitutes valid XML, and a general agreement among the developers of feed-reading applications that they would parse invalid feeds at all costs. Indeed, it is because of these clashing versions of the RSS family that the Atom project was started in the first place.

      When it comes to parsing any given feed, therefore, you need to take one specific fact into account: the feed is probably invalid. This might seem harsh, but you have to remember that the RSS community went through a long period when the specifications were so loosely defined that it was hard to pin down exactly what was and wasn't valid. These feeds, and the systems that produce them, have not been revisited. Standards compliance aside, it's thought that at least 10% of feeds aren't even valid XML. This causes a lot of problems in itself.

      The situation is improving, but you must remain aware of the problem. The most popular newsreader applications are built to be liberal parsers. That is, they act like modern web browsers and try to work round as many errors as they can. This tends to bring about a false sense of security that will betray you when you use your own parsing tools.
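What does "working round" an error mean in practice? Consider one of the most common faults, the unescaped ampersand. The Python sketch below uses a hypothetical function, `parse_liberally`. It tries a strict parse first. Only if that fails does it escape bare ampersands and try again. It is a toy that repairs exactly one kind of error. Real liberal parsers handle far more, and in far more sophisticated ways.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does not begin a named, decimal, or hexadecimal reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_liberally(xml_text):
    """Try a strict parse; on failure, escape bare ampersands and retry once.

    Returns (root_element, was_repaired). If the repair is not enough,
    the second ET.ParseError propagates to the caller.
    """
    try:
        return ET.fromstring(xml_text), False
    except ET.ParseError:
        # Toy repair: rewrites every bare "&" as "&amp;" (including any inside CDATA).
        repaired = BARE_AMP.sub("&amp;", xml_text)
        return ET.fromstring(repaired), True
```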

      There is also a great deal of debate about whether or not it is right to use a liberal parser. The argument for strictness is that the RSS community would be better served by people's errors being pointed out to them. This may or may not be true, but it's a lost argument. The RSS community is now too big for any form of universal collaborative action, and no one is going to use a strict parser on a public web site and have it break in front of the rest of the world. The end user, after all, doesn't know or care about the technicalities discussed in this book. Liberal parsers, whether morally correct or not, have won the day.

      Of course, there's only so far you can go with liberality, and even the most accepting of parsers will balk at the really badly formed feed. You are therefore advised to pay a lot of attention to generating well-formed feeds, despite the fact that if you do make a mistake, it will probably go unnoticed by the majority of your readers. The old adage to be strict in what you produce and liberal in what you accept has much application here.

      The feed parsers available to us fall into two groups: those for display and those for programmatic use. The older group, Display Parsers (to coin a phrase), turn RSS and Atom into HTML. Programmatic Parsers, another neologism, turn RSS and Atom into data structures within programs. There is only a little crossover between the two.
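A short Python sketch can illustrate the distinction. The hypothetical `parse_items` function is a programmatic parser in miniature: it turns an RSS 2.0 or Atom 1.0 document into a list of dictionaries. The equally hypothetical `to_html` plays the part of a display parser, rendering those items as an HTML list. Both assume well-formed input and extract only titles and links.

```python
import xml.etree.ElementTree as ET
from html import escape

ATOM = "{http://www.w3.org/2005/Atom}"


def _atom_link(entry):
    # Prefer rel="alternate", which is also the default when rel is absent.
    for link in entry.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return ""


def parse_items(xml_text):
    """Programmatic parsing: RSS 2.0 or Atom 1.0 in, a list of plain dicts out."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":  # RSS 2.0 core elements carry no namespace
        for item in root.iter("item"):
            items.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            items.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": _atom_link(entry),
            })
    else:
        raise ValueError(f"unsupported root element: {root.tag}")
    return items


def to_html(items):
    """Display parsing: render the same items as an HTML list, escaping all text."""
    rows = [f'<li><a href="{escape(i["link"])}">{escape(i["title"])}</a></li>' for i in items]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Because `parse_items` hides the difference between the two formats, the display side never needs to know which one it was given.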

      8.1.1. Converting Atom to RSS

      There really isn't a problem consuming both RSS and Atom in the same parser. By now, most of the useful libraries take both formats equally seriously. However, with Atom still in flux, there may be a time when the parsers' authors haven't caught up with a new specification. If you really must use a parser that supports only RSS or can't deal with the latest version of Atom, and yet you must also consume the latest Atom feeds, the only thing to do is convert the Atom feed to RSS. This can be done in two ways:


      Third-party services

      Various people, for varying reasons, offer Atom-to-RSS conversion as an online service. Two examples are http://www.2rss.com/software.php?page=atom2rss and http://www.feedburner.com.


      XSLT

      Because both formats are XML-based, you can use XSLT to convert from one to the other. Aaron Straup Cope has written XSL stylesheets to do just this. They are available at http://www.aaronland.info/xsl/atom/0.3/.

      I don't recommend either option. If you are going to be handling Atom, it's far preferable to use an Atom-capable parser from the start. RSS is certainly never going to go away, but then again, neither is Atom.
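If you do have to convert, it helps to know what the conversion involves. Cope's XSL stylesheets do this job in XSLT. The following Python sketch shows the same kind of element-by-element mapping for a bare-bones case. The function `atom_to_rss` is hypothetical and assumes Atom 1.0 input. It deliberately ignores dates, content types, and most optional elements, which is exactly where real conversions lose information.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def _alt_link(el):
    # The rel="alternate" link (or a link with no rel) points at the web page.
    for link in el.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return ""


def atom_to_rss(atom_text):
    """Map an Atom 1.0 feed onto a minimal RSS 2.0 document, returned as a string."""
    feed = ET.fromstring(atom_text)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed.findtext(ATOM + "title", default="")
    ET.SubElement(channel, "link").text = _alt_link(feed)
    # RSS 2.0 requires a channel description; Atom's optional subtitle is the nearest match.
    ET.SubElement(channel, "description").text = feed.findtext(ATOM + "subtitle", default="")
    for entry in feed.findall(ATOM + "entry"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry.findtext(ATOM + "title", default="")
        ET.SubElement(item, "link").text = _alt_link(entry)
        ET.SubElement(item, "description").text = entry.findtext(ATOM + "summary", default="")
        # Atom ids need not be URLs, so mark the guid as not a permalink.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = entry.findtext(ATOM + "id", default="")
    return ET.tostring(rss, encoding="unicode")
```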