8.1. Important Issues
This is where it starts to get really messy. We have discussed the production of three standards in this book, but there are hundreds of variations in the wild. This chaos has arisen from a combination of muddled standards development, common misunderstandings about what constitutes valid XML, and a general agreement among the developers of feed-reading applications that they would parse invalid feeds at all costs. Indeed, it was because of these clashing versions of the RSS family that the Atom project was started in the first place.
When it comes to parsing any given feed, therefore, you need to take one specific fact into account: the feed is probably invalid. This might seem harsh, but remember that the RSS community went through a long period when the specifications were so loosely defined that it was hard to pin down exactly what was and wasn't valid. These feeds, and the systems that produce them, have not been revisited. Standards compliance aside, it's thought that at least 10% of feeds aren't even well-formed XML, which causes plenty of problems in itself.
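A quick way to see whether a feed clears the lowest bar, XML well-formedness, is to run it through a strict XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical, and passing this check says nothing about conformance to any RSS or Atom specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed_text: str) -> bool:
    """Return True if feed_text parses as XML, False otherwise.

    This checks XML well-formedness only; it does not check the document
    against any RSS or Atom specification.
    """
    try:
        ET.fromstring(feed_text)
    except ET.ParseError:
        return False
    return True


good = "<rss version='2.0'><channel><title>Tom &amp; Jerry</title></channel></rss>"
bad = "<rss version='2.0'><channel><title>Tom & Jerry</title></channel></rss>"
print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False: the bare ampersand is not allowed in XML
```

A single unescaped ampersand, one of the most common mistakes in hand-built feeds, is enough to make a strict parser reject the whole document.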
The situation is improving, but you must remain aware of the problem. The most popular newsreader applications are built as liberal parsers: like modern web browsers, they try to work round as many errors as they can. This can give you a false sense of security that will betray you when you use your own parsing tools.
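Real liberal parsers use far more elaborate recovery strategies than can be shown here. The sketch below, which assumes only Python's standard library, illustrates the general idea with one common repair: it tries a strict parse first and, only if that fails, rewrites stray ampersands and HTML-only named entities (such as `&nbsp;`, which XML does not define) before trying again. The function `liberal_parse` and its helpers are hypothetical.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# The only named entities XML itself defines.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# An ampersand, optionally followed by a numeric or named entity reference.
ENTITY_RE = re.compile(r"&(#[0-9]+;|#x[0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)?")


def _fix_ampersand(match):
    ref = match.group(1)
    if ref is None:
        return "&amp;"               # bare ampersand: escape it
    if ref.startswith("#"):
        return match.group(0)        # numeric character reference: keep
    name = ref[:-1]
    if name in XML_PREDEFINED:
        return match.group(0)        # already valid XML: keep
    if name in html.entities.name2codepoint:
        # HTML entity such as &nbsp;: replace with a numeric reference
        return "&#%d;" % html.entities.name2codepoint[name]
    return "&amp;" + ref             # unknown entity: keep it as literal text


def liberal_parse(feed_text):
    """Parse strictly; on failure, repair ampersands and retry once."""
    try:
        return ET.fromstring(feed_text)
    except ET.ParseError:
        repaired = ENTITY_RE.sub(_fix_ampersand, feed_text)
        # This may still raise if the feed is broken in some other way.
        return ET.fromstring(repaired)
```

Because the repair runs only after a strict parse fails, well-formed feeds pass through untouched. The rewrite is crude: it also alters ampersands inside CDATA sections, which is exactly the kind of trade-off that makes liberal parsing hard to get right.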
There is also a great deal of debate about whether it is right to use a liberal parser at all. The argument for strictness is that the RSS community would be better served by having people's errors pointed out to them. This may or may not be true, but it's a lost argument. The RSS community is now too big for any form of universal collaborative action, and no one is going to put a strict parser on a public web site and have it break in front of the rest of the world. The end user, after all, neither knows nor cares about the technicalities discussed in this book. Liberal parsers, whether morally right or not, have won the day.
Of course, liberality can only go so far, and even the most accepting of parsers will balk at a really badly formed feed. You are therefore advised to pay close attention to generating well-formed feeds, even though a mistake will probably go unnoticed by most of your readers. The old adage to be strict in what you produce and liberal in what you accept applies here.
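The surest way to produce well-formed feeds is to let an XML library handle the serialization rather than concatenating strings by hand. The following sketch, assuming Python 3.8 or later for the `xml_declaration` argument, builds a minimal RSS 2.0 document with `xml.etree.ElementTree`, which escapes `&`, `<` and `>` in text content for you. The function `build_rss` and its parameters are hypothetical. Note that ElementTree does not strip the control characters XML 1.0 forbids, so text pulled from elsewhere may still need cleaning.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 document and return it as UTF-8 bytes.

    items is a list of (title, link) pairs. ElementTree escapes &, < and >
    in element text, so titles containing them still yield well-formed XML.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


feed = build_rss("Tom & Jerry", "http://example.com/", "Cartoons <and> more",
                 [("A < B", "http://example.com/1")])
print(feed.decode("utf-8"))
```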
The feed parsers available to us fall into two groups: those for display and those for programmatic use. The older group, Display Parsers, to coin a phrase, turns RSS and Atom into HTML. Programmatic Parsers, another neologism, turn RSS and Atom into data structures within programs. There is only a little crossover between the two.
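To make the distinction concrete, here is a minimal sketch of each kind, assuming a well-formed RSS 2.0 feed and Python's standard library only. `parse_items` plays the programmatic role, turning the feed into a list of dictionaries, and `render_html` plays the display role, turning those dictionaries into HTML. Both names are hypothetical, and a real parser of either kind would also need to handle Atom, namespaces and malformed input.

```python
import html
import xml.etree.ElementTree as ET


def parse_items(feed_bytes):
    """Programmatic use: turn an RSS 2.0 feed into a list of dicts."""
    root = ET.fromstring(feed_bytes)  # root is the <rss> element
    items = []
    for item in root.iterfind("channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


def render_html(items):
    """Display use: turn parsed items into an HTML list, escaping the text."""
    lines = ["<ul>"]
    for entry in items:
        lines.append('<li><a href="%s">%s</a></li>' % (
            html.escape(entry["link"], quote=True),
            html.escape(entry["title"]),
        ))
    lines.append("</ul>")
    return "\n".join(lines)


feed = (b"<rss version='2.0'><channel><title>T</title>"
        b"<item><title>A &amp; B</title><link>http://example.com/a</link></item>"
        b"</channel></rss>")
print(render_html(parse_items(feed)))
```

Building the display step on top of the programmatic one is a choice made here for brevity; display parsers need not be structured that way.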
8.1.1. Converting Atom to RSS
There really isn't a problem with consuming both RSS and Atom in the same parser; by now, most of the useful libraries take both formats equally seriously. However, with Atom still in flux, there may be times when a parser's authors haven't caught up with a new version of the specification. If you are stuck with a parser that supports only RSS, or one that can't handle the latest version of Atom, and you still need to consume the latest Atom feeds, the only option is to convert the Atom feed to RSS. This can be done in two ways:
I don't recommend either option; if Atom is going to be presented, it's far preferable to use an Atom-capable parser from the start. RSS is certainly never going to go away, but then again, neither is Atom.
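If you are forced to convert nonetheless, the heart of any conversion is a mapping from Atom elements to their nearest RSS 2.0 equivalents. The sketch below illustrates that mapping directly in Python with `xml.etree.ElementTree`; it is not a reproduction of any particular conversion tool. It assumes Atom 1.0 input (namespace `http://www.w3.org/2005/Atom`) and Python 3.8 or later, treats all text as plain text (ignoring `type="html"` and `type="xhtml"` content), and drops dates entirely, because RSS 2.0 `pubDate` expects RFC 822 format. The function `atom_to_rss` is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # assumes Atom 1.0 input


def _alternate_href(element):
    """Return the href of the first rel="alternate" link (the default rel)."""
    for link in element.findall(ATOM + "link"):
        if link.get("rel", "alternate") == "alternate":
            return link.get("href", "")
    return ""


def atom_to_rss(atom_bytes):
    """Map an Atom 1.0 feed onto a minimal RSS 2.0 document (UTF-8 bytes)."""
    feed = ET.fromstring(atom_bytes)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    title = feed.findtext(ATOM + "title", default="")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = _alternate_href(feed)
    # RSS 2.0 requires a channel description; fall back to the title.
    ET.SubElement(channel, "description").text = (
        feed.findtext(ATOM + "subtitle") or title)
    for entry in feed.findall(ATOM + "entry"):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry.findtext(ATOM + "title", default="")
        href = _alternate_href(entry)
        if href:
            ET.SubElement(item, "link").text = href
        summary = entry.findtext(ATOM + "summary")
        if summary:
            ET.SubElement(item, "description").text = summary
        entry_id = entry.findtext(ATOM + "id")
        if entry_id:
            # Atom ids are often URNs, not URLs, so mark the guid accordingly.
            ET.SubElement(item, "guid", isPermaLink="false").text = entry_id
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)
```

Everything the mapping cannot express is silently lost, which is one more reason to prefer an Atom-capable parser.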