4.2. The Basic Structure
The top level of an RSS 2.0 document is the <rss version="2.0"> element. This contains a single channel element, which holds the entire feed contents and all associated metadata.
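As a rough illustration of this structure from the consumer's side, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name get_channel and the sample feed are hypothetical, and the checks cover only the root element and the single channel described above.

```python
import xml.etree.ElementTree as ET

FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>RSS and Atom</title>
  </channel>
</rss>"""

def get_channel(document: str) -> ET.Element:
    """Return the single <channel> of an RSS 2.0 document, or raise ValueError."""
    root = ET.fromstring(document)
    # The root must be <rss> carrying version="2.0"
    if root.tag != "rss" or root.get("version") != "2.0":
        raise ValueError("not an RSS 2.0 document")
    # Exactly one <channel> directly under <rss>
    channels = root.findall("channel")
    if len(channels) != 1:
        raise ValueError("expected exactly one <channel>")
    return channels[0]

print(get_channel(FEED).findtext("title"))  # RSS and Atom
```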
4.2.1. Required Channel Subelements
There are 3 required and 16 optional subelements of channel within RSS 2.0. Here are the required subelements:
- title
The name of the feed. In most cases, this is the same name as the associated web site or service.
<title>RSS and Atom</title>
- link
A URL pointing to the associated resource, usually a web site. The link must use an IANA-registered URI scheme, such as http://, https://, news://, or ftp://, though an application developer need not support all of these by default. The most common, by a large margin, is http://. For example:
<link>http://www.exampleurl.com/example/index.html</link>
- description
Some words to describe your channel.
<description>This is a nice RSS 2.0 feed of an even nicer weblog</description>
Although it isn't explicitly stated in the specification, it is highly recommended that you do not put anything other than plain text in the channel/title or channel/description elements. There are some existing feeds with HTML within those elements, but these cause a considerable amount of wailing, and at least a small amount of gnashing of teeth. Do not do it. Use plain text only in these elements. The following sidebar, "Including HTML Within title or description," gives a fuller account of why, in my opinion, it's a bad idea.
Including HTML Within title or description
Since the early days of RSS 0.91, there's been an ongoing debate about whether the item/title or item/description elements may, or should, contain HTML. In my opinion, they should not, for both practical and philosophical reasons. Practically speaking, including HTML markup requires the client software to be able to parse or filter it. While this is fine with many desktop agents, it restricts developers looking for other uses of the data. This brings us to the philosophical aspect. RSS's second use, after providing headlines and content to desktop readers and sites, is to provide indexable metadata. By combining presentation and content (i.e., by including HTML markup within the description element), you could disable this feature.
However, my opinion lost out on this one. RSS 2.0 now allows for entity-encoded HTML within the item/description tag. It doesn't mention anything, in either direction, regarding item/title, and people are basically making it up as they go along. With that in mind, I still state that item/title at least should be considered plain text.
If you want to put HTML within the item/description element, you can do it in two ways:
- Entity encoding
With entity encoding, the angle brackets of HTML tags are converted to their respective entities, &lt; and &gt;. If you need to show a literal angle bracket, the ampersand of its entity must itself be encoded as well, giving &amp;lt;:
This is a &lt;em&gt;lovely left angle bracket:&lt;/em&gt; &amp;lt;
- Within a CDATA block
The alternative is to enclose the HTML within a CDATA block. This removes one level of entity encoding, as in:
<![CDATA[This is a <em>lovely left angle bracket:</em> &lt;]]>
Either approach is acceptable according to the specification, and there is no way for a program to tell the difference between the two, or to tell if the description is actually just plain text that resembles encoded HTML. This is a major problem with the RSS 2.0 specification, as you'll see when we talk about parsing feeds. Atom and RSS 1.0 both have their own ways around this issue.
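To see why a program cannot tell the two forms apart, consider this small Python sketch using only the standard library. It builds the entity-encoded version with xml.sax.saxutils.escape and the CDATA version by wrapping the raw HTML, then parses both with xml.etree.ElementTree; the variable names are purely illustrative. After parsing, the two description strings are identical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

html = "This is a <em>lovely left angle bracket:</em> &lt;"

# Entity encoding: escape() turns & into &amp; and < > into &lt; &gt;
entity_encoded = "<description>" + escape(html) + "</description>"
# CDATA: the HTML goes in verbatim, one level of encoding lighter
cdata_wrapped = "<description><![CDATA[" + html + "]]></description>"

decoded_a = ET.fromstring(entity_encoded).text
decoded_b = ET.fromstring(cdata_wrapped).text
# Both parse to exactly the same string, so the parser output carries
# no trace of which form the publisher used
print(decoded_a == decoded_b == html)  # True
```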
4.2.2. Optional Channel Subelements
There are 16 optional channel subelements of RSS 2.0. Technically speaking, you can leave these out altogether. However, I encourage you to add as many as you can. Much of this stuff is static; the content of the element never changes. Placing it into your RSS template or adding another line to a script is little work for the additional value of your feed's metadata. This is especially true for the first three subelements listed here:
- language
The language the feed is written in. This allows aggregators to index feeds by language and should contain a standard Internet language code as per RFC 1766, such as en-gb for British English:
<language>en-gb</language>
- copyright
A copyright notice for the content in the feed:
<copyright>Copyright 2004 Ben Hammersley</copyright>
- managingEditor
The email address of the person to contact for editorial enquiries. It should be in the format name@example.com (FirstName LastName):
<managingEditor>email@example.com (Ben Hammersley)</managingEditor>
- webMaster
The email address of the person responsible for technical issues with the feed:
<webMaster>firstname.lastname@example.org (Geek McNerdy)</webMaster>
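Both managingEditor and webMaster (and, later, item/author) use the "address (Full Name)" convention. The following Python sketch splits and builds such values; parse_contact and format_contact are hypothetical helpers, and the regular expression is a loose approximation assumed for illustration, not the full RFC 822 address grammar.

```python
import re

# An address with an "@", optionally followed by a name in parentheses
CONTACT = re.compile(r"^\s*([^\s()]+@[^\s()]+)\s*(?:\(([^)]*)\))?\s*$")

def parse_contact(value):
    """Split 'email@example.com (Ben Hammersley)' into (email, name).

    Returns None when the value does not look like an email address."""
    m = CONTACT.match(value)
    if m is None:
        return None
    name = (m.group(2) or "").strip() or None
    return m.group(1), name

def format_contact(email, name=None):
    """Build the 'address (Full Name)' form used by these elements."""
    return f"{email} ({name})" if name else email

print(parse_contact("email@example.com (Ben Hammersley)"))
# ('email@example.com', 'Ben Hammersley')
print(parse_contact("Ben Hammersley"))  # None: no address present
print(format_contact("firstname.lastname@example.org", "Geek McNerdy"))
```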
- pubDate
The publication date of the content within the feed. For example, a daily morning newspaper publishes at a certain time early every morning. Technically, any information in the feed should not be displayed until after the publication date, so you can set pubDate to a time in the future and expect that the feed won't be displayed until after that time. Few existing RSS readers take any notice of this element in this way, however. Nevertheless, it should be in the format outlined in RFC 822:
<pubDate>Sun, 12 Sep 2004 19:00:40 GMT</pubDate>
- lastBuildDate
The date and time, RFC 822-style, when the feed last changed. Note the difference between this and channel/pubDate: lastBuildDate must be in the past, and it is this element, not channel/pubDate, that feed applications should take as the "last time updated" value.
<lastBuildDate>Sun, 12 Sep 2004 19:01:55 GMT</lastBuildDate>
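Producing and reading these RFC 822-style dates need not involve hand-written format strings. A minimal Python sketch, assuming the standard library's email.utils functions are acceptable for this, formats an aware UTC datetime and parses the strings back so that pubDate and lastBuildDate can be compared as real datetimes rather than as text.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Writing: format an aware UTC datetime in RFC 822 style with a "GMT" zone
built = datetime(2004, 9, 12, 19, 1, 55, tzinfo=timezone.utc)
last_build_date = format_datetime(built, usegmt=True)
print(last_build_date)  # Sun, 12 Sep 2004 19:01:55 GMT

# Reading: parse the strings back into datetimes before comparing them;
# comparing the raw strings would not order them chronologically
pub_date = parsedate_to_datetime("Sun, 12 Sep 2004 19:00:40 GMT")
last_build = parsedate_to_datetime(last_build_date)
print(last_build > pub_date)                     # True
print(last_build <= datetime.now(timezone.utc))  # True: lastBuildDate is in the past
```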
- category
Identical in syntax to the item/category element you'll see later. This takes one optional attribute, domain. The value of category should be a forward-slash-separated string that identifies a hierarchical location in a taxonomy represented by the domain attribute. Sadly, there is no consensus either within the specification or in the real world as to any standard format for the domain attribute. It would seem most sensible to restrict it to a URL; however, it needn't necessarily be so.
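Because the value is just a slash-separated path, a consumer can split it into taxonomy levels. This Python sketch does that with xml.etree.ElementTree; category_path is a hypothetical helper, and it makes no assumption about what the domain attribute contains, returning None when it is absent.

```python
import xml.etree.ElementTree as ET

def category_path(elem):
    """Return (domain or None, [path segments]) for a <category> element."""
    # Empty segments from leading, trailing, or doubled slashes are dropped
    segments = [s.strip() for s in (elem.text or "").split("/") if s.strip()]
    return elem.get("domain"), segments

cat = ET.fromstring('<category domain="http://www.dmoz.org">'
                    'Business/Industries/Publishing/</category>')
print(category_path(cat))
# ('http://www.dmoz.org', ['Business', 'Industries', 'Publishing'])
```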
- generator
This should contain a string indicating which program created the RSS file:
<generator>Movable Type v3.1b3</generator>
- docs
A URL that points to an explanation of the standard for future reference. This should point to http://blogs.law.harvard.edu/tech/rss:
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
- cloud
The <cloud/> element enables a rarely used feature known as "Publish and Subscribe," which we shall investigate fully in Chapter 9. It takes no value itself, but it has five mandatory attributes, themselves also explained in Chapter 9: domain, path, port, registerProcedure, and protocol.
<cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="pingMe" protocol="soap"/>
- ttl
ttl, short for Time-to-Live, should contain a number, which is the minimum number of minutes the reader should wait before refreshing the feed from its source. Feed authors should set this figure to reflect the interval between updates, balancing how often they want their feed to be requested against how up to date their consumers need to be.
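On the client side, honoring ttl amounts to a simple elapsed-time check. This Python sketch reads ttl with xml.etree.ElementTree and compares it against the time of the last fetch; refresh_due is a hypothetical name, and the 60-minute fallback for feeds without a ttl is an arbitrary assumption, not something the specification prescribes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

def refresh_due(channel, last_fetched, now, fallback_minutes=60):
    """True once at least ttl minutes have passed since the last fetch.

    fallback_minutes is an assumed default for feeds with no <ttl>."""
    text = channel.findtext("ttl")
    minutes = int(text) if text and text.strip() else fallback_minutes
    return now - last_fetched >= timedelta(minutes=minutes)

channel = ET.fromstring("<channel><ttl>30</ttl></channel>")
last = datetime(2004, 9, 12, 19, 0, tzinfo=timezone.utc)
print(refresh_due(channel, last, last + timedelta(minutes=10)))  # False
print(refresh_due(channel, last, last + timedelta(minutes=30)))  # True
```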
- image
This describes a feed's accompanying image. It's optional, but many aggregators look prettier if you include one. It has three required and three optional subelements of its own:
- url
The URL of a GIF, JPG, or PNG image that corresponds to the feed. It is, quite obviously, required.
- title
A description of the image, normally used within the ALT attribute of HTML's <img> tag. It is required.
- link
The URL to which the image should be linked. This is usually the same as the channel/link. It is required.
- width and height
The width and height of the icon, in pixels. The icons should be a maximum of 144 pixels wide by 400 pixels high. The emergent standard is 88 pixels wide by 31 pixels high, which is also the default the specification assumes when these elements are omitted. Both elements are optional.
- description
Optional text that is included in the TITLE attribute of the link formed around the image in the HTML rendering.
<image>
  <title>RSS2.0 Example</title>
  <url>http://www.exampleurl.com/example/images/logo.gif</url>
  <link>http://www.exampleurl.com/example/index.html</link>
  <width>88</width>
  <height>31</height>
  <description>The World's Leading Technical Publisher</description>
</image>
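A consumer displaying the image needs concrete dimensions even when width and height are missing. This Python sketch applies the 88 by 31 defaults and the 144 by 400 maximums described above; image_size and _dimension are hypothetical helpers, and rejecting oversized images with ValueError is one possible policy among several (scaling down would be another).

```python
import xml.etree.ElementTree as ET

MAX_WIDTH, MAX_HEIGHT = 144, 400
DEFAULT_WIDTH, DEFAULT_HEIGHT = 88, 31

def _dimension(image, tag, default):
    """Read an integer subelement, falling back to a default if absent or empty."""
    text = image.findtext(tag)
    return int(text) if text and text.strip() else default

def image_size(image):
    """Return (width, height) for a channel <image>, applying defaults and limits."""
    width = _dimension(image, "width", DEFAULT_WIDTH)
    height = _dimension(image, "height", DEFAULT_HEIGHT)
    if not (0 < width <= MAX_WIDTH and 0 < height <= MAX_HEIGHT):
        raise ValueError(f"{width}x{height} exceeds {MAX_WIDTH}x{MAX_HEIGHT}")
    return width, height

image = ET.fromstring(
    "<image><url>http://www.exampleurl.com/example/images/logo.gif</url>"
    "<title>RSS2.0 Example</title>"
    "<link>http://www.exampleurl.com/example/index.html</link></image>"
)
print(image_size(image))  # (88, 31)
```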
- rating
The PICS rating for the feed; it helps parents and teachers control what children access on the Internet. More information on PICS can be found at http://www.w3.org/PICS/. This labeling scheme is little used at present, but an example of a PICS rating would be:
<rating>(PICS-1.1 "http://www.gcf.org/v2.5" labels on "1994.11.05T08:15-0500" until "1995.12.31T23:59-0000" for "http://w3.org/PICS/Overview.html" ratings (suds 0.5 density 0 color/hue 1))</rating>
- textInput
An element that lets RSS feeds display a small text box and Submit button, and associates them with a CGI application. Many RSS parsers support this feature, and many sites use it to offer archive searching or email newsletter sign-ups, for example. textInput has four required subelements:
- title
The label for the Submit button. It can have a maximum of 100 characters.
- description
Text to explain what the textInput actually does. It can have a maximum of 500 characters.
- name
The name of the text object that is passed to the CGI script. It can have a maximum of 20 characters.
- link
The URL of the CGI script.
<textInput>
  <title>Search</title>
  <description>Search the Archives</description>
  <name>query</name>
  <link>http://www.exampleurl.com/example/search.cgi</link>
</textInput>
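When the user presses Submit, the client pairs the name subelement with whatever was typed and sends it to the link URL. This Python sketch assumes an HTTP GET with a query string, which is one plausible way to do it rather than something the element itself dictates, and also assumes the link has no query string of its own; text_input_url is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def text_input_url(text_input, user_text):
    """Build a GET request URL for a <textInput> submission (assumed GET)."""
    link = text_input.findtext("link").strip()
    name = text_input.findtext("name").strip()
    # urlencode quotes the value, e.g. a space becomes "+"
    return f"{link}?{urlencode({name: user_text})}"

box = ET.fromstring(
    "<textInput><title>Search</title><description>Search the Archives</description>"
    "<name>query</name><link>http://www.exampleurl.com/example/search.cgi</link></textInput>"
)
print(text_input_url(box, "rss atom"))
# http://www.exampleurl.com/example/search.cgi?query=rss+atom
```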
- skipDays and skipHours
A pair of elements that control when a client may read the feed. skipDays can contain up to seven day subelements: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday. skipHours contains up to 24 hour subelements, the numbers 0-23, each representing an hour in Greenwich Mean Time (GMT), with hour 0 beginning at midnight. The client should not retrieve the feed during any day or hour listed within these two elements. The elements are ORed, not ANDed: in the example here, the application is instructed not to request the feed during the 8 p.m. hour on any day, and never on a Monday:
<skipDays><day>Monday</day></skipDays>
<skipHours><hour>20</hour></skipHours>
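The ORed behavior is easy to get wrong, so here is a minimal Python sketch of the check a client might run before fetching. It follows the RSS 2.0 specification's 0-23 hour numbering in GMT and, as an assumption, also evaluates skipDays in GMT; may_fetch and DAY_NAMES are hypothetical names, and the datetime passed in must be timezone-aware.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]  # index matches datetime.weekday()

def may_fetch(channel, now):
    """False if the current GMT day or hour appears in skipDays or skipHours."""
    now = now.astimezone(timezone.utc)  # expects an aware datetime
    days = {(d.text or "").strip() for d in channel.iterfind("skipDays/day")}
    hours = {int(h.text) for h in channel.iterfind("skipHours/hour")}
    # The two lists are ORed: matching either one is enough to skip
    return DAY_NAMES[now.weekday()] not in days and now.hour not in hours

channel = ET.fromstring(
    "<channel><skipDays><day>Monday</day></skipDays>"
    "<skipHours><hour>20</hour></skipHours></channel>"
)
print(may_fetch(channel, datetime(2004, 9, 12, 19, 0, tzinfo=timezone.utc)))   # True: Sunday, 19:00
print(may_fetch(channel, datetime(2004, 9, 12, 20, 30, tzinfo=timezone.utc)))  # False: hour 20
print(may_fetch(channel, datetime(2004, 9, 13, 10, 0, tzinfo=timezone.utc)))   # False: Monday
```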
4.2.3. item Elements
RSS 2.0 can have any number of item elements. The item element is at the heart of RSS; it contains the primary content of the feed. Technically, item elements are optional, but a syndication feed with no items is just a glorified link. Not having any items doesn't mean the feed is invalid, just extremely boring.
All item subelements are optional, with the proviso that at least one of item/title or item/description is present. You can use this feature to build lists (more on that later).
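That single rule is simple to check mechanically. This Python sketch, using xml.etree.ElementTree, tests only for the presence of a title or description element, which is the requirement stated above; item_is_valid is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def item_is_valid(item):
    """An RSS 2.0 item must contain a title, a description, or both."""
    return item.find("title") is not None or item.find("description") is not None

print(item_is_valid(ET.fromstring("<item><description>Simple</description></item>")))  # True
print(item_is_valid(ET.fromstring("<item><title>A list entry</title></item>")))         # True
print(item_is_valid(ET.fromstring("<item><link>http://example.org/</link></item>")))    # False
```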
There are 10 standard item subelements available:
- title
Usually, this is the title of the story linked to by the item, but it can also be seen as a one-line list item. There is controversy over whether HTML is allowed within this element; for more information, see Sidebar 4-1.
- link
The URL of the story the item is describing.
- description
A synopsis of the story. The description can contain entity-encoded HTML. Again, as with item/title, see Sidebar 4-1.
- author
This should contain the email address of the author of the resource referred to within the item. The specification's example is in the format email@example.com (firstname lastname) but isn't explained further:
<author>firstname.lastname@example.org (Ben Hammersley)</author>
- category
Exactly the same as channel/category, but it pertains to the individual item only:
<category domain="the twisted passages of my mind">up/to_the_left/there</category>
- comments
This should contain the URL of any comments page for the item; it's primarily used with weblogs:
<comments>http://www.exampleurl.com/comments/001.html</comments>
- enclosure
This describes a file associated with an item. It has no content, but it takes three attributes: url is the URL of the enclosure, length is its size in bytes, and type is the standard MIME type for the enclosure. Some feed applications can download these files automatically. The original idea was that a feed aggregator could be configured to download large media files automatically overnight, thereby deferring the extra bandwidth required. This is an underused feature of RSS 2.0 because most aggregators don't support it, but in 2004, it became the focus of a lot of development around the idea of podcasting. See Sidebar 4-1 for details.
<enclosure url="http://www.example.com/hotxxxpron.mpg" length="34657834" type="video/mpeg"/>
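Reading an enclosure is a matter of pulling three attributes off an empty element. In this Python sketch, read_enclosure is a hypothetical helper, and the "queue only audio" rule is an invented example of the kind of automatic-download policy an aggregator might apply, not part of RSS itself.

```python
import xml.etree.ElementTree as ET

def read_enclosure(item):
    """Return (url, length_in_bytes, mime_type) for an item's enclosure, or None."""
    enc = item.find("enclosure")
    if enc is None:
        return None
    # All three attributes are required, so a missing length raises here
    return enc.get("url"), int(enc.get("length")), enc.get("type")

item = ET.fromstring(
    '<item><enclosure url="http://www.exampleurl.com/example/001.mp3" '
    'length="543210" type="audio/mpeg"/></item>'
)
url, length, mime = read_enclosure(item)
# Hypothetical policy: queue audio files for an off-peak download
if mime.startswith("audio/"):
    print(f"queue {url} ({length} bytes)")
```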
- guid
Standing for Globally Unique Identifier, this element should contain a string that uniquely identifies the item. It must never change, and it must be unique to the object it is describing. If that content changes in any way, it must gain a new guid. This element also has the optional attribute isPermaLink, which, if true (its default value), denotes that the value of the element can be taken as a URL to the object referred to by the item. Therefore, if no item/link element is present but isPermaLink is true, the application can take the value of guid in its place. The specification doesn't say what to do if both are present and aren't the same, but it seems sensible to give preference within any application to the item/link element.
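The link-versus-guid logic described above fits in a few lines. This Python sketch prefers item/link and otherwise falls back to a guid that is a permalink, treating a missing isPermaLink attribute as true per the specification's default; item_url is a hypothetical name, and the case-insensitive comparison is a lenient assumption.

```python
import xml.etree.ElementTree as ET

def item_url(item):
    """Prefer item/link; otherwise use guid when it is a permalink."""
    link = item.findtext("link")
    if link and link.strip():
        return link.strip()
    guid = item.find("guid")
    # isPermaLink defaults to "true" when the attribute is absent
    if guid is not None and guid.text and guid.get("isPermaLink", "true").lower() == "true":
        return guid.text.strip()
    return None

both = ET.fromstring("<item><link>http://example.org/a</link>"
                     "<guid>http://example.org/b</guid></item>")
guid_only = ET.fromstring("<item><guid>http://example.org/b</guid></item>")
opaque = ET.fromstring('<item><guid isPermaLink="false">tag-1234</guid></item>')
print(item_url(both), item_url(guid_only), item_url(opaque))
```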
- pubDate
The publication date of the item. Again, as with channel/pubDate, any information in the item shouldn't be displayed until after the publication date, but few existing RSS readers take any notice of this element in this way. The date is in RFC 822 format.
<pubDate>Mon, 13 Sep 2004 00:23:05 GMT</pubDate>
- source
This should contain the name of the feed of the site from which the item was derived, and the attribute url should be the URL of that other site's feed:
<source url="http://www.anothersite.com/index.xml">Another Site</source>
Example 4-1 shows these parts assembled into an RSS 2.0 XML document.
Example 4-1. An example RSS 2.0 feed
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>RSS2.0Example</title>
    <link>http://www.exampleurl.com/example/index.html</link>
    <description>This is an example RSS 2.0 feed</description>
    <language>en-gb</language>
    <copyright>Copyright 2002, O'Reilly and Associates.</copyright>
    <managingEditor>email@example.com</managingEditor>
    <webMaster>firstname.lastname@example.org</webMaster>
    <rating> </rating>
    <pubDate>Wed, 03 Apr 2002 15:00:00 GMT</pubDate>
    <lastBuildDate>Wed, 03 Apr 2002 15:00:00 GMT</lastBuildDate>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <skipDays><day>Monday</day></skipDays>
    <skipHours><hour>20</hour></skipHours>
    <category domain="http://www.dmoz.org">Business/Industries/Publishing/Publishers/Nonfiction/Business/O'Reilly_and_Associates/</category>
    <generator>NewsAggregator'o'Matic</generator>
    <ttl>30</ttl>
    <cloud domain="www.exampleurl.com" port="80" path="/RPC2" registerProcedure="pleaseNotify" protocol="xml-rpc" />
    <image>
      <title>RSS2.0 Example</title>
      <url>http://www.exampleurl.com/example/images/logo.gif</url>
      <link>http://www.exampleurl.com/example/index.html</link>
      <width>88</width>
      <height>31</height>
      <description>The World's Leading Technical Publisher</description>
    </image>
    <textInput>
      <title>Search</title>
      <description>Search the Archives</description>
      <name>query</name>
      <link>http://www.exampleurl.com/example/search.cgi</link>
    </textInput>
    <item>
      <title>The First Item</title>
      <link>http://www.exampleurl.com/example/001.html</link>
      <description>This is the first item.</description>
      <source url="http://www.anothersite.com/index.xml">Another Site</source>
      <enclosure url="http://www.exampleurl.com/example/001.mp3" length="543210" type="audio/mpeg"/>
      <category domain="http://www.dmoz.org">Business/Industries/Publishing/Publishers/Nonfiction/Business/O'Reilly_and_Associates/</category>
      <comments>http://www.exampleurl.com/comments/001.html</comments>
      <author>email@example.com (Ben Hammersley)</author>
      <pubDate>Tue, 01 Jan 2002 00:00:01 GMT</pubDate>
      <guid isPermaLink="true">http://www.exampleurl.com/example/001.html</guid>
    </item>
    <item>
      <title>The Second Item</title>
      <link>http://www.exampleurl.com/example/002.html</link>
      <description>This is the second item.</description>
      <source url="http://www.anothersite.com/index.xml">Another Site</source>
      <enclosure url="http://www.exampleurl.com/example/002.mp3" length="543210" type="audio/mpeg"/>
      <category domain="http://www.dmoz.org">Business/Industries/Publishing/Publishers/Nonfiction/Business/O'Reilly_and_Associates/</category>
      <comments>http://www.exampleurl.com/comments/002.html</comments>
      <author>email@example.com (Ben Hammersley)</author>
      <pubDate>Wed, 02 Jan 2002 00:00:01 GMT</pubDate>
      <guid isPermaLink="true">http://www.exampleurl.com/example/002.html</guid>
    </item>
    <item>
      <title>The Third Item</title>
      <link>http://www.exampleurl.com/example/003.html</link>
      <description>This is the third item.</description>
      <source url="http://www.anothersite.com/index.xml">Another Site</source>
      <enclosure url="http://www.exampleurl.com/example/003.mp3" length="543210" type="audio/mpeg"/>
      <category domain="http://www.dmoz.org">Business/Industries/Publishing/Publishers/Nonfiction/Business/O'Reilly_and_Associates/</category>
      <comments>http://www.exampleurl.com/comments/003.html</comments>
      <author>email@example.com (Ben Hammersley)</author>
      <pubDate>Thu, 03 Jan 2002 00:00:01 GMT</pubDate>
      <guid isPermaLink="true">http://www.exampleurl.com/example/003.html</guid>
    </item>
  </channel>
</rss>
4.2.4. The Simplest Possible RSS 2.0 Feed
This, really, is the key to the success of RSS 2.0: the minimum you need for a valid feed is very simple indeed (see Example 4-2). While this isn't any help when you are trying to convey complex information, as with RSS 1.0, or if you're trying to build a complete document-centric system, as with Atom, it is very useful for many other applications.
Example 4-2. The simplest possible RSS 2.0 feed
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>The Simplest Feed</title>
    <link>http://example.org/index.html</link>
    <description>The Simplest Possible RSS 2.0 Feed</description>
    <item>
      <description>Simple Simple Simple</description>
    </item>
  </channel>
</rss>
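Generating a feed this small from a script takes very little code. The following Python sketch builds the same structure as Example 4-2 with xml.etree.ElementTree; simplest_feed is a hypothetical helper, and the output omits the XML declaration because tostring with encoding="unicode" returns a plain string without one.

```python
import xml.etree.ElementTree as ET

def simplest_feed(title, link, description, item_descriptions):
    """Build a minimal RSS 2.0 document: three required channel elements plus items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    # Each item carries only a description, which is enough to be valid
    for text in item_descriptions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = text
    # encoding="unicode" returns a str and leaves out the XML declaration
    return ET.tostring(rss, encoding="unicode")

feed = simplest_feed("The Simplest Feed", "http://example.org/index.html",
                     "The Simplest Possible RSS 2.0 Feed", ["Simple Simple Simple"])
print(feed)
```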
Chapter 10 describes many useful applications that take this minimalist approach to using RSS 2.0-compliant feeds.