Specifications


The specifications for RSS 2.0 and 1.0 vary in their stringency and guidance when building feeds. Although much of the information is self-explanatory, it's worth discussing some areas in detail, because they are common sources of error.

RSS 2.0

RSS 2.0 is arguably the most common RSS feed produced today, as well as being the simplest specification (running about 10 pages). However, it is also the most misinterpreted of the three main syndication formats.

The core structure of an RSS 2.0 feed consists of:

  • A root node of rss, with a version attribute (should be 2.0).

  • One (and only one) channel node within the rss node. This node serves as a container for the remainder of the document and provides information about the feed. The following table discusses the elements of the channel node. Elements from other vocabularies may be added if their namespace is declared in the feed.

  • One or more item nodes within the channel node. These are the individual items of the feed.
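The structure just listed can be sketched in a few lines of Python using the standard library's ElementTree module. This is only a minimal illustration, not a complete feed generator: the function name build_feed and the example.com URLs are hypothetical, and only the three required channel elements are written.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")    # root node with version attribute
    channel = ET.SubElement(rss, "channel")   # the one and only channel node
    # The three required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # Every item node lives inside the channel node.
    for entry in items:
        item = ET.SubElement(channel, "item")
        for name, value in entry.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(rss, encoding="unicode")


# Hypothetical content, for illustration only.
feed_xml = build_feed(
    "Example Feed",
    "http://www.example.com/",
    "A hypothetical feed used for illustration.",
    [{"title": "First post", "link": "http://www.example.com/1.html"}],
)
print(feed_xml)
```

Because ElementTree escapes text as it serializes, this approach avoids the malformed XML that often results from building feeds by string concatenation.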


Element

Required/Optional

Notes

title

Required

A name for the feed. This is typically the same as the name for the site or application it comes from.

link

Required

The URL for the Web site producing the feed. In the case of application-specific feeds, this should be to a site providing more information on the content of the feed.

description

Required

A longer description of the source and content of the feed.

language

Optional

The language the channel is written in, using the language-locale format. For example, en-us for United States English, fr-ca for Canadian French, or fr-be for Belgian French.

copyright

Optional

Any copyright notice for the content in the feed.

managingEditor

Optional

The e-mail address of the person responsible for the content of the feed.

webMaster

Optional

The e-mail address for the person responsible for the technical source of the feed. Typically, this is the same as managingEditor, but it may be different if one person creates the feed and another makes it available for reading.

pubDate

Optional

The last publication date for the feed. See the following sidebar, Dates and RSS, because dates are one of the primary sources of errors and incompatibilities in RSS 2.0 feeds.

lastBuildDate

Optional

Similar to the pubDate, the lastBuildDate is the date (and time) when the feed was last built. Generally, pubDate and lastBuildDate are the same. However, if the feed needs to be changed without publishing a new item (such as when there is a correction or update), only lastBuildDate changes. This date could be used by a client to determine if an update has occurred, although this is rarely done.

category

Optional

The name of a category describing the content of the feed. Multiple category elements may exist in the channel.

generator

Optional

The application used to create the feed. Basically for information only.

docs

Optional

The URL of the RSS 2.0 specification (http://www.blogs.law.harvard.edu/tech/rss). This serves a similar purpose to a namespace URL by providing a location to get more information about the structure of the feed.

cloud

Optional

A rarely used element identifying the cloud Web service that can be used to notify clients of changes to the feed. I have never seen such a service, and most RSS processors actually only make use of the ttl element shown next.

ttl

Optional

"Time To Live" is the time (in minutes) that clients should wait before re-querying the feed. This should be set to a value based on the average change frequency of the RSS feed. For example, a news site might update hourly, so the ttl value should be 60. Alternatively, a personal RSS feed might update only occasionally, so a daily value (1440) would be good enough.

image

Optional

Information used to attach a graphic to the feed. Some aggregators use it to customize the appearance of the feed icon, but most ignore it.

rating

Optional

A rarely used element containing the PICS rating (that is, the Platform for Internet Content Selection, a standard way of identifying the type and rating of content). The intent is to enable teachers and parents to manage what children may be exposed to on the Internet. This is only useful if your Web site requires a PICS rating.

textInput

Optional

A rarely used element (I don't think I've ever seen this outside of the specification, and even the specification states: "The purpose of the textInput element is something of a mystery."). Best to just ignore this field and move on.

skipHours

Optional

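A rarely used element (I don't think I've ever seen this outside of the specification). This is a hint to applications reading the feed that they can skip checking for updates during the hours listed (0 to 23, in GMT, each in its own hour child element).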

skipDays

Optional

A rarely used element (I don't think I've ever seen this outside of the specification). This is a hint to applications reading the feed that they can skip checking for updates on the days listed (each in its own day child element).

Dates and RSS

RSS 2.0 uses the slightly outdated RFC 822 for its date format. This format has the general structure:

 Day of Week, Day Month Year Hour:Minute:Second Timezone 

where Day of Week is an optional value. For example, the following are all valid date formats based on the specification:

  • Tue, 15 Nov 2005 16:00:01 PDT

  • 13 Feb 2006 07:37:00 -0800

  • Wed, 02 Oct 2002 13:00:00 GMT

This format can get confusing because the time zone value can be any of the following: UT, GMT, EST, EDT, CST, CDT, MST, MDT, PST, PDT, Z, A, M, N, or a numeric offset (+/-0000 to +/-1200). These refer to Universal Time (UT or GMT), the US time zones, military time zones (the single-letter values, which are generally considered deprecated), or an offset from Universal Time. Making it slightly worse, RSS 2.0 also allows two-digit years. All this variability means that parsing these dates can be difficult.

For .NET developers, this is a problem because .NET supports a later RFC, 1123. This RFC tightened the date structure: it calls for four-digit years and recommends numeric offsets in place of the named US time zones. As in RFC 822, the day of the week may appear but is not required. Dates written to RFC 1123 are also acceptable as RSS 2.0 dates, but not vice versa.

However, the default parser in .NET throws an exception when passed any of the US time zones. That is, DateTime.Parse("Tue, 15 Nov 2005 16:00:01 PDT") throws an exception because this format is not supported under the later RFC.
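Other platforms are more forgiving. As a point of comparison, here is a minimal Python sketch that reads these dates with the standard library's email.utils module, which understands the named US time zones. The function name parse_rss_date is hypothetical, and treating a date with no usable zone (including the military letters, which this parser does not interpret) as Universal Time is an assumption you may not want in your own code.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_rss_date(text):
    """Parse an RSS 2.0 (RFC 822 style) date; return None if it is unreadable."""
    try:
        parsed = parsedate_to_datetime(text.strip())
    except (TypeError, ValueError):
        # Raised for unparseable input (the exception type varies by Python version).
        return None
    if parsed.tzinfo is None:
        # No usable zone information; treating it as UTC is an assumption.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Normalize everything to Universal Time for easy comparison.
    return parsed.astimezone(timezone.utc)


for sample in (
    "Tue, 15 Nov 2005 16:00:01 PDT",
    "13 Feb 2006 07:37:00 -0800",
    "Wed, 02 Oct 2002 13:00:00 GMT",
):
    print(sample, "->", parse_rss_date(sample).isoformat())
```

Normalizing to Universal Time as soon as a date is read means the rest of the application never has to care which of the many time zone spellings the feed used.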

RSS 1.0 avoids many of these issues (as does Atom, as we shall see soon) by using the ISO 8601 standard for its date format:

 Year-Month-DayTHour:Minute:SecondTimeZone 

where the time zone can be either Z for Universal Time or a numeric offset (+/-Hours:Minutes, for example -08:00 or +05:00).
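To show how much simpler this format is to handle, here is a small Python sketch that reads and writes it. The function names parse_iso_date and format_iso_date are hypothetical. The explicit handling of Z is there because older Python versions do not accept that suffix in datetime.fromisoformat.

```python
from datetime import datetime, timezone


def parse_iso_date(text):
    """Parse a Year-Month-DayTHour:Minute:SecondTimeZone value."""
    text = text.strip()
    if text.endswith("Z"):
        # Spell out the offset so that older Python versions accept it.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def format_iso_date(moment):
    """Format an aware datetime, writing Z for Universal Time."""
    text = moment.isoformat(timespec="seconds")
    if text.endswith("+00:00"):
        return text[:-6] + "Z"
    return text


pacific = parse_iso_date("2005-11-15T16:00:01-08:00")
print(format_iso_date(pacific))
print(format_iso_date(pacific.astimezone(timezone.utc)))
```

There is only one way to write the offset and only one field order, so there is very little for a feed producer to get wrong.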


The following table discusses the elements of the item node:


Element

Required/Optional

Notes

title

Optional (see notes)

The title of the item. This is usually the headline. Although optional, it is highly recommended if you want a useful feed. At least one of title or description must be present in each item.

link

Optional (see notes)

The URL of the item. This can be omitted for information-only feeds, but this is rare. The content of this element is also interpreted differently by some sites. Most feeds use this element to point to the URL of the main post. However, posts that refer to content that exists elsewhere, such as a blog post about a Web site, actually point to the original Web site, not the blog post. Either use is acceptable, as long as you are consistent.

description

Optional (see notes)

Either the excerpt or the full post. This is one of those "religious arguments" frequent in the computer industry. Many people believe that the feed should only contain a brief excerpt of the actual post, requiring users to go to the Web site to read the full post. This form reduces the overall bandwidth of the feed, particularly when large posts are syndicated. On the other hand, it reduces the overall value of the feed, especially for aggregating sites.

author

Optional

The e-mail address of the author of the post. This is included here primarily for feeds that may contain posts by multiple authors.

category

Optional

The name of a category describing the content of the item. Multiple category elements may exist in the item. Client applications can use these categories to organize or filter the content.

comments

Optional

URL of a page for entering comments.

enclosure

Optional

For a long time, this was a rarely used element. Then came podcasting, and it became a popular tag. The enclosure element is used to identify an external media item associated with this item. Often, it is a music item or video. Applications processing this element (podcasting software) download this media item automatically. The enclosure element takes the form:

 <enclosure url="url" length="bytes" type="mime" /> 
  

Where url is the URL of the item, bytes is the length of the item (in bytes, as a courtesy to applications downloading it), and mime is the MIME type of the media. For example:

 <enclosure url="http://www.cbc.ca/quirks/media/2005-2006/mp3/qq-2006-02-11.mp3" length="22280320" type="audio/mpeg"/> 

guid

Optional

A string uniquely identifying the item. This is one of the other major inconsistencies in the RSS specification. Many RSS feeds use the URL of the post here, with the optional attribute isPermaLink set to true:

 <guid isPermaLink="true">http://www.foo.com/23.html</guid> 

Other sites use it simply as an opaque identifier:

 <guid isPermaLink="false">Titan_5052</guid> 
  

The only required consistency is that within the feed, the guid element should be unique. Beyond that, you're on your own.

pubDate

Optional

The date and time of the posting, in RFC 822 format.

source

Optional

A rarely used element. This is essentially a reference back to the feed the item came from, in case the item is republished or viewed separately. It takes the form:

 <source url="url to feed">Feed title</source> 
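To tie the item elements together, here is a minimal Python sketch that builds one item with a guid, a pubDate, and an optional enclosure. The function name build_item and the example.com URLs are hypothetical. Using the permalink as the guid is just one of the two conventions described in the table.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_item(title, link, published, media=None):
    """Build one RSS 2.0 item; media is an optional (url, bytes, mime) tuple."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Here the permalink doubles as the unique identifier.
    guid = ET.SubElement(item, "guid", isPermaLink="true")
    guid.text = link
    # format_datetime writes an RFC 822 style date with a numeric offset.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    if media is not None:
        url, size, mime = media
        ET.SubElement(item, "enclosure", url=url, length=str(size), type=mime)
    return item


# Hypothetical podcast episode, for illustration only.
episode = build_item(
    "Hypothetical episode",
    "http://www.example.com/episodes/1.html",
    datetime(2006, 2, 11, 12, 0, 0, tzinfo=timezone.utc),
    media=("http://www.example.com/media/episode1.mp3", 22280320, "audio/mpeg"),
)
print(ET.tostring(episode, encoding="unicode"))
```

Writing the date with a numeric offset rather than a time zone name sidesteps the parsing problems described in the Dates and RSS sidebar.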

Categories versus tags

There are two primary ways of identifying the topic of an item in a feed. Although the RSS 2.0 specification includes the category element, some blogging engines do not support creating or adding categories. This limitation has led to the tagging movement, originally proposed by Technorati. The idea of either format is to identify the general scope of the content of the feed item, that is, to mark it as being about photography, music, or the Olympics. Software processing the feed can then associate items that share a category or tag.

The category element is an optional element for each item, and an item can include multiple category elements. The format is:

 <category domain="taxonomy">value</category> 

The domain value is optional and should point to the URI describing the category's taxonomy. Just as with a namespace URI, this could be a schema or other description of the structure of the categories, or it could be a unique value for each category. For example, MSDN includes categories in its article feeds:

 <category domain="mscomdomain:Subject">XML</category> 

The tag element is something that can be added to the body of the description of an item and is a simple HTML anchor tag:

 <a href="taxonomy" rel="tag">value</a> 

The href for the taxonomy should point to the URL of any site that organizes content by tag. The suggested default is http://www.technorati.com/tag/[tagname], but it could also be Wikipedia, Flickr, Delicious, or another site, as long as the last segment of the URL is the tag name. For example:

 <a href="http://www.technorati.com/tag/XML" rel="tag">XML</a> 

If your blogging engine or other RSS generator doesn't support category elements, tags are the only option. However, even if your blogging engine supports categories, you may also want to provide tags. They are a lightweight way of adding information to the feed. In addition, using a tag enables your content to be combined with the growing body of other content associated with that tag.
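For software reading a feed, the two approaches require different handling: categories are XML elements, while tags are HTML links buried in the description. The following Python sketch collects both from a single item. The names TagLinkParser and topics_for, and the sample item itself, are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLinkParser(HTMLParser):
    """Collect the text of every <a rel="tag"> link in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel") or ""
        if tag == "a" and "tag" in rel.split():
            self._in_tag_link = True
            self.tags.append("")

    def handle_data(self, data):
        if self._in_tag_link:
            self.tags[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag_link = False


def topics_for(item):
    """Return (category values, tag values) for one item element."""
    categories = [c.text for c in item.findall("category")]
    parser = TagLinkParser()
    parser.feed(item.findtext("description", default=""))
    parser.close()
    return categories, parser.tags


# Hypothetical item carrying one category and one tag.
ITEM_XML = """<item>
  <title>Hypothetical post</title>
  <category domain="mscomdomain:Subject">XML</category>
  <description>&lt;p&gt;Notes on feeds.
    &lt;a href="http://www.technorati.com/tag/RSS" rel="tag"&gt;RSS&lt;/a&gt;&lt;/p&gt;</description>
</item>"""

print(topics_for(ET.fromstring(ITEM_XML)))
```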


RSS 2.0 is certainly easy to create, and it is broadly accepted. Parsing these feeds from multiple sources can be a bit of a headache, however, because of the variability in the interpretation of the specification and lack of an XML schema for validation. Because of this, I highly recommend use of an online validator, such as http://www.feedvalidator.org.

RSS 1.0

RSS 1.0 is a very extensible syndication format based on the W3C's Resource Description Framework (RDF). Although not as simple as RSS 2.0 (the specification is 18 pages to RSS 2.0's 10 pages, not counting the additional pages of the RDF specification), it certainly has fewer ambiguities. In RSS 1.0, the acronym expands to RDF Site Summary, a nicely nested acronym within an acronym. It is the logical child of RSS 0.9 (but not 0.91 or 0.92) and improves on that format. The main goals were to standardize the then orphaned RSS 0.9 and enable additional expansion and evolution by adding modules. In addition, the specification is much more precise, reducing misinterpretation.

The core structure of an RSS 1.0 feed consists of:

  • A root node of RDF from the rdf namespace (http://www.w3.org/1999/02/22-rdf-syntax-ns#).

  • One (and only one) channel node within the RDF element. This contains information about the channel itself. The following table discusses the common elements of this node.

  • An items element within the channel element. This includes pointers to all the item elements of the feed, in order.

  • One or more item nodes. Note that (unlike in RSS 2.0) these are not contained within the channel node, but are child elements of the RDF element. The following table discusses the elements of the item nodes.

  • Optionally, one image node. This node is associated with the feed and is usually an icon (88×31 pixels) for the home site.

  • Optionally, one textinput node.

  • Liberal use of the rdf:about attribute to provide extra information for the elements. This attribute is required for the channel, image, item, and textinput nodes.

The following table discusses the standard elements of the RSS 1.0 channel.


Element

Required/Optional

Notes

title

Required

A name for the feed. This is typically the same as the name for the site or application it comes from.

link

Required

URL to the home page for the feed.

description

Required

A description of the channel's content.

image

Required (see notes)

Occasionally used. This element is only required if an image element is also in the feed. It is typically an icon (88×31 pixels) used by the site.

items

Required

A listing of pointers to the item nodes elsewhere in the document. This is used to provide the order of the items using an rdf:Seq (sequence) element:

 <rdf:Seq>
   <rdf:li rdf:resource='http://url.to.item1' />
   <rdf:li rdf:resource='http://url.to.item2' />
   <rdf:li rdf:resource='http://url.to.item3' />
 </rdf:Seq> 

textinput

Required (see notes)

Not frequently used. This element is required only if a textinput element is also used in the feed. It contains a pointer to that textinput in an rdf:resource attribute.

rdf:about

Required

An XML attribute that points to the URI identifying the channel. Generally, this is the URL of the feed, but this is not required.

The following table discusses the standard elements of the RSS 1.0 item.


Element

Required/Optional

Notes

title

Required

The title or headline of the item.

link

Required

The URL of the item.

description

Optional

A brief description or excerpt of the item.
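Because the items sit outside the channel, reading an RSS 1.0 feed takes one more step than reading RSS 2.0: the rdf:Seq in the channel supplies the order, and the rdf:about attribute on each item supplies the match. The Python sketch below shows one way to do this. The function name ordered_titles and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}
ABOUT = "{%s}about" % NS["rdf"]
RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical feed, for illustration only.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.example.com/feed.rdf">
    <title>Example Feed</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical RSS 1.0 feed.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.example.com/2.html" />
        <rdf:li rdf:resource="http://www.example.com/1.html" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.com/1.html">
    <title>First post</title>
    <link>http://www.example.com/1.html</link>
  </item>
  <item rdf:about="http://www.example.com/2.html">
    <title>Second post</title>
    <link>http://www.example.com/2.html</link>
  </item>
</rdf:RDF>"""


def ordered_titles(feed_xml):
    """Return item titles in the order given by the channel's rdf:Seq."""
    root = ET.fromstring(feed_xml)
    # Items are children of rdf:RDF, not of channel.
    by_uri = {item.get(ABOUT): item for item in root.findall("rss:item", NS)}
    order = [
        li.get(RESOURCE)
        for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)
    ]
    return [
        by_uri[uri].findtext("rss:title", namespaces=NS)
        for uri in order
        if uri in by_uri
    ]


print(ordered_titles(FEED))
```

Note that the document order of the item nodes does not matter here; the sequence in the channel decides.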

In addition to the channel and item elements, RSS 1.0 feeds can also include image and textinput elements. These elements are rarely used. If they do exist, however, you also need matching elements in the channel to point to them. The image element is used to provide a graphic for the feed, typically an icon. The textinput element is a bit of a throwback to the days when pages contained their own search mechanism (for example, the isindex tag) and is intended to provide a mechanism for searching the RSS feed.

In addition to the core RSS elements, RSS 1.0 also includes support for extensibility through modules. These are namespace-identified extensions that add information to the feed. The three most commonly used modules are:

  • Dublin Core: Standard metadata elements used here, and in RDF generally. These include items such as author, date, language, and so on.

  • Syndication: Provides hints to readers about how frequently the feed should be updated. This fills the same role as the skipDays, ttl, and skipHours elements of RSS 2.0.

  • Content: One of the ongoing discussions around RSS is whether the description should hold the entire body of the item. If it is just an excerpt, the content:encoded element can be used to hold the entire post.
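Reading module elements is simply a matter of knowing their namespaces. The following Python sketch pulls a few values from each of the three modules. The function name read_modules and the sample feed are hypothetical, and the fallback of 1 for a missing sy:updateFrequency is my reading of the Syndication module, so treat it as an assumption.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "sy": "http://purl.org/rss/1.0/modules/syndication/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical feed, for illustration only.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel rdf:about="http://www.example.com/feed.rdf">
    <title>Example Feed</title>
    <link>http://www.example.com/</link>
    <description>A hypothetical feed.</description>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>2</sy:updateFrequency>
    <items><rdf:Seq>
      <rdf:li rdf:resource="http://www.example.com/1.html" />
    </rdf:Seq></items>
  </channel>
  <item rdf:about="http://www.example.com/1.html">
    <title>First post</title>
    <link>http://www.example.com/1.html</link>
    <description>A short excerpt.</description>
    <dc:creator>A. Writer</dc:creator>
    <dc:date>2006-02-13T07:37:00-08:00</dc:date>
    <content:encoded><![CDATA[<p>The full post.</p>]]></content:encoded>
  </item>
</rdf:RDF>"""


def read_modules(feed_xml):
    """Return selected module values from the channel and the first item."""
    root = ET.fromstring(feed_xml)
    channel = root.find("rss:channel", NS)
    item = root.find("rss:item", NS)
    return {
        "period": channel.findtext("sy:updatePeriod", namespaces=NS),
        # Assumption: a missing frequency means once per period.
        "frequency": int(
            channel.findtext("sy:updateFrequency", default="1", namespaces=NS)
        ),
        "creator": item.findtext("dc:creator", namespaces=NS),
        "date": item.findtext("dc:date", namespaces=NS),
        # Prefer the full post; fall back to the excerpt.
        "body": item.findtext("content:encoded", namespaces=NS)
        or item.findtext("rss:description", namespaces=NS),
    }


print(read_modules(FEED))
```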

Although RSS 1.0 is slightly more difficult to read and write than RSS 2.0, it has the benefit of being a more precise specification. That is, with less ambiguity in the specification, multiple implementations of RSS 1.0 are more likely to match. This is definitely not the case for RSS 2.0, where it sometimes seems that each person creating a feed has interpreted the specification differently.




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215
