Section 11.2. Case Study: mod_Book | Developing Feeds with RSS and Atom

11.2. Case Study: mod_Book

My wife and I recently moved from London, England, to Florence, Italy, via Sweden. In the first edition of this book, we were just about to leave, and to that effect, much of the contents of our home was already in storage: most of it being books. In the end, it turned out we sent 86 tea chests full of books across Europe.

So now we're unpacking. Many people really like our books and would like to borrow them, and so, for many reasons, it would be quite cool to list details of our books into a feed. As we unpack the books, we will most likely try to scan their barcodes and arrange our library (we're geeky like that), so we will have all sorts of data available.

The challenge, then, is to design a module for both RSS 1.0 and RSS 2.0 (and Atom, eventually) that can deal with books.

11.2.1. What Do We Know?

The first thing to think about is precisely what knowledge you already have about what you're trying to describe. With books, you know a great deal:

The title
The author
The publisher
The ISBN
The subject
The date of publication
The content itself

There are also, alas, things that you might think you know, but which, in fact, you don't. In the case of books, unless you are dealing with a specific edition in a specific place at a specific time, you don't know the number of pages, the price, the printer, the paper quality, or how critics received it. For the sake of sharable data, these aren't universally useful values. They will change with time and aren't internationally sharable. Remember that once it has left your machine, the data you create (in this case, each item) is lost to you. As the first author, it is your responsibility to create it in such a way that it retains its value for as long as possible with as wide an audience as possible.

So, Rule 1 of module design is: decide what data you know, and what data you do not know.

11.2.2. Can We Express This Data Already?

Rule 2 of module design is: if possible, use another module's element to deliver the same information.

This is another key point. It is much less work to leverage the efforts of others, and when many people have spent time introducing Dublin Core support to desktop readers, for example, you should reward them by using Dublin Core as much as possible. Module elements need to be created only if there is no suitable alternative already in the wild.

So, to reexamine the data:

The title

Titles can be written within the core title element of either 1.0 or 2.0, or within the dc:title element of the Dublin Core module. You should always strive to use the core namespace first, so title it is.

The author

Here, you have the first core split between 1.0 and 2.0. In 2.0, you can use the core author element. There is no such thing in 1.0, so you must use the dc:creator element of Dublin Core. Because you should always strive to use the core namespace first, RSS 2.0 users should use author. However, because it's best to have as simple a module specification as possible, use the same element in both module versions. You might want to import the RSS 2.0 namespace into the 1.0 feed and use author in both; however, this can't be done. RSS 2.0's root namespace is "", which can't be imported because there isn't a namespace URI to point to. You can possibly use the URL of the 2.0 specification document as the URI, declare xmlns:rss2="http://backend.userland.com/rss", and then use rss2:author, but because the URI is different, technically this doesn't refer to the same vocabulary as that used in RSS 2.0. As you'll see, using the same element, even if it is in a slightly different syntax, is very useful if you wish to develop RSS applications. So, for the sake of simplicity, let's opt for dc:creator. You can also use dc:contributor to denote a contributor.

The publisher

Publishers are lovely people and happily have their very own Dublin Core element, dc:publisher.

The ISBN

ISBNs are fantastically useful here. Because the ISBN governing body ensures that each ISBN is unique to its book, this can serve as a globally unique identifier. What's more, you can even turn an ISBN into a URI using the format urn:isbn:0123456789. For RSS 1.0, this will prove remarkably useful, as we will discuss in a moment. Meanwhile, denoting the ISBN is a good idea. Let's invent a new element. Choosing book as the namespace prefix, let's call it book:isbn.

The subject

A book's subject can be a matter of debate, especially with fiction, so it may not be entirely sane to make this element mandatory or to trust it. Nevertheless, there are ways to write it. RSS 2.0's core element category may help here, as will dc:subject, especially when used with RSS 1.0's mod_taxonomy.

All these schemes, however, rely on being able to place the subject within a greater hierarchy. Fortunately, library scientists are hard at work on this, and there are many to choose from. For our purposes, let's use the Open Directory hierarchy, just to provide continuity throughout this book.

The date of publication

Again, you can see a clash between the extended core of RSS 2.0 and RSS 1.0's use of Dublin Core. Within RSS 2.0 pubDate is available, and within RSS 1.0, we rely on dc:date. Given that Dublin Core is more widely recognized within the RDF world and perfectly valid within the RSS 2.0 world, it saves time and effort to standardize on it. This is a good example of Rule 3: because you can't tell people what they can't do with your data, you must make it easy for them to do what they want.

The content itself

Now to the content itself. The core description doesn't work here: we're talking about the content, not a précis of it, and we certainly don't want to include all the content, so content:encoded is out too. We really need an element to contain an excerpt of the book: the opening paragraph, for example.

Hurrah! We can invent a new element! Let's call it book:openingPara.

So, out of all the information we want to include, we need to invent only two new elements: book:isbn and book:openingPara. This isn't a bad thing: modules do not just consist of masses of new elements slung out into the public. They should also include guidance as to the proper usage of existing modules in the new context. Reuse and recycle as much as possible.
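Standardizing on dc:date has one practical consequence for anyone whose data already lives in RSS 2.0: pubDate values are written in the RFC 822 style, whereas dc:date is conventionally written in the W3C date-time format. The following is a minimal Python sketch of that conversion, not part of the module itself. The function name pubdate_to_dc_date is hypothetical, and treating a date with no time zone as UTC is an assumption you may not want to share.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def pubdate_to_dc_date(pub_date: str) -> str:
    """Convert an RFC 822 style date (RSS 2.0 pubDate) to a W3CDTF string.

    Hypothetical helper for illustration only.
    """
    parsed = parsedate_to_datetime(pub_date)
    if parsed.tzinfo is None:
        # Assumption: a date without zone information is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Minute precision matches the dc:date values used in this section.
    return parsed.isoformat(timespec="minutes")


print(pubdate_to_dc_date("Sat, 01 Feb 2003 00:01:00 GMT"))
```

Run against the publication date used in this section's examples, the sketch yields the same string that appears in their dc:date elements.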

To summarize, we now have:

<title/>
<dc:creator/>
<dc:publisher/>
<book:isbn/>
<dc:subject/>
<dc:date/>
<book:openingPara/>
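Since book:isbn doubles as an identifier, it is worth checking a scanned value before it goes into a feed. Here is a rough Python sketch that validates a ten-digit ISBN using the standard ISBN-10 check-digit rule and turns it into the urn:isbn: form described earlier. The function names are hypothetical, and the sketch deliberately handles only ten-digit ISBNs, which is an assumption based on the examples in this section.

```python
def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces and upper-case a trailing 'x' check character."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """Apply the ISBN-10 rule: the weighted digit sum must be divisible by 11."""
    chars = normalize_isbn(isbn)
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # 'X' stands for 10, and only as the final check character
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += value * (10 - position)  # weights run from 10 down to 1
    return total % 11 == 0


def isbn_to_urn(isbn: str) -> str:
    """Return the urn:isbn: form of a valid ISBN-10 (hypothetical helper)."""
    if not is_valid_isbn10(isbn):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    return f"urn:isbn:{normalize_isbn(isbn)}"


print(isbn_to_urn("0765304368"))
```

A barcode scanner that misreads a single digit will, in most cases, produce a value that fails this check, which is a cheap safeguard when cataloguing 86 tea chests of books.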

11.2.3. Putting the New Elements to Work with RSS 2.0

Before creating the feed item, let's decide on what the link will point to. Given that my book collection isn't web-addressable in that way, I'm going to point people to the relevant page on http://isbn.nu, Glenn Fleishman's book-price comparison site.

For an RSS 2.0 item, you can therefore use Example 11-1.

Example 11-1. mod_Book for RSS 2.0

<item>
  <title>Down and Out in the Magic Kingdom</title>
  <link>http://isbn.nu/0765304368/</link>
  <dc:creator>Cory Doctorow</dc:creator>
  <dc:publisher>Tor Books</dc:publisher>
  <book:isbn>0765304368</book:isbn>
  <dc:subject>Fiction</dc:subject>
  <dc:date>2003-02-01T00:01+00:00</dc:date>
  <book:openingPara>I lived long enough to see the cure for death; to see the rise of the
    Bitchun Society, to learn ten languages; to compose three symphonies; to realize my
    boyhood dream of taking up residence in Disney World; to see the death of the workplace
    and of work.</book:openingPara>
</item>

As you can see in this simple strand of RSS 2.0, the inclusion of book metadata is easy. You know all about the book, and a mod_Book-compatible reader allows you to read the first paragraph and, if it's appealing, to click on the link and buy it. All is good.
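If the catalogue data lives in a script rather than in hand-written XML, an item like this can be generated. The following is a sketch only, using Python's standard xml.etree.ElementTree. It uses dc:creator, the Dublin Core element chosen for the author in Section 11.2.2, and it assumes the placeholder namespace URI http://www.exampleurl.com/namespaces for the book prefix. The function name and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumption: a placeholder namespace URI for mod_Book, not a published one.
BOOK_NS = "http://www.exampleurl.com/namespaces"

# Ask ElementTree to serialize these namespaces with readable prefixes.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("book", BOOK_NS)


def build_book_item(book: dict) -> ET.Element:
    """Build an RSS 2.0 <item> carrying mod_Book metadata (illustrative only)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = book["title"]
    # The link points at the isbn.nu page for the book, as in the text above.
    ET.SubElement(item, "link").text = f"http://isbn.nu/{book['isbn']}/"
    ET.SubElement(item, f"{{{DC_NS}}}creator").text = book["creator"]
    ET.SubElement(item, f"{{{DC_NS}}}publisher").text = book["publisher"]
    ET.SubElement(item, f"{{{BOOK_NS}}}isbn").text = book["isbn"]
    ET.SubElement(item, f"{{{DC_NS}}}subject").text = book["subject"]
    ET.SubElement(item, f"{{{DC_NS}}}date").text = book["date"]
    ET.SubElement(item, f"{{{BOOK_NS}}}openingPara").text = book["opening_para"]
    return item


BOOK = {
    "title": "Down and Out in the Magic Kingdom",
    "creator": "Cory Doctorow",
    "publisher": "Tor Books",
    "isbn": "0765304368",
    "subject": "Fiction",
    "date": "2003-02-01T00:01+00:00",
    "opening_para": "I lived long enough to see the cure for death...",
}

print(ET.tostring(build_book_item(BOOK), encoding="unicode"))
```

ElementTree writes the xmlns declarations on the outermost element it serializes; in a complete feed they would normally sit on the rss root element instead.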

11.2.4. Putting the New Elements to Work with RSS 1.0

With RSS 1.0, let's make a few changes. First, the book needs a URI for the rdf:about attribute of item. This isn't as straightforward as you might think. You need to think about precisely what you're describing. In this case, the choice is between a specific book (the one that is sitting on my desk right now) and the concept of that book, of which my specific book is one example.

The URI determines this. If I make the URI http://www.benhammersley.com/myLibrary/catalogue/0765304368, the item refers to my own copy: one discrete object.

If, however, I make the URI urn:isbn:0765304368, the item refers to the general concept of Cory Doctorow's book. For our purposes here, this is the one to go for. If I were producing an RSS feed for a lending library, it might be different. Example 11-2 makes these changes to mod_Book in RSS 1.0.

Example 11-2. mod_Book in RSS 1.0

<item rdf:about="urn:isbn:0765304368">
  <title>Down and Out in the Magic Kingdom</title>
  <link>http://isbn.nu/0765304368/</link>
  <dc:creator>Cory Doctorow</dc:creator>
  <dc:publisher>Tor Books</dc:publisher>
  <book:isbn>0765304368</book:isbn>
  <dc:subject>Fiction</dc:subject>
  <dc:date>2003-02-01T00:01+00:00</dc:date>
  <book:openingPara>I lived long enough to see the cure for death; to see the rise of the
    Bitchun Society, to learn ten languages; to compose three symphonies; to realize my
    boyhood dream of taking up residence in Disney World; to see the death of the workplace
    and of work.</book:openingPara>
</item>
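The choice between describing one physical copy and describing the book in general comes down to a single string, so it is easy to make explicit in code. Here is a small Python sketch under stated assumptions: item_uri and build_rss1_item are hypothetical names, and the catalogue_base parameter is simply my way of signalling "this is a specific copy in a specific collection."

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"


def item_uri(isbn: str, catalogue_base: Optional[str] = None) -> str:
    """Pick the URI for rdf:about.

    Without a catalogue base, the item describes the book in general.
    With one, it describes a single copy held in that catalogue.
    """
    if catalogue_base is None:
        return f"urn:isbn:{isbn}"
    return f"{catalogue_base.rstrip('/')}/{isbn}"


def build_rss1_item(isbn: str, title: str,
                    catalogue_base: Optional[str] = None) -> ET.Element:
    """Build a minimal RSS 1.0 item whose rdf:about reflects that choice."""
    item = ET.Element(
        f"{{{RSS1_NS}}}item",
        {f"{{{RDF_NS}}}about": item_uri(isbn, catalogue_base)},
    )
    ET.SubElement(item, f"{{{RSS1_NS}}}title").text = title
    ET.SubElement(item, f"{{{RSS1_NS}}}link").text = f"http://isbn.nu/{isbn}/"
    return item


general = build_rss1_item("0765304368", "Down and Out in the Magic Kingdom")
my_copy = build_rss1_item(
    "0765304368",
    "Down and Out in the Magic Kingdom",
    catalogue_base="http://www.benhammersley.com/myLibrary/catalogue",
)
print(general.get(f"{{{RDF_NS}}}about"))
print(my_copy.get(f"{{{RDF_NS}}}about"))
```

A lending library would presumably pass its own catalogue address, because there the individual copy is exactly what a borrower cares about.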

The second thing to think about is the preference for all the element values within RSS 1.0 to be rdf:resources and not literal strings. To this end, you need to assign URIs to each possible value. Within RSS 1.0, you can keep extending all the information you have to greater and greater detail. At this point, you must think about your audience. If you foresee people using the feed for only the simplest of tasks, such as displaying the list in a reader or on a site, you can stop now. If you see people using the data in deeper, more interesting applications, then you need to give guidance as to how far each element should be extended.

For the purposes of this chapter, we need to go no further, but for an example, let's go anyway. Example 11-3 expands the dc:creator element via RDF and the use of a new RDF vocabulary: FOAF, or Friend of a Friend (see http://www.rdfweb.org).

Example 11-3. Expanding the module even further

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:book="http://www.exampleurl.com/namespaces"
  xmlns="http://purl.org/rss/1.0/"
>

  <item rdf:about="urn:isbn:0765304368">
    <title>Down and Out in the Magic Kingdom</title>
    <link>http://isbn.nu/0765304368/</link>
    <dc:creator rdf:resource="mailto:doctorow@craphound.com" />
    <dc:publisher>Tor Books</dc:publisher>
    <book:isbn>0765304368</book:isbn>
    <dc:subject>Fiction</dc:subject>
    <dc:date>2003-02-01T00:01+00:00</dc:date>
    <book:openingPara>I lived long enough to see the cure for death; to see the rise of the
      Bitchun Society, to learn ten languages; to compose three symphonies; to realize my
      boyhood dream of taking up residence in Disney World; to see the death of the workplace
      and of work.</book:openingPara>
  </item>

  <foaf:Person rdf:about="mailto:doctorow@craphound.com">
    <foaf:name>Cory Doctorow</foaf:name>
    <foaf:title>Mr</foaf:title>
    <foaf:firstName>Cory</foaf:firstName>
    <foaf:surname>Doctorow</foaf:surname>
    <foaf:homepage rdf:resource="http://www.craphound.com"/>
    <foaf:workplaceHomepage rdf:resource="http://www.eff.org/" />
  </foaf:Person>

</rdf:RDF>
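The cost of this extra detail falls on whoever consumes the feed: instead of reading a name out of the item, they must follow the rdf:resource reference to wherever the person is described. The following is a rough Python sketch of that lookup, not a real RDF parser. It assumes the author is referenced from dc:creator (the element chosen in Section 11.2.2) and described by a top-level foaf:Person node with a matching rdf:about, which is one conventional way to write this in RDF/XML. The function name creator_names and the trimmed-down feed are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rss": "http://purl.org/rss/1.0/",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

# A cut-down feed in the spirit of Example 11-3 (assumed structure).
FEED = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="urn:isbn:0765304368">
    <title>Down and Out in the Magic Kingdom</title>
    <dc:creator rdf:resource="mailto:doctorow@craphound.com"/>
  </item>
  <foaf:Person rdf:about="mailto:doctorow@craphound.com">
    <foaf:name>Cory Doctorow</foaf:name>
  </foaf:Person>
</rdf:RDF>"""


def creator_names(feed_xml: str) -> dict:
    """Map each item's URI to its creator's name.

    Follows rdf:resource to a foaf:Person when there is one, and falls
    back to the literal text of dc:creator otherwise.
    """
    root = ET.fromstring(feed_xml)
    # Index every described person by the URI that identifies them.
    people = {
        person.get(RDF_ABOUT): person.findtext("foaf:name", namespaces=NS)
        for person in root.findall("foaf:Person", NS)
    }
    names = {}
    for item in root.findall("rss:item", NS):
        creator = item.find("dc:creator", NS)
        if creator is None:
            continue
        uri = creator.get(RDF_RESOURCE)
        names[item.get(RDF_ABOUT)] = people.get(uri, creator.text)
    return names


print(creator_names(FEED))
```

A reader that understands only literal dc:creator values would find nothing to display here, which is exactly why a module needs to say how far each element may be extended.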

Because only you, as the module designer, know the scope of the data you want to put across, you must document your module accordingly. Speaking of which...

11.2.5. Documentation

You must document your module. It is obligatory. The place to do so is at the address you use as the namespace URI. Without documentation, no one will know precisely what you mean, and no one will be able to support your module. Without support, the module is worthless on the wider stage.