5.2. Resource Description Framework

This system of defining everything with URIs, and using this to describe the relationships between things, has been formalized in a system known as the Resource Description Framework (RDF). In this section, we'll look at enough RDF to give you a head start on the rest of the book. For a much deeper insight into RDF, take a look at Practical RDF (O'Reilly).

Because RDF is quite abstract (its ability to be written in different ways notwithstanding), in this chapter we are going to look at what the RDF developers call the "data model," which we can call "the really simple version, in pictures."

5.2.1. Resources, PropertyTypes, and Properties

As before, within the data model, anything (an object, a person, a document, a concept, a section of a document, etc.) can have a URI. In RDF anything addressable with a URI is called a resource.

Some resources can be used as properties of other resources. For example, the concept of "Author" has a URI of its own (all concepts can), and other resources can have a property of "author." Such resources are called PropertyTypes.

A property is the combination of a resource, a PropertyType, and a value. For example, "The Author of RSS and Atom is Ben Hammersley." The value can be a string ("Ben Hammersley" in the previous example), or it can be another resource: for example, "Ben Hammersley (resource) has a home page (PropertyType) at http://www.benhammersley.com (resource)."
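As a minimal sketch, a property can be modeled in Python as a three-part tuple. The Triple type and every example.org URI below are hypothetical names invented for illustration; they are not part of RDF or of any RSS vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One property: a resource, a PropertyType, and a value."""
    resource: str       # URI of the thing being described
    property_type: str  # URI of the relationship
    value: str          # a plain string, or the URI of another resource


# Hypothetical URIs, used only for illustration.
BOOK = "http://example.org/books/rss-and-atom"
BEN = "http://example.org/people/ben-hammersley"
AUTHOR = "http://example.org/terms/author"
HOME_PAGE = "http://example.org/terms/homePage"

# Here the value is a plain string.
author_as_string = Triple(BOOK, AUTHOR, "Ben Hammersley")

# Here the value is another resource, identified by its URI.
home_page = Triple(BEN, HOME_PAGE, "http://www.benhammersley.com")

print(author_as_string)
print(home_page)
```

This sketch stores strings and URIs in the same field for brevity; a fuller model would mark which values are strings and which are resources.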

5.2.2. Nodes and Arcs

RDF's data model is most easily understood with diagrams, called RDF graphs, that show the relationships between resources, PropertyTypes, and properties. In these diagrams, the RDF world is split into nodes and arcs.

The resources and the values are the nodes, identified by their URIs. The PropertyTypes are the arcs, representing connections between nodes. The arcs themselves are also described by a URI.

Figure 5-1 is an RDF graph that shows the previous managingEditor example as three nodes connected by two arcs, that is, two separate RDF triples. By convention, the subject is at the blunt end of the arrow, the property (or predicate) is the arrow itself, and the object is at the pointy end of the arrow.

Figure 5-1. A simple RDF graph

In Figure 5-1, the subject node on the left, representing the URI http://www.example.org/example.rss, has a relationship with the object node on the right, representing the URI editor@oreilly.com, and this relationship is defined by the URI http://purl.org/rss/1.0/modules/rss091#managingEditor. The subject node also has a relationship with another object node, representing the URI http://purl.org/rss/1.0/channel, and that relationship is defined by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
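The graph in Figure 5-1 can be sketched in Python as a list of (subject, predicate, object) tuples. The URIs come from the figure description; the helper names arcs_from and nodes are assumptions made for this example.

```python
CHANNEL = "http://www.example.org/example.rss"

# Each tuple is one triple: (subject, predicate, object).
graph = [
    (CHANNEL,
     "http://purl.org/rss/1.0/modules/rss091#managingEditor",
     "editor@oreilly.com"),
    (CHANNEL,
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://purl.org/rss/1.0/channel"),
]


def arcs_from(triples, subject):
    """Return (predicate, object) for every arc leaving `subject`."""
    return [(p, o) for s, p, o in triples if s == subject]


def nodes(triples):
    """Return every node: anything used as a subject or an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


for predicate, obj in arcs_from(graph, CHANNEL):
    # Blunt end of the arrow, the arrow itself, pointy end of the arrow.
    print(f"{CHANNEL} --[{predicate}]--> {obj}")
```

Counting the results gives three nodes and two arcs, matching the figure.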

What makes things interesting with RDF is that, as I've said before, a node can be both a subject and an object in a chain of node, arc, node, arc, node, and so on (or, to put it another way, resource, PropertyType, resource, PropertyType, resource, and so on). Consider the graph in Figure 5-2.

Figure 5-2. An RDF graph of the continuing example

In this example, we've taken the RDF graph a step further. We've created a resource to represent the managing editor and have given it resources of its own, with PropertyType arcs whose URIs represent the managing editor's full name, home page, and email address. You'll notice that the managing editor resource itself is anonymous: we haven't defined it with a URI yet, hence the empty rectangle. This isn't a problem.

This allows some definitive statements:

The channel (where the concept of "channel" is identified by the URI http://purl.org/rss/1.0/ and the channel itself is identified by the URI http://www.example.org/example.rss) has a resource called managingEditor (which is part of a concept defined by the URI http://purl.org/rss/1.0/modules/rss091#), which in turn has one resource of its own, identified as a "home page" in the context of the URI http://example.org/stuff/1.0/, which is itself identified with the URI http://jorge.oreilly.com/. It also has two properties, fullName and email, both in the context of the URI http://example.org/stuff/1.0/, with the values Jorge Grandehoncho and mailto:editor@oreilly.com, respectively.

Or to put it simply:

This channel has a managing editor whose name is Jorge Grandehoncho, whose home page is http://jorge.oreilly.com/, and whose email address is editor@oreilly.com.
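The chain of node, arc, node, arc can be followed mechanically. The sketch below rebuilds the graph of Figure 5-2 under a few assumptions: the label "_:editor" is a made-up local placeholder for the anonymous node, the local name homePage is a guess at how "home page" would be spelled in the http://example.org/stuff/1.0/ vocabulary, and follow is a hypothetical helper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS091 = "http://purl.org/rss/1.0/modules/rss091#"
STUFF = "http://example.org/stuff/1.0/"
CHANNEL = "http://www.example.org/example.rss"
EDITOR = "_:editor"  # assumed local label for the anonymous node

graph = {
    (CHANNEL, RDF_TYPE, "http://purl.org/rss/1.0/channel"),
    (CHANNEL, RSS091 + "managingEditor", EDITOR),
    (EDITOR, STUFF + "fullName", "Jorge Grandehoncho"),
    (EDITOR, STUFF + "homePage", "http://jorge.oreilly.com/"),  # assumed name
    (EDITOR, STUFF + "email", "mailto:editor@oreilly.com"),
}


def follow(triples, start, *predicates):
    """Walk node, arc, node, arc... from `start`; return the nodes reached."""
    current = {start}
    for predicate in predicates:
        current = {o for s, p, o in triples
                   if s in current and p == predicate}
    return current


# "This channel has a managing editor whose name is..."
print(follow(graph, CHANNEL, RSS091 + "managingEditor", STUFF + "fullName"))
```

The anonymous node is both the object of the first triple and the subject of the next three, which is what lets the walk continue through it.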

You should bear two things in mind. First, the continuation of the RDF graph doesn't have to be constrained to one RDF document. The preceding example can be extended by including more RDF data at the network-retrievable version of the resource's URIs. So, while the RDF data for this book may refer to me solely by an author PropertyType and a URI, the RDF at that URI can also refer to other things I have written, and those articles can contain RDF data that refers to the subjects of the articles. This distributed nature of RDF allows for vast fields of statements to be made definitively, and every additional set of RDF data increases the power of the whole considerably. RDF data is designed with aggregation in mind.
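Aggregation can be pictured as a set union. In this sketch, two separately published documents are assumed to have been fetched and parsed already (no network access is shown), and all the URIs are hypothetical. Because both documents use the same URI for the author, merging them joins their graphs at that node.

```python
# Hypothetical URIs, used only for illustration.
AUTHOR = "http://example.org/terms/author"
SUBJECT = "http://example.org/terms/subject"
BEN = "http://example.org/people/ben"
BOOK = "http://example.org/books/rss-and-atom"
ARTICLE = "http://example.org/articles/42"

# Triples from the book's RDF: it knows only the author's URI.
book_doc = {(BOOK, AUTHOR, BEN)}

# Triples from the RDF found at the author's URI.
author_doc = {
    (ARTICLE, AUTHOR, BEN),
    (ARTICLE, SUBJECT, "http://example.org/subjects/dates"),
}

# Merging is a plain union; identical triples collapse into one.
merged = book_doc | author_doc

# The merged graph answers a question neither document could alone.
works = sorted(s for s, p, o in merged if p == AUTHOR and o == BEN)
print(works)
```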

Second, and this will become key later on, because the PropertyTypes (the possible relationships between nodes) are represented by URIs, anyone can develop a set of elements. RDF vocabularies, therefore, can be developed to describe anything. And, as long as the URI is unique, RDF parsers won't get confused. Your descriptive powers, therefore, are endless: either an RDF vocabulary exists, or it is simple to make up your own.
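One way to picture a vocabulary is as a URI prefix plus a set of local names. The vocabulary helper below is a hypothetical convenience, and http://example.org/other/1.0/ is an invented second vocabulary; the point is only that two elements with the same local name remain distinct once their full URIs are compared.

```python
def vocabulary(namespace):
    """Return a function that turns a local name into a full URI."""
    def term(name):
        return namespace + name
    return term


rss091 = vocabulary("http://purl.org/rss/1.0/modules/rss091#")
stuff = vocabulary("http://example.org/stuff/1.0/")
other = vocabulary("http://example.org/other/1.0/")  # invented vocabulary

print(rss091("managingEditor"))
print(stuff("email"))
print(other("email"))

# Same local name, different URIs: a parser sees two different PropertyTypes.
print(stuff("email") == other("email"))
```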

Outside the scope of this book, there are also various languages for describing RDF vocabularies, or ontologies. When you add all of these together, you have what is called the Semantic Web.

5.2.3. Fitting RDF to RSS

This system for creating definitive statements from metadata fits perfectly with the aims of RSS. RSS feeds are, at their core, collections of resources with implicit relationships, and RDF is designed to describe these relationships. Also, and most powerfully, RDF makes these relationships explicit in a way that allows them to be used.

For example, the RDF graph can be traveled in any direction. The statement "This document (subject/resource) was written (predicate/PropertyType) by Ben Hammersley (object/resource)" can be read from the other end of the graph: "Ben Hammersley (subject/resource) wrote (predicate/PropertyType) this document (object/resource)."
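Reading a statement from either end amounts to indexing each triple twice, once by its subject and once by its object. This is a sketch with hypothetical URIs, not a description of how any particular RDF store works.

```python
from collections import defaultdict

# Hypothetical URIs, used only for illustration.
DOC = "http://example.org/docs/this-document"
BEN = "http://example.org/people/ben"
WRITTEN_BY = "http://example.org/terms/writtenBy"

triples = [(DOC, WRITTEN_BY, BEN)]

outgoing = defaultdict(list)  # subject -> [(predicate, object), ...]
incoming = defaultdict(list)  # object  -> [(predicate, subject), ...]
for s, p, o in triples:
    outgoing[s].append((p, o))
    incoming[o].append((p, s))

# "This document was written by Ben Hammersley."
print(outgoing[DOC])

# The same arc read from the other end: "Ben Hammersley wrote this document."
print(incoming[BEN])
```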

So, you can query a database of RDF-based documents for "all the documents written by Ben Hammersley." If more triples are declared within the documents, you can query for "all the documents written by the man with the email address ben@benhammersley.com," or even "all the documents written by the man with the email address ben@benhammersley.com, and which are on the subject of dates." To take it even further, you can query for "all the documents written by the man with the email address ben@benhammersley.com, and which are on the subject of dates (in the context of small fruits, but not romantic encounters)." By taking different paths through an RDF graph, you can extract all sorts of data quite easily. You can also, by adding in RDF vocabularies not covered by RSS, do even more complicated searches, such as, "Find me all the articles written by any friend of Ben Hammersley, during any year that Manchester United won the English Premier League," and searches that are much more complex and interesting.
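Queries like these can be sketched as pattern matching over triples. Everything below is an assumption made for illustration: the vocabulary URIs, the three sample documents, and the helper names match and documents_by_email. Giving each sense of "dates" its own URI is what lets the query tell small fruits from romantic encounters.

```python
# Hypothetical vocabulary and resource URIs, used only for illustration.
TERMS = "http://example.org/terms/"
AUTHOR, EMAIL, SUBJECT = TERMS + "author", TERMS + "email", TERMS + "subject"
DATES_FRUIT = "http://example.org/subjects/dates-the-fruit"
DATES_ROMANCE = "http://example.org/subjects/dates-the-romantic-kind"
BEN = "http://example.org/people/ben"
SOMEONE = "http://example.org/people/someone-else"
DOCS = "http://example.org/docs/"

graph = [
    (BEN, EMAIL, "mailto:ben@benhammersley.com"),
    (DOCS + "1", AUTHOR, BEN),
    (DOCS + "1", SUBJECT, DATES_FRUIT),
    (DOCS + "2", AUTHOR, BEN),
    (DOCS + "2", SUBJECT, DATES_ROMANCE),
    (DOCS + "3", AUTHOR, SOMEONE),
    (DOCS + "3", SUBJECT, DATES_FRUIT),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def documents_by_email(triples, email, subject=None):
    """Find documents whose author has `email`, optionally on `subject`."""
    people = {s for s, _, _ in match(triples, p=EMAIL, o=email)}
    docs = {s for s, _, o in match(triples, p=AUTHOR) if o in people}
    if subject is not None:
        docs &= {s for s, _, _ in match(triples, p=SUBJECT, o=subject)}
    return sorted(docs)


print(documents_by_email(graph, "mailto:ben@benhammersley.com"))
print(documents_by_email(graph, "mailto:ben@benhammersley.com", DATES_FRUIT))
```

Each extra condition in the query is one more path through the graph: email to person, person to document, document to subject.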

The ability of RDF to allow complex querying is one definite attraction, but the implications go further than that. Because RDF works just as well distributed as in a database, publishing an RDF version of RSS provides a remarkably useful entry point for the RDF world to access your site. Also, because the RDF vocabularies are easily definable, anyone can invent one. This makes RDF both wide-ranging and fast-growing, but in a way that doesn't require a single standards overlord. In the language of RSS 1.0, RDF is extensible.