RDF | XML, Web Services, and the Data Revolution



	XML, Web Services, and the Data Revolution By Frank P. Coyle

	Chapter 2. The XML Technology Family

RDF is part of the Semantic Web.

RDF is an effort to bring order to the Web. It is part of the W3C's Semantic Web initiative, an effort not to create a separate Web but to extend the current one in a way that gives information well-defined meaning, better enabling computers and people to work in cooperation. Because there are billions of pieces of data on the Web, the problem is getting to the information you really need. Most search engines fail miserably, returning thousands of unhelpful links because Web pages don't provide information about their content. However, some search engines do better than others because they use metadata.

Yahoo and Google use metadata to improve search results.

Technically, metadata is data about data. The search engines Yahoo and Google use metadata to build useful search links. When you search Yahoo, you're searching through human-generated subject categories and site labels. Google, on the other hand, uses a method that ranks relevant Web sites based on the link structure of the Web itself. For example, Google interprets a link from page A to page B as a vote for page B by page A. The more votes, or incoming links, a page receives, the higher its rank. Also, votes cast by "important" Web pages count more than votes from "unimportant" pages. Either way, smart search requires metadata.
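The voting idea can be sketched in a few lines of Python. This is only an illustration of link-based ranking, not Google's actual algorithm: the function name rank_pages, the damping value of 0.85, the iteration count, and the six-page link graph are all assumptions made for the example.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Rank pages by link votes; votes from highly ranked pages count more.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A and C both vote for B, B votes for D, E votes for F.
links = {
    "A": ["B"],
    "C": ["B"],
    "B": ["D"],
    "E": ["F"],
    "D": [],
    "F": [],
}
ranks = rank_pages(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph B outranks A because it receives two votes and A receives none. D and F each receive a single vote, but D's comes from the well-linked page B while F's comes from the unlinked page E, so D ends up ranked above F.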

Metadata

Metadata: information about information.

Metadata includes the indexing and organization required to retrieve library material such as books by author, title, or subject. It is the software infrastructure behind a large video store catalog that lets a customer find a movie directed by Quentin Tarantino or all movies where the director also appears in the film (Reservoir Dogs, Apocalypse Now). Then when the customer gets the movie home, the metadata of the yellow pages lets one find the phone number for pizza delivery so there will be something to eat while watching the movie.
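As a rough sketch of the video store example, the two lookups can be written as queries over catalog records. The record layout, the function names, and the third entry ("Example Film" with its made-up names) are assumptions for illustration, and the cast lists are abridged.

```python
# Each record is metadata about a film: data describing the data on the shelf.
catalog = [
    {"title": "Reservoir Dogs", "director": "Quentin Tarantino",
     "cast": ["Harvey Keitel", "Tim Roth", "Quentin Tarantino"]},
    {"title": "Apocalypse Now", "director": "Francis Ford Coppola",
     "cast": ["Martin Sheen", "Marlon Brando", "Francis Ford Coppola"]},
    # Hypothetical entry in which the director does not appear on screen.
    {"title": "Example Film", "director": "Jane Doe",
     "cast": ["John Roe"]},
]


def directed_by(catalog, name):
    """Titles of all films directed by the given person."""
    return [film["title"] for film in catalog if film["director"] == name]


def director_appears(catalog):
    """Titles of all films whose director is also in the cast."""
    return [film["title"] for film in catalog
            if film["director"] in film["cast"]]


print(directed_by(catalog, "Quentin Tarantino"))
print(director_appears(catalog))
```

Neither query looks at the films themselves; both are answered entirely from the descriptive records.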

The common thread here is information about information. In each case, you need information about what you're looking for (the book's location, the video's name, the pizza shop's phone number) to zero in on your goal.

Metadata isn't needed but it certainly helps.

Is metadata required? In theory, no. The brute-force approach (looking through a library one book at a time, wandering past video store shelves until you find a movie, or calling every possible number in your area code until you hit on pizza delivery) is always a possibility. But that would be far too time-consuming. Without metadata there wouldn't be time for much else besides brute-force searching.

Metadata: Beyond Search

Although metadata is most commonly used to find things, metadata is also used to support the business side of an enterprise. The video store uses metadata to determine how often videos are being rented, when it's time to move rentals to the for-sale bin, and who its best customers are. Running a viable video store operation would be impossible without metadata.

The Components of RDF

RDF helps organizations exchange metadata.

RDF is used to identify the commonality behind different ways of categorizing data and to represent that commonality in such a way that Web architects can use it to build new and more complex technologies. The Resource Description Framework, as its name implies, is a framework for describing and interchanging metadata. It is built on the following three definitions.

Resources

All things described by RDF expressions are called resources. A resource may be an entire Web page, such as the HTML document http://www.w3.org/Overview.html. A resource may also be a part of a Web page, such as a specific HTML or XML element within the document. A resource is anything that can have a URI; this includes all the Web's pages, as well as individual elements of an XML document.

Properties

Properties are specific aspects, characteristics, attributes, or relations used to describe resources. A particular property is a resource that has a name and can be used as a property, for example Author or Title. In many cases, all we really care about is the name; but a property needs to be a resource so that it can have its own properties.

Statements

A statement consists of a resource, a property, and a value. These parts are known as the subject, predicate, and object of a statement. A typical statement is, "The Author of http://davenet.userland.com/2001/09/10/openSourceIn2001 is Dave Winer." The value can be just a string, for example "Dave Winer," or it can be another resource, as in the example, "The Home-Page of http://davenet.userland.com/2001/09/10/openSourceIn2001 is http://davenet.userland.com."

A specific resource together with a named property plus the value of that property for that resource is an RDF statement. The object of a statement (that is, the property value) can be another resource, specified by a URI, or it can be a literal, that is, a simple string or other primitive data type defined by XML. In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor.

Consider as a simple example the sentence, "Dave Winer is the creator of the resource http://davenet.userland.com/2001/09/10/openSourceIn2001." This sentence has the following parts:

Subject (resource):	http://davenet.userland.com/2001/09/10/openSourceIn2001
Predicate (property):	Creator
Object (literal):	"Dave Winer"
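The three-part structure maps naturally onto a small data type. The following is a minimal sketch, not an RDF library: the class names, the field name obj, and the example.org namespace used for the property names are assumptions, since the text gives the properties only as Creator and Home-Page.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain value, such as a string, that is not itself a resource."""
    value: str


@dataclass(frozen=True)
class Statement:
    subject: Resource                 # the resource being described
    predicate: Resource               # the property (itself a resource)
    obj: Union[Resource, Literal]     # the value of the property


# Hypothetical namespace for the property names used in the text.
VOCAB = "http://example.org/vocab#"

page = Resource("http://davenet.userland.com/2001/09/10/openSourceIn2001")

# The object is a literal string.
creator = Statement(page, Resource(VOCAB + "Creator"), Literal("Dave Winer"))

# The object is another resource.
home_page = Statement(page, Resource(VOCAB + "Home-Page"),
                      Resource("http://davenet.userland.com"))


def object_kind(statement):
    """Report whether a statement's object is a resource or a literal."""
    return "resource" if isinstance(statement.obj, Resource) else "literal"


print(object_kind(creator))
print(object_kind(home_page))
```

Keeping Resource and Literal as separate types makes the distinction explicit: a program can follow a resource to find further statements about it, while a literal is simply a value.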

RDF Vocabularies

RDF may be used to define specialized vocabularies.

Properties standing alone, however, are not very useful. The expectation is that properties will be packaged, for example, as a set of basic bibliographic properties such as Author, Title, and Date. Over time, property collections or vocabularies will emerge in competition with each other, such as vocabularies for online learning or wine connoisseurship. This means that opinions, pointers, indexes, or anything that helps discovery will have high value. Diversity of ideas inevitably leads to a diversity of vocabularies, since anyone can come up with a vocabulary, advertise it, and charge a fee. The market will help the good ones survive.
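One way to picture a vocabulary is as a set of property names packaged under a shared namespace URI. The sketch below assumes that convention; the Vocabulary class and both example.org namespaces are hypothetical and do not refer to any published vocabulary.

```python
class Vocabulary:
    """A named package of properties sharing one namespace URI."""

    def __init__(self, namespace, names):
        self.namespace = namespace
        self._names = frozenset(names)

    def __getitem__(self, name):
        # Only properties that the vocabulary actually defines can be used.
        if name not in self._names:
            raise KeyError(f"{name!r} is not defined in {self.namespace}")
        return self.namespace + name


# Two hypothetical vocabularies that both happen to define "Title".
BIBLIO = Vocabulary("http://example.org/vocab/biblio#",
                    ["Author", "Title", "Date"])
WINE = Vocabulary("http://example.org/vocab/wine#",
                  ["Title", "Vintage"])

print(BIBLIO["Title"])
print(WINE["Title"])
```

Because each property is identified by its full URI, two vocabularies can use the same short name without colliding, which is what allows competing vocabularies to coexist.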

RDF is designed to have the following characteristics:

Independence: Properties that tell us something about a resource can be invented by anyone. There is no one list of accepted ways to categorize things. Although different categorization schemes already exist (for example, those of the Library of Congress), RDF is flexible enough to admit other schemes that have not yet been invented.
Interchangeability: RDF statements are convertible into XML, which means they can be exchanged across the Web.
Scalability: RDF statements are simple, three-part records (resource, property, value), so they are easy to handle and retrieve data from, even when they exist in large numbers.
Properties functioning as resources: Properties themselves can have properties that can be discovered and manipulated like any other resource. This recursive aspect of RDF is important because, for example, I might want to know if anyone out there has defined a property that describes the genre of a movie, with values like Comedy, Horror, Romance, or Thriller. I'll need metadata to help with that.
Values functioning as resources: Most Web pages have a property named Home-Page, which points to the Web site from which the page came. This value is itself a resource. Property values must therefore be able to include resources.
Statements functioning as resources: Statements themselves can have properties. Because there is so much varied information on the Web, we'll need to do lookups based on other people's categorizations (as is done with Yahoo). This means that any statement, such as "The Subject of this page is Java Technology," needs to have properties that tell us "Who said so? And when?" Thus statements need properties.
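Several of these characteristics can be seen together in a short sketch that stores statements as three-part tuples. The rdf: namespace and its Statement, subject, predicate, and object terms come from the RDF specification; everything under example.org, the SaidBy, SaidOn, and Comment properties, the cataloger's name, and the date are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # the RDF namespace
EX = "http://example.org/vocab#"        # hypothetical vocabulary namespace
PAGE = "http://example.org/java.html"   # hypothetical page being described
CLAIM = "http://example.org/claims/1"   # hypothetical URI naming one statement

triples = [
    # The statement from the text: the Subject of this page is Java Technology.
    (PAGE, EX + "Subject", "Java Technology"),
    # A property functioning as a resource: it has a property of its own.
    (EX + "Subject", EX + "Comment", "The topic a page is about"),
    # The statement functioning as a resource, so it can be described too.
    (CLAIM, RDF_NS + "type", RDF_NS + "Statement"),
    (CLAIM, RDF_NS + "subject", PAGE),
    (CLAIM, RDF_NS + "predicate", EX + "Subject"),
    (CLAIM, RDF_NS + "object", "Java Technology"),
    (CLAIM, EX + "SaidBy", "A. Cataloger"),   # who said so (made-up name)
    (CLAIM, EX + "SaidOn", "2002-01-15"),     # and when (made-up date)
]


def properties_of(triples, subject):
    """Collect the property/value pairs recorded for one subject."""
    return {p: o for s, p, o in triples if s == subject}


def to_rdf_xml(subject, namespace, name, value):
    """Write one statement with a literal value as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", namespace)
    root = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(root, "{%s}Description" % RDF_NS)
    desc.set("{%s}about" % RDF_NS, subject)
    prop = ET.SubElement(desc, "{%s}%s" % (namespace, name))
    prop.text = value
    return ET.tostring(root, encoding="unicode")


who_and_when = properties_of(triples, CLAIM)
print(who_and_when[EX + "SaidBy"], who_and_when[EX + "SaidOn"])

xml_text = to_rdf_xml(PAGE, EX, "Subject", "Java Technology")
print(xml_text)
```

The same lookup function answers questions about a page, a property, or a statement, because all three are treated as resources. The XML produced here covers only the simplest case, a single statement whose value is a literal.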

The Semantic Web is seen as an evolving data repository.

For Tim Berners-Lee, the real power of the Semantic Web will come about when program agents collect Web content from a variety of sources and exchange their results with other programs and agents. The effectiveness of these agents will increase as more Web content and services become available. The dream is that a Semantic Web will allow agents not explicitly designed to work together to transfer data by using RDF semantics that describe what the data really is all about.

