Section 7.8. Web-Related XML Applications | Web Design in a Nutshell: A Desktop Quick Reference (In a Nutshell (OReilly))

7.8. Web-Related XML Applications

XML is already being put to powerful uses on the Web. Some languages, like XHTML and RSS, are expanding the possibilities of web-based content and changing the way we use the Web itself. Others have found small niche uses (such as SMIL and MathML) or have yet to live up to their promised potential (such as SVG). This section introduces these XML languages and others that are relevant to the Web.

Figure 7-1. An unstyled XML document displayed in Firefox 1.0

Figure 7-2. An XML document with a CSS style sheet displayed in Firefox 1.0

7.8.1. XHTML (Extensible Hypertext Markup Language)

In the context of XML, XHTML is a language for describing the content of hypertext documents intended to be viewed or read in some sort of browsing client. It uses a DTD that declares such elements as paragraphs, headings, lists, and hyperlinks. It uses the namespace http://www.w3.org/1999/xhtml.

In the context of web design, XHTML is the updated version of HTML and is the current W3C recommendation for authoring web pages. It has all the same elements and attributes as the HTML 4.01 Recommendation, but where HTML was written according to the broader rules of SGML, XHTML has been rewritten according to XML syntax. That means that XHTML documents need to be well-formed, requiring more stringent markup practices. XHTML is by far the dominant use of XML on the Web.
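The well-formedness requirement can be illustrated with a short sketch. It uses Python's standard XML parser as a stand-in for any XML processor; the helper name `is_well_formed` and the sample markup are illustrative assumptions, not part of any XHTML tool. Note that this only checks well-formedness; it does not validate the markup against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: unquoted attribute value and an unclosed <br>.
html_style = '<p class=intro>Line one<br>Line two</p>'

# XHTML-style markup: attribute value quoted, empty element closed.
xhtml_style = '<p class="intro">Line one<br />Line two</p>'

print(is_well_formed(html_style))   # prints False
print(is_well_formed(xhtml_style))  # prints True
```

A browser's forgiving HTML parser would accept both fragments; an XML parser rejects the first one outright, which is why XHTML demands stricter authoring habits.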

XHTML is discussed in great detail in Chapters 8 through 15.

7.8.2. RSS (Really Simple Syndication or RDF Site Summary)

RSS is an XML language and file format for syndicating web content. The elements in the RSS vocabulary provide metadata about content (such as its headline, author, description, and originating site) that allows content to be shared as data, known as an RSS feed. While originally intended for headlines and short summaries, some RSS feeds now contain the full content of each posting, including marked-up XHTML content. The content of the feed is up to the discretion of the author.
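To make the idea of "content shared as data" concrete, here is a minimal sketch of a small RSS 2.0 feed and a few lines of Python that read its metadata. The feed contents, the example.com URLs, and the function name `read_items` are invented for illustration; real feeds usually carry more elements than the ones shown here.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed with one channel and two items.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>Headlines from a hypothetical news site</description>
    <item>
      <title>Oscar nominations announced</title>
      <link>http://www.example.com/news/oscars</link>
      <description>The full list of nominees.</description>
    </item>
    <item>
      <title>Local weather update</title>
      <link>http://www.example.com/news/weather</link>
      <description>Rain expected through Friday.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dictionary of metadata per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Because each headline, link, and description sits in its own element, another site or a feed reader can pull out exactly the pieces it wants to display.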

RSS feeds can be used to display information from other sites on a web page, such as headlines from Slashdot on a technology-related site. RSS feeds can also be read using special programs called feed readers (or news readers). Readers may be web-based or standalone desktop applications. Web sites that combine feeds from many sources in one place are sometimes called aggregators.

Some popular RSS feed readers include SharpReader (Windows), NetNewsWire (Mac), and the web-based Bloglines. A web search for "RSS readers" will turn up many more. Some browsers, such as Firefox 1.0 and Safari RSS, come with built-in RSS readers.

7.8.2.1. How it works

To understand how RSS works, consider this scenario. Say you have a favorite news site that is updated frequently throughout the day, and you want to make sure you don't miss its Oscar nomination announcement. You could use your web browser to visit the site every 20 minutes and scan through it for new posts, but that would waste a lot of time. If that site is RSS-enabled (and most news sites are), every time it posts an article, a listing of that article is added to the site's RSS feed. Feed readers that have subscribed to the site check the feed periodically (once an hour or so) and display the new listing the next time they check. Using a news reader, you could keep an eye on new articles as they are posted and take a break only when you see Oscar in the title.
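The scenario above can be sketched in a few lines. This is a simplified model of what a feed reader does, not the implementation of any particular reader: the function names `new_items` and `make_feed` are hypothetical, and `make_feed` merely stands in for fetching the feed over HTTP so that the example runs without a network connection.

```python
import xml.etree.ElementTree as ET


def make_feed(entries):
    """Build a tiny RSS 2.0 document (stand-in for downloading a real feed)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for title, link in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs not seen on earlier checks.

    Links of the returned items are added to seen_links, so the next
    check reports only items posted after this one.
    """
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        link = item.findtext("link")
        if link not in seen_links:
            seen_links.add(link)
            fresh.append((item.findtext("title"), link))
    return fresh


seen = set()

# First check: the feed contains one article.
first_poll = make_feed([("Weather update", "http://www.example.com/weather")])
print(new_items(first_poll, seen))

# An hour later: the site has added an article to the top of its feed.
second_poll = make_feed([
    ("Oscar nominations announced", "http://www.example.com/oscars"),
    ("Weather update", "http://www.example.com/weather"),
])
for title, link in new_items(second_poll, seen):
    if "Oscar" in title:
        print("Time for a break:", link)
```

The reader never scans the designed page; it compares the feed's items against what it has already shown and surfaces only the new ones.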

RSS was originally developed to create web "channels" during the days of web push technologies, and news sites were the first to put it to widespread use. But it wasn't until the weblog (or blog) phenomenon that the RSS acronym became as familiar as HTML.

Because blog creation software such as Blogger and Movable Type made it easy to publish content as an RSS feed, most bloggers make their site content available both on a web page and via an RSS feed (watch for the ubiquitous orange RSS or XML icon). That means that you can use a news reader to see when your friends post without having to check every blog, every day. Furthermore, you can often read the content right there in the reader, without skipping from site to site.

Many web users have integrated spending time with their RSS feed readers into their daily routines. Bloggers are finding that an increasing number of visitors are reading their sites via RSS feeds rather than in the context of a designed page. In this way, RSS has made a significant impact on how information is produced and consumed.

7.8.2.2. Trouble over an RSS standard

The story of the development of RSS has all the makings of a daytime drama. Along the way, RSS developers divided into two camps, both claiming the right to the initials "RSS" for their specifications. The result is that we now have two recent standards, RSS 1.0 and RSS 2.0, that sound sequential but actually conflict. In addition, there are several older incompatible flavors of RSS (0.91, 0.92, 0.93, and others) that are still in use.

The history of the RSS "fork" is well documented, and it makes for some interesting reading. Check out Mark Pilgrim's blow-by-blow account taken from actual message board and mailing list posts at diveintomark.org/archives/2002/09/06/history_of_the_rss_fork. You can also find a more general RSS history by Joseph Reagle at goatee.net/2003/rss-history.html.

RSS 1.0 is the product of the RSS-DEV Working Group, a committee of individuals, some of whom had worked on various incarnations of RSS since its inception. Their vision for RSS (RDF Site Summary) is that it should take full advantage of RDF (a metadata syntax discussed below) and XML namespaces in order to harness the full power of XML. They added these features to the RSS 0.91 spec, which was then in development, and called the result RSS 1.0.

On the other side of the debate is David Winer (of Userland Software), who maintains that RSS became so popular in the first place because it was so simple to author and use. It achieved this simplicity specifically because it didn't include RDF or namespaces, and David and others wanted it to stay that way. David made minor changes to RSS 0.91 and called the result RSS 2.0 (for Really Simple Syndication). RSS 2.0 is not RDF-based, but it does address namespaces.

Developers on both sides of the RSS controversy agree that the technology is far too useful to suffer from conflicting and confusing standards. As of this writing, everyone has agreed to work toward a unified method, or at least distinctive names, for web syndication.

7.8.2.3. Enter Atom

In June 2003, Sam Ruby set up a wiki to discuss and design "a well-formed log entry." Many of those frustrated with both the political drama and technical limitations of RSS joined the effort, and in June 2004 formally set up the Atompub Working Group at the IETF (Internet Engineering Task Force, a volunteer organization that develops Internet standards) to develop and formalize a new feed format and publishing protocol called Atom (formerly Echo). The Atompub Working Group's goal is to create a single standard for syndicated content feeds based on experience gained with RSS.

As of this writing, Atom 1.0 has been published and accepted as a proposed standard. Atom is being backed and implemented by some important syndication tool developers and indexers (e.g., Google and Technorati).
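One practical consequence of having several feed formats is that software must work out which one it has been handed. The following sketch distinguishes the three families by their root element. The function name `feed_format` and the sample documents are illustrative assumptions; the Atom namespace shown is the one defined for Atom 1.0, a detail not given in the text above. An `rdf:RDF` root only indicates an RDF-based feed, of which RSS 1.0 is the best-known case.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ATOM_NS = "http://www.w3.org/2005/Atom"


def feed_format(feed_xml):
    """Guess the syndication format from the document's root element."""
    root = ET.fromstring(feed_xml)
    if root.tag == "{%s}RDF" % RDF_NS:
        # RDF-based feeds wrap everything in rdf:RDF.
        return "RDF-based RSS (such as RSS 1.0)"
    if root.tag == "rss":
        # RSS 0.9x and 2.0 use an <rss> root with a version attribute.
        return "RSS " + root.get("version", "(no version given)")
    if root.tag == "{%s}feed" % ATOM_NS:
        return "Atom 1.0"
    return "unknown"


# Minimal, hypothetical documents: just enough to show each root element.
rss1 = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns="http://purl.org/rss/1.0/">'
        '<channel rdf:about="http://www.example.com/">'
        '<title>Example</title></channel></rdf:RDF>')
rss2 = '<rss version="2.0"><channel><title>Example</title></channel></rss>'
atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'

for document in (rss1, rss2, atom):
    print(feed_format(document))
```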

7.8.2.4. For further reading

For more information on RSS and Atom, visit these online resources:

web.resource.org/rss/1.0/: RSS 1.0 specification
blogs.law.harvard.edu/tech/rss: RSS 2.0 specification
www.intertwingly.net/slides/2003/rssQuickSummary.html: A comparison of RSS specifications
ietf.org/html.charters/atompub-charter.html: IETF's Atom Publishing Format and Protocol Charter
www.intertwingly.net/wiki/pie/FrontPage: The Atom Project

7.8.3. RDF (Resource Description Framework)

RDF is an XML application used to define the structure of metadata for documents; for example, data that is useful for indexing, navigating, and searching a site. A standard method for describing the contents of a web site, page, or resource could be useful to automated agents that search the Web for specific information.

Metadata could be used in the following ways:

For descriptions of resources to provide better search engine capabilities
In cataloging, for describing the content and content relationships available at a particular web site, page, or digital library
In describing collections of pages that represent a single logical "document"
For digital signatures that allow electronic commerce, collaboration, and other "trust"-based applications

A simple RDF document that provides author information about a book looks like this (this example is taken from and describes the O'Reilly book XML in a Nutshell):

 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description about="urn:isbn:0596000588">
       <author>Elliotte Rusty Harold</author>
       <author>W. Scott Means</author>
    </rdf:Description>
 </rdf:RDF>

The first line of code declares the namespace for RDF as http://www.w3.org/1999/02/22-rdf-syntax-ns#.
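To show how an automated agent might use such a description, here is a minimal sketch that reads the example with Python's standard XML parser. It treats the document as plain XML rather than using a dedicated RDF library, and the function name `describe` is a hypothetical helper introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description about="urn:isbn:0596000588">
    <author>Elliotte Rusty Harold</author>
    <author>W. Scott Means</author>
  </rdf:Description>
</rdf:RDF>"""


def describe(rdf_xml):
    """Map each described resource to the list of its <author> values."""
    root = ET.fromstring(rdf_xml)
    result = {}
    # Namespaced elements are addressed as {namespace-URI}local-name.
    for description in root.findall("{%s}Description" % RDF_NS):
        resource = description.get("about")
        result[resource] = [a.text for a in description.findall("author")]
    return result


for resource, authors in describe(RDF_DOC).items():
    print(resource, "was written by", " and ".join(authors))
```

The agent never has to guess which text on a page is an author name; the metadata states it explicitly for the resource identified by the ISBN.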

For more information about RDF, see the W3C's pages at www.w3.org/RDF/.

7.8.4. SVG (Scalable Vector Graphics)

The W3C is developing the Scalable Vector Graphics (SVG) standard for describing two-dimensional graphics in XML. SVG allows for three types of graphic objects: vector graphic shapes (paths consisting of straight lines and curves), images, and text. The following sample SVG code (taken from the W3C Recommendation) creates an SVG document fragment that contains a red circle with a blue outline (stroke):

 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN"
    "http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd">
 <svg width="12cm" height="4cm">
    <desc>Example circle01 - circle expressed in physical units</desc>
    <circle cx="6cm" cy="2cm" r="1cm"
            style="fill:red; stroke:blue; stroke-width:0.1cm" />
 </svg>

The SVG standard provides ways to describe paths, fills, a variety of shapes, special filters, text, and basic animation. When using SVG within another XML document type, identify its namespace as http://www.w3.org/2000/svg.
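Because SVG is plain XML, graphics can be generated by a program as easily as they can be drawn by hand. The sketch below builds a circle like the one in the sample and declares the SVG namespace mentioned above. The function name `circle_svg` and its parameters are assumptions made for this example, not part of any SVG toolkit.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Serialize SVG elements with a default namespace (xmlns="...")
# instead of an auto-generated prefix.
ET.register_namespace("", SVG_NS)


def circle_svg(width, height, cx, cy, r, fill, stroke):
    """Return an SVG document (as a string) containing one circle."""
    svg = ET.Element("{%s}svg" % SVG_NS, {"width": width, "height": height})
    desc = ET.SubElement(svg, "{%s}desc" % SVG_NS)
    desc.text = "A generated circle expressed in physical units"
    ET.SubElement(svg, "{%s}circle" % SVG_NS, {
        "cx": cx,
        "cy": cy,
        "r": r,
        "style": "fill:%s; stroke:%s; stroke-width:0.1cm" % (fill, stroke),
    })
    return ET.tostring(svg, encoding="unicode")


print(circle_svg("12cm", "4cm", "6cm", "2cm", "1cm", "red", "blue"))
```

Changing the arguments produces a different graphic without touching an image editor, which is one of the attractions of a text-based vector format.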

To view SVG graphics, you must have an SVG viewer installed. The most popular is Adobe's SVG Viewer (available as a free download at www.adobe.com), which allows SVG documents to display in a browser window. Adobe also includes tools for creating SVG files in Illustrator and GoLive. (As of this writing, it is unclear whether Adobe will continue to support GoLive now that it has acquired Macromedia.)

For more information on SVG and lists of all available viewers, editors, and converters, see the W3C pages at www.w3.org/Graphics/SVG. Or, if you want your information in book form, try SVG Essentials by J. David Eisenberg (O'Reilly) or Fundamentals of SVG Programming: Concepts to Source Code by Oswald Campesato (Charles River Media).

7.8.5. SMIL (Synchronized Multimedia Integration Language)

SMIL (pronounced "smile") is an XML language for combining audio, video, text, animation, and graphics in a precise, synchronized fashion. A SMIL file instructs the client to retrieve media elements that reside on the server as standalone files. Those separate elements are then assembled and played by the SMIL player.

The SMIL 1.0 Recommendation, released in June of 1998, was one of the first XML-based DTDs proposed by the W3C. The SMIL 2.0 Recommendation, first released in August 2001 and reissued as a second edition in January 2005, greatly expands upon the functionality established in the initial specification. It is broken into modules to be used with XHTML 1.1.

7.8.5.1. How SMIL works

The best way to get a quick understanding of SMIL is to look at a simple example. The following SMIL code creates a 15-second narrated slideshow, in which an audio track plays as a series of three images is displayed in sequence.

 <par dur="15s">
    <audio src="audio_file.mp3" begin="0s" />
    <seq>
       <img src="image_1.jpg" begin="0s" />
       <img src="image_2.jpg" begin="5s" />
       <img src="image_3.jpg" begin="10s" />
    </seq>
 </par>

Looking at the code, it is easy to pick out the audio and image elements. Each points to a separate media file on the server.

All elements contained within the <par> element are played in parallel (at the same time); therefore, the audio will continue playing as the images are displayed. The image elements are contained in the <seq> element, which means they will be played one after another (in sequence). The begin attribute gives timing instructions for when each element should be displayed. In the example, the images will display in slideshow fashion every five seconds.
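The timeline described above can be listed with a short sketch. It simply reads each media element's begin attribute as the text describes; it is not an implementation of the SMIL timing model, and a real SMIL player does considerably more. The file names are simplified placeholders and the function name `schedule` is hypothetical.

```python
import xml.etree.ElementTree as ET

SMIL_FRAGMENT = """<par dur="15s">
  <audio src="audio_file.mp3" begin="0s" />
  <seq>
    <img src="image_1.jpg" begin="0s" />
    <img src="image_2.jpg" begin="5s" />
    <img src="image_3.jpg" begin="10s" />
  </seq>
</par>"""


def schedule(fragment):
    """Return (begin_seconds, element_name, src) tuples sorted by start time."""
    root = ET.fromstring(fragment)
    entries = []
    for media in root.iter():
        if media.tag in ("audio", "img"):
            # "5s" -> 5; this sketch only handles whole seconds.
            begin = int(media.get("begin", "0s").rstrip("s"))
            entries.append((begin, media.tag, media.get("src")))
    return sorted(entries)


for begin, kind, src in schedule(SMIL_FRAGMENT):
    print("%2ds  %-5s %s" % (begin, kind, src))
```

Reading the output from top to bottom gives the order of the slideshow: the audio and the first image start together, and a new image appears every five seconds.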

For more information on SMIL, take a look at SMIL 2.0: Interactive Multimedia for Web and Mobile Devices by Dick C.A. Bulterman and Lloyd Rutledge (Springer). Or you can check out these online resources:

W3C SMIL resources: Go right to the source for a good starting place for research or to keep up to date on the latest developments. See www.w3.org/AudioVideo/. For a thorough explanation of all SMIL elements and their supported attributes and values, make your way through the SMIL 2.0 Recommendation at www.w3.org/TR/smil20/cover.html.
JustSMIL Home (now part of Streaming Media World): This is a great site containing tutorials, product reviews, news, tips, and other useful SMIL information. See smw.internet.com/smil/smilhome.html.

7.8.6. MathML (Mathematical Markup Language)

MathML is an XML application for describing mathematical notation and capturing both its structure and content. The goal of MathML is to enable mathematics to be served, received, and processed on the World Wide Web. The MathML 2.0 Recommendation was released by the W3C in October 2003.

Because there is no way to reproduce mathematical equations directly using HTML, authors have resorted to inserting graphical images of equations into the flow of text. This effectively removes the information from the structure of the document. MathML allows the information to remain in the document in a meaningful way. With adequate style sheets, mathematical notation can be formatted for high-quality visual presentation. Several vendors offer applets and plug-ins that allow the display of MathML information in browser windows.
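As a rough illustration of what "remaining in the document in a meaningful way" means, the sketch below marks up the expression x² + 1 in presentation MathML and then reads the identifiers, numbers, and operators back out of it, something that is impossible with an image of the same equation. The markup is a small hand-written example, and the function name `math_tokens` is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Presentation MathML for the expression x squared plus 1.
MATHML_DOC = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <mn>1</mn>
  </mrow>
</math>"""


def math_tokens(doc):
    """List (element_name, text) for identifiers, numbers, and operators."""
    root = ET.fromstring(doc)
    found = []
    for element in root.iter():
        # Strip the "{namespace}" prefix the parser adds to each tag.
        name = element.tag.split("}")[1]
        if name in ("mi", "mn", "mo"):
            found.append((name, element.text))
    return found


print(math_tokens(MATHML_DOC))
```

A search engine, screen reader, or computer algebra program can work with these tokens directly, while a style sheet or plug-in handles the visual rendering.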

For examples of MathML, see the Recommendation at www.w3.org/TR/2003/REC-MathML2-20031021. The main MathML page (www.w3.org/Math) is a good starting place for information.

7.8.7. Other XML Applications

There are far too many XML applications to list in a book. However, you may find that the more languages you are aware of, the better your grasp of XML's possibilities. The following are just a handful of the XML applications you may hear about.

DocBook: DocBook is a DTD for technical publications and software documentation. It is officially maintained by the DocBook Technical Committee of OASIS, and its official home page is at www.oasis-open.org/committees/docbook/.
Chemical Markup Language (CML): CML is used for managing and presenting molecular and technical information over a network. For more information, see www.xml-cml.org.
Open Financial Exchange (OFX): OFX is a joint project of Microsoft, Intuit, and Checkfree. It is an XML application for describing financial transactions that take place over the Internet. For more information, see www.ofx.net/ofx/default.asp.