24.1. What Is XML?

XML, or Extensible Markup Language, has many similarities to HTML. Like HTML, it's a tag-based language used to identify different pieces of information and to structure data into a meaningful document. For example, HTML has the <h1> tag to identify the most important headline, or the <ul> tag to denote a bulleted list. But HTML has only a handful of tags, and, in many cases, they don't always meaningfully identify the information you're presenting. For example, you can format a news title like "Bigfoot to Wed Super Model" with an <h1> tag, but you could also use the <h1> tag to format the name of a product you're selling, the title of a book, or an event on a calendar. In these cases, you're using the same tag to identify different types of information; it would be more informative to use a tag that accurately identifies the type of information, like <product>, <title>, or <event>.

That's where the "X" in XML comes in. XML isn't really a markup language like HTML so much as a set of guidelines for creating your own markup languages. The X, or extensible, part of XML lets you define your own types of tags, or "extend" the language, to fit your needs. In this way, you can create very specific tags to describe different types of information like invoices, books, personnel, and so on.


Note: To learn more about XML, check out www.w3schools.com/xml, grab a copy of Learning XML (O'Reilly) by Erik T. Ray, or visit the XML Topic Center on the Macromedia Web site: www.macromedia.com/devnet/topics/xml.html.

For example, say the National Exasperator wanted to come up with a way of storing a list of headlines, publication dates, and summaries for its news stories. In HTML, this might look something like:

 <h1>Praying Mantis Says Prayers Were Answered</h1>
 <p>10-30-2005<br/>
 In a bizarre story from the insect kingdom, The National Exasperator's own Brian Albert reports that a praying mantis in Borneo has gained the power of speech. You won't want to miss this interview.</p>

This is all well and good for display in a Web browser, but it doesn't give any sense of what kind of information is being presented. This is particularly important when you keep in mind that XML was invented as a way of exchanging data between computers. So if another computer encountered this HTML, it wouldn't understand the purpose of the text inside the <h1> tag. In fact, even a human viewing this code might not easily discern what the "10-30-2005" means; it looks like a date, but maybe it's an ID number or some secret code used at National Exasperator headquarters. XML provides a much clearer way of defining the structure and meaning of content. For example, the National Exasperator's IT staff could decide to come up with their own XML format to store this information. In this case, the same information might be written in XML like this:

 <news>
   <title>Praying Mantis Says Prayers Were Answered</title>
   <pubdate>10-30-2005</pubdate>
   <summary>In a bizarre story from the insect kingdom, The National Exasperator's own Brian Albert reports that a praying mantis in Borneo has gained the power of speech. You won't want to miss this interview.</summary>
 </news>

Kind of like HTML, right? But with a completely different set of tags. This new markup makes the meaning of each chunk of information clearer: you can easily tell that this data is a news item and that it has a title, a publication date, and a summary. In a nutshell, that's what XML is about: creating tags that meaningfully identify the information inside them.
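To see why meaningful tags matter to a program, here is a minimal Python sketch that reads the news item above. It assumes Python's standard xml.etree.ElementTree module; the function name read_news_item and the variable NEWS_XML are made up for this illustration and are not part of Dreamweaver or of any XML standard.

```python
import xml.etree.ElementTree as ET

# The news item from the text, in the made-up <news> format.
NEWS_XML = """
<news>
  <title>Praying Mantis Says Prayers Were Answered</title>
  <pubdate>10-30-2005</pubdate>
  <summary>In a bizarre story from the insect kingdom, The National Exasperator's own Brian Albert reports that a praying mantis in Borneo has gained the power of speech. You won't want to miss this interview.</summary>
</news>
"""

def read_news_item(xml_text):
    """Return the title, publication date, and summary as a dict (hypothetical helper)."""
    news = ET.fromstring(xml_text.strip())
    # Each piece of data is looked up by the tag that names it.
    return {
        "title": news.findtext("title"),
        "pubdate": news.findtext("pubdate"),
        "summary": news.findtext("summary"),
    }

item = read_news_item(NEWS_XML)
print(item["title"])    # Praying Mantis Says Prayers Were Answered
print(item["pubdate"])  # 10-30-2005
```

Because the date sits inside a <pubdate> tag, the program doesn't have to guess what "10-30-2005" means, which is exactly the problem with the HTML version.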

24.1.1. Rules of the Road

Because XML is intended to be an easy way to exchange data between different computers, operating systems, programs, institutions, and people, there are some fairly strict requirements to ensure that everyone's playing by the same rules. If you've done your fair share of writing raw HTML code, much of this will be familiar to you (see Section 3.2). In fact, if you've written XHTML code (see Section 3.2.2), you've already been writing XML. XHTML is an XML version of HTML that just has a few more rules than plain-old HTML.

  • Every XML document must have a single "root" element. A root element is a tag that surrounds all other tags in a document and appears only once in a document. In an XHTML (and an HTML) document, for example, this is the <html> tag. In the National Exasperator news XML format introduced above, this tag is <news>. If you're creating your own XML-formatted file, you could make this root element whatever you wanted: <calendar>, <invoice>, and so on. It makes sense for this tag to be descriptive of whatever content you're storing inside the file.

  • All tags must be nested properly, with no overlapping tags. This rule works just as it does in HTML. You can't have code like this: <b><i>Bold and italics</b></i>. Since the opening <i> tag appears after the opening <b> tag, its closing tag, </i>, must appear before (or inside of) the closing </b> tag, like this: <b><i>Bold and italics</i></b>.

  • All tags must have both an opening and closing tag, or be self-closing. For example, in HTML a paragraph of text is indicated by both an opening <p> and a closing </p>. Some HTML tags, however, don't hold content, like the <img> tag or the line break (<br>) tag. The XML version of these tags includes a forward slash at the end of the tag, like this: <br/>. This type of tag is called an empty element.

  • The property values of all tags must be quoted. For example, in HTML, the <a> tag is used to add a link to a page, using the "href" property. In non-XML HTML you could get away with this: <a href=index.html>Home</a>. In XML, this doesn't fly. You need to quote the href property's value, like this: <a href="index.html">Home</a>. You're probably used to doing this already, and if you've been using Dreamweaver, the program always does this for you. But when writing your own XML files, make sure to include quotes around a tag's property values.

If your XML file meets these conditions, it's known as (to use the official XML designation) "well-formed." In other words, your XML code is written properly. If you write more complex XML documents, there are additional rules you'll need to follow, but these are the basic requirements.
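The four rules above can be tried out with any XML parser, since a parser refuses input that isn't well-formed. The following is a rough Python sketch using the standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule: a version that follows it and a version that breaks it.
samples = {
    "properly nested": "<b><i>Bold and italics</i></b>",
    "overlapping tags": "<b><i>Bold and italics</b></i>",
    "two root elements": "<news></news><news></news>",
    "unclosed tag": "<p>Line one<br></p>",
    "self-closing tag": "<p>Line one<br/></p>",
    "unquoted value": "<a href=index.html>Home</a>",
    "quoted value": '<a href="index.html">Home</a>',
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the "properly nested," "self-closing tag," and "quoted value" samples pass. Note that this checks well-formedness only; it says nothing about whether the tags make sense for a particular XML format.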

In many cases, you'll also include what's called a prolog, an introduction of sorts that appears at the very top of the document and announces what kind of document it is. In its most basic form, the prolog looks like this:

 <?xml version="1.0"?> 

The prolog can also include the type of encoding (useful for indicating different characters for different languages) used in the document.

Here, then, is a basic, complete, and well-formed XML document:

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <news>
   <pub>National Exasperator News</pub>
   <rights>Copyright 2005, The National Exasperator</rights>
   <entry id="284">
     <title>Battle of the Century</title>
     <link>http://www.nationalexasperator.com/headlines/battle.html</link>
     <summary>The terrifying true story of the battle between two of the most feared legends of all time: Bigfoot vs. the Loch Ness Monster.</summary>
     <pubdate>10-30-2005</pubdate>
   </entry>
   <entry id="295">
     <title>Aliens are Here!</title>
     <link>http://www.nationalexasperator.com/headlines/aliens.html</link>
     <summary>Reporters for The National Exasperator have discovered that aliens from another planet--metal clad robots, with the words "Space Man" emblazoned across their chests--walk among us.</summary>
     <pubdate>10-30-2005</pubdate>
   </entry>
 </news>
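As a quick illustration of how a program might consume a document like this one, here is a short Python sketch built on the standard xml.etree.ElementTree module. It works on a trimmed copy of the document (the <summary> tags are left out to keep it short), and the names FEED and list_entries are invented for this example. The document is passed as bytes so the parser can honor the encoding named in the prolog.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document; the prolog must be the very first thing in it.
FEED = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<news>
  <pub>National Exasperator News</pub>
  <rights>Copyright 2005, The National Exasperator</rights>
  <entry id="284">
    <title>Battle of the Century</title>
    <link>http://www.nationalexasperator.com/headlines/battle.html</link>
    <pubdate>10-30-2005</pubdate>
  </entry>
  <entry id="295">
    <title>Aliens are Here!</title>
    <link>http://www.nationalexasperator.com/headlines/aliens.html</link>
    <pubdate>10-30-2005</pubdate>
  </entry>
</news>"""

def list_entries(xml_bytes):
    """Return (id, title) pairs for every <entry> under the root element."""
    root = ET.fromstring(xml_bytes)  # the single root element, <news>
    return [(entry.get("id"), entry.findtext("title"))
            for entry in root.findall("entry")]

entries = list_entries(FEED)
for entry_id, title in entries:
    print(entry_id, title)
# 284 Battle of the Century
# 295 Aliens are Here!
```

The id values come back as text ("284", not the number 284), because XML itself stores every property value as a quoted string.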


Tip: Dreamweaver can verify whether an XML file is well formed. Open the file in Dreamweaver and then choose File → Check Page → Validate as XML. The Results panel group will open. If nothing appears inside the Validation panel, the file is OK. If there's an error, a message explaining the problem will appear. Fix the error and try to validate the document again. Dreamweaver can even validate XML using a DTD file (see the box below).

UP TO SPEED
Taming the Tower of Babel: DTDs and XML Schemas

You may be wondering: if anyone can make up her own tags to create her own types of XML files, how can XML help computers, people, and organizations exchange data? After all, if you come up with one way of formatting invoices using XML, and your buddy in accounting uses his own set of tags to create invoices, you'll end up with two different and incompatible types of files for tracking the same information. It's like the Tower of Babel: everyone speaking their own language and unable to talk to each other. Fortunately, XML provides two solutions to this problem: DTDs (or Document Type Definitions) and XML Schemas. Both are methods of creating a common vocabulary, so everyone can use the same language to talk about the same things.

In fact, you've already been using a DTD when building Web pages in Dreamweaver. When you create a new Web page, Dreamweaver adds a line of code at the beginning of the page, like this:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This line will vary depending on the type of HTML or XHTML you use (see Section 3.2.2). But the concept is the same. The line defines the document type for the page (in this example, XHTML 1.0 Transitional) and points to a URL where the DTD can be found (here, it's http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd).

Essentially, the DTD for each type of HTML or XHTML defines what tags are allowed and how they should be written. If you don't follow the rules, the page is considered invalid. In fact, Dreamweaver's validator, discussed in Section 14.3, is doing just that: making sure your code follows the rules of a particular DTD.

XML Schemas are just another method of enforcing a language for a particular XML format, with a few bells and whistles that DTDs lack. DTDs have been around a long time and are more common; schemas are a newer concept, but will probably eventually replace DTDs. Both XML Schemas and DTDs are very confusing beasts: difficult to read and difficult to create. There are many DTDs and Schemas available for describing a wide range of different types of information. They're often created by a consortium of businesses that agree to a single way of describing information, so that they can easily share data with each other. You probably won't be creating your own anytime soon, but just keep in mind that they exist and are a common way to make sure everyone's speaking the same tongue.

Dreamweaver 8 includes a nice feature related to both DTDs and Schemas: If you include a DTD or Schema in an XML file, and then edit that XML file in Code view, Dreamweaver will display Code Hints for the various XML tags as you type. Code Hints are shortcuts for typing an entire tag or tag property; as you begin to type a tag, Dreamweaver pops up a small window displaying any tags that match what you've typed so far. At that point, you can just select the correct tag, instead of having to type it all out. This feature is also available when working with HTML in Code view and is described in Section 9.2.


If all this sounds like a lot of work, you're right. XML is a big topic, full of complex nooks and crannies. If you're just a busy Web designer making sure you get your client's latest press release up on the Web, you may not find yourself needing to create a brand-new XML format for your client's documents, or learn the ins and outs of creating XML files. And you may never have to. However, there are already a lot of XML files in existence and even more on their way. One of the reasons XML is so popular is that, although it may be a bit tedious for humans to write, computers are whizzes at following the detailed rules needed to create and read XML files. XML has become a kind of lingua franca for computer communication; different computers, operating systems, and programs can easily exchange information using an agreed-upon XML-based type of document.

XML lets programmers access Amazon.com's vast databases of information without understanding anything about how Amazon's computers are set up or what programs Amazon uses. This is also what lets any number of programs know what to expect when they retrieve an RSS feed from a blogger or Wired.com. Because XML makes it easy to exchange data, you'll find it becoming more and more common as you continue your career as a Web designer.

Looking at the sample code in Section 24.1.1, you can see that XML doesn't have much going for it in the looks department. There's nothing but text and tags. To turn that information into an attractive display, you need to use two other XML-related technologies: XSLT and XPath.

24.1.2. XSLT and XPath

Although XML is very much like HTML in many ways, it doesn't have any inherent formatting capabilities. Unlike with HTML, where an <h1> tag is at least displayed differently (bolder and bigger) than other text, a Web browser doesn't know how to display an XML tag. For example, should the <news> tag be black, blue, or red all over? XSLT and XPath are two complementary (and very complex) languages that let you define how XML tags should look. Fortunately, even though these languages are hard to master, Dreamweaver takes care of the entire process. All you need to know is how to use Dreamweaver's Design view to create cool-looking Web pages.

But just so you can show off to your co-workers and pad your resume, here's a brief explanation of these technologies.

XSLT is the magic dust that transforms an XML document into an HTML document. In fact, it's used to create any number of different types of documents for Web browsers, palmtops, printers, and so on, out of a single XML file. XSLT stands for Extensible Stylesheet Language Transformations, which is just a really weird name for a programming language that converts XML tags (<event>Halloween Social</event>, for example) into something else, like the code a Web browser understands (<h1>Halloween Social</h1>). In a nutshell, that's what Dreamweaver's XML tools do: they use XSLT to transform XML into HTML.
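To make the idea of a transformation concrete, here is a small Python sketch that performs the <event>-to-<h1> conversion by hand. It is not XSLT and is not how Dreamweaver works internally; Python's standard library has no XSLT processor, so this is only a stand-in that imitates the end result. The names TAG_MAP and transform are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an XML tag name to the HTML tag that should replace it.
TAG_MAP = {"event": "h1"}

def transform(xml_text):
    """Convert a single XML element into its HTML counterpart (hand-rolled, not XSLT)."""
    source = ET.fromstring(xml_text)
    result = ET.Element(TAG_MAP[source.tag])  # new element with the HTML tag name
    result.text = source.text                 # carry the content over unchanged
    return ET.tostring(result, encoding="unicode")

html = transform("<event>Halloween Social</event>")
print(html)  # <h1>Halloween Social</h1>
```

A real XSLT style sheet expresses the same kind of rule ("when you meet this tag, output that markup") in XML syntax rather than in program code.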


Note: Because XSLT adds formatting to XML, much like Cascading Style Sheets add formatting to HTML, you'll often see an XSLT file referred to as an XSLT style sheet.

But XSLT can't do the job on its own. Another technology, XPath, is needed to tell XSLT which tags to transform. XPath (yet another language!) provides the means to identify particular elements or tags in an XML file. You use XPath to create what's called an "XPath expression," which is kind of like a trail of cookie crumbs that leads from one part of the document (frequently the beginning tag, or "root element") to the particular "node" (a tag or tag property) you wish to select. In its most basic form, XPath works very much like the document window's tag selector: it pinpoints a tag nested in any number of other tags. For instance, using the XML code in Section 24.1.1 as an example, the XPath expression to the title of one of the news items is /news/entry/title.
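The path idea can be tried out in Python, whose standard xml.etree.ElementTree module understands a limited subset of XPath. The sketch below uses a cut-down copy of the news document (titles only), and the variable names are invented for the example. One assumption to note: ElementTree paths are written relative to the element you start from, so /news/entry/title becomes ./entry/title when the starting point is the <news> root itself.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample document: just the entries and their titles.
FEED = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<news>
  <entry id="284"><title>Battle of the Century</title></entry>
  <entry id="295"><title>Aliens are Here!</title></entry>
</news>"""

root = ET.fromstring(FEED)  # root is the <news> element

# Equivalent of /news/entry/title, written relative to the <news> root.
titles = [title.text for title in root.findall("./entry/title")]
print(titles)  # ['Battle of the Century', 'Aliens are Here!']

# A predicate in square brackets narrows the path to the entry with a given id.
alien_title = root.findtext("./entry[@id='295']/title")
print(alien_title)  # Aliens are Here!
```

The first expression matches every title in the file, which is why an XSLT style sheet usually pairs a path like this with a rule that repeats once for each match.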

Think of it this way: XPath is used to identify the XML tags that XSLT transforms into HTML. XSLT does the actual conversion to HTML, but XPath tells XSLT which tags to convert. They work hand in hand to get the job done. And, fortunately, that's all you need to know. In fact, it's more than you need to know to use Dreamweaver to turn XML into cool-looking Web pages.


Note: Dreamweaver 8 adds built-in reference material covering XML and XSLT. See Section 9.6 for more on Dreamweaver's Reference panel.


Dreamweaver 8: The Missing Manual
ISBN: 596100566
EAN: N/A
Year: 2006
Pages: 233