Chapter 10 - To HTML and Beyond!
XSLT For Dummies
by Richard Wagner
Hungry Minds 2002
When you take a glance at HTML code, it sure does look an awful lot like an XML document, doesn't it? Ahhhhh, but a quick look like that can be deceiving, because HTML is not the same as XML. I give an overview of the similarities and differences in Chapter 1, but I dive deeper in this section. HTML is a markup language with a collection of those familiar elements and attributes, but HTML is a lot more lax than its stricter XML cousin. Think of HTML as the laid-back surfer dude from California and XML as the New York accountant type. Here are a few notable differences between HTML and XML:
Remember: The core principle of XML is well-formedness (among other rules, every start tag must have a matching end tag). And despite its markup language similarities to XML, HTML is not required to be well-formed and should not be confused with XML. To tweak words once directed to Dan Quayle, I could say the following to HTML: "I knew XML. XML was a friend of mine. Senator, you're no XML."

To illustrate the difference between these two markup languages, consider the following HTML document. It is perfectly acceptable HTML, but breaks every rule in the book for XML:

    <html>
    <hEaD>
    <META NAME=Generator CONTENT='Star Web Page Commander'>
    </HeAd>
    <body BGCOLOR='#FFFFFF' link="#0000FF">
    <p>Buzz: You are a sad, strange, little man. You have my pity. Farewell.
    <hr>
    </body>

A Web browser can display that HTML flawlessly, but you don't want to run this code through an XML processor. It would barf all over the place! To make the code XML friendly, you need to change it to:

    <html>
    <head>
    <meta name="Generator" content="Star Web Page Commander"/>
    </head>
    <body bgcolor="#FFFFFF" link="#0000FF">
    <p>Buzz: You are a sad, strange, little man. You have my pity. Farewell.</p>
    <hr/>
    </body>
    </html>

The problem is that, even in this simplest of examples, many browsers don't display the code exactly as you intended. For example, a browser may not recognize the <hr/> empty element as a horizontal rule, because that syntax is not part of the HTML standard. Therefore, outputting tags as a pure XML document doesn't give you reliable formatting results.

Fortunately, you and I aren't the only ones who see these variations as problematic. To reconcile the differences between HTML and XML, there is a new W3C standard afoot called XHTML, which is an XMLized version of HTML. In other words, XHTML gives the California surfer a shirt and tie and aims to give him some semblance of discipline. Everyone has to grow up, sooner or later!
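If you want to see the difference between the two listings for yourself, here is a minimal sketch in Python (my choice for illustration; nothing in this chapter depends on it). It hands both documents to the XML parser in Python's standard library. The helper name is_well_formed is hypothetical, and the check covers well-formedness only, not validity against any DTD.

```python
import xml.etree.ElementTree as ET

# The lax HTML listing: mixed-case tags, an unquoted attribute value,
# an unclosed <p>, a bare <hr>, and no closing </html>.
LAX_HTML = """<html>
<hEaD>
<META NAME=Generator CONTENT='Star Web Page Commander'>
</HeAd>
<body BGCOLOR='#FFFFFF' link="#0000FF">
<p>Buzz: You are a sad, strange, little man. You have my pity. Farewell.
<hr>
</body>"""

# The XML-friendly rewrite of the same document.
WELL_FORMED = """<html>
<head>
<meta name="Generator" content="Star Web Page Commander"/>
</head>
<body bgcolor="#FFFFFF" link="#0000FF">
<p>Buzz: You are a sad, strange, little man. You have my pity. Farewell.</p>
<hr/>
</body>
</html>"""


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("lax HTML well-formed?", is_well_formed(LAX_HTML))
    print("rewritten version well-formed?", is_well_formed(WELL_FORMED))
```

The parser gives up on the first listing as soon as it hits a rule violation, which is exactly what an XSLT processor would do with the same input.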
Over time, when the use of XHTML becomes widespread and browsers fully support it, switching between XML and HTML and displaying the results on the Web will be a breeze. But because older browsers will be around for some time to come, this XML Nirvana is still quite a ways off. Therefore, living with these nuances is something that you and I just have to get used to.

So how do these differences between HTML and XML impact you as an XSLT stylesheet author? The most important implication is on the input side: Before you can transform an HTML document with XSLT, you first need to make the HTML document well-formed. XSLT can't work with HTML documents that aren't well-formed.

Tip: HTML Tidy is a nifty utility that is freely available on the Web. You can use it to convert your HTML code into well-formed HTML. Go to tidy.sourceforge.net to download this utility. Also available is a Windows version of the tool called TidyGUI, which you can download at perso.wanadoo.fr/ablavier/TidyGUI.
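To give a rough idea of what a cleanup step has to do before XSLT can touch an HTML document, here is a small Python sketch built on the standard library's html.parser module. It is not how HTML Tidy works and is no substitute for it. The names WellFormer, to_well_formed, and VOID_ELEMENTS are hypothetical, the list of empty elements is deliberately short, and comments and DOCTYPE declarations are simply dropped. The sketch lowercases tag and attribute names, quotes attribute values, writes empty elements such as hr in the <hr/> form, and closes any element left open.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Assumption: a short, incomplete list of HTML elements that never have content.
VOID_ELEMENTS = {"meta", "hr", "br", "img", "link", "input"}

LAX_HTML = """<html>
<hEaD>
<META NAME=Generator CONTENT='Star Web Page Commander'>
</HeAd>
<body BGCOLOR='#FFFFFF' link="#0000FF">
<p>Buzz: You are a sad, strange, little man. You have my pity. Farewell.
<hr>
</body>"""


class WellFormer(HTMLParser):
    """Re-emit lax HTML as well-formed markup (illustrative sketch only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the rewritten document
        self.open_tags = []  # stack of elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased the tag and attribute names.
        # A valueless attribute (value is None) is written as name="name".
        attr_text = "".join(
            f" {name}={quoteattr(name if value is None else value)}"
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags for empty elements and end tags with no open match.
        if tag in VOID_ELEMENTS or tag not in self.open_tags:
            return
        # Close everything opened since the matching start tag, innermost first.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Re-escape &, <, and > so the text is legal XML character data.
        self.out.append(escape(data))


def to_well_formed(html: str) -> str:
    parser = WellFormer()
    parser.feed(html)
    parser.close()
    # Close whatever is still open at the end of input (here, <html>).
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    cleaned = to_well_formed(LAX_HTML)
    print(cleaned)
    ET.fromstring(cleaned)  # raises ParseError if the result is not well-formed
```

One design shortcut to be aware of: the sketch knows nothing about which elements HTML allows inside which, so the hr ends up nested inside the p rather than after it. The output is still well-formed, which is all an XSLT processor needs in order to read it, but a real tool makes smarter decisions about where to close elements.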