XML and XSL

The Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) standard, approved in February 1998, for describing the content of Web pages. The Extensible Stylesheet Language (XSL) is a draft W3C standard, released in August 1998, for describing how to present XML pages within a Web browser. But before I talk about what XML does, consider what the ubiquitous Hypertext Markup Language (HTML) doesn't do. (In this tutorial, I'm assuming that you have some knowledge of HTML and can read simple HTML code.)

HTML Basics

Say you wish to display the contents of this tutorial on a Web page. You might put the word "Tutorial" in headline style 2, the lesson number (Lesson 124) and title (XML and XSL) in headline style 1, the author's name in headline style 3, then the text in normal body style. In simple form, and with apologies to Network Magazine's long-suffering Webmaster, the code would look like:

 <html>
   <h2>Tutorial</h2>
   <h1>Lesson 124: XML and XSL</h1>
   <h3>by Alan Zeichick</h3>
   The Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) standard ...
 </html>

So far, so good, except there's no logical rhyme or reason behind the coding. If you want to search a site (or a long document) for tutorials, for certain lesson numbers, or for everything by a particular author, you'd have to know that the site usually uses h2 for the article type and h3 for the author name. Not particularly efficient, and rather ad hoc. You'd probably prefer to design the code to read as:

 <html>
   <article_type>Tutorial</article_type>
   <lesson_number>124</lesson_number>
   <article_title>XML and XSL</article_title>
   <author>by Alan Zeichick</author>
   <article_text>The Extensible Markup Language (XML) is ...</article_text>
 </html>

That's the principle behind XML, which lets Web site developers use meaningful tags based on the content of a Web page. Furthermore, XML allows site creators to define new tags as needed, rather than rely on a fixed set of generic HTML tags blessed by the W3C, or "embraced and extended" by a particular browser manufacturer.

Before you get carried away, please note that the second code fragment above is nonsensical: It's not HTML, and it's not really XML (but it's close). So, next I will look at what makes up an XML document, and then rewrite the example in genuine XML.

XML Documents

XML isn't a page-description language; it's a structured data-definition language. It describes the content of a Web page using tags. (XML can actually describe any arbitrary data, but for this tutorial assume XML code is being written for the Web.) An XML document must start with an XML prolog, which begins with a declaration that the document is written in XML; it may optionally include a Document Type Definition (DTD) that describes the elements, attributes, and other components of the document. (DTDs will be discussed in more detail later.) The prolog is followed by a single tag that encapsulates the entire document. That tag is usually named root or document.

Further, in XML, unlike HTML, all tags must be closed, and although tags may be nested, they may not overlap.
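
The closing and nesting rules can be checked mechanically. Here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser stands in for whatever XML tool you actually use; the helper name is_well_formed and the three sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: properly nested, overlapping, and unclosed tags.
nested = "<document><author><b>Alan</b> Zeichick</author></document>"
overlapping = "<document><author><b>Alan</author></b></document>"
unclosed = "<document><author>Alan Zeichick</document>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
print(is_well_formed(unclosed))     # False
```

An HTML browser would probably shrug off the second and third fragments; an XML parser rejects them outright.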

So, here is the example rewritten in XML (save it as tutorial.xml). The XML prolog, which begins with <?xml, contains two attributes; one describes the version of XML used, and the other states that the document is complete in this one file. An XML document that adheres to these rules (and a few others regarding reserved characters) is said to be well-formed and should be interpretable by any XSL processor.

 <?xml version="1.0" standalone="yes"?>
 <document>
   <article_type>Tutorial</article_type>
   <lesson_number>124</lesson_number>
   <article_title>XML and XSL</article_title>
   <author>Alan Zeichick</author>
   <article_text>The Extensible Markup Language (XML) is ...</article_text>
 </document>
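
With meaningful tags in place, the search problem described earlier becomes a simple lookup. The following is only a sketch, assuming Python's standard xml.etree.ElementTree module; the function name read_fields is hypothetical, and the XML is copied into a string so the example runs on its own.

```python
import xml.etree.ElementTree as ET

# The tutorial.xml example, inlined so the sketch is self-contained.
TUTORIAL_XML = """<?xml version="1.0" standalone="yes"?>
<document>
  <article_type>Tutorial</article_type>
  <lesson_number>124</lesson_number>
  <article_title>XML and XSL</article_title>
  <author>Alan Zeichick</author>
  <article_text>The Extensible Markup Language (XML) is ...</article_text>
</document>"""


def read_fields(xml_text: str) -> dict:
    """Map each top-level tag name to its trimmed text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


fields = read_fields(TUTORIAL_XML)
print(fields["author"])         # Alan Zeichick
print(fields["lesson_number"])  # 124
```

No guessing about which headline style holds the author's name: you ask for the author tag by name.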

XSL: Displaying XML

XML, as stated earlier, is a way to describe the meaning of a document. Unlike HTML, it does not describe how to display a document in a Web browser. No browser can understand arbitrary tags like <author> and <lesson_number>. That's why you need the Extensible Stylesheet Language (XSL), which describes the intended physical appearance of an XML document. You will also need software, called an XSL processor, to read the XML document, apply the XSL style sheet to its tags, and produce standard HTML as output. I will talk about where to find XSL processors, but first I will create a sample XSL style sheet.

An XSL style sheet consists of a text file contained within <xsl> and </xsl> tags. Between those tags are a series of rules describing the XML tags within an XML document and telling how to format them in HTML. Remember I wanted the article_type above to be displayed in HTML headline style 2? The XSL rule for that XML tag would be:

 <rule>
   <target-element type="article_type"/>
   <h2><children/></h2>
 </rule>

The special element <children/> tells the XSL processor to apply the XSL rule to the contents of the tagged item. Thus, the XML text <article_type>Tutorial</article_type> will be processed into <h2>Tutorial</h2>. Note that any HTML tag can be included as part of an XSL rule. Also, XML tags can be nested; in that case, XSL rules are applied recursively as needed. This capability allows Web site designers to exercise very fine control over the appearance of pages, far more than is illustrated using this simple example.

Here is the complete XSL document tutorial.xsl for the tutorial.xml file. The final rule, which doesn't explicitly mention a target-element type, is a catch-all that applies a default formatting to all tags not specifically defined.

 <xsl>
   <rule>
     <target-element type="article_type"/>
     <h2><children/></h2>
   </rule>
   <rule>
     <target-element type="lesson_number"/>
     <h1>Lesson <children/></h1>
   </rule>
   <rule>
     <target-element type="article_title"/>
     <h1><children/></h1>
   </rule>
   <rule>
     <target-element type="author"/>
     <h3>by <children/></h3>
   </rule>
   <rule>
     <target-element type="article_text"/>
     <p><children/></p>
   </rule>
   <rule>
     <target-element/>
     <p><children/></p>
   </rule>
 </xsl>
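
To make the rule-matching idea concrete, here is a rough Python imitation of what a processor does with these rules. It is not an XSL processor and does not read tutorial.xsl; the RULES table, DEFAULT_RULE, render, and transform are hypothetical names, and the table simply restates the rules above by hand. As a simplification, the root tag itself is skipped and the rules are applied only to the tags inside it.

```python
import xml.etree.ElementTree as ET
from html import escape

TUTORIAL_XML = """<?xml version="1.0" standalone="yes"?>
<document>
  <article_type>Tutorial</article_type>
  <lesson_number>124</lesson_number>
  <article_title>XML and XSL</article_title>
  <author>Alan Zeichick</author>
  <article_text>The Extensible Markup Language (XML) is ...</article_text>
</document>"""

# Hand-written stand-in for the style sheet: XML tag -> (HTML tag, literal prefix).
RULES = {
    "article_type": ("h2", ""),
    "lesson_number": ("h1", "Lesson "),
    "article_title": ("h1", ""),
    "author": ("h3", "by "),
    "article_text": ("p", ""),
}
DEFAULT_RULE = ("p", "")  # plays the role of the catch-all rule


def render(element: ET.Element) -> str:
    """Apply the matching rule to one element, recursing into nested tags."""
    html_tag, prefix = RULES.get(element.tag, DEFAULT_RULE)
    parts = [prefix, escape((element.text or "").strip())]
    for child in element:
        parts.append(render(child))  # nested tags get their own rule
        parts.append(escape((child.tail or "").strip()))
    return f"<{html_tag}>{''.join(parts)}</{html_tag}>"


def transform(xml_text: str) -> str:
    """Render every tag inside the root element, one HTML line per tag."""
    root = ET.fromstring(xml_text)
    return "\n".join(render(child) for child in root)


html_output = transform(TUTORIAL_XML)
print(html_output)
```

The lookup with a fallback is the whole trick: a tag with a specific rule gets that rule, and anything else falls through to the default, just as the final catch-all rule does in the style sheet.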

Generating HTML

So, you'd like to see the HTML code generated by the XSL style sheet? Well, it's not as easy as loading it on a browser, as no generally available Web browser currently contains XSL processing capabilities. Microsoft includes some rudimentary XML parsing capability in Internet Explorer 4, and Netscape has demonstrated some XML functionality in Mozilla 5 (the core of its next-generation browser), but for now you'll need external software to apply the XSL style sheets to an XML document. You can find links to a number of freeware XSL processors at www.w3.org/XML/. [Note: Newer browser versions fully support XSL.]

The simplest processor to play with is Microsoft's MSXSL.EXE, downloadable from www.microsoft.com/xml/xsl/msxsl.asp. This command-line utility can apply XSL style sheets to an XML document and produce an HTML output file. It's simple enough to run: from a DOS prompt, run MSXSL -i xmlfilename -s xslfilename -o htmlfilename.

If there is an error in the XML or XSL code, the MSXSL processor will let you know roughly where the problem occurred, and it may even guess at what went wrong (such as an argument mismatch: In my sample XSL file above, I had initially terminated the <h1> command with </h2>). But if all goes according to plan, you'll end up with the HTML file:

 <div><h2>Tutorial</h2>
 <h1>Lesson 124</h1>
 <h1>XML and XSL</h1>
 <h3>by Alan Zeichick</h3>
 <p>The Extensible Markup Language (XML) is ...</p></div>

If this example represented the best that XML could do, the technology would have died a swift death; after all, many of those capabilities can be handled using straightforward HTML with Cascading Style Sheets (CSS). The real payoff will come from using XML's more advanced features, such as data validation using DTDs and the Extensible Linking Language.

Document Type Definitions

Earlier, I discussed the prolog section of an XML document. The prolog must contain the <?xml statement, but it may optionally include either a DTD or a link to another file containing a DTD to apply to the XML file. A DTD is used to validate a well-formed (that is, syntactically correct) XML document; all tags used in the body of the XML document must be defined in the DTD.

The DTD for this tutorial, for example, would need to define the article_type field. It could simply define it as a random string of characters, but that wouldn't be of much benefit. A better strategy might be to predefine all of the possible article types, such as "Tutorial," "Feature," or "NT Techniques." The DTD can further specify that the article_type field must occur once, and only once, within an XML document, but that the author field can occur more than once, so articles with multiple authors can be supported. It could further specify that a new field, author_email, is valid only if the "@" symbol is included within it exactly once, or that the author_phone field can contain only digits. You get the idea.
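
The occurrence and allowed-value rules just described can be pictured with a small hand-written check. This is not a DTD and not a validating parser (Python's standard xml.etree.ElementTree does not validate against DTDs); it is only a sketch of the kind of checking a validating parser performs, and ALLOWED_TYPES and find_problems are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Article types taken from the examples in the text.
ALLOWED_TYPES = {"Tutorial", "Feature", "NT Techniques"}


def find_problems(xml_text: str) -> list:
    """Return a list of rule violations; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    counts = Counter(child.tag for child in root)
    problems = []
    # In DTD terms the occurrence rules would live in a content model such as
    # <!ELEMENT document (article_type, author+)>; here they are plain counts.
    if counts["article_type"] != 1:
        problems.append("article_type must occur exactly once")
    if counts["author"] < 1:
        problems.append("author must occur at least once")
    for node in root.findall("article_type"):
        value = (node.text or "").strip()
        if value not in ALLOWED_TYPES:
            problems.append(f"unknown article_type: {value!r}")
    return problems


two_authors = (
    "<document><article_type>Feature</article_type>"
    "<author>A. Writer</author><author>B. Writer</author></document>"
)
broken = (
    "<document><article_type>Tutorial</article_type>"
    "<article_type>Rant</article_type></document>"
)

print(find_problems(two_authors))  # []
print(find_problems(broken))
```

The first document passes because multiple authors are allowed; the second fails three ways: a repeated article_type, a missing author, and an article type that isn't on the list.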

An XML file can be validated only with an XML validating parser. (For a current list of validating parsers, nearly all of which are Java classes a developer can include in custom applications, rather than ready-to-run programs, see www.w3.org/XML/.) In general, if you're using XML only to build pages for displaying within a browser, you need not worry about DTDs. Those rigid definitions are required, however, to use XML as a domain-specific data-definitional language: to pass e-commerce data between servers, for example, or to standardize a way to describe chemical data, astronomical readings, or consumer credit reports. In those cases, having a strict definition of an XML document's allowable fields, and each field's allowable values and format, will make it easy to implement Web pages that enable the automated transfer of data between applications or organizations. One group doing just that is the XML/EDI Group (www.xmledi.com), which submitted an e-commerce-oriented DTD to the W3C in August 1998.

Resources

The World Wide Web Consortium's site at www.w3.org/XML/ should be your first stop. This page provides many links to standards proposals and other papers, newsgroups and forums, and XML software and demo code.

Microsoft maintains a fairly extensive collection of documents about XML and XSL at www.microsoft.com/xml/. The MSXSL.EXE command-line processor and instructions can be found at www.microsoft.com/xml/xsl/msxsl.asp. There are also additional documents at www.microsoft.com/standards/xml/default.asp.

Netscape, which also supports XML, has posted developer information at http://developer.netscape.com/tech/metadata/index.html, but it's not as extensive as Microsoft's library.

The best book I've found on XML is XML: Extensible Markup Language, by Elliotte Rusty Harold (IDG Books, 1998); if you're looking for one book on the topic, this is it.

This tutorial, number 124, by Alan Zeichick, was originally published in the November 1998 issue of Network Magazine.

 