7.2. How It Works

XML has four basic components:

  • A document marked up in an XML language

  • An optional Document Type Definition or XML Schema that defines the elements and the rules for their use in that language

  • Style sheets for presentation instructions

  • Parsers that interpret the XML document

Take a closer look at each.

7.2.1. XML Documents

XML documents may be used for a wide variety of content. A document might be text based (such as a magazine article), or it might contain only numerical data to be transferred from one database or application to another. An XML document might also contain an abstract structure, such as a particular vector graphic shape (as in SVG) or a mathematical equation (as in MathML).

A Brief XML History

Both XML and HTML have roots in SGML (Standard Generalized Markup Language). SGML is a comprehensive set of syntax rules for marking up documents and data that has existed as an ISO standard since 1986. It is the big kahuna of meta-languages. For information on SGML, including its history, see www.oasis-open.org/cover/general.html.

When Tim Berners-Lee needed a markup language that told browsers how to display content, he used SGML to create HTML. In other words, HTML is an SGML application, albeit a very simplified one.

As the Web matured, there was a clear need for more versatile markup languages. SGML provided a good model, but it was too vast and complex; it had many features that were redundant, overly complicated, or simply weren't useful. XML is a simplified and reduced form of SGML.

Much of the credit for XML's creation can be attributed to Jon Bosak of Sun Microsystems, Inc., who started the W3C working group responsible for scaling down SGML to its portable, Web-friendly form. Other big players include James Clark, the technical lead of the working group, and Tim Bray, Michael Sperberg-McQueen, and Jean Paoli, the co-editors of the XML specification.

XML 1.0 became a W3C Recommendation on February 10, 1998, and has since been revised in later editions; the third edition was released in 2004. At that time, the W3C also released XML 1.1, which addressed issues with Unicode, among other things. Developers are still encouraged to use XML 1.0 if they do not need the newer features. Various aspects and modules of XML are still in development. For more information and updates on XML progress, see the W3C's site at www.w3.org/XML.


It is important to note that an XML document is not limited to one physical file. It may be made up of content from multiple files that are integrated via special markup, or it may exist only as records in a database that are assembled on the fly. The end result is always marked-up text content.
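To make the "assembled on the fly" case concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It builds a marked-up document in memory from a list of records. The `rows` data and the `catalog`, `book`, `title`, and `price` element names are hypothetical stand-ins for whatever a real database query and XML language would supply.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, standing in for rows fetched from a database.
rows = [
    {"id": "1", "title": "Sample Title A", "price": "19.95"},
    {"id": "2", "title": "Sample Title B", "price": "24.95"},
]

def build_catalog(records):
    """Build a <catalog> element tree in memory, one <book> per record."""
    catalog = ET.Element("catalog")
    for rec in records:
        # The record's id becomes an attribute; other fields become child elements.
        book = ET.SubElement(catalog, "book", id=rec["id"])
        ET.SubElement(book, "title").text = rec["title"]
        ET.SubElement(book, "price").text = rec["price"]
    return catalog

# Serialize the tree: no file ever existed, yet the result is marked-up text.
xml_text = ET.tostring(build_catalog(rows), encoding="unicode")
print(xml_text)
```

The document exists only as the string `xml_text`, which an application could send to another program or write to disk.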

7.2.2. Document Type Definition (DTD)

Some XML languages also use a Document Type Definition (DTD) that defines each element allowed in the document along with its attributes and rules for use. An XML-compliant application may check the document against its DTD to "decode" the markup and make sure that it follows its own rules. A document that conforms to its DTD is said to be valid. DTDs are discussed in detail later in this chapter.
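The difference between a document that merely parses and one that is valid can be illustrated with a small sketch. Python's standard library parser does not validate against DTDs (third-party libraries such as lxml do), so this example substitutes a hypothetical `RULES` table for a real DTD. The table lists, for each element, which child elements it may contain. It is only an illustration of the idea and not a DTD processor: real DTDs also express ordering, occurrence counts, and attribute rules. The `recipe` language is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules playing the role of a DTD: each declared element maps
# to the set of child elements it is allowed to contain.
RULES = {
    "recipe": {"title", "ingredient", "step"},
    "title": set(),
    "ingredient": set(),
    "step": set(),
}

def check_validity(root, rules):
    """Return a list of rule violations found anywhere in the element tree."""
    problems = []
    for elem in root.iter():  # walks the root and all of its descendants
        if elem.tag not in rules:
            problems.append(f"undeclared element <{elem.tag}>")
            continue
        for child in elem:
            if child.tag not in rules[elem.tag]:
                problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
    return problems

# This document parses, so it is well-formed, but it breaks the rules above.
doc = "<recipe><title>Toast</title><step>Toast the bread.</step><garnish/></recipe>"
root = ET.fromstring(doc)
print(check_validity(root, RULES))
```

Here the parse succeeds, but the check reports that `<garnish>` is both undeclared and not permitted inside `<recipe>`. That is the kind of error only a validating parser would catch.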

An updated method for defining XML elements and document structure is XML Schemas. A particular instance of an XML Schema is called an XML Schema Definition (XSD). The difference is that XSDs are XML-based, while DTDs (an older form of schema) are created according to the rules of SGML. XSDs are more powerful in describing XML languages, but the price is that they also tend to be more complicated and difficult to read and write. XML Schemas are outside the scope of this introductory chapter, but you can find information on the W3C site at www.w3c.org/XML/Schema.

7.2.3. Style Sheets and XML

A markup language describes only the structure of a document; it is not concerned with how it looks. Like HTML, XML documents can use Cascading Style Sheets for presentation. In fact, the CSS Level 2 Recommendation has been broadened for use with all XML applications, not just web documents. CSS is covered in Part III of this book.

Another style sheet language, the Extensible Stylesheet Language (XSL), is also available for XML documents. XSL adds considerable processing overhead, whereas CSS is fast and simple, so CSS is generally preferable when all you need is presentation.

XSL is useful when the contents of the XML document need to be "transformed" before final display. Transforming generally refers to the process of converting one XML language to another, such as turning a particular XML language into XHTML on the fly, but it can also be used for transformations as simple as replacing words with other words. An Extensible Stylesheet Language for Transformations (XSLT, a subset of XSL) style sheet works as a translator in the transformation process. XSL is not covered in this chapter; for more information, see the XSL information on the W3C site at www.w3.org/Style/XSL/.
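To show what "transforming" one XML language into XHTML means, here is a rough sketch. It does not use XSLT, because Python's standard library has no XSLT processor. Instead it performs the same kind of element-by-element translation by hand with `ElementTree`. The `recipe` source language and the `recipe_to_xhtml` function are hypothetical. In XSLT, each mapping below would typically be written as a template rule instead.

```python
import xml.etree.ElementTree as ET

def recipe_to_xhtml(recipe_xml):
    """Translate a hypothetical <recipe> document into an XHTML fragment:
    <title> becomes <h1>, and each <step> becomes an <li> in an ordered list."""
    recipe = ET.fromstring(recipe_xml)
    div = ET.Element("div", {"class": "recipe"})
    ET.SubElement(div, "h1").text = recipe.findtext("title", default="")
    ol = ET.SubElement(div, "ol")
    for step in recipe.findall("step"):
        ET.SubElement(ol, "li").text = step.text
    return ET.tostring(div, encoding="unicode")

src = "<recipe><title>Toast</title><step>Slice bread.</step><step>Toast it.</step></recipe>"
out = recipe_to_xhtml(src)
print(out)
```

The output is markup a browser already understands: a `div` containing an `h1` heading and an ordered list with one item per step.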

7.2.4. Parsers

Software that interprets the information in XML documents is called an XML parser or processor. Parsers are generally built into other XML-compliant applications (such as web browsers or database servers), although standalone, command-line XML parsers do exist. It's the parser's job to pass elements and their contents to the application piece by piece for display or execution.

One of the things the parser does is make sure that the XML document is well-formed, that is, that it follows all of the rules of XML markup syntax correctly. If a document is not well-formed, parsers are instructed not to process it (although some are more forgiving than others). Well-formedness is discussed in the following section. Some parsers are also validating parsers, meaning they check the document for validity against a DTD.
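As a small illustration of a parser's well-formedness check, the sketch below uses the non-validating parser in Python's `xml.etree.ElementTree`. The parser refuses a malformed document by raising `ParseError`. It checks syntax only, not validity against a DTD. The `is_well_formed` helper is a hypothetical name chosen for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser stops at the first syntax error it finds.
        return False, str(err)
    return True, None

print(is_well_formed("<note><to>Jen</to></note>"))  # (True, None)
print(is_well_formed("<note><to>Jen</note>"))       # mismatched end tag
print(is_well_formed('<note date=2006/>'))          # unquoted attribute value
```

The last two calls return `False` along with the parser's error message, which reports the line and column where the problem was found.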




Web Design in a Nutshell
Web Design in a Nutshell: A Desktop Quick Reference (In a Nutshell (O'Reilly))
ISBN: 0596009879
Year: 2006
Pages: 325