XML Documents

It's going to be important for you to know how XML documents work, so use this section to ensure that you're up to speed. Here's an example XML document that I'll take a look at:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        </GREETING>
        <MESSAGE>
            Welcome to the wild and woolly world of XML.
        </MESSAGE>
    </DOCUMENT>

Here's how this document works: I start with the XML declaration, <?xml version="1.0" encoding="UTF-8"?>, which has the same form as a processing instruction (all XML processing instructions start with <? and end with ?>). It indicates that I'm using XML version 1.0, by far the most widely used version, and UTF-8 character encoding, which stores each Unicode character as a sequence of one to four eight-bit bytes:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        </GREETING>
        <MESSAGE>
            Welcome to the wild and woolly world of XML.
        </MESSAGE>
    </DOCUMENT>
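To get a feel for what UTF-8 encoding means in practice, here is a minimal Python sketch. Python is simply my choice for illustration (nothing in XML requires it), and the three sample characters are arbitrary. It shows that plain ASCII characters take one byte each, while other Unicode characters take more:

```python
# Encode three characters as UTF-8 and count the bytes each one needs.
samples = ["A", "é", "€"]
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in byte_lengths.items():
    print(f"{ch!r} takes {n} byte(s) in UTF-8")
```

This is why a document made up mostly of ASCII text stays compact when it is saved as UTF-8.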

Next, I create a new tag named <DOCUMENT>. You can use any name, not just DOCUMENT, for a tag, as long as the name starts with a letter or underscore (_), and the following characters consist of letters, digits, underscores, dots (.), or hyphens (-), but no spaces. In XML, tags always start with < and end with >.
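The naming rule can be sketched as a small check in Python. This is a simplified, ASCII-only approximation of the rule as stated above, not the full definition from the XML recommendation (which also allows colons and many non-ASCII letters); the function name is_simple_xml_name is my own:

```python
import re

# Simplified, ASCII-only version of the naming rule described above:
# the first character is a letter or underscore; the rest may be letters,
# digits, underscores, dots, or hyphens. Real XML names also allow colons
# and many non-ASCII characters, which this sketch deliberately ignores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_simple_xml_name(name: str) -> bool:
    """Return True if name satisfies the simplified rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["DOCUMENT", "_note", "planet-1.a", "2FAST", "MY TAG"]:
    print(candidate, is_simple_xml_name(candidate))
```

The last two candidates fail: one starts with a digit, and the other contains a space.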

XML documents are made up of XML elements, and you create XML elements with an opening tag, such as <DOCUMENT>, followed by any element content (if any), such as text or other elements, and ending with the matching closing tag that starts with </, such as </DOCUMENT>. You enclose the entire document, except for items such as the XML declaration, processing instructions, and comments, in one element, called the root element, and that's the <DOCUMENT> element here:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        .
        .
        .
    </DOCUMENT>

Now I'll add a new element, <GREETING>, that encloses text content (in this case, Hello From XML) within this XML document as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        </GREETING>
        .
        .
        .
    </DOCUMENT>

Next, I can add another new element, <MESSAGE>, which also encloses text content, like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        </GREETING>
        <MESSAGE>
            Welcome to the wild and woolly world of XML.
        </MESSAGE>
    </DOCUMENT>

Now the <DOCUMENT> root element contains two elements, <GREETING> and <MESSAGE>, and each of those elements holds text. In this way, I've created a new XML document.
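If you want to confirm that structure programmatically, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module (my choice of tool for illustration) to parse the document and list the root element, its children, and their text:

```python
import xml.etree.ElementTree as ET

# The example document from this section, held as bytes so that the
# XML declaration sits at the very start of the input.
DOCUMENT_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    </GREETING>
    <MESSAGE>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
"""

root = ET.fromstring(DOCUMENT_XML)
child_tags = [child.tag for child in root]
# Element text keeps its surrounding whitespace, so strip it for display.
texts = {child.tag: child.text.strip() for child in root}

print(root.tag)
print(child_tags)
print(texts)
```

Note that the parser hands back the text with its leading and trailing whitespace intact; stripping it is a display decision, not something XML does for you.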

There's more to the story, however: XML documents can also be well-formed and valid.

Well-Formed XML Documents

To be well-formed, an XML document must follow the syntax rules set up for XML by the W3C in the XML 1.0 recommendation (which you can find at www.w3.org/TR/REC-xml). Informally, well-formed means mostly that the document must contain one or more elements, and one element, the root element, must contain all the other elements. Also, each element must nest inside any enclosing elements properly. For example, the following document is not well-formed, because the </GREETING> closing tag comes after the opening <MESSAGE> tag for the next element:

    <?xml version="1.0" encoding="UTF-8"?>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        <MESSAGE>
        </GREETING>
            Welcome to the wild and woolly world of XML.
        </MESSAGE>
    </DOCUMENT>
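Any XML parser will refuse a document like this one. As a rough sketch, here is how the check might look with Python's standard xml.etree.ElementTree module; the helper name is_well_formed is my own, and the exact wording of the error message depends on the underlying parser:

```python
import xml.etree.ElementTree as ET

# The badly nested document from above.
BAD_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<DOCUMENT>
    <GREETING>
        Hello From XML
    <MESSAGE>
    </GREETING>
        Welcome to the wild and woolly world of XML.
    </MESSAGE>
</DOCUMENT>
"""


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the parser accepts the input, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        print("Not well-formed:", err)
        return False
    return True


print(is_well_formed(BAD_XML))
print(is_well_formed(b"<DOCUMENT><GREETING>Hello</GREETING></DOCUMENT>"))
```

The parser reports the problem at the </GREETING> tag, because at that point the innermost open element is <MESSAGE>.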

Valid XML Documents

Most XML browsers will check your document to see whether it is well-formed. Some of them can also check whether it's valid. An XML document is valid if a Document Type Definition (DTD) or XML schema is associated with it, and if the document complies with that DTD or schema. That is, the DTD or schema specifies a set of rules for the document's own internal consistency, and if the browser can confirm that the document follows those rules, the document is valid.

XML schemas are gaining popularity, and much more support for schemas is coming in XSLT 2.0 (in fact, supporting XML schemas is the motivating force behind XSLT 2.0), but DTDs are still the most commonly used tools for ensuring validity. DTDs can be stored in a separate file, or they can be stored in the document itself, in a <!DOCTYPE> declaration. This example adds a <!DOCTYPE> declaration to the example XML document we developed:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/css" href="first.css"?>
    <!DOCTYPE DOCUMENT [
        <!ELEMENT DOCUMENT (GREETING, MESSAGE)>
        <!ELEMENT GREETING (#PCDATA)>
        <!ELEMENT MESSAGE (#PCDATA)>
    ]>
    <DOCUMENT>
        <GREETING>
            Hello From XML
        </GREETING>
        <MESSAGE>
            Welcome to the wild and woolly world of XML.
        </MESSAGE>
    </DOCUMENT>

This book does not cover DTDs (see Inside XML for all the details on DTDs), but what this DTD says is that a <DOCUMENT> element must contain one <GREETING> element followed by one <MESSAGE> element, that the <DOCUMENT> element is the root element, and that the <GREETING> and <MESSAGE> elements can hold text.
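To make those rules concrete, here is a hand-written Python sketch that checks a document against them. Be aware of what this is and is not: Python's standard xml.etree.ElementTree module parses the <!DOCTYPE> declaration but does not validate against it, so the function below (follows_document_rules, a name I made up) simply re-implements this one DTD's three rules by hand. A validating parser would read the rules from the DTD itself.

```python
import xml.etree.ElementTree as ET

VALID_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (GREETING, MESSAGE)>
    <!ELEMENT GREETING (#PCDATA)>
    <!ELEMENT MESSAGE (#PCDATA)>
]>
<DOCUMENT>
    <GREETING>Hello From XML</GREETING>
    <MESSAGE>Welcome to the wild and woolly world of XML.</MESSAGE>
</DOCUMENT>
"""


def follows_document_rules(xml_bytes: bytes) -> bool:
    """Hand-coded stand-in for the three <!ELEMENT> rules; not a DTD validator."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "DOCUMENT":
        return False
    children = list(root)
    # (GREETING, MESSAGE): exactly these two children, in this order.
    if [child.tag for child in children] != ["GREETING", "MESSAGE"]:
        return False
    # DOCUMENT has element content only, so any text around its
    # children must be whitespace.
    stray_text = [root.text] + [child.tail for child in children]
    if any(text and text.strip() for text in stray_text):
        return False
    # (#PCDATA): text only, so neither child may contain elements.
    return all(len(child) == 0 for child in children)


print(follows_document_rules(VALID_XML))
```

A document with the two child elements in the opposite order is still well-formed, but it fails this check, which is exactly the difference between well-formed and valid.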

You can have all kinds of hierarchies in XML documents, where one element encloses another, down to many levels deep. You can also give elements attributes, like this: <CIRCLE COLOR="blue">, where the COLOR attribute holds the value blue. You can use such attributes to store additional data about elements. You can also include comments in XML documents that explain more about specific elements by enclosing comment text inside <!-- and -->.

Here's an example of an XML document, planets.xml, that puts these features to work by storing data about the planets Mercury, Venus, and Earth, such as their mass, length of their day, density, distance from the sun, and so on. This document is used throughout the book, because it includes most of the XML features you'll work with in a short, compact form:

Listing 1.1 planets.xml
    <?xml version="1.0"?>
    <PLANETS>
        <PLANET>
            <NAME>Mercury</NAME>
            <MASS UNITS="(Earth = 1)">.0553</MASS>
            <DAY UNITS="days">58.65</DAY>
            <RADIUS UNITS="miles">1516</RADIUS>
            <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
            <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
        </PLANET>
        <PLANET>
            <NAME>Venus</NAME>
            <MASS UNITS="(Earth = 1)">.815</MASS>
            <DAY UNITS="days">116.75</DAY>
            <RADIUS UNITS="miles">3716</RADIUS>
            <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
            <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
        </PLANET>
        <PLANET>
            <NAME>Earth</NAME>
            <MASS UNITS="(Earth = 1)">1</MASS>
            <DAY UNITS="days">1</DAY>
            <RADIUS UNITS="miles">2107</RADIUS>
            <DENSITY UNITS="(Earth = 1)">1</DENSITY>
            <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
        </PLANET>
    </PLANETS>
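As a quick illustration of how a program sees the elements, attributes, and comments in this document, here is a minimal Python sketch using the standard xml.etree.ElementTree module. To keep it short, I embed a trimmed copy of planets.xml (two planets, three child elements each) rather than reading the full listing from disk:

```python
import xml.etree.ElementTree as ET

# A trimmed copy of planets.xml: two planets, three child elements each.
PLANETS_XML = b"""<?xml version="1.0"?>
<PLANETS>
    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>
    <PLANET>
        <NAME>Venus</NAME>
        <MASS UNITS="(Earth = 1)">.815</MASS>
        <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
    </PLANET>
</PLANETS>
"""

root = ET.fromstring(PLANETS_XML)
summary = []
for planet in root.findall("PLANET"):
    name = planet.findtext("NAME")
    distance = planet.find("DISTANCE")
    # Element content is always text; the UNITS attribute is read with get().
    summary.append((name, float(distance.text), distance.get("UNITS")))

for name, value, units in summary:
    print(f"{name}: {value} {units}")
```

The comments do not show up in the result: by default this parser discards them, which is a reminder that comments are for human readers, not for data.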

You also need to understand a few XML definitions in this book:

  • CDATA. Simple character data (that is, text that does not include any markup).

  • ID. A proper XML name, which must be unique (that is, not shared by any other attribute of the ID type).

  • IDREF. Holds the value of an ID attribute of some element, usually another element that the current element is related to.

  • IDREFS. Multiple IDs of elements separated by whitespace.

  • NAME Character. A letter, digit, period, hyphen, underscore, or colon.

  • NAME. An XML name, which must start with a letter, an underscore, or a colon, optionally followed by additional name characters.

  • NAMES. A list of names, separated by whitespace.

  • NMTOKEN. A token made up of one or more letters, digits, hyphens, underscores, colons, and periods.

  • NMTOKENS. Multiple NMTOKEN values in a list, separated by whitespace.

  • NOTATION. A notation name (which must be declared in the DTD).

  • PCDATA. Parsed character data. PCDATA does not include any markup, and any entity references have been expanded already in PCDATA.
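Two of these definitions are easy to see in action. The following Python sketch, again using the standard xml.etree.ElementTree module, shows that entity references in PCDATA have already been expanded by the time a program reads the text, and that a whitespace-separated value such as IDREFS or NMTOKENS splits into its individual items. The element content and the ID values here are made up for illustration:

```python
import xml.etree.ElementTree as ET

# PCDATA: the parser expands &lt;, &amp;, and &gt; before handing over the text.
element = ET.fromstring("<MESSAGE>Mass &lt; 1 &amp; density &gt; 0</MESSAGE>")
parsed_text = element.text
print(parsed_text)

# IDREFS and NMTOKENS values are whitespace-separated lists, so a plain
# split() recovers the items. These ID values are hypothetical.
idrefs_value = "mercury  venus\tearth"
referenced_ids = idrefs_value.split()
print(referenced_ids)
```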

That gives us an overview of XML documents, including what well-formed and valid documents are. If you don't feel you're up to speed on XML documents, read another book on the subject, such as Inside XML. You might also look at some of the XML resources on the Web:

  • http://www.w3c.org/xml. The World Wide Web Consortium's main XML site, the starting point for all things XML.

  • http://www.w3.org/XML/1999/XML-in-10-points. XML In 10 Points (actually only seven); an XML overview.

  • http://www.w3.org/TR/REC-xml. This is the official W3C recommendation for XML 1.0, the version used in this book. Not terribly easy to read.

  • http://www.w3.org/TR/xml-stylesheet/. All about using stylesheets and XML.

  • http://www.w3.org/TR/REC-xml-names/. All about XML namespaces.

  • http://www.w3.org/XML/Activity.html. An overview of current XML activity at W3C.

  • http://www.w3.org/TR/xmlschema-0/, http://www.w3.org/TR/xmlschema-1/, and http://www.w3.org/TR/xmlschema-2/. XML schemas, the alternative to DTDs.

  • http://www.w3.org/TR/xlink/. The XLinks specification.

  • http://www.w3.org/TR/xptr. The XPointers specification.

  • http://www.w3.org/TR/xhtml1/. The XHTML 1.0 specification.

  • http://www.w3.org/TR/xhtml11/. The XHTML 1.1 specification.

  • http://www.w3.org/DOM/. The W3C Document Object Model, DOM.

So, now you've created XML documents; how can you take a look at them?

Inside XSLT
ISBN: B0031W8M4K
Year: 2005
Pages: 196
