XML Documents | Developing Enterprise Web Services: An Architect's Guide

The purpose of an XML document is to capture structured data, much as an object does in an object-oriented programming language. Documents are structured into a number of elements, each delimited by tags, and an element may or may not be nested within another element.

Anyone familiar with the syntax of HTML will immediately be comfortable with the look and feel of XML. However, anyone thinking about coding XML like HTML must be wary: XML is extremely strict in its syntax, whereas the interpretation of HTML (particularly by browsers) is quite permissive. As we progress through the examples, it is worth remembering the fundamental rules of document syntax:

All tags must have corresponding end tags, unless the element is empty (it has neither subelements nor text), in which case it can be represented by a single self-closing tag:
```
 <element-name … attributes … />
```
No element can overlap any other element, although nesting within elements is allowed.
A document can have only a single root element (the XML declaration <?xml … ?> is not an element and so does not count).
Attributes of an element must have unique names within the scope of a single tag.
Only element names and attribute name-value pairs may be placed within a tag declaration.
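
The rules above can be checked mechanically by any conforming XML parser. As a minimal sketch, assuming Python and its standard library `xml.etree.ElementTree` module (neither is part of the original text, and the helper name `is_well_formed` and the sample strings are ours), we can feed the parser a few small documents and see which ones it accepts:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, otherwise False.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples, one per syntax rule; only the first obeys them all.
samples = {
    "empty element": "<dvd><extras/></dvd>",
    "missing end tag": "<dvd><title>Film</dvd>",
    "overlapping elements": "<dvd><title><year></title></year></dvd>",
    "two root elements": "<dvd/><dvd/>",
    "duplicate attribute": '<dvd region="2" region="1"/>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample should be reported as well formed. Unlike a browser rendering sloppy HTML, the parser makes no attempt to repair the others and simply rejects them.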

The best way to understand XML is by example, and the XML document shown in Figure 2-1 is typical of the structure of most XML documents, though it is somewhat shorter than most we'll be seeing in the Web services world.

Figure 2-1. A simple XML document.

 <?xml version="1.0" encoding="utf-8"?>
 <dvd>
   <title>The Phantom Menace</title>
   <year>2001</year>
 </dvd>

Figure 2-1 shows a simple XML document that contains data about a DVD. The document (as all XML documents should) begins with the XML declaration, delimited by <? and ?>. This declaration provides information for any programs that are going to process the document. In this case it informs any processors that the document conforms to version 1.0 of XML (at the time of writing, 1.0 is the first and only XML version, although the 1.1 effort is underway) and that the underlying textual encoding is UTF-8 as opposed to ASCII.

The remainder of the document is where the actual structured data is held. In this case we have a root element delimited by the dvd tag, which contains two subelements delimited by the title and year tags. Those subelements contain textual data that we assume relates to the name of the film on the disk and the year of its release (though this is a convention and we could name elements badly, just as we can poorly name variables when programming).
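
To see how a program might consume this structure, here is a minimal sketch that loads the document from Figure 2-1 and reads its two subelements. It assumes Python's standard `xml.etree.ElementTree` module, which is our choice for illustration and not something the text prescribes. The variable names are ours.

```python
import xml.etree.ElementTree as ET

# The document from Figure 2-1, held as bytes so that the parser
# honors the encoding named in the XML declaration.
DOCUMENT = b"""<?xml version="1.0" encoding="utf-8"?>
<dvd>
  <title>The Phantom Menace</title>
  <year>2001</year>
</dvd>"""

root = ET.fromstring(DOCUMENT)      # the root element, <dvd>
title = root.findtext("title")      # text content of the <title> subelement
year = int(root.findtext("year"))   # element text is always a string; convert it

print(root.tag, title, year)
```

Note that the parser hands back the year as text. Treating it as a number is a decision made by the consuming program, which is the same reliance on naming convention described above.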

We can take this document one stage further and make it a little more useful for programs that might want to derive richer information from it. The document shown in Figure 2-2 embellishes that from Figure 2-1 by adding the DVD regional information as an attribute of the root element, region="2". We have also added a comment to aid human readability, delimited by <!-- and -->.

Figure 2-2. A simple XML document with attributes and comments.

 <?xml version="1.0" encoding="utf-8"?>
 <!-- This is the European release of the DVD -->
 <dvd region="2">
   <title>The Phantom Menace</title>
   <year>2001</year>
 </dvd>

The addition of the attribute in Figure 2-2 would, for instance, be of great help to a DVD cataloging system that could use the region attribute to classify disks by their target geographical region.
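
As a rough sketch of what such a cataloging system might do, the following groups DVD documents by their region attribute. It assumes Python's standard `xml.etree.ElementTree` module. The function `catalog_by_region`, the "unknown" fallback for documents without the attribute, and the second (region 1) document are all hypothetical and exist only to give the example something to classify.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# The document from Figure 2-2.
FIGURE_2_2 = b"""<?xml version="1.0" encoding="utf-8"?>
<!-- This is the European release of the DVD -->
<dvd region="2">
  <title>The Phantom Menace</title>
  <year>2001</year>
</dvd>"""

# A made-up second document, present only so there are two regions.
HYPOTHETICAL_REGION_1 = b"""<?xml version="1.0" encoding="utf-8"?>
<dvd region="1">
  <title>The Phantom Menace</title>
  <year>2001</year>
</dvd>"""


def catalog_by_region(documents):
    """Map each region code to the titles of the DVDs released for it."""
    catalog = defaultdict(list)
    for document in documents:
        root = ET.fromstring(document)
        # Attribute values are strings; fall back to "unknown" if absent.
        region = root.get("region", "unknown")
        catalog[region].append(root.findtext("title"))
    return dict(catalog)


print(catalog_by_region([FIGURE_2_2, HYPOTHETICAL_REGION_1]))
```

The comment in Figure 2-2 plays no part here: it is there for human readers, and this parser discards comments by default.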