Structure of an XML Document | SOA for the Business Developer: Concepts, BPEL, and SCA (Business Developers series)

Let's look again at the XML document shown earlier (Listing 4.1).

Our document is composed of a set of tagged constructs, with each tag bounded by angle brackets (< and >). The first construct is called the XML declaration. This line identifies the XML version, specifies an encoding (a subject far afield from most business development), and includes an initial and final question mark (?).

 <?xml version="1.0" encoding="ISO-8859-1"?>

If the XML declaration is missing, the document must conform to XML 1.0. For details on encoding, see the Web site http://skew.org/xml/tutorial.
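To see what the encoding in the declaration does in practice, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The Make value "Citroën" is a hypothetical example that is not part of our listing; we chose it only because it contains a character outside the ASCII range.

```python
import xml.etree.ElementTree as ET

# Bytes as they might be stored in a file. The declaration names the encoding,
# and 0xEB is the ISO-8859-1 byte for the character "ë".
# The Make value is a hypothetical example, not taken from Listing 4.1.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><Make>Citro\xebn</Make>'

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(raw)
print(root.tag)   # Make
print(root.text)  # Citroën
```

The declaration itself does not appear in the parsed tree; it only tells the processor how to read the bytes that follow.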

Our document also includes a comment, which is text that an XML processor can ignore and whose purpose is usually to clarify (to a human reader) some aspect of the file. A comment begins with the characters <!-- and ends with the characters -->, as shown here.

 <!-- policy request -->
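As a rough illustration of a processor ignoring comments, the following sketch parses a small document with Python's standard xml.etree.ElementTree module, whose default parser discards comments. The comment inside the root element is a hypothetical addition for the example.

```python
import xml.etree.ElementTree as ET

# One comment before the root element and one (hypothetical) comment inside it.
doc = """<!-- policy request -->
<Insured>
  <!-- a comment inside the root element -->
  <Make>Honda</Make>
</Insured>"""

root = ET.fromstring(doc)

# With the default parser, neither comment shows up in the tree.
child_tags = [child.tag for child in root]
print(child_tags)  # ['Make']
```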

Most important (as noted in the XML specification), the document includes "one or more elements" that represent the data of interest. In most cases, an element is "delimited by start-tags and end-tags"; as shown here, the end-tag places a virgule, or forward slash (/), before the element name.

 <Insured> ... </Insured>

Each XML document must have exactly one root element (such as Insured), which contains all the other elements in a tree structure. Figure 4.2 shows the tree structure for the document under review.

Figure 4.2: XML tree structure

An important characteristic of XML is that each superior, or parent, element includes its immediately subordinate, or child, elements completely. For example, a parent element cannot include a start-tag of a child element without including the corresponding end-tag. Child elements can themselves have children, and we use the term descendants to refer to all elements that are within the start-tag and end-tag of a parent element, at any level of nesting.
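The nesting rule can be checked with any XML processor. The sketch below again assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is our own, and the two sample strings are simplified stand-ins for the listing.

```python
import xml.etree.ElementTree as ET

# Each child is completely contained in its parent.
nested = "<Insured><Vehicle><Make>Honda</Make></Vehicle></Insured>"

# Here the Vehicle end-tag appears before the Make end-tag,
# so Vehicle does not include its child completely.
overlapping = "<Insured><Vehicle><Make>Honda</Vehicle></Make></Insured>"

def is_well_formed(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False

# Descendants of the root: every element inside it, at any level of nesting.
root = ET.fromstring(nested)
descendants = [el.tag for el in root.iter() if el is not root]
print(descendants)  # ['Vehicle', 'Make']
```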

Most elements include content: child elements, a literal value (as in the Make element, <Make>Honda</Make>), or both, as in the following example of mixed content:

 <Vehicle>Cool!
    <Make>Triumph</Make>
    <Model>Spitfire</Model>
 </Vehicle>
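One way to see both kinds of content in the Vehicle element is to parse it. This is a minimal sketch that assumes Python's standard xml.etree.ElementTree module, which exposes the literal value that precedes the first child through the text attribute.

```python
import xml.etree.ElementTree as ET

vehicle = ET.fromstring(
    "<Vehicle>Cool!    <Make>Triumph</Make>    <Model>Spitfire</Model> </Vehicle>"
)

# The literal value that appears before the first child element.
leading_text = vehicle.text.strip()
print(leading_text)  # Cool!

# The child elements and their own literal values.
children = [(child.tag, child.text) for child in vehicle]
print(children)  # [('Make', 'Triumph'), ('Model', 'Spitfire')]
```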

An element also can include attributes, which are name-value pairs that provide additional information associated with the element. You specify attributes in the start-tag of the element. Each attribute is composed of a name, an equal sign (=), and a single- or double-quoted string. To separate one attribute from the next, you use white space (carriage returns, spaces, or tabs). Here's an example of an element with two attributes.

 <Insured Customer Status="In Review">

Although attributes are part of the element, they are not considered content. This distinction has a practical effect when you work with XPath, as described in Chapter 6.
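The distinction between attributes and content is visible in most XML APIs. The following sketch assumes Python's standard xml.etree.ElementTree module; the Status attribute comes from the text above, and Region is a hypothetical second attribute added only to show the two quoting styles.

```python
import xml.etree.ElementTree as ET

# Status comes from the text; Region is a hypothetical second attribute.
# One value uses double quotes and the other single quotes; both are allowed.
doc = """<Insured Status="In Review" Region='East'><Make>Honda</Make></Insured>"""

insured = ET.fromstring(doc)

# Attributes are exposed as name-value pairs on the element ...
attributes = insured.attrib
print(insured.get("Status"))  # In Review

# ... and are kept apart from the content, which here is one child element.
children = [child.tag for child in insured]
print(children)  # ['Make']
```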

Note that each identifier in the XML document is case-sensitive. An element named vehicle is distinct from one named VEHICLE.
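A short sketch of case sensitivity, assuming Python's standard xml.etree.ElementTree module. The document is a hypothetical one that deliberately uses two element names differing only in case.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: vehicle and VEHICLE are two different elements.
doc = "<Insured><vehicle>first</vehicle><VEHICLE>second</VEHICLE></Insured>"
root = ET.fromstring(doc)

lower = root.find("vehicle").text
upper = root.find("VEHICLE").text
mixed = root.find("Vehicle")  # no element has this exact name

print(lower, upper, mixed)  # first second None

# A start-tag and end-tag that differ only in case do not match each other.
try:
    ET.fromstring("<Make>Honda</make>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)  # False
```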

You can organize your data in various ways, specifying elements in some cases and attributes in others. Within the Insured element, for example, we might add an Options element to describe the insurance coverage in effect when a car is disabled:

 <Options>
    <TemporaryRental MaximumDays="10"/>
    <Towing/>
 </Options>

The TemporaryRental element tells whether the insurance company pays for a customer's vehicle rental; the MaximumDays attribute indicates the maximum number of days that are covered; and the Towing element indicates whether the insurance company pays for use of a towing service.

An element (such as TemporaryRental) with no content is said to be empty. It can have a start and end tag or (as shown) can have a single tag with an ending virgule.

A lack of content in an element can mean that the data is elsewhere. A value for the Towing element might be assigned as a default in the XML Schema definition (as described later) or in the software that accesses the XML processor.
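The two ways of writing an empty element are equivalent to an XML processor. The sketch below assumes Python's standard xml.etree.ElementTree module and writes Towing with a start-tag and end-tag, and TemporaryRental with a single tag, to show that both come back with no content.

```python
import xml.etree.ElementTree as ET

# TemporaryRental uses the single-tag form; Towing uses a start-tag and end-tag.
options = ET.fromstring(
    '<Options><TemporaryRental MaximumDays="10"/><Towing></Towing></Options>'
)

rental = options.find("TemporaryRental")
towing = options.find("Towing")

# Neither element has a literal value or child elements.
print(rental.text, len(rental))  # None 0
print(towing.text, len(towing))  # None 0

# An empty element can still carry attributes.
max_days = rental.get("MaximumDays")
print(max_days)  # 10
```

Software that reads the document, and not the processor shown here, would be responsible for applying any default that the empty Towing element implies.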

Later in this book, we mention processing instructions (PIs), which are XML statements that are used by the XML processor or by the software that invokes the processor. Here's an example.

 <?HandleThis how="somehow" ?>

The PI includes a question mark within each angle bracket and specifies a PI target, which is a name that identifies the instruction. The PI's content is the set of parameters and related values. In this example, the PI target is HandleThis, the parameter is how, and the value of the parameter is somehow.
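To show how invoking software might read a PI, here is a minimal sketch that assumes Python's standard xml.dom.minidom module, which keeps processing instructions in the parsed document. The empty Insured root element is added only so that the sample is a complete document. In this API the PI target must directly follow the opening question mark, and the rest of the PI is handed over as a single string that the software interprets.

```python
from xml.dom import minidom, Node

# A PI placed before a minimal root element.
doc = minidom.parseString('<?HandleThis how="somehow"?><Insured/>')

pi = doc.firstChild
is_pi = pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE
print(is_pi)      # True

# The target names the instruction; the data is the remaining text.
print(pi.target)  # HandleThis
print(pi.data)    # how="somehow"
```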

Last, you may want the XML processor to ignore characters that are otherwise meaningful in a technical sense. For example, content that appears to be an element start-tag (such as <You&Me>) might be a string to be printed. To hide such content, place it in a character-data (CDATA) section, as shown here.

 <![CDATA[<You&Me>]]>
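The effect of a CDATA section can be sketched with Python's standard xml.etree.ElementTree module. The enclosing Note element is a hypothetical name used only to give the section a parent.

```python
import xml.etree.ElementTree as ET

# Note is a hypothetical parent element for the CDATA section.
note = ET.fromstring("<Note><![CDATA[<You&Me>]]></Note>")

# The characters arrive as plain text, not as a child element.
print(note.text)  # <You&Me>
print(len(note))  # 0

# Without the CDATA section, the same characters are rejected.
try:
    ET.fromstring("<Note><You&Me></Note>")
    accepted_without_cdata = True
except ET.ParseError:
    accepted_without_cdata = False
print(accepted_without_cdata)  # False
```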