Elements of an XML Document | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

The following are the essential elements of an XML document:

Prolog
Comments
Elements
Root
Child element
Empty element
Attributes

Prolog

The prolog contains the metadata that describes the document: the XML declaration, which carries version and encoding details, and processing instructions, which tell the consuming application how to handle the document (for example, which style sheet to apply). If the XML declaration is present, it must be the very first thing in the document, with nothing before it, not even white space. Both the declaration and processing instructions open with “<?” and close with “?>”.

 <?xml version="1.0"?>

If I wanted to include a reference to a style sheet, I would do the following:

 <?xml-stylesheet type="text/xsl" href="URL for XSL Stylesheet"?>
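As a rough illustration of how a parser sees the prolog, the following Python sketch uses the standard library's minidom parser to list the processing instructions that precede the root element, and then checks what happens when something comes before the XML declaration. The style sheet name customer.xsl and the helper names are assumptions made up for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# "customer.xsl" is a made-up style sheet name used only for this sketch.
DOCUMENT = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="customer.xsl"?>'
    '<Customer/>'
)


def prolog_instructions(text):
    """Return (target, data) pairs for processing instructions at document level."""
    dom = minidom.parseString(text)
    return [
        (node.target, node.data)
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


def parses(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


instructions = prolog_instructions(DOCUMENT)
print(instructions)

# A single line break ahead of the XML declaration is enough to get the
# document rejected.
declaration_first = parses(DOCUMENT)
declaration_late = parses("\n" + DOCUMENT)
print(declaration_first, declaration_late)
```

The XML declaration itself does not show up in the list; this parser consumes it separately and reports only the style sheet instruction.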

Comments

In an XML document, comments can be inserted between individual elements of the document. A comment opens with “<!--” and closes with “-->”. Many browsers and XML editors display comments in a muted color, typically gray, to set them apart from the data.

 <?xml version="1.0"?>
 <Customer>
     <!-- Customer details are contained below -->
     <company>
         <name>Microsoft</name>
         <businessaddress>
             <!-- this is the address -->
             <address>One Microsoft Way</address>
             <city>Redmond</city>
             <!-- this is the state -->
             <state>Washington</state>
         </businessaddress>
     </company>
 </Customer>

Comments can span multiple lines, so make sure you include the closing “-->”, or you may exclude information that should not be excluded. This is very similar to the way block comments (/* ... */) are handled in SQL Server: until the comment is closed, everything that follows is treated as part of it. An XML parser rejects a document whose comment is never closed.
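The behavior of comments can be checked with a short Python sketch using the standard library's ElementTree parser, which by default discards comments rather than treating them as data. The sample documents and the helper name well_formed are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

WITH_COMMENTS = """<?xml version="1.0"?>
<Customer>
    <!-- Customer details are contained below -->
    <company>
        <name>Microsoft</name>
        <!-- A comment can span
             more than one line -->
    </company>
</Customer>"""

# The closing "-->" is missing here, so the comment never ends.
UNCLOSED_COMMENT = "<Customer><!-- oops <company/></Customer>"


def well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(WITH_COMMENTS)

# Only real elements are reported; both comments have been dropped.
company_children = [child.tag for child in root.find("company")]
print(company_children)

unclosed_ok = well_formed(UNCLOSED_COMMENT)
print(unclosed_ok)
```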

Elements

Elements are the basis of XML documents. Without elements, there is no XML document. XML elements are made up of start tags and end tags, just like HTML. The difference is that in XML, the start tag and end tag define the data within the tags, not how to format the data, as is the case in HTML. Herein lies the power of XML. We can create our own elements as we need to; we are not boxed into a strict set of rules governing how our data gets described.

A few rules still govern XML elements. Together they make up the definition of a “well-formed” XML document:

The root node is a single, unique element, containing all other elements.
Open/close tags must match.
Element names are case sensitive.
Elements must nest correctly: child elements must be contained within their parent elements.
Attribute values are enclosed in either single or double quotes.
Attributes cannot be repeated within an element.
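One way to see these rules enforced is to hand a parser a handful of small documents, each of which breaks a single rule. The sketch below does this with Python's standard ElementTree parser; the sample strings and the helper name is_well_formed are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every sample except the last breaks exactly one rule from the list above.
SAMPLES = {
    "two root elements": "<customer/><company/>",
    "missing end tag": "<customer><name></customer>",
    "case mismatch": "<customer></Customer>",
    "unquoted attribute value": "<state region=Northwest>WA</state>",
    "repeated attribute": "<state region='Northwest' region='West'>WA</state>",
    "well-formed": "<customer><state region='Northwest'>WA</state></customer>",
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}

for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the last sample is accepted; the parser stops at the first violation it meets in each of the others.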

The root node is the very first element in your XML document. It is the single, unique element that contains the rest of the document data. In the preceding example, the root element is <Customer>, and at the end of the document, we close it with the </Customer> end tag. If we omit the end tag, the parser reports an error and the document is not parsed. Within the <Customer> root node, we can have multiple elements and child elements.

Now that we’ve defined the root node, we can start describing our data. Referring to the preceding example again, the next element is <company>. This “parent” element contains “child elements,” such as <name>, that carry the customer’s details. Since each <company> could have many addresses, such as mailing, shipping, or headquarters, we can nest elements such as <businessaddress> to describe a particular address. That is the “X” in XML: our document is extensible.
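The parent and child relationships, and the idea of extending the document with another kind of address, can be sketched in Python with the standard ElementTree module. The <shippingaddress> element is a hypothetical addition for this example and does not appear in the original document; the outline helper is likewise made up.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<Customer>
    <company>
        <name>Microsoft</name>
        <businessaddress>
            <address>One Microsoft Way</address>
            <city>Redmond</city>
            <state>Washington</state>
        </businessaddress>
    </company>
</Customer>"""

root = ET.fromstring(DOCUMENT)
company = root.find("company")

# Extend the document with a second address type.
# <shippingaddress> is a hypothetical element invented for this sketch.
shipping = ET.SubElement(company, "shippingaddress")
ET.SubElement(shipping, "city").text = "Redmond"


def outline(element, depth=0):
    """Return one line per element, indented to show the nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_outline = outline(root)
print("\n".join(tree_outline))
```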

The rules still apply, though; every element needs both its start tag and its end tag. Without them, the document is not well-formed. Notice also the nesting; the elements do not overlap. An example of poorly nested elements would be

 <name>
     <customer>
 </name>
     </customer>

Here the </name> end tag appears before <customer> has been closed, so the two elements overlap. The correct nesting would look like this:

 <name>
     <customer>
         Some Data in the middle
     </customer>
 </name>
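To see how a parser reacts to the two nesting examples, the following Python sketch feeds both to the standard ElementTree parser and captures any error it reports. The helper name parse_error is an assumption, and the exact wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

OVERLAPPING = "<name><customer></name></customer>"
NESTED = "<name><customer>Some Data in the middle</customer></name>"


def parse_error(text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as error:
        return str(error)


overlap_error = parse_error(OVERLAPPING)
nested_error = parse_error(NESTED)

print("overlapping:", overlap_error)
print("nested:", nested_error)
```

The overlapping version yields an error message, while the correctly nested version yields None.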

An empty element is an element that does not contain any data or child elements. Empty elements can be defined in two ways:

 <companyname></companyname>

or the shorthand way:

 <companyname/>

I do not like the shorthand way; it throws me for a loop when I see it. It reminds me of not declaring your variables in a Visual Basic program. So do everyone a favor and avoid it if you can, if only to make your XML easier to read.
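Whichever form you prefer to read, a parser sees the same element in both cases. A minimal Python sketch with the standard ElementTree module, using a made-up describe helper, shows the two forms producing identical results:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<companyname></companyname>")
short_form = ET.fromstring("<companyname/>")


def describe(element):
    """Summarize an element as (tag, text, number of children, attributes)."""
    return (element.tag, element.text, len(element), dict(element.attrib))


same = describe(long_form) == describe(short_form)

print(describe(long_form))
print(describe(short_form))
print(same)
```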

Attributes

Attributes are another way to provide further details about an element. The only rules to follow are these:

Attributes can be specified only once; the order is not important.
Attributes must be defined in the start tag of an element.
Attribute values must be enclosed in either single or double quotes.

 <state region='Northwest'>WA</state>

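The attribute “region” further describes the <state> element. White space between the element name and its attributes is ignored, so the start tag can be spread out or split across lines. White space around the element’s content, however, is kept as part of the data, so keep the content tight against the tags. Another acceptable version of the preceding element is

 <state
     region='Northwest'>WA</state>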
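The difference between white space inside a start tag and white space inside an element's content can be checked with a short Python sketch using the standard ElementTree module. The variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Same element, written compactly and with the start tag split across lines.
compact = ET.fromstring("<state region='Northwest'>WA</state>")
spread = ET.fromstring("<state\n     region='Northwest'>WA</state>")

# Here the extra white space sits inside the element content instead.
padded = ET.fromstring("<state       region='Northwest'>       WA </state>")

# White space inside the start tag makes no difference to the attribute.
print(compact.get("region"), spread.get("region"), padded.get("region"))

# White space around the content is preserved and must be stripped explicitly.
print(repr(compact.text), repr(padded.text), repr(padded.text.strip()))
```

All three elements report the same region attribute, but only the first two have identical content; the padded version keeps its surrounding spaces until they are stripped.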

SQL Server 2005 does all the correct formatting for us, so the XML we get back from a query will be well-formed. When you start to delve further into XML, you will find a couple of useful tools available on the MSDN Web site. The XML Tree Viewer and the XML Validator will both assist you in debugging your XML. They can be downloaded at http://msdn.microsoft.com, as can XML Notepad, a useful utility for writing and editing XML files.