Components of an XML Document

So far, I have provided some background material and discussed some of the characteristics of XML, but you still don't really know what makes up an XML document. In this section, I'll discuss the components required to build an XML document and then present an XML document that illustrates these various components.

Types of Markup

An XML document is a text file that consists of character data and markup. In the case of XML, there are several different constructs that are considered to be markup. A summary of these different constructs is shown in Table A.1.

Table A.1. Summary of XML markup.

Elements

The most fundamental structure of an XML document is the element. A well-formed XML document must contain at least one element, although an XML document usually contains many elements. An element typically surrounds character data with start and end tags. A sample of an element is

    <address>1106 River Avenue</address>

This is a single element named <address>, and it contains the character data "1106 River Avenue." Note that in an XML document, an element always uses the following syntax:

Instead of containing character data, elements can also contain other elements called child elements. When we begin to discuss elements containing other elements (that is, nested elements), you can start to visualize the XML document almost as a tree. As you can see in Figure A.1, we have a tree that has a <record> element at the root and two child elements named <name> and <address>.

Figure A.1. Tree representation of a simple XML document.

The XML document that corresponds to the XML tree shown in Figure A.1 is

    <?xml version="1.0" encoding="UTF-8"?>
    <record>
      <name>Matthew Kolb</name>
      <address>1700 Grand Avenue</address>
    </record>

In this example, the <record> element has two child elements, <name> and <address>. The <name> element contains the character data "Matthew Kolb" and the <address> element contains the character data "1700 Grand Avenue." Note that the child elements follow the same rules for start and end tags (that is, each element must have matching start and end tags).

Elements can also be empty, in which case they're called empty elements. We can use the standard notation without any character data, such as <book></book>, or we can use a shorthand notation that consolidates the start and end tags into one tag, such as <book/>. Note that the forward slash appears after the element name in an empty element; it appears before the element name in an end tag.

XML is case sensitive, so the start and end tags of an element must use the same case. Lowercase, uppercase, or mixed case can all be used, but you must be consistent. For example, <account>data</ACCOUNT> isn't well-formed.

Attributes

Each element can also have one or more attributes associated with it. Attributes are usually used to store data that is specific to a particular instance of an element. An attribute has a name and a value associated with it, and it appears as part of the element's start tag. For example, the following is a valid attribute:

    <book isbn="0735712891">XML and Perl</book>

As you can see, the <book> element has one attribute named isbn, and the value of the isbn attribute is "0735712891." As I mentioned earlier, the attribute is applicable to this particular element; another book element would have a different isbn attribute value. Elements can have several attributes if required.
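
The rules covered so far (nested child elements, empty elements, case-sensitive tags, and attributes) can be checked with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module; it is only an illustration, and the helper names child_text and is_well_formed are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

# The record document from Figure A.1, passed as bytes so the parser
# reads the encoding declaration itself.
RECORD_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<record>
  <name>Matthew Kolb</name>
  <address>1700 Grand Avenue</address>
</record>"""


def child_text(xml_data):
    """Map each child tag of the root element to its character data."""
    root = ET.fromstring(xml_data)
    return {child.tag: child.text for child in root}


def is_well_formed(xml_data):
    """Return True if the parser accepts the input, False otherwise."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Walk the two child elements of <record>.
    print(child_text(RECORD_XML))

    # Start and end tags that differ only in case are rejected.
    print(is_well_formed("<account>data</ACCOUNT>"))

    # Both empty-element notations carry no character data.
    print(ET.fromstring("<book/>").text, ET.fromstring("<book></book>").text)

    # An attribute is read from the element by name.
    book = ET.fromstring('<book isbn="0735712891">XML and Perl</book>')
    print(book.get("isbn"), book.text)
```

Note that ElementTree reports the absence of character data as None, so the two empty-element notations are indistinguishable once parsed.
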
Attributes within the start tag must be separated by at least one space:

    <book isbn="0735712891" price="$39.99">XML and Perl</book>

Attribute values have to be quoted (either single or double quotes are allowed). The opening and closing quotes must match (that is, be the same) for each attribute. For example, isbn="0735712891" price='$39.99' is OK, but isbn='0735712891" price="$39.99' isn't.

Use of Attributes Versus Elements

You could have easily created <isbn> and <price> elements in the previous section and stored the values as character data rather than attributes. Why would you use an attribute instead of an element (or vice versa)? Well, there isn't an easy answer to the question, and it is a popular topic on newsgroups and forums that always provokes strong opinions about when to use an element versus when to use an attribute. I might not be able to provide a definitive answer, but I can certainly provide a few suggestions. As you become more familiar with XML, you will get more comfortable with designing XML documents, and the best approach will usually become obvious. Let's take a look at a few examples that will help you determine when to store data as an element or an attribute.

Storing Data as an Element

If there is more than one occurrence of a data item, then you will need to store the data in an element rather than in an attribute. For example, let's say that you need to store a list of employee names. One option would be to use a root element named <employees> that has multiple <employee> elements. Each <employee> element has two child elements, <name> and <phone>, and each of these elements contains character data.
Here is an example of that hierarchy:

    <?xml version="1.0" encoding="UTF-8"?>
    <employees>
      <employee>
        <name>Joseph</name>
        <phone>112</phone>
      </employee>
      <employee>
        <name>Kayla</name>
        <phone>114</phone>
      </employee>
      <employee>
        <name>Sean</name>
        <phone>116</phone>
      </employee>
      <employee>
        <name>Matthew</name>
        <phone>118</phone>
      </employee>
    </employees>

This example can easily be extended to include additional information for each employee, such as an employee number, department, or home address.

Storing Data as an Attribute

A good example of when to use an attribute to store data is when you need to assign a unique identifier to each element, or when the data describes the element itself. Let's take a look at the list of employees again, and let's say that you want to associate an employee number with each name. An example is shown in the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <employees>
      <employee id="100">
        <name>Joseph</name>
        <extension>112</extension>
      </employee>
      <employee id="101">
        <name>Kayla</name>
        <extension>114</extension>
      </employee>
      <employee id="102">
        <name>Sean</name>
        <extension>116</extension>
      </employee>
      <employee id="103">
        <name>Matthew</name>
        <extension>118</extension>
      </employee>
    </employees>

As you can see, without reorganizing your XML document, you have uniquely associated each <employee> element with an employee identification number. What you plan to do with the XML document will help drive its design. For example, you would need a document structure similar to this if you plan to search the XML document and find employee <name> elements based on the employee identification numbers.

Another example of when it is beneficial to use attributes involves data that requires units (for example, kilograms, degrees Celsius, kilometers, and so forth).
For example, the following XML element would require an additional step (a Perl split function call) to separate the data from the units of the data:

    <weight>75 kg</weight>

An alternative to mixing the data and units in the character data would be to store the data unit in an attribute. For example, the following element and attribute would be easier to parse:

    <weight unit="kg">75</weight>

Attributes can also be used when you want to limit the possible range of values to an enumerated list or range of valid values. This will be demonstrated a little later in this appendix when we discuss DTDs and XML schemas.
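
To make the two attribute use cases above concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. It is only an illustration under stated assumptions: the function names name_for_id, weight_from_text, and weight_from_attribute are hypothetical, and Python's str.split stands in for the Perl split call mentioned earlier.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the employee list that uses id attributes.
EMPLOYEES_XML = """<employees>
  <employee id="100"><name>Joseph</name><extension>112</extension></employee>
  <employee id="101"><name>Kayla</name><extension>114</extension></employee>
  <employee id="102"><name>Sean</name><extension>116</extension></employee>
  <employee id="103"><name>Matthew</name><extension>118</extension></employee>
</employees>"""


def name_for_id(root, employee_id):
    """Return the <name> text of the employee with the given id, or None."""
    for employee in root.iter("employee"):
        # Attribute values are always strings, so compare against a string.
        if employee.get("id") == employee_id:
            return employee.findtext("name")
    return None


def weight_from_text(element):
    """Parse <weight>75 kg</weight>: the units must be split off the data."""
    value, unit = element.text.split()
    return float(value), unit


def weight_from_attribute(element):
    """Parse <weight unit="kg">75</weight>: no splitting is needed."""
    return float(element.text), element.get("unit")


if __name__ == "__main__":
    employees = ET.fromstring(EMPLOYEES_XML)
    print(name_for_id(employees, "102"))

    print(weight_from_text(ET.fromstring("<weight>75 kg</weight>")))
    print(weight_from_attribute(ET.fromstring('<weight unit="kg">75</weight>')))
```

Both weight functions return the same pair, but the first one depends on the character data following an exact "number, space, unit" layout, while the second keeps the number and the unit in separate places from the start.
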