19.3. Structuring Data
In this section and throughout this chapter, we create our own XML markup. XML allows you to describe data precisely in a
XML Markup for an Article
In Fig. 19.2, we present an XML document that marks up a simple article using XML. The line
Figure 19.2. XML used to mark up an article.
This document begins with an XML declaration (line 1), which identifies the document as an XML document. The version attribute specifies the XML version to which the document conforms. The current XML standard is version 1.0. Though the W3C released a version 1.1 specification in February 2004, this
XML comments (lines 2-3), which begin with <!-- and end with -->, can be placed almost
In Fig. 19.2, article (lines 5-20) is the root element. The lines that precede the root element (lines 1-4) are the XML prolog. In an XML prolog, the XML declaration must appear before the comments and any other markup.
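The prolog rule can be checked with any XML parser. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module; the document strings are hypothetical stand-ins for Fig. 19.2 (only the root element name article and the title text come from the discussion above), not the figure itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Fig. 19.2: declaration first, then comments,
# then the root element. The comment text is a placeholder.
GOOD_XML = """<?xml version = "1.0"?>
<!-- Placeholder comment in the prolog -->
<!-- Another placeholder comment -->

<article>
   <title>Simple XML</title>
</article>
"""

root = ET.fromstring(GOOD_XML)
print(root.tag)  # name of the root element

# Same idea, but the XML declaration follows a comment, which is not allowed.
BAD_XML = """<!-- comment placed before the declaration -->
<?xml version = "1.0"?>
<article/>
"""

try:
    ET.fromstring(BAD_XML)
    misplaced_declaration_accepted = True
except ET.ParseError as error:
    misplaced_declaration_accepted = False
    print("rejected:", error)
```

The parser accepts the first document and reports a parse error for the second, because the declaration is no longer the first thing in the document.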
The elements we used in the example do not come from any specific markup language. Instead, we chose the element names and markup structure that best describe our particular data. You can invent elements to mark up your data. For example, element title (line 6) contains text that describes the article's title (e.g., Simple XML). Similarly, date (line 8), author (lines 10-13), firstName (line 11), lastName (line 12), summary (line 15) and content (lines 17-19) contain text that describes the date, author, the author's first name, the author's last
XML elements are nested to form hierarchies, with the root element at the top of the hierarchy. This allows document authors to create parent/child relationships between data. For example, elements title, date, author, summary and content are nested within article. Elements firstName and lastName are nested within author. Figure 19.21 shows the hierarchy of Fig. 19.2.
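To make the nesting concrete, the sketch below walks an article document and prints one indented line per element. It is an illustration only: the element names follow the description above, but every value other than the title is a placeholder, and the helper outline is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

# Element names follow the text; all values except the title are placeholders.
ARTICLE_XML = """<article>
   <title>Simple XML</title>
   <date>(placeholder date)</date>
   <author>
      <firstName>(placeholder)</firstName>
      <lastName>(placeholder)</lastName>
   </author>
   <summary>(placeholder summary)</summary>
   <content>(placeholder content)</content>
</article>"""


def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["   " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ARTICLE_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The indentation of each printed name mirrors its depth in the hierarchy: article at the top, its five children one level in, and firstName and lastName one level further under author.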
Any element that contains other elements (e.g., article or author) is a container element. Container elements also are called parent elements. Elements nested inside a container element are child elements (or children) of that container element.
Viewing an XML Document in Internet Explorer
The XML document in Fig. 19.2 is simply a text file named article.xml. This document does not contain formatting information for the article. This is because XML is a technology for describing the structure of data. Formatting and displaying data from an XML document are application-specific issues. For example, when the
Figure 19.3. article.xml displayed by Internet Explorer: panels (a) and (b).
Note the minus sign (-) and plus sign (+) in the screen shots of Fig. 19.3. Although these symbols are not part of the XML document, Internet Explorer places them next to every container element. A minus sign indicates that Internet Explorer is displaying the container element's child elements. Clicking the minus sign next to an element collapses that element (i.e., causes Internet Explorer to hide the container element's children and replace the minus sign with a plus sign). Conversely, clicking the plus sign
[Note: In Windows XP Service Pack 2, by default Internet Explorer displays all the XML elements in expanded view, and clicking the minus sign (Fig. 19.3(a)) does not do anything, so by default you cannot collapse an element. To enable this functionality, right click the Information Bar just below the Address field, select Allow Blocked Content... and then click Yes in the popup window that appears.]
XML Markup for a Business Letter
Now that we have seen a simple XML document, let's examine a more complex XML document that marks up a business letter (Fig. 19.4). Again, we begin the document with the XML declaration (line 1) that states the XML version to which the document conforms.
Figure 19.4. Business letter
 1  <?xml version = "1.0"?>
 2  <!-- Fig. 19.4: letter.xml -->
 3  <!-- Business letter marked up as XML -->
 4
 5  <!DOCTYPE letter SYSTEM "letter.dtd">
 6
 7  <letter>
 8     <contact type = "sender">
 9        <name>Jane Doe</name>
10        <address1>Box 12345</address1>
11        <address2>15 Any Ave.</address2>
12        <city>Othertown</city>
13        <state>Otherstate</state>
14        <zip>67890</zip>
15        <phone>555-4321</phone>
16        <flag gender = "F" />
17     </contact>
18
19     <contact type = "receiver">
20        <name>John Doe</name>
21        <address1>123 Main St.</address1>
22        <address2></address2>
23        <city>Anytown</city>
24        <state>Anystate</state>
25        <zip>12345</zip>
26        <phone>555-1234</phone>
27        <flag gender = "M" />
28     </contact>
29
30     <salutation>Dear Sir:</salutation>
31
32     <paragraph>It is our privilege to
Line 5 specifies that this XML document references a DTD. Recall from Section 19.2 that DTDs define the structure of the data for an XML document. For example, a DTD specifies the elements and parent-child relationships between elements permitted in an XML document.
The DTD reference (line 5) contains three items: the name of the root element that the DTD specifies (letter); the keyword SYSTEM, which denotes an external DTD (a DTD declared in a separate file, as opposed to a DTD declared locally in the same file); and the DTD's name and location (i.e., letter.dtd in the current directory). DTD document filenames typically end with the .dtd extension. We discuss DTDs and letter.dtd in detail in Section 19.5.
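The three items in the DTD reference can also be read programmatically. The sketch below is one way to do this in Python with the standard xml.parsers.expat module; the shortened document string and the handler name on_doctype are assumptions made for illustration. Note that this parser is non-validating: it reports the reference but does not load letter.dtd or check the document against it.

```python
import xml.parsers.expat

# Shortened, hypothetical version of letter.xml: only the prolog and an
# empty root element are needed to inspect the DOCTYPE declaration.
LETTER_XML = """<?xml version = "1.0"?>
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter/>
"""

doctype = {}


def on_doctype(name, system_id, public_id, has_internal_subset):
    """Record the pieces of the DOCTYPE declaration as expat reports them."""
    doctype["root"] = name                # root element named by the DTD reference
    doctype["system_id"] = system_id      # DTD name and location after SYSTEM
    doctype["public_id"] = public_id      # None when no PUBLIC identifier is given
    doctype["internal"] = bool(has_internal_subset)  # False for a purely external DTD


parser = xml.parsers.expat.ParserCreate()
parser.StartDoctypeDeclHandler = on_doctype
parser.Parse(LETTER_XML, True)  # True marks this as the final chunk of input

print(doctype["root"], doctype["system_id"])
```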
Several tools (many of which are free) validate documents against DTDs and schemas (discussed in Section 19.5 and Section 19.6, respectively). Microsoft's XML Validator is available free of charge from the Download Sample link at
msdn.microsoft.com/archive/en-us/samples/internet/xml/xml_validator/default.asp
This validator can validate XML documents against both DTDs and schemas. To install it, run the downloaded executable file xml_validator.exe and follow the steps to complete the installation. Once the installation is successful,
Root element letter (lines 7-44 of Fig. 19.4) contains the child elements contact, contact, salutation, paragraph, paragraph, closing and signature. In addition to being placed between tags, data also can be placed in attributes: name-value pairs that appear within the angle brackets of start tags. Elements can have any number of attributes (separated by spaces) in their start tags. The first contact element (lines 8-17) has an attribute named type with attribute value "sender", which indicates that this contact element identifies the letter's sender. The second contact element (lines 19-28) has attribute type with value "receiver", which indicates that this contact element identifies the letter's recipient. Like element names, attribute names are case sensitive, can be any length, may contain letters, digits, underscores, hyphens and periods, and must begin with either a letter or an underscore character.
A contact element stores various items of information about a contact, such as the contact's name (represented by element name), address (represented by elements address1, address2, city, state and zip), phone number (represented by element phone) and gender (represented by attribute gender of element flag). Element salutation (line 30) marks up the letter's salutation. Lines 32-40 mark up the letter's body using two paragraph elements. Elements closing (line 42) and signature (line 43) mark up the closing
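As an illustration of how an application might read the attributes discussed above, the following Python sketch parses an abridged excerpt of Fig. 19.4 (the address and phone elements and the DTD reference are omitted here for brevity) and collects each contact's type and gender attributes. The dictionary layout is an assumption chosen for this example.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of Fig. 19.4; only the elements needed here are kept.
LETTER_XML = """<letter>
   <contact type = "sender">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = "receiver">
      <name>John Doe</name>
      <flag gender = "M" />
   </contact>
</letter>"""

root = ET.fromstring(LETTER_XML)

contacts = {}
for contact in root.findall("contact"):
    role = contact.get("type")  # attribute on the contact start tag
    contacts[role] = {
        "name": contact.findtext("name"),                # data between tags
        "gender": contact.find("flag").get("gender"),    # data in an attribute
    }

print(contacts["sender"])
print(contacts["receiver"])
```

The name comes from element content, whereas the role and gender come from attribute values, which shows the two places where a document author can put data.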
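The naming rules stated above for element and attribute names can be approximated with a regular expression. This is a simplified sketch that covers only the ASCII characters listed in the text; the XML specification also permits colons and many non-ASCII letters, which this check deliberately ignores. The function name is_simple_xml_name is hypothetical.

```python
import re

# First character: a letter or an underscore.
# Remaining characters: letters, digits, underscores, hyphens and periods.
# Simplification: ASCII only, unlike the full rules in the XML specification.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_simple_xml_name(name):
    """Return True if name follows the simplified naming rules above."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["type", "address2", "_id", "2address", "first name"]:
    print(candidate, is_simple_xml_name(candidate))
```

Because names are case sensitive, type and Type both pass this check but would be treated as two different names by a parser.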
Line 16 introduces the empty element flag. An empty element is one that does not contain any content. Instead, an empty element sometimes contains data in attributes. Empty element flag contains an attribute that indicates the gender of the contact (represented by the parent contact element). Document authors can close an empty element either by placing a slash immediately
<address2></address2>
Note that the address2 element in line 22 is empty because there is no second part to this contact's address. However, we must include this element to conform to the structural rules specified in the XML document's DTD letter.dtd (which we present in Section 19.5). This DTD specifies that each contact element must have an address2 child element (even if it is empty). In Section 19.5, you will learn how DTDs