Section 19.3. Structuring Data



In this section and throughout this chapter, we create our own XML markup. XML allows you to describe data precisely in a well-structured format.

XML Markup for an Article

In Fig. 19.2, we present an XML document that marks up a simple article. The line numbers shown are for reference only and are not part of the XML document.

Figure 19.2. XML used to mark up an article.

 1  <?xml version = "1.0" ?>
 2  <!-- Fig. 19.2: article.xml -->
 3  <!-- Article structured with XML -->
 4
 5  <article>
 6     <title>Simple XML</title>
 7
 8     <date>May 5, 2005</date>
 9
10     <author>
11        <firstName> John</firstName>
12        <lastName> Doe</lastName>
13     </author>
14
15     <summary> XML is pretty easy.</summary>
16
17     <content>
18        In this chapter, we present a wide variety of examples that use XML.
19     </content>
20  </article>

This document begins with an XML declaration (line 1), which identifies the document as an XML document. The version attribute specifies the XML version to which the document conforms. The current XML standard is version 1.0. Though the W3C released a version 1.1 specification in February 2004, this newer version is not yet widely supported. The W3C may continue to release new versions as XML evolves to meet the requirements of different fields.

Portability Tip 19.1

Documents should include the XML declaration to identify the version of XML used. A document that lacks an XML declaration might be assumed to conform to the latest version of XML; when it does not, errors could result.


Common Programming Error 19.1

Placing whitespace characters before the XML declaration is an error.


XML comments (lines 2-3), which begin with <!-- and end with -->, can be placed almost anywhere in an XML document. XML comments can span multiple lines; an end marker on each line is not needed. The end marker can appear on a subsequent line as long as there is exactly one end marker (-->) for each begin marker (<!--). Comments are used in XML for documentation purposes. Line 4 is a blank line. As in a Visual Basic program, blank lines, whitespace and indentation are used in XML to improve readability. Later you will see that the blank lines are normally ignored by XML parsers.

Common Programming Error 19.2

In an XML document, each start tag must have a matching end tag; omitting either tag is an error. Soon, you will learn how such errors are detected.


Common Programming Error 19.3

XML is case sensitive. Using different cases for the start tag and end tag names for the same element is a syntax error.
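The errors described so far are all detected when a parser reads the document. The following is a minimal sketch of that detection, assuming Python's standard xml.etree.ElementTree module rather than the tools used in this chapter; the helper name check_well_formed and the sample strings are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


# Hypothetical sample documents, one per error described above.
samples = {
    "valid": '<?xml version = "1.0" ?><article><title>Simple XML</title></article>',
    "whitespace before declaration": ' <?xml version = "1.0" ?><article></article>',
    "missing end tag": "<article><title>Simple XML</article>",
    "case mismatch": "<article><title>Simple XML</Title></article>",
}

for label, text in samples.items():
    problem = check_well_formed(text)
    print(f"{label}: {'well formed' if problem is None else problem}")
```

Only the first sample parses. The exact wording of the error messages comes from the underlying parser and may differ between parsers, so code should not depend on it.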


In Fig. 19.2, article (lines 5-20) is the root element. The lines that precede the root element (lines 1-4) are the XML prolog. In an XML prolog, the XML declaration must appear before the comments and any other markup.

The elements we used in the example do not come from any specific markup language. Instead, we chose the element names and markup structure that best describe our particular data. You can invent elements to mark up your data. For example, element title (line 6) contains text that describes the article's title (e.g., Simple XML). Similarly, date (line 8), author (lines 10-13), firstName (line 11), lastName (line 12), summary (line 15) and content (lines 17-19) contain text that describes the date, author, the author's first name, the author's last name, a summary and the content of the document, respectively. XML element names can be of any length and may contain letters, digits, underscores, hyphens and periods. However, they must begin with either a letter or an underscore, and they should not begin with "xml" in any combination of uppercase and lowercase letters (e.g., XML, Xml, xMl), as this is reserved for use in the XML standards.

Common Programming Error 19.4

Using a whitespace character in an XML element name is an error.
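The naming rules above can be sketched as a small check. This is a simplified, ASCII-only illustration written in Python; the XML specification permits additional characters (for example, many non-ASCII letters), and the names NAME_PATTERN and is_reasonable_element_name are hypothetical.

```python
import re

# Simplified rule from the text: start with a letter or underscore, then any
# mix of letters, digits, underscores, hyphens and periods (ASCII only here).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_reasonable_element_name(name):
    """Apply the simplified naming rules, including the reserved 'xml' prefix."""
    if not NAME_PATTERN.fullmatch(name):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["firstName", "_id", "last-name.v2", "first name", "2ndAuthor", "XmlData"]:
    print(candidate, is_reasonable_element_name(candidate))
```

The first three candidates pass. The last three fail because of the embedded space, the leading digit and the reserved prefix, respectively.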


Good Programming Practice 19.1

XML element names should be meaningful to humans and should not use abbreviations.


XML elements are nested to form hierarchies, with the root element at the top of the hierarchy. This allows document authors to create parent/child relationships between data. For example, elements title, date, author, summary and content are nested within article. Elements firstName and lastName are nested within author. Figure 19.21 shows the hierarchy of Fig. 19.2.

Common Programming Error 19.5

Nesting XML tags improperly is a syntax error. For example, <x><y>hello</x></y> is an error, because the </y> tag must precede the </x> tag.


Any element that contains other elements (e.g., article or author) is a container element. Container elements also are called parent elements. Elements nested inside a container element are child elements (or children) of that container element.
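To make the parent/child relationships concrete, the sketch below parses the document from Fig. 19.2 and prints each element indented beneath its parent. It assumes Python's standard xml.etree.ElementTree module, which is not the parser used in this chapter; the function name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# The document from Fig. 19.2, without the reference line numbers.
ARTICLE_XML = """<?xml version = "1.0" ?>
<!-- Fig. 19.2: article.xml -->
<!-- Article structured with XML -->

<article>
   <title>Simple XML</title>

   <date>May 5, 2005</date>

   <author>
      <firstName> John</firstName>
      <lastName> Doe</lastName>
   </author>

   <summary> XML is pretty easy.</summary>

   <content>
      In this chapter, we present a wide variety of examples that use XML.
   </content>
</article>
"""


def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["   " * depth + element.tag]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ARTICLE_XML)
print("\n".join(outline(root)))

# Container elements are those with at least one child element.
containers = [element.tag for element in root.iter() if len(element) > 0]
print("containers:", containers)
```

Here article and author are the only container elements; firstName and lastName appear one level deeper than author's siblings.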

Viewing an XML Document in Internet Explorer

The XML document in Fig. 19.2 is simply a text file named article.xml. This document does not contain formatting information for the article. This is because XML is a technology for describing the structure of data. Formatting and displaying data from an XML document are application-specific issues. For example, when the user loads article.xml in Internet Explorer (IE), MSXML (Microsoft XML Core Services) parses and displays the document's data. Internet Explorer uses a built-in style sheet to format the data. Note that the resulting format of the data (Fig. 19.3) is similar to the format of the listing in Fig. 19.2. In Section 19.7, we show how to create style sheets to transform your XML data into various formats suitable for display.

Figure 19.3. article.xml displayed by Internet Explorer.


Note the minus sign (-) and plus sign (+) in the screen shots of Fig. 19.3. Although these symbols are not part of the XML document, Internet Explorer places them next to every container element. A minus sign indicates that Internet Explorer is displaying the container element's child elements. Clicking the minus sign next to an element collapses that element (i.e., causes Internet Explorer to hide the container element's children and replace the minus sign with a plus sign). Conversely, clicking the plus sign next to an element expands that element (i.e., causes Internet Explorer to display the container element's children and replace the plus sign with a minus sign). This behavior is similar to viewing the directory structure using Windows Explorer. In fact, a directory structure often is modeled as a series of tree structures, in which the root of a tree represents a drive letter (e.g., C:), and nodes in the tree represent directories. Parsers often store XML data as tree structures to facilitate efficient manipulation, as discussed in Section 19.8.

[Note: In Windows XP Service Pack 2, Internet Explorer by default displays all the XML elements in expanded view, and clicking the minus sign (Fig. 19.3(a)) does nothing, so you cannot collapse elements. To enable this functionality, right click the Information Bar just below the Address field and select Allow Blocked Content.... Then click Yes in the popup window that appears.]

XML Markup for a Business Letter

Now that we have seen a simple XML document, let's examine a more complex XML document that marks up a business letter (Fig. 19.4). Again, we begin the document with the XML declaration (line 1) that states the XML version to which the document conforms.

Figure 19.4. Business letter marked up as XML.

 1  <?xml version = "1.0" ?>
 2  <!-- Fig. 19.4: letter.xml -->
 3  <!-- Business letter marked up as XML -->
 4
 5  <!DOCTYPE letter SYSTEM "letter.dtd">
 6
 7  <letter>
 8     <contact type = "sender">
 9        <name>Jane Doe</name>
10        <address1>Box 12345</address1>
11        <address2>15 Any Ave.</address2>
12        <city>Othertown</city>
13        <state>Otherstate</state>
14        <zip>67890</zip>
15        <phone>555-4321</phone>
16        <flag gender = "F" />
17     </contact>
18
19     <contact type = "receiver">
20        <name>John Doe</name>
21        <address1>123 Main St.</address1>
22        <address2></address2>
23        <city>Anytown</city>
24        <state>Anystate</state>
25        <zip>12345</zip>
26        <phone>555-1234</phone>
27        <flag gender = "M" />
28     </contact>
29
30     <salutation>Dear Sir:</salutation>
31
32     <paragraph>It is our privilege to inform you about our new database
33        managed with XML. This new system allows you to reduce the
34        load on your inventory list server by having the client machine
35        perform the work of sorting and filtering the data.
36     </paragraph>
37
38     <paragraph>Please visit our Web site for availability
39        and pricing.
40     </paragraph>
41
42     <closing>Sincerely,</closing>
43     <signature>Ms. Jane Doe</signature>
44  </letter>

Line 5 specifies that this XML document references a DTD. Recall from Section 19.2 that DTDs define the structure of the data for an XML document. For example, a DTD specifies the elements and parent-child relationships between elements permitted in an XML document.

Error-Prevention Tip 19.1

An XML document is not required to reference a DTD, but validating XML parsers can use a DTD to ensure that the document has the proper structure.


Portability Tip 19.2

Validating an XML document helps guarantee that independent developers will exchange data in a standardized form that conforms to the DTD.


The DTD reference (line 5) contains three items: the name of the root element that the DTD specifies (letter); the keyword SYSTEM (which denotes an external DTD, that is, a DTD declared in a separate file, as opposed to a DTD declared locally in the same file); and the DTD's name and location (i.e., letter.dtd in the current directory). DTD document filenames typically end with the .dtd extension. We discuss DTDs and letter.dtd in detail in Section 19.5.

Several tools (many of which are free) validate documents against DTDs and schemas (discussed in Section 19.5 and Section 19.6, respectively). Microsoft's XML Validator is available free of charge from the Download Sample link at

 msdn.microsoft.com/archive/en-us/samples/internet/xml/xml_validator/default.asp 


This validator can validate XML documents against both DTDs and schemas. To install it, run the downloaded executable file xml_validator.exe and follow the steps to complete the installation. Once the installation is successful, open the validate_js.htm file located in your XML Validator installation directory in IE to validate your XML documents. We installed the XML Validator at C:\XMLValidator (Fig. 19.5). The output (Fig. 19.6) shows the results of validating the document using Microsoft's XML Validator. Visit www.w3.org/XML/Schema for a list of additional validation tools.

Figure 19.5. Validating an XML document with Microsoft's XML Validator.


Figure 19.6. Validation result using Microsoft's XML Validator.


Root element letter (lines 7-44 of Fig. 19.4) contains the child elements contact, contact, salutation, paragraph, paragraph, closing and signature. In addition to being placed between tags, data also can be placed in attributes, which are name-value pairs that appear within the angle brackets of start tags. Elements can have any number of attributes (separated by spaces) in their start tags. The first contact element (lines 8-17) has an attribute named type with attribute value "sender", which indicates that this contact element identifies the letter's sender. The second contact element (lines 19-28) has attribute type with value "receiver", which indicates that this contact element identifies the letter's recipient. Like element names, attribute names are case sensitive, can be any length, may contain letters, digits, underscores, hyphens and periods, and must begin with either a letter or an underscore character.

A contact element stores various items of information about a contact, such as the contact's name (represented by element name), address (represented by elements address1, address2, city, state and zip), phone number (represented by element phone) and gender (represented by attribute gender of element flag). Element salutation (line 30) marks up the letter's salutation. Lines 32-40 mark up the letter's body using two paragraph elements. Elements closing (line 42) and signature (line 43) mark up the closing sentence and the author's "signature," respectively.

Common Programming Error 19.6

Failure to enclose attribute values in either double ("") or single ('') quotes is a syntax error.
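As an illustration of how a program might read attribute values, the sketch below uses Python's standard xml.etree.ElementTree module (an assumption; it is not the parser used in this chapter). The fragment is an abridged version of Fig. 19.4 that omits the DOCTYPE line and most child elements, and the second contact deliberately uses single quotes to show that both quote styles are accepted.

```python
import xml.etree.ElementTree as ET

# Abridged from Fig. 19.4; the single quotes around 'receiver' are a
# deliberate variation for this example.
LETTER_FRAGMENT = """<letter>
   <contact type = "sender">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = 'receiver'>
      <name>John Doe</name>
      <flag gender = "M" />
   </contact>
</letter>
"""

root = ET.fromstring(LETTER_FRAGMENT)

# Collect (type attribute, name text, gender attribute) for each contact.
contacts = [
    (contact.get("type"), contact.findtext("name"), contact.find("flag").get("gender"))
    for contact in root.findall("contact")
]
for role, name, gender in contacts:
    print(f"{role}: {name} ({gender})")

# An unquoted attribute value is rejected when the document is parsed.
try:
    ET.fromstring("<contact type = sender></contact>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print("unquoted attribute accepted:", unquoted_accepted)
```

The attribute values arrive without their surrounding quotes, so a program cannot tell which quote style the author used.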


Line 16 introduces the empty element flag. An empty element is one that does not contain any content. Instead, an empty element sometimes contains data in attributes. Empty element flag contains an attribute that indicates the gender of the contact (represented by the parent contact element). Document authors can close an empty element either by placing a slash immediately preceding the right angle bracket, as shown in line 16, or by explicitly writing an end tag, as in line 22:

 <address2></address2> 


Note that the address2 element in line 22 is empty because there is no second part to this contact's address. However, we must include this element to conform to the structural rules specified in the XML document's DTD, letter.dtd (which we present in Section 19.5). This DTD specifies that each contact element must have an address2 child element (even if it is empty). In Section 19.5, you will learn how DTDs indicate that certain elements are required while others are optional.
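The two ways of writing an empty element are equivalent to a parser. The following sketch, again assuming Python's standard xml.etree.ElementTree module, compares both forms; the helper name is_empty is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """True when the element has no child elements and no text content."""
    return len(element) == 0 and element.text is None


explicit_end_tag = ET.fromstring("<address2></address2>")  # form used in line 22
self_closing = ET.fromstring("<address2 />")                # form used in line 16
flag = ET.fromstring('<flag gender = "F" />')

print(is_empty(explicit_end_tag), is_empty(self_closing))

# An empty element can still carry data in its attributes.
print(is_empty(flag), flag.attrib)
```

Both address2 forms produce an element with the same tag, no children and no text, while flag is empty yet still exposes its gender attribute.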



Visual Basic 2005 for Programmers (2nd Edition)
ISBN: 013225140X
