Document Instances

The XML 1.0 Recommendation distinguishes two levels of conformance for XML documents: well-formed and valid. A well-formed document is less strictly constrained than a valid one. As we will discuss in the next section, it needs only a single root element, matching start and end tags, and proper nesting to be considered an XML document. A valid document, on the other hand, must also conform to the DTD it references. You should understand both types because both are acceptable in XML.

Well-Formed Documents

Well-formed XML conforms to the Recommendation by fulfilling several requirements:

  • There can be only one root element.
  • All open elements must be properly closed (that is, if a start tag is present, there must be a matching end tag, or if the element is empty, it must be tagged as an empty element).
  • Nested elements must be closed in the correct order: the most recently opened element is closed first.

The first requirement is a simple one: there can be only one root element. For instance, the two-roots.xml document in Listing 2-6, which builds on the customer_v4.dtd schema, is not well formed because it contains two top-level <customer> elements.

Listing 2-6 two-roots.xml: A document that is not well formed because it has two root elements.

 <?xml version = "1.0"?>
 <!DOCTYPE customer SYSTEM "customer_v4.dtd">
 <!-- being the root element -->
 <customer type = "current" id = "xyz">
   <internal type = "current" id = "xyz">
     <name>
       <first>John</first>
       <middle>Smithy</middle>
       <middle>Smithy</middle>
       <last>Doe</last>
     </name>
     <contact>
       <address>
         <street>123 Some Street</street>
         <city>Anytown</city>
         <state>NC</state>
         <zip>25555</zip>
       </address>
       <phone>
         <work>919.555.1213</work>
       </phone>
     </contact>
   </internal>
   <author>&dtd-author;</author>
 </customer>
 <customer type = "past" id = "xyz">
   <external type = "past" id = "xyz">
     <name>
       <first>Jane</first>
       <last>Doe</last>
     </name>
     <contact>
       <address>
         <street>123 Some Street</street>
         <city>Anytown</city>
         <state>NC</state>
         <zip>25555</zip>
       </address>
       <phone>
         <work>919.555.1220</work>
       </phone>
     </contact>
   </external>
   <author>&dtd-author;</author>
 </customer>

Figure 2-5 shows the error reported by Internet Explorer 5.5, which contains an XML parser that checks whether a document is well formed (but not whether it is valid). As you can see, this document is not well formed because it has two top-level <customer> elements instead of a single governing root element.

Figure 2-5 The error reported by Internet Explorer 5.5 because the document has multiple root elements.
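
The same check can be reproduced outside a browser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the non-validating parser; the shortened TWO_ROOTS string and the well_formedness_error helper are illustrative names, not part of the book's examples.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for two-roots.xml: two top-level <customer> elements.
TWO_ROOTS = (
    '<customer type="current" id="xyz"><name><first>John</first></name></customer>'
    '<customer type="past" id="xyz"><name><first>Jane</first></name></customer>'
)


def well_formedness_error(text):
    """Return None if text is well formed, otherwise the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Two root elements: the parser reports an error.
error = well_formedness_error(TWO_ROOTS)
print("two roots:", error)

# Wrapping both records in one (hypothetical) <customers> root fixes it.
wrapped = "<customers>" + TWO_ROOTS + "</customers>"
print("single root:", well_formedness_error(wrapped))
```

Wrapping the records in a single container element is only one way to repair the document; whether a wrapper such as <customers> is allowed depends on the DTD in use.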

The second requirement is that all open elements must be properly closed. If we build on the same example but remove the second instance of <customer> and leave off the closing </customer> tag, we again violate the rules for a well-formed document. To demonstrate, we will load the example in Listing 2-7, missing-closing.xml, into Internet Explorer 5.5.

Listing 2-7 missing-closing.xml: Missing the closing </customer> tag and therefore not well formed.

 <?xml version = "1.0"?>
 <!DOCTYPE customer SYSTEM "customer_v4.dtd">
 <customer type = "current" id = "xyz">
   <internal type = "current" id = "xyz">
     <name>
       <first>John</first>
       <middle>Smithy</middle>
       <last>Doe</last>
     </name>
     <contact>
       <address>
         <street>123 Some Street</street>
         <city>Anytown</city>
         <state>NC</state>
         <zip>25555</zip>
       </address>
       <phone>
         <work>919.555.1213</work>
       </phone>
     </contact>
   </internal>
   <author>&dtd-author;</author>

Because Internet Explorer 5.5 contains a parser that checks whether documents are well formed, loading the previous document results in the error shown in Figure 2-6. Do not worry about parsers at this point. They will be covered in detail in Chapter 3.

Figure 2-6 Internet Explorer generating an error because an ending tag is missing.
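
As a rough illustration of the same failure, the sketch below feeds a shortened version of missing-closing.xml to Python's standard xml.etree.ElementTree parser and reports where parsing stopped. The MISSING_CLOSING string and the parse_error_position helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for missing-closing.xml: </customer> is never written.
MISSING_CLOSING = """<customer type="current" id="xyz">
  <internal type="current" id="xyz">
    <name><first>John</first><last>Doe</last></name>
  </internal>
"""


def parse_error_position(text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None


# The open <customer> element is still pending when the input ends.
broken = parse_error_position(MISSING_CLOSING)
print("broken:", broken)

# Appending the missing end tag makes the document well formed again.
repaired = parse_error_position(MISSING_CLOSING + "</customer>")
print("repaired:", repaired)
```

The parser can report the problem only once it reaches the end of the input, because until then the missing end tag could still appear.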

The last requirement of a well-formed document is that nested elements must be closed in the proper order. For this example we will use XHTML because improper nesting of HTML elements was commonly accepted until the release of XHTML. In nested-error.html, shown in Listing 2-8, the <strong> element improperly ends before the <em> element nested inside it.

Listing 2-8 nested-error.html: Incorrectly ending the <strong> element before the nested <em> element.

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
   <head>
     <title>Core XML</title>
   </head>
   <body>
     <p>
       Sometimes you <strong>really <em>need</strong></em>
       to make sure you do the right thing.
     </p>
   </body>
 </html>

Well-formed documents need to meet only a few requirements to conform to XML 1.0, and they can be useful in your projects. However, because much of your work will also demand a specific element structure, including required attributes and elements, you might need documents that are not only well formed but valid as well.
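
The nesting error in Listing 2-8 can be reproduced with a short sketch, again assuming Python's standard xml.etree.ElementTree module. Only the offending paragraph is used, and the constant names and the is_well_formed helper are illustrative.

```python
import xml.etree.ElementTree as ET

# The paragraph from nested-error.html: </strong> closes before </em>.
BAD_NESTING = (
    "<p>Sometimes you <strong>really <em>need</strong></em> "
    "to make sure you do the right thing.</p>"
)

# Corrected version: the inner <em> is closed before the outer <strong>.
GOOD_NESTING = (
    "<p>Sometimes you <strong>really <em>need</em></strong> "
    "to make sure you do the right thing.</p>"
)


def is_well_formed(text):
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


for label, text in (("bad nesting", BAD_NESTING), ("good nesting", GOOD_NESTING)):
    print(label, "->", is_well_formed(text))
```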

Valid and Non-Valid Documents

The second way to verify XML documents is to compare their structure to a governing DTD or schema. A DTD or schema can define everything from the structure of the document to the data types, required attributes, and other requirements for how elements and attributes can be included in the document.

If any part of the document does not conform to the referenced DTD, it is not a valid document. This method of verifying documents is most often used in industry standards or application-to-application communication. These types of environments demand that the documents they exchange conform to a specific model, because without such a requirement the risk of processing invalid data is high and could adversely affect the entire system.

Using the customer_v4.dtd schema, we impose the requirement that the <customer> element must carry a type attribute whose value is either current or previous. If we omit this attribute, as we did in johndoe_v5.xml, and run the document through a parser that checks only whether the document is well formed, the document will be processed without any warning or error.

 <?xml version = "1.0" encoding = "UTF-8"?>
 <!DOCTYPE customer SYSTEM "customer.dtd">
 <customer >
   <name>
     <first>John</first>
     <middle>Smithy</middle>
     <last>Doe</last>
   </name>
   <address>
     <line1>123 Some Street</line1>
     <line2>P.O. Box 555</line2>
     <city>Anytown</city>
     <state>NC</state>
     <zip>55555</zip>
   </address>
   <phone>
     <home>5551212</home>
     <work>5551213</work>
   </phone>
   <online>
     <email>john@doe.com</email>
     <url>http://www.doe.com</url>
   </online>
 </customer>

However, if we run the same example through a parser that checks for validity, such as Xerces from the Apache group (http://xml.apache.org), we get an error complaining that the required type attribute is not included.
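
The difference between the two kinds of checking can be sketched in Python. The standard xml.etree.ElementTree parser is non-validating, so it accepts the document even though the type attribute is missing. Because the standard library has no DTD validator, the check_customer_type function below is a hand-written stand-in for the single rule described above, not real DTD validation; a validating parser such as Xerces would derive the rule from the DTD itself. The shortened DOC string and all names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for johndoe_v5.xml: <customer> has no type attribute.
# The external DTD is referenced but never loaded by this parser.
DOC = (
    '<!DOCTYPE customer SYSTEM "customer.dtd">'
    "<customer><name><first>John</first><last>Doe</last></name></customer>"
)

# Allowed values as described in the text; assumed, not read from the DTD.
ALLOWED_TYPES = {"current", "previous"}


def check_customer_type(root):
    """Hand-written stand-in for one DTD rule; returns a message or None."""
    value = root.get("type")
    if value is None:
        return 'required attribute "type" is missing'
    if value not in ALLOWED_TYPES:
        return 'attribute "type" has a disallowed value: ' + value
    return None


# Well-formedness check only: parsing succeeds without any complaint.
root = ET.fromstring(DOC)
print("parsed root element:", root.tag)

# The stand-in validity rule catches what the parser did not.
problem = check_customer_type(root)
print("validity problem:", problem)
```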



XML Programming
XML Programming Bible
ISBN: 0764538292
Year: 2002
Pages: 134
