Building an XML Document | Cross-Platform Web Services Using C# & JAVA (Charles River Media Internet & Web Design)

Well-Formed and Valid XML

An XML document is often described as well formed and valid. Well formed means that the document follows all the syntax rules discussed in the previous section; valid means that it also contains all the information specified in either a Document Type Definition (DTD) or an XML schema. Both act as a packing slip for XML documents, specifying which data needs to be present in the document. Validating a document against a schema or a DTD is a costly process and, thus, probably only occurs during the development of SOAP software. Once developers ensure that their software produces the correct XML in the SOAP transactions, validation is probably turned off. This all depends on how each vendor implements Web Services.

Well-Formed XML

A well-formed XML document follows the rules set forth by the W3C. Put simply, there must be one or more elements, there can only be one root element that is not overlapped by any other element, and every start tag must have an end tag unless it’s an empty element. Thus, one of the original examples was well formed even though it had just one empty element.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services"/>

Removing the / at the end of the tag turns the empty element into a start tag with no matching end tag. Therefore, the following document is not well formed.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services">

But by adding a closing BOOK tag, the document becomes well formed again.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services"></BOOK>

Another error that prevents a document from being well formed happens when the root element gets overlapped by another tag. In the following example, BOOKDATA overlaps the root element BOOK and this causes a parsing error.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services">
      <AUTHOR>Brian Hochgurtel</AUTHOR>
      <BOOKDATA>
        <PAGECOUNT>400</PAGECOUNT>
        <PUBLISHER>Charles River Media</PUBLISHER>
    </BOOK>
    </BOOKDATA>

A quick way of checking the well-formedness of a document is to have Internet Explorer view the document. Figure 2.1 shows how Internet Explorer reports the error in the previous example.

Figure 2.1: Using Internet Explorer to report well-formedness errors in XML documents.
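
The same check can also be scripted. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser is an acceptable stand-in for the browser; the helper name is_well_formed and the constant names are hypothetical and chosen only for illustration. It runs three of the BOOK documents from this section through the parser and reports whether each one is well formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message).

    Hypothetical helper for illustration; it checks well-formedness only,
    not validity against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# The single empty element from the first example.
GOOD = '<?xml version="1.0" ?>\n<BOOK TITLE="Cross Platform Web Services"/>'

# The start tag with no matching end tag.
UNCLOSED = '<?xml version="1.0" ?>\n<BOOK TITLE="Cross Platform Web Services">'

# BOOKDATA overlaps the root element BOOK.
OVERLAPPED = """<?xml version="1.0" ?>
<BOOK TITLE="Cross Platform Web Services">
  <AUTHOR>Brian Hochgurtel</AUTHOR>
  <BOOKDATA>
    <PAGECOUNT>400</PAGECOUNT>
    <PUBLISHER>Charles River Media</PUBLISHER>
</BOOK>
</BOOKDATA>"""

for label, document in [("empty element", GOOD),
                        ("unclosed tag", UNCLOSED),
                        ("overlapped root", OVERLAPPED)]:
    ok, message = is_well_formed(document)
    print(label, "->", "well formed" if ok else "not well formed: " + message)
```

A parser like this stops at the first error it finds, so a document with several problems has to be fixed and rechecked one error at a time.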

Validity and Document Type Definition (DTD)

DTDs are a holdover from the older Standard Generalized Markup Language (SGML), which the publishing industry created to publish books. They are slowly falling out of favor with developers because they are not written in XML syntax. Until recently, DTDs were the only way to have a valid XML document, but they didn’t provide many of the things needed for common programming, such as data types or exact occurrence counts. They did, however, allow a user to specify entity references that gave the ability to substitute values in and out of XML documents. Consider the following simple XML document.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services">
      <PAGECOUNT>400</PAGECOUNT>
      <AUTHOR>Brian Hochgurtel</AUTHOR>
      <PUBLISHER>Charles River Media</PUBLISHER>
    </BOOK>

Note

DTDs rarely occur outside the publishing industry. However, you will find that many tools, such as Sun Microsystems Forte™, allow you to easily create them.

A simple DTD for this file would need to recognize that BOOK is the root element and that PAGECOUNT, AUTHOR, and PUBLISHER are all children of BOOK. Because BOOK is the root element, it cannot be optional, but all the other elements can be. We also need the DTD to recognize that TITLE is an attribute of BOOK. The following is an appropriate DTD for this XML document; in it, only PAGECOUNT is optional, while AUTHOR and PUBLISHER must each appear at least once.

    <?xml version="1.0" ?>
    <!-- This is a comment -->
    <!-- The following code is the DTD -->
    <!-- The PI and the DTD are the prolog of the document -->
    <!DOCTYPE BOOK [
    <!ELEMENT BOOK (PAGECOUNT?,AUTHOR+,PUBLISHER+)>
    <!ATTLIST BOOK TITLE CDATA #REQUIRED>
    <!ELEMENT PAGECOUNT (#PCDATA)>
    <!ELEMENT AUTHOR (#PCDATA)>
    <!ELEMENT PUBLISHER (#PCDATA)>
    ]>
    <BOOK TITLE="Cross Platform Web Services">
      <PAGECOUNT>400</PAGECOUNT>
      <AUTHOR>Brian Hochgurtel</AUTHOR>
      <PUBLISHER>Charles River Media</PUBLISHER>
    </BOOK>

The DTD at the beginning of the document is clearly not XML. It’s a completely different language and doesn’t provide many of the constructs a developer needs. For example, it is not possible to specify an exact minimum or maximum number of AUTHOR elements that must be in the document. Quantities can only be expressed with the symbols +, *, and ?. The + means one or more of the element, whereas the * means zero or more. The ? means the element is optional (zero or one).
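
To make the coarseness of these symbols concrete, the following Python sketch translates each occurrence indicator into a (minimum, maximum) pair, where None stands for "no upper limit." It is an illustration only, not a DTD parser: the names OCCURRENCE and occurrence_bounds are hypothetical, and the function assumes a flat, comma-separated content model like the one declared for BOOK above.

```python
import re

# Each DTD occurrence indicator as (minimum, maximum); None means unbounded.
# An element with no indicator must appear exactly once.
OCCURRENCE = {
    "": (1, 1),
    "?": (0, 1),
    "*": (0, None),
    "+": (1, None),
}


def occurrence_bounds(content_model):
    """Map element names to (min, max) for a flat sequence content model.

    Hypothetical helper: nested groups and choices (|) are not supported.
    """
    text = content_model.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a parenthesized content model")
    bounds = {}
    for item in text[1:-1].split(","):
        match = re.fullmatch(r"\s*([A-Za-z_][\w.-]*)([?*+]?)\s*", item)
        if match is None:
            raise ValueError("unsupported content particle: " + item)
        name, indicator = match.groups()
        bounds[name] = OCCURRENCE[indicator]
    return bounds


# The content model from the BOOK declaration in the DTD above.
print(occurrence_bounds("(PAGECOUNT?,AUTHOR+,PUBLISHER+)"))
# {'PAGECOUNT': (0, 1), 'AUTHOR': (1, None), 'PUBLISHER': (1, None)}
```

The only values that can ever appear are 0, 1, and "unbounded," which is why a rule such as "at most three authors" cannot be written with an occurrence indicator alone.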

The vagueness and the difficult syntax of DTDs cause most developers to look at schemas as a way to validate XML documents.

Validity and Schemas

The following XML code example is a schema generated by Visual Studio.NET. In the code, there are a lot of namespaces defined and many of them deal specifically with things Visual Studio needs, but there is still important information present in this code that helps a developer more than any DTD could.

Skip past the namespace definitions and look at the first xs:element inside xs:sequence. It defines the requirements for PAGECOUNT in the XML document: the type attribute says that it is a string, minOccurs set to 0 indicates that PAGECOUNT is not required, and maxOccurs set to 3 means that PAGECOUNT can appear a maximum of three times. Additionally, the xs:sequence tag allows you to determine the order of the elements in the schema.

    <?xml version="1.0" ?>
    <xs:schema
      targetNamespace="http://www.advocatemedia.com/~vs1C0.xsd"
      xmlns:mstns="http://www.advocatemedia.com/~vs1C0.xsd"
      xmlns="http://www.advocatemedia.com/~vs1C0.xsd"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
      attributeFormDefault="qualified"
      elementFormDefault="qualified">
      <xs:element name="BOOK">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="PAGECOUNT" type="xs:string" minOccurs="0"
              maxOccurs="3" msdata:Ordinal="0" />
            <xs:element name="AUTHOR" type="xs:string" minOccurs="0"
              msdata:Ordinal="1" />
            <xs:element name="PUBLISHER" type="xs:string" minOccurs="0"
              msdata:Ordinal="2" />
          </xs:sequence>
          <xs:attribute name="TITLE" form="unqualified" type="xs:string" />
        </xs:complexType>
      </xs:element>
      <xs:element name="NewDataSet" msdata:IsDataSet="true"
        msdata:EnforceConstraints="False">
        <xs:complexType>
          <xs:choice maxOccurs="unbounded">
            <xs:element ref="BOOK" />
          </xs:choice>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Note

Schema types are often used to identify primitive types in a SOAP message.

The XML in a schema is quite complex and confusing, but rarely does a programmer need to worry about coding a schema by hand. There are several tools available, such as Visual Studio.NET, that generate schemas automatically based on a given XML document. The most a programmer might do is go in and modify any defaults such as minOccurs and maxOccurs.

It is important to notice that the schema specifies the primitive type that the value of each XML element must have. In this case, all the values are of type xs:string, but other types, such as xs:int or xs:boolean, may be used as well. Schema is one of the few XML standards that represent types in this manner.
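
Because a schema is itself XML, ordinary XML tools can read these type and occurrence rules. The sketch below uses Python's standard xml.etree.ElementTree module to list the name, type, minOccurs, and maxOccurs of each child of BOOK. It makes several assumptions: the SCHEMA string is a trimmed copy of the generated schema with the Visual Studio-specific namespaces and attributes removed, the helper name child_constraints is hypothetical, and the code only reads the declarations rather than validating a document against them.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in the {uri}tag form that ElementTree uses for lookups.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the schema above (an assumption made for brevity).
SCHEMA = """<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BOOK">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="PAGECOUNT" type="xs:string" minOccurs="0" maxOccurs="3" />
        <xs:element name="AUTHOR" type="xs:string" minOccurs="0" />
        <xs:element name="PUBLISHER" type="xs:string" minOccurs="0" />
      </xs:sequence>
      <xs:attribute name="TITLE" type="xs:string" />
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def child_constraints(schema_text, parent_name):
    """Return (name, type, minOccurs, maxOccurs) for each sequence child.

    Hypothetical helper; XML Schema defaults minOccurs and maxOccurs to 1
    when the attribute is omitted, so the same default is applied here.
    """
    root = ET.fromstring(schema_text)
    rows = []
    for parent in root.findall(XS + "element"):
        if parent.get("name") != parent_name:
            continue
        path = f"{XS}complexType/{XS}sequence/{XS}element"
        for child in parent.findall(path):
            rows.append((child.get("name"), child.get("type"),
                         child.get("minOccurs", "1"),
                         child.get("maxOccurs", "1")))
    return rows


for row in child_constraints(SCHEMA, "BOOK"):
    print(row)
# ('PAGECOUNT', 'xs:string', '0', '3')
# ('AUTHOR', 'xs:string', '0', '1')
# ('PUBLISHER', 'xs:string', '0', '1')
```

The rows come back in document order, which is the order that xs:sequence requires the elements to appear in a BOOK document.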

Most developers favor schemas in software development. They provide many definitions, such as type and order, needed to generate code or to validate technical XML documents.