Defining the Structure and Content of an XML Document

Two main approaches to defining the format and contents of XML documents are Document Type Definitions (DTDs) and XML schemas. However, before discussing DTDs and XML schemas, let's discuss the format of an XML document.

Well-Formed and Valid XML Documents

XML documents are considered well-formed if they follow the rules in the XML specification regarding document structure. Let's take a look at some of the more important points related to well-formed documents.

An XML document can have only one root element, that is, the top level of the document tree. For example, in the following XML document, the root element is named <employees>:

 <?xml version="1.0" encoding="UTF-8"?>
 <employees>
    <name employee_id="100">Joseph</name>
    <name employee_id="101">Kayla</name>
    <name employee_id="102">Sean</name>
    <name employee_id="103">Matthew</name>
 </employees>

All elements (including the root element) must have matching start tags and end tags. In the following case, the XML document isn't well-formed because the <name> element containing the character data Matthew is missing its end tag.

 <?xml version="1.0" encoding="UTF-8"?>
 <employees>
    <name employee_id="100">Joseph</name>
    <name employee_id="101">Kayla</name>
    <name employee_id="102">Sean</name>
    <!-- Missing closing tag here -->
    <name employee_id="103">Matthew
 </employees>

Another important rule regarding XML document structure is nesting. All elements must be properly nested. In the following XML document, the root element is <employees>, and it is made up of multiple <employee> elements. Each <employee> element has <name> and <employee_num> child elements. For the nesting to be correct, each child (for example, <employee>) must be closed with an end tag before you can start the next <employee> element. You can see that the end tag for the <employee> element containing the character data "Sean" is missing. In this case, I've illegally tried to start a new <employee> element containing the character data "Matthew" before properly closing the <employee> element containing the character data "Sean."

 <?xml version="1.0" encoding="UTF-8"?>
 <employees>
    <employee>
       <name>Joseph</name>
       <employee_num>100</employee_num>
    </employee>
    <employee>
       <name>Kayla</name>
       <employee_num>101</employee_num>
    </employee>
    <employee>
       <name>Sean</name>
       <employee_num>102</employee_num>
    <!-- Missing </employee> closing tag here -->
    <employee>
       <name>Matthew</name>
       <employee_num>103</employee_num>
    </employee>
 </employees>
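A parser rejects a document that isn't well-formed before it ever looks at the content. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module (the helper name is_well_formed and the sample strings are only for illustration), the check might look like this:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


GOOD = """<employees>
   <name employee_id="100">Joseph</name>
   <name employee_id="103">Matthew</name>
</employees>"""

# The second <name> element is never closed.
MISSING_END_TAG = """<employees>
   <name employee_id="100">Joseph</name>
   <name employee_id="103">Matthew
</employees>"""

# A new <employee> starts before the previous one is closed.
BAD_NESTING = """<employees>
   <employee><name>Sean</name>
   <employee><name>Matthew</name></employee>
</employees>"""

for label, text in [("good", GOOD), ("missing end tag", MISSING_END_TAG),
                    ("bad nesting", BAD_NESTING)]:
    print(label, is_well_formed(text))
```

Only the first document is accepted; the other two fail for the reasons described above.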

A stricter check of an XML document verifies that the XML document is valid. An XML document is considered valid if it follows the rules defined in a DTD or XML schema regarding structure and content. For example, a DTD or XML schema may specify that each of the <employee> elements must contain a <name> element and an <employee_num> element, and in this case, the preceding document (with the missing </employee> end tag restored) would be considered valid. If the DTD or XML schema specified that each <employee> element is required to have a <telephone_number> element, then the XML document would be considered invalid.

Some XML parsers are validating parsers. A validating parser compares an XML document against a DTD or XML schema and verifies that the XML document contains the required elements and is properly structured according to the DTD or XML schema. If a validating parser finds that an XML document doesn't correspond to a DTD or XML schema, it notifies the user of the problem.

An XML document can be well-formed but invalid (that is, it is syntactically correct, but it just doesn't match your DTD or XML schema). However, if an XML document is valid, that implies that it is also well-formed.
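To make the distinction concrete, here is a rough sketch of a document that parses cleanly but breaks a content rule. It assumes Python's standard xml.etree.ElementTree module, which is a non-validating parser, so the rule ("every <employee> needs <name> and <employee_num>") is checked by hand. The function name and the rule list are hypothetical stand-ins for what a validating parser would read from a DTD or XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: each <employee> must contain these child elements.
REQUIRED_CHILDREN = ["name", "employee_num"]


def missing_children(xml_text, required=REQUIRED_CHILDREN):
    """Return (employee index, missing tag) pairs.

    Raises ET.ParseError if the document isn't even well-formed.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for index, employee in enumerate(root.findall("employee")):
        for tag in required:
            if employee.find(tag) is None:
                problems.append((index, tag))
    return problems


DOC = """<employees>
   <employee><name>Joseph</name><employee_num>100</employee_num></employee>
   <employee><name>Kayla</name></employee>
</employees>"""

print(missing_children(DOC))  # [(1, 'employee_num')]
```

The document is well-formed, so parsing succeeds; it is the content rule that the second <employee> element fails.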

All XML documents need to be well-formed, but they don't necessarily need to be valid. It is good XML practice to develop and use a DTD or XML schema that defines your XML document whenever XML documents are being exchanged. Let's say that your company exchanges sales information with other companies. In a situation such as this, it is critical to first agree on the data that will be exchanged, and then define a set of rules that govern the structure of the data in an XML document. After these rules are defined by a DTD or XML schema, then the two companies can begin to exchange information in XML.

Think of a DTD or XML schema as a contract between the companies that defines the structure and content of the XML document. This contract will help you verify that the XML documents you send to other companies are in the proper format (so that they'll be able to process your XML documents with their applications). Also, this contract verifies that the XML documents you receive from other companies are in the proper format (so that your applications will be able to process their XML documents). It may take some negotiating between the organizations to agree on document structure and content, but all the benefits provided by XML make it worth the trouble.

In a number of cases, you can use XML documents that aren't defined by a DTD or XML schema. For example, if you're using XML for simple or small documents (for example, configuration/startup files) that aren't exchanged between organizations, you may not need to develop a DTD or XML schema. As long as the XML documents are well-formed, they will suffice for these applications.

Note

An interesting question that occasionally pops up is, "Are XML documents that don't have a DTD or XML schema considered invalid?" The answer is no. An XML document is considered invalid if it doesn't follow the corresponding DTD or XML schema. So, if an XML document doesn't have a DTD or XML schema, it can't be invalid.

Let's take a look at DTDs and XML schemas and how the rules that control the structure and contents of XML documents are specified. Both DTDs and XML schemas have advantages and disadvantages associated with them, and which one you use depends on your requirements. After I discuss DTDs and XML schemas, I will present a few advantages and disadvantages that will aid you in deciding which approach to use.

Document Type Definition

A DTD is a set of rules in non-XML format that defines the content and the structure of an XML document. The DTD describes the elements that appear in the XML document, parent-child relationships (that is, nesting), and also specifies which elements are required and which elements are optional.

XML Document Type Definition

The easiest way to describe the structure of a DTD might be to walk through an example, starting with the input data, describing a DTD, and then building the corresponding XML document. For this example, let's assume that you work for a financial institution and you're working with the data generated by a series of daily bank transactions. Each transaction will contain the following information:

Account number

Name

Amount

Date

Transaction type: withdrawal or deposit

Tree Representation of Data

Figure A.2 contains a tree structure that shows the relationship between the data elements. If you describe the data in terms of an XML document, the root element is <daily_activity>, and it has multiple <transaction> child elements. Each <transaction> element has a type attribute (that is, withdrawal or deposit) and four child elements: <account_number>, <name>, <amount>, and <date>. (The DTD in Listing A.1 stores the date as separate <month> and <day> elements.) Now that you know the parent-child relationships between the data elements, let's take a look at how to describe this in a DTD.

Figure A.2. Relationship between the data associated with a bank transaction.


Structure of an XML Document Type Definition

The DTD that corresponds to the data represented by Figure A.2 is shown in Listing A.1. Let's take a closer look at the DTD and explain the format. Several important points need to be made.

Listing A.1 A simple DTD that contains bank transaction information. (Filename: app_a_daily_activity.dtd)

 <!ELEMENT daily_activity (transaction+)>
 <!ATTLIST daily_activity
    branch CDATA #REQUIRED>
 <!ELEMENT transaction (account_number, name, amount, month, day)>
 <!ATTLIST transaction type (withdrawal | deposit) #REQUIRED>
 <!ELEMENT account_number (#PCDATA)>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT amount (#PCDATA)>
 <!ELEMENT month (#PCDATA)>
 <!ELEMENT day (#PCDATA)>

The data shown in Figure A.2 follows the structure of the DTD hierarchy. At the top of the DTD is the declaration for the <daily_activity> element:

 <!ELEMENT daily_activity (transaction+)>

Whenever you declare an element in a DTD, you use the following format:

 <!ELEMENT element_name contents>

In this case, you're declaring an element named <daily_activity> whose content is one or more <transaction> child elements. The <daily_activity> element is also the root of your XML document. Strictly speaking, the root is named by the DOCTYPE declaration in the XML document rather than by its position in the DTD, but declaring it first is a common convention. Note the plus sign (+) in the declaration of the <daily_activity> element. That is an indicator of the number of children that the <daily_activity> element is permitted to have. The valid element suffixes are

*	zero or more occurrences of the element
?	zero or one occurrence of the element
+	one or more occurrences of the element

In the absence of one of the suffix characters, the DTD requires exactly one occurrence of the child element.
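The suffix characters behave much like the quantifiers in a regular expression. The following sketch leans on that similarity: it flattens an element's child tags into a string and matches it against a pattern built from a child name and a suffix. It assumes Python's standard re and xml.etree.ElementTree modules, and the function name children_match is invented for this illustration. A real validating parser handles full content models, not just a single repeated child.

```python
import re
import xml.etree.ElementTree as ET


def children_match(parent_xml: str, child: str, suffix: str) -> bool:
    """Check how many <child> elements a parent holds against a DTD suffix.

    suffix is "", "?", "*" or "+"; an empty suffix means exactly one.
    """
    if suffix not in ("", "?", "*", "+"):
        raise ValueError(f"unknown suffix: {suffix!r}")
    parent = ET.fromstring(parent_xml)
    # Flatten the child tags into a string such as "transaction transaction ".
    sequence = "".join(f"{element.tag} " for element in parent)
    pattern = f"(?:{re.escape(child)} ){suffix}"
    return re.fullmatch(pattern, sequence) is not None


EMPTY = "<daily_activity/>"
TWO = "<daily_activity><transaction/><transaction/></daily_activity>"

print(children_match(TWO, "transaction", "+"))    # True: one or more
print(children_match(EMPTY, "transaction", "+"))  # False: at least one needed
print(children_match(EMPTY, "transaction", "*"))  # True: zero is allowed
print(children_match(TWO, "transaction", "?"))    # False: at most one allowed
```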

The next declaration in your DTD is

 <!ATTLIST daily_activity     branch CDATA #REQUIRED>

This is the declaration for an element attribute that uses the following format:

 <!ATTLIST element_name attribute_name attribute_type default>

The element_name and attribute_name parameters are self-explanatory, but let's take a closer look at the attribute_type and default parameters. In your case, you have one attribute associated with the <daily_activity> element named branch, and the attribute type is CDATA. Note that several attributes can be declared as part of each ATTLIST declaration. The attribute_type parameter specifies the type of the attribute. Currently, 10 attribute types are supported:

CDATA: Attribute value data is character data.
ENTITIES: Attribute value data is made up of multiple entities that are defined elsewhere in the DTD.
ENTITY: Attribute value data is a single entity that is defined elsewhere in the DTD.
Enumeration: Attribute data must be selected from an enumerated list. This bounds the possible user inputs (for example, an enumerated list containing days of the week would only contain Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday, thereby limiting the possible values to only valid values). Note that the enumeration type is the only attribute type that doesn't have a DTD keyword, hence the lowercase name.
ID: Attribute data value must be unique within the XML document (that is, the same value can appear elsewhere in the XML document, but not as the value of another attribute of type ID).
IDREF: Attribute data value is the ID type attribute of another element in the XML document.
IDREFS: Attribute data value is a list of multiple ID type attributes of other elements in the XML document.
NMTOKEN: Attribute data value must follow a specific naming convention that is similar, but not identical to, XML names. A valid NMTOKEN can consist of alphanumeric characters and the punctuation marks _, -, ., and :.
NMTOKENS: Attribute data value consists of multiple NMTOKEN values separated by whitespace.
NOTATION: Attribute contains the name of a notation that is declared in the DTD. Note that this is very rarely used.
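The ID and IDREF rules are easier to see with a small check. The sketch below assumes Python's standard xml.etree.ElementTree module and two hypothetical attribute names, id (treated as type ID) and ref (treated as type IDREF). A validating parser would learn which attributes have these types from the ATTLIST declarations; here they are simply passed in.

```python
import xml.etree.ElementTree as ET


def check_ids(xml_text, id_attr="id", idref_attr="ref"):
    """Return (duplicate ID values, IDREF values that point at no ID)."""
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)  # ID values must be unique
        seen.add(value)
    dangling = [
        element.get(idref_attr)
        for element in root.iter()
        if element.get(idref_attr) is not None
        and element.get(idref_attr) not in seen  # IDREF must match some ID
    ]
    return duplicates, dangling


DOC = """<staff>
   <employee id="e100"/>
   <employee id="e101" ref="e100"/>
   <employee id="e100" ref="e999"/>
</staff>"""

print(check_ids(DOC))  # (['e100'], ['e999'])
```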

The next declaration is for the transaction element:

 <!ELEMENT transaction (account_number, name, amount, month, day)>

This declaration says that the <transaction> element has five child elements: <account_number> , <name> , <amount> , <month> , and <day> . The <transaction> element also has an attribute that is declared by the following statement:

 <!ATTLIST transaction type (withdrawal | deposit) #REQUIRED>

As you can see, the transaction element has one attribute named type . The type attribute is an enumerated attribute, and the only valid values are withdrawal or deposit . Because the default declaration for the type attribute is #REQUIRED , this means that a value for the type attribute must be present, and it must be one of the values that appears in the enumerated list.

The valid values for the attribute default declarations are shown in the following.

#FIXED: The #FIXED attribute default declaration means that the attribute is a constant value.
#IMPLIED: The #IMPLIED attribute default declaration means that the attribute is optional.
Literal: The attribute default value is provided as a quoted string. Similar to the enumeration attribute type, there isn't a reserved word for the literal default declaration.
#REQUIRED: The #REQUIRED attribute default declaration means (as you may have already guessed from the name) that the attribute is required.
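One way to keep the four default declarations straight is to write down what a parser does with each when an attribute is present or absent. The following Python sketch does that; the function name resolve_attribute and the use of the string "literal" to mark a plain quoted default are conventions invented for this illustration, not DTD syntax.

```python
def resolve_attribute(supplied, kind, default=None):
    """Return the effective attribute value under a DTD default declaration.

    supplied: value written in the document, or None if the attribute is absent
    kind:     "#REQUIRED", "#IMPLIED", "#FIXED" or "literal"
    default:  the quoted value that accompanies "#FIXED" or "literal"
    """
    if kind == "#REQUIRED":
        if supplied is None:
            raise ValueError("required attribute is missing")
        return supplied
    if kind == "#IMPLIED":
        return supplied  # may be None: the attribute is optional
    if kind == "#FIXED":
        if supplied is not None and supplied != default:
            raise ValueError("fixed attribute must equal its declared value")
        return default
    if kind == "literal":
        return default if supplied is None else supplied
    raise ValueError(f"unknown default declaration: {kind!r}")


print(resolve_attribute("deposit", "#REQUIRED"))     # deposit
print(resolve_attribute(None, "#IMPLIED"))           # None
print(resolve_attribute(None, "#FIXED", "USD"))      # USD
print(resolve_attribute(None, "literal", "deposit")) # deposit
```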

The last portion of our DTD contains the declarations for the child elements and is shown in the following.

 <!ELEMENT account_number (#PCDATA)>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT amount (#PCDATA)>
 <!ELEMENT month (#PCDATA)>
 <!ELEMENT day (#PCDATA)>

The declarations for the <account_number>, <name>, <amount>, <month>, and <day> elements are basically the same, all specifying that the respective elements will contain only #PCDATA. The notation #PCDATA tells us that the elements will not contain any child elements, only character data.

PCDATA stands for parsed character data; basically, that is the remaining character data after all entity references have been resolved. Be careful not to confuse PCDATA with CDATA. Remember, a CDATA section is a part of an XML document that may contain characters (for example, "<" and ">") that normally must be encoded as &lt; and &gt;, respectively.
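The difference is only in how the characters are written in the document, not in what the application receives. As a small sketch, assuming Python's standard xml.etree.ElementTree module and a made-up <note> element, the same text can be supplied either with entity references or inside a CDATA section:

```python
import xml.etree.ElementTree as ET

# Parsed character data: the special characters are written as entity references.
escaped = ET.fromstring("<note>1 &lt; 2 &amp;&amp; 3 &gt; 2</note>")

# CDATA section: the same characters appear literally and are not parsed as markup.
cdata = ET.fromstring("<note><![CDATA[1 < 2 && 3 > 2]]></note>")

print(escaped.text)
print(cdata.text)
print(escaped.text == cdata.text)
```

Both elements end up with the text 1 < 2 && 3 > 2 once the parser has resolved the entity references.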

Document Type Definition Example

Now that you have specified the format of the XML document by designing a DTD, let's take a look at the XML document shown in Listing A.2. As you can see, the sample XML document has two <transaction> elements. Note that the standalone attribute in the XML declaration is set to "no", which means that this XML document relies on the existence of an external DTD.

Listing A.2 XML document containing bank transactions using an external DTD. (Filename: app_a_daily_activity_ext.xml)

 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <!DOCTYPE daily_activity SYSTEM "app_a_daily_activity.dtd">
 <daily_activity branch="Manasquan, NJ">
 <transaction type="withdrawal">
    <account_number>11-22-33-4444</account_number>
    <name>Mark Riehl</name>
    <amount>100.00</amount>
    <month>6</month>
    <day>8</day>
 </transaction>
 <transaction type="deposit">
    <account_number>22-11-44-1111</account_number>
    <name>Joseph Burns</name>
    <amount>50.00</amount>
    <month>6</month>
    <day>10</day>
 </transaction>
 </daily_activity>

The last DTD-related item that we need to discuss is the DOCTYPE declaration, which appears in the XML document rather than in the DTD itself. The following DOCTYPE declaration is used in Listing A.2:

 <!DOCTYPE daily_activity SYSTEM "app_a_daily_activity.dtd">

The DOCTYPE declaration must appear after the XML declaration, but before the root element of the document. The format of the DOCTYPE declaration is as follows:

 <!DOCTYPE root_element SYSTEM "DTD name">

This is fairly straightforward: the DOCTYPE declaration needs to include the root element name and the location of the external DTD file. The location can include a path on the local machine to the DTD or a URL if the DTD is available on another machine.
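If you want to see what a parser makes of the DOCTYPE declaration, the following sketch reads it back out of a document. It assumes Python's standard xml.dom.minidom module. Note that minidom is a non-validating parser, so it records the declaration without checking the document against the DTD, and the DTD file doesn't need to exist for this example to run.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE daily_activity SYSTEM "app_a_daily_activity.dtd">
<daily_activity branch="Manasquan, NJ"/>"""

document = minidom.parseString(DOC)
doctype = document.doctype

print(doctype.name)      # root element name given in the DOCTYPE declaration
print(doctype.systemId)  # location given after the SYSTEM keyword

# The name in the DOCTYPE declaration should match the actual root element.
print(document.documentElement.tagName == doctype.name)
```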

In addition to the SYSTEM identifier, you can also use the PUBLIC identifier. PUBLIC provides a globally unique string that identifies the version of the DTD being used and other information, but this string doesn't specify the filename of the DTD. The PUBLIC keyword is rarely used.

A slight variation on the XML document is shown in Listing A.3. As you can see, we've changed the value of the standalone parameter to "yes" and included the entire DTD in place of the SYSTEM identifier. Now, you have a standalone XML document. It is often beneficial to include the DTD at the top of the XML document, so that when you make changes to the DTD or the XML document, you can easily make any required changes to synchronize the DTD and the XML document. One potential drawback of storing the DTD in an XML file is that the DTD can't be used during the validation of another XML document. Remember, a validating XML parser needs access to the DTD (or an XML schema) to perform the validation.

Listing A.3 XML document containing bank transactions using an internal DTD. (Filename: app_a_daily_activity_ext.xml)

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <!DOCTYPE daily_activity [
 <!ELEMENT daily_activity (transaction*)>
 <!ATTLIST daily_activity
    month CDATA #REQUIRED
    day CDATA #REQUIRED>
 <!ELEMENT transaction (account_number, name, amount)>
 <!ATTLIST transaction type (withdrawal | deposit) #REQUIRED>
 <!ELEMENT account_number (#PCDATA)>
 <!ELEMENT name (#PCDATA)>
 <!ELEMENT amount (#PCDATA)>
 ]>
 <daily_activity month="6" day="10">
 <transaction type="withdrawal">
    <account_number>11-22-33-4444</account_number>
    <name>Mark Riehl</name>
    <amount>100.00</amount>
 </transaction>
 <transaction type="deposit">
    <account_number>22-11-44-1111</account_number>
    <name>Joseph Burns</name>
    <amount>50.00</amount>
 </transaction>
 </daily_activity>

Note

Please see http://www.w3.org/TR/REC-xml for additional information on DTDs.

XML Schemas

XML schemas are XML documents that are similar to DTDs in that they are used to specify the content and structure of an XML document. However, that is where the similarities end. XML schemas were developed by the W3C in response to complaints about DTDs. For example, DTDs don't support namespaces. Also, DTD data typing is weak and it only applies to attributes. Unfortunately, XML schemas are much more complicated than DTDs. Fortunately, XML schemas are much more powerful than DTDs, so their additional complexity has a benefit, and their benefits are worth the extra work.

Although a DTD will enable you to verify that a particular element contains data or that an attribute must be selected from an enumerated list, XML schemas enable you to specify that an element must contain a particular data type. For example, you can specify that a <title> element in a report must contain a string, the <price> element of an XML catalog must be a value that is greater than zero, or a <day> element must be an integer between 1 and 31. As you can see, XML schemas give you finer-grained control when it comes to defining the contents of an XML document.

The subject of XML schemas is complex and cannot be thoroughly discussed in the pages available in this appendix. However, I will present an example of an XML schema and discuss all its major components.

Simple Types

One of the major concepts behind XML schemas is using simple and complex types to describe elements. A simple type is just that: it cannot have any attributes or enclose any other elements. A complex type can do the things that a simple type cannot; that is, it can have attributes and enclose other elements. One of the tasks that you will be doing quite often when working with XML schemas is building complex types. A number of simple types are available to use as part of XML schema. Table A.2 shows all the simple types built into XML schema.

Note

Additional information on simple types in XML schema can be found at http://www.w3.org/TR/xmlschema-0/.

Table A.2. Simple types defined in XML schema.

Simple Type	Description
`anyURI`	Contains a standard URI, such as http://www.w3c.org.
`base64Binary`	Contains Base64-encoded binary data.
`boolean`	Contains either true, false, 0, or 1.
`byte`	Contains a small integer, such as -1 or 126.
`date`	Contains the date, such as 2000-08-12.
`dateTime`	Contains time and date using the following format: 2000-08-12T10:00:00.000-05:00, which corresponds to August 12, 2000 at 10:00 a.m. Eastern.
`decimal`	Contains decimal data.
`double`	Contains the equivalent of a 64-bit floating-point number.
`duration`	Contains a time-duration as a string. For example, P1Y2M3DT10H30M12.3S represents 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds.
`ENTITIES`	Contains an XML version 1.0 specification `ENTITIES` attribute type.
`ENTITY`	Contains an XML version 1.0 specification `ENTITY` attribute type.
`float`	Contains the equivalent of a 32-bit floating-point number.
`gDay`	Contains a day of the month in the Gregorian calendar. For example, ---12 represents the 12th day of the month.
`gMonth`	Contains a month number in the Gregorian calendar. For example, --08 represents the month of August.
`gMonthDay`	Contains a month and a day from the Gregorian calendar. For example, --08-12 represents August 12.
`gYear`	Contains a year in the Gregorian calendar, for example, 2000.
`gYearMonth`	Contains a year and a month in the Gregorian calendar. For example, 2000-08 represents August of 2000.
`hexBinary`	Contains hexadecimal encoded binary data.
`ID`	Contains an XML version 1.0 specification `ID` attribute type.
`IDREF`	Contains an XML version 1.0 specification `IDREF` attribute type.
`IDREFS`	Contains an XML version 1.0 specification `IDREFS` attribute type.
`int`	Contains an integer that ranges from -2147483648 to 2147483647.
`integer`	Contains a type derived from decimal; that is, decimal digits with an optional sign.
`language`	Contains the xml:lang value that is defined in the XML version 1.0 specification.
`long`	Contains a `long` integer.
`Name`	Contains a valid XML 1.0 `Name` type.
`NCName`	Contains an XML Namespace `NCName`. Note that this is the same as an XML Namespace `QName` without the prefix and colon.
`negativeInteger`	Contains a negative integer (the maximum value is -1), such as -126789 or -1.
`NMTOKEN`	Contains an XML version 1.0 specification `NMTOKEN` attribute type.
`NMTOKENS`	Contains an XML version 1.0 specification `NMTOKENS` attribute type.
`nonNegativeInteger`	Contains a non-negative integer.
`nonPositiveInteger`	Contains a negative integer or zero.
`normalizedString`	New lines, tabs, and carriage returns are converted to spaces before processing.
`NOTATION`	Contains an XML version 1.0 specification `NOTATION` attribute type.
`positiveInteger`	Contains a positive integer.
`QName`	Contains a valid XML Namespace `QName` .
`short`	Contains a short integer, such as 1 or 12678.
`string`	Indicates a normal string.
`time`	Contains a string representing time as either 13:20:00.000 or 13:20:00.000-05:00. Note the -05:00, which is for a conversion to Eastern Standard Time, 5 hours behind Universal time.
`token`	Similar to a `normalizedString` ; however, leading and trailing whitespace is removed and multiple adjacent space characters are reduced to a single space character.
`unsignedByte`	Valid range is from 0 to 255.
`unsignedInt`	Derived from an unsigned long, and the maximum inclusive value is 4294967295.
`unsignedLong`	Derived from the `nonNegativeInteger` type, the max value is 18446744073709551615.
`unsignedShort`	Derived from an unsigned integer, the max inclusive value is 65535.

As you can see, there are a large number of data types already defined for you in XML schema. As mentioned earlier, you can use these types to limit the valid values of element character data in an XML document.
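To get a feel for what typed content means, here is roughly the kind of check you would otherwise have to write by hand in your application. The sketch uses Python's standard decimal module; the two function names are hypothetical, and the checks only approximate the decimal and integer types described in Table A.2 (for example, Python accepts a few spellings that XML schema does not).

```python
from decimal import Decimal, InvalidOperation


def is_positive_price(text: str) -> bool:
    """Rough check for a decimal value greater than zero."""
    try:
        return Decimal(text.strip()) > 0
    except InvalidOperation:
        return False


def is_day_of_month(text: str) -> bool:
    """Rough check for an integer between 1 and 31 inclusive."""
    try:
        return 1 <= int(text.strip()) <= 31
    except ValueError:
        return False


print(is_positive_price("19.99"), is_positive_price("0"), is_positive_price("abc"))
print(is_day_of_month("31"), is_day_of_month("32"), is_day_of_month("ten"))
```

With an XML schema, a validating parser performs checks like these for you before your application ever sees the data.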

XML Schema Example

Now that I've provided a little bit of a background to XML schema, let's take a look at an actual XML schema. To make things a little easier, let's reuse the XML document that you created for the DTD section as your input data. I'll show it again here for your convenience in Listing A.4.

Listing A.4 XML document containing daily bank transactions. (Filename: app_a_daily_activity_ndc.xml)

 <?xml version="1.0" encoding="UTF-8"?>
 <daily_activity branch="Manasquan, NJ">
    <transaction type="withdrawal">
       <account_number>11-22-33-4444</account_number>
       <name>Mark Riehl</name>
       <amount>$100.00</amount>
       <month>6</month>
       <day>8</day>
    </transaction>
    <transaction type="deposit">
       <account_number>22-11-44-1111</account_number>
       <name>Joseph Burns</name>
       <amount>$50.00</amount>
       <month>6</month>
       <day>10</day>
    </transaction>
 </daily_activity>

Remember, XML schemas have basically the same function as a DTD. Their purpose is to help us define the structure and contents of an XML document. As mentioned earlier, some XML parsers can compare an XML document to an XML schema to verify that the XML document is valid, that is, conforms to the XML schema.

Take a look at the XML document shown in Listing A.4. Which elements, attributes, and child relationships do you need to define, so that you can describe the XML document? Looking at the XML document in Listing A.4, let's make a list of what you need. First, you need the following elements:

<daily_activity>: Root element that has an attribute named branch.
<transaction>: Complex element that has an attribute named type. The <transaction> element has the following child elements: <account_number>, <name>, <amount>, <month>, and <day>.

Now that you know what you need to define, let's take a look at the XML schema in Listing A.5 that defines this XML document.

Listing A.5 XML schema for the daily bank transactions. (Filename: app_a_daily_activity.xsd)

  1.  <?xml version="1.0" encoding="UTF-8"?>
  2.  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  3.
  4.    <!-- account_number element definition -->
  5.    <xs:element name="account_number" type="xs:string"/>
  6.
  7.    <!-- amount element definition -->
  8.    <xs:element name="amount" type="xs:string"/>
  9.
 10.    <!-- name element definition -->
 11.    <xs:element name="name" type="xs:string"/>
 12.
 13.    <!-- day element definition -->
 14.    <xs:element name="day">
 15.      <xs:simpleType>
 16.        <xs:restriction base="xs:integer">
 17.          <xs:minInclusive value="1"/>
 18.          <xs:maxInclusive value="31"/>
 19.        </xs:restriction>
 20.      </xs:simpleType>
 21.    </xs:element>
 22.
 23.    <!-- month element definition -->
 24.    <xs:element name="month">
 25.      <xs:simpleType>
 26.        <xs:restriction base="xs:integer">
 27.          <xs:minInclusive value="1"/>
 28.          <xs:maxInclusive value="12"/>
 29.        </xs:restriction>
 30.      </xs:simpleType>
 31.    </xs:element>
 32.
 33.    <!-- root element definition -->
 34.    <xs:element name="daily_activity">
 35.      <xs:complexType>
 36.        <xs:sequence>
 37.          <xs:element ref="transaction" minOccurs="0" maxOccurs="unbounded"/>
 38.        </xs:sequence>
 39.        <xs:attribute name="branch" type="xs:string" use="required"/>
 40.      </xs:complexType>
 41.    </xs:element>
 42.
 43.    <!-- transaction element definition -->
 44.    <xs:element name="transaction">
 45.      <xs:complexType>
 46.        <xs:sequence>
 47.          <xs:element ref="account_number"/>
 48.          <xs:element ref="name"/>
 49.          <xs:element ref="amount"/>
 50.          <xs:element ref="month"/>
 51.          <xs:element ref="day"/>
 52.        </xs:sequence>
 53.        <xs:attribute name="type" use="required">
 54.          <xs:simpleType>
 55.            <xs:restriction base="xs:string">
 56.              <xs:enumeration value="withdrawal"/>
 57.              <xs:enumeration value="deposit"/>
 58.            </xs:restriction>
 59.          </xs:simpleType>
 60.        </xs:attribute>
 61.      </xs:complexType>
 62.    </xs:element>
 63.
 64.  </xs:schema>

1-2 The first important point about the XML schema is the XML declaration on the first line. Remember, XML schemas are well-formed, valid XML documents. You will also notice that we're using the W3C XML Schema namespace, and all the schema's own element names are prefixed with xs.
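The xs prefix is only a shorthand for the namespace URI; a parser works with the full name. As a brief sketch, assuming Python's standard xml.etree.ElementTree module and a cut-down copy of the schema, you can see both the expanded name and how to search using a prefix of your own choosing:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = f"""<xs:schema xmlns:xs="{XS}">
   <xs:element name="account_number" type="xs:string"/>
   <xs:element name="amount" type="xs:string"/>
</xs:schema>"""

root = ET.fromstring(SCHEMA)

# ElementTree reports the tag with the namespace URI in braces, not the prefix.
print(root.tag)

# The prefix used in the search is mapped to the URI by the dictionary.
declared = [e.get("name") for e in root.findall("xs:element", {"xs": XS})]
print(declared)  # ['account_number', 'amount']
```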

4-11 The first three elements that we declare in the XML schema also happen to be the simplest. The <account_number> element is declared as type string; that's because our account number was of the form "11-22-33-4444."

We declare the <amount> element as a type string also because it contains the transfer amount as a string (for example, $100.00, including the dollar sign). The <name> element is also declared as a type string in this block.

13-21 XML schemas have a feature called facets that enables you to limit the data that a simple type can store. We're defining the <day> element as a restriction of the integer type. In addition to the type definition, we're going to use the minInclusive and maxInclusive facets to restrict the valid values for the <day> element. Because this is a day of the month, we're going to limit the valid values to between 1 and 31.

Note

Additional information on XML schema simple types and their allowable facets can be found at http://www.w3.org/TR/xmlschema-0/#SimpleTypeFacets.

23-31 The declaration for the <month> element is just about identical to the declaration for the <day> element. Both are declared as restrictions of the integer type, and both use the minInclusive and maxInclusive facets to restrict the valid values for the element. The only difference is the range of valid values. Because this is a declaration for a month, the valid range of values is from 1 to 12.
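Because an XML schema is itself an XML document, the facets can be read with any XML parser. The sketch below pulls the minInclusive and maxInclusive values for an element out of a schema fragment and applies them to a value. It assumes Python's standard xml.etree.ElementTree module; the helper names are invented, and this is in no way a replacement for a real validating parser.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="day">
      <xs:simpleType>
         <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="31"/>
         </xs:restriction>
      </xs:simpleType>
   </xs:element>
</xs:schema>"""


def integer_bounds(schema_text, element_name):
    """Return the (minInclusive, maxInclusive) facets declared for an element."""
    root = ET.fromstring(schema_text)
    restriction = root.find(
        f"xs:element[@name='{element_name}']/xs:simpleType/xs:restriction", NS
    )
    low = int(restriction.find("xs:minInclusive", NS).get("value"))
    high = int(restriction.find("xs:maxInclusive", NS).get("value"))
    return low, high


def satisfies_facets(text, bounds):
    """Check that text is an integer within the inclusive bounds."""
    low, high = bounds
    try:
        return low <= int(text) <= high
    except ValueError:
        return False


bounds = integer_bounds(SCHEMA, "day")
print(bounds)                          # (1, 31)
print(satisfies_facets("8", bounds))   # True
print(satisfies_facets("42", bounds))  # False
```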

33-41 The declaration for the root <daily_activity> element has a few different constructs that we need to discuss. First, note that the <daily_activity> element is being declared as a complex type by the <complexType> element. Remember, because the root element will have child elements, it must be declared as a complex type. Also, note the <sequence> element; it will contain all the child elements of our root <daily_activity>. Here, we can also specify the minimum and the maximum number of occurrences of the transaction element by using the minOccurs and maxOccurs attributes, respectively. In this block, we're also defining the <daily_activity> element attribute named branch. The required branch attribute has been defined as type string because it will contain a town name.

43-62 The declaration of the <transaction> element is the longest declaration in the XML schema. First, we're declaring that the <transaction> element is a complex type. The <transaction> element is a complex element because it has several child elements and an attribute. Each of the child elements appears in the <sequence>.

In addition to declaring the child elements in this block, we're also declaring the type attribute for the <transaction> element. Note that the attribute is being declared as a simple type, and that the type is a string. We are also limiting the potential values for the attribute by using the <restriction> element and listing the valid values ("withdrawal" or "deposit") as <enumeration> facets.
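Taken together, the <sequence> and the enumerated type attribute amount to two rules for every <transaction> element. The sketch below spells them out by hand, assuming Python's standard xml.etree.ElementTree module. The function name and the two constants are hypothetical; they simply restate what lines 43-62 of the schema declare.

```python
import xml.etree.ElementTree as ET

# Restated by hand from the <xs:sequence> and <xs:enumeration> declarations.
EXPECTED_CHILDREN = ["account_number", "name", "amount", "month", "day"]
ALLOWED_TYPES = {"withdrawal", "deposit"}


def transaction_problems(transaction_xml):
    """Return a list of rule violations for one <transaction> element."""
    element = ET.fromstring(transaction_xml)
    problems = []
    if element.get("type") not in ALLOWED_TYPES:
        problems.append(f"bad type attribute: {element.get('type')!r}")
    children = [child.tag for child in element]
    if children != EXPECTED_CHILDREN:  # a sequence fixes both presence and order
        problems.append(f"unexpected child sequence: {children}")
    return problems


GOOD = """<transaction type="deposit">
   <account_number>22-11-44-1111</account_number>
   <name>Joseph Burns</name>
   <amount>$50.00</amount>
   <month>6</month>
   <day>10</day>
</transaction>"""

BAD = """<transaction type="transfer">
   <name>Joseph Burns</name>
   <account_number>22-11-44-1111</account_number>
</transaction>"""

print(transaction_problems(GOOD))       # []
print(len(transaction_problems(BAD)))   # 2
```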

Comparing Document Type Definitions and XML Schemas

Before we finish up this section, let's discuss some of the advantages and disadvantages of DTDs and XML schemas.

Advantages and Disadvantages of DTDs

Several advantages to using DTDs exist. First, DTDs are easy to learn and work with, and therefore, you have a relatively short learning curve. So, you can be working with DTDs in a short amount of time. As you have seen from the examples, DTDs can be very compact, even for fairly complicated XML documents. If you're new to XML, DTDs might initially be a little easier to work with compared to XML schemas.

DTDs do have quite a few shortcomings. First, DTDs are not in XML, so you need to become familiar with the format used by DTDs in addition to learning XML. Also, DTDs have limited capabilities. For example, using a DTD, you can't specify that elements must contain a particular data type. All you can do is specify that some type of data is present in an element. Remember, one of the main uses for XML is to facilitate the exchange of data. As a result, whenever you use DTDs, you need to add extra error checking to your application to validate the data in the XML document, which creates additional work.

Advantages and Disadvantages of XML Schemas

XML schemas have several advantages. First, unlike DTDs, XML schemas are written in XML. It's easier to work in a single format, especially if you're learning XML for the first time. Another advantage of XML schemas is that they are much stricter than DTDs. For example, you can specify data types for elements. This allows for more control over input data and enables you to reduce the input type checking required in your application.

The major disadvantage of XML schemas is their complexity and size. Initially, they can be overwhelming. In our example, the XML schema is approximately four times the size of the DTD that describes the same XML document. However, a number of commercial and freely available tools and editors are available to help automate the process of XML schema development. One of the more popular commercial tools used throughout the development of the book is XML Spy (http://www.xmlspy.com).

As you become more familiar with XML, you will begin to see that the extra work required by XML schemas is well worth the effort.