Earlier in this chapter, we worked with Office 2003 documents and converted them to an XML format. Using a schema allowed us to specify the element names and structures for the documents that we created. Schemas also helped to determine if data in the XML document was valid.
In this section, we'll create both Document Type Definitions (DTDs) and schemas. Collectively, we call DTDs and schemas document models. Document models provide a template and rules for constructing XML documents. When a document matches these rules, it is a valid document. A valid document must first be well formed; then it has to conform to the DTD or schema.
The rules contained in DTDs and schemas usually involve the following:
Specifying the name of elements and attributes
Identifying the type of content that can be stored
Specifying hierarchical relationships between elements
Stating the order for the elements
Indicating default values for attributes
Before you create either a DTD or schema, you should be familiar with the information that you're using and the relationships between different sections of the data. This will allow you to create a useful XML representation. I find it best to work with sample data in an XML document and create the DTD or schema once I'm sure that the structure of the document meets my needs.
It's good practice to create a DTD or schema when you create multiple XML documents with the same structure. Document models are also useful where more than one author has to create the same or similar XML documents. Finally, if you need to use XML documents with other software, there may be a requirement to produce a DTD or schema so that the data translates correctly.
If you're writing a one-off XML document with element structures that you'll never use again, it's probably overkill to create a document model. It will certainly be quicker for you to create the elements as you need them and make changes as required without worrying about documentation.
The DTD specification is older than XML schemas. In fact, DTDs predate XML documents and have their roots in Standard Generalized Markup Language (SGML). Because the specification is much older than XML, it doesn't use an XML structure.
On the other hand, schemas use XML to provide descriptions of the document rules. This means that it's possible to use an XML editor to check whether a schema is a well-formed document. You don't have this kind of checking ability with DTDs.
Schemas provide many more options than DTDs for specifying the type of data in elements and attributes. You can choose from 44 built-in datatypes so, for example, you can specify whether an element contains a string, dateTime, or boolean value. You can also add restrictions to specify a range of values, for example, numbers greater than 500. If the built-in types don't meet your needs, you can create your own datatypes and inherit details from existing datatypes.
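As a sketch of what such a restriction can look like, the following schema derives a custom datatype from the built-in decimal type and only accepts values greater than 500. The names largeAmount and orderTotal are hypothetical and used for illustration only.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical custom datatype derived from the built-in decimal type -->
  <xsd:simpleType name="largeAmount">
    <xsd:restriction base="xsd:decimal">
      <!-- Values must be strictly greater than 500 -->
      <xsd:minExclusive value="500"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical element that uses the custom datatype -->
  <xsd:element name="orderTotal" type="largeAmount"/>
</xsd:schema>
```

Against this schema, <orderTotal>750.00</orderTotal> is valid, while <orderTotal>500</orderTotal> is not, because minExclusive excludes the boundary value itself.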
The datatype support within XML schemas gives you the ability to be very specific in your specifications. You can include much more detail about elements and attributes than is possible in a DTD. Schemas can apply more rigorous error checking than DTDs.
Schemas also support namespaces. Namespaces allow you to identify elements from different sources by providing a unique identifier. This means that you can include multiple schemas in an XML document and reuse a single schema in multiple XML documents. Organizations are likely to work with the same kinds of data, so being able to reuse schema definitions is an important advantage when working with schemas.
One common criticism of XML documents is that they are verbose. Because schemas are themselves XML documents, the same criticism could be leveled at them. When compared with DTDs, XML schemas tend to be much longer. It often takes several lines to achieve something that you could declare in a single line within a DTD.
Table 3-1 shows the main differences between DTDs and schemas.
DTD | XML schema (XSD)
A DTD isn't written in XML, so an XML parser can't check it as an XML document. | An XSD document is XML, so it can be parsed like any other XML document.
No support for data typing. | Datatypes can be specified and custom datatypes created.
DTDs can't inherit from one another. | Schemas support inheritance.
No support for namespaces. | Support for namespaces.
One DTD for each XML document. | Multiple schema documents can be used.
A DTD defines an XML document by providing a list of elements that are legal within that document. It also specifies where the elements must appear in the document as well as the number of times the element should appear.
You create or reference a DTD with a DOCTYPE declaration; you've probably seen these at the top of XHTML and HTML documents. A DTD can either be stored within an XML document or in an external DTD file. For example, an XHTML Basic 1.0 document starts with this declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
The simplest DOCTYPE declaration includes only a reference to the root element of the document:
<!DOCTYPE documentRoot>
This declaration can also include other declarations, a reference to an external file, or both. DTD declarations are listed under the XML declaration:
<?xml version="1.0"?>
<!DOCTYPE documentRoot [element declarations]>
All internal declarations are contained in a DOCTYPE declaration at the top of the XML document. This includes information about the elements and attributes in the document. The element declarations can be on different lines:
<!DOCTYPE documentRoot [
  <!ELEMENT declaration 1>
  <!ELEMENT declaration 2>
]>
External file references point to declarations saved in files with the extension .dtd. They are useful if you are working with multiple documents that have the same rules. External DTD references are included in an XML document with a SYSTEM identifier:
<!DOCTYPE documentRoot SYSTEM "file.dtd">
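As mentioned earlier, a DOCTYPE declaration can reference an external file and include its own declarations at the same time. This sketch shows both: it loads the phone book declarations from phoneBook.dtd (the resource file described later in this section) and adds an entity in the internal subset. The office entity and the contact data are hypothetical, and validation assumes phoneBook.dtd sits in the same folder as the document.

```xml
<?xml version="1.0"?>
<!-- External declarations come from phoneBook.dtd; the internal subset adds an entity -->
<!DOCTYPE phoneBook SYSTEM "phoneBook.dtd" [
  <!ENTITY office "Head office">
]>
<phoneBook>
  <contact id="1">
    <name>Jane Smith</name>
    <!-- The parser replaces &office; with the text "Head office" -->
    <address>&office;</address>
    <phone>555 1234</phone>
  </contact>
</phoneBook>
```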
DTDs contain declarations for elements, attributes, and entities.
You declare an element in the following way:
<!ELEMENT elementName (elementContents)>
Make sure that you use the same case for the element name in both the declaration and XML document.
Elements that are empty (that is, elements that don't have any content) use the keyword EMPTY, without parentheses:
<!ELEMENT elementName EMPTY>
Child elements appear in a list after the parent element name. The order within the DTD indicates the order for the elements in the XML document:
<!ELEMENT elementName (child1, child2, child3)>
Elements can also include modifiers to indicate how often they should appear in the XML document. Children that appear one or more times use a plus sign (+) as a modifier:
<!ELEMENT elementName (childName+)>
The pipe character (|) indicates a choice of elements. It's like including the word or:
<!ELEMENT elementName (child1 | child2)>
You can combine a choice with other elements by using brackets to group elements together:
<!ELEMENT elementName ((child1 | child2), child3)>
<!ELEMENT elementName (child1, (child2 | (child3, child4)))>
Optional child elements are shown with an asterisk. This means they can appear any number of times or not at all.
<!ELEMENT elementName (childName*)>
A question mark (?) indicates child elements that are optional but that can appear a maximum of once:
<!ELEMENT elementName (childName?)>
Elements that contain character data use #PCDATA (parsed character data) as content:
<!ELEMENT elementName (#PCDATA)>
You can also use the keyword ANY, without parentheses, to indicate that the element can contain any mixture of character data and declared elements:
<!ELEMENT elementName ANY>
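The element declarations can be quite complicated. For example:
<!ELEMENT elementName ((child1 | child2 | child3), child4+, child5*)>
This declaration means that the element called elementName starts with a choice between the child1, child2, or child3 elements, followed by child4, which can appear once or more. The element child5 is optional and can appear any number of times.
Note that a DTD can't combine character data with a sequence like this. If an element needs to contain both text and child elements, the only permitted form is (#PCDATA | child1 | child2)*, with #PCDATA listed first.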
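To see how the modifiers combine in practice, here is a small recipe document with an internal DTD. The element names are hypothetical. The document is valid because it supplies exactly one title, one of the two choices, two ingredient elements, no step elements, and no note.

```xml
<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!-- One title, then serves or makes, then one or more ingredients,
       then zero or more steps, then an optional note -->
  <!ELEMENT recipe (title, (serves | makes), ingredient+, step*, note?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT serves (#PCDATA)>
  <!ELEMENT makes (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT step (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
]>
<recipe>
  <title>Omelette</title>
  <serves>2</serves>
  <ingredient>Eggs</ingredient>
  <ingredient>Butter</ingredient>
</recipe>
```

Including both <serves> and <makes>, or leaving out every <ingredient>, would make the document invalid.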
Table 3-2 provides an overview of the symbols used in element declarations.
, (comma): Specifies the order of child elements.
+ (plus sign): Signifies that an element has to appear at least once, i.e., one or more times.
| (pipe): Allows a choice between elements.
( ) (parentheses): Marks content as a group.
* (asterisk): Specifies that the element is optional and can appear any number of times, i.e., 0 or more times.
? (question mark): Specifies that the element is optional, but if it is present, it can only appear once, i.e., 0 or 1 times.
No symbol: Indicates that the element must appear exactly once.
Attribute declarations usually come after the element declarations. Their syntax is a little more complicated:
<!ATTLIST elementName attributeName attributeType defaultValue>
The elementName is the element that includes this attribute. Table 3-3 shows the main values for attributeType.
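ID: A unique identifier
IDREF: The ID value of another element
IDREFS: A list of ID values from other elements
NMTOKEN: A name token made of letters, digits, and characters such as periods, hyphens, underscores, and colons, with no spaces; unlike an XML name, it can start with a digit
NMTOKENS: A list of name tokens
ENTITY: An entity name
ENTITIES: A list of entity names
(value1 | value2): A list of specific values, e.g., (red | blue | green)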
Most commonly, attributes are of the type CDATA.
The defaultValue indicates a default value for the attribute. In the following example, the XML element <address> will have an addressType attribute with a default value of home. In other words, if the attribute isn't included in the XML document, a value of home will be assumed.
<!ATTLIST address addressType CDATA "home">
Using #REQUIRED will force a value to be set for the attribute in the XML document:
<!ATTLIST address addressType CDATA #REQUIRED>
You can use #IMPLIED if the attribute is optional:
<!ATTLIST address addressType CDATA #IMPLIED>
If you always want to use the same value for an attribute and don't want it to be overridden, use #FIXED:
<!ATTLIST address addressType CDATA #FIXED "home">
You can also specify a list of acceptable values separated by the pipe character:
<!ATTLIST address addressType (home | work | mailing) "home">
You can declare all attributes of a single element at the same time within the same ATTLIST declaration:
<!ATTLIST address
  addressType (home | postal | work) #REQUIRED
  addressID CDATA #IMPLIED
  addressDefault (true | false) "true">
The declaration lists a required addressType attribute, which has to have a value of home, postal, or work. The addressID attribute is a CDATA type and is optional. The final attribute, addressDefault, can have a value of either true or false, with the default value being true.
You can also declare attributes separately:
<!ATTLIST address addressType (home | postal | work) #REQUIRED>
<!ATTLIST address addressID CDATA #IMPLIED>
<!ATTLIST address addressDefault (true | false) "true">
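The following sketch puts these attribute declarations into a complete document. The addresses root element and the address text are hypothetical. Because the second <address> element leaves out addressDefault, a parser that reads the DTD supplies the default value true.

```xml
<?xml version="1.0"?>
<!DOCTYPE addresses [
  <!ELEMENT addresses (address+)>
  <!ELEMENT address (#PCDATA)>
  <!ATTLIST address
    addressType (home | postal | work) #REQUIRED
    addressID CDATA #IMPLIED
    addressDefault (true | false) "true">
]>
<addresses>
  <!-- All three attributes supplied -->
  <address addressType="home" addressID="A1" addressDefault="false">12 Some Street</address>
  <!-- addressID omitted (optional); addressDefault falls back to "true" -->
  <address addressType="work">34 Other Road</address>
</addresses>
```

Leaving out addressType, or giving it a value that isn't in the list, such as mailing, would make the document invalid.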
Entities are a shorthand way to refer to something that you want to use in more than one place or in more than one XML document. You also use them for special characters that are hard to type on a keyboard. If you've worked with HTML, you've probably used entities for nonbreaking spaces (&nbsp;) and the copyright symbol (&copy;).
You declare an entity as follows:
<!ENTITY entityName "entityValue">
Whenever you want to use the value of the entity in an XML document, you can use &entityName; .
In the following example, I've declared two entities, email and author:
<!ENTITY email "email@example.com">
<!ENTITY author "Sas Jacobs, AIP">
I could refer to these entities in my XML document using &email; or &author;. The parser replaces these references with email@example.com and Sas Jacobs, AIP.
Entities can also reference external content; we call these external entities . They are a little like using a server-side include file in an HTML document.
<!ENTITY address SYSTEM "addressBlock.xml">
The XML document would use the entity &address; to insert the contents from the addressBlock.xml file. You could also use a URL like http://www.friendsofed.com/addressBlock.xml. The advantage here is that you only have to update the entity in a single location and the value will change throughout the XML document.
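Here's a minimal sketch that declares the two internal entities from the earlier example and uses them in the document body. The signature, authorName, and contactEmail element names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE signature [
  <!ELEMENT signature (authorName, contactEmail)>
  <!ELEMENT authorName (#PCDATA)>
  <!ELEMENT contactEmail (#PCDATA)>
  <!ENTITY email "email@example.com">
  <!ENTITY author "Sas Jacobs, AIP">
]>
<signature>
  <!-- Each entity reference is replaced with its declared value when the document is parsed -->
  <authorName>&author;</authorName>
  <contactEmail>&email;</contactEmail>
</signature>
```

If the email address changes, you only need to update the <!ENTITY email> declaration.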
The following listing shows a sample inline DTD. The DTD describes our phone book XML document:
<!DOCTYPE phoneBook [
  <!ELEMENT phoneBook (contact+)>
  <!ELEMENT contact (name, address, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST contact id CDATA #REQUIRED>
]>
I've saved the XML document containing these declarations in the resource file addressDTD.xml. Figure 3-36 shows this file validated within XMLSpy.
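For reference, a complete document combining the inline DTD with some data might look like this sketch. The contact details are hypothetical and aren't taken from the resource file.

```xml
<?xml version="1.0"?>
<!DOCTYPE phoneBook [
  <!ELEMENT phoneBook (contact+)>
  <!ELEMENT contact (name, address, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST contact id CDATA #REQUIRED>
]>
<phoneBook>
  <!-- Each contact needs an id attribute and name, address, phone in that order -->
  <contact id="1">
    <name>Jane Smith</name>
    <address>1 Main Street, Sometown</address>
    <phone>555 1234</phone>
  </contact>
  <contact id="2">
    <name>John Brown</name>
    <address>2 High Street, Othertown</address>
    <phone>555 5678</phone>
  </contact>
</phoneBook>
```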
The file addressEDTD.xml refers to the same declarations in an external DTD. If you open the resource file phoneBook.dtd, you'll see that it doesn't include a DOCTYPE declaration at the top of the file. This listing shows the content:
<!ELEMENT phoneBook (contact+)>
<!ELEMENT contact (name, address, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ATTLIST contact id CDATA #REQUIRED>
This DTD declares the root element phoneBook. The root element contains the contact element, which can appear one or more times. The contact element contains three elements, name, address, and phone, each of which must appear exactly once and in that order. The data in these elements is of type PCDATA, or parsed character data.
The DTD includes a declaration for the attribute id within the contact element. The type is CDATA, and it is a required attribute.
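To see what these rules exclude, consider the following hypothetical document. It's well formed, but checked against phoneBook.dtd it's invalid for the two reasons given in the comments.

```xml
<?xml version="1.0"?>
<!DOCTYPE phoneBook SYSTEM "phoneBook.dtd">
<phoneBook>
  <!-- Invalid: the required id attribute is missing -->
  <contact>
    <!-- Invalid: phone comes before address, but the DTD requires name, address, phone -->
    <name>Jane Smith</name>
    <phone>555 1234</phone>
    <address>1 Main Street, Sometown</address>
  </contact>
</phoneBook>
```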
Designing DTDs can be a tricky process, so you will probably find it easier if you organize your declarations carefully. You can add extra lines and spaces so that the DTD is easy to read.
An XML schema is an XML document that lists the rules for other XML documents. It defines the way elements and attributes are structured, the order of elements, and the datatypes used for elements and attributes.
A schema has the same role as a DTD. It determines the rules for valid XML documents. Unlike DTDs, however, you don't have to learn new syntax to create schemas because they are another example of an XML document. Schemas are popular for this reason. Some people find it strange that DTDs use a non-XML approach to define XML document structure.
At the time of writing, the current recommendation for XML schemas was at www.w3.org/TR/2004/REC-xmlschema-1-20041028/. Youll find the Datatypes section of the recommendation at www.w3.org/TR/2004/REC-xmlschema-2-20041028/. The working drafts for XML Schema version 1.1 are at www.w3.org/TR/2005/WD-xmlschema11-1-20050224/ and www.w3.org/TR/2005/WD-xmlschema11-2-20050224/.
Schemas offer several advantages over DTDs. Because schemas can inherit from each other, you can reuse them with different document groups. It's easier to use XML documents created from databases with schemas because they recognize different datatypes. You write schemas in XML so you can use the same tools that you use for your other XML documents.
You can embed a schema within an XML document or store it within an external XML file saved with an .xsd extension. In most cases, it's better to store the schema information externally so you'll be able to reuse it with other XML documents that follow the same format.
An external schema starts with an optional XML declaration followed by a <schema> element, which is the document root. The <schema> element declares the XML Schema namespace, http://www.w3.org/2001/XMLSchema, which is where all of the schema elements and built-in datatypes come from. In my declaration, the xmlns:xsd attribute binds this namespace to the prefix xsd, so elements from this namespace use that prefix:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
As with a DTD, a schema describes the document model for an XML document. This can consist of declarations about elements and attributes and about datatypes. The order of the declarations in the XSD document doesn't matter.
You declare elements as either simpleType or complexType. They can also have empty, simple, complex, or mixed content. Elements that have attributes are automatically complexType elements. Elements that only include text are simpleType.
I've included a sample schema document called addressSchema.xsd with your resources to illustrate some of the concepts in this section. You'll probably want to have it open as you refer to the examples that follow. You can see the complete schema at the end of this section.
In the sample schema, you'll notice that the prefix xsd is used in front of all elements. This is because I've referred to the namespace with the xsd prefix, that is, xmlns:xsd="http://www.w3.org/2001/XMLSchema". Everything included from this namespace will be prefixed in the same way.
Simple type elements contain text only and have no attributes or child elements. In other words, simple elements contain character data. The text included in a simple element can be of any datatype. You can define a simple element as follows:
<xsd:element name="elementName" type="elementType"/>
In our phone book XML document, the <name>, <address>, and <phone> elements are simple type elements. The definitions in the XSD schema document show this:
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="address" type="xsd:string"/>
<xsd:element name="phone" type="xsd:string"/>
There are 44 built-in simple types in the W3C Schema Recommendation. You can find out more about these types at www.w3.org/TR/xmlschema-2/. Common simple types include string, integer, float, decimal, date, time, ID, and boolean.
Attributes aren't elements, but like simple type elements they can only contain text, so they always have a simple type. You declare them with
<xsd:attribute name="attributeName" type="attributeType"/>
All attributes are optional unless their use attribute is set to required:
<xsd:attribute name="attributeName" type="attributeType" use="required"/>
The id attribute in the <contact> element is an example of a required attribute:
<xsd:attribute name="id" type="xsd:integer" use="required"/>
You can set a default or fixed value for a simple element or an attribute by adding a default or fixed attribute to its declaration. For an attribute, use
<xsd:attribute name="attributeName" type="attributeType" default="defaultValue"/>
<xsd:attribute name="attributeName" type="attributeType" fixed="fixedValue"/>
An XML document can't use a different value for an element or attribute that has a fixed value.
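This sketch shows default and fixed values on elements and on an attribute. The settings, country, currency, and version names are hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="settings">
    <xsd:complexType>
      <xsd:sequence>
        <!-- If <country/> is empty, a schema-aware processor supplies "Australia" -->
        <xsd:element name="country" type="xsd:string" default="Australia"/>
        <!-- A non-empty value other than AUD is invalid -->
        <xsd:element name="currency" type="xsd:string" fixed="AUD"/>
      </xsd:sequence>
      <!-- If version is omitted, "1.0" is supplied; any other value is invalid -->
      <xsd:attribute name="version" type="xsd:string" fixed="1.0"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note that an element default only applies when the element appears with empty content. Unlike an attribute default, it isn't used when the element is missing altogether.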
Complex type elements include attributes and/or child elements. In fact, any time an element has one or more attributes it is automatically a complex type element. The <contact> element is an example of a complex type element.
Complex type elements have different content types, as shown in Table 3-4.
Empty: Element has no content.
Simple: Element contains only text.
Complex: Element contains only child elements.
Mixed: Element contains child elements and text.
It's a little confusing. An element can have a complex type with simple content, or it can be a complex type element with empty content. I'll go through these alternatives next.
A complex type element with empty content such as
<recipe id="1234"/>
is defined in a schema with
<xsd:element name="recipe">
  <xsd:complexType>
    <xsd:attribute name="id" type="xsd:positiveInteger"/>
  </xsd:complexType>
</xsd:element>
The <recipe> element is a complexType but only contains an attribute. In the example, the attribute is declared. We could also use a ref attribute to refer to an attribute that is already declared elsewhere within the schema.
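The ref approach mentioned above might look like the following sketch, in which the id attribute is declared once at the top level of the schema and referenced from <recipe>. Declaring id globally is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global attribute declaration that other declarations can reuse -->
  <xsd:attribute name="id" type="xsd:positiveInteger"/>
  <xsd:element name="recipe">
    <xsd:complexType>
      <!-- Empty content: no sequence, all, or choice, only the referenced attribute -->
      <xsd:attribute ref="id"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```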
A complex type element with simple content like
<recipe id="1234"> Omelette </recipe>
is declared in the following way:
<xsd:element name="recipe">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="id" type="xsd:positiveInteger"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
In other words, the complex element called <recipe> has a complex type but simple content. The content has a base type of string. The element includes an attribute called id that is a positiveInteger.
Complex types with element content group their child elements in a sequence, an unordered all group, or a choice. You must use either <sequence>, <all>, or <choice> to enclose your child elements. Attributes are declared after the closing tag of the <sequence>, <all>, or <choice> element.
A complex type element with complex content such as
<recipe> <food> Eggs </food> </recipe>
is declared as follows:
<xsd:element name="recipe">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="food"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
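Because the declaration above refers to food with a ref attribute, the schema also needs a top-level declaration for food. A complete schema might look like this sketch; the xsd:string type for food is an assumption.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global declaration that the ref attribute below points to -->
  <xsd:element name="food" type="xsd:string"/>
  <xsd:element name="recipe">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="food"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

One side effect of declaring food globally is that a document with <food> as its root element would also be valid against this schema.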
A complex type element with mixed content such as
<recipe> Omelette <food> Eggs </food> </recipe>
is defined with
<xsd:element name="recipe">
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="food" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The mixed attribute is set to true so that the <recipe> element can contain a mixture of both child elements and text or character data.
If an element has children, the declaration needs to specify the names of the child elements, the order in which they appear, and the number of times that they can be included.
The sequence element specifies the order of child elements:
<xsd:element name="elementName">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="childElement1" type="xsd:string"/>
      <xsd:element name="childElement2" type="xsd:string"/>
      <xsd:element name="childElement3" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
You can replace sequence with all when the child elements can be written in any order. Each child element in an all group can appear no more than once:
<xsd:all>
  <xsd:element name="childElement1" type="xsd:string"/>
  <xsd:element name="childElement2" type="xsd:string"/>
  <xsd:element name="childElement3" type="xsd:string"/>
</xsd:all>
The element choice indicates that only one of the child elements should be included from the group:
<xsd:choice>
  <xsd:element name="childElement1" type="xsd:string"/>
  <xsd:element name="childElement2" type="xsd:string"/>
</xsd:choice>
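The compositors can be nested. In this hypothetical sketch, a contact must contain a name followed by either a phone or an email element; the element names are assumptions for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="contact">
    <xsd:complexType>
      <!-- name must come first ... -->
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <!-- ... followed by exactly one of phone or email -->
        <xsd:choice>
          <xsd:element name="phone" type="xsd:string"/>
          <xsd:element name="email" type="xsd:string"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

In XML Schema 1.0, an all group can't be nested inside a sequence or choice, which is why it isn't used in this example.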
The number of times an element appears within another can be set with the minOccurs and maxOccurs attributes:
<xsd:element name="food" type="xsd:string" minOccurs="0" maxOccurs="1"/>
In the previous example, the element is optional but if it is present, it must appear only once. You can use the value unbounded to specify an unlimited number of occurrences:
<xsd:element name="food" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
When neither of these attributes is present, the element must appear exactly once.
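Here's a sketch of how these attributes combine inside a sequence. The ingredient and note names are hypothetical. The schema requires at least one ingredient, allows any number more, and permits at most one note.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="recipe">
    <xsd:complexType>
      <xsd:sequence>
        <!-- One or more: the equivalent of + in a DTD -->
        <xsd:element name="ingredient" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
        <!-- Zero or one: the equivalent of ? in a DTD -->
        <xsd:element name="note" type="xsd:string" minOccurs="0" maxOccurs="1"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With this schema, <recipe><ingredient>Eggs</ingredient><ingredient>Milk</ingredient></recipe> is valid, but a recipe with two <note> elements is not. The DTD modifier * corresponds to minOccurs="0" maxOccurs="unbounded".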
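If you're not sure about the structure of a complex element, you can allow any content with the any element, which has to sit inside a sequence or choice:
<xsd:element name="elementName">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:any minOccurs="0" processContents="lax"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
The author of an XML document that uses this schema will be able to create an optional child element. Setting processContents to lax means that the child element is only validated if the schema declares it; with the default value, strict, the validator needs a declaration for whatever element appears.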
You can also use the element anyAttribute to allow any attributes on an element:
<xsd:element name="elementName">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="childElement" type="xsd:string"/>
    </xsd:sequence>
    <xsd:anyAttribute processContents="lax"/>
  </xsd:complexType>
</xsd:element>
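As a fuller sketch, the following hypothetical schema declares a contact element that must start with a name child, may contain any number of extra elements after it, and accepts any attributes. Using processContents="lax" on both wildcards is an assumption that tells the validator to check extra content only where it finds declarations for it.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="contact">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <!-- Zero or more elements with any name after <name> -->
        <xsd:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xsd:sequence>
      <!-- Any attributes, checked only if a declaration exists -->
      <xsd:anyAttribute processContents="lax"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Against this schema, <contact nickname="Jay"><name>Jane Smith</name><mobile>555 9999</mobile></contact> is valid even though neither nickname nor mobile is declared.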
You can use annotations to describe your schemas. An <annotation> element contains a <documentation> element that encloses the description. You can add annotations to most schema components; inside an element declaration, the annotation must be the first child, directly under the opening tag:
<xsd:element name="recipe">
  <xsd:annotation>
    <xsd:documentation>A description about the element</xsd:documentation>
  </xsd:annotation>
  ... more declarations
</xsd:element>
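You can link a schema to an XML document by referencing it in the document root. The root element declares the XML Schema instance namespace, http://www.w3.org/2001/XMLSchema-instance, and uses an attribute from that namespace to give the location of the schema. If the schema has a target namespace, you use xsi:schemaLocation; otherwise, you use xsi:noNamespaceSchemaLocation: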
<phoneBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="addressSchema.xsd">
The reference uses noNamespaceSchemaLocation because the schema document doesn't have a target namespace.
The topic of schemas is very complicated, and there are other areas that I haven't discussed in this chapter. An example that relates to the phone book XML document should make things a little clearer.
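Putting this together, a complete document that uses the phone book schema might look like the following sketch. The contact details are hypothetical; the id value must be an integer because the schema declares the attribute as xsd:integer.

```xml
<?xml version="1.0"?>
<!-- The xsi prefix is bound to the XML Schema instance namespace -->
<phoneBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="addressSchema.xsd">
  <contact id="1">
    <name>Jane Smith</name>
    <address>1 Main Street, Sometown</address>
    <phone>555 1234</phone>
  </contact>
</phoneBook>
```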
This listing shows the complete schema from the resource file addressSchema.xsd:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="phoneBook">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="contact" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="contact">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="address" type="xsd:string"/>
        <xsd:element name="phone" type="xsd:string"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:integer" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
The schema starts by declaring itself as an XML document and referring to the http://www.w3.org/2001/XMLSchema namespace. The first element defined is <phoneBook>. This is a complexType element that contains one or more <contact> elements. The attribute ref indicates that I've defined <contact> elsewhere in the document.
The <contact> element contains the simple elements <name>, <address>, and <phone> in that order. Each child element of <contact> must appear exactly once and is of type string. The <contact> element also contains a required attribute called id that is an integer type.
The schema is saved as resource file addressSchema.xsd. The XML file that references this schema is addressSchema.xml. You can open the XML file in XMLSpy or another validating XML editor and validate it against the schema.
We haven't covered everything there is to know about XML schemas in this section, but there should be enough to get you started.