The World Wide Web Consortium is currently working on a replacement for DTDs. Why would the Consortium want to do this? DTDs seem to do a decent job at what they were designed to do. Is it just old technology? No. The trouble is that DTDs are far from a perfect solution, and we're going to discuss their problems here.
One problem that occurs often is that any type of data, even garbled text, can be put in a field that is supposed to contain a date. This is allowed because a DTD performs no type checking.

The Schema Specification

With this myriad of problems present, the W3C put together a goals list for the DTD replacement. At a minimum, the replacement would be all the following:
The result, so far, is a specification consisting of three documents:

    XML Schema Part 0: Primer
    XML Schema Part 1: Structures
    XML Schema Part 2: Datatypes
This is personal opinion, but the one thing the W3C missed the boat on was the very thing that drove the development of an alternative to DTDs: simplicity. If you think DTDs are difficult to understand and write, just wait until we have enough working knowledge to talk about schemas. I believe this will hurt XML schema adoption.

In November 2000, Elliotte Rusty Harold, who is in the forefront of XML discussions and who wrote The XML Bible, said, after briefly noting the dangers of predictions, that he felt the XML schema language would be only a partial success. Knowing that XML developers need schemas desperately, he also said he felt that these were too complex, and that once it was determined what was useful and what was not, schemas would be replaced. This is at least 10 years in the future, though.

Adding a little more fog to the equation is Microsoft. Rather than wait for XML schemas to become a standard, Microsoft developed its own schema mechanism based on the original XML-Data note (http://www.w3.org/TR/1998/NOTE-XML-data-0105/) and the Document Content Description note (http://www.w3.org/TR/NOTE-dcd). These references are very outdated. If you want more information, Microsoft's schema reference is at http://msdn.microsoft.com/xml/reference/schema/start.asp. Because of this, I will be covering the W3C recommendation here and will discuss the Microsoft differences later in the book when necessary.

The Basics of Schemas

For a change, let's look at schemas in a kind of tutorial mode. We'll take an XML document and its schema document and analyze how a schema document is pieced together. First let's look at the XML document in Listing 1.7. This is also known as an instance document of a schema because it has a schema associated with it, as given in Listing 1.8.
Listing 1.7 The Starting XML Document

    <?xml version="1.0"?>
    <resumes applicationDate="2000-12-20">
      <applicant>
        <name>Troy Miller</name>
        <street>MBA Way</street>
        <city>Roy</city>
        <state>UT</state>
        <zip>84067</zip>
      </applicant>
      <applicant>
        <name>Mark Hilliard</name>
        <street>1821 W. 2400 S.</street>
        <city>Roy</city>
        <state>UT</state>
        <zip>84067</zip>
      </applicant>
      <jobsAvailable>
        <job num="1176A0">
          <title>Programmer</title>
          <position>Emp 4</position>
          <salary>45000</salary>
          <comment>What programming language?</comment>
        </job>
        <job num="A5-113-2">
          <title>Claim Adjuster</title>
          <position>Emp 6</position>
          <salary>32000</salary>
          <hiredate>2000-12-21</hiredate>
        </job>
      </jobsAvailable>
      <comment>Can we hire one of these people, please?</comment>
    </resumes>

Notice, first of all, that this is a well-formed XML document. It has a root element <resumes> that contains other elements (<applicant>, <jobsAvailable>, <comment>), which in turn contain subelements until you come to either numbers or text values. Even though it is not shown in the resumes document, there are ways to declare the location of the associated schema file for this instance document via a namespace mechanism. We'll cover that a little later in this chapter. Now let's look at the schema document shown in Listing 1.8.

Listing 1.8 Schema Document Associated with the XML Document in Listing 1.7

    <xsd:schema xmlns:xsd="http://www.w3.org/2000/08/XMLSchema">
      <xsd:annotation>
        <xsd:documentation>
          Resumes schema for resumes.xml.
        </xsd:documentation>
      </xsd:annotation>

      <xsd:element name="resumes" type="resumesType"/>
      <xsd:element name="comment" type="xsd:string"/>

      <xsd:complexType name="resumesType">
        <xsd:sequence>
          <!-- maxOccurs lets the instance list more than one applicant -->
          <xsd:element name="applicant" type="address" maxOccurs="unbounded"/>
          <xsd:element name="jobsAvailable" type="jobListType"/>
          <xsd:element ref="comment" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="applicationDate" type="xsd:date"/>
      </xsd:complexType>

      <xsd:complexType name="address">
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:decimal"/>
      </xsd:complexType>

      <xsd:complexType name="jobListType">
        <xsd:sequence>
          <xsd:element name="job" type="jobDesc" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>

      <xsd:complexType name="jobDesc">
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="position" type="xsd:string"/>
        <xsd:element name="salary">
          <xsd:simpleType>
            <xsd:restriction base="xsd:positiveInteger">
              <xsd:maxExclusive value="55000"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
        <!-- optional children used by the jobs in Listing 1.7 -->
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="hiredate" type="xsd:date" minOccurs="0"/>
        <xsd:attribute name="num" type="xsd:string"/>
      </xsd:complexType>
    </xsd:schema>

Notice again that, just like the XML document with which it is associated, this schema document is a well-formed XML document. We'll be going through this document for much of the rest of this chapter.

Looking at this document now, you can see several new element names throughout the listing, with <complexType> and <simpleType> the more important among them. In brief, elements that contain subelements or that carry attributes are said to have complex types, whereas elements that contain numbers (and strings, and dates, and so on) but do not contain any subelements are said to have simple types. Some elements have attributes; attributes always have simple types. This distinction between the two different types of elements is critical to schemas.
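As a minimal sketch of this distinction, the following stand-alone schema declares one element of each kind. The names note, body, priority, and created are hypothetical and do not belong to the resumes schema. The sketch also assumes the namespace URI and syntax of the final W3C Recommendation (http://www.w3.org/2001/XMLSchema) rather than the draft namespace used in Listing 1.8.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: names are illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Simple type: the element holds a number and has
       no subelements or attributes. -->
  <xsd:element name="priority" type="xsd:integer"/>

  <!-- Complex type: the element contains a subelement
       and carries an attribute. -->
  <xsd:element name="note">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="body" type="xsd:string"/>
      </xsd:sequence>
      <!-- Attributes always have simple types. -->
      <xsd:attribute name="created" type="xsd:date"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Under these assumptions, an instance such as <note created="2000-12-20"><body>Call back</body></note> would be governed by the complex type, while <priority>3</priority> would be governed by the simple type.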
The XML schema specification comes with many different simple types already defined, which are listed in Table 1.7, while we declare complex types ourselves. It is also possible to declare new simple types. We'll see that also.

The important thing to remember about how to go about writing a schema is that, conceptually, you do it just like you write a DTD. Start with the root element. If it has subelements, define it as a complexType. Now define each subelement. If the subelements do not have subelements themselves, define them as either an element (in which case you're done with it) or a simpleType if you want to embellish its description. Repeat these last steps as many times as necessary until all elements are defined.

If you are familiar with the C or C++ programming languages, you might think of a complex type as a named struct with the simple types as the struct components.

Until we get around to discussing schema namespaces, please bear with the prefix xsd: and accept it as is. It won't affect the discussion for now.

Simple Types

As I said earlier, the XML schema specification has a set of predefined simple types. These are listed in Table 1.7.

Table 1.7. Predefined Simple Data Types
These built-in simple data types are used in two cases. One is when defining the type of an <element>, as in the following example:

    <xsd:element name="title" type="xsd:string"/>

The other case is when you are deriving a new simpleType, which we'll talk about shortly.

An important concept concerning simpleTypes is what I call declaration scope. If an element or attribute declaration appears as a child of the <xsd:schema> element and is external to any complexType definition, then it is considered a global declaration. This allows it to be referenced with the ref attribute, making it unnecessary to redeclare the element.

    <xsd:element name="comment" type="xsd:string"/>
    .
    .
      <xsd:element name="jobsAvailable" type="jobListType"/>
      <xsd:element ref="comment" minOccurs="0"/>
    </xsd:sequence>

Here, the fact that the comment element meets the two necessary criteria allows us to declare a reference to the comment element. This keeps us from having to repeatedly redefine the same type of elements.

Again, by definition, simple types contain no subelements or attributes. Both built-in simple types and their derivations can be used in all element and attribute declarations.

New simple types are defined by deriving them from existing simple types, both those already built in and simple types that have been previously derived. In particular, we can derive new simple types by a process known as restriction. We do this by making the legal range of values for the new type a subset of the existing type's range of values. We need two elements to accomplish this: first, the <simpleType> element to define and name the new simple type, and second, the <restriction> element to indicate the base type of the element and to identify the facets that constrain the range of values.

Don't let this new term throw you. There's nothing magical about it. Think of facets as a synonym for properties, and you should have no problem.
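The two elements described above can be sketched as follows. This is only an illustration: the type name stateCode is hypothetical, and the schema is written against the final W3C Recommendation namespace (http://www.w3.org/2001/XMLSchema), which is an assumption on my part and differs from the draft namespace used in the listings.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: stateCode is not part of the resumes schema. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- <simpleType> defines and names the new simple type. -->
  <xsd:simpleType name="stateCode">
    <!-- <restriction> names the base type being narrowed. -->
    <xsd:restriction base="xsd:string">
      <!-- The length facet limits values to exactly two characters. -->
      <xsd:length value="2"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The derived type is used like any built-in simple type. -->
  <xsd:element name="state" type="stateCode"/>

</xsd:schema>
```

With this definition, a value such as UT is accepted for <state>, whereas Utah is rejected because it is not two characters long.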
A complete list of facets is provided in Tables 1.8, 1.9, and 1.10. These tables list the built-in simple types and which facets apply to them.

Table 1.8. Simple Data Types and Associated Facets
Table 1.9. Ordered Simple Data Types and Associated Facets
Table 1.10. Time and Date Ordered Simple Data Types and Associated Facets
Just about all of the facets defined in Tables 1.8, 1.9, and 1.10 are straightforward. One of the most interesting, however, is enumeration, which is a facet in Table 1.8. Enumeration limits a simple type to a set of distinct values. For example, we could define a simple type militaryMonth derived from string whose value must be one of the standard military abbreviations for months of the year (see Listing 1.10).

Suppose we want to create a new type of integer called salary whose range of values is between 25000 and 75000 (inclusive). We know this is of type integer, so we base it on the integer type. Our base type integer also includes values less than 25000 and greater than 75000, so we restrict the range of the salary element by employing two facets, minInclusive and maxInclusive (see Listing 1.9).

Listing 1.9 Schema Fragment Detailing minInclusive and maxInclusive

    <xsd:element name="salary">
      <xsd:simpleType>
        <xsd:restriction base="xsd:integer">
          <xsd:minInclusive value="25000"/>
          <xsd:maxInclusive value="75000"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>

Listing 1.10 Enumeration Facet Example

    <xsd:simpleType name="militaryMonth">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="JAN"/>
        <xsd:enumeration value="FEB"/>
        <xsd:enumeration value="MAR"/>
        <xsd:enumeration value="APR"/>
        <xsd:enumeration value="MAY"/>
        <xsd:enumeration value="JUN"/>
        <xsd:enumeration value="JUL"/>
        <xsd:enumeration value="AUG"/>
        <xsd:enumeration value="SEP"/>
        <xsd:enumeration value="OCT"/>
        <xsd:enumeration value="NOV"/>
        <xsd:enumeration value="DEC"/>
      </xsd:restriction>
    </xsd:simpleType>

Enumeration values specified for a particular type must be unique.

Complex Types

We define new complex types with the <complexType> element. These definitions usually contain other element declarations, element references, and attribute declarations.
The declarations are not themselves types; rather, they declare a relationship between a name and constraints, which dictates how that name appears in documents governed by the associated schema. Listing 1.11 shows our complexType address definition:

Listing 1.11 complexType address

    <xsd:complexType name="address">
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state" type="xsd:string"/>
      <xsd:element name="zip" type="xsd:decimal"/>
    </xsd:complexType>

This definition declares that any element appearing in an instance document whose type is declared to be address must consist of five elements. These elements must be called name, street, city, state, and zip, as specified by the values of the declarations' name attributes, and the elements must appear in the same order in which they are declared. The first four of these elements will each contain a string, and the fifth will contain a decimal number.

Default and Fixed Content

The <xsd:element> element has two attributes that we have not discussed, fixed and default. Only one of these attributes can be used on a given element declaration.

fixed specifies the value of an element and also declares that the value cannot be changed. In the following example, the width element is declared to always have a value of 10.

    <xsd:element name="width" type="xsd:integer" fixed="10"/>

default assigns a value to an element, but an instance document is free to supply a different one. A typical use is a value that will later serve as a divisor: it should never be left empty or 0, because division by 0 is a very bad thing. Assigning a default value other than 0 to the element that holds the value solves the problem.
    <xsd:element name="width" type="xsd:integer" default="10"/>

Here the width element is declared to have a value of 10, and this value can be changed as deemed necessary.

Attribute Declarations

Attributes are declared with the <attribute> element. This element has a type attribute that specifies the simple type of the attribute. Remember that attributes can only be of simple type. Attributes can appear once or not at all (the default), so attribute occurrence syntax is different from element syntax. Specifically, a use attribute is placed in an attribute declaration to indicate whether the attribute is required or optional and, if optional, whether the attribute's value is fixed or there is a default (see Table 1.11). This parallels the way elements are declared to have a fixed or default value.

Table 1.11. The Possible Values of use
A second attribute, value, provides any value that is called for. Let's look at an example:

    <xsd:attribute name="age" type="xsd:int" use="default" value="32"/>

This declaration means that the appearance of an age attribute is optional. If it does appear, it can carry any value of type int. If it does not appear, a schema processor will supply an age attribute with the value 32.

Gathering the attributes that affect the number of times an element or attribute can appear (minOccurs, maxOccurs, fixed, default, use, and value), we can summarize their effect as shown in Table 1.12.

Table 1.12. Attribute/Element Occurrence Constraints
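As a rough illustration of the element side of these constraints, the sketch below combines minOccurs, maxOccurs, fixed, and default in one content model. The element names (order, customer, note, item, currency, quantity) are hypothetical, and the schema assumes the final W3C Recommendation namespace (http://www.w3.org/2001/XMLSchema). It deliberately leaves out attribute declarations, because the use and value syntax described in this chapter follows the draft and differs from the final Recommendation.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: element names are illustrative only. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Exactly once: minOccurs and maxOccurs both default to 1. -->
        <xsd:element name="customer" type="xsd:string"/>
        <!-- Zero or one time. -->
        <xsd:element name="note" type="xsd:string" minOccurs="0"/>
        <!-- One or more times. -->
        <xsd:element name="item" type="xsd:string" maxOccurs="unbounded"/>
        <!-- Optional, but if present its value must be USD. -->
        <xsd:element name="currency" type="xsd:string"
                     fixed="USD" minOccurs="0"/>
        <!-- Optional, with a default value of 1 that an instance may override. -->
        <xsd:element name="quantity" type="xsd:integer"
                     default="1" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```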
Element Content

What, exactly, can an element contain? After all, we've talked about everything else except that. The XML schema specification makes provisions for two types of content: empty content and mixed content.

Empty Content

Just as in a DTD, there can be empty elements in the schema. They contain no data, but they can have attributes. Declaring a complexType and assigning the value empty to its content attribute specifies an empty element. Let's say we have the following XML element:

    <rectangle width='12' height='8'/>

This element has no content, only attributes, so we call this an empty element. Listing 1.12 shows the associated schema declaration for this element.

Listing 1.12 Schema Declaration for the rectangle Element

    <xsd:element name="rectangle">
      <xsd:complexType content="empty">
        <xsd:attribute name="width" type="xsd:int" use="default" value="1"/>
        <xsd:attribute name="height" type="xsd:int" use="default" value="1"/>
      </xsd:complexType>
    </xsd:element>

Notice that there is no content, only attribute declarations.

Mixed Content

Our resume schema can be characterized as elements containing subelements, and the deepest subelements contain character data. There are also provisions for the construction of schemas where character data can appear alongside subelements, and this character data is not necessarily confined to the deepest subelements. To accomplish this construction, we use the mixed value of the content attribute. Listing 1.13 shows an XML snippet from a letter in reply to a resume sent for review.

Listing 1.13 XML Snippet of a Letter

    <?xml version="1.0"?>
    <letter>
      <salutation>Dear Mr. <name>Troy Miller</name>.</salutation>
      Your resume dated <resumeDate>12 December 2000</resumeDate>
      was received on <receivedDate>1999-05-21</receivedDate>.
    </letter>

Notice the text appearing between elements and their child elements.
Specifically, text appears between the elements <salutation>, <resumeDate>, and <receivedDate>, which are all children of <letter>, and text appears around the element <name>, which is the child of a child of <letter>. Listing 1.14 shows a schema snippet declaring <letter>.

Listing 1.14 Schema for the Resume Letter

    <xsd:element name="letter">
      <xsd:complexType content="mixed">
        <xsd:element name="salutation">
          <xsd:complexType content="mixed">
            <xsd:element name="name" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="resumeDate" type="xsd:string"/>
        <xsd:element name="receivedDate" type="xsd:date" minOccurs="0"/>
        <!-- etc -->
      </xsd:complexType>
    </xsd:element>

If you take a close look at this schema definition and the original XML document, you might say, "Well, this schema covers the occurrence of text inside the <salutation> element (Dear Mr.), but what about the other text, 'Your resume dated' and 'was received on'?" These occurrences were taken care of by declaring the main element, <letter>, as being of mixed content. Think of it as, "Any text outside of child elements of their parent element is covered by declaring the parent element as being of mixed content." How's that for a quote?

Schema Annotations

There must be a way to provide comments in schemas, right? DTDs use XML comments, so what's the parallel with schemas? If you were thinking that comment methods in schemas might be more complicated than in DTDs, you would be right. There is an upside to this complexity, however. Schemas provide three elements for the addition of annotations:

    <annotation>
    <documentation>
    <appInfo>
An interesting twist to annotations in schemas is that the W3C took into account the fact that annotations are not just for the human reader in human-readable form but also for the machine that likes to read them in machine-readable form. You'll understand in a minute.

Both of the elements <documentation> and <appInfo> appear as subelements of the <annotation> element. The <annotation> element can appear at the beginning of schema constructs. Look at Listing 1.15, which appeared at the beginning of the resumes schema document.

Listing 1.15 The <xsd:documentation> Element

    <xsd:schema xmlns:xsd="http://www.w3.org/2000/08/XMLSchema">
      <xsd:annotation>
        <xsd:documentation>
          Resumes schema for resumes.xml.
        </xsd:documentation>
      </xsd:annotation>

Here we've provided what is basically nothing more than a comment. This is the human-readable text we talked about. Annotations can also appear at the beginning of other schema constructs such as simpleType and attribute.

The appInfo element, which wasn't in our resumes example, can be used to provide information for program tools, stylesheets, and any other applications written to take advantage of it. This is the machine-readable form that complements the human-readable form. Listing 1.16 enumerates what facets, properties, and restrictions the float and double types have in a way that could be machine readable. It is from the W3C XML Schema Part 2: Datatypes specification.
Listing 1.16 Facets and Properties of the Float and Double Data Types

    <simpleType name="float" id="float">
      <annotation>
        <appinfo>
          <hfp:hasFacet name="pattern"/>
          <hfp:hasFacet name="enumeration"/>
          <hfp:hasFacet name="whiteSpace"/>
          <hfp:hasFacet name="maxInclusive"/>
          <hfp:hasFacet name="maxExclusive"/>
          <hfp:hasFacet name="minInclusive"/>
          <hfp:hasFacet name="minExclusive"/>
          <hfp:hasProperty name="ordered" value="true"/>
          <hfp:hasProperty name="bounded" value="true"/>
          <hfp:hasProperty name="cardinality" value="finite"/>
          <hfp:hasProperty name="numeric" value="true"/>
        </appinfo>
        <documentation xml:lang="en"
                       source="http://www.w3.org/TR/xmlschema-2/#float"/>
      </annotation>
      <restriction base="anySimpleType">
        <whiteSpace value="collapse"/>
      </restriction>
    </simpleType>

    <simpleType name="double" id="double">
      <annotation>
        <appinfo>
          <hfp:hasFacet name="pattern"/>
          <hfp:hasFacet name="enumeration"/>
          <hfp:hasFacet name="whiteSpace"/>
          <hfp:hasFacet name="maxInclusive"/>
          <hfp:hasFacet name="maxExclusive"/>
          <hfp:hasFacet name="minInclusive"/>
          <hfp:hasFacet name="minExclusive"/>
          <hfp:hasProperty name="ordered" value="true"/>
          <hfp:hasProperty name="bounded" value="true"/>
          <hfp:hasProperty name="cardinality" value="finite"/>
          <hfp:hasProperty name="numeric" value="true"/>
        </appinfo>
        <documentation xml:lang="en"
                       source="http://www.w3.org/TR/xmlschema-2/#double"/>
      </annotation>
      <restriction base="anySimpleType">
        <whiteSpace value="collapse"/>
      </restriction>
    </simpleType>
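Listing 1.16 uses annotations for the specification's own bookkeeping, but a schema author can attach both kinds of annotation to an ordinary declaration in the same way. The following is a minimal sketch, not taken from the resumes schema: the tool prefix, its namespace URI, and the displayHint element are all hypothetical stand-ins for whatever an application might define. It assumes the final W3C Recommendation namespace and the lowercase appinfo spelling that Listing 1.16 uses.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: the tool namespace and displayHint
     element are invented for illustration. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tool="http://example.com/hypothetical-tool">

  <xsd:element name="salary" type="xsd:positiveInteger">
    <xsd:annotation>
      <!-- Human-readable form: a note for people reading the schema. -->
      <xsd:documentation xml:lang="en">
        Annual salary in whole dollars.
      </xsd:documentation>
      <!-- Machine-readable form: structured data for an application. -->
      <xsd:appinfo>
        <tool:displayHint format="currency"/>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>

</xsd:schema>
```

A schema processor does not act on the content of either annotation; it is up to a tool that understands the hypothetical displayHint element to do something with it.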