Chapter 5. Creating XML Schemas

CONTENTS

XML Schemas in Internet Explorer
W3C XML Schemas
Declaring Types and Elements
Specifying Attribute Constraints and Defaults
Creating Simple Types
Creating Empty Elements
Creating Mixed-Content Elements
Annotating Schemas
Creating Choices
Creating Sequences
Creating Attribute Groups
Creating all Groups
Schemas and Namespaces

For the previous two chapters, we've been working with DTDs. Over time, many people have complained to the W3C about the complexity of DTDs and have asked for something simpler. W3C listened, assigned a committee to work on the problem, and came up with a solution that is much more complex than DTDs ever were: XML schemas.

On the other hand, schemas are also far more powerful and precise than DTDs ever were. With schemas, not only can you specify the syntax of a document as you would with a DTD, but you also can do the following: specify the actual data types of each element's content, inherit syntax from other schemas, annotate schemas, use schemas with multiple namespaces, create simple and complex data types, specify the minimum and maximum number of times that an element can occur, create list types, create attribute groups, restrict the ranges of values that elements can hold, restrict what other schemas can inherit from yours, merge fragments of multiple schemas together, require that attribute or element values be unique, and much more.

Currently, the specification for XML schemas is in the working draft stage, which means that it will probably change before becoming a recommendation. You can find the specification in these three documents:

http://www.w3.org/TR/xmlschema-0/. XML schema primer, a tutorial introduction to schemas
http://www.w3.org/TR/xmlschema-1/. XML schema structures, the formal details on creating schemas
http://www.w3.org/TR/xmlschema-2/. XML schema data types, all about the data types that you can use in schemas

The schema working group expressly set out to tackle a few issues: using namespaces when validating documents, providing for data typing and restrictions, allowing and restricting inheritance between schemas, and creating primitive data types, among others.

Some software is available for modern schema support. W3C had an early schema checker at http://cgi.w3.org/cgi-bin/xmlschema-check, but that page now points to an alpha version of a new schema checker at what looks like a temporary location: http://www.w3.org/2000/06/webdata/xsv. Apache's Xerces XML parser now contains some support for schemas (see http://xml.apache.org/xerces-j/). Oracle also has some support (see http://technet.oracle.com/tech/xml/schema_java/index.htm). XML Spy has support at http://new.xmlspy.com/features_schema.html. You can find additional software implementations (including a tool to convert from DTDs to schemas) at http://www.w3.org/XML/Schema.html.

One of the original proponents of XML schemas was Microsoft. Microsoft's documentation on XML frequently decried DTDs as being too complex and said that schemas would fix the problem. That's not the way it turned out. In fact, the Microsoft implementation of XML schemas in Internet Explorer was outdated not long after it was introduced.

XML Schemas in Internet Explorer

As with many other developers, Microsoft got caught basing its software on a relatively early XML specification, which promptly changed. As implemented in Internet Explorer, Microsoft's schemas are based on the XML-Data Note (http://www.w3.org/TR/1998/NOTE-XML-data-0105/) and the Document Content Description (DCD) note (http://www.w3.org/TR/NOTE-dcd), and are now very outdated.

In this chapter, I'll take a look at schemas as used today in the only implementation that I know of: Internet Explorer. After getting an overview of schemas as used in Internet Explorer, I'll go on to take a look in depth at the most recent XML schema specification and explore how to use it.

You can find the Microsoft XML schema reference at http://msdn.microsoft.com/xml/reference/schema/start.asp. To see how to use schemas in Internet Explorer, I'm going to create an example here. In this case, I'll create a document holding the names of a couple of XML programmers in a document whose root is <PROGRAMMING_TEAM>.

<?xml version="1.0" ?>
<PROGRAMMING_TEAM>
    <PROGRAMMER>Fred Samson</PROGRAMMER>
    <PROGRAMMER>Edward Fredericks</PROGRAMMER>
    <DESCRIPTION>XML Programming Team</DESCRIPTION>
</PROGRAMMING_TEAM>

How do you associate a schema with this document as far as Internet Explorer is concerned? You do so by specifying a default namespace for the document, using the xmlns namespace attribute in the root element, and prefacing the name of the schema file with x-schema:, like this:

<?xml version="1.0" ?>
<PROGRAMMING_TEAM xmlns="x-schema:schema1.xml">
    <PROGRAMMER>Fred Samson</PROGRAMMER>
    <PROGRAMMER>Edward Fredericks</PROGRAMMER>
    <DESCRIPTION>XML Programming Team</DESCRIPTION>
</PROGRAMMING_TEAM>

Here, I'm naming the schema file schema1.xml (Internet Explorer does not insist on any special extension for schema files). All that remains is to create the schema file itself. To do that, I start with the <schema> element, like this:

<schema>
    .
    .
    .
</schema>

You can name the schema using the name attribute, like this:

<schema name="schema1">
    .
    .
    .
</schema>

One of the advantages of using schemas is that they allow you to specify the actual data types that you want to use, but those data types weren't fully fleshed out at the time Microsoft decided to implement schemas, so Microsoft implemented its own. To create a schema for Internet Explorer, you set up a default namespace, urn:schemas-microsoft-com:xml-data, and a namespace prefix of dt for data types: urn:schemas-microsoft-com:datatypes:

<schema name="schema1"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
    .
    .
    .
</schema>

Now Internet Explorer data types are available for use with the dt prefix; you'll find those data types in Table 5.1. There's more information at http://msdn.microsoft.com/xml/reference/schema/datatypes.asp as well.

Table 5.1. Microsoft XML Schema Data Types
Type	Description
`bin.base64`	Base64-encoded binary object.
`bin.hex`	Hexadecimal digits.
`boolean`	`0` or `1` values.
`char`	A one-character string.
`date`	Date (in ISO 8601 format, without the time data, such as `"2001-10-15"`).
`dateTime`	Date and time (in ISO 8601 format, with optional time data, such as `"2001-10-15T09:41:33"`).
`dateTime.tz`	Date, time, and time zone (in ISO 8601 format, with optional time data, and time zone, such as `"2001-10-15T09:41:33-08:00"`).
`fixed.14.4`	Format identical to the `number` format, but with no more than 14 digits to the left of the decimal point, and no more than 4 to the right.
`float`	Floating point number.
`int`	Integer value.
`number`	A simple number, with no limit on digits. This value can have a sign, fractional digits, and an exponent.
`time`	Time in an ISO 8601 format, with no date and no time zone.
`time.tz`	Time in an ISO 8601 format, with no date and with an optional time zone.
`i1`	Integer represented in 1 byte.
`i2`	Integer represented in 2 bytes (1 word).
`i4`	Integer represented in 4 bytes.
`r4`	Real number, with 7-digit precision.
`r8`	Real number, with 15-digit precision.
`ui1`	Unsigned integer, stored in 1 byte.
`ui2`	Unsigned integer, stored in 2 bytes.
`ui4`	Unsigned integer, stored in 4 bytes.
`uri`	Uniform Resource Identifier (URI).
`uuid`	Hexadecimal digits representing octets.

To actually specify the syntax of an element in an Internet Explorer schema, you use the <elementtype> element, as in this case, where I'm specifying that the <PROGRAMMER> and <DESCRIPTION> elements can contain only text and that their content model is closed; this means that they cannot accept any content other than what is listed. (In these versions of schemas, if you leave the content model as open, the element can contain content other than what you specify.)

<schema name="schema1"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <elementtype name="PROGRAMMER" content="textOnly" model="closed"/>
    <elementtype name="DESCRIPTION" content="textOnly" model="closed"/>
    .
    .
    .
</schema>

If I want to specify the type of an element, I can use the dt:type attribute like this, specifying a data type from Table 5.1:

<schema name="schema1"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <elementtype name="PROGRAMMER" content="textOnly" model="closed"/>
    <elementtype name="DESCRIPTION" content="textOnly" model="closed"/>
    <elementtype name="counter" dt:type="int"/>
    .
    .
    .
</schema>

Unfortunately, Internet Explorer doesn't yet support data type checking using the data types that you specify in schemas. However, you can use data types directly in XML documents like this in Internet Explorer:

<document xmlns:dt="urn:schemas-microsoft-com:datatypes"><dt:int>8</dt:int></document>

Next, I'll define the <PROGRAMMING_TEAM> element in this schema. This element can contain both <PROGRAMMER> and <DESCRIPTION> elements, but it can contain only elements (not text). You specify this by assigning the value eltOnly to the <elementtype> element's content attribute in this version of schemas. Notice that I'm also specifying that the <PROGRAMMER> element must occur at least once and that the <DESCRIPTION> element must also occur once, but only once, in the <PROGRAMMING_TEAM> element, using the minOccurs and maxOccurs attributes:

<schema name="schema1"
    xmlns="urn:schemas-microsoft-com:xml-data"
    xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <elementtype name="PROGRAMMER" content="textOnly" model="closed"/>
    <elementtype name="DESCRIPTION" content="textOnly" model="closed"/>
    <elementtype name="PROGRAMMING_TEAM" content="eltOnly" model="closed">
        <element type="PROGRAMMER" minOccurs="1" maxOccurs="*"/>
        <element type="DESCRIPTION" minOccurs="1" maxOccurs="1"/>
    </elementtype>
</schema>

That's what schemas look like in Internet Explorer. It's not worth fleshing out this example in more detail because it's based on an obsolete schema model; because the XML schema specification has changed a great deal, Microsoft will have to change its implementation.

For that reason, in the rest of this chapter, I'll take a look at the way the W3C says schemas should work. Unfortunately, no complete software support exists for true XML schemas yet, but it will come in time. Presumably, the Internet Explorer schema model will follow the official schema recommendation when it comes out, at least as a partial implementation. This means that the material in the rest of the chapter is the way things will look in the future, even in Internet Explorer.

W3C XML Schemas

Most of this chapter will center on an example document, book.xml, and its accompanying schema, book.xsd (.xsd is the extension that W3C uses by convention for schema files). This example is all about recording the books loaned by one person, Doug Glass, and borrowed by another, Britta Regensburg. I record the name and address of the borrower and lender, as well as data about the actual books borrowed, including their titles, publication date, replacement value, and maximum number of days that the book may be loaned for. Here's what book.xml looks like:

<?xml version="1.0"?>
<transaction borrowDate="2001-10-15">
    <Lender phone="607.555.2222">
        <name>Doug Glass</name>
        <street>416 Disk Drive</street>
        <city>Medfield</city>
        <state>MA</state>
    </Lender>
    <Borrower phone="310.555.1111">
        <name>Britta Regensburg</name>
        <street>219 Union Drive</street>
        <city>Medfield</city>
        <state>CA</state>
    </Borrower>
    <note>Lender wants these back in two weeks!</note>
    <books>
        <book bookID="123-4567-890">
            <bookTitle>Earthquakes for Breakfast</bookTitle>
            <pubDate>2001-10-20</pubDate>
            <replacementValue>15.95</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
        <book bookID="123-4567-891">
            <bookTitle>Avalanches for Lunch</bookTitle>
            <pubDate>2001-10-21</pubDate>
            <replacementValue>19.99</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
        <book bookID="123-4567-892">
            <bookTitle>Meteor Showers for Dinner</bookTitle>
            <pubDate>2001-10-22</pubDate>
            <replacementValue>11.95</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
        <book bookID="123-4567-893">
            <bookTitle>Snacking on Volcanoes</bookTitle>
            <pubDate>2001-10-23</pubDate>
            <replacementValue>17.99</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
    </books>
</transaction>

Note in particular that this document has a root element named <transaction> and various subelements such as <Lender>, <Borrower>, <books>, and so on. In fact, the subelements themselves have elements, such as the multiple <book> elements inside the <books> element.

In terms of XML schemas, elements that enclose subelements or have attributes are complex types. Elements that enclose only simple data such as numbers, strings, or dates but that do not have any subelements are simple types. In addition, attributes are always simple types because attribute values cannot contain any structure. If you look at a document as a tree, simple types have no subnodes, while complex types can.

The distinction between simple and complex types is an important one because you declare simple and complex types differently. You declare complex types yourself, and the XML schema specification comes with many simple types already declared, as we'll see. You can also declare your own simple types; we'll see how to do that as well.
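As a quick illustration of the distinction, here is the <Lender> element from book.xml again. The comments are annotations added for this sketch and are not part of the original document:

```xml
<!-- <Lender> has child elements and an attribute, so it needs a complex type -->
<Lender phone="607.555.2222">
    <!-- Each child holds only text, so a simple type is enough for it -->
    <name>Doug Glass</name>
    <street>416 Disk Drive</street>
    <city>Medfield</city>
    <state>MA</state>
</Lender>
```

The phone attribute value is also of a simple type, as all attribute values are.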

Here's another thing to note: No part of the document book.xml indicates what schema you should use with it (with DTDs, you use the <!DOCTYPE> declaration to specify an external DTD). The W3C has not been very clear on what the exact mechanism is for associating schemas with documents; in fact, the W3C assumes that an XML processor can find the schema for a document without any information from the document itself. How this is expected to work in detail is not clear right now, although the W3C seems to want this association between schema and document to be made through the use of namespaces, much like the namespace declaration that we saw in the XML example document used in Internet Explorer:

<?xml version="1.0" ?>
<PROGRAMMING_TEAM xmlns="x-schema:schema1.xml">
    <PROGRAMMER>Fred Samson</PROGRAMMER>
    <PROGRAMMER>Edward Fredericks</PROGRAMMER>
    <DESCRIPTION>XML Programming Team</DESCRIPTION>
</PROGRAMMING_TEAM>

This relies quite a bit on the XML processor to track down the schema from a namespace declaration; it's less of a problem for this Internet Explorer example because Internet Explorer knows that a namespace that begins with x-schema: refers to a schema. In general, however, you'll have to declare a namespace for a document that refers to the schema, something like this:

<?xml version="1.0"?>
<transaction borrowDate="2001-10-15"
    xmlns="http://www.starpowder.com/schema">
    <Lender phone="607.555.2222">
        <name>Doug Glass</name>
        <street>416 Disk Drive</street>
        <city>Medfield</city>
        <state>MA</state>
    </Lender>
    .
    .
    .

It's then going to be up to the XML processor to find the schema from this description. Inside a schema, you can declare a target namespace. When the XML processor finds the schema and verifies that its target namespace is the same as the document's, it can validate the document. More details on this process will become clear as schemas come into more popular use.
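On the schema side, a target namespace declaration might look something like the following. This is only a minimal sketch: it assumes the targetNamespace attribute of <xsd:schema> and reuses the example namespace from the document above, and the single <note> declaration stands in for the full schema.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    targetNamespace="http://www.starpowder.com/schema"
    xmlns="http://www.starpowder.com/schema">
    <!-- Declarations made here belong to the target namespace named above -->
    <xsd:element name="note" type="xsd:string"/>
</xsd:schema>
```

The value of targetNamespace matches the default namespace declared in the document's root element, which is what lets a processor pair the two.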

Here's the schema for the document book.xml; this schema is named book.xsd, and the prefix xsd: is the prefix used by convention to indicate a W3C schema namespace. (I'll take a look at ways of avoiding the xsd: prefix later in this chapter, but usually you associate the prefix xsd with the W3C schema namespace and prefix W3C schema elements with xsd: so that they don't conflict with the elements you're declaring.) Note that, like all schemas, book.xsd is a well-formed XML document:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <xsd:annotation>
        <xsd:documentation>
            Book borrowing transaction schema.
        </xsd:documentation>
    </xsd:annotation>
    <xsd:element name="transaction" type="transactionType"/>
    <xsd:complexType name="transactionType">
        <xsd:element name="Lender" type="address"/>
        <xsd:element name="Borrower" type="address"/>
        <xsd:element ref="note" minOccurs="0"/>
        <xsd:element name="books" type="books"/>
        <xsd:attribute name="borrowDate" type="xsd:date"/>
    </xsd:complexType>
    <xsd:element name="note" type="xsd:string"/>
    <xsd:complexType name="address">
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:attribute name="phone" type="xsd:string"
            use="optional"/>
    </xsd:complexType>
    <xsd:complexType name="books">
        <xsd:element name="book" minOccurs="0" maxOccurs="10">
            <xsd:complexType>
                <xsd:element name="bookTitle" type="xsd:string"/>
                <xsd:element name="pubDate" type="xsd:date" minOccurs="0"/>
                <xsd:element name="replacementValue" type="xsd:decimal"/>
                <xsd:element name="maxDaysOut">
                    <xsd:simpleType base="xsd:integer">
                        <xsd:maxExclusive value="14"/>
                    </xsd:simpleType>
                </xsd:element>
                <xsd:attribute name="bookID" type="catalogID"/>
            </xsd:complexType>
        </xsd:element>
    </xsd:complexType>
    <xsd:simpleType name="catalogID" base="xsd:string">
        <xsd:pattern value="\d{3}-\d{4}-\d{3}"/>
    </xsd:simpleType>
</xsd:schema>

We'll go through the various parts of this schema in this chapter, but you can already see some of the structure here. Note, for example, that you use a particular namespace in XML schemas, "http://www.w3.org/1999/XMLSchema", and that the schema elements such as <xsd:element> are part of that namespace (you don't have to use the prefix xsd:, but it's conventional). As you can see, elements are declared with the <xsd:element> element, and attributes are declared with the <xsd:attribute> element. Furthermore, you specify the type of elements and attributes when you declare them. To create types, you can use the <xsd:complexType> and <xsd:simpleType> elements (you can then create elements from the simple or complex types you've created, or use the built-in simple types); you create schema annotations with the <xsd:annotation> element, and so on.

In this case, the root element of the document, <transaction>, is defined to be of the type transactionType, and this element can contain several other elements, including those of the address and books types. The address type itself is defined to contain elements that hold a person's name and address, and the books type holds elements named <book> that describe a book, including its title, publication date, and so on. Using this schema, you can describe the syntax of book.xml completely. I'll start taking this schema apart now as we explore it piece by piece.

Declaring Types and Elements

The most basic thing to understand about XML schemas is the concept of using simple and complex types, and how they relate to declaring elements. Unlike with DTDs, you specify the type of the elements that you declare with schemas.

This means that the first step in declaring elements is to make sure that you have the types you want, and that often means defining new complex types. Complex types can enclose elements and have attributes, and simple types cannot do either. You can find the simple types built into XML schemas in Table 5.2. (When you specify these types in schemas, bear in mind that you'll preface them with the W3C schema prefix, usually xsd:.)

Table 5.2. Simple Types Built into XML Schema
Type	Description
`binary`	Holds binary values, such as `110001`
`boolean`	Holds values such as `true, false, 1, 0`
`byte`	Represents a signed byte value, such as `123`; range is `-128` to `127`
`century`	Holds a century, such as `20`
`date`	Represents a date in YYYY-MM-DD format, such as `2001-10-15`
`decimal`	Holds decimal values, such as `5.4, 0, -219.06`
`double`	Represents a double-precision 64-bit floating point
`ENTITIES`	Represents the XML 1.0 `ENTITIES` attribute type
`ENTITY`	Represents the XML 1.0 `ENTITY` attribute type
`float`	Represents a single-precision 32-bit floating point
`ID`	Represents the XML 1.0 `ID` attribute type
`IDREF`	Represents the XML 1.0 `IDREF` attribute type
`IDREFS`	Represents the XML 1.0 `IDREFS` attribute type
`int`	Represents an integer, such as `123456789`
`integer`	Represents an integer
`language`	Holds a language identifier, such as `de, fr, or en-US,` as defined in XML 1.0
`long`	Represents a long integer, such as `12345678901234`
`month`	Holds a month, such as `2001-10`
`Name`	Represents the XML 1.0 `Name` type
`NCName`	Holds an XML name without a namespace prefix and colon
`negativeInteger`	Represents a negative integer
`NMTOKEN`	Represents the XML 1.0 `NMTOKEN` attribute type
`NMTOKENS`	Represents the XML 1.0 `NMTOKENS` attribute type
`nonNegativeInteger`	Represents a non-negative integer
`nonPositiveInteger`	Represents a non-positive integer (zero or negative)
`NOTATION`	Represents the XML 1.0 `NOTATION` attribute type
`positiveInteger`	Represents a positive integer
`QName`	Represents the XML `Namespace Qualified Name` type
`recurringDate`	Specifies a recurring date, such as `--10-15,` which means every October 15th
`recurringDay`	Specifies a recurring day, such as `----31,` which means the 31st day of every month
`recurringDuration`	Holds a recurring duration, such as `--10-15T12:00:00,` which means October 15th every year at noon (Co-Ordinated Universal Time)
`short`	Represents a short integer, such as `12345`
`string`	Represents a string of text.
`time`	Represents a time, such as `12:00:00.000`
`timeDuration`	Holds a time duration, such as `P1Y2M3DT4H5M6.7S,` which means 1 year, 2 months, 3 days, 4 hours, 5 minutes, and 6.7 seconds
`timeInstant`	Holds the time in a format like this: `2001-10-15T12:00:00.000-05:00` (includes time zone adjustment)
`timePeriod`	Holds a time period, such as `2001-10-15T12:00`
`unsignedByte`	Represents an unsigned byte value
`unsignedInt`	Represents an unsigned integer
`unsignedLong`	Represents an unsigned long integer
`unsignedShort`	Represents an unsigned short integer
`uriReference`	Holds a URI, such as http://www.w3c.org
`year`	Holds a year, such as 2001

To ensure compatibility between XML schemas and XML DTDs, you should use the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, and NMTOKENS only when declaring attributes.
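To make Table 5.2 a little more concrete, here is a small sketch that declares a few elements using built-in simple types. The element names (inStock, pageCount, price, homePage) are hypothetical and are not part of book.xsd; only the type names come from the table.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <!-- Hypothetical element names; each type is a built-in from Table 5.2 -->
    <xsd:element name="inStock" type="xsd:boolean"/>
    <xsd:element name="pageCount" type="xsd:positiveInteger"/>
    <xsd:element name="price" type="xsd:decimal"/>
    <xsd:element name="homePage" type="xsd:uriReference"/>
</xsd:schema>
```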

You create new complex types using the <xsd:complexType> element in schemas. A complex type definition itself usually contains element declarations, references to other elements, and attribute declarations. You declare elements with the <xsd:element> element, and you declare attributes with the <xsd:attribute> element. As in DTDs, element declarations specify the syntax of an element; in schemas, however, element declarations can specify the element's type as well. In addition, you can also specify the type of attributes.

Here's an example from book.xsd; in this case, I'm declaring a complex type named address, which holds the elements that make up a person's address:

<xsd:complexType name="address">
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <xsd:element name="state" type="xsd:string"/>
    <xsd:attribute name="phone" type="xsd:string"
        use="optional"/>
</xsd:complexType>

I'll use address as the type of the <Lender> and <Borrower> elements so that I can store the addresses of the books' lender and borrower; that declaration looks like this:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

In the address type, I'm indicating that any element of this type must have four elements and may have one attribute. Those elements are <name>, <street>, <city>, and <state>, and the optional attribute is phone. Note how the declarations for these elements set their data types as well: <name>, <street>, <city>, and <state> must all be of type xsd:string, and the attribute phone must also be of type xsd:string.
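For instance, because the phone attribute is declared with use="optional" in the address type, an element like the following should still conform to that type. This is an illustrative instance fragment adapted from book.xml with the attribute removed, not an excerpt from it:

```xml
<!-- No phone attribute: acceptable, because phone is declared use="optional" -->
<Borrower>
    <name>Britta Regensburg</name>
    <street>219 Union Drive</street>
    <city>Medfield</city>
    <state>CA</state>
</Borrower>
```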

The definition of the address complex type contains only declarations based on the simple type xsd:string. On the other hand, complex types can themselves contain elements that are based on complex types. You can see how this works in the transactionType, which is the type of book.xml's root element, <transaction>; in this case, two of the elements, <Lender> and <Borrower>, themselves are of the address type:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

Note that the transactionType type also includes an attribute, borrowDate, which is of the simple type xsd:date. Attributes are always of a simple type because attributes can't have internal content.

After you've defined a new type, you can declare new elements of that type. For example, after declaring the transactionType, you can declare the <transaction> element, which is the root element of the document, to be of that type:

<xsd:element name="transaction" type="transactionType"/>
<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>
.
.
.

So far, then, we've had an overview of how to create new element and attribute declarations: you use <xsd:element> and <xsd:attribute> elements and set the type attribute of those elements to the type that you want. If you want to use a complex type, you'll have to create it; you do that with the <xsd:complexType> element (we'll see how to create simple types in a few pages).

Now take a look at the declaration for the <note> element in the transactionType type:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

Here, I'm not declaring a new element; instead, I'm including an already existing element by referring to it. That is to say, the <note> element is already declared elsewhere in the schema, like this:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>
<xsd:element name="note" type="xsd:string"/>

Using the ref attribute lets you include an element that has already been defined in a complex type definition. However, you can't just include any element by reference. The element that you refer to must have been declared globally, which means that it itself is not part of any other complex type. A global element or attribute declaration appears as an immediate child element of the <xsd:schema> element; when you declare an element or an attribute globally, it can be used in any complex type. Using the ref attribute in this way is a powerful technique because it lets you avoid redefining elements that already exist globally.
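To show the benefit of a global declaration, here is a minimal sketch in which two complex types refer to the same global <note> element. It follows the working-draft syntax used in this chapter; the second type, returnType, is hypothetical and does not appear in book.xsd.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <!-- Global declaration: an immediate child of <xsd:schema> -->
    <xsd:element name="note" type="xsd:string"/>

    <xsd:complexType name="transactionType">
        <!-- Refers to the global <note> instead of redeclaring it -->
        <xsd:element ref="note" minOccurs="0"/>
    </xsd:complexType>

    <!-- Hypothetical second type that reuses the same global declaration -->
    <xsd:complexType name="returnType">
        <xsd:element ref="note" minOccurs="0"/>
    </xsd:complexType>
</xsd:schema>
```

If the type of <note> ever changes, only the one global declaration needs to be edited.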

Now I'll take a look at how to specify how many times elements can occur in a complex type.

Specifying How Often Elements Can Occur

I've indicated that the <note> element can either appear or not appear in elements of the transactionType because I've set the minOccurs attribute like this, indicating that the minimum number of times this element can occur is 0:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

In general, you can specify the minimum number of times that an element appears with the minOccurs attribute and the maximum number of times that it can appear with the maxOccurs attribute. For example, here's how I would say that the <note> element could appear from zero to five times in the transactionType type:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0" maxOccurs="5"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

The default value for minOccurs is 1; if you don't specify a value for maxOccurs, its default value is the value of minOccurs. To indicate that there is no upper bound to the maxOccurs attribute, set it to the value unbounded.
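As a sketch of the unbounded case, the following hypothetical type (bookList, which is not part of book.xsd) requires at least one <book> element and places no upper limit on how many may follow. For brevity, <book> is given the simple type xsd:string here rather than the complex type used in book.xsd.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <!-- Hypothetical type for illustration only -->
    <xsd:complexType name="bookList">
        <!-- At least one <book>, with no upper limit -->
        <xsd:element name="book" type="xsd:string"
            minOccurs="1" maxOccurs="unbounded"/>
    </xsd:complexType>
</xsd:schema>
```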

Specifying Default Values for Elements

Besides the minOccurs and maxOccurs attributes, you can also use the <xsd:element> element's fixed and default attributes to indicate values that an element must have (you use one or the other of these attributes, not both together). For example, setting fixed to 400 means that the element's value must always be 400. Setting the default value to 400, on the other hand, means that the default value for the element is 400, but if the element appears in the document, its actual value is the value it encloses.

For example, here I'm setting the value of an element named <maxTrials> to 100, and specifying that it must always be 100, using the fixed attribute in the <xsd:element> element:

<xsd:element name="maxTrials" type="xsd:integer" fixed="100"/>

Here I'm giving this element the default value of 100 instead of fixing its value at 100, which is useful if you want to provide default values to be used if the user doesn't specify an alternate value:

<xsd:element name="maxTrials" type="xsd:integer" default="100"/>
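The difference between the two declarations shows up in the documents that they accept. In the sketch below, the <trials> wrapper element is hypothetical and is used only to hold both cases in one well-formed fragment:

```xml
<!-- Hypothetical wrapper element, used only to show both cases together -->
<trials>
    <!-- Accepted by both the fixed="100" and the default="100" declarations -->
    <maxTrials>100</maxTrials>
    <!-- Accepted only by the default="100" declaration; fixed="100" rejects it -->
    <maxTrials>250</maxTrials>
</trials>
```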

Specifying Attribute Constraints and Defaults

As with elements, you can specify the type of attributes. Unlike with elements, however, attributes must be of a simple type. In addition, you don't use minOccurs and maxOccurs for attributes because attributes can appear only once, at most. Instead, you use a different syntax when constraining attributes.

You declare attributes with the <xsd:attribute> element. The <xsd:attribute> element itself has a type attribute that gives the attribute's (simple) type. So how do you indicate if an attribute is required or optional, or if there's a default value, or even if the value of the attribute is fixed at a certain value? You use the <xsd:attribute> element's use and value attributes.

The use attribute specifies whether the attribute is required or optional; if the attribute is optional, the use attribute specifies whether the attribute's value is fixed or whether there is a default. The second attribute, value, holds any value that is needed.

For example, I've added an attribute named phone to the address type; this attribute is of type xsd:string, and its use is optional:

<xsd:complexType name="address">
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <xsd:element name="state" type="xsd:string"/>
    <xsd:attribute name="phone" type="xsd:string"
        use="optional"/>
</xsd:complexType>

Here are the possible values for the use attribute:

required. The attribute is required and may have any value.
optional. The attribute is optional and may have any value.
fixed. The attribute value is fixed, and you set its value with the value attribute.
default. If the attribute does not appear, its value is the default value set with the value attribute. If it does appear, its value is the value that it is assigned in the document.
prohibited. The attribute must not appear.

For example, consider this attribute declaration:

<xsd:attribute name="counter" type="xsd:int" use="fixed" value="400"/>

This declaration creates an integer attribute named counter, whose value is always 400. Now consider this attribute declaration:

<xsd:attribute name="counter" type="xsd:int" use="default" value="400"/>

This means that the counter attribute has a default value of 400 if it is not used, and it has the value assigned to it if it is used.
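
Here is a sketch of how those two declarations would treat a document, assuming that counter belongs to a hypothetical element named <experiment> (that element name and the value 250 are assumptions made only for this illustration):

```xml
<!-- Acceptable under both use="fixed" and use="default" -->
<experiment counter="400"/>

<!-- Under use="default": the attribute is absent, so its value is taken to be 400 -->
<experiment/>

<!-- Rejected under use="fixed"; under use="default" the value is simply 250 -->
<experiment counter="250"/>
```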

Creating Simple Types

Most of the types that I've used in book.xsd are simple types that come built into the XML schema specification, such as xsd:string, xsd:integer, xsd:date, and so on. However, take a look at the attribute named bookID; this attribute is declared to be of the type catalogID:

<xsd:complexType name="books">
    <xsd:element name="book" minOccurs="0" maxOccurs="10">
        <xsd:complexType>
            <xsd:element name="bookTitle" type="xsd:string"/>
            <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
            <xsd:element name="replacementValue" type="xsd:decimal"/>
            <xsd:element name="maxDaysOut">
                <xsd:simpleType base="xsd:integer">
                    <xsd:maxExclusive value="14"/>
                </xsd:simpleType>
            </xsd:element>
            <xsd:attribute name="bookID" type="catalogID"/>
        </xsd:complexType>
    </xsd:element>
</xsd:complexType>

This type, catalogID, is itself a simple type that is not built into the XML schema specification; instead, I've defined it with the <simpleType> element, like this:

<xsd:complexType name="books">
    <xsd:element name="book" minOccurs="0" maxOccurs="10">
        <xsd:complexType>
            <xsd:element name="bookTitle" type="xsd:string"/>
            <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
            <xsd:element name="replacementValue" type="xsd:decimal"/>
            <xsd:element name="maxDaysOut">
                <xsd:simpleType base="xsd:integer">
                    <xsd:maxExclusive value="14"/>
                </xsd:simpleType>
            </xsd:element>
            <xsd:attribute name="bookID" type="catalogID"/>
        </xsd:complexType>
    </xsd:element>
</xsd:complexType>
<xsd:simpleType name="catalogID" base="xsd:string">
    <xsd:pattern value="\d{3}-\d{4}-\d{3}"/>
</xsd:simpleType>

Note in particular that you must base new simple types such as catalogID on an already existing simple type (either a built-in simple type or one that you've created; here, I'm using the built-in xsd:string type). To do that, you use the base attribute in the <xsd:simpleType> element. In the case of the catalogID type, I've based it on the xsd:string type with the attribute/value pair base="xsd:string". To describe the properties of new simple types, XML schemas use facets, which are discussed in the next section.

Creating Simple Types Using Facets

Using facets lets you restrict the data that a simple type can hold. For example, say that you want to create a simple type named dayOfMonth that can hold only values between 1 and 31, inclusive. In that case, you can define it this way, using the two facets minInclusive and maxInclusive:

<xsd:simpleType name="dayOfMonth" base="xsd:integer">
    <xsd:minInclusive value="1"/>
    <xsd:maxInclusive value="31"/>
</xsd:simpleType>

Now that you've created this new simple type, you can declare elements and attributes of this type.
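
For instance, a declaration along the following lines would put the new type to work. This is only a sketch: the element name returnDay is hypothetical and is not part of book.xsd.

```xml
<!-- Hypothetical declaration that uses the dayOfMonth type -->
<xsd:element name="returnDay" type="dayOfMonth"/>

<!-- In a document: 15 lies between 1 and 31, so it is accepted -->
<returnDay>15</returnDay>

<!-- Would be rejected: 32 exceeds the maxInclusive bound of 31 -->
<returnDay>32</returnDay>
```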

In book.xsd, the catalogID simple type is even more powerful than this dayOfMonth simple type. The catalogID simple type uses the pattern facet to specify a regular expression (that is, a pattern set up to match text in the format that you specify) that string values of this type must satisfy:

<xsd:simpleType name="catalogID" base="xsd:string">
    <xsd:pattern value="\d{3}-\d{4}-\d{3}"/>
</xsd:simpleType>

In this case, any value of type catalogID must match the regular expression "\d{3}-\d{4}-\d{3}", which stands for three digits, a hyphen, four digits, another hyphen, and three digits.

About Regular Expressions

The regular expressions used in XML schema facets are closely modeled on those used in the Perl programming language. You can find the complete documentation for Perl regular expressions at the Comprehensive Perl Archive Network (CPAN) Web site: http://www.cpan.org/doc/manual/html/pod/perlre.html. (Regular expressions are not a skill that you'll need in this book.)

The catalogID type is the type of the <book> element's bookID attribute, so I can specify book ID values like this in book.xml, matching the regular expression that I've used for this attribute:

<book bookID="123-4567-890">
    <bookTitle>Earthquakes for Breakfast</bookTitle>
    <pubDate>2001-10-20</pubDate>
    <replacementValue>15.95</replacementValue>
    <maxDaysOut>14</maxDaysOut>
</book>
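
For contrast, here is a sketch of some bookID values that should fail validation against the catalogID pattern, on the assumption that the whole attribute value has to match the expression. The values are made up for illustration, and the child elements of <book> are omitted for brevity:

```xml
<!-- Would not validate: the digit groups are 4, 3, 3 instead of 3, 4, 3 -->
<book bookID="1234-567-890"/>

<!-- Would not validate: letters are not matched by \d -->
<book bookID="ABC-4567-890"/>

<!-- Would not validate: the hyphens are missing -->
<book bookID="1234567890"/>
```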

What facets are there, and what built-in simple types support them? You'll find the five general facets, listed by the simple types that support them, in Table 5.3.

Table 5.3. Simple Types and Applicable Facets
Type	Length	minLength	maxLength	Pattern	Enumeration
`binary`	x	x	x	x	x
`boolean`				x
`byte`				x	x
`century`				x	x
`date`				x	x
`decimal`				x	x
`double`				x	x
`ENTITIES`	x	x	x		x
`ENTITY`	x	x	x	x	x
`float`				x	x
`ID`	x	x	x	x	x
`IDREF`	x	x	x	x	x
`IDREFS`	x	x	x		x
`int`				x	x
`integer`				x	x
`language`	x	x	x	x	x
`long`				x	x
`month`				x	x
`Name`	x	x	x	x	x
`NCName`	x	x	x	x	x
`negativeInteger`				x	x
`NMTOKEN`	x	x	x	x	x
`NMTOKENS`	x	x	x		x
`nonNegativeInteger`				x	x
`nonPositiveInteger`				x	x
`NOTATION`	x	x	x	x	x
`positiveInteger`				x	x
`QName`	x	x	x	x	x
`recurringDate`				x	x
`recurringDay`				x	x
`recurringDuration`				x	x
`short`				x	x
`string`	x	x	x	x	x
`time`				x	x
`timeDuration`				x	x
`timeInstant`				x	x
`timePeriod`				x	x
`unsignedByte`				x	x
`unsignedInt`				x	x
`unsignedLong`				x	x
`unsignedShort`				x	x
`uriReference`	x	x	x	x	x
`year`				x	x

The numeric simple types, and those simple types that can be ordered, also have some additional facets, which you see in Tables 5.4 and 5.5.

Table 5.4. Ordered Simple Types and Applicable Facets, Part 1
Type	maxInclusive	maxExclusive	minInclusive
`binary`
`byte`	x	x	x
`century`	x	x	x
`date`	x	x	x
`decimal`	x	x	x
`double`	x	x	x
`float`	x	x	x
`int`	x	x	x
`integer`	x	x	x
`long`	x	x	x
`month`	x	x	x
`negativeInteger`	x	x	x
`nonNegativeInteger`	x	x	x
`nonPositiveInteger`	x	x	x
`positiveInteger`	x	x	x
`recurringDate`	x	x	x
`recurringDay`	x	x	x
`recurringDuration`	x	x	x
`short`	x	x	x
`string`	x	x	x
`time`	x	x	x
`timeDuration`	x	x	x
`timeInstant`	x	x	x
`timePeriod`	x	x	x
`unsignedByte`	x	x	x
`unsignedInt`	x	x	x
`unsignedLong`	x	x	x
`unsignedShort`	x	x	x
`year`	x	x	x

Table 5.5. Ordered Simple Types and Applicable Facets, Part 2
Type	minExclusive	Precision	Scale	Encoding
`binary`				x
`byte`	x	x	x
`century`	x
`date`	x
`decimal`	x	x	x
`double`	x
`float`	x
`int`	x	x	x
`integer`	x	x	x
`long`	x	x	x
`month`	x
`negativeInteger`	x	x	x
`nonNegativeInteger`	x	x	x
`nonPositiveInteger`	x	x	x
`positiveInteger`	x	x	x
`recurringDate`	x
`recurringDay`	x
`recurringDuration`	x
`short`	x	x	x
`string`	x
`time`	x
`timeDuration`	x
`timeInstant`	x
`timePeriod`	x
`unsignedByte`	x	x	x
`unsignedInt`	x	x	x
`unsignedLong`	x	x	x
`unsignedShort`	x	x	x
`year`	x

Some additional facets that have to do with dates and times apply to simple types, and you'll find them in Table 5.6.

Table 5.6. Time and Date Simple Types and Applicable Facets
Type	Period	Duration
`century`	x	x
`date`	x	x
`month`	x	x
`recurringDate`	x	x
`recurringDay`	x	x
`recurringDuration`	x	x
`time`	x	x
`timeDuration`	x	x
`timeInstant`	x	x
`timePeriod`	x	x
`year`	x	x

Of all the facets that you see in Tables 5.3, 5.4, and 5.5, my favorites are minInclusive, maxInclusive, pattern, and enumeration. We've seen the first three, but not the enumeration facet yet.

The enumeration facet lets you set up an enumeration of values, exactly as you can do in DTDs (as we saw in the previous chapter). Using an enumeration, you can restrict the possible values of a simple type to a list of values that you specify.

For example, to set up a simple type named weekday whose values can be "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", and "Saturday", you'd define that type like this:

<xsd:simpleType name="weekday" base="xsd:string">
    <xsd:enumeration value="Sunday"/>
    <xsd:enumeration value="Monday"/>
    <xsd:enumeration value="Tuesday"/>
    <xsd:enumeration value="Wednesday"/>
    <xsd:enumeration value="Thursday"/>
    <xsd:enumeration value="Friday"/>
    <xsd:enumeration value="Saturday"/>
</xsd:simpleType>
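
As a sketch of how the weekday type might be used, here is a hypothetical element named pickupDay (an assumption for this illustration, not part of book.xsd), along with one value that the enumeration accepts and one that it does not:

```xml
<!-- Hypothetical declaration that uses the weekday type -->
<xsd:element name="pickupDay" type="weekday"/>

<!-- In a document: accepted, because "Friday" is one of the enumerated values -->
<pickupDay>Friday</pickupDay>

<!-- Would be rejected: the comparison is case-sensitive, and "friday" is not listed -->
<pickupDay>friday</pickupDay>
```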

Using Anonymous Type Definitions

So far, all the element declarations that we've used in the book.xsd schema have used the type attribute to indicate the new element's type. But what if you want to use a type only once? Do you have to go to the trouble of declaring it and naming it, all to use it in only one element declaration?

It turns out that there is an easier way. You can use an anonymous type definition to avoid having to define a whole new type that you'll reference only once. Using an anonymous type definition simply means that you enclose the <xsd:simpleType> or <xsd:complexType> element inside the <xsd:element> element declaration. In this case, you don't assign an explicit value to the type attribute in the <xsd:element> element because the anonymous type that you're using doesn't have a name. (In fact, you can tell that an anonymous type definition is being used if the <xsd:element> element doesn't include a type attribute.)

Here's an example from book.xsd; in this case, I'll use an anonymous type definition for the <book> element. This element holds <bookTitle>, <pubDate>, <replacementValue>, and <maxDaysOut> elements. It will also have an attribute named bookID, so it looks like a good one to create from a complex type. Instead of declaring a separate complex type, however, I'll just put the <xsd:complexType> element inside the <xsd:element> element that declares <book>:

<xsd:element name="book" minOccurs="0" maxOccurs="10">
    <xsd:complexType>
    .
    .
    .
    </xsd:complexType>
</xsd:element>

Now I'm free to add the elements that I want inside the <book> element without defining a named, separate complex type at all:

<xsd:element name="book" minOccurs="0" maxOccurs="10">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        .
        .
        .
    </xsd:complexType>
</xsd:element>

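You can also use simple anonymous types; for example, the <maxDaysOut> element holds the maximum number of days that a book is supposed to be out. To cap that number, I use a new simple anonymous type so that I can use the maxExclusive facet. Note that maxExclusive is an exclusive upper bound, so value="14" means that the element's value must be less than 14; to allow 14 itself, you would use the maxInclusive facet instead. Here's the declaration: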

<xsd:element name="book" minOccurs="0" maxOccurs="10">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        .
        .
        .
    </xsd:complexType>
</xsd:element>

You can also include attribute declarations in anonymous type definitions, like this:

<xsd:element name="book" minOccurs="0" maxOccurs="10">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:attribute name="bookID" type="catalogID"/>
    </xsd:complexType>
</xsd:element>

Now I'll take a look at declaring empty elements.

Creating Empty Elements

Empty elements have no content, but they can have attributes, so how do you declare them in an XML schema? You do that by declaring a complex type and setting the <xsd:complexType> element's content attribute to "empty".

Here's an example. In this case, I'm going to create a new empty element named <image> that can take three attributes: source, width, and height, like this: <image source="/images/cover.gif" height="256" width="512" />. I start by declaring this element:

<xsd:element name="image">
    .
    .
    .
</xsd:element>

I haven't used the type attribute in this element's declaration because I'll base this element on an anonymous type definition. To create the anonymous type, I use an <xsd:complexType> element; notice that I'm setting the content attribute to "empty":

<xsd:element name="image">
    <xsd:complexType content="empty">
    .
    .
    .
    </xsd:complexType>
</xsd:element>

Finally, I add the attributes that this element will use:

<xsd:element name="image">
    <xsd:complexType content="empty">
        <xsd:attribute name="source" type="xsd:string" />
        <xsd:attribute name="width" type="xsd:decimal" />
        <xsd:attribute name="height" type="xsd:decimal" />
    </xsd:complexType>
</xsd:element>

And that's all it takes; now the empty element <image> is ready to be used.

Creating Mixed-Content Elements

So far, the plain text in the documents that we've looked at in this chapter has been confined to the deepest elements in the document; that is, to elements that enclose no child elements, just text. However, as you know, you can also create elements that support mixed content, both text and other elements. You can create mixed-content elements with schemas as well as DTDs. In these elements, character data can appear at the same level as child elements.

Here's an example document that shows what mixed-content elements look like using the elements that we've declared in book.xsd; in this case, I'm creating a new element named <reminder> that encloses a reminder letter to a book borrower to return a book:

<?xml version="1.0"?>
<reminder>
    Dear <name>Britta Regensburg</name>:
    The book <bookTitle>Snacking on Volcanoes</bookTitle>
    was supposed to be out for only <maxDaysOut>14</maxDaysOut>
    days. Please return it or pay
    $<replacementValue>17.99</replacementValue>.
    Thank you.
</reminder>

This document uses elements that we've defined before, character data, and the new <reminder> element. The <reminder> element is the one that has a mixed-content model; to declare it in a schema, I'll start by creating a new anonymous complex type inside the declaration for <reminder>:

<xsd:element name="reminder">
    <xsd:complexType>
    .
    .
    .
    </xsd:complexType>
</xsd:element>

Recall that the <xsd:complexType> element has the content attribute with which you specify a content model; in this case, the content model is mixed:

<xsd:element name="reminder">
    <xsd:complexType content="mixed">
    .
    .
    .
    </xsd:complexType>
</xsd:element>

Now all I have to do is to add the declarations for the elements that you can use inside the <reminder> element, like this:

<xsd:element name="reminder">
    <xsd:complexType content="mixed">
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
    </xsd:complexType>
</xsd:element>

As you might recall from our discussion of DTDs, you can't constrain the order or number of child elements appearing in a mixed-model element. There's more power available when it comes to schemas, however; here, the order and number of child elements do indeed have to correspond to the order and number of child elements that you specify in the schema. In other words, even though DTDs provide only partial syntax specifications for mixed-content models, schemas provide much more complete syntax specifications.

In fact, the demands on XML processors that want to support schemas are great, including, for example, that they must implement the complete syntax for Perl-type regular expressions simply so that they can support the pattern facet. The upshot is that it might be a long time until a full implementation of schemas appears (if ever!).

elementOnly and textOnly Content Elements

Now that you know that you can use the content attribute of <xsd:complexType> to specify content models, such as empty and mixed, that raises this question: What model were we using when we used <xsd:complexType> without specifying a content model at all? For example, here's how the type transactionType is defined in book.xsd:

<xsd:element name="transaction" type="transactionType"/>
<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

The default model for complex types is called elementOnly, which means that the type can include only elements. In other words, this type definition is the same as this one in which I explicitly make the type elementOnly:

<xsd:element name="transaction" type="transactionType"/>
<xsd:complexType name="transactionType" content="elementOnly">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:element name="books" type="books"/>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

Actually, I should say that the default complex type content model is elementOnly except in one case. When you derive a complex type from a simple type, the content model is textOnly, not elementOnly. The textOnly content model specifies that the content of elements of this type is text, which means that the XML processor will not apply any syntax rules to that content.

You can also create textOnly content model types explicitly, as in this example, in which I'm creating a new version of the <bookID> element using the textOnly content model to allow it to support different indexing schemes (and, therefore, a different format for book IDs). Because different indexing schemes will have different formats for book ID values, I'm intentionally removing the syntax checking here:

<xsd:element name="bookID">
    <xsd:complexType content="textOnly">
        <xsd:attribute name="indexingScheme" type="xsd:string" />
    </xsd:complexType>
</xsd:element>

The result is that the <bookID> element may now contain any kind of text (but no elements), and the XML processor won't check it for syntax violations.
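
As a rough sketch of what this permits, both of the following <bookID> elements would be accepted, because the text content is no longer checked against any pattern. The indexingScheme values and the second ID format are assumptions made up for this illustration:

```xml
<!-- An ID in the familiar catalogID format -->
<bookID indexingScheme="catalog">123-4567-890</bookID>

<!-- An ID in a completely different, hypothetical format: also accepted -->
<bookID indexingScheme="shelf">QA76.76.H94</bookID>
```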

As a general rule, W3C suggests that you stay away from removing all syntax checks like this if you can avoid it. In fact, it's not difficult to use regular expressions to specify alternate pattern matches. This means that, in this case, you can still use a simple type based on string and can constrain the syntax of the book ID with the pattern facet.

Annotating Schemas

In DTDs, you can use XML comments to add annotations and provide documentation. In schemas, you might expect that the situation would be a little more complex, and you'd be right. XML schemas define three new elements that you use to add annotations to schemas: <xsd:annotation>, <xsd:documentation>, and <xsd:appinfo>.

Here's how things work: The <xsd:annotation> element is the container element for the <xsd:documentation> and <xsd:appinfo> elements. The <xsd:documentation> element holds text of the kind you'd expect to see in a normal comment; that is, text designed for human readers. As its name implies, the <xsd:appinfo> element, on the other hand, holds annotations suitable for applications that read the document. Such applications can pick up information from <xsd:appinfo> elements if those elements are constructed in a way they recognize.

Here's an example using <xsd:annotation> and <xsd:appinfo> elements. This example is actually from the schema that the W3C publishes for the data types that it uses in XML schemas, and it's part of what's called the schema of all schemas. This is the declaration of the simple type string, and the <appinfo> element indicates what facets and properties this simple type has in a way that can be read by other applications. (Here it's the <appinfo> element, not the <xsd:appinfo> element, because in this schema, the default namespace is the XML schema namespace.)

<simpleType name="string" base="urSimpleType">
    <annotation>
        <appinfo>
            <has-facet name="length"/>
            <has-facet name="minLength"/>
            <has-facet name="maxLength"/>
            <has-facet name="pattern"/>
            <has-facet name="enumeration"/>
            <has-facet name="maxInclusive"/>
            <has-facet name="maxExclusive"/>
            <has-facet name="minInclusive"/>
            <has-facet name="minExclusive"/>
            <has-property name="ordered" value="true"/>
            <has-property name="bounded" value="false"/>
            <has-property name="cardinality" value="countably infinite"/>
            <has-property name="numeric" value="false"/>
        </appinfo>
    </annotation>
</simpleType>

Here's another example; this one is from book.xsd and uses the <xsd:annotation> and <xsd:documentation> elements, adding an explanatory comment at the beginning of the book.xsd schema:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <xsd:annotation>
        <xsd:documentation>
            Book borrowing transaction schema.
        </xsd:documentation>
    </xsd:annotation>
    <xsd:element name="transaction" type="transactionType"/>
    <xsd:complexType name="transactionType">
        <xsd:element name="Lender" type="address"/>
        <xsd:element name="Borrower" type="address"/>
        <xsd:element ref="note" minOccurs="0"/>
        <xsd:element name="books" type="books"/>
        <xsd:attribute name="borrowDate" type="xsd:date"/>
    </xsd:complexType>
    .
    .
    .

In fact, you can use the <xsd:annotation> element at the beginning of most schema constructions, such as the <xsd:schema>, <xsd:complexType>, <xsd:simpleType>, <xsd:element>, and <xsd:attribute> elements, and so on. Here's an example in which I've added an annotation to a complex type in book.xsd:

<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
    <xsd:annotation>
        <xsd:documentation>
            Book borrowing transaction schema.
        </xsd:documentation>
    </xsd:annotation>
    <xsd:element name="transaction" type="transactionType"/>
    <xsd:complexType name="transactionType">
        <xsd:annotation>
            <xsd:documentation>
                This type is used by the root element.
            </xsd:documentation>
        </xsd:annotation>
        <xsd:element name="Lender" type="address"/>
        <xsd:element name="Borrower" type="address"/>
        <xsd:element ref="note" minOccurs="0"/>
        <xsd:element name="books" type="books"/>
        <xsd:attribute name="borrowDate" type="xsd:date"/>
    </xsd:complexType>
    .
    .
    .

As we've seen with DTDs, you can create choices and sequences of elements and, as you might expect, you can do the same in schemas.

Creating Choices

A choice lets you specify a number of elements, only one of which will be chosen. To create a choice in XML schemas, you use the <xsd:choice> element. Here's an example; in this case, I'll change the transactionType type so that the borrower can borrow either several books or just one book. I do this by creating an <xsd:choice> element that holds both a <books> element and a <book> element. (Note that in this case, the <book> element needs to be made into a global element so that I can refer to it in this choice, so I remove it from the declaration of the <books> element.) Here's the result:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:choice>
        <xsd:element name="books" type="books"/>
        <xsd:element ref="book"/>
    </xsd:choice>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="books">
    <xsd:element ref="book" minOccurs="0" maxOccurs="10" />
</xsd:complexType>
<xsd:element name="book">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:attribute name="bookID" type="catalogID"/>
    </xsd:complexType>
</xsd:element>
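
To make the choice concrete, here is a sketch of the two kinds of <transaction> content that this revised type is meant to allow. Each <transaction> below stands for a separate document; the borrowDate value and the book data are assumptions for illustration, and the <Lender> and <Borrower> elements are left out for brevity:

```xml
<!-- Alternative 1: the transaction holds a single <book> element -->
<transaction borrowDate="2001-10-15">
    <!-- Lender and Borrower elements omitted -->
    <book bookID="123-4567-890">
        <bookTitle>Earthquakes for Breakfast</bookTitle>
        <replacementValue>15.95</replacementValue>
        <maxDaysOut>13</maxDaysOut>
    </book>
</transaction>

<!-- Alternative 2: the transaction holds a <books> element instead -->
<transaction borrowDate="2001-10-15">
    <!-- Lender and Borrower elements omitted -->
    <books>
        <book bookID="123-4567-890">
            <bookTitle>Earthquakes for Breakfast</bookTitle>
            <replacementValue>15.95</replacementValue>
            <maxDaysOut>13</maxDaysOut>
        </book>
    </books>
</transaction>
```

A transaction that contained both a <book> and a <books> element at this level would not satisfy the choice.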

Next, I'll take a look at creating sequences.

Creating Sequences

By default, the elements that you declare in a complex type form a sequence; that is, they must appear in that order in a conforming document. It turns out that you can also create sequences yourself using the <xsd:sequence> element.

For example, say that I want to let the borrower borrow not just books, but also a magazine. To do that, I can create a new group named booksAndMagazine. A group collects elements together. You can name groups, and you can then include a group in other elements using the <xsd:group> element and referring to the group by name:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:choice>
        <xsd:element name="books" type="books"/>
        <xsd:element ref="book"/>
        <xsd:group ref="booksAndMagazine"/>
    </xsd:choice>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

To create the group named booksAndMagazine, I use the <xsd:group> element; to ensure that the elements inside that group appear in a specific sequence, I use the <xsd:sequence> element this way:

<xsd:complexType name="transactionType">
    <xsd:element name="Lender" type="address"/>
    <xsd:element name="Borrower" type="address"/>
    <xsd:element ref="note" minOccurs="0"/>
    <xsd:choice>
        <xsd:element name="books" type="books"/>
        <xsd:element ref="book"/>
        <xsd:group ref="booksAndMagazine"/>
    </xsd:choice>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="books">
    <xsd:element ref="book" minOccurs="0" maxOccurs="10" />
</xsd:complexType>
<xsd:group name="booksAndMagazine">
    <xsd:sequence>
        <xsd:element ref="books"/>
        <xsd:element ref="magazine"/>
    </xsd:sequence>
</xsd:group>
<xsd:element name="book">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:attribute name="bookID" type="catalogID"/>
    </xsd:complexType>
</xsd:element>
<xsd:element name="magazine">
    <xsd:complexType>
        <xsd:element name="magazineTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:attribute name="magazineID" type="catalogID"/>
    </xsd:complexType>
</xsd:element>

Creating Attribute Groups

You can also create groups of attributes using the <xsd:attributeGroup> element. For example, say that I wanted to add a number of attributes to the <book> element that describe the book. To do that, I can create an attribute group named bookDescription and then reference that attribute group in the declaration for <book>:

<xsd:element name="book">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:attributeGroup ref="bookDescription"/>
    </xsd:complexType>
</xsd:element>

To create the attribute group bookDescription, I just use the <xsd:attributeGroup> element, enclosing the <xsd:attribute> elements that I use to declare the attributes in the <xsd:attributeGroup> element:

<xsd:element name="book">
    <xsd:complexType>
        <xsd:element name="bookTitle" type="xsd:string"/>
        <xsd:element name="pubDate" type="xsd:date" minOccurs='0'/>
        <xsd:element name="replacementValue" type="xsd:decimal"/>
        <xsd:element name="maxDaysOut">
            <xsd:simpleType base="xsd:integer">
                <xsd:maxExclusive value="14"/>
            </xsd:simpleType>
        </xsd:element>
        <xsd:attributeGroup ref="bookDescription"/>
    </xsd:complexType>
</xsd:element>
<xsd:attributeGroup name="bookDescription">
    <xsd:attribute name="bookID" type="catalogID"/>
    <xsd:attribute name="numberPages" type="xsd:decimal"/>
    <xsd:attribute name="coverType">
        <xsd:simpleType base="xsd:string">
            <xsd:enumeration value="leather"/>
            <xsd:enumeration value="cloth"/>
            <xsd:enumeration value="vinyl"/>
        </xsd:simpleType>
    </xsd:attribute>
</xsd:attributeGroup>

Groups Versus Parameter Entities

The process of creating a group of elements or attributes and then referencing that group in another element mimics the use of parameter entities in DTDs. With DTDs, you do much the same thing: you include a group of elements or attributes by referencing a parameter entity. There are no such things as parameter entities in schemas, but using groups, you can accomplish most of what parameter entities are used for in DTDs.

As you can see, schemas provide some sophisticated mechanisms for building documents up from pieces.

Creating all Groups

Schemas support another type of group: the all group. All the elements in an all group may appear once or not at all, and they may appear in any order. This group must be used at the top level of the content model, and the group's children must be individual elements; that is, this group must itself contain no groups. In addition, any element in this content model can appear no more than once (which means that the allowed values of minOccurs and maxOccurs are 0 and 1 only).

Here's an example; in this case, I'm converting the transactionType type into an all group:

<xsd:complexType name="transactionType">
    <xsd:all>
        <xsd:element name="Lender" type="address"/>
        <xsd:element name="Borrower" type="address"/>
        <xsd:element ref="note" minOccurs="0"/>
        <xsd:element name="books" type="books"/>
    </xsd:all>
    <xsd:attribute name="borrowDate" type="xsd:date"/>
</xsd:complexType>

This means that the elements in this type may now appear in any order but can appear only once, at most. Another important point is that if you use it, the <xsd:all> group must contain all the element declarations in a content model. (That is, you can't declare additional elements that are in the content model but outside the group.)
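To make the ordering rule concrete, here is a sketch of a document that should be acceptable under this all group. It assumes that the schema still declares a <transaction> element of type transactionType, along with the address and books types used elsewhere in this chapter; the data values are only illustrative. The children appear in a different order than they are declared, and the optional <note> element is omitted.

```xml
<?xml version="1.0"?>
<transaction borrowDate="2001-10-15">
    <!-- books comes first here, although it is declared last in the group -->
    <books>
        <book bookID="123-4567-890">
            <bookTitle>Earthquakes for Breakfast</bookTitle>
            <pubDate>2001-10-20</pubDate>
            <replacementValue>15.95</replacementValue>
            <maxDaysOut>13</maxDaysOut>
        </book>
    </books>
    <!-- Borrower before Lender is also fine in an all group -->
    <Borrower phone="310.555.1111">
        <name>Britta Regensburg</name>
        <street>219 Union Drive</street>
        <city>Medfield</city>
        <state>CA</state>
    </Borrower>
    <Lender phone="607.555.2222">
        <name>Doug Glass</name>
        <street>416 Disk Drive</street>
        <city>Medfield</city>
        <state>MA</state>
    </Lender>
</transaction>
```

Repeating any of these children, for example including two <Lender> elements, would violate the group, because each element can appear at most once.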

Schemas and Namespaces

One of the big ideas behind schemas was to allow XML processors to validate documents that use namespaces (which DTDs have a problem with). Toward that end, the <schema> element has a new attribute: targetNamespace.

The targetNamespace attribute specifies the namespace to which the schema is targeted; that is, the namespace that it is intended for. This means that if an XML processor is validating a document and is checking elements in a particular namespace, it will know what schema to check, based on the schema's target namespace. That's the idea behind target namespaces: You can indicate what namespace a schema is targeted to so that an XML processor can determine which schema(s) to use to validate a document.

You can also specify whether the elements and attributes that were locally declared in a schema need to be qualified when used in a document. We've seen globally declared elements and attributes in schemas; they're declared at the top level in the schema, directly under the <schema> element. All the other elements and attributes declared in a schema (that is, those not declared as direct children of the <schema> element) are locally declared. Schemas allow you to indicate whether locals need to be qualified when used in a document.
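As a quick illustration of the difference, here is a fragment that borrows a few declarations from the schema used in this chapter. The comments marking each declaration as global or local are added here for illustration, and the fragment uses the same working-draft syntax as the other listings.

```xml
<schema xmlns="http://www.w3.org/1999/XMLSchema">
    <!-- Global: declared directly under <schema>. -->
    <element name="note" type="string"/>

    <!-- Global: this named type is also a direct child of <schema>. -->
    <complexType name="address">
        <!-- Local: these elements are declared inside the address type. -->
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
        <element name="state" type="string"/>
        <!-- Local: this attribute is also declared inside the type. -->
        <attribute name="phone" type="string" use="optional"/>
    </complexType>
</schema>
```

In this fragment, only <note> is a global element; <name>, <street>, <city>, <state>, and the phone attribute are all locals.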

Using Unqualified Locals

I'll start looking at how schemas work with target namespaces and locals by beginning with unqualified locals (which don't need to be qualified in a document). To indicate whether elements need to be qualified, you use the elementFormDefault attribute of the <schema> element; to indicate whether attributes need to be qualified, you use the attributeFormDefault attribute of the same element. You can set the elementFormDefault and attributeFormDefault attributes to either "qualified" or "unqualified".

I'll take a look at an example to see how this works. Here, I'm indicating that the target namespace of a schema is "http://www.starpowder.com/namespace". I'm also making the W3C XML schema namespace, "http://www.w3.org/1999/XMLSchema", the default namespace for the document so that I don't have to qualify the XML schema elements such as <annotation> and <complexType> with a prefix such as xsd.

However, I have to be a little careful. When an XML processor dealing with this schema wants to check, say, the transactionType complex type, it will need to know what namespace to search, and it won't find the transactionType type in the default namespace, which is the W3C XML schema namespace. For that reason, I'll define a new namespace prefix, t, and associate that prefix with the same namespace as the target namespace. Now I can use t: to prefix types defined in this schema so that an XML processor will know what namespace to search for their definitions.

I'll also indicate that both elements and attributes should be unqualified in this case. Here's what this new schema looks like. (Notice that I qualify types defined in this schema with the t prefix, but not types, such as string, that are defined in the default "http://www.w3.org/1999/XMLSchema" namespace.)

<schema xmlns="http://www.w3.org/1999/XMLSchema"
    xmlns:t="http://www.starpowder.com/namespace"
    targetNamespace="http://www.starpowder.com/namespace"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified">
    <annotation>
        <documentation>
            Book borrowing transaction schema.
        </documentation>
    </annotation>
    <element name="transaction" type="t:transactionType"/>
    <complexType name="transactionType">
        <element name="Lender" type="t:address"/>
        <element name="Borrower" type="t:address"/>
        <element ref="t:note" minOccurs="0"/>
        <element name="books" type="t:books"/>
        <attribute name="borrowDate" type="date"/>
    </complexType>
    <element name="note" type="string"/>
    <complexType name="address">
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
        <element name="state" type="string"/>
        <attribute name="phone" type="string"
            use="optional"/>
    </complexType>
    <complexType name="books">
        <element name="book" minOccurs="0" maxOccurs="10">
            <complexType>
                <element name="bookTitle" type="string"/>
                <element name="pubDate" type="date" minOccurs="0"/>
                <element name="replacementValue" type="decimal"/>
                <element name="maxDaysOut">
                    <simpleType base="integer">
                        <maxExclusive value="14"/>
                    </simpleType>
                </element>
                <attribute name="bookID" type="t:catalogID"/>
            </complexType>
        </element>
    </complexType>
    <simpleType name="catalogID" base="string">
        <pattern value="\d{3}-\d{4}-\d{3}"/>
    </simpleType>
</schema>

So how does a document that conforms to this schema look? Here's an example; note that locals such as <Lender>, <Borrower>, and <books> are unqualified, but I do need to qualify globals such as <transaction> and <note>. Note also that the namespace of this document is the same as the target namespace of the schema that specifies its syntax, as it should be.

<?xml version="1.0"?>
<at:transaction xmlns:at="http://www.starpowder.com/namespace"
    borrowDate="2001-10-15">
    <Lender phone="607.555.2222">
        <name>Doug Glass</name>
        <street>416 Disk Drive</street>
        <city>Medfield</city>
        <state>MA</state>
    </Lender>
    <Borrower phone="310.555.1111">
        <name>Britta Regensburg</name>
        <street>219 Union Drive</street>
        <city>Medfield</city>
        <state>CA</state>
    </Borrower>
    <at:note>Lender wants these back in two weeks!</at:note>
    <books>
        <book bookID="123-4567-890">
            <bookTitle>Earthquakes for Breakfast</bookTitle>
            <pubDate>2001-10-20</pubDate>
            <replacementValue>15.95</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
        .
        .
        .
    </books>
</at:transaction>

Using Qualified Locals

You can also require that locals be qualified. Here's an example schema that requires element names to be qualified in conforming documents:

<schema xmlns="http://www.w3.org/1999/XMLSchema"
    xmlns:t="http://www.starpowder.com/namespace"
    targetNamespace="http://www.starpowder.com/namespace"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
    <annotation>
        <documentation>
            Book borrowing transaction schema.
        </documentation>
    </annotation>
    .
    .
    .

What does a document that conforms to this schema look like? Here's an example; note that I qualify all elements explicitly:

<?xml version="1.0"?>
<at:transaction xmlns:at="http://www.starpowder.com/namespace"
    borrowDate="2001-10-15">
    <at:Lender phone="607.555.2222">
        <at:name>Doug Glass</at:name>
        <at:street>416 Disk Drive</at:street>
        <at:city>Medfield</at:city>
        <at:state>MA</at:state>
    </at:Lender>
    <at:Borrower phone="310.555.1111">
        <at:name>Britta Regensburg</at:name>
        <at:street>219 Union Drive</at:street>
        <at:city>Medfield</at:city>
        <at:state>CA</at:state>
    </at:Borrower>
    <at:note>Lender wants these back in two weeks!</at:note>
    <at:books>
        <at:book bookID="123-4567-890">
            <at:bookTitle>Earthquakes for Breakfast</at:bookTitle>
            <at:pubDate>2001-10-20</at:pubDate>
            <at:replacementValue>15.95</at:replacementValue>
            <at:maxDaysOut>14</at:maxDaysOut>
        </at:book>
        .
        .
        .
    </at:books>
</at:transaction>

Another way of creating a document that conforms to this schema is to replace the explicit qualification of every element with an implicit qualification by using a default namespace. Here's what that looks like:

<?xml version="1.0"?>
<transaction xmlns="http://www.starpowder.com/namespace"
    borrowDate="2001-10-15">
    <Lender phone="607.555.2222">
        <name>Doug Glass</name>
        <street>416 Disk Drive</street>
        <city>Medfield</city>
        <state>MA</state>
    </Lender>
    <Borrower phone="310.555.1111">
        <name>Britta Regensburg</name>
        <street>219 Union Drive</street>
        <city>Medfield</city>
        <state>CA</state>
    </Borrower>
    <note>Lender wants these back in two weeks!</note>
    <books>
        <book bookID="123-4567-890">
            <bookTitle>Earthquakes for Breakfast</bookTitle>
            <pubDate>2001-10-20</pubDate>
            <replacementValue>15.95</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
        .
        .
        .
    </books>
</transaction>

So far, we've indicated that all locals must either be qualified or unqualified. However, there is a way of specifying that individual locals be either qualified or unqualified, and you do that with the form attribute.

Here's an example; in this case, I'll leave all locals unqualified, except the bookID attribute, which I'll specify must be qualified:

<schema xmlns="http://www.w3.org/1999/XMLSchema"
    xmlns:t="http://www.starpowder.com/namespace"
    targetNamespace="http://www.starpowder.com/namespace"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified">
    <annotation>
        <documentation>
            Book borrowing transaction schema.
        </documentation>
    </annotation>
    <element name="transaction" type="t:transactionType"/>
    <complexType name="transactionType">
        <element name="Lender" type="t:address"/>
        <element name="Borrower" type="t:address"/>
        <element ref="t:note" minOccurs="0"/>
        <element name="books" type="t:books"/>
        <attribute name="borrowDate" type="date"/>
    </complexType>
    <element name="note" type="string"/>
    <complexType name="address">
        <element name="name" type="string"/>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
        <element name="state" type="string"/>
        <attribute name="phone" type="string"
            use="optional"/>
    </complexType>
    <complexType name="books">
        <element name="book" minOccurs="0" maxOccurs="10">
            <complexType>
                <element name="bookTitle" type="string"/>
                <element name="pubDate" type="date" minOccurs="0"/>
                <element name="replacementValue" type="decimal"/>
                <element name="maxDaysOut">
                    <simpleType base="integer">
                        <maxExclusive value="14"/>
                    </simpleType>
                </element>
                <attribute name="bookID" type="t:catalogID"
                    form="qualified"/>
            </complexType>
        </element>
    </complexType>
    <simpleType name="catalogID" base="string">
        <pattern value="\d{3}-\d{4}-\d{3}"/>
    </simpleType>
</schema>

Here's a document that conforms to this schema; note that all locals are unqualified, except the bookID attribute, which is qualified:

<?xml version="1.0"?>
<at:transaction xmlns:at="http://www.starpowder.com/namespace"
    borrowDate="2001-10-15">
    <Lender phone="607.555.2222">
        <name>Doug Glass</name>
        <street>416 Disk Drive</street>
        <city>Medfield</city>
        <state>MA</state>
    </Lender>
    <Borrower phone="310.555.1111">
        <name>Britta Regensburg</name>
        <street>219 Union Drive</street>
        <city>Medfield</city>
        <state>CA</state>
    </Borrower>
    <at:note>Lender wants these back in two weeks!</at:note>
    <books>
        <book at:bookID="123-4567-890">
            <bookTitle>Earthquakes for Breakfast</bookTitle>
            <pubDate>2001-10-20</pubDate>
            <replacementValue>15.95</replacementValue>
            <maxDaysOut>14</maxDaysOut>
        </book>
        .
        .
        .
    </books>
</at:transaction>
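The form attribute isn't limited to attributes; it can also be applied to a local element. As a sketch (a hypothetical variation, not part of the schema above, and written in the same working-draft syntax), suppose that only the local <name> element in the address type were declared with form="qualified":

```xml
<!-- Hypothetical fragment: only <name> overrides the schema-wide default. -->
<complexType name="address">
    <element name="name" type="string" form="qualified"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
    <element name="state" type="string"/>
    <attribute name="phone" type="string" use="optional"/>
</complexType>
```

Assuming that elementFormDefault is still "unqualified", a conforming <Lender> element would then qualify <name> and leave its siblings unqualified, along these lines:

```xml
<!-- The at prefix is bound to the schema's target namespace. -->
<Lender xmlns:at="http://www.starpowder.com/namespace"
    phone="607.555.2222">
    <at:name>Doug Glass</at:name>
    <street>416 Disk Drive</street>
    <city>Medfield</city>
    <state>MA</state>
</Lender>
```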

There's plenty more power wrapped up in schemas. For example, you can have one schema inherit functionality from another, much as you would in an object-oriented programming language, and you can restrict the inheritance process, also as you would in an object-oriented programming language. This standard is one that's still evolving and still expanding. Let's hope that it won't expand past the capabilities of XML processor authors and that we'll see more processors that will support schemas at least partially in the near future.