C.3 Schema Parts | Office 2003 XML

The simple schemas in Examples C-2 and C-3 use many pieces of XSD, and you can use them as models for future schemas, but many more options are available, even in the most readily usable subset of XSD.

C.3.1 Namespaces

The only namespace declaration to appear in either example was the namespace declaration for XSD itself:

 xmlns:xs="http://www.w3.org/2001/XMLSchema"

In this case, the schema was defining a vocabulary that was not in a namespace, so there was no need to define an additional namespace. If, as is typical, your schemas define vocabularies that are in a namespace, you'll need to define the namespace on the root xs:schema element. Example C-4 shows a slightly modified version of Example C-3, defining the vocabulary as belonging to the http://simonstl.com/ns/authors/ namespace. The changes are confined to the attributes on the xs:schema element.

Example C-4. Example C-3 rewritten to support a namespace

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://simonstl.com/ns/authors/"
           xmlns="http://simonstl.com/ns/authors/"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence minOccurs="0">
        <xs:element ref="name"/>
        <xs:element ref="nationality"/>
      </xs:sequence>
      <xs:attribute ref="id" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="nationality" type="xs:string"/>
  <xs:attribute name="id" type="xs:string"/>
</xs:schema>

All of the changes in this case are at the top. The targetNamespace attribute tells the XSD processor what namespace is being defined here, and the xmlns attribute that follows declares the default namespace to use that same namespace URI. (If you leave off the xmlns attribute, the connections between the ref attributes and their corresponding xs:element and xs:attribute declarations will break.) The elementFormDefault and attributeFormDefault attributes declare whether local elements and attributes will be namespace-qualified by default. To match typical XML 1.0 practice, elements are qualified and attributes are not.

Namespace handling in XSD can get extremely complicated if you start using unqualified elements, qualified attributes, or mixing all of them by using the form attribute on individual declarations. The easiest approaches are definitely either to work without namespaces at all or to use qualified elements and unqualified attributes.
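One consequence of these rules is easy to miss: elementFormDefault and attributeFormDefault apply only to local declarations, while anything declared directly under xs:schema belongs to the target namespace. As a minimal sketch, assuming you want instance documents to carry a plain, unprefixed id attribute, the attribute can be declared locally inside the person type rather than referenced as a global declaration. The sample document content is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://simonstl.com/ns/authors/"
           xmlns="http://simonstl.com/ns/authors/"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence minOccurs="0">
        <xs:element ref="name"/>
        <xs:element ref="nationality"/>
      </xs:sequence>
      <!-- Local attribute declaration: attributeFormDefault="unqualified"
           applies here, so instances write id="..." with no prefix. -->
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="nationality" type="xs:string"/>
</xs:schema>
```

A document that should validate against this sketch puts the elements in the default namespace and leaves the attribute unqualified:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<authors xmlns="http://simonstl.com/ns/authors/">
  <person id="jd1">
    <name>Jane Doe</name>
    <nationality>Canadian</nationality>
  </person>
</authors>
```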

It's also worth noting that you don't have to define attributes used in documents for namespace declarations. XSD doesn't consider them attributes and doesn't validate them.

C.3.2 Named and Anonymous Type Definitions

All of the types defined in Examples C-2, C-3, and C-4 were anonymous. Only the xs:element and xs:attribute declarations had names, not the xs:complexType elements. Some of the declarations referenced a named type, xs:string (a predefined datatype), but these schemas didn't create any named types of their own. If you want to create named types for the complex type content of Example C-4, you could further modularize it as shown in Example C-5.

Example C-5. Example C-4 rewritten to break out complex types

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://simonstl.com/ns/authors/"
           xmlns="http://simonstl.com/ns/authors/"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:element name="authors" type="authorsContent"/>
  <xs:complexType name="authorsContent">
    <xs:sequence>
      <xs:element maxOccurs="unbounded" ref="person"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="person" type="personContent"/>
  <xs:complexType name="personContent">
    <xs:sequence minOccurs="0">
      <xs:element ref="name"/>
      <xs:element ref="nationality"/>
    </xs:sequence>
    <xs:attribute ref="id" use="required"/>
  </xs:complexType>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="nationality" type="xs:string"/>
  <xs:attribute name="id" type="xs:string"/>
</xs:schema>

Instead of this definition of the authors element:

  <xs:element name="authors">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

the schema now uses:

  <xs:element name="authors" type="authorsContent"/>
  <xs:complexType name="authorsContent">
    <xs:sequence>
      <xs:element maxOccurs="unbounded" ref="person"/>
    </xs:sequence>
  </xs:complexType>

The actual xs:element now looks more like its simpler cousins that simply referenced a datatype, while the xs:complexType is a separate component. This approach means that the xs:complexType can be referenced by multiple elements that have the same content model, and it also means that advanced schema developers can derive additional types from the authorsContent type to create variations. (If you don't have an explicit reason to create named types, it is frequently easier to avoid them altogether.)
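The reuse described above can be sketched by adding one more element to Example C-5. The editors element is hypothetical, invented here only to show a second declaration pointing at the same named type; everything else is carried over from the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://simonstl.com/ns/authors/"
           xmlns="http://simonstl.com/ns/authors/"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Two elements share one content model through the named type. -->
  <xs:element name="authors" type="authorsContent"/>
  <xs:element name="editors" type="authorsContent"/> <!-- hypothetical -->
  <xs:complexType name="authorsContent">
    <xs:sequence>
      <xs:element maxOccurs="unbounded" ref="person"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="person" type="personContent"/>
  <xs:complexType name="personContent">
    <xs:sequence minOccurs="0">
      <xs:element ref="name"/>
      <xs:element ref="nationality"/>
    </xs:sequence>
    <xs:attribute ref="id" use="required"/>
  </xs:complexType>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="nationality" type="xs:string"/>
  <xs:attribute name="id" type="xs:string"/>
</xs:schema>
```

A later change to authorsContent would then apply to both elements at once, which is the main practical payoff of naming the type.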

C.3.3 Datatypes

The examples have been using datatypes, a special kind of named type, since Example C-2. This xs:element refers to the xs:string datatype:

<xs:element name="nationality" type="xs:string"/>

The xs:string datatype is probably the most commonly used type, and it may be okay during the early development of your schemas to define all content as being of type xs:string and then go through later and define more specific types. XSD includes over forty types that you can use without further work, described briefly below.

xs:anyURI: Contains any URL or URI as its value.
xs:base64Binary: Contains Base 64 encoded binary content, as defined in RFC 2045.
xs:boolean: Contains a true/false value, expressed as true, false, 0, or 1.
xs:byte: Contains an integer value between -128 and 127.
xs:date: Contains a date in the ISO 8601 [-]CCYY-MM-DD[Z|(+|-)hh:mm] format. The optional leading minus sign marks a year before the common era, CC is the century, YY the year, MM the month, and DD the day. The [Z|(+|-)hh:mm] is an optional time zone, where Z indicates Universal Time (UTC). For example, August 5, 2004 as experienced in London might be written 2004-08-05Z, while December 7, 1941 BC on the east coast of the United States would be written -1941-12-07-05:00.
xs:dateTime: Much like xs:date above, except that it adds time information, making the complete format [-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm], where the T is a capital letter T used as a divider, and hh:mm:ss is hours, minutes, and seconds. Hours are expressed in 24-hour time. For example, August 5, 2004 at 9:51 P.M. as experienced in London might be written 2004-08-05T21:51:00Z, while December 7, 1941 BC at 11:37:42 A.M. on the east coast of the United States would be written -1941-12-07T11:37:42-05:00.
xs:decimal: Contains a number with an optional decimal point and an arbitrary number of digits. A leading negative sign is permitted, as are any number of insignificant leading or trailing zeros. There is no restriction on the number of digits used, but scientific notation (12.04E+2, for instance) is prohibited. Legal decimals include 0, 4.624, -4.6424, 0010.1111220, and 11221523432399322146838572919572399102.556.
xs:double: A 64-bit floating point number, expressed using a decimal format with optional scientific notation, as well as the values 0 (positive zero), -0 (negative zero), INF (positive infinity), -INF (negative infinity), and NaN (not a number). Doubles are expressed internally as powers of two rather than powers of ten, so some rounding errors may appear in calculations made with doubles.
xs:duration: A length of time, expressed using the format PnYnMnDTnHnMnS. The leading P is mandatory and the T marks the boundary between date and time components; each of the other letters appears only when its component is present, and the T is omitted when there are no time components. For example, P1Y is a duration of one year, P2M is a duration of two months, P1DT2H is one day and two hours, and PT20M03S is twenty minutes and three seconds. You should probably avoid combining years or months with days and smaller units, as comparisons can become very complicated.
xs:ENTITIES: Maps to the ENTITIES type in DTDs, used for unparsed entities. This is included for completeness, but your odds of seeing or using it are slim.
xs:ENTITY: Like xs:ENTITIES, maps to the ENTITY type in DTDs, used for unparsed entities. This is included for completeness, but your odds of seeing or using it are slim.
xs:float: Exactly like xs:double, except only a 32-bit floating point space.
xs:gDay, xs:gMonth, xs:gMonthDay, xs:gYear, xs:gYearMonth: These types represent durations of calendar time with an optional time zone. The first three refer to repeating times (every 15th of the month, every June, every June 15th, respectively), while xs:gYear and xs:gYearMonth refer to specific years and months within a year (the year 2110, June 2110).
xs:hexBinary: Like xs:base64Binary, this holds encoded binary content, except that data is encoded by representing every byte in text as its two-digit hexadecimal value.
xs:ID: Maps to the DTD type ID, which is used for attribute values that must be unique within a document. Unlike its use in DTDs, it can be applied to both attribute and element content. Its value must start with a letter or underscore, and be composed of letters and numbers, underscores, periods, and hyphens.
xs:IDREF: Maps to the DTD IDREF, which is used for attribute values that must match an ID value elsewhere in the document. Unlike its use in DTDs, it can be applied to both attribute and element content. Like ID, its value must start with a letter or underscore, and be composed of letters and numbers, underscores, periods, and hyphens.
xs:IDREFS: Maps to the DTD IDREFS, and is just like xs:IDREF, except that multiple identifiers pointing to IDs may appear, separated by spaces.
xs:int: Represents 32-bit integers, in the range from -2147483648 to 2147483647. Any number of leading zeros is permitted, but no decimal points, scientific notation, INF, or NaN. Legal values include 20, -9743, 0, and 2147483645.
xs:integer: Like decimal, this represents all positive and negative integers with any number of digits allowed. No decimal point may appear. -0 and +0 are permitted, but they are considered equal. Legal values include -200, 420, and 2147483649.
xs:language: A language code like those used by the xml:lang attribute, based on RFC 1766. Values might include en-US for English as spoken in the United States, fr-CA for Canadian French, or fr for French.
xs:long: A 64-bit integer, in the range -9223372036854775808 to 9223372036854775807. No decimal points, scientific notation, INF, or NaN are permitted.
xs:Name: An XML Schema version of the XML 1.0 Name production, which must start with a letter, underscore, or colon, and be composed of letters, numbers, periods, underscores, hyphens, and colons.
xs:NCName: Like xs:Name, except that colons are prohibited.
xs:negativeInteger: Exactly like xs:integer, except that no positive integers or zero are allowed.
xs:NMTOKEN: An XML Schema version of the XML 1.0 NMTOKEN production, which allows values containing letters, numbers, periods, colons, underscores, and hyphens.
xs:NMTOKENS: Just like xs:NMTOKEN, except that multiple tokens may appear separated by whitespace.
xs:nonNegativeInteger: Exactly like xs:integer, except that negative values are prohibited. Zero is allowed.
xs:nonPositiveInteger: Exactly like xs:integer, except that positive values are prohibited. Zero is allowed.
xs:normalizedString: A string of characters that will be reported as if all whitespace characters are spaces; no tabs, linefeeds, or carriage returns will be reported to the program.
xs:NOTATION: An XML Schema version of the rarely-used XML 1.0 NOTATION type for attributes.
xs:positiveInteger: Exactly like xs:integer, except that only values greater than zero are allowed. Negative values and zero are prohibited.
xs:QName: A namespace-qualified name. The prefix used in the value must be in scope, declared in this element or in an ancestor element, and the application will be told of the namespace URI and the local portion of the name.
xs:short: A 16-bit integer, in the range -32768 to 32767. Decimal points are forbidden.
xs:string: Any legal XML text you like.
xs:time: Time information represented in a 24-hour format as hh:mm:ss[Z|(+|-)hh:mm], where hh:mm:ss is hours, minutes, and seconds and the rest is an optional timezone. For example, 9:51 A.M. as experienced in London might be written 09:51:00Z, while 11:37:42 P.M. on the east coast of the United States would be written 23:37:42-05:00.
xs:token: Just like xs:string, except that all whitespace is collapsed down to single spaces and leading and trailing whitespace is removed.
xs:unsignedByte, xs:unsignedInt, xs:unsignedLong, xs:unsignedShort: Unsigned 8-bit, 32-bit, 64-bit, and 16-bit integers, respectively. Zero is permitted in all of these, but negative numbers, decimal points, INF, and NaN are not.

XML Schema Part 2 provides a set of facilities for creating additional constraints on these datatypes using a facet-based system, but those facilities definitely deserve a book of their own. For most applications, one of these basic types will be acceptable.
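To see a few of the built-in types listed above working together, here is a minimal sketch of a schema that uses them without any facets. The shipment element and all of its children are hypothetical names chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shipment">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="shipDate" type="xs:date"/>
        <xs:element name="weightKg" type="xs:decimal"/>
        <xs:element name="packages" type="xs:positiveInteger"/>
        <xs:element name="insured" type="xs:boolean"/>
        <xs:element name="tracking" type="xs:anyURI"/>
      </xs:sequence>
      <!-- xs:ID values must be unique within the document. -->
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A document along these lines should satisfy every one of those types:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<shipment id="s-1001">
  <shipDate>2004-08-05</shipDate>
  <weightKg>12.50</weightKg>
  <packages>3</packages>
  <insured>true</insured>
  <tracking>http://example.com/track/s-1001</tracking>
</shipment>
```

Changing packages to 0 or weightKg to 1.25E1 would be reported as an error, since zero is not a positive integer and xs:decimal prohibits scientific notation.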

C.3.4 Varied Document Structures

While some XML documents, particularly those holding spreadsheet or database contents, only need to define containers and possibly a sequence, richer documents often contain a much wider variety of possibilities. Sections may be optional or appear repeatedly, but may also be replaced with a variety of different choices. Choices may themselves include or be included by sequences. XML Schema offers support for many different kinds of document structure.

Examples C-2 through C-5 all used the xs:sequence element and the minOccurs and maxOccurs attributes shown below.

  <xs:element name="person">
    <xs:complexType>
      <xs:sequence minOccurs="0">
        <xs:element ref="name"/>
        <xs:element ref="nationality"/>
      </xs:sequence>
      <xs:attribute ref="id" use="required"/>
    </xs:complexType>
  </xs:element>

The xs:sequence element is called a compositor, imposing order on its child xs:element particles. There are two other compositors available: xs:choice and xs:all. The xs:choice element permits one of a list of particles to appear, while xs:all requires that all particles must appear but doesn't put constraints on the order in which they appear. In addition to setting rules for their particles, compositors also act as a group, and you can specify minOccurs or maxOccurs for the group as a whole. (The default value for both the minOccurs and maxOccurs is one.)

If you wanted to define a person element that included both name and nationality but weren't concerned about the order in which they appeared, you could use:

  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element ref="name"/>
        <xs:element ref="nationality"/>
      </xs:all>
      <xs:attribute ref="id" use="required"/>
    </xs:complexType>
  </xs:element>

(Note that the xs:attribute isn't part of the group. Attributes are part of the type, but the compositors only apply to element content.)
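As a quick illustration of what the xs:all version accepts, the following sketch assumes the no-namespace authors schema with the person declaration swapped for the one above. The names and nationalities are invented sample data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <!-- name first, nationality second -->
  <person id="p1">
    <name>Jane Doe</name>
    <nationality>Canadian</nationality>
  </person>
  <!-- the reverse order is equally acceptable under xs:all -->
  <person id="p2">
    <nationality>Irish</nationality>
    <name>John Roe</name>
  </person>
</authors>
```

With the original xs:sequence declaration, the second person element would be rejected because nationality appears before name.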

If, on the other hand, you wanted to define a person element that could contain a choice of a name or an alias, you might use:

  <xs:element name="person">
    <xs:complexType>
      <xs:choice minOccurs="0">
        <xs:element ref="name"/>
        <xs:element ref="alias"/>
      </xs:choice>
      <xs:attribute ref="id" use="required"/>
    </xs:complexType>
  </xs:element>

The particles inside of an xs:sequence or xs:choice may be xs:element, xs:sequence, xs:choice, xs:any, or xs:group elements. (xs:all may only contain xs:element.) For example, a choice might be between an element and sequence of choices:

<xs:element name="pachinko">
  <xs:complexType>
    <xs:choice>
      <xs:element name="simple" type="xs:string"/>
      <xs:sequence>
        <xs:choice>
          <xs:element name="choice1" type="xs:string"/>
          <xs:element name="choice2" type="xs:string"/>
        </xs:choice>
        <xs:choice>
          <xs:element name="choiceA" type="xs:string"/>
          <xs:element name="choiceB" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>

In this case, the pachinko element may contain an element named simple, or it may contain the sequence. The sequence requires either a choice1 or a choice2 element (but not both), followed by either a choiceA or a choiceB element (again, not both).
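The two paths through the pachinko content model can be sketched with sample instances. The games wrapper element is hypothetical, added only so that both instances fit in one well-formed document, and the text content is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<games>
  <!-- First branch of the outer choice: the simple element alone. -->
  <pachinko>
    <simple>straight drop</simple>
  </pachinko>
  <!-- Second branch: the sequence, picking one element from each inner choice. -->
  <pachinko>
    <choice2>left</choice2>
    <choiceA>top</choiceA>
  </pachinko>
</games>
```

A pachinko element holding both choice1 and choice2, or holding choiceA before choice1, would not match either branch.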

XML Schema prohibits certain combinations of compositors, requiring that schema structures always provide a deterministic path to a particular combination of elements; the processor should never have to keep two possible choices in mind while it works out which particle a particular element matches. Most simple schemas will never encounter these problems, but more complex ones can fall afoul of them. For more detail, see Chapter 7 of Eric van der Vlist's XML Schema.
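A small sketch may make the determinism rule more concrete. The contact, name, phone, and email elements are hypothetical. The commented-out model offers two branches that both begin with name, so a processor reading a name element cannot tell which branch it has entered; the live declaration factors the shared element out so that each element matches exactly one particle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Non-deterministic form that a conforming processor should reject:
       <xs:choice>
         <xs:sequence>
           <xs:element name="name" type="xs:string"/>
           <xs:element name="phone" type="xs:string"/>
         </xs:sequence>
         <xs:sequence>
           <xs:element name="name" type="xs:string"/>
           <xs:element name="email" type="xs:string"/>
         </xs:sequence>
       </xs:choice>
  -->
  <xs:element name="contact">
    <xs:complexType>
      <!-- Deterministic rewrite: name once, then a choice of what follows. -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:choice>
          <xs:element name="phone" type="xs:string"/>
          <xs:element name="email" type="xs:string"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Both forms describe the same set of documents; only the second lets the processor decide on each element without looking ahead.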

C.3.5 When Anything Is Allowed

If you aren't concerned about what goes into a particular element or particle, you can use the xs:any element for its content and xs:anyAttribute to specify its attributes. You can limit the contents to particular namespaces using the namespace attribute and tell the schema validator to skip the contents using the processContents attribute. For example, if you wanted to create an extension element that permitted any content and had any namespaces, you might declare it like:

<xs:element name="extension">
  <xs:complexType>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:any namespace="##any" processContents="skip"/>
    </xs:sequence>
    <xs:anyAttribute namespace="##any" processContents="skip"/>
  </xs:complexType>
</xs:element>

The namespace attribute can hold a namespace URI (or URIs, separated by whitespace), as well as one of four wildcards:

##local: Only elements (or attributes, for xs:anyAttribute) in no namespace at all may appear.
##targetNamespace: Only elements (or attributes, for xs:anyAttribute) in the schema's target namespace may appear.
##any: Elements (or attributes, for xs:anyAttribute) in any namespace at all may appear.
##other: Only elements (or attributes, for xs:anyAttribute) that are not in the schema's target namespace may appear.

The xs:any element must appear within an xs:sequence or xs:choice, while the xs:anyAttribute may appear in xs:attributeGroup as well as xs:complexType and related elements.
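As a variation on the wide-open extension element, the following sketch restricts the wildcard to foreign namespaces. The note and text elements and the http://example.com/ns/notes/ namespace are hypothetical. It also uses processContents="lax", which asks the validator to check wildcard content when it can find a declaration for it and to let it pass otherwise.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/ns/notes/"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="text" type="xs:string"/>
        <!-- Any number of elements from namespaces other than the target namespace. -->
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Keeping the wildcard on ##other means that a stray element from the schema's own vocabulary cannot slip in after text unnoticed.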

C.3.6 Model Groups

If you have lots of declarations you'll be using frequently but don't need to be able to extend or restrict them, you can use the xs:group element, first to define a group of declarations and then to reference them.

For example, the declaration for the person element in Example C-3 looked like:

<xs:element name="person">
  <xs:complexType>
    <xs:sequence minOccurs="0">
      <xs:element ref="name"/>
      <xs:element ref="nationality"/>
    </xs:sequence>
    <xs:attribute ref="id" use="required"/>
  </xs:complexType>
</xs:element>

If you planned to reuse this combination of name and nationality but not the id attribute, you could create a model group holding the sequence and reference it inside the xs:complexType. The new version would look like:

<xs:element name="person">
  <xs:complexType>
    <!-- Occurrence constraints go on the reference; the compositor directly
         inside a named group may not carry minOccurs or maxOccurs. -->
    <xs:group ref="name-nationality" minOccurs="0"/>
    <xs:attribute ref="id" use="required"/>
  </xs:complexType>
</xs:element>
<xs:group name="name-nationality">
  <xs:sequence>
    <xs:element ref="name"/>
    <xs:element ref="nationality"/>
  </xs:sequence>
</xs:group>

You can do the same thing to attributes if you have a group of attributes to be applied repeatedly. To create a set of attributes referring to URLs and giving MIME types of the desired content, you might create an xs:attributeGroup like this one:

<xs:attributeGroup name="retrievalInformation">
  <xs:attribute name="href" type="xs:anyURI"/>
  <xs:attribute name="mime-type" type="xs:string"/>
</xs:attributeGroup>
<xs:element name="link">
  <xs:complexType>
    <xs:attributeGroup ref="retrievalInformation"/>
  </xs:complexType>
</xs:element>

The link element could now have attributes named href and mime-type.

The xs:group element may contain any compositor (xs:sequence, xs:choice, or xs:all) and its contents, while xs:attributeGroup is limited to containing xs:attribute, xs:attributeGroup, or xs:anyAttribute. If you need to put both elements and attributes in a group, use xs:complexType instead.
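The payoff of both kinds of group is reuse, which the following sketch tries to show in one self-contained schema. The translator element and its language child are hypothetical additions; the two groups follow the definitions discussed above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="name-nationality">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="nationality"/>
    </xs:sequence>
  </xs:group>
  <xs:attributeGroup name="retrievalInformation">
    <xs:attribute name="href" type="xs:anyURI"/>
    <xs:attribute name="mime-type" type="xs:string"/>
  </xs:attributeGroup>
  <!-- Hypothetical element reusing both groups and adding its own content. -->
  <xs:element name="translator">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="name-nationality"/>
        <xs:element name="language" type="xs:language"/>
      </xs:sequence>
      <xs:attributeGroup ref="retrievalInformation"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="nationality" type="xs:string"/>
</xs:schema>
```

Because a group reference is itself a particle, it can sit inside a larger sequence alongside ordinary element declarations, as it does here.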

C.3.7 Empty Content, Mixed Content, and Default Values

XML Schema can support a few more types of content than have been shown so far, as well as supply content to documents in some cases. The simplest case that hasn't been shown yet is the creation of an element (like br in HTML) that must always be empty. The easiest way to do this is to use an xs:complexType element that doesn't reference any elements, like this:

<xs:element name="br">
  <xs:complexType>
  </xs:complexType>
</xs:element>

If you want to add attributes, they can be placed in the xs:complexType element without changing the emptiness of the br element.
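For instance, a sketch of an empty br element that carries one attribute might look like the following. The clear attribute and its xs:NMTOKEN type are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="br">
    <!-- No compositor and no child elements, so the content must stay empty. -->
    <xs:complexType>
      <xs:attribute name="clear" type="xs:NMTOKEN"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this declaration a document could contain a br element with or without clear="all", but a br element containing text or child elements would be reported as an error.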

Another common case is mixed content, where text and elements appear on the same level of a document. A classic case is a paragraph that contains bold, italic, and underlined text. In simple HTML, this might look like:

<p>This is <b>bold</b>, this is <i>italic</i>, and this is  <u>underline</u>.</p>

To make this work, you need to create a definition of the p element that contains an xs:complexType element whose mixed attribute is set to true:

<xs:element name="p">
  <xs:complexType mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="b" type="xs:string"/>
      <xs:element name="i" type="xs:string"/>
      <xs:element name="u" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:element>

The choice will permit as many b, i, and u elements as necessary, while mixed="true" will permit text to be mingled with any of them.

If instead of these fancy features you just want to create a definition that provides a default value to an element or attribute if one is not provided, you can use the default attribute on simple element or attribute declarations. To create an element called name whose value defaults to Winky if the element is present but empty, you would write:

<xs:element name="name" default="Winky" />

To create an attribute named flavor whose value defaults to vanilla, you would write:

<xs:attribute name="flavor" default="vanilla" />

Unlike the element default, the attribute default is applied only when the attribute is absent. You can also fix the value of an attribute or element. If you insisted that the flavor must always be vanilla, you could instead use:

<xs:attribute name="flavor" fixed="vanilla" />

The flavor attribute's value will default to vanilla if the attribute isn't present in the document, and an error will be reported if a document contains a flavor attribute with any other value.
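The difference between element and attribute defaults is easier to see side by side. The following is a minimal sketch; the orders and order elements are hypothetical containers invented to hold the name element and flavor attribute from the examples above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="orders">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="order" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" default="Winky" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="flavor" type="xs:string" default="vanilla"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A schema-aware processor working from that sketch should treat the following orders this way:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <!-- name is present but empty, flavor is absent:
       name is treated as Winky and flavor as vanilla. -->
  <order><name/></order>
  <!-- name is absent, so no name is supplied at all;
       flavor is still treated as vanilla. -->
  <order/>
  <!-- Explicit values are used as written. -->
  <order flavor="mint"><name>Dobby</name></order>
</orders>
```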

C.3.8 Annotations

The last feature of XML Schema worth noting here is its support for annotations. Every single element in XML Schema permits an xs:annotation element as its first child (except xs:annotation itself, that is). The xs:annotation element may contain any number of xs:documentation and xs:appinfo elements, and the content models for both of those are wide open.

The xs:appinfo element is intended for machine-readable content, while the xs:documentation element is intended for human-readable content. Both accept a source attribute that points to a URI, and xs:documentation also accepts an xml:lang attribute that specifies the human language in which the documentation appears. At present, Office ignores both of these, but xs:documentation in particular is an opportunity for you to provide additional information in your schemas. For example, to document the flavor attribute's peculiar status, a careful schema writer might modify its definition:

<xs:attribute name="flavor" fixed="vanilla">
  <xs:annotation>
    <xs:documentation xml:lang="en-US">
      While many people like multiple flavors of ice cream,
      the manager of this project insists that everyone must
      have vanilla, and accepts no questions on the matter.
    </xs:documentation>
  </xs:annotation>
</xs:attribute>

You can also use HTML, DocBook, or the XML vocabulary of your choice within xs:documentation, and then use other programs or stylesheets to create more formal documentation using this information.
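As a sketch of both annotation children in use, the following declaration pairs machine-readable xs:appinfo content with XHTML inside xs:documentation. The ui prefix, its namespace URI, the ui:label element, and the source URI are all hypothetical, standing in for whatever vocabulary a downstream tool might expect.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ui="http://example.com/ns/ui-hints">
  <xs:attribute name="flavor" fixed="vanilla">
    <xs:annotation>
      <!-- Machine-readable hint for a hypothetical form-building tool. -->
      <xs:appinfo source="http://example.com/tools/form-builder">
        <ui:label>Flavor (locked)</ui:label>
      </xs:appinfo>
      <!-- Human-readable note marked up as XHTML. -->
      <xs:documentation xml:lang="en-US">
        <p xmlns="http://www.w3.org/1999/xhtml">The flavor is always <b>vanilla</b>.</p>
      </xs:documentation>
    </xs:annotation>
  </xs:attribute>
</xs:schema>
```

A stylesheet could later pull the xs:documentation contents into a reference page, while validators simply pass over both children.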

C.3.9 Other Features

XML Schema defines a wide variety of other features, including extension and restriction of both structural types and datatypes, combining types, inclusion and import of external schemas, substitution groups, keys for establishing uniqueness among parts of a document, a mechanism for suggesting which schema applies to a document, and attributes that let parts of a document identify which types within the schema apply to them. Office doesn't support many of these features, and many of them have complex interactions with data models. If you need more information on these features, please consult a book dedicated to XML Schema.
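Of these, the mechanism for suggesting a schema is the one you are most likely to meet in ordinary documents. As a minimal sketch, assuming the no-namespace schema from Example C-3 has been saved under the hypothetical filename authors.xsd, a document can point to it with an attribute from the XML Schema instance namespace. The sample content is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<authors xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="authors.xsd">
  <person id="jd1">
    <name>Jane Doe</name>
    <nationality>Canadian</nationality>
  </person>
</authors>
```

For vocabularies that are in a namespace, the companion xsi:schemaLocation attribute holds pairs of namespace URI and schema location instead. Both attributes are only hints, and a processor is free to ignore them in favor of a schema supplied some other way.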