The simple schemas in Examples C-2 and C-3 use many pieces of XSD, and you can use them as models for future schemas, but there are many more options available, even in the most readily usable subset of XSD.

C.3.1 Namespaces

The only namespace declaration to appear in either example was the namespace declaration for XSD itself:

    xmlns:xs="http://www.w3.org/2001/XMLSchema"

In this case, the schema was defining a vocabulary that was not in a namespace, so there was no need to define an additional namespace. If, as is typical, your schemas define vocabularies that are in a namespace, you'll need to define the namespace on the root xs:schema element. Example C-4 shows a slightly modified version of Example C-3, defining the vocabulary as belonging to the http://simonstl.com/ns/authors/ namespace.

Example C-4. Example C-3 rewritten to support a namespace

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://simonstl.com/ns/authors/"
        xmlns="http://simonstl.com/ns/authors/"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">

      <xs:element name="authors">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" ref="person"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>

      <xs:element name="person">
        <xs:complexType>
          <xs:sequence minOccurs="0">
            <xs:element ref="name"/>
            <xs:element ref="nationality"/>
          </xs:sequence>
          <xs:attribute ref="id" use="required"/>
        </xs:complexType>
      </xs:element>

      <xs:element name="name" type="xs:string"/>
      <xs:element name="nationality" type="xs:string"/>
      <xs:attribute name="id" type="xs:string"/>

    </xs:schema>

All of the changes in this case are at the top, in the attributes of the xs:schema element. The targetNamespace attribute tells the XSD processor what namespace is being defined here, and the xmlns attribute that follows declares the default namespace to use that same namespace URI. (If you leave off the xmlns attribute, the connections between the ref attributes and their corresponding xs:element and xs:attribute declarations will break.) The elementFormDefault and attributeFormDefault attributes declare whether local elements and attributes will be namespace-qualified by default. To match typical XML 1.0 practice, elements are qualified and attributes are not. Note that these two defaults apply only to local declarations. Elements and attributes declared at the top level of the schema, including the id attribute in Example C-4, always belong to the target namespace, so a document must write that attribute with a prefix bound to the namespace.
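As a rough sketch of what this means for documents, the following instance should validate against Example C-4. The author name, nationality, id values, and the a prefix are made up for illustration. Because the id attribute is declared at the top level of the schema rather than locally, it sits in the target namespace, and attributes never pick up the default namespace, so it needs an explicit prefix here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance for Example C-4; all content values are invented. -->
<authors xmlns="http://simonstl.com/ns/authors/"
         xmlns:a="http://simonstl.com/ns/authors/">
  <!-- Elements are qualified through the default namespace declaration. -->
  <person a:id="author1">
    <name>Example Author</name>
    <nationality>Example Nationality</nationality>
  </person>
  <!-- The sequence has minOccurs="0", so an empty person is also allowed. -->
  <person a:id="author2"/>
</authors>
```

If the id attribute were instead declared locally inside the xs:complexType for person, attributeFormDefault="unqualified" would apply and the attribute could be written without a prefix.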
It's also worth noting that you don't have to define attributes used in documents for namespace declarations. XSD doesn't consider them attributes and doesn't validate them.

C.3.2 Named and Anonymous Type Definitions

All of the types defined in Examples C-2, C-3, and C-4 were anonymous. Only the xs:element and xs:attribute declarations had names, not the xs:complexType elements. Some of the declarations referenced a named type, xs:string (a predefined datatype), but these schemas didn't create any named types of their own. If you want to create named types for the complex type content of Example C-4, you could further modularize it as shown in Example C-5.

Example C-5. Example C-4 rewritten to break out complex types

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://simonstl.com/ns/authors/"
        xmlns="http://simonstl.com/ns/authors/"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">

      <xs:element name="authors" type="authorsContent"/>

      <xs:complexType name="authorsContent">
        <xs:sequence>
          <xs:element maxOccurs="unbounded" ref="person"/>
        </xs:sequence>
      </xs:complexType>

      <xs:element name="person" type="personContent"/>

      <xs:complexType name="personContent">
        <xs:sequence minOccurs="0">
          <xs:element ref="name"/>
          <xs:element ref="nationality"/>
        </xs:sequence>
        <xs:attribute ref="id" use="required"/>
      </xs:complexType>

      <xs:element name="name" type="xs:string"/>
      <xs:element name="nationality" type="xs:string"/>
      <xs:attribute name="id" type="xs:string"/>

    </xs:schema>

Instead of this definition of the authors element:

    <xs:element name="authors">
      <xs:complexType>
        <xs:sequence>
          <xs:element maxOccurs="unbounded" ref="person"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

the schema now uses:

    <xs:element name="authors" type="authorsContent"/>

    <xs:complexType name="authorsContent">
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="person"/>
      </xs:sequence>
    </xs:complexType>

The actual xs:element now looks more like its simpler cousins that simply referenced a datatype, while the xs:complexType is a separate component. This approach means that the xs:complexType can be referenced by multiple elements that have the same content model, and it also means that advanced schema developers can derive additional types from the authorsContent type to create variations. (If you don't have an explicit reason to create named types, it is frequently easier to avoid them altogether.)

C.3.3 Datatypes

The examples have been using datatypes, a special kind of named type, since Example C-2. This xs:element refers to the xs:string datatype:

    <xs:element name="nationality" type="xs:string"/>

The xs:string datatype is probably the most commonly used type, and it may be okay during the early development of your schemas to define all content as being of type xs:string and then go through later and define more specific types. XSD includes over forty types that you can use without further work, described briefly below.
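As a small illustration of moving beyond xs:string, the sketch below declares a handful of elements with other built-in types. The element names are hypothetical and do not come from the earlier examples. Only the type names (xs:date, xs:integer, xs:decimal, xs:boolean, and xs:anyURI) are part of XSD itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element names, chosen only to show the built-in types. -->
  <xs:element name="birthDate" type="xs:date"/>     <!-- ISO 8601 form, such as 2003-05-12 -->
  <xs:element name="bookCount" type="xs:integer"/>  <!-- whole numbers of any size -->
  <xs:element name="price"     type="xs:decimal"/>  <!-- decimal numbers such as 19.95 -->
  <xs:element name="living"    type="xs:boolean"/>  <!-- true, false, 1, or 0 -->
  <xs:element name="homepage"  type="xs:anyURI"/>   <!-- a URI reference -->
</xs:schema>
```

A validator checks the text content of each element against its declared type, so a birthDate of "next Tuesday" would be reported as an error where xs:string would have accepted it.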
XML Schema Part 2 provides a set of facilities for creating additional constraints on these datatypes using a facet-based system, but those facilities definitely deserve a book of their own. For most applications, one of these basic types will be acceptable.

C.3.4 Varied Document Structures

While some XML documents, particularly those representing spreadsheet or database contents, only need to define containers and possibly a sequence, richer documents often contain a much wider variety of possibilities. Sections may be optional or appear repeatedly, but may also be replaced with a variety of different choices. Choices may themselves include or be included by sequences.

XML Schema offers support for many different kinds of document structure. Examples C-2 through C-5 all used the xs:sequence element and the minOccurs and maxOccurs attributes shown below.

    <xs:element name="person">
      <xs:complexType>
        <xs:sequence minOccurs="0">
          <xs:element ref="name"/>
          <xs:element ref="nationality"/>
        </xs:sequence>
        <xs:attribute ref="id" use="required"/>
      </xs:complexType>
    </xs:element>

The xs:sequence element is called a compositor, imposing order on its child xs:element particles. There are two other compositors available: xs:choice and xs:all. The xs:choice element permits one of a list of particles to appear, while xs:all requires that all particles appear but doesn't put constraints on the order in which they appear. In addition to setting rules for their particles, compositors also act as a group, and you can specify minOccurs or maxOccurs for the group as a whole. (The default value for both minOccurs and maxOccurs is one.)
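To make the idea of occurrence constraints on a whole group concrete, here is a hedged variation on the person declaration. It is not one of the book's numbered examples, only a sketch that changes the occurrence attributes on the compositor.

```xml
<xs:element name="person">
  <xs:complexType>
    <!-- The name/nationality pair as a whole may appear zero or more times.
         Within each repetition, both elements are still required, in order. -->
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="name"/>
      <xs:element ref="nationality"/>
    </xs:sequence>
    <xs:attribute ref="id" use="required"/>
  </xs:complexType>
</xs:element>
```

With this declaration a person could be empty, or could contain name, nationality, name, nationality, and so on, but never a name without the nationality that follows it.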
If you wanted to define a person element that included both name and nationality but weren't concerned about the order in which they appeared, you could use:

    <xs:element name="person">
      <xs:complexType>
        <xs:all>
          <xs:element ref="name"/>
          <xs:element ref="nationality"/>
        </xs:all>
        <xs:attribute ref="id" use="required"/>
      </xs:complexType>
    </xs:element>

(Note that the xs:attribute isn't part of the group. Attributes are part of the type, but the compositors only apply to element content.)

If, on the other hand, you wanted to define a person element that could contain a choice of a name or an alias, you might use:

    <xs:element name="person">
      <xs:complexType>
        <xs:choice minOccurs="0">
          <xs:element ref="name"/>
          <xs:element ref="alias"/>
        </xs:choice>
        <xs:attribute ref="id" use="required"/>
      </xs:complexType>
    </xs:element>

The particles inside an xs:sequence or xs:choice may be xs:element, xs:sequence, xs:choice, xs:any, or xs:group elements. (xs:all may only contain xs:element.) For example, a choice might be between an element and a sequence of choices:

    <xs:element name="pachinko">
      <xs:complexType>
        <xs:choice>
          <xs:element name="simple" type="xs:string"/>
          <xs:sequence>
            <xs:choice>
              <xs:element name="choice1" type="xs:string"/>
              <xs:element name="choice2" type="xs:string"/>
            </xs:choice>
            <xs:choice>
              <xs:element name="choiceA" type="xs:string"/>
              <xs:element name="choiceB" type="xs:string"/>
            </xs:choice>
          </xs:sequence>
        </xs:choice>
      </xs:complexType>
    </xs:element>

In this case, the pachinko element may contain an element named simple, or it may contain the sequence. The sequence requires either a choice1 or a choice2 element (but not both), followed by either a choiceA or a choiceB element (again, not both).
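As a sketch of one document that takes the sequence branch, the instance below should be acceptable to the pachinko declaration, assuming the declaration lives in a schema with no target namespace. The text content is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: one element from each inner choice, in sequence order. -->
<pachinko>
  <choice2>second option from the first choice</choice2>
  <choiceB>second option from the second choice</choiceB>
</pachinko>
```

A pachinko element containing only a simple child would also be valid, while one containing both choice1 and choice2, or a choiceA ahead of a choice1, would not.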
XML Schema prohibits certain combinations of compositors, requiring that schema structures always provide a deterministic path to a particular combination of elements; the processor should never have to keep two possible choices in mind while it works out which particle a particular element matches. Most simple schemas will never encounter these problems, but more complex ones can fall afoul of them. For more detail, see Chapter 7 of Eric van der Vlist's XML Schema.

C.3.5 When Anything Is Allowed

If you aren't concerned about what goes into a particular element or particle, you can use the xs:any element for its content and xs:anyAttribute to specify its attributes. You can limit the contents to particular namespaces using the namespace attribute and tell the schema validator to skip the contents using the processContents attribute. For example, if you wanted to create an extension element that permitted any content from any namespace, you might declare it like:

    <xs:element name="extension">
      <xs:complexType>
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:any namespace="##any" processContents="skip"/>
        </xs:sequence>
        <xs:anyAttribute namespace="##any" processContents="skip"/>
      </xs:complexType>
    </xs:element>

The namespace attribute can hold a namespace URI (or URIs, separated by whitespace), as well as one of four wildcards: ##any (any namespace, or none), ##other (any namespace other than the schema's target namespace), ##targetNamespace (the schema's target namespace), and ##local (no namespace).
The xs:any element must appear within an xs:sequence or xs:choice, while the xs:anyAttribute may appear in xs:attributeGroup as well as xs:complexType and related elements.

C.3.6 Model Groups

If you have lots of declarations you'll be using frequently but don't need to be able to extend or restrict them, you can use the xs:group element, first to define a group of declarations and then to reference them. For example, the declaration for the person element in Example C-3 looked like:

    <xs:element name="person">
      <xs:complexType>
        <xs:sequence minOccurs="0">
          <xs:element ref="name"/>
          <xs:element ref="nationality"/>
        </xs:sequence>
        <xs:attribute ref="id" use="required"/>
      </xs:complexType>
    </xs:element>

If you planned to reuse this combination of name and nationality but not the id attribute, you could create a model group holding the sequence and reference it inside the xs:complexType. The new version would look like:

    <xs:element name="person">
      <xs:complexType>
        <xs:group ref="name-nationality" minOccurs="0"/>
        <xs:attribute ref="id" use="required"/>
      </xs:complexType>
    </xs:element>

    <xs:group name="name-nationality">
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="nationality"/>
      </xs:sequence>
    </xs:group>

(The minOccurs="0" moves to the group reference because XSD doesn't permit minOccurs or maxOccurs on the compositor that sits directly inside a named xs:group.)

You can do the same thing to attributes if you have a group of attributes to be applied repeatedly. To create a set of attributes referring to URLs and giving MIME types of the desired content, you might create an xs:attributeGroup like this one:

    <xs:attributeGroup name="retrievalInformation">
      <xs:attribute name="href" type="xs:anyURI"/>
      <xs:attribute name="mime-type" type="xs:string"/>
    </xs:attributeGroup>

    <xs:element name="link">
      <xs:complexType>
        <xs:attributeGroup ref="retrievalInformation"/>
      </xs:complexType>
    </xs:element>

The link element could now have attributes named href and mime-type.
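For instance, a document might use the link element as in the sketch below. The URL and MIME type are placeholders, and the example assumes the declaration lives in a schema with no target namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance; both attributes come from the retrievalInformation group. -->
<link href="http://example.com/reports/summary.pdf" mime-type="application/pdf"/>
```

Because neither attribute declaration specifies use="required", a link element with only one of the attributes, or with none, would also be valid.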
The xs:group element may contain any compositor (xs:sequence, xs:choice, or xs:all) and its contents, while xs:attributeGroup is limited to containing xs:attribute, xs:attributeGroup, or xs:anyAttribute. If you need to put both elements and attributes in a group, use xs:complexType instead.

C.3.7 Empty Content, Mixed Content, and Default Values

XML Schema can support a few more types of content than have been shown so far, as well as supply content to documents in some cases. The simplest case that hasn't been shown yet is the creation of an element (like br in HTML) that must always be empty. The easiest way to do this is to use an xs:complexType element that doesn't reference any elements, like this:

    <xs:element name="br">
      <xs:complexType>
      </xs:complexType>
    </xs:element>

If you want to add attributes, they can be placed in the xs:complexType element without changing the emptiness of the br element.

Another common case is mixed content, where text and elements appear on the same level of a document. A classic case is a paragraph that contains bold, italic, and underlined text. In simple HTML, this might look like:

    <p>This is <b>bold</b>, this is <i>italic</i>, and this is <u>underline</u>.</p>

To make this work, you need to create a definition of the p element that contains an xs:complexType element whose mixed attribute is set to true:

    <xs:element name="p">
      <xs:complexType mixed="true">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element name="b" type="xs:string"/>
          <xs:element name="i" type="xs:string"/>
          <xs:element name="u" type="xs:string"/>
        </xs:choice>
      </xs:complexType>
    </xs:element>

The choice will permit as many b, i, and u elements as necessary, while mixed="true" will permit text to be mingled with any of them.

If instead of these fancy features you just want to create a definition that provides a default value to an element or attribute if one is not provided, you can use the default attribute on simple element or attribute declarations.
To create an element called name whose value defaults to Winky if the element is present but empty, you would write:

    <xs:element name="name" default="Winky"/>

To create an attribute named flavor whose value defaults to vanilla, you would write:

    <xs:attribute name="flavor" default="vanilla"/>

Unlike the element default, the attribute default is applied only if the attribute is absent.

You can also fix a value to an attribute or element. If you insisted that the flavor must always be vanilla, you could instead use:

    <xs:attribute name="flavor" fixed="vanilla"/>

The flavor attribute's value will default to vanilla if the attribute isn't present in the document, and an error will be reported if a document contains a flavor attribute with any other value.

C.3.8 Annotations

The last feature of XML Schema worth noting here is its support for annotations. Nearly every element in XML Schema permits an xs:annotation element as its first child (the exceptions are xs:annotation itself and its xs:documentation and xs:appinfo children). The xs:annotation element may contain any number of xs:documentation and xs:appinfo elements, and the content models for both of those are wide open. The xs:appinfo element is intended for machine-readable content, while the xs:documentation element is intended for human-readable content. Both accept a source attribute that points to a URI, and xs:documentation also accepts an xml:lang attribute that specifies the human language in which the documentation appears.

At present, Office ignores both of these, but xs:documentation in particular is an opportunity for you to provide additional information in your schemas. For example, to document the flavor attribute's peculiar status, a careful schema writer might modify its definition:

    <xs:attribute name="flavor" fixed="vanilla">
      <xs:annotation>
        <xs:documentation xml:lang="en-US">
          While many people like multiple flavors of ice cream, the
          manager of this project insists that everyone must have
          vanilla, and accepts no questions on the matter.
        </xs:documentation>
      </xs:annotation>
    </xs:attribute>

You can also use HTML, DocBook, or the XML vocabulary of your choice within xs:documentation, and then use other programs or stylesheets to create more formal documentation using this information.

C.3.9 Other Features

XML Schema defines a wide variety of other features, including extension and restriction of both structural types and datatypes, combining types, inclusion and import of external schemas, substitution groups, keys for establishing uniqueness among parts of a document, a mechanism for suggesting which schema applies to a document, and attributes that let parts of a document identify which types within the schema apply to them. Office doesn't support many of these features, and many of them have complex interactions with data models. If you need more information on these features, please consult a book dedicated to XML Schema.