17.8 Allowing Any Content | XML in a Nutshell, Third Edition

It is often necessary to allow users to include any type of markup content they see fit. Also, it is useful to tell the schema processor to validate the content of a particular element against another application's schema. Incorporating XHTML content into another document is an example of this usage.

These applications are supported by the xs:any element. This element accepts attributes that indicate what level of validation, if any, should be performed on the included content. It also accepts a namespace attribute that can be used to limit the vocabulary of the included content. For instance, going back to the address-book example, to associate a rich-text notes element with an address entry, you could add the following element declaration to the address element declaration:

<xs:element name="notes" minOccurs="0">
  <xs:complexType>
    <xs:sequence>
      <xs:any namespace="http://www.w3.org/1999/xhtml"
           minOccurs="0" maxOccurs="unbounded"
           processContents="skip"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

The attributes of the xs:any element tell the schema processor that zero or more elements belonging to the XHTML namespace (http://www.w3.org/1999/xhtml) may occur at this location; the occurrence range comes from setting minOccurs to 0 and maxOccurs to unbounded. The processContents value of skip states that these elements should be skipped: the processor checks only that the content is well-formed and does not validate it against any XHTML schema. The other possible values for the processContents attribute are lax and strict. When set to lax, the processor validates any element for which it can find a declaration and silently accepts the rest. The strict option, which is the default, requires every element to be declared and valid per the schema associated with the given namespace.
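As an illustration, an instance fragment along the following lines should be accepted by the notes declaration above. This is only a sketch: it assumes the notes element is namespace-qualified (as it would be under elementFormDefault="qualified"), and the XHTML content itself is invented for the example.

```xml
<addr:notes xmlns:addr="http://namespaces.oreilly.com/xmlnut/address">
  <!-- Each child is in the XHTML namespace, so it matches the xs:any wildcard. -->
  <p xmlns="http://www.w3.org/1999/xhtml">Prefers <em>email</em> to phone calls.</p>
  <ul xmlns="http://www.w3.org/1999/xhtml">
    <li>Out of the office on Fridays</li>
  </ul>
</addr:notes>
```

Because processContents is skip, the processor does not check whether p, em, ul, or li are used correctly. Character data placed directly inside notes would still be rejected, since the complex type is not declared as mixed.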

There is also support in schemas to declare that any attribute may appear within a given element. The xs:anyAttribute element may include the namespace and processContents attributes, which perform the same function as they do in the xs:any element. For example, adding the following markup to the address element would allow any XLink attributes to appear in an instance document:

<xs:element name="address">
  <xs:complexType>
. . .
    <xs:attributeGroup ref="addr:nationality"/>
    <xs:attribute name="ssn" type="addr:ssn"/>
    <xs:anyAttribute namespace="http://www.w3.org/1999/xlink"
        processContents="skip"/>
  </xs:complexType>
</xs:element>

This style of vocabulary mixing may seem strange given the effort that normally goes into creating constraints with schemas, but it fits well with the architecture of XLink.
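With that wildcard in place, an address element could carry XLink attributes such as the ones in this sketch. The link target is a made-up URL, and the child elements are omitted.

```xml
<addr:address xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xlink:type="simple"
    xlink:href="http://www.example.com/people/jdoe.html"
    addr:ssn="123-45-6789">
  <!-- child elements omitted -->
</addr:address>
```

Any attribute in the XLink namespace is accepted here, and because processContents is skip, its value is not checked.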

17.8.1 Using Multiple Documents

As an application grows and becomes more complex, it is important to take steps to maintain readability and extensibility. Things like separating a large schema into multiple documents, importing declarations from external schemas, and deriving new types from existing types are all typical tasks that will face designers of real-world schemas.

Just as large computer programs are separated into multiple physical source files, large schemas can be separated into smaller, self-contained schema documents. Although a single large schema could be arbitrarily separated into multiple smaller documents, taking the time to group related declarations into reusable modules can simplify future schema development.

There are three mechanisms that include declarations from external schemas for use within a given schema: xs:include, xs:redefine, and xs:import. The next three sections will discuss the differences between these methods and when and where they should be used.

17.8.1.1 Including external declarations

The xs:include element is the most straightforward way to bring content from an external schema into your own schema. To demonstrate how xs:include might be used, Example 17-12 shows a new schema document called physical-address.xsd that contains a declaration for a new complex type called physicalAddressType.

Example 17-12. physical-address.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
  xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
  attributeFormDefault="qualified" elementFormDefault="qualified">

  <xs:annotation>
    <xs:documentation xml:lang="en-us">
      Simple schema example from O'Reilly's
      <a href="http://www.oreilly.com/catalog/xmlnut">XML in a
        Nutshell.</a>
      Copyright 2004 O'Reilly Media, Inc.
    </xs:documentation>
  </xs:annotation>

  <xs:complexType name="physicalAddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" maxOccurs="3"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

The address-book.xsd schema document can include and reference this declaration:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
  xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
  attributeFormDefault="qualified" elementFormDefault="qualified">
. . .
  <xs:include schemaLocation="physical-address.xsd"/>

  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
. . .
        <xs:element name="physicalAddress"
            type="addr:physicalAddressType"/>
. . .
      </xs:sequence>
. . .
    </xs:complexType>
  </xs:element>

Content that has been included using the xs:include element is treated as though it were actually a part of the including schema document. But unlike external entities, the included document must be a valid schema in its own right. That means that it must be a well-formed XML document and have an xs:schema element as its root element. Also, the target namespace of the included schema must match that of the including document; the one exception is an included schema with no target namespace at all, which takes on the namespace of the including document. (It can include references to content defined in the including schema, however.)
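The parenthetical point about references can be sketched as follows. This is a hypothetical variation on physical-address.xsd, not the version shown in Example 17-12: the stateType name and its two-character constraint are invented for illustration. The included document refers to a type that it does not define itself:

```xml
<!-- Hypothetical variant of physical-address.xsd (the included document) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    elementFormDefault="qualified">
  <xs:complexType name="physicalAddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" maxOccurs="3"/>
      <xs:element name="city" type="xs:string"/>
      <!-- addr:stateType is not defined in this file -->
      <xs:element name="state" type="addr:stateType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The including document then supplies the missing definition:

```xml
<!-- Hypothetical including document -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    elementFormDefault="qualified">
  <xs:include schemaLocation="physical-address.xsd"/>
  <!-- Illustrative type: a two-character state code -->
  <xs:simpleType name="stateType">
    <xs:restriction base="xs:string">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

The trade-off is that the included document no longer resolves on its own, so this arrangement is probably best reserved for modules that are always used together.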

17.8.1.2 Modifying external declarations

The xs:include element allows external declarations to be included and used as-is by another schema document. But, sometimes, it is useful to extend and modify types and declarations from another schema, which is where the xs:redefine element comes in.

Functionally, the xs:redefine element works very much like the xs:include element. The major difference is that within the scope of the xs:redefine element, types from the included schema may be redefined without generating an error from the schema processor. For example, the xs:redefine element could extend the physicalAddressType type to include longitude and latitude attributes without modifying the original declaration in physical-address.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
  xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
  attributeFormDefault="qualified" elementFormDefault="qualified">
. . .
  <xs:redefine schemaLocation="physical-address.xsd">
    <xs:complexType name="physicalAddressType">
      <xs:complexContent>
        <xs:extension base="addr:physicalAddressType">
          <xs:attribute name="latitude" type="xs:decimal"/>
          <xs:attribute name="longitude" type="xs:decimal"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
. . .
</xs:schema>

17.8.1.3 Importing schemas for other namespaces

The xs:include and xs:redefine elements are useful when the declarations are all part of the same application. But as more public schemas become available, incorporating declarations from external sources into custom applications will be important. The xs:import element is provided for this purpose.

Using xs:import, it is possible to make the global types and elements that are declared by a schema belonging to another namespace accessible from within an arbitrary schema. The W3C has used this functionality to create type libraries. A sample type library was developed by the schema working group and can be viewed on the W3C web site at http://www.w3.org/2001/03/XMLSchema/TypeLibrary.xsd. The library includes schema type declarations for representing text, arrays, lists, mathematics, measured quantities, and binary data.

To use some of the types from this library in a schema, include the following xs:import element as a child of the root schema element:

<xs:import namespace="http://www.w3.org/2001/03/XMLSchema/TypeLibrary"
    schemaLocation="http://www.w3.org/2001/03/XMLSchema/TypeLibrary.xsd"/>
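Importing a namespace is only half of the job: to refer to an imported type, the importing schema also needs a prefix bound to that namespace. The following sketch shows the general pattern. The geo namespace, the geo.xsd location, the coordinateType type, and the location element are all hypothetical names chosen for illustration and are not taken from the W3C type library.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:geo="http://www.example.com/geo"
    elementFormDefault="qualified">

  <!-- Make the global components of the (hypothetical) geo namespace available -->
  <xs:import namespace="http://www.example.com/geo"
      schemaLocation="geo.xsd"/>

  <!-- Refer to an imported type through the prefix bound to its namespace -->
  <xs:element name="location" type="geo:coordinateType"/>

</xs:schema>
```

Unlike xs:include, the imported components keep their own namespace, so the location element is in the address namespace while its type is not.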

17.8.2 Derived Complex Types

We have been using the xs:extension and xs:restriction elements without going too deeply into how or why they work. The schema language provides functionality for extending existing types, which is conceptually similar to that of inheritance in object-oriented programming. The extension and restriction elements allow new types to be defined either by expanding or limiting the potential values of existing types.

17.8.2.1 Deriving by extension

When deriving a new type by extension, the resulting type is equivalent to appending the contents of the new declaration to the contents of the base declaration. For instance, the following example declares a new type called mailingAddressType that extends the physicalAddressType type to include a Zip Code:

<xs:complexType name="mailingAddressType">
  <xs:complexContent>
    <xs:extension base="addr:physicalAddressType">
      <xs:sequence>
        <xs:element name="zipCode" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

This declaration appends a required element, zipCode , to the existing physicalAddressType type. The biggest benefit of this approach is that as new declarations are added to the underlying type, the derived type will automatically inherit them.
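To make the "appending" concrete, the content model of mailingAddressType should be roughly equivalent to the hand-written type below. This is a sketch that assumes the original physicalAddressType from Example 17-12, without the redefined attributes; the name mailingAddressTypeFlattened is hypothetical.

```xml
<xs:complexType name="mailingAddressTypeFlattened">
  <xs:sequence>
    <!-- Content model contributed by the base type -->
    <xs:sequence>
      <xs:element name="street" type="xs:string" maxOccurs="3"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
    </xs:sequence>
    <!-- Content model appended by the extension -->
    <xs:sequence>
      <xs:element name="zipCode" type="xs:string"/>
    </xs:sequence>
  </xs:sequence>
</xs:complexType>
```

The two types accept the same element content, but the flattened version is not derived from physicalAddressType, so it would not track later changes to the base type or take part in the type hierarchy.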

17.8.2.2 Deriving by restriction

When a new type is a logical subset of an existing type, the xs:restriction element allows this relationship to be expressed directly. Like the xs:extension element, it allows a new type to be created based on an existing type. In the case of simple types, restriction is a straightforward application of additional constraints on the values of the base type.
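For the simple-type case, a restriction might look like the following sketch. The zipCodeType name and the pattern are assumptions made for illustration; they are not part of the address-book schema.

```xml
<!-- Hypothetical simple type: a five-digit Zip Code with an optional four-digit suffix -->
<xs:simpleType name="zipCodeType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d{5}(-\d{4})?"/>
  </xs:restriction>
</xs:simpleType>
```

Every value that is valid for zipCodeType is also a valid xs:string, which is what makes the new type a logical subset of its base.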

In the case of complex types, it is not quite so straightforward. Unlike the extension process, it is necessary to completely reproduce the parent type definition as part of the restriction definition. By omitting parts of the parent definition, the restriction element creates a new, constrained type. As an example, this xs:complexType element derives a new type from the physicalAddressType that only allows a single street element to contain the street address. The original physicalAddressType looks like:

<xs:complexType name="physicalAddressType">
  <xs:sequence>
    <xs:element name="street" type="xs:string" maxOccurs="3"/>
    <xs:element name="city" type="xs:string"/>
    <xs:element name="state" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

The restricted version looks like:

<xs:complexType name="simplePhysicalAddressType">
  <xs:complexContent>
    <xs:restriction base="addr:physicalAddressType">
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="state" type="xs:string"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

Notice that this type very closely resembles the physicalAddressType, except that the maxOccurs="3" attribute has been removed from the street element declaration, so the default of maxOccurs="1" applies.

17.8.2.3 Using derived types

One of the chief benefits of creating derived types is that the derived type may appear in place of the parent type within an instance document. (Applications that read the schema, like data binding applications, can use its type hierarchy for processing the document as well.) The xsi:type attribute tells the schema processor that the element on which it appears conforms to a type that is derived from the normal type expected. For example, take the instance document in Example 17-13, which conforms to the address schema.

Example 17-13. addressdoc.xml using a derived type

<?xml version="1.0"?>
<addr:address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://namespaces.oreilly.com/xmlnut/address
       address-schema.xsd"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    addr:language="en"
    addr:ssn="123-45-6789">
. . .
  <addr:physicalAddress addr:latitude="34.003855" addr:longitude="-81.034808"
    xsi:type="addr:simplePhysicalAddressType">
    <addr:street>1400 Main St.</addr:street>
    <addr:city>Columbia</addr:city>
    <addr:state>SC</addr:state>
  </addr:physicalAddress>
. . .
</addr:address>

Notice that the physicalAddress element has an xsi:type attribute that informs the validator that the current element conforms to the simplePhysicalAddressType , rather than the physicalAddressType that would normally be expected. This feature is particularly useful when developing internationalized applications, as distinct address types could be derived for each country and then flagged in the instance document for proper validation.
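The same mechanism should work for a type derived by extension. In this sketch, the physicalAddress element is flagged as a mailingAddressType, which makes the extra zipCode element legal. The child elements are prefixed on the assumption that the schema sets elementFormDefault="qualified", and the Zip Code value is illustrative.

```xml
<addr:physicalAddress
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="addr:mailingAddressType">
  <addr:street>1400 Main St.</addr:street>
  <addr:city>Columbia</addr:city>
  <addr:state>SC</addr:state>
  <!-- Allowed only because xsi:type selects the extended type -->
  <addr:zipCode>29201</addr:zipCode>
</addr:physicalAddress>
```

Without the xsi:type attribute, the processor would validate the element against physicalAddressType and reject the zipCode child.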

17.8.3 Substitution Groups

A feature that is closely related to derived types is the substitution group. A substitution group is a collection of elements that are all interchangeable with a particular element, called the head element, within an instance document. To create a substitution group, all that is required is that an element declaration include a substitutionGroup attribute that names the head element for that group. Then, anywhere that the head element's declaration is referenced in the schema, any member of the substitution group may also appear. Unlike derived types, it isn't necessary to use the xsi:type attribute in an instance document to identify the type of the substituted element.

The primary restriction on substitution groups is that every element in the group must be either of the same type as or derived from the head element's type. Declaring a numeric element and trying to add it to a substitution group based on a string element would generate an error from the schema processor. The elements must also be declared globally and in the target namespace of the schema.
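A minimal sketch of a substitution group follows. The phone, mobilePhone, fax, and contact elements are hypothetical names invented for this example and do not come from the address-book schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    elementFormDefault="qualified">

  <!-- Head element, declared globally -->
  <xs:element name="phone" type="xs:string"/>

  <!-- Group members: global elements of the same type as the head -->
  <xs:element name="mobilePhone" type="xs:string"
      substitutionGroup="addr:phone"/>
  <xs:element name="fax" type="xs:string"
      substitutionGroup="addr:phone"/>

  <!-- Wherever the head element is referenced, any member may appear -->
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="addr:phone" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under this schema, a contact element could contain any mix of addr:phone, addr:mobilePhone, and addr:fax children, with no xsi:type attribute needed on any of them.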