XML Schema

With the exception of the basic XML syntax, XML Schema is without a doubt the single most important technology in the XML family. In the Web services world, XML Schema is the key technology for enabling interoperation.

XML Schema is a W3C recommendation^[3] that provides a type system for XML-based computing systems. XML Schema is an XML-based language that provides a platform-independent system for describing types and interrelations between those types. Another aspect of XML Schema is to provide structuring for XML documents.

^[3] See http://www.w3.org/XML/Schema#dev for links to the XML Schema specifications.

Document Type Definitions (DTDs) were the precursor to XML Schema, and are a text-based (not XML-based) format designed to convey information about the structure of a document. Unlike XML Schema, DTDs do not concern themselves with type systems, but simply constrain documents based on their structure. Furthermore, since the DTD language is not XML-based, many of the XML-friendly tools that we use are incapable of processing DTDs. For these reasons, and because no recent Web services protocols have used DTDs, we can consider DTDs a deprecated technology in the Web services arena. Instead, XML Schema has become the dominant metadata language for Web services (and indeed for most other application areas by this time).

In fact, the analogy between XML technologies and object-orientation is clear if we compare XML documents to objects and XML Schema types to classes. XML documents that conform to a schema are known as instance documents, in the same way that objects of a particular class are known as instances. Thus we can conceptually match XML Schema schemas with classes and XML documents with objects, as shown in Figure 2-9.

Figure 2-9. Comparing XML to object-oriented model.

The conceptual relationship between an object model and XML Schema is straightforward to comprehend. Where in object-based systems classes and their interrelationships provide the blueprint for the creation and manipulation of objects, in the XML arena it is the type model expressed in XML Schema schemas that constrains documents that conform to those schemas.

Like object-oriented programming languages, XML Schema provides a number of built-in types and allows these to be extended in a variety of ways to build abstractions appropriate for particular problem domains. Each XML Schema type is represented as the set of (textual) values that instances of that type can take. For instance, the boolean type is allowed to take values of only true and false (which may also be written 1 and 0), while the short type is allowed to take any value from -32768 to 32767 inclusive. In fact, XML Schema provides 44 different built-in types specified in the http://www.w3.org/2001/XMLSchema namespace. Additionally, XML Schema allows users to develop their own types; extending and manipulating types to create content models is the very heart of XML Schema.

XML Schema and Namespaces

As we have seen, the built-in types from XML Schema are qualified with the namespace http://www.w3.org/2001/XMLSchema. We must not use this namespace when we develop our own types, in the same way that we would not develop types under the java.lang package in Java or System namespace in .Net. However, like adding package or namespace affiliations in object-oriented programming, affiliating a type with a namespace in XML Schema is straightforward. Adding a targetNamespace declaration to an XML Schema to affiliate it with a namespace is analogous to adding a package declaration to a Java class, as shown in Figure 2-10.

Figure 2-10. Adding namespace affiliation to an XML schema.

The skeletal schema shown in Figure 2-10 outlines the basic principle on which all XML Schemas operate: the schema element delimits the namespace (like the keyword package delimits the package scope for a single Java source file) and the targetNamespace attribute gives the namespace a name (like the period-separated string that follows the package keyword).

Don't be confused by the number of namespaces that exist in Figure 2-10. There are in fact only two of them, and they play three distinct roles. The default namespace is the XML Schema namespace because the elements that we use in this document, such as the root element <schema>, are from the XML Schema namespace. The targetNamespace attribute declares the namespace with which the types declared in this schema will be affiliated. Finally, the explicit namespace prefix tns (an abbreviation of Target NameSpace) allows types and elements within this schema to reference one another and, hence, it is bound to the same URI as the targetNamespace attribute.
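The three roles can be tried out in a few lines of Java. The following is a minimal sketch, not the content of Figure 2-10: the RegionType type, the region element, and the NamespaceRolesSketch class are hypothetical names chosen for illustration, and validation is done with the JDK's javax.xml.validation API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class NamespaceRolesSketch {
    // Hypothetical skeletal schema showing the three namespace roles.
    static final String SCHEMA = String.join("\n",
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'",   // default: XML Schema vocabulary
        "        targetNamespace='http://dvd.example.com'",   // namespace of the declared types
        "        xmlns:tns='http://dvd.example.com'",         // prefix for internal references
        "        elementFormDefault='qualified'>",
        "  <simpleType name='RegionType'>",
        "    <restriction base='positiveInteger'/>",          // resolved via the default namespace
        "  </simpleType>",
        "  <element name='region' type='tns:RegionType'/>",   // resolved via the tns prefix
        "</schema>");

    /** Returns true when documentText validates against schemaText. */
    static boolean isValid(String schemaText, String documentText) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(documentText)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Element in the target namespace: should be accepted.
        System.out.println(isValid(SCHEMA, "<region xmlns='http://dvd.example.com'>2</region>"));
        // Same element name but in no namespace: should be rejected.
        System.out.println(isValid(SCHEMA, "<region>2</region>"));
    }
}
```

Writing type='RegionType' without the tns prefix would instead look the name up in the default (XML Schema) namespace, where no such type exists, which is why the prefix is needed.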

A First Schema

Now that we understand the basics of XML Schema construction, we can write a simple schema with which we can constrain a document. This simple schema example does not explore any of the type-system features of XML Schema, but instead concentrates on constraining a simple document as a first step. Drawing on our experience with DVD documents earlier in this chapter we will create a schema that can validate a given DVD document. Let's recap the document that we want to constrain in Figure 2-11:

Figure 2-11. An XML document containing DVD information.

    <?xml version="1.0" encoding="utf-8"?>
    <dvd xmlns="http://dvd.example.com" region="2">
      <title>The Phantom Menace</title>
      <year>2001</year>
    </dvd>

If we analyze the document in Figure 2-11, we see that it contains an element called dvd, which itself contains two elements, title and year, which are all qualified with the namespace http://dvd.example.com. From this, we immediately know that the targetNamespace is http://dvd.example.com. We also know that the schema requires two nested elements and a globally scoped element, and so we can construct the schema, as shown in Figure 2-12:

Figure 2-12. A first DVD schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://dvd.example.com"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
      <element name="dvd">
        <complexType>
          <sequence>
            <element name="title" type="string"/>
            <element name="year" type="positiveInteger"/>
          </sequence>
          <attribute name="region" type="positiveInteger"/>
        </complexType>
      </element>
    </schema>

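Since the elements in the document in Figure 2-11 have a namespace that matches the targetNamespace of the schema in Figure 2-12, the schema applies to the document. The document is a valid instance of the schema because its structure and values also satisfy the declarations described in the following paragraphs.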

The line <element name="dvd">, which opens the schema body, dictates that the instance document must have an opening element with the name dvd.

The conventional style for XML Schema documents is to declare the opening schema element with elementFormDefault="qualified" and attributeFormDefault="unqualified" so that elements in instance documents must be namespace qualified by default, while attributes carry no namespace qualification.

The schema then goes on to declare that there should be a sequence of two nested elements within that first dvd element, called title and year, respectively. Specifying this is done with four elements. The first of these is the complexType element which indicates that the parent dvd element consists of other elements nested within it. Inside the complexType element we see a sequence element. A sequence element places the constraint on any conformant document that elements nested within must follow the same sequence as the schema. In this case, since the elements nested within the sequence are the title element followed by the year element, conformant documents must also specify title before year. The title element must contain information in string form because its type attribute is set to the string type from the XML Schema namespace. Similarly, the year element specifies that its information must be encoded as an XML Schema positiveInteger type.

The final aspect of this schema is to declare that the outermost dvd element can carry an attribute that holds region information. This is done with the <attribute> element, which declares an attribute called region whose value must be of type positiveInteger. As written, the attribute is optional; adding use="required" to the declaration would make it mandatory.
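The walk-through above can be checked mechanically. The following is a minimal sketch that feeds the schema from Figure 2-12 and the document from Figure 2-11 to the JDK's javax.xml.validation API; the DvdSchemaCheck class and its dvd helper are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class DvdSchemaCheck {
    // The schema from Figure 2-12 (XML declaration omitted; single quotes
    // are used so the text fits in Java string literals).
    static final String SCHEMA = String.join("\n",
        "<schema xmlns='http://www.w3.org/2001/XMLSchema'",
        "        targetNamespace='http://dvd.example.com'",
        "        elementFormDefault='qualified'",
        "        attributeFormDefault='unqualified'>",
        "  <element name='dvd'>",
        "    <complexType>",
        "      <sequence>",
        "        <element name='title' type='string'/>",
        "        <element name='year' type='positiveInteger'/>",
        "      </sequence>",
        "      <attribute name='region' type='positiveInteger'/>",
        "    </complexType>",
        "  </element>",
        "</schema>");

    // Hypothetical helper: builds a dvd element with the given attributes and children.
    static String dvd(String attributes, String children) {
        return "<dvd xmlns='http://dvd.example.com'" + attributes + ">" + children + "</dvd>";
    }

    /** Returns true when documentText validates against schemaText. */
    static boolean isValid(String schemaText, String documentText) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(documentText)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String title = "<title>The Phantom Menace</title>";
        String year = "<year>2001</year>";
        System.out.println(isValid(SCHEMA, dvd(" region='2'", title + year))); // Figure 2-11
        System.out.println(isValid(SCHEMA, dvd(" region='2'", year + title))); // wrong order
        System.out.println(isValid(SCHEMA, dvd("", title + year)));            // no region attribute
    }
}
```

The first and third documents should be accepted and the second rejected: the sequence compositor fixes the order of title and year, while an attribute declared without use="required" may be omitted.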

While we can now begin to create simple schemas to constrain simple documents, scaling this approach to large schemas and large documents is usually impractical and undesirable. Instead we need to look beyond the document (which, after all, is only the serialized, readable form of XML) to the real power of XML Schema: its type system.

Implementing XML Schema Types

The real beauty of XML Schema is that once a document has been validated against a schema, it becomes more than just a set of elements and tags: it becomes a set of types and instances. The elements contained within a document are processed and the type and instance information from them is exposed to the consuming software agent. After validation, the information contained in an XML document is called a Post-Schema-Validation Infoset (PSVI), or usually just an Infoset. Infosets make it possible to reflect over the logical contents of a document, just like in some object-oriented programming languages, and so the power of XML Schema as a platform-independent type system is revealed. To demonstrate, let's start to build some types and see how the (logical) type system works with the (physical) document.

Creating Simple Types via Restriction

XML Schema provides a total of 44 simple types with which to build content models. However, unlike simple types in most programming languages, in XML Schema these types can be used as base types for the creation of specialized subtypes. There is a key difference though when we define a subtype of a simple type in XML Schema, in that we do not change the structure of the type (as we would do when we inherit from a base class in Java), but instead change the subset of values that the subtype can handle. For instance we might specify a subtype of the simple type string that can only be used to hold a value that represents a postcode. Similarly we might restrict the date type to valid dates within a particular century.

We create a subtype of a simple type in XML Schema using the restriction element. Within this element, we specify the name of the simple type whose set of permissible values we will be restricting (known as the base type) and how exactly the restriction will be applied. Restrictions are then specified by constraining facets of the base simple type, where the set of available facets in XML Schema is shown in Figure 2-13.^[4]

^[4] Information from Part 2 of the XML Schema Specification at http://www.w3.org/TR/xmlschema-2/

Figure 2-13. XML schema facets.
Facet Element	Description
`length`	Specifies the number of characters in a string-based type, the number of octets in a binary-based type, or the number of items in a list-based type.
`minLength`	Specifies a minimum length. For string datatypes, minLength is measured in units of characters. For hexBinary and base64Binary datatypes, minLength is measured in octets of binary data. For list-based datatypes, minLength is measured in number of list items.
`maxLength`	Specifies a maximum length. For string datatypes, maxLength is measured in units of characters. For hexBinary and base64Binary datatypes, maxLength is measured in octets of binary data. For list-based datatypes, maxLength is measured in number of list items.
`pattern`	Constrains the value to any value matching a specified regular expression.
`enumeration`	Constrains the value to one of a specified set of fixed values.
`whiteSpace`	Sets rules for the normalization of white space in types.
`maxInclusive`	Constrains a type's value space to values with a specific inclusive upper bound.
`maxExclusive`	Constrains a type's value space to values with a specific exclusive upper bound.
`minInclusive`	Constrains a type's value space to values with a specific inclusive lower bound.
`minExclusive`	Constrains a type's value space to values with a specific exclusive lower bound.
`fractionDigits`	For decimal types, specifies the maximum number of decimal digits to the right of the decimal point.
`totalDigits`	For number types, specifies the maximum number of digits.

Each of the facets shown in Figure 2-13 allows us to constrain simple types in a different way. For example, to create a simple type that can be used to validate a British postal code, we would constrain a string type using the pattern facet with a (complicated) regular expression as shown in Figure 2-14.

Figure 2-14. The pattern facet.

    <xs:simpleType name="PostcodeType">
      <xs:restriction base="xs:string">
        <xs:pattern value="(GIR 0AA)|((([A-Z][0-9][0-9]?)|(([A-Z][A-HJ-Y][0-9][0-9]?)|(([A-Z][0-9][A-Z])|([A-Z][A-HJ-Y][0-9]?[A-Z])))) [0-9][A-Z]{2})"/>
      </xs:restriction>
    </xs:simpleType>

The pattern specified in Figure 2-14 allows only values that match the British postal code standard, such as SW1A 2AA (the Prime Minister's residence at 10 Downing Street) or W1A 1AE (the American Embassy in London). Formally, these rules are defined by the British Post Office^[5] as:

^[5] http://www.govtalk.gov.uk/gdsc/schemaHtml/BS7666-v1-xsd-PostCodeType.htm

The first part of the code before the space character (known as the outward code) can be 2, 3 or 4 alpha-numeric characters followed by a space and the second part of the code (the inward code), which is 3 characters long and is always 1 digit followed by 2 alpha-characters. Permitted combinations according to the PostcodeType type are: AN NAA, ANN NAA, AAN NAA, AANN NAA, ANA NAA, AANA NAA, (where A=alpha character and N=numeric character).
The letters I and Z are not used in the second alpha position (except GIR 0AA which is an historical anomaly in the British postal system).
The second half of the code never uses the letters C, I, K, M, O, and V.

Any divergence from this form will mean that the element is not a valid PostcodeType instance.
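To see the pattern facet at work, the following minimal sketch validates a few candidate values against the PostcodeType from Figure 2-14 using the JDK's javax.xml.validation API. The postcode element declaration and the PostcodeCheck class are assumptions added for the example; only the simpleType comes from the figure.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class PostcodeCheck {
    // Regular expression copied from Figure 2-14.
    static final String PATTERN =
        "(GIR 0AA)|((([A-Z][0-9][0-9]?)|(([A-Z][A-HJ-Y][0-9][0-9]?)|"
        + "(([A-Z][0-9][A-Z])|([A-Z][A-HJ-Y][0-9]?[A-Z])))) [0-9][A-Z]{2})";

    // PostcodeType plus a hypothetical 'postcode' element that uses it.
    static final String SCHEMA = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:simpleType name='PostcodeType'>",
        "    <xs:restriction base='xs:string'>",
        "      <xs:pattern value='" + PATTERN + "'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:element name='postcode' type='PostcodeType'/>",
        "</xs:schema>");

    /** Returns true when the value is accepted as a PostcodeType instance. */
    static boolean isPostcode(String value) {
        return isValid(SCHEMA, "<postcode>" + value + "</postcode>");
    }

    /** Returns true when documentText validates against schemaText. */
    static boolean isValid(String schemaText, String documentText) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(documentText)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"W1A 1AE", "GIR 0AA", "w1a 1ae", "W1A1AE"}) {
            System.out.println(candidate + " -> " + isPostcode(candidate));
        }
    }
}
```

Because the base type is string, white space and letter case are significant: the lower-case and unspaced variants should be rejected.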

Similarly, we might want to create an enumeration where only specific values are allowed within a type, such as those for currencies. An example of this is shown in Figure 2-15, where the XML Schema string type is restricted to allow only certain values that represent a number of world currencies:

Figure 2-15. The enumeration facet.

    <xs:simpleType name="CurrencyType">
      <xs:restriction base="xs:string">
        <xs:enumeration value="GBP"/>
        <xs:enumeration value="AUD"/>
        <xs:enumeration value="USD"/>
        <xs:enumeration value="CAD"/>
        <xs:enumeration value="EUR"/>
        <xs:enumeration value="YEN"/>
      </xs:restriction>
    </xs:simpleType>

The CurrencyType declared in Figure 2-15 would validate elements such as <my-currency>GBP</my-currency>, but would not validate <your-currency>DM</your-currency> since the string DM is not part of this simpleType restriction (nor, for that matter, are Deutsche Marks any longer legal tender).

Continuing the monetary theme, we can create a StockPriceType type in which the number of digits after the decimal point is at most 2. In Figure 2-16 we restrict the XML Schema decimal type accordingly. This type can then be used to validate elements of the form <msft>25.52</msft> and <sunw>3.7</sunw>:

Figure 2-16. The fractionDigits facet.

    <xs:simpleType name="StockPriceType">
      <xs:restriction base="xs:decimal">
        <xs:fractionDigits value="2"/>
      </xs:restriction>
    </xs:simpleType>

To specify sizes of allowed values, we use the length, maxLength, and minLength facets. For instance, a sensible precaution to take when creating computer passwords is to mandate a minimum length for security and a maximum length for ease of use (and thus indirectly for security). In XML Schema, we can use the maxLength and minLength facets to create a PasswordType as shown in Figure 2-17:

Figure 2-17. maxLength and minLength facets.

    <xs:simpleType name="PasswordType">
      <xs:restriction base="xs:string">
        <xs:minLength value="6"/>
        <xs:maxLength value="10"/>
      </xs:restriction>
    </xs:simpleType>

When applied to an element in a document, the PasswordType in Figure 2-17 allows values like <password>kather1ne</password>, but does not allow for values such as <password>carol</password> based on the number of characters contained in the element. Of course if a particularly overbearing system administration policy was put into place, we could end up having passwords of a long, fixed length using the length facet instead of minLength and maxLength.

In much the same way that we set the maximum and minimum number of characters with the maxLength, minLength and length facets, we can also specify the maximum and minimum values. Specifying a range of values is achieved with the maxInclusive, minInclusive, minExclusive and maxExclusive facets. For instance, we may wish to define the range of seconds in a minute for timing purposes. A simpleType called SecondsType is shown in Figure 2-18, where the int type from XML Schema is constrained to accept the values from 0 (inclusive) to 59 (60 exclusive):

Figure 2-18. minInclusive and maxExclusive facets.

    <xs:simpleType name="SecondsType">
      <xs:restriction base="xs:int">
        <xs:minInclusive value="0"/>
        <xs:maxExclusive value="60"/>
      </xs:restriction>
    </xs:simpleType>

Similarly we might want to define the years in a particular century, as we see in Figure 2-19, where the years that are part of the 20th century are captured as being positive integers (which have the range from 1 upward) from 1901 (1900 exclusive) through to 2000 (inclusive):

Figure 2-19. minExclusive and maxInclusive facets.

    <xs:simpleType name="TwentiethCenturyType">
      <xs:restriction base="xs:positiveInteger">
        <xs:minExclusive value="1900"/>
        <xs:maxInclusive value="2000"/>
      </xs:restriction>
    </xs:simpleType>
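The difference between inclusive and exclusive bounds is easiest to see at the boundaries themselves. The following minimal sketch validates boundary values against SecondsType and TwentiethCenturyType with the JDK's javax.xml.validation API; the seconds and year element declarations and the RangeFacetCheck class are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class RangeFacetCheck {
    // The two types from Figures 2-18 and 2-19, plus hypothetical elements using them.
    static final String SCHEMA = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:simpleType name='SecondsType'>",
        "    <xs:restriction base='xs:int'>",
        "      <xs:minInclusive value='0'/>",
        "      <xs:maxExclusive value='60'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:simpleType name='TwentiethCenturyType'>",
        "    <xs:restriction base='xs:positiveInteger'>",
        "      <xs:minExclusive value='1900'/>",
        "      <xs:maxInclusive value='2000'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:element name='seconds' type='SecondsType'/>",
        "  <xs:element name='year' type='TwentiethCenturyType'/>",
        "</xs:schema>");

    /** Returns true when an element with the given name and text content validates. */
    static boolean accepts(String elementName, String value) {
        String document = "<" + elementName + ">" + value + "</" + elementName + ">";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(document)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("seconds 59: " + accepts("seconds", "59"));
        System.out.println("seconds 60: " + accepts("seconds", "60"));
        System.out.println("year 1900: " + accepts("year", "1900"));
        System.out.println("year 2000: " + accepts("year", "2000"));
    }
}
```

An exclusive bound rejects the boundary value itself (60 and 1900), whereas an inclusive bound accepts it (0 and 2000).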

The totalDigits facet puts an upper limit on the number of digits that a number-based type can contain. For example a year number, for around the next 8000 years, contains a total of four digits. Thus, we can create a simple year type using the totalDigits facet to constrain the number of digits to four, as shown in Figure 2-20 where the positiveInteger type from XML Schema is restricted to those positive integers which have at most 4 digits:

Figure 2-20. The totalDigits facet.

    <xs:simpleType name="YearType">
      <xs:restriction base="xs:positiveInteger">
        <xs:totalDigits value="4"/>
      </xs:restriction>
    </xs:simpleType>

The final facet for restricting the value space of simple types is whiteSpace. This facet allows a simple type implementer to specify how any white space (tabs, spaces, carriage returns, and so on) is handled when it appears inside elements. There are three options for the whiteSpace facet: preserve (the XML processor will not remove any white space characters), replace (the XML processor will replace all white space with spaces), and collapse (same as replace, but with leading and trailing white space removed and each run of consecutive spaces reduced to a single space).

Often the whiteSpace facet is applied along with other facets to deal with extraneous white space. For instance if we add a whiteSpace facet to the YearType from Figure 2-20, the XML processor that processes instances of this type can deal with any unimportant white space in it. This is shown in Figure 2-21, where the whiteSpace facet is set to collapse, which effectively rids the value of any unwanted white space after it has been processed:

Figure 2-21. The whiteSpace facet.

    <xs:simpleType name="YearType">
      <xs:restriction base="xs:positiveInteger">
        <xs:totalDigits value="4"/>
        <xs:whiteSpace value="collapse"/>
      </xs:restriction>
    </xs:simpleType>

So, if the XML processor receives an element of type YearType such as:

    <moon-landing>     1969

    </moon-landing>

the whiteSpace collapse facet will effectively reduce it to <moon-landing>1969</moon-landing>.

The built-in simple type normalizedString automatically replaces any line feeds, carriage returns, or tabs in its text with spaces.

Simple Type: List and Union

Though restriction is one means of creating new simple types, it is not the only way. XML Schema supports two additional mechanisms for creating new simple types: union and list.

Both union and list are aggregation mechanisms, and so there is no type hierarchy. Therefore we cannot "cast" between base type and union or list type as we can with types derived through restriction.

The list mechanism is the simpler of the two to understand. In short, simple types created via the list mechanism are a white space-delimited list of values from the base type. For example, we can create a list of instances of YearType from Figure 2-20 to create the YearsType as shown in Figure 2-22:

Figure 2-22. Creating new simple types with list.

    <xs:simpleType name="YearType">
      <xs:restriction base="xs:positiveInteger">
        <xs:whiteSpace value="collapse"/>
        <xs:totalDigits value="4"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType name="YearsType">
      <xs:list itemType="YearType"/>
    </xs:simpleType>

The YearsType type defined in Figure 2-22 can then be used to validate instances of the YearsType such as the WWII element in Figure 2-23.

Figure 2-23. An instance of the YearsType type.

 <WWII> 1939 1940 1941 1942 1943 1944 1945 1946</WWII>
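As a quick check of the list mechanism, the following minimal sketch validates white space-delimited values against the YearsType from Figure 2-22 using the JDK's javax.xml.validation API. The WWII element declaration and the YearListCheck class are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class YearListCheck {
    // The types from Figure 2-22 plus a hypothetical WWII element of type YearsType.
    static final String SCHEMA = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:simpleType name='YearType'>",
        "    <xs:restriction base='xs:positiveInteger'>",
        "      <xs:whiteSpace value='collapse'/>",
        "      <xs:totalDigits value='4'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:simpleType name='YearsType'>",
        "    <xs:list itemType='YearType'/>",
        "  </xs:simpleType>",
        "  <xs:element name='WWII' type='YearsType'/>",
        "</xs:schema>");

    /** Returns true when the given text is accepted as a YearsType list. */
    static boolean isYearList(String years) {
        String document = "<WWII>" + years + "</WWII>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(document)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isYearList(" 1939 1940 1941 1942 1943 1944 1945 1946")); // Figure 2-23
        System.out.println(isYearList("1939 19450")); // second item has five digits
        System.out.println(isYearList("1939 194x"));  // second item is not a number
    }
}
```

Each item in the list is checked against YearType individually, so a single bad item should invalidate the whole list.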

The union mechanism is slightly more subtle than the list. It combines the value spaces of two or more types into the value space of a single new simple type. For instance, imagine we have two simple types that represent fruits and vegetables, respectively, as shown in Figure 2-24:

Figure 2-24. FruitType and VegetableType simple types.

    <xs:simpleType name="FruitType">
      <xs:restriction base="xs:string">
        <xs:enumeration value="ORANGE"/>
        <xs:enumeration value="APPLE"/>
        <xs:enumeration value="BANANA"/>
        <xs:enumeration value="KIWI"/>
      </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="VegetableType">
      <xs:restriction base="xs:string">
        <xs:enumeration value="POTATO"/>
        <xs:enumeration value="CABBAGE"/>
        <xs:enumeration value="TURNIP"/>
        <xs:enumeration value="LEEK"/>
      </xs:restriction>
    </xs:simpleType>

We can use the FruitType and VegetableType types in Figure 2-24 to create a FruitAndVegetableType via a union as shown here in Figure 2-25:

Figure 2-25. Creating a new simple type via a union.

    <xs:simpleType name="FruitAndVegetableType">
      <xs:union memberTypes="FruitType VegetableType"/>
    </xs:simpleType>

The resulting FruitAndVegetableType type can be used to validate elements such as <organically-grown>BANANA</organically-grown> and <menu-item>POTATO</menu-item> because both BANANA and POTATO are valid values for the FruitAndVegetableType type.
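The union can be exercised in the same way as the earlier types. The following minimal sketch validates values against FruitAndVegetableType using the JDK's javax.xml.validation API; the item element declaration and the UnionCheck class are assumptions added for the example, and the enumerations are shortened to two values each to keep the listing brief.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class UnionCheck {
    // Shortened versions of the types in Figures 2-24 and 2-25,
    // plus a hypothetical 'item' element of the union type.
    static final String SCHEMA = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:simpleType name='FruitType'>",
        "    <xs:restriction base='xs:string'>",
        "      <xs:enumeration value='APPLE'/>",
        "      <xs:enumeration value='BANANA'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:simpleType name='VegetableType'>",
        "    <xs:restriction base='xs:string'>",
        "      <xs:enumeration value='POTATO'/>",
        "      <xs:enumeration value='LEEK'/>",
        "    </xs:restriction>",
        "  </xs:simpleType>",
        "  <xs:simpleType name='FruitAndVegetableType'>",
        "    <xs:union memberTypes='FruitType VegetableType'/>",
        "  </xs:simpleType>",
        "  <xs:element name='item' type='FruitAndVegetableType'/>",
        "</xs:schema>");

    /** Returns true when the value belongs to either member type of the union. */
    static boolean isFruitOrVegetable(String value) {
        String document = "<item>" + value + "</item>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(document)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"BANANA", "POTATO", "PIZZA"}) {
            System.out.println(candidate + " -> " + isFruitOrVegetable(candidate));
        }
    }
}
```

A value is accepted if it is valid for at least one member type, so BANANA and POTATO should pass while PIZZA, which belongs to neither enumeration, should not.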

Simple Type Support in Programming Languages

The XML Schema support for simple user-defined types that allow custom value and lexical spaces is a powerful aspect of the technology. However, since most programming languages do not support this feature, programmers have typically had to produce properties, accessors, and mutators that constrain the value space by manually checking values and throwing exceptions where constraints have been violated. For example, take the Java equivalent of the YearType type (from Figure 2-20) shown in Figure 2-26:

Figure 2-26. Value and Lexical handling with Java's primitive types.

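    public class Year {
      // Current year value; only ever changed through setValue().
      private int _value;

      public int getValue() {
        return _value;
      }

      // Accepts only the values YearType accepts: positive integers
      // with at most four digits.
      public void setValue(int value) throws InvalidValueException {
        if (value >= 1 && value <= 9999) {
          _value = value;
        } else {
          // Invalid year
          throw new InvalidValueException();
        }
      }
    }

    // Checked exception signalling a value outside the permitted range.
    class InvalidValueException extends Exception {
    }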

The Year class in Figure 2-26 is somewhat lengthier than the equivalent XML Schema simple type since it has to handle the value space imperatively rather than declaratively. To deal with the lexical space of year instances, we need to manually check the possible values and report back to the user when an invalid value is encountered as exemplified in the Year.setValue(int) method.

Writing these kinds of classes by hand is long-winded and prone to error. Of course we could provide tool support to deal with these issues (like the xsd.exe tool from the .Net platform toolkit), but if we are dealing with schematized XML documents, it happens that we don't necessarily need to. Consider the diagram in Figure 2-27 of a typical XML-enabled software agent (which could be a standalone application, a database, or more likely a Web service) that communicates with its environment through schematized XML documents.

Figure 2-27. Delegating Value/Lexical space error handling to the XML processor.

The ability to define custom value/lexical spaces that fit our precise needs means that it is possible to delegate constraint checking of values in an XML document to the XML processor. Once the XML processor has produced an Infoset for the program to consume, the XML document that the Infoset was created from must have passed validation by its schema, and so the value and lexical constraints placed on the document must be satisfied. Knowing this, the developer of the consuming program no longer has to write lengthy constraint-checking code, since this would replicate work that the XML processor already undertakes. Thus using schemas can remove some of the burden of manually checking values in our code, though it is no substitute for programming defensively!

Complex Types

As well as creating specialized versions of the XML Schema simple types, we can also create new complex types by aggregating existing types into a structure. XML Schema supports three means of aggregating types with three different complex type compositors: sequence, choice, and all, whose semantics are outlined in Figure 2-28.

Figure 2-28. complexType compositors.
Compositor	Description
`sequence`	Specifies that the contents of the complex type must appear as an ordered list.
`choice`	Allows a choice of any of the contents of the complex type.
`all`	Specifies that the contents of the complex type appear as an unordered list.

While the semantics of the compositors vary, the syntax of each is quite similar. To use any of the compositors, we simply declare a new complex type with a compositor as its child element, as shown here in Figure 2-29:

Figure 2-29. Declaring a new complexType using the sequence compositor.

    <xs:complexType name="AddressType">
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="state" type="xs:string"/>
        <xs:element name="post-code" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="business-address" type="xs:boolean"/>
    </xs:complexType>

In Figure 2-29 we create a new complexType called AddressType by aggregating five elements of type string which represent a mailing address, and a single attribute of type boolean which is used to indicate whether this address is business or residential.

In the scope of a sequence compositor, each contained element must appear exactly once by default. If more flexibility is needed, then we can add the minOccurs and maxOccurs attributes to each contained element. The minOccurs attribute is set to a value greater than or equal to 0 which then specifies the minimum number of occurrences for its element within the compositor. The maxOccurs attribute specifies the maximum number of elements that should appear in the compositor from 1 to the special value unbounded (which is logically an infinite number of times).

With the AddressType in Figure 2-29, we can now validate elements such as the address element in Figure 2-30:

Figure 2-30. A valid instance of the AddressType type.

    <address>
      <number>221b</number>
      <street>Baker Street</street>
      <city>London</city>
      <state>N/A</state>
      <post-code>NW1 6XE</post-code>
    </address>

The all compositor is similar to the sequence compositor except that the ordering constraint is relaxed. Therefore, while the elements contained within an all compositor must be present, the order in which they appear is unimportant from the point of view of the XML processor.

The minOccurs and maxOccurs attributes do not make sense in the scope of an all compositor since (for example) it is impossible to specify that an instance document should contain all the instances of a maxOccurs="unbounded" element! Instead, omitting these attributes gives us the default semantics of exactly one element per compositor. The only exception here is that minOccurs="0" can be used to specify optional elements.

An example of the all compositor is shown in Figure 2-31, where the PurchaseOrderType type is presented. The PurchaseOrderType uses the all compositor to create an aggregate structure containing mandatory order-number and item elements, and an optional description element (specified by the minOccurs="0" attribute):

Figure 2-31. Using the all compositor.

    <xs:complexType name="PurchaseOrderType">
      <xs:all>
        <xs:element name="order-number"
          type="xs:positiveInteger"/>
        <xs:element name="item" type="xs:string"/>
        <xs:element name="description" type="xs:string"
          minOccurs="0"/>
      </xs:all>
    </xs:complexType>

The PurchaseOrderType type from Figure 2-31 can be used to validate the instances shown in Figure 2-32, where we see instances both where the description element is missing and where it is present:

Figure 2-32. Valid PurchaseOrderType instances.

    <purchase-order>
      <order-number>1002</order-number>
      <item>11025-32098</item>
      <description>Personal MP3 Player</description>
    </purchase-order>

    <purchase-order>
      <item>44045-23112</item>
      <order-number>5290</order-number>
    </purchase-order>
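The behavior of the all compositor can be confirmed with a short program. The following minimal sketch validates purchase orders against the PurchaseOrderType from Figure 2-31 using the JDK's javax.xml.validation API; the purchase-order element declaration and the AllCompositorCheck class are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class AllCompositorCheck {
    // PurchaseOrderType from Figure 2-31 plus a hypothetical element using it.
    static final String SCHEMA = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:complexType name='PurchaseOrderType'>",
        "    <xs:all>",
        "      <xs:element name='order-number' type='xs:positiveInteger'/>",
        "      <xs:element name='item' type='xs:string'/>",
        "      <xs:element name='description' type='xs:string' minOccurs='0'/>",
        "    </xs:all>",
        "  </xs:complexType>",
        "  <xs:element name='purchase-order' type='PurchaseOrderType'/>",
        "</xs:schema>");

    /** Returns true when a purchase-order with the given children validates. */
    static boolean isValidOrder(String children) {
        String document = "<purchase-order>" + children + "</purchase-order>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(document)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String number = "<order-number>5290</order-number>";
        String item = "<item>44045-23112</item>";
        System.out.println(isValidOrder(number + item)); // schema order
        System.out.println(isValidOrder(item + number)); // reversed order
        System.out.println(isValidOrder(number));        // mandatory item missing
    }
}
```

Both orderings should be accepted, while the order that lacks the mandatory item element should be rejected.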

Using the choice compositor, we can force the contents of part of a document to be one of a number of possible options. For example, in Figure 2-33 we see the UserIdentifierType, which allows a user to supply either a login identifier or Microsoft Passport-style single-signon credentials to log in to a system (this type of arrangement is typical in e-commerce sites).^[6]

^[6] Note that this is a hypothetical example that has been deliberately shortened for clarity, and the types used are not representative of the actual Passport API.

Figure 2-33. Using the choice compositor.

    <xs:complexType name="UserIdentifierType">
      <xs:choice>
        <xs:element name="login-id" type="xs:string"/>
        <xs:element name="passport" type="xs:anyURI"/>
      </xs:choice>
    </xs:complexType>

The UserIdentifierType can be used to validate elements that contain either a login-id, or a passport element, but not both. Therefore both the elements shown in Figure 2-34 can be validated against the UserIdentifierType:

Figure 2-34. Valid UserIdentifierType elements.

    <logon>
      <login-id>chewbacca@wookie.org</login-id>
    </logon>

    <logon>
      <passport>
        http://passport.example.org/uid/2235:112e:77fa:9699:aad1
      </passport>
    </logon>
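The exclusive nature of choice can be demonstrated directly. The following minimal sketch validates logon elements against the UserIdentifierType from Figure 2-33 using the JDK's javax.xml.validation API; the logon element declaration and the ChoiceCheck class are assumptions added for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ChoiceCheck {
    // UserIdentifierType from Figure 2-33 plus a hypothetical logon element using it.
    static final String SCHEMA = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        "  <xs:complexType name='UserIdentifierType'>",
        "    <xs:choice>",
        "      <xs:element name='login-id' type='xs:string'/>",
        "      <xs:element name='passport' type='xs:anyURI'/>",
        "    </xs:choice>",
        "  </xs:complexType>",
        "  <xs:element name='logon' type='UserIdentifierType'/>",
        "</xs:schema>");

    static final String LOGIN = "<login-id>chewbacca@wookie.org</login-id>";
    static final String PASSPORT =
        "<passport>http://passport.example.org/uid/2235:112e:77fa:9699:aad1</passport>";

    /** Returns true when a logon element with the given children validates. */
    static boolean isValidLogon(String children) {
        String document = "<logon>" + children + "</logon>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            try {
                schema.newValidator()
                      .validate(new StreamSource(new StringReader(document)));
                return true;
            } catch (SAXException invalidDocument) {
                return false; // the default error handler throws on validation errors
            }
        } catch (SAXException badSchema) {
            throw new IllegalStateException("schema did not compile", badSchema);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidLogon(LOGIN));            // one alternative
        System.out.println(isValidLogon(PASSPORT));         // the other alternative
        System.out.println(isValidLogon(LOGIN + PASSPORT)); // both at once
    }
}
```

Either alternative on its own should validate; supplying both, or neither, should not.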

The minOccurs and maxOccurs attributes can be used within a choice compositor. They allow us to expand the basic exclusive OR operation that choice provides to support selection based on quantity as well as content, as exemplified in Figure 2-35:

Figure 2-35. Choosing elements based on cardinality.

    <xs:complexType name="DrinksMenuType">
      <xs:choice>
        <xs:element name="beer" type="b:BeerType" minOccurs="0"
          maxOccurs="2"/>
        <xs:element name="wine" type="w:WineType" minOccurs="0"
          maxOccurs="1"/>
      </xs:choice>
    </xs:complexType>

Using the DrinksMenuType type, we can specify with the minOccurs and maxOccurs attributes that our choice can be either up to two beers or at most one drink of wine, as shown in Figure 2-36.

Figure 2-36. Instance documents constrained by choice.

    <!-- Either two beers... -->
    <drinks>
      <b:beer type="bitter"/>
      <b:beer type="lager"/>
    </drinks>

    <!-- ... Or a single drink of wine -->
    <drinks>
      <w:wine country="France" grape="Pinot Noir" year="1998"/>
    </drinks>

Equally, we could select based on the quantity of a single item. For example, we could envision a choice where beer is sold in four-, six-, and twelve-packs by giving each alternative minOccurs and maxOccurs values of 4, 6, and 12 respectively, as shown in Figure 2-37:

Figure 2-37. Choice based on cardinality.

    <xs:complexType name="DrinksMenuType">
      <xs:choice>
        <xs:element name="beer" type="xs:string" minOccurs="4"
          maxOccurs="4"/>
        <xs:element name="beer" type="xs:string" minOccurs="6"
          maxOccurs="6"/>
        <xs:element name="beer" type="xs:string" minOccurs="12"
          maxOccurs="12"/>
      </xs:choice>
    </xs:complexType>
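
One caveat is worth noting. All three alternatives in Figure 2-37 begin with an element named beer, so a validator reading the first beer cannot tell which branch it has entered. The Unique Particle Attribution rule in XML Schema 1.0 forbids such ambiguous content models, and some processors may reject the schema for that reason. A minimal sketch of one way around this follows, assuming hypothetical wrapper elements (four-pack, six-pack, twelve-pack) and a hypothetical BeerPackType, so that each branch starts with a distinct element name:

```xml
<xs:complexType name="BeerPackType">
  <xs:choice>
    <!-- Each wrapper element fixes the number of beers it may contain -->
    <xs:element name="four-pack">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="beer" type="xs:string"
            minOccurs="4" maxOccurs="4"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="six-pack">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="beer" type="xs:string"
            minOccurs="6" maxOccurs="6"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="twelve-pack">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="beer" type="xs:string"
            minOccurs="12" maxOccurs="12"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:choice>
</xs:complexType>
```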

With choice, we have drawn to a close our discussion of compositors. We have seen how we can aggregate existing types into new types in a variety of ways (sequence, choice, all) and some of the variations on those themes (like choice-by-cardinality). However, we can create new types not only by aggregating existing types, but also by combining existing types with textual content. For instance, we might wish to mix textual information and structured data to create a letter^[7] as shown in Figure 2-38.

^[7] This example is adapted from the W3Schools example at: http://www.w3schools.com/schema/schema_complex_mixed.asp

Figure 2-38. Mixed textual and element content.

    <letter>
    Dear Professor <name>Einstein</name>,
    Your shipment (order: <orderid>1032</orderid> )
    will be shipped on <shipdate>2003-06-14</shipdate>.
    </letter>

In order to mix elements and text, we must create a type that allows such mixtures (and by default types do not). Thus we create a schema such as that shown here in Figure 2-39:

Figure 2-39. Schema supporting mixed textual and element content.

    <xs:element name="letter">
      <xs:complexType mixed="true">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="orderid" type="xs:positiveInteger"/>
          <xs:element name="shipdate" type="xs:date"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

The way that we support mixed textual and elemental content is to create a complexType with mixed content. Thus when the mixed attribute is set to true (in its absence the default is false), the resulting type can mix elements and text as shown in the letter example in Figure 2-38.

The `any` Element

By default, all complex types that we create have closed content models. This means that only the elements that are specified when the type is declared can appear in instances. While this certainly encourages strong typing, it can also be a problem. How do we handle elements within a document that we cannot predict ahead of time? Indeed, many of the Web services protocols that we will encounter in later chapters have this requirement, where the content model of schemas for particular protocols has to be extended on a per-application basis (in fact, we discuss how WS-Transaction extends WS-Coordination in this way in Chapter 7). Fortunately, this kind of extensibility is supported in XML Schema through the any element, which allows us to develop an open content model for a type through the use of wildcards.

Using any within a complex type means that any element can appear at that location, so that it becomes a placeholder for future content that we cannot predict while building the type. For attributes, the anyAttribute element plays the same role, defining a placeholder for future attribute extensions.

Of course, we might not want to allow completely arbitrary content to be embedded, and so any can be constrained in a number of ways while still remaining generic. The first constraint that we can place on any is how the contents that are substituted will be treated by the XML processor. The processContents attribute has a number of options that can be chosen to set the level of validation of elements specified by an any element. These are:

strict: This is the default value in the absence of any processContents attribute. The XML processor must have access to the schema for the namespaces of the substituted elements and fully validate those elements against that schema.
lax: This is similar to strict, with the exception that if no schema can be located for the substituted elements, then the XML processor simply checks for well-formed XML.
skip: This is the least taxing validation method, which instructs the XML processor not to validate any elements from the specified namespaces.
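
As a minimal sketch, the three settings might appear in wildcards as follows. The type names here are hypothetical and exist only to show each value in isolation:

```xml
<!-- strict (the default): the substituted element must validate against a known schema -->
<xs:complexType name="StrictExtensionType">
  <xs:sequence>
    <xs:any processContents="strict" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- lax: validate if a schema can be found, otherwise only check well-formedness -->
<xs:complexType name="LaxExtensionType">
  <xs:sequence>
    <xs:any processContents="lax" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- skip: never validate the substituted element -->
<xs:complexType name="SkipExtensionType">
  <xs:sequence>
    <xs:any processContents="skip" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```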

The namespaces against which the contents may be validated are specified by a second optional attribute for the any element called namespace. This attribute specifies the namespace of the elements that it is valid to substitute for an any element within a document, and has a number of possible settings:

##any: This is the default setting for the namespace attribute, which implies that elements from any namespace are allowed to exist in the placeholder specified by the any element.
##other: Specifying this value for the namespace attribute allows elements from any namespace except the namespace of the parent element (i.e., not the targetNamespace of the parent).
##local: The substituted elements must come from no namespace.
##targetNamespace: Only elements from the namespace of the parent element can be contained.

Finally, we are allowed to combine some of the above options to make the available namespaces more configurable. That is, we are allowed to specify a space-separated list of valid namespace URIs (instead of ##any and ##other), plus optionally ##targetNamespace and ##local. Thus we can restrict the namespaces from which elements may be substituted for an any element to a list of (one or more) specific namespaces if necessary.
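
A brief sketch of such a list follows. The type name and the two partner namespace URIs are hypothetical, chosen only to illustrate the syntax:

```xml
<xs:complexType name="PartnerExtensionType">
  <xs:sequence>
    <!-- Accept elements from two named namespaces or from the schema's own targetNamespace -->
    <xs:any namespace="http://partner-a.example.org http://partner-b.example.org ##targetNamespace"
      processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```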

An example of how the any element is used is presented in Figure 2-40.

Figure 2-40. WS-Transaction messages are extensible via any and anyAttribute.

    <xsd:complexType name="Notification">
      <xsd:sequence>
        <xsd:element name="TargetProtocolService"
          type="wsu:PortReferenceType"/>
        <xsd:element name="SourceProtocolService"
          type="wsu:PortReferenceType" />
        <xsd:any namespace="##other" processContents="lax"
          minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:anyAttribute namespace="##other"
        processContents="lax"/>
    </xsd:complexType>

Figure 2-40 shows the Notification type from the WS-Transaction protocol schema. A Notification in WS-Transaction is a message that is transmitted between actors in the protocol. However, since WS-Transaction is designed to allow different back end transaction systems to operate on the Internet, the messages it exchanges have to be extensible enough to express the semantics of each back end system. This, of course, calls for an open content model to allow third parties to extend the protocol to suit their own systems.

The type supports wildcard elements and attributes via the xsd:any and xsd:anyAttribute elements. In both cases, the wildcard content must come from a namespace other than the WS-Transaction namespace, because the namespace attribute is set to ##other. This is exemplified in Figure 2-41 where we see a SOAP message (see Chapter 3 for a full explanation of SOAP) from one vendor's WS-Transaction implementation (see Chapter 7 for details on Web services transactions) using the wildcard elements to propagate information pertinent to their implementation.^[8]

^[8] This SOAP message is from Arjuna Technologies' XTS 1.0 implementation of the WS-Transaction protocol.

Although the DialogIdentifier element in the SOAP message in Figure 2-41 isn't declared by the schema, the message is still valid because that element matches the constraints of the <xsd:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/> element from the schema. It satisfies the ##other constraint because it comes from the namespace http://schemas.arjuna.com/ws/2003/01/wsarjtx, which differs from the WS-Transaction namespace http://schemas.xmlsoap.org/ws/2002/08/wstx. Because the schema specifies lax processing for these elements, the XML processor that receives this message validates the DialogIdentifier element only if it can locate a schema for it, and otherwise simply checks that the element is well-formed. Thus the message conforms to the schema even though the originators of the schema had no idea about the organization that ultimately created the conformant message, let alone the message itself.

Figure 2-41. Using a wildcard element to extend a WS-Transaction message.

    <?xml version="1.0" encoding="UTF-8"?>
    <soapenv:Envelope
      xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <soapenv:Body>
        <wstx:OnePhaseCommit
          xmlns:wstx="http://schemas.xmlsoap.org/ws/2002/08/wstx">
          <wstx:TargetProtocolService
            xmlns:wstx="http://schemas.xmlsoap.org/ws/2002/08/wstx">
            <wsu:Address
              xmlns:wsu="http://schemas.xmlsoap.org/ws/2002/07/utility">
              http://localhost:5555/jboss-net/services/TwoPCParticipantMSG
            </wsu:Address>
          </wstx:TargetProtocolService>
          <wstx:SourceProtocolService
            xmlns:wstx="http://schemas.xmlsoap.org/ws/2002/08/wstx">
            <wsu:Address
              xmlns:wsu="http://schemas.xmlsoap.org/ws/2002/07/utility">
              http://localhost/jboss-net/services/TwoPCCoordinator
            </wsu:Address>
          </wstx:SourceProtocolService>
          <wsarjtx:DialogIdentifier xmlns:wsarjtx=
            "http://schemas.arjuna.com/ws/2003/01/wsarjtx">
            123456
          </wsarjtx:DialogIdentifier>
        </wstx:OnePhaseCommit>
      </soapenv:Body>
    </soapenv:Envelope>

In addition to the any and anyAttribute elements, XML Schema also provides two special types called anyType and anySimpleType which can be used instead of a specific named type where we need our schemas to be more generic.

The anyType type is the more generic of the two, being substitutable for any type in the whole XML Schema type system, including user-derived types. The anySimpleType type is more constrained and supports only types that are from the set of forty-four built-in XML Schema simple types or types derived from them.

These special types provide the same kind of generality when creating type-based content models as the any element provides for document structure. It is not unusual to see attributes like type="xs:anyType" or type="xs:anySimpleType" in element declarations where the type of such elements is expected to be determined by the application that consumes the schema, and not by the schema developer.
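
A minimal sketch of such declarations follows. The names payload, value, and setting are hypothetical, and the declarations are assumed to sit directly under the schema element:

```xml
<!-- payload may hold any content; the consuming application decides what it contains -->
<xs:element name="payload" type="xs:anyType"/>

<!-- value may hold any simple (text-only) value, such as a string, date, or number -->
<xs:element name="value" type="xs:anySimpleType"/>

<!-- attributes can only be simple, so anySimpleType is the most generic attribute type -->
<xs:attribute name="setting" type="xs:anySimpleType"/>
```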

Inheritance

While the ability to constrain instance documents is essential for interoperability, harnessing the type system exploits the real power of XML Schema. The inheritance features in XML Schema allow us to create type hierarchies that capture the data models of the underlying software systems that XML is designed to support.

In fact, we have already seen one form of inheritance when we used the restriction feature to create new simple types with differently constrained value and lexical spaces. However, XML Schema also supports a mechanism called extension that allows us to augment (rather than constrain) the capabilities of an existing type. Using this facility we can begin to create hierarchies of complex types just as we can in object-oriented programming languages.

When using complex type extension, we have two options for creating subtypes. We can create subtypes that contain only simple content (text and attributes only), or subtypes that contain complex content (child elements and attributes, plus text if the type is mixed).

An example of extending a complex type with additional simple content is shown in Figure 2-42:

Figure 2-42. Complex Type `extension with simpleContent`.

    <xs:complexType name="MonitorType">
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="flatscreen" type="xs:boolean"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>

The MonitorType complex type in Figure 2-42 uses the simpleContent element to add a single attribute to content that is defined as the built-in string type. The base type of MonitorType (string) is specified by the base attribute of the extension element, and the additional attribute is declared as the only child of that extension element. The new subtype we have defined can now be used to validate elements such as <monitor flatscreen="true">HP P4831D</monitor>.

Figure 2-43 shows an example of how we can use the extension mechanism to create subtypes with additional elements using the complexContent construct.

Figure 2-43. Complex Type `extension with complexContent`.

    <xs:complexType name="PersonType">
      <xs:sequence>
        <xs:element name="forename" type="xs:string"/>
        <xs:element name="surname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
    <xs:complexType name="FootballerType">
      <xs:complexContent>
        <xs:extension base="PersonType">
          <xs:sequence>
            <xs:element name="team" type="xs:string"/>
            <xs:element name="goals" type="xs:int"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>

The PersonType type in Figure 2-43 can be used to validate instances such as that shown here in Figure 2-44:

Figure 2-44. An Instance of the `PersonType` Type.

    <person>
      <forename>Alan</forename>
      <surname>Turing</surname>
    </person>

The FootballerType in Figure 2-43 has complexContent, allowing elements and attributes to appear within the body of the type. It capitalizes on that fact by adding the team and goals elements to extend the base PersonType, allowing the validation of elements such as that shown in Figure 2-45:

Figure 2-45. A `FootballerType` Type Instance.

    <footballer>
      <forename>Alan</forename>
      <surname>Shearer</surname>
      <team>Newcastle United</team>
      <goals>145</goals>
    </footballer>

As we see in Figure 2-45, instances of the FootballerType type have a similar structure to instances of the PersonType type, because the FootballerType subtype inherits the forename and surname elements from the PersonType, but adds the elements team and goals.

From this example, we can see that it is possible to use the extension mechanism to build type hierarchies in XML Schema, just as we can in object-oriented programming languages. However, to be able to exploit such hierarchies (e.g. to "cast" between types) we need to use another XML Schema mechanism: substitution groups.

Substitution Groups

Substitution groups are a feature that allows us to declare that an element can be substituted for another element in an instance document. We achieve this by assigning the element to a special group (a substitution group) whose members are substitutable for the element at the head of that group, effectively creating an equivalence relation between document elements of the same type (or subtype).

Elements in a substitution group must have the same type as the head element, or a type that has been derived from the head element's type.

While this isn't exactly like polymorphic behavior in object-oriented programming languages since the base-type/derived type relationship isn't implicit, this feature is immensely useful for creating extensible schemas with open content models.

To illustrate this point, consider the schema shown in Figure 2-46. This schema demonstrates how to use substitution groups to deal with element-level substitutions, a kind of polymorphic behavior for instance documents. The substitution group consists of the elements cast-member and crew-member, which declare themselves to be substitutable for a person element through the substitutionGroup="person" attribute. Note that this is a valid substitution group because the types of both cast-member and crew-member are derived from the PersonType type.

The definition of the cast-and-crew element references the person element from within a sequence, setting the maxOccurs attribute to allow any number of person elements to exist within an instance. However, since person is an abstract element (and thus cannot appear as an element in its own right), what this schema actually supports is the substitution of any element declared to be in the person substitution group wherever a person element is expected. Therefore, this schema will validate instance documents such as that shown in Figure 2-47.

Figure 2-46. Using substitution groups.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:complexType name="PersonType">
        <xs:sequence>
          <xs:element name="firstname" type="xs:string"
            minOccurs="0"/>
          <xs:element name="surname" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <xs:complexType name="CastMemberType">
        <xs:complexContent>
          <xs:extension base="PersonType">
            <xs:sequence>
              <xs:element name="character" type="xs:string"/>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>
      <xs:complexType name="CrewMemberType">
        <xs:complexContent>
          <xs:extension base="PersonType">
            <xs:sequence>
              <xs:element name="function" type="xs:string"/>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>
      <!-- Declare substitution group and head element -->
      <xs:element name="person" type="PersonType"
        abstract="true"/>
      <xs:element name="cast-member" type="CastMemberType"
        substitutionGroup="person"/>
      <xs:element name="crew-member" type="CrewMemberType"
        substitutionGroup="person"/>
      <!-- Now define the actual document -->
      <xs:element name="cast-and-crew">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="person" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Figure 2-47. Supporting polymorphic behavior with substitution groups.

    <?xml version="1.0" encoding="UTF-8"?>
    <cast-and-crew>
      <crew-member>
        <firstname>Lucas</firstname>
        <surname>George</surname>
        <function>director</function>
      </crew-member>
      <cast-member>
        <firstname>Ewan</firstname>
        <surname>McGregor</surname>
        <character>Obi Wan Kenobi</character>
      </cast-member>
    </cast-and-crew>

The instance document in Figure 2-47 shows how elements from the person substitution group can be used in places where the original schema has specified a person element. In this case, since both cast-member and crew-member are part of the person substitution group, the document is valid.
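
To illustrate the extensibility this gives, the following sketch adds a third member to the group without touching the cast-and-crew declaration. The StuntMemberType type, the stunt-member element, and the specialty element are hypothetical additions, assumed to be declared in the same schema as Figure 2-46:

```xml
<!-- Hypothetical new subtype of PersonType -->
<xs:complexType name="StuntMemberType">
  <xs:complexContent>
    <xs:extension base="PersonType">
      <xs:sequence>
        <xs:element name="specialty" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Joining the person substitution group makes stunt-member valid inside cast-and-crew -->
<xs:element name="stunt-member" type="StuntMemberType"
  substitutionGroup="person"/>
```

With these declarations in place, a stunt-member element containing firstname, surname, and specialty children could appear alongside the crew-member and cast-member elements of Figure 2-47.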

Like the any and anyAttribute elements, substitution groups are a useful mechanism for creating schema types which are extensible. Again like the any and anyAttribute elements, substitution groups are widely found in various Web services standards. WSDL (see Chapter 3) makes extensive use of substitution groups to allow other protocols (such as BPEL, see Chapter 6) to extend its basic features to more complex problem domains.

Global and Local Type Declarations

Just like classes in object-oriented programming, we need to create instances of XML Schema types in order to do real work like moving XML encoded messages between Web services. In this section, we examine two means for creating instances of types: using global types and declaring local types.

We have already seen examples of both global (schema-scoped) and local (element-scoped) type declarations throughout the previous sections. A global type definition occurs where we embed a named type directly as a child of the <schema> element of a schema. Conversely, a local type is declared inline as the child of an <element> element, which may itself be a direct child of the <schema> element. This is exemplified in Figure 2-48.

Figure 2-48. Global and Local type declarations.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
      <!-- A Global Type -->
      <xs:complexType name="CardType">
        <xs:sequence>
          <xs:element name="card-type">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="Visa"/>
                <xs:enumeration value="MasterCard"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="expiry">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="[0-9]{2}-[0-9]{2}"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="number">
            <!-- A local type is anonymous, so it carries no name attribute -->
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern
                  value="[0-9]{4} [0-9]{4} [0-9]{4} [0-9]{4}"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="holder" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
      <!-- A local type -->
      <xs:element name="debit-card">
        <xs:complexType>
          <xs:complexContent>
            <xs:extension base="CardType">
              <xs:attribute name="issue"
                type="xs:positiveInteger"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <!-- Another local type -->
      <xs:element name="wallet">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="credit-card" type="CardType"
              minOccurs="0" maxOccurs="unbounded"/>
            <!-- ref reuses the global debit-card element; it cannot be combined with name -->
            <xs:element ref="debit-card"
              minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

The distinction between the two is important. Global types such as CardType in Figure 2-48 are globally visible: they are available within the namespace in which they are declared and in other namespaces, they can be extended, and they generally behave as we would expect classes to behave in an object-oriented programming language. Instances of global types are created by constructing an element whose type attribute refers to that particular global type's name. This is shown in Figure 2-48 where we see this element:

 <xs:element name="credit-card" type="CardType"  minOccurs="0" maxOccurs="unbounded"/>

which declares that an instance of the CardType type can be present any number of times in a wallet element.

On the other hand, local types are declared inline with an element (like debit-card and wallet in Figure 2-48). While the element itself is visible to other elements and types, its implementing type is not, and therefore cannot be extended by other types; in fact, the implementing type doesn't even have a name by which it could be referred to.

When we declare local types, they can subsequently be referred to only through their enclosing element's name, and their content cannot be extended. In programming terms, this is similar to a component whose API is known, but whose type is anonymous and whose internal structure is a black box. This is shown in Figure 2-48, where the wallet element (itself declared with a local type) is defined as containing any number of debit-card elements via the ref attribute, like this:

    <xs:element ref="debit-card"
      minOccurs="0" maxOccurs="unbounded"/>

Elements declared with local types are reused via the ref attribute, which names the element rather than its type, e.g., <xs:element ref="debit-card" … />

Instances of global types are specified by the type attribute, e.g., <xs:element name="credit-card" type="CardType" … />

Whether to declare types globally or locally depends on our intended use for those types. If we intend for those types to form part of a type hierarchy, then they should be declared globally so they can be extended at will. If, however, we intend for a type to only support instances within XML documents, then it should be declared locally.

A good rule of thumb for developing content models is to build type hierarchies with global types, but to create local type declarations at the leaf nodes of those hierarchies. Thus within the hierarchy we have the full flexibility supplied by global types, yet the "interface" presented to users of that hierarchy is a collection of element declarations against which XML documents can be validated.
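
A minimal sketch of this rule of thumb follows. All of the names in it (VehicleType, CarType, garage, and so on) are hypothetical. The named global types form the extensible hierarchy, while the only anonymous type is the one attached to the garage element at the edge of that hierarchy:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global types: named, reusable, and open to extension -->
  <xs:complexType name="VehicleType">
    <xs:sequence>
      <xs:element name="make" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CarType">
    <xs:complexContent>
      <xs:extension base="VehicleType">
        <xs:sequence>
          <xs:element name="doors" type="xs:positiveInteger"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <!-- Leaf of the hierarchy: an element with a local type that documents validate against -->
  <xs:element name="garage">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="car" type="CarType"
          minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```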

Managing Schemas

While most of the schemas we have seen in this chapter have been short, it is possible for schemas that serve particularly complicated problem domains to become long and difficult to manage. XML Schema helps to solve this problem by providing the include mechanism that allows us to partition a single logical schema (i.e., the set of types from a single targetNamespace) across a number of physical schema documents. For instance, we could choose to create type hierarchies in one physical document and create the document layout in another physical document for ease of management. These two separate physical documents can then be made into a single logical schema by including the type hierarchy document in the document structure schema, as shown in Figure 2-49 and Figure 2-50.

Figure 2-49. The Type hierarchy part of the Wallet schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://wallet.example.com"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
      <xs:complexType name="CardType">
        <xs:sequence>
          <xs:element name="card-type">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="Visa"/>
                <xs:enumeration value="MasterCard"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="expiry">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern value="[0-9]{2}-[0-9]{2}"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="number">
            <!-- A local type is anonymous, so it carries no name attribute -->
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:pattern
                  value="[0-9]{4} [0-9]{4} [0-9]{4} [0-9]{4}"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
          <xs:element name="holder" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>

Figure 2-50. The Document-Structure part of the Wallet schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://wallet.example.com"
      xmlns:tns="http://wallet.example.com"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
      <xs:include schemaLocation="CreditCard.xsd"/>
      <xs:element name="debit-card">
        <xs:complexType>
          <xs:complexContent>
            <xs:extension base="tns:CardType">
              <xs:attribute name="issue"
                type="xs:positiveInteger"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="wallet">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="credit-card" type="tns:CardType"
              minOccurs="0" maxOccurs="unbounded"/>
            <!-- ref takes a qualified name and cannot be combined with name -->
            <xs:element ref="tns:debit-card"
              minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

The schema shown in Figure 2-49 effectively becomes the container for all of the types that might be used in the XML documents that conform to the schema (which at the moment is only a single type, CardType). The schema in Figure 2-50 uses the include mechanism to create a single logical schema containing itself and the included schema from Figure 2-49. This gives access to all of the types defined in the included schema, allowing the wallet to be constructed in the same way as it was when the two schemas were physically one (in Figure 2-48), with the advantage that because the individual schemas are smaller, maintaining them is easier.

While the include mechanism is fine for partitioning a single schema across multiple physical schema documents, it is limited to schema documents which share the same targetNamespace. It is easy to see the limitation of this mechanism if we imagine for a moment that the definition of the CardType had not been developed by the same in-house team that created the wallet, but had instead been created by an outside consortium of credit card companies. In this case the targetNamespace will be different from that of the wallet schema and so include will not work. Instead, we use the import mechanism, which allows us to combine types and elements from different namespaces into a single schema.

Figure 2-51. The Credit Card schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://card.example.com"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
      <xs:complexType name="CardType">
      <!-- Card implementation omitted for brevity -->
      </xs:complexType>
    </xs:schema>

Figure 2-52. The Wallet schema.

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema targetNamespace="http://wallet.example.com"
      xmlns:tns="http://wallet.example.com"
      xmlns:cc="http://card.example.com"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      elementFormDefault="qualified"
      attributeFormDefault="unqualified">
      <!-- The imported namespace must match the targetNamespace of Figure 2-51 -->
      <xs:import namespace="http://card.example.com"
        schemaLocation="CreditCard.xsd"/>
      <xs:element name="debit-card">
        <xs:complexType>
          <xs:complexContent>
            <xs:extension base="cc:CardType">
              <xs:attribute name="issue"
                type="xs:positiveInteger"/>
            </xs:extension>
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="wallet">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="credit-card" type="cc:CardType"
              minOccurs="0" maxOccurs="unbounded"/>
            <!-- ref cannot be combined with name -->
            <xs:element ref="tns:debit-card"
              minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

The schema in Figure 2-51 declares a single type (CardType) in the namespace http://card.example.com. The schema containing the CardType type is then exposed to the schema shown in Figure 2-52 via the import mechanism, which involves specifying both the namespace that is being imported and the location of the schema document that declares that targetNamespace.

The imported namespace is given a prefix (so that it can be referenced within the wallet schema) via the xmlns:cc attribute in the root element of the wallet schema document. Now the components of the credit card schema (including CardType) are accessible to the wallet schema by referencing its qualified name (QName) via the prefix cc.

Once we have imported a schema, we can freely reference its contents. In the wallet schema, we use the contents of the credit card schema to create a new type of card (debit-card) by extending the credit card schema's CardType. We also create a wallet element that declares instances of both the global CardType and the debit-card element with its local type. As we have seen, the import mechanism works much like an import declaration in the Java programming language or a using directive in C#, which simply exposes the types from a foreign namespace to the current namespace.
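
The following is a sketch of an instance document that the wallet schema in Figure 2-52 might validate. It assumes that the CardType omitted from Figure 2-51 has the same content as the CardType in Figure 2-49, and the card details are made-up sample values. Because both schemas set elementFormDefault="qualified", the elements declared inside CardType belong to the card namespace, while wallet, credit-card, and debit-card belong to the wallet namespace:

```xml
<w:wallet xmlns:w="http://wallet.example.com"
  xmlns:c="http://card.example.com">
  <w:credit-card>
    <c:card-type>Visa</c:card-type>
    <c:expiry>10-27</c:expiry>
    <c:number>1234 5678 9012 3456</c:number>
    <c:holder>Example Holder</c:holder>
  </w:credit-card>
  <!-- issue is unqualified because attributeFormDefault is "unqualified" -->
  <w:debit-card issue="2">
    <c:card-type>MasterCard</c:card-type>
    <c:expiry>03-26</c:expiry>
    <c:number>9876 5432 1098 7654</c:number>
    <c:holder>Example Holder</c:holder>
  </w:debit-card>
</w:wallet>
```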

Schemas and Instance Documents

Until this point we have largely focused on either XML documents or constructing portable type systems with XML Schema. However, it is only when these two aspects of XML intersect that we actually have a usable technology for moving structured data between systems. That is, we need to be able to communicate the abstract notions defined in schemas via concrete XML documents and, on receipt of an XML document, be able to translate it back into some form suitable for processing within the receiving system, which is generally an Infoset or native object model, not a mass of angle brackets and text. The relationship between types, elements and instance documents is captured in Figure 2-53.

Figure 2-53. Relationship between Types, Elements and Documents.

graphics/02fig05.jpg

Schema-aware XML processors (like Apache's Xerces and the .NET System.Xml classes) use an instance document's namespace to match against the corresponding namespace of a schema. However, the XML Schema specification doesn't mandate how the XML processor should locate that schema in the first place. Typically, an XML processor will be programmatically or administratively configured with the locations of any required schemas before undertaking any processing. However, this can be restrictive in that the schemas of all possible instance documents must be known ahead of time if they are to be validated by the XML processor.

While the XML Schema specification doesn't provide a means of mandating the location of a schema, it does provide a means of hinting at its location by placing an xsi:schemaLocation attribute into the instance document, as shown in Figure 2-54.

Figure 2-54. Using the `xsi:schemaLocation` attribute to locate a schema.

<ptr:printer xmlns:ptr="http://printer.example.org"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://printer.example.org file:/home/local/root/schemas/printer.xsd">
   <!-- rest of document omitted for brevity -->

The xsi:schemaLocation attribute specifies a set of space-delimited namespace-location pairs indicating the location of schemas for particular namespaces. Upon finding the xsi:schemaLocation attribute, the XML processor may (since it is only a hint) try to obtain the specified schema from the suggested location. Equally, the processor is free to ignore the xsi:schemaLocation attribute altogether, especially if it already has the necessary document-schema mappings through other means.
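
Because the attribute holds a list of pairs, a single instance document can hint at the schemas for several namespaces at once. The following is a minimal sketch of how that might look. The supplies namespace, the sup prefix, the cartridge element and both relative .xsd locations are hypothetical and are used only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance document: the supplies namespace and both
     .xsd locations are invented for illustration only. -->
<ptr:printer xmlns:ptr="http://printer.example.org"
             xmlns:sup="http://supplies.example.org"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://printer.example.org printer.xsd
                                 http://supplies.example.org supplies.xsd">
  <!-- The attribute value reads as: namespace, location, namespace, location -->
  <sup:cartridge>black</sup:cartridge>
</ptr:printer>
```

Each namespace URI is followed immediately by the suggested location for its schema, and any whitespace (including a line break) separates one item from the next. A processor remains free to disregard either hint.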

XML Schema Best Practices

We've now seen a great deal of XML Schema, and over time we have built up a set of informal best practices based on the notion of defining important global types and their interrelations first and document structure later. However, it is useful to condense these details down to their barest bones for quick reference:

Always use elementFormDefault="qualified" and attributeFormDefault="unqualified" to ensure that elements are namespaced by default and attributes are not.
Declare all types globally; declare elements (apart from the document root) locally.
Use types to express content models; use elements to dictate the structure of documents.
Use the XML Schema features that most closely match your object model. Do not map the object model onto a different model in Schema just because it makes writing schemas easier.
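
To make the first three guidelines concrete, here is a small sketch of a schema that follows them. The library namespace, the lib and xs prefixes, and all of the type and element names are assumptions chosen for illustration and do not come from the examples earlier in this chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: the namespace, type and element names are
     assumptions made for illustration only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:lib="http://library.example.org"
           targetNamespace="http://library.example.org"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

  <!-- Types are declared globally so they can be reused and extended -->
  <xs:complexType name="BookType">
    <xs:sequence>
      <!-- Elements other than the document root are declared locally -->
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="isbn" type="xs:string" use="required"/>
  </xs:complexType>

  <xs:complexType name="LibraryType">
    <xs:sequence>
      <xs:element name="book" type="lib:BookType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The document root is the only globally declared element -->
  <xs:element name="library" type="lib:LibraryType"/>
</xs:schema>
```

In this sketch the two complex types carry the content models, while the single global library element fixes the shape of the instance document. With elementFormDefault set to qualified, the locally declared title, author and book elements are namespaced in instance documents, whereas the isbn attribute is not.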

These best practices are intended as guidelines. Over time you will develop your own practices that more accurately match the kinds of solutions you are working on. However, the fact remains that no matter what style we ultimately develop for Web services projects, we still need to use XML to move data around systems.