Declaring Elements | Professional XML (Programmer to Programmer)

Elements are, of course, the main components of any XML document. An XML Schema document offers several ways to declare the elements that an instance document using it must contain.

An element can have either a simple type or a complex type. Simple types are reviewed first.

Simple Types

An element has a simple type if it contains no child elements and no attributes. Three kinds of simple types are at your disposal: atomic types, list types, and union types.

Atomic Types

Atomic types are by far the simplest. For instance, you can have an XML document that is as simple as the one presented in Listing 6-9.

Listing 6-9: An XML document that requires only a single type

      <?xml version="1.0" encoding="UTF-8"?>
      <City xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="AtomicType.xsd">St. Louis</City>

In this case, there is only a single element that is quite simple. It doesn't contain any other child elements, and it doesn't contain any attributes or have any rules about its contents. Defining this through an XML Schema document is illustrated in Listing 6-10.

Listing 6-10: Declaring an XML Schema document with a simple type

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="City" type="xs:string" />
      </xs:schema>

In Listing 6-10, the XML Schema document contains a single element declaration. The <City> element, whose type is atomic, is declared using the <xs:element> element, which carries two attributes.

The name attribute defines the name of the element as it must appear in the XML document. This value is case-sensitive: an instance document that uses <city> instead of <City> is not valid against this XML Schema document.
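As a minimal illustration of the case-sensitivity rule, the following instance is well-formed XML but should be rejected by a validating parser that checks it against Listing 6-10, because its root element is <city> rather than <City>. The file name AtomicType.xsd is reused from Listing 6-9.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Invalid against Listing 6-10: the schema declares City, not city -->
<city xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="AtomicType.xsd">St. Louis</city>
```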

Besides name, the <xs:element> element carries a type attribute, which defines the datatype of the contents of the <City> element. In Listing 6-10, the contents of <City> are of type xs:string.

Note

The full list of datatypes available for your element and attribute declarations is presented later in this chapter.

It is rare to declare a single atomic type and nothing more; more often, you define your own simple types with the <xs:simpleType> element. The next section shows how to use it to construct list types.

List Types

A list type enables you to define a list of values within a single element. Because list types can cause problems (one is described later in this section), they are not always considered best practice; it is usually better to put each value in its own element. Listing 6-11 shows an XML document that places multiple values within a single element.

Listing 6-11: An XML document containing a list of values

      <?xml version="1.0" encoding="UTF-8"?>
      <FundIds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ListTypes.xsd">
       60003333 60003334 60003335 60003336</FundIds>

This XML document contains a single element, <FundIds>, which contains what appears as a single value, but really it is four values that are separated with a space. Defining this in an XML Schema document is presented in Listing 6-12.

Listing 6-12: An XML Schema document using a list type

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="FundIds" type="FundIdsType" />
        <xs:simpleType name="FundIdsType">
           <xs:list itemType="xs:int" />
        </xs:simpleType>
      </xs:schema>

As with the earlier atomic type example, the document begins with a single element declaration:

      <xs:element name="FundIds" type="FundIdsType" />

In this case, the <xs:element> element declares an element named <FundIds> of type FundIdsType. FundIdsType is not a built-in type such as xs:string, xs:double, or xs:int; it is a user-defined type that must be declared elsewhere in your XML Schema document:

      <xs:simpleType name="FundIdsType">
         <xs:list itemType="xs:int" />
      </xs:simpleType>

The FundIdsType type is defined using a single <xs:list> element, which is how you declare a list type. The itemType attribute specifies the type of each item in the list; here it is xs:int. Whatever item type you choose, the items within the element are separated by whitespace.

The example XML document shows four fund IDs separated by single spaces.

      <FundIds>60003333 60003334 60003335 60003336</FundIds>
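Although the example separates the values with single spaces, a list type treats any run of whitespace as a separator, because whitespace in a list value is always collapsed before the value is split into items. As a sketch, the following variant of Listing 6-11, which puts each fund ID on its own line, should still validate against Listing 6-12 as four xs:int items.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same four values as Listing 6-11, separated by line breaks and indentation.
     Leading, trailing, and repeated whitespace is collapsed, so the list
     still contains exactly four xs:int items. -->
<FundIds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="ListTypes.xsd">
   60003333
   60003334
   60003335
   60003336
</FundIds>
```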

Be aware of a problem when using strings within an element that makes use of the list type. For instance, suppose you have a definition like the one presented in Listing 6-13.

Listing 6-13: An XML Schema document using a list of strings

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="BaseballTeams" type="BaseballTeamsType" />
        <xs:simpleType name="BaseballTeamsType">
           <xs:list itemType="xs:string" />
        </xs:simpleType>
      </xs:schema>

This XML Schema defines a list type that is supposed to be a list of string values representing American and Canadian baseball teams. A valid instance document of this type is illustrated in Listing 6-14.

Listing 6-14: An XML document that provides a list of baseball teams

      <?xml version="1.0" encoding="UTF-8"?>
      <BaseballTeams xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ListTypes.xsd">
       Cardinals Yankees Mets Rockies</BaseballTeams>

This XML document is valid and behaves as you want: the list contains four items, which works because each team name is a single word. Now imagine that the XML document using this XML Schema document looks like Listing 6-15.

Listing 6-15: An XML document that provides a list of baseball teams

      <?xml version="1.0" encoding="UTF-8"?>
      <BaseballTeams xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ListTypes.xsd">
       Cardinals Yankees Mets Blue Jays</BaseballTeams>

Although four teams are listed in this element, the name of the Toronto Blue Jays consists of two words separated by a space. Because list items are also separated by spaces, a processor reads Listing 6-15 as five items rather than four (Blue and Jays become separate items). This is one reason to consider placing such items in their own elements instead of in a list type element.
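A minimal sketch of that alternative gives each team its own element. The <Team> element name is hypothetical, and the sketch relies on two features covered later in this chapter: a complex type, and the maxOccurs attribute (set here to unbounded so that any number of teams is allowed).

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BaseballTeams">
    <xs:complexType>
      <xs:sequence>
        <!-- One element per team (hypothetical name);
             a space inside a team name is just part of the string -->
        <xs:element name="Team" type="xs:string" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, an instance writes <Team>Blue Jays</Team> as a single element, so the space inside the team name no longer acts as an item separator.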

Union Types

When working with list types, you might want to allow more than one item type within a single element. For instance, when presenting mutual funds, you might want a list that consists of fund IDs (int values) or fund tickers (string values). In this case, you can combine the types into a union. Listing 6-16 presents an XML Schema document that allows such a construction.

Listing 6-16: Allowing a union type from an XML Schema document

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="FundIds" type="FundType" />
         <xs:simpleType name="FundType">
             <xs:union memberTypes="FundIdsType FundTickerType" />
         </xs:simpleType>
         <xs:simpleType name="FundIdsType">
            <xs:list itemType="xs:int" />
         </xs:simpleType>
         <xs:simpleType name="FundTickerType">
            <xs:list itemType="xs:string" />
         </xs:simpleType>
      </xs:schema>

A few things are going on in this XML Schema document. First, two list types are defined: FundIdsType and FundTickerType. Each uses a different item type; one uses xs:int and the other uses xs:string.

      <xs:simpleType name="FundIdsType">
         <xs:list itemType="xs:int" />
      </xs:simpleType>
      <xs:simpleType name="FundTickerType">
         <xs:list itemType="xs:string" />
      </xs:simpleType>

To use both list types for a single element, you combine them with the <xs:union> element, as Listing 6-16 does. The memberTypes attribute lists the types that make up the union, separated by spaces. A value is valid for the union if it is valid for at least one of the member types.

      <xs:simpleType name="FundType">
         <xs:union memberTypes="FundIdsType FundTickerType" />
      </xs:simpleType>

Finally, the type attribute of the <xs:element> element is set to FundType, the union type. This construction makes the XML document shown in Listing 6-17 valid.

Listing 6-17: An XML document using the union type

      <?xml version="1.0" encoding="UTF-8"?>
      <FundIds xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="UnionType.xsd">
         60003333 60003334 60003335 60003336 JAXP
      </FundIds>
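Note what the union in Listing 6-16 actually checks: the whole value must match one member type. The mixed value in Listing 6-17 fails FundIdsType (JAXP is not an int), so it is accepted only because FundTickerType, a list of xs:string, accepts any whitespace-separated tokens. To check each item individually, you can reverse the nesting and build a list whose item type is a union of atomic types. The following is a sketch under stated assumptions: the names FundListType, FundItemType, and TickerType are hypothetical, and the ticker format (one to five uppercase letters, expressed with a pattern facet) is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="FundIds" type="FundListType" />

  <!-- A list whose items are each checked against FundItemType -->
  <xs:simpleType name="FundListType">
    <xs:list itemType="FundItemType" />
  </xs:simpleType>

  <!-- Union of two atomic types: an item is valid if it is an int
       or if it matches the (hypothetical) ticker format -->
  <xs:simpleType name="FundItemType">
    <xs:union memberTypes="xs:int TickerType" />
  </xs:simpleType>

  <!-- Assumed ticker format: one to five uppercase letters -->
  <xs:simpleType name="TickerType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{1,5}" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With this schema, the value in Listing 6-17 should still validate, while an item such as 6000333X would be rejected because it is neither an int nor one to five uppercase letters.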

Complex Types

In addition to simple types defined with the <xs:simpleType> element, you can define elements that contain child elements or attributes. For these, you define a complex type using the <xs:complexType> element. Listing 6-18 defines a basic complex type.

Listing 6-18: Declaring an anonymous complex type

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Country" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:schema>

In this case, a single element is defined: the <Process> element. Before closing the <xs:element> tag that declares <Process>, you nest an <xs:complexType> element inside it. The <xs:complexType> element is needed because <Process> contains several child elements.

Listing 6-18 shows an anonymous complex type. It is anonymous because it has no name; instead, the type is defined by the <xs:complexType> element nested directly inside the element declaration.

Within the <xs:complexType> element, an <xs:sequence> element lists the child elements of <Process>. This XML Schema states that the <Process> element must contain a <Name>, <Address>, <City>, <State>, and <Country> element. All the child elements are of type xs:string, and each must appear in the XML document for the document to be considered valid. If one of the elements is missing or appears more than once, or if the elements are out of order, the XML document is invalid. Listing 6-19 presents a sample XML document that uses this type.

Listing 6-19: Using the complex type

      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
       <Name>Bill Evjen</Name>
       <Address>123 Main Street</Address>
       <City>Saint Charles</City>
       <State>Missouri</State>
       <Country>USA</Country>
      </Process>
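To see the ordering rule in action, the following instance swaps <City> and <State>. As a sketch, a validating parser should report it as invalid against Listing 6-18, even though every required element is present exactly once.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Invalid against Listing 6-18: <State> appears before <City>,
     which breaks the order required by <xs:sequence> -->
<Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
  <Name>Bill Evjen</Name>
  <Address>123 Main Street</Address>
  <State>Missouri</State>
  <City>Saint Charles</City>
  <Country>USA</Country>
</Process>
```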

Making the anonymous complex type into a named complex type is simple. To create a named complex type, you construct your XML Schema document as shown in Listing 6-20.

Listing 6-20: Declaring a named complex type

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process" type="ContactDetails" />
        <xs:complexType name="ContactDetails">
           <xs:sequence>
              <xs:element name="Name" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Country" type="xs:string" />
           </xs:sequence>
        </xs:complexType>
      </xs:schema>

In this case, a single parent element, <Process>, is declared with the type ContactDetails. ContactDetails is a named complex type; its name comes from the name attribute of <xs:complexType>. The schema accepts the same instance documents as the anonymous complex type in Listing 6-18.

Looking at Reusability

Defining a named complex type opens up reuse within your XML Schema document. In Listing 6-20, the <Process> element is declared with the ContactDetails type. Because ContactDetails is defined once as a named, top-level type, other element declarations can refer to it as well. Listing 6-21 shows how it can be reused by multiple elements.

Listing 6-21: Reusing a named complex type

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process">
           <xs:complexType>
              <xs:sequence>
                 <xs:element name="BillingAddress" type="ContactDetails" />
                 <xs:element name="ShippingAddress" type="ContactDetails" />
              </xs:sequence>
           </xs:complexType>
        </xs:element>
        <xs:complexType name="ContactDetails">
           <xs:sequence>
              <xs:element name="Name" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Country" type="xs:string" />
           </xs:sequence>
        </xs:complexType>
      </xs:schema>

In this example, two elements are nested within the <Process> element: <BillingAddress> and <ShippingAddress>. Both are declared with the same type, ContactDetails. Figure 6-1 shows this visually.

Figure 6-1

Reusing the ContactDetails complex type means that you can build a valid XML instance document as presented in Listing 6-22.

Listing 6-22: Process XML document with two ContactDetails instances

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
         <BillingAddress>
            <Name>Bill Evjen</Name>
            <Address>123 Main Street</Address>
            <City>Saint Charles</City>
            <State>Missouri</State>
            <Country>USA</Country>
         </BillingAddress>
         <ShippingAddress>
            <Name>Bill Evjen</Name>
            <Address>123 Main Street</Address>
            <City>Saint Charles</City>
            <State>Missouri</State>
            <Country>USA</Country>
         </ShippingAddress>
      </Process>

This example shows both instances of the ContactDetails type being utilized by different elements.

sequence and all

So far, you have mostly seen the <sequence> element used to build complex types. Using <sequence> means that the child elements must appear in the instance document in the order in which they are declared within the complex type. Using <all>, on the other hand, lets the author of the instance document place the elements in any order; each element must still appear exactly once, unless it is declared optional. Listing 6-23 shows a schema that uses the <all> element.

Listing 6-23: Using the <all> element

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process">
          <xs:complexType>
            <xs:all>
              <xs:element name="Name" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Country" type="xs:string" />
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xs:schema>

Using this construction means that an XML document is considered valid in the following format:

      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
       <Name>Bill Evjen</Name>
       <Address>123 Main Street</Address>
       <City>Saint Charles</City>
       <State>Missouri</State>
       <Country>USA</Country>
      </Process>

It is also considered valid in this format:

      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
       <Country>USA</Country>
       <Name>Bill Evjen</Name>
       <State>Missouri</State>
       <Address>123 Main Street</Address>
       <City>Saint Charles</City>
      </Process>
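The <all> compositor still limits each child element to at most one occurrence; in XML Schema 1.0, an element inside <all> may only have a maxOccurs of 0 or 1. The following sketch varies Listing 6-23 by making <State> optional with minOccurs="0" (an attribute covered later in this chapter), so an instance may omit <State> or include it once, in any position. The fragment belongs inside the <xs:schema> element of Listing 6-23.

```xml
<xs:element name="Process">
  <xs:complexType>
    <xs:all>
      <xs:element name="Name" type="xs:string" />
      <xs:element name="Address" type="xs:string" />
      <xs:element name="City" type="xs:string" />
      <!-- minOccurs="0": State may be left out, but it can never appear twice -->
      <xs:element name="State" type="xs:string" minOccurs="0" />
      <xs:element name="Country" type="xs:string" />
    </xs:all>
  </xs:complexType>
</xs:element>
```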

Element Types

As you have seen, one of the big advantages of XML Schemas is that they can assign datatypes to their contents in a more fine-grained manner than a DTD can. Many datatypes are at your disposal when creating elements, and you assign one to an element with the type attribute:

      <xs:element name="Name" type="xs:string" />

In this case, an element <Name> is declared with type xs:string, so the contents of <Name> are always treated as a string value. The following bit of XML is therefore valid:

      <Name>Bill Evjen</Name>

You could also declare an element like this one:

      <xs:element name="Age" type="xs:string" />

The following XML would then be valid as well:

      <Age>23</Age>

In this case, however, the value 23 is treated as a string. To give more meaning to the value of the <Age> element, it is probably better to declare <Age> as follows:

      <xs:element name="Age" type="xs:int" />
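With the xs:int declaration in place, a value that is not an integer is rejected. As a sketch, the following instance should fail validation against a schema that declares <Age> as xs:int, while it would pass if <Age> were declared as xs:string.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Rejected when Age is declared with type="xs:int":
     "twenty-three" is not a valid lexical form of xs:int.
     Accepted when Age is declared with type="xs:string". -->
<Age>twenty-three</Age>
```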

Two sets of types are at your disposal in the XML Schema world: primitive datatypes and derived datatypes. The primitive datatypes are the foundation types, and the derived datatypes build on the primitive types to create more elaborate types. Figure 6-2 shows a graph of the primitive and derived datatypes available when creating XML Schemas.

Figure 6-2

The diagram shows a number of primitive datatypes, but only two of them (string and decimal) serve as the base for the built-in derived datatypes. The primitive datatypes are detailed in the following table.


| Primitive Data Type | Description |
|---|---|
| `anyURI` | An absolute or relative URI such as http://www.lipperweb.com/. |
| `base64Binary` | Base64-encoded binary data. |
| `boolean` | A Boolean flag. The allowed values are `true`, `false`, `1`, and `0`. |
| `date` | A calendar date in the Gregorian calendar, written in the ISO 8601 form YYYY-MM-DD (for example, 2007-03-15), with an optional time zone. |
| `dateTime` | A date and time of day in the Gregorian calendar, written as YYYY-MM-DDThh:mm:ss with optional fractional seconds and time zone (for example, 2007-03-15T13:20:00). |
| `decimal` | An arbitrary-precision decimal number, either positive or negative. |
| `double` | A double-precision (64-bit) floating-point number. |
| `duration` | A length of time in years, months, days, hours, minutes, and seconds, written in the ISO 8601 form PnYnMnDTnHnMnS (for example, P1Y2M3DT10H30M). |
| `float` | A single-precision (32-bit) floating-point number. |
| `gDay` | A recurring day of the month in the Gregorian calendar, written as ---DD. |
| `gMonth` | A recurring month of the year in the Gregorian calendar, written as --MM. |
| `gMonthDay` | A recurring month and day in the Gregorian calendar, written as --MM-DD. |
| `gYear` | A year in the Gregorian calendar, written as YYYY. |
| `gYearMonth` | A specific year and month in the Gregorian calendar, written as YYYY-MM. |
| `hexBinary` | Hex-encoded binary data. |
| `NOTATION` | The NOTATION attribute type from the XML 1.0 specification; its values are the QNames of notations declared in the schema. |
| `QName` | A namespace-qualified name. |
| `string` | A character string of any length. |
| `time` | A time of day written as hh:mm:ss, starting at 00:00:00 (midnight), with optional fractional seconds (for example, 13:20:30.5) and an optional time zone. |
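The table describes each primitive type in words; the lexical forms are easier to see in a schema. The following is a minimal sketch with hypothetical element names, where each comment shows one value that should be valid for that type according to the lexical rules above.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element names; comments show one valid value per type -->
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Shipped" type="xs:boolean" />    <!-- true, false, 1, or 0 -->
        <xs:element name="OrderDate" type="xs:date" />     <!-- 2007-03-15 -->
        <xs:element name="PlacedAt" type="xs:dateTime" />  <!-- 2007-03-15T13:20:00-06:00 -->
        <xs:element name="PickupTime" type="xs:time" />    <!-- 13:20:00 -->
        <xs:element name="Warranty" type="xs:duration" />  <!-- P1Y6M (one year, six months) -->
        <xs:element name="Season" type="xs:gYearMonth" />  <!-- 2007-03 -->
        <xs:element name="Website" type="xs:anyURI" />     <!-- http://www.lipperweb.com/ -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```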

The built-in datatypes derived from the string and decimal primitive datatypes are presented in the following table.


| Derived Data Type | Description |
|---|---|
| `byte` | An integer ranging from −128 to 127. The `byte` type is derived from the `short` type. |
| `ENTITIES` | A whitespace-separated list of one or more `ENTITY` values. |
| `ENTITY` | The `ENTITY` attribute type as presented in the XML 1.0 specification. The `ENTITY` type is derived from the `NCName` type. |
| `ID` | The `ID` attribute type as presented in the XML 1.0 specification. The `ID` type is derived from the `NCName` type. |
| `IDREF` | A reference to an element with a defined `ID` attribute value. The `IDREF` type is derived from the `NCName` type. |
| `IDREFS` | A whitespace-separated list of one or more `IDREF` values. |
| `int` | An integer ranging from −2147483648 to 2147483647. The `int` type is derived from the `long` type. |
| `integer` | A whole number with no fractional part; it can be negative, zero, or positive. The `integer` type is derived from the `decimal` type. |
| `language` | A natural language identifier as defined by RFC 3066 (for example, en-US). The `language` type is derived from the `token` type. |
| `long` | An integer ranging from −9223372036854775808 to 9223372036854775807. The `long` type is derived from the `integer` type. |
| `Name` | A name that follows the Name production of the XML 1.0 specification. The `Name` type is derived from the `token` type. |
| `NCName` | A "non-colonized" name (a Name containing no colons) as defined in the Namespaces in XML specification. The `NCName` type is derived from the `Name` type. |
| `negativeInteger` | An integer less than zero. The `negativeInteger` type is derived from the `nonPositiveInteger` type. |
| `NMTOKEN` | The `NMTOKEN` attribute type as defined in the XML 1.0 specification: a token made up of name characters. The `NMTOKEN` type is derived from the `token` type. |
| `NMTOKENS` | A whitespace-separated list of one or more `NMTOKEN` values. |
| `nonNegativeInteger` | An integer greater than or equal to zero. The `nonNegativeInteger` type is derived from the `integer` type. |
| `nonPositiveInteger` | An integer less than or equal to zero. The `nonPositiveInteger` type is derived from the `integer` type. |
| `normalizedString` | A string that contains no carriage return, line feed, or tab characters. The `normalizedString` type is derived from the `string` type. |
| `positiveInteger` | An integer greater than zero. The `positiveInteger` type is derived from the `nonNegativeInteger` type. |
| `short` | An integer ranging from −32768 to 32767. The `short` type is derived from the `int` type. |
| `token` | A normalized string with no leading or trailing spaces and no runs of two or more consecutive spaces. The `token` type is derived from the `normalizedString` type. |
| `unsignedByte` | An integer ranging from 0 to 255. The `unsignedByte` type is derived from the `unsignedShort` type. |
| `unsignedInt` | An integer ranging from 0 to 4294967295. The `unsignedInt` type is derived from the `unsignedLong` type. |
| `unsignedLong` | An integer ranging from 0 to 18446744073709551615. The `unsignedLong` type is derived from the `nonNegativeInteger` type. |
| `unsignedShort` | An integer ranging from 0 to 65535. The `unsignedShort` type is derived from the `unsignedInt` type. |
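As a minimal sketch of how the derived types tighten validation, the following schema uses hypothetical element names; the comments state which values should be accepted or rejected according to the ranges and rules in the table.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element names chosen to illustrate derived types -->
  <xs:element name="OrderLine">
    <xs:complexType>
      <xs:sequence>
        <!-- 1, 2, 3, ... are accepted; 0 and -5 are rejected -->
        <xs:element name="Quantity" type="xs:positiveInteger" />
        <!-- 0 to 255 are accepted; 300 is rejected -->
        <xs:element name="Priority" type="xs:unsignedByte" />
        <!-- a language tag such as en-US or fr -->
        <xs:element name="Lang" type="xs:language" />
        <!-- a code such as ABC-123; surrounding and repeated spaces
             are collapsed before the value is checked -->
        <xs:element name="ProductCode" type="xs:token" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```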

Just as these derived datatypes are built upon other types, you can build your own datatypes through the use of the <simpleType> element directly in your XML Schema document.

Listing 6-24 provides an example of creating a custom datatype called MyCountry.

Listing 6-24: Creating a custom datatype called MyCountry

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process">
           <xs:complexType>
              <xs:sequence>
                 <xs:element name="BillingAddress" type="ContactDetails" />
                 <xs:element name="ShippingAddress" type="ContactDetails" />
              </xs:sequence>
           </xs:complexType>
        </xs:element>
        <xs:complexType name="ContactDetails">
           <xs:sequence>
              <xs:element name="Name" type="xs:string" />
              <xs:element name="Address" type="xs:string" />
              <xs:element name="City" type="xs:string" />
              <xs:element name="State" type="xs:string" />
              <xs:element name="Country" type="MyCountry" />
           </xs:sequence>
        </xs:complexType>
        <xs:simpleType name="MyCountry">
           <xs:restriction base="xs:string">
              <xs:enumeration value="USA" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="Canada" />
              <xs:enumeration value="Finland" />
           </xs:restriction>
        </xs:simpleType>
      </xs:schema>

In Listing 6-24, the <BillingAddress> and <ShippingAddress> elements both use the ContactDetails complex type, which consists of a series of elements. The element to pay attention to in this example is <Country>, whose type is a custom derived datatype.

You build a custom datatype using the <simpleType> element, with the name attribute giving the new datatype its name; in this example, the datatype is named MyCountry. Next, the <restriction> element derives the new type from another datatype, named in its base attribute (in this case, xs:string). Finally, the <enumeration> elements restrict MyCountry to a fixed set of allowed string values.
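The following instance is a sketch of how the MyCountry enumeration behaves; the schema file name CustomDatatype.xsd is hypothetical. The billing address should validate, while the shipping address should be rejected because its <Country> value is not one of the four enumerated strings. Enumeration values for xs:string are matched exactly, so a value such as usa would be rejected as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="CustomDatatype.xsd">
  <BillingAddress>
    <Name>Bill Evjen</Name>
    <Address>123 Main Street</Address>
    <City>Saint Charles</City>
    <State>Missouri</State>
    <Country>USA</Country>          <!-- valid: listed in MyCountry -->
  </BillingAddress>
  <ShippingAddress>
    <Name>Bill Evjen</Name>
    <Address>123 Main Street</Address>
    <City>Saint Charles</City>
    <State>Missouri</State>
    <Country>Germany</Country>      <!-- invalid: not listed in MyCountry -->
  </ShippingAddress>
</Process>
```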

Groups and Choices

You have seen some examples of encapsulation so far in this chapter. Another form of encapsulation places commonly used element groups together in a package that can be used over and over again within your XML Schema document. This can be accomplished using the <group> element.

For instance, suppose you have an XML Schema document as shown in Listing 6-25.

Listing 6-25: Creating a reusable group in your XML Schema document

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process">
           <xs:complexType>
              <xs:sequence>
                 <xs:element name="OrderNumber" type="xs:positiveInteger" />
                 <xs:group ref="ContactDetails" />
              </xs:sequence>
           </xs:complexType>
        </xs:element>
        <xs:group name="ContactDetails">
           <xs:sequence>
              <xs:element name="Name" type="xs:string"/>
              <xs:element name="Address" type="xs:string"/>
              <xs:element name="City" type="xs:string"/>
              <xs:element name="State" type="xs:string"/>
              <xs:element name="Country" type="xs:string"/>
           </xs:sequence>
        </xs:group>
      </xs:schema>

In this case, you create a group called ContactDetails that encapsulates the <Name>, <Address>, <City>, <State>, and <Country> elements. When defining the child elements of the <Process> element, you use a <group> element to incorporate the ContactDetails group. The ref attribute associates the <group> element with the defined group; its value is the group's name.

Because the group's elements are wrapped in a <sequence> element, all the elements of ContactDetails must appear in the order in which they are defined.
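As a sketch, an instance of Listing 6-25 might look like the following; the schema file name Groups.xsd is hypothetical. Unlike a named complex type assigned to an element, a group reference does not introduce a wrapper element: the group's elements appear directly inside <Process>, after <OrderNumber>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- No <ContactDetails> element appears: the group's contents are
     inserted in place, in the order the group's <xs:sequence> defines -->
<Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="Groups.xsd">
  <OrderNumber>1234</OrderNumber>
  <Name>Bill Evjen</Name>
  <Address>123 Main Street</Address>
  <City>Saint Charles</City>
  <State>Missouri</State>
  <Country>USA</Country>
</Process>
```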

Now suppose you want to offer a choice of elements that can appear as children of the <Process> element. You can allow a choice between single elements or between entire groups of elements. Suppose you want to change the <Process> element to allow either an American or a Canadian address, using a different set of elements for each. At the same time, you want only one of these element groups to appear within the <Process> element. This is the situation for the <choice> element. Listing 6-26 provides the XML Schema document for this situation.

Listing 6-26: Using <choice> to select one of two groups

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="Process">
           <xs:complexType>
              <xs:sequence>
                 <xs:element name="OrderNumber" type="xs:positiveInteger"/>
                 <xs:choice>
                    <xs:group ref="ContactDetailsUS"/>
                    <xs:group ref="ContactDetailsCanada"/>
                 </xs:choice>
              </xs:sequence>
           </xs:complexType>
        </xs:element>
        <xs:group name="ContactDetailsUS">
           <xs:sequence>
              <xs:element name="US_Name" type="xs:string"/>
              <xs:element name="Address" type="xs:string"/>
              <xs:element name="City" type="xs:string"/>
              <xs:element name="State" type="xs:string"/>
              <xs:element name="Country" type="xs:string"/>
           </xs:sequence>
        </xs:group>
        <xs:group name="ContactDetailsCanada">
           <xs:sequence>
              <xs:element name="Canada_Name" type="xs:string"/>
              <xs:element name="Address" type="xs:string"/>
              <xs:element name="City" type="xs:string"/>
              <xs:element name="Province" type="xs:string"/>
              <xs:element name="Country" type="xs:string"/>
           </xs:sequence>
        </xs:group>
      </xs:schema>

In this case, two groups are defined: ContactDetailsUS and ContactDetailsCanada. The elements of the two groups are similar but differ in places. For instance, each group uses a unique element name for the contact's name: the US version uses <US_Name>, whereas the Canadian version uses <Canada_Name>. Also, the US version uses <State>, whereas the Canadian version uses <Province>. Figure 6-3 presents the diagram of the schema.

Figure 6-3

Within the <Process> element, you want only one of these groups to appear after <OrderNumber>, so you use the <choice> element:

      <xs:element name="Process">
         <xs:complexType>
            <xs:sequence>
               <xs:element name="OrderNumber" type="xs:positiveInteger"/>
               <xs:choice>
                  <xs:group ref="ContactDetailsUS"/>
                  <xs:group ref="ContactDetailsCanada"/>
               </xs:choice>
            </xs:sequence>
         </xs:complexType>
      </xs:element>

The <choice> element contains all the alternatives you want to allow. In this case, only two are defined: ContactDetailsUS and ContactDetailsCanada. Because only one or the other is allowed, your XML instance document takes the form presented in either Listing 6-27 or Listing 6-28.

Listing 6-27: An XML document using the American contact information

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
         <OrderNumber>1234</OrderNumber>
         <US_Name>Bill Evjen</US_Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
      </Process>

The US version of the document uses the <US_Name> and <State> elements.

Listing 6-28: An XML document using the Canadian contact information

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
         <OrderNumber>1234</OrderNumber>
         <Canada_Name>Bill Evjen</Canada_Name>
         <Address>123 Main Street</Address>
         <City>Vancouver</City>
         <Province>British Columbia</Province>
         <Country>Canada</Country>
      </Process>

Finally, the Canadian version of the document uses the <Canada_Name> and <Province> elements.
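To see the choice enforced, consider a hypothetical instance that mixes the two groups. The following sketch is not one of the book's listings: it starts with the US group but then uses the Canadian <Province> element. A validating parser should reject it, because once <US_Name> has selected the ContactDetailsUS group, the schema expects <State> after <City>.

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="ComplexTypes.xsd">
         <OrderNumber>1234</OrderNumber>
         <US_Name>Bill Evjen</US_Name>
         <Address>123 Main Street</Address>
         <City>Vancouver</City>
         <!-- Invalid: the ContactDetailsUS group expects <State> here -->
         <Province>British Columbia</Province>
         <Country>Canada</Country>
      </Process>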

Element Restrictions

Building an XML Schema document is all about establishing restrictions. You are defining a set structure of XML that must be in place in order for the XML document to be considered valid. This means, as you have seen so far, that certain elements have to appear, that they have to be spelled in a particular way, and that their values must be of a certain datatype.

You can take the restrictions even further by using a number of available attributes when creating your elements or attributes (attributes are covered shortly).

Cardinality in XML Schemas

One of the DTD shortcomings mentioned at the beginning of this chapter is how DTDs deal with cardinality. You want a fine-grained way to define whether an item can appear in a document and how often. In an XML Schema document, cardinality is controlled with the minOccurs and maxOccurs attributes.

minOccurs

The minOccurs attribute specifies the minimum number of times an item must appear. If you omit it, its value defaults to 1. Listing 6-29 shows the minOccurs attribute in use.

Listing 6-29: Using the minOccurs attribute with an element

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Salutation" type="xs:string" minOccurs="0" />
               <xs:element name="Name" type="xs:string"/>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
            </xs:sequence>
         </xs:group>
      </xs:schema>

In Listing 6-29, a new element, <Salutation>, is added. This element includes a minOccurs attribute with a value of 0 (zero). Because maxOccurs is not specified and therefore defaults to 1, the <Salutation> element can appear zero or one times in the document. The XML presented in Listing 6-30 is considered valid XML.

Listing 6-30: Using the minOccurs in an instance document

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="DefaultValues.xsd">
         <OrderNumber>1234</OrderNumber>
         <Salutation>Mr.</Salutation>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
      </Process>

The code shown in Listing 6-31, which omits the <Salutation> element, is also considered valid XML.

Listing 6-31: Using the minOccurs in an instance document

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="DefaultValues.xsd">
         <OrderNumber>1234</OrderNumber>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
      </Process>
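Because both minOccurs and maxOccurs default to 1 when they are omitted, the following two declarations should be equivalent. They are shown as alternatives in this small illustrative sketch, not as two declarations to place side by side in one content model; spelling out the defaults is optional.

      <!-- Implicit cardinality: exactly one <Name> element -->
      <xs:element name="Name" type="xs:string" />

      <!-- The same cardinality, written out explicitly -->
      <xs:element name="Name" type="xs:string" minOccurs="1" maxOccurs="1" />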

maxOccurs

The other attribute that helps you to control the number of times an element appears in any of your instance documents is the maxOccurs attribute. This attribute controls the maximum number of times an element may appear in your document. Listing 6-32 shows an example of using the maxOccurs attribute in your XML Schema document.

Listing 6-32: Using the maxOccurs attribute

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Name" type="xs:string"/>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
               <xs:element name="Telephone" type="xs:string" maxOccurs="2" />
            </xs:sequence>
         </xs:group>
      </xs:schema>

In this case, a <Telephone> element is added that can occur once or twice within the XML document. Because minOccurs defaults to 1, every element defined here must occur at least once unless it has a minOccurs attribute set to 0. This means that you can now have an instance document like the one presented in Listing 6-33.

Listing 6-33: Using the maxOccurs attribute

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="Cardinality.xsd">
         <OrderNumber>1234</OrderNumber>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
         <Telephone>555-1212</Telephone>
         <Telephone>555-1213</Telephone>
      </Process>

As you can see from this listing, the <Telephone> element has appeared twice in the document. This is allowed to occur because the maxOccurs attribute is set to 2.
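The upper bound is enforced just as strictly. As an illustrative sketch against the schema in Listing 6-32, the following excerpt of the <Process> element's children adds a third <Telephone> element, which should make the document invalid:

         <Country>USA</Country>
         <Telephone>555-1212</Telephone>
         <Telephone>555-1213</Telephone>
         <!-- Invalid: a third occurrence exceeds maxOccurs="2" -->
         <Telephone>555-1214</Telephone>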

You can also remove the upper limit entirely by setting the maxOccurs attribute to the value unbounded. This is shown in Listing 6-34.

Listing 6-34: Using the maxOccurs attribute

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Name" type="xs:string"/>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
               <xs:element name="Telephone" type="xs:string"
                maxOccurs="unbounded" />
            </xs:sequence>
         </xs:group>
      </xs:schema>

With this in place, the <Telephone> element can now appear as many times in the document as the instance document author wants (remember that it has to appear at least once in the document).
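If you want the <Telephone> element to be entirely optional while still allowing any number of occurrences, a reasonable sketch is to combine the two attributes. The following declaration is a variation on Listing 6-34, not one of the book's listings:

      <!-- Zero or more <Telephone> elements -->
      <xs:element name="Telephone" type="xs:string"
       minOccurs="0" maxOccurs="unbounded" />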

Default values

Sometimes you want to give an element a default value, which can make XML documents less error-prone and easier to author. For instance, suppose you want to add a new child element called <Location> to the <Process> element and give it a default value at the same time. You do this by using the default attribute on the element declaration, as presented in Listing 6-35.

Listing 6-35: Creating an element with a default value attached to it

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:element name="Location" type="xs:string" default="Seattle" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Name" type="xs:string"/>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
            </xs:sequence>
         </xs:group>
      </xs:schema>

In this case, an XML element called <Location> provides the location where the order is to be processed. The default attribute on the <element> declaration assigns it a default value of Seattle. This means that if the <Location> element appears in the instance document without a value, a value of Seattle is assumed. Using an XML Schema document as shown here means you can have an XML document like the one in Listing 6-36.

Listing 6-36: Building the <Location> element

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="DefaultValues.xsd">
         <OrderNumber>1234</OrderNumber>
         <Location>San Francisco</Location>
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
      </Process>

This is a valid instance document. Although the XML Schema document from Listing 6-35 sets a default value for the <Location> element, Listing 6-36 overrides it simply by supplying a different value, San Francisco. You could also have made use of the default value by building the XML instance document as presented in Listing 6-37.

Listing 6-37: Building the <Location> element using the default value

      <?xml version="1.0" encoding="UTF-8"?>
      <Process xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="DefaultValues.xsd">
         <OrderNumber>1234</OrderNumber>
         <Location />
         <Name>Bill Evjen</Name>
         <Address>123 Main Street</Address>
         <City>Saint Charles</City>
         <State>Missouri</State>
         <Country>USA</Country>
      </Process>

In this case, the value of Seattle is used for the <Location> element because the element is empty. Note that the default attribute lets you leave the <Location> element empty, but the element itself must still appear in the document, because its minOccurs value is the default of 1. If the element is missing, the instance document is considered invalid.

Fixed Values

A fixed value is similar to a default value, with one big difference: the author of the instance document cannot change it. You assign a fixed value with the fixed attribute rather than the default attribute. For instance, if you wanted to set the <Location> element to a fixed value of Seattle, you would use code like that shown in Listing 6-38.

Listing 6-38: Creating an element with a fixed value attached to it

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:element name="Location" type="xs:string" fixed="Seattle" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Name" type="xs:string"/>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
            </xs:sequence>
         </xs:group>
      </xs:schema>

Using this XML Schema means that the following element is valid:

      <Location>Seattle</Location> <!-- Valid -->

Using a value other than Seattle causes your instance document to be considered invalid:

      <Location>San Francisco</Location> <!-- Invalid -->

In one respect, the fixed attribute behaves like the default attribute: as a consumer of this schema, you are not required to place a value within the <Location> element. This means that the following use of the <Location> element is also considered valid XML, and a value of Seattle is assumed.

      <Location /> <!-- Valid -->

Null Values

In some cases, you want to allow an element to be explicitly null; in others, you want to make sure it can never be null. For this, you use the nillable attribute on the element declaration and set it to either true or false (if you omit it, the value is false). Its use is presented in Listing 6-39.

Listing 6-39: Creating an element whose value can be null

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:element name="Location" type="xs:string" nillable="true" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Name" type="xs:string"/>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
            </xs:sequence>
         </xs:group>
      </xs:schema>
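Declaring nillable="true" only permits a null value; the instance document signals it with the xsi:nil attribute from the http://www.w3.org/2001/XMLSchema-instance namespace. The following fragment is a sketch that assumes the xsi prefix is bound on the <Process> root element, as in the earlier instance listings. Note that an element marked as nil must be empty.

      <!-- Valid: <Location> is explicitly null -->
      <Location xsi:nil="true" />

      <!-- Invalid: an element marked xsi:nil="true" cannot have content -->
      <Location xsi:nil="true">Seattle</Location>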

Defining Attributes

So far in this chapter, much of the attention has been on XML elements. You saw how easy it is to create element declarations in your XML Schema documents. You can also just as easily create declarations for the attributes.

An attribute is a name/value pair that appears inside an element's start tag. Attributes are there to further describe an element. An element can contain as many attributes as it needs, although each attribute name can appear only once on a given element. Listing 6-40 shows an example of declaring an attribute to be used within the <Name> element.

Listing 6-40: Creating an attribute for the <Name> element

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="Process">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="OrderNumber" type="xs:positiveInteger" />
                  <xs:group ref="ContactDetails" />
               </xs:sequence>
            </xs:complexType>
         </xs:element>
         <xs:group name="ContactDetails">
            <xs:sequence>
               <xs:element name="Name">
                  <xs:complexType>
                     <xs:simpleContent>
                        <xs:extension base="xs:string">
                           <xs:attribute name="Sex" type="xs:string" />
                        </xs:extension>
                     </xs:simpleContent>
                  </xs:complexType>
               </xs:element>
               <xs:element name="Address" type="xs:string"/>
               <xs:element name="City" type="xs:string"/>
               <xs:element name="State" type="xs:string"/>
               <xs:element name="Country" type="xs:string"/>
            </xs:sequence>
         </xs:group>
      </xs:schema>

This example shows that the <Name> element can now contain a Sex attribute. This means that the following construction is possible:

      <Name Sex="M">Bill Evjen</Name>

It is also just as possible to do without the attribute, and the element is still considered valid.

      <Name>Bill Evjen</Name>

With the attribute, the <Name> element becomes even more descriptive. Just as with an element declaration, you declare an attribute by providing its name and the datatype of the value it can hold. In this case, the datatype is a string, so the attribute can hold a value such as "M" or "F". A string type alone, however, does not limit the value to those two letters; any text is accepted.
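To actually limit Sex to just those two letters, you can give the attribute an anonymous simple type with an enumeration. The following is a sketch of how the attribute declaration in Listing 6-40 could be tightened:

      <xs:attribute name="Sex">
         <xs:simpleType>
            <xs:restriction base="xs:string">
               <!-- Only these two values are accepted -->
               <xs:enumeration value="M" />
               <xs:enumeration value="F" />
            </xs:restriction>
         </xs:simpleType>
      </xs:attribute>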

Attribute declarations can appear within the <schema>, <complexType>, and <attributeGroup> elements, as well as within the <extension> or <restriction> element of a complex type's content, as in Listing 6-40. If the complex type also declares a content model, such as a <sequence> of child elements, the attribute declarations must appear after it. If you are declaring multiple attributes, they do not need to appear in any specific order.
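The following sketch illustrates the ordering rule for a complex type that has both child elements and an attribute. The <Contact> element and its Id attribute are hypothetical names used only for this example.

      <xs:element name="Contact">
         <xs:complexType>
            <xs:sequence>
               <xs:element name="Name" type="xs:string" />
               <xs:element name="Address" type="xs:string" />
            </xs:sequence>
            <!-- Attribute declarations follow the content model -->
            <xs:attribute name="Id" type="xs:positiveInteger" />
         </xs:complexType>
      </xs:element>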

Default Values

The <attribute> element within an XML Schema document can also carry a default attribute. If the author of an instance document based on the schema does not supply the attribute, the default value is used. This is shown in Listing 6-41 within this partial XML Schema document.

Listing 6-41: Creating default values for attributes

      <xs:element name="Name">
         <xs:complexType>
            <xs:simpleContent>
               <xs:extension base="xs:string">
                  <xs:attribute name="Member" default="No" />
               </xs:extension>
            </xs:simpleContent>
         </xs:complexType>
      </xs:element>

The big difference between defaults for elements and attributes is this: an element default is applied only when the element appears in the instance document with no content, so the element itself must still be present. An attribute, on the other hand, can be omitted entirely, and the default value is assumed.
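With the declaration from Listing 6-41 in place, the following instance fragments sketch the difference:

      <!-- Member is omitted, so a value of No is assumed -->
      <Name>Bill Evjen</Name>

      <!-- An explicit value overrides the default -->
      <Name Member="Yes">Bill Evjen</Name>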

use Attribute

The use attribute lets you specify whether an attribute is required on its element. The use attribute is itself optional and can take one of three values: optional, prohibited, or required. This is shown in Listing 6-42.

Listing 6-42: Using the use attribute

      <xs:element name="Name">
         <xs:complexType>
            <xs:simpleContent>
               <xs:extension base="xs:string">
                  <xs:attribute name="Member" use="required" />
               </xs:extension>
            </xs:simpleContent>
         </xs:complexType>
      </xs:element>

The default setting is optional.
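Assuming the declaration from Listing 6-42, the following fragments sketch how a validating parser treats a required attribute. Keep in mind also that XML Schema allows a default value only on an attribute whose use is optional, so default and use="required" cannot be combined on the same declaration.

      <!-- Valid: the required Member attribute is present -->
      <Name Member="Yes">Bill Evjen</Name>

      <!-- Invalid: the Member attribute is missing -->
      <Name>Bill Evjen</Name>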

Putting Restrictions on Attribute Values

At times you don't want to let the end user enter just any value for an attribute; instead, you want to limit the attribute's values. Listing 6-43 shows how to do this.

Listing 6-43: Restrictions being applied to attributes

      <xs:element name="Name">
         <xs:complexType>
            <xs:simpleContent>
               <xs:extension base="xs:string">
                  <xs:attribute name="Age">
                     <xs:simpleType>
                        <xs:restriction base="xs:positiveInteger">
                           <xs:minInclusive value="12" />
                           <xs:maxInclusive value="95" />
                        </xs:restriction>
                     </xs:simpleType>
                  </xs:attribute>
               </xs:extension>
            </xs:simpleContent>
         </xs:complexType>
      </xs:element>

This is done by using the <minInclusive> and <maxInclusive> elements. The preceding example specifies an attribute Age where the minimum value that can be utilized is 12 and the maximum value is 95. Therefore, if the end user inputs a value that is not within this range, the XML document is considered invalid.
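Against Listing 6-43, instance fragments like the following sketch the effect of the bounds. Because the facets are inclusive, the values 12 and 95 themselves are accepted.

      <Name Age="12">Bill Evjen</Name>  <!-- Valid: equal to the minimum -->
      <Name Age="95">Bill Evjen</Name>  <!-- Valid: equal to the maximum -->
      <Name Age="8">Bill Evjen</Name>   <!-- Invalid: below 12 -->
      <Name Age="96">Bill Evjen</Name>  <!-- Invalid: above 95 -->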

Earlier in this chapter, you saw how to use the <xs:restriction> element to define an enumeration of allowed values as well:

      <xs:simpleType name="MyCountry">
         <xs:restriction base="xs:string">
            <xs:enumeration value="USA" />
            <xs:enumeration value="UK" />
            <xs:enumeration value="Canada" />
            <xs:enumeration value="Finland" />
         </xs:restriction>
      </xs:simpleType>

In this case, the <xs:restriction> element along with the list of <xs:enumeration> elements forces a restriction to only the items in the list.
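A named simple type such as MyCountry is applied by referring to it in the type attribute of an element or attribute declaration. The following sketch assumes a schema with no target namespace, like the others in this chapter, so the unprefixed name MyCountry resolves to the type above:

      <xs:element name="Country" type="MyCountry" />

With this declaration, <Country>Canada</Country> is valid, whereas <Country>Mexico</Country> is not, because Mexico is not in the enumeration.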

Other types of restrictions you can utilize include:

      <xs:restriction base="xs:string">
         <xs:minLength value="1" />
      </xs:restriction>

      <xs:restriction base="xs:string">
         <xs:maxLength value="20" />
      </xs:restriction>

The <xs:minLength> and <xs:maxLength> elements define the minimum and maximum length of a value; for strings, length is measured in characters. To restrict numeric values, you can use elements such as <xs:totalDigits>, which limits the number of digits in a decimal value.
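As a sketch of a numeric restriction, the following type accepts decimal values with at most six digits in total, of which at most two can follow the decimal point. The type name Price is hypothetical and used only for illustration.

      <xs:simpleType name="Price">
         <xs:restriction base="xs:decimal">
            <!-- At most 6 digits overall -->
            <xs:totalDigits value="6" />
            <!-- At most 2 digits after the decimal point -->
            <xs:fractionDigits value="2" />
         </xs:restriction>
      </xs:simpleType>

Under this type, 1234.56 is valid, 12345.67 is invalid (seven digits in total), and 12.345 is invalid (three digits after the decimal point).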

The available constraining facets in the XML Schema language include:


Facet	Description
`enumeration`	Defines a set of allowed values
`fractionDigits`	Specifies the maximum number of digits allowed after the decimal point
`length`	Specifies the exact number of units of length a value must have (characters for strings, items for lists, octets for binary data)
`maxExclusive`	Defines an exclusive upper bound; the value must be less than this
`maxInclusive`	Defines an inclusive upper bound; the value must be less than or equal to this
`maxLength`	Specifies the maximum number of units of length allowed for a value
`minExclusive`	Defines an exclusive lower bound; the value must be greater than this
`minInclusive`	Defines an inclusive lower bound; the value must be greater than or equal to this
`minLength`	Specifies the minimum number of units of length allowed for a value
`pattern`	Constrains the exact form of a value using a regular expression
`totalDigits`	Specifies the maximum total number of digits allowed in a decimal value
`whiteSpace`	Defines how whitespace in the value is normalized. Possible values are `preserve`, `replace`, and `collapse`
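As a sketch of the pattern facet, the following type would constrain values like the ones used in the earlier <Telephone> examples to three digits, a hyphen, and four digits. The name PhoneNumber is hypothetical. XML Schema patterns are implicitly anchored, so the whole value must match.

      <xs:simpleType name="PhoneNumber">
         <xs:restriction base="xs:string">
            <!-- Matches 555-1212 but not 5551212 or 555-12120 -->
            <xs:pattern value="\d{3}-\d{4}" />
         </xs:restriction>
      </xs:simpleType>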

Attribute Groups

If you have certain attributes that are used across a wide variety of elements, it is easier to create an attribute group in order to manage these attributes. This function allows you to create a group of attributes that you can assign to different elements without having to declare the same attributes over and over again for each element. This is shown in Listing 6-44.

Listing 6-44: Creating an attribute group

      <xs:attributeGroup name="myAttributes">
         <xs:attribute name="x" type="xs:integer" />
         <xs:attribute name="y" type="xs:integer" />
      </xs:attributeGroup>
      <xs:complexType name="myElementType">
         <xs:attributeGroup ref="myAttributes" />
      </xs:complexType>

The idea is to declare a group of attributes within a named <attributeGroup> element. When you want to apply that set of attributes to an element's type, you add another <attributeGroup> element and point to the named group with its ref attribute.
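To use the group, you then give an element the type that references it. The following sketch declares a hypothetical <Point> element of type myElementType from Listing 6-44:

      <xs:element name="Point" type="myElementType" />

An instance element such as <Point x="10" y="20" /> should then be valid. Both attributes are optional, because their declarations in the group do not set the use attribute.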

Even when you use attribute groups, you can still give an element attributes other than those the group specifies, as shown in Listing 6-45.

Listing 6-45: Using additional attributes

      <xs:attributeGroup name="myAttributes">
         <xs:attribute name="x" type="xs:integer" />
         <xs:attribute name="y" type="xs:integer" />
      </xs:attributeGroup>
      <xs:complexType name="myElementType">
         <xs:attribute name="z" type="xs:integer" />
         <xs:attributeGroup ref="myAttributes" />
      </xs:complexType>

With this declaration, you assign the attributes that are represented in the attribute group myAttributes as well as the new attribute z.

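In some situations, you don't want to use every attribute defined within the attribute group. In these cases, you use the prohibited value of the use attribute to keep the end user from employing that particular attribute within the element. Note that use="prohibited" has no effect in a type that simply references the group; it is meaningful when you derive a new complex type by restriction and redeclare the unwanted attribute there.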
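One way to apply use="prohibited" is to derive a new complex type by restriction. The following is a sketch built on myElementType from Listing 6-44; the name myRestrictedType is hypothetical. Attributes not mentioned in the restriction (here, x) are inherited from the base type, while y is removed.

      <xs:complexType name="myRestrictedType">
         <xs:complexContent>
            <xs:restriction base="myElementType">
               <!-- Remove the y attribute contributed by myAttributes -->
               <xs:attribute name="y" use="prohibited" />
            </xs:restriction>
         </xs:complexContent>
      </xs:complexType>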