XML Schema | SOA for the Business Developer: Concepts, BPEL, and SCA (Business Developers series)

The next sections introduce XML Schema definitions, or XSDs. You use an XML Schema to describe the values permitted in an XML document and the relationships among the document's elements. For details about the XML Schema definition language, see the Web site http://www.w3.org/TR/xmlschema-0.

Data Type

To understand an XSD, you first need to understand the notion of data type. A data type is an identifier that specifies a set of values along with the operations that can act on those values. In most computer languages, for example, an integer consists of digits and can be combined arithmetically with other numbers.

In general, we use data types to validate user input, to avoid runtime errors, and to increase the efficiency of runtime code. Data types also allow a meaningful categorization of data, so that, for instance:

Each use of an employee ID is based on a data type named Employee-ID.
Each employee record has the same hierarchy, with data types such as Employee-ID, Name, and Age.
The same set of validations applies whenever a data type such as Name (always containing Last-Name and First-Name) is used, whether in an employee record or in a customer-contact record.

Data types are described in various ways. With some simplification, we can characterize XML Schema types as fitting into one of four basic categories: primitive, derived, simple, and complex.

A primitive type is a data type that does not include another and is not derived from another. Examples include Boolean, decimal, and string.

A derived type is based on a primitive type and is provided by XML Schema. The type NMTOKEN, for example, accepts a subset of string values: a single token made of name characters (such as letters, digits, periods, and hyphens), with no spaces anywhere.
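To make the idea of a derived type concrete, here is a minimal Python sketch that approximates the NMTOKEN rule with a regular expression. It is a simplification that assumes ASCII name characters only (letters, digits, period, hyphen, underscore, and colon); the full definition also admits many non-ASCII characters. The helper name is_nmtoken is hypothetical.

```python
import re

# Simplified ASCII approximation of the XML name characters; the full
# definition in the XML specification also allows many Unicode characters.
_NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken(value: str) -> bool:
    """Return True if value looks like an NMTOKEN (hypothetical helper)."""
    # fullmatch must cover the whole string, so any embedded space fails.
    # (A real XSD processor first strips leading and trailing whitespace.)
    return _NMTOKEN.fullmatch(value) is not None


print(is_nmtoken("policy-2024"))  # True
print(is_nmtoken("two words"))    # False
```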

A simple type can be a primitive type or a derived type, but it also can be derived from another simple type by a developer. We want to emphasize a particular use of types in business development, so we'll use the phrase simple type to refer only to a developer-derived simple type. If you read XML specifications, however, be aware that the phrase "simple type" can include primitive and derived types.

A simple type retains every operation of the type on which the simple type is based but adds data restrictions that reflect business rules. Given a base type of string, for example, you might create a simple type called Employee-ID and allow only the letters A through J followed by five digits. The meaningfulness of the name Employee-ID helps developers and business analysts to think and communicate clearly. In addition, type checking at different points in the development cycle can ensure that all employee IDs conform to the restrictions specified in the definition of the type.
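The Employee-ID rule can be sketched in Python as a string check followed by a pattern restriction. The sketch assumes the rule means exactly one letter from A through J followed by exactly five digits; the names EMPLOYEE_ID_PATTERN and is_employee_id are hypothetical. In an XSD, the same restriction would be expressed with a pattern facet on a simple type whose base is string.

```python
import re

# Assumed reading of the rule: one letter A-J, then exactly five digits.
EMPLOYEE_ID_PATTERN = re.compile(r"[A-J][0-9]{5}")


def is_employee_id(value: object) -> bool:
    """Check a value against the hypothetical Employee-ID simple type."""
    # The base type is string, so anything that is not a string fails.
    if not isinstance(value, str):
        return False
    # The restriction narrows the base type: only matching strings remain.
    return EMPLOYEE_ID_PATTERN.fullmatch(value) is not None


for candidate in ["B12345", "K12345", "B1234", 12345]:
    print(candidate, is_employee_id(candidate))
```

Every place that checks an employee ID can call the same function, which mirrors the benefit of naming the type once and reusing it.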

A complex type is a data type that is composed of other data types. An example might be called Employee-Record, which includes a simple type called Employee-ID, a complex type called Name, an integer called Age, and so on. Any data type in the composition may be primitive, derived, simple, or complex. When you create a complex type, you give names even to the primitive and derived types that are included in that complex type. As always, meaningful names are helpful.

A type doesn't contain values but identifies what values are possible. An employee-record type, for example, doesn't refer to a specific employee, but expresses the format needed to describe an employee. A variable, in contrast, contains data that fulfills the type requirements. A variable of an employee-record type, for example, can hold the employee ID, name, and age of a specific employee. The variable is said to be an instance of the type.
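The distinction between a type and a variable can be sketched with Python dataclasses: each class plays the role of a type, and each object is an instance of it. The class and field names mirror the Employee-Record example but are otherwise hypothetical, and Python classes are only an analogy for XML Schema complex types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    # A small complex type composed of two strings.
    last_name: str
    first_name: str


@dataclass(frozen=True)
class EmployeeRecord:
    # A complex type composed of other types, each with a meaningful name.
    employee_id: str  # stands in for the restricted Employee-ID type
    name: Name        # a nested complex type
    age: int          # a built-in integer type


# The class describes the format; this variable holds one employee's data.
record = EmployeeRecord("B12345", Name("Ito", "Kenji"), 42)
print(isinstance(record, EmployeeRecord))  # True
print(record.name.last_name)               # Ito
```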

As we stated earlier, a data type allows specific values and operations. A string, for example, can be concatenated with a string, but not with an integer. In relation to a given value, however, you can sometimes ignore or override those restrictions. You can concatenate the string "The year was" with the integer 2000, for example, but only if the integer is first converted to a string. Conversions may happen automatically or by the developer's cast, which is a directive that converts a value from one type to another.
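Python, for instance, rejects the concatenation described above unless the integer is converted first, and str() plays the role of the explicit conversion. This is only an illustration in one language; other languages may perform the conversion automatically.

```python
year = 2000

try:
    # str + int is rejected: Python does not convert the integer for you.
    message = "The year was " + year
except TypeError as err:
    print("Rejected:", err)

# An explicit conversion (the equivalent of a cast) makes the operation valid.
message = "The year was " + str(year)
print(message)  # The year was 2000
```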

Computer languages are sometimes categorized as either weakly typed, to the extent that data-type conversions happen automatically, or strongly typed, to the extent that the conversions happen only as a result of an explicit cast. A weakly typed language requires less discipline at development time, but a strongly typed language can allow faster processing at run time, because less type checking is required then.

Purpose of an XML Schema

You author an XML Schema (sometimes called an XML Schema definition, or XSD) to describe the values that are allowable in each element and attribute in an XML document and to characterize the relationship of one element to another. In short, an XSD describes an XML vocabulary.

In SOA, the XSD has two primary purposes:

to tell SOA-related tools (often in an integrated development environment) how to construct the code that operates "behind the scenes" to make possible a data exchange between the requester and a specific service
to assign validation rules that restrict the kind of data accepted by an endpoint at run time

The XSDs for a particular service must be available at each endpoint of a transmission. In the case of a Web service, an XSD is often embedded in a Web Services Description Language (WSDL) file, as described in Chapter 5. The XSD usually is not transferred with the business data.

Structure of an XML Schema

An XML Schema specifies a content model, which is an allowable set of content (names and types) for elements, along with equivalent details on the attributes of each element. An XML stream that conforms to the rules established in the XML Schema is called an instance document, and we can refer to the elements and attributes in that stream as instance elements and instance attributes, respectively.

As the example in Listing 4.3 shows, an XML Schema is itself written in XML.

Listing 4.3: Sample XML Schema definition

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:target="http://www.IBM.com/HighlightInsurance"
            targetNamespace="http://www.IBM.com/HighlightInsurance"
            elementFormDefault="unqualified">
       <annotation>
          <documentation xml:lang="en">
             An example XML Schema Definition (XSD).
          </documentation>
       </annotation>
       <element name="Insured">
          <complexType>
             <choice minOccurs="1" maxOccurs="unbounded">
                <element name="CarPolicy">
                   <complexType>
                      <choice maxOccurs="unbounded">
                         <element name="Vehicle"
                                  type="target:VehicleType"/>
                         <element name="Driver"
                                  type="target:DriverType"/>
                      </choice>
                      <attribute name="PolicyType"
                                 type="string" default="Auto"/>
                   </complexType>
                </element>
                <element name="HomePolicy" type="target:HomePolicyType"/>
             </choice>
             <attribute name="CustomerID" type="string" use="required"/>
          </complexType>
       </element>
       <complexType name="VehicleType">
          <group ref="target:VehicleGroup"/>
          <attribute name="VIN" use="required">
             <simpleType>
                <restriction base="string">
                   <minLength value="4"/>
                   <maxLength value="26"/>
                </restriction>
             </simpleType>
          </attribute>
          <attribute name="Category" type="string"/>
       </complexType>
       <group name="VehicleGroup">
          <sequence>
             <element name="Make" type="string"/>
             <element name="Model" type="string"/>
          </sequence>
       </group>
       <!-- The complex types DriverType and HomePolicyType are not shown -->
    </schema>

Global and Local Types

Any data types that are immediate children of the schema element are global, which means you can use the type name when assigning characteristics to elements or attributes anywhere in the Schema. An example of a global type is VehicleType.

    <schema>
       <complexType name="VehicleType">
          .
          .
       </complexType>
    </schema>

Other data types you declare are local, which means that they affect only their parent element or attribute. A local type has no name; in fact, XML Schema does not allow one.

    <element name="Insured">
       <complexType>
          .
          .
       </complexType>
    </element>

A data type that has no name is sometimes called an anonymous type.
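One way to see the difference is to parse a schema and look at where each type definition sits. The following Python sketch uses the standard xml.etree.ElementTree module on a trimmed copy of Listing 4.3 (trimmed only to keep the example short) and reports the named, top-level definitions as global and the nameless ones as local, anonymous types.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed version of Listing 4.3: one anonymous local type, one global type.
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.IBM.com/HighlightInsurance"
        targetNamespace="http://www.IBM.com/HighlightInsurance">
  <element name="Insured">
    <complexType>
      <attribute name="CustomerID" type="string" use="required"/>
    </complexType>
  </element>
  <complexType name="VehicleType">
    <attribute name="Category" type="string"/>
  </complexType>
</schema>"""

root = ET.fromstring(SCHEMA)
type_tags = {XS + "complexType", XS + "simpleType"}

# Global types are the immediate children of the schema element.
global_types = [child.get("name") for child in root if child.tag in type_tags]

# Local types sit deeper in the tree and carry no name attribute.
all_types = [e for e in root.iter() if e.tag in type_tags]
anonymous = [e for e in all_types if e.get("name") is None]

print("Global:", global_types)                  # Global: ['VehicleType']
print("Anonymous local types:", len(anonymous))  # Anonymous local types: 1
```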

Simple and Complex Types

A global or local type can be simple or complex. A simple type is a data type that indicates the allowable text content for an element or attribute, as in the following lines.

    <attribute name="VIN" use="required">
       <simpleType>
          <restriction base="string">
             <minLength value="4"/>
             <maxLength value="26"/>
          </restriction>
       </simpleType>
    </attribute>

Here, the subordinate restriction element includes two kinds of details:

A base type, which is the type from which the simple type will be derived. The base type can be a global simple type, a derived type, or a primitive type.
A set of facets (that is, characteristics) and related values. In this case, a restriction is in place for a minimum number of characters (minLength) and a maximum number of characters (maxLength).
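The effect of the two facets can be sketched in Python. The sketch counts length in characters, which is how XML Schema measures the length of a string; the helper name is_valid_vin is hypothetical, and a real validator would apply the base type's own rules as well.

```python
MIN_LENGTH = 4   # value of the minLength facet
MAX_LENGTH = 26  # value of the maxLength facet


def is_valid_vin(value: object) -> bool:
    """Apply the base type (string), then the two length facets."""
    if not isinstance(value, str):
        return False
    return MIN_LENGTH <= len(value) <= MAX_LENGTH


print(is_valid_vin("ABC"))     # False: shorter than minLength
print(is_valid_vin("ABCD"))    # True
print(is_valid_vin("X" * 27))  # False: longer than maxLength
```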

Instead of a restriction element, we could have used a list element (to allow content to be a whitespace-separated series of values of a given type) or a union element (to allow content to be a value of any of several member types).
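Both alternatives can be sketched in Python. A list value is a whitespace-separated series whose items must each satisfy the item type; a union value is accepted if any member type accepts it. The helper functions are hypothetical, the choice of integer and boolean as the types involved is an assumption for illustration, and the integer check is a simplified lexical test.

```python
import re


def is_integer(text: str) -> bool:
    # Simplified lexical check for an XML Schema integer: optional sign, digits.
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def is_boolean(text: str) -> bool:
    # XML Schema boolean accepts exactly these four literals.
    return text in {"true", "false", "1", "0"}


def is_integer_list(text: str) -> bool:
    # list: whitespace-separated items, each valid for the item type.
    # Without length facets, an empty list is valid too.
    return all(is_integer(item) for item in text.split())


def is_integer_or_boolean(text: str) -> bool:
    # union: valid if any member type accepts the value.
    return any(check(text) for check in (is_integer, is_boolean))


print(is_integer_list("10 20 30"))     # True
print(is_integer_list("10 twenty"))    # False
print(is_integer_or_boolean("true"))   # True
print(is_integer_or_boolean("-7"))     # True
print(is_integer_or_boolean("maybe"))  # False
```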

A complex type is a data type that includes elements, attributes, or both, as in the following lines.

    <element name="CarPolicy">
       <complexType>
          <choice maxOccurs="unbounded">
             <element name="Vehicle" type="target:VehicleType"/>
             <element name="Driver" type="string"/>
          </choice>
          <attribute name="PolicyType"
                     type="string" default="Auto"/>
       </complexType>
    </element>

Here, the subordinate choice element shows that

one or more child elements are valid, as indicated by the maxOccurs setting of unbounded and the default minOccurs value of 1
a given child element can be called Vehicle (of type VehicleType) or Driver (a string)

Instance attributes are optional by default. CarPolicy has an optional PolicyType attribute, for example, and the default value of PolicyType is Auto.
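The rules for CarPolicy can be checked by hand in a short Python sketch that uses xml.etree.ElementTree. It emulates only the two rules just described (one or more Vehicle or Driver children, and an optional PolicyType attribute that defaults to Auto); it ignores namespaces, does not look inside the children, and is not a general XML Schema validator. In practice a schema-aware validator, such as the XMLSchema class in the third-party lxml library, would do this work. The function name check_car_policy is hypothetical.

```python
import xml.etree.ElementTree as ET

ALLOWED_CHILDREN = {"Vehicle", "Driver"}


def check_car_policy(xml_text: str) -> str:
    """Return the effective PolicyType, or raise ValueError if a rule is broken."""
    policy = ET.fromstring(xml_text)
    tags = [child.tag for child in policy]
    # choice: minOccurs defaults to 1, maxOccurs is "unbounded"
    if not tags:
        raise ValueError("CarPolicy needs at least one Vehicle or Driver")
    unexpected = [tag for tag in tags if tag not in ALLOWED_CHILDREN]
    if unexpected:
        raise ValueError(f"unexpected child elements: {unexpected}")
    # Optional attribute: use the schema default when it is absent.
    return policy.get("PolicyType", "Auto")


doc = "<CarPolicy><Driver>Pat Lee</Driver><Driver>Sam Roe</Driver></CarPolicy>"
print(check_car_policy(doc))  # Auto
```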

VehicleType is another complex type, which includes a group of elements (type VehicleGroup, as described in the next section) and the attributes VIN and Category.

    <complexType name="VehicleType">
       <group ref="target:VehicleGroup"/>
       <attribute name="VIN" use="required">
          .
          .
       </attribute>
       <attribute name="Category" type="string"/>
    </complexType>

Groups

A group declaration is essentially a data type that specifies a list of elements. An example of a group declaration is VehicleGroup.

    <group name="VehicleGroup">
       <sequence>
          <element name="Make" type="string"/>
          <element name="Model" type="string"/>
       </sequence>
    </group>

Instance elements are required unless the minimum-occurrence (minOccurs) attribute is set to 0. (The default value of minOccurs is 1.) The group VehicleGroup, for example, indicates that each child instance element (Make and Model) is required.

Sequencing

A complex type or group might include a sequencing element:

The sequence element means that the instance elements must be in the specified order.
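The all element means that the instance elements can be in any order, although each can appear at most once.
The choice element means that only one of the listed elements can appear for each occurrence of the choice; the minOccurs and maxOccurs attributes of choice control how many selections are allowed.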

Let's look at two examples.

Insured (the root element of the XML instance document) must include at least one CarPolicy or HomePolicy instance element and can include any number of those elements in any combination or order.

    <element name="Insured">
       <complexType>
          <choice minOccurs="1" maxOccurs="unbounded">
             <element name="CarPolicy">
                <complexType>
                   .
                   .
                </complexType>
             </element>
             <element name="HomePolicy"
                      type="target:HomePolicyType"/>
          </choice>
          <attribute name="CustomerID" type="string"
                     use="required"/>
       </complexType>
    </element>

Incidentally, the Insured instance element also must include a CustomerID attribute, as indicated by the value of use.
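The same hand-written approach can sketch the rules for Insured: one or more CarPolicy or HomePolicy children in any mix, plus a required CustomerID attribute. As before, this Python function (check_insured, a hypothetical name) emulates only these rules, ignores namespaces, and does not inspect the content of the policy elements themselves.

```python
import xml.etree.ElementTree as ET

POLICY_TAGS = {"CarPolicy", "HomePolicy"}


def check_insured(xml_text: str) -> list[str]:
    """Return the rule violations found (an empty list means none)."""
    insured = ET.fromstring(xml_text)
    problems = []
    # use="required" on the CustomerID attribute
    if insured.get("CustomerID") is None:
        problems.append("missing required attribute CustomerID")
    tags = [child.tag for child in insured]
    # choice minOccurs="1" maxOccurs="unbounded": at least one policy, any mix
    if not tags:
        problems.append("at least one CarPolicy or HomePolicy is required")
    problems.extend(f"unexpected element {tag}" for tag in tags if tag not in POLICY_TAGS)
    return problems


print(check_insured('<Insured CustomerID="C100"><HomePolicy/><CarPolicy/></Insured>'))  # []
print(check_insured("<Insured/>"))
# ['missing required attribute CustomerID', 'at least one CarPolicy or HomePolicy is required']
```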

As shown in VehicleGroup, the Make instance element must precede Model.

    <group name="VehicleGroup">
       <sequence>
          <element name="Make" type="string"/>
          <element name="Model" type="string"/>
       </sequence>
    </group>
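Because sequence fixes the order, a check for VehicleGroup only has to compare the child element names with the declared list, position by position. The Python sketch below assumes each element occurs exactly once (the minOccurs and maxOccurs defaults) and uses the hypothetical name matches_sequence; the sample VIN is made-up data.

```python
import xml.etree.ElementTree as ET

VEHICLE_GROUP = ["Make", "Model"]  # declared order in the sequence


def matches_sequence(xml_text: str, expected: list[str]) -> bool:
    """True if the children appear exactly once each, in the declared order."""
    parent = ET.fromstring(xml_text)
    return [child.tag for child in parent] == expected


ok = '<Vehicle VIN="ABCD1234"><Make>Honda</Make><Model>Civic</Model></Vehicle>'
swapped = '<Vehicle VIN="ABCD1234"><Model>Civic</Model><Make>Honda</Make></Vehicle>'
print(matches_sequence(ok, VEHICLE_GROUP))       # True
print(matches_sequence(swapped, VEHICLE_GROUP))  # False
```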

Simple and Complex Content

You can derive a complex type from an existing (base) type. The derived type will have characteristics of the base type

with extensions, which are added attributes, content, or both
with restrictions, which exclude or further limit the existing attributes, content, or both

For example, you can use the simpleContent Schema element to add attributes to an instance element that has text content. Consider the Options element, which we described earlier.

    <Options>
       <TemporaryRental MaximumDays="10"/>
       <Towing/>
    </Options>

Here is a related XML Schema element.

    <element name="Options">
       <complexType>
          <all>
             <element name="TemporaryRental"
                      type="target:TemporaryRentalType" default="true"/>
             <element name="Towing" type="boolean" default="true"/>
          </all>
       </complexType>
    </element>
    <complexType name="TemporaryRentalType">
       <simpleContent>
          <extension base="boolean">
             <attribute name="MaximumDays"
                        type="integer" default="10"/>
             <attribute name="MaxDollarPerDay"
                        type="decimal" default="25.99"/>
          </extension>
       </simpleContent>
    </complexType>

Within the complexType element named TemporaryRentalType, the simpleContent element specifies that any instance element based on that type has Boolean text (value true or false, with a default) and can include the optional attributes MaximumDays (which takes an integer value) and MaxDollarPerDay (which takes a decimal value). Each attribute value has a default, too.
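How a processor might apply those defaults can be sketched in Python. The function read_temporary_rental and its result keys are hypothetical; the sketch assumes the default values shown in the schema (true for the element text, 10 for MaximumDays, 25.99 for MaxDollarPerDay) and uses Decimal so that the dollar amount is not rounded by binary floating point.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# XML Schema boolean literals and their meanings.
BOOLEAN_LITERALS = {"true": True, "1": True, "false": False, "0": False}


def read_temporary_rental(xml_text: str) -> dict:
    """Read a TemporaryRental element, filling in the schema defaults."""
    elem = ET.fromstring(xml_text)
    text = (elem.text or "").strip()
    return {
        # Empty content takes the element's default value, true.
        "value": BOOLEAN_LITERALS[text] if text else True,
        # Absent attributes take their declared defaults.
        "maximum_days": int(elem.get("MaximumDays", "10")),
        "max_dollar_per_day": Decimal(elem.get("MaxDollarPerDay", "25.99")),
    }


print(read_temporary_rental('<TemporaryRental MaximumDays="10"/>'))
# {'value': True, 'maximum_days': 10, 'max_dollar_per_day': Decimal('25.99')}
```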

You might use the element simpleContent to create a type that restricts aspects of an existing complex type, which itself has simple content. Here's an example.

    <complexType name="PremiumRentalType">
       <simpleContent>
          <restriction base="target:TemporaryRentalType">
             <attribute name="MaximumDays" type="integer" use="required"/>
             <attribute name="MaxDollarPerDay" use="prohibited"/>
          </restriction>
       </simpleContent>
    </complexType>

Given the new definition, you could declare an instance element (of whatever name) and base that element on the complex type PremiumRentalType. Only one attribute, MaximumDays, is valid, and it's required. The MaxDollarPerDay attribute is not valid in the instance element because, in the Schema attribute definition, the value of use is prohibited.
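The restricted type can be sketched the same way. This Python function (check_premium_rental, a hypothetical name) emulates only the attribute rules of PremiumRentalType: MaximumDays must be present and MaxDollarPerDay must be absent. Because the element name is free, the sketch ignores it.

```python
import xml.etree.ElementTree as ET


def check_premium_rental(xml_text: str) -> list[str]:
    """Return attribute-rule violations for an element of PremiumRentalType."""
    elem = ET.fromstring(xml_text)
    problems = []
    if "MaximumDays" not in elem.attrib:      # use="required"
        problems.append("MaximumDays is required")
    if "MaxDollarPerDay" in elem.attrib:      # use="prohibited"
        problems.append("MaxDollarPerDay is prohibited")
    return problems


print(check_premium_rental('<PremiumRental MaximumDays="30">true</PremiumRental>'))  # []
print(check_premium_rental('<PremiumRental MaxDollarPerDay="40.00"/>'))
# ['MaximumDays is required', 'MaxDollarPerDay is prohibited']
```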

You also can use the complexContent Schema element

to extend a complex type - for example, to add elements or attributes
to restrict a complex type - for example, to remove elements or attributes, to change element characteristics such as the minimum number of occurrences, or to change attribute characteristics
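For element content, extension appends the new content after the base type's content, so an instance of the derived type lists the base elements first. The Python sketch below models that rule for sequences only; the extended type and its Year element are hypothetical additions to VehicleGroup, not part of the schema in Listing 4.3.

```python
import xml.etree.ElementTree as ET

BASE_SEQUENCE = ["Make", "Model"]             # from VehicleGroup
EXTENDED_SEQUENCE = BASE_SEQUENCE + ["Year"]  # hypothetical extension


def child_names(xml_text: str) -> list[str]:
    """List the child element names in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]


doc = "<Vehicle><Make>Honda</Make><Model>Civic</Model><Year>2008</Year></Vehicle>"
names = child_names(doc)
print(names == EXTENDED_SEQUENCE)  # True: base content first, then the addition
print(names == BASE_SEQUENCE)      # False: the base type alone has no Year
```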

Schemas and Namespaces

An XML processor uses namespace details to process an instance document, and the primary source of the namespace details can be either the instance document or the related Schema. Different SOA implementations handle the issue differently, but in any case, settings in the Schema indicate which source to use:

If the source is the Schema, less namespace information appears in the instance document. In this case, the instance document is easier to understand and maintain, especially when the Schema organization is complicated. Also, the Schema author has the flexibility to merge or split Schemas, with less chance that the reorganization will mean changes to the existing XML instance documents.
If the source of namespace details is the XML instance document, that document shows namespace information to a wider audience, and the XML processor may run faster.

In the schema element of our sample XML Schema definition, we set the attribute elementFormDefault to unqualified, which means that locally declared instance elements are not qualified with a namespace name. Instead, the Schema is the source of namespace details for those elements. Globally declared elements, such as the root element Insured, must still be qualified. The unqualified setting is the default.
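The effect of elementFormDefault="unqualified" shows up when an instance document is parsed. In the following Python sketch, which uses xml.etree.ElementTree, the global element Insured carries the target namespace (through the prefix hi, an arbitrary choice), while the locally declared CarPolicy element has no namespace. The attribute values are made-up sample data.

```python
import xml.etree.ElementTree as ET

INSTANCE = """<hi:Insured xmlns:hi="http://www.IBM.com/HighlightInsurance"
            CustomerID="C100">
  <CarPolicy>
    <Vehicle VIN="ABCD1234">
      <Make>Honda</Make>
      <Model>Civic</Model>
    </Vehicle>
  </CarPolicy>
</hi:Insured>"""

root = ET.fromstring(INSTANCE)
car_policy = root[0]

# ElementTree writes a qualified name as {namespace-URI}local-name.
print(root.tag)        # {http://www.IBM.com/HighlightInsurance}Insured
# Locally declared elements carry no namespace under "unqualified".
print(car_policy.tag)  # CarPolicy
```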

The schema element includes other namespace details, too:

The namespace http://www.w3.org/2001/XMLSchema is the default namespace in our example, but this namespace is often associated with the prefix xs or xsd, as shown later in a WSDL file. That namespace contains identifiers such as attribute, which is the name of an element in the XML Schema language itself.
The attribute targetNamespace specifies the target namespace, which is used as a category for each name you're adding. The target namespace includes the names of the elements and attributes that will be allowed in your XML instance document and includes the names of any types you create.
As you can see, the name of the target namespace also appears in a second namespace declaration, which associates it with the prefix target. The purpose of that second declaration is to allow references to any data-type information that you (as the author of the XML Schema) specify in the Schema. The following lines, for example, use the prefix target to refer to VehicleType, which is a data type defined in the Schema.
```
<element name="Vehicle"
         type="target:VehicleType"/>
```
Aside from the types you create, a set of XML Schema types is available in the default namespace (http://www.w3.org/2001/XMLSchema). For that reason, the use of string in the following declaration is valid in our example.

 <element name="Make" type="string"/>
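The way a processor turns a reference such as target:VehicleType or string into a namespace and a local name can be sketched in Python. The sketch collects the namespace declarations with xml.etree.ElementTree.iterparse and its "start-ns" events, then splits each name on its prefix; an unprefixed name falls into the default namespace. The helper resolve_qname is hypothetical, and for brevity the sketch assumes all declarations appear on the schema element.

```python
import io
import xml.etree.ElementTree as ET

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.IBM.com/HighlightInsurance"
        targetNamespace="http://www.IBM.com/HighlightInsurance">
  <element name="Vehicle" type="target:VehicleType"/>
  <element name="Make" type="string"/>
</schema>"""

# Collect prefix -> namespace URI; the default namespace uses the prefix "".
namespaces = {}
source = io.BytesIO(SCHEMA.encode("utf-8"))
for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
    namespaces[prefix] = uri


def resolve_qname(qname: str) -> tuple[str, str]:
    """Split a name such as target:VehicleType into (namespace URI, local name)."""
    prefix, _sep, local = qname.rpartition(":")
    # No prefix means the default namespace (stored under "").
    return namespaces[prefix], local


for element in ET.fromstring(SCHEMA):
    print(element.get("name"), "->", resolve_qname(element.get("type")))
# Vehicle -> ('http://www.IBM.com/HighlightInsurance', 'VehicleType')
# Make -> ('http://www.w3.org/2001/XMLSchema', 'string')
```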

Last, the documentation element in our example includes the prefix xml, which refers to a namespace that is defined by the XML specification. The language of the documentation element is English, as indicated by the following entry. (A value of en-US would specify American English.)

 xml:lang="en"
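Because the xml prefix is predeclared, no xmlns:xml declaration is needed. Python's xml.etree.ElementTree, for instance, reports the xml:lang attribute under the fixed namespace URI http://www.w3.org/XML/1998/namespace, as this small sketch shows.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = ET.fromstring(
    '<documentation xml:lang="en">An example XML Schema Definition (XSD).</documentation>'
)
# The prefixed attribute name is expanded to {namespace-URI}lang.
print(doc.get("{" + XML_NS + "}lang"))  # en
```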