XML Schema | Java Web Services Architecture (The Morgan Kaufmann Series in Data Management Systems)

Figure A.3 shows the XML schema for the Flute employee list XML from Listing A.1. The first thing you notice is that, compared with the DTD in Figure A.2, the XML schema is much longer. The reason is twofold: First, XML Schema, being XML, is more verbose. Second, XML Schema defines the business rules for a Flute employee much more comprehensively than the DTD. Although it is long, it is easy to understand when broken down into smaller parts. We will introduce the different parts one by one and, where appropriate, map the schema structure to the elements in the employeeList DTD.

Figure A.3: XML Schema for the employeeList document

Namespaces

In the schema for the employeeList XML, we used a vocabulary that had particular meaning in the context. The vocabulary used to construct the schema (e.g., element, attribute, restriction, simpleType, complexType, etc.) has specific meanings in an XML Schema document. This vocabulary is defined in a context, or a namespace: http://www.w3.org/2001/XMLSchema.

By associating these words with a namespace, we qualify the names and, in the process, ensure that no clashes arise between the same words used in different contexts. A Java programmer can think of a namespace as analogous to a package name: the same class name can appear in more than one package, and it is unambiguous in a piece of Java code only when it is qualified by the package name to which it belongs.

A namespace must be declared before a qualified name can be used in a document:

 xmlns:xsd="http://www.w3.org/2001/XMLSchema"

Read this as "XML namespace qualifier 'xsd' represents namespace 'http://www.w3.org/2001/XMLSchema.'" In declaring this namespace, we are essentially saying that all elements and attributes qualified with "xsd" are defined in the namespace http://www.w3.org/2001/XMLSchema. A name is qualified by using the declared qualifier as a prefix. For example, the word schema is qualified as xsd:schema. This means that the element schema has been defined in the namespace represented by the xsd qualifier. The XML Schema namespace is also called the "schema of schemas," because it defines all schema definition elements and attributes.
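
The relationship between a prefix, a local name, and a namespace can be seen from Java with a namespace-aware DOM parser. The following is a minimal sketch, assuming the standard JAXP parser that ships with the JDK; the class name QualifiedNameDemo and the helper root are illustrative and do not come from the Flute examples.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class QualifiedNameDemo {
    // Parses the given XML with namespace processing enabled and returns the root element.
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Element e = root("<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'/>");
        System.out.println(e.getPrefix());       // xsd
        System.out.println(e.getLocalName());    // schema
        System.out.println(e.getNamespaceURI()); // http://www.w3.org/2001/XMLSchema
    }
}
```

The prefix is only a local shorthand: a document that declares the same namespace under a different prefix yields the same local name and namespace URI.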

The employeeList schema also declares a targetNamespace and a default namespace. The default namespace is in effect when elements are referred to without a qualifier. The default namespace is declared without a namespace qualifier (which was xsd in the previous example):

 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://www.flute.com"
             xmlns="http://www.flute.com">
 ...
 <xsd:element ref="employee" minOccurs="1" maxOccurs="1"/>

In Figure A.3 the default namespace applies to elements such as employee, employeeList, and dept. In the code above, ref="employee" refers to an employee element declared in the default namespace (http://www.flute.com).

The targetNamespace declaration signifies that the vocabulary defined in this schema document (employee, first_name, email, dept, extn, etc.) belongs to the http://www.flute.com namespace (Figure A.4). The target namespace is the one in which the employeeList schema elements are defined. In the example, we have elected to make the target namespace the default namespace as well, so that target namespace elements can be referred to without a qualifier. The target namespace value is significant because, when an instance document uses the elements declared in the schema document, those elements must be qualified with the target namespace value. (Figure A.5, in the "Bringing It All Together" section, illustrates this.)

Figure A.4

Figure A.5
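
The effect of the target namespace on instance documents can be checked with the JAXP validation API. This is a sketch under stated assumptions: the schema is a cut-down stand-in for Figure A.3 that declares only the email element, and the class name TargetNamespaceDemo and the sample address are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class TargetNamespaceDemo {
    // Assumed, cut-down schema: one global element in the Flute target namespace.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + "            targetNamespace='http://www.flute.com'"
        + "            xmlns='http://www.flute.com'>"
        + "  <xsd:element name='email' type='xsd:string'/>"
        + "</xsd:schema>";

    // Returns true if the instance validates against XSD, false if validation reports an error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Element in the target namespace, via a default namespace declaration.
        System.out.println(isValid("<email xmlns='http://www.flute.com'>jdoe@flute.com</email>"));       // true
        // Same namespace, reached through an arbitrary prefix.
        System.out.println(isValid("<f:email xmlns:f='http://www.flute.com'>jdoe@flute.com</f:email>")); // true
        // No namespace at all: the schema has no declaration for this element.
        System.out.println(isValid("<email>jdoe@flute.com</email>"));                                    // false
    }
}
```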

A schema need not define elements in a namespace (i.e., it's okay to have a schema with no targetNamespace attribute). Instance documents that use elements defined without a namespace may use the noNamespaceSchemaLocation attribute to provide the processor with the schema's location:

 xsi:noNamespaceSchemaLocation="employee.xsd"

Simple Types

Elements such as first_name and email have simple datatype values (e.g., strings). These types (string, int, etc.), which are prefixed with the xsd qualifier, are called built-in types, because they are defined in the schema of schemas. Declaring an element of a built-in simple type in the employeeList XML schema is straightforward:

 <xsd:element name="name" type="type" minOccurs="int" maxOccurs="int"/>

For example:

 <xsd:element name="email" type="xsd:string"/>

In a schema, elements and attributes are declared, and types are defined.

minOccurs and maxOccurs constraints determine how many times that particular element may be repeated in the document. The default value of minOccurs and maxOccurs is "1". A special value of unbounded is used to indicate that a particular element may repeat any number of times.
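
The occurrence constraints can be exercised with a small validation run. The sketch below assumes a simplified employee element (not the one in Figure A.3) in which name keeps the default of exactly one occurrence and email is declared with minOccurs="0" and maxOccurs="unbounded"; the class name OccursDemo is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class OccursDemo {
    // Assumed schema: name occurs exactly once (the defaults), email zero or more times.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='employee'>"
        + "    <xsd:complexType>"
        + "      <xsd:sequence>"
        + "        <xsd:element name='name' type='xsd:string'/>"
        + "        <xsd:element name='email' type='xsd:string' minOccurs='0' maxOccurs='unbounded'/>"
        + "      </xsd:sequence>"
        + "    </xsd:complexType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    // Returns true if the instance validates against XSD, false if validation reports an error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employee><name>Jane</name></employee>"));                    // true
        System.out.println(isValid("<employee><name>Jane</name><email>a</email><email>b</email></employee>")); // true
        System.out.println(isValid("<employee><email>a</email></employee>"));                     // false: name missing
        System.out.println(isValid("<employee><name>Jane</name><name>Doe</name></employee>"));    // false: name repeated
    }
}
```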

Chapter 10 provides mapping between common built-in datatypes and Java types.

Extending Simple Types

The Flute Bank business rules state that all employee_id values must be between 1 and 100,000. An element declaration such as <xsd:element name="employee_id" type="xsd:int"/> enforces the rule only partially: all employee IDs are integer values. To add further constraints on the declaration, XML Schema allows new types to be derived from built-in types by restriction, using the simpleType element:

 <xsd:element name="employee_id">
   <xsd:simpleType>
     <xsd:restriction base="xsd:int">
       <xsd:minInclusive value="1"/>
       <xsd:maxInclusive value="100000"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>

In the above XML fragment, the employee_id element is declared with a new simple type that is a restriction on the base built-in type int. The restrictions are added to the employee_id element on top of the built-in type restriction of integers and are declared using facets. In this example, the facets added to the int type are minInclusive and maxInclusive. The general syntax for a simpleType is

 <xsd:simpleType>
   <xsd:restriction base="simple-type">
     <xsd:facet value="value"/>
     <xsd:facet value="value"/>
     ...
   </xsd:restriction>
 </xsd:simpleType>

When a restriction element contains multiple facets, they are ORed if they are enumeration or pattern facets. All other facets are ANDed.
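
How facets combine can be observed by validating a few values. The sketch below assumes a schema holding two global elements taken from this appendix: employee_id with its two range facets (both must hold) and stateCode with its two enumeration facets (either may match). The class name FacetCombinationDemo is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class FacetCombinationDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        // Range facets are ANDed: the value must satisfy both bounds.
        + "  <xsd:element name='employee_id'>"
        + "    <xsd:simpleType><xsd:restriction base='xsd:int'>"
        + "      <xsd:minInclusive value='1'/>"
        + "      <xsd:maxInclusive value='100000'/>"
        + "    </xsd:restriction></xsd:simpleType>"
        + "  </xsd:element>"
        // Enumeration facets are ORed: the value must equal one of them.
        + "  <xsd:element name='stateCode'>"
        + "    <xsd:simpleType><xsd:restriction base='xsd:string'>"
        + "      <xsd:enumeration value='CA'/>"
        + "      <xsd:enumeration value='MA'/>"
        + "    </xsd:restriction></xsd:simpleType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    // Returns true if the instance validates against XSD, false if validation reports an error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employee_id>100000</employee_id>")); // true
        System.out.println(isValid("<employee_id>0</employee_id>"));      // false: below minInclusive
        System.out.println(isValid("<stateCode>MA</stateCode>"));         // true
        System.out.println(isValid("<stateCode>NY</stateCode>"));         // false: not enumerated
    }
}
```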

The example below shows how a simple type is created by applying a pattern facet to a string datatype. The pattern expression is a regular expression.

 <xsd:element name="dept">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:pattern value="[0-9]{3}-[0-9]{3}-[0-9]{4}"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>
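
A pattern facet must match the entire value, not just part of it. The following sketch shows this by validating a few strings; it assumes a throwaway schema with a single element named value, and the class PatternFacetDemo and its helpers are hypothetical. The helper is only suitable for patterns that contain no quote or markup characters.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class PatternFacetDemo {
    // Builds a schema whose only element, <value>, is a string restricted by one pattern facet.
    static String schemaFor(String pattern) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
             + "<xsd:element name='value'><xsd:simpleType>"
             + "<xsd:restriction base='xsd:string'>"
             + "<xsd:pattern value='" + pattern + "'/>"
             + "</xsd:restriction></xsd:simpleType></xsd:element>"
             + "</xsd:schema>";
    }

    // Returns true if <value>text</value> satisfies the pattern facet.
    static boolean matches(String pattern, String text) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(schemaFor(pattern))))
                    .newValidator()
                    .validate(new StreamSource(new StringReader("<value>" + text + "</value>")));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String dept = "[0-9]{3}-[0-9]{3}-[0-9]{4}";
        System.out.println(matches(dept, "123-456-7890"));  // true
        System.out.println(matches(dept, "123-456-789"));   // false: last group too short
        System.out.println(matches(dept, "x123-456-7890")); // false: the whole value must match
    }
}
```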

Table A.1 shows a few facets that can be used with different built-in types to create new simpleTypes.

Table A.1: Facets

Built-in type	Facet
String, all number types	enumeration
*Example*
 <xsd:element name="stateCode">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:enumeration value="CA"/>
       <xsd:enumeration value="MA"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>
stateCode can have only values "CA" or "MA"

Built-in type	Facet
String, token, normalizedString	length, minLength, maxLength, pattern
*Example*
 <xsd:element name="stateCode">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:minLength value="2"/>
       <xsd:maxLength value="2"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>
stateCode must be exactly two characters long.
 <xsd:element name="extn">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:pattern value="[0-9]{5}"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>
extn is a sequence of five digits. (Any regular expression can be used.)

Table A.2: Facets

Built-in type	Facet
Most numeric types	maxInclusive, minInclusive, maxExclusive, minExclusive
*Example*
 <xsd:element name="employee_id">
   <xsd:simpleType>
     <xsd:restriction base="xsd:int">
       <xsd:minInclusive value="1"/>
       <xsd:maxInclusive value="100000"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>
employee_id is an integer between 1 and 100,000.

Built-in type	Facet
Decimal	totalDigits, fractionDigits
*Example*
 <xsd:element name="amount">
   <xsd:simpleType>
     <xsd:restriction base="xsd:decimal">
       <xsd:totalDigits value="10"/>
       <xsd:fractionDigits value="2"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>

Complex Types

A complex data structure is modeled with a complexType element. A type created with complexType maps to a Java bean. A complex type can contain other subelements and can have attributes (simpleTypes can have neither). In the following code, employee is a complex type consisting of several elements and a reference to an attribute group:

 <xsd:element name="employee">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
     </xsd:sequence>
     <xsd:attributeGroup ref="employeeAttribute"/>
   </xsd:complexType>
 </xsd:element>

This example ensures that an employee instance XML will contain elements for employee ID, name, extension, department, and email. You may note that the datatypes of these subelements are not defined in the complex type element. Instead, we have chosen to declare the subelements elsewhere in the document and only refer to those declarations here.

For example, <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/> uses a reference to the employee_id element, which is of a simpleType defined later in the schema. This is not the only way in which a complex type can be defined. It is also possible to define a simpleType inline:

 <xsd:element name="employee">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element name="employee_id" minOccurs="1" maxOccurs="1">
         <xsd:simpleType>
           ...
         </xsd:simpleType>
       </xsd:element>
       <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
     </xsd:sequence>
     <xsd:attributeGroup ref="employeeAttribute"/>
   </xsd:complexType>
 </xsd:element>

However, defining a simpleType with a name and then referring to it wherever it is used lends itself to reuse of types.

sequence and all

sequence signifies that the order of elements declared in it is important. If the order of the subelements within a complexType is not important, the all element can be used to convey this:

 <xsd:complexType>
   <xsd:all>
     <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
     <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
     <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
   </xsd:all>
 </xsd:complexType>
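
The difference between sequence and all shows up when the same out-of-order instance is validated against both. In this sketch the subelements are declared inline with built-in types rather than by reference, purely to keep the schema short; the class name CompositorDemo and the sample values are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class CompositorDemo {
    // Builds the same employee content model under the given compositor: "sequence" or "all".
    static String schemaFor(String compositor) {
        return "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
             + "<xsd:element name='employee'><xsd:complexType>"
             + "<xsd:" + compositor + ">"
             + "<xsd:element name='employee_id' type='xsd:int'/>"
             + "<xsd:element name='name' type='xsd:string'/>"
             + "<xsd:element name='extn' type='xsd:string'/>"
             + "</xsd:" + compositor + ">"
             + "</xsd:complexType></xsd:element>"
             + "</xsd:schema>";
    }

    // Returns true if the instance validates under the given compositor.
    static boolean isValid(String compositor, String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(schemaFor(compositor))))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static final String IN_ORDER =
        "<employee><employee_id>7</employee_id><name>Jane</name><extn>12345</extn></employee>";
    static final String SHUFFLED =
        "<employee><name>Jane</name><extn>12345</extn><employee_id>7</employee_id></employee>";

    public static void main(String[] args) {
        System.out.println(isValid("sequence", IN_ORDER)); // true
        System.out.println(isValid("sequence", SHUFFLED)); // false: order is enforced
        System.out.println(isValid("all", IN_ORDER));      // true
        System.out.println(isValid("all", SHUFFLED));      // true: any order is accepted
    }
}
```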

choice

What if we want to express that a Flute employee has either a manager_id or an employee_id? (This is not good design, but the point is to illustrate how the XML schema can handle choices.) A choice can appear within a sequence:

 <xsd:element name="employee">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:choice>
         <xsd:element ref="employee_id"/>
         <xsd:element ref="manager_id"/>
       </xsd:choice>
       <xsd:element ref="name"/>
       <xsd:element ref="extn"/>
       <xsd:element ref="email"/>
       <xsd:element ref="dept"/>
     </xsd:sequence>
   </xsd:complexType>
 </xsd:element>

In the above fragment, employee must have name, extn, email, and dept and either a manager_id or an employee_id.
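
The behavior of choice can be confirmed by validation. The sketch below assumes a shortened employee that keeps only the choice and the name element, declared inline with built-in types; the class name ChoiceDemo is illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class ChoiceDemo {
    // Assumed schema: exactly one of employee_id or manager_id, followed by name.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='employee'>"
        + "    <xsd:complexType>"
        + "      <xsd:sequence>"
        + "        <xsd:choice>"
        + "          <xsd:element name='employee_id' type='xsd:int'/>"
        + "          <xsd:element name='manager_id' type='xsd:int'/>"
        + "        </xsd:choice>"
        + "        <xsd:element name='name' type='xsd:string'/>"
        + "      </xsd:sequence>"
        + "    </xsd:complexType>"
        + "  </xsd:element>"
        + "</xsd:schema>";

    // Returns true if the instance validates against XSD, false if validation reports an error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employee><employee_id>7</employee_id><name>Jane</name></employee>")); // true
        System.out.println(isValid("<employee><manager_id>3</manager_id><name>Jane</name></employee>"));   // true
        System.out.println(isValid("<employee><employee_id>7</employee_id><manager_id>3</manager_id>"
                + "<name>Jane</name></employee>"));                                                        // false: both given
        System.out.println(isValid("<employee><name>Jane</name></employee>"));                             // false: neither given
    }
}
```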

attributes

In our Flute Bank example, employees can be described by an employee type attribute (named empType in the schema) indicating whether they are permanent or contract. In an XML Schema document, an element with one or more attributes can be defined only as a complexType. The example below shows how the attribute declaration is made within the employee complexType. The attribute is allowed only two values, so a datatype of xsd:string is insufficient to define its type. Just as we created a new simpleType for employee_id, we define a new type for the attribute by restricting the base xsd:string type with enumeration facets:

 <xsd:element name="employee">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
     </xsd:sequence>
     <xsd:attribute name="empType" use="required">
       <xsd:simpleType>
         <xsd:restriction base="xsd:string">
           <xsd:enumeration value="contract"/>
           <xsd:enumeration value="perm"/>
         </xsd:restriction>
       </xsd:simpleType>
     </xsd:attribute>
   </xsd:complexType>
 </xsd:element>

The general syntax for declaring attributes locally is

 <xsd:attribute name="name" use="required|optional|prohibited" default/fixed="value">
   <xsd:simpleType>
     <xsd:restriction base="built-in type">
       <xsd:facet value="value"/>
       ...
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:attribute>

or, when a built-in type is sufficient,

 <xsd:attribute name="name" type="built-in type" use="required|optional|prohibited" default/fixed="value"/>

An attribute can also be defined globally (i.e., not inline within the complexType element definition) and then referred to within the element. When declaring attributes globally, the use parameter cannot appear within the global declaration; instead, it must be specified in the complexType element where it is referenced. The code below shows a global declaration:

 <xsd:element name="employee">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
       ...
       <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
     </xsd:sequence>
     <xsd:attribute ref="empType" use="required"/>
   </xsd:complexType>
 </xsd:element>
 ...
 <xsd:attribute name="empType">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:enumeration value="contract"/>
       <xsd:enumeration value="perm"/>
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:attribute>

attributeGroup

If an element has several attributes, the attribute declarations can be grouped and a single reference made to the attribute group:

 <xsd:element name="employee">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
       <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
     </xsd:sequence>
     <xsd:attributeGroup ref="employeeAttribute"/>
   </xsd:complexType>
 </xsd:element>
 ...
 <xsd:attributeGroup name="employeeAttribute">
   <xsd:attribute name="USBased" type="xsd:string" use="optional"/>
   <xsd:attribute name="empType" use="required">
     <xsd:simpleType>
       <xsd:restriction base="xsd:string">
         <xsd:enumeration value="contract"/>
         <xsd:enumeration value="perm"/>
       </xsd:restriction>
     </xsd:simpleType>
   </xsd:attribute>
 </xsd:attributeGroup>
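
The required and optional attributes in the group can be checked by validating a few employee instances. This sketch reuses the employeeAttribute group from the listing above but assumes a shortened employee with a single inline name subelement; the class name AttributeGroupDemo, the sample values, and the undeclared grade attribute are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class AttributeGroupDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='employee'>"
        + "    <xsd:complexType>"
        + "      <xsd:sequence><xsd:element name='name' type='xsd:string'/></xsd:sequence>"
        + "      <xsd:attributeGroup ref='employeeAttribute'/>"
        + "    </xsd:complexType>"
        + "  </xsd:element>"
        // One optional string attribute and one required, enumerated attribute.
        + "  <xsd:attributeGroup name='employeeAttribute'>"
        + "    <xsd:attribute name='USBased' type='xsd:string' use='optional'/>"
        + "    <xsd:attribute name='empType' use='required'>"
        + "      <xsd:simpleType><xsd:restriction base='xsd:string'>"
        + "        <xsd:enumeration value='contract'/>"
        + "        <xsd:enumeration value='perm'/>"
        + "      </xsd:restriction></xsd:simpleType>"
        + "    </xsd:attribute>"
        + "  </xsd:attributeGroup>"
        + "</xsd:schema>";

    // Returns true if the instance validates against XSD, false if validation reports an error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<employee empType='perm'><name>Jane</name></employee>"));               // true
        System.out.println(isValid("<employee empType='contract' USBased='yes'><name>Jane</name></employee>")); // true
        System.out.println(isValid("<employee><name>Jane</name></employee>"));                              // false: empType missing
        System.out.println(isValid("<employee empType='temp'><name>Jane</name></employee>"));               // false: not enumerated
    }
}
```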

Comments in XML Schema

XML Schema provides the annotation element to document the schema. An annotation element can contain two elements: the documentation element, meant for human consumption, and the appinfo element, for machine consumption:

 <xsd:annotation>
   <xsd:documentation xml:lang="en">
     The next appinfo element provides a custom instruction to the processor
   </xsd:documentation>
   <xsd:appinfo>
     <instruction>some instruction</instruction>
   </xsd:appinfo>
 </xsd:annotation>

XML Schema cannot handle all types of validations. It cannot express constraints that depend on the values of several elements or attributes together (e.g., a rule such as, "If the employee_type attribute value is 'contract,' then the employee_id value must be between 20,000 and 40,000"). For these types of complex validations, the appinfo element can provide instructions to another tool (e.g., an XSLT engine or Schematron) to enforce the constraints. An XML schema validator would validate the instance document against the annotated schema, and Schematron would validate the instance document based on the instructions extracted from the appinfo elements.
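
Extracting the embedded instructions is ordinary XML processing. The following sketch pulls the text of every appinfo element out of a schema with a namespace-aware DOM parser; it illustrates only the extraction step, not how Schematron or any other tool consumes the result. The class name AppinfoDemo, the sample schema, and its documentation text are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AppinfoDemo {
    // Hypothetical schema carrying one annotation with a documentation and an appinfo child.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:annotation>"
        + "    <xsd:documentation xml:lang='en'>Note for human readers</xsd:documentation>"
        + "    <xsd:appinfo><instruction>some instruction</instruction></xsd:appinfo>"
        + "  </xsd:annotation>"
        + "</xsd:schema>";

    // Returns the trimmed text content of every xsd:appinfo element in the schema.
    static List<String> appinfoTexts(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look elements up by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd)));
            NodeList nodes =
                    doc.getElementsByTagNameNS(XMLConstants.W3C_XML_SCHEMA_NS_URI, "appinfo");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not read schema", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appinfoTexts(XSD)); // [some instruction]
    }
}
```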