2.5 XML Schema

XML Schema is designed to enable object-oriented descriptions of XML. The rich type system introduced in XML Schema was specifically designed to allow the encoding of structured data in XML. In XForms, XML Schema's ability to create object-oriented descriptions of XML data is used to advantage in modeling the data to be collected by the application.

User input can be checked against the declarative constraints captured in such a schema using XML processors. The rest of this section gives a brief tutorial on the features of XML Schema that prove useful in designing data models for XForms applications. The interested reader is referred to the wealth of XML Schema resources for additional details.

With the maturing of XML on the Web, the use of XML for structured data interchange is becoming increasingly popular. Data repositories, such as relational databases, also find XML representations of structured data a convenient means of exchanging data among different systems. These uses of XML for encapsulating and interchanging structured data create the need for static type checking of XML data. Type information can be captured using XML Schema, and such type constraints can be automatically checked using off-the-shelf XML processors such as Xerces. ^[9] Such structured data can be bound to specific implementation languages such as Java using data binding. This proves a convenient means of interchanging data among systems distributed across the network and of automatically marshaling such data between the XML interchange representation and the run-time representation used by a given environment.

^[9] http://xml.apache.org/xerces

We illustrate the XML Schema declaration for USAddress in Figure 2.10 and compare it to an equivalent Interface Definition Language (IDL) declaration of the same type in Figure 2.6. Notice that the IDL representation is biased toward implementation languages, whereas the XML representation is biased toward declaratively capturing the required information in an implementation-independent manner. These representations should not be viewed as competing approaches; rather, each reflects different design points in the overall spectrum of possible solutions.

Figure 2.6 IDL declaration of type USAddress

    <idl>
    interface USAddress {
      String name;
      String street;
      String city;
      // Enumeration of two-letter codes
      USState state;
      Integer zip;
      Float gpsLatitude;
      Float gpsLongitude;
    }
    </idl>
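The data binding mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the output of any particular binding tool: the class mirrors the fields of Figure 2.6, the XML layout assumes the elements and attributes declared for USAddress in Figure 2.10, and the function name `unmarshal_address` and the sample address are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class USAddress:
    """Run-time representation mirroring the fields of the IDL interface."""
    name: str
    street: str
    city: str
    state: str
    zip: int
    gps_latitude: Optional[Decimal] = None
    gps_longitude: Optional[Decimal] = None


def unmarshal_address(xml_text: str) -> USAddress:
    """Convert the XML interchange form into a USAddress object."""
    root = ET.fromstring(xml_text)

    def optional_decimal(attribute_name):
        # The GPS attributes may be absent, in which case None is stored.
        raw = root.get(attribute_name)
        return Decimal(raw) if raw is not None else None

    return USAddress(
        name=root.findtext("name"),
        street=root.findtext("street"),
        city=root.findtext("city"),
        state=root.findtext("state"),
        zip=int(root.findtext("zip")),
        gps_latitude=optional_decimal("gpsLatitude"),
        gps_longitude=optional_decimal("gpsLongitude"),
    )


# Hypothetical sample data, made up for this sketch.
SAMPLE = """<location gpsLatitude="37.33" gpsLongitude="-121.89">
  <name>BubbleDog</name>
  <street>1 Example Street</street>
  <city>San Jose</city>
  <state>CA</state>
  <zip>95110</zip>
</location>"""

address = unmarshal_address(SAMPLE)
print(address.city, address.zip, address.gps_latitude)
```

The sketch performs no validation of its own; in the arrangement described above, a schema-aware processor would check the instance before it is handed to code of this kind.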

2.5.1 Schema Built-in Types

The following built-in types from XML Schema are especially relevant for modeling structured data to be collected from the user. Note that XML Schema also has a few built-in types that are more relevant to defining document grammars (for example, token); these are not discussed in detail in this book. The complete list of built-in schema data types is described in XML Schema Part 2. ^[10]

^[10] http://www.w3.org/TR/xmlschema-2/

These built-in types, the most commonly used of which are listed in Table 2.3, help constrain the lexical values of leaf nodes in an XML structure. These constraints can thus be applied to the text contents of an element or the value of an attribute. This set of basic types can be extended as described in the next section. XML Schema also defines several additional data types derived from this set of built-in types. We illustrate the use of some of the built-in types with an example in Figure 2.7. An XML instance document is shown with the type of each element declared using attribute xsi:type. In Section 2.5.3, we extend this example to define a complete schema for invitations. ^[11]

^[11] Note that the types in this initial example are shown using xsi:type for clarity. In a real-world example, these would be provided by the XML Schema definition.

Figure 2.7 Use of some of the built-in data types provided by XML Schema.

    <invitation xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <title>BubbleDog's 5th Birthday</title>
      <age xsi:type="xsd:integer">5</age>
      <born xsi:type="xsd:date">1997-12-21</born>
      <!-- party at a palindromic moment -->
      <party xsi:type="xsd:dateTime">2002-12-21T20:02:00-07:00</party>
      <!-- lasts 1 hour -->
      <duration xsi:type="xsd:duration">PT1H</duration>
      <!-- recurs annually on December 21 -->
      <annual xsi:type="xsd:gMonthDay">--12-21</annual>
      <!-- if celebrated monthly -->
      <monthly xsi:type="xsd:gDay">---21</monthly>
      <location xsi:type="USAddress">...</location>
      <replyTo xsi:type="xsd:anyURI">...</replyTo>
      <picture xsi:type="xsd:anyURI">...</picture>
    </invitation>
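To make the role of xsi:type concrete, the following Python sketch reads part of the invitation and converts each text value according to its declared type. It is only an approximation under stated assumptions: the converter table is hypothetical and covers three of the built-in types, and it matches the literal prefix xsd: rather than resolving the prefix against the namespace declarations as a real XML Schema processor would.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical converter table; untyped elements are kept as strings.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:date": date.fromisoformat,
    "xsd:dateTime": datetime.fromisoformat,
}

INSTANCE = """<invitation
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <title>BubbleDog's 5th Birthday</title>
  <age xsi:type="xsd:integer">5</age>
  <born xsi:type="xsd:date">1997-12-21</born>
  <party xsi:type="xsd:dateTime">
    2002-12-21T20:02:00-07:00</party>
</invitation>"""


def typed_values(xml_text):
    """Map each child element name to its converted value."""
    values = {}
    for child in ET.fromstring(xml_text):
        convert = CONVERTERS.get(child.get(XSI_TYPE), str)
        # Surrounding whitespace is not part of the lexical value.
        values[child.tag] = convert(child.text.strip())
    return values


values = typed_values(INSTANCE)
print(values["age"] + 1, values["born"].year, values["party"].utcoffset())
```

Once converted, the values behave as native integers and dates rather than strings, which is the practical benefit of carrying type information with the data.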

Table 2.3. Commonly Used XML Schema Data Types

Type	Description	Example
string	Text	`simple`
boolean	Boolean value	`false`
decimal	Decimal numbers	`1.25`
integer	Integers	`2002`
float	32-bit float	`67.433E12`
double	64-bit float	`7.33E32`
duration	Time period	`P1DT1H`
dateTime	ISO 8601 date-time	`2003-01-01T00:00:00`
time	Instant of time	`12:00:00`
date	Calendar date	`2003-01-01`
gYearMonth	Calendar month	`2003-01`
gYear	Calendar year	`2003`
gMonthDay	Annually recurring date	`--12-15`
gDay	Monthly recurring day	`---15`
gMonth	Annually recurring month	`--12`
base64Binary	Binary data	`...`
anyURI	URI	`http://example.com`

2.5.2 Extending Built-in Types

New data types can be defined starting from the set of XML Schema built-in types. Such type derivations are carried out by imposing appropriate restrictions on the set of allowable values for a given built-in type. Allowable values in XML Schema are governed by several facets; by restricting these facets, one can define subtypes of the built-in types described thus far. Table 2.4 lists facets that can be used in defining subtypes of the built-in types; type derivation by restricting values along one or more facets is called restriction. Note that not all of the facets listed in Table 2.4 are available on all built-in types; for details, see the XML Schema specification.

Simple types are defined in XML Schema using element simpleType. We show its use in defining two user-defined types, USState and ZIPCode. Type USState is defined by string enumeration in Figure 2.8.

Figure 2.8 Type `USState` is derived by restricting type `xsd:string`.

    <xsd:simpleType name="USState"
                    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="AK"/>
        <xsd:enumeration value="AL"/>
        <xsd:enumeration value="AR"/>
        <!-- and so on ... -->
      </xsd:restriction>
    </xsd:simpleType>

We define type ZIPCode by restricting xsd:string in Figure 2.9; the set of allowable values is specified via facet pattern. XML Schema also allows the definition of list and union types using element simpleType. Complete details of the use of element simpleType are beyond the scope of this book, and the interested reader is referred to the references on XML Schema.

Figure 2.9 Using facet pattern to define type `ZIPCode` that can hold five-digit U.S. ZIP codes.

    <xsd:simpleType name="ZIPCode"
                    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\d{5}"/>
      </xsd:restriction>
    </xsd:simpleType>
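The effect of the restrictions in Figures 2.8 and 2.9 can be approximated with ordinary Python, as a way of seeing which values each derived type admits. This is a hand-written sketch rather than a schema validator; the function names are hypothetical, and the set of state codes is truncated to the three values that Figure 2.8 spells out.

```python
import re

# Only the codes listed explicitly in Figure 2.8; the real type lists them all.
US_STATE_VALUES = {"AK", "AL", "AR"}

# A pattern facet must match the entire value, hence fullmatch below.
ZIP_CODE_PATTERN = re.compile(r"\d{5}")


def is_us_state(value: str) -> bool:
    """Enumeration facet: the value must be one of the listed strings."""
    return value in US_STATE_VALUES


def is_zip_code(value: str) -> bool:
    """Pattern facet: the whole value must match the regular expression."""
    return ZIP_CODE_PATTERN.fullmatch(value) is not None


for candidate in ["95110", "9511", "95110-1234"]:
    print(candidate, is_zip_code(candidate))
for candidate in ["AK", "ak"]:
    print(candidate, is_us_state(candidate))
```

Note that enumeration values are compared exactly, so a lowercase state code is rejected, and that the pattern admits any five digits whether or not they correspond to an assigned ZIP code.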

Table 2.4. Facets Restrict Values of Built-in XML Schema Data Types

Facet	Description
pattern	Regular expression
enumeration	Enumerate values
minLength	Minimum length
maxLength	Maximum length
minExclusive	Exclusive lower bound
minInclusive	Minimum allowed value
maxExclusive	Exclusive upper bound
maxInclusive	Maximum allowed value
length	Exact length
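The bound and length facets of Table 2.4 can likewise be read as simple predicates on a value. The sketch below is an illustration only: the function `satisfies` and the two example restrictions (a latitude range and a two-character code) are hypothetical and do not come from any schema in this chapter.

```python
from decimal import Decimal

# One predicate per facet: (value, facet setting) -> bool.
FACET_CHECKS = {
    "length": lambda value, setting: len(value) == setting,
    "minLength": lambda value, setting: len(value) >= setting,
    "maxLength": lambda value, setting: len(value) <= setting,
    "minInclusive": lambda value, setting: value >= setting,
    "minExclusive": lambda value, setting: value > setting,
    "maxInclusive": lambda value, setting: value <= setting,
    "maxExclusive": lambda value, setting: value < setting,
}


def satisfies(value, facets):
    """Return True when the value meets every facet in the restriction."""
    return all(
        FACET_CHECKS[name](value, setting) for name, setting in facets.items()
    )


# Hypothetical restrictions used only to exercise the predicates.
LATITUDE = {"minInclusive": Decimal("-90"), "maxInclusive": Decimal("90")}
STATE_CODE = {"length": 2}

print(satisfies(Decimal("37.33"), LATITUDE))
print(satisfies(Decimal("90.1"), LATITUDE))
print(satisfies("CA", STATE_CODE))
```

A restriction that names several facets is satisfied only when all of them hold, which is why the sketch combines the individual checks with `all`.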

2.5.3 Defining Aggregations Using Complex Types

Higher level data aggregations are encoded in XML using elements and attributes. The previous section described simple types as defined by XML Schema. XML structures that are the result of attaching attributes or element children are called complex types in XML Schema.

Constructs for creating complex types are defined in XML Schema Part 1, ^[12] and the XML Schema Primer ^[13] gives a good tutorial introduction to this topic. This section gives a high-level overview of how these constructs can be used to define data aggregations.

^[12] http://www.w3.org/TR/xmlschema-1/

^[13] http://www.w3.org/TR/xmlschema-0/

In XML Schema, complex types allow elements in their content and may carry attributes; simple types cannot have element content and cannot carry attributes. XML Schema definitions create new types, and XML Schema declarations enable elements and attributes with specific names and types to appear in XML instances. In this section, we focus on defining complex types and declaring the elements and attributes that appear within them.

We illustrate these concepts by first defining complex type USAddress and then using this to define a more complete schema for the party invitation introduced in Figure 2.7. The schema in Figure 2.10 defines a new type called USAddress. It declares that data conforming to type USAddress must have five element children and may carry two attributes, since attributes are optional unless declared otherwise. It further constrains the values of these elements and attributes using XML Schema built-in types.

Figure 2.10 Type definition for complex type `USAddress`.

    <x:complexType name="USAddress"
                   xmlns:x="http://www.w3.org/2001/XMLSchema">
      <x:sequence>
        <x:element name="name" type="x:string"/>
        <x:element name="street" type="x:string"/>
        <x:element name="city" type="x:string"/>
        <x:element name="state" type="x:string"/>
        <x:element name="zip" type="x:integer"/>
      </x:sequence>
      <x:attribute name="gpsLatitude" type="x:decimal"/>
      <x:attribute name="gpsLongitude" type="x:decimal"/>
    </x:complexType>

New complex types are defined using element complexType, and such definitions contain a set of element declarations, element references, and attribute declarations. The declarations are not themselves types, but rather an association between a name and the constraints that govern the appearance of that name in conforming XML instances. Thus, these are similar to statements in programming languages used to declare identifiers of a given type.

Elements are declared using element element; attributes are declared using element attribute. For example, we define InvitationType as a complex type, and within that definition, we see the element declarations shown in Figure 2.11.

Figure 2.11 Definition of type `InvitationType`.

    <s:schema xmlns:s="http://www.w3.org/2001/XMLSchema">
      <!-- insert USAddress definition here -->
      <s:complexType name="InvitationType">
        <s:sequence>
          <s:element name="title" type="s:string"/>
          <s:element name="age" type="s:integer"/>
          <s:element name="born" type="s:date"/>
          <s:element name="party" type="s:dateTime"/>
          <s:element name="duration" type="s:duration"/>
          <s:element name="annual" type="s:gMonthDay"/>
          <s:element name="monthly" type="s:gDay"/>
          <s:element name="location" type="USAddress"/>
          <s:element name="replyTo" type="USAddress"/>
          <s:element name="picture" type="s:anyURI"/>
        </s:sequence>
      </s:complexType>
    </s:schema>

The consequence of the definition shown in Figure 2.11 is that any element whose type is declared to be InvitationType must contain the declared child elements. These elements must be named as specified by the values of the name attributes appearing in the definition, and they must appear in the order in which they are declared. The USAddress definition contains only declarations involving the built-in simple types xsd:string, xsd:integer, and xsd:decimal. More advanced type definitions, like the one for InvitationType shown in Figure 2.11, can use previously defined complex types through the same mechanism.

In defining InvitationType, two of the element declarations, replyTo and location, associate different element names with the same complex type USAddress. The consequence of this definition is that any element appearing in an instance document whose type is declared to be InvitationType must consist of elements named replyTo and location, each containing the five subelements (name, street, city, state, and zip) that were declared as part of type USAddress. These elements may also carry the GPS attributes that were declared as part of USAddress.
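The structural requirement just described can be made concrete with a hand-written check. The following Python sketch is a rough approximation of what a schema processor verifies for type USAddress, not a substitute for one: the function name and the sample fragment are hypothetical, and Python's int and Decimal accept some lexical forms that xsd:integer and xsd:decimal do not.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Names taken from the USAddress definition in Figure 2.10.
ADDRESS_CHILDREN = ["name", "street", "city", "state", "zip"]
ADDRESS_ATTRIBUTES = {"gpsLatitude", "gpsLongitude"}


def conforms_to_us_address(element) -> bool:
    """Check child order, attribute names, and a few value types."""
    # Each declared child must appear exactly once, in declared order.
    if [child.tag for child in element] != ADDRESS_CHILDREN:
        return False
    # Attributes are optional, but undeclared ones are not allowed.
    if not set(element.attrib) <= ADDRESS_ATTRIBUTES:
        return False
    try:
        int(element.findtext("zip"))
        for value in element.attrib.values():
            Decimal(value)
    except (ValueError, InvalidOperation):
        return False
    return True


# Hypothetical fragment: replyTo lists city before street, which is an error.
FRAGMENT = """<invitation>
  <location gpsLatitude="37.33" gpsLongitude="-121.89">
    <name>BubbleDog</name><street>1 Example Street</street>
    <city>San Jose</city><state>CA</state><zip>95110</zip>
  </location>
  <replyTo>
    <name>BubbleDog</name><city>San Jose</city>
    <street>1 Example Street</street><state>CA</state><zip>95110</zip>
  </replyTo>
</invitation>"""

results = {
    child.tag: conforms_to_us_address(child)
    for child in ET.fromstring(FRAGMENT)
}
print(results)
```

The same function serves both element names because, as in the schema, the constraints belong to the type and not to the name under which it is used.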

Finally, notice that the declarations of child elements are enclosed in element sequence. Attributes minOccurs and maxOccurs may be used to specify cardinality constraints: on an element declaration they bound the number of times that child element may occur, and on element sequence they bound the number of times the entire group may repeat. If omitted, both default to 1, as in the examples shown in Figure 2.11.
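Cardinality constraints applied to individual element declarations can be sketched as a simple matcher over the names of the children. This is an illustration under stated assumptions: the function and the two-element declaration list are hypothetical, None stands for an unbounded maxOccurs, and the greedy matching shown is adequate only when adjacent declarations use different names.

```python
def matches_sequence(child_names, declarations):
    """Check child names against (name, min_occurs, max_occurs) in order."""
    position = 0
    for name, min_occurs, max_occurs in declarations:
        count = 0
        # Consume every consecutive child that carries the declared name.
        while position < len(child_names) and child_names[position] == name:
            count += 1
            position += 1
        if count < min_occurs:
            return False
        if max_occurs is not None and count > max_occurs:
            return False
    # Any leftover children were not declared at this point in the sequence.
    return position == len(child_names)


# Hypothetical variant: exactly one title, then any number of pictures.
DECLARATIONS = [("title", 1, 1), ("picture", 0, None)]

print(matches_sequence(["title"], DECLARATIONS))
print(matches_sequence(["title", "picture", "picture"], DECLARATIONS))
print(matches_sequence(["picture"], DECLARATIONS))
```

With both bounds left at their default of 1, the matcher reduces to requiring each declared child exactly once and in order, which is the situation in Figure 2.11.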