XML Schema Syntax | The Official XMLSPY Handbook

Up to this point, with the exception of the discussion on XML namespaces, I have not delved into the syntax of XML Schema. However, I believe the forthcoming XML Schema language discussion will be far simpler for you because you have already created an XML Schema and validated an instance document against it. It is important to understand the underlying XML Schema syntax, and you can always view the XML Schema code in Text view. The source code for the Purchase Order Schema that you developed is shown in Listing 4-2:

Listing 4-2: The Purchase Order Schema

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema targetNamespace="http://www.company.com/examples/purchaseorder"
            xmlns="http://www.company.com/examples/purchaseorder"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
   <xsd:element name="Order">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="ShippingAddress" type="AddressType"/>
            <xsd:element name="BillingAddress" type="AddressType"/>
            <xsd:element name="Line-Items">
               <xsd:complexType>
                  <xsd:sequence maxOccurs="unbounded">
                     <xsd:element name="Product" type="ProductType"/>
                  </xsd:sequence>
               </xsd:complexType>
            </xsd:element>
            <xsd:element ref="Note"/>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
   <xsd:complexType name="AddressType">
      <xsd:sequence>
         <xsd:element name="Street1" type="xsd:string"/>
         <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
         <xsd:element name="City" type="xsd:string"/>
         <xsd:element name="State" type="xsd:string"/>
         <xsd:element name="Zip">
            <xsd:simpleType>
               <xsd:restriction base="xsd:string">
                  <xsd:pattern value="[0-9]{5}"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
   <xsd:complexType name="ProductType">
      <xsd:sequence>
         <xsd:element name="Description" type="xsd:string"/>
         <xsd:element name="Price">
            <xsd:simpleType>
               <xsd:restriction base="xsd:string">
                  <xsd:pattern value="[0-9]{0,}\.[0-9]{2}"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:element>
         <xsd:element name="Quantity" type="xsd:positiveInteger"/>
         <xsd:element name="Ship-Date" type="xsd:date" nillable="true" minOccurs="0"/>
         <xsd:element ref="Note" minOccurs="0"/>
      </xsd:sequence>
      <xsd:attribute name="prod-id" type="xsd:integer"/>
   </xsd:complexType>
   <xsd:element name="Note" type="xsd:string"/>
</xsd:schema>

In this section, I explain the semantics and language constructs of the XML Schema language. I make references to the sample Purchase Order Schema shown in Listing 4-2.
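
To make the listing more concrete, here is a sketch of an instance document that should validate against the Purchase Order Schema in Listing 4-2. All the data values (addresses, product description, notes, and so on) are invented for illustration, and the document deliberately carries no schema location hint, so you would need to assign the schema yourself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample data; only the element and attribute names come from Listing 4-2 -->
<Order xmlns="http://www.company.com/examples/purchaseorder">
   <ShippingAddress>
      <Street1>100 Main Street</Street1>
      <City>Springfield</City>
      <State>MA</State>
      <Zip>01101</Zip>
   </ShippingAddress>
   <BillingAddress>
      <Street1>200 Elm Street</Street1>
      <Street2>Suite 4</Street2>
      <City>Springfield</City>
      <State>MA</State>
      <Zip>01101</Zip>
   </BillingAddress>
   <Line-Items>
      <Product prod-id="1001">
         <Description>Sample widget</Description>
         <Price>19.99</Price>
         <Quantity>2</Quantity>
         <Ship-Date>2003-06-15</Ship-Date>
         <Note>Gift wrap this item</Note>
      </Product>
   </Line-Items>
   <Note>Deliver to the front desk</Note>
</Order>
```

Because the schema sets elementFormDefault to qualified, every element belongs to the target namespace, which the sample declares once as the default namespace on the Order element. Street2, Ship-Date, and the Note inside Product are optional and could be left out.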

Simple types

Simple types are the most basic XML Schema data construct. You use them to define all attributes, as well as elements that contain only text and do not have any attributes associated with them. In the example Purchase Order, you defined many simple types, such as

<xsd:element name="Quantity" type="xsd:integer"/>
<xsd:element name="Ship-Date" type="xsd:date" nillable="true" minOccurs="0"/>
<xsd:element name="Street1" type="xsd:string"/>

To declare an element as a simple type, type <xsd:element to start the declaration, followed by name="elementname" (where elementname is whatever name you have chosen for the element) and type="elementtype" (where elementtype is one of the built-in XML Schema data types, such as xsd:integer, xsd:date, or xsd:string; all the built-in types are described in detail in XML Schema Part 2: Datatypes, located at www.w3.org/TR/xmlschema-2/). Finish the element declaration by closing the xsd:element tag. The data types begin with xsd:, which indicates that they belong to a namespace. In fact, they are defined in the same namespace used in the previous section: http://www.w3.org/2001/XMLSchema. Elements that are simple types can be declared globally, as child elements of the schema element, or locally, within a complex type.

The AddressType global complex type includes several elements that are of simple type: Street1, Street2, City, State, and Zip (see Listing 4-3).

Listing 4-3: The AddressType Global Complex Type Definition

<xsd:complexType name="AddressType">
   <xsd:sequence>
      <xsd:element name="Street1" type="xsd:string"/>
      <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:enumeration value="AL"/>
               <xsd:enumeration value="AK"/>
               <xsd:enumeration value="AZ"/>
               <!-- ... (lines omitted) ...-->
               <xsd:enumeration value="WV"/>
               <xsd:enumeration value="WY"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
      <xsd:element name="Zip">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:pattern value="[0-9]{5}"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
   </xsd:sequence>
</xsd:complexType>

Deriving simple types

Simple types are defined using one of the 44 built-in data types, such as string, integer, date, and boolean. The built-in data types closely resemble those of SQL and of popular programming languages. Using the built-in data types is the easiest way to define simple type elements; however, you can just as well create your own custom simple types by selecting one (or possibly more than one) of the built-in data types to serve as a base and then customizing it through the application of various restrictions to suit the particular needs of your application. Common restrictions include limiting a string to one of an enumerated set of tokens, specifying a range of valid minimum and maximum integer values, or specifying a sequence or pattern of characters. In fact, the built-in XML Schema data types are themselves derived from each other, with anySimpleType as the root of the simple type inheritance tree. anySimpleType, as its name suggests, can be any simple type (that is, it has no restrictions on its value space). The 44 built-in data types are, therefore, divided into two categories: primitive built-in types, which are not defined in terms of any other data types, and built-in derived data types, which are derived from the primitive built-in data types. The inheritance tree for the built-in data types (all simple types) is shown in Figure 4-15.

Figure 4-15: The inheritance tree for the built-in XML Schema data types.

A complete listing of all the built-in XML Schema data types is also available in the Details Entry Helper window, accessed by clicking on the drop-down box that appears adjacent to the type property. This window also lists the names of any global elements or global complex types defined in the current XML Schema. There are three ways to create new simple types—by restriction, list, or union. The following sections explain these three methods.

Restriction

Simple types can be derived by placing restrictions on their value. In Listings 4-2 and 4-3, you restricted the values of the Price, State, and Zip elements, all of which are simple types. The definitions of Price and State are listed in the following code:

...
<xsd:element name="Price">
   <xsd:simpleType>
      <xsd:restriction base="xsd:string">
         <xsd:pattern value="[0-9]{0,}\.[0-9]{2}"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>
...
<xsd:element name="State">
   <xsd:simpleType>
      <xsd:restriction base="xsd:string">
         <xsd:enumeration value="AL"/>
         <xsd:enumeration value="AK"/>
         <!-- ... (lines omitted) ...-->
         <xsd:enumeration value="WV"/>
         <xsd:enumeration value="WY"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>

To restrict the values of a simple type, start by typing <xsd:simpleType> to begin the definition of the custom simple type, followed by <xsd:restriction base="basetype">, where basetype is the existing simple type on which you are basing the new simple type. The value space of the resulting data type is therefore a subset of the base type's value space. In the Purchase Order examples, the child elements of the xsd:restriction element were xsd:pattern, which specifies a regular expression that the value must match, and xsd:enumeration, which lists the permitted values. Other possible restrictions include minExclusive, minInclusive, maxExclusive, maxInclusive, totalDigits, fractionDigits, length, minLength, maxLength, and whiteSpace. These constraints are self-explanatory except, perhaps, for whiteSpace, which defines how to handle whitespace (preserve, replace, and collapse are the options available). All restrictions are listed in the Facets Entry Helper window in Schema Design view.
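
As a minimal sketch of some of the other restrictions, the following fragment constrains an integer to a range and limits the length of a string. The Rating and Comment element names are hypothetical and are not part of the Purchase Order Schema.

```xml
<!-- Hypothetical elements, shown only to illustrate range and length facets -->
<xsd:element name="Rating">
   <xsd:simpleType>
      <xsd:restriction base="xsd:integer">
         <!-- Accept only the integers 1 through 5 -->
         <xsd:minInclusive value="1"/>
         <xsd:maxInclusive value="5"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>
<xsd:element name="Comment">
   <xsd:simpleType>
      <xsd:restriction base="xsd:string">
         <!-- At most 200 characters, measured after whitespace is collapsed -->
         <xsd:maxLength value="200"/>
         <xsd:whiteSpace value="collapse"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>
```

With these definitions, a Rating of 3 would be accepted and a Rating of 6 would be rejected.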

List Types

The list element creates a data type whose contents consist of a whitespace-delimited list of values. The data type of the tokens in the list is determined by the itemType attribute, which defines the base type that is to be used for the list. As an example, you could create a data type consisting of a whitespace-delimited list of integers as follows:

<xsd:simpleType name="points">
   <xsd:restriction base="xsd:integer"/>
</xsd:simpleType>
...
<xsd:simpleType name="list-of-points">
   <xsd:list itemType="points"/>
</xsd:simpleType>
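
To see what a list value looks like in practice, consider this sketch. It assumes a hypothetical Scores element declared with the list-of-points type; the element name and the values are invented for illustration.

```xml
<!-- Schema side: a hypothetical element that uses the list type -->
<xsd:element name="Scores" type="list-of-points"/>

<!-- Instance side: four integers separated by whitespace -->
<Scores>10 25 -3 42</Scores>
```

A value such as 10 twenty 42 would be rejected, because every token in the list must be a valid integer.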

Union Types

The union type is something of a multiple inheritance for simple types. I am not a great fan of this technique. A union type’s value space is equal to the union of the value spaces of two or more base data types. The memberTypes attribute contains a whitespace-delimited list of base types participating in the union. For example:

<xsd:simpleType name="date-or-integer">
   <xsd:union memberTypes="xsd:date xsd:integer"/>
</xsd:simpleType>
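
As a rough illustration, suppose a hypothetical Delivery element were declared with the date-or-integer type. The element name and the sample values below are assumptions made for this sketch.

```xml
<!-- Schema side: a hypothetical element that uses the union type -->
<xsd:element name="Delivery" type="date-or-integer"/>

<!-- Instance side: either form is acceptable -->
<Delivery>2003-06-15</Delivery>
<Delivery>14</Delivery>
```

A value such as soon would be rejected, because it is neither a valid xsd:date nor a valid xsd:integer.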

Global definitions

In the Purchase Order example, I talked a little about global types when you converted sequences of elements into reusable schema components. The two most common globally defined schema components are global elements and global type definitions (either simple or complex types). Other global constructs do exist and are discussed in Chapter 5. You can easily identify a globally defined schema construct because it is defined as a child of the schema element. Using global declarations can greatly increase the modularity of your XML Schema.

Complex types

A complex type is any element that contains either attributes or child elements. There are four cases to consider:

Complex elements that contain child elements and possibly attributes (no textual content)
Elements that contain both text content and attributes
Empty elements (placeholders) with one or more attributes but no child elements or textual content
Mixed-content elements that contain a mixture of both child elements and textual content

Complex Elements containing child elements and attributes

The AddressType type of the Purchase Order Schema was an example of a complex type definition that contained only child elements, as shown in the following code:

<xsd:complexType name="AddressType">
   <xsd:sequence>
      <xsd:element name="Street1" type="xsd:string"/>
      <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State" type="xsd:string"/>
      <xsd:element name="Zip">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:pattern value="[0-9]{5}"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
   </xsd:sequence>
</xsd:complexType>

The structure is very simple: Start with the xsd:complexType element to mark the beginning of the complex type definition, assign a value to the name attribute, open the xsd:sequence tag, and list the elements in the order that you would like them to appear. Then close off the xsd:sequence and xsd:complexType tags.

Elements that contain both text content and attributes

To add attributes to any element (no matter what kind), type <xsd:attribute from within a complex type definition, immediately before the closing complexType tag, followed by name="attname" and type="simpletype", where attname is replaced by the attribute's name and simpletype by the name of a built-in or globally defined simple type. Alternatively, use ref="attname" to reference a globally declared attribute. Because attributes themselves are simple types, all the rules governing the use of built-in data types, as well as those restricting and constraining their value spaces, also apply to attributes. Complete the declaration with a closing /> tag.

An attribute declaration must be defined at the very end of the complex type to which it belongs. If an element contains two or more attributes, the ordering of the attribute declarations within the complex type is irrelevant as far as the XML processor is concerned. Attributes are inherently unordered, and there is no way to specify a particular sequence of attributes. The ProductType global complex type defined a required attribute, id, of type xsd:integer, and an optional attribute, department, of type xsd:string, shown in the following code after the closing sequence tag but before the closing complexType tag:

<xsd:complexType name="ProductType">
   <xsd:sequence>
      <xsd:element name="Description" type="xsd:string"/>
      <xsd:element name="Price">
         <xsd:simpleType>
            <xsd:restriction base="xsd:string">
               <xsd:length value="1000"/>
               <xsd:whiteSpace value="preserve"/>
               <xsd:pattern value="[0-9]{0,}\.[0-9]{2}"/>
            </xsd:restriction>
         </xsd:simpleType>
      </xsd:element>
      <xsd:element name="Quantity" type="xsd:positiveInteger"/>
      <xsd:element name="Ship-Date" type="xsd:date" nillable="true" minOccurs="0"/>
      <xsd:element ref="Note" minOccurs="0"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:integer" use="required"/>
   <xsd:attribute name="department" type="xsd:string" use="optional"/>
</xsd:complexType>

In the absence of the use="required" attribute within the xsd:attribute declaration, the attribute is assumed to be optional. You may explicitly dictate the value of an attribute by specifying fixed="fixedvalue", where fixedvalue is the only valid value that the attribute can take on. For example:

... <xsd:attribute name="maxItems" type="xsd:integer" fixed="100"/> ...

This specifies that the maxItems attribute, if present, must have a value of 100. I am not convinced that this is very useful. More likely, you would want to specify a default value for an attribute in the event that it is not present. For example:

... <xsd:attribute name="language" type="xsd:string" default="English"/> ...

This code line specifies that if the language attribute is not present, the default language of English is assumed. In summary, you may add, delete, or modify any attribute by clicking on an element in Schema Editing view and using the Attribute Overview panel at the bottom middle of the screen.
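
The Purchase Order Schema has no example of an element that holds only text plus attributes, the case named in this section's heading. In XML Schema, such an element is typically defined by extending a simple type inside a simpleContent element. The following is a minimal sketch under that assumption; the Weight element and its unit attribute are hypothetical names chosen for illustration.

```xml
<!-- Hypothetical element: decimal text content plus a required attribute -->
<xsd:element name="Weight">
   <xsd:complexType>
      <xsd:simpleContent>
         <!-- The text content must be a valid xsd:decimal -->
         <xsd:extension base="xsd:decimal">
            <xsd:attribute name="unit" type="xsd:string" use="required"/>
         </xsd:extension>
      </xsd:simpleContent>
   </xsd:complexType>
</xsd:element>

<!-- A matching instance fragment -->
<Weight unit="kg">2.5</Weight>
```

The attribute declaration sits inside the extension element here, rather than directly under complexType, because the attribute is being added to a simple base type.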

Empty Elements

To define an empty element, such as the br element that represents a line break in HTML, simply use an empty complexType element, as shown here:

<xsd:element name="br">
   <xsd:complexType/>
</xsd:element>

If the empty element is meant to contain attributes, add attributes as described in the previous section and as shown here:

<xsd:element name="br">
   <xsd:complexType>
      <xsd:attribute name="length" type="xsd:int" use="optional"/>
      <xsd:attribute name="align" type="xsd:string" use="optional"/>
   </xsd:complexType>
</xsd:element>
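
For illustration, here are some instance fragments that the second br declaration should accept. The attribute values are invented.

```xml
<!-- Both attributes are optional, so all three forms are acceptable -->
<br/>
<br length="2"/>
<br length="2" align="left"/>
```

By contrast, a br element with any text or child elements between its start and end tags would be rejected, because the complex type defines no content.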

Elements with mixed content

Prose-oriented content (Web sites, books, manuals, and so on), if expressed in XML, is usually mixed content. Consider the following paragraph fragment:

... <para>The <Emphasis>quick</Emphasis> brown fox jumped over the <Underline>lazy</Underline> dog</para> ...

The paragraph tag contains a mixture of both text and child elements (the Emphasis and Underline tags) in any order. Mixed-content elements are an advanced topic because they require you to have some background on the various compositor types. Thus far, I have discussed only the sequence compositor, which defines a strict ordering of child elements.

Cross-Reference See Chapter 5 for more about compositor models and defining mixed element types.
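
As a preview only, here is a rough sketch of how the para element shown earlier might be declared. It relies on the mixed attribute and on the choice compositor, neither of which has been explained yet, and typing Emphasis and Underline as xsd:string is an assumption made for this sketch.

```xml
<!-- Sketch: text may appear between any number of Emphasis and Underline children -->
<xsd:element name="para">
   <xsd:complexType mixed="true">
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
         <xsd:element name="Emphasis" type="xsd:string"/>
         <xsd:element name="Underline" type="xsd:string"/>
      </xsd:choice>
   </xsd:complexType>
</xsd:element>
```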

Global Elements

In DTDs, all element definitions are said to be global by definition. Consider the following DTD fragment:

<!ELEMENT book (title)> <!ELEMENT title (#PCDATA)>

This DTD fragment defines two global elements, book and title. The book element has one child element, which is an element reference to the title element. One consequence of using global elements is that both the book and title elements must appear in an instance document under exactly the names defined in the DTD, for example:

<book> <title>The XMLSPY Handbook</title> </book>

Using global elements is a simple, but inflexible, way to design XML content models.
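
For comparison, the same all-global design can be sketched in XML Schema. This fragment is an illustrative translation of the DTD above, not a listing from the Purchase Order example, and it assumes a schema with no target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <!-- book is global and refers to the global title element -->
   <xs:element name="book">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="title"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
   <!-- title is global, like every element in a DTD -->
   <xs:element name="title" type="xs:string"/>
</xs:schema>
```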

Global Types

XML Schema supports global elements (mostly for backward compatibility with DTDs) and introduces support for global complex type definitions (also referred to as global complex types, global types, or simply types), which are content model definitions that have been assigned a unique name. After a global type has been defined, you can declare an element's type to be that of a known, existing type. Consider the following XML Schema code fragment that defines a similar book structure as a global complex type:

<xs:schema>
   ...
   <xs:complexType name="book">
      <xs:sequence>
         <xs:element name="title" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>
   ...
</xs:schema>

The preceding code listing defined the book type as a sequence of one element, title. The book global type definition can be used as a building block for developing more advanced XML structures in the form of types or elements. The book type definition becomes the content model of the element that declares it; however, the type definition does not, by itself, have a content model that can be expressed in an instance document. If you defined an XML Schema that had only type definitions and no global elements, the XML Schema would have no content model at all. That is, there would be no document element, and the XML Schema processor would produce an error message saying that the XML Schema had no content model. Typically, in schema design, you define several global types and then define one or more global elements that declare themselves to be of a specified type. For example, in the following code fragment I declare a product element to be of type book:

<xs:element name="product" type="book"/>

Here is how the product element might look in an instance document:

<product> <title>The XMLSPY Handbook</title> </product>

Here you see that the product element (assume it is globally defined) is of type book (a global type). The ability to define and name a type, and subsequently declare elements of that type, is unique to XML Schema and greatly improves the flexibility with which a schema author can express a content model.
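
Putting the fragments together, a complete schema might look like the following sketch. It assumes no target namespace, and the gift element is a hypothetical addition that shows the same type being declared a second time.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <!-- The global type: a reusable content model with a name -->
   <xs:complexType name="book">
      <xs:sequence>
         <xs:element name="title" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>
   <!-- Two global elements that share the book type -->
   <xs:element name="product" type="book"/>
   <xs:element name="gift" type="book"/>
</xs:schema>
```

Either product or gift could then serve as the document element of an instance document, and both would contain a single title child.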

Declaring an Element

As discussed in the last section, after you have defined your types (that is, developed a type definition for any of the different complex types that I have discussed), you must declare elements to be of an existing type. For example, in the Purchase Order Schema, both the ShippingAddress and BillingAddress elements are declared to be of type AddressType.

<xsd:element name="Order">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="ShippingAddress" type="AddressType"/>
         <xsd:element name="BillingAddress" type="AddressType"/>
         ...
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>

Referencing a Global Element

Global element declarations can be reused. To reuse a globally declared element, such as the Note element, declare an element that references it with the ref attribute, for example:

<xsd:element name="Order">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="ShippingAddress" type="AddressType"/>
         <xsd:element name="BillingAddress" type="AddressType"/>
         <xsd:element ref="Note"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>

The principal difference between declaring an element of a global type and referencing a global element is that the reference requires neither a name nor a type attribute; in fact, neither attribute is permitted alongside ref. The element reference takes both its name and its type from the referenced global declaration (here, Note is declared globally with type xsd:string).

Global types versus Global Elements

I don’t believe there is much technical benefit in using a global element over a global complex type as the model for your XML Schema components; the reverse, however, is not true. There are huge benefits to using a global complex type over a global element. For example, in addition to giving you the ability to declare any element’s type to that of a global complex type, you can use global types as bases to derive new type definitions. You can also use them to employ polymorphic design strategies through the use of substitution groups. (Both these benefits are discussed in the next chapter.)

So why did the W3C even bother with global elements in the first place? I believe it is primarily for reasons of providing backward compatibility with DTDs because in DTDs, everything is a global element. Global elements are conceptually easier to understand than global types. Consequently, support for global elements in XML Schema also serves to lower the learning curve for XML Schema. Perhaps the W3C wanted to provide XML Schema with enough similarities to DTDs in order to facilitate XML Schema adoption.

Anonymous type definitions

Not all complex types are required to be reusable. It is often the case that an element is a complex type simply because it contains child elements or attributes, but it is meant to be used in only one place. Defining an anonymous complex type is the same as defining a global complex type, with two differences:

An anonymous type definition occurs nested locally within another global type or global element (that is, it is not defined globally as a child of the root schema element).
The anonymous type definition has no name attribute, and the element that contains it has no type attribute, hence the term anonymous.

The Line-Items element definition, nested within the Order element, is an example of an anonymous complex type definition because it meets the two criteria specified previously. It is shown here:

<xsd:element name="Order">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="ShippingAddress" type="AddressType"/>
         <xsd:element name="BillingAddress" type="AddressType"/>
         <xsd:element name="Line-Items">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
         <xsd:element ref="Note"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>

Two disadvantages of using anonymous complex types (as opposed to global complex types) are that you cannot use an anonymous complex type as a base for extension and substitution (covered in the next chapter), and, as previously stated, you cannot reuse an anonymous complex type anywhere else.
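
If you later decide that an anonymous type should be reusable after all, you can promote it to a global type. The following sketch shows one way the Line-Items definition might be refactored; LineItemsType is a hypothetical name that does not appear in the Purchase Order Schema.

```xml
<!-- LineItemsType is a hypothetical name for the formerly anonymous type -->
<xsd:complexType name="LineItemsType">
   <xsd:sequence>
      <xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
   </xsd:sequence>
</xsd:complexType>

<xsd:element name="Order">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="ShippingAddress" type="AddressType"/>
         <xsd:element name="BillingAddress" type="AddressType"/>
         <!-- The element now names its type instead of nesting it -->
         <xsd:element name="Line-Items" type="LineItemsType"/>
         <xsd:element ref="Note"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```

Instance documents are unaffected by this change; only the organization of the schema differs.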