Object-Oriented Schema Design | The Official XMLSPY Handbook

Thus far, the discussion of XML Schema design has covered numerous globally defined constructs (schema components defined directly underneath the root schema element), such as global complex types and groups. These improve the reusability and modularity of your schema components. A globally defined complex type lets you define a schema component and explicitly assign it a type name. You can then declare elements elsewhere in an XML Schema under separate local names and specify their type to be that named complex type. Similarly, groups let you define and name smaller assemblages of elements, which can subsequently be referenced throughout an XML Schema, thereby serving as building blocks for constructing other complex types.

Global constructs, such as the groups discussed in the last section, coupled with the capability to work easily with multiple XML Schema files and with components defined under potentially different namespaces, greatly increase the flexibility of XML Schema design. They exceed the flexibility of DTDs, yet they fall short of providing the extensible object-oriented design capabilities that are so deeply rooted in modern software development practices, in particular the common practice of building component types through inheritance and polymorphism. To support extensible object-oriented design, XML Schema defines a straightforward mechanism for deriving complex types and for specifying equivalency between elements, thus emulating polymorphic behavior in XML. This is the subject of the next several sections.

Deriving complex types by extension

Deriving a complex type enables you to build upon an existing complex type definition, adding to it whatever additional schema components or parts meet your application design requirements. The technique of developing a complex type through derivation is similar to the process discussed in Chapter 4 for deriving custom simple types. The main difference is that in Chapter 4, the base type was always a simple type; whereas in this chapter, the base type is a complex type.

Consider the AddressType complex type definition from the sample Purchase Order Schema. It is limited in its potential usage because it is specifically designed to deal with U.S. addresses—this is because U.S.-specific data elements such as State and Zip are included in the model. In contrast, a Canadian address requires elements for Province and PostalCode. An object-oriented approach to designing the AddressType complex type so that it could work with both U.S. and Canadian mailing addresses would be to first separate the common address components, Street1, Street2, and City, from the regionally specific address elements as shown in the following code listing:

<xsd:complexType name="AddressType">
   <xsd:sequence>
      <xsd:element name="Street1" type="xsd:string"/>
      <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
      <xsd:element name="City" type="xsd:string"/>
   </xsd:sequence>
</xsd:complexType>

The next step is to create the derived complex types CanadianAddress and USAddress, which extend the AddressType listed above with additional regionally specific data elements. The following listing shows the CanadianAddress complex type. The USAddress type is not shown, although it is virtually identical. The complete schema listing for this example is located in the Order_5-09.xsd file on the companion CD. The important aspects of the listing are the complexContent and extension elements.

   <xsd:complexType name="CanadianAddress">
      <xsd:complexContent>
         <xsd:extension base="AddressType">
            <xsd:sequence>
               <xsd:element name="Province">
                  <xsd:simpleType>
                     <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="AB"/>
                        <xsd:enumeration value="BC"/>
                        <!-- Canadian Province Abbreviations -->
                     </xsd:restriction>
                  </xsd:simpleType>
               </xsd:element>
               <xsd:element name="PostalCode">
                  <xsd:simpleType>
                     <xsd:restriction base="xsd:string">
                        <xsd:pattern value="\p{L}\d\p{L}\d\p{L}\d"/>
                     </xsd:restriction>
                  </xsd:simpleType>
               </xsd:element>
            </xsd:sequence>
         </xsd:extension>
      </xsd:complexContent>
   </xsd:complexType>
</xsd:schema>

The CanadianAddress complex type definition begins with the complexContent element, which signals that the type is derived from another complex type. Next, the extension element appears and specifies the base type as AddressType. Finally, a sequence compositor contains the locally defined data elements Province and PostalCode, each of which is declared with an anonymous simple type derived by restriction. The Province element enumerates the Canadian province abbreviations, and the PostalCode element specifies a pattern consisting of a letter (uppercase or lowercase) followed by a digit, repeated three times (for example, R3R3B6 would be valid).
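
For comparison, the USAddress type that the text describes as virtually identical might look like the following. This is only a sketch, not the listing from Order_5-09.xsd: the two State enumeration values and the five-digit Zip pattern are assumptions chosen for illustration.

```xml
<!-- Hypothetical sketch of USAddress; the State values and the Zip pattern are assumed -->
<xsd:complexType name="USAddress">
   <xsd:complexContent>
      <xsd:extension base="AddressType">
         <xsd:sequence>
            <xsd:element name="State">
               <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                     <xsd:enumeration value="MA"/>
                     <xsd:enumeration value="NY"/>
                     <!-- Remaining U.S. state abbreviations -->
                  </xsd:restriction>
               </xsd:simpleType>
            </xsd:element>
            <xsd:element name="Zip">
               <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                     <!-- Assumed format: exactly five digits -->
                     <xsd:pattern value="\d{5}"/>
                  </xsd:restriction>
               </xsd:simpleType>
            </xsd:element>
         </xsd:sequence>
      </xsd:extension>
   </xsd:complexContent>
</xsd:complexType>
```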

The complete content model of the CanadianAddress complex type is the model of the base type followed by the additional data elements defined locally inside the derived type definition. The content of the base type and the additional data elements are treated as though they were children of the same sequence compositor, with the base content first. Furthermore, only type definitions, not element declarations, may serve as base types from which derived types are defined (through either extension or restriction). This is a major additional reason to use global complex types instead of global element definitions.
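
To make the combined content model concrete, here is a sketch of an instance fragment for an element declared with the CanadianAddress type. The element name CA follows the declaration used in Order_5-09.xsd; the address values are made up, and namespace declarations are omitted.

```xml
<!-- Inherited AddressType elements come first, followed by the extension elements -->
<CA>
   <Street1>100 Front St.</Street1>
   <Street2>Suite 200</Street2>       <!-- optional in the base type (minOccurs="0") -->
   <City>Calgary</City>
   <Province>AB</Province>            <!-- added by the extension -->
   <PostalCode>T2P1J9</PostalCode>    <!-- letter-digit pattern, repeated three times -->
</CA>
```

Placing Province before City, for example, would make the fragment invalid, because the base content must appear before the extension content.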

To create a complex type based on an extension of an existing complex type, follow these steps:

1. Using the Schema Design view, create a new complex type, as you normally do, from the Schema Overview page.
2. Expand the new complex type so that it is graphically editable.
3. From the Details window, set derivedBy to extension and base to AddressType. The sequence compositor and the additional Province and PostalCode elements are shown in Figure 5-15. Note that the base type is visually represented inside a colored region, and the extended elements appear outside that region.

Figure 5-15: Extending complex types using XMLSPY.

Deriving complex types by restriction

In addition to deriving new complex types by extending content models, XML Schema supports the derivation of new types by restricting the models of existing complex types. In Chapter 4, I used restriction to place constraints on the value space of simple types; for complex types, restriction involves creating a subset of the base content model by eliminating unwanted parts of it. This may include omitting an optional element defined in the base type or imposing tighter cardinality constraints by specifying different values for minOccurs or maxOccurs. The net result is a narrower range of permitted occurrences for a particular element or compositor than the base type definition originally allowed. Generally, I find that deriving complex types by restriction is less useful than extending complex types. In fact, it can be problematic because of numerous inconsistencies and other issues surrounding the proper validation of complex types derived by restriction.

As an example, I will create a new complex type definition called BulkOrderType that is identical to OrderType (a complex type representation of the Order global element) except that it requires a minimum of 10 line items so that it can qualify as a bulk order. The source code for this example is located in the Order_5-10.xsd file. The Order_5-10.xsd file has several changes with respect to the other Purchase Order Schemas. First, I had to convert the Order global element into a global complex type called OrderType. This was necessary because element declarations cannot serve as the base of a derivation; only type definitions can. Here is the listing for the BulkOrderType complex type, which is derived by restriction from the OrderType complex type:

<xsd:complexType name="BulkOrderType">
   <xsd:complexContent>
      <xsd:restriction base="OrderType">
         <xsd:sequence>
            <xsd:element name="ShippingAddress" type="AddressType"/>
            <xsd:element name="BillingAddress" type="AddressType"/>
            <xsd:element name="Line-Items">
               <xsd:complexType>
                  <xsd:sequence minOccurs="10" maxOccurs="unbounded">
                     <xsd:element name="Product" type="ProductType"/>
                  </xsd:sequence>
               </xsd:complexType>
            </xsd:element>
            <xsd:element ref="Note"/>
         </xsd:sequence>
      </xsd:restriction>
   </xsd:complexContent>
</xsd:complexType>

BulkOrderType, the restricted derived type, is enveloped by a complexContent element, which allows you to derive a content model by either restriction or extension. Next, the restriction element appears and specifies a base type of OrderType. The rest of the type definition corresponds to the OrderType definition with one modification: the minOccurs attribute of the sequence compositor inside Line-Items has been increased to 10. In this respect, derivation by restriction is the opposite of the earlier derivation by extension example, which did not need to repeat the definition of the base complex type. Types derived by restriction must explicitly repeat all the components of the base type definition that are to be included in the derived type. Furthermore, the content model of the restricted type must be a subset of that of the base type. That means you may remove optional elements or other optional components of the base complex type. You cannot, however, introduce any new elements or components into the model of the new type. Any changes that you make to the values of minOccurs or maxOccurs must result in a narrower range of possible occurrences in the derived type.
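
To see why raising minOccurs counts as a restriction, it helps to set the derived type against its base. The base OrderType is not reproduced in this section, so the following is only an assumed shape, inferred from the BulkOrderType listing; the occurrence range on the inner sequence in particular is a guess, and the real definition is in Order_5-10.xsd.

```xml
<!-- Assumed shape of the base type; not the actual listing from Order_5-10.xsd -->
<xsd:complexType name="OrderType">
   <xsd:sequence>
      <xsd:element name="ShippingAddress" type="AddressType"/>
      <xsd:element name="BillingAddress" type="AddressType"/>
      <xsd:element name="Line-Items">
         <xsd:complexType>
            <!-- Assumed base range of 1..unbounded; BulkOrderType narrows it to 10..unbounded -->
            <xsd:sequence minOccurs="1" maxOccurs="unbounded">
               <xsd:element name="Product" type="ProductType"/>
            </xsd:sequence>
         </xsd:complexType>
      </xsd:element>
      <xsd:element ref="Note"/>
   </xsd:sequence>
</xsd:complexType>
```

Under that assumption, every instance with 10 or more products also satisfies the base range, so the derived model is a subset of the base model. Lowering minOccurs to 0 in the derived type would widen the range instead and would not be a legal restriction.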

As previously mentioned, there are several inconsistencies between the implementations of validating XML Schema parsers when dealing with validation of complex types derived by restriction. This is possibly due to some ambiguity in the schema specification. It results in undesirable program behavior when you switch between any of the well-known parser implementations. Therefore, I recommend against the use of restricted complex types for the time being. It is likely that future XML Schema specifications will clarify this subject and eventually resolve the matter. For these reasons XMLSPY has only limited support for editing and no support for validating complex types derived by restriction in Schema Design view. To create a new complex type by restriction in XMLSPY, follow these steps:

1. Add a new complex type to your schema from the Schema Overview page.
2. Expand the new component; in the Details window set the base complex type to the complex type definition you intend to use as a base type.
3. Set the derivedBy attribute to restriction, as shown in Figure 5-16. XMLSPY will automatically copy over the model of the base type; you, as the schema author, can then delete elements or make legal changes to the minOccurs or maxOccurs attributes. XMLSPY does not have a way of graphically displaying the elements that have been omitted in the restricted type, nor does it perform any validation or additional checking for illegal editing operations as you are editing.

Figure 5-16: Deriving complex types by restriction.

Redefining complex types from external schemas

Now that you have learned how to derive complex types based on extension or restriction, the question arises: How do you make changes to an externally defined complex type definition? XML Schema provides a simple redefine mechanism for doing exactly that. The following example builds on the example of Order_5-05a.xsd and Order_5-05b.xsd. These imported an externally defined AddressType definition from an external file that belonged to the same target namespace. In this example, I redefine the imported AddressType definition to include an additional field called CustomerName—the abbreviated code listing is shown in the following code, and the complete source is located in the Order_5-11a.xsd and Order_5-12.xsd files on the companion CD.

<xsd:schema targetNamespace="http://www.company.com/examples/purchaseorder"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.company.com/examples/purchaseorder"
            elementFormDefault="unqualified"
            attributeFormDefault="unqualified">
   <!-- Redefine Address construct From order_5-11b.xsd -->
   <xsd:redefine schemaLocation="order_5-11b.xsd">
      <xsd:complexType name="AddressType">
         <xsd:complexContent>
            <xsd:extension base="AddressType">
               <xsd:sequence>
                  <xsd:element name="CustomerName" type="xsd:string"/>
               </xsd:sequence>
            </xsd:extension>
         </xsd:complexContent>
      </xsd:complexType>
   </xsd:redefine>
   <xsd:element name="Order">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="ShippingAddress" type="AddressType"/>
            <xsd:element name="BillingAddress" type="AddressType"/>
            <!-- Omitted for Brevity -->
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
   <!-- Omitted for Brevity -->
</xsd:schema>

The redefinition syntax is straightforward. It is just like any type derivation syntax except that an enveloping redefine element is required to help locate the externally defined component. As a cautionary note, type redefinition only works when the externally defined schema component belongs to the same namespace or does not declare a target namespace. Furthermore, you should use redefine with caution. Type redefinitions can result in unknown and potentially dangerous side effects if the author of the externally defined schema component makes a change to the schema that conflicts with the component redefinition. A question arises: What if you, as the schema author, would like to impose a restriction prohibiting the redefinition of a particular schema component? This is indeed possible, as you see next.
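
For orientation, the file being redefined might look like the following. This is a hypothetical sketch of order_5-11b.xsd, not its actual contents: the element list is assumed from the earlier U.S.-style AddressType. The point it illustrates is that the redefined file declares the same target namespace as the redefining schema.

```xml
<!-- Hypothetical contents of order_5-11b.xsd; the element list is an assumption -->
<xsd:schema targetNamespace="http://www.company.com/examples/purchaseorder"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.company.com/examples/purchaseorder"
            elementFormDefault="unqualified"
            attributeFormDefault="unqualified">
   <xsd:complexType name="AddressType">
      <xsd:sequence>
         <xsd:element name="Street1" type="xsd:string"/>
         <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
         <xsd:element name="City" type="xsd:string"/>
         <xsd:element name="State" type="xsd:string"/>
         <xsd:element name="Zip" type="xsd:string"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:schema>
```

With a base like this, the redefined AddressType would consist of these elements followed by CustomerName, because content added by extension is appended after the base content.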

Restricting the use of complex types

Although deriving types by restriction or extension can improve the overall design of your XML Schema, there are times when you might want to declare a complex type definition final, meaning that you do not permit further derivation from it. XML Schema lets you block derivation by restriction, by extension, or both by setting the final attribute, as shown in the following code:

<complexType name="Order" final="extension">
   <sequence>
      <!-- Omitted for Brevity -->
   </sequence>
</complexType>

In this listing, the final attribute specifies that the Order complex type may not be further derived by extension. A final value of restriction disallows derivation by restriction, and the value #all disallows type derivation of any kind.
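
As a sketch of the effect, the following fragment pairs a final type with a derivation that a schema processor should reject. The RushOrder type and its Deadline element are hypothetical names introduced only for this illustration, and the fragment uses the xsd prefix rather than the unprefixed form shown above.

```xml
<!-- Order blocks derivation by extension -->
<xsd:complexType name="Order" final="extension">
   <xsd:sequence>
      <!-- Omitted for Brevity -->
   </xsd:sequence>
</xsd:complexType>

<!-- Hypothetical derived type: a schema error, because Order is final for extension -->
<xsd:complexType name="RushOrder">
   <xsd:complexContent>
      <xsd:extension base="Order">
         <xsd:sequence>
            <xsd:element name="Deadline" type="xsd:date"/>
         </xsd:sequence>
      </xsd:extension>
   </xsd:complexContent>
</xsd:complexType>
```

Changing the attribute to final="#all" would block a restriction of Order in the same way.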

Substitution groups and abstract type definitions

Substitution groups in XML Schema enable you to designate a group of different elements as equivalent so that they can be used interchangeably: any member of a substitution group may appear in an instance document wherever the group’s head element is allowed. An abstract element is an element declaration that cannot itself appear in an instance document; rather, a member of the abstract element’s substitution group must appear in its place.

These two concepts—substitution groups and abstract element types—are used together to enable polymorphism, a widely used object-oriented programming concept, also known as late-binding in compiler-speak. Polymorphism (from the Greek meaning “having multiple forms”) describes the characteristic of being able to assign a different meaning or use to something in different contexts. Polymorphism, as far as XML Schema is concerned, refers to the practice of defining a family of equivalent, or substitutable, components.

As an example, consider the schema of Order_5-09.xsd, which defined two complex types, USAddress and CanadianAddress, through derivation. One unfortunate side effect was that the Order element that declared the BillingAddress and ShippingAddress elements (which used to be of AddressType) had to be changed to include a choice compositor consisting of two elements: CA, declared to be of type CanadianAddress, and UA, declared to be of type USAddress, as shown in Figure 5-17.

Figure 5-17: The modified order element of Order_5-09.xsd handles either Canadian or U.S. addresses.

Although the Order element of Figure 5-17 can properly handle either U.S. or Canadian shipping and billing addresses, this approach is unwieldy. Imagine adding any other international address type meant to be substitutable: the Order element would greatly increase in complexity. It would be ideal, instead, to structure the XML Schema so that the ShippingAddress and BillingAddress elements could accept U.S. and Canadian addresses, in addition to any other kind of address, including ones that haven’t yet been defined.

XML Schema substitution groups, in conjunction with abstract types, can be employed to make your XML Schema handle any kind of address element as follows: Define an abstract element type called Address. This will be designated as a head element, which is essentially a placeholder for an undetermined type (such as USAddress, CanadianAddress, or any other future address elements). The Address abstract type definition doesn’t define a model because abstract element types by definition cannot be used in an instance document. Next, create two global element definitions: USAddress and CanadianAddress, each containing its respective complete element definition. Designate that they be substitutable elements with the Address element.

In practice, the XML Schema code for this scenario requires numerous modifications to the Purchase Order examples that you have been using thus far. Using substitution groups can be considered the opposite of type derivation. In type derivation, I made an existing type definition more specific by either adding or deleting parts of the definition. In contrast, substitution groups have the effect of making an element definition more general, allowing for the possibility of accepting elements that have not yet been defined. Furthermore, whereas type derivation was only applied to type definitions (globally declared complex types), substitution groups may only be used in global element definitions.

The complete listing for a Purchase Order Schema that uses substitution groups and an abstract type is located in the Order_5-11.xsd file. The important aspects are shown in the following listing:

<xsd:schema targetNamespace="http://www.company.com/examples/purchaseorder" ... >
   <xsd:element name="Address" abstract="true"/>
   <xsd:element name="USAddress" substitutionGroup="Address">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="Street1" type="xsd:string"/>
            <!-- Omitted for Brevity -->
         </xsd:sequence>
         <xsd:attribute name="addressType" type="xsd:string" use="required"/>
      </xsd:complexType>
   </xsd:element>
   <xsd:element name="CanadianAddress" substitutionGroup="Address">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="Street1" type="xsd:string"/>
            <!-- Omitted for Brevity -->
         </xsd:sequence>
         <xsd:attribute name="addressType" type="xsd:string" use="required"/>
      </xsd:complexType>
   </xsd:element>
   <xsd:element name="Order">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element ref="Address"/>
            <xsd:element ref="Address"/>
            <!-- Omitted for Brevity -->
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
   <!-- Omitted for Brevity -->
</xsd:schema>

Starting from the top, the Address element is specified to be an abstract element through the presence of the abstract attribute being set to true. Address will be used as the head element for the substitution group you are developing. Next you define the USAddress and CanadianAddress and specify that they belong to the Address substitution group as specified by the substitutionGroup attribute. I had to add an additional attribute to both USAddress and CanadianAddress to specify the kind of address (for example, billing or shipping) because they are represented as global elements. Before, I had used global complex types. Next, the Order element appears, which references the Address abstract element twice. In an instance document, a CanadianAddress or USAddress or any other element belonging to the Address substitution group must appear in its place in order to be valid. The instance document fragment below is valid:

<Order xmlns="http://www.company.com/examples/purchaseorder" ...>
   <USAddress addressType="shipping">
      <Street1>123 Westland Ave</Street1>
      <City>Boston</City>
      <State>MA</State>
      <Zip>02115</Zip>
   </USAddress>
   <CanadianAddress addressType="billing">
      <Street1>100 Front St.</Street1>
      <City>Toronto</City>
      <Province>ON</Province>
      <PostalCode>T3G1L4</PostalCode>
   </CanadianAddress>
   <!-- Omitted for Brevity -->
</Order>
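
By contrast, a fragment that uses the head element directly should be rejected by a validating parser. The following sketch assumes the same schema and namespace as the valid fragment; the remaining namespace declarations are left out.

```xml
<Order xmlns="http://www.company.com/examples/purchaseorder">
   <!-- Invalid: Address is declared abstract, so it may not appear in an instance -->
   <Address/>
   <!-- Valid on its own: a member of the Address substitution group -->
   <USAddress addressType="billing">
      <Street1>123 Westland Ave</Street1>
      <City>Boston</City>
      <State>MA</State>
      <Zip>02115</Zip>
   </USAddress>
   <!-- Omitted for Brevity -->
</Order>
```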

To create a substitution group in XMLSPY’s Schema Design view, add global elements to your schema as you would normally do. Then, using the Details window, specify that an element is either abstract or belongs to a substitution group. The Order element that I just discussed is shown in Figure 5-18.

Figure 5-18: Visual representation of substitution groups in XMLSPY.

Using XML Schemas with DTDs

Because you can use any XML document in conjunction with a DTD and because an XML Schema is itself an XML document, it should be no surprise that you can use DTD constructs in defining an XML Schema. The most common reason for doing so is to define and use general entities inside your XML Schema, because XML Schema has no real equivalent of a general entity definition. General entities appearing in an XML Schema are all resolved before the XML Schema is used for validation. Therefore, to define a general entity in a DTD for use inside an XML Schema, simply follow the instructions for creating a DTD and a general entity, as discussed in Chapter 3, and use the entities as you would with any other XML document.
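
As a minimal sketch of the idea, the following schema declares a general entity in an internal DTD subset and reuses it inside a pattern facet. The entity name postalPattern and the type name PostalCodeType are hypothetical, chosen only for this illustration.

```xml
<?xml version="1.0"?>
<!-- The internal DTD subset declares a general entity holding a reusable pattern -->
<!DOCTYPE xsd:schema [
   <!ENTITY postalPattern "\p{L}\d\p{L}\d\p{L}\d">
]>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:simpleType name="PostalCodeType">
      <xsd:restriction base="xsd:string">
         <!-- The XML parser expands the entity before the schema processor sees the value -->
         <xsd:pattern value="&postalPattern;"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:schema>
```

Because the entity is expanded when the schema document is parsed, the schema processor only ever sees the literal pattern string.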

Generating class files from XML Schemas

As I near the end of this discussion on XML Schemas, I remind you of the fact that XML is not a full programming language because it cannot be compiled or executed as a standalone, binary executable file. Rather XML documents must be bound to an external software application or runtime environment. Therefore, XML development does not end after schema design or even after editing and validating instance documents based on an XML Schema. Rather you (or someone on your team) will ultimately need to implement the software code used for the processing of XML documents within the context of a software application. Creating a program language binding requires writing and implementing the necessary software class files so that object representations of your XML document can be replicated in-memory and operated on. Your class files ultimately need to implement methods (or functions) that provide programmatic access to the contents of your XML document, as well as to load, create, validate, process, transform, modify, and flush an XML document.

You can write an XML program code binding in any software programming language, such as Java or C++. Writing the language binding is facilitated by high-level XML processing Application Programming Interfaces (APIs) such as Microsoft MSXML and Apache Xerces, which are freely available and have implementations targeted at various programming languages. (See Appendix A for a complete listing of XML processors and where to obtain them.) By analogy, after completing a schema design for a relational database and populating the various tables with data, you still need to implement a database application using higher-level APIs. These might include ActiveX Data Objects (ADO), Open Database Connectivity (ODBC), or Java Database Connectivity (JDBC), any of which could accomplish the desired business application functionality. XMLSPY is not a conventional Integrated Development Environment because it is not meant to compile and debug Java or C++ programs as you might be accustomed to doing using tools such as Borland JBuilder or Microsoft Visual Studio.NET. Still, XMLSPY includes code-generation capabilities meant to accelerate the transition from XML Schema design to the coding and implementation phase of an XML application. Program code generation in XMLSPY is driven by your XML Schema and an XMLSPY template file (.spl), which specifies how output code should look. You have the option of creating your own templates or using the predefined templates.

To generate program code for your Schema, follow these steps:

1. Open the XML Schema.
2. Choose DTD/Schema → Generate Program Code. The window shown in Figure 5-19 appears.

Figure 5-19: Generating class files based on your completed XML Schema.
3. Choose a predefined template or specify a path to your own template and click OK. An XMLSPY File Directory Picker asks you for the desired output directory. Specify a directory on your local file system and click OK. The generated program code will be output to the specified directory.

At the time of this writing, XMLSPY includes two built-in templates, one for generating Java code and one for C++. It is expected that templates for other languages, such as C# and Visual Basic.NET, will be made available in the near future. The predefined templates generate one class file for each globally declared element or complex type in your XML Schema, preserving the inheritance tree defined in the schema. Additional code is generated as well, such as functions that read XML files into an in-memory DOM representation, write a DOM representation back to a file, and perform XML validation and transformation. The generated C++ output targets MSXML 4.0 and includes a Visual Studio 6.0 project file; the generated Java output is written against the industry-standard Java API for XML Parsing (JAXP) and includes a Sun Forte for Java project file. You can modify the appearance of generated code by editing the built-in templates or by creating new ones from scratch.

Output code is completely customizable via a simple yet powerful template language, which gives full control in mapping XML Schema built-in data-types to the primitive data types of a particular programming language. Additionally, you can build your own templates to automate the generation of code for such things as Enterprise JavaBeans, SQL scripts, and Active Server Pages.

Using XMLSPY’s Schema Editor in conjunction with code generation templates makes XMLSPY well suited as a software modeling tool, allowing XML applications to be prototyped at a high level in XML Schema and then automatically generated. Changes to an application’s XML Schema content model can be immediately reconciled with a software implementation simply by rerunning the code generator. Although the generated code does not produce a completed application, it does free you from having to write and test low-level infrastructure code, allowing you to focus on implementing business logic.

Additional information on program code generation is available in the online Help menu. Choose Help → Contents → Code Generator → The Way to SPL (Spy Programming Language).