Structuring Schemas | Using XML with Legacy Business Applications

Of the thousand ways that schema language gives you to hang yourself, the options for structuring your schemas account for at least several hundred. Here are some of the major contributors to the variety.

Global Types and Local Elements versus Global Elements

In addition to declaring your own types and creating Elements from your types, you may declare Elements and reuse them by way of a reference. An example of the syntax for doing this appears below.

 <xs:element ref="Column01" minOccurs="0"/>

Column01 has been defined somewhere else in the schema, based on a type. This style is uncommon because the referenced names must have a scope that is global to the whole namespace used in the schema. As you can imagine, this requires all names to be unique. Ensuring uniqueness can be tedious in large schemas and can lead to some awkward names. It is, however, one way to ensure that names are globally unique and that things that are named the same actually are the same.

The more common practice is to define simple and complex types, which by their nature (with the exception noted in the next subsection) have names that are global in scope. These are then used to define Element names whose scope is only local to the complex type in which they are defined.

You will, however, probably see both approaches.
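
To make the contrast concrete, here is a minimal sketch of both styles side by side. The names ColumnType, RowUsingRef, and RowUsingType and the 50-character limit are my own assumptions for illustration; they are not taken from the SimpleCSV schemas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A named simple type, global in scope (hypothetical) -->
  <xs:simpleType name="ColumnType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Style 1: a global Element declaration, reused by reference -->
  <xs:element name="Column01" type="ColumnType"/>

  <xs:complexType name="RowUsingRef">
    <xs:sequence>
      <xs:element ref="Column01" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Style 2: a local Element declared from the global type.
       This Column01 is scoped to RowUsingType only. -->
  <xs:complexType name="RowUsingType">
    <xs:sequence>
      <xs:element name="Column01" type="ColumnType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

In the second style another complex type could declare its own Column01 Element with a different type, which is exactly the freedom (and the risk) that local scope gives you.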

Named Types and Anonymous Types

Although I didn't point it out at the time, we have already seen examples of named and anonymous types. A named type has a name Attribute in the xs:simpleType or xs:complexType Element where it is defined. In contrast, an anonymous type has no name and is defined in-line, immediately following the Element where it is used. In such cases the Element has no type Attribute. SimpleCSV1.xsd, shown in the beginning of the chapter, uses only anonymous types. SimpleCSV6.xsd uses named types for the Row and ColumnXX Elements.

The scope of anonymous types is local to the Element to which they apply. As such, they are not reusable. This flies in the face of good software engineering practices for a number of obvious reasons. Anonymous types lead to a lot of duplicated code. They are prone to inconsistency when what should be an identical Element gets defined differently in different contexts. They can be a maintenance nightmare when, for example, you need to change the zip code everywhere it is used from the standard five digits to the five plus four format. Also, anonymous types can lead to some very large schema files. For a contrast between the approaches, look at SimpleCSV7.xsd (which uses an anonymous type only for the root Element of the document, a common practice) and SimpleCSV8.xsd (which uses anonymous types for all but one type; String1024Type had to stay named because it serves as an extension base). The former schema document is about 52 lines while the latter is about 118 lines. Using a lot of anonymously defined complex types nested within each other can also lead to some schemas that are very hard to read unless you are looking at them with a schema authoring tool that provides a graphical view.
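
The zip code case can be sketched as follows. ZipCodeType, AddressType, and ShipToZipCode are hypothetical names, and the patterns are only one plausible way to express the two formats.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named type: defined once, so a format change is made in one place -->
  <xs:simpleType name="ZipCodeType">
    <xs:restriction base="xs:string">
      <!-- five digits, optionally followed by a hyphen and four digits -->
      <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="ZipCode" type="ZipCodeType"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Anonymous type: declared in-line, and the Element has no type
       Attribute. Every Element like this one would have to be found
       and edited separately to allow the five plus four format. -->
  <xs:element name="ShipToZipCode">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{5}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```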

On the other hand, if you use a type only once, it may not make much sense to give it a global name. Schemas that use a lot of named types can also make it hard to get the full picture of an Element's content. I guess complex is just complex however you structure it. There is, however, one instance in which you must use a named type even if you use it only once. You must name a type, rather than declare it anonymously, if you want to use it as a base for restriction or extension.
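
A minimal sketch of that last rule, assuming a hypothetical ProductType and LineItem: the base type needs a name so that the base Attribute has something to refer to, while the derived type is free to stay anonymous.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The base must be named, even if it is used only once -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="ProductID" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The derived type is anonymous and extends the named base -->
  <xs:element name="LineItem">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="ProductType">
          <xs:sequence>
            <xs:element name="Quantity" type="xs:decimal"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```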

Modularity: The xs:include and xs:import Elements

As with source code in programs, once something gets to a certain size it starts to get a bit unwieldy in terms of understandability and management. Schema language provides two mechanisms for breaking a single logical schema into several different physical schemas. Both are Elements in the W3C XML Schema namespace.

xs:include : This Element directs a schema processor to read the file specified by the schemaLocation Attribute and consider it as part of the namespace of the including schema. The included schema must either have no target namespace declared or have declared one that matches that of the including schema. The net effect is as if there weren't two separate schemas but only one.
xs:import : This Element works in a fashion similar to the xs:include Element. It also has a schemaLocation Attribute that a schema processor uses to find the schema file. However, there is a subtle difference (perhaps not so subtle when you understand the impact). You don't import a file of declarations; you import a namespace. Since you are importing a namespace, the imported schema must have either a different target namespace than the importing schema or no target namespace. The most common usage is for the two schemas to have different target namespaces. In this situation the xs:import Element has a namespace Attribute that identifies the URI of the namespace being imported, and the namespace must also be declared via the xmlns Attribute in the root xs:schema Element.
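
The two Elements can be sketched together in one schema. Everything named here is an assumption for illustration: the urn:example namespaces, the files orderTypes.xsd and commonTypes.xsd, and the HeaderType and PartyType they are presumed to define.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  targetNamespace="urn:example:orders"
  xmlns:ord="urn:example:orders"
  xmlns:com="urn:example:common"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- orderTypes.xsd has the same target namespace (or none), so its
       declarations are treated as part of this schema -->
  <xs:include schemaLocation="orderTypes.xsd"/>

  <!-- commonTypes.xsd has a different target namespace, so we import
       that namespace and declare a prefix for it on xs:schema above -->
  <xs:import namespace="urn:example:common"
    schemaLocation="commonTypes.xsd"/>

  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- HeaderType is assumed to be defined in orderTypes.xsd -->
        <xs:element name="Header" type="ord:HeaderType"/>
        <!-- PartyType is assumed to be defined in commonTypes.xsd -->
        <xs:element name="Buyer" type="com:PartyType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```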

An Example of Importing Type Libraries

As I write this there are very few comprehensive libraries of common business documents in schema language. Hence, there are few examples we can draw from actual usage. However, we can review some of the approaches being considered by organizations such as ANSI ASC X12 and the OASIS UBL TC.

This example incorporates several approaches widely considered to be best practices. It is based on what Roger Costello [2003] calls the Venetian Blind Design. A similar example has been considered by groups within X12 and the UBL TC. At the time that I write this, neither group has officially adopted the features of this approach. However, it does reflect the general direction in which they are currently headed. I expect that although specific details will probably vary, the major features of this approach will become fairly common as more comprehensive libraries become available.

This example deals with a simple purchase order listing a buyer, a seller, and one or more line items. The following files comprise the example:

purchaseOrder.xml : a sample instance document
purchaseOrder.xsd : the schema for the instance document
procurement.xsd : a type library containing types that are specific to procurement (would probably be maintained by a specific subcommittee in a larger standards body)
common.xsd : a type library containing types used throughout the organization's schemas

All these files are available on the book's Web site, but I show them here to highlight specific concepts in context.

Let's begin by looking at the instance document.

purchaseOrder.xml

 <?xml version="1.0" encoding="UTF-8"?>
 <PO:PurchaseOrder
   xmlns:PO="urn:x12:names:schemas:005010:Procurement:PurchaseOrder"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation=
     "urn:x12:names:schemas:005010:Procurement:PurchaseOrder
      purchaseOrder.xsd">
   <Buyer>
     <PartyName>
       <FirstName>Barney</FirstName>
       <LastName>Rubble</LastName>
     </PartyName>
     <Address>
       <Street>12 Rock Lane</Street>
       <City>Gravelly</City>
       <State>AR</State>
       <ZipCode>72838</ZipCode>
     </Address>
     <DUNSNumber>123456789</DUNSNumber>
   </Buyer>
   <Seller>
     <PartyName>
       <FirstName>Fred</FirstName>
       <LastName>Flintstone</LastName>
     </PartyName>
     <Address>
       <Street>452 Interstate 30</Street>
       <City>Rockwall</City>
       <State>TX</State>
       <ZipCode>75032</ZipCode>
     </Address>
     <DUNSNumber>987654321</DUNSNumber>
   </Seller>
   <OrderedItems>
     <ProductID>RH-932</ProductID>
     <ProductName>Rock Hammer</ProductName>
     <LineItemIdentifier>1</LineItemIdentifier>
     <Quantity>2</Quantity>
     <UnitOfMeasure>EA</UnitOfMeasure>
     <RequestedDeliveryDate>2003-03-04</RequestedDeliveryDate>
   </OrderedItems>
   <OrderedItems>
     <ProductID>WG-004</ProductID>
     <ProductName>Heavy Duty Work Gloves</ProductName>
     <LineItemIdentifier>2</LineItemIdentifier>
     <Quantity>4</Quantity>
     <UnitOfMeasure>PR</UnitOfMeasure>
     <RequestedDeliveryDate>2003-03-04</RequestedDeliveryDate>
   </OrderedItems>
 </PO:PurchaseOrder>

Notice that the only Element with a namespace prefix is the root Element. We'll see why when we look at purchaseOrder.xsd. Note also the use of a URN for the document's namespace. Among other things, the URN specifies a version within X12's namespace.

Finally, I need to point out a syntax difference between how schema locations are specified in instance documents versus schema documents. In instance documents there is a single xsi:schemaLocation Attribute on the root Element that lists pairs of URIs and the locations where the schema may be found. We'll see when we look at the schema document that it is done a bit differently there.
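
As a sketch of the pair syntax, here is what the root Element might look like if the instance document drew on two namespaces. The second namespace, urn:example:extensions, and its file extensions.xsd are hypothetical and are not part of this example's files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<PO:PurchaseOrder
  xmlns:PO="urn:x12:names:schemas:005010:Procurement:PurchaseOrder"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    urn:x12:names:schemas:005010:Procurement:PurchaseOrder purchaseOrder.xsd
    urn:example:extensions extensions.xsd">
  <!-- Each pair is a namespace URI followed by a location hint.
       Document content is omitted from this sketch. -->
</PO:PurchaseOrder>
```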

In this instance document we reference only a single namespace. In regard to the location, again since none of these examples are hosted at X12, the location reference assumes that the schema is located in the same place as the instance document. The overall structure of this document is specified in purchaseOrder.xsd.

purchaseOrder.xsd

 <?xml version="1.0" encoding="UTF-8"?>
 <xs:schema
   targetNamespace="urn:x12:names:schemas:005010:Procurement:PurchaseOrder"
   xmlns:PO="urn:x12:names:schemas:005010:Procurement:PurchaseOrder"
   xmlns:pro="urn:x12:names:schemas:005010:Procurement:common"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="unqualified"
   attributeFormDefault="unqualified">
   <xs:import
     namespace="urn:x12:names:schemas:005010:Procurement:common"
     schemaLocation="procurement.xsd"/>
   <xs:annotation>
     <xs:documentation>This is the generic Purchase Order schema.
         We declare the root Element with an anonymous complex type
         whose Elements are defined from library types.
     </xs:documentation>
   </xs:annotation>
   <xs:element name="PurchaseOrder">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="Buyer" type="pro:BuyerType"/>
         <xs:element name="Seller" type="pro:SellerType"/>
         <xs:element name="OrderedItems" type="pro:OrderedItem"
             maxOccurs="unbounded"/>
       </xs:sequence>
     </xs:complexType>
   </xs:element>
 </xs:schema>

We declare a target namespace specifically for a purchase order document, so the root Element of our instance document must explicitly be in that namespace. However, because we have set elementFormDefault to "unqualified," no Elements below the root Element in the instance document tree need to have namespace prefixes. The other main thing to note is that this is a fairly simple skeleton schema focused on showing the overall structure of the document. It makes heavy use of reusable, common types. It declares the root Element with an anonymous complex type that has a sequence. The types of each of the Elements in this sequence are defined in the Procurement type library in procurement.xsd. Because we are importing these types from the namespace defined in procurement.xsd, we qualify them with a namespace prefix. The elementFormDefault setting of "unqualified" does not relieve us of this requirement since a schema processor would otherwise not know where to look for the named types.

Now, to the difference in how the schema locations are specified. Notice that in the xs:import Element we import a single namespace. In this case the schemaLocation Attribute has a single value, which is the location of the namespace being imported by that Element. The two different usages are legal because these are two different Attributes: in the instance document schemaLocation comes from the xsi namespace, while in the schema document it is an Attribute of the xs:import Element, which is defined in the xs namespace.

There is one final thing to note. The instance document schema is built up of components that are ultimately based on types in several libraries. However, we only see it importing the namespace of the layer immediately below it. This is because of the way libraries are layered. The layer immediately below this schema for a specific instance document is that of the type library for the responsible subcommittee.

Let's now examine procurement.xsd, the type library that purchaseOrder.xsd imports.

procurement.xsd

 <?xml version="1.0" encoding="UTF-8"?>
 <xs:schema
   targetNamespace="urn:x12:names:schemas:005010:Procurement:common"
   xmlns:Pro="urn:x12:names:schemas:005010:Procurement:common"
   xmlns:X12Com="urn:x12:names:schemas:005010:common"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="unqualified"
   attributeFormDefault="unqualified">
   <xs:import namespace="urn:x12:names:schemas:005010:common"
     schemaLocation="common.xsd"/>
   <xs:annotation>
     <xs:documentation>This schema declares items unique to the
         Procurement subcommittee. This is a proposal that
         illustrates how schemas might be structured and should
         not be considered as suggesting any specific simple or
         complex types. It has the same status as task group
         working papers.
     </xs:documentation>
   </xs:annotation>
   <xs:annotation>
     <xs:documentation>This section declares extensions to common
         complex types.
     </xs:documentation>
   </xs:annotation>
   <xs:complexType name="PartyType">
     <xs:complexContent>
       <xs:extension base="X12Com:PartyType">
         <xs:sequence>
           <xs:element name="DUNSNumber" type="xs:string"/>
         </xs:sequence>
       </xs:extension>
     </xs:complexContent>
   </xs:complexType>
   <xs:annotation>
     <xs:documentation>This section declares modules common to the
         Procurement SC.
     </xs:documentation>
   </xs:annotation>
   <xs:complexType name="BuyerType">
     <xs:complexContent>
       <xs:extension base="Pro:PartyType"/>
     </xs:complexContent>
   </xs:complexType>
   <xs:complexType name="SellerType">
     <xs:complexContent>
       <xs:extension base="Pro:PartyType"/>
     </xs:complexContent>
   </xs:complexType>
   <xs:complexType name="OrderedItem">
     <xs:complexContent>
       <xs:extension base="X12Com:LineItemType">
         <xs:sequence>
           <xs:element name="RequestedDeliveryDate"
               type="xs:date"/>
         </xs:sequence>
       </xs:extension>
     </xs:complexContent>
   </xs:complexType>
 </xs:schema>

This schema declares a library of all the types used in instance document schemas produced by this particular subcommittee. The target namespace is Procurement:common within the X12 namespace structure, and it imports the common X12 namespace with which we'll finish. Types imported from that namespace are qualified with the X12Com namespace prefix. In this subcommittee library we basically extend the complex types declared in the common type library, shown below.

common.xsd

 <?xml version="1.0" encoding="UTF-8"?>
 <xs:schema
   targetNamespace="urn:x12:names:schemas:005010:common"
   xmlns:common="urn:x12:names:schemas:005010:common"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="unqualified"
   attributeFormDefault="unqualified">
   <xs:annotation>
     <xs:documentation>This schema contains declarations common
         to all X12 subcommittees. This is a proposal that
         illustrates how schemas might be structured and should
         not be considered as suggesting any specific simple or
         complex types. It has the same status as task group
         working papers.
     </xs:documentation>
   </xs:annotation>
   <xs:annotation>
     <xs:documentation>This section declares base lower-level
         types.
     </xs:documentation>
   </xs:annotation>
   <xs:simpleType name="NameType">
     <xs:restriction base="xs:string">
       <xs:maxLength value="50"/>
     </xs:restriction>
   </xs:simpleType>
   <xs:annotation>
     <xs:documentation>This section declares common simple types.
     </xs:documentation>
   </xs:annotation>
   <xs:simpleType name="FirstNameType">
     <xs:restriction base="common:NameType"/>
   </xs:simpleType>
   <xs:simpleType name="LastNameType">
     <xs:restriction base="common:NameType"/>
   </xs:simpleType>
   <xs:annotation>
     <xs:documentation>This section declares common complex types
         composed of simple types.
     </xs:documentation>
   </xs:annotation>
   <xs:complexType name="PersonNameType">
     <xs:sequence>
       <xs:element name="FirstName" type="common:FirstNameType"/>
       <xs:element name="LastName" type="common:LastNameType"/>
     </xs:sequence>
   </xs:complexType>
   <xs:complexType name="ResidenceAddressType">
     <xs:sequence>
       <xs:element name="Street" type="xs:string"/>
       <xs:element name="City" type="xs:string"/>
       <xs:element name="State" type="xs:string"/>
       <xs:element name="ZipCode" type="xs:string"/>
     </xs:sequence>
   </xs:complexType>
   <xs:complexType name="ProductType">
     <xs:sequence>
       <xs:element name="ProductID" type="xs:string"/>
       <xs:element name="ProductName" type="xs:string"/>
     </xs:sequence>
   </xs:complexType>
   <xs:annotation>
     <xs:documentation>This section declares common complex types
         composed of lower-level complex types.
     </xs:documentation>
   </xs:annotation>
   <xs:complexType name="PartyType">
     <xs:sequence>
       <xs:element name="PartyName"
           type="common:PersonNameType"/>
       <xs:element name="Address"
           type="common:ResidenceAddressType"/>
     </xs:sequence>
   </xs:complexType>
   <xs:annotation>
     <xs:documentation>This section declares common complex types
         derived by extension from other complex types.
     </xs:documentation>
   </xs:annotation>
   <xs:complexType name="LineItemType">
     <xs:complexContent>
       <xs:extension base="common:ProductType">
         <xs:sequence>
           <xs:element name="LineItemIdentifier"
               type="xs:integer"/>
           <xs:element name="Quantity" type="xs:decimal"/>
           <xs:element name="UnitOfMeasure" type="xs:string"/>
         </xs:sequence>
       </xs:extension>
     </xs:complexContent>
   </xs:complexType>
 </xs:schema>

This main type library has types shared in common by all subcommittees. If X12 ever did anything like this it would, of course, be much larger. It would probably be split into many different files that were combined with the main schema file using the xs:import or xs:include Elements. Most of the types declared here are either simple types derived by restriction from built-in schema language data types or complex types built up from simple types. Since it is at the lowest layer, it declares only its own namespace and the standard W3C XML Schema namespace.

In summary, this example shows a layered architecture for defining type libraries. Each layer uses types only from the layer immediately below it. The example makes heavy use of named types and uses no global Element declarations other than the document's root Element. While using several different namespaces, it still allows instance documents to have minimal namespace qualification on their Elements and Attributes. It is very modular, extensible, and maintainable. Again, groups such as ANSI ASC X12 and the OASIS UBL TC will probably end up adopting an approach that looks different from the one used in this example. However, due to the advantages of this demonstrated approach I think that over time such groups will head in a similar direction.
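
To see what the minimal qualification buys us, consider a sketch of how the Buyer portion of the instance document would have to look if all three schemas set elementFormDefault to "qualified" instead. This variant is my own illustration and is not one of the example files. Each local Element would then belong to the target namespace of the schema document that declares it, so a single Buyer would span three namespaces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<PO:PurchaseOrder
  xmlns:PO="urn:x12:names:schemas:005010:Procurement:PurchaseOrder"
  xmlns:Pro="urn:x12:names:schemas:005010:Procurement:common"
  xmlns:X12Com="urn:x12:names:schemas:005010:common">
  <!-- Buyer is declared in purchaseOrder.xsd -->
  <PO:Buyer>
    <!-- PartyName, Address, and their children are declared in common.xsd -->
    <X12Com:PartyName>
      <X12Com:FirstName>Barney</X12Com:FirstName>
      <X12Com:LastName>Rubble</X12Com:LastName>
    </X12Com:PartyName>
    <X12Com:Address>
      <X12Com:Street>12 Rock Lane</X12Com:Street>
      <X12Com:City>Gravelly</X12Com:City>
      <X12Com:State>AR</X12Com:State>
      <X12Com:ZipCode>72838</X12Com:ZipCode>
    </X12Com:Address>
    <!-- DUNSNumber is added by the extension in procurement.xsd -->
    <Pro:DUNSNumber>123456789</Pro:DUNSNumber>
  </PO:Buyer>
  <!-- Seller and OrderedItems are omitted from this sketch -->
</PO:PurchaseOrder>
```

With the "unqualified" setting used throughout the example, only the root Element carries a prefix, and authors of instance documents never need to know which library a given Element came from.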