Overview of XML Schema


One of the latest and most eagerly awaited XML-related initiatives is XML Schema. As the name implies, it is intended for specifying the structure of document types. DTDs already serve this function, but they have some drawbacks, and these drawbacks are especially severe for data-oriented applications.

The vocabulary for DTDs is the only one in the XML platform that does not follow XML document syntax. Therefore, applications and tools must be able to process both XML document and XML DTD syntax. Moreover, developers and authors must learn how to use both types of syntax. While not a functional shortcoming, this difference is annoying.

The functional shortcoming of DTDs is that they specify data types with far less precision than programming languages and DBMSs. In data-oriented applications where the purpose of XML is to facilitate the exchange of data among different pieces of software, the XML processor cannot perform even the most basic validation of whether a piece of data is a string, decimal, or date.

Therefore, programmers must write special code to convert data back and forth between the less precise DTD type system and the more precise type systems of programming languages. This conversion code not only imposes a significant development burden but also raises the possibility that different conversion code will treat the same XML document data differently. So despite using the same DTD, such software could still be incompatible because of different conversion policies.
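
To make the contrast concrete, here is a minimal sketch of how typed declarations might look in XML Schema, with the rough DTD equivalent shown in a comment. The element names "Quantity," "UnitPrice," and "ShipDate" are hypothetical and chosen only for illustration; the types are built-in XML Schema datatypes.

```xml
<!-- Rough DTD equivalent: every value is undifferentiated character data
     <!ELEMENT Quantity  (#PCDATA)>
     <!ELEMENT UnitPrice (#PCDATA)>
     <!ELEMENT ShipDate  (#PCDATA)>
-->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical order fields; the names are assumptions for illustration -->
  <xsd:element name="Quantity"  type="xsd:positiveInteger"/>
  <xsd:element name="UnitPrice" type="xsd:decimal"/>
  <xsd:element name="ShipDate"  type="xsd:date"/>
</xsd:schema>
```

With declarations like these, a validating processor can reject a "ShipDate" such as "next Tuesday" before the data ever reaches application code.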

Another concern about DTDs is that they lack flexibility in reusing structures across formats and supporting formats that evolve over time. DTDs work well when a single group controls all of the definitions used in its software and all of the software that uses a particular definition. But the use of XML documents over the Internet has evolved so that organizations want to combine definitions from a variety of formats to suit their business needs and then want to extend definitions without breaking software they have already deployed.

How It Works

Like a DTD, an XML Schema file is a description of the rules that valid documents must follow. However, the process of designing formats with XML Schema is more similar to designing programming data structures or database schemas. Example 3-15 uses XML Schema to specify rules for the address section of our "Order" format defined using a DTD in Example 2-9.

Example 3-15
    <xsd:complexType name="AddressesType">
      <xsd:choice>
        <xsd:element name="BillShip" type="AddressType"/>
        <xsd:sequence>
          <xsd:element name="Ship" type="AddressType"/>
          <xsd:element name="Bill" type="AddressType"/>
        </xsd:sequence>
      </xsd:choice>
    </xsd:complexType>

This example enhances the corresponding section of Example 2-9 in two ways. First, it starts by defining the "AddressesType" rather than the "Addresses" element. Because the type is more generic, developers will find it easier to reuse this block of rules across multiple formats such as Orders, Invoices, and Statements. Second, it takes a more explicit approach to differentiating between shipping and billing addresses. It specifies that elements of the "AddressesType" contain either a single "BillShip" element or a sequence with a "Ship" element, then a "Bill" element. However, all of these elements are of the same "AddressType." Because XML Schema makes it easy to define an "AddressType" and reuse its structure across elements, the distinction between billing and shipping addresses can appear in the element name rather than being hidden in an attribute. Example 3-16 shows the definition for the "AddressType."

Example 3-16
    <xsd:complexType name="AddressType">
      <xsd:sequence>
        <xsd:element name="FirstName" type="xsd:string"/>
        <xsd:element name="MiddleName" type="xsd:string"/>
        <xsd:element name="LastName" type="xsd:string"/>
        <xsd:element name="Street" type="xsd:string"
                     minOccurs="0" maxOccurs="4">
          <xsd:attribute name="lineOrder"
                         type="xsd:integer" use="optional"/>
          <xsd:unique name="lineOrderUnique">
            <xsd:selector xpath="/"/>
            <xsd:field xpath="@lineOrder"/>
          </xsd:unique>
        </xsd:element>
        <xsd:element name="City" type="xsd:string"/>
        <xsd:element name="State" type="xsd:string"/>
        <xsd:element name="Postal" type="PostalType"/>
        <xsd:element name="Country" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>

This example specifies a structure for an address similarly to Example 2-9, but with three improvements. First, the cardinality constraint on the "Street" elements is more precise than simply 1 or more. Constraining the maximum number of such elements to 4 could be very important for ensuring that addresses fit on envelopes and labels. Second, we've explicitly indicated that the "lineOrder" attribute must be an integer and have a unique value for each "Address" element. Last, we've defined the "Postal" element as having the "PostalType" rather than a generic "string" type.
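
Processors that implement the final XML Schema 1.0 Recommendation expect an attribute to be declared inside a complex type, and an identity constraint such as "xsd:unique" to sit on an element declaration whose scope covers the items being constrained. The following is a minimal sketch of how the "Street" rules of Example 3-16 might be written in that form. The "Address" element that carries the constraint is a hypothetical name introduced here, and the other address fields are omitted for brevity.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element of AddressType; the uniqueness rule lives here
       so that it applies across all Street children of one address -->
  <xsd:element name="Address" type="AddressType">
    <xsd:unique name="lineOrderUnique">
      <xsd:selector xpath="Street"/>
      <xsd:field xpath="@lineOrder"/>
    </xsd:unique>
  </xsd:element>

  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <!-- Name, City, State, Postal, and Country elements omitted -->
      <xsd:element name="Street" minOccurs="0" maxOccurs="4">
        <xsd:complexType>
          <!-- String content plus an integer lineOrder attribute -->
          <xsd:simpleContent>
            <xsd:extension base="xsd:string">
              <xsd:attribute name="lineOrder" type="xsd:integer"/>
            </xsd:extension>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Under this sketch, an address with two "Street" elements that both carry lineOrder="1" would be rejected, as would a fifth "Street" element.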

The definition of this "PostalType" illustrates how XML Schema accommodates additions over time. Example 3-17 shows how to define this type as a string with a maximum length of 10. This definition could be precise enough for the initial version of the schema, but we might want to perform more validation in a subsequent version by having specific formats for US and UK postal codes.

Example 3-17
    <xsd:simpleType name="PostalType">
      <xsd:restriction base="xsd:string">
        <xsd:maxLength value="10" fixed="true"/>
      </xsd:restriction>
    </xsd:simpleType>

Example 3-18 shows how to extend Example 3-17 seamlessly to accommodate this change.

Example 3-18
    <xsd:element name="USPostal" type="USPostalType" substitutionGroup="Postal"/>
    <xsd:element name="UKPostal" type="UKPostalType" substitutionGroup="Postal"/>
    <xsd:simpleType name="USPostalType">
      <xsd:restriction base="PostalType">
        <xsd:pattern value="\d{5}|\d{5}-\d{4}"/>
      </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="UKPostalType">
      <xsd:restriction base="PostalType">
        <xsd:pattern value="[A-Z]{2}\d\s\d[A-Z]{2}"/>
      </xsd:restriction>
    </xsd:simpleType>

In the new version of the schema represented by Example 3-18, we first define the "USPostal" and "UKPostal" elements as members of a substitution group for "Postal" elements. These definitions allow conforming documents to substitute a "USPostal" or "UKPostal" element for any "Postal" element. Then we derive the subtypes "USPostalType" and "UKPostalType" from the "PostalType" by restriction. We can then use these more specific types in place of the less specific "PostalType." This combination of substitution groups and type derivation provides something very much like polymorphism in object-oriented programming languages.

Each of these subtypes specifies different restrictions on the pattern of valid postal codes. For the United States, the allowable pattern is five numeric digits, optionally followed by a dash and four more numeric digits. For the United Kingdom, the allowable pattern is two capital letters followed by a numeric digit, a space, and a numeric digit followed by two capital letters. Any new software that creates "Order" documents can specify a "USPostal" or "UKPostal" element based on the country where the order originates and be assured that this important typing information will be enforced as the document works its way through the supply chain. Note that any document conforming to the original schema with just "Postal" elements is still valid, so this extension maintains backward compatibility.
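
As an illustration, the fragment below sketches values that the patterns described above would accept. The "PostalSamples" wrapper is a hypothetical element used only to keep the sketch well-formed, and the sketch assumes the "Postal" element is declared in a way that lets the substitution group apply.

```xml
<!-- Hypothetical wrapper element; not part of the Order format -->
<PostalSamples>
  <!-- Original schema: any string of at most 10 characters -->
  <Postal>94105</Postal>

  <!-- Revised schema: five digits, optionally followed by a dash and four digits -->
  <USPostal>94105-1234</USPostal>

  <!-- Revised schema: two capitals, digit, space, digit, two capitals -->
  <UKPostal>AB1 2CD</UKPostal>
</PostalSamples>
```

A value such as "9410" in a "USPostal" element would fail the pattern check, even though the same string would pass as a plain "Postal" value.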

In addition to improving schema over time, XML Schema also has features that make it easy to reuse the efforts of other groups. Suppose that the developers creating the schema for Order documents want to take advantage of the work done by another group in the area of payment encoding. Example 3-19 shows how to defer the format validation of a specific element to a schema defined elsewhere.

Example 3-19
    <xsd:element name="Payment">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:any namespace="http://www.foocompany.com/xml/payments"
                   processContents="strict"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

As with the "Order" DTD in Example 2-9, this snippet defines a "Payment" element as part of the format. However, rather than defining the subelements itself and forcing developers to maintain these rules independently, this example allows any element from the namespace maintained by another hypothetical development team. By setting the "processContents" attribute to "strict," this example instructs the XML Schema processor to obtain the other schema and validate the elements against it. This approach makes the very definition of data formats a collaborative exercise well suited to the Internet.
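
A document using this wildcard might look like the following sketch. The "pay" prefix and the "CreditCard" and "Number" elements are hypothetical; the real element names would come from whatever schema the other team publishes for the payments namespace.

```xml
<Order xmlns:pay="http://www.foocompany.com/xml/payments">
  <Payment>
    <!-- Hypothetical content; validated against the payments schema,
         not against the Order schema -->
    <pay:CreditCard>
      <pay:Number>4111111111111111</pay:Number>
    </pay:CreditCard>
  </Payment>
</Order>
```

Because the wildcard names only the namespace, the payments team can add or revise elements in its own schema without requiring a change to the Order schema.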

Practical Usage

Most new commercial XML applications that work with server applications or databases use XML Schema. The rapid emergence of support for the standard in development tools and components makes it the most convenient means of ensuring the precise compatibility necessary for software data exchange. The endorsement of the W3C makes it the only viable choice where a format may be released to external parties as a means of commercial interoperability. In cases where an organization intends to use a format for internal purposes only, or where the format is part of a noncommercial effort, you could consider alternatives.

XML Schema is not without its detractors. Many developers complain that it is overly complex and fails to follow the 80-20 rule. They say that instead of making it easy to satisfy the majority of developers, who need a relatively small set of features, it forces everyone to deal with a huge feature set. There are even efforts to promote alternative means of specifying XML data formats that are easier to understand. The most visible alternative is RELAX NG from the Organization for the Advancement of Structured Information Standards (OASIS). However, the simple reality is that the XML Schema standard, while imperfect, is usable as it is. Moreover, the fact that it comes from the premier XML standards body makes its widespread acceptance very likely.

For managers and developers worried about minimizing complexity, a better strategy is to develop formal guidelines about which features to use. As part of an overall XML technology strategy, the senior architects and developers within an organization may agree on the parts of the standard that they will use and then enforce this decision internally. Because the chosen features form a proper subset of the standard, this approach retains compatibility. If externally provided formats go outside the designated feature set, the organization can decide on a case-by-case basis whether the benefit of the format justifies the additional complexity burden.



XML: A Manager's Guide (2nd Edition) (Addison-Wesley Information Technology Series)
ISBN: 0201770067
Year: 2002
Pages: 75
Authors: Kevin Dick
