Structuring with Schemas

XML, Web Services, and the Data Revolution
By Frank  P.  Coyle
Chapter 2.   The XML Technology Family


Having defined some XML and come up with a namespace for ZwiftBooks, it's now time to think about how we might use our new data representation to begin accepting queries as XML over the Internet. When using XML to exchange data among clients, partners, and suppliers, it's important to be able to define how XML documents should be structured. What's needed is a schema.

A schema defines a data representation.

"Schema" is a general term that describes the form of data. Originally the term was used to describe the structure of databases. In the XML world, a schema is a formal specification of the grammar for a specific XML vocabulary. Schemas are useful for validating XML document content (determining whether a document conforms to the grammar the schema expresses) and for describing XML structure to others. The latter enables the exchange of structured information between collaborating applications or business partners in a platform- and middleware-neutral manner.

XML may be specified with DTDs or XML Schema.

In the XML world, there are two kinds of schema: DTDs and XML Schemas. Figure 2.6 illustrates how either may be used to define a family of XML document instances. DTDs focus primarily on structure, allowing an XML vocabulary designer to specify the elements and attributes that are appropriate for a set of XML instance documents. However, DTDs have limited capability to describe data types within an XML document.

Figure 2.6. Both DTDs and XML Schemas may be used to define the structure of an XML document.

XML Schema is a newer technology, adopted by the W3C as an official Recommendation in May 2001. It is intended to provide the kind of detailed structure often associated with programming languages' data types, which is useful wherever exchanged XML data should be checked for format accuracy before processing begins.

Let's look at how ZwiftBooks could use schemas to help it move its business to an Internet base.

DTD

A DTD may be used by both a sender and a receiver of XML.

Figure 2.7 shows how a DTD can be used by both the sender and receiver of a ZwiftBooks document. When ZwiftBooks publishes its DTD for a query, senders can use the DTD to create XML documents that the ZwiftBooks server will understand. On the receiving end, the server can compare an incoming XML document against the DTD and determine if the incoming XML data is valid with respect to the DTD. If not, it can return a message indicating an error in the incoming data format.

Figure 2.7. A ZwiftBooks DTD can be used by both the sender and receiver of an XML document.

DTD structure and syntax are covered in more detail in Appendix A. There are several points to keep in mind:

  • DTDs are written using a different syntax from XML. This is because DTDs were first developed in the SGML world before XML existed.

  • DTDs define the elements and attributes that can validly appear in our ZwiftBooks XML documents.

  • DTDs are not able to define distinctions about data types. For example, a DTD cannot declare that an element must contain a valid date, a numeric field, or even an ISBN. A DTD is limited to declaring that an element must contain text; it cannot control what kind of text, for example by distinguishing between numeric and alphabetic characters.
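
The last point can be sketched in Python. The query vocabulary below (query, isbn, zipcode) is an assumption made for illustration, not the actual ZwiftBooks DTD. Python's standard-library parser reads the DTD but does not validate against it, so the DTD appears here only to show the declaration syntax; even a validating parser could not reject the zipcode, because #PCDATA accepts any text. The type check therefore has to be written by hand:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical query document with an internal DTD subset.
# The DTD can say only that zipcode holds text (#PCDATA).
QUERY = """<!DOCTYPE query [
  <!ELEMENT query (isbn, zipcode)>
  <!ELEMENT isbn (#PCDATA)>
  <!ELEMENT zipcode (#PCDATA)>
]>
<query>
  <isbn>0201776413</isbn>
  <zipcode>seven-five-two</zipcode>
</query>"""


def check_query(xml_text):
    """Hand-coded data check that a DTD cannot express."""
    root = ET.fromstring(xml_text)
    problems = []
    zipcode = root.findtext("zipcode", default="")
    # Assumption for this sketch: a zip code is exactly five ASCII digits.
    if re.fullmatch(r"[0-9]{5}", zipcode) is None:
        problems.append("zipcode is not five digits: %r" % zipcode)
    return problems


print(check_query(QUERY))
```

Every application that receives such a document would need its own copy of this kind of check, which is the burden the typed schemas described next are meant to remove.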

XML Schema

An XML Schema is itself an XML document.

The alternative to using DTDs to specify what constitutes a valid ZwiftBooks XML document is to use XML Schemas. The XML Schema Recommendation goes beyond the basic text-based descriptors provided by DTDs. It provides more detail about the kind of data that can appear as part of an XML document. Unlike the non-XML syntax of a DTD, XML Schema is itself an XML vocabulary that defines rules governing the structure and content of elements and attributes in an XML document. Because XML Schemas are XML, they can be processed and managed like any XML instance.

XML Schemas can reduce the burden of processing XML.

XML Schemas take a giant step in moving XML from the document world into the data processing world. Schemas support a broad range of data types, and include useful features like range constraints that let XML authors describe bounds for data that can be enforced by schema processors. The advantage of using XML Schemas over DTDs is that XML Schemas eliminate the need for hand-coded data checking of XML data fields. Off-the-shelf software can validate XML data against a broad range of built-in data types.

As Figure 2.8 illustrates, XML Schema data types form a hierarchy. All data types derive, directly or indirectly, from the root anyType, which can be used to indicate any value at all. Below anyType, the hierarchy branches into two groups consisting of simple types and complex types.

Figure 2.8. XML Schema data types [1]

[1] http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/. Copyright 2001 World Wide Web Consortium (Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, Keio University). All Rights Reserved. http://www.w3.org/Consortium/Legal/. Recommendation date May 2001.

XML Schema supports different ways to define data types.

As we move down the tree, types derive from their parents by modifying in some way the properties of parent types. There are four kinds of derivation rules: restriction, enumeration, list, and union. Restriction works by constraining a parent base type. For example, positiveInteger inherits from nonNegativeInteger by restricting the scope of values to positive values. Enumeration, which XML Schema expresses as a facet of restriction, constrains by limiting legal values to a fixed set based on an underlying type. For example, it's possible to declare a holiday data type that allows only values from a list of company holidays (such as "12-25", "01-01", and "02-14"). List derivation is similar to enumeration except that multiple values are permitted. For example, the list data type option could be used to define a data item such as personalHoliday that reflects an employee's ability to pick three out of ten listed days as personal holidays. Union allows the mixing of data types, such as positiveInteger and string. For example, a company that used 9 as the internal code for the rank of vice president could use either the integer 9 or the string "Vice President" within a companyRank element in its XML.
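
As a rough illustration of these derivation rules, the sketch below embeds a small schema in a Python string and reads back how each simple type is derived. The type names holiday, personalHoliday, and companyRank come from the examples above; quantity is a hypothetical name added for the restriction case. The code only inspects the schema with the standard library; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="quantity">
    <xs:restriction base="xs:nonNegativeInteger">
      <xs:minInclusive value="1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="holiday">
    <xs:restriction base="xs:string">
      <xs:enumeration value="12-25"/>
      <xs:enumeration value="01-01"/>
      <xs:enumeration value="02-14"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="personalHoliday">
    <xs:list itemType="holiday"/>
  </xs:simpleType>
  <xs:simpleType name="companyRank">
    <xs:union memberTypes="xs:positiveInteger xs:string"/>
  </xs:simpleType>
</xs:schema>"""


def qname(local):
    """Build the '{namespace}local' form ElementTree uses for tags."""
    return "{%s}%s" % (XS, local)


def derivation_kinds(schema_text):
    """Map each simple type name to the way it is derived."""
    root = ET.fromstring(schema_text)
    kinds = {}
    for simple_type in root.findall(qname("simpleType")):
        rule = simple_type[0]              # restriction, list, or union
        kind = rule.tag.split("}")[1]      # strip the namespace part
        if kind == "restriction" and rule.find(qname("enumeration")) is not None:
            kind = "restriction (enumeration)"
        kinds[simple_type.get("name")] = kind
    return kinds


print(derivation_kinds(SCHEMA))
```

Note that limiting personalHoliday to exactly three values would need an additional length restriction on the list type, which this sketch leaves out.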

Complex Types

Complex types can be used to model application-specific data.

Complex types are an important aspect of XML Schema that allow application developers to define application-specific data types that can be checked by programs that check XML documents for validity. As Figure 2.8 shows, XML Schema divides complex types into two categories: those with simple content and those with complex content. Both varieties allow attributes, but simple content types can contain only characters, whereas complex content types can contain child elements.

ZwiftBooks and XML Schema

Let's follow ZwiftBooks on the path to XML Schema. It should consider moving from DTD to XML Schema mainly to be able to specify more accurately the structure of documents sent over the Web and to allow both sender and receiver to validate the XML against the schema using off-the-shelf tools. For example, DTDs can specify only text in the zipcode data item, while with XML Schema the text can be refined to numeric data for the zip code.

Figure 2.9 illustrates ZwiftBooks' use of XML Schema to describe a complex data type called Book. Because a schema definition is an XML document, we must be careful to distinguish between elements associated with the XML Schema namespace and ZwiftBooks' elements. Thus the declaration

 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

allows us to use the shortcut prefix xs in place of the official namespace, http://www.w3.org/2001/XMLSchema. We then use <xs:element name="Book"> to define our new complex data type, which consists of three subelements: isbn, title, and author.

Figure 2.9. A ZwiftBooks schema to describe the data associated with a book.
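
Because a schema is itself XML, an ordinary XML parser can read it. The sketch below is an approximation of what a Book schema might look like, not a copy of Figure 2.9: the three subelement names come from the text, while the xs:string types are assumptions. The Python code lists the subelements declared inside Book.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Assumed shape of a Book schema; the element types are guesses.
BOOK_SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="isbn" type="xs:string"/>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_subelements(schema_text):
    """Return (name, type) for each element declared inside the top-level one."""
    root = ET.fromstring(schema_text)
    tag = "{%s}element" % XS
    top = root.find(tag)
    # iter() yields the top-level element first, so skip it.
    return [(e.get("name"), e.get("type")) for e in top.iter(tag) if e is not top]


print(declared_subelements(BOOK_SCHEMA))
```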

Schemas in Practice

XML Schema provides a rich catalog of element descriptions.

Schemas provide a flexible and rich language to specify, package, publish, and exchange both structured and unstructured information across application or business boundaries. Schemas may be used in several ways with XML documents. Through use of the xsi:type attribute, an element can explicitly assert its type within a specific XML document instance. Such assertions can be used to validate an element against a predefined XML Schema type. To handle versioning and time-related processing, the XML Schema type dateTime (called timeInstant in drafts that preceded the Recommendation) may be used to specify a particular instant of time. Values for dateTime are compliant with the ISO 8601 time and date standard. The XML Schema anyURI data type (uriReference in those earlier drafts) can handle links to URIs, which may be specified using an absolute or relative syntax.
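
A minimal sketch of an xsi:type assertion follows. The order and received element names and the timestamp are hypothetical, and the type asserted is the ISO 8601 date-and-time type, named dateTime in the final Recommendation. The code reads the assertion and converts the value by hand; it is not schema validation, and it compares the prefixed name as plain text rather than resolving the prefix.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document; names and values are illustrative only.
DOC = """<order xmlns:xs="http://www.w3.org/2001/XMLSchema"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <received xsi:type="xs:dateTime">2002-03-15T14:30:00</received>
</order>"""


def read_typed_value(xml_text):
    """Return the asserted type and the value converted accordingly."""
    root = ET.fromstring(xml_text)
    received = root.find("received")
    declared = received.get("{%s}type" % XSI)
    if declared == "xs:dateTime":
        # ISO 8601 text maps directly onto a Python datetime.
        value = datetime.fromisoformat(received.text)
    else:
        value = received.text
    return declared, value


print(read_typed_value(DOC))
```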

XML Processing

When XML arrives at a server, it is typically validated against a DTD or XML Schema and then stored, transformed, or processed in some way depending on the application. Both validation and processing can be performed by XML parsers. In the XML parsing and processing world there are two major alternatives: the Document Object Model (DOM) and the Simple API for XML (SAX).

DOM

DOM is a W3C Recommendation.

DOM is a W3C-supported standard application programming interface (API) that provides a platform- and language-neutral interface to allow developers to programmatically access and modify the content and structure of tree-structured documents such as HTML or XML. Because DOM is language neutral, programmers can use scripting languages or programming languages such as Java to process XML.
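
To make the idea of accessing and modifying a document tree concrete, here is a small sketch using Python's xml.dom.minidom, one implementation of the DOM interfaces. The Book document follows the isbn, title, author structure used in this chapter; the price element and its value are hypothetical additions for the example.

```python
from xml.dom.minidom import parseString

BOOK = (
    "<Book>"
    "<isbn>0201776413</isbn>"
    "<title>XML, Web Services, and the Data Revolution</title>"
    "<author>Frank Coyle</author>"
    "</Book>"
)


def add_price(xml_text, price):
    """Read the title from the tree, then append a new <price> child."""
    doc = parseString(xml_text)            # the whole document is now in memory
    book = doc.documentElement
    title = book.getElementsByTagName("title")[0].firstChild.data
    price_el = doc.createElement("price")  # hypothetical element
    price_el.appendChild(doc.createTextNode(price))
    book.appendChild(price_el)
    return title, book.toxml()


title, updated = add_price(BOOK, "34.99")
print(title)
print(updated)
```

The same method names (getElementsByTagName, createElement, appendChild) appear in DOM bindings for other languages, which is what language neutrality means in practice.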

DOM is constantly evolving to keep up with changes in the XML world. It includes the following levels, each of which provides more capability to the DOM API:

  • DOM Level 0 informally refers to the functionality available to scripting languages in Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0. There is no W3C specification for Level 0.

  • DOM Level 1 is a W3C Recommendation completed in October 1998. Level 1 provides support for XML 1.0 and HTML processing.

  • DOM Level 2 is a W3C Recommendation completed in November 2000. Level 2 extends Level 1 with support for XML 1.0 with namespaces and CSS. It also provides support for user interface and tree manipulation events and adds additional DOM tree manipulation capabilities.

  • DOM Level 3 is still under development as of this writing. Level 3 is intended to extend Level 2 by adding user interface keyboard events. It will also support DTDs, XML Schema, and XPath.

DOM requires significant system resources for large documents.

DOM falls under the category of tree-based APIs, which are useful for a wide range of applications but can strain system resources when documents are large. Also, because DOM is language neutral, processing in a strongly typed language such as Java can introduce unwanted complexities when going from DOM interfaces to language-specific data structures. To overcome these drawbacks, developers began to look at event-based processing models that reduced system memory requirements and let developers create their own data structures. The result was SAX.

SAX

SAX is simple.

SAX is an example of a grass-roots development effort to provide a simple, Java-based API for processing XML. It began with design discussions taking place publicly on the XML-DEV mailing list. One month after discussions began, the first draft interface was released in January 1998. After further mailing list discussion, SAX 1.0 was released.

SAX is event-driven.

SAX differs from DOM in that SAX is event-driven. Programmers define event handlers that are notified when elements are found in an XML document. The handler code uses the information delivered by SAX to perform application-specific processing tasks.
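
The event-driven style can be sketched with Python's xml.sax module. The handler below simply records each start and end notification it receives; the EventLogger name and the sample document are assumptions for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLogger(ContentHandler):
    """Records element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def log_events(xml_bytes):
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)  # parser calls back into handler
    return handler.events


DOC = b"<Book><isbn>0201776413</isbn><title>XML</title></Book>"
for event in log_events(DOC):
    print(event)
```

No tree is built: once an event has been delivered, the parser keeps nothing about it, which is why memory use stays low even for large documents.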

SAX supports processing pipelines.

SAX2 was released in May 2000 and provides support for other languages besides Java. SAX2 also supports filter chains that may be used to construct event processing pipelines. Filter chains are useful in building complex XML applications that can be partitioned into tasks modeled as simple SAX components. In a SAX filter chain, each SAX component does some processing and passes data on to the next SAX component in the chain. (See Chapter 4 for additional discussion of filter architectures.)
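
A filter chain might look like the following sketch, which uses XMLFilterBase from Python's xml.sax.saxutils. The two filters and the legacy writer element are hypothetical: the first stage drops whitespace-only text, the second renames writer to author, and a final handler collects the results.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class DropWhitespace(XMLFilterBase):
    """Stage 1: suppress text events that contain only whitespace."""

    def characters(self, content):
        if content.strip():
            super().characters(content)


class RenameElement(XMLFilterBase):
    """Stage 2: rename one element and pass everything else through."""

    def __init__(self, parent, old, new):
        super().__init__(parent)
        self._old, self._new = old, new

    def startElement(self, name, attrs):
        super().startElement(self._new if name == self._old else name, attrs)

    def endElement(self, name):
        super().endElement(self._new if name == self._old else name)


class Collector(ContentHandler):
    """End of the chain: gather (element name, text) pairs."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self._text = []

    def startElement(self, name, attrs):
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text)
        if text:
            self.fields.append((name, text))
        self._text = []


def run_pipeline(xml_bytes):
    collector = Collector()
    stage1 = DropWhitespace(make_parser())            # parser -> stage 1
    stage2 = RenameElement(stage1, "writer", "author")  # stage 1 -> stage 2
    stage2.setContentHandler(collector)               # stage 2 -> collector
    stage2.parse(io.BytesIO(xml_bytes))
    return collector.fields


DOC = b"<Book>\n  <title>XML</title>\n  <writer>Frank Coyle</writer>\n</Book>"
print(run_pipeline(DOC))
```

Each stage knows only about the events it receives and the handler it forwards to, so stages can be reordered or reused without changes to the others.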

SAX requires programmers to maintain state.

While SAX is simple, there is a downside. Programmers working with SAX must build their own data structures to help keep track of where they are in a document. In effect, they have to maintain state as they process XML using SAX. This can make programs complex and more difficult to maintain. Thus it's important that developers understand both SAX and DOM and choose their API based on the requirements of the application.
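
The state-keeping burden can be seen in a short sketch. In the hypothetical document below, name appears under both author and publisher, so the handler must track its position with a stack to tell the two apart. The element names and the publisher value are assumptions for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class AuthorNameHandler(ContentHandler):
    """Collects <name> text only when it sits directly inside <author>."""

    def __init__(self):
        super().__init__()
        self._path = []      # stack of open element names: our own state
        self._buffer = []    # text pieces seen since the last start tag
        self.author_names = []

    def startElement(self, name, attrs):
        self._path.append(name)
        self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        self._buffer.append(content)

    def endElement(self, name):
        if self._path[-2:] == ["author", "name"]:
            self.author_names.append("".join(self._buffer).strip())
        self._path.pop()
        self._buffer = []


def author_names(xml_bytes):
    handler = AuthorNameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.author_names


DOC = b"""<Book>
  <author><name>Frank Coyle</name></author>
  <publisher><name>Example Press</name></publisher>
</Book>"""
print(author_names(DOC))
```

With DOM the same question is a matter of walking from the author node to its child, at the cost of holding the whole tree in memory.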


ISBN: 0201776413
EAN: 2147483647
Year: 2002
Pages: 106
Authors: Frank Coyle
