Representing Data with XML


The Extensible Markup Language (XML) has revolutionized the computing world since it became a W3C Recommendation in 1998. Designed as a standard way of representing data transferred over the Internet, XML is really a meta-language for specifying other markup languages. The language contains no predefined set of tags; rather, it offers a means of creating tags and defining the relationships between them. With this flexibility, XML has been used to represent data in applications ranging from Web page layout to advanced mathematics to business-to-business supply chain transactions.

However, representing data is only one piece of the puzzle. Applications must also validate, parse, and query that data, and the XML community has responded to this need with a number of additional standards, as described in the following sections on XML Schema, parser APIs, and XQuery.

Schema

To reliably process an XML document, an application needs to set expectations on what data should be present. For example, if an XML document is supposed to contain a mailing address, the application would expect that a ZIP or postal code would be supplied and send some error notification if that data were missing. In this simple example, the application could reasonably contain the logic to check for a valid address. However, for applications that process more varied and complex XML documents, embedding such logic in the application would be inefficient, especially if revisions to the document format were expected.

XML Schema solves this problem by establishing rules for XML documents to make sure that data is in the expected format. Rules in Schema files define the proper sequence and frequency of elements, specify which elements are required or optional, and even set valid value ranges for the elements.

DTDS VERSUS SCHEMA

When the XML 1.0 specification first came out, the structure of documents was defined by Document Type Definition (DTD) files. As the popularity of XML grew, a number of failings were identified in DTDs, which led to the development of XML Schema. DTDs were not written using XML syntax, so large DTDs were hard to read, write, and maintain. Also, the validation criteria that could be set in DTDs were considered too simple and limited.

As an example of the rules that can be specified in an XML Schema file, Listing 10.1 shows the syntax used to set the range for bets allowed at a gaming table in this book's Wonderland Casino example. Rules about data values are defined by the <xs:restriction> element. The value attributes of the <xs:totalDigits> and <xs:fractionDigits> elements specify that a bet can have no more than six digits in all, at most two of them after the decimal point, which allows up to four digits for dollars and two digits for cents. Meanwhile, the value attributes of the <xs:minInclusive> and <xs:maxInclusive> elements restrict bets to the range of 25 cents to $1,000.

Listing 10.1. Selected Portion of the `CasinoLayout.xsd` Schema File

<xs:element name="minimumBet">
    <xs:simpleType>
        <xs:restriction base="xs:decimal">
            <xs:totalDigits value='6'/>
            <xs:fractionDigits value='2'/>
            <xs:minInclusive value='0.25'/>
            <xs:maxInclusive value='1000.00'/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
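To see how such rules behave in practice, the following minimal sketch validates a few candidate values against the restriction from Listing 10.1 using the standard javax.xml.validation API. The class name MinimumBetCheck, the helper isValid(), and the <xs:schema> wrapper around the element declaration are assumptions made for illustration; the rest of CasinoLayout.xsd is not shown in this chapter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class MinimumBetCheck {
    // Element declaration from Listing 10.1, wrapped in an <xs:schema> root.
    // The wrapper is an assumption; the full CasinoLayout.xsd is not shown.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='minimumBet'>"
      + "<xs:simpleType>"
      + "<xs:restriction base='xs:decimal'>"
      + "<xs:totalDigits value='6'/>"
      + "<xs:fractionDigits value='2'/>"
      + "<xs:minInclusive value='0.25'/>"
      + "<xs:maxInclusive value='1000.00'/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against the schema above. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, a validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document broke one of the schema rules
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<minimumBet>5.00</minimumBet>"));  // within range
        System.out.println(isValid("<minimumBet>0.10</minimumBet>"));  // below minInclusive
        System.out.println(isValid("<minimumBet>5.005</minimumBet>")); // three fraction digits
    }
}
```

The point of the sketch is that the range check lives in the schema rather than in Java code, so a change to the allowed bets would only require editing the Schema file.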

XML Parser APIs

After an application receives an XML document, there needs to be a way to programmatically process the data. Several popular Application Programming Interfaces (APIs) for parsing XML data exist: SAX, DOM, and JAXB.

The Simple API for XML (SAX) was the first widely accepted XML API for Java. SAX makes it possible for an application to read an XML document element by element and react to each element via event handlers. For example, one specific Java method, startElement(), would be called every time an XML element's start tag was encountered; a second method, endElement(), would be called every time an end tag was found. Any logic to interpret how one piece of data related to another was left to application developers. The saving grace for parsers that used this API was that they were fast.
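The event-driven model can be sketched with the SAX classes that ship with the JDK. In this illustrative example, the class name SaxEventDemo, the helper events(), and the sample document are assumptions; the handler simply records each startElement() and endElement() callback in the order the parser fires them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventDemo {
    /** Parses the XML text and returns the start/end events in the order seen. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.add("start:" + qName); // fired for every start tag
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);   // fired for every end tag
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical sample document for illustration.
        System.out.println(events("<ADDRESS><CITY>Reno</CITY><STATE>NV</STATE></ADDRESS>"));
    }
}
```

Note that the handler sees only a flat stream of events. Knowing that a CITY event belongs to the enclosing ADDRESS requires state that the application tracks itself.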

The Document Object Model (DOM) is a platform- and language-neutral API that allows an application to load an entire XML document and read or update each element of that document. A DOM parser dynamically builds a tree representation of the document in memory as the document is read. Unfortunately, with this process, the amount of memory used grows with the size of the XML document being processed. Unlike SAX, the DOM API provides application developers with tools to logically link the data. However, to work effectively with the data, developers still need to manually put the raw XML data into Java objects that better represent the data's purpose. For example, if an XML document contains an <ADDRESS> element with <STREET>, <CITY>, <STATE>, <COUNTRY>, and <ZIPCODE> subelements, the DOM API would read in each piece of data with the getNodeValue() method, but it would be up to the application developer to put the data into an instance of a Java class called Address, with the appropriate member variables.
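The manual mapping step described above might look like the following sketch, which uses the DOM parser bundled with the JDK. The field names of Address, the class DomAddressDemo, and its helper methods are hypothetical; only the element names and getNodeValue() come from the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hand-written class that the developer must populate from the DOM tree. */
class Address {
    String street, city, state, country, zipCode;
}

class DomAddressDemo {
    /** Returns the text of the first descendant element with the given tag name. */
    static String text(Element parent, String tag) {
        // An element's text is held in a child text node, hence getFirstChild().
        return parent.getElementsByTagName(tag).item(0).getFirstChild().getNodeValue();
    }

    /** Parses an <ADDRESS> document and copies each subelement into an Address. */
    static Address parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The whole document is loaded into an in-memory tree here.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();

            Address address = new Address();
            address.street = text(root, "STREET");
            address.city = text(root, "CITY");
            address.state = text(root, "STATE");
            address.country = text(root, "COUNTRY");
            address.zipCode = text(root, "ZIPCODE");
            return address;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data for illustration.
        Address a = parse("<ADDRESS><STREET>1 Main St</STREET><CITY>Reno</CITY>"
            + "<STATE>NV</STATE><COUNTRY>USA</COUNTRY><ZIPCODE>89501</ZIPCODE></ADDRESS>");
        System.out.println(a.city + ", " + a.state + " " + a.zipCode);
    }
}
```

Every new field in the document format means another line of mapping code like those in parse(), which is the maintenance burden that motivates a binding approach.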

The Java Architecture for XML Binding (JAXB) overcomes DOM's limitations. This Java API allows an application to unmarshal an external XML document into Java objects, manipulate and validate the data as represented in Java, and marshal the modified data back into an external XML document. JAXB uses XML Schema to automatically derive Java classes and then puts that data directly into objects of the derived classes. With direct access to the data of interest and not having to build the document structure dynamically, JAXB parsers use memory more efficiently than DOM parsers. In the example of address-related data discussed previously, a parser implementing JAXB creates the Address class and objects. The downside to JAXB is that it supports only a subset of the XML Schema specification.

HOW JAXB IS PACKAGED

Note that the reference implementation of JAXB is not distributed as part of Java 2 Standard Edition (J2SE) or Java 2 Enterprise Edition (J2EE). It's available as part of a separate downloadable Java Web Services Developer Pack (Java WSDP) from the Sun Web site.

XQuery

XQuery is an expression-based language for querying XML data, analogous to the database-oriented Structured Query Language (SQL). It combines the path-based navigation capabilities of XPath with iteration based on for, let, where, and return clauses (collectively called FLWR expressions) and with user-defined functions.
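The JDK does not include an XQuery engine, so the following sketch illustrates only the XPath navigation that XQuery builds on, using the standard javax.xml.xpath API. The class XPathDemo, the helper select(), and the casino/table/name document layout are assumptions made for illustration. A comment shows roughly how the same selection could be phrased as a FLWR expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    /** Evaluates an XPath expression and returns the text of each selected node. */
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical document layout for illustration.
        String xml =
            "<casino>"
          + "<table><name>Blackjack 1</name><minimumBet>5.00</minimumBet></table>"
          + "<table><name>Baccarat 1</name><minimumBet>100.00</minimumBet></table>"
          + "</casino>";

        // Roughly the same selection written as a FLWR expression in XQuery:
        //   for $t in /casino/table
        //   where $t/minimumBet <= 5
        //   return $t/name
        System.out.println(select(xml, "/casino/table[minimumBet <= 5]/name"));
    }
}
```

The XPath predicate in square brackets plays the role of the where clause. XQuery's additional value lies in the let and return clauses and in user-defined functions, which plain XPath 1.0 does not offer.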

In the following sections, you learn how WebLogic Workshop combines the use of Schema, XML parser APIs, and the XQuery language to create a more intuitive development model for loading and manipulating XML documents.