XML Overview | ColdFusion MX Professional Projects

Team-Fly

XML is the Extensible Markup Language. An XML file is a collection of data whose pieces are demarcated by tags. XML files are structured and highly portable.

Let's consider the Jargoneer application from Chapter 2 again. In that application, the MIDP device requests an HTML page from the Jargon File server, then parses the page and displays the results on the screen. As I mentioned, a cleaner architecture might be to have the MIDP device talk to an intermediate server. This server would retrieve the HTML page, perform the parsing, and send some distilled version of the data down to the MIDP device. Figure 11-1 shows this architecture.

Figure 11-1: A simple architecture for Jargoneer

What, exactly, would get sent from the intermediate server to the MIDP device? The simplest technique for exchanging data between a server and a device would be to use a properties file, like this:

 word: grok
 pronunciation: /grok/
 type: vt.
 meaning: [from the novel "Stranger in ...

This works fine and is probably all you would need for simple applications. You'd have to write a class that could parse this input (MIDP doesn't include java.util.Properties), but that wouldn't be too bad.
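As a rough illustration of how small such a class can be, here is a minimal sketch. The class name SimpleProperties is hypothetical, and the sketch assumes one "key: value" pair per line, as in the sample above. It sticks to indexOf(), substring(), and trim() rather than String.split(), which is not part of MIDP; the generic type parameters are a J2SE convenience and would have to be dropped on a MIDP device.

```java
import java.util.Hashtable;

/** Minimal stand-in for java.util.Properties: parses "key: value" lines. */
class SimpleProperties {
    /** Returns a table of keys and values; lines without a colon are skipped. */
    static Hashtable<String, String> parse(String text) {
        Hashtable<String, String> table = new Hashtable<String, String>();
        int start = 0;
        while (start < text.length()) {
            // Find the end of the current line.
            int end = text.indexOf('\n', start);
            if (end < 0) end = text.length();
            String line = text.substring(start, end).trim();
            // Split at the first colon only, so values may contain colons.
            int colon = line.indexOf(':');
            if (colon > 0) {
                String key = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                table.put(key, value);
            }
            start = end + 1;
        }
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, String> entry =
                parse("word: grok\npronunciation: /grok/\ntype: vt.\n");
        System.out.println(entry.get("word") + " " + entry.get("pronunciation"));
    }
}
```

Because the split happens at the first colon, a value that itself contains a colon survives intact.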

However, chances are excellent that some parts of your application are already speaking XML, and it would likely simplify your life considerably if your MIDlet could parse XML instead of having its own specific data format. Furthermore, using XML validation during the development cycle may be a big help in flushing out bugs.

As an XML file, then, the same information would probably look like this:

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <jargon-definition>
   <word>grok</word>
   <pronunciation>/grok/</pronunciation>
   <type>vt.</type>
   <meaning>[from the novel "Stranger in ...</meaning>
 </jargon-definition>

This simple XML document illustrates some important points. First, tags mark off every piece of data (element) in the document. In essence, every element has a name. Matching start and end tags are used to clearly separate elements. For example, the start tag <word> and the end tag </word> surround the word itself. Also note that elements may be nested. The jargon-definition element is simply a collection of other elements. Any of the other elements could contain further nested elements.

Element tags may also contain attributes. An alternate way of writing the previous XML file looks like this:

 <?xml version="1.0" encoding="ISO-8859-1"?>
 <jargon-definition word="grok" pronunciation="/grok/" type="vt.">
   [from the novel "Stranger in ...
 </jargon-definition>

It's up to you exactly how you structure your XML data. Usually it depends on the structure of your application and the systems with which you will be exchanging data.

XML and HTML

XML looks a lot like HTML, but there are some important differences. First, HTML has a fixed set of tags, like <TITLE>, <BODY>, <H1>, <P>, and so forth. In XML, you can define whatever tags you want.

HTML is also pretty lax about requiring closing tags. For example, HTML documents typically have <P> tags at the beginning of each paragraph, but it's unusual to have matching </P> close tags. As a matter of fact, the HTML world is pretty loose about document formatting in general. You can throw all sorts of strange documents at a browser and it will do its best to display them. XML is much stricter: every start tag must have a matching end tag, and a parser will reject a document that is not well-formed.

It is possible to write HTML such that it complies with XML; this is XHTML. For more information on XHTML, see http://www.w3.org/TR/xhtml1/.

Understanding SAX

SAX is the Simple API for XML, a standard API for Java applications that want to parse XML data. The API is documented online at http://www.megginson.com/SAX/, but SAX-compliant parsers usually include the SAX API as part of their software. The current version of SAX is 2.0, but the small parsers covered in this chapter are only at the 1.0 level if they implement SAX at all.

SAX 1.0 revolves around the org.xml.sax.Parser interface. Parser has a method, parse(), that parses through an entire XML document, spitting out events to listening objects. Typically, your application will implement a DocumentHandler that receives notification about start tags, end tags, element data, and other important events. A SAX 1.0 application looks something like this:

 try {
   Parser p = new SAXParser(); // Create a specific parser implementation.
   // Create some DocumentHandler named handler.
   p.setDocumentHandler(handler);
   // parse() requires an InputSource or a system ID (URI) string.
   p.parse(source);
 }
 catch (Exception e) { // Handle exceptions.
 }

The call to parse() proceeds until the document has been fully parsed. During the parse, callback methods in the registered DocumentHandler are invoked. In these methods, you'll process the data from the XML document.
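To make the callbacks concrete, here is a minimal sketch of a handler that records the text of each element in a jargon-definition document. It runs on J2SE rather than MIDP, and it makes some assumptions: the class name JargonHandler is hypothetical, the handler extends the SAX 1.0 convenience class org.xml.sax.HandlerBase instead of implementing DocumentHandler from scratch, and the SAX 1.0 Parser is obtained from JAXP's SAXParserFactory, which is only one of several ways to get one. The SAX 1.0 types are deprecated in current JDKs, so expect deprecation warnings when compiling.

```java
import java.io.StringReader;
import java.util.Hashtable;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

/** Collects the text of each non-empty element, keyed by element name. */
class JargonHandler extends HandlerBase {
    final Hashtable<String, String> fields = new Hashtable<String, String>();
    private final StringBuffer text = new StringBuffer();

    public void startElement(String name, AttributeList atts) {
        text.setLength(0); // A new element starts; discard any pending text.
    }

    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so accumulate.
        text.append(ch, start, length);
    }

    public void endElement(String name) {
        String value = text.toString().trim();
        if (value.length() > 0) fields.put(name, value);
        text.setLength(0);
    }

    /** Parses the XML and returns the collected fields. */
    static Hashtable<String, String> parse(String xml) {
        try {
            Parser p = SAXParserFactory.newInstance().newSAXParser().getParser();
            JargonHandler handler = new JargonHandler();
            p.setDocumentHandler(handler);
            p.parse(new InputSource(new StringReader(xml)));
            return handler.fields;
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        String xml = "<jargon-definition><word>grok</word>"
                + "<pronunciation>/grok/</pronunciation>"
                + "<type>vt.</type></jargon-definition>";
        System.out.println(parse(xml));
    }
}
```

Note that the handler never sees the document as a whole. It only reacts to events as they arrive, which is why this style of parsing needs so little memory.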

SAX 1.0 is not MIDP compliant straight out of the box. The Parser interface includes a setLocale() method that references the java.util.Locale class, a class that is missing in the MIDP platform. Later sections on MinML and wbxml describe how to work around this problem.

Another standard API, the Document Object Model (DOM), takes a different approach to XML parsing. With DOM, the parser creates an internal model of a document as it is parsed. After parsing is complete, an application can examine the entire document. DOM is described further at http://www.w3.org/TR/DOM-Level-2-Core/. Although none of the parsers described in this chapter implement DOM directly, some of them do follow the DOM paradigm of creating an internal representation of a parsed document.
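For comparison, here is a minimal sketch of the DOM style on J2SE, using the standard JAXP DocumentBuilder. The class name JargonDom is hypothetical, and the meaning text in the sample document is a placeholder. The sketch reads the attribute-based form of the jargon-definition document: the parser builds the whole tree first, and only then does the application ask it questions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Parses a document into a DOM tree and returns its root element. */
class JargonDom {
    static Element parseRoot(String xml) {
        try {
            // The entire document is read into memory before parse() returns.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e.toString());
        }
    }

    public static void main(String[] args) {
        String xml = "<jargon-definition word=\"grok\" pronunciation=\"/grok/\""
                + " type=\"vt.\">(placeholder meaning)</jargon-definition>";
        Element root = parseRoot(xml);
        // The tree can now be examined in any order.
        System.out.println(root.getTagName());
        System.out.println(root.getAttribute("word"));
        System.out.println(root.getTextContent().trim());
    }
}
```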

Validation and Development Cycle

XML documents may also make reference to a Document Type Definition (DTD) or an XML Schema; these are files that describe the contents of a particular kind of XML document. We could, for example, write a DTD that specified the contents of a jargon-definition document. This is part of the power of XML, and it's the reason XML is sometimes called self-describing data.

Given a document, you can determine if it conforms to its DTD, which is a great way to determine if part of your system is producing data that's unreadable by the rest of your system. In XML terms, a document that follows the rules of its DTD or schema is valid. In the J2SE and J2EE worlds, parsers may be validating or non-validating. The J2ME world is too small to support XML document validation, so all of the parsers we'll discuss in this chapter are non-validating.

Even though you won't be able to perform validation on a MID, you may well want to use validating parsers during your development and test cycle. For example, you might write code that emulates the MID client, having it request data from your server and validate the results. This helps flush out bugs in the server code before you make the switchover to the MID client software.
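A desktop-side check of this kind might look like the following sketch. It assumes a J2SE environment with the standard JAXP API. The class name JargonValidator and the DTD itself are hypothetical; the DTD simply declares the four elements used in the earlier example and is supplied as an internal subset so the sketch is self-contained. Validity errors are collected rather than thrown, so a test can report all of them at once.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Validates a jargon-definition document against a hypothetical DTD. */
class JargonValidator {
    // Hypothetical DTD, written as an internal subset.
    static final String DTD =
            "<!DOCTYPE jargon-definition [\n"
          + "<!ELEMENT jargon-definition (word, pronunciation, type, meaning)>\n"
          + "<!ELEMENT word (#PCDATA)>\n"
          + "<!ELEMENT pronunciation (#PCDATA)>\n"
          + "<!ELEMENT type (#PCDATA)>\n"
          + "<!ELEMENT meaning (#PCDATA)>\n"
          + "]>\n";

    /** Returns a list of problems; an empty list means the document is valid. */
    static List<String> validate(String body) {
        final List<String> problems = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // Ask for a validating parser.
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // Validity errors arrive here.
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // The document is not even well-formed.
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            problems.add(e.toString());
        }
        return problems;
    }

    public static void main(String[] args) {
        String missingType = "<jargon-definition><word>grok</word>"
                + "<pronunciation>/grok/</pronunciation>"
                + "<meaning>(placeholder)</meaning></jargon-definition>";
        System.out.println(validate(missingType));
    }
}
```

In a real test harness the body would come from your server's response instead of a string literal.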

Design Tips

Common sense, as always, takes you a long way. As you contemplate the use of XML in your MIDP application, keep three things in mind:

Keep the documents small. If you're sending some 100-KB document down to the MID and only using a few elements, it's time to rethink your server-side strategy. You can probably transform the document at the server and just send what you need to the MID. Keep in mind that network connectivity is likely to be slow, and there's not much memory on the MID.
Don't use comments in the XML that you send to the MID, except perhaps as a debugging aid during the development cycle. Comments will only make the document longer, which implies a slower download and more memory usage on the MID.
Choose a parser that fits your needs. Some of the parsers we'll examine build an entire model of a document in memory as the document is parsed. This is like writing a blank check to the supplier of the XML document. If the server sends you a 1-MB file, these types of parsers will attempt to read through the whole thing, right up until they run out of memory. On the other hand, if you know the size of the files you'll be parsing, and they are small enough, you might choose a model-building parser, as it is slightly easier to use than the other types of parsers.
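
One way to avoid writing that blank check is to cap the number of bytes the parser is allowed to read. The following is a minimal sketch, not part of any parser discussed here: CappedInputStream and its fits() helper are hypothetical names. It uses only java.io classes that are also available on MIDP, and the idea is to wrap the network stream in it before handing the stream to a parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Wraps a stream and fails once more than maxBytes have been read. */
class CappedInputStream extends InputStream {
    private final InputStream in;
    private int remaining;

    CappedInputStream(InputStream in, int maxBytes) {
        this.in = in;
        this.remaining = maxBytes;
    }

    public int read() throws IOException {
        // Once the limit has been exceeded, keep failing on every call.
        if (remaining < 0) throw new IOException("Document exceeds size limit");
        int b = in.read();
        if (b >= 0) {
            remaining--;
            if (remaining < 0) throw new IOException("Document exceeds size limit");
        }
        return b;
    }

    public void close() throws IOException {
        in.close();
    }

    /** Returns true if all of data can be read without hitting the cap. */
    static boolean fits(byte[] data, int maxBytes) {
        InputStream capped =
                new CappedInputStream(new ByteArrayInputStream(data), maxBytes);
        try {
            while (capped.read() >= 0) { /* Drain the stream. */ }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fits(new byte[10], 10));
        System.out.println(fits(new byte[11], 10));
    }
}
```

A parser reading through the wrapper sees an IOException as soon as the document grows past the cap, long before the device runs out of memory.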

Finally, you may be concerned about the performance of a small XML parser. This is a valid concern, especially on a small device that has a relatively slow processor. For a fascinating comparison of XML parser performance, see http://www.extreme.indiana.edu/~aslom/exxp/. With small documents, the small parsers can hold their own or outperform larger parsers.
