5.10 Summary | Building Parsers With Java™

	Building Parsers with Java By Steven John Metsker

	Chapter 5. Parsing Data Languages


The acceptance of XML as a standard is on the rise. If you offer to pass data to another division or another company, the receiving information technology group can hardly say no to a request to encode the data in XML. On the other hand, the integration of XML and Java is still primitive at the time of this writing. The two main approaches, SAX and DOM, both have drawbacks.

If you read coffee data into DOM, for example, a coffee is a node that has roast as a node that has french as a node. All these nodes have equal status as nodes in the tree. But in Java, there are fundamental differences between these types of information. A coffee is an object with roast as an attribute and french as this attribute's value. The proper translation from nodes to objects, attributes, and values is the developer's responsibility. Assembling objects from DOM trees is extra work for developers, fraught with opportunities for introducing defects. By the time you read this, the DOM approach may have yielded to schema-oriented approaches that tighten the connection between XML and Java.
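To make the node-to-object translation concrete, here is a minimal sketch using the standard JAXP DOM API. The `Coffee` class, its `roast` field, and the `<coffee><roast>french</roast></coffee>` input are assumptions made for illustration; they are not taken from the chapter's coffee example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical domain class: roast is an attribute of a coffee object.
class Coffee {
    String roast;
}

class CoffeeDomDemo {
    // Walks the DOM tree by hand to turn nodes into an object, an attribute, and a value.
    static Coffee fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element coffeeNode = doc.getDocumentElement(); // <coffee> is just a node
            Coffee coffee = new Coffee();
            NodeList children = coffeeNode.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // The developer decides that a <roast> node means "set the roast attribute".
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "roast".equals(child.getNodeName())) {
                    coffee.roast = child.getTextContent().trim();
                }
            }
            return coffee;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read coffee XML", e);
        }
    }

    public static void main(String[] args) {
        Coffee c = fromXml("<coffee><roast>french</roast></coffee>");
        System.out.println(c.roast); // prints "french"
    }
}
```

Nothing in the DOM tree says that `roast` is an attribute and `french` is its value. That knowledge lives only in the loop above, which is where defects tend to creep in.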

The SAX approach to XML lets you plug behavior into the parser's recognition process. However, the S in SAX stands for Simple, and the programming interface to SAX parsers is limited. You can receive notification of the two basic events (recognizing an element or recognizing text), but you cannot, for example, tie a specific behavior to recognizing a <roast> parameter.
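The sketch below shows what these generic notifications look like with the standard JAXP SAX API. The class name `CoffeeSaxDemo`, the event-log format, and the sample input are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CoffeeSaxDemo {
    // Returns a log of the element and text events the parser reports.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            // SAX may deliver one run of text in several chunks, so buffer and flush.
            private void flushText() {
                String s = text.toString().trim();
                if (!s.isEmpty()) {
                    log.add("text:" + s);
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                flushText();
                // The same callback fires for every element; any <roast>-specific
                // behavior would have to branch on qName here.
                log.add("element:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<coffee><roast>french</roast></coffee>"));
    }
}
```

The handler sees "an element" and "some text"; deciding that a particular element is a roast, and what to do about it, is left to code the developer writes inside these callbacks.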

By comparison, writing your own parser gives you more control but also more of a maintenance burden. Once you build a parser and your business comes to rely on it, someone will have to keep it running even as business conditions change. If you create a new language, you may also find that other departments or businesses cannot read in your data. You can assume that any information technology department can handle XML, but you cannot assume that any information technology department can write a parser.

There are, nonetheless, several reasons for creating a parser for a data language. First, if the language already exists, as in the coffee marketing example, and you cannot change it, you must write some kind of parser from scratch. Another constraint is that XML is clearly limited to markup languages. If you do not want markup tags in your language, XML is not the answer. A final motivation for writing your own parser is that climbing the learning curve for creating data languages will prepare you for building more-advanced parsers. Developing expertise at creating new languages from sequence, alternation, repetition, and terminal objects is a gateway to a world of languages that lie beyond markup.
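As a rough illustration of composing a language from sequence, alternation, repetition, and terminal objects, here is a small self-contained sketch. It is not the book's parser toolkit: the `TokenParser` interface, the `Combinators` helpers, and the toy grammar are all hypothetical, and the matcher is greedy with no backtracking, which is enough for this example but not for grammars in general.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: returns the position after a match, or -1 on failure.
interface TokenParser {
    int match(List<String> tokens, int pos);
}

class Combinators {
    // Terminal: matches exactly one given word.
    static TokenParser terminal(String word) {
        return (t, i) -> i < t.size() && t.get(i).equals(word) ? i + 1 : -1;
    }

    // Sequence: every part must match, one after another.
    static TokenParser sequence(TokenParser... parts) {
        return (t, start) -> {
            int pos = start;
            for (TokenParser p : parts) {
                pos = p.match(t, pos);
                if (pos < 0) {
                    return -1;
                }
            }
            return pos;
        };
    }

    // Alternation: the first choice that matches wins.
    static TokenParser alternation(TokenParser... choices) {
        return (t, start) -> {
            for (TokenParser p : choices) {
                int pos = p.match(t, start);
                if (pos >= 0) {
                    return pos;
                }
            }
            return -1;
        };
    }

    // Repetition: zero or more matches, stopping on failure or lack of progress.
    static TokenParser repetition(TokenParser p) {
        return (t, start) -> {
            int pos = start;
            while (true) {
                int next = p.match(t, pos);
                if (next <= pos) {
                    return pos;
                }
                pos = next;
            }
        };
    }

    // True if the parser consumes every whitespace-separated token in the text.
    static boolean accepts(TokenParser p, String text) {
        List<String> tokens = Arrays.asList(text.trim().split("\\s+"));
        return p.match(tokens, 0) == tokens.size();
    }

    // Toy grammar (an assumption): order = "coffee" roast* ; roast = "french" | "italian"
    static TokenParser order() {
        TokenParser roast = alternation(terminal("french"), terminal("italian"));
        return sequence(terminal("coffee"), repetition(roast));
    }

    public static void main(String[] args) {
        System.out.println(accepts(order(), "coffee french italian")); // prints "true"
        System.out.println(accepts(order(), "coffee decaf"));          // prints "false"
    }
}
```

The toy language here has no markup tags at all, which is the kind of design that XML cannot express.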
