DOM is defined almost completely in terms of interfaces rather than classes. Different parsers provide their own custom implementations of these standard interfaces, which offers a great deal of flexibility. Generally you do not install the DOM interfaces on their own. Instead they come bundled with a parser distribution that provides the detailed implementation classes. DOM isn't quite as broadly supported as SAX, but most of the major Java parsers provide it, including Crimson, Xerces, XML for Java, the Oracle XML Parser for Java, and GNU JAXP.

DOM is not complete unto itself. Almost all significant DOM programs need to use some parser-specific classes. DOM programs are not too difficult to port from one parser to another, but a recompile is normally required. You can't just change a system property to switch from one parser to another, as you can with SAX. In particular, DOM2 does not specify how one parses a document, creates a new document, or serializes a document into a file or onto a stream. These important functions are all performed by parser-specific classes.

JAXP, the Java API for XML Processing, fills in a few of the holes in DOM by providing standard parser-independent means to parse existing documents, create new documents, and serialize in-memory DOM trees to XML files. Most current Java parsers that support DOM2 also support JAXP 1.1. JAXP is a standard part of Java 1.4. Although JAXP is not included in earlier versions of Java, it does work with Java 1.1 and later and is bundled with most parser class libraries. DOM3 promises to fill the same holes that JAXP fills (that is, parsing, serializing, and bootstrapping), but it is not yet finished and not yet widely supported by parsers.

Because DOM depends so heavily on parser classes, its performance characteristics vary widely from one parser to the next. Speed is something of a concern, but memory consumption is a much bigger issue for most applications.
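To make the three JAXP facilities mentioned above concrete (parsing, creating a new document, and serializing), here is a minimal sketch that uses only standard JAXP and DOM classes. The class name JaxpDomSketch and its helper methods parse, create, and serialize are hypothetical names chosen for illustration. Serialization here goes through an identity transform, which is one parser-independent way to write a DOM tree with JAXP; a given parser may offer its own serializer classes as well.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of JAXP or of any parser.
public class JaxpDomSketch {

    // Parse XML text into a DOM Document using whichever parser JAXP locates.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Create a new document containing only a root element with the given name.
    public static Document create(String rootName) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        return doc;
    }

    // Serialize a DOM tree to a string by running it through an identity transform.
    public static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document parsed = parse("<greeting>Hello DOM</greeting>");
        System.out.println(parsed.getDocumentElement().getTagName());

        Document created = create("order");
        created.getDocumentElement().setAttribute("id", "42");
        System.out.println(serialize(created));
    }
}
```

Nothing in this sketch names a parser-specific class, so the same code should run unchanged against any parser that supports JAXP 1.1. Anything beyond these three operations, such as the parser-specific features discussed next, still requires parser-specific code.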
All DOM implementations I've seen use more space for the in-memory DOM tree than the actual file on the disk occupies. Generally the in-memory DOM trees range from three to ten times as large as the actual XML text. Some parsers, including Xerces, offer a "lazy DOM" that leaves most of the document on the disk and reads into memory only those parts of the document that the client actually requests.

Another distinguishing factor between different DOM implementations is the extra features the parser provides. Most parsers provide methods to parse XML documents and serialize DOM trees to XML. Other useful features include schema validation, database access, XInclude, XSLT, XPath, support for different character sets, and application-specific DOMs like the MathML, SVG, and WML DOMs. For example, the Oracle and Xerces parsers provide schema validation. Ælfred and Crimson don't. Ælfred has partial support for XInclude. The other three don't. The Oracle XML parser can produce a DOM Document object from a SQL query against a relational database or a JDBC ResultSet object. The other three can't. The Oracle XML parser can decode the WAP binary XML format. The other three can't. Xerces has specialized DOMs for HTML and WML documents. The other three don't. These are all nonstandard features, but if they're useful to you, that would be a good reason to choose one parser over another. Table 9.2 summarizes parser support for various useful features.
Table 9.2. DOM Parser Features
|