What Is JDOM? | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

JDOM is an open source, tree-based, pure Java API for parsing, creating, manipulating, and serializing XML documents. Brett McLaughlin and Jason Hunter invented it in spring 2000. I asked Jason how it happened, and here is what he told me:

In the early months of 2000, in a time before I knew Brett, I found myself working with XML for a contract project and growing increasingly frustrated with DOM as a way to solve my problems. My mind had an expectation for what a Java-based XML manipulation API would look like. DOM wasn't anything like it.

In the spring of 2000, I attended Brett's talk on DOM and SAX at the O'Reilly Conference on Enterprise Java. I was hoping he'd share with me the DOM philosophy so I could see why reality wasn't matching my expectations. Rather than clearing things up, I found every fifth slide in his presentation was titled "Gotcha!" and listed one more thing you had to watch out for.

After his talk we sat down together on the lawn in San Jose. It was a gorgeous spring day. He was just about to release a book that was clearly destined to be a bestseller (Java and XML buzzwords in the title, what can go wrong, right, elharo?). I was telling him some of what that means for a person's career, based on my experience with a popular servlets book. I used the opportunity to ask him (someone far more expert in XML than myself at the time), "Why does it have to be like this?" He thought about it, we talked about it, and ten minutes later we decided to start an open source project to create a Java-specific XML object model. It was the first alternative to DOM in the Java world.

We worked for about a month designing the early API. We each had our role to play. Brett made sure the API was consistent with XML specifications. I made sure the API was acceptable to a Java programmer who wanted to just use XML and get on with their life. We had two private betas, then a public beta 3. James Duncan Davidson was helpful during the two private betas, especially on the interfaces-versus-classes debate.

Since then numerous people have contributed to JDOM's development, including Alex Rosen, Alex Chaffee, James Duncan Davidson, Philip Nelson, Jools Enticknap, Bradley S. Huffman, and yours truly.

JDOM is open source like SAX and DOM. (Proprietary XML APIs really have not caught on.) Hunter and McLaughlin publish it under the very liberal Apache license. Essentially you can do anything you want with it except use the name "JDOM" for derivative works. It has already been forked once, resulting in James Strachan's dom4j.

dom4j

James Strachan forked JDOM in late 2000 to experiment with using interfaces built by factory methods instead of concrete classes built by constructors to represent the nodes. The result was dom4j.

dom4j has some features I like, including integrated XPath support and a generic Node interface that makes document navigation a lot simpler. However, my observation is that most developers find it much easier to work with class-based APIs such as JDOM than with pure interface-based APIs such as dom4j and DOM. Furthermore, classes can enforce constraints such as, "The name property of an Element must be a legal XML name." Interfaces can't do that. In my opinion, dom4j makes it too easy to slip out of the constraints of XML and produce a malformed document.
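To make the constructor argument concrete, here is a minimal sketch in plain Java. The CheckedElement class and NamedNode interface are hypothetical, written only for this illustration; they are not part of JDOM or dom4j. The name check is also a simplified approximation of the XML 1.0 Name production, not an exact implementation of it.

```java
// Hypothetical sketch, not JDOM's or dom4j's code: a concrete class can refuse
// to exist in an illegal state, because every instance passes through a
// constructor that validates its input.
final class CheckedElement {
    private final String name;

    CheckedElement(String name) {
        if (!isLegalName(name)) {
            throw new IllegalArgumentException("Illegal XML element name: " + name);
        }
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Simplified approximation of the XML 1.0 Name production: it relies on
    // Java's Character classification instead of the spec's exact ranges.
    static boolean isLegalName(String s) {
        if (s == null || s.isEmpty()) return false;
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_' || c == ':' || c == '-' || c == '.')) {
                return false;
            }
        }
        return true;
    }
}

// An interface only declares methods. Any implementation may return whatever
// it likes, including a name that no XML document could contain.
interface NamedNode {
    String getName();
}
```

With these definitions, new CheckedElement("has space") throws an IllegalArgumentException, while NamedNode n = () -> "has space"; compiles and runs without complaint. Nothing in the interface can stop it.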

Like DOM, JDOM represents an XML document as a tree composed of elements, attributes, comments, processing instructions, text nodes, CDATA sections, and so forth. The entire tree is available at once, so, unlike SAX, JDOM lets a program access any part of the document at any time. Unlike DOM, all of the different kinds of nodes in the tree are represented by concrete classes rather than interfaces. Furthermore, there is no generic Node interface or class that all of the different node classes implement or extend. ^[1]

^[1] This is personally my least-favorite aspect of the JDOM design. It makes tree-walking and search operations far more cumbersome than they are in DOM.
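For comparison, the following sketch shows why a common node type helps. It uses only the DOM implementation bundled with the JDK (javax.xml.parsers and org.w3c.dom); DomWalker and its methods are illustrative names of my own. Because every DOM node implements Node, one recursive method can visit elements, text, and comments alike.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomWalker {

    // Parses a string into a DOM Document using the JDK's built-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Counts every node in the subtree rooted at node. A single method handles
    // the document, elements, text, comments, and so on, because all of them
    // implement org.w3c.dom.Node. Attributes are not counted, since DOM does
    // not treat them as children.
    static int countNodes(Node node) {
        int count = 1; // this node
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }
}
```

Without a common supertype, the equivalent walk has to test each child object with instanceof before deciding how, or whether, to descend into it.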

JDOM is written in and for Java. It consistently uses the Java coding conventions and the class library. For example, all primary JDOM classes have equals(), toString(), and hashCode() methods. They all implement the Cloneable and Serializable interfaces. The children of an Element or a Document object are stored in a java.util.List. JDOM strives to be correct not only with respect to XML, but also with respect to Java.
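As a rough illustration of these conventions, here is a hypothetical element class. It is not JDOM's Element; it only shows what following the Java conventions looks like: children held in a java.util.List, Cloneable with a deep clone(), Serializable, and a useful toString(). Whether equals() should compare identity or content is a design decision; this sketch keeps Object's identity-based behavior and does not claim to reproduce JDOM's choice.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical element class (not JDOM's Element) following Java conventions.
final class SketchElement implements Cloneable, Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // Children live in an ordinary java.util.List, so callers can use the
    // standard collection methods to add, remove, and iterate over them.
    private ArrayList<SketchElement> children = new ArrayList<>();

    SketchElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    List<SketchElement> getChildren() {
        return children;
    }

    // Deep copy: the clone gets its own list containing clones of each child,
    // so modifying the copy never affects the original.
    @Override
    public SketchElement clone() {
        try {
            SketchElement copy = (SketchElement) super.clone();
            copy.children = new ArrayList<>();
            for (SketchElement child : children) {
                copy.children.add(child.clone());
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class is Cloneable
        }
    }

    // A debugging-friendly description. equals() and hashCode() are left as
    // Object's identity-based versions in this sketch.
    @Override
    public String toString() {
        return "[Element: <" + name + "/>, " + children.size() + " children]";
    }
}
```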

JDOM does not itself include a parser. Instead it depends on a SAX parser with a custom ContentHandler to parse documents and build JDOM models from them. Xerces 1.4.4 is bundled with JDOM. However, JDOM can work equally well with any SAX2-compliant parser, including Crimson, Ælfred, the Oracle XML Parser for Java, Piccolo, Xerces-2, and more. Any of these can read an XML document and feed it into JDOM. JDOM can also convert DOM Document objects into JDOM Document objects, which is useful for piping the output of existing DOM programs to the input of a JDOM program. However, if you're working with a stream of XML data read from a disk or a network, it's preferable to use SAX to produce the JDOM tree, because that avoids the overhead of building the in-memory tree twice in two different representations.
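The builder technique itself is easy to sketch with the SAX parser bundled in the JDK. The TreeBuilder class below is hypothetical and far simpler than JDOM's own SAXBuilder: it records only element names and text, ignoring attributes, namespaces, comments, and processing instructions. It does show the essential idea, a ContentHandler that keeps a stack of open elements and attaches each new element to the one on top.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that turns parse events into a tree.
final class TreeBuilder extends DefaultHandler {

    // Minimal hypothetical node type used only by this sketch.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    private final Deque<Node> stack = new ArrayDeque<>(); // currently open elements
    private Node root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Node node = new Node(qName);
        if (stack.isEmpty()) {
            root = node;                   // first element is the document element
        } else {
            stack.peek().children.add(node); // attach to the innermost open element
        }
        stack.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than assign.
        if (!stack.isEmpty()) {
            stack.peek().text.append(ch, start, length);
        }
    }

    // Parses the string with the JDK's default SAX parser and returns the root.
    static Node build(String xml) {
        try {
            TreeBuilder handler = new TreeBuilder();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```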

Like DOM (and unlike SAX), JDOM can build a new XML tree in memory. Data for the tree can come from a non-XML source such as a database, from literals in the Java program, or from calculations as in many of the Fibonacci number examples in this book. When creating new XML documents from scratch (rather than reading them from a parser), JDOM checks all of the data for well-formedness. For example, unlike many DOM implementations, JDOM does not allow programs to create comments whose data includes the double hyphen (--) or elements and attributes whose namespaces conflict in impossible ways.
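Here is a minimal sketch of that kind of check, applied to comment data. CheckedComment is a hypothetical class, not JDOM's Comment. XML 1.0 forbids comment text from containing two consecutive hyphens, and also from ending with a hyphen, since that would produce ---> when serialized; the constructor rejects both.

```java
// Hypothetical sketch (not JDOM's Comment class) of a well-formedness check
// performed when the node is created rather than when it is serialized.
final class CheckedComment {
    private final String data;

    CheckedComment(String data) {
        if (data.contains("--")) {
            throw new IllegalArgumentException("Comment data cannot contain \"--\": " + data);
        }
        if (data.endsWith("-")) {
            throw new IllegalArgumentException("Comment data cannot end with \"-\": " + data);
        }
        this.data = data;
    }

    String getData() {
        return data;
    }

    // The constructor checks guarantee this output is a well-formed comment.
    String toXml() {
        return "<!--" + data + "-->";
    }
}
```

Because the check happens in the constructor, any CheckedComment that exists can be written out without producing a malformed document.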

Once a document has been loaded into memory, whether by creating it from scratch or by parsing it from a stream, JDOM can modify the document. A JDOM tree is fully read-write. All parts of the tree can be moved, deleted, and added to, subject to the usual restrictions of XML. (For example, you can add an attribute to an element but not to a comment.) Unlike DOM, there are no annoying read-only sections of the tree that you can't change.
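Moving a node safely requires the tree to keep its parent links consistent. The following hypothetical MutableNode class (not part of JDOM) sketches one way to do that: a node must be detached from its old parent before it can be added to a new one, and a node cannot be made a child of itself or of one of its own descendants.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not JDOM's API: a mutable tree with parent links.
final class MutableNode {
    private final String name;
    private MutableNode parent;
    private final List<MutableNode> children = new ArrayList<>();

    MutableNode(String name) {
        this.name = name;
    }

    String getName() { return name; }

    MutableNode getParent() { return parent; }

    // Read-only view, so callers cannot bypass the parent bookkeeping.
    List<MutableNode> getChildren() {
        return Collections.unmodifiableList(children);
    }

    // Removes this node from its current parent (if any) and returns it,
    // ready to be added somewhere else.
    MutableNode detach() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
        return this;
    }

    // Appends child to this node's children and returns this node.
    MutableNode addChild(MutableNode child) {
        if (child.parent != null) {
            throw new IllegalStateException(child.name + " already has a parent; detach it first");
        }
        // Walk up from this node: meeting child means the add would create a cycle.
        for (MutableNode n = this; n != null; n = n.parent) {
            if (n == child) {
                throw new IllegalArgumentException("A node cannot become its own descendant");
            }
        }
        children.add(child);
        child.parent = this;
        return this;
    }
}
```

Returning a read-only view of the children is a choice made for this sketch; a live, modifiable list would need the list itself to maintain the parent links.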

Finally, when you're finished working with a document in memory, JDOM lets you serialize it back out to disk or onto a stream as a sequence of bytes. JDOM provides numerous options to specify the encoding, indenting, line end characters, and other details of serialization. Alternatively, if you don't want to convert the document to a stream, you can produce a SAX event sequence or a DOM document as output instead.
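In JDOM this job belongs to the XMLOutputter class. Rather than guess at its exact options, here is a hypothetical SketchOutputter that shows the same three choices in miniature: the indent string, the line separator, and the character encoding. It handles only elements and text, and it escapes only &, <, and > in text content.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical serializer sketch; not JDOM's XMLOutputter.
final class SketchOutputter {

    // Minimal hypothetical element: a name plus either text or child elements.
    static final class Elem {
        final String name;
        final String text; // null when the element holds child elements instead
        final List<Elem> children = new ArrayList<>();

        Elem(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Elem add(Elem child) {
            children.add(child);
            return this;
        }
    }

    private final String indent;        // e.g. "  " or "\t"
    private final String lineSeparator; // e.g. "\n" or "\r\n"
    private final Charset encoding;     // e.g. UTF-8 or ISO-8859-1

    SketchOutputter(String indent, String lineSeparator, Charset encoding) {
        this.indent = indent;
        this.lineSeparator = lineSeparator;
        this.encoding = encoding;
    }

    // Returns the document as text, starting with an XML declaration that
    // names the chosen encoding.
    String outputString(Elem root) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"")
           .append(encoding.name())
           .append("\"?>")
           .append(lineSeparator);
        write(root, 0, out);
        return out.toString();
    }

    // Encodes the text as bytes. Unmappable characters become the charset's
    // replacement bytes; a real serializer would emit character references.
    byte[] outputBytes(Elem root) {
        return outputString(root).getBytes(encoding);
    }

    private void write(Elem e, int depth, StringBuilder out) {
        String pad = indent.repeat(depth);
        if (e.text != null) {
            out.append(pad).append('<').append(e.name).append('>')
               .append(escape(e.text))
               .append("</").append(e.name).append('>').append(lineSeparator);
        } else if (e.children.isEmpty()) {
            out.append(pad).append('<').append(e.name).append("/>").append(lineSeparator);
        } else {
            out.append(pad).append('<').append(e.name).append('>').append(lineSeparator);
            for (Elem child : e.children) {
                write(child, depth + 1, out);
            }
            out.append(pad).append("</").append(e.name).append('>').append(lineSeparator);
        }
    }

    // Escapes the markup-significant characters in text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Keeping the options in the outputter rather than in the tree means the same in-memory document can be written with two-space indents and Unix line ends for one consumer and with tabs and \r\n for another.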