I've used the DOM for all the XML processing in the Java and C++ code for this book. If you've followed me this far, you can probably see why that was an appropriate choice. However, it may not be the best choice for your particular situation and requirements. I said in Chapter 1 that there were other approaches; it's time to talk about them a bit more.
Simple API for XML (SAX)
SAX isn't a "real" standard in the sense that no standards body has blessed it. Even so, SAX implementations probably conform to SAX about as closely as most DOM implementations conform to the W3C Recommendation. Again, SAX is based on an event-driven model: the parser invokes a callback for each type of XML construct it encounters in the input stream (start tags, end tags, character data, and so on). Unlike the DOM, SAX simply reads the data stream serially and fires events; it doesn't build a tree, or anything else, in memory. That makes it well suited to processing very large XML documents that would consume considerable memory if loaded with the DOM.
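To make the callback model concrete, here is a minimal sketch using the SAX parser that ships with Java SE (JAXP's `SAXParserFactory` plus a `DefaultHandler` subclass). The class name `SaxElementCounter` and the sample element names are illustrative assumptions. The handler counts elements by name; note that it holds only running totals, never a tree.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Counts how many times each element name occurs. The parser calls
// startElement() once per start tag; the handler keeps only the running
// totals, never a tree of the document.
class SaxElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        counts.merge(qName, 1, Integer::sum); // qName is the tag name as written
    }

    static Map<String, Integer> countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxElementCounter handler = new SaxElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        // prints {Invoice=1, Line=2, Total=1}
        System.out.println(countElements("<Invoice><Line/><Line/><Total/></Invoice>"));
    }
}
```

For a real input file you would pass a `java.io.File` to `parser.parse(file, handler)` instead of a string; memory use stays flat no matter how large the file is.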
Several API libraries for Java and C++ support SAX, so it is widely available. However, as I said in Chapter 1, a major downside of SAX is that there is no standard way to create documents with it. The Apache Xerces distribution has SAX-based classes for creating XML output, and so does MSXML, but the two sets of classes are different.
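Java illustrates the point. SAX itself defines no writer, but Java SE's JAXP transformation API offers one way to produce XML from SAX-style events: an identity `TransformerHandler` obtained from `SAXTransformerFactory` accepts `startElement`/`characters`/`endElement` calls and serializes them. This is a JAXP feature, not part of SAX, and it is not the same interface as the Xerces or MSXML classes. The sketch below assumes the JDK's default `TransformerFactory` (which supports this feature; the code checks); the `Invoice`/`Status` names are invented for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Writes XML by "playing" SAX events into a JAXP identity TransformerHandler,
// which serializes them (escaping text as needed) to a character stream.
class SaxOutputSketch {
    static String writeInvoice(String number, String status) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Confirm the factory can act as a SAX consumer before casting.
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX");
        }
        TransformerHandler out = ((SAXTransformerFactory) tf).newTransformerHandler();
        out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        out.setResult(new StreamResult(buffer));

        AttributesImpl invoiceAttrs = new AttributesImpl();
        invoiceAttrs.addAttribute("", "", "Number", "CDATA", number);
        char[] text = status.toCharArray();

        // The same sequence of events a SAX parser would report when reading.
        out.startDocument();
        out.startElement("", "", "Invoice", invoiceAttrs);
        out.startElement("", "", "Status", new AttributesImpl());
        out.characters(text, 0, text.length);
        out.endElement("", "", "Status");
        out.endElement("", "", "Invoice");
        out.endDocument();
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeInvoice("1001", "Paid"));
    }
}
```

The program is responsible for emitting well-balanced start and end events; nothing checks the output against a schema.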
Still, the choice between SAX and the DOM depends on several nonfunctional considerations, not the least of which is programming paradigm. Are you more comfortable with event-based programming, or do you prefer working with trees? Take your pick.
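To show what the tree side of that choice looks like, here is a minimal DOM sketch of a simple task, counting elements by name, using the DOM parser in Java SE. The parser first builds the entire document in memory, and the code then walks the tree. (A SAX version would instead keep a running tally inside a `startElement` callback.) The class name `DomElementCounter` is an illustrative assumption.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomElementCounter {
    static Map<String, Integer> countElements(String xml) throws Exception {
        // Step 1: parse the entire document into an in-memory tree.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Step 2: walk the tree; "*" matches every element in document order.
        NodeList all = doc.getElementsByTagName("*");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < all.getLength(); i++) {
            counts.merge(all.item(i).getNodeName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // prints {Invoice=1, Line=2, Total=1}
        System.out.println(countElements("<Invoice><Line/><Line/><Total/></Invoice>"));
    }
}
```

Because the tree is still in memory after step 1, the program can revisit or modify any part of it, which a SAX handler can do only by keeping its own copy of what it has seen.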
Generated Class Bindings
An increasing number of products and freely available tools will generate Java or C++ classes for you from a schema that you supply as input. Here are a few of those products.
The promise of tools like these is that if you give them a schema, they'll generate all the code needed to let you access an XML document just as you would any other C++ or Java object. There is no complicated DOM, SAX, or other lower-level XML-specific code to write. This approach may be superior to DOM programming in many situations and is probably worth your consideration. However, despite its benefits, we need to keep some potential drawbacks in mind.
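As a rough picture of what "access an XML document like any other object" means, here is a hand-written, hypothetical sketch of the kind of class a binding generator might emit for a schema in which a `Customer` element has a required `Name` and an optional `Phone`. It is not the output of any particular tool; real generators differ in naming, validation, and how they represent optional content. The DOM code is buried inside the class, so the calling code never touches it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// HYPOTHETICAL: hand-written to resemble generated binding code for
// <Customer><Name>...</Name><Phone>...</Phone></Customer>, Phone optional.
class Customer {
    private String name;
    private String phone; // stays null when the optional <Phone> is absent

    public String getName() { return name; }
    public String getPhone() { return phone; }

    // The DOM code lives here, inside the "generated" class.
    public static Customer unmarshal(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Customer c = new Customer();
        c.name = childText(root, "Name");
        c.phone = childText(root, "Phone");
        return c;
    }

    // Text of the first descendant element with the given tag, or null if none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Customer c = Customer.unmarshal("<Customer><Name>Acme Supply</Name></Customer>");
        // Application code reads fields like any other Java object...
        System.out.println(c.getName());
        // ...but optional content still has to be checked explicitly.
        if (c.getPhone() == null) {
            System.out.println("no phone on file");
        }
    }
}
```

Note the explicit null check for the optional `Phone`: that is content logic the application still has to write, whoever generates the accessor.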
I've been around long enough to remember some early code generation products, and to remember that they never caught on despite their promised benefits. Do a thorough assessment. The tools may make processing small, simple documents very easy. However, for larger, more complex documents with many optional Elements and Attributes, most of your program logic may be concerned with the content itself (checking which optional items are present, for example) rather than with the particular API used to reach it. A code generator may or may not save you significant effort over DOM programming.
I'm sure that 40 years ago old assembly language programmers raised similar concerns, warning about the drawbacks of third-generation languages like COBOL and FORTRAN. So, call me an old (or new) fuddy-duddy if you like. The best advice I can offer is to do a thorough evaluation, including testing with a wide variety of inputs, before you commit to a particular tool. Despite the potential drawbacks, I should say that these tools get one thing right: they start with the data model.
Options for Procedural Languages
It's not that you have no options with strictly procedural languages like C, COBOL, and FORTRAN; it's just that your options are somewhat limited and nonstandard. I'll discuss three basic options here.
The first option involves linking your application with routines written in an object-oriented language that has XML support, most likely C++. As early as 1978, Digital Equipment (later Compaq, and now part of Hewlett-Packard) supported a calling and linking standard on the VMS operating system that allowed modules written in any language to call modules written in any other. Even today, not all operating systems and development environments offer this support, or offer it as transparently. However, many do. If you're fortunate enough to have this option, it's probably the easiest route to adding XML support to your application. Develop all your XML-handling modules in C++, design the interfaces so that you're sure the data can be passed back and forth (for C callers, that typically means giving the C++ entry points extern "C" linkage and plain C data types), and the job is done.
The second option is specialized API libraries, software packages, or compilers that provide XML APIs directly to these procedural languages. Several open source and proprietary alternatives are available. Here are a few examples.
APIs like these can certainly provide native XML support to an existing application. However, there are several issues to consider. As I said earlier, they generally don't provide native support for standard APIs like SAX and the DOM. Another issue is whether they support schema validation. Some older tools may not even support XML namespaces. Find out! The remaining issues are the same as for any other development tool: the tool's cost, quality, and support, and the stability of the vendor (or the breadth and depth of the open source community), are usually of the highest importance.
The third option to consider is reassessing whether you really need native XML support at all. If your application is coded entirely in C, COBOL, or FORTRAN, you probably still do a lot of processing in batch mode. Unless you have particular requirements for real-time behavior (perhaps something similar to a CICS [*] transaction processing monitor), consider auxiliary stand-alone conversion utilities like those developed in this book.
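As an illustration of that batch approach (a hypothetical sketch, not one of this book's utilities), here is a small stand-alone converter that streams an XML file with SAX and writes a CSV file that a C, COBOL, or FORTRAN program could then read like any other flat file. It assumes an invented input layout, `<Rows><Row><A>..</A><B>..</B></Row>...</Rows>`, in which each child of a `Row` becomes one field, and it applies only minimal CSV quoting.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// HYPOTHETICAL input layout: <Rows><Row><A>..</A><B>..</B></Row>...</Rows>.
// Each <Row> becomes one CSV line; each child of <Row> becomes one field.
// Streams with SAX, so only the current row is held in memory.
class XmlToCsv extends DefaultHandler {
    private final Writer out;
    private final List<String> fields = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0; // 1 = root, 2 = row, 3 = field

    XmlToCsv(Writer out) { this.out = out; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        text.setLength(0); // start collecting this element's text afresh
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth == 3) {
            fields.add(quote(text.toString()));
        } else if (depth == 2) {
            try {
                out.write(String.join(",", fields));
                out.write('\n');
            } catch (IOException e) {
                throw new SAXException(e);
            }
            fields.clear();
        }
        depth--;
    }

    // Minimal CSV quoting: wrap in quotes and double any embedded quotes.
    private static String quote(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    static String convert(String xml) throws Exception {
        StringWriter out = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new XmlToCsv(out));
        return out.toString();
    }

    // Batch use: java XmlToCsv.java input.xml output.csv
    public static void main(String[] args) throws Exception {
        try (Writer out = Files.newBufferedWriter(Path.of(args[1]))) {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new File(args[0]), new XmlToCsv(out));
        }
    }
}
```

Run as a step in a batch job ahead of the legacy program, a converter like this keeps all XML handling outside the procedural code.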