For Developers | Using XML with Legacy Business Applications

This section describes the coding approach, conventions, and style I use in this book. If you're reading this book as a technical end user, you may want to skip ahead to the next section.

General Coding Approach and Conventions

The Java and C++ code presented in this book uses object-oriented techniques. The DOM and all its various parts are object-oriented, as are the Java file operations and the C++ file operations. (I have chosen to use the C++ classes instead of the old-style C libraries.) That said, most of what we programmers really care about is procedural in nature, that is, the code that lives and works inside methods. We care most about how to call methods that manipulate DOM objects. We don't care as much about everything else.

I have not gone out of my way to make this code object-oriented. In the beginning it is fairly simple and only lightly object-oriented. However, as the design progresses and matures through the book, I do use more object-oriented techniques when they promote reuse and extension. I'm enough of an old-school programmer to be a bit concerned about the performance implications of allocating and freeing a lot of objects dynamically, and I tend to avoid doing so when I can. However, we're valuing reusability and extensibility over performance in this design. So, there are cases when we do create a lot of objects dynamically at runtime rather than declaring them statically at compile time. If the code lives on and it turns out to be a dog at runtime, we can investigate more efficient designs. A modular, object-oriented approach also helps us in that regard.

I have chosen not to construct elaborate object models for the non-XML entities manipulated in these programs. In addition, there aren't many class diagrams or other Unified Modeling Language (UML) artifacts. We're going to keep it simple and focus on the essentials.

That said, here are a few other notes and general rules on coding approach and style.

Clarity over cleverness : As any seasoned programmer knows, there are ways to write programs that are exceedingly clever. Such cleverness often yields short, efficient programs. However, in many cases it comes at the expense of clarity, since it can make it harder to understand what the program is really doing. In this book, where being clear means using a bit more code than the cleverest solution would need, I choose clarity. You can be clever in your own programs if you want, but in trying to get across some basic concepts I would rather be clear. We will follow the KISS principle: Keep it simple, stupid!
Error handling : Since these are basic utilities intended for light use and teaching concepts, I did not implement elaborate error handling. However, all parsing and DOM exceptions are caught and enough information is reported to enable someone reasonably experienced in XML to fix a problem.
DOM extensions : Some DOM implementations, notably Microsoft's MSXML in our case, offer extensions to the methods, functions, and interfaces defined in the DOM. To try to make the code platform independent and to keep the Java and C++ implementations as alike as possible, I avoid using these extensions in the code. However, where the extensions offer alternatives to the approaches I take in the code, I often comment on the alternative in the text.
File organization : Each class is coded in its own source file. The name of the source file generally matches the name of the class. This is, of course, standard in Java but not in C++.
Header files : For C++, each class has a separate header file.
Naming conventions : Upper camel case is used in the XML examples and for class names in the programs. Lower camel case is used for method names and variables. These choices seem to follow the prevailing conventions. With few exceptions, abbreviations are avoided. Variable names in programs are prefixed with an abbreviation indicating the data type. Although in Java and C++ the line between variables and classes is sometimes hard to determine, I won't use this prefixing convention for classes.
Comments : The code is liberally commented. I have tried to err on the side of having too many comments rather than not enough. I hope you find them sufficient without being too tedious and obvious.
Formatting : Opening and closing brackets for code blocks appear on their own lines. Lines generally break around column 65. I have tried to make indentation and continuations consistent in the source files, using spaces and no tabs.
Pseudocode : The pseudocode presented here is based on Program Design Language (PDL) [Caine and Gordon 1975]. This commonly used detail design language is basically structured English with a few reserved words. I see no particular merits in it over other pseudocode languages, but I picked it to lend some consistency to the pseudocode. My usage extends PDL by including calls to specific DOM methods. Appendix B presents a summary of pseudocode conventions used in this book.
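To make the naming and formatting rules in the list above a little more concrete, here is a minimal C++ sketch. It is only an illustration: the FieldCounter class, its countFields method, and the particular type prefixes (i for int, c for char, pc for char pointer) are hypothetical choices made for this example and are not taken from the book's own code.

```cpp
// Illustrative sketch only. The class, the method, and the
// type prefixes (i = int, c = char, pc = char pointer) are
// hypothetical, not taken from the book's own code.
#include <cstring>

// Class names: upper camel case, no type prefix.
class FieldCounter
{
public:
    // Method names: lower camel case.
    // Returns the number of delimited fields in a record,
    // counting one more field than there are delimiters.
    int countFields(const char *pcRecord, char cDelimiter)
    {
        // Treat a missing or empty record as having no fields.
        if (pcRecord == NULL || pcRecord[0] == '\0')
        {
            return 0;
        }

        int iCount = 1;
        int iLength = (int) std::strlen(pcRecord);
        for (int iPos = 0; iPos < iLength; iPos++)
        {
            if (pcRecord[iPos] == cDelimiter)
            {
                iCount++;
            }
        }
        return iCount;
    }
};
```

Note the brackets on their own lines, the short line lengths, and the indentation with spaces rather than tabs.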

Additional C++ Considerations

A few words are in order regarding my approach to using C++. In particular, I want to point out handling of strings, exceptions, and constants.

The final ISO and ANSI C++ standard has extremely useful string classes (at last!). These are much easier to use and less prone to runtime errors than old C-style char arrays, char pointers, and the C string library. However, these string classes are still not supported "out of the box" by many major C++ compilers. They weren't supported natively by Visual C++ 6.0, which I'm using for this book. Even though some add-on open source and proprietary class libraries do support these classes, many developers use only what is standard from the compiler vendor. In addition, a lot of the legacy C++ code currently in production doesn't use these string classes. Finally, neither MSXML nor the Apache Xerces C++ API, its best alternative, uses the ISO/ANSI string classes. MSXML uses COM strings (as we'll discuss in Chapter 2 and Appendix C), and the Xerces C++ API uses its own XML char class. They both support conversions to and from char arrays, not ISO/ANSI strings. So, for all these reasons I'm sticking with char arrays, char pointers, and the old C string library.
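As a rough illustration of what working with char arrays and the C string library involves, the sketch below copies a value into a fixed-size buffer and guarantees termination. The copyField function, the MAX_FIELD_LENGTH size, and the pc/i prefixes are assumptions made for this example, not code from the book.

```cpp
// Illustrative sketch only; copyField and MAX_FIELD_LENGTH
// are hypothetical names, not taken from the book's code.
#include <cstring>

// Hypothetical buffer size for a single field value.
#define MAX_FIELD_LENGTH 16

// Copies pcSource into pcTarget, truncating if the source
// does not fit. Returns the length of the copied string, or
// -1 if an argument is unusable.
int copyField(char *pcTarget, int iTargetSize,
              const char *pcSource)
{
    if (pcTarget == NULL || pcSource == NULL
        || iTargetSize < 1)
    {
        return -1;
    }

    // strncpy does not terminate the target when the source
    // is too long, so the terminator is written explicitly.
    std::strncpy(pcTarget, pcSource, iTargetSize - 1);
    pcTarget[iTargetSize - 1] = '\0';

    return (int) std::strlen(pcTarget);
}
```

A caller would declare something like char cBuffer[MAX_FIELD_LENGTH] and pass the buffer together with its size. The extra bookkeeping is exactly what the standard string class removes, which is why the choice of char arrays here is a concession to older compilers and APIs rather than a recommendation.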

In the early days of C++ compilers, throwing exceptions for common runtime errors was discouraged because exception handling wasn't very efficient. The general view is that compilers have gotten a lot more efficient in this regard. However, a lot of the standard C++ library functions still use status codes or return values rather than throw exceptions for every little thing the way that Java does. For this reason I generally use the approach of returning status values from functions rather than throwing exceptions.
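The status-value style described above might look something like the following sketch, which wraps the standard strtol function. The parseQuantity function and the STATUS_ codes are hypothetical names introduced for illustration; the book's actual functions and codes may differ.

```cpp
// Illustrative sketch only; parseQuantity and the STATUS_
// codes are hypothetical, not taken from the book's code.
#include <cstdlib>
#include <cerrno>

#define STATUS_OK 0
#define STATUS_EMPTY_INPUT 1
#define STATUS_NOT_A_NUMBER 2
#define STATUS_OUT_OF_RANGE 3

// Converts pcText to a long. The result is written to
// *plValue only on success; the return value is the status.
int parseQuantity(const char *pcText, long *plValue)
{
    if (pcText == NULL || plValue == NULL
        || pcText[0] == '\0')
    {
        return STATUS_EMPTY_INPUT;
    }

    // strtol itself reports problems through its end pointer
    // and errno rather than by throwing an exception.
    char *pcEnd = NULL;
    errno = 0;
    long lParsed = std::strtol(pcText, &pcEnd, 10);

    if (pcEnd == pcText || *pcEnd != '\0')
    {
        return STATUS_NOT_A_NUMBER;
    }
    if (errno == ERANGE)
    {
        return STATUS_OUT_OF_RANGE;
    }

    *plValue = lParsed;
    return STATUS_OK;
}
```

The caller checks the returned status before using the value, for example by comparing the result of parseQuantity("42", &lValue) with STATUS_OK.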

Regarding constants, some programmers believe that using #define rather than const to define constants is "evil" [Cline 2002]. I certainly prefer the Java style of defining constant class members. However, I found it extremely awkward to define constants that could be shared across classes using this approach with Visual C++ 6.0. There may be better ways to do it with compilers that provide better support for the final ISO/ANSI C++ standard. I'm sure there are people who are more clever about C++ programming than I am. At any rate, I've taken the easy way out in this book and used old C-style #defines.
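For comparison, the two styles of defining constants can be sketched side by side. All of the names here (MAX_RECORD_LENGTH, DEFAULT_DELIMITER, RecordLimits, fitsInRecord) are hypothetical, and the class-member form is written for a compiler that follows the ISO/ANSI standard; it is not a claim about what any particular older compiler accepts.

```cpp
// Illustrative sketch only; every name below is
// hypothetical, not taken from the book's code.
#include <cstring>

// Old C-style constants, typically placed in a shared header.
#define MAX_RECORD_LENGTH 256
#define DEFAULT_DELIMITER ','

// The const alternative: constants as static class members,
// closer to the Java style.
class RecordLimits
{
public:
    static const int MAX_LENGTH = 256;
    static const char DELIMITER = ',';
};

// Out-of-class definitions, placed in exactly one source
// file, for code that takes the members' addresses.
const int RecordLimits::MAX_LENGTH;
const char RecordLimits::DELIMITER;

// Returns true if pcText, plus its terminator, fits in a
// buffer of MAX_RECORD_LENGTH chars.
bool fitsInRecord(const char *pcText)
{
    return (int) std::strlen(pcText) < MAX_RECORD_LENGTH;
}
```

The #define form has no type and no scope, which is the usual objection to it, but it needs nothing beyond a shared header.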

The general picture you should draw from this discussion is that the C++ code presented here does not necessarily represent current best practices in C++ coding. It instead represents "good" practices (I hope) as of the mid-1990s. However, I'm not really too concerned about that. If your legacy C++ applications are very old, they probably don't reflect current best practices either. Your code may look very similar to mine. On the other hand, if you are using more up-to-date compilers and following current best practices (for example, using the string class instead of char arrays), by all means keep doing what you're doing and don't regress. You should, however, still be able to follow my code and update the techniques to your current best practices as appropriate.