How to Use This Book

This book is organized as an advanced tutorial that can also serve as a solid and comprehensive reference. Chapter 1 covers the bare minimum material needed to start working with XML. For the most part, it is not intended as a comprehensive introduction but rather as a review for readers who have already read other, more basic books. Chapter 2 introduces RSS, XML-RPC, and SOAP, the XML applications used for examples throughout the rest of the book. Next come two chapters on generating XML from your own programs, a subject all too often presented as much more complicated than it actually is: Chapter 3 covers generating XML directly from code, and Chapter 4 covers converting legacy data in other formats to XML. The bulk of the remaining book is devoted to the major APIs for processing XML:

  • The event-based SAX API

  • The tree-based DOM API

  • The tree-based JDOM API

  • XPath APIs for searching XML documents

  • The TrAX API for XSLT processing

Finally, the book ends with an appendix providing quick references to the main APIs.

If you have limited experience with XML, I suggest that you read at least the first five chapters in order. From that point forward, if you have a particular API preference, you may begin with the part that covers the major API you're interested in:

  • Chapters 6 to 8 cover SAX.

  • Chapters 9 to 13 cover DOM.

  • Chapters 14 and 15 cover JDOM.

Once you're comfortable with one or more of these APIs, you can read Chapters 16 and 17 on XPath and XSLT. These chapters do, however, assume some knowledge of at least one of the three major APIs.


The Online Edition

The entire book is available online in plain-vanilla HTML at my Cafe con Leche web site. You can find it at http://www.cafeconleche.org/books/xmljava/. Every word of this book is there. Nothing has been held back or left out. I do hope you also find the printed book useful and choose to buy it (it's certainly cheaper than the paper and toner you'd use up printing all 1,120 pages on your laser printer), but you are by no means obligated to do so. My goal is to make this material as broadly available and useful as possible.

The online version has no protection other than copyright law and your own good will. You don't need to register to read it, or to download some special electronic key that becomes invalid when you buy a new laptop (and that probably wouldn't run on Linux or a Mac in the first place). I want people to read and use this book. I do not want to put up silly roadblocks that make it less useful than it could be. I do ask, as a courtesy, that you not republish the online edition on your own server. Doing so makes it extremely difficult for me to keep the book up to date. If you want to save a few pages on your laptop so you can read this book on an airplane, I don't really mind. But please don't pass out your own copies to anyone else. Instead, refer your friends and colleagues to the web site or the printed book.


Some Grammatical Notes

The rules of English grammar were laid down, written in stone, and encoded in the DNA of elementary school teachers long before computers were invented. Unfortunately, this means that sometimes I have to decide between syntactically correct code and syntactically correct English. When forced to do so, English normally loses. This means that sometimes a punctuation mark appears outside a quotation mark when you'd normally expect it to appear inside, a sentence begins with a lowercase letter, or something similarly unsettling occurs. For the most part, I've tried to use various typefaces to make the offending phrase less jarring. In particular, please note the following:

  • Italicized text is used for emphasis, the first occurrence of an important term, titles of books and other cited works, words in languages other than English, words as words themselves (for example, Booboisie is a very funny word), Java system properties, host names, and resolvable URLs.

  • Monospaced text is used for XML and Java source code, namespace URLs, system prompts, and program output.

  • Italicized monospace text is used for pieces of XML and Java source code that should be replaced by some other text.

  • Bold monospaced text is used for literal text that the user types at a command line, as well as for emphasis in code.

It's not just English grammar that gets a little squeezed, either. The necessities of fitting code onto a printed page rather than a computer screen have occasionally caused me to deviate from the ideal Java coding conventions. The worst problem is line length. I can fit only 65 characters across the page in a line of code. To try to make maximum use of this space, I indent each block by two spaces and indent line continuations by one space, rather than the customary four spaces and two spaces respectively. Even so, I still have to break lines where I otherwise would prefer not to. For example, I originally wrote this line of code for Chapter 4:

 result.append("          <Amount>" + amount + "</Amount>\r\n"); 

To fit it on the page, however, I had to split it into two pieces, like this:

 result.append("          <Amount>"); 
 result.append(amount + "</Amount>\r\n"); 

This wasn't too bad, but sometimes even this wasn't enough and I had to remove indents from the front of the line that would otherwise be present. This occasionally forced the indentation not to line up as prettily as it otherwise might, as in this example from Chapter 3:

 wout.write( 
"xmlns='http://namespaces.cafeconleche.org/xmljava/ch3/'\r\n"
  ); 
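Splitting a long append() call in two, as in the Chapter 4 example, changes only the layout of the source, not what ends up in the buffer. The following sketch checks this under stated assumptions: since the surrounding Chapter 4 code isn't shown here, it assumes result is a StringBuffer and amount is a String, and the class and method names are hypothetical.

```java
// Hypothetical demo: assumes `result` is a StringBuffer and `amount`
// is a String, as the surrounding Chapter 4 code is not shown here.
public class SplitAppendDemo {

  // The original single-line version
  static String oneLine(String amount) {
    StringBuffer result = new StringBuffer();
    result.append("          <Amount>" + amount + "</Amount>\r\n");
    return result.toString();
  }

  // The version split across two statements to fit the page
  static String twoLines(String amount) {
    StringBuffer result = new StringBuffer();
    result.append("          <Amount>");
    result.append(amount + "</Amount>\r\n");
    return result.toString();
  }

  public static void main(String[] args) {
    String amount = "1234.56";
    // Both versions build exactly the same string
    System.out.println(oneLine(amount).equals(twoLines(amount)));
  }
}
```

Running main() should print true; the two statements append the same characters in the same order as the one.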

The silver lining to this cloud is that sometimes the extra attention I give to the code when I'm trying to cut down its size results in better code. For example, in Chapter 4, I found I needed to remove a few characters from this line:

 OutputStreamWriter wout = new OutputStreamWriter(out, "UTF8"); 

On reflection, I realized that nowhere did the program actually need to know that wout was an OutputStreamWriter as opposed to merely a Writer. Thus I could easily rewrite the offending line as follows:

 Writer wout = new OutputStreamWriter(out, "UTF8"); 

This follows the general object-oriented principle of using the least-specific type that will suit. This polymorphism makes the code more flexible in the future should I find a need to swap in a different kind of Writer.
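A minimal sketch of that flexibility, using a hypothetical writeGreeting() method that is not from the book: because the method is declared in terms of Writer rather than OutputStreamWriter, the same code can write encoded bytes through an OutputStreamWriter or collect characters in a StringWriter without any change.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;

public class LeastSpecificType {

  // Hypothetical helper: accepts any Writer, not one specific subclass
  static void writeGreeting(Writer wout) throws IOException {
    wout.write("<?xml version='1.0' encoding='UTF-8'?>\r\n");
    wout.write("<greeting>Hello</greeting>\r\n");
    wout.flush();
  }

  public static void main(String[] args) throws IOException {
    // Encode the characters as UTF-8 bytes, as in the line above
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Writer wout = new OutputStreamWriter(out, "UTF8");
    writeGreeting(wout);
    System.out.println(out.size() + " bytes written");

    // Swap in a different kind of Writer; writeGreeting is unchanged
    StringWriter sw = new StringWriter();
    writeGreeting(sw);
    System.out.print(sw);
  }
}
```

Had writeGreeting() been declared to take an OutputStreamWriter, the StringWriter call would not compile. Declaring the parameter as Writer costs nothing, since the method only uses methods that every Writer has.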