Chapter 12. Java and SAX | Real World XML (2nd Edition)

The previous chapter was all about using Java and the XML DOM. However, some people find using the DOM difficult and think that the whole concept of treating an XML document as a tree is unnecessarily complex. Rather than having to navigate through the whole document, they say, wouldn't it be great if the whole document came to you? That's the idea behind the Simple API for XML (SAX), and this chapter is dedicated to it. SAX really is a lot easier to use for many (possibly even most) of the XML parsing tasks that you have to do. If you need to know the hierarchy of a document, use DOM methods; if you don't, you might try SAX.

You may be surprised to learn that we've already been putting the ideas behind SAX to work throughout the previous chapter. You may recall that in that chapter, I set up a recursive method named display that was called for every node in the DOM tree. In display, I used a switch statement to make things easier. That switch statement had a separate case to handle each type of node:

    public static void display(Node node, String indent)
    {
        if (node == null) {
            return;
        }
        int type = node.getNodeType();
        switch (type) {
            case Node.DOCUMENT_NODE: {
                displayStrings[numberDisplayLines] = indent;
                displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\""+
                    "UTF-8" + "\"?>";
                numberDisplayLines++;
                display(((Document)node).getDocumentElement(), "");
                break;
            }
            case Node.ELEMENT_NODE: {
                displayStrings[numberDisplayLines] = indent;
                displayStrings[numberDisplayLines] += "<";
                displayStrings[numberDisplayLines] += node.getNodeName();
                int length = (node.getAttributes() != null) ? node.getAttributes().getLength() : 0;
                Attr attributes[] = new Attr[length];
                for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                    attributes[loopIndex] = (Attr)node.getAttributes().item(loopIndex);
                }
                .
                .
                .

I was able to add the code that handled elements to one case statement, add the code to handle processing instructions to another case statement, and so on.

In essence, we were handling XML documents in much the same way that SAX does. Instead of deciding for ourselves which node to look at next, we let the document come to us: the code reached the matching case for each node as that node was encountered. That's what SAX does. It's event-based, which means that when the SAX parser encounters an element, it treats that as an event and calls the code you specify for handling elements; when it encounters a processing instruction, it treats that as an event and calls the code you specify for handling processing instructions, and so on. In this way, you don't have to navigate through the document yourself; it comes to you. (SAX is simpler and usually faster than DOM techniques, partly because SAX is read-only and does not have to build the whole document tree in memory, whereas DOM techniques are read/write.) The fact that we've already done a significant amount of programming with this technique indicates how useful it is.
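To make the event-based idea concrete, here is a minimal sketch using the SAX classes that ship with the JDK (javax.xml.parsers and org.xml.sax). The class name SaxEventSketch, the EventRecorder handler, and the sample document are hypothetical names chosen for illustration, not code from this chapter. The handler overrides one callback per kind of event, which plays the role the case statements played in display.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventSketch {

    // Hypothetical handler: records one line of text per event the parser reports.
    static class EventRecorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        // Called by the parser each time it encounters an opening tag.
        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            events.add("start element: " + qName);
        }

        // Called by the parser each time it encounters a closing tag.
        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end element: " + qName);
        }

        // Called by the parser each time it encounters a processing instruction.
        @Override
        public void processingInstruction(String target, String data) {
            events.add("processing instruction: " + target);
        }
    }

    // Parses the XML text and returns the recorded events in document order.
    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventRecorder recorder = new EventRecorder();
        parser.parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample document, made up for this sketch.
        String xml = "<doc><?demo hint?><item/></doc>";
        for (String event : parse(xml)) {
            System.out.println(event);
        }
    }
}
```

Notice that nothing in this sketch walks the document: the parser drives the process and calls the handler as it reads, so the code only has to say what should happen for each kind of event.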