Stream-Based Parsing with SAX | XML Programming Bible

One of the major disadvantages of the DOM is how it processes large files. Because the DOM requires the parser to read the entire file into memory, memory can constrain the performance of your applications, if not render them useless. SAX parsers solve this problem by reading the document as a stream and reporting specific events as they are encountered. In this section we cover the behavior of a SAX parser and how to use one.

The Behavior of a SAX Parser

Unlike the DOM, which creates a tree-based representation, SAX doesn't have a default object model. A SAX parser does not hand you an in-memory representation of the document; it only reads your XML document and fires events based on the following:

Open or start of elements
Closing or end of elements
#PCDATA and CDATA sections
Processing instructions, comments, and entity declarations
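
To make these events concrete, here is a minimal sketch that records each event as the parser reports it. It assumes Python's standard xml.sax module, where the handler base class is called ContentHandler; the EventLogger class and the sample document are hypothetical names made up for illustration. Only element, character data, and processing instruction events are covered here.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every event it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Fired when the parser reads an opening tag.
        self.events.append(("start", name))

    def endElement(self, name):
        # Fired when the parser reads a closing tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Fired for character data; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))

    def processingInstruction(self, target, data):
        # Fired for processing instructions such as <?render fast?>.
        self.events.append(("pi", target, data))


# Made-up sample document, used only for this sketch.
SAMPLE = b"<library><?render fast?><book>SAX</book></library>"

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event in logger.events:
    print(event)
```

Note that the handler never sees the document as a whole. It only sees one event at a time, in document order.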

Three Steps to Using SAX

The three steps to using SAX in your applications are:

Creating a custom object model, like a Book class
Creating a SAX parser
Creating a document handler to turn your document into instances of your custom object model

Because SAX does not come with a default object model for the data in your XML document, you need to create your own the first time you use this approach. The model can be as simple as a single Book class if your XML document is a list of books.
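
As a rough illustration of how small such a model can be, the sketch below defines a Book class in Python. The field names (title, authors) are assumptions chosen for the example; the real fields depend on what your XML document contains.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    """Custom object model for one book entry (field names are hypothetical)."""

    title: str = ""
    authors: List[str] = field(default_factory=list)


# The model knows nothing about XML; it only holds data for the application.
example = Book(title="An Example Title")
example.authors.append("A. Writer")
print(example)
```

Keeping the model free of parsing logic means the same class can be filled in by a SAX handler, by a test, or by any other data source.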

After your custom model is created to hold your data in your application, the next step is creating a "document handler" to initialize instances of your object models from the document. This document handler is a listener for the various events we listed that are fired by the SAX parser. Most of the work involved in using SAX is in creating these document handlers.
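
A document handler along these lines might look like the following sketch. It assumes Python's standard xml.sax module and a made-up document layout (library, book, title, author); the BookHandler class and the element names are hypothetical, not taken from a specific schema.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class Book:
    """Custom object model (hypothetical fields)."""

    title: str = ""
    author: str = ""


class BookHandler(xml.sax.ContentHandler):
    """Listens for SAX events and turns them into Book instances."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._current = None  # the Book currently being built, if any
        self._text = []       # character chunks seen since the last tag

    def startElement(self, name, attrs):
        if name == "book":
            self._current = Book()
        self._text = []

    def characters(self, content):
        # Character data may arrive in several chunks, so collect them.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "book":
            self.books.append(self._current)
            self._current = None
        elif self._current is not None and name in ("title", "author"):
            setattr(self._current, name, value)
        self._text = []


# Made-up sample document, used only for this sketch.
SAMPLE = b"""<library>
  <book><title>XML Basics</title><author>A. Writer</author></book>
  <book><title>SAX in Practice</title><author>B. Reader</author></book>
</library>"""

handler = BookHandler()
xml.sax.parseString(SAMPLE, handler)
for book in handler.books:
    print(book.title, "by", book.author)
```

Most of the code is bookkeeping: the handler has to remember which element it is inside, because the parser itself keeps no state on your behalf.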

As the SAX parser reads a document, it fires events to every "registered" document event listener, and each event is translated into a method call on your document handler implementation. The document handler must then do something useful with these method calls.

Figure 3-5 shows the sequence of method calls the SAX parser makes on your document handler implementation. You can see from this picture how the SAX parser exposes the document as a series of events that are translated into method calls in your document handler implementation.
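
The sequence of calls can also be observed directly. The sketch below assumes Python's standard xml.sax module: it creates a parser, registers a handler with setContentHandler, and feeds the document in small chunks to mimic streaming. The CallRecorder class and the sample chunks are hypothetical.

```python
import xml.sax


class CallRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records the order of method calls."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def startDocument(self):
        self.calls.append("startDocument")

    def startElement(self, name, attrs):
        self.calls.append("startElement(" + name + ")")

    def endElement(self, name):
        self.calls.append("endElement(" + name + ")")

    def endDocument(self):
        self.calls.append("endDocument")


parser = xml.sax.make_parser()
recorder = CallRecorder()
parser.setContentHandler(recorder)  # register the handler with the parser

# Feed the document piece by piece; the parser never needs the whole
# document in memory at once. The chunks are made up for this sketch.
for chunk in (b"<library><bo", b"ok>SAX</book>", b"</library>"):
    parser.feed(chunk)
parser.close()

for call in recorder.calls:
    print(call)
```

The recorded calls arrive in document order, starting with startDocument and ending with endDocument, which is the same kind of sequence the figure describes.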