One of the major disadvantages of the DOM is how it processes large files. Because the DOM requires the entire file to be read in by the parser, memory can constrain the performance of your applications, if not render them useless. SAX parsers solve this problem by streaming in the document according to specific events. In this section we cover the behavior of a SAX parser and how to use one.
Unlike the DOM, which creates a tree-based representation, SAX has no default object model: when a SAX parser reads a document, it does not hand you a ready-made set of objects. These parsers only read in your XML document and fire events based on the following:
The three steps to using SAX in your applications are
Because SAX does not come with a default object model representation for the data in your XML document, you need to create your own the first time you use this method. The model could be something as simple as creating a Book class if your XML document is an address book.
After your custom model is created to hold your data in your application, the next step is creating a "document handler" to initialize instances of your object models from the document. This document handler is a listener for the various events we listed that are fired by the SAX parser. Most of the work involved in using SAX is in creating these document handlers.
As the SAX parser reads a document, events are fired based on all the "registered" document event listeners and translated into method calls on your document handler implementation. The document handler must then do something useful with these method calls.
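The three steps described above can be sketched with Python's standard `xml.sax` module. This is a minimal illustration, not the book's own implementation: the `Contact` class, the `ContactHandler` name, and the `<addressbook>` sample layout are all hypothetical, chosen only to show a custom model, a document handler, and the parse call working together.

```python
import xml.sax
from dataclasses import dataclass


# Step 1: a custom object model (hypothetical; SAX does not supply one).
@dataclass
class Contact:
    name: str = ""
    phone: str = ""


# Step 2: a document handler that listens for the parser's events.
class ContactHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.contacts = []      # finished Contact objects
        self._current = None    # Contact being built, if any
        self._text = []         # character data seen since the last tag

    def startElement(self, name, attrs):
        if name == "contact":
            self._current = Contact()
        self._text = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so collect them.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if self._current is not None:
            if name == "name":
                self._current.name = text
            elif name == "phone":
                self._current.phone = text
            elif name == "contact":
                self.contacts.append(self._current)
                self._current = None
        self._text = []


# Assumed sample document; the element names are illustrative only.
SAMPLE = b"""<addressbook>
  <contact><name>Ada</name><phone>555-0100</phone></contact>
  <contact><name>Grace</name><phone>555-0101</phone></contact>
</addressbook>"""

# Step 3: hand the document and the handler to the SAX parser.
handler = ContactHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.contacts)
```

Most of the code lives in the handler, which matches the point above that writing document handlers is where the bulk of the SAX work goes.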
Figure 3-5 shows the sequence of method calls the SAX parser makes on your document handler implementation. You can see from this picture how the SAX parser exposes the document as a series of events that are translated into method calls in your document handler implementation.
You should choose your parser depending on the nature of the processing and the size of the XML documents. A tree-based parser usually needs to load the entire document into memory, so it can be impractical because of physical constraints on memory when processing documents like dictionaries or large databases. With a stream-based parser you can skip over elements that you aren't interested in (for example, when looking up a particular word in a dictionary). If your application needs to process certain elements in relation to other elements, however, a tree-based parser is much easier to work with. It's worth noting that a tree-based parser can be built on top of a stream-based parser and that the output of a tree-based parser can be "walked" to provide a stream-based interface to an application. In this section we cover the DOM and SAX parsing methods and provide example scenarios in which you can decide which method is appropriate for a given task.
Figure 3-5 A SAX Event Order.
The term "walked" refers to taking pieces of the document and sending them out in parts. You are traversing, or walking, the document object model.
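As a rough sketch of what walking a tree can look like, the following Python uses the standard `xml.dom.minidom` module to traverse a parsed document depth-first and emit a flat sequence of start, text, and end events. The `walk` function, the event tuples, and the `<dictionary>` sample are assumptions made for illustration; a real stream-based interface would call handler methods rather than fill a list.

```python
from xml.dom import minidom, Node


def walk(node, events):
    """Depth-first traversal that records stream-style events for a DOM subtree."""
    if node.nodeType == Node.ELEMENT_NODE:
        events.append(("start", node.tagName))
        for child in node.childNodes:
            walk(child, events)
        events.append(("end", node.tagName))
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        # Skip whitespace-only text nodes.
        events.append(("text", node.data.strip()))


# Assumed sample document.
doc = minidom.parseString(
    "<dictionary><word>sax</word><word>dom</word></dictionary>"
)
events = []
walk(doc.documentElement, events)
for event in events:
    print(event)
```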
DOM implementations are currently biased toward in-memory storage of the document, but this may change as Persistent DOM (PDOM) implementations become more popular. Even with memory limitations, however, DOM certainly has a place because of features that help it access and manipulate documents. The following are DOM benefits you should focus on:
The first two benefits are the ability to randomly access the document and create complex searches. These provide a means for searching for elements and retrieving information, such as data and attributes, on these elements. The DOM can also be bound to an XML DTD or schema, which means it can be checked to make sure the data contained in the document is valid according to the rules of the DTD or schema. Finally it provides the ability to read data out of a document and write data to it.
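A short Python sketch using the standard `xml.dom.minidom` module illustrates random access, searching, and read/write access. The `<library>` document, the `id` values, and the `status` attribute are invented for the example. Note that `minidom` does not validate against a DTD or schema, so that benefit is not shown here.

```python
from xml.dom import minidom

# Assumed sample document.
doc = minidom.parseString(
    '<library>'
    '<book id="b1"><title>XML Basics</title></book>'
    '<book id="b2"><title>Parsing</title></book>'
    '</library>'
)

books = doc.getElementsByTagName("book")

# Random access: jump straight to the second book without scanning in order.
second = books[1]
title = second.getElementsByTagName("title")[0].firstChild.data

# Searching: select elements by an attribute value.
matches = [b for b in books if b.getAttribute("id") == "b1"]

# Writing: change the in-memory tree, then serialize it back to XML.
second.setAttribute("status", "checked-out")
updated_xml = doc.toxml()

print(title, len(matches))
print(updated_xml)
```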
The DOM's simplicity, powerful access to the document, and a well-defined specification make it a popular parser method. It also pairs well with XSLT and other document-transformation solutions you might require. Therefore, if your project is small and you need to complete it quickly, using a DOM-based method is a great choice. However, if you are going to process large files and have the time to write a more robust application, you should look into a SAX-based implementation.
If you need to parse and process huge XML documents, SAX implementations offer some benefits over DOM-based ones. You should first ask yourself, however, if an improved design would remove the need for large documents. For example, prefiltering in a database that can stream XML might suit your needs. By going with SAX, however, you take on document manipulation yourself: you can use XSLT, and your team will need to write the code that internally manages, stores, and rewrites the document.
Like the DOM, SAX has a particular set of benefits. The following list contains some of the most useful:
The biggest advantage of SAX is, arguably, its ability to process files of any size: because the parser streams data in and exposes it piece by piece, it never needs to hold the whole document in memory. SAX is also useful when you want to build your own data structure, and it allows you to grab only subsets of the information in a given document. Finally, it can be a fast method of processing documents, especially when parsing large files.
SAX is best suited to sequential-scan applications when you want to go through the XML document quickly from start to finish. Also, sometimes you won't need the overhead of the full-blown DOM, so a SAX parser will be sufficient for creating a lightweight and compact internal data structure.
To help you choose a method we included a few example scenarios. While processing and using XML documents is more widely adopted every day, many lessons have gone unnoticed because of lack of experience. These scenarios should help by allowing you to walk down a decision path and choose the right approach. With these benefits in mind, review the following scenarios and determine which parser is appropriate for each one.
Company XYZ currently has 20,000 employees. The Human Resources data file is currently stored in XML format, and you are to write an application that returns the average annual salary of all employees.
If you use the DOM interface your application will need to load the entire employee database into memory and retrieve the document.employee[i].annual_salary[0] value for each employee, and then average the values.
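For comparison, the DOM approach might look like the following Python sketch built on the standard `xml.dom.minidom` module. The `<employees>`/`<employee>` layout and the function name `average_salary_dom` are assumptions; only the `annual_salary` element name comes from the scenario. The whole document is parsed into memory before any salary is read.

```python
from xml.dom import minidom


def average_salary_dom(xml_text):
    """Load the whole document into a tree, then average every salary."""
    doc = minidom.parseString(xml_text)
    salaries = [
        float(emp.getElementsByTagName("annual_salary")[0].firstChild.data)
        for emp in doc.getElementsByTagName("employee")
    ]
    return sum(salaries) / len(salaries) if salaries else 0.0


# Assumed sample data; a real file would hold thousands of employees.
SAMPLE_DOM = (
    "<employees>"
    "<employee><name>A</name><annual_salary>50000</annual_salary></employee>"
    "<employee><name>B</name><annual_salary>70000</annual_salary></employee>"
    "</employees>"
)

print(average_salary_dom(SAMPLE_DOM))
```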
Using the SAX approach you could write an event handler that looks for only the <annual_salary> element and ignores everything else. You could parse through the file systematically and efficiently. The solution in this scenario is clear: SAX makes one pass through a large file and looks for specific data.
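A handler of the kind described above could be sketched in Python with the standard `xml.sax` module. The class name `SalaryHandler`, the helper `average_salary`, and the surrounding `<employees>` layout are hypothetical; only the `annual_salary` element name is taken from the scenario. The handler keeps a running total and a count, so memory use does not grow with the size of the file.

```python
import xml.sax


class SalaryHandler(xml.sax.ContentHandler):
    """Reacts only to <annual_salary> and ignores every other element."""

    def __init__(self):
        super().__init__()
        self.total = 0.0
        self.count = 0
        self._in_salary = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "annual_salary":
            self._in_salary = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_salary:
            self._buf.append(content)

    def endElement(self, name):
        if name == "annual_salary":
            self.total += float("".join(self._buf))
            self.count += 1
            self._in_salary = False


def average_salary(xml_bytes):
    handler = SalaryHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.total / handler.count if handler.count else 0.0


# Assumed sample data.
SAMPLE = b"""<employees>
<employee><name>A</name><annual_salary>50000</annual_salary></employee>
<employee><name>B</name><annual_salary>70000</annual_salary></employee>
</employees>"""

print(average_salary(SAMPLE))
```

For a file on disk, `xml.sax.parse(path, handler)` would stream the document from the file instead of from an in-memory byte string.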
Company ABC has 375 employees. The Human Resources data file is stored in XML format, and you are to write an application that allows users to scroll through the list of employees and find detailed information on an employee.
If you tried this with the SAX approach you would either have to parse through the XML document every time you wanted to display information to the user, which is inefficient, or you would have to build your own memory structure so that you could parse it once and then access it multiple times. You'd need to keep track of all the information yourself and develop and maintain the code required to support this data storage scheme.
By using the DOM you have access to the entire employee database as nodes in a tree. The data storage mechanism and the code that supports it are essentially provided by the parser. This is a much easier solution!
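A minimal Python sketch of this scenario, using the standard `xml.dom.minidom` module, parses the file once and then answers any number of lookups from the in-memory tree. The `EmployeeDirectory` class, its method names, and the flat `<employee>` layout (each child element holding only text) are assumptions made for the example.

```python
from xml.dom import minidom


class EmployeeDirectory:
    """Parses the document once and serves repeated lookups from the tree."""

    def __init__(self, xml_text):
        self._doc = minidom.parseString(xml_text)  # parsed a single time
        self._employees = self._doc.getElementsByTagName("employee")

    def __len__(self):
        return len(self._employees)

    def details(self, index):
        """Return the child elements of one employee as a dictionary."""
        emp = self._employees[index]
        return {
            child.tagName: child.firstChild.data
            for child in emp.childNodes
            if child.nodeType == child.ELEMENT_NODE
            and child.firstChild is not None
        }


# Assumed sample data.
SAMPLE_HR = (
    "<employees>"
    "<employee><name>Ada</name><title>Engineer</title></employee>"
    "<employee><name>Grace</name><title>Admiral</title></employee>"
    "</employees>"
)

directory = EmployeeDirectory(SAMPLE_HR)
print(len(directory), directory.details(1))
```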
If we return to the first scenario, let's say you're asked to modify your application to give 7 percent raises to those employees below the company average and 4 percent to those above it.
Your application would need to parse the file with the DOM. SAX does not support modification of data. Even if it did, an event-based parser would force you to write two sets of event handlers, which is difficult: one to calculate the average and one to update the data as the file is parsed a second time.
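One way this modification could look with a DOM, sketched in Python using the standard `xml.dom.minidom` module: the tree is loaded once, the average is computed from it, and the same nodes are then updated in place. The function name `apply_raises`, the sample layout, and the two-decimal output format are assumptions. The scenario does not say what happens to an employee exactly at the average, so this sketch arbitrarily gives that employee the 4 percent raise.

```python
from xml.dom import minidom


def apply_raises(xml_text, below_pct=7, above_pct=4):
    """Raise salaries below the average by below_pct, all others by above_pct."""
    doc = minidom.parseString(xml_text)
    nodes = [e.firstChild for e in doc.getElementsByTagName("annual_salary")]
    if not nodes:
        return doc.toxml()

    salaries = [float(n.data) for n in nodes]
    average = sum(salaries) / len(salaries)

    # Second pass over the same in-memory nodes; no reparsing is needed.
    for node, salary in zip(nodes, salaries):
        pct = below_pct if salary < average else above_pct
        node.data = f"{salary * (1 + pct / 100):.2f}"
    return doc.toxml()


# Assumed sample data.
SAMPLE_RAISE = (
    "<employees>"
    "<employee><name>A</name><annual_salary>50000</annual_salary></employee>"
    "<employee><name>B</name><annual_salary>70000</annual_salary></employee>"
    "</employees>"
)

print(apply_raises(SAMPLE_RAISE))
```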