Push Versus Pull Models | XML and ASP.NET


To understand the differences between the two APIs, you must first understand the differences between the push and the pull models. In a pull model, the application queries the parser for document contents. With this model, the consumer (the application) can pull document contents one item at a time and can even process selectively by skipping over elements of no interest. The parser makes all the document contents available to the application by completing one task: it loads the entire document into memory. Because the parser constructs an internal tree structure and lets the application access that tree through the API, the API is also referred to as a tree-based API.
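The tree-based approach can be sketched in a few lines. The example below is only an illustration, written in Python with the standard library's `xml.dom.minidom` rather than with MSXML or .NET; the catalog document and the `titles_from_tree` helper are assumptions made up for this sketch.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for this sketch.
XML = """<catalog>
  <book id="b1"><title>XML Basics</title><price>29.95</price></book>
  <book id="b2"><title>ASP.NET Primer</title><price>39.95</price></book>
</catalog>"""

def titles_from_tree(xml_text):
    # The parser reads the whole document and builds an in-memory tree first.
    doc = parseString(xml_text)
    titles = []
    # The application then asks the tree only for the nodes it cares about;
    # the <price> elements are never visited by this code.
    for title in doc.getElementsByTagName("title"):
        titles.append(title.firstChild.data)
    return titles

print(titles_from_tree(XML))
```

The application drives every step here: nothing happens until it asks the tree for the `title` elements.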

In the push model, the parser notifies the application of the document contents as it encounters them (when it parses through the document stream in a forward-only fashion). The parser does this through a sequence of callback methods, or events, fired into the application. These events are similar to the more familiar events in a graphical user interface (GUI). The application developer must provide event handlers that know how to handle the events corresponding to the start and end of each tag in the document, the start and end of the document, and the occurrence of a block of text. At any given time, the parser holds only a small part of the document in memory: the element currently encountered, with its attributes, or the current block of text. The application has to implement the event listeners for a standard set of events that the parser exposes. This model is known as an event-driven interface, and the API is referred to as an event-based API.
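The callback sequence described above can be made visible with a small handler that only records the events it receives. This is a minimal sketch using Python's standard `xml.sax` module, not the MSXML SAX interfaces; the `EventLogger` class and `log_events` helper are hypothetical names introduced for the illustration.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records each callback the parser fires, in the order it fires them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("start:" + name)

    def endElement(self, name):
        self.events.append("end:" + name)

    def characters(self, content):
        # Ignore whitespace-only text between tags.
        if content.strip():
            self.events.append("text:" + content.strip())

def log_events(xml_text):
    handler = EventLogger()
    # The parser drives the process; the handler only reacts to callbacks.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

for event in log_events("<note><to>Ann</to></note>"):
    print(event)
```

Note that the handler never sees the whole document. If the application needs context, such as the name of the enclosing element, it has to keep that state itself.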

The DOM is an example of the pull model. This standard was produced by the World Wide Web Consortium (W3C). W3C released the first DOM recommendation, the DOM Level 1, in 1998. On November 13, 2000, it released the DOM Level 2 Specification. (You can view this specification at www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/). At the time of this writing, the DOM Level 3 specification was being drafted (visit www.w3.org/TR/2001/WD-DOM-Level-3-Core-20010605/ or, for an updated version, visit www.w3.org/TR/DOM-Level-3-Core/).

Building Push and Pull Models

Both push and pull models can be built one over the other. To build a push model over a pull model, generate the events by traversing an in-memory data structure constructed by the pull model. On the other hand, you can generate a tree data structure similar to the one in a pull model through the event handlers in the push model.
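Both directions can be sketched briefly. The following Python example is an assumption-laden illustration using the standard `xml.sax` and `xml.dom.minidom` modules: `TreeBuilder` assembles a simple nested tuple tree from push-style events, and `events_from_tree` fires start and end callbacks while walking an in-memory DOM tree. Both names, and the tuple representation, are invented for this sketch.

```python
import xml.sax
from xml.dom.minidom import parseString

class TreeBuilder(xml.sax.ContentHandler):
    """Builds a nested (name, children) tree from push-style events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # elements that are currently open

    def startElement(self, name, attrs):
        node = (name, [])
        if self.stack:
            self.stack[-1][1].append(node)  # attach to the open parent
        else:
            self.root = node
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()

def tree_from_events(xml_text):
    builder = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.root

def events_from_tree(node, emit):
    """Walks an in-memory DOM tree and fires start/end callbacks."""
    if node.nodeType == node.ELEMENT_NODE:
        emit("start", node.tagName)
        for child in node.childNodes:
            events_from_tree(child, emit)
        emit("end", node.tagName)

XML = "<a><b/><c><d/></c></a>"
print(tree_from_events(XML))

events = []
events_from_tree(parseString(XML).documentElement,
                 lambda kind, name: events.append((kind, name)))
print(events)
```

The sketch tracks element names only; a fuller version would also carry attributes and text through both conversions.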

The DOM is a platform-neutral specification that declares a set of interfaces. These interfaces define how a DOM-compliant parser allows us to load, access, manipulate, and serialize XML documents. DOM implementations are available today for many platforms. Because we are discussing the integration of the ASP.NET and XML technologies, we need to discuss Microsoft's implementation of the DOM interfaces. This chapter discusses the MSXML implementation that ASP and IE 5.x developers currently use.

SAX uses a push model for document parsing. Although the DOM provides a powerful set of interfaces to access and manipulate the document structure, it has some inherent limitations when it works with large documents. Because it creates an in-memory representation of the entire document, it consumes a large amount of memory and slows down loading. SAX evolved in response to these limitations, out of discussions among the members of the xml-dev mailing list. David Megginson, who led this group, released a specification on May 11, 1998. He also provided the implementations for several popular XML parsers, including Microsoft's MSXML. The SAX specification, although not backed by a consortium such as the W3C, is widely accepted by the XML development community, and several parser developers have included support for SAX. Megginson released the latest specification, SAX2, with support for namespaces in May of 2000.

Understanding the DOM and SAX is easy when you compare them to database cursors. The DOM is similar to a scrollable, updateable cursor, whereas SAX is similar to a read-only, forward-only cursor.

The .NET Reader and Writer Classes

The .NET System.Xml assembly combines a stream-based API with a forward-only cursor, as in the SAX model, with a pull model, as in the DOM. This is an innovative change in XML document parsing because it gives us a "best of both worlds" approach: the application pulls content from the parser on demand, but the parser never has to hold the whole document. Chapter 6, "Exploring the System.XML Namespace," looks into the abstract XmlReader and XmlWriter base classes and their concrete implementations. The streaming nature of the classes eliminates the need to create the in-memory representation of the entire document.
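The idea of a forward-only reader that the application drives can be illustrated outside .NET as well. The sketch below is not the .NET reader API; it is a rough Python analog that uses the standard `xml.dom.pulldom` module, and the orders document and `order_ids` helper are assumptions invented for the example.

```python
from xml.dom import pulldom

# Hypothetical sample document, invented for this sketch.
XML = """<orders>
  <order id="1"><total>10.00</total></order>
  <order id="2"><total>25.50</total></order>
</orders>"""

def order_ids(xml_text):
    ids = []
    # The application runs the loop and requests the next item itself;
    # no callbacks are registered and no full tree is built up front.
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            ids.append(node.getAttribute("id"))
    return ids

print(order_ids(XML))
```

Compared with a callback-based handler, the control flow stays in ordinary application code, so the application can stop reading as soon as it has what it needs.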

One of the most significant benefits of the DOM model is that it provides random access to nodes, so you can insert, delete, or update them. As long as you are working with smaller documents and within acceptable resource limits, the DOM model remains hard to give up. The System.Xml assembly provides classes that implement the DOM interfaces. These are similar to the implementation found in MSXML 4.0 (with a few changes).
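Random access is what makes in-place edits straightforward. As a hedged illustration, the following Python sketch uses the standard `xml.dom.minidom` module, which follows the W3C DOM interfaces, rather than MSXML or the .NET classes; the catalog document and the `edit_catalog` helper are hypothetical.

```python
from xml.dom.minidom import parseString

def edit_catalog(xml_text):
    doc = parseString(xml_text)
    books = doc.getElementsByTagName("book")

    # Update: change an attribute on the second book.
    books[1].setAttribute("price", "19.95")

    # Delete: remove the first book from its parent.
    first = books[0]
    first.parentNode.removeChild(first)

    # Insert: append a new book at the end of the catalog.
    new_book = doc.createElement("book")
    new_book.setAttribute("id", "b3")
    doc.documentElement.appendChild(new_book)

    # Serialize the modified tree back to markup.
    return doc.documentElement.toxml()

XML = '<catalog><book id="b1"/><book id="b2" price="24.95"/></catalog>'
print(edit_catalog(XML))
```

Each of the three edits reaches a node directly, in any order, which a forward-only stream cannot do without buffering.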

The MSXML parser is written with COM technology, whereas the classes in the System.Xml namespace are written using managed code. Although the two libraries do not share code, they are tested using the same suite of regression tests. Furthermore, the same teams at Microsoft designed and implemented both libraries, and large parts of each library are built on the W3C DOM specification. Therefore, both libraries contain similar objects and methods.
