SAX2 and MSXML Interfaces


Previously, you saw how to work with the MSXML 4.0 DOM implementation from ASP.NET by using .NET COM Interop services. In this section, you look at how to implement the SAX2 Visual Basic interfaces in Visual Basic .NET. SAX defines a set of interfaces that model the Infoset. MSXML 4.0 supports SAX from both C++ and Visual Basic. The names of the interfaces in MSXML mirror those originally defined for the Java language, but are prefixed with ISAX for C++ and IVBSAX for Visual Basic. Table 5.6 maps the MSXML SAX2 COM Visual Basic interfaces to the original Java interfaces.

Table 5.6. SAX Interfaces

Java Interface | MSXML Visual Basic COM Interface | Description
ContentHandler | IVBSAXContentHandler | Primary SAX interface that models the Infoset's core information items.
ErrorHandler | IVBSAXErrorHandler | Models fatal errors, errors, and warnings (per XML 1.0).
DTDHandler | IVBSAXDTDHandler | Models unparsed entities and notations.
EntityResolver | IVBSAXEntityResolver | Allows an application to perform custom resolution of external entity identifiers.
LexicalHandler | IVBSAXLexicalHandler | Models non-core lexical information (such as comments, CDATA sections, and entity references).
DeclHandler | IVBSAXDeclHandler | Models element and attribute declarations.
XMLReader | IVBSAXXMLReader | Makes it possible to tie together the previously listed interfaces in order to process a complete document information item.
Attributes | IVBSAXAttributes | Models a collection of attributes.
Locator | IVBSAXLocator | Provides contextual information about the caller.

SAX essentially comprises three basic components:

  • The XML Parser, the implementation of the XMLReader interface (SAXXMLReader in MSXML), streams the document from top to bottom and notifies the content handler when it encounters different items in the document, such as the start and the end tags of an element. For example, when it encounters the start of an element, it calls the startElement method that's defined in the ContentHandler interface.

  • The Event Handlers are where you provide the code to process the document. They are the classes that implement one or more of the SAX interfaces that model the Infoset's information items. The ContentHandler interface models the core items, and the DTDHandler, LexicalHandler, and DeclHandler interfaces model the rest. The ErrorHandler interface supports error handling, and models fatal errors, errors, and warnings.

  • The Application creates an instance of the XMLReader and creates instances of one or more handler classes that together represent the Document Handler. It then registers the different handler objects with the reader by setting the respective properties of the reader. Finally, it starts the process by calling the parse() method of the XML Parser.

Transforming XML to HTML Using SAX

In the examples discussed earlier in this chapter, we transformed the Customers XML document to HTML by using node traversal in DOM and by using an XSLT stylesheet. Let's see how we can accomplish the same result by implementing the SAX interfaces. Start by creating a new Visual Basic project for an ASP.NET web application and name this project VBSAX, as shown in Figure 5.11. We use this project for the remaining examples in this chapter.

Figure 5.11. Creating the VBSAX ASP.NET web application.

The following is the customers.xml file that you need to add to the VBSAX project and use for the transformation:

    <?xml version="1.0" encoding="utf-8" ?>
    <Customers>
        <Customer id="ALFKI">
            <CompanyName>Alfreds Futterkiste</CompanyName>
            <Contact>
                <FirstName>Maria</FirstName>
                <LastName>Anders</LastName>
                <Title>Sales Representative</Title>
            </Contact>
        </Customer>
        <Customer id="THEBI">
            <CompanyName>The Big Cheese</CompanyName>
            <Contact>
                <FirstName>Liz</FirstName>
                <LastName>Nixon</LastName>
                <Title>Marketing Manager</Title>
            </Contact>
        </Customer>
        <Customer id="EASTC">
            <CompanyName>Eastern Connection</CompanyName>
            <Contact>
                <FirstName>Ann</FirstName>
                <LastName>Devon</LastName>
                <Title>Sales Agent</Title>
            </Contact>
        </Customer>
    </Customers>

Rename the webform1.aspx page that was created by default to SAXTransform.aspx. Listing 5.14 shows the code for the SAXTransform.aspx page, which instantiates the SAXXMLReader class and the VBSAXLIB.CSAXTransform class (shown in Listing 5.15). When you instantiate the CSAXTransform class, you can choose either to write the output to a file on the server, as shown by the commented code, or to write directly to the browser by passing a reference to the Response object to the transform object. To run this .aspx page, you must add a reference to the VBSAXLIB Class Library. Next, you see how to build this library.

Listing 5.14 Transforming XML to HTML Using SAX ( SAXTransform.aspx )
    <%@ Page language="VB" %>
    <%@ Import Namespace="VBSAXLIB" %>
    <%@ Import Namespace="MSXML2" %>
    <%
        'write the output to file
        'Dim htmlFilePath as String
        'htmlFilePath = Server.MapPath(".") & "\customers.html"
        'Dim transform As New VBSAXLIB.CSAXTransform(htmlFilePath)

        'write the output to the browser
        Dim transform As New VBSAXLIB.CSAXTransform(Response)
        Dim reader As New MSXML2.SAXXMLReader40()
        reader.contentHandler = transform
        reader.errorHandler = transform
        On Error Resume Next
        reader.parseURL("http://localhost/VBSAX/customers.xml")
    %>

The contentHandler and the errorHandler properties are set to the transform object so that the reader notifies it of the corresponding events after the parseURL method is invoked.

To create the VBSAXLIB Class Library, you need to create a Visual Basic project, as shown in Figure 5.12.

Figure 5.12. Creating the VBSAXLIB Class Library.

The next step is to add a class to the project by using the Add New Item menu item from the Project menu. Name the file CSAXTransform.vb , as shown in Figure 5.13.

Figure 5.13. Adding the CSAXTransform class.

The CSAXTransform class requires a reference to the System.Web assembly and the MSXML2 wrapper (discussed in the section, "Working with MSXML 4.0 on the Server-Side with ASP.NET"). After you add these references, the Solution Explorer looks similar to what's shown in Figure 5.14.

Figure 5.14. The Solution Explorer after adding the assembly references and the CSAXTransform class.

The CSAXTransform class implements the IVBSAXContentHandler and the IVBSAXErrorHandler interfaces. This class provides two constructors: One allows you to send the data to the browser through the HTTP Response stream, and the other provides you with an option to alternatively write the output to an HTML file on the file system. The second option is a common scenario when you need to generate HTML in a staging environment before you push it to the production environment. Also, this option allows you to use this Class Library in a desktop-based Windows application. Listing 5.15 shows the code for the CSAXTransform class that performs the job of transforming the XML into HTML for use in the VBSAX ASP.NET application. (Listing 5.15 is split with some explanation introduced along with the subheadings at important steps.)

Listing 5.15 Implementing the ContentHandler and the ErrorHandler Interfaces in Visual Basic .NET ( CSAXTransform.vb )
    Imports System.IO
    Imports MSXML2

    Public Class CSAXTransform
        Implements IVBSAXContentHandler, IVBSAXErrorHandler

        Private custCount As Integer
        Private Response As System.Web.HttpResponse
        Private file As StreamWriter
        Private bHTTPStream As Boolean

        '****************** constructors *****************************
        Sub New(ByRef Response As System.Web.HttpResponse)
            Me.Response = Response
            bHTTPStream = True
        End Sub

        Sub New(ByVal filePath As String)
            bHTTPStream = False
            Dim fs As FileStream = New FileStream(filePath, FileMode.Create, _
                       FileAccess.Write)
            file = New StreamWriter(fs)
            ' Set the file pointer to the beginning.
            file.BaseStream.Seek(0, SeekOrigin.Begin)
        End Sub

        Private Sub writeHTML(ByVal str As String)
            If bHTTPStream Then
                Response.Write(str)
            Else
                file.Write(str)
            End If
        End Sub

Implementing the IVBSAXContentHandler Interface

When you implement an interface, it's mandatory to provide an implementation for all the methods and properties in the interface, as shown in the following code. As a result, some methods that this example does not need are present but have empty bodies.

        Private Sub startDocument() Implements IVBSAXContentHandler.startDocument
            writeHTML("<HTML><HEAD><TITLE>Customers</TITLE></HEAD>" & vbCrLf & "<BODY>")
        End Sub

        Private Sub startElement(ByRef strNamespaceURI As String, ByRef strLocalName As String, _
                      ByRef strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes) _
                      Implements IVBSAXContentHandler.startElement
            Select Case strLocalName
                Case "Customers"
                    writeHTML("<TABLE border='1'><THEAD>" & _
                              "<TH>CustomerID</TH><TH>Company Name</TH>" & _
                              "<TH>Contact Name</TH><TH>Contact Title</TH></THEAD><TBODY>")
                Case "Customer"
                    writeHTML("<TR><TD>")
                    If oAttributes.length > 0 Then
                        writeHTML(oAttributes.getValue(0))
                    End If
                    custCount = custCount + 1
                Case "CompanyName", "Title"
                    writeHTML("</TD><TD>")
                Case "FirstName"
                    writeHTML("</TD><TD>")
            End Select
        End Sub

        Private Sub processingInstruction(ByRef target As String, ByRef data As String) _
                      Implements IVBSAXContentHandler.processingInstruction
        End Sub

        Private Sub startPrefixMapping(ByRef strPrefix As String, ByRef strURI As String) _
                      Implements IVBSAXContentHandler.startPrefixMapping
        End Sub

        Private Sub endPrefixMapping(ByRef strPrefix As String) _
                      Implements IVBSAXContentHandler.endPrefixMapping
        End Sub

        Private Sub endElement(ByRef strNamespaceURI As String, ByRef strLocalName As String, _
                      ByRef strQName As String) Implements IVBSAXContentHandler.endElement
            Select Case strLocalName
                Case "Customers"
                    ' Close the TBODY opened in startElement before closing the table.
                    writeHTML("</TBODY></TABLE> " & vbCrLf & _
                              "<BR><P><B>Found " & custCount & " customer(s) </B></P>")
                Case "Customer"
                    writeHTML("</TD>" & vbCrLf & "</TR>")
            End Select
        End Sub

        Private Sub ignorableWhitespace(ByRef strChars As String) _
                      Implements IVBSAXContentHandler.ignorableWhitespace
        End Sub

        Private Sub characters(ByRef strChars As String) _
                      Implements IVBSAXContentHandler.characters
            writeHTML(removeNewline(strChars))
        End Sub

        Private Sub skippedEntity(ByRef strName As String) _
                      Implements IVBSAXContentHandler.skippedEntity
        End Sub

        Private WriteOnly Property documentLocator() As MSXML2.IVBSAXLocator _
                      Implements IVBSAXContentHandler.documentLocator
            Set(ByVal Value As MSXML2.IVBSAXLocator)
            End Set
        End Property

        Private Sub endDocument() Implements IVBSAXContentHandler.endDocument
            writeHTML(vbCrLf & "</BODY></HTML>")
            If Not bHTTPStream Then
                file.Flush()
                file.Close()
            End If
        End Sub

        Private Function removeNewline(ByVal strChars As String) As String
            Select Case strChars
                Case vbCrLf, vbCr, vbLf
                    strChars = ""
            End Select
            removeNewline = strChars
        End Function
    End Class

The preceding code is simple to follow. The startDocument method starts by writing the opening HTML tags. The startElement method starts a TABLE tag on the occurrence of the root element and increments the counter for each customer it encounters. On the occurrence of other elements, it closes and starts the TD tags. The characters method simply writes out all the PCDATA the parser encounters. The endElement method writes out the closing tags and also writes the final count of the customers when the end of the root tag is encountered. The endDocument method closes the BODY and HTML tags and closes the stream writer in the case of a file stream.

In this example, you are writing the data to the output stream as you find it. Notice that, to keep track of the number of customers, a counter variable is maintained, whereas in the case of DOM, you could easily find that value from the length property of a NodeList object. In a more complex application where you need to do some processing based on the previously found data, you need to create your own state-handling mechanism using data structures, such as Stacks and Collections, in the ContentHandler implementation. This retains the state as the parser reads the document from start to finish. For example, if you want to calculate the sum of the three items purchased from an inventory document and display them in alphabetical order, you need to retain the state of all the items until you encounter the last item purchased.
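The state-handling idea can be sketched independently of MSXML. The following is a minimal sketch, not part of the VBSAXLIB listings: it assumes a hypothetical helper class named ElementStateTracker whose OnStartElement, OnCharacters, and OnEndElement methods would be called from the corresponding IVBSAXContentHandler methods. A Stack remembers which element is currently open, and a List retains the company names until the end of the document, when they can be sorted.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical helper (not part of MSXML or the VBSAXLIB listings): keeps the
' state that a SAX content handler would otherwise lose between events.
Public Class ElementStateTracker
    Private ReadOnly path As New Stack(Of String)()
    Private ReadOnly companies As New List(Of String)()

    ' Call from startElement: remember which element is now open.
    Public Sub OnStartElement(ByVal localName As String)
        path.Push(localName)
    End Sub

    ' Call from characters: keep text only when it belongs to CompanyName.
    ' (A real parser may deliver one text node in several chunks; this
    ' sketch assumes one call per text node.)
    Public Sub OnCharacters(ByVal chars As String)
        If path.Count > 0 AndAlso path.Peek() = "CompanyName" Then
            companies.Add(chars)
        End If
    End Sub

    ' Call from endElement: the innermost open element is now closed.
    Public Sub OnEndElement(ByVal localName As String)
        path.Pop()
    End Sub

    ' Call from endDocument: all names are known, so they can be sorted.
    Public Function SortedCompanies() As List(Of String)
        Dim result As New List(Of String)(companies)
        result.Sort(StringComparer.Ordinal)
        Return result
    End Function
End Class

Module StateDemo
    Sub Main()
        Dim tracker As New ElementStateTracker()
        ' Simulate the events a reader would raise for two customers.
        For Each companyName As String In New String() {"The Big Cheese", "Alfreds Futterkiste"}
            tracker.OnStartElement("Customer")
            tracker.OnStartElement("CompanyName")
            tracker.OnCharacters(companyName)
            tracker.OnEndElement("CompanyName")
            tracker.OnEndElement("Customer")
        Next
        ' Prints: Alfreds Futterkiste, The Big Cheese
        Console.WriteLine(String.Join(", ", tracker.SortedCompanies().ToArray()))
    End Sub
End Module
```

Nothing can be written in sorted order until the last name has been seen, which is why the names are buffered rather than streamed to the output as in CSAXTransform.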

Implementing the IVBSAXErrorHandler Interface

The current SAX2 implementation in MSXML treats all types of errors as fatal errors, so only the fatalError() method needs a working implementation; the other two methods of the interface are left as empty stubs. We write the error message to the output stream and log the message to a file. The following code shows how you can implement the IVBSAXErrorHandler interface.

    Public Sub localError(ByVal oLocator As MSXML2.IVBSAXLocator, _
            ByRef strErrorMessage As String, _
            ByVal nErrorCode As Integer) Implements IVBSAXErrorHandler.error
    End Sub

    Public Sub fatalError(ByVal oLocator As MSXML2.IVBSAXLocator, _
            ByRef strErrorMessage As String, _
            ByVal nErrorCode As Integer) Implements IVBSAXErrorHandler.fatalError
        writeHTML("<BR><B>Error  <BR>Error Code: " & _
                  nErrorCode & " Error: " & strErrorMessage & "Line:" & _
                  oLocator.lineNumber & ", Position " & oLocator.columnNumber & "</B><BR>")
        Log(" Error Code: " & nErrorCode & " Error: " & strErrorMessage & "Line: " & _
            oLocator.lineNumber & ", Position " & oLocator.columnNumber)
    End Sub

    Public Sub ignorableWarning(ByVal oLocator As MSXML2.IVBSAXLocator, _
            ByRef strErrorMessage As String, _
            ByVal nErrorCode As Integer) Implements IVBSAXErrorHandler.ignorableWarning
    End Sub

You might notice a slight difference in the name used for the error() method. This is because Visual Basic .NET does not allow you to use the keyword error for a method name.
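The renaming works because the Implements clause, not the method name, ties a method to an interface member. The following is a small sketch using a hypothetical IErrorSink interface (it is not an MSXML type). It also shows that Visual Basic accepts a reserved word as an identifier when it is escaped in square brackets, which is an alternative to renaming that the listing above does not use.

```vb
Imports System

' Hypothetical interface for illustration; the brackets escape the reserved word.
Public Interface IErrorSink
    Sub [Error](ByVal message As String)
End Interface

Public Class RecordingSink
    Implements IErrorSink

    Public LastMessage As String = ""

    ' The method name differs from the interface member; the Implements
    ' clause is what connects the two.
    Public Sub localError(ByVal message As String) Implements IErrorSink.Error
        LastMessage = message
    End Sub
End Class

Module ErrorNameDemo
    Sub Main()
        Dim sink As IErrorSink = New RecordingSink()
        ' Callers that hold the interface still use the interface's name.
        sink.Error("sample message")
        Console.WriteLine(CType(sink, RecordingSink).LastMessage)
    End Sub
End Module
```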

Log Errors to a File

The following code is a simple subroutine to write out the error messages to a log file:

    Public Shared Sub Log(ByVal logMessage As String)
        Dim fs As FileStream = New FileStream("log.txt", FileMode.OpenOrCreate, _
            FileAccess.Write)
        Dim w As New StreamWriter(fs)
        ' Set the file pointer to the end.
        w.BaseStream.Seek(0, SeekOrigin.End)
        w.Write(vbCrLf + "Log Entry : ")
        w.WriteLine("{0} {1}", DateTime.Now.ToLongTimeString(), _
            DateTime.Now.ToLongDateString())
        w.WriteLine("  {0}", logMessage)
        w.WriteLine("------------------------------------------------------")
        w.Flush()
        w.Close()
    End Sub

The subroutine logs the error message that's passed to it as an argument into the log.txt file with the current time and date. The subroutine then closes the file stream.

Although this example parses a small XML document, it shows you how to implement the SAX interfaces and use SAX in an ASP.NET application. This example converts all the data available in the XML document into HTML. However, for large XML documents, you can change the code to display only a summary of the data by filtering some elements. We look at an alternative way of doing this in the next section, as we learn about SAX filters.

SAX Filters

The XMLFilter interface enables you to include additional transparent interceptors between the XMLReader and the ContentHandler class. This lets you create a chain or a pipeline of filters and gives each one a specific processing responsibility. Each of these filters intercepts the calls to the final content handler, does some processing itself, and delegates the call to the next content handler in the chain.

This can be slightly confusing when you look at it for the first time, so here's a simple explanation: The first filter in the chain behaves as a content handler for the base XMLReader class and it behaves as an XMLReader for the second filter (or the content handler) in the chain. This dual behavior is possible for the filter class because it must implement both the XMLReader and the ContentHandler interfaces. Figure 5.15 shows a visual representation of the chain of filters.

Figure 5.15. The flow of methods.

Note

The current example does not include the second level, represented by FILTER2 in Figure 5.15, but it's been added here to show a possible intermediate filter in the chain.


Next, you see how to include a filter in between the Reader class and the CSAXTransform class that you saw in the last example. The responsibility of the CSAXTransform class was to count the number of customers and transform the XML syntax into HTML. This class ignores what is present in the XML document; instead, it simply responds to the calls by the Reader class. We utilize this fact to add some searching capability into the same application. An implementation of the XMLFilter interface intercepts the calls, reviews all the customer elements, but passes to the CSAXTransform instance only those customer elements whose ID attribute value starts with a specified search string.

We need to add a new ASP.NET page, SAXFilter.aspx , to the VBSAX web project and the corresponding Visual Basic .NET class CSAXFilterImpl to the VBSAXLIB Class Library.

The SAXFilter.aspx ASP.NET page in Listing 5.16 is almost identical to the SAXTransform.aspx page in the previous example, except for the inclusion of the CSAXFilterImpl class, which implements the IVBSAXXMLFilter interface.

Listing 5.16 SAXFilter.aspx
    <%@ Page language="VB" %>
    <%@ Import Namespace="VBSAXLIB" %>
    <%@ Import Namespace="MSXML2" %>
    <%
        Dim reader As IVBSAXXMLReader
        reader = New SAXXMLReader40()
        ' provide "AL" as the search string
        Dim filterImpl As New CSAXFilterImpl("AL")
        Dim filter As MSXML2.IVBSAXXMLFilter
        filter = filterImpl
        filter.parent = reader
        Dim transform As New CSAXTransform(Response)
        reader = filterImpl
        reader.contentHandler = transform
        reader.errorHandler = transform
        On Error Resume Next
        reader.parseURL("http://localhost/VBSAX/customers.xml")
    %>

The constructor for CSAXFilterImpl takes a string parameter, which is the search string for the ID attribute. The filter sets an instance of the XMLReader as its parent. Then the reader variable is set to the filter implementation. This is possible because CSAXFilterImpl implements the XMLReader interface. This reader sets its Content Handler and Error Handler to an instance of the CSAXTransform class. Notice that the filter implementation class keeps a reference to these handlers in instance variables (these variables are prefixed with m_ in the CSAXFilterImpl example class), so that it can call methods on them when required. Finally, the application calls the parseURL() method. CSAXFilterImpl, in turn, passes this call of parseURL() to its parent XMLReader (m_parent in the CSAXFilterImpl class), to which it received a reference earlier through its parent property.

The CSAXFilterImpl class is lengthy because it must provide an implementation for the four interfaces and must account for all the handler types a parent XMLReader might set. (Look at the most significant part of this code in Listing 5.17.)

Listing 5.17 The CSAXFilterImpl Class ( VBSAXLIB.CSAXFilterImpl )
    Imports MSXML2

    Public Class CSAXFilterImpl
        Implements IVBSAXXMLFilter, IVBSAXXMLReader, IVBSAXContentHandler, IVBSAXErrorHandler

        Private m_parent As IVBSAXXMLReader
        Private m_contentHandler As IVBSAXContentHandler
        Private m_dtdHandler As IVBSAXDTDHandler
        Private m_entityResolver As IVBSAXEntityResolver
        Private m_errorHandler As IVBSAXErrorHandler
        Private m_lexicalHandler As IVBSAXLexicalHandler
        Private m_declHandler As IVBSAXDeclHandler
        Private m_secureBaseURL As String
        Private m_baseURL As String
        Private bSkipCustomer As Boolean
        Private customerID As String

        Sub New(ByVal customerID As String)
            Me.customerID = customerID
            bSkipCustomer = False
        End Sub

The initParse() method sets the contentHandler and the errorHandler properties of the parent XMLReader class to itself. We mentioned that the filter behaves as the Content Handler to the base XMLReader class. That is exactly what the initParse() method configures.

The following code shows the initParse() method and the IVBSAXXMLFilter and IVBSAXXMLReader interface implementations:

        Public Sub initParse()
            m_parent.contentHandler = Me
            m_parent.errorHandler = Me
        End Sub
        .
        .
        Private Sub parseURL(ByVal strURL As String) _
                      Implements IVBSAXXMLReader.parseURL
            initParse()
            m_parent.parseURL(strURL)
        End Sub
        .
        'IVBSAXXMLFilter interface implementation
        Public Property parent() As MSXML2.IVBSAXXMLReader _
                      Implements IVBSAXXMLFilter.parent
            Set(ByVal Value As MSXML2.IVBSAXXMLReader)
                m_parent = Value
            End Set
            Get
                Return m_parent
            End Get
        End Property

        'IVBSAXXMLReader interface implementation
        Public Property contentHandler() As MSXML2.IVBSAXContentHandler _
                      Implements IVBSAXXMLReader.contentHandler
            Set(ByVal Value As MSXML2.IVBSAXContentHandler)
                m_contentHandler = Value
            End Set
            Get
                Return m_contentHandler
            End Get
        End Property
        .
        Public Property errorHandler() As MSXML2.IVBSAXErrorHandler _
                      Implements IVBSAXXMLReader.errorHandler
            Set(ByVal Value As MSXML2.IVBSAXErrorHandler)
                m_errorHandler = Value
            End Set
            Get
                Return m_errorHandler
            End Get
        End Property

The ContentHandler Implementation

As previously mentioned, the filter class behaves like the XMLReader for the filter or the Content Handler that appears next in the chain. The filter class accomplishes this by performing the necessary processing when it receives information through a parse event by the parent XMLReader . It passes the event down the chain to the next Content Handler, m_contentHandler in this case, which has a reference to the CSAXTransform instance.

        'IVBSAXContentHandler interface implementation
        Private Sub startElement(ByRef strNamespaceURI As String, ByRef strLocalName As String, _
                      ByRef strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes) _
                      Implements IVBSAXContentHandler.startElement
            Select Case strLocalName
                Case "Customer"
                    If Mid(oAttributes.getValue(0), 1, Len(customerID)) = customerID Then
                        bSkipCustomer = False
                        m_contentHandler.startElement(strNamespaceURI, strLocalName, _
                                                      strQName, oAttributes)
                    Else
                        bSkipCustomer = True
                    End If
                Case "Customers"
                    m_contentHandler.startElement(strNamespaceURI, strLocalName, _
                                                  strQName, oAttributes)
                Case Else
                    If Not bSkipCustomer Then
                        m_contentHandler.startElement(strNamespaceURI, strLocalName, _
                                                      strQName, oAttributes)
                    End If
            End Select
        End Sub

        Private Sub endElement(ByRef strNamespaceURI As String, ByRef strLocalName As String, _
                      ByRef strQName As String) Implements IVBSAXContentHandler.endElement
            If strLocalName = "Customers" Or Not bSkipCustomer Then
                m_contentHandler.endElement(strNamespaceURI, strLocalName, strQName)
            End If
        End Sub
        .
        .
        Private WriteOnly Property documentLocator() As MSXML2.IVBSAXLocator _
                      Implements IVBSAXContentHandler.documentLocator
            Set(ByVal Value As MSXML2.IVBSAXLocator)
                m_contentHandler.documentLocator = Value
            End Set
        End Property
        .
        .
        Private Sub characters(ByRef strChars As String) _
                      Implements IVBSAXContentHandler.characters
            If Not bSkipCustomer Then
                m_contentHandler.characters(strChars)
            End If
        End Sub

        Private Sub endDocument() Implements IVBSAXContentHandler.endDocument
            m_contentHandler.endDocument()
        End Sub
        .
        'IVBSAXErrorHandler interface implementation
        Public Sub fatalError(ByVal oLocator As MSXML2.IVBSAXLocator, _
                      ByRef strErrorMessage As String, _
                      ByVal nErrorCode As Integer) Implements IVBSAXErrorHandler.fatalError
            m_errorHandler.fatalError(oLocator, strErrorMessage, nErrorCode)
        End Sub
        .

The methods of the ContentHandler interface pass the events to the CSAXTransform instance to do the processing only if the ID attribute matches the search criteria. The Error Handler simply passes the events to the next Error Handler to take necessary action. Figure 5.16 shows the output that's produced by the CSAXTransform class.

Figure 5.16. Using XMLFilter to filter XML documents.
Using SAXFilter to Update and Create New Elements

The preceding SAXFilter example shows you how to call the methods of a Content Handler selectively. In effect, the filter simulates the events when required. You can use this technique to modify existing content or to create new content. Unlike DOM, SAX cannot modify the contents of the original document and save it in place; you can only produce a new document that contains the changes. For example, as the events are being intercepted by the filter, you can pass a new parameter value to the characters() method of the Content Handler. As the handler writes the document to a new location, the new document contains the modified values. Similarly, you can add new elements to the document by manually invoking the methods of the ISAXContentHandler in the same order in which the reader would raise them if the element already existed in the document.
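Both ideas, changing a value on its way through the filter and simulating events for an element that was never in the source, can be sketched without MSXML. The following minimal sketch assumes a hypothetical ISimpleHandler interface with only three events as a stand-in for IVBSAXContentHandler; the class names and the Reviewed element are invented for illustration.

```vb
Imports System
Imports System.Text

' Hypothetical, simplified stand-in for IVBSAXContentHandler (three events only).
Public Interface ISimpleHandler
    Sub StartElement(ByVal name As String)
    Sub Characters(ByVal chars As String)
    Sub EndElement(ByVal name As String)
End Interface

' Final handler: serializes the events it receives into a new document.
' (No escaping of special characters; this is only a sketch.)
Public Class StringWriterHandler
    Implements ISimpleHandler
    Private ReadOnly buffer As New StringBuilder()

    Public Sub StartElement(ByVal name As String) Implements ISimpleHandler.StartElement
        buffer.Append("<" & name & ">")
    End Sub

    Public Sub Characters(ByVal chars As String) Implements ISimpleHandler.Characters
        buffer.Append(chars)
    End Sub

    Public Sub EndElement(ByVal name As String) Implements ISimpleHandler.EndElement
        buffer.Append("</" & name & ">")
    End Sub

    Public Function Output() As String
        Return buffer.ToString()
    End Function
End Class

' Filter: changes text on the way through and synthesizes a new element.
Public Class ModifyingFilter
    Implements ISimpleHandler
    Private ReadOnly target As ISimpleHandler
    Private inTitle As Boolean = False

    Public Sub New(ByVal target As ISimpleHandler)
        Me.target = target
    End Sub

    Public Sub StartElement(ByVal name As String) Implements ISimpleHandler.StartElement
        inTitle = (name = "Title")
        target.StartElement(name)
    End Sub

    Public Sub Characters(ByVal chars As String) Implements ISimpleHandler.Characters
        ' Pass a modified value instead of the original text.
        If inTitle Then chars = chars.ToUpperInvariant()
        target.Characters(chars)
    End Sub

    Public Sub EndElement(ByVal name As String) Implements ISimpleHandler.EndElement
        inTitle = False
        target.EndElement(name)
        If name = "Title" Then
            ' Simulate the events for an element the source never contained.
            target.StartElement("Reviewed")
            target.Characters("yes")
            target.EndElement("Reviewed")
        End If
    End Sub
End Class

Module FilterDemo
    Sub Main()
        Dim writer As New StringWriterHandler()
        Dim filter As ISimpleHandler = New ModifyingFilter(writer)
        ' Events a reader would raise for <Contact><Title>Sales Agent</Title></Contact>.
        filter.StartElement("Contact")
        filter.StartElement("Title")
        filter.Characters("Sales Agent")
        filter.EndElement("Title")
        filter.EndElement("Contact")
        ' Prints: <Contact><Title>SALES AGENT</Title><Reviewed>yes</Reviewed></Contact>
        Console.WriteLine(writer.Output())
    End Sub
End Module
```

The source events are never altered; only the stream that reaches the final handler differs, which is why the result has to be written as a new document.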

Abort Processing

An implementation does not always need to read the whole document. In the Customer example, if you are searching for a specific customer, you can stop processing as soon as you find a customer with the specified ID. The SAXXMLReader object doesn't have a method that can interrupt parsing. Instead, the ContentHandler implementation indicates to the XMLReader that it wants to abort processing by raising an application-specific error. The following code shows how this is done:

    Private Sub endElement(ByRef strNamespaceURI As String, ByRef strLocalName As String, _
                  ByRef strQName As String) Implements IVBSAXContentHandler.endElement
        If bFound Then
            Err.Raise(vbObjectError + errAbort, "endElement", "Abort processing")
        End If
    End Sub
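The same abort pattern can be illustrated end to end in managed code. The sketch below is an assumption-laden stand-in, not the MSXML mechanism: it uses System.Xml.XmlReader as a simple push driver, a hypothetical IElementHandler interface in place of IVBSAXContentHandler, and a hypothetical AbortParsingException in place of Err.Raise. It shows the handler raising the exception and the application catching it, so that no further events are processed.

```vb
Imports System
Imports System.IO
Imports System.Xml

' Application-specific exception used only to signal "stop parsing".
Public Class AbortParsingException
    Inherits Exception
    Public Sub New(ByVal message As String)
        MyBase.New(message)
    End Sub
End Class

' Hypothetical stand-in for a content handler (not the MSXML interface).
Public Interface IElementHandler
    Sub StartElement(ByVal localName As String, ByVal id As String)
End Interface

Public Class FindCustomerHandler
    Implements IElementHandler

    Private ReadOnly wantedId As String
    Public Visited As Integer = 0

    Public Sub New(ByVal wantedId As String)
        Me.wantedId = wantedId
    End Sub

    Public Sub StartElement(ByVal localName As String, ByVal id As String) _
            Implements IElementHandler.StartElement
        If localName = "Customer" Then
            Visited += 1
            If id = wantedId Then
                ' Nothing more is needed from the document: abort.
                Throw New AbortParsingException("Found " & id)
            End If
        End If
    End Sub
End Class

Module AbortDemo
    ' Stand-in push driver: reads the XML and calls the handler for each start tag.
    Sub Parse(ByVal xmlText As String, ByVal handler As IElementHandler)
        Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText))
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element Then
                    handler.StartElement(reader.LocalName, reader.GetAttribute("id"))
                End If
            End While
        End Using
    End Sub

    Sub Main()
        Dim xmlText As String = "<Customers><Customer id=""ALFKI""/>" & _
                                "<Customer id=""THEBI""/><Customer id=""EASTC""/></Customers>"
        Dim handler As New FindCustomerHandler("THEBI")
        Try
            Parse(xmlText, handler)
            Console.WriteLine("Not found")
        Catch ex As AbortParsingException
            ' Prints: Found THEBI after 2 customer(s)
            Console.WriteLine(ex.Message & " after " & handler.Visited & " customer(s)")
        End Try
    End Sub
End Module
```

The third customer is never visited, which is the point of aborting. Note that a caller that uses On Error Resume Next, as the .aspx pages in this chapter do, silently ignores the error instead of handling it explicitly.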

MXXMLWriter

The CSAXTransform class transforms the XML syntax to HTML. Now assume that you need a Content Handler that handles the details of building another XML document from the events passed to it by the XMLReader class. The MXXMLWriter class does this for you when it's connected to SAXXMLReader. It allows you to specify the output as a string or as an implementation of the IStream interface, and to control the output by specifying whether indentation is required, omitting the XML declaration, setting the encoding, and so on. Listing 5.18 shows you how to use the MXXMLWriter in conjunction with the SAXXMLReader class. This application writes out the XML string created by the MXXMLWriter class (a string is the default output; this can be set to an implementation of IStream). Setting the omitXMLDeclaration property to True omits the XML declaration from the output.

Listing 5.18 Using MXXMLWriter ( MXXMLWriter.aspx )
    <%@ Page language="VB" %>
    <%@ Import Namespace="MSXML2" %>
    <%
        Dim reader As New SAXXMLReader()
        Dim writer As New MXXMLWriter()
        reader.contentHandler = writer
        reader.dtdHandler = writer
        reader.errorHandler = writer
        reader.putProperty("http://xml.org/sax/properties/declaration-handler", writer)
        reader.putProperty("http://xml.org/sax/properties/lexical-handler", writer)
        writer.omitXMLDeclaration = True
        reader.parseURL("http://localhost/VBSAX/customers.xml")
    %>
    <html>
        <body>
            <PRE>
                <%= Server.HTMLEncode(writer.output)%>
            </PRE>
        </body>
    </html>

You can also manually build an XML document by using the MXXMLWriter class, and by invoking methods of the ISAXContentHandler , ISAXDTDHandler , ISAXDeclHandler , and ISAXLexicalHandler interfaces. To make your life simpler, a new object, MXHTMLWriter in MSXML 4.0, allows you to output HTML using a stream of SAX events, similar to what we did in the CSAXTransform class.

Benefits of Using SAX

Now that you have seen the DOM and the SAX APIs at work and you have seen the benefits of using DOM, take a look at the benefits of using SAX so that you can make a well-balanced decision between the two APIs when you develop your applications using XML:

  • Can parse large documents efficiently: With SAX, memory consumption does not increase with the size of the file. If you need to process large documents of the order of 2MB and more, SAX is the better alternative, as long as you want the document for read-only access.

  • Allows you to abort parsing: SAX allows you to abort processing at any time. You can use it to create applications that fetch particular data. After data is retrieved, you can stop processing.

  • Can retrieve small amounts of information: If you want to scan the document for a small subset of data, it's inefficient to read the unnecessary data into memory. With SAX, it's possible to ignore the data that doesn't interest you.

  • Creating a new document structure is efficient: In cases where you might want to create a new document structure by filtering out some elements in the original document, SAX allows you to do this more efficiently and quickly.

Drawbacks of Using SAX

The following are the drawbacks of the SAX API:

  • Requires state handling mechanisms: As the SAX parser reads through the document and raises events, it does not retain any state about previous elements or about the relationships between elements. You might want to maintain some context and do further processing based on the previous values. To accomplish this, you need to include state handling mechanisms in your code. You have to create data structures, which can become complex for documents with complex structures.

  • Difficult to implement pull model on top of push model: It is a challenge to build a pull model on top of the push model. The pull model is consumer driven and allows the consumer to navigate to the content it wants to process. With a push model, everything must be passed through the application. To simulate the pull behavior, the content handlers require complex state machines that involve working with many variables. The document might have to be parsed more than once, because navigating back and forth is not possible in a single pass with the SAX model.

With the .NET Framework, you might choose to work with the XmlReader classes, because the pull model offers a more familiar programming model along with several performance benefits. If you still find yourself more comfortable with SAX, it's possible to layer a set of push-style interfaces on top of the XmlReader pull model, but the reverse is not true.
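To make the contrast concrete, here is a minimal pull-model sketch that uses System.Xml.XmlReader against a document shaped like customers.xml. The module and function names (PullDemo, FindCompanyName) are hypothetical, and XmlReader.Create assumes a .NET Framework version that provides it (2.0 or later). The consumer decides how far to advance and simply returns when it has what it needs; no handler classes or state flags are required.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module PullDemo
    ' Returns the company name for the given customer id, or Nothing if absent.
    Function FindCompanyName(ByVal xmlText As String, ByVal customerId As String) As String
        Using reader As XmlReader = XmlReader.Create(New StringReader(xmlText))
            ' The consumer pulls: it asks for the next Customer element itself.
            While reader.ReadToFollowing("Customer")
                If reader.GetAttribute("id") = customerId Then
                    If reader.ReadToDescendant("CompanyName") Then
                        ' Returning here ends the parse; the rest is never read.
                        Return reader.ReadElementContentAsString()
                    End If
                End If
            End While
        End Using
        Return Nothing
    End Function

    Sub Main()
        Dim xmlText As String = _
            "<Customers>" & _
            "<Customer id=""ALFKI""><CompanyName>Alfreds Futterkiste</CompanyName></Customer>" & _
            "<Customer id=""THEBI""><CompanyName>The Big Cheese</CompanyName></Customer>" & _
            "</Customers>"
        ' Prints: The Big Cheese
        Console.WriteLine(FindCompanyName(xmlText, "THEBI"))
    End Sub
End Module
```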

MSXML Versions Shipped with Microsoft Products

Table 5.7 lists some of the more recent MSXML versions to help you find out the version that's available to you.

Table 5.7. MSXML Versions Shipped with Microsoft Products

Operating System or Program | Internet Explorer | MSXML Version / Filename
Office 2000 | Internet Explorer 5.0 | 2.0a / msxml.dll
Windows 95, Windows 98, or Windows NT 4.0 | Internet Explorer 5.01 | 2.5a / msxml.dll
Windows 2000 | Internet Explorer 5.01 | 2.5 / msxml.dll
Windows 2000 | Internet Explorer 5.01, Service Pack 1 (SP1) | 2.5 Service Pack 1 (SP1) / msxml.dll
Windows 95, Windows 98, Windows NT 4.0, Windows 2000, or Windows 2000 Service Pack 1 (SP1) | Internet Explorer 5.5 | 2.5 Service Pack 1 (SP1) / msxml.dll
Microsoft SQL Server 2000 |  | 2.6 / msxml2.dll
Windows XP Home Edition, Windows XP Professional | Internet Explorer 6 | 3.0 / msxml3.dll

Downloading Microsoft XML Core Services 4.0

You can download MSXML 4.0 from the Microsoft download center at http://msdn.microsoft.com/downloads/default.asp?url=/downloads/sample.asp?url=/msdn-files/027/001/766/msdncompositedoc.xml.
