SAX


Simple API for XML (SAX), unlike most things in the XML universe, is not a World Wide Web Consortium (W3C) specification but a public domain API that has evolved over time, through the cooperation of individuals on the xml-dev mailing list. It was originally defined as a set of Java interfaces, but working versions in other languages (e.g. C++, Perl, Python) have also evolved.

start sidebar

JAXP uses SAX 2.0. SAX details can be found at www.saxproject.org and the xml-dev mailing list at http://xml.org/xml-dev/index.shtml.

end sidebar

SAX parsers read XML sequentially and do event-based parsing. Effectively, the parser goes through the document serially and invokes callback methods on preconfigured handlers when major events occur during traversal.
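The callback sequence can be made visible with a small sketch. It assumes the parser bundled with the JDK; the class name SaxEventOrder and the method events() are hypothetical names chosen for this illustration. The handler simply records each callback in the order the parser delivers it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX callbacks in arrival order.
class SaxEventOrder {

    /** Parses the XML text and returns the callbacks in the order received. */
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("startElement:" + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement:" + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        // Events arrive strictly in document order: the nested element
        // opens and closes before its parent closes.
        events("<a><b/></a>").forEach(System.out::println);
    }
}
```

Because the parser never builds a tree, the handler sees each element exactly once and must keep any state it needs itself.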

The handlers invoked by the parser, as shown in Figure 9.1b, are as follows:

  • org.xml.sax.ContentHandler. Methods on the implementing class are invoked when document events occur, such as startDocument(), endDocument(), or startElement(). An adapter class, DefaultHandler, implements this interface with empty (no-op) implementations of its methods; developers extend it and override only the methods in which they are interested.

  • org.xml.sax.ErrorHandler. Methods on the implementing class are invoked when parsing errors occur, such as error(), fatalError(), or warning(). It is usually a good idea to implement a custom error handler, because the DefaultHandler (which is also the default error handler) throws an exception for fatal errors and ignores everything else (validation errors are considered nonfatal).

  • org.xml.sax.DTDHandler. Methods of the implementing class are invoked when a DTD is being parsed. A special handler is needed for DTDs because of their inherent non-XML syntax. Developers are unlikely to implement this interface, because Web services typically use XML schemas, which themselves are XML documents.

  • org.xml.sax.EntityResolver. Methods of the implementing class are invoked when the SAX parser encounters an XML document that refers to an external entity (e.g., a DTD or schema).

When parsing documents using SAX, the application will at a bare minimum have a ContentHandler configured to receive callbacks and an ErrorHandler to handle exceptional conditions. An ErrorHandler is also required when the XML needs to be validated, as will be seen later.
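As a rough sketch of why an ErrorHandler matters for validation, the following collects nonfatal problems instead of letting them be ignored. It assumes the JDK's bundled validating parser and a document with an inline DTD; CollectingErrorHandlerDemo and validationErrors() are hypothetical names used only here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: gathers warnings and validation errors in a list.
class CollectingErrorHandlerDemo {

    static List<String> validationErrors(String xml) throws Exception {
        List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                // Validation errors are nonfatal; record them and keep parsing.
                problems.add("error: " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                throw e; // The document is not well formed; stop.
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validate against the DTD in the document
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // The DTD requires a <b> child inside <a>, which this document omits.
        String xml = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a/>";
        validationErrors(xml).forEach(System.out::println);
    }
}
```

With an unmodified DefaultHandler the same document would parse without any visible complaint, because error() does nothing by default.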

JAXP supports SAX 2.0 completely. In its current state, SAX 2.0 is divided into three packages, on top of which JAXP adds a fourth:

  • org.xml.sax. Defines the SAX interfaces.

  • org.xml.sax.helpers. Contains SAX helper classes that implement some of the above interfaces.

  • org.xml.sax.ext. Contains SAX extensions for advanced processing (e.g., to read comments).

  • javax.xml.parsers. Defines the JAXP portion of SAX.

Tables 9.1 through 9.3 describe these packages. Exceptions, deprecated classes, and classes relevant to SAX 1 are not listed.

Table 9.1: The org.xml.sax Package

ContentHandler

An interface that defines callback methods to receive notifications of XML events from the SAX parser

DTDHandler

An interface that defines callback methods to receive notifications of DTD parsing events

EntityResolver

An interface that acts as an agent of the XML reader for resolving entity references in the document

ErrorHandler

An interface that defines callback methods to receive notifications of error messages from the parser

InputSource

A class to encapsulate a single XML document for input

Locator

An interface for the location specification in a document

XMLFilter

An extension of the XMLReader interface to filter an XMLReader

XMLReader

An interface that defines methods to read and parse a document

Table 9.2: The org.xml.sax.helpers Package

DefaultHandler

A convenience implementation of all the core SAX handler interfaces

LocatorImpl

A convenience implementation of Locator

Table 9.3: The org.xml.sax.ext Package

DeclHandler

An interface that enables parsing of DTD declarations in an XML document

LexicalHandler

An interface that enables detection of normally unparsed items, such as comments and CDATA sections
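A minimal sketch of the LexicalHandler extension in use, assuming the JDK's bundled parser, which accepts the standard SAX property http://xml.org/sax/properties/lexical-handler. The class name CommentReader and the method comments() are hypothetical; DefaultHandler2 is the convenience class in org.xml.sax.ext that implements LexicalHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: collects the text of XML comments.
class CommentReader {

    static List<String> comments(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Lexical events are delivered only if the handler is registered
        // through this SAX property; passing it to parse() is not enough.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(comments("<a><!-- reviewed by admin --></a>"));
    }
}
```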

JAXP and SAX

The SAX part of JAXP relevant to parsing is essentially the factory and parser class with the addition of two exception classes. This is described in Table 9.4.

Table 9.4: The SAX Parsing Part of JAXP in the javax.xml.parsers Package

SAXParser

An abstract class that wraps an XMLReader; its implementations do all the SAX parsing

SAXParserFactory

A factory class used to obtain a reference to the SAXParser and configure it if necessary, using properties

Although the current implementation comes with only one SAX parser, the implementation of the SAXParserFactory can be changed dynamically through a system property called javax.xml.parsers.SAXParserFactory.

The following steps occur when the SAXParserFactory factory is instantiated to obtain a reference to a parser:

  1. If the system property javax.xml.parsers.SAXParserFactory is set, its value is used as the class name of the factory implementation.

  2. If a jaxp.properties file exists in the lib directory of the JVM being used, it is read, and the same property is searched for.

  3. If the JAR services API is available, the JAR files will be searched for the file META-INF/services/javax.xml.parsers.SAXParserFactory. This file contains the class name of the implementation, such as org.apache.xerces.jaxp.SAXParserFactoryImpl.

  4. The default factory implementation of the reference implementation is used.
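To see which implementation the lookup selected on a given JVM, a small sketch such as the following can help. FactoryLookup and factoryClassName() are hypothetical names; the printed class name depends on the runtime and on any overrides that are in place.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class: reports the factory chosen by the lookup steps.
class FactoryLookup {

    /** Returns the class name of the factory implementation that was selected. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Null unless the property was set, for example with
        // -Djavax.xml.parsers.SAXParserFactory=... on the command line.
        System.out.println("System property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("Selected factory: " + factoryClassName());
    }
}
```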

The SAXParserFactory can additionally be configured by using the setFeature() method. These settings, which are part of SAX rather than the JAXP specification, are named by URIs, for example: factory.setFeature("http://xml.org/sax/features/namespaces", true);
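A short sketch of setting a feature on the factory and reading it back from the XMLReader of a parser it creates. This assumes the JDK's bundled implementation; FeatureConfig and namespacesEnabled() are hypothetical names.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical demo class: sets a SAX feature on the factory and checks
// that parsers created afterwards carry it.
class FeatureConfig {

    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    static boolean namespacesEnabled() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Throws if the implementation does not recognize or support the feature.
        factory.setFeature(NAMESPACES, true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        return reader.getFeature(NAMESPACES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces feature: " + namespacesEnabled());
    }
}
```

Setting the feature on the factory, rather than on each reader, keeps the configuration in one place for every parser the factory hands out.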

Table 9.5 summarizes the relevant properties and their effects.

Table 9.5: Properties that Can Be Configured with the SAXParserFactory

Property/Description

Default

Available in RI

http://xml.org/sax/features/validation

  • Returns a validating parser. A parser will always check to see if the XML is well formed, but a validating parser will also validate the XML.

False

Yes

http://xml.org/sax/features/namespaces

  • The parser is namespace aware and performs namespace processing.

False

Yes

http://xml.org/sax/features/namespace-prefixes

  • The parser reports the original prefixed names and the xmlns attributes used for namespace declarations. If false, namespace declaration attributes are not reported, and prefixed names are optional.

False

Yes

http://xml.org/sax/features/string-interning

  • The parser interns String objects. Names it creates are pooled in the JVM during processing, using the java.lang.String.intern() method.

False

No. The RI uses its own string optimization.

http://xml.org/sax/features/external-general-entities

  • All external text entities are included

False

True if validating parser

http://xml.org/sax/features/external-parameter-entities

  • All external parameter entities and external DTD subsets are included.

False

True if validating parser

The factory can be instructed to return a validating parser using the setValidating(true) method or a namespace-aware parser using the setNamespaceAware(true) method (the default for both is false). These methods have the same effect as setting the corresponding properties above.
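The practical effect of setNamespaceAware() shows up in the arguments passed to startElement(). The sketch below, with the hypothetical names NamespaceAwareness and describeRoot(), reports what the handler receives for the root element in each mode, assuming the JDK's bundled parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: compares startElement() arguments with and
// without namespace processing.
class NamespaceAwareness {

    /** Returns "{uri}localName|qName" as reported for the root element. */
    static String describeRoot(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        String[] result = new String[1];
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override public void startElement(String uri, String localName,
                                                   String qName, Attributes atts) {
                    if (result[0] == null) {
                        result[0] = "{" + uri + "}" + localName + "|" + qName;
                    }
                }
            });
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<contact:flutebank "
                + "xmlns:contact=\"http://www.flutebank.com/contacts\"/>";
        System.out.println("aware:     " + describeRoot(xml, true));
        System.out.println("not aware: " + describeRoot(xml, false));
    }
}
```

Only the namespace-aware parser is required to supply the namespace URI and local name; without namespace processing a handler should rely on the qualified name alone.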

Let us look at a simple example of an XML file being parsed using SAX. The XML file contains administrator information for the flutebank.com server and is shown in Listing 9.1.

Listing 9.1: Sample XML file to be parsed with SAX

start example
<?xml version="1.0"?>
<contact:flutebank xmlns:contact="http://www.flutebank.com/contacts">
    <contact:administrator type="maintenance" level="support-1">
        <contact:firstname>John</contact:firstname>
        <contact:lastname>Malkovich</contact:lastname>
        <contact:telephone>
            <contact:pager>783-393-9213</contact:pager>
            <contact:cellular>379-234-2342</contact:cellular>
            <contact:desk>322-324-2349</contact:desk>
        </contact:telephone>
        <contact:email>
            <contact:work>john.malkovich@flutebank.com</contact:work>
            <contact:personal>john.malkovich@home.com</contact:personal>
        </contact:email>
    </contact:administrator>
</contact:flutebank>
end example

Listing 9.2a demonstrates the simplicity of the code needed to parse the XML using SAX in JAXP. A factory is obtained, some properties are set, a parser is obtained from the factory, and the XML is processed using a class as the callback handler for SAX.

Listing 9.2a: SAX parsing code

start example
package com.flutebank.parsing;

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParsing {
    public static void main(String[] arg) {
        try {
            String filename = arg[0];
            // Create a new factory that will create the SAX parser
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            // Create a new handler to handle content
            DefaultHandler handler = new MySAXHandler();
            // Parse the XML using the parser and the handler
            parser.parse(new File(filename), handler);
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}
end example

In Listing 9.2b, only some methods of the ContentHandler are overridden by the custom handler, and all three methods from the ErrorHandler are overridden to handle errors.

Listing 9.2b: SAX parsing handler

start example
package com.flutebank.parsing;

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class MySAXHandler extends DefaultHandler {

    /** The start of a namespace scope */
    public void startPrefixMapping(String prefix, String uri) {
        System.out.println("----Namespace scope start");
        System.out.println("     " + prefix + "=\" " + uri + "\" ");
    }

    /** The end of a namespace scope */
    public void endPrefixMapping(String prefix) {
        System.out.println("----Namespace scope end");
        System.out.println("     " + prefix);
    }

    /** The opening tag of an element. */
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        System.out.println("----Opening tag of an element");
        System.out.println("       Namespace: " + namespaceURI);
        System.out.println("      Local name: " + localName);
        System.out.println("  Qualified name: " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println("       Attribute: " + atts.getQName(i) +
                               "=\" " + atts.getValue(i) + "\" ");
        }
    }

    // Error handler methods

    /** Handle warnings during parsing */
    public void warning(SAXParseException exp) throws SAXException {
        show("Warning", exp);
        throw (exp);
    }

    /** Handle errors during parsing */
    public void error(SAXParseException exp) throws SAXException {
        show("Error", exp);
        throw (exp);
    }

    /** Handle fatal errors during parsing */
    public void fatalError(SAXParseException exp) throws SAXException {
        show("Fatal Error", exp);
        throw (exp);
    }

    /** Private method for printing details */
    private void show(String type, SAXParseException exp) {
        System.out.println(type + ": " + exp.getMessage());
        System.out.println("Line " + exp.getLineNumber() +
                           " Column " + exp.getColumnNumber());
        System.out.println("System ID: " + exp.getSystemId());
    }
}
end example
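The handler in Listing 9.2b reports element and attribute names but not element text. As a complementary sketch, the hypothetical TextCollector class below gathers the text of each leaf element. It buffers the text because a parser is allowed to deliver one run of text through several characters() calls. The element names follow Listing 9.1; everything else is an assumption made for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps each leaf element's local name to its text.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // discard whitespace seen between tags
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            values.put(localName, text);
        }
        buffer.setLength(0);
    }

    static Map<String, String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for localName to be populated
        TextCollector handler = new TextCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<c:administrator xmlns:c=\"http://www.flutebank.com/contacts\">"
                + "<c:firstname>John</c:firstname>"
                + "<c:lastname>Malkovich</c:lastname>"
                + "</c:administrator>";
        System.out.println(collect(xml));
    }
}
```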

The handlers can be set up in multiple ways. Either a single class extending the DefaultHandler can be passed to the instance of the parser, as in Listings 9.2a and 9.2b, or individual handlers can be configured using methods in the XMLReader interface, as shown below.

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser parser = factory.newSAXParser();

    // Obtain a reference to the underlying XMLReader of the parser
    XMLReader reader = parser.getXMLReader();

    // Specify the handlers for the reader
    reader.setErrorHandler(new MyErrorHandler());
    reader.setContentHandler(new MyContentHandler());
    reader.setDTDHandler(new MyDTDHandler());
    reader.setEntityResolver(new MyEntityResolver());

    // Use the XMLReader to parse the entire file
    InputSource input = new InputSource(filename);

    // Start the SAX parsing. Relevant methods in the handlers
    // will be invoked by the parser
    reader.parse(input);
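A self-contained variant of the per-handler setup can be sketched as follows. It assumes the JDK's bundled parser and parses from a string rather than a file. ReaderWithHandlers is a hypothetical name, and the entity resolver here returns an empty source for the external DTD so that nothing is fetched over the network, which is one common use of EntityResolver rather than the only one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: configures each handler separately on the XMLReader.
class ReaderWithHandlers {

    static List<String> parse(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        reader.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes atts) {
                log.add("element " + qName);
            }
        });
        reader.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { log.add("warning " + e.getMessage()); }
            @Override public void error(SAXParseException e) { log.add("error " + e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        // Substitute an empty document for any external entity (here, the DTD).
        reader.setEntityResolver((publicId, systemId) -> {
            log.add("resolve " + systemId);
            return new InputSource(new StringReader(""));
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        // example.invalid is a placeholder host; the resolver prevents any lookup.
        String xml = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a/>";
        parse(xml).forEach(System.out::println);
    }
}
```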

Neither the SAXParserFactory nor the SAXParser is guaranteed to be thread-safe, so it is a good idea to use a separate instance for each application processing thread.




Java Web Services Architecture (The Morgan Kaufmann Series in Data Management Systems)
ISBN: 1558609008
Year: 2005
Pages: 210
