SAX Packages and Classes


SAX contains the packages described in the following table:

Package               Description
org.xml.sax           Defines handler interfaces. Handler implementations of these interfaces are registered with parsers, which call the handler methods in order to report parsing events and errors.
org.xml.sax.ext       Contains two additional, non-mandatory handlers for dealing with DTD declarations and lexical information.
org.xml.sax.helpers   Provides default implementations for some of the core interfaces defined in org.xml.sax.

Each of these packages has several classes. For reasons of brevity, I discuss only the important classes in this chapter.

The SAXParser Class

The SAXParser class is an abstract class that wraps an XMLReader implementation class. You obtain an instance of this class by calling the newSAXParser() method on the factory class.

      SAXParserFactory factory = SAXParserFactory.newInstance();
      SAXParser parser = factory.newSAXParser();

After you have a SAXParser object, you specify the input for the parsing process by calling the parse() method. The input can come from a variety of sources: an InputStream, a File, a URI, or a SAX InputSource. For example, you can open an InputStream on the document to be parsed and pass it as an argument to parse(). Alternatively, you can pass an instance of the File class, a URI, or a SAX InputSource. The InputSource class, defined in the org.xml.sax package, encapsulates a single input source for an XML entity. An input source may supply a byte stream, a character stream, or both, and it may also carry a public or system identifier.

As this parser object parses the document, the handler methods are called. Some of the important methods of the SAXParser class are as follows:

  • parse - This is the most important method of this class. Several overloaded parse methods take different input types: File, InputSource, InputStream, and a URI string. For each input type, you also specify the handler to be used during parsing.

  • getXMLReader - This method returns the XMLReader that is encapsulated by the implementation of this class.

  • getProperty / setProperty - These methods allow you to get and set properties of the underlying XMLReader implementation.
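As a rough illustration of the steps above, the following sketch obtains a SAXParser from the factory and parses an in-memory document through an InputStream. The class name SaxParserDemo, the countElements() helper, and the sample XML string are made up for this example and are not part of the chapter's code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name and helper are assumptions for illustration.
class SaxParserDemo {

    // Parses the XML text and returns how many element start tags were reported.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };

        // parse(InputStream, DefaultHandler) is one of the overloaded parse methods.
        try (InputStream in = new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, handler);
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Products><Product><Name>Bearing Ball</Name></Product></Products>";
        System.out.println("Elements: " + countElements(xml));
    }
}
```

The same handler could be passed to the File, InputSource, or URI overloads without any other change.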

The XMLReader Interface

The XMLReader interface is implemented by the parser's driver and is mainly used for reading an XML document. The interface allows you to register an event handler for document processing. Some of the important methods of this interface are:

  • parse - Two overloaded parse methods take input from either an InputSource object or a String URI. The method parses the input document and generates events in your handler. The call is synchronous and does not return until the entire document is parsed or an exception occurs during parsing.

  • setContentHandler - This method registers a content event handler. If no content event handler is registered, all the content events during parsing are ignored. It is possible to change the content handler in the middle of parsing. If a new content handler is registered during parsing, the parser immediately uses the new handler while processing the rest of the document.

  • setDTDHandler - Like setContentHandler, this method registers a handler, in this case a DTD handler. You use DTDHandler to report notation and unparsed entity declarations to the application. If no DTD handler is registered, all DTD events are ignored. As with the content handler, the DTD handler can be changed during parsing. Because DTDs are supported mainly for backward compatibility, you may not use this handler frequently in your applications.

  • setEntityResolver - Like the previous two methods, setEntityResolver registers a handler, in this case an EntityResolver, which can also be changed during parsing.

  • setErrorHandler - Registers an error handler that lets you handle errors reported during parsing.
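The methods listed above can be sketched as follows. This is a minimal example, assuming the XMLReader is obtained from a JAXP SAXParser through getXMLReader(); the class name XmlReaderDemo and the elementNames() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are assumptions for illustration.
class XmlReaderDemo {

    // Returns the qualified names of all elements, in document order.
    static List<String> elementNames(String xml) throws Exception {
        // getXMLReader() exposes the XMLReader wrapped by the SAXParser.
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };

        // Register the same object for content events and for errors.
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);

        // parse() is synchronous: it returns once the whole document is read.
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<Products><Product/></Products>"));
    }
}
```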

Receiving SAX Events

To receive SAX events, you write a Java class that implements one of the SAX interfaces, which means your class provides all the methods that the interface declares. You specify that a class implements an interface by declaring it like this:

      public class ProductsReader implements ContentHandler 

ProductsReader is the name of my new class, and ContentHandler is the name of the interface. Actually, this is the most important interface in SAX, as it is the one that defines the callback methods for content related events (that is, events about elements, attributes, and their contents). So what you are doing here is creating a class that contains methods that a SAX-aware parser knows about.

The ContentHandler interface contains a whole series of methods, most of which you can ignore in the normal course of events. Unfortunately, when you implement an interface, you have to provide implementations of all the methods defined in that interface. However, SAX provides you with a default, empty implementation, called DefaultHandler. So rather than implement ContentHandler, you can instead extend DefaultHandler, like this:

      public class ProductsReader extends DefaultHandler 

By extending the DefaultHandler class, you can trap specific events by choosing which methods to override with your own implementations. If you leave a method as it is, the base class (DefaultHandler in this case) provides its own implementation for use by ProductsReader. However, if you provide your own implementation of a method, it is used instead. For example, if ProductsReader overrides startDocument(), the method invoked at the start of the document is ProductsReader.startDocument(), which might do something totally different from DefaultHandler's implementation.

Actually, DefaultHandler is a very important class because it also provides default implementations of the three other core SAX interfaces: ErrorHandler, DTDHandler, and EntityResolver. Throughout this chapter, I extend the DefaultHandler class to leverage the functionalities from any of the existing SAX interfaces.
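To make the difference concrete, here is a small sketch that collects element names twice: once with a class that implements ContentHandler directly, and once with a class that extends DefaultHandler. The class names HandlerStylesDemo, VerboseNameCollector, and CompactNameCollector are invented for this comparison.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical driver class for the comparison.
class HandlerStylesDemo {
    static void parse(String xml, ContentHandler handler) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Products><Product/></Products>";
        VerboseNameCollector verbose = new VerboseNameCollector();
        CompactNameCollector compact = new CompactNameCollector();
        parse(xml, verbose);
        parse(xml, compact);
        System.out.println(verbose.names + " " + compact.names);
    }
}

// Implementing ContentHandler directly: every abstract method needs a body.
class VerboseNameCollector implements ContentHandler {
    final List<String> names = new ArrayList<>();
    public void setDocumentLocator(Locator locator) { }
    public void startDocument() { }
    public void endDocument() { }
    public void startPrefixMapping(String prefix, String uri) { }
    public void endPrefixMapping(String prefix) { }
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        names.add(qName);
    }
    public void endElement(String uri, String localName, String qName) { }
    public void characters(char[] ch, int start, int length) { }
    public void ignorableWhitespace(char[] ch, int start, int length) { }
    public void processingInstruction(String target, String data) { }
    public void skippedEntity(String name) { }
}

// Extending DefaultHandler: only the interesting callback is overridden.
class CompactNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        names.add(qName);
    }
}
```

Both collectors end up with the same list; the DefaultHandler version simply leaves the unused callbacks to the base class.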

The DefaultHandler Class

The DefaultHandler class provides a default implementation for all the callback methods defined in the following interfaces:

  • ContentHandler - The class implementing this interface receives notifications on basic document-related events such as the start and end of elements and character data.

  • ErrorHandler - This interface provides the basic interface for SAX error handlers. The SAX application implements this interface to provide customized error handling.

  • DTDHandler - The class implementing this interface receives notification of basic DTD-related events.

  • EntityResolver - This interface provides a basic interface for resolving entities. The SAX application implements this interface to provide customized handling for external entities.

In your application, you can extend just the DefaultHandler class and override the desired methods from the four handler interfaces. Some of the important methods of the DefaultHandler class are as follows:

  • startDocument / endDocument - These callback methods are called by the parser when it encounters the start and the end of the document being parsed.

  • startElement / endElement - These callback methods are called by the parser whenever it encounters the start and the end of an element during parsing. The methods receive parameters that indicate the local and qualified name of the element.

  • characters - This method receives notification of character data inside an element during parsing.

  • processingInstruction - This method receives notification of a processing instruction during parsing.
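The order in which these callbacks arrive can be seen with a short tracing handler. The following is only a sketch; the class name EventTraceDemo and its trace() helper are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each callback as a line of text.
class EventTraceDemo extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("characters " + new String(ch, start, length));
    }

    // Parses the XML text and returns the recorded events in order.
    static List<String> trace(String xml) throws Exception {
        EventTraceDemo handler = new EventTraceDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<Name>Adjustable Race</Name>")) {
            System.out.println(event);
        }
    }
}
```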

Using the XMLReader Interface

The primary entry point to any SAX implementation is the XMLReader interface. This interface contains methods for the following:

  • Controlling how the underlying XML parser operates (for example, validating versus non-validating)

  • Enabling and disabling specific SAX features, such as namespace processing

  • Registering object instances to receive XML parsing notifications (via the xxxHandler interfaces)

  • Initiating the parsing process on a specific document URI or input source (via the parse() methods)
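As one possible illustration of feature control, the sketch below turns on namespace processing through setFeature() using the standard SAX feature URI http://xml.org/sax/features/namespaces, then reports the namespace URI and local name of the root element. The class name FeatureDemo and the describeRoot() helper are hypothetical, and the XMLReader is assumed to come from a JAXP SAXParser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are assumptions for illustration.
class FeatureDemo {
    // Standard SAX2 feature identifier for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Returns "{namespaceURI}localName" for the root element.
    static String describeRoot(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

        // Enable namespace processing before parsing starts.
        reader.setFeature(NAMESPACES, true);

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = "{" + uri + "}" + localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeRoot("<p:Products xmlns:p='urn:example'/>"));
    }
}
```

A parser may reject a feature it does not recognize or support by throwing SAXNotRecognizedException or SAXNotSupportedException, so production code would normally handle those.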

Before an application can use the XMLReader interface, it must first obtain a reference to an object that implements it. The decision about how to support the XMLReader interface is left up to the implementers of the particular SAX distribution. For instance, the Xerces package supplies a class called org.apache.xerces.parsers.SAXParser that implements the XMLReader interface. Any application that uses Xerces to provide SAX support can simply create a new instance of the SAXParser class and use it immediately.

The SAX specification does define a special helper class (from the org.xml.sax.helpers package) called XMLReaderFactory that is intended to act as a class factory for XMLReader instances. It has two static methods for creating a new XMLReader object instance:

  • q createXMLReader()-Allows you to create an XMLReader using the system defaults

  • q createXMLReader(String className)-Allows you to create an XMLReader from the supplied class name

The second method requires that the name of the class that supports the XMLReader interface be known in advance; the first looks for that class name in the system defaults, such as the org.xml.sax.driver system property. Here is the code required to create the XMLReader using the second overload.

      XMLReader reader =
        XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

In addition, you can obtain a reference to an XMLReader instance by directly instantiating the SAXParser class from the Xerces package through its constructor:

      XMLReader reader = (XMLReader)new org.apache.xerces.parsers.SAXParser(); 

Now that you have an XMLReader object instance to work with, you can register your class to receive XML parse callback notifications. The following code shows the skeleton implementation required for parsing an XML file.

      import javax.xml.parsers.SAXParserFactory;
      import javax.xml.parsers.SAXParser;
      import org.xml.sax.*;
      import org.xml.sax.helpers.*;

      public class ProductsReader extends DefaultHandler
      {
        public static void main(String[] args) throws Exception
        {
          System.out.println("Start of Products...");
          ProductsReader readerObj = new ProductsReader();
          readerObj.read(args[0]);
          //add processing here
        }

        public void read(String fileName) throws Exception
        {
          XMLReader reader = XMLReaderFactory.createXMLReader
            ("org.apache.xerces.parsers.SAXParser");
          reader.setContentHandler(this);
          reader.parse(fileName);
        }

        public void startDocument()
        {
          //add processing here
        }

        //add other event handlers
      }

The previous code uses the XMLReaderFactory.createXMLReader() method to get a reference to the XMLReader object. After that, you invoke the setContentHandler() method to register a content event handler. If the application does not register a content handler, all content events raised by the SAX parser are ignored. In this case, you pass in a reference to the current object (this), which contains the event handlers for processing the events raised by the SAX parser.

      reader.setContentHandler(this); 

When you invoke the parse() method, you pass in the name of the XML file to be parsed.

      reader.parse(fileName); 

As the XML file is parsed, the SAX parser raises events such as startDocument and endDocument, which are discussed in the next section.

DefaultHandler Class

The DefaultHandler class, contained in the org.xml.sax.helpers package, is a helper class that is primarily used as the base class for SAX2 applications. It provides default implementations for all of the callbacks declared in the SAX handler interfaces: EntityResolver, DTDHandler, ContentHandler, and ErrorHandler. You can extend this class when you need to implement only part of an interface. The DefaultHandler class has a number of predefined methods, called callback methods, that the SAX parser calls:

  • characters - Called by the SAX parser for text nodes

  • endDocument - Called by the SAX parser when it sees the end of the document

  • endElement - Called by the SAX parser when it sees the closing tag of an element

  • startDocument - Called by the SAX parser when it sees the start of the document

  • startElement - Called by the SAX parser when it sees the opening tag of an element

All the required callback methods are already implemented in the DefaultHandler class, but they don't do anything. That means you have to implement only the methods you want to use, such as startDocument() to catch the beginning of the document or endDocument() to catch the end of the document, as described later in this chapter. The following table lists the significant methods of the DefaultHandler class.

Method                  Description
characters              Handles text nodes
endDocument             Handles the end of the document
endElement              Handles the end of an element
error                   Handles a recoverable parser error
fatalError              Reports a fatal parsing error
ignorableWhitespace     Handles ignorable whitespace (such as that used to indent a document) in element content
notationDecl            Handles a notation declaration
processingInstruction   Handles an XML processing instruction (such as an xml-stylesheet instruction)
resolveEntity           Resolves an external entity
setDocumentLocator      Sets a Locator object for document events
skippedEntity           Handles a skipped XML entity
startDocument           Handles the beginning of the document
startElement            Handles the start of an element
startPrefixMapping      Handles the start of a namespace mapping
unparsedEntityDecl      Handles an unparsed entity declaration
warning                 Handles a parser warning

Before looking at an example of this, consider the XML document in Listing 13-1, which describes the attributes of a set of products.

Listing 13-1: A sample XML file

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <Products>
        <Product>
          <ProductID>1</ProductID>
          <Name>Adjustable Race</Name>
          <ProductNumber>AR-5381</ProductNumber>
        </Product>
        <Product>
          <ProductID>2</ProductID>
          <Name>Bearing Ball</Name>
          <ProductNumber>BA-8327</ProductNumber>
        </Product>
        <Product>
          <ProductID>3</ProductID>
          <Name>BB Ball Bearing</Name>
          <ProductNumber>BE-2349</ProductNumber>
        </Product>
        <Product>
          <ProductID>4</ProductID>
          <Name>Headset Ball Bearings</Name>
          <ProductNumber>BE-2908</ProductNumber>
        </Product>
      </Products>

The root element of this XML document is <Products>, which contains an arbitrary number of <Product> elements. Each <Product> element, in turn, contains <ProductID>, <Name>, and <ProductNumber> elements.

Now that you have taken a brief look at the methods, I go into detail on how to use them, starting with handling the start and end of the document.

The startDocument() and endDocument() Methods

To handle the start and end of a document, you use the startDocument() and endDocument() methods. These callbacks mark the beginning and the end of the stream of events for a document. Here is an example of using these methods:

      public void startDocument() throws SAXException
      {
        try{
          System.out.println("Start Document");
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }

      public void endDocument() throws SAXException
      {
        try{
          System.out.println("End Document");
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }

The processingInstruction() Method

You can handle processing instructions by using the processingInstruction() method, which is called automatically when the SAX parser finds a processing instruction. The target of the processing instruction is passed to this method, as is the data for the processing instruction, which means you can handle processing instructions as follows:

      public void processingInstruction(String target, String data) throws SAXException
      {
        try{
          System.out.println("PI(Target= " + target + " Data= " + data + ")");
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }
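For a self-contained view of how the target and data arrive, consider the following sketch. The class name ProcessingInstructionDemo, the collect() helper, and the sample stylesheet instruction are assumptions for illustration. Note that the XML declaration at the top of a document is not reported through this callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every processing instruction it sees.
class ProcessingInstructionDemo extends DefaultHandler {
    final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // target is the name after "<?"; data is the rest of the instruction.
        instructions.add(target + " -> " + data);
    }

    static List<String> collect(String xml) throws Exception {
        ProcessingInstructionDemo handler = new ProcessingInstructionDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.instructions;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet type=\"text/xsl\" href=\"products.xsl\"?>"
                + "<Products/>";
        for (String pi : collect(xml)) {
            System.out.println(pi);
        }
    }
}
```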

Namespace Callbacks

SAX distinguishes between the namespace of an element, which is signified by an element prefix and an associated namespace URI, and the local name of the element. Two callbacks deal specifically with namespaces: startPrefixMapping() and endPrefixMapping(). The parser invokes them when it reaches the beginning and end of a prefix mapping, respectively. A prefix mapping is declared using an xmlns attribute, and the namespace declaration typically occurs on the root element.

The prefix and the namespace URI are supplied as arguments to the startPrefixMapping() method. In the following code, the body of startPrefixMapping() stores the pair in namespaceMappings, a map object kept by the handler and keyed by URI:

      public void startPrefixMapping(String prefix, String uri) throws SAXException
      {
        try{
          namespaceMappings.put(uri, prefix);
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }

The mapping ends when the element that declared the mapping is closed, which triggers the endPrefixMapping() method.

The endPrefixMapping() method, in the following code, removes a mapping when it is no longer in scope:

      public void endPrefixMapping(String prefix) throws SAXException
      {
        try{
          for (Iterator i = namespaceMappings.keySet().iterator(); i.hasNext();)
          {
            String uri = (String) i.next();
            String thisPrefix = (String) namespaceMappings.get(uri);
            if (prefix.equals(thisPrefix)){
              namespaceMappings.remove(uri);
              break;
            }
          }
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }
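The relative order of the prefix-mapping and element callbacks can be observed with a small trace. This sketch assumes a namespace-aware JAXP parser (setNamespaceAware(true)); the class name PrefixMappingDemo and the trace() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that logs prefix mappings alongside element starts.
class PrefixMappingDemo extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        events.add("start " + prefix + "=" + uri);
    }

    @Override
    public void endPrefixMapping(String prefix) {
        events.add("end " + prefix);
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        events.add("element " + localName);
    }

    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Prefix-mapping events are only reported with namespace processing on.
        factory.setNamespaceAware(true);
        PrefixMappingDemo handler = new PrefixMappingDemo();
        factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p:Products xmlns:p='urn:example'><p:Product/></p:Products>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

The mapping starts just before the startElement event of the element that declares it and ends just after that element's endElement event.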

The startElement() and endElement() Methods

To handle the start of an element, use the startElement() method. This method is passed the namespace URI of the element, the local (unqualified) name of the element, the qualified name of the element, and the element's attributes (as an Attributes object).

      public void startElement(String uri, String localName, String qName, Attributes atts)
          throws SAXException
      {
        try
        {
          //Display Start Element name
          System.out.println("Start Element : " + qName);
          //Determine prefix of a namespace (null when the URI has no recorded mapping)
          String prefix = (String) namespaceMappings.get(uri);
          if (prefix == null || prefix.equals(""))
          {
            prefix = "[None]";
          }
          System.out.println("  Element(Namespace:Prefix = '" + prefix +
            "' URI = '" + uri + "')");
          //Process Attribute of each element
          for (int i = 0; i < atts.getLength(); i++)
          {
            System.out.println("    Attribute(name: '" + atts.getLocalName(i) +
              "', value = '" + atts.getValue(i) + "')");
            String attURI = atts.getURI(i);
            String attPrefix = "";
            if (attURI.length() > 0)
            {
              //Look up the prefix by the attribute's own namespace URI
              attPrefix = (String) namespaceMappings.get(attURI);
            }
            if (attPrefix == null || attPrefix.equals(""))
            {
              attPrefix = "[None]";
            }
            if (attURI.equals(""))
            {
              attURI = "[None]";
            }
            System.out.println("     Attribute(Namespace:Prefix = '" + attPrefix +
              "' URI = '" + attURI + "')");
          }
        }
        catch (Exception e)
        {
          throw new SAXException(e.toString());
        }
      }

In the startElement() callback method, the parameters are the names of the element and an org.xml.sax.Attributes instance. This helper interface gives access to the attributes of an element and allows easy iteration through them. To refer to an attribute, use its index or name. Helper methods, such as getURI(int index) and getLocalName(int index), provide additional namespace information about an attribute.

In addition to the startElement() method, which is called when the SAX parser sees the beginning of an element, you can also implement the endElement() method to handle an element's closing tag. The following implementation displays the qualified name of the element being closed.

      public void endElement(String uri, String localName, String elemName)
          throws SAXException
      {
        try
        {
          System.out.println("End Element : \"" + elemName + "\"");
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }
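Because startElement() and endElement() always arrive in matching pairs, a handler can track nesting depth with a simple counter. The sketch below prints an indented outline of a document along with each element's attributes; the class name OutlineDemo, the outline() helper, and the id attribute in the sample are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that builds an indented outline of element names.
class OutlineDemo extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");              // two spaces per nesting level
        }
        out.append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append('=').append(atts.getValue(i));
        }
        out.append('\n');
        depth++;                           // children are one level deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;                           // back to the parent's level
    }

    static String outline(String xml) throws Exception {
        OutlineDemo handler = new OutlineDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(
            "<Products><Product id='1'><Name/></Product></Products>"));
    }
}
```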

Element Data Callback

An element can contain additional elements, textual data, or a combination of the two. In SAX, the textual data within elements is sent to an application through the characters() callback. This method is passed an array of characters, the location in that array where the text for the current text node starts, and the length of the text in the text node. A parser is free to deliver one run of text in several characters() calls, so the callback may fire more than once for a single text node.

      public void characters(char[] ch, int start, int length) throws SAXException
      {
        try
        {
          String s = new String(ch, start, length).trim();
          //Skip text that consists only of whitespace
          if (!s.isEmpty())
          {
            System.out.println("Character Encountered :\"" + s + "\"");
          }
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }
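When the text of an element is needed as a whole, a common approach is to accumulate the chunks passed to characters() and read the result in endElement(). The following sketch collects the text of each <Name> element from a document shaped like Listing 13-1; the class name NameTextDemo and the namesIn() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that gathers the text content of <Name> elements.
class NameTextDemo extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    private StringBuilder text;   // non-null only while inside a <Name> element

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if ("Name".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append each chunk.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Name".equals(qName)) {
            names.add(text.toString().trim());
            text = null;
        }
    }

    static List<String> namesIn(String xml) throws Exception {
        NameTextDemo handler = new NameTextDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Products>"
                + "<Product><ProductID>1</ProductID><Name>Adjustable Race</Name></Product>"
                + "<Product><ProductID>2</ProductID><Name>Bearing Ball</Name></Product>"
                + "</Products>";
        System.out.println(namesIn(xml));
    }
}
```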

The ignorableWhitespace() Method

The ignorableWhitespace() method reports whitespace that appears in element content, such as the line breaks and indentation between child elements. The parser can classify whitespace as ignorable only when a DTD or XML schema is referenced, because the constraints in the DTD or schema specify that no character data is allowed between the start of one element and the subsequent start of another. If the reference to the DTD is removed, the same whitespace triggers the characters() callback instead of the ignorableWhitespace() callback.

      public void ignorableWhitespace(char[] ch, int start, int length)
          throws SAXException
      {
        try
        {
          //Ignores whitespaces or call the characters method
          //characters(ch, start, length);
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }

When a DTD or schema is available, the SAX parser calls ignorableWhitespace() for whitespace-only text between elements, such as whitespace used for indentation. If you want to handle that text like any other text, you can simply pass it on to the characters() method you just implemented.
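The effect of the DTD can be demonstrated by parsing the same content with and without a document type declaration. This is a minimal sketch, assuming a validating JAXP parser for the DTD case; the class name WhitespaceDemo, its fields, and the small inline DTD are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that separates ignorable whitespace from ordinary text.
class WhitespaceDemo extends DefaultHandler {
    int ignorableChars = 0;
    final StringBuilder text = new StringBuilder();

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorableChars += length;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static WhitespaceDemo parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        WhitespaceDemo handler = new WhitespaceDemo();
        factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String body = "<Products>\n  <Name>Bearing Ball</Name>\n</Products>";
        // Inline DTD: Products may contain only a Name element, no text.
        String dtd = "<!DOCTYPE Products [<!ELEMENT Products (Name)>"
                   + "<!ELEMENT Name (#PCDATA)>]>";

        WhitespaceDemo withDtd = parse(dtd + body, true);
        WhitespaceDemo withoutDtd = parse(body, false);

        System.out.println("With DTD, ignorable chars: " + withDtd.ignorableChars);
        System.out.println("Without DTD, ignorable chars: " + withoutDtd.ignorableChars);
    }
}
```

With the DTD, the indentation is reported through ignorableWhitespace() and characters() sees only the product name; without it, the indentation arrives through characters() together with the name.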

The skippedEntity() Method

The skippedEntity() method is invoked when an entity is skipped by a nonvalidating parser. The callback receives the name of the entity, which can be displayed as output. The following code shows a skippedEntity() method that prints the name of the skipped entity:

      public void skippedEntity(java.lang.String name) throws SAXException
      {
        try{
          System.out.println("Entity : " + name);
        }
        catch(Exception e){
          throw new SAXException(e.toString());
        }
      }

The setDocumentLocator() Method

The setDocumentLocator() method supplies an org.xml.sax.Locator for use in any other SAX event associated with the application. The Locator interface has several methods, such as getLineNumber() and getColumnNumber(), which return the current location of the parsing process within an XML file. The following code gives the definition of the setDocumentLocator() method, which saves the object in a locator field of the handler class:

      public void setDocumentLocator(Locator locator)
      {
        //Locator object can be saved for later use in application
        this.locator = locator;
      }
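Once the Locator is saved, other callbacks can query it for position information. The sketch below records the line number reported when each element starts; the class name LocatorDemo and the positions() helper are assumptions, and a parser is not strictly required to supply a Locator, so the code checks for null.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that notes the line on which each element starts.
class LocatorDemo extends DefaultHandler {
    private Locator locator;
    final List<String> positions = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // Called before startDocument() when the parser provides a locator.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        int line = (locator != null) ? locator.getLineNumber() : -1;
        positions.add(qName + "@" + line);
    }

    static List<String> positions(String xml) throws Exception {
        LocatorDemo handler = new LocatorDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.positions;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(positions("<Products>\n<Product/>\n</Products>"));
    }
}
```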

Handling Errors and Warnings

The ErrorHandler interface is another callback interface provided by SAX that can be implemented by an application to receive information about parsing problems as they occur. The ErrorHandler interface specifies three notification methods to be implemented by a client application:

Method       Description
warning      Called for abnormal events that are not errors or fatal errors
error        Called when the XML parser detects a recoverable error. For instance, a validating parser would report this error when a well-formed XML document violates the structural rules provided in its DTD
fatalError   Called when a non-recoverable error is recognized. Non-recoverable errors are generally violations of XML well-formedness rules (for instance, forgetting to terminate an element open tag)

The process for registering to receive ErrorHandler notifications is similar to that for registering a content handler. First, an object that implements the ErrorHandler interface must be instantiated. The new instance is then passed to the XMLReader.setErrorHandler() method so that the SAX parser is aware of its existence.

As you can see, SAX makes it easy to handle warnings and errors. You can implement the warning() method to handle warnings, the error() method to handle errors, and the fatalError() method to handle errors that the SAX parser considers fatal enough to make it stop processing. Before you handle the errors, you must invoke the setErrorHandler() method as follows:

      public void read(String fileName) throws Exception
      {
        XMLReader reader = XMLReaderFactory.createXMLReader
          ("org.apache.xerces.parsers.SAXParser");
        reader.setContentHandler(this);
        reader.setErrorHandler(this);
        reader.parse(fileName);
      }

Once you have invoked the setErrorHandler() method, you implement the notification methods as follows:

      public void warning(SAXParseException e) throws SAXException
      {
        System.out.println("Warning: ");
        displayErrorInfo(e);
      }

      public void error(SAXParseException e) throws SAXException
      {
        System.out.println("Error: ");
        displayErrorInfo(e);
      }

      public void fatalError(SAXParseException e) throws SAXException
      {
        System.out.println("Fatal error: ");
        displayErrorInfo(e);
      }

      private void displayErrorInfo(SAXParseException e)
      {
        System.out.println("    Public ID: " + e.getPublicId());
        System.out.println("    System ID: " + e.getSystemId());
        System.out.println("   Line number: " + e.getLineNumber());
        System.out.println("   Column number: " + e.getColumnNumber());
        System.out.println("   Message: " + e.getMessage());
      }

The displayErrorInfo() helper method displays the details of the exception through the various methods of the SAXParseException object.
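To see the error callbacks fire, it helps to feed the parser a document that is not well formed. The following sketch registers a handler, records each notification, and treats the exception that ends the parse as expected; the class name ErrorReportDemo and the check() helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects parsing problems instead of printing them.
class ErrorReportDemo extends DefaultHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning at line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        problems.add("error at line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) {
        problems.add("fatalError at line " + e.getLineNumber());
    }

    // Parses the XML text and returns the problems that were reported.
    static List<String> check(String xml) throws Exception {
        ErrorReportDemo handler = new ErrorReportDemo();
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parser normally stops after a fatal error; the handler
            // has already recorded the details, so nothing more to do here.
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // <Product> is never closed, which violates well-formedness.
        System.out.println(check("<Products><Product></Products>"));
    }
}
```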




Professional XML
Professional XML (Programmer to Programmer)
ISBN: 0471777773
EAN: 2147483647
Year: 2004
Pages: 215
