Generating SAX Events from a Text File


Java APIs for XML Kick Start
By Aoyon Chowdhury, Parag Choudhary

Chapter 4.  Advanced Use of SAX


So far, you have learned how to parse an XML file and handle events generated by a SAX parser. In this section, you will learn how to use the SAX events to process a text file.

To understand how to use SAX events to process a text file, you must do the following:

  1. Create a text file containing a hypothetical email message.

  2. Create a SAX parser that generates callback events for each of the elements of the email message. The application implementing the SAX parser will handle the callback events.

  3. Create the application that will use the SAX parser.

Let's look at each of these steps in detail.

Creating the SampleMailMessage.txt File

The first step is to create the text file containing the mail message. As is normally the case with mail messages, it will have the following entries:

  • From

  • To

  • CC

  • BCC

  • Sent

  • Subject

  • MailBody

To create the text file, open the text editor of your choice, enter the following code, and save the file as SampleMailMessage.txt:

From:somereader@somewhere.com
To:authors@sams.com
CC:publisher@sams.com
BCC:otherreaders@worldwide.com
Sent:04/01/2001
Subject:Your Books
MailBody:Hi, We think your books are great and are of great value to all readers. Thanks and keep the good work going.

As shown, the hypothetical mail message is from somereader@somewhere.com. It is sent to authors@sams.com and compliments the quality of their books. publisher@sams.com and otherreaders@worldwide.com are on the CC and BCC lists, respectively. Each entry must start on its own line, because the parser reads the file one line at a time.

The SAX parser you are going to write will generate callback events for each of the elements of the mail message: From, To, CC, BCC, Sent, Subject, and MailBody.
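Each line of the message has the shape Name:value. As a rough illustration of that structure only (the parser built in this section uses startsWith() tests instead), a line could be split at its first colon. The class MailLineSplit and its split() method are hypothetical names introduced for this sketch.

```java
public class MailLineSplit {
    // Hypothetical helper: splits "Name:value" at the first colon.
    // Returns null when the line contains no colon.
    static String[] split(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return null;
        }
        return new String[] { line.substring(0, colon), line.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("Subject:Your Books");
        System.out.println(parts[0] + " = " + parts[1]);
    }
}
```

Splitting at the first colon keeps any later colons (for example, inside the mail body) as part of the value.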

Creating the SAX Parser

The next step is to create the parser. The parser will be named MySAXEventGenerator. To create it, you'll need to do the following:

  1. Import the necessary packages.

  2. Implement the XMLReader interface.

  3. Create ContentHandler, String, and Attributes variables.

  4. Implement the parse() method. In the parse() method, iterate over the lines of the file and invoke the callback methods for each of the elements of the mail message.

  5. Implement the other methods of the XMLReader interface. For this example, having do-nothing methods will suffice.

Importing the Packages

The first step is to import the packages necessary for the SAX parser program to compile successfully. To create the SAX parser, the SAX packages and the io packages must be imported. The SAX packages define the interfaces and methods required for the SAX parser, and the io package is needed for the input/output operations.

To begin creating a SAX parser named MySAXEventGenerator, launch a text editor and create a file called MySAXEventGenerator.java. To import the packages, add the following lines:

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

Here we've imported the io and SAX packages, which contain the necessary methods and interfaces for creating the SAX parser.

Implementing the XMLReader Interface

The next step is to create the class declaration and implement the XMLReader interface. The XMLReader interface declares the methods that an application uses to read a document through callbacks. To create the class declaration, add the final line shown here after the import statements:

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

public class MySAXEventGenerator implements XMLReader {

Instantiating the Variables

Next, we'll declare the variables the parser needs: a reference to a ContentHandler, a String, and an Attributes object. The ContentHandler reference will hold the handler that the calling application registers with the SAX parser. The String variable will be used to pass the namespaceURI parameter in the startElement() and endElement() callback methods, and the Attributes variable will be used to pass the Attributes parameter in the startElement() callback method.

To declare the variables, add the lines that follow the class declaration in Listing 4.14.

Listing 4.14 Instantiating Variables
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

public class MySAXEventGenerator implements XMLReader {

    ContentHandler handler;

    // There are no namespaces and no attributes used in this example
    String nsURI = "";  // NamespaceURI
    Attributes atts = new AttributesImpl();

Here we've declared the reference to the ContentHandler and instantiated the String and Attributes variables. Notice that the ContentHandler reference is null, the String is empty, and the Attributes object contains no attributes. The reference is null because the calling application will use the setContentHandler() method of the XMLReader interface to set it. The other two are empty because there are no namespaces or attributes in the document to be parsed.
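The initial state of the three fields can be checked with a small stand-alone sketch. The class name InitialStateDemo is an assumption made for illustration; the field declarations mirror those in Listing 4.14.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

public class InitialStateDemo {
    ContentHandler handler;                  // stays null until a handler is registered
    String nsURI = "";                       // empty string: no namespace
    Attributes atts = new AttributesImpl();  // empty list: no attributes

    public static void main(String[] args) {
        InitialStateDemo demo = new InitialStateDemo();
        System.out.println("handler is null: " + (demo.handler == null));
        System.out.println("nsURI is empty: " + demo.nsURI.isEmpty());
        System.out.println("attribute count: " + demo.atts.getLength());
    }
}
```

Passing an empty Attributes object rather than null is the safer choice, because handlers commonly call methods such as getLength() on it without a null check.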

Implementing the parse() Method

Next we'll implement the methods of the XMLReader interface. We'll begin by implementing the parse() method, because it does almost all the work for this application.

As the name suggests, the parse() method parses the document supplied to it in an InputSource object. We'll set the parse() method to do the following:

  1. Ensure that the calling application has provided a valid ContentHandler reference. This is required to handle the callback methods generated by the parse() method.

  2. For efficiency reasons, wrap the character stream (a Reader) obtained from the InputSource in a BufferedReader.

  3. Read the first line from the SampleMailMessage.txt file.

  4. Invoke the startDocument() callback method.

  5. If the line read from the SampleMailMessage.txt file contains From, To, CC, BCC, Sent, Subject, or MailBody, invoke the startElement(), characters(), and endElement() callback methods. Because each element generates the startElement(), characters(), and endElement() callbacks, we'll create a method called output() that takes the element name and its content as a parameter and then invokes the callback method.

  6. After the file has been read completely, invoke the endDocument() callback method.

To do these tasks, add the lines of code that follow the atts declaration in Listing 4.15.

Listing 4.15 Implementing the parse() Method
    Attributes atts = new AttributesImpl();

    // Parse the input
    public void parse(InputSource input)
    throws IOException, SAXException
    {
        try {
            // A ContentHandler is required to receive the callbacks
            if (handler == null) {
                throw new SAXException("No content handler");
            }

            // Get an efficient reader for the file
            java.io.Reader r = input.getCharacterStream();
            BufferedReader br = new BufferedReader(r);
            String line = br.readLine();

            // Invoke the startDocument method
            handler.startDocument();

            // Testing for null first keeps an empty file from causing
            // a NullPointerException
            while (line != null) {
                if (line.startsWith("From:")) {
                    output("From:", line);
                }
                else if (line.startsWith("To:")) {
                    output("To:", line);
                }
                else if (line.startsWith("CC:")) {
                    output("CC:", line);
                }
                else if (line.startsWith("BCC:")) {
                    output("BCC:", line);
                }
                else if (line.startsWith("Sent:")) {
                    output("Sent:", line);
                }
                else if (line.startsWith("Subject:")) {
                    output("Subject:", line);
                }
                else if (line.startsWith("MailBody:")) {
                    output("MailBody:", line);
                }
                line = br.readLine();
            }

            handler.endDocument();
        }
        catch (IOException ioex) {
            System.out.println(ioex.getMessage());
        }
        catch (SAXException saxEx) {
            /* If there are errors in the data, the detailed message of the exception is displayed */
            System.out.println(saxEx.getMessage());
        }
    }

    // Generate the three callbacks for one element of the mail message
    void output(String name, String line)
    throws SAXException
    {
        int textLength = line.length() - name.length();
        handler.startElement(nsURI, name, name, atts);
        handler.characters(line.toCharArray(),
                           name.length(),
                           textLength);
        handler.endElement(nsURI, name, name);
    }

In the output() method, the name of the element (From:, To:, and so on, including the trailing colon) is used as both the local name and the qualified name and is passed to the startElement() and endElement() methods. The nsURI String that we instantiated earlier is passed to both methods, and the atts Attributes object is passed to startElement().
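The same sequence of callbacks can also be produced by looping over a table of prefixes rather than a chain of if/else tests. The following is a minimal sketch of that alternative, not the code from the listing: the class PrefixTableDemo and its methods emitEvents() and collect() are hypothetical names, and the input comes from an in-memory string so that the sketch runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class PrefixTableDemo {
    // "BCC:..." does not start with "CC:", so the order of these entries does not matter.
    static final String[] PREFIXES =
        { "From:", "To:", "CC:", "BCC:", "Sent:", "Subject:", "MailBody:" };

    // Reads lines and fires startElement/characters/endElement for each known prefix.
    static void emitEvents(BufferedReader br, ContentHandler handler)
            throws IOException, SAXException {
        Attributes atts = new AttributesImpl();
        handler.startDocument();
        String line;
        while ((line = br.readLine()) != null) {
            for (String name : PREFIXES) {
                if (line.startsWith(name)) {
                    handler.startElement("", name, name, atts);
                    handler.characters(line.toCharArray(), name.length(),
                                       line.length() - name.length());
                    handler.endElement("", name, name);
                    break;
                }
            }
        }
        handler.endDocument();
    }

    // Runs emitEvents() over in-memory text and records each element event as a string.
    static List<String> collect(String text) throws IOException, SAXException {
        List<String> events = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                events.add("start " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("chars " + new String(ch, start, length));
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end " + qName);
            }
        };
        emitEvents(new BufferedReader(new StringReader(text)), recorder);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collect("CC:publisher@sams.com\nBCC:otherreaders@worldwide.com\n")) {
            System.out.println(event);
        }
    }
}
```

With a table, adding a new field means adding one array entry, and a copy-and-paste slip in one branch of a long if/else chain cannot occur.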

In the call to the characters() method, the line read from the file is converted to a character array and passed as the first parameter. The length of the element name is passed as the start position in the array, and the difference between the length of the line and the length of the element name is passed as the number of characters to read from the array.
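The offset arithmetic can be seen in isolation in the following sketch. The class CharactersOffsetDemo and the method extractValue() are hypothetical names; the body rebuilds the string in the same way that the application's characters() callback does, from the array, the start position, and the length.

```java
public class CharactersOffsetDemo {
    // Returns the part of the line that follows the element name,
    // using the same (array, start, length) triple passed to characters().
    static String extractValue(String name, String line) {
        char[] ch = line.toCharArray();
        int start = name.length();                   // skip over "Sent:" and so on
        int length = line.length() - name.length();  // everything after the name
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        System.out.println(extractValue("Sent:", "Sent:04/01/2001"));
    }
}
```

For the line Sent:04/01/2001, the start position is 5 and the length is 10, so the handler receives only the date.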

Implementing Other Methods

Next we'll need to implement the remaining methods of the XMLReader interface. For this application, only the setContentHandler() and getContentHandler() methods need real implementations; the rest can simply be do-nothing methods. The methods of the XMLReader interface are as follows:

  • setContentHandler(ContentHandler handler)

  • getContentHandler()

  • setErrorHandler(ErrorHandler handler)

  • getErrorHandler()

  • parse(InputSource input)

  • parse(java.lang.String systemId)

  • getDTDHandler()

  • getEntityResolver()

  • setEntityResolver(EntityResolver resolver)

  • setDTDHandler(DTDHandler handler)

  • getProperty(java.lang.String name)

  • setProperty(java.lang.String name, java.lang.Object value)

  • setFeature(java.lang.String name, boolean value)

  • getFeature(java.lang.String name)

To implement these methods, add the lines that follow the output() method in Listing 4.16.

Listing 4.16 Implementing the XMLReader Methods
    void output(String name, String line)
    throws SAXException
    {
        .......
    }

    // Enable the application to register a content event handler.
    public void setContentHandler(ContentHandler handler) {
        this.handler = handler;
    }

    // Return the current content handler.
    public ContentHandler getContentHandler() {
        return this.handler;
    }

    // Enable the application to register an error event handler.
    public void setErrorHandler(ErrorHandler handler)
    { }

    // Return the current error handler.
    public ErrorHandler getErrorHandler()
    { return null; }

    // Parse an XML document from a system identifier (URI).
    public void parse(String systemId)
    throws IOException, SAXException
    { }

    // Return the current DTD handler.
    public DTDHandler getDTDHandler()
    { return null; }

    // Return the current entity resolver.
    public EntityResolver getEntityResolver()
    { return null; }

    // Enable the application to register an entity resolver.
    public void setEntityResolver(EntityResolver resolver)
    { }

    // Enable the application to register a DTD event handler.
    public void setDTDHandler(DTDHandler handler)
    { }

    // Get value of a property.
    public Object getProperty(String name)
    { return null; }

    // Set the value of a property.
    public void setProperty(String name, Object value)
    { }

    // Set the state of a feature.
    public void setFeature(String name, boolean value)
    { }

    // Get the value of a feature.
    public boolean getFeature(String name)
    { return false; }
}

We have now successfully created the SAX parser.

Creating the Application

The next and final step is to create the application that uses the SAX parser to parse the text file. To create the application, you need to do the following:

  1. Import the JAXP classes.

  2. Extend the DefaultHandler class.

  3. Get an instance of the parser.

  4. Set the ContentHandler.

  5. Convert the File object to an InputSource object.

  6. Call the parse() method.

  7. Override the necessary callback methods.

Importing the Packages

Launch a text editor and start creating the application that parses the SampleMailMessage.txt file. Name the application MySAXEventGeneratorReader.java. MySAXEventGeneratorReader is a simple application that parses the SampleMailMessage.txt file and displays the structure on the command window.

The first step is to import the packages necessary for the application to access the SAX and JAXP APIs. In the MySAXEventGeneratorReader.java file, add the following lines:

import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

Extending the DefaultHandler Class

Next, we need to write the class declaration by extending the DefaultHandler class and entering the main() method:

public class MySAXEventGeneratorReader extends DefaultHandler {

    static public void main(String[] args) throws Exception {

We've extended the DefaultHandler because the ContentHandler interface needs to be implemented to handle the callback methods generated by the parser. After extending the DefaultHandler class, we simply need to override the ContentHandler interface methods that are required.
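Because DefaultHandler supplies empty implementations of every callback, a subclass overrides only the methods it cares about. The following sketch, with the hypothetical class name CountingHandler, overrides startElement() alone and leaves the other callbacks as inherited do-nothing methods.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class CountingHandler extends DefaultHandler {
    private int elements;

    // The only overridden callback: counts elements as they start.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    public int getElements() {
        return elements;
    }

    public static void main(String[] args) throws SAXException {
        CountingHandler handler = new CountingHandler();
        handler.startDocument();                         // inherited, does nothing
        handler.startElement("", "From:", "From:", new AttributesImpl());
        handler.characters("x".toCharArray(), 0, 1);     // inherited, does nothing
        handler.endElement("", "From:", "From:");        // inherited, does nothing
        handler.endDocument();                           // inherited, does nothing
        System.out.println("elements seen: " + handler.getElements());
    }
}
```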

Creating an Instance of the SAX Parser

Now we'll create an instance of the SAX parser that we've implemented. To do so, add the following line of code to the main() method:

MySAXEventGenerator mySAXEventGenerator = new MySAXEventGenerator(); 
Setting the ContentHandler

Now the event handler needs to be registered with the SAX parser. To register an instance of the ContentHandler interface, add the following lines of code:

    static public void main(String[] args) throws Exception {
        .........
        MySAXEventGenerator mySAXEventGenerator = new MySAXEventGenerator();
        mySAXEventGenerator.setContentHandler(new MySAXEventGeneratorReader());

In this case, the handler is a new instance of the application class itself, because MySAXEventGeneratorReader extends the DefaultHandler class.

Converting the File Object to an InputSource Object

The next step is to convert the file object that is to be parsed to an InputSource object, because the parse() method takes the InputSource object as its only parameter.

Converting the File object to an InputSource object is a three-step process. First, you need to convert the File object to a FileReader object. Next, the FileReader object has to be converted to a BufferedReader object. Finally, the BufferedReader object has to be converted into the InputSource object.

To convert the File object to an InputSource object, add the last four lines of code shown here:

    static public void main(String[] args) throws Exception {
        .........
        mySAXEventGenerator.setContentHandler(new MySAXEventGeneratorReader());

        File f = new File("SampleMailMessage.txt");
        FileReader fr = new FileReader(f);
        BufferedReader br = new BufferedReader(fr);
        InputSource inputSource = new InputSource(br);
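The parse() method in this example reads only the character stream of the InputSource, so the InputSource has to be constructed from a Reader. The sketch below illustrates the point with an in-memory StringReader in place of the file; the class InputSourceDemo and its firstLine() method are hypothetical names used for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;

public class InputSourceDemo {
    // Returns the first line of the source's character stream,
    // or fails clearly when the source was not built from a Reader.
    static String firstLine(InputSource source) throws IOException {
        Reader reader = source.getCharacterStream();
        if (reader == null) {
            throw new IOException("InputSource has no character stream");
        }
        return new BufferedReader(reader).readLine();
    }

    public static void main(String[] args) throws IOException {
        // Built from a Reader: the character stream is available.
        InputSource fromReader = new InputSource(new StringReader("From:somereader@somewhere.com\n"));
        System.out.println(firstLine(fromReader));

        // Built from a system identifier only: no character stream is set.
        InputSource fromSystemId = new InputSource("SampleMailMessage.txt");
        System.out.println(fromSystemId.getCharacterStream() == null);
    }
}
```

A StringReader is also a convenient way to exercise the parser in a test without creating a file on disk.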

Next, invoke the parse() method.

Invoking the parse() Method

The parse() method parses the InputSource object and invokes the necessary callbacks. To invoke the parse() method, add the try block and its catch clauses shown in Listing 4.17.

Listing 4.17 Invoking the parse() Method
    static public void main(String[] args) throws Exception {
        .........
        InputSource inputSource = new InputSource(br);

        try {
            /* Parse the text file: the file is read and the overridden callbacks in MySAXEventGeneratorReader are invoked */
            mySAXEventGenerator.parse(inputSource);
        }
        catch (SAXParseException saxException) {
            /* Errors in the data are trapped and the location is displayed */
            System.out.println("\n\nError in SampleMailMessage.txt at line:"
                + saxException.getLineNumber()
                + "(" + saxException.getColumnNumber() + ")\n");
            System.out.println(saxException.toString());
        }
        catch (SAXException saxEx) {
            /* If there are errors in the data, the detailed message of the exception is displayed */
            System.out.println(saxEx.getMessage());
        }
    } //End of main

After the parsing is started, events are generated whenever an element such as From, To, and so on, is found. To handle these, we'll need to override the necessary callback methods.

Overriding the Callback Methods

Looking into the SAX parser code, you will find that the only callback methods that the SAX parser invokes are the startDocument(), startElement(), endElement(), characters(), and endDocument() methods. Therefore, you need to override these five methods in the application.

To override these methods, add the lines of code that follow the end of the main() method in Listing 4.18.

Listing 4.18 Implementing the Callback Methods
    } //End of main

    public void startDocument()
    {
        System.out.println("\n Start Document: -----Reading SampleMailMessage with MySAXEventGeneratorReader------\n");
    }

    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes elementAttributes)
    {
        System.out.println("Start Element-> " + qualifiedName);
    }

    public void characters(char[] ch, int start, int length)
    {
        System.out.println("Characters: " + new String(ch, start, length));
    }

    public void endElement(String namespaceURI, String localName,
                           String qualifiedName)
    {
        System.out.println("End Element-> " + qualifiedName);
    }

    public void endDocument()
    {
        System.out.println("\n End Document: ----------------Finished Reading the document---------------------\n");
    }
}
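Printing is only one thing a handler can do with these events. As a sketch of another option, the handler below gathers each element's text into a map keyed by the element name. The class FieldCollector and its getFields() method are hypothetical names, and main() drives the callbacks by hand so that the sketch does not depend on the parser or on a file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class FieldCollector extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                 // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);    // a parser may deliver text in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        fields.put(qName, text.toString());
    }

    public Map<String, String> getFields() {
        return fields;
    }

    public static void main(String[] args) {
        FieldCollector collector = new FieldCollector();
        char[] line = "Subject:Your Books".toCharArray();
        collector.startElement("", "Subject:", "Subject:", new AttributesImpl());
        collector.characters(line, 8, line.length - 8);
        collector.endElement("", "Subject:", "Subject:");
        System.out.println(collector.getFields());
    }
}
```

Accumulating text in a buffer between startElement() and endElement() is the usual SAX idiom, because the SAX contract allows a parser to split one run of text across several characters() calls.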

NOTE

The code discussed here is available in the example0405 folder. This folder also contains the sample CarParts.xml file.


You can now compile and run the program. The output should be similar to Listing 4.19.

Listing 4.19 Output of MySAXEventGeneratorReader
Version 1.0 of MySAXEventGeneratorReader

 Start Document: -----Reading SampleMailMessage with MySAXEventGeneratorReader------

Start Element-> From:
Characters: somereader@somewhere.com
End Element-> From:
Start Element-> To:
Characters: authors@sams.com
End Element-> To:
Start Element-> CC:
Characters: publisher@sams.com
End Element-> CC:
Start Element-> BCC:
Characters: otherreaders@worldwide.com
End Element-> BCC:
Start Element-> Sent:
Characters: 04/01/2001
End Element-> Sent:
Start Element-> Subject:
Characters: Your Books
End Element-> Subject:
Start Element-> MailBody:
Characters: Hi, We think your books are great and are of great value to all readers. Thanks and keep the good work going.
End Element-> MailBody:

 End Document: ----------------Finished Reading the document-------------------

We've successfully created a SAX parser that generates SAX events from a text file.


JAX: Java APIs for XML Kick Start
ISBN: 0672324342
EAN: 2147483647
Year: 2002
Pages: 133
