Generating SAX Events from a Text File

So far, you have learned how to parse an XML file and handle the events generated by a SAX parser. In this section, you will learn how to generate SAX events from a plain text file, so that an application can process it in the same way it would process an XML document. This involves the following steps: -
Create a text file containing a hypothetical email message. -
Create a SAX parser that generates callback events for each of the elements of the email message. The application that uses the SAX parser will handle the callback events. -
Create the application that will use the SAX parser.

Let's look at each of these steps in detail.

Creating the SampleMailMessage.txt File

The first step is to create the text file containing the mail message. As is normally the case with mail messages, it will have the following entries: -
From -
To -
CC -
BCC -
Sent -
Subject -
MailBody

To create the text file, open the text editor of your choice, enter the following text, and save the file as SampleMailMessage.txt. Each entry must occupy a single line, because the parser reads the file one line at a time:

    From:somereader@somewhere.com
    To:authors@sams.com
    CC:publisher@sams.com
    BCC:otherreaders@worldwide.com
    Sent:04/01/2001
    Subject:Your Books
    MailBody:Hi, We think your books are great and are of great value to all readers. Thanks and keep the good work going.

As shown, the hypothetical mail message is from somereader@somewhere.com. It is sent to authors@sams.com, complimenting the quality of their books. publisher@sams.com and otherreaders@worldwide.com are on the CC and BCC lists, respectively. The SAX parser you are going to write will generate callback events for each of the elements of the mail message: From, To, CC, BCC, Sent, Subject, and MailBody.

Creating the SAX Parser

The next step is to create the parser. The parser will be named MySAXEventGenerator. To create it, you'll need to do the following: -
Import the necessary packages. -
Implement the XMLReader interface. -
Create ContentHandler, String, and Attributes variables. -
Implement the parse() method. In the parse() method, iterate over the lines of the file and invoke the callback methods for each of the elements of the mail message. -
Implement the other methods of the XMLReader interface. For this example, having do-nothing methods will suffice.

Importing the Packages

The first step is to import the packages necessary for the SAX parser program to compile successfully. To create the SAX parser, the SAX packages and the io package must be imported. The SAX packages define the interfaces and methods required for the SAX parser, and the io package is needed for the input/output operations. To begin creating a SAX parser named MySAXEventGenerator, launch a text editor and create a file called MySAXEventGenerator.java. To import the packages, add the following lines:

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.AttributesImpl;

Here we've imported the io and SAX packages, which contain the classes and interfaces needed to create the SAX parser.

Implementing the XMLReader Interface

The next step is to create the class declaration and implement the XMLReader interface. The XMLReader interface provides the necessary methods to read a document using callbacks. To do so, add the class declaration after the import statements:

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.AttributesImpl;

    public class MySAXEventGenerator implements XMLReader {

Instantiating the Variables

Next, we'll declare a reference to ContentHandler and instantiate a String and an Attributes variable. The ContentHandler reference will store the handler instance that the calling application registers with the SAX parser. The String variable will be used to pass the namespaceURI parameter in the startElement() and endElement() callback methods, and the Attributes variable will be used to pass the Attributes parameter in the startElement() callback method. To instantiate the variables, add the declarations that follow the class declaration in Listing 4.14.
Listing 4.14 Instantiating Variables

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.AttributesImpl;

    public class MySAXEventGenerator implements XMLReader {
        ContentHandler handler;

        // There are no namespaces and no attributes used in this example
        String nsURI = ""; // NamespaceURI
        Attributes atts = new AttributesImpl();

Here we've declared the reference to the ContentHandler and instantiated the String and Attributes variables. Notice that none of them carries any data yet: the handler reference is null, the namespace URI is an empty string, and the attribute list is empty. The handler is null because the calling application will use the setContentHandler() method of the XMLReader interface to set the reference to the ContentHandler. The other two are empty because there are no namespaces and no attributes in the document to be parsed.

Implementing the parse() Method

Next we'll implement the methods of the XMLReader interface. We'll begin by implementing the parse() method, because it does almost all the work for this application. As the name suggests, the parse() method parses the document supplied by the InputSource. We'll set the parse() method to do the following: -
Ensure that the calling application has provided a valid ContentHandler reference. This is required to handle the callback methods generated by the parse() method. -
For efficiency reasons, wrap the character stream of the InputSource in a BufferedReader. -
Read the first line from the SampleMailMessage.txt file. -
Invoke the startDocument() callback method. -
If the line read from the SampleMailMessage.txt file contains From, To, CC, BCC, Sent, Subject, or MailBody, invoke the startElement(), characters(), and endElement() callback methods. Because each element generates the startElement(), characters(), and endElement() callbacks, we'll create a method called output() that takes the element name and its content as a parameter and then invokes the callback method. -
After the file has been read completely, invoke the endDocument() callback method.

To do these tasks, add the parse() and output() methods shown in Listing 4.15.

Listing 4.15 Implementing the parse() Method

    Attributes atts = new AttributesImpl();

    // Parse the input
    public void parse(InputSource input) throws IOException, SAXException {
        try {
            // The calling application must have registered a content handler
            if (handler == null) {
                throw new SAXException("No content handler");
            }
            // Get an efficient reader for the file
            java.io.Reader r = input.getCharacterStream();
            BufferedReader br = new BufferedReader(r);
            String line = br.readLine();
            // Invoke the startDocument method
            handler.startDocument();
            do {
                if (line.startsWith("From:")) {
                    output("From:", line);
                } else if (line.startsWith("To:")) {
                    output("To:", line);
                } else if (line.startsWith("CC:")) {
                    output("CC:", line);
                } else if (line.startsWith("BCC:")) {
                    output("BCC:", line);
                } else if (line.startsWith("Sent:")) {
                    output("Sent:", line);
                } else if (line.startsWith("Subject:")) {
                    output("Subject:", line);
                } else if (line.startsWith("MailBody:")) {
                    output("MailBody:", line);
                }
            } while (null != (line = br.readLine()));
            handler.endDocument();
        } catch (IOException ioex) {
            System.out.println(ioex.getMessage());
        } catch (SAXException saxEx) {
            /* If there are errors in the data, the detailed message
               of the exception is displayed */
            System.out.println(saxEx.getMessage());
        }
    }

    // Generate the three callbacks for one element of the mail message
    void output(String name, String line) throws SAXException {
        int textLength = line.length() - name.length();
        handler.startElement(nsURI, name, name, atts);
        handler.characters(line.toCharArray(), name.length(), textLength);
        handler.endElement(nsURI, name, name);
    }

In the output() method, the name of the element (such as From:, To:, and so on, including the trailing colon) is taken as both the qualified name and the local name and then passed to the startElement() and endElement() methods. The nsURI String and the atts Attributes objects that we had instantiated earlier are also passed on as parameters.
The nsURI value is passed to both startElement() and endElement(), whereas atts is passed only to startElement(), because endElement() takes no attributes. In the call to characters(), the line read from the file is converted to a character array and passed as the first parameter. The second parameter, the start position in the array, is the length of the element name. The third parameter, the number of characters to be read from the array, is the length of the line minus the length of the element name. For the line To:authors@sams.com, for example, the start position is 3 and the number of characters is 16.

Implementing Other Methods

Next we'll need to implement the other methods of the XMLReader interface. The parse(InputSource) method is already in place. Of the remaining methods, only getContentHandler() and setContentHandler() do real work. We'll simply implement the rest as do-nothing methods. The complete set of XMLReader methods is: -
setContentHandler(ContentHandler handler) -
getContentHandler() -
setErrorHandler(ErrorHandler handler) -
getErrorHandler() -
parse(InputSource input) -
parse(java.lang.String systemId) -
getDTDHandler() -
getEntityResolver() -
setEntityResolver(EntityResolver resolver) -
setDTDHandler(DTDHandler handler) -
getProperty(java.lang.String name) -
setProperty(java.lang.String name, java.lang.Object value) -
setFeature(java.lang.String name, boolean value) -
getFeature(java.lang.String name)

To implement these methods, add the lines that follow the output() method in Listing 4.16.

Listing 4.16 Implementing the XMLReader Methods

    void output(String name, String line) throws SAXException {
        .......
    }

    // Enable the application to register a content event handler.
    public void setContentHandler(ContentHandler handler) {
        this.handler = handler;
    }

    // Return the current content handler.
    public ContentHandler getContentHandler() {
        return this.handler;
    }

    // Enable the application to register an error event handler.
    public void setErrorHandler(ErrorHandler handler) { }

    // Return the current error handler.
    public ErrorHandler getErrorHandler() { return null; }

    // Parse an XML document from a system identifier (URI).
    public void parse(String systemId) throws IOException, SAXException { }

    // Return the current DTD handler.
    public DTDHandler getDTDHandler() { return null; }

    // Return the current entity resolver.
    public EntityResolver getEntityResolver() { return null; }

    // Enable the application to register an entity resolver.
    public void setEntityResolver(EntityResolver resolver) { }

    // Enable the application to register a DTD event handler.
    public void setDTDHandler(DTDHandler handler) { }

    // Get the value of a property.
    public Object getProperty(String name) { return null; }

    // Set the value of a property.
    public void setProperty(String name, Object value) { }

    // Set the state of a feature.
    public void setFeature(String name, boolean value) { }

    // Get the value of a feature.
    public boolean getFeature(String name) { return false; }
    }

We have now successfully created the SAX parser.

Creating the Application

The next and final step is to create the application that uses the SAX parser to parse the text file. To create the application, you need to do the following: -
Import the JAXP classes. -
Extend the DefaultHandler class. -
Get an instance of the parser. -
Set the ContentHandler. -
Convert the File object to an InputSource object. -
Call the parse() method. -
Override the necessary callback methods.

Importing the Packages

Launch a text editor and start creating the application that parses the SampleMailMessage.txt file. Name the file MySAXEventGeneratorReader.java. MySAXEventGeneratorReader is a simple application that parses the SampleMailMessage.txt file and displays its structure in the command window. The first step is to import the packages necessary for the application to access the SAX and JAXP APIs. In the MySAXEventGeneratorReader.java file, add the following lines:

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import java.io.*;

Extending the DefaultHandler Class

Next, we need to write the class declaration by extending the DefaultHandler class and entering the main() method:

    public class MySAXEventGeneratorReader extends DefaultHandler {
        static public void main(String[] args) throws Exception {

We've extended DefaultHandler because the ContentHandler interface needs to be implemented to handle the callbacks generated by the parser. DefaultHandler already provides empty implementations of all the ContentHandler methods, so after extending it we simply need to override the methods that are required.

Creating an Instance of the SAX Parser

Now we'll create an instance of the SAX parser that we've implemented. To do so, add the following line of code to the main() method:

    MySAXEventGenerator mySAXEventGenerator = new MySAXEventGenerator();

Setting the ContentHandler

Now the event handler needs to be registered with the SAX parser. To register an instance of the ContentHandler interface, add the call to setContentHandler() shown on the last line:

    static public void main(String[] args) throws Exception {
        .........
        MySAXEventGenerator mySAXEventGenerator = new MySAXEventGenerator();
        mySAXEventGenerator.setContentHandler(new MySAXEventGeneratorReader());

In this case, the handler is an instance of the application class itself, because it extends the DefaultHandler class.
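The registration step can be tried in isolation. The following is a minimal sketch, not part of the chapter's code: the class name HandlerRegistrationSketch and its elementNames() helper are hypothetical, and the JDK's built-in XMLReader (obtained through JAXP) stands in for MySAXEventGenerator so that the example is self-contained. It shows a DefaultHandler subclass that overrides a single callback and is registered with setContentHandler().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration only.
class HandlerRegistrationSketch extends DefaultHandler {
    // Element names in the order the parser reports them
    private final List<String> seen = new ArrayList<>();

    // The only callback we care about; DefaultHandler leaves the rest empty
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        seen.add(qName);
    }

    // Parse the given XML text and return the element names that were reported
    static List<String> elementNames(String xml) {
        try {
            HandlerRegistrationSketch handler = new HandlerRegistrationSketch();
            // Assumption: the JDK parser stands in for MySAXEventGenerator
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints [mail, From, To]
        System.out.println(
            elementNames("<mail><From>a@b.c</From><To>d@e.f</To></mail>"));
    }
}
```

Because setContentHandler() is declared by the XMLReader interface, the same handler object could be registered with MySAXEventGenerator in the same way.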
Converting the File Object to an InputSource Object

The next step is to convert the File object that is to be parsed to an InputSource object, because the parse() method takes the InputSource object as its only parameter. Converting the File object to an InputSource object is a three-step process. First, you need to wrap the File object in a FileReader object. Next, the FileReader object has to be wrapped in a BufferedReader object. Finally, the BufferedReader object has to be wrapped in the InputSource object. To do so, add the four lines that follow the call to setContentHandler():

    static public void main(String[] args) throws Exception {
        .........
        mySAXEventGenerator.setContentHandler(new MySAXEventGeneratorReader());
        File f = new File("SampleMailMessage.txt");
        FileReader fr = new FileReader(f);
        BufferedReader br = new BufferedReader(fr);
        InputSource inputSource = new InputSource(br);

Next, invoke the parse() method.

Invoking the parse() Method

The parse() method parses the InputSource object and generates the necessary callbacks. To invoke the parse() method, add the try block shown in Listing 4.17.

Listing 4.17 Invoking the parse() Method

    static public void main(String[] args) throws Exception {
        .........
        InputSource inputSource = new InputSource(br);
        try {
            /* Parse the document - the document is read and the overridden
               callbacks in MySAXEventGeneratorReader are invoked */
            mySAXEventGenerator.parse(inputSource);
        } catch (SAXParseException saxException) {
            /* Errors in the data are trapped and their location is displayed */
            System.out.println("\n\nError in SampleMailMessage.txt at line:"
                + saxException.getLineNumber()
                + "(" + saxException.getColumnNumber() + ")\n");
            System.out.println(saxException.toString());
        } catch (SAXException saxEx) {
            /* If there are errors in the data, the detailed message
               of the exception is displayed */
            System.out.println(saxEx.getMessage());
        }
    } //End of main

After the parsing is started, events are generated whenever an element such as From, To, and so on, is found. To handle these, we'll need to override the necessary callback methods.

Overriding the Callback Methods

Looking into the SAX parser code, you will find that the only callback methods that the SAX parser invokes are the startDocument(), startElement(), endElement(), characters(), and endDocument() methods. Therefore, you need to override these five methods in the application. To override these methods, add the lines of code that follow the end of main() in Listing 4.18.
Listing 4.18 Implementing the Callback Methods

    } //End of main

    public void startDocument() {
        System.out.println("\n Start Document: -----Reading SampleMailMessage with MySAXEventGeneratorReader------\n");
    }

    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes elementAttributes) {
        System.out.println("Start Element-> " + qualifiedName);
    }

    public void characters(char[] ch, int start, int length) {
        System.out.println("Characters: " + new String(ch, start, length));
    }

    public void endElement(String namespaceURI, String localName,
                           String qualifiedName) {
        System.out.println("End Element-> " + qualifiedName);
    }

    public void endDocument() {
        System.out.println("\n End Document: ----------------Finished Reading the document---------------------\n");
    }
    }

NOTE
The code discussed here is available in the example0405 folder. This folder also contains the sample CarParts.xml file.

You can now compile and run the program. The output should be similar to the listing shown in Listing 4.19.

Listing 4.19 Output of MySAXEventGeneratorReader

    Version 1.0 of MySAXEventGeneratorReader

     Start Document: -----Reading SampleMailMessage with MySAXEventGeneratorReader------

    Start Element-> From:
    Characters: somereader@somewhere.com
    End Element-> From:
    Start Element-> To:
    Characters: authors@sams.com
    End Element-> To:
    Start Element-> CC:
    Characters: publisher@sams.com
    End Element-> CC:
    Start Element-> BCC:
    Characters: otherreaders@worldwide.com
    End Element-> BCC:
    Start Element-> Sent:
    Characters: 04/01/2001
    End Element-> Sent:
    Start Element-> Subject:
    Characters: Your Books
    End Element-> Subject:
    Start Element-> MailBody:
    Characters: Hi, We think your books are great and are of great value to all readers. Thanks and keep the good work going.
    End Element-> MailBody:

     End Document: ----------------Finished Reading the document-------------------

We've successfully created a SAX parser that generates SAX events from a text file.
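As a closing aside, the do-nothing methods of the XMLReader interface can be avoided by extending org.xml.sax.helpers.XMLFilterImpl, which already implements XMLReader, and overriding only parse(InputSource). The sketch below is an alternative under that assumption, not the chapter's implementation; the names MailEventSketch, FIELDS, and events() are hypothetical. It reads lines from a character stream and generates startElement(), characters(), and endElement() callbacks for every line that begins with a known field name, using the same offset arithmetic as the output() method.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical class; XMLFilterImpl supplies the remaining XMLReader methods.
class MailEventSketch extends XMLFilterImpl {
    private static final String[] FIELDS =
        {"From:", "To:", "CC:", "BCC:", "Sent:", "Subject:", "MailBody:"};

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler handler = getContentHandler();
        if (handler == null) {
            throw new SAXException("No content handler");
        }
        BufferedReader br = new BufferedReader(input.getCharacterStream());
        handler.startDocument();
        String line;
        while ((line = br.readLine()) != null) {
            for (String name : FIELDS) {
                if (line.startsWith(name)) {
                    char[] chars = line.toCharArray();
                    handler.startElement("", name, name, new AttributesImpl());
                    // Skip the field name; report only the text after it
                    handler.characters(chars, name.length(),
                                       chars.length - name.length());
                    handler.endElement("", name, name);
                    break;
                }
            }
        }
        handler.endDocument();
    }

    // Run the sketch over some text and record the callbacks it generates
    static List<String> events(String text) {
        List<String> log = new ArrayList<>();
        MailEventSketch reader = new MailEventSketch();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local,
                                     String qName, Attributes a) {
                log.add("start " + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                log.add("chars " + new String(ch, start, length));
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(text)));
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("To:authors@sams.com\nBCC:otherreaders@worldwide.com")
            .forEach(System.out::println);
    }
}
```

The trade-off is that XMLFilterImpl is designed for filtering another reader's events, so using it without a parent reader is a convenience rather than its intended purpose. Implementing XMLReader directly, as this section does, makes every method's behavior explicit.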