The previous chapter was all about using Java and the XML DOM. Some people, however, find using the DOM difficult and see the whole concept of treating an XML document as a tree as unnecessarily complex. Rather than having to navigate through the whole document, they say, wouldn't it be great if the whole document came to you? That's the idea behind the Simple API for XML (SAX), and this chapter is dedicated to it. SAX really is a lot easier to use for much, possibly even most, of the XML parsing that you have to do.
You may be surprised to learn that we've already been putting the idea behind SAX to work throughout the entire previous chapter. You may recall that in that chapter, I set up a recursive method named display that was called for every node in the DOM tree. In display, I used a switch statement to make things easier. That switch statement had case statements to handle different types of nodes:
    public static void display(Node node, String indent) {
        if (node == null) {
            return;
        }

        int type = node.getNodeType();

        switch (type) {
            case Node.DOCUMENT_NODE: {
                displayStrings[numberDisplayLines] = indent;
                displayStrings[numberDisplayLines] +=
                    "<?xml version=\"1.0\" encoding=\"" + "UTF-8" + "\"?>";
                numberDisplayLines++;
                display(((Document)node).getDocumentElement(), "");
                break;
            }

            case Node.ELEMENT_NODE: {
                displayStrings[numberDisplayLines] = indent;
                displayStrings[numberDisplayLines] += "<";
                displayStrings[numberDisplayLines] += node.getNodeName();

                int length = (node.getAttributes() != null) ?
                    node.getAttributes().getLength() : 0;
                Attr attributes[] = new Attr[length];
                for (int loopIndex = 0; loopIndex < length; loopIndex++) {
                    attributes[loopIndex] =
                        (Attr)node.getAttributes().item(loopIndex);
                }
                . . .
I was able to add the code that handled elements to one case statement, the code to handle processing instructions to another case statement, and so on.
In essence, we were handling XML documents the same way that SAX does. Instead of navigating through the document ourselves, we let the document come to us, with the code dispatching to a different case statement for each kind of node in the document. That's what SAX does. It's event-based, which means that when the SAX parser encounters an element, it treats that as an event and calls the code that you have specified for elements; when it encounters a processing instruction, it treats that as an event and calls the code that you have specified for processing instructions; and so on. In this way, you don't have to navigate through the document yourself; it comes to you. The fact that we've already based a significant amount of programming on this technique indicates how useful it is.
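To make the event idea concrete, here is a minimal sketch that records each callback as the parser makes it. It is an illustration under stated assumptions rather than this chapter's own program: it uses the JAXP SAXParserFactory built into the JDK instead of the XML for Java parser so that it runs without extra JAR files, and the class name EventTrace and the tiny inline document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: records every SAX event it receives, in order.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String rawName,
                             Attributes attributes) {
        events.add("start:" + rawName);
    }

    @Override
    public void endElement(String uri, String localName, String rawName) {
        events.add("end:" + rawName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses an in-memory document and returns the events in the order received.
    static List<String> trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            // Assumption: the JDK's built-in JAXP parser stands in for Xerces here.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<DOCUMENT><CUSTOMER/></DOCUMENT>")) {
            System.out.println(event);
        }
    }
}
```

The program never walks a tree; it only reacts to what the parser reports, which is the same division of labor the switch statement gave us in the previous chapter.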
The XML for Java package from alphaWorks that we used in the previous chapter (http://www.alphaworks.ibm.com/tech/xml4j) also supports SAX. That means you can use the same JAR files in this chapter that we used in the previous chapter; just make sure that they're added to your CLASSPATH like this in Windows (and make the command all one line):
    C:\>SET CLASSPATH=%CLASSPATH%;C:\xmlparser\XML4J_3_0_1\xerces.jar;C:\xmlparser\XML4J_3_0_1\xercesSamples.jar
Or, use the -classpath switch, as discussed in the previous chapter:
    %javac -classpath C:\xmlparser\XML4J_3_0_1\xerces.jar;C:\xmlparser\XML4J_3_0_1\xercesSamples.jar browser.java
    %java -classpath C:\xmlparser\XML4J_3_0_1\xerces.jar;C:\xmlparser\XML4J_3_0_1\xercesSamples.jar;. browser
This first example will show how to work with SAX. In this case, I'll use SAX to count the number of <CUSTOMER> elements in customer.xml, just as the first example in the previous chapter did. Here's customer.xml:
    <?xml version = "1.0" standalone="yes"?>
    <DOCUMENT>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2001</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>$1.25</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Oranges</PRODUCT>
                    <NUMBER>24</NUMBER>
                    <PRICE>$4.98</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Jones</LAST_NAME>
                <FIRST_NAME>Polly</FIRST_NAME>
            </NAME>
            <DATE>October 20, 2001</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Bread</PRODUCT>
                    <NUMBER>12</NUMBER>
                    <PRICE>$14.95</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Apples</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>$1.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Weber</LAST_NAME>
                <FIRST_NAME>Bill</FIRST_NAME>
            </NAME>
            <DATE>October 25, 2001</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Asparagus</PRODUCT>
                    <NUMBER>12</NUMBER>
                    <PRICE>$2.95</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Lettuce</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>$11.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
    </DOCUMENT>
Here, I'll base the new program on a new class named FirstParserSAX. We'll need an object of that class to pass to the SAX parser so that it can call the methods in that object when it encounters elements, the start of the document, the end of the document, and so on. I begin by creating an object of the FirstParserSAX class named SAXHandler:
    import org.xml.sax.*;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX {
        public static void main(String[] args) {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            . . .
        }
    }
Next, I create the actual SAX parser that we'll work with. This parser is an object of the org.apache.xerces.parsers.SAXParser class (just like the DOM parser objects we worked with in the previous chapter were objects of the org.apache.xerces.parsers.DOMParser class). To use the SAXParser class, I import that class and the supporting classes in the org.xml.sax package, and I'm free to create a new SAX parser named parser:
    import org.xml.sax.*;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX {
        public static void main(String[] args) {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            . . .
        }
    }
The SAXParser class is derived from the XMLParser class, which in turn is based on the java.lang.Object class:
    java.lang.Object
      |
      +--org.apache.xerces.framework.XMLParser
            |
            +--org.apache.xerces.parsers.SAXParser
The constructor of the SAXParser class is SAXParser(); the methods of the SAXParser class are listed in Table 12.1. The constructor of the XMLParser class is protected XMLParser(); the methods of the XMLParser class are listed in Table 12.2.
Table 12.1 Methods of the SAXParser class

Method | Description |
---|---|
void attlistDecl(int elementTypeIndex, int attrNameIndex, int attType, java.lang.String enumString, int attDefaultType, int attDefaultValue) | Callback for attribute type declarations |
void characters(char[] ch, int start, int length) | Callback for characters that specifies the characters in an array |
void comment(int dataIndex) | Callback for comments |
void commentInDTD(int dataIndex) | Callback for comments in a DTD |
void elementDecl(int elementType, XMLValidator.ContentSpec contentSpec) | Callback for an element type declaration |
void endCDATA() | Callback for the end of CDATA sections |
void endDocument() | Callback for the end of a document |
void endDTD() | Callback for the end of the DTD |
void endElement(int elementType) | Callback for the end of an element |
void endEntityReference(int entityName, int entityType, int entityContext) | Callback for the end of an entity reference |
void endNamespaceDeclScope(int prefix) | Callback for the end of the scope of a namespace declaration |
void externalEntityDecl(int entityName, int publicId, int systemId) | Callback for a parsed external general entity declaration |
void externalPEDecl(int entityName, int publicId, int systemId) | Callback for a parsed external parameter entity declaration |
ContentHandler getContentHandler() | Instruction to get the content handler |
protected DeclHandler getDeclHandler() | Instruction to get the DTD declaration event handler |
DTDHandler getDTDHandler() | Instruction to get the current DTD handler |
boolean getFeature(java.lang.String featureId) | Instruction to get the state of a parser feature |
java.lang.String[] getFeaturesRecognized() | Instruction to get a list of features that the parser recognizes |
protected LexicalHandler getLexicalHandler() | Instruction to get the lexical handler for this parser |
protected boolean getNamespacePrefixes() | Instruction to get the value of namespace prefixes |
java.lang.String[] getPropertiesRecognized() | Instruction to get a list of properties that the parser recognizes |
java.lang.Object getProperty(java.lang.String propertyId) | Instruction to get the value of a property |
void ignorableWhitespace(char[] ch, int start, int length) | Callback for ignorable whitespace |
void internalEntityDecl(int entityName, int entityValue) | Callback for an internal general entity declaration |
void internalPEDecl(int entityName, int entityValue) | Callback for an internal parameter entity declaration |
void internalSubset(int internalSubset) | Callback from DOM Level 2 |
void notationDecl(int notationName, int publicId, int systemId) | Callback for notification of a notation declaration event |
void processingInstruction(int piTarget, int piData) | Callback for processing instructions |
void processingInstructionInDTD(int piTarget, int piData) | Callback for processing instructions in DTD |
void setContentHandler(ContentHandler handler) | Instruction to set a content handler to let an application handle SAX events |
protected void setDeclHandler(DeclHandler handler) | Instruction to set the DTD declaration event handler |
void setDocumentHandler(DocumentHandler handler) | Instruction to set the document handler |
void setDTDHandler(DTDHandler handler) | Instruction to set the DTD handler |
void setFeature(java.lang.String featureId, boolean state) | Instruction to set the state of any feature |
protected void setLexicalHandler(LexicalHandler handler) | Instruction to set the lexical event handler |
protected void setNamespacePrefixes(boolean process) | Specifier for how the parser reports raw prefixed names, as well as if xmlns attributes are reported |
void setProperty(java.lang.String propertyId, java.lang.Object value) | Instruction to set the value of any property |
void startCDATA() | Callback for the start of a CDATA section |
void startDocument(int versionIndex, int encodingIndex, int standaloneIndex) | Callback for the start of the document |
void startDTD(int rootElementType, int publicId, int systemId) | Callback for a <!DOCTYPE > declaration |
void startElement(int elementType, XMLAttrList attrList, int attrListIndex) | Callback for the start of an element |
void startEntityReference(int entityName, int entityType, int entityContext) | Callback for the start of an entity reference |
void startNamespaceDeclScope(int prefix, int uri) | Callback for the start of the scope of a namespace declaration |
void unparsedEntityDecl(int entityName, int publicId, int systemId, int notationName) | Callback for an unparsed entity declaration event |
Table 12.2 Methods of the XMLParser class

Method | Description |
---|---|
void addRecognizer(org.apache.xerces.readers.XMLDeclRecognizer recognizer) | Adds a recognizer |
abstract void attlistDecl(int elementType, int attrName, int attType, java.lang.String enumString, int attDefaultType, int attDefaultValue) | Serves as a callback for an attribute list declaration |
void callCharacters(int ch) | Calls the characters callback |
void callComment(int comment) | Calls the comment callback |
void callEndDocument() | Calls the end document callback |
boolean callEndElement(int readerId) | Calls the end element callback |
void callProcessingInstruction(int target, int data) | Calls the processing instruction callback |
void callStartDocument(int version, int encoding, int standalone) | Calls the start document callback |
void callStartElement(int elementType) | Calls the start element callback |
org.apache.xerces.readers.XMLEntityHandler.EntityReader changeReaders() | Is called by the reader subclasses at the end of input |
abstract void characters(char[] ch, int start, int length) | Serves as a callback for characters |
abstract void characters(int data) | Serves as a callback for characters using string pools |
abstract void comment(int comment) | Serves as a callback for comment |
void commentInDTD(int comment) | Serves as a callback for comment in DTD |
abstract void elementDecl(int elementType, XMLValidator.ContentSpec contentSpec) | Serves as a callback for an element declaration |
abstract void endCDATA() | Serves as a callback for the end of the CDATA section |
abstract void endDocument() | Serves as a callback for the end of a document |
abstract void endDTD() | Serves as a callback for the end of the DTD |
abstract void endElement(int elementType) | Serves as a callback for the end of an element |
void endEntityDecl() | Serves as a callback for the end of an entity declaration |
abstract void endEntityReference(int entityName, int entityType, int entityContext) | Serves as a callback for the end of an entity reference |
abstract void endNamespaceDeclScope(int prefix) | Serves as a callback for the end of a namespace declaration scope |
java.lang.String expandSystemId(java.lang.String systemId) | Expands a system ID and returns it as a URL |
abstract void externalEntityDecl(int entityName, int publicId, int systemId) | Serves as a callback for an external general entity declaration |
abstract void externalPEDecl(int entityName, int publicId, int systemId) | Serves as a callback for an external parameter entity declaration |
protected boolean getAllowJavaEncodings() | Is True if Java encoding names are allowed in the XML document |
int getColumnNumber() | Gives the column number of the current position in the document |
protected boolean getContinueAfterFatalError() | Is True if the parser will continue after a fatal error |
org.apache.xerces.readers.XMLEntityHandler.EntityReader getEntityReader() | Gets the entity reader |
EntityResolver getEntityResolver() | Gets the current entity resolver |
ErrorHandler getErrorHandler() | Gets the current error handler |
boolean getFeature(java.lang.String featureId) | Gets the state of a feature |
java.lang.String[] getFeaturesRecognized() | Gets a list of features recognized by this parser |
int getLineNumber() | Gets the current line number in the document |
Locator getLocator() | Gets the locator used by the parser |
protected boolean getNamespaces() | Is True if the parser preprocesses namespaces |
java.lang.String[] getPropertiesRecognized() | Gets the list of recognized properties for the parser |
java.lang.Object getProperty(java.lang.String propertyId) | Gets the value of a property |
java.lang.String getPublicId() | Gets the public ID of the InputSource |
protected org.apache.xerces.validators.schema.XSchemaValidator getSchemaValidator() | Gets the current XML schema validator |
java.lang.String getSystemId() | Gets the system ID of the InputSource |
protected boolean getValidation() | Is True if validation is turned on |
protected boolean getValidationDynamic() | Is True if validation is determined based on whether a document contains a grammar |
protected boolean getValidationWarnOnDuplicateAttdef() | Is True if an error is created when an attribute is redefined in the grammar |
protected boolean getValidationWarnOnUndeclaredElemdef() | Is True if the parser creates an error when an undeclared element is referenced |
abstract void ignorableWhitespace(char[] ch, int start, int length) | Serves as a callback for ignorable whitespace |
abstract void ignorableWhitespace(int data) | Serves as a callback for ignorable whitespace based on string pools |
abstract void internalEntityDecl(int entityName, int entityValue) | Serves as a callback for an internal general entity declaration |
abstract void internalPEDecl(int entityName, int entityValue) | Serves as a callback for an internal parameter entity declaration |
abstract void internalSubset(int internalSubset) | Supports DOM Level 2 internalSubsets |
boolean isFeatureRecognized(java.lang.String featureId) | Is True if the given feature is recognized |
boolean isPropertyRecognized(java.lang.String propertyId) | Is True if the given property is recognized |
abstract void notationDecl(int notationName, int publicId, int systemId) | Serves as a callback for a notation declaration |
void parse(InputSource source) | Parses the given input source |
void parse(java.lang.String systemId) | Parses the input source given by a system identifier |
boolean parseSome() | Supports application-driven parsing |
boolean parseSomeSetup(InputSource source) | Sets up application-driven parsing |
void processCharacters(char[] chars, int offset, int length) | Processes character data given a character array |
void processCharacters(int data) | Processes character data |
abstract void processingInstruction(int target, int data) | Serves as a callback for processing instructions |
void processingInstructionInDTD(int target, int data) | Serves as a callback for processing instructions in a DTD |
void processWhitespace(char[] chars, int offset, int length) | Processes whitespace |
void processWhitespace(int data) | Processes whitespace based on string pools |
void reportError(Locator locator, java.lang.String errorDomain, int majorCode, int minorCode, java.lang.Object[] args, int errorType) | Reports errors |
void reset() | Resets the parser so that it can be reused |
protected void resetOrCopy() | Resets or copies the parser |
int scanAttributeName(org.apache.xerces.readers.XMLEntityHandler.EntityReader entityReader, int elementType) | Scans an attribute name |
int scanAttValue(int elementType, int attrName) | Scans an attribute value |
void scanDoctypeDecl(boolean standalone) | Scans a doctype declaration |
int scanElementType(org.apache.xerces.readers.XMLEntityHandler.EntityReader entityReader, char fastchar) | Scans an element type |
boolean scanExpectedElementType(org.apache.xerces.readers.XMLEntityHandler.EntityReader entityReader, char fastchar) | Scans an expected element type |
protected void setAllowJavaEncodings(boolean allow) | Supports the use of Java encoding names |
protected void setContinueAfterFatalError(boolean continueAfterFatalError) | Lets the parser continue after fatal errors |
void setEntityResolver(EntityResolver resolver) | Specifies the resolver (resolves external entities) |
void setErrorHandler(ErrorHandler handler) | Sets the error handler |
void setFeature(java.lang.String featureId, boolean state) | Sets the state of a feature |
void setLocale(java.util.Locale locale) | Sets the locale |
void setLocator(Locator locator) | Sets the locator |
protected void setNamespaces(boolean process) | Specifies whether the parser preprocesses namespaces |
void setProperty(java.lang.String propertyId, java.lang.Object value) | Sets the value of a property |
void setReaderFactory(org.apache.xerces.readers.XMLEntityReaderFactory readerFactory) | Sets the reader factory |
protected void setSendCharDataAsCharArray(boolean flag) | Sets character data processing preferences |
void setValidating(boolean flag) | Indicates to the parser that you are validating |
protected void setValidation(boolean validate) | Specifies whether the parser validates |
protected void setValidationDynamic(boolean dynamic) | Lets the parser validate a document only if it contains a grammar |
protected void setValidationWarnOnDuplicateAttdef(boolean warn) | Specifies whether an error is created when attributes are redefined in the grammar |
protected void setValidationWarnOnUndeclaredElemdef(boolean warn) | Specifies whether the parser causes an error when an element's content model references an element by name that is not declared |
abstract void startCDATA() | Serves as a callback for the start of the CDATA section |
abstract void startDocument(int version, int encoding, int standAlone) | Serves as a callback for the start of the document |
abstract void startDTD(int rootElementType, int publicId, int systemId) | Serves as a callback for the start of the DTD |
abstract void startElement(int elementType, XMLAttrList attrList, int attrListHandle) | Serves as a callback for the start of an element |
boolean startEntityDecl(boolean isPE, int entityName) | Serves as a callback for the start of an entity declaration |
abstract void startEntityReference(int entityName, int entityType, int entityContext) | Serves as a callback for the start of an entity reference |
abstract void startNamespaceDeclScope(int prefix, int uri) | Serves as a callback for the start of a namespace declaration scope |
boolean startReadingFromDocument(InputSource source) | Starts reading from a document |
boolean startReadingFromEntity(int entityName, int readerDepth, int context) | Starts reading from an external entity |
void startReadingFromExternalSubset(java.lang.String publicId, java.lang.String systemId, int readerDepth) | Starts reading from an external DTD subset |
void stopReadingFromExternalSubset() | Stops reading from an external DTD subset |
abstract void unparsedEntityDecl(int entityName, int publicId, int systemId, int notationName) | Serves as a callback for unparsed entity declarations |
boolean validEncName(java.lang.String encoding) | Is True if the given encoding is valid |
boolean validVersionNum(java.lang.String version) | Is True if the given version is valid |
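The getFeature and setFeature methods listed in these tables take a feature identifier, which in SAX2 is a URI string. The following is a minimal sketch of setting a feature and reading it back. It assumes the JDK's built-in JAXP parser (reached through SAXParserFactory and the standard XMLReader interface) in place of the Xerces SAXParser class, and the class name FeatureToggle is hypothetical. The identifier shown is the standard SAX2 namespaces feature.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical class: sets a SAX2 feature on a reader and reads it back.
class FeatureToggle {
    // Standard SAX2 feature identifier for namespace processing.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    static boolean setAndRead(boolean state) {
        try {
            // Assumption: the JDK's built-in parser stands in for Xerces here.
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, state);
            return reader.getFeature(NAMESPACES);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespaces feature is now: " + setAndRead(true));
    }
}
```

A parser that does not recognize a feature identifier reports that with an exception rather than ignoring the request, which is why the calls sit inside a try block.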
We have a SAXParser object now, and we need to register the SAXHandler object we created with the SAXParser object so the methods of the SAXHandler object are called when the parser starts the document, finds a new element, and so forth. SAX parsers call quite a number of methods, such as those for elements, processing instructions, declarations in DTDs, and so on. The methods a SAX parser calls to inform you that a new item has been found in the document are called callback methods, and you must register those methods with the SAX parser.
Four core SAX interfaces support the various callback methods:
EntityResolver implements customized handling for external entities.
DTDHandler handles DTD events.
ContentHandler handles the content of a document, such as elements and processing instructions.
ErrorHandler handles errors that occur while parsing.
There are many callback methods in these interfaces, and if you implement an interface directly, you have to implement all of its methods. SAX makes this easier with a helper class called DefaultHandler (org.xml.sax.helpers.DefaultHandler, which ships with the XML for Java package); it provides default implementations of all the required callback methods. The constructor of the DefaultHandler class is DefaultHandler(); its methods are listed in Table 12.3.
Table 12.3 Methods of the DefaultHandler class

Method | Description |
---|---|
void characters(char[] ch, int start, int length) | Callback for character data inside an element |
void endDocument() | Callback for the end of the document |
void endElement(java.lang.String uri, java.lang.String localName, java.lang.String rawName) | Callback for the end of an element |
void endPrefixMapping(java.lang.String prefix) | Callback for the end of a namespace mapping |
void error(SAXParseException e) | Callback for a recoverable parser error |
void fatalError(SAXParseException e) | Callback for a fatal XML parsing error |
void ignorableWhitespace(char[] ch, int start, int length) | Callback for ignorable whitespace in element content |
void notationDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId) | Callback for a notation declaration |
void processingInstruction(java.lang.String target, java.lang.String data) | Callback for a processing instruction |
InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId) | Callback for an external entity |
void setDocumentLocator(Locator locator) | Sets a Locator object for document events |
void skippedEntity(java.lang.String name) | Callback for a skipped entity |
void startDocument() | Callback for the beginning of the document |
void startElement(java.lang.String uri, java.lang.String localName, java.lang.String rawName, Attributes attributes) | Callback for the start of an element |
void startPrefixMapping(java.lang.String prefix, java.lang.String uri) | Callback for the start of a namespace mapping |
void unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName) | Callback for an unparsed entity declaration |
void warning(SAXParseException e) | Callback for parser warnings |
If you base your program on the DefaultHandler class, you need to implement only the callback methods you're interested in, so I'll derive the main class of this program, FirstParserSAX, from the DefaultHandler class, which you do with the extends keyword:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX extends DefaultHandler {
        public static void main(String[] args) {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            . . .
        }
    }
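As a small check of what extending DefaultHandler buys you, the sketch below overrides a single callback and then confirms that the resulting object still satisfies all four core SAX interfaces. The class name ElementNameLogger and the helper method fillsAllFourRoles are hypothetical names introduced for illustration only.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: overrides one callback and inherits all the others.
class ElementNameLogger extends DefaultHandler {
    final StringBuilder names = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String rawName,
                             Attributes attributes) {
        names.append(rawName).append(' ');
    }

    // True if the object can be registered in every one of the four SAX roles.
    static boolean fillsAllFourRoles(Object handler) {
        return handler instanceof EntityResolver
            && handler instanceof DTDHandler
            && handler instanceof ContentHandler
            && handler instanceof ErrorHandler;
    }

    public static void main(String[] args) {
        System.out.println(fillsAllFourRoles(new ElementNameLogger())); // prints true
    }
}
```

Because the inherited methods do nothing by default (apart from fatalError, which rethrows the exception it is given), events you have not overridden are simply ignored.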
Now we're ready to register the SAXHandler object with the SAX parser. In this case, I'm not going to worry about handling DTD events or resolving external entities; I'll just handle the document's content and any errors with the setContentHandler and setErrorHandler methods:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX extends DefaultHandler {
        public static void main(String[] args) {
            FirstParserSAX SAXHandler = new FirstParserSAX();
            SAXParser parser = new SAXParser();
            parser.setContentHandler(SAXHandler);
            parser.setErrorHandler(SAXHandler);
            . . .
        }
    }
This registers the SAXHandler object so that it will receive SAX content and error events. I'll add the methods that will be called after finishing the main method.
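For comparison, here is a sketch of registration that fills all four handler roles, including the two this example skips. It goes through the standard org.xml.sax.XMLReader interface obtained from the JDK's JAXP factory, which is an assumption made so the code runs without the Xerces JAR files; the class and method names are hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: registers one handler object in all four SAX roles.
class RegisterHandlers {
    static XMLReader readerWith(DefaultHandler handler) {
        try {
            // Assumption: the JDK's built-in parser stands in for Xerces here.
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);  // elements, text, processing instructions
            reader.setErrorHandler(handler);    // warnings, errors, fatal errors
            reader.setDTDHandler(handler);      // notation and unparsed entity declarations
            reader.setEntityResolver(handler);  // external entity lookup
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the reader hands back the same content handler that was registered.
    static boolean isRegistered(DefaultHandler handler) {
        return readerWith(handler).getContentHandler() == handler;
    }

    public static void main(String[] args) {
        System.out.println(isRegistered(new DefaultHandler()));
    }
}
```

Registering a handler in a role you do not care about is harmless, because the inherited DefaultHandler methods for that role do nothing.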
To actually parse the XML document, you use the parse method of the parser object. I'll let the user specify the name of the document to parse on the command line and pass args[0] to the parse method. (Note that you don't need to pass the name of a local file to the parse method; you can pass the URL of a document on the Internet, and the parse method will retrieve that document.) The parse method can throw Java exceptions, which means that you have to enclose it in a try block, which has a subsequent catch block:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX extends DefaultHandler {
        public static void main(String[] args) {
            try {
                FirstParserSAX SAXHandler = new FirstParserSAX();
                SAXParser parser = new SAXParser();
                parser.setContentHandler(SAXHandler);
                parser.setErrorHandler(SAXHandler);
                parser.parse(args[0]);
            }
            catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
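The catch block above treats every failure the same way. As a sketch of what the individual exceptions mean, the following version separates them. It assumes the JDK's JAXP parser and an in-memory InputSource rather than a file name or URL, and the class name ParseOutcome is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: parses a document and describes how the parse ended.
class ParseOutcome {
    static String tryParse(String xml) {
        try {
            // Assumption: the JDK's built-in parser stands in for Xerces here.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            // Thrown for documents that are not well formed; carries a position.
            return "not well-formed at line " + e.getLineNumber();
        } catch (SAXException | ParserConfigurationException e) {
            return "parser problem: " + e.getMessage();
        } catch (IOException e) {
            return "could not read document: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<DOCUMENT><CUSTOMER/></DOCUMENT>"));
        System.out.println(tryParse("<DOCUMENT><CUSTOMER></DOCUMENT>"));
    }
}
```

SAXParseException is a subclass of SAXException, so its catch clause has to come first.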
That completes the main method, so I'll implement the methods that are called when the SAX parser parses the XML document. In this case, the goal is to determine how many <CUSTOMER> elements the document has, so I implement the startElement method like this:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX extends DefaultHandler {
        public void startElement(String uri, String localName,
            String rawName, Attributes attributes) {
            . . .
        }

        public static void main(String[] args) {
            try {
                FirstParserSAX SAXHandler = new FirstParserSAX();
                SAXParser parser = new SAXParser();
                parser.setContentHandler(SAXHandler);
                parser.setErrorHandler(SAXHandler);
                parser.parse(args[0]);
            }
            catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
The startElement method is called each time the SAX parser sees the start of an element, and the endElement method is called when the SAX parser sees the end of an element.
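Because every startElement call is eventually matched by an endElement call, a handler can keep track of where it is in the document with nothing more than a counter. Here is a minimal sketch of that pairing, assuming the JDK's JAXP parser; the class name DepthTracker and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: uses paired start/end callbacks to measure nesting depth.
class DepthTracker extends DefaultHandler {
    private int depth = 0;
    private int maxDepth = 0;

    @Override
    public void startElement(String uri, String localName, String rawName,
                             Attributes attributes) {
        depth++;                                // one level deeper
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String rawName) {
        depth--;                                // back out one level
    }

    static int maxDepthOf(String xml) {
        try {
            DepthTracker handler = new DepthTracker();
            // Assumption: the JDK's built-in parser stands in for Xerces here.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.maxDepth;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(maxDepthOf("<A><B><C/></B><D/></A>")); // prints 3
    }
}
```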
Note that two element names are passed to the startElement method: localName and rawName. You use the localName argument with namespace processing; this argument holds the name of the element without any namespace prefix. The rawName argument holds the full, qualified name of the element, including any namespace prefix.
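The difference between the two names shows up only when a document uses namespace prefixes, which customer.xml does not. The sketch below records all three name arguments for a small prefixed document. It assumes the JDK's JAXP parser with namespace processing switched on; the class name NameParts, the prefix c, and the namespace URI are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: records uri, localName, and rawName for each element.
class NameParts extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String rawName,
                             Attributes attributes) {
        seen.add(uri + "|" + localName + "|" + rawName);
    }

    static List<String> describe(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // needed for uri and localName to be filled in
            NameParts handler = new NameParts();
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical namespace URI, used only for illustration.
        String xml = "<c:CUSTOMER xmlns:c=\"http://example.com/customers\">"
                   + "<NAME/></c:CUSTOMER>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

For the prefixed element, localName is CUSTOMER while rawName is c:CUSTOMER; for the unprefixed NAME element, the two are the same and uri is empty.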
We're just going to count the number of <CUSTOMER> elements, so I'll take a look at the element's rawName argument. If that argument equals "CUSTOMER", I'll increment a variable named customerCount:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX extends DefaultHandler {
        int customerCount = 0;

        public void startElement(String uri, String localName,
            String rawName, Attributes attributes) {
            if (rawName.equals("CUSTOMER")) {
                customerCount++;
            }
        }

        public static void main(String[] args) {
            try {
                FirstParserSAX SAXHandler = new FirstParserSAX();
                SAXParser parser = new SAXParser();
                parser.setContentHandler(SAXHandler);
                parser.setErrorHandler(SAXHandler);
                parser.parse(args[0]);
            }
            catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
How do you know when you've reached the end of the document and there are no more <CUSTOMER> elements to count? You use the endDocument method, which is called when the end of the document is reached. I'll display the number of tallied <CUSTOMER> elements in that method:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class FirstParserSAX extends DefaultHandler {
        int customerCount = 0;

        public void startElement(String uri, String localName,
            String rawName, Attributes attributes) {
            if (rawName.equals("CUSTOMER")) {
                customerCount++;
            }
        }

        public void endDocument() {
            System.out.println("The document has "
                + customerCount + " <CUSTOMER> elements.");
        }

        public static void main(String[] args) {
            try {
                FirstParserSAX SAXHandler = new FirstParserSAX();
                SAXParser parser = new SAXParser();
                parser.setContentHandler(SAXHandler);
                parser.setErrorHandler(SAXHandler);
                parser.parse(args[0]);
            }
            catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
You can compile and run this program like this:
    %javac FirstParserSAX.java
    %java FirstParserSAX customer.xml
    The document has 3 <CUSTOMER> elements.
And that's all it takes to get started with SAX.
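If you don't have the XML for Java JAR files at hand, the same counting technique can be sketched with the JAXP parser that ships with the JDK. This is a variant under that assumption, not the program developed above: the class name CustomerCounter is hypothetical, and the document is passed as a string so the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: counts <CUSTOMER> elements using the JDK's own parser.
class CustomerCounter extends DefaultHandler {
    private int customerCount = 0;

    @Override
    public void startElement(String uri, String localName, String rawName,
                             Attributes attributes) {
        if ("CUSTOMER".equals(rawName)) {
            customerCount++;
        }
    }

    static int count(String xml) {
        try {
            CustomerCounter handler = new CustomerCounter();
            // With JAXP, the handler is passed to parse instead of being
            // registered through setContentHandler and setErrorHandler.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.customerCount;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<DOCUMENT><CUSTOMER/><CUSTOMER/><CUSTOMER/></DOCUMENT>";
        System.out.println("The document has " + count(xml)
            + " <CUSTOMER> elements.");
    }
}
```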
In this next example, as in the previous chapter, I'm going to write a program that parses and displays an entire document, indenting each element, processing instruction, and so on, as well as displaying attributes and their values. Here, however, I'll use SAX methods, not DOM methods. If you pass customer.xml to this program, which I'll call IndentingParserSAX.java, the program will display the whole document properly indented.
I start by letting the user specify what document to parse and then parsing that document as before. To actually parse the document, I'll call a new method, displayDocument, from the main method. The displayDocument method will fill the array displayStrings with the formatted document, and the main method will print it out:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class IndentingParserSAX extends DefaultHandler {
        public static void displayDocument(String uri) {
            . . .
        }

        public static void main(String args[]) {
            displayDocument(args[0]);

            for (int index = 0; index < numberDisplayLines; index++) {
                System.out.println(displayStrings[index]);
            }
        }
    }
In the displayDocument method, I'll create a SAX parser and register an object of the program's main class with that parser so that the methods of the object will be called for SAX events:
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.apache.xerces.parsers.SAXParser;

    public class IndentingParserSAX extends DefaultHandler {
        public static void displayDocument(String uri) {
            try {
                IndentingParserSAX SAXHandler = new IndentingParserSAX();
                SAXParser parser = new SAXParser();
                parser.setContentHandler(SAXHandler);
                parser.setErrorHandler(SAXHandler);
                parser.parse(uri);
            }
            catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }

        public static void main(String args[]) {
            displayDocument(args[0]);

            for (int index = 0; index < numberDisplayLines; index++) {
                System.out.println(displayStrings[index]);
            }
        }
    }
All that's left is to create the various methods that will be called for SAX events, and I'll start with the beginning of the document.
When the SAX parser encounters the beginning of the document to parse, it calls the startDocument method. This method is not passed any arguments, so I'll just have the program display the XML declaration. As in the previous chapter, I'll store the text to display in the array of String objects named displayStrings, our location in that array in the integer variable numberDisplayLines, and the current indentation level in a String object named indent. Using an array of strings like this will facilitate the conversion process when we adapt this program to display in a Java window.
Here's how I add the XML declaration to the display strings in startDocument:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public static void displayDocument(String uri) { try { IndentingParserSAX SAXHandler = new IndentingParserSAX(); SAXParser parser = new SAXParser(); parser.setContentHandler(SAXHandler); parser.setErrorHandler(SAXHandler); parser.parse(uri); } catch (Exception e) { e.printStackTrace(System.err); } } public void startDocument() { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\""+ "UTF-8" + "\"?>"; numberDisplayLines++; } . . . public static void main(String args[]) { displayDocument(args[0]); for(int index = 0; index < numberDisplayLines; index++){ System.out.println(displayStrings[index]); } } }
I'll take a look at handling processing instructions next.
You can handle processing instructions with the processingInstruction callback. This method is called with two arguments: the processing instruction's target and its data. For example, in the processing instruction <?xml-stylesheet type="text/css" href="style.css"?>, the target is xml-stylesheet and the data is type="text/css" href="style.css".
Here's how I handle processing instructions, adding them to the display strings. Note that I check first to make sure there is some data before adding it to the processing instruction's display:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public void processingInstruction(String target, String data) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?"; displayStrings[numberDisplayLines] += target; if (data != null && data.length() > 0) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += data; } displayStrings[numberDisplayLines] += "?>"; numberDisplayLines++; } public static void main(String args[]) { . . . } }
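The split between target and data can be checked with a small, self-contained sketch. The class name PiSketch and the helper collect are hypothetical, and the sketch assumes the JAXP SAXParserFactory from the standard Java library, not the Xerces SAXParser class used in the listings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; rebuilds each processing instruction it is handed.
public class PiSketch extends DefaultHandler {
    private final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        StringBuilder text = new StringBuilder("<?").append(target);
        // Only add the separating space when there is data to show.
        if (data != null && data.length() > 0) {
            text.append(' ').append(data);
        }
        instructions.add(text.append("?>").toString());
    }

    // Parses XML held in a string and returns the rebuilt instructions.
    public static List<String> collect(String xml) {
        PiSketch handler = new PiSketch();
        try {
            // Assumption: JAXP is used here in place of the Xerces SAXParser class.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.instructions;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<?xml-stylesheet type=\"text/css\" href=\"style.css\"?>"
                + "<DOCUMENT/>";
        for (String instruction : collect(xml)) {
            System.out.println(instruction);
        }
    }
}
```

The XML declaration is not a processing instruction and is not reported through this callback, which is why the chapter's program writes it out by hand in startDocument.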
You can handle the start of elements with the startElement method. Because we've found a new element, I'll add four spaces to the current indentation to handle any children the element has, and I'll display its name using the rawName argument:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public void startElement(String uri, String localName, String rawName, Attributes attributes) { displayStrings[numberDisplayLines] = indent; indent += "    "; displayStrings[numberDisplayLines] += '<'; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public static void main(String args[]) { . . . } }
That's enough to display the opening tag of an element, but what if the element has attributes?
One of the arguments passed to the startElement method is an object that implements the Attributes interface:
public void startElement(String uri, String localName, String rawName, Attributes attributes) { . . . }
This object gives you access to the attributes of the element; you can find the methods of the Attributes interface in Table 12.4. You can reach the attributes in an object that implements this interface based on index, name, or namespace-qualified name.
Method | Description |
---|---|
int getIndex(java.lang.String rawName) | Gets the index of an attribute given its raw name |
int getIndex(java.lang.String uri, java.lang.String localPart) | Gets the index of an attribute by namespace and local name |
int getLength() | Gets the number of attributes in the list |
java.lang.String getLocalName(int index) | Gets an attribute's local name by index |
java.lang.String getRawName(int index) | Gets an attribute's raw name by index |
java.lang.String getType(int index) | Gets an attribute's type by index |
java.lang.String getType(java.lang.String rawName) | Gets an attribute's type by raw name |
java.lang.String getType(java.lang.String uri, java.lang.String localName) | Gets an attribute's type by namespace and local name |
java.lang.String getURI(int index) | Gets an attribute's namespace URI by index |
java.lang.String getValue(int index) | Gets an attribute's value by index |
java.lang.String getValue(java.lang.String rawName) | Gets an attribute's value by raw name |
java.lang.String getValue(java.lang.String uri, java.lang.String localName) | Gets an attribute's value by namespace name and local name |
So how do we find and display all the attributes an element has? I'll find the number of attributes using the Attributes interface's getLength method, and then I'll get the names and values of the attributes with the getRawName and getValue methods, referring to attributes by index. Note that I first make sure that the attributes argument is not null before using it. (Later releases of the SAX 2.0 API name this method getQName, so use that name if getRawName does not compile with your parser.)
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public void startElement(String uri, String localName, String rawName, Attributes attributes) { displayStrings[numberDisplayLines] = indent; indent += "    "; displayStrings[numberDisplayLines] += '<'; displayStrings[numberDisplayLines] += rawName; if (attributes != null) { int numberAttributes = attributes.getLength(); for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += attributes.getRawName(loopIndex); displayStrings[numberDisplayLines] += "=\""; displayStrings[numberDisplayLines] += attributes.getValue(loopIndex); displayStrings[numberDisplayLines] += '"'; } } displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public static void main(String args[]) { . . . } }
That's all it takes; now we're handling the element's attributes as well.
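The index-based loop can be tried on its own with the following sketch. It makes a few assumptions that are not part of this chapter's code: the class name AttributeSketch and the helper openingTags are hypothetical, the parser comes from the JAXP SAXParserFactory in the standard Java library, and the attribute name is read with getQName, which is the name the standard org.xml.sax.Attributes interface uses for the raw (qualified) name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; rebuilds the opening tag of every element.
public class AttributeSketch extends DefaultHandler {
    private final List<String> tags = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        StringBuilder tag = new StringBuilder("<").append(qName);
        // Walk the attribute list by index, reading each name and value.
        for (int i = 0; i < attributes.getLength(); i++) {
            tag.append(' ')
               .append(attributes.getQName(i))   // raw (qualified) name
               .append("=\"")
               .append(attributes.getValue(i))
               .append('"');
        }
        tags.add(tag.append('>').toString());
    }

    // Parses XML held in a string and returns the rebuilt opening tags.
    public static List<String> openingTags(String xml) {
        AttributeSketch handler = new AttributeSketch();
        try {
            // Assumption: JAXP is used here in place of the Xerces SAXParser class.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.tags;
    }

    public static void main(String[] args) {
        String xml = "<PRODUCT ID = \"5231\" TYPE = \"3133\">Lettuce</PRODUCT>";
        for (String tag : openingTags(xml)) {
            System.out.println(tag);
        }
    }
}
```

SAX does not promise that attributes arrive in document order, so code that needs one particular attribute is better off asking for it by name than relying on its index.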
Many of the elements in customer.xml contain text, such as the <FIRST_NAME> and <LAST_NAME> elements, and we want to display that text. To handle element text, you use the characters callback.
This method is called with three arguments: an array of type char that holds the character text, the starting location of the text in that array, and the length of the text. Always use the start and length values instead of assuming that the text begins at index 0 or fills the array, because the parser may hand you a slice of a larger internal buffer, and it may deliver one run of text in several calls.
To add the text inside an element to the display strings, I implement the characters method, converting the character array to a Java String object named characterData like this. Note that I use the String class's trim method to trim leading and trailing whitespace from the text:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public void characters(char characters[], int start, int length) { String characterData = (new String(characters, start, length)).trim(); . . . } public static void main(String args[]) { . . . } }
To eliminate indentation text (the spaces used to indent the elements in the file customer.xml), I add an if statement and then add the text itself to the display strings this way:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public void characters(char characters[], int start, int length) { String characterData = (new String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += characterData; numberDisplayLines++; } } public static void main(String args[]) { . . . } }
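One caveat about the characters callback: a SAX parser is allowed to deliver a single run of text in several calls, so trimming each chunk on its own can split a value across display lines. The following is a minimal sketch of one way to guard against that, not the approach used in this chapter's listings. It accumulates text in a StringBuilder and emits it when the element ends. The class name TextSketch and the helper collect are made up for illustration, and the sketch assumes the JAXP SAXParserFactory from the standard Java library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; gathers the text of each leaf element.
public class TextSketch extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // Discard indentation seen before this element started.
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy only the slice the parser points at; a value may arrive in pieces.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.add(qName + "=" + text);
        }
        buffer.setLength(0);
    }

    // Parses XML held in a string and returns "ELEMENT=text" entries.
    public static List<String> collect(String xml) {
        TextSketch handler = new TextSketch();
        try {
            // Assumption: JAXP is used here in place of the Xerces SAXParser class.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.texts;
    }

    public static void main(String[] args) {
        String xml = "<NAME>\n    <LAST_NAME>Smith &amp; Sons</LAST_NAME>\n"
                + "    <FIRST_NAME>Sam</FIRST_NAME>\n</NAME>";
        for (String entry : collect(xml)) {
            System.out.println(entry);
        }
    }
}
```

This sketch is aimed at documents like customer.xml, where text appears only in leaf elements. Mixed content, where text and child elements are interleaved, would need a buffer per open element.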
By default, the XML for Java SAX parser reports the whitespace a document uses for indentation, which is called "ignorable" whitespace, through the characters callback just like any other text.
So how do you actually ignore "ignorable" whitespace? It's easier to do with the SAX parser than with the DOM parser. The SAX parser only needs to know what text it can ignore, so you must indicate the proper grammar of the document, which you can do by adding a DTD to customer.xml:
<?xml version = "1.0" standalone="yes"?> <!DOCTYPE DOCUMENT [ <!ELEMENT DOCUMENT (CUSTOMER)*> <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)> <!ELEMENT NAME (LAST_NAME,FIRST_NAME)> <!ELEMENT LAST_NAME (#PCDATA)> <!ELEMENT FIRST_NAME (#PCDATA)> <!ELEMENT DATE (#PCDATA)> <!ELEMENT ORDERS (ITEM)*> <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)> <!ELEMENT PRODUCT (#PCDATA)> <!ELEMENT NUMBER (#PCDATA)> <!ELEMENT PRICE (#PCDATA)> ]> <DOCUMENT> <CUSTOMER> <NAME> <LAST_NAME>Smith</LAST_NAME> <FIRST_NAME>Sam</FIRST_NAME> </NAME> <DATE>October 15, 2001</DATE> <ORDERS> <ITEM> <PRODUCT>Tomatoes</PRODUCT> <NUMBER>8</NUMBER> <PRICE>$1.25</PRICE> </ITEM> <ITEM> <PRODUCT>Oranges</PRODUCT> <NUMBER>24</NUMBER> <PRICE>$4.98</PRICE> </ITEM> </ORDERS> </CUSTOMER> <CUSTOMER> <NAME> <LAST_NAME>Jones</LAST_NAME> <FIRST_NAME>Polly</FIRST_NAME> </NAME> <DATE>October 20, 2001</DATE> <ORDERS> <ITEM> <PRODUCT>Bread</PRODUCT> <NUMBER>12</NUMBER> <PRICE>$14.95</PRICE> </ITEM> <ITEM> <PRODUCT>Apples</PRODUCT> <NUMBER>6</NUMBER> <PRICE>$1.50</PRICE> </ITEM> </ORDERS> </CUSTOMER> <CUSTOMER> <NAME> <LAST_NAME>Weber</LAST_NAME> <FIRST_NAME>Bill</FIRST_NAME> </NAME> <DATE>October 25, 2001</DATE> <ORDERS> <ITEM> <PRODUCT>Asparagus</PRODUCT> <NUMBER>12</NUMBER> <PRICE>$2.95</PRICE> </ITEM> <ITEM> <PRODUCT ID = "5231" TYPE = "3133">Lettuce</PRODUCT> <NUMBER>6</NUMBER> <PRICE>$11.50</PRICE> </ITEM> </ORDERS> </CUSTOMER> </DOCUMENT>
Now, the SAX parser will not call the characters callback when it sees ignorable whitespace (such as indentation spaces); it will call a method named ignorableWhitespace. That means you can comment out the if statement I used to filter out ignorable whitespace before:
public void characters(char characters[], int start, int length) { String characterData = (new String(characters, start, length)).trim(); //if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += characterData; numberDisplayLines++; //} }
That's all it takes to filter out ignorable whitespace: just give the SAX parser some way of figuring out what is ignorable, such as adding a DTD to your document.
Note that you can add code to the ignorableWhitespace method to handle that whitespace if you like. In fact, you can even pass it on to the characters callback, as I'm doing here:
public void ignorableWhitespace(char characters[], int start, int length) { characters(characters, start, length); }
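The effect of the DTD can be observed with a small experiment. This is a sketch under stated assumptions, not part of the chapter's program: the names WhitespaceSketch, SAMPLE, and parse are hypothetical, the parser comes from the JAXP SAXParserFactory in the standard Java library, and validation is switched on because the SAX specification only requires validating parsers to report ignorable whitespace (non-validating parsers may do so as well).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; separates real text from ignorable whitespace.
public class WhitespaceSketch extends DefaultHandler {
    // A tiny document whose DTD says DOCUMENT may contain only a NAME element.
    public static final String SAMPLE =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE DOCUMENT [\n"
            + "<!ELEMENT DOCUMENT (NAME)>\n"
            + "<!ELEMENT NAME (#PCDATA)>\n"
            + "]>\n"
            + "<DOCUMENT>\n"
            + "    <NAME>Smith</NAME>\n"
            + "</DOCUMENT>";

    private final StringBuilder text = new StringBuilder();
    private int ignorableChunks = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Indentation inside element-only content arrives here instead.
        ignorableChunks++;
    }

    public String getText() {
        return text.toString();
    }

    public int getIgnorableChunks() {
        return ignorableChunks;
    }

    // Parses XML held in a string with validation enabled.
    public static WhitespaceSketch parse(String xml) {
        WhitespaceSketch handler = new WhitespaceSketch();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // assumption: validate so whitespace is classified
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        WhitespaceSketch result = parse(SAMPLE);
        System.out.println("Text seen by characters: [" + result.getText() + "]");
        System.out.println("Ignorable chunks: " + result.getIgnorableChunks());
    }
}
```

With the DTD in place, the indentation around <NAME> never reaches characters, so the text buffer holds only the element's actual content.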
So far, we've handled the start of each element and incremented the indentation level each time to handle any possible children. We also must display the end tag for each element and decrement the indentation level; I'll do that in the endElement callback, which is called each time the SAX parser reaches the end of an element. Here's what that looks like in code:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public void endElement(String uri, String localName, String rawName) { indent = indent.substring(0, indent.length() - 4); displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "</"; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public static void main(String args[]) { . . . } }
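The pairing of startElement and endElement is what keeps the indentation balanced. As an alternative to growing and shrinking the indent string, the same bookkeeping can be sketched with an integer depth counter. This is only an illustration of the idea: the class name IndentSketch and the helpers indent and format are hypothetical, and the sketch assumes the JAXP SAXParserFactory from the standard Java library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; indents tags by nesting depth.
public class IndentSketch extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();
    private int depth = 0;

    // Builds four spaces of indentation per nesting level.
    private static String indent(int depth) {
        StringBuilder spaces = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            spaces.append("    ");
        }
        return spaces.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        lines.add(indent(depth) + "<" + qName + ">");
        depth++; // children are one level deeper
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // back to this element's own level before printing its end tag
        lines.add(indent(depth) + "</" + qName + ">");
    }

    // Parses XML held in a string and returns the indented tags.
    public static List<String> format(String xml) {
        IndentSketch handler = new IndentSketch();
        try {
            // Assumption: JAXP is used here in place of the Xerces SAXParser class.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        for (String line : format("<DOCUMENT><CUSTOMER><NAME/></CUSTOMER></DOCUMENT>")) {
            System.out.println(line);
        }
    }
}
```

Because the parser reports a matching end event for every start event in a well-formed document, the counter returns to zero when parsing finishes.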
There's one last topic to cover: handling errors and warnings.
The DefaultHandler class implements the ErrorHandler interface, which defines three callbacks to handle warnings and errors from the parser. These methods are warning, which handles parser warnings; error, which handles recoverable parser errors; and fatalError, which handles errors so severe that the parser can't continue.
Each of these methods is passed an object of the class SAXParseException, and that object supports a method, getMessage, that returns the warning or error message. I display those messages using System.err.println, which prints to the Java err output channel; by default, that channel corresponds to the console:
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; . . . public void warning(SAXParseException exception) { System.err.println("WARNING! " + exception.getMessage()); } public void error(SAXParseException exception) { System.err.println("ERROR! " + exception.getMessage()); } public void fatalError(SAXParseException exception) { System.err.println("FATAL ERROR! " + exception.getMessage()); } public static void main(String args[]) { . . . } }
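To watch these callbacks fire, it helps to feed the parser a deliberately broken document. The sketch below does that and also records the line number that SAXParseException makes available through getLineNumber. The class name ErrorSketch and the helper check are hypothetical, and the sketch assumes the JAXP SAXParserFactory from the standard Java library. The exact message text depends on the parser, so it is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; collects parser problems instead of printing them.
public class ErrorSketch extends DefaultHandler {
    private final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException exception) {
        problems.add(level + " at line " + exception.getLineNumber()
                + ": " + exception.getMessage());
    }

    @Override
    public void warning(SAXParseException exception) {
        record("WARNING", exception);
    }

    @Override
    public void error(SAXParseException exception) {
        record("ERROR", exception);
    }

    @Override
    public void fatalError(SAXParseException exception) {
        record("FATAL ERROR", exception);
    }

    // Parses XML held in a string and returns whatever problems were reported.
    public static List<String> check(String xml) {
        ErrorSketch handler = new ErrorSketch();
        try {
            // Assumption: JAXP is used here in place of the Xerces SAXParser class.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // The parser stops after a fatal error; the handler has already recorded it.
        } catch (Exception e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // <B> is never closed, so this document is not well formed.
        for (String problem : check("<A><B></A>")) {
            System.out.println(problem);
        }
    }
}
```

Collecting the messages in a list, as here, makes it possible to decide after parsing whether the output should be shown at all.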
That's all we need; you can see the results of parsing customer.xml in Figure 12.1, where I'm using the MS-DOS more filter to stop the display from scrolling off the top of the window. This program is a success, and you can find the complete code in Listing 12.1.
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class IndentingParserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public static void displayDocument(String uri) { try { IndentingParserSAX SAXHandler = new IndentingParserSAX(); SAXParser parser = new SAXParser(); parser.setContentHandler(SAXHandler); parser.setErrorHandler(SAXHandler); parser.parse(uri); } catch (Exception e) { e.printStackTrace(System.err); } } public void startDocument() { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\""+ "UTF-8" + "\"?>"; numberDisplayLines++; } public void processingInstruction(String target, String data) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?"; displayStrings[numberDisplayLines] += target; if (data != null && data.length() > 0) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += data; } displayStrings[numberDisplayLines] += "?>"; numberDisplayLines++; } public void startElement(String uri, String localName, String rawName, Attributes attributes) { displayStrings[numberDisplayLines] = indent; indent += "    "; displayStrings[numberDisplayLines] += '<'; displayStrings[numberDisplayLines] += rawName; if (attributes != null) { int numberAttributes = attributes.getLength(); for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += attributes.getRawName(loopIndex); displayStrings[numberDisplayLines] += "=\""; displayStrings[numberDisplayLines] += attributes.getValue(loopIndex); displayStrings[numberDisplayLines] += '"'; } } displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public void characters(char characters[], int start, int length) { String characterData = (new 
String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += characterData; numberDisplayLines++; } } public void ignorableWhitespace(char characters[], int start, int length) { //characters(characters, start, length); } public void endElement(String uri, String localName, String rawName) { indent = indent.substring(0, indent.length() - 4); displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "</"; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public void warning(SAXParseException exception) { System.err.println("WARNING! " + exception.getMessage()); } public void error(SAXParseException exception) { System.err.println("ERROR! " + exception.getMessage()); } public void fatalError(SAXParseException exception) { System.err.println("FATAL ERROR! " + exception.getMessage()); } public static void main(String args[]) { displayDocument(args[0]); for(int index = 0; index < numberDisplayLines; index++){ System.out.println(displayStrings[index]); } } }
The previous example displayed the entire document, but you can be more selective than that through a process called filtering. When you filter a document, you extract only those elements you're interested in.
Here's a new example named searcherSAX.java. In this case, I'll let the user specify what document to search and what element name to search for, like this, which will display all <ITEM> elements in customer.xml:
%java searcherSAX customer.xml ITEM
This program is not difficult to write, now that we've written the indenting parser example. Note, however, that we must handle not just the specific element the user is searching for, but also the element's children that are contained inside the element. I'll adapt the IndentingParserSAX.java program to create searcherSAX.java; all we'll have to control is when we display elements and when we don't. If the current element matches the element the user is searching for, I'll set a Boolean variable named printFlag to true:
public void startElement(String uri, String localName, String rawName, Attributes attributes) { if(rawName.equals(searchFor)){ printFlag=true; } . . . }
Now I can check whether printFlag is true; if so, I'll add the current element and its attributes to the display strings:
public void startElement(String uri, String localName, String rawName, Attributes attributes) { if(rawName.equals(searchFor)){ printFlag=true; } if (printFlag){ displayStrings[numberDisplayLines] = indent; indent += "    "; displayStrings[numberDisplayLines] += '<'; displayStrings[numberDisplayLines] += rawName; if (attributes != null) { int numberAttributes = attributes.getLength(); for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += attributes.getRawName(loopIndex); displayStrings[numberDisplayLines] += "=\""; displayStrings[numberDisplayLines] += attributes.getValue(loopIndex); displayStrings[numberDisplayLines] += '"'; } } displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } }
And I can do the same in other callback methods that add text to the displayStrings array, such as the characters callback:
public void characters(char characters[], int start, int length) { if(printFlag){ String characterData = (new String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += characterData; numberDisplayLines++; } } }
Note that we don't want to set printFlag to false until after the element that the user is searching for ends, at which point we've displayed the whole element and all its children. When the element ends, I set printFlag to false this way:
public void endElement(String uri, String localName, String rawName) { if(printFlag){ indent = indent.substring(0, indent.length() - 4); displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "</"; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } if(rawName.equals(searchFor)){ printFlag=false; } }
That's all it takes. I'll filter customer.xml for <ITEM> elements like this:
%java searcherSAX customer.xml ITEM | more
You can see the results in Figure 12.2, where I'm filtering customer.xml to find all <ITEM> elements. The complete code appears in Listing 12.2.
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class searcherSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; static boolean printFlag; static String searchFor; public static void displayDocument(String uri) { try { searcherSAX SAXHandler = new searcherSAX(); SAXParser parser = new SAXParser(); parser.setContentHandler(SAXHandler); parser.setErrorHandler(SAXHandler); parser.parse(uri); } catch (Exception e) { e.printStackTrace(System.err); } } public void processingInstruction(String target, String data) { if(printFlag){ displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?"; displayStrings[numberDisplayLines] += target; if (data != null && data.length() > 0) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += data; } displayStrings[numberDisplayLines] += "?>"; numberDisplayLines++; } } public void startDocument() { if(printFlag){ displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\""+ "UTF-8" + "\"?>"; numberDisplayLines++; } } public void startElement(String uri, String localName, String rawName, Attributes attributes) { if(rawName.equals(searchFor)){ printFlag=true; } if (printFlag){ displayStrings[numberDisplayLines] = indent; indent += "    "; displayStrings[numberDisplayLines] += '<'; displayStrings[numberDisplayLines] += rawName; if (attributes != null) { int numberAttributes = attributes.getLength(); for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += attributes.getRawName(loopIndex); displayStrings[numberDisplayLines] += "=\""; displayStrings[numberDisplayLines] += attributes.getValue(loopIndex); displayStrings[numberDisplayLines] += '"'; } } displayStrings[numberDisplayLines] 
+= '>'; numberDisplayLines++; } } public void characters(char characters[], int start, int length) { if(printFlag){ String characterData = (new String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += characterData; numberDisplayLines++; } } } public void ignorableWhitespace(char characters[], int start, int length) { if(printFlag){ //characters(ch, start, length); } } public void endElement(String uri, String localName, String rawName) { if(printFlag){ indent = indent.substring(0, indent.length() - 4); displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "</"; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } if(rawName.equals(searchFor)){ printFlag=false; } } public void warning(SAXParseException exception) { System.err.println("WARNING! " + exception.getMessage()); } public void error(SAXParseException exception) { System.err.println("ERROR! " + exception.getMessage()); } public void fatalError(SAXParseException exception) { System.err.println("FATAL ERROR! " + exception.getMessage()); } public static void main(String args[]) { String arg = args[0]; searchFor = args[1]; displayDocument(arg); for(int index = 0; index < numberDisplayLines; index++){ System.out.println(displayStrings[index]); } } }
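A single Boolean flag works for customer.xml, but it would switch off too early if the element being searched for could be nested inside another element of the same name. The following sketch shows one possible variation that counts matching elements instead. It is not the chapter's searcherSAX program: the class name FilterSketch, the field matchDepth, and the helper filter are hypothetical, and the sketch assumes the JAXP SAXParserFactory from the standard Java library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; keeps only the searched-for elements and their children.
public class FilterSketch extends DefaultHandler {
    private final String searchFor;
    private final List<String> lines = new ArrayList<>();
    private int matchDepth = 0; // greater than zero while inside a matching element

    public FilterSketch(String searchFor) {
        this.searchFor = searchFor;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if (qName.equals(searchFor)) {
            matchDepth++;
        }
        if (matchDepth > 0) {
            lines.add("<" + qName + ">");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (matchDepth > 0) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (matchDepth > 0) {
            lines.add("</" + qName + ">");
        }
        // Leave the match only after its own end tag has been recorded.
        if (qName.equals(searchFor)) {
            matchDepth--;
        }
    }

    // Parses XML held in a string and returns the lines that passed the filter.
    public static List<String> filter(String xml, String elementName) {
        FilterSketch handler = new FilterSketch(elementName);
        try {
            // Assumption: JAXP is used here in place of the Xerces SAXParser class.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        String xml = "<DOCUMENT><DATE>October 20, 2001</DATE>"
                + "<ITEM><PRODUCT>Bread</PRODUCT></ITEM></DOCUMENT>";
        for (String line : filter(xml, "ITEM")) {
            System.out.println(line);
        }
    }
}
```

Using instance fields instead of static ones also means that each search gets its own handler object, so two searches cannot interfere with each other.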
The examples we've created so far have all created text-based output using the System.out.println method. As noted in the previous chapter, however, few browsers these days work that way. In the next section, I'll take a look at creating a windowed browser.
We wrote the indenting parser example to store the display text in an array named displayStrings, so it's easy to display that text in a Java window as we did in the previous chapter. To do that, I'll create a new example named browserSAX.java; in this program, I create a new object of a class I'll call AppFrame. Then I pass displayStrings and the number of lines to display to the AppFrame class's constructor, and then call the AppFrame object's show method to show the window:
import java.awt.*; import java.awt.event.*; import org.apache.xerces.parsers.SAXParser; import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; public class browserSAX extends DefaultHandler { . . . public static void main(String args[]) { displayDocument(args[0]); AppFrame f = new AppFrame(displayStrings, numberDisplayLines); f.setSize(300, 500); f.addWindowListener(new WindowAdapter() {public void windowClosing(WindowEvent e) {System.exit(0);}}); f.show(); } }
The AppFrame class is based on the Java Frame class, and displays the text we've passed to it:
class AppFrame extends Frame { String displayStrings[]; int numberDisplayLines; public AppFrame(String[] d, int n) { displayStrings = d; numberDisplayLines = n; } public void paint(Graphics g) { Font font; font = new Font("Courier", Font.PLAIN, 12); g.setFont(font); FontMetrics fontmetrics = g.getFontMetrics(getFont()); int y = fontmetrics.getHeight(); for(int index = 0; index < numberDisplayLines; index++){ y += fontmetrics.getHeight(); g.drawString(displayStrings[index], 5, y); } } }
You can see the browserSAX program at work in Figure 12.3, where customer.xml is displayed in a Java window. The complete code appears in Listing 12.3.
import java.awt.*; import java.awt.event.*; import org.apache.xerces.parsers.SAXParser; import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; public class browserSAX extends DefaultHandler { static String displayStrings[] = new String[1000]; static int numberDisplayLines = 0; static String indent = ""; public static void displayDocument(String uri) { try { browserSAX SAXHandler = new browserSAX(); SAXParser parser = new SAXParser(); parser.setContentHandler(SAXHandler); parser.setErrorHandler(SAXHandler); parser.parse(uri); } catch (Exception e) { e.printStackTrace(System.err); } } public void processingInstruction(String target, String data) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?"; displayStrings[numberDisplayLines] += target; if (data != null && data.length() > 0) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += data; } displayStrings[numberDisplayLines] += "?>"; numberDisplayLines++; } public void startDocument() { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "<?xml version=\"1.0\" encoding=\""+ "UTF-8" + "\"?>"; numberDisplayLines++; } public void startElement(String uri, String localName, String rawName, Attributes attributes) { displayStrings[numberDisplayLines] = indent; indent += "    "; displayStrings[numberDisplayLines] += '<'; displayStrings[numberDisplayLines] += rawName; if (attributes != null) { int numberAttributes = attributes.getLength(); for (int loopIndex = 0; loopIndex < numberAttributes; loopIndex++) { displayStrings[numberDisplayLines] += ' '; displayStrings[numberDisplayLines] += attributes.getRawName(loopIndex); displayStrings[numberDisplayLines] += "=\""; displayStrings[numberDisplayLines] += attributes.getValue(loopIndex); displayStrings[numberDisplayLines] += '"'; } } displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public void characters(char characters[], int start, int length) { String 
characterData = (new String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += characterData; numberDisplayLines++; } } public void ignorableWhitespace(char characters[], int start, int length) { //characters(characters, start, length); } public void endElement(String uri, String localName, String rawName) { indent = indent.substring(0, indent.length() - 4); displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "</"; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; } public void warning(SAXParseException exception) { System.err.println("WARNING! " + exception.getMessage()); } public void error(SAXParseException exception) { System.err.println("ERROR! " + exception.getMessage()); } public void fatalError(SAXParseException exception) { System.err.println("FATAL ERROR! " + exception.getMessage()); } public static void main(String args[]) { displayDocument(args[0]); AppFrame f = new AppFrame(displayStrings, numberDisplayLines); f.setSize(300, 500); f.addWindowListener(new WindowAdapter() {public void windowClosing(WindowEvent e) {System.exit(0);}}); f.show(); } } class AppFrame extends Frame { String displayStrings[]; int numberDisplayLines; public AppFrame(String[] d, int n) { displayStrings = d; numberDisplayLines = n; } public void paint(Graphics g) { Font font; font = new Font("Courier", Font.PLAIN, 12); g.setFont(font); FontMetrics fontmetrics = g.getFontMetrics(getFont()); int y = fontmetrics.getHeight(); for(int index = 0; index < numberDisplayLines; index++){ y += fontmetrics.getHeight(); g.drawString(displayStrings[index], 5, y); } } }
Now that we're parsing and displaying XML documents in windows, there's no reason to restrict ourselves to displaying the text form of an XML document. Take a look at the next topic.
In the previous chapter, I adapted the DOM parser browser we wrote to display circles. It will be instructive to do the same here for the SAX parser browser because it will show how to retrieve specific attribute values. Here's the document that this browser, circlesSAX.java, might read. This document is called circles.xml, and as in the previous chapter, I'm specifying the (x, y) origin of each circle and the radius of the circle as attributes of the <CIRCLE> element:
<?xml version = "1.0" ?> <!DOCTYPE DOCUMENT [ <!ELEMENT DOCUMENT (CIRCLE|ELLIPSE)*> <!ELEMENT CIRCLE EMPTY> <!ELEMENT ELLIPSE EMPTY> <!ATTLIST CIRCLE X CDATA #IMPLIED Y CDATA #IMPLIED RADIUS CDATA #IMPLIED> <!ATTLIST ELLIPSE X CDATA #IMPLIED Y CDATA #IMPLIED WIDTH CDATA #IMPLIED HEIGHT CDATA #IMPLIED> ]> <DOCUMENT> <CIRCLE X='200' Y='160' RADIUS='50' /> <CIRCLE X='170' Y='100' RADIUS='15' /> <CIRCLE X='80' Y='200' RADIUS='45' /> <CIRCLE X='200' Y='140' RADIUS='35' /> <CIRCLE X='130' Y='240' RADIUS='25' /> <CIRCLE X='270' Y='300' RADIUS='45' /> <CIRCLE X='210' Y='240' RADIUS='25' /> <CIRCLE X='60' Y='160' RADIUS='35' /> <CIRCLE X='160' Y='260' RADIUS='55' /> </DOCUMENT>
Here the trick will be to recover the values of the attributes X, Y, and RADIUS. I'll store those values in arrays named x, y, and radius. Getting an element's attribute values is easier using the XML for Java's SAX parser than it is with the DOM parser. The startElement method is passed an object that implements the Attributes interface, and all you have to do is use that object's getValue method, passing it the name of the attribute you're interested in:
import java.awt.*; import java.awt.event.*; import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class circlesSAX extends DefaultHandler { static int numberFigures = 0; static int x[] = new int[100]; static int y[] = new int[100]; static int radius[] = new int[100]; . . . public void startElement(String uri, String localName, String rawName, Attributes attrs) { if (rawName.equals("CIRCLE")) { x[numberFigures] = Integer.parseInt(attrs.getValue("X")); y[numberFigures] = Integer.parseInt(attrs.getValue("Y")); radius[numberFigures] = Integer.parseInt(attrs.getValue("RADIUS")); numberFigures++; } } . . . public static void main(String args[]) { displayDocument(args[0]); AppFrame f = new AppFrame(numberFigures, x, y, radius); f.setSize(400, 400); f.addWindowListener(new WindowAdapter() {public void windowClosing(WindowEvent e) {System.exit(0);}}); f.show(); } }
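Because the DTD declares X, Y, and RADIUS as #IMPLIED, a <CIRCLE> element may legally omit any of them, and getValue returns null for an attribute that is not present. The following is a minimal sketch of a more defensive version of the same lookup. It assumes the JDK's built-in JAXP parser (SAXParserFactory) rather than the Xerces SAXParser class used in this chapter, and the class name CircleAttributeSketch and its parse method are hypothetical names introduced only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; it is not part of circlesSAX.java.
class CircleAttributeSketch extends DefaultHandler {
    // Each entry holds {x, y, radius} for one complete <CIRCLE> element.
    final List<int[]> circles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String rawName, Attributes attrs) {
        if (!rawName.equals("CIRCLE")) {
            return;
        }
        String x = attrs.getValue("X");
        String y = attrs.getValue("Y");
        String radius = attrs.getValue("RADIUS");
        // getValue returns null for a missing attribute; skip incomplete circles
        // instead of letting Integer.parseInt throw on null.
        if (x == null || y == null || radius == null) {
            return;
        }
        circles.add(new int[] {
            Integer.parseInt(x), Integer.parseInt(y), Integer.parseInt(radius)
        });
    }

    // Parses an XML string and returns the circles found in it.
    // Checked exceptions are wrapped so callers need no throws clause.
    static List<int[]> parse(String xml) {
        try {
            CircleAttributeSketch handler = new CircleAttributeSketch();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.circles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Collecting the values in a list also avoids the fixed limit of 100 figures that the arrays in the listing impose, although that is a design preference rather than a requirement.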
Having stored all the circles' data, I display them in the AppFrame class as we did in the previous chapter:
class AppFrame extends Frame { int numberFigures; int[] xValues; int[] yValues; int[] radiusValues; public AppFrame(int number, int[] x, int[] y, int[] radius) { numberFigures = number; xValues = x; yValues = y; radiusValues = radius; } public void paint(Graphics g) { for(int loopIndex = 0; loopIndex < numberFigures; loopIndex++){ g.drawOval(xValues[loopIndex], yValues[loopIndex], radiusValues[loopIndex], radiusValues[loopIndex]); } } }
And that's all it takes; you can see the results in Figure 12.4, where the browser is displaying circles.xml. The complete listing appears in Listing 12.4.
import java.awt.*; import java.awt.event.*; import org.xml.sax.*; import org.w3c.dom.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class circlesSAX extends DefaultHandler { static int numberFigures = 0; static int x[] = new int[100]; static int y[] = new int[100]; static int radius[] = new int[100]; public static void displayDocument(String uri) { try { circlesSAX SAXHandler = new circlesSAX(); SAXParser parser = new SAXParser(); parser.setContentHandler(SAXHandler); parser.setErrorHandler(SAXHandler); parser.parse(uri); } catch (Exception e) { e.printStackTrace(System.err); } } public void startElement(String uri, String localName, String rawName, Attributes attrs) { if (rawName.equals("CIRCLE")) { x[numberFigures] = Integer.parseInt(attrs.getValue("X")); y[numberFigures] = Integer.parseInt(attrs.getValue("Y")); radius[numberFigures] = Integer.parseInt(attrs.getValue("RADIUS")); numberFigures++; } } public void warning(SAXParseException exception) { System.err.println("WARNING! " + exception.getMessage()); } public void error(SAXParseException exception) { System.err.println("ERROR! " + exception.getMessage()); } public void fatalError(SAXParseException exception) { System.err.println("FATAL ERROR! " + exception.getMessage()); } public static void main(String args[]) { displayDocument(args[0]); AppFrame f = new AppFrame(numberFigures, x, y, radius); f.setSize(400, 400); f.addWindowListener(new WindowAdapter() {public void windowClosing(WindowEvent e) {System.exit(0);}}); f.show(); } } class AppFrame extends Frame { int numberFigures; int[] xValues; int[] yValues; int[] radiusValues; public AppFrame(int number, int[] x, int[] y, int[] radius) { numberFigures = number; xValues = x; yValues = y; radiusValues = radius; } public void paint(Graphics g) { for(int loopIndex = 0; loopIndex < numberFigures; loopIndex++){ g.drawOval(xValues[loopIndex], yValues[loopIndex], radiusValues[loopIndex], radiusValues[loopIndex]); } } }
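One detail of the paint method is worth noting: java.awt.Graphics.drawOval takes the upper-left corner of the oval's bounding rectangle followed by its width and height. As written, the listing therefore treats X and Y as that corner and RADIUS as the circle's full width. If you would rather treat X and Y as the circle's center and RADIUS as a true radius, a small conversion is needed. The sketch below shows that arithmetic; the class name CircleBounds and its method are hypothetical and not part of the chapter's code.

```java
// Hypothetical helper; it is not part of the chapter's AppFrame class.
class CircleBounds {
    // Returns {left, top, width, height} for a circle given its center and
    // radius, in the argument order that Graphics.drawOval expects.
    static int[] fromCenter(int centerX, int centerY, int radius) {
        return new int[] {
            centerX - radius,  // left edge of the bounding square
            centerY - radius,  // top edge of the bounding square
            2 * radius,        // width is the diameter
            2 * radius         // height is the diameter
        };
    }
}
```

Inside paint, the four values could then be passed to g.drawOval in place of the raw attribute values.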
The Node interface, which is available when you use the DOM parser, contains all the standard W3C DOM methods for navigating in a document, including getNextSibling, getPreviousSibling, getFirstChild, getLastChild, and getParentNode. It's different when you use a SAX parser, however, because this parser does not create a tree of nodes, so those methods don't apply.
Instead, if you want to find a particular element, you have to find it yourself. In the previous chapter, I found the third person's name in meetings.xml:
<?xml version="1.0"?> <MEETINGS> <MEETING TYPE="informal"> <MEETING_TITLE>XML In The Real World</MEETING_TITLE> <MEETING_NUMBER>2079</MEETING_NUMBER> <SUBJECT>XML</SUBJECT> <DATE>6/1/2002</DATE> <PEOPLE> <PERSON ATTENDANCE="present"> <FIRST_NAME>Edward</FIRST_NAME> <LAST_NAME>Samson</LAST_NAME> </PERSON> <PERSON ATTENDANCE="absent"> <FIRST_NAME>Ernestine</FIRST_NAME> <LAST_NAME>Johnson</LAST_NAME> </PERSON> <PERSON ATTENDANCE="present"> <FIRST_NAME>Betty</FIRST_NAME> <LAST_NAME>Richardson</LAST_NAME> </PERSON> </PEOPLE> </MEETING> </MEETINGS>
It's not difficult to do the same thing here, but in SAX programming, finding a specific element takes a little code. I start by finding the third <PERSON> element and setting a variable named thirdPersonFlag true when I find it:
public void startElement(String uri, String localName, String rawName, Attributes attributes) { if(rawName.equals("PERSON")) { personCount++; } if(personCount == 3) { thirdPersonFlag = true; } . . . }
When the SAX parser is parsing the third person's <FIRST_NAME> element, I'll set a variable named firstNameFlag to true; when it's parsing the third person's <LAST_NAME> element, I'll set a variable named lastNameFlag to true.
public void startElement(String uri, String localName, String rawName, Attributes attributes) { if(rawName.equals("PERSON")) { personCount++; } if(personCount == 3) { thirdPersonFlag = true; } if(rawName.equals("FIRST_NAME") && thirdPersonFlag) { firstNameFlag = true; } if(rawName.equals("LAST_NAME") && thirdPersonFlag) { firstNameFlag = false; lastNameFlag = true; } }
Watching the variables firstNameFlag and lastNameFlag, I can store the person's first and last names in the character callback:
public void characters(char characters[], int start, int length) { String characterData = (new String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { if(firstNameFlag) { firstName = characterData; } if(lastNameFlag) { lastName = characterData; } } }
When the SAX parser reaches the end of the third person's <LAST_NAME> element, both names have been stored, so I display that person's name and reset the flags:
public void endElement(String uri, String localName, String rawName) { if(thirdPersonFlag && lastNameFlag){ System.out.println("Third name: " + firstName + " " + lastName); thirdPersonFlag = false; firstNameFlag = false; lastNameFlag = false; } }
And that's the technique you use when you're hunting a specific element using a SAX parser: you just wait until the parser hands it to you. Here are the results:
%java navSAX meetings.xml
Third name: Betty Richardson
You can see the full code for this program, navSAX.java, in Listing 12.5.
import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.xerces.parsers.SAXParser; public class navSAX extends DefaultHandler { int personCount; boolean thirdPersonFlag = false, firstNameFlag = false, lastNameFlag = false; String firstName, lastName; public static void displayDocument(String uri) { try { navSAX SAXHandler = new navSAX(); SAXParser parser = new SAXParser(); parser.setContentHandler(SAXHandler); parser.setErrorHandler(SAXHandler); parser.parse(uri); } catch (Exception e) { e.printStackTrace(System.err); } } public void startElement(String uri, String localName, String rawName, Attributes attributes) { if(rawName.equals("PERSON")) { personCount++; } if(personCount == 3) { thirdPersonFlag = true; } if(rawName.equals("FIRST_NAME") && thirdPersonFlag) { firstNameFlag = true; } if(rawName.equals("LAST_NAME") && thirdPersonFlag) { firstNameFlag = false; lastNameFlag = true; } } public void characters(char characters[], int start, int length) { String characterData = (new String(characters, start, length)).trim(); if(characterData.indexOf("\n") < 0 && characterData.length() > 0) { if(firstNameFlag) { firstName = characterData; } if(lastNameFlag) { lastName = characterData; } } } public void endElement(String uri, String localName, String rawName) { if(thirdPersonFlag && lastNameFlag){ System.out.println("Third name: " + firstName + " " + lastName); thirdPersonFlag = false; firstNameFlag = false; lastNameFlag = false; } } public void warning(SAXParseException exception) { System.err.println("WARNING! " + exception.getMessage()); } public void error(SAXParseException exception) { System.err.println("ERROR! " + exception.getMessage()); } public void fatalError(SAXParseException exception) { System.err.println("FATAL ERROR! " + exception.getMessage()); } public static void main(String args[]) { displayDocument(args[0]); } }
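One caveat about the characters callback: the SAX specification allows a parser to deliver the text of a single element in more than one call, so assigning the text directly in characters can, in principle, capture only a fragment. What follows is a sketch of a variant that accumulates text in a StringBuilder and reads it in endElement. It assumes the JDK's built-in JAXP parser instead of the Xerces SAXParser class, and NthPersonSketch and its find method are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of navSAX; the names here are illustrative only.
class NthPersonSketch extends DefaultHandler {
    private final int wanted;                 // 1-based index of the <PERSON> to report
    private int personCount = 0;
    private boolean inWantedPerson = false;
    private final StringBuilder text = new StringBuilder();
    String firstName;
    String lastName;

    NthPersonSketch(int wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName,
                             String rawName, Attributes attributes) {
        if (rawName.equals("PERSON")) {
            personCount++;
            inWantedPerson = (personCount == wanted);
        }
        text.setLength(0);                    // start collecting text afresh
    }

    @Override
    public void characters(char[] characters, int start, int length) {
        // May be called several times for one element, so append each chunk.
        text.append(characters, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String rawName) {
        if (inWantedPerson && rawName.equals("FIRST_NAME")) {
            firstName = text.toString().trim();
        } else if (inWantedPerson && rawName.equals("LAST_NAME")) {
            lastName = text.toString().trim();
        } else if (rawName.equals("PERSON")) {
            inWantedPerson = false;           // leaving the <PERSON> element
        }
        text.setLength(0);
    }

    // Returns "first last" for the requested person, or null if it was not found.
    static String find(String xml, int wanted) {
        try {
            NthPersonSketch handler = new NthPersonSketch(wanted);
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            if (handler.firstName == null || handler.lastName == null) {
                return null;
            }
            return handler.firstName + " " + handler.lastName;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the index to the constructor, rather than hard-coding 3, is simply a convenience; the event-driven technique is the same as in navSAX.java.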
In the previous chapter, we saw that the XML for Java DOM parser has several methods that let you modify a document in memory, such as insertBefore and appendChild. SAX parsers don't give you access to the whole document tree at once, so no similar methods exist here.
However, if you want, you can "modify" the structure of a document when using a SAX parser simply by calling various callback methods yourself. In the previous chapter, I modified customer.xml to create customer2.xml, adding a <MIDDLE_NAME> element with the text XML to each <PERSON> element in addition to the <FIRST_NAME> and <LAST_NAME> elements. It's easy enough to do the same here using SAX methods. All I have to do is wait for the end of a <FIRST_NAME> element and then "create" a new element by calling the startElement, characters, and endElement callbacks myself:
public void endElement(String uri, String localName, String rawName) { indent = indent.substring(0, indent.length() - 4); displayStrings[numberDisplayLines] = indent; displayStrings[numberDisplayLines] += "</"; displayStrings[numberDisplayLines] += rawName; displayStrings[numberDisplayLines] += '>'; numberDisplayLines++; if (rawName.equals("FIRST_NAME")) { startElement("", "MIDDLE_NAME", "MIDDLE_NAME", null); characters("XML".toCharArray(), 0, "XML".length()); endElement("", "MIDDLE_NAME", "MIDDLE_NAME"); } }
In the main method, I'll write this new document out to customer2.xml:
public static void main(String args[]) { displayDocument(args[0]); try { FileWriter filewriter = new FileWriter("customer2.xml"); for(int loopIndex = 0; loopIndex < numberDisplayLines; loopIndex++){ filewriter.write(displayStrings[loopIndex].toCharArray()); filewriter.write('\n'); } filewriter.close(); } catch (Exception e) { e.printStackTrace(System.err); } }
And that's it; here's what the resulting file, customer2.xml, looks like, with the new <MIDDLE_NAME> elements:
<?xml version="1.0" encoding="UTF-8"?> <DOCUMENT> <CUSTOMER> <NAME> <LAST_NAME> Smith </LAST_NAME> <FIRST_NAME> Sam </FIRST_NAME> <MIDDLE_NAME> XML </MIDDLE_NAME> </NAME> <DATE> October 15, 2001 </DATE> <ORDERS> <ITEM> <PRODUCT> Tomatoes </PRODUCT> <NUMBER> 8 </NUMBER> <PRICE> $1.25 </PRICE> </ITEM> <ITEM> <PRODUCT> Oranges </PRODUCT> <NUMBER> 24 </NUMBER> <PRICE> $4.98 </PRICE> </ITEM> </ORDERS> </CUSTOMER> <CUSTOMER> <NAME> <LAST_NAME> Jones </LAST_NAME> <FIRST_NAME> Polly </FIRST_NAME> <MIDDLE_NAME> XML </MIDDLE_NAME> </NAME> <DATE> October 20, 2001 </DATE> <ORDERS> <ITEM> <PRODUCT> Bread </PRODUCT> <NUMBER> 12 </NUMBER> <PRICE> $14.95 </PRICE> </ITEM> <ITEM> <PRODUCT> Apples </PRODUCT> <NUMBER> 6 </NUMBER> <PRICE> $1.50 </PRICE> </ITEM> </ORDERS> </CUSTOMER> <CUSTOMER> <NAME> <LAST_NAME> Weber </LAST_NAME> <FIRST_NAME> Bill </FIRST_NAME> <MIDDLE_NAME> XML </MIDDLE_NAME> </NAME> <DATE> October 25, 2001 </DATE> <ORDERS> <ITEM> <PRODUCT> Asparagus </PRODUCT> <NUMBER> 12 </NUMBER> <PRICE> $2.95 </PRICE> </ITEM> <ITEM> <PRODUCT> Lettuce </PRODUCT> <NUMBER> 6 </NUMBER> <PRICE> $11.50 </PRICE> </ITEM> </ORDERS> </CUSTOMER> </DOCUMENT>
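The same idea of injecting callbacks can also be packaged as a SAX filter, which sits between the parser and the handler and forwards events. The sketch below uses org.xml.sax.helpers.XMLFilterImpl, a standard SAX2 helper class, together with the JDK's built-in JAXP parser rather than the Xerces SAXParser class. The class names MiddleNameFilter and MarkupCollector are hypothetical, and the collector is deliberately simplified: it ignores attributes and does not escape special characters.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: forwards every event unchanged and adds a
// <MIDDLE_NAME> element after each </FIRST_NAME>.
class MiddleNameFilter extends XMLFilterImpl {
    MiddleNameFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void endElement(String uri, String localName, String rawName)
            throws SAXException {
        super.endElement(uri, localName, rawName);
        if (rawName.equals("FIRST_NAME")) {
            char[] text = "XML".toCharArray();
            // The super calls forward to the downstream handler directly,
            // so the injected element does not re-enter this method.
            super.startElement("", "MIDDLE_NAME", "MIDDLE_NAME", new AttributesImpl());
            super.characters(text, 0, text.length);
            super.endElement("", "MIDDLE_NAME", "MIDDLE_NAME");
        }
    }
}

// Hypothetical downstream handler that rebuilds the markup as a string.
// Simplification: attributes are dropped and text is not escaped.
class MarkupCollector extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
                             String rawName, Attributes attributes) {
        out.append('<').append(rawName).append('>');
    }

    @Override
    public void characters(char[] characters, int start, int length) {
        out.append(characters, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String rawName) {
        out.append("</").append(rawName).append('>');
    }

    // Parses the XML string through the filter and returns the rebuilt markup.
    static String run(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            MiddleNameFilter filter = new MiddleNameFilter(reader);
            MarkupCollector collector = new MarkupCollector();
            filter.setContentHandler(collector);
            filter.parse(new InputSource(new StringReader(xml)));
            return collector.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the modification in a filter means the display or output handler stays unaware of it, and passing an empty AttributesImpl instead of null spares the downstream handler a null check.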
That finishes our work with the XML for Java package and with Java for the moment. In the next chapter, I'm going to start taking a look at working with XSL transformations.