SAX | Effective XML: 50 Specific Ways to Improve Your XML

SAX enables you to write code that is completely independent of the underlying parser. The key is to construct new instances of the XMLReader interface with the XMLReaderFactory.createXMLReader() method rather than calling a parser class's constructor directly. For example, here is the correct way to load a SAX2 parser.

 XMLReader parser = XMLReaderFactory.createXMLReader();

Below is the wrong way to load a SAX parser.

 XMLReader parser = new org.apache.xerces.parsers.SAXParser();

The first statement loads the parser named by the org.xml.sax.driver system property. This is easy to adjust at runtime. The second statement always loads Xerces and can't be changed without recompiling. Both statements may actually create the same object, but the first leaves open the possibility of using a different parser without even recompiling the code.

You can hard-code the parser you want as a string if you really want to rely on a specific parser.

 XMLReader parser = XMLReaderFactory.createXMLReader(
   "org.apache.xerces.parsers.SAXParser");

Ideally, however, the string containing the fully package-qualified class name should be part of a resource bundle or other configuration file separate from the code itself so it can be modified without recompiling.
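One way to keep the class name out of the code is to read it from a properties file. The following is only a minimal sketch: the file name parser.properties and the key sax.driver are hypothetical names invented for this example, not part of SAX. If the file is missing or does not name a parser, the sketch uses the default parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

class ConfiguredParserLoader {

    // Hypothetical key name, chosen for this sketch only.
    static final String DRIVER_KEY = "sax.driver";

    /** Reads the parser class name from a properties file; returns null if none is configured. */
    static String readDriverName(Path configFile) {
        Properties props = new Properties();
        if (Files.isReadable(configFile)) {
            try (InputStream in = Files.newInputStream(configFile)) {
                props.load(in);
            } catch (IOException ex) {
                return null; // unreadable file: treat as "not configured"
            }
        }
        String name = props.getProperty(DRIVER_KEY);
        return (name == null || name.trim().isEmpty()) ? null : name.trim();
    }

    /** Creates the configured parser, or the default parser if none is named. */
    static XMLReader createParser(Path configFile) throws SAXException {
        String driver = readDriverName(configFile);
        return driver == null
            ? XMLReaderFactory.createXMLReader()
            : XMLReaderFactory.createXMLReader(driver);
    }

    public static void main(String[] args) throws SAXException {
        // Hypothetical file name; a line such as
        //   sax.driver=org.apache.xerces.parsers.SAXParser
        // would select Xerces, provided it is in the classpath.
        XMLReader parser = createParser(Paths.get("parser.properties"));
        System.out.println("Loaded " + parser.getClass().getName());
    }
}
```

With this arrangement, switching parsers means editing one line of the properties file rather than the source code.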

Note

This doesn't work quite as well in other languages as it does in Java. For instance, C and C++ bind early rather than late, so you probably need to recompile to switch to a different parser. Furthermore, although SAX is fairly common in C and C++ parsers, there is no official standard for it, so there are often subtle differences between implementations, and code editing may be required. Languages like Python and Perl fall somewhere in between Java and C++ in terms of ease of switching parsers. This doesn't reflect any fundamental limitations in these languages, just that the programmers who wrote the first parsers and defined SAX preferred to work in Java. Nonetheless, even if you're not working in Java, you should still endeavor to write code that's as parser independent as possible in order to minimize the amount of work you have to do when swapping out one parser for another.

Furthermore, unless you have very good reasons to limit the choice of parser, you should always provide a fallback to the default parser if you fail to load a specific parser by name. Exactly which parsers are available can vary a lot from environment to environment, especially in Java 1.3 and earlier. For example, the following code falls back on the default parser if Xerces isn't in the classpath.

 XMLReader parser;
 try {
   parser =
     XMLReaderFactory.createXMLReader(
       "org.apache.xerces.parsers.SAXParser");
 }
 catch (SAXException ex) { // Xerces not found
   parser = XMLReaderFactory.createXMLReader();
 }

As long as you stick to the core classes in the org.xml.sax and org.xml.sax.helpers packages, your code should be reasonably portable to any parser that implements SAX2. And although they are technically optional, every SAX2 parser I've ever encountered also implements the interfaces in the org.xml.sax.ext package, that is, LexicalHandler and DeclHandler.

One area in which SAX parsers do differ is support for various features and properties. Although SAX2 defines over a dozen standard features and properties, only two must be implemented by all conformant processors (http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes). The others are all optional, and some parsers do omit them. If you attempt to set or read a feature or property that the parser does not understand, it will throw a SAXNotRecognizedException. If it recognizes the feature or property name but cannot set that feature or property to the requested value, it will throw a SAXNotSupportedException. Both are subclasses of SAXException. Be sure to catch these and respond appropriately.
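To see the two exceptions side by side, a small probe along the following lines may help. This is an illustrative sketch only: the class FeatureProbe, the method trySetFeature, and the feature name under example.com are invented for this example, and the strings it returns are arbitrary labels.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

class FeatureProbe {

    /**
     * Tries to set a feature and reports the outcome as a label:
     * "set", "not recognized", or "not supported".
     */
    static String trySetFeature(XMLReader parser, String name, boolean value) {
        try {
            parser.setFeature(name, value);
            return "set";
        } catch (SAXNotRecognizedException ex) {
            return "not recognized"; // the parser does not know this name
        } catch (SAXNotSupportedException ex) {
            return "not supported";  // the name is known, but the value cannot be set
        }
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        String[] features = {
            "http://xml.org/sax/features/namespaces",
            "http://xml.org/sax/features/namespace-prefixes",
            "http://xml.org/sax/features/validation",
            "http://example.com/features/no-such-feature" // made-up name
        };
        for (String feature : features) {
            System.out.println(feature + ": "
                + trySetFeature(parser, feature, true));
        }
    }
}
```

The results depend entirely on which parser the factory loads, which is exactly why the outcome has to be checked at runtime rather than assumed.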

Sometimes, the feature or property is optional for processing, and you can just ignore a failure to set it. For example, if you attempt to set a LexicalHandler object and fail, you may not be able to round-trip the document. However, you'll still get all the information you really care about.

 try {
   parser.setProperty(
    "http://xml.org/sax/properties/lexical-handler",
    new MyLexicalHandler());
 }
 catch (SAXNotRecognizedException ex) {
   // no big deal; continue normal processing
 }
 catch (SAXNotSupportedException ex) {
   // recognized but cannot be set; also safe to ignore here
 }

Other times you may want to give up completely. For example, if your code depends on interning of names for proper operation (for instance, it compares element names using the == operator), you'll want to make sure that the http://xml.org/sax/features/string-interning feature is true and not continue if it isn't.

 try {
   parser.setFeature(
     "http://xml.org/sax/features/string-interning", true);
 }
 catch (SAXException ex) {
   throw new RuntimeException(
    "Could not find a parser that supports string interning");
 }

Still other times you may want to take some intermediate course. For example, if the parser doesn't support validation, you might try to load a different parser that does.

 XMLReader parser = XMLReaderFactory.createXMLReader();
 try {
   parser.setFeature(
     "http://xml.org/sax/features/validation", true);
 }
 catch (SAXException ex) { // validation not recognized or not supported
   try { // Load a parser that is known to validate
     parser = XMLReaderFactory.createXMLReader(
       "org.apache.xerces.parsers.SAXParser");
     parser.setFeature(
       "http://xml.org/sax/features/validation", true);
   }
   catch (SAXException se) {
     // Xerces is not in the classpath
     throw new RuntimeException(
       "Could not find a validating parser");
   }
 }
 // continue with parsing and validating...

In all cases, however, you should not assume that you can actually set the feature or property you're trying to set. Be prepared for the attempt to fail, and respond accordingly. This will help your code either work properly or fail gracefully no matter which parser is used.

Another area in which parsers differ is support for SAX 2.0.1. This minor upgrade to SAX adds Locator2, Attributes2, and EntityResolver2 interfaces that fill a few small holes in SAX 2.0. These interfaces are not yet broadly supported (and arguably cannot be supported in a JAXP-compliant environment). Thus, you need to be more careful when using them. However, you can test for availability before using them by using instanceof. For example, the following code prints the encoding if and only if the Locator passed to setDocumentLocator() is a Locator2 object.

 public void setDocumentLocator(Locator locator) {
   if (locator instanceof Locator2) {
     Locator2 locator2 = (Locator2) locator;
     System.out.println("Encoding is " + locator2.getEncoding());
   }
 }

Alternatively, you can simply check the values of the http://xml.org/sax/features/use-locator2, http://xml.org/sax/features/use-attributes2, and http://xml.org/sax/features/use-entity-resolver2 features. If the parser supports these SAX 2.0.1 interfaces, these features will have the value true. However, if it does not support them, reading these features will probably throw a SAXNotRecognizedException rather than returning false. This makes them a little more cumbersome than simply using instanceof.
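For comparison, a feature-based check might look roughly like the sketch below. The class Sax201Check and the method reportsFeature are hypothetical names for this example. It treats an exception as "not available", on the assumption that a parser that doesn't know the feature doesn't supply the interface either.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

class Sax201Check {

    /** Returns true only if the parser reports the named feature as true. */
    static boolean reportsFeature(XMLReader parser, String featureUri) {
        try {
            return parser.getFeature(featureUri);
        } catch (SAXException ex) {
            // Covers both SAXNotRecognizedException and
            // SAXNotSupportedException: treat either as "not available"
            return false;
        }
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        String[] features = {
            "http://xml.org/sax/features/use-locator2",
            "http://xml.org/sax/features/use-attributes2",
            "http://xml.org/sax/features/use-entity-resolver2"
        };
        for (String feature : features) {
            System.out.println(feature + ": "
                + reportsFeature(parser, feature));
        }
    }
}
```

The try-catch block needed around every read is the extra work that the instanceof test avoids.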