DTDHandler | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

SAX is mostly about the instance document, not the DTD or schema. However, given a validating parser, or at least an internal DTD subset, the DTD can affect the contents of the instance document in six ways:

It can provide default values for attributes.
It can assign types to attributes, which affects their normalized value.
It can distinguish between ignorable and non-ignorable white space.
It can declare general entities.
It can declare unparsed entities.
It can declare notations.

The first four are resolved silently. For example, when applying a default value for an attribute to an element, the parser simply adds that attribute to the Attributes object it passes to startElement(). It doesn't tell you that it's done it. It just does it.
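To see this silent behavior in action, here is a minimal sketch. The class name DefaultAttributeDemo, the greeting document, and the choice of JAXP's SAXParserFactory to obtain the XMLReader are all assumptions made for illustration; they are not part of this chapter's examples. The instance document never mentions the lang attribute, yet it should still show up in the Attributes object with the default value from the internal DTD subset.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultAttributeDemo {

  // Hypothetical sample document. The internal DTD subset declares
  // a default value for lang; the greeting element itself omits it.
  static final String DOCUMENT =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE greeting [\n"
    + "  <!ELEMENT greeting (#PCDATA)>\n"
    + "  <!ATTLIST greeting lang CDATA \"en\">\n"
    + "]>\n"
    + "<greeting>Hello</greeting>";

  // Parses the document and returns every attribute the parser
  // reports to startElement(), keyed by qualified name.
  public static Map<String, String> reportedAttributes(String xml)
      throws Exception {
    final Map<String, String> result
        = new LinkedHashMap<String, String>();
    XMLReader parser
        = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setContentHandler(new DefaultHandler() {
      public void startElement(String namespaceURI, String localName,
          String qualifiedName, Attributes attributes) {
        for (int i = 0; i < attributes.getLength(); i++) {
          result.put(attributes.getQName(i), attributes.getValue(i));
        }
      }
    });
    parser.parse(new InputSource(new StringReader(xml)));
    return result;
  }

  public static void main(String[] args) throws Exception {
    // The defaulted attribute is listed like any other attribute
    System.out.println(reportedAttributes(DOCUMENT));
  }

}
```

Nothing in the callback distinguishes the defaulted attribute from one that was typed into the document, which is exactly the point.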

The DTDHandler interface covers the last two effects. Because notations and unparsed entities are so infrequently used, they're not made a part of the main ContentHandler interface. Instead they're given their own callback interface, DTDHandler, that's just for working with notations and unparsed entities. This is summarized in Example 7.16. The few developers who need this functionality can use it. Everyone else can ignore it.

Example 7.16 The DTDHandler Interface

    package org.xml.sax;

    public interface DTDHandler {

      public void notationDecl(String name, String publicID,
       String systemID) throws SAXException;

      public void unparsedEntityDecl(String name, String publicID,
       String systemID, String notationName) throws SAXException;

    }

As with other callback interfaces, developers implement this interface in a class of their own choosing. An instance of that class is registered with the XMLReader through its setDTDHandler() method. For parallelism, there's also a getDTDHandler() method, although it isn't much needed in practice:

    public void setDTDHandler(DTDHandler handler)
    public DTDHandler getDTDHandler()

As with the other callback interfaces, you can uninstall a DTDHandler by passing null to setDTDHandler().
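The registration cycle can be sketched roughly as follows. DeclarationLogger, its catalog document, and the log format are hypothetical names invented for this illustration, and the XMLReader is obtained through JAXP's SAXParserFactory rather than any particular parser class. The handler simply records each declaration it is told about, then is uninstalled again once the parse is finished.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class DeclarationLogger implements DTDHandler {

  // Hypothetical sample document with one notation
  // and one unparsed entity in its internal DTD subset
  static final String DOCUMENT =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE catalog [\n"
    + "  <!ELEMENT catalog EMPTY>\n"
    + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
    + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\""
    + " NDATA gif>\n"
    + "]>\n"
    + "<catalog/>";

  private final List<String> log = new ArrayList<String>();

  public void notationDecl(String name, String publicID,
      String systemID) {
    log.add("notation " + name);
  }

  public void unparsedEntityDecl(String name, String publicID,
      String systemID, String notationName) {
    log.add("unparsed entity " + name + " NDATA " + notationName);
  }

  public List<String> getLog() {
    return log;
  }

  // Installs a logger, parses the document, uninstalls the logger,
  // and returns what the logger recorded
  public static List<String> parse(String xml) throws Exception {
    XMLReader parser
        = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    DeclarationLogger logger = new DeclarationLogger();
    parser.setDTDHandler(logger);
    // getDTDHandler() hands back whatever was last installed
    if (parser.getDTDHandler() != logger) {
      throw new IllegalStateException("Handler was not installed");
    }
    parser.parse(new InputSource(new StringReader(xml)));
    parser.setDTDHandler(null); // uninstall the handler
    return logger.getLog();
  }

  public static void main(String[] args) throws Exception {
    List<String> log = parse(DOCUMENT);
    for (int i = 0; i < log.size(); i++) {
      System.out.println(log.get(i));
    }
  }

}
```

Note that the entity named logo is never referenced in the document. The declarations are reported because they appear in the DTD, not because anything uses them.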

The most common thing to do with a DTDHandler is simply to store all of the information provided about the notations and unparsed entities. Then the ContentHandler can refer back to this when it needs to resolve an unparsed entity. Example 7.17 is a simple DTDHandler implementation that stores the notations and unparsed entities declared in the DTD in two hash tables.

Example 7.17 A Caching DTDHandler

    import org.xml.sax.*;
    import java.util.Hashtable;

    public class UnparsedCache implements DTDHandler {

      // Both tables are keyed by the declared name
      private Hashtable notations = new Hashtable();
      private Hashtable entities = new Hashtable();

      public void notationDecl(String name, String publicID,
       String systemID) {
        notations.put(name, new Notation(name, publicID, systemID));
      }

      public void unparsedEntityDecl(String name, String publicID,
       String systemID, String notationName) {
        entities.put(name, new UnparsedEntity(name, publicID,
         systemID, notationName));
      }

      // Returns null if no unparsed entity with this name was declared
      public UnparsedEntity getUnparsedEntity(String name) {
        return (UnparsedEntity) entities.get(name);
      }

      // Returns null if no notation with this name was declared
      public Notation getNotation(String name) {
        return (Notation) notations.get(name);
      }

    }

For the convenience of tracking the several strings associated with each notation and unparsed entity, I wrap each one in a very simple class that just has a constructor, some getter methods, the equals() and hashCode() methods needed to store these objects in hash tables, and a toString() method for convenient output. The Notation class is shown in Example 7.18, and the UnparsedEntity class is shown in Example 7.19. Once you learn about DOM, an alternative would be to use that API's Notation and Entity classes instead.

Example 7.18 A Notation Utility Class

    public class Notation {

      private String name;
      private String publicID;
      private String systemID;

      public Notation(String name, String publicID,
       String systemID) {
        this.name = name;
        this.publicID = publicID;
        this.systemID = systemID;
      }

      public String getName() {
        return this.name;
      }

      public String getSystemID() {
        return this.systemID;
      }

      public String getPublicID() {
        return this.publicID;
      }

      public boolean equals(Object o) {
        if (o instanceof Notation) {
          Notation n = (Notation) o;
          // Well-formedness requires every notation to have
          // at least a SYSTEM or a PUBLIC ID so both should not be
          // simultaneously null as long as the UnparsedCache built
          // this object. Either one alone may be null, though, so
          // the IDs are compared null-safely.
          return name.equals(n.name)
           && same(publicID, n.publicID)
           && same(systemID, n.systemID);
        }
        return false;
      }

      // True if both strings are null or both have the same content
      private static boolean same(String s1, String s2) {
        if (s1 == null) return s2 == null;
        return s1.equals(s2);
      }

      public int hashCode() {
        if (publicID == null) {
          return name.hashCode() ^ systemID.hashCode();
        }
        else if (systemID == null) {
          return name.hashCode() ^ publicID.hashCode();
        }
        else {
          return name.hashCode() ^ publicID.hashCode()
           ^ systemID.hashCode();
        }
      }

      public String toString() {
        StringBuffer result = new StringBuffer(name);
        if (publicID != null) {
          result.append(" PUBLIC ");
          result.append(publicID);
          if (systemID != null) {
            result.append(" ");
            result.append(systemID);
          }
        }
        else {
          result.append(" SYSTEM ");
          result.append(systemID);
        }
        return result.toString();
      }

    }

Example 7.19 An UnparsedEntity Utility Class

    public class UnparsedEntity {

      private String name;
      private String publicID;
      private String systemID;
      private String notationName;

      public UnparsedEntity(String name, String publicID,
       String systemID, String notationName) {
        this.name = name;
        this.publicID = publicID;
        this.systemID = systemID;
        this.notationName = notationName;
      }

      public String getName() {
        return this.name;
      }

      public String getSystemID() {
        return this.systemID;
      }

      public String getPublicID() {
        return this.publicID;
      }

      public String getNotationName() {
        return this.notationName;
      }

      public boolean equals(Object o) {
        if (o instanceof UnparsedEntity) {
          UnparsedEntity entity = (UnparsedEntity) o;
          // Only the public ID is optional, so it is the only
          // field that needs a null-safe comparison
          boolean samePublicID = (publicID == null)
           ? entity.publicID == null
           : publicID.equals(entity.publicID);
          return name.equals(entity.name)
           && systemID.equals(entity.systemID)
           && samePublicID
           && notationName.equals(entity.notationName);
        }
        return false;
      }

      public int hashCode() {
        if (publicID == null) {
          return name.hashCode() ^ systemID.hashCode()
           ^ notationName.hashCode();
        }
        else {
          return name.hashCode() ^ publicID.hashCode()
           ^ systemID.hashCode() ^ notationName.hashCode();
        }
      }

      public String toString() {
        StringBuffer result = new StringBuffer(name);
        // A public ID, when present, is always followed by a system ID
        if (publicID != null) {
          result.append(" PUBLIC ");
          result.append(publicID);
          result.append(" ");
        }
        else {
          result.append(" SYSTEM ");
        }
        result.append(systemID);
        return result.toString();
      }

    }

When you later encounter an attribute of type ENTITY, ENTITIES, or NOTATION in the ContentHandler, you can use the UnparsedCache's getUnparsedEntity() and getNotation() methods to return the relevant data for that item. Example 7.20 is a simple program to list the unparsed entities and notations discovered in an XML document.

Example 7.20 A Program That Lists the Unparsed Entities and Notations Used in an XML Document

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import java.io.IOException;
    import java.util.StringTokenizer;

    public class EntityLister extends DefaultHandler {

      private UnparsedCache cache;

      public EntityLister(UnparsedCache cache) {
        this.cache = cache;
      }

      public void startElement(String namespaceURI, String localName,
       String qualifiedName, Attributes attributes) {

        for (int i = 0; i < attributes.getLength(); i++) {
          if (attributes.getType(i).equals("NOTATION")) {
            Notation n = cache.getNotation(attributes.getValue(i));
            System.out.println("Element " + qualifiedName
             + " has notation " + n);
          }
          else if (attributes.getType(i).equals("ENTITY")) {
            UnparsedEntity e = cache.getUnparsedEntity(
             attributes.getValue(i));
            System.out.println("Entity: " + e);
          }
          else if (attributes.getType(i).equals("ENTITIES")) {
            String entityNames = attributes.getValue(i);
            StringTokenizer st
             = new StringTokenizer(entityNames);
            while (st.hasMoreTokens()) {
               String name = st.nextToken();
               UnparsedEntity e = cache.getUnparsedEntity(name);
               System.out.println("Entity: " + e);
            }
          }
        }

      }

      public static void main(String[] args) {

        if (args.length <= 0) {
          System.out.println("Usage: java EntityLister URL");
          return;
        }
        String document = args[0];

        try {
          XMLReader parser = XMLReaderFactory.createXMLReader();
          // I want to use qualified names
          parser.setFeature(
           "http://xml.org/sax/features/namespace-prefixes", true);
          UnparsedCache cache = new UnparsedCache();
          parser.setDTDHandler(cache);
          parser.setContentHandler(new EntityLister(cache));
          parser.parse(document);
        }
        catch (Exception e) {
          System.out.println("Could not read document because "
           + e.getMessage());
        }

      }

    }

It took me a while to find an XML document in the wild that actually used notations and unparsed entities. But David Carlisle pointed out to me that DocBook uses notations to identify preformatted elements in which white space should be preserved. This book is written in DocBook, so I decided to run EntityLister across a rough draft of this chapter. Here's what came out:

    % java EntityLister xmlreader.xml
    Element screen has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    Element screen has notation linespecific SYSTEM linespecific
    Element programlisting has notation linespecific SYSTEM linespecific
    ...
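The DocBook run only exercises the NOTATION branch. For the ENTITY case, the following self-contained sketch may help. Everything in it is an assumption made for illustration: the class name EntityAttributeDemo, the image document, the example.com URL, and the use of JAXP's SAXParserFactory. Because DefaultHandler implements both DTDHandler and ContentHandler, one object can remember each unparsed entity's system ID while the DTD is read, and then look it up again when an attribute of type ENTITY turns up.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityAttributeDemo extends DefaultHandler {

  // Hypothetical sample document whose source attribute
  // is declared to have type ENTITY
  static final String DOCUMENT =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE image [\n"
    + "  <!ELEMENT image EMPTY>\n"
    + "  <!ATTLIST image source ENTITY #REQUIRED>\n"
    + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
    + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\""
    + " NDATA gif>\n"
    + "]>\n"
    + "<image source=\"logo\"/>";

  // Unparsed entity name -> system ID, filled in while the DTD is read
  private final Map<String, String> systemIDs
      = new HashMap<String, String>();
  // Element name -> system ID of the entity its ENTITY attribute names
  private final Map<String, String> resolved
      = new LinkedHashMap<String, String>();

  // DTDHandler callback
  public void unparsedEntityDecl(String name, String publicID,
      String systemID, String notationName) {
    systemIDs.put(name, systemID);
  }

  // ContentHandler callback
  public void startElement(String namespaceURI, String localName,
      String qualifiedName, Attributes attributes) {
    for (int i = 0; i < attributes.getLength(); i++) {
      if ("ENTITY".equals(attributes.getType(i))) {
        // The attribute value is the entity's name, not its location
        resolved.put(qualifiedName,
            systemIDs.get(attributes.getValue(i)));
      }
    }
  }

  public static Map<String, String> resolve(String xml)
      throws Exception {
    XMLReader parser
        = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    EntityAttributeDemo handler = new EntityAttributeDemo();
    parser.setDTDHandler(handler);
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.resolved;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(resolve(DOCUMENT));
  }

}
```

The parser never tries to load logo.gif. It only reports the name, identifiers, and notation, and leaves it to the application to decide what, if anything, to do with the data.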