Receiving Skipped Entities


Validating parsers resolve all general entity references that occur in both element content and attribute values. Nonvalidating parsers, however, are not required to read the external DTD subset. Consider the simple XHTML document in Example 6.14.

Example 6.14 An XML Document Containing a Potentially Skipped Entity Reference
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <body>
        <h1>My resum&eacute;</h1>
      </body>
    </html>

If a parser does not read the DTD, then it has no way of knowing what the entity reference &eacute; stands for, or indeed whether that entity reference is even properly defined. Such a nonvalidating parser assumes that the entity is defined in the external DTD subset it didn't read. Because it has no replacement text to report for that entity, it reports a skipped entity through the skippedEntity() callback method instead:

    public void skippedEntity(String name) throws SAXException
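As a minimal sketch of receiving this callback, the handler below simply remembers the name of every entity the parser reports as skipped. The class name SkippedEntityRecorder and its getSkippedEntities() method are hypothetical names chosen for illustration; only DefaultHandler and skippedEntity() come from SAX itself.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the name of every entity the parser skipped.
class SkippedEntityRecorder extends DefaultHandler {

    private final List<String> skipped = new ArrayList<>();

    // The parser calls this in place of delivering the entity's replacement
    // text. For a general entity the name arrives without the leading '&'
    // or the trailing ';'.
    @Override
    public void skippedEntity(String name) {
        skipped.add(name);
    }

    // Returns a copy of the names, in the order the parser reported them.
    public List<String> getSkippedEntities() {
        return new ArrayList<>(skipped);
    }
}
```

After a parse, a nonempty list from getSkippedEntities() would indicate that the text delivered to characters() is missing whatever those entities stood for.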

For example, according to the XHTML 1.0 specification, if a User Agent such as a browser

encounters an entity reference (other than one of the predefined HTML entities) for which the User Agent has processed no declaration (which could happen if the declaration is in the external subset which the User Agent hasn't read), the entity reference should be processed as the characters (starting with the ampersand and ending with the semi-colon) that make up the entity reference.

In other words, rather than rendering &prescription_take; as the symbol ℞, the browser is supposed to draw it as simply &prescription_take;. If you were writing an XHTML browser that did not validate but did require full conformance to XHTML 1.0, you would probably implement the skippedEntity() method by passing an ampersand, the name of the entity reference, and a semicolon to the characters() method in the same content handler, like this:

    public void skippedEntity(String name)
     throws SAXException {

      // Rebuild the literal reference: '&', the entity name, ';'
      StringBuffer sb = new StringBuffer();
      sb.append('&');
      sb.append(name);
      sb.append(';');

      // Hand it to characters() as if it were ordinary text
      char[] text = new char[sb.length()];
      sb.getChars(0, sb.length(), text, 0);
      this.characters(text, 0, text.length);

    }
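For a version that can be compiled on its own, here is a sketch of a complete handler built on the same idea. The class name LiteralEntityTextCollector and the getText() accessor are hypothetical. The check for names that begin with '%' or equal "[dtd]" is an addition to the approach described above: SAX uses those forms to report skipped parameter entities and a skipped external DTD subset, and neither belongs in the document's text.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects character data and echoes each skipped
// general entity as the literal reference "&name;".
class LiteralEntityTextCollector extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void skippedEntity(String name) {
        // Skipped parameter entities are reported with a leading '%' and the
        // skipped external DTD subset as "[dtd]"; neither is document text.
        if (name.startsWith("%") || name.equals("[dtd]")) {
            return;
        }
        // Pass the literal reference through as ordinary character data.
        char[] literal = ("&" + name + ";").toCharArray();
        characters(literal, 0, literal.length);
    }

    // Returns all text seen so far, including echoed entity references.
    public String getText() {
        return text.toString();
    }
}
```

If the parser reports the heading of Example 6.14 as the characters "My resum" followed by a skipped entity named eacute, this handler accumulates the string "My resum&eacute;".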

Skipped entities can also appear in attribute values. For example:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <body>
        <div purpose="resum&eacute;">
        ...
        </div>
      </body>
    </html>

This is one of the few holes in SAX. The parser will not report such an entity to you. The value it assigns to the attribute is calculated by simply deleting the entity reference. In this example, the value of the purpose attribute would be reported as "resum" if the parser does not read the DTD.



Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
Year: 2001
Pages: 191
