Receiving Characters

When the parser reads #PCDATA, it passes this text to the characters() method as an array of chars. Although it would be simpler if characters() just took a String as an argument, using a char[] array allows certain performance optimizations. In particular, parsers often store a large chunk of the original document in a single array and repeatedly pass that same array to the characters() method, updating only the values of start and length.
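The practical consequence is that only the slice from start to start + length - 1 belongs to a given call. A minimal sketch of reading just that slice follows; the class name SliceHandler and the field seen are hypothetical names chosen for illustration, and StringBuilder is used as the unsynchronized counterpart of StringBuffer.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of the original example.
class SliceHandler extends DefaultHandler {

    // Everything this handler has been given so far
    final StringBuilder seen = new StringBuilder();

    @Override
    public void characters(char[] text, int start, int length) {
        // Only text[start] through text[start + length - 1] belongs to
        // this call. The rest of the array may hold other parts of the
        // document, so never read the whole array.
        seen.append(text, start, length);
    }
}
```

Writing new String(text) here instead would copy the entire array, including markup and text that have nothing to do with the current call.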

On the flip side, when there's a large amount of text between two tags with no intervening markup, the parser may choose to call characters() multiple times even though it doesn't need to. Xerces generally won't pass more than 16K of text in one call. Crimson is limited to about 8K of text per call. At the extreme, I have even seen a parser pass a single character at a time to the characters() method. You must not assume that the parser will pass the maximum contiguous run of text in a single call to characters().
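One way to see this behavior for yourself is to count the calls. The following sketch assumes the JAXP parser bundled with the JDK, obtained through SAXParserFactory; ChunkCounter and its fields are hypothetical names. The number of calls and the largest chunk vary from parser to parser, so no particular values should be expected; only the total is fixed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper that records how a parser splits up a run of text.
class ChunkCounter extends DefaultHandler {

    int calls = 0;        // number of characters() invocations
    int totalChars = 0;   // sum of all length arguments
    int largestChunk = 0; // largest single length argument

    @Override
    public void characters(char[] text, int start, int length) {
        calls++;
        totalChars += length;
        largestChunk = Math.max(largestChunk, length);
    }

    // Parses <doc>xxx...x</doc> containing textLength characters of text
    static ChunkCounter count(int textLength) throws Exception {
        StringBuilder xml = new StringBuilder("<doc>");
        for (int i = 0; i < textLength; i++) xml.append('x');
        xml.append("</doc>");

        ChunkCounter counter = new ChunkCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml.toString())), counter);
        return counter;
    }
}
```

Calling ChunkCounter.count(100000) and printing calls and largestChunk shows how the parser in use divides a long run of text.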

This can lead to some uncomfortable contortions when processing many documents. Given an element such as <Name>Birdsong Clock</Name>, you typically want to process the entire content as a unit. This requires you to set a boolean flag at the start-tag for the element in startElement(); accumulate the data into a buffer of some kind, often a StringBuffer; and act on the data only when you reach the end-tag for the element, as signaled by the endElement() method.
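The three steps just described can be sketched as follows. NameCollector, its fields, and the collect() helper are hypothetical names, and the helper assumes the JAXP parser bundled with the JDK with its default (non-namespace-aware) settings, so that the qualified name is simply Name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that gathers the complete text of each Name element.
class NameCollector extends DefaultHandler {

    final List<String> names = new ArrayList<>();
    private boolean inName = false;
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        if (qualifiedName.equals("Name")) {
            inName = true;        // step 1: set the flag
            buffer.setLength(0);  // start with an empty buffer
        }
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // step 2: accumulate, however many calls the parser makes
        if (inName) buffer.append(text, start, length);
    }

    @Override
    public void endElement(String namespaceURI, String localName,
                           String qualifiedName) {
        if (qualifiedName.equals("Name")) {
            names.add(buffer.toString()); // step 3: act on the whole text
            inName = false;
        }
    }

    // Convenience method: parse a document held in a String
    static List<String> collect(String xml) throws Exception {
        NameCollector handler = new NameCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }
}
```

Because the handler acts only in endElement(), it produces the same result whether the parser delivers Birdsong Clock in one call or in fourteen.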

For an example, I'm going to revisit the Fibonacci XML-RPC client program from Chapter 5. This time, rather than printing the result on System.out, I'm going to collect the result and make it available as a BigInteger. Once again, this will require the ContentHandler to recognize the contents of the single double element in the response while ignoring everything else. Example 6.10 demonstrates this.

Example 6.10 A SAX Client for the Fibonacci XML-RPC Server
import java.io.*;
import java.net.*;
import java.math.BigInteger;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class NewFibonacciClient {

  public final static String DEFAULT_SERVER
   = "";

  public static BigInteger calculateFibonacci(int index,
   String server) throws IOException, SAXException {

    // Connect to the server
    URL u = new URL(server);
    URLConnection uc = u.openConnection();
    HttpURLConnection connection = (HttpURLConnection) uc;
    connection.setDoOutput(true);
    connection.setDoInput(true);
    connection.setRequestMethod("POST");
    OutputStream out = connection.getOutputStream();
    Writer wout = new OutputStreamWriter(out, "UTF-8");

    // Transmit the request XML document
    wout.write("<?xml version=\"1.0\"?>\r\n");
    wout.write("<methodCall>\r\n");
    wout.write(
     "  <methodName>calculateFibonacci</methodName>\r\n");
    wout.write("  <params>\r\n");
    wout.write("    <param>\r\n");
    wout.write("      <value><int>" + index
     + "</int></value>\r\n");
    wout.write("    </param>\r\n");
    wout.write("  </params>\r\n");
    wout.write("</methodCall>\r\n");
    wout.flush();
    wout.close();

    // Read the response XML document
    XMLReader parser = XMLReaderFactory.createXMLReader(
      "org.apache.xerces.parsers.SAXParser"
    );
    FibonacciHandler handler = new FibonacciHandler();
    parser.setContentHandler(handler);

    InputStream in = connection.getInputStream();
    InputSource source = new InputSource(in);
    parser.parse(source);

    in.close();
    connection.disconnect();

    return handler.result;

  }

  static class FibonacciHandler extends DefaultHandler {

    // Non-null only while the parser is inside the double element
    StringBuffer buffer = null;
    BigInteger result = null;

    public void startElement(String namespaceURI,
     String localName, String qualifiedName, Attributes atts) {

      if (qualifiedName.equals("double")) {
        buffer = new StringBuffer();
      }

    }

    public void endElement(String namespaceURI, String localName,
     String qualifiedName) {

      if (qualifiedName.equals("double")) {
        // All the text has now been seen, so it is safe to convert it
        String accumulatedText = buffer.toString();
        result = new BigInteger(accumulatedText);
        buffer = null;
      }

    }

    public void characters(char[] text, int start, int length)
     throws SAXException {

      // May be called several times for a single run of text
      if (buffer != null) {
        buffer.append(text, start, length);
      }

    }

  }

  public static void main(String[] args) {

    int index;
    try {
      index = Integer.parseInt(args[0]);
    }
    catch (Exception e) {
      System.out.println(
       "Usage: java NewFibonacciClient number url"
      );
      return;
    }

    String server = DEFAULT_SERVER;
    if (args.length >= 2) server = args[1];

    try {
      BigInteger result = calculateFibonacci(index, server);
      System.out.println(result);
    }
    catch (Exception e) {
      e.printStackTrace();
    }

  }

}

The return value is stored in a nonpublic BigInteger field named result. The value of this field only makes sense after the response has been received and parsed; therefore, I hide the ContentHandler in a static nested class, which is used only through the static calculateFibonacci() method. Because ContentHandler methods often need to be called in a specific order from a certain context, the strategy of hiding them inside a nonpublic, possibly inner, class is quite common. It's not absolutely required, but it does make the class safer and the public interface much simpler.

What's really new here is how the characters() method operates. Fibonacci numbers grow exponentially, so they become arbitrarily large very quickly. For any parser there is a Fibonacci number, its exact size depending on the parser, whose digits will not all be delivered in a single call to characters(). Consequently, rather than simply storing a boolean flag that tells us whether we're in the double element, we use a StringBuffer field. This is null outside the double element and non-null inside it. When it is non-null, the characters() method appends data to the buffer. That data is acted on (in this case, converted to a BigInteger) only when an end-tag is spotted and the endElement() method is invoked.

This general approach of accumulating data into a buffer and acting on it only after the last character of data has been seen is very common in SAX programs. Elements that contain mixed content are handled similarly. Elements that can recursively contain other elements with the same name (for example, in XHTML a div can contain another div) are trickier, but normally can be handled by using a stack of element name flags rather than a single boolean flag. Indeed, stacks are often very convenient data structures when processing XML with SAX, as is evident in earlier examples and will be seen again before this chapter is done.
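As a rough sketch of the recursive case, the handler below keeps a stack of buffers, one per open div. This is a variation on the stack of flags mentioned above rather than the same thing: each buffer on the stack plays the role of a flag and also holds the text seen so far. NestedDivHandler and its fields are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of each div separately,
// even when one div is nested inside another.
class NestedDivHandler extends DefaultHandler {

    // Text of each div, in the order the end-tags are seen
    final List<String> divTexts = new ArrayList<>();

    // One buffer per currently open div; the innermost is on top
    private final Deque<StringBuilder> open = new ArrayDeque<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        if (qualifiedName.equals("div")) open.push(new StringBuilder());
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // Text goes to the innermost open div; text outside any div
        // is ignored. Text in other child elements of a div is
        // credited to that div.
        StringBuilder current = open.peek();
        if (current != null) current.append(text, start, length);
    }

    @Override
    public void endElement(String namespaceURI, String localName,
                           String qualifiedName) {
        if (qualifiedName.equals("div")) {
            divTexts.add(open.pop().toString());
        }
    }
}
```

With a single boolean flag, the end-tag of the inner div would switch accumulation off while the outer div was still open; popping the stack instead returns the handler to the enclosing div's buffer.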

Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
EAN: 2147483647
Year: 2001
Pages: 191