Implementations have quite a bit of leeway in exactly how they parse and serialize any given document. For example, a parser may represent CDATA sections as CDATASection objects, or it may merge them into neighboring Text objects. A parser may include entity reference nodes in the tree, or it may instead include the nodes corresponding to each entity's replacement text. A parser may include comments, or it may ignore them. DOM3 adds four methods to the Document interface to control exactly how a parser makes these choices:

    public void    normalizeDocument()
    public boolean canSetNormalizationFeature(String name, boolean state)
    public void    setNormalizationFeature(String name, boolean state)
    public boolean getNormalizationFeature(String name)

The canSetNormalizationFeature() method tests whether the implementation supports the desired value (true or false) for the named feature. The setNormalizationFeature() method sets the value of the named feature, and getNormalizationFeature() returns its current value. setNormalizationFeature() throws a DOMException with the error code NOT_FOUND_ERR if the implementation does not support the feature at all. It throws a DOMException with the error code NOT_SUPPORTED_ERR if the implementation does not support the requested value for the feature (for example, if you try to set a feature to true when the implementation only allows false). Finally, after all of the features have been set, client code can invoke the normalizeDocument() method to modify the tree in accordance with the current values of all the features.

Caution: These are very bleeding-edge ideas from the latest DOM3 Core Working Draft. Xerces 2.0.2 is the only parser that supports any of this so far.

The DOM3 specification defines 13 standard features.
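The working-draft methods above did not survive unchanged into the final DOM Level 3 Core Recommendation. For comparison, here is a minimal sketch of the same check-then-set pattern using the final API as implemented by the JDK's built-in parser: Document.getDomConfig() returns a DOMConfiguration whose canSetParameter(), setParameter(), and getParameter() methods play the roles of canSetNormalizationFeature(), setNormalizationFeature(), and getNormalizationFeature(), while normalizeDocument() stays on Document. In the final specification, setParameter() likewise throws NOT_FOUND_ERR or NOT_SUPPORTED_ERR, so asking canSetParameter() first avoids relying on the exception. The class name ParameterCheck, the helper trySet(), and the parameter name "no-such-parameter" are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParameterCheck {

    // Parses an XML string with a namespace-aware JAXP DocumentBuilder.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }

    // Sets a boolean parameter only if the implementation says it can.
    // canSetParameter() returns false both for unrecognized names and for
    // unsupported values, so setParameter() should not throw here.
    static boolean trySet(DOMConfiguration config, String name, boolean value) {
        if (!config.canSetParameter(name, value)) {
            return false;
        }
        config.setParameter(name, value);
        // Read the value back, the job getNormalizationFeature() did in the draft
        return Boolean.valueOf(value).equals(config.getParameter(name));
    }

    public static void main(String[] args) {
        Document doc = parse("<root><!-- note --><![CDATA[x]]></root>");
        DOMConfiguration config = doc.getDomConfig();
        System.out.println("comments -> false: " + trySet(config, "comments", false));
        System.out.println("cdata-sections -> false: " + trySet(config, "cdata-sections", false));
        System.out.println("made-up name: " + trySet(config, "no-such-parameter", true));
        // Rewrite the tree according to whatever values were actually set
        doc.normalizeDocument();
        System.out.println("children of root: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Reading the value back after setting it is cheap and guards against an implementation that accepts a value but silently ignores it.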
In addition to the 13 standard features, vendors may define their own nonstandard features. Feature names must be XML 1.0 names and should use a vendor-specific prefix such as apache: or oracle:.

For an example of how these features could be useful, consider the SOAP servlet in Example 10.14. It needed to locate the calculateFibonacci element in the request document and extract its full text content, even if that element contained comments and CDATA sections. The getFullText() method that accomplished this wasn't too hard to write, but DOM3 makes it even easier: set the create-cdata-nodes and comments features to false and call normalizeDocument() as soon as the document is parsed. Once this is done, the calculateFibonacci element contains only one text-node child (provided it contained nothing but text, CDATA sections, and comments to begin with).

    try {
      Document request = parser.parse(in);
      request.setNormalizationFeature("create-cdata-nodes", false);
      request.setNormalizationFeature("comments", false);
      request.normalizeDocument();
      NodeList ints = request.getElementsByTagNameNS(
        "http://namespaces.cafeconleche.org/xmljava/ch3/",
        "calculateFibonacci");
      Node calculateFibonacci = ints.item(0);
      Node text = calculateFibonacci.getFirstChild();
      String generations = text.getNodeValue();
      // ...
    }
    catch (DOMException e) {
      // The create-cdata-nodes feature is true by default and
      // parsers aren't required to support a false value for it, so
      // you should be prepared to fall back on manual normalization
      // if necessary. The comments feature, however, is required.
    }

This wouldn't work for the XML-RPC case, however, because XML-RPC documents can contain processing instructions, and there's no feature to turn off processing instructions.
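The same request-handling step can be sketched against the final DOM Level 3 API, assuming the JDK's built-in parser, where the final specification's cdata-sections parameter plays the part of the draft's create-cdata-nodes feature. Rather than catching a DOMException, this sketch asks canSetParameter() first, and it falls back on walking the children by hand whenever normalization does not leave exactly one text node, for instance when a processing instruction sits inside the element. The class FibonacciRequest, its helper methods, and the sample request are hypothetical; collectText() is a stand-in for the book's getFullText(), not a copy of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FibonacciRequest {

    // Namespace URI taken from the example above
    static final String NS = "http://namespaces.cafeconleche.org/xmljava/ch3/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse request", e);
        }
    }

    // Manual fallback: concatenate Text and CDATASection data, recursing into
    // elements and entity references; comments and PIs are skipped.
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node kid = node.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            short type = kid.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(kid.getNodeValue());
            } else if (type == Node.ELEMENT_NODE || type == Node.ENTITY_REFERENCE_NODE) {
                sb.append(collectText(kid));
            }
        }
        return sb.toString();
    }

    static String generations(Document request) {
        DOMConfiguration config = request.getDomConfig();
        // Normalize only if both values are supported; otherwise leave the tree alone
        if (config.canSetParameter("cdata-sections", false)
                && config.canSetParameter("comments", false)) {
            config.setParameter("cdata-sections", false);
            config.setParameter("comments", false);
            request.normalizeDocument();
        }
        Node element = request.getElementsByTagNameNS(NS, "calculateFibonacci").item(0);
        if (element == null) {
            throw new IllegalArgumentException("No calculateFibonacci element");
        }
        Node first = element.getFirstChild();
        // Use the single text child if normalization produced one
        if (first != null && first.getNextSibling() == null
                && first.getNodeType() == Node.TEXT_NODE) {
            return first.getNodeValue();
        }
        return collectText(element);
    }

    public static void main(String[] args) {
        // Hypothetical request: a comment and a CDATA section split the digits
        String xml = "<calculateFibonacci xmlns='" + NS + "'>"
                + "1<!-- a comment -->0<![CDATA[0]]></calculateFibonacci>";
        System.out.println(generations(parse(xml))); // prints 100
    }
}
```

Because a processing instruction is neither a comment nor a CDATA section, content such as 1<?pi x?>2 still leaves two text nodes after normalizeDocument(), and generations() returns "12" only by way of the fallback, which is the XML-RPC limitation described above.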