DOM Level 3


DOM Level 3 (DOM3) will finally add a standard load-and-save package, making it possible to write completely implementation-independent DOM programs. This package, org.w3c.dom.ls, is identified by the feature strings LS-Load and LS-Save. The loading part includes the DOMBuilder interface you've already encountered. The saving part is based on the DOMWriter interface. DOMWriter is more powerful than XMLSerializer. Whereas XMLSerializer is limited to outputting documents, document fragments, and elements, DOMWriter can output any kind of node at all. Furthermore, you can install a filter into a DOMWriter to control its output.

As the method signatures in Example 13.3 show, DOMWriter can copy a Node object from memory into serialized bytes or characters. It has methods to write XML nodes onto a Java OutputStream or into a String. The most common kind of node you'll write is a Document, but you can also write any other kind of node, such as Element, Attr, and Text. The interface also has methods to control exactly how the output is formatted and how errors are reported.

Caution

This section is based on very early, bleeding-edge technology and specifications, particularly the July 25, 2002 Working Draft of the Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification [http://www.w3.org/TR/2002/WD-DOM-Level-3-LS-20020725] and Xerces-J 2.0.2. Even with Xerces-J 2.2, most of the code in this section won't even compile, much less run. Furthermore, it's virtually guaranteed that the details in this section will change before DOM3 becomes a final recommendation.


Example 13.3 The DOM3 DOMWriter Interface
package org.w3c.dom.ls;

import java.io.OutputStream;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMException;
import org.w3c.dom.Node;

public interface DOMWriter {

  public void    setFeature(String name, boolean state)
   throws DOMException;
  public boolean canSetFeature(String name, boolean state);
  public boolean getFeature(String name) throws DOMException;

  public String  getEncoding();
  public void    setEncoding(String encoding);
  public String  getNewLine();
  public void    setNewLine(String newLine);

  public boolean writeNode(OutputStream out, Node node);
  public String  writeToString(Node node) throws DOMException;

  public DOMErrorHandler getErrorHandler();
  public void setErrorHandler(DOMErrorHandler errorHandler);
  public DOMWriterFilter getFilter();
  public void setFilter(DOMWriterFilter filter);

}

Note

DOMWriter is not a java.io.Writer. In fact, it even prefers output streams to writers. The name is just a coincidence.


The primary purpose of this interface is to write nodes into strings or onto streams. These nodes can be complete documents or parts thereof, such as elements or text nodes. For example, the following code fragment uses the DOMWriter object writer to copy the Document object doc onto System.out and copy its root element into a String:

try {
  DOMWriter writer;
  // initialize the DOMWriter...

  // writeNode() takes the destination stream first, then the node
  writer.writeNode(System.out, doc);
  String root = writer.writeToString(doc.getDocumentElement());
}
catch (Exception e) {
  System.err.println(e);
}
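The draft DOMWriter never shipped under that name. In the final DOM Level 3 Load and Save Recommendation, which the JDK has bundled in org.w3c.dom.ls since Java 5, it became LSSerializer: writeToString() survived, while writeNode() was replaced by write(Node, LSOutput). The following is a minimal sketch of the fragment above against that final API, not against the draft interface this section describes. It assumes the JDK's default DOM implementation, which can be cast to DOMImplementationLS; the greeting element is purely illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class WriteNodeSketch {

    // Builds a tiny in-memory document: <greeting>Hello</greeting>
    // (the element name is illustrative only).
    static Document buildDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element root = doc.createElement("greeting");
            root.setTextContent("Hello");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The JDK's DOMImplementation also implements DOMImplementationLS,
    // so a cast gives access to the Load and Save factory methods.
    static LSSerializer newSerializer(Document doc) {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        return ls.createLSSerializer();
    }

    // Counterpart of writer.writeToString(doc.getDocumentElement())
    static String rootAsString(Document doc) {
        return newSerializer(doc).writeToString(doc.getDocumentElement());
    }

    public static void main(String[] args) {
        Document doc = buildDocument();
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();

        // Counterpart of writer.writeNode(System.out, doc):
        // the destination stream is wrapped in an LSOutput.
        LSOutput out = ls.createLSOutput();
        out.setByteStream(System.out);
        newSerializer(doc).write(doc, out);
        System.out.println();

        System.out.println(rootAsString(doc));
    }
}
```

Wrapping the stream in an LSOutput is the main structural change: the destination, not the serializer, now carries stream and encoding information.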

DOMWriter also has several methods to configure the output. The setNewLine() method chooses the line separator used for output. The only legal values are a carriage return, a line feed, or both; that is, in Java parlance, "\r", "\n", or "\r\n". You can also set it to null to indicate that you want the platform's default value.

The setEncoding() method changes the character encoding used for the output. Which encodings a given serializer supports varies from implementation to implementation, but common values include UTF-8, UTF-16, and ISO-8859-1. UTF-8 is the default if no value is supplied. For example, the following code fragment sets up a writer for output on a Macintosh:

DOMWriter writer;
// initialize the DOMWriter...
writer.setNewLine("\r");
writer.setEncoding("MacRoman");
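In the final Load and Save Recommendation as shipped in the JDK, the line separator stayed on the serializer (LSSerializer.setNewLine()), but the encoding moved to the LSOutput destination. The sketch below assumes that final API. It uses UTF-16 rather than MacRoman, because which legacy encodings a serializer accepts is implementation dependent; the note and to element names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class NewLineEncodingSketch {

    // Serializes <note><to>Ann</to></note> with the given line separator
    // and encoding, returning the raw bytes.
    static byte[] serialize(String newLine, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element note = doc.createElement("note");
            Element to = doc.createElement("to");
            to.setTextContent("Ann");
            note.appendChild(to);
            doc.appendChild(note);

            DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            // Line separator, e.g. "\r", "\n", or "\r\n"; null restores the default
            serializer.setNewLine(newLine);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            LSOutput out = ls.createLSOutput();
            out.setEncoding(encoding);   // the encoding is set on the destination
            out.setByteStream(bytes);
            serializer.write(doc, out);
            return bytes.toByteArray();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize("\r\n", "UTF-16");
        System.out.println(new String(data, StandardCharsets.UTF_16));
    }
}
```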

More detailed control of the output can be achieved by getting and setting features of the DOMWriter , as you'll see shortly.

The setErrorHandler() method installs an org.w3c.dom.DOMErrorHandler object to receive notification of any problems that arise while outputting a node, such as an element with two attributes that use the same prefix for two different namespace URIs. This is a callback interface, similar to org.xml.sax.ErrorHandler but even simpler because it doesn't use different methods for different kinds of errors. Example 13.4 shows this interface. The handleError() method returns true if processing should continue after the error, or false if it shouldn't.

Example 13.4 The DOM3 DOMErrorHandler Interface
package org.w3c.dom;

public interface DOMErrorHandler {

  public boolean handleError(DOMError error);

}
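DOMErrorHandler kept this one-method shape in the final Recommendation, so on a current JDK it can be written as a lambda. What changed is how it is installed: instead of calling setErrorHandler(), you pass it as the "error-handler" parameter of the serializer's DOMConfiguration. The following minimal sketch assumes the JDK's bundled LSSerializer and collects messages in a list rather than printing them; a well-formed document like the one here should produce none.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class ErrorHandlerSketch {

    // Messages reported to the handler during the last serialization.
    static final List<String> messages = new ArrayList<>();

    static String serializeWithHandler() {
        messages.clear();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("ok"));

            DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();

            // Counterpart of the draft's setErrorHandler(): record each problem
            // and return false to ask the serializer to stop.
            DOMErrorHandler handler = error -> {
                messages.add(error.getMessage());
                return false;
            };
            serializer.getDomConfig().setParameter("error-handler", handler);
            return serializer.writeToString(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeWithHandler());
        System.out.println("problems reported: " + messages.size());
    }
}
```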

In Xerces-2, the XMLSerializer class implements the DOMWriter interface, so if you prefer, you can use these methods instead of the ones discussed in the last section. Example 13.5 demonstrates a complete program that builds a simple SVG document in memory and writes it onto System.out using a \r\n line end and the UTF-16 encoding. The error handler is an anonymous inner class that prints error messages on System.err and returns false to indicate that processing should stop when an error is detected.

Example 13.5 Serializing with DOMWriter
import org.w3c.dom.*;
import org.apache.xerces.dom3.*;
import org.apache.xerces.dom3.ls.DOMWriter;
import org.apache.xml.serialize.XMLSerializer;
import java.io.IOException;
import javax.xml.parsers.*;

public class SVGCircle {

  public static void main(String[] args) {

    try {
      // Find the implementation
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      DOMImplementation impl = builder.getDOMImplementation();

      // Create the document
      DocumentType svgDOCTYPE = impl.createDocumentType(
       "svg", "-//W3C//DTD SVG 1.0//EN",
       "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"
      );
      Document doc = impl.createDocument(
       "http://www.w3.org/2000/svg", "svg", svgDOCTYPE);

      // Fill the document
      Node rootElement = doc.getDocumentElement();
      Element circle = doc.createElementNS(
       "http://www.w3.org/2000/svg", "circle");
      circle.setAttribute("r", "100");
      rootElement.appendChild(circle);

      // Serialize the document onto System.out
      DOMWriter writer = new XMLSerializer();
      writer.setNewLine("\r\n");
      writer.setEncoding("UTF-16");
      writer.setErrorHandler(
        new DOMErrorHandler() {
          public boolean handleError(DOMError error) {
            System.err.println(error.getMessage());
            return false;
          }
        }
      );
      writer.writeNode(System.out, doc);
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

Note

Xerces-J 2.2 currently puts the DOMWriter interface in the org.apache.xerces.dom3.ls package instead of the org.w3c.dom.ls package. The Xerces development team is trying to keep the experimental DOM3 classes separate from the main API until DOM3 is more stable.


Creating DOMWriters

Example 13.5 depends on Xerces-specific classes. It won't work with GNU JAXP, the Oracle XML Parser for Java, or other parsers, even after these parsers are upgraded to support DOM3. However, you can write the code in a much more parser-independent fashion by using the DOMImplementationLS interface, shown in Example 13.6, to create concrete implementations of DOMWriter rather than constructing the implementation classes directly. DOMImplementationLS is a separate interface, implemented by the same objects that implement DOMImplementation, that adds three methods to create new DOMBuilders, DOMWriters, and DOMInputSources.

Example 13.6 The DOM3 DOMImplementationLS Interface
package org.w3c.dom.ls;

import org.w3c.dom.DOMException;

public interface DOMImplementationLS {

  public static final short MODE_SYNCHRONOUS  = 1;
  public static final short MODE_ASYNCHRONOUS = 2;

  public DOMWriter      createDOMWriter();
  public DOMInputSource createDOMInputSource();
  public DOMBuilder     createDOMBuilder(short mode,
   String schemaType) throws DOMException;

}

You retrieve a concrete instance of this factory interface by using the DOM3 DOMImplementationRegistry factory class introduced in Chapter 10 to request a DOMImplementation object that supports the LS-Save feature. Then you cast that object to DOMImplementationLS . For example,

try {
  DOMImplementation impl = DOMImplementationRegistry
   .getDOMImplementation("Core 2.0 LS-Save 3.0");
  if (impl != null) {
    DOMImplementationLS implls = (DOMImplementationLS) impl;
    DOMWriter writer = implls.createDOMWriter();
    writer.writeNode(System.out, document);
  }
  else {
    System.out.println(
     "Could not find a DOM3 Save compliant parser.");
  }
}
catch (Exception e) {
  System.err.println(e);
}
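In the final Recommendation the registry lives in org.w3c.dom.bootstrap and must first be obtained with DOMImplementationRegistry.newInstance(), which can throw reflective exceptions. The draft's separate LS-Load and LS-Save strings were also merged into a single "LS" feature. The following sketch assumes that final API as bundled with the JDK; it is not the draft lookup shown above.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class RegistryLookupSketch {

    // Returns an implementation supporting Load and Save, or null if none
    // is registered or the registry itself cannot be created.
    static DOMImplementationLS findLS() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation impl = registry.getDOMImplementation("LS");
            return (impl instanceof DOMImplementationLS) ? (DOMImplementationLS) impl : null;
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        DOMImplementationLS implls = findLS();
        if (implls == null) {
            System.out.println("Could not find a DOM3 Load and Save implementation.");
            return;
        }
        LSSerializer writer = implls.createLSSerializer();
        System.out.println("Found serializer: " + writer.getClass().getName());
    }
}
```

The instanceof check plays the same role as the draft code's null test and cast, but also guards against an implementation that does not actually offer the LS factory methods.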

Using this technique, it's straightforward to write a completely implementation-independent program that generates and serializes XML documents, as Example 13.7 demonstrates. It uses the DOMImplementationRegistry class to load the DOMImplementationLS and the DOMWriter interface to output the final result. Otherwise, it uses only the standard DOM2 classes you've seen in previous chapters.

Example 13.7 An Implementation-Independent DOM3 Program to Build and Serialize an XML Document
import org.w3c.dom.*;
import org.w3c.dom.ls.*;

public class SVGDOMCircle {

  public static void main(String[] args) {

    try {
      // Find the implementation
      DOMImplementation impl
       = DOMImplementationRegistry.getDOMImplementation(
          "Core 2.0 LS-Load 3.0 LS-Save 3.0");
      if (impl == null) {
        System.out.println(
         "Could not find a DOM3 Load-Save compliant parser.");
        return;
      }

      // Create the document
      DocumentType svgDOCTYPE = impl.createDocumentType(
       "svg", "-//W3C//DTD SVG 1.0//EN",
       "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"
      );
      Document doc = impl.createDocument(
       "http://www.w3.org/2000/svg", "svg", svgDOCTYPE);

      // Fill the document
      Node rootElement = doc.getDocumentElement();
      Element circle = doc.createElementNS(
       "http://www.w3.org/2000/svg", "circle");
      circle.setAttribute("r", "100");
      rootElement.appendChild(circle);

      // Serialize the document onto System.out
      DOMImplementationLS implls = (DOMImplementationLS) impl;
      DOMWriter writer = implls.createDOMWriter();
      writer.writeNode(System.out, doc);
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

This program needs to test for both the LS-Load and LS-Save features because it's not absolutely guaranteed that an implementation that has one will have the other, particularly in the early days of DOM3.
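For comparison, here is a sketch of Example 13.7 adapted to the final Load and Save API bundled with the JDK. It assumes the single "LS" feature string and the org.w3c.dom.bootstrap registry, and it returns the result of writeToString() instead of writing to System.out so the output can be inspected. It is an adaptation, not the program above.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class SVGCircleLS {

    static final String SVG_NS = "http://www.w3.org/2000/svg";

    // Builds the SVG circle document and returns it serialized, or null
    // if no Load and Save implementation is available.
    static String buildAndSerialize() {
        DOMImplementation impl;
        try {
            impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
        if (!(impl instanceof DOMImplementationLS)) {
            return null;
        }

        // Create the document with its DOCTYPE
        DocumentType svgDOCTYPE = impl.createDocumentType(
            "svg", "-//W3C//DTD SVG 1.0//EN",
            "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd");
        Document doc = impl.createDocument(SVG_NS, "svg", svgDOCTYPE);

        // Fill the document
        Element circle = doc.createElementNS(SVG_NS, "circle");
        circle.setAttribute("r", "100");
        doc.getDocumentElement().appendChild(circle);

        // Serialize with the implementation-independent serializer
        LSSerializer writer = ((DOMImplementationLS) impl).createLSSerializer();
        return writer.writeToString(doc);
    }

    public static void main(String[] args) {
        String xml = buildAndSerialize();
        System.out.println(xml != null ? xml
            : "Could not find a DOM3 Load and Save implementation.");
    }
}
```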

Serialization Features

The default settings for the writeNode() and writeToString() methods are acceptable for most uses. Occasionally, however, you will want a little more control over the serialized form. For example, you might want the output to be "pretty printed" with extra white space added to indent the elements nicely. Or you might want the output to be in canonical form. All of this and more can be controlled by setting features in the writer before invoking the write method.

Defined features include the following:

normalize-characters, optional, default true

If true, then output text should be normalized according to the W3C Character Model. For example, the word café would be represented as the four-character string c, a, f, é rather than the five-character string c, a, f, e, combining acute accent. Implementations are only required to support a false value for this feature.

split-cdata-sections, required, default true

If true, then CDATA sections that contain the CDATA section end delimiter ]]> are split into pieces and the ]]> is included in a raw text node. If false, then such CDATA sections are not split; instead, an error is reported and output stops.

entities, required, default true

If true, then entity references such as &copy; are included in the output. If false, then they are not; instead, their replacement text is included.

whitespace-in-element-content, optional, default true

If true, then all white space is output. If false, then text nodes that contain only white space are deleted if the parent element's declaration from the DTD/schema does not allow #PCDATA to appear at that point.

discard-default-content, required, default true

If true, then the implementation will not write out any nodes whose presence can be inferred from the DTD or schema, for example, default attribute values. If false, then it will include them in the instance document it outputs.

canonical-form, optional, default false

If true, then the document will be written according to the rules specified by the Canonical XML specification. For example, attributes will be lexically ordered and CDATA sections will be replaced by their character content. If false, then the exact output is implementation dependent.

format-pretty-print, optional, default false

If true, white space will be adjusted to "pretty print" the XML. Exactly what this means (for example, how many spaces elements are indented or what maximum line length is used) is left up to implementations.

validate, optional, default false

If true, then the document's schema is used to validate the document as it is being output. Any validation errors that are discovered are reported to the registered error handler. (Both validation and error handlers are other new features in DOM3.)

Implementations may define additional custom features, the names of which will generally begin with vendor-specific prefixes such as "apache:" or "oracle:." For portability, remember to check for the existence of such a feature with canSetFeature() before setting it. Otherwise, you're likely to encounter an unexpected DOMException when the program is run with a different parser.

For example, the following code fragment attempts to output the Document object doc onto the OutputStream out in canonical form. However, if the implementation of DOMWriter doesn't support Canonical XML, it simply outputs the document in the normal way.

try {
  DOMWriter writer = new XMLSerializer();
  if (writer.canSetFeature("canonical-form", true)) {
    writer.setFeature("canonical-form", true);
  }
  writer.writeNode(out, doc);
}
catch (Exception e) {
  System.err.println(e);
}
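In the final Recommendation, these features became parameters of a DOMConfiguration object returned by getDomConfig(), and canSetFeature()/setFeature() became canSetParameter()/setParameter(). The sketch below applies the same guard pattern to the JDK's serializer. Whether canonical-form or format-pretty-print is actually supported depends on the implementation, so the code only relies on the guarded calls never throwing; the list and item element names are illustrative only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class GuardedFeatureSketch {

    // Turns on a boolean parameter only if the serializer says it can;
    // returns whether it was actually set.
    static boolean trySet(LSSerializer serializer, String name) {
        DOMConfiguration config = serializer.getDomConfig();
        if (config.canSetParameter(name, Boolean.TRUE)) {
            config.setParameter(name, Boolean.TRUE);
            return true;
        }
        return false;
    }

    // Serializes a small document after trying to enable each named parameter.
    static String serialize(String... wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element list = doc.createElement("list");
            Element item = doc.createElement("item");
            item.setTextContent("one");
            list.appendChild(item);
            doc.appendChild(list);

            LSSerializer serializer =
                ((DOMImplementationLS) doc.getImplementation()).createLSSerializer();
            for (String name : wanted) {
                boolean ok = trySet(serializer, name);
                System.err.println(name + (ok ? ": enabled" : ": not supported, using default"));
            }
            return serializer.writeToString(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("canonical-form", "format-pretty-print"));
    }
}
```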

Filtering Output

One of the more original aspects of the DOMWriter API is the ability to attach filters to a writer that remove certain nodes from the output. A DOMWriterFilter is a subinterface of NodeFilter from the traversal API described in Chapter 12, and works almost exactly like it. This shouldn't be too surprising, because serializing a document is merely another tree-walking operation.

To perform output filtering, you first implement the DOMWriterFilter interface shown in Example 13.8. As with the NodeFilter superinterface, the acceptNode() method returns one of the three named constants NodeFilter.FILTER_ACCEPT, NodeFilter.FILTER_REJECT, or NodeFilter.FILTER_SKIP to indicate whether a particular node should be output. FILTER_REJECT also excludes the node's descendants, while FILTER_SKIP excludes only the node itself and still considers its children. (acceptNode() isn't listed in the example because it's inherited from the superinterface.)

Example 13.8 The DOMWriterFilter Interface
package org.w3c.dom.ls;

import org.w3c.dom.traversal.NodeFilter;

public interface DOMWriterFilter extends NodeFilter {

  public int getWhatToShow();

}

The getWhatToShow() method returns an int constant indicating which kinds of nodes are passed to this filter for processing. This is a combination (a bitwise OR) of the bit constants used by NodeIterator and TreeWalker in Chapter 12: NodeFilter.SHOW_ELEMENT, NodeFilter.SHOW_TEXT, NodeFilter.SHOW_COMMENT, and so on.

Example 8.9 demonstrated a SAX filter that removed everything that wasn't in the XHTML namespace from a document. Example 13.9 is a DOMWriterFilter that accomplishes the same task.

Example 13.9 Filtering Everything That Isn't XHTML on Output
import org.w3c.dom.*;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.ls.DOMWriterFilter;

public class XHTMLFilter implements DOMWriterFilter {

  public final static String XHTML_NAMESPACE
   = "http://www.w3.org/1999/xhtml";

  // This filter only operates on elements. Everything else
  // will be output without passing through the filter. However,
  // descendants of non-XHTML elements will not be output
  // because their ancestor elements have been rejected.
  // Note that this means we don't fully handle nested XHTML;
  // e.g., XHTML contains SVG, which contains XHTML.
  // XHTML inside SVG will not be output.
  public int getWhatToShow() {
    return NodeFilter.SHOW_ELEMENT;
  }

  // acceptNode() is inherited from NodeFilter,
  // which declares it to return a short
  public short acceptNode(Node node) {

    int type = node.getNodeType();
    if (type != Node.ELEMENT_NODE) {
      return NodeFilter.FILTER_ACCEPT;
    }

    String namespace = node.getNamespaceURI();
    if (XHTML_NAMESPACE.equals(namespace)) {
      return NodeFilter.FILTER_ACCEPT;
    }
    else {
      // FILTER_REJECT drops the element and all its descendants;
      // FILTER_SKIP would drop only the element's own tags
      return NodeFilter.FILTER_REJECT;
    }

  }

}

The one thing this filter doesn't remove is non-XHTML attributes. Attributes are written out with their elements; they are never passed to acceptNode(). Filtering out attributes from other namespaces would require a custom DOMWriter. You might be able to remove them from the element nodes passed to acceptNode(), but this would modify the in-memory tree as well as the streamed output. Furthermore, the IDL code for DOMWriter indicates that the Node passed to acceptNode() is read-only, although Java has no way to express or enforce this. The underlying implementation is probably not expecting acceptNode() to modify its argument, and doing so would risk corrupting its data structures.
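On a current JDK, the same filter can be written against LSSerializerFilter, the final Recommendation's name for DOMWriterFilter, and installed with LSSerializer.setFilter(). Note the types: getWhatToShow() returns int, while the inherited acceptNode() returns short. This sketch returns FILTER_REJECT so that, per the specification, a rejected element's descendants are dropped too. It parses the input with a namespace-aware JAXP DocumentBuilder; the sample document in the test is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.w3c.dom.ls.LSSerializerFilter;
import org.w3c.dom.traversal.NodeFilter;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XHTMLSerializerFilter implements LSSerializerFilter {

    public static final String XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

    // Only elements are shown to the filter; other node types pass straight through.
    @Override
    public int getWhatToShow() {
        return NodeFilter.SHOW_ELEMENT;
    }

    // Keep XHTML elements; reject everything in any other namespace.
    @Override
    public short acceptNode(Node node) {
        return XHTML_NAMESPACE.equals(node.getNamespaceURI())
            ? NodeFilter.FILTER_ACCEPT
            : NodeFilter.FILTER_REJECT;
    }

    // Parses the XML text namespace-aware and serializes it through the filter.
    static String purify(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            LSSerializer writer =
                ((DOMImplementationLS) doc.getImplementation()).createLSSerializer();
            writer.setFilter(new XHTMLSerializerFilter());
            return writer.writeToString(doc);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(purify(
            "<html xmlns='http://www.w3.org/1999/xhtml'><p>Hi</p>"
            + "<svg xmlns='http://www.w3.org/2000/svg'><circle r='100'/></svg></html>"));
    }
}
```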

You can install a filter into a DOMWriter using the setFilter() method. Then any node the filter rejects will not be serialized. Example 13.10 uses the above XHTMLFilter to output pure XHTML from an input document that might contain SVG, MathML, SMIL, or other non-XHTML elements.

Example 13.10 Using a DOMWriterFilter
import org.w3c.dom.*;
import org.w3c.dom.ls.*;

public class XHTMLPurifier {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java XHTMLPurifier URL");
      return;
    }

    try {
      // Find the implementation
      DOMImplementation impl
       = DOMImplementationRegistry.getDOMImplementation(
          "Core 2.0 LS-Load 3.0 LS-Save 3.0");
      if (impl == null) {
        System.out.println(
         "Could not find a DOM3 Load-Save compliant parser.");
        return;
      }
      DOMImplementationLS implls = (DOMImplementationLS) impl;

      // Load the parser; null requests no particular schema type
      DOMBuilder parser = implls.createDOMBuilder(
       DOMImplementationLS.MODE_SYNCHRONOUS, null);

      // Parse the document named on the command line
      Document doc = parser.parseURI(args[0]);

      // Serialize the document onto System.out while filtering
      DOMWriter writer = implls.createDOMWriter();
      DOMWriterFilter filter = new XHTMLFilter();
      writer.setFilter(filter);
      writer.writeNode(System.out, doc);
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}
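DOMBuilder likewise became LSParser in the final Recommendation, created with createLSParser(mode, schemaType). The following is a hedged sketch of the parse-then-serialize skeleton of Example 13.10 under that API, as bundled with the JDK. By default it reads from an LSInput holding a string, so it can be tried without a file; given a command-line argument, it calls parseURI() as the example does.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

public class LSRoundTrip {

    // Obtains DOMImplementationLS from the JDK's default DOM implementation.
    static DOMImplementationLS ls() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses XML text with a synchronous LSParser and serializes it again.
    static String roundTrip(String xml) {
        DOMImplementationLS implls = ls();
        // null schemaType: no particular schema language is requested
        LSParser parser = implls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = implls.createLSInput();
        input.setStringData(xml);
        Document doc = parser.parse(input);
        LSSerializer writer = implls.createLSSerializer();
        return writer.writeToString(doc);
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            // Command-line use: parse a URI, as Example 13.10 does
            DOMImplementationLS implls = ls();
            LSParser parser = implls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            Document doc = parser.parseURI(args[0]);
            System.out.println(implls.createLSSerializer().writeToString(doc));
        } else {
            System.out.println(roundTrip("<a><b>x</b></a>"));
        }
    }
}
```

Adding a filter to this skeleton is a single writer.setFilter(...) call before writeToString(), mirroring Example 13.10.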


Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
Year: 2001
Pages: 191
