DOM Level 3
DOM Level 3 (DOM3) will finally add a standard load-and-save package, making it possible to write completely
As shown by the method signatures in Example 13.3, DOMWriter can copy a Node object from memory into serialized bytes or
Caution
This section is based on very early, bleeding-edge technology and specifications, particularly the July 25, 2002 Working Draft of the
Document Object Model (DOM) Level 3 Abstract Schemas and Load and Save Specification
[http://www.w3.org/TR/2002/WD-DOM-Level-3-LS-20020725] and Xerces-J 2.0.2. Even with Xerces-J 2.2, most of the code in this section won't even compile, much less run. Furthermore, it's virtually
Example 13.3 The DOM3 DOMWriter Interface
package org.w3c.dom.ls;
public interface DOMWriter {
public void setFeature(String name, boolean state)
throws DOMException;
public boolean canSetFeature(String name, boolean state);
public boolean getFeature(String name) throws DOMException;
public String getEncoding();
public void setEncoding(String encoding);
public String getNewLine();
public void setNewLine(String newLine);
public boolean writeNode(OutputStream out, Node node);
public String writeToString(Node node) throws DOMException;
public DOMErrorHandler getErrorHandler();
public void setErrorHandler(DOMErrorHandler errorHandler);
public DOMWriterFilter getFilter();
public void setFilter(DOMWriterFilter filter);
}
Note
DOMWriter is not a java.io.Writer. In fact, it even prefers output streams to writers. The name is just a
The primary purpose of this interface is to write nodes into strings or onto streams. These nodes can be complete documents or
try {
DOMWriter writer;
// initialize the DOMWriter...
writer.writeNode(System.out, document);
String root =
writer.writeToString(document.getDocumentElement());
}
catch (Exception e) {
System.err.println(e);
}
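The DOMWriter calls above follow the 2002 working draft and leave the writer uninitialized. In the Load and Save API as later finalized and bundled with the JDK, the corresponding interface is org.w3c.dom.ls.LSSerializer. The following is a minimal sketch under that assumption, not the draft API this section describes; the class name WriteNodeSketch and its helper methods are invented for illustration.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class WriteNodeSketch {

  // Asks the registry for any DOM implementation that supports Load and Save.
  static DOMImplementation findImplementation() {
    try {
      DOMImplementation impl =
          DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (impl == null) {
        throw new IllegalStateException("No Load and Save implementation found");
      }
      return impl;
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException(e);
    }
  }

  // Builds an svg root element containing one circle, both in the SVG namespace.
  static Document buildCircle() {
    String svg = "http://www.w3.org/2000/svg";
    Document doc = findImplementation().createDocument(svg, "svg", null);
    Element circle = doc.createElementNS(svg, "circle");
    circle.setAttribute("r", "100");
    doc.getDocumentElement().appendChild(circle);
    return doc;
  }

  // Serializes either a complete document or a single subtree to a String.
  static String toXml(Node node) {
    DOMImplementationLS ls = (DOMImplementationLS) findImplementation();
    LSSerializer serializer = ls.createLSSerializer();
    return serializer.writeToString(node);
  }

  public static void main(String[] args) {
    Document doc = buildCircle();
    System.out.println(toXml(doc));                       // complete document
    System.out.println(toXml(doc.getDocumentElement()));  // root element only
  }
}
```

As with the draft interface, the same call accepts a whole Document or any node beneath it.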
DOMWriter also has several methods to configure the output. The setNewLine() method can choose the line separator used for output. The only legal values are
The setEncoding() method changes the character encoding used for the output. Which encodings any given serializer supports varies from implementation to implementation, but common values include UTF-8, UTF-16, and ISO-8859-1. UTF-8 is the default if a value is not supplied. For example, the following writer sets up the output for use on a Macintosh:
DOMWriter writer;
// initialize the DOMWriter...
writer.setNewLine("\r");
writer.setEncoding("MacRoman");
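For comparison, here is a hedged sketch of the same two settings in the finalized Load and Save API that ships with the JDK. There, the line separator is still set on the serializer (LSSerializer.setNewLine()), but the encoding is set on the LSOutput object that describes the destination. The class name OutputOptionsSketch is invented, and the sketch uses "\r\n" and ISO-8859-1 on the assumption that they are more widely supported than MacRoman.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class OutputOptionsSketch {

  // Serializes a one-element document to bytes using the requested
  // line separator and character encoding.
  static byte[] serialize(String newLine, String encoding) {
    try {
      DOMImplementation impl =
          DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (impl == null) {
        throw new IllegalStateException("No Load and Save implementation found");
      }
      DOMImplementationLS ls = (DOMImplementationLS) impl;

      Document doc = impl.createDocument(null, "greeting", null);
      doc.getDocumentElement().setTextContent("hello");

      LSSerializer serializer = ls.createLSSerializer();
      serializer.setNewLine(newLine);   // line separator is a serializer setting

      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      LSOutput output = ls.createLSOutput();
      output.setByteStream(bytes);
      output.setEncoding(encoding);     // encoding is a property of the output
      serializer.write(doc, output);
      return bytes.toByteArray();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = serialize("\r\n", "ISO-8859-1");
    System.out.println(new String(data, StandardCharsets.ISO_8859_1));
  }
}
```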
More detailed control of the output can be achieved by getting and setting features of the DOMWriter, as you'll see shortly.
The setErrorHandler() method can install an org.w3c.dom.DOMErrorHandler object to receive notification of any problems that arise when outputting a node, such as an element that uses the same prefix for two different namespace URIs on two attributes. This is a callback interface, similar to org.xml.sax.ErrorHandler but even simpler because it doesn't use different methods for different kinds of errors. Example 13.4
Example 13.4 The DOM3 DOMErrorHandler Interface
package org.w3c.dom;
public interface DOMErrorHandler {
public boolean handleError(DOMError error);
}
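Because there is only one callback, a handler that wants to treat warnings and fatal errors differently has to inspect the error object itself. The sketch below assumes the org.w3c.dom.DOMError and DOMErrorHandler interfaces as finalized in DOM Level 3 Core and bundled with the JDK, where severity is exposed through getSeverity(). The class name CollectingErrorHandler and the stubError() helper are hypothetical and exist only to make the demonstration self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;

class CollectingErrorHandler implements DOMErrorHandler {

  private final List<String> messages = new ArrayList<>();

  // A single method receives every kind of problem. The return value
  // tells the caller whether it should try to continue.
  @Override
  public boolean handleError(DOMError error) {
    messages.add(error.getSeverity() + ": " + error.getMessage());
    return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
  }

  List<String> messages() {
    return messages;
  }

  // Hypothetical helper for this demo only: a minimal DOMError stand-in.
  static DOMError stubError(short severity, String message) {
    return new DOMError() {
      public short getSeverity() { return severity; }
      public String getMessage() { return message; }
      public String getType() { return "demo"; }
      public Object getRelatedException() { return null; }
      public Object getRelatedData() { return null; }
      public DOMLocator getLocation() { return null; }
    };
  }

  public static void main(String[] args) {
    CollectingErrorHandler handler = new CollectingErrorHandler();
    boolean afterWarning = handler.handleError(
        stubError(DOMError.SEVERITY_WARNING, "suspicious prefix"));
    boolean afterFatal = handler.handleError(
        stubError(DOMError.SEVERITY_FATAL_ERROR, "cannot encode character"));
    System.out.println(afterWarning + " " + afterFatal);
    System.out.println(handler.messages());
  }
}
```

In the finalized API such a handler is installed through the serializer's configuration under the parameter name "error-handler" rather than through a setErrorHandler() method.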
In Xerces-2, the XMLSerializer class implements the DOMWriter interface. If you prefer, you can use these methods instead of the ones discussed in the last section. Example 13.5 demonstrates a complete program that builds a simple SVG document in memory and
Example 13.5 Serializing with DOMWriter
import org.w3c.dom.*;
import org.apache.xerces.dom3.*;
import org.apache.xerces.dom3.ls.DOMWriter;
import org.apache.xml.serialize.XMLSerializer;
import java.io.IOException;
import javax.xml.parsers.*;
public class SVGCircle {
public static void main(String[] args) {
try {
// Find the implementation
DocumentBuilderFactory factory
= DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
DOMImplementation impl = builder.getDOMImplementation();
// Create the document
DocumentType svgDOCTYPE = impl.createDocumentType(
"svg", "-//W3C//DTD SVG 1.0//EN",
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"
);
Document doc = impl.createDocument(
"http://www.w3.org/2000/svg", "svg", svgDOCTYPE);
// Fill the document
Node rootElement = doc.getDocumentElement();
Element circle = doc.createElementNS(
"http://www.w3.org/2000/svg", "circle");
circle.setAttribute("r", "100");
rootElement.appendChild(circle);
// Serialize the document onto System.out
DOMWriter writer = new XMLSerializer();
writer.setNewLine("\r\n");
writer.setEncoding("UTF-16");
writer.setErrorHandler(
new DOMErrorHandler() {
public boolean handleError(DOMError error) {
System.err.println(error.getMessage());
return false;
}
}
);
writer.writeNode(System.out, doc);
}
catch (Exception e) {
System.err.println(e);
}
}
}
Note
Xerces-J 2.2 currently puts the DOMWriter interface in the org.apache.xerces.dom3.ls package instead of the org.w3c.dom.ls package. The Xerces development team is trying to keep the experimental DOM3 classes separate from the main API until DOM3 is more stable.
Creating DOMWriters
Example 13.5 depends on Xerces-specific classes. It won't work with GNU JAXP, or the Oracle XML Parser for Java, or other parsers, even after these parsers are upgraded to support DOM3. However, you can write the code in a much more parser-independent fashion by using the DOMImplementationLS interface, shown in Example 13.6, to create concrete
Example 13.6 The DOM3 DOMImplementationLS Interface
package org.w3c.dom.ls;
public interface DOMImplementationLS {
public static final short MODE_SYNCHRONOUS = 1;
public static final short MODE_ASYNCHRONOUS = 2;
public DOMWriter createDOMWriter();
public DOMInputSource createDOMInputSource();
public DOMBuilder createDOMBuilder(short mode,
String schemaType) throws DOMException;
}
You retrieve a concrete instance of this factory interface by using the DOM3 DOMImplementationRegistry factory class introduced in Chapter 10 to request a DOMImplementation object that supports the LS-Save feature. Then you cast that object to DOMImplementationLS. For example,
try {
DOMImplementation impl = DOMImplementationRegistry
.getDOMImplementation("Core 2.0 LS-Save 3.0");
if (impl != null) {
DOMImplementationLS implls = (DOMImplementationLS) impl;
DOMWriter writer = implls.createDOMWriter();
writer.writeNode(System.out, document);
}
else {
System.out.println(
"Could not find a DOM3 Save compliant parser.");
}
}
catch (Exception e) {
System.err.println(e);
}
Using this technique, it's uncomplicated to write a completely implementation-independent program to generate and serialize XML documents, as Example 13.7 demonstrates. It uses the DOMImplementationRegistry class to load the DOMImplementationLS and the DOMWriter class to output the final result. Otherwise, it just uses the standard DOM2 classes that you've seen in previous chapters.
Example 13.7 An Implementation-Independent DOM3 Program to Build and Serialize an XML Document
import org.w3c.dom.*;
import org.w3c.dom.ls.*;
public class SVGDOMCircle {
public static void main(String[] args) {
try {
// Find the implementation
DOMImplementation impl
= DOMImplementationRegistry.getDOMImplementation(
"Core 2.0 LS-Load 3.0 LS-Save 3.0");
if (impl == null) {
System.out.println(
"Could not find a DOM3 Load-Save compliant parser.");
return;
}
// Create the document
DocumentType svgDOCTYPE = impl.createDocumentType(
"svg", "-//W3C//DTD SVG 1.0//EN",
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"
);
Document doc = impl.createDocument(
"http://www.w3.org/2000/svg", "svg", svgDOCTYPE);
// Fill the document
Node rootElement = doc.getDocumentElement();
Element circle = doc.createElementNS(
"http://www.w3.org/2000/svg", "circle");
circle.setAttribute("r", "100");
rootElement.appendChild(circle);
// Serialize the document onto System.out
DOMImplementationLS implls = (DOMImplementationLS) impl;
DOMWriter writer = implls.createDOMWriter();
writer.writeNode(System.out, doc);
}
catch (Exception e) {
System.err.println(e);
}
}
}
This program needs to test for both the LS-Load and LS-Save features because it's not
Serialization Features
The default settings for the writeNode() and writeToString() methods are acceptable for most uses. Occasionally, however, you will want a little more control over the serialized form. For example, you might want the output to be "pretty printed" with extra white space added to indent the elements
Defined features include the following:
Implementations may define additional custom features, the
For example, the following code fragment attempts to output the Document object doc onto the OutputStream out in canonical form. However, if the implementation of DOMWriter doesn't support Canonical XML, it simply outputs the document in the normal way.
try {
DOMWriter writer = new XMLSerializer();
if (writer.canSetFeature("canonical-form", true)) {
writer.setFeature("canonical-form", true);
}
writer.writeNode(out, doc);
}
catch (Exception e) {
System.err.println(e);
}
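The "ask first, then set" pattern shown above carries over to the finalized Load and Save API bundled with the JDK, where features became parameters on a DOMConfiguration object obtained from LSSerializer.getDomConfig(). The following sketch assumes that API rather than the draft DOMWriter. The class name FeatureSketch and the helper enableIfSupported() are invented for illustration, and which features a given serializer accepts is implementation-dependent.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class FeatureSketch {

  // Turns a boolean feature on only if the serializer reports that it can.
  // Returns true if the feature was enabled, false if it was left alone.
  static boolean enableIfSupported(LSSerializer serializer, String feature) {
    DOMConfiguration config = serializer.getDomConfig();
    if (config.canSetParameter(feature, Boolean.TRUE)) {
      config.setParameter(feature, Boolean.TRUE);
      return true;
    }
    return false;
  }

  static DOMImplementation findImplementation() {
    try {
      DOMImplementation impl =
          DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
      if (impl == null) {
        throw new IllegalStateException("No Load and Save implementation found");
      }
      return impl;
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    DOMImplementation impl = findImplementation();
    Document doc = impl.createDocument(null, "root", null);
    doc.getDocumentElement().appendChild(doc.createElement("child"));

    LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
    System.out.println("canonical-form enabled: "
        + enableIfSupported(serializer, "canonical-form"));
    System.out.println("format-pretty-print enabled: "
        + enableIfSupported(serializer, "format-pretty-print"));

    // The document is written either way, with whatever features were accepted.
    System.out.println(serializer.writeToString(doc));
  }
}
```

Because unsupported features are skipped instead of causing an exception, the output degrades to the serializer's default form.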
Filtering Output
One of the more original aspects of the DOMWriter API is the ability to attach filters to a writer that remove certain nodes from the output. A DOMWriterFilter is a subinterface of NodeFilter from the traversal API described in Chapter 12, and works almost exactly like it. This shouldn't be too surprising, because serializing a document is merely another
To perform output filtering, you first implement the DOMWriterFilter interface shown in Example 13.8. As with the NodeFilter superinterface, the acceptNode() method returns one of the three named constants NodeFilter.FILTER_ACCEPT, NodeFilter.FILTER_REJECT, or NodeFilter.FILTER_SKIP to indicate whether or not a particular node and its descendants should be output. (This method isn't listed here because it's inherited from the superinterface.)
Example 13.8 The DOMWriterFilter Interface
package org.w3c.dom.ls;
public interface DOMWriterFilter extends NodeFilter {
public int getWhatToShow();
}
The getWhatToShow() method returns an int constant indicating which kinds of nodes are passed to this filter for processing. This is a combination of the bit constants used by NodeIterator and TreeWalker in Chapter 12: NodeFilter.SHOW_ELEMENT, NodeFilter.SHOW_TEXT, NodeFilter.SHOW_COMMENT, and so on.
Example 8.9 demonstrated a SAX filter that removed everything that wasn't in the XHTML namespace from a document. Example 13.9 is a DOMWriterFilter that accomplishes the same task.
Example 13.9 Filtering Everything That Isn't XHTML on Output
import org.w3c.dom.*;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.ls.DOMWriterFilter;
public class XHTMLFilter implements DOMWriterFilter {
public final static String XHTML_NAMESPACE
= "http://www.w3.org/1999/xhtml";
// This filter only operates on elements. Everything else
// will be output without passing through the filter. However,
// descendants of non-XHTML elements will not be output
// because their ancestor elements have been rejected.
// Note that this means we don't fully handle nested XHTML;
// e.g., XHTML contains SVG, which contains XHTML.
// XHTML inside SVG will not be output.
public int getWhatToShow() {
return NodeFilter.SHOW_ELEMENT;
}
public short acceptNode(Node node) {
int type = node.getNodeType();
if (type != Node.ELEMENT_NODE) {
return NodeFilter.FILTER_ACCEPT;
}
String namespace = node.getNamespaceURI();
if (XHTML_NAMESPACE.equals(namespace)) {
return NodeFilter.FILTER_ACCEPT;
}
else {
return NodeFilter.FILTER_REJECT;
}
}
}
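The choice between NodeFilter.FILTER_REJECT and NodeFilter.FILTER_SKIP decides whether nested content survives. In the traversal API, rejecting an element drops its whole subtree, while skipping it drops only the element and still considers its children. The sketch below makes that choice a constructor argument. It assumes the finalized org.w3c.dom.ls.LSSerializerFilter interface bundled with the JDK in place of the draft DOMWriterFilter; the class name NamespaceFilter and its keepDescendants flag are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.LSSerializerFilter;
import org.w3c.dom.traversal.NodeFilter;

class NamespaceFilter implements LSSerializerFilter {

  private final String namespace;
  private final boolean keepDescendants;

  NamespaceFilter(String namespace, boolean keepDescendants) {
    this.namespace = namespace;
    this.keepDescendants = keepDescendants;
  }

  // Only elements are passed to acceptNode().
  @Override
  public int getWhatToShow() {
    return NodeFilter.SHOW_ELEMENT;
  }

  @Override
  public short acceptNode(Node node) {
    if (node.getNodeType() != Node.ELEMENT_NODE) {
      return NodeFilter.FILTER_ACCEPT;
    }
    if (namespace.equals(node.getNamespaceURI())) {
      return NodeFilter.FILTER_ACCEPT;
    }
    // SKIP leaves the children eligible for output; REJECT drops the subtree.
    return keepDescendants ? NodeFilter.FILTER_SKIP : NodeFilter.FILTER_REJECT;
  }

  // Helper for the demo: an empty document used as an element factory.
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String xhtml = "http://www.w3.org/1999/xhtml";
    Document doc = newDocument();
    Node svg = doc.createElementNS("http://www.w3.org/2000/svg", "svg");
    NamespaceFilter strict = new NamespaceFilter(xhtml, false);
    NamespaceFilter lenient = new NamespaceFilter(xhtml, true);
    System.out.println(strict.acceptNode(svg) == NodeFilter.FILTER_REJECT);
    System.out.println(lenient.acceptNode(svg) == NodeFilter.FILTER_SKIP);
  }
}
```

With keepDescendants set to true, XHTML nested inside a foreign element would remain eligible for output, at the cost of losing the foreign wrapper around it.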
The one thing that XHTMLFilter doesn't filter out is non-XHTML attributes. Those are written out with their elements. They are not passed to acceptNode(). To filter out attributes from other namespaces would require a custom DOMWriter. You might be able to remove them from the element nodes passed to acceptNode(), but this would modify the in-memory tree as well as the streamed output. Furthermore, although Java doesn't support this, the IDL code for DOMWriter indicates that the Node passed to acceptNode() is read only. The underlying implementation is probably not expecting acceptNode() to modify its argument. Doing so would be asking for corrupt data structures.
You can install a filter into a DOMWriter using the setFilter() method. Then any node the filter rejects will not be serialized. Example 13.10 uses the above XHTMLFilter to output pure XHTML from an input document that might contain SVG, MathML, SMIL, or other non-XHTML elements.
Example 13.10 Using a DOMWriterFilter
import org.w3c.dom.*;
import org.w3c.dom.ls.*;
public class XHTMLPurifier {
public static void main(String[] args) {
try {
// Find the implementation
DOMImplementation impl
= DOMImplementationRegistry.getDOMImplementation(
"Core 2.0 LS-Load 3.0 LS-Save 3.0");
if (impl == null) {
System.out.println(
"Could not find a DOM3 Load-Save compliant parser.");
return;
}
DOMImplementationLS implls = (DOMImplementationLS) impl;
// Load the parser
DOMBuilder parser = implls.createDOMBuilder(
DOMImplementationLS.MODE_SYNCHRONOUS, null); // null: no schema type requested
// Parse the document
Document doc = parser.parseURI(args[0]); // URI of the input document
// Serialize the document onto System.out while filtering
DOMWriter writer = implls.createDOMWriter();
DOMWriterFilter filter = new XHTMLFilter();
writer.setFilter(filter);
writer.writeNode(System.out, doc);
}
catch (Exception e) {
System.err.println(e);
}
}
}
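The attribute limitation noted for XHTMLFilter can be worked around without touching the original tree by serializing a cleaned copy instead. The following is only a sketch of that idea using core DOM calls (cloneNode(), getAttributes(), removeAttributeNode()); it is not part of the Load and Save API, and the class name ForeignAttributeStripper and its methods are invented for illustration. It keeps attributes that have no namespace, XHTML attributes, and namespace declarations.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class ForeignAttributeStripper {

  static final String XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";
  static final String XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/";

  // Returns a deep copy of the element with foreign-namespace attributes
  // removed. The element passed in is left unchanged.
  static Element strippedCopy(Element original) {
    Element copy = (Element) original.cloneNode(true);
    strip(copy);
    return copy;
  }

  private static void strip(Element element) {
    NamedNodeMap attributes = element.getAttributes();
    // Walk backwards: the map is live and shrinks as attributes are removed.
    for (int i = attributes.getLength() - 1; i >= 0; i--) {
      Attr attr = (Attr) attributes.item(i);
      String ns = attr.getNamespaceURI();
      boolean keep = ns == null
          || XHTML_NAMESPACE.equals(ns)
          || XMLNS_NAMESPACE.equals(ns);
      if (!keep) {
        element.removeAttributeNode(attr);
      }
    }
    for (Node child = element.getFirstChild(); child != null;
         child = child.getNextSibling()) {
      if (child.getNodeType() == Node.ELEMENT_NODE) {
        strip((Element) child);
      }
    }
  }

  // Helper for the demo: an empty document used as an element factory.
  static Document newDocument() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Document doc = newDocument();
    Element p = doc.createElementNS(XHTML_NAMESPACE, "p");
    p.setAttribute("class", "intro");
    p.setAttributeNS("http://www.w3.org/1999/xlink", "xlink:href", "#top");
    Element clean = strippedCopy(p);
    System.out.println(clean.getAttributes().getLength());  // attributes left on the copy
    System.out.println(p.getAttributes().getLength());      // original is untouched
  }
}
```

The copy can then be handed to the writer together with the element filter. The price is the memory and time needed to clone the subtree before output.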