The detailed behavior of a serializer is controlled by an OutputFormat object. This class can configure almost any aspect of serialization, including setting the maximum line length, changing the indentation, specifying which elements have their text escaped as CDATA sections, and more. A few options even have the potential to make your documents malformed. For example, if you add an element to the list of nonescaping elements, then any reserved characters like < and & that appear in its text content will be output as themselves rather than escaped as &lt; and &amp;.

One of the most frequent requests for serializers is "pretty printing" data with extra line breaks and indentation. Within reasonable limits, the OutputFormat class can provide this. Simply pass true to setIndenting(), pass the number of spaces you want each level to be indented to setIndent(), and pass the maximum line length to setLineWidth(). Example 13.1 demonstrates.

Example 13.1 Using Xerces' OutputFormat Class to "Pretty Print" XML

    import java.math.*;
    import java.io.IOException;
    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import org.apache.xml.serialize.*;

    public class IndentedFibonacci {

      public static void main(String[] args) {

        try {
          // Find the implementation
          DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
          factory.setNamespaceAware(true);
          DocumentBuilder builder = factory.newDocumentBuilder();
          DOMImplementation impl = builder.getDOMImplementation();

          // Create the document
          Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);

          // Fill the document
          BigInteger low = BigInteger.ONE;
          BigInteger high = BigInteger.ONE;
          Element root = doc.getDocumentElement();
          for (int i = 0; i < 10; i++) {
            Element number = doc.createElement("fibonacci");
            Text text = doc.createTextNode(low.toString());
            number.appendChild(text);
            root.appendChild(number);
            BigInteger temp = high;
            high = high.add(low);
            low = temp;
          }

          // Serialize the document
          OutputFormat format = new OutputFormat(doc);
          format.setLineWidth(65);
          format.setIndenting(true);
          format.setIndent(2);
          XMLSerializer serializer = new XMLSerializer(System.out, format);
          serializer.serialize(doc);
        }
        catch (FactoryConfigurationError e) {
          System.out.println("Could not locate a JAXP factory class");
        }
        catch (ParserConfigurationException e) {
          System.out.println(
            "Could not locate a JAXP DocumentBuilder class"
          );
        }
        catch (DOMException e) {
          System.err.println(e);
        }
        catch (IOException e) {
          System.err.println(e);
        }

      }

    }

When run, this program produces the following output:

    C:\XMLJAVA> java IndentedFibonacci
    <?xml version="1.0" encoding="UTF-8"?>
    <Fibonacci_Numbers>
      <fibonacci>1</fibonacci>
      <fibonacci>1</fibonacci>
      <fibonacci>2</fibonacci>
      <fibonacci>3</fibonacci>
      <fibonacci>5</fibonacci>
      <fibonacci>8</fibonacci>
      <fibonacci>13</fibonacci>
      <fibonacci>21</fibonacci>
      <fibonacci>34</fibonacci>
      <fibonacci>55</fibonacci>
    </Fibonacci_Numbers>

I think you'll agree that this looks much more attractive than the smushed together output from the bare serialization without any extra white space. One warning, however: White space is significant in XML. Adding this white space has changed the document. This is not the same document as existed before it was "pretty printed." For this particular application, the extra white space is insignificant, but this is not true for all XML applications.

White space is just the beginning of what the OutputFormat class can control. Other features include the MIME media type, the XML declaration, the system and public IDs for the document type, which elements' content should be escaped as CDATA sections, and more. Following is a list of the properties you can control by invoking various methods on OutputFormat. In some cases, the default is document dependent. When it's not, the default value is given in parentheses.
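The earlier warning that pretty printing changes the document can be checked directly. The following is a minimal sketch rather than part of the original example. It assumes the JDK's javax.xml.transform identity transformer with OutputKeys.INDENT as a stand-in for Xerces' OutputFormat, so that it runs without the Xerces jar, and the class name WhitespaceCheck and its helper methods are hypothetical. It builds a three-element Fibonacci document, serializes it with indentation, parses the result again, and compares the number of child nodes of the root element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK transformer, not Xerces' OutputFormat.
public class WhitespaceCheck {

  // Creates a DocumentBuilder, converting checked exceptions to unchecked ones.
  static DocumentBuilder newBuilder() {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Builds a small document with no white space between its elements.
  static Document buildDocument() {
    Document doc = newBuilder().newDocument();
    Element root = doc.createElement("Fibonacci_Numbers");
    doc.appendChild(root);
    for (String value : new String[] {"1", "1", "2"}) {
      Element number = doc.createElement("fibonacci");
      number.appendChild(doc.createTextNode(value));
      root.appendChild(number);
    }
    return doc;
  }

  // Serializes the document with indentation requested, then parses the text again.
  static Document indentAndReparse(Document doc) {
    try {
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.INDENT, "yes");
      StringWriter out = new StringWriter();
      transformer.transform(new DOMSource(doc), new StreamResult(out));
      return newBuilder().parse(new InputSource(new StringReader(out.toString())));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Counts all child nodes of the root element, including white space text nodes.
  static int rootChildCount(Document doc) {
    return doc.getDocumentElement().getChildNodes().getLength();
  }

  public static void main(String[] args) {
    Document original = buildDocument();
    Document reparsed = indentAndReparse(original);
    System.out.println("Child nodes before indenting: " + rootChildCount(original));
    System.out.println("Child nodes after indenting:  " + rootChildCount(reparsed));
  }
}
```

The original root has exactly three child nodes, one per fibonacci element. After indenting and reparsing, the count is larger because each inserted line break becomes a text node in the tree. The exact number depends on the serializer, so the sketch prints both counts rather than assuming one.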
Following is the beginning of the output that the ValidFibonacciMathML program produces:

    C:\XMLJAVA> java ValidFibonacciMathML
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
                          "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <mi>f(1)</mi>
        <mo>=</mo>
        <mn>1</mn>
      </mrow>
      <mrow>
        <mi>f(2)</mi>
        <mo>=</mo>
        <mn>1</mn>
      </mrow>
    ...

You can imagine other requests for the serializer. For example, you might want a line break after each </mrow> end-tag but no line breaks inside mrow elements. Although OutputFormat doesn't give you enough control to arrange serialization to this level of detail, you could write a custom subclass of XMLSerializer to accomplish this.
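To make the mrow requirement concrete, here is a minimal sketch of the desired layout. It does not subclass XMLSerializer. Instead it assumes a small hand-written DOM walker, which is a different technique chosen so that the example needs only the JDK. The class MrowLineWriter and all of its methods are hypothetical, the sample document is built without namespace support (the xmlns declaration is added as an ordinary attribute), and only element and text nodes are handled.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical hand-written serializer; not a subclass of Xerces' XMLSerializer.
public class MrowLineWriter {

  // Escapes characters that must not appear literally in text or attribute values.
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\"", "&quot;");
  }

  // Writes a node and its descendants with no added white space,
  // except for one line break after each </mrow> end-tag.
  static void write(Node node, StringBuilder out) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      out.append(escape(node.getNodeValue()));
      return;
    }
    if (node.getNodeType() != Node.ELEMENT_NODE) {
      return; // comments, processing instructions, etc. are ignored in this sketch
    }
    out.append('<').append(node.getNodeName());
    NamedNodeMap attributes = node.getAttributes();
    for (int i = 0; i < attributes.getLength(); i++) {
      Node attribute = attributes.item(i);
      out.append(' ').append(attribute.getNodeName()).append("=\"")
         .append(escape(attribute.getNodeValue())).append('"');
    }
    out.append('>');
    for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
      write(child, out);
    }
    out.append("</").append(node.getNodeName()).append('>');
    if ("mrow".equals(node.getNodeName())) {
      out.append('\n');
    }
  }

  // Serializes the document starting at its root element.
  static String serialize(Document doc) {
    StringBuilder out = new StringBuilder();
    write(doc.getDocumentElement(), out);
    return out.toString();
  }

  // Creates an element that contains a single text node.
  static Element leaf(Document doc, String name, String text) {
    Element element = doc.createElement(name);
    element.appendChild(doc.createTextNode(text));
    return element;
  }

  // Creates one <mrow> stating that f(index) = value.
  static Element row(Document doc, String index, String value) {
    Element mrow = doc.createElement("mrow");
    mrow.appendChild(leaf(doc, "mi", "f(" + index + ")"));
    mrow.appendChild(leaf(doc, "mo", "="));
    mrow.appendChild(leaf(doc, "mn", value));
    return mrow;
  }

  // Builds a two-row sample resembling the start of the MathML output above.
  static Document buildSample() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      Element math = doc.createElement("math");
      math.setAttribute("xmlns", "http://www.w3.org/1998/Math/MathML");
      doc.appendChild(math);
      math.appendChild(row(doc, "1", "1"));
      math.appendChild(row(doc, "2", "1"));
      return doc;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.print(serialize(buildSample()));
  }
}
```

Each mrow element, from start-tag to end-tag, lands on a single line, and the only white space added to the document is the line break after each </mrow>. A production version would also need to handle the XML declaration, the document type declaration, comments, CDATA sections, and character encoding, all of which this sketch leaves out.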