Java Implementation


The Java implementation is composed of the following source files:

  • CSVToXMLBasic.java : the main routine for the Java application

  • CSVRowReader.java : a class with parse and write methods

Note that SAXErrorHandler (used in the Java implementation of the XML to CSV Converter utility) is missing. More on that below.

main in CSVToXMLBasic.java

Let's start with the implementation-dependent bit of pseudocode.

 Set up DOM XML environment (dependent on implementation)
Create the Output XML Document (dependent on implementation) 

There are at least three different ways to use JAXP and Xerces to set up the DOM and create a DOM Document. In the Xerces package org.apache.xerces.dom there is a DocumentImpl class that allows us to create a DOM Document with a single constructor call and no arguments.

 Document docOutput = new DocumentImpl(); 

However, this class is not part of the standard JAXP implementation and is listed as one of the internal APIs in the Implementation Documentation section of the Xerces Javadocs. Portability and maintainability considerations lead me to avoid using anything so nonstandard and implementation-dependent.

Another alternative exists in the relatively more standard JAXP implementation. By using the DocumentBuilder class and the DocumentBuilderFactory (as we did in Chapter 2), we can get a DOMImplementation from the DocumentBuilder, then call its createDocument method. The createDocument method takes three arguments: the URI of the namespace of the document being created, the qualified name of the document's root Element, and the document type of the document. The document type can either be set to null or be set up using the DOMImplementation's createDocumentType method. The code would look something like this, assuming we left the document type null.

 DocumentBuilderFactory Factory =
  DocumentBuilderFactory.newInstance();
DocumentBuilder Builder = Factory.newDocumentBuilder();
DOMImplementation DOMImpl = Builder.getDOMImplementation();
Document docOutput =
  DOMImpl.createDocument("urn:BabelBlaster.org",
  "bb:SimpleCSV", null); 
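As a minimal, self-contained sketch of this second approach, the class below wraps the same calls in a helper. The names CreateDocumentSketch and create are mine, chosen only for illustration, and I have assumed a namespace-aware factory. One point worth noticing is that createDocument returns a Document whose root Element already exists, so the root does not need to be created and appended separately.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

// Hypothetical illustration class; not part of the chapter's source files.
class CreateDocumentSketch {

    // Creates a Document through DOMImplementation.createDocument.
    static Document create() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // assumption: we want namespace support
            DocumentBuilder builder = factory.newDocumentBuilder();
            DOMImplementation domImpl = builder.getDOMImplementation();
            // Arguments: namespace URI, qualified name of the root Element,
            // and the DocumentType (null means no document type).
            return domImpl.createDocument("urn:BabelBlaster.org", "bb:SimpleCSV", null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not set up the DOM environment", e);
        }
    }

    public static void main(String[] args) {
        Document doc = create();
        // The root Element was created for us by createDocument.
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().getNamespaceURI());
    }
}
```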

While this is fully supported in JAXP, it is also fairly cumbersome. In addition, the attention it pays to the document type shows that this approach has its origins in DTDs rather than schemas. There is nothing wrong with this approach and it may be appropriate for some uses. However, we'll use a simpler method. The DocumentBuilder also has a newDocument method. Here's how we use it.

 DocumentBuilderFactory Factory =
  DocumentBuilderFactory.newInstance();
DocumentBuilder Builder = Factory.newDocumentBuilder();
docOutput = Builder.newDocument(); 

Note that in this program we don't call the setErrorHandler method to declare an error handler class that handles SAX parsing exceptions. This is because we aren't parsing! We are creating an empty DOM Document instead of parsing an existing one. The only XML-related exceptions we throw are DOM exceptions, and SAXErrorHandler doesn't deal with those.
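To make the point about exceptions concrete, here is a small sketch. The class and method names are hypothetical and not taken from CSVToXMLBasic.java. It creates an empty Document with newDocument and then deliberately provokes a DOMException by asking for an Element with an illegal name. The behavior shown assumes the DOM implementation performs its default error checking, as the one bundled with the JDK does.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical illustration class; not part of the chapter's source files.
class NewDocumentSketch {

    // Creates an empty Document: no parsing, so no SAX error handler is involved.
    static Document createEmpty() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not set up the DOM environment", e);
        }
    }

    // Returns the DOMException code raised for an illegal tag name, or -1 if none.
    static short codeForIllegalName(Document doc) {
        try {
            doc.createElement("bad name"); // a space is not allowed in an XML name
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        Document doc = createEmpty();
        // A freshly created Document has no root Element yet.
        System.out.println("Root is null: " + (doc.getDocumentElement() == null));
        short code = codeForIllegalName(doc);
        System.out.println("INVALID_CHARACTER_ERR raised: "
                + (code == DOMException.INVALID_CHARACTER_ERR));
    }
}
```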

The code for creating the root Element and appending it to the Document Node is pretty straightforward. The code snippet below shows how these two operations are done in Java. We won't need to look at them again after this.

 //  rootElement <- Call Document's createElement method,
//    with tagName of SimpleCSV
eleRoot = docOutput.createElement("SimpleCSV");

//  Document <-  Call Document's appendChild to append
//    Root to Document
docOutput.appendChild(eleRoot); 

The other interesting part of the Java implementation is saving the document. Here's the code:

 //  Save Output XML Document (dependent on
//    implementation)
//  Open file output stream
OutputXML = new FileOutputStream(sOutputXMLName);

//  Create an XML Serializer and set it to use
//    the file output stream
MySerializer = new XMLSerializer();
MySerializer.setOutputByteStream(OutputXML);

//  Create the Output Format for the Serializer for
//    XML, UTF-8, and indentation true
MyOutputFormat = new OutputFormat("XML","UTF-8",true);
MySerializer.setOutputFormat(MyOutputFormat);

//  Set the serializer to be a DOM serializer
MySerializer.asDOMSerializer();

//  Write out the document
MySerializer.serialize(docOutput); 

In this case the Java code actually looks less like the DOM recommendation than the MSXML C++ code. Version 1.2 of JAXP doesn't have anything that directly implements the DOM Level 3 recommendation "save Document" semantics. We get this functionality from Xerces by way of the XMLSerializer in the org.apache.xml.serialize package. The documentation for the XMLSerializer is found in the Other Classes section of the Xerces API Javadocs. The XMLSerializer provides both DOM and SAX approaches to serializing a document. True to the Java way of doing things, it uses an existing java.io.OutputStream or java.io.Writer to do the actual writing. So, I created a FileOutputStream for the output file. The XMLSerializer also requires that an OutputFormat be specified. The XMLSerializer can be given both the OutputStream (or Writer) and the OutputFormat in its constructor. However, for clarity I show them being set up with separate calls. I chose to take a simple route and used the OutputFormat constructor that takes three arguments: the output method (XML), the encoding (UTF-8), and whether to indent (true).

The next step before actually serializing is to set the XMLSerializer to be a DOM serializer since we have to tell the XMLSerializer whether we're using DOM or SAX. Finally, the actual save semantics are implemented by calling the XMLSerializer's serialize method, passing the DOM Document as the single parameter.

DOM Level 3 had not yet been approved when this version of Xerces was developed, which was several months before I wrote this chapter. It appeared from the documentation that these classes were still undergoing some development. So, if you are writing Java code to do DOM serialization, it would be good to get the latest version of Xerces and review the functionality of the XMLSerializer and related classes.
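If the Xerces serialization classes are not available, one alternative that stays within standard JAXP is an identity transformation from the javax.xml.transform package. This is not the approach used in CSVToXMLBasic.java; it is only a sketch of a substitute technique, with class and method names of my own invention. It writes to a String so that it is easy to try out. To save to a file, the StreamResult could instead wrap a FileOutputStream.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical illustration class; a substitute for the XMLSerializer approach.
class TransformerSaveSketch {

    // Builds a tiny Document with only a SimpleCSV root Element.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            doc.appendChild(doc.createElement("SimpleCSV"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not set up the DOM environment", e);
        }
    }

    // Serializes the Document as XML, UTF-8, with indentation requested.
    static String serialize(Document doc) {
        try {
            // A Transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            identity.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

The three output properties play the same role as the three arguments to the OutputFormat constructor shown earlier.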

parse in CSVRowReader.java

The parse method is straight Java without any XML involved. It is pretty much a one-for-one implementation of the pseudocode. Review the source file if you're interested.

write in CSVRowReader.java

The write method is also a lot of Java with little new in the way of XML. The only new concepts here arise in creating the Text Node and appending it to the ColumnXX Node.

From CSVRowReader.java: write
 //  Text Node <- Call Document's createTextNode method
//    with text from ColumnArray[ColumnNumber]
Text texText = docOutput.createTextNode(
  sbColumnArray[iColumnNumber].toString());

//  Free the memory for this column
sbColumnArray[iColumnNumber] = null;

//  Column Element <- Call Column Element's appendChild
//    to add Text Node as child
eleColumn.appendChild(texText); 

As noted above, we call the Document interface's createTextNode method, passing the contents of the ColumnArray entry (implemented as an array of StringBuffers). Then we call the Column Element's appendChild method.
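To show these calls in context, here is a rough, self-contained sketch of a write-style loop. It is not the actual CSVRowReader code. The class name, the helper names, the Row Element name, and the two-digit Column numbering are all assumptions made for illustration. Note that special characters such as the ampersand are stored unescaped in the Text Node; escaping is left to whatever serializes the Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical illustration class; not the chapter's CSVRowReader.
class WriteRowSketch {

    // Creates an empty Document, wrapping the checked setup exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not set up the DOM environment", e);
        }
    }

    // Appends a Row Element (assumed name) with one ColumnXX child per column.
    static Element writeRow(Document doc, Element parent, StringBuffer[] columns) {
        Element row = doc.createElement("Row");
        for (int i = 0; i < columns.length; i++) {
            if (columns[i] == null) {
                continue; // nothing was read for this column
            }
            // Assumed naming scheme: Column01, Column02, ...
            Element column = doc.createElement(String.format("Column%02d", i + 1));
            // The Text Node holds the raw column contents.
            Text text = doc.createTextNode(columns[i].toString());
            columns[i] = null; // release the buffer, as in the listing above
            column.appendChild(text);
            row.appendChild(column);
        }
        parent.appendChild(row);
        return row;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("SimpleCSV");
        doc.appendChild(root);
        StringBuffer[] columns = { new StringBuffer("a&b"), new StringBuffer("c") };
        Element row = writeRow(doc, root, columns);
        System.out.println(row.getFirstChild().getNodeName());
        System.out.println(row.getFirstChild().getTextContent());
    }
}
```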