Writing a DOM Parser


There's no question that it's easier to write DOM-based code, because you can access the entire document structure as an in-memory object rather than having to collect information as it passes by during a serial parse of the document. You can use the same DTD and XML, but the importer is totally different when written as a DOM implementation, as you can see in Listing 14.6.

Listing 14.6 DOMProductImporter.java
package com.bfg.xml;

import org.apache.xerces.dom.TextImpl;
import org.xml.sax.SAXParseException;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.sql.*;
import java.util.HashMap;
import java.util.Vector;
import java.util.Iterator;
import java.text.NumberFormat;
import java.text.DecimalFormat;
import java.text.SimpleDateFormat;
import java.util.ResourceBundle;

public class DOMProductImporter {

    private static final String
        DEFAULT_PARSER_NAME = "org.apache.xerces.parsers.DOMParser";

    private static ResourceBundle sql_bundle =
        ResourceBundle.getBundle("com.bfg.xml.SQLQueries");

    public static void loadProducts(String uri) {
        DOMProductImporter myclass = new DOMProductImporter();
        try {
            DOMParser parser =
                (DOMParser)Class.forName(DEFAULT_PARSER_NAME).newInstance();
            parser.setFeature("http://xml.org/sax/features/validation", true);
            parser.parse(uri);
            myclass.readDocument(parser.getDocument());
        } catch (org.xml.sax.SAXParseException spe) {
            if (spe.getException() != null)
                spe.getException().printStackTrace(System.err);
            else
                spe.printStackTrace(System.err);
            myclass.rollbackAndQuit();
        } catch (org.xml.sax.SAXException se) {
            if (se.getException() != null)
                se.getException().printStackTrace(System.err);
            else
                se.printStackTrace(System.err);
            myclass.rollbackAndQuit();
        } catch (Exception e) {
            e.printStackTrace(System.err);
            myclass.rollbackAndQuit();
        }
    }

    Connection conn = null;
    HashMap authors = new HashMap();
    HashMap categories = new HashMap();
    int max_author = 0;
    int max_cat = 0;

    public void rollbackAndQuit() {
        try {
            if (conn != null) {
                Statement st = conn.createStatement();
                st.executeUpdate("ROLLBACK");
                st.close();
                conn.close();
            }
        } catch (Exception ex) {}
        System.out.println("Aborting import, rolling back and quitting");
        System.exit(1);
    }

    public void readDocument(Document doc) {
        try {
            Class.forName("org.gjt.mm.mysql.Driver").newInstance();
            conn =
                DriverManager.getConnection("jdbc:mysql://localhost/BFG");
            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery("SELECT * FROM CATEGORY");
            while (rs.next()) {
                categories.put(rs.getString("CATEGORY_NAME"),
                               new Integer(rs.getInt("CATEGORY_ID")));
                if (rs.getInt("CATEGORY_ID") > max_cat) {
                    max_cat = rs.getInt("CATEGORY_ID");
                }
            }
            rs.close();
            rs = st.executeQuery("SELECT * FROM AUTHOR");
            while (rs.next()) {
                authors.put(rs.getString("AUTHOR_NAME"),
                            new Integer(rs.getInt("AUTHOR_ID")));
                if (rs.getInt("AUTHOR_ID") > max_author) {
                    max_author = rs.getInt("AUTHOR_ID");
                }
            }
            rs.close();
            st.executeUpdate("SET AUTOCOMMIT=0");
            st.close();
            NodeList products = doc.getDocumentElement().getChildNodes();
            if (products != null) {
                int len = products.getLength();
                for (int i = 0; i < len; i++) {
                    Node n = products.item(i);
                    if (n.getNodeName().equals("Product")) {
                        readProduct((Element)n);
                    }
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace(System.err);
            rollbackAndQuit();
        }
    }

    public void readProduct(Element node) {
        String ISBN = node.getAttribute("ISBN");
        Vector prod_authors = null;
        Vector prod_categories = null;
        Date pubdate = null;
        String title = null;
        double price = 0D;
        String description = null;
        NodeList children = node.getChildNodes();
        if (children != null) {
            int len = children.getLength();
            for (int i = 0; i < len; i++) {
                Node n = children.item(i);
                if (n.getNodeName().equals("Authors"))
                    prod_authors = readVector((Element) n, "Author");
                if (n.getNodeName().equals("Categories"))
                    prod_categories = readVector((Element) n, "Category");
                if (n.getNodeName().equals("Title"))
                    title = readString((Element) n);
                if (n.getNodeName().equals("Description"))
                    description = readString((Element) n);
                if (n.getNodeName().equals("Pubdate"))
                    pubdate = readDate((Element) n);
                if (n.getNodeName().equals("Price"))
                    price = readDouble((Element) n);
            }
        }
        createProduct(ISBN, prod_authors, prod_categories, pubdate, title, price,
                      description);
    }

    public Vector readVector(Element node, String type) {
        Vector results = new Vector();
        NodeList children = node.getChildNodes();
        if (children != null) {
            int len = children.getLength();
            for (int i = 0; i < len; i++) {
                Node n = children.item(i);
                if (n.getNodeName().equals(type)) {
                    results.add(readString((Element)n));
                }
            }
        }
        return results;
    }

    public Date readDate(Element node) {
        String dateString = readString(node);
        try {
            SimpleDateFormat nf = new SimpleDateFormat("MM/dd/yy");
            return new Date(nf.parse(dateString).getTime());
        } catch (Exception ex) {
            System.out.println("Invalid value for date: " +
                               dateString);
            rollbackAndQuit();
        }
        return null;
    }

    public double readDouble(Element node) {
        String doubleString = readString(node);
        try {
            NumberFormat nf = new DecimalFormat();
            return nf.parse(doubleString).doubleValue();
        } catch (Exception ex) {
            System.out.println("Invalid value for double: " +
                               doubleString);
            rollbackAndQuit();
        }
        return 0;
    }

    public String readString(Element node) {
        StringBuffer sb = new StringBuffer();
        NodeList children = node.getChildNodes();
        if (children != null) {
            int len = children.getLength();
            for (int i = 0; i < len; i++) {
                Node n = children.item(i);
                switch (n.getNodeType()) {
                case Node.CDATA_SECTION_NODE: {
                    sb.append(n.getNodeValue());
                    break;
                }
                case Node.TEXT_NODE: {
                    if (n instanceof TextImpl) {
                        if (!(((TextImpl)n).isIgnorableWhitespace())) {
                            sb.append(n.getNodeValue());
                        }
                    } else {
                        sb.append(n.getNodeValue());
                    }
                }
                }
            }
        }
        return sb.toString();
    }

    public void createProduct(String ISBN, Vector prod_authors, Vector prod_categories,
                              Date pubdate, String title, double price,
                              String description) {
        try {
            PreparedStatement pstmt =
                conn.prepareStatement(sql_bundle.getString("deleteProd"));
            pstmt.setString(1, ISBN);
            pstmt.executeUpdate();
            pstmt.close();
            pstmt =
                conn.prepareStatement(sql_bundle.getString("deleteProdXref"));
            pstmt.setString(1, ISBN);
            pstmt.executeUpdate();
            pstmt.close();
            pstmt =
                conn.prepareStatement(sql_bundle.getString("deleteCatXref"));
            pstmt.setString(1, ISBN);
            pstmt.executeUpdate();
            pstmt.close();
            pstmt =
                conn.prepareStatement(sql_bundle.getString("insertProd"));
            pstmt.setString(1, ISBN);
            pstmt.setString(2, title);
            pstmt.setDouble(3, price);
            pstmt.setDate(4, pubdate);
            pstmt.setString(5, description);
            pstmt.executeUpdate();
            Iterator author_it = prod_authors.iterator();
            while (author_it.hasNext()) {
                String author = (String) author_it.next();
                int author_id;
                if (authors.get(author) != null) {
                    author_id = ((Integer)authors.get(author)).intValue();
                } else {
                    pstmt =
                        conn.prepareStatement(sql_bundle.getString("insertAuthor"));
                    author_id = ++max_author;
                    pstmt.setInt(1, author_id);
                    pstmt.setString(2, author);
                    pstmt.executeUpdate();
                    pstmt.close();
                    authors.put(author, new Integer(author_id));
                }
                pstmt =
                    conn.prepareStatement(sql_bundle.getString("insertAuthorXref"));
                pstmt.setString(1, ISBN);
                pstmt.setInt(2, author_id);
                pstmt.executeUpdate();
            }
            Iterator cat_it = prod_categories.iterator();
            while (cat_it.hasNext()) {
                String cat = (String) cat_it.next();
                int cat_id;
                if (categories.get(cat) != null) {
                    cat_id = ((Integer)categories.get(cat)).intValue();
                } else {
                    pstmt =
                        conn.prepareStatement(sql_bundle.getString("insertCategory"));
                    cat_id = ++max_cat;
                    pstmt.setInt(1, cat_id);
                    pstmt.setString(2, cat);
                    pstmt.executeUpdate();
                    pstmt.close();
                    categories.put(cat, new Integer(cat_id));
                }
                pstmt =
                    conn.prepareStatement(sql_bundle.getString("insertCatXref"));
                pstmt.setString(1, ISBN);
                pstmt.setInt(2, cat_id);
                pstmt.executeUpdate();
            }
        } catch (java.sql.SQLException ex) {
            ex.printStackTrace(System.err);
            rollbackAndQuit();
        }
    }

    public static void main(String argv[]) {
        if (argv.length == 0) {
            System.exit(1);
        }
        loadProducts(argv[0]);
    }
}

The top of the class is very similar to the SAX version. Some different packages have been imported and a different parser is being used, but so far the code looks remarkably similar. The first big difference occurs when you actually have the code parse the document. With a SAX parser, all the processing happens as a side effect of the callbacks. With DOM, running the parser produces a document object that sits at the top of an XML hierarchy, which your importer then needs to walk.
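To make the "parse first, walk afterward" idea concrete, here is a minimal sketch. It is not the listing's code: it assumes the standard JAXP DocumentBuilder API bundled with the JDK instead of the Xerces DOMParser class, it parses a small inline string rather than a URI, and the class name DomParseSketch is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the book's code.
class DomParseSketch {

    // Parses an XML string and returns the complete in-memory DOM tree.
    // Assumption: JAXP is used here in place of the listing's Xerces DOMParser.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<File><Product ISBN=\"123\"/></File>");
        // Nothing has been "processed" yet; the tree is simply available to walk.
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "File"
    }
}
```

Unlike a SAX handler, nothing happens while the parse runs. All of the importer's work starts from the returned Document.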

readDocument does the same initialization as the SAX version (reading the category and author lists from the database). It then gets a list of the child nodes of the document root, the File element. According to the DTD, the children of the File element must be Product elements.

The code then iterates over the Product elements, calling readProduct on each in turn. readProduct gets the ISBN attribute and then iterates over its child nodes, looking for the various values that it needs to build a product. After all the values are read from the subelements, it calls createProduct.
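The following sketch shows why the node-name check in that loop matters. It assumes the JDK's JAXP parser running without a DTD, and the class and method names (ChildWalkSketch, productIsbns) are invented for this example. When the document is indented, the whitespace between elements shows up as text nodes among the children, so a blind cast to Element would fail.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the book's code.
class ChildWalkSketch {

    // Assumption: JAXP without validation stands in for the listing's parser setup.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Collects the ISBN attribute of every Product child of the root element.
    static List<String> productIsbns(Document doc) {
        List<String> isbns = new ArrayList<String>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            // Whitespace text nodes are named "#text", so they are skipped here.
            if (n.getNodeName().equals("Product")) {
                isbns.add(((Element) n).getAttribute("ISBN"));
            }
        }
        return isbns;
    }

    public static void main(String[] args) {
        String xml = "<File>\n  <Product ISBN=\"111\"/>\n  <Product ISBN=\"222\"/>\n</File>";
        Document doc = parse(xml);
        // Five children: text, Product, text, Product, text.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
        System.out.println(productIsbns(doc));
    }
}
```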

readVector is put into service to read in two different elements: the author and category lists. It makes sure that the children are what it expects them to be (the type argument), although validation against the DTD should already have rejected anything else. After all the values are read using readString, it returns the vector.

readDate and readDouble just parse the result of doing a readString. readString assembles the characters in an element, ignoring any unnecessary whitespace.
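A rough, parser-neutral sketch of the same text-gathering idea follows. It assumes the standard DOM Level 3 method Text.isElementContentWhitespace() as a substitute for the Xerces-specific TextImpl.isIgnorableWhitespace() used in the listing, and the class name TextCollectSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the book's code.
class TextCollectSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Concatenates the text and CDATA children of an element.
    static String text(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            short type = n.getNodeType();
            if (type == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            } else if (type == Node.TEXT_NODE
                       && !((Text) n).isElementContentWhitespace()) {
                // Assumption: the DOM Level 3 check replaces the Xerces-only one.
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<Title>Java <![CDATA[& XML]]></Title>");
        System.out.println(text(doc.getDocumentElement())); // prints "Java & XML"
    }
}
```

The CDATA branch is what lets characters such as & appear in a title or description without being escaped in the XML file.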

createProduct is almost identical to the version used with SAX. The only difference between them is that the DOM version passes the data in as arguments rather than using global variables.

You will likely find this version much cleaner and easier to read than the SAX version. This purity comes at a price, however. The program currently reads in a few products, but imagine that you want to have it read in several million records instead. By its nature, DOM would want to read the entire thing into memory and build one huge structure to maintain it.

As you might imagine, this can put a serious crimp in your swap space and performance, because hundreds of megabytes or even gigabytes of RAM are consumed just to hold the document.

On the other hand, SAX throws away the XML as soon as it's processed, which means that the SAX version could handle a billion records as easily as a dozen. Whether MySQL could handle a billion records is another matter, of course.
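For contrast, here is a small sketch of the streaming style, assuming the JDK's JAXP SAXParser rather than the parser configured earlier in the chapter. The class name SaxCountSketch is made up. The handler keeps only a running counter, so the memory it needs does not grow with the number of Product elements.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the book's code.
class SaxCountSketch {

    // Counts Product elements without ever building a document tree.
    static int countProducts(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Each element is seen once and then forgotten by the parser.
                if ("Product".equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<File><Product ISBN=\"111\"/><Product ISBN=\"222\"/></File>";
        System.out.println(countProducts(xml)); // prints 2
    }
}
```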

So, choosing between SAX and DOM is a matter of deciding whether it's more important to keep the memory usage down or have the ability to traverse the entire structure at will. One advantage of using DOM is that you can modify and output the XML again, as demonstrated in the next section.



MySQL and JSP Web Applications: Data-Driven Programming Using Tomcat and MySQL
ISBN: 0672323095
EAN: 2147483647
Year: 2002
Pages: 203
Authors: James Turner
