The DOM parser reads an XML document in its entirety into a tree data structure. For most practical applications, DOM works fine. However, it can be inefficient if the document is large and if your processing algorithm is simple enough that you can analyze nodes on the fly, without having to see all of the tree structure. In these cases, you should use the SAX parser instead. The SAX parser reports events as it parses the components of the XML input, but it does not store the document in any way; it is up to the event handlers whether they want to build a data structure. In fact, the DOM parser is built on top of the SAX parser. It builds the DOM tree as it receives the parser events.

Whenever you use a SAX parser, you need a handler that defines the event actions for the various parse events. The ContentHandler interface defines several callback methods that the parser executes as it parses the document. The most important ones are startElement and endElement, called each time a start tag or end tag is encountered; characters, called whenever character data are encountered; and startDocument and endDocument, called once each, at the start and the end of the document.
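For example, when parsing the fragment

    <font>
       <name>Helvetica</name>
       <size units="pt">36</size>
    </font>

the parser generates the following calls (calls that report only the white space between elements are omitted):

    1. startElement, element name: font
    2. startElement, element name: name
    3. characters, content: Helvetica
    4. endElement, element name: name
    5. startElement, element name: size, attributes: units="pt"
    6. characters, content: 36
    7. endElement, element name: size
    8. endElement, element name: font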
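The callback sequence for this fragment can be observed directly with a small logging handler. The following is only a sketch: the class name SaxEventLog and its events method are hypothetical helpers, not part of the original program, and the handler trims character data and skips white-space-only chunks so that the log stays short. A parser is also free to deliver one run of text in several characters calls, although that is unlikely for input this small.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not from the original text) that records SAX callbacks. */
public class SaxEventLog
{
   /** Parses the given XML text and returns one log line per callback. */
   public static List<String> events(String xml) throws Exception
   {
      final List<String> log = new ArrayList<>();
      DefaultHandler handler = new DefaultHandler()
      {
         @Override
         public void startElement(String namespaceURI, String lname,
            String qname, Attributes attrs)
         {
            // Namespace processing is off by default, so use the qualified name
            StringBuilder line = new StringBuilder("startElement " + qname);
            for (int i = 0; i < attrs.getLength(); i++)
               line.append(" ").append(attrs.getQName(i))
                   .append("=").append(attrs.getValue(i));
            log.add(line.toString());
         }

         @Override
         public void endElement(String namespaceURI, String lname, String qname)
         {
            log.add("endElement " + qname);
         }

         @Override
         public void characters(char[] ch, int start, int length)
         {
            // Skip chunks that contain only white space between elements
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) log.add("characters " + text);
         }
      };

      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      parser.parse(new InputSource(new StringReader(xml)), handler);
      return log;
   }

   public static void main(String[] args) throws Exception
   {
      String fragment = "<font> <name>Helvetica</name> "
         + "<size units=\"pt\">36</size> </font>";
      for (String line : events(fragment)) System.out.println(line);
   }
}
```

Returning the log as a list, rather than printing inside the callbacks, keeps the handler easy to check without capturing console output.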
Your handler needs to override these methods and have them carry out whatever action you need as the file is parsed. The program at the end of this section prints all links <a href="..."> in an HTML file. It simply overrides the startElement method of the handler to check for elements named a that have an href attribute. This is potentially useful for implementing a "web crawler," a program that reaches more and more web pages by following links.
The sample program is a good example for the use of SAX. We don't care at all in which context the a elements occur, and there is no need to store a tree structure.

Here is how you get a SAX parser:

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser parser = factory.newSAXParser();

You can now process a document:

    parser.parse(source, handler);

Here, source can be a file, URL string, or input stream. The handler belongs to a subclass of DefaultHandler. The DefaultHandler class defines do-nothing methods for the four interfaces:

    ContentHandler
    DTDHandler
    EntityResolver
    ErrorHandler

The example program defines a handler that overrides the startElement method of the ContentHandler interface to watch out for a elements with an href attribute:
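A minimal sketch of such a handler follows. It is not the listing from the original text: the class name LinkCollector and the collect method are hypothetical, the handler gathers the links in a list instead of printing them, and it reads from an in-memory string so that it can be tried without a network connection. Because namespace processing is left off here, the sketch compares the qualified name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler (the name is an assumption) that collects href values. */
public class LinkCollector extends DefaultHandler
{
   private final List<String> links = new ArrayList<>();

   @Override
   public void startElement(String namespaceURI, String lname,
      String qname, Attributes attrs)
   {
      // Namespace processing is off by default, so test the qualified name
      if (qname.equalsIgnoreCase("a"))
      {
         String href = attrs.getValue("href"); // null if the attribute is absent
         if (href != null) links.add(href);
      }
   }

   public List<String> getLinks()
   {
      return links;
   }

   /** Parses well-formed XHTML text and returns the href values of its a elements. */
   public static List<String> collect(String xhtml) throws Exception
   {
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      LinkCollector handler = new LinkCollector();
      InputStream in = new ByteArrayInputStream(
         xhtml.getBytes(StandardCharsets.UTF_8));
      parser.parse(in, handler); // the InputStream form of parse(source, handler)
      return handler.getLinks();
   }

   public static void main(String[] args) throws Exception
   {
      String page = "<html><body><p><a href=\"one.html\">one</a></p>"
         + "<a name=\"anchor\">no link</a>"
         + "<a href=\"two.html\">two</a></body></html>";
      System.out.println(collect(page));
   }
}
```

The handler extends DefaultHandler, so the callbacks it does not override remain do-nothing methods.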
The startElement method has three parameters that describe the element name. The qname parameter reports the qualified name of the form alias:localname. If namespace processing is turned on, then the namespaceURI and lname parameters describe the namespace and local (unqualified) name.

As with the DOM parser, namespace processing is turned off by default. You activate namespace processing by calling the setNamespaceAware method of the factory class:

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    SAXParser saxParser = factory.newSAXParser();

Example 12-8 contains the code for the web crawler program. Later in this chapter, you will see another interesting use of SAX. An easy way of turning a non-XML data source into XML is to report the SAX events that an XML parser would report. See the section on XSL transformations for details.

Example 12-8. SAXTest.java

    import java.io.*;
    import java.net.*;
    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;

    /**
       This program demonstrates how to use a SAX parser. The
       program prints all hyperlinks of an XHTML web page.
       Usage: java SAXTest url
    */
    public class SAXTest
    {
       public static void main(String[] args) throws Exception
       {
          String url;
          if (args.length == 0)
          {
             url = "http://www.w3c.org";
             System.out.println("Using " + url);
          }
          else
             url = args[0];

          DefaultHandler handler = new
             DefaultHandler()
             {
                public void startElement(String namespaceURI,
                   String lname, String qname, Attributes attrs)
                {
                   // lname holds the local name because namespace processing is on
                   if (lname.equalsIgnoreCase("a") && attrs != null)
                   {
                      for (int i = 0; i < attrs.getLength(); i++)
                      {
                         String aname = attrs.getLocalName(i);
                         if (aname.equalsIgnoreCase("href"))
                            System.out.println(attrs.getValue(i));
                      }
                   }
                }
             };

          SAXParserFactory factory = SAXParserFactory.newInstance();
          factory.setNamespaceAware(true);
          SAXParser saxParser = factory.newSAXParser();
          // close the stream when parsing finishes or fails
          try (InputStream in = new URL(url).openStream())
          {
             saxParser.parse(in, handler);
          }
       }
    }

javax.xml.parsers.SAXParserFactory 1.4
javax.xml.parsers.SAXParser 1.4
org.xml.sax.ContentHandler 1.4
org.xml.sax.Attributes 1.4
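To see how the three name parameters of startElement depend on setNamespaceAware, the same document can be parsed twice, once with namespace processing off and once with it on. The sketch below is illustrative only: the class name NamespaceNames, the names method, and the h prefix in the sample document are assumptions made for this example. What a parser places in namespaceURI and lname when namespace processing is off is parser-dependent, so the code simply records whatever it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo (not from the original text) of the startElement name parameters. */
public class NamespaceNames
{
   /** Returns one line per start tag, listing the three name parameters. */
   public static List<String> names(String xml, boolean namespaceAware)
      throws Exception
   {
      final List<String> result = new ArrayList<>();
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(namespaceAware);
      DefaultHandler handler = new DefaultHandler()
      {
         @Override
         public void startElement(String namespaceURI, String lname,
            String qname, Attributes attrs)
         {
            result.add("uri=" + namespaceURI + " lname=" + lname
               + " qname=" + qname);
         }
      };
      factory.newSAXParser().parse(
         new InputSource(new StringReader(xml)), handler);
      return result;
   }

   public static void main(String[] args) throws Exception
   {
      // The h prefix is bound to the XHTML namespace for this example
      String doc = "<h:a xmlns:h=\"http://www.w3.org/1999/xhtml\" "
         + "href=\"index.html\">home</h:a>";
      System.out.println("off: " + names(doc, false));
      System.out.println("on:  " + names(doc, true));
   }
}
```

With namespace processing on, lname is the unqualified name a, which is why the example program can test lname without caring which prefix a page uses.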