HTML Parsing

The last parsing mechanism we will cover is HTML parsing, which allows an application to process the HTML tags found in an HTML document. You can use this functionality to write spider applications, simple editors, and other applications that need access to HTML. Listing 10.3 shows a small example of reading an HTML document from the hard drive and parsing it. All the parsing is performed with Resin-supplied classes.

Listing 10.3: HTML parsing example.

package servlet;

import java.io.*;
import javax.servlet.http.*;
import javax.servlet.*;
import org.w3c.dom.*;
import com.caucho.xml.*;

public class HTMLParsing extends HttpServlet {
  public void doGet(HttpServletRequest request,
                    HttpServletResponse response)
      throws IOException, ServletException {
    try {
      // Parse the HTML file into a DOM document.
      Html parser = new Html();
      Document doc = parser.parseDocument(
          "c:/data/resin/resin-ee-3.0.2/sample.html");

      // Write the parsed document to a new file.
      FileOutputStream os = new FileOutputStream("newsample.xml");
      XmlPrinter printer = new XmlPrinter(os);
      printer.printHtml(doc);
      os.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

The code in Listing 10.3 shows how to parse an HTML document using the Resin-supplied HTML parser. The primary class for the parser is Html; unlike the other parsers, it is instantiated directly rather than through a factory. Once the object is instantiated, the parseDocument() method is called with the path to a sample HTML file. This method parses the file and returns the result as a DOM Document object. The sample then uses an XmlPrinter to write the parsed HTML to another file.
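Because the parsed result is a standard org.w3c.dom.Document, a spider-style application can walk it with the ordinary DOM API. The following is a minimal sketch rather than code from the listing: the class LinkExtractor and both of its methods are hypothetical names, and the JDK's built-in XML parser stands in for Resin's Html parser so the example runs without Resin (which means the sample markup must be well formed). It also assumes tag names appear in lowercase in the DOM; adjust the lookup if your parser reports them differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects link targets from a parsed document.
class LinkExtractor {

    // Stand-in for Resin's Html parser (an assumption for this sketch):
    // builds a DOM Document from well-formed markup with the JDK parser.
    static Document parseWellFormed(String markup) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    // Returns the href value of every <a> element that has one,
    // in document order.
    static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (anchor.hasAttribute("href")) {
                links.add(anchor.getAttribute("href"));
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
            + "<a href=\"index.html\">Home</a>"
            + "<a name=\"top\">Anchor only</a>"
            + "<p><a href=\"docs/api.html\">API</a></p>"
            + "</body></html>";
        System.out.println(extractLinks(parseWellFormed(page)));
    }
}
```

In a Resin application, the Document returned by parseDocument() in Listing 10.3 could be passed to extractLinks() in place of the stand-in document.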

You can find the full API for the Html object at www.caucho.com/resin/javadoc/com/caucho/xml/Html.html. In addition to parsing a file into a DOM object, the Html parser can parse HTML held in a String object, and it can report the document as SAX events instead of building a DOM.
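To illustrate what SAX-style processing of markup looks like, here is a small sketch of a handler that counts how often each tag occurs. It is an illustration under stated assumptions, not Resin-specific code: TagCounter and its count() method are hypothetical names, and the JDK's SAX parser stands in for the Html parser, so the input must be well formed. Consult the Html javadoc for the calls that attach a handler to Resin's parser.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: counts start tags by element name.
class TagCounter extends DefaultHandler {

    // TreeMap keeps the element names in alphabetical order.
    private final Map<String, Integer> counts = new TreeMap<>();

    // Called by the parser once for every opening tag it encounters.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        counts.merge(qName, 1, Integer::sum);
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Stand-in for Resin's parser (an assumption for this sketch):
    // feeds well-formed markup through the JDK SAX parser.
    static Map<String, Integer> count(String markup) throws Exception {
        TagCounter handler = new TagCounter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(markup)), handler);
        return handler.getCounts();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>one</p><p>two</p></body></html>";
        System.out.println(count(page));
    }
}
```

Unlike the DOM approach, the handler never holds the whole document in memory, which is why SAX tends to suit large pages or simple one-pass scans.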




Mastering Resin
ISBN: 0471431036
Year: 2002
Pages: 180
