23.9 Reading HTML


Figure 23-11 shows the hierarchy breakdown for the classes involved in reading and parsing an HTML document with the HTMLEditorKit.

Figure 23-11. The class hierarchy for parsing HTML via HTMLEditorKit

23.9.1 Document Parsers

The first step in loading and displaying an HTML document is parsing it. The HTMLEditorKit class provides a hook for this, its protected getParser() method, which returns a parser to do the job. The classes in the javax.swing.text.html.parser package implement a DTD-based[8] parser for this purpose.

[8] DTD stands for Document Type Definition. The editor kit uses its own compiled DTD based on HTML 3.2 rather than on one of the public standard versions, which is another factor that complicates efforts to extend the set of supported tags.
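To see the getParser() hook in action, here is a minimal sketch that subclasses HTMLEditorKit and wraps whatever parser the stock kit supplies, counting calls to parse() before delegating. The class name LoggingKit and its getParseCalls() method are hypothetical, invented for illustration; getParser(), HTMLEditorKit.Parser, createDefaultDocument(), and read() are the standard Swing API. It assumes that HTMLEditorKit.read() asks getParser() for its parser when it reads into an HTMLDocument, which is how the stock kit behaves.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical subclass: wraps the parser HTMLEditorKit would normally
// supply and counts how many times parse() is invoked on it.
public class LoggingKit extends HTMLEditorKit {
    private int parseCalls = 0;

    public int getParseCalls() {
        return parseCalls;
    }

    @Override
    protected HTMLEditorKit.Parser getParser() {
        // The stock kit hands back its default parser here.
        final HTMLEditorKit.Parser real = super.getParser();
        return new HTMLEditorKit.Parser() {
            @Override
            public void parse(Reader r, HTMLEditorKit.ParserCallback cb,
                              boolean ignoreCharSet) throws IOException {
                parseCalls++;
                System.out.println("parse() called; delegating to "
                        + real.getClass().getSimpleName());
                real.parse(r, cb, ignoreCharSet);
            }
        };
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        LoggingKit kit = new LoggingKit();
        Document doc = kit.createDefaultDocument();
        // read() obtains a parser via getParser() and feeds it the HTML.
        kit.read(new StringReader("<html><body><p>Hello</p></body></html>"), doc, 0);
        System.out.println("text: " + doc.getText(0, doc.getLength()).trim());
        System.out.println("parse calls: " + kit.getParseCalls());
    }
}
```

Wrapping rather than replacing the parser keeps the kit's normal DTD-based behavior intact, which matters given how tightly the rest of the machinery is tied to that DTD.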

But since we're here, let's look at the flow of an incoming HTML document. The editor kit instantiates a parser to read the document. ParserDelegator does what its name implies: it delegates the actual parsing duties to another class (DocumentParser, in this case). ParserDelegator also handles loading the DTD used to create the real parser. Ostensibly, you could load your own DTD, but the whole process is rather tightly coupled to the HTML DTD supplied by the good folks at Sun. Once the parser is in place, you can hand it a Reader for the document along with a ParserCallback instance and start parsing. As the parser finds tokens and data, it passes them off to the callback instance, which does the real work of building the document.
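You can drive this flow directly, without an editor kit or a document, by passing a Reader and your own ParserCallback to a ParserDelegator. The sketch below is a minimal illustration: the class name TagLister and the method listEvents() are hypothetical, while ParserDelegator.parse(Reader, ParserCallback, boolean) and the handleStartTag/handleEndTag/handleSimpleTag/handleText callbacks are part of the standard javax.swing.text.html API. Because the parser may also report tags it infers from the DTD, the exact event sequence is not listed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagLister {
    // Parses html with a ParserDelegator and records each callback as a string.
    static List<String> listEvents(String html) throws IOException {
        final List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("start:" + t);
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("end:" + t);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("simple:" + t);   // tags without an end tag, such as <br>
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text:" + new String(data));
            }
        };
        // ParserDelegator loads the DTD and hands the parsing to a DocumentParser.
        // true = ignore any charset directive found in the document.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return events;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><p>Hello<br>world</p></body></html>";
        for (String e : listEvents(html)) {
            System.out.println(e);
        }
    }
}
```

HTMLDocument plugs into the same mechanism: its reader is itself a ParserCallback that turns these events into document elements instead of strings.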

You can display the document as it is built, or you can wait for the entire document to load before displaying it. The tokenThreshold property of HTMLDocument determines exactly when the display work begins: it sets the number of tokens the document buffers before updating its element structure. See the discussion following Table 23-12 for more details on the token threshold.
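As a rough way to observe the token threshold, the following sketch loads the same HTML twice, once with a threshold of 1 and once with Integer.MAX_VALUE, and counts insertUpdate notifications from a DocumentListener as a proxy for how often buffered content reaches the document. The class ThresholdDemo, its load() helper, and InsertCounter are hypothetical names for this illustration; setTokenThreshold(), createDefaultDocument(), and read() are standard Swing calls. The exact counts depend on the JDK's internal buffering, so no output is given.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class ThresholdDemo {
    // Loads html into a fresh HTMLDocument that buffers `threshold` tokens
    // before updating its element structure.
    static HTMLDocument load(String html, int threshold, DocumentListener listener)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.setTokenThreshold(threshold);   // overrides the kit's default
        if (listener != null) {
            doc.addDocumentListener(listener);
        }
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    // Counts insertUpdate notifications fired while the document loads.
    static class InsertCounter implements DocumentListener {
        int inserts = 0;
        public void insertUpdate(DocumentEvent e) { inserts++; }
        public void removeUpdate(DocumentEvent e) { }
        public void changedUpdate(DocumentEvent e) { }
    }

    public static void main(String[] args) throws IOException, BadLocationException {
        StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 1; i <= 50; i++) {
            html.append("<p>para ").append(i).append("</p>");
        }
        html.append("</body></html>");

        for (int threshold : new int[] {1, Integer.MAX_VALUE}) {
            InsertCounter counter = new InsertCounter();
            load(html.toString(), threshold, counter);
            System.out.println("threshold " + threshold + ": "
                    + counter.inserts + " insert event(s)");
        }
    }
}
```

A low threshold lets a view start showing content early at the cost of more frequent structural updates; a very large one defers everything until the whole document has been parsed.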



Java Swing
Graphic Java 2: Mastering the Jfc, By Geary, 3Rd Edition, Volume 2: Swing
ISBN: 0130796670
EAN: 2147483647
Year: 2001
Pages: 289
Authors: David Geary
