Lexical information in an XML document is information about the text of the XML itself rather than about its logical content. In an XML document, CDATA sections, comments, and references to parsed entities constitute the lexical information.

Normally, a CDATA section is used in an XML document when you want to store text that is not to be parsed. This happens when an element contains large sections of text that might include special characters, and it is inconvenient to replace each occurrence with an appropriate entity reference. The CDATA section is loosely analogous to the <pre></pre> tags of HTML. For example, to put the following text in an XML file, you would need to escape every occurrence of < and & (and, by convention, >); otherwise, the parser will generate errors:

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    SAMS Publishing is the &best&
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>

This is obviously painful and counterproductive. To overcome such issues, a CDATA section is used.

Also, a parser normally ignores a comment in an XML file. However, when creating applications that read, filter, and output XML files, you might need to read and write out the comments in the XML source file.

Additionally, the XML source file might contain entity references such as &CompanyName;. In normal parsing, &CompanyName; would be replaced by the text it represents. However, a filtering application needs to pass the entity reference through as is, without dereferencing it.

To handle these three cases, the SAX extensions package (org.xml.sax.ext) provides the LexicalHandler interface. It contains the methods that provide the mechanisms to handle CDATA sections, comments, and parsed entities in an XML file. To handle the lexical events in your application, you need to do the following:

- Import the LexicalHandler interface
- Implement the LexicalHandler interface
- Configure the XMLReader to send the lexical events to the LexicalHandler
- Implement the methods defined in the LexicalHandler interface

To understand how the lexical handler works, you need to make some changes in the CarParts.xml file and the MyXMLHandler application.

Updating the CarParts.xml File

In the CarParts.xml file, make the following changes:
- Separate the DTD from the CarParts.xml file and name it CarParts.dtd.
- In the CarParts.xml, refer to the CarParts.dtd.
- Add a new element called forCDATA in the DTD and the CarParts.xml file.
- Change the entry in the supplier element from Engine 1 to the entity reference &companyname;.

To make these changes, CarParts.dtd should be as shown in Listing 4.3. The newly added entries are the declaration of the forCDATA element and its inclusion in the content model of carparts.

Listing 4.3 Using LexicalHandler

    <?xml version='1.0' encoding='us-ascii'?>
    <!-- DTD for the XML file that describes car parts -->
    <!ELEMENT carparts (supplier,engines,carbodies,wheels,carstereos,forCDATA)>
    <!ELEMENT engines (engine+)>
    <!ELEMENT carbodies (carbody+)>
    <!ELEMENT wheels (wheel+)>
    <!ELEMENT carstereos (carstereo+)>
    <!ELEMENT forCDATA (#PCDATA)>
    <!ELEMENT supplier (#PCDATA)>
    <!ATTLIST supplier
        name CDATA #REQUIRED
        URL CDATA #REQUIRED
    >
    <!ELEMENT engine (#PCDATA)*>
    <!ATTLIST engine
        id CDATA #REQUIRED
        type CDATA #REQUIRED
        capacity (1000 | 2000 | 2500 ) #REQUIRED
        price CDATA #IMPLIED
        text CDATA #IMPLIED
    >
    <!ELEMENT carbody (#PCDATA)*>
    <!ATTLIST carbody
        id CDATA #REQUIRED
        type CDATA #REQUIRED
        color CDATA #REQUIRED
    >
    <!ELEMENT wheel (#PCDATA)*>
    <!ATTLIST wheel
        id CDATA #REQUIRED
        type CDATA #REQUIRED
        price CDATA #IMPLIED
        size (X | Y | Z) #IMPLIED
    >
    <!ELEMENT carstereo (#PCDATA)*>
    <!ATTLIST carstereo
        id CDATA #REQUIRED
        manufacturer CDATA #REQUIRED
        model CDATA #REQUIRED
        Price CDATA #REQUIRED
    >

Note that the element forCDATA, which will hold the CDATA section, has been added to the DTD. Its content model is declared as (#PCDATA): CDATA is an attribute type, not an element content model, and a declaration of (CDATA) would instead require a child element named CDATA. To a validating parser, the text inside a CDATA section is ordinary character data. Also, the entity declarations have been retained in the XML file itself.

To update the CarParts.xml file with the changes, make it match Listing 4.4.
Listing 4.4 Output of MyXMLHandler with LexicalHandler

    <?xml version='1.0' encoding='us-ascii'?>
    <!-- XML file that describes car parts -->
    <!DOCTYPE carparts SYSTEM "CarParts.dtd" [
    <!ENTITY companyname "Heaven Car Parts (TM)">
    <!ENTITY companyweb "http://carpartsheaven.com">
    ]>

    <carparts>
    <?supplierformat format="X13" version="3.2"?>
    <supplier name="&companyname;" URL="&companyweb;">
        &companyname;
    </supplier>
    <engines>
        <engine type="Alpha37" capacity="2500" price="3500">
            Engine 1
        </engine>
    </engines>
    <carbodies>
        <carbody type="Tallboy" color="blue">
            Car Body 1
        </carbody>
    </carbodies>
    <wheels>
        <wheel type="X3527" price="120">
            Wheel Set 1
        </wheel>
    </wheels>
    <carstereos>
        <carstereo manufacturer="MagicSound" model="T76w" Price="500">
            Car Stereo 1
        </carstereo>
    </carstereos>
    <forCDATA><![CDATA[Special Text:
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    SAMS Publishing is the &best&
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>..]]>
    </forCDATA>
    </carparts>

Next, you need to import the LexicalHandler interface into the MyXMLHandler application.

Importing the LexicalHandler Package

To import the LexicalHandler interface into the MyXMLHandler application, add the import of org.xml.sax.ext.LexicalHandler, which is the last import line below:

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import org.xml.sax.ext.LexicalHandler;

    public class MyXMLHandler extends DefaultHandler {
        .....
    }

Next, implement the LexicalHandler interface in the MyXMLHandler application.

Implementing the LexicalHandler Interface

To implement the LexicalHandler interface within MyXMLHandler, add the implements LexicalHandler clause to the class declaration:

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import org.xml.sax.ext.LexicalHandler;

    public class MyXMLHandler extends DefaultHandler implements LexicalHandler{

Next, XMLReader needs to be configured to send the lexical events to the LexicalHandler.
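As a brief aside on the class declaration above, the following is a minimal sketch of an alternative that this chapter does not use: extending org.xml.sax.ext.DefaultHandler2, which already implements LexicalHandler with empty methods, so that only the callbacks of interest need to be overridden. The class name CommentCollector and the sample comment text are hypothetical and are not part of the example0402 code.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical handler, not part of the chapter's example code.
// DefaultHandler2 supplies no-op versions of all seven LexicalHandler
// methods, so only comment() is overridden here.
class CommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        // Store the comment text without its surrounding whitespace.
        comments.add(new String(ch, start, length).trim());
    }

    public static void main(String[] args) throws Exception {
        CommentCollector handler = new CommentCollector();

        // The object is a LexicalHandler, so it can later be passed to
        // XMLReader.setProperty() as the lexical handler.
        System.out.println(handler instanceof LexicalHandler);

        // Simulate the callback a parser would make for "<!-- hello -->".
        char[] text = " hello ".toCharArray();
        handler.comment(text, 0, text.length);

        // Inherited no-op: nothing happens for CDATA boundaries.
        handler.startCDATA();

        System.out.println(handler.comments);
    }
}
```

The chapter's approach of implementing LexicalHandler directly makes every callback explicit, which suits a tutorial; the DefaultHandler2 route is mainly a convenience when only one or two lexical events matter.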
Configuring XMLReader for Lexical Handling

Configuring the XMLReader for lexical handling involves setting the property http://xml.org/sax/properties/lexical-handler. This property is defined in the SAX standard, and can be set by using the setProperty() method of XMLReader. To set the property, add the setProperty() call shown here:

    static public void main(String[] args) throws Exception {
        ......................
        /*XML Reader is the interface for reading an XML document using callbacks*/
        XMLReader xmlReader = saxParser.getXMLReader();
        xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", new MyXMLHandler());
        /*set the error handler*/
        xmlReader.setErrorHandler(new MyErrorHandler());
        .................................
    }

You've now set the required property for lexical handling and configured XMLReader to pass the lexical events to your lexical handler, which in this case is an instance of the application class itself. Next, you need to implement the methods of the LexicalHandler interface.

Implementing Methods of the LexicalHandler Interface

The LexicalHandler interface declares seven methods that must be defined in the application implementing the interface. These methods pertain to processing the comments, CDATA sections, and parsed entities. You'll need to implement the methods so that they display the lexical event name when it occurs. To do so, add the methods that follow the //Lexical Handler Methods comment in Listing 4.5.

Listing 4.5 Implementing Methods of LexicalHandler Interface

    public void printAllAttributes(Attributes elementAttributes) {
        System.out.println("\tTotal Number of Attributes: "
            + elementAttributes.getLength());
        for (int i = 0; i < elementAttributes.getLength(); i++) {
            System.out.println("\t\tAttribute: "
                + elementAttributes.getQName(i) + " = "
                + elementAttributes.getValue(i));
        }
    }

    //Lexical Handler Methods

    // Called when the parser reaches the opening <![CDATA[ marker
    public void startCDATA() throws SAXException {
        System.out.println("\nStarting CDATA Section\n");
    }

    // Called when the parser reaches the closing ]]> marker
    public void endCDATA() throws SAXException {
        System.out.println("\nEnding CDATA Section\n");
    }

    // Called for each comment; the text is ch[start] to ch[start+length-1]
    public void comment(char[] ch, int start, int length) throws SAXException {
        String commentText = new String(ch, start, length);
        System.out.println("Comment : " + commentText + "\n");
    }

    // Called at the start of the DOCTYPE declaration
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException {
        System.out.println("Starting DTD :" + systemId);
    }

    // Called at the end of the DOCTYPE declaration
    public void endDTD() throws SAXException {
        System.out.println("Ending DTD");
    }

    // Called before the content of an entity is reported
    public void startEntity(String name) throws SAXException {
        System.out.println("Starting entity :" + name);
    }

    // Called after the content of an entity has been reported
    public void endEntity(String name) throws SAXException {
        System.out.println("Ending entity :" + name);
    }

NOTE
The code discussed here is available in the example0402 folder. This folder also contains the sample CarParts.xml file.

You can now compile and run the program. The output should be similar to Listing 4.6. The output from the lexical handling methods consists of the Comment, Starting/Ending DTD, Starting/Ending entity, and Starting/Ending CDATA Section lines.
Listing 4.6 Output of MyXMLHandler with LexicalHandler Methods

    Version 0402.0 of MyXMLHandler in example0402
    Locator :file:///D:/sams_work/Java Api/Chapter 4- SAX APIs - Advanced Use/Example0402/ CarParts.xml
    Start Document: -----Reading the document CarParts.xml with MyXMLHandler------
    Comment : XML file that describes car parts
    Starting DTD :CarParts.dtd
    Starting entity :[dtd]
    Comment : DTD for the XML file that describes car parts
    Ending entity :[dtd]
    Ending DTD
    Location of event at line number :8
    Start Element-> carparts
        Total Number of Attributes: 0
    Location of event at line number :10
    Start Element-> supplier
        Total Number of Attributes: 2
            Attribute: name = Heaven Car Parts (TM)
            Attribute: URL = http://carpartsheaven.com
    Characters:
    Starting entity :companyname
    Characters: Heaven Car Parts (TM)
    Ending entity :companyname
    Characters:
    End Element-> supplier
    Location of event at line number :13
    Start Element-> engines
        Total Number of Attributes: 0
    Location of event at line number :14
    Start Element-> engine
    Price= 3500
    Characters: Engine 1
    Characters:
    End Element-> engine
    End Element-> engines
    Location of event at line number :18
    Start Element-> carbodies
        Total Number of Attributes: 0
    Location of event at line number :19
    Start Element-> carbody
        Total Number of Attributes: 3
            Attribute: id = C32
            Attribute: type = Tallboy
            Attribute: color = blue
    Characters: Car Body 1
    Characters:
    End Element-> carbody
    End Element-> carbodies
    Location of event at line number :23
    Start Element-> wheels
        Total Number of Attributes: 0
    Location of event at line number :24
    Start Element-> wheel
    Price= 120
    Characters: Wheel Set 1
    Characters:
    End Element-> wheel
    End Element-> wheels
    Location of event at line number :28
    Start Element-> carstereos
        Total Number of Attributes: 0
    Location of event at line number :29
    Start Element-> carstereo
        Total Number of Attributes: 4
            Attribute: id = C2
            Attribute: manufacturer = MagicSound
            Attribute: model = T76w
            Attribute: Price = 500
    Characters: Car Stereo 1
    Characters:
    End Element-> carstereo
    End Element-> carstereos
    Location of event at line number :33
    Start Element-> forCDATA
        Total Number of Attributes: 0

    Starting CDATA Section

    Characters: Special Text:
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    SAMS Publishing is the &best&
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>..

    Ending CDATA Section

    End Element-> forCDATA
    End Element-> carparts
    End Document: ----------------Finished Reading the document---------------

Note that the LexicalHandler treats the external DTD as an entity: it is reported through startEntity() and endEntity() under the name [dtd].
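To tie the four steps together in one place, the following is a minimal, self-contained sketch rather than the example0402 program. It assumes a hypothetical class named LexicalEventDemo and a small inline sample document with an internal DTD subset only (no CarParts.dtd), and it records the lexical events in a list instead of printing them as they occur. Unlike the chapter's code, it registers the same object as both content handler and lexical handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, cut-down counterpart of MyXMLHandler (an assumption for
// illustration): it records lexical events instead of printing them.
class LexicalEventDemo extends DefaultHandler implements LexicalHandler {

    // Hypothetical sample document, much smaller than CarParts.xml.
    static final String SAMPLE_XML =
          "<?xml version='1.0'?>\n"
        + "<!DOCTYPE carparts [\n"
        + "  <!ENTITY companyname \"Heaven Car Parts (TM)\">\n"
        + "]>\n"
        + "<carparts>\n"
        + "  <!-- supplier follows -->\n"
        + "  <supplier>&companyname;</supplier>\n"
        + "  <forCDATA><![CDATA[SAMS Publishing is the &best& <<>>]]></forCDATA>\n"
        + "</carparts>\n";

    final List<String> events = new ArrayList<>();

    // The seven LexicalHandler methods
    public void startDTD(String name, String publicId, String systemId) {
        events.add("startDTD:" + name);
    }
    public void endDTD() { events.add("endDTD"); }
    public void startEntity(String name) { events.add("startEntity:" + name); }
    public void endEntity(String name) { events.add("endEntity:" + name); }
    public void startCDATA() { events.add("startCDATA"); }
    public void endCDATA() { events.add("endCDATA"); }
    public void comment(char[] ch, int start, int length) {
        events.add("comment:" + new String(ch, start, length).trim());
    }

    // Parses the given XML text and returns the lexical events in order.
    static List<String> collectEvents(String xml) throws Exception {
        LexicalEventDemo handler = new LexicalEventDemo();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Same property as in the chapter; here one object receives both
        // the content events and the lexical events.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectEvents(SAMPLE_XML)) {
            System.out.println(event);
        }
    }
}
```

Running main prints one recorded event per line. Because the sample has no external DTD subset, a [dtd] entity like the one in Listing 4.6 should not be expected here. Sharing a single handler object is a design choice of this sketch: it keeps all handler state in one place, whereas passing a second new MyXMLHandler() to setProperty() gives the lexical callbacks their own separate instance.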