The ProcessingInstruction Interface


The ProcessingInstruction interface represents a processing instruction such as <?xml-stylesheet type="text/css" href="order.css"?> or <?php echo "Hello World";?>.

Example 11.17 summarizes the ProcessingInstruction interface. This interface adds methods to get the target and the data of the processing instruction as strings. Even if the data has a pseudo-attribute format, as in <?xml-stylesheet type="text/css" href="order.css"?>, DOM doesn't recognize that. For this processing instruction, the target is xml-stylesheet and the data is type="text/css" href="order.css".

Example 11.17 The ProcessingInstruction Interface
package org.w3c.dom;

public interface ProcessingInstruction extends Node {

  public String getTarget();
  public String getData();
  public void   setData(String data) throws DOMException;

}
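To see these three methods at work, here is a minimal sketch that parses a one-line document with JAXP and reads the target and data of its xml-stylesheet processing instruction. The class name PIDemo, the helper firstPI(), and the invoice.css file name are made up for illustration; they are not part of DOM or of the examples in this chapter. The helper wraps checked exceptions only to keep the sketch short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class PIDemo {

  // Hypothetical helper: returns the first processing instruction
  // among the document's top-level children, or null if there is none.
  static ProcessingInstruction firstPI(String xml) {
    try {
      DocumentBuilder parser
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = parser.parse(new InputSource(new StringReader(xml)));
      for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
        if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
          return (ProcessingInstruction) n;
        }
      }
      return null;
    }
    catch (Exception e) {
      // Sketch only: real code would handle SAXException and
      // IOException separately.
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    String xml
     = "<?xml-stylesheet type=\"text/css\" href=\"order.css\"?><Order/>";
    ProcessingInstruction pi = firstPI(xml);
    System.out.println(pi.getTarget()); // xml-stylesheet
    System.out.println(pi.getData());   // type="text/css" href="order.css"

    // setData() replaces the whole data string; DOM does not
    // know or care that it looks like a list of attributes.
    pi.setData("type=\"text/css\" href=\"invoice.css\"");
    System.out.println(pi.getData());
  }

}
```

Notice that the data comes back as one undifferentiated string. Picking out the href value is left entirely to the client code.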

As usual, ProcessingInstruction objects also have all the methods of the Node superinterface, such as getNodeName() and getNodeValue(). The value of a processing instruction is its data. Processing instructions do not have children, however, so Node methods like getFirstChild() return null, and methods such as appendChild() throw a DOMException with the code HIERARCHY_REQUEST_ERR.
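The following sketch exercises the inherited Node methods on a freshly created processing instruction. The class name PINodeMethods and its two helper methods are invented for this illustration. It assumes a standard JAXP implementation, such as the one bundled with the JDK, and relies on the DOM rule that the node name of a processing instruction is its target.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;

class PINodeMethods {

  // Hypothetical helper: creates a detached
  // <?robots index="yes" follow="no"?> node.
  static ProcessingInstruction robotsPI() {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
       .newDocumentBuilder().newDocument();
      return doc.createProcessingInstruction(
       "robots", "index=\"yes\" follow=\"no\"");
    }
    catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  // Hypothetical helper: tries to give the processing instruction
  // a child. Returns the DOMException code, or -1 if the call
  // unexpectedly succeeded.
  static short appendChildCode(ProcessingInstruction pi) {
    try {
      pi.appendChild(pi.getOwnerDocument().createComment("child"));
      return -1;
    }
    catch (DOMException e) {
      return e.code;
    }
  }

  public static void main(String[] args) {
    ProcessingInstruction pi = robotsPI();
    System.out.println(pi.getNodeName());   // the target
    System.out.println(pi.getNodeValue());  // the data
    System.out.println(pi.getFirstChild()); // null; no children
    System.out.println(appendChildCode(pi)
     == DOMException.HIERARCHY_REQUEST_ERR);
  }

}
```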

As an example, let's extend the earlier XLinkSpider program in Example 11.6 so that it respects robots processing instructions. Such an instruction looks like this, and appears in the prolog of an XML document:

 <?robots index="yes" follow="no"?> 

The semantics of this instruction are deliberately similar to the robots META tag in HTML. That is, follow="yes" means robots should follow links they find in this page; follow="no" means they shouldn't. Similarly, index="yes" means search engines should include this page; index="no" means they shouldn't.

Like many processing instructions, the syntax is based on pseudo-attributes. DOM doesn't provide any means to parse these, even though it's a very common format for processing instructions. However, you can fake DOM out. I'm going to extract the target and data of the processing instruction and use them to form a string that has this format:

<target data/>

In other words, a processing instruction such as <?robots index="yes" follow="no"?> is going to turn into a String like <robots index="yes" follow="no" />. This string is in turn a well-formed XML document that can be parsed and its attributes extracted. Admittedly, this approach is very circuitous and probably not optimally efficient. On the other hand, it's a lot easier to code and explain than writing your own mini-parser just to handle pseudo-attributes. Example 11.18 is a simple utility class that implements this hack. The parsing is completely hidden inside the constructor, so if this is too offensive to your sensibilities, you can replace it with more appropriate code without changing the public interface. Because this class is quite useful in practice, not merely an example for this book, I've placed it in the com.macfaq.xml package. Don't forget to configure your class and source paths appropriately when compiling it.

Example 11.18 Reading PseudoAttributes from a ProcessingInstruction
package com.macfaq.xml;

import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import java.io.*;


public class PseudoAttributes {

  private NamedNodeMap pseudo;

  public PseudoAttributes(ProcessingInstruction pi)
   throws SAXException {

    StringBuffer sb = new StringBuffer("<");
    sb.append(pi.getTarget());
    sb.append(" ");
    sb.append(pi.getData());
    sb.append("/>");
    StringReader reader = new StringReader(sb.toString());
    InputSource source = new InputSource(reader);
    try {
      DocumentBuilderFactory factory
       = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = factory.newDocumentBuilder();
      // This line will throw a SAXException if the processing
      // instruction does not use pseudo-attributes.
      Document doc = parser.parse(source);
      Element root = doc.getDocumentElement();
      pseudo = root.getAttributes();
    }
    catch (FactoryConfigurationError e) {
      // I don't absolutely need to catch this, but I hate to
      // throw an Error for no good reason.
      throw new SAXException(e.getMessage());
    }
    catch (SAXException e) {
      throw e;
    }
    catch (Exception e) {
      throw new SAXException(e);
    }

  }

  // delegator methods
  public Attr item(int index) {
    return (Attr) pseudo.item(index);
  }

  public int getLength() {
    return pseudo.getLength();
  }

  public String getValue(String name) {
    Attr att = (Attr) pseudo.getNamedItem(name);
    if (att == null) return "";
    return att.getValue();
  }

}
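If you'd like to try the trick in isolation, the following self-contained sketch condenses it into a single static method that works on the target and data strings directly. The class PseudoAttributeSketch and the method pseudoValue() are hypothetical names of my own, not part of the com.macfaq.xml package, and returning null for non-pseudo-attribute data is a choice made just for this sketch; Example 11.18 throws a SAXException instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PseudoAttributeSketch {

  // Hypothetical helper: rewrites <?target data?> as <target data/>,
  // parses that string as a tiny XML document, and returns the value
  // of the named pseudo-attribute. Returns "" if the pseudo-attribute
  // is absent, and null if the data is not in pseudo-attribute form.
  static String pseudoValue(String target, String data, String name) {
    String xml = "<" + target + " " + data + "/>";
    try {
      DocumentBuilder parser
       = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = parser.parse(new InputSource(new StringReader(xml)));
      Element root = doc.getDocumentElement();
      return root.getAttribute(name); // "" when there is no such attribute
    }
    catch (SAXException e) {
      // The rewritten string was not well-formed, so the data
      // did not consist of pseudo-attributes.
      return null;
    }
    catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  // Convenience overload for an actual DOM node.
  static String pseudoValue(ProcessingInstruction pi, String name) {
    return pseudoValue(pi.getTarget(), pi.getData(), name);
  }

  public static void main(String[] args) {
    String data = "index=\"yes\" follow=\"no\"";
    System.out.println(pseudoValue("robots", data, "follow")); // no
    System.out.println(pseudoValue("robots", data, "index"));  // yes
    // <php echo "Hello World";/> is malformed, so this prints null.
    // The parser may also log the fatal error to System.err.
    System.out.println(pseudoValue("php", "echo \"Hello World\";", "echo"));
  }

}
```

The last call shows why the constructor in Example 11.18 declares SAXException: any processing instruction whose data doesn't look like a list of attributes makes the rewritten string malformed.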

This class makes it easy for the earlier DOMSpider program in Example 11.6 to recognize the robots processing instruction. I won't repeat the entire program, most of which hasn't changed. The relevant change is in the spider() method, which now has to look for a robots processing instruction in each document and use that to decide whether or not to call process() (index="yes" or index="no") and/or findLinks() (follow="yes" or follow="no").

  public void spider(String systemID) {

    currentDepth++;
    try {
      if (currentDepth < maxDepth) {
        Document document = parser.parse(systemID);

        // Look for a robots PI with follow="no"
        boolean index = true;
        boolean follow = true;
        NodeList children = document.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
          Node child = children.item(i);
          int type = child.getNodeType();
          if (type == Node.PROCESSING_INSTRUCTION_NODE) {
            ProcessingInstruction pi
             = (ProcessingInstruction) child;
            if (pi.getTarget().equals("robots")) {
              PseudoAttributes pseudo = new PseudoAttributes(pi);
              if (pseudo.getValue("index").equals("no")) {
                index = false;
              }
              if (pseudo.getValue("follow").equals("no")) {
                follow = false;
              }
            }
          }
        } // end for

        if (index) process(document, systemID);

        if (follow) {
          Vector toBeVisited = new Vector();
          // search the document for uris,
          // store them in vector, and print them
          findLinks(
           document.getDocumentElement(), toBeVisited, systemID);

          Enumeration e = toBeVisited.elements();
          while (e.hasMoreElements()) {
            String uri = (String) e.nextElement();
            visited.add(uri);
            spider(uri);
          } // end while
        } // end if

      }

    }
    catch (SAXException e) {
      // Couldn't load the document,
      // probably not well-formed XML, skip it
    }
    catch (IOException e) {
      // Couldn't load the document,
      // likely network failure, skip it
    }
    finally {
      currentDepth--;
      System.out.flush();
    }

  }


Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
Year: 2001
Pages: 191
