20.5 Setting Up an External Reference Entity Handler


Many documents refer to external entities, and you want your program to parse those entities along with the main document.

Technique

Use the xml_set_external_entity_ref_handler() function, and be prepared to parse the extra data.

The XML files:

xmlref-test.xml:

    <!DOCTYPE tst [
        <!ENTITY arms SYSTEM "xmlref-test2.xml">
    ]>
    <body>
        <face>
            <nose>
                <size>
                    big
                </size>
                <hairs>
                    none
                </hairs>
            </nose>
            <mouth>
                <size>
                    small
                </size>
                <lips>
                    chapped
                </lips>
            </mouth>
            <eyes>
                <size>
                    medium
                </size>
                <color>
                    green
                </color>
            </eyes>
        </face>
        &arms;
    </body>

xmlref-test2.xml:

    <arms>
        <biceps>
            <size>
                medium
            </size>
            <veins>
                not huge
            </veins>
        </biceps>
        <triceps>
            <size>
                small
            </size>
            <sore>
                yes
            </sore>
        </triceps>
    </arms>

The PHP file:

    <?php
    // Start-tag handler. Element names arrive in upper case because
    // case folding is switched on in xml_conn().
    function start_element($parser, $element_name, $element_attr) {
        switch ($element_name) {
            case "ARMS":
            case "FACE":
                $data = ucfirst(strtolower($element_name));
                print "<h2>Descriptor for $data:</h2>\n<br>\n";
                break;
            case "BICEPS":
            case "SIZE":
            case "TRICEPS":
            case "EYES":
            case "MOUTH":
            case "LIPS":
            case "NOSE":
            case "HAIRS":
            case "VEINS":
                $data = ucfirst(strtolower($element_name));
                print "\n<br>\n$data";
                break;
            case "SORE":
                print "\n<br>\nAre they sore from a workout? ";
                break;
            case "COLOR":
                print "\n<br>\nThe color of my eyes is: ";
                break;
        }
    }

    // End-tag handler. It receives only the parser and the element name.
    function end_element($parser, $element_name) {
        // empty
    }

    // Character data is echoed as is.
    function character_data($parser, $data) {
        echo $data;
    }

    // Processing instructions: $type holds the target.
    // eval() runs code taken from the document, so use trusted files only.
    function pi_handler($parser, $type, $data) {
        if ($type == "php") {
            eval($data);
        }
    }

    // Default handler: receives whatever no other handler claims.
    function lost_data($parser, $data) {
        echo "<!-- Not Dealt with data (per se): $data -->\n";
    }

    // Create a parser, register every handler, and open $file for reading.
    function xml_conn($file) {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 1);
        xml_set_element_handler($parser, "start_element", "end_element");
        xml_set_character_data_handler($parser, "character_data");
        xml_set_processing_instruction_handler($parser, "pi_handler");
        xml_set_default_handler($parser, "lost_data");
        xml_set_external_entity_ref_handler($parser, "external_ent_ref");
        if (!($fp = fopen($file, "r"))) {
            die("Cannot Open File, $file");
        }
        return array($fp, $parser);
    }

    // Called for each external entity reference: parse the referenced file
    // with a fresh parser that uses the same handlers.
    function external_ent_ref($parser,
                              $open_ent_names,
                              $base,
                              $system_id,
                              $public_id) {
        if ($system_id != "") {
            list($ext_fp, $ext_parser) = xml_conn($system_id);
            while ($data = fread($ext_fp, 4096)) {
                xml_parse($ext_parser, $data, feof($ext_fp)) or
                    die(sprintf('XML Error: %s at line %d',
                                xml_error_string(xml_get_error_code($ext_parser)),
                                xml_get_current_line_number($ext_parser)));
            }
            fclose($ext_fp);
            xml_parser_free($ext_parser);
            return true;
        }
        return false;
    }

    // Parse the main document.
    list($fp, $parser) = xml_conn('xmlref-test.xml');
    while ($data = fread($fp, 4096)) {
        xml_parse($parser, $data, feof($fp))
            or die(sprintf('XML Error: %s at line %d',
                           xml_error_string(xml_get_error_code($parser)),
                           xml_get_current_line_number($parser)));
    }
    fclose($fp);
    xml_parser_free($parser);
    ?>

Comments

This code might look significantly more complicated than the rest of the code in this chapter, but I promise that it really isn't. The same theory of creating handlers and then opening and parsing the documents applies, except that there are a couple more levels.

An external entity is referenced in the document body as &foobar; and is usually declared in the form

 <!ENTITY foobar SYSTEM "foobar.xml"> 

In plain English, that means, "There's more data to parse, but we put the data in foobar.xml, so I would appreciate it if you go there to find the rest of the data." It is similar to using include() or require() in PHP: using external entities enables you to break up big files into files of manageable size.

The elements of the solution code that deserve special attention are the xml_conn() function and the external_ent_ref() function. They are the crux of this recipe.

The xml_conn() function is used to create a new XML parser for a specified file. Specifically, it creates a parser ( $parser ), sets all the handlers, and opens a connection to the file specified by the first argument ( $file ). We wrap this in a function because it is used in many places: every time we reach an external entity reference, we have to parse the document pointed to by that external entity reference. It is similar to a Web crawler following links.

The external_ent_ref() function is the handler for external entity references; it is registered with the xml_set_external_entity_ref_handler() function. For every entity that has a system identifier, external_ent_ref() opens the referenced document and parses it with the same handlers that we use for the main document (that is, the start_element and end_element handlers, among others). It returns true when it has dealt with the entity and false when there was no system identifier to follow.
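To see what the handler is given before any files get involved, here is a minimal sketch that parses an in-memory document and records the system identifier passed to the handler. The document, the entity name ch2, and the file name chapter2.xml are made up for illustration; the file is never opened.

```php
<?php
// Hypothetical document: "ch2" and "chapter2.xml" are illustration only,
// and chapter2.xml does not need to exist because we never open it.
$xml = '<?xml version="1.0"?>'
     . '<!DOCTYPE book [ <!ENTITY ch2 SYSTEM "chapter2.xml"> ]>'
     . '<book>&ch2;</book>';

$seen = array();   // system identifiers reported to the handler

$parser = xml_parser_create();
xml_set_external_entity_ref_handler(
    $parser,
    function ($parser, $open_entity_names, $base, $system_id, $public_id) use (&$seen) {
        // A real handler would open and parse $system_id here.
        $seen[] = $system_id;
        return true;   // tell the parser the entity was handled
    }
);
xml_parse($parser, $xml, true);
xml_parser_free($parser);

print_r($seen);
```

In the recipe itself, the body of the handler is where xml_conn() is called to open the file and start a second parser.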

Those are the two main functions involved in the code in the solution. All the rest of the code is the standard start element and end element handlers, as well as a few extra-fancy handlers, such as the XML default handler. The XML default handler is similar to the AUTOLOAD subroutine in Perl; it receives every part of the document for which no other handler has been registered.

I have left two functions for last because they are not important to the main idea of external entity references. Those functions are the pi_handler() function and the lost_data() function.

The pi_handler() function receives every processing instruction (in the format <?target data?> ) and decides what to do with it based on the target ( $type ). In the solution example, if we are given a PHP processing instruction ( <?php ?> ), we evaluate the code with eval() and call it a night. Because eval() runs whatever the document contains, do this only with documents you trust. Processing instructions with any other target are silently ignored: once a processing instruction handler is registered, they are no longer passed to the lost_data() function.
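If evaluating code from a document is more than you need, the same dispatch-on-target idea can be sketched without eval(). The targets greet and note below are hypothetical and do not appear in the recipe's XML files; the handler simply maps each known target to a fixed action.

```php
<?php
// Hypothetical targets "greet" and "note", used for illustration only.
$xml = '<doc><?greet World?><?note ignored?></doc>';

$output = array();   // what the handler decided to do, in document order

$parser = xml_parser_create();
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$output) {
        $target = strtolower($target);
        switch ($target) {
            case 'greet':
                // Treat the PI data as plain text, never as code.
                $output[] = 'Hello, ' . htmlspecialchars($data) . '!';
                break;
            default:
                $output[] = "Ignored PI target: $target";
        }
    }
);
xml_parse($parser, $xml, true);
xml_parser_free($parser);

print implode("\n", $output) . "\n";
```

The trade-off is flexibility: a fixed list of targets cannot run arbitrary logic from the document, which is exactly why it is safer with files you do not control.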

The lost_data() function is like a garbage truck: it takes all the data that no other handler claims. For example, things such as the XML declaration, the DOCTYPE declaration, and comments go straight to the lost_data() function. From there, we can manipulate the extra bits and pieces.
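As a rough illustration of what ends up in the garbage truck, the following sketch registers element and character data handlers plus a default handler, then collects whatever the default handler receives. The sample document and the $leftovers variable are assumptions made for this example.

```php
<?php
// Hypothetical sample document containing a comment that no
// dedicated handler will claim.
$xml = '<doc><!-- a comment --><item>text</item></doc>';

$leftovers = array();   // everything passed to the default handler

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) { /* start tag: nothing to do */ },
    function ($parser, $name) { /* end tag: nothing to do */ }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) { /* text: nothing to do */ }
);
xml_set_default_handler(
    $parser,
    function ($parser, $data) use (&$leftovers) {
        $leftovers[] = $data;
    }
);
xml_parse($parser, $xml, true);
xml_parser_free($parser);

print_r($leftovers);
```

The element tags and the text are consumed by their own handlers, so the comment is the kind of item we would expect to find in $leftovers.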



PHP Developer's Cookbook (2nd Edition)
ISBN: 0672323257
Year: 2000
Pages: 351
