Let's move on to a more focused discussion of the various event handlers you can register with the parser. PHP includes handlers for elements and attributes, character data, processing instructions, external entities, and notations. Each of these is discussed in detail in the following sections.

Handling Elements

The xml_set_element_handler() function is used to identify the functions that handle elements encountered by the XML parser as it progresses through a document. This function accepts three arguments: the handle for the XML parser, the name of the function to call when it finds an opening tag, and the name of the function to call when it finds a closing tag, respectively. Here's an example:

    xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");

In this case, I've told the parser to call the function startElementHandler() when it finds an opening tag and the function endElementHandler() when it finds a closing tag. These handler functions must be set up to accept certain basic information about the element generating the event. When PHP calls the start tag handler, it passes it the following three arguments: a handle for the XML parser, the name of the element, and an associative array containing the element's attributes (if any).
Because closing tags do not contain attributes, the end tag handler is passed only two arguments: a handle for the XML parser and the name of the element.
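As a minimal sketch of these two signatures (the sample XML string and the variable names $events, $onStart, and $onEnd are hypothetical, chosen only for illustration), the following records every start and end event. Note that element and attribute names arrive in uppercase, because the parser's case folding option is on by default.

```php
<?php
// Hypothetical sample data and variable names, for illustration only.
$events = [];

// Start tag handler: receives the parser, the element name, and its attributes.
$onStart = function ($parser, $name, $attributes) use (&$events) {
    $events[] = ["start", $name, $attributes];
};

// End tag handler: receives only the parser and the element name.
$onEnd = function ($parser, $name) use (&$events) {
    $events[] = ["end", $name];
};

$parser = xml_parser_create();
xml_set_element_handler($parser, $onStart, $onEnd);

// Parse a small in-memory document in a single call (third argument: final chunk).
if (!xml_parse($parser, '<note priority="high"><to>Hilda</to></note>', true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($parser)));
}

print_r($events);
```

Closures are used here instead of named functions only to keep the sketch self-contained; passing function names as strings, as in the rest of this section, works the same way.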
To demonstrate this, consider Listing 2.4, a simple XML document.

Listing 2.4 Letter Marked Up with XML (letter.xml)

    <?xml version="1.0"?>
    <letter>
      <date>10 January 2001</date>
      <salutation>
        <para>
        Dear Aunt Hilda,
        </para>
      </salutation>
      <body>
        <para>
        Just writing to thank you for the wonderful train set you sent me for Christmas. I like it very much, and Sarah and I have both enjoyed playing with it over the long holidays.
        </para>
        <para>
        It has been a while since you visited us. How have you been? How are the dogs, and has the cat stopped playing with your knitting yet? We were hoping to come by for a short visit on New Year's Eve, but Sarah wasn't feeling well. However, I hope to see you next month when I will be home from school for the holidays.
        </para>
      </body>
      <conclusion>
        <para>Hugs and kisses -- Your nephew, Tom</para>
      </conclusion>
    </letter>

Listing 2.5 uses element handlers to create an indented list mirroring the hierarchical structure of the XML document in Listing 2.4.

Listing 2.5 Representing an XML Document as a Hierarchical List

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // run when start tag is found
    function startElementHandler($parser, $name, $attributes)
    {
        echo "<ul><li>$name</li>";
    }

    // run when end tag is found
    function endElementHandler($parser, $name)
    {
        echo "</ul>";
    }

    // XML data file
    $xml_file = "letter.xml";

    // initialize parser
    $xml_parser = xml_parser_create();

    // set element handler
    xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");

    // read XML file
    if (!($fp = fopen($xml_file, "r")))
    {
        die("File I/O error: $xml_file");
    }

    // parse XML
    while ($data = fread($fp, 4096))
    {
        // error handler
        if (!xml_parse($xml_parser, $data, feof($fp)))
        {
            die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
        }
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>

Each time the parser finds an opening tag, it creates an unordered list and adds the tag name as the first item in that list; each time it finds an ending tag, it closes the list. The result is a hierarchical representation of the XML document's structure.

Handling Character Data

The xml_set_character_data_handler() function registers event handlers for character data. It accepts two arguments: the handle for the XML parser and the name of the function to call when it finds character data. For example:

    xml_set_character_data_handler($xml_parser, "characterDataHandler");

This tells the SAX parser to use the function named characterDataHandler() to process character data. When PHP calls this function, it automatically passes it the following two arguments: a handle for the XML parser and the character data found.
Listing 2.6 demonstrates how this could be used.

Listing 2.6 Stripping Out Tags from an XML Document

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // cdata handler
    function characterDataHandler($parser, $data)
    {
        echo $data;
    }

    // XML data
    $xml_data = <<<EOF
    <?xml version="1.0"?>
    <grammar>
    <noun type="proper">Mary</noun> <verb tense="past">had</verb> a <adjective>little</adjective> <noun type="common">lamb.</noun>
    </grammar>
    EOF;

    // initialize parser
    $xml_parser = xml_parser_create();

    // set cdata handler
    xml_set_character_data_handler($xml_parser, "characterDataHandler");

    if (!xml_parse($xml_parser, $xml_data))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>

In this case, the characterDataHandler() function works in much the same manner as PHP's built-in strip_tags() function: it scans through the XML and prints only the character data encountered. Because I haven't registered any element handlers, any tags found during this process are ignored.

You'll also notice that this example differs from the ones you've seen thus far, in that the XML data doesn't come from an external file, but has been defined via a variable in the script itself using "here document" syntax.
Note that the character data handler is also invoked on CDATA blocks; Listing 2.7 is a variant of Listing 2.6 that demonstrates this.

Listing 2.7 Parsing CDATA Blocks

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // cdata handler
    function characterDataHandler($parser, $data)
    {
        echo $data;
    }

    // XML data
    $xml_string = <<<EOF
    <?xml version="1.0"?>
    <message>
      <from>Agent 5292</from>
      <to>Covert-Ops HQ</to>
      <encoded_message>
      <![CDATA[
      563247 !#9292 73%639 1^2736 @@6473 634292 930049 292 *7623&& 62367&
      ]]>
      </encoded_message>
    </message>
    EOF;

    // initialize parser
    $xml_parser = xml_parser_create();

    // set cdata handler
    xml_set_character_data_handler($xml_parser, "characterDataHandler");

    if (!xml_parse($xml_parser, $xml_string))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>
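One practical detail worth keeping in mind: the parser is free to deliver the text of a single element in several pieces, so the character data handler may be called more than once per element. A common approach is to accumulate the pieces in a buffer and act on them when the closing tag arrives. The sketch below assumes a hypothetical sample document and hypothetical variable names ($buffer, $texts); it is one way to do this, not the only one.

```php
<?php
// Hypothetical sample data and variable names, for illustration only.
$buffer = "";   // text collected since the most recent tag
$texts  = [];   // element name => collected text

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Opening tag: start with an empty buffer.
    function ($parser, $name, $attributes) use (&$buffer) {
        $buffer = "";
    },
    // Closing tag: store whatever text was collected, then reset.
    function ($parser, $name) use (&$buffer, &$texts) {
        $text = trim($buffer);
        if ($text !== "") {
            $texts[$name] = $text;
        }
        $buffer = "";
    }
);

// Character data (including CDATA content) may arrive in several calls,
// so append rather than overwrite.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer) {
    $buffer .= $data;
});

$xml = '<message><from>Agent 5292</from>'
     . '<code><![CDATA[1^2736 <&> 62367&]]></code></message>';

if (!xml_parse($parser, $xml, true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($parser)));
}

print_r($texts);
```

The CDATA content reaches the handler as plain text, markup characters included, which is why the stored value for the code element still contains the raw angle brackets and ampersands.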
Table 2.1. A Comparison of Parser Behavior in CDATA Sections Containing Entity References
Handling Processing Instructions

You can set up a handler for PIs with xml_set_processing_instruction_handler(), which operates just like the character data handler above. This snippet designates the function PIHandler() as the handler for all PIs found in the document:

    xml_set_processing_instruction_handler($xml_parser, "PIHandler");

The designated handler must accept three arguments: a handle for the XML parser, the PI target, and the PI data (the instruction itself).
Listing 2.8 demonstrates how it works in practice. When the parser encounters the PHP code within the document, it calls the PI handler, which executes the code as a PHP statement and displays the result.

Listing 2.8 Executing PIs within an XML Document

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // cdata handler
    function characterDataHandler($parser, $data)
    {
        echo $data . "<p>";
    }

    // PI handler
    function PIHandler($parser, $target, $data)
    {
        // if php code, execute it
        if (strtolower($target) == "php")
        {
            eval($data);
        }
        // otherwise just print it
        else
        {
            echo "PI found: [$target] $data";
        }
    }

    // XML data
    $xml_data = <<<EOF
    <?xml version="1.0"?>
    <article>
      <header>insert slug here</header>
      <body>insert body here</body>
      <footer><?php print "Copyright UNoHoo Inc, " . date("Y"); ?></footer>
    </article>
    EOF;

    // initialize parser
    $xml_parser = xml_parser_create();

    // set cdata handler
    xml_set_character_data_handler($xml_parser, "characterDataHandler");

    // set PI handler
    xml_set_processing_instruction_handler($xml_parser, "PIHandler");

    if (!xml_parse($xml_parser, $xml_data))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>

Listing 2.8 designates the function PIHandler() as the handler to be called for all PIs encountered within the document. As explained previously, this function is passed the PI target and instruction as function arguments. When a PI is located within the document, PIHandler() first checks the PI target ($target) to see if it is a PHP instruction. If it is, eval() is called to evaluate and execute the PHP code ($data) within the PI. If the target is any other application, PHP obviously cannot execute the instructions, and therefore resorts to merely displaying the PI to the user.
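Because eval() runs whatever code the document supplies, it is only appropriate for XML you fully trust. As a hedged alternative, the sketch below simply records each PI's target and data so that the script can decide later what, if anything, to do with them. The sample document and the variable name $seen are hypothetical.

```php
<?php
// Hypothetical sample data and variable names, for illustration only.
$seen = [];

// PI handler: receives the parser, the PI target, and the PI data.
// Nothing is executed here; the PI is only recorded.
$piHandler = function ($parser, $target, $data) use (&$seen) {
    $seen[] = [strtolower($target), trim($data)];
};

$parser = xml_parser_create();
xml_set_processing_instruction_handler($parser, $piHandler);

$xml = '<doc><?php echo 1; ?><?xml-stylesheet href="a.css"?></doc>';

if (!xml_parse($parser, $xml, true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($parser)));
}

foreach ($seen as $pi) {
    echo "PI found: [" . $pi[0] . "] " . $pi[1] . "\n";
}
```

From here, a script could map a short list of known targets to specific, pre-written functions rather than evaluating arbitrary code.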
Handling External Entities

You already know that an entity provides a simple way to reuse frequently repeated text segments within an XML document. Most often, entities are defined and referenced within the same document. However, sometimes a need arises to separate entities that are common across multiple documents into a single external file. These entities, which are defined in one file and referenced in others, are known as external entities.

If a document contains references to external entities, PHP offers xml_set_external_entity_ref_handler(), which specifies how these entities are to be handled. This snippet designates the function externalEntityHandler() as the handler for all external entities found in the document:

    xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");

The handler designated by xml_set_external_entity_ref_handler() must be set up to accept the following five arguments: a handle for the XML parser, the name of the entity, the base for resolving the system identifier (currently always an empty string), the system identifier for the entity, and the public identifier for the entity.
To illustrate this, consider the following XML document (see Listing 2.9), which contains an external entity reference (see Listing 2.10).

Listing 2.9 XML Document Referencing an External Entity (mission.xml)

    <?xml version="1.0"?>
    <!DOCTYPE mission
    [
    <!ENTITY warning SYSTEM "warning.txt">
    ]>
    <mission>
      <objective>Find the nearest Starbucks</objective>
      <goal>Bring back two lattes, one espresso and one black coffee</goal>
      <priority>Critical</priority>
      <w>&warning;</w>
    </mission>
Listing 2.10 Referenced External Entity (warning.txt)

    This document will self-destruct in thirty seconds.

Listing 2.11 is a sample script that demonstrates how the entity resolver works.

Listing 2.11 Resolving External Entities

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // external entity handler
    function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
    {
        // read referenced file
        if (!readfile($systemId))
        {
            die("File I/O error: $systemId");
        }
        else
        {
            return true;
        }
    }

    // cdata handler
    function characterDataHandler($parser, $data)
    {
        echo $data . "<p>";
    }

    // XML data file
    $xml_file = "mission.xml";

    // initialize parser
    $xml_parser = xml_parser_create();

    // set cdata handler
    xml_set_character_data_handler($xml_parser, "characterDataHandler");

    // set external entity handler
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");

    // read XML file
    if (!($fp = fopen($xml_file, "r")))
    {
        die("File I/O error: $xml_file");
    }

    // parse XML
    while ($data = fread($fp, 4096))
    {
        // error handler
        if (!xml_parse($xml_parser, $data, feof($fp)))
        {
            die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
        }
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>

When this script runs, the external entity handler finds and resolves the entity reference, and includes it in the main document. In this case, the external entity is merely included, not parsed or processed in any way; however, if you want to see an example in which the external entity is itself an XML document that needs to be parsed further, take a look at Listing 2.23 in the "A Composite Example" section.

Handling Notations and Unparsed Entities

You already know that notations and unparsed entities go together, and PHP allows you to handle them, too, via its xml_set_notation_decl_handler() and xml_set_unparsed_entity_decl_handler() functions.
(If you don't know what notations and unparsed entities are, drop by Chapter 1, "XML and PHP Basics," and find out what you missed.) Like all the other handlers discussed thus far, both these functions designate handlers to be called when the parser encounters either a notation declaration or an unparsed entity. The following snippet designates the functions unparsedEntityHandler() and notationHandler() as the handlers for unparsed entities and notations found in the document:

    xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");
    xml_set_notation_decl_handler($xml_parser, "notationHandler");

The handler designated by xml_set_notation_decl_handler() must be capable of accepting the following five arguments: a handle for the XML parser, the name of the notation, the base for resolving the system identifier, the system identifier for the notation, and the public identifier for the notation.
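Similarly, the handler designated by xml_set_unparsed_entity_decl_handler() must be capable of accepting the following six arguments: a handle for the XML parser, the name of the unparsed entity, the base for resolving the system identifier, the system identifier for the entity, the public identifier for the entity, and the name of the notation associated with the entity.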
To understand how these handlers work in practice, consider Listing 2.12, which sets up two unparsed entities representing directories on the system and a notation that tells the system what to do with them (run a script that calculates the disk space they're using, and mail the results to the administrator).

Listing 2.12 XML Document Containing Unparsed Entities and Notations (list.xml)

    <?xml version="1.0"?>
    <!DOCTYPE list
    [
    <!ELEMENT list (#PCDATA | dir)*>
    <!ELEMENT dir EMPTY>
    <!ATTLIST dir name ENTITY #REQUIRED>
    <!NOTATION directory SYSTEM "/usr/local/bin/usage.pl">
    <!ENTITY config SYSTEM "/etc" NDATA directory>
    <!ENTITY temp SYSTEM "/tmp" NDATA directory>
    ]>
    <list>
      <dir name="config" />
      <dir name="temp" />
    </list>

Listing 2.13 is the PHP script that parses the XML document.

Listing 2.13 Handling Unparsed Entities

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // cdata handler
    function characterDataHandler($parser, $data)
    {
        echo $data . "<p>";
    }

    // unparsed entity handler
    function unparsedEntityHandler($parser, $entity, $base, $systemId, $publicId, $notation)
    {
        global $notationsArray;
        if ($systemId)
        {
            exec("$notationsArray[$notation] $systemId");
        }
    }

    // notation handler
    function notationHandler($parser, $notation, $base, $systemId, $publicId)
    {
        global $notationsArray;
        if ($systemId)
        {
            $notationsArray[$notation] = $systemId;
        }
    }

    // XML data file
    $xml_file = "list.xml";

    // initialize array to hold notation declarations
    $notationsArray = array();

    // initialize parser
    $xml_parser = xml_parser_create();

    // set cdata handler
    xml_set_character_data_handler($xml_parser, "characterDataHandler");

    // set entity and notation handlers
    xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");
    xml_set_notation_decl_handler($xml_parser, "notationHandler");

    // read XML file
    if (!($fp = fopen($xml_file, "r")))
    {
        die("File I/O error: $xml_file");
    }

    // parse XML
    while ($data = fread($fp, 4096))
    {
        // error handler
        if (!xml_parse($xml_parser, $data, feof($fp)))
        {
            die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
        }
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>

This is a little different from the scripts you've seen so far, so an explanation is in order. The notationHandler() function, called whenever the parser encounters a notation declaration, simply adds the notation and its associated system identifier to a global associative array, $notationsArray. Now, whenever an unparsed entity is encountered, the unparsedEntityHandler() function matches the notation name within the entity declaration to the keys of the associative array, and launches the appropriate script with the entity as parameter.

Obviously, how you use these two handlers depends a great deal on how your notation declarations and unparsed entities are set up. In this case, I use the notation to specify the location of the application and the entity handler to launch the application whenever required. You also can use these handlers to display binary data within the page itself (assuming that your target environment is a browser), to process it further, or to ignore it altogether.
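Since the command passed to exec() in Listing 2.13 is assembled from values found in the document, it may be worth escaping them first. The following is a minimal sketch of that idea, not part of the original listing: buildCommand() is a hypothetical helper, and it relies on PHP's escapeshellcmd() and escapeshellarg() functions to quote the program path and its argument.

```php
<?php
// Hypothetical helper, for illustration only: builds the shell command for
// an unparsed entity from the notations collected by the notation handler.
function buildCommand(array $notations, $notation, $systemId)
{
    // Unknown notation: there is no program to run.
    if (!isset($notations[$notation])) {
        return null;
    }

    // Escape both the program path and its argument before they reach the shell.
    return escapeshellcmd($notations[$notation]) . " " . escapeshellarg($systemId);
}

// Sample values mirroring the declarations in Listing 2.12.
$notationsArray = array("directory" => "/usr/local/bin/usage.pl");

$command = buildCommand($notationsArray, "directory", "/tmp");
echo $command . "\n";

// A notation that was never declared yields no command at all.
var_dump(buildCommand($notationsArray, "spreadsheet", "/tmp"));
```

The unparsed entity handler could then call exec() only when buildCommand() returns a string. The exact quoting applied to the argument depends on the platform.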
Handling Everything Else

Finally, PHP also offers the xml_set_default_handler() function for all those situations not covered by the preceding handlers. In the event that no other handlers are defined for the document, all events generated will be trapped and resolved by this handler. This snippet designates the function defaultHandler() as the default handler for the document:

    xml_set_default_handler($xml_parser, "defaultHandler");

The function designated by xml_set_default_handler() must be set up to accept the following two arguments: a handle for the XML parser and the data encountered.
In Listing 2.14, every event generated by the parser is passed to the default handler (because no other handlers are defined), which simply prints the data received. The final output? An exact mirror of the input!

Listing 2.14 Demonstrating the Default Handler

    <html>
    <head>
    <basefont face="Arial">
    </head>
    <body>
    <?php
    // default handler
    function defaultHandler($parser, $data)
    {
        echo "<pre>" . htmlspecialchars($data) . "</pre>";
    }

    // XML data
    $xml_data = <<<EOF
    <?xml version="1.0"?>
    <element>carbon <!-- did you know that diamond is a form of carbon? -Ed -->
    </element>
    EOF;

    // initialize parser
    $xml_parser = xml_parser_create();

    // set default handler
    xml_set_default_handler($xml_parser, "defaultHandler");

    if (!xml_parse($xml_parser, $xml_data))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }

    // all done, clean up!
    xml_parser_free($xml_parser);
    ?>
    </body>
    </html>
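The default handler is also useful alongside other handlers, where it picks up whatever they leave behind. In the sketch below (sample data and the variable names $elements and $leftovers are hypothetical), an element handler is registered, so tags go there, while text and comments, which have no handler of their own, fall through to the default handler. The exact strings passed to the default handler can vary with the XML library PHP was built against, so treat the printed output as indicative only.

```php
<?php
// Hypothetical sample data and variable names, for illustration only.
$elements  = [];   // names seen by the element handler
$leftovers = "";   // everything that fell through to the default handler

$parser = xml_parser_create();

// Element events are claimed by this handler pair.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$elements) {
        $elements[] = $name;
    },
    function ($parser, $name) {
        // nothing to do for closing tags in this sketch
    }
);

// No character data handler is registered, so text and comments land here.
xml_set_default_handler($parser, function ($parser, $data) use (&$leftovers) {
    $leftovers .= $data;
});

if (!xml_parse($parser, '<element>carbon<!-- note --></element>', true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($parser)));
}

print_r($elements);
echo htmlspecialchars($leftovers) . "\n";
```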