Handling SAX Events


Let's move on to a more focused discussion of the various event handlers you can register with the parser.

PHP includes handlers for elements and attributes, character data, processing instructions, external entities, and notations. Each of these is discussed in detail in the following sections.

Handling Elements

The xml_set_element_handler() function is used to identify the functions that handle elements encountered by the XML parser as it progresses through a document. This function accepts three arguments: the handle for the XML parser, the name of the function to call when it finds an opening tag, and the name of the function to call when it finds a closing tag, respectively.

Here's an example:

 xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler"); 

In this case, I've told the parser to call the function startElementHandler() when it finds an opening tag and the function endElementHandler() when it finds a closing tag.

These handler functions must be set up to accept certain basic information about the element generating the event.

When PHP calls the start tag handler, it passes it the following three arguments:

  • A handle representing the XML parser

  • The name of the element

  • A list of the element's attributes (as an associative array)

Because closing tags do not contain attributes, the end tag handler is only passed two arguments:

  • A handle representing the XML parser

  • The element name
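
The two argument lists above can be sketched as a pair of handlers that simply record what they are given. This is a minimal illustration, not part of the book's listings: the $events array and the one-line sample document are assumptions made for the example. It also turns off case folding with xml_parser_set_option(), because the parser otherwise reports element and attribute names in uppercase.

```php
<?php
// Collected event descriptions; a hypothetical global used only for this sketch.
$events = array();

// Start tag handler: receives the parser, the element name, and the attributes.
function startElementHandler($parser, $name, $attributes)
{
    global $events;
    $pairs = array();
    foreach ($attributes as $key => $value) {
        $pairs[] = "$key=$value";
    }
    $events[] = "start $name [" . implode(", ", $pairs) . "]";
}

// End tag handler: receives only the parser and the element name.
function endElementHandler($parser, $name)
{
    global $events;
    $events[] = "end $name";
}

$xml_parser = xml_parser_create();

// Case folding is on by default and would report NOUN and TYPE;
// switching it off keeps names as they are written in the document.
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");

if (!xml_parse($xml_parser, '<noun type="proper">Mary</noun>', true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

echo implode("\n", $events), "\n";
```

Recording events in an array rather than printing them directly makes it easy to inspect exactly what each handler received.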

In order to demonstrate this, consider Listing 2.4, a simple XML document.

Listing 2.4 Letter Marked Up with XML ( letter.xml )
<?xml version="1.0"?>
<letter>
<date>10 January 2001</date>
<salutation>
      <para>
      Dear Aunt Hilda,
      </para>
</salutation>
<body>
      <para>
      Just writing to thank you for the wonderful train set you sent me for
      Christmas. I like it very much, and Sarah and I have both enjoyed playing
      with it over the long holidays.
      </para>
      <para>
      It has been a while since you visited us. How have you been? How are the
      dogs, and has the cat stopped playing with your knitting yet? We were hoping
      to come by for a short visit on New Year's Eve, but Sarah wasn't feeling
      well. However, I hope to see you next month when I will be home from school
      for the holidays.
      </para>
</body>
<conclusion>
      <para>Hugs and kisses -- Your nephew, Tom</para>
</conclusion>
</letter>

Listing 2.5 uses element handlers to create an indented list mirroring the hierarchical structure of the XML document in Listing 2.4.

Listing 2.5 Representing an XML Document as a Hierarchical List
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// run when start tag is found
function startElementHandler($parser, $name, $attributes)
{
    echo "<ul><li>$name</li>";
}

// run when end tag is found
function endElementHandler($parser, $name)
{
    echo "</ul>";
}

// XML data file
$xml_file = "letter.xml";

// initialize parser
$xml_parser = xml_parser_create();

// set element handler
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");

// read XML file
if (!($fp = fopen($xml_file, "r")))
{
    die("File I/O error: $xml_file");
}

// parse XML
while ($data = fread($fp, 4096))
{
    // error handler
    if (!xml_parse($xml_parser, $data, feof($fp)))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>

Each time the parser finds an opening tag, it creates an unordered list and adds the tag name as the first item in that list; each time it finds an ending tag, it closes the list. The result is a hierarchical representation of the XML document's structure.
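
The same pair of handlers can build a plain-text outline instead of nested HTML lists. The sketch below is a variation on the idea, not the book's code: it assumes a $depth counter and an $outline string (both hypothetical names) and a shortened inline version of the letter. Element names appear in uppercase because case folding is left at its default.

```php
<?php
$depth = 0;      // current nesting level
$outline = "";   // accumulated text outline

// Opening tag: print the name at the current indent, then go one level deeper.
function startElementHandler($parser, $name, $attributes)
{
    global $depth, $outline;
    $outline .= str_repeat("  ", $depth) . $name . "\n";
    $depth++;
}

// Closing tag: come back up one level.
function endElementHandler($parser, $name)
{
    global $depth;
    $depth--;
}

// Shortened stand-in for letter.xml
$xml_data = "<letter><date>10 January 2001</date>"
          . "<body><para>Dear Aunt Hilda,</para></body></letter>";

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");

if (!xml_parse($xml_parser, $xml_data, true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

echo $outline;
```

Because SAX handlers see one event at a time, any notion of "where am I in the tree" has to be tracked by the script itself; a counter is enough for indentation, while a stack of names would be needed to know the full path.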

Handling Character Data

The xml_set_character_data_handler() function registers an event handler for character data. It accepts two arguments: the handle for the XML parser and the name of the function to call when the parser finds character data.

For example:

 xml_set_character_data_handler($xml_parser, "characterDataHandler"); 

This tells the SAX parser to use the function named characterDataHandler() to process character data.

When PHP calls this function, it automatically passes it the following two arguments:

  • A handle representing the XML parser

  • The character data found

Listing 2.6 demonstrates how this could be used.

Listing 2.6 Stripping Out Tags from an XML Document
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// cdata handler
function characterDataHandler($parser, $data)
{
    echo $data;
}

// XML data
$xml_data = <<<EOF
<?xml version="1.0"?>
<grammar>
    <noun type="proper">Mary</noun> <verb tense="past">had</verb> a
<adjective>little</adjective> <noun type="common">lamb.</noun>
</grammar>
EOF;

// initialize parser
$xml_parser = xml_parser_create();

// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");

if (!xml_parse($xml_parser, $xml_data))
{
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>

In this case, the characterDataHandler() function works in much the same manner as PHP's built-in strip_tags() function: it scans through the XML and prints only the character data encountered. Because I haven't registered any element handlers, any tags found during this process are ignored.

You'll notice also that this example differs from the ones you've seen thus far, in that the XML data doesn't come from an external file, but has been defined via a variable in the script itself using "here document" syntax.

Here, Boy!

"Here-document" syntax provides a convenient way to create PHP strings that span multiple lines, or strings that retain their internal formatting (including tabs and line breaks).

Consider the following example:

<?php
$str = <<<MARKER
This is
      a multi
line
                  string
MARKER;
?>

The <<< symbol indicates to PHP that what comes next is a multiline block, and should be stored "as is," right up to the specified marker. This marker must begin with an alphabetic or underscore character, can contain only alphanumeric and underscore characters, and when indicating the end of the block, must be flush with the left-hand margin of your code.
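
One detail worth seeing in action is that a here-document behaves like a double-quoted string, so variables inside it are replaced with their values while the line breaks and indentation are preserved. The following is a small sketch; the $name variable and the markup are made up for the example.

```php
<?php
$name = "Aunt Hilda";   // hypothetical value for this sketch

// Variables are interpolated; line breaks and indentation are kept as written.
$greeting = <<<EOF
<salutation>
    <para>Dear $name,</para>
</salutation>
EOF;

echo $greeting;
```

This is also why XML data placed in a here-document should not contain stray dollar signs: PHP would try to treat what follows as a variable name.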

It should be noted that the character data handler is also invoked on CDATA blocks; Listing 2.7 is a variant of Listing 2.6 that demonstrates this.

Listing 2.7 Parsing CDATA Blocks
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// cdata handler
function characterDataHandler($parser, $data)
{
    echo $data;
}

// XML data
$xml_string = <<<EOF
<?xml version="1.0"?>
<message>
    <from>Agent 5292</from>
    <to>Covert-Ops HQ</to>
    <encoded_message>
    <![CDATA[
    563247 !#9292 73%639 1^2736 @@6473 634292 930049 292 *7623&& 62367&
    ]]>
    </encoded_message>
</message>
EOF;

// initialize parser
$xml_parser = xml_parser_create();

// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");

if (!xml_parse($xml_parser, $xml_string))
{
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>

When Less Work Is More

There's an important caveat you should note when dealing with character data via PHP's SAX parser. If a character data section contains entity references, then PHP will not replace the entity reference with its actual value first and then call the handler. Rather, it will split the character data into segments around the reference and operate on each segment separately.

What does this mean? Well, here's the sequence of events:

  1. PHP first calls the handler for the character data segment before the entity reference.

  2. It then replaces the reference with its value, and calls the handler again.

  3. Finally, it calls the handler a third time for the segment following the entity reference.

Table 2.1 might help to make this clearer. The first column uses a basic XML document without entities; the second column uses a document containing an entity reference within the data block. Both examples use the same character data handler; however, as the output shows, the first example calls the handler once, whereas the second calls the handler three times.

Table 2.1. A Comparison of Parser Behavior in CDATA Sections Containing Entity References

XML Document without Entity References

The document:

<?xml version="1.0"?>
<message>Welcome to GenericCorp. We're just like everyone else. </message>

The handler:

<?php
// cdata handler
function characterDataHandler($parser, $data)
{
    echo " handler in " . $data . " handler out ";
}
?>

The output:

handler in Welcome to GenericCorp. We're just like everyone else. handler out

XML Document with Entity References

The document:

<?xml version="1.0"?>
<!DOCTYPE message
[
<!ENTITY company "GenericCorp">
]>
<message>Welcome to &company;. We're just like everyone else.</message>

The handler:

<?php
// cdata handler
function characterDataHandler($parser, $data)
{
    echo " handler in " . $data . " handler out ";
}
?>

The output:

handler in Welcome to handler out handler in GenericCorp handler out handler in . We're just like everyone else. handler out
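
Because one run of text can arrive in several pieces, a handler that needs the complete text of an element is usually better off collecting the pieces and using them only when the closing tag arrives. The sketch below shows one way to do that. It is an illustration under assumptions: the $buffer and $messages variables are hypothetical, and the sample document uses the predefined &amp; entity (rather than a declared one) so that no DTD is needed.

```php
<?php
$buffer = "";          // text collected for the element currently open
$messages = array();   // completed text, one entry per element

// A new element starts: begin with an empty buffer.
function startElementHandler($parser, $name, $attributes)
{
    global $buffer;
    $buffer = "";
}

// May be called several times for a single run of text, so only append.
function characterDataHandler($parser, $data)
{
    global $buffer;
    $buffer .= $data;
}

// The element is complete: the buffer now holds all of its text.
function endElementHandler($parser, $name)
{
    global $buffer, $messages;
    $messages[] = $buffer;
}

$xml_data = <<<EOF
<?xml version="1.0"?>
<message>Welcome to Smith &amp; Sons. We're just like everyone else.</message>
EOF;

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");

if (!xml_parse($xml_parser, $xml_data, true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

echo $messages[0], "\n";
```

This simple version assumes elements that contain only text; for elements that mix text and child elements, a stack of buffers would be a more suitable structure.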

Handling Processing Instructions

You can set up a handler for PIs with xml_set_processing_instruction_handler(), which is registered in the same way as the character data handler above.

This snippet designates the function PIHandler() as the handler for all PIs found in the document:

 xml_set_processing_instruction_handler($xml_parser, "PIHandler"); 

The designated handler must accept three arguments:

  • A handle representing the XML parser (you can see that this is standard for all event handlers)

  • The PI target (an identifier for the application that is to process the instruction)

  • The instruction itself

Listing 2.8 demonstrates how it works in practice. When the parser encounters the PHP code within the document, it calls the PI handler, which executes the code as a PHP statement and displays the result.

Listing 2.8 Executing PIs within an XML Document
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// cdata handler
function characterDataHandler($parser, $data)
{
    echo $data . "<p>";
}

// PI handler
function PIHandler($parser, $target, $data)
{
    // if php code, execute it
    if (strtolower($target) == "php")
    {
        eval($data);
    }
    // otherwise just print it
    else
    {
        echo "PI found: [$target] $data";
    }
}

// XML data
$xml_data = <<<EOF
<?xml version="1.0"?>
<article>
    <header>insert slug here</header>
    <body>insert body here</body>
    <footer><?php print "Copyright UNoHoo Inc," . date("Y"); ?></footer>
</article>
EOF;

// initialize parser
$xml_parser = xml_parser_create();

// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");

// set PI handler
xml_set_processing_instruction_handler($xml_parser, "PIHandler");

if (!xml_parse($xml_parser, $xml_data))
{
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>

Listing 2.8 designates the function PIHandler() as the handler to be called for all PIs encountered within the document. As explained previously, this function is passed the PI target and instruction as function arguments.

When a PI is located within the document, PIHandler() first checks the PI target ($target) to see if it is a PHP instruction. If it is, eval() is called to evaluate and execute the PHP code ($data) within the PI. If the target is any other application, PHP cannot execute the instructions, and therefore resorts to merely displaying the PI to the user.
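
Passing document content straight to eval() means that whoever writes the XML can run arbitrary PHP. Where that is a concern, one alternative is to let PIs name an action from a fixed list instead of carrying code. The sketch below swaps eval() for such a lookup table; it is not the approach used in Listing 2.8. The "app" target, the $allowed map, and the instruction names "upper" and "year" are all invented for the illustration.

```php
<?php
$output = array();   // results collected for this sketch

// Hypothetical map of instruction names to the functions a document may trigger.
$allowed = array(
    "year"  => function ($arg) { return date("Y"); },
    "upper" => function ($arg) { return strtoupper($arg); },
);

// PI handler: receives the parser, the PI target, and the instruction itself.
function PIHandler($parser, $target, $data)
{
    global $output, $allowed;

    // Split "name argument text" into the instruction name and its argument.
    $parts = explode(" ", trim($data), 2);
    $name = $parts[0];
    $arg = isset($parts[1]) ? $parts[1] : "";

    if (strtolower($target) == "app" && isset($allowed[$name])) {
        // known instruction for our own target: run the mapped function
        $output[] = $allowed[$name]($arg);
    } else {
        // anything else is only reported, never executed
        $output[] = "PI found: [$target] $data";
    }
}

$xml_data = <<<EOF
<?xml version="1.0"?>
<footer><?app year?><?app upper unohoo inc?><?other do something?></footer>
EOF;

$xml_parser = xml_parser_create();
xml_set_processing_instruction_handler($xml_parser, "PIHandler");

if (!xml_parse($xml_parser, $xml_data, true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

echo implode("\n", $output), "\n";
```

The trade-off is flexibility: the document can only ask for what the script has chosen to offer, which is usually the point.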

Careful eval() -uation

You may not know this (I didn't), but PHP, which is usually pretty rigid about ending every statement with a semicolon, allows you to omit the semicolon from the statement immediately preceding a closing PHP tag. For example, this is perfectly valid PHP code:

 <?php print "Copyright UNoHoo Inc," . date("Y") ?> 

However, if you were to place this code in a PI and pass it to eval(), as in Listing 2.8, eval() would generate an error. This is because eval() requires every PHP statement passed to it to end with a semicolon.

Handling External Entities

You already know that an entity provides a simple way to reuse frequently repeated text segments within an XML document. Most often, entities are defined and referenced within the same document. However, sometimes a need arises to separate entities that are common across multiple documents into a single external file. These entities, which are defined in one file and referenced in others, are known as external entities.

If a document contains references to external entities, PHP offers xml_set_external_entity_ref_handler(), which specifies how these entities are to be handled.

This snippet designates the function externalEntityHandler() as the handler for all external entities found in the document:

 xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler"); 

The handler designated by xml_set_external_entity_ref_handler() must be set up to accept the following five arguments:

  • A handle representing the XML parser

  • The entity name

  • The base URI for the SYSTEM identifier (PHP currently sets this to an empty string)

  • The SYSTEM identifier itself (if available)

  • The PUBLIC identifier (if available)

In order to illustrate this, consider the XML document in Listing 2.9, which references the external entity shown in Listing 2.10.

Listing 2.9 XML Document Referencing an External Entity ( mission.xml )
<?xml version="1.0"?>
<!DOCTYPE mission
[
<!ENTITY warning SYSTEM "warning.txt">
]>
<mission>
    <objective>Find the nearest Starbucks</objective>
    <goal>Bring back two lattes, one espresso and one black coffee</goal>
    <priority>Critical</priority>
    <w>&warning;</w>
</mission>

True to You

The handler for external entities must explicitly return true if its actions are successful. If the handler returns false (or returns nothing at all, which works out to the same thing), the parser exits with error code 21 (see the "Handling Errors" section for more information on error codes).

Listing 2.10 Referenced External Entity ( warning.txt )
 This document will self-destruct in thirty seconds. 

Listing 2.11 is a sample script that demonstrates how the entity resolver works.

Listing 2.11 Resolving External Entities
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// external entity handler
function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
    // read referenced file
    if (!readfile($systemId))
    {
        die("File I/O error: $systemId");
    }
    else
    {
        return true;
    }
}

// cdata handler
function characterDataHandler($parser, $data)
{
    echo $data . "<p>";
}

// XML data file
$xml_file = "mission.xml";

// initialize parser
$xml_parser = xml_parser_create();

// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");

// set external entity handler
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");

// read XML file
if (!($fp = fopen($xml_file, "r")))
{
    die("File I/O error: $xml_file");
}

// parse XML
while ($data = fread($fp, 4096))
{
    // error handler
    if (!xml_parse($xml_parser, $data, feof($fp)))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>

When this script runs, the external entity handler finds and resolves the entity reference, and includes it in the main document. In this case, the external entity is merely included, not parsed or processed in any way; however, if you want to see an example in which the external entity is itself an XML document that needs to be parsed further, take a look at Listing 2.23 in the "A Composite Example" section.
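
Since the SYSTEM identifier comes from the document, a handler that opens whatever it names will read any file the script can reach. The sketch below shows one possible precaution: it only looks for the file inside a single directory and reports failure by returning false instead of halting the script. The $entity_dir and $resolved variables, and the use of the system temporary directory, are assumptions made so the example is self-contained.

```php
<?php
// Hypothetical directory that external entities are allowed to come from.
$entity_dir = sys_get_temp_dir();
$resolved = "";   // text pulled in from external entities

// Create the referenced file so the sketch has something to read.
file_put_contents(
    $entity_dir . DIRECTORY_SEPARATOR . "warning.txt",
    "This document will self-destruct in thirty seconds."
);

// external entity handler
function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
    global $entity_dir, $resolved;

    // Keep only the file name, so the identifier cannot point outside $entity_dir.
    $path = $entity_dir . DIRECTORY_SEPARATOR . basename($systemId);
    $text = is_file($path) ? file_get_contents($path) : false;

    if ($text === false) {
        return false;   // tells the parser the entity could not be handled
    }
    $resolved .= $text;
    return true;        // tells the parser the entity was handled
}

$xml_data = <<<EOF
<?xml version="1.0"?>
<!DOCTYPE mission [
<!ENTITY warning SYSTEM "warning.txt">
]>
<mission><w>&warning;</w></mission>
EOF;

$xml_parser = xml_parser_create();
xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");

// Report the outcome rather than stopping the script on failure.
if (xml_parse($xml_parser, $xml_data, true)) {
    echo "Resolved text: $resolved\n";
} else {
    echo "XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)) . "\n";
}
```

Collecting the text in a variable instead of printing it with readfile() also leaves the script free to decide where, or whether, the included content appears in the output.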

Handling Notations and Unparsed Entities

You already know that notations and unparsed entities go together, and PHP allows you to handle them, too, via its xml_set_notation_decl_handler() and xml_set_unparsed_entity_decl_handler() functions. (If you don't know what notations and unparsed entities are, drop by Chapter 1, "XML and PHP Basics," and find out what you missed.) Like all the other handlers discussed thus far, both these functions designate handlers to be called when the parser encounters either a notation declaration or an unparsed entity declaration.

The following snippet designates the functions unparsedEntityHandler() and notationHandler() as the handlers for unparsed entities and notations found in the document:

 xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");  xml_set_notation_decl_handler($xml_parser, "notationHandler"); 

The handler designated by xml_set_notation_decl_handler() must be capable of accepting the following five arguments:

  • A handle representing the XML parser

  • The notation name

  • A base URI for the SYSTEM identifier

  • The SYSTEM identifier itself (if available)

  • The PUBLIC identifier (if available)

Similarly, the handler designated by xml_set_unparsed_entity_decl_handler() must be capable of accepting the following six arguments:

  • A handle representing the XML parser

  • The name of the unparsed entity

  • A base for the SYSTEM identifier

  • The SYSTEM identifier itself (if available)

  • The PUBLIC identifier (if available)

  • The notation name

In order to understand how these handlers work in practice, consider Listing 2.12, which sets up two unparsed entities representing directories on the system and a notation that tells the system what to do with them (run a script that calculates the disk space they're using, and mail the results to the administrator).

Listing 2.12 XML Document Containing Unparsed Entities and Notations ( list.xml )
<?xml version="1.0"?>
<!DOCTYPE list
[
<!ELEMENT list (#PCDATA | dir)*>
<!ELEMENT dir EMPTY>
<!ATTLIST dir name ENTITY #REQUIRED>
<!NOTATION directory SYSTEM "/usr/local/bin/usage.pl">
<!ENTITY config SYSTEM "/etc" NDATA directory>
<!ENTITY temp SYSTEM "/tmp" NDATA directory>
]>
<list>
    <dir name="config" />
    <dir name="temp" />
</list>

Listing 2.13 is the PHP script that parses the XML document.

Listing 2.13 Handling Unparsed Entities
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// cdata handler
function characterDataHandler($parser, $data)
{
    echo $data . "<p>";
}

// unparsed entity handler
function unparsedEntityHandler($parser, $entity, $base, $systemId, $publicId, $notation)
{
    global $notationsArray;
    if ($systemId)
    {
        exec("$notationsArray[$notation] $systemId");
    }
}

// notation handler
function notationHandler($parser, $notation, $base, $systemId, $publicId)
{
    global $notationsArray;
    if ($systemId)
    {
        $notationsArray[$notation] = $systemId;
    }
}

// XML data file
$xml_file = "list.xml";

// initialize array to hold notation declarations
$notationsArray = array();

// initialize parser
$xml_parser = xml_parser_create();

// set cdata handler
xml_set_character_data_handler($xml_parser, "characterDataHandler");

// set entity and notation handlers
xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");
xml_set_notation_decl_handler($xml_parser, "notationHandler");

// read XML file
if (!($fp = fopen($xml_file, "r")))
{
    die("File I/O error: $xml_file");
}

// parse XML
while ($data = fread($fp, 4096))
{
    // error handler
    if (!xml_parse($xml_parser, $data, feof($fp)))
    {
        die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
    }
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>

This is a little different from the scripts you've seen so far, so an explanation is in order.

The notationHandler() function, called whenever the parser encounters a notation declaration, simply adds the notation and its associated system identifier to a global associative array, $notationsArray. Now, whenever an unparsed entity is encountered, the unparsedEntityHandler() function matches the notation name within the entity declaration to the keys of the associative array, and launches the appropriate script with the entity as parameter.

Obviously, how you use these two handlers depends a great deal on how your notation declarations and unparsed entities are set up. In this case, I use the notation to specify the location of the application and the entity handler to launch the application whenever required. You also can use these handlers to display binary data within the page itself (assuming that your target environment is a browser), to process it further, or to ignore it altogether.
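
As an example of "processing it further," the two handlers can simply record what the declarations say and leave the decision about running anything to later code. The following sketch is a variation written for illustration: the $notations and $planned arrays are hypothetical names, and nothing is executed.

```php
<?php
$notations = array();   // notation name => SYSTEM identifier
$planned = array();     // command lines that would be run, recorded only

// notation handler: remember which application a notation refers to
function notationHandler($parser, $notation, $base, $systemId, $publicId)
{
    global $notations;
    if ($systemId) {
        $notations[$notation] = $systemId;
    }
}

// unparsed entity handler: pair the entity with its notation's application
function unparsedEntityHandler($parser, $entity, $base, $systemId, $publicId, $notation)
{
    global $notations, $planned;
    if ($systemId && isset($notations[$notation])) {
        $planned[] = $notations[$notation] . " " . $systemId;
    }
}

$xml_data = <<<EOF
<?xml version="1.0"?>
<!DOCTYPE list [
<!NOTATION directory SYSTEM "/usr/local/bin/usage.pl">
<!ENTITY temp SYSTEM "/tmp" NDATA directory>
]>
<list/>
EOF;

$xml_parser = xml_parser_create();
xml_set_notation_decl_handler($xml_parser, "notationHandler");
xml_set_unparsed_entity_decl_handler($xml_parser, "unparsedEntityHandler");

// Report the outcome rather than stopping the script on failure.
if (xml_parse($xml_parser, $xml_data, true)) {
    print_r($planned);
} else {
    echo "XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)) . "\n";
}
```

The isset() check matters because an entity may name a notation that was never declared, in which case there is no application to pair it with.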

Rapid " exec() -ution"

The PHP exec() function provides a handy way to execute any command on the system. That's why it's so perfect for a situation like the one shown in Listing 2.13. With the usage.pl script and directory name both available to the parser, it's a simple matter to put them together and then have exec() automatically run the disk usage checker every time a directory name is encountered within the XML document.

The convenience of exec() comes at a price, however. Using exec() can pose significant security risks, and can even cause your system to slow down or crash if the program you are " exec() -uting" fails to exit properly. The PHP manual documents this in greater detail.

If you prefer to have the output from the command displayed (or processed further), you should consider the passthru() function, designed for just that purpose.
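
One way to reduce the risk when a command line is assembled from document content is to escape each part before it reaches the shell. The sketch below only builds and prints the command; it does not run it. The $program and $directory values are hypothetical stand-ins for what the handlers in Listing 2.13 would receive, with the directory deliberately made hostile.

```php
<?php
// Hypothetical values, standing in for a notation's program and an entity's path.
$program = "/usr/local/bin/usage.pl";
$directory = "/tmp; echo oops";   // a SYSTEM identifier trying to smuggle in a command

// escapeshellcmd() escapes shell metacharacters in the program path;
// escapeshellarg() quotes the argument so it is passed as one literal string.
$command = escapeshellcmd($program) . " " . escapeshellarg($directory);

echo $command, "\n";
```

Escaping limits what the shell will do with the string, but it does not decide whether the program named by the document should be run at all; that still calls for a check against a list of programs you trust.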

Handling Everything Else

Finally, PHP also offers the xml_set_default_handler() function for all those situations not covered by the preceding handlers. In the event that no other handlers are defined for the document, all events generated will be trapped and resolved by this handler.

This snippet designates the function defaultHandler() as the default handler for the document:

 xml_set_default_handler($xml_parser, "defaultHandler"); 

The function designated by xml_set_default_handler() must be set up to accept the following two arguments:

  • A handle representing the XML parser

  • The data encountered

In Listing 2.14, every event generated by the parser is passed to the default handler (because no other handlers are defined), which simply prints the data received. The final output? An exact mirror of the input!

Listing 2.14 Demonstrating the Default Handler
<html>
<head>
<basefont face="Arial">
</head>
<body>
<?php
// default handler
function defaultHandler($parser, $data)
{
    echo "<pre>" . htmlspecialchars($data) . "</pre>";
}

// XML data
$xml_data = <<<EOF
<?xml version="1.0"?>
<element>carbon <!-- did you know that diamond is a form of carbon? -Ed -->
</element>
EOF;

// initialize parser
$xml_parser = xml_parser_create();

// set default handler
xml_set_default_handler($xml_parser, "defaultHandler");

if (!xml_parse($xml_parser, $xml_data))
{
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

// all done, clean up!
xml_parser_free($xml_parser);
?>
</body>
</html>
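
When other handlers are registered as well, the default handler receives only what they leave behind. A comment is a convenient example, because none of the handlers discussed in this section is dedicated to comments. The sketch below is an illustration with assumed names ($unhandled and the empty placeholder handlers); it registers element and character data handlers that do nothing, so that whatever reaches defaultHandler() is something they did not claim.

```php
<?php
$unhandled = array();   // everything that reached the default handler

// Placeholder handlers: they claim elements and text but do nothing with them.
function startElementHandler($parser, $name, $attributes) { }
function endElementHandler($parser, $name) { }
function characterDataHandler($parser, $data) { }

// default handler: receives the parser and the data no other handler took
function defaultHandler($parser, $data)
{
    global $unhandled;
    $unhandled[] = $data;
}

$xml_data = "<element>carbon<!-- diamond is a form of carbon --></element>";

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");
xml_set_default_handler($xml_parser, "defaultHandler");

if (!xml_parse($xml_parser, $xml_data, true)) {
    die("XML parser error: " . xml_error_string(xml_get_error_code($xml_parser)));
}

echo htmlspecialchars(implode("", $unhandled)), "\n";
```

Seen this way, the default handler is a useful diagnostic: anything it reports is content the rest of the script is silently skipping.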


XML and PHP
ISBN: 0735712271
EAN: 2147483647
Year: 2002
Pages: 84
