A Few Examples

Now that you know the theory, let's see how it works in some real-life examples. The following sections illustrate how PHP's SAX parser can be used to "do something useful" with XML data.

Formatting an XML Invoice for Display in a Web Browser

Consider the XML document in Listing 2.21, which contains an invoice for material delivered by Sammy's Sports Store.

Listing 2.21 XML Invoice ( invoice.xml )
<?xml version="1.0"?>
<!DOCTYPE invoice
[
<!ENTITY message "Thank you for your purchases!">
<!ENTITY terms SYSTEM "terms.xml">
]>
<invoice>
    <customer>
        <name>Joe Wannabe</name>
        <address>
            <line>23, Great Bridge Road</line>
            <line>Bombay, MH</line>
            <line>India</line>
        </address>
    </customer>
    <date>2001-09-15</date>
    <reference>75-848478-98</reference>
    <items>
        <item cid="AS633225">
            <desc>Oversize tennis racquet</desc>
            <price>235.00</price>
            <quantity>1</quantity>
            <subtotal>235.00</subtotal>
        </item>
        <item cid="GT645">
            <desc>Championship tennis balls (can)</desc>
            <price>9.99</price>
            <quantity>4</quantity>
            <subtotal>39.96</subtotal>
        </item>
        <item cid="U73472">
            <desc>Designer gym bag</desc>
            <price>139.99</price>
            <quantity>1</quantity>
            <subtotal>139.99</subtotal>
        </item>
        <item cid="AD848383">
            <desc>Custom-fitted sneakers</desc>
            <price>349.99</price>
            <quantity>1</quantity>
            <subtotal>349.99</subtotal>
        </item>
    </items>
    <?php displayTotal(); ?>
    <delivery>Next-day air</delivery>
    &terms;
    &message;
</invoice>

The entity &terms; references the file "terms.xml", which is shown in Listing 2.22.

Listing 2.22 Payment Terms and Conditions in XML ( terms.xml )
<?xml version="1.0"?>
<terms>
    <term>Visa, Mastercard, American Express accepted. Checks will be accepted for orders totalling more than USD 5000.00</term>
    <term>All payments must be made in US currency</term>
    <term>Returns within 15 days</term>
    <term>International orders may be subject to additional customs duties and levies</term>
</terms>

This invoice contains many of the constructs you've just studied: PIs, external entities, and plain-vanilla elements and data. It therefore serves as a good proving ground to demonstrate how PHP, combined with SAX, can be used to format XML data for greater readability. The script in Listing 2.23 parses the previous XML data to create an HTML page that is suitable for printing or viewing in a browser.

Listing 2.23 Generating HTML Output from XML Data with SAX
<html>
<head>
<basefont face="Arial">
</head>
<body bgcolor="white">
<font size="+3">Sammy's Sports Store</font>
<br>
<font size="-2">14, Ocean View, CA 12345, USA http://www.sammysportstore.com/</font>
<p>
<hr>
<center>INVOICE</center>
<hr>
<?php
// element handlers
// these look up the element in the associative arrays
// and print the equivalent HTML code
function startElementHandler($parser, $name, $attribs)
{
    global $startTagsArray;
    // expose element being processed
    global $currentTag;
    $currentTag = $name;
    // look up element in array and print corresponding HTML
    if (isset($startTagsArray[$name]))
    {
        echo $startTagsArray[$name];
    }
}

function endElementHandler($parser, $name)
{
    global $endTagsArray;
    if (isset($endTagsArray[$name]))
    {
        echo $endTagsArray[$name];
    }
}

// character data handler
// this prints character data as it is found
function characterDataHandler($parser, $data)
{
    global $currentTag;
    global $subTotals;
    echo $data;
    // record subtotals for calculation of grand total
    if ($currentTag == "SUBTOTAL")
    {
        $subTotals[] = $data;
    }
}

// external entity handler
// if SYSTEM-type entity, this function looks up the entity and parses it
function externalEntityHandler($parser, $name, $base, $systemId, $publicId)
{
    if ($systemId)
    {
        parse($systemId);
        // explicitly return true
        return true;
    }
    else
    {
        return false;
    }
}

// PI handler
// this function processes PHP code if it finds any
// (eval() runs whatever the document contains, so use trusted XML only)
function PIHandler($parser, $target, $data)
{
    // if php code, execute it
    if (strtolower($target) == "php")
    {
        eval($data);
    }
}

// this function adds up all the subtotals
// and prints a grand total
function displayTotal()
{
    global $subTotals;
    $total = 0;
    foreach ($subTotals as $element)
    {
        // the cast turns stray whitespace recorded after </subtotal> into 0
        $total += (float) $element;
    }
    echo "<p> <b>Total payable: </b> " . $total;
}

// function to actually perform parsing
function parse($xml_file)
{
    // initialize parser
    $xml_parser = xml_parser_create();

    // set callback functions
    xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
    xml_set_character_data_handler($xml_parser, "characterDataHandler");
    xml_set_processing_instruction_handler($xml_parser, "PIHandler");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityHandler");

    // read XML file
    if (!($fp = fopen($xml_file, "r")))
    {
        die("File I/O error: $xml_file");
    }

    // parse XML
    while ($data = fread($fp, 4096))
    {
        // error handler
        if (!xml_parse($xml_parser, $data, feof($fp)))
        {
            $ec = xml_get_error_code($xml_parser);
            die("XML parser error (error code " . $ec . "): " . xml_error_string($ec) . "<br>Error occurred at line " . xml_get_current_line_number($xml_parser));
        }
    }

    // all done, clean up!
    fclose($fp);
    xml_parser_free($xml_parser);
}

// arrays to associate XML elements with HTML output
$startTagsArray = array(
    'CUSTOMER' => '<p> <b>Customer: </b>',
    'ADDRESS' => '<p> <b>Billing address: </b>',
    'DATE' => '<p> <b>Invoice date: </b>',
    'REFERENCE' => '<p> <b>Invoice number: </b>',
    'ITEMS' => '<p> <b>Details: </b> <table width="100%" border="1" cellspacing="0" cellpadding="3"><tr><td><b>Item description</b></td><td><b>Price</b></td><td><b>Quantity</b></td><td><b>Subtotal</b></td></tr>',
    'ITEM' => '<tr>',
    'DESC' => '<td>',
    'PRICE' => '<td>',
    'QUANTITY' => '<td>',
    'SUBTOTAL' => '<td>',
    'DELIVERY' => '<p> <b>Shipping option:</b> ',
    'TERMS' => '<p> <b>Terms and conditions: </b> <ul>',
    'TERM' => '<li>'
);

$endTagsArray = array(
    'LINE' => ',',
    'ITEMS' => '</table>',
    'ITEM' => '</tr>',
    'DESC' => '</td>',
    'PRICE' => '</td>',
    'QUANTITY' => '</td>',
    'SUBTOTAL' => '</td>',
    'TERMS' => '</ul>',
    'TERM' => '</li>'
);

// create array to hold subtotals
$subTotals = array();

// begin parsing
$xml_file = "invoice.xml";
parse($xml_file);
?>
</body>
</html>

Figure 2.1 shows what the end result looks like.

Figure 2.1. Results of converting the XML invoice into HTML with SAX.

How did I accomplish this? Quite easily, by using the various event handlers exposed by SAX. As the script in Listing 2.23 demonstrates, I defined handlers for elements, character data, PIs, and external entities. I also created two associative arrays, which map XML element names to HTML constructs; each time one of those XML elements is encountered, PHP prints the corresponding HTML in its place. PIs and external entities are handled in the normal manner; note that this time around, my external entity handler does not merely display the content of the referenced entity, but parses it as well.
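To see the lookup-table idea in isolation, here is a minimal sketch that applies it to a short XML string instead of a file. The function name renderWithTagMap() and the two small arrays are made up for this illustration and are not part of Listing 2.23; the sketch also assumes PHP 7 or later, because it uses closures and the ?? operator.

```php
<?php
// Hypothetical helper (not part of Listing 2.23): renders a small XML
// string as HTML by looking up each element name in two arrays.
function renderWithTagMap(string $xml, array $startTags, array $endTags): string
{
    $html = '';
    $parser = xml_parser_create();

    // Case folding is on by default, so element names arrive in upper case.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$html, $startTags) {
            // unknown elements produce no markup at all
            $html .= $startTags[$name] ?? '';
        },
        function ($parser, $name) use (&$html, $endTags) {
            $html .= $endTags[$name] ?? '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$html) {
            $html .= htmlspecialchars($data);
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        $message = xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException("XML parser error: $message");
    }

    return $html;
}

$startTags = ['ITEM' => '<tr>', 'DESC' => '<td>', 'PRICE' => '<td>'];
$endTags   = ['ITEM' => '</tr>', 'DESC' => '</td>', 'PRICE' => '</td>'];

echo renderWithTagMap(
    '<item><desc>Designer gym bag</desc><price>139.99</price></item>',
    $startTags,
    $endTags
);
```

For this input the sketch prints <tr><td>Designer gym bag</td><td>139.99</td></tr>. Changing the output format is then a matter of passing different arrays rather than rewriting the handlers.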

Listing 2.23 demonstrates how easy it is to take marked-up XML data and "do something useful" with it: in this case, render it in a format that a standard web browser can display. You could just as easily format the XML as ASCII text, WML pages, or (in combination with PHP's PDF generation functions) PDF documents.

A technique such as the one described previously is suitable for simple, short XML documents; however, it can prove to be tedious when dealing with larger, more complex documents. For documents like these, you might want to consider using the Document Object Model (DOM), discussed in the next chapter; or a more powerful stylesheet language such as XSLT, discussed in Chapter 4, "PHP and Extensible Stylesheet Language Transformations (XSLT)."

Parsing and Displaying RSS Data on a Web Site

Another fairly common application of PHP's SAX parser involves using it to parse RDF Site Summary (RSS) documents and extract data from them for display on a web site.

In case you didn't already know, RSS 1.0 documents are well-formed XML documents that conform to the W3C's Resource Description Framework (RDF) specification. RSS 1.0 documents typically contain a description of the content on a web site. Many popular portals publish these documents as an easy way to allow other web sites to syndicate and link to their content.

A Rich Resource

For more information on RSS and RDF, take a look at http://purl.org/rss/1.0/ for the RSS 1.0 specification, and also visit the W3C's web site for RDF at http://www.w3.org/RDF/. And then drop by this book's companion web site (http://www.xmlphp.com), which has links to tutorials on how to integrate RSS 1.0 content feeds into your own web site.

Listing 2.24 demonstrates what an RSS 1.0 document looks like.

Listing 2.24 RSS 1.0 document ( fm-releases.rdf )
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://freshmeat.net/">
    <title>freshmeat.net</title>
    <link>http://freshmeat.net/</link>
    <description>freshmeat.net maintains the Web's largest index of Unix and
cross-platform open source software. Thousands of applications are meticulously
cataloged in the freshmeat.net database, and links to new code are added
daily.</description>
    <dc:language>en-us</dc:language>
    <dc:subject>Technology</dc:subject>
    <dc:publisher>freshmeat.net</dc:publisher>
    <dc:creator>freshmeat.net contributors</dc:creator>
    <dc:rights>Copyright (c) 1997-2002 OSDN</dc:rights>
    <dc:date>2002-02-11T10:20+00:00</dc:date>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://freshmeat.net/releases/69583/" />
        <rdf:li rdf:resource="http://freshmeat.net/releases/69581/" />
        <!-- remaining items deleted -->
      </rdf:Seq>
    </items>
    <image rdf:resource="http://freshmeat.net/img/fmII-button.gif" />
    <textinput rdf:resource="http://freshmeat.net/search/" />
  </channel>
  <image rdf:about="http://freshmeat.net/img/fmII-button.gif">
    <title>freshmeat.net</title>
    <url>http://freshmeat.net/img/fmII-button.gif</url>
    <link>http://freshmeat.net/</link>
  </image>
  <item rdf:about="http://freshmeat.net/releases/69583/">
    <title>sloop.splitter 0.2.1</title>
    <link>http://freshmeat.net/releases/69583/</link>
    <description>A real time sound effects program.</description>
    <dc:date>2002-02-11T04:52-06:00</dc:date>
  </item>
  <item rdf:about="http://freshmeat.net/releases/69581/">
    <title>apacompile 1.9.9</title>
    <link>http://freshmeat.net/releases/69581/</link>
    <description>A full-featured Apache compilation HOWTO.</description>
    <dc:date>2002-02-11T04:52-06:00</dc:date>
  </item>
  <!-- remaining items deleted -->
</rdf:RDF>

The Scent of Fresh Meat

The RSS 1.0 document in Listing 2.24 describes the content appearing on the front page of the popular open-source software portal Freshmeat.net (http://www.freshmeat.net/).

Freshmeat.net's RSS content feed is updated on a frequent basis with a list of the latest software added to the site; visit the web site for a copy of the latest version.

Now, this is a well-formed XML document, with clearly defined blocks for <channel> and <item> information. All that's needed now is some code to parse this document and return a list of the <item>s within it, together with the title, URL, and description of each.

With PHP's SAX parser, this is easy to accomplish. Listing 2.25 contains the code for a PHP class designed to parse the RSS document in Listing 2.24 and return PHP arrays containing the information within it. This information can then be formatted and displayed on a web page.

Listing 2.25 A PHP class to parse an RSS 1.0 document ( rssparser.class.inc )
<?php
class RSSParser
{
    //
    // class variables
    //

    // holds name of element currently being parsed
    var $tag = "";

    // location variable indicating whether parser is within
    // item or channel block
    var $location = 0;

    // array counter
    var $counter = 0;

    // name of RSS file
    var $file = "";

    // associative array for channel data
    var $channelData = array();

    // nested array of arrays for item data
    // every element of this array will represent
    // one item in the channel
    var $itemData = array();

    // XML parser handle, created in parseRSS()
    var $xmlParser;

    //
    // class methods
    //

    // set the name of the RSS file to parse
    // this is usually a local file
    // set it to a remote file only
    // if your PHP build supports fopen() over HTTP
    function setRSS($file)
    {
        $this->file = $file;
    }

    // element handlers
    // these keep track of the element currently being parsed
    // and adjust $location and $tag accordingly
    function startElementHandler($parser, $name, $attributes)
    {
        $this->tag = $name;
        if ($name == "ITEM")
        {
            // if entering item block
            // set location variable to 1
            $this->location = 1;
        }
        else if ($name == "CHANNEL")
        {
            // if entering channel block
            // set location variable to 2
            $this->location = 2;
        }
    }

    function endElementHandler($parser, $name)
    {
        $this->tag = "";
        // if exiting channel or item block
        // reset location variable to 0
        if ($name == "ITEM")
        {
            $this->counter++;
            $this->location = 0;
        }
        else if ($name == "CHANNEL")
        {
            $this->location = 0;
        }
    }

    // character data handler
    // this function checks to see whether the parser is
    // currently reading channel or item information
    // and appends the information to the appropriate array
    function characterDataHandler($parser, $data)
    {
        $data = trim(htmlspecialchars($data));
        // only interested in these three elements...
        if ($this->tag == "TITLE" || $this->tag == "LINK" || $this->tag == "DESCRIPTION")
        {
            $key = strtolower($this->tag);
            if ($this->location == 1)
            {
                // if within an item block
                // add data to item array
                if (!isset($this->itemData[$this->counter][$key]))
                {
                    $this->itemData[$this->counter][$key] = "";
                }
                $this->itemData[$this->counter][$key] .= $data;
            }
            else if ($this->location == 2)
            {
                // else add it to channel array
                if (!isset($this->channelData[$key]))
                {
                    $this->channelData[$key] = "";
                }
                $this->channelData[$key] .= $data;
            }
        }
    }

    // data retrieval methods
    // this returns the array with channel information
    function getChannelData()
    {
        return $this->channelData;
    }

    // this returns the array with item information
    function getItemData()
    {
        return $this->itemData;
    }

    // all the work happens here
    // parse the specified RSS file
    // this populates the $channelData and $itemData arrays
    function parseRSS()
    {
        // create parser
        $this->xmlParser = xml_parser_create();

        // set object reference
        xml_set_object($this->xmlParser, $this);

        // configure parser behaviour
        xml_parser_set_option($this->xmlParser, XML_OPTION_CASE_FOLDING, TRUE);
        xml_parser_set_option($this->xmlParser, XML_OPTION_SKIP_WHITE, TRUE);

        // set up handlers
        xml_set_element_handler($this->xmlParser, "startElementHandler", "endElementHandler");
        xml_set_character_data_handler($this->xmlParser, "characterDataHandler");

        // read RSS file
        if (!($fp = fopen($this->file, "r")))
        {
            die("Could not read $this->file");
        }

        // begin parsing...
        while ($data = fread($fp, 2048))
        {
            if (!xml_parse($this->xmlParser, $data, feof($fp)))
            {
                die("The following error occurred: " . xml_error_string(xml_get_error_code($this->xmlParser)));
            }
        }

        // destroy parser
        fclose($fp);
        xml_parser_free($this->xmlParser);
    }
// end of class
}
?>

This might look complicated, but it's actually pretty simple. The class above attempts to simplify the task of parsing and using an RDF file by parsing it and extracting the information within it into the following two arrays:

  • The $channelData associative array, which contains information on the channel title, URL, and description

  • The $itemData array, which is a two-dimensional array containing information (title, URL, and description) on the individual items in the channel list. The total number of elements in the $itemData array corresponds to the total number of <item> elements in the RSS document.
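As a rough picture of what these two arrays hold after parsing Listing 2.24, the following sketch writes them out by hand. The values are copied from the listing, except that the channel description is cut short here for space; this is an illustration of the shape, not output captured from the class.

```php
<?php
// Hand-written illustration of the array shapes (not captured output).
$channelData = [
    'title'       => 'freshmeat.net',
    'link'        => 'http://freshmeat.net/',
    // shortened for this example; the real value is the full description
    'description' => 'freshmeat.net maintains ...',
];

// One numerically indexed entry per <item>, each an associative array.
$itemData = [
    0 => [
        'title'       => 'sloop.splitter 0.2.1',
        'link'        => 'http://freshmeat.net/releases/69583/',
        'description' => 'A real time sound effects program.',
    ],
    1 => [
        'title'       => 'apacompile 1.9.9',
        'link'        => 'http://freshmeat.net/releases/69581/',
        'description' => 'A full-featured Apache compilation HOWTO.',
    ],
];

// Typical access patterns
echo count($itemData) . " items\n";
echo $itemData[1]['title'] . "\n";
```

Because each item is keyed by the lowercased element name, code that consumes the arrays can use $item['title'], $item['link'], and $item['description'] directly.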

The class also exposes the following public methods:

  • setRSS(): Set the name of the RSS file to parse

  • parseRSS(): Actually parse the specified RSS file and place the information extracted from it into the two arrays

  • getChannelData(): Retrieve the array containing channel information

  • getItemData(): Retrieve the array containing the item list

When using this class (look at Listing 2.26 for a usage example), the first step is, obviously, to specify the name of the RSS file to parse. Once this has been specified and stored in a class variable, the parseRSS() method is invoked to actually parse the document.

This parseRSS() method does all the things you've become familiar with in this chapter: Create an XML parser, configure it, set up callback functions, and sequentially iterate through the document, calling appropriate handlers for each XML construct encountered. As the parser moves through the document, it uses the $location variable to identify its current location, and the $tag variable to identify the name of the element currently being parsed. Based on these two pieces of data, the character data handler knows which array to place the descriptive channel/item information into.
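The interplay between the two state variables can be watched in a stripped-down sketch. The traceLocations() function below is hypothetical and exists only for this illustration: it keeps the same $location and $tag bookkeeping as the class, but instead of filling arrays it records where each <title> would end up. The tiny RSS-like string, including the "logo" title, is invented for the example.

```php
<?php
// Hypothetical tracer: records which array each <title> text would go to.
function traceLocations(string $xml): array
{
    $location = 0;   // 0 = elsewhere, 1 = inside <item>, 2 = inside <channel>
    $tag = '';
    $trace = [];

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attributes) use (&$location, &$tag) {
            $tag = $name;
            if ($name == 'ITEM') {
                $location = 1;
            } elseif ($name == 'CHANNEL') {
                $location = 2;
            }
        },
        function ($parser, $name) use (&$location, &$tag) {
            $tag = '';
            if ($name == 'ITEM' || $name == 'CHANNEL') {
                $location = 0;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$location, &$tag, &$trace) {
            $data = trim($data);
            if ($data === '' || $tag != 'TITLE') {
                return;
            }
            if ($location == 1) {
                $trace[] = "itemData: $data";
            } elseif ($location == 2) {
                $trace[] = "channelData: $data";
            } else {
                $trace[] = "ignored: $data";
            }
        }
    );
    xml_parse($parser, $xml, true);

    return $trace;
}

// Made-up input with one title in each of the three possible locations
$rss = '<rdf><channel><title>freshmeat.net</title></channel>'
     . '<image><title>logo</title></image>'
     . '<item><title>sloop.splitter 0.2.1</title></item></rdf>';

print_r(traceLocations($rss));
```

The channel title should be routed to the channel array, the item title to the item array, and the title inside <image> dropped, because $location is 0 when it is read. That last case is why the standalone <image> block in Listing 2.24 never leaks into either array.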

An Object Lesson

Special mention should be made of the xml_set_object() function used within the parseRSS() class method in Listing 2.25. You've probably not seen this function before, so I'll take the opportunity to explain it a little.

The xml_set_object() function is designed specifically to associate an XML parser with a class, and to link class methods and parser callback functions together. Callback functions defined for the parser are assumed to be methods of the enveloping class.

In order to better understand why xml_set_object() is necessary, try commenting out the call to the xml_set_object() function in Listing 2.25, and see what happens.
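If you try that experiment, the parser has only the strings "startElementHandler" and so on to go by, so it looks for ordinary global functions with those names and cannot find the class methods. An alternative that avoids the problem, sketched below, is to hand the parser an array holding the object and the method name; PHP treats such a pair as a callable bound to that object. The ElementLister class is hypothetical and exists only for this example.

```php
<?php
// Hypothetical minimal class: collects element names using method callbacks.
class ElementLister
{
    public $names = [];

    public function startElement($parser, $name, $attributes)
    {
        $this->names[] = $name;
    }

    public function endElement($parser, $name)
    {
        // nothing to do when an element closes
    }

    public function listElements(string $xml): array
    {
        $parser = xml_parser_create();

        // Pass [object, method] pairs so the parser calls methods on this
        // instance; this form does not rely on xml_set_object().
        xml_set_element_handler(
            $parser,
            [$this, 'startElement'],
            [$this, 'endElement']
        );
        xml_parse($parser, $xml, true);

        return $this->names;
    }
}

$lister = new ElementLister();
print_r($lister->listElements('<channel><title>freshmeat.net</title></channel>'));
```

The names come back as CHANNEL and TITLE, in upper case, because case folding is enabled by default. Either approach ties the callbacks to a specific object; the array form simply makes that link explicit at the point where each handler is registered.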

Listing 2.26 demonstrates how the class from Listing 2.25 can be combined with the RSS document in Listing 2.24 to generate PHP arrays representing the RSS content, and how those arrays can then be manipulated to display the information as browser-readable HTML.

Listing 2.26 Parsing an RDF File and Formatting the Result as an HTML Document
<?php
// include class
include("rssparser.class.inc");

// instantiate a new RSSParser
$rp = new RSSParser();

// define the RSS 1.0 file to parse
$rp->setRSS("fm-releases.rdf");

// parse the file
$rp->parseRSS();

// get channel information
$channel = $rp->getChannelData();

// retrieve item list (array)
// every element of this array is itself an associative array
// with keys ('title', 'link', 'description')
$items = $rp->getItemData();

// uncomment the next line to see a list of object properties
// print_r($rp);
?>
<html>
<head><basefont face="Arial"></head>
<body>
<h2><?php echo $channel['title']; ?></h2>
<ul>
<?php
// iterate through item list
// print each item as a list item with hyperlink, title and description
foreach ($items as $item)
{
    echo "<li>";
    echo "<a href=\"" . $item['link'] . "\">" . $item['title'] . "</a>";
    echo "<br>" . $item['description'];
    echo "</li>";
}
?>
</ul>
</body>
</html>

The script in Listing 2.26 creates an instance of the RSSParser class and parses the specified RSS file via the parseRSS() class method. It then iterates through the arrays returned by the class methods getChannelData() and getItemData(), and formats the elements of these arrays for display.
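If the same list is needed on more than one page, the formatting loop can be moved into a function that returns the markup as a string. The renderItemList() helper below is a hypothetical sketch of that idea; it assumes its input has the same shape as the array returned by getItemData(), with values already escaped by the character data handler in Listing 2.25.

```php
<?php
// Hypothetical helper: builds the <ul> markup from an item array.
function renderItemList(array $items): string
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        // Values are assumed to be escaped already (the class in Listing 2.25
        // calls htmlspecialchars()), so they are not escaped a second time.
        $html .= '<li><a href="' . $item['link'] . '">' . $item['title'] . '</a>'
               . '<br>' . $item['description'] . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Sample data in the shape returned by getItemData()
$items = [
    [
        'title'       => 'sloop.splitter 0.2.1',
        'link'        => 'http://freshmeat.net/releases/69583/',
        'description' => 'A real time sound effects program.',
    ],
];

echo renderItemList($items);
```

Returning a string rather than echoing keeps the function usable in templates, caches, or tests without output buffering.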

Figure 2.2 demonstrates what the output of Listing 2.26 looks like.

Figure 2.2. The results of converting an RDF file into HTML with SAX.

XML and PHP
ISBN: 0735712271
Year: 2002
Pages: 84