Traversing the DOM with PHP's DOM Classes

Because PHP's DOM parser works by creating standard objects to represent XML structures, an understanding of these objects and their capabilities is essential to using this technique effectively. This section examines the classes that form the blueprint for these objects in greater detail.

DomDocument Class

A DomDocument object is typically the first object created by the DOM parser when it completes parsing an XML document. It may be created by a call to xmldoc():

$doc = xmldoc("<?xml version='1.0'?><element>potassium</element>");

Or, if your XML data is in a file (rather than a string), you can use the xmldocfile() function to create a DomDocument object:

$doc = xmldocfile("element.xml");
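The xmldoc() and xmldocfile() functions belong to PHP 4's experimental domxml extension, which was moved out of the core distribution in PHP 5 in favor of the DOM extension. If you are following along on a current PHP installation, a rough equivalent is sketched below. It uses DOMDocument::loadXML() for strings and DOMDocument::load() for files; this is a different API from the one this chapter describes, and the helper name parse_xml_string() is hypothetical.

```php
<?php
// Sketch for PHP 5+ (DOM extension), NOT the PHP 4 domxml API used in this chapter.
// parse_xml_string() is a hypothetical helper, roughly playing the role of xmldoc().

function parse_xml_string(string $xml): ?DOMDocument
{
    // Collect libxml parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

// From a string (similar in spirit to xmldoc()).
$doc = parse_xml_string("<?xml version='1.0'?><element>potassium</element>");
if ($doc === null) {
    die("Error in XML");
}

// From a file you would call load() instead (similar in spirit to xmldocfile()):
// $fileDoc = new DOMDocument();
// $fileDoc->load("element.xml");
```

Returning null on failure keeps the calling code close to the `if (!$doc = xmldoc(...))` idiom used in the listings that follow.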
When you examine the structure of the DomDocument object with print_r(), you can see that it contains basic information about the XML document, including the XML version, the encoding and character set, and the URL of the document:

DomDocument Object
(
    [name] =>
    [url] =>
    [version] => 1.0
    [standalone] => -1
    [type] => 9
    [compression] => -1
    [charset] => 1
)
Each of these properties provides information on some aspect of the XML document:
The application can use this information to make decisions about how to process the XML data. For example, as Listing 3.3 demonstrates, it may reject documents based on the version of XML being used.

Listing 3.3 Using DomDocument Properties to Verify XML Version Information

<?php
// XML data
$xml_string = "<?xml version='1.0'?><element>potassium</element>";

// create a DOM object
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML");
}
// version check
else if ($doc->version > 1.0)
{
    die("Unsupported XML version");
}
else
{
    // XML processing code here
}
?>

In addition to the properties described previously, the DomDocument object also comes with the following methods:
While parsing XML data, you'll find that the root() method is the one you use most often, whereas the add_root() and dumpmem() methods come in handy when you're creating or modifying an XML document tree in memory (discussed in detail in the "Manipulating DOM Trees" section).
In Listing 3.4, the variable $fruit contains the root node (the element named fruit).

Listing 3.4 Accessing the Document Element via the DOM

<?php
// create a DomDocument object
$doc = xmldoc("<?xml version='1.0' encoding='UTF-8' standalone='yes'?><fruit>watermelon</fruit>");

// root node
$fruit = $doc->root();
?>
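For readers on PHP 5 or later, here is an approximate translation of Listings 3.3 and 3.4 into the DOM extension (an assumption about your environment, not the domxml API described here). The declaration details appear as the xmlVersion, xmlEncoding, and xmlStandalone properties, and the root element is the documentElement property rather than the result of a root() method. The sketch also uses version_compare() so that the version check compares version strings rather than a string with a float.

```php
<?php
// Sketch for PHP 5+ (DOM extension); Listings 3.3 and 3.4 use PHP 4's domxml instead.

$xml = "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><fruit>watermelon</fruit>";

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    die("Error in XML");
}

// Reject documents that declare an XML version newer than 1.0.
if (version_compare($doc->xmlVersion, '1.0', '>')) {
    die("Unsupported XML version");
}

// Declaration details, roughly what print_r() showed for a domxml DomDocument.
echo "version: ", $doc->xmlVersion, "\n";
echo "encoding: ", $doc->xmlEncoding, "\n";
echo "standalone: ", $doc->xmlStandalone ? "yes" : "no", "\n";

// documentElement plays the role of domxml's root() method.
$fruit = $doc->documentElement;
echo "root: ", $fruit->nodeName, "\n";
```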
DomElement Class

The PHP parser represents every element within the XML document as an instance of the DomElement class, which makes it one of the most important in this lineup. When you view the structure of a DomElement object, you see that it has two distinct properties that represent the element name and type, respectively. You'll remember from Listing 3.2 that these properties can be used to identify individual elements and extract their values. Here is an example:

DomElement Object
(
    [type] => 1
    [tagname] => vegetable
)

A special note should be made here of the type property, which indicates the type of node under discussion. This type property contains an integer value mapping to one of the parser's predefined node types. Table 3.1 lists the important types.

Table 3.1. DOM Node Types
If you plan to use the type property within a script to identify node types (as I will be doing shortly in Listing 3.5), you should note that it is considered preferable to use the named constants rather than their corresponding integer values, both for readability and to ensure stability across API changes. The DomElement object also exposes a number of useful object methods:
Again, the two most commonly used methods are children() and attributes(). The children() method returns an array of the node's child nodes, which can include text nodes and other non-element nodes as well as DomElement objects (which is why the listings that follow check each child's type), whereas attributes() returns an array of DomAttribute objects. The get_attribute() method can be used to return the value of a specific attribute of an element (refer to Listing 3.8 for an example), whereas the new_child(), set_attribute(), and set_content() methods are used when creating or modifying XML trees in memory, and are discussed in detail in the section entitled "Manipulating DOM Trees." Note that PHP's DOM implementation does not currently offer any way of removing an attribute previously set with the set_attribute() method.
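For comparison, the PHP 5+ DOM extension (not the domxml API this section describes) exposes the same attribute operations as DOMElement::hasAttribute(), getAttribute(), and setAttribute(), plus an iterable attributes property. It also provides removeAttribute(), the operation that domxml lacks according to the note on set_attribute(). The sample element reuses the vegetable example from this chapter; treat the snippet as an illustrative sketch.

```php
<?php
// Sketch for PHP 5+ (DOM extension), not the domxml API described in this section.

$doc = new DOMDocument();
$doc->loadXML("<vegetable color='green'>cabbages</vegetable>");
$veg = $doc->documentElement;

// Read one attribute; getAttribute() alone returns "" when the attribute is absent,
// so hasAttribute() is used to tell "missing" apart from "empty".
$color = $veg->hasAttribute('color') ? $veg->getAttribute('color') : null;

// Iterate over all attributes; each one is a DOMAttr with name and value properties.
foreach ($veg->attributes as $attr) {
    echo $attr->name, " = ", $attr->value, "\n";
}

// Add an attribute, then remove it again.
$veg->setAttribute('size', 'large');
$veg->removeAttribute('size');

// Serialize just this element to confirm only "color" remains.
echo $doc->saveXML($veg), "\n";
```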
Listing 3.5 demonstrates the children() method in action by combining it with a recursive function and HTML's unordered lists to create a hierarchical tree mirroring the document structure (similar in concept, though not in approach, to Listing 2.5). At the end of the process, a count of the total number of elements encountered is displayed.

Listing 3.5 Representing an XML Document as a Hierarchical List

<?php
// XML file
$xml_file = "letter.xml";

// parse it
if (!$doc = xmldocfile($xml_file))
{
    die("Error in XML document");
}

// get the root node
$root = $doc->root();

// get its children
$children = get_children($root);

// element counter
// start with 1 so as to include document element
$elementCount = 1;

// start printing
print_tree($children);

// this recursive function accepts an array of nodes as argument,
// iterates through it and prints a list for each element found
function print_tree($nodeCollection)
{
    global $elementCount;

    // iterate through array
    echo "<ul>";
    for ($x=0; $x<sizeof($nodeCollection); $x++)
    {
        // add to element count
        $elementCount++;

        // print element as list item
        echo "<li>" . $nodeCollection[$x]->tagname;

        // go to the next level of the tree
        $nextCollection = get_children($nodeCollection[$x]);

        // recurse!
        print_tree($nextCollection);

        // close the list item
        echo "</li>";
    }
    echo "</ul>";
}

// function to return an array of children, given a parent node
function get_children($node)
{
    $temp = $node->children();
    $collection = array();

    // an element with no children may not yield an array
    if (!is_array($temp))
    {
        return $collection;
    }

    // iterate through children array
    for ($x=0; $x<sizeof($temp); $x++)
    {
        // filter out all nodes except elements
        // and create a new array
        if ($temp[$x]->type == XML_ELEMENT_NODE)
        {
            $collection[] = $temp[$x];
        }
    }

    // return array containing child nodes
    return $collection;
}

echo "Total number of elements in document: $elementCount";
?>

Listing 3.5 is fairly easy to understand.
The first step is to obtain a reference to the root of the document tree via the root() method. The root's element children, retrieved with the get_children() helper (which calls children() and discards every node that is not an element), are then passed to the recursive print_tree() function. This function prints each element in the list as a list item, obtains that element's own element children with get_children(), and calls itself to process that next level of the tree. The process continues until all the nodes in the tree have been exhausted. An element counter is incremented for each element printed, and the script finishes by displaying the total number of elements in the document.

DomText Class

Character data within an XML document is represented by the DomText class. Here's what it looks like:

DomText Object
(
    [type] => 3
    [content] => cabbages
)

The type property represents the node type (XML_TEXT_NODE in this case, as can be seen from Table 3.1), whereas the content property holds the character data itself. To illustrate this, consider Listing 3.6, which takes an XML-encoded list of country names, parses it, and puts that list into a PHP array.
Listing 3.6 Using DomText Object Properties to Retrieve Character Data from an XML Document

<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<earth>
    <country>Albania</country>
    <country>Argentina</country>
    <!-- and so on -->
    <country>Zimbabwe</country>
</earth>";

// create array to hold country names
$countries = array();

// create a DOM object from the XML data
if (!$doc = xmldoc($xml_string))
{
    die("Error parsing XML");
}

// start at the root
$root = $doc->root();

// move down one level to the root's children
$nodes = $root->children();

// iterate through the list of children
foreach ($nodes as $n)
{
    // skip whitespace text nodes, comments, and other non-element nodes
    if ($n->type != XML_ELEMENT_NODE)
    {
        continue;
    }

    // for each <country> element
    // get the text node under it
    // and add it to the $countries[] array
    $text = $n->children();
    if ($text[0]->content != "")
    {
        $countries[] = $text[0]->content;
    }
}

// uncomment this line to see the contents of the array
// print_r($countries);
?>

Fairly simple: a loop is used to iterate through the <country> elements, adding the character data found within each to the $countries array.
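Here is a sketch of Listing 3.6 for PHP 5 and later, assuming the DOM extension rather than domxml. The childNodes property replaces children(), nodeType replaces type, and textContent returns all the character data inside an element, so there is no need to index into its first text node. Checking the node type keeps the whitespace and the comment between the <country> elements out of the result.

```php
<?php
// Sketch for PHP 5+ (DOM extension); Listing 3.6 itself uses PHP 4's domxml.

$xml_string = "<?xml version='1.0'?>
<earth>
    <country>Albania</country>
    <country>Argentina</country>
    <!-- and so on -->
    <country>Zimbabwe</country>
</earth>";

$doc = new DOMDocument();
if (!$doc->loadXML($xml_string)) {
    die("Error parsing XML");
}

$countries = array();
foreach ($doc->documentElement->childNodes as $node) {
    // Keep only <country> elements; whitespace text nodes and the comment are skipped.
    if ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'country') {
        $name = trim($node->textContent);
        if ($name !== '') {
            $countries[] = $name;
        }
    }
}

print_r($countries);
```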
DomAttribute Class

A call to the attributes() method of the DomElement object generates an array of DomAttribute objects, each of which looks like this:

DomAttribute Object
(
    [name] => color
    [value] => green
)

The attribute name can be accessed via the name property, and the corresponding attribute value can be accessed via the value property. Listing 3.7 demonstrates how this works by using the value of the color attribute to highlight each vegetable or fruit name in the corresponding color.

Listing 3.7 Accessing Attribute Values with the DomAttribute Object

<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>
What a wonderful profusion of colors and smells in the market -
<vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";

// parse it
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML document");
}

// get the root node
$root = $doc->root();

// get its children
$children = $root->children();

// iterate through child list
for ($x=0; $x<sizeof($children); $x++)
{
    // if element node
    if ($children[$x]->type == XML_ELEMENT_NODE)
    {
        // get the text node under it
        $text = $children[$x]->children();
        $cdata = $text[0]->content;

        // check its attributes to see if "color" is present
        $attributes = $children[$x]->attributes();
        if (is_array($attributes) && ($index = is_color_attribute_present($attributes)))
        {
            // if it is, colorize the element content
            echo "<font color=" . $index . ">" . $cdata . "</font>";
        }
        else
        {
            // else print it as is
            echo $cdata;
        }
    }
    // if text node
    else if ($children[$x]->type == XML_TEXT_NODE)
    {
        // simply print the content
        echo $children[$x]->content;
    }
}

// function to iterate through attribute list
// and return the value of the "color" attribute if available
// (or false if it is not present)
function is_color_attribute_present($attributeList)
{
    $color = false;
    foreach ($attributeList as $attrib)
    {
        if ($attrib->name == "color")
        {
            $color = $attrib->value;
            break;
        }
    }
    return $color;
}
?>

There is, of course, a simpler way to do this: just use the DomElement object's get_attribute() method. Listing 3.8, which generates output equivalent to that of Listing 3.7, demonstrates this alternative (and much shorter) approach.

Listing 3.8 Accessing Attribute Values (a Simpler Approach)

<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>
What a wonderful profusion of colors and smells in the market -
<vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";

// parse it
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML document");
}

// get the root node
$root = $doc->root();

// get its children
$children = $root->children();

// iterate through child list
for ($x=0; $x<sizeof($children); $x++)
{
    // if element node
    if ($children[$x]->type == XML_ELEMENT_NODE)
    {
        // get the text node under it
        $text = $children[$x]->children();
        $cdata = $text[0]->content;

        // check to see if element contains the "color" attribute
        if ($children[$x]->get_attribute("color"))
        {
            // "color" attribute is present, colorize text
            echo "<font color=" . $children[$x]->get_attribute("color") . ">" . $cdata . "</font>";
        }
        else
        {
            // otherwise just print the text as is
            echo $cdata;
        }
    }
    // if text node
    else if ($children[$x]->type == XML_TEXT_NODE)
    {
        // print content as is
        echo $children[$x]->content;
    }
}
?>

A Composite Example

Now that you know how it works, how about seeing how it plays out in real life? This example takes everything you have learned thus far and uses that knowledge to construct an HTML file from an XML document. I'll be using a variant of the XML invoice (Listing 2.21) from Chapter 2, adapting the SAX-based approach demonstrated there to the new DOM paradigm. As you'll see, although the two techniques are fundamentally different, they can nonetheless achieve a similar effect. Listing 3.9 is the marked-up invoice.

Listing 3.9 An XML Invoice (invoice.xml)

<?xml version="1.0"?>
<invoice>
    <customer>
        <name>Joe Wannabe</name>
        <address>
            <line>23, Great Bridge Road</line>
            <line>Bombay, MH</line>
            <line>India</line>
        </address>
    </customer>
    <date>2001-09-15</date>
    <reference>75-848478-98</reference>
    <items>
        <item cid="AS633225">
            <desc>Oversize tennis racquet</desc>
            <price>235.00</price>
            <quantity>1</quantity>
            <subtotal>235.00</subtotal>
        </item>
        <item cid="GT645">
            <desc>Championship tennis balls (can)</desc>
            <price>9.99</price>
            <quantity>4</quantity>
            <subtotal>39.96</subtotal>
        </item>
        <item cid="U73472">
            <desc>Designer gym bag</desc>
            <price>139.99</price>
            <quantity>1</quantity>
            <subtotal>139.99</subtotal>
        </item>
        <item cid="AD848383">
            <desc>Custom-fitted sneakers</desc>
            <price>349.99</price>
            <quantity>1</quantity>
            <subtotal>349.99</subtotal>
        </item>
    </items>
    <delivery>Next-day air</delivery>
</invoice>

Listing 3.10 parses the previous XML data to create an HTML page, suitable for printing or viewing in a browser.
Listing 3.10 Formatting an XML Document with the DOM

<html>
<head>
<basefont face="Arial">
</head>
<body bgcolor="white">

<font size="+3">Sammy's Sports Store</font>
<br>
<font size="-2">14, Ocean View, CA 12345, USA http://www.sammysportstore.com/</font>

<p>
<hr>
<center>INVOICE</center>
<hr>

<?php
// arrays to associate XML elements with HTML output
$startTagsArray = array(
    'CUSTOMER' => '<p> <b>Customer: </b>',
    'ADDRESS' => '<p> <b>Billing address: </b>',
    'DATE' => '<p> <b>Invoice date: </b>',
    'REFERENCE' => '<p> <b>Invoice number: </b>',
    'ITEMS' => '<p> <b>Details: </b> <table width="100%" border="1" cellspacing="0" cellpadding="3"><tr><td><b>Item description</b></td><td><b>Price</b></td><td><b>Quantity</b></td><td><b>Sub-total</b></td></tr>',
    'ITEM' => '<tr>',
    'DESC' => '<td>',
    'PRICE' => '<td>',
    'QUANTITY' => '<td>',
    'SUBTOTAL' => '<td>',
    'DELIVERY' => '<p> <b>Shipping option:</b> ',
    'TERMS' => '<p> <b>Terms and conditions: </b> <ul>',
    'TERM' => '<li>'
);

$endTagsArray = array(
    'LINE' => ',',
    'ITEMS' => '</table>',
    'ITEM' => '</tr>',
    'DESC' => '</td>',
    'PRICE' => '</td>',
    'QUANTITY' => '</td>',
    'SUBTOTAL' => '</td>',
    'TERMS' => '</ul>',
    'TERM' => '</li>'
);

// array to hold sub-totals
$subTotals = array();

// XML file
$xml_file = "/home/sammy/invoices/invoice.xml";

// parse document
if (!$doc = xmldocfile($xml_file))
{
    die("Error in XML document");
}

// get the root node
$root = $doc->root();

// get its children
$children = $root->children();

// start printing
print_tree($children);

// this recursive function accepts an array of nodes as argument,
// iterates through it and:
// - marks up elements with HTML
// - prints text as is
function print_tree($nodeCollection)
{
    global $startTagsArray, $endTagsArray, $subTotals;

    // an element with no children may not yield an array
    if (!is_array($nodeCollection))
    {
        return;
    }

    foreach ($nodeCollection as $node)
    {
        // how to handle elements
        if ($node->type == XML_ELEMENT_NODE)
        {
            $tag = strtoupper($node->tagname);

            // print HTML opening tags (only if some are defined for this element)
            if (isset($startTagsArray[$tag]))
            {
                echo $startTagsArray[$tag];
            }

            // recurse
            $nextCollection = $node->children();
            print_tree($nextCollection);

            // once done, print closing tags (only if some are defined)
            if (isset($endTagsArray[$tag]))
            {
                echo $endTagsArray[$tag];
            }
        }

        // how to handle text nodes
        if ($node->type == XML_TEXT_NODE)
        {
            // print text as is
            echo($node->content);
        }

        // PI handling code would come here
        // this doesn't work too well in PHP 4.1.1
        // see the sidebar entitled "Process Failure"
        // for more information
    }
}

// this function gets the character data within an element
// it accepts an element node as argument
// and dives one level deeper into the DOM tree
// to retrieve the corresponding character data
function getNodeContent($node)
{
    $content = "";
    $children = $node->children();
    if ($children)
    {
        foreach ($children as $child)
        {
            $content .= $child->content;
        }
    }
    return $content;
}
?>

</body>
</html>

Figure 3.2 shows what the output looks like.

Figure 3.2. Sammy's Sports Store invoice.
As with the SAX example (refer to Listing 2.23), the first thing to do is define arrays to hold the HTML markup for specific tags; in Listing 3.10, this markup is stored in the $startTagsArray and $endTagsArray variables. Next, the XML document is read by the parser, and an appropriate DOM tree is generated in memory. An array of objects representing the first level of the tree (the children of the root node) is obtained, and the function print_tree() is called.

The print_tree() function is recursive, and it forms the core of the script. It accepts a node list as argument, iterates through this list, examines each node, and processes it appropriately. As you can see, the function is set up to perform specific tasks, depending on the type of node:

- If the node is an element, print_tree() prints the HTML markup associated with that element in $startTagsArray, processes the element's contents, and then prints the markup associated with it in $endTagsArray.
- If the node is a text node, print_tree() prints its content as is.
Additionally, if the node is an element, the print_tree() function obtains a list of the element's children (if any exist) and proceeds to call itself with that node list as argument. And so the process repeats itself until the entire tree has been traversed.

As Listing 3.10 demonstrates, this technique provides a handy way to recursively scan through a DOM tree and perform different actions based on the type of node encountered. You can use this technique to count, classify, and process the different types of elements encountered (Listing 3.5 demonstrated a primitive element counter), or even to construct a new tree from the existing one.
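The same recursive pattern carries over to the PHP 5+ DOM extension. The sketch below assumes that API rather than domxml, and the function name count_node_types() and the small sample document are hypothetical. It walks a tree once and tallies nodes by type, which is the kind of counting and classification just described.

```php
<?php
// Sketch for PHP 5+ (DOM extension); count_node_types() is a hypothetical helper.

// Recursively visit $node and its descendants, tallying them by node type.
function count_node_types(DOMNode $node, array &$counts): void
{
    $labels = array(
        XML_ELEMENT_NODE => 'element',
        XML_TEXT_NODE    => 'text',
        XML_COMMENT_NODE => 'comment',
        XML_PI_NODE      => 'processing instruction',
    );
    $label = $labels[$node->nodeType] ?? 'other';
    $counts[$label] = ($counts[$label] ?? 0) + 1;

    // Recurse into children only when there are some (text nodes have none).
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            count_node_types($child, $counts);
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML("<invoice><date>2001-09-15</date><!-- note --><items><item cid='GT645'>balls</item></items></invoice>");

$counts = array();
count_node_types($doc->documentElement, $counts);
print_r($counts);
```

Attributes are not child nodes in the DOM, so the cid attribute is not counted; to include attributes you would also walk each element's attributes property.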