20.9 Parsing Using the DOM-XML Functions | PHP Developers Cookbook (2nd Edition)

You want to use the DOM-XML functions to parse a document.

Technique

First, make sure PHP is built with the --with-dom configure option. Then you can parse an XML document like so:

XML File: tst.xml

    <sites>
        <site>
            <title>PHP.net</title>
            <url>http://www.php.net/</url>
            <description>
                The homepage of PHP.
            </description>
            <keywords>
                MySQL, PHP, Documentation, downloads, articles, books
            </keywords>
        </site>
    </sites>

PHP File: process.php

    <?php
    // HTML printed before and after the text of selected elements
    $start = array('title' => "<h2>");
    $end   = array('title'       => "</h2><br>",
                   'keywords'    => "<br>",
                   'description' => "<br>");

    // Name of the last element whose text still has to be printed
    $que['last'] = '';

    $doc = xmldocfile('tst.xml');

    // Root node
    $root = $doc->root();
    process_node($root);

    function process_node($node) {
        global $que, $start, $end;

        switch ($node->type) {
            case XML_ELEMENT_NODE:
                switch (($name = strtolower(trim($node->name)))) {
                    case 'site':
                        print '<br>';
                        $que['last'] = '';
                        break;
                    case 'title':
                        print $start['title'];
                        $que['last'] = $name;
                        break;
                    case 'description':
                    case 'keywords':
                        $que['last'] = $name;
                        $name = ucfirst($name);
                        print "<b>{$name}:</b> ";
                        break;
                    default:
                        $que['last'] = '';
                        break;
                }
                // Do not fall through into the text-node case
                break;
            case XML_TEXT_NODE:
                if (!empty($que['last'])) {
                    print $node->content . $end[$que['last']];
                }
                $que['last'] = '';
                break;
        }

        // Recurse into the children of the current node
        $children = $node->children();
        if (is_array($children)) {
            foreach ($children as $child) {
                process_node($child);
            }
        }
    }
    ?>

Description

The recipes before this one use Expat's SAX-based interface for processing XML documents. With SAX-based processors, you register handlers and then parse the XML document. When a particular tag or type of tag is reached, the matching handler is called. Starting with PHP 4, PHP also offers a Document Object Model (DOM)-based processor (by interfacing with libxml). DOM-based processors parse an XML document into a tree that you can then access.
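To make the contrast concrete, here is a minimal sketch of the handler-based (SAX) style using PHP's xml_parser functions. The collect_sax_events() helper and the sample document are assumptions made for illustration; they do not come from the earlier recipes.

```php
<?php
// Sketch of the event-driven (SAX) style: handlers fire while the parser reads.
// collect_sax_events() is a hypothetical helper, not part of the recipe.
function collect_sax_events(string $xml): array
{
    $events = [];
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$events) {
            $events[] = "start:$name";
        },
        function ($parser, $name) use (&$events) {
            $events[] = "end:$name";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$events) {
            // Ignore whitespace-only chunks between elements.
            if (trim($data) !== '') {
                $events[] = 'text:' . trim($data);
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $events;
}

print implode("\n", collect_sax_events('<site><title>PHP.net</title></site>')) . "\n";
```

Nothing is kept after the parse except what the handlers chose to record, which is the main difference from a DOM tree that stays available for later inspection.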

In the solution, we traverse a simple XML document, tst.xml, using the DOM-XML functions. The crux of this recipe lies in the xmldocfile() function, which loads an XML file into a tree structure. After we have the document in a tree structure, we loop through that tree, using the process_node() function to process the data.

At the most basic level, we pass the root (top-level) node to the process_node() function, which then moves through the XML document, processing child elements as it finds them.
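The DOM-XML functions used in this recipe exist only in PHP 4, so the following sketch of the same root-first walk assumes PHP 5 or later and its bundled DOM extension (DOMDocument) instead. The element_names() helper and the shortened sample document are hypothetical.

```php
<?php
// Assumption: DOMDocument (PHP 5+) stands in for the PHP 4 DOM-XML functions.
// element_names() is a hypothetical helper that lists elements in visit order.
function element_names(DOMNode $node, int $depth = 0): array
{
    $names = [];
    if ($node->nodeType === XML_ELEMENT_NODE) {
        // Indent each name by its depth in the tree.
        $names[] = str_repeat('  ', $depth) . $node->nodeName;
    }
    // Visit every child, depth first.
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        $names = array_merge($names, element_names($child, $depth + 1));
    }
    return $names;
}

$doc = new DOMDocument();
$doc->loadXML('<sites><site><title>PHP.net</title><url>http://www.php.net/</url></site></sites>');
print implode("\n", element_names($doc->documentElement)) . "\n";
```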

When processing an individual node, we first find out its type by reading the node's type property. If the type is XML_ELEMENT_NODE, the node carries the element's name (available via the name property) and the element's attributes. You can access the attributes through the attributes() method like so:

    <?php
    $attr = $node->attributes();
    foreach ($attr as $attr_name => $attr_val) {
        print "{$attr_name}: {$attr_val}\n<br>\n";
    }
    ?>
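For readers on PHP 5 or later, where the DOM-XML functions are no longer available, a rough equivalent with the bundled DOM extension might look like the sketch below. The attribute_pairs() helper is hypothetical, and the id and lang attributes are invented for the example (tst.xml has no attributes).

```php
<?php
// Assumption: DOMDocument (PHP 5+) is used instead of the PHP 4 DOM-XML API.
// attribute_pairs() is a hypothetical helper returning name => value pairs.
function attribute_pairs(DOMElement $element): array
{
    $pairs = [];
    // $element->attributes is a DOMNamedNodeMap of DOMAttr objects.
    foreach ($element->attributes as $attr) {
        $pairs[$attr->name] = $attr->value;
    }
    return $pairs;
}

$doc = new DOMDocument();
// The id and lang attributes are made up for this example.
$doc->loadXML('<site id="1" lang="en"><title>PHP.net</title></site>');
foreach (attribute_pairs($doc->documentElement) as $name => $value) {
    print "$name: $value\n";
}
```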

We check which element it is and print the opening HTML as appropriate. We also save the element's name in the last entry of the $que array if the element requires further processing when we reach its XML_TEXT_NODE. Otherwise, we empty that entry so that an element won't be processed twice when we handle XML_TEXT_NODE nodes.

If the node's type is XML_TEXT_NODE, meaning it holds the text of the previously processed element, we are left to decide whether to output that node's data. If the last entry of $que is not empty, we print the text contained within the element and add the closing HTML for that element, which is stored in the $end array.

After we have processed the current node, we get its child nodes (as an array) by calling the children() method. We loop through these nodes and process each one with a recursive call to process_node().
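The remember-then-print logic described above can be sketched for PHP 5 or later as follows. This is a hypothetical port, not the recipe's own code: render_node() uses DOMDocument in place of the DOM-XML functions, passes the pending element name by reference instead of using a global, returns the HTML rather than printing it, and trims the text, which the original does not do.

```php
<?php
// Assumption: DOMDocument (PHP 5+) replaces the PHP 4 DOM-XML functions.
// render_node() is a hypothetical port; $last plays the role of $que['last'].
function render_node(DOMNode $node, string &$last = ''): string
{
    $end = ['title' => '</h2><br>', 'keywords' => '<br>', 'description' => '<br>'];
    $html = '';

    if ($node->nodeType === XML_ELEMENT_NODE) {
        $name = strtolower($node->nodeName);
        switch ($name) {
            case 'site':
                $html .= '<br>';
                $last = '';
                break;
            case 'title':
                $html .= '<h2>';
                $last = $name;      // text of this element is still pending
                break;
            case 'description':
            case 'keywords':
                $html .= '<b>' . ucfirst($name) . ':</b> ';
                $last = $name;
                break;
            default:
                $last = '';         // nothing pending for other elements
        }
    } elseif ($node->nodeType === XML_TEXT_NODE) {
        if ($last !== '') {
            // Print the pending element's text followed by its closing HTML.
            $html .= trim($node->nodeValue) . $end[$last];
        }
        $last = '';
    }

    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        $html .= render_node($child, $last);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadXML('<sites><site><title>PHP.net</title><url>http://www.php.net/</url>'
    . '<description>The homepage of PHP.</description></site></sites>');
print render_node($doc->documentElement) . "\n";
```

Returning a string instead of printing makes the function easier to check in isolation; switching back to print statements would bring it closer to the recipe.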

If you want to parse raw XML data, you can use the xmldoc() function, which behaves the same way as the xmldocfile() function except that it accepts XML data as the first argument instead of the name of an XML file:

    <?php
    $doc = xmldoc('<sites>
        <site>
            <title>PHP.net</title>
            <url>http://www.php.net/</url>
            <description>
                The homepage of PHP.
            </description>
            <keywords>
                MySQL, PHP, Documentation, downloads, articles, books
            </keywords>
        </site>
    </sites>');

    $root = $doc->root();

    // The same process_node() as in the solution
    process_node($root);
    ?>
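On PHP 5 or later, the same file-versus-string distinction exists in the bundled DOM extension as DOMDocument::load() and DOMDocument::loadXML(). The sketch below covers only the string case; load_xml_string() is a hypothetical wrapper, and its error handling is one possible choice rather than anything the recipe prescribes.

```php
<?php
// Assumption: DOMDocument (PHP 5+) in place of xmldoc() and xmldocfile().
// load_xml_string() is a hypothetical wrapper around DOMDocument::loadXML().
function load_xml_string(string $xml): DOMDocument
{
    $doc = new DOMDocument();
    // Collect libxml errors internally so malformed input does not emit warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new InvalidArgumentException('Could not parse XML string');
    }
    return $doc;
}

$doc = load_xml_string('<sites><site><title>PHP.net</title></site></sites>');
print $doc->documentElement->nodeName . "\n";
```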