Recipe 12.4. Parsing Complex XML Documents | PHP Cookbook: Solutions and Examples for PHP Programmers

12.4.1. Problem

You have a complex XML document, such as one where you need to introspect the document to determine its schema, or you need to use more esoteric XML features, such as processing instructions or comments.

12.4.2. Solution

Use the DOM extension. It provides a complete interface to all aspects of the XML specification.

<?php
$dom = new DOMDocument;
$dom->load('address-book.xml');

foreach ($dom->getElementsByTagName('person') as $person) {
    $firstname = $person->getElementsByTagName('firstname');
    $firstname_text_value = $firstname->item(0)->firstChild->nodeValue;

    $lastname = $person->getElementsByTagName('lastname');
    $lastname_text_value = $lastname->item(0)->firstChild->nodeValue;

    print "$firstname_text_value $lastname_text_value\n";
}
?>

David Sklar
Adam Trachtenberg

12.4.3. Discussion

The W3C's DOM provides a platform- and language-neutral method that specifies the structure and content of a document. Using the DOM, you can read an XML document into a tree of nodes and then maneuver through the tree to locate information about a particular element or elements that match your criteria. This is called tree-based parsing.
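As a minimal sketch of tree-based parsing, the following code loads a small document and walks its node tree recursively. The sample XML string and the helper name outline_elements() are assumptions made for illustration; they are not part of this recipe's address book.

```php
<?php
// Hypothetical sample document, kept on one line so that it contains
// no whitespace-only text nodes.
$xml = '<people><person><firstname>David</firstname>'
     . '<lastname>Sklar</lastname></person></people>';

$dom = new DOMDocument;
$dom->loadXML($xml);

// Hypothetical helper: return one indented line per element in the tree.
function outline_elements(DOMNode $node, $depth = 0) {
    $lines = array();
    if ($node->nodeType == XML_ELEMENT_NODE) {
        $lines[] = str_repeat('  ', $depth) . $node->nodeName;
        $depth++;
    }
    // Text nodes have no children, so check before descending.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, outline_elements($child, $depth));
        }
    }
    return $lines;
}

print implode("\n", outline_elements($dom)) . "\n";
```

Starting the walk at the document node itself means the root element is handled by the same code path as every other element.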

Additionally, you can modify the structure by creating, editing, and deleting nodes. In fact, you can use the DOM functions to author a new XML document from scratch; see Recipe 12.2.
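The three kinds of change might look like the following sketch. The sample document, with its misspelled title and its draft element, is invented for this illustration.

```php
<?php
// Hypothetical sample document with a typo in the title and a stray element.
$dom = new DOMDocument;
$dom->loadXML('<book cover="soft"><title>PHP Cookbok</title><draft/></book>');

$book = $dom->documentElement;

// Edit: replace the text inside the existing <title> element.
$title = $book->getElementsByTagName('title')->item(0);
$title->firstChild->nodeValue = 'PHP Cookbook';

// Create: build a new <author> element holding a text node and append it.
$author = $dom->createElement('author');
$author->appendChild($dom->createTextNode('Sklar'));
$book->appendChild($author);

// Delete: remove the <draft> element by way of its parent.
$draft = $book->getElementsByTagName('draft')->item(0);
$draft->parentNode->removeChild($draft);

// Passing a node to saveXML() serializes only that subtree.
print $dom->saveXML($book) . "\n";
```

New nodes are created through the document object and only become part of the tree once they are appended to a parent.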

One of the major advantages of the DOM is that, because implementations follow the W3C's specification, many languages expose DOM functions in a similar manner. Therefore, the work of translating logic and instructions from one application to another is considerably simplified. PHP 5 comes with a new series of DOM methods that comply more strictly with the DOM standard than those in previous versions of PHP.

The DOM is large and complex. For more information, read the specification at http://www.w3.org/DOM/ or pick up a copy of XML in a Nutshell.

DOM functions in PHP are object oriented. To move from one node to another, access properties such as $node->childNodes, which contains a DOMNodeList of child node objects, and $node->parentNode, which contains the parent node object. To process a node, check its type and read the properties that type provides, as shown in Example 12-5.

Example 12-5. Parsing a DOM object

<?php
// $node is a node from the parsed element <book cover="soft">PHP Cookbook</book>
$type = $node->nodeType;

switch ($type) {
case XML_ELEMENT_NODE:
    // I'm a tag. I have a tagName property.
    print $node->tagName;  // prints the tagName property: "book"
    break;
case XML_ATTRIBUTE_NODE:
    // I'm an attribute. I have a name and a value property.
    print $node->name;  // prints the name property: "cover"
    print $node->value; // prints the value property: "soft"
    break;
case XML_TEXT_NODE:
    // I'm a piece of text inside an element.
    // I have nodeName and nodeValue properties.
    print $node->nodeName;  // prints the nodeName property: "#text"
    print $node->nodeValue; // prints the text content: "PHP Cookbook"
    break;
default:
    // another type
    break;
}
?>
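The same type check extends to the more esoteric node types mentioned in the Problem, such as comments and processing instructions. The sketch below assumes a hypothetical document and a hypothetical helper named describe_node(). Note that attributes are not part of childNodes; they are reached through an element's attributes property.

```php
<?php
// Hypothetical document: a processing instruction, a comment,
// and a root element with one attribute.
$dom = new DOMDocument;
$dom->loadXML('<?xml-stylesheet href="books.css"?>'
            . '<!-- card catalog -->'
            . '<book cover="soft">PHP Cookbook</book>');

// Hypothetical helper: summarize a node according to its type.
function describe_node(DOMNode $node) {
    switch ($node->nodeType) {
    case XML_ELEMENT_NODE:
        return 'element: ' . $node->tagName;
    case XML_ATTRIBUTE_NODE:
        return 'attribute: ' . $node->name . '=' . $node->value;
    case XML_COMMENT_NODE:
        return 'comment: ' . trim($node->data);
    case XML_PI_NODE:
        return 'pi: ' . $node->target . ' => ' . $node->data;
    default:
        return 'other: ' . $node->nodeName;
    }
}

$descriptions = array();

// Top-level children of the document, in document order.
foreach ($dom->childNodes as $child) {
    $descriptions[] = describe_node($child);
}

// Attributes of the root element live in a separate node map.
foreach ($dom->documentElement->attributes as $attribute) {
    $descriptions[] = describe_node($attribute);
}

print implode("\n", $descriptions) . "\n";
```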

To search a DOM tree for specific elements, use getElementsByTagName(). Example 12-6 shows a card catalog with multiple book records to search.

Example 12-6. Card catalog in XML

<books>
    <book>
        <title>PHP Cookbook</title>
        <author>Sklar</author>
        <author>Trachtenberg</author>
        <subject>PHP</subject>
    </book>
    <book>
        <title>Perl Cookbook</title>
        <author>Christiansen</author>
        <author>Torkington</author>
        <subject>Perl</subject>
    </book>
</books>

Example 12-7 shows how to find all authors.

Example 12-7. Printing all authors using DOM

// $dom is a DOMDocument holding the card catalog shown above

// find and print all authors
$authors = $dom->getElementsByTagName('author');

// loop through author elements
foreach ($authors as $author) {
    // childNodes holds the author values
    $text_nodes = $author->childNodes;

    foreach ($text_nodes as $text) {
        print $text->nodeValue . "\n";
    }
}

Sklar
Trachtenberg
Christiansen
Torkington

The getElementsByTagName() method returns a DOMNodeList of element node objects, which you can loop over with foreach. By looping through each element's children, you can get to the text node associated with that element. From there, you can pull out the node values, which in this case are the names of the book authors, such as Sklar and Trachtenberg.
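Because getElementsByTagName() can also be called on an element, the search can be limited to one subtree. As a sketch of that idea, the following code groups the authors under their book's title. The catalog is inlined as a string so the example is self-contained, and the variable name $authors_by_title is an assumption made for this illustration.

```php
<?php
// The card catalog from this recipe, inlined so the sketch is self-contained.
$xml = <<<XML
<books>
    <book>
        <title>PHP Cookbook</title>
        <author>Sklar</author>
        <author>Trachtenberg</author>
        <subject>PHP</subject>
    </book>
    <book>
        <title>Perl Cookbook</title>
        <author>Christiansen</author>
        <author>Torkington</author>
        <subject>Perl</subject>
    </book>
</books>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);

$authors_by_title = array();

foreach ($dom->getElementsByTagName('book') as $book) {
    // Calling the method on $book searches only that book's descendants.
    $title = $book->getElementsByTagName('title')->item(0)->firstChild->nodeValue;

    $names = array();
    foreach ($book->getElementsByTagName('author') as $author) {
        $names[] = $author->firstChild->nodeValue;
    }
    $authors_by_title[$title] = $names;
}

foreach ($authors_by_title as $title => $names) {
    print $title . ': ' . implode(', ', $names) . "\n";
}
```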

12.4.4. See Also

Recipe 12.3 for parsing simple XML documents; Recipe 12.5 for parsing large XML documents; documentation on DOM at http://www.php.net/dom; more information about the underlying libxml2 C library at http://xmlsoft.org/.