5.6 Before and After: Reading XML into a Tree


The most common XML task is reading an XML document and printing out its contents. DOM and SimpleXML store XML documents in trees. This lets you easily maneuver through the document to find information: if you know where a node is located, you can access it directly.

5.6.1 Reading an Address Book

The following programs take an XML document, read it into a tree, and then find a set of nodes. They all use the example address book from Example 5-1 as their data, search for all the person nodes, and then print out everyone's first and last name.

The first two versions use DOM, one written in PHP 4 and the other in PHP 5. There is also a SimpleXML version, which is considerably shorter.

5.6.1.1 PHP 4 and DOM

Example 5-5 demonstrates the PHP 4 DOM extension.

Example 5-5. Reading XML with PHP 4 and DOM
    $dom = domxml_open_file('address-book.xml');

    foreach ($dom->get_elements_by_tagname('person') as $person) {
        // Each lookup returns an array; the text lives in the element's first child
        $firstname = $person->get_elements_by_tagname('firstname');
        $firstname_text = $firstname[0]->first_child( );
        $firstname_text_value = $firstname_text->node_value( );

        $lastname = $person->get_elements_by_tagname('lastname');
        $lastname_text = $lastname[0]->first_child( );
        $lastname_text_value = $lastname_text->node_value( );

        print "$firstname_text_value $lastname_text_value\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

Under PHP 4, DOM method names use underscores (_) to separate words, such as get_elements_by_tagname( ) and first_child( ).

To find all elements with a given name, use get_elements_by_tagname( ). This returns an array of elements through which you can iterate. The foreach loops over each <person> element. Within this loop, there are two steps: retrieving the firstname and lastname elements and then printing out their values.

The $person DOM object is just a subtree, or a portion, of the original DOM object, but it is also a full-featured node of its own. Therefore, you can still call $person->get_elements_by_tagname('firstname') to find all firstname elements, just as you earlier called $dom->get_elements_by_tagname('person') to locate the person elements. However, in this case, you only look through the elements in $person instead of the entire XML document.

Since get_elements_by_tagname( ) returns an array of DOM objects, and here the array contains only one element, that element is referenced with $firstname[0].

PHP 4 can't call a method directly on the object returned by another method call; therefore, it takes two lines to grab an element's text. You must do $firstname_text = $firstname[0]->first_child( ) and then $firstname_text_value = $firstname_text->node_value( ).

5.6.1.2 PHP 5 and DOM

DOM in PHP 5 uses objects instead of resources, as shown in Example 5-6.

Example 5-6. Reading XML with PHP 5 and DOM
    $dom = new DOMDocument;
    $dom->load('address-book.xml');

    foreach ($dom->getElementsByTagname('person') as $person) {
        // item(0) is the first match; its first child is the text node
        $firstname = $person->getElementsByTagname('firstname');
        $firstname_text_value = $firstname->item(0)->firstChild->nodeValue;

        $lastname = $person->getElementsByTagname('lastname');
        $lastname_text_value = $lastname->item(0)->firstChild->nodeValue;

        print "$firstname_text_value $lastname_text_value\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

This is similar to the PHP 4 example. Method names, however, are altered to use the studlyCaps naming convention. So, get_elements_by_tagname( ) becomes getElementsByTagname( ).

Such a trivial change seems designed merely to frustrate programmers. However, the DOM standard uses studlyCaps and so do the DOM methods in every other language, including C, Java, Perl, and Python. Therefore, the methods were renamed to bring them into line with this convention.

The getElementsByTagname( ) method now returns a DOM node list instead of an array. Therefore, you access the elements using the item( ) method. For example:

    // PHP 4
    $firstname[0];

    // PHP 5
    $firstname->item(0);

Two changes result from PHP 5's improved object model. You no longer need to use a temporary variable to hold an object before accessing one of its methods. So, to find the $firstname_text_value, you can write $firstname->item(0)->firstChild->nodeValue. Also, firstChild and nodeValue are now properties instead of methods, so they don't take ( )s.
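Because a node list is not an array, it can help to check its length property before calling item( ), which returns NULL for an index past the end of the list. The following is a minimal sketch rather than one of the original examples: the inline XML is an assumed stand-in for address-book.xml in which the second person deliberately has no lastname element, and the helper name firstText( ) is hypothetical.

```php
<?php
// Assumed stand-in for address-book.xml; the second <person> has no <lastname>.
$xml = <<<XML
<address-book>
    <person>
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
    </person>
    <person>
        <firstname>Zeev</firstname>
    </person>
</address-book>
XML;

// Hypothetical helper: text of the first <$tag> inside $node, or $default if absent.
// It assumes that a matching element, when present, contains text.
function firstText($node, $tag, $default = '') {
    $list = $node->getElementsByTagName($tag);   // a DOMNodeList, not an array
    if ($list->length == 0) {
        return $default;
    }
    // Chained access: no temporary variables are needed in PHP 5.
    return $list->item(0)->firstChild->nodeValue;
}

$dom = new DOMDocument;
$dom->loadXML($xml);

$names = array();
foreach ($dom->getElementsByTagName('person') as $person) {
    $names[] = firstText($person, 'firstname') . ' '
             . firstText($person, 'lastname', '(unknown)');
}

print implode("\n", $names) . "\n";
```

Without the length check, calling ->firstChild on the NULL returned by item(0) would fail for the second person.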

Another way to read XML with PHP 5 and DOM is to create a custom class. Since the DOMDocument is now a PHP object, you can extend it and create your own methods. Example 5-7 creates an addressBook class that contains a printNames( ) method.

Example 5-7. Reading XML with PHP 5 and DOM by subclassing DOMDocument
    class addressBook extends DOMDocument {
        public function printNames( ) {
            foreach ($this->getElementsByTagname('person') as $person) {
                $firstname = $person->getElementsByTagname('firstname');
                $firstname_text_value = $firstname->item(0)->firstChild->nodeValue;

                $lastname = $person->getElementsByTagname('lastname');
                $lastname_text_value = $lastname->item(0)->firstChild->nodeValue;

                print "$firstname_text_value $lastname_text_value\n";
            }
        }
    }

    $ab = new addressBook;
    $ab->load('address-book.xml');
    $ab->printNames( );

    Rasmus Lerdorf
    Zeev Suraski

The addressBook class has access to all the methods and properties of DOMDocument, so you interact with it just like a standard DOM object.

You can also create new methods, such as printNames( ). The only difference between the code inside printNames( ) and the previous example is that you now refer to the DOM object as $this.

This makes it very easy to write a set of DOM manipulation methods that help you navigate through the object while keeping your ability to invoke a particular DOM method when necessary.
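As one illustration of this idea, a subclass can factor the repeated lookup into a helper and return data instead of printing it. This is only a sketch: the class name NameList and the methods textOf( ) and getNames( ) are hypothetical, and the inline XML is an assumed stand-in for address-book.xml.

```php
<?php
// Hypothetical subclass; the class and method names are illustrative only.
class NameList extends DOMDocument {
    // Return the text of the first <$tag> element below $node, or '' if none exists.
    protected function textOf(DOMElement $node, $tag) {
        $match = $this->firstMatch($node, $tag);
        return $match ? $match->nodeValue : '';
    }

    // Return the first <$tag> element below $node, or NULL if there is none.
    protected function firstMatch(DOMElement $node, $tag) {
        return $node->getElementsByTagName($tag)->item(0);
    }

    // Collect "firstname lastname" strings instead of printing them.
    public function getNames() {
        $names = array();
        foreach ($this->getElementsByTagName('person') as $person) {
            $names[] = $this->textOf($person, 'firstname') . ' '
                     . $this->textOf($person, 'lastname');
        }
        return $names;
    }
}

$ab = new NameList;
// Assumed stand-in for address-book.xml
$ab->loadXML(
    '<address-book>'
    . '<person><firstname>Rasmus</firstname><lastname>Lerdorf</lastname></person>'
    . '<person><firstname>Zeev</firstname><lastname>Suraski</lastname></person>'
    . '</address-book>'
);

$names = $ab->getNames();
print implode("\n", $names) . "\n";

// Standard DOMDocument members remain available on the subclass.
$root = $ab->documentElement->tagName;
```

Returning an array keeps the DOM traversal separate from the output, so the same method can feed a template, a test, or a plain print loop.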

5.6.1.3 PHP 5 and SimpleXML

The SimpleXML version is the shortest, as shown in Example 5-8.

Example 5-8. Reading XML with PHP 5 and SimpleXML
    $sx = simplexml_load_file('address-book.xml');

    foreach ($sx->person as $person) {
        $firstname_text_value = $person->firstname;
        $lastname_text_value = $person->lastname;

        print "$firstname_text_value $lastname_text_value\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

When you use SimpleXML, there's no need to call a method to retrieve a set of elements with the same tag name. Instead, you can directly iterate over them using foreach. Here, the iteration occurs over $sx->person, which holds all the person nodes.

You can also directly print SimpleXML objects:

    foreach ($sx->person as $person) {
        print "$person->firstname $person->lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

PHP interpolates SimpleXML objects inside of quoted strings and retrieves the text stored in them.
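Outside of a quoted string, the values are still SimpleXML objects, so an explicit (string) cast is a reasonable habit when you store or compare them. The sketch below assumes an inline document standing in for address-book.xml; the variable names are illustrative.

```php
<?php
// Assumed stand-in for address-book.xml
$sx = simplexml_load_string(
    '<address-book>'
    . '<person><firstname>Rasmus</firstname><lastname>Lerdorf</lastname></person>'
    . '<person><firstname>Zeev</firstname><lastname>Suraski</lastname></person>'
    . '</address-book>'
);

$lastnames = array();
foreach ($sx->person as $person) {
    // $person->lastname is a SimpleXMLElement object, not a string.
    // Cast it before storing it, comparing it with ===, or using it as an array key.
    $lastnames[] = (string) $person->lastname;
}

// Elements with the same name can also be reached by position.
$first = $sx->person[0]->firstname;
$is_object = ($first instanceof SimpleXMLElement);
$is_rasmus = ((string) $first === 'Rasmus');

print implode(', ', $lastnames) . "\n";
```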

5.6.2 Reading an Address Book with Namespaces

Reading namespaced elements is similar to reading global elements, but in addition to the element's local name, you must also provide the namespace.

The following examples use the revised XML address book from Example 5-2, which places everything in a namespace.

5.6.2.1 PHP 4 and DOM

There is no easy way to get all namespaced elements in PHP 4 if you just use DOM. The best way to solve this problem is to use DOM in conjunction with XPath. See Section 5.7, later in this chapter, for more details.

5.6.2.2 PHP 5 and DOM

PHP 5 has a special DOM method to retrieve elements within a namespace: getElementsByTagnameNS( ). To search a namespaced document, replace the calls to getElementsByTagname( ) with getElementsByTagnameNS( ). Pass the namespace URL as the first argument and the element's local name as the second argument, as shown in Example 5-9.

Example 5-9. Reading namespaced XML with PHP 5 and DOM
    $ab = 'http://www.example.com/address-book/';

    $dom = new DOMDocument;
    $dom->load('address-book-ns.xml');

    foreach ($dom->getElementsByTagnameNS($ab, 'person') as $person) {
        $firstname = $person->getElementsByTagnameNS($ab, 'firstname');
        $firstname_text_value = $firstname->item(0)->firstChild->nodeValue;

        $lastname = $person->getElementsByTagnameNS($ab, 'lastname');
        $lastname_text_value = $lastname->item(0)->firstChild->nodeValue;

        print "$firstname_text_value $lastname_text_value\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

XML documents use a namespace prefix to handle the problem of bulky URLs. Since DOM methods use the actual name instead of a prefix, using a variable as a surrogate prefix makes code easier to read and ensures requests always use the same namespace.

Apart from the $ab variable and the switch to getElementsByTagnameNS( ), this code is identical to the non-namespaced version in Example 5-6.
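Since the lookup is keyed on the namespace URL and the local name, the prefix chosen by the document's author does not affect the result. The following sketch illustrates this with two assumed inline documents, one using a default namespace and one using an ab: prefix; neither is the actual address-book-ns.xml, and the function name namespacedNames( ) is hypothetical.

```php
<?php
$ab = 'http://www.example.com/address-book/';

// Assumed stand-in documents: the same data, once with a default namespace...
$default_ns = '<address-book xmlns="' . $ab . '">'
    . '<person><firstname>Rasmus</firstname><lastname>Lerdorf</lastname></person>'
    . '</address-book>';

// ...and once with an explicit prefix.
$prefixed = '<ab:address-book xmlns:ab="' . $ab . '">'
    . '<ab:person><ab:firstname>Rasmus</ab:firstname>'
    . '<ab:lastname>Lerdorf</ab:lastname></ab:person>'
    . '</ab:address-book>';

// Hypothetical helper: collect "firstname lastname" for every person in namespace $ns.
function namespacedNames($xml, $ns) {
    $dom = new DOMDocument;
    $dom->loadXML($xml);

    $names = array();
    foreach ($dom->getElementsByTagNameNS($ns, 'person') as $person) {
        $first = $person->getElementsByTagNameNS($ns, 'firstname')
                        ->item(0)->firstChild->nodeValue;
        $last  = $person->getElementsByTagNameNS($ns, 'lastname')
                        ->item(0)->firstChild->nodeValue;
        $names[] = "$first $last";
    }
    return $names;
}

$from_default  = namespacedNames($default_ns, $ab);
$from_prefixed = namespacedNames($prefixed, $ab);

// A different namespace URL matches nothing.
$from_other = namespacedNames($prefixed, 'http://www.example.com/other/');

print implode("\n", $from_prefixed) . "\n";
```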

5.6.2.3 PHP 5 and SimpleXML

SimpleXML assumes that if you're reading an XML document with namespaces, then they were probably forced upon you by some namespace tyrant and you're wishing that the namespaces would just disappear. So it does its best to grant your request.

The code to read the namespaced version of the address book is almost identical to the non-namespaced SimpleXML version in Example 5-8, except that you specify the namespace with the children( ) method. This tells SimpleXML to look for elements inside a namespace. In Example 5-10, this method is used inside the foreach to find person elements.

Example 5-10. Reading namespaced XML with PHP 5 and SimpleXML
    $ab = 'http://www.example.com/address-book/';

    $sx = simplexml_load_file('address-book-ns.xml');

    foreach ($sx->children($ab)->person as $person) {
        $firstname_text_value = $person->firstname;
        $lastname_text_value = $person->lastname;

        print "$firstname_text_value $lastname_text_value\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

The children( ) method takes a single argument, the namespace URL. The code $sx->children($ab)->person is equivalent to $sx->person, except that the call to children($ab) makes SimpleXML return only person elements inside of the http://www.example.com/address-book/ namespace.

Once the namespace has been set, SimpleXML remembers it and assumes any further children also live in that namespace. Therefore, you can access $person->firstname without calling children( ) again, but it is okay to do so:

    $sx = simplexml_load_file('address-book-ns.xml');
    $ab = 'http://www.example.com/address-book/';

    foreach ($sx->children($ab)->person as $person) {
        $firstname_text_value = $person->children($ab)->firstname;
        $lastname_text_value = $person->children($ab)->lastname;

        print "$firstname_text_value $lastname_text_value\n";
    }
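To try the explicit form without a file on disk, the sketch below loads an assumed inline document that uses an ab: prefix (it is not the actual address-book-ns.xml) and calls children( ) at each level. The $names array is added here only so the result can be inspected.

```php
<?php
$ab = 'http://www.example.com/address-book/';

// Assumed stand-in for address-book-ns.xml, using an explicit ab: prefix
$xml = <<<XML
<ab:address-book xmlns:ab="http://www.example.com/address-book/">
    <ab:person>
        <ab:firstname>Rasmus</ab:firstname>
        <ab:lastname>Lerdorf</ab:lastname>
    </ab:person>
    <ab:person>
        <ab:firstname>Zeev</ab:firstname>
        <ab:lastname>Suraski</ab:lastname>
    </ab:person>
</ab:address-book>
XML;

$sx = simplexml_load_string($xml);

$names = array();
// children() is given the namespace URL, not the ab: prefix used in the document.
foreach ($sx->children($ab)->person as $person) {
    $firstname = (string) $person->children($ab)->firstname;
    $lastname  = (string) $person->children($ab)->lastname;
    $names[] = "$firstname $lastname";
}

print implode("\n", $names) . "\n";
```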



Upgrading to PHP 5
ISBN: 0596006365
EAN: 2147483647
Year: 2004
Pages: 144