5.7 Before and After: Searching XML with XPath


Except for the simplest documents, it's rarely easy to access the data you want one element at a time. As your XML files become increasingly complex and your parsing needs grow, using XPath is easier than filtering the data inside a foreach loop.

Both PHP 4 and PHP 5 let you create an XPath object from a DOM object. You can then search the object and receive DOM nodes in reply. SimpleXML also supports XPath, and it's easier to use because it's integrated into the SimpleXML object.

Whether you're using XPath with DOM or SimpleXML, both extensions support XML Namespaces. Therefore, you can find elements or attributes that live inside a namespace.

5.7.1 Reading an Address Book

Everything that can be done with the traditional DOM methods can also be done using XPath. The examples in this section also show how to search the address book.

5.7.1.1 PHP 4 and DOM

To create an XPath query under PHP 4, you must start with a DOM object and pass it off to an XPath context.

Here's how to retrieve the email addresses of everyone in the address book:

    $dom = domxml_open_file('address-book.xml');
    $xpath = xpath_new_context($dom);
    $emails = $xpath->xpath_eval('/address-book/person/email');

    // Can also be:
    // $emails = xpath_eval($xpath, '/address-book/person/email');

    foreach ($emails->nodeset as $e) {
        $tmp = $e->first_child( );
        $email = $tmp->node_value( );
        // do something with $email
    }

After creating a new DOM object, call xpath_new_context( ) to initialize the XPath context. Query this context using xpath_eval( ), passing the XPath query as the first parameter (in this example, it's /address-book/person/email). The method returns an object whose nodeset property holds an array of matching DOM nodes. Iterate through nodeset to act upon each node in turn.

By default, xpath_eval( ) operates upon the entire XML document. Search a subsection of the tree by passing in the subtree as a final parameter to xpath_eval( ). For instance, to gather all the first and last names of people in the address book, retrieve all the person nodes and query each node individually, as in Example 5-11.

Example 5-11. Using XPath with PHP 4 and DOM
    $dom = domxml_open_file('address-book.xml');
    $xpath = xpath_new_context($dom);
    $person = $xpath->xpath_eval('/address-book/person');

    foreach ($person->nodeset as $p) {
        // Passing $p as the second argument limits the search to this person
        $fn = $xpath->xpath_eval('firstname', $p);
        $tmp = $fn->nodeset[0]->first_child( );
        $firstname = $tmp->node_value( );

        $ln = $xpath->xpath_eval('lastname', $p);
        $tmp = $ln->nodeset[0]->first_child( );
        $lastname = $tmp->node_value( );

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

Inside the foreach, call xpath_eval( ) to retrieve the firstname and lastname nodes. Now, in addition to the XPath query, also pass $p to the method. This makes the search local to the node.

5.7.1.2 PHP 5 and DOM

DOM supports XPath queries, but again you do not perform the query directly on the DOM object itself. In keeping with PHP 5's superior OO capabilities, instead of using the xpath_new_context( ) function, you create a DOMXPath object:

    $dom = new DOMDocument;
    $dom->load('address-book.xml');
    $xpath = new DOMXPath($dom);
    $email = $xpath->query('/address-book/person/email');

Instantiate DOMXPath by passing in a DOMDocument to the constructor. To execute the XPath query, call query( ) with the query text as your argument. This returns an iterable DOM node list of matching nodes. The email example is now:

    $dom = new DOMDocument;
    $dom->load('address-book.xml');
    $xpath = new DOMXPath($dom);
    $emails = $xpath->query('/address-book/person/email');

    foreach ($emails as $e) {
        $email = $e->firstChild->nodeValue;
        // do something with $email
    }

The fundamental logic is the same for this example and the PHP 4 DOM version; however, this code is cleaner. The XPath querying method no longer places the node list in a nodeset array element, so you iterate directly over the returned list.

Also, the PHP 4 example requires a temporary variable, $tmp, to hold the result of $e->first_child( ). With PHP 5, you can access $e->firstChild->nodeValue directly.

The more complex example, where you retrieve firstname and lastname, is significantly shorter than the PHP 4 version, as shown in Example 5-12.

Example 5-12. Using XPath with PHP 5 and DOM
    $dom = new DOMDocument;
    $dom->load('address-book.xml');
    $xpath = new DOMXPath($dom);
    $person = $xpath->query('/address-book/person');

    foreach ($person as $p) {
        // The second argument makes each query relative to the current person
        $fn = $xpath->query('firstname', $p);
        $firstname = $fn->item(0)->firstChild->nodeValue;

        $ln = $xpath->query('lastname', $p);
        $lastname = $ln->item(0)->firstChild->nodeValue;

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

Again, the temporary variable is eliminated, as well as the need to reference nodeset. However, the syntax to restrict the query to a subset of the tree is the same as in the PHP 4 version: you still pass the subtree in as a second parameter.
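The effect of that second parameter is easiest to see by comparing a relative path with an absolute one. The following is a minimal sketch, not part of the original examples; it assumes an inline, trimmed-down copy of the address book (loaded with loadXML( ) so no file is needed), and the variable names $relative and $absolute are illustrative only.

```php
<?php
// Assumption: a shortened inline copy of the address book from Example 5-1.
$xml = <<<XML
<address-book>
    <person id="1">
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
    </person>
    <person id="2">
        <firstname>Zeev</firstname>
        <lastname>Suraski</lastname>
    </person>
</address-book>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$relative = array();
$absolute = array();

foreach ($xpath->query('/address-book/person') as $p) {
    // Relative path: evaluated against $p, so it sees only this person's child
    $relative[] = $xpath->query('firstname', $p)->length;

    // Absolute path: starts at the document root, regardless of the context node
    $absolute[] = $xpath->query('/address-book/person/firstname', $p)->length;
}

print implode(',', $relative) . "\n";   // 1,1
print implode(',', $absolute) . "\n";   // 2,2
```

A query that begins with a slash always starts from the document root, so the context node only matters for relative paths such as firstname.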

5.7.1.3 PHP 5 and SimpleXML

In contrast to DOM, all SimpleXML objects have an integrated xpath( ) method. Calling this method queries the current object using XPath and returns an array of SimpleXML objects containing the matching nodes, so you don't need to instantiate another object to use XPath. The method's one argument is your XPath query.

To find all the matching email addresses in Example 5-1's sample address book:

    $s = simplexml_load_file('address-book.xml');
    $emails = $s->xpath('/address-book/person/email');

    foreach ($emails as $email) {
        // do something with $email
    }

This is shorter because there's no need to dereference firstChild or to take the nodeValue.

SimpleXML handles the more complicated example, too. Since xpath( ) returns SimpleXML objects, you can query them directly, as in Example 5-13.

Example 5-13. Using XPath with PHP 5 and SimpleXML
    $s = simplexml_load_file('address-book.xml');
    $people = $s->xpath('/address-book/person');

    foreach($people as $p) {
        list($firstname) = $p->xpath('firstname');
        list($lastname) = $p->xpath('lastname');

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

Since each inner XPath query returns an array holding only one element, use list( ) to grab that element from the array.
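The list( ) shortcut presumes that every query matches something. When an element may be missing, it can be safer to test the returned array first. The sketch below is illustrative only: the inline XML is a shortened stand-in for the address book, and the nickname element is a hypothetical name chosen because it does not exist in the data.

```php
<?php
// Assumption: a shortened inline copy of the address book from Example 5-1.
$xml = <<<XML
<address-book>
    <person id="1">
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
    </person>
    <person id="2">
        <firstname>Zeev</firstname>
        <lastname>Suraski</lastname>
    </person>
</address-book>
XML;

$s = simplexml_load_string($xml);

$rows = array();
foreach ($s->xpath('/address-book/person') as $p) {
    $first = $p->xpath('firstname');
    $nick  = $p->xpath('nickname');   // hypothetical element, matches nothing

    // An empty (or false) result is falsy, so check before reading index 0
    $rows[] = array(
        'first' => $first ? (string) $first[0] : null,
        'nick'  => $nick  ? (string) $nick[0]  : null,
    );
}

foreach ($rows as $row) {
    $nick = ($row['nick'] === null) ? '(no nickname)' : $row['nick'];
    print $row['first'] . ' ' . $nick . "\n";
}
```

Casting each match to a string copies the text out of the SimpleXML object, which keeps the collected values plain PHP strings.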

5.7.2 Reading an Address Book with Namespaces

When your elements and attributes live inside an XML namespace, you must indicate this in your XPath request. To refer to a namespaced element in XPath, prepend a namespace prefix and colon to the local name.

The revised XPath query to find email addresses is as follows:

    /ab:address-book/ab:person/ab:email

Of course, this assumes that your XPath processor knows to map the prefix ab to the namespace http://www.example.com/address-book/. Sometimes this is done automatically; other times, you must do this in your program.

In PHP, every XPath extension provides a function that lets you associate prefixes and namespace URLs. In PHP 4, you have to call this function to register all namespaces. In PHP 5, both DOM and SimpleXML will automatically register any namespaces used in the document with their prefix.

Of course, you are always free to register another prefix for a namespace. This is a good idea if you don't produce all the XML documents you're consuming. If your data provider alters their namespace prefix, their document is still valid because a validator ignores prefixes and only examines the actual namespace URL. However, if you're relying on a specific prefix in your code, your application will break.
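As a rough illustration of that advice, the sketch below registers its own prefix with DOMXPath's registerNamespace( ) method, so the query keeps working even though the document uses a different prefix. The inline XML and the addr prefix are assumptions made up for this sketch; they stand in for a data provider that has renamed its prefix.

```php
<?php
// Assumption: the provider now uses the prefix "addr" instead of "ab".
// The namespace URL itself is unchanged.
$xml = <<<XML
<addr:address-book xmlns:addr="http://www.example.com/address-book/">
    <addr:person id="1">
        <addr:email>rasmus@php.net</addr:email>
    </addr:person>
    <addr:person id="2">
        <addr:email>zeev@php.net</addr:email>
    </addr:person>
</addr:address-book>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Bind our own prefix to the namespace URL; the document's prefix is irrelevant
$xpath->registerNamespace('ab', 'http://www.example.com/address-book/');

$emails = array();
foreach ($xpath->query('/ab:address-book/ab:person/ab:email') as $e) {
    $emails[] = $e->firstChild->nodeValue;
}

print implode("\n", $emails) . "\n";
```

Because the match is made on the namespace URL, the query text never needs to change when the provider picks a new prefix.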

Regardless of whether you're using PHP 4 or PHP 5, to use XPath to find elements living in the default namespace, you must always manually assign a prefix to the namespace inside your program and then use that prefix in your XPath query. This is not a limitation of PHP, but a design "feature" of the XPath specification.

Example 5-14 modifies Example 5-2 to use a default namespace instead of the prefix ab.

Example 5-14. Example default namespaced XML address book
    <address-book
        xmlns="http://www.example.com/address-book/">
        <person id="1">
            <!--Rasmus Lerdorf-->
            <firstname>Rasmus</firstname>
            <lastname>Lerdorf</lastname>
            <city>Sunnyvale</city>
            <state>CA</state>
            <email>rasmus@php.net</email>
        </person>
        <person id="2">
            <!--Zeev Suraski-->
            <firstname>Zeev</firstname>
            <lastname>Suraski</lastname>
            <city>Tel Aviv</city>
            <state></state>
            <email>zeev@php.net</email>
        </person>
    </address-book>

If you encounter an XML document like this, you cannot use the following XPath query:

 /address-book/person/email 

This query searches for non-namespaced elements, and your elements live in a namespace, even if they do not explicitly indicate this. You should create a prefix to use with the http://www.example.com/address-book/ namespace and use that in your queries. The code for this is shown in the following examples.

5.7.2.1 PHP 4 and DOM

Example 5-15 shows how to find first and last names in a namespaced XML document in PHP 4.

Example 5-15. Using XPath with PHP 4, DOM, and namespaces
    $dom = domxml_open_file('address-book-ns.xml');
    $xpath = xpath_new_context($dom);
    $xpath->xpath_register_ns('ab', 'http://www.example.com/address-book/');
    $person = $xpath->xpath_eval('/ab:address-book/ab:person');

    foreach ($person->nodeset as $p) {
        $fn = $xpath->xpath_eval('ab:firstname', $p);
        $tmp = $fn->nodeset[0]->first_child( );
        $firstname = $tmp->node_value( );

        $ln = $xpath->xpath_eval('ab:lastname', $p);
        $tmp = $ln->nodeset[0]->first_child( );
        $lastname = $tmp->node_value( );

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

There are two differences between this version and the non-namespaced one. First, there's a call to xpath_register_ns( ) to associate ab with the namespace. Second, wherever there are XPath queries, all elements are prefixed with ab: (for example, /ab:address-book/ab:person).

5.7.2.2 PHP 5 and DOM

In PHP 5, there are few differences between Example 5-16 and the non-namespaced version because the DOMXPath object automatically registers ab as a prefix.

Example 5-16. Using XPath with PHP 5, DOM, and namespaces
    $dom = new DOMDocument;
    $dom->load('address-book-ns.xml');
    $xpath = new DOMXPath($dom);
    $person = $xpath->query('/ab:address-book/ab:person');

    foreach ($person as $p) {
        $fn = $xpath->query('ab:firstname', $p);
        $firstname = $fn->item(0)->firstChild->nodeValue;

        $ln = $xpath->query('ab:lastname', $p);
        $lastname = $ln->item(0)->firstChild->nodeValue;

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

As you can see, the changes occur only in the XPath query syntax.

To register a namespace yourself, call the DOMXPath object's registerNamespace( ) method. For example, this registers ab:

 $xpath->registerNamespace('ab', 'http://www.example.com/address-book/'); 
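The same call is what makes a default-namespaced document such as Example 5-14 searchable with DOM. The following sketch is a hedged illustration rather than one of the book's listings: the XML is a shortened inline version of Example 5-14, and $plain is just an illustrative variable showing that the unprefixed query finds nothing.

```php
<?php
// Assumption: a shortened inline version of the default-namespaced
// address book from Example 5-14.
$xml = <<<XML
<address-book xmlns="http://www.example.com/address-book/">
    <person id="1">
        <email>rasmus@php.net</email>
    </person>
    <person id="2">
        <email>zeev@php.net</email>
    </person>
</address-book>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Unprefixed names match only elements in no namespace, so this finds nothing
$plain = $xpath->query('/address-book/person/email')->length;

// Assign a prefix of our choosing to the default namespace, then use it
$xpath->registerNamespace('ab', 'http://www.example.com/address-book/');

$emails = array();
foreach ($xpath->query('/ab:address-book/ab:person/ab:email') as $e) {
    $emails[] = $e->firstChild->nodeValue;
}

print "Unprefixed matches: $plain\n";
print implode("\n", $emails) . "\n";
```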

5.7.2.3 PHP 5 and SimpleXML

SimpleXML also registers namespaces for you, so it too requires minimal modifications, as shown in Example 5-17.

Example 5-17. Using XPath with PHP 5, SimpleXML, and namespaces
    $s = simplexml_load_file('address-book-ns.xml');
    $people = $s->xpath('/ab:address-book/ab:person');

    foreach($people as $p) {
        list($firstname) = $p->xpath('ab:firstname');
        list($lastname) = $p->xpath('ab:lastname');

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

You cannot currently register namespaces using SimpleXML. This means that the only way to access elements in the default namespace is to invoke a cumbersome XPath expression.

Example 5-18 shows how to query the address book when the elements live inside a default namespace, as in Example 5-14.

Example 5-18. Using XPath with PHP 5, SimpleXML, and default namespaces
    $s = simplexml_load_file('address-book-ns.xml');
    $ab = 'http://www.example.com/address-book/';

    $people = $s->xpath("/*[local-name( )='address-book' and
                             namespace-uri( )='$ab']
                         /*[local-name( )='person'       and
                             namespace-uri( )='$ab']");

    foreach($people as $p) {
        list($firstname) = $p->xpath("*[local-name( )='firstname' and
                                         namespace-uri( )='$ab']");
        list($lastname)  = $p->xpath("*[local-name( )='lastname'  and
                                         namespace-uri( )='$ab']");

        print "$firstname $lastname\n";
    }

    Rasmus Lerdorf
    Zeev Suraski

This complex XPath expression uses the local-name( ) and namespace-uri( ) XPath functions to search for nodes based on the namespace URL, instead of the more concise syntax that uses namespace prefixes.
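PHP releases later than the one this section describes added a registerXPathNamespace( ) method to SimpleXML objects, which allows the shorter prefixed syntax. The sketch below assumes such a release and a shortened inline copy of Example 5-14. It also repeats the registration on each result object, on the cautious assumption that a prefix registered on one object does not carry over to the objects that xpath( ) returns.

```php
<?php
// Assumption: a PHP release that provides SimpleXMLElement::registerXPathNamespace(),
// and a shortened inline copy of the address book from Example 5-14.
$xml = <<<XML
<address-book xmlns="http://www.example.com/address-book/">
    <person id="1">
        <firstname>Rasmus</firstname>
        <lastname>Lerdorf</lastname>
    </person>
    <person id="2">
        <firstname>Zeev</firstname>
        <lastname>Suraski</lastname>
    </person>
</address-book>
XML;

$ab = 'http://www.example.com/address-book/';
$s = simplexml_load_string($xml);
$s->registerXPathNamespace('ab', $ab);

$names = array();
foreach ($s->xpath('/ab:address-book/ab:person') as $p) {
    // Register again on each result object before querying it
    $p->registerXPathNamespace('ab', $ab);

    list($firstname) = $p->xpath('ab:firstname');
    list($lastname)  = $p->xpath('ab:lastname');

    $names[] = "$firstname $lastname";
}

print implode("\n", $names) . "\n";
```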

For more on XPath, see Section A.6.



Upgrading to PHP 5
ISBN: 0596006365
Year: 2004
Pages: 144
