Consuming Your First Feed | Professional Web APIs with PHP. eBay, Google, PayPal, Amazon, FedEx, Plus Web Feeds

Yahoo! produces a plethora of web feeds available to the world at large, so the first example looks at one of the Yahoo! feeds: http://rss.news.yahoo.com/rss/software. Take a quick look at that page in your browser and you will see the now-familiar sight of an XML document that provides a live feed. Apart from the header elements at the top of the feed, which provide information about the Yahoo! channel, you will see something like this:

    ...
    <item>
      <title>Astaro rolls out new spyware (InfoWorld)</title>
      <guid isPermaLink="false">infoworld/20050308/57432</guid>
      <pubDate>Tue, 08 Mar 2005 12:00:55 GMT</pubDate>
      <description>InfoWorld - Astaro on Tuesday released an improved version of its Linux-based security package that now includes gateway-based spyware protection against malware and the ability to block and removed infected software already on a system.</description>
    </item>
    <item>
      <title>Google Preps Enterprise-Ready Desktop Search (Ziff Davis)</title>
      <guid isPermaLink="false">zd/20050308/147231</guid>
      <pubDate>Tue, 08 Mar 2005 07:52:45 GMT</pubDate>
      <description>Ziff Davis - As free desktop search tools raise corporate security and policy concerns, Google says it wants to reach enterprises with its software for searching local data.</description>
    </item>
    ...

By the time you read this, the actual feed items will have changed; the start and end of the feed have also been left out for brevity. The important things to note here are the item tags, which enclose, among other things, link, pubDate, and description tags. This repetitive structure is a good candidate for an iterative process. But what do you do with it now? Somehow you need to take that XML and process it so that you can present it on your own web pages.

Luckily, PHP5 makes it pretty easy to consume web feeds using SimpleXML. Take a look at the following program:

    <?php
    $request = "http://rss.news.yahoo.com/rss/software";
    $response = file_get_contents($request);
    $xml = simplexml_load_string($response);
    echo "<h1>{$xml->channel->title}</h1>";
    foreach ($xml->channel->item as $story)
    {
        echo "<a href=\"$story->link\" title=\"\">$story->title</a><br>";
        echo "<p>$story->description</p><br><br>";
    }
    ?>

You begin by defining the request string, which is the URL of the web feed you want to consume. The call to simplexml_load_string() takes the document returned by file_get_contents() and creates an easily accessible object, whereby each element of the XML document can be accessed directly through the object references. As you can see, you can even iterate through the various item elements with a foreach loop, printing out each item's pertinent information in turn. This process is discussed in more detail later; this is just a basic demo for now.
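The same object-style access can be tried without a network connection. The following is a minimal sketch, not the book's own code: the function name renderFeed() and the sample feed are made up for illustration. It assumes that a failed parse should produce a fallback message, and that titles and links should be escaped before being echoed into a page.

```php
<?php
// Hypothetical helper (not from the book): turns an RSS 2.0 string into HTML.
function renderFeed($rss)
{
    // Collect parse errors internally instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($rss);
    if ($xml === false) {
        libxml_clear_errors();
        return "<p>Feed unavailable.</p>";
    }

    // Each XML element is reachable as an object property.
    $html = "<h1>" . htmlspecialchars((string) $xml->channel->title) . "</h1>\n";

    // Repeated <item> elements can be walked with foreach.
    foreach ($xml->channel->item as $story) {
        $link  = htmlspecialchars((string) $story->link);
        $title = htmlspecialchars((string) $story->title);
        $html .= "<a href=\"$link\">$title</a><br>\n";
    }
    return $html;
}

// Made-up sample data so the sketch runs without fetching a live feed.
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample Feed</title>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second &amp; last story</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
XML;

echo renderFeed($sample);
```

Checking the return value of simplexml_load_string() matters because it returns false when the document cannot be parsed, which is a realistic outcome whenever the remote feed is down or malformed.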

As you can tell, SimpleXML is pretty neat; it deals with the XML and provides you with a clean interface to the data. Unfortunately, it wasn't introduced until PHP5, so if you are still using PHP4, it isn't available to you. You do have some alternatives, however. MiniXML does the same sorts of things for you, although the interface it provides is a bit different. Here is the preceding example, converted to use MiniXML:

    <?php
    require("./minixml.inc.php");
    $request = "http://rss.news.yahoo.com/rss/software";
    $response = file_get_contents($request);
    $parsedDoc = new MiniXMLDoc();
    $parsedDoc->fromString($response);
    $rootEl =& $parsedDoc->getRoot();
    $title = $rootEl->getElementByPath('channel/title');
    echo "<h1>" . $title->getValue() . "</h1>";
    $returnedElement =& $rootEl->getElement('channel');
    $elChildren =& $returnedElement->getAllChildren();
    for ($i = 0; $i < $returnedElement->numChildren(); $i++)
    {
        if ($elChildren[$i]->name() == 'item')
        {
            $link = $elChildren[$i]->getElementByPath('link');
            $title = $elChildren[$i]->getElementByPath('title');
            $desc = $elChildren[$i]->getElementByPath('description');
            echo "<a href=\"" . $link->getValue() . "\">" . html_entity_decode($title->getValue()) . "</a><br>";
            echo "<p>" . html_entity_decode($desc->getValue()) . "</p><br><br>";
        }
    }
    ?>

Note

Throughout the book I have attempted to keep included files in the same directory as the script itself. This is done for simplicity but is actually a poor security practice. Allowing included files to exist within your document root can be a large problem, because executing those scripts outside of their normal context can have unexpected consequences on your site as a whole. Play it safe and keep include files outside of the document root; just give Apache (or whatever web server you are using) access to the directory in question.
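As a rough illustration of that advice, the sketch below builds the path to an include file that lives beside the document root rather than inside it. The directory layout and the helper name includePath() are assumptions made for this example only; adjust them to match your own server.

```php
<?php
// Hypothetical layout (an assumption, not taken from the book):
//   /var/www/site/public/feed.php            <- document root, served by Apache
//   /var/www/site/includes/minixml.inc.php   <- not reachable by URL
function includePath($scriptDir, $file)
{
    // Step up one level from the script's directory, then into includes/.
    // basename() discards any directory components supplied in $file.
    return dirname($scriptDir) . '/includes/' . basename($file);
}

// In a real script you might write:
//   require includePath(__DIR__, 'minixml.inc.php');
echo includePath('/var/www/site/public', 'minixml.inc.php'), "\n";
```

Because the includes directory sits outside the document root, a browser cannot request minixml.inc.php directly, yet the script can still load it as long as the web server user has read access to that directory.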

As you can see, the interfaces are in fact quite different, but the net result is the same. MiniXML is distributed under the GPL license, something to keep in mind if you intend to distribute projects based upon it. Psychogenic (creators of MiniXML) does have other licenses available if you need to distribute a project using it under other terms. The output for both scripts is identical (see Figure 3-1).

Figure 3-1