11.11 Finding Stale and Fresh Links


You want to find all links on a certain page and see which ones work and which ones don't.

Technique

Roll your own engine. First, find all the links on the page and then check each one to make sure that it works:

<?php
include_once 'Snoopy.class.inc';

$good_urls = array();
$bad_urls  = array();

// Use Snoopy to fetch all the links on yahoo.com
$snoopy = new Snoopy;
$snoopy->fetchlinks('http://www.yahoo.com/');
$links = $snoopy->results;

// Expand URLs that are not fully qualified
$links = expand_links($links, 'http://www.yahoo.com');

foreach ($links as $link) {
    if (check_link($link))
        array_push($good_urls, $link);
    else
        array_push($bad_urls, $link);
}

print "The OK URLs are:\n" . implode("\n", $good_urls);
print "\n\nThe bad URLs are:\n" . implode("\n", $bad_urls);

// Check to make sure the link works
function check_link($url) {
    $snoopy = new Snoopy;
    $snoopy->fetch($url);

    // A response code of 404 means that the file was not found.
    // response_code may hold the whole status line (for example
    // "HTTP/1.1 404 Not Found"), so pick out the three-digit code.
    $code = preg_match('/\b(\d{3})\b/', $snoopy->response_code, $m) ? intval($m[1]) : 0;
    return ($code != 404);
}

// Expand links into their full paths
function expand_links($links, $base_url) {
    $ret = array();
    foreach ($links as $link) {
        if (!preg_match('!^[a-z]+://!i', $link)) {
            $link = ($link[0] === '/') ?
                    $base_url . $link :
                    $base_url . '/' . $link;
        }
        $ret[] = $link;
    }
    return($ret);
}
?>
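The 404 test in check_link() counts every other outcome, including server errors and failed connections, as a working link. The following is a minimal sketch of a stricter test, assuming the server's response line has the usual form "HTTP/1.1 404 Not Found". Both status_from_line() and is_live_status() are hypothetical helpers introduced for illustration, not part of Snoopy, and the rule that only 2xx and 3xx responses count as working is an assumption you may want to change.

```php
<?php
// Hypothetical helper: pull the three-digit status code out of a response line.
// Returns 0 when no status code can be found.
function status_from_line($line)
{
    $line = trim($line);
    if (preg_match('!^HTTP/\S+\s+(\d{3})!', $line, $m)) {
        return (int) $m[1];
    }
    // A bare numeric code such as "404" is accepted as-is.
    return ctype_digit($line) ? (int) $line : 0;
}

// Hypothetical policy (an assumption): 2xx and 3xx responses count as
// working links; everything else, including "no status at all", is stale.
function is_live_status($code)
{
    return $code >= 200 && $code < 400;
}
```

With these in place, check_link() could return is_live_status(status_from_line($snoopy->response_code)) instead of testing for 404 alone.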

Comments

Here we use the Snoopy class, available from http://snoopy.sourceforge.net/. We use Snoopy's fetchlinks() method to fetch all the links from http://www.yahoo.com/ into an array. We then use the expand_links() function to expand every URL into a fully qualified URL. Then we fetch each URL with check_link(). If the server does not answer with a 404 (file not found), we add the URL to the $good_urls array; otherwise, we add the URL to the $bad_urls array.
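The expansion step simply appends each link to the base URL, which is enough when the base is a bare host name. As a sketch of how the same step might look when the page being scanned sits in a subdirectory, the hypothetical resolve_link() helper below (not part of Snoopy) uses PHP's parse_url() to split the base URL first. It is deliberately limited: it assumes the base URL is absolute, and it does not handle "../" segments, protocol-relative links ("//host/path"), or schemes without "://" such as mailto:.

```php
<?php
// Hypothetical helper, not part of Snoopy: resolve one link against a base URL.
function resolve_link($link, $base_url)
{
    // Already absolute (has a scheme such as http://): leave it alone.
    if (preg_match('!^[a-z][a-z0-9+.-]*://!i', $link)) {
        return $link;
    }

    // Rebuild "scheme://host[:port]" from the base URL.
    $parts = parse_url($base_url);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Root-relative link: attach it to the scheme and host only.
    if ($link !== '' && $link[0] === '/') {
        return $root . $link;
    }

    // Relative link: attach it to the directory part of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}
```

For example, resolving "today.html" against "http://www.yahoo.com/news/index.html" yields "http://www.yahoo.com/news/today.html", whereas plain concatenation would append the link after "index.html".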



PHP Developer's Cookbook
PHP Developers Cookbook (2nd Edition)
ISBN: 0672323257
Year: 2000
Pages: 351
