Technique

Roll your own engine. First, find all the links on the page, then check each one to make sure that it works:

<?php
include_once 'Snoopy.class.inc';

$good_urls = array();
$bad_urls  = array();

// Use Snoopy to fetch all the links on yahoo.com
$snoopy = new Snoopy;
$snoopy->fetchlinks('http://www.yahoo.com/');
$links = $snoopy->results;

// Expand URLs that are not fully qualified
$links = expand_links($links, 'http://www.yahoo.com');

foreach ($links as $link) {
    if (check_link($link)) {
        array_push($good_urls, $link);
    } else {
        array_push($bad_urls, $link);
    }
}

print "The Ok urls are:\n" . implode("\n", $good_urls);
print "\n\nThe Bad urls are:\n" . implode("\n", $bad_urls);

// Check to make sure the link works
function check_link($url)
{
    $snoopy = new Snoopy;

    // fetch() returns false when the request could not be made at all.
    if (!$snoopy->fetch($url)) {
        return false;
    }

    // A response code of 404 means that the file was not found.
    // response_code may hold the full status line (for example
    // "HTTP/1.1 404 Not Found"), so pull out the three-digit code
    // rather than calling intval() on the whole string.
    if (preg_match('/\b(\d{3})\b/', $snoopy->response_code, $m)
        && intval($m[1]) == 404) {
        return false;
    }
    return true;
}

// Expand links into their full paths
function expand_links($links, $base_url)
{
    $ret = array();
    $base_url = rtrim($base_url, '/');

    foreach ($links as $link) {
        // Anything that does not start with "scheme://" is relative.
        if (!preg_match('!^[a-z][a-z0-9+.-]*://!i', $link)) {
            $link = (substr($link, 0, 1) === '/')
                ? $base_url . $link
                : $base_url . '/' . $link;
        }
        $ret[] = $link;
    }
    return $ret;
}
?>

Comments

Here we use the Snoopy class, available from http://snoopy.sourceforge.net/. We use Snoopy's fetchlinks() method to fetch all the links from http://www.yahoo.com/ into an array. We then use the expand_links() function to expand all URLs into fully qualified URLs. Then we try to fetch each URL. If the fetch succeeds and the server does not answer with a 404, we add the URL to the $good_urls array; otherwise, we add the URL to the $bad_urls array.
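The expansion step above simply glues each relative link onto the site root, which is enough when the page being scanned is the home page. For a page that lives in a subdirectory, a relative link should be resolved against that page's own directory. The following is a minimal sketch of that idea. The helper name resolve_link() and the example.com URLs are hypothetical and are not part of Snoopy or of the recipe; the sketch does not normalize "../" segments and does not treat mailto: or javascript: links specially.

```php
<?php
// Hypothetical helper (not part of Snoopy): resolve one link against
// the URL of the page it was found on.
function resolve_link($link, $page_url)
{
    // Already fully qualified (scheme://...): return it unchanged.
    if (preg_match('!^[a-z][a-z0-9+.-]*://!i', $link)) {
        return $link;
    }

    // Split the page URL into its components.
    $parts  = parse_url($page_url);
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $host   = isset($parts['host']) ? $parts['host'] : '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path   = isset($parts['path']) ? $parts['path'] : '/';
    $root   = $scheme . '://' . $host . $port;

    // Protocol-relative link such as //cdn.example.com/x.css
    if (substr($link, 0, 2) === '//') {
        return $scheme . ':' . $link;
    }

    // Root-relative link such as /about.html
    if (substr($link, 0, 1) === '/') {
        return $root . $link;
    }

    // Document-relative link: append to the directory of the page.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

$page = 'http://www.example.com/docs/index.html';
echo resolve_link('news/today.html', $page), "\n";
// prints http://www.example.com/docs/news/today.html
echo resolve_link('/about.html', $page), "\n";
// prints http://www.example.com/about.html
```

To use it in the recipe, one could call resolve_link($link, 'http://www.yahoo.com/') for each entry in $links instead of calling expand_links().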