Hack 81. Check for Broken Links

Use output buffering to capture the current page and cURL to verify that the links on it point to existing pages.

Broken links are the bane of web administrators; what's worse, a link that works today might not work next week, due to the ever-evolving nature of the Web. To help fix this pesky problem, the script in this hack captures blocks of HTML and checks the links for that section of markup. Then it provides a handy report noting any bad links, allowing you to easily find and repair problems.

8.4.1. The Code

Save the code in Example 8-5 as index.php. All this page does is present a link that works and one that doesn't.

Example 8-5. The host page for the link checker
 <?php require_once( "checklinks.php" ); ?>
 <html>
 <body>
 <?php checklinks_start(); ?>
 <div style="width: 800px">
 <a href="http://www.cnn.com">CNN</a><br/>
 <a href="http://badlink">Bad link</a><br/>
 <?php checklinks_end(); ?>
 </div>
 </body>
 </html>

Save the script in Example 8-6 as checklinks.php.

Example 8-6. The link-checker code
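 <?php
 // Start capturing the markup whose links should be checked.
 function checklinks_start()
 {
   ob_start();
 }

 // Stop capturing, print the markup, then test every link found in it.
 function checklinks_end()
 {
   $doc = ob_get_clean();
   // Collect the href value of each anchor tag, whatever attributes surround it.
   preg_match_all( '/<a\s[^>]*?href=["\']([^"\']*)["\']/i', $doc, $found );
   print( $doc );

   $badlinks = array();
   foreach( $found[1] as $link )
   {
     $ch = curl_init( $link );
     // Buffer the response so the fetched page is not echoed into this one.
     ob_start();
     curl_exec( $ch );
     ob_end_clean();
     // A non-zero error number means the request could not be completed.
     if ( curl_errno( $ch ) != 0 )
       $badlinks[] = $link;
     curl_close( $ch );
   }

   if ( count( $badlinks ) > 0 ) {
 ?>
 <br/>
 <table style="background: red;" cellspacing="2" cellpadding="2" width="100%">
 <tr><td style="color: white; text-align:center;">Bad links</td></tr>
 <tr><td style="background: white;">
 <?php foreach( $badlinks as $link ) { echo( htmlspecialchars( $link )."<br/>" ); } ?>
 </td></tr>
 </table>
 <?php
   }
 }
 ?>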
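The check in Example 8-6 relies on curl_errno(), which reports only failures of the request itself, such as a host that cannot be resolved. A page that answers with a 404 status would not be listed. The following is a minimal sketch of a stricter test, not part of the original hack: the function names link_result_is_broken() and link_is_broken(), the ten-second timeout, and the choice to treat every status of 400 or above as broken are all assumptions made for illustration.

```php
<?php
// Hypothetical helper (not in the original hack): decides whether a
// finished cURL request should count as a broken link.
function link_result_is_broken( $errno, $httpCode )
{
  // A non-zero cURL error number means the request itself failed
  // (DNS lookup, refused connection, timeout, and so on).
  if ( $errno != 0 ) return true;
  // Assumption: any 4xx or 5xx response counts as broken.
  return $httpCode >= 400;
}

// Hypothetical wrapper: requests $url and applies the rule above.
function link_is_broken( $url, $timeout = 10 )
{
  $ch = curl_init( $url );
  curl_setopt( $ch, CURLOPT_NOBODY, true );          // ask for headers only
  curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );  // do not echo the response
  curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );  // follow redirects
  curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );     // assumed limit, in seconds
  curl_exec( $ch );
  $errno = curl_errno( $ch );
  $code = (int) curl_getinfo( $ch, CURLINFO_HTTP_CODE );
  // The handle is released when $ch goes out of scope.
  return link_result_is_broken( $errno, $code );
}
```

Inside the foreach loop of checklinks_end(), a call such as link_is_broken( $link ) could stand in for the curl_errno() test. Some servers reject header-only requests, so a fallback to a normal GET may be needed in practice.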

8.4.2. Running the Hack

Upload both scripts to your server and navigate to the index.php page. If you are connected to the Web, you will get a browser display that looks like Figure 8-1.

Figure 8-1. Link checker with a report


The portion of HTML to be checked is bracketed with calls to checklinks_start() and checklinks_end(). These functions in turn use ob_start() and ob_get_clean() to buffer the PHP output. The checklinks_end() function then uses the cURL functions to request the pages pointed to by the anchor (<a>) tags, and it records a link as bad when curl_errno() reports that the request failed. As long as the computer is connected to the Internet, the link to the CNN home page should be fine, but the link to http://badlink will never resolve. As a result, the checklinks_end() function prints a report listing the broken links.
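The capture-and-scan step can be tried in isolation, without any network access. The sketch below buffers a small fragment the same way the hack does, then pulls out the link targets. It swaps the hack's regular expression for PHP's DOMDocument parser, which is a different technique from the one in Example 8-6, and the function name extract_links() is hypothetical.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in $html.
// Uses DOMDocument rather than a regular expression.
function extract_links( $html )
{
  $links = array();
  // loadHTML() rejects empty input, so return early in that case.
  if ( trim( $html ) == '' ) return $links;

  $dom = new DOMDocument();
  // Page fragments are rarely valid documents; keep parser warnings quiet.
  $previous = libxml_use_internal_errors( true );
  $dom->loadHTML( $html );
  libxml_clear_errors();
  libxml_use_internal_errors( $previous );

  foreach ( $dom->getElementsByTagName( 'a' ) as $anchor )
  {
    if ( $anchor->hasAttribute( 'href' ) )
      $links[] = $anchor->getAttribute( 'href' );
  }
  return $links;
}

// Capture a fragment with output buffering, as checklinks_start() and
// checklinks_end() do around the markup in index.php.
ob_start();
echo '<a class="nav" href="http://www.cnn.com">CNN</a> ';
echo "<a href='http://badlink' >Bad link</a>";
$fragment = ob_get_clean();

$links = extract_links( $fragment );
print_r( $links );
```

A parser-based approach does not depend on attribute order, quoting style, or spacing inside the tag, at the cost of requiring PHP's DOM extension.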

8.4.3. See Also

  • "Test Your Application with Simulated Users" [Hack #82]

  • "Test Your Application with Robots" [Hack #83]

  • "Spider Your Site" [Hack #84]



PHP Hacks
PHP Hacks: Tips & Tools For Creating Dynamic Websites
ISBN: 0596101392
Year: 2006
Pages: 163
