Section 8.5. Link Checkers | Creating Web Sites: The Missing Manual

8.5. Link Checkers

A link checker is an automated tool that scans through one or more of your Web pages. It tests each link it finds by trying to retrieve the target page (the page your link is pointing to). Depending on the tool and the type of validation you're performing, link checkers might only scan internal links, or they might branch out to follow every link in every page until they've completely covered your Web site.
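To make the idea concrete, here is a minimal sketch of the first half of a link checker's job: finding the links in a page and deciding which ones are internal. It uses only Python's standard library. The names `LinkCollector` and `find_links` are made up for this illustration, and the sketch assumes that "internal" simply means "on the same host as the page being checked"; real tools may use different rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the targets of <a href> and <img src> attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors and images are examined in this sketch.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value.strip())


def find_links(page_url, html_text):
    """Return (absolute_url, is_internal) pairs for every link in the page."""
    collector = LinkCollector()
    collector.feed(html_text)
    site = urlparse(page_url).netloc
    results = []
    for link in collector.links:
        # Relative links are resolved against the address of the page itself.
        absolute = urljoin(page_url, link)
        results.append((absolute, urlparse(absolute).netloc == site))
    return results


sample = '<a href="info.htm">Info</a> <a href="http://www.w3.org/">W3C</a>'
for url, internal in find_links("http://www.mysite.com/index.htm", sample):
    print(url, "internal" if internal else "external")
```

A full checker would go on to request each of these addresses and record whether the request succeeds; that step is left out here so the sketch runs without a network connection.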

Sophisticated link checkers are built into programs like FrontPage and Dreamweaver, and they're great for digging through your Web site and finding problems. In Dreamweaver, use the command Site → Check Links Sitewide to perform a link check. In FrontPage, you can use a similar feature by choosing View → Reports → Problems → Hyperlinks.

Figure 8-7. In FrontPage, you can open an individual page for editing using File → Open, or an entire Web site using File → Open Site (just choose the top-level folder of your Web site). You can then rearrange and rename files in the Folder List and FrontPage will update any related links automatically.

Figure 8-8. In FrontPage, choose View → Hyperlinks, and you'll see this view, which shows you how the currently selected page fits into your Web site. In this example, the current page is index.htm. Arrows pointing away from index.htm represent links that lead to other pages. Arrows pointing to index.htm represent links in other pages that lead to index.htm. Click one of the plus (+) boxes next to another page, and you'll see all the links for that page, too.

The link checkers that are built into HTML editors work on the copy of your Web site that's stored on your computer. That's the best way to keep watch for errors as you're developing your Web site, but it's no help once your Web site's out in the wild. For example, it won't catch mistakes like linking to a local file on your hard drive or forgetting to upload a file you need to the Web server.
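The first of those mistakes, a link to a file on your own hard drive, is easy to spot mechanically. The sketch below is one hedged way to do it in Python. The function name `is_local_file_link` is hypothetical, and the test it applies (a `file:` address or a Windows drive-letter path) is an assumption about what such links usually look like, not an exhaustive rule.

```python
import re
from urllib.parse import urlparse

# A drive-letter path such as C:\Sites\logo.gif or C:/Sites/logo.gif.
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def is_local_file_link(link):
    """True if the link points at a file on the author's own computer."""
    link = link.strip()
    if _DRIVE_PATH.match(link):
        return True
    # file:///C:/Sites/logo.gif and similar addresses never work for visitors.
    return urlparse(link).scheme == "file"


for link in ["images/logo.gif", "file:///C:/Sites/logo.gif", "C:\\Sites\\logo.gif"]:
    print(link, "-> local file" if is_local_file_link(link) else "-> fine")
```

The second mistake, a file that never made it to the Web server, can only be caught by requesting the page from the live site, which is what the online checker described next does.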

To get the final word on your Web site's links, you might want to try a free online link checker. The World Wide Web Consortium provides a solid choice at http://validator.w3.org/checklink. To start your free online link check, follow these steps:

Surf to http://validator.w3.org/checklink.

This takes you to the W3C Link Checker utility.
Enter the full URL for the page you want to check in the text box.

If your Web site has a default page like index.htm, you can type in just the domain name without explicitly supplying a file name.

Choose the options you want to apply (Figure 8-9).

Figure 8-9. Start by choosing the Web page you want to check, and whether or not you use recursion (which is used in this example). For more on how recursion works, see step 3, Section 8.5. Then click Check to get started.

Select "Summary only" if you don't want to see the detailed list of steps that appears as the link checker examines each page. However, it's better to leave this option turned off, so you can get a better understanding of exactly what pages the link checker is examining.

Select "Hide redirects" if you want to ignore instructions that would redirect the link checker to another page (Section 8.5.1). Usually, redirects indicate that your link still works, but should be updated to a new page.

The "Don't send the Accept-header" option tells the link checker not to tell the Web site about its language preferences. This setting only has an effect if you're creating a multilingual Web site, which is beyond the scope of this book.

The "Check linked documents recursively" option allows you to check more than one page at a time using recursion. If you don't use this option, the link checker simply checks every link in the page you specify, and makes sure it points to a live Web page. If you use recursion, the link checker checks all the links in the current page, and then it follows each link. For example, if you have a link that points to a page named info.htm, the link checker first verifies that info.htm exists. Then it finds all the links in info.htm, and starts testing them. In fact, if info.htm links to another page (like contact.htm), the link checker branches out to that page and starts checking its links as well.

Note: The link checker is smart enough to avoid checking the same page twice. It also doesn't use recursion on external links. That means that if you start your link checker on the home page of your Web site, it will follow the links to get to every other page on your site, but it won't go any further. In other words, recursion is a great way to drill through all the links in your entire Web site in one go.

If you want to limit recursion (perhaps because you have a lot of pages and you don't need to check them all), you can supply a "recursion depth," which is the maximum number of levels you want to dig down. For example, if the recursion depth is 1, the link checker will only follow the first set of links. If you don't supply a recursion depth, the link checker checks everything.

Select "Save options in a cookie" if you want your browser to remember these choices.

If you use this option, the next time you use the link checker, the browser will fill in the check boxes using your previous settings.
Click Check to start checking links.
The link checker shows a report that lists each link it checks (Figure 8-10). This report is updated while the link checker works. If you're using recursion, you'll see the link checker branch out from one page to another. The report adds a separate section for each page.

Figure 8-10. The final report shows a list of links in anchors and images. Any links that lead to dead ends are highlighted in red. Links that may need attention are highlighted in yellow. One example is links that are redirected. Although they still work, they may be out of date and might not last for long.
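The recursive check described in step 3 can be sketched in a few lines of Python. This is an illustration of the general technique, not the W3C tool's actual code. To stay runnable without a network connection, it assumes a made-up site stored in a dictionary named `SITE`, where a link counts as "live" if its address appears as a key. It also uses a queue instead of literal recursive function calls, which gives the same coverage and makes the depth limit easy to apply. The function name `check_links` and the `max_depth` parameter are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical site: each page address maps to the links found on that page.
SITE = {
    "http://www.mysite.com/index.htm": [
        "http://www.mysite.com/info.htm",
        "http://www.w3.org/",
    ],
    "http://www.mysite.com/info.htm": [
        "http://www.mysite.com/contact.htm",
        "http://www.mysite.com/index.htm",
    ],
    "http://www.mysite.com/contact.htm": ["http://www.mysite.com/missing.htm"],
    "http://www.w3.org/": ["http://www.w3.org/standards/"],
}


def check_links(start_url, pages, max_depth=None):
    """Return {page: [broken links]} for every page that was examined.

    max_depth=None means no limit; max_depth=1 follows only the links
    found on the starting page.
    """
    site = urlparse(start_url).netloc
    report = {}
    seen = {start_url}              # never examine the same page twice
    queue = deque([(start_url, 0)])
    while queue:
        page, depth = queue.popleft()
        broken = []
        for link in pages[page]:
            if link not in pages:   # stand-in for "the request failed"
                broken.append(link)
                continue
            internal = urlparse(link).netloc == site
            within_depth = max_depth is None or depth < max_depth
            # External links are tested but never followed any further.
            if internal and within_depth and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
        report[page] = broken
    return report


for page, broken in check_links("http://www.mysite.com/index.htm", SITE).items():
    print(page, "broken:", broken)
```

With no depth limit, the sketch reaches contact.htm and reports its dead link to missing.htm. With `max_depth=1`, it stops after info.htm and never sees that problem, which is the trade-off you accept when you limit recursion.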

8.5.1. Using Redirects

To be a good Web citizen, you also need to respect people who link to your Web site. That means once you create your Web site and it becomes popular, try to avoid tinkering with page and folder names. Even a minor change could break someone else's link, making it impossible for an eager Web surfer to get to your site.

Some Web gurus handle this problem using redirects. When they rearrange their sites, they keep all the old files. However, they remove the content from the old files, and replace it with a redirect, a special instruction that tells the browser to automatically navigate to a new page. The advantage of using a redirect is that it prevents a broken link, but it doesn't lock you into the old structure of your Web site if you've decided it's time to make a change.

To create a redirect, you need to add a special <meta> tag to the <head> portion of your Web page. This tag indicates the new destination using an absolute URL, and lists the number of seconds to wait before performing the redirect. Here's an example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
  <meta http-equiv="REFRESH" content="10; URL=http://www.mysite.com/homepage.htm">
  <title>Redirect</title>
</head>
<body>
  <h1>The page you want has moved</h1>
  <p>
    Please update your bookmarks. The new home page is
    <a href="http://www.mysite.com/homepage.htm">http://www.mysite.com/homepage.htm</a>.
  </p>
  <p>
    You should be redirected to the new site in 10 seconds. Click
    <a href="http://www.mysite.com/homepage.htm">here</a> to visit the new page immediately.
  </p>
</body>
</html>

To adapt this page for your own purposes, just change the number of seconds (currently at 10) and the redirect URL. When the browser loads this page, it shows the temporary page for the indicated number of seconds, and then automatically requests the new page.
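If you have many old pages to replace, you could generate the redirect pages instead of editing each one by hand. The following Python sketch fills in a trimmed-down version of the page above. The `TEMPLATE` text, the function name `make_redirect_page`, and the rule that the target must start with http:// or https:// are all assumptions made for this illustration.

```python
from html import escape

# A shortened version of the redirect page; {seconds} and {url} are filled in.
TEMPLATE = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
  <meta http-equiv="REFRESH" content="{seconds}; URL={url}">
  <title>Redirect</title>
</head>
<body>
  <h1>The page you want has moved</h1>
  <p>The new page is <a href="{url}">{url}</a>.</p>
  <p>You should be redirected in {seconds} seconds.</p>
</body>
</html>
"""


def make_redirect_page(new_url, seconds=10):
    """Fill in the redirect template with an absolute URL and a delay."""
    if not new_url.startswith(("http://", "https://")):
        raise ValueError("the redirect target should be an absolute URL")
    # Escape the URL so characters like & and " can't break the markup.
    return TEMPLATE.format(seconds=int(seconds), url=escape(new_url, quote=True))


print(make_redirect_page("http://www.mysite.com/homepage.htm", seconds=5))
```

You would save the returned text under the old file name (for example, the page that used to live at the old address), so that existing links land on the redirect page rather than on an error.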

Redirect pages really serve two purposes. They keep your pages working when you change your Web site's structure, and they inform the Web visitor that the link is obsolete. That's where the time delay comes in: it provides a few seconds to notify the visitor that they're entering the Web site the wrong way. Many Web sites keep their redirect pages around for a relatively short amount of time (for example, a few months), after which they remove the page altogether.