How Many Pages on Your Site Are Indexed?


The number of your site's pages that you want indexed is all of them: all of the public pages, anyway. Many of your pages might be private (unavailable to the general public because they are secured behind passwords), but that's fine, because you do not want private pages indexed for the whole world to see anyway. No, the real problem is when public pages that you want searchers to find are missing from the index.

Later in this chapter, we look at why some of your public pages are missing from search indexes, but first, we simply check how many pages you already have indexed. We calculate your site's inclusion ratio: the number of pages indexed divided by the number of pages on your site.

Determine How Many Pages You Have

Although this might sound exceedingly odd to those of you with smaller sites, it is not always easy to know how many pages are in a Web site. Especially for large decentralized sites, it might require a lot of thought to even estimate the total number of pages in your Web site. If you can easily estimate the number of pages on your site, feel free to skip ahead to the next section to check how many pages you have indexed.

As you begin the task of counting your Web pages, keep in mind that you should count only "publicly available" pages. Private (secured) pages, those locked behind passwords, should not be part of your calculation, because you do not want those pages on public display in a search engine. So, if you have special pages that show your customers their invoices or their order status, it makes sense that you have them password-protected so each customer sees only his own information. Don't count these pages in your site total, because you do not want them in the search indexes anyway.

For the purposes of calculating your inclusion ratio, it is wonderful if you know the precise total number of pages on your site, but that level of accuracy is not required. If you do not know the exact number, there are several ways to make a reasonable estimate:

  • Ask your Webmaster. Your Webmaster might not know either, but he has probably been asked this question before and at least he has thought about the answer. Question your Webmaster's logic in guessing the total so you can evaluate its credibility.

  • Check your corporate search engine. If you have a search engine that allows visitors to search only your site, check to see how many of your Web site's pages are in your corporate search index. (Be aware that if your corporate search engine's index is updated through crawling, your corporate search index will be missing many of the same pages that Google and other Internet search engines are missing.)

  • Add up the counts from your content sources. Most Web pages are really document pages: each one corresponds to a document somewhere in your content management system or your e-Commerce catalog. Granted, you might not be able to count all of your pages accurately using this method, but it helps you make a more accurate estimate than taking a stab in the dark.

  • Use a special spider. You can unleash your own spider on your site. Special spiders, such as the free Xenu (http://home.snafu.de/tilman/xenulink.html) and the $98 OptiSpider (www.optitext.com/optispider), are designed to find pages on your site that you might have overlooked, and they can count what they get. Unfortunately, as with a corporate search engine, many of the same barriers that block Internet search spiders will block these special spiders, too. The good news is that special spiders can show you where they were blocked, so that you can take the corrective actions we show later in this chapter.

  • Check each search engine. This might seem odd, but each search engine has stored a different number of pages from your site. This is probably the worst method to use for estimation, but it is better than a complete guess. In the next section, we show how to coax the search engines to tell you how many of your pages they have included in their indexes.
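To see roughly what a special spider does when it counts pages, here is a minimal breadth-first crawler sketch in Python. It is not how Xenu or OptiSpider work internally; it simply follows `<a href>` links that stay on the starting host, counts each page it manages to fetch, and records each URL it could not fetch, loosely mirroring how those tools report where they were blocked. The `fetch` function is an assumption you supply: in practice it would wrap an HTTP client such as `urllib.request.urlopen` and return a page's HTML, or `None` on failure. The sketch ignores `robots.txt`, JavaScript-generated links, and forms, and the `site` dictionary at the bottom is a hypothetical stand-in for a real Web site.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def count_pages(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    limit: int = 10_000,
) -> Tuple[List[str], List[str]]:
    """Crawl breadth-first within start_url's host.

    Returns (pages fetched, URLs that could not be fetched).
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found: List[str] = []
    blocked: List[str] = []
    while queue and len(found) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # fetch failed or was refused: note where we were blocked
            blocked.append(url)
            continue
        found.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so /b and /b#top count once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return found, blocked


# Hypothetical in-memory site; site.get returns None for unknown URLs.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/">Home</a> <a href="https://other.com/">Out</a>',
    "https://example.com/b": '<a href="/private">Log in</a>',
}
found, blocked = count_pages("https://example.com/", site.get)
print(len(found), blocked)  # 3 ['https://example.com/private']
```

The off-site link to other.com is skipped, and the unreachable /private page lands in the blocked list rather than the page count, which is the kind of gap you would then investigate.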

After you have estimated how many pages are on your site, you are ready to check how many pages you have indexed in the major search engines.

Check How Many Pages Are Indexed

Search engines understand that you want to know how many pages of your site are indexed, and they have made it easy to do. Every search engine has a special search operator designed to show you how many pages it has stored in its index for a particular site.

To check how many pages you currently have included in Google, enter the query "site:yourdomain.com" to find the number of pages on the yourdomain Web site. For example, the query "site:coach.com" shows how many pages are indexed from the handbag manufacturer's site, as shown in Figure 10-3.

Figure 10-3. Checking how many pages are indexed. Coach.com has thousands of pages included in the Google search index.


Google is not alone: AOL, MSN, and Yahoo! all provide the same special "site:" operator for you to see how many of your pages are indexed. Ask Jeeves makes it a bit tougher, forcing you to use its Advanced Search interface. You must choose a word that is on every one of your pages (such as a word from your company name) and then fill in yourdomain.com in the Domain or Site field before searching to get the Ask Jeeves page count.
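Because the "site:" query is just text, building it for a list of domains is easy to script. The Python sketch below assumes Google's results page accepts the query in the `q` parameter of `https://www.google.com/search`; it only builds URLs for you to open in a browser and does not fetch or parse any results. The function names are illustrative, and other engines' URL formats are not covered.

```python
from urllib.parse import urlencode


def site_query(domain: str) -> str:
    """The 'site:' operator query that asks an engine for pages indexed from domain."""
    return f"site:{domain}"


def google_search_url(domain: str) -> str:
    """Results URL for the site: query (assumes Google's /search?q= format)."""
    # urlencode percent-encodes the colon, so "site:" becomes "site%3A".
    return "https://www.google.com/search?" + urlencode({"q": site_query(domain)})


for domain in ["coach.com", "yourdomain.com"]:
    print(google_search_url(domain))
# https://www.google.com/search?q=site%3Acoach.com
# https://www.google.com/search?q=site%3Ayourdomain.com
```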

Instead of entering these special operators by hand, you can use one of several tools to take the drudgery out of the reporting. MarketLeap (www.marketleap.com), a search marketing consultancy, offers its Search Engine Saturation Reporting Tool, as shown in Figure 10-4.

Figure 10-4. Tool for checking indexed pages. MarketLeap's Search Engine Saturation Report shows how many pages are currently indexed by engine.


You can see from this report that at least 181,000 pages from Intel's public Web site are indexed in Google. (Some of these pages might be duplicates, but most are unique.) Intel is also well represented in Yahoo!, with nearly 72,000 pages indexed. Although both Yahoo! and Google index much of Intel's site, the gap between the two engines is more than 100,000 pages. Different spiders crawl Intel's site differently, resulting in different pages being indexed.
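Comparing engines is simple subtraction, but a small script helps once you track several engines over time. This Python sketch takes a dictionary of page counts per engine (here the approximate Intel figures from the report) and returns the engine with the most pages, the engine with the fewest, and the gap between them. The function name is illustrative.

```python
from typing import Dict, Tuple


def index_count_spread(counts: Dict[str, int]) -> Tuple[str, str, int]:
    """Return (engine with most pages, engine with fewest pages, difference)."""
    most = max(counts, key=counts.get)
    fewest = min(counts, key=counts.get)
    return most, fewest, counts[most] - counts[fewest]


# Approximate counts from the MarketLeap report for Intel's site.
intel = {"Google": 181_000, "Yahoo!": 72_000}
print(index_count_spread(intel))  # ('Google', 'Yahoo!', 109000)
```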

Calculate Your Inclusion Ratio

You probably already guessed how to calculate your inclusion ratio (the percentage of your site's pages residing in a search index). Just take the number of pages found in a search index (Ask Jeeves, for example) and divide that by the total number of pages you have estimated to be on your site. For example, if Ask Jeeves reports that you have 10,000 pages indexed, and your content management system has 15,000 documents in it, your Ask Jeeves inclusion ratio is 10,000 ÷ 15,000 = 0.67 or 67 percent.
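The calculation is a single division, shown here as a small Python function using the Ask Jeeves numbers from the example. The function name is illustrative; `total_pages` is whatever estimate you settled on earlier, with private pages excluded.

```python
def inclusion_ratio(indexed_pages: int, total_pages: int) -> float:
    """Pages found in one search index divided by your estimated public pages."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return indexed_pages / total_pages


ratio = inclusion_ratio(10_000, 15_000)
print(f"{ratio:.2f} or {ratio:.0%}")  # 0.67 or 67%
```

Run it once per engine, because each engine reports a different indexed count for the same site.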

So what is the right metric to shoot for? At a minimum, about 50 percent of your publicly available pages should be in the search indexes. Fifty percent is the floor, but if you work at it, you can get nearly 100 percent included.

On rare occasions, you might find a Web site whose inclusion ratio exceeds 100 percent. No, the search engine is not handing out special bonus pages. Instead, you may have a serious problem on your site. The search index might contain duplicate pages, possibly because you have many dynamic URLs (which are explained later in the chapter). Even more serious, your site's private content (information that should be protected from public view) might be in the search index through a security error. Or, you might have underestimated the total pages on your site, which would be the happiest cause of a runaway inclusion ratio.
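The rules of thumb above can be turned into a quick check. The thresholds (about 50 percent as the minimum, anything over 100 percent as a warning sign) come from the text; the function name and messages are illustrative.

```python
def assess_inclusion(ratio: float) -> str:
    """Map an inclusion ratio (1.0 == 100 percent) to the text's rules of thumb."""
    if ratio > 1.0:
        # More pages indexed than you think you have: something is off.
        return "over 100%: check for duplicate URLs, exposed private pages, or an undercounted site"
    if ratio < 0.5:
        return "below the 50% minimum: find out what is keeping spiders away"
    return "acceptable: keep working toward 100%"


for r in (0.05, 0.67, 1.2):
    print(f"{r:.0%}", assess_inclusion(r))
```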

If you have nearly 100 percent of your site's pages indexed in all search engines, rejoice, and then skip the rest of this chapter. Most companies don't. Most Web sites have far less than 100 percent indexed, and some have less than 5 percent. Next, you will learn how to increase your inclusion ratio, perhaps to 100 percent.



    Search Engine Marketing, Inc.: Driving Search Traffic to Your Company's Web Site (2nd Edition)
    ISBN: 0136068685
    Year: 2005
    Pages: 138
