The number of your site's pages that you want indexed is all of them (all of the public pages, anyway). Many of your pages might be private, unavailable to the general public because they are secured behind passwords, but that's fine, because you do not want private pages indexed for the whole world to see anyway. No, the real problem is when public pages that you want to be searched are missing from the index.
Later in this chapter, we look at why some of your public pages are missing from search indexes, but first, we simply check how many pages you already have indexed. We calculate your site's inclusion ratio: the number of pages indexed divided by the number of pages you have on your site.
Determine How Many Pages You Have
Although this might sound exceedingly odd to those of you with smaller sites, it is not always easy to know how many pages are in a Web site. Especially for large decentralized sites, it might require a lot of thought to even estimate the total number of pages in your Web site. If you can easily estimate the number of pages on your site, feel free to skip ahead to the next section to check how many pages you have indexed.
As you begin the task of counting your Web pages, keep in mind that you should be counting "publicly available" pages only. That means no private (secured) pages, those locked behind passwords, ought to be part of your calculation, because you do not want those pages on public display in a search engine. So, if you have special pages that show your customers their invoices or their order status, it makes sense that you have them password-protected so each customer sees only his own information. Don't count these pages in your site total, because you do not want them in the search indexes anyway.
For the purposes of calculating your inclusion ratio, it is wonderful if you know precisely the total number of pages on your site, but it is not required to be so accurate. If you do not know the exact number, there are several ways to make a reasonable estimate:
After you have estimated how many pages are on your site, you are ready to check how many pages you have indexed in the major search engines.
Check How Many Pages Are Indexed
Search engines understand that you want to know how many pages of your site are indexed, and they have made it easy to do. Every search engine has a special search operator designed to show you how many pages it has stored in its index for a particular site.
To check how many pages you currently have included in Google, enter the query "site:yourdomain.com" to find the number of pages on the yourdomain Web site. For example, the query "site:coach.com" shows how many pages are indexed from the handbag manufacturer's site, as shown in Figure 10-3.
Figure 10-3. Checking how many pages are indexed. Coach.com has thousands of pages included in the Google search index.
Google is not alone: AOL, MSN, and Yahoo! all provide that special "site:" operator for you to see how many of your pages are indexed. Ask Jeeves makes it a bit tougher, forcing you to use its Advanced Search interface. You must choose a word that is on every one of your pages (such as a word from your company name) and then fill in yourdomain.com in the Domain or Site field before searching to get the Ask Jeeves page count.
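If you check several sites regularly, you might prefer to generate the "site:" query text rather than type it each time. The following is a minimal Python sketch, not any search engine's official tool; the function name site_query is hypothetical, and it only builds the query text for you to paste into a search box.

```python
from urllib.parse import urlsplit


def site_query(site: str) -> str:
    """Build a 'site:' query from a bare domain or a full URL.

    Hypothetical helper for illustration only: it returns the query
    text and does not contact any search engine.
    """
    site = site.strip()
    # urlsplit only recognizes the host when a scheme or '//' is present,
    # so add '//' in front of bare domains such as 'coach.com'.
    if "//" not in site:
        site = "//" + site
    host = urlsplit(site).hostname
    if not host:
        raise ValueError("no domain found in input")
    return f"site:{host}"


print(site_query("coach.com"))
print(site_query("http://www.coach.com/some/page"))
```

Note that the sketch keeps whatever host name you give it, so "www.coach.com" and "coach.com" produce different queries, and the counts an engine reports for each may differ.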
Instead of entering these special operators by hand, you can use one of several tools to take the drudgery out of the reporting. MarketLeap (www.marketleap.com), a search marketing consultancy, offers its Search Engine Saturation Reporting Tool, as shown in Figure 10-4.
Figure 10-4. Tool for checking indexed pages. MarketLeap's Search Engine Saturation Report shows how many pages are currently indexed by engine.
You can see from this report that at least 181,000 pages from Intel's public Web site are indexed in Google. (Some of these pages might be duplicates, but most are unique.) Intel is also well represented in Yahoo! with nearly 72,000 pages indexed. Although both Yahoo! and Google index much of Intel's site, you can see there is a difference of more than 100,000 pages between the two engines. Different spiders crawl Intel's site differently, resulting in different pages being indexed.
Calculate Your Inclusion Ratio
You probably already guessed how to calculate your inclusion ratio (the percentage of your site's pages residing in a search index). Just take the number of pages found in a search index (Ask Jeeves, for example) and divide that by the total number of pages you have estimated to be on your site. For example, if Ask Jeeves reports that you have 10,000 pages indexed, and your content management system has 15,000 documents in it, your Ask Jeeves inclusion ratio is 10,000 ÷ 15,000 = 0.67 or 67 percent.
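The same arithmetic can be written as a short Python sketch. The function name inclusion_ratio is hypothetical, and the numbers are the Ask Jeeves example from the paragraph above.

```python
def inclusion_ratio(indexed_pages: int, total_pages: int) -> float:
    """Return pages indexed divided by total public pages on the site."""
    if total_pages <= 0:
        raise ValueError("total_pages must be a positive estimate")
    if indexed_pages < 0:
        raise ValueError("indexed_pages cannot be negative")
    return indexed_pages / total_pages


# 10,000 pages indexed out of an estimated 15,000 public pages
ratio = inclusion_ratio(10_000, 15_000)
print(f"{ratio:.2f} or {ratio:.0%}")  # prints "0.67 or 67%"
```

Because your total is usually an estimate, treat the result as a rough gauge rather than an exact measurement.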
So what is the right metric to shoot for? It is minimally acceptable for you to have about 50 percent of your publicly available pages in the search indexes. Fifty percent is the minimum, but you can get nearly 100 percent included, if you work at it.
On rare occasions, you might find a Web site whose inclusion ratio exceeds 100 percent. No, the search engine is not handing out special bonus pages. Instead, you may have a serious problem on your site. The search index might contain duplicate pages, possibly because you have many dynamic URLs (which are explained later in the chapter). Even more serious, your site's private content (information that should be protected from public view) might be in the search index through a security error. Or, you might have underestimated the total pages on your site, which would be the happiest cause of a runaway inclusion ratio.
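The guidance above (50 percent as the minimum, nearly 100 percent as the goal, and anything over 100 percent as a warning sign) can be sketched as a simple check. This is an illustrative Python sketch only; the function name assess_inclusion and the wording of its messages are assumptions, and the thresholds are the ones given in the text.

```python
def assess_inclusion(ratio: float) -> str:
    """Interpret an inclusion ratio (1.0 means 100 percent indexed)."""
    if ratio < 0:
        raise ValueError("ratio cannot be negative")
    if ratio > 1.0:
        # More pages indexed than you counted: investigate before celebrating.
        return ("over 100 percent: check for duplicate pages, private "
                "content in the index, or an underestimated page count")
    if ratio < 0.5:
        return "below the 50 percent minimum"
    return "at or above the 50 percent minimum; keep working toward 100 percent"


for r in (0.05, 0.67, 1.2):
    print(f"{r:.0%}: {assess_inclusion(r)}")
```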
If you have nearly 100 percent of your site's pages indexed in all search engines, rejoice, and then skip the rest of this chapter. Most companies don't. Most Web sites have far less than 100 percent indexed; some have less than 5 percent. Next, you will learn how to increase your inclusion ratio, perhaps to 100 percent.