11.4. Directories and Search Engines

Now that you're well on your way to perfecting and popularizing your site, it's time to start looking at the second level of Internet promotion: search engines. Getting your Web site into the most important search engine catalogs is a key step in publicizing it. Working your way up the rankings so Web searchers are likely to find you takes more work, and monopolizes the late-night hours of many a Webmaster.

11.4.1. Directories

Directories are searchable site listings, with a difference: They're created by humans. That means a small army of computer workers painstakingly puts together a collection of sites, neatly sorted into categories. The advantage of directories is that they're well-organized. A couple of clicks can get you a complete list of California regional newspapers. The unquestioned disadvantage is that they're dramatically smaller than full-text search catalogs. That means directories aren't very useful for those in search of a piece of elusive information that doesn't easily fall into a category, like the most commonly misspelled words. Over the years, as the Web's ballooned in size, directories have become increasingly specialized, and full-text searching tools have become the most common way to hunt for information.

So, given that directories are just the unattractive cousins of full-text search engines, why do you need to worry about them? Two reasons. First, many Web surfers still use directories, even if they don't use them as often as full-text search engines. Second, some search engines (including Google) pay attention to directory listings, and tend to rank sites higher if they turn up in certain directories. Getting into the right directories can help you start to move up the list in a full-text search. And just like college, getting into a directory requires that you submit an application, which you'll learn about next.

11.4.1.1. The Open Directory Project

The most important directory to submit your site to is the Open Directory Project (ODP) at http://dmoz.org. The ODP is a huge, long-standing Web site directory that's staffed entirely by thousands of volunteer editors, who review submissions in countless categories. The ODP isn't the most popular Web directory (that honor currently goes to the Yahoo directory), but it is used behind the scenes by other search engines, including Google, Yahoo, AOL, HotBot, and Lycos. In fact, Google's own directory service (http://directory.google.com) is based on the ODP.

Before submitting to the ODP, take the time to make sure you do it right. An incorrect submission could result in your Web site not getting listed at all. You can find a complete description of the rules at http://dmoz.org/add.html, but here are the key requirements:

  • Don't submit your site more than once.

  • Don't submit your site to more than one category.

  • Don't submit more than one page or section of your site (unless you have a really good reason, like the separate sections are dramatically different).

  • Don't submit sites that contain "illegal" content. (By the ODP definition, this is more accurately described as unsavory content, like pornography, libelous content, or material that advocates illegal activity. You know who you are.)

  • Clean up any broken links, outdated information, or any other red flags that might suggest to an editor that your site isn't here for the long term.

  • When you submit your site, describe it carefully and accurately. Don't promote it. In other words, "Ketchup Masters is a manufacturer of gourmet ketchup" is acceptable. "Ketchup Masters is the best food-oriented site on the Web; the Louisville Times says you can't miss it!" is not.

  • Don't submit a site that's not completed. Your "under construction" page won't get listed.

The next step is to spend some time at the http://dmoz.org site until you've found the single best category for your site (see Figure 11-5).

Figure 11-5. Top: When you first surf to the ODP site, you're greeted by a group of general, top-level categories.
Bottom: As you click your way deeper into the hierarchy, you'll eventually find the specific subcategory that'll make a good home for your site. Here's the Arts > Visual Arts > Native and Tribal category. There are several subcategories (like Asia, with 22 sites). Categories with an @ after their names link to a different place in the hierarchy.


Once you've found the right section, click the "suggest URL" link at the top of the page and fill out the submission form (see Figure 11-6). The form includes your URL, the title of your site, a brief description, and your email address.

Figure 11-6. Here's a portion of the ODP submission form for a new site. Read all the instructions carefully, fill in the boxes, and then click the inviting Submit button at the bottom of the page (not shown here).



Tip: If you have some free time on your hands, you can offer to help edit a site category: just click the "become an editor" link. And even if you don't have editorial aspirations, why not check out the editor guidelines at http://dmoz.org/guidelines to get a better idea of what's going on in the mind of an ODP editor, and how he'll evaluate your Web site submission?

Once you've submitted your site, there's nothing to do but wait (and submit your site to the other directories and search engines discussed in this chapter). If two or three weeks pass without your site appearing in the listing, and you haven't received an email describing any problems with your site, try submitting again. If that still doesn't work, it's time to contact the editor of the category where you submitted your site. Write a polite email asking why your site wasn't added to the listing, and include the date of your submission(s) and the name, URL, and description of your site. You can find the email address for the category editor at the very bottom of the category page (see Figure 11-7).

11.4.1.2. Other directories

ODP is a great starting point, but it isn't the only directory on the block. The other two heavyweights are Yahoo and Looksmart. Unfortunately, getting your site on these directories takes considerably more work. If you've created a commercial site, you'll almost certainly need to pay a fee. If you've created a non-commercial site, you can probably get in free, but it may take persistence, emails, multiple submissions, and a bit of luck.

Here are some links to get you started:

  • For Yahoo (www.yahoo.com), the official submission guidelines are at http://docs.yahoo.com/info/suggest. However, you'll be much happier with the unofficial write-up at www.apromotionguide.com/yahoo.html, which discusses your free and for-fee options, and explains what the cryptic rejection emails Yahoo sends you really mean.

  • For Looksmart (www.looksmart.com), it gets more confusing. You'll probably want to dodge the high fees and try to get in through the back door, using a related Web site called Zeal (www.zeal.com). You can read the Zeal guidelines at www.zeal.com/guidelines/user, and get a walkthrough of the horribly convoluted process at www.apromotionguide.com/looksmart.html.

Once you're done with directories (or just ready to move on), it's time to take a look at full-text search engines.

Figure 11-7. Click the editor's name ("sprice") to find out who he is, what categories he manages, and how you can email him.


11.4.2. Search Engines

For most people, search engines are the one and only tool for finding information on the Web. In order for the average person to find your Web site, you need to make sure your site is in the most popular search engine catalogs, and turns up as a result for the right searches. This task is harder than it seems, because the Web is full of millions of sites jockeying for position. In order to get noticed, you need to spend time developing your site and enhancing its visibility. You also need to understand how search engines rank pages (see the box below for an example).

The undisputed king of Web search engines is Google (www.google.com). Not only is Google far and away the most popular search engine, it also powers other search engines (usually without being credited). Google performs an amazing amount of work: every day it chews through hundreds of millions of search requests.


Tip: For more information about search engines, including who's on top and who owns whom, check out www.searchenginewatch.com.

UP TO SPEED
How Google's PageRank Works

Google uses a rating system called PageRank to size up different Web pages and determine how they rank when someone conducts a search.

PageRank isn't used to find search results; instead, it's used to order them. When you perform a search with Google, it pulls out all the sites that match your search keywords. Then, it orders its results according to the PageRank of each page.

The basic idea behind the PageRank system is that the value of your Web site is determined by the community of other Web sites that link to it. There are a few golden rules:

  • The more sites that link to you, the better.

  • A link from a more popular site (a site with a higher PageRank) is more valuable than a link from a less popular site.

  • The more links a site has, the less each link is worth. In other words, if someone links to your site and just a handful of other sites, that link is valuable. If someone links to your site and hundreds of other sites, the link's value is diluted.
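To see how these three rules play out together, here's a minimal Python sketch of a simplified, textbook-style PageRank calculation. It isn't Google's actual recipe (that's secret); the page names, the tiny five-page "Web," the function name pagerank, and the 0.85 damping value are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a rank for every page in a small link graph.

    links maps each page name to the list of pages it links to.
    This is a simplified illustration, not Google's real algorithm.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start everyone off equal

    for _ in range(iterations):
        # Every page gets a small baseline share, linked-to or not.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue  # simplification: pages with no links pass nothing on
            # Rule 3: a page's vote is split evenly among all its links.
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                # Rules 1 and 2: each incoming link adds to the target's
                # rank, and links from higher-ranked pages add more.
                new_rank[target] += share
        rank = new_rank
    return rank


# A hypothetical five-page Web. "directory" links to three sites,
# while "fan" links to just one.
web = {
    "directory": ["pasta", "ketchup", "socks"],
    "fan": ["pasta"],
    "pasta": ["directory"],
    "ketchup": ["directory"],
    "socks": ["directory"],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this made-up Web, the pasta site ends up ahead of the ketchup and socks sites because it has one extra incoming link, and that link comes from a page that doesn't split its vote with anyone else.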

Although Google regularly fine-tunes its secret PageRank recipe, Web gurus spend hours trying to deconstruct it. For some fascinating reading, you can learn more about how PageRank works (loosely) at www.akamarketing.com/google-ranking-tips.html and www.markhorrell.com/seo/pagerank.html. The original formulation of PageRank is described in an academic paper by Google co-founders Sergey Brin and Larry Page at http://www-db.stanford.edu/~backrub/google.html.

For way more information about Google and its internal workings, be sure to check out Google: The Missing Manual.


It's not too difficult to get noticed by Google. By the time your site's about a month old, Google will probably have stumbled across it at least once, usually by following a link from another site or the ODP. A link to your site is the best way to introduce yourself to Google. As described in the box above, Google takes outside links into consideration when sizing up a site, so the more sites that link to you, the more likely you are to turn up in someone's search results.

If you're impatient or you think Google's passing you by, you can introduce yourself directly using the submission form at www.google.com/addurl.html (see Figure 11-8). Most popular search engines include a submission form like this. Just make sure you keep track of where you've submitted, so you don't inadvertently submit to the same search engine more than once.

Figure 11-8. You can safely skip filling in the comments section on this page, but make sure to include the http:// prefix at the start of your Web page's URL.


11.4.2.1. Rising up in the rankings

You'll soon discover that it's not difficult to get into Google's catalog. However, you might find that it's exceedingly hard to get noticed. For example, suppose you've submitted the site www.SamMenzesHomemadePasta.com. To check if you're in Google, try an extremely specific search that targets just your site, like "Sam Menzes Homemade Pasta." This should definitely lead to your doorstep. Now, try searching for just "Homemade Pasta." Odds are, you won't turn up in the top 10, or even the top 100.

So how do you create a site that the casual searcher's likely to find? There's no easy answer. Just remember that the secret to getting a good search ranking is having a good PageRank, and getting a good PageRank is all about connections. In order to stand out, your Web site needs to share links with the other leading sites in your category area.

If you want to delve into the nitty-gritty of search engine optimization (known to Webmasters as SEO), consider becoming a regular reader of www.webmasterworld.com and www.searchenginewatch.com. You'll find articles and forums where Webmasters discuss the good, bad, and downright seedy tricks to try and get noticed.


Tip: It's possible to get too obsessed about search engine rankings. Here's a good rule of thumb: don't spend more time trying to improve your search engine ranking than you spend improving your Web site. In the long term, the only way to gain real popularity is to become one of the best sites on the block.

11.4.2.2. Google AdWords

As a Web surfer, you've no doubt seen several lifetimes' worth of flashing messages, gaudy banners, and invasive pop-ups, all trying to sell you some hideously awful products. It probably comes as no surprise to learn that these types of ads aren't the way to promote your site; they're more likely to alienate people than entice them.

However, there are respectable paid placements that can get your site in front of the right readers, at the right time, and with the right amount of tact. One of the best choices is AdWords (https://adwords.google.com), Google's insanely flexible advertising system.

The idea behind AdWords is that you create text ads that Google will show alongside its regular search results. The neat part is that the ads aren't shown indiscriminately. Instead, you choose the search keywords that you want your ad to be associated with (see Figure 11-9).

The neat (and slightly confusing) part about AdWords is that you bid for the keywords you want to use. For example, you might tell Google you're willing to pay 25 cents for the keyword "food." Google takes this into consideration with everyone else's bids, and shows the higher bidders more often. (Google will tell you the highest bid, in case you just want to beat that by a penny.) However, you don't get charged anything for appearing on Google's page. You owe money only when someone clicks on your ad to get to your site.

By this point, you might be getting a little nervous. Given the fact that Google handles hundreds of millions of searches a day, isn't it possible for a measly one-cent bid to quickly put you and your site into bankruptcy? Fortunately, Google's got the solution for this, too. You just tell Google how much you're willing to pay per day. Once you hit your limit, Google stops showing your ad.
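The arithmetic behind a daily limit is simple enough to sketch in a few lines of Python. The $5.00 daily limit and the function name max_paid_clicks are hypothetical, and the sketch assumes every click costs your full bid, which is the worst case.

```python
def max_paid_clicks(daily_budget_cents, cost_per_click_cents):
    """Return how many full-price clicks fit inside the daily budget."""
    return daily_budget_cents // cost_per_click_cents


daily_budget_cents = 500  # hypothetical limit of $5.00 a day
bid_cents = 25            # the 25-cent bid used as an example earlier

clicks = max_paid_clicks(daily_budget_cents, bid_cents)
worst_case_month_cents = daily_budget_cents * 30

print(clicks)                        # 20
print(worst_case_month_cents / 100)  # 150.0
```

In other words, under these assumptions the most you could pay is for 20 clicks a day, or $150.00 over a 30-day month, no matter how many people search for your keyword.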

Interestingly, the bid amount isn't the only factor that determines how often an ad appears. Popularity is also important. If Google shows your ad over and over again and it never gets a click, Google realizes that your ad just isn't working, and lets you know with an automatic email message. It may then start showing your ad significantly less, or refrain from showing it altogether until you can improve it.

Figure 11-9. To see AdWords in action, try searching for a name brand like Microsoft. You'll see a section clearly marked Sponsored Links appear on the right side of the search results, or just above the search results in a blue shaded box.


AdWords can be competitive. In order to have a chance against all the AdWords sharks, you need to know how much a click is worth to your site. For example, if you're selling monogrammed socks, you need to know what percentage of visitors actually buy something (the conversion rate) and how much they're likely to spend. A typical cost per click hovers around 35 cents, but there's a wide range. At last measure, the word free topped the charts at $1.33, while capitalism could be had for a song, a mere 10 cents. (And in recent history, law firms have bid "mesothelioma," an asbestos-related cancer that could have a class action lawsuit in the making, up close to $100.) Before you sign up with AdWords, it's a good idea to conduct some serious research.
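Here's a rough Python sketch of the "what is a click worth?" calculation. The 2 percent conversion rate, the $12.00 profit per sale, and the function name click_value_cents are made-up numbers and names for illustration; only the 35-cent typical cost comes from the figures above.

```python
def click_value_cents(conversion_percent, profit_per_sale_cents):
    """Average profit, in cents, that a single visitor brings in."""
    return conversion_percent * profit_per_sale_cents / 100


# Hypothetical sock shop: 2 visitors in 100 buy, each sale earns $12.00.
value = click_value_cents(2, 1200)
typical_cost_cents = 35  # the typical cost per click quoted above

worth_bidding = value >= typical_cost_cents
print(value, worth_bidding)  # 24.0 False
```

With these assumed numbers, each visitor is worth about 24 cents, so paying 35 cents a click would lose money. You'd need a cheaper keyword, a better conversion rate, or a bigger profit per sale.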


Note: You can learn more about AdWords from Google: The Missing Manual, which includes a whole chapter about it. You can also get an online introduction at http://searchenginewatch.com/sereport/article.php/2164591. Finally, for a change of pace, surf to www.iterature.com/adwords for a story about an artist's attempt to use AdWords to distribute AdWords, and why it failed.

11.4.2.3. Hiding from search engines

In rare situations, you might create a page that you don't want to turn up on a search engine. The most common reason is that you've posted some information that you want to share with only a few friends, like the latest Amazon e-coupons. If Google indexes your site, thousands of visitors could flood your way, sucking up your bandwidth for the rest of the month. Another reason may be that you're posting something semi-private that you don't want other people to stumble across, like a story about how you stole a dozen staplers from your boss. If you fall into the latter category, be very cautious. Keeping search engines away is the least of your problems: once a site's on the Web, it will be discovered. And once it's discovered, it won't ever go away (see the box below).

But there is at least one thing you can do to minimize your site's visibility or, possibly, keep it off search engines altogether. To understand how this procedure works, recall that search engines do their work in several stages (Section 11.4.2). In the first stage, a robot program crawls across the Web downloading sites. You can tell this robot not to index your site, or a portion of it, in several ways. (Not all search engines respect these rules, but most, including Google, do.)

UP TO SPEED
Web Permanence

You've probably heard a lot of talk about the ever-changing nature of the Web. Maybe you're worried that the links you create today will lead to dead sites or missing pages tomorrow. Well, there's actually a much different issue taking shape: old site copies that just won't go away.

Once you put your work on the Web, you've lost control of it forever. The corollary to this sobering thought is: Always make sure you aren't posting something that's illegal, infringes on copyright, is personally embarrassing, or could get you fired. Because once you've put this material out on the Web, it may never go away.

For example, imagine you accidentally reveal your company's trade secret for carrot-flavored chewing gum. A few weeks later, an excited customer links to your site. You realize your mistake, and pull the pages off your Web server. But have you really contained the problem?

Assuming the Google robot has visited your site recently (which is more than likely), Google now has a copy of your old Web site. Even worse, people can get this cached (saved) copy from Google if they know about the cache keyword. For example, if the offending page's URL is www.GumLover.com/newProduct.htm, a savvy Googler can get the old copy of the page using the search "cache:www.GumLover.com/newProduct.htm." (Less savvy surfers might still stumble onto the cached copy of a page by clicking the Cached link that appears after each search result in Google's listing.) Believe it or not, this trick's been used before to get accidentally leaked information, ranging from gossip to software license keys.

You can try to get your page out of Google's cache as quickly as possible using the remove URL feature at http://services.google.com/urlconsole/controller. But even if this works, you're probably starting to see the problem: there's no way to know how many search engines have made copies of your work. Interested people who notice you've pulled down the information will hit these search engines and copy the details to their own sites, making it pretty near impossible to eliminate the lingering traces of your mistake. There are even catalogs dedicated to preserving old Web sites for posterity (see the Wayback Machine at www.archive.org).


To keep a robot away from a single page, add the robots meta tag to your page. Use the content value noindex, as shown here:

 <meta name="robots" content="noindex"> 

Remember, like all meta tags, this one goes in the <head> section of your HTML document.

Alternatively, you can use the content nofollow to tell the robot to index the current page, but not to follow any of its links:

 <meta name="robots" content="nofollow"> 
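If you'd like to double-check which robots instructions a page actually carries before you upload it, a short Python script can read them back. This is just a sketch using Python's standard html.parser module; the class name RobotsMetaParser, the function name robots_directives, and the sample page are all invented for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # The content value may hold several comma-separated instructions.
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots instructions found in a page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# A hypothetical page that asks robots not to index it.
page = """<html><head>
<title>Private coupons</title>
<meta name="robots" content="noindex">
</head><body><p>Hello</p></body></html>"""

print(robots_directives(page))  # {'noindex'}
```

An empty result means the page has no robots meta tag at all, so a robot that finds the page is free to index it and follow its links.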

If you have larger portions of your site that you want to block off, you're better off creating a specialized file called robots.txt, and placing it in the top-level folder of your site. The robot will check this file before it goes any further. The content inside the robots.txt file sets the rules.

If you want to stop a robot from indexing any part of your site, add this to the robots.txt file:

 User-Agent: *
 Disallow: /

The User-agent part identifies the type of robot you're addressing. (An asterisk represents all robots.) The Disallow part indicates what part of the Web site is off limits. (A single forward slash represents the whole site.)

To rope off just the Photos subfolder in your site, use this:

 User-Agent: *
 Disallow: /Photos

To stop a robot from indexing certain types of content (like images), use this:

 User-Agent: *
 Disallow: /*.gif
 Disallow: /*.jpeg

As this example shows, you can put as many Disallow rules as you want in the robots.txt file, one after the other.
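Before you upload a robots.txt file, you can test its rules with the robots.txt reader that comes with Python (the urllib.robotparser module). The following is a minimal sketch; the www.example.com addresses are placeholders, and it sticks to the plain /Photos rule because support for wildcard patterns varies from one robot to the next.

```python
from urllib.robotparser import RobotFileParser

# The same rules you'd save in robots.txt, one instruction per line.
rules = """\
User-agent: *
Disallow: /Photos
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether any robot ("*") may fetch each placeholder address.
photo_allowed = parser.can_fetch("*", "http://www.example.com/Photos/beach.jpg")
home_allowed = parser.can_fetch("*", "http://www.example.com/index.html")

print(photo_allowed)  # False
print(home_allowed)   # True
```

This tells you only how a well-behaved robot should read your file. It can't tell you whether a particular robot will actually obey it.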

Remember, the robots.txt file is just a set of guidelines for search engine robots. It's not a form of access control. In other words, it's similar to posting a "No Flyers" sign on your mailbox: it works only as long as advertisers choose to heed it.


Tip: You can learn much more about robots, including how to tell when they visit your site, and how to restrict the robots coming from specific search engines, at www.robotstxt.org.


Creating Web Sites: The Missing Manual
ISBN: B0057DA53M
EAN: N/A
Year: 2003
Pages: 135
