Defining a Special Search


A typical search won't meet every need. In fact, the searches many people conduct are less than successful or inefficient when successful because keywords alone usually can't express a query very well. (A search with results that require half a day to process when all you need is one answer is both successful and inefficient.) Even the addition of a Boolean search won't help you locate the information you need in many cases. The problem is that you haven't defined the search criteria completely. You can test most Google special searches using the Advanced Search Page shown in Figure 2.2.

Figure 2.2: The Google Advanced Search page helps you learn about special searches.

Look at the issue this way. Let's say that it takes about a minute for you to click a link in Google and decide whether the link will work or not. A minute isn't very long when you consider the time required to load the page and actually look at some of the information. A small result set of 60 links requires 60 minutes to search. Many of the search results we've discussed so far in the book contain thousands of links. Even if you eliminate many of those links by reading the snippets that Google provides, you can still spend a lot of time looking for the link you need.

Special searches can help define searches in more precise terms. In fact, careful use of search terms can reduce the number of results to the few you need. A perfect search returns just the results you need, but a less optimal result is acceptable in most cases. The idea is to reduce the number of results to a manageable level. The following sections describe various types of special search terms.

Including Terms

Google doesn't recognize common words as valid search terms. The kit refers to many of these terms as stop words. Examples include where and how. However, many other terms fall into this category, including terms that you might need to adequately define your search. Obviously, small words such as the fall into this category, as do most numbers.

Whenever you want to ensure that Google considers a stop word you provide, add a + (plus) sign in front of it. For example, using a search such as +for whom +the bell tolls ensures that Google considers both for and the as part of the search criteria.

Note  

Always include terms carefully because adding stop words normally increases the number of search results. Unless you know that a stop word will actually improve the search results, avoid using it.
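If you build search terms in code, marking stop words is easy to automate. The following Python sketch is only an illustration: the function name include_stop_words and the idea of passing in your own list of stop words are assumptions made for this example and aren't part of the Google kit.

```python
def include_stop_words(query, stop_words):
    """Prefix each listed stop word in the query with a plus sign.

    Hypothetical helper: the caller decides which words count as stop
    words, because this sketch doesn't know Google's actual list.
    """
    forced = {word.lower() for word in stop_words}
    return " ".join(
        "+" + token if token.lower() in forced else token
        for token in query.split()
    )


# Force Google to consider "for" and "the" as part of the search.
print(include_stop_words("for whom the bell tolls", ["for", "the"]))
```

Passing the stop words explicitly keeps the advice in the Note intact: you add a plus sign only to the words you know will improve the results.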

Excluding Terms

Most human languages have some level of ambiguity. The same word can mean different things depending on the context. In addition, the same word can have completely different meanings. For example, bit means a number of things depending on the context. Computer users view a bit as the smallest amount of data that you can create, while people who work with horses view a bit as something to put in a horse's mouth. A carpenter will view a bit as the sharp part of a tool, and a cook will view it as a small amount. Because of the ambiguity in human language, you need some way to exclude search terms that don't meet your needs.

Sometimes a keyword might produce odd results, or at least results that seem odd to the person receiving them. For example, most developers and users are aware of Amazon, the online reseller of books. However, this term also applies to a mythological group of warrior women and to a specific location on earth. Unless you specifically exclude terms, you could end up with references for everything but the online version of Amazon.

Google provides the - (minus) symbol for removing words from consideration. For example, the search term develop computer language -program -engineering -education helps you locate topics on computer speech, rather than a computer language such as C#, the language design process, or languages used for education. Another interesting word with too many results is renaissance. A typical Google search will turn up 6,410,000 hits. By adding a single exclusion, renaissance -Harlem, you can reduce the number of results to 5,450,000 because Google no longer displays links to pages with the word Harlem in them. The exclusion technique works much like a Boolean NOT: it helps you obtain everything but the excluded terms.

Tip  

It's easy to tune the exclusion terms by creating a keyword search and looking at the results. Simply start removing the terms from the initial search results that don't fit your search criteria. In some cases, you'll actually need to open the search page to locate a word that is generic enough to remove a number of the invalid results. Of course, you have to be careful not to use an exclusion word that will remove valid results as well.
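The tuning process described in the Tip amounts to keeping a list of unwanted words and rebuilding the search term each time the list grows. Here's a minimal Python sketch of that idea. The function name exclude_terms is hypothetical, and the code only assembles the search string; it doesn't contact Google.

```python
def exclude_terms(keywords, excluded):
    """Return a search term with a minus sign in front of each excluded word."""
    parts = list(keywords) + ["-" + term for term in excluded]
    return " ".join(parts)


# The two exclusion examples from this section.
print(exclude_terms(["renaissance"], ["Harlem"]))
print(exclude_terms(["develop", "computer", "language"],
                    ["program", "engineering", "education"]))
```

Keeping the excluded words in a separate list makes it easy to add or drop one exclusion at a time while you watch how the results change.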

Using a Synonym Search

Google provides a poorly documented symbol you can use with a search term to find the synonyms of that term in addition to the term itself. The only place you can find the synonym symbol is on the Advanced Search page at http://www.google.com/help/refinesearch.html. Interestingly enough, it doesn't appear within the symbols defined in the Google Web Services Kit and might not work for all searches. The ~ (tilde) symbol tells Google to perform a synonym search.

In Chapter 1, you learned that the Visual Basic Serial Port keywords returned 132,000 results at the time of writing. Likewise, the VB Serial Port keywords returned 57,300 results. Google returns 1,480,000 results when using ~VB Serial Port as a search term. When you go through the list of results, you'll see plenty of VB entries. However, you'll also see Visual Basic and VisualBASIC entries, along with other permutations such as BASIC alone.

Tip  

Google might not provide every synonym you want, or it might include a few you haven't considered. The best way to determine which synonyms Google uses is to perform a synonym search on the word alone, but exclude the search word itself, such as ~VB -VB. The first highlighted word in the snippet provided with the search results is the first synonym Google uses. Eliminate this word by excluding it also, and keep going until you have a search such as ~VB -VB -code -ActiveX -Visual -Basic -VB6. Eventually, Google will return an error message saying it didn't find any search results. At this point, you have all of the synonyms that Google uses for a particular search term. Now you can decide whether to add additional search terms of your own or to exclude search terms you really don't want to use.

You can use synonym searches to locate information when a Web site developer could use any of a number of terms. For example, if you use ~environment behavior as a search term, Google also returns pages with words such as climate and nature highlighted because these terms are synonyms for environment.

It's important to use some kind of filtering with a synonym search. The environment behavior search returns 3,830,000 results, but the ~environment behavior search returns 5,610,000 results, which is too many to view. To narrow the results, you could change the search to ~environment child behavior disorders -animal -mineral -discipline. By adding two keywords and excluding three others, you can reduce the number of results to 196,000. This is still too many results to view, but it does reduce the total by about 95 percent.
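A synonym search combines three pieces: the term marked with the tilde, any extra keywords, and any exclusions. The Python sketch below shows one way to assemble them. The name synonym_query and its parameters are assumptions made for this illustration.

```python
def synonym_query(synonym_term, keywords=(), excluded=()):
    """Build a search term that asks for synonyms of synonym_term.

    keywords are added as-is; each excluded word gets a leading minus sign.
    """
    parts = ["~" + synonym_term]
    parts.extend(keywords)
    parts.extend("-" + term for term in excluded)
    return " ".join(parts)


# The filtered search from this section.
print(synonym_query("environment",
                    keywords=["child", "behavior", "disorders"],
                    excluded=["animal", "mineral", "discipline"]))

# The first step of the synonym discovery technique: exclude the word itself.
print(synonym_query("VB", excluded=["VB"]))
```

For the discovery technique, you would call the function repeatedly, appending each newly highlighted synonym to the excluded list until Google stops returning results.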

Using Precise Phrases

Google doesn't normally retain the order of your keywords in a search; it simply looks for the keywords wherever they might appear in the resource. Even though the order of the words in the search phrase determines the order of the results you receive, the resource need not have the words in any particular order. Generally, you can look for information this way and not experience any problems. However, when you're looking for a specific quote, product, or reference, the order of the words becomes important. When this happens, you need to search using a precise phrase.

Precise phrases appear within double quotes. For example, if you wanted to look up the precise phrase, "for whom the bell tolls," you'd enclose it in double quotes as shown. One of the advantages of using a precise phrase is that it preserves small words without using any other special symbols.

The effects of a precise phrase are easy to see. Using all the words, +for whom +the bell tolls, returns 86,800 results. On the other hand, the precise phrase returns only 582 results.

Performing Site Restricted Searches

Most of the noncommercial sites that I check for information lack a search engine. In many cases, it's simply because the site is on a host where the Webmaster lacks access to the server. In some cases, the site lacks a search engine because the Webmaster simply hasn't had time to create one. Even when a site includes a search engine, it might provide incomplete results or not work the way you think it should. All of these problems occur with enough regularity that you will use the site search regularly.

Tip  

Because less skilled users usually have problems with "secret" combinations of query words and the site search is so helpful, this one search provides a significant return when using Web services. Even if you don't have a problem remembering the special query word used for a site search, the fact that you have to type the domain in for each site you want to visit can cause problems. Consequently, if you want to make a case for using Google Web Services and need a scenario with quantifiable results, this is one of the best. A Web service that provides access to specific sites through Google can greatly reduce search times and make users more productive.

To use the site search, simply add the query word site: to your search, along with the domain of the search site. For example, if you wanted to search Microsoft's main Web site, you'd use site:www.microsoft.com. Notice that you don't include a protocol (the http or ftp portion of the URL). In addition, you can't drill down to a particular folder on the domain. Consequently, you can't directly search hosted sites using this technique, unless you use a few tricks.

One of the best methods to drill down into a particular hosted site is to combine a site search with an URL search (see the "Defining an URL Search" section for details). The combination of site search and URL search ensures you get results from just that portion of the domain.

You can also use keywords to perform the search. For example, the site might include a particular keyword as part of the title or text for every page. (See the "Defining a Title Search" section of the chapter for details.) My Web site includes my company name, DataCon Services, as part of the title for every page. Not only does this technique make it easier to find my page, it also ensures you know precisely which Web site you're visiting.

Note  

You can't include more than one site per search. Consequently, this is a great place to use Web services. A user could enter multiple sites on a form and the code associated with the form would automatically make one request per entry.
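The one-request-per-site idea in the Note can be sketched in a few lines of Python. This example only builds the search terms; how you send each one to Google is left out because it depends on your development environment. The function name site_queries is hypothetical, and stripping the protocol and folder reflects the site search rules described earlier in this section.

```python
def site_queries(keywords, sites):
    """Return one site-restricted search term per site.

    Each entry is reduced to its bare domain because the site query word
    accepts neither a protocol nor a folder.
    """
    queries = []
    for site in sites:
        # Drop "http://" or "ftp://" if the user typed it.
        without_protocol = site.split("://", 1)[-1]
        # Drop any folder or page that follows the domain.
        domain = without_protocol.split("/", 1)[0]
        queries.append(f"{keywords} site:{domain}")
    return queries


# Entries as a user might type them on a form.
for query in site_queries("serial port",
                          ["http://www.microsoft.com/", "www.mwt.net"]):
    print(query)
```

Your application would then make one request for each search term in the list and merge the results for the user.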

Performing Date Restricted Searches

The Internet contains old data, at least old in terms of fast-moving technologies such as computers. A technology that's new today is old hat in 6 months to a year. Consequently, if you want new technology solutions, you need to filter out the old solutions that the media keeps around for historical reasons. Likewise, you might actually want older information. A history buff might not want the latest information because it doesn't have the historical context of the older material. For that matter, you might simply want to locate information that you remember seeing 3 months ago and don't want to wade through newer information.

This is one case where the Google Advanced Search page doesn't reflect the realities of using Google Web Services very well. The Google Advanced Search page lets you select from: anytime, past 3 months, past 6 months, and past year. The Google Web Service is a lot more capable in this regard, because you can supply specific dates for the information you want. The only problem is you have to supply the date range as a Julian date. Supplying the daterange query word as daterange:2452760-2452964 means that you want to search for pages that Google updated between 30 April 2003 and 21 November 2003.

Note  

Because the Google webbot doesn't traverse every link on the Internet every day, updates to a Web site often require a few days to appear. A webbot is essentially a software robot that travels across the Internet looking for new information. When it finds new information, it makes a change to the Google database that you access using Google Web Services. Consequently, it's important to realize that the dates you see on Google are the dates that the Google webbot last saw a change to the affected Web page, not the date the change actually occurred.

Calculating the Julian date isn't for the faint of heart because you need to consider all kinds of issues that have occurred over the years with the calendar, such as the introduction of the Gregorian calendar we now use. Fortunately, most programming languages include a special function that helps you convert Gregorian dates to Julian dates. If your programming language or other development environment (such as a scripted Web page) is one of the few that don't support this feature, you can always use a Web site that performs the conversion for you, such as the U.S. Naval Observatory site at http://aa.usno.navy.mil/data/docs/JulianDate.html. Note that you should use only the integer portion of the results from this site, not the decimal portion, which signifies the time of day.
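As an illustration of the conversion, here's a minimal Python sketch that turns a Gregorian calendar date into a whole-number Julian date and builds the daterange term from it. It relies on the standard library's date.toordinal() method. The constant 1721425 is the offset between that ordinal and the Julian day number counted at noon Universal Time, which is an assumption about the convention you want; a tool that assumes a different time of day can differ by one day. The function names are hypothetical.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (1 = 1 January of
# year 1) and the Julian day number of the same calendar date at noon UT.
JDN_OFFSET = 1721425


def julian_day_number(day):
    """Return the whole-number Julian date for a Gregorian calendar date."""
    return day.toordinal() + JDN_OFFSET


def daterange_term(start, end):
    """Build a daterange query word covering start through end."""
    return f"daterange:{julian_day_number(start)}-{julian_day_number(end)}"


# 30 April 2003 is the starting date used in the example above.
print(julian_day_number(date(2003, 4, 30)))
print(daterange_term(date(2003, 4, 30), date(2003, 5, 31)))
```

Because the function returns an integer, there's no decimal portion to strip off before you place the value in the search term.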

Defining a Title Search

Google actually supports two forms of title search. The first type performs a title search using just a single keyword, while the second type performs a title search using all of the keywords you've supplied. To perform a single keyword search, use the intitle query word. For example, if you use intitle:VB6 program development as a search term, Google only looks for VB6 in the title of the page. The words program and development can appear anywhere. To perform a title search with all of the keywords, use the allintitle query word. Google returns only two results when you use allintitle:VB6 program development as the search term, but returns 604 results when you use the intitle:VB6 program development search term.

Tip  

You don't have to settle for an all or nothing approach to a keyword search. It's possible to use the intitle query word in front of any word in the search term. For example, to search for Visual Basic projects for applications that require serial port support, you could use a search term such as intitle:Visual intitle:Basic serial port. Google will look only in titles for Visual Basic, but will look through all parts of the Web page for serial port.
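The difference between the two forms of title search is easy to see when you generate the search terms in code. The following Python sketch uses two hypothetical helper functions, title_query and all_in_title_query, which only build search strings.

```python
def title_query(title_words, anywhere_words=()):
    """Require each of title_words in the page title.

    The anywhere_words can match any part of the page.
    """
    parts = ["intitle:" + word for word in title_words]
    parts.extend(anywhere_words)
    return " ".join(parts)


def all_in_title_query(words):
    """Require every word to appear in the page title."""
    return "allintitle:" + " ".join(words)


# Title-only match for Visual Basic, anywhere match for serial port.
print(title_query(["Visual", "Basic"], ["serial", "port"]))

# Every keyword must appear in the title.
print(all_in_title_query(["VB6", "program", "development"]))
```

The first call gives you the mixed approach described in the Tip, while the second reproduces the all-keywords search.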

It's easy to combine a title search with other searches to locate a specific Web page. For example, you can't directly search my site using a site search because I don't have my own domain. However, you can locate information about my site by combining a site search with a title search like this: site:www.mwt.net intitle:DataCon Services.

Title searches aren't as successful as other search types in helping you map out sites if the site creator didn't use titles carefully. In fact, many private sites don't use titles at all, while some commercial sites use them inconsistently. For example, the search term site:www.microsoft.com intitle:Windows doesn't necessarily return all of the Windows sites. The results are still useful, but Microsoft doesn't use site titles consistently. It's important to note that the search will return results for every page that does include Windows in the title, so this search will return links for products such as Windows Media Player and Windows 2000.

It's also important to remember that the keyword you supply can appear anywhere in the title, as shown in Figure 2.3. In this case, the example uses an allintitle search with Visual Basic 6 as the keyword. Depending on how you phrase the search term, you can obtain very specific results using this technique, but you also need to realize that the search results might not be complete.

Figure 2.3: The Google Advanced Search page helps you learn about special searches.

Tip  

Most competent Web page developers provide a title that accurately matches the content of the page. Consequently, the title search can reduce the number of results that apply to sites with less helpful information. This technique is usually better for commercial or professional sites, rather than home or hobby sites.

Defining an URL Search

Like the title search, Google provides two forms of the URL search. The first uses just a single keyword as the basis for searching for text within an URL, while the second uses all of the keywords as the basis for the search. To perform a single keyword search, use the inurl query word. In this case, you must think about words that you'd see in an URL. For example, it's likely that you'll see VB6, not Visual Basic 6, as part of an URL. You can use the allinurl query word to perform a search with all of the keywords. In this case, you might want to perform a search for one or more subfolders of a main site. For example, you might look for eBay and antique as keywords. Figure 2.4 shows typical results for an allinurl query. Again, the idea is to choose words that will appear as part of an URL. Like the title search, you can also use multiple instances of the inurl query word to locate several words in an URL without using all of the keywords for this purpose.

Figure 2.4: Use URL searches to locate sites that have specific areas that match your keywords.

Many Web sites don't have their own domain. Instead, they host space on someone else's server and have a home URL that reflects their location within another domain. For example, you can use the URL search technique to locate information on my site. All you need to do is perform a site search and combine it with an URL search like this: site:www.mwt.net inurl:~jmueller. The ~jmueller portion of the URL for my home page appears as part of every URL for my Web site.

Of course, you can use a combination technique to search portions of a larger Web site. For example, you might want to visit only the Windows-specific pages on Microsoft's site. A combination search, site:www.microsoft.com inurl:windows, provides the results you want. A side effect of this particular search is that you'll actually build a site map of sorts for just the Windows-specific URLs. In fact, if you couple Google Web Services with a graphics application, you could easily build a chart that lays out the entire Microsoft site.

Note  

This example only returns links that have Windows in them. If you want links for Windows 2000, you need to expand the search term to include Windows 2000 like this: site:www.microsoft.com inurl:windows OR inurl:windows2000. An URL search can be more restricted than a title search. While the title search discussed earlier returns all Microsoft sites with the word Windows in the title, the URL search returns only those sites that have the word Windows in the URL.
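Combining a site search with one or more URL keywords follows a regular pattern, so it's a good candidate for a small helper. The Python sketch below is an illustration only: the function name site_url_query is hypothetical, and it joins multiple URL keywords with OR in the same way as the Windows 2000 example.

```python
def site_url_query(domain, url_words):
    """Restrict a search to one domain and to URLs containing any url_words."""
    url_part = " OR ".join("inurl:" + word for word in url_words)
    return f"site:{domain} {url_part}"


# Search a hosted site that lives in a folder on someone else's domain.
print(site_url_query("www.mwt.net", ["~jmueller"]))

# Search the Windows and Windows 2000 areas of a larger site.
print(site_url_query("www.microsoft.com", ["windows", "windows2000"]))
```

With a single URL keyword, the function produces the simple combination search. Each additional keyword widens the search to another area of the same site.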

Looking for Text Alone

It's possible to conduct a search where the keywords appear within the title, URL, links, or other areas of the site, but not within the text of the site itself (the area within the <body> tag). Precisely why this happens depends on the site, but often it has to do with text that you'll never see. The result of this kind of fruitless search is that you end up finding sites that have nothing to do with the research you're conducting. To avoid this problem, you can perform a text only search using the allintext query word.

Adding the allintext query word ensures that you only receive sites that actually discuss the topic you want to read about. For example, a search on VB Serial Port returns 57,300 results. However, using allintext:VB Serial Port returns only 41,100 results, which means that 16,200 of the previous results didn't have any mention of VB Serial Port in the actual text of the document.

This kind of search is especially helpful in technical or professional searches where you need to find specific words and learn how they're used. For example, developers have specific needs in this area when locating code that performs a particular task. It's hard to locate such examples because Web sites often list the terms in other areas without actually providing a coding example.

Looking for Links Alone

One of the biggest questions that developers have when asked about link searches is why anyone would want to look for a link. A link on a page doesn't really say much except that the author of the page is providing a pointer to some other location. However, links are important for a number of reasons. For example, you might want to know how many sites reference your Web site. Sometimes it's handy to know which sites have information that relates to a particular topic. You can even use Google to discover which pages have a particular link on them. The following sections discuss all three of the link searches.

Working with Back Links

Finding Web pages that reference another Web page can be important for a number of reasons. However, many developers will use this approach to look for pages that reference a particular site. For example, you might need to verify which sites reference your site. To use this search, specify the link query word as part of the search term, along with the URL that you want to locate. You must use this query word alone; it doesn't work in combination with other query words.

Site references are important for a number of reasons. You might have paid to get your site mentioned on the host site (or provided some other service, such as writing content for the host site). Users of your site might complain that they can't access links found on other sites. Performing a back link search helps you locate sites with old links to your site.

Note  

The back link search differs from the specific link search in a very important way. When working with the back link search, you provide a specific URL for which you want to search. On the other hand, a specific link search looks for links on a page that contain a particular keyword. The keyword isn't an URL in this case, but part of an URL. Use the back link search when you need to find a particular URL and the specific link search when you need to locate URLs that contain specific keywords.

Locating Related Links

Not every piece of information on the Internet relates in some way, but connections do exist. Locating these relations can prove difficult and without help, you won't find them all. The related query word helps you find sites that have some kind of connection to a site whose URL you provide. For example, you might want to locate a site that sells classic cars based on the content of a site that you've already located. Figure 2.5 shows typical results from this kind of search.

Figure 2.5: Locate related sites using the related links search.

Remember that this search locates information based on an URL that you supply. Consequently, it's important to locate a site that specifically matches your search criteria. Otherwise, Google will return results based on a less than perfect URL selection. You must use this query word alone; it doesn't work in combination with other query words.
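Because both the link and related query words must appear alone, code that builds these searches should refuse anything except a single URL. The following Python sketch shows one way to enforce that rule. The function name standalone_query and the checks it performs are assumptions made for this illustration, and the URL used is just the Microsoft domain from earlier examples.

```python
# Query words that, according to this section, can't be combined with others.
STANDALONE_WORDS = ("link", "related")


def standalone_query(query_word, url):
    """Build a link or related search from a single URL and nothing else."""
    if query_word not in STANDALONE_WORDS:
        raise ValueError(f"unsupported query word: {query_word}")
    url = url.strip()
    if not url or " " in url:
        # Extra words would turn this into a combined search.
        raise ValueError("supply exactly one URL and no other terms")
    return f"{query_word}:{url}"


# Find pages that point to a site, then pages that relate to it.
print(standalone_query("link", "www.microsoft.com"))
print(standalone_query("related", "www.microsoft.com"))
```

Rejecting extra terms up front means a user's mistake shows up in your application, rather than as an unexpected set of results from Google.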

Finding Specific Links

Sometimes you need to locate pages that have a specific link on them. For example, you might want to know which sites reference support for a particular product. To search for specific keywords in links, you provide a search term that includes the allinlinks query word. Unfortunately, this search doesn't appear on the Advanced Search page, so the only way you can try it out is using Google Web Services.




Mining Google Web Services: Building Applications with the Google API
ISBN: 0782143334
Year: 2004
Pages: 157