Hack 52. Prefetch Yahoo! Search Results


Automatically prefetch and cache the first search result on Yahoo! Web Search.

If you know how to use them properly, search engines are pretty darn good at finding exactly the page you're looking for. Google is so confident in its algorithm that it includes a hidden <link rel="prefetch"> element in the search results page that tells Firefox to prefetch the first search result and cache it. You're probably going to click on the first result anyway, and when you do, it will load almost instantaneously, because your browser has already been there.

Yahoo! Web Search is pretty good, too, but it doesn't yet have this particular feature. So let's add it.

There are two important things about Yahoo! search results that you can discover by viewing the source on its search results page. First, the links of the search results each have a class yschttl. Yahoo! uses this for styling the links with CSS, but you can use it to find the links in the first place. A single XPath query can extract a list of all the links with the class yschttl, and the first one of those is the one we want to prefetch and cache.

The second thing you need to know is that the search results Yahoo! provides are actually redirects through a tracking script on rds.yahoo.com that records which link you clicked on. A sample link looks like this:

    http://rds.yahoo.com/S=2766679/K=gpl+compatible/v=2/SID=e/TID=F510_112/l=WS1/R=2/IPC=us/SHE=0/H=1/SIG=11sgv1lum/EXP=1116517280/*-http%3A//www.gnu.org/licenses/gpl-faq.html

To save time and bandwidth, and to avoid skewing Yahoo!'s tracking statistics, this user script will extract the target URL out of the first search result link before requesting it. The target URL is always at the end of the tracking URL, after the *-, with characters such as colons (:) escaped into their hexadecimal equivalents. Here's the target URL in the previous example:

 http://www.gnu.org/licenses/gpl-faq.html 
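
The extraction step can be sketched as a small standalone function. This is a minimal sketch rather than the script's exact code: the name extractTargetUrl is hypothetical, it uses decodeURIComponent where the script in this hack uses the older unescape, and it assumes the target follows the first *- marker in the link.

```javascript
// Hypothetical helper: pull the real destination out of a Yahoo! tracking URL.
// Assumes the target URL follows the "*-" marker, as in the sample link.
function extractTargetUrl(trackingUrl) {
  var marker = trackingUrl.indexOf('*-');
  if (marker === -1) {
    // No tracking wrapper found; assume the URL is already the target.
    return trackingUrl;
  }
  // Everything after "*-" is the percent-encoded target URL.
  var encoded = trackingUrl.slice(marker + 2);
  try {
    return decodeURIComponent(encoded);
  } catch (e) {
    // Malformed escape sequence: fall back to the still-encoded text.
    return encoded;
  }
}

var sample = 'http://rds.yahoo.com/S=2766679/K=gpl+compatible/v=2/SID=e/' +
  'TID=F510_112/l=WS1/R=2/IPC=us/SHE=0/H=1/SIG=11sgv1lum/EXP=1116517280/' +
  '*-http%3A//www.gnu.org/licenses/gpl-faq.html';

console.log(extractTargetUrl(sample));
// prints "http://www.gnu.org/licenses/gpl-faq.html"
```

Returning the input unchanged when no marker is present means the same helper is harmless on links that are not wrapped in a tracking redirect.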

When I say "prefetch and cache," there is really only one step: prefetch. By default, Firefox automatically caches pages according to HTTP's caching directives and your browser preferences. For this script to have the desired effect, make sure that your browser preferences are set to enable caching pages. Open a new window or tab, go to about:config, and double-check the following preferences:

    * browser.cache.disk.enable  /* should be "true" */
    * browser.cache.check_doc_frequency  /* should be 0, 2, or 3 */

about:config shows you all your browser preferences, even ones that are not normally configurable through the Options dialog. Type part of a preference name (such as browser.cache) in the Filter box to narrow the list of displayed preferences.


6.7.1. The Code

This user script will run on Yahoo! search results pages. It works by finding the first search result on the page and retrieving it. You might think that it would be easier to add a <link rel="prefetch"> to the page, which is how Google's prefetching works. Unfortunately, this does not work, because by the time the user script executes, Firefox has already prefetched all the links it's going to fetch for the page.

Save the following user script as yahooprefetch.user.js:

    // ==UserScript==
    // @name          Yahoo! Prefetcher
    // @namespace     http://diveintomark.org/projects/greasemonkey/
    // @description   prefetch first link on Yahoo! web search results
    // @include       http://search.yahoo.com/search*
    // ==/UserScript==

    var elmFirstResult = document.evaluate("//a[@class='yschttl']",
        document, null, XPathResult.FIRST_ORDERED_NODE_TYPE,
        null).singleNodeValue;
    if (!elmFirstResult) return;
    var urlFirstResult = unescape(elmFirstResult.href.replace(/^.*\*-/, ''));
    var oRequest = {
        method: 'GET',
        url: urlFirstResult,
        headers: {'X-Moz': 'prefetch',
                  'Referer': document.location.href}};
    GM_log('prefetching ' + urlFirstResult);
    GM_xmlhttpRequest(oRequest);

6.7.2. Running the Hack

To verify that the script is working properly, you'll need to clear your browser cache. You don't need to do this every time, just once to prove to yourself that the script is doing something. To clear your cache, go to the Tools menu and select Options; then, go to the Privacy tab and click the Clear button next to Cache.

You can also use the LiveHTTPHeaders extension to see exactly which URLs Firefox fetches. You can download the extension at http://livehttpheaders.mozdev.org/.


Now, install the user script from Tools → Install This User Script, and then go to http://search.yahoo.com and search for gpl compatible. The prefetching happens in the background after the page is fully loaded, so wait a second or two after the search results come up. There won't be any visible indication onscreen that Firefox is prefetching the link. You might see some additional activity on your modem or network card, but it's hard to separate this from the activity of loading the rest of the Yahoo! search results page.
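
Because nothing shows up onscreen, one option is to have the script log the outcome of the prefetch. The sketch below assumes that the details object passed to GM_xmlhttpRequest accepts onload and onerror callbacks; the helper name buildPrefetchRequest, its log parameter, and the message wording are hypothetical.

```javascript
// Hypothetical helper: build the details object for GM_xmlhttpRequest,
// adding callbacks that report what happened to the prefetch.
function buildPrefetchRequest(url, referer, log) {
  return {
    method: 'GET',
    url: url,
    headers: {'X-Moz': 'prefetch', 'Referer': referer},
    // Called when the response arrives; response.status is the HTTP code.
    onload: function (response) {
      log('prefetched ' + url + ' (HTTP ' + response.status + ')');
    },
    // Called when the request could not be completed.
    onerror: function () {
      log('prefetch failed for ' + url);
    }
  };
}

// Demo with a stand-in logger so the sketch also runs outside the browser.
var messages = [];
var demoRequest = buildPrefetchRequest(
  'http://www.gnu.org/licenses/gpl-faq.html',
  'http://search.yahoo.com/search?p=gpl+compatible',
  function (msg) { messages.push(msg); });

demoRequest.onload({status: 200});  // simulate a successful response
console.log(messages[0]);
```

In the user script itself, the oRequest assignment could become buildPrefetchRequest(urlFirstResult, document.location.href, GM_log), so that the result of each prefetch is written wherever GM_log sends its output.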

Open a new browser window or tab and go to about:cache. This displays information about Firefox's browser cache. Under "Disk cache device," click List Cache Entries. You should see a key for http://www.gnu.org/philosophy/license-list.html. This is the result of Firefox prefetching and caching the first Yahoo! search result. Click that URL to see specific information about the cache entry, as shown in Figure 6-11.

Figure 6-11. Information about a prefetched page


6.7.3. Hacking the Hack

By now, you should realize that this prefetching technique can be used anywhere, with any links. Do you use some other search engine, perhaps a site-specific search engine such as Microsoft Developer Network (MSDN)? You can apply the same technique to those search results.

For example, going to http://msdn.microsoft.com and searching for active accessibility takes you to a search results page at this URL:

    http://search.microsoft.com/search/results.aspx?qu=active+accessibility&View=msdn&st=b&c=0&s=1&swc=0

If you view source on the page, you will see that the result links are contained within a <div class="results"> tag. This means that the first result can be found with this XPath query:

   var elmFirstResult = document.evaluate("//div[@class='results']//a[@href]",
       document, null, XPathResult.FIRST_ORDERED_NODE_TYPE,
       null).singleNodeValue;

Unlike Yahoo! search result links, MSDN search result links are not redirected through a tracking script, so you will need to change this line:

   var urlFirstResult = unescape(elmFirstResult.href.replace(/^.*\*-/, '')); 

to this:

   var urlFirstResult = elmFirstResult.href; 

The rest of the script will work unchanged.
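
One way to support several search engines in a single script is a small table of per-site rules. The sketch below is illustrative only: the SITE_RULES table, the findRule helper, and matching on the page's hostname are assumptions, not part of the original hack. The XPath expressions and the URL-unwrapping expression are the ones quoted in this hack.

```javascript
// Hypothetical rule table: one entry per supported search site.
var SITE_RULES = [
  {
    host: 'search.yahoo.com',
    xpath: "//a[@class='yschttl']",
    // Yahoo! wraps targets in a tracking URL; strip everything up to "*-".
    targetOf: function (href) {
      return unescape(href.replace(/^.*\*-/, ''));
    }
  },
  {
    host: 'search.microsoft.com',
    xpath: "//div[@class='results']//a[@href]",
    // No tracking redirect here, so the href is already the target.
    targetOf: function (href) { return href; }
  }
];

// Return the rule whose host matches, or null for unsupported sites.
function findRule(hostname) {
  for (var i = 0; i < SITE_RULES.length; i++) {
    if (SITE_RULES[i].host === hostname) return SITE_RULES[i];
  }
  return null;
}

// Inside Greasemonkey, look up the rule for the current page and prefetch.
if (typeof document !== 'undefined' &&
    typeof GM_xmlhttpRequest === 'function') {
  var rule = findRule(document.location.hostname);
  var elm = rule && document.evaluate(rule.xpath, document, null,
      XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
  if (elm) {
    GM_xmlhttpRequest({
      method: 'GET',
      url: rule.targetOf(elm.href),
      headers: {'X-Moz': 'prefetch', 'Referer': document.location.href}
    });
  }
}
```

A script built this way would also need one @include line in its metadata block for each site in the table.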



    Greasemonkey Hacks
    Greasemonkey Hacks: Tips & Tools for Remixing the Web with Firefox
    ISBN: 0596101651
    Year: 2005
    Pages: 168
    Authors: Mark Pilgrim
