Hack 91. Search Google by Link Graph

Use Google's Web Services API and a Flickr-style link graph to search Google.

Google is a great search engine, but sometimes I find myself looking at the page snippets more than I do the pages themselves. This hack takes the snippets and looks for repeating words around the search term. It's a fascinating way to get more insight into a search phrase.

9.7.1. The Code

Save the code in Example 9-10 as index.php.

Example 9-10. A DHTML link graph that uses Google as a data source
<?php
require_once("Services/Google.php");

// Common words to leave out of the graph
$ignore = array(
  'the','for','and','with','new','are','but','its','that','was',
  'your', 'yours', 'also', 'all', 'use', 'could', 'would', 'should', 'when',
  'they', 'far', 'one', 'two', 'three', 'you', 'most', 'how', 'these', 'there',
  'now', 'our', 'from', 'only', 'here', 'will'
);
$ignorehash = array();
foreach( $ignore as $word ) { $ignorehash[ $word ] = 1; }

$term = "Code Generation";
if( array_key_exists( 'term', $_GET ) )
  $term = $_GET['term'];

$key = "GOOGLE_KEY";

$google = new Services_Google( $key );
$google->queryOptions['limit'] = 50;
$google->search( $term );

$data = array();
foreach( $google as $i => $result )
{
  $data []= array(
    'title' => $result->title,
    'snippet' => $result->snippet,
    'URL' => $result->URL
  );
}

// Remove characters that would break a single-quoted JavaScript string
function jsencode( $text )
{
  $text = preg_replace( '/[\r\n]+/', ' ', $text );
  $text = preg_replace( '/[\'\\\\]/', '', $text );
  return $text;
}

// Strip markup and punctuation, then return the lowercased words
// that are longer than two characters
function get_words( $text )
{
  $text = preg_replace( '/\<(.*?)\>/', '', $text );
  $text = preg_replace( '/[.]/', '', $text );
  $text = preg_replace( '/,/', '', $text );
  $text = html_entity_decode( $text );
  // Strip any tags that appeared once the entities were decoded
  $text = preg_replace( '/\<(.*?)\>/', '', $text );
  $text = preg_replace( '/[\'|\"|\-|\+|\:|\;|\@|\/|\\\\|\#|\!|\(|\)]/', '', $text );
  $text = preg_replace( '/\s+/', ' ', $text );
  $words = array();
  // explode() replaces split(), which was removed in PHP 7
  foreach( explode( ' ', $text ) as $word )
  {
    $word = strtolower( $word );
    $word = preg_replace( '/^\s+/', '', $word );
    $word = preg_replace( '/\s+$/', '', $word );
    if( strlen( $word ) > 2 )
      $words []= $word;
  }
  return $words;
}

// Count each word and remember which results it came from
$found = array();
$id = 0;
foreach( $data as $row )
{
  $row['id'] = $id;
  $id += 1;
  $words = get_words( $row['snippet'] );
  foreach( $words as $word )
  {
    if ( !array_key_exists( $word, $found ) )
    {
      $found[$word] = array();
      $found[$word]['word'] = $word;
      $found[$word]['count'] = 0;
      $found[$word]['rows'] = array();
    }
    $found[$word]['count'] += 1;
    $found[$word]['rows'][$row['URL']] = $row;
  }
}

// Keep words that occur more than once and are not in the ignore list
$good = array();
foreach( array_keys( $found ) as $text )
{
  if ( $found[$text]['count'] > 1 &&
       array_key_exists( $text, $ignorehash ) == false )
    $good []= $found[$text];
}

$min = 1000000;
$max = -1000000;

function row_compare( $a, $b ) { return strcmp( $a['word'], $b['word'] ); }
usort( $good, 'row_compare' );

foreach( $good as $row )
{
  if ( $row['count'] < $min ) $min = $row['count'];
  if ( $row['count'] > $max ) $max = $row['count'];
}
// Guard against division by zero when every word has the same count
$ratio = ( $max > $min ) ? 10.0 / (float)( $max - $min ) : 0.0;
?>
<html>
<head>
<style type="text/css">
.word-link { line-height: 18pt; }
.title { border-bottom: 1px dotted black; margin-top: 5px; }
.snippet { margin-left: 20px; font-size:small; margin-top: 5px; margin-bottom: 5px; }
</style>
<script type="text/javascript">
var pages = [
<?php foreach( $data as $row ) { ?>
{
  url: '<?php echo( jsencode( $row['URL'] ) ); ?>',
  snippet: '<?php echo( jsencode( $row['snippet'] ) ); ?>',
  title: '<?php echo( jsencode( $row['title'] ) ); ?>'
},
<?php } ?>
];

function display( items )
{
  var obj = document.getElementById( 'found' );
  var html = "";
  for( var i = 0; i < items.length; i++ )
  {
    var p = pages[ items[ i ] ];
    html += "<div class=\"title\"><a href=\""+p.url+"\" target=\"_blank\">"+p.title+"</a></div>";
    html += "<div class=\"snippet\">"+p.snippet+"</div>";
  }
  obj.innerHTML = html;
}
</script>
</head>
<body>
<table width="600" cellspacing="0" cellpadding="5">
<tr>
<td colspan="2">
<form>
Search term: <input type="text" name="term" value="<?php echo( htmlspecialchars( $term ) ); ?>" />
&nbsp;
<input type="submit" value="Search">
</form>
</td>
</tr>
<tr>
<td width="50%" valign="top">
<?php
foreach( $good as $row ) {
  $val = (float)( $row['count'] - $min );
  $fontsize = floor( 10.0 + ( $val * $ratio ) );
  $row_ids = array();
  foreach( $row['rows'] as $r ) { $row_ids []= $r['id']; }
  $rows = join(',', $row_ids );
?>
<a href="javascript:display([<?php echo($rows); ?>]);" style="font-size:<?php echo($fontsize); ?>pt;"><?php echo( $row['word'] ); ?></a>
&nbsp;
<?php } ?>
</td>
<!-- display() looks up this cell by its id and writes the matching pages here -->
<td width="50%" valign="top" id="found">
</td>
</tr>
</table>
</body>
</html>

This script is a combination of PHP and JavaScript. The PHP uses the Services_Google PEAR module [Hack #2] to download a set of search results. It then strips the HTML from each snippet and breaks the text into words. It counts how often each word occurs and records which results contain it. The result titles, snippets, and URLs are written into a JavaScript array on the page, and each word that appears more than once becomes a link whose font size reflects its count.
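The counting step can be tried without a Google key. The following is a minimal sketch, not the listing's exact code: the sample snippets are made up, the function names snippet_words() and count_words() are hypothetical, and the word splitting uses strip_tags() and a single regular expression in place of the listing's sequence of replacements.

```php
<?php
// Hypothetical sample snippets standing in for real search results.
$snippets = array(
    'The <b>Addams</b> <b>Family</b> cast and crew.',
    'Full cast list for the Addams Family movie.',
    'Family review: goofs, trivia, and more.',
);

// Strip tags and punctuation, lowercase the text, and keep words
// longer than two characters.
function snippet_words( $text ) {
    $text = strip_tags( html_entity_decode( $text ) );
    $text = preg_replace( '/[^a-z0-9\s]/i', '', $text );
    $words = preg_split( '/\s+/', strtolower( $text ), -1, PREG_SPLIT_NO_EMPTY );
    return array_values( array_filter( $words, function ( $w ) {
        return strlen( $w ) > 2;
    } ) );
}

// Map each word to its total count and to the snippets (by index)
// in which it appears.
function count_words( array $snippets ) {
    $found = array();
    foreach ( $snippets as $id => $snippet ) {
        foreach ( snippet_words( $snippet ) as $word ) {
            if ( !isset( $found[$word] ) ) {
                $found[$word] = array( 'count' => 0, 'rows' => array() );
            }
            $found[$word]['count'] += 1;
            $found[$word]['rows'][$id] = true;
        }
    }
    return $found;
}

$found = count_words( $snippets );
echo $found['family']['count'], "\n";
```

Keying the rows by snippet index plays the same role as keying by URL in the listing: a word that occurs twice in one snippet is counted twice but links to that result only once.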

After that, it's up to the browser, which displays the found terms on the lefthand side of the page. When a user clicks on a term, the JavaScript sets the inner HTML (innerHTML) of the righthand side of the page to show the matching articles.

All of this occurs in the JavaScript display() function.
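Because display() reads from a JavaScript array that PHP writes into the page, the titles and snippets have to survive being embedded in a script block. One alternative to escaping by hand, sketched here under the assumption that PHP's built-in json_encode() is available (PHP 5.3 or later for the flags used), is to encode the whole array at once. The function name pages_to_js() and the sample row are hypothetical.

```php
<?php
// Hypothetical result row with the same keys the listing stores in $data.
$data = array(
    array(
        'title'   => "The <b>Addams</b> Family's home page",
        'snippet' => "Line one\nline two with a </script> tag",
        'URL'     => 'http://example.com/addams',
    ),
);

// Encode the rows as a JavaScript array literal. The JSON_HEX_* flags
// escape <, >, &, ' and " so the output can sit inside a <script> block.
function pages_to_js( array $data ) {
    $pages = array();
    foreach ( $data as $row ) {
        $pages[] = array(
            'url'     => $row['URL'],
            'snippet' => $row['snippet'],
            'title'   => $row['title'],
        );
    }
    return json_encode(
        $pages,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
    );
}

$js = pages_to_js( $data );
echo "var pages = $js;\n";
```

The browser decodes the escaped characters when it parses the array, so display() still receives the original titles and snippets, bold tags included.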


9.7.2. Running the Hack

Edit the file to replace the value of $key with the value that you get when you sign up for Google's Web API access (http://www.google.com/apis/). Next, install the Services_Google PEAR module [Hack #2].
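If you would rather not store the key in the file itself, one option is to read it from the server environment. This is only a sketch: the variable name GOOGLE_API_KEY and the function load_google_key() are assumptions made for illustration, not part of the hack or of Services_Google.

```php
<?php
// Return the API key from the named environment variable, or the
// fallback when the variable is unset or empty.
// GOOGLE_API_KEY is a hypothetical name chosen for this sketch.
function load_google_key( $fallback = 'GOOGLE_KEY', $envName = 'GOOGLE_API_KEY' ) {
    $value = getenv( $envName );
    return ( $value !== false && $value !== '' ) ? $value : $fallback;
}

$key = load_google_key();
```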

The final step is to upload the index.php file to the server and navigate to it in your browser. The result should look like Figure 9-12.

Figure 9-12. Searching for Addams Family


The lefthand column shows all of the words that appear more than once across the snippets returned for the search. As you can see, the two most popular are Addams and Family, which makes perfect sense. But there are some interesting ones as well, such as the names of the other characters in the show, as well as review, cast, and (surprisingly) goofs.
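Which words make it into the column, and how large each one is drawn, comes down to a filter and a linear scale. The sketch below uses made-up counts and the hypothetical function names popular_words() and font_sizes(); it follows the same rule as the listing (drop ignored words and words seen only once, then map counts onto sizes from 10pt to 20pt).

```php
<?php
// Hypothetical word counts, as the counting step might produce them.
$counts = array(
    'addams' => 12, 'family' => 12, 'cast' => 4,
    'goofs' => 2, 'the' => 9, 'gomez' => 1,
);
$ignore = array( 'the', 'and', 'for' );

// Keep words seen more than once that are not in the ignore list,
// sorted alphabetically.
function popular_words( array $counts, array $ignore ) {
    $good = array();
    foreach ( $counts as $word => $count ) {
        if ( $count > 1 && !in_array( $word, $ignore, true ) ) {
            $good[$word] = $count;
        }
    }
    ksort( $good );
    return $good;
}

// Scale each count linearly onto font sizes from 10pt (rarest)
// to 20pt (most common).
function font_sizes( array $good ) {
    if ( !$good ) {
        return array();
    }
    $min = min( $good );
    $max = max( $good );
    // Avoid dividing by zero when every word has the same count.
    $ratio = ( $max > $min ) ? 10.0 / ( $max - $min ) : 0.0;
    $sizes = array();
    foreach ( $good as $word => $count ) {
        $sizes[$word] = (int) floor( 10.0 + ( $count - $min ) * $ratio );
    }
    return $sizes;
}

$sizes = font_sizes( popular_words( $counts, $ignore ) );
print_r( $sizes );
```

With these counts, addams and family come out at 20pt, cast at 12pt, and goofs at 10pt, while the and gomez are dropped.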

Clicking on any one of these items will list the pages that had that word in the snippet, as shown in Figure 9-13.

I wrote this little page for this book as a test of the Google Web Services API, but it's turned out to be much cooler than that. The link-graph-style visualization [Hack #24] can take this information to a whole new level.

Figure 9-13. Clicking on a snippet term shows the related pages


9.7.3. See Also

  • "Create Link Graphs" [Hack #24]



PHP Hacks: Tips & Tools For Creating Dynamic Websites
ISBN: 0596101392
Year: 2006
Pages: 163
