Enhancing Results | Professional Web APIs with PHP. eBay, Google, PayPal, Amazon, FedEx, Plus Web Feeds

Although the preceding example works rather well for testing and begins to show just how easy it can be to perform searches through the API, graphically it is far from pleasing. The next example walks through several of the steps involved in integrating Google search results into your own website. Keep in mind that every page (or template) is different, and the steps and effort required will vary based on the complexity of the template and your familiarity with it:

1. Create a development copy.

View the source of the target page, then copy and paste it into your development program. Take a quick look through for relative links (try searching for ="/ or ="), and change them to absolute locations. Links to CSS (Cascading Style Sheets) files are particularly important to fix.

2. Examine the development copy.

Take a look at the development copy in your web browser; it should look identical to the original. If not, check again for relative links. It isn't imperative that everything be perfect; as long as the overall flow of the page remains the same, you can continue. Take a look at the major block elements of the page (table cells or major DIV/SPAN blocks). Using a Mozilla web browser with the Web Developer extension can make this trivial, because it can outline these elements under the Outline drop-down list. Consider where the search box and results will be displayed, and which current page elements they should be styled after.

3. Remove unneeded content from the development copy.

Unless the target page was rather empty to begin with, the search results will displace content previously on the page. Remove the unneeded content from the page, being careful not to remove structural markup. At this point, the page layout may break. If this happens, one of two things has occurred. First, it is possible that the height of the content area was an integral part of the design, and removing it caused other elements to be sized improperly. If this happened, insert some meaningless text and check again. In the complete example, the search results will take up that space. Second, it is possible that there were some structural elements mixed in with the content that was just removed (for example, if each story on a news page was contained in its own table cell). Undo your changes and examine the code more closely.

4. Insert the search code from the previous example into the development copy.

Take the search code from the previous example and place it in the hole created by removing content. If the content contained a few essential structural elements, try splitting the output among them, with the search result information (keywords, result count, and so on) in one, the results in the next, and the link for the next page in the last. If the content contained many structural elements, echo the appropriate HTML code within the foreach loop that iterates through the results.

5. View the resultant page in your web browser.

Take a look at the page in your web browser. It should look relatively in tune with the original. Run a few searches to see how things work. Make sure you run at least one search designed to return only a few results or none at all, to ensure the page still works structurally.

6. Tweak the document to use appropriate stylistic elements.

Now that the document works structurally and is able to properly display both large and small result sets, start adding the appropriate style elements noted in Step 2. It is a good idea to check your progress frequently, because errors placed within the foreach loop can drastically affect output and can make the document more difficult to diagnose later.

7. Make a small code change to ensure that Google only searches your site.

There isn't much point in giving people a standard Google search box on your site. Everyone knows where Google is already, and what good would it do you if someone did a search and the number one result was one of your competitors!? Making a small change to the code will ensure that only results from your domain are returned.
```
 'q'=>" site:preinheimer.com ". $searchQuery, 
```
Putting that in place of the previous query line will instruct Google to return search results only from the target domain. Note a few things, however. First, you can only specify one site at a time (a workaround is presented later in this chapter). Second, notice that there is no space between the colon and the beginning of the domain name; that is important. Finally, notice that there is a space after the domain name and before the query. I performed this addition to the query here, rather than elsewhere, so that the addition would be invisible to the end user. Their original query will still be displayed.

8. Reexamine the original page for nonstatic content.

If the source document was a static document, your work would be done. However, this is rarely the case with modern sites. Check the source document for nonstatic content that will need to be added to the search page (dynamic sidebars, footers, and so on), and integrate those elements into the search page, replacing their static counterparts. Alternatively, save the code used to present the search results as its own file, and include it in the document at the appropriate location with whatever means the site provides.
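
Two of the steps above, restricting the query to one site and echoing markup inside the foreach loop, can be sketched roughly as follows. This is only an illustration: the function names restrictToSite() and renderResults() are hypothetical, the list markup stands in for whatever your template needs, and the sketch assumes each result element carries URL, title, and snippet keys. It escapes every field conservatively; if the API already returns markup in the title or snippet, you would relax that.

```php
<?php
// Hypothetical helper: prefixes the user's query with a site: restriction.
// The domain is a parameter here rather than hard-coded.
function restrictToSite($domain, $searchQuery)
{
    // No space between "site:" and the domain; one space before the query.
    return 'site:' . $domain . ' ' . trim($searchQuery);
}

// Hypothetical helper: builds the markup for a result set inside a foreach
// loop. Assumes each element has 'URL', 'title', and 'snippet' keys.
function renderResults(array $resultElements)
{
    // An empty result set must still produce structurally valid output.
    if (count($resultElements) === 0) {
        return "<p>No results found.</p>\n";
    }

    $html = "<ul class=\"results\">\n";
    foreach ($resultElements as $item) {
        $url     = htmlspecialchars($item['URL'], ENT_QUOTES);
        $title   = htmlspecialchars($item['title'], ENT_QUOTES);
        $snippet = htmlspecialchars($item['snippet'], ENT_QUOTES);
        $html .= "  <li><a href=\"$url\">$title</a><br />$snippet</li>\n";
    }
    return $html . "</ul>\n";
}
```

Because renderResults() returns a string rather than echoing directly, the same output can be dropped into whichever structural element of the template is meant to hold the results.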

By following this series of steps, I was able to turn the previous simplistic search page into an integrated page, shown in Figure 6-3.

Figure 6-3

Caching Results

Google's limit of 1,000 queries per day might seem a little short, but much can be done to ensure you get the most out of your query limit. Many users of your search system will likely be searching for the same things, and by caching the results, you can minimize the number of queries performed and likely speed results to the end user. Examining frequently queried terms can also prove to be a valuable resource to your web development team. If the same items are sought time and time again, perhaps the site design should be modified to make the searched-for items easier to find.

This caching function is designed as a layer sitting between the calling function and the function that actually calls the API. This way the same saved results can be used across different pages, different templates, and so on. An easier (though more limited) method would be to rely on the uniqueness of the URL (since the search parameters are part of it) and just save the whole page to disk. Then, when a request comes in, check for an existing page. If it exists, serve it; if not, capture the output with output buffering and save a copy to disk for next time.
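
The simpler whole-page approach might look something like the sketch below. It is not part of the chapter's code: getCachedPage(), its parameters, and the cache directory layout are all assumptions made for illustration, and the maximum age is left to the caller.

```php
<?php
// Hypothetical helper: returns the page for $requestUri, serving a saved copy
// from $cacheDir when one exists and is younger than $maxAgeSeconds.
// $renderPage is any callable that echoes the full page.
function getCachedPage($cacheDir, $requestUri, $maxAgeSeconds, $renderPage)
{
    // The URL carries the search parameters, so its hash identifies the page.
    $cacheFile = $cacheDir . '/' . md5($requestUri) . '.html';

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    // No usable copy: capture everything the page echoes, then save it.
    ob_start();
    call_user_func($renderPage);
    $html = ob_get_clean();

    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}
```

A search page could then end with something like echo getCachedPage($dir, $_SERVER['REQUEST_URI'], 3600, 'renderSearchPage'), where renderSearchPage is whatever function prints the page. The limitation mentioned above still applies: the saved copy is tied to one URL and one template.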

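function getGoogleResults(&$client, $searchQuery, $start)
{
  // Cache key: a hash of the start offset and the search query
  $key = md5($start . $searchQuery);
  // Compare ages in seconds; subtracting DATETIME values directly does not
  // yield seconds. 84600 seconds is roughly 23.5 hours.
  $query = "SELECT * FROM 06_google_cache_meta WHERE `key` = '$key' AND
    ((UNIX_TIMESTAMP() - UNIX_TIMESTAMP(`time`)) < 84600)";
  $results = getAssoc($query);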

The function takes the $client object as a parameter, even though it isn't actually needed by the function itself. This is done so the object can be passed off to the runGoogleSearch function if required (I avoid global variables like the plague). First, $key is generated — this is the primary key within the database, simply a hash of the $start offset and the search query itself (this is discussed in greater detail shortly). A query is generated to check for cached data matching the current request. The getAssoc() function will return an associative array of the results. The full code listing for the getAssoc() function is available in Appendix A.

  if (count($results) > 0)
  {
    //echo "Using cached data";
    $result = array();
    $result['estimateIsExact'] = $results['estimateIsExact'];
    $result['estimatedTotalResultsCount'] = $results['estimatedTotalResultsCount'];
    $result['startIndex'] = $start + 1;
    // Escape the query before placing it in SQL, and order by index so the
    // cached results come back in their original order
    $safeQuery = mysql_escape_string($searchQuery);
    $searchResultQuery = "SELECT * FROM 06_google_cache WHERE `query` = '$safeQuery'
      AND `start` = '$start' ORDER BY `index`";
    $searchResults = getAssoc($searchResultQuery);
    $result['endIndex'] = $start + count($searchResults);
    $result['resultElements'] = $searchResults;
    return $result;
  }else

If the query returns results, cached data exists and is recent enough to be of use. A $result array is created and populated with information from the initial query. Then, a second query is performed to obtain the actual search results. Because the getAssoc() function returns an array (and the column names match the element names returned by the Google API), it may be added directly to the $result array. To save space in the database, startIndex and endIndex are not stored; some basic math recovers them from $start and the number of cached results.
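
As a standalone illustration of that arithmetic, the sketch below rebuilds an API-shaped array from a cached meta row and a list of cached result rows. The helper name buildResultFromCache() is hypothetical, and the rows are assumed to be in their original order already.

```php
<?php
// Hypothetical helper: rebuilds an API-shaped result array from cached data.
// $meta is the cached meta row; $rows are the cached result rows, in order.
function buildResultFromCache(array $meta, array $rows, $start)
{
    return array(
        'estimateIsExact'            => $meta['estimateIsExact'],
        'estimatedTotalResultsCount' => $meta['estimatedTotalResultsCount'],
        // Neither index is stored; both follow from the offset and row count.
        'startIndex'                 => $start + 1,
        'endIndex'                   => $start + count($rows),
        'resultElements'             => $rows,
    );
}
```

With a start offset of 10 and ten cached rows, for instance, this gives a startIndex of 11 and an endIndex of 20.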

  {
    //echo "Ran query against API";
    // $client is passed normally; call-time pass-by-reference (&$client)
    // is deprecated
    $result = runGoogleSearch($client, $searchQuery, $start);
    if ($client->fault)
    {
      return $result;
    } else {
      if ($client->getError())
      {
        return $result;

Because the database did not contain relevant cached items, the Google API will need to be called to obtain the requested search results. In the event of an error, return the generated $result object to the calling function. It can be dealt with there, just as it would be if you were not caching results.

      } else
      {
        $linkID = db_connect();
        $key = md5($start . $searchQuery);
        $query = mysql_escape_string($searchQuery);
        $insertQuery = "REPLACE INTO 06_google_cache_meta
          (`key`, `query`, `start`, `estimateIsExact`,
            `estimatedTotalResultsCount`, `time`)
          VALUES ('$key', '$query', '$start', '{$result['estimateIsExact']}',
            '{$result['estimatedTotalResultsCount']}', null)";
        insertQuery($insertQuery);

Here the results begin to be cached. First the meta-information about the query is cached, namely the query, the start index, and information on the estimated total result count. The key that is generated is used so that MySQL's handy REPLACE INTO syntax can be used, which will replace a row if it exists already, or just insert a new one if not (but it requires a primary key to run). The MD5 (Message-Digest Algorithm 5) hashing algorithm is handy to generate these keys because it is of predictable length.

        $queryResults = $result['resultElements'];
        $index = 0;
        // Cache whenever there is at least one result
        if (count($queryResults) > 0)
        {

The search result element is copied out of the $result element so it can be iterated through via the upcoming foreach() loop. An index is created so you can note which result in particular is being worked on, so results are printed in order when the cached copy is used. The check to ensure that the total result count is higher than 0 is needed to avoid errors with running a foreach() on an empty element.

          foreach($queryResults as $item)
          {
            $url = mysql_escape_string($item['URL']);
            $snippet = mysql_escape_string($item['snippet']);
            $title = mysql_escape_string($item['title']);
            $key = md5($start . $index . $query);

Because the data is being saved to the database, it must be escaped. It is always preferable to use the database-specific escape function because it will escape all required characters, rather than a few common ones. Again the md5() function is used to generate a unique key.

            $insertQuery = "REPLACE INTO 06_google_cache
              (`key`, `index`, `query`, `start`, `snippet`, `title`, `url`)
              VALUES
              ('$key', '$index', '$query', '$start', '$snippet', '$title',
              '$url')";
            replaceQuery($insertQuery, $linkID);
            $index++;
          }
        }
        return $result;
      }
    }
  }
}

Finally, the information is replaced into the database and the index value is incremented. The original $result object returned by the API call is returned to the calling function to be used.
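
As an alternative to escaping each value by hand, prepared statements let the database driver handle the data separately from the SQL. The sketch below is a swapped-in technique rather than the chapter's method: it assumes PDO with the SQLite driver (chosen only so the example runs without a MySQL server), and the table, column, and function names are simplified and hypothetical. It also shows the REPLACE INTO behavior described earlier, where a second write with the same key replaces the row instead of adding one.

```php
<?php
// Assumption: PDO with the SQLite driver, used only so the sketch runs
// without a MySQL server. Table and column names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE result_cache (
    cache_key TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    url       TEXT NOT NULL
)');

// Placeholders keep the data out of the SQL text, so no manual escaping.
$stmt = $db->prepare(
    'REPLACE INTO result_cache (cache_key, title, url) VALUES (:key, :title, :url)'
);

// Hypothetical helper: stores one search result under the given key.
function cacheResult(PDOStatement $stmt, $key, array $item)
{
    $stmt->execute(array(
        ':key'   => $key,
        ':title' => $item['title'],
        ':url'   => $item['URL'],
    ));
}

$key = md5('0 0 paul');
cacheResult($stmt, $key, array('title' => "Paul's page", 'URL' => 'http://example.com/'));
// Same key again: the existing row is replaced rather than duplicated.
cacheResult($stmt, $key, array('title' => "Paul's new page", 'URL' => 'http://example.com/'));

$rowCount = (int) $db->query('SELECT COUNT(*) FROM result_cache')->fetchColumn();
$title = $db->query('SELECT title FROM result_cache')->fetchColumn();
```

After both calls the table holds a single row, and the apostrophe in the title is stored intact without any escaping function.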

Note

The use of the md5() function here could conceivably cause some issues. Although the MD5 hashing algorithm itself is more than adequate for this purpose, the way its input is built allows collisions. A search for "paul" with a start index of 11 and a search for "1paul" with a start index of 1 both give the md5() function the identical input "11paul", so an identical key is returned. To resolve this, ensure all inputs are trim()'d before use, and add a space between the values in the md5() call.
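
The collision and the suggested fix can be checked with a few lines. The function names naiveKey() and separatedKey() are hypothetical, and the second function is one reading of the fix, not code from the chapter.

```php
<?php
// Key built the way the caching function does it: offset and query run together.
function naiveKey($start, $searchQuery)
{
    return md5($start . $searchQuery);
}

// Hypothetical fix following the note: trim the input and add a separator.
function separatedKey($start, $searchQuery)
{
    return md5($start . ' ' . trim($searchQuery));
}

// Both calls hash the string "11paul", so the keys are identical.
$collides = naiveKey(11, 'paul') === naiveKey(1, '1paul');
// With the separator the inputs are "11 paul" and "1 1paul", which differ.
$distinct = separatedKey(11, 'paul') !== separatedKey(1, '1paul');

var_dump($collides, $distinct); // prints bool(true) twice
```

The per-result key, which concatenates the start offset, the index, and the query, runs the same risk and would need a separator between all three values.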

The 06_google_cache table:

CREATE TABLE `06_google_cache` (
  `key` varchar(32) NOT NULL default '',
  `index` int(11) NOT NULL default '0',
  `query` varchar(255) NOT NULL default '',
  `start` int(11) NOT NULL default '0',
  `snippet` text NOT NULL,
  `title` varchar(75) NOT NULL default '',
  `url` varchar(255) NOT NULL default '',
  PRIMARY KEY (`key`)
) TYPE=MyISAM;

The column sizes used are relatively arbitrary, with the exception of the key column: md5() always returns a 32-character hexadecimal string, hence varchar(32).

The 06_google_cache_meta table:

CREATE TABLE `06_google_cache_meta` (
  `key` varchar(32) NOT NULL default '',
  `query` varchar(255) NOT NULL default '',
  `start` int(11) NOT NULL default '0',
  `estimateIsExact` set('1','') NOT NULL default '',
  `estimatedTotalResultsCount` int(11) NOT NULL default '0',
  `time` timestamp(14) NOT NULL,
  PRIMARY KEY (`key`)
) TYPE=MyISAM;