Although the
Create a development copy.
View the source of the target page, then copy and paste it into your development program. Look through it for relative links (try searching for ="/ or =" ), and change them to absolute locations. Links to CSS (Cascading Style Sheets) files are particularly important to fix.
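As a rough illustration of the link fix-up, the sketch below rewrites root-relative attribute values (those beginning with a single slash) to absolute URLs. The function name absolutizeRootRelativeLinks and the example.com address are assumptions made for this example, not part of the original code, and links relative to the current directory (such as ="css/site.css") would still need to be fixed by hand.

```php
<?php
// Hypothetical helper: prefix root-relative href/src/action values
// with the site's base URL so the development copy loads its assets.
function absolutizeRootRelativeLinks($html, $baseUrl)
{
    $baseUrl = rtrim($baseUrl, '/');

    // Match attr="/path" or attr='/path', but leave protocol-relative
    // values such as src="//cdn.example.net/a.js" untouched.
    $pattern = '~\b(href|src|action)=(["\'])/(?!/)~i';

    return preg_replace_callback($pattern, function ($m) use ($baseUrl) {
        return $m[1] . '=' . $m[2] . $baseUrl . '/';
    }, $html);
}

// Usage: the stylesheet link now points at the live site.
$page = '<link rel="stylesheet" href="/css/site.css">';
echo absolutizeRootRelativeLinks($page, 'http://www.example.com/'), "\n";
```

A regular expression is a blunt tool for HTML, so it is still worth comparing the result against the original page in a browser.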
Examine the development copy.
Take a look at the development copy in your web browser; it should look identical to the original. If not, check again for relative links. It isn't imperative that everything is perfect; as long as the overall flow of the page remains the same, you can continue. Take a look at the major block elements of the page (table
Remove unneeded content from development copy.
Unless the target page was rather empty to begin with, the search results will be displacing content previously on the page. Remove the unneeded content from the page; be careful to avoid removing structural data. At this point, the page layout may break. If this happens, one of two things has occurred. First, it is possible that the height of the content area was an integral part of the design, and removing it caused other elements to be
Insert the search code from the previous example into the development copy.
Take the search code from the previous example and insert it into the development copy. Place the code in the hole created by removing content. If the content contained a few essential structural elements, try splitting the results up among them, with the search result information (keywords, result count, and so on) in one, the results in the next, and the link for the next page in the last. If the content contained many structural elements, echo the appropriate HTML code within the foreach loop that iterates through the results.
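To make the last point concrete, here is a minimal sketch of building structural markup inside the foreach loop. The table-row wrapper and the function name renderResultRows are assumptions standing in for whatever markup the target page actually uses; the resultElements, URL, title, and snippet keys follow the result array used elsewhere in this chapter. The title and snippet are assumed to already contain display-ready HTML, so only the URL is escaped here.

```php
<?php
// Hypothetical helper: wrap each search result in the page's own
// structural markup (here, one table row per result).
function renderResultRows(array $result)
{
    $elements = isset($result['resultElements']) ? $result['resultElements'] : array();

    // Keep the page structurally intact when there is nothing to show.
    if (count($elements) === 0) {
        return "<tr><td class=\"result\">No results found.</td></tr>\n";
    }

    $html = '';
    foreach ($elements as $item) {
        $url = htmlspecialchars($item['URL'], ENT_QUOTES);
        $html .= "<tr><td class=\"result\">"
               . "<a href=\"$url\">{$item['title']}</a><br>"
               . "{$item['snippet']}</td></tr>\n";
    }
    return $html;
}
```

Returning the markup as a string, rather than echoing it directly, makes it easier to drop the rows into different parts of the template.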
View the resultant page in your web browser.
Take a look at the page in your web browser. It should look relatively in tune with the original. Run a few searches to see how things work. Make sure you run at least one search designed to have only a few or no results to ensure the page still works structurally.
Tweak the document to use appropriate stylistic elements.
Now that the document works structurally and is able to properly display both large and small result sets, start adding the appropriate style elements noted in Step 2. It is a good idea to check your progress frequently, because errors placed within the foreach loop can drastically affect output and can make the document more difficult to diagnose later.
Make a small code change to ensure that Google only searches your site.
There isn't much point in giving people a standard Google search box on your site. Everyone
'q' => "site:preinheimer.com " . $searchQuery,
Putting that in place of the previous query line will instruct Google to only return search results from the target domain. Note a few things, however. First, you can only specify one site at a time (a workaround is presented later in this chapter). Second, notice that there is no space between the
Reexamine the original page for nonstatic content.
If the source document were a static document, your work would be done. However, this is rarely the case with modern sites. Check the source document for nonstatic content that will need to be added to the search page (dynamic sidebars, footers, and so on), and integrate those elements into the search page, replacing their static counterparts. Alternatively, save the code used to present the search results as its own file, and include it in the document at the appropriate location by whatever means the site provides.
By following this series of steps, I was able to
Figure 6-3
Google's limit of 1,000 queries per day might seem a little short, but much can be done to ensure you get the most out of your query limit. Many users of your search system will likely be searching for the same things, and by caching the results, you can minimize the number of queries performed and likely speed results to the end user. Examining frequently queried terms can also
This caching function is designed as a layer sitting between the calling code and the function that actually calls the API. This way the same saved results can be used across different pages, different templates, and so on. An easier (though more limited) method would be to rely on the uniqueness of the URL (since it carries the search parameters) and just save the whole page to disk. Then, when a request comes in, check for an existing page. If it exists, serve it; if not, capture the output with output buffering and save a copy to disk for next time.
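The simpler whole-page approach might look something like the following sketch. The function name servePageWithCache, the cache directory, and the maximum age are all assumptions chosen for illustration; the page itself is produced by whatever callback you pass in.

```php
<?php
// Hypothetical helper: cache an entire rendered page on disk,
// keyed on the request URL (which carries the search parameters).
function servePageWithCache($cacheDir, $requestUri, $maxAgeSeconds, $renderPage)
{
    $cacheFile = $cacheDir . '/' . md5($requestUri) . '.html';

    // Serve the saved copy if it exists and is still fresh enough.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    // Otherwise build the page, capturing everything it echoes.
    ob_start();
    call_user_func($renderPage);
    $page = ob_get_clean();

    // Save a copy for next time.
    file_put_contents($cacheFile, $page, LOCK_EX);
    return $page;
}

// Usage sketch (search.php stands in for your own search page):
// echo servePageWithCache('/tmp/cache', $_SERVER['REQUEST_URI'], 3600,
//     function () { include 'search.php'; });
```

Because the key is derived from the full URL, two pages that show the same results through different templates would each get their own cached copy, which is the limitation mentioned above.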
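function getGoogleResults(&$client, $searchQuery, $start) {
    // Cache key: hash of the start index and the query text
    $key = md5($start . $searchQuery);
    // Accept only cache entries newer than 84600 seconds (about 23.5 hours).
    // UNIX_TIMESTAMP() is used so that the age is computed in seconds.
    $query = "SELECT * FROM 06_google_cache_meta WHERE `key` = '$key'
        AND (UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(`time`)) < 84600";
    $results = getAssoc($query);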
The function takes the $client object as a parameter, even though it isn't actually needed by the function itself. This is done so the object can be passed off to the runGoogleSearch function if required (I avoid global
    if (count($results) > 0) {
        //echo "Using cached data";
        $result = array();
        $result['estimateIsExact'] = $results['estimateIsExact'];
        $result['estimatedTotalResultsCount'] = $results['estimatedTotalResultsCount'];
        $result['startIndex'] = $start + 1;
        // Escape the user-supplied query before placing it in the SQL string
        $safeQuery = mysql_escape_string($searchQuery);
        $searchResultQuery = "SELECT * FROM 06_google_cache
            WHERE `query` = '$safeQuery' AND `start` = '$start'";
        $searchResults = getAssoc($searchResultQuery);
        $result['endIndex'] = $start + count($searchResults);
        $result['resultElements'] = $searchResults;
        return $result;
    } else
If results are found for the query, cached data exists, and it is recent enough to be of use. A $request object is created and
    {
        //echo "Ran query against API";
        // No call-time & here: it is a fatal error in PHP 5.4 and later
        $result = runGoogleSearch($client, $searchQuery, $start);
        if ($client->fault) {
            return $result;
        } else {
            if ($client->getError()) {
                return $result;
Because the database did not contain relevant cached items, the Google API will need to be called to obtain the
            } else {
                $linkID = db_connect();
                $key = md5($start . $searchQuery);
                $query = mysql_escape_string($searchQuery);
                $insertQuery = "REPLACE INTO 06_google_cache_meta
                    (`key`, `query`, `start`, `estimateIsExact`,
                     `estimatedTotalResultsCount`, `time`)
                    VALUES ('$key', '$query', '$start', '{$result['estimateIsExact']}',
                     '{$result['estimatedTotalResultsCount']}', null)";
                insertQuery($insertQuery);
Here the results begin to be cached. First the
                $queryResults = $result['resultElements'];
                $index = 0;
                if (count($queryResults) > 1) {
The search result element is
                    foreach ($queryResults as $item) {
                        $url = mysql_escape_string($item['URL']);
                        $snippet = mysql_escape_string($item['snippet']);
                        $title = mysql_escape_string($item['title']);
                        $key = md5($start . $index . $query);
Because the data is being saved to the database, it must be escaped. It is always preferable to use the
                        $insertQuery = "REPLACE INTO 06_google_cache
                            (`key`, `index`, `query`, `start`, `snippet`, `title`, `url`)
                            VALUES ('$key', '$index', '$query', '$start',
                             '$snippet', '$title', '$url')";
                        replaceQuery($insertQuery, $linkID);
                        $index++;
                    }
                }
                return $result;
            }
        }
    }
}
Finally, the information is
Note
The use of the md5() function here could conceivably cause some issues. Although the MD5 hashing algorithm itself is more than secure enough for this purpose, the inputs used are subject to collision. A search for "paul" with a start index of 11, and a search for "1paul" with a start index of 1 would both provide the md5() function with identical input, and as such an identical key would be returned. To resolve this, ensure all inputs are trim()'d before entry, and add a space into the md5() call.
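The collision described in the note, and one way of applying the suggested fix, can be sketched as follows. The function names naiveCacheKey and separatedCacheKey are made up for this illustration; the first mirrors the concatenation used in getGoogleResults, and the second is only one possible reading of "add a space into the md5() call."

```php
<?php
// Key built as in the caching function: start index and query
// are concatenated directly, so "11" . "paul" equals "1" . "1paul".
function naiveCacheKey($start, $searchQuery)
{
    return md5($start . $searchQuery);
}

// Variant following the note: trim both inputs and separate them
// with a space so the boundary between them is unambiguous.
function separatedCacheKey($start, $searchQuery)
{
    return md5(trim((string) $start) . ' ' . trim($searchQuery));
}

var_dump(naiveCacheKey(11, 'paul') === naiveCacheKey(1, '1paul'));         // bool(true)
var_dump(separatedCacheKey(11, 'paul') === separatedCacheKey(1, '1paul')); // bool(false)
```

The per-result key, md5($start . $index . $query), concatenates its inputs in the same way, so the same kind of separator would presumably be needed there as well.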
The 06_google_cache table:
CREATE TABLE `06_google_cache` (
  `key` varchar(32) NOT NULL default '',
  `index` int(11) NOT NULL default '0',
  `query` varchar(255) NOT NULL default '',
  `start` int(11) NOT NULL default '0',
  `snippet` text NOT NULL,
  `title` varchar(75) NOT NULL default '',
  `url` varchar(255) NOT NULL default '',
  PRIMARY KEY (`key`)
) TYPE=MyISAM;
The column sizes are relatively arbitrary, with the exception of the key column: md5() always returns a 32-character hexadecimal string, so varchar(32) fits it exactly.
The 06_google_cache_meta table:
CREATE TABLE `06_google_cache_meta` (
  `key` varchar(32) NOT NULL default '',
  `query` varchar(255) NOT NULL default '',
  `start` int(11) NOT NULL default '0',
  `estimateIsExact` set('1','') NOT NULL default '',
  `estimatedTotalResultsCount` int(11) NOT NULL default '0',
  `time` timestamp(14) NOT NULL,
  PRIMARY KEY (`key`)
) TYPE=MyISAM;