Many developers will want to use PHP to create some type of search feature for their Web site. Yes, you can use the search engine provided by the Web server, but only if you have access to the server. When working with a hosted setup, you might not have such access, and writing a good search engine application of your own can prove difficult. In addition, the search engines provided with many Web servers are inferior to the Google search engine. The PHP technique described here lets you create a site search engine that meets many needs. The following sections show two forms of this example: a simple top-10 results technique and a technique that extends the number of results to however many Google can provide.
Note: In general, you'll find this solution effective only if you have a site with moderate traffic. Remember that the Google license agreement limits you to 1,000 searches per day. It's possible to overcome this problem partially by using caching techniques (see the "Writing a PHP Application with Database Support" section for details). You can also request more searches per day from Google; in some cases, Google does provide a waiver for deserving individuals. Make sure you understand the ramifications of using Google Web Services before you devote a lot of time and effort to an application that will experience problems.
The example in this section assumes that you want to limit the search information to your site. It also presents the output as a table. Some developers prefer to use a tabular format for some types of search results because it lets the viewer scan the information quickly. Listing 7.1 shows a typical PHP site search. You'll find the complete source code for this example in the \Chapter 07\SimpleSearch folder of the source code located on the Sybex Web site.
<?php
// Include the NuSOAP class.
include("nusoap.php");

// Get the input data as needed.
// Search term (empty if the form hasn't been submitted yet).
if (!isset($_REQUEST["txtSearchTerm"]))
    $SearchTerm = "";
else
    $SearchTerm = $_REQUEST["txtSearchTerm"];
?>
<html>
<head>
<title></title>
</head>
<body>
<form action="Your Web Site" id="SubmissionForm" method="get" name="SubmissionForm">
... Other Form Related Code ...
</form>
<?php
if (isset($_REQUEST["Submit"]) && $_REQUEST["Submit"] == "Submit")
{
    // Create an instance of the SOAP client. This client must point
    // to the Google search site. Don't attempt to create a client
    // that uses a proxy because it won't work with Google.
    // (Newer NuSOAP releases name this class nusoap_client.)
    $soapclient = new soapclient("http://api.google.com/search/beta2");

    // Uncomment the next line to see debug messages
    // $soapclient->debug_flag = 1;

    // Set up an array with the parameters used for the call. Make
    // sure you include your license key or the call will definitely
    // fail.
    $params = array(
        'key'        => 'Your License Key',
        'q'          => "{$SearchTerm} DataCon Services site:www.mwt.net",
        'start'      => 1,
        'maxResults' => 10,
        'filter'     => false,
        'restrict'   => '',
        'safeSearch' => false,
        'lr'         => '',
        'ie'         => '',
        'oe'         => '');

    // Invoke the method. Include the method name, the list of
    // parameters, the namespace, and the SOAP action.
    $result = $soapclient->call("doGoogleSearch", $params,
        "urn:GoogleSearch", "urn:GoogleSearch");

    // Display the total results.
    echo "<p>Total Estimated Results: ";
    echo $result["estimatedTotalResultsCount"];
    echo "</p>";

    // Get the result array.
    $Results = $result["resultElements"];

    // Display the actual results.
    echo "<p>Actual Results: ";
    echo count($Results);
    echo "</p>";
?>
<table border="1" cellpadding="5" width="90%">
<tr>
<th width="200pt">Title</th>
<th width="100pt">URL</th>
<th width="50pt">Size</th>
<th>Snippet</th>
</tr>
<?php
    // Process each result array element.
    for ($Counter = 0; $Counter < count($Results); $Counter++)
    {
        $IndSite = $Results[$Counter];
        echo "<tr><td>";
        echo $IndSite["title"];
        // Quote and escape the URL so it survives inside the href attribute.
        echo "</td><td><a href=\"";
        echo htmlspecialchars($IndSite["URL"]);
        echo "\">Go To Site</a></td><td>";
        echo $IndSite["cachedSize"];
        echo "</td><td>";
        echo $IndSite["snippet"];
        echo "</td></tr>";
    }
?>
</table>
<?php
}
?>
</body>
</html>
The code begins with a simple include() function call. The PHP examples in this chapter rely on NuSOAP, so I've included the required PHP file in the example. If you use another SOAP product, you'll need to include whatever support it requires in your code.
Once the code imports the required SOAP support, it saves any data passed as parameters and creates an input form. The input form lets the user type a search phrase. The current setup for this form also lets the user request a search without any search phrase. In this case, the example returns a list of all pages on the site, which you could cache and use as a site map.
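Reading the optional search term directly from $_REQUEST produces an "undefined index" notice on the first visit, before the form has been submitted. A slightly more defensive version is sketched below; the helper name getSearchTerm() is hypothetical, and the sketch assumes you want a missing or whitespace-only field to behave like an empty one (which, as described above, searches the whole site).

```php
<?php
// Hypothetical helper: read an optional request field without triggering
// an "undefined index" notice when the form hasn't been submitted yet.
function getSearchTerm(array $request, string $field = 'txtSearchTerm'): string
{
    // A missing field and a blank field both become an empty search term.
    return isset($request[$field]) ? trim((string)$request[$field]) : '';
}

// Usage: $SearchTerm = getSearchTerm($_REQUEST);
```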
Creating the SOAP client comes next. The code performs three tasks. First, it creates the actual client. Notice that you must point the client at the Google search URL rather than at the Web Services Description Language (WSDL) file. Pointing at the WSDL file and using a proxy to make the method calls results in an error. (Other Web services do allow you to use a proxy, and some developers might prefer this practice, which is why I specifically mention the problem here.) You can find an example of the failed WSDL setup in the \Chapter 07\NoWSDL folder of the source code on the Sybex Web site.
Second, the code creates an array of arguments. You must include all of the arguments in the array, even if you don't use a particular argument; the SOAP parser reports an error if you don't. Also, notice the order of the arguments in the example. In general, the order shouldn't matter, but you'll receive fewer errors if you use the order shown. Finally, notice that the q argument combines the $SearchTerm provided by the user, a special identifier (DataCon Services), and the site parameter.
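Because every argument must be present, it can help to build the array in one place so none is accidentally dropped. The sketch below is a hypothetical helper, buildSearchParams(), that returns all ten arguments in the order used in Listing 7.1; the default values simply mirror the listing.

```php
<?php
// Hypothetical helper: build the full argument list for doGoogleSearch.
// Every argument is present (the call fails if one is missing), and the
// keys follow the order used in Listing 7.1.
function buildSearchParams(string $licenseKey, string $query, int $start = 1): array
{
    return array(
        'key'        => $licenseKey,
        'q'          => $query,
        'start'      => $start,   // an integer, not a numeric string
        'maxResults' => 10,       // Google returns at most 10 per call
        'filter'     => false,
        'restrict'   => '',
        'safeSearch' => false,
        'lr'         => '',
        'ie'         => '',
        'oe'         => '',
    );
}

$params = buildSearchParams('Your License Key',
    'widgets DataCon Services site:www.mwt.net');
```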
I have a problem searching my site with Google because another company hosts it and I haven't obtained a unique domain name. Consequently, my main page is at http://www.mwt.net/~jmueller. Many other small businesses and individuals find themselves in the same situation. It's important to remember that a site search accepts only the domain name for the site argument. This limitation means you can't use a site such as http://www.mwt.net/~jmueller. The /~jmueller portion of the URL is unacceptable. You can get around this problem by specifying your name, company name, or other unique key term as part of the title on every page. By combining the domain with the keyword, you can create a site search type that Google doesn't support directly.
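The keyword-plus-domain workaround can be expressed as a small query builder. In this sketch (the function name buildSiteQuery() is hypothetical), parse_url() keeps only the host name, so a path such as /~jmueller is dropped before the site: restriction is added; if the caller passes a bare host name, the sketch assumes it is already usable as-is.

```php
<?php
// Hypothetical helper: combine the user's term, a unique keyword that appears
// in every page title, and a site: restriction. The site: operator takes
// only a host name, so any path such as /~jmueller is discarded.
function buildSiteQuery(string $term, string $keyword, string $siteUrl): string
{
    $host = parse_url($siteUrl, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        // Assume the caller passed a bare host name like "www.mwt.net".
        $host = $siteUrl;
    }
    // trim() removes the leading space when the search term is empty.
    return trim("{$term} {$keyword} site:{$host}");
}

echo buildSiteQuery('widgets', 'DataCon Services', 'http://www.mwt.net/~jmueller');
// widgets DataCon Services site:www.mwt.net
```

With an empty search term the same call yields "DataCon Services site:www.mwt.net", which matches every page carrying the keyword.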
Third, the code makes the call to doGoogleSearch(). Because the example uses the SOAP calling method shown, you must provide, at a minimum, the method name, the list of parameters, the namespace, and the SOAP action, or the call will fail. On return from the call, $result contains the search results, along with statistical information that Google provides.
Next, the application shows the estimated result count provided by Google and the actual number of array elements returned. A number of test searches show that the estimatedTotalResultsCount field is unreliable. Consequently, always use the count() function to determine the actual number of array elements.
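One way to keep the two figures apart is to compute both up front and use only the actual count for loops and paging. The sketch below is hypothetical (summarizeResults() is not part of NuSOAP) and assumes $result is the associative array that the call returns in Listing 7.1, with the estimatedTotalResultsCount and resultElements keys.

```php
<?php
// Hypothetical helper: report both figures, but base any paging or loop
// logic on the number of elements actually returned.
function summarizeResults(array $result): array
{
    $elements = isset($result['resultElements']) && is_array($result['resultElements'])
        ? $result['resultElements'] : array();
    return array(
        'estimated' => isset($result['estimatedTotalResultsCount'])
            ? (int)$result['estimatedTotalResultsCount'] : 0,
        'actual'    => count($elements),
    );
}
```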
The final piece of code displays the results on screen. The return values appear in the resultElements array, and it usually helps to copy each element into a separate variable, as shown, to make the information easier to handle. This example uses only the site title, URL, page size, and a descriptive snippet. Most users will get everything they need from these items. Figure 7.1 shows typical output from this example.
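The row-building loop can also be written as a function that returns the HTML, which makes it easier to test. In this sketch (renderResultRows() is a hypothetical name), the URL and size are escaped and the href value is quoted; the title and snippet are output unescaped on the assumption that Google returns them as HTML fragments, since the bold search terms in the snippet depend on that markup.

```php
<?php
// Hypothetical helper: build the table rows from resultElements.
// Title and snippet are assumed to be HTML from Google and are output as-is;
// the URL is escaped and quoted before it goes into the href attribute.
function renderResultRows(array $results): string
{
    $html = '';
    foreach ($results as $site) {
        $url = htmlspecialchars($site['URL'], ENT_QUOTES);
        $html .= '<tr><td>' . $site['title'] . '</td>'
               . '<td><a href="' . $url . '">Go To Site</a></td>'
               . '<td>' . htmlspecialchars($site['cachedSize']) . '</td>'
               . '<td>' . $site['snippet'] . '</td></tr>' . "\n";
    }
    return $html;
}
```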
Notice that the snippet is short and normally includes the search terms in bold type. Generally, the snippet contains the kind of information you want. However, you might find that you have to include your own hints as part of a database or perform specific background searches to create snippets that are more informative.
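If you do maintain your own hints, the display code needs a way to choose between them and Google's snippet. The sketch below assumes a hypothetical chooseSnippet() function and uses a plain array keyed by URL to stand in for the database table you would actually query; it prefers the hand-written description and falls back to Google's snippet.

```php
<?php
// Hypothetical sketch: prefer a hand-written description (the $hints array
// stands in for a database lookup) and fall back to Google's snippet.
function chooseSnippet(array $site, array $hints): string
{
    $url = $site['URL'];
    if (isset($hints[$url]) && $hints[$url] !== '') {
        // Local hints are plain text, so escape them for HTML output.
        return htmlspecialchars($hints[$url]);
    }
    return $site['snippet'];
}
```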
The multiple-result technique is about the same as the simple search technique, except that you now have to keep track of pages. This technique introduces a few issues you need to consider when working with PHP and Google Web Services. You'll find the complete source code for this example in the \Chapter 07\MultiResult folder of the source code located on the Sybex Web site.
Tip: It often helps to run a simple test of your SOAP setup before you add too many features to your application. The \Chapter 07\Test folder of the example source code on the Sybex Web site contains a test application you can try. This example displays the raw results from a site search.
Before you can track anything, you need some means of saving the starting result number from page to page. The best way to do this is to provide the user with a textbox that contains the current result number. This method lets you display the result number (so the user knows something has changed) and use the simple tracking method shown in the following code.
// Create the starting page.
if (!isset($_REQUEST["pvtStartIndex"]))
    $StartIndex = 1;
else if ($_REQUEST["Submit"] == "Forward")
    $StartIndex = $_REQUEST["pvtStartIndex"] + 10;
else if (($_REQUEST["Submit"] == "Back") && $_REQUEST["pvtStartIndex"] > 10)
    $StartIndex = $_REQUEST["pvtStartIndex"] - 10;
else
    $StartIndex = $_REQUEST["pvtStartIndex"];
You need to increment or decrement the starting index variable, $StartIndex, by the number of results that Google returns with each request. Normally, I request 10, the maximum that Google allows, to keep the number of requests to a minimum. If you find that you can't display 10 results at a time, you'll need to change the starting index by a different amount.
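Because the step size has to match maxResults, it can be worth pulling the tracking logic into a function that takes the page size as a parameter. The sketch below (nextStartIndex() is a hypothetical name) reproduces the behavior of the code above: start at 1 on the first visit, step forward by one page, step back only when that stays at or above result 1, and otherwise keep the submitted index.

```php
<?php
// Hypothetical helper: compute the next start index from the submitted
// index and the button the user clicked. $pageSize must match maxResults.
function nextStartIndex(?string $current, ?string $button, int $pageSize = 10): int
{
    if ($current === null || $current === '') {
        return 1;                       // first visit: start at result 1
    }
    $index = max(1, (int)$current);
    if ($button === 'Forward') {
        return $index + $pageSize;
    }
    if ($button === 'Back' && $index > $pageSize) {
        return $index - $pageSize;      // never step back past result 1
    }
    return $index;                      // new search or unknown button
}

// Usage:
// $StartIndex = nextStartIndex($_REQUEST['pvtStartIndex'] ?? null,
//                              $_REQUEST['Submit'] ?? null);
```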
As part of the code change, you also need to tell Google Web Services to use the new starting index. This means modifying the $params array shown in Listing 7.1 to accept the $StartIndex variable as input. You must perform a cast of the variable as shown here to ensure Google Web Services will understand the request.
'start' => (integer)$StartIndex,
An odd thing happens when you don't perform the cast. Google doesn't report an error; it simply tells you there aren't any results. Look through the WSDL file and you'll notice that the start element has to contain a number, but the lack of a fault indication makes this a particularly difficult error to find.
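The reason the cast matters is that everything PHP reads from a request arrives as a string, even when it looks numeric. Presumably NuSOAP, working without the WSDL file, describes each parameter according to the PHP type of its value, so an uncast index goes out as a string. The short sketch below only demonstrates the type difference; it uses (int), which is the same cast as (integer).

```php
<?php
// Values read from $_REQUEST are always strings, even when they look numeric.
$fromForm = '11';             // what $_REQUEST["pvtStartIndex"] holds
$asSent   = (int)$fromForm;   // the cast applied in the $params array

echo gettype($fromForm), "\n";   // string
echo gettype($asSent), "\n";     // integer
```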
All that you need at this point is two additional buttons (Forward and Back) and a textbox for display. Figure 7.2 shows typical output from this application.
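The additional controls could be generated as sketched below. The function name renderPagingControls() is hypothetical; the field name pvtStartIndex and the button name Submit with the values Forward and Back match the tracking code shown earlier, while the markup itself is an assumption. The textbox is read-only so the user sees the current index but changes it only through the buttons.

```php
<?php
// Hypothetical helper: emit the paging controls. Each submit button sends
// its value in the Submit field, which the tracking code compares against
// "Forward" and "Back".
function renderPagingControls(int $startIndex): string
{
    return '<p>Starting result: '
         . '<input type="text" name="pvtStartIndex" value="' . $startIndex . '" readonly> '
         . '<input type="submit" name="Submit" value="Back"> '
         . '<input type="submit" name="Submit" value="Forward"></p>';
}

// Usage, inside SubmissionForm: echo renderPagingControls($StartIndex);
```

If you reuse the display code from Listing 7.1, its check for a Submit value of "Submit" would also need to accept "Forward" and "Back", or paging requests would display nothing.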