Considering Performance Issues | Mining Google Web Services: Building Applications with the Google API

Some developers confuse the concept of performance with the idea of speed. An application that performs well (has good performance) isn't necessarily fast. Performance is a measure of how well an application accomplishes the task that you set before it. Speed is only one aspect of performance. You also have to consider factors such as resource usage and user access speed (efficiency). In addition, you often have to consider the effect of search repeatability and network bandwidth availability (reliability). The following sections discuss performance concerns for Google applications.

Addressing Speed Concerns

Speed measures how fast an application can perform a task. Many developers concentrate on this factor when developing an application because it's relatively easy to quantify. You can easily demonstrate that a particular coding change or technique improvement provides a corresponding increase in speed. Making changes that result in a speed increase is important when using a Web service such as Google Web Services because your application incurs a performance penalty when it requests the data.

Quantifying speed is relatively easy for most applications because the developer has control over the environment. On the other hand, getting, proving, and quantifying a speed increase with Google Web Services can prove elusive. For example, your application will always slow during peak activity periods on Google; you can't control this factor, and it always affects the overall performance of your application. Consequently, long-term speed measurement is essential when working with Google Web Services. You need to consider whether a change actually provides a performance boost or Google Web Services just happened to provide faster results during the initial test. In addition, make one change at a time because you can't accurately measure the effects of multiple changes.
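One way to make long-term measurement repeatable is to time the same request many times and compare medians. Take a baseline before a change and another afterward, ideally at similar times of day. The C# sketch below uses a hypothetical LatencyProbe helper. The `request` delegate you pass in stands for whatever Google Web Services call your application makes. It uses the median rather than the mean so that a single slow response during a busy period doesn't dominate the result.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: times the same request repeatedly so that one
// unusually fast or slow Google response doesn't skew the measurement.
public static class LatencyProbe
{
    // Runs 'request' 'runs' times and returns the median elapsed milliseconds.
    public static double MedianMilliseconds(Action request, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));
        var samples = new List<double>(runs);
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            request();
            watch.Stop();
            samples.Add(watch.Elapsed.TotalMilliseconds);
        }
        return Median(samples);
    }

    // Median of a non-empty sequence; averages the two middle values for even counts.
    public static double Median(IEnumerable<double> values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        if (sorted.Length == 0) throw new ArgumentException("No samples.", nameof(values));
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Comparing the median from before a change with the median from after it, one change at a time, keeps the comparison fair.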

It's also important to consider the state of Google Web Services at the time of your test. Keep track of changes by visiting the developer forum at http://groups.google.com/groups?group=google.public.web-apis. This newsgroup helps you stay up-to-date on changes Google is making that could affect your application. (Google will also send you a newsletter describing probable changes to Google Web Services.)

Note

This book doesn't even begin to address local application speed issues because the language you choose, application environment, and platform affect the speed of your application. Look in the language-specific chapters of the book for suggestions on third-party resources you can use for that language. Writing code that executes quickly takes time, effort, and planning, so make sure you begin with a good application specification.

Initially, you might get the idea that you have to perform all kinds of weird programming to gain much of a speed increase. However, you can group all Google Web Services application speed improvements into five main areas.

Use the Fewest Possible Calls: A combination of optimized searches, explicit input, and data ordering usually reduces the number of calls your application has to make to Google. Every call costs time, so even eliminating a single round trip helps. It's important to remember that Google returns data in 10-record chunks, so you should optimize your searches around this number.

Handle Only the Required Data: Some developers parse and store every piece of data that a Web service has to offer with the idea that they might need the information later. In general, you only need to save the site title, URL, and snippet or summary to get good results with Google. In some cases, you might want to save additional information, such as the estimated number of results or the cached size of the page. The important element to consider is that every data manipulation costs time and resources, so you want to work with just the data you need to make your application work well.

Use Offline Storage Effectively: Don't assume that every application has to use offline storage or that you need to store everything offline. An application used to perform research might not benefit from offline storage because most requests are unique and data input is unlikely to repeat. In addition, an application that requires a source of constant updates might not benefit much from complete offline storage; you might want to store just the essentials for locating the data, such as a URL. (See the "Using Offline Storage Effectively" section of this chapter for details.)

Improve the Local Application Speed: It's easy to become fixated on the speed of Google Web Services communication and forget local application requirements. The local application has a large effect on application speed. Consider items such as how fast the application makes a request. Because Google relies exclusively on SOAP, your communication choices are limited, but you can still build the request more quickly by eliminating unnecessary inputs. Consider special programming needs as well. For example, don't rely on Google to sort the data if none of the default sort criteria completely meets your needs; sort the data locally instead.

Define the Best Possible User Experience: Many developers assume that fast code always results in a fast application. When a user spends considerable time trying to figure out your application, code execution speed becomes a nonissue. Always check user performance when you consider the speed of your application because the user is going to be the main choke point. Whenever you help the user work faster, you gain a significant improvement in application speed (not to mention reduced support costs). Chapter 11 discusses this concept in detail.
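Three of these areas translate directly into small pieces of client code:

* counting calls in 10-result pages
* keeping only the title, URL, and snippet
* sorting on the client

The C# sketch below shows all three. The names SpeedHelpers, SearchHit, and FullResult are hypothetical. FullResult is only a stand-in for whatever your SOAP client actually produces. The code assumes a zero-based start index for each request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Only the data the application keeps: title, URL, and snippet.
public sealed record SearchHit(string Title, string Url, string Snippet);

// Hypothetical stand-in for a fully parsed result, which carries more
// fields than most applications need.
public sealed class FullResult
{
    public string Title { get; set; } = "";
    public string Url { get; set; } = "";
    public string Snippet { get; set; } = "";
    public string Summary { get; set; } = "";
    public string CachedSize { get; set; } = "";
}

public static class SpeedHelpers
{
    // Google returns at most 10 results per call.
    public const int GooglePageSize = 10;

    // Fewest calls: round trips needed for 'wanted' results (ceiling division).
    public static int CallsNeeded(int wanted, int pageSize = GooglePageSize)
    {
        if (wanted < 0 || pageSize < 1) throw new ArgumentOutOfRangeException();
        return (wanted + pageSize - 1) / pageSize;
    }

    // Zero-based start index for each of those calls: 0, 10, 20, ...
    public static List<int> StartIndexes(int wanted, int pageSize = GooglePageSize)
    {
        int calls = CallsNeeded(wanted, pageSize);
        return Enumerable.Range(0, calls).Select(i => i * pageSize).ToList();
    }

    // Required data only: drop every field except title, URL, and snippet.
    public static SearchHit Slim(FullResult r) => new SearchHit(r.Title, r.Url, r.Snippet);

    // Local sorting example: order by host name, then by title, on the client.
    public static List<SearchHit> SortLocally(IEnumerable<SearchHit> hits) =>
        hits.OrderBy(h => new Uri(h.Url).Host, StringComparer.OrdinalIgnoreCase)
            .ThenBy(h => h.Title, StringComparer.OrdinalIgnoreCase)
            .ToList();
}
```

Note that asking for 25 results costs three calls, exactly as many as asking for 30. The host-then-title ordering is only an example; substitute whatever criteria your application needs.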

Tip

Always use the current version of Google Web Services to take advantage of the latest speed, efficiency, reliability, and request features. As Google improves its Web service, you'll see options for additional inputs and search types that will make your application faster.

Addressing Efficiency Concerns

Efficiency affects performance by modifying the resource requirements for the application. An efficient application uses resources to their fullest and therefore reduces the cost of using the application. Making an application efficient can improve application speed as well. For example, an application that uses memory efficiently won't have to rely on swap files or other memory enhancements as much, which usually results in a speed boost. However, an efficient application can just as easily slow performance. An application that uses disk storage rather than memory to improve overall system efficiency by freeing memory for other applications is almost certain to work slower than an application that relies exclusively on memory.

Note

It's important to understand that most performance tuning relies on assumptions that might not be true on the production system. The more control you exercise over the host machine, the better you can control the assumptions you make about performance tuning. Real systems run multiple applications, including background applications such as virus checkers. In addition, applications can experience problems such as memory leaks. Consequently, you need to make the best assumptions you can about the application environment and use those assumptions when tuning your application.

You'll also find that efficiency affects reliability. An application that uses resources conservatively is less likely to run out of resources to process the incoming Google data. Resource deprivation is a major cause of application crashes, so using resources carefully means your application is likely to crash less often.

A Google Web Services application developer needs to consider only the client side of the data exchange because Google takes care of the server side. When a user makes a request, you must consider the efficiency of that request. Inefficient requests can cause Google to return more results than needed and reduce overall system performance. As a side effect, inefficient requests add to the load the Google Web Services servers must handle. When multiple developers create applications that perform requests inefficiently, server load increases, which can increase the time users wait for responses.

One of the most important efficiency considerations is the effect of false starts on application efficiency. For example, Google normally lists a snippet for a Web site so you know what it contains before you actually click the link to view it. However, some Web sites use a summary instead, so the smart developer looks for both. You also need to consider the organization of the links: the way Google organizes them might not be the way that you need them. Sometimes it's better to have the user assign a weight to each link so that ordering becomes easier and more appropriate to that particular user's requirements.
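The snippet-or-summary check and user-assigned weights each take only a few lines of client code. In this C# sketch, the Link type, the LinkDisplay class, and the URL-keyed weight dictionary are hypothetical. LINQ's OrderByDescending performs a stable sort, so links the user hasn't weighted keep the order in which Google returned them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: a result may carry a snippet, a summary, or both.
public sealed record Link(string Title, string Url, string Snippet, string Summary);

public static class LinkDisplay
{
    // Prefer the snippet; fall back to the summary when the snippet is empty.
    public static string Description(Link link) =>
        !string.IsNullOrWhiteSpace(link.Snippet) ? link.Snippet
        : !string.IsNullOrWhiteSpace(link.Summary) ? link.Summary
        : "(no description)";

    // Orders links by user-assigned weight, highest first. Unweighted links
    // count as 0 and, because the sort is stable, keep their original order.
    public static List<Link> ByUserWeight(IList<Link> links, IReadOnlyDictionary<string, int> weights) =>
        links.OrderByDescending(l => weights.TryGetValue(l.Url, out int w) ? w : 0)
             .ToList();
}
```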

Tip

Some users will experience problems using the snippets that Google provides. In some cases, the snippet is too short; it doesn't provide enough information for the user to make a decision about the link. In other cases, the snippet appears in the middle of a sentence or incorporates pieces of multiple sentences, making it impossible to understand. A custom application could request the required information from Google and then download amplifying information from the Web site to provide the user with better information.
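As one illustration of amplifying information, the application could read a page's meta description tag after downloading the page. The download step itself is omitted here; HttpClient is one way to do it. Using the meta description is an assumption of this sketch, not something the text prescribes. The regular expression is deliberately rough: it expects double-quoted attributes with `name` before `content`, so a real HTML parser is the more robust choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class PageDescription
{
    // Rough pattern for <meta name="description" content="...">. It assumes
    // double quotes and that the name attribute precedes content.
    private static readonly Regex MetaDescription = new Regex(
        @"<meta\s+name\s*=\s*""description""\s+content\s*=\s*""([^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns the decoded meta description, or null when the pattern doesn't match.
    public static string FromHtml(string html)
    {
        Match m = MetaDescription.Match(html);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }
}
```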

Buffering the Data

Reliability measures several factors. For example, if you can satisfy a search only 80 percent of the time, then the application is only 80 percent reliable, ignoring other reliability factors. Likewise, an application that doesn't provide repeatable results within a given time and without any change in external factors (such as the technique Google uses to create the results) isn't very reliable. Data buffering can help increase reliability by making the results of a search available, even when the network connection to Google fails. In addition, data aging ensures the user obtains consistent results, but not outdated results.
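Data aging can be as simple as storing a timestamp with each buffered result. The AgingCache class below is a hypothetical sketch with two lookups:

* TryGetFresh() honors a maximum age.
* TryGetAny() returns stale data as a fallback for when the connection to Google fails.

The clock is passed in as a delegate so the aging behavior can be tested without waiting.

```csharp
using System;
using System.Collections.Generic;

// Buffer with data aging: entries older than maxAge are no longer "fresh".
public sealed class AgingCache<T>
{
    private readonly Dictionary<string, (DateTime Stored, T Value)> entries =
        new Dictionary<string, (DateTime Stored, T Value)>();
    private readonly TimeSpan maxAge;
    private readonly Func<DateTime> now;

    // 'clock' defaults to the system UTC time; tests can supply their own.
    public AgingCache(TimeSpan maxAge, Func<DateTime> clock = null)
    {
        this.maxAge = maxAge;
        now = clock ?? (() => DateTime.UtcNow);
    }

    // Stores (or replaces) the results for a query along with the time stored.
    public void Store(string query, T value) => entries[query] = (now(), value);

    // Succeeds only when the query is cached and no older than maxAge.
    public bool TryGetFresh(string query, out T value)
    {
        if (entries.TryGetValue(query, out var entry) && now() - entry.Stored <= maxAge)
        {
            value = entry.Value;
            return true;
        }
        value = default;
        return false;
    }

    // Succeeds whenever the query is cached, however old; a fallback for
    // when the connection to Google fails.
    public bool TryGetAny(string query, out T value)
    {
        if (entries.TryGetValue(query, out var entry))
        {
            value = entry.Value;
            return true;
        }
        value = default;
        return false;
    }
}
```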

Google, more than any other Web service, can benefit greatly from certain types of data buffering. You already know that Google only allows you to retrieve 10 results at a time. Consequently, most of the applications in the book display 10 links at a time. However, if the user has to click a button and wait every time the application needs to retrieve additional links, user efficiency suffers. In general, it's better to request results before you actually need them. You can use one of two techniques to make buffering seamless.

The first approach is to display the links in pages. When using this technique, you build three buffers into the application. The first buffer holds the previous 10 links, the second the current 10 links, and the third the next 10 links. This way the user can page back and forth without experiencing a delay unless the user pages faster than the buffering process can keep up. Note that the first buffer won't hold any information on the first request because there isn't any previous information; likewise, the third buffer won't hold information on the last request because there isn't any additional information available.
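The rules for filling the three buffers reduce to a small calculation. This sketch returns the start index for each buffer, with null meaning the buffer stays empty. The PagedBuffers and PageWindow names are hypothetical. It uses the same boundary tests as Listing 10.1:

* There is a next page only while the next index is below the estimated result count.
* There is a previous page only once the current index reaches the page size.

```csharp
using System;

// Start indexes for the previous, current, and next buffers; null = empty buffer.
public sealed record PageWindow(int? PreviousStart, int CurrentStart, int? NextStart);

public static class PagedBuffers
{
    public static PageWindow For(int currentStart, int pageSize, int estimatedTotal)
    {
        if (pageSize < 1 || currentStart < 0) throw new ArgumentOutOfRangeException();

        // No previous page while viewing the first page.
        int? previous = currentStart >= pageSize ? currentStart - pageSize : (int?)null;

        // No next page once the next index reaches the estimated total.
        int? next = currentStart + pageSize < estimatedTotal ? currentStart + pageSize : (int?)null;

        return new PageWindow(previous, currentStart, next);
    }
}
```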

The second approach is to scroll the data. When using this technique, you create one buffer with 10 or more results before the current link and a second buffer that holds the current link plus 10 or more additional links. This technique relies on the user scrolling through the results one link at a time. Because of the limitations that Google places on your use of the Web service, this technique is less efficient. You really do want to request pages in 10-link segments to reduce the number of calls your application makes.
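For the scrolling approach, the key is to keep the user's one-link-at-a-time movement separate from the 10-link requests. This sketch uses hypothetical names. It maps a link index to the segment that contains it. It also reports when fewer than 10 links remain buffered beyond the current one, which is the point at which to request the next segment.

```csharp
using System;

public static class ScrollBuffer
{
    // Start index of the 10-link segment that contains a (non-negative) link index.
    public static int SegmentStart(int linkIndex, int pageSize = 10)
    {
        if (linkIndex < 0 || pageSize < 1) throw new ArgumentOutOfRangeException();
        return linkIndex / pageSize * pageSize;
    }

    // True when fewer than 'margin' links are buffered after the current one.
    // bufferedEnd is exclusive: links [0, bufferedEnd) are already on the client.
    public static bool ShouldPrefetch(int currentLink, int bufferedEnd, int margin = 10) =>
        bufferedEnd - currentLink - 1 < margin;
}
```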

No matter what buffering technique you use, it pays to place the buffering code in a separate thread. Whenever the user changes the current record, the application should create a thread to request more data as a background process. Making the request using this technique reduces the performance hit and makes the request process almost invisible to the user.
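Listing 10.1 uses a dedicated Thread for this work. On .NET versions that include Task.Run (.NET Framework 4.5 and later), a Task is another option, and the sketch below swaps in that technique. The `fetchPage` delegate is a hypothetical stand-in for your own request code. The class assumes Prefetch() and GetAsync() are called from the same thread, typically the UI thread.

```csharp
using System;
using System.Threading.Tasks;

// Starts fetching a page in the background as soon as the current record
// changes, then hands that page over when the user actually asks for it.
public sealed class Prefetcher<TPage>
{
    private readonly Func<int, Task<TPage>> fetchPage;  // hypothetical request code
    private Task<TPage> pending;
    private int pendingStart = -1;

    public Prefetcher(Func<int, Task<TPage>> fetchPage) => this.fetchPage = fetchPage;

    // Call whenever the current record changes; runs the request on the thread pool.
    public void Prefetch(int start)
    {
        pendingStart = start;
        pending = Task.Run(() => fetchPage(start));
    }

    // Returns the prefetched page if it matches the request; otherwise fetches now.
    public Task<TPage> GetAsync(int start) =>
        pending != null && pendingStart == start ? pending : fetchPage(start);
}
```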

Listing 10.1 shows a typical example of the paged buffer approach. This listing shows only the essentials; the actual source code is much longer. You'll find the complete listing for this example in the \Chapter 10\BufferedPage folder of the source code located on the Sybex Web site.

Listing 10.1: Using the Paged Buffer Approach

// This fragment belongs to the form class; the file also needs
// using System.Data; and using System.Threading; directives.

// These variables track the current index number.
String NextIndex;
String PreviousIndex;

// This variable defines when an update is complete. It's volatile
// because the background thread writes it while the UI thread reads it.
volatile Boolean IsUpdating;

private void btnTest_Click(object sender, System.EventArgs e)
{
   DataRow NewData; // New data for SearchResults.

   // Wait until the update is complete. Sleeping between checks keeps
   // the loop from spinning at full processor speed.
   if (IsUpdating)
   {
      lblUpdating.Visible = true;
      lblUpdating.Refresh();  // Paint the label before the UI thread waits.
      while (IsUpdating)
         Thread.Sleep(50);
   }
   lblUpdating.Visible = false;

   // Create a thread to get the next results.
   Thread ReqThread = new Thread(new ThreadStart(NextRequest));

   // Determine which type of processing to perform.
   if (btnTest.Text == "&Test")
   {
      // Make the initial request.
      MakeRequest("SearchResults");

      // Continue by changing the button text.
      btnTest.Text = "&Next";

      // Enable the previous button.
      btnPrevious.Enabled = true;

      // Set the next index value. This must happen before the thread
      // starts because NextRequest() reads NextIndex.
      NextIndex = txtResults.Text;

      // Get the next results by starting a thread to perform the work.
      // Setting IsUpdating here, rather than inside the thread, closes
      // the gap before the thread begins running.
      IsUpdating = true;
      ReqThread.Start();
   }
   else
   {
      // Verify there are more entries to obtain.
      if (Int32.Parse(NextIndex) < Int32.Parse(txtEstResults.Text))
      {
         // Set the current index value.
         PreviousIndex = txtIndex.Text;
         txtIndex.Text = NextIndex;
         NextIndex = Convert.ToString(
            Int32.Parse(NextIndex) + Int32.Parse(txtResults.Text));

         // Transfer the NextResults table to the SearchResults table.
         dsGoogle.Tables["SearchResults"].Clear();
         foreach (DataRow DR in dsGoogle.Tables["NextResults"].Rows)
         {
            NewData = dsGoogle.Tables["SearchResults"].NewRow();
            NewData.ItemArray = DR.ItemArray;
            dsGoogle.Tables["SearchResults"].Rows.Add(NewData);
         }

         // Get the next results by starting a thread to
         // perform the work.
         IsUpdating = true;
         ReqThread.Start();
      }
   }
}

private void NextRequest()
{
   // IsUpdating was set before this thread started. This method reads
   // form controls from a background thread; WinForms expects control
   // access on the UI thread, so production code should copy these
   // values into fields before starting the thread.
   try
   {
      // Request the next set of results. Place them in the NextResults
      // buffer table.
      if (Int32.Parse(NextIndex) < Int32.Parse(txtEstResults.Text))
         MakeRequest("NextResults");

      // Request the previous set of results. Place them in the
      // PreviousResults buffer table.
      if (Int32.Parse(txtIndex.Text) >= Int32.Parse(txtResults.Text))
         MakeRequest("PreviousResults");
   }
   finally
   {
      // Done updating, even if a request throws an exception.
      IsUpdating = false;
   }
}

The example begins by creating a thread. The thread isn't executing, but it's available for later use. The thread code appears in the NextRequest() method shown later in this listing. As you can see, this method calls the MakeRequest() method. The MakeRequest() code is essentially the same as the code in Listing 6.5. All that I added was the ability to use more than one table to store the data.

The code must also ensure the background thread isn't updating the DataSet. The issue isn't reentrancy but data usage: the client must not perform multiple simultaneous updates of the same table. When the user clicks Next or Previous while an update is in progress, the application displays a message and clears it when the update process completes.

Whenever the user starts the application, the DataSet is empty, which means the code must fill it with data. Consequently, the code has two paths to follow. The first case occurs only when the user creates a new query; the second case occurs whenever the user moves between pages of the query results.

An initial query must fill the SearchResults table of dsGoogle, so the application calls MakeRequest() without using a thread. This initial request is the only time most users will need to wait for results. In most other cases, the data will already appear in one of the two caches. After the application fills the SearchResults table, the code starts the thread using ReqThread.Start(), which fills the other two tables in the background.

The code also makes a few other changes to the application at this point. It changes the Test button's caption to Next and enables the Previous button. Finally, the code sets the NextIndex value. This string holds the next index that the application will request.

Both the Previous and Next buttons operate the same way, just in opposite directions through the query results (only the Next logic appears in this listing). The first step is to verify that the query results contain additional links to process; otherwise, the code could request nonexistent index values. The code then updates the three index values: PreviousIndex, txtIndex.Text (current), and NextIndex.

The data the user needs already appears in the NextResults table (PreviousResults if the user clicks the Previous button). The code simply clears the current SearchResults table data and fills it with the NextResults table data. Note the technique used in this case. A DataRow can belong to only one DataTable, so the code creates a new SearchResults row for each buffered row and copies the values through ItemArray. Depending on the language you use, you might need to use other data transfer techniques. Finally, the code starts the thread. Again, this step fills the two cache tables in the background.
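In C#, DataTable.ImportRow() is one alternative for this transfer. It copies each buffered row's values into the target table in a single call, so you don't have to build each row by hand. The BufferSwap helper and its three-column table layout below are hypothetical. They follow the chapter's advice to keep only the title, URL, and snippet.

```csharp
using System;
using System.Data;

public static class BufferSwap
{
    // Replaces the display table's rows with copies of the buffered rows.
    // ImportRow copies values (and row state), leaving the buffer intact.
    public static void Promote(DataTable buffer, DataTable display)
    {
        display.Clear();
        foreach (DataRow row in buffer.Rows)
            display.ImportRow(row);
    }

    // Builds a results table with the three columns the chapter suggests keeping.
    public static DataTable CreateResultsTable(string name)
    {
        var table = new DataTable(name);
        table.Columns.Add("Title", typeof(string));
        table.Columns.Add("URL", typeof(string));
        table.Columns.Add("Snippet", typeof(string));
        return table;
    }
}
```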