Section 13.3. Guesstimate | Ajax Design Patterns

13.3. Guesstimate

Approximate, Estimate, Extrapolate, Fuzzy, Guess, Guesstimate, Interpolate, Predict, Probabilistic, Sloppy, Trend

Figure 13-5. Guesstimate

13.3.1. Developer Story

Devi's producing a taxi tracker so the head office knows where its fleet is at any time. The taxis only transmit their location every 10 seconds, which would ordinarily lead to jerky movements on the map. However, Devi wants the motion to be smooth, so she uses interpolation to guess the taxi's location between updates.

13.3.2. Problem

How can you cut down on calls to the server?

13.3.3. Forces

To comprehend system activity and predict what might happen next, it's useful to have frequent updates from the server.
It's expensive to keep updating from the server.

13.3.4. Solution

Instead of requesting information from the server, make a reasonable guess. There are times when it's better to provide a good guess than nothing at all. Typically, this pattern relates to dynamic situations where the browser periodically grabs new information, using Periodic Refresh. The aim is help the user spot general trends, which are often more important than precise figures. It's a performance optimization because it allows you to give almost the same benefit as if the data was really arriving instantaneously, but without the bandwidth overhead. For this reason, it makes more sense when using Periodic Refresh than HTTP Streaming, because the latter doesn't incur as much overhead in sending frequent messages.

One type of Guesstimate is based on historical data. There are several ways the browser might have access to such data:

The browser application can capture recent data by accumulating any significant observations into variables that last as long as the Ajax App is open.
The browser application can capture long-term data in cookies, so it's available in subsequent sessions.
The server can expose historical data for interrogation by the browser application.

Equipped with historical data, it's possible to extrapolate future events, albeit imprecisely. Imagine a collaborative environment where multiple users can drag-and-drop objects in a common space, something like the Magnetic Poetry Ajax App (http://www.broken-notebook.com/magnetic/). Using a Periodic Refresh of one second, users might see an erratic drag motion, with the object appearing to leap across space, then stay still for a second, then leap again. A Guesstimate would exploit the fact that the motion of the next second is probably in the same direction and speed as that of the previous section. Thus, the application can, for that second, animate the object as if it were being dragged in the same direction the whole time. Then, when the real position becomes apparent a second later, the object need not leap to that position, but a new estimate can be taken as to where the object's moving, and the object can instead move smoothly toward the predicted location. In other words, the object is always moving in the prediction of its current predicted location. Dragging motion is an example where users would likely favor a smooth flow at the expense of some accuracy, over an erratic display that is technically correct.

How about long-term historical data, stretching over weeks and months instead of seconds and minutes? Long-term data can also be used for a Guesstimate. Imagine showing weather on a world map. The technically correct approach would be not to show any weather initially, and then to gradually populate the map as weather data is received from the server. But the philosophy here would suggest relying on historical data for a first-cut map, at least for a few indicative icons. In the worst case, the Guesstimate could be based on the previous day's results. Or it might be based on a more sophisticated statistical model involving several data points.

Historical data is not the only basis for a Guesstimate. It's also conceivable the browser performs a crude emulation of business logic normally implemented server side. The server, for example, might take 10 seconds to perform a complex financial query. That's a problem for interactivity, where the user might like to rapidly tweak parameters. What if the browser could perform a simple approximation, perhaps based on a few assumptions and rule-of-thumb reasoning? Doing so might give the user a feel for the nature of the data, with the long server trip required only for detailed information.

There are a few gotchas with Guesstimate. For example, the Guesstimate might end up being an impossible result, like "-5 minutes remaining" or "12.8 users online"! If there's a risk that your algorithm will lead to such situations, you probably want to create a mapping back to reality; for instance, truncate or set limits. Another gotcha is an impossible change, such as the number of all-time web site visitors suddenly dropping. The iTunes example discussed later in this chapter provides one mitigation technique: always underestimate, which will ensure that the value goes up upon correction. With Guesstimate, you also have the risk that you won't get figures back from the server, leading to even greater deviation from reality than expected. At some point, you'll probably need to give up and be explicit about the problem.

13.3.5. Decisions

13.3.5.1. How often will real data be fetched? How often will Guesstimates be made?

Most Guesstimates are made between fetches of real data. The point of the Guesstimate is to reduce the frequency of real data, so you need to decide on a realistic frequency. If precision is valuable, real data will need to be accessed quite frequently. If server and bandwidth resources are restricted, there will be fewer accesses and a greater value placed on the Guesstimate algorithm.

Also, how often will a new Guesstimate be calculated? Guesstimates tend to be fairly mathematical in nature, and too many of them will impact on application performance. On the other hand, too few Guesstimates will defeat the purpose.

13.3.5.2. How will the Guesstimate be consolidated with real data?

Each time new data arrives, the Guesstimate somehow needs to be brought into line with the new data. In the simplest case, the Guesstimate is just discarded, and the fresh data is adopted until a new Guesstimate is required.

Sometimes, a more subtle transition is warranted. Imagine a Guesstimate that occurs once every second, with real data arriving on the minute. The 59-second estimate might be well off the estimate by 1 minute. If a smooth transition is important, and you want to avoid a sudden jump to the real value, then you can estimate the real value at two minutes, and spend the next minute making Guesstimates in that direction.

The iTunes demo in the "Real-World Examples" section, later in this chapter, includes another little trick. The algorithm concedes a jump will indeed occur, but the Guesstimate is deliberately underestimated. Thus, when the real value arrives, the jump is almost guaranteed to be upward as the user would expect. Here, there is a deliberate effort to make the Guesstimate less accurate than it could be, with the payoff being more realistic consolidation with real data.

13.3.5.3. Will users be aware a Guesstimate is taking place?

It's conceivable that users will notice some strange things happening with a Guesstimate. Perhaps they know what the real value should be, but the server shows something completely different. Or perhaps they notice a sudden jump as the application switches from a Guesstimate to a fresh value from the server. These experiences can erode trust in the application, especially as users may be missing the point that the Guesstimate is for improved usability. Trust is critical for public web sites, where many alternatives are often present, and it would be especially unfortunate to lose trust due to a feature that's primarily motivated by user experience concerns.

For entertainment-style demos, Guesstimates are unlikely to cause much problem. But what about using a Guesstimate to populate a financial chart over time? The more important the data being estimated, and the less accurate the estimate, the more users need to be aware of what the system is doing. At the very least, consider a basic message or legal notice to that effect.

13.3.5.4. What support will the server provide?

In some cases, the server exposes information that the browser can use to make a Guesstimate. For example, historical information will allow the browser to extrapolate to the present. You need to consider what calculations are realistic for the browser to perform and ensure it will have access to the appropriate data. A Web Service exposing generic history details is not the only possibility. In some cases, it might be preferable for the server to provide a service related to the algorithm itself. In the Apple iTunes example shown later in the "Real-World Examples" section, recent real-world values are provided by the server, and the browser must analyze them to determine the rate per second. However, an alternative design would be for the server to calculate the rate per second, reducing the work performed by each browser.

13.3.6. Real-World Examples

13.3.6.1. Apple iTunes counter

As its iTunes Music Store neared its 500 millionth song download, Apple decorated its homepage (http://apple.com) with a rapid counter that appeared to show the number of downloads in real-time (Figure 13-6). The display made for an impressive testimony to iTunes' popularity and received plenty of attention. It only connected with the server once a minute, but courtesy of a Guesstimate algorithm described later in "Code Example: iTunes Counter," the display smoothly updated every 100 milliseconds.

Figure 13-6. iTunes counter

13.3.6.2. Gmail Storage Space

The Gmail homepage (http://gmail.com) (Figure 13-7) shows a message like this to unregistered users:

2446.034075 megabytes (and counting) of free storage so you'll never need to delete another message.

But there's a twist: the storage capacity figure increases each second. Having just typed a couple of sentences, it's up to 2446.039313 megabytes. Gmail is providing a not-so-subtle message about its hosting credentials.

Figure 13-7. Gmail storage

Andrew Parker has provided some analysis of the homepage (http://www.avparker.com/2005/07/10/itunes-gmail-dynamic-counters/). The page is initially loaded with the storage capacity for the first day of the previous month and the current month (also the next month, though that's apparently not used). When the analysis occurred, 100 MB was added per month. Once you know that, you can calculate how many megabytes per second. So the algorithm determines how many seconds have passed since the current month began, and it can then infer how many megabytes have been added in that time. Add that to the amount at the start of the month, and you have the current storage capacity each second.

13.3.7. Code Example: iTunes Counter

The iTunes counter^[*] relies on a server-based service that updates every five minutes. On each update, the service shows the current song tally and the tally five minutes prior. Knowing how many songs were sold in a five-minute period allows for a songs-per-second figure to be calculated. So the script knows the recent figure and, since it knows how many seconds have passed since then, it also has an estimate of how many songs were sold. The counter then shows the recent figure plus the estimate of songs sold since then.

^[*] An expanded analysis of the iTunes counter can be found on my blog at http://www.softwareas.com/ajaxian-guesstimate-on-download-counter. Some of the comments and spacing has been changed for this analysis.

There are two actions performed periodically:

Once a minute, doCountdown( ) calls the server to get new song stats.
Once every 100 milliseconds, runCountdown( ) uses a rate estimation to morph the counter display.

The key global variables are rate and curCount:

rate is the number of songs purchased per millisecond, according to the stats in the XML that's downloaded each minute.
curCount is the counter value.

So each time the server response comes in, rate is updated to reflect the most recent songs-per-millisecond figure. curCount is continuously incremented according to that figure, with the output shown on the page.

The stats change on the server every five minutes, though doCountdown( ) pulls down the recent stats once a minute to catch any changes a bit sooner:

   //get most recent values from xml and process   ajaxRequest('http://www.apple.com/itunes/external_counter.xml',               initializeProcessReqChange);   //on one minute loop   var refreshTimer = setTimeout(doCountdown,refresh);

It fetches is the counter XML:

   <root>     <count name="curCount" timestamp="Thu, 07 Jul 2005 14:16:00 GMT">       484406324     </count>     <count name="preCount" timestamp="Thu, 07 Jul 2005 14:11:00 GMT">       484402490     </count>   </root>

setCounters( ) is the function that performs the Guesstimate calculation based on this information. It extracts the required parameters from XML into ordinary variablesfor example:

   preCount = parseInt(req.responseXML.getElementsByTagName               ('count')[1].childNodes[0].nodeValue);

When a change is detected, it recalculates the current rate as (number of new songs) / (time elapsed). Note that no assumption is made about the five-minute duration; hence, the time elapsed (dataDiff) is always deduced from the XML.

   //calculate difference in values   countDiff = initCount-preCount;   //calculate difference in time of values   dateDiff = parseInt(initDate.valueOf()-preDate.valueOf( ));   //calculate rate of increase   // i.e. ((songs downloaded in previous time)/time)*incr   rate = countDiff/dateDiff;

This is the most accurate rate estimate possible, but next, the script reduces it by 20 percent. Why would the developers want to deliberately underestimate the rate? Presumably because they have accepted that the rate will always be off, with each browser-server sync causing the counter to adjust itself in one direction or another, and they want to ensure that direction is always upwards. Making the estimate lower than expected just about guarantees the counter will increaserather than decreaseon every synchronization point. True, there will still be a jump, and it will actually be bigger on average because of the adjustment, but it's better than having the counter occasionally drop in value, which would break the illusion.

   rate = rate*0.8;

As well as the once-a-minute server call, there's the once-every-100-milliseconds counter repaint. runCountdown( ) handles this. With the rate variable re-Guesstimated once a minute, it's easy enough to determine the counter value each second. incr is the pause between redisplays (in this case, 100 ms). So every 100 ms, it will simply calculate the new Guesstimated song quantity by adding the expected increment in that time. Note that the Gmail counter example just discussed calculates the total figure each time, whereas the present algorithm gradually increments it. The present algorithm is therefore a little more efficient, although more vulnerable to rounding errors.

   //multiply rate by increment   addCount = rate*incr;   //add this number to counter   curCount += addCount;

And finally, the counter display is morphed to show the new Guesstimate. The show was all over when the tally reached 500 million songs, so the counter will never show more than 500 million.

   c.innerHTML = (curCount<500000000) ?                 intComma(Math.floor(curCount)) : "500,000,000+";

13.3.8. Related Patterns

13.3.8.1. Periodic Refresh

A Guesstimate can often be used to compensate for gaps between Periodic Refresh (Chapter 10).

13.3.8.2. Predictive Fetch

Predictive Fetch (see earlier) is another performance optimization based on probabilistic assumptions. Predictive Fetch guesses what the user will do next, whereas Guesstimate involves guessing the current server state. In general, Guesstimate decreases the number of server calls, whereas Predictive Fetch actually increases the number of calls.

13.3.8.3. Fat Client

Guesstimates require some business and application logic to be calculated in the browser, a characteristic of Fat Clients (see later).

13.3.9. Metaphor

If you know what time it was when you started reading this pattern, you can Guesstimate the current time by adding an estimate of your reading duration.