Local Optimizations

Okay, you've switched to a better algorithm and revamped your data structure. You've refactored your code and minimized DOM interaction, but speed is still an issue. It is time to tune your code by tweaking loops and expressions to speed up hot spots. In his classic book, Writing Efficient Programs (Prentice Hall, 1982), Jon Bentley revealed 27 optimization guidelines for writing efficient programs. These code-tuning rules are actually low-level refactorings that fall into five categories: space for time and vice versa, loops, logic, expressions, and procedures. In this section, I touch on some highlights.

Trade Space for Time

Many of the optimization techniques you can read about in Bentley's book and elsewhere trade space (more code) for time (more speed). You can add more code to your scripts to achieve higher speed by "defactoring" hot spots to run faster. By augmenting objects to store additional data or to make that data more easily accessible, you can reduce the time required for common operations.

In JavaScript, however, any additional speed should be balanced against any additional program size. Optimize hot spots, not your entire program. You can compensate for this tradeoff by packing and compressing your scripts.

Augment Data Structures

Douglas Bagnall employed data structure augmentation in the minuscule 5K chess game that he created for the 2002 5K contest (http://www.the5k.org/). Bagnall used augmented data structures and binary arithmetic to make his game fast and small. The board consists of a 120-element array, containing numbers representing either pieces, empty squares, or "off-the-board" squares. The off-the-board squares speed up the testing of the sides, preventing bishops, etc., from wrapping from one edge to the other while they're moving, without expensive positional tests.

Each element in his 120-item linear array contains a single number that represents the status of each square. So instead of this:

 board=[16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,2,3,4,5,6,2,3,4,5,16,....]

He did this:

 bstring="ggggggggggggggggggggg23456432gg11111111gg0000 ... g";
 for (z=0;z<120;z++){
     board[z]=parseInt(bstring.charAt(z),35);
 }

This base-35 value represents the squares on the board (parseInt using a radix of 35). As alpha "g" corresponds to 16 (the 5th bit; that is, bit 4), Bagnall says he actually could have used base-17 instead of 35. Perhaps this will leave room for future enhancements.

Each position on the board is encoded like this:

 bit 4    (16): 0 = on board, 1 = off board.
 bit 3     (8): 0 = white, 1 = black.
 bits 0-2  (7): 0 = empty, non-zero = the piece type:
     1 - pawn
     2 - rook
     3 - knight
     4 - bishop
     5 - queen
     6 - king

So to test the color of a piece, movingPiece, you'd use the following:

 ourCol=movingPiece & 8;    // what color is it? 8=black, 0=white
 movingPiece &= 7;          // now we have the color info, dump it.
 if(movingPiece > 1){       // If it is not a pawn.

Because the preceding code reports white for an empty square, Bagnall also checks that movingPiece is non-empty. To see his code and the game in action, visit the following sites:

http://halo.gen.nz/chess/
http://halo.gen.nz/chess/main-branch/ (the actual code)
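
The bit layout described above can be exercised with a short sketch. The helper name decodeSquare and the PIECE_NAMES table are my own illustrative assumptions, not Bagnall's code; only the bit positions and the use of parseInt with a radix of 35 come from the description above.

```javascript
// Hypothetical lookup from the 3-bit piece code to a readable name.
var PIECE_NAMES = ["empty", "pawn", "rook", "knight", "bishop", "queen", "king"];

// Hypothetical helper: decode one character of the board string.
// Bit 4 (16) = off board, bit 3 (8) = black, bits 0-2 (7) = piece type.
function decodeSquare(ch) {
    var value = parseInt(ch, 35);      // "2" -> 2, "b" -> 11, "g" -> 16
    return {
        offBoard: (value & 16) !== 0,
        black: (value & 8) !== 0,
        piece: PIECE_NAMES[value & 7]
    };
}

var whiteRook = decodeSquare("2");     // 2 = rook, color bit clear
var blackKnight = decodeSquare("b");   // 11 = 8 (black) + 3 (knight)
var border = decodeSquare("g");        // 16 = off-the-board square
```

A single bitwise AND answers each question about a square, which is why the encoding suits tight move-generation loops.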

Cache Frequently Used Values

One of the most effective techniques you can use to speed up your JavaScripts is to cache frequently used values. When you cache frequently used expressions and objects, you do not need to recompute them. So instead of this (see Listing 10.5):

Listing 10.5 A Loop That Needs Caching and Fewer Evaluations

 var d=35;
 var y=0;
 for (var i=0; i<1000; i++) {
     y += Math.sin(d)*10;
 }

Do this (see Listing 10.6):

Listing 10.6 Caching Complex Calculations Out of a Loop

 var d=35;
 var y=0;
 var math_sind = Math.sin(d)*10;
 for (var i=0; i<1000; i++) {
     y += math_sind;
 }

Because Math is a global object, declaring the math_sind variable also avoids resolving to a global object for each iteration. You can combine this technique with minimizing DOM interaction by caching frequently used object or property references. Simplify the calculations within your loops and their conditionals.
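
The same idea applies to object and property references. The following is a minimal sketch; the app object and its nested rates array are hypothetical stand-ins for a deep object or DOM reference.

```javascript
// Hypothetical nested object standing in for a deep object or DOM reference.
var app = { settings: { rates: [1.5, 2.5, 3.0, 4.0] } };

function sumUncached() {
    var total = 0;
    // Resolves app.settings.rates twice on every iteration.
    for (var i = 0; i < app.settings.rates.length; i++) {
        total += app.settings.rates[i];
    }
    return total;
}

function sumCached() {
    var rates = app.settings.rates;   // resolve the reference once
    var len = rates.length;           // cache the loop bound
    var total = 0;
    for (var i = 0; i < len; i++) {
        total += rates[i];
    }
    return total;
}
```

Both functions return the same total; the cached version simply does fewer property lookups inside the loop.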

Store Precomputed Results

For expensive functions (like sin()), you can precompute values and store the results. You can use a lookup table (O(1)) to handle any subsequent function calls instead of recomputing the function (which is expensive). So instead of this:

 function foo(i) {
     if (i < 10) {return i * i - i;}
 }

Do this:

 var values = [0*0-0, 1*1-1, 2*2-2, 3*3-3, 4*4-4, 5*5-5, 6*6-6, 7*7-7, 8*8-8, 9*9-9];
 function foo(i) {
     if (i < 10) {return values[i];}
 }

This technique is often used with trigonometric functions for animation purposes. A sine wave makes an excellent approximation of the acceleration and deceleration of a body in motion:

 var sin = [];
 for (var i=1; i<=360; i++) {
     sin[i] = Math.sin(i * Math.PI / 180); // Math.sin expects radians, so convert from degrees
 }

In JavaScript, this technique is less effective than it is in a compiled language like C. Unchanging values are computed at compile time in C, while in an interpreted language like JavaScript, they are computed at runtime.
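
As a rough sketch of how such a table might be wrapped for animation code, the following builds one entry per whole degree and looks values up by index. The names SIN_TABLE and sinDeg are assumptions of mine, and the rounding to whole degrees is a simplification you may not want.

```javascript
// Hypothetical lookup table: one entry per whole degree, 0 through 359.
var SIN_TABLE = [];
for (var deg = 0; deg < 360; deg++) {
    SIN_TABLE[deg] = Math.sin(deg * Math.PI / 180);   // Math.sin expects radians
}

// Look up the sine of an angle given in degrees.
// The angle is rounded to a whole degree; negative and large angles wrap around.
function sinDeg(angle) {
    var index = ((Math.round(angle) % 360) + 360) % 360;
    return SIN_TABLE[index];
}
```

The table costs 360 calls to Math.sin at startup; every later call to sinDeg is an array lookup.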

Use Local versus Global Variables

Reducing the scope of your variables is not only good programming practice, it is faster. So instead of this (see Listing 10.7):

Listing 10.7 Loop with Global Variable

 function MyInnerLoop(){
     for(i=0;i<1000;i++);
 }

Do this (see Listing 10.8):

Listing 10.8 Loop with Local Variable

 function MyInnerLoop(){
     for(var i=0;i<1000;i++);
 }

Local variables are 60 percent to 26 times faster than global variables for tight inner loops. This is due in part to the fact that global variables require more time to search up the function's scope chain. Local variables are properties of the function's call object and are searched first. Netscape 6 in particular is slow in using global variables. Mozilla 1.1 has improved speed, but this technique is relevant to all browsers. See Scott Porter's local versus global test at http://javascript-games.org/articles/local_global_bench.html.
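
When a loop really does need a global value, one option is to copy it into a local variable, work on the local inside the loop, and write the result back once. This is a minimal sketch of that pattern; the variable and function names are hypothetical.

```javascript
var counter = 0;   // global variable (hypothetical name)

function incrementGlobal(times) {
    for (var i = 0; i < times; i++) {
        counter++;              // each access is resolved through the scope chain
    }
}

function incrementViaLocal(times) {
    var local = counter;        // copy the global into a local once
    for (var i = 0; i < times; i++) {
        local++;                // local access inside the loop
    }
    counter = local;            // write the result back once
}

incrementGlobal(1000);
incrementViaLocal(1000);
// counter is now 2000
```

This only works when nothing else reads or changes the global while the loop is running.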

Trade Time for Space

Conversely, you can trade time for space complexity by densely packing your data and code into a more compact form. By recomputing information, you can decrease the space requirements of a program at the cost of increased execution time.

Packing

Packing decreases storage and transmission costs by increasing the time to compact and retrieve the data. Sparse arrays and overlaying data into the same space at different times are two examples of packing. Removing spaces and comments are two more examples of packing. Substituting shorter strings for longer ones can also help pack data into a more compact form.
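
As a small illustration of packing, the following sketch stores an array of small integers as one base-36 digit per value and unpacks it at startup. The function names pack and unpack are my own, and the approach assumes every value is between 0 and 35.

```javascript
// Pack an array of small integers (0-35) into a string, one base-36 digit each.
function pack(values) {
    var out = "";
    for (var i = 0; i < values.length; i++) {
        out += values[i].toString(36);
    }
    return out;
}

// Unpack: a little more work at startup in exchange for a shorter source literal.
function unpack(packed) {
    var values = [];
    for (var i = 0; i < packed.length; i++) {
        values[i] = parseInt(packed.charAt(i), 36);
    }
    return values;
}

var packed = pack([16, 16, 2, 3, 4, 5, 6, 0]);   // "gg234560"
var restored = unpack(packed);
```

The packed string is 8 characters, whereas the equivalent array literal needs 20 characters before you even count the variable name.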

Interpreters

Interpreters reduce program space requirements by replacing common sequences with more compact representations.

Some 5K competitors (http://www.the5k.org/) combine these two techniques to create self-extracting archives of their JavaScript pages, trading startup speed for smaller file sizes (http://www.dithered.com/experiments/compression/). See Chapter 9, "Optimizing JavaScript for Download Speed," for more details.

Optimize Loops

Most hot spots are inner loops, which are commonly used for searching and sorting. There are a number of ways to optimize the speed of loops: removing or simplifying unnecessary calculations, simplifying test conditions, loop flipping and unrolling, and loop fusion. The idea is to reduce the cost of loop overhead and to include only repeated calculations within the loop.

Combine Tests to Avoid Compound Conditions

"An efficient inner loop should contain as few tests as possible, and preferably only one." ^[14] Try to simulate exit conditions of the loop by other means. One technique is to embed sentinels at the boundary of data structures to reduce the cost of testing searches. Sentinels are commonly used for arrays, linked lists, and binary search trees. In JavaScript, however, arrays have the length property built-in, at least after version 1.2, so array boundary sentinels are more useful for arrays in languages like C.

^[14] Bentley, Programming Pearls, 192.
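
To make the sentinel idea concrete, here is a minimal sketch of a linear search that appends the target as a sentinel so the loop needs only one test. The function name sentinelSearch is hypothetical, and whether this beats a bounds-checked loop in a given JavaScript engine is something to measure, not assume.

```javascript
// Hypothetical helper: linear search with a sentinel.
// Returns the index of target in list, or -1 if it is absent.
// Assumes target is not NaN (NaN never equals itself, so the loop would not stop).
function sentinelSearch(list, target) {
    var n = list.length;
    list[n] = target;              // the sentinel guarantees the loop terminates
    var i = 0;
    while (list[i] !== target) {   // one test per iteration, no bounds check
        i++;
    }
    list.length = n;               // remove the sentinel again
    return i < n ? i : -1;
}
```

If the search stops at index n, it found only the sentinel, so the target was not in the original data.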

One example from Scott Porter of JavaScript-Games.org is splitting an array of numeric values into separate arrays for extracting the data for a background collision map in a game. The following example of using sentinels also demonstrates the efficiency of the switch statement:

 var serialData=new Array(-1,10,23,53,223,-1,32,98,45,32,32,25,-1,438,54,26,84,-1,487,43,11);
 var splitData=new Array();
 function init(){
     var ix=-1,n=0,s,l=serialData.length;
     for(;n<l;n++){
         s=serialData[n];
         switch(s){  // switch blocks are much more efficient
             case -1 : // than if... else if... else if...
                 splitData[++ix]=new Array();
                 break;
             default :
                 splitData[ix].push(s);
         }
     }
     alert(splitData.length);
 }

Scott Porter explains the preceding code using some assembly language and the advantage of using the switch statement:

"Here, -1 is the sentinel value used to split the data blocks. Switch blocks should always be used where possible, as it's so much faster than an if...else series. This is because with the if...else statements, a test must be made for each "if" statement, whereas switch blocks generate vector jump tables at compile time so NO test is actually required in the underlying code! It's easier to show with a bit of assembly language code. So an if/else statement:

 if(n==12)
     someBlock();
 else if(n==26)
     someOtherBlock();

becomes something like this in assembly:

 cmp eax,12;
 jz    someBlock;
 cmp eax,26;
 jz    someOtherBlock;

Whereas a switch statement:

 switch(a){
     case 12 :
         someBlock();
         break;
     case 26 :
         someOtherBlock();
         break;
 }

becomes something like this in assembly:

 jmp [VECTOR_LIST+eax];

where VECTOR_LIST would be a list of pointers to the address of the start of the someBlock and someOtherBlock functions. At least this would be the method if the switch were based on a numeric value. For string values I'd imagine eax would be replaced by a pointer to the location of a string for the comparison.

As you can see, the longer the if...else if... block became, the more efficient the switch block would become in comparison." ^[15]

^[15] Scott Porter, email to author, 16 July 2002.

Next, let's look at some ways to minimize loop overhead. Using the right techniques, you can speed up a for loop by two or even three times.

Hoist Loop-Invariant Code

Move loop-invariant code out of loops (otherwise called code motion out of loops) to speed their execution. Rather than recomputing the same value in each iteration, move it outside the loop and compute it only once. So instead of this:

 for (i=0;i<iter;i++) {
     d=Math.sqrt(y);
     j+=i*d;
 }

Do this:

 d=Math.sqrt(y);
 for (i=0;i<iter;i++) {
     j+=i*d;
 }

Reverse Loops

Reversing loop conditions so that they count down instead of up can double the speed of loops. Counting down to zero with the decrement operator (i--) is faster than counting up to a number of iterations with the increment operator (i++). So instead of this (see Listing 10.9):

Listing 10.9 A Normal for Loop Counts Up

 function loopNormal() {
     for (var i=0;i<iter;i++) {
         // do something here
     }
 }

Do this (see Listing 10.10):

Listing 10.10 A Reversed for Loop Counts Down

 function loopReverse() {
     for (var i=iter;i>0;i--) {
         // do something here
     }
 }

Flip Loops

Loop flipping moves the loop conditional from the top to the bottom of the loop. The theory is that the do while construct is faster than a for loop. So a normal loop (see Listing 10.9) would look like this flipped (see Listing 10.11):

Listing 10.11 A Flipped Loop Using do while

 function loopDoWhile() {
     var i=0;
     do {
         i++;
     } while (i<iter);
 }

In JavaScript, however, this technique gives poor results. IE 5 Mac gives inconsistent results, while IE and Netscape for Windows are 3.7 to 4 times slower. The problem is the complexity of the conditional and the increment operator. Remember that we're measuring loop overhead here, so small changes in structure and conditional strength can make a big difference. Instead, combine the flip with a reverse count (see Listing 10.12):

Listing 10.12 Flipped Loop with Reversed Count

 function loopDoWhileReverse() {
     var i=iter;
     do {
         i--;
     } while (i>0);
 }

This technique is more than twice as fast as a normal loop and slightly faster than a plain reversed loop (see Listing 10.10) in IE5 Mac. Even better, simplify the conditional even more by using the decrement as a conditional like this (see Listing 10.13):

Listing 10.13 Flipped Loop with Improved Reverse Count

 function loopDoWhileReverse2() {
     var i=iter-1;
     do {
         // do something here
     } while (i--);
 }

This technique is over three times faster than a normal for loop. Note the decrement operator doubles as a conditional; when it gets to zero, it evaluates as false. One final optimization is to substitute the pre-decrement operator for the post-decrement operator for the conditional (see Listing 10.14).

Listing 10.14 Flipped Loop with Optimized Reverse Count

 function loopDoWhileReverse3() {
     var i=iter;
     do {
         // do something here
     } while (--i);
 }

This technique is over four times faster than a normal for loop. This last condition assumes that i is greater than zero. Table 10.2 shows the results for each loop type listed previously for IE5 on my Mac PowerBook.

Table 10.2. Loop Optimizations Compared
	Normal	Do While	Reverse	Do While Reverse	Do While Reverse2	Do While Reverse3
Total time (ms)	2022	1958	1018	932	609	504
Cycle time (ms)	0.0040	0.0039	0.0020	0.0018	0.0012	0.0010
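
Before swapping one loop form for another, it is worth confirming that each form runs the body the same number of times. The following sketch counts body executions for three of the shapes above; the count* function names are hypothetical, and the do while forms assume iter is at least 1.

```javascript
// Each function returns how many times its loop body ran for a given iter.
function countNormal(iter) {              // shape of Listing 10.9
    var runs = 0;
    for (var i = 0; i < iter; i++) { runs++; }
    return runs;
}

function countDoWhileReverse2(iter) {     // shape of Listing 10.13
    var runs = 0, i = iter - 1;
    do { runs++; } while (i--);           // post-decrement: stops after the pass where i is 0
    return runs;
}

function countDoWhileReverse3(iter) {     // shape of Listing 10.14
    var runs = 0, i = iter;
    do { runs++; } while (--i);           // pre-decrement: stops when i reaches 0
    return runs;
}
```

For any iter of 1 or more, all three return iter. With an iter of 0, the do while forms still run the body once and then keep counting down past zero, so guard them with a test when the count can be zero.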

Unroll or Eliminate Loops

Unrolling a loop reduces the cost of loop overhead by decreasing the number of times you check the loop condition. Essentially, loop unrolling increases the number of computations per iteration. To unroll a loop, you perform two or more of the same statements for each iteration, and increment the counter accordingly. So instead of this:

 var iter = number_of_iterations;
 for (var i=0;i<iter;i++) {
     foo();
 }

Do this:

 var iter = multiple_of_number_of_unroll_statements;
 for (var i=0;i<iter;) {
     foo();i++;
     foo();i++;
     foo();i++;
     foo();i++;
     foo();i++;
     foo();i++;
 }

I've unrolled this loop six times, so the number of iterations must be a multiple of six. The effectiveness of loop unrolling depends on the number of operations per iteration. Again, the simpler, the better. For simple statements, loop unrolling in JavaScript can speed inner loops by as much as 50 to 65 percent. But what if the number of iterations is not known beforehand? That's where techniques like Duff's Device come in handy.

Duff's Device

Invented by programmer Tom Duff while he was at Lucasfilm Ltd. in 1983, ^[16] Duff's Device generalizes the loop unrolling process. Using this technique, you can unroll loops to your heart's content without knowing the number of iterations beforehand. The original algorithm combined a do-while and a switch statement. The technique combines loop unrolling, loop reversal, and loop flipping. So instead of this (see Listing 10.15):

^[16] Tom Duff, "Tom Duff on Duff's Device" [electronic mailing list], (Linköping, Sweden: Lysator Academic Computer Society, 10 November 1983 [archived reproduction]), available from the Internet at http://www.lysator.liu.se/c/duffs-device.html. Duff describes the loop unrolling technique he developed while at Lucasfilm Ltd.

Listing 10.15 Normal for Loop

 testVal=0;
 iterations=500125;
 for (var i=0;i<iterations;i++) {
     // modify testVal here
 }

Do this (see Listing 10.16):

Listing 10.16 Duff's Device

 function duffLoop(iterations) {
     var testVal=0;
     // Begin actual Duff's Device
     // Original JS Implementation by Jeff Greenberg 2/2001
     var n = iterations / 8;
     var caseTest = iterations % 8;
     do {
         switch (caseTest)
         {
         case 0: testVal++; // modify testVal here
         case 7: testVal++; // ditto
         case 6: testVal++;
         case 5: testVal++;
         case 4: testVal++;
         case 3: testVal++;
         case 2: testVal++;
         case 1: testVal++;
         }
         caseTest=0;
     }
     while (--n > 0);
 }

Like a normal unrolled loop, the main loop makes n = iterations/8 passes, each executing as many statements as the degree of unrolling (8, in this example). Unlike a normal unrolled loop, the modulus (caseTest = iterations % 8) handles the remainder of any leftover iterations through the switch/case logic: the first pass jumps into the middle of the case list and falls through, and every later pass starts at case 0. This technique is 8 to 44 percent faster in IE5+, and it is 94 percent faster in NS 4.7.

Fast Duff's Device

You can avoid the complex do/switch logic by unrolling Duff's Device into two loops. So instead of the original, do this (see Listing 10.17):

Listing 10.17 Fast Duff's Device

 function duffFastLoop8(iterations) {
     // from an anonymous donor to Jeff Greenberg's site
     var testVal=0;
     var n = iterations % 8;
     while (n--)
     {
         testVal++;
     }
     n = parseInt(iterations / 8);
     while (n--)
     {
         testVal++;
         testVal++;
         testVal++;
         testVal++;
         testVal++;
         testVal++;
         testVal++;
         testVal++;
     }
 }

This technique is about 36 percent faster than the original Duff's Device on IE5 Mac. Even better, optimize the loop constructs by converting the while decrement to a do while pre-decrement like this (see Listing 10.18):

Listing 10.18 Faster Duff's Device

 function duffFasterLoop8(iterations) {
     var testVal=0;
     var n = iterations % 8;
     if (n>0) {
         do
         {
             testVal++;
         }
         while (--n); // n must be greater than 0 here
     }
     n = parseInt(iterations / 8);
     if (n>0) { // guard: with fewer than 8 iterations, --n would never reach 0
         do
         {
             testVal++;
             testVal++;
             testVal++;
             testVal++;
             testVal++;
             testVal++;
             testVal++;
             testVal++;
         }
         while (--n);
     }
 }

This optimized Duff's Device is 39 percent faster than the original and 67 percent faster than a normal for loop (see Table 10.3).

Table 10.3. Duff's Device Improved
500,125 Iterations	Normal for Loop	Duff's Device	Duff's Fast	Duff's Faster
Total time (ms)	1437	775	493	469
Cycle time (ms)	0.00287	0.00155	0.00099	0.00094
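
The listings above only increment a counter. As a sketch of how the two-loop pattern of Listing 10.17 might be applied to real work, the following sums an array of numbers. The function name sumUnrolled is hypothetical, and Math.floor stands in for parseInt when computing the number of full groups.

```javascript
// Hypothetical example: sum an array using the two-loop pattern of Listing 10.17.
function sumUnrolled(data) {
    var total = 0;
    var i = 0;
    var n = data.length % 8;             // leftover elements first
    while (n--) {
        total += data[i++];
    }
    n = Math.floor(data.length / 8);     // then full groups of eight
    while (n--) {
        total += data[i++];
        total += data[i++];
        total += data[i++];
        total += data[i++];
        total += data[i++];
        total += data[i++];
        total += data[i++];
        total += data[i++];
    }
    return total;
}
```

Because both loops test n before running the body, this version handles an empty array and arrays shorter than eight elements without a separate guard.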

How Much to Unroll?

To test the effect of different degrees of loop unrolling, I tested large iteration loops with between 1 and 15 identical statements for the Faster Duff's Device. Table 10.4 shows the results.

Table 10.4. Faster Duff's Device Unrolled
Duff's Faster	1 Degree	2	3	4	5	6	7
Total time (ms)	925	661	576	533	509	490	482
Cycle time (ms)	0.00184	0.00132	0.00115	0.00106	0.00101	0.00097	0.00096
Duff's Faster	8	9	10	11	12	13	14	15
Total time (ms)	469	467	457	453	439	437	433	433
Cycle time (ms)	0.00093	0.00093	0.00091	0.00090	0.00087	0.00087	0.00086	0.00086

As you can see in Table 10.4, the effect diminishes as the degree of loop unrolling increases. Even after two statements, the time to loop through many iterations is less than 50 percent of a normal for loop. Around seven statements, the time is cut by two-thirds. Anything over eight reaches a point of diminishing returns. Depending on your requirements, I recommend that you choose to unroll critical loops by between four and eight statements for Duff's Device.

Fuse Loops

If you have two loops in close proximity that use the same number of iterations (and don't affect each other), you can combine them into one loop. So instead of this:

 for (i=0; i<j; i++) {
     sumserv += serv(i);
 }
 for (i=0; i<j; i++) {
     prodfoo *= foo(i);
 }

Do this:

 for (i=0; i<j; i++) {
     sumserv += serv(i);
     prodfoo *= foo(i);
 }

Fusing loops avoids the additional overhead of another loop control structure and is more compact.
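
A quick way to check that a fusion is safe is to run both versions on the same input and compare the results. In this sketch, serv and foo are hypothetical stand-ins with no side effects, which is the condition that makes fusing them valid.

```javascript
// Hypothetical stand-ins; neither reads nor changes what the other uses.
function serv(i) { return i + 1; }
function foo(i) { return i + 1; }

function separateLoops(j) {
    var sumserv = 0, prodfoo = 1, i;
    for (i = 0; i < j; i++) {
        sumserv += serv(i);
    }
    for (i = 0; i < j; i++) {
        prodfoo *= foo(i);
    }
    return [sumserv, prodfoo];
}

function fusedLoop(j) {
    var sumserv = 0, prodfoo = 1, i;
    for (i = 0; i < j; i++) {      // one loop control structure instead of two
        sumserv += serv(i);
        prodfoo *= foo(i);
    }
    return [sumserv, prodfoo];
}
```

If serv changed a value that foo depends on, the two versions could diverge, and the loops should stay separate.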