4.24 Accessing Elements of a Single Dimension Array

To access an element of a zero-based array, you can use the simplified formula:

 Element_Address = Base_Address + index * Element_Size 

For the Base_Address entry you can use the name of the array (because HLA associates the address of the first element of an array with the name of that array). The Element_Size entry is the number of bytes for each array element. If the object is an array of bytes, the Element_Size field is one (resulting in a very simple computation). If each element of the array is a word (or other two-byte type) then Element_Size is two. And so on. To access an element of the SixteenInts array in the previous section, you'd use the formula:

 Element_Address = SixteenInts + index*4 

The 80x86 code equivalent to the statement "EAX:=SixteenInts[index]" is

     mov( index, ebx );
     shl( 2, ebx );                  // Sneaky way to compute 4*ebx
     mov( SixteenInts[ ebx ], eax );

There are two important things to notice here. First of all, this code uses the shl instruction rather than the intmul instruction to compute 4*index. The main reason for choosing shl is efficiency: shl is a lot faster than intmul on many processors.
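For comparison, here is a sketch of an equivalent sequence that scales the index with intmul instead; it loads the same element, just less efficiently on many processors:

     mov( index, ebx );
     intmul( 4, ebx );               // EBX := index*4 (same result as shl( 2, ebx ))
     mov( SixteenInts[ ebx ], eax );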

The second thing to note about the original three-instruction sequence is that it does not explicitly compute the sum of the base address plus the index times four. Instead, it relies on the indexed addressing mode to implicitly compute this sum. The instruction "mov( SixteenInts[ ebx ], eax );" loads EAX from location SixteenInts+EBX, which is the base address plus index*4 (because EBX contains index*4). Sure, you could have used

     lea( eax, SixteenInts );
     mov( index, ebx );
     shl( 2, ebx );                  // Sneaky way to compute 4*ebx
     add( eax, ebx );                // Compute base address plus index*4
     mov( [ebx], eax );

in place of the previous sequence, but why use five instructions where three will do the same job? This is a good example of why you should know your addressing modes inside and out. Choosing the proper addressing mode can reduce the size of your program, thereby speeding it up.

Of course, as long as we're discussing efficiency improvements, it's worth pointing out that the 80x86 scaled indexed addressing modes let you automatically multiply an index by one, two, four, or eight. Because this current example multiplies the index by four, we can simplify the code even further by using the scaled indexed addressing mode:

     mov( index, ebx );
     mov( SixteenInts[ ebx*4 ], eax );

Note, however, that if you need to multiply by some constant other than one, two, four, or eight, then you cannot use the scaled indexed addressing modes. Similarly, if you need to multiply by some element size that is not a power of two, you will not be able to use the shl instruction to multiply the index by the element size; instead, you will have to use intmul or some other instruction sequence to do the multiplication.
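For example, suppose each element of a (hypothetical) RecArray occupied six bytes. Because six is neither a power of two nor one of the scaling factors the hardware supports, a sketch of the address calculation might look like this:

     // RecArray is a hypothetical array of six-byte elements.
     mov( index, ebx );
     intmul( 6, ebx );               // Six is not a power of two, so shl won't do.
     lea( eax, RecArray );
     add( ebx, eax );                // EAX now holds the address of RecArray[ index ].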

The indexed addressing mode on the 80x86 is a natural for accessing elements of a single dimension array. Indeed, its syntax even suggests an array access. The only thing to keep in mind is that you must remember to multiply the index by the size of an element. Failure to do so will produce incorrect results.
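To see why, consider this sketch of a common mistake that forgets the scaling step; it assembles without complaint but fetches the wrong data:

     // Incorrect: treats the element index as a byte offset.
     mov( index, ebx );
     mov( SixteenInts[ ebx ], eax ); // Loads from SixteenInts + index, not SixteenInts + index*4.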

Before moving on to multidimensional arrays, a couple of additional points about addressing modes and arrays are in order. The above sequences work great if you only access a single element from the SixteenInts array. However, if you access several different elements from the array within a short section of code, and you can afford to dedicate another register to the operation, you can certainly shorten your code and, perhaps, speed it up as well. Consider the following code sequence:

     lea( ebx, SixteenInts );
     mov( index, esi );
     mov( [ebx+esi*4], eax );

Now EBX contains the base address, and ESI contains the index value. Of course, this hardly appears to be a good trade-off. However, when accessing additional elements in SixteenInts you do not have to reload EBX with the base address of SixteenInts for the next access. The following sequence is a little shorter than the comparable sequence that doesn't load the base address into EBX:

     lea( ebx, SixteenInts );
     mov( index, esi );
     mov( [ebx+esi*4], eax );
      .
      .                             // Assumption: EBX is left alone
      .                             //    through this code.
     mov( index2, esi );
     mov( [ebx+esi*4], eax );

This code is slightly shorter because the "mov( [ebx+esi*4], eax);" instruction is slightly shorter than the "mov( SixteenInts[ebx*4], eax);" instruction. Of course the more accesses to SixteenInts you make without reloading EBX, the greater your savings will be. Tricky little code sequences such as this one sometimes pay off handsomely. However, the savings depend entirely on which processor you're using. Code sequences that run faster on one 80x86 CPU might actually run slower on a different CPU. Unfortunately, if speed is what you're after there are no hard and fast rules. In fact, it is very difficult to predict the speed of most instructions on the 80x86 CPUs.

4.24.1 Sorting an Array of Values

Almost every textbook on this planet gives an example of a sort when introducing arrays. Because you've probably seen how to do a sort in high level languages already, it's probably instructive to take a quick look at a sort in HLA. The example code in this section will use a variant of the bubble sort, which is great for short lists of data and lists that are nearly sorted, but horrible for just about everything else.[28]

const
     NumElements := 16;

static
     DataToSort: uns32[ NumElements ] :=
                    [
                         1,  2, 16, 14,
                         3,  9,  4, 10,
                         5,  7, 15, 12,
                         8,  6, 11, 13
                    ];
     NoSwap: boolean;

          .
          .
          .

     // Bubble sort for the DataToSort array:

     repeat

          mov( true, NoSwap );
          for( mov( 0, ebx ); ebx <= NumElements-2; inc( ebx )) do

               mov( DataToSort[ ebx*4 ], eax );
               if( eax > DataToSort[ ebx*4 + 4 ] ) then

                    mov( DataToSort[ ebx*4 + 4 ], ecx );
                    mov( ecx, DataToSort[ ebx*4 ] );
                    mov( eax, DataToSort[ ebx*4 + 4 ] );  // Note: EAX contains
                    mov( false, NoSwap );                 // DataToSort[ ebx*4 ]

               endif;

          endfor;

     until( NoSwap );

The bubble sort works by comparing adjacent elements in an array. The interesting thing to note in this code fragment is how it compares adjacent elements. You will note that the if statement compares EAX (which contains "DataToSort[ebx*4]") against "DataToSort[EBX*4 + 4]". Because each element of this array is four bytes (uns32), the index "[EBX*4 + 4]" references the next element beyond "[EBX*4]".

As is typical for a bubble sort, this algorithm terminates if the innermost loop completes without swapping any data. If the data is already presorted, then the bubble sort is very efficient, making only one pass over the data. Unfortunately, if the data is not sorted (worst case, if the data is sorted in reverse order), then this algorithm is extremely inefficient. Indeed, although it is possible to modify the code above so that, on the average, it runs about twice as fast, such optimizations are wasted on such a poor algorithm. However, the bubble sort is very easy to implement and understand (which is why introductory texts continue to use it in examples).
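As an aside, one common refinement (a sketch, not code from the text, and not necessarily the improvement alluded to above) shrinks the region scanned on each pass, because every pass bubbles the largest remaining value into its final position:

     // Variant: EDX limits the inner loop; each pass makes one fewer comparison.
     mov( NumElements-2, edx );
     repeat

          mov( true, NoSwap );
          for( mov( 0, ebx ); ebx <= edx; inc( ebx )) do

               mov( DataToSort[ ebx*4 ], eax );
               if( eax > DataToSort[ ebx*4 + 4 ] ) then

                    mov( DataToSort[ ebx*4 + 4 ], ecx );
                    mov( ecx, DataToSort[ ebx*4 ] );
                    mov( eax, DataToSort[ ebx*4 + 4 ] );
                    mov( false, NoSwap );

               endif;

          endfor;
          breakif( edx = 0 );        // Region is down to two elements; array is sorted.
          dec( edx );                // Element [edx+1] is now in place; shrink the region.

     until( NoSwap );

This does not change the worst-case behavior, of course; it merely reduces the number of comparisons by roughly half on average.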

[28]Fear not; you'll see some better sorting algorithms in the chapter on procedures and recursion.



