4.25 Multidimensional Arrays

The 80x86 hardware can easily handle single dimension arrays. Unfortunately, there is no magic addressing mode that lets you easily access elements of multidimensional arrays. That's going to take some work and several instructions.

Before discussing how to declare or access multidimensional arrays, it would be a good idea to figure out how to implement them in memory. The first problem is to figure out how to store a multidimensional object into a one-dimensional memory space.

Consider for a moment a Pascal array of the form "A:array[0..3,0..3] of char;". This array contains 16 bytes organized as four rows of four characters. Somehow you've got to draw a correspondence with each of the 16 bytes in this array and 16 contiguous bytes in main memory. Figure 4-5 shows one way to do this:

click to expand
Figure 4-5: Mapping a 4x4 Array to Sequential Memory Locations.

The actual mapping is not important as long as two things occur: (1) each element maps to a unique memory location, (that is, no two entries in the array occupy the same memory locations) and (2) the mapping is consistent. That is, a given element in the array always maps to the same memory location. So what you really need is a function with two input parameters (row and column) that produce an offset into a linear array of 16 memory locations.

Now any function that satisfies the above constraints will work fine. Indeed, you could randomly choose a mapping as long as it was unique. However, what you really want is a mapping that is efficient to compute at runtime and works for any-sized array (not just 4x4 or even limited to two dimensions). While there are a large number of possible functions that fit this bill, there are two functions in particular that most programmers and most high level languages use: row major ordering and column major ordering.

4.25.1 Row Major Ordering

Row major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations. This mapping is demonstrated in Figure 4-6.

click to expand
Figure 4-6: Row Major Array Element Ordering.

Row major ordering is the method employed by most high level programming languages including Pascal, C/C++, Java, Ada, and Modula-2. It is very easy to implement and easy to use in machine language. The conversion from a twodimensional structure to a linear array is very intuitive. You start with the first row (row number zero) and then concatenate the second row to its end. You then concatenate the third row to the end of the list, then the fourth row, and so on (see Figure 4-7).

click to expand
Figure 4-7: Another View of Row Major Ordering for a 4x4 Array.

The actual function that converts a list of index values into an offset is a slight modification of the formula for computing the address of an element of a single dimension array. The formula to compute the offset for a 4x4 two-dimensional row major ordered array is

 Element_Address = Base_Address + (colindex * row_size + rowindex) * Element_Size

As usual, Base_Address is the address of the first element of the array (A[0][0] in this case), and Element_Size is the size of an individual element of the array in bytes. colindex is the leftmost index, and rowindex is the rightmost index into the array. Row_size is the number of elements in one row of the array (four, in this case, because each row has four elements). Assuming Element_Size is one, this formula computes the following offsets from the base address:

 Column      Row     Offset into Array Index       Index 0           0               0 0           1               1 0           2               2 0           3               3 1           0               4 1           1               5 1           2               6 1           3               7 2           0               8 2           1               9 2           2               10 2           3               11 3           0               12 3           1               13 3           2               14 3           3               15

For a three-dimensional array, the formula to compute the offset into memory is the following:

 Address = Base + ((depthindex*col_size+colindex) * row_size + rowindex) *           Element_Size

col_size is the number of items in a column; row_size is the number of items in a row. In C/C++, if you've declared the array as "type A[i] [j] [k];" then row_size is equal to k and col_size is equal to j.

For a four-dimensional array, declared in C/C++ as "type A[i] [j] [k] [m];" the formula for computing the address of an array element is

 Address = Base + (((LeftIndex * depth_size + depthindex)*col_size+colindex) * row_size + rowindex) * Element_Size

depth_size is equal to j, col_size is equal to k, and row_size is equal to m. leftIndex represents the value of the leftmost index.

By now you're probably beginning to see a pattern. There is a generic formula that will compute the offset into memory for an array with any number of dimensions; however, you'll rarely use more than four.

Another convenient way to think of row major arrays is as arrays of arrays. Consider the following single dimension Pascal array definition:

 A: array [0..3] of sometype;

Assume that sometype is the type "sometype = array [0..3] of char;".

A is a single dimension array. Its individual elements happen to be arrays, but you can safely ignore that for the time being. The formula to compute the address of an element of a single dimension array is

 Element_Address = Base + Index * Element_Size

In this case Element_Size happens to be four because each element of A is an array of four characters. So what does this formula compute? It computes the base address of each row in this 4x4 array of characters (see Figure 4-8):

click to expand
Figure 4-8: Viewing a 4x4 Array as an Array of Arrays.

Of course, once you compute the base address of a row, you can reapply the single dimension formula to get the address of a particular element. While this doesn't affect the computation at all, conceptually it's probably a little easier to deal with several single dimension computations rather than a complex multidimensional array element address computation.

Consider a Pascal array defined as "A:array [0..3] [0..3] [0..3] [0..3] [0..3] of char;". You can view this five-dimension array as a single dimension array of arrays. The following Pascal code provides such a definition:

 type           OneD = array [0..3] of char;           TwoD = array [0..3] of OneD;           ThreeD = array [0..3] of TwoD;           FourD = array [0..3] of ThreeD; var           A : array [0..3] of FourD;

The size of OneD is four bytes. Because TwoD contains four OneD arrays, its size is 16 bytes. Likewise, ThreeD is four TwoDs, so it is 64 bytes long. Finally, FourD is four ThreeDs, so it is 256 bytes long. To compute the address of "A [b, c, d, e, f]" you could use the following steps:

Compute the address of A [b] as "Base + b * size". Here size is 256 bytes. Use this result as the new base address in the next computation.
Compute the address of A [b, c] by the formula "Base + c*size", where Base is the value obtained immediately above, and size is 64. Use the result as the new base in the next computation.
Compute the base address of A [b, c, d] by "Base + d*size" with Base coming from the previous computation and size being 16. Use the result as the new base in the next computation.
Compute the address of A [b, c, d, e] with the formula "Base + e*size" with Base from the preceding computation and size being four. Use this value as the base for the next computation.
Finally, compute the address of A [b, c, d, e, f] using the formula "Base + f*size" where Base comes from the above computation and size is one (obviously you can simply ignore this final multiplication). The result you obtain at this point is the address of the desired element.

One of the main reasons you won't find higher-dimension arrays in assembly language is that assembly language emphasizes the inefficiencies associated with such access. It's easy to enter something like "A [b,c,d,e,f]" into a Pascal program, not realizing what the compiler is doing with the code. Assembly language programmers are not so cavalier: They see the mess you wind up with when you use higher-dimension arrays. Indeed, good assembly language programmers try to avoid two-dimension arrays and often resort to tricks in order to access data in such an array when its use becomes absolutely mandatory. But more on that a little later.

4.25.2 Column Major Ordering

Column major ordering is the other function frequently used to compute the address of an array element. FORTRAN and various dialects of BASIC (e.g., older versions of Microsoft BASIC) use this method to index arrays.

In row major ordering the rightmost index increases the fastest as you moved through consecutive memory locations. In column major ordering the leftmost index increases the fastest. Pictorially, a column major ordered array is organized as shown in Figure 4-9.

click to expand
Figure 4-9: Column Major Array Element Ordering.

The formulae for computing the address of an array element when using column major ordering is very similar to that for row major ordering. You simply reverse the indexes and sizes in the computation:

 For a two-dimension column major array: Element_Address = Base_Address + (rowindex * col_size + colindex) * Element_Size For a three-dimension column major array: Address = Base + ((rowindex*col_size+colindex) * depth_size + depthindex) * Element_Size For a four-dimension column major array: Address =      Base + (((rowindex * col_size + colindex)*depth_size + depthindex) *           Left_size + Leftindex) * Element_Size