7.3 RecordsStructures | The Art of Assembly Language

7.3 Records/Structures

Another major composite data structure is the Pascal record or C/C++ structure. The Pascal terminology is probably better, as it avoids confusion with the term data structure. Therefore, we'll adopt the term record here.

An array is homogeneous, meaning that its elements are all of the same type. A record, on the other hand, is heterogeneous and its elements can have differing types. The purpose of a record is to let you encapsulate logically related values into a single object.

Arrays let you select a particular element via an integer index. With records, you must select an element, known as a field, by the field's name. Each of the field names within the record must be unique. That is, the same field name may not appear two or more times in the same record. However, all field names are local to their record, and you may reuse those names elsewhere in the program.

7.3.1 Records in Pascal/Delphi

Here's a typical record declaration for a Student data type in Pascal/Delphi:

 type
     Student =
         record
             Name:     string[64];
             Major:    smallint;   // 2-byte integer in Delphi
             SSN:      string[11];
             Mid1:     smallint;
             Mid2:     smallint;
             Final:    smallint;
             Homework: smallint;
             Projects: smallint;
         end;

Many Pascal compilers allocate all of the fields in contiguous memory locations. This means that Pascal will reserve the first 65 bytes for the name, ^[3] the next 2 bytes hold the major code, the next 12 bytes the Social Security number, and so on.

7.3.2 Records in C/C++

Here's the same declaration in C/C++:

 typedef
     struct
     {
         char Name[65]; // Room for a 64-character zero-terminated string.
         short Major;   // Typically a 2-byte integer in C/C++
         char SSN[12];  // Room for an 11-character zero-terminated string.
         short Mid1;
         short Mid2;
         short Final;
         short Homework;
         short Projects;
     } Student;

7.3.3 Records in HLA

In HLA, you can also create structure types using the record/endrecord declaration. You would encode the record from the previous sections as follows:

 type
     Student:
         record
             Name: char[65]; // Room for a 64-character
                             // zero-terminated string.
             Major: int16;
             SSN: char[12];  // Room for an 11-character
                             // zero-terminated string.
             Mid1: int16;
             Mid2: int16;
             Final: int16;
             Homework: int16;
             Projects: int16;
         endrecord;

As you can see, the HLA declaration is very similar to the Pascal declaration. Note that to stay consistent with the Pascal declaration, this example uses character arrays rather than strings for the Name and SSN (Social Security number) fields. In a typical HLA record declaration, you'd probably use a string type for at least the Name field (keeping in mind that a string variable is a four-byte pointer).

7.3.4 Memory Storage of Records

The following Pascal example demonstrates a typical Student variable declaration:

 var     John: Student;

Given the earlier declaration for the Pascal Student data type, this allocates 89 bytes of storage (65 + 2 + 12 + 5 × 2), laid out in memory as shown in Figure 7-8.

Figure 7-8: Student data structure storage in memory

If the label John corresponds to the base address of this record, then the Name field is at offset John + 0, the Major field is at offset John + 65, the SSN field is at offset John + 67, and so on.
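The offsets above can be checked with a short C sketch. It assumes a compiler that honors #pragma pack (GCC, Clang, and MSVC do) and a 2-byte short; the names PackedStudent and read_major are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

/* Assumption: #pragma pack(1) removes all padding, which matches the
   gap-free layout described above. This is compiler-specific. */
#pragma pack(push, 1)
typedef struct {
    char  Name[65];   /* offset 0  */
    short Major;      /* offset 65 */
    char  SSN[12];    /* offset 67 */
    short Mid1;       /* offset 79 */
    short Mid2;
    short Final;
    short Homework;
    short Projects;
} PackedStudent;
#pragma pack(pop)

/* Hypothetical helper: fetch Major by adding its numeric offset to the
   record's base address, which is what the dot operator does for you.
   memcpy avoids dereferencing a misaligned short pointer. */
static short read_major(const PackedStudent *s) {
    short major;
    memcpy(&major, (const char *)s + offsetof(PackedStudent, Major),
           sizeof major);
    return major;
}
```

Calling read_major(&john) returns the same value as john.Major; the compiler simply performs the base-plus-offset arithmetic on your behalf.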

Most programming languages let you refer to a record field by its name rather than by its numeric offset into the record. The typical syntax for field access uses the dot operator to select a field from a record variable. Given the variable John from the previous example, here's how you could access various fields in this record:

 John.Mid1 = 80;           // C/C++ example
 John.Final := 93;         (* Pascal Example *)
 mov( 75, John.Projects ); // HLA example

Figure 7-8 suggests that all fields of a record appear in memory in the order of their declaration. In theory, a compiler can freely place the fields anywhere in memory that it chooses. In practice, though, almost every compiler places the fields in memory in the same order they appear within the record declaration. The first field usually appears at the lowest address in the record, the second field appears at the next highest address, the third field follows the second field in memory, and so on.

Figure 7-8 also suggests that compilers pack the fields into adjacent memory locations with no gaps between the fields. While this is true for many languages, this certainly isn't the most common memory organization for a record. For performance reasons, most compilers will actually align the fields of a record on appropriate memory boundaries. The exact details vary by language, compiler implementation, and CPU, but a typical compiler will place fields at an offset within the record's storage area that is 'natural' for that particular field's data type. On the 80x86, for example, compilers that follow the Intel ABI (application binary interface) will allocate one-byte objects at any offset within the record, words only at even offsets, and double-word or larger objects on double-word boundaries. Although not all 80x86 compilers support the Intel ABI, most do, which allows records to be shared among functions and procedures written in different languages on the 80x86. Other CPU manufacturers provide their own ABI for their processors and programs that adhere to an ABI can share binary data at run time with other programs that adhere to the same ABI.
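As a rough illustration of natural alignment, the following C11 sketch checks that each field of a record sits at an offset that is a multiple of its own alignment. The struct name Natural, its field names, and the helper fields_are_aligned are hypothetical; the actual offsets depend on your compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>    /* offsetof */
#include <stdint.h>
#include <stdalign.h>  /* alignof (C11) */

typedef struct {
    uint8_t  b;   /* one byte: may sit at any offset        */
    uint16_t w;   /* word: typically placed at an even offset */
    uint32_t d;   /* double word: typically a multiple of 4   */
} Natural;

/* Returns 1 when every field lies on a multiple of its type's alignment. */
static int fields_are_aligned(void) {
    return offsetof(Natural, b) % alignof(uint8_t)  == 0
        && offsetof(Natural, w) % alignof(uint16_t) == 0
        && offsetof(Natural, d) % alignof(uint32_t) == 0;
}
```

On a typical 80x86 compiler you would expect the offsets 0, 2, and 4 here, with one padding byte after b, but that is an expectation about common ABIs rather than a guarantee of the C language.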

In addition to aligning the fields of a record at reasonable offset boundaries, most compilers will also ensure that the length of the entire record is a multiple of two, four, or eight bytes. They accomplish this by adding padding bytes at the end of the record to fill out the record's size. Compilers pad the record so that its length is an integral multiple of the size of the largest scalar (noncomposite) object in the record or of the CPU's optimal alignment size, whichever is smaller. For example, if a record has fields whose lengths are one, two, four, eight, and ten bytes, then an 80x86 compiler will generally pad the record's length so that it is a multiple of eight. This allows you to create an array of records and be assured that each record in the array starts at a reasonable address in memory.
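The effect of trailing padding on arrays can be sketched in C as follows. The struct Padded and the helper element_stride are hypothetical names, and the sizes mentioned in the comments are typical values rather than guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdalign.h>  /* alignof (C11) */

typedef struct {
    double d;   /* typically 8 bytes                       */
    char   c;   /* 1 byte, followed by trailing padding    */
} Padded;

/* Byte distance between two consecutive elements of an array of Padded.
   It always equals sizeof(Padded), trailing padding included. */
static size_t element_stride(void) {
    Padded a[2];
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```

Because sizeof(Padded) is a multiple of the record's alignment, the second element of the array begins at an address that is just as well aligned as the first. On many 80x86 compilers sizeof(Padded) is 16 rather than 9, though the exact figure is implementation dependent.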

Although some CPUs don't allow access to objects in memory at misaligned addresses, many compilers allow you to disable the automatic alignment of fields within a record. Generally, the compiler will have an option you can use to globally disable this feature. Many of these compilers also provide a pragma or a packed keyword of some sort that lets you turn off field alignment on a record-by-record basis. Disabling the automatic field alignment feature may allow you to save some memory by eliminating the padding bytes between the fields (and at the end of the record), provided that field misalignment is acceptable on your CPU. The cost, of course, is that the program may run a little bit slower when it needs to access misaligned values in memory.
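A minimal C sketch of the trade-off, assuming a compiler that supports #pragma pack (a compiler-specific extension, not part of standard C). The type names Aligned and Packed and the helper bytes_saved are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding between fields. */
typedef struct { uint8_t b; uint16_t w; uint32_t d; } Aligned;

/* Assumption: #pragma pack(1) disables field alignment for this record only. */
#pragma pack(push, 1)
typedef struct { uint8_t b; uint16_t w; uint32_t d; } Packed;
#pragma pack(pop)

/* Number of padding bytes eliminated by packing this particular record. */
static size_t bytes_saved(void) {
    return sizeof(Aligned) - sizeof(Packed);
}
```

With packing in force the record occupies exactly 1 + 2 + 4 = 7 bytes. The saving per record is small, but it can add up across a large array, at the price of possibly slower access to the misaligned w and d fields.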

One reason to use a packed record is to gain manual control over the alignment of the fields within the record. For example, suppose you have a couple of functions written in two different languages, and both of these functions need to access some data in a record. Further, suppose that the two compilers for these functions do not use the same field alignment algorithm. A record declaration like the following (in Pascal) may not be compatible with the way both functions access the record data:

 type
     aRecord = record
         bField : byte;  (* assume Pascal compiler supports a byte type *)
         wField : word;  (* assume Pascal compiler supports a word type *)
         dField : dword; (* assume Pascal compiler supports a double-word type *)
     end; (* record *)

The problem here is that the first compiler could use the offsets zero, two, and four for the bField, wField, and dField fields, respectively, while the second compiler might use offsets zero, four, and eight.

Suppose, however, that the first compiler allows you to specify the packed keyword before the record keyword, causing the compiler to store each field immediately following the previous one. Although using the packed keyword by itself will not make the records compatible with both functions, it will allow you to manually add padding fields to the record declaration, as follows:

 type
     aRecord = packed record
         bField   : byte;
         padding0 : array[0..2] of byte; (* add padding to dword align wField *)
         wField   : word;
         padding1 : word;                (* add padding to dword align dField *)
         dField   : dword;
     end; (* record *)
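The same trick can be sketched in C, again assuming a compiler that supports #pragma pack. The type name ARecord is a hypothetical C counterpart of the Pascal record above; the compile-time checks document the offsets that the other compiler is assumed to use (zero, four, and eight).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Packing removes the compiler's own padding, so the explicit padding
   fields below are the only gaps in the record. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  bField;
    uint8_t  padding0[3];  /* pad so wField lands at offset 4 */
    uint16_t wField;
    uint16_t padding1;     /* pad so dField lands at offset 8 */
    uint32_t dField;
} ARecord;
#pragma pack(pop)

/* Fail the build if the layout ever drifts from the agreed offsets. */
static_assert(offsetof(ARecord, wField) == 4, "wField must be at offset 4");
static_assert(offsetof(ARecord, dField) == 8, "dField must be at offset 8");
```

Compile-time assertions like these take some of the pain out of maintaining hand-padded records, because a change that disturbs the layout is caught before the program runs.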

Maintaining code where you've handled the padding in a manual fashion can be a real chore. However, if incompatible compilers need to share data, this is a trick worth knowing because it can make data sharing possible. For the exact details concerning packed records, you'll have to consult your language's reference manual.

^[3] Pascal strings usually require an extra byte, in addition to all the characters in the string, to encode the length.