14.9 Object Implementation

In a high level object-oriented language like C++ or Delphi, it is quite possible to master the use of objects without really understanding how the machine implements them. One of the reasons for learning assembly language programming is to fully comprehend low level implementation details so you can make educated decisions concerning the use of programming constructs like objects. Further, because assembly language allows you to poke around with data structures at a very low level, knowing how HLA implements objects can help you create certain algorithms that would not be possible without a detailed knowledge of object implementation. Therefore, this section and its corresponding subsections explain the low level implementation details you will need to know in order to write object-oriented HLA programs.

HLA implements objects in a manner quite similar to records. In particular, HLA allocates storage for all var objects in a class in a sequential fashion, just like records. Indeed, if a class consists of only var data fields, the memory representation of that class is nearly identical to that of a corresponding record declaration. Consider the Student record declaration taken from Chapter 4 and the corresponding class (see Figures 14-1 and 14-2, respectively):

 type
      student: record
           Name:     char[65];
           Major:    int16;
           SSN:      char[12];
           Midterm1: int16;
           Midterm2: int16;
           Final:    int16;
           Homework: int16;
           Projects: int16;
      endrecord;

      student2: class
           var
                Name:     char[65];
                Major:    int16;
                SSN:      char[12];
                Midterm1: int16;
                Midterm2: int16;
                Final:    int16;
                Homework: int16;
                Projects: int16;
      endclass;

Figure 14-1: Student RECORD Implementation in Memory.

Figure 14-2: Student CLASS Implementation in Memory.

If you look carefully at Figures 14-1 and 14-2, you'll discover that the only difference between the class and the record implementations is the inclusion of the VMT (virtual method table) pointer field at the beginning of the class object. This field, which is always present in a class, contains the address of the class's virtual method table, which in turn contains the addresses of all the class's methods and iterators. The VMT pointer field, by the way, is present even if a class doesn't contain any methods or iterators.
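The difference between the two layouts can be checked with a small model. The following is only a sketch in C, not HLA output: the names Student, Student2, pVMT, and vmt_overhead are hypothetical, the 32-bit VMT pointer is modeled as a uint32_t, and the packing pragma reflects the assumption that the record fields are laid out without padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the record fields are packed with no padding between them. */
#pragma pack(push, 1)

/* Hypothetical C stand-in for the HLA "student" record. */
typedef struct {
    char    Name[65];
    int16_t Major;
    char    SSN[12];
    int16_t Midterm1;
    int16_t Midterm2;
    int16_t Final;
    int16_t Homework;
    int16_t Projects;
} Student;

/* Hypothetical C stand-in for the "student2" class: the same fields,
   preceded by a 32-bit VMT pointer (modeled here as a uint32_t). */
typedef struct {
    uint32_t pVMT;
    char     Name[65];
    int16_t  Major;
    char     SSN[12];
    int16_t  Midterm1;
    int16_t  Midterm2;
    int16_t  Final;
    int16_t  Homework;
    int16_t  Projects;
} Student2;

#pragma pack(pop)

/* The VMT pointer sits at offset zero and shifts every field by four bytes. */
static_assert(offsetof(Student2, pVMT) == 0, "VMT pointer comes first");
static_assert(offsetof(Student2, Name) == offsetof(Student, Name) + 4,
              "fields follow the VMT pointer");

/* Extra bytes a class instance carries compared with the record. */
size_t vmt_overhead(void)
{
    return sizeof(Student2) - sizeof(Student);
}
```

In this model, vmt_overhead() reports the four bytes of the VMT pointer, which is the only per-object cost the class adds.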

As pointed out in previous sections, HLA does not allocate storage for static objects within the object. Instead, HLA allocates a single instance of each static data field that all objects share. As an example, consider the following class and object declarations:

 type
      tHasStatic: class
           var
                i: int32;
                j: int32;
                r: real32;
           static
                c: char[2];
                b: byte;
      endclass;

 var
      hs1: tHasStatic;
      hs2: tHasStatic;

Figure 14-3 shows the storage allocation for these two objects in memory.

Figure 14-3: Object Allocation with Static Data Fields.
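A rough C model of this arrangement follows. It is a sketch under stated assumptions rather than what HLA emits: the names tHasStatic_c, tHasStatic_b, field_b, and field_c are hypothetical, and the VMT pointer is again modeled as a 32-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-object storage: the VMT pointer plus the var fields only. */
typedef struct {
    uint32_t pVMT;   /* stand-in for the 32-bit VMT pointer */
    int32_t  i;
    int32_t  j;
    float    r;      /* real32 */
} tHasStatic;

/* The static fields exist once, outside every object. */
static char    tHasStatic_c[2];
static uint8_t tHasStatic_b;

/* Two objects, as in the var section above. */
static tHasStatic hs1, hs2;

static_assert(sizeof(tHasStatic) == 16, "static fields add nothing to the object");

/* Reaching a static field "through" an object ignores the object:
   the address is the same no matter which instance is named. */
uint8_t *field_b(tHasStatic *obj)
{
    (void)obj;
    return &tHasStatic_b;
}

char *field_c(tHasStatic *obj)
{
    (void)obj;
    return tHasStatic_c;
}
```

Writing through field_b(&hs1) is visible through field_b(&hs2), whereas hs1.i and hs2.i remain independent.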

Of course, const, val, and #macro objects do not have any run-time memory requirements associated with them, so HLA does not allocate any storage for these fields. Like the static data fields, you may access const, val, and #macro fields using the class name as well as an object name. Hence, even if tHasStatic has these types of fields, the memory organization for tHasStatic objects would still be the same as shown in Figure 14-3.

Other than the presence of the virtual method table (VMT) pointer, the presence of methods and procedures has no impact on the storage allocation of an object. Of course, the machine instructions associated with these routines do appear somewhere in memory. So in a sense the code for the routines is quite similar to static data fields insofar as all the objects share a single instance of the routine.

14.9.1 Virtual Method Tables

When HLA calls a class procedure, it directly calls that procedure using a call instruction, just like any normal procedure call. Methods are another story altogether. Each object in the system carries a pointer to a virtual method table, which is an array of pointers to all the methods and iterators appearing within the object's class (see Figure 14-4).

Figure 14-4: Virtual Method Table Organization.

Each iterator or method you declare in a class has a corresponding entry in the virtual method table. That double word entry contains the address of the first instruction of that iterator or method. Calling a class method or iterator is a bit more work than calling a class procedure (it requires one additional instruction plus the use of the EDI register). Here is a typical calling sequence for a method:

 mov( ObjectAdrs, ESI );           // All class routines do this.
 mov( [esi], edi );                // Get the address of the VMT into EDI.
 call( (type dword [edi+n]) );     // "n" is the offset of the method's
                                   // entry in the VMT.

For a given class there is only one copy of the VMT in memory. This is a static object so all objects of a given class type share the same VMT. This is reasonable since all objects of the same class type have exactly the same methods and iterators (see Figure 14-5).

Figure 14-5: All Objects That Are the Same Class Type Share the Same VMT.
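The calling sequence and the shared table can be mimicked in C with a table of function pointers. This is a minimal sketch, not HLA's generated code: Object, Method, get_value, get_double, object_init, and call_method are hypothetical names, and the local variables esi and edi merely echo the registers used above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Object Object;
typedef int (*Method)(Object *self);

struct Object {
    const Method *pVMT;   /* first field: address of the class's VMT */
    int value;
};

static_assert(offsetof(Object, pVMT) == 0, "VMT pointer is at offset zero");

static int get_value(Object *self)  { return self->value; }
static int get_double(Object *self) { return 2 * self->value; }

/* One read-only table per class, shared by every object of that class. */
static const Method Object_VMT[] = { get_value, get_double };

void object_init(Object *obj, int value)
{
    obj->pVMT  = Object_VMT;
    obj->value = value;
}

/* Mirrors the three-instruction sequence; "slot" plays the role of n / 4. */
int call_method(Object *obj, size_t slot)
{
    Object *esi = obj;                /* mov( ObjectAdrs, esi );       */
    const Method *edi = esi->pVMT;    /* mov( [esi], edi );            */
    return edi[slot](esi);            /* call( (type dword [edi+n]) ); */
}
```

Two objects initialized with object_init hold the same pVMT value, so a call through slot 1 runs the same get_double code for both and differs only in the object passed along.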

Although HLA builds the VMT record structure as it encounters methods and iterators within a class, HLA does not automatically create the virtual method table for you. You must explicitly declare this table in your program. To do this, you include a statement like the following in a static or readonly declaration section of your program, e.g.,

 readonly
      VMT( classname );

Because the addresses in a virtual method table should never change during program execution, the readonly section is probably the best choice for declaring VMTs. It should go without saying that changing the pointers in a VMT is, in general, a really bad idea. So putting VMTs in a static section is usually not a good idea.

A declaration like the one above defines the variable classname._VMT_. In the section on constructors coming up later this chapter, you will see that you need this name when initializing object variables. The class declaration automatically defines the classname._VMT_ symbol as an external static variable. The declaration above just provides the actual definition for this external symbol.

The declaration of a VMT uses a somewhat strange syntax because you aren't actually declaring a new symbol with this declaration; you're simply supplying the data for a symbol that you previously declared implicitly by defining a class. That is, the class declaration defines the static table variable classname._VMT_; all you're doing with the VMT declaration is telling HLA to emit the actual data for the table. If, for some reason, you would like to refer to this table using a name other than classname._VMT_, HLA does allow you to prefix the declaration above with a variable name, e.g.,

 readonly
      myVMT: VMT( classname );

In this declaration, myVMT is an alias of classname._VMT_. As a general rule, you should avoid aliases in a program because they make the program more difficult to read and understand. Therefore, it is unlikely that you would ever really need to use this type of declaration.

Like any other global static variable, there should be only one instance of a VMT for a given class in a program. The best place to put the VMT declaration is in the same source file as the class's method, iterator, and procedure code (assuming they all appear in a single file). This way you will automatically link in the VMT whenever you link in the routines for a given class.
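The split between the class declaration (which only announces the table) and the VMT declaration (which supplies its data) resembles the split between an extern declaration and a definition in C. The sketch below illustrates that analogy only. Counter, Counter_VMT, counter_init, and counter_next are hypothetical names, and nothing here is HLA syntax.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Counter Counter;
typedef int (*Method)(Counter *self);

struct Counter {
    const Method *pVMT;
    int count;
};

/* Role of the class declaration: announce the table as an external symbol. */
extern const Method Counter_VMT[1];

static_assert(sizeof(Counter_VMT) / sizeof(Counter_VMT[0]) == 1,
              "one table entry per method");

/* Code may refer to the symbol before its data has been supplied. */
void counter_init(Counter *c)
{
    c->pVMT  = Counter_VMT;
    c->count = 0;
}

static int counter_next(Counter *self) { return ++self->count; }

/* Role of "readonly VMT( classname );": supply the table's data, exactly once. */
const Method Counter_VMT[1] = { counter_next };
```

As with the HLA declaration, defining the table a second time elsewhere in the program would be an error, which is one reason to keep the definition next to the method code.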

14.9.2 Object Representation with Inheritance

Up to this point, the discussion of the implementation of class objects has ignored the possibility of inheritance. Inheritance only affects the memory representation of an object by adding fields that are not explicitly stated in the class declaration.

Adding inherited fields from a base class to another class must be done carefully. Remember, an important attribute of a class that inherits fields from a base class is that you can use a pointer to the base class to access the inherited fields from that base class, even if the pointer contains the address of some other class (that inherits the fields from the base class). As an example, consider the following classes:

 type
      tBaseClass: class
           var
                i: uns32;
                j: uns32;
                r: real32;
           method mBase;
      endclass;

      tChildClassA: class inherits( tBaseClass )
           var
                c: char;
                b: boolean;
                w: word;
           method mA;
      endclass;

      tChildClassB: class inherits( tBaseClass )
           var
                d: dword;
                c: char;
                a: byte[3];
      endclass;

Because both tChildClassA and tChildClassB inherit the fields of tBaseClass, these two child classes include the i, j, and r fields as well as their own specific fields. Furthermore, whenever you have a pointer variable whose base type is tBaseClass, it is legal to load this pointer with the address of any child class of tBaseClass; therefore, it is perfectly reasonable to load such a pointer with the address of a tChildClassA or tChildClassB variable. For example:

 var
      B1:  tBaseClass;
      CA:  tChildClassA;
      CB:  tChildClassB;
      ptr: pointer to tBaseClass;
           .
           .
           .
      lea( ebx, B1 );
      mov( ebx, ptr );
      << Use ptr >>
           .
           .
           .
      lea( eax, CA );
      mov( eax, ptr );
      << Use ptr >>
           .
           .
           .
      lea( eax, CB );
      mov( eax, ptr );
      << Use ptr >>

Because ptr points at an object of type tBaseClass, you may legally (in a semantic sense) access the i, j, and r fields of the object where ptr is pointing. It is not legal to access the c, b, w, d, or a fields of the tChildClassA or tChildClassB objects because at any given moment the program may not know exactly what object type ptr references.

In order for inheritance to work properly, the i, j, and r fields must appear at the same offsets in all child classes as they do in tBaseClass. This way, an instruction of the form "mov((type tBaseClass [ebx]).i, eax);" will correctly access the i field even if EBX points at an object of type tChildClassA or tChildClassB. Figure 14-6 shows the layout of the child and base classes:

Figure 14-6: Layout of Base and Child Class Objects in Memory.

Note that the new fields in the two child classes bear no relation to one another, even if they have the same name (e.g., the c fields in the two child classes do not lie at the same offset). Although the two child classes share the fields they inherit from their common base class, any new fields they add are unique and separate. Two fields in different classes share the same offset only by coincidence if those fields are not inherited from a common base class.
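The offset rule can be modeled in C by embedding the base layout at the start of each child. This is a sketch under assumptions, not a description of HLA internals: the embedded member named base and the function read_i are hypothetical, the VMT pointer is modeled as a 32-bit integer, and boolean is taken to occupy one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t pVMT;   /* stand-in for the 32-bit VMT pointer */
    uint32_t i;
    uint32_t j;
    float    r;      /* real32 */
} tBaseClass;

typedef struct {
    tBaseClass base; /* inherited fields keep their base-class offsets */
    char     c;
    uint8_t  b;      /* boolean, assumed to be one byte */
    uint16_t w;
} tChildClassA;

typedef struct {
    tBaseClass base;
    uint32_t d;
    char     c;
    uint8_t  a[3];
} tChildClassB;

static_assert(offsetof(tChildClassA, base) == 0 &&
              offsetof(tChildClassB, base) == 0,
              "inherited fields start at offset zero in every child");
static_assert(offsetof(tChildClassA, c) != offsetof(tChildClassB, c),
              "fields with the same name in unrelated children are unrelated");

/* Works for any object whose leading bytes are laid out like tBaseClass. */
uint32_t read_i(const tBaseClass *ptr)
{
    return ptr->i;
}
```

Passing the address of a tChildClassA or tChildClassB object to read_i (cast to a tBaseClass pointer) reads the right field because i lies at offset four in all three layouts. The two c fields, at offsets 16 and 20 in this model, have nothing to do with each other.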

All classes (even those that aren't related to one another) place the pointer to the virtual method table at offset zero within the object. There is a single VMT associated with each class in a program; even classes that inherit fields from some base class have a VMT that is (generally) different than the base class's VMT. Figure 14-7 shows how objects of type tBaseClass, tChildClassA, and tChildClassB point at their specific VMTs:

Figure 14-7: Virtual Method Table References from Objects.

A virtual method table is nothing more than an array of pointers to the methods and iterators associated with a class. The address of the first method or iterator that appears in a class is at offset zero, the address of the second appears at offset four, and so on. You can determine the offset value for a given iterator or method by using the @offset function. If you want to call a method directly (using 80x86 syntax rather than HLA's high level syntax), you could use code like the following:

 var
      sc: tBaseClass;
           .
           .
           .
      lea( esi, sc );                    // Get the address of the object (& VMT).
      mov( [esi], edi );                 // Put address of VMT into EDI.
      call( (type dword [edi+@offset( tBaseClass.mBase )]) );

Of course, if the method has any parameters, you must push them onto the stack before executing the code above. Don't forget when making direct calls to a method you must load ESI with the address of the object. Any field references within the method will probably depend upon ESI containing this address. The choice of EDI to contain the VMT address is nearly arbitrary. Unless you're doing something tricky (like using EDI to obtain run-time type information), you could use any register you please here. As a general rule, you should use EDI when simulating class method calls because this is the convention that HLA employs and most programmers will expect this.

Whenever a child class inherits fields from some base class, the child class's VMT also inherits entries from the base class's VMT. For example, the VMT for class tBaseClass contains only a single entry: a pointer to method tBaseClass.mBase. The VMT for class tChildClassA contains two entries: pointers to tBaseClass.mBase and tChildClassA.mA. Because tChildClassB doesn't define any new methods or iterators, tChildClassB's VMT contains only a single entry, a pointer to the tBaseClass.mBase method. Note that tChildClassB's VMT is identical to tBaseClass's VMT. Nevertheless, HLA produces two distinct VMTs. This is a critical fact that we will make use of a little later. Figure 14-8 shows the relationship between these VMTs.

Figure 14-8: Virtual Method Tables for Inherited Classes.
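The three tables can be modeled in C as arrays of function pointers. This is a sketch, not HLA's actual output: the table names, the SLOTS macro, and same_entries_distinct_tables are hypothetical, and the methods are reduced to functions that return their own names.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *(*Method)(void);

static const char *mBase(void) { return "tBaseClass.mBase"; }
static const char *mA(void)    { return "tChildClassA.mA"; }

/* Each class gets its own table; inherited entries come first. */
static const Method tBaseClass_VMT[]   = { mBase };
static const Method tChildClassA_VMT[] = { mBase, mA };
static const Method tChildClassB_VMT[] = { mBase };

#define SLOTS(t) (sizeof(t) / sizeof((t)[0]))

static_assert(SLOTS(tBaseClass_VMT) == 1, "base class: one method");
static_assert(SLOTS(tChildClassA_VMT) == 2, "child A: inherited entry plus mA");
static_assert(SLOTS(tChildClassB_VMT) == 1, "child B: inherited entry only");

/* Nonzero when two tables hold the same n entries yet are separate objects. */
int same_entries_distinct_tables(const Method *x, const Method *y, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (x[k] != y[k]) {
            return 0;
        }
    }
    return x != y;
}
```

Comparing tBaseClass_VMT with tChildClassB_VMT in this model shows equal contents at different addresses, so the address of an object's table still identifies the object's class.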

Although the VMT pointer always appears at offset zero in an object (and, therefore, you can access it using the address expression "[ESI]" if ESI points at an object), HLA actually inserts a symbol into the symbol table so you may refer to the VMT pointer symbolically. The symbol _pVMT_ (pointer to virtual method table) provides this capability. So a more readable way to access the VMT pointer (as in the previous code example) is

      lea( esi, sc );
      mov( (type tBaseClass [esi])._pVMT_, edi );
      call( (type dword [edi+@offset( tBaseClass.mBase )]) );

If you need to access the VMT directly, there are a couple of ways to do this. Whenever you declare a class object, HLA automatically includes a field named _VMT_ as part of that class. _VMT_ is a static array of double word objects. Therefore, you may refer to the VMT using an identifier of the form classname._VMT_. Generally, you shouldn't access the VMT directly, but as you'll see shortly, there are some good reasons why you need to know the address of this object in memory.