4.7 Pointer Data Types

Some people refer to pointers as scalar data types; others refer to them as composite data types. This text will treat them as scalar data types even though they exhibit some tendencies of both scalar and composite data types.

Of course, the place to start is with the question "What is a pointer?" Now you've probably experienced pointers firsthand in the Pascal, C, or Ada programming languages, and you're probably getting worried right now. Almost everyone has a really bad experience when they first encounter pointers in a high level language. Well, fear not! Pointers are actually easier to deal with in assembly language. Besides, most of the problems you had with pointers probably had nothing to do with pointers, but rather with the linked list and tree data structures you were trying to implement with them. Pointers, on the other hand, have many uses in assembly language that have nothing to do with linked lists, trees, and other scary data structures. Indeed, simple data structures like arrays and records often involve the use of pointers. So if you've got some deep-rooted fear about pointers, well, forget everything you know about them. You're going to learn how great pointers really are.

Probably the best place to start is with the definition of a pointer. Just exactly what is a pointer, anyway? Unfortunately, high level languages like Pascal tend to hide the simplicity of pointers behind a wall of abstraction. This added complexity (which exists for good reason, by the way) tends to frighten programmers because they don't understand what's going on.

Now if you're afraid of pointers, well, let's just ignore them for the time being and work with an array. Consider the following array declaration in Pascal:

      M: array [0..1023] of integer; 

Even if you don't know Pascal, the concept here is pretty easy to understand. M is an array with 1024 integers in it, indexed from M[0] to M[1023]. Each one of these array elements can hold an integer value that is independent of all the others. In other words, this array gives you 1024 different integer variables, each of which you refer to by number (the array index) rather than by name.

If you encounter a program that has the statement "M[0]:=100;" you probably wouldn't have to think at all about what is happening with this statement. It is storing the value 100 into the first element of the array M. Now consider the following two statements:

      i := 0;             (* Assume "i" is an integer variable *)
      M [i] := 100;

You should agree, without too much hesitation, that these two statements perform the same operation as "M[0]:=100;". Indeed, you're probably willing to agree that you can use any integer expression in the range 0..1023 as an index into this array. The following statements still perform the same operation as our single assignment to index zero:

      i := 5;             (* assume all variables are integers *)
      j := 10;
      k := 50;
      M [i*j-k] := 100;

"Okay, so what's the point?" you're probably thinking. "Anything that produces an integer in the range 01023 is legal. So what?" Okay, how about this:

      M [1] := 0;
      M [ M [1] ] := 100;

Whoa! Now that takes a few moments to digest. However, if you take it slowly, it makes sense, and you'll discover that these two instructions perform the exact same operation you've been doing all along. The first statement stores zero into array element M[1]. The second statement fetches the value of M[1], which is an integer so you can use it as an array index into M, and uses that value (zero) to control where it stores the value 100.

If you're willing to accept the above as reasonable, perhaps bizarre, but usable nonetheless, then you'll have no problems with pointers. Because M[1] is a pointer! Well, not really, but if you were to change "M" to "memory" and treat this array as all of memory, this is the exact definition of a pointer.
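To make the "array as memory" idea concrete, here is a small sketch in C (used here only for illustration, not HLA) in which an int array stands in for memory and the value stored in one element serves as the index of another. The names `M` and `store_indirect` are illustrative assumptions, not part of any library.

```c
#include <stddef.h>

#define MEM_SIZE 1024

/* A tiny "memory": an index into this array plays the role of an address. */
static int M[MEM_SIZE];

/* Equivalent of the Pascal statement  M [ M [ptr_slot] ] := value;
   The element at ptr_slot holds the index (the "pointer") of the element
   that actually receives the value. No bounds checking is done, so the
   caller must ensure M[ptr_slot] lies in 0..MEM_SIZE-1. */
void store_indirect(size_t ptr_slot, int value)
{
    M[M[ptr_slot]] = value;
}
```

Setting `M[1] = 0` and then calling `store_indirect(1, 100)` stores 100 into `M[0]`, exactly as the two Pascal statements do.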

4.7.1 Using Pointers in Assembly Language

A pointer is simply a memory location whose value is the address (or index, if you prefer) of some other memory location. Pointers are very easy to declare and use in an assembly language program. You don't even have to worry about array indices or anything like that.

An HLA pointer is a 32-bit value that may contain the address of some other variable. If you have a dword variable p that contains $1000_0000, then p "points" at memory location $1000_0000. To access the dword that p points at, you could use code like the following:

       mov( p, ebx );           // Load EBX with the value of pointer p.
       mov( [ebx], eax );       // Fetch the data that p points at.

By loading the value of p into EBX this code loads the value $1000_0000 into EBX (assuming p contains $1000_0000 and, therefore, points at memory location $1000_0000). The second instruction above loads the EAX register with the double word starting at the location whose offset appears in EBX. Because EBX now contains $1000_0000, this will load EAX from locations $1000_0000 through $1000_0003.
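A rough C analogue of the two instructions may help: copying the pointer corresponds to loading EBX, and dereferencing the copy corresponds to `mov( [ebx], eax )`. This is a C sketch rather than HLA, and `load_through` is a hypothetical name.

```c
#include <stdint.h>

/* Rough C analogue of
       mov( p, ebx );       // copy the pointer's value into EBX
       mov( [ebx], eax );   // fetch the dword at that address      */
uint32_t load_through(const uint32_t *p)
{
    const uint32_t *ebx = p;   /* "EBX" now holds an address, not the data */
    uint32_t eax = *ebx;       /* read the four bytes starting at that address */
    return eax;
}
```

Because the address arrives as data, the same code reads whichever dword it is handed, which is the point the next paragraph makes.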

Why not just load EAX directly from location $1000_0000 using an instruction like "mov( mem, EAX );" (assuming mem is at address $1000_0000)? Well, there are a lot of reasons. But the primary reason is that this single instruction always loads EAX from location mem. You cannot change the address from which it loads EAX. The former instructions, however, always load EAX from the location where p is pointing. This is very easy to change under program control. In fact, the simple instruction "mov( &mem2, p );" will cause those same two instructions above to load EAX from mem2 the next time they execute. Consider the following instruction sequence:

      mov( &i, p );           // Assume all variables are STATIC variables.
           .
           .
           .
      if( some_expression ) then

           mov( &j, p );      // Assume the code above skips this
                              // instruction and
           .                  // you get to the next instruction by jumping
           .                  // to this point from somewhere else.
           .

      endif;
      mov( p, ebx );          // Assume both of the above code paths
      mov( [ebx], eax );      // wind up down here.

This short example demonstrates two execution paths through the program. The first path loads the variable p with the address of the variable i. The second path through the code loads p with the address of the variable j. Both execution paths converge on the last two mov instructions that load EAX with i or j depending upon which execution path was taken. In many respects, this is like a parameter to a procedure in a high level language like Pascal. Executing the same instructions accesses different variables depending on whose address (i or j) winds up in p.
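The same two-path pattern can be sketched in C, where `some_expression` becomes an ordinary argument. The function name `load_i_or_j` is hypothetical; the comments map each line to the HLA instruction it mirrors.

```c
static int i = 10;
static int j = 20;

/* Both paths converge on the same two steps: copy p, then dereference it.
   Which variable is read depends only on the address left in p. */
int load_i_or_j(int some_expression)
{
    int *p = &i;            /* mov( &i, p );       */
    if (some_expression)
        p = &j;             /* mov( &j, p );       */
    int *ebx = p;           /* mov( p, ebx );      */
    return *ebx;            /* mov( [ebx], eax );  */
}
```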

4.7.2 Declaring Pointers in HLA

Because pointers are 32 bits long, you could simply use the dword type to allocate storage for your pointers. However, there is a much better way to do this: HLA provides the "pointer to" phrase specifically for declaring pointer variables. Consider the following example:

 static
      b:         byte;
      d:         dword;
      pByteVar:  pointer to byte := &b;
      pDWordVar: pointer to dword := &d;

This example demonstrates that it is possible to initialize as well as declare pointer variables in HLA. Note that you may only take addresses of static variables (static, readonly, and storage objects) with the address-of operator, so you can only initialize pointer variables with the addresses of static objects.

You can also define your own pointer types in the type section of an HLA program. For example, if you often use pointers to characters, you'll probably want to use a type declaration like the one in the following example:

 type
      ptrChar:      pointer to char;

 static
      cString:      ptrChar;
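For comparison, the declarations above correspond roughly to the following C (a sketch, not HLA syntax). As in HLA, a C pointer with static storage can be initialized only with a constant address such as that of a static object, and a static pointer that is not explicitly initialized starts out as a null pointer.

```c
#include <stddef.h>
#include <stdint.h>

typedef char *ptrChar;                   /* type ptrChar: pointer to char;  */

static unsigned char b;                  /* b: byte;                        */
static uint32_t      d;                  /* d: dword;                       */
static unsigned char *pByteVar  = &b;    /* pointer to byte := &b;          */
static uint32_t      *pDWordVar = &d;    /* pointer to dword := &d;         */
static ptrChar       cString;            /* no initializer: starts as NULL  */
```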

4.7.3 Pointer Constants and Pointer Constant Expressions

HLA allows two literal pointer constant forms: the address-of operator followed by the name of a static variable or the constant NULL. In addition to these two literal pointer constants, HLA also supports simple pointer constant expressions.

The constant zero represents the NULL pointer — that is, an illegal address that does not exist.[10] Programs typically initialize pointers with NULL to indicate that a pointer has explicitly not been initialized.

In addition to simple address literals and the value zero, HLA allows very simple constant expressions wherever a pointer constant is legal. Pointer constant expressions take one of the three following forms:

         &StaticVarName [ PureConstantExpression ]
         &StaticVarName + PureConstantExpression
         &StaticVarName - PureConstantExpression

The PureConstantExpression term is a numeric constant expression that does not involve any pointer constants. This type of expression produces a memory address that is the specified number of bytes before or after ("-" or "+", respectively) the StaticVarName variable in memory. Note that the first two forms above are semantically equivalent; they both return a pointer constant whose address is the sum of the static variable and the constant expression.

Because you can create pointer constant expressions, it should come as no surprise to discover that HLA lets you define manifest pointer constants in the const section. The program in Listing 4-5 demonstrates how you can do this.

Listing 4-5: Pointer Constant Expressions in an HLA Program.

 program PtrConstDemo;
 #include( "stdlib.hhf" );

 static
      b: byte := 0;
         byte 1, 2, 3, 4, 5, 6, 7;

 const
      pb:= &b + 1;

 begin PtrConstDemo;

      mov( pb, ebx );
      mov( [ebx], al );
      stdout.put( "Value at address pb = $", al, nl );

 end PtrConstDemo;

Upon execution, this program prints the value of the byte just beyond b in memory (which contains the value $01).
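A C sketch of Listing 4-5 follows. One assumption differs from the HLA version: C only guarantees well-defined pointer arithmetic within a single object, so b and the seven bytes after it are declared as one eight-element array. The function name `value_at_pb` is hypothetical.

```c
#include <stdint.h>

/* b followed by seven more bytes, declared as a single array so that
   "&b + 1" stays within one object. */
static const uint8_t b[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* Address constant equivalent to the HLA  const pb := &b + 1; */
static const uint8_t *const pb = &b[0] + 1;

/* mov( pb, ebx ); mov( [ebx], al ); */
uint8_t value_at_pb(void)
{
    return *pb;
}
```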

4.7.4 Pointer Variables and Dynamic Memory Allocation

Pointer variables are the perfect place to store the return result from the HLA Standard Library malloc function. The malloc function returns the address of the storage it allocates in the EAX register; therefore, you can store the address directly into a pointer variable with a single mov instruction immediately after a call to malloc:

 type
      bytePtr:      pointer to byte;

 var
      bPtr: bytePtr;
           .
           .
           .
      malloc( 1024 );            // Allocate a block of 1,024 bytes.
      mov( eax, bPtr );          // Store address of block in bPtr.
           .
           .
           .
      free( bPtr );              // Free the allocated block when done
                                 // using it.
           .
           .
           .

In addition to malloc and free, the HLA Standard Library provides a realloc procedure. The realloc routine takes two parameters: a pointer to a block of storage that malloc (or realloc) previously created and a new size. If the new size is less than the old size, realloc releases the storage at the end of the allocated block back to the system. If the new size is larger than the current block, then realloc will allocate a new block and move the old data to the start of the new block, then free the old block.

Typically, you would use realloc to correct a bad guess about a memory size you'd made earlier. For example, suppose you want to read a set of values from the user but you won't know how many memory locations you'll need to hold the values until after the user has entered the last value. You could make a wild guess and then allocate some storage using malloc based on your estimate. If, during the input, you discover that your estimate was too low, simply call realloc with a larger value. Repeat this as often as required until all the input is read. Once input is complete, you can make a call to realloc to release any unused storage at the end of the memory block.
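The grow-then-trim strategy just described might look like this in C, using the C library's realloc, whose behavior is similar to but not identical with the HLA Standard Library routine. The helper name `collect`, the initial guess of four elements, and the doubling policy are assumptions made for illustration; an input array stands in for values typed by the user.

```c
#include <stdlib.h>

/* Copy n ints from src into a heap block that starts with a deliberately
   small guess, grows with realloc whenever it fills up, and is trimmed to
   its final size at the end. Returns NULL on allocation failure. */
int *collect(const int *src, size_t n, size_t *out_count)
{
    size_t cap = 4;                          /* initial (wild) guess */
    size_t count = 0;
    int *buf = malloc(cap * sizeof *buf);
    if (buf == NULL)
        return NULL;

    for (size_t k = 0; k < n; k++) {
        if (count == cap) {                  /* estimate too low: grow */
            size_t new_cap = cap * 2;
            int *tmp = realloc(buf, new_cap * sizeof *buf);
            if (tmp == NULL) {
                free(buf);                   /* old block is still ours to free */
                return NULL;
            }
            buf = tmp;                       /* the block may have moved */
            cap = new_cap;
        }
        buf[count++] = src[k];
    }

    /* Input complete: release unused storage at the end of the block. */
    if (count > 0 && count < cap) {
        int *tmp = realloc(buf, count * sizeof *buf);
        if (tmp != NULL)                     /* on failure the old block remains valid */
            buf = tmp;
    }
    *out_count = count;
    return buf;
}
```

Each realloc result goes into a temporary first: if the call fails, the original block is still allocated, and overwriting `buf` with NULL would leak it.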

The realloc procedure uses the following calling sequence:

 realloc( ExistingPointer, NewSize ); 

Realloc returns a pointer to the newly allocated block in the EAX register.

One danger exists when using realloc. If you've made multiple copies of pointers into a block of storage on the heap and then call realloc to resize that block, all the existing pointers are now invalid. Effectively, realloc frees the existing storage and then allocates a new block. That new block may not be in the same memory location as the old block, so any existing pointers (into the block) that you have will be invalid after the realloc call.
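One common way around this, shown here as a C sketch with the hypothetical helper `grow_and_read`, is to remember positions inside a block as offsets from its start rather than as raw pointers, and to rebuild pointers from whatever address realloc returns.

```c
#include <stdlib.h>
#include <string.h>

/* Mark one byte at `offset`, grow the block, then read that byte back
   through the NEW base address. An offset stays meaningful after realloc;
   a raw pointer into the old block would not. Returns 0 on success. */
int grow_and_read(size_t old_size, size_t new_size, size_t offset,
                  unsigned char *out)
{
    if (offset >= old_size || new_size < old_size)
        return -1;

    unsigned char *block = malloc(old_size);
    if (block == NULL)
        return -1;
    memset(block, 0, old_size);
    block[offset] = 0x5A;                 /* the byte we care about */

    unsigned char *grown = realloc(block, new_size);
    if (grown == NULL) {
        free(block);                      /* realloc failed: old block still valid */
        return -1;
    }
    block = NULL;                         /* old address (and all copies) now stale */

    *out = grown[offset];                 /* re-derive from the new base */
    free(grown);
    return 0;
}
```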

4.7.5 Common Pointer Problems

There are five common problems programmers encounter when using pointers. Some of these errors will cause your programs to immediately stop with a diagnostic message; other problems are more subtle, yielding incorrect results without otherwise reporting an error or simply affecting the performance of your program without displaying an error. These five problems are:

  • Using an uninitialized pointer

  • Using a pointer that contains an illegal value (e.g., NULL)

  • Continuing to use malloc'd storage after that storage has been freed

  • Failing to free storage once the program is done using it

  • Accessing indirect data using the wrong data type

The first problem above is using a pointer variable before you have assigned a valid memory address to the pointer. Beginning programmers often don't realize that declaring a pointer variable only reserves storage for the pointer itself; it does not reserve storage for the data that the pointer references. The short program in Listing 4-6 demonstrates this problem.

Listing 4-6: Uninitialized Pointer Demonstration.

 // Program to demonstrate use of
 // an uninitialized pointer. Note
 // that this program should terminate
 // with a Memory Access Violation exception.

 program UninitPtrDemo;
 #include( "stdlib.hhf" );

 static

      // Note: by default, variables in the
      // static section are initialized with
      // zero (NULL); hence the following
      // is actually initialized with NULL,
      // but that will still cause our program
      // to fail because we haven't initialized
      // the pointer with a valid memory address.

      Uninitialized: pointer to byte;

 begin UninitPtrDemo;

      mov( Uninitialized, ebx );
      mov( [ebx], al );
      stdout.put( "Value at address Uninitialized: = $", al, nl );

 end UninitPtrDemo;

Although variables you declare in the static section are, technically, initialized, static initialization still doesn't initialize the pointer in this program with a valid address (it initializes it with zero, which is NULL).

Of course, there is no such thing as a truly uninitialized variable on the 80x86. What you really have are variables that you've explicitly given an initial value and variables that just happen to inherit whatever bit pattern was in memory when storage for the variable was allocated. Much of the time, these garbage bit patterns lying around in memory don't correspond to a valid memory address. Attempting to dereference such a pointer (that is, access the data in memory at which it points) typically raises a Memory Access Violation exception.
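A partial defense, sketched in C with the hypothetical helper `read_byte`, is to give pointers an explicit NULL value and test for NULL before dereferencing. As noted above, no such test can recognize a pointer that holds arbitrary garbage bits; it only catches the NULL case.

```c
#include <stddef.h>

/* Static storage: like the HLA static section, this starts out as NULL. */
static unsigned char *Uninitialized;

/* Refuse to dereference a NULL pointer instead of faulting.
   A non-NULL garbage address would still be dereferenced. */
int read_byte(const unsigned char *p, unsigned char *out)
{
    if (p == NULL)
        return -1;
    *out = *p;
    return 0;
}
```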

Sometimes, however, those random bits in memory just happen to correspond to a valid memory location you can access. In this situation, the CPU will access the specified memory location without aborting the program. Although to a naive programmer this situation may seem preferable to stopping the program, in reality this is far worse because your defective program continues to run without alerting you to the problem. If you store data through an uninitialized pointer, you may very well overwrite the values of other important variables in memory. This defect can produce some very difficult-to-locate problems in your program.

The second problem programmers have with pointers is storing invalid address values into a pointer. The first problem, described previously, is actually a special case of this second problem (with garbage bits in memory supplying the invalid address rather than your producing it via a miscalculation). The effects are the same; if you attempt to dereference a pointer containing an invalid address you will either get a Memory Access Violation exception or you will access an unexpected memory location.

The third problem listed previously is also known as the dangling pointer problem. To understand this problem, consider the following code fragment:

     malloc( 256 );        // Allocate some storage.
     mov( eax, ptr );      // Save address away in a pointer variable.
          .
          .                // Code that uses the pointer variable "ptr".
          .
     free( ptr );          // Free the storage associated with "ptr".
          .
          .                // Code that does not change the value in
          .                // "ptr".
     mov( ptr, ebx );
     mov( al, [ebx] );

In this example you will note that the program allocates 256 bytes of storage and saves the address of that storage in the ptr variable. Then the code uses this block of 256 bytes for a while and frees the storage, returning it to the system for other uses. Note that calling free does not change the value of ptr in any way; ptr still points at the block of memory allocated by malloc earlier. Indeed, free typically leaves the data in this block largely intact (a heap manager may reuse a few bytes of a freed block for its own bookkeeping), so upon return from free, ptr still points at the data this code stored into the block. However, note that the call to free tells the system that this 256-byte block of memory is no longer needed by the program, and the system can use this region of memory for other purposes. The free function cannot enforce the fact that you will never access this data again; you are simply promising that you won't. Of course, the code fragment above breaks this promise; as you can see in the last two instructions above, the program fetches the value in ptr and accesses the data it points at in memory.

The biggest problem with dangling pointers is that you can get away with using them a good part of the time. As long as the system doesn't reuse the storage you've freed, using a dangling pointer produces no ill effects in your program. However, with each new call to malloc, the system may decide to reuse the memory released by that previous call to free. When this happens, any attempt to dereference the dangling pointer may produce some unintended consequences. The problems range from reading data that has been overwritten (by the new, legal use of the data storage), to overwriting the new data, and (in the worst case) to overwriting system heap management pointers (doing so will probably cause your program to crash). The solution is clear: Never use a pointer value once you free the storage associated with that pointer.
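A widely used C habit that supports this rule is to clear a pointer variable immediately after freeing it, so an accidental later use hits an obvious NULL instead of silently touching recycled heap memory. The helper `free_and_clear` below is a hypothetical sketch; it only resets the one variable it is given, so any other copies of the address still dangle.

```c
#include <stdlib.h>

/* Free the block *pp refers to and reset the variable to NULL.
   Calling it again is harmless because free(NULL) does nothing. */
void free_and_clear(unsigned char **pp)
{
    free(*pp);
    *pp = NULL;
}
```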

Of all the problems, the fourth (failing to free allocated storage) will probably have the least impact on the proper operation of your program. The following code fragment demonstrates this problem:

      malloc( 256 );
      mov( eax, ptr );
           .                // Code that uses the data where ptr is pointing.
           .                // This code does not free up the storage
           .                // associated with ptr.
      malloc( 512 );
      mov( eax, ptr );

      // At this point, there is no way to reference the original
      // block of 256 bytes pointed at by ptr.

In this example the program allocates 256 bytes of storage and references this storage using the ptr variable. At some later time the program allocates another block of bytes and overwrites the value in ptr with the address of this new block. Note that the former value in ptr is lost. Because the program no longer has this address value, there is no way to call free to return the storage for later use. As a result, this memory is no longer available to your program. While making 256 bytes of memory inaccessible to your program may not seem like a big deal, imagine that this code is in a loop that repeats over and over again. With each execution of the loop the program loses another 256 bytes of memory. After a sufficient number of loop iterations, the program will exhaust the memory available on the heap. This problem is often called a memory leak because the effect is the same as though the memory bits were leaking out of your computer (yielding less and less available storage) during program execution.[11]
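The leak disappears if the old block is released before its address is overwritten. A C sketch, with the hypothetical helper `replace_block`:

```c
#include <stdlib.h>

/* Replace the block *ptr refers to with a freshly allocated one of
   new_size bytes. The old block is freed before its address is lost,
   so nothing leaks; on allocation failure the old block is kept. */
int replace_block(void **ptr, size_t new_size)
{
    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return -1;
    free(*ptr);          /* release the old block while we still know where it is */
    *ptr = fresh;
    return 0;
}
```

Called inside a loop, this keeps at most one block alive at a time instead of losing another 256 bytes on every iteration.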

Memory leaks are far less damaging than using dangling pointers. Indeed, there are only two problems with memory leaks: the danger of running out of heap space (which, ultimately, may cause the program to abort, though this is rare) and performance problems due to virtual memory page swapping. Nevertheless, you should get in the habit of always freeing all storage once you are done using it. When your program quits, the operating system reclaims all storage including the data lost via memory leaks. Therefore, memory lost via a leak is only lost to your program, not the whole system.

The last problem with pointers is the lack of type-safe access. This can occur because HLA cannot and does not enforce pointer type checking. For example, consider the program in Listing 4-7.

Listing 4-7: Type-Unsafe Pointer Access Example.

 // Program to demonstrate the
 // lack of type checking in pointer
 // accesses.

 program BadTypePtrDemo;
 #include( "stdlib.hhf" );

 static
      ptr:      pointer to char;
      cnt:      uns32;

 begin BadTypePtrDemo;

      // Allocate sufficient characters
      // to hold a line of text input
      // by the user:

      malloc( 256 );
      mov( eax, ptr );

      // Okay, read the text from the user
      // a character at a time:

      stdout.put( "Enter a line of text: ");
      stdin.flushInput();
      mov( 0, cnt );
      mov( ptr, ebx );
      repeat

           stdin.getc();          // Read a character from the user.
           mov( al, [ebx] );      // Store the character away.
           inc( cnt );            // Bump up count of characters.
           inc( ebx );            // Point at next position in memory.

      until( stdin.eoln());

      // Okay, we've read a line of text from the user,
      // now display the data:

      mov( ptr, ebx );
      for( mov( cnt, ecx ); ecx > 0; dec( ecx )) do

           mov( [ebx], eax );
           stdout.put( "Current value is $", eax, nl );
           inc( ebx );

      endfor;
      free( ptr );

 end BadTypePtrDemo;

This program reads in data from the user as character values and then displays the data as double word hexadecimal values. A powerful feature of assembly language is that it lets you ignore data types at will and automatically coerce the data without any effort, but this power is a two-edged sword. If you make a mistake and access indirect data using the wrong data type, HLA and the 80x86 may not catch the mistake, and your program may produce inaccurate results. Therefore, when using pointers and indirection in your programs, take care to use the data consistently with respect to its data type.
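The following C sketch (function names hypothetical) contrasts reading a character buffer with the type it was written with against reading it as a 32-bit value, which is what `mov( [ebx], eax )` does in Listing 4-7. It uses memcpy to sidestep C's aliasing rules. On a little-endian 80x86, reading the string "ABCD" as a dword yields $4443_4241, a value that mixes four separate characters.

```c
#include <stdint.h>
#include <string.h>

/* Correct: read one character, matching how the data was stored. */
unsigned char read_as_char(const char *p)
{
    return (unsigned char)*p;
}

/* Wrong type: read four bytes starting at p as a single 32-bit value,
   pulling in the next three characters as well. */
uint32_t read_as_dword(const char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```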

[10]Actually, address zero does exist, but if you try to access it under Windows or Linux you will get a general protection fault.

[11]Note that the storage isn't lost from your computer; once your program quits, it returns all memory (including unfreed storage) to the O/S. The next time the program runs it will start with a clean slate.




The Art of Assembly Language
ISBN: 1593272073
EAN: 2147483647
Year: 2005
Pages: 246
Authors: Randall Hyde
