3.3 Run-Time Memory Organization

An operating system like Linux or Windows tends to put different types of data into different sections (or segments) of memory. Although it is possible to reconfigure memory to your choice by running the linker and specify various parameters, by default Windows loads an HLA program into memory using the organization appearing in Figure 3-7 (Linux is similar, though it rearranges some of the sections).

click to expand
Figure 3-7: HLA Typical Run-Time Memory Organization.

The operating system reserves the lowest memory addresses. Generally, your application cannot access data (or execute instructions) at these low addresses. One reason the OS reserves this space is to help trap NULL pointer references. If you attempt to access memory location zero, the operating system will generate a "general protection fault" meaning you've accessed a memory location that doesn't contain valid data. Because programmers often initialize pointers to NULL (zero) to indicate that the pointer is not pointing anywhere, an access of location zero typically means that the programmer has made a mistake and has not properly initialized a pointer to a legal (non-NULL) value. Also note that if you attempt to use one of the 80x86 16-bit addressing modes (HLA doesn't allow this, but were you to encode the instruction yourself and execute it …) the address will always be in the range 0..$1FFFE.^[4] This will also access a location in the reserved area, generating a fault.

The remaining six areas in the memory map hold different types of data associated with your program. These sections of memory include the stack section, the heap section, the code section, the readonly section, the static section, and the storage section. Each of these memory sections correspond to some type of data you can create in your HLA programs. The following sections discuss each of these sections in detail.

3.3.1 The Code Section

The code section contains the machine instructions that appear in an HLA program. HLA translates each machine instruction you write into a sequence of one or more byte values. The CPU interprets these byte values as machine instructions during program execution.

By default, when HLA links your program it tells the system that your program can execute instructions in the code segment and you can read data from the code segment. Note, specifically, that you cannot write data to the code segment. The operating system will generate a general protection fault if you attempt to store any data into the code segment.

Machine instructions are nothing more than data bytes. In theory, you could write a program that stores data values into memory and then transfers control to the data it just wrote, thereby producing a program that writes itself as it executes. This possibility produces romantic visions of artificially intelligent programs that modify themselves to produce some desired result. In real life, the effect is somewhat less glamorous.

Prior to the popularity of protected mode operating systems, like Windows and Linux, a program could overwrite the machine instructions during execution. Most of the time this was due to defects in a program, not because the program was artificially intelligent. A program would begin writing data to some array and fail to stop once it reached the end of the array, eventually overwriting the executing instructions that make up the program. Far from improving the quality of the code, such a defect usually causes the program to fail spectacularly.

Of course, if a feature is available, someone is bound to take advantage of it. Some programmers have discovered that in some special cases, using selfmodifying code — that is, a program that modifies its machine instructions during execution — can produce slightly faster or slightly smaller programs. Unfortunately, self-modifying code is very difficult to test and debug. Given the speed of modern processors combined with their instruction set and wide variety of addressing modes, there is almost no reason to use self-modifying code in a modern program. Indeed, protected mode operating systems like Linux and Windows make it difficult for you to write self-modifying code.

HLA automatically stores the data associated with your machine code into the code section. In addition to machine instructions, you can also store data into the code section by using the following pseudo-opcodes:^[5]

byte
word
dword
uns8
uns16
uns32
int8
int16
in32
boolean
char

The following byte statement exemplifies the syntax for each of these pseudoopcodes:

 byte comma_separated_list_of_byte_constants ;

Here are some examples:

      boolean    true;      char         'A';      byte         0, 1, 2;      byte         "Hello", 0      word         0, 2;      int8         -5;      uns32         356789, 0;

If more than one value appears in the list of values after the pseudo-opcode, HLA emits each successive value to the code stream. So the first byte statement above emits three bytes to the code stream, the values zero, one, and two. If a string appears within a byte statement, HLA emits one byte of data for each character in the string. Therefore, the second byte statement above emits six bytes: the characters ‘H’, ‘e’, ‘l’, ‘l’, and ‘o’, followed by a zero byte.

Keep in mind that the CPU will attempt to treat data you emit to the code stream as machine instructions unless you take special care not to allow the execution of the data. For example, if you write something like the following:

      mov( 0, ax );      byte 0,1,2,3;      add( bx, cx );

Your program will attempt to execute the 0, 1, 2, and 3 byte values as a machine instruction after executing the mov. Unless you know the machine code for a particular instruction sequence, sticking such data values into the middle of your code will almost always produce unexpected results. More often than not, this will crash your program. Therefore, you should never insert arbitrary data bytes into the middle of an instruction stream unless you understand exactly what you are doing. Typically when you place such data in your programs, you'll execute some code that transfers control around the data.

3.3.2 The Static Sections

The static section is where you will typically declare your variables. Although the static section syntactically appears as part of a program or procedure, keep in mind that HLA moves all static variables to the static section in memory. Therefore, HLA does not sandwich the variables you declare in the static section between procedures in the code section.

In addition to declaring static variables, you can also embed lists of data into the static declaration section. You use the same technique to embed data into your static section that you use to embed data into the code section: You use the byte, word, dword, uns32, and so on, pseudo-opcodes. Consider the following example:

 static     b:    byte := 0;         byte 1,2,3;     u:    uns32 := 1;         uns32 5,2,10;     c:    char;         char 'a', 'b', 'c', 'd', 'e', 'f';     bn: boolean;          boolean true;

Data that HLA writes to the static memory segment using these pseudo-opcodes is written to the segment after the preceding variables. For example, the byte values 1, 2, and 3 are emitted to the static section after b's 0 byte. Because there aren't any labels associated with these values, you do not have direct access to these values in your program. You can use the indexed addressing modes to access these extra values (examples will appear a little later in this chapter).

In the examples above, note that the c and bn variables do not have an (explicit) initial value. However, if you don't provide an initial value, HLA will initialize the variables in the static section to all zero bits, so HLA assigns the NULL character (ASCII code zero) to c as its initial value. Likewise, HLA assigns false as the initial value for bn. In particular, you should note that your variable declarations in the static section always consume memory, even if you haven't assigned them an initial value. Any data you declare in a pseudo-opcode like byte will always follow the actual data associated with the variable declaration.

3.3.3 The Read-Only Data Section

The readonly data section holds constants, tables, and other data that your program cannot change during execution. You create read-only objects by declaring them in the readonly declaration section. The readonly section is very similar to the static section with three primary differences:

The readonly section begins with the reserved word readonly rather than static.
All declarations in the readonly section generally have an initializer.
The system does not allow you to store data into a readonly object while the program is running.

Example:

 readonly      pi:          real32 := 3.14159;      e:           real32 := 2.71;      MaxU16:      uns16 := 65_535;      MaxI16:      int16 := 32_767;

All readonly object declarations must have an initializer because you cannot initialize the value under program control.^[6] For all intents and purposes, you can think of readonly objects as constants. However, these constants consume memory and other than you cannot write data to readonly objects, they behave like, and you can use them like, static variables. Because they behave like static objects, you cannot use a readonly object everywhere a constant is allowed; in particular, readonly objects are memory objects, so you cannot supply a readonly object and some other memory object as the operands to an instruction.^[7]

Like the static section, you may embed data values in the readonly section using the byte, word, dword, and so on, data declarations, e.g.,

 readonly      roArray: byte := 0;                     byte 1, 2, 3, 4, 5;      qwVal: qword := 1;                qword 0;

3.3.4 The Storage Section

The readonly section requires that you initialize all objects you declare. The static section lets you optionally initialize objects (or leave them uninitialized, in which case they have the default initial value of zero). The storage section completes the initialization coverage: You use it to declare variables that are always uninitialized when the program begins running. The storage section begins with the storage reserved word and contains variable declarations without initializers. Here is an example:

 storage      UninitUns32:    uns32;      i:              int32;      character:      char;      b:              byte;

Linux and Windows will initialize all storage objects to zero when they load your program into memory. However, it's probably not a good idea to depend upon this implicit initialization. If you need an object initialized with zero, declare it in a static section and explicitly set it to zero.

Variables you declare in the storage section may consume less disk space in the executable file for the program. This is because HLA writes out initial values for readonly and static objects to the executable file, but uses a compact representation for uninitialized variables you declare in the storage section; note, however, that this behavior is OS and object-module format dependent. Because the storage section does not allow initialized values, you cannot put unlabeled values in the storage section using the byte, word, dword, and so on, pseudo-opcodes.

3.3.5 The @NOSTORAGE Attribute

The @nostorage attribute lets you declare variables in the static data declaration sections (i.e., static, readonly, and storage) without actually allocating memory for the variable. The @nostorage option tells HLA to assign the current address in a declaration section to a variable but do not allocate any storage for the object. That variable will share the same memory address as the next object appearing in the variable declaration section. Here is the syntax for the @nostorage option:

 variableName: varType; @nostorage;

Note that you follow the type name with "@nostorage;" rather than some initial value or just a semicolon. The following code sequence provides an example of using the @nostorage option in the readonly section:

 readonly      abcd: dword; nostorage;                byte 'a', 'b', 'c', 'd';

In this example, abcd is a double word whose L.O. byte contains 97 (‘a’), byte #1 contains 98 (‘b’), byte #2 contains 99 (‘c’), and the H.O. byte contains 100 (‘d’). HLA does not reserve storage for the abcd variable, so HLA associates the following four bytes in memory (allocated by the byte directive) with abcd.

Note that the @nostorage attribute is only legal in the static, storage, and readonly sections (the so-called static declarations sections). HLA does not allow its use in the var section that you'll read about next.

3.3.6 The Var Section

HLA provides another variable declaration section, the var section, that you can use to create automatic variables. Your program will allocate storage for automatic variables whenever a program unit (i.e., main program or procedure) begins execution, and it will deallocate storage for automatic variables when that program unit returns to its caller. Of course, any automatic variables you declare in your main program have the same lifetime^[8] as all the static, readonly, and storage objects, so the automatic allocation feature of the var section is wasted in the main program. In general, you should only use automatic objects in procedures (see the chapter on procedures for details). HLA allows them in your main program's declaration section as a generalization.

Because variables you declare in the var section are created at runtime, HLA does not allow initializers on variables you declare in this section. So the syntax for the var section is nearly identical to that for the storage section; the only real difference in the syntax between the two is the use of the var reserved word rather than the storage reserved word.^[9] The following example illustrates this:

 var       vInt:    int32;       vChar:   char;

HLA allocates variables you declare in the var section in the stack memory section. HLA does not allocate var objects at fixed locations within the stack segment; instead, it allocates these variables in an activation record associated with the current program unit. The chapter on procedures, later in this book, will discuss activation records in greater detail; for now it is important only to realize that HLA programs use the EBP register as a pointer to the current activation record. Therefore, any time you access a var object, HLA automatically replaces the variable name with "[EBP displacement]". Displacement is the offset of the object in the activation record. This means that you cannot use the full scaled indexed addressing mode (a base register plus a scaled index register) with var objects because var objects already use the EBP register as their base register. Although you will not directly use the two register addressing modes often, the fact that the var section has this limitation is a good reason to avoid using the var section in your main program.

3.3.7 Organization of Declaration Sections Within Your Programs

The static, readonly, storage, and var sections may appear zero or more times between the program header and the associated begin for the main program. Between these two points in your program, the declaration sections may appear in any order, as the following example demonstrates:

 program demoDeclarations; static      i_static:      int32; var      i_auto:      int32; storage      i_uninit:      int32; readonly      i_readonly: int32 := 5; static      j:      uns32; var      k:      char; readonly      i2:      uns8 := 9; storage      c:      char; storage      d:      dword; begin demoDeclarations;      << code goes here >> end demoDeclarations;

In addition to demonstrating that the sections may appear in an arbitrary order, this section also demonstrates that a given declaration section may appear more than once in your program. When multiple declaration sections of the same type (e.g., the three storage sections above) appear in a declaration section of your program, HLA combines them into a single group.

^[4]It's $1FFFE, not $FFFF, because you could use the indexed addressing mode with a displacement of $FFFF along with the value $FFFF in a 16-bit register.

^[5]This isn't a complete list. HLA generally allows you to use any scalar data type name as a statement to reserve storage in the code section. You'll learn more about the available data types later in this text.

^[6]There is one exception you'll see a little later in this chapter.

^[7]mov is an exception to this rule because HLA emits special code for memory-to-memory move operations.

^[8]The lifetime of a variable is the point from which memory is first allocated to the point the memory is deallocated for that variable.

^[9]Actually, there are a few other, minor differences, but we won't deal with those differences in this text. See the HLA Reference Manual for more details.