Testing for Stack Overflow | Embedded Systems Firmware Demystified (With CD-ROM)

Programs use two different kinds of variables: static (or global) and stack variables. Static variables are assigned a fixed address. Regardless of what functions execute, a static variable always resides at the same address. Stack variables (local or auto variables in C) always belong to a specific function and are only accessible during the life of the function that declared them. The stack variables for a function are created from scratch (on the stack) each time the function runs. Depending on the stack depth when the function starts, its stack variables can very well be placed in a different location in memory.

Static variables are easy to work with but take up memory space all the time. Stack variables are temporarily allocated by the currently running function and are very convenient for temporary storage. The space allocated to a stack variable is given up and used by other functions as soon as the function that declared the variable completes. Both types of variables are necessary, and both have good and bad points.

The allocation and deallocation of stack variables is handled by code at the beginning and end of every function, called, respectively, the prolog and the epilog. This invisible code is generated by the compiler and is part of the function's overhead, not part of the function's logic. In most cases, the prolog and epilog allocate and deallocate a stack frame large enough to accommodate the variables declared inside the function. The allocation adjusts the current stack pointer and frame pointer (usually downward) and provides a frame of memory space to be used by the currently running function. When an embedded system is first built, a limited amount of memory is assigned to the stack. If, in the process of executing functions, some function nesting combination causes the stack pointer to go beyond the end of the space that was allocated to the stack, the result is a stack overflow, one of the hardest-to-find bugs you can encounter.

What happens following a stack overflow depends on what the target's memory map looks like. In many single-threaded systems, the stack pointer is set to the top of memory (allowing it to grow downward), and the heap (space given to the application by malloc) starts just above the end of the process's text/data/BSS space and grows upward (see Figure 12.1). The point in memory at which the stack overflow occurs varies greatly depending on how much memory is allocated on the heap. Remember that the heap grows upward, the stack grows downward, and somewhere in the middle they meet. The point where heap and stack meet is the point of corruption, and this point is very hard to identify because it depends on the dynamics of your stack and heap.

Figure 12.1: Single-Threaded Application Memory Map.

In a single-threaded application, the stack and heap often grow toward each other in the unused portion of data space.

Figure 12.2: Multi-Threaded Application Memory Map.

In a multi-threaded application, each task has its own stack, which makes an overflow much harder to detect.

A multi-tasking environment complicates things even further. In a multi-tasking environment, you have a separate logical stack for each task. Usually, each task is configured with some stack size that you are able to specify based on what the task is going to do. This design can make things more confusing because now you have potential for T different stacks to overflow (where T is the number of tasks in the system). What the stack overflows into depends on where the stack is. If the system is set up with task stacks allocated from a common block of memory, the overflow of one task stack is likely to corrupt the stack of some other task that is not currently running. If stack frames are scattered throughout memory (Figure 12.2), an overflow can corrupt other variables in the system. Either way, a stack overflow can be extremely hard to track down, primarily because the code that appears to be failing is probably not doing anything illegal on its own; it is instead crashing because some other task's stack overflowed and corrupted its variables.

This elusive quality of stack overflows gets worse. Almost all embedded systems have interrupts. Interrupt handlers are generally asynchronous to what is going on in application space, and usually the interrupt handler is written in C. The C code in the interrupt handler is no different from any other C code. The interrupt handler code creates a stack frame by adjusting the current stack pointer. The current stack pointer in this case is the stack pointer of the task that was interrupted. If this task's current function nesting brings its stack pointer close to the end of its allocated memory space, the occurrence of the interrupt causes a stack overflow because the interrupt handler's frame is built on that same stack.
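One consequence of the paragraph above is that, when interrupt handlers run on the interrupted task's stack, each task stack needs room for the task's own worst-case nesting plus the handlers' frames. The following is only a rough sizing sketch: the function name and all three inputs are hypothetical, and it assumes every nested interrupt level costs the same worst-case number of bytes.

```c
#include <stddef.h>

/*
 * Rough stack budget for one task when interrupt handlers build their
 * frames on the interrupted task's stack.
 *
 * All inputs are assumptions that must be measured or estimated for the
 * actual system; none of them comes from the text.
 */
size_t required_stack_bytes(size_t task_worst_case,    /* deepest call chain of the task */
                            size_t isr_worst_case,     /* deepest call chain of one handler */
                            unsigned max_nested_isrs)  /* how many handlers can nest */
{
    return task_worst_case + isr_worst_case * max_nested_isrs;
}
```

For example, a task that needs 600 bytes on its own, in a system where two 120-byte handlers can nest, would need at least 840 bytes under these assumptions.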

How do we deal with this problem? Ideally, it would be nice just to allocate a lot of memory to each stack. In some cases that's all that is needed. In other cases, you have to spend time figuring out what function nesting, task, and interrupt conditions caused the problem.

Prefill the Stack Memory or Buffer Zone

The most common way to detect a stack overflow is to prefill the RAM space used by the stack with some known pattern and then to check whether the pattern at the end of the stack has been overwritten. For example, you could load the stack's memory space with 0x55 as the pattern. Then you could create an additional task, a buffer checker, that knows the location of each stack, and use a timer or other event to execute this task periodically. If the buffer checker detects that the end of any of the stacks has been modified, you have detected a stack overflow.
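The prefill-and-check idea might look something like the sketch below. The descriptor type, the function names, and the 16-byte inspection window are all assumptions made for illustration; a real RTOS would keep the stack bounds in its own task control structures, and the stacks are assumed to grow downward so that the lowest addresses are the "end" of each stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL   0x55u  /* the pattern used as the example in the text */
#define GUARD_BYTES  16u    /* assumed: how many bytes at the stack's end to inspect */

/* Hypothetical descriptor for one task stack. */
typedef struct {
    uint8_t *base;  /* lowest address of the region (end of a downward-growing stack) */
    size_t   size;  /* total bytes in the region */
} task_stack_t;

/* Fill the whole region with the pattern before the task first runs. */
void stack_prefill(const task_stack_t *s)
{
    memset(s->base, STACK_FILL, s->size);
}

/* Return true if any of the bytes at the end of the stack was modified. */
bool stack_end_modified(const task_stack_t *s)
{
    size_t n = (s->size < GUARD_BYTES) ? s->size : GUARD_BYTES;

    for (size_t i = 0; i < n; i++) {
        if (s->base[i] != STACK_FILL) {
            return true;
        }
    }
    return false;
}

/*
 * Body of the buffer checker: returns the index of the first stack whose
 * end has been modified, or -1 if all patterns are intact.
 */
int check_all_stacks(const task_stack_t *stacks, int count)
{
    for (int t = 0; t < count; t++) {
        if (stack_end_modified(&stacks[t])) {
            return t;
        }
    }
    return -1;
}
```

A periodic task or timer callback would call check_all_stacks() and report the returned index, which identifies the task whose stack was affected.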

Note that this test only detects the evidence of an overflow; you still haven't caught the offending application in the act. (It is possible a wild pointer or other fault is responsible for the detected change.) Now you know, however, what task is causing the problem based on which stack overflowed. You can investigate the code that is specific to that task. If there is good reason for the stack usage, however, just increase that task's stack size. In the case of a single-threaded application memory arrangement shown in Figure 12.1, you might want to prefill a buffer zone. If that buffer zone is ever modified, you then know you had a stack overflow.

On the proactive side, you can also use this method for general-purpose stack size adjustment. Prefill all stacks or buffer zones with a pattern and then adjust allocated stack sizes based on the point at which the pattern is overwritten in each stack. Keep in mind that this scheme is not perfect. Unfortunately, it is very possible that a function can overflow the stack without corrupting the pattern. This is the case if there is a function with an unused array in its frame. For instance, see Listing 12.18.

Listing 12.18: Undetected Stack Overrun.

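    #include <stdio.h>

    void func1(char *msg);
    int  func2(int arg);

    int func(int arg)
    {
        char buffer[32];
        int  val;

        if (arg > 45) {
            /* buffer is written only on this path */
            sprintf(buffer, "hey, arg is greater than 45!\n");
            func1(buffer);
        }
        val = func2(arg);
        return (val);
    }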

The 32 bytes set aside for buffer might not be modified. If you are near the end of your stack space and call the function in Listing 12.18 with arg <= 45, and the end of stack space overlaps with the space allocated to buffer, the 0x55 pattern would be left intact, but the overflow would still corrupt the memory space below the end of the stack.

Note: Try to avoid large arrays on the stack. Sometimes you just gotta have 'em, but be careful.
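The stack-size adjustment described above, where you look at the point at which the prefill pattern was overwritten, can be sketched as a simple scan. The function names are hypothetical, and the sketch assumes a downward-growing stack that was filled with 0x55 before use.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0x55u  /* the pattern used as the example in the text */

/*
 * Count the bytes that still hold the fill pattern, scanning upward from
 * the lowest address. For a downward-growing stack this is the headroom
 * that has never been written.
 */
size_t stack_unused_bytes(const uint8_t *base, size_t size)
{
    size_t n = 0;

    while (n < size && base[n] == STACK_FILL) {
        n++;
    }
    return n;
}

/* Estimated peak usage: everything above the untouched region. */
size_t stack_peak_usage(const uint8_t *base, size_t size)
{
    return size - stack_unused_bytes(base, size);
}
```

The result is only an estimate. As Listing 12.18 shows, a frame can reserve space without writing to it, and a stored byte that happens to equal 0x55 at the boundary looks like untouched fill, so it is prudent to leave some margin above the measured peak.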

Use a Per Function Stack Frame Checker

An alternative technique for finding stack overflow is to insert a stack check function at the top of every function. This stack-check function is likely to be an assembler function that is capable of looking at the current stack pointer and determining if it is beyond the end of the stack. For multi-tasking systems, this function would also need to be aware of what task is running because different tasks use different stacks.

A stack-check function is ideal for catching an overflow where it actually happens. You can design the function to raise an exception automatically and to perform a stack trace to capture the function nesting that caused the overflow.
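The text suggests the real check would probably be written in assembler so that it can read the stack pointer directly. As a portable approximation only, the sketch below uses the address of a local variable as a stand-in for the stack pointer. The type, the function names, and the idea that the scheduler calls stack_check_set_task() on each context switch are all assumptions, and the stack is assumed to grow downward.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record of where the stack ends. */
typedef struct {
    uintptr_t limit;  /* lowest legal address; the stack grows down toward it */
} stack_bounds_t;

/* Bounds of the running task; assumed to be updated on every context switch. */
static const stack_bounds_t *current_bounds;

void stack_check_set_task(const stack_bounds_t *b)
{
    current_bounds = b;
}

/* True if sp has moved beyond the end of the current task's stack. */
bool stack_pointer_overflowed(uintptr_t sp)
{
    return current_bounds != NULL && sp < current_bounds->limit;
}

/*
 * Check at the point of call. The address of a local only approximates
 * the stack pointer; an assembler routine could read the register itself.
 */
static inline bool stack_check_here(void)
{
    volatile char marker;

    return stack_pointer_overflowed((uintptr_t)&marker);
}
```

A function under suspicion would call stack_check_here() on entry and, if it returns true, raise an exception or record a stack trace as described above.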

Unfortunately, the stack-check function can impose extreme runtime overhead. The execution time of the stack-check function is added to every function call in your system. One option is to use it selectively; some functions might not be suspect. However, the more selective you are, the less likely you are to catch the bug. Also, if you decide to be selective, be aware that even if the function doesn't have any large arrays, it still might be the offending function. Also, it might be the function that called the current function that is really to blame. Referring to Listing 12.18, if func2() does nothing but increment arg and return the value (using very little stack), it still can be the offending function because of the stack depth caused by func().

A stack-check function may be impractical for other reasons. You might not have source for all the code, so you don't have the option of installing this function everywhere. You might have all the source but might have coded large parts of the project without using the stack-check macro. Adding the calls late in a large project is both tedious and error prone.

One Way To Be Prepared

Rather than find out too late that you should have made provisions for a stack check, you might want to make a suitable macro part of your project's coding standard. Because the call is a macro, including it in the code doesn't mean it gets used by the code. Most of the time, you use an empty definition for STKCHK(). With an empty definition, the call does not generate any code. When you encounter a stack problem, you can activate the macro by changing its definition (see Listing 12.19).

Listing 12.19: Creating a Stack-Check Hook.

    #include "stackchecker.h"

    void func(void)
    {
        STKCHK();
        /* yada, yada, yada: the rest of the function goes here */
    }
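Listing 12.19 includes stackchecker.h, but the source does not show that file's contents. One way it might be written is sketched below. The STKCHK_ENABLE build flag, the stack_check() function, and demo_func() are all hypothetical names introduced for illustration.

```c
/* --- stackchecker.h (hypothetical contents, not from the source) --- */
#ifndef STACKCHECKER_H
#define STACKCHECKER_H

#ifdef STKCHK_ENABLE                      /* hypothetical build flag */
/* Assumed to be implemented elsewhere, possibly in assembler. */
void stack_check(const char *func_name);
#define STKCHK()  stack_check(__func__)
#else
#define STKCHK()  ((void)0)               /* empty definition: generates no code */
#endif

#endif /* STACKCHECKER_H */

/* --- application code following the coding standard --- */
int demo_func(int arg)
{
    STKCHK();  /* compiles to nothing unless STKCHK_ENABLE is defined */
    return arg + 1;
}
```

With this arrangement, turning the check on for the whole project is a matter of defining the flag at build time and linking in the checker, with no edits to the functions themselves.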