Code Hardening Techniques | GNU/Linux Application Programming (Programming Series)

Code hardening can take a number of different forms, and entire books have been written on the topic. In this section, we'll look at a variety of techniques that can help build better code.

Return Values

The failure to check return values is one of the most common mistakes made in modern software. Many applications call user or system functions and are very optimistic about their successful operation. When building hardened software, all reasonable attempts should be made to check return values and, if failures are found, to deal with them appropriately. "Reasonable" is the key word here; consider the following bogus example:

    ret = printf("Current mode is %d\n", mode);
    if (ret < 0) {
      ret = printf("An error occurred emitting mode.\n");
    }

The point is easily illustrated: reporting a failed printf by calling printf again accomplishes nothing. In most cases, though, the return value of a user or system call is relevant and should be checked.
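
By contrast, file I/O is a case where every return value is worth checking. The following is a minimal sketch; the helper name read_first_line and its 0/-1 return convention are assumptions made for illustration, not part of the original text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first line of a file into buf.
 * Returns 0 on success and -1 on any failure. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp;
    int got_line;

    if (path == NULL || buf == NULL || len == 0) {
        return -1;
    }

    fp = fopen(path, "r");
    if (fp == NULL) {
        perror("fopen");              /* say why the open failed */
        return -1;
    }

    /* fgets returns NULL on a read error or an empty file */
    got_line = (fgets(buf, (int)len, fp) != NULL);
    if (!got_line) {
        buf[0] = '\0';
    }

    if (fclose(fp) != 0) {            /* fclose can fail as well */
        perror("fclose");
        return -1;
    }

    if (!got_line) {
        return -1;
    }

    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return 0;
}
```

A caller would test the result, for example if (read_first_line(path, line, sizeof(line)) != 0), and take its own recovery path rather than assume the buffer holds valid data.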

Strongly Consider User/Network I/O

Whenever we develop applications that take input either from a user or from the network (such as a Sockets application), it's even more critical to scrutinize the incoming data. Two of the most common errors are receiving insufficient data for a given operation and receiving more data than the available buffer space can hold.
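
Both errors can be caught before any data is copied. The sketch below assumes a made-up wire format (one type byte, one length byte, then the payload); the names message_t, parse_message, and the size limits are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MSG_HEADER_LEN    2    /* assumed: 1 type byte + 1 length byte */
#define MSG_MAX_PAYLOAD  32    /* assumed payload limit */

typedef struct {
    unsigned char type;
    unsigned char length;
    unsigned char payload[MSG_MAX_PAYLOAD];
} message_t;

/* Returns 0 on success, -1 if too little data was received,
 * and -2 if the claimed payload would not fit in the buffer. */
int parse_message(const unsigned char *data, size_t received, message_t *msg)
{
    size_t payload_len;

    if (data == NULL || msg == NULL) {
        return -1;
    }
    if (received < MSG_HEADER_LEN) {
        return -1;                       /* not even a full header */
    }

    payload_len = data[1];
    if (payload_len > MSG_MAX_PAYLOAD) {
        return -2;                       /* more data than buffer space */
    }
    if (received < MSG_HEADER_LEN + payload_len) {
        return -1;                       /* header promises more than arrived */
    }

    msg->type = data[0];
    msg->length = (unsigned char)payload_len;
    memcpy(msg->payload, data + MSG_HEADER_LEN, payload_len);
    return 0;
}
```

The length field is never trusted on its own: it is compared against both the destination buffer and the number of bytes actually received.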

Use Safe String Functions

A number of standard C library functions suffer from security problems. The problem they present is that there's no bounds checking, which means that they can be exploited (we'll discuss the buffer overflow issue shortly). The simple solution to this problem is to avoid unsafe functions and instead use the safe versions (as shown in Table 26.1). The bounded versions still require care: strncpy, for example, does not NUL-terminate the destination when the source is as long as or longer than the given size.
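
As a minimal sketch of using the bounded functions, the two helpers below (copy_string and copy_string_strncpy, both hypothetical names) copy a string into a fixed-size buffer. The first uses snprintf and reports truncation; the second uses strncpy and terminates the result explicitly.

```c
#include <stdio.h>
#include <string.h>

/* Returns 0 if src fit in dst, -1 if it was truncated or the
 * arguments were invalid. dst is NUL-terminated in both cases
 * where a copy takes place. */
int copy_string(char *dst, size_t dstsize, const char *src)
{
    int needed;

    if (dst == NULL || src == NULL || dstsize == 0) {
        return -1;
    }

    /* snprintf returns the length it wanted to write, so a value
     * of dstsize or more means the output was truncated. */
    needed = snprintf(dst, dstsize, "%s", src);
    if (needed < 0 || (size_t)needed >= dstsize) {
        return -1;
    }
    return 0;
}

/* strncpy alone may leave dst unterminated, so terminate it here. */
void copy_string_strncpy(char *dst, size_t dstsize, const char *src)
{
    if (dst == NULL || src == NULL || dstsize == 0) {
        return;
    }
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}
```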

Buffer Overflow

Buffer overruns cause unpredictable software behavior in the best case and security exploits in the worst. Buffer overruns can be avoided very simply. Consider the following erroneous example:

    static char myArray[10];
    ...
    int i;
    for (i = 0 ; i < 10 ; i++) {
      myArray[i] = (char)(0x30+i);
    }
    myArray[i] = 0;    /* <-- Overrun */

Table 26.1: Safe Replacements for C Library Functions
Unsafe Function	Safe Replacement	Header
gets	fgets	stdio.h
sprintf	snprintf	stdio.h
strcat	strncat	string.h
strcpy	strncpy	string.h
strcmp	strncmp	string.h
strcasecmp	strncasecmp	strings.h
vsprintf	vsnprintf	stdio.h

In this example, we've overrun the bounds of our array by writing to the eleventh element. Whatever object follows this array is now corrupted. There's actually a very simple solution to this problem, and it involves a better programming practice using symbolic constants. In the next example, we create a constant defining the size of our array, but then we add one more element for the trailing NUL terminator.

    #define ARRAY_SIZE    10
    static char myArray[ARRAY_SIZE+1];
    ...
    int i;
    for (i = 0 ; i < ARRAY_SIZE ; i++) {
      myArray[i] = (char)(0x30+i);
    }
    myArray[ARRAY_SIZE] = 0;

We've automatically protected our array with an extra element at the end and, as good programming practice, we've used a symbol to denote the size of the array rather than relying on a bare number.
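
A related approach, sketched here as an assumption rather than as the author's method, is to pass the buffer size along with the buffer so that the loop bound comes from sizeof instead of a repeated constant. The function names fill_digits and fill_my_array are hypothetical.

```c
#include <stddef.h>

#define ARRAY_SIZE    10

static char myArray[ARRAY_SIZE + 1];

/* Fill buf with ASCII digits, always leaving room for the terminator.
 * Returns the number of digits written. */
size_t fill_digits(char *buf, size_t bufsize)
{
    size_t i;

    if (buf == NULL || bufsize == 0) {
        return 0;
    }

    /* Stop at ten digits or one short of the buffer, whichever is first */
    for (i = 0; i < bufsize - 1 && i < 10; i++) {
        buf[i] = (char)('0' + i);
    }
    buf[i] = '\0';

    return i;
}

/* The caller never restates the size; sizeof tracks the declaration. */
size_t fill_my_array(void)
{
    return fill_digits(myArray, sizeof(myArray));
}
```

If the declaration of myArray later changes size, the call in fill_my_array stays correct without further edits.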

Provide Logical Alternatives at Decision Points

A very common mistake that can yield unpredictable results is the absence of a default section in a switch statement. Consider the following example:

    switch(mode) {
      case OPERATIONAL_MODE:
        /* switch to operational mode processing */
        break;
      case BUILT_IN_TEST_MODE:
        /* switch to test processing */
        break;
    }

In the event another mode was added but this particular code segment was not updated, the result after this segment has executed is unpredictable. The solution is to always include a default section that either asserts (in debugging mode) or at a minimum notifies the caller that a problem has occurred. If we're really not expecting another mode, we can simply assert here to catch the condition during debugging:

    switch(mode) {
      case OPERATIONAL_MODE:
        /* switch to operational mode processing */
        break;
      case BUILT_IN_TEST_MODE:
        /* switch to test processing */
        break;
      default:
        assert(0);
        break;
    }
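
Where asserts are compiled out, the default section can notify the caller instead. The following is a minimal sketch of that alternative; the function name processMode, the enum values, and the 0/-1 return convention are assumptions for illustration.

```c
#include <stdio.h>

enum { OPERATIONAL_MODE = 0, BUILT_IN_TEST_MODE = 1 };

/* Returns 0 if the mode was handled, -1 if it was not recognized. */
int processMode(int mode)
{
    switch (mode) {
    case OPERATIONAL_MODE:
        /* switch to operational mode processing */
        return 0;
    case BUILT_IN_TEST_MODE:
        /* switch to test processing */
        return 0;
    default:
        /* Unknown mode: report it and let the caller decide what to do */
        fprintf(stderr, "processMode: unexpected mode %d\n", mode);
        return -1;
    }
}
```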

A similar problem exists with if/then/else chains. The following example illustrates the problem:

    float multiplier = 0.0;
    if (state == FIRST_STAGE) multiplier = 0.75;
    else if (state == SECOND_STAGE) multiplier = 1.25;

If our state is corrupted or takes on a value that we did not expect, then our multiplier takes on the value of 0.0, and the result is unpredictable at best and, depending upon the application, catastrophic at worst. An else should be provided to at a minimum catch the issue, such as seen here:

    float multiplier = 0.0;
    if (state == FIRST_STAGE) multiplier = 0.75;
    else if (state == SECOND_STAGE) multiplier = 1.25;
    else multiplier = SAFE_MULTIPLIER;

In many cases, the trailing else isn't necessary, but whenever a chain lacks one, it should be given extra scrutiny to avoid erroneous results.

Self-Identifying Structures

A self-identifying structure is a method that mimics the concept of runtime type checking present in strongly typed languages. In a strongly typed language, the use of an invalid type results in a runtime error. Consider the passing of pointers in a weakly typed language such as C. With C typecasting, it's not difficult to confuse one structure for another.

With a simple policy change to C structures and a limited amount of checking, we can help ensure that functions are dealing with the right types. Consider the C source shown in Listing 26.1. At lines 6-12, we see our target structure, which contains a special header called a signature (sometimes called a runtime type identifier). The type is shown at line 4, in this case simply a signature that uniquely represents our structure. We then provide two macro functions that initialize (INIT_TARGET_MARKER) and then check (CHECK_TARGET_MARKER) the signature in the structure.

Skipping ahead a little, we look at the main function at lines 34-54. We allocate two objects (both of size targetMarker_t) and then initialize one of them as an actual target marker using the INIT_TARGET_MARKER macro. Finally, we try to display each of the objects by passing each to the displayTarget function.

In our displayTarget function (lines 22-31), our first task is to check the signature of the received object by calling CHECK_TARGET_MARKER. If the signature is not correct, we assert rather than risk providing bogus information. Granted, in a production system we could probably handle this better, but this illustrates the concept.

Listing 26.1 Illustrating a Self-identifying Structure (on the CD-ROM at ./source/ch26/selfident.c )

  1  :       #include <stdio.h>
  2  :       #include <assert.h>
  3  :       #include <stdlib.h>
  4  :       #define TARGET_MARKER_SIG       0xFAF32000
  5  :
  6  :       typedef struct {
  7  :
  8  :         unsigned int signature;
  9  :         unsigned int targetType;
 10  :         double       x, y, z;
 11  :
 12  :       } targetMarker_t;
 13  :
 14  :
 15  :       #define INIT_TARGET_MARKER(ptr) \
 16  :                 (((targetMarker_t *)(ptr))->signature = TARGET_MARKER_SIG)
 17  :       #define CHECK_TARGET_MARKER(ptr) \
 18  :                 assert(((targetMarker_t *)(ptr))->signature == \
 19  :                           TARGET_MARKER_SIG)
 20  :
 21  :
 22  :       void displayTarget(targetMarker_t *target)
 23  :       {
 24  :
 25  :         /* Pre-check of the target structure */
 26  :         CHECK_TARGET_MARKER(target);
 27  :
 28  :         printf("Target type is %u\n", target->targetType);
 29  :
 30  :         return;
 31  :       }
 32  :
 33  :
 34  :       int main()
 35  :       {
 36  :         void *object1, *object2;
 37  :
 38  :         /* Create two objects */
 39  :         object1 = (void *)malloc(sizeof(targetMarker_t));
 40  :         assert(object1);
 41  :         object2 = (void *)malloc(sizeof(targetMarker_t));
 42  :         assert(object2);
 43  :
 44  :         /* Init object1 as a target marker struct */
 45  :         INIT_TARGET_MARKER(object1);
 46  :
 47  :         /* Try to display object1 */
 48  :         displayTarget((targetMarker_t *)object1);
 49  :
 50  :         /* Try to display object2 */
 51  :         displayTarget((targetMarker_t *)object2);
 52  :
 53  :         return 0;
 54  :       }
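
One way a production system might handle a bad signature, sketched here as an assumption rather than as the author's approach, is to return an error to the caller instead of asserting. The names isTargetMarker and displayTargetChecked are hypothetical; the structure and signature value are taken from Listing 26.1.

```c
#include <stdio.h>
#include <stdlib.h>

#define TARGET_MARKER_SIG       0xFAF32000u

typedef struct {
    unsigned int signature;
    unsigned int targetType;
    double       x, y, z;
} targetMarker_t;

/* Returns 1 if obj carries the target-marker signature, 0 otherwise.
 * Assumes obj, when non-NULL, points to at least sizeof(targetMarker_t)
 * readable bytes. */
int isTargetMarker(const void *obj)
{
    return obj != NULL &&
           ((const targetMarker_t *)obj)->signature == TARGET_MARKER_SIG;
}

/* Returns 0 if the target was displayed, -1 if obj failed the check. */
int displayTargetChecked(const void *obj)
{
    const targetMarker_t *target;

    if (!isTargetMarker(obj)) {
        fprintf(stderr, "displayTargetChecked: object is not a target marker\n");
        return -1;
    }

    target = obj;
    printf("Target type is %u\n", target->targetType);
    return 0;
}
```

Allocating objects with calloc rather than malloc makes the check more dependable, because a zero-filled object can never carry the signature by accident.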

Reporting Errors

The reporting of errors is an interesting topic because the policy that's chosen can be very different, depending upon the type of application we're developing. For example, if we're writing a command-line utility, emitting error messages to stderr is a common method to communicate errors to the user. But what happens if we're building an application that has limited user I/O capabilities, such as an embedded Linux application? There are a number of possibilities, including the generation of a specialized log or use of the standard system log (syslog). The syslog function has the prototype:

    #include <syslog.h>
    void syslog(int priority, const char *format, ...);

To the syslog function, we provide a priority, a format string, and some arguments (similar to printf). The priority can be one of LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, or LOG_DEBUG. An example of using syslog to generate a message to the system log is shown in Listing 26.2.

Listing 26.2 Simple Example of syslog Use (on the CD-ROM at ./source/ch26/_simpsyslog.c )

    #include <syslog.h>

    int main()
    {

      syslog(LOG_ERR, "Unable to load configuration!");

      return 0;
    }

This results in our system log (stored within our filesystem at /var/log/messages) being updated as:

 Jul 21 18:13:10 camus sltest: Unable to load configuration!

In this example, our application in Listing 26.2 was called sltest, with the hostname of camus. The system log can be especially useful because it's an aggregate of many error reports. This allows a developer to see where a message was generated in relation to others, which can be very useful in helping to understand problems.

Note	The syslog is very useful for communicating information for system applications and daemons.

One final topic on error reporting is that of being specific about the error being reported. The error message must uniquely identify the error in order for the user to be able to deal with it reasonably.
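
A more specific version of the message in Listing 26.2 might name the file involved and the reason for the failure. The sketch below is one way to do that; the function names formatLoadError and reportLoadError are hypothetical, and the identifier sltest is reused from the example above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Build a message stating what failed, on which file, and why.
 * err is an errno value. Returns 0 if the message fit, -1 otherwise. */
int formatLoadError(char *buf, size_t len, const char *path, int err)
{
    int n = snprintf(buf, len, "Unable to load configuration '%s': %s",
                     path, strerror(err));

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Send the specific message to the system log. */
void reportLoadError(const char *path, int err)
{
    char msg[256];

    /* A truncated message is still worth logging, so the result is ignored */
    (void)formatLoadError(msg, sizeof(msg), path, err);

    openlog("sltest", LOG_PID, LOG_USER);
    syslog(LOG_ERR, "%s", msg);   /* never pass msg itself as the format */
    closelog();
}
```

A caller would typically save errno immediately after the failing call and pass it in, for example reportLoadError(path, savedErrno), so that later library calls cannot overwrite it.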

Reducing Complexity Also Reduces Potential Bugs

Code that is of higher complexity potentially contains more bugs. It's a fact of life, but one that we can use to help reduce defects. In some disciplines this is called refactoring, but the general goal is to take a complex piece of software and break it up so that it's more easily understood. This very act can lead to higher quality software that is more easily maintained.

Self-Protective Functions

Writing self-protective functions can be a very useful debugging mechanism to ensure that your software is correct. The programming language Eiffel includes language features to provide this mechanism (known as programming-by-contract, or design by contract).

Being self-protective means that when you write a function, you scrutinize the input to the function and, upon completion of its processing, scrutinize the output to ensure that what you've done is correct.

Let's look at an example of a simple function that illustrates this behavior (see Listing 26.3).

Note	If an expression evaluates to false (0), the assert macro writes a diagnostic message to stderr and aborts the application. To disable asserts within an application, the NDEBUG symbol can be defined before assert.h is included (for example, with -DNDEBUG on the compiler command line), which causes the assert calls to compile to nothing.

Listing 26.3 Example of a Self-protective Function (on the CD-ROM at ./source/ch26/_selfprot.c )

    STATUS_T checkAntennaStatus(ANTENNA_T antenna, MODE_T *mode)
    {
      STATUS_T retStatus;   /* same type as the function's return type */

      /* Validate the input */
      assert(validAntenna(antenna));
      assert(validMode(mode));


      /**/
      /* Internal checkAntennaStatus processing */
      /**/


      /* We may have changed modes, check it. */
      assert(validMode(mode));

      return retStatus;
    }

In Listing 26.3 we see a function that first ensures that it's getting good data (validating input) and then that what it's providing is correct (checking output). We also could have returned errors upon finding these conditions, but for this example, we're mandating proper behavior at all levels. If all functions performed this activity, finding the real source of bugs would be a snap.

The use of assert isn't restricted just to ensuring that function inputs and outputs are correct. It can also be used for internal consistency. Any critical failure that should be identified during debugging is easily handled with assert.

Note	Using the assert call for internal consistency can be one of the few practical ways to find timing (race condition) bugs in threaded code.
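
Listing 26.3 relies on types and helpers defined elsewhere. As a self-contained sketch of the same idea, the hypothetical function indexOfMax below validates its input on entry and checks its own result before returning.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the largest element in values[0..count-1]. */
size_t indexOfMax(const int *values, size_t count)
{
    size_t i;
    size_t best = 0;

    /* Validate the input */
    assert(values != NULL);
    assert(count > 0);

    for (i = 1; i < count; i++) {
        if (values[i] > values[best]) {
            best = i;
        }
    }

    /* Validate the output: the index is in range, and no element
     * is larger than the one we are about to report. */
    assert(best < count);
    for (i = 0; i < count; i++) {
        assert(values[best] >= values[i]);
    }

    return best;
}
```

The output check repeats work the function has already done, which is acceptable here because the asserts disappear when NDEBUG is defined.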

Maximize Debug Output

Too much output can disguise errors; too little and an error could be missed. The right balance must be found when emitting debug and error output to ensure that only the necessary information is presented, to avoid overloading an already overloaded user.
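
One common way to strike this balance is to give each message a severity level and filter against a verbosity setting at runtime. The following is a minimal sketch of that idea; the level names, the helper functions, and the DEBUG_LOG macro are all assumptions for illustration (the macro relies on C99 variadic macros).

```c
#include <stdio.h>

/* Lower values are more important */
enum { DBG_ERROR = 0, DBG_WARN = 1, DBG_INFO = 2, DBG_TRACE = 3 };

static int debugLevel = DBG_WARN;    /* default: errors and warnings only */

void setDebugLevel(int level)
{
    debugLevel = level;
}

int debugEnabled(int level)
{
    return level <= debugLevel;
}

/* Emit the message to stderr only if its level is currently enabled.
 * The first variadic argument must be a printf-style format string. */
#define DEBUG_LOG(level, ...) \
    do { \
        if (debugEnabled(level)) { \
            fprintf(stderr, "%s:%d: ", __FILE__, __LINE__); \
            fprintf(stderr, __VA_ARGS__); \
            fputc('\n', stderr); \
        } \
    } while (0)
```

With the default level, DEBUG_LOG(DBG_INFO, "retries=%d", n) emits nothing; after setDebugLevel(DBG_TRACE), the same call prints the message prefixed with its file and line.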

Memory Debugging

There are many libraries available that support debugging dynamic memory management on GNU/Linux. One of the most popular is called Electric Fence, which uses the underlying processor's MMU (memory management unit) to catch memory errors via segmentation faults. Electric Fence can also detect accesses beyond the bounds of dynamically allocated arrays. The Electric Fence library is very powerful and identifies memory errors at the moment they occur.

Compiler Support

The compiler itself can be an invaluable tool to identify issues in our code. When we build software, we should always enable warnings using the -Wall flag. To further ensure that warnings aren't missed in large applications, we can enable the -Werror flag, which treats warnings as errors and therefore halts further compilation of a source file. When building an application that has many source files, this combination can be beneficial. This is demonstrated as:

 gcc -Wall -Werror test.c -o test

If we want our source to have ANSI compatibility, we can enable checking for ANSI compliance (with pedantic checking) as:

 gcc -ansi -pedantic test.c -o test

Identifying uninitialized variables is a very useful test, but in addition to the warning option, optimization must also be enabled, because the data flow information is available only when the code is optimized:

 gcc -Wall -O -Wuninitialized test.c -o test

Chapter 4, "The GNU Compiler Toolchain," provides additional warning information. The gcc man page also documents numerous warning options beyond those enabled via -Wall.