Preventing Buffer Overruns

The first line of defense is simply to write solid code! Although some aspects of writing secure code are a little arcane, preventing buffer overruns is mostly a matter of writing a robust application. Writing Solid Code (Microsoft Press, 1993), by Steve Maguire, is an excellent resource. Even if you re already a careful, experienced programmer, this book is still worth your time.

Always validate all your inputs the world outside your function should be treated as hostile and bent upon your destruction. Likewise, nothing about the function s internal implementation, nothing other than the function s expected inputs and output, should be accessible outside the function. I recently exchanged mail with a programmer who had written a function that looked like this:

void PrintLine(const char* msg) { char buf[255]; sprintf(buf, "Prefix %s suffix\n", msg);  }

When I asked him why he wasn t validating his inputs, he replied that he controlled all the code that called the function, he knew how long the buffer was, and he wasn t going to overflow it. Then I asked him what he thought might happen if someone else who wasn t that careful needed to maintain his code. Oh, he said. This type of construct is just asking for trouble functions should always fail gracefully, even if unexpected input is passed into the function.

Safe String Handling

String handling is the single largest source of buffer overruns, so a review of the commonly used functions is in order. Although I m going to cover the single-byte versions, the same problems apply to the wide-character string-handling functions. To complicate matters even further, Windows systems support lstrcpy, lstrcat, and lstrcpyn, and the Windows shell contains similar functions, such as StrCpy, StrCat, and StrCpyN exported from Shlwapi.dll. Although the lstr family of calls varies a little in the details and the calls work with both single-byte and multibyte character sets depending on how an LPTSTR ends up being defined by the application, they suffer from the same problems as the more familiar ANSI versions.

strcpy

The strcpy function is inherently unsafe and should be used rarely, if at all. Let s take a look at the function declaration:

char *strcpy( char *strDestination, const char *strSource );

The number of ways that this function call can blow up is nearly unlimited. If either the destination or the source buffer is null, you end up in the exception handler. If the source buffer isn t null-terminated, the results are undefined, depending on how lucky you are about finding a random null byte. The greatest problem is that if the source string is longer than the destination buffer, an overflow occurs. This function can be used safely only in trivial cases, such as copying a fixed string into a buffer to prefix another string.

Here s some code that handles this function as safely as possible:

/*This function shows how to use strcpy as safely as possible.*/ bool HandleInput(const char* input) { char buf[80]; if(input == NULL) { assert(false); return false; } //The strlen call will blow up if input isn't null-terminated. if(strlen(input) < sizeof(buf)) { //Everything checks out. strcpy(buf, input); } else { return false; } //Do more processing of buffer. return true; }

As you can see, this is quite a bit of error checking, and if the input string isn t null-terminated, the function will probably throw an exception. I ve had programmers argue with me that they ve checked dozens of uses of strcpy and that most of them were done safely. That may be the case, but if they always used safer functions, there would be a lower incidence of problems. Even if a programmer is careful, it s easy for the programmer to make mistakes with strcpy. I don t know about you, but I write enough bugs into my code without making it any easier on myself. I know of several software projects in which strcpy was banned and the incidence of reported buffer overruns dropped significantly.

Consider placing the following into your common headers:

#define strcpy Unsafe_strcpy

This statement will cause any instances of strcpy to throw compiler errors. I look at it as a safety matter I might not get tossed off my horse often, but I always wear a helmet in case I am. Likewise, if I use only safe string-handling functions, it s much less likely that an error on my part will become a catastrophic failure. If you eliminate strcpy from your code base, it s almost certain that you ll remove a few bugs along with it.

strncpy

The strncpy function is much safer than its cousin, but it also comes with a few problems. Here s the declaration:

char *strncpy( char *strDest, const char *strSource, size_t count );

The obvious problems are still that passing in a null or otherwise illegal pointer for source or destination will cause exceptions. Another possible way to make a mistake is for the count value to be incorrect. Note, however, that if the source buffer isn t null-terminated, the code won t fail. You might not anticipate the following problem: no guarantee exists that the destination buffer will be null-terminated. (The lstrcpyn function does guarantee this.) I also normally consider it a severe error if the user input passed in is longer than my buffers allow that s usually a sign that either I ve screwed up or someone is trying to hack me. The strncpy function doesn t make it easy to determine whether the input buffer was too long. Let s take a look at a couple of examples.

Here s the first:

/*This function shows how to use strncpy. A better way to use strncpy will be shown next.*/ bool HandleInput_Strncpy1(const char* input) { char buf[80]; if(input == NULL) { assert(false); return false; } strncpy(buf, input, sizeof(buf) - 1); buf[sizeof(buf) - 1] = '\0'; //Do more processing of buffer. return true; }

This function will fail only if input is an illegal pointer. You also need to pay attention to the use of the sizeof operator. If you use sizeof, you can change the buffer size in one place, and you won t end up having unexpected results 100 lines down. Moreover, you should always set the last character of the buffer to a null character. The problem here is that we re not sure whether the input was too long. The documentation on strncpy helpfully notes that no return value is reserved for an error. Some people are quite happy just to truncate the buffer and continue, thinking that some code farther down will catch the error. This is wrong. Don t do it! If you re going to end up throwing an error, do it as close as possible to the source of the problem. It makes debugging a lot easier when the error happens near the code that caused it. It s also more efficient why execute more instructions than you have to? Finally, the truncation might just happen in a place that causes unexpected results ranging from a security hole to user astonishment. (According to The Tao of Programming [Info Books, 1986], by Jeffrey James, user astonishment is always bad.) Take a look at the following code, which fixes this problem:

/*This function shows a better way to use strncpy. It assumes that input should be null-terminated.*/ bool HandleInput_Strncpy2(const char* input) { char buf[80]; if(input == NULL) { assert(false); return false; } buf[sizeof(buf) - 1] = '\0'; strncpy(buf, input, sizeof(buf)); if(buf[sizeof(buf) - 1] != '\0') { //Overflow! return false; } //Do more processing of buffer. return true; }

The HandleInput_Strncpy2 function is much more robust. The changes are that I set the last character to a null character first as a test and then allow strncpy to write the entire length of the buffer, not sizeof(buf) 1. Then I check for the overflow condition by testing to see whether the last character is still a null.

sprintf

The sprintf function is right up there with strcpy in terms of the mischief it can cause. There is almost no way to use this function safely. Here s the declaration:

int sprintf( char *buffer, const char *format [, argument] ... );

Except in trivial cases, it isn t easy to verify that the buffer is long enough for the data before calling sprintf. Let s take a look at an example:

/* Example of incorrect use of sprintf */ bool SprintfLogError(int line, unsigned long err, char* msg) { char buf[132]; if(msg == NULL) { assert(false); return false; } //How many ways can sprintf fail??? sprintf(buf, "Error in line %d = %d - %s\n", line, err, msg); // Do more stuff such as logging the error to file and displaying // it to user. return true; }

How many ways can this function fail? If msg isn t null-terminated, SprintfLogError will probably throw an exception. I ve used 21 characters to format the error. The err argument can take up to 10 characters to display, and the line argument can take up to 11 characters. (Line numbers shouldn t be negative, but something could go wrong.) So it s safe to pass in only 89 characters for the msg string. Remembering the number of characters that can be used by the various format codes is difficult. The return from sprintf isn t a lot of help either. It tells you how many characters were written, so you could write code like this:

if(sprintf(buf, "Error in line %d = %d - %s\n", line, err, msg) >= sizeof(buf)) exit(-1);

There is no graceful recovery. You ve overwritten who knows how many bytes with who knows what, and you might have just overwritten your exception handler pointer! You cannot use exception handling to mitigate a buffer overflow; your attacker can cause your exception-handling routines to do their work for them. The damage has already been done the game is over, and the attacker won. If you re determined to use sprintf, a nasty hack will allow you to do it safely. (I m not going to show an example.) Open the NUL device for output with fopen and call fprintf and the return value from fprintf tells you how many bytes would be needed. You could then check that value against your buffer or even allocate as much as you need. The _output function underlies the entire printf family of calls, and it has considerable overhead. Calling _output twice just to format some characters into a buffer isn t efficient.

_snprintf

The _snprintf function is one of my favorites. It has the following declaration:

int _snprintf( char *buffer, size_t count, const char *format [, argument] ... );

You have all the flexibility of _sprintf, and it s safe to use. Here s an example:

/*Example of _snprintf usage*/ bool SnprintfLogError(int line, unsigned long err, char* msg) { char buf[132]; if(msg == NULL) { assert(false); return false; } if(_snprintf(buf, sizeof(buf)-1, "Error in line %d = %d - %s\n", line, err, msg) < 0) { //Overflow! return false; } else { buf[sizeof(buf)-1] = '\0'; } // Do more stuff, such as logging the error to a file // and displaying it to user. return true; }

It seems that you must worry about something no matter which of these functions you use: _snprintf doesn t guarantee that the destination buffer is null-terminated at least not as it s implemented in the Microsoft C run-time library so you have to check that yourself. To make matters even worse, this function wasn t part of the C standard until the ISO C99 standard was adopted. Because _snprintf is a nonstandard function, which is why it starts with an underscore, four behaviors are possible if you re concerned about writing cross-platform code. It can return a negative number if the buffer was too small, it can return the number of bytes that it should have written, and it might or might not null-terminate the buffer. If you re concerned about writing portable code, it is usually best to write a macro or wrapper function to check for errors that will isolate the differences from the main line of code. Other than remembering to write portable code, just remember to specify the character count as one less than the buffer size to always allow room for the trailing null character, and always null-terminate the last character of the buffer.

Concatenating strings can be unsafe using the more traditional functions. Like strcpy, strcat is unsafe except in trivial cases, and strncat is difficult to use. Using _snprintf makes concatenating strings easy and safe. As a result of a debate I had with one of my developers, I once tested the performance difference between _snprintf and strncpy followed by strncat. It isn t substantial unless you re in a tight loop doing thousands of operations.

Standard Template Library Strings

One of the coolest aspects of writing C++ code is using the Standard Template Library (STL). The STL has saved me a lot of time and made me much more efficient. My earlier complaint about there not being a native string type in C is now answered. A native string type is available in C++. Here s an example:

/*Example of STL string type*/ #include <string> using namespace std; void HandleInput_STL(const char* input) { string str1, str2; //Use this form if you're sure that the input is null-terminated. str1 = input; //If you're not sure whether input is null-terminated, you can // do the following: str2.append(input, 132); //132 == max characters to copy in //Do more processing here. //Here's how to get the string back. printf("%s\n", str2.c_str()); }

I can t think of anything easier than this! If you want to concatenate two strings, it s as simple as

string s1, s2; s1 = "foo"; s2 = "bar" //Now s1 = "foobar" s1 += s2;

The STL also has several really useful member functions you can use to find characters and strings within another string and truncate the string. It comes in a wide-character version too. Microsoft Foundation Classes (MFC) CStrings work almost exactly the same way.

gets and fgets

A chapter on unsafe string handling wouldn t be complete without a mention of gets. The gets function is defined as

char *gets( char *buffer );

This function is just a disaster waiting to happen. It s going to read from the stdin stream until it gets a linefeed or carriage return. There s no way to know whether it s going to overflow the buffer. Don t use gets use fgets or a C++ stream object instead.