The Sin Explained

Formatting data for display or storage can be a somewhat difficult task. Thus, many computer languages include routines to easily reformat data. In most languages, the formatting information is described using some sort of string, called the format string. The format string is actually written in a limited data-processing language that's designed to make it easy to describe output formats. But many developers make an easy mistake: they use data from untrusted users as the format string. As a result, attackers can write strings in that data-processing language to make the program misbehave in many ways.

C and C++ make this especially dangerous: their design makes format string problems harder to detect, and their format strings include some especially dangerous commands (particularly %n) that do not exist in some other languages' format string languages.

In C/C++, a function can be declared to take a variable number of arguments by specifying an ellipsis (...) as the last argument (in C++, it can also be the only one). The problem is that the called function has no way to know how many arguments were actually passed in; it has to trust whatever the caller tells it, and for the printf family, that information comes from the format string. The most common set of functions to take variable-length arguments is the printf family: printf, sprintf, snprintf, fprintf, vprintf, and so on. The wide-character versions of these functions have the same problem. Let's take a look at an illustration:

    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        if (argc > 1)
            printf(argv[1]);
        return 0;
    }

Fairly simple stuff. Now let's look at what can go wrong. The programmer expects the user to enter something benign, such as Hello World. If you give it a try, you'll get back Hello World. Now let's change the input a little: try %x %x. On a Windows XP system using the default command line (cmd.exe), you'll now get the following:

    E:\projects_sins\format_bug>format_bug.exe "%x %x"
    12ffc0 4011e5

Note that if you're running a different operating system, or are using a different command line interpreter, you may need to make some changes to get this exact string fed into your program, and the results will likely be different. For ease of use, you could put the arguments into a shell script or batch file.

What happened? The input string told printf to expect two arguments to have been pushed onto the stack before the call. Nobody pushed them, so the %x specifiers let you read whatever was on the stack, four bytes at a time on this 32-bit system, and by adding more specifiers you could keep reading as far as you'd like. It isn't hard to imagine that if a more complex function stored a secret in a stack variable, the attacker would be able to read the secret. The output here is the address of a stack location (0x12ffc0), followed by the code location that the main() function will return into. As you can imagine, both of these are extremely important pieces of information that are being leaked to an attacker.

You may now be wondering just how the attacker uses a format string bug to write memory. One of the least used format specifiers is %n, which writes the number of characters that should have been written so far into the variable whose address you passed as the corresponding argument. Here's how it should be used:

    int bytes;   /* %n expects a pointer to int */
    printf("%s%n\n", argv[1], &bytes);
    printf("Your input was %d characters long\n", bytes);

The output would be:

    E:\projects_sins\format_bug>format_bug2.exe "Some random input"
    Some random input
    Your input was 17 characters long

On a platform with four-byte integers, the %n specifier writes four bytes at once, and %hn writes two. Now attackers only have to figure out how to get the address they'd like into the appropriate position on the stack, and tweak the field width specifiers until the number of characters written so far equals the value they want to store.

Note 

You can find a more complete demonstration of the steps needed to conduct an exploit in Chapter 5 of Writing Secure Code, Second Edition by Michael Howard and David C. LeBlanc (Microsoft Press, 2002), or in The Shellcoder's Handbook: Discovering and Exploiting Security Holes by Jack Koziol, David Litchfield, Dave Aitel, Chris Anley, Sinan "noir" Eren, Neel Mehta, and Riley Hassell (Wiley, 2004).

For now, let's just assume that if you allow attackers to control the format string in a C/C++ program, it is only a matter of time before they figure out how to make you run their code. An especially nasty aspect of this type of attack is that, before launching the attack, they can probe the stack and correct the attack on the fly. In fact, the first time the author demonstrated this attack in public, he used a different command line interpreter than he'd used to create the demonstration, and it didn't work. Thanks to the unique flexibility of this attack, it was possible to correct the problem and exploit the sample application with the audience watching.

Most other languages don't support the equivalent of a %n format specifier, so they aren't directly vulnerable to easy execution of attacker-supplied code, but you can still run into problems. Other languages are vulnerable to other, more complex variants of this attack. If attackers can specify a format string for output to a log file or database, they can cause incorrect or misleading logs. Additionally, the application reading the logs may consider them trusted input, and once this assumption is violated, weaknesses in that application's parser may lead to execution of arbitrary code. A related problem is embedding control characters in log files: backspaces can be used to erase things, and line terminators can obfuscate or even eliminate the attacker's traces.

This should go without saying, but if an attacker can specify the format string fed to scanf or similar functions, disaster is on the way.

Sinful C/C++

Unlike many other flaws we'll examine, this one is fairly easy to spot as a code defect. It's very simple:

 printf(user_input); 

is wrong, and

 printf("%s", user_input); 

is correct.

One variant on the problem that many programmers neglect is that it is not sufficient to do this correctly only once. There are a number of common code constructs where you might use sprintf to place a formatted string into a buffer, and then slip up and do this:

    fprintf(stdout, err_msg);

The attacker then only has to craft the input so that the format specifiers survive into err_msg (user data inserted through %s is copied verbatim, and specifiers escaped as %%x come out of a format pass as %x). In most cases, this is a much more easily exploited version, because the err_msg buffer will frequently be allocated on the stack. Once attackers manage to walk back up the stack, they'll be able to control the location that is written using user input.

Related Sins

Although the most obvious attack stems from a code defect, it is common practice to put application strings in external files for internationalization purposes. If your application has sinned by failing to protect such a file properly, an attacker who can modify it can supply format strings of their choosing, because of a lack of proper file access control.

Another related sin is failing to properly validate user input. On some systems, an environment variable specifies the locale information, and the locale, in turn, determines the directory where language-specific files will be found. On some systems, the attacker might even cause the application to look in arbitrary directories.



19 Deadly Sins of Software Security. Programming Flaws and How to Fix Them
Writing Secure Code
ISBN: 71626751
EAN: 2147483647
Year: 2003
Pages: 239
