What Is a Format String Bug?

A format string bug occurs when user-supplied data is included in the format specification string of one of the printf family of functions, including

 printf fprintf sprintf snprintf vfprintf vprintf vsprintf vsnprintf 

and any similar functions on your platform that accept a string that can contain C-style format specifiers, such as the wprintf functions on Windows. The attacker supplies a number of format specifiers that have no corresponding arguments on the stack, and values from the stack are used in their place. This leads to information disclosure and potentially the execution of arbitrary code.
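The distinction is easiest to see side by side. The sketch below, built around a hypothetical helper named format_user_text, contrasts the vulnerable pattern, in which user data becomes the format string, with the usual safe one, in which the format is a fixed literal and the user data is only an argument:

```c
#include <stdio.h>
#include <string.h>

/* Unsafe pattern (shown for contrast only): the caller's text becomes the
 * format, so any %x or %n it contains is interpreted by snprintf:
 *
 *     snprintf(buf, size, user_text);
 *
 * Safe pattern: the format is a fixed literal and the user's text is passed
 * as an ordinary argument, so its % characters are copied verbatim. */
int format_user_text(char *buf, size_t size, const char *user_text)
{
    return snprintf(buf, size, "%s", user_text);
}
```

Called as format_user_text(buf, sizeof buf, "%x %x"), it stores the five characters %x %x unchanged, because snprintf never interprets the argument behind %s as a format.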

As we have already discussed, printf functions are meant to be passed a format string that determines how the output is laid out and which variables are substituted into it. For example, the following code prints the square root of 2 to 4 decimal places:

 printf("The square root of 2 is: %2.4f\n", sqrt(2.0)); 
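In %2.4f, the 2 is the minimum field width and the .4 is the precision (the number of digits after the decimal point). A minimal sketch, using a hypothetical helper named format_fixed that supplies both values at run time through the * form, makes the difference visible:

```c
#include <stdio.h>
#include <string.h>

/* Formats value with an explicit minimum field width and precision.
 * "%*.*f" takes the width and the precision from int arguments, so
 * format_fixed(buf, size, 2, 4, x) behaves like "%2.4f". */
int format_fixed(char *buf, size_t size, int width, int precision, double value)
{
    return snprintf(buf, size, "%*.*f", width, precision, value);
}
```

format_fixed(buf, sizeof buf, 2, 4, 1.41421356) produces 1.4142; the 6-character result already exceeds the width of 2, so no padding is added. With a width of 10, the same value is right-aligned as four spaces followed by 1.4142.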

However, strange behaviors occur if we provide a format string but omit the variables that are to be substituted. Here is a generic program that calls printf with the argument it is passed on the command line.

 #include <stdio.h>
 #include <stdlib.h>

 int main(int argc, char *argv[])
 {
         if(argc != 2)
         {
                 printf("Error - supply a format string please\n");
                 return 1;
         }

         /* Deliberately vulnerable: the user's argument is used as the format string. */
         printf(argv[1]);
         printf("\n");

         return 0;
 }

If we compile this like so:

 cc fmt.c -o fmt 

and call it as follows:

 ./fmt "%x %x %x %x" 

we are effectively calling printf like this:

 printf("%x %x %x %x"); 

The important thing here is that although we have supplied the format string, we haven't supplied the four numeric variables to be substituted into the string. Interestingly, printf doesn't fail, instead producing output that looks like this:

 4015c98c 4001526c bffff944 bffff8e8 

So printf() is unexpectedly obtaining four arguments from somewhere. These arguments are in fact coming from the stack.
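The reason is that a variadic function such as printf cannot tell how many arguments its caller actually passed; it simply fetches the next one for every conversion specifier in the format. The following simplified sketch (a hypothetical count_conversions helper, not part of any library) counts how many arguments a format string makes printf fetch:

```c
#include <stddef.h>

/* Counts how many arguments a printf-style format string will try to fetch.
 * Simplified: it skips "%%" and assumes every other '%' starts exactly one
 * conversion. It ignores '*' widths and positional "%n$" forms, which also
 * consume or select arguments. */
size_t count_conversions(const char *fmt)
{
    size_t count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {   /* "%%" prints a literal percent sign */
            p++;
            continue;
        }
        if (p[1] == '\0')    /* trailing lone '%': nothing follows */
            break;
        count++;             /* one argument fetched for this conversion */
    }
    return count;
}
```

count_conversions("%x %x %x %x") returns 4, so printf reads four values whether or not the caller supplied them. On the 32-bit x86 system this output comes from, where arguments are passed on the stack, those values are simply whatever lies there.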

Leaking these values may not initially appear to be a problem, but it means an attacker may be able to read the contents of the stack. What does that mean? In itself, this might reveal sensitive information such as usernames and passwords, but the problem runs deeper than that. If we try supplying a large number of %x specifiers, like this:

 ./fmt "AAAAAAAAAAAAAAAAAAA%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x %x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x %x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x %x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x" 

we obtain some interesting results.

 ./fmt "AAAAAAAAAAAAAAAAAAA%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x%x"
 AAAAAAAAAAAAAAAAAAA4001526cbffff7d880483e18049530804962cbffff8084003e280
 2bffff834bffff84080482ae80484900bffff8084003e26a0bffff8404015abc040014d2
 8280483000804832180484002bffff834804829880484904000cc20bffff82c400152cc2
 bffff972bffff9780bffffa8ebffffab1bffffac3bffffae3bffffaf6bffffb08bffffb2
 abffffb3cbffffb4ebffffb5bbffffb64bffffb6ebffffb85bffffd63bffffd71bffffd9
 2bffffdadbffffdc2bffffdcfbffffddabffffdebbffffdf8bffffe00bffffe0fbffffe2
 4bffffe34bffffe42bffffe50bffffe61bffffe6fbffffe7abffffe85bffffed6bffffee
 5bffffef7bfffff0abfffff1bbfffff2bbfffffd6bfffffde0103febfbff610001164380
 48034420567400000008098048300b0c0d0e0fbffff96d000000383669002f2e0036746d
 664141414141414141414141414141414125414141257825782578257825782578257825
 7825782578257825782578257825782578257825782578257825782578

As you can see, we are pulling a large amount of data from the stack, but then towards the end of the string we see the hex-encoded representation of the beginning of our string:

 41414141414141 

This result is somewhat unexpected, but it makes sense if you consider that the format string itself is held on the stack, so 4-byte segments of the string are being passed as the "numbers" to be substituted into the output. Therefore, we can read data from the stack in hex format.
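To map the hex words back to the bytes they came from, remember that x86 is little-endian: the least significant byte of each 32-bit word is stored first. A minimal sketch, using a hypothetical word_to_le_bytes helper:

```c
#include <stdint.h>
#include <string.h>

/* Turns a 32-bit word, as printed by %x, back into the four bytes it held in
 * memory, decoding it the way a little-endian machine such as x86 stores it
 * (lowest-order byte first). The result is NUL-terminated for printing. */
void word_to_le_bytes(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Decoding 0x25414141 yields AAA% and 0x25782578 yields x%x%: the last three A characters of the argument followed by the start of the %x specifiers. Both words appear near the end of the output above.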

What else can we do? To look at a few of the different type conversion specifiers that we can use, consult the manual page:

 man sprintf 

We see a large number of conversion specifiers: d, i, o, u, and x for integers; e, f, g, and a for floating point; and c for characters. A few other interesting specifiers are present, though, and these expect something other than a simple numeric argument:

s The argument is treated as a pointer to a string. The string is substituted into the output.

n The argument is treated as a pointer to an integer (or an integer variant such as short). The number of characters output so far is stored at the address pointed to by the argument.
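%n can be observed safely by pointing it at an ordinary variable. The sketch below uses a hypothetical chars_before_n helper. Some C runtimes restrict %n (Microsoft's C runtime, for example, disables it by default), so this assumes a glibc-style implementation:

```c
#include <stdio.h>

/* Demonstrates %n on a legitimate int variable: snprintf stores the number
 * of characters produced so far (here, the length of prefix) into count. */
int chars_before_n(char *buf, size_t size, const char *prefix)
{
    int count = -1;
    snprintf(buf, size, "%s%n", prefix, &count);
    return count;
}
```

chars_before_n(buf, sizeof buf, "AAAAAAAAAAAAAAAAAAA") returns 19, the number of characters produced before %n was reached.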

So, if we specify %n in the format string, the number of characters output so far is written to the location specified by the argument, thus:

 ./fmt "AAAAAAAAAAAAAAAAAAA%n%n%n%n%n%n%n%n%n%n%n" 
Note 

Don't forget to run ulimit -c unlimited to ensure you get a core dump.

This example is more interesting, and it illustrates the danger inherent in allowing a user to specify format strings. As the description of the printf format specifiers above shows, the %n type specifier expects an address as its argument and writes the number of characters output so far to that address. This means we can overwrite values stored at specific addresses, allowing us to take control of execution. Don't worry if you don't completely understand the implications of this right now; we will spend the rest of the chapter explaining it in detail.

Recalling the ASCII example above, we can use the width specifier to control the number of characters output; if we want to output 50 characters, we can specify %050x, which will output a hexadecimal integer padded with leading zeroes until it contains exactly 50 digits.
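Combining the width field with %n shows how padding drives the stored count. A sketch, using a hypothetical count_after_padding helper; %0*x is the run-time-width form of %050x:

```c
#include <stdio.h>

/* "%0*x" pads the hex value with leading zeros to at least `width`
 * characters, and the following %n records the total printed so far. */
int count_after_padding(int width, unsigned int value)
{
    char buf[512];   /* large enough for the widths used in this sketch */
    int count = -1;
    snprintf(buf, sizeof buf, "%0*x%n", width, value, &count);
    return count;
}
```

count_after_padding(50, 0x41414141) returns 50, while a width of 1 returns 8, because the width is only a minimum and the value itself needs eight hex digits.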

Also, if you recall that the arguments to the printf function can be drawn from within the string itself (our 41414141 example above), you will see that we can use the %n specifier to write a value we control to the address of our choice.

Using these facts, we can run arbitrary code because the following conditions exist:

  • We can control the values of the arguments, and we can write the number of characters output to anywhere in memory.

  • The width specifier allows us to pad output to an almost arbitrary length, certainly to 255 characters, so we can overwrite a single byte with the value of our choice.

  • We can do this four times, so we can overwrite almost any 4 bytes with the value of our choice. Overwriting 4 bytes allows the attacker to overwrite a complete 32-bit address. We might have problems writing to addresses that contain 00 bytes, because a 00 byte terminates a string in C. However, we can probably get around these problems by writing 2 bytes starting at the address before it.

  • Because we can generally guess the address of a function pointer (saved return address, binary import table, C++ vtable), we can cause a string that we supply to be executed as code.
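The single-byte claim rests on simple arithmetic: the byte that lands in memory is the character count modulo 256, so a suitable width can produce any value from 0 to 255. The sketch below illustrates this on a legitimate variable. It swaps in the C99 %hhn modifier, which stores only the low byte, instead of the overlapping full-width %n writes described above; low_byte_of_count is a hypothetical name:

```c
#include <stdio.h>

/* Prints `width` characters ("%*d" pads the value 0 to that width), then
 * %hhn stores the count into a signed char, keeping only its low byte. */
unsigned char low_byte_of_count(int width)
{
    char buf[1024];        /* large enough for widths below 1024 */
    signed char low = 0;
    snprintf(buf, sizeof buf, "%*d%hhn", width, 0, &low);
    return (unsigned char)low;
}
```

A width of 65 stores 0x41, while a width of 300 stores 44 (300 - 256), because the count wraps past 255.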

It is worth clearing up several common misconceptions relating to format string attacks:

  • They don't just affect UNIX.

  • They aren't necessarily stack based.

  • Stack protection mechanisms will not generally defend against them.

  • They can generally be detected with static code analysis tools.
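On the static analysis point, GCC and Clang already catch the simplest case: with -Wformat enabled, -Wformat-security warns when a printf-family function is called with a non-literal format string and no further arguments, and the format function attribute extends these checks to your own wrappers. A sketch of such a wrapper, assuming GCC or Clang (the attribute is not standard C) and a hypothetical log_to_buffer name:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* The format attribute tells GCC/Clang that argument 3 is a printf-style
 * format and that the variable arguments start at position 4, so callers
 * are checked exactly like calls to printf itself. */
int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);   /* forwards to the v* variant */
    va_end(ap);
    return n;
}
```

With the attribute in place, log_to_buffer(buf, sizeof buf, argv[1]) is reported in the same way as printf(argv[1]), while log_to_buffer(buf, sizeof buf, "%s", argv[1]) is accepted.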

The security advisory for the format string vulnerability in the Van Dyke VShell SSH Gateway for Windows provides a good illustration of these points and can be found at www.atstake.com/research/advisories/2001/a021601-1.txt.

This is quite a severe vulnerability. An arbitrary code execution vulnerability in a component that authenticates users effectively removes all access control from that component. In this case, a skilled attacker could capture the plaintext of all user sessions, or take control of the system, with relative ease.

To summarize, a format string bug occurs when user-supplied data is included in the format specification string of one of the printf family of functions. The attacker supplies a number of format specifiers that have no corresponding arguments on the stack, and values from the stack are used in their place. This leads to information disclosure and potentially the execution of arbitrary code.



The Shellcoder's Handbook. Discovering and Exploiting Security
