Buried Under a Pile of Printouts | Shellcoders Programming Uncovered (Uncovered series)

The availability of source code is simultaneously desirable and undesirable. It is desirable because source code considerably simplifies the search for overflowing buffers. Why then is it undesirable? For exactly the same reason! You should not hope to find anything new in source code that has been read by everyone. The lack of source code considerably narrows the audience of investigators, cutting off the army of application programmers and the crowd of amateurs. In this environment of Assembly commands, only those can survive who program faster than they think and think faster than they speak. Hackers must remember hundreds of data structures, feel their interrelation practically at the physical level, and intuitively understand in which direction to dig. Programming experience is a benefit, because it helps hackers understand the thinking of the developer of the application being investigated. Just think: How would you solve the problem if you were the program developer? What errors might you make? Where might you show inexcusable carelessness, being lured by the compactness of the code and the elegance of the listing?

By the way, a few words about elegance are in order. A common opinion is that a careless programming style inevitably provokes programmers to make blatant errors, including overflow errors, whereas a pedantically correct program is most likely bug-free, so analyzing it is a waste of time. However, no one can tell for sure. In my practice, I have encountered outrageously careless listings that worked excellently because they were designed by true professionals who knew beforehand where precautions needed to be taken. I have also encountered academically accurate programs that checked everything that could be checked, multiple times and with religious fanaticism, but still were full of overflow bugs. Carefulness is not enough to guarantee that the program will be bug-free. To prevent errors, it is necessary to have a wide range of programming experience, including negative experience. Apparent carelessness often comes with experience, because it is a kind of reaction to a youthful passion for efficiency and optimization.

Neglecting #define directives or using them illiterately indicates that the program was written by an amateur. In particular, if the size of the buff buffer is defined through MAX_BUF_SIZE, then the size of the string copied there must also be limited by the same value instead of MAX_STR_SIZE, specified by a separate #define directive. Pay special attention to the argument types of functions that work with data blocks. Passing a pointer to a function without specifying the block size is the most frequent error of beginners, as is excessive use of the strcpy and strncpy functions. The first function is unsafe, because there is no way to limit the maximum allowed length of the string being copied. The second is unreliable, because it gives no indication that the tail of the string was truncated when the string doesn't fit the buffer, and in that case it does not null-terminate the result either (this alone can be a source of blundering bugs).
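The two points above, one constant for both the buffer and the copy limit, and a copy that reports truncation, can be sketched as follows. This is only an illustration: the function name copy_checked, its return convention, and the value chosen for MAX_BUF_SIZE are assumptions made for the example, not something taken from a real program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical value: one constant governs both the buffer and the copy. */
#define MAX_BUF_SIZE 16

/*
 * Copy src into dst, which holds dst_size bytes.
 * Returns 0 if the whole string fit, -1 if it had to be truncated
 * or if the arguments are unusable. Whenever a copy takes place,
 * dst ends up null-terminated.
 */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    len = strlen(src);
    if (len >= dst_size) {
        /* Too long: copy what fits, terminate, and tell the caller. */
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);  /* len + 1 includes the terminating null */
    return 0;
}
```

A caller would declare char buff[MAX_BUF_SIZE] and pass sizeof buff as the second argument, so the limit can never drift away from the real buffer size. When auditing someone else's code, look for the opposite pattern: a size that comes from a different constant, or no size at all.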

In general, there are only two techniques of searching for overflowing buffers, and both of them are flawed. The simplest approach (but not the cleverest one) is methodically feeding the service being investigated with text strings of different lengths and observing its reaction. If the service crashes, then an overflowing buffer has been detected. This technique doesn't always produce the expected result because, proceeding this way, it is possible to pass within two steps of an enormous hole without noticing anything suspicious. For example, assume that the server is expecting a URL. Further, suppose it naively assumes that the protocol name cannot be longer than four characters. In this case, to overflow the buffer it is enough to send it something like httttttttp://someserver.com. Note that http://sssssssssssssoooooooomeserver.com won't produce any result. How would you know beforehand which checks have been omitted by the programmer? Perhaps the developer hoped that combinations of more than two slashes would never be encountered? Or that more than one colon could not appear in a row? By testing all variants blindly, you'll detect the overflow error no sooner than doomsday, by which time the problem will have lost its urgency. Most "serious" queries are composed of hundreds of fields that interact with one another in a sophisticated manner, so the brute-force testing method becomes useless here. This is where systematic analysis comes in.
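To see why blind length testing misses this kind of hole, consider a small model of the server's assumption. The sketch below does not overflow anything; it only reports whether an unchecked copy of the protocol name would run past a buffer sized for four characters. The names SCHEME_BUF_SIZE, scheme_length, and scheme_would_overflow are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer: room for "http" plus the terminating null. */
#define SCHEME_BUF_SIZE 5

/*
 * Returns the length of the protocol name (the text before "://"),
 * or 0 if the URL contains no "://" separator.
 */
size_t scheme_length(const char *url)
{
    const char *sep = strstr(url, "://");
    return sep ? (size_t)(sep - url) : 0;
}

/*
 * Returns 1 if copying the protocol name without a length check into
 * a buffer of SCHEME_BUF_SIZE bytes would write past its end,
 * and 0 otherwise.
 */
int scheme_would_overflow(const char *url)
{
    return scheme_length(url) + 1 > SCHEME_BUF_SIZE;
}
```

With these definitions, scheme_would_overflow returns 1 for httttttttp://someserver.com and 0 for http://sssssssssssssoooooooomeserver.com, however long the host name grows. The total length of the request is irrelevant; only the field that the programmer forgot to check matters, and a blind tester has no way of knowing which field that is.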

In theory, to guarantee detection of all overflowing buffers, it is enough to read the entire source code or disassembled listing of the program, line by line, and notice all missing checks. In practice, however, this is unrealistic because of the monstrous amount of code. In addition, not every missing check indicates the presence of an overflowing buffer. Consider the code in Listing 8.1.

Listing 8.1: The rabbit hole

 f(char *src)
 {
         char buf[0x10];         /* local buffer of 10h (16) bytes */
         strcpy(buf, src);       /* copies src with no length check */
         ...
 }

If the src string is 10h characters or longer, the buffer will overflow (the terminating null needs a byte as well), and a sufficiently long string will overwrite the return address. The real problem is finding out whether or not the parent function checks the length of the src string before passing it. Even if there are no explicit checks, the string might be formed so that it is guaranteed not to exceed the specified length (and it might be formed in a grandparent function). In that case no buffer overflow will occur, and all the time and effort spent on analysis will be wasted.
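The situation can be sketched as follows. The function f here is a stand-in for the one in Listing 8.1, with a return value added so that its effect can be observed; the parent function and the BUF_SIZE constant are hypothetical and only illustrate one way a caller might make the unchecked copy harmless.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 0x10  /* same size as buf in Listing 8.1 */

/* Stand-in for f() from Listing 8.1: copies without any check. */
static size_t f(const char *src)
{
    char buf[BUF_SIZE];

    strcpy(buf, src);   /* safe only if the caller has bounded src */
    return strlen(buf);
}

/*
 * Hypothetical parent: rejects anything that would not fit, so the
 * unchecked strcpy in f() can never run past the end of buf.
 * Returns the copied length, or -1 if the input was rejected.
 */
long parent(const char *src)
{
    if (src == NULL || strlen(src) >= BUF_SIZE)
        return -1;
    return (long)f(src);
}
```

Looking at f alone, the missing check appears to be a hole. Only after tracing every path by which data reaches f can you tell whether the hole is real, and in a large program the check might sit several calls above the copy.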

In short, the hacker will have to do a lot of painstaking work. Searching for overflowing buffers is hard to formalize and practically impossible to automate. Microsoft invests billions of dollars in improving analysis technologies, but it hasn't seen much benefit. So, it cannot be expected that an individual hacker would achieve greater success.

First, it is necessary to investigate those buffers that can be influenced in some way. Usually, these are buffers related to network services. Local hacking is much less interesting!

Assume that an overflow error has been detected. Where do you go from there? In further investigation, only the disassembler can help. Do not try to squeeze any additional information from the source code. The order in which variables are located in memory is not defined, and it practically never matches the order in which they were declared in the program. It might even happen that most of these variables are not present in memory at all, because the compiler has placed them into registers or the optimizer has discarded them as unneeded. That said, note that all demo listings presented in this book assume that variables are located in memory in the same order in which they were declared.
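A quick experiment makes the point about variable order. The sketch below records the addresses of three local variables in declaration order so that they can be compared or printed. The function names are hypothetical. Note also that taking the address of a variable forces the compiler to keep it in memory, so this experiment cannot show variables that would otherwise live in registers or be removed by the optimizer.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Stores the addresses of three locals into out, in declaration order,
 * and returns how many addresses were stored.
 */
int local_addresses(uintptr_t out[3])
{
    char first[8];
    int  second;
    char third[8];

    out[0] = (uintptr_t)(void *)first;
    out[1] = (uintptr_t)(void *)&second;
    out[2] = (uintptr_t)(void *)third;
    return 3;
}

/* Prints the recorded addresses, one per line. */
void print_local_addresses(void)
{
    uintptr_t a[3];

    local_addresses(a);
    printf("first  at 0x%" PRIxPTR "\n", a[0]);
    printf("second at 0x%" PRIxPTR "\n", a[1]);
    printf("third  at 0x%" PRIxPTR "\n", a[2]);
}
```

Calling print_local_addresses from programs built with different compilers or optimization levels will typically show different layouts, which is why the actual layout has to be read from the disassembled code and not guessed from the declarations.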