3.3 Problem Areas


So far, the reverse engineering process that has been presented is an idealized one; all tools are assumed to work correctly on all targets, and the resulting disassembly is assumed to be accurate.

In most real-world reverse engineering work, however, these assumptions do not hold. The tools may not process the target at all, or may provide an inaccurate disassembly of the underlying machine code. The target may contain hostile code, be encrypted or compressed, or simply have been compiled using nonstandard tools.

The purpose of this section is to introduce a few of the common difficulties encountered when using these tools. It's not an exhaustive survey of protection techniques, nor does it pretend to provide reasonable solutions in all cases; what follows should be considered background for the next section of this chapter, which discusses the writing of new tools to compensate for the problems the current tools cannot cope with.

3.3.1 Antidebugging

The prevalence of open source software on Linux has hampered the development of debuggers and other binary analysis tools; the developers of debuggers still rely on ptrace, a kernel-level debugging facility that is intended for working with "friendly" programs. As has been more than adequately shown (see Section 3.5 for more information), ptrace cannot be relied on for dealing with foreign or hostile binaries.

The following simple (and by now quite common) program locks up when being debugged by a ptrace-based debugger:

    #include <sys/ptrace.h>
    #include <stdio.h>

    int main( int argc, char **argv ) {
        if ( ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0 ) {
            /* we are being debugged */
            while (1) ;
        }
        printf("Success: PTRACE_TRACEME works\n");
        return(0);
    }

In applications that are less obvious about their approach, the call to ptrace is replaced with a direct int 0x80 system call:

    asm("\t xorl %ebx, %ebx    \n"    /* PTRACE_TRACEME = 0 */
        "\t movl $26, %eax     \n"    /* __NR_ptrace, from /usr/include/asm/unistd.h */
        "\t int $0x80          \n"    /* system call trap */
    );

These work because ptrace checks the task struct of the caller and returns -1 if the caller is currently being ptrace( )ed by another process. The check is very simple, but is done in kernel land:

    /* from /usr/src/linux/arch/i386/kernel/ptrace.c */
    if (request == PTRACE_TRACEME) {
        /* are we already being traced? */
        if (current->ptrace & PT_PTRACED)
            goto out;
        /* set the ptrace bit in the process flags. */
        current->ptrace = PT_PTRACED;
        ret = 0;
        goto out;
    }

The usual response to this trick is to jump over or NOP out the call to ptrace, or to change the condition code on the jump that checks the return value. A more graceful way (and one that extends beyond ptrace as a means of properly dealing with system calls in the target) is to simply wrap ptrace with a kernel module:

    /*---------------------------------------------------------------------------*/
    /* ptrace wrapper: compile with `gcc -c new_ptrace.c`
                       load with    `insmod -f new_ptrace.o`
                       unload with  `rmmod new_ptrace`        */
    #define __KERNEL__
    #define MODULE
    #define LINUX

    #include <linux/kernel.h>  /* req */
    #include <linux/module.h>  /* req */
    #include <linux/init.h>    /* req */
    #include <linux/unistd.h>  /* syscall table */
    #include <linux/sched.h>   /* task struct, current() */
    #include <linux/ptrace.h>  /* for the ptrace types */

    asmlinkage int (*old_ptrace)(long req, long pid, long addr, long data);
    extern long sys_call_table[];

    asmlinkage int new_ptrace(long req, long pid, long addr, long data){
        /* if the caller is currently being ptrace()ed: */
        if ( current->ptrace & PT_PTRACED ) {
            if ( req == PTRACE_TRACEME ||
                 req == PTRACE_ATTACH  ||
                 req == PTRACE_DETACH  ||
                 req == PTRACE_CONT )
                /* lie to it and say everything's fine */
                return(0);
            /* notify user that some other ptrace was encountered */
            printk("Prevented pid %d (%s) from ptrace(%ld) on %ld\n",
                   current->pid, current->comm, req, pid );
            return(-EIO); /* the standard ptrace() ret val */
        }
        return((*old_ptrace)(req, pid, addr, data));
    }

    int __init init_new_ptrace(void){
        EXPORT_NO_SYMBOLS;
        /* save old ptrace system call entry, replace it with ours */
        old_ptrace = (int(*)(long request, long pid, long addr,
                      long data)) (sys_call_table[__NR_ptrace]);
        sys_call_table[__NR_ptrace] = (unsigned long) new_ptrace;
        return(0);
    }

    void __exit exit_new_ptrace(void){
        /* put the original syscall entry back in the syscall table */
        if ( sys_call_table[__NR_ptrace] != (unsigned long) new_ptrace )
            printk("Warning: someone hooked ptrace() after us. "
                   "Reverting.\n");
        sys_call_table[__NR_ptrace] = (unsigned long) old_ptrace;
        return;
    }

    module_init(init_new_ptrace);        /* export the init routine */
    module_exit(exit_new_ptrace);        /* export the exit routine */
    /*---------------------------------------------------------------------------*/

This is, of course, a small taste of what can be done in kernel modules; between hooking system calls and redirecting interrupt vectors (see Section 3.5 for more on these), the reverse engineer can create powerful tools with which to examine and monitor hostile programs.
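The simpler byte-patching response mentioned earlier (NOPing out the call to ptrace) can also be sketched in user-space C. This is a minimal sketch, not a finished tool: it assumes the target's code has already been read into a buffer and that the offset of the 5-byte near call (opcode 0xE8) has been located with a disassembler. The name nop_out_call and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OPCODE_CALL_REL32 0xE8  /* x86 near call, followed by a 32-bit offset */
#define OPCODE_NOP        0x90  /* x86 single-byte no-op */
#define CALL_REL32_LEN    5     /* opcode byte + 4 offset bytes */

/* Hypothetical helper: overwrite the 5-byte call at buf[off] with NOPs.
 * Returns 0 on success, or -1 if off is out of range or does not point
 * at a call opcode. */
int nop_out_call(unsigned char *buf, size_t len, size_t off)
{
    assert(buf != NULL);
    if (len < CALL_REL32_LEN || off > len - CALL_REL32_LEN)
        return -1;
    if (buf[off] != OPCODE_CALL_REL32)
        return -1;
    memset(buf + off, OPCODE_NOP, CALL_REL32_LEN);
    return 0;
}
```

Note that NOPing the call leaves %eax holding whatever value it had beforehand, so the test of the return value that follows may still need attention; changing the condition code on the jump avoids this.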

Many automated debugging or tracing tools are based on ptrace and, as a result, routines such as the following have come into use:

    #include <signal.h>

    /* cause a SIGTRAP and see if it gets through the debugger */
    int being_debugged = 1;

    /* if the handler runs, no debugger swallowed the signal */
    void int3_count( int signum ) {
        being_debugged = 0;
    }

    int main( int argc, char **argv ) {
        signal(SIGTRAP, int3_count);
        asm( "\t int3 \n");
        /* ... */
        if ( being_debugged ) {
            while (1) ;
        }
        return(0);
    }

With a live debugger such as gdb, these pose no problem: simply sending the generated signal to the process with gdb's signal SIGTRAP command fools the process into thinking it has received the signal without interference. In order to make the target work with automatic tracers, the signal specified in the signal call simply has to be changed to a user signal:

    68 00 85 04 08          push $0x8048500
    6a 05                   push $0x5           ; SIGTRAP
    e8 83 fe ff ff          call 80483b8 <_init+0x68>

    ... becomes ...

    68 00 85 04 08          push $0x8048500
    6a 0a                   push $0xa           ; SIGUSR1 (10 on Linux/x86)
    e8 83 fe ff ff          call 80483b8 <_init+0x68>

A final technique that is fairly effective is to scan for embedded debug trap instructions ( int3 or 0xCC ) in critical sections of code:

    #include <stdio.h>

    /* we need the extern since C cannot see into the asm statement */
    extern void here(void);

    int main( int argc, char **argv ) {
        /* check for a breakpoint at the code label */
        if ( *(unsigned char *)here == 0xCC ) {
            /* we are being debugged */
            return(1);
        }
        /* create code label with an asm statement */
        asm("\t here: \n\t nop \n");
        printf("Not being debugged\n");
        return(0);
    }

In truth, this only works because gdb's support for debug registers DR0-DR3 via its hbreak command is broken. Since the use of the debug registers is supported by ptrace (see Section 3.4.2 later in this chapter), this is most likely a bug or forgotten feature; however, GNU developers are nothing if not inscrutable, and it may be up to alternative debuggers such as ald or ups to provide adequate debug register support.
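The single-byte check can be extended to cover a whole critical region. The following is a minimal sketch under stated assumptions: the function name find_int3 is hypothetical, and the caller is assumed to know the start address and length of the code to be scanned (for example, from a pair of labels like the one used above).

```c
#include <assert.h>
#include <stddef.h>

#define INT3_OPCODE 0xCC  /* one-byte x86 breakpoint instruction */

/* Hypothetical helper: return the offset of the first 0xCC byte in
 * code[0..len), or -1 if the range contains none. */
long find_int3(const unsigned char *code, size_t len)
{
    size_t i;

    assert(code != NULL);
    for (i = 0; i < len; i++) {
        if (code[i] == INT3_OPCODE)
            return (long)i;
    }
    return -1;
}
```

A raw byte scan of this kind will also flag any 0xCC that legitimately appears as part of an operand or immediate value, so a protection built on it would need to scan only regions known to be free of that byte.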

3.3.2 Antidisassembly

The name of this section is somewhat of a misnomer. Typical antidisassembler techniques such as the "off-by-one-byte" and "false return" tricks will not be discussed here; by and large, such techniques fool disassemblers but fail to stand up to a few minutes of human analysis and can be bypassed with an interactive disassembler or by restarting disassembly from a new offset. Instead, what follows is a discussion of mundane problems that are much more likely to occur in practice and can be quite tedious, if not difficult, to resolve.

One of the most common techniques to obfuscate a disassembly is static linking. While this is not always intended as obfuscation, it does frustrate the analysis of the target, since library calls are not easily identified. Resolving this issue requires a disassembler or other analysis tool that matches signatures for functions in a library (usually libc) with sequences of bytes in the target.

The technique for generating a file of signatures for a library is to obtain the exported functions in the library from the file header (usually an AR file, as documented in /usr/include/ar.h), then iterate through the list of functions, generating a signature of no more than SIGNATURE_MAX bytes for every function that is at least SIGNATURE_MIN bytes long. The values of these two constants can be obtained by experimentation; typical values are 128 bytes and 16 bytes, respectively.

Generating a function signature requires disassembling up to SIGNATURE_MAX bytes of a function, halting the disassembly when an unconditional branch (jmp) or return (ret) is encountered. The disassembler must be able to mask out variant bytes in an instruction with a special wildcard byte; since 0xF1 is not an opcode that compilers emit (the Intel opcode maps of the time leave it undefined), it makes an ideal wildcard byte.

Determining which bytes are invariant requires special support that most disassemblers do not have. The goal is to determine which bytes in an instruction do not change; in general, the opcode, ModR/M byte, and SIB byte will not change. More accurate information can be found by examining the Intel Opcode Map (see Section 3.5 for more information); the addressing methods of operands give clues as to what may or may not change during linking:

    * Methods C D F G J P S T V X Y are always invariant
    * Methods E M Q R W contain ModR/M and SIB bytes which may contain
      variant bytes, according to the following conditions:
        If the ModR/M 'mod' field is 00 and either 1) the ModR/M 'rm'
        field is 101 or 2) the SIB base field is 101, then the 16- or
        32-bit displacement of the operand is variant.
    * Methods I J are variant if the type is 'v' [e.g., Iv or Jv]
    * Methods A O are always variant
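The ModR/M and SIB condition in the list above can be expressed as a small test. This is a sketch only, assuming 32-bit addressing and a disassembler that has already located the ModR/M byte, the SIB byte (if any), and the offset of the displacement; disp_is_variant and mask_disp32 are hypothetical names, and 0xF1 is the wildcard byte suggested earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WILDCARD      0xF1                 /* wildcard byte for signatures */
#define MODRM_MOD(b)  (((b) >> 6) & 0x3)   /* top two bits of ModR/M */
#define MODRM_RM(b)   ((b) & 0x7)          /* low three bits of ModR/M */
#define SIB_BASE(b)   ((b) & 0x7)          /* low three bits of SIB */

/* Returns 1 if the 32-bit displacement of this operand should be treated
 * as variant, 0 otherwise. The sib argument is ignored unless the rm
 * field is 100, which is the encoding that signals a SIB byte. */
int disp_is_variant(uint8_t modrm, uint8_t sib)
{
    if (MODRM_MOD(modrm) != 0)
        return 0;
    if (MODRM_RM(modrm) == 5)                        /* rm = 101 */
        return 1;
    if (MODRM_RM(modrm) == 4 && SIB_BASE(sib) == 5)  /* SIB base = 101 */
        return 1;
    return 0;
}

/* Replace the four displacement bytes at sig[off] with wildcards.
 * Returns 0 on success, -1 if the displacement would not fit. */
int mask_disp32(uint8_t *sig, size_t len, size_t off)
{
    assert(sig != NULL);
    if (len < 4 || off > len - 4)
        return -1;
    memset(sig + off, WILDCARD, 4);
    return 0;
}
```

For example, the bytes 8b 05 00 85 04 08 (a load from an absolute address) have a ModR/M byte of 0x05, so the four address bytes would be masked, while 8b 45 08 (a load relative to %ebp) has a mod field of 01 and is left untouched.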

The goal of signature generation is to create as large a signature as possible, in which all of the variant (or prone to change in the linking process) bytes are replaced with wildcard bytes.

When matching library function signatures to byte sequences in a binary, a byte-for-byte comparison is made, with the wildcard bytes in the signature always matching bytes in the target. If all of the bytes in the signature match those in the target, a label is created at the start of the matching byte sequence that bears the name of the library function. Note that it is important to implement this process so that as few false positives are produced as possible; this means signature collisions (i.e., two library functions with identical signatures) must be resolved by discarding both signatures.
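The matching step might look like the following. This is a minimal sketch rather than the implementation used by any particular tool: the names sig_match and sig_find are hypothetical, the wildcard is assumed to be 0xF1 as suggested above, and the search is a simple linear scan.

```c
#include <assert.h>
#include <stddef.h>

#define SIG_WILDCARD 0xF1  /* matches any byte in the target */

/* Compare len bytes of a signature against the target; wildcard bytes
 * in the signature always match. Returns 1 on match, 0 otherwise. */
int sig_match(const unsigned char *sig, const unsigned char *target,
              size_t len)
{
    size_t i;

    assert(sig != NULL && target != NULL);
    for (i = 0; i < len; i++) {
        if (sig[i] != SIG_WILDCARD && sig[i] != target[i])
            return 0;
    }
    return 1;
}

/* Return the offset of the first place in buf where sig matches, or -1
 * if there is none. A label for the library function would be created
 * at this offset. */
long sig_find(const unsigned char *sig, size_t sig_len,
              const unsigned char *buf, size_t buf_len)
{
    size_t off;

    if (sig_len == 0 || sig_len > buf_len)
        return -1;
    for (off = 0; off + sig_len <= buf_len; off++) {
        if (sig_match(sig, buf + off, sig_len))
            return (long)off;
    }
    return -1;
}
```

Because a wildcard matches anything, short signatures made up mostly of wildcards will match almost everywhere, which is one reason for enforcing a minimum signature length.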

One of the greatest drawbacks of the GNU binutils package (the collection of tools containing ld, objdump, objcopy, etc.) is that its tools are entirely unable to handle binaries that have had their ELF section headers removed (see the upcoming Section 3.4.1). This is a serious problem, for two reasons: first of all, the Linux ELF loader will load and execute anything that has ELF program headers but, in accordance with the ELF standard, it assumes the section headers are optional; and secondly, the ELF Kickers (see Section 3.5) package contains a utility called sstrip that removes extraneous symbols and ELF section headers from a binary.

The typical approach to an sstriped binary is to switch tools and use a disassembler without these limitations, such as IDA, ndisasm, or even the embedded disassembler in biew or hte. This is not really a solution, though; currently, there are tools in development or in private release that attempt to rebuild the section headers based on information in the program headers.
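Before choosing a tool, it helps to know whether the target still has a section header table at all. The following is a rough sketch, assuming a 32-bit ELF image already read into memory, a host with the same byte order as the target, and a stripper that zeroes the e_shoff and e_shnum fields of the ELF header; the function name elf32_has_section_headers is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the image has a section header table that fits inside
 * the file, 0 if the table is absent or truncated, and -1 if the image
 * is not a 32-bit ELF file. */
int elf32_has_section_headers(const unsigned char *image, size_t len)
{
    Elf32_Ehdr ehdr;

    assert(image != NULL);
    if (len < sizeof ehdr)
        return -1;
    memcpy(&ehdr, image, sizeof ehdr);  /* avoid unaligned access */

    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;

    /* no table offset or no entries: section headers were removed */
    if (ehdr.e_shoff == 0 || ehdr.e_shnum == 0)
        return 0;

    /* the whole table must lie within the file */
    if (ehdr.e_shoff > len ||
        (size_t)ehdr.e_shnum * ehdr.e_shentsize > len - ehdr.e_shoff)
        return 0;

    return 1;
}
```

A result of 0 suggests falling back to the program headers, which the loader itself relies on, rather than to tools that depend on section headers.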



Security Warrior
ISBN: 0596005458
EAN: 2147483647
Year: 2004
Pages: 211
