12.16.1 Problem

An object file disassembler can produce an assembly language version of a binary, which can then be used to understand and possibly modify the binary.

12.16.2 Solution

Anti-disassembly tricks are useful in frustrating automatic analysis, but they generally will not hold up to a human review of the disassembly. Make sure to combine the methods presented in the discussion with data or code obfuscation techniques.

12.16.3 Discussion

Many disassemblers assume that long runs of NULL bytes are data, although some will continue to disassemble regardless. In the Intel instruction set, 0x00 is the opcode for add r/m8, r8, and the two-byte sequence 0x00 0x00 encodes add [eax], al (addb %al, (%eax) in AT&T syntax), a valid instruction. The following macros use NULL bytes to alter the eax register by pushing eax, loading the address of the pushed value into eax, executing add [eax], al as many times as the user specifies, and then popping the altered value back into eax.

    #define NULLPAD_START asm volatile ( \
            "pushl %eax \n"              \
            "movl %esp, %eax\n")
    #define NULLPAD       asm volatile ("addb %al, (%eax)\n")
    #define NULLPAD_END   asm volatile ("popl %eax\n")
    #define NULLPAD_10    NULLPAD_START;                   \
            NULLPAD; NULLPAD; NULLPAD; NULLPAD; NULLPAD;   \
            NULLPAD_END

This is particularly effective if the value referenced by eax (that is, the value at the top of the stack) is used later in the program. Note that many disassemblers that ignore runs of NULL bytes allow the user to override this behavior.
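As a rough illustration of the heuristic described above, the following sketch scans a buffer for the first run of NULL bytes that is long enough to be treated as data. The function name find_null_run, the array nullpad_code, and the idea of a fixed minimum run length are assumptions made for this example; real disassemblers may use different or additional rules.

```c
#include <assert.h>
#include <stddef.h>

/* Opening bytes of my_func compiled with NULLPAD_10, copied from the
 * disassembly of that function in this recipe (address 08048374 onward). */
static const unsigned char nullpad_code[] = {
    0x55, 0x89, 0xE5, 0x83, 0xEC, 0x08,       /* function prologue        */
    0x50, 0x89, 0xE0,                         /* push eax; mov eax, esp   */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0,             /* five add [eax], al       */
    0x58,                                     /* pop eax                  */
    0xC7, 0x45, 0xFC, 0, 0, 0, 0              /* mov dword [ebp-4], 0     */
};

/*
 * Hypothetical helper: find the first run of at least min_run NULL bytes.
 * Returns the length of that run and stores its offset in *start, or
 * returns 0 if no run is long enough.
 */
static size_t find_null_run(const unsigned char *buf, size_t len,
                            size_t min_run, size_t *start)
{
    size_t i = 0;

    assert(buf != NULL && start != NULL);
    while (i < len) {
        size_t j = i;

        while (j < len && buf[j] == 0)        /* extend the current run   */
            j++;
        if (j > i && j - i >= min_run) {
            *start = i;
            return j - i;
        }
        i = (j > i) ? j : i + 1;              /* skip past what was seen  */
    }
    return 0;
}
```

With an assumed threshold of 8 bytes, the search reports a 10-byte run at offset 9, which corresponds to address 0804837D; the four NULL bytes of the mov immediate are too short to qualify.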
To demonstrate the effect the NULLPAD_10 macro has on disassemblers, the following source code was compiled and disassembled:

    #include <stdio.h>

    void my_func(void) {
      int x;

      NULLPAD_10;
      for (x = 0; x < 10; x++) printf("%x\n", x);
    }

DataRescue's IDA Pro disassembler creates a code/data boundary at the start of the NULL bytes and completely ignores the instructions that follow:

    08048374 my_func:
    08048374 55                push    ebp
    08048375 89 E5             mov     ebp, esp
    08048377 83 EC 08          sub     esp, 8
    0804837A 50                push    eax
    0804837B 89 E0             mov     eax, esp
    0804837B ; ------------------------------------------------------------------
    0804837D 00                db    0 ;
    0804837E 00                db    0 ;
    0804837F 00                db    0 ;
    08048380 00                db    0 ;
    08048381 00                db    0 ;
    08048382 00                db    0 ;
    08048383 00                db    0 ;
    08048384 00                db    0 ;
    08048385 00                db    0 ;
    08048386 00                db    0 ;
    08048387 58                db  58h ; X
    08048388 C7                db 0C7h ; +
    08048389 45                db  45h ; E
    0804838A FC                db 0FCh ; n
    0804838B 00                db    0 ;
    0804838C 00                db    0 ;
    0804838D 00                db    0 ;

The GNU objdump utility elides the run of NULL bytes (shown as "..."), but the rest of the disassembly is not affected:

    08048374 <my_func>:
     8048374:  55                      push   %ebp
     8048375:  89 e5                   mov    %esp,%ebp
     8048377:  83 ec 08                sub    $0x8,%esp
     804837a:  50                      push   %eax
     804837b:  89 e0                   mov    %esp,%eax
            ...
     8048385:  00 00                   add    %al,(%eax)
     8048387:  58                      pop    %eax
     8048388:  c7 45 fc 00 00 00 00    movl   $0x0,0xfffffffc(%ebp)
     804838f:  83 7d fc 09             cmpl   $0x9,0xfffffffc(%ebp)
     8048393:  7e 02                   jle    8048397 <my_func2+0x23>
     8048395:  eb 1a                   jmp    80483b1 <my_func2+0x3d>

Most disassemblers can be fooled by a simple misalignment error, for example, jumping into the middle of an instruction so that the target of the jump is disassembled incorrectly. The typical technique of performing an unconditional jump into another instruction is not very effective with disassemblers that follow the flow of execution: the jump will be followed, and the bytes between the jump and the jump target will be ignored.
Instead, you can use a conditional jump, followed by the first byte of a multibyte instruction (0x0F is ideal for this, because it is the first byte of all two-byte opcodes); this way, a flow-of-execution disassembler will disassemble the code after the conditional branch.

    #define DISASM_MISALIGN asm volatile ( \
            " pushl %eax \n"               \
            " cmpl %eax, %eax \n"          \
            " jz 0f \n"                    \
            " .byte 0x0F \n"               \
            "0: \n"                        \
            " popl %eax \n")

This macro compares the eax register to itself, which always sets the zero flag; the jz branch is therefore always taken during execution. A disassembler will either ignore the jz instruction and interpret the 0x0F byte that follows as the start of an instruction, or it will follow the jz instruction. Even if the jz instruction is followed, the disassembler can still interpret the code incorrectly if the address after the jz instruction is disassembled before the address to which the jz instruction jumps. For example:

    #include <stdio.h>

    void my_func(void) {
      int x;

      DISASM_MISALIGN;
      for (x = 0; x < 10; x++) printf("%x\n", x);
    }

IDA Pro disassembles the code after the jz instruction at address 0804837D before following the jump itself, resulting in an incorrect disassembly:

    08048374 my_func:
    08048374 55                push    ebp
    08048375 89 E5             mov     ebp, esp
    08048377 83 EC 08          sub     esp, 8
    0804837A 50                push    eax
    0804837B 39 C0             cmp     eax, eax
    0804837D 74 01             jz      short near ptr loc_804837F+1
    0804837F
    0804837F loc_804837F:                  ; CODE XREF: .text:0804837D#j
    0804837F 0F 58 C7          addps   xmm0, xmm7
    08048382 45                inc     ebp
    08048383 FC                cld
    08048383 ; --------------------------------------------------------------------
    08048384 00                db    0 ;
    08048385 00                db    0 ;
    08048386 00                db    0 ;
    08048387 00                db    0 ;
    08048388 83                db  83h ; â
    08048389 7D                db  7Dh ; }
    0804838A FC                db 0FCh ; n

The GNU objdump disassembler does not follow the jump at all and encounters the same problem:

    08048374 <my_func2>:
     8048374:  55                      push   %ebp
     8048375:  89 e5                   mov    %esp,%ebp
     8048377:  83 ec 08                sub    $0x8,%esp
     804837a:  50                      push   %eax
     804837b:  39 c0                   cmp    %eax,%eax
     804837d:  74 01                   je     8048380
                                              <my_func2+0xc>
     804837f:  0f 58 c7                addps  %xmm7,%xmm0
     8048382:  45                      inc    %ebp
     8048383:  fc                      cld
     8048384:  00 00                   add    %al,(%eax)
     8048386:  00 00                   add    %al,(%eax)
     8048388:  83 7d fc 09             cmpl   $0x9,0xfffffffc(%ebp)

Sophisticated disassemblers attempt to reconstruct as much as possible of the original source code of the binary. One of the tasks they perform toward this goal is the recognition of functions within the binary. Because the end of a function is generally assumed to be the first return instruction encountered, it is possible to truncate a function within the disassembler by providing a false return. The following macro returns to the address one byte past the byte that follows the ret instruction, causing the definition of the function to end prematurely:

    #define DISASM_FALSERET asm volatile (               \
            " pushl %ecx /* save registers */\n"         \
            " pushl %ebx \n"                             \
            " pushl %edx \n"                             \
            " movl %esp, %ebx /* save ebp, esp */\n"     \
            " movl %ebp, %esp \n"                        \
            " popl %ebp /* save old %ebp */\n"           \
            " popl %ecx /* save return addr */\n"        \
            " lea 0f, %edx /* edx = addr of 0: */\n"     \
            " pushl %edx /* return addr = edx */\n"      \
            " ret \n"                                    \
            " .byte 0x0F /* off-by-one byte */\n"        \
            "0: \n"                                      \
            " pushl %ecx /* restore ret addr */\n"       \
            " pushl %ebp /* restore old %ebp */\n"       \
            " movl %esp, %ebp /* restore ebp, esp */\n"  \
            " movl %ebx, %esp \n"                        \
            " popl %edx \n"                              \
            " popl %ebx \n"                              \
            " popl %ecx \n")

The first three pushl instructions and the last three popl instructions save and restore the registers that will be used in the course of the false return. The current stack pointer is saved in the ebx register and is then set to the frame pointer (ebp) of the current function, which places the saved frame pointer of the calling function at the top of the stack. The saved frame pointer is popped into the ebp register, and the return address is popped into the ecx register, so that these values can be preserved across the return. The instruction lea 0f, %edx stores the address of the local code label 0: in the edx register.
This address is then pushed onto the stack, where it becomes the new return address. The following ret instruction causes the program to jump to code label 0:, where the execution context of the function (the stack and frame pointers, saved frame pointer, and return address) is restored to its original state.

When a disassembler follows the control flow of the program, rather than blindly disassembling instructions from the start of the code segment, it will encounter the false return statement and will stop disassembly of the current function. As a result, any instructions after the false return will not be disassembled, and they will appear as data located in the code segment.

    #include <stdio.h>

    void my_func(void) {
      int x;

      for (x = 0; x < 10; x++) printf("%x\n", x);
      DISASM_FALSERET;
      /* other stuff can be done here that won't be disassembled */
    }

This produces the following disassembly in IDA Pro:

    08048357 51                   push    ecx
    08048358 53                   push    ebx
    08048359 52                   push    edx
    0804835A 89 E3                mov     ebx, esp
    0804835C 89 EC                mov     esp, ebp
    0804835E 5D                   pop     ebp
    0804835F 59                   pop     ecx
    08048360 8D 15 69 83 04 08    lea     edx, ds:dword_8048369
    08048366 52                   push    edx
    08048367 C3                   retn
    08048367 my_func endp ; sp = -0Ch
    08048367
    08048367 ;----------------------------------------------------------------
    08048368 0F                   db 0Fh ;
    08048369 51 55 89 E5          dword_8048369 dd 0E5895551h
    08048369                      ; DATA XREF: my_func+38#r
    0804836D 89                   db  89h ; ë
    0804836E DC                   db 0DCh ; ?
    0804836F 5A                   db  5Ah ; Z
    08048370 5B                   db  5Bh ; [
    08048371 59                   db  59h ; Y
    08048372 C9                   db 0C9h ; +
    08048373 C3                   db 0C3h ; +

The false return at address 08048367 ends the function, and the subsequent code is not disassembled. The XREF at address 08048369, however, clearly indicates that something strange is going on, even though the disassembly is incorrect. There is also an indication of a stack error at the endp directive.
A cracker can simply examine the instruction making the reference, in this case push edx at address 08048366, to realize that the return address is being overwritten.

A disassembler that does not follow the control flow will not be affected by the false return trick, as the following output from objdump demonstrates:

     8048357:  51                      push   %ecx
     8048358:  53                      push   %ebx
     8048359:  52                      push   %edx
     804835a:  89 e3                   mov    %esp,%ebx
     804835c:  89 ec                   mov    %ebp,%esp
     804835e:  5d                      pop    %ebp
     804835f:  59                      pop    %ecx
     8048360:  8d 15 69 83 04 08       lea    0x8048369,%edx
     8048366:  52                      push   %edx
     8048367:  c3                      ret
     8048368:  0f 51 55 89             sqrtps 0xffffff89(%ebp),%xmm2
     804836c:  e5 89                   in     $0x89,%eax
     804836e:  dc 5a 5b                fcompl 0x5b(%edx)
     8048371:  59                      pop    %ecx
     8048372:  c9                      leave
     8048373:  c3                      ret

The false return at address 08048367 does not affect the subsequent disassembly, although the misalignment trick at address 08048368 does cause the next three instructions to be disassembled incorrectly. This provides an example of how two simple techniques can be combined to create an inaccurate disassembly in different types of disassemblers.
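The difference between the two kinds of disassembler can be sketched with a toy instruction-length decoder. This is only an illustration under stated assumptions: the names falseret_code, modrm_len, insn_len, sweep, and linear_hits are invented for this example, the decoder recognizes only the opcodes that occur in the false-return listings, and it omits the SIB special cases that a real decoder must handle.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes at addresses 08048357 through 08048373, copied from the
 * false-return listings in this recipe. */
static const unsigned char falseret_code[] = {
    0x51, 0x53, 0x52,                     /* push ecx; push ebx; push edx */
    0x89, 0xE3, 0x89, 0xEC,               /* mov ebx, esp; mov esp, ebp   */
    0x5D, 0x59,                           /* pop ebp; pop ecx             */
    0x8D, 0x15, 0x69, 0x83, 0x04, 0x08,   /* lea edx, ds:08048369         */
    0x52, 0xC3,                           /* push edx; ret (false return) */
    0x0F,                                 /* junk byte, never executed    */
    0x51, 0x55, 0x89, 0xE5, 0x89, 0xDC,   /* real code resumes here       */
    0x5A, 0x5B, 0x59, 0xC9, 0xC3
};

/* Size of a ModRM byte plus any SIB byte and displacement (32-bit mode).
 * Toy version: the SIB "no base" special case is not handled. */
static size_t modrm_len(unsigned char modrm)
{
    unsigned mod = modrm >> 6, rm = modrm & 7;
    size_t n = 1;                                   /* the ModRM byte     */

    if (mod != 3 && rm == 4) n += 1;                /* SIB byte           */
    if (mod == 1) n += 1;                           /* disp8              */
    else if (mod == 2 || (mod == 0 && rm == 5)) n += 4;   /* disp32       */
    return n;
}

/* Length of the instruction at p, or 0 if unknown or truncated. */
static size_t insn_len(const unsigned char *p, size_t avail)
{
    size_t n;

    if (avail == 0) return 0;
    if ((p[0] >= 0x50 && p[0] <= 0x5F) || p[0] == 0xC3 || p[0] == 0xC9)
        n = 1;                                      /* push/pop, ret, leave */
    else if (p[0] == 0xE5)
        n = 2;                                      /* in eax, imm8       */
    else if ((p[0] == 0x89 || p[0] == 0x8D || p[0] == 0xDC) && avail >= 2)
        n = 1 + modrm_len(p[1]);                    /* mov, lea, fcomp    */
    else if (p[0] == 0x0F && avail >= 3 && p[1] == 0x51)
        n = 2 + modrm_len(p[2]);                    /* sqrtps             */
    else
        return 0;
    return n <= avail ? n : 0;
}

/* Decode from offset 0 and return the number of bytes covered.
 * stop_at_ret != 0 models a flow-following disassembler that ends the
 * function at the first ret; 0 models a plain linear sweep. */
static size_t sweep(const unsigned char *code, size_t len, int stop_at_ret)
{
    size_t off = 0;

    assert(code != NULL);
    while (off < len) {
        size_t n = insn_len(code + off, len - off);

        if (n == 0) break;
        off += n;
        if (stop_at_ret && code[off - n] == 0xC3) break;
    }
    return off;
}

/* Returns 1 if a linear sweep treats offset target as an instruction start. */
static int linear_hits(const unsigned char *code, size_t len, size_t target)
{
    size_t off = 0;

    while (off < len && off < target) {
        size_t n = insn_len(code + off, len - off);

        if (n == 0) return 0;
        off += n;
    }
    return off == target;
}
```

In this model the flow-following pass covers only the first 17 bytes and stops at the false return, while the linear sweep covers all 29 bytes but never lands on offset 18 (address 08048369), where the real code resumes; it falls back into step with the real instruction stream only at offset 26 (address 08048371).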