Disassembly is the process of taking a compiled executable and converting it back to its assembly language roots. Executables can be written with a variety of programming tools and languages (C++, .NET, C#, Visual Basic, and so on), but in the end they are all executed by the CPU as machine-language instructions. Every task and process on a PC is executed as machine language, as illustrated in Figure 12-1. Nothing enters a CPU that isn’t a machine instruction. In the programming world, assembly language is the closest thing we have to machine language.
Figure 12-1: Executable code pathway
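To make the relationship between machine language and assembly language concrete, here is a minimal Python sketch of what a disassembler does at its core: it reads raw bytes and prints the matching mnemonics. It recognizes only a handful of 32-bit x86 opcodes (nop, ret, push and pop of a register, and mov of a 32-bit immediate into a register). The function name `disassemble` and the sample bytes are made up for illustration, and a real disassembler handles far more encodings than this.

```python
import struct

# 32-bit x86 register names, indexed by the low three bits of the opcode.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code):
    """Translate a tiny subset of 32-bit x86 machine code into assembly text.

    Illustrative only: unknown bytes are emitted as raw data (db).
    """
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == 0x90:                      # one-byte NOP
            out.append("nop")
            i += 1
        elif op == 0xC3:                    # near return
            out.append("ret")
            i += 1
        elif 0x50 <= op <= 0x57:            # push r32 (register encoded in opcode)
            out.append("push " + REGS[op - 0x50])
            i += 1
        elif 0x58 <= op <= 0x5F:            # pop r32
            out.append("pop " + REGS[op - 0x58])
            i += 1
        elif 0xB8 <= op <= 0xBF and i + 5 <= len(code):
            # mov r32, imm32: opcode byte followed by a little-endian 32-bit value
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append("mov {}, 0x{:x}".format(REGS[op - 0xB8], imm))
            i += 5
        else:                               # anything this sketch does not know
            out.append("db 0x{:02x}".format(op))
            i += 1
    return out


# Hypothetical sample: eight bytes of machine code for a trivial function.
sample = bytes.fromhex("55b82a0000005dc3")
listing = disassemble(sample)

if __name__ == "__main__":
    for line in listing:
        print(line)
```

The eight bytes in `sample` come back as four readable instructions (push ebp, mov eax, 0x2a, pop ebp, ret), which is the same translation a full disassembler performs across an entire executable.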
In an ideal world, we could disassemble any malicious executable back to its original language and see the source code as it was written, author’s comments and all. But that isn’t possible with many compiled executables, because compiling usually strips out the author’s comments and discards much of the program’s original structure.
Decompilers are available for some programming languages, but the results are mixed. Languages that compile to an intermediate “pseudo-op code” (bytecode) step are easier to reverse-engineer than those that compile directly to machine code. For instance, a Java applet can usually be decompiled back to something close to its original Java statements, but you will have less success with an executable compiled from C++.
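The point about intermediate code can be seen with a small experiment. The sketch below uses Python rather than Java, simply because Python also compiles to bytecode and ships with a bytecode disassembler (the standard `dis` module), so treat it as an analogy rather than a Java example. The function `apply_discount` is hypothetical. Notice that the comment disappears when the source is compiled, but the parameter and variable names survive in the compiled object, which is part of why such formats decompile so well.

```python
import dis

# Hypothetical source code, including a comment the compiler will discard.
SOURCE = '''
def apply_discount(price, rate):
    # This comment is discarded at compile time.
    discounted = price - price * rate
    return discounted
'''

# Compile the source to bytecode and pull out the resulting function object.
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
func = namespace["apply_discount"]

# The compiled code object still records parameter and local variable names.
recovered_names = func.__code__.co_varnames

# Collect the bytecode instruction names. The exact opcodes differ between
# Python versions, so this sketch does not assume any particular listing.
opnames = [ins.opname for ins in dis.get_instructions(func)]

if __name__ == "__main__":
    print(recovered_names)
    dis.dis(func)  # human-readable bytecode listing
```

A natively compiled C++ executable normally keeps no such names unless it was built with debugging symbols, so the equivalent listing would show only registers and memory addresses.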
Because of this limitation, disassembly in practice usually means taking a compiled executable and converting it to its assembly language representation. Of course, this means that anyone who wants to disassemble malicious code competently needs to understand both assembly language and malicious programming. Becoming a competent disassembler involves six steps:
Learn assembly language.
Learn the assembly language instructions available on a particular computer platform.
Choose a disassembler program.
Learn about malicious programming techniques.
Create a disassembly environment.
Practice with different types of code.
I’ll cover each of these steps in this chapter.
This chapter does not cover analyzing the source of nonexecutable code, such as scripting languages, macros, and HTML.