Chapter 17: Observation and Reverse Engineering

Examining a black box component, either by observing its behavior in use or by reverse engineering it to determine its inner workings, can provide information that is useful in finding security bugs. In this chapter, we begin by discussing some of the basic methods you can use to study the behavior of black box components without reverse engineering their inner workings. Then we discuss how to use debuggers (programs used to track down bugs by tracing program execution), decompilers (programs that convert a program's binary code into a higher-level programming language), and disassemblers (programs that convert a program's binary code into assembly language) to reverse engineer a program. Unlike most other chapters of this book, this chapter does not discuss a specific type of security issue. Instead, it discusses techniques that you can use to better understand bugs in your applications. Examining your own code this way is valuable for understanding how others might exploit the binaries you ship.

Observation Without a Debugger or Disassembler

As discussed later in this chapter, reverse engineering a computer program commonly involves studying and analyzing the program's binary executable, without the help of the program's source code, to understand its inner workings. This effort requires that the program first be decompiled into a high-level programming language (such as C++) or disassembled into assembly code, and it may also require the use of a debugger to trace the program's execution. It also requires a great deal of time because you must acquire an in-depth comprehension of the program's binary code.

Observation of a program's operation, which we discuss in this section, can be performed in significantly less time. In fact, you can accomplish this type of study without even having a copy of the target program's binary files. Two common methods of observation are comparing output based on changes in input and using monitoring tools to study the effects of program execution.

Comparing Output

When you are searching for security flaws, it is helpful to have as much detailed knowledge of an application's operation as possible. One way to obtain this information is by using the application and noting the details of its functionality. For example, if you are trying to bypass a filter that attempts to block cross-site scripting attacks (discussed in Chapter 10, "HTML Scripting Attacks"), you could try sending every possible character separately as input and watching to determine which characters are filtered.
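The character-by-character probe described above might be sketched as follows. This is only an illustration under stated assumptions: `naive_filter` is a hypothetical stand-in for the filter under test, and `probe_filter` is an invented helper name. Against a real application you would send each character in a request and inspect the response instead of calling a local function.

```python
def naive_filter(text: str) -> str:
    """Hypothetical filter under test: strips a few characters used in script injection."""
    blocked = '<>"'
    return "".join(ch for ch in text if ch not in blocked)


def probe_filter(filter_func, candidates):
    """Send each candidate character separately and record how the filter treats it."""
    removed, altered, passed = [], [], []
    for ch in candidates:
        marker = f"A{ch}B"  # surround the character so it is easy to find in the output
        result = filter_func(marker)
        if result == marker:
            passed.append(ch)   # character came back unchanged
        elif result == "AB":
            removed.append(ch)  # character was stripped
        else:
            altered.append(ch)  # character was encoded or replaced
    return removed, altered, passed


if __name__ == "__main__":
    printable = [chr(code) for code in range(0x20, 0x7F)]
    removed, altered, passed = probe_filter(naive_filter, printable)
    print("removed:", removed)
    print("altered:", altered)
    print("passed through:", len(passed), "characters")
```

Sorting the results into removed, altered, and passed makes it easy to see which characters remain available when you build later test cases.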

Small changes in output often reveal underlying implementation details, anomalies, and bugs. Error messages also disclose helpful information. For example, the error message "SQL error" indicates that the operation requires interaction with a database. This approach (observing the program's behavior in use) does not require great technical skill or an in-depth understanding of code.

Output Can Become Input

A program's output is sometimes later used as input by the same or a different program. Examples include network traffic and data files. Understanding how a program produces output can help you create similar data that could later be consumed as input. For example, a client application might send data (output) over the network to be read by the server (input). Making small changes in the user-controlled input to an application that produces output data can help determine the data format used by the application. For example, Microsoft Office Word documents can contain hyperlinks (URLs). By repeatedly saving the same file using a slightly different hyperlink each time, comparing the resulting data files, and noting their differences, you can begin to discover the document file format.
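A hex editor is enough for this comparison, but a short script can list the offsets at which two saved files differ. The following is a minimal sketch: `diff_ranges` is an invented helper, and the two sample buffers are made-up data chosen for illustration, not actual Word file contents.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return (start, end) offset pairs, end exclusive, where the two buffers differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    longest = max(len(a), len(b))
    if start is not None:
        ranges.append((start, longest))   # run continues to the end of the data
    elif longest > common:
        ranges.append((common, longest))  # one buffer has extra trailing bytes
    return ranges


if __name__ == "__main__":
    # Made-up sample data: a 4-byte field followed by a string and padding.
    first = b"\xAA\xBB" + bytes([5, 0, 0, 0]) + b"Test\x00\x00" + b"\xCC"
    second = b"\xAA\xBB" + bytes([6, 0, 0, 0]) + b"Test2\x00" + b"\xCC"
    for start, end in diff_ranges(first, second):
        print(f"0x{start:04x}-0x{end - 1:04x}: {first[start:end].hex()} -> {second[start:end].hex()}")
```

To compare real files, read each one in binary mode (for example, `open(path, "rb").read()`) and pass the two results to `diff_ranges`.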

Recently, we used this approach to save two Microsoft Word files (output), the first with the hyperlink (URL) Test and the second with the URL Test2. Figure 17-1 shows offset 0x146c-0x147b of both files when viewed in a binary editor (also known as a hex editor). Notice any differences? In addition to the change in the URL text, the binary data preceding the URL changed from 05 00 00 00 to 06 00 00 00. If the terminating NULL character is included, the length of the embedded URL changed from 5 characters in the first document to 6 characters in the second document. This corresponds directly with the data preceding the URL in the file. Also note that the length of the embedded URL is stored in little endian notation (meaning the least significant byte is stored first, so the bytes 06 00 00 00 represent the value 0x00000006).

Figure 17-1: Saving and comparing two documents with a one-character difference, which reveals the string length is stored in the 4 bytes preceding the string
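The length prefix observed in Figure 17-1 can be decoded with Python's `struct` module, where the `<I` format means a little endian unsigned 32-bit integer. The sketch below assumes the simplified layout described above (a 4-byte length followed by single-byte characters and a terminating NULL). `read_length_prefixed` and the sample records are illustrative names and data, not part of the Word file format.

```python
import struct


def read_length_prefixed(buf: bytes, offset: int = 0):
    """Read a 4-byte little endian length at offset, then that many bytes of data."""
    (length,) = struct.unpack_from("<I", buf, offset)
    data = buf[offset + 4 : offset + 4 + length]
    return length, data


if __name__ == "__main__":
    # Illustrative records modeled on the bytes described for Figure 17-1.
    record_1 = bytes.fromhex("05000000") + b"Test\x00"
    record_2 = bytes.fromhex("06000000") + b"Test2\x00"
    print(read_length_prefixed(record_1))
    print(read_length_prefixed(record_2))

    # Reading the same 4 bytes as big endian gives a very different number,
    # which is a quick way to check which byte order a field uses.
    (wrong_order,) = struct.unpack(">I", bytes.fromhex("06000000"))
    print(hex(wrong_order))
```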

How Does Comparing Output Help Your Testing?

By using the approach outlined in the preceding section, you will be better able to understand an application's implementation and data format. Understanding the data format and data handling is helpful when you focus testing on a specific area or try to create malicious input. In the Word file example, it is probable that when the application opens this file as input, it reads the size field before reading the data field (the hyperlink's URL). The program's parser might assume that the length of the data matches the length specified in the file and that the data is NULL-terminated. Neither of these assumptions should be made!

Imagine that the parser allocates a buffer the size of the size field and then reads the data field until a NULL character is encountered. By creating a test case in which the size field is small and the data field contains a large amount of data, you might cause a buffer overrun. By comparing the output that becomes input, you can more easily understand the various fields contained in the input and use that knowledge to create better test cases.
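A test case of this kind might be built as in the following sketch. Everything here is hypothetical: `build_record` writes the simplified length-plus-data layout discussed above, and `naive_parse` only models the flawed parser by reporting how many bytes a copy-until-NULL loop would write compared with the buffer size taken from the size field. Python does not overrun buffers, so the model reports the mismatch instead of crashing.

```python
import struct


def build_record(declared_length: int, payload: bytes) -> bytes:
    """Build a record whose size field need not match the real payload length."""
    return struct.pack("<I", declared_length) + payload


def naive_parse(record: bytes):
    """Model of the flawed parser: buffer sized from the size field, copy until NULL."""
    (declared,) = struct.unpack_from("<I", record, 0)
    body = record[4:]
    end = body.find(b"\x00")
    # Bytes a copy-until-NULL loop would write. With no NULL in the record,
    # a real parser would keep reading past the end of the record.
    copied = len(body) if end == -1 else end + 1
    return declared, copied


if __name__ == "__main__":
    well_formed = build_record(5, b"Test\x00")
    mismatched = build_record(5, b"A" * 1000 + b"\x00")  # small size field, large data
    for name, record in (("well formed", well_formed), ("mismatched", mismatched)):
        declared, copied = naive_parse(record)
        print(f"{name}: buffer of {declared} bytes, {copied} bytes copied, "
              f"overrun of {max(0, copied - declared)} bytes")
```

Other variations worth generating the same way include a size field larger than the data and a data field with no terminating NULL.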

Important

Because comparing output requires analysis of only a program's output, studying parts of a program this way is especially useful to attackers who cannot obtain a copy of the program to disassemble it or run it under a debugger.

Using Monitoring Tools

Monitoring tools can give you even greater insight into how software works without the need for a debugger or assembly knowledge. Some common monitoring tools are Logger/Log Viewer, RegMon, FileMon, Ethereal, and Microsoft SQL Server Profiler.

Monitoring tools enable you to quickly understand key pieces of information about a program's implementation. By using RegMon and FileMon, you can determine which files and registry keys are written and read. Ethereal shows all network traffic. SQL Server Profiler shows the exact SQL statements that are run. Logger/Log Viewer enables you to obtain two important pieces of information: which application programming interfaces (APIs) are called and the data passed in the parameters of those calls. By knowing this information, you will be better able to understand the application's implementation and create better test cases.

Earlier chapters discuss in more detail the uses of RegMon (Chapter 3, "Finding Entry Points"), FileMon (Chapter 3), Ethereal (Chapter 4, "Becoming a Malicious Client"), and SQL Server Profiler (Chapter 16, "SQL Injection"). Here is some information about Logger/Log Viewer.

Logger/Log Viewer

You might have noticed that many of the tools used for security testing are general tools not made exclusively for security purposes. The pair of tools Logger and Log Viewer is no different. These tools were created to help users more easily debug applications and find performance issues. They are included as part of the Microsoft Debugging Tools for Windows (http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx). These free tools enable you to log the API calls a process makes for later viewing. For example, Log Viewer, in Figure 17-2, shows that Wordpad.exe calls the lstrcpyW API to copy the document filename into a buffer.

Figure 17-2: Viewing which APIs are called and the parameters used in the call in Log Viewer

Tip

Because of the way Logger uses the stack, it is known not to work in all situations. See the product documentation for details. In cases in which Logger does not function as needed, APISpy32 (http://www.internals.com) can be used instead.

You can also use Logger/Log Viewer to see whether hyperlinks in an application are invoked with a call to WinExec. By viewing the parameter data, you can also see how attacker-supplied data is used throughout the application and can target your testing more effectively. For example, lstrcpyW was used to copy the filename Document.rtf, as shown in Figure 17-2. Because an attacker might be able to control the filename, a long filename is an interesting targeted buffer overflow case: you know the application calls an API that performs an unbounded string copy.

Important

The functions lstrcpy and strcpy are not compiled into an application's binary in the same way. lstrcpy is part of kernel32.dll. When lstrcpy is called, code inside kernel32.dll is executed, which enables Logger and APISpy32 to detect its use. On the other hand, the code for strcpy is typically built into the application's binary, in which case it cannot be detected by the monitoring tools mentioned. Functions like strcpy can be more easily identified by using a disassembler, as discussed later in this chapter.