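This section introduces a number of strategies for auditing code and explains their strengths and weaknesses. Keep in mind that these strategies can (and often must) be combined to suit the nuances of the application you're reviewing. Developing your own strategies based on the workflow you find most appealing is encouraged, too.

Three basic categories of code-auditing strategies are described in the following sections, and all three have their value in different situations. The following list summarizes the categories:

- Code comprehension (CC) strategies discover vulnerabilities by directly analyzing the code, which also builds your understanding of the codebase.
- Candidate point (CP) strategies identify code patterns commonly associated with vulnerabilities and then back-trace from those points to untrusted input.
- Design generalization (DG) strategies infer higher-level design abstractions from the implementation to identify logic and design vulnerabilities.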
Each strategy description in the following sections includes a scorecard so that you can compare the finer points easily. Table 4-3 gives you a legend for understanding these scorecards.
Code Comprehension Strategies

Code comprehension strategies are organized around discovering vulnerabilities by directly analyzing the code. Typically, success with these techniques requires you to read the code and understand it. They require higher degrees of concentration and discipline than other techniques, but they pay dividends in terms of learning the codebase. As noted in the previous bulleted list, the abbreviation "CC" is used for the following discussion of these strategies.

Trace Malicious Input

The CC1 technique (see Table 4-4) is close to what most people think code review involves. You start at an entry point to the system, where user-malleable information can come in. You then trace the flow of code forward, performing limited data flow analysis. You keep a set of possible "bad" inputs in the back of your mind as you read the code and try to trace down anything that looks like a potential security issue. This technique is an effective way to analyze code, but it requires some experience so that you know which functions to trace into.
Generally, you focus your efforts on searching for any type of behavior that appears unsafe: a vulnerability class you recognize, a failure to define a trust boundary where it's needed, and so forth. It's hard to go too far off track with this technique because you can usually keep yourself on the trail of malleable input data. However, you can overlook issues when you get tired or impatient, because you inevitably start skipping over functions you would have analyzed earlier in the day. Unfortunately, this strategy is so time consuming that you're certain to lose focus at some point.

This kind of analysis can prove difficult in object-oriented code, especially poorly designed object-oriented code. You'll know quickly whether this is an issue because the first user input you trace makes you open five or six source code files, usually before the system manages to do anything with the input. In this case, you need the assistance of accurate design documentation, including a fairly complete threat model. Failing that, you should postpone your analysis and perform some module or class review first to understand the system from an object-oriented perspective.

Analyze a Module

The crux of the CC2 technique (see Table 4-5) is reading code line by line in a file. Instead of drilling down into function calls and objects you encounter, or back-tracing to see how functions are called, you take notes about any potential issues you spot.
You might not expect this, but many experienced code reviewers settle on the CC2 technique as a core part of their approach. In fact, two of your authors typically start reviewing a new codebase by finding the equivalent of the util/ directory and reading the framework and glue code line by line.

This technique has great side benefits for future logic and design review efforts because you pick up the language and idioms of the program and its creators. It might seem as though you'd miss issues left and right by not tracing the flow of execution, but it actually works well because you aren't distracted by jumping around the code constantly and can concentrate on the code in front of you. Furthermore, all the code in the same file tends to be cohesive, so you often have similar algorithms to compare.

This technique has tradeoffs as well. First, it's taxing, and often you feel mental fatigue kick in after too many continuous hours. Sometimes you stop being effective a little while before you realize it, which can lead to missed vulnerabilities. The other problem is that documenting every potential issue requires considerable discipline, and maintaining the momentum for longer than four or five hours can be hard. Generally, you should stop for the day at this point and switch to other types of less intense analysis.

This technique has another hidden flaw: It's easy to go off track and review code that isn't security-relevant and isn't teaching you anything about the application. Unfortunately, you need to have a good feel for software review to know whether you're spending your time effectively. Even considering that, sometimes a piece of code just catches your fancy and you follow it down the rabbit hole for the next several hours. So make sure you're sticking to your process when using this review strategy and accurately assessing how valuable it is.
Analyze an Algorithm

The CC3 strategy (see Table 4-6) requires knowing enough of the system design to be able to select a security-relevant algorithm and analyze its implementation. This strategy is essentially the same as analyzing a module (CC2); however, you're less likely to go off track.
Of course, the effectiveness of this strategy depends almost entirely on the algorithm you select to analyze, so you need to choose something security-relevant. It's best to focus your efforts on pervasive and security-critical algorithms, such as those that enforce the security model, implement cryptography, or are used in most input processing.

Analyze a Class or Object

The CC4 strategy (see Table 4-7) is almost the same as analyzing a module (CC2, Table 4-5), except you focus on a class implementation.
This strategy is more effective than CC2 for object-oriented programs because objects tend to be fairly cohesive. It's also less prone to slipping off track, although how much is determined by how cohesive and security-relevant the object is. As with CC2, you need to pay close attention when employing this review strategy.

Trace Black Box Hits

Chapter 1, "Software Vulnerability Fundamentals," introduced black box testing and fuzz-testing, and this chapter explains how they can affect the assessment process. To recap, in black box testing, you manually feed an application with different erroneous data to see how the program responds; fuzz-testing uses tools to automate the black box testing process. You flag your black box input as a "hit" when it causes the program to crash or disclose useful information it shouldn't. These hits are then traced to identify the vulnerabilities that caused the abnormal behavior. Essentially, black box testing is a brute-force method for finding vulnerabilities and isn't very thorough; however, it might enable you to catch "low-hanging fruit" in a short time. Occasionally, it will also help you find extremely subtle vulnerabilities that are difficult to identify with code analysis.

The CC5 strategy (see Table 4-8) provides a method for including black box and fuzz-testing in a more detailed application assessment. The procedure for performing this strategy is fairly simple. It requires only a functioning version of the application and identification of the entry points you want to target. Then you need to tailor the types of inputs you generate from your fuzz-testing tool or manually iterate through a smaller set of inputs. For example, if you're auditing a Web server, and the entry point is a TCP port 80 connection, you probably want to use an HTTP protocol fuzzer. You might have additional knowledge of the implementation that enables you to further alter your inputs and improve your chances of successful hits.
Of course, nonstandard or proprietary protocols or file formats might require far more effort in generating a fuzzing tool. Luckily, you can simplify this task to some degree by using frameworks such as SPIKE, discussed later in "Fuzz-Testing Tools."
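As a rough illustration of generating inputs for an entry point, the following C sketch mutates a known-good request by overwriting a few bytes at pseudo-random positions. It is not how SPIKE or any particular fuzzer works; the names fuzz_next() and fuzz_mutate() are hypothetical, and the mutation scheme (random byte replacement driven by a small xorshift generator) is just one simple choice. A fixed seed is used so that any input producing a hit can be regenerated exactly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tiny deterministic PRNG (xorshift32) so that a given seed always
 * reproduces the same test case. The state must be nonzero. */
static uint32_t fuzz_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Hypothetical helper: copy seed_input into out, then overwrite up to
 * `flips` randomly chosen bytes with random values. Returns the number
 * of bytes written to out, or 0 if the input is empty or out is too
 * small to hold it. */
size_t fuzz_mutate(const unsigned char *seed_input, size_t len,
                   unsigned char *out, size_t out_size,
                   uint32_t seed, unsigned flips)
{
    uint32_t state = seed ? seed : 1u;
    unsigned i;

    if (len == 0 || len > out_size)
        return 0;

    memcpy(out, seed_input, len);
    for (i = 0; i < flips; i++) {
        size_t pos = fuzz_next(&state) % len;
        out[pos] = (unsigned char)(fuzz_next(&state) & 0xFF);
    }
    return len;
}
```

A harness would call fuzz_mutate() in a loop with increasing seed values, send each result to the entry point (for example, a TCP port 80 connection), and record the seed whenever the target crashes or misbehaves. Knowledge of the implementation can then guide which byte ranges are worth mutating.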
Note: Ideally, black box analysis should be part of the QA process. However, the QA process might not be broad enough to address the true range of potentially malicious input. So you should use any available QA testing harnesses but alter the input beyond the parameters they already check.

The "Fault Injection" chapter of The Shellcoder's Handbook (Wiley, 2004) covers black box testing techniques extensively. It outlines a number of useful input generation methods, summarized in the following list:
Candidate Point Strategies

Candidate point (CP) strategies are one of the fastest ways of identifying the most common classes of vulnerabilities. These strategies focus on identifying idioms and structured code patterns commonly associated with software vulnerabilities. The reviewer can then back-trace from these candidate points to find pathways allowing access from untrusted input. The simplicity of this approach makes candidate point strategies the basis for most automated code analysis. Of course, the disadvantage is that these strategies don't encourage a strong understanding of the code and ignore vulnerabilities that don't fit the rather limited candidate point definitions.

General Candidate Point Approach

The CP1 strategy (see Table 4-9) is almost the opposite of a code comprehension strategy. You start with the lowest-level routines that grant access to application assets or could harbor a vulnerability. This process might involve using automated tools to discover potentially unsafe code constructs or just a simple text search based on your existing knowledge of the application and potential vulnerabilities. You then trace backward through the code to see whether these routines expose any vulnerabilities accessible from an application entry point.
For example, say you use an analysis tool that reports the following:

    util.c: Line 1293: sprintf() used on a stack buffer

You would attempt to verify whether it's really a bug. The function might look something like this:

    int construct_email(char *name, char *domain)
    {
        char buf[1024];

        sprintf(buf, "%s@%s", name, domain);

        /* ... do more stuff here ... */
    }

You can't determine whether this bug is exploitable until you verify that you can control either the name or domain argument to this function, and that those strings can be long enough to overflow buf. So you need to check each instance in which construct_email() is called to verify whether it's vulnerable.

This verification approach is actually fairly quick, but it has a number of drawbacks. Mainly, it's an incomplete approach; it improves your familiarity with the application, but it doesn't increase your understanding of how the application works. Instead, you must rely on assumptions of what constitutes a vulnerability, and these assumptions might not reflect the code accurately. Therefore, using only this approach can cause you to miss more complex vulnerabilities or even simple vulnerabilities that don't fit strict classifications.

Automated Source Analysis Tool

The CP2 strategy (see Table 4-10) can be used to generate candidate points, as discussed in the CP1 strategy. This strategy has gotten a lot of press in the past few years, as software companies scramble to find simpler and less expensive methods of securing their applications. The result has been an explosion in the number and variety of source analysis tools.
Early source-code analysis systems were just simple lexical analyzers; they searched for patterns matching potentially vulnerable source strings. Newer systems can actually perform a fairly detailed analysis of an application's data flow and identify several classes of vulnerabilities. These tools can be helpful in identifying candidate points and even offer some level of analysis to speed up manual review of identified candidates.

The downside of automated source analysis tools is that they are in their infancy. The current batch of tools requires a high time and cost investment and has inconsistent performance. Most tools require extensive configuration and report an excessive number of false-positive candidate points. This problem is so severe that the results of the tool are often ignored because of the time required to trace all the false-positive results.

Finally, as a candidate point strategy, automated source analysis tools focus only on a specific set of potentially vulnerable idioms. Therefore, they are limited in the classes of vulnerabilities they can detect. Even the best automated source analysis tools fail to identify simple vulnerabilities outside their parameters or complex vulnerabilities that lack an easily defined direct relationship. These complex vulnerabilities include most design and logic vulnerabilities in addition to many of the more complex implementation vulnerabilities.

Taking all the preceding points into account, there is still a lot of potential for automated source analysis tools. The technology will certainly improve, and the long-term benefits will eventually outweigh the downsides. In fact, many development groups are already using automated analysis to augment manual code review and internal quality control. This practice can be expected to grow as tools become more flexible and can be integrated into the complete review process more effectively.
Simple Lexical Candidate Points

A wide range of vulnerabilities lend themselves to identification based on simple pattern-matching schemes (the CP3 strategy shown in Table 4-11). Format string vulnerabilities and SQL injection are two obvious examples. In identifying these vulnerabilities, the reviewer uses a utility such as grep or findstr to generate a list of candidate points from across a codebase. This list is then pared down based on what the reviewer knows about the application design. For instance, you should be able to eliminate the majority of these candidate points by simply identifying whether they are in a module that handles any potentially malicious input. After the list has been pared down, you use the general candidate point approach (CP1) to identify any exploitable paths to this location.
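To show what a lexical candidate point check for format string issues might look like beyond a plain grep, here is a minimal C sketch. It flags a source line when a printf-family call passes something other than a string literal in the format position. The function names, the table of routines, and the argument positions are choices made for this illustration; a real review would extend the table with the application's own logging wrappers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Routine name and zero-based index of its format-string argument. */
struct fmt_func { const char *name; int fmt_index; };

static const struct fmt_func fmt_funcs[] = {
    { "printf", 0 }, { "fprintf", 1 }, { "sprintf", 1 },
    { "snprintf", 2 }, { "syslog", 1 },
};

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Skip `count` top-level arguments starting just after '('. Returns a
 * pointer to the start of the next argument, or NULL if the call ends
 * (or the line runs out) first. */
static const char *skip_args(const char *p, int count)
{
    int depth = 0, in_string = 0;

    for (; *p != '\0' && count > 0; p++) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0')
                p++;                      /* skip escaped character */
            else if (*p == '"')
                in_string = 0;
        } else if (*p == '"') {
            in_string = 1;
        } else if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (depth == 0)
                return NULL;              /* call closed too early */
            depth--;
        } else if (*p == ',' && depth == 0) {
            count--;
        }
    }
    return count == 0 ? p : NULL;
}

/* Returns 1 if the line contains a listed call whose format argument
 * does not begin with a string literal. Purely lexical: comments,
 * macros, and calls split across lines are not understood, so every
 * hit is only a candidate for manual back-tracing. */
int is_format_candidate(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof(fmt_funcs) / sizeof(fmt_funcs[0]); i++) {
        size_t n = strlen(fmt_funcs[i].name);
        const char *p = line;

        while ((p = strstr(p, fmt_funcs[i].name)) != NULL) {
            const char *arg = p + n;
            int own_token = (p == line || !is_ident_char(p[-1]));

            p = arg;                      /* resume search after name */
            while (*arg == ' ' || *arg == '\t')
                arg++;
            if (!own_token || *arg != '(')
                continue;
            arg = skip_args(arg + 1, fmt_funcs[i].fmt_index);
            if (arg == NULL)
                continue;
            while (*arg == ' ' || *arg == '\t')
                arg++;
            if (*arg != '"' && *arg != '\0')
                return 1;
        }
    }
    return 0;
}
```

Running a check like this over each line of a codebase gives a starting list. Lines such as printf(buf) or syslog(LOG_ERR, msg) are flagged, while printf("%s", buf) is not; the flagged lines still need to be traced back with the CP1 approach to see whether an attacker controls the format argument.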
Simple Binary Candidate Points

As with source analysis, a range of candidate points can be identified fairly easily in an application's binary code (the CP4 strategy shown in Table 4-12). For example, you can identify a starting list of candidate points for sign extension vulnerabilities by listing the occurrences of the MOVSX instruction on an Intel binary executable. You can also search for many equivalent source patterns in the binary; this method is essential when you don't have access to the application's source code. You can then pare down the list and trace in essentially the same manner you would for the lexical candidate point strategy (CP3).
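To see why MOVSX is a useful marker, it helps to look at the kind of source pattern that usually produces it. The following C sketch is illustrative only; the function names are hypothetical, and the exact instructions emitted depend on the compiler and optimization level. On x86, widening a signed 8-bit value is typically compiled to MOVSX, whereas widening through an unsigned type is typically compiled to MOVZX.

```c
#include <stddef.h>

/* Risky pattern: a signed 8-bit length is widened directly to size_t.
 * The value is sign-extended first, so a byte of 0xFF (-1) becomes the
 * largest possible size_t rather than 255. */
size_t widen_signed(signed char len_byte)
{
    return (size_t)len_byte;
}

/* Safer pattern: convert to unsigned char first so the value is
 * zero-extended and stays in the range 0..255. */
size_t widen_unsigned(signed char len_byte)
{
    return (size_t)(unsigned char)len_byte;
}
```

With a length byte of -1, widen_signed() returns the maximum size_t value while widen_unsigned() returns 255. If the first result is later used as a copy length, the effect is a very large memory copy, which is why each MOVSX occurrence is worth tracing back to see whether the source value comes from untrusted input.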
Black Box-Generated Candidate Points

When black box testing returns results indicating software bugs, you need to work backward from the fault point to find the cause. This strategy (CP5) is summarized in Table 4-13.
Most of the time, the black box method involves performing some level of crash analysis. To perform this step, you probably need to be familiar with assembly code. Many debuggers can correlate source code with assembly code to some degree, so if you have source code available, you might not need to be as familiar with assembly code. Sooner or later, however, a good auditor should be competent at reading and interpreting assembly code. Fortunately, it's something that you will almost certainly pick up with experience, and you can take advantage of a lot of available literature on assembly code for a variety of architectures. Because most popular software is compiled for Intel platforms, you will probably want to learn this platform first. In addition to books and online tutorials, you can find a comprehensive manual of the Intel instruction set and programming guides from Intel at www.intel.com/design/pentium4/manuals/index_new.htm.

Now you have the challenge of tracing backward from a memory dump of where the crash occurred to where in the code something went wrong. This topic could warrant an entire chapter or more, but because it's not the focus of this chapter (or the book), just the basics are covered. First, some crash dumps are easy to find because they crash precisely at the location where the bug is triggered. Consider the following code, for example:

    text:76F3F707    movzx   ecx, word ptr [eax+0Ah]
    text:76F3F70B    dec     ecx
    text:76F3F70C    mov     edx, ecx
    text:76F3F70E    shr     ecx, 2
    text:76F3F711    lea     edi, [eax+19h]
    text:76F3F714    rep movsd
    text:76F3F716    mov     ecx, edx
    text:76F3F718    and     ecx, 3
    text:76F3F71B    rep movsb
    text:76F3F71D    pop     edi
    text:76F3F71E    pop     esi

A huge memory copy will occur, assuming you can control the short integer located at [eax+0Ah] and set that integer to 0. If it's set to 0, the dec ecx instruction causes an integer underflow, which results in a large memory copy.

Note: This type of bug is discussed in more detail in Chapter 6, "C Language Issues." Don't worry if you don't understand it now. Just be aware that a huge memory copy occurs as a result, thus corrupting large amounts of program data.

If you had fuzz-tested this bug, it would crash on the rep movsd instruction. This bug is fairly straightforward to analyze via back-tracing because you know instantly where the crash occurs. The remaining work is to figure out where [eax+0Ah] is populated. Usually you search the immediate function where the application has crashed; failing that, you might need to do more investigative work. In this case, you need to see where the eax register was set and trace back to find where it was allocated. In object-oriented code, references like this might refer to an object instantiation of a class, which makes things more difficult (if you have only the binary to work with) because you can't see a direct path from the population of that memory location to a place where it's referenced and used. Thankfully, others (in particular, Halvar Flake) have done work on dealing with object recognition in binaries and weeding out unwanted code paths to help isolate activity in a certain part of the application. (Flake's BinNavi tool and objrec IDA plug-in are described in "Binary Navigation Tools," later in this chapter.) In this situation, a crash is analyzed with this basic procedure:
The second example of dealing with faults happens when the application crashes at a seemingly random location. This can happen when memory corruption occurs at some point in the program but the corrupted memory region isn't accessed (or accessed in such a way that a fault is generated) until much later in the code. In fact, in the previous assembly example, imagine that you traced it back and determined that [eax+0Ah] was set to 10 when a class was initialized and is never changed. This crash then becomes mystifying because you have determined that [eax+0Ah] is never set to 0, yet here it is crashing because it was set to 0! In this case, what has likely happened is one of two things:
If the first case is true, when you fuzz the application again with the same input, an identical crash will probably occur, but if the second case is true, the application might crash somewhere totally different or not at all. So how do you find out what's going on? Several tools are available to help you discover the cause of a fault, depending on the nature of the vulnerability. The easiest one to discover is when a buffer that's not part of any sort of structure has been allocated on the heap and overflowed. Although the random crashes seem like a problem at first, you can isolate problems such as this one fairly quickly. Microsoft has a tool named gflags that's part of the Microsoft Debugging Tools for Windows (available at www.microsoft.com/whdc/devtools/debugging/debugstart.mspx), which is useful in this situation. In particular, you can use it to enable "heap paging" functionality in the process you're debugging. Essentially, heap paging causes each request for memory to be allocated at the end of a page so that a guard page immediately follows the memory allocated. So when a buffer overflow occurs, an attempt is made during the copy operation to write data to the guard page, thus triggering an exception. Therefore, you can cause an exception to occur immediately when the bug is triggered. Custom memory allocators might be more difficult, however. One approach is to intercept calls to the custom memory allocation routines and redirect them to system allocation routines. The difficulty of this approach depends on the OS, whether memory allocators are in a separate shared library, and whether they are externally accessible symbols. Other solutions might include patching binary code to make the custom memory allocators do nothing except call the real allocation routines. Some of these methods can become messy and programming intensive, but your choice depends on the testing environment and what tools you have available. 
For example, in a UNIX environment, hijacking function calls to a shared library is quite simple using the LD_PRELOAD functionality that UNIX linkers provide. You can set this environment variable to direct the linker to load a library of your choosing instead of the library function that's intended to be called.

Note: The LD_PRELOAD linker functionality has been a target of security bugs in the past, and it's discussed in more detail in the coverage of UNIX vulnerabilities in Chapter 10, "Unix II: Processes."

Another quick-and-dirty hack involves using a debugger to manually redirect calls from one location to another to cause different allocation routines to be called. For example, you could set a breakpoint in a debugger on a custom application, and then set the instruction pointer to point to the system's memory allocator whenever the breakpoint is triggered. This method is tedious because allocations probably occur hundreds of times in the application you're examining; however, many debuggers enable you to create scripts or carry out tasks automatically when a breakpoint is triggered. For example, in the SoftICE debugger, you could issue the following command:

    bpx 12345678 DO "r eip malloc"

This command sets a breakpoint on memory location 0x12345678 (assuming the custom memory allocator is at that location). When the breakpoint is triggered, the instruction pointer is changed to point to the malloc() routine instead.

If you have corrupted a structure, you need to examine the effects of that corruption to understand how it occurred. Look for the offset of the lowest corrupted structure member to get a more accurate location. Once you know the location, you should be able to determine that the corruption occurred in one of the following two ways:
So you need to identify where the corrupted elements exist in the structure you are examining. Doing this can cut down on time spent examining how the structure is manipulated, as fixed-size data types being modified aren't a concern. The way certain offsets of the structure are accessed gives you a clear indication of what kind of data is being stored there. Code indicating data buffers in a structure might look something like this:

    lea     eax, [ebx+0FCh]
    push    [ebp + arg_0]
    push    eax
    call    strcpy

Suppose you're examining a crash because [ebx+124h] is supposed to be a pointer, but instead it's 0x41414141 because you have somehow corrupted the structure. Looking at the preceding code, you can see that [ebx+0FCh] is apparently a string because it's passed as the destination argument to strcpy(). You could then trace back arg_0 and see whether you controlled it and whether it's indeed the result of the structure corruption.

Application-Specific Candidate Points

After you've spent some time with a codebase, you'll start to notice recurring vulnerable patterns and programmatic idioms. Sometimes they are vulnerable utility functions, such as a database wrapper or a string-handling routine. With the CP6 strategy (see Table 4-14), you focus on the similarities in these patterns and develop simple methods of searching the code to generate candidate point lists. Usually this strategy involves nothing more than creating a simple script of regular expression tests in your language of choice. You might get sidetracked in the Perl versus Python versus Ruby versus flavor-of-the-month debate, although it's worth pointing out that the cool kids are using Haskell.
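The "simple script of regular expression tests" can be sketched in C as well, using the POSIX regcomp() and regexec() routines. Everything application-specific here is invented for illustration: db_query() and str_append() stand in for whatever wrappers you have found to be risky in the codebase under review, and match_candidate() is a hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/* A label for the reviewer plus a POSIX extended regular expression. */
struct cp_pattern { const char *label; const char *regex; };

/* Hypothetical patterns for an imaginary codebase. */
static const struct cp_pattern cp_patterns[] = {
    /* db_query() whose first argument is not a string literal */
    { "db wrapper with non-literal query",
      "db_query[[:space:]]*\\([[:space:]]*[^\"[:space:]]" },
    /* any call to an unbounded string helper */
    { "unbounded string helper",
      "(^|[^[:alnum:]_])str_append[[:space:]]*\\(" },
};

/* Returns the label of the first pattern matching the line, or NULL if
 * none match. Patterns are recompiled on every call to keep the sketch
 * short; a real script would compile them once. */
const char *match_candidate(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof(cp_patterns) / sizeof(cp_patterns[0]); i++) {
        regex_t re;
        int hit;

        if (regcomp(&re, cp_patterns[i].regex,
                    REG_EXTENDED | REG_NOSUB) != 0)
            continue;                 /* skip a pattern that won't compile */
        hit = (regexec(&re, line, 0, NULL, 0) == 0);
        regfree(&re);
        if (hit)
            return cp_patterns[i].label;
    }
    return NULL;
}
```

Feeding each source line through match_candidate() and printing the file name, line number, and label gives a candidate point list you can pare down and trace with the CP1 approach. The value of the technique lies in the pattern table, which grows as you learn the application's idioms.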
Design Generalization Strategies

Design generalization (DG) strategies focus on identifying logic and design vulnerabilities by reviewing the implementation and inferring higher-level design abstractions. After you have this understanding, you can use design generalization strategies to identify areas of overlapping trust where trust boundaries are required. This approach is a variation on generalization in software design, in which higher-level interfaces and components are developed by generalizing lower-level implementations. Generalization strategies are used primarily as a follow-up component to other strategies because they require a good understanding of the application's implementation and function.

Model the System

Chapter 2 discussed threat modeling as a way to develop an abstraction for a system by the process of factoring (top-down). However, there's no reason you can't run the threat model in reverse and model the system by generalizing from the implementation (bottom-up), and then factoring back down into components you haven't seen yet. This DG1 strategy (see Table 4-15) can be extremely thorough and is highly effective when you want to establish the most detailed knowledge of the system. Unfortunately, it's also slow, as it amounts to reverse-engineering the complete design from the implementation. However, it's the best method for identifying design and architectural vulnerabilities from an existing implementation.
Typically, you need to perform detailed modeling for only security-critical components, such as the application's security subsystem, input handling chain, or other major framework components used throughout the application. However, an application refactoring cycle does give you an opportunity to build a complete model that has been validated against the implementation. This cycle introduces overhead into the refactoring process, but it's far less obtrusive than modeling after the application is finished, and it can pay dividends in securing the application design during and after refactoring.

Hypothesis Testing

The DG2 strategy (see Table 4-16) is simply the process of attempting to determine the design of smaller programmatic elements by making a hypothesis and testing it through observations of the implementation. This strategy is especially necessary for any medium to large applications because they are too large to wrap your brain around at one time. Instead, you make a guess on what abstraction the implementation reflects, and then try to analyze the implementation in the context of that assumption. If you're right, you've successfully reverse-engineered an element of the design from the implementation. If you're wrong, your efforts should give you enough context to make a more educated guess of the correct purpose.
Deriving Purpose and Function

The DG3 strategy outlined in Table 4-17 refers to the process of directly identifying the abstraction an implementation represents. One of the best ways to perform this strategy is by picking key programmatic elements and summarizing them. For example, try to identify code elements that appear to enforce a trust boundary. Then attempt to derive the associated trust levels, privileges, and basic structure from the implementation. This method can require copious note taking and some diagramming, and you might have a few missteps; however, at the end, you should have a good understanding of the programmatic idioms responsible for the component of the trust model you're assessing. From this understanding, you should be able to identify design and architectural issues in this part of the model.
Design Conformity Check

As you review an application's implementation, you'll see a number of commonly traveled code paths, and you should focus your design generalization efforts on these areas. You need to look closely at the "gray areas" in these components: parts of the design where a correct action is undefined in a certain case, thus resulting in implementation-specific behavior. If you don't have access to a formal specification, you don't know whether a piece of code is implementing defined behavior; however, this might not matter. Essentially, your goal is to examine all the oddball cases when some operation is performed on potentially untrusted data. After you discover what the application is attempting to perform in a function or module, it becomes apparent when something incorrect is allowed to pass through. This DG4 strategy is summarized in Table 4-18.
This strategy is concerned with identifying vulnerabilities that result from discrepancies between a design specification and an implementation. The design specification is a guideline for what the application is supposed to do, but these specifications are rarely followed to the letter. Design specifications often fail to define behavior for every single case, resulting in "gray areas" that later developers must interpret. After you're familiar with the application's internals, you should identify variances between the specification and implementation. You need to identify the implications of that variance and how they could affect the application's security. Sometimes a specification policy breach has no security impact; however, many security vulnerabilities are the result of specification variances with unintended consequences.

Note: The term "policy breach," not "security breach," has been used in this discussion. In a policy breach, the application allows some condition to happen that shouldn't be allowed according to the specification. Policy breaches often equate to security breaches, but not always.

Determining the consequences is a matter of considering how the newly discovered behavior might affect the rest of the system. This determination involves reading the code at each point affected by the policy breach and considering special cases the underlying platform might present. For example, imagine auditing a Web server that allows you to set arbitrary environment variables when receiving certain malformed headers. (Usually, each header is prefixed with HTTP_ and then set as an environment variable.) This behavior is most certainly a policy breach. To evaluate the consequences, you need to read other parts of the system to determine how attackers might be able to abuse this inconsistency with the specification.
In this case, you would probably discover that you could set arbitrary values for security-relevant Common Gateway Interface (CGI) variables in a server-side application. You might be able to set the AUTH_USER variable to fool an application into thinking you had already authenticated or set REMOTE_HOST and REMOTE_ADDR to make it seem as though you're connecting locally and (as such) allowed to access sensitive data. On UNIX systems, your knowledge of the operating system might suggest that setting the special linker environment variables (such as LD_PRELOAD) could be useful and result in running arbitrary code.
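The policy at stake in this example can be written down in a few lines, which makes the breach easier to recognize during review. The following C sketch shows one way a server might map a request header name to an environment variable name so that the HTTP_ prefix is always applied. It is not taken from any particular Web server; the function name header_to_env_name() and the decision to reject every character other than letters, digits, and hyphens are assumptions made for the illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build "HTTP_<NAME>" from a request header name.
 * Letters are uppercased and '-' becomes '_'; any other character
 * causes the header to be rejected. Returns 0 on success, or -1 if the
 * name is empty, contains an unexpected character, or does not fit in
 * out. On failure the contents of out are unspecified. */
int header_to_env_name(const char *header, char *out, size_t out_size)
{
    static const char prefix[] = "HTTP_";
    size_t plen = sizeof(prefix) - 1;
    size_t hlen = strlen(header);
    size_t i;

    if (hlen == 0 || out_size < plen + 1 || hlen > out_size - plen - 1)
        return -1;

    memcpy(out, prefix, plen);
    for (i = 0; i < hlen; i++) {
        unsigned char c = (unsigned char)header[i];

        if (c == '-')
            out[plen + i] = '_';
        else if (isalnum(c))
            out[plen + i] = (char)toupper(c);
        else
            return -1;                /* malformed name: refuse it */
    }
    out[plen + hlen] = '\0';
    return 0;
}
```

With a mapping like this, a User-Agent header becomes HTTP_USER_AGENT, and a header named LD_PRELOAD or AUTH_USER is refused rather than passed through, so a request cannot produce those variables without the prefix. When auditing a real server, the question is whether every code path that creates environment variables from request data goes through the equivalent of this routine, including the paths that handle malformed headers.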