Approaches to Static Analysis


Probably the simplest and most straightforward approach to static analysis is the UNIX utility grep, the same functionality you find implemented in the earliest tools such as ITS4. Armed with a list of good search strings, grep can reveal a lot about a code base. The downside is that grep is rather lo-fi because it doesn't understand anything about the files it scans. Comments, string literals, declarations, and function calls are all just part of a stream of characters to be matched against.

You might be amused to note that using grep to search code for words like "bug," "XXX," "fix," "here," and best of all "assume" often reveals interesting and relevant tidbits. Any good security source code review should start with that.

Better fidelity requires taking into account the lexical rules that govern the programming language being analyzed. By doing this, a tool can distinguish between a vulnerable function call:

gets(&buf);


a comment:

/* never ever call gets */


and an innocent and unrelated identifier:

int begetsNextChild = 0;


As mentioned earlier, basic lexical analysis is the approach taken by early static analysis tools, including ITS4, Flawfinder, and RATS, all of which preprocess and tokenize source files (the same first steps a compiler would take) and then match the resulting token stream against a library of vulnerable constructs. Earlier, Matt Bishop and Mike Dilger built a special-purpose lexical analysis tool specifically to identify time-of-check/time-of-use (TOCTOU) flaws [Bishop and Dilger 1996].

While lexical analysis tools are certainly a step up from grep, they produce a hefty number of false positives because they make no effort to account for the target code's semantics. A stream of tokens is better than a stream of characters, but it's still a long way from understanding how a program will behave when it executes. Although some security defect signatures are so strong that they don't require semantic interpretation to be identified accurately, most are not so straightforward.
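
To make the contrast between a character stream and a token stream concrete, here is a minimal sketch of token-level matching. It is not the implementation of ITS4, Flawfinder, or RATS, and the function name flags_gets_call and the single-line inputs are made up for this example. A plain substring search would flag all three snippets shown earlier; this sketch skips block comments and matches whole identifiers, so it flags only the actual call to gets().

/*
 * Minimal sketch of token-level matching on a single line of source.
 * Illustrative only; string literals and // comments are not handled.
 */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the line contains a call to gets() outside a block
   comment; whole-identifier matching keeps begetsNextChild clean. */
static int flags_gets_call(const char *src)
{
    size_t i = 0, n = strlen(src);
    while (i < n) {
        if (src[i] == '/' && i + 1 < n && src[i + 1] == '*') {
            i += 2;                      /* skip the block comment */
            while (i + 1 < n && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            i = (i + 1 < n) ? i + 2 : n;
        } else if (isalpha((unsigned char)src[i]) || src[i] == '_') {
            size_t start = i;            /* collect one identifier */
            while (i < n && (isalnum((unsigned char)src[i]) || src[i] == '_'))
                i++;
            if (i - start == 4 && strncmp(src + start, "gets", 4) == 0) {
                size_t j = i;            /* look ahead for '(' */
                while (j < n && isspace((unsigned char)src[j]))
                    j++;
                if (j < n && src[j] == '(')
                    return 1;
            }
        } else {
            i++;
        }
    }
    return 0;
}

int main(void)
{
    printf("%d\n", flags_gets_call("gets(&buf);"));                /* 1 */
    printf("%d\n", flags_gets_call("/* never ever call gets */")); /* 0 */
    printf("%d\n", flags_gets_call("int begetsNextChild = 0;"));   /* 0 */
    return 0;
}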

To increase precision, a static analysis tool must leverage more compiler technology. By building an abstract syntax tree (AST) from source code, such a tool could take into account the basic semantics of the program being evaluated.
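
As a hedged sketch of what an AST gives the analysis, and not the representation used by any particular tool, the fragment below models a call expression explicitly; every type and field name here is hypothetical. Comments never reach the tree at all, and an identifier such as begetsNextChild becomes an identifier node rather than a call node, so a rule keyed to call expressions avoids the false positives that plague purely lexical matching.

#include <string.h>

/* Hypothetical AST fragment; all names are illustrative. */
typedef struct Expr Expr;

typedef enum { EXPR_CALL, EXPR_IDENT, EXPR_LITERAL } ExprKind;

struct Expr {
    ExprKind kind;
    const char *name;   /* callee or identifier name, if any */
    Expr **args;        /* call arguments                     */
    int arg_count;
    int line;           /* source position for reporting      */
};

/* Fire only on genuine call sites whose callee is gets. */
static int violates_gets_rule(const Expr *e)
{
    return e->kind == EXPR_CALL && strcmp(e->name, "gets") == 0;
}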

Armed with an AST, the next decision to make involves the scope of the analysis. Local analysis examines the program one function at a time and doesn't consider relationships between functions. Module-level analysis considers one class or compilation unit at a time, so it takes into account relationships between functions in the same module and considers properties that apply to classes, but it doesn't analyze calls between modules. Global analysis involves analyzing the entire program, so it takes into account all relationships between functions.

The scope of the analysis also determines the amount of context the tool considers. More context is better when it comes to reducing false positives, but it can require a huge amount of computation.
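
A short example, constructed for illustration rather than taken from the text, shows why the choice of scope matters. Looking at copy_name alone (local analysis), a tool cannot tell whether the strcpy is dangerous; only by following the call from main (global analysis) does it see that attacker-controlled input flows into a fixed-size buffer.

#include <string.h>

/* Seen in isolation, this function may or may not be safe: it depends
   entirely on what callers pass in. */
static void copy_name(char *dst, const char *src)
{
    strcpy(dst, src);               /* unbounded copy */
}

int main(int argc, char *argv[])
{
    char name[16];
    if (argc > 1)
        copy_name(name, argv[1]);   /* global analysis connects the
                                       attacker-controlled argv[1] to
                                       the unbounded strcpy above */
    return 0;
}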

A History of Rule Coverage

Coding rules in explicit form have evolved rapidly in their coverage of potential vulnerabilities. Before Bishop and Dilger's work [1996] on race conditions in file access, explicit coding rulesets (if they existed at all) were only checklist documents of ad hoc information authored, managed, and typically not widely shared by experienced software security practitioners. Bishop and Dilger's tool was one of the first recognized attempts to capture a ruleset and automate its application through lexical scanning of code.[6] For the next four years, plenty of research was done in the area, but no other tools or accompanying rulesets emerged to push things forward.

[6] Bishop and Dilger's tool was built around a limited set of rules covering potential race conditions in file accesses using C on UNIX systems [Bishop and Dilger 1996].

This changed in early 2000 with the release of ITS4, a tool whose ruleset also targeted C/C++ code but went beyond the single-dimensional approaches of the past to cover a broad range of potential vulnerabilities in 144 different APIs or functions. This was followed the next year by the release of two more tools, Flawfinder and RATS. Flawfinder, written by David Wheeler, is an "interestingly" implemented C/C++ scanning tool with a somewhat larger set of rules than ITS4. RATS, authored by John Viega, not only offers a broader ruleset covering 310 C/C++ APIs or functions but also includes rulesets for the Perl, PHP, Python, and OpenSSL domains. In parallel with this public development, Cigital (the company that originally created ITS4) began commercially using SourceScope, a follow-on to ITS4 with a new standard of coverage: 653 C/C++ APIs or functions. Figure 4-1 shows how the rulesets from early tools intersect.

Figure 4-1. A Venn diagram showing the overlap for ITS4, RATS, and SourceScope rules. Together, these rules define a reasonable minimum set of C and C++ rules for static analysis tools. (Thanks to Sean Barnum, who created this diagram.)


Today a handful of first-tier options are available in the static code analysis tools space. These tools include but are not limited to:

  • Coverity: Prevent <http://www.coverity.com/products/products_security.html>

  • Fortify: Source Code Analysis <http://www.fortifysoftware.com/products/sca/>

  • Ounce Labs: Prexis/Engine <http://www.ouncelabs.com/prexis_engine.html>

  • Secure Software: CodeAssure Workbench <http://www.securesoftware.com/products/source.html>

Each of the tools offers a comprehensive and growing ruleset varying in both size and area of focus. As you investigate and evaluate which tool is most appropriate for your needs, the coverage of the accompanying ruleset should be one of your primary factors of comparison.

Together with the Software Engineering Institute, Cigital has created a searchable catalog of rules published on the Department of Homeland Security's Building Security In portal <http://buildsecurityin.us-cert.gov/portal/>. This catalog contains full coverage of the C/C++ rulesets from ITS4, RATS, and SourceScope and is intended to represent the foundational set of security rules for C/C++ development. Though some currently available tools have rulesets much more comprehensive than this catalog, we consider this the minimum standard for any modern tool scanning C/C++ code for security vulnerabilities.

Modern Rules

Since the early days of ITS4, the idea of security rules and security vulnerability categories has progressed. Today, a number of distinct efforts to categorize, describe, and "tool-ify" software security knowledge are under way. My approach is covered in Chapter 12, where I present a simple taxonomy of coding errors that lead to security problems. The first box, Modern Security Rules Schema, describes the schema developed at Cigital for organizing security rule information and gives an example.[7] The second box, A Complete Modern Rule on pages 119 through 122, provides an example of one of the many rules compiled in the extensive Cigital knowledge base.

[7] Also of note is the new book The 19 Deadly Sins of Software Security, which provides treatment of the rules space as well [Howard, LeBlanc, and Viega 2005]. Chapter 12 includes a mapping of my taxonomy against the 19 sins and the OWASP top ten <http://www.owasp.org/documentation/topten.html>.



