6.5. Types of Test Tools

There are many different tools that can be used to test and debug very specific aspects of products. Some of these tools are more commonly used by developers than testers, but they can also be part of automated testing. For instance, knowing when a program started to leak memory during tests makes it easier to identify the particular changes that caused the memory leak. This section looks at a number of different kinds of tools that can be used as part of a test environment.

6.5.1. Memory Analyzers

One of the classic problems with programming languages such as C and C++ is that developers are responsible for keeping track of the memory that their programs use. If the memory leaks (that is, if it isn't freed properly for later reuse), then the program will eventually run out of memory and probably crash. In addition to permitting leaks, many languages don't check for programming errors such as reading or writing past the end of an array, which corrupts memory. The alternative to all of this is to use a language such as Java that supports garbage collection, the process whereby unused memory is automatically managed while the program runs (at the cost of somewhat less control over when memory is reclaimed). Memory analyzers are tools that keep track of how and when memory is allocated and freed by a program. They may also check for errors such as ignoring an array's length; using a variable that is no longer valid in a later part of a program; reading from memory that hasn't been initialized yet; and a whole host of other common coding mistakes. One of the oldest commercial memory analyzers is Purify (http://www.ibm.com/software/awdtools/purify), originally from Pure Software, then Rational, and now IBM. For many developers, Purify defined the expectations for what a memory analyzer should be able to do and how easy it should be to use. Purify runs on Windows, HP-UX, IRIX, GNU/Linux, and Solaris.
It works by instrumenting the executable and the libraries that make up a program, adding code to every function to track when each part of it is called. No recompilation of source code is necessary, though more information can be collected if the source code is recompiled. There is a related product named PurifyPlus that provides code coverage (see the next section, Section 6.5.2) as well as memory checking, but PurifyPlus does require recompilation of the program. Other good commercial memory analyzers include TotalView (http://www.etnus.com), which is also a graphical debugger, and Insure++ (http://www.parasoft.com). There are a number of open source memory analyzers (see http://en.wikipedia.org/wiki/Memory_debugger for a long list of them), but the best-known ones are Electric Fence (http://perens.com/FreeSoftware/ElectricFence), Valgrind (http://valgrind.kde.org), and dmalloc (http://dmalloc.com). If you're willing to recompile gcc, there are also patches available at http://gcc.gnu.org/extensions.html to add the -fbounds-checking command-line argument. Electric Fence uses lots of memory but is particularly good at detecting out-of-bounds memory reads or writes. dmalloc is a replacement library for the standard memory allocation library, so it is likely to detect any bug that involves incorrect allocation or freeing of memory. dmalloc is also relatively portable and fast. Valgrind is actually an entire simulated x86 processor for GNU/Linux programs, and memory analysis is just one part of what it can do. Valgrind is probably the closest open source equivalent to Purify. A basic comparison of Electric Fence, dmalloc, and Valgrind can be found in "A Survey of Static and Dynamic Analyzer Tools," by Elise Hewett and Paul DiPalma (http://www.cs.wm.edu/~coppit/csci780-fall2003/final-papers/hewett-dipalma.pdf). An alternative approach is to add some form of memory analysis to your product yourself. 
This can be particularly useful when you allocate one large block of memory at startup and then have your own memory allocation functions. It's not hard to make sure that every allocation is tracked and then to provide a way to display how much memory is being used by each part of the product. A useful idea is to provide not just the amount of memory used, but also the change since the last time the value was displayed. This lets you see more easily which parts of your product are leaking the most memory.

Of course, all this monitoring and analysis comes at a price. The executables are somewhat larger, but the main effect is that applications run more slowly when using memory analyzer tools. This is not usually a problem when the tool is being used to catch mistakes that are triggered by simply starting up a product, but using such tools to help debug errors that occur only after running a product for a long time can be very tedious indeed. The different speeds can also make timing-dependent bugs go away. Bugs in this difficult class, where monitoring the bug changes its behavior, are sometimes known as Heisenbugs.[2]
Even languages such as Java that do support garbage collection can benefit from the use of analysis tools such as JProbe (http://www.quest.com/jprobe), which can show where memory is not being garbage collected as expected and can also suggest why. Fine-tuning exactly when garbage collection occurs in order to improve interactive performance is another use for this kind of memory analyzer.

6.5.2. Coverage Tools

Coverage tools report how much of a product's source code is used when the product is tested. Of course, just because a line of source code has been executed doesn't mean that it has been fully tested and found to behave correctly in all cases. But if a line of code has never even been executed, you don't know anything at all about it.
Another kind of coverage testing is branch coverage. This measures the number of different places where a particular condition was never fully exercised. For example, were both the true and false branches in a particular if-then-else statement executed? Still another kind of coverage is condition coverage, where every Boolean in a conditional statement's condition is monitored to make sure that it has been tested as both true and false. For example, given the following:

    if (var1 && var2 && var3) { ... }

the Booleans var1, var2, and var3 are expected to have been both true and false during the tests. Tests that set every group of Booleans used in such conditional statements to all possible combinations of values take an exponentially long time to run, depending on the number of Booleans, but using a coverage tool to tell you which conditions were never tested at all can still be useful. Good coverage tools summarize their results so that you can see which files or classes have received lower-than-average testing with your current tests. Summary reports that allow you to drill down to the details of each function or method can also help you avoid rerunning the coverage tests. As with any testing, the art of coverage testing is to focus on areas of concern and not to expect 100% coverage. Coverage is a worthwhile endeavor, but the lure of concrete numbers can encourage an unwarranted overreliance on coverage for judging how well testing is going.

6.5.3. Performance Tools

For developers, the idea of performance tools usually suggests profilers. Just like coverage tools, profilers record how often each line, function or method, and class was called, but they also record how much time was spent in each place in the source code. This information can help a developer understand where the product spends most of its time (the bottlenecks) and may suggest some areas to focus on for improving the product's performance.
It's a good idea to profile only the released version of the product, since nonoptimized code, or code with debugging enabled, will often change the results of profiling. Many compilers, including gcc, already support profiling, with separate tools such as gprof to process the results. Compiling the product with the correct arguments will cause profiling data to be saved when the product is run. This data can then be processed separately later on. Compiler-driven profiling tends to be text only, and it can be hard to follow the results for large programs. One of the best-known commercial profilers is Quantify, originally from Rational, now IBM (http://www.ibm.com/software/awdtools/quantify). It has good graphical summaries of the results, which can be expanded and followed through different parts of the source code. Valgrind (see Section 6.5.1, earlier in this chapter) also has profiling abilities, and there is a graphical frontend to these named KCachegrind (http://kcachegrind.sourceforge.net).

For testers, performance is usually about loading the product with unusual amounts of data, large numbers of users, huge numbers of files, or any other parameter that can be modified. The idea is to discover what the limits of the product are, not necessarily to understand the causes of these limits. Once the limits are known, that information can be used to set customers' expectations and to guide them as they configure the product. Stress testing differs from load testing in that stress testing examines how the product behaves when the resources that it needs (memory, CPU cycles, disk space) are in short supply.

6.5.4. Static Code Analyzers

Another entire class of testing tools is static code analyzers. These tools take the source code of the product as input and analyze it. Some of the more common kinds of information provided by these tools are:
A more unusual example of static analysis of a product's source code is measuring the stability of its API. A stable API is one that doesn't change greatly between two versions of a product. I wrote such a tool for Java applications (JDiff; see http://jdiff.org). The statistic that it uses doesn't seem to be published anywhere else, so I have included it here. JDiff counts the number of program elements (Java packages, classes, constructors, methods, and fields) that have been added, removed, or changed between versions. The percentage change of the API between versions is then defined as:

    percentage difference = 100 * (removals + additions + changes) / (number of elements in the old API + additions)

For example, if there were 15 packages in the old API, and then 2 packages were removed and 4 packages were added, there are now 17 packages in the new API. If another 3 existing packages have changed between versions, then the simple percentage difference would be:

    100 * (2 + 4 + 3) / (15 + 4) = approximately 47%

A change of 100% means that there is nothing in common between the two APIs; a change of 0% indicates that nothing changed between the two APIs. In practice, this formula is applied recursively for every package's classes and class members. That is, the value for the number of packages changed (3 in the example) is not an integer, but instead is the value obtained by applying the same formula to all the classes in the changed packages, and then to all the constructors, methods, and fields of the changed classes. This results in a more accurate percentage difference. Real-world figures are a 28% difference between Java J2SE 1.2 and J2SE 1.3, and a 46% difference between Versions 1.3.1 and 1.4. As might be expected, patch releases of J2SE have much lower percentage differences. Finally, one location for a somewhat dated, but still useful, list of static analysis tools is http://testingfaqs.org/t-static.html.