6.5. Types of Test Tools

There are many different tools that can be used to test and debug very specific aspects of products. Some of these tools are more commonly used by developers than testers, but these tools can also be part of automated testing. For instance, knowing when a program started to leak memory during tests makes it easier to identify the particular changes that caused the memory leak. This section looks at a number of different kinds of tools that can be used as part of a test environment.

6.5.1. Memory Analyzers

One of the classic problems with programming languages such as C and C++ is that developers are responsible for keeping track of the memory that their programs use. If memory leaks (that is, if it isn't freed properly for later reuse), then the program will eventually run out of memory and probably crash. In addition to permitting leaks, many languages don't check for programming errors such as reading or writing past the end of an array, which corrupts memory. The alternative to all of this is to use a language such as Java that supports garbage collection, the process whereby unused memory is automatically managed while the program runs (at the cost of somewhat less control over when memory is reclaimed).

Memory analyzers are tools to keep track of how and when memory is allocated and freed by a program. They may also check for errors such as reading or writing past an array's bounds; using memory that is no longer valid in a later part of the program; reading from memory that hasn't been initialized yet; and a whole host of other common coding mistakes.

One of the oldest commercial memory analyzers is Purify (http://www.ibm.com/software/awdtools/purify), originally from Pure Software, then Rational, and now IBM. For many developers, Purify defined the expectations for what a memory analyzer should be able to do and how easy it should be to use. Purify runs on Windows, HP-UX, IRIX, GNU/Linux, and Solaris. It works by instrumenting the executable and the libraries that make up a program, adding code to every function to track when each part of it is called. No recompilation of source code is necessary, though more information can be collected if the source code is recompiled. There is a related product named PurifyPlus that provides code coverage (see the next section, Section 6.5.2) as well as memory checking, but PurifyPlus does require recompilation of the program. Other good commercial memory analyzers include TotalView (http://www.etnus.com), which is also a graphical debugger, and Insure++ (http://www.parasoft.com).

There are a number of open source memory analyzers (see http://en.wikipedia.org/wiki/Memory_debugger for a long list of them), but the best-known ones are Electric Fence (http://perens.com/FreeSoftware/ElectricFence), Valgrind (http://valgrind.kde.org), and dmalloc (http://dmalloc.com). If you're willing to recompile gcc, there are also patches available at http://gcc.gnu.org/extensions.html to add the -fbounds-checking command-line argument.

Electric Fence uses lots of memory but is particularly good at detecting out-of-bounds memory reads or writes. dmalloc is a replacement library for the standard memory allocation library, so it is likely to detect any bug that involves incorrect allocation or freeing of memory. dmalloc is also relatively portable and fast. Valgrind is actually an entire simulated x86 processor for GNU/Linux programs, and memory analysis is just one part of what it can do. Valgrind is probably the closest open source equivalent to Purify. A basic comparison of Electric Fence, dmalloc, and Valgrind can be found in "A Survey of Static and Dynamic Analyzer Tools," by Elise Hewett and Paul DiPalma (http://www.cs.wm.edu/~coppit/csci780-fall2003/final-papers/hewett-dipalma.pdf).

An alternative approach is to add some form of memory analysis to your product yourself. This can be particularly useful when you allocate one large block of memory at startup and then have your own memory allocation functions. It's not hard to make sure that every allocation is tracked and then to provide a way to display how much memory is being used by each part of the product. A useful idea is to provide not just the amount of memory used, but also the change since the last time the value was displayed. This lets you see more easily which parts of your product are leaking the most memory.

Of course, all this monitoring and analysis comes at a price. The executables are somewhat larger, but the main effect is that applications run more slowly when using memory analyzer tools. This is not usually a problem when the tool is being used to catch mistakes that are triggered by simply starting up a product, but using them to help debug errors that occur only after running a product for a long time can be very tedious indeed. The different speeds can also make timing-dependent bugs go away. Bugs in this difficult class, where monitoring the bug changes its behavior, are sometimes known as Heisenbugs.[2]

[2] Like a fundamental particle bound by Heisenberg's uncertainty principle, the bug resists all attempts to pin down both its effect and its location at a particular instant. The act of observing a Heisenbug seemingly destroys information about it.

Even languages such as Java that do support garbage collection can benefit from the use of analysis tools such as JProbe (http://www.quest.com/jprobe), which can show where memory is not being garbage collected as expected and can also suggest why. Fine-tuning exactly when garbage collection occurs in order to improve interactive performance is another use for this kind of memory analyzer.

6.5.2. Coverage Tools

Coverage tools report how much of a product's source code is used when the product is tested. Of course, just because a line of source code has been executed doesn't mean that it has been fully tested and found to behave correctly in all cases. But if a line of code has never even been executed, you don't know anything at all about it.

Some coverage tools are better than others at tracking exceptions. Exceptions are a way of making a function return immediately from any line of its source code when certain error situations occur; the lines after the point where an exception is thrown are not executed, so they should not be counted as covered.


Another kind of coverage testing is branch coverage. This measures the number of different places where a particular condition was never fully exercised. For example, were both the true and false branches in a particular if-then-else statement executed? Still another kind of coverage is condition coverage, where every Boolean in a conditional statement's condition is monitored to make sure that it has been tested as both true and false. For example, given the following:

if (var1 && var2 && var3) { ... }

the Booleans var1, var2, and var3 are expected to have been both true and false during the tests. Tests that set every group of Booleans used in such conditional statements to all possible combinations of values take an exponentially long time to run, depending on the number of Booleans, but using a coverage tool to tell you which conditions were never tested at all can still be useful.

Good coverage tools summarize their results so that you can see which files or classes have received lower than average testing with your current tests. Summary reports that allow you to drill down to the details of each function or method can also help you avoid rerunning the coverage tests.

As with any testing, the art of coverage testing is to focus on areas of concern and not to expect 100% coverage. Coverage is a worthwhile endeavor, but the lure of concrete numbers can encourage an unwarranted overreliance on coverage for judging how well testing is going.

6.5.3. Performance Tools

For developers, the idea of performance tools usually suggests profilers. Just like coverage tools, profilers record how often each line, function or method, and class was called, but they also record how much time was spent in each place in the source code. This information can help a developer understand where the product spends most of its time (the bottlenecks) and may suggest some areas to focus on for improving the product's performance. It's a good idea to profile only the released version of the product, since nonoptimized code, or code with debugging enabled, will often change the results of profiling.

Many compilers, including gcc, already support profiling in conjunction with separate tools such as gprof. Compiling the product with the correct arguments will cause profiling data to be saved when the product is run. This data can then be processed separately later on. Compiler-driven profiling tends to be text only, and it can be hard to follow the results for large programs. One of the best-known commercial profilers is Quantify, originally from Rational, now IBM (http://www.ibm.com/software/awdtools/quantify). It has good graphical summaries of the results, which can be expanded and followed through different parts of the source code. Valgrind (see Section 6.5.1, earlier in this chapter) also has profiling abilities, and there is a graphical frontend to these named KCachegrind (http://kcachegrind.sourceforge.net).

For testers, performance is usually about loading the product with unusual amounts of data, large numbers of users, huge numbers of files, or any other parameter that can be modified. The idea is to discover what the limits of the product are, not necessarily to understand the causes of these limits. Once the limits are known, that information can be used to set customers' expectations and to guide them as they configure the product. Stress testing differs from load testing in that stress testing examines how the product behaves when the resources that it needs (memory, CPU cycles, disk space) are in short supply.

6.5.4. Static Code Analyzers

Another entire class of testing tools is static code analyzers. These tools take the source code of the product as input and analyze it. Some of the more common kinds of information provided by these tools are:


Language conformance

This is how closely the source code conforms to a standard for the language it is written in. Compilers for each language often deviate from the standard for the language, which can make code that compiles on one machine fail to compile when using a different compiler or when compiling on a different machine. When compiling C source code with gcc, there is a -std argument to specify which language standard is being used.


Security

Source code can be analyzed for statements that are vulnerable to cracking by potential stack smashing or other buffer overruns. Examples of tools that do this are StackGuard, Stack Shield, ProPolice, and Libsafe, all of which are conveniently compared in the paper "A Comparison of Publicly Available Tools for Dynamic Buffer Overflow Prevention," by John Wilander and Mariam Kamkar (http://www.ida.liu.se/~johwi/research_publications).


Correctness

Usually this involves proving, in the mathematical sense, that a program correctly implements what was intended. Far less formal, but still very useful, are tools such as the open source FindBugs (http://findbugs.sourceforge.net), which analyzes Java source code for different bug patterns, even looking for silly little bugs. FindBugs has a good track record of finding real bugs in many well-known applications that were already in production. Other similar tools are listed at http://findbugs.sourceforge.net/links.html, and there is a comparison of such tools for Java at http://www.cs.umd.edu/~jfoster/papers/issre04.pdf.


Size

How large is your product? One simple way to check in a Unix shell is by typing:

find . -name "*.[ch]" -print0 | xargs --null wc 

to count the number of lines in each of the .c and .h C source files in the current directory and its subdirectories. You could even count only lines that end with a semicolon by replacing wc with grep ';' | wc.

A more robust way of doing all this is to use SLOCCount (http://www.dwheeler.com/sloccount), an easy-to-use open source line-counting application that works with most languages. If you want to track how the size of your product changes over time, StatCVS (http://statcvs.sourceforge.net) can do so for CVS repositories and can also generate lots of other CVS-related information; however, what's defined as a line of code is not as sophisticated as in SLOCCount.


Complexity

Counting the number of lines of source code is only the simplest (and some would say an almost meaningless) way of measuring a software product. There are a number of different ways to measure the complexity of source code, and tools for each different method exist for a variety of languages. An introduction to different software metrics and their history can be found at http://irb.cs.tu-berlin.de/~zuse/metrics/3-hist.html.

As well as measuring the size of your product, SLOCCount calculates the COCOMO II (http://sunset.usc.edu/research/COCOMOII) complexity of a product and uses the results to make an estimate of the cost of recreating the product. The resulting figures always seem high until you count the number of hours everyone has spent on the project.


Documentation

Tools such as Javadoc and doxygen generate documentation for developers from the comments embedded in the source code. These tools are described in more detail in Section 8.8.

A more unusual example of static analysis of a product's source code is measuring the stability of its API. A stable API is one that doesn't change greatly between two versions of a product. I wrote such a tool for Java applications (JDiff; see http://jdiff.org). The statistic that it uses doesn't seem to be published anywhere else, so I have included it here. JDiff counts the number of program elements (Java packages, classes, constructors, methods, and fields) that have been added, removed, or changed between versions. The percentage change of the API between versions is then defined as:

    percentage difference = 100 * (number removed + number added + number changed)
                            / (number of elements in the old API + number added)

For example, if there were 15 packages in the old API, and then 2 packages were removed and 4 packages were added, there are now 17 packages in the new API. If another 3 existing packages have changed between versions, then the simple percentage difference would be:

    100 * (2 + 4 + 3) / (15 + 4), or about 47%

A change of 100% means that there is nothing in common between the two APIs; a change of 0% indicates that nothing changed between the two APIs. In practice, this formula is applied recursively for every package's classes and class members. That is, the value for the number of packages changed (3 in the example) is not an integer, but instead is the value obtained by applying the same formula to all the classes in the changed packages, and then to all the constructors, methods, and fields of the changed classes. This results in a more accurate percentage difference. Real-world figures are a 28% difference between J2SE 1.2 and J2SE 1.3, and a 46% difference between Versions 1.3.1 and 1.4. As might be expected, patch releases of J2SE have much lower percentage differences.

Finally, one location for a somewhat dated, but still useful list of static analysis tools is http://testingfaqs.org/t-static.html.



Practical Development Environments
ISBN: 0596007965
Year: 2004