8.3 Test Coverage

A term that appears frequently in the testing literature is test coverage. It answers the question, "How much of X is covered by my tests?" (where X depends on the coverage type). At least two types of coverage can be differentiated: specification-based coverage and code-based coverage.

Specification-Based Coverage

This coverage type describes how completely our test cases take the software specification into account. The basis can be, for example, requirement tables, use-case models, or state transition diagrams. This type of coverage is normally determined by manual inspection.

That our test suite covers the functional requirements is undoubtedly one of the most important aspects of adequate testing. A pragmatic way to reach this goal is to have the customer (or the analysts as customer representatives) write testable requirements, that is, requirements that can be translated unambiguously into unit tests. The customer can then review these test cases for accuracy and sufficiency. A more XP-like approach would be to ensure that every acceptance test has a counterpart among the unit tests for the system boundary classes, so that whenever an acceptance test fails, a unit test fails as well.

Code-Based Coverage

Code-based coverage can refer to the control flow or the data flow of a program or even to both. A large number of coverage metrics have been proposed, including the following commonly used (control flow) metrics:

Line Coverage. This code coverage measure tells you the percentage of your program lines that were "touched" in the test run. This is the weakest measure, because even 100% coverage permits many errors.
Branch Coverage. This code coverage measure tells you how many of the branches in the control flow were visited during the test. This measure is somewhat stronger, but again, even 100% coverage does not guarantee that there are no errors.
Path Coverage. This coverage measure shows you the percentage of all possible paths (branching combinations) that are exercised by the tests. However, even 100% path coverage, which is not realistic for commercial systems, can still leave errors hidden.
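The difference between line and branch coverage can be illustrated with a minimal sketch. The class Text and its method trimmedLength are hypothetical and chosen only for this illustration; the assumed intent is that a null argument yields a length of 0.

```java
// Hypothetical example class; the names are illustrative, not from the text.
class Text {
    // Assumed intent: length of the trimmed string, 0 for a null argument.
    static int trimmedLength(String s) {
        String value = null;
        if (s != null) {
            value = s.trim();
        }
        return value.length(); // defect: NullPointerException when s is null
    }
}

class LineVersusBranchDemo {
    public static void main(String[] args) {
        // This single call executes every line of trimmedLength,
        // so a line coverage tool would report 100%.
        System.out.println(Text.trimmedLength("  ab "));

        // The case in which the 'if' condition is false was never exercised,
        // so branch coverage is incomplete. Exercising it exposes the defect.
        try {
            Text.trimmedLength(null);
        } catch (NullPointerException e) {
            System.out.println("the uncovered branch hides a NullPointerException");
        }
    }
}
```

A suite containing only the first call would look complete under line coverage; only the demand to visit both outcomes of the condition leads to the failing case.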

The commercial coverage tools currently available for Java ^[6] support only line coverage. Academic tools try to determine stronger metrics. However, determining all possible branches in the control flow of polymorphic messages is difficult, or even impossible when dynamic class loading is permitted, which gets in the way of determining these stronger metrics. Nonetheless, such a coverage model can be useful even if we cannot know when 100% coverage has been reached, as long as we can determine that progress is being made with regard to the model.

If it were our only goal to increase the value of a specific coverage measure (e.g., to achieve 100% line coverage), then we would fall victim to the phenomenon normally observed when the quality of human work is evaluated on the basis of derived numbers: "People tend to optimize the metric rather than the goal. Tools should complement not replace programmer judgement." ^[7]

Therefore, it does make sense to use an appropriate tool now and then to determine the percentage coverage (or its change) and, above all, to identify pieces of code not executed by the tests. In contrast, achieving a specific value at all costs, only to sit back contentedly afterward, would be a doubtful goal. The results of measuring coverage can show us flaws in our tests. In this respect, we have to distinguish between the following categories of uncovered code:

Code that is not tested but should be. This discovery is the greatest benefit for us.
Dead code that should be removed. This is also very useful.
Code generated automatically, which is not invoked in our application.
Code that can be reached only with very high testing effort. This often concerns error-handling code, because Java's concept of checked exceptions provokes the (at least interim) insertion of empty try-catch blocks. Such uncovered lines occur less frequently in programs developed by the test-first approach. ^[8]
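The last category can be made concrete with a small sketch. The classes FirstLine and FailingReader are hypothetical; the example assumes that the code under test accepts a java.io.Reader, so that a test can hand in a stub that fails on purpose. With a real file, the catch block would be very hard to reach.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical class under test; not taken from the original text.
class FirstLine {
    static final String UNREADABLE = "<unreadable>";

    // Returns the first line of the source, "" if the source is empty.
    static String of(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            // Error handling that an ordinary test with real input never reaches.
            return UNREADABLE;
        }
    }
}

// Hand-written stub (an assumption of this sketch) that fails on every read.
class FailingReader extends Reader {
    @Override
    public int read(char[] buffer, int offset, int length) throws IOException {
        throw new IOException("simulated read failure");
    }

    @Override
    public void close() {
        // nothing to release
    }
}

class ErrorPathDemo {
    public static void main(String[] args) {
        // Normal path: covers the try block only.
        System.out.println(FirstLine.of(new StringReader("alpha\nbeta")));
        // Error path: the stub forces execution of the catch block.
        System.out.println(FirstLine.of(new FailingReader()));
    }
}
```

Whether the extra stub is worth writing is exactly the effort question raised above; the sketch only shows that such code is reachable when the dependency can be replaced in the test.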

In a paper entitled How To Misuse Code Coverage [URL:TestingCoverage], Brian Marick gives more reasons why code coverage can never be the goal, but merely a useful addition to a tester's common sense. In summary, Marick writes in the Yahoo! discussion previously cited:

Coverage can't tell you that you're missing code, because coverage tools work on the code you have. How much assurance should you expect from a tool that is oblivious to so many bugs?
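A minimal sketch of what such missing code can look like follows. The class Account and the requirement that a withdrawal must not exceed the balance are assumptions made up for this illustration.

```java
// Hypothetical example; the overdraft requirement is assumed for illustration.
class Account {
    // Assumed requirement: a withdrawal larger than the balance must be rejected.
    // The check was never written, so there is no line or branch to report as uncovered.
    static int withdraw(int balance, int amount) {
        return balance - amount;
    }
}

class MissingCodeDemo {
    public static void main(String[] args) {
        // This one call yields 100% line, branch, and path coverage of withdraw.
        System.out.println(Account.withdraw(100, 30));
        // The violated requirement stays invisible to any code-based coverage tool:
        // the overdraft is silently accepted.
        System.out.println(Account.withdraw(100, 500));
    }
}
```

Only a test derived from the specification, not from the existing code, would reveal the omission.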

Mutation testing is an interesting addition to conventional coverage analysis and was originally proposed by DeMillo [78]. This type of testing makes targeted changes to the application code and then verifies whether or not the original test suite detects each change as an error. In contrast to conventional coverage metrics, this method can identify code that is executed within the suite but whose effects are not verified by the tests. One tool of this type is JesTer [URL:JesTer], which uses JUnit to run mutation tests.
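The idea can be sketched by hand, without any tool. In the following example, all names are hypothetical, the mutant is written manually rather than generated, and the two "suites" are plain methods standing in for JUnit tests; it is not a description of how JesTer works internally.

```java
import java.util.function.IntPredicate;

// Hand-made illustration of mutation testing; all names are hypothetical.
class MutationDemo {
    // Original code under test: adults are 18 or older.
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: the operator ">=" has been changed to ">".
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    // Weak suite: executes the comparison but never probes the boundary value.
    static boolean weakSuitePasses(IntPredicate implementation) {
        return implementation.test(30) && !implementation.test(5);
    }

    // Stronger suite: additionally checks the boundary value 18.
    static boolean strongSuitePasses(IntPredicate implementation) {
        return weakSuitePasses(implementation) && implementation.test(18);
    }

    public static void main(String[] args) {
        // Both suites pass against the original code.
        System.out.println("weak, original:   " + weakSuitePasses(MutationDemo::isAdult));
        System.out.println("strong, original: " + strongSuitePasses(MutationDemo::isAdult));
        // The weak suite also passes against the mutant: the mutant survives,
        // although the mutated line is fully covered.
        System.out.println("weak, mutant:     " + weakSuitePasses(MutationDemo::isAdultMutant));
        // The stronger suite fails against the mutant: the mutant is detected.
        System.out.println("strong, mutant:   " + strongSuitePasses(MutationDemo::isAdultMutant));
    }
}
```

A surviving mutant points to behavior that the suite executes but does not check, which is the information conventional coverage metrics cannot provide.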

^[6]For example, JProbe with its coverage module [URL:JProbe].

^[7]Kent Beck in a Yahoo! discussion of the "code coverage" issue.

^[8]Ron Jeffries even gives an inductive proof that test-first development can always achieve 100% line coverage ([URL:YahooXP], Message 26626).