Given an incremental, iterative development process model, we can now sketch out a process for testing. We will defer many of the details to later chapters because essentially all the information in this book belongs to that process. First, we will outline a series of issues that must be addressed to give a basic shape to the test process. Then we will consider how each development product is tested.

Planning Issues

Testing is traditionally incorporated into a development process at the point where executable code becomes available. Common practice is to perform a unit test on individual modules as they are developed, an integration test on subsystems as they are assembled from units and/or other subsystems, and a system test as the system becomes available. If an iterative, incremental process is used, then, at a minimum, system testing is performed after each increment is completed. Class testing and interaction testing are performed during or after each iteration for an increment. Regression testing is performed on any software whose implementation has changed but whose specification has not. If both have changed, the test suites are revised and then reapplied.

In our approach, testing is conducted even before code is written. Models, like code, are representations of the system and can therefore be tested. In particular, design models lend themselves to testing by a form of execution that we describe in Chapter 4. Using analysis models, we can test a system in the sense of validation testing, thus ensuring that the right system is being specified. This last type of testing does not change much from traditional approaches, and so it is only a peripheral focus of this book.

Dimensions of Software Testing

Testing embraces many activities, and all of these activities together comprise the testing effort. With respect to these activities, we identify five dimensions of testing that describe the answers to the following five questions:[3]

- Who performs testing?
- Which pieces are tested?
- When is testing performed?
- How is testing performed?
- How much testing is adequate?
These are dimensions in the sense that each represents an important consideration over a continuum of possible levels of effort or approaches, yet each is independent of all the others. Each dimension must be considered when designing a testing effort, and a decision must be made about where on its continuum the project wishes to place itself. A decision made for one dimension has no impact on decisions made for any of the other dimensions. Together, all of the decisions determine the resources needed, the methods used, and the quality of the results of the total testing effort. We will now take a look at each of these dimensions in more detail. These dimensions will also be considered in various discussions throughout the book.

We represent each dimension with a continuum. A continuum is a sequence of possible levels for which it is difficult to delineate where one level ends and the next begins. In the physical world, the visible spectrum of light is a continuum, ranging from red to violet. Orange is in the spectrum, but there is no widespread agreement on exactly where orange begins and ends. That does not, however, prevent us from using orange or discussing its merits for athletic-team colors. Just as no color is better than another, there is no "best" choice on each dimension. However, certain colors are more appropriate in certain situations, and certain choices on a testing dimension are better than others in a given situation. In this chapter, our focus is on describing the five dimensions. We will address the implications of each dimension for the total testing effort in the next chapters when we discuss individual techniques. Along the way, we hope to give you some view of how various combinations of positions relate to levels of quality in the software product.

Who Performs Testing?

A project includes both developer and tester roles.
Developer is a role characterized by performing activities that generate a product: for example, analysis, design, programming, debugging, or documenting. Tester is a role characterized by performing activities to detect failures in a product. This includes selecting tests for a specific purpose, constructing the tests, and executing and evaluating the results. A given project member can assume both the developer and tester roles. Giving programmers responsibility for unit testing their own code is a common practice, although we strongly recommend a buddy testing scheme. System testing is commonly assigned to independent testers, that is, people assuming the role of tester but not that of developer.

Figure 3.6 illustrates a continuum ranging from the situation in which the developers are responsible for all testing to the situation in which an independent tester is responsible for all testing. In practice, neither end of the continuum is encountered as often as the middle. In particular, only in small projects is it typical for developers to have responsibility for the final system testing of the implementation against the system requirements. Projects that involve life-critical functionality are typically the ones in which each component is unit tested by an independent tester; some government regulations make this the expected choice. In between these two extremes are two popular choices. In one, developers are totally responsible for class testing, but pairs of developers exchange code and test each other's code, hence the previously mentioned buddy testing. In the other, an independent tester is given responsibility for specifying test cases while the developer is responsible for the construction and execution of the tests.

Figure 3.6. Continuum for assignments of roles in class testing

In this book, we discuss testing processes and techniques and usually do not identify just who is performing them.
That decision must be based on the effective use of resources at various points along the whole effort. The decision is also influenced by government and industry regulations. Actual test plans for a project should call out who is responsible for the various testing activities to be performed. There are many ways to assign roles to project team members, and we have not yet discovered a "best" way.

Which Pieces Are Tested?

Which parts of a system should be tested? Options vary from testing nothing to testing every single component (or line of code) that goes into the final software product. The continuum is represented in Figure 3.7.

Figure 3.7. Continuum for which parts of the software to test

A software system comprises many components. In object-oriented programming, the most basic component is a class. At one end of this continuum is the position "we will test every class that is included in this system." At the other end is the position "we will not test any piece." In the latter case, faults are found as a result of random operation of the system or through providing "evaluation copies" on the Web and letting users report errors. The middle ground is to have a systematic approach, perhaps using statistical methods, for selecting a subset of the total set of components to be tested. Classes being reused from other projects or taken from class libraries may not need to be tested. Some classes will not be easy to test individually because testing them requires complex drivers to provide input or examine output. The drivers themselves will require considerable effort to write and might need considerable testing and debugging. Choosing where to be on this continuum is based partly on balancing the yield (defects found per hour of effort) of testing against the effort needed to build the test infrastructure.

If testing all classes is not feasible, what strategy can you use to select the test cases to develop? One strategy is to generate test cases at random.
Of course, this is not a very good strategy, since it might not test commonly used functions of the software. Another strategy might focus on probable uses of the system, thereby putting primary emphasis on tests that use the more common inputs to the software. Still another strategy might emphasize pathological cases: obscure uses of the system, tested under the (probably incorrect) assumption that if the developers paid attention to the more obscure or obtuse requirements, then they must have understood all the requirements.[4]
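The second strategy, selecting tests according to probable use, can be sketched as a weighted random draw over an operational profile. This is only an illustrative sketch: the operation names and their usage frequencies below are assumptions, not taken from the text.

```python
import random

# Hypothetical operational profile: each operation of the system paired
# with an estimated relative frequency of use (assumed values).
operational_profile = {
    "open_account": 0.05,
    "deposit": 0.40,
    "withdraw": 0.35,
    "transfer": 0.15,
    "close_account": 0.05,
}

def select_tests(profile, n, seed=0):
    """Draw n operations to test, weighted by how often each is used."""
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)

selected = select_tests(operational_profile, 1000)
# Frequently used operations such as "deposit" dominate the selection,
# so the common paths through the software receive the most testing.
```

A purely random strategy corresponds to giving every operation the same weight; the profile is what shifts effort toward the inputs users actually exercise.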
When Is Testing Performed?

Components can be tested as they are developed, or testing can be delayed until all components are integrated into a single executable, as shown in Figure 3.8. The later in development we wait, the more disruptive it will be to make changes based on test results.

Figure 3.8. Continuum for when software can be tested

When should testing be done? Sometimes testing is done only at the end of the development process; that is, system testing and/or acceptance testing is the only formal testing done on the software. This approach might work well when there are relatively few developers working from a well-understood set of requirements, but it is wishful thinking for most development efforts. It is widely recognized that the sooner a problem can be identified, the easier and cheaper it is to fix. Therefore, at the other end of the continuum is the decision to test every day. Between the extremes is testing each software component as it is produced. This will slow the early progress of a development effort; however, it can pay off by greatly reducing the problems encountered later in a project as these pieces are composed into the larger system. Also between the extremes is testing at the end of each increment. Rather than assembling individually tested pieces into the deliverable for the increment, this approach takes untested pieces, integrates them, and then tests the complete set of code as a monolithic whole. The intent is to reduce the cost of testing each individual piece as it is written. Success depends on how complex each piece is and how experienced the development staff is. For very simple functionality, there may be few enough defects that they can all be found by testing from the "outside." For more complex functionality, the defects may be buried so deeply in the code that it is difficult to validate specific attribute values from outside the assembled increment.
This approach is useful for components for which implementing a test driver is a significant effort. One important issue in testing development products is the level of detail each represents. Consider, for example, an analysis model that is under refinement. What are the inputs to such a model? In other words, how detailed can we be in defining a test case for something that is itself not very well defined? We will address this issue in Chapter 4. The goal of this process is to provide feedback that can assist developers in making correct decisions.

How Is Testing Performed?

How will testing be performed? The basic approaches to testing software are based on the specification and on the implementation, as shown in Figure 3.9.

Figure 3.9. Continuum for how software is tested

The specification for a software entity states what that entity is supposed to do; that is, it describes the valid set of inputs to the entity, including the constraints on how multiple inputs might be related to one another, and the outputs that correspond to the various inputs. The implementation for a software entity is an expression of an algorithm that produces the outputs for the various inputs in a way that obeys the specification. In short, a specification tells what a software entity does, and an implementation tells how that software entity does what it does. Exhaustively covering specification information assures us that the software does what it is supposed to do. Exhaustively covering implementation information assures us that the software does not do anything it is not supposed to do. Specifications play a significant role in testing. We will need a specification for many components of the software to be developed and tested, including specifications for systems, subsystems, and classes. It seems reasonable that we can generate test cases for a component based solely on its specification.
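As a sketch of generating test cases from a specification alone, consider a hypothetical bounded stack class (our example, not from the text). The specification states the valid inputs, their constraints, and the required outputs; the tests below are derived from that statement with no reference to how the class is implemented.

```python
# Hypothetical specification for a bounded Stack:
#   - push(x) adds x; precondition: size() < capacity
#   - pop() removes and returns the most recently pushed item;
#     precondition: size() > 0; raises IndexError when empty
#   - pushing beyond capacity raises OverflowError

class Stack:
    def __init__(self, capacity):
        self._items, self._capacity = [], capacity

    def size(self):
        return len(self._items)

    def push(self, x):
        if self.size() >= self._capacity:
            raise OverflowError("stack full")
        self._items.append(x)

    def pop(self):
        if self.size() == 0:
            raise IndexError("stack empty")
        return self._items.pop()

def test_push_then_pop_returns_last_pushed():
    # LIFO behavior required by the specification.
    s = Stack(capacity=2)
    s.push(1)
    s.push(2)
    assert s.pop() == 2

def test_pop_on_empty_stack_is_an_error():
    # The specification's precondition violation must be signaled.
    s = Stack(capacity=2)
    try:
        s.pop()
        assert False, "specification requires an error here"
    except IndexError:
        pass
```

Because both tests refer only to the specified behavior, they would remain valid if the list-based implementation were replaced, say, by a fixed-size array.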
However, for some components, implementation-based testing will be important to make certain the test suite is as thorough as it can be. For high-risk components, for example, we will want to make certain every bit of the code has been executed. Besides testing individual components, we will also want to test the interactions between various components. This is traditionally referred to as integration testing, which occurs when components are integrated to create larger systems. The purpose of integration testing is to detect faults that arise because of interface errors or invalid assumptions about interfaces. Integration testing is particularly important in object-oriented systems because of the presence of inclusion polymorphism (see page 32), which is implemented using dynamic binding. In an iterative, incremental process, integration testing will occur on a continuing basis. It will start with primitive objects being aggregated into more complex objects and move on to complex objects that represent subsystems being integrated. In Chapter 6 we will provide some techniques for building effective test cases for interactions.
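The reason inclusion polymorphism matters to integration testing can be sketched as follows: a client bound to a base-class interface may, at run time, interact with any subclass instance, so an interaction test should be repeated with each concrete binding substituted. The Shape hierarchy below is a hypothetical illustration, not an example from the text.

```python
# A client tested against every dynamic binding of its collaborator.

class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return 3.14159 * self.radius ** 2

def total_area(shapes):
    """Client code under test: depends only on the Shape interface."""
    return sum(s.area() for s in shapes)

# Interaction test repeated for each concrete subclass, alone and mixed,
# since any of them may be dynamically bound behind the Shape interface.
for shapes, expected in [
    ([Square(2)], 4.0),
    ([Circle(1)], 3.14159),
    ([Square(2), Circle(1)], 7.14159),
]:
    assert abs(total_area(shapes) - expected) < 1e-6
```

A fault in, say, Circle.area would pass a test that happened to use only Square instances; systematically varying the bindings is what exposes such interface-level faults.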
How Much Testing Is Adequate?

This question is impossible to answer in general, and it is not easy to answer even for a specific piece of software.[5] There are many aspects to consider when addressing it. The expected lifetime of the software is one consideration; an application that exists only to transform data from an old system to a new one seldom requires extensive testing. Another consideration is whether the application containing the software is life-critical, which obviously requires very extensive testing. Note that this is a decision about how thoroughly to test an individual piece that has been chosen for testing.
One ad hoc view of adequacy is that testing continues as long as the costs of uncovering faults are balanced by the increased quality of the product. Another view considers the prevailing standards within the domain in which the software application is situated, and testing is designed to conform to those standards; for example, there are obvious differences in quality standards between drug manufacturing and furniture manufacturing. The differing levels of adequate testing can be viewed on a continuum, shown in Figure 3.10, from no testing at all, to minimal coverage in which we select a few tests to perform, and on to exhaustive testing in which every possible test case is run. Companies, and sometimes even individual projects, set testing policies based on a position along the continuum where they are comfortable.

Figure 3.10. Continuum for how much testing can be done

The amount of testing required should be determined relative to the long-term and short-term goals of the project, and relative to the software being developed. We frequently speak of "coverage" with respect to adequacy. Coverage is a measure of how completely a test suite exercises the capabilities of a piece of software. Different measures are used by different people; for example, one measure might be based on whether every line of code is executed at least once when a test suite is run, while another might be based on the number of requirements that are checked by the test suite. Consequently, coverage is expressed in phrases such as "75% of the code was executed by this test suite" or "one test case was constructed from each specified requirement." We believe test coverage measures should be formulated primarily in terms of requirements and can vary depending on the priorities and objectives of the project. If, for example, requirements are specified by use cases, coverage will be measured by how many of the use cases are exercised and how many scenarios are created for each use case.
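A requirements-based coverage measure of the kind just described can be sketched as a simple ratio: given a record of which use-case scenarios each test exercises, report the fraction of scenarios covered. The scenario and test names are illustrative assumptions only.

```python
def scenario_coverage(scenarios, tests):
    """Return the fraction of scenarios exercised by at least one test.

    scenarios: iterable of scenario identifiers drawn from the use cases
    tests: mapping of test name -> set of scenario identifiers it exercises
    """
    exercised = set()
    for covered in tests.values():
        exercised |= covered
    scenarios = set(scenarios)
    return len(scenarios & exercised) / len(scenarios)

# Hypothetical use case "withdraw cash" with three scenarios, two of
# which are exercised by the current test suite.
scenarios = {"withdraw-normal", "withdraw-overdraft", "withdraw-bad-pin"}
tests = {
    "test_happy_path": {"withdraw-normal"},
    "test_overdraft": {"withdraw-overdraft"},
}
cov = scenario_coverage(scenarios, tests)  # 2 of 3 scenarios covered
```

The same shape of computation works for a code-based measure; only the items being counted change from scenarios to, say, executable lines.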
Coverage measured in terms of the implementation is useful for measuring the completeness of a specification-based test suite. If some code is not executed, then testers should work with developers to determine whether test cases are missing or whether the software implements unspecified functionality. We apply risk analysis in the testing process to determine the level of detail and amount of time to dedicate to testing a component; for example, more time will be spent testing classes that are identified as reusable assets than those that are intended for use only in a prototype. A reasonable scale of increasing risk for components is as follows:
The result of recognizing differing levels of risk is the acceptance of differing levels of adequate test coverage. We will present testing algorithms that guide the testing of specific products. These algorithms include what we term a rheostat effect, which produces differing levels of test coverage from the same algorithm. For example, in Orthogonal Array Testing on page 228 we will talk about testing different numbers of combinations of attribute values.
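The rheostat idea can be illustrated with combinations of attribute values. One setting of the rheostat tests every combination; a lower setting tests only enough combinations to cover every pair of values, which is the intuition behind orthogonal array and pairwise testing. The factors and values below are assumptions for illustration, and the greedy reduction is only a sketch; a true orthogonal array achieves stronger guarantees and usually larger reductions.

```python
from itertools import combinations, product

# Hypothetical attributes of a test configuration (assumed values).
factors = {
    "os": ["linux", "windows", "mac"],
    "browser": ["firefox", "chrome"],
    "locale": ["en", "fr", "de"],
}

def all_combinations(factors):
    """Full Cartesian product: the highest rheostat setting."""
    names = list(factors)
    return [dict(zip(names, values)) for values in product(*factors.values())]

def pairs_of(combo, names):
    # Every (factor, value) pairing of two factors in this combination.
    return {((a, combo[a]), (b, combo[b])) for a, b in combinations(names, 2)}

def pairwise_combinations(factors):
    """Greedily keep combinations until every pair of values of every
    two factors appears in at least one kept combination."""
    names = list(factors)
    needed = set()
    for combo in all_combinations(factors):
        needed |= pairs_of(combo, names)
    chosen = []
    for combo in all_combinations(factors):
        covers = pairs_of(combo, names)
        if covers & needed:
            chosen.append(combo)
            needed -= covers
        if not needed:
            break
    return chosen

full = all_combinations(factors)          # 3 * 2 * 3 = 18 combinations
reduced = pairwise_combinations(factors)  # fewer, yet covers all pairs
```

Turning the rheostat down from "all combinations" to "all pairs" trades a small loss in coverage of higher-order interactions for a smaller test suite, which is exactly the kind of adequacy decision the continuum describes.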