Validation


This section defines validation in a Taguchi context as the process of ensuring that the program meets the functional specification. In other words, no functions have been left out, and gratuitous (unspecified) features and unintended side effects are not present. It's not possible to test all the combinations the program will encounter. Therefore, you must consider which ones to test, how to test them, and what degree of confidence these tests can produce in the program's validity.

Case Study 18.2: Taguchi Methods for Software Validation[9]

Taguchi et al.[9] used this case study to show how the method applies to a software validation problem. It is common knowledge that not all the paths or threads of even a simple program can actually be tested, due to combinatorial explosion. This case study illustrates that fact for a rather simple program and yields insight into how to validate such a program with a high degree of confidence.

Consider a shared printer that serves a number of users over an office LAN. Server printers like this typically have many options. This printer has five paper trays, five print ranges, six choices for the number of pages per sheet (1, 2, 3, 4, 6, 8), four duplex options, two collation choices (stapled or not), two orientations (portrait or landscape), and six possible paper scales. This yields 144,000 combinations that the program must be able to deal with, but it gets worse. There are also yes/no choices for 11 print options, or 2^11 = 2,048 combinations. Multiplied by 144,000, this gives 294,912,000 possible operational modes for this shared printer.

The developers cannot check them all, so which ones should they choose, and with what confidence of validation? Even if it were possible to check one combination every minute, 24 hours a day, 365 days a year, it would take 561 years. If a program could be written to check one combination every second, it would still take almost ten years, at least three times the product's expected market life. By the time the automated testing finished, the product would be two generations out of date! And this is a fairly simple program by enterprise-software standards. In the past, test plans aimed at the most common or likely user choices and checked them out over a month or two. Testing was complete when the time was up, when the product shipped, when the error rate went down, or when a week went by with no new errors discovered. Taguchi Methods for software testing employ an orthogonal array to improve both area coverage and detection rate. The Taguchi case study chose eight factors, one at two levels and seven at three:
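As a quick sanity check on the totals quoted above, the arithmetic can be reproduced in a few lines of Python:

```python
# Arithmetic check of the case study's totals.
option_modes = 144_000      # combinations of the printer's menu settings, per the text
yes_no_modes = 2 ** 11      # 11 independent yes/no print options = 2,048
total_modes = option_modes * yes_no_modes   # 294,912,000 operational modes

minutes_per_year = 60 * 24 * 365
print(total_modes)
print(total_modes / minutes_per_year)         # about 561 years at one test per minute
print(total_modes / (minutes_per_year * 60))  # about 9.4 years at one test per second
```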

Factor Letter   Factor Name         Level 1    Level 2       Level 3
A               Staple              No         Yes
B               Side                2 to 1     1 to 2        2 to 2
C               Number of copies    3          20            50
D               Number of pages     2          20            50
E               Paper tray          Normal     Tray 5        Tray 3
F               Darkness            Normal     Light         Dark
G               Enlarge             100%       78%           128%
H               Execution           From PC    At machine    From memory

These test parameters attempt to cover the range of each factor. Assigned to the columns of an L18 orthogonal array, they produce the following test matrix:

 

Run   A   B   C   D   E   F   G   H
 1    1   1   1   1   1   1   1   1
 2    1   1   2   2   2   2   2   2
 3    1   1   3   3   3   3   3   3
 4    1   2   1   1   2   2   3   3
 5    1   2   2   2   3   3   1   1
 6    1   2   3   3   1   1   2   2
 7    1   3   1   2   1   3   2   3
 8    1   3   2   3   2   1   3   1
 9    1   3   3   1   3   2   1   2
10    2   1   1   3   3   2   2   1
11    2   1   2   1   1   3   3   2
12    2   1   3   2   2   1   1   3
13    2   2   1   2   3   1   3   2
14    2   2   2   3   1   2   1   3
15    2   2   3   1   2   3   2   1
16    2   3   1   3   2   3   1   2
17    2   3   2   1   3   1   2   3
18    2   3   3   2   1   2   3   1
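One reason the L18 array earns its coverage is its defining balance property: every combination of levels for every pair of factors is exercised a uniform number of times. The Python sketch below (not part of the case study; the array is simply transcribed from the table above) verifies this strength-2 orthogonality mechanically.

```python
from collections import Counter
from itertools import combinations

# The L18 array from the table above, one tuple per run (columns A..H).
L18 = [
    (1, 1, 1, 1, 1, 1, 1, 1),
    (1, 1, 2, 2, 2, 2, 2, 2),
    (1, 1, 3, 3, 3, 3, 3, 3),
    (1, 2, 1, 1, 2, 2, 3, 3),
    (1, 2, 2, 2, 3, 3, 1, 1),
    (1, 2, 3, 3, 1, 1, 2, 2),
    (1, 3, 1, 2, 1, 3, 2, 3),
    (1, 3, 2, 3, 2, 1, 3, 1),
    (1, 3, 3, 1, 3, 2, 1, 2),
    (2, 1, 1, 3, 3, 2, 2, 1),
    (2, 1, 2, 1, 1, 3, 3, 2),
    (2, 1, 3, 2, 2, 1, 1, 3),
    (2, 2, 1, 2, 3, 1, 3, 2),
    (2, 2, 2, 3, 1, 2, 1, 3),
    (2, 2, 3, 1, 2, 3, 2, 1),
    (2, 3, 1, 3, 2, 3, 1, 2),
    (2, 3, 2, 1, 3, 1, 2, 3),
    (2, 3, 3, 2, 1, 2, 3, 1),
]

def is_orthogonal(array):
    """Strength-2 check: for every pair of columns, each observed
    level combination must occur the same number of times."""
    n_cols = len(array[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if len(set(counts.values())) != 1:
            return False
    return True

print(is_orthogonal(L18))  # True: the 18 runs are pairwise balanced
```

For two 3-level columns this means each of the 9 level pairs appears in exactly 2 runs; for the 2-level column A paired with a 3-level column, each of the 6 pairs appears in exactly 3 runs.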


The case study reports the results of only 18 tests, one for each row of the orthogonal test array. For each test, if the system provides the proper response, the result is 0; if it fails, the result is 1. After the tests are run and the data collected, a set of two-way response tables is generated: an AxB table, an AxC table, and so on. The entry in each cell is the sum of the 1s among the runs that exercised that combination of levels. For example, there are nine combinations of BiCj (for i = 1, 2, 3 and j = 1, 2, 3), and each combination occurs in two runs of an L18 array. Both B2C3 runs failed, and 2 out of 2 is 100%. In contrast, there were two failures in A1B2 out of three occurrences, or 66.7%. Finally, the 100% failure combinations are investigated to see why they occurred. Note that this method localizes the failures to a segment of the program's code but does not necessarily identify them. Taguchi reports that three international firms have used this method in this application. All three achieved 400% improvement in both area coverage and failure detection rate.
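The bookkeeping behind those two-way response tables is easy to mechanize. In the Python sketch below, the 18 runs come from the case study's L18 array, but the 0/1 results vector is hypothetical, since the case study does not publish its raw outcomes; the failure rates it produces are illustrative only.

```python
from collections import Counter
from itertools import combinations

FACTORS = "ABCDEFGH"

# The 18 runs of the L18 array (columns A..H).
L18 = [
    (1, 1, 1, 1, 1, 1, 1, 1), (1, 1, 2, 2, 2, 2, 2, 2),
    (1, 1, 3, 3, 3, 3, 3, 3), (1, 2, 1, 1, 2, 2, 3, 3),
    (1, 2, 2, 2, 3, 3, 1, 1), (1, 2, 3, 3, 1, 1, 2, 2),
    (1, 3, 1, 2, 1, 3, 2, 3), (1, 3, 2, 3, 2, 1, 3, 1),
    (1, 3, 3, 1, 3, 2, 1, 2), (2, 1, 1, 3, 3, 2, 2, 1),
    (2, 1, 2, 1, 1, 3, 3, 2), (2, 1, 3, 2, 2, 1, 1, 3),
    (2, 2, 1, 2, 3, 1, 3, 2), (2, 2, 2, 3, 1, 2, 1, 3),
    (2, 2, 3, 1, 2, 3, 2, 1), (2, 3, 1, 3, 2, 3, 1, 2),
    (2, 3, 2, 1, 3, 1, 2, 3), (2, 3, 3, 2, 1, 2, 3, 1),
]

# Hypothetical 0 = pass / 1 = fail outcome for each of the 18 runs.
results = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]

def two_way_tables(runs, outcomes):
    """For every factor pair, tally failures and occurrences per
    level combination; return each cell's failure rate."""
    tables = {}
    for i, j in combinations(range(len(FACTORS)), 2):
        fails, totals = Counter(), Counter()
        for run, fail in zip(runs, outcomes):
            cell = (run[i], run[j])
            totals[cell] += 1
            fails[cell] += fail
        tables[FACTORS[i] + FACTORS[j]] = {
            cell: fails[cell] / totals[cell] for cell in totals
        }
    return tables

# Cells that failed in 100% of their occurrences localize the fault.
tables = two_way_tables(L18, results)
suspects = [(pair, cell)
            for pair, table in tables.items()
            for cell, rate in table.items()
            if rate == 1.0]
print(suspects)
```

With this invented results vector, for instance, both runs with C3D2 (runs 12 and 18) fail, so the CD table flags that cell at 100% for investigation.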


Reliability and validity are closely related, because the valid implementation of a responsive, verified design meets its users' requirements. Furthermore, predictive validity in a software system is tantamount to software trustworthiness. Reliability refers to the consistency of repeated measurements using the same measurement method. Validity means that the measurement measures what you intended to measure.[10] Obviously, a measurement may be reliable but not valid. Software metrics researchers classify validity into criterion-related validity and content validity. The former is often called predictive validity and measures how well the software will meet its users' requirements in the future, clearly a primary measure of trustworthiness. The latter measures how well the validity measurement covers the subject.

Kan uses the analogy of a rifle accuracy test to show the difference between reliability and validity.[11] If several test shots from a new rifle produce a neat 1-inch grouping at 100 yards at the edge of the target, the rifle's accuracy is reliable but not valid. If they produce a 10-inch scatter group centered in the middle of the target, its accuracy is valid but not reliable. The ideal is a small grouping in the center of the target, showing that the rifle is both reliable and valid. Having gained this result by careful adjustment, the target shooter can go to the next match certain of the gun's trustworthiness, or the hunter can take the rifle into the field confident that it will be a trustworthy firearm. With software testing, we want tests that show the software is both reliable and valid, and thus will be found completely trustworthy by its users. As with sighting in the rifle, there is some tension between reliability and validity, and testing metrics need to respond to both requirements. Of course, such measurements in software are multidimensional compared to sighting in a simple mechanical device such as a firearm.
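Kan's rifle analogy can be made quantitative: the offset of a shot group's centroid from the bullseye plays the role of (in)validity, and the scatter about that centroid plays the role of (un)reliability. The short Python sketch below simulates the two extreme groups; the shot coordinates and spreads are invented purely for illustration.

```python
import math
import random

random.seed(1)

def group_stats(shots):
    """Return (bias, spread): distance of the group's centroid from the
    bullseye at (0, 0), and mean radial scatter about that centroid."""
    cx = sum(x for x, _ in shots) / len(shots)
    cy = sum(y for _, y in shots) / len(shots)
    bias = math.hypot(cx, cy)        # low bias   -> valid
    spread = sum(math.hypot(x - cx, y - cy)
                 for x, y in shots) / len(shots)  # low spread -> reliable
    return bias, spread

# A tight group 10 inches off center: reliable but not valid.
tight_off_center = [(10 + random.gauss(0, 0.5), random.gauss(0, 0.5))
                    for _ in range(100)]
# A wide scatter centered on the bullseye: valid but not reliable.
wide_centered = [(random.gauss(0, 5), random.gauss(0, 5))
                 for _ in range(100)]

print(group_stats(tight_off_center))  # large bias, small spread
print(group_stats(wide_centered))     # small bias, large spread
```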




Design for Trustworthy Software: Tools, Techniques, and Methodology of Developing Robust Software
ISBN: 0131872508
Year: 2006
Pages: 394