The best test case is one from which every input element that has no bearing on whether the undesired behavior occurs has been removed. If the defective program fails while processing a large input data set, it is essential to reduce that data set before attempting to diagnose the problem.
If the input is a homogeneous aggregate of values, such as a matrix of floating-point numbers, try a matrix containing just the first and last rows of the original, or just the first and last columns. If this doesn’t cause the problem to manifest, keep adding rows (or columns) from the original input until it does. Alternatively, start with the upper-left corner of the array and add a row and a column simultaneously until the problem appears.
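The row-based reduction can be sketched as follows. The `fails` oracle is an assumption: it stands in for whatever harness runs the program under test on a candidate input and reports whether the undesired behavior occurs.

```python
def reduce_rows(matrix, fails):
    """Start from just the first and last rows, then add interior rows
    back one at a time until the failure reappears.

    `fails` is a caller-supplied oracle (assumed here) that runs the
    program under test on a candidate input and returns True when the
    undesired behavior occurs.
    """
    candidate = [matrix[0], matrix[-1]]
    if fails(candidate):
        return candidate            # two rows already suffice
    for row in matrix[1:-1]:
        candidate.insert(-1, row)   # keep the original last row last
        if fails(candidate):
            return candidate        # smallest failing prefix found
    return matrix                   # the failure needs the full input
```

The same skeleton works for columns by transposing the matrix first.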
If the input is a collection of heterogeneous aggregates, the techniques required to cut down the input are a bit more complicated. If the input file is a collection of independent records, you can take a random selection of 10 percent of the records. If the problem still manifests itself, take a random 10 percent of the remaining records, and repeat the process until the problem no longer occurs. As an alternative, you can cut the input set in half, and if the problem persists, continue halving it until the problem no longer manifests itself.
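Both record-reduction strategies can be sketched as below. Again the `fails` oracle is an assumption standing in for the real test harness; the last set that still fails is the reduced test case.

```python
import random

def reduce_records(records, fails):
    """Repeatedly keep a random 10 percent sample while the failure
    still manifests; return the last failing set."""
    current = records
    while len(current) > 1:
        sample = random.sample(current, max(1, len(current) // 10))
        if fails(sample):
            current = sample   # the smaller sample still fails; keep it
        else:
            return current     # sampling went too far; keep the last one
    return current

def reduce_by_halving(records, fails):
    """Alternative: cut the set in half as long as either half fails."""
    current = records
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if fails(first):
            current = first
        elif fails(second):
            current = second
        else:
            break              # neither half fails on its own; stop here
    return current
```

The halving version assumes the failure survives in one half or the other; when it needs records from both halves, the random-sampling version is the better fallback.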
If the reported problem was related to a particular key or combination of key values in the records, try selecting those records that have the problematic key or keys. If that selection still manifests the problem, use the random 10 percent method or the binary selection method to cut down the input set until the problem no longer manifests itself.
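Key-based selection can be sketched as a simple filter, falling back to the full set when the filtered records no longer reproduce the problem. The `key_of` accessor and the `fails` oracle are assumptions standing in for the record format and test harness at hand.

```python
def select_by_key(records, key_of, suspect_keys, fails):
    """Keep only the records whose key is in the suspect set.

    `key_of` extracts the key from a record and `fails` is the test
    oracle (both assumed here). If the filtered selection no longer
    fails, fall back to the full record set for further reduction.
    """
    selected = [r for r in records if key_of(r) in suspect_keys]
    return selected if fails(selected) else records
```

A failing selection can then be cut down further with the random 10 percent or halving methods described above.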
Another approach to cutting down a test input data set is warranted if the input has a complex structure, such as an application program that is being input to a faulty compiler. This approach considers the frequency of use and misuse of the elements of the structure. The following elements should be selected from the original input:
The elements that are least frequently used
The elements that are most frequently misused
The combinations and sequences that are least frequently used
The combinations and sequences that are most frequently misused
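The frequency analysis behind this selection can be sketched as follows. The `tokenize` function is an assumption; counting real language constructs (goto statements, union declarations, and so on) would require a parser for the input language rather than simple token splitting.

```python
from collections import Counter

def least_used_elements(inputs, tokenize, n=5):
    """Count how often each construct appears across a corpus of inputs
    and return the n least frequently used constructs -- prime suspects
    for a faulty compiler's rarely exercised code paths.

    `tokenize` (assumed here) maps one input to its list of constructs.
    """
    counts = Counter()
    for item in inputs:
        counts.update(tokenize(item))
    # Sort constructs by ascending frequency and keep the rarest n.
    return sorted(counts, key=counts.get)[:n]
```

The same counting approach, applied to pairs of adjacent constructs, covers the least frequently used combinations and sequences.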
Opportunities for misusing input data elements arise out of complex input structures. A program in a high-level programming language is an input data set for a compiler. Examples of frequently misused input elements are goto statements, data overlays such as the C/C++ union, and Fortran COMMON statements.
Yet another approach to cutting down a test input data set is to reduce one aspect of the input at a time, keeping the others constant. The aspects of a test data set include size, sequence, and values. The size aspect includes both the number of items and, in the case of arrays, the number of dimensions. The sequence aspect includes repeating patterns and ascending or descending ordering. The values aspect includes magnitude and the set of unique values represented.
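One possible sketch of this aspect-at-a-time reduction is shown below, for a simple list of numbers. Each step simplifies a single aspect and keeps the change only if the failure (reported by the assumed `fails` oracle) survives.

```python
def reduce_one_aspect(data, fails):
    """Simplify one aspect of the input at a time, keeping the others
    constant: first size, then sequence, then values.

    `fails` is the caller's test oracle (assumed here); a simplification
    is kept only if the failure still manifests afterward.
    """
    # Size: try dropping the second half of the items.
    half = data[:max(1, len(data) // 2)]
    if fails(half):
        data = half
    # Sequence: try replacing the ordering with ascending order.
    ordered = sorted(data)
    if fails(ordered):
        data = ordered
    # Values: try collapsing every element to the smallest value present.
    flattened = [min(data)] * len(data)
    if fails(flattened):
        data = flattened
    return data
```

If the failure disappears after one of these steps, that aspect evidently matters, which is itself useful diagnostic information.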
How does reducing the required input suggest hypotheses? It focuses your attention on just those parts of the input that are causing problems. If a test data set doesn’t include a particular feature, there is no point in forming a hypothesis that the handling of that feature is the cause of the defect. By the time you have cut down your test case input to the minimum size required to manifest the defect, you will have eliminated a whole host of potential hypotheses from further consideration.