4.2 Criterion-Oriented Validity

It is unfortunate, but most of what we would like to know about a person who is interviewing with us for a job as a programmer simply cannot be gleaned from the interview process. We would like to know whether this person can write good code, whether this person has been a consistently high producer of code, and whether this person will work well on a software development team. These are the programmer quality attributes: the things that we would really like to know about a programmer before we make a commitment to hire this person. They are our criterion attributes. The interesting thing about criterion attributes is that, for the most part, we can only know them after the fact. The very thing that we wish to know, we cannot know at the time we need to know it. We can, however, learn from the past. We have probably hired many programmers in the past. Some of these programmers will have been good programmers and others were probably marginal. We need to identify attributes of these programmers that we can measure and then select a subset of these attributes that can be shown to be related to our criterion measures. We can then use the historical data to identify programmer attributes that are related to our criteria, measure our existing staff to determine the degree of relationship between each identified attribute and one or more of the programmer criterion measures, and build a working set of attributes that we can use to identify potentially good programmers among a pool of candidates.
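
To make this attribute-screening step concrete, here is a minimal sketch in Python. It assumes a small, entirely hypothetical table of historical programmer records, each carrying a few measurable attributes together with an after-the-fact criterion rating of skill, and it keeps the attributes whose association with the criterion exceeds an arbitrary cutoff. The attribute names, the data, and the 0.3 cutoff are illustrative assumptions, not values taken from this text; plain correlation is used only as one convenient association measure, a limitation discussed below.

```python
import numpy as np

# Hypothetical historical records: measurable attributes plus an
# after-the-fact criterion rating of skill (1..5) for each programmer.
attributes = {
    "iq_score":         np.array([112, 125, 131, 108, 140, 118, 122]),
    "years_experience": np.array([2, 6, 4, 1, 9, 3, 5]),
    "typing_speed_wpm": np.array([60, 45, 80, 55, 70, 65, 50]),
}
skill = np.array([3, 4, 5, 2, 4, 3, 4])   # expert criterion ratings

# Rank each attribute by the strength of its association with the
# criterion; keep those above an (illustrative) threshold of 0.3.
working_set = []
for name, values in attributes.items():
    r = np.corrcoef(values, skill)[0, 1]
    print(f"{name}: r = {r:+.2f}")
    if abs(r) >= 0.3:
        working_set.append(name)

print("candidate working set:", working_set)
```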

4.2.1 Predictive Validity

It is clear that we can use the historical data at our disposal to develop relationships between measurable programmer attributes and programmer criterion measures. We might speculate, for example, that a programmer's IQ is a good predictor of her programming skills. This is a reasonable hypothesis. Critical reasoning skills are a necessary attribute of programming, and IQ is quite possibly a good measure of these reasoning skills. We cannot, however, simply speculate about this relationship and suppose that it holds. We must conduct an experiment to understand the relationship between IQ and good programming skills. As part of this experiment we will choose a set of programmers whose programming ability has been assessed by a team of qualified experts. Assume, for the moment, that these experts can produce a reliable assessment of programmer ability. This means that each of the experts can assign a value from 1 to 5 to a variable skill that represents a fair and reliable assessment of a programmer's actual programming ability as demonstrated at our workplace by past performance. Based on our new skill measure, we can learn which programmers we should have hired and which we should not have hired. Unfortunately, this information is not timely. We would like to have it before we hire a potentially bad programmer.
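
As a rough sketch of how such expert assessments might be assembled, the fragment below assumes, hypothetically, that three experts each assign a 1-to-5 rating to every programmer on the existing staff. It takes the rounded mean as the consensus skill value and uses the average pairwise correlation among the experts as a crude check on the reliability of their ratings; both choices, and the data, are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 1..5 ratings: one row per expert, one column per programmer.
ratings = np.array([
    [3, 5, 2, 4, 4, 1, 3],   # expert A
    [3, 4, 2, 5, 4, 2, 3],   # expert B
    [4, 5, 1, 4, 3, 2, 3],   # expert C
])

# Consensus skill score: rounded mean of the experts' ratings.
skill = np.rint(ratings.mean(axis=0)).astype(int)

# Rough reliability check: average pairwise correlation between experts.
n_experts = ratings.shape[0]
pairs = [(i, j) for i in range(n_experts) for j in range(i + 1, n_experts)]
mean_r = np.mean([np.corrcoef(ratings[i], ratings[j])[0, 1] for i, j in pairs])

print("consensus skill ratings:", skill)
print(f"mean inter-rater correlation: {mean_r:.2f}")
```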

We would like to develop measures on attributes, such as IQ, that we think are related to the skill variable such that the attribute measures vary directly or inversely with our skill assessment. If we are successful in identifying attributes that are related to the skill variable within the pool of existing programmers, then we can use these attributes to predict the skill variable of new candidates. We can then validate that such a relationship holds in the future by experimentation. That is, we can use our attribute measures to hire only the candidates who score high (or low) on our predictive measures. If our selected attributes do a good job of identifying good programmers, then they will have predictive validity.
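
One way to picture this use of historical data is the following sketch: attribute measures for the existing staff are used to fit a simple predictive model, the model scores a set of new candidates, and the predictions are later compared with the skill ratings those hires eventually earn. The attributes, the data, and the use of ordinary least squares are all illustrative assumptions rather than a prescribed method.

```python
import numpy as np

# Attribute measures (e.g., IQ and years of experience) and expert skill
# ratings for the existing staff -- all values are hypothetical.
X_staff = np.array([[112, 2], [125, 6], [131, 4],
                    [108, 1], [140, 9], [118, 3]], dtype=float)
y_staff = np.array([3, 4, 5, 2, 4, 3], dtype=float)

# Fit a simple least-squares model on the historical data.
A = np.column_stack([X_staff, np.ones(len(X_staff))])
coefs, *_ = np.linalg.lstsq(A, y_staff, rcond=None)

# Use the model to score new candidates on the same attributes.
X_new = np.array([[120, 3], [133, 7], [110, 1], [128, 5]], dtype=float)
predicted = np.column_stack([X_new, np.ones(len(X_new))]) @ coefs

# Later, once each hire's skill has been assessed, check how well the
# predictions anticipated the observed skill ratings.
observed = np.array([3, 5, 2, 4], dtype=float)
r = np.corrcoef(predicted, observed)[0, 1]
print("predicted:", np.round(predicted, 1))
print("observed: ", observed)
print(f"predictive validity check: r = {r:.2f}")
```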

The concepts of predictive validity and correlation are sometimes conflated. We will see in Chapter 7 that these are two very different things. A good attribute measure may not have a very good statistical correlation with our criterion measure, because the correlation statistic measures only linear relationships between two variables. As an example, a programmer's IQ value will probably be a very good predictor of programmer skill. Prospective programmers with a very low IQ score are probably going to have a difficult time finding their way to work, much less solving complex programming tasks. Good cognitive skills are directly related to the programming activity. However, there is probably a point of diminishing returns. Extremely bright people are challenged by complicated problems only once. Programming is often a very repetitive activity, and it readily loses its charm for very bright people who do not like the constraints of a typical programming job. They will probably not do well in the long haul as production programmers. We will learn this when we conduct our validation study.

We will not reject IQ as an attribute measure for predicting our criterion measure of skill simply because the relationship between IQ and skill is nonlinear. Quite the contrary: there is probably a strong quadratic relationship between IQ and skill. We will probably learn to eliminate people from our candidate pool who have IQ scores less than 110 or greater than 135. The IQ attribute will probably have very good predictive validity for skill, but the relationship will not be a linear one.
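
The sketch below illustrates this point with simulated data in which skill peaks in the middle of the IQ range. The linear correlation between IQ and skill comes out close to zero, yet a quadratic fit describes the relationship well and supports a screening band of the kind just described. The simulated data and the exact 110-to-135 band are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated IQ scores and a skill measure that peaks near the middle of
# the range -- a deliberately quadratic (inverted-U) relationship.
iq = rng.uniform(90, 150, 200)
skill = 5 - ((iq - 120) / 14.0) ** 2 + rng.normal(0, 0.3, iq.size)

# The linear correlation understates the relationship ...
r_linear = np.corrcoef(iq, skill)[0, 1]

# ... while a quadratic fit captures it well.
quad = np.poly1d(np.polyfit(iq, skill, 2))
r_quadratic = np.corrcoef(quad(iq), skill)[0, 1]
print(f"linear r = {r_linear:+.2f}, quadratic fit r = {r_quadratic:+.2f}")

# A screening rule of the kind described above: keep candidates whose
# IQ falls inside the (illustrative) 110-135 band.
keep = (iq >= 110) & (iq <= 135)
print(f"mean skill inside band:  {skill[keep].mean():.2f}")
print(f"mean skill outside band: {skill[~keep].mean():.2f}")
```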

4.2.2 Concurrent Validity

We would very much like to have a test of programmer aptitude. High scores on such an aptitude test would then identify those potential employees who would make good programmers. The problem with developing such an instrument, and ensuring that it is working correctly, is that we would have to evaluate a group of incoming employees with our instrument and then wait 10 to 20 years to get long-term data on how well these employees really did in our organization. In most cases, we simply cannot wait that long to find out whether our instrument really did allow us to discriminate between potentially good programmers and those who would not succeed in this role.

We can use our existing staff of programmers to validate that the instrument of programmer aptitude we have developed really does work. Within the framework of our current staff, we probably have a full range of successful to not-so-successful programmers. This group of programmers can be used to validate the instrument. They will allow us to establish concurrent validity. We can administer our test of programmer aptitude to them. To the extent that the scores from our programmer aptitude test correspond with the evaluation of our programming staff, the new instrument will have concurrent validity. We will not have to wait 10 to 20 years to see whether the test will discriminate between good programmers and poor ones.
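
Here is a small sketch of this concurrent validation, assuming hypothetical aptitude-test scores and expert skill ratings for the current staff. A rank correlation is used because the 1-to-5 skill ratings are ordinal; that choice, like the data, is an assumption made for illustration rather than something prescribed above.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data for the current programming staff: each programmer's
# score on the new aptitude test and the expert 1..5 skill rating.
aptitude = np.array([62, 85, 74, 91, 55, 78, 68, 88])
skill    = np.array([ 2,  4,  3,  5,  2,  4,  3,  5])

# Concurrent validity: agreement between the new instrument and the
# criterion measure obtained on the same (current) staff.
rho, p_value = spearmanr(aptitude, skill)
print(f"concurrent validity: rho = {rho:.2f} (p = {p_value:.3f})")
```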


