4.4 Construct Validity

Not all metrics measure what their names imply. Each metric seeks to quantify an underlying construct. We can readily accept that the lines of code (LOC) metric measures the size of a program, at least along one of the attribute dimensions of size. The presumption is that if we know the LOC value of a program module, then we have learned something about the amount of raw programmer effort that was expended on that module. In this sense, LOC is a good measure of the size construct. Other metrics, such as Halstead's effort measure, have little or nothing to do with the actual developer effort construct that they purport to measure.
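To make the contrast concrete, the two metrics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names count_loc and halstead_effort are hypothetical, the LOC convention used here (non-blank physical lines that are not pure "#" comments) is just one of many in use, and the effort computation follows the standard textbook definition E = D × V, with difficulty D = (n1/2)(N2/n2) and volume V = (N1 + N2) log2(n1 + n2). The operator and operand counts are supplied by the caller; extracting them from source text is a separate, language-specific problem that is not shown.

```python
import math


def count_loc(source: str) -> int:
    """Count LOC under one assumed convention: non-blank lines
    that are not pure '#' comment lines."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def halstead_effort(n1: int, n2: int, N1: int, N2: int) -> float:
    """Halstead effort E = D * V.

    n1, n2: number of distinct operators and distinct operands
    N1, N2: total occurrences of operators and of operands
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)   # V = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)         # D = (n1 / 2) * (N2 / n2)
    return difficulty * volume


if __name__ == "__main__":
    sample = "a = 1\n\n# a comment\nb = a + 1\n"
    print(count_loc(sample))
    print(halstead_effort(n1=2, n2=2, N1=4, N2=4))
```

Note that both functions are purely counts of tokens or lines in the finished text. Neither observes the developer at work, which is why the question of what construct each one actually measures has to be settled empirically rather than by the name of the metric.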

Many software development organizations have developed their own ad hoc programmer aptitude tests. In these test scenarios, a potential hire is asked to write a program that implements a well-defined but complex algorithm. After the victim has completed the exercise, his or her efforts are evaluated as to whether the program actually works, how good the solution is, and how many mistakes were made along the way. The resulting program is typically on the order of 50 to 60 LOC. The supposed underlying psychological construct for this exercise is programmer ability. Unfortunately, this type of exercise does not have construct validity. It does not test what its designers think it is testing, for a number of reasons. First, a good programmer can be expected to produce no more than n clean lines of code per week, where n is a function of the language and the complexity of the problem being solved. Whatever the value of n for a given organization, it is certainly well below the 50 LOC that the potential hire is expected to produce on the spot. Second, most software development time is typically spent in design and in simply understanding the task at hand. Coding is merely the act of translating a design into a particular programming language. There is no time for such reflection in the brief interval that the hapless interviewee has to demonstrate his or her ability. Third, most good programmers are very careful and highly methodical. They do not simply sit at a computer terminal and hack code. The test is thus a most atypical code development scenario.

These ad hoc programming tests are probably quite good at identifying hackers; that is the real construct that the tests measure. In the vast majority of software development organizations, a hacker will not do well in the long run. Good programming skills probably include substantial attention to design detail, the ability to work within a team, and considerable social skills. Modern software systems are designed and built by very large development organizations, possibly distributed throughout the world. There is really no place in such an organization for a lone wolf, a code jockey. Yet this is the very thing that most locally developed programming aptitude tests do, in fact, measure.