When looking for ways to prevent and diagnose bugs, it’s helpful to consider the stages of writing a program and how defects are introduced in each of those stages. We identify three stages of writing a program:
Conception
Expression
Transcription
For each stage, we identify the particular causes of the errors that occur during it. The division is logical rather than physical: programmers can pass from one stage to another without conscious thought. Together, these stages make up the coding phase of software development.
In the conception stage, a program is couched in a language-independent notation such as pseudocode or structure charts. Nonessential details are omitted.
The effort involved in the conception stage can be entirely mental; in fact, its results may never be written down. Errors in the conception stage occur because the programmer doesn’t fully understand the problem to be solved.
Here is a list of errors that commonly occur during the conception stage:
An algorithm makes invalid assumptions about the input.
An algorithm makes invalid assumptions about the program state.
An algorithm omits coverage of logical possibilities.
An algorithm omits actions that must be performed.
An algorithm performs actions that aren’t needed.
An algorithm applies the wrong action when it detects a condition.
An algorithm performs actions in the wrong order.
An algorithm has a time complexity that prevents it from completing within an acceptable time in an interactive or real-time setting.
A data structure doesn’t have the capacity to handle the volume of input.
A data structure uses a representation that will cause loss of information.
A data structure omits storage for input items or intermediate results.
A data structure doesn’t facilitate data access in an acceptable time.
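Two of these conception errors, an invalid assumption about the input and the omission of a logical possibility, can be made concrete in a few lines of C. The sketch below is illustrative only: the function names and the enum sign type are hypothetical, not taken from the text. Each flawed version compiles and runs; the flaw lies in the design, which is why no compiler will report it.

```c
#include <stddef.h>

/* Conception error: assumes the input is never empty.
   When n == 0 the division below divides by zero, which is
   undefined behavior in C. */
long average_unchecked(const int *values, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum / (long)n;
}

/* Revised design: the empty case is handled explicitly.
   Returns 0 and stores the average in *avg on success;
   returns -1 when there is nothing to average. */
int average_checked(const int *values, size_t n, long *avg) {
    if (n == 0) {
        return -1;
    }
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
    }
    *avg = sum / (long)n;
    return 0;
}

enum sign { NEGATIVE = -1, ZERO = 0, POSITIVE = 1 };

/* Conception error: omits a logical possibility.
   Zero is never considered, so it is reported as NEGATIVE. */
enum sign classify_incomplete(int x) {
    if (x > 0) {
        return POSITIVE;
    }
    return NEGATIVE;
}

/* Revised design: every possibility (> 0, < 0, == 0) is covered. */
enum sign classify(int x) {
    if (x > 0) {
        return POSITIVE;
    }
    if (x < 0) {
        return NEGATIVE;
    }
    return ZERO;
}
```

In both cases the fix is a change to the algorithm, not to the C code that expresses it: the missing case has to be recognized before it can be written down.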
In the expression stage, the programmer encodes the algorithm in a specific programming language, filling in all the details necessary to compile and execute the program.
Once again, the effort involved in the expression stage can be entirely mental; the results of this stage need not be committed to writing.
There are two possible reasons for expression-stage errors. First, the programmer writes a program that is invalid in the target programming language. Language translators (compilers, interpreters) normally find these problems, particularly if the problem is lexical or syntactic.
Second, the programmer writes a valid program that doesn’t say what he or she meant to write. Since even the best language translator can’t analyze intentions, these problems aren’t normally detected until later in the software-development process.
To learn to recognize expression errors, it’s useful to consider the analogous errors made by people speaking a second language. Grammatical mistakes can give away a speaker who has excellent pronunciation and vocabulary in that language. Here are some of the more common mistakes of this kind:
Using appropriate words but putting them in the wrong order
Conjugating verbs, particularly irregular verbs, incorrectly
Selecting the wrong case endings for nouns and adjectives
Omitting items not required in the speaker’s native language, such as forms of the verb “to be” and definite and indefinite articles
These mistakes are interesting to us because programmers who use more than one language often make analogous mistakes in programming. They are also interesting because they reflect the kind of errors that experienced programmers make when writing code.
The following errors occur during the expression stage:
Unintended expression evaluation order is caused by operator precedence rules.
Loss of precision is caused by mixed data types and implicit conversions.
Unintended statement order of execution is caused by control-flow default rules.
Incompatible operand types in dynamically typed languages are caused by attribute derivation and inheritance rules.
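Two of the errors in this list, precedence surprises and control-flow defaults, are easy to reproduce in C. The following is a minimal sketch; the function names are hypothetical. Each "wrong" function is a valid C program that does something other than what its name suggests, and its corrected counterpart sits next to it. Many compilers can warn about both patterns when extra warnings are enabled, but neither is an error in the language.

```c
/* Operator precedence: == binds more tightly than &, so the
   expression below parses as flags & (mask == 0). */
int bits_clear_wrong(unsigned flags, unsigned mask) {
    return flags & mask == 0;
}

/* Parentheses state the intended evaluation order. */
int bits_clear(unsigned flags, unsigned mask) {
    return (flags & mask) == 0;
}

/* Control-flow default: without break, execution falls through
   from one case label into the next. */
int grade_points_wrong(char grade) {
    int points = 0;
    switch (grade) {
    case 'A':
        points += 4;   /* no break: continues into case 'B' */
    case 'B':
        points += 3;   /* no break: continues into case 'C' */
    case 'C':
        points += 2;
        break;
    default:
        points = 0;
        break;
    }
    return points;
}

/* Each case returns immediately, so only one branch runs. */
int grade_points(char grade) {
    switch (grade) {
    case 'A': return 4;
    case 'B': return 3;
    case 'C': return 2;
    default:  return 0;
    }
}
```

With flags = 4 and mask = 2, bits_clear_wrong returns 0 while bits_clear returns 1, and grade_points_wrong('A') accumulates 4 + 3 + 2 = 9 instead of 4.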
The following code segments show typical expression errors of the second type. The comment says what the programmer meant to say, and the code says something else.
C count positive values:
C compare each value to zero
C if greater, then increment counter
      DO I = 1, N
         IF( X(I) .LT. 0 ) THEN
            C = C + 1
         ENDIF
      ENDDO
// count positive values:
// compare each value to zero
// if greater, then increment counter
for ( i=0; i<n; ++i ) {
    if ( x[i] < 0 ) {
        c += 1;
    }
}
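Making the C segment say what its comment says takes a one-character change in the comparison. In the sketch below, the function names, the double element type, and the explicit length parameter n are assumptions made for illustration; the original fragment does not show its declarations. Keeping the as-written loop alongside the intended one shows how the two diverge on the same data.

```c
#include <stddef.h>

/* The loop as written: the test is "less than", so it counts
   negative values despite what the comment says. */
size_t count_as_written(const double *x, size_t n) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < 0) {
            c += 1;
        }
    }
    return c;
}

/* count positive values:
   compare each value to zero;
   if greater, then increment counter */
size_t count_positive(const double *x, size_t n) {
    size_t c = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0) {
            c += 1;
        }
    }
    return c;
}
```

For the input {-1.5, 0.0, 2.0, 3.0}, count_positive returns 2 and count_as_written returns 1. No translator can flag the difference, because both are correct C.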
Some languages, such as Fortran, C, and PL/I, require the language translator to accept implausible or questionable source code. Programs accepted this way often contain problems such as loss of precision in conversions, the wrong type of pointer for the object pointed to, and so forth.
The language translator handles such code by making assumptions or adding hidden code, and it may accept the program without complaint. Language translators use several mechanisms to cope with questionable code:
Adding conversions between data types
Assuming equality of operand width and pointer sizes
Generating code regardless of precision loss
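The following C sketch shows conversions a C compiler inserts silently. The function names are hypothetical, and the float example assumes IEEE 754 single precision (a 24-bit significand), which is common but not required by the C standard. Every line compiles; compilers may warn about some of these only when extra warnings are requested.

```c
/* Adding conversions between data types: the double is converted
   to int by truncating toward zero, with no diagnostic required. */
int to_int(double d) {
    int i = d;
    return i;
}

/* Generating code regardless of precision loss: assuming IEEE 754
   single precision, 16777217 (2^24 + 1) cannot be represented as
   a float and is rounded to a nearby representable value. */
long through_float(long n) {
    float f = n;
    return (long)f;
}

/* Mixed signed/unsigned comparison: a is converted to unsigned
   before comparing, so -1 becomes UINT_MAX. */
int is_less(int a, unsigned b) {
    return a < b;
}
```

Under these assumptions, to_int(3.9) yields 3, through_float(16777217) yields 16777216, and is_less(-1, 1) yields 0, even though -1 is plainly less than 1.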
Language translators for languages such as C++, Ada, and Java are normally much stricter about their language interpretations. They require matching types between formal and actual arguments, between left- and right-hand sides of assignments, and between pointers and the objects to which they point. This strictness prevents certain types of expression errors from resulting in an executable program that doesn’t do what the programmer intended.
In the transcription stage, the programmer embodies the program in a machine-readable medium. In the early days of computing, professional typists created decks of punched cards by keypunching code that the programmer had written by hand on a standard coding form. Today, programmers do their own data entry on workstations or PCs, and program text is stored directly on disk.
When the transcription stage is complete, the program is in a form that the language translator will process.
The following errors occur during the transcription stage:
Omitting characters or words
Inserting extra characters or words
Substituting characters or words
Transposing adjacent characters or words
Some transcription errors are caused by difficulty reading the previous version of the program. Others are simply typing mistakes.
Transcription mistakes are trivial to correct, but they can result in valid programs that don’t express the intent of the programmer. If the language translator doesn’t catch them, they can be very difficult to isolate.
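In C, an omitted or transposed character often produces another valid program rather than a compile error. The sketch below uses hypothetical function names. In is_empty_typo one '=' has been omitted, and in decrement_typo the two characters of "-=" have been transposed; each is paired with the intended version.

```c
/* Intended: if (count == 0). With one '=' omitted, the condition
   assigns 0 to count and then tests the value 0, which is false. */
int is_empty_typo(int count) {
    if (count = 0) {
        return 1;
    }
    return 0;
}

int is_empty(int count) {
    return count == 0;
}

/* Intended: x -= 1. With the two characters transposed, the
   statement parses as x = -1. */
int decrement_typo(int x) {
    x =- 1;
    return x;
}

int decrement(int x) {
    x -= 1;
    return x;
}
```

is_empty_typo returns 0 for every input, including 0, and decrement_typo(10) returns -1 instead of 9. Both typos compile, which is exactly what makes them hard to isolate.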
Some of computing’s most infamous bugs have been caused by transcription errors. Omitting or inserting a blank, or substituting a period for a comma, can be very expensive, even fatal. In the following Fortran statements, a period has been inadvertently substituted for a comma in the second statement. Because Fortran ignores blanks, the compiler reads the second statement as an assignment of the value 1.1 to a variable named DO10I, so the intended loop control statement becomes a simple assignment. Since the code that was meant to be the body of the loop is executed only once, the results are quite unexpected.
      DO 10 I = 1,100
      DO10I=1.100
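C has an analogue of this failure: one inserted character turns a loop into something else, and the program still compiles. In the hypothetical sketch below, a stray semicolon after the for header gives the loop an empty body, so the block meant to be the loop body runs only once, just as in the Fortran example.

```c
/* Intended: add the integers 1 through 100.
   The stray ';' ends the for statement with an empty body, so the
   braced block is an ordinary compound statement that runs once,
   after the loop has left i equal to 101. */
int sum_typo(void) {
    int sum = 0;
    int i;
    for (i = 1; i <= 100; i++);
    {
        sum += i;
    }
    return sum;
}

int sum_intended(void) {
    int sum = 0;
    for (int i = 1; i <= 100; i++) {
        sum += i;
    }
    return sum;
}
```

sum_typo returns 101 rather than 5050. Declaring the loop variable inside the for header, as sum_intended does, would have turned this particular typo into a compile error, because i would be out of scope in the stray block.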