32.5. Commenting Techniques | Code Complete: A Practical Handbook of Software Construction, Second Edition

Commenting is amenable to several different techniques depending on the level to which the comments apply: program, file, routine, paragraph, or individual line.

Commenting Individual Lines

In good code, the need to comment individual lines of code is rare. Here are two possible reasons a line of code would need a comment:

The single line is complicated enough to need an explanation.
The single line once had an error, and you want a record of the error.

Here are some guidelines for commenting a line of code:

Avoid self-indulgent comments Many years ago, I heard the story of a maintenance programmer who was called out of bed to fix a malfunctioning program. The program's author had left the company and couldn't be reached. The maintenance programmer hadn't worked on the program before, and after examining the documentation carefully, he found only one comment. It looked like this:

MOV AX, 723h   ; R. I. P. L. V. B.

After working with the program through the night and puzzling over the comment, the programmer made a successful patch and went home to bed. Months later, he met the program's author at a conference and found out that the comment stood for "Rest in peace, Ludwig van Beethoven." Beethoven died in 1827 (decimal), which is 723 (hexadecimal). The fact that 723h was needed in that spot had nothing to do with the comment. Aaarrrrghhhhh!

Endline Comments and Their Problems

Endline comments are comments that appear at the ends of lines of code:

Visual Basic Example of Endline Comments

For employeeId = 1 To employeeCount
   GetBonus( employeeId, employeeType, bonusAmount )
   If employeeType = EmployeeType_Manager Then
      PayManagerBonus( employeeId, bonusAmount ) ' pay full amount
   Else
      If employeeType = EmployeeType_Programmer Then
         If bonusAmount >= MANAGER_APPROVAL_LEVEL Then
            PayProgrammerBonus( employeeId, StdAmt() ) ' pay std. amount
         Else
            PayProgrammerBonus( employeeId, bonusAmount ) ' pay full amount
         End If
      End If
   End If
Next

Although useful in some circumstances, endline comments pose several problems. The comments have to be aligned to the right of the code so that they don't interfere with the visual structure of the code. If you don't align them neatly, they'll make your listing look like it's been through the washing machine. Endline comments tend to be hard to format. If you use many of them, it takes time to align them. Such time is not spent learning more about the code; it's dedicated solely to the tedious task of pressing the spacebar or the Tab key.

Endline comments are also hard to maintain. If the code on any line containing an endline comment grows, it bumps the comment farther out and all the other endline comments will have to be bumped out to match. Styles that are hard to maintain aren't maintained, and the commenting deteriorates under modification rather than improving.

Endline comments also tend to be cryptic. The right side of the line usually doesn't offer much room, and the desire to keep the comment on one line means that the comment must be short. Work then goes into making the line as short as possible instead of as clear as possible.

Avoid endline comments on single lines In addition to their practical problems, endline comments pose several conceptual problems. Here's an example of a set of endline comments:

C++ Example of Useless Endline Comments

memoryToInitialize = MemoryAvailable();     // get amount of memory available       <-- 1
pointer = GetMemory( memoryToInitialize );  // get a ptr to the available memory      |
ZeroMemory( pointer, memoryToInitialize );  // set memory to 0                        |
...                                                                                   |
FreeMemory( pointer );                      // free memory allocated                <-- 1

(1) The comments merely repeat the code.

A systemic problem with endline comments is that it's hard to write a meaningful comment for one line of code. Most endline comments just repeat the line of code, which hurts more than it helps.

Avoid endline comments for multiple lines of code If an endline comment is intended to apply to more than one line of code, the formatting doesn't show which lines the comment applies to:

Visual Basic Example of a Confusing Endline Comment on Multiple Lines of Code

For rateIdx = 1 to rateCount                  ' Compute discounted rates
   LookupRegularRate( rateIdx, regularRate )
   rate( rateIdx ) = regularRate * discount( rateIdx )
Next

Even though the content of this particular comment is fine, its placement isn't. You have to read the comment and the code to know whether the comment applies to a specific statement or to the entire loop.

When to Use Endline Comments

Consider three exceptions to the recommendation against using endline comments:

Use endline comments to annotate data declarations Endline comments are useful for annotating data declarations because they don't have the same systemic problems as endline comments on code, provided that you have enough width. With 132 columns, you can usually write a meaningful comment beside each data declaration:

Cross-Reference

Other aspects of endline comments on data declarations are described in "Commenting Data Declarations," later in this section.

Java Example of Good Endline Comments for Data Declarations

int boundary = 0;         // upper index of sorted part of array
String insertVal = BLANK; // data elmt to insert in sorted part of array
int insertPos = 0;        // position to insert elmt in sorted part of array

Avoid using endline comments for maintenance notes Endline comments are sometimes used for recording modifications to code after its initial development. This kind of comment typically consists of a date and the programmer's initials, or possibly an error-report number. Here's an example:

for i = 1 to maxElmts - 1  -- fixed error #A423 10/1/05 (scm)

Adding such a comment can be gratifying after a late-night debugging session on software that's in production, but such comments really have no place in production code. Such comments are handled better by version-control software. Comments should explain why the code works now, not why the code didn't work at some point in the past.

Use endline comments to mark ends of blocks An endline comment is useful for marking the end of a long block of code: the end of a while loop or an if statement, for example. This is described in more detail later in this chapter.

Cross-Reference

Use of endline comments to mark ends of blocks is described further in "Commenting Control Structures," later in this section.

Aside from a couple of special cases, endline comments have conceptual problems and tend to be used for code that's too complicated. They are also difficult to format and maintain. Overall, they're best avoided.

Commenting Paragraphs of Code

Most comments in a well-documented program are one-sentence or two-sentence comments that describe paragraphs of code:

Java Example of a Good Comment for a Paragraph of Code

// swap the roots
oldRoot = root[0];
root[0] = root[1];
root[1] = oldRoot;

The comment doesn't repeat the code; it describes the code's intent. Such comments are relatively easy to maintain. Even if you find an error in the way the roots are swapped, for example, the comment won't need to be changed. Comments that aren't written at the level of intent are harder to maintain.

Write comments at the level of the code's intent Describe the purpose of the block of code that follows the comment. Here's an example of a comment that's ineffective because it doesn't operate at the level of intent:

Java Example of an Ineffective Comment

/* check each character in "inputString" until a dollar sign
is found or all characters have been checked
*/
done = false;
maxLen = inputString.length();
i = 0;
while ( !done && ( i < maxLen ) ) {
   if ( inputString.charAt( i ) == '$' ) {
      done = true;
   }
   else {
      i++;
   }
}

Cross-Reference

This code that performs a simple string search is used only for purposes of illustration. For real code, you'd use Java's built-in string library functions instead. For more on the importance of understanding your language's capabilities, see "Read!" in Section 33.3.

You can figure out that the loop looks for a $ by reading the code, and it's somewhat helpful to have that summarized in the comment. The problem with this comment is that it merely repeats the code and doesn't give you any insight into what the code is supposed to be doing. This comment would be a little better:

// find '$' in inputString

This comment is better because it indicates that the goal of the loop is to find a $. But it still doesn't give you much insight into why the loop would need to find a $ (in other words, into the deeper intent of the loop). Here's a comment that's better still:

// find the command-word terminator ($)

This comment actually contains information that the code listing does not, namely that the $ terminates a command word. In no way could you deduce that fact merely from reading the code fragment, so the comment is genuinely helpful.

Another way of thinking about commenting at the level of intent is to think about what you would name a routine that did the same thing as the code you want to comment. If you're writing paragraphs of code that have one purpose each, it isn't difficult. The comment in the previous code sample is a good example. FindCommandWordTerminator() would be a decent routine name. The other options, Find$InInputString() and CheckEachCharacterInInputStrUntilADollarSignIsFoundOrAllCharactersHaveBeenChecked(), are poor names (or invalid) for obvious reasons. Type the description without shortening or abbreviating it, as you might for a routine name. That description is your comment, and it's probably at the level of intent.

Focus your documentation efforts on the code itself For the record, the code itself is always the first documentation you should check. In the previous example, the literal, $, should be replaced with a named constant and the variables should provide more of a clue about what's going on. If you want to push the edge of the readability envelope, add a variable to contain the result of the search. Doing that clearly distinguishes between the loop index and the result of the loop. Here's the code rewritten with good comments and good style:

Java Example of a Good Comment and Good Code

// find the command-word terminator
foundTheTerminator = false;
commandStringLength = inputString.length();
testCharPosition = 0;
while ( !foundTheTerminator && ( testCharPosition < commandStringLength ) ) {
   if ( inputString.charAt( testCharPosition ) == COMMAND_WORD_TERMINATOR ) {
      foundTheTerminator = true;
      terminatorPosition = testCharPosition;       <-- 1
   }
   else {
      testCharPosition = testCharPosition + 1;
   }
}

(1) Here's the variable that contains the result of the search.

If the code is good enough, it begins to read at close to the level of intent, encroaching on the comment's explanation of the code's intent. At that point, the comment and the code might become somewhat redundant, but that's a problem few programs have.

Another good step for this code would be to create a routine called something like FindCommandWordTerminator() and move the code from the sample into that routine. A comment that describes that thought is useful but is more likely than a routine name to become inaccurate as the software evolves.
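As a rough sketch of that step, the search could be moved into a routine along the following lines. The class name CommandParser, the Java-style method name, and the choice of '$' as the terminator value are assumptions made for illustration; the sketch also leans on String.indexOf rather than a hand-written loop, so it is one possible shape for the routine and not the book's own listing.

```java
class CommandParser {
    // Assumed value for illustration; the text's example searches for '$'.
    static final char COMMAND_WORD_TERMINATOR = '$';

    // Returns the position of the command-word terminator in inputString,
    // or -1 if inputString contains no terminator.
    static int findCommandWordTerminator(String inputString) {
        return inputString.indexOf(COMMAND_WORD_TERMINATOR);
    }

    public static void main(String[] args) {
        // The routine name now carries the intent that the comment carried.
        int terminatorPosition = findCommandWordTerminator("PRINT$ report.txt");
        System.out.println(terminatorPosition);
    }
}
```

With the intent captured in the routine name, the call site reads at the level of the problem, and a separate "find the command-word terminator" comment becomes optional.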

Cross-Reference

For more on moving a section of code into its own routine, see "Extract routine/extract method" in Section 24.3.

Focus paragraph comments on the why rather than the how Comments that explain how something is done usually operate at the programming-language level rather than the problem level. It's nearly impossible for a comment that focuses on how an operation is done to explain the intent of the operation, and comments that tell how are often redundant. What does the following comment tell you that the code doesn't?

Java Example of a Comment That Focuses on How

// if account flag is zero
if ( accountFlag == 0 ) ...

The comment tells you nothing more than the code itself does. What about this comment?

Java Example of a Comment That Focuses on Why

// if establishing a new account
if ( accountFlag == 0 ) ...

This comment is a lot better because it tells you something you couldn't infer from the code itself. The code itself could still be improved by use of a meaningful enumerated type name instead of 0 and a better variable name. Here's the best version of this comment and code:

Java Example of Using Good Style In Addition to a "Why" Comment

// if establishing a new account
if ( accountType == AccountType.NewAccount ) ...

When code attains this level of readability, it's appropriate to question the value of the comment. In this case, the comment has been made redundant by the improved code, and it should probably be removed. Alternatively, the purpose of the comment could be subtly shifted, like this:

Java Example of Using a "Section Heading" Comment

// establish a new account
if ( accountType == AccountType.NewAccount ) {
   ...
}

If this comment documents the whole block of code following the if test, it serves as a summary-level comment and it's appropriate to retain it as a section heading for the paragraph of code it references.

Use comments to prepare the reader for what is to follow Good comments tell the person reading the code what to expect. A reader should be able to scan only the comments and get a good idea of what the code does and where to look for a specific activity. A corollary to this rule is that a comment should always precede the code it describes. This idea isn't always taught in programming classes, but it's a well-established convention in commercial practice.

Make every comment count There's no virtue in excessive commenting; too many comments obscure the code they're meant to clarify. Rather than writing more comments, put the extra effort into making the code itself more readable.

Document surprises If you find anything that isn't obvious from the code itself, put it into a comment. If you have used a tricky technique instead of a straightforward one to improve performance, use comments to point out what the straightforward technique would be and quantify the performance gain achieved by using the tricky technique. Here's an example:

C++ Example of Documenting a Surprise

for ( element = 0; element < elementCount; element++ ) {
   // Use right shift to divide by two. Substituting the
   // right-shift operation cuts the loop time by 75%.
   elementList[ element ] = elementList[ element ] >> 1;
}

The selection of the right shift in this example is intentional. Among experienced programmers, it's common knowledge that for nonnegative integers, right shift is functionally equivalent to divide-by-two.

If it's common knowledge, why document it? Because the purpose of the operation is not to perform a right shift; it is to perform a divide-by-two. The fact that the code doesn't use the technique most suited to its purpose is significant. Moreover, most compilers optimize integer division-by-two to be a right shift anyway, meaning that the reduced clarity is usually unnecessary. In this particular case, the compiler evidently doesn't optimize the divide-by-two, and the time saved will be significant. With the documentation, a programmer reading the code would see the motivation for using the nonobvious technique. Without the comment, the same programmer would be inclined to grumble that the code is unnecessarily "clever" without any meaningful gain in performance. Usually such grumbling is justified, so it's important to document the exceptions.
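One further detail that a comment on such a substitution might record is when the two forms actually agree. The Java sketch below compares them; the class and method names are hypothetical and exist only for this illustration. In Java, the two forms match for nonnegative values, but for negative odd values the arithmetic shift rounds toward negative infinity while integer division rounds toward zero.

```java
class ShiftVersusDivide {
    // Halve a value the straightforward way; integer division rounds toward zero.
    static int halveByDivision(int value) {
        return value / 2;
    }

    // Halve a value with an arithmetic right shift.
    // Agrees with value / 2 for nonnegative values; for negative odd values
    // the shift rounds toward negative infinity instead of toward zero.
    static int halveByShift(int value) {
        return value >> 1;
    }

    public static void main(String[] args) {
        System.out.println(halveByDivision(7) + " " + halveByShift(7));   // prints "3 3"
        System.out.println(halveByDivision(-7) + " " + halveByShift(-7)); // prints "-3 -4"
    }
}
```

If the data can never be negative, that assumption is itself a surprise worth stating next to the shift.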

Avoid abbreviations Comments should be unambiguous, readable without the work of figuring out abbreviations. Avoid all but the most common abbreviations in comments. Unless you're using endline comments, using abbreviations isn't usually a temptation. If you are and it is, realize that abbreviations are another strike against a technique that struck out several pitches ago.

Differentiate between major and minor comments In a few cases, you might want to differentiate between different levels of comments, indicating that a detailed comment is part of a previous, broader comment. You can handle this in a couple of ways. You can try underlining the major comment and not underlining the minor comment:

C++ Example of Differentiating Between Major and Minor Comments with Underlines (Not Recommended)

// copy the string portion of the table, along the way omitting               <-- 1
// strings that are to be deleted                                               |
//--------------------------------------------------------------------------  <-- 1
// determine number of strings in the table                                   <-- 2
...
// mark the strings to be deleted                                             <-- 3
...

(1) The major comment is underlined.
(2) A minor comment that's part of the action described by the major comment isn't underlined here…
(3) …or here.

The weakness of this approach is that it forces you to underline more comments than you'd really like to. If you underline a comment, it's assumed that all the nonunderlined comments that follow it are subordinate to it. Consequently, when you write the first comment that isn't subordinate to the underlined comment, it too must be underlined and the cycle starts all over. The result is too much underlining or inconsistent underlining in some places and no underlining in others.

This theme has several variations that all have the same problem. If you put the major comment in all caps and the minor comments in lowercase, you substitute the problem of too many all-caps comments for too many underlined comments. Some programmers use an initial cap on major statements and no initial cap on minor ones, but that's a subtle visual cue too easily overlooked.

A better approach is to use ellipses in front of the minor comments:

C++ Example of Differentiating Between Major and Minor Comments with Ellipses

// copy the string portion of the table, along the way omitting       <-- 1
// strings that are to be deleted
// ... determine number of strings in the table                       <-- 2
...
// ... mark the strings to be deleted                                 <-- 3
...

(1) The major comment is formatted normally.
(2) A minor comment that's part of the action described by the major comment is preceded by an ellipsis here…
(3) …and here.

Another approach that's often best is to put the major-comment operation into its own routine. Routines should be logically "flat," with all their activities on about the same logical level. If your code differentiates between major and minor activities within a routine, the routine isn't flat. Putting the complicated group of activities into its own routine makes for two logically flat routines instead of one logically lumpy one.

This discussion of major and minor comments doesn't apply to indented code within loops and conditionals. In such cases, you'll often have a broad comment at the top of the loop and more detailed comments about the operations within the indented code. In those cases, the indentation provides the clue to the logical organization of the comments. This discussion applies only to sequential paragraphs of code in which several paragraphs make up a complete operation and some paragraphs are subordinate to others.

Comment anything that gets around an error or an undocumented feature in a language or an environment If it's an error, it probably isn't documented. Even if it's documented somewhere, it doesn't hurt to document it again in your code. If it's an undocumented feature, by definition it isn't documented elsewhere and it should be documented in your code.

Suppose you find that the library routine WriteData( data, numItems, blockSize ) works properly except when blockSize equals 500. It works fine for 499, 501, and every other value you've ever tried, but you've found that the routine has a defect that appears only when blockSize equals 500. In code that uses WriteData(), document why you're making a special case when blockSize is 500. Here's an example of how it could look:

Java Example of Documenting the Workaround for an Error

blockSize = optimalBlockSize( numItems, sizePerItem );
/* The following code is necessary to work around an error in
WriteData() that appears only when the third parameter
equals 500. '500' has been replaced with a named constant
for clarity.
*/
if ( blockSize == WRITEDATA_BROKEN_SIZE ) {
   blockSize = WRITEDATA_WORKAROUND_SIZE;
}
WriteData ( file, data, blockSize );

Justify violations of good programming style If you've had to violate good programming style, explain why. That will prevent a well-intentioned programmer from changing the code to a better style, possibly breaking your code. The explanation will make it clear that you knew what you were doing and weren't just sloppy. Give yourself credit where credit is due!

Don't comment tricky code; rewrite it Here's a comment from a project I worked on:

C++ Example of Commenting Clever Code

// VERY IMPORTANT NOTE:
// The constructor for this class takes a reference to a UiPublication.
// The UiPublication object MUST NOT BE DESTROYED before the DatabasePublication
// object. If it is, the DatabasePublication object will cause the program to
// die a horrible death.

This is a good example of one of the most prevalent and hazardous bits of programming folklore: that comments should be used to document especially "tricky" or "sensitive" sections of code. The reasoning is that people should know they need to be careful when they're working in certain areas.

This is a scary idea.

Commenting tricky code is exactly the wrong approach to take. Comments can't rescue difficult code. As Kernighan and Plauger emphasize, "Don't document bad code; rewrite it" (1978).

One study found that areas of source code with large numbers of comments also tended to have the most defects and to consume the most development effort (Lind and Vairavan 1989). The authors hypothesized that programmers tended to comment difficult code heavily.

When someone says, "This is really tricky code," I hear them say, "This is really bad code." If something seems tricky to you, it will be incomprehensible to someone else. Even something that doesn't seem all that tricky to you can seem impossibly convoluted to another person who hasn't seen the trick before. If you have to ask yourself "Is this tricky?" it is. You can always find a rewrite that's not tricky, so rewrite the code. Make your code so good that you don't need comments, and then comment it to make it even better.

This advice applies mainly to code you're writing for the first time. If you're maintaining a program and don't have the latitude to rewrite bad code, commenting the tricky parts is a good practice.

Commenting Data Declarations

Comments for variable declarations describe aspects of the variable that the variable name can't describe. It's important to document data carefully; at least one company that studied its own practices has concluded that annotations on data are even more important than annotations on the processes in which the data is used (SDC, in Glass 1982). Here are some guidelines for commenting data:

Cross-Reference

For details on formatting data, see "Laying Out Data Declarations" in Section 31.5. For details on how to use data effectively, see Chapters 10 through 13.

Comment the units of numeric data If a number represents length, indicate whether the length is expressed in inches, feet, meters, or kilometers. If it's time, indicate whether it's expressed in elapsed seconds since 1-1-1980, milliseconds since the start of the program, and so on. If it's coordinates, indicate whether they represent latitude, longitude, and altitude and whether they're in radians or degrees; whether they represent an X, Y, Z coordinate system with its origin at the earth's center; and so on. Don't assume that the units are obvious. To a new programmer, they won't be. To someone who's been working on another part of the system, they won't be. After the program has been substantially modified, they won't be.

Alternatively, in many cases you should embed the units in the variable names rather than in comments. An expression like distanceToSurface = marsLanderAltitude looks like it's probably correct, but distanceToSurfaceInMeters = marsLanderAltitudeInFeet exposes an obvious error.
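A small Java sketch of the unit-in-the-name idea follows. The class name, the conversion helper, and the sample altitude are hypothetical and chosen only for illustration; the variable names are the ones used in the text.

```java
class LanderUnits {
    // One international foot is defined as exactly 0.3048 meters.
    static final double METERS_PER_FOOT = 0.3048;

    // Converts a length in feet to the same length in meters.
    static double feetToMeters(double lengthInFeet) {
        return lengthInFeet * METERS_PER_FOOT;
    }

    public static void main(String[] args) {
        double marsLanderAltitudeInFeet = 1000.0; // sample value, not from the text

        // Assigning marsLanderAltitudeInFeet directly to a name ending in
        // InMeters would now look wrong at a glance; the conversion call
        // makes the unit change explicit.
        double distanceToSurfaceInMeters = feetToMeters(marsLanderAltitudeInFeet);
        System.out.println(distanceToSurfaceInMeters);
    }
}
```

Once the units live in the names, a comment stating the units is needed only where a name would otherwise become unwieldy.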

Comment the range of allowable numeric values If a variable has an expected range of values, document the expected range. One of the powerful features of the Ada programming language was the ability to restrict the allowable values of a numeric variable to a range of values. If your language doesn't support that capability (and most languages don't), use a comment to document the expected range of values. For example, if a variable represents an amount of money in dollars, indicate that you expect it to be between $1 and $100. If a variable indicates a voltage, indicate that it should be between 105v and 125v.

Cross-Reference

A stronger technique for documenting allowable ranges of variables is to use assertions at the beginning and end of a routine to assert that the variable's values should be within a prescribed range. For more details, see Section 8.2, "Assertions."
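As a hedged illustration of checking a documented range in code, the Java sketch below enforces the 105v to 125v voltage range mentioned above. The class, constant, and method names are hypothetical. It throws an exception instead of using Java's assert statement, because Java assertions are disabled unless the JVM is started with -ea; that substitution is a choice made for this sketch, not something the text prescribes.

```java
class VoltageReading {
    // Allowable range taken from the example in the text: 105v through 125v.
    static final double MIN_LINE_VOLTS = 105.0;
    static final double MAX_LINE_VOLTS = 125.0;

    // Returns lineVolts unchanged if it lies within the documented range;
    // otherwise throws IllegalArgumentException. The negated form of the
    // test also rejects NaN.
    static double requireLineVoltsInRange(double lineVolts) {
        if (!(lineVolts >= MIN_LINE_VOLTS && lineVolts <= MAX_LINE_VOLTS)) {
            throw new IllegalArgumentException(
                "lineVolts out of range [105, 125]: " + lineVolts);
        }
        return lineVolts;
    }

    public static void main(String[] args) {
        System.out.println(requireLineVoltsInRange(110.0));
    }
}
```

A check like this documents the range and enforces it at the same time, so the comment and the code cannot drift apart.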

Comment coded meanings If your language supports enumerated types (as C++ and Visual Basic do), use them to express coded meanings. If it doesn't, use comments to indicate what each value represents and use a named constant rather than a literal for each of the values. If a variable represents kinds of electrical current, comment the fact that 1 represents alternating current, 2 represents direct current, and 3 represents undefined.
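For the electrical-current case, an enumerated type might look like the Java sketch below. All names here are hypothetical, and the fromCode helper is included only on the assumption that numeric codes still arrive from somewhere outside the program (a file or an older interface, say).

```java
class CurrentKinds {
    // Each constant names a meaning that a bare 1, 2, or 3 would hide.
    enum CurrentKind { ALTERNATING, DIRECT, UNDEFINED }

    // Hypothetical helper: maps the numeric codes described in the text
    // (1, 2, 3) onto the enumerated type and rejects anything else.
    static CurrentKind fromCode(int code) {
        switch (code) {
            case 1: return CurrentKind.ALTERNATING;
            case 2: return CurrentKind.DIRECT;
            case 3: return CurrentKind.UNDEFINED;
            default:
                throw new IllegalArgumentException("unknown current code: " + code);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromCode(2)); // prints "DIRECT"
    }
}
```

With the enumeration in place, the only spot that still needs a comment about coded values is the boundary where raw codes are translated.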

Here's an example of documenting variable declarations that illustrates the three preceding recommendations; all the range information is given in comments:

Visual Basic Example of Nicely Documented Variable Declarations

Dim cursorX As Integer  ' horizontal cursor position; ranges from 1..MaxCols
Dim cursorY As Integer  ' vertical cursor position; ranges from 1..MaxRows
Dim antennaLength As Long      ' length of antenna in meters; range is >= 2
Dim signalStrength As Integer  ' strength of signal in kilowatts; range is >= 1
Dim characterCode As Integer      ' ASCII character code; ranges from 0..255
Dim characterAttribute As Integer ' 0=Plain; 1=Italic; 2=Bold; 3=BoldItalic
Dim characterSize As Integer      ' size of character in points; ranges from 4..127

Comment limitations on input data Input data might come from an input parameter, a file, or direct user input. The previous guidelines apply as much to routine-input parameters as to other kinds of data. Make sure that expected and unexpected values are documented. Comments are one way of documenting that a routine is never supposed to receive certain data. Assertions are another way to document valid ranges, and if you use them the code becomes that much more self-checking.

Document flags to the bit level If a variable is used as a bit field, document the meaning of each bit:

Visual Basic Example of Documenting Flags to the Bit Level

' The meanings of the bits in statusFlags are as follows, from most
' significant bit to least significant bit:
' MSB   0     error detected: 1=yes, 0=no
'       1-2   kind of error: 0=syntax, 1=warning, 2=severe, 3=fatal
'       3     reserved (should be 0)
'       4     printer status: 1=ready, 0=not ready
'       ...
'       14    not used (should be 0)
' LSB   15-32 not used (should be 0)
Dim statusFlags As Integer

Cross-Reference

For details on naming flag variables, see "Naming Status Variables" in Section 11.2.

If the example were written in C++, it would call for bit-field syntax so that the bit-field meanings would be self-documenting.
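Java has no bit-field syntax, so a comparable effect there would have to come from named masks and accessor routines. The sketch below is one way that might look. The class, constant, and method names are hypothetical, and it assumes a 32-bit int with bit 0 as the most significant bit, following the numbering in the comment above.

```java
class StatusFlags {
    // Bit numbering follows the documented layout: bit 0 is the most
    // significant bit of an assumed 32-bit value, so documented bit p
    // sits at shift (31 - p).
    static final int ERROR_DETECTED_MASK = 1 << 31; // bit 0
    static final int ERROR_KIND_SHIFT = 29;         // bits 1-2
    static final int ERROR_KIND_MASK = 0x3;
    static final int PRINTER_READY_MASK = 1 << 27;  // bit 4

    // True if the error-detected bit is set.
    static boolean isErrorDetected(int statusFlags) {
        return (statusFlags & ERROR_DETECTED_MASK) != 0;
    }

    // Returns the kind of error: 0=syntax, 1=warning, 2=severe, 3=fatal.
    static int errorKind(int statusFlags) {
        return (statusFlags >>> ERROR_KIND_SHIFT) & ERROR_KIND_MASK;
    }

    // True if the printer-status bit reports ready.
    static boolean isPrinterReady(int statusFlags) {
        return (statusFlags & PRINTER_READY_MASK) != 0;
    }

    public static void main(String[] args) {
        int statusFlags = ERROR_DETECTED_MASK | (2 << ERROR_KIND_SHIFT);
        System.out.println(isErrorDetected(statusFlags)); // prints "true"
        System.out.println(errorKind(statusFlags));       // prints "2"
        System.out.println(isPrinterReady(statusFlags));  // prints "false"
    }
}
```

The named masks move most of the bit-level documentation into the code, leaving the comment to explain only the numbering convention.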

Stamp comments related to a variable with the variable's name If you have comments that refer to a specific variable, make sure the comment is updated whenever the variable is updated. One way to improve the odds of a consistent modification is to stamp the comment with the variable name. That way, string searches for the variable name will find the comment as well as the variable.

Document global data If global data is used, annotate each piece well at the point at which it's declared. The annotation should indicate the purpose of the data and why it needs to be global. At each point at which the data is used, make it clear that the data is global. A naming convention is the first choice for highlighting a variable's global status. If a naming convention isn't used, comments can fill the gap.

Cross-Reference

For details on using global data, see Section 13.3, "Global Data."

Commenting Control Structures

The space before a control structure is usually a natural place to put a comment. If it's an if or a case statement, you can provide the reason for the decision and a summary of the outcome. If it's a loop, you can indicate the purpose of the loop.

Cross-Reference

For other details on control structures, see Section 31.3, "Layout Styles," Section 31.4, "Laying Out Control Structures," and Chapters 14 through 19.

C++ Example of Commenting the Purpose of a Control Structure

// copy input field up to comma                                              <-- 1
while ( ( *inputString != ',' ) && ( *inputString != END_OF_STRING ) ) {
   *field = *inputString;
   field++;
   inputString++;
} // while -- copy input field                                               <-- 2
*field = END_OF_STRING;
if ( *inputString != END_OF_STRING ) {
   // read past comma and subsequent blanks to get to the next input field   <-- 3
   inputString++;
   while ( ( *inputString == ' ' ) && ( *inputString != END_OF_STRING ) ) {
      inputString++;
   }
} // if -- at end of string

(1) Purpose of the following loop.
(2) End of the loop (useful for longer, nested loops, although the need for such a comment indicates overly complicated code).
(3) Purpose of the loop. Position of comment makes it clear that inputString is being set up for the loop.

This example suggests some guidelines:

Put a comment before each if, case, loop, or block of statements Such a place is a natural spot for a comment, and these constructs often need explanation. Use a comment to clarify the purpose of the control structure.

Comment the end of each control structure Use a comment to show what ended, for example:

} // for clientIndex - process record for each client

A comment is especially helpful at the end of long loops and to clarify loop nesting. Here's a Java example of using comments to clarify the ends of loop structures:

Java Example of Using Comments to Show Nesting

for ( tableIndex = 0; tableIndex < tableCount; tableIndex++ ) {
   while ( recordIndex < recordCount ) {
      if ( !IllegalRecordNumber( recordIndex ) ) {
         ...
      } // if       <-- 1
   } // while         |
} // for            <-- 1

(1) These comments indicate which control structure is ending.

This commenting technique supplements the visual clues about the logical structure given by the code's indentation. You don't need to use the technique for short loops that aren't nested. When the nesting is deep or the loops are long, however, the technique pays off.

Treat end-of-loop comments as a warning indicating complicated code If a loop is complicated enough to need an end-of-loop comment, treat the comment as a warning sign: the loop might need to be simplified. The same rule applies to complicated if tests and case statements.

End-of-loop comments provide useful clues to logical structure, but writing them initially and then maintaining them can become tedious. The best way to avoid such tedious work is often to rewrite any code that's complicated enough to require tedious documentation.

Commenting Routines

Routine-level comments are the subject of some of the worst advice in typical computer-science textbooks. Many textbooks urge you to pile up a stack of information at the top of every routine, regardless of its size or complexity:

Cross-Reference

For details on formatting routines, see Section 31.7. For details on how to create high-quality routines, see Chapter 7.

Visual Basic Example of a Monolithic, Kitchen-Sink Routine Prolog

'**********************************************************************
' Name: CopyString
'
' Purpose:      This routine copies a string from the source
'               string (source) to the target string (target).
'
' Algorithm:    It gets the length of "source" and then copies each
'               character, one at a time, into "target". It uses
'               the loop index as an array index into both "source"
'               and "target" and increments the loop/array index
'               after each character is copied.
'
' Inputs:       input    The string to be copied
'
' Outputs:      output   The string to receive the copy of "input"
'
' Interface Assumptions: None
'
' Modification History: None
'
' Author:       Dwight K. Coder
' Date Created: 10/1/04
' Phone:        (555) 222-2255
' SSN:          111-22-3333
' Eye Color:    Green
' Maiden Name:  None
' Blood Type:   AB-
' Mother's Maiden Name: None
' Favorite Car: Pontiac Aztek
' Personalized License Plate: "Tek-ie"
'**********************************************************************

This is ridiculous. CopyString is presumably a trivial routine, probably fewer than five lines of code. The comment is totally out of proportion to the scale of the routine. The parts about the routine's Purpose and Algorithm are strained because it's hard to describe something as simple as CopyString at a level of detail that's between "copy a string" and the code itself. The boilerplate comments Interface Assumptions and Modification History aren't useful either; they just take up space in the listing. Requiring the author's name is redundant with information that can be retrieved more accurately from the revision-control system. To require all these ingredients for every routine is a recipe for inaccurate comments and maintenance failure. It's a lot of make-work that never pays off.

Another problem with heavy routine headers is that they discourage good factoring of the code: the overhead to create a new routine is so high that programmers will tend to err on the side of creating fewer routines, not more. Coding conventions should encourage good practices; heavy routine headers do the opposite.

Here are some guidelines for commenting routines:

Keep comments close to the code they describe One reason that the prolog to a routine shouldn't contain voluminous documentation is that such a practice puts the comments far away from the parts of the routine they describe. During maintenance, comments that are far from the code tend not to be maintained with the code. The comments and the code start to disagree, and suddenly the comments are worthless. Instead, follow the Principle of Proximity and put comments as close as possible to the code they describe. They're more likely to be maintained, and they'll continue to be worthwhile.

Several components of routine prologs are described below and should be included as needed. For your convenience, create a boilerplate documentation prolog. Just don't feel obliged to include all the information in every case. Fill out the parts that matter, and delete the rest.

Describe each routine in one or two sentences at the top of the routine If you can't describe the routine in a short sentence or two, you probably need to think harder about what it's supposed to do. Difficulty in creating a short description is a sign that the design isn't as good as it should be. Go back to the design drawing board and try again. The short summary statement should be present in virtually all routines except for simple Get and Set accessor routines.
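A minimal sketch of this guideline, using a hypothetical Invoice class (the class, its fields, and the basis-point tax representation are assumptions for illustration only): the computing routine gets a one-sentence summary, and the simple accessor gets none.

```java
// Hypothetical class used only to illustrate one-sentence routine summaries.
final class Invoice {
    private final long subtotalCents;       // assumed to be >= 0
    private final int taxRateBasisPoints;   // 825 means 8.25%; assumed >= 0

    Invoice(long subtotalCents, int taxRateBasisPoints) {
        this.subtotalCents = subtotalCents;
        this.taxRateBasisPoints = taxRateBasisPoints;
    }

    // Returns the tax owed on this invoice in cents, rounded half up.
    long taxCents() {
        return (subtotalCents * taxRateBasisPoints + 5_000) / 10_000;
    }

    // Simple Get accessor: the name says it all, so no summary comment.
    long getSubtotalCents() {
        return subtotalCents;
    }
}
```

If the summary for taxCents had needed a paragraph rather than a sentence, that would have been a hint to revisit the routine's design.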

Cross-Reference

Good routine names are key to routine documentation. For details on how to create them, see Section 7.3, "Good Routine Names."

Document parameters where they are declared The easiest way to document input and output variables is to put comments next to the parameter declarations:

Java Example of Documenting Input and Output Data Where It's Declared (Good Practice)

public void InsertionSort(
   int[] dataToSort, // elements to sort in locations firstElement..lastElement
   int firstElement, // index of first element to sort (>=0)
   int lastElement // index of last element to sort (<= MAX_ELEMENTS)
)

This practice is a good exception to the rule of not using endline comments; they are exceptionally useful in documenting input and output parameters. This occasion for commenting is also a good illustration of the value of using standard indentation rather than endline indentation for routine parameter lists; you wouldn't have room for meaningful endline comments if you used endline indentation. The comments in the example are strained for space even with standard indentation. This example also demonstrates that comments aren't the only form of documentation. If your variable names are good enough, you might be able to skip commenting them. Finally, the need to document input and output variables is a good reason to avoid global data. Where do you document it? Presumably, you document the globals in the monster prolog. That makes for more work and, unfortunately, in practice usually means that the global data doesn't get documented. That's too bad because global data needs to be documented at least as much as anything else.

Cross-Reference

Endline comments are discussed in more detail in "Endline Comments and Their Problems," earlier in this section.

Take advantage of code documentation utilities such as Javadoc Because the code in the previous example is written in Java, you have the additional ability to set up the code to take advantage of Java's document extraction utility, Javadoc. In that case, "documenting parameters where they are declared" would change to look like this:

Java Example of Documenting Input and Output Data To Take Advantage of Javadoc

/**
 * ... <description of the routine> ...
 *
 * @param dataToSort elements to sort in locations firstElement..lastElement
 * @param firstElement index of first element to sort (>=0)
 * @param lastElement index of last element to sort (<= MAX_ELEMENTS)
 */
public void InsertionSort(
   int[] dataToSort,
   int firstElement,
   int lastElement
)

With a tool like Javadoc, the benefit of setting up the code to extract documentation outweighs the risks associated with separating the parameter description from the parameter's declaration. If you're not working in an environment that supports document extraction, like Javadoc, you're usually better off keeping the comments closer to the parameter names to avoid inconsistent edits and duplication of the names themselves.

Differentiate between input and output data It's useful to know which data is used as input and which is used as output. Visual Basic makes it relatively easy to tell because output data is preceded by the ByRef keyword and input data is preceded by the ByVal keyword. If your language doesn't support such differentiation automatically, put it into comments. Here's an example in C++:

C++ Example of Differentiating Between Input and Output Data

void StringCopy(
   char *target,          // out: string to copy to
   const char *source     // in: string to copy from
)
...

Cross-Reference

The order of these parameters follows the standard order for C++ routines but conflicts with more general practices. For details, see "Put parameters in input-modify-output order" in Section 7.5. For details on using a naming convention to differentiate between input and output data, see Section 11.4.

C++-language routine declarations are a little tricky because some of the time the asterisk (*) indicates that the argument is an output argument and a lot of the time it just means that the variable is easier to handle as a pointer than as a nonpointer type. You're usually better off identifying input and output arguments explicitly.

If your routines are short enough and you maintain a clear distinction between input and output data, documenting the data's input or output status is probably unnecessary. If the routine is longer, however, it's a useful service to anyone who reads the routine.
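Java, like C++, has no keyword that marks a parameter as output, so the same comment convention can be applied there. The sketch below is only an illustration: ArrayUtil and copyPrefix are hypothetical names, and the parameters are listed input first.

```java
// Hypothetical helper illustrating "in:" and "out:" parameter comments.
final class ArrayUtil {

    // Copies the first count elements of source into target.
    static void copyPrefix(
        int[] source,   // in: elements to copy from; not modified
        int count,      // in: number of elements to copy (0 <= count <= both lengths)
        int[] target    // out: receives the copy in positions 0..count-1
    ) {
        System.arraycopy(source, 0, target, 0, count);
    }
}
```

Without the comments, nothing in the signature tells a reader that target is written to and source is only read.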

Document interface assumptions Documenting interface assumptions might be viewed as a subset of the other commenting recommendations. If you have made any assumptions about the state of variables you receive (legal and illegal values, arrays being in sorted order, member data being initialized or containing only good data, and so on), document them either in the routine prolog or where the data is declared. This documentation should be present in virtually every routine.

Cross-Reference

For details on other considerations for routine interfaces, see Section 7.5, "How to Use Routine Parameters." To document assumptions using assertions, see "Use assertions to document and verify preconditions and postconditions" in Section 8.2.

Make sure that global data that's used is documented. A global variable is as much an interface to a routine as anything else and is all the more hazardous because it sometimes doesn't seem like one.

As you're writing the routine and realize that you're making an interface assumption, write it down immediately.
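Here is a minimal sketch of an interface assumption written down next to the routine that depends on it. The class SortedSearch and the routine indexOf are hypothetical; the point is only that the sorted-order assumption, which the code never checks, is stated where a caller will see it.

```java
// Hypothetical example of documenting an interface assumption.
final class SortedSearch {

    // Returns the index of key in values, or -1 if key is absent.
    //
    // Interface assumptions:
    //   - values is non-null and sorted in ascending order. The routine does
    //     not verify the ordering; the result is unspecified if it is violated.
    //   - If key occurs more than once, any matching index may be returned.
    static int indexOf(int[] values, int key) {
        int low = 0;
        int high = values.length - 1;
        while (low <= high) {
            int middle = low + (high - low) / 2;   // avoids int overflow
            if (values[middle] < key) {
                low = middle + 1;
            } else if (values[middle] > key) {
                high = middle - 1;
            } else {
                return middle;
            }
        }
        return -1;
    }
}
```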

Comment on the routine's limitations If the routine provides a numeric result, indicate the accuracy of the result. If the computations are undefined under some conditions, document the conditions. If the routine has a default behavior when it gets into trouble, document the behavior. If the routine is expected to work only on arrays or tables of a certain size, indicate that. If you know of modifications to the program that would break the routine, document them. If you ran into gotchas during the development of the routine, document those also.
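The sketch below shows what such a limitations comment might look like for a small numeric routine. Stats and percentOf are hypothetical names chosen for illustration; the comment covers accuracy, an undefined input, the default behavior, and a modification that would break the routine.

```java
// Hypothetical routine used to illustrate a limitations comment.
final class Stats {

    // Returns part as a whole-number percentage of whole.
    //
    // Limitations:
    //   - Accuracy: the result is truncated toward zero, so it can differ
    //     from the exact percentage by just under one percentage point.
    //   - Undefined input: a percentage of zero is undefined. The default
    //     behavior in that case is to return 0 rather than throw.
    //   - Fragile under modification: the multiplication is done in long
    //     arithmetic. Changing it to int would overflow for
    //     part > 21,474,836.
    static long percentOf(int part, int whole) {
        if (whole == 0) {
            return 0;
        }
        return (long) part * 100 / whole;
    }
}
```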

Document the routine's global effects If the routine modifies global data, describe exactly what it does to the global data. As mentioned in Section 13.3, "Global Data," modifying global data is at least an order of magnitude more dangerous than merely reading it, so modifications should be performed carefully, part of the care being clear documentation. As usual, if documenting becomes too onerous, rewrite the code to reduce global data.
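As a rough illustration, the routine below modifies two pieces of global data and says exactly what it does to each. ErrorLog, errorCount, and lastErrorMessage are hypothetical names; static fields stand in for global data because Java has no true globals.

```java
// Hypothetical example of documenting a routine's global effects.
final class ErrorLog {
    // Global data: number of errors reported since the program started.
    static int errorCount = 0;

    // Global data: text of the most recent error, or null if none yet.
    static String lastErrorMessage = null;

    // Records one error.
    //
    // Global effects:
    //   - Increments ErrorLog.errorCount by exactly 1.
    //   - Overwrites ErrorLog.lastErrorMessage with message; the previous
    //     message is lost.
    static void reportError(String message) {
        errorCount++;
        lastErrorMessage = message;
    }
}
```

If the list of global effects grows beyond a few lines, that is the signal described above to rewrite the code so that it touches less global data.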

Document the source of algorithms that are used If you've used an algorithm from a book or magazine, document the volume and page number you took it from. If you developed the algorithm yourself, indicate where the reader can find the notes you've made about it.
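A short sketch of a source citation in a routine comment follows. NumberTheory and gcd are hypothetical names; the citation gives a section rather than a page number because page numbers vary between editions, which is an adaptation of the guideline rather than part of it.

```java
// Hypothetical example of citing the source of an algorithm.
final class NumberTheory {

    // Returns the greatest common divisor of a and b, where a >= 0 and
    // b >= 0. By convention here, gcd(0, 0) is 0.
    //
    // Source: Euclid's algorithm, as described in Knuth, "The Art of
    // Computer Programming," Volume 2: Seminumerical Algorithms,
    // Section 4.5.2.
    static long gcd(long a, long b) {
        while (b != 0) {
            long remainder = a % b;
            a = b;
            b = remainder;
        }
        return a;
    }
}
```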

Use comments to mark parts of your program Some programmers use comments to mark parts of their program so that they can find them easily. One such technique in C++ and Java is to mark the top of each routine with a comment beginning with these characters:

/**

This allows you to jump from routine to routine by doing a string search for /** or to use your editor to jump automatically if it supports that.

A similar technique is to mark different kinds of comments differently, depending on what they describe. For example, in C++ you could use @keyword, where keyword is a code you use to indicate the kind of comment. The comment @param could indicate that the comment describes a parameter to a routine, @version could indicate file-version information, @throws could document the exceptions thrown by a routine, and so on. This technique allows you to use tools to extract different kinds of information from your source files. For example, you could search for @throws to retrieve documentation about all the exceptions thrown by all the routines in a program.

cc2e.com/3259

This C++ convention is based on the Javadoc convention, which is a well-established interface documentation convention for Java programs (java.sun.com/j2se/javadoc/). You can define your own conventions in other languages.
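To show how little tooling such a convention needs, here is a minimal sketch of a tag search. TagExtractor is a hypothetical class, not part of Javadoc or any other tool, and it makes simplifying assumptions: each tag sits on its own comment line, and the tag is matched anywhere in the line rather than only inside comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a real documentation tool.
final class TagExtractor {

    // Returns the text that follows each occurrence of tag (for example,
    // "@throws") in sourceText, one entry per matching line, in order.
    // Limitation: does not check that the tag is inside a comment.
    static List<String> extract(String sourceText, String tag) {
        List<String> entries = new ArrayList<>();
        for (String line : sourceText.split("\n")) {
            // Require a trailing space so "@throws" does not match "@throwsX".
            int tagStart = line.indexOf(tag + " ");
            if (tagStart >= 0) {
                entries.add(line.substring(tagStart + tag.length()).trim());
            }
        }
        return entries;
    }
}
```

Calling TagExtractor.extract(source, "@throws") on the text of a source file would list the exception documentation for every routine in that file.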

Commenting Classes, Files, and Programs

Classes, files, and programs are all characterized by the fact that they contain multiple routines. A file or class should contain a collection of related routines. A program contains all the routines in a program. The documentation task in each case is to provide a meaningful, top-level view of the contents of the file, class, or program.

Cross-Reference

For layout details, see Section 31.8, "Laying Out Classes." For details on using classes, see Chapter 6, "Working Classes."

General Guidelines for Class Documentation

For each class, use a block comment to describe general attributes of the class:

Describe the design approach to the class Overview comments that provide information that can't readily be reverse-engineered from coding details are especially useful. Describe the class's design philosophy, overall design approach, design alternatives that were considered and discarded, and so on.

Describe limitations, usage assumptions, and so on Similar to routines, be sure to describe any limitations imposed by the class's design. Also describe assumptions about input and output data, error-handling responsibilities, global effects, sources of algorithms, and so on.

Comment the class interface Can another programmer understand how to use a class without looking at the class's implementation? If not, class encapsulation is seriously at risk. The class's interface should contain all the information anyone needs to use the class. The Javadoc convention is to require, at a minimum, documentation for each parameter and each return value (Sun Microsystems 2000). This should be done for all exposed routines of each class (Bloch 2001).

Don't document implementation details in the class interface A cardinal rule of encapsulation is that you expose information only on a need-to-know basis: if there is any question about whether information needs to be exposed, the default is to keep it hidden. Consequently, class interface files should contain information needed to use the class but not information needed to implement or maintain the inner workings of the class.
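The guidelines above might combine as in the following sketch. BoundedIntStack is a hypothetical class invented for this illustration. Its block comment covers the design approach and limitations, each exposed routine documents its parameters, return values, and exceptions, and nothing in the interface comments describes how the values are stored.

```java
/**
 * A fixed-capacity, last-in-first-out stack of int values.
 *
 * Design approach: capacity is fixed at construction so that the cost of
 * push is predictable. A growable stack was considered and rejected for
 * that reason.
 *
 * Limitations and usage assumptions: not safe for use by multiple threads
 * without external locking. Callers are responsible for checking isFull()
 * before push and isEmpty() before pop.
 */
final class BoundedIntStack {
    private final int[] items;
    private int size = 0;

    /** @param capacity maximum number of values the stack can hold (>= 0) */
    BoundedIntStack(int capacity) {
        items = new int[capacity];
    }

    /** @return true if the stack holds no values */
    boolean isEmpty() {
        return size == 0;
    }

    /** @return true if the stack cannot accept another value */
    boolean isFull() {
        return size == items.length;
    }

    /**
     * Adds a value to the top of the stack.
     *
     * @param value the value to add
     * @throws IllegalStateException if the stack is full
     */
    void push(int value) {
        if (isFull()) {
            throw new IllegalStateException("stack is full");
        }
        items[size++] = value;
    }

    /**
     * Removes the most recently pushed value.
     *
     * @return the value removed
     * @throws IllegalStateException if the stack is empty
     */
    int pop() {
        if (isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        return items[--size];
    }
}
```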

General Guidelines for File Documentation

At the top of a file, use a block comment to describe the contents of the file:

Describe the purpose and contents of each file The file header comment should describe the classes or routines contained in a file. If all the routines for a program are in one file, the purpose of the file is pretty obvious: it's the file that contains the whole program. If the purpose of the file is to contain one specific class, the purpose is also obvious: it's the file that contains the class with a similar name.

If the file contains more than one class, explain why the classes need to be combined into a single file.

If the division into multiple source files is made for some reason other than modularity, a good description of the purpose of the file will be even more helpful to a programmer who is modifying the program. If someone is looking for a routine that does x, does the file's header comment help that person determine whether this file contains such a routine?

Put your name, e-mail address, and phone number in the block comment Authorship and primary responsibility for specific areas of source code become important on large projects. Small projects (fewer than 10 people) can use collaborative development approaches, such as shared code ownership in which all team members are equally responsible for all sections of code. Larger systems require that programmers specialize in different areas of code, which makes full-team shared-code ownership impractical.

In that case, authorship is important information to have in a listing. It gives other programmers who work on the code a clue about the programming style, and it gives them someone to contact if they need help. Depending on whether you work on individual routines, classes, or programs, you should include author information at the routine, class, or program level.

Include a version-control tag Many version-control tools will insert version information into a file. In CVS, for example, the characters

// $Id$

will automatically expand to

// $Id: ClassName.java,v 1.1 2004/02/05 00:36:43 ismene Exp $

This allows you to maintain current versioning information within a file without requiring any developer effort other than inserting the original $Id$ comment.

Include legal notices in the block comment Many companies like to include copyright statements, confidentiality notices, and other legal notices in their programs. If yours is one of them, include a line similar to the one below. Check with your company's legal advisor to determine what information, if any, to include in your files.

Java Example of a Copyright Statement

// (c) Copyright 1993-2004 Steven C. McConnell. All Rights Reserved. ...

Give the file a name related to its contents Normally, the name of the file should be closely related to the name of the public class contained in the file. For example, if the class is named Employee, the file should be named Employee.cpp. Some languages, notably Java, require the file name to match the class name.
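Pulled together, the file-level guidelines above might produce a header like the one in this sketch. Everything in it is a placeholder: the Employee class body is hypothetical, the author and copyright fields are left for you to fill in, and the $Id$ keyword is shown unexpanded, as you would type it before the version-control tool rewrites it.

```java
// Employee.java
//
// Contains the Employee class, which holds the identifying data for one
// employee. This file contains no other classes.
//
// Author:  (your name), (your e-mail address), (your phone number)
// Version: $Id$
//
// (c) Copyright (year) (copyright holder). All Rights Reserved.

// Hypothetical class body, included only so that the header has
// something to describe.
final class Employee {
    private final String name;

    Employee(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}
```

The class is left package-private here only so that the sketch compiles under any file name; a public Employee class would have to live in Employee.java.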

The Book Paradigm for Program Documentation

Most experienced programmers agree that the documentation techniques described in the previous section are valuable. The hard, scientific evidence for the value of any one of the techniques is still weak. When the techniques are combined, however, evidence of their effectiveness is strong.