11.3 Metrics based on code complexity | Advanced Object Oriented Programming with Visual FoxPro 6.0

Metrics based on code complexity

Evaluating code complexity is the most basic way to measure progress. There are various ways to measure code complexity, the oldest of which is simply counting all lines of code. Later on, the impact of each line was considered as well, which resulted in higher-quality information. Without considering the complexity of each line of code (LOC), certain coding practices give a different result than others, even though the code might accomplish the same thing. Cascaded IF statements versus IF statements with complex expressions are an example of such a misleading scenario.

Number of lines of code

As mentioned above, counting the LOC is the simplest of all metrics. It gave a great deal of information in the old days, but with the introduction of object-oriented development, counting lines of code came to be considered bad practice. There are some good reasons for that: many lines of code do not indicate progress, but rather improper use of object technology. For this reason, various other metrics were invented and developed.

However, I still believe LOC is a valuable metric that provides necessary base information for all kinds of other metrics. Only the goals and use of this measurement have changed. LOC should not be used for scheduling or measuring progress. But if it's bad practice to have numerous lines of code, then we have a great and easy way to detect bad code. Then again, large projects will always have a huge number of LOC. Does that make them bad by default? No. So it isn't quite as easy as we thought.

Lines of code are not a useful measurement by themselves. They only provide useful information in combination with other metrics. Knowing that a project has 100,000 lines of code doesn't tell us anything. But if we know that 100,000 lines are spread over 50 classes and 500 methods, then we also know that we have an average method size of 200 lines. That's quite a lot, and gives us a good indication that object-oriented technology might be applied improperly. At the same time, there might be only a couple of really huge methods while others have a small number of LOC. In this case, our 200-line average doesn't draw an accurate picture. Fortunately, the information is still valuable, because we can now easily spot those methods and fix them.
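As a rough illustration, the following Python sketch computes the average method size and lists the methods that are far above it. It assumes the per-method LOC counts have already been collected into a dictionary; the `outlier_factor` threshold is a hypothetical parameter, not a value recommended in this chapter.

```python
from statistics import mean


def method_size_report(method_locs, outlier_factor=3.0):
    """Return (average LOC per method, methods larger than outlier_factor * average).

    method_locs maps a method name to its LOC count and must not be empty.
    The outlier list is sorted from the largest method to the smallest.
    """
    average = mean(method_locs.values())
    outliers = sorted(
        (name for name, loc in method_locs.items() if loc > outlier_factor * average),
        key=lambda name: method_locs[name],
        reverse=True,
    )
    return average, outliers
```

With 500 methods totaling 100,000 lines, the average is 200; the outlier list then points at the few huge methods that distort that average.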

Calculating the number of lines of code is easy. In a Visual FoxPro application, you analyze all PRGs, the method and property fields of VCX libraries, and the other metadata tables FoxPro uses, such as menus and reports. There is some controversy about how to rate specific application components, such as menus. Do you rate the MNX file, the generated MPR, or both? In this case, I encourage you to use only the MNX file, because the generation tool might change. Newer versions of the generator can create more lines of code, more comments, a different number of classes, and so on. Any measurement would then reflect either progress or a step back, even though no changes were made to the project. The same applies to other types of metadata that might be converted into some kind of source code, whether by something VFP uses internally or by an extension or tool written by yourself or a third party.
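The file-selection rule can be expressed as a simple filter. In this Python sketch the extension set is an assumption (PRGs, class and form libraries, menu and report tables); what it illustrates is that the generated MPR is skipped in favor of its MNX source. Reading the memo fields inside these tables requires a DBF reader, which is outside the scope of the sketch.

```python
from pathlib import Path

# Hypothetical policy: file types whose contents feed the LOC count.
# The .mnx menu table is measured; the generated .mpr program is not,
# so a new generator version cannot change the metric on its own.
MEASURED_EXTENSIONS = {".prg", ".vcx", ".scx", ".mnx", ".frx"}


def files_to_measure(paths):
    """Return the paths whose (case-insensitive) extension is measured, in input order."""
    return [p for p in paths if Path(p).suffix.lower() in MEASURED_EXTENSIONS]
```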

Even when you're simply counting every line of code, you shouldn't treat every line equally. Every program I've seen so far has quite a large number of blank lines. Obviously, these lines should be left out of your statistics. You should also count comment lines separately.

It gets a little more complex when it comes to class, method, property, and function definitions. I never count lines that define classes, such as DEFINE CLASS and ENDDEFINE. Counting these lines would give you different results for source code classes and visual class libraries, because VCX files contain no lines for the class definition. The same applies to ADD OBJECT, because this command is not used in VCX files. This is also true for method definitions: they aren't needed in the VCX file. However, a method definition can carry parameter information. Parameters are a crucial part of any application, and a parameter list in the definition line takes the place of the LPARAMETERS (or PARAMETERS) statement that the same method would contain in a VCX file, so leaving these lines out would again produce a wrong number. For this reason I recommend counting method definitions if they include a parameter list, such as this one:

FUNCTION Execute( lcId )

It is rather simple to figure out whether a method definition has a parameter list. If it does, the line must include an opening parenthesis followed by some regular text, namely the parameter names.
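The parenthesis check fits into a small regular expression. This Python sketch assumes definitions start with FUNCTION or PROCEDURE, optionally preceded by PROTECTED or HIDDEN; it ignores VFP's four-letter keyword abbreviations (such as FUNC), which a real counter would also have to recognize.

```python
import re

# FUNCTION/PROCEDURE <name> ( <anything> ), optionally PROTECTED/HIDDEN first.
_HEADER = re.compile(
    r"^\s*(?:(?:PROTECTED|HIDDEN)\s+)?(?:FUNCTION|PROCEDURE)\s+[\w.]+\s*\((.*)\)",
    re.IGNORECASE,
)


def has_parameter_list(line):
    """True if the definition line has parentheses with text (parameter names) inside."""
    match = _HEADER.match(line)
    return bool(match and match.group(1).strip())
```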

I apply the same logic for regular functions that are not members of an object.

I never count the line that ends a function or method definition (ENDFUNC or ENDPROC), because it is optional; different coding styles would therefore affect my measurements, and again, those lines don't exist in visual class libraries.
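The counting rules so far can be combined into one line classifier. In this Python sketch every physical line becomes "blank", "comment", "skip" (class definitions, ADD OBJECT, end-of-procedure lines, and definitions without a parameter list) or "code". Keyword abbreviations and semicolon continuations are deliberately left out to keep it short; treat it as an assumption-laden starting point, not a complete parser.

```python
import re
from collections import Counter

_STRUCTURAL = re.compile(
    r"^(?:DEFINE\s+CLASS\b|ENDDEFINE\b|ADD\s+OBJECT\b|ENDFUNC\b|ENDPROC\b)", re.IGNORECASE
)
_ANY_HEADER = re.compile(r"^(?:(?:PROTECTED|HIDDEN)\s+)?(?:FUNCTION|PROCEDURE)\b", re.IGNORECASE)
_PARAM_HEADER = re.compile(
    r"^(?:(?:PROTECTED|HIDDEN)\s+)?(?:FUNCTION|PROCEDURE)\s+[\w.]+\s*\((.*)\)", re.IGNORECASE
)


def classify_line(line):
    text = line.strip()
    if not text:
        return "blank"
    # Whole-line comments start with *, && or the NOTE command.
    if text.startswith(("*", "&&")) or re.match(r"NOTE\b", text, re.IGNORECASE):
        return "comment"
    if _STRUCTURAL.match(text):
        return "skip"
    if _ANY_HEADER.match(text):
        # Count a definition only if it carries a parameter list.
        match = _PARAM_HEADER.match(text)
        return "code" if match and match.group(1).strip() else "skip"
    return "code"


def count_lines(source):
    """Tally the classification of every physical line in a PRG's text."""
    return Counter(classify_line(line) for line in source.splitlines())
```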

Property definitions are often overlooked in visual classes. They are regular lines of code in source code classes, but they're stored in the property field in visual class libraries. Make sure you count these lines as well. You can simply go through the property field and count all its lines, because the properties are stored line by line as plain text (see Chapter 5).
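Counting the property lines is then a matter of splitting the memo text. This Python sketch assumes the contents of a VCX record's PROPERTIES memo field have already been read into a string with a DBF reader of your choice; empty lines are ignored.

```python
def count_property_lines(properties_memo):
    """Count non-empty lines in a PROPERTIES memo (one 'name = value' per line)."""
    # splitlines() handles the CR/LF line endings used in the memo field.
    return sum(1 for line in properties_memo.splitlines() if line.strip())
```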

Visual FoxPro's line structure is rather simple. Unlike some other languages, there is no way to put several program lines on one line of text. However, one program line can stretch across multiple text lines, with a semicolon at the end of each line except the last. If you find a semicolon at the end of a line, do not count the next line separately, because it belongs to the same program line.
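One way to apply the semicolon rule is to join continued lines before counting anything. This Python sketch merges every physical line that ends in a semicolon with the following line, so each entry of the result is one program line.

```python
def logical_lines(source):
    """Join physical lines ending in ';' into single program lines."""
    result, pending = [], []
    for raw in source.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith(";"):
            pending.append(stripped[:-1])  # drop the continuation semicolon
            continue
        pending.append(stripped)
        result.append(" ".join(part.strip() for part in pending))
        pending = []
    if pending:  # source ended in the middle of a continued line
        result.append(" ".join(part.strip() for part in pending))
    return result
```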

Line complexity

Counting the lines of code as described above gives you a baseline number that you can use for various other metrics. However, there are a lot of additional issues to consider if you want to come up with a number that has some real statistical value. If you only want to see how your project evolves, and you don't change your staff of programmers, then the methods described above should be sophisticated enough. However, if you want results you can compare to other projects, or if you want to compare the statistics for various programmers, then you need to eliminate factors such as personal coding style and the like.

A good (but non-trivial) way to do this is to determine the complexity of each line and rate it. This eliminates, for instance, the difference between cascaded IF statements and IF statements with multiple expressions. To rate a line of code, you have to count the complexity indicators in each line individually. Indicators include the number of method and function calls, operators such as =, $, #, AND, and OR, as well as the main purpose of each line.

The purpose matters because the same indicators carry different weight in different commands. A LOCATE line with two function calls is not necessarily twice as complex as a LOCATE with a single function call or only expressions; it might simply be a LOCATE statement with compound search criteria. So no matter how complex a LOCATE might be, I always count it as one line of code. With IF statements, however, a line with two function calls or expressions usually is twice as complex as a line with only one expression. Such a line is at least equivalent to cascaded IF statements, which have at least twice as many lines of code: they need additional ENDIF statements, and very often they contain redundant code because ELSE branches might be needed more than once. For this reason I add a factor of 1.5 for every additional expression or function call I find in an IF line. So an IF statement with one function call counts as one line, an IF line with two function calls counts as 2.5 lines, one with three function calls counts as four lines, and so forth. The REPLACE command offers yet another variety, because it can be used to replace multiple values.
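The IF rule works out to a weight of 1 + 1.5 × (n − 1) for n expressions. The Python sketch below applies it, with one simplifying assumption that differs from the text: it treats the terms joined by AND or OR as the expressions, so nested function calls inside one term are not counted separately, and AND/OR inside string literals would be miscounted.

```python
import re

# Terms joined by AND / OR count as separate expressions (simplification).
_CONNECTIVE = re.compile(r"\b(?:AND|OR)\b", re.IGNORECASE)
_IF_KEYWORD = re.compile(r"^\s*IF\b", re.IGNORECASE)


def count_expressions(if_line):
    """Number of AND/OR-joined terms in the condition, at least 1."""
    condition = _IF_KEYWORD.sub("", if_line, count=1)
    terms = [t for t in _CONNECTIVE.split(condition) if t.strip()]
    return max(1, len(terms))


def if_line_weight(if_line):
    """1 line for the first expression plus 1.5 for every additional one."""
    return 1 + 1.5 * (count_expressions(if_line) - 1)
```

So `IF EMPTY(lcId)` weighs 1.0, adding `AND USED('customer')` raises it to 2.5, and a third term brings it to 4.0, matching the progression described above.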

Many different commands in Visual FoxPro need special treatment. Addressing each of them individually would be impossible, so I recommend defining a few command groups. Lines that are always counted as one line, such as LOCATE, should be in one group. Another group might contain all commands that have the complexity of multiple lines, such as REPLACE. Yet another group would contain lines with more complex statements, such as IF, CASE, and DO WHILE.
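A grouping like this could be sketched as a small dispatcher. The groups below are assumptions for illustration: conditional commands (IF, CASE, DO WHILE) use the 1.5 factor per additional AND/OR term, REPLACE counts one line per `WITH` clause as a hypothetical stand-in for "one line per replaced value", and everything else, LOCATE included, counts as one line.

```python
import re

_CONDITIONAL = re.compile(r"^(?:DO\s+WHILE|IF|CASE)\b(.*)$", re.IGNORECASE)
_REPLACE = re.compile(r"^REPLACE\b", re.IGNORECASE)
_CONNECTIVE = re.compile(r"\b(?:AND|OR)\b", re.IGNORECASE)
_WITH = re.compile(r"\bWITH\b", re.IGNORECASE)


def rate_line(line):
    """Weighted LOC for one program line, based on its command group."""
    text = line.strip()
    match = _CONDITIONAL.match(text)
    if match:  # conditional group: 1 + 1.5 per additional expression
        terms = [t for t in _CONNECTIVE.split(match.group(1)) if t.strip()]
        return 1 + 1.5 * (max(1, len(terms)) - 1)
    if _REPLACE.match(text):  # multi-value group: one line per replaced field
        return float(max(1, len(_WITH.findall(text))))
    return 1.0  # single-line group: LOCATE and all other commands
```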

Considering all these factors gives you a more accurate picture that can eliminate the differences in coding styles. Unfortunately, you won't be able to come up with a perfect algorithm to eliminate all differences. But every little step toward more accurate and objective results is a good and valuable one.

Code complexity is measured in many ways. People often count lines of code and derive a separate complexity factor. I prefer the method described above. It doesn't really express complexity as such; instead, it gives you a weighted, and therefore larger, number of lines of code. This might seem incorrect at first (and in many ways it might be), but I think it is a lot easier to handle, and it also allows you to compare source code from different programmers without the result being influenced by personal coding style.

Knowing about the code complexity can also give you some interesting hints about personal preferences of each programmer. You can see who likes to impress himself by creating complex lines, and who wants to give others a chance to read and understand his code, by keeping it simple unless there is a good reason for complexity.

Percentage of comment lines

Adding comments to source code is especially important in object-oriented applications. While you are counting lines of code, you can also check whether everybody is documenting his code properly. Every time I see code with less than 10% comment lines, I get suspicious.
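The 10% check is simple arithmetic once both counts exist. This Python sketch assumes the percentage is taken over all non-blank lines (code plus comments); the threshold parameter defaults to the 10% mentioned above.

```python
def comment_percentage(code_lines, comment_lines):
    """Comment lines as a percentage of all non-blank lines (0.0 if there are none)."""
    total = code_lines + comment_lines
    return 100.0 * comment_lines / total if total else 0.0


def looks_underdocumented(code_lines, comment_lines, threshold=10.0):
    """True if comments make up less than `threshold` percent of the lines."""
    return comment_percentage(code_lines, comment_lines) < threshold
```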

But again, differences in coding style make counting comment lines a little difficult. One programmer might like to use separate lines for comments, while another might add comments to regular lines using the && delimiter. The programmer who uses separate comment lines might prefer short lines and wrap comments over multiple lines. Yet another might write comment lines several hundred characters long. I try to compensate for all those differences by counting lines with attached comments (using &&) as two lines. When I encounter monster comment lines, I count everything longer than 80 characters as multiple lines.
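Both compensations can be folded into one weighted comment count. In this Python sketch a comment counts as ceil(length / 80) lines; for a code line with an attached && comment, only the comment part is counted here, on the assumption that the code part is already recorded by the LOC counter, so the physical line counts twice overall. The sketch also assumes && never appears inside a string literal.

```python
import math
import re


def weighted_comment_lines(source, width=80):
    """Weighted number of comment lines in a PRG's text."""
    total = 0
    for raw in source.splitlines():
        text = raw.strip()
        if text.startswith(("*", "&&")) or re.match(r"NOTE\b", text, re.IGNORECASE):
            comment = text  # whole-line comment
        elif "&&" in text:
            comment = text.split("&&", 1)[1].strip()  # comment attached to code
        else:
            continue  # blank or pure code line
        # Long comments count as several lines of `width` characters each.
        total += max(1, math.ceil(len(comment) / width))
    return total
```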

If you use naming conventions, you can go even further and filter out certain types of comments. Some might be temporarily disabled code. Those lines should either be ignored altogether, or they should be counted separately. Other comments could be function headers, documented changes, or just regular comments.
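Assuming comment prefixes follow a team convention, the filter could look like this Python sketch. Only the `*!*` prefix, which VFP's editor inserts when you comment out a block, is a real-world default; the `**` (header) and `*#` (change log) prefixes are hypothetical examples. Only `*` comment lines are examined.

```python
from collections import Counter

# Hypothetical naming convention; order matters, first matching prefix wins.
PREFIXES = [
    ("*!*", "disabled_code"),
    ("**", "header"),
    ("*#", "change_log"),
]


def classify_comments(source):
    """Count '*' comment lines by kind according to PREFIXES."""
    counts = Counter()
    for raw in source.splitlines():
        text = raw.strip()
        if not text.startswith("*"):
            continue
        for prefix, kind in PREFIXES:
            if text.startswith(prefix):
                counts[kind] += 1
                break
        else:  # no special prefix matched
            counts["regular"] += 1
    return counts
```

Disabled code can then be excluded from the comment percentage, while headers and change logs are reported on their own.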

Percentage of procedural code

Visual FoxPro is a hybrid language that allows combining object-oriented features with traditional procedural code. This is a great feature, but because the programmer is not forced to use object-oriented technology (as he is in languages like Smalltalk), it allows him to apply non-object-oriented techniques in scenarios where proper object design is desired.

Finding procedural code is easy: count every line that is not within the boundaries of a class definition.
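For PRG files, that rule can be sketched by tracking whether the scanner is between DEFINE CLASS and ENDDEFINE. This Python sketch skips blank lines and * or && comments, does not count the class-boundary lines themselves (consistent with the LOC rules above), and assumes code from VCX method fields is counted as class code elsewhere.

```python
import re

_DEFINE = re.compile(r"^DEFINE\s+CLASS\b", re.IGNORECASE)
_ENDDEFINE = re.compile(r"^ENDDEFINE\b", re.IGNORECASE)


def procedural_share(source):
    """Return (procedural_lines, total_lines) for the code lines of one PRG."""
    inside_class = False
    procedural = total = 0
    for raw in source.splitlines():
        text = raw.strip()
        if not text or text.startswith(("*", "&&")):
            continue  # blank lines and comments are not code
        if _DEFINE.match(text):
            inside_class = True
            continue
        if _ENDDEFINE.match(text):
            inside_class = False
            continue
        total += 1
        if not inside_class:
            procedural += 1
    return procedural, total
```

The percentage is then `100 * procedural / total`, which can be compared with the rule of thumb for the whole application.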

A large percentage of procedural code is usually a sign of insufficient knowledge or poor acceptance of object-oriented technology. Unfortunately, making that judgment is not quite that easy. Keep in mind that Visual FoxPro applications will always require some procedural code, such as the main program and menus. For this reason, you need to examine where the procedural code is located and what it is used for. But as a rule of thumb, good object-oriented applications should contain less than 1% procedural code.