12.1 Fuzzy Logic

You can use an approach known as fuzzy logic to estimate a project's size in lines of code (Putnam and Myers 1992, Humphrey 1995). Estimators are usually capable of classifying features as Very Small, Small, Medium, Large, or Very Large. You can then use historical data about how many lines of code the average feature in each category requires to compute the total lines of code. Table 12-1 shows an example of how such an estimate might be created.

Table 12-1: Example of Using Fuzzy Logic to Estimate a Program's Size
Feature Size	Average Lines of Code per Feature	Number of Features	Estimated Lines of Code
Very Small	127	22	2,794
Small	253	15	3,795
Medium	500	10	5,000
Large	1,014	30	30,420
Very Large	1,998	27	53,946
TOTAL	-	104	95,955

The entries in the Average Lines of Code per Feature column in the table should be based on your organization's historical data and are fixed before the estimation begins. The Number of Features column is a count of how many features you have classified into each size category. The Estimated Lines of Code column is computed from the other two columns. As shown, the estimate has 5 significant digits, which is well beyond the accuracy of the underlying numbers. If I were presenting this estimate, I would present it as "96,000 lines of code" or even "100,000 lines of code" (that is, to one or two significant digits) to avoid using too much precision and conveying a false sense of accuracy.
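
The arithmetic behind Table 12-1 can be sketched in a few lines of Python. This is only an illustration using the table's example numbers; the function names `estimate_total_loc` and `round_to_significant` are hypothetical helpers introduced here, not part of any tool described in the text.

```python
import math

# Illustrative averages and counts copied from Table 12-1 (not real calibration data).
AVERAGE_LOC = {"Very Small": 127, "Small": 253, "Medium": 500,
               "Large": 1014, "Very Large": 1998}
FEATURE_COUNTS = {"Very Small": 22, "Small": 15, "Medium": 10,
                  "Large": 30, "Very Large": 27}


def estimate_total_loc(average_loc, feature_counts):
    """Multiply each category's average size by its feature count and sum."""
    return sum(average_loc[size] * count for size, count in feature_counts.items())


def round_to_significant(value, digits):
    """Round a number to the given count of significant digits."""
    if value == 0:
        return 0
    magnitude = math.floor(math.log10(abs(value)))
    factor = 10 ** (magnitude - digits + 1)
    return round(value / factor) * factor


total = estimate_total_loc(AVERAGE_LOC, FEATURE_COUNTS)
print(total)                           # 95955
print(round_to_significant(total, 2))  # 96000
print(round_to_significant(total, 1))  # 100000
```

Rounding is kept as a separate step so that the raw total stays available for checking, while only the rounded figure is presented.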

How to Get the Average Size Numbers

Fuzzy logic works best when the sizes are calibrated from your organization's historical data. As a rule of thumb, the differences in size between adjacent categories should be at least a factor of 2. Some experts recommend a factor of 4 difference (Putnam and Myers 1992).

You should create the initial size averages by classifying work from one or more completed systems. Go through each past system and classify each feature as Very Small, Small, Medium, Large, or Very Large. Then, for each classification, total the lines of code for its features and divide that total by the number of features to arrive at the average lines of code per feature. Table 12-2 shows an example of how this might work out.

Table 12-2: Example of Creating Average LOC Numbers
Size	Number of Features	Count of Total LOC	Average LOC
Very Small	117	14,859	127
Small	71	17,963	253
Medium	56	28,000	500
Large	169	171,366	1,014
Very Large	119	237,762	1,998

The numbers in this table are purely for purposes of illustration. You should work out your own numbers by using your own organization's historical data.
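
As a minimal sketch of that calibration step, the following Python computes the averages in Table 12-2 and also reports the ratio between adjacent categories, which the rule of thumb above suggests should be about 2 or more. The data is the illustrative data from the table, and the names `HISTORICAL`, `average_loc_per_category`, and `adjacent_ratios` are assumptions made for this example.

```python
# (number of features, total LOC) per category, copied from Table 12-2.
# These values are illustrative; substitute your own historical counts.
HISTORICAL = {
    "Very Small": (117, 14859),
    "Small": (71, 17963),
    "Medium": (56, 28000),
    "Large": (169, 171366),
    "Very Large": (119, 237762),
}


def average_loc_per_category(historical):
    """Divide each category's total LOC by its feature count."""
    return {size: round(total_loc / count)
            for size, (count, total_loc) in historical.items()}


def adjacent_ratios(averages):
    """Ratio of each category's average to the next smaller category's average."""
    values = list(averages.values())  # relies on smallest-to-largest insertion order
    return [larger / smaller for smaller, larger in zip(values, values[1:])]


averages = average_loc_per_category(HISTORICAL)
print(averages)
for ratio in adjacent_ratios(averages):
    print(f"adjacent-category ratio: {ratio:.2f}")
```

With the illustrative numbers, every adjacent ratio comes out within a few percent of 2. With real data, a ratio well below 2 would suggest that two categories are too close together to be distinguished reliably.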

Tip #55

Use fuzzy logic to estimate program size in lines of code.

How to Classify New Functionality

When assigning new functionality to size categories, it's important that the assumptions about what constitutes a Very Small, Small, Medium, Large, or Very Large feature in the estimate are the same as the assumptions that went into creating the average sizes in the first place. You can accomplish this in any of three ways:

Have the same people who are going to create the estimate create the original numbers for the sizes.
Train the estimators so that they classify features accurately.
Document the specific criteria for Very Small, Small, Medium, Large, and Very Large so that estimators can apply the size categories consistently.

How Not to Use Fuzzy Logic

One interesting aspect of statistics is that statistical summaries can have more validity than any of the individual data points that make up the summary. As discussed in Chapter 10, "Decomposition and Recomposition," the Law of Large Numbers gives the rolled-up estimate an accuracy above and beyond the accuracy of the individual estimates. The whole is truly greater than the sum of its parts.

When using fuzzy logic, it's important to remember this phenomenon: the rolled-up number has a validity that the underlying numbers do not have. The reason fuzzy logic works is that we can safely assume that if 71 Small features required an average of 253 lines of code in the past, 15 Small features will probably require an average of approximately 253 lines of code in the future. However, the fact that the average is 253 lines of code does not mean that any specific feature will actually consist of 253 lines of code. The sizes of individual Small features could range from 50 lines of code to 1,000 lines of code. So, although the rolled-up estimate produced by fuzzy logic can be surprisingly accurate, you should not overextend the technique to make estimates of sizes of specific features.

By the same token, the fuzzy logic approach works well when you have about 20 features or more. If you don't have at least 20 total features to estimate, the statistics of this approach won't work properly, and you should look for another method.
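
To see why the number of features matters, consider a rough sketch of how the relative uncertainty of a rolled-up total shrinks as features are added. It rests on two assumptions that are not from the text: feature sizes vary independently of one another, and individual Small features have a standard deviation of 200 lines of code (a hypothetical value chosen only for illustration). Under those assumptions, the standard deviation of the total grows with the square root of the feature count while the total itself grows linearly.

```python
import math

MEAN_LOC = 253         # average Small feature, from the illustrative tables
ASSUMED_STD_LOC = 200  # hypothetical spread of individual features; not from the text


def relative_error_of_total(n_features, mean=MEAN_LOC, std=ASSUMED_STD_LOC):
    """Relative standard deviation of the summed size of n independent features."""
    total_mean = n_features * mean
    total_std = math.sqrt(n_features) * std
    return total_std / total_mean


for n in (1, 5, 20, 100):
    print(f"{n:>3} features: relative uncertainty about {relative_error_of_total(n):.0%}")
```

With these assumed values, the relative uncertainty falls from roughly 79 percent for a single feature to roughly 18 percent for 20 features. The sketch ignores other error sources, such as misclassified features or averages that no longer match current work, so treat it as an illustration of the trend rather than a prediction of estimate accuracy.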

Extensions of Fuzzy Logic

Fuzzy logic can also be used to estimate effort if you have the underlying data to support it. Table 12-3 shows an example of how that would work.

Table 12-3: Example of Using Fuzzy Logic to Estimate Effort
Size	Average Staff Days per Feature	Number of Features	Estimated Effort (Staff Days)
Very Small	4.2	22	92.4
Small	8.4	15	126
Medium	17	10	170
Large	34	30	1,020
Very Large	67	27	1,809
TOTAL	-	104	3,217

The numbers shown in the table are purely for purposes of illustration, and you would need to derive your own Average Staff Days per Feature from your organization's historical data.

The final estimate of 3,217 staff days is again too precise. You could simplify it to 3,200 staff days, 3,000 staff days, or 13 staff years (assuming 250 staff days per year). You can also always consider presenting the number as a range, such as 10 to 15 staff years, which would communicate an entirely different accuracy than would 3,217 staff days.
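
The same roll-up can be sketched for effort. The snippet below uses the illustrative values from Table 12-3 and the 250-staff-days-per-year assumption stated above; the variable names are hypothetical and chosen only for this example.

```python
# Illustrative staff-day averages and feature counts copied from Table 12-3.
AVERAGE_STAFF_DAYS = {"Very Small": 4.2, "Small": 8.4, "Medium": 17,
                      "Large": 34, "Very Large": 67}
FEATURE_COUNTS = {"Very Small": 22, "Small": 15, "Medium": 10,
                  "Large": 30, "Very Large": 27}
STAFF_DAYS_PER_YEAR = 250  # conversion factor assumed in the text

total_staff_days = sum(AVERAGE_STAFF_DAYS[size] * count
                       for size, count in FEATURE_COUNTS.items())
staff_years = total_staff_days / STAFF_DAYS_PER_YEAR

print(f"Raw total:         {total_staff_days:.1f} staff days")
print(f"Nearest hundred:   {round(total_staff_days, -2):.0f} staff days")
print(f"Nearest thousand:  {round(total_staff_days, -3):.0f} staff days")
print(f"In staff years:    about {round(staff_years)}")
```

The unrounded sum is 3,217.4 staff days, which the table shows as 3,217. Choosing a range such as 10 to 15 staff years remains a judgment call that the arithmetic cannot make for you.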