12.3 Story Points

Another variation on fuzzy logic is story points, which were originally associated with Extreme Programming (Cohn 2006). The technique is similar to fuzzy logic, but there are some interesting and useful variations that make story points worth discussing separately.

When using story points, the team reviews the list of stories (or requirements or features) it is considering building and assigns a size to each story. In this sense, story points are similar to fuzzy logic, except that the stories are normally assigned a numeric value from one of the scales shown in Table 12-8.

Table 12-8: Most Common Story Point Scales
Story Point Scale	Specific Points on the Scale
Powers of 2	1, 2, 4, 8, 16
Fibonacci sequence	1, 2, 3, 5, 8, 13

This estimation activity produces a list like the one shown in Table 12-9.

Table 12-9: Example of List of Stories and Assigned Story Points
Story	Points
Story 1	2
Story 2	1
Story 3	4
Story 4	8
…
Story 60	2
TOTAL	180

At this stage of their use, the story points are not terribly useful because they are a unitless measure—they don't translate into any specific number of lines of code, number of staff days, or calendar time. The critical idea behind story points is that the team has estimated all the stories at the same time, using the same scale, and in a way that is substantially free from bias.

Next, the team will plan an iteration, including planning to deliver some number of story points. The plan might be based on an assumption that a story point translates to a specific amount of effort, but that is just an assumption at that early point in the project.

After the iteration has been completed, the team has real data to estimate from. The team can look at how many story points it delivered, how much effort it expended, and how much calendar time elapsed, and it can then make a preliminary calibration of how story points translate to effort and calendar time. This delivery rate is often called velocity. Table 12-10 shows an example of this.

Table 12-10: Data from Iteration 1 and Initial Calibration
Data for Iteration 1
27 story points delivered
12 staff weeks expended
3 calendar weeks expended
Preliminary Calibration
Effort = 27 story points ÷ 12 staff weeks = 2.25 story points/staff week
Schedule = 27 story points ÷ 3 calendar weeks = 9 story points/calendar week
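
The preliminary calibration amounts to two divisions. A minimal Python sketch follows; the function name `calibrate` and its parameter names are illustrative and do not come from the original text.

```python
def calibrate(points_delivered, staff_weeks, calendar_weeks):
    """Return (story points per staff week, story points per calendar week)."""
    effort_rate = points_delivered / staff_weeks
    schedule_rate = points_delivered / calendar_weeks
    return effort_rate, schedule_rate


# Iteration 1 data from Table 12-10
effort_rate, schedule_rate = calibrate(27, 12, 3)
print(effort_rate)    # 2.25
print(schedule_rate)  # 9.0
```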

This initial calibration allows the project manager to make a historical-data-based estimate of the remainder of the project, as shown in Table 12-11.

Table 12-11: Initial Projection for Remainder of Project
Assumptions (from Preliminary Calibration)
Effort = 2.25 story points/staff week
Schedule = 9 story points/calendar week
Project size = 180 story points
Preliminary Whole-Project Estimate
Effort = 180 story points ÷ 2.25 story points/staff week = 80 staff weeks
Schedule = 180 story points ÷ 9 story points/calendar week = 20 calendar weeks
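
The whole-project projection can be sketched the same way: divide the total size by each calibrated rate. The function name `project_totals` is hypothetical, and the sketch assumes, as the table does, that the rates from the first iteration hold for the rest of the project.

```python
def project_totals(total_points, points_per_staff_week, points_per_calendar_week):
    """Return (effort in staff weeks, schedule in calendar weeks)."""
    effort = total_points / points_per_staff_week
    schedule = total_points / points_per_calendar_week
    return effort, schedule


# Project size and rates from the preliminary calibration
effort_staff_weeks, schedule_calendar_weeks = project_totals(180, 2.25, 9)
print(effort_staff_weeks)       # 80.0
print(schedule_calendar_weeks)  # 20.0
```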

Of course, the computations in Table 12-11 assume that the team will remain the same in future iterations, and the projection doesn't account for the planning considerations of holidays, vacations, and so on. But on iterative projects, it does provide for very early projections of whole-project outcomes based on historical data from the same project.

The initial whole-project estimates should be refined based on data from later iterations. The shorter your iterations are, the sooner you'll have data you can use to estimate the rest of the project and the more confident you can be in those estimates.
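
One possible way to refine the estimate is to pool the data from all completed iterations and recompute the rates. The text does not prescribe a refinement method, so pooling is an assumption here, and the Iteration 2 figures below are invented purely for illustration.

```python
def cumulative_calibration(iterations):
    """Pool all completed iterations into one pair of rates.

    Each iteration is a tuple: (story_points, staff_weeks, calendar_weeks).
    """
    points = sum(it[0] for it in iterations)
    staff_weeks = sum(it[1] for it in iterations)
    calendar_weeks = sum(it[2] for it in iterations)
    return points / staff_weeks, points / calendar_weeks


TOTAL_POINTS = 180  # project size from Table 12-9

iteration_1 = (27, 12, 3)  # from Table 12-10
iteration_2 = (33, 12, 3)  # HYPOTHETICAL values, not from the text

for completed in ([iteration_1], [iteration_1, iteration_2]):
    effort_rate, schedule_rate = cumulative_calibration(completed)
    print(len(completed),
          TOTAL_POINTS / effort_rate,    # whole-project staff weeks
          TOTAL_POINTS / schedule_rate)  # whole-project calendar weeks
```

With only Iteration 1, this reproduces the 80 staff weeks and 20 calendar weeks of Table 12-11. With the hypothetical second iteration added, the projection shifts to 72 staff weeks and 18 calendar weeks.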

Tip #57

Use story points to obtain an early estimate of an iterative project's effort and schedule that is based on data from the same project.

Cautions About Ratings Scales

Fuzzy logic uses a verbal scale of Very Small, Small, Medium, Large, and Very Large. Story points use a scale based on powers of 2 or Fibonacci numbers. Which is better?

On a numeric scale, the ratios between the numbers on the scale suggest that the underlying quantities being measured bear a proportionate relationship. If your story points scale is a Fibonacci sequence, a scale of 1, 2, 3, 5, 8, 13 suggests that a story of 5 points will take 5/3 as much effort as a story of 3 points. It suggests that a story of 13 points will take more than 4 times as much effort as a story of 3 points.
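
The ratios a scale implies can be listed directly. In this small sketch, the helper name `implied_ratios` is illustrative only.

```python
def implied_ratios(scale, base):
    """Map each point value to the effort ratio it implies relative to `base`."""
    return {points: points / base for points in scale}


fibonacci_scale = [1, 2, 3, 5, 8, 13]
ratios = implied_ratios(fibonacci_scale, base=3)

print(round(ratios[5], 2))   # 1.67, that is, 5/3
print(round(ratios[13], 2))  # 4.33, that is, more than 4 times
```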

These relationships turn out to be a double-edged sword. If the necessary care is taken to ensure that stories classified as 13 points really are about 4 times as much effort as stories classified as 3 points, that's great. That means you can compute an average effort per story point (as described earlier), multiply the total number of story points by the average, and get a meaningful result (also as described earlier).

Accomplishing this level of accuracy requires that great discipline be exercised in assigning story points to stories. It also requires checking actual project data to ensure that the ratios that are estimated are the ratios actually found in practice.

If care is not taken to ensure that the underlying numeric ratios implied by the Fibonacci sequence or by the powers of 2 are accurate, numeric story points have the potential to lead to computed results that are less valid than they appear. The use of a numeric scale implies that you can perform numeric operations on the numbers: multiplication, addition, subtraction, and so on. But if the underlying relationships aren't valid—that is, a story worth 13 points doesn't really require 13/3 as much effort as a story worth 3 points—then performing numeric operations on the "13" isn't any more valid than performing a numeric operation on "Large" or "Very Large."

Table 12-12 illustrates another way of describing this issue.

Table 12-12: Example of What Can Happen with a Numeric Scale That Isn't as Numeric as It Appears
Story Point Classification	Number of Stories	Apparent Story Points	Intended Ratio	Actual Ratio (from Data)	Real Story Points
"1"	4	"4"	1	1	4
"2"	7	"14"	2	2.5	18
"3"	5	"15"	3	3	15
"5"	5	"25"	5	7	35
"8"	12	"96"	8	11	132
"13"	2	"26"	13	17	34
TOTAL	35	"180"	-	-	238

In this example, the misleading numeric scale led us to believe that 180 points was a reasonable approximation of our total effort, but the real effort is about 30% higher.
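
The totals in Table 12-12 can be checked with a short sketch. The ratio for class "1" is taken as 1, which is what that row's 4 real story points for 4 stories implies. The variable names are illustrative.

```python
# (story point classification, number of stories, actual ratio from data)
rows = [
    (1, 4, 1),
    (2, 7, 2.5),
    (3, 5, 3),
    (5, 5, 7),
    (8, 12, 11),
    (13, 2, 17),
]

stories = sum(count for _, count, _ in rows)
apparent = sum(points * count for points, count, _ in rows)
real = sum(count * ratio for _, count, ratio in rows)
understatement = real / apparent - 1

print(stories)   # 35
print(apparent)  # 180
print(real)      # 237.5 (the table rounds the "2" row from 17.5 to 18, giving 238)
print(f"{understatement:.0%}")  # 32%
```

The unrounded understatement is roughly 32%, consistent with the "about 30%" figure above.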

Tip #58

Exercise caution when calculating estimates that use numeric ratings scales. Be sure that the numeric categories in the scale actually work like numbers, not like verbal categories such as small, medium, and large.