Spread
Common measures of spread include range, variance, and standard deviation.
Variation is often depicted graphically with a frequency plot or histogram (see p. 111).
Range is the difference between the largest and smallest values in a data set.
The Min is the smallest value in a data set
The Max is the largest value in a data set
The Range is the difference between the Max and the Min
Ex: Here are ten ages in
Min = 32, Max = 44
Range = Max − Min = 44 − 32 = 12
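The same calculation can be sketched in a few lines of Python. The ten ages below are hypothetical stand-ins (not the original data set), chosen only so that the Min and Max match the example above.

```python
# Hypothetical ages (NOT the book's data), chosen so that Min = 32 and Max = 44.
ages = [32, 33, 35, 36, 37, 38, 40, 41, 43, 44]

smallest = min(ages)             # the Min
largest = max(ages)              # the Max
data_range = largest - smallest  # Range = Max - Min

print(f"Min = {smallest}, Max = {largest}, Range = {data_range}")
# prints: Min = 32, Max = 44, Range = 12
```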
Variance tells you how far, overall, the data values lie from the mean.
Calculate the mean of all the data points, Xbar
Calculate the difference between each data point and the average (X_i − Xbar)
Square those figures for all data points
This ensures that you'll always be dealing with a non-negative number, so differences above and below the mean cannot cancel each other out
Add the squared values together (a value called the sum of squares in statistics)
Divide that total by n-1 (the number of data values minus 1)
Note that the equation above
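As a rough illustration, the variance steps listed above can be written out one at a time in Python. The function name `sample_variance` and the cycle-time values are hypothetical, introduced only for this sketch; the result is cross-checked against the standard library's `statistics.variance`, which also divides by n − 1.

```python
import statistics


def sample_variance(values):
    """Sample variance, following the listed steps one at a time."""
    n = len(values)
    mean = sum(values) / n                        # mean of all data points (Xbar)
    deviations = [x - mean for x in values]       # X_i - Xbar for each point
    squared = [d ** 2 for d in deviations]        # square each difference
    sum_of_squares = sum(squared)                 # the "sum of squares"
    return sum_of_squares / (n - 1)               # divide by n - 1


# Hypothetical cycle times in minutes (illustrative only).
cycle_times = [4.0, 6.0, 5.0, 7.0, 8.0]

print(sample_variance(cycle_times))       # 2.5 (in "minutes squared")
print(statistics.variance(cycle_times))   # same value from the standard library
```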
Though more people are familiar with standard deviation (see below), variance has one big advantage: variances are additive, while standard deviations are not. That means, for example, that the total variance for a process can be determined by adding together the variances for all the process steps (this holds when the steps vary independently of one another).
So to calculate a standard deviation for an entire process, first calculate the variances for each process step, add those variances together, then take the square root. Do not add together the standard deviations of each step.
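A small numeric sketch may make the point concrete. The three step standard deviations below are hypothetical, and the calculation assumes the steps vary independently of one another.

```python
import math

# Hypothetical standard deviations (in minutes) for three process steps.
# Assumption: the steps vary independently of one another.
step_std_devs = [3.0, 4.0, 12.0]

# Convert each standard deviation to a variance, add, then take the square root.
step_variances = [s ** 2 for s in step_std_devs]   # 9, 16, 144
total_variance = sum(step_variances)                # 169
process_std_dev = math.sqrt(total_variance)         # 13

# The tempting shortcut of adding standard deviations overstates the spread.
naive_sum = sum(step_std_devs)                      # 19

print(f"Process standard deviation: {process_std_dev}")      # 13.0
print(f"Sum of step standard deviations (wrong): {naive_sum}")  # 19.0
```

Here the correct combined standard deviation is 13 minutes, not the 19 minutes obtained by adding the step standard deviations directly.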
A drawback to using variance is that it is not in the same units of measure as the data points. Ex: for cycle times, the variance would be in units of "minutes squared," which doesn't make logical sense.
Think of standard deviation as the "average distance from each data point to the mean." Calculate the standard deviation for a sample or population by doing the same steps as for the variance, then simply taking the square root. Here's how the equation would look for the 10 ages listed on the previous page:
Just as with variance, the standard deviation of a population is denoted with sigma instead of "s", as shown here:
The standard deviation is a handy measure of variability because it is stated in the same units as the data points. But as noted above, you CANNOT add standard deviations together to get a combined standard deviation for multiple process steps. If you want an indication of spread for a process overall, add together the variances for each step then take the square root.
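The sketch below shows the sample standard deviation (s, dividing by n − 1) next to the population standard deviation (sigma, dividing by n). The ten ages are hypothetical stand-ins, not the original data set, and the manual results are compared with the standard library's `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

# Hypothetical ages (NOT the book's data), used only for illustration.
ages = [32, 33, 35, 36, 37, 38, 40, 41, 43, 44]

n = len(ages)
mean = sum(ages) / n
sum_of_squares = sum((x - mean) ** 2 for x in ages)

# Same steps as for the variance, followed by a square root.
s = math.sqrt(sum_of_squares / (n - 1))   # sample standard deviation
sigma = math.sqrt(sum_of_squares / n)     # population standard deviation

# Cross-check the manual results against the standard library.
assert math.isclose(s, statistics.stdev(ages))
assert math.isclose(sigma, statistics.pstdev(ages))

print(f"s = {s:.2f} years, sigma = {sigma:.2f} years")
```

Both results are in years, the same unit as the data, which is what makes the standard deviation easier to interpret than the variance.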
Boxplots, or box-and-whisker diagrams, give a quick look at the distribution of a set of data
They provide an instant picture of variation and some insight into strategies for finding what caused the variation
They allow easy comparison of multiple data sets
Boxplots are typically provided as output from statistical packages such as Minitab (you will rarely construct one by hand)
The "box" shows the range of data values comprising 50% of the data set (the 2nd and 3rd quartiles)
The line that divides the box shows the median (see definition on p. 107)
Single-line "whiskers" extend below and above the box (or to the left and right, if the box is horizontal), showing the width of the 1st and 4th quartiles
Data values that fall far from other data values in the set are plotted separately and labeled as outliers
Often, outliers reflect errors in recording data
If the data value is real, you should investigate what was going on in the process at the time
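Although a statistical package will normally draw the boxplot for you, the numbers behind it can be sketched in Python. The function name `boxplot_summary` and the data are hypothetical. The 1.5 × interquartile-range cutoff for flagging outliers is a common convention assumed here, not something stated above, and packages such as Minitab may compute quartiles slightly differently.

```python
import statistics


def boxplot_summary(values, whisker_factor=1.5):
    """Return the numbers a boxplot is drawn from.

    Assumption: points more than whisker_factor * IQR beyond the box
    are treated as outliers (a common convention, not the only one).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                                  # height of the "box"
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # end of the lower whisker
        "whisker_high": max(inside),   # end of the upper whisker
        "outliers": outliers,          # plotted separately
    }


# Hypothetical ages with one suspicious value (75) added for illustration.
ages = [32, 33, 35, 36, 37, 38, 40, 41, 43, 44, 75]
summary = boxplot_summary(ages)
print(summary)
```

With this data the value 75 is reported as an outlier, which is the cue to check whether it is a recording error or a real event worth investigating.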