
Measures of spread (range, variance, standard deviation)



Highlights

  • Spread tells us how the data are distributed around the center point. A lot of spread = high variation.

  • Common measures of spread include range, variance, and standard deviation.

  • Variation is often depicted graphically with a frequency plot or histogram (see p. 111).

Range

Range is the difference between the largest and smallest values in a data set.

  • The Min is the smallest value in a data set

  • The Max is the largest value in a data set

  • The Range is the difference between the Max and the Min

    • Ex: Here are ten ages in ascending order: 32, 33, 34, 34, 35, 37, 37, 39, 41, 44

    • Min = 32, Max = 44

    • Range = Max - Min = 44 - 32 = 12
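The range example above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not from the original text.

```python
# Ten ages from the example above, in ascending order
ages = [32, 33, 34, 34, 35, 37, 37, 39, 41, 44]

smallest = min(ages)               # the Min
largest = max(ages)                # the Max
data_range = largest - smallest    # Range = Max - Min

print(smallest, largest, data_range)  # 32 44 12
```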

Variance

Variance tells you how far off the data values are from the mean overall.

  1. Calculate the mean of all the data points, Xbar

  2. Calculate the difference between each data point and the average (Xi - Xbar)

  3. Square those figures for all data points

    • This ensures that you'll always be dealing with a positive number; otherwise, the positive and negative differences would cancel each other out and sum to zero

  4. Add the squared values together (a value called the sum of squares in statistics)

  5. Divide that total by n-1 (the number of data values minus 1)
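The five steps above can be sketched in Python as follows. The function name `sample_variance` is hypothetical, and the ten ages from the range example are reused purely as sample data; the last line cross-checks the result against the standard library's `statistics.variance`, which also divides by n-1.

```python
import statistics

ages = [32, 33, 34, 34, 35, 37, 37, 39, 41, 44]

def sample_variance(values):
    """Sample variance, following the five steps one at a time."""
    n = len(values)
    mean = sum(values) / n                     # step 1: the mean, Xbar
    deviations = [x - mean for x in values]    # step 2: Xi - Xbar
    squared = [d ** 2 for d in deviations]     # step 3: square each difference
    sum_of_squares = sum(squared)              # step 4: sum of squares
    return sum_of_squares / (n - 1)            # step 5: divide by n-1

variance = sample_variance(ages)
print(round(variance, 2))  # 14.49

# Cross-check against the standard library implementation
assert abs(variance - statistics.variance(ages)) < 1e-9
```

Note that step 2 on its own always sums to zero (up to rounding), which is why the squaring in step 3 is needed.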

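In equation form, the sample variance is s² = Σ(Xi - Xbar)² / (n - 1). This notation follows statistical conventions (p. 105) for describing sample statistics. Variance for a population uses a sigma: σ² = Σ(Xi - μ)² / N, where μ is the population mean and N is the number of values in the population.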

Though more people are familiar with standard deviation (see below), variance has one big advantage: variances are additive while standard deviations are not. That means, for example, that the total variance for a process can be determined by adding together the variances for all the process steps (assuming the steps vary independently of one another).

  • So to calculate a standard deviation for an entire process, first calculate the variances for each process step, add those variances together, then take the square root. Do not add together the standard deviations of each step.
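A small numeric sketch may make the point concrete. The three step standard deviations below are hypothetical values (in minutes) invented for illustration, and the steps are assumed to vary independently of one another.

```python
import math

# Hypothetical standard deviations (minutes) for three independent process steps
step_std_devs = [2.0, 3.0, 6.0]

# Convert each standard deviation to a variance, then add the variances
step_variances = [s ** 2 for s in step_std_devs]   # 4.0, 9.0, 36.0
total_variance = sum(step_variances)               # 49.0

# Take the square root only at the end
process_std_dev = math.sqrt(total_variance)

# What NOT to do: add the standard deviations directly
naive_sum = sum(step_std_devs)

print(process_std_dev)  # 7.0
print(naive_sum)        # 11.0
```

With these made-up numbers, adding the standard deviations directly would overstate the spread of the overall process (11 minutes instead of 7).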

A drawback to using variance is that it is not in the same units of measure as the data points. Ex: for cycle times, the variance would be in units of "minutes squared," which doesn't make logical sense.

Standard deviation

Think of standard deviation as roughly the "average distance from each data point to the mean." Calculate the standard deviation for a sample or population by following the same steps as for the variance, then taking the square root. For the 10 ages listed in the range example above, the mean is 36.6 and the sum of squares is 130.4, so s = sqrt(130.4 / (10 - 1)) = sqrt(14.49) ≈ 3.81.

Just as with variance, the standard deviation of a population is denoted with sigma (σ) instead of "s": σ = sqrt(Σ(Xi - μ)² / N), where μ is the population mean and N is the number of values in the population.

The standard deviation is a handy measure of variability because it is stated in the same units as the data points. But as noted above, you CANNOT add standard deviations together to get a combined standard deviation for multiple process steps. If you want an indication of spread for a process overall, add together the variances for each step then take the square root.
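As a rough check on the standard deviation itself, Python's standard `statistics` module provides `stdev` (sample, dividing by n-1) and `pstdev` (population, dividing by n). The sketch below applies both to the ten ages from the range example; treating the same ten values as a whole population is done here only to show how the two results differ.

```python
import math
import statistics

ages = [32, 33, 34, 34, 35, 37, 37, 39, 41, 44]

sample_sd = statistics.stdev(ages)        # "s": divides the sum of squares by n-1
population_sd = statistics.pstdev(ages)   # sigma: divides the sum of squares by n

# The standard deviation is the square root of the variance
assert math.isclose(sample_sd, math.sqrt(statistics.variance(ages)))

print(round(sample_sd, 2))      # 3.81
print(round(population_sd, 2))  # 3.61
```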



Boxplots

Highlights

  • Boxplots, or box-and-whisker diagrams, give a quick look at the distribution of a set of data

  • They provide an instant picture of variation and some insight into strategies for finding what caused the variation

  • They allow easy comparison of multiple data sets

To use boxplots

  • Boxplots are typically provided as output from statistical packages such as Minitab (you will rarely construct one by hand)

  • The "box" shows the range of data values comprising 50% of the data set (the 2nd and 3rd quartiles)

    • The line that divides the box shows the median (see definition on p. 107)

  • Single-line "whiskers" extend below and above the box (or the left and right, if the box is horizontal) showing the width of the 1st and 4th quartiles, and lowest and highest values

  • Data values that fall far from other data values in the set are plotted separately and labeled as outliers

    • Often, outliers reflect errors in recording data

    • If the data value is real, you should investigate what was going on in the process at the time
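The numbers behind a boxplot can be sketched in Python as shown below. Several things here are assumptions rather than part of the original text: the function name `boxplot_summary` is hypothetical, outliers are flagged with the common 1.5 × IQR convention (statistical packages may use other rules), quartiles come from `statistics.quantiles` with its default method (packages differ in how they interpolate quartiles), and the value 70 is an invented extreme point added to the ten ages for illustration.

```python
import statistics

def boxplot_summary(values, whisker_factor=1.5):
    """Quartiles, whisker ends, and outliers for one data set.

    Assumes the common convention that points farther than
    whisker_factor * IQR beyond the box are treated as outliers.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    iqr = q3 - q1                                     # height of the "box"
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],     # lowest value that is not an outlier
        "whisker_high": inside[-1],   # highest value that is not an outlier
        "outliers": outliers,
    }

# Ten ages from the range example plus one hypothetical extreme value (70)
ages_with_extreme = [32, 33, 34, 34, 35, 37, 37, 39, 41, 44, 70]
summary = boxplot_summary(ages_with_extreme)
print(summary)
```

With this data the box runs from 34 to 41 with the median at 37, the whiskers end at 32 and 44, and the invented value 70 is reported separately as an outlier.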