This section features two types of tools used to understand variation:
When collecting process data, plot the data on one of the following charts before continuing with other analyses:
Variation is the term applied to any differences that occur in products, services, and processes. There are two types of variation: common cause variation, which is inherent in the process and present all the time, and special cause variation, which comes from specific, identifiable circumstances that are not always part of the process.
Note that there are different strategies for dealing with the two types of variation: To reduce common cause variation, you have to develop new methods for doing the everyday work. To eliminate special cause variation, you have to look for something that was temporary or that has changed in the process, and find ways to prevent that cause from affecting the process again.
OPTIONAL: If you have done a histogram or have reason to believe the data are from a normal distribution (see p. 114), you can use the Run Chart Table (p. 121) to look for patterns of special causes. If this is the case…
| # pts not on median | Lower limit of runs | Upper limit of runs |
|---|---|---|
| 10 | 3 | 8 |
| 11 | 3 | 9 |
| 12 | 3 | 10 |
| 13 | 4 | 10 |
| 14 | 4 | 11 |
| 15 | 4 | 12 |
| 16 | 6 | 12 |
| 17 | 5 | 13 |
| 18 | 6 | 13 |
| 19 | 6 | 14 |
| 20 | 6 | 14 |
| 21 | 7 | 15 |
| 22 | 7 | 16 |
| 23 | 8 | 16 |
| 24 | 8 | 17 |
| 25 | 9 | 17 |
| 26 | 9 | 18 |
| 27 | 9 | 19 |
| 28 | 10 | 19 |
| 29 | 10 | 20 |
| 30 | 11 | 20 |
| 31 | 11 | 21 |
| 32 | 11 | 22 |
| 33 | 11 | 22 |
| 34 | 12 | 23 |
| 35 | 19 | 23 |
| 36 | 13 | 23 |
| 37 | 13 | 25 |
| 38 | 14 | 25 |
| 39 | 14 | 26 |
| 40 | 15 | 26 |
| 41 | 16 | 26 |
| 42 | 16 | 27 |
| 43 | 17 | 27 |
| 44 | 17 | 28 |
| 45 | 17 | 29 |
| 46 | 17 | 30 |
| 47 | 18 | 30 |
| 48 | 18 | 31 |
| 49 | 19 | 31 |
| 50 | 19 | 32 |
| 60 | 24 | 37 |
| 70 | 28 | 43 |
| 80 | 33 | 48 |
| 90 | 37 | 54 |
| 100 | 42 | 59 |
| 110 | 46 | 65 |
| 120 | 48 | 70 |
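To see how the table is used, here is a minimal Python sketch (an illustration, not from the book): it counts the runs about the median in a hypothetical data set and compares the count to the lower/upper limits read from the table row that matches the number of points not on the median.

```python
from statistics import median

def runs_about_median(data):
    """Count runs about the median: consecutive points on the same side
    of the median, ignoring points that fall exactly on the median."""
    med = median(data)
    sides = [x > med for x in data if x != med]      # True = above, False = below
    n_runs = 1 + sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return len(sides), n_runs                        # (points not on median, run count)

# Hypothetical data: 20 points, none of which equals the median,
# so the table row for 20 gives run limits of 6 and 14.
data = [12, 3, 17, 8, 14, 6, 19, 2, 11, 9, 15, 5, 18, 7, 13, 4, 20, 1, 16, 10]
points_off_median, n_runs = runs_about_median(data)
lower, upper = 6, 14                                 # read from the Run Chart Table
if n_runs < lower or n_runs > upper:
    print(f"{n_runs} runs is outside {lower}-{upper}: look for a special cause")
```

Too few runs suggests a shift or clustering in the data; too many runs (as in this alternating example) suggests mixture or overcontrol.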
Fixed opportunity: the sample size or "unit" being sampled is constant.
Variable opportunity: the sample size or "unit" being sampled changes.
If you aren't sure what kind of data you have, see p. 70.
See below for more details on selecting charts for continuous data and see p. 130 for selecting charts for attribute data.
In most cases, you will be creating two charts for each set of continuous data. The first chart shows the actual data points or averages, the second chart shows the ranges or standard deviations. Why use both?
The data (I or Xbar) chart…
The range (mR or R) chart…
ImR chart (Individuals, moving Range)
Plots individuals data (I) on one chart and moving ranges (mR, the differences between each pair of adjacent points) on a second chart. Use when the best subgroup size is one, which will happen when…
ImR is a good chart to start with when evaluating continuous data. You can often do a quick chart by hand, then use it to decide whether to build a different or more elaborate chart later.
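As an illustration of the ImR calculations (a minimal sketch with made-up readings, not the book's worked example), using the n = 2 constants 2.66, D3 = 0, and D4 = 3.27:

```python
def imr_limits(data):
    """Control limits for an Individuals + moving Range (ImR) chart.
    Uses the standard constants for a moving range of two points."""
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    x_bar = sum(data) / len(data)                       # centerline of the I chart
    mr_bar = sum(moving_ranges) / len(moving_ranges)    # centerline of the mR chart
    return {
        "I":  (x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar),  # (LCL, CL, UCL)
        "mR": (0.0, mr_bar, 3.27 * mr_bar),             # D3 = 0, D4 = 3.27 for n = 2
    }

print(imr_limits([9.8, 10.2, 10.1, 9.7, 10.4, 10.0, 9.9, 10.3]))
```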
X, R chart (Xbar&R, Average + Range)
Plots averages of subgroups (Xbar) on one chart and the ranges (R) within the subgroups on the other chart. The Xbar&R Chart is used with a sampling plan to monitor repetitive processes.
The Xbar&R chart is the most commonly used control chart because plotting subgroup averages takes advantage of the Central Limit Theorem (p. 114): the averages are approximately normal, so the shape of the underlying distribution matters much less. It is also more sensitive than the ImR chart to process shifts.
X,S chart (Xbar&S, Average + Standard Deviation)
Plots subgroup averages (Xbar) plus standard deviations of the subgroups (S). Similar in use to Xbar&R charts, except these should be used only when you have sample sizes of at least 10 units (the standard deviation is considered a reliable estimate only when sample sizes are about 10 or larger). It's far more common to use smaller sample sizes, so in most cases an Xbar&R chart will be a better choice.
See below for instructions on rational subgrouping for Xbar&R and Xbar&S charts.
For both Xbar&R and Xbar&S charts, you'll need to collect data in sets of points called subgroups, then calculate and plot the averages for those subgroups. Rational subgrouping is the process of selecting a subgroup based upon "logical" grouping criteria or statistical considerations.
Often, you can use natural breakpoints to determine subgroups:
Ex: If you have 3 shifts operating per day, collect 1 data point per shift and calculate the average for those 3 data points (you'll plot one "average" reading per day)
Or if you want to look for differences between shifts, collect, say, 5 data points per shift (you'll plot 3 average readings every day, 1 per shift)
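For instance (a hypothetical three-shift data set, not from the book), subgrouping by shift and computing the values to plot might look like this in Python:

```python
# Each shift is one rational subgroup of 5 readings (hypothetical numbers).
shifts = {
    "day":   [102, 98, 101, 99, 100],
    "swing": [105, 103, 104, 106, 102],
    "night": [97, 99, 98, 100, 96],
}

for shift, readings in shifts.items():
    x_bar = sum(readings) / len(readings)   # plotted on the Xbar chart
    r = max(readings) - min(readings)       # plotted on the R chart
    print(f"{shift}: Xbar = {x_bar:.1f}, R = {r}")
```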
If the data are not normally distributed, use the guidelines on the Central Limit Theorem, p. 114, and rational subgrouping guidelines to determine the proper subgroup size.
Subgroup size selection can also be used to address the following data problems:
Tip: The constants in these formulas change as the subgroup size changes (see the table of constants that follows the formulas below).
Individuals + Moving Range Charts (ImR chart)

| | Individuals (I) chart | Moving Range (mR) chart |
|---|---|---|
| Centerline | X̄ (average of the data points) | mR̄ (average of the moving ranges) |
| UCL | X̄ + 2.66 mR̄ | D4 mR̄ |
| LCL | X̄ − 2.66 mR̄ | D3 mR̄ |

Subgroup Averages + Range (Xbar&R chart)

| | Averages (X̄) chart | Range (R) chart |
|---|---|---|
| Centerline | X̿ (average of the subgroup averages) | R̄ (average of the subgroup ranges) |
| UCL | X̿ + A2 R̄ | D4 R̄ |
| LCL | X̿ − A2 R̄ | D3 R̄ |

Subgroup Averages + Std Dev (Xbar&S chart)

| | Averages (X̄) chart | Std. dev. (S) chart |
|---|---|---|
| Centerline | X̿ (average of the subgroup averages) | S̄ (average of the subgroup standard deviations) |
| UCL | X̿ + A3 S̄ | B4 S̄ |
| LCL | X̿ − A3 S̄ | B3 S̄ |
Note: The X, R, and S symbols should technically be in lower-case letters, but (except in statistics books) they are more often seen as capitals, so that is the convention used here. The A, D, and B factors appear in the table below.
| n | A2 | A3 | B3 | B4 | d2 | D3 | D4 |
|---|---|---|---|---|---|---|---|
| 2 | 1.88 | 2.66 | .00 | 3.27 | 1.13 | .00 | 3.27 |
| 3 | 1.02 | 1.95 | .00 | 2.57 | 1.69 | .00 | 2.57 |
| 4 | .73 | 1.63 | .00 | 2.27 | 2.06 | .00 | 2.28 |
| 5 | .58 | 1.43 | .00 | 2.09 | 2.33 | .00 | 2.11 |
| 6 | .48 | 1.29 | .03 | 1.97 | 2.53 | .00 | 2.00 |
| 7 | .42 | 1.18 | .12 | 1.88 | 2.70 | .08 | 1.92 |
| 8 | .37 | 1.10 | .19 | 1.82 | 2.85 | .14 | 1.86 |
| 9 | .34 | 1.03 | .24 | 1.76 | 2.97 | .18 | 1.82 |
| 10 | .31 | .98 | .28 | 1.72 | 3.08 | .22 | 1.78 |
| 11 | .29 | .93 | .32 | 1.68 | 3.17 | .26 | 1.74 |
| 12 | .27 | .89 | .35 | 1.65 | 3.26 | .28 | 1.72 |
| 13 | .25 | .85 | .38 | 1.62 | 3.34 | .31 | 1.69 |
| 14 | .24 | .82 | .41 | 1.59 | 3.41 | .33 | 1.67 |
| 15 | .22 | .79 | .43 | 1.57 | 3.47 | .35 | 1.65 |
| 16 | .21 | .76 | .45 | 1.55 | 3.53 | .36 | 1.64 |
| 17 | .20 | .74 | .47 | 1.53 | 3.59 | .38 | 1.62 |
| 18 | .19 | .72 | .48 | 1.52 | 3.64 | .39 | 1.61 |
| 19 | .19 | .70 | .50 | 1.50 | 3.69 | .40 | 1.60 |
| 20 | .18 | .68 | .51 | 1.49 | 3.74 | .42 | 1.59 |
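Putting the formulas and the constants together, here is a minimal Python sketch (made-up readings, not the book's example) for subgroups of size 5, using the n = 5 row of the table above:

```python
import statistics

# Constants for subgroup size n = 5, from the table above
A2, A3, B3, B4, D3, D4 = 0.58, 1.43, 0.00, 2.09, 0.00, 2.11

subgroups = [[10.1, 9.8, 10.0, 10.2, 9.9],
             [10.3, 10.0, 9.7, 10.1, 10.0],
             [9.9, 10.2, 10.1, 9.8, 10.0]]

x_double_bar = statistics.mean(statistics.mean(g) for g in subgroups)   # grand average
r_bar = statistics.mean(max(g) - min(g) for g in subgroups)             # average range
s_bar = statistics.mean(statistics.stdev(g) for g in subgroups)         # average std. dev.

# Xbar&R chart limits
print("Xbar chart:", x_double_bar - A2 * r_bar, x_double_bar + A2 * r_bar)
print("R chart:   ", D3 * r_bar, D4 * r_bar)

# Xbar&S chart limits (normally used with larger subgroups; shown here only for the formulas)
print("Xbar chart:", x_double_bar - A3 * s_bar, x_double_bar + A3 * s_bar)
print("S chart:   ", B3 * s_bar, B4 * s_bar)
```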
When data points can have only one of two values—such as when comparing a product or service to a standard and classifying it as being acceptable or not (pass/fail)—it is called binomial data. Use one of the following control charts for binomial data:
p-chart: Charts the proportion of defectives in each subgroup
np-chart: Charts the number of defectives in each subgroup (must have the same sample size each time)
Note how the control limits change as the subgroup size changes (the p-chart accommodates variable subgroup sizes).
p-charts are often used in transactional situations: billing errors, defective loan applications, proportion of invoices with errors, defective room service orders, sales order data, etc.
A Poisson (pronounced pwa-sahn) distribution describes count data where you can easily count the number of occurrences (Ex: errors on a form, dents on a car) but not the number of non-occurrences (there is no such thing as a "non-dent"). These data are best charted on either:
c-chart: Charts the defect count per sample (must have the same sample size each time)
u-chart: Charts the number of defects per unit sampled in each subgroup (uses a proportion, so it's OK if sample size varies)
"Counts of blemishes" is one example of Poisson data—you can count blemishes but not non-blemishes. Also, the number of blemishes is relatively rare given the area of opportunity (having two small dents in a car is a relatively rare event compared to the proportion of the car that is NOT dented). Poisson data is plotted on either c-charts or u-charts depending on whether sample size varies.
If the sample size is always the same (variation of up to 10% in sample size is OK), use a c-chart. If the sample size varies more than that, use a u-chart.
Tips for converting attribute data to continuous data
In general, much more information is contained in continuous data than in attribute data, so control charts for continuous data are preferred. Possible alternatives to attribute charting for different situations:
When charting continuous data, you normally create two charts, one for the data and one for ranges (ImR, Xbar&R, etc.). In contrast, charts for attribute data use only the chart of the count or percentage.
| Chart Type | Centerline | Upper Control Limit | Lower Control Limit |
|---|---|---|---|
| p | p̄ | p̄ + 3√( p̄(1 − p̄) / n ) | p̄ − 3√( p̄(1 − p̄) / n ) |
| np | np̄ | np̄ + 3√( np̄(1 − p̄) ) | np̄ − 3√( np̄(1 − p̄) ) |
| c | c̄ | c̄ + 3√c̄ | c̄ − 3√c̄ |
| u | ū | ū + 3√( ū / n ) | ū − 3√( ū / n ) |
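As an illustration (hypothetical counts, not from the book), here is a short Python sketch of p-chart limits when the subgroup size varies, so each point gets its own limits:

```python
import math

# Hypothetical data: defectives found in subgroups of varying size
defectives = [4, 6, 3, 8, 5]
sizes = [50, 60, 45, 80, 55]

p_bar = sum(defectives) / sum(sizes)                 # centerline

for d, n in zip(defectives, sizes):
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)       # limits depend on each subgroup's n
    ucl = p_bar + 3 * sigma
    lcl = max(0.0, p_bar - 3 * sigma)                # a proportion can't go below zero
    print(f"n={n}: p={d / n:.3f}, LCL={lcl:.3f}, UCL={ucl:.3f}")
```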
The "test for special causes" described on the following pages assume that you have normally distributed data (see p. 114):
All tests for special causes also assume you have independent observations:
Many of these tests relate to "zones," which mark off the standard deviations from the mean. Zone C is ± 1 std dev.; Zone B is between 1 and 2 std. dev.; and Zone A is between 2 and 3 std dev.
1 point beyond Zone A: Detects a shift in the mean, an increase in the standard deviation, or a single aberration in the process. Check your R-chart to rule out increases in variation.
9 points in a row on one side of the average in Zone C or beyond: Detects a shift in the process mean.
6 points in a row steadily increasing or decreasing: Detects a trend or drift in the process mean. Small trends will be signaled by this test before the first test.
14 points in a row alternating up and down: Detects systematic effects, such as two alternately used machines, vendors, or operators.
2 out of 3 points in a row in Zone A or beyond: Detects a shift in the process average or increase in the standard deviation. Any two out of three points provide a positive test.
4 out of 5 points in Zone B or beyond: Detects a shift in the process mean. Any four out of five points provide a positive test.
15 points in a row in Zone C, above and below the centerline: Detects stratification of subgroups—appears when observations in a subgroup come from sources with different means.
8 points in a row on both sides of the centerline with none in Zone C: Detects stratification of subgroups when the observations in one subgroup come from a single source, but subgroups come from different sources with different means.
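To show how the zone tests translate into calculations, here is a small Python sketch (not from the book) of the first two tests, given a chart's centerline and standard deviation:

```python
def test_1(points, mean, sigma):
    """Test 1: any point beyond Zone A (more than 3 standard deviations from the mean)."""
    return [i for i, x in enumerate(points) if abs(x - mean) > 3 * sigma]

def test_2(points, mean, run_length=9):
    """Test 2: `run_length` points in a row on the same side of the centerline."""
    signals = []
    for i in range(run_length - 1, len(points)):
        window = points[i - run_length + 1 : i + 1]
        if all(x > mean for x in window) or all(x < mean for x in window):
            signals.append(i)
    return signals

data = [10.1, 9.9, 10.0, 10.2, 10.3, 10.4, 10.2, 10.5, 10.3, 10.6, 10.4, 13.5]
print(test_1(data, mean=10.0, sigma=0.5))   # the last point is beyond 3 sigma
print(test_2(data, mean=10.0))              # nine points in a row above the centerline
```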
To compare the actual variation in a process (Voice of the Process) to its allowed variation limits (Voice of the Customer):
The proportion of current values that fall inside specification limits tells us whether the process is capable of meeting customer expectations.
Can be done on any process that has a specification established, whether manufacturing or transactional, and that has a capable measuring system. More specifically, in manufacturing and engineering…
In services…
Tip: When beginning to measure/monitor a parameter, always:
Any process experiences more variation in the long term than in the short term, so "capability" will vary depending on whether you're collecting data for a short period of time (a day, week) or for much longer (several months or years).
The equations and basic concepts are identical for calculating short-term and long-term capability except for how the standard deviation is calculated: short-term capability estimates sigma from within-subgroup (or moving range) variation, while long-term capability uses the overall standard deviation of all the data collected.
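A small Python sketch of the two estimates (an illustration under common conventions, not the book's exact procedure): the short-term sigma here comes from the average moving range divided by d2 = 1.13 for n = 2, and the long-term sigma from the ordinary sample standard deviation of all the data.

```python
import statistics

data = [10.1, 9.9, 10.2, 10.0, 10.4, 9.8, 10.3, 10.1, 10.5, 9.7]

# Short-term: within-subgroup variation, estimated here from the moving range (d2 = 1.13 for n = 2)
mr_bar = statistics.mean(abs(b - a) for a, b in zip(data, data[1:]))
sigma_short = mr_bar / 1.13

# Long-term: overall sample standard deviation of everything collected
sigma_long = statistics.stdev(data)

print(f"short-term sigma ~ {sigma_short:.3f}, long-term sigma ~ {sigma_long:.3f}")
```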
Be alert: Many companies calculate process capability statistics using long-term variation but use the "C" labels; others are careful to distinguish between long- and short-term variation. Check with data experts in your company to see what standards they follow.
Note: The calculations here are for continuous, normal data. Refer to any good statistics textbook for capability analysis on attribute data.
The choice: Cp vs. Cpk (or "P" versions)
Cp and Pp are ratios of total variation allowed by the specification to the total variation actually measured from the process.
Cpk is the smaller of Cpu or Cpl (same for the P versions) when a process has both an upper and lower specification limit.
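A minimal Python sketch of these ratios (hypothetical specification limits; pass in the short-term sigma to get Cp/Cpk or the long-term sigma to get Pp/Ppk):

```python
def capability(mean, sigma, lsl, usl):
    """Capability ratios for a two-sided specification."""
    cp = (usl - lsl) / (6 * sigma)        # total allowed spread vs. total actual spread
    cpu = (usl - mean) / (3 * sigma)      # headroom to the upper spec limit
    cpl = (mean - lsl) / (3 * sigma)      # headroom to the lower spec limit
    cpk = min(cpu, cpl)                   # the worse of the two one-sided ratios
    return cp, cpk

print(capability(mean=10.05, sigma=0.12, lsl=9.7, usl=10.3))
```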
Tips