Section 05. Box Plot


Overview

The Box Plot is a graphical representation of data, as shown in Figure 7.05.1.

Figure 7.05.1. Composition of a Box Plot.


The graph is composed of elements describing the distribution of the data.

  • The y-axis is the output being measured; for instance, if the data represents the purity of a product, the y-axis shows purity.

  • The middle horizontal line in the box is the median of the data, that is, the fiftieth percentile (a common mistake is to assume it is the mean rather than the median).

  • The bottom of the box lies on the twenty-fifth percentile of the data.

  • The top of the box lies on the seventy-fifth percentile of the data.

  • The box in its entirety thus represents the middle 50% of the data, that is, the 25% of the data on either side of the median, spanning the twenty-fifth to the seventy-fifth percentile.

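  • The bottom whisker runs down from the twenty-fifth percentile to a point set by the software settings, such as the tenth or the fifth percentile; many packages instead end it at the lowest data value within 1.5 times the box height (the interquartile range) below the box. Anything below this is considered a possible outlier, and it should not be thrown out or discounted without investigation.

  • The top whisker runs up from the seventy-fifth percentile to a point set by the software settings, such as the ninetieth or the ninety-fifth percentile, or to the highest data value within 1.5 times the box height above the box. Anything above this is likewise considered a possible outlier.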

  • Asterisks mark the possible outliers: points above the end of the top whisker or below the end of the bottom whisker.
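The rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular software package: `percentile` and `box_stats` are hypothetical helper names, percentiles are interpolated linearly between neighbouring data points (the rule NumPy uses by default), and the whiskers default to the tenth and ninetieth percentiles. Passing `whisker_pcts=None` switches to the 1.5 x IQR rule that many packages use by default.

```python
import math


def percentile(sorted_data, p):
    """Return the p-th percentile (0-100) of already-sorted data.

    Interpolates linearly between the two nearest ranks, the same rule
    as NumPy's default 'linear' method.
    """
    pos = (len(sorted_data) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def box_stats(data, whisker_pcts=(10, 90)):
    """Compute the elements of one box in a Box Plot (hypothetical helper).

    whisker_pcts: percentiles at which the whiskers end, e.g. (10, 90) or
    (5, 95). Pass None to use the 1.5 x IQR rule instead, where each whisker
    ends at the most extreme data point inside the fences.
    """
    s = sorted(data)
    q1, median, q3 = (percentile(s, p) for p in (25, 50, 75))
    if whisker_pcts is not None:
        low, high = (percentile(s, p) for p in whisker_pcts)
    else:
        iqr = q3 - q1  # the height of the box
        low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        low = min(x for x in s if x >= low_fence)
        high = max(x for x in s if x <= high_fence)
    # Points beyond either whisker end are the possible outliers (asterisks).
    outliers = [x for x in s if x < low or x > high]
    return {"q1": q1, "median": median, "q3": q3,
            "low_whisker": low, "high_whisker": high, "outliers": outliers}


stats = box_stats(list(range(1, 12)))
tukey = box_stats(list(range(1, 12)), whisker_pcts=None)
```

For the eleven values 1 to 11, the default settings put the box at 3.5 to 8.5 with the median at 6, the whiskers at 2 and 10, and flag 1 and 11 as possible outliers; under the 1.5 x IQR rule the whiskers reach 1 and 11 and nothing is flagged. Fixed-percentile whiskers always leave roughly the outer 20% (or 10%) of the data beyond them, which is one reason flagged points need investigating rather than discarding.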

A Box Plot of a single column of data, taken in one lump, reveals relatively little. It becomes far more useful when the data is cut by an X (an input variable). For example, in Figure 7.05.2 the data is cut by the X Operator, giving three boxes, one for each of the Operators Bob, Jane, and Walt.
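Cutting the data by an X amounts to grouping the Y values by each level of the X and computing one box per level. A minimal sketch using only the Python standard library; `summarize_by_x` is a hypothetical name, and the operator times below are invented for illustration, not the data behind Figure 7.05.2.

```python
from collections import defaultdict
from statistics import quantiles


def summarize_by_x(records):
    """Group (x_level, y_value) pairs by X level; return {level: (q1, median, q3)}.

    quantiles(..., method="inclusive") interpolates linearly between ranks.
    Each level needs at least two values.
    """
    groups = defaultdict(list)
    for level, y in records:
        groups[level].append(y)
    return {level: tuple(quantiles(ys, n=4, method="inclusive"))
            for level, ys in groups.items()}


# Hypothetical processing times (Y) cut by Operator (X).
records = (
    [("Bob", t) for t in (12, 14, 15, 16, 18)]
    + [("Jane", t) for t in (9, 10, 11, 12, 13)]
    + [("Walt", t) for t in (14, 16, 17, 19, 20)]
)
boxes = summarize_by_x(records)
```

Each entry of `boxes` gives the bottom, middle line, and top of one box, so the three Operators can be compared side by side just as the boxes in a cut Box Plot are.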

Figure 7.05.2. A Box Plot applied to data cut by Operator.


Interpreting the Output

Box Plots should not be used in isolation. Figure 7.05.2 seems to show a difference in the time each Operator takes to process an entity (the boxes are not all aligned). The key word here is seems. A Box Plot can help identify whether there might be differences between levels of an X, such as the Operator, but those differences might not be statistically significant. Any graphical tool such as this should be followed by the appropriate statistical test, which estimates how likely it is to see a difference this large purely by random chance.
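The text does not name the follow-up tool; ANOVA, or a nonparametric test such as Kruskal-Wallis, is a common choice in a statistics package. As a self-contained stand-in, the sketch below swaps in a permutation test: it shuffles the Y values among the X levels many times and counts how often chance alone spreads the group means at least as far apart as observed. `between_group_ss` and `permutation_p_value` are hypothetical names. Because the total sum of squares does not change under shuffling, ranking shuffles by the between-group sum of squares gives the same test as ranking them by the ANOVA F statistic.

```python
import random
from statistics import fmean


def between_group_ss(groups):
    """Sum of squares between group means (larger = groups further apart)."""
    grand = fmean([y for g in groups for y in g])
    return sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Estimate the chance that random relabeling spreads the group means
    at least as far apart as observed (one-way permutation test)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = between_group_ss(groups)
    pooled = [y for g in groups for y in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Deal the shuffled values back into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if between_group_ss(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical processing times for three Operators.
p = permutation_p_value([[12, 14, 15, 16, 18], [9, 10, 11, 12, 13], [14, 16, 17, 19, 20]])
```

A small value of `p` (conventionally below 0.05) suggests the apparent difference between the boxes is unlikely to be random chance alone; a large value means the Box Plot's apparent difference may well be noise.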

In fact, Box Plots are best used as part of a Multi-Vari Study, in which data is collected and analyzed systematically using multiple graphical and statistical tools. Many other factors, such as the shape of the data (normal or non-normal) and the number of data points, also play a role in understanding the data.
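A box already hints at the shape of the data: a median sitting off-centre in its box suggests skew. A minimal sketch of one way to put a number on this, reported alongside the number of data points; the measure used here, Bowley's quartile skewness, is a swapped-in choice not mentioned in the text, `shape_summary` is a hypothetical name, and none of this replaces a proper normality check.

```python
from statistics import quantiles


def shape_summary(values):
    """Return (n, quartile skewness) for one group of data.

    Quartile (Bowley) skewness = (Q3 + Q1 - 2 * median) / (Q3 - Q1).
    It is 0 when the median sits mid-box, positive when the upper part of
    the box is longer (right skew), and negative for left skew.
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    if q3 == q1:
        return len(values), None  # box has zero height; skewness undefined
    return len(values), (q3 + q1 - 2 * med) / (q3 - q1)


symmetric = shape_summary([12, 14, 15, 16, 18])
skewed = shape_summary([1, 2, 3, 10, 50])
```

The first group gives a skewness of 0 (median centred in the box), while the second gives 0.75, a strong right skew; with only five points each, both readings are fragile, which is why the number of data points is reported too.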




Lean Sigma: A Practitioner's Guide
ISBN: 0132390787
Year: 2006
