Section 19. DOE - Characterizing


19. DOE - Characterizing

Overview

Characterizing a process using Designed Experiments involves

  • Determining which Xs (both controlled and uncontrolled) most affect the Ys

  • Identifying critical process and noise variables

  • Identifying those variables that need to be carefully controlled in the process.

Before reading this section it is important that the reader has read and understood "DOE - Introduction," also in this chapter. This section covers only designing, analyzing, and interpreting a Characterizing Design, not the full DOE roadmap.

Characterizing Designs[33] in Lean Sigma are based on a set of experimental designs known as Full Factorials, and specifically on the most efficient of those, known as 2^k Factorials. These are chosen because they

[33] For more detail see Statistics for Experimenters by Box, Hunter and Hunter.

  • Are good for early investigations because they can look at a large number of factors with relatively few runs

  • Can be the basis for more complex designs and thus lend themselves well to sequential studies

  • Are fairly straightforward to analyze.

A 2^k factorial refers to k factors (Xs), each with two levels (a High and a Low value for the X). Thus, a 2-Factor Design is known as a 2^2 factorial; it has two factors, each with two levels, and can be done in 2^2 = 4 runs, as shown in Figure 7.19.1a. Likewise, a 2^3 factorial has three factors, each with two levels, and can be done in 2^3 = 8 runs, as shown graphically in Figure 7.19.1b.
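The run counts above follow mechanically from the 2^k structure. As a minimal sketch in Python (the helper name `full_factorial` is illustrative, not from any DOE package), the full set of runs can be generated for any k:

```python
from itertools import product

def full_factorial(k):
    """Return the 2^k design matrix in coded units (-1/+1), in Yates'
    standard order (first factor alternating fastest)."""
    # itertools.product varies the LAST element fastest, so reverse each
    # tuple to put the fastest-varying factor first.
    return [list(levels)[::-1] for levels in product((-1, 1), repeat=k)]

runs_2 = full_factorial(2)   # 2^2 = 4 runs
runs_3 = full_factorial(3)   # 2^3 = 8 runs
```

Adding a third factor doubles the run count from four to eight, which is why these designs stay manageable for early screening work.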

Figure 7.19.1. 2-Factor and 3-Factor Full Factorial Designs.


Thus, to conduct a 2-Factor Full Factorial with two levels for each X (such as high and low), there would be four runs required, as follows:

  • X1 at the low level, X2 at the low level

  • X1 at the low level, X2 at the high level

  • X1 at the high level, X2 at the low level

  • X1 at the high level, X2 at the high level

Writing it this way looks messy, so designs for 2^k factorials are usually written as matrices shown in a specific standard order.[34] Also, by convention (and to enable the mathematics in the analysis itself), the low level of a factor is renamed as the "−1" level and the high level is renamed as the "+1" level. This is known as using coded units. An additional benefit here is that Attribute Xs can also be included in the design; for example, Supplier A might be the −1 level and Supplier B the +1 level. Many Belts worry unnecessarily about coding and its implications. A Belt should know it exists and look to the software to show which type of units (coded or actual) is displayed. Example design matrices for a 2^2 and a 2^3 factorial are shown in Figure 7.19.2.
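Coding is just a linear rescale of each X onto the −1 to +1 range. A small sketch, assuming hypothetical Temperature levels of 60 (Low) and 80 (High):

```python
def to_coded(actual, low, high):
    """Map an actual factor setting onto the coded -1..+1 scale."""
    center = (low + high) / 2          # the Center Point value
    half_range = (high - low) / 2
    return (actual - center) / half_range

print(to_coded(60, 60, 80))   # -1.0, the low level
print(to_coded(80, 60, 80))   # 1.0, the high level
print(to_coded(70, 60, 80))   # 0.0, a Center Point
```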

[34] Known as Yates' Order.

Figure 7.19.2. Design matrices for a 2^2 and a 2^3 Full Factorial Design.


Notice that the 2^3 Factorial matrix actually contains the 2^2 Factorial matrix. This is a key property in the use of these types of Designs for sequential experimentation. If the smaller 2-Factor Design for the Xs Temperature and Concentration had already been run and it was important to add a third Factor Catalyst, the third Factor could be added easily.

After the Design has been constructed, it is simply a matter of running the experiments depicted by each row in the matrix, so for the 2^2 Factorial:

  • −1 (low) level for Temperature, −1 (low) level for Concentration

  • +1 (high) level for Temperature, −1 (low) level for Concentration

  • And so on

In reality, the runs would be done in random order (to prevent external Noise Variables affecting the results), and most DOE software packages generate the random order automatically. For each run, the associated value for the Y or Ys is recorded, and the values become additional columns in the matrix, as in Figure 7.19.3. From the first run in the Figure, when the process was run with a low (−1) value of Temperature and a low (−1) value of Concentration and all other Xs were maintained as constant as possible, the resulting Y value (perhaps a Yield or similar) was 23(%).

Figure 7.19.3. Design Matrix after the associated response data has been added.

Temp   Conc    Y
 −1     −1    23
 +1     −1     5
 −1     +1    28
 +1     +1     9


The purpose of DOE is to understand the effect of the Xs. To do this, the data must be analyzed to determine what contribution each X makes to the changes in the Y. From Figure 7.19.3 there are some simple calculations that can be made:

  • The average Y value at the low (−1) Temperature level is the average of runs 1 and 3. This equates to (23 + 28) ÷ 2 = 25.5

  • The average Y value at the high (+1) Temperature level is the average of runs 2 and 4. This equates to (5 + 9) ÷ 2 = 7

  • The effect of going from the low (−1) level to the high (+1) level of Temperature is (7 − 25.5) = −18.5 units of Y. Thus, as the Temperature increases from the low level to the high level, Y goes down by 18.5 units.

  • By a similar calculation, the effect of going from the low (−1) level to the high (+1) level of Concentration is (28 + 9) ÷ 2 − (23 + 5) ÷ 2 = +4.5 units of Y. Thus, as Concentration increases from the low level to the high level, Y goes up by 4.5 units.

Temperature seems to have the largest effect on the Y, although it is negative. If Temperature in this process is not controlled well, then it is likely that there would be large swings in Y associated with the changes in Temperature. Concentration seems to have an effect, but a much smaller one.
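These effect calculations are simple enough to verify directly. A sketch in Python, using the four runs from Figure 7.19.3 (the helper name is illustrative):

```python
# Runs in standard order: (Temp, Conc, Y) in coded units
runs = [(-1, -1, 23), (1, -1, 5), (-1, 1, 28), (1, 1, 9)]

def main_effect(runs, factor_index):
    """Average Y at the +1 level minus average Y at the -1 level."""
    high = [y for *x, y in runs if x[factor_index] == 1]
    low = [y for *x, y in runs if x[factor_index] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

temp_effect = main_effect(runs, 0)   # (5+9)/2 - (23+28)/2 = -18.5
conc_effect = main_effect(runs, 1)   # (28+9)/2 - (23+5)/2 = +4.5
```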

Figure 7.19.4 shows a graphical representation of an interaction between two Xs. A 2-Factor interaction occurs when the effect of one X changes depending on the level of the second X. A simple example is cooking food in an oven with two factors, time and temperature: for a low time, a high temperature is required, but for a high time, a low temperature is required.

Figure 7.19.4. Surface plot of an Interaction between 2 Xs (output from Minitab v14).


Figure 7.19.4 depicts a situation somewhat similar to this. From the figure, it is easy to see that two diagonally opposite corners in the plot are lowered and the other two are raised, forming a "twist" in the solution landscape. Thus, to detect an interaction, the effect is calculated from the differences in the averages of the diagonal corners. This is mathematically analogous to calculating a third column in the Design Matrix that is the product of the levels of the two Xs, as shown in Figure 7.19.5.

Figure 7.19.5. Design Matrix including the interaction between temperature and concentration.

Temp   Conc   Interaction
 −1     −1       +1
 +1     −1       −1
 −1     +1       −1
 +1     +1       +1


The corners of the Design space at [−1, −1] (run 1) and [+1, +1] (run 4) are diagonally opposite from one another and have an Interaction Level of +1. Similarly, the two other corners (runs 2 and 3) are diagonally opposite to each other and have an Interaction Level of −1. To calculate the size of the interaction for the example in Figure 7.19.3, the equation subtracts the average of the runs at the −1 level from the average of the runs at the +1 level for the interaction:

(Average of runs 1 & 4) − (Average of runs 2 & 3) = (23 + 9) ÷ 2 − (5 + 28) ÷ 2 = −0.5

For this example, there is almost no interaction at all shown by the sample of data collected.
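The interaction arithmetic can be checked the same way: multiply the two factor columns and compare the diagonal averages. A sketch with the same four runs:

```python
runs = [(-1, -1, 23), (1, -1, 5), (-1, 1, 28), (1, 1, 9)]

# Interaction column = product of the two factor levels for each run
plus = [y for t, c, y in runs if t * c == 1]     # runs 1 and 4: 23, 9
minus = [y for t, c, y in runs if t * c == -1]   # runs 2 and 3: 5, 28
interaction_effect = sum(plus) / 2 - sum(minus) / 2   # 16 - 16.5 = -0.5
```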

In "DOE - Introduction" in this chapter, it was stated that DOE is about modeling the data. At this point there is no model as such, but the sizes of the effects just calculated form one. A generic 2-Factor model is

Y = β0 + β1X1 + β2X2 + β12X1X2

The term β0 is actually the average value of Y across all the data points, that is (23 + 9 + 5 + 28) ÷ 4 = 16.25. The other terms are half the size of the effects just calculated:[35]

[35] An Effect is the size of the change associated with going from −1 to +1, which is 2 units. A coefficient is the size of change incurred for 1 unit of change in the X and is half the size of the Effect.

  • β1 = Temperature Effect ÷ 2 = −18.5 ÷ 2 = −9.25

  • β2 = Concentration Effect ÷ 2 = +4.5 ÷ 2 = +2.25

  • β12 = Interaction Effect ÷ 2 = −0.5 ÷ 2 = −0.25

The final equation in coded units (−1s and +1s) for this data is thus:

Yield = 16.25 − (9.25 × Temp) + (2.25 × Conc) − (0.25 × Temp × Conc)

For any known values of Temperature and Concentration, if they were expressed in coded units, the associated Yield could be calculated.
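A sketch of that calculation, using the coefficients derived above (the function name is illustrative):

```python
def predict_yield(temp, conc):
    """Evaluate the fitted model in coded units (-1..+1)."""
    return 16.25 - 9.25 * temp + 2.25 * conc - 0.25 * temp * conc

# With all four coefficients, the model reproduces the four runs exactly:
print(predict_yield(-1, -1))   # 23.0
print(predict_yield(1, 1))     # 9.0
```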

All of this theory is important to at least follow, but in the practical world of being a Belt, it would be a lengthy process to do this by hand. Statistical software packages make this incredibly simple to generate. Figure 7.19.6 shows the output of a Full Factorial analysis for the previous example. As can be seen from the top part of the output, the equation calculated by hand seems to have been done correctly, which is always a relief.

Figure 7.19.6. Full factorial analysis for example data listed in Figure 7.19.3 (output from Minitab v14).

Factorial Fit: y versus A, B

Estimated Effects and Coefficients for y (coded units)

Term        Effect     Coef
Constant               16.250
A          −18.500    −9.250
B            4.500     2.250
A*B         −0.500    −0.250

S = *

Analysis of Variance for y (coded units)

Source               DF   Seq SS    Adj SS    Adj MS    F   P
Main Effects          2   362.500   362.500   181.250   *   *
2-Way Interactions    1     0.250     0.250     0.250   *   *
Residual Error        0     *         *         *
Total                 3   362.750


The bottom part of Figure 7.19.6 shows an ANOVA table (for more detail see "ANOVA" in this chapter). ANOVA divides the total variability in the data into its component pieces, in this case the variation due to

  • Main Effects (Temperature and Concentration)

  • Temperature-Concentration interaction

  • Error (the part not explained by the preceding three components)

The first place to look in an ANOVA is at the p-values. The p-values here are associated with the signal-to-noise ratio of the size of the effects versus the background noise and how unlikely a ratio of that size is to occur purely by random chance. Thus, the hypotheses that the p-values relate to are

  • H0: Factor has no effect

  • Ha: Factor has an effect

If the p-value is low (less than 0.05), then it is unlikely that an effect of that magnitude relative to background noise could have occurred purely by random chance; thus, the Null Hypothesis H0 is probably incorrect.

What is unusual about the ANOVA table shown in Figure 7.19.6 is that the p-values don't exist, because there is no Error value to act as the noise element of the signal-to-noise ratio. The issue here is the sheer efficiency of the design. The model has four coefficients, namely β0, β1, β2, and β12, and there were only four data points run; four data points for four unknowns leaves no redundancy from which to estimate the background variation (noise).

There are two options available to create the p-values: either add more runs or reduce the number of coefficients. "DOE - Introduction" in this chapter recommends using at least 25% additional runs, in this case a minimum of one extra run. This run should be done as the Center Point at (0,0) and would immediately provide the much-needed p-values.

However, in this example there is the latter option of reducing the number of coefficients. The first analysis showed that the Interaction has a small coefficient relative to the size of the other coefficients. Ignoring this coefficient completely, there are now four runs for three coefficients, namely β0, β1, and β2. The analysis can be rerun, and the data point that was used to calculate the β12 coefficient can be diverted to help estimate the size of the background noise (in a limited way, but it is better than nothing). This is known as the "reduced model," and the associated analysis results are shown in Figure 7.19.7.
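The reduced model amounts to an ordinary least-squares fit with an intercept and two factor columns. A sketch using NumPy (assuming NumPy is available; this is not Minitab output):

```python
import numpy as np

# Columns: intercept, Temp, Conc (coded units); y from Figure 7.19.3
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)
y = np.array([23.0, 5.0, 28.0, 9.0])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [16.25, -9.25, 2.25]
residuals = y - X @ coef
ss_error = float(residuals @ residuals)   # 0.25, on 1 degree of freedom
s = (ss_error / 1) ** 0.5                 # 0.5, the background-noise estimate
```

The single leftover degree of freedom is exactly the information that was previously tied up in the β12 coefficient of the full model.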

Figure 7.19.7. The reduced analysis results (output from Minitab v14).

Factorial Fit: y versus A, B

Estimated Effects and Coefficients for y (coded units)

Term        Effect     Coef    SE Coef      T        P
Constant               16.250   0.2500    65.00    0.010
A          −18.500    −9.250   0.2500   −37.00    0.017
B            4.500     2.250   0.2500     9.00    0.070

S = 0.5    R-Sq = 99.93%    R-Sq(adj) = 99.79%

Analysis of Variance for y (coded units)

Source           DF   Seq SS    Adj SS    Adj MS     F       P
Main Effects      2   362.500   362.500   181.250   725.00   0.026
Residual Error    1     0.250     0.250     0.250
Total             3   362.750


By simply "pooling" the Interaction coefficient β12 into the error term, a number of statistics become available:

  • The initial place to look is the p-value, which in this case indicates that the Main Effects are significant; the Main Effects are 725 times greater than the background noise, with a likelihood of occurrence of p = 0.026. This is less than the standard cutoff of 0.05.[36]

    [36] In fact, in DOE the cutoff used for the p-value in the initial stages of analysis is p = 0.1, so that no important coefficients are eliminated accidentally. DOEs are efficient with data, so a more cautious approach is used.

  • The R-Sq value of 99.93% means that, of all the variation in the data, 99.93% of it is explained by the model.

  • R-Sq(adj) is close to R-Sq, which means there probably aren't any redundant terms in the model. If R-Sq(adj) drops well below R-Sq, then there are terms in the model that don't contribute anything.

  • S = 0.5 is the standard deviation of the background noise, which is small compared with the size of the Main Effects.

The output in Figure 7.19.7 doesn't show a key piece of information called Percentage Contribution, otherwise known as Epsilon². In simple terms, this is the amount of variation each Factor explains out of the total variation seen in the data. Figure 7.19.8 shows a separate ANOVA analysis run in the same software to generate the numbers needed to calculate this.

Figure 7.19.8. Balanced ANOVA analysis to generate Epsilon² (output from Minitab v14).

Analysis of Variance for Y

Source    DF    SS        MS        F         P
Temp       1    342.25    342.25    1369.00   0.017
Conc       1     20.25     20.25      81.00   0.070
Error      1      0.25      0.25
Total      3    362.75

S = 0.5    R-Sq = 99.93%    R-Sq(adj) = 99.79%


Epsilon² is defined as follows:

Epsilon² (Factor) = {SS(Factor) ÷ SS(Total)} × 100
For the example shown in Figure 7.19.8, the Epsilon² values are

  • Epsilon² (Temp) = {SS(Temp) ÷ SS(Total)} × 100 = {342.25 ÷ 362.75} × 100 = 94.35%

  • Epsilon² (Conc) = {SS(Conc) ÷ SS(Total)} × 100 = {20.25 ÷ 362.75} × 100 = 5.58%
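The arithmetic is a one-liner per Factor. A sketch using the sums of squares from Figure 7.19.8:

```python
# Sums of squares from the balanced ANOVA table
ss = {"Temp": 342.25, "Conc": 20.25, "Error": 0.25}
ss_total = sum(ss.values())   # 362.75, matching the Total row

epsilon2 = {name: value / ss_total * 100 for name, value in ss.items()}
# Temp ~ 94.35%, Conc ~ 5.58%, Error ~ 0.07%
```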

This can be seen graphically in Figure 7.19.9, which really demonstrates how large a contribution is made by Temperature (interestingly, the Error term is so small that it is invisible on the graphic).

Figure 7.19.9. Graphical representation of Epsilon².


Temperature has such a large effect on the Yield that (unless an X was missed along the way) controlling Temperature alone would go a long way to keeping this process under control.

The next question to be answered at this point would be "at what value should Temperature be set to gain the maximum Yield?" That could be done by taking Temperature (and possibly Concentration) as the X Variable in a Regression Study, or by taking Concentration and Temperature forward into an Optimizing DOE (see "DOE - Optimizing" in this chapter).

Roadmap

The roadmap to designing, analyzing, and interpreting a Characterizing DOE is as follows:

Step 1.

Identify the Factors (Xs) and Responses (Ys) to be included in the experiment. Each Y should have a verified Measurement System. Factors should come from the earlier narrowing-down tools in the Lean Sigma roadmap and should not merely have been brainstormed at this point.

Step 2.

Determine the levels for each Factor, which are the Low and High values for the 2-Level Factorial. For example, for Temperature it could be 60°C and 80°C. Choice of these levels is important in that if they are too close together then the Response might not change enough across the levels to be detectable. If they are too far apart then all of the action could have been overlooked in the center of the design. The phrase that best sums up the choice of levels is "to be bold, but not reckless." Push the levels outside regular operating conditions, but not so far as to make the process completely fail or to make the process unsafe in any way.

Step 3.

Determine the Design to use. For 2-Level Full Factorials this is simple because there is only one Design available for a given number of Factors. A choice has to be made about how to add additional "redundant" points to get a good estimate of error. The standard approach is to add two to four Center Points in the middle of the Design at point (0,0,..., 0). This somewhat alleviates the problem of pushing the Levels too far apart as described in Step 2. More than four Center Points doesn't provide much additional information for the investment needed. For two or three Factors, two Center Points usually is fine.

Step 4.

Create the Design Matrix in the software with columns for the Response and each Factor. An example is shown in Figure 7.19.10 for three Factors and two Responses, Yield and Strength. The software created the additional columns (C1-C4) in this case to form part of its analysis.

Figure 7.19.10. Example data entry for a Characterizing DOE (output from Minitab v14).


Step 5.

Collect the data from the process as per the runs in the Design Matrix in Step 4. The runs should be conducted in random order to ensure that external Noise Variables don't affect the results. The corresponding Y value(s) for each run should be entered directly into the Design Matrix in the software. Be sure to keep a backup copy of all data.
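Randomizing the run order is typically left to the DOE software, but the idea is simply a shuffle of the standard-order run list. A sketch (seeded here only so the result is repeatable):

```python
import random

runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]   # 2^2 design, standard order
run_order = runs[:]                           # copy; keep the original intact
random.Random(1).shuffle(run_order)           # the order the runs are executed in
```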

Step 6.

Run the DOE analysis in the software, including all terms in the model (all Factors and all Interactions). If the Design has more than three Factors, run the model including only 2-way and 3-way Interactions; Interactions among four or more variables rarely, if ever, occur in practice.

Step 7.

Most software packages give a visual representation of the effects of each of the Factors and Interactions, which can be invaluable in reducing the model (eliminating terms in the model with negligibly small coefficients). An example of a 4-Factor Full Factorial is shown in Figure 7.19.11. The focus should be on the highest-order Interaction(s) first. The vertical line at 6.3 in the figure represents the p-value cutoff point, in this case p = 0.10. The higher value of 0.1 (versus 0.05) is used to avoid accidentally eliminating any important effects. Focusing on the highest-order Interactions, it is clear that all four 3-way Interactions fall below the cutoff line and should be eliminated from the model. Also, any 2-way Interactions with effects smaller than those of the 3-way Interactions being eliminated should be removed as well, in this case AC, CD, and AD.

Figure 7.19.11. Example Pareto Chart of Effects for a 4-Factor Full Factorial (output from Minitab v14).


The numerical p-values for each of the Factors and Interactions should be available in the software. Here they are shown in Figure 7.19.12. It is clear that the elimination of the 3-way Interactions makes sense because their p-values are all well above 0.1.

Figure 7.19.12. Numerical analysis of 4-Factor Full Factorial (output from Minitab v14).

Factorial Fit: Convert versus Cat-Chrg, Temp, Pressure, Conc

Estimated Effects and Coefficients for Convert (coded units)

Term                      Effect    Coef    SE Coef     T       P
Constant                           72.250   0.1250   578.00   0.001
Cat-Chrg                   8.000    4.000   0.1250    32.00   0.020
Temp                      24.000   12.000   0.1250    96.00   0.007
Pressure                   2.250    1.125   0.1250     9.00   0.070
Conc                       5.500    2.750   0.1250    22.00   0.029
Cat-Chrg*Temp              1.000    0.500   0.1250     4.00   0.156
Cat-Chrg*Pressure          0.750    0.375   0.1250     3.00   0.205
Cat-Chrg*Conc              0.000    0.000   0.1250     0.00   1.000
Temp*Pressure              1.250    0.625   0.1250     5.00   0.126
Temp*Conc                  4.500    2.250   0.1250    18.00   0.035
Pressure*Conc              0.250    0.125   0.1250     1.00   0.500
Cat-Chrg*Temp*Pressure     0.750    0.375   0.1250     3.00   0.205
Cat-Chrg*Temp*Conc         0.500    0.250   0.1250     2.00   0.295
Cat-Chrg*Pressure*Conc     0.250    0.125   0.1250     1.00   0.500
Temp*Pressure*Conc         0.750    0.375   0.1250     3.00   0.205


For Designs using Center Points, inspect the F-test for Curvature. If the p-value here is large, then there is no Curvature effect to analyze, and the Center Point runs too can be pooled in to calculate the error term.

Step 8.

Based on the preceding results, rerun the reduced model with only the significant effects.

In some analyses, the software does not allow the term for a Factor, such as C, to be removed from the model because the Factor in question appears in a higher-order Interaction, such as A*C, that is not being eliminated from the model. This is known as hierarchy. In this case, just keep the Factor in place; it could be eliminated later, or it might have to stay in the final model even if it does not contribute much by itself, only through the Interaction.

For the 4-Factor example, the results are shown in Figure 7.19.13. At this point, the estimate of error should be good because eight pieces of information have been pooled to create it (DF = 8 for the Residual Error). According to the Lean Sigma analysis rule for p-values less than 0.05, this is as far as the analysis can go, because all the p-values are below 0.05; the associated Factors or Interactions are significant. However, it seems unlikely that all eight remaining terms really contribute to the big process picture. To understand their contribution, the Epsilon² value should be calculated.

Figure 7.19.13. Numerical analysis for the reduced model of 4-Factor Full Factorial (output from Minitab v14).

Factorial Fit: Convert versus Cat-Chrg, Temp, Pressure, Conc

Estimated Effects and Coefficients for Convert (coded units)

Term             Effect    Coef    SE Coef     T       P
Constant                  72.250   0.2577   280.37   0.000
Cat-Chrg          8.000    4.000   0.2577    15.52   0.000
Temp             24.000   12.000   0.2577    46.57   0.000
Pressure          2.250    1.125   0.2577     4.37   0.002
Conc              5.500    2.750   0.2577    10.67   0.000
Cat-Chrg*Temp     1.000    0.500   0.2577     1.94   0.088
Temp*Pressure     1.250    0.625   0.2577     2.43   0.042
Temp*Conc         4.500    2.250   0.2577     8.73   0.000

S = 1.03078    R-Sq = 99.70%    R-Sq(adj) = 99.43%

Analysis of Variance for Convert (coded units)

Source               DF   Seq SS    Adj SS    Adj MS     F       P
Main Effects          4   2701.25   2701.25   675.313   635.59   0.000
2-Way Interactions    3     91.25     91.25    30.417    28.63   0.000
Residual Error        8      8.50      8.50     1.063
Total                15   2801.00


Step 9.

Calculate Epsilon² for each significant effect in the reduced model. To do this, the Sequential Sum of Squares (Seq SS) for each effect is represented as a percentage of the Total Seq SS (2801.00 in this case). Listing the Seq SS for each effect separately in Minitab requires another function called Balanced ANOVA; most other statistical software packages provide this in the ANOVA function. The results can be seen in Figure 7.19.14.

Figure 7.19.14. Epsilon² Calculation for the 4-Factor example (output from Minitab v14 with Epsilon² hand calculated and typed in).


It is clear from the figure that only the following effects give a meaningful contribution; the other effects are negligible:

  • Catalyst-Charge

  • Temperature

  • Concentration

  • Temperature-Concentration Interaction (even this is debatable because it is so small)

Step 10.

Run the final reduced model for the effects identified in Step 9. The final ANOVA table is shown in Figure 7.19.15. The model explains 98.61% of the variation in the run data (R-Sq). The R-Sq(adj) value is close to the R-Sq value, which indicates that there aren't any redundant terms in the model. The Lack of Fit of the model is insignificant (the p-value is well above 0.05). By closely controlling the three Factors in question (the Interaction is also taken care of if the Factors are controlled because it comprises two of the remaining Factors), about 98.61% of the variability in the process is contained.
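The R-Sq figures quoted can be recovered directly from the ANOVA table. A sketch of the arithmetic:

```python
# From the final ANOVA table: Residual Error and Total rows
ss_error, df_error = 39.00, 11
ss_total, df_total = 2801.00, 15

r_sq = 1 - ss_error / ss_total                                # ~0.9861 -> 98.61%
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)  # ~0.9810 -> 98.10%
```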

Figure 7.19.15. Final analysis results for 4-Factor example (output from Minitab v14).

Factorial Fit: Convert versus Cat-Chrg, Temp, Conc

Estimated Effects and Coefficients for Convert (coded units)

Term        Effect    Coef    SE Coef     T       P
Constant             72.250   0.4707   153.48   0.000
Cat-Chrg     8.000    4.000   0.4707     8.50   0.000
Temp        24.000   12.000   0.4707    25.49   0.000
Conc         5.500    2.750   0.4707     5.84   0.000
Temp*Conc    4.500    2.250   0.4707     4.78   0.001

S = 1.88294    R-Sq = 98.61%    R-Sq(adj) = 98.10%

Analysis of Variance for Convert (coded units)

Source               DF   Seq SS    Adj SS    Adj MS     F       P
Main Effects          3   2681.00   2681.00   893.667   252.06   0.000
2-Way Interactions    1     81.00     81.00    81.000    22.85   0.001
Residual Error       11     39.00     39.00     3.545
Lack of Fit           3      5.00      5.00     1.667     0.39   0.762
Pure Error            8     34.00     34.00     4.250
Total                15   2801.00
    


Step 11.

Formulate conclusions and recommendations. The output of a Characterizing Design is not the optimum settings for all of the Factors identified; it merely identifies the critical process Factors. Recommendations here would certainly include better control on the three identified Factors, but quite possibly a subsequent Optimizing DOE to determine the optimum settings for the three factors.

Other Options

Some potential elements that were not considered in the preceding examples include the following:

  • Attribute Y data: In the preceding analyses, all of the Ys were Continuous data, which made the statistics work more readily. If the Y is Attribute data, then often a key assumption of equal variance across the design is violated. To remove this issue, the Attribute data has to be transformed, which is considered well beyond the scope of this book.

  • Blocking on a Noise Variable: The preceding examples did not include any Blocked Variables. Blocks are added if the Team is not certain whether a potential Noise Variable could have an effect. The Noise Variable is tracked for the experiment and effectively entered as an additional X in the software. After the initial analysis run, when the first reductions are made, the effect of the Block would appear with its own p-value. If the p-value is above 0.1, then eliminate the Block from the model, as it has no effect. If the Block is important, then it should be included in the model and in subsequent analyses and experimental designs.

  • Center Points and Curvature: Center Points are useful for finding good estimates of Error, but they also allow investigation of Curvature in the model. When the initial analysis is run, the Curvature should have its own p-value. Again, if the p-value is high, then there is no significant effect from Curvature, and the Curvature term can be eliminated from the reduced model going forward. Curvature is examined in greater detail in "DOE - Optimizing" in this chapter.




Lean Sigma: A Practitioner's Guide
ISBN: 0132390787
Year: 2006
Pages: 138
