
June 2005

Model Reduction in Designed Experiments

By Keith M. Bower

To understand the relationship between variables, Six Sigma practitioners may need to perform a
designed experiment. When the results are analyzed using the Analysis of Variance (ANOVA) procedure, it
may be appropriate to remove terms from the full model. This article describes a methodology for
removing terms and explains why this approach may be legitimate.

Models in Designed Experiments

It is important to note that the mathematical models used in designed experiments are merely attempts to
understand relationships between variables. As George Box once observed, all models are wrong, but
some are useful.

When performing a designed experiment, the experimenter chooses the variables and factor levels in the
hope of gaining key insights into these relationships. Techniques such as blocking, replication and
randomization may be appropriate in this endeavor.¹

From the outset, an experimenter typically considers the full model, namely a model that includes higher-
order interactions as well as main effects. Once the experiment has been performed and the results
obtained, it may be appropriate to remove terms that do not explain a “significant” amount of the total
variation. In practice, the experimenter tries to identify the main and interaction effects that may be
driving the relationship under investigation; owing to the sparsity-of-effects principle, there are usually
only a few of them.

A hypothetical example, used solely for pedagogical purposes, illustrates the methodology behind model
reduction.

Example

Consider a chemical reaction in which four temperature settings (100°C, 120°C, 140°C and 160°C) and
two pressure settings (200 PSI and 1000 PSI) are to be investigated. Each of the eight temperature and
pressure combinations is tested five times, in a random order, giving forty observations in total.
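To make the randomization step concrete, here is a minimal Python sketch that lists the 4 × 2 × 5 = 40 runs and shuffles them into a random run order. The constant names, the function name and the seed are illustrative assumptions, not part of the original study; a real plan would record the order actually used.

```python
import itertools
import random

# Factor levels from the example; the seed below is an arbitrary, hypothetical choice.
TEMPERATURES_C = [100, 120, 140, 160]
PRESSURES_PSI = [200, 1000]
REPLICATES = 5


def randomized_run_order(seed=2005):
    """Return every (temperature, pressure, replicate) run in a random order."""
    runs = [
        (temp, pressure, rep)
        for temp, pressure in itertools.product(TEMPERATURES_C, PRESSURES_PSI)
        for rep in range(1, REPLICATES + 1)
    ]
    rng = random.Random(seed)  # seeded so the same plan can be reproduced
    rng.shuffle(runs)
    return runs


for run_number, (temp, pressure, rep) in enumerate(randomized_run_order(), start=1):
    print(f"Run {run_number:2d}: {temp} C, {pressure} PSI (replicate {rep})")
```

Shuffling the complete list of runs, rather than randomizing within each temperature, is what protects the comparison of factor levels against drift over time.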

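Once the data are collected, the results are analyzed using the ANOVA procedure, employing the full
model shown in (1).

$$\text{Yield}_{ijk} = \mu + \text{Temperature}_i + \text{Pressure}_j + (\text{Temp} \times \text{Pressure})_{ij} + \varepsilon_{ijk} \tag{1}$$

where $i = 1, 2, 3, 4$ indexes the temperature settings, $j = 1, 2$ indexes the pressure settings and
$k = 1, \dots, 5$ indexes the replicate runs.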
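For a balanced design like this one, the sums of squares for the terms in model (1) can be computed directly from the cell, row, column and grand means. The sketch below assumes equal replication in every cell; the function name `two_way_anova_ss` is hypothetical, and any data fed to it here are illustrative, since the article's raw yields are not published.

```python
from statistics import fmean


def two_way_anova_ss(data):
    """Sums of squares and degrees of freedom for a balanced two-factor design.

    data[i][j] is the list of replicate yields at level i of factor A
    (temperature) and level j of factor B (pressure); every cell must hold
    the same number of replicates.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    all_y = [y for row in data for cell in row for y in cell]
    grand = fmean(all_y)
    cell_mean = [[fmean(cell) for cell in row] for row in data]
    # In a balanced design the marginal means equal the means of the cell means.
    a_mean = [fmean(row) for row in cell_mean]
    b_mean = [fmean(cell_mean[i][j] for i in range(a)) for j in range(b)]

    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_mean),
        "B": a * n * sum((m - grand) ** 2 for m in b_mean),
        "AB": n * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((y - cell_mean[i][j]) ** 2
                     for i in range(a) for j in range(b) for y in data[i][j]),
        "Total": sum((y - grand) ** 2 for y in all_y),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "Error": a * b * (n - 1), "Total": a * b * n - 1}
    return ss, df
```

With balanced data the four component sums of squares add exactly to the total (and the degrees of freedom to 39 for a 4 × 2 × 5 layout), which is the additive property the rest of the article relies on. Statistical packages report the same quantities along with the F and P columns.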

www.asq.org/sixsigma
Table 1. ANOVA table for the full model (1)

Source         DF   Seq SS   Adj SS   Adj MS      F       P
Temp            3   457.28   457.28   152.43   5.25   0.005
Pressure        1     0.35     0.35     0.35   0.01   0.913
Temp*Pressure   3    38.09    38.09    12.70   0.44   0.728
Error          32   928.19   928.19    29.01
Total          39  1423.91
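The Adj MS and F columns of Table 1 follow from the other columns: each mean square is its sum of squares divided by its degrees of freedom, and each F ratio is that mean square divided by the error mean square. A short Python check using the tabulated values (the names `TERMS` and `mean_squares_and_f` are illustrative):

```python
# Degrees of freedom and adjusted sums of squares copied from Table 1.
TERMS = {
    "Temp": (3, 457.28),
    "Pressure": (1, 0.35),
    "Temp*Pressure": (3, 38.09),
}
ERROR_DF, ERROR_SS = 32, 928.19
TOTAL_DF, TOTAL_SS = 39, 1423.91


def mean_squares_and_f(terms, error_df, error_ss):
    """Return ({term: (MS, F)}, MS_error), with MS = SS / DF and F = MS / MS_error."""
    ms_error = error_ss / error_df
    rows = {}
    for name, (df, ss) in terms.items():
        ms = ss / df
        rows[name] = (ms, ms / ms_error)
    return rows, ms_error


# The DF and SS columns should each add up to the Total line.
assert sum(df for df, _ in TERMS.values()) + ERROR_DF == TOTAL_DF
assert abs(sum(ss for _, ss in TERMS.values()) + ERROR_SS - TOTAL_SS) < 0.01

rows, ms_error = mean_squares_and_f(TERMS, ERROR_DF, ERROR_SS)
print(f"Error MS = {ms_error:.2f}")
for name, (ms, f) in rows.items():
    print(f"{name:14s} MS = {ms:7.2f}  F = {f:.3f}")
```

Because the inputs are the rounded values printed in Table 1, the recomputed figures agree with the table only to within rounding (the Temp F ratio, for example, comes out at about 5.255). The P column requires the F distribution and is not reproduced here.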

From Table 1, the total sum of squares is 1423.91, with 39 degrees of freedom.² Practitioners frequently,
and mistakenly, disregard this final line of the ANOVA table. However, dividing 1423.91 by 39 gives the
sample variance of the forty observations (about 36.5). The ANOVA table shows how the total sum of
squares, and hence this sample variance, is decomposed into its constituent sources (this is, after all,
why it is called the analysis of variance). As R.A. Fisher explained:³

When the variation of any quantity (variate) is produced by the action of two or more independent
causes, it is known that the variance produced by all the causes simultaneously in operation is the
sum of the values of the variance produced by each cause separately…The property of the variance,
by which each independent cause makes its own contribution to the total, enables us to analyse the
total, and to assign, with more or less accuracy, the several portions to their appropriate causes, or
groups of causes.
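The link between the Total line and the sample variance can be checked numerically: the total sum of squares about the grand mean, divided by n − 1, is exactly the sample variance that Python's `statistics.variance` computes. The yields below are simulated stand-ins, not the article's data, and `total_ss_and_df` is a hypothetical helper.

```python
import random
import statistics


def total_ss_and_df(values):
    """Total sum of squares about the grand mean, and its degrees of freedom."""
    mean = statistics.fmean(values)
    return sum((y - mean) ** 2 for y in values), len(values) - 1


# Hypothetical yields: 40 simulated observations, NOT the article's data.
rng = random.Random(1)
yields = [rng.gauss(60, 6) for _ in range(40)]

ss_total, df_total = total_ss_and_df(yields)
print(ss_total / df_total, statistics.variance(yields))  # equal up to rounding error

# The same arithmetic applied to the Total line of Table 1:
print(round(1423.91 / 39, 2))  # 36.51
```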

In this example, Temp accounts for 457.28 of the 1423.91 total sum of squares (roughly 32% of the
variation), while using only 3 of the 39 available degrees of freedom. In comparison, neither the Pressure
main effect nor the Temp*Pressure interaction appears to be an important source of variation. Neither is
statistically significant either: their P-values (0.913 and 0.728, respectively) are both well above 0.05.
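Each source's share of the total sum of squares (sometimes called eta-squared) can be read straight off Table 1. A minimal sketch, with illustrative names:

```python
# Sums of squares copied from Table 1.
SS = {"Temp": 457.28, "Pressure": 0.35, "Temp*Pressure": 38.09, "Error": 928.19}
TOTAL_SS = 1423.91


def variation_shares(ss, total):
    """Percentage of the total sum of squares attributed to each source."""
    return {source: 100 * value / total for source, value in ss.items()}


for source, pct in variation_shares(SS, TOTAL_SS).items():
    print(f"{source:14s} {pct:6.2f}%")
```

On these figures Temp accounts for about 32% of the total sum of squares, Temp*Pressure under 3%, Pressure a negligible 0.02% and error the remaining 65% or so.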

The evidence in the ANOVA table therefore suggests that this system is driven primarily by the main effect
of Temp. The other terms in the model are pooled into the error term, since they are regarded as being,
essentially, noise.
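Pooling a term into error amounts to adding its degrees of freedom and sum of squares to the error line. For a balanced design such as this one, refitting the reduced model Yield = µ + Temperature + ε gives the same error sum of squares as this pooling. The sketch below applies it to the rounded Table 1 values; `pool_into_error` is a hypothetical helper name.

```python
# Values copied from Table 1 (rounded to two decimals in the source).
TEMP_DF, TEMP_SS = 3, 457.28
POOLED_TERMS = {"Pressure": (1, 0.35), "Temp*Pressure": (3, 38.09)}
ERROR_DF, ERROR_SS = 32, 928.19


def pool_into_error(error_df, error_ss, terms):
    """Add the DF and SS of the dropped terms to the error line."""
    df = error_df + sum(d for d, _ in terms.values())
    ss = error_ss + sum(s for _, s in terms.values())
    return df, ss


pooled_df, pooled_ss = pool_into_error(ERROR_DF, ERROR_SS, POOLED_TERMS)
pooled_ms = pooled_ss / pooled_df
temp_f = (TEMP_SS / TEMP_DF) / pooled_ms
print(pooled_df, round(pooled_ss, 2), round(pooled_ms, 2), round(temp_f, 2))
```

This gives 36 error degrees of freedom, a pooled error sum of squares of 966.63, an error mean square of about 26.85 and an F ratio for Temp of about 5.68. A P-value would again require the F distribution, here with 3 and 36 degrees of freedom.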

The reduced model would then be refit and the residuals assessed in the usual manner to check that the
model assumptions are approximately met.⁴
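Under the reduced model, the fitted value for each run is simply the mean yield at its temperature, so the residuals are the observations minus their group means. The sketch below computes them and prints the residual standard deviation at each temperature as a rough equal-variance check; in practice one would also examine a normal probability plot of the residuals and a plot of residuals against fitted values. The data are simulated and `one_way_residuals` is a hypothetical name.

```python
import random
from statistics import fmean, stdev


def one_way_residuals(groups):
    """Residuals for the reduced model Yield = mu + Temperature_i + error.

    groups maps each temperature level to its observed yields; the fitted
    value for every run at that level is the level's mean yield.
    """
    residuals = {}
    for level, ys in groups.items():
        fitted = fmean(ys)
        residuals[level] = [y - fitted for y in ys]
    return residuals


# Hypothetical data: 10 simulated yields per temperature (2 pressures x 5
# replicates collapse into one group once Pressure is dropped). NOT real data.
rng = random.Random(7)
groups = {t: [rng.gauss(50 + 0.1 * t, 5) for _ in range(10)]
          for t in (100, 120, 140, 160)}

for level, res in one_way_residuals(groups).items():
    # Roughly similar spreads are consistent with the equal-variance assumption.
    print(level, round(stdev(res), 2))
```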

Summary

When considering sources of variation, the ANOVA table provides key information for model reduction
purposes. The Six Sigma practitioner needs to use common sense, along with process knowledge, to
distinguish the important few sources of variation from the trivial many.

About the Author

Keith M. Bower is a statistician and webmaster for www.KeithBower.com, a site devoted to providing
access to online learning materials for quality improvement using statistical methods. He received a
bachelor’s degree in mathematics with economics from Strathclyde University in Great Britain and a
master’s degree in quality management and productivity from the University of Iowa in Iowa City,
USA. He is a member of ASQ and the Six Sigma Forum.

References

1. For more information on these terms, see Keith M. Bower, “Some Comments on Historical
Designed Experiments,” ASQ Six Sigma Forum, November 2004.
2. For a discussion of degrees of freedom, see Keith M. Bower, “Why Divide by n-1?” ASQ Six
Sigma Forum, February 2005.
3. R.A. Fisher, “Studies in Crop Variation. I. An Examination of the Yield of Dressed Grain from
Broadbalk,” Journal of Agricultural Science, 11 no. 2 (1920), 110-111.
4. For information on assessing model assumptions, see Douglas C. Montgomery, Design and
Analysis of Experiments (New York: John Wiley and Sons, Inc., 2004), 75-85.
