
Experimental Design
Unified Concepts, Practical
Applications, and Computer
Implementation
Bruce L. Bowerman, Richard T. O'Connell, and
Emily S. Murphree

Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation

Copyright © Business Expert Press, LLC, 2015.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations, not to exceed 400 words, without the prior permission of the publisher.
First published in 2015 by
Business Expert Press, LLC
222 East 46th Street, New York, NY 10017
www.businessexpertpress.com
ISBN-13: 978-1-60649-958-0 (paperback)
ISBN-13: 978-1-60649-959-7 (e-book)
Business Expert Press Quantitative Approaches to Decision Making
Collection
Collection ISSN: 2163-9515 (print)
Collection ISSN: 2163-9582 (electronic)
Cover and interior design by Exeter Premedia Services Private Ltd.,
Chennai, India
First edition: 2015
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America.

Abstract
Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation is a concise and innovative book that gives a complete
presentation of the design and analysis of experiments in approximately
one half the space of competing books. With only the modest prerequisite
of a basic (noncalculus) statistics course, this text is appropriate for the
widest possible audience.

Keywords
experimental design, fractional factorials, Latin square designs, nested
designs, one factor analysis, one-way ANOVA, randomized block design,
response surfaces, split plot design, two factor analysis, two level factorial
designs, two-way ANOVA

Contents
Preface ........................................................................ix
Chapter 1   An Introduction to Experimental Design:
            One Factor Analysis ...........................................1
Chapter 2   Two Factor Analysis ........................................45
Chapter 3   More Advanced Experimental Designs ..........125
Chapter 4   Two Level Factorials, Fractional Factorials,
            Block Confounding, and Response Surfaces ....179
Appendix A  Statistical Tables .........................................249
References .................................................................257
Index ........................................................................259

Preface
Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation is a concise and innovative book that gives a complete presentation of the design and analysis of experiments in approximately one half the space of competing books. With only the modest prerequisite of a basic (noncalculus) statistics course, this text is appropriate for the widest possible audience: college juniors, seniors, and first year graduate students in business, the social sciences, the sciences, and statistics, as well as professionals in business and industry. Using a unique and integrative approach, this text organizes and presents the two procedures for analyzing experimental design data, analysis of variance (ANOVA) and regression analysis, in such a way that the reader or instructor can move through the material more quickly and efficiently than when using competing books and so that the true advantages of both ANOVA and regression analysis are made clearer.
Because ANOVA is more intuitive, this book devotes most of its first
three chapters to showing how to use ANOVA to analyze the type of
experimental design data that it can be validly used to analyze: balanced
(equal sample size) data or unbalanced (unequal sample size) data from
one factor studies, balanced data from two factor studies (two-way
factorials and randomized block designs), and balanced data from three
or more factor studies. Chapter 3 includes a general ANOVA procedure
for analyzing balanced data experiments.
Regression analysis can be used to analyze almost any balanced or
unbalanced data experiment but is less intuitive than ANOVA. Therefore,
this book waits to discuss regression analysis until it is needed to analyze
data that cannot be analyzed by ANOVA. This is in Section 2.4, where the analysis of unbalanced data resulting from two-way factorials is discussed. Waiting until Section 2.4 gives more space to explain regression analysis from first principles to readers who have little or no background in this subject and also allows concise discussion of the regression analyses of one factor studies and incomplete block designs. Section 2.4 also introduces using regression to analyze experimental designs employing a covariate (the analysis of covariance), which is discussed in detail in
the companion book to this book: Regression Analysis: Unified Concepts,
Practical Applications, and Computer Implementation. Readers who wish
to study all of the ANOVA procedures in Chapters 1, 2, and 3 before
studying regression analysis may skip Section 2.4 without loss of continuity. Such readers would study Section 2.4 after completing Chapter 3 and
then proceed to Chapter 4. Chapter 4 gives (in our opinion) the clearest
and most informative discussion in any book of using regression analysis
to both understand and analyze data resulting from fractional factorial
and block confounding experiments. Chapter 4 also gives a short discussion of response surface methodology. In addition, all chapters feature
motivating examples and conclude with a section showing how to use
SAS and with a set of exercises. Excel, MINITAB, and SAS outputs are used throughout the text, and the book's website contains more exercises for each chapter.
Author Bruce Bowerman would like to thank Professor David
Nickerson of the University of Central Florida for motivating the writing
of this book. Author Bowerman would also like to thank Professor John
Skillings of Miami University for many helpful discussions concerning
experimental design. In this book we have used some examples and ideas
from an excellent, more theoretical experimental design book written by
Professor Skillings and Professor Don Weber (see the references). All three
authors would like to thank editor Scott Isenberg, production manager
Destiny Hadley, and permission editor Marcy Schnidewind, as well as the
fine people at Exeter, for their hard work. Most of all, we are indebted to
our families for their love and encouragement over the years.
Bruce L. Bowerman
Richard T. O'Connell
Emily S. Murphree

CHAPTER 1

An Introduction to
Experimental Design:
One Factor Analysis
1.1 Basic Concepts of Experimental Design
In many statistical studies a variable of interest, called the response
variable (or dependent variable), is identified. Then data are collected
that tell us about how one or more factors (or independent variables)
influence the variable of interest. If we cannot control the factor(s) being
studied, we say that the data obtained are observational. For example,
suppose that in order to study how the size of a home relates to the sales
price of the home, a real estate agent randomly selects 50 recently sold
homes and records the square footages and sales prices of these homes.
Because the real estate agent cannot control the sizes of the randomly
selected homes, we say that the data are observational.
If we can control the factors being studied, we say that the data are
experimental. Furthermore, in this case the values, or levels, of the
factor (or combination of factors) are called treatments. The purpose
of most experiments is to compare and estimate the effects of the different treatments on the response variable. For example, suppose that an
oil company wishes to study how three different gasoline types (A, B,
and C ) affect the mileage obtained by a popular compact automobile
model. Here the response variable is gasoline mileage, and the company
will study a single factor: gasoline type. Because the oil company can control which gasoline type is used in the compact automobile, the data that the oil company will collect are experimental. Furthermore, the treatments, the levels of the factor gasoline type, are gasoline types A, B, and C.
In order to collect data in an experiment, the different treatments
are assigned to objects (people, cars, animals, or the like) that are called
experimental units. For example, in the gasoline mileage situation, gasoline types A, B, and C will be compared by conducting mileage tests
using a compact automobile. The automobiles used in the tests are the
experimental units. In general, when a treatment is applied to more than
one experimental unit, it is said to be replicated. Furthermore, when
the analyst controls the treatments employed and how they are applied
to the experimental units, a designed experiment is being carried out.
A commonly used, simple experimental design is called the completely
randomized experimental design.
In a completely randomized experimental design, independent
random samples of experimental units are assigned to the treatments.
As illustrated in the following example, we can sometimes assign independent random samples of experimental units to the treatments by
assigning different random samples of experimental units to different
treatments.
Example 1.1
North American Oil Company is attempting to develop a reasonably
priced gasoline that will deliver improved gasoline mileages. As part of
its development process, the company would like to compare the effects
of three types of gasoline (A, B, and C ) on gasoline mileage. For testing purposes, North American Oil will compare the effects of gasoline
types A, B, and C on the gasoline mileage obtained by a popular compact
model called the Lance. Suppose the company has access to 1,000 Lances
that are representative of the population of all Lances, and suppose the
company will utilize a completely randomized experimental design that
employs samples of size five. In order to accomplish this, five Lances will
be randomly selected from the 1,000 available Lances. These autos will be
assigned to gasoline type A. Next, five different Lances will be randomly
selected from the remaining 995 available Lances. These autos will be
assigned to gasoline type B. Finally, five different Lances will be randomly selected from the remaining 990 available Lances. These autos will be assigned to gasoline type C.

Each randomly selected Lance is test driven using the appropriate gasoline type (treatment) under normal conditions for a specified distance, and the gasoline mileage for each test drive is measured. We let yij denote the jth mileage obtained when using gasoline type i. The mileage data obtained are given in Table 1.1. Here we assume that the set of gasoline mileage observations obtained by using a particular gasoline type is a sample randomly selected from the infinite population of all Lance mileages that could be obtained using that gasoline type. Examining the box plots shown below the mileage data, we see some evidence that gasoline type B yields the highest gasoline mileages.

Table 1.1 The Gasoline Mileage Data

Gasoline type A    Gasoline type B    Gasoline type C
yA1 = 34.0         yB1 = 35.3         yC1 = 33.3
yA2 = 35.0         yB2 = 36.5         yC2 = 34.0
yA3 = 34.3         yB3 = 36.4         yC3 = 34.7
yA4 = 35.5         yB4 = 37.0         yC4 = 33.0
yA5 = 35.8         yB5 = 37.6         yC5 = 34.9

[Box plots of mileage by gas type (A, B, C) accompany Table 1.1; the mileage axis runs from 33 to 38 mpg, with type B centered highest.]
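As a concrete companion to this example, here is a minimal Python sketch (ours, not the book's; the book's computer work uses Excel, MINITAB, and SAS) of a completely randomized assignment like the one just described, followed by each sample's mean and standard deviation from the Table 1.1 data. numpy is assumed, and the seed and variable names are arbitrary choices.

import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed so the assignment is reproducible

# Randomly select 15 of the 1,000 available Lances and split them,
# five per treatment, among gasoline types A, B, and C.
cars = rng.choice(np.arange(1, 1001), size=15, replace=False)
assignment = {"A": cars[:5], "B": cars[5:10], "C": cars[10:15]}

# The observed mileages from Table 1.1.
mileage = {
    "A": np.array([34.0, 35.0, 34.3, 35.5, 35.8]),
    "B": np.array([35.3, 36.5, 36.4, 37.0, 37.6]),
    "C": np.array([33.3, 34.0, 34.7, 33.0, 34.9]),
}
for g, y in mileage.items():
    # Sample mean and sample standard deviation for each gasoline type;
    # these reproduce 34.92/.7662, 36.56/.8503, and 33.98/.8349.
    print(g, round(y.mean(), 4), round(y.std(ddof=1), 4))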

1.2 One-Way Analysis of Variance


Suppose we wish to study the effects of p treatments (treatments
1, 2,..., p) on a response variable. For any particular treatment, say
treatment i, we define μi and σi to be the mean and standard deviation of the population of all possible values of the response variable that could potentially be observed when using treatment i. Here we refer to μi as treatment mean i. The goal of one-way analysis of variance (often called one-way ANOVA) is to estimate and compare the effects of the different treatments on the response variable. We do this by estimating and comparing the treatment means μ1, μ2, ..., μp. Here we assume that a sample has been randomly selected for each of the p treatments by employing a completely randomized experimental design. We let ni denote the size of the sample that has been randomly selected for treatment i, and we let yij denote the jth value of the response variable that is observed when using treatment i. The one factor model describing yij says that

$$y_{ij} = \mu_i + \epsilon_{ij} = \mu + \tau_i + \epsilon_{ij}$$
Here, μi = μ + τi is treatment mean i, μ is a parameter common to all treatments called the overall mean, and τi is a parameter unique to the ith treatment called the ith treatment effect. Furthermore, εij is an error term that tells us by how much yij deviates from μi. This error term describes the effects of all factors other than treatment i on yij. To give precise definitions of μ and τi, we impose the side condition that says that

$$\sum_{i=1}^{p} \tau_i = 0$$

This implies that μ., which we define to be the mean of the treatment means, is

$$\mu_. = \frac{\sum_{i=1}^{p} \mu_i}{p} = \frac{\sum_{i=1}^{p} (\mu + \tau_i)}{p} = \mu + \frac{\sum_{i=1}^{p} \tau_i}{p} = \mu$$

That is, the previously considered overall mean μ is equal to μ., the mean of the treatment means. Moreover, because μi = μ + τi, the treatment effect τi is equal to μi - μ = μi - μ., the difference between the ith treatment mean and the mean of the treatment means.

The point estimate of the ith treatment mean μi is

$$\bar{y}_i = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i}$$

the mean of the sample of ni values of the response variable observed when using treatment i. Moreover, the point estimate of σi, the standard deviation of the population of all possible values of the response variable that could potentially be observed when using treatment i, is

$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{n_i - 1}}$$

the standard deviation of the sample of ni values of the response variable observed when using treatment i.
For example, consider the gasoline mileage situation. We let μA, μB, and μC denote the means and σA, σB, and σC denote the standard deviations of the populations of all possible gasoline mileages using gasoline types A, B, and C. To estimate these means and standard deviations, North American Oil has employed a completely randomized experimental design and has obtained the samples of mileages in Table 1.1. The means of these samples, ȳA = 34.92, ȳB = 36.56, and ȳC = 33.98, are the point estimates of μA, μB, and μC. The standard deviations of these samples, sA = .7662, sB = .8503, and sC = .8349, are the point estimates of σA, σB, and σC. Using these point estimates, we will (later in this section) test to see whether there are any statistically significant differences between the treatment means μA, μB, and μC. If such differences exist, we will estimate the magnitudes of these differences. This will allow North American Oil to judge whether these differences have practical importance.
The one-way ANOVA formulas allow us to test for significant differences between treatment means and allow us to estimate differences between treatment means. The validity of these formulas requires that the
following three ANOVA assumptions hold:
1. Constant variance: the p populations of values of the response variable associated with the treatments have equal variances. We denote the constant variance as σ².
2. Normality: the p populations of values of the response variable associated with the treatments all have normal distributions.
3. Independence: the different yij response variable values are statistically independent of each other.
Because the previously described process of randomly assigning experimental units to the treatments implies that each yij can be assumed to be a randomly selected response variable value, the ANOVA assumptions say that each yij is assumed to have been randomly and independently selected from a population of response variable values that is normally distributed with mean μi and variance σ². Stated in terms of the error term of the one factor model yij = μi + εij, the ANOVA assumptions say that each εij is assumed to have been randomly and independently selected from a population of error term values that is normally distributed with mean zero and variance σ².
The one-way ANOVA results are not very sensitive to violations of
the equal variances assumption. Studies have shown that this is particularly true when the sample sizes employed are equal (or nearly equal).
Therefore, a good way to make sure that unequal variances will not be a
problem is to take samples that are the same size. In addition, it is useful
to compare the sample standard deviations s1, s2, ..., sp to see if they are reasonably equal. As a general rule, the one-way ANOVA results will be
approximately correct if the largest sample standard deviation is no more than
twice the smallest sample standard deviation. The variations of the samples
can also be compared by constructing a box plot for each sample (as we
have done for the gasoline mileage data in Table 1.1). Several statistical
tests also employ the sample variances to test the equality of the population variances. See Section 1.3.
The normality assumption says that each of the p populations is normally distributed. This assumption is not crucial. It has been shown that the one-way ANOVA results are approximately valid for mound-shaped distributions. It is useful to construct a box plot or a stem-and-leaf display for each sample. If the distributions are reasonably symmetric, and if there are no outliers, the ANOVA results can be trusted for sample sizes as small as 4 or 5. As an example, consider the gasoline mileage study of Example 1.1. The box plots of Table 1.1 suggest that the variability of the mileages in each of the three samples is roughly the same. Furthermore, the sample standard deviations sA = .7662, sB = .8503, and sC = .8349 are reasonably equal (the largest is not even close to twice the smallest). Therefore, it is reasonable to believe that the constant variance assumption is satisfied. Moreover, because the sample sizes are the same, unequal variances would probably not be a serious problem anyway. Many small, independent factors influence gasoline mileage, so the distributions of mileages for gasoline types A, B, and C are probably mound-shaped. In addition, the box plots of Table 1.1 indicate that each distribution is roughly symmetric with no outliers. Thus, the normality assumption probably approximately holds. Finally, because North American Oil employed a completely randomized design, with each of the fifteen different response variable values (gasoline mileages) being obtained by using a different experimental unit (car), the independence assumption probably holds.
1.2.1 Testing for Significant Differences Between Treatment Means
As a preliminary step in one-way ANOVA, we wish to determine whether
there are any statistically significant differences between the treatment means
μ1, μ2, ..., μp. To do this, we test the null hypothesis

H0: μ1 = μ2 = ... = μp

which says that the p treatment means are equal. Moreover, because we have seen that the ith treatment effect τi is equal to μi - μ, the null hypothesis H0: μ1 = μ2 = ... = μp is equivalent to the null hypothesis

H0: τ1 = τ2 = ... = τp = 0

That is, the null hypothesis H0 of equal treatment means is equivalent to the null hypothesis that all treatment effects are zero, which says that all the treatments have the same effect on the mean response. We test H0 versus the alternative hypothesis

Ha: At least two of μ1, μ2, ..., μp differ

or, equivalently,

Ha: At least two of τ1, τ2, ..., τp do not equal zero

This alternative says that at least two treatments have different effects on
the mean response.
To carry out such a test, we compare what we call the between-
treatment variability to the within-treatment variability. We can
understand and numerically measure these two types of variability by
defining several sums of squares and mean squares. To begin to do this
we define n to be the total number of experimental units employed in the
one-way ANOVA, and we define y to be the overall mean of all observed
values of the response variable. Then we define the following:
The treatment sum of squares is
p

SST = ni ( yi y )2
i =1

In order to compute SST , we calculate the difference between each sample treatment mean yi and the overall mean y , we square each of these
differences, we multiply each squared difference by the number of observations for that treatment, and we sum over all treatments. The SST measures the variability of the sample treatment means. For instance, if all the
sample treatment means ( yi values) were equal, then the treatment sum
of squares would be equal to 0. The more the yi values vary, the larger will

AN INTRODUCTION TO EXPERIMENTAL DESIGN

be SST . In other words, the treatment sum of squares measures the amount
of between-treatment variability.
As an example, consider the gasoline mileage data in Table 1.1. In this
experiment we employ a total of
n = nA + nB + nC = 5 + 5 + 5 = 15
experimental units. Furthermore, the overall mean of the 15 observed gasoline mileages is

$$\bar{y} = \frac{34.0 + 35.0 + \cdots + 34.9}{15} = \frac{527.3}{15} = 35.153$$

Then

$$SST = \sum_{i=A,B,C} n_i (\bar{y}_i - \bar{y})^2 = n_A(\bar{y}_A - \bar{y})^2 + n_B(\bar{y}_B - \bar{y})^2 + n_C(\bar{y}_C - \bar{y})^2$$
$$= 5(34.92 - 35.153)^2 + 5(36.56 - 35.153)^2 + 5(33.98 - 35.153)^2 = 17.0493$$
In order to measure the within-treatment variability, we define the following quantity:

The error sum of squares is

$$SSE = \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2 + \cdots + \sum_{j=1}^{n_p} (y_{pj} - \bar{y}_p)^2$$

Here y1j is the jth observed value of the response in the first sample, y2j is the jth observed value of the response in the second sample, and so forth. The previous formula says that we compute SSE by calculating the squared difference between each observed value of the response and its corresponding sample treatment mean and by summing these squared differences over all the observations in the experiment.

The SSE measures the variability of the observed values of the response variable around their respective sample treatment means. For example, if there were no variability within each sample, the error sum of squares would be equal to 0. The more the values within the samples vary, the larger will be SSE.
As an example, in the gasoline mileage study, the sample treatment means are ȳA = 34.92, ȳB = 36.56, and ȳC = 33.98. It follows that

$$SSE = \sum_{j=1}^{n_A} (y_{Aj} - \bar{y}_A)^2 + \sum_{j=1}^{n_B} (y_{Bj} - \bar{y}_B)^2 + \sum_{j=1}^{n_C} (y_{Cj} - \bar{y}_C)^2$$
$$= [(34.0 - 34.92)^2 + (35.0 - 34.92)^2 + (34.3 - 34.92)^2 + (35.5 - 34.92)^2 + (35.8 - 34.92)^2]$$
$$+ [(35.3 - 36.56)^2 + (36.5 - 36.56)^2 + (36.4 - 36.56)^2 + (37.0 - 36.56)^2 + (37.6 - 36.56)^2]$$
$$+ [(33.3 - 33.98)^2 + (34.0 - 33.98)^2 + (34.7 - 33.98)^2 + (33.0 - 33.98)^2 + (34.9 - 33.98)^2]$$
$$= 8.028$$
Finally, we define a total sum of squares, denoted SSTO, to be

$$SSTO = \sum_{i=1}^{p} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$

It can be shown that SSTO is the sum of SST and SSE. That is:

$$SSTO = SST + SSE$$

This says that the total variability in the observed values of the response must come from one of two sources: the between-treatment variability or the within-treatment variability. Therefore, the SST and SSE are said to partition the total sum of squares. For the gasoline mileage study

SSTO = SST + SSE = 17.0493 + 8.028 = 25.0773
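These sums of squares are easy to verify numerically. A minimal Python sketch (numpy assumed; our code, not the book's):

import numpy as np

# Mileages from Table 1.1, one array per gasoline type.
samples = [
    np.array([34.0, 35.0, 34.3, 35.5, 35.8]),  # type A
    np.array([35.3, 36.5, 36.4, 37.0, 37.6]),  # type B
    np.array([33.3, 34.0, 34.7, 33.0, 34.9]),  # type C
]
all_y = np.concatenate(samples)
ybar = all_y.mean()                                          # overall mean, 35.153

sst = sum(len(y) * (y.mean() - ybar) ** 2 for y in samples)  # 17.0493
sse = sum(((y - y.mean()) ** 2).sum() for y in samples)      # 8.028
ssto = ((all_y - ybar) ** 2).sum()                           # 25.0773
print(sst, sse, ssto, np.isclose(ssto, sst + sse))           # the partition holds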

Using the treatment and error sums of squares, we next define two mean squares. The treatment mean square is

$$MST = \frac{SST}{p - 1}$$

The error mean square is

$$MSE = \frac{SSE}{n - p}$$

In order to test whether there are any statistically significant differences between the treatment means, we compare the amount of between-treatment variability (MST) to the amount of within-treatment variability (MSE) by using the test statistic

$$F = \frac{MST}{MSE}$$

It can be shown that if the null hypothesis H0: μ1 = μ2 = ... = μp is true, then the population of all possible values of F is described by an F distribution having p - 1 numerator and n - p denominator degrees of freedom. It can also be shown that E(MST) and E(MSE), the expected values of the mean squares MST and MSE, are given by the formulas

$$E(MST) = \sigma^2 + \frac{\sum_{i=1}^{p} n_i(\mu_i - \mu_.^*)^2}{p - 1} \quad \text{and} \quad E(MSE) = \sigma^2$$

where $\mu_.^* = \sum_{i=1}^{p} n_i \mu_i / n$. If H0: μ1 = μ2 = ... = μp is true, the part of E(MST) after the plus sign equals zero and thus E(MST) = σ². This implies that E(MST)/E(MSE) = 1. On the other hand, if H0 is not true, the part of E(MST) after the plus sign is greater than 0 and thus E(MST) > σ². This implies that E(MST)/E(MSE) > 1. We conclude that values of F = MST/MSE that are large (substantially greater than 1) would lead us to reject H0. To decide exactly how large F has to be to reject H0, we consider the probability of a Type I error for the hypothesis test. A Type I error is committed if we reject H0: μ1 = μ2 = ... = μp when H0 is true. This means that we would conclude that the treatment means differ when they do not differ. To perform the hypothesis test, we set the probability of a Type I error (also called the level of significance) for the test equal to a specified value α. The smaller the value of α at which we can reject H0, the smaller is the probability that we have concluded that the treatment means differ when they do not differ. Therefore, the stronger is the evidence that we have made the correct decision in concluding that the treatment means differ. In practice, we usually choose α to be between .10 and .01, with .05 being the most common choice of α. If we can reject H0 at level of significance .05, we regard this as strong evidence that the treatment means differ. Note that we rarely set α lower than .01 because doing so would mean that the probability of a Type II error (failing to conclude that the treatment means differ when they really do differ) would be unacceptably large.

An F Test for Differences Between Treatment Means

Suppose that we wish to compare p treatment means μ1, μ2, ..., μp and consider testing

H0: μ1 = μ2 = ... = μp (all treatment means are equal)

versus

Ha: At least two of μ1, μ2, ..., μp differ (at least two treatment means differ)

Define the F statistic

$$F = \frac{MST}{MSE} = \frac{SST/(p-1)}{SSE/(n-p)}$$

Also define the p-value related to F to be the area under the curve of the F distribution having p - 1 numerator and n - p denominator degrees of freedom to the right of F.

Then, we can reject H0 in favor of Ha at level of significance α if either of the following equivalent conditions hold:

1. F > Fα
2. p-value < α

Here, Fα is the point on the horizontal axis under the curve of the F distribution having p - 1 numerator and n - p denominator degrees of freedom that gives a right hand tail area equal to α.

Figure 1.1 illustrates the rejection point Fα and the p-value for the hypothesis test. A large value of F results when MST, which measures the between-treatment variability, is large in comparison to MSE, which measures the within-treatment variability. If F is large enough, this implies that the null hypothesis H0 should be rejected. The rejection point Fα tells us when F is large enough to reject H0 at level of significance α. When F is large, the associated p-value is small. If this p-value is less than α, we can reject H0 at level of significance α.
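As a computational check of this F test, here is a sketch using scipy (an alternative to the Excel, MINITAB, and SAS runs shown in the book):

import numpy as np
from scipy import stats

a = np.array([34.0, 35.0, 34.3, 35.5, 35.8])
b = np.array([35.3, 36.5, 36.4, 37.0, 37.6])
c = np.array([33.3, 34.0, 34.7, 33.0, 34.9])

# One-way ANOVA F test of H0: mu_A = mu_B = mu_C.
f_stat, p_value = stats.f_oneway(a, b, c)
print(f_stat, p_value)             # approximately 12.74 and 0.001

# The rejection point F_alpha for alpha = .05, with p - 1 = 2
# numerator and n - p = 12 denominator degrees of freedom.
print(stats.f.ppf(0.95, 2, 12))    # approximately 3.89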
[Figure 1.1 An F test for testing for differences between treatment means. Panel (a) shows the curve of the F distribution having p - 1 numerator and n - p denominator degrees of freedom and the rejection point Fα based on setting the probability of a Type I error equal to α: if F ≤ Fα, do not reject H0 in favor of Ha; if F > Fα, reject H0 in favor of Ha. Panel (b) shows that if the p-value, the tail area to the right of F, is smaller than α, then F > Fα and we reject H0.]

Example 1.2

Consider the North American Oil Company data in Table 1.1. The company wishes to determine whether any of gasoline types A, B, and C have different effects on mean Lance gasoline mileage. That is, we wish to see whether there are any statistically significant differences between μA, μB, and μC. To do this, we test the null hypothesis H0: μA = μB = μC, which says that gasoline types A, B, and C have the same effects on mean gasoline mileage. We test H0 versus the alternative Ha: At least two of μA, μB, and μC differ, which says that at least two of gasoline types A, B, and C have different effects on mean gasoline mileage.

Because we have previously computed SST to be 17.0493 and SSE to be 8.028, and because we are comparing p = 3 treatment means, we have

$$MST = \frac{SST}{p-1} = \frac{17.0493}{3-1} = 8.525$$

and

$$MSE = \frac{SSE}{n-p} = \frac{8.028}{15-3} = 0.669$$

It follows that

$$F = \frac{MST}{MSE} = \frac{8.525}{0.669} = 12.74$$

In order to test H0 at the .05 level of significance, we use F.05 with p - 1 = 3 - 1 = 2 numerator and n - p = 15 - 3 = 12 denominator degrees of freedom. Table A1 in Appendix A tells us that this F point equals 3.89, so we have

F = 12.74 > F.05 = 3.89

Therefore, we reject H0 at the .05 level of significance. This says we have strong evidence that at least two of the treatment means μA, μB, and μC differ. In other words, we conclude that at least two of gasoline types A, B, and C have different effects on mean gasoline mileage.

The results of an analysis of variance are often summarized in what is called an analysis of variance table. This table gives the sums of squares (SST, SSE, and SSTO), the mean squares (MST and MSE), and the F statistic and its related p-value for the ANOVA. The table also gives the degrees of freedom associated with each source of variation: treatments, error, and total. Table 1.2 gives the ANOVA table for the gasoline mileage problem.
Notice that in the column labeled "Sums of squares," the values of SST and SSE sum to SSTO.

Figure 1.2 gives the MINITAB and Excel output of an analysis of variance of the gasoline mileage data. Note that the upper portion of the MINITAB output and the lower portion of the Excel output give the ANOVA table of Table 1.2. Also, note that each output gives the value F = 12.74 and the related p-value, which equals .001 (rounded). Because this p-value is less than .05, we reject H0 at the .05 level of significance. Figure 1.3 gives the SAS output of an analysis of variance of the gasoline mileage data.
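Readers working in Python rather than these packages can reproduce an equivalent ANOVA table with statsmodels; this is a sketch under the assumption that pandas and statsmodels are installed, not a rendition of the book's own software runs.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "mileage": [34.0, 35.0, 34.3, 35.5, 35.8,
                35.3, 36.5, 36.4, 37.0, 37.6,
                33.3, 34.0, 34.7, 33.0, 34.9],
    "gas": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
})
model = ols("mileage ~ C(gas)", data=df).fit()
print(sm.stats.anova_lm(model, typ=1))  # df, sum_sq, mean_sq, F, PR(>F)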
1.2.3 Statistical Inference for Pairwise Differences Between and Linear Combinations of Treatment Means
If the one-way ANOVA F test says that at least two treatment means differ, then we investigate which treatment means differ and we estimate how large the differences are. We do this by making what we call pairwise comparisons (that is, we compare treatment means two at a time). One way to make these comparisons is to compute point estimates of and confidence intervals for pairwise differences. For example, in the gasoline mileage case we might estimate the pairwise differences μB - μA, μA - μC, and μB - μC. Here, for instance, the pairwise difference μB - μA can be interpreted as the change in mean mileage achieved by changing from using gasoline type A to using gasoline type B.

Table 1.2 Analysis of variance (ANOVA) table for testing H0: μA = μB = μC

Source       Degrees of freedom    Sums of squares    Mean squares                 F statistic           p-value
Treatments   p - 1 = 3 - 1 = 2     SST = 17.0493      MST = SST/(p - 1) = 8.525    F = MST/MSE = 12.74   0.001
Error        n - p = 15 - 3 = 12   SSE = 8.028        MSE = SSE/(n - p) = 0.669
Total        n - 1 = 15 - 1 = 14   SSTO = 25.0773

There are two approaches to calculating confidence intervals for pairwise differences. The first involves computing the usual, or individual, confidence interval for each pairwise difference. Here, if we are computing 100(1 - α) percent confidence intervals, we are 100(1 - α) percent confident that each individual pairwise difference is contained in its respective interval. That is, the confidence level associated with each (individual) comparison is 100(1 - α) percent, and we refer to α as the comparisonwise error rate. However, we are less than 100(1 - α) percent confident that all of the pairwise differences are simultaneously contained in their respective intervals. A more conservative approach is to compute simultaneous confidence intervals. Such intervals make us 100(1 - α) percent confident that all of the pairwise differences are simultaneously contained in their respective intervals. That is, when we compute simultaneous intervals, the overall confidence level associated with all the comparisons being made in the experiment is 100(1 - α) percent, and we refer to α as the experimentwise error rate.

[Figure 1.2 MINITAB and Excel output of an analysis of variance of the gasoline mileage data in Table 1.1. The MINITAB output gives the one-way ANOVA table; the sample size, mean, and standard deviation for each gasoline type (Pooled StDev = 0.818); individual 95% confidence intervals for each treatment mean based on the pooled standard deviation; and Tukey 95% simultaneous confidence intervals for the pairwise differences. The Excel output gives each group's count, sum, average, and variance, and an ANOVA table showing F = 12.74, a p-value of 0.0011, and F crit = 3.8853.]

[Figure 1.3 SAS output of an analysis of variance of the gasoline mileage data in Table 1.1. The output gives the ANOVA table; the point estimates, standard errors, t statistics, and p-values for MUB-MUA, MUA-MUC, MUB-MUC, and MUB-(MUC+MUA)/2; and, for gasoline type B, a 95% confidence interval for the mean and a 95% prediction interval for an individual mileage.]

Several kinds of simultaneous confidence intervals can be computed. We first present what is called the Tukey formula for simultaneous intervals. If we are interested in studying all pairwise differences between treatment means, the Tukey formula yields the most precise (shortest) simultaneous confidence intervals. In general, a Tukey simultaneous 100(1 - α) percent confidence interval is longer than the corresponding individual 100(1 - α) percent confidence interval. Thus, intuitively, we are paying a penalty for simultaneous confidence by obtaining longer intervals. One pragmatic approach to comparing treatment means is to first determine if we can use the more conservative Tukey intervals to make meaningful pairwise comparisons. If we cannot, then we might see what the individual intervals tell us. In the following box we present both individual and Tukey simultaneous confidence intervals for pairwise differences. We also present the formula for a confidence interval for a single treatment mean and the formula for a prediction interval for an individual response, which we might use after we have used pairwise comparisons to determine the best treatment.

Estimation and Prediction in One-Way ANOVA

1. Consider the pairwise difference μi - μh, which can be interpreted to be the change in the mean value of the response variable associated with changing from using treatment h to using treatment i. Then, a point estimate of the difference μi - μh is ȳi - ȳh, where ȳi and ȳh are the sample treatment means associated with treatments i and h.

2. An individual 100(1 - α) percent confidence interval for μi - μh is

$$(\bar{y}_i - \bar{y}_h) \pm t_{\alpha/2} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_h}\right)}$$

Here, t_{α/2} is the point on the horizontal axis under the curve of the t distribution having n - p degrees of freedom that gives a right hand tail area equal to α/2. Table A2 in Appendix A is a table of t points.

3. A Tukey simultaneous 100(1 - α) percent confidence interval for μi - μh in the set of all possible pairwise differences between treatment means is

$$(\bar{y}_i - \bar{y}_h) \pm q_\alpha \sqrt{\frac{MSE}{m}}$$

Here the value q_α is obtained from Table A3, which is a table of percentage points of the studentized range. In this table q_α is listed corresponding to values of p and n - p. Furthermore, we assume that the sample sizes ni and nh are equal to the same value, which we denote as m. If ni and nh are not equal, we replace $q_\alpha\sqrt{MSE/m}$ by $(q_\alpha/\sqrt{2})\sqrt{MSE[(1/n_i) + (1/n_h)]}$. In this case, the confidence interval is only approximately correct.

4. A point estimate of the treatment mean μi is ȳi, and an individual 100(1 - α) percent confidence interval for μi is

$$\bar{y}_i \pm t_{\alpha/2} \sqrt{\frac{MSE}{n_i}}$$

5. A point prediction of yi0 = μi + εi0, a randomly selected individual value of the response variable when using treatment i, is ȳi, and a 100(1 - α) percent prediction interval for yi0 is

$$\bar{y}_i \pm t_{\alpha/2} \sqrt{MSE\left(1 + \frac{1}{n_i}\right)}$$

Note that, because the ANOVA assumptions imply that the error term εi0 is assumed to be randomly selected from a normally distributed population of error term values having mean zero, εi0 has a fifty percent chance of being positive and a fifty percent chance of being negative. Therefore, we predict εi0 to be zero, and this implies that the point estimate ȳi of μi is also the point prediction of yi0 = μi + εi0. However, because the error term εi0 will probably not be zero, ȳi is likely to be less accurate as a point prediction of yi0 = μi + εi0 than as a point estimate of μi. For this reason, the 100(1 - α) percent prediction interval for yi0 = μi + εi0 has an extra 1 under the radical and thus is longer than the 100(1 - α) percent confidence interval for μi.
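The interval formulas in this box translate directly into code. The following sketch (scipy assumed; its studentized_range distribution requires scipy 1.7 or later) plugs in the gasoline mileage numbers that Example 1.3 works through by hand.

import numpy as np
from scipy import stats

mse, df_error, m = 0.669, 12, 5     # MSE, n - p, and common sample size
ybar_b, ybar_a = 36.56, 34.92       # sample treatment means from Table 1.1

# Item 2: individual 95 percent confidence interval for mu_B - mu_A.
t025 = stats.t.ppf(0.975, df_error)                   # approx 2.179
hw = t025 * np.sqrt(mse * (1 / m + 1 / m))            # approx 1.127
print(ybar_b - ybar_a - hw, ybar_b - ybar_a + hw)

# Item 3: Tukey simultaneous 95 percent interval for mu_B - mu_A.
q05 = stats.studentized_range.ppf(0.95, 3, df_error)  # approx 3.77
hw_tukey = q05 * np.sqrt(mse / m)                     # approx 1.379
print(ybar_b - ybar_a - hw_tukey, ybar_b - ybar_a + hw_tukey)

# Items 4 and 5: 95 percent confidence interval for mu_B and 95 percent
# prediction interval for an individual type B response.
print(ybar_b - t025 * np.sqrt(mse / m), ybar_b + t025 * np.sqrt(mse / m))
print(ybar_b - t025 * np.sqrt(mse * (1 + 1 / m)),
      ybar_b + t025 * np.sqrt(mse * (1 + 1 / m)))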

Example 1.3

In the gasoline mileage study, we are comparing p = 3 treatment means (μA, μB, and μC). Furthermore, each sample is of size m = 5, there are a total of n = 15 observed gas mileages, and the MSE found in Table 1.2 is .669. Because q.05 = 3.77 is the entry found in Table A3 corresponding to p = 3 and n - p = 12, a Tukey simultaneous 95 percent confidence interval for μB - μA is

$$\left[(\bar{y}_B - \bar{y}_A) \pm q_{.05}\sqrt{\frac{MSE}{m}}\right] = \left[(36.56 - 34.92) \pm 3.77\sqrt{\frac{.669}{5}}\right] = [1.64 \pm 1.379] = [.261, 3.019]$$

Similarly, Tukey simultaneous 95 percent confidence intervals for μA - μC and μB - μC are, respectively,

$$[(\bar{y}_A - \bar{y}_C) \pm 1.379] = [(34.92 - 33.98) \pm 1.379] = [.94 \pm 1.379] = [-0.439, 2.319]$$

and

$$[(\bar{y}_B - \bar{y}_C) \pm 1.379] = [(36.56 - 33.98) \pm 1.379] = [2.58 \pm 1.379] = [1.201, 3.959]$$

These intervals make us simultaneously 95 percent confident that (1) changing from gasoline type A to gasoline type B increases mean mileage by between .261 and 3.019 mpg, (2) changing from gasoline type C to gasoline type A might decrease mean mileage by as much as .439 mpg or might increase mean mileage by as much as 2.319 mpg, and (3) changing from gasoline type C to gasoline type B increases mean mileage by between 1.201 and 3.959 mpg. The first and third of these intervals make us 95 percent confident that μB is at least .261 mpg greater than μA and at least 1.201 mpg greater than μC. Therefore, we have strong evidence that gasoline type B yields the highest mean mileage of the gasoline types tested. Furthermore, noting that t.025 based on n - p = 12 degrees of freedom is 2.179 (see Table A2), it follows that an individual 95 percent confidence interval for μB is

$$\left[\bar{y}_B \pm t_{.025}\sqrt{\frac{MSE}{n_B}}\right] = \left[36.56 \pm 2.179\sqrt{\frac{.669}{5}}\right] = [35.763, 37.357]$$

This interval says we can be 95 percent confident that the mean mileage obtained by all Lances using gasoline type B is between 35.763 and 37.357 mpg. Also, a 95 percent prediction interval for yB0 = μB + εB0, the mileage obtained by a randomly selected individual Lance when driven using gasoline type B, is

$$\left[\bar{y}_B \pm t_{.025}\sqrt{MSE\left(1 + \frac{1}{n_B}\right)}\right] = \left[36.56 \pm 2.179\sqrt{.669\left(1 + \frac{1}{5}\right)}\right] = [34.608, 38.512]$$

Notice that the 95 percent confidence interval for μB is graphed on the MINITAB output of Figure 1.2, and both the 95 percent confidence interval for μB and the 95 percent prediction interval for an individual Lance mileage using gasoline type B are given on the SAS output in Figure 1.3. The MINITAB output also shows the 95 percent confidence intervals for μA and μC, and a typical SAS output would also give these intervals, but to save space we have omitted them. Also, the MINITAB output gives Tukey simultaneous 95 percent intervals. For example, consider finding the Tukey interval for μB - μA on the MINITAB output. To do this, we look in the table corresponding to "Type A subtracted from" and find the row in this table labeled "Type B." This row gives the interval for Type A subtracted from Type B, that is, the interval for μB - μA. This interval is [.261, 3.019], as previously calculated. Finally, note that the half-length of the individual 95 percent confidence interval for a pairwise comparison is (because nA = nB = nC = 5)

$$t_{.025}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_h}\right)} = 2.179\sqrt{.669\left(\frac{1}{5} + \frac{1}{5}\right)} = 1.127$$

This half-length implies that the individual intervals are shorter than the previously constructed Tukey intervals, which have a half-length of 1.379. Recall, however, that the Tukey intervals are short enough to allow us to conclude with 95 percent confidence that μB is greater than μA and μC.
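The Tukey comparisons in this example can also be obtained in one call from statsmodels (a sketch assuming that library is available; it reproduces the three simultaneous intervals above up to rounding):

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

mileage = np.array([34.0, 35.0, 34.3, 35.5, 35.8,
                    35.3, 36.5, 36.4, 37.0, 37.6,
                    33.3, 34.0, 34.7, 33.0, 34.9])
gas = np.repeat(["A", "B", "C"], 5)

# Tukey simultaneous 95 percent confidence intervals for all
# pairwise differences of treatment means.
print(pairwise_tukeyhsd(mileage, gas, alpha=0.05))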
We next suppose in the gasoline mileage situation that gasoline type B contains a chemical, Chemical XX, that is not contained in gasoline types A or C. To assess the effect of Chemical XX on gasoline mileage, we consider

$$\mu_B - \frac{\mu_C + \mu_A}{2}$$

This is the difference between the mean mileage obtained by using gasoline type B and the average of the mean mileages obtained by using gasoline types C and A. Note that

$$\mu_B - \frac{\mu_C + \mu_A}{2} = \left(-\frac{1}{2}\right)\mu_A + (1)\mu_B + \left(-\frac{1}{2}\right)\mu_C = a_A\mu_A + a_B\mu_B + a_C\mu_C = \sum_{l=A,B,C} a_l\mu_l$$

where a_A = -(1/2), a_B = 1, and a_C = -(1/2). In general, if a1, a2, ..., ap are arbitrary constants, we say that

$$\sum_{i=1}^{p} a_i\mu_i = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p$$

is a linear combination of the treatment means μ1, μ2, ..., μp. As with a pairwise difference μi - μh, we can find a point estimate of and an individual 100(1 - α) percent confidence interval for any linear combination of the treatment means. However, since Tukey simultaneous 100(1 - α) percent confidence intervals do not exist for linear combinations that are not simple pairwise differences, we need other kinds of simultaneous 100(1 - α) percent confidence intervals. Two types of such intervals that apply to general linear combinations are Scheffé simultaneous 100(1 - α) percent confidence intervals and Bonferroni simultaneous 100(1 - α) percent confidence intervals. We now summarize estimating a general linear combination of the treatment means. Because the formulas for the Scheffé intervals involve Fα points based on numerator and denominator degrees of freedom that vary depending on what is being estimated, we will be notationally concise and place these degrees of freedom for the appropriate F points in parentheses above the F points. For example, $F_\alpha^{(p-1,\,n-p)}$ denotes an Fα point based on p - 1 numerator and n - p denominator degrees of freedom. We will use such notation at various times in this book.

Estimating a Linear Combination of the Treatment Means in One-Way ANOVA

1. A point estimate of the linear combination $\sum_{i=1}^{p} a_i\mu_i$ is $\sum_{i=1}^{p} a_i\bar{y}_i$. Letting $s = \sqrt{MSE}$, a 100(1 - α) percent confidence interval for $\sum_{i=1}^{p} a_i\mu_i$ is

$$\sum_{i=1}^{p} a_i\bar{y}_i \pm t_{\alpha/2}\, s \sqrt{\sum_{i=1}^{p} \frac{a_i^2}{n_i}}$$

2. We define a contrast to be any linear combination $\sum_{i=1}^{p} a_i\mu_i$ such that $\sum_{i=1}^{p} a_i = 0$. Suppose that we wish to find a Scheffé simultaneous 100(1 - α) percent confidence interval for a contrast in the set of all possible contrasts. Then:

a. The Scheffé interval for the difference μi - μh (which is a contrast) is

$$(\bar{y}_i - \bar{y}_h) \pm \sqrt{(p-1)F_\alpha^{(p-1,\,n-p)}}\; s \sqrt{\frac{1}{n_i} + \frac{1}{n_h}}$$

b. The Scheffé interval for the contrast $\sum_{i=1}^{p} a_i\mu_i$ is

$$\sum_{i=1}^{p} a_i\bar{y}_i \pm \sqrt{(p-1)F_\alpha^{(p-1,\,n-p)}}\; s \sqrt{\sum_{i=1}^{p} \frac{a_i^2}{n_i}}$$

3. Suppose that we wish to find a Scheffé simultaneous 100(1 - α) percent confidence interval for a linear combination in the set of all possible linear combinations (some of which are not contrasts). Then:

a. The Scheffé interval for the difference μi - μh is

$$(\bar{y}_i - \bar{y}_h) \pm \sqrt{p\,F_\alpha^{(p,\,n-p)}}\; s \sqrt{\frac{1}{n_i} + \frac{1}{n_h}}$$

b. The Scheffé interval for the linear combination $\sum_{i=1}^{p} a_i\mu_i$ is

$$\sum_{i=1}^{p} a_i\bar{y}_i \pm \sqrt{p\,F_\alpha^{(p,\,n-p)}}\; s \sqrt{\sum_{i=1}^{p} \frac{a_i^2}{n_i}}$$

4. A Bonferroni simultaneous 100(1 - α) percent confidence interval for μi - μh in a prespecified set of g linear combinations is

$$(\bar{y}_i - \bar{y}_h) \pm t_{\alpha/2g}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_h}}$$

5. A Bonferroni simultaneous 100(1 - α) percent confidence interval for $\sum_{i=1}^{p} a_i\mu_i$ in a prespecified set of g linear combinations is

$$\sum_{i=1}^{p} a_i\bar{y}_i \pm t_{\alpha/2g}\, s \sqrt{\sum_{i=1}^{p} \frac{a_i^2}{n_i}}$$

The choice of which Scheffé formula to use requires that we make a decision before we observe the samples. We must decide whether we are interested in:

1. finding simultaneous confidence intervals for linear combinations, all of which are contrasts, in which case we use formulas 2a and 2b;
2. finding simultaneous confidence intervals for linear combinations, some of which are not contrasts, in which case we use formulas 3a and 3b.

Of course, we will not literally calculate Scheffé simultaneous confidence intervals for all possible contrasts (or more general linear combinations). However, the Scheffé simultaneous confidence interval formula applies to all possible contrasts (or more general linear combinations). This allows us to data snoop. Data snooping means that we will let the data suggest which contrasts or linear combinations we will investigate further. Remember, however, we must decide whether we will study contrasts or more general linear combinations before we observe the data. Also, note that because the Tukey formula for simultaneous 100(1 - α) percent confidence intervals applies to all possible pairwise differences of the treatment means, this formula also allows us to data snoop, in the sense that it allows us to let the data suggest which pairwise differences to further investigate. On the other hand, the Bonferroni formula for Bonferroni simultaneous 100(1 - α) percent confidence intervals requires that we prespecify, before we observe the data, a set of linear combinations. Thus this formula does not allow us to data snoop.

Example 1.4

Consider the North American Oil Company problem. Suppose that we had decided, before we observed the gasoline mileage data in Table 1.1, that we wished to find Scheffé simultaneous 95 percent confidence intervals for all contrasts in the following set of contrasts:

Set I: μB - μA, μA - μC, μB - μC, μB - (μC + μA)/2

Suppose that we also wish to find such intervals for other contrasts that the data might suggest. That is, we are considering all possible contrasts. To verify, for example, that μB - (μC + μA)/2 is a contrast, note that

$$\mu_B - \frac{\mu_C + \mu_A}{2} = \left(-\frac{1}{2}\right)\mu_A + (1)\mu_B + \left(-\frac{1}{2}\right)\mu_C = a_A\mu_A + a_B\mu_B + a_C\mu_C$$

Here, a_A = -(1/2), a_B = 1, and a_C = -(1/2), which implies that

$$\sum_{i=A,B,C} a_i = a_A + a_B + a_C = -\frac{1}{2} + 1 - \frac{1}{2} = 0$$

Moreover,

$$\sum_{i=A,B,C} \frac{a_i^2}{n_i} = \frac{a_A^2}{n_A} + \frac{a_B^2}{n_B} + \frac{a_C^2}{n_C} = \frac{(-\frac{1}{2})^2}{5} + \frac{(1)^2}{5} + \frac{(-\frac{1}{2})^2}{5} = .3$$

Since $s = \sqrt{MSE} = \sqrt{.669} = .8179$, it follows that a Scheffé simultaneous 95 percent confidence interval for μB - (μC + μA)/2 is (using formula 2b)

$$\left[\bar{y}_B - \frac{\bar{y}_C + \bar{y}_A}{2} \pm \sqrt{(p-1)F_\alpha^{(p-1,\,n-p)}}\; s \sqrt{\sum_{i=A,B,C} \frac{a_i^2}{n_i}}\right]$$
$$= \left[36.56 - \frac{33.98 + 34.92}{2} \pm \sqrt{(3-1)F_{.05}^{(3-1,\,15-3)}}\,(.8179)\sqrt{.3}\right]$$
$$= \left[2.11 \pm \sqrt{2(3.89)}\,(.8179)\sqrt{.3}\right] = [.86, 3.36]$$

This interval says that we are 95 percent confident that μB is between .86 mpg and 3.36 mpg greater than (μC + μA)/2. Note here that Chemical XX might be a major factor causing μB to be greater than (μC + μA)/2. However, this is not at all certain. The chemists at North American Oil must use the previous comparison, along with their knowledge of the chemical compositions of gasoline types A, B, and C, to assess the effect of Chemical XX on gasoline mileage. The Scheffé simultaneous 95 percent confidence intervals for μB - μA, μA - μC, and μB - μC (the other contrasts in Set I) can be calculated by using formula 2a.
Next, suppose that we had decided, before we observed the gasoline mileage data in Table 1.1, that we wished to calculate Scheffé simultaneous 95 percent confidence intervals for all the linear combinations in Set II:

Set II: μA, μB, μC, μB - (μC + μA)/2, μB - μA, μA - μC, μB - μC

In addition, suppose that we wish to find such intervals for other linear combinations that the data might suggest. Note that μA, μB, and μC are not contrasts. That is, these means cannot be written as $\sum_{i=A,B,C} a_i\mu_i$, where

$$\sum_{i=A,B,C} a_i = 0$$

For example,

$$\mu_B = (0)\mu_A + (1)\mu_B + (0)\mu_C$$

which implies that

$$\sum_{i=A,B,C} a_i = 0 + 1 + 0 = 1$$

Therefore we must use formulas 3a and 3b to calculate Scheffé intervals. Whereas formulas 2a and 2b use

$$\sqrt{(p-1)F_\alpha^{(p-1,\,n-p)}} = \sqrt{(3-1)F_{.05}^{(3-1,\,15-3)}} = \sqrt{2(3.89)} = 2.7893$$

to calculate Scheffé simultaneous 95 percent confidence intervals for all possible contrasts, formulas 3a and 3b use the larger

$$\sqrt{p\,F_\alpha^{(p,\,n-p)}} = \sqrt{3\,F_{.05}^{(3,\,15-3)}} = \sqrt{3(3.49)} = 3.2357$$

to calculate Scheffé simultaneous 95 percent confidence intervals for all possible linear combinations. Because the formulas 3a and 3b differ from the respective formulas 2a and 2b only by these comparative values, we are paying for desiring Scheffé simultaneous 95 percent confidence intervals for all possible linear combinations (some of which are not contrasts) by having longer (and thus less precise) Scheffé simultaneous 95 percent confidence intervals for the contrasts.

Next, consider finding Bonferroni simultaneous 95 percent confidence intervals for the prespecified linear combinations μB - μA, μA - μC, μB - μC, and μB - (μC + μA)/2. Since there are g = 4 linear combinations here, we need to find t_{α/2g} = t_{.05/2(4)} = t_{.00625}. Using Excel to find t_{.00625} based on n - p = 15 - 3 = 12 degrees of freedom, we find that t_{.00625} = 2.934459. This t point is larger than the previously found Scheffé interval point $\sqrt{(3-1)F_{.05}^{(3-1,\,15-3)}} = 2.7893$ for all possible contrasts, so the Bonferroni simultaneous 95 percent confidence intervals would be longer than the corresponding Scheffé intervals. On the other hand, consider finding Bonferroni simultaneous 95 percent confidence intervals for the prespecified two linear combinations μB - μC and μB - (μC + μA)/2. Since g = 2, we need to find t_{α/2g} = t_{.05/2(2)} = t_{.0125}. Using Excel to find t_{.0125} based on n - p = 15 - 3 = 12 degrees of freedom, we find that t_{.0125} = 2.560033. This t point is smaller than the Scheffé interval point $\sqrt{(3-1)F_{.05}^{(3-1,\,15-3)}} = 2.7893$ for all possible contrasts, so the Bonferroni simultaneous 95 percent confidence intervals would be shorter than the corresponding Scheffé intervals.
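Here is a sketch of the Scheffé and Bonferroni computations from this example in Python (scipy assumed; the variable names are ours, not the book's):

import numpy as np
from scipy import stats

ybar = {"A": 34.92, "B": 36.56, "C": 33.98}
mse, p, n, n_i = 0.669, 3, 15, 5
s = np.sqrt(mse)                                 # approx .8179
a = {"A": -0.5, "B": 1.0, "C": -0.5}             # contrast mu_B - (mu_C + mu_A)/2

est = sum(a[g] * ybar[g] for g in "ABC")         # 2.11
se = s * np.sqrt(sum(a[g] ** 2 / n_i for g in "ABC"))

# Scheffe multiplier for all possible contrasts (formula 2b).
scheffe = np.sqrt((p - 1) * stats.f.ppf(0.95, p - 1, n - p))  # approx 2.7893
print(est - scheffe * se, est + scheffe * se)                 # approx [.86, 3.36]

# Bonferroni multiplier for g = 4 prespecified linear combinations.
bonf = stats.t.ppf(1 - 0.05 / (2 * 4), n - p)                 # approx 2.9345
print(est - bonf * se, est + bonf * se)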
We now summarize the use of Tukey, Scheffé, and Bonferroni simultaneous confidence intervals.
1. If we are interested in all pairwise comparisons of treatment means, the Tukey formula will give shorter intervals than will the Scheffé or Bonferroni formulas. If a small number of prespecified pairwise comparisons are of interest, the Bonferroni formula might give shorter intervals in some situations.
2. If we are interested in all contrasts (or more general linear combinations) of treatment means, the Scheffé formula should be used. If a small number of prespecified contrasts (or more general linear combinations) are of interest, the Bonferroni formula might give shorter intervals. This is particularly true if the number of prespecified contrasts (or more general linear combinations) is less than or equal to the number of treatments.
3. Whereas the Tukey and Scheffé formulas can be used for data snooping, the Bonferroni formula cannot.
4. It is reasonable in any given problem to use all of the formulas (Tukey, Scheffé, and Bonferroni) that apply. Then we can choose the formula that provides the shortest intervals.

Instead of (or in addition to) using confidence intervals to make pairwise comparisons of treatment means, we can also make such comparisons by using hypothesis tests. Suppose, for example, that we wish to make several pairwise comparisons. We might perform several individual t tests, each with a probability of a Type I error set equal to α. The following summary box shows how to perform an individual t test.

An Individual t-Test for Pairwise Comparisons of Treatment Means

Define

$$t = \frac{\bar{y}_i - \bar{y}_h}{s\sqrt{(1/n_i) + (1/n_h)}}$$

Also, define the p-value to be twice the area under the curve of the t distribution having n - p degrees of freedom to the right of |t|. Then we can reject H0: μi - μh = 0 in favor of Ha: μi - μh ≠ 0 at level of significance α if either of the following equivalent conditions hold:

1. |t| > t_{α/2}, or equivalently, $|\bar{y}_i - \bar{y}_h| > t_{\alpha/2}\, s\sqrt{(1/n_i) + (1/n_h)}$
2. p-value < α

For example, in the gasoline mileage situation consider testing H0: μB - μA = 0 versus Ha: μB - μA ≠ 0. Since ȳB - ȳA = 36.56 - 34.92 = 1.64 and

$$s\sqrt{(1/n_B) + (1/n_A)} = \sqrt{.669[(1/5) + (1/5)]} = .5173$$

the test statistic t equals 1.64/.5173 = 3.17. The p-value for the hypothesis test is twice the area under the curve of the t distribution having n - p = 15 - 3 = 12 degrees of freedom to the right of t = 3.17. The SAS output in Figure 1.3 gives the point estimate ȳB - ȳA = 1.64, the standard error of this estimate, .5173, and the test statistic t = 3.17. The SAS output also tells us that the p-value for the hypothesis test is .0081. Because this p-value is less than an α of .05, we can reject H0: μB - μA = 0 at the .05 level of significance. In fact, because this p-value is less than an α of .01, we can also reject H0: μB - μA = 0 at the .01 level of significance. This would be regarded as very strong evidence that μB and μA differ. Further examining the SAS output, we see this output gives the point estimates, standard errors of the estimates, test statistics, and p-values for testing H0: μA - μC = 0, H0: μB - μC = 0, and H0: μB - (μC + μA)/2 = 0. For example, the SAS output tells us that the p-value for testing H0: μB - μC = 0 is .0003. Because this p-value is less than an α of .001, we have extremely strong evidence that μB and μC differ. Also, note that the test statistic t for testing H0: μB - (μC + μA)/2 = 0 is the point estimate ȳB - (ȳC + ȳA)/2 = 36.56 - (33.98 + 34.92)/2 = 2.11 divided by the standard error of this point estimate, which we have calculated in Example 1.4 and is $s\sqrt{\sum_{i=A,B,C} a_i^2/n_i} = .8179\sqrt{.3} = .4480$ (within rounding).
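The individual t test just illustrated, as a minimal scipy sketch:

import numpy as np
from scipy import stats

mse, df_error = 0.669, 12
ybar_b, ybar_a, n_b, n_a = 36.56, 34.92, 5, 5

se = np.sqrt(mse * (1 / n_b + 1 / n_a))     # approx .5173
t = (ybar_b - ybar_a) / se                  # approx 3.17
p_value = 2 * stats.t.sf(abs(t), df_error)  # two-tailed p, approx .0081
print(t, p_value)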

When we perform several individual t tests, each with the probability of a Type I error set equal to α, we say that we are setting the comparisonwise error rate equal to α. However, in such a case the probability of making at least one Type I error (that is, the probability of deciding that at least one difference between treatment means differs from zero when the difference does not differ from zero) is greater than α. This probability of making at least one Type I error is called the experimentwise error rate. To control the experimentwise error rate, we can carry out hypothesis tests based on the Bonferroni, Scheffé, or Tukey methods. The rejection rules for these tests are simple modifications of the simultaneous confidence interval formulas. For example, suppose that we wish to test H0: μi - μh = 0 versus Ha: μi - μh ≠ 0. Using the Bonferroni method, we would declare the difference between μi and μh to be statistically significant if

$$|\bar{y}_i - \bar{y}_h| > t_{\alpha/2g}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_h}}$$

Here, g is the number of pairwise comparisons in a prespecified set, and we are controlling the experimentwise error rate over the g pairwise comparisons in the prespecified set. Using the Scheffé method, we would declare the difference between μi and μh to be statistically significant if

$$|\bar{y}_i - \bar{y}_h| > \sqrt{(p-1)F_\alpha^{(p-1,\,n-p)}}\; s \sqrt{\frac{1}{n_i} + \frac{1}{n_h}}$$

In this case we are controlling the experimentwise error rate over all null hypotheses that set a contrast $\sum_{i=1}^{p} a_i\mu_i$ equal to zero. The Tukey method declares the difference between μi and μh to be statistically significant if

$$|\bar{y}_i - \bar{y}_h| > q_\alpha \sqrt{\frac{MSE}{m}}$$

Here, we are controlling the experimentwise error rate over all possible
pairwise comparisons of treatment means. Recall from our discussion of
Tukey simultaneous 95 percent confidence intervals that the sample size
for each treatment is assumed to be the same value m, and $q_\alpha$ is a studentized
range value obtained from Table A3 corresponding to the values p
and $n - p$.
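
All three procedures are available as documented options of the MEANS statement in PROC GLM. A minimal sketch, again assuming the GASOLINE data set of Figure 1.4:

PROC GLM DATA=GASOLINE;
  CLASS GASTYPE;
  MODEL MILEAGE = GASTYPE;
  /* BON, SCHEFFE, and TUKEY request the three families of pairwise
     comparisons; CLDIFF displays them as simultaneous confidence
     intervals for the pairwise differences */
  MEANS GASTYPE / BON SCHEFFE TUKEY CLDIFF;
RUN;
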
A modification of the Tukey procedure is the Student-Newman-Keuls
(SNK) procedure, which has us first arrange the sample treatment
means from smallest to largest. Denoting these ordered sample means as
$\bar{y}_{(1)}, \bar{y}_{(2)}, \ldots, \bar{y}_{(p)}$, the SNK procedure declares the difference between the
ordered population means $\mu_{(i)}$ and $\mu_{(h)}$ (where i is greater than h) to be
statistically significant if

$$|\bar{y}_{(i)} - \bar{y}_{(h)}| > q_\alpha(i - h + 1,\,n - p)\sqrt{\frac{MSE}{m}}$$

Here, we denote the fact that the studentized range value obtained from
Table A3 depends upon $i - h + 1$, the number of steps between $\bar{y}_{(i)}$ and
$\bar{y}_{(h)}$, by writing this studentized range value as $q_\alpha(i - h + 1,\,n - p)$.
For example, in the gasoline mileage example the three sample means
$\bar{y}_A = 34.92$, $\bar{y}_B = 36.56$, and $\bar{y}_C = 33.98$ arranged in increasing order are
$\bar{y}_{(1)} = 33.98$, $\bar{y}_{(2)} = 34.92$, and $\bar{y}_{(3)} = 36.56$. To compare $\mu_{(3)}$ with $\mu_{(1)}$ (that
is, $\mu_B$ with $\mu_C$) at significance level .05, we look up
$q_{.05}(3 - 1 + 1,\,15 - 3) = q_{.05}(3, 12)$ in Table A3 to be 3.77. Because
$|\bar{y}_{(3)} - \bar{y}_{(1)}| = |36.56 - 33.98| = 2.58$ is greater than
$q_{.05}(3,12)\sqrt{MSE/m} = 3.77\sqrt{.669/5} = 1.379$,
we conclude that $\mu_B$ and $\mu_C$ differ. To compare $\mu_{(3)}$ with $\mu_{(2)}$ (that is, $\mu_B$ with
$\mu_A$) at significance level .05, we look up $q_{.05}(3 - 2 + 1,\,15 - 3) = q_{.05}(2,12)$
in Table A3 to be 3.08. Because $|\bar{y}_{(3)} - \bar{y}_{(2)}| = |36.56 - 34.92| = 1.64$ is
greater than $q_{.05}(2,12)\sqrt{MSE/m} = 3.08\sqrt{.669/5} = 1.127$, we conclude
that $\mu_B$ and $\mu_A$ differ. To compare $\mu_{(2)}$ with $\mu_{(1)}$ (that is, $\mu_A$ with $\mu_C$) at significance
level .05, we look up $q_{.05}(2 - 1 + 1,\,15 - 3) = q_{.05}(2,12)$ in Table A3
to be 3.08. Because $|\bar{y}_{(2)} - \bar{y}_{(1)}| = |34.92 - 33.98| = .94$ is not greater than
$q_{.05}(2,12)\sqrt{MSE/m} = 3.08\sqrt{.669/5} = 1.127$, we cannot conclude that
$\mu_A$ and $\mu_C$ differ. In general, the SNK procedure has neither a comparisonwise
nor an experimentwise error rate. Rather, the SNK procedure controls
the error rate at $\alpha$ for all comparisons of means that are the same number of
ordered steps apart. For example, in the gasoline mileage example, the error
rate is $\alpha$ for comparing $\mu_{(3)}$ with $\mu_{(2)}$ (that is, $\mu_B$ with $\mu_A$) and $\mu_{(2)}$ with $\mu_{(1)}$
(that is, $\mu_A$ with $\mu_C$) because in both cases $i - h + 1 = 2$.
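
The SNK procedure is likewise a documented MEANS option in PROC GLM; a minimal sketch, assuming the GASOLINE data set of Figure 1.4, is

PROC GLM DATA=GASOLINE;
  CLASS GASTYPE;
  MODEL MILEAGE = GASTYPE;
  /* SNK performs the Student-Newman-Keuls multiple range test
     on the ordered treatment means */
  MEANS GASTYPE / SNK;
RUN;

The output groups means that are not significantly different, which for the gasoline data should reproduce the three comparisons worked out above.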
In general, because $q_\alpha(i - h + 1,\,n - p)\sqrt{MSE/m}$ decreases as the
number of steps apart $i - h + 1$ decreases, the SNK procedure is more liberal
than the Tukey procedure in declaring significant differences between
treatment means. However, the SNK procedure is more conservative than
performing individual t tests in declaring such significant differences. It
is important to note that since many users find individual t tests each at
significance level $\alpha$ [or individual 100(1 − $\alpha$) percent confidence intervals]
easy to calculate and, therefore, use them for making multiple treatment
mean comparisons (including those suggested by the data), statisticians
recommend doing this only if the F test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$ rejects
$H_0$ at significance level $\alpha$. Such a use of the F test as a preliminary test of
significance, followed by making multiple, individual t test comparisons,
is called Fisher's least significant difference (LSD) procedure. Simulation
studies suggest that Fisher's LSD procedure controls the experimentwise
error rate for the multiple comparisons at approximately $\alpha$.
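
In PROC GLM, the individual t test comparisons that Fisher's LSD procedure uses are requested with the T option of the MEANS statement (LSD is a documented alias). A minimal sketch, assuming the GASOLINE data set of Figure 1.4; the preliminary F test appears on the standard PROC GLM output:

PROC GLM DATA=GASOLINE;
  CLASS GASTYPE;
  MODEL MILEAGE = GASTYPE;
  /* T (alias LSD) performs individual t test comparisons, each at
     comparisonwise error rate alpha; per the recommendation above,
     consult them only after the model F test rejects H0 */
  MEANS GASTYPE / T;
RUN;
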
Lastly, in some situations it is important to use a control treatment
and compare various treatment means with the control treatment
mean. This is true, for example, in medical research, where various new
medicines would be compared with a placebo. The placebo might, for
example, be a pill with no active ingredients, and measuring the placebo
effect is important because sometimes patients react favorably simply
because they are taking a pill. Dunnett's procedure declares treatment
mean $\mu_i$ to be different from the control mean $\mu_{control}$ at significance
level $\alpha$ if

$$|\bar{y}_i - \bar{y}_{control}| > d_\alpha(p - 1,\,n - p)\sqrt{\frac{2\,MSE}{m}}$$

Here, $p - 1$ is the number of noncontrol treatments, m is the common
sample size for all treatments (including the control), and $d_\alpha(p - 1,\,n - p)$
is obtained from Table A4. For example, in the gasoline mileage situation,
suppose that the oil company's current gasoline is gasoline type A and
the company wishes to compare gasoline types B and C with gasoline
type A, which is regarded as the control treatment. Letting $\alpha$ equal .05,
we find from Table A4 that $d_{.05}(3 - 1,\,15 - 3) = d_{.05}(2,12)$ is 2.50.
Therefore, $d_{.05}(2,12)\sqrt{2\,MSE/m} = 2.50\sqrt{2(.669)/5} = 1.293$. Because
$|\bar{y}_B - \bar{y}_{control}| = |\bar{y}_B - \bar{y}_A| = |36.56 - 34.92| = 1.64$ is greater than 1.293,
we conclude that $\mu_B$ differs from $\mu_A$. Because
$|\bar{y}_C - \bar{y}_{control}| = |\bar{y}_C - \bar{y}_A| = |33.98 - 34.92| = .94$ is not greater than 1.293,
we cannot conclude that $\mu_C$ differs from $\mu_A$.
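
Dunnett's procedure is also a documented MEANS option; a minimal sketch, assuming the GASOLINE data set of Figure 1.4 and gasoline type A as the control level, is

PROC GLM DATA=GASOLINE;
  CLASS GASTYPE;
  MODEL MILEAGE = GASTYPE;
  /* DUNNETT('A') compares each noncontrol treatment mean with
     the control mean, here gasoline type A */
  MEANS GASTYPE / DUNNETT('A');
RUN;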

1.3 Fixed and Random Models


The methods of Section 1.2 describe the situation in which the treatments
(that is, factor levels) are the only treatments of interest. This is called
the fixed model case. However, in some situations the treatments have
been randomly selected from a population of treatments. In such a case
we are interested in making statistical inferences about the population of
treatments.
Example 1.5
Suppose that a pharmaceutical company wishes to examine the potency
of a liquid medication mixed in large vats. To do this, the company randomly
selects a sample of four vats from a month's production and randomly
selects four separate samples from each vat. The data in Table 1.3
represent the recorded potencies. In this case we are not interested in the
potencies in only the four randomly selected vats. Rather, we are interested
in the potencies in all possible vats.

Table 1.3 Liquid medication potencies from four randomly selected vats

        Vat 1                Vat 2                Vat 3                Vat 4
        6.1                  7.1                  5.6                  6.5
        6.6                  7.3                  5.8                  6.8
        6.4                  7.3                  5.7                  6.2
        6.3                  7.7                  5.3                  6.3
Mean    $\bar{y}_1 = 6.35$   $\bar{y}_2 = 7.35$   $\bar{y}_3 = 5.6$    $\bar{y}_4 = 6.45$

Let $y_{ij}$ denote the potency of the jth sample in the ith randomly
selected vat. Then the random model says that

$$y_{ij} = \mu_i + \epsilon_{ij}$$

Here, $\mu_i$ is the mean potency of all possible samples of liquid medication
that could be randomly selected from the ith randomly selected vat. That
is, $\mu_i$ is the mean potency of all of the liquid medication in the ith randomly
selected vat. Moreover, since the four vats were randomly selected,
$\mu_i$ is assumed to have been randomly selected from the population of all
possible vat means. This population is assumed to be normally distributed
with mean $\mu$ and variance $\sigma_\mu^2$. Here, $\mu$ is the mean potency of all possible
samples of liquid medication that could be randomly selected from all
possible vats. That is, $\mu$ is the mean potency of all possible liquid medication.
In addition, $\sigma_\mu^2$ is the variance between all possible vat means.
We further assume that each error term $\epsilon_{ij}$ has been randomly selected
from a normally distributed population of error term values having mean
zero and variance $\sigma^2$ and that different error terms $\epsilon_{ij}$ are independent of
each other and of the randomly selected means $\mu_i$. Under these assumptions
we can test the null hypothesis $H_0: \sigma_\mu^2 = 0$. This hypothesis says
that all possible vat means are equal. We test $H_0$ versus the alternative
hypothesis $H_a: \sigma_\mu^2 \neq 0$, which says that there is some variation between
the vat means. Specifically, we can reject $H_0$ in favor of $H_a$ at significance
level $\alpha = .05$ if the F statistic of Section 1.2, $F = MST/MSE$, is greater
than $F_\alpha = F_{.05} = 3.49$, which is based on $p - 1 = 4 - 1 = 3$ numerator and
$n - p = 16 - 4 = 12$ denominator degrees of freedom.

Table 1.4 tells us that since F = 45.5111 is greater than $F_{.05} = 3.49$,
we can reject $H_0: \sigma_\mu^2 = 0$ with $\alpha = .05$. Therefore, we conclude that there
is variation in the population of all vat means. That is, we conclude that
some of the vat means differ. Furthermore, as illustrated in Table 1.4, we
can calculate point estimates of the variance components $\sigma^2$ and $\sigma_\mu^2$.
These estimates are .0542 and .6031, respectively. Note here that the
variance component $\sigma^2$ measures the within-vat variability, while
$\sigma_\mu^2$ measures the between-vat variability. In this case the between-vat
variability is substantially higher than the within-vat variability. We can
also calculate a 95 percent confidence interval for $\mu$, the mean potency
of all possible liquid medication.

Table 1.4 ANOVA table for fixed and random models

Source   df            Sum of squares   Mean square    F statistic
Model    p − 1 = 3     SST = 7.4        MST = 2.4667   F = MST/MSE = 45.5111
Error    n − p = 12    SSE = .65        MSE = .0542

E(mean square), fixed model:  E(MST) = $\sigma^2 + \frac{1}{p-1}\sum_{i=1}^{p} n_i(\mu_i - \bar{\mu})^2$ and E(MSE) = $\sigma^2$;
  null hypothesis tested: $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$
E(mean square), random model: E(MST) = $\sigma^2 + \bar{n}\,\sigma_\mu^2$ and E(MSE) = $\sigma^2$;
  null hypothesis tested: $H_0: \sigma_\mu^2 = 0$

Notes:
1. $\bar{n} = \frac{1}{p-1}\left[\sum_{i=1}^{p} n_i - \sum_{i=1}^{p} n_i^2 \Big/ \sum_{i=1}^{p} n_i\right]$ (= m for equal sample sizes)
2. Since F = 45.5111 > $F_{.05}$ = 3.49, we can reject $H_0: \sigma_\mu^2 = 0$ with $\alpha$ = .05.
3. Since E(MSE) = $\sigma^2$, a point estimate of $\sigma^2$ is MSE = .0542.
4. Since E(MST) = $\sigma^2 + \bar{n}\sigma_\mu^2$, a point estimate of $\sigma^2 + \bar{n}\sigma_\mu^2$ is MST. Thus a point
   estimate of $\sigma_\mu^2$ is $(MST - MSE)/\bar{n}$ = (2.4667 − .0542)/4 = .6031.
5. For equal sample sizes ($n_i = m$), a point estimate of $\mu$ is
   $\bar{y} = \sum_{i=1}^{p} \bar{y}_i / p$ = (6.35 + 7.35 + 5.6 + 6.45)/4 = 6.4375.
6. Furthermore, a 100(1 − $\alpha$)% = 95% confidence interval for $\mu$ is
   $\bar{y} \pm t_{\alpha/2}\sqrt{MST/(pm)}$ = 6.4375 ± $t_{.025}\sqrt{2.4667/4(4)}$ = 6.4375 ± 3.182(.3926)
   = [5.1881, 7.6869]. Here, $t_{\alpha/2}$ is based on p − 1 = 4 − 1 = 3 degrees of freedom.

As shown in Table 1.4, this 95 percent interval is [5.1881, 7.6869]. To
narrow this interval, we could randomly select more vats and more samples
from each vat.
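
The variance components of the random model can also be estimated by software. The following is a minimal sketch using PROC MIXED; the data set name POTENCY and the variable names VAT and POTENCY are our own choices, not part of the text's program. For balanced data such as Table 1.3, the default REML estimates should agree with the ANOVA estimates .0542 and .6031 computed above.

DATA POTENCY;
  INPUT VAT POTENCY @@;
  DATALINES;
1 6.1 1 6.6 1 6.4 1 6.3
2 7.1 2 7.3 2 7.3 2 7.7
3 5.6 3 5.8 3 5.7 3 5.3
4 6.5 4 6.8 4 6.2 4 6.3
;
PROC MIXED DATA=POTENCY CL;
  CLASS VAT;           /* the vats are the randomly selected treatments */
  MODEL POTENCY = ;    /* intercept-only fixed part; the intercept estimates mu */
  RANDOM VAT;          /* requests the between-vat variance component sigma_mu^2 */
RUN;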
The above example illustrates that the procedure for testing
$H_0: \mu_1 = \mu_2 = \cdots = \mu_p$, which is appropriate when the p treatments are
the only treatments of interest, is the same as the procedure for testing
$H_0: \sigma_\mu^2 = 0$, which is appropriate when the p treatments are randomly
selected from a large population of treatments. Furthermore,
each procedure is justified by the expected mean squares given in
Table 1.4. Specifically, in the fixed model case, $E(MST)/E(MSE) = 1$
when $H_0: \mu_1 = \mu_2 = \cdots = \mu_p$ is true, and in the random model case,
$E(MST)/E(MSE) = 1$ when $H_0: \sigma_\mu^2 = 0$ is true. Moreover, in both
cases $E(MST)/E(MSE) > 1$ when $H_0$ is not true. Thus, in both cases
we reject $H_0$ when $F = MST/MSE$ is large.

1.4 Testing the Equality of Population Variances


Consider testing $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ versus $H_a$: At least two of
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2$ differ. We can test $H_0$ versus $H_a$ by using the sample
variances $s_1^2, s_2^2, \ldots, s_p^2$. Here, these sample variances are assumed to be
calculated from p independent samples of sizes $n_1, n_2, \ldots, n_p$ that have been
randomly selected from p normally distributed populations having variances
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_p^2$. Then, Hartley's test says that if all samples have the
same size m, we can reject $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ in favor of $H_a$ at level of
significance $\alpha$ if

$$F = \frac{\max(s_1^2, s_2^2, \ldots, s_p^2)}{\min(s_1^2, s_2^2, \ldots, s_p^2)}$$

is greater than $F_\alpha$, which is based on p numerator and m − 1 denominator
degrees of freedom. If the sample sizes are unequal, but do not differ substantially,
we can set m equal to the maximum sample size.
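
Hartley's statistic is not a built-in test in PROC GLM, but it is easy to assemble. The following is a minimal sketch of our own construction, assuming a data set GASOLINE with groups identified by GASTYPE; the output data set name VARS and the variable name S2 are our own choices.

PROC MEANS DATA=GASOLINE NOPRINT;
  CLASS GASTYPE;
  VAR MILEAGE;
  OUTPUT OUT=VARS VAR=S2;   /* one sample variance per group */
RUN;
PROC SQL;
  /* Hartley's F = max(s_i^2) / min(s_i^2); _TYPE_ = 1 keeps only
     the per-group rows of the PROC MEANS output data set */
  SELECT MAX(S2) / MIN(S2) AS HARTLEY_F
    FROM VARS
    WHERE _TYPE_ = 1;
QUIT;

The resulting ratio would then be compared with the critical value described above.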
Unfortunately, Hartley's test is very sensitive to departures from the
normality assumption. If the populations being sampled are described by
probability distributions that are somewhat nonnormal, but the population
variances are equal, Hartley's test is likely to incorrectly reject the
null hypothesis that the population variances are equal. An alternative test
that does not require the populations to have normal distributions is the
Brown-Forsythe-Levene (BFL) test. To carry out this test, which involves
considerable calculation, we let $z_{ij} = |y_{ij} - med_i|$, where $med_i$ denotes the
median of the observations in the ith sample. We then calculate

$$\bar{z}_{i.} = \frac{\sum_{j=1}^{n_i} z_{ij}}{n_i} \qquad \text{and} \qquad \bar{z}_{..} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{n_i} z_{ij}}{n}$$

where $n = n_1 + n_2 + \cdots + n_p$ is the total sample size. The BFL test then says
that we can reject $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2$ at level of significance $\alpha$ if

$$L = \frac{\sum_{i=1}^{p} n_i(\bar{z}_{i.} - \bar{z}_{..})^2 / (p - 1)}{\sum_{i=1}^{p}\sum_{j=1}^{n_i} (z_{ij} - \bar{z}_{i.})^2 / (n - p)}$$

is greater than $F_\alpha$, which is based on p − 1 numerator and n − p denominator
degrees of freedom.
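
The BFL test as described here corresponds to the documented HOVTEST=BF option of the MEANS statement in PROC GLM (Brown and Forsythe's variant of Levene's test, based on absolute deviations from the group medians). A minimal sketch, assuming the GASOLINE data set of Figure 1.4:

PROC GLM DATA=GASOLINE;
  CLASS GASTYPE;
  MODEL MILEAGE = GASTYPE;
  /* HOVTEST=BF tests H0: equal group variances using absolute
     deviations from the group medians */
  MEANS GASTYPE / HOVTEST=BF;
RUN;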

1.5 Using SAS


In Figure 1.4 we present the SAS program that yields the analysis of the
North American Oil Company data that is presented in Figure 1.3. Note
that in this program we employ a class variable to define the one factor
model.

DATA GASOLINE;
  /* defines factor GASTYPE and response variable MILEAGE */
  INPUT GASTYPE $ MILEAGE @@;
  /* data: see Table 1.1 */
  DATALINES;
A 34.0  A 35.0  A 34.3  A 35.5  A 35.8
B 35.3  B 36.5  B 36.4  B 37.0  B 37.6
C 33.3  C 34.0  C 34.7  C 33.0  C 34.9
;
/* PROC GLM specifies the General Linear Models Procedure */
PROC GLM;
  CLASS GASTYPE;                     /* defines class variable GASTYPE */
  MODEL MILEAGE = GASTYPE / P CLM;   /* specifies the model; CLM requests confidence intervals */
  ESTIMATE 'MUB-MUA' GASTYPE -1 1;        /* estimates mu_B - mu_A */
  ESTIMATE 'MUA-MUC' GASTYPE 1 0 -1;      /* estimates mu_A - mu_C */
  ESTIMATE 'MUB-MUC' GASTYPE 0 1 -1;      /* estimates mu_B - mu_C */
  ESTIMATE 'MUB-(MUC+MUA)/2'
           GASTYPE -.5 1 -.5;             /* estimates mu_B - (mu_C + mu_A)/2 */
PROC GLM;
  CLASS GASTYPE;
  MODEL MILEAGE = GASTYPE / P CLI;   /* CLI requests prediction intervals */

Notes:
1. The coefficients in the above ESTIMATE statements are obtained by writing the quantity to be
   estimated as a linear combination of the factor level means $\mu_A$, $\mu_B$, and $\mu_C$ with the factor levels
   considered in alphabetical order. For example, if we consider MUB-MUA (that is, $\mu_B - \mu_A$),
   we write this difference as
   $-\mu_A + \mu_B = -1(\mu_A) + 1(\mu_B) + 0(\mu_C)$
   Here, the trailing zero coefficient corresponding to $\mu_C$ may be dropped to obtain
   ESTIMATE 'MUB-MUA' GASTYPE -1 1;
   As another example, the coefficients in the ESTIMATE statement for
   MUB-(MUC+MUA)/2 (that is, $\mu_B - (\mu_C + \mu_A)/2$) are obtained by writing this expression as
   $\mu_B - (\mu_C + \mu_A)/2 = (-\tfrac{1}{2})(\mu_A) + 1(\mu_B) + (-\tfrac{1}{2})(\mu_C) = -.5(\mu_A) + 1(\mu_B) + (-.5)(\mu_C)$
   Thus we obtain
   ESTIMATE 'MUB-(MUC+MUA)/2' GASTYPE -.5 1 -.5;
2. Expressions inside single quotes (for example, 'MUB-MUA') are labels that may be up to 16
   characters in length.
3. Confidence intervals (CLM) and prediction intervals (CLI) may not be requested in the same
   MODEL statement when using PROC GLM.

Figure 1.4 SAS program to analyze the North American Oil Company data

1.6 Exercises

1.1 An oil company wishes to study the effects of four different gasoline
additives on mean gasoline mileage. The company randomly
selects four groups of six automobiles each and assigns a group of
six automobiles to each additive type (W, X, Y, and Z). Here, all
24 automobiles employed in the experiment are the same make and
model. Each of the six automobiles assigned to a gasoline additive is
test driven using the appropriate additive, and the gasoline mileage
for the test drive is recorded. The results of the experiment are given
in Table 1.5. A one-way ANOVA of these data is carried out by using
SAS. The PROC GLM output is given in Figure 1.5. Note that the
treatment means $\mu_W$, $\mu_X$, $\mu_Y$, and $\mu_Z$ are denoted as MUW, MUX,
MUY, and MUZ on the output.
(a) Identify and report the values of SSTO, SST, MST, SSE, and MSE.
(b) Identify, report, and interpret F and its associated p-value.
(c) Identify, report, and interpret the appropriate individual t statistics
and associated p-values for making all pairwise comparisons
of $\mu_W$, $\mu_X$, $\mu_Y$, and $\mu_Z$.
(d) Identify, report, and interpret the appropriate individual t statistic
and associated p-value for testing the significance of
$[(\mu_Y + \mu_Z)/2] - [(\mu_X + \mu_W)/2]$.
(e) Identify, report, and interpret a point estimate of and a 95
percent confidence interval for $\mu_Z$ (see observation 24).
(f) Identify, report, and interpret a point prediction of and a 95
percent prediction interval for $y_{Z0} = \mu_Z + \epsilon_{Z0}$.
1.2 Consider the one-way ANOVA of the gasoline additive data in
Table 1.5 and the SAS output of Figure 1.5.
(a) Compute individual 95 percent confidence intervals for all possible
pairwise differences between treatment means.
(b) Compute Tukey simultaneous 95 percent confidence intervals
for all possible pairwise differences between treatment means.
(c) Compute Scheffé simultaneous 95 percent confidence intervals
for all possible pairwise differences between treatment means.
(d) Compute Bonferroni simultaneous 95 percent confidence intervals
for the (prespecified) set of all possible pairwise differences
between treatment means.
(e) Which of the above intervals are the most precise?
1.3 Consider the one-way ANOVA of the gasoline additive data in
Table 1.5 and the SAS output of Figure 1.5. Also consider the
prespecified set of linear combinations (contrasts):

$$\mu_Z - \mu_W \qquad \mu_Y - \mu_W \qquad \mu_Z - \mu_X \qquad \frac{(\mu_Y + \mu_Z)}{2} - \frac{(\mu_X + \mu_W)}{2}$$
$$\mu_Y - \mu_X \qquad \mu_Z - \mu_Y \qquad \mu_X - \mu_W$$

(a) Compute an individual 95 percent confidence interval for
$(\mu_Y + \mu_Z)/2 - (\mu_X + \mu_W)/2$. If gasoline additives Y and Z
have an ingredient not possessed by gasoline additives X and W,
what do you conclude?
(b) Compute Scheffé simultaneous 95 percent confidence intervals for
the linear combinations in the aforementioned set.
(c) Compute Bonferroni simultaneous 95 percent confidence intervals
for the linear combinations in the aforementioned set.

Table 1.5 Gasoline additive test results

        Gasoline additive
        W          X          Y          Z
        31.2       27.6       35.7       34.5
        32.6       28.1       34.0       36.2
        30.8       27.4       35.1       35.2
        31.5       28.5       33.9       35.8
        32.0       27.5       36.1       34.9
        30.1       28.7       34.8       35.3
Mean    31.3667    27.9667    34.9333    35.3167
1.4 In order to compare the durability of four different brands of
golf balls (Alpha, Best, Century, and Divot), the National Golf
Association randomly selects five balls of each brand and places each
ball into a machine that exerts the force produced by a 250-yard
drive. The number of simulated drives needed to crack or chip each
ball is recorded. The results are given in Table 1.6. The Excel output
of a one-way ANOVA of these data is shown in Figure 1.6. Test
for statistically significant differences between the treatment means
$\mu_{Alpha}$, $\mu_{Best}$, $\mu_{Century}$, and $\mu_{Divot}$. Set $\alpha = .05$. Use pairwise comparisons
to find the most durable brands of golf balls.

SAS
GENERAL LINEAR MODELS PROCEDURE

DEPENDENT VARIABLE: MILEAGE

SOURCE             DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PR > F
MODEL               3     213.88125000   71.29375000    127.22   0.0001
ERROR              20      11.20833333    0.56041667
CORRECTED TOTAL    23     225.08958333

R-SQUARE    C.V.     ROOT MSE     MILEAGE MEAN
0.950205    2.3108   0.74860982   32.39583333

SOURCE    DF   TYPE I SS      F VALUE   PR > F   TYPE III SS    F VALUE   PR > F
ADDTYPE    3   213.88125000    127.22   0.0001   213.88125000    127.22   0.0001

PARAMETER          ESTIMATE      T FOR H0:     PR > |T|   STD ERROR OF
                                 PARAMETER=0              ESTIMATE
MUZ-MUW            3.95000000          9.14      0.0001     0.43221008
MUZ-MUX            7.35000000         17.01      0.0001     0.43221008
MUZ-MUY            0.38333333          0.89      0.3857     0.43221008
MUY-MUW            3.56666667          8.25      0.0001     0.43221008
MUY-MUX            6.96666667         16.12      0.0001     0.43221008
MUX-MUW           -3.40000000         -7.87      0.0001     0.43221008
(Y+Z)/2-(X+W)/2    5.45833333         17.86      0.0001     0.30561868

OBSERVATION   OBSERVED VALUE   PREDICTED VALUE   RESIDUAL      LOWER 95% CL   UPPER 95% CL
                                                               FOR MEAN       FOR MEAN
24            35.30000000      35.31666667       -0.01666667   34.67916207    35.95417126

OBSERVATION   OBSERVED VALUE   PREDICTED VALUE   RESIDUAL      LOWER 95% CL   UPPER 95% CL
                                                               INDIVIDUAL     INDIVIDUAL
24            35.30000000      35.31666667       -0.01666667   33.62998806    37.00334528

Figure 1.5 SAS output of a one-way ANOVA of the gasoline additive test results

Table 1.6 Golf ball durability test results

        Brand
        Alpha    Best     Century   Divot
        281      270      218       364
        220      334      244       302
        274      307      225       325
        242      290      273       337
        251      331      249       355
Mean    253.6    306.4    241.8     336.6

ANOVA
Source of Variation   SS        df   MS          F           P-Value     F crit
Between Groups        29860.4    3   9953.4667   16.420798   3.853E-05   3.2388715
Within Groups          9698.4   16    606.15
Total                 39558.8   19

Figure 1.6 Excel output of a one-way ANOVA of the golf ball durability data

1.5 Modify the golf ball durability data in Table 1.6 by assuming that
the four brands of golf balls have been randomly selected from the
population of all brands. Then, using the random model:
(a) Test $H_0: \sigma_\mu^2 = 0$ versus $H_a: \sigma_\mu^2 \neq 0$ by setting $\alpha$ equal to .05.
(b) Find point estimates of $\sigma^2$ and $\sigma_\mu^2$. Interpret.
(c) Find a 95 percent confidence interval for $\mu$. Interpret this interval.

Index

Analysis of variance (ANOVA)
  assumptions, 6
  one-way analysis of variance (one-way ANOVA), 4
  table, 15, 16
  two-way ANOVA, 53
  using pooling, 223
Basic design, 197, 200
Block confounding, 214-234
Block sum of squares (SSB), 80
Bonferroni simultaneous intervals, 24, 25, 26, 29, 30, 86
Brown-Forsythe-Levene (BFL) test, 39
Column vector, 93
Complete model, 103
Completely randomized experimental design, 2
Control treatment, 34
Covariate, 115
Cross-over design, 162-168
Dependent variable, 1
Designed experiment, 2
Design generator, 197, 200
Dunnett's procedure, 34
Error mean square, 11
Error sum of squares (SSE), 9, 80
Experimental units, 2
F (model) statistic, 96
F tests, 58, 59
Fisher's least significant difference (LSD), 34
Fixed models, 35-38
Fold over design, 205
Fractional factorials
  basic techniques, 189-204
  fold over designs, 204-214
  Plackett-Burman designs, 204-214
General analysis approach, 132-152
Hartley's test, 38
Independent variables, 1
ith factor level mean, 50
ith treatment effect, 4
jth factor level mean, 51, 105
L equations, 218
L procedure, 218
Latin square design, 158-162
Least squares point estimates, 93
Linear combination, 24
Matrix algebra, 94
Mean square error, 95
Mean squares, 8, 11
Multiple coefficient of variation, 95
Nested factors, 125-132
Null hypothesis, 8
One factor analysis
  basic concepts of, 1-3
  fixed models, 35-38
  population variances, equality of, 38, 39
  random models, 35-38
  significant differences between treatment means, 7-15
  treatment means, linear combinations of, 15-35
One factor model, 4
One-way analysis of variance (one-way ANOVA), 4
Overall mean, 4
Pairwise comparisons, 15
Pairwise differences, 15, 19
Partial F test, 103, 104
Point prediction, 20
Preliminary test of significance, 34
Principle block, 218
Random models, 35-38
Randomized block analysis of variance, 111
Randomized block design, 152-158
Randomized complete block design, 111
Randomized incomplete block design, 112
Reduced model, 103
Response surface methodology, 234-237
Response variable, 1, 3
Scheffé intervals, 29
Scheffé simultaneous intervals, 24, 25, 26, 62, 86
Side condition, 4
Significance, preliminary test of, 34
Split plot design, 152-158
Standard error, 95
Student-Newman-Keuls (SNK) procedure, 33
Sums of squares, 8
t-Test, 31
Test statistic, 11
Three factors, 132-152
Total sum of squares (SSTO), 10, 80
Treatment mean, 4
Treatment sum of squares (SST), 80
Treatment variability, 8
Tukey simultaneous intervals, 20
Two factor analysis
  estimation and prediction, 60
  interaction, 48
  mixed effects models, 67-74
  random effects models, 67-74
  randomized block design, 74-90
  regression analysis, 90-115
Two factor factorial, 152-158
Two-factor factorial experiment, 49, 179-189
Two-way ANOVA, 53
Unbalanced and incomplete experimental design data, 90-115
Unexplained variation, 95