
A Bayesian Analysis of Very Small Unreplicated Experiments

Víctor Aguirre-Torres^a* and Román de la Vara^b
It is not uncommon to deal with very small experiments in practice. For example, if the experiment is conducted on the production process, it is likely that only a very few experimental runs will be allowed. If testing involves the destruction of expensive experimental units, we might only have very small fractions as experimental plans. In this paper, we will consider the analysis of very small factorial experiments with only four or eight experimental runs. In addition, the methods presented here could be easily applied to larger experiments. A Daniel plot of the effects to judge significance may be useless for this type of situation. Instead, we will use different tools based on the Bayesian approach to judge significance. The first tool consists of the computation of the posterior probability that each effect is significant. The second tool is referred to in Bayesian analysis as the posterior distribution for each effect. Combining these tools with the Daniel plot gives us more elements to judge the significance of an effect. Because, in practice, the response may not necessarily be normally distributed, we will extend our approach to the generalized linear model setup. By simulation, we will show that, in the case of discrete responses and very small experiments, not only may the usual large sample approach to fitting generalized linear models produce very biased and variable estimators, but also that the Bayesian approach provides very sensible results. Copyright © 2013 John Wiley & Sons, Ltd.
Keywords: Generalized Linear Models; Unreplicated Factorial Experiments; Bayesian Model Selection; Significant Effects; Small Sample Analysis
1. Introduction
In practice, we may find situations where experimental runs are very expensive, and thus, small unreplicated experiments are used. For example, in a study of a cookie production process, each run was meant to produce a batch of 500 kg of product, and then, the responses were measured on samples of cookies taken from the batch. For this case, we were able to use at most two factors, each at two levels. Some of the responses were hardness (continuous), likeness of texture evaluated by a panel of 10 judges (binomial), and sweetness evaluated by a panel of 10 judges (binomial). Hence, in this paper, we will consider experiments with four or eight experimental runs and models that allow for non-normal data. Unreplicated experiments are typically analyzed using the Daniel plot [1], but for small fractions, this plot of the effects could provide little or no information.
We will give two examples in order to illustrate our ideas in this paper. First, we will consider a simulated 2^3 unreplicated full factorial experiment where the response has a gamma distribution. The observations are generated using the linear predictor η = 1.5A + 1.5B - 1.5C, with log as the link function (both terms will be defined in Section 2). The observations and the estimated coefficients are given in Table I. The coefficients were obtained using the function glm from the R statistical package [2] (R Foundation for Statistical Computing, Vienna, Austria) after fitting the saturated model.
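A minimal R sketch of how an experiment of this kind can be simulated and fit is given below; the random seed and the gamma shape parameter are assumptions made only for illustration, so the resulting coefficients will not reproduce Table I exactly.

    set.seed(1)                                            # hypothetical seed
    d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
    eta <- 1.5 * d$A + 1.5 * d$B - 1.5 * d$C               # linear predictor
    mu  <- exp(eta)                                        # log link: mu = exp(eta)
    r   <- 1                                               # assumed shape parameter
    d$y <- rgamma(nrow(d), shape = r, rate = r / mu)       # gamma response with E(y) = mu
    fit <- glm(y ~ A * B * C, family = Gamma(link = "log"), data = d)
    coef(fit)                                              # coefficients of the saturated model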
Clearly, the coefcients for A and C stand out from the rest. The usual procedure to determine their signicance is based on the
Daniel plot
3
in Figure 1. Signicance is hard to judge because all the coefcients fall along a straight line. If we include the
ShapiroWilk normality test
4
as part of the analysis of unreplicated experiments, the resulting p-value is 0.9894, suggesting that
the discrepancy observed in A and C falls within a normal variation. This is in agreement with the Daniel plots interpretation.
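As a numerical companion to the Daniel plot, the same test can be applied directly to the estimated effects. A sketch, assuming fit is the saturated glm object from the code above (the p-value will differ from 0.9894 because the simulated data differ from Table I):

    effects_hat <- coef(fit)[-1]     # estimated effects, excluding the intercept
    shapiro.test(effects_hat)        # a large p-value: no clear departure from a straight line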
Now, we will consider the more challenging situation of the cookie production process. As we mentioned before, several responses were measured, one of them being sweetness. We will discuss this case in more detail in Section 4; meanwhile, we will consider Table II, which shows the number of times that a panel of 10 judges found the formula given by the treatment combination sweeter than a control formulation. This table also shows the estimates of the effects provided by glm using a binomial model. Figure 2 is the corresponding Daniel plot.
a Statistics Department, Instituto Tecnológico Autónomo de México (ITAM), Río Hondo #1, México, D.F. 01080, Mexico
b Quality Engineering Department, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, GTO 03600, Mexico
*Correspondence to: Víctor Aguirre-Torres, Statistics Department, Instituto Tecnológico Autónomo de México (ITAM), Río Hondo #1, México, D.F. 01080, Mexico. E-mail: aguirre@itam.mx
Special Issue Article. Qual. Reliab. Engng. Int. 2014, 30:413–426. Published online 27 December 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/qre.1578. Copyright © 2013 John Wiley & Sons, Ltd.
The Daniel plot is very difficult to interpret, because a central line is not evident. The p-value of the Shapiro–Wilk normality test is 0.08, indicating a lack of normality. However, it is difficult to judge significance because all effects have a similar magnitude (around 6).
Therefore, based on the previous examples, we can now enumerate some of the challenges present in the analysis of very small unreplicated factorial experiments:
1. Let us assume that the response is normally distributed:
   1. The Daniel plot alone could be very difficult to interpret with 7 points, not to mention 3 points.
   2. Normality tests may have very little power to detect departures for 3 or even 7 points. Evidence of this assertion will be shown in Section 5.
   3. The interpretation of the Daniel plot depends on effect sparsity. This can be very restrictive for three effects.
2. When normality cannot be assumed, the analysis can be carried out by using a generalized linear model (GLM). This is a nonlinear model that is estimated iteratively.
   1. A referee noticed that when the response is binomial, the maximum likelihood estimator (MLE) may not exist [5]. Then, the analysis of the Daniel plot of the output of the iterative procedure, which may not be the MLE, could be worthless.
   2. If the response is assumed to be Poisson, even if the MLE existed, the Daniel plot would rely on the MLE's asymptotic normality, which is dubious for four or even eight observations. Challenge 1.1 applies here too.
   3. In fact, we ran some simulations for two 2^3 experiments, one with a binomial response and the other with a Poisson response. In both cases, we collected the estimates produced by the R function glm. These estimates varied widely: the bias and dispersion around the true values of the effects were huge in both cases. The latter is understandable for the binomial case because, theoretically, the MLE may not exist, but it was unexpected in the Poisson model. More discussion of these results is given in Section 5.
   4. In the case of a gamma response, when none of the observations is zero and the link function is the one mentioned in Table III, it is shown in [6] that the MLE exists and is finite and unique. In addition to the MLE's lack of normality in such small samples, challenges 1.1, 1.2, and 1.3 apply here too.
Figure 1. Daniel plot. Simulated 2^3 full factorial with gamma response

Table I. Observations and estimated coefficients for a simulated 2^3 factorial experiment with gamma response

Label   Observation   Coefficient
0       0.57          0.43
A       7.69          1.36
B       0.61          0.66
AB      52.08         0.39
C       0.01          1.66
AC      0.48          0.41
BC      0.37          0.16
ABC     0.21          0.84
3. There are other approaches used to analyze the GLM besides the Daniel plot. They correspond to information criteria [7] (Bayesian information criterion [BIC] and Akaike information criterion [AIC]) or stepwise model selection based on deviance [8].
   1. BIC and AIC are very similar. Both are a function of the MLE, something that, at least for the binomial and Poisson distributions, would be dubious.
   2. BIC and AIC are computed for all submodels, and then a decision is made by optimizing the criterion. Although, theoretically, the MLE exists for the gamma distribution, when we simulated several 2^3 experiments, it turned out that the R function glm failed to converge for several submodels. The latter may be caused by numerical problems of the iterative procedure. Hence, the approach could not be applied for the gamma model and very small factorials. See the supporting information in the online version of this paper.
   3. The stepwise procedure based on deviance depends on the large sample approximation of the χ² distribution to the difference of the deviances. This is very controversial for very small factorials.
4. One of the referees suggested considering the bootstrap. This approach has been used in experimental design [9]; however, the authors mention that their method is more convenient for moderate-size designed experiments with replication. Neither of these conditions holds in the situation considered in this paper. There are, however, other forms of bootstrap in a regression setup that would be interesting to explore.
To overcome the challenges mentioned previously, we propose to supplement the Daniel plot with tools from the Bayesian approach. The first tool is the posterior probability that an effect is active (PPA), and the second tool is the posterior distribution of each effect (PDE). We will now discuss the opportunities that these tools represent.
1. Both the PPA and the PDE are exact solutions for small samples. Hence, they do not depend on the MLE's asymptotic normality or any other large sample approximation.
2. Both the PPA and the PDE could work even if the MLE does not exist. This is because the posterior distributions and posterior probabilities do not depend on the MLE.
3. Even if the Daniel plot does not show significance, the PPA and PDE could provide evidence of significance. This is exemplified with the simulated experiment with gamma response.
4. If the 95% posterior probability interval of an effect is completely on one side of zero, then there would be evidence that the effect is active. The center of the distribution also gives the sign and an estimate of the effect.
5. Even if the posterior probability interval is not completely on one side of zero, the odds that an effect is positive (or negative) can be used as an indication of significance. This interpretation is possible because the PDE is a probability density for the unknown effect.
Table II. Observations and estimated coefficients for the cookie experiment

Label   Sweetness   GLM effect estimate
0       3
A       6           7.1
B       0           5.6
AB      9           6.5

Figure 2. Daniel plot. Cookie sweetness 2^2 factorial experiment
6. In addition, a by-product of the PPA is the probability that none of the effects is active. The latter is useful because, if this probability is small, then there is evidence that at least one of the effects should be active, and consequently, the pattern of probabilities of the effects would suggest which of them are active.
7. Neither the PPA nor the PDE depends on factor sparsity.
Some authors [10, 11] directly compare the posterior probabilities of all possible models and then choose the model with the highest posterior probability. As seen in those papers, several models may differ by a very small amount of probability, and then the decision is ambiguous. Instead, we have observed that computing the posterior probabilities for each effect is more informative.
The paper is organized in the following way. Section 2 presents the GLM formulation and the frequentist large sample approach to the analysis. It also shows some simulation results for small samples, displaying the need to supplement this approach with other tools. Section 3 gives the Bayesian setup and its application to the computation of the PPA and PDE. Section 4 shows the application of the Bayesian tools to the examples of the first section. Section 5 shows the results of simulation studies evaluating the PPA's performance and a comparison with the normality test approach, and it also contains a sensitivity analysis evaluating the impact of different prior choices on the PPA.
2. The generalized linear model
As shown in the cookie experiment, non-normal responses are present in industrial experiments [12, 13, 3]. We also found an experiment in our statistical practice where the response was just a Bernoulli random variable that indicated a change in color of a shampoo over time. Instead of trying to normalize the response, it is more natural to model it directly with an appropriate distribution. The previous situations led us naturally to consider a GLM to analyze these experiments, because this approach intrinsically incorporates the non-normal nature of the response.
However, a GLM is a nonlinear model, where the analysis relies on the large sample properties of the parameters' MLEs. To examine this more clearly, let y^t = [y_1, y_2, ..., y_n] be a vector of independent observations, with vector of means [μ_1, μ_2, ..., μ_n]. Under the GLM, the observation y_i has a distribution that is a member of the exponential family, that is,

\[ f(y_i \mid \theta_i;\, \phi) \propto \exp\{\, r(\phi)\,[\, y_i \theta_i - b(\theta_i) \,] + c(y_i; \phi) \,\}, \]

where r(·), b(·), and c(·) are specific functions depending on the family of distributions, θ_i is the natural location parameter, φ is the dispersion parameter, and the systematic part of the model involves the factors of the experiment plus their interactions, represented by the variables x_1, ..., x_k. The model is built around the linear predictor η = β_0 + β_1 x_1 + ... + β_k x_k. The model is constructed through the use of a link function η_j = g(μ_j), j = 1, ..., n. The link function is required to be monotonic and differentiable. A link is canonical if η_j = θ_j. The variance Var(y_j) also depends on μ_j; this specific functional dependence changes with the distribution.
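In R, the link g(·), its inverse, and the variance function are all carried by the family object passed to glm; a small illustration with standard R calls, shown only to make the definitions concrete:

    fam <- Gamma(link = "log")
    fam$linkfun(2)            # eta = g(mu) = log(2)
    fam$linkinv(0.6931)       # mu = g^{-1}(eta), approximately 2
    fam$variance(2)           # variance function V(mu) = mu^2 (times the dispersion)
    binomial()$linkfun(0.5)   # canonical logit link: log(0.5/0.5) = 0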
Let f(y | β) be the likelihood function of the model. The vector β contains the parameters in the linear predictor. If β̂ is the MLE of the coefficients of the linear predictor, then typically, the analysis relies on the asymptotic normality given by

\[ \sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{\;L\;} N\!\big(0,\, V(\beta)^{-1}\big), \qquad n \to \infty. \]

The matrix V(β) has the form X^t D(X, β) X. The matrix X is the usual design matrix containing main effects and interactions, and D(X, β) is a diagonal matrix that depends on the covariates, the density of y_i, and the derivatives of the link function. For explicit expressions of D(X, β) see, for example, [14], page 5.
It is tempting to use the ratio of β̂_i to the estimate of its asymptotic standard error, obtained from V(β̂), to find the significant effects. However, we tried this approach in example 7.4 of [3], and the estimated standard errors were enormous relative to the effects. Therefore, the individual normal standardized values are worthless. Instead, it is customary to make a normal plot of the standardized effects and then judge which points deviating from a central line could be considered as candidates for active effects.
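A sketch of this customary plot, using the saturated gamma fit from the sketch in Section 1; because a saturated fit leaves no degrees of freedom to estimate the dispersion, we simply plot the raw coefficient estimates, and the plotting choices are illustrative only:

    eff <- coef(fit)[-1]                       # estimated effects, intercept dropped
    qq  <- qqnorm(eff, main = "Normal plot of estimated effects")
    text(qq$x, qq$y, labels = names(eff), pos = 4, cex = 0.7)
    qqline(eff)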
In order to clearly demonstrate the impact of a small sample and the MLE's nonexistence, we generated 200 replicates of a 2^3 factorial experiment with a binomial response with 10 trials and a probability of success given by the linear predictor η = 2A - 3B + 3C + 2BC. By the same arguments as in [5], it can be shown that for the data of this experiment the MLE never exists. Figure 3 is the histogram of effect B.
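The following R sketch mirrors the Monte Carlo exercise just described; the seed is an assumption, so the exact figures will differ from ours, but the huge, discrete-looking estimates and the glm warnings about fitted probabilities of 0 or 1 should be apparent:

    set.seed(123)                                    # hypothetical seed
    d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
    eta <- 2 * d$A - 3 * d$B + 3 * d$C + 2 * d$B * d$C
    p   <- plogis(eta)                               # success probabilities, logit link
    est_B <- replicate(200, {
      y <- rbinom(nrow(d), size = 10, prob = p)
      fit <- suppressWarnings(                       # separation warnings are expected
        glm(cbind(y, 10 - y) ~ A * B * C, family = binomial, data = d))
      coef(fit)["B"]                                 # output of glm; may not be a true MLE
    })
    hist(est_B, main = "glm estimates of effect B (true value -3)")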
Table III. Log-likelihood functions for generalized linear models

Binomial: link $\eta_j = \ln[\pi_j/(1-\pi_j)]$; $\ln f(y \mid M_i, \beta_i) = \sum_{j=1}^{n} \big[\, y_j x_j^t \beta_i - n_j \ln\big(1 + e^{x_j^t \beta_i}\big) \,\big]$

Poisson: link $\eta_j = \log(\lambda_j)$; $\ln f(y \mid M_i, \beta_i) = \sum_{j=1}^{n} \big[\, -e^{x_j^t \beta_i} + y_j x_j^t \beta_i \,\big]$

Gamma: link $\eta_j = \log(\mu_j)$; $\ln f(y \mid M_i, \beta_i) = n r \ln r - n \ln \Gamma(r) - r n \beta_0 - r \sum_{j=1}^{n} y_j e^{-x_j^t \beta_i} + (r-1) \sum_{j=1}^{n} \ln y_j$
This plot clearly points out several problems: (i) the distribution is clearly non-normal; (ii) it takes only a discrete set of values; and (iii) the distribution is centered around -11, far from the true value of -3, suggesting that the estimator could be very biased. Similar facts were observed for the rest of the effects. In addition, we used the Shapiro–Wilk normality test as an indicator in order to verify whether the Daniel plot gives evidence of significance. Figure 4 is the histogram of the p-value of this test statistic for the 200 simulated experiments. It shows that less than 5% of the experiments gave a p-value of less than 0.1, suggesting that in most of the cases the Daniel plot did not give evidence of significance in terms of departure from the straight line. This also shows the normality test's low power.
We also generated 200 replicates of a 2^3 factorial experiment with a Poisson response with the linear predictor η = 1 + A - B + 0.75C + 0.5BC. Figure 5 gives the histogram for effect B. It should be around -1, but instead it is around -6 and with a large dispersion. Similar things happen with the other effects. This is similar to what we observed for the binomial distribution, suggesting that the MLE does not exist in this situation as well. Thus, it does not make sense to use it. Furthermore, the standard errors computed from the asymptotic variance were large (12313.17) in this case and, therefore, useless.
3. The Bayesian approach
The use of Bayesian methods in GLMs is not new [11, 14–16]. Those works mainly address the case of large data sets. This paper differs from them in that we consider very small structured data sets that present all of the challenges mentioned previously.
The Bayesian approach incorporates prior knowledge of the parameters. This knowledge is represented by a prior density function denoted by f(β). Then, the prior knowledge and the empirical evidence represented by the likelihood function are blended by Bayes' Theorem. This blend can take several forms. In this paper, we use only two, namely, to compute the PPA and the posterior probability distribution of the parameters.
Figure 3. Histogram of effect B. Monte Carlo simulated binomial 2^3 factorial experiment
Figure 4. Histogram of p-values, Shapiro–Wilk normality test. Simulated 2^3 factorial experiment with binomial response and 10 trials
3.1. A prior density
We will first consider the case of β. It is treated in several ways in the literature; see, for example, [14, 17]. In particular, in [17], page 421, the authors suggest the use of a multivariate normal prior with a given mean vector and covariance matrix, without indicating a specific procedure for obtaining these parameters. More explicit proposals for the binomial model are found in [18], which assumes the existence of a previous study from which the prior density for β is given. By using Bayes' Theorem, they obtain a posterior density, which is used as the prior density for the current study. If a previous study is not available, then a subset of the data could be used to furnish the prior. In our experience, no previous study is available, and our data sets are so small that splitting the data would not be feasible.
In Bayesian Data Analysis [17], the authors suggest a conjugate prior distribution approach that consists of adding hypothetical data points and then applying the non-informative prior distribution approach to the augmented data set. This approach is not useful for our situation because, in general, there is no guarantee that the integrated likelihood (10) exists for a general GLM when an improper prior is used.
Given the previous considerations, we used the following approach to define a prior. First, we considered the parameter β_0, which is linked to the original scale by the relationship

\[ \beta_0 = g(\mu_0). \qquad (1) \]

In order to obtain a prior distribution for β_0, notice that μ_0 could have the following two interpretations: first, it could be considered the mean response when none of the effects is significant; second, if the factors in the experiment are continuous, it is the mean response when all the factors in the experiment are set to zero, namely, the mean response in the central region of the experiment. Notice that μ_0 is related to an observable characteristic of the experiment. Regardless of the interpretation, this approach assumes that the experimenter has a broad idea about the value of the mean response, and it is stated in terms of a probability interval of the form

\[ P(L_\mu < \mu_0 < U_\mu) = 1 - \alpha_\mu, \qquad (2) \]

where L_μ and U_μ are a lower and an upper bound for μ_0, and α_μ is a small fraction, 5% or 1%. Assuming a strictly increasing link function, from (1) and (2) it follows that

\[ P\big(g(L_\mu) < \beta_0 < g(U_\mu)\big) = 1 - \alpha_\mu. \qquad (3) \]

Then, one way to employ a normal distribution in (3) is by choosing the following parameters for the prior density of β_0:

\[ \mu_{\beta_0} = \frac{g(L_\mu) + g(U_\mu)}{2}, \qquad \sigma_{\beta_0} = \frac{g(U_\mu) - \mu_{\beta_0}}{z_{1-\alpha_\mu/2}}, \qquad (4) \]

where z_α is the αth percentile of the standard normal distribution. If the link function is strictly decreasing, then μ_{β_0} remains the same, but σ_{β_0} changes to [g(L_μ) - μ_{β_0}]/z_{1-α_μ/2}. In the extreme case, when there is no information about β_0, a normal distribution around zero could be used. In Section 5, we give the results of this choice in a brief sensitivity study for all three distributions, where, if we have no information about β_0, we use a normal distribution N(0, 10) (the variance is equal to 100).

For the rest of the parameters, β_i, 1 ≤ i ≤ k, we will assume that they are independent Cauchy random variables centered at zero, because it is assumed that there is no information about the sign of the effect. If, however, there is information that an effect should have a certain sign, this could easily be incorporated by truncating accordingly when sampling from the prior distribution.
Figure 5. Histogram of effect B. Monte Carlo simulated Poisson 2^3 factorial experiment
The assumption of independence is common [19–21, 10]. The Cauchy distribution is chosen so that the prior is less informative. This choice is also supported by the sensitivity study of Section 5.
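A minimal sketch of the elicitation in (1)-(4) for a strictly increasing link; the function name is ours, and the numerical bounds in the call are the ones used for the gamma example of Section 5.2:

    beta0_prior <- function(L_mu, U_mu, alpha = 0.01, g = log) {
      m <- (g(L_mu) + g(U_mu)) / 2                  # prior mean, equation (4)
      s <- (g(U_mu) - m) / qnorm(1 - alpha / 2)     # prior standard deviation, equation (4)
      c(mean = m, sd = s)
    }
    beta0_prior(L_mu = 0.5, U_mu = 12)              # log link; bounds from the gamma example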
The previous proposal is sufficient for binomial and Poisson models. We use the same procedure for β, but there is an extra parameter for the gamma distribution, namely, the shape parameter r. To deal with this parameter, it is assumed that the density of y_j is given by

\[ f(y_j \mid M_i) = \frac{1}{\Gamma(r)}\, \gamma_j^{-r}\, e^{-y_j/\gamma_j}\, y_j^{\,r-1}. \qquad (5) \]

This model supposes that the parameter r is the same for all observations, that E(y_j) = μ_j = r γ_j, and that the canonical link is g(μ_j) = 1/μ_j = η_j; but this link is not used because some estimates may give negative values for μ_j. Consequently, a log link will be used, that is, log(μ_j) = η_j.

It is well known that Var(y_j) = μ_j²/r; thus, the coefficient of variation of y_j is given by CV(y_j) = √Var(y_j)/μ_j = 1/√r. To obtain a prior distribution for r, we assume that a probability interval for the coefficient of variation of the response is available, namely, that there are constants L_c, U_c, and α_c such that

\[ P\big(L_c < \mathrm{CV}(y) < U_c\big) = 1 - \alpha_c; \]

then

\[ P\!\left( \frac{1}{U_c^2} < r < \frac{1}{L_c^2} \right) = 1 - \alpha_c. \qquad (6) \]

Because r is positive, a two-parameter Γ(a, b) distribution is used as a prior. In order to fulfill (6), we need to solve in (a, b) the following system of nonlinear equations:

\[ P\!\left( \Gamma(a,b) < \frac{1}{U_c^2} \right) = w\,\alpha_c \quad \text{and} \quad P\!\left( \Gamma(a,b) > \frac{1}{L_c^2} \right) = (1-w)\,\alpha_c, \qquad (7) \]

where 0 < w < 1 is a weight that could be taken as 0.5 if both tails of the interval have the same probability. A solution for (7) could be obtained by minimizing

\[ Q_w(a,b) = \left[ P\!\left( \Gamma(a,b) < \frac{1}{U_c^2} \right) - w\,\alpha_c \right]^2 + \left[ P\!\left( \Gamma(a,b) > \frac{1}{L_c^2} \right) - (1-w)\,\alpha_c \right]^2 \]

with respect to (a, b). A possible strategy is to start with w = 0.5 and, if a solution is not found, iterate on w. When there is no information about CV(y_j), we propose tentatively to use the interval (L_c, U_c) = (0.15, 3.6) and α_c = 0.01; this choice represents a very generous interval.
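A hedged sketch of this minimization using a general-purpose optimizer; pgamma uses the (shape, rate) parametrization, and the function name and starting values are assumptions:

    cv_to_gamma_prior <- function(L_c = 0.15, U_c = 3.6, alpha_c = 0.01, w = 0.5) {
      Qw <- function(par) {
        a <- exp(par[1]); b <- exp(par[2])                  # keep a and b positive
        (pgamma(1 / U_c^2, shape = a, rate = b) - w * alpha_c)^2 +
          (1 - pgamma(1 / L_c^2, shape = a, rate = b) - (1 - w) * alpha_c)^2
      }
      opt <- optim(c(0, 0), Qw)                             # Nelder-Mead on the log scale
      setNames(exp(opt$par), c("a", "b"))
    }
    cv_to_gamma_prior()                                     # the "very generous" default interval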
3.2. Posterior probability that an effect is active
This approach is an extension to the GLM of the ideas given in [19–21], which dealt with observations having a normal distribution and an identity link function.
For a fractional factorial with n = 2^k runs, let m = 2^{n-1}. Consequently, there are M_0, M_1, M_2, ..., M_{m-1} possible models, where M_0 is the constant model in which none of the effects is significant. Model i, denoted by M_i, has a parameter vector β_i, which contains the active effects and, in the case of the gamma distribution, the shape parameter. The approach is focused on computing the PPA, which is given by

\[ P_j = \sum_{M_i\,:\,x_j \text{ is present}} p(M_i \mid y), \qquad (8) \]

where p(M_i | y) is the posterior probability of model M_i given that the experiment has been observed. From Bayes' Theorem,

\[ p(M_i \mid y) = \frac{p(M_i)\, f(y \mid M_i)}{\sum_{h=0}^{m-1} p(M_h)\, f(y \mid M_h)}. \qquad (9) \]

In the previous formula, p(M_i) is the prior probability that model M_i is true (to be determined later), and f(y | M_i) is the so-called integrated likelihood of model M_i, which is given by

\[ f(y \mid M_i) = \int f(y \mid M_i, \beta_i)\, f(\beta_i \mid M_i)\, d\beta_i, \qquad (10) \]

and the density f(β_i | M_i) is the prior density for the parameters present in model M_i. If a parameter is not active, then it is set to zero.
Effects with relatively large posterior probabilities are considered candidates for active effects. Notice that this proposal does not depend on a large sample size; in fact, it could be computed for any sample size. However, it may depend on the choice of prior distribution; hence, it is convenient to conduct a sensitivity analysis and observe how the pattern of posterior probabilities changes with respect to changes in the prior distribution.
Notice also that prior knowledge of the phenomenon can be incorporated by means of a prior distribution. For example, the subject matter expert may know that an effect should have a positive sign. If that is so, it can be easily incorporated into the calculation of the integrated likelihood by truncation.
For the normal distribution and the identity link function, Box and Meyer derived an analytic expression for (10). However, there is no closed-form formula for this integral in the general GLM case. Hence, we estimated the integral using a crude Monte Carlo average by simulating from the prior distribution of the parameters, f(β_i | M_i).
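The sketch below illustrates the whole computation of (8)-(10) for a hypothetical Poisson 2^2 experiment: crude Monte Carlo for the integrated likelihoods, the prior model probabilities described in the next paragraph with π = 0.3, and the resulting PPAs. The data, the standard Cauchy scale, and the Monte Carlo size are assumptions made for illustration, not the settings used in the paper.

    set.seed(1)
    X <- cbind(A = c(-1, 1, -1, 1), B = c(-1, -1, 1, 1), AB = c(1, -1, -1, 1))
    y <- c(2, 7, 1, 9)                                   # hypothetical counts
    loglik <- function(beta0, beta, cols) {              # Poisson log-likelihood, log link
      eta <- rep(beta0, length(y))
      if (length(cols)) eta <- eta + as.vector(X[, cols, drop = FALSE] %*% beta)
      sum(-exp(eta) + y * eta - lgamma(y + 1))
    }
    int_lik <- function(cols, N = 10000, mu0 = 0, sd0 = 10) {
      b0 <- rnorm(N, mu0, sd0)                           # prior draws for beta_0
      ll <- sapply(seq_len(N), function(i) {
        b <- if (length(cols)) rcauchy(length(cols)) else numeric(0)  # Cauchy effect priors
        loglik(b0[i], b, cols)
      })
      mean(exp(ll))                                      # crude Monte Carlo average of (10)
    }
    models <- list(character(0), "A", "B", "AB", c("A", "B"), c("A", "AB"),
                   c("B", "AB"), c("A", "B", "AB"))
    pi_eff <- 0.3                                        # prior probability an effect is active
    # prior model probability: pi^t (1-pi)^(k-t), k = 3 candidate effects;
    # constant factors cancel in the normalization (9)
    prior  <- sapply(models, function(m) pi_eff^length(m) * (1 - pi_eff)^(3 - length(m)))
    marg   <- sapply(models, int_lik)
    post   <- prior * marg / sum(prior * marg)           # equation (9)
    PPA    <- sapply(c("A", "B", "AB"),                  # equation (8)
                     function(e) sum(post[sapply(models, function(m) e %in% m)]))
    round(c(P_none = post[[1]], PPA), 3)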
In order to define the prior probability of a model, p(M_i), we considered that in a screening experiment all columns may have the same importance. If we let π be the prior probability that any one effect is active, then the probability of observing a model with t_i significant effects will be π^{t_i}(1 - π)^{n-t_i} [19]. From empirical evidence on fractional factorial experiments, known as factor sparsity, it has been observed that 0 < π < 0.4; thus, we consider the value π = 0.2 for 8 and 16 experimental runs, and π = 0.3 for four experimental runs.
In the Bayesian variable-selection approach of [10], a weighting procedure is used that tries to impose effect heredity, meaning that an interaction term is likely to be present when the corresponding main effects are present in the model. We did not pursue that option because the simulations given in Section 5 show that the choice of [19] gives sensible results.
Table III gives the corresponding links and log-likelihood functions for each of the models considered here. Regarding computation, it is convenient to calculate the log-likelihood first. The latter is particularly useful for the gamma distribution because the computation of the log-gamma function is much more stable than that of the gamma function itself.
3.3. Posterior distribution of the effects
Now, let us consider a second approach. From Bayes' Theorem, the PDE given the experiment is obtained from

\[ \pi(\beta_i \mid M_i, y) \propto f(y \mid M_i, \beta_i)\, f(\beta_i \mid M_i). \qquad (11) \]

In order to obtain a simulated sequence from the posterior distribution, a numerical method, namely Bayesian Markov chain Monte Carlo [22, 23], is used. This method simulates a sequence from a Markov chain whose limiting distribution is the posterior distribution π(β_i | M_i, y). The method's application uses the terms in the sequence after a burn-in period. The method is implemented in the free software OpenBUGS [24, 25].
The density π(β_i | M_i, y) could be used to find posterior probability intervals as well as to compute probabilities of events. For example, if A is an event, then its odds are P(A)/(1 - P(A)). With the posterior density, we can calculate the odds that a parameter is positive or negative and use them as evidence of whether the effect is active.
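The paper uses OpenBUGS for this step; purely as an illustration of the same idea, the sketch below runs a small random-walk Metropolis sampler in R for the hypothetical Poisson model of the previous sketch (the proposal step size, chain length, and burn-in are arbitrary assumptions):

    log_post <- function(theta) {                 # theta = (beta0, beta_A, beta_B, beta_AB)
      eta <- theta[1] + as.vector(X %*% theta[-1])
      sum(dpois(y, lambda = exp(eta), log = TRUE)) +
        dnorm(theta[1], mean = 0, sd = 10, log = TRUE) +   # N(0,10) prior for beta_0
        sum(dcauchy(theta[-1], log = TRUE))                # Cauchy priors for the effects
    }
    theta <- rep(0, 4); lp <- log_post(theta)
    draws <- matrix(NA_real_, nrow = 5000, ncol = 4)
    for (i in seq_len(nrow(draws))) {
      prop <- theta + rnorm(4, sd = 0.3)          # random-walk proposal (assumed step size)
      lp_prop <- log_post(prop)
      if (log(runif(1)) < lp_prop - lp) { theta <- prop; lp <- lp_prop }
      draws[i, ] <- theta
    }
    chain <- draws[-(1:1000), ]                   # discard burn-in
    apply(chain, 2, quantile, probs = c(0.025, 0.5, 0.975))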
4. Some examples
In this section, we will show the application of our proposals to the examples of the first section.
4.1. Simulated gamma example
Figure 6 shows the PPA for the simulated experiment with a gamma response. The probability labeled with zero is the probability that none of the effects is significant. In this example, it is very small, thereby providing evidence that at least one of the effects should be significant. Furthermore, in this plot, effects A and C stand out from the rest, supporting the fact that they should be significant.
We were able to obtain the posterior distribution for β_0, r, and five effects with eight observations and a gamma model. Figure 6 also shows the boxplots of the PDEs. They give strong evidence that effect A is positive and that effect C should be negative.
Table IV gives some quantiles from the posterior distribution. In particular, the 95% posterior probability intervals support the fact that effect A is positive and effect C should be negative. Notice also that they contain the true values of the parameters. In this case, the odds are, for A being positive, 0.975/0.025 = 39, or 39 to 1, and for C being negative, 0.98/0.02 = 49, or 49 to 1. Both are huge.
Therefore, in this case, the tools from the Bayesian approach led us to a much more enriching analysis than that of the Daniel plot.
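For instance, assuming a matrix of posterior draws such as chain from the Metropolis sketch in Section 3.3, with one column per parameter, the posterior odds that an effect is positive reduce to a one-liner:

    p_pos <- mean(chain[, 2] > 0)    # column 2 is assumed to hold effect A's draws
    p_pos / (1 - p_pos)              # odds that the effect is positive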
4.2. Cookie experiment: sweetness
In order to evaluate sweetness, factory management considered it convenient to use a 2-alternative forced choice response [26]. Each of the 10 judges evaluated four pairs of cookies in random order, between and within pairs. Each pair contained a control formulation and a formula coming from the corresponding treatment combination. If the formula from the treatment combination was considered sweeter than the control, then it was counted as a success. Thus, the response to be modeled was the total number of successes for each treatment combination.
Table II contains this response as well as the estimates of the effects produced by glm. Figure 2 is the corresponding Daniel plot and, as mentioned previously, its analysis is difficult. In contrast, Figure 7, which gives the PPA, provides the following interpretations:
1. The probability that none of the effects is active is practically zero; thus, there is at least one active effect.
2. Effect A is clearly active.
3. Effect B is clearly inactive.
4. Effect AB is very likely to be active.
Figure 7 gives the boxplots of the PDEs and suggests that both effects A and AB are active and positive. This is also confirmed by Table V and the 95% posterior probability intervals for A and AB. From Figure 7, the third quartile for effect B is close to zero; hence, the posterior odds that effect B is negative are around 3 to 1, which is not very definite. Therefore, in this example and for this response, the tools from the Bayesian approach gave us a more enriching analysis of the data.
5. Some simulation results
In order to compare the sampling behavior of the posterior probability method and the frequentist approach, we ran a simulation for the three distributions discussed here. We will only present the results for the binomial and gamma distributions because the results for the Poisson distribution are similar to those of the binomial case. We used a 2^3 experiment in all cases, with different linear predictors. In all cases, we ran 200 Monte Carlo replications. This is a small number but sufficient to show the main features of both the Bayesian and frequentist approaches.
For each replication, we computed the estimated effects from the function glm of the R package, the Shapiro–Wilk normality test on the estimated effects (to resemble the interpretation of the Daniel plot) with its corresponding p-value, and the posterior probability that each of the effects is active.
Then, we ran a sensitivity study. The purpose is to evaluate the impact of three different aspects of the prior choices on the PPA. We used examples with binomial, Poisson, and gamma responses.
5.1. Binomial distribution
The linear predictor was η = 2A - 3B + 3C + 2BC. The prior considerations were (L_μ, U_μ) = (0.1, 0.9) and α_μ = 0.01. As we mentioned in Section 2, the estimators of the effects are very biased and show a strong effect of discreteness. See, for example, Figure 8, which is the scatter plot of the estimates of effects C and AB.
The estimator of effect C is around 10 when its true value is 3. The estimator of effect AB is around 2.5 when its true value is 0. There is also a positive correlation between the two estimators, around 0.84; this contaminates the interpretation of the Daniel plot because in some samples AB may look significant, but this is due to its correlation with a significant effect.
Table IV. Means, standard deviations, and quantiles of the posterior distributions of the effects for a simulated 2^3 factorial experiment with a gamma response

Effect   node      mean     sd     2.5%    median   97.5%
A        beta[1]    0.83    0.43    0.05    0.84     1.64
B        beta[2]    0.38    0.43   -0.47    0.38     1.22
C        beta[3]   -0.98    0.45   -1.83   -0.98    -0.08
AB       beta[4]    0.04    0.43   -0.80    0.04     0.90
AC       beta[5]   -0.48    0.43   -1.33   -0.48     0.38

Figure 6. Posterior probability that an effect is active and boxplots of the posterior distributions of the effects. Simulated gamma response. [1] A, [2] B, [3] C, [4] AB, and [5] AC
Other effects are also highly correlated. Because it is not possible to visually interpret the 200 replications, in order to evaluate the Daniel plot's performance, we used the p-values obtained from the Shapiro–Wilk normality test as a proxy. Figure 4 shows the histogram of these p-values. We noticed that they are below 0.10 for less than 5% of the replications, thus implying a low power to detect non-normality.
On the other hand, Figure 13 (1) gives the boxplots of the posterior probabilities that each of the effects is active. It clearly shows that the posterior probabilities for effects B and C are practically one for most of the replications, and that the corresponding probabilities for effects A and BC are similar to each other, because these effects are of the same size. Furthermore, the probabilities of the rest of the effects are, in general, close to zero, except in some exceptional samples where they are greater than 0.5.
It is remarkable that in this very small experiment, there is no need for effect sparsity: four effects (50%) were active, and they were all detected effectively.
5.2. Gamma distribution
The linear predictor was η = 1 + 2A - B + 0.5C. The prior considerations were, for μ_0, (L_μ, U_μ) = (0.5, 12) with α_μ = 0.01, and for the coefficient of variation, (L_c, U_c) = (0.15, 3.6) with α_c = 0.01. Both intervals are very generous. Figure 9 gives the histogram for the estimates of effect A, where the distribution is unimodal, centered around the true value, and without gaps.
Figure 10 is the scatter plot between the estimates of effects A and AB. It shows that the estimates of effect AB are centered around zero, as they should be, and also that they are not correlated with those of A (the estimated correlation is 0.027).
Figure 11 gives the histogram of the p-values of the Shapiro–Wilk normality test. It shows that the p-value is below 0.10 for less than 20% of the replications, which is a very low score if we expect the Daniel plot to show signs of non-normality.
Finally, Figure 12 gives the boxplots of the PPA. The probability that none of the effects is active is quite close to zero, with very low variation, as expected. The probabilities of effect A, the largest effect, are also very close to one and with low variation. The probabilities of effect B have an interquartile range between 0.2 and 0.8. The probabilities of the smallest active effect are, in general, larger than those of the null effects. Thus, this approach behaves as expected.
5.3. A sensitivity analysis
As mentioned by both referees, when analyzing a small fractional factorial experiment with a Bayesian approach, the prior used can influence the result. For this reason, we included a sensitivity analysis with respect to three aspects of the prior choice. The analysis is constructed using a fractional factorial setup. For the binomial and gamma distributions, the linear predictors and prior considerations are the same as in Section 5. For the Poisson response, the linear predictor is η = 1 + A - B + 0.75C + 0.5BC, and the prior considerations for μ_0 are (L_μ, U_μ) = (0.5, 50) with α_μ = 0.01. In this case, μ_0 = e^1, and it is contained in the a priori interval.
The factors and their levels are given in Table VI. The experiment consisted of a 2^{3-1} fraction with defining contrast I = ABC = -1 and 200 independent Monte Carlo replications for each treatment combination for each response: binomial, Poisson, and gamma.
Notice that the treatment (-1, -1, -1) is the choice used in the examples and in the previous simulation results. The order of treatments, or scenarios, is given in Table VII.
Figure 7. Posterior probability that an effect is active and boxplots of the posterior distributions of the effects. Cookie sweetness experiment. [1] A, [2] B, and [3] AB
Table V. Means, standard deviations, and quantiles of the posterior distributions of the effects for the cookie sweetness 2^2 factorial experiment

Effect   node      mean     sd     2.5%    median   97.5%
A        beta[1]    2.68    0.95    1.19    2.53     4.88
B        beta[2]   -0.91    0.94   -3.04   -0.78     0.62
AB       beta[3]    2.00    0.95    0.51    1.85     4.22
Factor A represents the experimenter's willingness to assume that there are active effects. Factor B represents the prior knowledge of the values of the response. Factor C represents a choice between a peaked, infinite-variance distribution and a flat, finite-variance distribution for the effects; both choices are symmetric around zero.
Figure 13 gives the boxplots of the PPA for each scenario for the binomial distribution. Generally speaking, effects with a PPA close to one or zero remained quite similar across scenarios. A comparable thing happened with the other responses. More evident changes occurred for effects with probabilities in the middle. In order to measure their impact, we used the factorial structure of the study with the mean PPA as the response. Then, we computed the effects of each aspect in the usual way. Table VII gives the means and standard deviations of the PPA of effect A for the binomial response.
We calculated the estimated effects and corresponding standard deviations using the data of Table VII for some variables where the previous exploratory analysis suggested an important change. Table VIII gives the impacts for the chosen variables and distributions. Let us consider the prior probability π for the binomial example. It has a negative effect of around 8%, but for the Poisson and gamma distributions, it has a positive effect of 8% and 46%, respectively. All of these effects are significant. Given this result, and because this choice is inexpensive to change, we suggest obtaining the PPA for both values of π and then comparing the resulting patterns.
Regarding prior information on β_0, there is a significant negative effect; the exception is the binomial distribution. The latter is expected because, in this case, β_0 is equal to zero, which corresponds to π_0 = 0.5 and is within the prior interval. Therefore, this suggests the importance of giving an approximate interval for μ_0, which, according to our experience, is not too difficult to elicit because we have found, in general, that the people involved in experiments have a broad idea of the average values of the response.
The effects of the peaked versus flat distribution, at least for all the examples shown, are negative and significant, meaning that the Cauchy distribution is a better choice than the uniform distribution. We chose the Cauchy distribution because it has an infinite variance, which seems to be useful.
Figure 8. Scatter plot of effect estimates, AB versus C. Simulated 2^3 experiment with binomial response and 10 trials. Linear predictor: η = 2A - 3B + 3C + 2BC
Figure 9. Histogram of effect A. Monte Carlo simulated 2^3 factorial experiment with gamma response
Figure 10. Scatter plot of effect estimates, AB versus A. Simulated 2^3 experiment with gamma response. Linear predictor: η = 1 + 2A - B + 0.5C
Figure 11. Histogram of p-values, Shapiro–Wilk normality test. Simulated 2^3 factorial experiment with gamma response
Figure 12. Boxplots of the posterior probability that an effect is active. Simulated 2^3 factorial experiment with gamma response. Linear predictor: η = 1 + 2A - B + 0.5C
Table VI. Factors and levels for the sensitivity analysis

Factor   Prior aspect                                    (-)                           (+)
A        Prior probability π that an effect is active    0.2                           0.3
B        Prior information on β_0 and its density        Yes, N(μ_{β_0}, σ_{β_0})      No, N(0, 10)
C        Prior distribution for the effects β_i          Cauchy                        Uniform(-30, 30)
Figure 13. Boxplots of the posterior probability that an effect is active. Sensitivity analysis with binomial response. Linear predictor: η = 2A - 3B + 3C + 2BC
Table VII. Sensitivity analysis: average posterior probability that an effect is active (PPA), and its standard deviation, for the binomial response, variable A

 A    B    C    Average PPA   Std. dev. PPA
-1   -1   -1    0.6316        0.3248
 1    1   -1    0.5275        0.4109
 1   -1    1    0.4405        0.3989
-1    1    1    0.4904        0.3859

PPA: posterior probability that an effect is active.
Table VIII. Summary of the sensitivity analysis for selected variables: effects of the prior aspects on the average posterior probability that an effect is active

          Binomial, A          Poisson, A           Gamma, B
Aspect    Effect    t-stat     Effect    t-stat     Effect    t-stat
A         -0.08     -2.85       0.08      3.12       0.46     17.16
B         -0.03     -1.00      -0.09     -3.32      -0.23     -8.34
C         -0.11     -4.23      -0.18     -6.66      -0.25     -9.12
6. Concluding remarks
In this paper, we proposed to supplement the use of Daniel plots with tools from the Bayesian approach, namely the PPA and the PDE, when the experiments are unreplicated. We showed that this is particularly useful when the size of the experiment is very small, for both normal and non-normal responses.
Acknowledgements
Víctor Aguirre-Torres is a Professor in ITAM's Statistics Department, and Román de la Vara is a Professor in CIMAT's Quality Engineering Department. The first author was partially supported by Asociación Mexicana de la Cultura A.C. and conducted part of this research while on sabbatical leave at CIMAT's Probability and Statistics Department. Both authors wish to thank the editor and two anonymous referees for their helpful comments, which helped to improve the paper.
References
1. Daniel C. Applications of Statistics to Industrial Experimentation. John Wiley and Sons: New York, NY, 1976.
2. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2012. ISBN 3-900051-07-0. URL http://www.R-project.org/.
3. Myers RH, Montgomery DC, Vining G. Generalized Linear Models: with Applications in Engineering and the Sciences (2nd edn). John Wiley and Sons: New York, NY, 2010.
4. Benski HC. Use of a normality test to identify significant effects in factorial designs. Journal of Quality Technology 1989; 21:174–178.
5. Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984; 71:1–10.
6. Wedderburn R. On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika 1976; 63:27–32.
7. Wang X, George E. Adaptive Bayesian criteria in variable selection for generalized linear models. Statistica Sinica 2007; 17:667–690.
8. SAS Institute. 9.2 User's Guide (2nd edn). SAS Institute: Cary, NC, 2013.
9. Kenett R, Rahav E, Steinberg D. Bootstrap analysis of designed experiments. Quality and Reliability Engineering International 2006; 22:659–667.
10. Chipman H, Hamada M, Wu CFJ. A Bayesian variable-selection approach for analyzing designed experiments with complex aliasing. Technometrics 1997; 39:372–381.
11. Weaver BP, Hamada M. A Bayesian approach to the analysis of industrial experiments: an illustration with binomial count data. Quality Engineering 2008; 20:269–280.
12. Hamada M, Nelder J. Generalized linear models for quality-improvement experiments. Journal of Quality Technology 1997; 29:292–303.
13. Lewis S, Montgomery DC. Examples of designed experiments with non-normal responses. Journal of Quality Technology 2001; 33:265–278.
14. Dey DK, Ghosh SK, Mallick BK (eds.). Generalized Linear Models: A Bayesian Perspective. Marcel Dekker: New York, NY, 2000.
15. Wu CFJ, Hamada M. Experiments: Planning, Analysis, and Parameter Design Optimization. John Wiley and Sons: New York, NY, 2000.
16. Ntzoufras I, Dellaportas P, Forster JJ. Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference 2003; 111:165–180.
17. Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis (2nd edn). Chapman & Hall/CRC: Boca Raton, FL, 2004.
18. Chen M, Shao Q, Ibrahim J. Monte Carlo Methods in Bayesian Computation. Springer Series in Statistics: New York, NY, 2000.
19. Box G, Meyer R. An analysis for unreplicated fractional factorials. Technometrics 1986; 28:11–18.
20. Box G, Meyer R. Analysis of unreplicated factorials allowing for possibly faulty observations. In Design, Data, and Analysis, Mallows C (ed.). John Wiley and Sons: New York, NY, 1987.
21. Box G, Meyer R. Finding the active factors in fractionated screening experiments. Journal of Quality Technology 1993; 25:94–104.
22. Gilks WR, Richardson S, Spiegelhalter DJ (eds.). Markov Chain Monte Carlo in Practice. Chapman and Hall: London, UK, 1996.
23. Brooks SP. Markov chain Monte Carlo method and its application. The Statistician 1998; 47:69–100.
24. OpenBUGS, version 3.2.2 rev 1063. http://www.openbugs.info/w.cgi/FrontPage (accessed 3 February 2013).
25. Gilks W. Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics 4, Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds). Oxford University Press: Oxford, UK, 1992; 641–665.
26. Gacula M, Singh J, Bi J, Altan A. Statistical Methods in Food and Consumer Research (2nd edn). Academic Press: New York, NY, 2009.
Authors' biographies
Víctor Aguirre-Torres has a PhD in Statistics from North Carolina State University. He is an elected member of Mexico's Sistema Nacional de Investigadores in the area of Physics and Mathematics. He is a professor in the Statistics Department at Instituto Tecnológico Autónomo de México.
Román de la Vara has a doctoral degree in Probability and Statistics from the Center for Research in Mathematics (CIMAT) in México. He is a professor in CIMAT's Quality Engineering Department.
Supporting information
Additional supporting information may be found in the online version of this article at the publisher's web site.