Quality Technology & Quantitative Management
Vol. 6, No. 4, pp. 353-369, 2009
ICAQM 2009

A Bayesian Reliability Approach to Multiple Response Optimization with Seemingly Unrelated Regression Models

John J. Peterson (1), Guillermo Miró-Quesada (2) and Enrique del Castillo (3)

(1) Research Statistics Unit, GlaxoSmithKline Pharmaceuticals, King of Prussia, PA, USA
(2) Bioprocess Research and Development, Lilly Technical Center-North, Indianapolis, IN, USA
(3) Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA, USA

(Received September 2006, accepted April 2008)
______________________________________________________________________
Abstract: This paper presents a Bayesian predictive approach to multiresponse optimization experiments. It generalizes the work of Peterson [33] in two ways that make it more flexible for use in applications. First, a multivariate posterior predictive distribution of seemingly unrelated regression models is used to determine optimum factor levels by assessing the reliability of a desired multivariate response. It is shown that it is possible for optimal mean response surfaces to appear satisfactory yet be associated with unsatisfactory overall process reliabilities. Second, the use of a multivariate normal distribution for the vector of regression error terms is generalized to that of the (heavier-tailed) multivariate t-distribution. This provides a Bayesian sensitivity analysis with regard to moderate outliers. The effect of adding design points is also considered through a preposterior analysis. The advantages of this approach are illustrated with two real examples.
Keywords: Design space, desirability function, Gibbs sampling, multivariate t-distribution, posterior predictive distribution, robust parameter design, robust regression.
______________________________________________________________________
1. Introduction
Statistically designed experiments and associated response surface methods are considered effective methods for optimizing products and processes. Much has been written about experiments involving a single response, but less has been written about multiple response experiments, although they are quite prevalent. Popular statistical packages such as Design Expert and JMP
In stacked form the model can be written as

Y = Zβ + e,   (2)

where Y = [Y_1', ..., Y_r']', β = [β_1', ..., β_r']', e = [e_1', ..., e_r']', and Z is an m × p block diagonal matrix of the form Z = diag(Z_1, ..., Z_r), with p = p_1 + ... + p_r. Here, Y_i = (Y_i1, ..., Y_in)', e_i = (e_i1, ..., e_in)', and Z_i = [z_i(x_1), ..., z_i(x_n)]' for i = 1, ..., r.

For the SMR model under a noninformative prior (which is proportional to |Σ|^{-(r+1)/2}), the posterior predictive density function of Y has the multivariate
t-distribution form. See, for example, Press [36]. Simulation of a multivariate t-distribution r.v. with ν df can be done simply by simulating a multivariate normal r.v. and an independent chi-square r.v. with ν df (Johnson [18]). For the SMR model then, p(x) = Pr(Y ∈ A | x, data) can be computed directly for each x by Monte Carlo simulation. This was done by Peterson [33] as a way to do multiresponse surface optimization by maximizing p(x) over the experimental region. Miró-Quesada et al. [29] extended these results for the SMR case to include noise variables. Multivariate t-distribution probabilities over hyper-rectangles can also be computed efficiently by numerical integration (Genz and Bretz [13]).
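This Monte Carlo step can be sketched in a few lines of Python/NumPy. The location vector, scale matrix, df, and region A below are invented for illustration (they are not posterior quantities from the paper); the construction itself is the normal/chi-square method just described:

```python
import numpy as np

rng = np.random.default_rng(0)

def rmvt(loc, scale, df, size, rng):
    """Multivariate t draws: loc + Z / sqrt(u/df), with Z ~ N(0, scale)
    and u an independent chi-square(df) r.v."""
    z = rng.multivariate_normal(np.zeros(len(loc)), scale, size=size)
    u = rng.chisquare(df, size=size)
    return loc + z / np.sqrt(u / df)[:, None]

# Hypothetical bivariate posterior predictive at a fixed x, with
# A = {y1 <= 240, y2 <= 19} as in the mixture example of Section 6.1.
loc = np.array([230.0, 18.0])                  # assumed predictive location
scale = np.array([[25.0, 1.0], [1.0, 0.25]])   # assumed predictive scale
draws = rmvt(loc, scale, df=8, size=100_000, rng=rng)
p_hat = np.mean((draws[:, 0] <= 240) & (draws[:, 1] <= 19))  # estimates p(x)
```

Repeating this at each point of a grid of x values (with the location and scale recomputed at each x) and taking the argmax reproduces the optimization strategy of Peterson [33].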
For the SUR model, no closed-form density or sampling procedure exists. However, using Gibbs sampling (Griffiths [15]) it is easy to generate random pairs of SUR model parameters from the posterior distribution of (β, Σ). Using the SUR model in (1) it is then straightforward to simulate r.v.'s from the posterior predictive distribution of Y given x.
3. Computing the Bayesian Reliability
3.1. The SUR Model with Normally Distributed Error Terms
Before describing the Bayesian analysis, it is convenient to discuss some (conditional) maximum likelihood estimates for the SUR model. For a given Σ, the maximum likelihood estimate (MLE) of β can then be expressed as

β̂(Σ) = [Z'(Σ⁻¹ ⊗ I_n)Z]⁻¹ Z'(Σ⁻¹ ⊗ I_n)Y,   (3)

where I_n is the n × n identity matrix and ⊗ is the Kronecker direct product operator. The variance-covariance matrix of β̂ is Var(β̂) = [Z'(Σ⁻¹ ⊗ I_n)Z]⁻¹.
For a given β, the variance-covariance matrix, Σ, can be estimated by

Σ̂(β) = (1/n) Σ_{j=1}^n e_j(β) e_j(β)',   (4)

where e_j(β) = (e_1j(β), ..., e_rj(β))' and e_ij(β) = y_ij − z_i(x_j)'β_i, i = 1, ..., r.
Let β̂_i be the maximum likelihood estimator of β_i for each response type independently of the other responses, and define β̂ = (β̂_1', ..., β̂_r')'. The estimator of β,

β* = [Z'(Σ̂(β̂)⁻¹ ⊗ I_n)Z]⁻¹ Z'(Σ̂(β̂)⁻¹ ⊗ I_n)Y,   (5)

is called the two-stage Aitken estimator (Zellner [43]).
In order to compute and maximize p(x) = Pr(Y ∈ A | x, data) over the experimental region, it is important to have a relatively efficient method for approximating p(x) by Monte Carlo simulations. The approach taken in this paper is to simulate a large number of r.v.'s from the posterior distribution of (β, Σ), and use each (β, Σ) value to generate a Y r.v. for each x. In this way, the sample of (β, Σ) values can be reused for simulating Y values at each x point, instead of having to do the Gibbs sampling all over again for each x point.
Consider the noninformative prior for (β, Σ), which is proportional to |Σ|^{-(r+1)/2} (Percy [32] and Griffiths [15]). Note that the posterior distribution of β given Σ is modeled by

β | Σ ~ N(β̂(Σ), [Z'(Σ⁻¹ ⊗ I_n)Z]⁻¹),   (6)

where β̂(Σ) has the form as in (3). This follows from Srivastava and Giles [39]. Note also that the posterior distribution of Σ⁻¹ given β is described by

Σ⁻¹ | β ~ W(n, n⁻¹ Σ̂(β)⁻¹),   (7)

where W(n, n⁻¹ Σ̂(β)⁻¹) is the Wishart distribution with n df and scale parameter n⁻¹ Σ̂(β)⁻¹, and Σ̂(β) has the form as in (4). This follows from a slight modification of expression (7) in Percy [32]. Sampling (β, Σ) values from the posterior distribution can be done as follows using Gibbs sampling:
Step 0. Initialize the Gibbs sampling chain using β_0 = β* + t V^{1/2} ε, where β* corresponds to (5), V = [Z'(Σ̂(β̂)⁻¹ ⊗ I_n)Z]⁻¹ corresponds to the form appearing in (5), and ε ~ N(0, I_p). Here, t can be used to induce a slight overdispersion, as recommended by Gelman et al. [12]. In this article, t = 2 is used. This initialization is done since β is approximately normal with mean β* and variance-covariance matrix V.
Step 1. Generate a Σ value as in (7) by using the most recently simulated β and the decomposition Σ⁻¹ = Ψ S Ψ', where Ψ satisfies ΨΨ' = n⁻¹ Σ̂(β)⁻¹ and S = Σ_{i=1}^n c_i c_i'. Here, c_1, ..., c_n are iid N(0, I_r) distributed.
Step 2. Generate a β value as in (6) by using the most recently simulated Σ and β = β̂(Σ) + R'c_0, where R'R = [Z'(Σ⁻¹ ⊗ I_n)Z]⁻¹ and c_0 is distributed as N(0, I_p).
Following Percy [32], we use a burn-in of 100 iterations for steps 1 and 2. See Geweke [14] for the use of (conditionally conjugate) informative priors for β and Σ.
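The two-step sampler above can be sketched compactly in Python/NumPy on a simulated two-response SUR data set. The designs, coefficients, run lengths, and the plain OLS start (used here in place of the overdispersed Step 0) are all illustrative assumptions, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated 2-response SUR data (illustrative only)
n, r = 40, 2
x = rng.uniform(-1, 1, size=n)
Z1 = np.column_stack([np.ones(n), x])         # design for response 1
Z2 = np.column_stack([np.ones(n), x, x**2])   # design for response 2
beta_true = np.array([1.0, 2.0, 0.5, -1.0, 0.3])
Sigma_true = np.array([[1.0, 0.4], [0.4, 0.5]])
E = rng.multivariate_normal(np.zeros(r), Sigma_true, size=n)
Y1 = Z1 @ beta_true[:2] + E[:, 0]
Y2 = Z2 @ beta_true[2:] + E[:, 1]

Zb = np.zeros((n * r, 5))                     # block-diagonal Z as in (2)
Zb[:n, :2], Zb[n:, 2:] = Z1, Z2
Y = np.concatenate([Y1, Y2])                  # stacked response vector

def gls(Sigma):
    """beta_hat(Sigma) of (3) and its covariance [Z'(Sigma^-1 x I_n)Z]^-1."""
    Wm = np.kron(np.linalg.inv(Sigma), np.eye(n))
    A = Zb.T @ Wm @ Zb
    return np.linalg.solve(A, Zb.T @ Wm @ Y), np.linalg.inv(A)

def sigma_hat(beta):
    """Sigma_hat(beta) of (4): average outer product of per-run residuals."""
    resid = (Y - Zb @ beta).reshape(r, n).T   # n x r residual matrix
    return resid.T @ resid / n

beta = np.linalg.lstsq(Zb, Y, rcond=None)[0]  # crude OLS start
draws = []
for it in range(300):
    # Step 1: Sigma^-1 | beta ~ Wishart(n, (n Sigma_hat)^-1), via S = sum c_i c_i'
    Psi = np.linalg.cholesky(np.linalg.inv(n * sigma_hat(beta)))
    C = rng.standard_normal((n, r))
    Sigma = np.linalg.inv(Psi @ (C.T @ C) @ Psi.T)
    # Step 2: beta | Sigma ~ N(beta_hat(Sigma), [Z'(Sigma^-1 x I_n)Z]^-1)
    bhat, V = gls(Sigma)
    beta = rng.multivariate_normal(bhat, V)
    if it >= 100:                             # 100-iteration burn-in (Percy [32])
        draws.append((beta, Sigma))
```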
To compute p(x), N vectors Y are generated for each x. Each simulated vector, Y^(s), is generated using

Y^(s) = [z_1(x)'β_1^(s), ..., z_r(x)'β_r^(s)]' + e^(s),   (8)

where (β^(s), Σ^(s)) is sampled using the Gibbs sampler and e^(s) is sampled from N(0, Σ^(s)), s = 1, ..., N. For each new x point the same N (β^(s), Σ^(s)) pairs are used. The Bayesian reliability, p(x), is approximated by

p̂(x) = (1/N) Σ_{s=1}^N I(Y^(s) ∈ A),   (9)

for large N, where I(·) is an indicator function.
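In code, the reuse of one posterior sample across every x point looks like the sketch below. The stored draws are synthetic stand-ins for Gibbs output, and the two linear predictors and region A are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in posterior draws (beta^(s), Sigma^(s)) for a 2-response model with
# z_i(x) = (1, x)' for both responses; all numbers are illustrative.
N = 2000
beta_draws = rng.multivariate_normal(
    [230.0, -20.0, 18.0, -1.5], np.diag([4.0, 1.0, 0.04, 0.01]), size=N)
Sigma_draws = np.array([[[20.0, 1.0], [1.0, 0.2]]] * N) \
    * rng.uniform(0.8, 1.2, size=(N, 1, 1))
L = np.linalg.cholesky(Sigma_draws)              # one factor per draw

def p_hat(x):
    """Equations (8)-(9): same N (beta, Sigma) pairs at every x point."""
    z = np.array([1.0, x])
    mu = np.column_stack([beta_draws[:, :2] @ z, beta_draws[:, 2:] @ z])
    e = np.einsum('nij,nj->ni', L, rng.standard_normal((N, 2)))
    Ys = mu + e                                  # Y^(s), s = 1, ..., N
    return np.mean((Ys[:, 0] <= 240.0) & (Ys[:, 1] <= 19.0))

grid = np.linspace(0.0, 1.0, 11)
p_vals = [p_hat(x) for x in grid]
x_best = grid[int(np.argmax(p_vals))]
```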
Percy [32] provides a similar, but three-step, Gibbs sampling procedure that generates a (β, Σ, Y) triplet for a given x value. However, this is not efficient for our purposes, as this Gibbs sampling procedure would have to be redone for many x points in order to optimize p(x). Percy also proposes a multivariate normal approximation to the posterior predictive distribution of Y given x. However, such an approximation may not be accurate for small sample sizes. This is because one would expect the true posterior predictive distribution of Y given x to have heavier tails than a normal distribution due to model parameter uncertainty; this is indeed the case with the SMR model.
3.2. The SUR Model with t-Distribution Error Terms
The multivariate t-distribution can be a useful generalization of the multivariate normal distribution for applied statistics (Liu and Rubin [25]). In particular, it can be a useful tool for modeling bell-shaped distributions that have heavier-than-normal "tails". Liu [26] illustrates the utility of using t-distribution errors for robust data analysis within the context of an SMR model. Rajagopal et al. [40] provide an example for the univariate regression case.
In this subsection, we show how to sample from the posterior predictive distribution of a SUR model with multivariate t-distribution errors. This will allow the experimenter to perform a sensitivity analysis with regard to a distribution that spans a continuum from Cauchy (df = 1) to normal (df = ∞). This can be useful for many typically used response surface designs that may not provide enough data to perform discriminating tests of normality. Our experience is that (if the mean response is in A) the Bayesian reliability p(x) gets smaller as the df get smaller, reflecting a more disperse posterior predictive distribution for Y at each x point. If p(x) is acceptably large for both small and large df, then our sensitivity analysis provides some confidence that we have a reliable process, provided that the residual distribution appears bell-shaped and we have found good regression models for each response type.
The SUR model with t-distribution errors has the same model form as in (1), but with the e_i's replaced by ε_i's, where the vector ε = (ε_1, ..., ε_r)' has a multivariate t-distribution with location parameter 0, scale (matrix) parameter Σ, and df parameter ν. The inverse of Σ is sometimes called the precision matrix. For details about the multivariate t-distribution see Kotz and Johnson [23]. Here, we are assuming that ν is known. Some authors recommend using ν = 4 df (which implies three finite moments) for the t-distribution for purposes of modeling heavy-tailed errors. See, for example, Lange et al. [24], Gelman et al. [12], and Congdon [5]. The same noninformative prior, proportional to |Σ|^{-(r+1)/2}, can be used for the t-distribution errors model. This prior is used in this article.
To do Gibbs sampling from a SUR model with t-distribution errors, first consider the following weighted SUR model

Y_ij = z_i(x_j)'β_i + e_ij / √(w_j),  i = 1, ..., r,  j = 1, ..., n,   (10)

where (10) is defined as in (1) but with an index j to represent the observation number and with e_i replaced by e_ij / √(w_j). Here, e_j = (e_1j, ..., e_rj)' ~ iid N(0, Σ), j = 1, ..., n. Conditional on w_1, ..., w_n, (10) is a weighted SUR model. Note that here the weight is the same for each vector of responses, Y_j, but different for each jth observation. If w_j = u_j / ν for j = 1, ..., n, where u_1, ..., u_n are iid chi-square with ν df, then (unconditional on the w_j's) the r × 1 error vectors, (e_1j / √(w_j), ..., e_rj / √(w_j))', j = 1, ..., n, are iid multivariate t with ν df, location parameter vector 0, and scale parameter matrix Σ (Congdon [5]). As with the normal errors SUR model, we use the noninformative prior which is proportional to |Σ|^{-(r+1)/2}.
In order to set up the Gibbs sampling, we need to define some estimator-like functions of the data and model parameters. First we define

β̂(Σ, W) = [Z'(Σ⁻¹ ⊗ W)Z]⁻¹ Z'(Σ⁻¹ ⊗ W)Y,

where W = diag(w_1, ..., w_n). In addition let

V(Σ, W) = [Z'(Σ⁻¹ ⊗ W)Z]⁻¹.

Finally, let

Σ̂(β, W) = (1/n) Σ_{i=1}^n w_i (y_i − Z_i β)(y_i − Z_i β)',

where y_i denotes the r × 1 vector of responses at the ith design point and Z_i the corresponding matrix of regressors.
The basic steps of the Gibbs sampling are as follows.

Step 0. Initialize the Gibbs sampling chain using (5) for β. For W = diag(w_1, ..., w_n), simulate the w_i's, where w_i = u_i / ν and the u_i's have independent chi-square distributions with ν df (i = 1, ..., n).

Step 1. Simulate Σ | β, W according to Σ⁻¹ | β, W ~ W(n, n⁻¹ Σ̂(β, W)⁻¹).

Step 2. Simulate β | Σ, W according to β | Σ, W ~ N(β̂(Σ, W), V(Σ, W)).

Step 3. Simulate W = diag(w_1, ..., w_n) (conditional on β and Σ) by simulating each w_i independently according to a gamma distribution, w_i ~ G(b, c_i), i = 1, ..., n, where G(b, c) denotes a gamma distribution with density function

g(w; b, c) = w^{b−1} e^{−w/c} / (Γ(b) c^b),  for w > 0.

Here, b = (ν + r)/2 and

c_i = [ν/2 + (1/2)(y_i − Z_i β)' Σ⁻¹ (y_i − Z_i β)]⁻¹.   (11)

Computing Y^(s) and p(x) follows as in (8) and (9). If so desired, it is clear from the above Gibbs sampling steps that one can use the same (conditionally conjugate) informative priors for β and Σ as for the normal errors SUR model.
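The scale-mixture identity behind (10) and the Step 3 weight update can be checked numerically. The Σ, ν, and residuals below are illustrative; the weights are drawn from a gamma distribution with shape b = (ν + r)/2 and per-observation scale c_i as in (11):

```python
import numpy as np

rng = np.random.default_rng(3)

nu, r, n = 4.0, 2, 200_000
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])   # assumed scale matrix

# e_j / sqrt(w_j), with w_j = u_j / nu and u_j ~ chi-square(nu), is
# multivariate t with nu df (the representation used in Section 3.2).
w = rng.chisquare(nu, size=n) / nu
e = rng.multivariate_normal(np.zeros(r), Sigma, size=n)
t_err = e / np.sqrt(w)[:, None]

# Step 3: given residuals (t_err stands in for y_i - Z_i beta) and Sigma,
# each weight is Gamma with shape b = (nu + r)/2 and scale c_i from (11).
Sinv = np.linalg.inv(Sigma)
quad = np.einsum('ni,ij,nj->n', t_err, Sinv, t_err)
b = (nu + r) / 2.0
c = 1.0 / (nu / 2.0 + 0.5 * quad)
w_new = rng.gamma(shape=b, scale=c)
```

The t errors visibly exceed the normal errors in the tails, which is exactly what the sensitivity analysis is probing.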
3.3. The Addition of Noise Variables

One advantage of this posterior predictive approach to multiresponse optimization is that it easily allows the experimenter to incorporate noise variables and thereby do robust-parameter-design process optimization. A noise variable is a factor that may be precisely controlled in a laboratory setting but not in actual production use. To see how noise variables can be incorporated, let x = (x_1, ..., x_h, x_{h+1}, ..., x_{h+k})', where x_{h+1}, ..., x_{h+k} are noise variables. Here, it is typically assumed that the x_{h+j} (j = 1, ..., k) are scaled such that they are iid N(0, 1). By simulating (x_{h+1}, ..., x_{h+k})' and substituting into the simulation for (8), p(x_1, ..., x_h) = Pr(Y ∈ A | x_1, ..., x_h, data) can be computed. Maximizing p(x_1, ..., x_h) provides a way to do robust process optimization. Details for the SMR case are discussed in Miró-Quesada et al. [29] and Rajagopal et al. [40]. Extension to the SUR case for normal or t-distributed errors is straightforward.
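A minimal sketch of this idea, with one controllable factor x1 and one noise variable x2 ~ N(0, 1) re-drawn inside the simulation, is shown below. The single-response predictive "model" is a hypothetical stand-in for the draw-by-draw SUR simulation of (8):

```python
import numpy as np

rng = np.random.default_rng(4)

N = 20_000

def p_robust(x1):
    """p(x1) = Pr(Y in A | x1, data), averaging over the noise variable x2."""
    x2 = rng.standard_normal(N)                   # noise-variable draws
    mu = 10.0 - 2.0 * (x1 - 0.3) ** 2 + 0.8 * x2  # assumed predictive mean
    y = mu + 0.5 * rng.standard_normal(N)         # predictive draws Y^(s)
    return np.mean(y > 9.0)                       # A = {y > 9}, illustrative

grid = np.linspace(-1.0, 1.0, 21)
p_vals = [p_robust(x1) for x1 in grid]
x_star = grid[int(np.argmax(p_vals))]             # robust optimum over x1
```

Because the noise variable is integrated out by simulation, the maximizer trades off a high mean against sensitivity to x2, which is the robust-parameter-design objective.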
4. Optimization of the Bayesian Reliability
If there are 2-3 controllable factors, then it is easy to maximize p(x) by gridding over the experimental region. For a larger number of controllable factors two other approaches are possible. One approach is to use a general optimization procedure such as can be found in Nelder and Mead [31], Price [37], or Chatterjee et al. [4]. Another approach is to create a closed-form approximate model for p(x) using logistic regression or some other regression procedure such as a generalized additive model (Wood [42]). By creating a coarse to moderately dense grid over the experimental region, logistic regression can be applied to the (I(Y^(s) ∈ A), x) data. For example, the grid can be an m^k factorial design with m = 5-10, say. Since we can simulate many (I(Y^(s) ∈ A), x) pairs for each of many x points, it should be possible to create a good approximate closed-form model, p̃(x), for p(x). One can then maximize p̃(x) using some suitable optimization procedure. See Peterson [33] for an example using the SMR model.
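The surrogate idea can be sketched end-to-end with a hand-rolled logistic regression (Newton-Raphson, no external libraries). The "true" conformance curve below is an assumed logistic function of (x, x²), used only to generate indicator data of the kind described above:

```python
import numpy as np

rng = np.random.default_rng(5)

grid = np.linspace(-1.0, 1.0, 9)
S = 200                                   # indicator draws per grid point
X_rows, y_list = [], []
for x in grid:
    p_true = 1.0 / (1.0 + np.exp(-(1.5 - 3.0 * x ** 2)))  # assumed truth
    X_rows += [[1.0, x, x ** 2]] * S
    y_list += list(rng.random(S) < p_true)
X = np.array(X_rows)
y = np.array(y_list, dtype=float)

beta = np.zeros(3)                        # Newton-Raphson for the logistic fit
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    Wd = mu * (1.0 - mu)
    beta += np.linalg.solve(X.T @ (X * Wd[:, None]), X.T @ (y - mu))

def p_tilde(x):
    """Closed-form surrogate p~(x) for p(x)."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x + beta[2] * x ** 2)))

# Maximize the fitted quadratic logit (clipped to the experimental region).
x_best = float(np.clip(-beta[1] / (2.0 * beta[2]), -1.0, 1.0))
```

With a quadratic logit the surrogate can be maximized in closed form; for richer models a numerical optimizer would take over at this last step.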
5. A Preposterior Analysis
As will be seen in the next section, it may happen that standard multiresponse optimization procedures indicate that satisfactory results for the mean response surfaces are possible while the associated Bayesian reliability, p(x), is not satisfactory. If this happens, it is because the posterior predictive distribution is too disperse, or possibly even oriented in a way that causes p(x) to be too small. One remedial possibility is to reduce the process variation or change the correlation structure in such a way as to increase p(x). However, this may not always be possible, and in some cases difficult or costly when possible. There is another approach which will increase p(x) to some degree, provided that the mean response surfaces provide satisfactory results. Some of the dispersion of the posterior predictive distribution is due to the uncertainty of the model parameters. This uncertainty can be reduced by increasing the sample size. Increasing the number of observations will not make p(x) go to one, due to the uncertainty of the natural process variation itself, but it may be useful to assess how much p(x) will increase as additional data are added.
5.1. A Preposterior Analysis with Normally Distributed Error Terms
One way to assess how additional data might affect the posterior predictive distribution is to impute new data in such a way as to predict the effect of having additional data using the information we currently have at hand. In this paper we take two different approaches to imputing additional data for the SUR model with normally distributed error terms. The first approach is based upon single imputation, where we impute an (imaginary) additional data set that has the property that it keeps β̂ and Σ̂(β) in (7) fixed.
The second approach is based upon (parametric bootstrap) multiple imputation. Let (β̃, Σ̃) be estimates of (β, Σ) (such as β* and Σ̂(β̂) in (5)). Using the model form in (2), we simulate n_a new response values, Y_i ~ N(Z_i β̃, Σ̃) (i = n+1, ..., n+n_a), and then generate N realizations of (β, Σ) conditional on the augmented data set, using the Gibbs sampling process. These realizations are then used to generate N realizations of a new response variable, Y, using (8). This whole process is repeated m times to get m estimates of Pr(Y ∈ A | x, data, Y_{n+1}, ..., Y_{n+n_a}, Z_{n+1}, ..., Z_{n+n_a}). These values are then averaged to get a final estimate of E{Pr(Y ∈ A | x, data, Y_{n+1}, ..., Y_{n+n_a}, Z_{n+1}, ..., Z_{n+n_a})}.
This preposterior estimate of Pr(Y ∈ A | x, data) is a (parametric) bootstrap estimate of an expected value; as such, it seems reasonable to use m equal to 200 (Efron and Tibshirani [9]). This multiple imputation approach, though more computationally intensive, can also be used to produce a histogram of simulated realizations from the random variable P = Pr(Y ∈ A | x, data, Y_{n+1}, ..., Y_{n+n_a}, Z_{n+1}, ..., Z_{n+n_a}).
Note that multiple imputation here is not done by simulating from the posterior predictive distribution, but instead from N(Z_i β̃, Σ̃) (i = n+1, ..., n+n_a), where β̃ and Σ̃ are point estimates of β and Σ.

5.2. A Preposterior Analysis with t-Distribution Error Terms

For the SUR model with t-distribution errors, the multiple imputation approach can be used in the same way, except that each imputed observation also requires a simulated weight w_i (i = n+1, ..., n+n_a), drawn as in Step 3 of section 3.2 with

c_i = [ν/2 + (1/2)(y_i − Z_i β̃)' Σ̃⁻¹ (y_i − Z_i β̃)]⁻¹  (i = n+1, ..., n+n_a).
Such modifications will give the experimenter an idea of how much the reliability can be expected to increase by reducing model uncertainty. For example, the experimenter can forecast the effects of replicating the experiment a certain number of times. This idea is similar in spirit to the notion of a "preposterior" analysis as described by Raiffa and Schlaifer [38].
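The mechanics of the parametric-bootstrap multiple imputation can be seen in a deliberately reduced setting: a single normal response with a flat prior, rather than the full SUR machinery. All data and sizes below are invented; the structure (impute n_a runs from point estimates, redo the posterior, average over m repetitions) mirrors section 5.1:

```python
import numpy as np

rng = np.random.default_rng(6)

y = rng.normal(232.0, 5.0, size=12)        # "current" experiment (invented)
mu_hat, s_hat = y.mean(), y.std(ddof=1)    # point estimates for imputation

def post_pred_prob(data, ndraw=4000):
    """Pr(Y_new <= 240 | data) under a flat prior, by posterior simulation."""
    n = len(data)
    m, s = data.mean(), data.std(ddof=1)
    sig = s * np.sqrt((n - 1) / rng.chisquare(n - 1, size=ndraw))
    mu = rng.normal(m, sig / np.sqrt(n))   # mu | sigma, data
    return np.mean(rng.normal(mu, sig) <= 240.0)

p_now = post_pred_prob(y)

# Impute n_a new runs from N(mu_hat, s_hat^2), redo the posterior, and
# average the resulting probabilities over m_rep bootstrap repetitions.
m_rep, n_a = 50, 12
p_future = np.mean([
    post_pred_prob(np.concatenate([y, rng.normal(mu_hat, s_hat, size=n_a)]))
    for _ in range(m_rep)
])
```

Comparing p_future with p_now shows how much of the predictive dispersion is attributable to parameter uncertainty rather than to the process variation itself.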
6. Examples
6.1. A Mixture Experiment
This example involves a mixture experiment to study the surfactants and emulsification variables involved in pseudolatex formation for controlled-release drug-containing beads (Frisbee and McGinity [11]). An extreme vertices design was used to study the influence of surfactant blends on the size of the particles in the pseudolatex and the glass transition temperature of films cast from those pseudolatexes. The factors chosen were: x1 = "% of Pluronic F68", x2 = "% of polyoxyethylene 40 monostearate", and x3 = "% of polyoxyethylene sorbitan fatty acid ester NF". The experimental design used was a modified McLean-Anderson design (McLean and Anderson [28]) with two centroid points, resulting in a sample size of eleven. The response variables measured were particle size and glass transition temperature, which are denoted here as Y1 and Y2, respectively. The goal of the study was to find values of x1, x2, and x3 to minimize as best as possible both Y1 and Y2. Here, we choose an upper bound for Y1 to be 240 and an upper bound for Y2 to be 19. Anderson and Whitcomb [2] also analyze this data set to illustrate Design Expert's capability to map out overlapping mean response surfaces.
Frisbee and McGinity [11] and Anderson and Whitcomb [2] use an SMR model with second-order terms to model the bivariate response data. For this example, however, a severe outlier run was deleted. The resulting regression models obtained were:

ŷ1 = 248 x1 + 272 x2 + 533 x3 − 485 x1 x3 − 424 x2 x3,   (13)

ŷ2 = 18.7 x1 + 14.1 x2 + 35.4 x3 − 36.7 x1 x3 + 18.0 x2 x3.   (14)

For this paper, several different mixture-experiment regression models were fit for each response type. For the Becker-type model (Becker [3]),

ŷ2 = 18.8 x1 + 15.6 x2 + 35.4 x3 − 3.59 min(x1, x2) − 17.7 min(x1, x3) + 10.0 min(x2, x3),   (15)

resulted in a mean squared error of 1.71, which is a 53% reduction over the quadratic model for Y2 in (14). The adjusted R² for the model in (15) is 96.4%. It turned out that the model forms in (13) and (15) gave the best overall fits to the data. As such, these two different (SUR) model forms were chosen to model the response surfaces. The Wilks-Shapiro test for normality of the residuals for each regression model yields p-values greater than 0.05. Tests for multivariate normality via skewness and kurtosis (Mardia [27]) were not significant at the 5% level, although such tests would not be very sensitive for the small sample size used in this example.
Figure 1 shows the points where the predicted mean responses associated with the
model in (13) are less than 240. Likewise, Figure 2 shows the points where the predicted
mean responses associated with the model in (15) are less than 19. Figure 3 shows the
points where the predicted mean responses associated with both models in (13) and (15) are
less than 240 and 19, respectively.
We define A = {(y1, y2): y1 ≤ 240, y2 ≤ 19} and p(x) = Pr(Y ∈ A | x, data). All probabilities are computed here using N = 1000 simulated Y values for each x point. For this example, ten independent Gibbs sampling chains were simulated for 1000 iterations following a burn-in of 100 iterations. Each chain was thinned to take only every tenth simulation. Here, N = 1000 was taken as a reasonable value (Gelman et al. [12]). For binomial probabilities, an N of 1000 produces a standard error of at most 0.0158 (assuming roughly independent posterior simulations). The Gelman-Rubin convergence statistics for all of the model parameters were very good (less than 1.1, as recommended by Gelman et al. [12]). Gridding over the design simplex was done using 32,761 grid points. Using the SUR models in (13) and (15) we obtain max p(x) = p(x*) = 0.622 at x* = (0.81, 0, 0.19)'. Clearly, this is no indication of a reliable process at x*. If the experimenter had instead used the classical SMR model (forms in (13) and (14)) to maximize p(x) over the design simplex, then he/she would obtain max p(x) = p(x*) = 0.863 at x* = (0.78, 0, 0.22)'. Hence the optimal p(x) for the SMR model is about a 39% increase in apparent process reliability, though still possibly unacceptable. The noticeable difference in probabilities is due to the fact that the (better fitting) model in (15), while having a smaller MSE, also has a larger mean predicted y2 value than the model in (14) when x1 is greater than 0.5. The probabilities p(x) for the SUR model were also computed assuming that the residual errors had a t-distribution with 4 df. In this case, max p(x) = p(x*) = 0.613 at x* = (0.83, 0, 0.17)'. (Gelman et al. [12] suggest a t-distribution with 4 df for doing a robust data analysis.)
Figure 1. The gray area is the part of the response surface associated with the model in (13) where the predicted mean response, ŷ1, is less than or equal to 240. (Ternary plot over the design simplex with vertices x1 = 1, x2 = 1, x3 = 1.)
Figure 2. The gray area is the part of the response surface associated with the model in (15) where the predicted mean response, ŷ2, is less than or equal to 19. (Ternary plot over the design simplex with vertices x1 = 1, x2 = 1, x3 = 1.)
Figure 3. The gray area is the part of the response surface associated with the models in (13) and (15) where both predicted mean responses satisfy ŷ1 ≤ 240 and ŷ2 ≤ 19. (Ternary plot over the design simplex with vertices x1 = 1, x2 = 1, x3 = 1.)
All of these models may indicate the need for remedial action. Such action could be of the form of reducing the process variability, decreasing the means, and/or removing uncertainty due to the unknown model parameter values. Since the first two actions may be difficult to achieve, we consider the effects of adding more replications to the experimental design by way of a preposterior analysis. To assess the effect of adding additional data, the preposterior analyses discussed in section 5 were performed. To keep the computations tractable, the same optimized x points associated with each model and the original data set were used. For each model, p(x) was computed using its own optimal x point: x* = (0.78, 0, 0.22)' for the SMR model (with normal errors), x* = (0.81, 0, 0.19)' for the SUR model (with normal errors), and x* = (0.83, 0, 0.17)' for the SUR model with t-distribution errors (with 4 df). For each model, the entire design matrix was replicated 2, 3, or 4 times. For the SMR model the degrees of freedom were adjusted accordingly. For the SUR model with normally distributed errors, both preposterior approaches (single and multiple imputation) discussed in section 5.1 were used. For the SUR model with t-distribution errors, the multiple imputation approach as discussed in section 5.2 was used.
Figure 4 shows the increase in p(x*) as the number of replications is increased from one to four for both the SMR and SUR models. Here, it is evident that the SMR model
might lead the experimenter to believe that reduction of model parameter uncertainty by
using three or four replications would provide sufficient evidence that the process has a
high rate of conformance with the specifications given by the set A. However, the better
fitting SUR models, using either normal or tdistribution errors, indicate that increasing the
number of experimental replications may not validate that the process has a high rate of
conformance even with four replications. Instead, the SUR models are indicating that the
experimenter must improve the process means and/or variances to obtain conformance
with higher probability. The single imputation and (bootstrap) multiple imputation results
for the SUR model with normal errors are reasonably close but further research needs to be
done on how these two preposterior approaches compare. Nonetheless, this shows the
importance of improved modeling which can be achieved by generalizing from the SMR
model to the more flexible SUR model.
Figure 4. Probabilities of conformance for increasing numbers of design replications (series: SMR normal errors; SUR normal errors; SUR normal errors with bootstrap; SUR t-dist. errors with bootstrap; vertical axis from 0.55 to 1.00). Reps 2-4 are the preposterior probability estimates. (Bootstrapping is not applicable for Rep = 1.) For each model, p(x) was computed using its own optimal x point from Rep = 1.
6.2. Optimization of an HPLC Assay
This example illustrates the optimization of an event probability
for a high performance liquid chromatography (HPLC) assay as originally discussed in
Peterson [33]. Here there are three factors ( = percent of isopropyl alcohol (pipa), =
temperature (temp), and = pH) and four responses ( = resolution (rs), = run time,
= signaltonoise ratio (s/n), = tailing). For this assay, the chemist desires to have the
event,
e Pr(  , data) A Y x
2
x
2
y
1
x
3
x
1
y
3
y
4
y
= > s > s s
1 2 3 4
{ : 1.8, 15, 300, 0.75 0.85}, A y y y y y (16)
occur with high probability. As such it is desired to maximize = e ( ) Pr(  , data) p A x Y x as
a function of . x
A BoxBehnken experimental design was run, with three center points, to gather data
to fit four quadratic response surfaces. For the SMR model, full secondorder quadratic
regression forms were used for each response.
All of the response surface models fit well, with all R² values above 99%. As in example 1, the Wilks-Shapiro test for normality of the residuals for each regression model yields p-values greater than 0.05, and the Mardia tests for multivariate skewness and kurtosis were not significant at the 5% level. The factor levels were coded so that all values were between -1 and +1, with the center of the experimental region at the origin.

Some of the factor terms for the second-order response surface models were not statistically significant, so a SUR model was created from an SMR model by removing some of the non-significant terms, while still preserving model-term hierarchy. Using the STEPWISE option in SAS PROC REG, the four regression models obtained for the SUR model analysis were:

y1 = β0^(1) + β1^(1) x1 + β2^(1) x2 + β11^(1) x1² + β22^(1) x2² + β12^(1) x1 x2 + e1,
y2 = β0^(2) + β1^(2) x1 + β2^(2) x2 + β3^(2) x3 + β11^(2) x1² + β22^(2) x2² + β12^(2) x1 x2 + e2,
y3 = β0^(3) + β1^(3) x1 + β2^(3) x2 + β3^(3) x3 + β33^(3) x3² + β12^(3) x1 x2 + e3,
y4 = β0^(4) + β1^(4) x1 + β2^(4) x2 + β11^(4) x1² + β22^(4) x2² + e4.   (17)
For comparison purposes, a sensitivity analysis involving three models was performed. The three models were:

Model 1: An SMR model using a full second-order polynomial with normally distributed errors.
Model 2: A SUR model as shown in (17) above with normally distributed errors.
Model 3: A SUR model as shown in (17) above with errors having a t-distribution with 4 df.
For models 1-3, 10,000 Monte Carlo simulations were done, as it appears that the true underlying Bayesian probabilities were extreme (close to 1). One hundred burn-in simulations were done to get each independent simulated value. Gridding steps of 0.1 were used across the coded design space. For the SMR model (Model 1), the maximum p(x) value is p(x*) = 0.964, where x* = (73.5, 43, 0.1)'. However, for the SUR model with normal errors (Model 2), the maximum p(x) value is p(x*) = 1, where x* = (73.5, 43, 0.1)' (although a neighborhood containing x* also had values of p(x) = 1). Replacing the normal errors assumption in Model 2 with t-distribution errors (with 4 df) (Model 3) produced a maximum p(x) value of p(x*) = 0.978, where x* = (74.5, 44.9, 0.06)'.

It is interesting to note that the p(x*) values for the SUR models are larger than for the SMR model. In this example, Model 2 is simply a special case of Model 1 where the nonsignificant regression terms are removed. Apparently, this removal of nonsignificant terms for Model 2 tightens up the posterior predictive distribution enough to increase the optimal p(x) value over that of Model 1. Even the optimal p(x) value for Model 3 is slightly larger than that for Model 1, despite the use of a residual error t-distribution with 4 df. For this example, the sensitivity analysis tells us that for all three models the worst-case probability is 0.964. If this smallest reliability estimate is adequate, then we need not do a preposterior analysis to check the effects of gathering additional data.
7. Summary
The SMR model (with normal errors) has a closed-form posterior predictive distribution
allowing quick and easy computation of p(x) = Pr(Y ∈ A | x, data) or other posterior
predictive metrics as shown in Peterson [33]. However, in some cases the use of the more
general SUR model will be preferable. One such case was shown in the first example
(section 6.1), where the fit of one of the response types was greatly improved by a change in
the basic model form. For the second example (section 6.2), all of the individual regression
models had some terms that were not statistically significant. The larger posterior
probability of conformance value for the SUR model over the SMR model indicates that
some further efficiency can be obtained by removing terms in some of the models that do
not appear predictive.
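For the closed-form SMR case, the posterior predictive distribution at a point x is multivariate t, so p(x) can be estimated by direct simulation using the standard normal/chi-square mixture construction. The degrees of freedom, center, scale matrix, and region A below are hypothetical illustrations, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative multivariate t predictive parameters (placeholders).
df = 12
center = np.array([1.5, 2.0])
scale = np.array([[0.05, 0.01], [0.01, 0.08]])

def draw_mvt(center, scale, df, n):
    """Draw from a multivariate t: a multivariate normal divided by the
    square root of an independent chi-square(df)/df mixing variable."""
    z = rng.multivariate_normal(np.zeros(len(center)), scale, size=n)
    w = rng.chisquare(df, size=n) / df
    return center + z / np.sqrt(w)[:, None]

# Estimate p(x) = Pr(Y in A | x, data) for a rectangular conformance region A.
y = draw_mvt(center, scale, df, 10_000)
lower, upper = np.array([1.0, 1.4]), np.array([2.0, 2.6])
p = np.mean(np.all((y >= lower) & (y <= upper), axis=1))
```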
The preposterior analysis discussed in section 5 allows the investigator to assess the
effect of model parameter uncertainty on the posterior predictive probability of conformance.
If the process means are all in conformance with process specifications, then an increase in
data will result in some increase in posterior predictive probability of conformance. If this
predicted increase is satisfactory, then the experimenter may want to gather more data to
confirm this. If this predicted increase is not satisfactory, then the experimenter may wish
to take a different action and consider the possibility of process modification to improve
response means and/or variances. At this point, it is not clear in general how the single and
multiple imputation preposterior analyses compare to each other. Further research is needed
to investigate the properties of preposterior analyses for response surface optimization.
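A minimal single-imputation preposterior sketch is given below for one response under an assumed straight-line model with a noninformative prior (the design, data, and specification limits are hypothetical, not from the paper): future runs are imputed at the current fitted values, the model is refit on the augmented data, and the predictive probability of conformance is recomputed, so that the change in p(x) reflects only the reduction in parameter uncertainty.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design (intercept + one coded factor, duplicated) and data.
X = np.array([[1, -1], [1, 0], [1, 1], [1, -1], [1, 0], [1, 1]], float)
y = np.array([0.9, 2.0, 3.1, 1.1, 1.9, 2.9])

def p_at(X, y, x0=np.array([1.0, 0.5]), lo=2.0, hi=3.0, n=20_000):
    """Pr(future y at x0 falls in [lo, hi]) under the normal-theory
    (Student-t) posterior predictive distribution."""
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    s2 = float(res[0]) / dof
    var = s2 * (1.0 + x0 @ np.linalg.inv(X.T @ X) @ x0)
    draws = beta @ x0 + np.sqrt(var) * rng.standard_t(dof, size=n)
    return np.mean((draws >= lo) & (draws <= hi))

p_now = p_at(X, y)

# Single imputation: imagine repeating the whole design, with the future
# responses imputed (noise-free) at the current fitted values, then refit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
X_new = np.vstack([X, X])
y_new = np.concatenate([y, X @ beta])
p_future = p_at(X_new, y_new)
```

Comparing `p_future` with `p_now` shows how much of the shortfall in the probability of conformance is attributable to parameter uncertainty rather than to intrinsic process variation.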
Useful modifications of the SUR model are possible with the addition of noise variables
and a t-distribution model for the residual errors. Further research in this area, to make the
variance-covariance matrix a function of the controllable factors, may also prove helpful to
experimenters.
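One reason the t-distribution extension remains computationally tractable is its scale-mixture-of-normals representation, e | w ~ N(0, Σ/w) with w ~ chi-square(ν)/ν, which makes Gibbs sampling for the t-error SUR model a small extension of the normal-error sampler. The sketch below, with an illustrative Σ and ν = 4 (not estimates from the paper), shows the construction and its heavier tails.

```python
import numpy as np

rng = np.random.default_rng(3)

nu = 4                                       # degrees of freedom, as in Model 3
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])   # illustrative error covariance

def draw_t_errors(n):
    """Multivariate t(nu) errors as a scale mixture of normals."""
    w = rng.chisquare(nu, size=n) / nu                       # mixing weights
    z = rng.multivariate_normal(np.zeros(2), Sigma, size=n)  # normal component
    return z / np.sqrt(w)[:, None]                           # heavy-tailed errors

e = draw_t_errors(50_000)

# Compare tail mass with matching normal draws: the t(4) errors place
# noticeably more probability beyond 3 standard deviations.
normal_e = rng.multivariate_normal(np.zeros(2), Sigma, size=50_000)
tail_t = np.mean(np.abs(e[:, 0]) > 3.0)
tail_n = np.mean(np.abs(normal_e[:, 0]) > 3.0)
```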
Acknowledgements
We would like to thank Joseph Schaffer for a helpful discussion on the imputation
aspects of this work as related to the preposterior analysis.
References
1. Ames, A. E., Mattucci, N., MacDonald, S., Szonyi, G. and Hawkins, D. M. (1997). Quality loss functions for optimization across multiple response surfaces. Journal of Quality Technology, 29, 339-346.
2. Anderson, M. J. and Whitcomb, P. J. (1998). Find the most favorable formulations. Chemical Engineering Progress, April, 63-67.
3. Becker, N. G. (1968). Models for the response of a mixture. Journal of the Royal Statistical Society, Series B, 30, 349-358.
4. Chatterjee, S., Laudato, M. and Lynch, L. A. (1996). Genetic algorithms and their statistical applications: an introduction. Computational Statistics and Data Analysis, 22, 633-651.
5. Congdon, P. (2006). Bayesian Statistical Modeling, 2nd edition. John Wiley and Sons Ltd., Chichester.
6. del Castillo, E., Montgomery, D. C. and McCarville, D. R. (1996). Modified desirability functions for multiple response optimization. Journal of Quality Technology, 28, 337-345.
7. Derringer, G. and Suich, R. (1980). Simultaneous optimization of several response variables. Journal of Quality Technology, 12, 214-219.
8. Derringer, G. (1994). A balancing act: optimizing a product's properties. Quality Progress, June, 51-58.
9. Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman and Hall/CRC, Boca Raton.
10. Food and Drug Administration (2006). Guidance for Industry: Q8 Pharmaceutical Development. U.S. Department of Health and Human Services, CDER, CBER, USA.
11. Frisbee, S. E. and McGinity, J. W. (1994). Influence of nonionic surfactants on the physical and chemical properties of a biodegradable pseudolatex. European Journal of Pharmaceutics and Biopharmaceutics, 40, 355-363.
12. Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis, 2nd edition. Chapman and Hall/CRC, Boca Raton.
13. Genz, A. and Bretz, F. (2002). Methods for the computation of multivariate t probabilities. Journal of Computational and Graphical Statistics, 11, 950-971.
14. Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. John Wiley and Sons, Inc., Hoboken, NJ.
15. Griffiths, W. (2003). Bayesian inference in the seemingly unrelated regressions model. In Computer-Aided Econometrics, ed. D. E. A. Giles, Marcel Dekker, New York, 263-290.
16. Harrington, E. C. (1965). The desirability function. Industrial Quality Control, 21, 494-498.
17. Hunter, J. S. (1999). Discussion of "Response surface methodology: current status and future directions". Journal of Quality Technology, 31, 54-57.
18. Johnson, M. E. (1987). Multivariate Statistical Simulation. John Wiley, New York.
19. Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis, 5th edition. Prentice Hall, Englewood Cliffs.
20. Khuri, A. I. and Conlon, M. (1981). Simultaneous optimization of multiple responses represented by polynomial regression functions. Technometrics, 23, 363-375.
21. Kim, K. and Lin, D. K. J. (2000). Simultaneous optimization of mechanical properties of steel by maximizing exponential desirability functions. Journal of the Royal Statistical Society, Series C, 49, 311-325.
22. Ko, Y. H., Kim, K. J. and Jun, C. H. (2005). A new loss function-based method for multiresponse optimization. Journal of Quality Technology, 37, 50-59.
23. Kotz, S. and Johnson, R. (1985). Encyclopedia of Statistical Sciences, 6, 129-130.
24. Lange, K., Little, R. and Taylor, J. (1989). Robust statistical modeling using the t-distribution. Journal of the American Statistical Association, 84, 881-896.
25. Liu, C. and Rubin, D. B. (1995). ML estimation of the t-distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 5, 19-39.
26. Liu, C. (1996). Bayesian robust multivariate linear regression with incomplete data. Journal of the American Statistical Association, 91, 1219-1227.
27. Mardia, K. V. (1974). Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies. Sankhya B, 36, 115-128.
28. McLean, R. A. and Anderson, V. L. (1966). Extreme vertices design of mixture experiments. Technometrics, 8, 447-454.
29. Miró-Quesada, G., del Castillo, E. and Peterson, J. J. (2004). A Bayesian approach for multiple response surface optimization in the presence of noise variables. Journal of Applied Statistics, 31, 251-270.
30. Montgomery, D. C. and Bettencourt, V. M. (1977). Multiple response surface methods in computer simulation. Simulation, 29, 113-121.
31. Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7, 308-313.
32. Percy, D. F. (1992). Prediction for seemingly unrelated regressions. Journal of the Royal Statistical Society, Series B, 54, 243-252.
33. Peterson, J. J. (2004). A posterior predictive approach to multiple response surface optimization. Journal of Quality Technology, 36, 139-153.
34. Peterson, J. J. (2008). A Bayesian approach to the ICH Q8 definition of design space. Journal of Biopharmaceutical Statistics, 18, 958-974.
35. Pignatiello, J. J., Jr. (1993). Strategies for robust multiresponse quality engineering. IIE Transactions, 25, 5-15.
36. Press, S. J. (2003). Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, 2nd edition. John Wiley, New York.
37. Price, W. L. (1977). A controlled random search procedure for global optimization. The Computer Journal, 20, 367-370.
38. Raiffa, H. and Schlaifer, R. (2000). Applied Statistical Decision Theory. John Wiley, New York.
39. Srivastava, V. K. and Giles, D. E. A. (1987). Seemingly Unrelated Regression Equations Models. Marcel Dekker, New York.
40. Rajagopal, R., del Castillo, E. and Peterson, J. J. (2005). Model and distribution robust process optimization with noise factors. Journal of Quality Technology, 37, 210-222. (Corrigendum: 38, p. 83.)
41. Vining, G. G. (1998). A compromise approach to multiresponse optimization. Journal of Quality Technology, 30, 309-313.
42. Wood, S. N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, Boca Raton, FL.
43. Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57, 348-368.
44. Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. John Wiley, New York.
Authors' Biographies:
John J. Peterson is a Senior Director in the Research Statistics Unit of GlaxoSmithKline
Pharmaceuticals. He received his B.S. in Applied Mathematics and in Computer Science
(double major) from the State University of New York at Stony Brook and his Ph.D. in
Statistics from The Pennsylvania State University. Dr. Peterson has over 20 years' experience
as a statistician in the pharmaceutical industry. His current research area is in response
surface methodology as applied to pharmaceutical industry problems, including applications
to "chemistry, manufacturing, and control" (CMC) and combination drug studies. Dr.
Peterson is a Fellow of the American Statistical Association and a Senior Member of the
American Society for Quality. He is also on the editorial boards of the Journal of Quality
Technology and the journal Applied Stochastic Models in Business and Industry.
Guillermo Miró-Quesada is a Senior Research Scientist in the Bioprocess Research and
Development department at Eli Lilly and Co. He received a Ph.D. in Industrial Engineering
and Operations Research from The Pennsylvania State University. He has worked in the
Biotech division of Eli Lilly and Co. since 2003, where he has supported the development
of manufacturing processes for active pharmaceutical ingredients. He is involved in
activities related to integrating Quality by Design principles into the drug development plan
and assessing the capability of manufacturing processes in development.
Enrique del Castillo is a Distinguished Professor of Engineering in the Department of
Industrial & Manufacturing Engineering at The Pennsylvania State University. He also
holds an appointment as Professor of Statistics at PSU and directs the Engineering
Statistics Laboratory. Dr. del Castillo's research interests include Engineering Statistics, with
particular emphasis on Response Surface Methodology and Time Series Control. An
author of over 80 refereed journal papers, he is the author of the textbooks Process
Optimization: A Statistical Approach (Springer, 2007) and Statistical Process Adjustment for Quality
Control (Wiley, 2002), and co-editor (with B. M. Colosimo) of the book Bayesian Process
Monitoring, Control, and Optimization (CRC, 2006). He is currently (2006-2009) editor-in-chief
of the Journal of Quality Technology.