
Quality Technology & Quantitative Management
Vol. 6, No. 4, pp. 353-369, 2009
ICAQM 2009
A Bayesian Reliability Approach to
Multiple Response Optimization with
Seemingly Unrelated Regression Models

John J. Peterson (1), Guillermo Miró-Quesada (2) and Enrique del Castillo (3)

(1) Research Statistics Unit, GlaxoSmithKline Pharmaceuticals, King of Prussia, PA, USA
(2) Bioprocess Research and Development, Lilly Technical Center-North, Indianapolis, IN, USA
(3) Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, PA, USA

(Received September 2006, accepted April 2008)
______________________________________________________________________
Abstract: This paper presents a Bayesian predictive approach to multiresponse optimization
experiments. It generalizes the work of Peterson [33] in two ways that make it more flexible for use
in applications. First, a multivariate posterior predictive distribution of seemingly unrelated regression
models is used to determine optimum factor levels by assessing the reliability of a desired multivariate
response. It is shown that it is possible for optimal mean response surfaces to appear satisfactory yet
be associated with unsatisfactory overall process reliabilities. Second, the use of a multivariate normal
distribution for the vector of regression error terms is generalized to that of the (heavier tailed)
multivariate t-distribution. This provides a Bayesian sensitivity analysis with regard to moderate
outliers. The effect of adding design points is also considered through a preposterior analysis. The
advantages of this approach are illustrated with two real examples.
Keywords: Design space, desirability function, Gibbs sampling, multivariate t-distribution, posterior
predictive distribution, robust parameter design, robust regression.
______________________________________________________________________
1. Introduction
Statistically designed experiments and associated response surface methods are considered
effective methods for optimizing products and processes. Much has been written about
experiments involving a single response, but less has been written about multiple response
experiments, although they are quite prevalent. Popular statistical packages such as Design
Expert® and JMP® allow experimenters to analyze multiple response experiments by
providing procedures based upon "overlapping mean responses" or "desirability functions"
of mean responses.
of mean responses. The overlapping mean response approach provides an overlay plot of
the mean response surfaces to see if there is a configuration of factor levels that
simultaneously satisfies the conformance criteria of the experimenter. A listing of articles
providing examples or discussion of this approach can be found in Montgomery and
Bettencourt [30]. Harrington [16] first proposed an approach based upon a notion of a
desirability function. The idea here is that for each response type, a function, $d_i(\hat{y})$,
taking values on [0, 1] is created to express the desirability of the mean response as a
function of the factor levels. (Here, $\hat{y}$ is an $r \times 1$ vector of estimated mean
responses, and $d_i(\hat{y})$ values closer to 1 are more desirable.) The overall desirability,
$D(\hat{y})$, is a geometric mean of the individual $d_i(\hat{y})$'s. Later on, Derringer and
Suich [7], del Castillo et al. [6], and Kim and Lin [21] proposed modifications of
Harrington's desirability function.
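As a concrete illustration, the geometric-mean construction above can be sketched as follows. This is a minimal sketch with hypothetical smaller-the-better desirabilities in the Derringer-Suich style; the function names, targets, bounds, and weights are illustrative, not taken from the paper:

```python
# Hypothetical smaller-the-better desirability in the Derringer-Suich style.
def d_smaller_is_better(y, target, upper, weight=1.0):
    """d(y) = 1 at or below the target, 0 at or above the upper bound,
    with a power-law ramp in between."""
    if y <= target:
        return 1.0
    if y >= upper:
        return 0.0
    return ((upper - y) / (upper - target)) ** weight

def overall_desirability(ds):
    """Overall desirability D: the geometric mean of the individual d_i's."""
    prod = 1.0
    for d in ds:
        prod *= d
    return prod ** (1.0 / len(ds))

# Two illustrative responses (numbers arbitrary):
d1 = d_smaller_is_better(200.0, target=180.0, upper=240.0)  # (240-200)/(240-180) = 2/3
d2 = d_smaller_is_better(17.0, target=15.0, upper=19.0)     # (19-17)/(19-15) = 1/2
D = overall_desirability([d1, d2])                          # sqrt(1/3)
```

Note that the geometric mean forces $D = 0$ whenever any single response is completely undesirable, which is the usual rationale for this choice over an arithmetic mean.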
Another type of approach, based upon a quadratic loss, $Q(\hat{y})$, about a set of target
values, has been discussed by Khuri and Conlon [20], Pignatiello [35], Ames et al. [1],
Vining [41], and Ko et al. [22]. Some of these approaches try to model the joint predictive
distribution of $y$ but do not capture the uncertainty due to the estimated variance-
covariance matrix parameters. Furthermore, for the quadratic loss function and desirability
function approaches (except perhaps the one due to Harrington) it may be difficult to assign
values of the function to a scale that can be converted into "poor", "good", "excellent", etc.
Here, a panel of experts may be required (Derringer [8]) to obtain an informative "univariate
response index" (Hunter [17]) for a multiresponse optimization problem.
The overlapping mean response, desirability function, and quadratic loss function
approaches have the drawback that they do not completely characterize the uncertainty
associated with future multivariate responses and their associated optimization measures.
The danger of this is that an experimenter may use one of these methods to get an optimal
factor configuration, validate it with two or three successful runs, and then begin production.
For example, suppose that the probability that a future multivariate response is satisfactory
is only 0.7. Even so, the chance of getting three successful, independent validation runs is
0.343, which can easily happen. Hunter [17] states that the variance of univariate response
indices for multiresponse optimization "can be disturbing" and further study is needed to
assess the influence of parameter uncertainty.
For the optimal conditions obtained by these approaches, Peterson [33] used a real-
data example to show that the probability (i.e. reliability) of a good multivariate response,
as measured by these optimization criteria, can be unacceptably low. Furthermore, he
showed that ignoring the model parameter uncertainty can lead to reliability estimates that
are too large. A practical drawback of the methodology in Peterson [33] is that the regression
models were limited to the standard (normal theory) multivariate regression (SMR) model
(cf., Johnson and Wichern [19]) having the same covariate structure across response types.
In this paper we generalize the applicability of the Bayesian methodology in Peterson
[33] to make it more widely useful for addressing commonly occurring multivariate response
surface problems. This is done by introducing a method for utilizing seemingly unrelated
regression (SUR) models (Zellner [44]) where each response-type has its own covariate
structure. In addition to the multivariate normal distribution assumption for the vector of
regression errors, we provide a further modification of this approach to handle
regression-error vectors that have a multivariate t-distribution. The t-distribution modeling
is useful for many typical response surface experiments from a Bayesian sensitivity analysis
perspective. Many response surface designs have sample sizes sufficiently small as to make
it difficult to assess normality of the residuals. Obvious outliers can be removed, but moderate
outliers may be somewhat confounded with factor effects. The t-distribution allows the
experimenter to vary the thickness of the outer tails of the distribution of the residual
errors by varying the associated degrees of freedom parameter. These two modifications of
Peterson [33] provide flexibility that may be needed for typical response surface experiments.
In this paper, we re-analyze the two examples found in Peterson [33] using the above
two generalizations. In the first example, a mixture experiment, we show that use of SUR
modeling can make a noticeable difference. The second example provides an illustration
that the optimal conditions for the process under study provide a high likelihood of meeting
specifications whether we use a normal distribution or a t-distribution with heavier
tails. This distribution sensitivity analysis provides some assurance that extensive validation
efforts may not be needed before implementing this optimized process.
For these examples, process optimality is measured by $p(x) = \Pr(Y \in A \mid x, \text{data})$,
where $Y$ is an $r \times 1$ vector of responses, $x$ is a $k \times 1$ vector of process factors, and $A$ is a
specification set that describes desirable or acceptable values for $Y$. The approach used in
this paper models the posterior predictive distribution of $Y$ given $x$ to find a value of $x$
that maximizes $p(x)$. Here, the probability of response conformance, or reliability, $p(x)$,
is easy to interpret. However, other process reliabilities can be constructed if one wishes to
employ a desirability function, $D(Y)$, or quadratic loss function, $Q(Y)$. For example,
using the posterior predictive distribution one can compute $p(x) = \Pr(D(Y) > D^* \mid x, \text{data})$
or $p(x) = \Pr(Q(Y) \le Q^* \mid x, \text{data})$ if informative values of $D^*$ or $Q^*$ are available. An
illustration is given in Peterson [33]. The predictive nature of this approach also easily
allows for the incorporation of noise variables to help the experimenter create a robust
process. See, for example, Miró-Quesada et al. [29] and Rajagopal et al. [40].
An important application of $p(x)$ is the set of all points $x$
such that $p(x)$ is at least some prespecified reliability value. An example of such a set
appears in Peterson [33]. This type of set has applications for process capability, and in fact
has been proposed by Peterson [34] for construction of a "design space", a pharmaceutical
manufacturing process capability region described in the FDA document "Guidance for
Industry Q8 Pharmaceutical Development" [10] available at http://www.fda.gov/cber/
gdlns/ichq8pharm.htm.
2. The Statistical Model for a Bayesian Reliability
To compute $p(x) = \Pr(Y \in A \mid x, \text{data})$ for multiresponse process optimization, we
need to obtain the posterior predictive distribution for $Y$ given $x$. The regression model
considered here is the one that allows the experimenter to use a different (parametrically)
linear model for each response type. This will allow for more flexible and accurate modeling
of $Y$ than one would obtain with the SMR model.
Here, $Y = (Y_1, \ldots, Y_r)'$ is a vector of response-types and $x$ is a $k \times 1$ vector of factors
that influence $Y$ by way of the functions

$$Y_i = z_i(x)'\beta_i + e_i, \quad i = 1, \ldots, r, \qquad (1)$$

where $\beta_i$ is a $p_i \times 1$ vector of regression model parameters and $z_i(x)$ is a $p_i \times 1$ vector of
covariates which are arbitrary functions of $x$. Furthermore, $e = (e_1, \ldots, e_r)'$ is a random
variable with a multivariate normal distribution having mean vector 0 and variance-
covariance matrix, $\Sigma$. The model in (1) has been referred to as the "seemingly unrelated
regressions" (SUR) model (Zellner [43]). When $z_1(x) = \cdots = z_r(x)$, we obtain the
SMR model.
In order to model all of the data and obtain a convenient form for estimating the
regression parameters, consider the following vector-matrix form,

$$\tilde{Y} = Z\beta + e, \qquad (2)$$

where $\tilde{Y} = [Y_1', \ldots, Y_r']'$, $\beta = [\beta_1', \ldots, \beta_r']'$, $e = [e_1', \ldots, e_r']'$, and $Z$ is an $m \times p$ block diagonal
matrix of the form $\mathrm{diag}(Z_1, \ldots, Z_r)$, with $p = p_1 + \cdots + p_r$. Here,
$Y_i = (Y_{i1}, \ldots, Y_{in})'$, $e_i = (e_{i1}, \ldots, e_{in})'$, and $Z_i = [z_i(x_1), \ldots, z_i(x_n)]'$
for $i = 1, \ldots, r$.
For the SMR model under a noninformative prior (which is proportional to
$|\Sigma|^{-(r+1)/2}$), the posterior predictive density function of $Y$ has the multivariate
t-distribution form. See, for example, Press [36]. Simulation of a multivariate t-distribution
r.v. with $v$ df can be done simply by simulating a multivariate normal r.v. and an
independent chi-square r.v. with $v$ df (Johnson [18]). For the SMR model then,
$p(x) = \Pr(Y \in A \mid x, \text{data})$ can be computed directly for each $x$ by Monte Carlo
simulations. This was done by Peterson [33] as a way to do multiresponse surface
optimization by maximizing $p(x)$ over the experimental region. Miró-Quesada et al. [29]
extended these results for the SMR case to include noise variables. Multivariate
t-distribution probabilities over hyper-rectangles can also be computed
efficiently by numerical integration (Genz and Bretz [13]).
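The normal/chi-square construction cited above can be sketched as follows. This is a minimal illustration assuming NumPy; the helper name `rmvt` and the bivariate location and scale values are illustrative, not from the paper:

```python
import numpy as np

def rmvt(loc, scale, df, size, rng):
    """Hypothetical helper: draw multivariate t r.v.'s by dividing a
    zero-mean multivariate normal draw by sqrt(chi-square/df), with the
    chi-square draw independent of the normal draw."""
    z = rng.multivariate_normal(np.zeros(len(loc)), scale, size=size)
    u = rng.chisquare(df, size=size)
    return loc + z / np.sqrt(u / df)[:, None]

rng = np.random.default_rng(0)
# Illustrative location/scale for a bivariate response (numbers arbitrary).
draws = rmvt(np.array([240.0, 19.0]),
             np.array([[25.0, 3.0], [3.0, 4.0]]), df=5, size=10_000, rng=rng)
```

As the df grows the chi-square factor concentrates near 1 and the draws approach the multivariate normal; small df produce the heavier tails used in the sensitivity analysis of Section 3.2.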
For the SUR model, no closed-form density or sampling procedure exists. However,
using Gibbs sampling (Griffiths [15]) it is easy to generate random pairs of SUR model
parameters from the posterior distribution of $(\beta, \Sigma)$. Using the SUR model in (1) it is then
straightforward to simulate r.v.'s from the posterior predictive distribution of $Y$ given $x$.
3. Computing the Bayesian Reliability
3.1. The SUR Model with Normally Distributed Error Terms
Before describing the Bayesian analysis, it is convenient to discuss some (conditional)
maximum likelihood estimates for the SUR model. For a given $\Sigma$, the maximum likelihood
estimate (MLE) of $\beta$ can then be expressed as

$$\hat{\beta} = [Z'(\Sigma^{-1} \otimes I_n)Z]^{-1} Z'(\Sigma^{-1} \otimes I_n)\tilde{Y}, \qquad (3)$$

where $I_n$ is the $n \times n$ identity matrix and $\otimes$ is the Kronecker direct product operator.
The variance-covariance matrix of $\hat{\beta}$ is $\mathrm{Var}(\hat{\beta}) = [Z'(\Sigma^{-1} \otimes I_n)Z]^{-1}$.
For a given $\beta$, the variance-covariance matrix, $\Sigma$, can be estimated by

$$\hat{\Sigma}(\beta) = \frac{1}{n} \sum_{j=1}^{n} e_j(\beta)\, e_j(\beta)', \qquad (4)$$

where $e_j(\beta) = (e_{1j}(\beta_1), \ldots, e_{rj}(\beta_r))'$ and $e_{ij}(\beta_i) = y_{ij} - z_i(x_j)'\beta_i$,
$i = 1, \ldots, r$. Let $\tilde{\beta}_i$ be the maximum likelihood estimator of $\beta_i$ for each
response-type independently of the other responses, and define
$\tilde{\beta} = (\tilde{\beta}_1', \ldots, \tilde{\beta}_r')'$. The estimator of $\beta$,

$$\beta^* = [Z'(\hat{\Sigma}(\tilde{\beta})^{-1} \otimes I_n)Z]^{-1} Z'(\hat{\Sigma}(\tilde{\beta})^{-1} \otimes I_n)\tilde{Y}, \qquad (5)$$

is called the two-stage Aitken estimator (Zellner [43]).
In order to compute and maximize $p(x) = \Pr(Y \in A \mid x, \text{data})$ over the experimental
region, it is important to have a relatively efficient method for approximating $p(x)$ by
Monte Carlo simulations. The approach taken in this paper is to simulate a large number of
r.v.'s from the posterior distribution of $(\beta, \Sigma)$, and use each $(\beta, \Sigma)$ value to generate a
$Y$ r.v. for each $x$. In this way, the sample of $(\beta, \Sigma)$ values can be re-used for simulating
$Y$ values at each $x$-point, instead of having to do the Gibbs sampling all over again for each
$x$-point.
Consider the noninformative prior for $(\beta, \Sigma)$, which is proportional to
$|\Sigma|^{-(r+1)/2}$ (Percy [32] and Griffiths [15]). Note that the posterior distribution of $\beta$
given $\Sigma$ is modeled by

$$\beta \mid \Sigma \sim N(\hat{\beta}(\Sigma), [Z'(\Sigma^{-1} \otimes I_n)Z]^{-1}), \qquad (6)$$

where $\hat{\beta}(\Sigma)$ has the form as in (3). This follows from Srivastava and Giles [39]. Note also
that the posterior distribution of $\Sigma^{-1}$ given $\beta$ is described by

$$\Sigma^{-1} \mid \beta \sim W(n, n^{-1}\hat{\Sigma}(\beta)^{-1}), \qquad (7)$$

where $W(n, n^{-1}\hat{\Sigma}(\beta)^{-1})$ is the Wishart distribution with $n$ df and scale parameter
$n^{-1}\hat{\Sigma}(\beta)^{-1}$, and $\hat{\Sigma}(\beta)$ has the form as in (4). This follows from a slight
modification of expression (7) in Percy [32]. Sampling values from the posterior distribution
of $(\beta, \Sigma)$ can be done as follows using Gibbs sampling:
Step 0. Initialize the Gibbs sampling chain using $\beta = \beta^* + \tau V^{1/2}\epsilon$, where $\beta^*$
corresponds to (5), $V = [Z'(\hat{\Sigma}(\tilde{\beta})^{-1} \otimes I_n)Z]^{-1}$ is the corresponding
variance-covariance form in (5), and $\epsilon \sim N(0, I_p)$. Here, $\tau$ can be used to
induce a slight overdispersion as recommended by Gelman et al. [12]. In this article,
$\tau = 2$ is used. This initialization is done since $\beta$ is approximately normal with
mean $\beta^*$ and variance-covariance matrix $V$.
Step 1. Generate a $\Sigma$ value as in (7) by using the most recently simulated $\beta$ and the
decomposition $\Sigma^{-1} = S \Psi S'$, where $SS' = (n\hat{\Sigma}(\beta))^{-1}$ and
$\Psi = \sum_{i=1}^{n} c_i c_i'$. Here, $c_1, \ldots, c_n$ are iid $N(0, I_r)$ distributed.
Step 2. Generate a $\beta$ value as in (6) by using the most recently simulated $\Sigma$ and
$\beta = \hat{\beta}(\Sigma) + R c_0$, where $RR' = [Z'(\Sigma^{-1} \otimes I_n)Z]^{-1}$ and $c_0$ is
distributed as $N(0, I_p)$.
Following Percy [32], we use a burn-in of 100 iterations for steps 1 and 2. See Geweke
[14] for use of (conditionally conjugate) informative priors for | and . _
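The two conditional draws of Steps 1-2 can be sketched as below for a toy SUR data set. This is a minimal illustration assuming NumPy; the simulated two-response model, all variable names, and the use of naive matrix inversion are illustrative choices, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy SUR data set: r = 2 responses, n = 30 runs, and a different
# covariate vector z_i(x) for each response (all values illustrative).
n, r = 30, 2
x = rng.uniform(-1, 1, size=(n, 2))
Z1 = np.column_stack([np.ones(n), x[:, 0]])            # z_1(x) = (1, x1)'
Z2 = np.column_stack([np.ones(n), x[:, 0], x[:, 1]])   # z_2(x) = (1, x1, x2)'
E = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n)
Y1 = 1.0 + 2.0 * x[:, 0] + E[:, 0]
Y2 = -1.0 + 1.0 * x[:, 0] + 3.0 * x[:, 1] + E[:, 1]

Zb = np.block([[Z1, np.zeros((n, 3))],
               [np.zeros((n, 2)), Z2]])                # block-diagonal Z
Yt = np.concatenate([Y1, Y2])                          # stacked response vector

def residuals(beta):
    """n x r matrix whose j-th row is e_j(beta)."""
    fit = Zb @ beta
    return np.column_stack([Y1 - fit[:n], Y2 - fit[n:]])

def draw_beta(Sigma):
    """Step 2: beta | Sigma ~ N(beta_hat(Sigma), [Z'(Sigma^-1 kron I_n)Z]^-1)."""
    Om = np.kron(np.linalg.inv(Sigma), np.eye(n))
    V = np.linalg.inv(Zb.T @ Om @ Zb)
    V = (V + V.T) / 2                                  # symmetrize for the sampler
    bhat = V @ Zb.T @ Om @ Yt
    return rng.multivariate_normal(bhat, V)

def draw_Sigma(beta):
    """Step 1: draw Sigma^-1 | beta from its Wishart full conditional via the
    S (sum of c_i c_i') S' construction, with SS' = (n Sigma_hat(beta))^-1."""
    Eres = residuals(beta)
    S = np.linalg.cholesky(np.linalg.inv(Eres.T @ Eres))  # n*Sigma_hat = E'E
    C = rng.standard_normal((n, r))                       # rows are the c_i
    return np.linalg.inv(S @ (C.T @ C) @ S.T)

beta = draw_beta(np.eye(r))              # crude initialization
draws = []
for it in range(600):
    Sigma = draw_Sigma(beta)
    beta = draw_beta(Sigma)
    if it >= 100:                        # burn-in, as in the text
        draws.append((beta, Sigma))
```

The retained `draws` play the role of the $(\beta^{(s)}, \Sigma^{(s)})$ pairs that are re-used across $x$-points in the next subsection.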
To compute $p(x)$, $N$ $Y$-vectors are generated for each $x$. Each simulated
$Y$-vector, $Y^{(s)}$, is generated using

$$Y^{(s)} = \mathrm{diag}(z_1(x)', \ldots, z_r(x)')\, \beta^{(s)} + e^{(s)}, \qquad (8)$$

where $(\beta^{(s)}, \Sigma^{(s)})$ is sampled using the Gibbs sampler, $e^{(s)}$ is sampled from
$N(0, \Sigma^{(s)})$, and $s = 1, \ldots, N$. For each new $x$-point the same $N$
$(\beta^{(s)}, \Sigma^{(s)})$ pairs are used. The Bayesian reliability, $p(x)$, is approximated by

$$\hat{p}(x) = \frac{1}{N} \sum_{s=1}^{N} I(Y^{(s)} \in A), \qquad (9)$$

for large $N$, where $I(\cdot)$ is an indicator function.
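The Monte Carlo estimate in (8)-(9) can be sketched as follows. Here the posterior draws are stand-ins generated directly for brevity (in practice they would come from the Gibbs sampler), and the two-response model, spec set, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in posterior draws of (beta, Sigma); in practice these come from
# the Gibbs sampler.  Toy model: r = 2 responses, z_1(x) = z_2(x) = (1, x)'.
N = 2000
beta_draws = rng.multivariate_normal([1.0, 2.0, -1.0, 1.5],
                                     0.01 * np.eye(4), size=N)
Sigma_draws = np.array([[[1.0, 0.3], [0.3, 0.5]]] * N)

def p_hat(x, in_A, beta_draws, Sigma_draws, rng):
    """Estimate p(x) as in (8)-(9): for each posterior draw simulate
    Y = diag(z_1(x)', z_2(x)') beta + e and average the indicator I(Y in A)."""
    z = np.array([1.0, x])
    hits = 0
    for beta, Sigma in zip(beta_draws, Sigma_draws):
        mean = np.array([z @ beta[:2], z @ beta[2:]])
        y = rng.multivariate_normal(mean, Sigma)
        hits += in_A(y)
    return hits / len(beta_draws)

# Spec set A = {y : y1 <= 6 and y2 <= 2}, evaluated at the point x = 0.9.
p = p_hat(0.9, lambda y: y[0] <= 6.0 and y[1] <= 2.0,
          beta_draws, Sigma_draws, rng)
```

Because the same posterior draws are reused at every $x$-point, the comparison of $\hat{p}(x)$ across candidate points is also less noisy than it would be with independent draws per point.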
Percy [32] provides a similar, but three-step, Gibbs sampling procedure that generates
a $(\beta, \Sigma, Y)$ triplet for a given $x$ value. However, this is not efficient for our purposes as
this Gibbs sampling procedure would have to be re-done for many $x$-points in order to
optimize $p(x)$. Percy also proposes a multivariate normal approximation to the posterior
predictive distribution of $Y$ given $x$. However, such an approximation may not be
accurate for small sample sizes. This is because one would expect the true posterior
predictive distribution of $Y$ given $x$ to have heavier tails than a normal distribution due
to model parameter uncertainty; this is indeed the case with the SMR model.
3.2. The SUR Model with t-Distribution Error Terms
The multivariate t-distribution can be a useful generalization of the multivariate normal
distribution for applied statistics (Liu and Rubin [25]). In particular, it can be a useful tool
for modeling bell-shaped distributions that have heavier than normal "tails". Liu [26]
illustrates the utility of using t-distribution errors for robust data analysis within the context
of an SMR model. Rajagopal et al. [40] provide an example for the univariate regression
case.
In this subsection, we show how to sample from the posterior predictive distribution of
a SUR model with multivariate t-distribution errors. This will allow the experimenter to
perform a sensitivity analysis with regard to a distribution that spans a continuum from
Cauchy (df $= 1$) to normal (df $= \infty$). This can be useful for many typically used response
surface designs that may not provide enough data to perform discriminating tests of
normality. Our experience is that (if the mean response is in $A$) the Bayesian reliability
$p(x)$ gets smaller as the df get smaller, reflecting a more disperse posterior predictive
distribution for $Y$ at each $x$-point. If $p(x)$ is acceptably large for both small and large df,
then our sensitivity analysis provides some confidence that we have a reliable process,
provided that the residual distribution appears bell-shaped and we have found good
regression models for each response type.
The SUR model with t-distribution errors has the same model form as in (1) but with
the $e_i$'s replaced by $\epsilon_i$'s, where the vector $\epsilon = (\epsilon_1, \ldots, \epsilon_r)'$ has a
multivariate t-distribution with location parameter 0, scale (matrix) parameter $\Sigma$, and df
parameter $v$. The inverse of $\Sigma$, $\Sigma^{-1}$, is sometimes called the precision matrix. For
details about the multivariate t-distribution see Kotz and Johnson [23]. Here, we are assuming
that $v$ is known. Some authors recommend using df $= 4$ (which implies three finite moments)
for the t-distribution for purposes of modeling heavy-tailed errors. See, for example, Lange
et al. [24], Gelman et al. [12], and Congdon [5]. The same noninformative prior, proportional
to $|\Sigma|^{-(r+1)/2}$, can be used for the t-distribution errors model. This prior is used in
this article.
To do Gibbs sampling from a SUR model with t-distribution errors, first consider the
following weighted SUR model

$$Y_{ij} = z_i(x_j)'\beta_i + \frac{e_{ij}}{\sqrt{w_j}}, \quad i = 1, \ldots, r, \; j = 1, \ldots, n, \qquad (10)$$

where (10) is defined as in (1) but with an index $j$ to represent the observation number and
with $e_i$ replaced by $e_{ij}/\sqrt{w_j}$. Here, $e_j = (e_{1j}, \ldots, e_{rj})' \sim$ iid $N(0, \Sigma)$,
$j = 1, \ldots, n$. Conditional on $w_1, \ldots, w_n$, (10) is a weighted SUR model. Note that here
the weight is the same for each $r \times 1$ vector of responses, $Y_j$, but different for each
observation. If $w_j = u_j/v$ for $j = 1, \ldots, n$, where $u_1, \ldots, u_n$ are iid chi-square with
$v$ df, then (unconditional on the $w_j$'s) the $r \times 1$ error vectors,
$(e_{1j}/\sqrt{w_j}, \ldots, e_{rj}/\sqrt{w_j})'$, $j = 1, \ldots, n$, are iid multivariate t with $v$ df,
location parameter vector 0, and scale parameter matrix $\Sigma$ (Congdon [5]). As with
the normal errors SUR model, we use the noninformative prior which is proportional to
$|\Sigma|^{-(r+1)/2}$.
In order to set up the Gibbs sampling, we need to define some estimator-like
functions of the data and model parameters. First we define

$$\hat{\beta}(\Sigma, W) = [Z'(\Sigma^{-1} \otimes W)Z]^{-1} Z'(\Sigma^{-1} \otimes W)\tilde{Y},$$

where $W = \mathrm{diag}(w_1, \ldots, w_n)$. In addition, let

$$V(\Sigma, W) = [Z'(\Sigma^{-1} \otimes W)Z]^{-1}.$$

Finally, let

$$\hat{\Sigma}(\beta, W) = \frac{1}{n} \sum_{j=1}^{n} w_j\, e_j(\beta)\, e_j(\beta)',$$

where $e_j(\beta)$ is as defined following (4).
The basic steps of the Gibbs sampling are as follows.
Step 0. Initialize the Gibbs sampling chain using (5) for $\beta$. For
$W = \mathrm{diag}(w_1, \ldots, w_n)$, simulate the $w_i$'s, where $w_i = u_i/v$ and the $u_i$'s
have independent chi-square distributions with $v$ df $(i = 1, \ldots, n)$.
Step 1. Simulate $\Sigma \mid \beta, W$ according to
$\Sigma^{-1} \mid \beta, W \sim W(n, n^{-1}\hat{\Sigma}(\beta, W)^{-1})$.
Step 2. Simulate $\beta \mid \Sigma, W$ according to
$\beta \mid \Sigma, W \sim N(\hat{\beta}(\Sigma, W), V(\Sigma, W))$.
Step 3. Simulate $w_1, \ldots, w_n$ (conditional on $\beta$ and $\Sigma$) by simulating each $w_j$
independently according to a gamma distribution, $w_j \sim G(b, c_j)$, $j = 1, \ldots, n$, where
$G(b, c)$ denotes a gamma distribution with density function,

$$g(w; b, c) = \frac{1}{\Gamma(b)\, c^b}\, w^{b-1} e^{-w/c} \quad \text{for } w > 0.$$

Here, $b = (v + r)/2$ and

$$c_j = \left[\frac{v}{2} + \frac{1}{2}\, e_j(\beta)' \Sigma^{-1} e_j(\beta)\right]^{-1}. \qquad (11)$$

Computing $Y^{(s)}$ and $p(x)$ follows as in (8) and (9). If so desired, it is clear from the
above Gibbs sampling steps that one can use the same (conditionally conjugate)
informative priors for $\beta$ and $\Sigma$ as for the normal errors SUR model.
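Step 3, the weight update, can be sketched as below. This is a minimal illustration assuming NumPy, with an arbitrary residual matrix and df; NumPy's gamma generator uses the same shape/scale parameterization as $G(b, c_j)$ above:

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_weights(resid, Sigma, v, rng):
    """Step 3 sketch: given the n x r matrix of residual vectors e_j(beta)
    and the current Sigma, draw each latent weight w_j from a gamma
    distribution with shape b = (v + r)/2 and scale c_j as in (11)."""
    n, r = resid.shape
    Sinv = np.linalg.inv(Sigma)
    q = np.einsum('ja,ab,jb->j', resid, Sinv, resid)   # e_j' Sigma^-1 e_j
    b = (v + r) / 2.0
    c = 1.0 / (v / 2.0 + q / 2.0)
    return rng.gamma(shape=b, scale=c)

# Illustrative residuals: 7 well-behaved rows plus one gross outlier; the
# outlying observation should receive a much smaller weight on average.
resid = np.vstack([rng.normal(0.0, 1.0, size=(7, 2)), [[6.0, 6.0]]])
w = draw_weights(resid, np.eye(2), v=4, rng=rng)
```

The down-weighting of the outlying row is exactly the mechanism by which the t-errors model softens the influence of moderate outliers on $(\beta, \Sigma)$.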
3.3. The Addition of Noise Variables
One advantage of this posterior predictive approach to multiresponse optimization is
that it easily allows the experimenter to incorporate noise variables and thereby do
robust-parameter-design process optimization. A noise variable is a factor that may be
precisely controlled in a laboratory setting but not in actual production use. To see how
noise variables can be incorporated, let $x = (x_1, \ldots, x_h, x_{h+1}, \ldots, x_k)'$, where
$x_{h+1}, \ldots, x_k$ are noise variables. Here, it is typically assumed that the $x_j$
$(j = h+1, \ldots, k)$ are scaled such that they are iid $N(0, 1)$. By simulating
$(x_{h+1}, \ldots, x_k)'$ and substituting into the simulation for (8),
$p(x_1, \ldots, x_h) = \Pr(Y \in A \mid x_1, \ldots, x_h, \text{data})$ can be computed. Maximizing
$p(x_1, \ldots, x_h)$ provides a way to do robust process optimization. Details for the SMR
case are discussed in Miró-Quesada et al. [29] and Rajagopal et al. [40]. Extension to the
SUR case for normal or t-distributed errors is straightforward.
4. Optimization of the Bayesian Reliability
If there are 2-3 controllable factors, then it is easy to maximize $p(x)$ by gridding over
the experimental region. For a larger number of controllable factors two other approaches
are possible. One approach is to use a general optimization procedure such as can be found
in Nelder and Mead [31], Price [37], or Chatterjee et al. [4]. Another approach is to create a
closed-form approximate model for $p(x)$ using logistic regression or some other regression
procedure such as a generalized additive model (Wood [42]). By creating a coarse to
moderately dense grid over the experimental region, logistic regression can be applied to
the $(I(Y^{(s)} \in A), x)$ data. For example, the grid can be an $m^k$ factorial design where
$m = 5$-$10$, say. Since we can simulate many $(I(Y^{(s)} \in A), x)$ pairs for each of many
$x$-points, it should be possible to create a good approximate closed-form model,
$\tilde{p}(x)$, for $p(x)$. One can then maximize $\tilde{p}(x)$ using some suitable optimization
procedure. See Peterson [33] for an example using the SMR model.
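The gridding approach for a three-component mixture can be sketched as follows. Here `p_stub` is a stand-in smooth reliability surface with an arbitrary peak location; in practice one would plug in the Monte Carlo estimate $\hat{p}(x)$ from (9):

```python
import math

# Stand-in reliability surface over the mixture simplex x1 + x2 + x3 = 1;
# the peak location (0.8, 0, 0.2) is illustrative only.
def p_stub(x1, x2, x3):
    return math.exp(-10.0 * ((x1 - 0.8) ** 2 + x2 ** 2 + (x3 - 0.2) ** 2))

best, argbest = -1.0, None
m = 101                                   # grid resolution per component
for i in range(m):
    for j in range(m - i):                # keep x1 + x2 <= 1
        x1, x2 = i / (m - 1), j / (m - 1)
        x3 = 1.0 - x1 - x2                # stay on the simplex
        val = p_stub(x1, x2, x3)
        if val > best:
            best, argbest = val, (x1, x2, x3)
```

For $k$ controllable factors the loop grows as $m^k$ evaluations, which is why the surrogate-model route above becomes attractive beyond two or three factors.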
5. A Preposterior Analysis
As will be seen in the next section, it may happen that standard multiresponse
optimization procedures indicate that satisfactory results for the mean response surfaces are
possible while the associated Bayesian reliability, $p(x)$, is not satisfactory. If this happens
it is because the posterior predictive distribution is too disperse, or possibly even oriented in
a way that causes $p(x)$ to be too small. One remedial possibility is to reduce the process
variation or change the correlation structure in such a way as to increase $p(x)$. However,
this may not always be possible, and in some cases difficult or costly when possible. There
is another approach which will increase $p(x)$ to some degree, provided that the mean
response surfaces provide satisfactory results. Some of the dispersion of the posterior
predictive distribution is due to the uncertainty of the model parameters. This uncertainty
can be reduced by increasing the sample size. Increasing the number of observations will
not make $p(x)$ go to one, due to the uncertainty of the natural process variation itself, but
it may be useful to assess how much $p(x)$ will increase as additional data are added.
5.1. A Preposterior Analysis with Normally Distributed Error Terms
One way to assess how additional data might affect the posterior predictive distribution
is to impute new data in such a way as to predict the effect of having additional data using
the information we currently have at hand. In this paper we take two different approaches
to imputing additional data for the SUR model with normally distributed error terms. The
first approach is based upon single imputation, where we impute an (imaginary) additional
data set that has the property that it keeps $\hat{\beta}$ and $\hat{\Sigma}$ in the Gibbs sampling the
same. However, the additional data involves augmenting the regression design matrix and df.
It is evident from (6) that the posterior distribution of $\beta$ given $\Sigma$ can be modified to
behave as if additional data were used by augmenting the design matrix $Z$ and the size of the
identity matrix $I_n$. Accordingly, from (7) one can see that the posterior distribution of
$\Sigma$ given $\beta$ can be changed to behave as if more data were added by increasing the $n$ in
the Wishart distribution, both for the df and for the $n$ in the $n^{-1}$-coefficient of
$\hat{\Sigma}(\beta)^{-1}$ in (7).
The second approach is based upon (parametric bootstrap) multiple imputation. Let
$(\hat{\beta}, \hat{\Sigma})$ be estimates of $(\beta, \Sigma)$ (such as $\beta^*$ and $\hat{\Sigma}(\tilde{\beta})$
in (5)). Using the model form in (2), we simulate $n_a$ new response values, $Y_i$, from
$N(Z_i\hat{\beta}, \hat{\Sigma})$ $(i = n+1, \ldots, n+n_a)$, and then generate $N$ realizations of
$(\beta, \Sigma)$ conditional on the augmented data set, using the Gibbs sampling process. These
realizations are then used to generate $N$ realizations of a new response variable, $Y$, using
(8). This whole process is repeated $m$ times to get $m$ estimates of
$\Pr(Y \in A \mid x, \text{data}, Y_{n+1}, \ldots, Y_{n+n_a}, Z_{n+1}, \ldots, Z_{n+n_a})$. These values
are then averaged to get a final estimate of

$$E\{\Pr(Y \in A \mid x, \text{data}, Y_{n+1}, \ldots, Y_{n+n_a}, Z_{n+1}, \ldots, Z_{n+n_a})\}.$$

This preposterior estimate of $\Pr(Y \in A \mid x, \text{data})$ is a (parametric) bootstrap estimate
of an expected value; as such it seems reasonable to use $m$ equal to 200 (Efron and
Tibshirani [9]). This multiple imputation approach, though more computationally intensive,
can also be used to produce a histogram of simulated realizations from the random variable
$P = \Pr(Y \in A \mid x, \text{data}, Y_{n+1}, \ldots, Y_{n+n_a}, Z_{n+1}, \ldots, Z_{n+n_a})$.
Note that multiple imputation here is not done by simulating from the posterior
predictive distribution, but instead from $N(Z_i\hat{\beta}, \hat{\Sigma})$ $(i = n+1, \ldots, n+n_a)$,
where $\hat{\beta}$ and $\hat{\Sigma}$ are point estimates of $\beta$ and $\Sigma$, respectively. The
reason for this is that, for any fixed $x$-point, simulating from the posterior predictive
distribution will get us nowhere. This is because multiple imputation from the posterior
predictive distribution produces an estimate of

$$E\{\Pr(Y \in A \mid x, \text{data}, Y_{n+1}, \ldots, Y_{n+n_a}, Z_{n+1}, \ldots, Z_{n+n_a})\}. \qquad (12)$$

But (12) equals $p(x) = \Pr(Y \in A \mid x, \text{data})$. This follows from the well known result
that $E_2(E_1(Y_1 \mid Y_2)) = E_1(Y_1)$. One may be able to use new responses simulated from
the posterior predictive distribution to compute

$$E\{\max_{x \in R} \Pr(Y \in A \mid x, \text{data}, Y_{n+1}, \ldots, Y_{n+n_a}, Z_{n+1}, \ldots, Z_{n+n_a})\}.$$

But such a two-tiered Monte Carlo computation (with a maximization in between)
could become rather burdensome.
5.2. A Preposterior Analysis with t-Distributed Error Terms
The Gibbs sampling for the SUR model with t-distributed error terms poses a difficulty
for the single imputation approach. This is due to the fact that we need to use the $c_j$
terms (11) in the Gibbs sampling simulations for the $w_j$ $(j = n+1, \ldots, n+n_a)$, but each
$c_j$ term depends upon the imputed response vector $y_j$. As such, we need to use the more
computationally intensive multiple imputation (parametric bootstrap) approach.
Such modifications will give the experimenter an idea of how much the reliability can
be expected to increase by reducing model uncertainty. For example, the experimenter can
forecast the effects of replicating the experiment a certain number of times. This idea is
similar in spirit to the notion of a "preposterior" analysis as described by Raiffa and
Schlaifer [38].
6. Examples
6.1. A Mixture Experiment
This example involves a mixture experiment to study the surfactants and emulsification
variables involved in pseudolatex formation for controlled-release drug-containing beads
(Frisbee and McGinity [11]). An extreme vertices design was used to study the influence of
surfactant blends on the size of the particles in the pseudolatex and the glass transition
temperature of films cast from those pseudolatexes. The factors chosen were: $x_1 = $ "% of
Pluronic® F68", $x_2 = $ "% of polyoxyethylene 40 monostearate", and $x_3 = $ "% of
polyoxyethylene sorbitan fatty acid ester NF". The experimental design used was a modified
McLean-Anderson design (McLean and Anderson [28]) with two centroid points, resulting in a
sample size of eleven. The response variables measured were particle size and glass
transition temperature, which are denoted here as $Y_1$ and $Y_2$, respectively. The goal of
the study was to find values of $x_1$, $x_2$, and $x_3$ to minimize as best as possible both
$Y_1$ and $Y_2$. Here, we choose an upper bound for $Y_1$ to be 240 and an upper bound for
$Y_2$ to be 19. Anderson and Whitcomb [2] also analyze this data set to illustrate Design
Expert's capability to map out overlapping mean response surfaces.
Frisbee and McGinity [11] and Anderson and Whitcomb [2] use an SMR model with
second-order terms to model the bivariate response data. For this example, however, a
severe outlier run was deleted. The resulting regression models obtained were:
ŷ1 = 248x1 + 272x2 + 533x3 − 485x1x3 − 424x2x3,   (13)

ŷ2 = 18.7x1 + 14.1x2 + 35.4x3 − 36.7x1x3 + 18.0x2x3.   (14)

For this paper, several different mixture-experiment regression models were fit for each response type. For the Becker-type model (Becker [3]),

ŷ2 = 18.8x1 + 15.6x2 + 35.4x3 + 3.59 min(x1, x2) − 17.7 min(x1, x3) − 10.0 min(x2, x3),   (15)

resulted in a mean squared error of 1.71, which is a 53% reduction over the quadratic model for y2 in (14). The adjusted R² for the model in (15) is 96.4%. It turned out that the model forms in (13) and (15) gave the best overall fits to the data. As such, these two different (SUR) model forms were chosen to model the response surfaces. The Wilks-Shapiro test for normality of the residuals for each regression model yields p-values greater than 0.05. Tests for multivariate normality via skewness and kurtosis (Mardia [27]) were not significant at the 5% level, although such tests would not be very sensitive for the small sample size used in this example.
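Because the Becker-type model in (15) is linear in its coefficients (the min(xi, xj) blending terms are just additional columns of the model matrix), it can be fit by ordinary least squares. The following is a minimal sketch in Python using synthetic simplex data and invented coefficients, not the study's data:

```python
import numpy as np

def becker_design_matrix(X):
    """Model matrix for a Becker-type mixture model (Becker [3]): linear
    blending terms x1, x2, x3 plus pairwise min(xi, xj) blending terms."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([
        x1, x2, x3,
        np.minimum(x1, x2), np.minimum(x1, x3), np.minimum(x2, x3),
    ])

# Synthetic points on the mixture simplex (x1 + x2 + x3 = 1); the
# coefficients below are illustrative, not the study's estimates.
rng = np.random.default_rng(0)
Xsim = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=60)
beta_true = np.array([18.8, 15.6, 35.4, 3.6, -17.7, -10.0])
M = becker_design_matrix(Xsim)
y = M @ beta_true + rng.normal(scale=0.1, size=60)

# Ordinary least squares fit; no intercept is needed because the mixture
# constraint lets the linear blending terms absorb it.
beta_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
mse = float(np.mean((y - M @ beta_hat) ** 2))
```

The same model matrix could equally be handed to a Bayesian SUR fit; only the columns change relative to the quadratic Scheffé-type model.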
Figure 1 shows the points where the predicted mean responses associated with the
model in (13) are less than 240. Likewise, Figure 2 shows the points where the predicted
mean responses associated with the model in (15) are less than 19. Figure 3 shows the
points where the predicted mean responses associated with both models in (13) and (15) are
less than 240 and 19, respectively.
We define A = {(y1, y2) : y1 ≤ 240, y2 ≤ 19} and p(x) = Pr(Y ∈ A | x, data). All probabilities are computed here using N = 1000 simulated values of Y for each x point. For
this example, ten independent Gibbs sampling chains were simulated for 1000 iterations
following a burn-in of 100 iterations. Each chain was thinned to take only every tenth
simulation. Here, N=1000 was taken as a reasonable value (Gelman et al. [12]). For
binomial proportions, an N of 1000 produces a standard error of at most √(0.25/1000) ≈ 0.0158 (assuming
roughly independent posterior simulations). The Gelman-Rubin convergence statistics for
all of the model parameters were all very good (less than 1.1 as recommended by Gelman
et al. [12]). Gridding over the design simplex was done using 32,761 grid points. Using the
SUR models in (13) and (15), we obtain max p(x) = p(x*) = 0.622 at x* = (0.81, 0, 0.19)′. Clearly, this is no indication of a reliable process at x*. If the experimenter had instead used the classical SMR model (forms in (13) and (14)) to maximize p(x) over the design simplex, then he/she would obtain max p(x) = p(x*) = 0.863 at x* = (0.78, 0, 0.22)′. Hence the optimal p(x) for the SMR model represents about a 39% increase in apparent process reliability,
though still possibly unacceptable. The noticeable difference in probabilities is due to the fact that the (better-fitting) model in (15), while having a smaller MSE, also has a larger mean predicted value ŷ2 than the model in (14) when x1 is greater than 0.5. The probabilities p(x) for the SUR model were also computed assuming that the residual errors had a t-distribution with 4 df. In this case, max p(x) = p(x*) = 0.613 at x* = (0.83, 0, 0.17)′. (Gelman et al. [12] suggest a t-distribution with 4 df for doing a robust data analysis.)
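Given posterior predictive draws of (Y1, Y2) at a fixed x, the estimate of p(x) and its Monte Carlo error are just a sample proportion and its binomial standard error. The sketch below uses a bivariate normal with invented moments as a stand-in for the posterior predictive (the paper's actual draws come from the Gibbs sampler for the SUR model):

```python
import numpy as np

def estimate_p(draws, y1_max=240.0, y2_max=19.0):
    """Estimate p(x) = Pr(Y1 <= 240, Y2 <= 19 | x, data) from posterior
    predictive draws at a fixed x, together with its binomial standard
    error; with N = 1000 the standard error is at most sqrt(0.25/1000)."""
    inside = (draws[:, 0] <= y1_max) & (draws[:, 1] <= y2_max)
    p_hat = float(inside.mean())
    se = float(np.sqrt(p_hat * (1.0 - p_hat) / len(inside)))
    return p_hat, se

# Stand-in for the posterior predictive of (Y1, Y2) at some x: a bivariate
# normal with illustrative moments (sd 15 and 1.3, correlation 0.3).
rng = np.random.default_rng(1)
mean = np.array([228.0, 18.6])
cov = np.array([[225.0, 5.85],
                [5.85, 1.69]])
draws = rng.multivariate_normal(mean, cov, size=1000)
p_hat, se = estimate_p(draws)
```

The worst-case standard error bound quoted in the text follows from p(1 − p) ≤ 0.25, so se ≤ √(0.25/1000) ≈ 0.0158 for any true probability.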
Figure 1. The gray area is the part of the response surface associated with the model in (13) where the predicted mean response, ŷ1, is less than or equal to 240.
Figure 2. The gray area is the part of the response surface associated with the model in (15) where the predicted mean response, ŷ2, is less than or equal to 19.
Figure 3. The gray area is the part of the response surface associated with the models in (13) and (15) where both ŷ1 ≤ 240 and ŷ2 ≤ 19.
All of these models may indicate the need for remedial action. Such action could be of
the form of reducing the process variability, decreasing the means, and/or removing
uncertainty due to the unknown model parameter values. Since the first two actions may be
difficult to achieve, we consider the effects of adding more replications to the experimental
design by way of a preposterior analysis. To assess the effect of adding additional data, the
preposterior analyses discussed in section 5 were performed. To keep the computations
tractable, the same optimized x-point associated with each model and the original data
set was used. For each model, p(x) was computed using its own optimal x-point:
x* = (0.78, 0, 0.22)′ for the SMR model (with normal errors), x* = (0.81, 0, 0.19)′ for the
SUR model (with normal errors), and x* = (0.83, 0, 0.17)′ for the SUR model with
t-distribution errors (with 4 df). For each model, the entire design matrix was replicated 2,
3, or 4 times. For the SMR model the degrees of freedom were adjusted accordingly. For
the SUR model with normally distributed errors, both preposterior approaches (single and
multiple imputation) discussed in section 5.1 were used. For the SUR model with the
t-distribution errors, the multiple imputation approach as discussed in section 5.2 was used.
Figure 4 shows the increase in p(x*) as the number of replications is increased from
one to four for both the SMR and SUR models. Here, it is evident that the SMR model
might lead the experimenter to believe that reduction of model parameter uncertainty by
using three or four replications would provide sufficient evidence that the process has a
high rate of conformance with the specifications given by the set A. However, the better
fitting SUR models, using either normal or t-distribution errors, indicate that increasing the
number of experimental replications may not validate that the process has a high rate of
conformance even with four replications. Instead, the SUR models are indicating that the
experimenter must improve the process means and/or variances to obtain conformance
with higher probability. The single imputation and (bootstrap) multiple imputation results
for the SUR model with normal errors are reasonably close but further research needs to be
done on how these two preposterior approaches compare. Nonetheless, this shows the
importance of improved modeling which can be achieved by generalizing from the SMR
model to the more flexible SUR model.
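The qualitative message of the preposterior analysis, that replication removes only model-parameter uncertainty while the process-variability floor remains, can be sketched for a single linear-model response. The following is a simplified stand-in for the full Bayesian preposterior computation, using an arbitrary illustrative design (not the McLean-Anderson design of the example):

```python
import numpy as np

def predictive_variance(X, x0, sigma2, reps=1):
    """Variance of a future response at x0 for a linear model whose design
    X is replicated `reps` times: sigma2 * (1 + x0' (Xr'Xr)^(-1) x0).
    Replication shrinks only the parameter-uncertainty term x0'(Xr'Xr)^-1 x0;
    the sigma2 floor from process variability does not shrink."""
    Xr = np.vstack([X] * reps)
    xtx_inv = np.linalg.inv(Xr.T @ Xr)
    return float(sigma2 * (1.0 + x0 @ xtx_inv @ x0))

# An arbitrary 11-run design with an intercept and two factors (invented
# for illustration), evaluated at one prediction point x0.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(11), rng.uniform(size=(11, 2))])
x0 = np.array([1.0, 0.8, 0.2])
sigma2 = 1.71

v = [predictive_variance(X, x0, sigma2, reps=r) for r in (1, 2, 3, 4)]
# v decreases toward, but never below, sigma2 as replication increases.
```

This is why, when the process means sit close to the specification limits, no amount of replication can push the probability of conformance near 1: the sigma2 floor dominates.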
Figure 4. Probabilities of conformance for increasing numbers of design replications (Rep = 1 to Rep = 4), for the SMR model with normal errors, the SUR model with normal errors, the SUR model with normal errors with bootstrap, and the SUR model with t-distribution errors with bootstrap. Reps 2-4 are the preposterior probability estimates. (Bootstrapping is not applicable for Rep = 1.) For each model, p(x) was computed using its own optimal x-point from Rep = 1.
6.2. Optimization of an HPLC Assay
This example illustrates the optimization of an event probability Pr(Y ∈ A | x, data)
for a high performance liquid chromatography (HPLC) assay, as originally discussed in
Peterson [33]. Here there are three factors (x1 = percent of isopropyl alcohol (pipa), x2 =
temperature (temp), and x3 = pH) and four responses (y1 = resolution (rs), y2 = run time,
y3 = signal-to-noise ratio (s/n), y4 = tailing). For this assay, the chemist desires to have the
event,

A = {y : y1 > 1.8, y2 ≤ 15, y3 > 300, 0.75 ≤ y4 ≤ 0.85},   (16)

occur with high probability. As such, it is desired to maximize p(x) = Pr(Y ∈ A | x, data) as
a function of x.
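Operationally, maximizing p(x) over a grid reduces to scoring each grid point by the proportion of posterior predictive draws that land in the set A of (16). A sketch follows, with a toy stand-in for the predictive distribution; the factor effects and noise scales in `toy_draws` are invented for illustration, the real draws coming from the SUR posterior predictive:

```python
import itertools
import numpy as np

def in_A(y):
    """Indicator of the event A in (16): resolution > 1.8, run time <= 15,
    signal-to-noise > 300, and tailing between 0.75 and 0.85."""
    return ((y[..., 0] > 1.8) & (y[..., 1] <= 15.0)
            & (y[..., 2] > 300.0)
            & (y[..., 3] >= 0.75) & (y[..., 3] <= 0.85))

def maximize_p(predictive_draws, grid):
    """Grid search for x* = argmax p(x), where predictive_draws(x)
    returns an (N, 4) array of draws of (y1, ..., y4) at x."""
    best_x, best_p = None, -1.0
    for x in grid:
        p = float(in_A(predictive_draws(x)).mean())
        if p > best_p:
            best_x, best_p = x, p
    return best_x, best_p

# The coded design space gridded in steps of 0.1, as in the example.
axis = np.round(np.arange(-1.0, 1.0 + 1e-9, 0.1), 1)
grid = [np.array(p) for p in itertools.product(axis, repeat=3)]

# Toy stand-in predictive with invented effects and noise scales.
rng = np.random.default_rng(3)
def toy_draws(x, n=200):
    mu = np.array([2.0 + 0.2 * x[0], 14.0 + 1.0 * x[1],
                   320.0 + 30.0 * x[2], 0.80 + 0.02 * x[0]])
    return mu + rng.normal(scale=[0.1, 0.5, 10.0, 0.01], size=(n, 4))

# A coarse demo grid keeps the toy search fast.
coarse = [np.array(p) for p in itertools.product((-1.0, 0.0, 1.0), repeat=3)]
x_star, p_star = maximize_p(toy_draws, coarse)
```

Exhaustive gridding is feasible here because the coded design space is only three-dimensional; higher-dimensional problems would call for an optimizer in place of the grid loop.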
A Box-Behnken experimental design was run, with three center points, to gather data
to fit four quadratic response surfaces. For the SMR model, full second-order quadratic
regression forms were used for each response.
All of the response surface models fit well, with all R² values above 99%. As in
example 1, the Wilks-Shapiro test for normality of the residuals for each regression model
yields p-values greater than 0.05, and the Mardia tests for multivariate skewness and
kurtosis were not significant at the 5% level. The factor levels were coded so that all values
were between −1 and +1, with the center of the experimental region at the origin.
Some of the factor terms for the second-order response surface models were not
statistically significant, so a SUR model was created from an SMR model by removing
some of the non-significant terms, while still preserving model-term hierarchy. Using the
STEPWISE option in SAS PROC REG, the four regression models obtained for the SUR
model analysis were:

y1 = β0^(1) + β2^(1) x2 + β11^(1) x1² + β22^(1) x2² + β12^(1) x1x2 + e1,
y2 = β0^(2) + β1^(2) x1 + β2^(2) x2 + β3^(2) x3 + β11^(2) x1² + β22^(2) x2² + β12^(2) x1x2 + e2,
y3 = β0^(3) + β2^(3) x2 + β3^(3) x3 + β33^(3) x3² + β12^(3) x1x2 + e3,
y4 = β0^(4) + β1^(4) x1 + β2^(4) x2 + β11^(4) x1² + β22^(4) x2² + e4.   (17)
For comparison purposes, a sensitivity analysis involving three models was performed.
The three models were:
Model 1: An SMR model using a full second-order polynomial with normally distributed
errors.
Model 2: A SUR model as shown in (17) above with normally distributed errors.
Model 3: A SUR model as shown in (17) above with errors having a t-distribution with 4 df.
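Multivariate t errors of the kind used in Model 3 can be simulated, and handled inside a Gibbs sampler, through the standard scale-mixture-of-normals representation T = μ + Z·√(ν/W), with Z multivariate normal and W ~ χ²_ν (see, e.g., Gelman et al. [12]). A sketch, with an assumed diagonal error scale matrix not estimated from any data:

```python
import numpy as np

def multivariate_t(mean, scale, df, size, rng):
    """Draw from a multivariate t via the scale mixture of normals:
    T = mean + Z * sqrt(df / W), Z ~ N(0, scale), W ~ chi-square(df).
    This latent-W representation is what makes the t-error model
    amenable to Gibbs sampling."""
    z = rng.multivariate_normal(np.zeros(len(mean)), scale, size=size)
    w = rng.chisquare(df, size=size)
    return mean + z * np.sqrt(df / w)[:, None]

# Illustrative 4-response error scale matrix (hypothetical values).
rng = np.random.default_rng(4)
scale = np.diag([0.05, 0.3, 25.0, 0.005])
errors = multivariate_t(np.zeros(4), scale, df=4, size=5000, rng=rng)
# With 4 df the tails are noticeably heavier than the normal's, which
# is exactly what the robustness sensitivity analysis exploits.
```

Small values of W inflate whole draws at once, so all four responses become outlying together, matching the single-degradation-mechanism intuition behind a multivariate t error model.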
For models 1-3, 10,000 Monte Carlo simulations were done as it appears that the true
underlying Bayesian probabilities were extreme (close to 1). One hundred burn-in
simulations were done to get each independent simulated value. Gridding steps of 0.1 were
used across the coded design space. For the SMR model (Model 1), the maximum p(x)
value is p(x*) = 0.964, where x* = (73.5, 43, 0.1)′. However, for the SUR model with
normal errors (Model 2), the maximum p(x) value is p(x*) = 1, where x* = (73.5, 43, 0.1)′
(although a neighborhood containing x* also had values of p(x) = 1). Replacing the
normal-errors assumption in Model 2 with t-distribution errors (with 4 df) (Model 3)
produced a maximum p(x) value of p(x*) = 0.978, where x* = (74.5, 44.9, 0.06)′.
It is interesting to note that the p(x*) values for the SUR models are larger than for the
SMR model. In this example, Model 2 is simply a special case of Model 1 where the
non-significant regression terms are removed. Apparently, this removal of non-significant
terms for Model 2 tightens up the posterior predictive distribution enough to increase the
optimal p(x) value over that of Model 1. Even the optimal p(x) value for Model 3 is
slightly larger than that for Model 1, despite the use of a residual-error t-distribution with 4
df. For this example, the sensitivity analysis tells us that for all three models the worst case
probability is 0.964. If this smallest reliability estimate is adequate then we need not do a
preposterior analysis to check the effects of gathering additional data.
7. Summary
The SMR model (with normal errors) has a closed-form posterior predictive distribution
allowing quick and easy computation of p(x) = Pr(Y ∈ A | x, data) or other posterior
predictive metrics as shown in Peterson [33]. However, in some cases the use of the more
general SUR model will be preferable. One such case was shown in the first example
(section 6.1) where the fit of one of the response types was greatly increased by a change in
the basic model form. For the second example (section 6.2) all of the individual regression
models each had some terms that were not statistically significant. The larger posterior
probability of conformance value for the SUR model over the SMR model indicates that
some further efficiency can be obtained by removing terms in some of the models that do
not appear predictive.
The preposterior analysis discussed in section 5 allows the investigator to assess the
effect of model parameter uncertainty on the posterior predictive probability of conformance.
If the process means are all in conformance with process specifications, then an increase in
data will result in some increase in posterior predictive probability of conformance. If this
predicted increase is satisfactory, then the experimenter may want to gather more data to
confirm this. If this predicted increase is not satisfactory, then the experimenter may wish
to take different action and consider the possibility of process modification to improve
response means and/or variances. At this point, it is not clear in general how the single and
multiple imputation preposterior analyses compare to each other. Further research is needed
to investigate the properties of preposterior analyses for response surface optimization.
Useful modifications of the SUR model are possible with the addition of noise variables
and a t-distribution model for the residual errors. Further research in this area to make the
variance-covariance matrix a function of the controllable factors may also prove helpful to
experimenters.
Acknowledgements
We would like to thank Joseph Schaffer for a helpful discussion on the imputation
aspects of this work as related to the preposterior analysis.
References
1. Ames, A. E., Mattucci, N., MacDonald, S., Szonyi, G. and Hawkins, D. M. (1997).
Quality loss functions for optimization across multiple response surfaces. Journal of
Quality Technology, 29, 339-346.
2. Anderson, M. J. and Whitcomb, P. J. (1998). Find the most favorable formulations.
Chemical Engineering Progress, April, 63-67.
3. Becker, N. G. (1968). Models for the response of a mixture. Journal of the Royal
Statistical Society, Series B, 30, 349-358.
4. Chatterjee, S., Laudato, M. and Lynch, L. A. (1996). Genetic algorithms and their
statistical applications: an introduction. Computational Statistics and Data Analysis, 22,
633-651.
5. Congdon, P. (2006). Bayesian Statistical Modeling, 2nd edition. John Wiley and Sons
Ltd., Chichester.
6. del Castillo, E., Montgomery, D. C. and McCarville, D. R. (1996). Modified
desirability functions for multiple response optimization. Journal of Quality Technology,
28, 337-345.
7. Derringer, G. and Suich, R. (1980). Simultaneous optimization of several response
variables. Journal of Quality Technology, 12, 214-219.
8. Derringer, G. (1994). A balancing act: optimizing a product's properties, Quality
Progress, June, 51-58.
9. Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman and
Hall/CRC, Boca Raton.
10. Food and Drug Administration (2006). Guidance for Industry - Q8 Pharmaceutical
Development. U. S. Department of Health and Human Services, CDER, CBER, USA.
11. Frisbee, S. E. and McGinity, J. W. (1994). Influence of nonionic surfactants on the
physical and chemical properties of a biodegradable pseudolatex. European Journal of
Pharmaceutics and Biopharmaceutics, 40, 355-363.
368 Peterson, Mir-Quesada and del Castillo
12. Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis,
2nd edition. Chapman and Hall/CRC, Boca Raton.
13. Genz, A. and Bretz, F. (2002). Methods for the computation of multivariate t-
probabilities. Journal of Computational and Graphical Statistics, 11, 950-971.
14. Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics. John Wiley and
Sons, Inc. Hoboken, NJ.
15. Griffiths, W. (2003). Bayesian inference in the seemingly unrelated regressions model.
In Computer-Aided Econometrics, eds. D. E. A. Giles, New York, Marcel Dekker,
263-290.
16. Harrington, E. C. (1965). The desirability function. Industrial Quality Control, 21,
494-498.
17. Hunter, J. S. (1999). Discussion of "Response surface methodology: current status and
future directions". Journal of Quality Technology, 31, 54-57.
18. Johnson, Mark E. (1987). Multivariate Statistical Simulation. John Wiley, New York.
19. Johnson, R. A. and Wichern, D. W. (2002). Applied Multivariate Statistical Analysis, 5th
edition. Englewood Cliffs, Prentice Hall.
20. Khuri, A. I. and Conlon, M. (1981). Simultaneous optimization of multiple responses
represented by polynomial regression functions. Technometrics, 23, 363-375.
21. Kim, K. and Lin, D. K. J. (2000). Simultaneous optimization of mechanical
properties of steel by maximizing exponential desirability functions. Journal of the
Royal Statistical Society, Series C, 49, 311-325.
22. Ko, Y. H., Kim, K. J. and Jun, C. H. (2005). A new loss function-based method for
multiresponse optimization. Journal of Quality Technology, 37, 50-59.
23. Kotz, S. and Johnson, R. (1985). Encyclopedia of Statistical Sciences, 6, 129-130.
24. Lange, K., Little, R. and Taylor, J. (1989). Robust statistical modeling using the
t-distribution. Journal of the American Statistical Association, 84, 881-896.
25. Liu, C. and Rubin, D. B. (1995). ML estimation of the t-distribution using EM and its
extensions, ECM and ECME. Statistica Sinica, 5, 19-39.
26. Liu, C. (1996). Bayesian robust multivariate linear regression with incomplete data.
Journal of the American Statistical Association, 91, 1219-1227.
27. Mardia, K. V. (1974). Applications of some measures of multivariate skewness and
kurtosis in testing normality and robustness studies. Sankhya B, 36, 115-128.
28. McLean, R. A. and Anderson, V. L. (1966). Extreme vertices design of mixture
experiments. Technometrics, 8, 447-454.
29. Mir-Quesada, G., del Castillo, E. and Peterson, J. J., (2004). A Bayesian approach for
multiple response surface optimization in the presence of noise variables. Journal of
Applied Statistics, 31, 251-270.
30. Montgomery, D. C. and Bettencourt, V. M. (1977). Multiple response surface methods
in computer simulation. Simulation, 29, 113-121.
31. Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization.
The Computer Journal, 7, 308-313.
32. Percy, D. F. (1992). Prediction for seemingly unrelated regressions. Journal of the Royal
Statistical Society, Series B, 54, 243-252.
33. Peterson, J. J. (2004). A posterior predictive approach to multiple response surface
optimization. Journal of Quality Technology, 36, 139-153.
34. Peterson, J. J. (2008). A Bayesian approach to the ICH Q8 definition of design space.
Journal of Biopharmaceutical Statistics, 18, 958-974.
35. Pignatiello, Jr. J. J. (1993). Strategies for robust multiresponse quality engineering. IIE
A Bayesian Reliability Approach to Multiple Response Optimization 369
Transactions 25, 5-15.
36. Press, S. J. (2003). Subjective and Objective Bayesian Statistics: Principles, Models, and
Applications, 2nd edition. John Wiley, New York.
37. Price, W. L. (1977). A controlled random search procedure for global optimization.
The Computer Journal, 20, 367-370.
38. Raiffa, H. and Schlaiffer, R. (2000). Applied Statistical Decision Theory. John Wiley,
New York.
39. Srivastava, V. K. and Giles, D. E. A. (1987). Seemingly Unrelated Regression Equations
Models. Marcel-Dekker, New York.
40. Rajagopal, R., del Castillo, E. and Peterson, J. J. (2005). Model and distribution-
robust process optimization with noise factors. Journal of Quality Technology, 37,
210-222. (Corrigendum 38, p83).
41. Vining, G. G. (1998). A compromise approach to multiresponse optimization. Journal
of Quality Technology, 30, 309-313.
42. Wood, S. N. (2006). Generalized Additive Models, an Introduction with R. Chapman and
Hall/CRC, Boca Raton, FL.
43. Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions
and tests of aggregation bias. Journal of the American Statistical Association, 57, 500-509.
44. Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. John Wiley,
New York.
Authors Biographies:
John J. Peterson is a Senior Director in the Research Statistics Unit of GlaxoSmithKline
Pharmaceuticals. He received his B.S. in Applied Mathematics and in Computer Science
(double major) from the State University of New York at Stony Brook and his Ph.D. in
statistics from The Pennsylvania State University. Dr. Peterson has over 20 years of experience
as a statistician in the pharmaceutical industry. His current research area is in response
surface methodology as applied to pharmaceutical industry problems, including applications
to "chemistry, manufacturing, and control" (CMC) and combination drug studies. Dr.
Peterson is a Fellow of the American Statistical Association and a Senior Member of the
American Society for Quality. He is also on the editorial boards of the Journal of Quality
Technology and the journal Applied Stochastic Models in Business and Industry.
Guillermo Mir-Quesada is a Sr. Research Scientist in the Bioprocess Research and
Development department at Eli Lilly and Co. He received a Ph.D. in Industrial Engineering
and Operations Research from Pennsylvania State University. He has worked in the
Biotech division of Eli Lilly and Co. since 2003, where he has supported the development
of manufacturing processes for active pharmaceutical ingredients. He is involved in
activities related to integrating Quality by Design principles in the drug development plan
and assessing the capability of manufacturing processes in development.
Enrique del Castillo is a Distinguished Professor of Engineering in the Department of
Industrial & Manufacturing Engineering at the Pennsylvania State University. He also
holds an appointment as Professor of Statistics at PSU and directs the Engineering
Statistics Laboratory. Dr. del Castillo's research interests include Engineering Statistics with
particular emphasis on Response Surface Methodology and Time Series Control. An
author of over 80 refereed journal papers, he is the author of the textbooks Process
Optimization, a Statistical Approach (Springer, 2007), Statistical Process Adjustment for Quality
Control (Wiley, 2002), and co-editor (with B.M. Colosimo) of the book Bayesian Process
Monitoring, Control, and Optimization (CRC, 2006). He is currently (2006-2009) editor-in-
chief of the Journal of Quality Technology.