
Graduate Lectures and Problems in Quality

Control and Engineering Statistics:


Theory and Methods
To Accompany
Statistical Quality Assurance Methods for Engineers
by
Vardeman and Jobe
Stephen B. Vardeman
V2.0: January 2001
© Stephen Vardeman 2001. Permission to copy for educational
purposes granted by the author, subject to the requirement that
this title page be affixed to each copy (full or partial) produced.
Contents
1 Measurement and Statistics 1
1.1 Theory for Range-Based Estimation of Variances . . . . . . . . . 1
1.2 Theory for Sample-Variance-Based Estimation of Variances . . . 3
1.3 Sample Variances and Gage R&R . . . . . . . . . . . . . . . . . . 4
1.4 ANOVA and Gage R&R . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Confidence Intervals for Gage R&R Studies . . . . . . . . . . 7
1.6 Calibration and Regression Analysis . . . . . . . . . . . . . . . . 10
1.7 Crude Gaging and Statistics . . . . . . . . . . . . . . . . . . . . . 11
1.7.1 Distributions of Sample Means and Ranges from Integer
Observations . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7.2 Estimation Based on Integer-Rounded Normal Data . . . 13
2 Process Monitoring 21
2.1 Some Theory for Stationary Discrete Time Finite State Markov
Chains With a Single Absorbing State . . . . . . . . . . . . . . . 21
2.2 Some Applications of Markov Chains to the Analysis of Process
Monitoring Schemes . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 An Introduction to Discrete Stochastic Control Theory/Minimum
Variance Control 37
3.1 General Exposition . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Process Characterization and Capability Analysis 45
4.1 General Comments on Assessing and Dissecting Overall Variation 45
4.2 More on Analysis Under the Hierarchical Random Effects Model 47
4.3 Finite Population Sampling and Balanced Hierarchical Structures 50
5 Sampling Inspection 53
5.1 More on Fraction Nonconforming Acceptance Sampling . . . . . 53
5.2 Imperfect Inspection and Acceptance Sampling . . . . . . . . . . 58
5.3 Some Details Concerning the Economic Analysis of Sampling Inspection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6 Problems 69
1 Measurement and Statistics . . . . . . . . . . . . . . . . . . . . . 69
2 Process Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3 Engineering Control and Stochastic Control Theory . . . . . . . 93
4 Process Characterization . . . . . . . . . . . . . . . . . . . . . . . 101
5 Sampling Inspection . . . . . . . . . . . . . . . . . . . . . . . . . 115
A Useful Probabilistic Approximation 127
Chapter 1
Measurement and Statistics
V&J §2.2 presents an introduction to the topic of measurement and the relevance
of the subject of statistics to the measurement enterprise. This chapter expands
somewhat on the topics presented in V&J and raises some additional issues.

Note that V&J equation (2.1) and the discussion on page 19 of V&J are
central to the role of statistics in describing measurements in engineering and
quality assurance. Much of Stat 531 concerns process variation. The discussion
on and around page 19 points out that variation in measurements from a
process will include both components of real process variation and measurement
variation.
1.1 Theory for Range-Based Estimation of Variances

Suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and let
$$R = \max X_i - \min X_i = \max(X_i - \mu) - \min(X_i - \mu) = \sigma\left(\max\frac{X_i-\mu}{\sigma} - \min\frac{X_i-\mu}{\sigma}\right) = \sigma\left(\max Z_i - \min Z_i\right)$$
where $Z_i = (X_i - \mu)/\sigma$. Then $Z_1, Z_2, \ldots, Z_n$ are iid standard normal random
variables. So for purposes of studying the distribution of the range of iid normal
variables, it suffices to study the standard normal case. (One can derive general
facts from the $\sigma = 1$ facts by multiplying by $\sigma$.)

Consider first the matter of finding the mean of the range of $n$ iid standard
normal variables, $Z_1, \ldots, Z_n$. Let
$$U = \min Z_i \; , \quad V = \max Z_i \quad \text{and} \quad W = V - U \; .$$
Then
$$EW = EV - EU$$
and
$$EU = E\min Z_i = -E(-\min Z_i) = -E\max(-Z_i) \; ,$$
where the $n$ variables $-Z_1, -Z_2, \ldots, -Z_n$ are iid standard normal. Thus
$$EW = EV - EU = 2EV \; .$$
Then (as is standard in the theory of order statistics) note that
$$V \le t \iff \text{all } n \text{ values } Z_i \text{ are} \le t \; .$$
So with $\Phi$ the standard normal cdf,
$$P[V \le t] = \Phi^n(t)$$
and thus a pdf for $V$ is
$$f(v) = n\Phi(v)^{n-1}\phi(v) \; .$$
So
$$EV = \int_{-\infty}^{\infty} v\left(n\Phi(v)^{n-1}\phi(v)\right)dv \; ,$$
and the evaluation of this integral becomes a (very small) problem in numerical
analysis. The value of this integral clearly depends upon $n$. It is standard to
invent a constant (whose dependence upon $n$ we will display explicitly)
$$d_2(n) := EW = 2EV$$
that is tabled in Table A.1 of V&J. With this notation, clearly
$$ER = d_2(n)\sigma$$
(and the range-based formulas in Section 2.2 of V&J are based on this simple
fact).
To find more properties of $W$ (and hence $R$) requires appeal to a well-known
order statistics result giving the joint density of two order statistics. The joint
density of $U$ and $V$ is
$$f(u,v) = \begin{cases} n(n-1)\phi(u)\phi(v)\left(\Phi(v) - \Phi(u)\right)^{n-2} & \text{for } v > u \\ 0 & \text{otherwise} \; . \end{cases}$$
A transformation then easily shows that the joint density of $U$ and $W = V - U$
is
$$g(u,w) = \begin{cases} n(n-1)\phi(u)\phi(u+w)\left(\Phi(u+w) - \Phi(u)\right)^{n-2} & \text{for } w > 0 \\ 0 & \text{otherwise} \; . \end{cases}$$
Then, for example, the cdf of $W$ is
$$P[W \le t] = \int_0^t \int_{-\infty}^{\infty} g(u,w)\,du\,dw \; ,$$
and the mean of $W^2$ is
$$EW^2 = \int_0^{\infty} \int_{-\infty}^{\infty} w^2 g(u,w)\,du\,dw \; .$$
Note that upon computing $EW$ and $EW^2$, one can compute both the variance
of $W$,
$$\operatorname{Var} W = EW^2 - (EW)^2 \; ,$$
and the standard deviation of $W$, $\sqrt{\operatorname{Var} W}$. It is common to give this standard
deviation the name $d_3(n)$ (where we continue to make the dependence on $n$
explicit and again this constant is tabled in Table A.1 of V&J). Clearly, having
computed $d_3(n) := \sqrt{\operatorname{Var} W}$, one then has
$$\sqrt{\operatorname{Var} R} = d_3(n)\sigma \; .$$
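The "very small" numerical analysis problems above are easy to carry out directly. The following Python sketch (illustrative only, not one of the programs referenced in these notes) approximates $d_2(n)$ by trapezoidal integration of the $EV$ integral and $d_3(n)$ by simple Monte Carlo simulation of ranges:

```python
import math
import random

def phi(z):  # standard normal pdf
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(z):  # standard normal cdf, via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def d2(n, lo=-8.0, hi=8.0, steps=4000):
    """d2(n) = 2*EV, with EV the integral of v * n * Phi(v)^(n-1) * phi(v)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += w * v * n * Phi(v) ** (n - 1) * phi(v)
    return 2.0 * h * total

def d3(n, reps=200000, seed=531):
    """Monte Carlo estimate of d3(n), the standard deviation of the range."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(z) - min(z))
    m = sum(ranges) / reps
    return math.sqrt(sum((r - m) ** 2 for r in ranges) / (reps - 1))

print(round(d2(2), 3), round(d2(5), 3))  # tabled values are 1.128 and 2.326
```

Comparing the output against the $d_2(n)$ and $d_3(n)$ columns of Table A.1 of V&J is a useful check on both the derivation and the numerics.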
1.2 Theory for Sample-Variance-Based Estimation of Variances

Continue to suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables
and take
$$s^2 := \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \; .$$
Standard probability theory says that
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \; .$$
Now if $U \sim \chi^2_\nu$ it is the case that $EU = \nu$ and $\operatorname{Var} U = 2\nu$. It is thus immediate
that
$$Es^2 = E\left(\frac{\sigma^2}{n-1}\cdot\frac{(n-1)s^2}{\sigma^2}\right) = \frac{\sigma^2}{n-1}\,E\left(\frac{(n-1)s^2}{\sigma^2}\right) = \sigma^2$$
and
$$\operatorname{Var} s^2 = \operatorname{Var}\left(\frac{\sigma^2}{n-1}\cdot\frac{(n-1)s^2}{\sigma^2}\right) = \left(\frac{\sigma^2}{n-1}\right)^2\operatorname{Var}\left(\frac{(n-1)s^2}{\sigma^2}\right) = \frac{2\sigma^4}{n-1}$$
so that
$$\sqrt{\operatorname{Var} s^2} = \sigma^2\sqrt{\frac{2}{n-1}} \; .$$
Knowing that $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$ also makes it easy enough to develop
properties of $s = \sqrt{s^2}$. For example, if
$$f(x) = \begin{cases} \dfrac{1}{2^{(n-1)/2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\,x^{\left(\frac{n-1}{2}\right)-1}\exp\left(-\dfrac{x}{2}\right) & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
is the $\chi^2_{n-1}$ probability density, then
$$Es = E\sqrt{\frac{\sigma^2}{n-1}}\sqrt{\frac{(n-1)s^2}{\sigma^2}} = \frac{\sigma}{\sqrt{n-1}}\int_0^{\infty}\sqrt{x}\,f(x)\,dx = c_4(n)\sigma \; ,$$
for
$$c_4(n) := \frac{\int_0^{\infty}\sqrt{x}\,f(x)\,dx}{\sqrt{n-1}}$$
another constant (depending upon $n$) tabled in Table A.1 of V&J. Further, the
standard deviation of $s$ is
$$\sqrt{\operatorname{Var} s} = \sqrt{Es^2 - (Es)^2} = \sqrt{\sigma^2 - \sigma^2\left(c_4(n)\right)^2} = \sigma\sqrt{1 - c_4^2(n)} = c_5(n)\sigma$$
for
$$c_5(n) := \sqrt{1 - c_4^2(n)}$$
yet another constant tabled in Table A.1.
The fact that sums of independent $\chi^2$ random variables are again $\chi^2$ (with
degrees of freedom equal to the sum of the component degrees of freedom) and
the kinds of relationships in this section provide means of combining various
kinds of sample variances to get pooled estimators of variances (and variance
components) and finding the means and variances of these estimators. For
example, if one pools in the usual way the sample variances from $r$ normal samples
of size $m$ to get a single pooled sample variance, $s^2_{\text{pooled}}$, then $r(m-1)s^2_{\text{pooled}}/\sigma^2$ is
$\chi^2$ with degrees of freedom $\nu = r(m-1)$. That is, all of the above can be
applied by thinking of $s^2_{\text{pooled}}$ as a sample variance based on a sample of size
$n = r(m-1) + 1$.
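The integral defining $c_4(n)$ can in fact be evaluated in closed form as a ratio of Gamma functions (it is the $\sqrt{x}$ moment of the $\chi^2_{n-1}$ density), giving $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$. A quick Python check of this closed form against the tabled constants:

```python
import math

def c4(n):
    # c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), the closed form of
    # (integral of sqrt(x) times the chi^2_{n-1} pdf) divided by sqrt(n-1)
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def c5(n):
    # c5(n) = sqrt(1 - c4(n)^2), the standard deviation multiplier for s
    return math.sqrt(1.0 - c4(n) ** 2)

for n in (2, 5, 10):
    print(n, round(c4(n), 4), round(c5(n), 4))
# tabled c4 values for comparison: .7979 (n = 2), .9400 (n = 5), .9727 (n = 10)
```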
1.3 Sample Variances and Gage R&R

The methods of gage R&R analysis presented in V&J §2.2.2 are based on ranges
(and the facts in §1.1 above). They are presented in V&J not because of their
efficiency, but because of their computational simplicity. Better (and analogous)
methods can be based on the facts in §1.2 above. For example, under the
two-way random effects model (2.4) of V&J, if one pools the $I \cdot J$ cell sample
variances $s^2_{ij}$ to get $s^2_{\text{pooled}}$, all of the previous paragraph applies and gives methods
of estimating the repeatability variance component $\sigma^2$ (or the repeatability
standard deviation $\sigma$) and calculating means and variances of estimators based
on $s^2_{\text{pooled}}$.
Or, consider the problem of estimating $\sigma_{\text{reproducibility}}$ defined in display (2.5)
of V&J. With $\bar{y}_{ij}$ as defined on page 24 of V&J, note that for fixed $i$, the
$J$ random variables $\bar{y}_{ij} - \alpha_i$ have the same sample variance as the $J$ random
variables $\bar{y}_{ij}$, namely
$$s_i^2 := \frac{1}{J-1}\sum_j\left(\bar{y}_{ij} - \bar{y}_{i\cdot}\right)^2 \; .$$
But for fixed $i$ the $J$ random variables $\bar{y}_{ij} - \alpha_i$ are iid normal with mean $\mu$ and
variance $\sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2/m$, so that
$$Es_i^2 = \sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2/m \; .$$
So
$$\frac{1}{I}\sum_i s_i^2$$
is a plausible estimator of $\sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2/m$. Hence
$$\frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m} \; ,$$
or better yet
$$\max\left(0,\; \frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m}\right) \quad (1.1)$$
is a plausible estimator of $\sigma^2_{\text{reproducibility}}$.
1.4 ANOVA and Gage R&R

Under the two-way random effects model (2.4) of V&J, with balanced data, it
is well-known that the ANOVA mean squares
$$MSE = \frac{1}{IJ(m-1)}\sum_{i,j,k}\left(y_{ijk} - \bar{y}_{ij}\right)^2 \; ,$$
$$MSAB = \frac{m}{(I-1)(J-1)}\sum_{i,j}\left(\bar{y}_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}\right)^2 \; ,$$
$$MSA = \frac{mJ}{I-1}\sum_i\left(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\right)^2 \; , \quad\text{and}$$
$$MSB = \frac{mI}{J-1}\sum_j\left(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}\right)^2$$
are independent random variables, that
$$EMSE = \sigma^2 \; ,$$
$$EMSAB = \sigma^2 + m\sigma^2_{\alpha\beta} \; ,$$
$$EMSA = \sigma^2 + m\sigma^2_{\alpha\beta} + mJ\sigma^2_\alpha \; , \quad\text{and}$$
$$EMSB = \sigma^2 + m\sigma^2_{\alpha\beta} + mI\sigma^2_\beta \; ,$$
Table 1.1: Two-Way Balanced Data Random Effects Analysis ANOVA Table

Source                  SS      df              MS      EMS
Parts                   SSA     $I-1$           MSA     $\sigma^2 + m\sigma^2_{\alpha\beta} + mJ\sigma^2_\alpha$
Operators               SSB     $J-1$           MSB     $\sigma^2 + m\sigma^2_{\alpha\beta} + mI\sigma^2_\beta$
Parts$\times$Operators  SSAB    $(I-1)(J-1)$    MSAB    $\sigma^2 + m\sigma^2_{\alpha\beta}$
Error                   SSE     $(m-1)IJ$       MSE     $\sigma^2$
Total                   SSTot   $mIJ-1$

and that the quantities
$$\frac{(m-1)IJ\cdot MSE}{EMSE} \; , \quad \frac{(I-1)(J-1)MSAB}{EMSAB} \; , \quad \frac{(I-1)MSA}{EMSA} \quad\text{and}\quad \frac{(J-1)MSB}{EMSB}$$
are $\chi^2$ random variables with respective degrees of freedom
$$(m-1)IJ \; , \quad (I-1)(J-1) \; , \quad (I-1) \quad\text{and}\quad (J-1) \; .$$
These facts about sums of squares and mean squares for the two-way random
effects model are often summarized in the usual (two-way random effects model)
ANOVA table, Table 1.1. (The sums of squares are simply the mean squares
multiplied by the degrees of freedom. More on the interpretation of such tables
can be found in places like §8-4 of V.)
As a matter of fact, the ANOVA error mean square is exactly $s^2_{\text{pooled}}$ from
§1.3 above. Further, the expected mean squares suggest ways of producing
sensible estimators of other parametric functions of interest in gage R&R contexts
(see V&J page 27 in this regard). For example, note that
$$\sigma^2_{\text{reproducibility}} = \frac{1}{mI}EMSB + \frac{1}{m}\left(1 - \frac{1}{I}\right)EMSAB - \frac{1}{m}EMSE \; ,$$
which suggests the ANOVA-based estimator
$$\hat{\sigma}^2_{\text{reproducibility}} = \max\left(0,\; \frac{1}{mI}MSB + \frac{1}{m}\left(1 - \frac{1}{I}\right)MSAB - \frac{1}{m}MSE\right) \; . \quad (1.2)$$
What may or may not be well known is that this estimator (1.2) is exactly the
estimator of $\sigma^2_{\text{reproducibility}}$ in display (1.1).
Since many common estimators of quantities of interest in gage R&R studies
are functions of mean squares, it is useful to have at least some crude standard
errors for them. These can be derived from the delta method/propagation of
error/Taylor series argument provided in the appendix to these notes. For
example, if $MS_i$, $i = 1, \ldots, k$ are independent random variables, with $\nu_i MS_i/EMS_i$
having a $\chi^2_{\nu_i}$ distribution, consider a function of $k$ real variables $f(x_1, \ldots, x_k)$ and
the random variable
$$U = f(MS_1, MS_2, \ldots, MS_k) \; .$$
Propagation of error arguments produce the approximation
$$\operatorname{Var} U \approx \sum_{i=1}^{k}\left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \operatorname{Var} MS_i = \sum_{i=1}^{k}\left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \frac{2(EMS_i)^2}{\nu_i} \; ,$$
and upon substituting mean squares for their expected values, one has a standard
error for $U$, namely
$$\sqrt{\widehat{\operatorname{Var}}\, U} = \sqrt{2\sum_{i=1}^{k}\left(\left.\frac{\partial f}{\partial x_i}\right|_{MS_1, MS_2, \ldots, MS_k}\right)^2 \frac{(MS_i)^2}{\nu_i}} \; . \quad (1.3)$$
In the special case where the function of the mean squares of interest is linear
in them, say
$$U = \sum_{i=1}^{k} c_i MS_i \; ,$$
the standard error specializes to
$$\sqrt{\widehat{\operatorname{Var}}\, U} = \sqrt{2\sum_{i=1}^{k} c_i^2 \frac{(MS_i)^2}{\nu_i}} \; ,$$
which provides at least a crude method of producing standard errors for $\hat{\sigma}^2_{\text{reproducibility}}$
and $\hat{\sigma}^2_{\text{overall}}$. Such standard errors are useful in giving some indication of the
precision with which the quantities of interest in a gage R&R study have been
estimated.
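Since the estimator (1.2) is linear in $MSB$, $MSAB$ and $MSE$, the linear-case standard error applies directly. A Python sketch with made-up mean squares (the study dimensions and numerical values below are hypothetical, chosen only for illustration):

```python
import math

def linear_ms_se(coefs, mean_squares, dfs):
    """Standard error of U = sum c_i * MS_i via the linear-case formula
    sqrt(2 * sum c_i^2 * MS_i^2 / nu_i)."""
    return math.sqrt(2.0 * sum(c * c * ms * ms / df
                               for c, ms, df in zip(coefs, mean_squares, dfs)))

# Hypothetical balanced gage R&R study: I = 10 parts, J = 3 operators, m = 2.
I, J, m = 10, 3, 2
MSB, MSAB, MSE = 2.50, 0.80, 0.40  # made-up mean squares
coefs = [1.0 / (m * I), (1.0 - 1.0 / I) / m, -1.0 / m]  # display (1.2)
dfs = [J - 1, (I - 1) * (J - 1), I * J * (m - 1)]

point = max(0.0, sum(c * ms for c, ms in zip(coefs, [MSB, MSAB, MSE])))
se = linear_ms_se(coefs, [MSB, MSAB, MSE], dfs)
print(round(point, 4), round(se, 4))
```

(The sign of the $MSE$ coefficient is immaterial in the standard error, since only $c_i^2$ enters the formula.)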
1.5 Confidence Intervals for Gage R&R Studies

The parametric functions of interest in gage R&R studies (indeed in all random
effects analyses) are functions of variance components, or equivalently, functions
of expected mean squares. It is thus possible to apply theory for estimating such
quantities to the problem of assessing precision of estimation in a gage study.
As a first (and very crude) example of this, note that taking the point of view of
§1.4 above, where $U = f(MS_1, MS_2, \ldots, MS_k)$ is a sensible point estimator of
an interesting function of the variance components and $\sqrt{\widehat{\operatorname{Var}}\, U}$ is the standard
error (1.3), simple approximate two-sided 95% confidence limits can be made as
$$U \pm 1.96\sqrt{\widehat{\operatorname{Var}}\, U} \; .$$
These limits have the virtue of being amenable to hand calculation from the
ANOVA sums of squares, but they are not likely to be reliable (in terms of
holding their nominal/asymptotic coverage probability) for $I$, $J$ or $m$ small.

Linear models experts have done substantial research aimed at finding reliable
confidence interval formulas for important functions of expected mean
squares. For example, the book Confidence Intervals on Variance Components
by Burdick and Graybill gives results (on the so-called modified large sample
method) that can be used to make confidence intervals on various important
functions of variance components. The following is some material taken from
Sections 3.2 and 3.3 of the Burdick and Graybill book.
Suppose that $MS_1, MS_2, \ldots, MS_k$ are $k$ independent mean squares. (The
$MS_i$ are of the form $SS_i/\nu_i$, where $SS_i/EMS_i = \nu_i MS_i/EMS_i$ has a $\chi^2_{\nu_i}$
distribution.) For $1 \le p < k$ and positive constants $c_1, c_2, \ldots, c_k$ suppose that
the quantity
$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p - c_{p+1}EMS_{p+1} - \cdots - c_k EMS_k \quad (1.4)$$
is of interest. Let
$$\hat{\theta} = c_1 MS_1 + \cdots + c_p MS_p - c_{p+1}MS_{p+1} - \cdots - c_k MS_k \; .$$
Approximate confidence limits on $\theta$ in display (1.4) are of the form
$$L = \hat{\theta} - \sqrt{V_L} \quad\text{and/or}\quad U = \hat{\theta} + \sqrt{V_U} \; ,$$
for $V_L$ and $V_U$ defined below.

Let $F_{\alpha:df_1,df_2}$ be the upper $\alpha$ point of the $F$ distribution with $df_1$ and $df_2$
degrees of freedom. (It is then the case that $F_{\alpha:df_1,df_2} = (F_{1-\alpha:df_2,df_1})^{-1}$.) Also,
let $\chi^2_{\alpha:df}$ be the upper $\alpha$ point of the $\chi^2_{df}$ distribution. With this notation
$$V_L = \sum_{i=1}^{p} c_i^2 MS_i^2 G_i^2 + \sum_{i=p+1}^{k} c_i^2 MS_i^2 H_i^2 + \sum_{i=1}^{p}\sum_{j=p+1}^{k} c_i c_j MS_i MS_j G_{ij} + \sum_{i=1}^{p-1}\sum_{j>i}^{p} c_i c_j MS_i MS_j G^*_{ij} \; ,$$
for
$$G_i = 1 - \frac{\nu_i}{\chi^2_{\alpha:\nu_i}} \; ,$$
$$H_i = \frac{\nu_i}{\chi^2_{1-\alpha:\nu_i}} - 1 \; ,$$
$$G_{ij} = \frac{\left(F_{\alpha:\nu_i,\nu_j} - 1\right)^2 - G_i^2 F^2_{\alpha:\nu_i,\nu_j} - H_j^2}{F_{\alpha:\nu_i,\nu_j}} \; ,$$
and
$$G^*_{ij} = \begin{cases} 0 & \text{if } p = 1 \\ \dfrac{1}{p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2\dfrac{(\nu_i+\nu_j)^2}{\nu_i\nu_j} - \dfrac{G_i^2\nu_i}{\nu_j} - \dfrac{G_j^2\nu_j}{\nu_i}\right) & \text{otherwise} \; . \end{cases}$$
On the other hand,
$$V_U = \sum_{i=1}^{p} c_i^2 MS_i^2 H_i^2 + \sum_{i=p+1}^{k} c_i^2 MS_i^2 G_i^2 + \sum_{i=1}^{p}\sum_{j=p+1}^{k} c_i c_j MS_i MS_j H_{ij} + \sum_{i=p+1}^{k-1}\sum_{j>i}^{k} c_i c_j MS_i MS_j H^*_{ij} \; ,$$
for $G_i$ and $H_i$ as defined above, and
$$H_{ij} = \frac{\left(1 - F_{1-\alpha:\nu_i,\nu_j}\right)^2 - H_i^2 F^2_{1-\alpha:\nu_i,\nu_j} - G_j^2}{F_{1-\alpha:\nu_i,\nu_j}} \; ,$$
and
$$H^*_{ij} = \begin{cases} 0 & \text{if } k = p + 1 \\ \dfrac{1}{k-p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2\dfrac{(\nu_i+\nu_j)^2}{\nu_i\nu_j} - \dfrac{G_i^2\nu_i}{\nu_j} - \dfrac{G_j^2\nu_j}{\nu_i}\right) & \text{otherwise} \; . \end{cases}$$
One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1-\alpha)$ and the interval $(L, U)$
for confidence level $(1-2\alpha)$. (Using these formulas for hand calculation is
(obviously) no picnic. The C program written by Brandon Paris (available off
the Stat 531 Web page) makes these calculations painless.)
A problem similar to the estimation of quantity (1.4) is that of estimating
$$\gamma = c_1 EMS_1 + \cdots + c_p EMS_p \quad (1.5)$$
for $p \ge 1$ and positive constants $c_1, c_2, \ldots, c_p$. In this case let
$$\hat{\gamma} = c_1 MS_1 + \cdots + c_p MS_p \; ,$$
and continue the $G_i$ and $H_i$ notation from above. Then approximate confidence
limits on $\gamma$ given in display (1.5) are of the form
$$L = \hat{\gamma} - \sqrt{\sum_{i=1}^{p} c_i^2 MS_i^2 G_i^2} \quad\text{and/or}\quad U = \hat{\gamma} + \sqrt{\sum_{i=1}^{p} c_i^2 MS_i^2 H_i^2} \; .$$
One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1-\alpha)$ and the interval $(L, U)$
for confidence level $(1-2\alpha)$.
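The display (1.5) limits are simple enough to sketch directly. The following Python fragment (illustrative only, not the programs referenced in these notes; it assumes SciPy is available for $\chi^2$ quantiles, and note that the "upper $\alpha$ point" $\chi^2_{\alpha:\nu}$ is `chi2.ppf(1 - alpha, nu)`) computes $L$ and $U$ for hypothetical mean squares:

```python
import math
from scipy.stats import chi2

def mls_limits(coefs, mean_squares, dfs, alpha=0.05):
    """Modified-large-sample limits for gamma = sum c_i * EMS_i, all c_i > 0.
    Returns (L, U); each is a one-sided (1 - alpha) confidence limit."""
    gamma_hat = sum(c * ms for c, ms in zip(coefs, mean_squares))
    VL = VU = 0.0
    for c, ms, nu in zip(coefs, mean_squares, dfs):
        G = 1.0 - nu / chi2.ppf(1.0 - alpha, nu)  # uses the upper alpha point
        H = nu / chi2.ppf(alpha, nu) - 1.0        # uses the upper 1-alpha point
        VL += (c * ms * G) ** 2
        VU += (c * ms * H) ** 2
    return gamma_hat - math.sqrt(VL), gamma_hat + math.sqrt(VU)

# hypothetical mean squares, degrees of freedom and coefficients
L, U = mls_limits([0.5, 0.5], [1.2, 0.8], [10, 20], alpha=0.05)
print(round(L, 3), round(U, 3))
```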
The Fortran program written by Andy Chiang (available off the Stat 531
Web page) applies Burdick and Graybill-like material and the standard errors
(1.3) to the estimation of many parametric functions of relevance in gage R&R
studies.

Chiang's 2000 Ph.D. dissertation work (to appear in Technometrics in August
2001) has provided an entirely different method of interval estimation of
functions of variance components that is a uniform improvement over the modified
large sample methods presented by Burdick and Graybill. His approach
is related to improper Bayes methods with so-called "Jeffreys priors." Andy
has provided software for implementing his methods that, as time permits, will
be posted on the Stat 531 Web page. He can be contacted (for preprints of his
work) at stackl@nus.edu.sg at the National University of Singapore.
1.6 Calibration and Regression Analysis

The estimation of standard deviations and variance components is a contribution
of the subject of statistics to the quantification of measurement system
precision. The subject also has contributions to make in the matter of improving
measurement accuracy. Calibration is the business of bringing a local
measurement system in line with a standard measurement system. One takes
measurements $y$ with a gage or system of interest on test items with known
values $x$ (available because they were previously measured using a "gold standard"
measurement device). The data collected are then used to create a conversion
scheme for translating local measurements to approximate gold standard
measurements, thereby hopefully improving local accuracy. In this short section
we note that usual regression methodology has implications in this kind of
enterprise.

The usual polynomial regression model says that $n$ observed random values
$y_i$ are related to fixed values $x_i$ via
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i \quad (1.6)$$
for iid Normal$(0, \sigma^2)$ random variables $\varepsilon_i$. The parameters $\beta$ and $\sigma$ are the
usual objects of inference in this model. In the calibration context with $x$ a
gold standard value, $\sigma$ quantifies precision for the local measurement system.
Often (at least over a limited range of $x$) 1) a low order polynomial does a good
job of describing the observed $x$-$y$ relationship between local and gold standard
measurements and 2) the usual (least squares) fitted relationship
$$\hat{y} = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$$
has an inverse $g^{-1}(y)$. When such is the case, given a measurement $y_{n+1}$ from
the local measurement system, it is plausible to estimate that a corresponding
measurement from the gold standard system would be $\hat{x}_{n+1} = g^{-1}(y_{n+1})$. A
reasonable question is then "How good is this estimate?" That is, the matter
of confidence interval estimation of $x_{n+1}$ is important.
One general method for producing such confidence sets for $x_{n+1}$ is based on
the usual "prediction interval" methodology associated with the model (1.6).
That is, for a given $x$, it is standard (see, e.g., §9-2 of V or §9.2.4 of V&J#2) to
produce a prediction interval of the form
$$\hat{y} \pm t\sqrt{s^2 + \left(\text{std error}(\hat{y})\right)^2}$$
for an additional corresponding $y$. And those intervals have the property that
for all choices of $x, \sigma, \beta_0, \beta_1, \beta_2, \ldots, \beta_k$
$$P_{x,\sigma,\beta_0,\beta_1,\beta_2,\ldots,\beta_k}\left[y \text{ is in the prediction interval at } x\right] = \text{desired confidence level} = 1 - P\left[\text{a } t_{n-k-1} \text{ random variable exceeds } |t|\right] \; .$$
But rewording only slightly, the event
$$y \text{ is in the prediction interval at } x$$
is the same as the event
$$x \text{ produces a prediction interval including } y \; .$$
So a confidence set for $x_{n+1}$ based on the observed value $y_{n+1}$ is
$$\left\{x \mid \text{the prediction interval corresponding to } x \text{ includes } y_{n+1}\right\} \; . \quad (1.7)$$
Conceptually, one simply makes prediction limits around the fitted relationship
$\hat{y} = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$ and then upon observing a new $y$ sees
what $x$'s are consistent with that observation. This produces a confidence set
with the desired confidence level.

The only real difficulties with the above general prescription are 1) the lack of
simple explicit formulas and 2) the fact that when $\sigma$ is large (so that the regression
$\sqrt{MSE}$ tends to be large) or the fitted relationship is very nonlinear, the
method can produce (completely rational but) unpleasant-looking confidence
sets. The first problem is really of limited consequence in a time when standard
statistical software will automatically produce plots of prediction limits
associated with low order regressions. And the second matter is really inherent
in the problem.
For the (simplest) linear version of this "inverse prediction" problem, there
is an approximate confidence method in common use that doesn't have the
deficiencies of the method (1.7). It is derived from a Taylor series argument and
has its own problems, but is nevertheless worth recording here for completeness'
sake. That is, under the $k = 1$ version of the model (1.6), commonly used
approximate confidence limits for $x_{n+1}$ are (for $\hat{x}_{n+1} = (y_{n+1} - b_0)/b_1$ and
$\bar{x}$ the sample mean of the gold standard measurements from the calibration
experiment)
$$\hat{x}_{n+1} \pm t\,\frac{\sqrt{MSE}}{|b_1|}\sqrt{1 + \frac{1}{n} + \frac{\left(\hat{x}_{n+1} - \bar{x}\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}} \; .$$
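These limits are easy to compute once a simple linear fit is in hand. A Python sketch with hypothetical calibration data (the $t$ multiplier 3.182 used below is the upper 2.5% point of the $t_3$ distribution, appropriate for two-sided 95% limits with $n = 5$ points):

```python
import math

def linear_calibration(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1, MSE, xbar, Sxx)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2), xbar, Sxx

def inverse_limits(y_new, x, y, t):
    """x-hat = (y_new - b0)/b1 and the approximate confidence limits above."""
    b0, b1, mse, xbar, Sxx = linear_calibration(x, y)
    n = len(x)
    xhat = (y_new - b0) / b1
    half = t * math.sqrt(mse) / abs(b1) * math.sqrt(
        1.0 + 1.0 / n + (xhat - xbar) ** 2 / Sxx)
    return xhat - half, xhat, xhat + half

# hypothetical data: gold standard values x, local measurements y
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.9, 8.2, 10.9, 14.1]
L, xhat, U = inverse_limits(9.0, x, y, t=3.182)  # t is the upper .025 point of t_3
print(round(L, 3), round(xhat, 3), round(U, 3))
```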
1.7 Crude Gaging and Statistics

All real-world measurement is "to the nearest something." Often one may ignore
this fact, treat measured values as if they were exact and experience no real
difficulty when using standard statistical methods (that are really based on an
assumption that data are exact). However, sometimes in industrial applications
gaging is crude enough that standard (e.g. normal theory) formulas give
nonsensical results. This section briefly considers what can be done to appropriately
model and draw inferences from crudely gaged data. The assumption
throughout is that what are available are integer data, obtained by coding raw
observations via
$$\text{integer observation} = \frac{\text{raw observation} - \text{some reference value}}{\text{smallest unit of measurement}}$$
(the "smallest unit of measurement" is the "nearest something" above).
1.7.1 Distributions of Sample Means and Ranges from Integer Observations

To begin with something simple, note first that in situations where only a few
different coded values are ever observed, rather than trying to model observations
with some continuous distribution (like a normal one) it may well make
sense to simply employ a discrete pmf, say $f$, to describe any single measurement.
In fact, suppose that a single (crudely gaged) observation $Y$ has a pmf
$f(y)$ such that
$$f(y) = 0 \text{ unless } y = 1, 2, \ldots, M \; .$$
Then if $Y_1, Y_2, \ldots, Y_n$ are iid with this marginal discrete distribution, one can
easily approximate the distribution of a function of these variables via simulation
(using common statistical packages). And for two of the most common statistics
used in QC settings (the sample mean and range) one can even work out exact
probability distributions using computationally feasible and very elementary
methods.
To find the probability distribution of $\bar{Y}$ in this context, one can build up
the probability distributions of sums of iid $Y_i$'s recursively by adding probabilities
on diagonals in two-way joint probability tables. For example, the $n = 2$
distribution of $\bar{Y}$ can be obtained by making out a two-way table of joint
probabilities for $Y_1$ and $Y_2$ and adding on diagonals to get probabilities for $Y_1 + Y_2$.
Then making a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $Y_3$ one
can add on diagonals and find a joint distribution for $Y_1 + Y_2 + Y_3$. Or noting
that the distribution of $Y_3 + Y_4$ is the same as that for $Y_1 + Y_2$, it is possible to
make a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $(Y_3 + Y_4)$, add on
diagonals and find the distribution of $Y_1 + Y_2 + Y_3 + Y_4$. And so on. (Clearly,
after finding the distribution for a sum, one simply divides possible values by $n$
to get the corresponding distribution of $\bar{Y}$.)
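The "adding on diagonals" recursion is exactly discrete convolution of pmfs, and a few lines of Python (a sketch, with a made-up three-point pmf) carry it out exactly:

```python
def convolve(p, q):
    """'Adding on diagonals': exact pmf of the sum of two independent
    integer-valued variables with pmfs p and q (dicts value -> probability)."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def pmf_of_sum(f, n):
    """Exact pmf of Y_1 + ... + Y_n for iid Y_i with pmf f."""
    total = dict(f)
    for _ in range(n - 1):
        total = convolve(total, f)
    return total

# example: a hypothetical crude pmf on the coded values 1, 2, 3
f = {1: 0.2, 2: 0.5, 3: 0.3}
sum4 = pmf_of_sum(f, 4)
# dividing the support by n gives the distribution of Ybar
ybar4 = {s / 4: p for s, p in sum4.items()}
print(ybar4[1.0])  # P[Ybar = 1], which is f(1)^4
```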
To find the probability distribution of $R = \max Y_i - \min Y_i$ (for $Y_i$'s as above)
a feasible computational scheme is as follows. Let
$$S_{kj} = \begin{cases} \sum_{y=k}^{j} f(y) = P[k \le Y \le j] & \text{if } k \le j \\ 0 & \text{otherwise} \end{cases}$$
and compute and store these for $1 \le k, j \le M$. Then define
$$M_{kj} = P[\min Y_i = k \text{ and } \max Y_i = j] \; .$$
Now the event $\{\min Y_i = k$ and $\max Y_i = j\}$ is the event $\{$all observations are
between $k$ and $j$ inclusive$\}$ less the event $\{$the minimum is greater than $k$ or the
maximum is less than $j\}$. Thus, it is straightforward to see that
$$M_{kj} = \left(S_{kj}\right)^n - \left(S_{k+1,j}\right)^n - \left(S_{k,j-1}\right)^n + \left(S_{k+1,j-1}\right)^n$$
and one may compute and store these values. Finally, note that
$$P[R = r] = \sum_{k=1}^{M-r} M_{k,k+r} \; .$$
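This scheme is also a few lines of code. The sketch below (illustrative Python, not the DIST program mentioned next) computes the exact range pmf and can be checked by direct enumeration for small $M$ and $n$:

```python
def range_pmf(f, n, M):
    """Exact pmf of the sample range R = max Y_i - min Y_i for n iid draws
    from pmf f on {1, ..., M}, via the inclusion-exclusion identity
    M_kj = S_kj^n - S_{k+1,j}^n - S_{k,j-1}^n + S_{k+1,j-1}^n."""
    def S(k, j):  # P[k <= Y <= j], zero when k > j
        return sum(f.get(y, 0.0) for y in range(k, j + 1)) if k <= j else 0.0
    pmf = {}
    for r in range(M):
        pmf[r] = sum(S(k, k + r) ** n - S(k + 1, k + r) ** n
                     - S(k, k + r - 1) ** n + S(k + 1, k + r - 1) ** n
                     for k in range(1, M - r + 1))
    return pmf

# example: uniform pmf on {1, 2, 3} with n = 2
f = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
print(range_pmf(f, 2, 3))  # probabilities for R = 0, 1, 2
```

(For the uniform pmf on $\{1,2,3\}$ and $n = 2$, direct enumeration of the 9 equally likely pairs gives $P[R=0] = 3/9$, $P[R=1] = 4/9$ and $P[R=2] = 2/9$, matching the output.)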
These algorithms are good for any distribution $f$ on the integers $1, 2, \ldots, M$.
Karen (Jensen) Hulting's DIST program (available off the Stat 531 Web page)
automates the calculations of the distributions of $\bar{Y}$ and $R$ for certain $f$'s
related to integer rounding of normal observations. (More on this rounding idea
directly.)
1.7.2 Estimation Based on Integer-Rounded Normal Data

The problem of drawing inferences from crudely gaged data is one that has a
history of at least 100 years (if one takes a view that crude gaging essentially
rounds exact values). Sheppard in the late 1800's noted that if one rounds
a continuous variable to integers, the variability in the distribution is typically
increased. He thus suggested not using the sample standard deviation ($s$) of
rounded values, but instead employing what is known as "Sheppard's correction"
to arrive at
$$\sqrt{\frac{(n-1)s^2}{n} - \frac{1}{12}} \quad (1.8)$$
as a suitable estimate of standard deviation for integer-rounded data.
The notion of interval-censoring of fundamentally continuous observations
provides a natural framework for the application of modern statistical theory to
the analysis of crudely gaged data. For univariate $X$ with continuous cdf $F(x|\theta)$
depending upon some (possibly vector) parameter $\theta$, consider $X^*$ derived from
$X$ by rounding to the nearest integer. Then the pmf of $X^*$ is, say,
$$g(x^*|\theta) := \begin{cases} F(x^* + .5\,|\,\theta) - F(x^* - .5\,|\,\theta) & \text{for } x^* \text{ an integer} \\ 0 & \text{otherwise} \; . \end{cases}$$
Rather than doing inference based on the unobservable variables $X_1, X_2, \ldots, X_n$
that are iid $F(x|\theta)$, one might consider inference based on $X^*_1, X^*_2, \ldots, X^*_n$ that
are iid with pmf $g(x^*|\theta)$.
The normal version of this scenario (the "integer-rounded normal data" model)
makes use of
$$g(x^*|\mu,\sigma) := \begin{cases} \Phi\!\left(\dfrac{x^* + .5 - \mu}{\sigma}\right) - \Phi\!\left(\dfrac{x^* - .5 - \mu}{\sigma}\right) & \text{for } x^* \text{ an integer} \\ 0 & \text{otherwise} \; , \end{cases}$$
and the balance of this section will consider the use of this specific important
model. So suppose that $X^*_1, X^*_2, \ldots, X^*_n$ are iid integer-valued random observations
(generated from underlying normal observations by rounding). For an
observed vector of integers $(x^*_1, x^*_2, \ldots, x^*_n)$ it is useful to consider the so-called
"likelihood function" that treats the (joint) probability assigned to the vector
$(x^*_1, x^*_2, \ldots, x^*_n)$ as a function of the parameters,
$$L(\mu,\sigma) := \prod_i g(x^*_i|\mu,\sigma) = \prod_i\left(\Phi\!\left(\frac{x^*_i + .5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{x^*_i - .5 - \mu}{\sigma}\right)\right) \; .$$
The log of this function of $\mu$ and $\sigma$ is (naturally enough) called the "loglikelihood"
and will be denoted as
$$\mathcal{L}(\mu,\sigma) := \ln L(\mu,\sigma) \; .$$
A sensible estimator of the parameter vector $(\mu,\sigma)$ is the point $(\hat{\mu},\hat{\sigma})$ maximizing
the loglikelihood. This prescription for estimation is only partially
complete, depending upon the nature of the sample $x^*_1, x^*_2, \ldots, x^*_n$. There are
three cases to consider, namely:

1. When the sample range of $x^*_1, x^*_2, \ldots, x^*_n$ is at least 2, $\mathcal{L}(\mu,\sigma)$ is well-behaved
(nice and mound-shaped) and numerical maximization or just
looking at contour plots will quickly allow one to maximize the loglikelihood.
(It is worth noting that in this circumstance, usually $\hat{\sigma}$ is close to
the Sheppard corrected value in display (1.8).)

2. When the sample range of $x^*_1, x^*_2, \ldots, x^*_n$ is 1, strictly speaking $\mathcal{L}(\mu,\sigma)$
fails to achieve a maximum. However, with
$$m := \#[x^*_i = \min x^*_i] \; ,$$
$(\mu,\sigma)$ pairs with $\sigma$ small and $\Phi\!\left(\dfrac{\min x^*_i + .5 - \mu}{\sigma}\right) \approx \dfrac{m}{n}$ will have
$$\mathcal{L}(\mu,\sigma) \approx \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) = m\ln m + (n - m)\ln(n - m) - n\ln n \; .$$
That is, in this case one ought to estimate that $\sigma$ is small and the
relationship between $\mu$ and $\sigma$ is such that a fraction $m/n$ of the underlying
normal distribution is to the left of $\min x^*_i + .5$, while a fraction $1 - m/n$
is to the right.

3. When the sample range of $x^*_1, x^*_2, \ldots, x^*_n$ is 0, strictly speaking $\mathcal{L}(\mu,\sigma)$
fails to achieve a maximum. However,
$$\sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) = 0$$
and for any $\mu \in (x^*_1 - .5,\, x^*_1 + .5)$, $\mathcal{L}(\mu,\sigma) \to 0$ as $\sigma \to 0$. That is, in this
case one ought to estimate that $\sigma$ is small and $\mu \in (x^*_1 - .5,\, x^*_1 + .5)$.
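In the well-behaved first case, even a brute-force grid search over $(\mu,\sigma)$ suffices to maximize the loglikelihood. The following Python sketch (illustrative only, not the CONEST program mentioned below; grid limits and step size are arbitrary choices) does exactly that:

```python
import math

def Phi(z):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik(mu, sigma, data):
    """Rounded-normal loglikelihood: each integer x* contributes
    ln( Phi((x* + .5 - mu)/sigma) - Phi((x* - .5 - mu)/sigma) )."""
    total = 0.0
    for x in data:
        p = Phi((x + 0.5 - mu) / sigma) - Phi((x - 0.5 - mu) / sigma)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total

def grid_mle(data, step=0.02):
    """Case-1 (sample range >= 2) maximization by brute-force grid search."""
    lo, hi = min(data) - 1.0, max(data) + 1.0
    best = (-math.inf, None, None)
    mu = lo
    while mu <= hi:
        sigma = step
        while sigma <= (hi - lo):
            ll = loglik(mu, sigma, data)
            if ll > best[0]:
                best = (ll, mu, sigma)
            sigma += step
        mu += step
    return best[1], best[2]

mu_hat, sigma_hat = grid_mle([3, 4, 4, 4, 5])
print(round(mu_hat, 2), round(sigma_hat, 2))
```

For this sample, $\hat{\sigma}$ can be compared with the Sheppard corrected value from display (1.8); the two should be close, as noted in case 1 above.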
Beyond the making of point estimates, the loglikelihood function can provide
approximate confidence sets for the parameters $\mu$ and/or $\sigma$. Standard large
sample statistical theory says that (for large $n$ and $\chi^2_{\alpha:\nu}$ the upper $\alpha$ point of
the $\chi^2_\nu$ distribution):

1. An approximate $(1-\alpha)$ level confidence set for the parameter vector $(\mu,\sigma)$
is
$$\left\{(\mu,\sigma) \;\middle|\; \mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \frac{1}{2}\chi^2_{\alpha:2}\right\} \; . \quad (1.9)$$

2. An approximate $(1-\alpha)$ level confidence set for the parameter $\mu$ is
$$\left\{\mu \;\middle|\; \sup_{\sigma}\mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \frac{1}{2}\chi^2_{\alpha:1}\right\} \; . \quad (1.10)$$

3. An approximate $(1-\alpha)$ level confidence set for the parameter $\sigma$ is
$$\left\{\sigma \;\middle|\; \sup_{\mu}\mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \frac{1}{2}\chi^2_{\alpha:1}\right\} \; . \quad (1.11)$$
Several comments and a fuller discussion are in order regarding these confidence
sets. In the first place, Karen (Jensen) Hulting's CONEST program
(available off the Stat 531 Web page) is useful in finding $\sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma)$ and
producing rough contour plots of the (joint) sets for $(\mu,\sigma)$ in display (1.9). Second,
it is common to call the function of $\mu$ defined by
$$\mathcal{L}^*(\mu) = \sup_{\sigma}\mathcal{L}(\mu,\sigma)$$
the "profile loglikelihood" function for $\mu$ and the function of $\sigma$
$$\mathcal{L}^*(\sigma) = \sup_{\mu}\mathcal{L}(\mu,\sigma)$$
the "profile loglikelihood" function for $\sigma$. Note that display (1.10) then says that
the confidence set should consist of those $\mu$'s for which the profile loglikelihood
is not too much smaller than the maximum achievable. And something entirely
analogous holds for the sets in (1.11). Johnson Lee (in 2001 Ph.D. dissertation
work) has carefully studied these confidence interval estimation problems and
determined that some modification of methods (1.10) and (1.11) is necessary in
order to provide guaranteed coverage probabilities for small sample sizes. (It
is also very important to realize that contrary to naive expectations, not even
a large sample size will make the usual $t$-intervals for $\mu$ and $\chi^2$-intervals for $\sigma$
hold their nominal confidence levels in the event that $\sigma$ is small, i.e. that the
rounding or crudeness of the gaging is important. Ignoring the rounding when
it is important can produce actual confidence levels near 0 for methods with
large nominal confidence levels.)
Table 1.2: $\Delta$ for 0-Range Samples Based on Very Small $n$

          $\alpha$
 $n$    .05      .10      .20
 2      3.084    1.547    .785
 3               .776     .562
 4                        .517
Intervals for a Normal Mean Based on Integer-Rounded Data

Specifically regarding the sets for $\mu$ in display (1.10), Lee (in work to appear in
the Journal of Quality Technology) has shown that one must replace the value
$\chi^2_{\alpha:1}$ with something larger in order to get small $n$ actual confidence levels not
too far from nominal for most $(\mu,\sigma)$. In fact, the choice
$$c(n,\alpha) = n\ln\left(\frac{t^2_{\frac{\alpha}{2}:(n-1)}}{n-1} + 1\right)$$
(for $t_{\frac{\alpha}{2}:(n-1)}$ the upper $\frac{\alpha}{2}$ point of the $t$ distribution with $\nu = n - 1$ degrees of
freedom) is appropriate.
After replacing $\chi^2_{\alpha:1}$ with $c(n,\alpha)$ in display (1.10) there remains the numerical
analysis problem of actually finding the interval prescribed by the display.
The nature of the numerical analysis required depends upon the sample range
encountered in the crudely gaged data. Provided the range is at least 2, $\mathcal{L}^*(\mu)$
is well-behaved (continuous and "mound-shaped") and even simple trial and
error with Karen (Jensen) Hulting's CONEST program will quickly produce
the necessary interval. When the range is 0 or 1, $\mathcal{L}^*(\mu)$ has respectively 2 or 1
discontinuities and the numerical analysis is a bit trickier. Lee has recorded the
results of the numerical analysis for small sample sizes and $\alpha = .05$, $.10$ and $.20$
(confidence levels respectively 95%, 90% and 80%).
When a sample of size n produces range 0 with, say, all observations equal
to x

, the intuition that one ought to estimate 2 (x

:5; x

+ :5) is sound
unless n is very small. If n and are as recorded in Table 1.2 then display
(1.10) (modied by the use of c(n; ) in place of
2
:1
) leads to the interval
(x

; x

+ ). (Otherwise it leads to (x

:5; x

+:5) for these .)


In the case that a sample of size n produces range 1 with, say, all observations
x* or x* + 1, the interval prescribed by display (1.10) (with c(n, α) used in place
of χ²_{.1}) can be thought of as having the form (x* + .5 − ∆_L, x* + .5 + ∆_U) where
∆_L and ∆_U depend upon

    n_{x*} = #[observations = x*] and n_{x*+1} = #[observations = x* + 1] .   (1.12)

When n_{x*} ≥ n_{x*+1}, it is the case that ∆_L ≥ ∆_U. And when n_{x*} ≤ n_{x*+1},
correspondingly ∆_L ≤ ∆_U. Let

    m = max{n_{x*}, n_{x*+1}}   (1.13)
1.7. CRUDE GAGING AND STATISTICS 17
Table 1.3: (∆_1, ∆_2) for Range 1 Samples Based on Small n

  n   m      α = .05           α = .10           α = .20
  2   1   (6.147, 6.147)    (3.053, 3.053)    (1.485, 1.485)
  3   2   (1.552, 1.219)    (1.104, 0.771)    (0.765, 0.433)
  4   3   (1.025, 0.526)    (0.082, 0.323)    (0.639, 0.149)
      2   (0.880, 0.880)    (0.646, 0.646)    (0.441, 0.441)
  5   4   (0.853, 0.257)    (0.721, 0.132)    (0.592, 0.024)
      3   (0.748, 0.548)    (0.592, 0.339)    (0.443, 0.248)
  6   5   (0.772, 0.116)    (0.673, 0.032)    (0.569, 0.000)
      4   (0.680, 0.349)    (0.562, 0.235)    (0.444, 0.126)
      3   (0.543, 0.543)    (0.420, 0.420)    (0.299, 0.299)
  7   6   (0.726, 0.035)    (0.645, 0.000)    (0.556, 0.000)
      5   (0.640, 0.218)    (0.545, 0.130)    (0.446, 0.046)
      4   (0.534, 0.393)    (0.432, 0.293)    (0.329, 0.193)
  8   7   (0.698, 0.000)    (0.626, 0.000)    (0.547, 0.000)
      6   (0.616, 0.129)    (0.534, 0.058)    (0.446, 0.000)
      5   (0.527, 0.281)    (0.439, 0.197)    (0.347, 0.113)
      4   (0.416, 0.416)    (0.327, 0.327)    (0.236, 0.236)
  9   8   (0.677, 0.000)    (0.613, 0.000)    (0.541, 0.000)
      7   (0.599, 0.065)    (0.526, 0.010)    (0.448, 0.000)
      6   (0.521, 0.196)    (0.443, 0.124)    (0.361, 0.054)
      5   (0.429, 0.321)    (0.350, 0.242)    (0.267, 0.163)
 10   9   (0.662, 0.000)    (0.604, 0.000)    (0.537, 0.000)
      8   (0.587, 0.020)    (0.521, 0.000)    (0.450, 0.000)
      7   (0.515, 0.129)    (0.446, 0.069)    (0.371, 0.012)
      6   (0.437, 0.242)    (0.365, 0.174)    (0.289, 0.105)
      5   (0.346, 0.346)    (0.275, 0.275)    (0.200, 0.200)
and correspondingly take

    ∆_1 = max{∆_L, ∆_U} and ∆_2 = min{∆_L, ∆_U} .

Table 1.3 then gives values for ∆_1 and ∆_2 for n ≤ 10 and α = .05, .10 and .20.
Intervals for a Normal Standard Deviation Based on Integer-Rounded Data

Specifically regarding the sets for σ in display (1.11), Lee found that in order
to get small-n actual confidence levels not too far from nominal, one must not
only replace the value χ²_{.1} with something larger, but must make an additional
adjustment for samples with ranges 0 and 1.

Consider first replacing χ²_{.1} in display (1.11) with a (larger) value d(n, α)
given in Table 1.4. Lee found that for those (µ, σ) with moderate to large σ,
Table 1.4: d(n, α) for Use in Estimating σ

     n     α = .05   α = .10
     2      10.47      7.71
     3       7.26      5.23
     4       6.15      4.39
     5       5.58      3.97
     6       5.24      3.71
     7       5.01      3.54
     8       4.84      3.42
     9       4.72      3.33
    10       4.62      3.26
    15       4.34      3.06
    20       4.21      2.97
    30       4.08      2.88
     ∞       3.84      2.71
making this d(n, α) for χ²_{.1} substitution is enough to produce an actual con-
fidence level approximating the nominal one. However, even this modification
is not adequate to produce an acceptable coverage probability for (µ, σ) with
small σ.

For samples with range 0 or 1, formula (1.11) prescribes intervals of the form
(0, U). And reasoning that when σ is small, samples will typically have range
0 or 1, Lee was able to find (larger) replacements for the limit U prescribed by
(1.11) so that the resulting estimation method has actual confidence level not
much below the nominal level for any (µ, σ) (with σ large or small).
That is, if a 0-range sample is observed, estimate σ by

    (0, ∆_0)

where ∆_0 is taken from Table 1.5. If a range 1 sample is observed consisting,
say, of values x* and x* + 1, and n_{x*}, n_{x*+1} and m are as in displays (1.12) and
(1.13), estimate σ using

    (0, ∆_{1,m})

where ∆_{1,m} is taken from Table 1.6.

The use of these values ∆_0 for range 0 samples, and ∆_{1,m} for range 1 samples,
and the values d(n, α) in place of χ²_{.1} in display (1.11) finally produces a reliable
method of confidence interval estimation for σ when normal data are integer-
rounded.
Table 1.5: ∆_0 for Use in Estimating σ

     n     α = .05   α = .10
     2      5.635     2.807
     3      1.325     0.916
     4      0.822     0.653
     5      0.666     0.558
     6      0.586     0.502
     7      0.533     0.464
     8      0.495     0.435
     9      0.466     0.413
    10      0.443     0.396
    11      0.425     0.381
    12      0.409     0.369
    13      0.396     0.358
    14      0.384     0.349
    15      0.374     0.341
Table 1.6: ∆_{1,m} for Use in Estimating σ (m in Parentheses)

  n   α = .05                                                      α = .10
  2   16.914(1)                                                    8.439(1)
  3   3.535(2)                                                     2.462(2)
  4   1.699(3) 2.034(2)                                            1.303(3) 1.571(2)
  5   1.143(4) 1.516(3)                                            0.921(4) 1.231(3)
  6   0.897(5) 1.153(4) 1.285(3)                                   0.752(5) 0.960(4) 1.054(3)
  7   0.768(6) 0.944(5) 1.106(4)                                   0.660(6) 0.800(5) 0.949(4)
  8   0.687(7) 0.819(6) 0.952(5) 1.009(4)                          0.599(7) 0.707(6) 0.825(5) 0.880(4)
  9   0.629(8) 0.736(7) 0.837(6) 0.941(5)                          0.555(8) 0.644(7) 0.726(6) 0.831(5)
 10   0.585(9) 0.677(8) 0.747(7) 0.851(6) 0.890(5)                 0.520(9) 0.597(8) 0.654(7) 0.753(6) 0.793(5)
 11   0.550(10) 0.630(9) 0.690(8) 0.775(7) 0.851(6)                0.493(10) 0.560(9) 0.609(8) 0.685(7) 0.763(6)
 12   0.522(11) 0.593(10) 0.646(9) 0.708(8) 0.789(7) 0.818(6)      0.470(11) 0.531(10) 0.573(9) 0.626(8) 0.707(7) 0.738(6)
 13   0.499(12) 0.563(11) 0.610(10) 0.658(9) 0.733(8) 0.791(7)     0.452(12) 0.506(11) 0.544(10) 0.587(9) 0.655(8) 0.716(7)
 14   0.479(13) 0.537(12) 0.580(11) 0.622(10) 0.681(9) 0.745(8) 0.768(7)   0.436(13) 0.485(12) 0.520(11) 0.558(10) 0.607(9) 0.674(8) 0.698(7)
 15   0.463(14) 0.515(13) 0.555(12) 0.593(11) 0.639(10) 0.701(9) 0.748(8)  0.422(14) 0.468(13) 0.499(12) 0.534(11) 0.574(10) 0.632(9) 0.682(8)
Chapter 2
Process Monitoring
Chapters 3 and 4 of V&J discuss methods for process monitoring. The key
concept there regarding the probabilistic description of monitoring schemes is
the run length idea introduced on page 91 and specifically in display (3.44).
Theory for describing run lengths is given in V&J only for the very simplest case
of geometrically distributed T. This chapter presents some more general tools
for the analysis/comparison of run length distributions of monitoring schemes,
namely discrete time finite state Markov chains and recursions expressed in
terms of integral (and difference) equations.
2.1 Some Theory for Stationary Discrete Time
Finite State Markov Chains With a Single
Absorbing State
These are probability models for random systems that at times t = 1, 2, 3, ...
can be in one of a finite number of states

    S_1, S_2, ..., S_m, S_{m+1} .

The Markov assumption is that the conditional distribution of where the
system is at time t + 1 given the entire history of where it has been up through
time t only depends upon where it is at time t. (In colloquial terms: "The
conditional distribution of where I'll be tomorrow given where I am and how I got
here depends only on where I am, not on how I got here.") So-called stationary
Markov Chain (MC) models employ the assumption that movement between
states from any time t to time t + 1 is governed by a (single) matrix of (one-
step) transition probabilities (that is independent of t)

    P_{(m+1)×(m+1)} = (p_{ij})

where

    p_{ij} = P[system is in S_j at time t + 1 | system is in S_i at time t] .
Figure 2.1: Schematic for a MC with Transition Matrix (2.1)
As a simple example of this, consider the transition matrix

    P_{3×3} = [ .8   .1    .1
                .9   .05   .05
                 0    0     1  ]   (2.1)

Figure 2.1 is a useful schematic representation of this model.
The Markov Chain represented by Figure 2.1 has an interesting property.
That is, while it is possible to move back and forth between states 1 and 2,
once the system enters state 3, it is stuck there. The standard jargon for this
property is to say that S_3 is an absorbing state. (In general, if p_{ii} = 1, S_i is
called an absorbing state.)
Of particular interest in applications of MCs to the description of process
monitoring schemes are chains with a single absorbing state, say S_{m+1}, where it
is possible to move (at least eventually) from any other state to the absorbing
state. One thing that makes these chains so useful is that it is very easy to
write down a matrix formula for a vector giving the mean number of transitions
required to reach S_{m+1} from any of the other states. That is, with

    L_i = the mean number of transitions required to move from S_i to S_{m+1} ,

    L_{m×1} = (L_1, L_2, ..., L_m)' ,

    P_{(m+1)×(m+1)} = [ R_{m×m}   r_{m×1}
                        0_{1×m}   1_{1×1} ] ,   and   1_{m×1} = (1, 1, ..., 1)' ,

it is the case that

    L = (I − R)^{−1} 1 .   (2.2)
To argue that display (2.2) is correct, note that the following system of m
equations clearly holds:

    L_1 = (1 + L_1)p_{11} + (1 + L_2)p_{12} + ··· + (1 + L_m)p_{1m} + 1·p_{1,m+1}
    L_2 = (1 + L_1)p_{21} + (1 + L_2)p_{22} + ··· + (1 + L_m)p_{2m} + 1·p_{2,m+1}
      ⋮
    L_m = (1 + L_1)p_{m1} + (1 + L_2)p_{m2} + ··· + (1 + L_m)p_{mm} + 1·p_{m,m+1} .
But this set is equivalent to the set

    L_1 = 1 + p_{11}L_1 + p_{12}L_2 + ··· + p_{1m}L_m
    L_2 = 1 + p_{21}L_1 + p_{22}L_2 + ··· + p_{2m}L_m
      ⋮
    L_m = 1 + p_{m1}L_1 + p_{m2}L_2 + ··· + p_{mm}L_m

and in matrix notation, this second set of equations is

    L = 1 + RL .   (2.3)

So

    L − RL = 1 ,

i.e.

    (I − R)L = 1 .

Under the conditions of the present discussion it is the case that (I − R) is
guaranteed to be nonsingular, so that multiplying both sides of this matrix
equation by the inverse of (I − R) one finally has equation (2.2).
For the simple 3-state example with transition matrix (2.1) it is easy enough
to verify that with

    R = [ .8   .1
          .9   .05 ]

one has

    (I − R)^{−1} 1 = [ 10.5
                       11   ] .

That is, the mean number of transitions required for absorption (into S_3) from
S_1 is 10.5 while the mean number required from S_2 is 11.0.
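As a quick numerical check of display (2.2) and the example above, one might compute L = (I − R)^{−1}1 directly. The following is a small sketch (using numpy; not part of the original development):

```python
import numpy as np

# One-step transition matrix (2.1); S3 is the absorbing state
P = np.array([[0.8, 0.1, 0.1],
              [0.9, 0.05, 0.05],
              [0.0, 0.0, 1.0]])

R = P[:2, :2]  # transitions among the non-absorbing states S1, S2

# Display (2.2), computed by solving (I - R) L = 1 rather than inverting
L = np.linalg.solve(np.eye(2) - R, np.ones(2))
print(L)  # mean transitions to absorption from S1 and S2: 10.5 and 11.0
```

Solving the linear system rather than forming the inverse is the usual numerically stable way to evaluate (2.2).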
When one is working with numerical values in P and thus wants numerical
values in L, the matrix formula (2.2) is most convenient for use with numerical
analysis software. When, on the other hand, one has some algebraic expressions
for the p_{ij} and wants algebraic expressions for the L_i, it is usually most effective
to write out the system of equations represented by display (2.3) and to try and
see some slick way of solving for an L_i of interest.

It is also worth noting that while the discussion in this section has centered
on the computation of mean times to absorption, other properties of time to
absorption variables can be derived and expressed in matrix notation. For
example, Problem 2.22 shows that it is fairly easy to find the variance (or
standard deviation) of time to absorption variables.
2.2 Some Applications of Markov Chains to the
Analysis of Process Monitoring Schemes
When the current condition of a process monitoring scheme can be thought
of as a discrete random variable (with a finite number of possible values), because

1. the variables Q_1, Q_2, ... fed into it are intrinsically discrete (for example
representing counts) and are therefore naturally modeled using a discrete
probability distribution (and the calculations prescribed by the scheme
produce only a fixed number of possible outcomes),

2. discretization of the Q's has taken place as a part of the development
of the monitoring scheme (as, for example, in the "zone test" schemes
outlined in Tables 3.5 through 3.7 of V&J), or

3. one approximates continuous distributions for Q's and/or states of the
scheme with a finely-discretized version in order to approximate exact
(continuous) run length properties,

one can often apply the material of the previous section to the prediction of
scheme behavior. (This is possible when the evolution of the monitoring scheme
can be thought of in terms of movement between states where the conditional
distribution of the next state depends only on a distribution for the next Q
which itself depends only on the current state of the scheme.) This section
contains four examples of what can be done in this direction.
As an initial simple example, consider the simple monitoring scheme (sug-
gested in the book Sampling Inspection and Quality Control by Wetherill) that
signals an alarm the first time

1. a single point Q plots outside 3 sigma limits, or

2. two consecutive Q's plot between 2 and 3 sigma limits.

(This is a simple competitor to the sets of alarm rules specified in Tables 3.5
through 3.7 of V&J.) Suppose that one assumes that Q_1, Q_2, ... are iid and

    q_1 = P[Q_1 plots outside 3 sigma limits]

and

    q_2 = P[Q_1 plots between 2 and 3 sigma limits] .

Then one might think of describing the evolution of the monitoring scheme with
a 3-state MC with states

    S_1 = "all is OK,"
    S_2 = "no alarm yet and the current Q is between 2 and 3 sigma limits," and
    S_3 = "alarm."
Figure 2.2: Schematic for a MC with Transition Matrix (2.4)
For this representation, an appropriate transition matrix is

    P = [ 1 − q_1 − q_2    q_2       q_1
          1 − q_1 − q_2     0        q_1 + q_2
                0           0          1       ]   (2.4)

and the ARL of the scheme (under the iid model for the Q sequence) is L_1, the
mean time to absorption into the alarm state from the "all-OK" state. Figure
2.2 is a schematic representation of this scenario.
It is worth noting that a system of equations for L_1 and L_2 is

    L_1 = 1·q_1 + (1 + L_2)q_2 + (1 + L_1)(1 − q_1 − q_2)
    L_2 = 1·(q_1 + q_2) + (1 + L_1)(1 − q_1 − q_2) ,

which is equivalent to

    L_1 = 1 + L_1(1 − q_1 − q_2) + L_2 q_2
    L_2 = 1 + L_1(1 − q_1 − q_2) ,

which is the non-matrix version of the system (2.3) for this example. It is
easy enough to verify that this system of two linear equations in the unknowns
L_1 and L_2 has a (simultaneous) solution with

    L_1 = (1 + q_2) / (1 − (1 − q_1 − q_2) − q_2(1 − q_1 − q_2)) .
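For a concrete version of this scheme one can check the closed form for L_1 against the matrix formula (2.2). The sketch below assumes iid standard normal Q's plotted against exact 2- and 3-sigma limits (an illustrative choice, not part of the original text):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# For iid standard normal Q's plotted against 2- and 3-sigma limits
q1 = 2.0 * (1.0 - Phi(3.0))        # outside 3 sigma limits
q2 = 2.0 * (Phi(3.0) - Phi(2.0))   # between 2 and 3 sigma limits

# Closed-form ARL from the display above
c = 1.0 - q1 - q2
L1_closed = (1.0 + q2) / (1.0 - c - q2 * c)

# Matrix route: L = (I - R)^(-1) 1, with R read off transition matrix (2.4)
R = np.array([[c, q2],
              [c, 0.0]])
L = np.linalg.solve(np.eye(2) - R, np.ones(2))
print(L[0], L1_closed)  # the two ARL computations agree
```

The agreement of the two numbers is exactly the equivalence of displays (2.2) and (2.3) in this small example.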
As a second application of MC technology to the analysis of a process moni-
toring scheme, we will consider a so-called "Run-Sum" scheme. To define such a
scheme, one begins with zones for the variable Q as indicated in Figure 3.9 of
V&J. Then scores are defined for various possible values of Q. For j = 0, 1, 2
a score of +j is assigned to the eventuality that Q is in the positive j-sigma to
(j + 1)-sigma zone, while a score of −j is assigned to the eventuality that Q
is in the negative j-sigma to (j + 1)-sigma zone. A score of +3 is assigned to
any Q above the upper 3-sigma limit while a score of −3 is assigned to any Q
below the lower 3-sigma limit. Then, for the variables Q_1, Q_2, ... one defines
corresponding scores Q*_1, Q*_2, ... and run sums R_1, R_2, ... where

    R_i = the sum of scores Q* through time i, under the provision that a
    new sum is begun whenever a score is observed with a sign different
    from the existing Run-Sum.

(Note, for example, that a new score of Q* = +0 will reset a current Run-Sum
of R = −2 to +0.) The Run-Sum scheme then signals at the first i for which
|Q*_i| = 3 or |R_i| ≥ 4.
Then define states for a Run-Sum process monitoring scheme

    S_1 = "no alarm yet and R = −0,"
    S_2 = "no alarm yet and R = −1,"
    S_3 = "no alarm yet and R = −2,"
    S_4 = "no alarm yet and R = −3,"
    S_5 = "no alarm yet and R = +0,"
    S_6 = "no alarm yet and R = +1,"
    S_7 = "no alarm yet and R = +2,"
    S_8 = "no alarm yet and R = +3," and
    S_9 = "alarm."

If one assumes that the observations Q_1, Q_2, ... are iid and for j = −3, −2, −1, −0,
+0, +1, +2, +3 lets

    q_j = P[Q*_1 = j] ,
an appropriate transition matrix for describing the evolution of the scheme is

    P = [ q_{−0}  q_{−1}  q_{−2}    0     q_{+0}  q_{+1}  q_{+2}    0     q_{−3} + q_{+3}
            0     q_{−0}  q_{−1}  q_{−2}  q_{+0}  q_{+1}  q_{+2}    0     q_{−3} + q_{+3}
            0       0     q_{−0}  q_{−1}  q_{+0}  q_{+1}  q_{+2}    0     q_{−3} + q_{−2} + q_{+3}
            0       0       0     q_{−0}  q_{+0}  q_{+1}  q_{+2}    0     q_{−3} + q_{−2} + q_{−1} + q_{+3}
          q_{−0}  q_{−1}  q_{−2}    0     q_{+0}  q_{+1}  q_{+2}    0     q_{−3} + q_{+3}
          q_{−0}  q_{−1}  q_{−2}    0       0     q_{+0}  q_{+1}  q_{+2}  q_{−3} + q_{+3}
          q_{−0}  q_{−1}  q_{−2}    0       0       0     q_{+0}  q_{+1}  q_{−3} + q_{+2} + q_{+3}
          q_{−0}  q_{−1}  q_{−2}    0       0       0       0     q_{+0}  q_{−3} + q_{+1} + q_{+2} + q_{+3}
            0       0       0       0       0       0       0       0     1 ]

and the ARL for the scheme is L_1 = L_5. (The fact that the 1st and 5th rows of
P are identical makes it clear that the mean times to absorption from S_1 and S_5
must be the same.) It turns out that clever manipulation with the non-matrix
version of display (2.3) in this example even produces a fairly simple expression
for the scheme's ARL. (See Problem 2.24 and Reynolds (1971 JQT) and the
references therein in this final regard.)

Figure 2.3: Notational Conventions for Probabilities from Rounding Q − k_1 Values
To turn to a different type of application of the MC technology, consider
the analysis of a high side "decision interval" CUSUM scheme as described in
§4.2 of V&J. Suppose that the variables Q_1, Q_2, ... are iid with a continuous
distribution specified by the probability density f(y). Then the variables Q_1 −
k_1, Q_2 − k_1, Q_3 − k_1, ... are iid with probability density f*(y) = f(y + k_1). For a
positive integer m, we will think of replacing the variables Q_i − k_1 with versions
of them rounded to the nearest multiple of h/m before CUSUMing. Then the
CUSUM scheme can be thought of in terms of a MC with states

    S_i = "no alarm yet and the current CUSUM is (i − 1)(h/m)"

for i = 1, 2, ..., m and

    S_{m+1} = "alarm."
Then let

    q_{−m} = ∫_{−∞}^{−h + (1/2)(h/m)} f*(y) dy = P[Q_1 − k_1 ≤ −h + (1/2)(h/m)] ,

    q_m = ∫_{h − (1/2)(h/m)}^{∞} f*(y) dy = P[h − (1/2)(h/m) < Q_1 − k_1] ,

and for −m < j < m take

    q_j = ∫_{j(h/m) − (1/2)(h/m)}^{j(h/m) + (1/2)(h/m)} f*(y) dy .   (2.5)

These notational conventions for probabilities q_{−m}, ..., q_m are illustrated in
Figure 2.3.
In this notation, the evolution of the high side decision interval CUSUM
scheme can then be described in approximate terms by a MC with transition
matrix

    P_{(m+1)×(m+1)} =
      [ Σ_{j=−m}^{0} q_j       q_1       q_2       ···   q_{m−1}   q_m
        Σ_{j=−m}^{−1} q_j      q_0       q_1       ···   q_{m−2}   q_{m−1} + q_m
        Σ_{j=−m}^{−2} q_j      q_{−1}    q_0       ···   q_{m−3}   q_{m−2} + q_{m−1} + q_m
          ⋮                     ⋮         ⋮               ⋮         ⋮
        q_{−m} + q_{−m+1}      q_{−m+2}  q_{−m+3}  ···   q_0       Σ_{j=1}^{m} q_j
          0                     0         0        ···    0         1 ] .

For i = 1, ..., m the mean time to absorption from state S_i (L_i) is approximately
the ARL of the scheme with head start (i − 1)(h/m). (That is, the entries of the
vector L specified in display (2.2) are approximate ARL values for the CUSUM
scheme using various possible head starts.) In practice, in order to find ARLs
for the original scheme with non-rounded iid observations Q, one would find
approximate ARL values for an increasing sequence of m's until those appear
to converge for the head start of interest.
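The discretized-increment chain above is straightforward to set up numerically. The following sketch assumes iid normal Q's; the parameter values in the usage comments (h = 4, k_1 = .5 in σ units) are illustrative choices, not values from the text:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cusum_arl(h, k1, m, mu=0.0, sigma=1.0):
    """Approximate zero-head-start ARL of a high-side decision interval
    CUSUM by the MC of this section: increments Q - k1 rounded to the
    nearest multiple of h/m, states 0, h/m, ..., (m-1)h/m."""
    d = h / m

    def q(j):  # P[rounded increment = j*(h/m)], with tails as in (2.5)
        if j <= -m:
            return Phi((-h + d / 2 - (mu - k1)) / sigma)
        if j >= m:
            return 1.0 - Phi((h - d / 2 - (mu - k1)) / sigma)
        return (Phi((j * d + d / 2 - (mu - k1)) / sigma)
                - Phi((j * d - d / 2 - (mu - k1)) / sigma))

    R = np.zeros((m, m))
    for i in range(m):                    # state i: CUSUM = i*(h/m)
        R[i, 0] = sum(q(j) for j in range(-m, -i + 1))  # 'zeroing out'
        for s in range(1, m):             # move to CUSUM = s*(h/m)
            R[i, s] = q(s - i)
    # display (2.2); alarm probabilities are implicit in the row deficits
    L = np.linalg.solve(np.eye(m) - R, np.ones(m))
    return L[0]

# e.g. cusum_arl(4.0, 0.5, 200) for the all-OK ARL,
#      cusum_arl(4.0, 0.5, 200, mu=1.0) after a 1-sigma shift
```

Increasing m (say doubling it until the answer stabilizes) implements the convergence check described in the text.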
As a final example of the use of MC techniques in the probability modeling
of process monitoring scheme behavior, consider discrete approximation of the
EWMA schemes of §4.1 of V&J where the variables Q_1, Q_2, ... are again iid
with continuous distribution specified by a pdf f(y). In this case, in order to
provide a tractable discrete approximation, it will not typically suffice to simply
discretize the variables Q (as the EWMA calculations will then typically produce
a number of possible/exact EWMA values that grows as time goes on). Instead,
it is necessary to think directly in terms of rounded/discretized EWMAs. So for
an odd positive integer m, let ∆ = (UCL_EWMA − LCL_EWMA)/m and think of
replacing an (exact) EWMA sequence with a rounded EWMA sequence taking
on values a_i defined by

    a_i := LCL_EWMA + ∆/2 + (i − 1)∆

for i = 1, 2, ..., m. For i = 1, 2, ..., m let

    S_i = "no alarm yet and the rounded EWMA is a_i"

and

    S_{m+1} = "alarm."
And for 1 ≤ i, j ≤ m, let

    q_{ij} = P[moving from S_i to S_j]
           = P[a_j − ∆/2 ≤ (1 − λ)a_i + λQ ≤ a_j + ∆/2]
           = P[(a_j − (1 − λ)a_i)/λ − ∆/(2λ) ≤ Q ≤ (a_j − (1 − λ)a_i)/λ + ∆/(2λ)]
           = P[a_i + (j − i)∆/λ − ∆/(2λ) ≤ Q ≤ a_i + (j − i)∆/λ + ∆/(2λ)]
           = ∫_{a_i + (j−i)∆/λ − ∆/(2λ)}^{a_i + (j−i)∆/λ + ∆/(2λ)} f(y) dy .   (2.6)
Then with

    P = [ q_{11}   q_{12}   ···   q_{1m}   1 − Σ_{j=1}^{m} q_{1j}
          q_{21}   q_{22}   ···   q_{2m}   1 − Σ_{j=1}^{m} q_{2j}
            ⋮        ⋮               ⋮        ⋮
          q_{m1}   q_{m2}   ···   q_{mm}   1 − Σ_{j=1}^{m} q_{mj}
            0        0      ···     0       1 ]

the mean time to absorption from the state S_{(m+1)/2} (the value L_{(m+1)/2}) of
a MC with this transition matrix is an approximation for the EWMA scheme
ARL with EWMA_0 = (UCL_EWMA + LCL_EWMA)/2. In practice, in order to
find the ARL for the original scheme, one would find approximate ARL values
for an increasing sequence of m's until those appear to converge.
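A short numerical sketch of this EWMA chain follows. It assumes iid normal Q's; the weight λ = .2 and limits ±.9 used in the comments are illustrative inputs, not values from the text:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def ewma_arl(lam, lcl, ucl, m=101, mu=0.0, sigma=1.0):
    """MC approximation (this section) to the ARL of an EWMA chart with
    weight lam and fixed limits (lcl, ucl), using m (odd) grid points and
    a start at the center of the band."""
    assert m % 2 == 1
    delta = (ucl - lcl) / m
    a = lcl + delta / 2 + delta * np.arange(m)   # grid values a_1, ..., a_m
    Q = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            center = a[i] + (j - i) * delta / lam      # display (2.6)
            lo = (center - delta / (2 * lam) - mu) / sigma
            hi = (center + delta / (2 * lam) - mu) / sigma
            Q[i, j] = Phi(hi) - Phi(lo)
    L = np.linalg.solve(np.eye(m) - Q, np.ones(m))     # display (2.2)
    return L[(m + 1) // 2 - 1]   # EWMA_0 = (UCL + LCL)/2

# e.g. ewma_arl(0.2, -0.9, 0.9) for the all-OK ARL,
#      ewma_arl(0.2, -0.9, 0.9, mu=1.0) after a 1-sigma shift
```

As the text prescribes, one would rerun this for increasing odd m until the returned value stabilizes.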
The four examples in this section have illustrated the use of MC calculations
in the second and third of the three circumstances listed at the beginning of this
section. The first circumstance is conceptually the simplest of the three, and is
for example illustrated by Problems 2.25, 2.28 and 2.37. The examples have also
all dealt with iid models for the Q_1, Q_2, ... sequence. Problem 2.26 shows that
the methodology can also easily accommodate some kinds of dependencies in
the Q sequence. (The discrete model in Problem 2.26 is itself perhaps less than
completely appealing, but the reader should consider the possibility of discrete
approximation of the kind of dependency structure employed in Problem 2.27
before dismissing the basic concept illustrated in Problem 2.26 as useless.)
2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes

There is a second (and at first appearance quite different) standard method of
approaching the analysis of the run length behavior of some process monitoring
schemes where continuous variables Q are involved. That is through the use of
integral equations, and this section introduces the use of these. (As it turns out,
by the time one is forced to find numerical solutions of the integral equations,
there is not a whole lot of difference between the methods of this section and
those of the previous one. But it is important to introduce this second point of
view and note the correspondence between approaches.)
Before going to the details of specific schemes and integral equations, a small
piece of calculus/numerical analysis needs to be reviewed and notation set for
use in these notes. That concerns the approximation of definite integrals on the
interval [a, a + h]. Specification of a set of points

    a ≤ a_1 ≤ a_2 ≤ ··· ≤ a_m ≤ a + h

and weights

    w_i ≥ 0 with Σ_{i=1}^{m} w_i = h

so that

    ∫_a^{a+h} f(y) dy may be approximated as Σ_{i=1}^{m} w_i f(a_i)

for reasonable functions f(y), is the specification of a so-called "quadrature
rule" for approximating integrals on the interval [a, a + h]. The simplest of such
rules is probably the choice

    a_i := a + ((i − 1/2)/m) h with w_i := h/m .   (2.7)

(This choice amounts to approximating an integral of f by a sum of signed areas
of rectangles with bases h/m and (signed) heights chosen as the values of f at
midpoints of intervals of length h/m beginning at a.)
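In code, the midpoint rule (2.7) amounts to a one-line sum (a small illustrative helper, not part of the original notes):

```python
def midpoint_rule(f, a, h, m):
    """Quadrature rule (2.7) on [a, a+h]: nodes a_i = a + ((i - 1/2)/m)h,
    equal weights w_i = h/m."""
    w = h / m
    return sum(w * f(a + (i - 0.5) * w) for i in range(1, m + 1))
```

For example, midpoint_rule(lambda y: y**2, 0.0, 1.0, 1000) is within about 1e-7 of the exact value 1/3.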
Now consider a high side CUSUM scheme as in §4.2 of V&J, where Q_1, Q_2, ...
are iid with continuous marginal distribution specified by the probability density
f(y). Define the function

    L_1(u) := the ARL of the high side CUSUM scheme using a head start of u .

If one begins CUSUMing at u, there are three possibilities of where he/she will be
after a single observation, Q_1. If Q_1 is large (Q_1 − k_1 ≥ h − u) then there will be
an immediate signal and the run length will be 1. If Q_1 is small (Q_1 − k_1 ≤ −u)
the CUSUM will "zero out," one observation will have been spent, and on
average L_1(0) more observations are to be faced in order to produce a signal.
Finally, if Q_1 is moderate (−u < Q_1 − k_1 < h − u) then one observation will
have been spent and the CUSUM will continue from u + (Q_1 − k_1), requiring on
average an additional L_1(u + (Q_1 − k_1)) observations to produce a signal. This
reasoning leads to the equation for L_1,

    L_1(u) = 1·P[Q_1 − k_1 ≥ h − u] + (1 + L_1(0))·P[Q_1 − k_1 ≤ −u]
             + ∫_{k_1 − u}^{k_1 + h − u} (1 + L_1(u + y − k_1)) f(y) dy .
Writing F(y) for the cdf of Q_1 and simplifying slightly, this is

    L_1(u) = 1 + L_1(0)F(k_1 − u) + ∫_0^h L_1(y) f(y + k_1 − u) dy .   (2.8)

The argument leading to equation (2.8) has a twin that produces an integral
equation for

    L_2(v) := the ARL of a low side CUSUM scheme using a head start of v .

That equation is

    L_2(v) = 1 + L_2(0)(1 − F(k_2 − v)) + ∫_{−h}^0 L_2(y) f(y + k_2 − v) dy .   (2.9)
And as indicated in display (4.20) of V&J, could one solve equations (2.8) and
(2.9) (and thus obtain L_1(0) and L_2(0)) one would have not only separate high
and low side CUSUM ARLs, but ARLs for some combined schemes as well.
(Actually, more than what is stated in V&J can be proved. Yashchin, in a
Journal of Applied Probability paper in about 1985, showed that with iid Q's,
high side decision interval h_1 and low side decision interval h_2 for nonnegative
h_2, if k_1 ≥ k_2 and

    (k_1 − k_2) ≥ |h_1 − h_2| ≥ max(0, u − v − max(h_1, h_2)) ,

for the simultaneous use of high and low side schemes

    ARL_combined = (L_1(0)L_2(v) + L_1(u)L_2(0) − L_1(0)L_2(0)) / (L_1(0) + L_2(0)) .

It is easily verified that what is stated on page 151 of V&J is a special case of
this result.) So "in theory," to find ARLs for CUSUM schemes one need only
solve the integral equations (2.8) and (2.9). This is easier said than done. The
one case where fairly explicit solutions are known is that where observations are
exponentially distributed (see Problem 2.30). In other cases one must resort to
numerical solution of the integral equations.
So consider the problem of approximate solution of equation (2.8). For
a particular quadrature rule for integrals on [0, h], for each a_i one has from
equation (2.8) the approximation

    L_1(a_i) ≈ 1 + L_1(a_1)F(k_1 − a_i) + Σ_{j=1}^{m} w_j L_1(a_j) f(a_j + k_1 − a_i) .
That is, at least approximately one has the system of m linear equations

    L_1(a_1) = 1 + L_1(a_1)[F(k_1 − a_1) + w_1 f(k_1)] + Σ_{j=2}^{m} L_1(a_j) w_j f(a_j + k_1 − a_1) ,
    L_1(a_2) = 1 + L_1(a_1)[F(k_1 − a_2) + w_1 f(a_1 + k_1 − a_2)] + Σ_{j=2}^{m} L_1(a_j) w_j f(a_j + k_1 − a_2) ,
      ⋮
    L_1(a_m) = 1 + L_1(a_1)[F(k_1 − a_m) + w_1 f(a_1 + k_1 − a_m)] + Σ_{j=2}^{m} L_1(a_j) w_j f(a_j + k_1 − a_m)

in the m unknowns L_1(a_1), ..., L_1(a_m). Again in light of equation (2.8) and the
notion of numerical approximation of definite integrals, upon solving this set of
equations (for approximate values of L_1(a_1), ..., L_1(a_m)) one may approximate
the function L_1(u) as

    L_1(u) ≈ 1 + L_1(a_1)F(k_1 − u) + Σ_{j=1}^{m} w_j L_1(a_j) f(a_j + k_1 − u) .
It is a revealing point that the system of equations above is of the form (2.3)
that was so useful in the MC approach to the determination of ARLs. That is,
let

    L = (L_1(a_1), L_1(a_2), ..., L_1(a_m))'

and

    R = [ F(k_1 − a_1) + w_1 f(k_1)              w_2 f(a_2 + k_1 − a_1)   ···   w_m f(a_m + k_1 − a_1)
          F(k_1 − a_2) + w_1 f(a_1 + k_1 − a_2)  w_2 f(k_1)               ···   w_m f(a_m + k_1 − a_2)
            ⋮                                      ⋮                              ⋮
          F(k_1 − a_m) + w_1 f(a_1 + k_1 − a_m)  w_2 f(a_2 + k_1 − a_m)   ···   w_m f(k_1) ]

and note that the set of equations for the a_i head start approximate ARLs is
exactly of the form (2.3). With the simple quadrature rule in display (2.7) note
that a generic entry of R, r_{ij}, for j ≥ 2 is

    r_{ij} = w_j f(a_j + k_1 − a_i) = (h/m) f((j − i)(h/m) + k_1) .

But using again the notation f*(y) = f(y + k_1) employed in the CUSUM example
of §2.2, this means

    r_{ij} = (h/m) f*((j − i)(h/m)) ≈ ∫_{(j−i)(h/m) − (1/2)(h/m)}^{(j−i)(h/m) + (1/2)(h/m)} f*(y) dy = q_{j−i}

(in terms of the notation (2.5) from the CUSUM example). The point is that
whether one begins from a "discretize the Q − k_1 distribution and employ the
MC material" point of view or from a "do numerical solution of an integral
equation" point of view is largely immaterial. Very similar large systems of
linear equations must be solved in order to find approximate ARLs.
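As a concrete illustration of the integral-equation route, the following sketch solves the discretized version of (2.8) with the midpoint rule (2.7). It assumes iid normal observations; the values h = 4 and k_1 = .5 suggested in the comments are illustrative, not from the text:

```python
import numpy as np
from math import erf, sqrt, exp, pi

def Phi(x):  # standard normal cdf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi(x):  # standard normal pdf
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def cusum_arl_ie(h, k1, m=200, mu=0.0, sigma=1.0):
    """Approximate L1(0) for a high-side CUSUM by solving equation (2.8)
    discretized with the midpoint quadrature rule (2.7) on [0, h]."""
    w = h / m
    a = (np.arange(1, m + 1) - 0.5) * w          # quadrature nodes
    F = lambda y: Phi((y - mu) / sigma)
    f = lambda y: phi((y - mu) / sigma) / sigma
    R = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            R[i, j] = w * f(a[j] + k1 - a[i])
        R[i, 0] += F(k1 - a[i])                  # 'zero out' term, L1(0) ~ L1(a_1)
    L = np.linalg.solve(np.eye(m) - R, np.ones(m))
    # evaluate the final approximation to L1(u) at u = 0
    fy = np.exp(-0.5 * ((a + k1 - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
    return 1.0 + L[0] * F(k1) + float(np.sum(w * L * fy))

# e.g. cusum_arl_ie(4.0, 0.5) for the all-OK ARL
```

Note that the matrix R built here is exactly the R of this section, whose generic entries were just identified with the MC probabilities q_{j−i}, so the linear system solved is essentially the same one as in the MC approach.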
As a second application of integral equation ideas to the analysis of process
monitoring schemes, consider the EWMA schemes of §4.1 of V&J where Q_1, Q_2, ...
are iid with a continuous distribution specified by the probability density f(y).
Let

    L(u) = the ARL of an EWMA scheme with EWMA_0 = u .

When one begins an EWMA sequence at u, there are 2 possibilities of where
he/she will be after a single observation, Q_1. If Q_1 is extreme (λQ_1 + (1 − λ)u >
UCL_EWMA or λQ_1 + (1 − λ)u < LCL_EWMA) then there will be an immediate
signal and the run length will be 1. If Q_1 is moderate (LCL_EWMA ≤ λQ_1 +
(1 − λ)u ≤ UCL_EWMA) one observation will have been spent and on average
L(λQ_1 + (1 − λ)u) more observations are to be faced in order to produce a signal.
Now the event

    LCL_EWMA ≤ λQ_1 + (1 − λ)u ≤ UCL_EWMA

is the event

    (LCL_EWMA − (1 − λ)u)/λ ≤ Q_1 ≤ (UCL_EWMA − (1 − λ)u)/λ ,
so this reasoning produces the equation

    L(u) = 1·(1 − P[(LCL_EWMA − (1 − λ)u)/λ ≤ Q_1 ≤ (UCL_EWMA − (1 − λ)u)/λ])
           + ∫_{(LCL_EWMA − (1−λ)u)/λ}^{(UCL_EWMA − (1−λ)u)/λ} (1 + L(λy + (1 − λ)u)) f(y) dy ,

or

    L(u) = 1 + ∫_{(LCL_EWMA − (1−λ)u)/λ}^{(UCL_EWMA − (1−λ)u)/λ} L(λy + (1 − λ)u) f(y) dy ,

or finally

    L(u) = 1 + (1/λ) ∫_{LCL_EWMA}^{UCL_EWMA} L(y) f((y − (1 − λ)u)/λ) dy .   (2.10)
As in the previous (CUSUM) case, one must usually resort to numerical
methods in order to approximate the solution to equation (2.10). For a partic-
ular quadrature rule for integrals on [LCL_EWMA, UCL_EWMA], for each a_i one
has from equation (2.10) the approximation

    L(a_i) ≈ 1 + (1/λ) Σ_{j=1}^{m} w_j L(a_j) f((a_j − (1 − λ)a_i)/λ) .   (2.11)

Now expression (2.11) is standing for a set of m equations in the m unknowns
L(a_1), ..., L(a_m) that (as in the CUSUM case) can be thought of in terms of
the matrix expression (2.3) if one takes

    L = (L(a_1), ..., L(a_m))' and R_{m×m} = ( (w_j/λ) f((a_j − (1 − λ)a_i)/λ) ) .   (2.12)

Solution of the system represented by equation (2.11), or the matrix expression
(2.3) with definitions (2.12), produces approximate values for L(a_1), ..., L(a_m)
and therefore an approximation for the function L(u) as

    L(u) ≈ 1 + (1/λ) Σ_{j=1}^{m} w_j L(a_j) f((a_j − (1 − λ)u)/λ) .
Again as in the CUSUM case, it is worth noting the similarity between the
set of equations used to find MC ARL approximations and the set of equa-
tions used to find integral equation ARL approximations. With the quadra-
ture rule (2.7) and an odd integer m, using the notation ∆ = (UCL_EWMA −
LCL_EWMA)/m employed in the EWMA example of §2.2, note that a generic
entry of R defined in (2.12) is

    r_{ij} = (w_j/λ) f((a_j − (1 − λ)a_i)/λ) = (∆/λ) f(a_i + (j − i)∆/λ)
           ≈ ∫_{a_i + (j−i)∆/λ − ∆/(2λ)}^{a_i + (j−i)∆/λ + ∆/(2λ)} f(y) dy = q_{ij}

(in terms of the notation (2.6) from the EWMA example of §2.2). That is,
as in the CUSUM case, the sets of equations used in the MC and integral
equation approximations for the EWMA_0 = a_i ARLs of the scheme are very
similar.
As a final example of the use of integral equations in the analysis of process
monitoring schemes, consider the X/MR schemes of §4.4 of V&J. Suppose that
observations x_1, x_2, ... are iid with continuous marginal distribution specified
by the probability density f(y). Define the function

    L(y) = the mean number of additional observations to alarm, given that
    there has been no alarm to date and the current observation is y.

Then note that as one begins X/MR monitoring, there are two possibilities of
where he/she will be after observing the first individual, x_1. If x_1 is extreme
(x_1 < LCL_x or x_1 > UCL_x) there will be an immediate signal and the run
length will be 1. If x_1 is not extreme (LCL_x ≤ x_1 ≤ UCL_x) one observation
will have been spent and on average another L(x_1) observations will be required
in order to produce a signal. So it is reasonable that the ARL for the X/MR
scheme is

    ARL = 1·(1 − P[LCL_x ≤ x_1 ≤ UCL_x]) + ∫_{LCL_x}^{UCL_x} (1 + L(y)) f(y) dy ,

that is

    ARL = 1 + ∫_{LCL_x}^{UCL_x} L(y) f(y) dy ,   (2.13)

where it remains to find a way of computing the function L(y) in order to feed
it into expression (2.13).
In order to derive an integral equation for L(y), consider the situation if there
has been no alarm and the current individual observation is y. There are two
possibilities for where one will be after observing one more individual, x. If x
is extreme or too far from y (x < LCL_x or x > UCL_x or |x − y| > UCL_R)
only one additional observation is required to produce a signal. On the other
hand, if x is not extreme and not too far from y (LCL_x ≤ x ≤ UCL_x and
|x − y| ≤ UCL_R) one more observation will have been spent and on average
another L(x) will be required to produce a signal. That is,

L(y) = 1·(P[x < LCL_x or x > UCL_x or |x − y| > UCL_R])
       + ∫_{max(LCL_x, y−UCL_R)}^{min(UCL_x, y+UCL_R)} (1 + L(x)) f(x) dx ,
that is,

L(y) = 1 + ∫_{max(LCL_x, y−UCL_R)}^{min(UCL_x, y+UCL_R)} L(x) f(x) dx
     = 1 + ∫_{LCL_x}^{UCL_x} I[|x − y| ≤ UCL_R] L(x) f(x) dx .    (2.14)
(The notation I[A] is indicator function notation, meaning that when A holds
I[A] = 1, and otherwise I[A] = 0.) As in the earlier CUSUM and EWMA
examples, once one specifies a quadrature rule for definite integrals on the interval
[LCL_x, UCL_x], this expression (2.14) provides a set of m linear equations for
approximate values of the L(a_i)'s. When this system is solved, the resulting values
can be fed into a discretized version of equation (2.13) and an approximate ARL
produced. It is worth noting that the potential discontinuities of the integrand
in equation (2.14) (produced by the indicator function) have the effect of making
numerical solutions of this equation much less well-behaved than those for
the other integral equations developed in this section.
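For concreteness, the quadrature scheme just described can be sketched in a few lines. The illustration below uses a midpoint rule with standard normal f; the particular chart limits (a "3-sigma" X chart and an illustrative moving range limit) are assumptions made for the example, not values from the text.

```python
import numpy as np

LCL_X, UCL_X = -3.0, 3.0   # illustrative X chart limits
UCL_R = 3.686              # illustrative moving range limit

m = 400
w = (UCL_X - LCL_X) / m                      # midpoint-rule weight
a = LCL_X + (np.arange(m) + 0.5) * w         # quadrature nodes a_i
f = np.exp(-a**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

# Discretized (2.14): L(a_i) = 1 + sum_j w f(a_j) I[|a_j - a_i| <= UCL_R] L(a_j)
K = w * f[None, :] * (np.abs(a[None, :] - a[:, None]) <= UCL_R)
L = np.linalg.solve(np.eye(m) - K, np.ones(m))

# Discretized (2.13): ARL = 1 + integral of L(y) f(y) dy over [LCL_x, UCL_x]
ARL = 1.0 + np.sum(w * f * L)
print(round(ARL, 1))
```

The indicator inside K is exactly the source of the discontinuities mentioned above; a large m (and, better, a rule whose nodes track the kinks at y ± UCL_R) is needed for good accuracy.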
The examples of this section have dealt only with ARLs for schemes based
on (continuous) iid observations. It therefore should be said that:

1. The iid assumption can in some cases be relaxed to give tractable integral
equations for situations where correlated sequences Q_1, Q_2, ... are involved
(see for example Problem 2.27),

2. Other descriptors of the run length distribution (beyond the ARL) can
often be shown to solve simple integral equations (see for example the
integral equations for the CUSUM run length second moment and run length
probability function in Problem 2.31), and
3. In some cases, with discrete variables Q there are difference equation
analogues of the integral equations presented here (that ultimately correspond
to the kind of MC calculations illustrated in the previous section).
Chapter 3

An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control

Section 3.6 of V&J provides an elementary introduction to the topic of
Engineering Control and contrasts this adjustment methodology with (the process
monitoring methodology of) control charting. The last item under the "Engineering
Control" heading of Table 3.10 of V&J makes reference to optimal
stochastic control theory. The object of this theory is to model system behavior
using probability tools and let the consequences of the model assumptions
help guide one in the choice of effective control/adjustment algorithms. This
chapter provides a very brief introduction to this theory.
3.1 General Exposition

Let

{..., Z(−1), Z(0), Z(1), Z(2), ...}

stand for observations on a process assuming that no control actions are taken.
One first needs a stochastic/probabilistic model for the sequence {Z(t)}, and
we will let

F

stand for such a model. F is a joint distribution for the Z's and might, for
example, be:

1. a simple random walk model specified by the equation Z(t) = Z(t−1) +
ε(t), where the ε's are iid normal (0, σ²) random variables,
2. a random walk model with drift specified by the equation Z(t) =
Z(t−1) + d + ε(t), where d is a constant and the ε's are iid normal (0, σ²)
random variables, or

3. some Box-Jenkins ARIMA model for the {Z(t)} sequence.
Then let

a(t)

stand for a control action taken at time t, after observing the process. One
needs notation for the current impact of control actions taken in past periods,
so we will further let

A(a, s)

stand for the current impact on the process of a control action a taken s periods
ago. In many systems, the control actions, a, are numerical, and A(a, s) = a·h(s),
where h(s) is the so-called impulse response function giving the impact of a
unit control action taken s periods previous. A(a, s) might, for example, be:

1. given by A(a, s) = a for s ≥ 1 in a machine tool control problem where a
means "move the cutting tool out a units" (and the controlled variable is
a measured dimension of a work piece),

2. given by A(a, s) = 0 for s ≤ u and by A(a, s) = a for s > u in a machine
tool control problem where a means "move the cutting tool out a units"
and there are u periods of dead time, or

3. given by A(a, s) = (1 − exp(−sh/τ)) a for s ≥ 1 in a chemical process
control problem with time constant τ and control period h seconds.
We will then assume that what one actually observes for (controlled) process
behavior at time t ≥ 1 is

Y(t) = Z(t) + Σ_{s=0}^{t−1} A(a(s), t − s) ,

which is the sum of what would have been observed with no control and all of
the current effects of previous control actions. For t ≥ 0, a(t) will be chosen
based on

{..., Z(−1), Z(0), Y(1), Y(2), ..., Y(t)} .
A common objective in this context is to choose the actions so as to minimize

E_F (Y(t) − T(t))²   or   Σ_{s=1}^{t} E_F (Y(s) − T(s))²
for some (possibly time-dependent) target value T(s). The problem of the choice
of control actions to accomplish this goal is called the minimum variance
(MV) control problem, and it has a solution that can be described in fairly
(deceptively, perhaps) simple terms.
Note first that given {..., Z(−1), Z(0), Y(1), Y(2), ..., Y(t)} one can recover
{..., Z(−1), Z(0), Z(1), Z(2), ..., Z(t)}. This is because

Z(s) = Y(s) − Σ_{r=0}^{s−1} A(a(r), s − r) ,

i.e., to get Z(s), one simply subtracts the (known) effects of previous control
actions from Y(s).

Then the model F (at least in theory) provides one a conditional distribution
for Z(t+1), Z(t+2), Z(t+3), ... given the observed Z's through time t. The
conditional distribution for Z(t+1), Z(t+2), Z(t+3), ... given what one can
observe through time t, namely {..., Z(−1), Z(0), Y(1), Y(2), ..., Y(t)}, is then
the conditional distribution one gets for Z(t+1), Z(t+2), Z(t+3), ... from the
model F after recovering Z(1), Z(2), ..., Z(t) from the corresponding Y's. Then
for s ≥ t+1, let

E_F[Z(s) | ..., Z(−1), Z(0), Z(1), Z(2), ..., Z(t)]   or just   E_F[Z(s) | Z_t]

stand for the mean of this conditional distribution of Z(s) available at time t.
Suppose that there are u ≥ 0 periods of dead time (u could be 0). Then
the earliest Y that one can hope to influence by choice of a(t) is Y(t+u+1).
Notice then that if one takes action a(t) at time t, one's most natural projection
of Y(t+u+1) at time t is

Ŷ(t+u+1 | t) := E_F[Z(t+u+1) | Z_t] + Σ_{s=0}^{t−1} A(a(s), t+u+1−s) + A(a(t), u+1) .

It is then natural (and in fact turns out to give the MV control strategy) to try
to choose a(t) so that

Ŷ(t+u+1 | t) = T(t+u+1) .

That is, the MV strategy is to try to choose a(t) so that

A(a(t), u+1) = T(t+u+1) − { E_F[Z(t+u+1) | Z_t] + Σ_{s=0}^{t−1} A(a(s), t+u+1−s) } .
A caveat here is that in practice MV control tends to be "ragged." That
is, in order to exactly optimize the mean squared error, constant tweaking (and
often fairly large adjustments) is required. By changing one's control objective
somewhat it is possible to produce smoother optimal control policies that are
nearly as effective as MV algorithms in terms of keeping a process on target.
That is, instead of trying to optimize

E_F Σ_{s=1}^{t} (Y(s) − T(s))² ,

in a situation where the a's are numerical (a = 0 indicating "no adjustment"
and the size of adjustments increasing with |a|) one might for a constant λ > 0
set out to minimize the alternative criterion

E_F ( Σ_{s=1}^{t} (Y(s) − T(s))² + λ Σ_{s=0}^{t−1} (a(s))² ) .

Doing so will smooth the MV algorithm.
3.2 An Example

To illustrate the meaning of the preceding formalism, consider the model (F)
specified by

Z(t) = W(t) + ε(t) for t ≥ 0   and   W(t) = W(t−1) + d + υ(t) for t ≥ 1    (3.1)

for d a (known) constant, the ε's normal (0, σ²_ε), the υ's normal (0, σ²_υ) and
all the ε's and υ's independent. (Z(t) is a random walk with drift observed
with error.) Under this model and an appropriate 0 mean normal initializing
distribution for W(0), it is the case that each

Ẑ(t+1 | t) := E_F[Z(t+1) | Z(0), ..., Z(t)]

may be computed recursively as

Ẑ(t+1 | t) = αZ(t) + (1 − α)Ẑ(t | t−1) + d

for some constant α (that depends upon the known variances σ²_ε and σ²_υ).
We will find MV control policies under model (3.1) with two different functions
A(a, s). Consider first the possibility

A(a, s) = a  ∀ s ≥ 1    (3.2)

(an adjustment a at a given time period takes its full and permanent effect
at the next time period).

Consider the situation at time t = 0. Available are Z(0) and Ẑ(0 | −1) (the
prior mean of W(0)) and from these one may compute the prediction

Ẑ(1 | 0) := αZ(0) + (1 − α)Ẑ(0 | −1) + d .
That means that upon taking control action a(0), one should predict a value of

Ŷ(1 | 0) := Ẑ(1 | 0) + a(0)

for the controlled process at time t = 1, and upon setting this equal to the
target T(1) and solving for a(0) one should thus choose

a(0) = T(1) − Ẑ(1 | 0) .

At time t = 1 one has observed Y(1) and may recover Z(1) by noting that

Y(1) = Z(1) + A(a(0), 1) = Z(1) + a(0) ,

so that

Z(1) = Y(1) − a(0) .

Then a prediction (of the uncontrolled process) one step ahead is

Ẑ(2 | 1) := αZ(1) + (1 − α)Ẑ(1 | 0) + d .

That means that with a target of T(2) one should predict a value of the
controlled process at time t = 2 of

Ŷ(2 | 1) := Ẑ(2 | 1) + a(0) + a(1) .

Upon setting this value equal to T(2) and solving it is clear that one should
choose

a(1) = T(2) − ( Ẑ(2 | 1) + a(0) ) .
So in general under (3.2), at time t one may note that

Z(t) = Y(t) − Σ_{s=0}^{t−1} a(s)

and (recursively) compute

Ẑ(t+1 | t) := αZ(t) + (1 − α)Ẑ(t | t−1) + d .

Then setting the predicted value of the controlled process equal to T(t+1) and
solving for a(t), find the MV control action

a(t) = T(t+1) − ( Ẑ(t+1 | t) + Σ_{s=0}^{t−1} a(s) ) .
Finally, consider the problem of MV control under the same model (3.1),
but now using

A(a, s) = 0 if s = 1 ;  A(a, s) = a for s = 2, 3, ...    (3.3)

(a description of response to process adjustment involving one period of delay,
after which the full effect of an adjustment is immediately and permanently
felt).
Consider the situation at time t = 0. In hand are Z(0) and the prior mean
of W(0), Ẑ(0 | −1), and the first Y that one can affect by choice of a(0) is Y(2).
Now

Z(2) = W(2) + ε(2)
     = W(1) + d + υ(2) + ε(2)
     = Z(1) − ε(1) + d + υ(2) + ε(2) ,

so that

Ẑ(2 | 0) := E_F[Z(2) | Z(0)]
          = E_F[Z(1) − ε(1) + d + υ(2) + ε(2) | Z(0)]
          = Ẑ(1 | 0) + d
          = αZ(0) + (1 − α)Ẑ(0 | −1) + 2d

is a prediction of where the uncontrolled process will be at time t = 2. Then a
prediction for the controlled process at time t = 2 is

Ŷ(2 | 0) := Ẑ(2 | 0) + A(a(0), 2) = Ẑ(2 | 0) + a(0)

and upon setting this equal to the time t = 2 target, T(2), and solving, one has
the MV control action

a(0) = T(2) − Ẑ(2 | 0) .
At time t = 1 one has in hand Y(1) = Z(1) and Ẑ(1 | 0), and the first Y that
can be affected by the choice of a(1) is Y(3). Now

Z(3) = W(3) + ε(3)
     = W(2) + d + υ(3) + ε(3)
     = Z(2) − ε(2) + d + υ(3) + ε(3) ,

so that

Ẑ(3 | 1) := E_F[Z(3) | Z(0), Z(1)]
          = E_F[Z(2) − ε(2) + d + υ(3) + ε(3) | Z(0), Z(1)]
          = Ẑ(2 | 1) + d
          = αZ(1) + (1 − α)Ẑ(1 | 0) + 2d

is a prediction of where the uncontrolled process will be at time t = 3. Then a
prediction for the controlled process at time t = 3 is

Ŷ(3 | 1) := Ẑ(3 | 1) + A(a(0), 3) + A(a(1), 2) = Ẑ(3 | 1) + a(0) + a(1)

and upon setting this equal to the time t = 3 target, T(3), and solving, one has
the MV control action

a(1) = T(3) − ( Ẑ(3 | 1) + a(0) ) .
Finally, in general under (3.3), one may at time t note that

Z(t) = Y(t) − Σ_{s=0}^{t−2} a(s)

and (recursively) compute

Ẑ(t+2 | t) := αZ(t) + (1 − α)Ẑ(t | t−1) + 2d .

Then setting the time t+2 predicted value of the controlled process equal to
T(t+2) and solving for a(t), we find the MV control action

a(t) = T(t+2) − ( Ẑ(t+2 | t) + Σ_{s=0}^{t−1} a(s) ) .
Chapter 4

Process Characterization and Capability Analysis

Sections 5.1 through 5.3 of V&J discuss the problem of summarizing the
behavior of a stable process. The bottom line of that discussion is that
one-sample statistical methods can be used in a straightforward manner to
characterize a process/population/universe standing behind data collected under
stable process conditions. Section 5.5 of V&J opens a discussion of summarizing
process behavior when it is not sensible to model all data in hand as random
draws from a single/fixed universe. The notes in this chapter carry the theme
of §5.5 of V&J slightly further and add some theoretical detail missing in the
book.
4.1 General Comments on Assessing and Dissecting Overall Variation

The questions "How much variation is there overall?" and "Where is the
variation coming from?" are fundamental to process characterization/understanding
and the guidance of improvement efforts. To provide a framework for discussion
here, suppose that in hand one has r samples of data, sample i of size n_i
(i = 1, ..., r). Depending upon the specific application, these r samples can
have many different logical structures. For example, §5.5 of V&J considers the
case where the n_i are all the same and the r samples are naturally thought of as
having a balanced hierarchical/tree structure. But many others (both regular
and completely irregular) are possible. For example, Figure 4.1 is a schematic
parallel to Figure 5.16 of V&J for a "staggered" nested data structure.

When data in hand represent the entire universe of interest, methods of
probability and statistical inference have no relevance to the basic questions
"How much variation is there overall?" and "Where is the variation coming
from?" The problem is one of descriptive statistics only, and various creative
[Figure 4.1: Schematic of a Staggered Nested Data Set (levels of A, B(A), C(B(A)) and D(C(B(A))))]
combinations of methods of statistical graphics and basic numerical measures
(like sample variances and ranges) can be assembled to address these issues.
And most simply, a grand sample variance is one sensible characterization of
overall variation.

The tools of probability and statistical inference only become relevant when
one sees data in hand as representing something more than themselves. And
there are basically two standard routes to take in this enterprise. The first
posits some statistical model for the process standing behind the data (like the
hierarchical random effects model (5.28) of V&J). One may then use the data
in hand in the estimation of parameters (and functions of parameters) of that
model in order to characterize process behavior, assess overall variability and
dissect that variation into interpretable pieces.

The second standard way in which probabilistic and statistical methods
become relevant (to the problems of assessing overall variation and analysis of its
components) is through the adoption of a finite population sampling perspective.
That is, there are times where there is conceptually some (possibly highly
structured) concrete data set of interest and the data in hand arise through the
application (possibly in various complicated ways) of random selection of some
of the elements of that data set. (As one possible example, think of a warehouse
that contains 100 crates, each of which contains 4 trays, each of which in turn
holds 50 individual machine parts. The 20,000 parts in the warehouse could
constitute a concrete population of interest. If one were to sample 3 crates
at random, select at random 2 trays from each and then select 5 parts from
each tray at random, one would have a classical finite population sampling problem.
Probability/randomness has entered through the sampling that is necessitated
because one is unwilling to collect data on all 20,000 parts.)
Section 5.5 of V&J introduces the first of these two approaches to assessing
and dissecting overall variation for balanced hierarchical data. But it does not
treat the finite population sampling ideas at all. The present chapter of these
notes thus extends slightly the random effects analysis ideas discussed in §5.5
and then presents some simple material from the theory of finite population
sampling.
4.2 More on Analysis Under the Hierarchical Random Effects Model

Consider the hierarchical random effects model with 2 levels of nesting discussed
in §5.5.2 of V&J. We will continue the notations y_{ijk}, ȳ_{ij}, ȳ_{i·} and ȳ_{··} used in
that section and also adopt some additional notation. For one thing, it will be
useful to define some ranges. Let
R_{ij} = max_k y_{ijk} − min_k y_{ijk} = the range of the jth sample within the ith level of A ,

Δ_i = max_j ȳ_{ij} − min_j ȳ_{ij} = the range of the J sample means within the ith level of A ,

and

Δ = max_i ȳ_{i·} − min_i ȳ_{i·} = the range of the means for the I levels of A .
It will also be useful to consider the ANOVA sums of squares and mean
squares alluded to briefly in §5.5.3. So let

SSTot = Σ_{i,j,k} (y_{ijk} − ȳ_{··})² = (IJK − 1) · (the grand sample variance of all IJK observations) ,

SSC(B(A)) = Σ_{i,j,k} (y_{ijk} − ȳ_{ij})² = (K − 1) · (the sum of all IJ level C sample variances) ,

SSB(A) = K Σ_{i,j} (ȳ_{ij} − ȳ_{i·})² = K(J − 1) · (the sum of all I sample variances of the J means ȳ_{ij})

and

SSA = KJ Σ_i (ȳ_{i·} − ȳ_{··})² = KJ(I − 1) · (the sample variance of the I means ȳ_{i·}) .

Note that in the notation of §5.5.2, SSA = KJ(I − 1)s²_A, SSB(A) =
K(J − 1) Σ_{i=1}^{I} s²_{Bi} and SSC(B(A)) = (K − 1) Σ_{i,j} s²_{ij} = IJ(K − 1)σ̂². And it is an
algebraic fact that SSTot = SSA + SSB(A) + SSC(B(A)).
Mean squares are derived from these sums of squares by dividing by
appropriate degrees of freedom. That is, define

MSA := SSA/(I − 1) ,

MSB(A) := SSB(A)/(I(J − 1))

and

MSC(B(A)) := SSC(B(A))/(IJ(K − 1)) .
Now these ranges, sums of squares and mean squares are interesting measures
of variation in their own right, but are especially helpful when used to produce
estimates of variance components and functions of variance components. For
example, it is straightforward to verify that under the hierarchical random effects
model (5.28) of V&J

E R_{ij} = d₂(K) σ ,

E Δ_i = d₂(J) √(σ²_β + σ²/K)

and

E Δ = d₂(I) √(σ²_α + σ²_β/J + σ²/(JK)) .
So, reasoning as in §2.2.2 of V&J (there in the context of two-way random effects
models and gage R&R), reasonable range-based point estimates of the variance
components are

σ̂² = ( R̄ / d₂(K) )² ,

σ̂²_β = max( 0 , ( Δ̄ / d₂(J) )² − σ̂²/K )

and

σ̂²_α = max( 0 , ( Δ / d₂(I) )² − (1/J)( Δ̄ / d₂(J) )² ) .
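As a numerical sketch of these range-based estimators, the code below simulates balanced hierarchical data with illustrative true components (σ_α = 1.0, σ_β = 0.7, σ = 0.5, all assumptions for the example) and applies the formulas above; the d₂ values used are the standard control-chart constants for samples of sizes 3, 4 and 8.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 8, 4, 3
d2 = {3: 1.693, 4: 2.059, 8: 2.847}  # standard d2 control-chart constants
y = (rng.normal(0.0, 1.0, (I, 1, 1))     # alpha_i effects, sigma_alpha = 1.0
     + rng.normal(0.0, 0.7, (I, J, 1))   # beta_ij effects, sigma_beta = 0.7
     + rng.normal(0.0, 0.5, (I, J, K)))  # epsilon_ijk,     sigma      = 0.5

R_bar = (y.max(axis=2) - y.min(axis=2)).mean()        # mean of the I*J ranges R_ij
ybar_ij = y.mean(axis=2)
Delta_bar = (ybar_ij.max(axis=1) - ybar_ij.min(axis=1)).mean()  # mean of the Delta_i
ybar_i = ybar_ij.mean(axis=1)
Delta = ybar_i.max() - ybar_i.min()                   # range of the I level-A means

hat_sig2 = (R_bar / d2[K]) ** 2
hat_sig2_beta = max(0.0, (Delta_bar / d2[J]) ** 2 - hat_sig2 / K)
hat_sig2_alpha = max(0.0, (Delta / d2[I]) ** 2 - (Delta_bar / d2[J]) ** 2 / J)
print(hat_sig2, hat_sig2_beta, hat_sig2_alpha)
```

With only a single range Δ of I = 8 means available, σ̂²_α is (as one would expect) much noisier than σ̂².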
Now by applying linear model theory or reasoning from V&J displays (5.30)
and (5.32) and the fact that E s²_{ij} = σ², one can find expected values for the
mean squares above. These are

EMSA = KJσ²_α + Kσ²_β + σ² ,

EMSB(A) = Kσ²_β + σ²

and

EMSC(B(A)) = σ² .

And in a fashion completely parallel to the exposition in §1.4 of these notes,
standard linear model theory implies that the quantities

IJ(K−1)MSC(B(A))/EMSC(B(A)) ,  I(J−1)MSB(A)/EMSB(A)  and  (I−1)MSA/EMSA

are independent χ² random variables with respective degrees of freedom

IJ(K−1), I(J−1) and (I−1) .
Table 4.1: Balanced Data Hierarchical Random Effects Analysis ANOVA Table
(2 Levels of Nesting)

Source     SS          df         MS           EMS
A          SSA         I−1        MSA          KJσ²_α + Kσ²_β + σ²
B(A)       SSB(A)      I(J−1)     MSB(A)       Kσ²_β + σ²
C(B(A))    SSC(B(A))   IJ(K−1)    MSC(B(A))    σ²
Total      SSTot       IJK−1
These facts about sums of squares and mean squares for the hierarchical
random effects model are conveniently summarized in the usual (hierarchical
random effects model) ANOVA table (for two levels of nesting), Table 4.1.
Further, the fact that the expected mean squares are simple linear combinations
of the variance components σ²_α, σ²_β and σ² motivates the use of linear combinations
of mean squares in the estimation of the variance components (as in §5.5.3
of V&J). In fact (as indicated in §5.5.3 of V&J) the standard ANOVA-based
estimators

σ̂² = SSC(B(A)) / (IJ(K−1)) ,

σ̂²_β = (1/K) max( 0 , SSB(A)/(I(J−1)) − σ̂² )

and

σ̂²_α = (1/JK) max( 0 , SSA/(I−1) − SSB(A)/(I(J−1)) )

are exactly the estimators (described without using ANOVA notation) in
displays (5.29), (5.31) and (5.33) of V&J. The virtue of describing them in the
present terms is to suggest/emphasize that all that was said in §1.4 and §1.5
(in the gage R&R context) about making standard errors for functions of mean
squares and ANOVA-based confidence intervals for functions of variance
components is equally true in the present context.
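In the same spirit, the ANOVA-based estimators can be sketched directly from the sums of squares. The simulated data and true variance components below are invented for the illustration and are not from any V&J example.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 10, 5, 4
y = (rng.normal(0.0, 1.0, (I, 1, 1))              # A effects:    sigma_alpha^2 = 1.0
     + rng.normal(0.0, np.sqrt(0.5), (I, J, 1))   # B(A) effects: sigma_beta^2  = 0.5
     + rng.normal(0.0, 0.5, (I, J, K)))           # C(B(A)):      sigma^2       = 0.25

ybar_ij = y.mean(axis=2)
ybar_i = ybar_ij.mean(axis=1)
SSC = ((y - ybar_ij[:, :, None]) ** 2).sum()            # SSC(B(A))
SSB = K * ((ybar_ij - ybar_i[:, None]) ** 2).sum()      # SSB(A)
SSA = K * J * ((ybar_i - ybar_i.mean()) ** 2).sum()     # SSA

hat_sig2 = SSC / (I * J * (K - 1))
hat_sig2_beta = max(0.0, SSB / (I * (J - 1)) - hat_sig2) / K
hat_sig2_alpha = max(0.0, SSA / (I - 1) - SSB / (I * (J - 1))) / (J * K)
print(hat_sig2, hat_sig2_beta, hat_sig2_alpha)
```

The three estimates recover (up to sampling noise) the variance components used to generate the data, exactly as the expected mean squares in Table 4.1 predict.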
For example, the formula (1.3) of these notes can be applied to derive
standard errors for σ̂²_β and σ̂²_α immediately above. Or since

σ²_β = (1/K) EMSB(A) − (1/K) EMSC(B(A))

and

σ²_α = (1/JK) EMSA − (1/JK) EMSB(A)

are both of form (1.4), the material of §1.5 can be used to set confidence limits
for these quantities.
As a final note in this discussion of what is possible under the hierarchical
random effects model, it is worth noting that while the present discussion has
been confined to a balanced data framework, Problem 4.8 shows that at least
point estimation of variance components can be done in a fairly elementary
fashion even in unbalanced data contexts.
4.3 Finite Population Sampling and Balanced Hierarchical Structures

This brief subsection is meant to illustrate the kinds of things that can be done
with finite population sampling theory in terms of estimating overall variability
in a (balanced) hierarchical concrete population of items and dissecting that
variability.

Consider first a finite population consisting of NM items arranged into N
levels of A, with M levels of B within each level of A. (For example, there might
be N boxes, each containing M widgets. Or there might be N days, on each of
which M items are manufactured.) Let

y_{ij} = a measurement on the item at level i of A and level j of B within the
ith level of A (e.g. the diameter of the jth widget in the ith box) .

Suppose that the quantity of interest is the (grand) variance of all NM
measurements,

S² = (1/(NM − 1)) Σ_{i=1}^{N} Σ_{j=1}^{M} (y_{ij} − ȳ)² .

(This is clearly one quantification of overall variation.)
(This is clearly one quantication of overall variation.)
The usual one-way ANOVA identity applied to the NM numbers making up
the population of interest shows that the population variance can be expressed
as
S
2
=
1
NM 1

M(N 1)S
2
A
+N(M 1)S
2
B

where
S
2
A
=
1
N 1
N
X
i=1
( y
i
y
:
)
2
= the variance of the N A level means
and
S
2
B
=
1
N
N
X
i=1
0
@
1
M 1
M
X
j=1
(y
ij
y
i
)
2
1
A
= the average of the N within A level variances.
Suppose that one selects a simple random sample of n levels of A, and from each
selected level of A a simple random sample of m levels of B within A. (For example, one
might sample n boxes and m widgets from each box.) A naive way to estimate
S² is to simply use the sample variance

s² = (1/(nm − 1)) Σ (y_{ij} − ȳ*)²
where the sum is over the nm items selected and ȳ* is the mean of those
measurements. Unfortunately, this is not such a good estimator. Material from
Chapter 10 of Cochran's Sampling Techniques can be used to show that

E s² = ( m(n−1)/(nm−1) ) S²_A + ( n(m−1)/(nm−1) + ( m(n−1)/(nm−1) )( 1/m − 1/M ) ) S²_B ,

which is not in general equal to S².
However, it is possible to find a linear combination of the sample versions of
S²_A and S²_B that has expected value equal to the population variance. That is,
let

s²_A = (1/(n − 1)) Σ (ȳ*_i − ȳ*)² = the sample variance of the n sample means (from the sampled levels of A)

and

s²_B = (1/n) Σ [ (1/(m − 1)) Σ (y_{ij} − ȳ*_i)² ] = the average of the n sample variances (from the sampled levels of A) .

Then, it turns out that
E s²_A = S²_A + ( 1/m − 1/M ) S²_B

and

E s²_B = S²_B .

From this it follows that an unbiased estimator of S² is the quantity

( M(N−1)/(NM−1) ) s²_A + ( N(M−1)/(NM−1) − ( M(N−1)/(NM−1) )( 1/m − 1/M ) ) s²_B .
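A quick Monte Carlo check makes the unbiasedness claim concrete. The clustered population below is invented for the illustration (N = 10 "boxes" of M = 8 items, with n = 4 and m = 3 sampled); over repeated two-stage samples, the displayed combination of s²_A and s²_B averages out to the grand population variance S², while the naive s² would not.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, n, m = 10, 8, 4, 3
# invented clustered population: common box effects plus item noise
pop = rng.normal(0.0, 1.0, (N, M)) + rng.normal(0.0, 2.0, (N, 1))
S2 = np.var(pop, ddof=1)               # grand variance of all NM values

c1 = M * (N - 1) / (N * M - 1)
c2 = N * (M - 1) / (N * M - 1) - c1 * (1 / m - 1 / M)

ests = []
for _ in range(4000):                  # repeated two-stage samples
    rows = rng.choice(N, size=n, replace=False)
    samp = np.stack([pop[i, rng.choice(M, size=m, replace=False)]
                     for i in rows])
    s2A = np.var(samp.mean(axis=1), ddof=1)    # variance of the n sample means
    s2B = np.var(samp, axis=1, ddof=1).mean()  # average within-level variance
    ests.append(c1 * s2A + c2 * s2B)

print(np.mean(ests), S2)   # the two numbers should be close
```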
This kind of analysis can, of course, be carried beyond the case of a single
level of nesting. For example, consider the situation with two levels of nesting
(where both the finite population and the observed values have balanced
hierarchical structure). Then in the ANOVA notation of §4.2 above, take

s²_A = SSA/((I − 1)JK) ,

s²_B = SSB(A)/(I(J − 1)K)

and

s²_C = SSC(B(A))/(IJ(K − 1)) .
Let S²_A, S²_B and S²_C be the population analogs of s²_A, s²_B and s²_C, and let f_B and
f_C be the sampling fractions at the second and third stages of item selection.
Then it turns out that

E s²_A = S²_A + ( (1 − f_B)/J ) S²_B + ( (1 − f_C)/(JK) ) S²_C ,

E s²_B = S²_B + ( (1 − f_C)/K ) S²_C

and

E s²_C = S²_C .
So (since the grand population variance, S², is expressible as a linear combination
of S²_A, S²_B and S²_C, each of which can be estimated by a linear combination
of s²_A, s²_B and s²_C) an unbiased estimator of the population variance can be built
as an appropriate linear combination of s²_A, s²_B and s²_C.
Chapter 5

Sampling Inspection

Chapter 8 of V&J treats the subject of sampling inspection, introducing the
basic methods of acceptance sampling and continuous inspection. This chapter
extends that discussion somewhat. We consider how (in the fraction
nonconforming context) one can move from single sampling plans to quite general
acceptance sampling plans, we provide a brief discussion of the effects of
inspection/measurement error on the real (as opposed to nominal) statistical
properties of acceptance sampling plans, and then the chapter closes with an
elaboration of §8.5 of V&J, providing some more details on the matter of economic
arguments in the choice of sampling inspection schemes.

5.1 More on Fraction Nonconforming Acceptance Sampling

Section 8.1 of V&J (and for that matter §8.2 as well) confines itself to the
discussion of single sampling plans. For those plans, a sample size is fixed in
advance at some value n, and lot disposal is decided on the basis of inspection of
exactly n items. There are, however, often good reasons to consider acceptance
sampling plans whose ultimate sample size depends upon how the inspected
items look as they are examined. (One might, for example, want to consider
a "double sampling" plan that inspects an initial small sample, terminating
sampling if items look especially good or especially bad so that appropriate
lot disposal seems clear, but takes an additional larger sample if the initial
one looks inconclusive regarding the likely quality of the lot.) This section
considers fraction nonconforming acceptance sampling from the most general
perspective possible and develops the OC, ASN, AOQ and ATI for a general
fraction nonconforming plan.

Consider the possibility of inspecting one item at a time from a lot of N, and
after inspecting each successive item deciding to 1) stop sampling and accept
[Figure 5.1: Diagram for the n = 6, c = 2 Single Sampling Plan]
the lot, 2) stop sampling and reject the lot or 3) inspect another item. With

X_n = the number of nonconforming items found among the first n inspected,

a helpful way of thinking about the various different plans in this context is in
terms of possible paths through a grid of ordered pairs of integers (n, X_n) with
0 ≤ X_n ≤ n. Different acceptance sampling plans then amount to different
choices of "Accept Boundary" and "Reject Boundary." Figure 5.1 is a diagram
representing a single sampling plan with n = 6 and c = 2, Figure 5.2 is a diagram
representing a "doubly curtailed" version of this plan (one that recognizes that
there is no need to continue inspection after lot disposal has been determined)
and Figure 5.3 illustrates a double sampling plan in these terms.

Now on a diagram like those in the figures, one may very quickly count the
number of permissible paths from (0, 0) to a point in the grid by (working left
to right) marking each point (n, X_n) in the grid (that it is possible to reach)
with the sum of the numbers of paths reaching (n−1, X_n−1) and (n−1, X_n),
provided neither of those points is a stop-sampling point. (No feasible paths
leave a stop-sampling point. So path counts to them do not contribute to path
counts for any points to their right.) Figure 5.4 is a version of Figure 5.2 with
permissible movements through the (n, X_n) grid marked by arrows, and path
counts indicated.

The reason that one cares about the path counts is that for any stop-sampling
[Figure 5.2: Diagram for the Doubly Curtailed n = 6, c = 2 Single Sampling Plan]
[Figure 5.3: Diagram for a Small Double Sampling Plan]
[Figure 5.4: Diagram for the Doubly Curtailed Single Sampling Plan with Path Counts Indicated]
point (n, X_n), from perspective A

P[reaching (n, X_n)] = (path count from (0,0) to (n, X_n)) · C(N−n, Np−X_n)/C(N, Np) ,

(where C(a, b) denotes the binomial coefficient "a choose b") while from
perspective B

P[reaching (n, X_n)] = (path count from (0,0) to (n, X_n)) · p^{X_n} (1 − p)^{n−X_n} .

And these probabilities of reaching the various stop-sampling points are the
fundamental building blocks of the standard statistical characterizations of an
acceptance sampling plan.
For example, with A and R respectively the acceptance and rejection
boundaries, the OC for an arbitrary fraction nonconforming plan is

Pa = Σ_{(n,X_n)∈A} P[reaching (n, X_n)] .    (5.1)

And the mean number of items sampled (the Average Sample Number) is

ASN = Σ_{(n,X_n)∈A∪R} n · P[reaching (n, X_n)] .    (5.2)

Further, under the rectifying inspection scenario, from perspective B

AOQ = Σ_{(n,X_n)∈A} (1 − n/N) p · P[reaching (n, X_n)] ,    (5.3)

from perspective A

AOQ = Σ_{(n,X_n)∈A} (p − X_n/N) · P[reaching (n, X_n)]    (5.4)

and

ATI = N(1 − Pa) + Σ_{(n,X_n)∈A} n · P[reaching (n, X_n)] .    (5.5)
These formulas are conceptually very simple and quite universal. The fact
that specializing them to any particular choice of acceptance boundary and
rejection boundary might have been unpleasant when computations had to be
done by hand is largely irrelevant in today's world of plentiful, fast and cheap
computing. These simple formulas and a personal computer make completely
obsolete the many, many pages of specialized formulas that at one time filled
books on acceptance sampling.
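To illustrate how little code the formulas require, the sketch below enumerates the stop-sampling points and path counts for the doubly curtailed n = 6, c = 2 plan of Figure 5.4 and evaluates the perspective-B ("binomial") versions of Pa, ASN and ATI; the particular p and lot size N in the usage line are arbitrary illustrative choices.

```python
def curtailed_plan(n_max, c):
    """Stop-sampling points and path counts for a doubly curtailed single
    sampling plan: reject as soon as X_n reaches c + 1, accept as soon as
    n_max - c conforming items have been seen (compare Figure 5.4)."""
    counts = {(0, 0): 1}
    stops = []                        # (n, X_n, path count, 'A' or 'R')
    for _ in range(n_max):
        new = {}
        for (k, x), cnt in counts.items():
            for dx in (0, 1):         # next item conforming (0) / nonconforming (1)
                pt = (k + 1, x + dx)
                new[pt] = new.get(pt, 0) + cnt
        counts = {}
        for (k, x), cnt in new.items():
            if x == c + 1:
                stops.append((k, x, cnt, 'R'))
            elif k - x == n_max - c:
                stops.append((k, x, cnt, 'A'))
            else:
                counts[(k, x)] = cnt
    return stops

def reach_prob(k, x, cnt, p):         # perspective-B reach probability
    return cnt * p**x * (1 - p)**(k - x)

def characteristics(stops, p, N):     # formulas (5.1), (5.2) and (5.5)
    Pa = sum(reach_prob(k, x, c_, p) for k, x, c_, d in stops if d == 'A')
    ASN = sum(k * reach_prob(k, x, c_, p) for k, x, c_, d in stops)
    ATI = N * (1 - Pa) + sum(k * reach_prob(k, x, c_, p)
                             for k, x, c_, d in stops if d == 'A')
    return Pa, ASN, ATI

stops = curtailed_plan(6, 2)
print(characteristics(stops, 0.1, 100))
```

A pleasant cross-check: curtailment changes the ASN but not the OC, so Pa here must equal the uncurtailed binomial probability of 2 or fewer nonconforming items in 6.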
Two other matters of interest remain to be raised regarding this general
approach to fraction nonconforming acceptance sampling. The first concerns
the difficult mathematical question "What are good shapes for the accept and
reject boundaries?" We will talk a bit in the final section of this chapter about
criteria upon which various plans might be compared and allude to how one
might try to find a best plan (best shapes for the acceptance and rejection
boundaries) according to such criteria. But at this point, we wish only to
note that Abraham Wald, working in the 1940s on the problem of sequential
testing, developed some approximate theory that suggests that parallel straight
line boundaries (the acceptance boundary below the rejection boundary) have
some attractive properties. He was even able to provide some approximate
two-point design criteria. That is, in order to produce a plan whose OC curve
runs approximately through the points (p_1, Pa_1) and (p_2, Pa_2) (for p_1 < p_2 and
Pa_1 > Pa_2), Wald suggested linear stop-sampling boundaries with
slope = ln( (1−p_1)/(1−p_2) ) / ln( p_2(1−p_1)/(p_1(1−p_2)) ) .    (5.6)

An appropriate X_n-intercept for the acceptance boundary is approximately

h_A = − ln( Pa_1/Pa_2 ) / ln( p_2(1−p_1)/(p_1(1−p_2)) ) ,    (5.7)

while an appropriate X_n-intercept for the rejection boundary is approximately

h_R = ln( (1−Pa_2)/(1−Pa_1) ) / ln( p_2(1−p_1)/(p_1(1−p_2)) ) .    (5.8)
Wald actually derived formulas (5.6) through (5.8) under "infinite lot size"
assumptions (that also allowed him to produce some approximations for both the
OC and ASN of his plans). Where one is thinking of applying Wald's boundaries
in acceptance sampling of a real (finite N) lot, the question of exactly how to
truncate the sampling (close in the right side of the continue-sampling region)
[Figure 5.5: Path Counts from (1,1) to Stop-Sampling Points for the Plan of Figure 5.4]
must be answered in some sensible fashion. And once that is done, the basic
formulas (5.1) through (5.5) are of course relevant to describing the resulting
plan. (See Problem 5.4 for an example of this kind of logic in action.)
Finally, it is an interesting side-light here (that can come into play if one
wishes to estimate p based on data from something other than a single sampling
plan) that, provided the stop-sampling boundary has exactly one more point in it
than the largest possible value of n, the uniformly minimum variance unbiased
estimator of p for both type A and type B contexts is (for (n, X_n) a stop-sampling point)

\[
\hat{p}\left((n, X_n)\right) = \frac{\text{path count from } (1,1) \text{ to } (n, X_n)}{\text{path count from } (0,0) \text{ to } (n, X_n)}.
\]
For example, Figure 5.5 shows the path counts from (1,1) needed (in conjunction
with the path counts indicated in Figure 5.4) to find the uniformly minimum
variance unbiased estimator of p when the doubly curtailed single sampling plan
of Figure 5.4 is used.
Table 5.1 lists the values of p̂ for the 7 points in the stop-sampling boundary
for the doubly curtailed single sampling plan with n = 6 and c = 2, along with
the corresponding values of X_n/n (the maximum likelihood estimator of p).
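The path counts defining p̂ are easy to generate mechanically. Below is a sketch in Python (function and variable names are ours) that reproduces Table 5.1 by forward counting over the continuation region of the doubly curtailed n = 6, c = 2 plan:

```python
from fractions import Fraction

def umvue_table(n_max=6, c=2):
    """UMVUE of p for a doubly curtailed single sampling plan: sampling
    stops with rejection once X_n = c + 1, and with acceptance once
    n_max - c conforming items have been seen."""
    def stops(n, x):
        return x == c + 1 or n - x == n_max - c

    def path_counts(start_n, start_x):
        # forward count of inspection sequences from the start point to
        # each stop-sampling point, moving only through continuation points
        cont = {(start_n, start_x): 1}
        stop = {}
        for n in range(start_n, n_max):
            layer = [(pt, v) for pt, v in cont.items() if pt[0] == n]
            for (m, x), v in layer:
                for dx in (0, 1):  # next item conforming (0) or defective (1)
                    nxt = (n + 1, x + dx)
                    target = stop if stops(*nxt) else cont
                    target[nxt] = target.get(nxt, 0) + v
        return stop

    denom = path_counts(0, 0)  # path counts from (0,0), as in Figure 5.4
    numer = path_counts(1, 1)  # path counts from (1,1), as in Figure 5.5
    return {pt: Fraction(numer.get(pt, 0), denom[pt]) for pt in denom}

table = umvue_table()
```

Running this recovers, for example, table[(6, 2)] = 4/10 and table[(4, 0)] = 0/1, in agreement with Table 5.1.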
5.2 Imperfect Inspection and Acceptance Sampling
The nominal statistical properties of sampling inspection procedures are "perfect
inspection" properties. The OC formulas for the attributes plans in §8.1
and §8.4 of V&J and §5.1 above are really premised on the ability to tell with
certainty whether an inspected item is conforming or nonconforming. And the
OC formulas for the variables plans in §8.2 of V&J are premised on an assumption
that the measurement x that determines whether an item is conforming or
Table 5.1: The UMVUE and MLE of p for the Doubly Curtailed Single Sampling
Plan

Stop-sampling point (n, X_n)    UMVUE, p̂    MLE, X_n/n
(3, 3)                          1/1          3/3
(4, 0)                          0/1          0/4
(4, 3)                          2/3          3/4
(5, 1)                          1/4          1/5
(5, 3)                          3/6          3/5
(6, 2)                          4/10         2/6
(6, 3)                          4/10         3/6

Table 5.2: Perspective B Description of a Single Inspection Allowing for
Inspection Error

                            Inspection Result
                            G                   D
Actual       G     (1 - w_G)(1 - p)    w_G(1 - p)         1 - p
Condition    D     p w_D               p(1 - w_D)         p
                   1 - p*              p*

nonconforming can be obtained for a given item completely without measurement
error. But the truth is that real-world inspection is not perfect and the
nominal statistical properties of these methods at best approximate their actual
properties. The purpose of this section is to investigate (first in the attributes
context and then in the variables context) just how far actual OC values for
common acceptance sampling plans can be from nominal ones.
Consider first the percent defective context and suppose that when a conforming
(good) item is inspected, there is a probability w_G of misclassifying it as
nonconforming. Similarly, suppose that when a nonconforming (defective) item
is inspected, there is a probability w_D of misclassifying it as conforming. Then
from perspective B, a probabilistic description of any single inspected item is
given in Table 5.2, where in that table we are using the abbreviation

\[
p^* = w_G(1 - p) + p(1 - w_D)
\]

for the probability that an item (of unspecified actual condition) is classified as
nonconforming by the inspection process.
It should thus be obvious that from perspective B in the fraction nonconforming
context, an attributes single sampling plan with sample size n and
acceptance number c has an actual acceptance probability that depends not
only on p but on w_G and w_D as well, through the formula

\[
Pa(p, w_G, w_D) = \sum_{x=0}^{c} \binom{n}{x} (p^*)^x (1 - p^*)^{n-x}. \qquad (5.9)
\]
On the other hand, the perspective A version of the fraction nonconforming
scenario yields the following. For an integer x from 0 to n, let U_x and V_x be
independent random variables,

\[
U_x \sim \text{Binomial}(x,\, 1 - w_D) \quad \text{and} \quad V_x \sim \text{Binomial}(n - x,\, w_G),
\]

and let

\[
r_x = P[U_x + V_x \le c]
\]

be the probability that a sample containing x nonconforming items actually
passes the lot acceptance criterion. (Note that the nonstandard distribution of
U_x + V_x can be generated using the same "adding on diagonals of a table of joint
probabilities" idea used in §1.7.1 to generate the distribution of x̄.) Then it is
evident that from perspective A an attributes single sampling plan with sample
size n and acceptance number c has an actual acceptance probability

\[
Pa(p, w_G, w_D) = \sum_{x=0}^{n} \frac{\binom{Np}{x}\binom{N(1-p)}{n-x}}{\binom{N}{n}}\, r_x. \qquad (5.10)
\]
It is clear that nonzero w_G or w_D change the nominal OCs given in displays (8.6)
and (8.5) of V&J into the possibly more realistic versions given respectively by
equations (5.9) and (5.10) here. In some cases, it may be possible to determine
w_G and w_D experimentally and therefore derive both nominal and real OC
curves for a fraction nonconforming single sampling plan. Or, if one were a
priori willing to guarantee that 0 ≤ w_G ≤ a and that 0 ≤ w_D ≤ b, it is pretty
clear that from perspective B one might then at least guarantee that

\[
Pa(p, a, 0) \le Pa(p, w_G, w_D) \le Pa(p, 0, b) \qquad (5.11)
\]

and have an OC band in which the real OC (that depends upon the unknown
inspection efficacy) is guaranteed to lie.
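Display (5.9) and the band (5.11) can be computed directly. A minimal sketch (function names are ours):

```python
from math import comb

def pa_fraction_nc(p, n, c, wG=0.0, wD=0.0):
    """Perspective B actual acceptance probability, display (5.9):
    a Binomial(n, p*) left tail with p* = wG*(1 - p) + p*(1 - wD)."""
    p_star = wG * (1 - p) + p * (1 - wD)
    return sum(comb(n, x) * p_star ** x * (1 - p_star) ** (n - x)
               for x in range(c + 1))

def oc_band(p, n, c, a, b):
    """Bounds (5.11), usable when all that is known is wG <= a, wD <= b."""
    return pa_fraction_nc(p, n, c, a, 0.0), pa_fraction_nc(p, n, c, 0.0, b)
```

Setting wG = wD = 0 recovers the nominal type B OC of display (8.6) of V&J.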
Similar analyses can be done for "nonconformities per unit" contexts as follows.
Suppose that during inspection of product, real nonconformities are missed
with probability m and that (independent of the occurrence and inspection
of real nonconformities) "phantom" nonconformities are observed according
to a Poisson process with rate λ_P per unit inspected. Then from perspective B
in a nonconformities per unit context, the number of nonconformities observed
on k units is Poisson with mean

\[
k\left(\lambda(1 - m) + \lambda_P\right),
\]

so that an actual acceptance probability corresponding to the nominal one given
in display (8.8) of V&J is

\[
Pa(\lambda, \lambda_P, m) = \sum_{x=0}^{c} \frac{\exp\left(-k(\lambda(1-m)+\lambda_P)\right)\left(k(\lambda(1-m)+\lambda_P)\right)^x}{x!}. \qquad (5.12)
\]
And from perspective A, with a realized per unit defect rate λ on the N units,
let

\[
U_{\lambda,m} \sim \text{Binomial}\!\left(N\lambda,\ \frac{k}{N}(1 - m)\right) \quad \text{be independent of} \quad V_{\lambda_P} \sim \text{Poisson}(k\lambda_P).
\]

Then an actual acceptance probability corresponding to the nominal one given
in display (8.7) of V&J is

\[
Pa(\lambda, \lambda_P, m) = P[U_{\lambda,m} + V_{\lambda_P} \le c]. \qquad (5.13)
\]
And the same kinds of bounding ideas used above for the fraction nonconforming
context might be used with the OC (5.12) in the mean nonconformities per unit
context. Pretty clearly, if one could guarantee that λ_P ≤ a and that m ≤ b, one
would have (from display (5.12))

\[
Pa(\lambda, a, 0) \le Pa(\lambda, \lambda_P, m) \le Pa(\lambda, 0, b) \qquad (5.14)
\]

in the perspective B situation.
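Display (5.12) is equally simple to evaluate, and plugging in the extreme values of λ_P and m yields the band (5.14). A sketch (names ours):

```python
from math import exp, factorial

def pa_nonconf_per_unit(lam, k, c, lam_P=0.0, m=0.0):
    """Perspective B actual acceptance probability, display (5.12):
    a Poisson left tail with mean k*(lam*(1 - m) + lam_P)."""
    mu = k * (lam * (1 - m) + lam_P)
    return sum(exp(-mu) * mu ** x / factorial(x) for x in range(c + 1))
```

Here lam_P = m = 0 recovers the nominal OC of display (8.8) of V&J.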
The violence done to the OC notion by the possibility of imperfect inspection
in an attributes sampling context is serious, but not completely unmanageable.
That is, where one can determine the likelihood of inspection errors
experimentally, expressions (5.9), (5.10), (5.12) and (5.13) are simple enough
characterizations of real OCs. And where w_G and w_D (or λ_P and m) are small,
bounds like (5.11) (or (5.14)) show that both the nominal (the w_G = 0 and
w_D = 0, or λ_P = 0 and m = 0 case) OC and real OC are trapped in a fairly
narrow band and can not be too different. Unfortunately, the situation is far
less happy in the variables sampling context.
The origin of the difficulty with admitting there is measurement error when
it comes to variables acceptance sampling is the fundamental fact that standard
variables plans attempt to treat all (μ, σ) pairs with the same value of p equally.
And in short, once one admits to the possibility of measurement error clouding
the evaluation of the quantity x that must say whether a given item is conforming
or nonconforming, that goal is unattainable. For any level of measurement
error, there are (μ, σ) pairs (with very small σ) for which product variation can,
so to speak, "hide in the measurement noise." So some fairly bizarre real OC
properties result for standard plans.

To illustrate, consider the case of "unknown σ" variables acceptance sampling
with a lower specification L, and adopt the basic measurement model
(2.1) of V&J for what is actually observed when an item with characteristic x
is measured. Now the development in §8.2 of V&J deals with a normal (μ, σ)
distribution for observations. An important issue is "What observations?" Is it
the x's or the y's of the model (2.1)? It must be the x's, for the simple reason
that p is defined in terms of μ and σ. These parameters describe what the lot
is really like, NOT what it looks like when measured with error. That is, the
σ of §8.2 of V&J must be the σ_x of page 19 of V&J. But then the analysis of
§8.2 is done essentially supposing that one has at his or her disposal x̄ and s_x
to use for decision making purposes, while all that is really available are ȳ and
s_y!!! And that turns out to make a huge difference in the real OC properties of
the standard method put forth in §8.2.

That is, applying criterion (8.35) of V&J to what can really be observed
(namely the noise-corrupted y's) one accepts a lot iff

\[
\bar{y} - L \ge k s_y. \qquad (5.15)
\]
And under model (2.1) of V&J, a given set of parameters (μ_x, σ_x) for the x
distribution has corresponding fraction nonconforming

\[
p(\mu_x, \sigma_x) = \Phi\!\left(\frac{L - \mu_x}{\sigma_x}\right)
\]

and acceptance probability

\[
Pa(\mu_x, \sigma_x; \beta, \sigma_{\mathrm{measurement}})
= P\!\left[\frac{\bar{y} - L}{s_y} \ge k\right]
= P\!\left[\frac{\dfrac{\bar{y} - \mu_y}{\sigma_y/\sqrt{n}} - \dfrac{L - \mu_y}{\sigma_y/\sqrt{n}}}{s_y/\sigma_y} \ge k\sqrt{n}\right],
\]

where μ_y is given in display (2.3) of V&J. But then let

\[
\delta = -\frac{L - \mu_y}{\sigma_y/\sqrt{n}}
= -\frac{(L - \mu_x - \beta)/\sigma_x}{\sqrt{1 + \sigma^2_{\mathrm{measurement}}/\sigma^2_x}\,\Big/\sqrt{n}}, \qquad (5.16)
\]

and note that

\[
\frac{\bar{y} - \mu_y}{\sigma_y/\sqrt{n}} \sim \text{Normal}(0, 1)
\]

independent of s_y/σ_y, which has the distribution of \(\sqrt{U/(n-1)}\) for U a χ²_{n−1}
random variable. That is, with W a noncentral t random variable with noncentrality
parameter δ given in display (5.16), we have

\[
Pa(\mu_x, \sigma_x; \beta, \sigma_{\mathrm{measurement}}) = P[W \ge k\sqrt{n}].
\]
And the crux of the matter is that (even if the measurement bias, β, is 0) the δ in
display (5.16) is not a function of (L − μ_x)/σ_x alone unless one assumes that
σ_measurement is EXACTLY 0.

Even with no measurement bias, if σ_measurement ≠ 0 there are (μ_x, σ_x) pairs
with

\[
\frac{L - \mu_x}{\sigma_x} = -z
\]

(and therefore p = Φ(−z)) and δ ranging all the way from z√n to 0. Thus,
considering z ≥ 0 and p ≤ .5, there are corresponding Pa's ranging from

\[
P\left[\text{a } t_{n-1} \text{ random variable} \ge k\sqrt{n}\right]
\]

to

\[
P\left[\text{a non-central } t_{n-1}(z\sqrt{n}) \text{ random variable} \ge k\sqrt{n}\right]
\]

(the nominal OC), while considering z ≤ 0 and p ≥ .5 there are corresponding
Pa's ranging from (the nominal OC)

\[
P\left[\text{a non-central } t_{n-1}(z\sqrt{n}) \text{ random variable} \ge k\sqrt{n}\right]
\]
to

\[
P\left[\text{a } t_{n-1} \text{ random variable} \ge k\sqrt{n}\right].
\]

That is, one is confronted with the extremely unpleasant (and initially
counterintuitive) picture of real OC indicated in Figure 5.6.

[Figure 5.6 (a sketch of Pa(p) versus p) appears here.]

Figure 5.6: Typical Real OC for a One-Sided Variables Acceptance Sampling
Plan in the Presence of Nonzero Measurement Error
It is important to understand the picture painted in Figure 5.6. The situation
is worse than in the attributes data case. There, if one knows the efficacy of
the inspection methodology it is at least possible to pick a single appropriate
OC curve. (The OC "bands" indicated by displays (5.11) and (5.14) are created
only by ignorance of inspection efficacy.) The bizarre OC bands created in
the variables context (and sketched in Figure 5.6) do not reduce to curves if one
knows the inspection bias and precision, but rather are intrinsic to the fact that
unless σ_measurement is exactly 0, different (μ, σ) pairs with the same p must have
different Pa's under acceptance criterion (5.15). And the only way that one can
replace the situation pictured in Figure 5.6 with one having a thinner and more
palatable OC "band" (something approximating a "curve") is by guaranteeing
that

\[
\frac{\sigma^2_x}{\sigma^2_{\mathrm{measurement}}}
\]

is of some appreciable size. That is, given a particular measurement precision,
one must agree to concern oneself only with cases where product variation cannot
"hide in measurement noise." Such is the only way that one can even come close
to the variables sampling goal of treating (μ, σ) pairs with the same p equally.
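The band of possible real OC values can be traced numerically. Below is a stdlib-only sketch (names ours) that simulates the noncentral t of display (5.16) from its definition; the variance ratio σ²_measurement/σ²_x equal to 0 recovers the nominal OC, while larger ratios let product variation hide in the measurement noise and move Pa drastically:

```python
import math
import random

def pa_real(z, n, k, var_ratio, reps=40000, seed=7):
    """P[W >= k*sqrt(n)] for W a noncentral t_{n-1} variable with
    noncentrality delta = z*sqrt(n)/sqrt(1 + var_ratio), where
    var_ratio = sigma_measurement^2 / sigma_x^2 and p = Phi(-z).
    W is simulated as (Z + delta)/sqrt(chi2_{n-1}/(n-1))."""
    rng = random.Random(seed)
    delta = z * math.sqrt(n) / math.sqrt(1 + var_ratio)
    crit = k * math.sqrt(n)
    hits = 0
    for _ in range(reps):
        z0 = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1))
        if (z0 + delta) / math.sqrt(chi2 / (n - 1)) >= crit:
            hits += 1
    return hits / reps

# fix p = Phi(-2) and compare the nominal OC value to a noise-corrupted one
nominal = pa_real(2.0, 10, 1.5, 0.0)
corrupted = pa_real(2.0, 10, 1.5, 4.0)
```

Both numbers correspond to exactly the same fraction nonconforming p, which is precisely the pathology pictured in Figure 5.6.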
5.3 Some Details Concerning the Economic Analysis of Sampling Inspection
Section 8.5 of V&J alludes briefly to the possibility of using economic/decision-theoretic
arguments in the choice of sampling inspection schemes and cites the
1994 Technometrics paper of Vander Wiel and Vardeman. Our first objective
in this section is to provide some additional details of the Vander Wiel and
Vardeman analysis. To that end, consider a stable process fraction nonconforming
situation and continue the w_G and w_D notation used above (and also
introduced on page 493 of V&J). Note that Table 5.2 remains an appropriate
description of the results of a single inspection. We will suppose that inspection
costs are accrued on a per item basis and adopt the notation of Table 8.16 of
V&J for the costs.
As a vehicle to a very quick demonstration of the famous "all or none"
principle, consider facing N potential inspections and employing a random
inspection policy that inspects each item independently with probability π.
Then the mean cost suffered over N items is simply N times that suffered for 1
item. And this is

\[
\begin{aligned}
E\,\text{Cost} &= \pi\left(k_I + (1 - p)w_G k_{GF} + p(1 - w_D)k_{DF} + p w_D k_{DP}\right) + (1 - \pi)p k_{DU} \\
&= \pi\left(k_I + w_G k_{GF} - pK\right) + p k_{DU} \qquad (5.17)
\end{aligned}
\]

for

\[
K = (1 - w_D)(k_{DU} - k_{DF}) + w_D(k_{DU} - k_{DP}) + w_G k_{GF}
\]
(as in display (8.50) of V&J). Now it is clear from display (5.17) that if K < 0,
ECost is minimized over choices of π by the choice π = 0. On the other hand,
if K > 0, ECost is minimized over choices of π by the choice π = 0 if

\[
p \le \frac{k_I + w_G k_{GF}}{K}
\]

and by the choice π = 1 if

\[
p \ge \frac{k_I + w_G k_{GF}}{K}.
\]

That is, if one defines

\[
p_c = \begin{cases} 1 & \text{if } K \le 0 \\ \dfrac{k_I + w_G k_{GF}}{K} & \text{if } K > 0, \end{cases}
\]

then an optimal random inspection policy is clearly

\[
\pi = 0 \ (\text{do no inspection}) \text{ if } p < p_c \quad \text{and} \quad \pi = 1 \ (\text{inspect everything}) \text{ if } p > p_c.
\]
This development is simple and completely typical of what one gets from economic
analyses of stable process (perspective B) inspection scenarios. Where
quality is poor, all items should be inspected, and where it is good none should
be inspected. Vander Wiel and Vardeman argue that the specific criterion developed
here (and phrased in terms of p_c) holds not only as one looks for an
optimal random inspection policy, but completely generally as one looks among
all possible inspection policies for one that minimizes expected total cost. But
it is essential to remember that the context is a stable process/perspective B
context, where costs are accrued on a per item basis, and in order to implement
the optimal policy one must know p! In other contexts, the best (minimum
expected cost) implementable/realizable policy will often turn out to not be of
the "all or none" variety. The remainder of this section will elaborate on this
assertion.
For the balance of the section we will consider (Barlow's formulation of)
what we'll call the "Deming Inspection Problem" (as Deming's consideration
of this problem rekindled interest in these matters and engendered considerable
controversy and confusion in the 1980s and early 1990s). That is, we'll consider
a lot of N items, assume a cost structure where

k_1 = the cost of inspecting one item (at the proposed inspection site)

and

k_2 = the cost of later "grief" caused by a defective item that is not detected,

and suppose that inspection is without error. (This is the Vander Wiel and
Vardeman cost structure with k_I = k_1, k_DF = 0 and k_DU = k_2, where both
w_G and w_D are assumed to be 0.) The objective will be optimal (minimum
expected cost) choice of a "fixed n" inspection plan (in the language of §8.1
of V&J, a "single sampling with rectification" plan). That is, we'll consider the
optimal choice of n and c supposing that with

X = the number nonconforming in a sample of n,

if X ≤ c the lot will be "accepted" (all nonconforming items in the sample
will be replaced with good ones and no more inspection will be done), while
if X > c the lot will be "rejected" (all items in the lot will be inspected and
all nonconforming items replaced with good ones). (The implicit assumption
here is that replacements for nonconforming items are somehow known to be
conforming and are produced for free.) And we will continue use of the stable
process or perspective B model for the generation of the items in the lot.
In this problem, the expected total cost associated with the lot is a function
of n, c and p,

\[
\begin{aligned}
ETC(n, c, p) &= k_1 n + (1 - Pa(n, c, p))k_1(N - n) + p\,Pa(n, c, p)k_2(N - n) \\
&= k_1 N\left(1 + Pa(n, c, p)\left(1 - \frac{n}{N}\right)\left(p\frac{k_2}{k_1} - 1\right)\right). \qquad (5.18)
\end{aligned}
\]

Optimal choice of n and c requires that one be in the business of comparing the
functions of p defined in display (5.18). How one approaches that comparison
depends upon what one is willing to input into the decision process in terms of
information about p.
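Display (5.18) can be tabulated directly for candidate (n, c) pairs. A minimal sketch (function names are ours):

```python
from math import comb

def pa(n, c, p):
    # binomial (perspective B) acceptance probability for the fixed-n plan
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(c + 1))

def etc(n, c, p, N, k1, k2):
    """Expected total cost, display (5.18)."""
    return k1 * N * (1 + pa(n, c, p) * (1 - n / N) * (p * k2 / k1 - 1))
```

For instance, etc(0, 0, p, N, k1, k2) reduces to N*p*k2 (no inspection, only later grief) and etc(N, c, p, N, k1, k2) reduces to k1*N (everything inspected and rectified), the two "all or none" extremes compared in the next paragraph.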
First, if p is fixed/known and available for use in choosing n and c, the optimization
of criterion (5.18) is completely straightforward. It amounts only to
the comparison of numbers (one for each (n, c) pair), not functions. And the
solution is quite simple. In the case that p > k_1/k_2, we have (p k_2/k_1 − 1) > 0, and from
examination of display (5.18) minimum expected total cost will be achieved if
Pa(n, c, p) = 0 or if (1 − n/N) = 0. That is, "all" is optimal. In the case that
p < k_1/k_2, we have (p k_2/k_1 − 1) < 0, and from examination of formula (5.18) minimum
expected total cost will be achieved if Pa(n, c, p) = 1 and (1 − n/N) = 1. That
is, "none" is optimal. This is a manifestation of the general Vander Wiel and
Vardeman result. For known p in this kind of problem, sampling/partial inspection
makes no sense. One is not going to learn anything about p from the
sampling. Simple economics (comparison of p to the critical cost ratio k_1/k_2)
determines whether it is best to inspect and rectify, or to take one's lumps in
later costs.
When one may not assume that p is fixed/known (and it is thus unavailable
for use in choosing an optimal (n, c) pair) some other approach has to be
taken. One possibility is to describe p with a probability distribution G, average
ETC(n, c, p) over p according to that distribution to get E_G ETC(n, c), and then
to compare numbers (one for each (n, c) pair) to identify an optimal inspection
plan. This makes sense

1. from a Bayesian point of view, where the distribution G reflects one's
prior beliefs about p, or

2. from a non-Bayesian point of view, where the distribution G is a "process
distribution" describing how p is thought to vary lot to lot.

The program SAMPLE (written by Tom Lorenzen and modified slightly by
Steve Crowder) available off the Stat 531 Web page will do this averaging and
optimization for the case where G is a Beta distribution.
Consider what insights into this "average out according to G" idea can be
written down in more or less explicit form. In particular, consider first the
problem of choosing a best c for a particular n, say c_G^opt(n). Note that if a
sample of n results in x nonconforming items, the (conditional) expected cost
incurred is

\[
nk_1 + (N - n)k_2\, E_G[p \mid X = x] \quad \text{with no more inspection}
\]

and

\[
Nk_1 \quad \text{if the remainder of the lot is inspected.}
\]

(Note that the form of the conditional mean of p given X = x depends upon
the distribution G.) So, one should do no more inspection if

\[
nk_1 + (N - n)k_2\, E_G[p \mid X = x] < Nk_1,
\]

i.e. if

\[
E_G[p \mid X = x] < \frac{k_1}{k_2},
\]
and the remaining items should be inspected if

\[
E_G[p \mid X = x] > \frac{k_1}{k_2}.
\]

So, an optimal choice of c is

\[
c_G^{opt}(n) = \max\left\{ x \mid E_G[p \mid X = x] \le \frac{k_1}{k_2} \right\}. \qquad (5.19)
\]

(And it is perhaps comforting to know that the monotone likelihood ratio property
of the binomial distribution guarantees that E_G[p | X = x] is monotone in
x.)

What is this saying? The assumptions 1) that p ∼ G and 2) that conditional
on p the variable X ∼ Binomial(n, p) together give a joint distribution for p and
X. This in turn can be used to produce for each x a conditional distribution
of p | X = x and therefore a conditional mean value of p given that X = x.
The prescription (5.19) says that one should find the largest x for which that
conditional mean value of p is still less than the critical cost ratio and use that
value for c_G^opt(n). To complete the optimization of E_G ETC(n, c, p), one
would then need to compute and compare (for various n) the quantities

\[
E_G\, ETC(n, c_G^{opt}(n), p). \qquad (5.20)
\]
The fact is that depending upon the nature of G, the minimizer of quantity
(5.20) can turn out to be anything from 0 to N. For example, if G puts all its
probability on one side or the other of k_1/k_2, then the conditional distributions
of p given X = x must concentrate all their probability (and therefore have
their means) on that same side of the critical cost ratio. So it follows that if G
puts all its probability to the left of k_1/k_2, "none" is optimal (even though one
doesn't know p exactly), while if G puts all its probability to the right of k_1/k_2,
"all" is optimal in terms of optimizing E_G ETC(n, c, p).
On the other hand, consider an unrealistic but instructive situation where
k_1 = 1, k_2 = 1000 and G places probability 1/2 on the possibility that p = 0 and
probability 1/2 on the possibility that p = 1. Under this model the lot is either
perfectly good or perfectly bad, and a priori one thinks these possibilities are
equally likely. Here the distribution G places probability on both sides of the
breakeven quantity k_1/k_2 = .001. Even without actually carrying through the
whole mathematical analysis, it should be clear that in this scenario the optimal
n is 1! Once one has inspected a single item, he or she knows for sure whether
p is 0 or is 1 (and the lot can be rectified in the latter case).
The most common mathematically nontrivial version of this whole analysis
of the Deming Inspection Problem is the case where G is a Beta distribution.
If G is the Beta(α, β) distribution,

\[
E_G[p \mid X = x] = \frac{\alpha + x}{\alpha + \beta + n},
\]

so that c_G^opt(n) is the largest value of x such that

\[
\frac{\alpha + x}{\alpha + \beta + n} \le \frac{k_1}{k_2}.
\]

That is, in this situation, for ⌊y⌋ the greatest integer in y,

\[
c_G^{opt}(n) = \left\lfloor \frac{k_1}{k_2}(\alpha + \beta + n) - \alpha \right\rfloor = \left\lfloor \frac{k_1}{k_2}n + \frac{k_1}{k_2}(\alpha + \beta) - \alpha \right\rfloor,
\]

which for large n is essentially (k_1/k_2)n. The optimal value of n can then be found
by optimizing (over choice of n) the quantity

\[
E_G\left[ETC(n, c_G^{opt}(n), p)\right] = \int_0^1 ETC(n, c_G^{opt}(n), p)\, \frac{1}{B(\alpha, \beta)}\, p^{\alpha-1}(1-p)^{\beta-1}\, dp.
\]
The reader can check that this exercise boils down to the minimization over n
of

\[
\left(1 - \frac{n}{N}\right) \sum_{x=0}^{c_G^{opt}(n)} \binom{n}{x} \int_0^1 p^x (1-p)^{n-x} \left(p\frac{k_2}{k_1} - 1\right) p^{\alpha-1}(1-p)^{\beta-1}\, dp.
\]

(The SAMPLE program of Lorenzen alluded to earlier actually uses a different
approach than the one discussed here to find optimal plans. That approach is
computationally more efficient, but not as illuminating in terms of laying bare
the basic structure of the problem as the route taken in this exposition.)
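For Beta G the whole optimization is a short computation, since each integral in the minimization criterion above is a difference of Beta functions. A sketch (names ours; the 1/B(α, β) normalization is dropped, which does not affect the minimizing n):

```python
from math import comb, exp, lgamma, floor

def beta_fn(a, b):
    # Beta function via log-gammas (avoids factorial overflow)
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def c_opt(n, alpha, beta, k1, k2):
    # largest x with (alpha + x)/(alpha + beta + n) <= k1/k2, per (5.19)
    return floor((k1 / k2) * (alpha + beta + n) - alpha)

def criterion(n, N, alpha, beta, k1, k2):
    """Quantity minimized over n in the display above; the integral
    for each x equals (k2/k1)B(a+x+1, b+n-x) - B(a+x, b+n-x)."""
    c = c_opt(n, alpha, beta, k1, k2)
    total = 0.0
    for x in range(min(c, n) + 1):
        integral = (k2 / k1) * beta_fn(alpha + x + 1, beta + n - x) \
                   - beta_fn(alpha + x, beta + n - x)
        total += comb(n, x) * integral
    return (1 - n / N) * total

def optimal_n(N, alpha, beta, k1, k2):
    return min(range(N + 1), key=lambda n: criterion(n, N, alpha, beta, k1, k2))
```

As a sanity check, a Beta(1, 99) prior (mean .01) with critical cost ratio k_1/k_2 = .1 sits essentially entirely to the left of the breakeven point, and the optimizer accordingly returns n = 0 ("none").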
As two final pieces of perspective on this topic of economic analysis of sampling
inspection we offer the following. In the first place, while the Deming
Inspection Problem is not a terribly general formulation of the topic, the results
here are typical of how things turn out. Second, it needs to be remembered that
what has been described here is the finding of a cost-optimal fixed n inspection
plan. The problem of finding a plan optimal among all possible plans (of the
type discussed in §5.1) is a more challenging one. For G placing probability
on both sides of the critical cost ratio, not only need it not be the case that
"all or none" is optimal, but in general an optimal plan need not be of the
fixed n variety. While in principle the methodology for finding an overall best
inspection plan is well established (involving as it does so-called "dynamic programming"
or "backwards induction"), the details are unpleasant enough that
it will not make sense to pursue this matter further.
Chapter 6
Problems
1 Measurement and Statistics
1.1. Suppose that a sample variance s² is based on a sample of size n from a
normal distribution. One might consider estimating σ using s or s/c_4(n),
or even some other multiple of s.

(a) Since c_4(n) < 1, the second of these estimators has a larger variance
than the first. But the second is unbiased (has expected value σ)
while the first is not. Which has the smaller mean squared error,
E(σ̂ − σ)²? Note that (as is standard in statistical theory), E(σ̂ − σ)²
= Var σ̂ + (Eσ̂ − σ)². (Mean squared error is variance plus squared
bias.)

(b) What is an optimal (in terms of minimum mean squared error) multiple
of s to use in estimating σ?
1.2. How do R/d_2(n) and s/c_4(n) compare (in terms of mean squared error)
as estimators of σ? (The assumption here is that they are both based on
a sample from a normal distribution. See Problem 1.1 for a definition of
mean squared error.)
1.3. Suppose that sample variances s²_i, i = 1, 2, ..., r are based on independent
samples of size m from normal distributions with a common standard
deviation, σ. A common SQC-inspired estimator of σ is s̄/c_4(m). Another
possibility is

\[
s_{\text{pooled}} = \sqrt{\frac{s_1^2 + \cdots + s_r^2}{r}}
\]

or

\[
\hat{\sigma} = s_{\text{pooled}}/c_4((m-1)r + 1).
\]

Standard distribution theory says that r(m−1)s²_pooled/σ² has a χ² distribution
with r(m−1) degrees of freedom.

(a) Compare s̄/c_4(m), s_pooled and σ̂ in terms of mean squared error.

(b) What is an optimal multiple of s_pooled (in terms of mean squared
error) to use in estimating σ?

(Note: See Vardeman (1999 IIE Transactions) for a complete treatment
of the issues raised in Problems 1.1 through 1.3.)
1.4. Set up a double integral that gives the probability that the sample range
of n standard normal random variables is between .5 and 2.0. How is
this probability related to the probability that the sample range of n iid
normal (μ, σ²) random variables is between .5 and 2.0?
1.5. It is often helpful to state "standard errors" (estimated standard deviations)
corresponding to point estimates of quantities of interest. In a
context where a standard deviation, σ, is to be estimated by R̄/d_2(n)
based on r samples of size n, what is a reasonable standard error to announce?
(Be sure that your answer is computable from sample data, i.e.
doesn't involve any unknown process parameters.)
1.6. Consider the paper weight data in Problem (2.12) of V&J. Assume that
the 2-way random effects model is appropriate and do the following.

(a) Compute the ȳ_ij, s_ij and R_ij for all I·J = 2·5 = 10 Piece×Operator
combinations. Then compute both row ranges of means Δ_i and row
sample variances of means s²_i.

(b) Find both range-based and sample variance-based point estimates of
the repeatability standard deviation, σ.

(c) Find both range-based and sample variance-based point estimates of
the reproducibility standard deviation σ_reproducibility = √(σ²_β + σ²_αβ).

(d) Get a statistical package to give you the 2-way ANOVA table for these
data. Verify that s²_pooled = MSE and that your sample variance-based
estimate of σ_reproducibility from part (c) is

\[
\sqrt{\max\left(0,\ \frac{1}{mI}MSB + \frac{I-1}{mI}MSAB - \frac{1}{m}MSE\right)}.
\]
(e) Find a 90% two-sided confidence interval for the parameter σ.

(f) Use the material in §1.5 and give an approximate 90% two-sided
confidence interval for σ_reproducibility.

(g) Find a linear combination of the mean squares from (d) whose expected
value is σ²_overall = σ²_reproducibility + σ². All the coefficients in
your linear combination will be positive. In this case, you may use
the next to last paragraph of §1.5 to come up with an approximate
90% two-sided confidence interval for σ_overall. Do so.

(h) The problem from which the paper weight data are drawn indicates
that specifications of approximately ±4 g/m² are common for paper
of the type used in this gage study. These translate to specifications
of about ±.16 g for pieces of paper of the size used here. Use these
specifications and your answer to part (g) to make an approximate
90% confidence interval for the gage capability ratio

\[
GCR = \frac{6\sigma_{\text{overall}}}{(U - L)}.
\]

Used in the way it was in this study, does the scale seem adequate
to check conformance to such specifications?

(i) Give (any sensible) point estimates of the fractions of the overall measurement
variance attributable to repeatability and to reproducibility.
1.7. In a particular (real) thorium detection problem, measurement variation
for a particular (spectral absorption) instrument was thought to be about
σ_measurement = .002 instrument units. (Division of a measurement expressed
in instrument units by 58.2 gave values in μg/l.) Suppose that in
an environmental study, a field sample is to be measured once (producing
y_new) on this instrument and the result is to be compared to a (contemporaneous)
measurement of a lab blank (producing y_old). If the field
reading exceeds the blank reading by too much, there will be a declaration
that there is a detectable excess amount of thorium present.

(a) Assuming that measurements are normal, find a critical value L_c so
that the lab will run no more than a 5% chance of a false positive
result.

(b) Based on your answer to (a), what is a lower limit of detection,
L_d, for a 90% probability of correctly detecting excess thorium?
What, by the way, is this limit in terms of μg/l?
1.8. Below are 4 hypothetical samples of size n = 3. A little calculation shows
that ignoring the fact that there are 4 samples and simply computing s
based on all 12 observations will produce a "standard deviation" much larger
than s_pooled. Why is this?

3,6,5    4,3,1    8,9,6    2,1,4
1.9. In applying ANOVA methods to gage R&R studies, one often uses linear
combinations of independent mean squares as estimators of their expected
values. Section 1.5 of these notes shows it is possible to also produce standard
errors (estimated standard deviations) for these linear combinations.
Suppose that MS_1, MS_2, ..., MS_k are independent random variables, with

\[
\nu_i \frac{MS_i}{EMS_i} \sim \chi^2_{\nu_i}.
\]

Consider the random variable

\[
U = c_1 MS_1 + c_2 MS_2 + \cdots + c_k MS_k.
\]

(a) Find the standard deviation of U.

(b) Your expression from (a) should involve the means EMS_i, that in
applications will be unknown. Propose a sensible (data-based) estimator
of the standard deviation of U that does not involve these
quantities.

(c) Apply your result from (b) to give a sensible standard error for the
ANOVA-based estimators of σ², σ²_reproducibility and σ²_overall.
1.10. Section 1.7 of the notes presents "rounded data" likelihood methods for
normal data with the 2 parameters μ and σ. The same kind of thing can be
done for other families of distributions (which can have other numbers of
parameters). For example, the exponential distributions with means 1/λ
can be used. (Here there is the single parameter λ.) These exponential
distributions have cdfs

\[
F_\lambda(x) = \begin{cases} 1 - \exp(-\lambda x) & \text{for } x \ge 0 \\ 0 & \text{for } x < 0. \end{cases}
\]

Below is a frequency table for twenty exponential observations that have
been rounded to the nearest integer.

rounded value   0   1   2   3   4
frequency       7   8   2   2   1

(a) Write out an expression for the appropriate rounded data log likelihood
function for this problem,

\[
\mathcal{L}(\lambda) = \ln L(\text{data} \mid \lambda).
\]

(You should be slightly careful here. Exponential random variables
only take values in the interval (0, ∞).)

(b) Make a plot of \(\mathcal{L}(\lambda)\). Use it and identify the maximum likelihood
estimate of λ based on the rounded data.

(c) Use the plot from (b) and make an approximate 90% confidence interval
for λ. (The appropriate χ² value has 1 associated degree of
freedom.)
1.11. Below are values of a critical dimension (in .0001 inch above nominal)
measured on hourly samples of size n = 5 precision metal parts taken
from the output of a CNC (computer numerically controlled) lathe.
sample 1 2 3 4 5 6 7 8
measurements 4,3,3,2,3 2,2,3,3,2 4,1,0,1,0 2,0,2,1,4 2,2,1,3,4 2, 2,2,1,2 0,0,0,2,0 1,1,2,0,2
(a) Compute for each of these samples the raw sample standard deviation
(ignoring rounding) and the Sheppard's correction standard
deviation that is appropriate for integer rounded data. How do these
compare for the eight samples above?
(b) For each of the samples that have a range of at least 2, use the CONEST
program to find rounded normal data maximum likelihood
estimates of the normal parameters μ and σ. The program as written
accepts observations ≥ 1, so you will need to add an integer to
each element of some of the samples above before doing calculation
with the program. (I don't remember, but you may not be able to input
a standard deviation of exactly 0 either.) How do the maximum
likelihood estimates of μ compare to x̄ values? How do the maximum
likelihood estimates of σ compare to both the raw standard
deviations and to the results of applying Sheppard's correction?
(c) Consider sample #2. Make 95% and 90% confidence intervals for
both μ and σ using the work of Johnson Lee.
(d) Consider sample #1. Use the CONEST program to get a few approximate
values for L*_μ(μ) and some approximate values for L*_σ(σ).
(For example, look at a contour plot of L over a narrow range of
means near μ to get an approximate value for L*_μ(μ).) Sketch L*_μ(μ)
and L*_σ(σ) and use your sketches and Lee's tables to produce 95%
confidence intervals for μ and σ.
(e) What 95% confidence intervals for μ and σ would result from a 9th
sample, {2, 2, 2, 2, 2}?
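Part (a)'s two standard deviations are easy to script; here is a minimal sketch (Sheppard's correction subtracts w²/12 from the raw sample variance when data are rounded to multiples of w, here w = 1):

```python
import math

def sample_sd(xs):
    """Ordinary sample standard deviation, ignoring the rounding."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def sheppard_sd(xs, w=1.0):
    """Sheppard's correction for data rounded to multiples of w:
    subtract w^2/12 from the raw sample variance (truncating at 0)."""
    v = sample_sd(xs) ** 2 - w ** 2 / 12.0
    return math.sqrt(max(v, 0.0))

sample1 = [4, 3, 3, 2, 3]
print(sample_sd(sample1), sheppard_sd(sample1))
```

Running this over all eight samples gives the comparison part (a) asks about.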
74 CHAPTER 6. PROBLEMS
1.12. A single operator measures a single widget diameter 15 times and obtains
a range of R = 3 × 10⁻⁴ inches. Then this person measures the diameters
of 12 different widgets once each and obtains a range of R = 8 × 10⁻⁴
inches. Give an estimated standard deviation of widget diameters (not
including measurement error).
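One route to this (the range-based estimation of §1.1) is to turn each range into a standard deviation estimate via R/d_2 and then subtract variances. A sketch, with the d_2 constants assumed from a standard control chart constants table:

```python
import math

# d2 values for samples of size 12 and 15 (assumed from a standard table)
d2 = {12: 3.258, 15: 3.472}

sigma_meas  = 3e-4 / d2[15]   # 15 remeasurements of one widget: gage variation only
sigma_total = 8e-4 / d2[12]   # single measurements of 12 widgets: parts + gage
# part-to-part variation: total variance minus measurement variance
sigma_parts = math.sqrt(sigma_total**2 - sigma_meas**2)
print(sigma_parts)
```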
1.13. Cylinders of (outside) diameter O must fit in ring bearings of (inside)
diameter I, producing clearance C = I − O. We would like to have
some idea of the variability in actual clearances that will be obtained by
random assembly of cylinders produced on one production line with ring
bearings produced on another. The gages used to measure I and O are
(naturally enough) different.
In a study using a single gage to measure outside diameters of cylinders,
n_O = 10 different cylinders were measured once each, producing a sample
standard deviation s_O = .001 inch. In a subsequent study, this same
gage was used to measure the outside diameter of an additional cylinder
m_O = 5 times, producing a sample standard deviation s_Ogage = .0005
inch.
In a study using a single gage to measure inside diameters of ring bearings,
n_I = 20 different inside diameters were measured once each, producing
a sample standard deviation s_I = .003 inch. In a subsequent study, this
same gage was used to measure the inside diameter of another ring bearing
m_I = 10 times, producing a sample standard deviation s_Igage = .001 inch.
(a) Give a sensible (point) estimate of the standard deviation of C produced
under random assembly.
(b) Find a sensible standard error for your estimate in (a).
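For part (a), one plausible accounting treats each single-measurement study as estimating part variance plus gage variance, subtracts the gage variance seen in the corresponding repeat-measurement study, and adds the two part variances (since C = I − O for independently paired parts). A sketch:

```python
import math

sO, sOgage = 0.001, 0.0005   # outside-diameter studies
sI, sIgage = 0.003, 0.001    # inside-diameter studies

# s_O^2 estimates (part variance + gage variance); subtract the gage
# variance seen in the repeat-measurement study, and similarly for I
var_O = sO**2 - sOgage**2
var_I = sI**2 - sIgage**2
# C = I - O for independently chosen parts, so the part variances add
sd_C = math.sqrt(var_I + var_O)
print(sd_C)   # about .0030 inch
```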
2 Process Monitoring
Methods
2.1. Consider the following hypothetical situation. A variables process monitoring
scheme is to be set up for a production line, and two different
measuring devices are available for data gathering purposes. Device A
produces precise and expensive measurements and device B produces less
precise and less expensive measurements. Let σ_measurement for the two
devices be respectively σ_A and σ_B, and suppose that the target for a particular
critical diameter for widgets produced on the line is 200.0.
(a) A single widget produced on the line is measured n = 10 times with
each device and R_A = 2.0 and R_B = 5.0. Give estimates of σ_A and
σ_B.
(b) Explain why it would not be appropriate to use one of your estimates
from (a) as a σ for setting up an x̄ and R chart pair for monitoring
the process based on measurements from one of the devices.
Using device A, 10 consecutive widgets produced on the line (under
presumably stable conditions) have (single) measurements with R =
8.0.
(c) Set up reasonable control limits for both x̄ and R for the future monitoring
of the process based on samples of size n = 10 and measurements
from device A.
(d) Combining the information above about the A measurements on 10
consecutive widgets with your answer to (a), under a model that says
observed diameter = real diameter + measurement error
where "real diameter" and "measurement error" are independent,
give an estimate of the standard deviation of the real diameters. (See
the discussion around page 19 of V&J.)
(e) Based on your answers to parts (a) and (d), set up reasonable control
limits for both x̄ and R for the future monitoring of the process based
on samples of size n = 5 and measurements from the cheaper device,
device B.
2.2. The following are some data taken from a larger set in Statistical Quality
Control by Grant and Leavenworth, giving the drained weights (in ounces)
of contents of size No. 2½ cans of standard grade tomatoes in puree. 20
samples of three cans taken from a canning process at regular intervals
are represented.
Sample  x1    x2    x3
1       22.0  22.5  22.5
2       20.5  22.5  22.5
3       20.0  20.5  23.0
4       21.0  22.0  22.0
5       22.5  19.5  22.5
6       23.0  23.5  21.0
7       19.0  20.0  22.0
8       21.5  20.5  19.0
9       21.0  22.5  20.0
10      21.5  23.0  22.0
11      20.0  19.5  21.0
12      19.0  21.0  21.0
13      19.5  20.5  21.0
14      20.0  21.5  24.0
15      22.5  19.5  21.0
16      21.5  20.5  22.0
17      19.0  21.5  23.0
18      21.0  20.5  19.5
19      20.0  23.5  24.0
20      22.0  20.5  21.0
(a) Suppose that standard values for the process mean and standard deviation
of drained weights (μ and σ) in this canning plant are 21.0 oz
and 1.0 oz respectively. Make and interpret standards given x̄ and R
charts based on these samples. What do these charts indicate about
the behavior of the filling process over the time period represented
by these data?
(b) As an alternative to the standards given range chart made in part
(a), make a standards given s chart based on the 20 samples. How
does its appearance compare to that of the R chart?
Now suppose that no standard values for μ and σ have been provided.
(c) Find one estimate of σ for the filling process based on the average
of the 20 sample ranges, R̄, and another based on the average of 20
sample standard deviations, s̄.
(d) Use μ̂ = x̿ (the grand average) and your estimate of σ based on R̄ and make retrospective
control charts for x̄ and R. What do these indicate about the stability
of the filling process over the time period represented by these data?
(e) Use μ̂ = x̿ and your estimate of σ based on s̄ and make retrospective
control charts for x̄ and s. How do these compare in appearance
to the retrospective charts for process mean and variability made in
part (d)?
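For part (a), the standards given limits are μ ± 3σ/√n for x̄ and, for R, center d_2σ with upper limit (d_2 + 3d_3)σ. A quick check of all 20 samples against those limits (the n = 3 constants d_2 = 1.693 and d_3 = 0.888 are assumed from a standard table):

```python
import math

mu0, sigma0, n = 21.0, 1.0, 3
samples = [
    (22.0, 22.5, 22.5), (20.5, 22.5, 22.5), (20.0, 20.5, 23.0),
    (21.0, 22.0, 22.0), (22.5, 19.5, 22.5), (23.0, 23.5, 21.0),
    (19.0, 20.0, 22.0), (21.5, 20.5, 19.0), (21.0, 22.5, 20.0),
    (21.5, 23.0, 22.0), (20.0, 19.5, 21.0), (19.0, 21.0, 21.0),
    (19.5, 20.5, 21.0), (20.0, 21.5, 24.0), (22.5, 19.5, 21.0),
    (21.5, 20.5, 22.0), (19.0, 21.5, 23.0), (21.0, 20.5, 19.5),
    (20.0, 23.5, 24.0), (22.0, 20.5, 21.0),
]

d2, d3 = 1.693, 0.888          # control chart constants for n = 3
UCLx = mu0 + 3 * sigma0 / math.sqrt(n)
LCLx = mu0 - 3 * sigma0 / math.sqrt(n)
UCLr = (d2 + 3 * d3) * sigma0  # LCL for R is 0 here since d2 - 3 d3 < 0

out_x = [i + 1 for i, s in enumerate(samples)
         if not LCLx <= sum(s) / n <= UCLx]
out_R = [i + 1 for i, s in enumerate(samples) if max(s) - min(s) > UCLr]
print(out_x, out_R)
```

With these (assumed) constants, neither list contains any sample, so the interpretation in (a) turns on patterns rather than points outside limits.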
2.3. The accompanying data are some taken from Statistical Quality Control
Methods by I.W. Burr, giving the numbers of beverage cans found to be
defective in periodic samples of 312 cans at a bottling facility.
Sample  Defectives
1       6
2       7
3       5
4       7
5       5
6       5
7       4
8       5
9       12
10      6
11      7
12      7
13      6
14      6
15      6
16      6
17      23
18      10
19      8
20      5
(a) Suppose that company standards are that on average p = .02 of the
cans are defective. Use this value and make a standards given p chart
based on the data above. Does it appear that the process fraction
defective was stable at the p = .02 value over the period represented
by these data?
(b) Make a retrospective p chart for these data. What is indicated by
this chart about the stability of the canning process?
2.4. Modern business pressures are making standards for fractions nonconforming
in the range of 10⁻⁴ to 10⁻⁶ not uncommon.
(a) What are standards given 3σ control limits for a p chart with standard
fraction nonconforming 10⁻⁴ and sample size 100? What is the
all-OK ARL for this scheme?
(b) If p becomes twice the standard value (of 10⁻⁴), what is the ARL
for the scheme from (a)? (Use your answer to (a) and the binomial
distribution for n = 100 and p = 2 × 10⁻⁴.)
(c) What do (a) and (b) suggest about the feasibility of doing process
monitoring for very small fractions defective based on attributes
data?
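A sketch of the arithmetic for (a) and (b): with p_0 = 10⁻⁴ and n = 100 the usual 3σ limits leave every positive count above the UCL, so the chart signals exactly when X ≥ 1, and ARLs are reciprocals of signal probabilities. This is something to check your own derivation against, not a substitute for it:

```python
import math

def prob_signal(n, p):
    """P(X >= 1) for X ~ Binomial(n, p); the chart below signals iff X >= 1."""
    return 1.0 - (1.0 - p) ** n

p0, n = 1e-4, 100
UCL = p0 + 3 * math.sqrt(p0 * (1 - p0) / n)   # about .0031 (the LCL is negative)
# phat = X/100 exceeds .0031 exactly when X >= 1, so:
arl_ok  = 1.0 / prob_signal(n, p0)        # all-OK ARL, about 100
arl_2p0 = 1.0 / prob_signal(n, 2 * p0)    # ARL at p = 2 x 10^-4, about 50
print(UCL, arl_ok, arl_2p0)
```

A false alarm every 100 or so samples, and only a factor-of-two improvement when p doubles, is the moral part (c) is after.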
2.5. Suppose that a dimension of parts produced on a certain machine over a
short period can be thought of as normally distributed with some mean μ
and standard deviation σ = .005 inch. Suppose further, that values of
this dimension more than .0098 inch from the 1.000 inch nominal value
are considered nonconforming. Finally, suppose that hourly samples of 10
of these parts are to be taken.
(a) If μ is exactly on target (i.e. μ = 1.000 inch) about what fraction of
parts will be nonconforming? Is it possible for the fraction nonconforming
to ever be any less than this figure?
(b) One could use a p chart based on n = 10 to monitor process performance
in this situation. What would be standards given 3 sigma
control limits for the p chart, using your answer from part (a) as the
standard value of p?
(c) What is the probability that a particular sample of n = 10 parts will
produce an out-of-control signal on the chart from (b) if μ remains
at its standard value of μ = 1.000 inch? How does this compare to
the same probability for a 3 sigma x̄ chart for an n = 10 setup with
a center line at 1.000? (For the p chart, use a binomial probability
calculation. For the x̄ chart, use the facts that μ_x̄ = μ and σ_x̄ =
σ/√n.) What are the ARLs of the monitoring schemes under these
conditions?
(d) Compare the probability that a particular sample of n = 10 parts
will produce an out-of-control signal on the p chart from (b) to the
probability that the sample will produce an out of control signal on
the (n = 10) 3 sigma x̄ chart first mentioned in (c), supposing that in
fact μ = 1.005 inch. What are the ARLs of the monitoring schemes
under these conditions? What moral is told by your calculations here
and in part (c)?
2.6. The article "High Tech, High Touch," by J. Ryan, that appeared in Quality
Progress in 1987 discusses the quality enhancement processes used by
Martin Marietta in the production of the space shuttle external (liquid
oxygen) fuel tanks. It includes a graph giving counts of major hardware
nonconformities for each of 41 tanks produced. The accompanying data
are approximate counts read from that graph for the last 35 tanks. (The
first six tanks were of a different design than the others and are thus not
included here.)
Tank  Nonconformities
1     537
2     463
3     417
4     370
5     333
6     241
7     194
8     185
9     204
10    185
11    167
12    157
13    139
14    130
15    130
16    267
17    102
18    130
19    157
20    120
21    148
22    65
23    130
24    111
25    65
26    74
27    65
28    148
29    74
30    65
31    139
32    213
33    222
34    93
35    194
(a) Make a retrospective c chart for these data. Is there evidence of
real quality improvement in this series of counts of nonconformities?
Explain.
(b) Consider only the last 17 tanks represented above. Does it appear
that quality was stable over the production period represented by
these tanks? (Make another retrospective c chart.)
(c) It is possible that some of the figures read from the graph in the
original article may differ from the real figures by as much as, say,
15 nonconformities. Would this measurement error account for the
apparent lack of stability you found in (a) or (b) above? Explain.
2.7. Boulaevskaia, Fair and Seniva did a study of defect detection rates for
the visual inspection of some glass vials. Vials known to be visually identifiable
as defective were marked with invisible ink, placed among other
vials, and run through a visual inspection process at 10 different time periods.
The numbers of marked defective vials that were detected/captured,
the numbers placed into the inspection process, and the corresponding
ratios for the 10 periods are below.
X = number detected/captured 6 10 15 18 17 2 7 5 6 5
n = number placed 30 30 30 30 30 15 15 15 15 15
X=n .2 .33 .5 .6 .57 .13 .47 .33 .4 .33
(Overall, 91 of the 225 marked vials placed into the inspection process
were detected/captured.)
(a) Carefully investigate (and say clearly) whether there is evidence in
these data of instability in the defect detection rate.
(b) 91/225 = .404. Do you think that the company these students worked
with was likely satisfied with the 40.4% detection rate? What, if
anything, does your answer here have to do with the analysis in (a)?
2.8. (Narrow Limit Gaging) Parametric probability model assumptions
can sometimes be used to advantage even where one is ultimately going
to generate and use attributes data. Consider a situation where process
standards are that widget diameters are to be normally distributed with
mean μ = 5 and standard deviation σ = 1. Engineering specifications on
these diameters are 5 ± 3.
As a process monitoring device, samples of n = 100 of these widgets are
going to be checked with a go/no-go gage, and
X = the number of diameters in a sample failing to pass the gaging test
will be counted and plotted on an np chart. The design of the go/no-go
gage is up to you to choose. You may design it to pass parts with
diameters in any interval (a, b) of your choosing.
(a) One natural choice of (a, b) is according to the engineering specifications,
i.e. as (2, 8). With this choice of go/no-go gage, a 3σ control
chart for X signals if X ≥ 2. Find the all-OK ARL for this scheme
with this gage.
(b) One might, however, choose (a, b) in other ways besides according to
the engineering specifications, e.g. as (5 − Δ, 5 + Δ) for some Δ other
than 3. Show that the choice of Δ = 2.71 and a control chart that
signals if X ≥ 3 will have about the same all-OK ARL as the scheme
from (a).
(c) Compare the schemes from (a) and (b) supposing that diameters are
in fact normally distributed with mean μ = 6 and standard deviation
σ = 1.
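The calculations in (a) and (b) only need a normal cdf and a binomial tail; a sketch using `math.erf` for Φ:

```python
import math
from math import comb

def Phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_fail(mu, sigma, a, b):
    # probability a diameter falls outside the gage's pass interval (a, b)
    return 1 - (Phi((b - mu) / sigma) - Phi((a - mu) / sigma))

def P_ge(n, p, c):
    # P(X >= c) for X ~ Binomial(n, p)
    return 1 - sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c))

n = 100
# scheme (a): gage at the specifications (2, 8), signal when X >= 2
pa = p_fail(5, 1, 2, 8)
arl_a = 1 / P_ge(n, pa, 2)
# scheme (b): narrowed gage (5 - 2.71, 5 + 2.71), signal when X >= 3
pb = p_fail(5, 1, 5 - 2.71, 5 + 2.71)
arl_b = 1 / P_ge(n, pb, 3)
print(arl_a, arl_b)   # the two all-OK ARLs come out close together
```

Rerunning `p_fail` with mu = 6 is all that part (c) requires.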
2.9. A one-sided upper CUSUM scheme is used to monitor
Q = the number of defectives in samples of size n = 400 .
Suppose that one uses k_1 = 8 and h_1 = 10. Use the normal approximation
to the binomial distribution to obtain an approximate ARL for this scheme
if p = .025.
2.10. Consider the monitoring of a process that we will assume produces normally
distributed observations X with standard deviation σ = .04.
(a) Set up both a two-sided CUSUM scheme and an EWMA scheme for
monitoring the process (Q = X), using a target value of .13 and a
desired all-OK ARL of roughly 370, if quickest possible detection of
a change in mean of size δ = .02 is desired.
(b) Plot on the same set of axes, the logarithms of the ARLs for your
charts from (a) as functions of μ, the real mean of observations being
CUSUMed or EWMAed. Also plot on this same set of axes the
logarithms of ARLs for a standard 3σ Shewhart chart for individuals.
Comment upon how the 3 ARL curves compare.
2.11. Shear strengths of spot welds made by a certain robot are approximately
normal with a short term variability described by σ = 60 lbs. The
strengths in samples of n of these welds are going to be obtained and
x̄ values CUSUMed.
(a) Give a reference value k_2, sample size n and a decision interval h_2
so that a one-sided (lower) CUSUM scheme for the x̄'s will have an
ARL of about 370 if μ = 800 lbs and an ARL of about 5 if μ = 750
lbs.
(b) Find a sample size and a lower Shewhart control limit for x̄, say #,
so that if μ = 800 lbs, there will be about 370 samples taken before an
x̄ will plot below #, and if μ = 750 there will be on average about 5
samples taken before an x̄ will plot below #.
2.12. You have data on the efficiency of a continuous chemical production process.
The efficiency is supposed to be about 45%, and you will use a CUSUM
scheme to monitor the efficiency. Efficiency is computed once per shift,
but from much past data, you know that σ ≈ .7%.
(a) If you wish quickest possible detection of a shift of .7% (one standard
deviation) in mean efficiency, design a two-sided CUSUM scheme for
this situation with an all-OK ARL of about 500.
(b) Apply your procedure from (a) to the data below. Are any alarms
signaled?
Shift  Efficiency
1      45.7
2      44.6
3      45.0
4      44.4
5      44.4
6      44.2
7      46.1
8      44.6
9      45.7
10     44.4
11     45.8
12     45.4
13     46.8
14     45.5
15     45.8
16     46.4
17     46.0
18     46.3
19     45.6
(c) Make a plot of raw CUSUMs using a reference value of 45%. From
your plot, when do you think that the mean efficiency shifted away
from 45%?
(d) What are the all-OK and μ = 45.7% ARLs if one employs your
procedure from (a) modified by giving both the high and low side
charts head starts of u = v = h_1/2 = h_2/2?
(e) Repeat part (a) using an EWMA scheme rather than a CUSUM scheme.
(f) Apply your procedure from (e) to the data. Are any alarms signaled?
Plot your EWMA values. Based on this plot, when do you think that
the mean efficiency shifted away from 45%?
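The two-sided decision interval CUSUM recursion itself is only a few lines; the following sketch applies it to the 19 efficiencies with reference value allowance k = .35 (half the shift of interest) and a placeholder decision interval h = 3.5, which is an assumption made here for illustration and not the h your design in (a) should produce:

```python
def decision_interval_cusums(qs, target, k):
    """High and low side decision interval CUSUMs with allowance k and
    zero head starts.  Returns the two sequences of CUSUM values."""
    hi, lo, his, los = 0.0, 0.0, [], []
    for q in qs:
        hi = max(0.0, hi + (q - target) - k)
        lo = max(0.0, lo - (q - target) - k)
        his.append(hi)
        los.append(lo)
    return his, los

effs = [45.7, 44.6, 45.0, 44.4, 44.4, 44.2, 46.1, 44.6, 45.7, 44.4,
        45.8, 45.4, 46.8, 45.5, 45.8, 46.4, 46.0, 46.3, 45.6]
h = 3.5   # placeholder decision interval, NOT the answer to part (a)
his, los = decision_interval_cusums(effs, 45.0, 0.35)
print([i + 1 for i, s in enumerate(his) if s > h])
```

Whatever h your design produces, the high side CUSUM's steady climb over the last third of the series is what parts (b) and (c) are probing.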
2.13. Consider the problem of designing an EWMA control chart for x̄'s, where
in addition to choosing chart parameters one gets to choose the sample
size, n. In such a case, one can choose monitoring parameters to produce
both a desired (large) on-target ARL and a desired (small) off-target ARL
δ units away from the target.
Suppose, for example, that a process standard deviation is σ = 1 and one
wishes to design for an ARL of 370 if the process mean, μ, is on target,
and an ARL of no more than 5.0 if μ is off target by as much as δ = 1.0.
Using σ_Q = σ/√n and shift = δ/σ_Q and reading from one of the graphs in
Crowder's 1989 JQT paper, values of λ_opt for detecting a change in process
mean of this size using EWMAs of x̄'s are approximately as below:

n      1    2    3    4    5    6    7    8    9
λ_opt  .14  .08  .06  .05  .05  .04  .04  .04  .03

Use Crowder's EWMA ARL program (and some trial and error) to find
values of K that when used with the λ's above will produce an on-target
ARL of 370. Then determine how large n must then be in order to meet
the 370 and 5.0 ARL requirements. How does this compare to what Table
4.8 says is needed for a two-sided CUSUM to meet the same criteria?
2.14. Consider a combination of high and low side decision interval CUSUM
schemes with h_1 = h_2 = 2.5, u = 1, v = 1, k_1 = .5 and k_2 = .5.
Suppose that Q's are iid normal variables with σ_Q = 1.0. Find the ARLs
for the combined scheme if μ_Q = 0 and then if μ_Q = 1.0. (You will need to
use Gan's CUSUM ARL program and Yashchin's expression for combining
high and low side ARLs.)
2.15. Set up two different X/MR monitoring chart pairs for normal variables
Q, in the case where the standards are μ_Q = 5 and σ_Q = 1.715 and the all-OK
ARL desired is 250. For these combinations, what ARLs are relevant
if in fact μ_Q = 5.5 and σ_Q = 2.00? (Run Crowder's X/MR ARL program
to get these with minimum interpolation.)
2.16. If one has discrete or rounded data and insists on using x̄ and/or R charts,
§1.7.1 shows how these may be based on the exact all-OK distributions
of x̄ and/or R (and not on normal theory control limits). Suppose that
measurements arise from integer rounding of normal random variables
with μ = 2.25 and σ = .5 (so that essentially only values 1, 2, 3 and 4 are
ever seen). Compute the four probabilities corresponding to these rounded
values (and fudge them slightly so that they total to 1.00). Then, for
n = 4 compute the probability distributions of x̄ and R based on iid
observations from this distribution. Then run Karen (Jensen) Hulting's
DIST program and compare your answers to what her program produces.
2.17. Suppose that standard values of process parameters are μ = 17 and σ = 2.4.
(a) Using sample means x̄ based on samples of size n = 4, design both
a combined high and low side CUSUM scheme (with 0 head starts)
and an EWMA scheme to have an all-OK ARL of 370 and quickest
possible detection of a shift in process mean of size .6.
(b) If, in fact, the process mean is μ = 17.5 and the process standard
deviation is σ = 3.0, show how you would find the ARL associated
with your schemes from (a). (You don't need to actually interpolate
in the tables, but do compute the values you would need in order to
enter the tables, and say which tables you must employ.)
2.18. A discrete variable X can take only values 1, 2, 3, 4 and 5. Nevertheless,
managers decide to monitor process spread using the ranges of samples
of size n = 2. Suppose, for sake of argument, that under standard plant
conditions observations are iid and uniform on the values 1 through 5 (i.e.
P[X = 1] = P[X = 2] = P[X = 3] = P[X = 4] = P[X = 5] = .2).
(a) Find the distribution of R for this situation. (Note that R has possi-
ble values 0, 1, 2, 3 and 4. You need to reason out the corresponding
probabilities.)
(b) The correct answer to part (a) has ER = 1.6. This implies that if
many samples of size n = 2 are taken and R̄ computed, one can expect
a mean range near 1.6. Find and criticize corresponding normal
theory control limits for R.
(c) Suppose that instead of using a normal-based Shewhart chart for R,
one decides to use a high side Shewhart-CUSUM scheme (for ranges)
with reference value k_1 = 2 and starting value 0, that signals the first
time any range is 4 or the CUSUM is 3 or more. Use your answer for
(a) and show how to find the ARL for this scheme. (You need not
actually carry through the calculations, but show explicitly how to
set things up.)
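Part (a) can be checked by enumerating the 25 equally likely ordered pairs; a sketch that also confirms the ER = 1.6 figure quoted in (b):

```python
from itertools import product
from collections import defaultdict

probs = {x: 0.2 for x in (1, 2, 3, 4, 5)}
R_dist = defaultdict(float)
for x1, x2 in product(probs, repeat=2):      # 25 equally likely ordered pairs
    R_dist[abs(x1 - x2)] += probs[x1] * probs[x2]

ER = sum(r * pr for r, pr in R_dist.items())
print(dict(R_dist), ER)
```

The resulting distribution on {0, 1, 2, 3, 4} is also the ingredient needed to set up the Markov Chain in part (c).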
2.19. SQC novices faced with the task of analyzing a sequence of (say) m individual
observations collected over time often do the following: Compute
x̄ and s from the m data values and apply control limits x̄ ± 3s to
the m individuals. Say why this method of operation is essentially useless.
(Compare Problem 1.8.)
2.20. Consider an x̄ chart based on standards μ_0 and σ_0 and samples of size n,
where only the "one point outside 3σ limits" alarm rule is in use.
(a) Find ARLs if in fact σ = σ_0, but √n |μ − μ_0|/σ is respectively 0, 1,
2, and 3.
(b) Find ARLs if in fact μ = μ_0, but σ/σ_0 is respectively .5, .8, 1, 1.5
and 2.0.
Theory
2.21. Consider the problem of samples of size n = 1 in variables control charting
contexts, and the notion of there using moving ranges for various purposes.
This problem considers a little theory that may help illustrate the implications
of using an average moving range, M̄R̄, in the estimation of σ in
such circumstances.
Suppose that X_1 and X_2 are independent normal random variables with a
common variance σ², but possibly different means μ_1 and μ_2. (You may, if
you wish, think of these as widget diameters made at times 1 and 2, where
the process mean has potentially shifted between the sampling periods.)
(a) What is the distribution of X_1 − X_2? The distribution of (X_1 − X_2)/σ?
(b) For t > 0, write out in terms of Φ values the probability
P[|(X_1 − X_2)/σ| ≤ t].
In doing this, abbreviate (μ_1 − μ_2)/σ as δ.
(c) Notice that in part (b), you have found the cumulative distribution
function for the random variable MR/σ. Differentiate your answer
to (b) to find the probability density for MR/σ and then use this
probability density to write down an integral that gives the mean of
the random variable MR/σ, E(MR/σ). (You may abbreviate the
standard normal pdf as φ, rather than writing everything out.)
Vardeman used his trusty HP 15C (and its definite integral routine)
and evaluated the integral in (c) for various values of δ. Some values
that he obtained are below.

δ          0       .1      .2      .3      .4      .5      1.0     1.5
E(MR/σ)    1.1284  1.1312  1.1396  1.1537  1.1732  1.198   1.399   1.710

δ          2.0     2.5     3.0     3.5     4.0     large |δ|
E(MR/σ)    2.101   2.544   3.017   3.506   4.002   ≈ |δ|

(Notice that as expected, the δ = 0 value is d_2 for a sample of size
n = 2.)
(d) Based on the information above, argue that for n independent normal
random variables X_1, X_2, ..., X_n with common standard deviation σ,
if μ_1 = μ_2 = ··· = μ_n then the sample average moving range, M̄R̄,
when divided by 1.1284 has expected value σ.
(e) Now suppose that instead of being constant, the successive means
μ_1, μ_2, ..., μ_n in fact exhibit a reasonably strong linear trend. That
is, suppose that μ_t = μ_{t−1} + σ. What is the expected value of
M̄R̄/1.1284 in this situation? Does M̄R̄/1.1284 seem like a sensible
estimate of σ here?
(f) In a scenario where the means could potentially bounce around
according to μ_t = μ_{t−1} ± kσ, how large might k be without destroying
the usefulness of M̄R̄/1.1284 as an estimate of σ? Defend your
opinion on the basis of the information contained in the table above.
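As a check on the table, the integral in (c) has a closed form via the mean of a folded normal distribution (a standard fact, not something derived in the notes): E(MR/σ) = (2/√π) e^{−δ²/4} + δ(2Φ(δ/√2) − 1). A sketch that reproduces the tabled values:

```python
import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def E_MR_over_sigma(delta):
    """E|X1 - X2|/sigma when (X1 - X2)/sigma ~ N(delta, 2), computed from
    the closed form for the mean of a folded normal distribution."""
    return (2 / math.sqrt(math.pi)) * math.exp(-delta**2 / 4) \
           + delta * (2 * Phi(delta / math.sqrt(2)) - 1)

for d in (0, 0.5, 1.0, 2.0, 4.0):
    print(d, round(E_MR_over_sigma(d), 4))
```

At δ = 0 this reduces to 2/√π ≈ 1.1284, the d_2 constant for n = 2, as the note below the table anticipates.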
2.22. Consider the kind of discrete time Markov Chain with a single absorbing
state used in §2.1 to study the run length properties of process monitoring
schemes. Suppose that one wants to know not the mean times to absorption
from the nonabsorbing states, but the variances of those times. Since
for a generic random variable X, Var X = EX² − (EX)², once one has mean
times to absorption (belonging to the vector L = (I − R)⁻¹ 1) it suffices
to compute the expected squares of times to absorption. Let M be an
m × 1 vector containing expected squares of times to absorption (from
states S_1 through S_m). Set up a system of m equations for the elements
of M in terms of the elements of R, L and M. Then show that in matrix
notation

M = (I − R)⁻¹ (I + 2R(I − R)⁻¹) 1 .
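The displayed formula is easy to sanity-check numerically: for a single nonabsorbing state with self-transition probability r, the time to absorption is geometric, with mean 1/(1 − r) and second moment (1 + r)/(1 − r)². A sketch using numpy:

```python
import numpy as np

def absorption_moments(R):
    """Mean vector L and second-moment vector M of times to absorption
    for a chain whose nonabsorbing-state transition submatrix is R."""
    I = np.eye(R.shape[0])
    one = np.ones((R.shape[0], 1))
    L = np.linalg.solve(I - R, one)          # L = (I - R)^{-1} 1
    M = np.linalg.solve(I - R, one + 2 * R @ L)  # same as the boxed formula
    return L, M

# one-state check: geometric time to absorption with r = .8
r = 0.8
L, M = absorption_moments(np.array([[r]]))
print(L[0, 0], M[0, 0])   # mean 5, second moment 45, so variance 20
```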
2.23. So-called "Stop-light Control" or "Target Area Control" of a measured
characteristic X proceeds as follows. One first defines "Green" (OK),
"Yellow" (Marginal) and "Red" (Unacceptable) regions of possible values
of X. One then periodically samples a process according to the following
rules. At a given sampling period, a single item is measured and if it
produces a Green X, no further action is necessary at the time period in
question. If it produces a Red X, lack of control is declared. If it produces
a Yellow X, a second item is immediately sampled and measured. If this
second item produces a Green X, no further action is taken at the period
in question, but otherwise lack of control is declared.
Suppose that in fact a process under stop-light monitoring is stable and
p_G = P[X is Green], p_Y = P[X is Yellow] and p_R = 1 − p_G − p_Y = P[X
is Red].
(a) Find the mean number of sampling periods from the beginning of
monitoring through the first out-of-control signal, in terms of the p's.
(b) Find the mean total number of items measured from the beginning
of monitoring through the first out-of-control signal, in terms of the
p's.
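One line of reasoning (a sketch to verify, not the official solution) treats sampling periods as iid trials with signal probability p_R + p_Y(1 − p_G), making the number of periods geometric, and then uses Wald's identity for the total item count. Numerically:

```python
def stoplight_means(pG, pY):
    """Mean periods to signal and mean items inspected, for stable
    stop-light monitoring with P(Green) = pG and P(Yellow) = pY."""
    pR = 1 - pG - pY
    q = pR + pY * (1 - pG)   # P(signal at any given period)
    periods = 1 / q          # mean of a geometric count of periods
    items = (1 + pY) / q     # Wald: E(items per period) * E(periods)
    return periods, items

print(stoplight_means(0.90, 0.08))   # illustrative p's, not from the problem
```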
2.24. Consider the Run-Sum control chart scheme discussed in §2.2. In the notes
Vardeman wrote out a transition matrix for a Markov Chain analysis of
the behavior of this scheme.
(a) Write out the corresponding system of 8 linear equations in 8 mean
times to absorption for the scheme. Note that the mean times till
signal from the "T = −0" and "T = +0" states are the same linear
combinations of the 8 mean times and must thus be equal.
(b) Find a formula for the ARL of this scheme. This can be done as
follows. Use the equations for the mean times to absorption from
states "T = +3" and "T = +2" to find a constant c_{+2,+3} such that
L_{+3} = c_{+2,+3} L_{+2}. Find similar constants c_{+1,+2}, c_{+0,+1}, c_{−2,−3},
c_{−1,−2} and c_{−0,−1}. Then use these constants to write a single linear
equation for L_{+0} = L_{−0} that you can solve for L_{+0} = L_{−0}.
2.25. Consider the problem of monitoring
X = the number of nonconformities on a widget.
Suppose the standard for λ is so small that a usual 3σ Shewhart control
chart will signal any time X_t > 0. On intuitive grounds the engineers
involved find such a state of affairs unacceptable. The replacement for the
standard Shewhart scheme that is then being contemplated is one that
signals at time t if
i) X_t ≥ 2
or ii) X_t = 1 and any of X_{t−1}, X_{t−2}, X_{t−3} or X_{t−4} is also equal
to 1.
Show how you could find an ARL for this scheme. (Give either a matrix
equation or system of linear equations one would need to solve. State
clearly which of the quantities in your set-up is the desired ARL.)
2.26. Consider a discrete distribution on the (positive and negative) integers
specified by the probability function p(·). This distribution will be used
below to help predict the performance of a Shewhart type monitoring
scheme that will sound an alarm the first time that an individual observation
X_t is 3 or more in absolute value (that is, the alarm bell rings the
first time that |X_t| ≥ 3).
(a) Give an expression for the ARL of the scheme in terms of values of
p(·), if observations X_1, X_2, X_3, ... are iid with probability function
p(·).
(b) Carefully set up and show how you would use a transition matrix
for an appropriate Markov Chain in order to find the ARL of the
scheme under a model for the observations X_1, X_2, X_3, ... specified
as follows:
X_1 has probability function p(·), and given X_1, X_2, ..., X_{t−1},
the variable X_t has probability function p(· − X_{t−1}).
You need not carry out any matrix manipulations, but be sure to
fully explain how you would use the matrix you set up.
2.27. Consider the problem of finding ARLs for a Shewhart individuals chart
supposing that observations X_1, X_2, X_3, ... are not iid, but rather realizations
from a so-called AR(1) model. That is, suppose that in fact for some
ρ with |ρ| < 1
X_t = ρ X_{t−1} + ε_t
for a sequence of iid normal random variables ε_1, ε_2, ... each with mean 0
and variance σ². Notice that under this model the conditional distribution
of X_{t+1} given all previous observations is normal with mean ρ X_t and
variance σ².
Consider plotting values X_t on a Shewhart chart with control limits UCL
and LCL.
(a) For LCL < u < UCL, let L(u) stand for the mean number of additional
observations (beyond X_1) that will be required to produce
an out of control signal on the chart, given that X_1 = u. Carefully
derive an integral equation for L(u).
(b) Suppose that you can solve your equation from (a) for the function
L(u) and that it is sensible to assume that X_1 is normal with mean 0
and variance σ²/(1 − ρ²). Show how you would compute the ARL for
the Shewhart individuals chart under this model for the X sequence.
2.28. A one-sided upper CUSUM scheme with reference value k_1 = .5 and decision
interval h_1 = 4 is to be used to monitor Poisson (λ) observations.
(CUSUM ≥ 4 causes a signal.)
(a) Set up, but don't try to manipulate with, a Markov Chain transition
matrix that you could use to find (exact) ARLs for this scheme.
(b) Set up, but don't try to manipulate with, a Markov Chain transition
matrix that you could use to obtain (exact) ARLs if the CUSUM
scheme is combined with a Shewhart-type scheme that signals any
time an observation 3 or larger is obtained.
2.29. In §2.3, Vardeman argued that if Q_1, Q_2, ... are iid continuous random
variables with probability density f and cdf F, a one-sided (high side)
CUSUM scheme with reference value k_1 and decision interval h_1 has ARL
function L(u) satisfying the integral equation

L(u) = 1 + L(0) F(k_1 − u) + ∫_0^{h_1} L(y) f(y + k_1 − u) dy .

Suppose that a (one-sided) Shewhart type criterion is added to the CUSUM
alarm criterion. That is, consider a monitoring system that signals the first
time the high side CUSUM exceeds h_1 or Q_t > M, for a constant M > k_1.
Carefully derive an integral equation similar to the one above that must be
satisfied by the ARL function of the combined Shewhart-CUSUM scheme.
2.30. Consider the problem of finding ARLs for CUSUM schemes where Q_1, Q_2, ...
are iid exponential with mean 1. That is, suppose that one is CUSUMing
iid random variables with common probability density

f(x) = e^{−x} for x > 0, and f(x) = 0 otherwise.

(a) Argue that the ARL function of a high side CUSUM scheme for this
situation satisfies the differential equation

L′(u) = L(u) − L(0) − 1 for 0 ≤ u ≤ k_1 ,
L′(u) = L(u) − L(u − k_1) − 1 for k_1 ≤ u .

(Vardeman and Ray (Technometrics, 1985) solve this differential equation
and a similar one for low side CUSUMs to obtain ARLs for
exponential Q.)
(b) Suppose that one decides to approximate high side exponential CUSUM
ARLs by using simple numerical methods to solve (approximately)
the integral equation discussed in class. For the case of k_1 = 1.5 and
h_1 = 4.0, write out the R matrix (in the equation L = 1 + RL) one
has using the quadrature rule defined by m = 8, a_i = (2i − 1)h_1/2m
and each w_i = h_1/m.
(c) Consider making a Markov Chain approximation to the ARL referred
to in part (b). For m = 8 and the discretization discussed in class,
write out the R matrix that would be used in this case. How does
this matrix compare to the one in part (b)?
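A sketch of the part (b) construction. One detail is not dictated by the problem statement: the L(0) appearing in the integral equation is approximated below by L(a_1), which folds the F(k_1 − u) term into the first column of R; with that assumption in place, the system L = 1 + RL can be solved by fixed-point iteration:

```python
import math

k1, h1, m = 1.5, 4.0, 8
a = [(2 * i - 1) * h1 / (2 * m) for i in range(1, m + 1)]   # quadrature nodes
w = h1 / m                                                  # common weight

def f(x):  # standard exponential density
    return math.exp(-x) if x > 0 else 0.0

def F(x):  # and cdf
    return 1 - math.exp(-x) if x > 0 else 0.0

# R[i][j] discretizes L(u) = 1 + L(0) F(k1 - u) + int_0^h1 L(y) f(y + k1 - u) dy
# at u = a_i, with the extra approximation L(0) ~ L(a_1) folded into column 0
R = [[w * f(a[j] + k1 - a[i]) + (F(k1 - a[i]) if j == 0 else 0.0)
      for j in range(m)] for i in range(m)]

# solve L = 1 + R L by fixed-point iteration (row sums of R are below 1)
L = [1.0] * m
for _ in range(10000):
    L = [1 + sum(R[i][j] * L[j] for j in range(m)) for i in range(m)]
print([round(x, 2) for x in L])
```

The resulting L values decrease in the head start u, as they must for a high side scheme, and can be compared against the exact Vardeman and Ray solution.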
2.31. Consider the problem of determining the run length properties of a high side CUSUM scheme with head start $u$, reference value $k$ and decision interval $h$ if iid continuous observations $Q_1, Q_2, \ldots$ with common probability density $f$ and cdf $F$ are involved. Let $T$ be the run length variable. In class, Vardeman concentrated on $L(u) = \mathrm{E}T$, the ARL of the scheme. But other features of the run length distribution might well be of interest in some applications.
(a) The variance of $T$, $\mathrm{Var}\,T = \mathrm{E}T^2 - L^2(u)$, might also be of importance in some instances. Let $M(u) = \mathrm{E}T^2$ and argue very carefully that $M(u)$ must satisfy the integral equation

$$M(u) = 1 + (M(0) + 2L(0))F(k - u) + \int_0^h (M(s) + 2L(s))\,f(s + k - u)\,ds\;.$$

(Once one has found $L(u)$, this gives an integral equation that can be solved for $M(u)$, leading to values for $\mathrm{Var}\,T$, since then $\mathrm{Var}\,T = M(u) - L^2(u)$.)
(b) The probability function of $T$, $P(t; u) = \Pr[T = t]$, might also be of importance in some instances. Express $P(1; u)$ in terms of $F$. Then argue very carefully that for $t > 1$, $P(t; u)$ must satisfy the recursion

$$P(t; u) = P(t-1; 0)F(k - u) + \int_0^h P(t-1; s)\,f(s + k - u)\,ds\;.$$

(There is thus the possibility of determining successively the function $P(1; u)$, then the function $P(2; u)$, then the function $P(3; u)$, etc.)
2.32. In Section 2.2, Vardeman considered a two alarm rule monitoring scheme due to Wetherill and showed how to find the ARL for that scheme by solving two linear equations for quantities $L_1$ and $L_2$. It is possible to extend the arguments presented there and find the variance of the run length.
(a) For a generic random variable $X$, express both $\mathrm{Var}\,X$ and $\mathrm{E}(X+1)^2$ in terms of $\mathrm{E}X$ and $\mathrm{E}X^2$.
(b) Let $M_1$ be the expected square of the run length for the Wetherill scheme and let $M_2$ be the expected square of the number of additional plotted points required to produce an out-of-control signal if there has been no signal to date and the current plotted point is between 2- and 3-sigma limits. Set up two equations for $M_1$ and $M_2$ that are linear in $M_1$, $M_2$, $L_1$ and $L_2$.
2. PROCESS MONITORING 91
(c) The equations from (b) can be solved simultaneously for $M_1$ and $M_2$. Express the variance of the run length for the Wetherill scheme in terms of $M_1$, $M_2$, $L_1$ and $L_2$.
2.33. Consider a Shewhart control chart with the single extra alarm rule "signal if 2 out of any 3 consecutive points fall between 2 sigma and 3 sigma limits on one side of the center line." Suppose that points $Q_1, Q_2, Q_3, \ldots$ are to be plotted on this chart and that the $Q$s are iid.

Use the notation

$p_A$ = the probability $Q_1$ falls outside 3 sigma limits,

$p_B$ = the probability $Q_1$ falls between 2 and 3 sigma limits above the center line,

$p_C$ = the probability $Q_1$ falls between 2 and 3 sigma limits below the center line,

$p_D$ = the probability $Q_1$ falls inside 2 sigma limits,

and set up a Markov Chain that you can use to find the ARL of this scheme under the iid model for the $Q$s. (Be sure to carefully and completely define your state space, write out the proper transition matrix and indicate which entry of $(I - R)^{-1}\mathbf{1}$ gives the desired ARL.)
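For a concrete feel for the kind of chain involved, here is one possible bookkeeping (states recording the zones of the two most recent points) together with illustrative standard normal zone probabilities. The particular numbers $p_A, p_B, p_C, p_D$ below are assumptions of this sketch, not part of the problem, and other state-space definitions are possible.

```python
import numpy as np
from itertools import product

# ARL of a Shewhart chart with the extra rule "signal if 2 out of any 3
# consecutive points fall between the 2 and 3 sigma limits on one side".
# Illustrative in-control standard normal zone probabilities:
pA = 0.0027                 # outside 3 sigma
pB = pC = 0.02140           # between 2 and 3 sigma, above / below
pD = 1.0 - pA - pB - pC     # inside 2 sigma
p = {"B": pB, "C": pC, "D": pD}

# Transient states record the zones of the two most recent points.
# Zone A alarms immediately and never enters a state, and states like
# ("B", "B") alarm before they can be occupied, so they are dropped.
states = [s for s in product("BCD", repeat=2) if s[0] == "D" or s[0] != s[1]]
idx = {s: i for i, s in enumerate(states)}

R = np.zeros((len(states), len(states)))
for s in states:
    for z in "BCD":
        if z != "D" and z in s:      # new point completes 2-of-3: alarm
            continue
        R[idx[s], idx[(s[1], z)]] += p[z]

# Starting the chain in ("D", "D") reproduces the chart's true start:
# the first two real points are then judged exactly as the rule dictates.
L = np.linalg.solve(np.eye(len(states)) - R, np.ones(len(states)))
print("ARL:", L[idx[("D", "D")]])
```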
2.34. A process has a "good" state and a "bad" state. Suppose that when in the good state, the probability that an observation on the process plots outside of control limits is $g$, while the corresponding probability for the bad state is $b$. Assume further that if the process is in the good state at time $t - 1$, there is a probability $d$ of degradation to the bad state before an observation at time $t$ is made. (Once the process moves into the bad state it stays there until that condition is detected via process monitoring and corrected.) Find the ARL/mean time to alarm, if the process is in the good state at time $t = 0$ and observation starts at time $t = 1$.
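The computation asked for here reduces to a $2 \times 2$ linear system over the transient states "good, no alarm yet" and "bad, no alarm yet". The sketch below uses illustrative values of $g$, $b$ and $d$ (the numbers are made up, not part of the problem).

```python
import numpy as np

# Mean time to alarm for the good/bad state process of this problem.
# g = P(point plots outside limits | good), b = same for bad,
# d = P(degrade good -> bad before the next observation).
g, b, d = 0.0027, 0.20, 0.05   # illustrative values only

# Transient states: 0 = good (no alarm yet), 1 = bad (no alarm yet).
# From good: possibly degrade (prob d), then observe; alarm absorbs.
R = np.array([[(1 - d) * (1 - g), d * (1 - b)],
              [0.0,               (1 - b)]])
L = np.linalg.solve(np.eye(2) - R, np.ones(2))
print("mean time to alarm starting in the good state:", L[0])
print("mean time to alarm starting in the bad state: ", L[1])
```

Note that from the bad state the mean time to alarm is simply $1/b$, which gives a quick check on the matrix setup.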
2.35. Consider the following (nonstandard) process monitoring scheme for a variable $X$ that has ideal value 0. Suppose $h(x) > 0$ is a function with $h(-x) = h(x)$ that is decreasing in $|x|$. ($h$ has its maximum at 0 and decreases symmetrically as one moves away from 0.) Then suppose that

i) control limits for $X_1$ are $\pm h(0)$,

and ii) for $t > 1$ control limits for $X_t$ are $\pm h(X_{t-1})$.

(Control limits vary. The larger that $|X_{t-1}|$ is, the tighter are the limits on $X_t$.) Discuss how you would find an ARL for this scheme for iid $X$ with marginal probability density $f$. (Write down an appropriate integral equation, briefly discuss how you would go about solving it and what you would do with the solution in order to find the desired ARL.)
2.36. Consider the problem of monitoring integer-valued variables $Q_1, Q_2, Q_3, \ldots$ (we'll suppose that $Q$ can take any integer value, positive or negative). Define

$$h(x) = 4 - |x|$$

and consider the following definition of an alarm scheme:

1) alarm at time $i = 1$ if $|Q_1| \ge 4$, and

2) for $i \ge 2$ alarm at time $i$ if $|Q_i| \ge h(Q_{i-1})$.

For integer $j$, let $q_j = P[Q_1 = j]$ and suppose the $Q_i$ are iid. Carefully describe how to find the ARL for this situation. (You don't need to produce a formula, but you do need to set up an appropriate MC and tell me exactly/completely what to do with it in order to get the ARL.)
2.37. Consider the problem of monitoring integer-valued variables $Q_t$ (we'll suppose that $Q$ can take any integer value, positive or negative). A combination of individuals and moving range charts will be used according to the scheme that at time 1, $Q_1$ alone will be plotted, while at time $t > 1$ both $Q_t$ and $MR_t = |Q_t - Q_{t-1}|$ will be plotted. The alarm will ring at the first period where $|Q_t| > 3$ or $MR_t > 4$. Suppose that the variables $Q_1, Q_2, \ldots$ are iid and $p_i = P[Q_1 = i]$. Consider the problem of finding an average run length in this scenario.
(a) Set up the transition matrix for an 8 state Markov Chain describing the evolution of this charting method from $t = 2$ onward, assuming that the alarm doesn't ring at $t = 1$. (State $S_i$ for $i = -3, -2, -1, 0, 1, 2, 3$ will represent the situation "no alarm yet and the most recent observation is $i$" and there will be an "alarm" state.)
(b) Given values for the $p_i$, one could use the transition matrix from part (a) and solve for mean times to alarm from the states $S_i$. Call these $L_{-3}$, $L_{-2}$, $L_{-1}$, $L_0$, $L_1$, $L_2$, and $L_3$. Express the average run length of the whole scheme (including the plotting at time $t = 1$ when only $Q_1$ is plotted) in terms of the $L_i$ and $p_i$ values.
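The bookkeeping in (a) and (b) can be checked numerically. The distribution $p_i$ used below (an integer-rounded normal) is purely illustrative; any probabilities could be substituted.

```python
import numpy as np

# ARL for the combined individuals / moving range scheme of this
# problem: alarm at the first t with |Q_t| > 3 or MR_t > 4.
# The p_i below come from an integer-rounded normal and are purely
# illustrative.
support = np.arange(-10, 11)
weights = np.exp(-(support ** 2) / (2 * 1.5 ** 2))
p = dict(zip(support.tolist(), (weights / weights.sum()).tolist()))

vals = range(-3, 4)                      # states S_{-3}, ..., S_3
idx = {v: i for i, v in enumerate(vals)}
R = np.zeros((7, 7))
for v in vals:
    for q in vals:                       # any |q| > 3 alarms outright
        if abs(q - v) <= 4:              # moving range does not alarm
            R[idx[v], idx[q]] = p[q]

L = np.linalg.solve(np.eye(7) - R, np.ones(7))   # mean alarm times L_i

# Fold in time 1, when only Q_1 is plotted (no moving range yet):
arl = 1.0 + sum(p[q] * L[idx[q]] for q in vals)
print("ARL:", arl)
```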
3 Engineering Control and Stochastic Control
Theory
3.1. Consider the use of the PI(D) controller $\Delta X(t) = .5E(t) + .25\Delta E(t)$ in a situation where the control gain, $G$, is 1 and the target for the controlled variable is $T(t) \equiv 0$. Suppose that no control actions are applied before the time $t = 0$, but that for $t \ge 0$, $E(t)$ and $\Delta E(t)$ are used to make changes in the manipulated variable, $X(t)$, according to the above equation.

Suppose further that the value of the controlled variable, $Y(t)$, is the sum of what the process would do with no control, say $Z(t)$, and the sum of effects at time $t$ of all changes in the manipulated variable made in previous periods based on $E(0)$, $\Delta E(0)$, $E(1)$, $\Delta E(1)$, $E(2)$, $\Delta E(2), \ldots, E(t-1)$, $\Delta E(t-1)$.
Consider 3 possible patterns of impact at time $s$ of a change in the manipulated variable made at time $t$, $\Delta X(t)$:

Pattern 1: The effect on $Y(s)$ is $1 \cdot \Delta X(t)$ for all $s \ge t + 1$ (a control action takes its full effect immediately).

Pattern 2: The effect on $Y(t+1)$ is 0, but the effect on $Y(s)$ is $1 \cdot \Delta X(t)$ for all $s \ge t + 2$ (there is one period of dead time, after which a control action immediately takes its full effect).

Pattern 3: The effect on $Y(s)$ is $(1 - 2^{t-s})\Delta X(t)$ for all $s \ge t + 1$ (there is an exponential/geometric pattern in the way the impact of $\Delta X(t)$ is felt, the full effect only being seen for large $s$).
Consider also 3 possible deterministic patterns of uncontrolled process behavior, $Z(t)$:

Pattern A: $Z(t) = 3$ for all $t \ge -1$ (the uncontrolled process would remain constant, but off target).

Pattern B: $Z(t) = 3$ for all $-1 \le t \le 5$, while $Z(t) = -3$ for all $6 \le t$ (there is a step change in where the uncontrolled process would be).

Pattern C: $Z(t) = 3 + t$ for all $t \ge -1$ (there is a linear trend in where the uncontrolled process would be).

For each of the $3 \times 3 = 9$ combinations of patterns in the impact of changes in the manipulated variable and behavior of the uncontrolled process, make up a table giving at times $t = -1, 0, 1, 2, \ldots, 10$ the values of $Z(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.
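As one worked instance of the nine requested tables, the combination of impact Pattern 1 with the constant off-target process $Z(t) = 3$ can be tabulated by a short loop (the controller equation is the one given above, with target 0; `dE` and `dX` abbreviate the differenced quantities):

```python
# Worked table for one of the nine combinations: impact Pattern 1
# (full effect of dX immediately) with Z(t) = 3 for all t, using the
# PI(D) controller dX(t) = .5 E(t) + .25 dE(t) and target T(t) = 0.
rows, actions = [], []        # actions holds the history of dX(j)
E_prev = None
for t in range(-1, 11):
    Z = 3.0
    Y = Z + sum(actions)                  # Pattern 1: all past dX act fully
    E = 0.0 - Y                           # E(t) = T(t) - Y(t)
    dE = 0.0 if E_prev is None else E - E_prev
    dX = 0.5 * E + 0.25 * dE if t >= 0 else 0.0
    if t >= 0:                            # control starts at t = 0
        actions.append(dX)
    rows.append((t, Z, E, dE, dX, Y))
    E_prev = E

for t, Z, E, dE, dX, Y in rows:
    print(f"t={t:3d}  Z={Z:4.1f}  E={E:7.3f}  dE={dE:7.3f}  dX={dX:7.3f}  Y={Y:7.3f}")
```

The integral action removes the constant offset: $Y(t)$ decays toward the target 0.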
3.2. Consider again the PI(D) controller of Problem 3.1. Suppose that the target is $T(t)$, where $T(t) = 0$ for $t \le 5$ and $T(t) = 3$ for $t > 5$. For the Pattern 1 of impact of control actions and Patterns A, B and C for $Z(t)$, make up tables giving at times $t = -1, 0, 1, 2, \ldots, 10$ the values of $Z(t)$, $T(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.
3.3. Consider again the PI(D) controller of Problem 3.1 and

Pattern D: $Z(t) = (-1)^t$ (the uncontrolled process would oscillate around the target).

For the Patterns 1 and 2 of impact of control actions, make up tables giving at times $t = -1, 0, 1, 2, \ldots, 10$ the values of $Z(t)$, $T(t)$, $E(t)$, $\Delta E(t)$, $\Delta X(t)$ and $Y(t)$.
3.4. There are two tables here giving some values of an uncontrolled process $Z(t)$ that has target $T(t) \equiv 0$. Suppose that a manipulated variable $X$ is available and that the simple (integral only) control algorithm

$$\Delta X(t) = E(t)$$

will be employed, based on an observed process $Y(t)$ that is the sum of $Z(t)$ and the effects of all relevant changes in $X$.

Consider two different scenarios:

(a) a change of $\Delta X$ in the manipulated variable impacts all subsequent values of $Y(t)$ by the addition of an amount $\Delta X$, and

(b) there is one period of dead time, after which a change of $\Delta X$ in the manipulated variable impacts all subsequent values of $Y(t)$ by the addition of an amount $\Delta X$.

Fill in the two tables according to these two scenarios and then comment on the lesson they seem to suggest about the impact of dead time on the effectiveness of PID control.
3.5. On pages 87 and 88 V&J suggest that over-adjustment of a process will increase rather than decrease variation. In this problem we will investigate this notion mathematically. Imagine periodically sampling a widget produced by a machine and making a measurement $y_i$. Conceptualize the situation as

$$y_i = \mu_i + \epsilon_i$$

where
Table 6.1: Table for Problem 3.4(a), No Dead Time

t    Z(t)   T(t)   Y(t)   E(t) = $\Delta X(t)$
0     1      0      1
1     1      0
2     1      0
3     1      0
4     1      0
5     1      0
6     1      0
7     1      0
8     1      0
9     1      0
Table 6.2: Table for Problem 3.4(b), One Period of Dead Time

t    Z(t)   T(t)   Y(t)   E(t) = $\Delta X(t)$
0     1      0      1
1     1      0
2     1      0
3     1      0
4     1      0
5     1      0
6     1      0
7     1      0
8     1      0
9     1      0
$\mu_i$ = the true machine setting (or widget diameter) at time $i$

and

$\epsilon_i$ = random variability at time $i$ affecting only measurement $i$.

Further, suppose that the (coded) ideal diameter is 0 and $\mu_i$ is the sum of natural machine drift and adjustments applied by an operator up through time $i$. That is, with

$\delta_i$ = the machine drift between time $i-1$ and time $i$

and

$a_i$ = the operator (or automatic controller's) adjustment applied between time $i-1$ and time $i$,

suppose that $\mu_0 = 0$ and for $j \ge 1$ we have

$$\mu_j = \sum_{i=1}^{j} \delta_i + \sum_{i=1}^{j} a_i\;.$$

We will here consider the (integral-only) adjustment policies for the machine

$$a_i = -\kappa\, y_{i-1} \quad \text{for a } \kappa \in [0, 1]\;.$$
It is possible to verify that for $j \ge 1$

if $\kappa = 0$: $\quad y_j = \sum_{i=1}^{j} \delta_i + \epsilon_j\;,$

if $\kappa = 1$: $\quad y_j = \delta_j - \epsilon_{j-1} + \epsilon_j\;,$

and if $\kappa \in (0, 1)$: $\quad y_j = \sum_{i=1}^{j} \delta_i (1-\kappa)^{j-i} - \kappa \sum_{i=1}^{j} \epsilon_{i-1}(1-\kappa)^{j-i} + \epsilon_j\;.$
Model $\epsilon_0, \epsilon_1, \epsilon_2, \ldots$ as independent random variables with mean 0 and variance $\sigma^2$ and consider predicting the likely effectiveness of the adjustment policies by finding $\lim_{j\to\infty} \mathrm{E}\mu_j^2$. ($\mathrm{E}\mu_j^2$ is a measure of how close to proper adjustment the machine can be expected to be at time $j$.)
(a) Compare choices of $\kappa$ supposing that $\delta_i \equiv 0$. (Here the process is stable.)

(b) Compare choices of $\kappa$ supposing that $\delta_i \equiv d$, some constant. (This is a case of deterministic linear machine drift, and might for example be used to model tool wear over reasonably short periods.)

(c) Compare choices of $\kappa$ supposing $\delta_1, \delta_2, \ldots$ is a sequence of independent random variables with mean 0 and variance $\sigma_\delta^2$ that is independent of the $\epsilon$ sequence. What would you recommend using if this (random walk) model seems appropriate and $\sigma_\delta$ is thought to be about one half of $\sigma$?
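A Monte Carlo look at the limiting mean squared setting under the constant-drift scenario of part (b) illustrates the over/under-adjustment tradeoff: too little adjustment leaves drift uncorrected, too much transmits measurement noise into the setting. The drift size `d` and noise scale `sigma` below are illustrative choices only; `mu` denotes the true machine setting and `kappa` the policy constant in $a_i = -\kappa y_{i-1}$.

```python
import numpy as np

# Monte Carlo estimate of E mu_j^2 (j large) for adjustment policies
# a_i = -kappa * y_{i-1} under constant drift delta_i = d.
# d and sigma are illustrative assumptions, not from the problem.
rng = np.random.default_rng(1)
d, sigma, n_steps, n_reps = 0.1, 1.0, 400, 2000

def mean_sq_setting(kappa):
    eps = rng.normal(0.0, sigma, size=(n_reps, n_steps + 1))
    mu = np.zeros(n_reps)
    for j in range(1, n_steps + 1):
        y_prev = mu + eps[:, j - 1]      # y_{j-1} = mu_{j-1} + eps_{j-1}
        mu = mu + d - kappa * y_prev     # mu_j = mu_{j-1} + delta_j + a_j
    return float(np.mean(mu ** 2))

for kappa in (0.1, 0.5, 1.0):
    print(f"kappa={kappa:4.1f}  E mu_j^2 (j large) ~ {mean_sq_setting(kappa):6.3f}")
```

Under this scenario a stationary-distribution calculation gives the limit $(d/\kappa)^2 + \kappa\sigma^2/(2-\kappa)$, so an intermediate $\kappa$ minimizes the limiting mean square.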
3.6. Suppose that $\ldots, \epsilon(-1), \epsilon(0), \epsilon(1), \epsilon(2), \ldots$ are iid normal random variables with mean 0 and variance $\sigma^2$ and that

$$Z(t) = \epsilon(t-1) + \epsilon(t)\;.$$

(Note that under this model consecutive $Z$s are correlated, but those separated in time by at least 2 periods are independent.) As it turns out, under this model

$$\mathrm{E}_F[Z(t+1) \mid Z^t] = \frac{1}{t+2}\sum_{j=0}^{t} (-1)^j (t+1-j)\, Z(t-j)$$

while

$$\mathrm{E}_F[Z(s) \mid Z^t] = 0 \quad \text{for } s \ge t+2\;.$$
If $T(t) \equiv 0$, find optimal (MV) control strategies for two different situations involving numerical process adjustments $a$.

(a) First suppose that $A(a, s) = a$ for all $s \ge 1$. (Note that in the limit as $t \to \infty$, the MV controller is a proportional-only controller.)

(b) Then suppose the impact of a control action is similar to that in (a), except there is one period of delay, i.e.

$$A(a, s) = \begin{cases} a & \text{for } s \ge 2 \\ 0 & \text{for } s = 1\;. \end{cases}$$

(You should decide that $a(t) \equiv 0$ is optimal.)
(c) For the situation without dead time in part (a), write out $Y(t)$ in terms of the $\epsilon$s. What are the mean and variance of $Y(t)$? How do these compare to the mean and variance of $Z(t)$? Would you say from this comparison that the control algorithm is effective in directing the process to the target $T(t) = 0$?

(d) Again for the situation of part (a), consider the matter of process monitoring for a change from the model of this problem (that ought to be greeted by a revision of the control algorithm or some other appropriate intervention). Argue that after some start-up period it makes sense to Shewhart chart the $Y(t)$s, treating them as essentially iid Normal $(0, \sigma^2)$ if all is OK. (What is the correlation between $Y(t)$ and $Y(t-1)$?)
3.7. Consider the optimal stochastic control problem as described in Section 3.1 with $Z(t)$ an iid normal $(0, 1)$ sequence of random variables, control actions $a \in (-\infty, \infty)$, $A(a, s) = a$ for all $s \ge 1$ and $T(s) \equiv 0$ for all $s$. What do you expect the optimal (minimum variance) control strategy to turn out to be? Why?
3.8. (Vander Wiel) Consider a stochastic control problem with the following elements. The (stochastic) model, $F$, for the uncontrolled process, $Z(t)$, will be

$$Z(t) = \phi Z(t-1) + \epsilon(t)$$

where the $\epsilon(t)$ are iid normal $(0, \sigma^2)$ random variables and $\phi$ is a (known) constant with absolute value less than 1. ($Z(t)$ is a first order autoregressive process.) For this model,

$$\mathrm{E}_F[Z(t+1) \mid \ldots, Z(-1), Z(0), Z(1), \ldots, Z(t)] = \phi Z(t)\;.$$

For the function $A(a, s)$ describing the effect of a control action $a$ taken $s$ periods previous, we will use $A(a, s) = a\lambda^{s-1}$ for another known constant $0 < \lambda < 1$ (the effect of an adjustment made at a given period dies out geometrically).
Carefully find $a(0)$, $a(1)$, and $a(2)$ in terms of a constant target value $T$ and $Z(0)$, $Y(1)$ and $Y(2)$. Then argue that in general

$$a(t) = T\left(1 + (\phi - \lambda)\sum_{s=0}^{t-1}\phi^s\right) - \phi Y(t) - (\phi - \lambda)\sum_{s=1}^{t}\phi^s\, Y(t-s)\;.$$

For large $t$, this prescription reduces to approximately what?
3.9. Consider the following stochastic control problem. The stochastic model, $F$, for the uncontrolled process $Z(t)$, will be

$$Z(t) = ct + \epsilon(t)$$

where $c$ is a known constant and the $\epsilon(t)$s are iid normal $(0, \sigma^2)$ random variables. (The $Z(t)$ process is a deterministic linear trend seen through iid/white noise.) For the function $A(a, s)$ describing the effect of a control action $a$ taken $s$ periods previous, we will use $A(a, s) = (1 - 2^{-s})a$ for all $s \ge 1$. Suppose further that the target value for the controlled process is $T = 0$ and that control begins at time 0 (after observing $Z(0)$).

(a) Argue carefully that $\hat{Z}(t) = \mathrm{E}_F[Z(t+1) \mid \ldots, Z(-1), Z(0), Z(1), \ldots, Z(t)] = c(t+1)$.
(b) Find the minimum variance control algorithm and justify your answer. Does there seem to be a limiting form for $a(t)$?

(c) According to the model here, the controlled process $Y(t)$ should have what kind of behavior? (How would you describe the joint distribution of the variables $Y(1), Y(2), \ldots, Y(t)$?) Suppose that you decide to set up "Shewhart type" control limits to use in monitoring the $Y(t)$ sequence. What values do you recommend for LCL and UCL in this situation? (These could be used as an on-line check on the continuing validity of the assumptions that we have made here about $F$ and $A(a, s)$.)
3.10. Consider the following optimal stochastic control problem. Suppose that for some (known) appropriate constants $\phi_1$ and $\phi_2$, the uncontrolled process $Z(t)$ has the form

$$Z(t) = \phi_1 Z(t-1) + \phi_2 Z(t-2) + \epsilon(t)$$

for the $\epsilon$s iid with mean 0 and variance $\sigma^2$. (The $\epsilon$s are independent of all previous $Z$s.) Suppose further that for control actions $a \in (-\infty, \infty)$, $A(a, 1) = 0$ and $A(a, s) = a$ for all $s \ge 2$. (There is a one period delay, following which the full effect of a control action is immediately felt.) For $s \ge 1$, let $T(s)$ be an arbitrary sequence of target values for the process.
(a) Argue that

$$\mathrm{E}_F[Z(t+1) \mid \ldots, Z(t-2), Z(t-1), Z(t)] = \phi_1 Z(t) + \phi_2 Z(t-1)$$

and that

$$\mathrm{E}_F[Z(t+2) \mid \ldots, Z(t-2), Z(t-1), Z(t)] = (\phi_1^2 + \phi_2)Z(t) + \phi_1\phi_2 Z(t-1)\;.$$

(b) Carefully find $a(0)$, $a(1)$ and $a(2)$ in terms of $Z(-1)$, $Z(0)$, $Y(1)$, $Y(2)$ and the $T(s)$ sequence.

(c) Finally, give a general form for the optimal control action to be taken at time $t \ge 3$ in terms of $\ldots, Z(-1), Z(0), Y(1), Y(2), \ldots, Y(t)$ and $a(0), a(1), \ldots, a(t-1)$.
3.11. Use the first order autoregressive model of Problem 3.8 and consider the two functions $A(a, s)$ from Problem 3.6. Find the MV optimal control policies (in terms of the $Y$s) for the $T \equiv 0$ situation. Are either of these PID control algorithms?
3.12. A process has a "Good" state and a "Bad" state. Every morning a gremlin tosses a coin with $P[\text{Heads}] = u > .5$ that governs how states evolve day to day. Let

$$C_i = P[\text{change state on day } i \text{ from that on day } i-1]\;.$$

Each $C_i$ is either $u$ or $1 - u$.
(a) Before the gremlin tosses the coin on day $i$, you get to choose whether

$$C_i = u \quad (\text{so that Heads} \Rightarrow \text{change})$$

or

$$C_i = 1 - u \quad (\text{so that Heads} \Rightarrow \text{no change})\;.$$

(You either apply some counter-measures or let the process evolve naturally.) Your object is to see that the process is in the Good state as often as possible. What is your optimal strategy? (What should you do on any morning $i$? This needs to depend upon the state of the process from day $i-1$.)

(b) If all is as described here, the evolution of the states under your optimal strategy from (a) is easily described in probabilistic terms. Do so. Then describe in rough/qualitative terms how you might monitor the sequence of states to detect the possibility that the gremlin has somehow changed the rules of process evolution on you.
(c) Now suppose that there is a one-day time delay in your counter-measures. Before the gremlin tosses his coin on day $i$ you get to choose only whether

$$C_{i+1} = u$$

or

$$C_{i+1} = 1 - u\;.$$

(You do not get to choose $C_i$ on the morning of day $i$.) Now what is your optimal strategy? (What you should choose on the morning of day $i$ depends upon what you already chose on the morning of day $i-1$ and whether the process was in the Good state or in the Bad state on day $i-1$.) Show appropriate calculations to support your answer.
4 Process Characterization
4.1. The following are depth measurements taken on $n = 8$ pump end caps. The units are inches.

$$4.9991,\ 4.9990,\ 4.9994,\ 4.9989,\ 4.9986,\ 4.9991,\ 4.9993,\ 4.9990$$

The specifications for this depth measurement were $4.999 \pm .001$ inches.
(a) As a means of checking whether a normal distribution assumption is plausible for these depth measurements, make a normal plot of these data. (Use regular graph paper and the method of Section 5.1.) Read an estimate of $\sigma$ from this plot.

Regardless of the appearance of your plot from (a), henceforth suppose that one is willing to say that the process producing these lengths is stable and that a normal distribution of depths is plausible.

(b) Give a point estimate and a 90% two-sided confidence interval for the process capability, $6\sigma$.

(c) Give a point estimate and a 90% two-sided confidence interval for the process capability ratio $C_p$.

(d) Give a point estimate and a 95% lower confidence bound for the process capability ratio $C_{pk}$.

(e) Give a 95% two-sided prediction interval for the next depth measurement on a cap produced by this process.

(f) Give a 99% two-sided tolerance interval for 95% of all depth measurements of end caps produced by this process.
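The point estimates asked for in parts (b)-(d) can be verified in a few lines; the confidence intervals additionally require chi-square and tolerance factors from tables, which are not reproduced in this sketch.

```python
import statistics

# Point estimates for parts (b)-(d) of this problem.
depths = [4.9991, 4.9990, 4.9994, 4.9989, 4.9986, 4.9991, 4.9993, 4.9990]
LSL, USL = 4.998, 5.000                 # specs 4.999 +/- .001

xbar = statistics.mean(depths)
s = statistics.stdev(depths)            # sample sd, n - 1 divisor

cap = 6 * s                             # estimated process capability
Cp = (USL - LSL) / (6 * s)
Cpk = min(USL - xbar, xbar - LSL) / (3 * s)
print(f"xbar={xbar:.5f}  s={s:.6f}  6s={cap:.5f}  Cp={Cp:.2f}  Cpk={Cpk:.2f}")
```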
4.2. Below are the logarithms of the amounts (in ppm by weight) of aluminum found in 26 bihourly samples of recovered PET plastic at a Rutgers University recycling plant, taken from a JQT paper by Susan Albin. (In this context, aluminum is an impurity.)
5.67, 5.40, 4.83, 4.37, 4.98, 4.78, 5.50, 4.77, 5.20, 4.14, 3.40, 4.94, 4.62,
4.62, 4.47, 5.21, 4.09, 5.25, 4.78, 6.24, 4.79, 5.15, 4.25, 3.40, 4.50, 4.74
(a) Set up and plot charts for a sensible monitoring scheme for these values. (They are in order if one reads left to right, top to bottom.) Caution: Simply computing a mean and sample standard deviation for these values and using limits for individuals of the form $\bar{x} \pm 3s$ does not produce a sensible scheme! Say clearly what you are doing and why.

(b) Suppose that (on the basis of an analysis of the type in (a) or otherwise) it is plausible to treat the 26 values above as a sample of size $n = 26$ from some physically stable normally distributed process. (Note $\bar{x} \approx 4.773$ and $s \approx .632$.)
i. Give a two-sided interval that you are "90% sure" will contain the next log aluminum content of a sample taken at this plant. Transform this to an interval for the next raw aluminum content.

ii. Give a two-sided interval that you are "95% sure" will contain 90% of all log aluminum contents. Transform this interval to one for raw aluminum contents.
(c) Rather than adopting the stable process model alluded to in part (b), suppose that it is only plausible to assume that the log purity process is stable for periods of about 10 hours, but that mean purities can change (randomly) at roughly ten hour intervals. Note that if one considers the first 25 values above to be 5 samples of size 5, some summary statistics are then given below:

period        1      2      3      4      5
$\bar{x}$   5.050  4.878  4.410  5.114  4.418
$s$          .506   .514   .590   .784   .661
$R$          1.30   1.36   1.54   2.15   1.75

Based on the usual random effects model for this two-level nested/hierarchical situation, give reasonable point estimates of the within-period standard deviation and the standard deviation governing period to period changes in process mean.
4.3. A standard (in engineering statistics) approximation due to Wallis (used on page 468 of V&J) says that often it is adequate to treat the variable $\bar{x} \pm ks$ as if it were normal with mean $\mu \pm k\sigma$ and variance

$$\sigma^2\left(\frac{1}{n} + \frac{k^2}{2n}\right)\;.$$

Use the Wallis approximation to the distribution of $\bar{x} + ks$ and find $k$ such that for $x_1, x_2, \ldots, x_{26}$ iid normal random variables, $\bar{x} + ks$ is a 99% upper statistical tolerance bound for 95% of the population. (That is, your job is to choose $k$ so that $P\left[\Phi\left(\frac{\bar{x} + ks - \mu}{\sigma}\right) \ge .95\right] \approx .99$.) How does your approximate value compare to the exact one given in Table A.9b?
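Under the Wallis approximation the requirement on $k$ rearranges to $k - z_{.95} = z_{.99}\sqrt{1/n + k^2/2n}$ (with $z_p$ denoting standard normal quantiles), which a simple bisection solves:

```python
from math import sqrt

# Solve the Wallis-approximation equation for k with n = 26:
# the bound xbar + k s exceeds the .95 quantile mu + z95*sigma with
# probability .99 when  k - z95 = z99 * sqrt(1/n + k^2 / (2n)).
z95, z99, n = 1.6449, 2.3263, 26    # standard normal quantiles

def gap(k):
    return (k - z95) - z99 * sqrt(1.0 / n + k * k / (2 * n))

lo, hi = z95, 10.0                  # gap(lo) < 0 < gap(hi)
for _ in range(100):                # plain bisection
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)
k = (lo + hi) / 2
print(f"approximate tolerance factor k = {k:.3f}")
```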
4.4. Consider the problem of pooling together samples of size $n$ from, say, five different days to make inferences about all widgets produced during that period. In particular, consider the problem of estimating the fraction of widgets with diameters that are outside of engineering specifications. Suppose that

$N_i$ = the number of widgets produced on day $i$,

$p_i$ = the fraction of widgets produced on day $i$ that have diameters that are outside engineering specifications,

and

$\hat{p}_i$ = the fraction of the $i$th sample that have out-of-spec diameters.

If the samples are simple random samples of the respective daily productions, standard finite population sampling theory says that

$$\mathrm{E}\hat{p}_i = p_i \quad \text{and} \quad \mathrm{Var}\,\hat{p}_i = \left(\frac{N_i - n}{N_i - 1}\right)\frac{p_i(1 - p_i)}{n}\;.$$
Two possibly different estimators of the population fraction of diameters out of engineering specifications,

$$p = \sum_{i=1}^{5} N_i p_i \Big/ \sum_{i=1}^{5} N_i\;,$$

are

$$\hat{p} = \sum_{i=1}^{5} N_i \hat{p}_i \Big/ \sum_{i=1}^{5} N_i \quad \text{and} \quad \tilde{p} = \frac{1}{5}\sum_{i=1}^{5} \hat{p}_i\;.$$

Show that $\mathrm{E}\hat{p} = p$, but that $\mathrm{E}\tilde{p}$ need not be $p$ unless all $N_i$ are the same. Assuming the independence of the $\hat{p}_i$, what are the variances of $\hat{p}$ and $\tilde{p}$? Note that neither of these needs to equal

$$\left(\frac{\sum N_i - 5n}{\sum N_i - 1}\right)\frac{p(1-p)}{5n}\;.$$
4.5. Suppose that the hierarchical random effects model used in Section 5.5 of V&J is a good description of how 500 widget diameters arise on each of 5 days in each of 10 weeks. (That is, suppose that the model is applicable with $I = 10$, $J = 5$ and $K = 500$.) Suppose further that of interest is the grand (sample) variance of all $10 \cdot 5 \cdot 500$ widget diameters. Use the expected mean squares and write out an expression for the expected value of this variance in terms of $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$.

Now suppose that one only observes 2 widget diameters each day for 5 weeks and in fact obtains the data in the accompanying table. From these data obtain point estimates of the variance components $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$. Use these and your formula from above to predict the variance of all $10 \cdot 5 \cdot 500$ widget diameters. Then make a similar prediction for the variance of the diameters from the next 10 weeks, supposing that the $\sigma_\alpha^2$ variance component could be eliminated.
4.6. Consider a situation in which a lot of 50,000 widgets has been packed into 100 crates, each of which contains 500 widgets. Suppose that, unbeknownst to us, the lot consists of 25,000 widgets with diameter 5 and 25,000 widgets with diameter 7. We wish to estimate the variance of the widget diameters in the lot (which is 50,000/49,999). To do so, we decide to select 4 crates at random, and from each of those, select 5 widgets to measure.

(a) One (not so smart) way to try to estimate the population variance is to simply compute the sample variance of the 20 widget diameters we end up with. Find the expected value of this estimator under two different scenarios: 1st where each of the 100 crates contains 250 widgets of diameter 5 and 250 widgets with diameter 7, and then 2nd where each crate contains widgets of only one diameter. What, in general terms, does this suggest about when the naive sample variance will produce decent estimates of the population variance?

(b) Give the formula for an estimator of the population variance that is unbiased (i.e. has expected value equal to the population variance).
4.7. Consider the data of Table 5.8 in V&J and the use of the hierarchical normal random effects model to describe their generation.

(a) Find point estimates of the parameters $\sigma_\alpha^2$ and $\sigma^2$ based first on ranges and then on ANOVA mean squares.
Table 6.3: Data for Problem 4.5

         Day   k = 1   k = 2   $\bar{y}_{ij}$   $s^2_{ij}$   $\bar{y}_{i\cdot}$   $s^2_{Bi}$
         M     15.5    14.9    15.2    .18
         T     15.2    15.2    15.2    0
Week 1   W     14.2    14.2    14.2    0        15.0    .605
         R     14.3    14.3    14.3    0
         F     15.8    16.4    16.1    .18
         M      6.2     7.0     6.6    .32
         T      7.2     8.4     7.8    .72
Week 2   W      6.6     7.8     7.2    .72       7.0    .275
         R      6.2     7.6     6.9    .98
         F      5.6     7.4     6.5    1.62
         M     15.4    14.4    14.9    .50
         T     13.9    13.3    13.6    .18
Week 3   W     13.4    14.8    14.1    .98      14.0    .370
         R     12.5    14.1    13.3    1.28
         F     13.2    15.0    14.1    1.62
         M     10.9    11.3    11.1    .08
         T     12.5    12.7    12.6    .02
Week 4   W     12.3    11.7    12.0    .18      12.0    .515
         R     11.0    12.0    11.5    .50
         F     12.3    13.3    12.8    .50
         M      7.5     6.7     7.1    .32
         T      6.7     7.3     7.0    .18
Week 5   W      7.2     6.0     6.6    .72       7.0    .155
         R      7.6     7.6     7.6    0
         F      6.3     7.1     6.7    .32
(b) Find a standard error for your ANOVA-based estimator of $\sigma_\alpha^2$ from (a).

(c) Use the material in Section 1.5 and make a 90% two-sided confidence interval for $\sigma_\alpha^2$.
4.8. All of the variance component estimation material presented in the text is based on balanced data assumptions. As it turns out, it is quite possible to do point estimation (based on sample variances) from even unbalanced data. A basic fact that enables this is the following: If $X_1, X_2, \ldots, X_n$ are uncorrelated random variables, each with the same mean, then

$$\mathrm{E}s^2 = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Var}\,X_i\;.$$

(Note that the usual fact that for iid $X_i$, $\mathrm{E}s^2 = \sigma^2$, is a special case of this basic fact.)
Consider the (hierarchical) random effects model used in Section 5.5 of the text. In notation similar to that in Section 5.5 (but not assuming that data are balanced), let

$\bar{y}_{ij}$ = the sample mean of data values at level $i$ of A and level $j$ of B within A,

$s^2_{ij}$ = the sample variance of the data values at level $i$ of A and level $j$ of B within A,

$\bar{y}_i$ = the sample mean of the values $\bar{y}_{ij}$ at level $i$ of A,

$s^2_{Bi}$ = the sample variance of the values $\bar{y}_{ij}$ at level $i$ of A,

and

$s^2_A$ = the sample variance of the values $\bar{y}_i$.
Suppose that instead of being furnished with balanced data, one has a data set where 1) there are $I = 2$ levels of A, 2) level 1 of A has $J_1 = 2$ levels of B while level 2 of A has $J_2 = 3$ levels of B, and 3) level 1 of B within level 1 of A has $n_{11} = 2$ levels of C, level 2 of B within level 1 of A has $n_{12} = 4$ levels of C, levels 1 and 2 of B within level 2 of A have $n_{21} = n_{22} = 2$ levels of C and level 3 of B within level 2 of A has $n_{23} = 3$ levels of C.
Evaluate the following: $\mathrm{E}s^2_{\text{pooled}}$, $\mathrm{E}\left[\frac{1}{5}\sum_{i,j} s^2_{ij}\right]$, $\mathrm{E}s^2_{B1}$, $\mathrm{E}s^2_{B2}$, $\mathrm{E}\,\frac{1}{2}\left(s^2_{B1} + s^2_{B2}\right)$, and $\mathrm{E}s^2_A$. Then find linear combinations of $s^2_{\text{pooled}}$, $\frac{1}{2}\left(s^2_{B1} + s^2_{B2}\right)$ and $s^2_A$ that could sensibly be used to estimate $\sigma_\alpha^2$ and $\sigma_\beta^2$.
4.9. Suppose that on $I = 2$ different days (A), $J = 4$ different heats (B) of cast iron are studied, with $K = 3$ tests (C) being made on each. Suppose further that the resulting percent carbon measurements produce $SSA = .0355$, $SSB(A) = .0081$ and $SSC(B(A)) = SSE = .4088$.

(a) If one completely ignores the hierarchical structure of the data set, what "sample variance" is produced? Does this quantity estimate the variance that would be produced if on many different days a single heat was selected and a single test made? Explain carefully! (Find the expected value of the grand sample variance under the hierarchical random effects model and compare it to this variance of single measurements made on a single day.)

(b) Give point estimates of the variance components $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma^2$.

(c) Your estimate of $\sigma_\beta^2$ should involve a linear combination of mean squares. Give the variance of that linear combination in terms of the model parameters and $I$, $J$ and $K$. Use that expression and propose a sensible estimated standard deviation (a standard error) for this linear combination. (See Section 1.4 and Problem 1.9.)
4.10. Consider the one variable/second order version of the propagation of error ideas discussed in Section 5.4 of the text. That is, for a random variable $X$ with mean $\mu$ and standard deviation $\sigma$, and nice function $g$, let $Y = g(X)$ and consider approximating $\mathrm{E}Y$ and $\mathrm{Var}\,Y$. A second order approximation of $g$ made at the point $x = \mu$ is

$$g(x) \approx g(\mu) + g'(\mu)(x - \mu) + \frac{1}{2}g''(\mu)(x - \mu)^2\;.$$

(Note that the approximating quadratic function has the same value, derivative and second derivative as $g$ for the value $x = \mu$.) Let $\mu_3 = \mathrm{E}(X - \mu)^3$ and $\mu_4 = \mathrm{E}(X - \mu)^4$. Based on the above preamble, carefully argue for the appropriateness of the following approximations:

$$\mathrm{E}Y \approx g(\mu) + \frac{1}{2}g''(\mu)\sigma^2$$

and

$$\mathrm{Var}\,Y \approx (g'(\mu))^2\sigma^2 + g'(\mu)g''(\mu)\mu_3 + \frac{1}{4}(g''(\mu))^2(\mu_4 - \sigma^4)\;.$$
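One way to gain confidence in these formulas is to try them on a case where everything is available in closed form: for $g(x) = x^2$ the quadratic "approximation" is $g$ itself, so with $X$ normal the formulas should reproduce the exact mean and variance of $X^2$.

```python
from math import isclose

# Sanity check of the second-order propagation-of-error formulas for
# X ~ N(mu, sigma^2) and g(x) = x^2.  Normal central moments:
# mu3 = 0 and mu4 = 3 sigma^4.
mu, sigma = 2.0, 0.5
mu3, mu4 = 0.0, 3.0 * sigma ** 4

g, g1, g2 = mu ** 2, 2 * mu, 2.0        # g(mu), g'(mu), g''(mu)

EY_approx = g + 0.5 * g2 * sigma ** 2
VarY_approx = (g1 ** 2) * sigma ** 2 + g1 * g2 * mu3 \
    + 0.25 * (g2 ** 2) * (mu4 - sigma ** 4)

EY_exact = mu ** 2 + sigma ** 2                         # E X^2
VarY_exact = 4 * mu ** 2 * sigma ** 2 + 2 * sigma ** 4  # Var X^2, X normal

print("E Y:  ", EY_approx, "vs exact", EY_exact)
print("Var Y:", VarY_approx, "vs exact", VarY_exact)
```

Since $g$ is exactly quadratic here, both approximations agree with the exact values, which is what the argument requested in the problem should predict.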
4.11. (Vander Wiel) A certain RCL network involving 2 resistors, 2 capacitors and a single inductor has a dynamic response characterized by the transfer function

$$\frac{V_{\text{out}}}{V_{\text{in}}}(s) = \frac{s^2 + 2\zeta_1\omega_1 s + \omega_1^2}{s^2 + 2\zeta_2\omega_2 s + \omega_2^2}\;,$$
where

$$\omega_1 = (C_2 L)^{-1/2}\;, \qquad \omega_2 = \left(\frac{C_1 + C_2}{LC_1C_2}\right)^{1/2}\;,$$

$$\zeta_1 = \frac{R_2}{2L\omega_1}\;, \qquad \zeta_2 = \frac{R_1 + R_2}{2L\omega_2}\;.$$

$R_1$ and $R_2$ are the resistances involved in ohms, $C_1$ and $C_2$ are the capacitances in Farads, and $L$ is the value of the inductance in Henries.
Standard circuit theory says that $\omega_1$ and $\omega_2$ are the natural frequencies of this network,

$$\omega_1^2/\omega_2^2 = C_1/(C_1 + C_2)$$

is the DC gain, and $\zeta_1$ and $\zeta_2$ determine whether the zeros and poles are real or complex. Suppose that the circuit in question is to be mass produced using components with the following characteristics:
$\mathrm{E}C_1 = \frac{1}{399}$ F $\qquad \mathrm{Var}\,C_1 = \left(\frac{1}{3990}\right)^2$

$\mathrm{E}R_1 = 38$ ohms $\qquad \mathrm{Var}\,R_1 = (3.8)^2$

$\mathrm{E}C_2 = \frac{1}{2}$ F $\qquad \mathrm{Var}\,C_2 = \left(\frac{1}{20}\right)^2$

$\mathrm{E}R_2 = 2$ ohms $\qquad \mathrm{Var}\,R_2 = (.2)^2$

$\mathrm{E}L = 1$ H $\qquad \mathrm{Var}\,L = (.1)^2$

Treat $C_1$, $R_1$, $C_2$, $R_2$ and $L$ as independent random variables and use the propagation of error approximations to do the following:
(a) Approximate the mean and standard deviation of the DC gains of the manufactured circuits.

(b) Approximate the mean and standard deviation of the natural frequency $\omega_2$.
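For part (a), a first-order (delta-method) version of the propagation of error computation for the DC gain $C_1/(C_1 + C_2)$ looks as follows; the second-order refinements of Problem 4.10 are not included in this sketch.

```python
from math import sqrt

# First-order propagation of error for the DC gain G = C1 / (C1 + C2),
# with partial derivatives evaluated at the component means.
EC1, VC1 = 1 / 399, (1 / 3990) ** 2
EC2, VC2 = 1 / 2, (1 / 20) ** 2

den = (EC1 + EC2) ** 2
dG_dC1 = EC2 / den              # dG/dC1 at the means
dG_dC2 = -EC1 / den             # dG/dC2 at the means

EG = EC1 / (EC1 + EC2)
sdG = sqrt(dG_dC1 ** 2 * VC1 + dG_dC2 ** 2 * VC2)
print(f"DC gain: mean ~ {EG:.6f}, standard deviation ~ {sdG:.2e}")
```

The same pattern (partials at the means, variances combined in quadrature) carries over to $\omega_2$ in part (b), with three partial derivatives instead of two.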
Now suppose that you are designing such an RCL circuit. To simplify things, use the capacitors and the inductor described above. You may choose the resistors, but their quality will be such that

$$\mathrm{Var}\,R_1 = (\mathrm{E}R_1/10)^2 \quad \text{and} \quad \mathrm{Var}\,R_2 = (\mathrm{E}R_2/10)^2\;.$$

Your design goals are that $\zeta_2$ should be (approximately) .5, and subject to this constraint, $\mathrm{Var}\,\zeta_2$ be minimum.
(c) What values of ER
1
and ER
2
satisfy (approximately) the design
goals, and what is the resulting (approximate) standard deviation
of
2
?
(Hint for part (c): The rst design goal allows one to write ER
2
as a
function of ER
1
. To satisfy the second design goal, use the propagation of
error idea to write the (approximate) variance of
2
as a function of ER
1
only. By the way, the rst design goal allows you to conclude that none
of the partial derivatives needed in the propagation of error work depend
on your choice of ER
1
.)
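One way to carry out the hint numerically is sketched below (the grid, step sizes and numerical differentiation are illustrative choices of mine; the sketch fixes Eζ₂ = .5 by solving for ER₂ in terms of ER₁ and then searches over ER₁):

```python
import math

# Grid search for the ER1 minimizing the delta-method Var(zeta_2),
# subject to nominal zeta_2 = .5 (component moments as stated above;
# Var R_i = (ER_i/10)^2 per the stated quality constraint).
EC1, EC2, EL = 1/399, 1/2, 1.0
VC1, VC2, VL = (1/3990)**2, (1/20)**2, 0.1**2

def zeta2(r1, r2, c1, c2, L):
    w2 = math.sqrt((c1 + c2) / (L * c1 * c2))
    return (r1 + r2) / (2 * L * w2)

def var_zeta2(er1):
    # choose ER2 so the nominal zeta_2 is .5
    w2 = math.sqrt((EC1 + EC2) / (EL * EC1 * EC2))
    er2 = 2 * EL * w2 * 0.5 - er1
    means = [er1, er2, EC1, EC2, EL]
    variances = [(er1 / 10)**2, (er2 / 10)**2, VC1, VC2, VL]
    h, var = 1e-7, 0.0
    for i in range(5):                      # numerical partials
        up = list(means); up[i] += h
        dn = list(means); dn[i] -= h
        var += ((zeta2(*up) - zeta2(*dn)) / (2 * h))**2 * variances[i]
    return var

grid = [0.5 + 0.1 * i for i in range(195)]  # ER1 from 0.5 to 19.9
best = min(grid, key=var_zeta2)
```

Since the two resistance partials of ζ₂ are equal and ER₁ + ER₂ is pinned down by the constraint, the search unsurprisingly favors splitting the total resistance evenly between the two resistors.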
4.12. Manufacturers wish to produce autos with attractive fit and finish, part
of which consists of uniform (and small) gaps between adjacent pieces
of sheet metal (like, e.g., doors and their corresponding frames). The
accompanying figure is an idealized schematic of a situation of this kind,
where we (at least temporarily) assume that edges of both a door and
its frame are linear. (The coordinate system on this diagram is pictured
as if its axes are vertical and horizontal. But the line on the body
need not be an exactly vertical line, and whatever this line's intended
orientation relative to the ground, it is used to establish the coordinate
system as indicated on the diagram.)
On the figure, we are concerned with gaps g₁ and g₂. The first is at the
level of the top hinge of the door and the second is d units below that
level in the body coordinate system (d units down the door frame line
from the initial measurement). People manufacturing the car body are
responsible for the dimension w. People stamping the doors are responsible for the angles θ₁ and θ₂ and the dimension y. People welding the top
door hinge to the door are responsible for the dimension x. And people
hanging the door on the car are responsible for the angle φ. The quantities
x, y, w, φ, θ₁ and θ₂ are measurable and can be used in manufacturing to
[Figure 6.1: Figure for Problem 4.12 — schematic of the door and the line of the body door frame, showing the top hinge at the origin of the body coordinate system, the labeled points p, q, s, u (and edge points r, t), the dimensions x, y, w and d, the angles φ, θ₁ and θ₂, and the gaps g₁ and g₂.]
verify that the various folks are doing their jobs. A door design engineer
has to set nominal values for and produce tolerances for variation in these
quantities. This problem is concerned with how the propagation of errors
method might help in this tolerancing enterprise, through an analysis of
how variation in x, y, w, φ, θ₁ and θ₂ propagates to g₁, g₂ and g₁ − g₂.
If I have correctly done my geometry/trigonometry, the following relationships hold for labeled points on the diagram:

p = (−x sin φ, x cos φ),

q = p + (y cos(φ + θ₁ − π/2), y sin(φ + θ₁ − π/2)),

s = (q₁ + q₂ tan(φ + θ₁ + θ₂ − π), 0)

and

u = (q₁ + (q₂ + d) tan(φ + θ₁ + θ₂ − π), d).

Then for the idealized problem here (with perfectly linear edges) we have

g₁ = w − s₁

and

g₂ = w − u₁.
Actually, in an attempt to allow for the notion of form error in the ideally
linear edges, one might propose that at a given distance below the origin
of the body coordinate system the realized edge of a real geometry is its
nominal position plus a form error. Then instead of dealing with g₁ and
g₂, one might consider the gaps

g₁* = g₁ + δ₁ − δ₂

and

g₂* = g₂ + δ₃ − δ₄,

for body form errors δ₁ and δ₃ and door form errors δ₂ and δ₄. (The interpretation of additive form errors around the line of the body door
frame is perhaps fairly clear, since the error at a given level is measured
perpendicular to the body line and is thus well-defined for a given realized body geometry. The interpretation of an additive error on the right
side door line is not so clear, since in general one will not be measuring
perpendicular to the line of the door, or even at any consistent angle with
it. So for a realized geometry, what form error to associate with a given
point on the ideal line or exactly how to model it is not completely clear.
We'll ignore this logical problem and proceed using the models above.)
We'll use d = 40 cm, and below are two possible sets of nominal values
for the parameters of the door assembly:
Design A:
x = 20 cm    y = 90 cm    w = 90.4 cm
φ = 0        θ₁ = π/2     θ₂ = π/2

Design B:
x = 20 cm    y = 90 cm    w = (90 cos(π/10) + .4) cm
φ = π/10     θ₁ = π/2     θ₂ = 4π/10
Partial derivatives of g₁ and g₂ (evaluated at the design nominal values of
x, y, w, φ, θ₁ and θ₂) are:
Design A:
∂g₁/∂x = 0     ∂g₁/∂y = −1     ∂g₁/∂w = 1
∂g₁/∂φ = 0     ∂g₁/∂θ₁ = −20   ∂g₁/∂θ₂ = −20
∂g₂/∂x = 0     ∂g₂/∂y = −1     ∂g₂/∂w = 1
∂g₂/∂φ = −40   ∂g₂/∂θ₁ = −60   ∂g₂/∂θ₂ = −60
Design B:
∂g₁/∂x = .309   ∂g₁/∂y = −.951   ∂g₁/∂w = 1
∂g₁/∂φ = 0      ∂g₁/∂θ₁ = −19.021   ∂g₁/∂θ₂ = −46.833
∂g₂/∂x = .309   ∂g₂/∂y = −.951   ∂g₂/∂w = 1
∂g₂/∂φ = −40    ∂g₂/∂θ₁ = −59.02    ∂g₂/∂θ₂ = −86.833
(a) Suppose that a door engineer must eventually produce tolerances for
x, y, w, φ, θ₁ and θ₂ that are consistent with ±.1 cm tolerances on
g₁ and g₂. If we interpret ±.1 cm tolerances to mean that σ_g1 and σ_g2
are no more than .033 cm, consider the set of sigmas

σ_x = .01 cm     σ_y = .01 cm      σ_w = .01 cm
σ_φ = .001 rad   σ_θ1 = .001 rad   σ_θ2 = .001 rad
First for Design A and then for Design B, investigate whether this
set of sigmas is consistent with the necessary final tolerances on g₁
and g₂ in two different ways. Make propagation of error approximations to σ_g1 and σ_g2. Then simulate 100 values of both g₁ and g₂
using independent normal random variables x, y, w, φ, θ₁ and θ₂ with
means equal to the design nominals and these standard deviations.
(Compute the sample standard deviations of the simulated values
and compare to the .033 cm target.)
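Both calculations can be sketched in a few lines (shown here for Design A only; the tabled partial derivative magnitudes drive the delta-method part, and the geometry relationships as I read them drive the simulation, so treat the details as assumptions of mine):

```python
import math
import random

# Delta-method sd's of g1 and g2 for Design A, then a 100-draw
# simulation of g1 from the stated geometry at the Design A nominals.
sig = {"x": .01, "y": .01, "w": .01, "phi": .001, "th1": .001, "th2": .001}
d_g1 = {"x": 0, "y": 1, "w": 1, "phi": 0, "th1": 20, "th2": 20}   # magnitudes;
d_g2 = {"x": 0, "y": 1, "w": 1, "phi": 40, "th1": 60, "th2": 60}  # signs do not matter here

def poe_sd(partials):
    return math.sqrt(sum((partials[k] * sig[k]) ** 2 for k in sig))

sd_g1, sd_g2 = poe_sd(d_g1), poe_sd(d_g2)   # compare to the .033 cm target

random.seed(1)
def g1_draw():
    x = random.gauss(20, sig["x"]);   y = random.gauss(90, sig["y"])
    w = random.gauss(90.4, sig["w"]); phi = random.gauss(0, sig["phi"])
    th1 = random.gauss(math.pi / 2, sig["th1"])
    th2 = random.gauss(math.pi / 2, sig["th2"])
    q1 = -x * math.sin(phi) + y * math.cos(phi + th1 - math.pi / 2)
    q2 = x * math.cos(phi) + y * math.sin(phi + th1 - math.pi / 2)
    s1 = q1 + q2 * math.tan(phi + th1 + th2 - math.pi)
    return w - s1

sims = [g1_draw() for _ in range(100)]
mean_sim = sum(sims) / 100
sd_sim = math.sqrt(sum((g - mean_sim) ** 2 for g in sims) / 99)
```

With these sigmas the Design A delta-method value σ_g1 ≈ .032 cm just meets the target while σ_g2 ≈ .095 cm does not.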
(b) One of the assumptions standing behind the propagation of error
approximations is the independence of the input random variables.
Briefly discuss why independence of the variables θ₁ and θ₂ may not
be such a great model assumption in this problem.
(c) Notice that for Design A the propagation of error formula predicts
that variation on the dimension x will not much affect the gaps
presently of interest, g₁ and g₂, while the situation is different for
Design B. Argue, based on the nominal geometries, that this makes
perfectly good sense. For Design A, one might say that the gaps g₁
and g₂ are robust to variation in x. For this design, do you think
that the entire fit of the door to the body of the car is going to be
robust to variation in x? Explain.
(Note, by the way, that the fact that ∂g₁/∂φ = 0 for Design A also makes
this design look completely robust to variation in φ in terms of the
gap g₁, at least by standards of the propagation of error formula.
But the situation for this variable is somewhat different than for x.
This partial derivative is equal to 0 because for y, w, θ₁ and θ₂ at
their nominal values, g₁ considered as a function of φ alone has a
local minimum at φ = 0. This is different from g₁ being constant
in φ. A more refined "second order" propagation of error analysis
of this problem, that essentially begins from a quadratic approximation to g₁ instead of a linear one, would distinguish between these
two possibilities. But the first order analysis done on the basis of
formula (5.27) of the text is often helpful and adequate for practical
purposes.)
(d) What does the propagation of error formula predict for variation in
the difference g₁ − g₂, first for Design A, and then for Design B?
(e) Suppose that one desires to take into account the possibility of form
errors affecting the gaps, and thus considers analysis of g₁* and g₂*
instead of g₁ and g₂. If standard deviations for the δ variables are
all .001 cm, what does the propagation of error analysis predict for
variability in g₁* and g₂* for Design A?
4.13. The electrical resistivity, ρ, of a wire is a property of the material involved
and the temperature at which it is measured. At a given temperature, if
a cylindrical piece of wire of length L and (constant) cross-sectional area
A has resistance R, then the material's resistivity is calculated as

ρ = RA/L.

In a lab exercise intended to determine the resistivity of copper at 20 °C,
students measure the length, diameter and resistance of a wire assumed
to have circular cross-sections. Suppose the length is approximately 1
meter, the diameter is approximately 2.0 × 10⁻³ meters and the resistance
is approximately .54 × 10⁻² Ω. Suppose further that the precisions of the
measuring equipment used in the lab are such that standard deviations
σ_L = 10⁻³ meter, σ_D = 10⁻⁴ meter and σ_R = 10⁻⁴ Ω are appropriate.
(a) Find an approximate standard deviation that might be used to describe the precision associated with an experimentally derived value
of ρ.
(b) Imprecision in which of the measurements appears to be the biggest
contributor to imprecision in experimentally determined values of ρ?
(Explain.)
(c) One should probably expect the approximate standard deviation derived here to under-predict the kind of variation that would actually
be observed in such lab exercises over a period of years. Explain why
this is so.
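The delta-method arithmetic behind (a) and (b) can be sketched with analytic partials of ρ = R(πd²/4)/L (the variance decomposition by input variable is what answers (b); variable names are mine):

```python
import math

# Delta-method breakdown of Var(rho) for rho = R * (pi d^2 / 4) / L,
# using the approximate values and measurement sigmas stated above.
L0, d0, R0 = 1.0, 2.0e-3, 0.54e-2
sL, sD, sR = 1e-3, 1e-4, 1e-4

rho0 = R0 * math.pi * d0**2 / (4 * L0)
# analytic partials: d(rho)/dL = -rho/L, d(rho)/dd = 2 rho/d, d(rho)/dR = rho/R
contrib = {
    "L": (rho0 / L0 * sL) ** 2,
    "d": (2 * rho0 / d0 * sD) ** 2,
    "R": (rho0 / R0 * sR) ** 2,
}
sd_rho = math.sqrt(sum(contrib.values()))
biggest = max(contrib, key=contrib.get)
```

Writing each contribution as a squared relative error ((sL/L)², (2sD/d)², (sR/R)²) makes it easy to see which measurement dominates.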
4.14. A bullet is fired horizontally into a block (of much larger mass) suspended
by a long cord, and the impact causes the block and embedded bullet to
swing upward a distance d measured vertically from the block's lowest
position. The laws of mechanics can be invoked to argue that if d is measured in feet, and before testing the block weighs w₁, while the block and
embedded bullet together weigh w₂ (in the same units), then the velocity
(in fps) of the bullet just before impact with the block is approximately

v = (w₂/(w₂ − w₁)) √(64.4 d).

Suppose that the bullet involved weighs about .05 lb, the block involved
weighs about 10.00 lb and that both w₁ and w₂ can be determined with
a standard deviation of about .005 lb. Suppose further that the distance
d is about .50 ft, and can be determined with a standard deviation of .03
ft.
(a) Compute an approximate standard deviation describing the uncertainty in an experimentally derived value of v.
(b) Would you say that the uncertainties in the weights contribute more
to the uncertainty in v than the uncertainty in the distance? Explain.
(c) Say why one should probably think of calculations like those in part
(a) as only providing some kind of approximate lower bound on the
uncertainty that should be associated with the bullet's velocity.
4.15. On page 243 of V&J there is an ANOVA table for a balanced hierarchical
data set. Use it in what follows.
(a) Find standard errors for the usual ANOVA estimates of σ_α² and σ²
(the casting and analysis variance components).
(b) If you were to later make 100 castings, cut 4 specimens from each
of these and make a single lab analysis on each specimen, give a
(numerical) prediction of the overall sample variance of these future
400 measurements (based on the hierarchical random effects model
and the ANOVA estimates of σ_α², σ_β² and σ²).
5 Sampling Inspection

Methods

5.1. Consider attributes "single sampling."
(a) Make "type A" OC curves for N = 20, n = 5 and c = 0 and 1, for both
"percent defective" and "mean defects per unit" situations.
(b) Make "type B" OC curves for n = 5, c = 0, 1 and 2 for both "percent
defective" and "mean defects per unit" situations.
(c) Use the imperfect inspection analysis presented in §5.2 and find OC
bands for the "percent defective" cases above with c = 1 under the
assumption that w_D ≤ .1 and w_G ≤ .1.
5.2. Consider single sampling for "percent defective."
(a) Make approximate OC curves for n = 100, c = 1; n = 200, c = 2;
and n = 300, c = 3.
(b) Make AOQ and ATI curves for a rectifying inspection scheme using
a plan with n = 200 and c = 2 for lots of size N = 10,000. What is
the AOQL?
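The type B curves in (b) take only a few lines of code (a sketch; the binomial model and the convention AOQ ≈ Pa·p·(N − n)/N for rectified lots are the assumptions here):

```python
from math import comb

# Rectifying inspection with n = 200, c = 2, N = 10,000 under the
# type B (binomial) model; rejected lots are 100% inspected.
N, n, c = 10_000, 200, 2

def Pa(p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def AOQ(p):
    return Pa(p) * p * (N - n) / N

def ATI(p):
    return n + (1 - Pa(p)) * (N - n)

ps = [i / 10_000 for i in range(1, 500)]   # p from .0001 to .0499
aoql = max(AOQ(p) for p in ps)             # crude grid maximization
```

Plotting AOQ(p) and ATI(p) over the same grid reproduces the requested curves; the grid maximum of AOQ is the (approximate) AOQL.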
5.3. Find attributes single sampling plans (i.e. find n and c) having approximately
(a) Pa = .95 if p = .01 and Pa = .10 if p = .03.
(b) Pa = .95 if p = 10⁻⁶ and Pa = .10 if p = 3 × 10⁻⁶.
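For part (a) a brute-force search is perfectly feasible (a sketch; I treat the two stated Pa values as one-sided requirements, which is one common convention):

```python
from math import comb

# Smallest-n attributes single sampling plan with
# Pa >= .95 at p1 = .01 and Pa <= .10 at p2 = .03.

def Pa(n, c, p):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def find_plan(p1, pa1, p2, pa2, n_max=2000):
    for n in range(1, n_max + 1):
        for c in range(0, n + 1):
            if Pa(n, c, p1) >= pa1 and Pa(n, c, p2) <= pa2:
                return n, c
            if Pa(n, c, p2) > pa2:     # larger c only raises Pa at p2
                break
    return None

plan = find_plan(.01, .95, .03, .10)
```

The same search applied to the tiny fractions defective in (b) would require n in the millions, which is part of that problem's point.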
5.4. Consider a (truncated sequential) attributes acceptance sampling plan,
that for

X_n = the number of defective items found through the nth item inspected

rejects the lot if it ever happens that X_n ≥ 1.5 + .5n, accepts the lot if it
ever happens that X_n ≤ −1.5 + .5n, and further never samples more than
11 items. We will suppose that if sampling were extended to n = 11, we
would accept for X₁₁ = 4 or 5 and reject for X₁₁ = 6 or 7 and thus note
that sampling can be curtailed at n = 10 if X₁₀ = 4 or 6.
(a) Find expressions for the OC and ASN for this plan.
(b) Find formulas for the AOQ and ATI of this plan, if it is used in a
rectifying inspection scheme for lots of size N = 100.
5.5. Consider single sampling based on a normally distributed variable.
(a) Find a single limit variables sampling plan with L = 1.000, σ = .015,
p₁ = .03, Pa₁ = .95, p₂ = .10 and Pa₂ = .10. Sketch the OC curve
of this plan. How does n compare with what would be required for
an attributes sampling plan with a comparable OC curve?
(b) Find a double limits variables sampling plan with L = .49, U = .51,
σ = .004, p₁ = .03, Pa₁ = .95, p₂ = .10 and Pa₂ = .10. Sketch
the OC curve of this plan. How does n compare with what would
be required for an attributes sampling plan with a comparable OC
curve?
(c) Use the Wallis approximation and find a single limit variables sampling plan for L = 1.000, p₁ = .03, Pa₁ = .95, p₂ = .10 and
Pa₂ = .10. Sketch an approximate OC curve for this plan.
5.6. In contrast to what you found in Problem 5.3(b), make use of the fact that
the upper 10⁻⁶ point of the standard normal distribution is about 4.753,
while the upper 3 × 10⁻⁶ point is about 4.526 and find the n required for a
known σ single limit variables acceptance sampling plan to have Pa = .95
if p = 10⁻⁶ and Pa = .10 if p = 3 × 10⁻⁶. What is the "Achilles heel" (fatal
weakness) of these calculations?
5.7. Consider the CSP-1 plan with i = 100 and f = .02. Make AFI and AOQ
plots for this plan and find the AOQL for both cases where defectives are
rectified and where they are culled.
5.8. Consider the classical problem of acceptance sampling plan design. Suppose that one wants plans whose OC drops near p = .03 (wants Pa ≈ .5
for p = .03) and also wants p = .04 to have Pa ≤ .05.
(a) Design an attributes single sampling plan approximately meeting the
above criteria.
Suppose that in fact "nonconforming" is defined in terms of a measured variable, X, being less than a lower specification L = 13, and
that it is sensible to use a normal model for X.
(b) Design a known σ variables plan for the above criteria if σ = 1.
(c) Design an unknown σ variables plan for the above criteria.
Theory
5.9. Consider variables acceptance sampling based on exponentially distributed
observations, supposing that there is a single lower limit L = .2107.
(a) Find means corresponding to fractions defective p = .10 and p = .19.
(b) Use the Central Limit Theorem to find a number k and sample size
n so that an acceptance sampling plan that rejects a lot if x̄ < k has
Pa = .95 for p = .10 and Pa = .10 for p = .19.
(c) Sketch an OC curve for your plan from (b).
5.10. Consider the situation of a consumer who will repeatedly receive lots of
1500 assemblies. These assemblies may be tested at a cost of $24 apiece
or simply be put directly into a production stream with a later extra
manufacturing cost of $780 occurring for each defective that is undetected
because it was not tested. We'll assume that the supplier replaces any
assembly found to be defective (either at the testing stage or later when the
extra $780 cost occurs) with a guaranteed good assembly at no additional
cost to the consumer. Suppose further that the producer of the assemblies
has agreed to establish statistical control with p = .02.
(a) Adopt "perspective B" with p known to be .02 and compare the mean
per-lot costs of the following 3 policies:
i. test the whole lot,
ii. test none of the lot, and
iii. go to Mil. Std. 105D with AQL = .025 and adopt an inspection
level II, normal inspection single sampling plan (i.e. n = 125
and c = 7), doing 100% inspection of rejected lots. (This, by the
way, is not a recommended use of the standard. It is designed
to guarantee a consumer the desired AQL only when all the
switching rules are employed. I'm abusing the standard.)
(b) Adopt the point of view that in the short term, "perspective B" may
be appropriate, but that over the long term the supplier's p vacillates
between .02 and .04. In fact, suppose that for successive lots the

p_i = perspective B p at the time lot i is produced

are independent random variables, with P[p_i = .02] = P[p_i = .04] =
.5. Now compare the mean costs of policies i), ii) and iii) from (a)
used repeatedly.
(c) Suppose that the scenario in (b) is modified by the fact that the
consumer gets control charts from the supplier in time to determine
whether for a given lot, "perspective B" with p = .02 or p = .04 is
appropriate. What should the consumer's inspection policy be, and
what is its mean cost of application?
5.11. Suppose that the fractions defective in successive large lots of fixed size N
can be modeled as iid Beta(α, β) random variables with α = 1 and β = 9.
Suppose that these lots are subjected to attributes acceptance sampling,
using n = 100 and c = 1. Find the conditional distribution of p given that
the lot is accepted. Sketch probability densities for both the original Beta
distribution and this conditional distribution of p given lot acceptance.
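The conditional density is proportional to (prior density) × Pa(p), and a quick numerical normalization shows how acceptance shifts mass toward small p (a sketch; the grid size is an arbitrary choice of mine):

```python
# Density of p given acceptance is proportional to prior(p) * Pa(p)
# for the Beta(1, 9) prior and the n = 100, c = 1 plan.
def prior(p):            # Beta(1, 9) density
    return 9 * (1 - p) ** 8

def Pa(p):               # binomial acceptance probability, c = 1
    return (1 - p) ** 100 + 100 * p * (1 - p) ** 99

grid = [i / 2000 for i in range(1, 2000)]
w = [prior(p) * Pa(p) for p in grid]
norm = sum(w) / len(w)                                    # ~ marginal P(accept)
mean_prior = sum(p * prior(p) for p in grid) / len(grid)  # ~ .1
mean_post = sum(p * wi for p, wi in zip(grid, w)) / sum(w)
```

Plotting prior(p) and prior(p)·Pa(p)/norm on the same axes gives the two requested densities.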
5.12. Consider the following variation on the "Deming Inspection Problem" discussed in §5.3. Each item in an incoming lot of size N will be Good (G),
Marginal (M) or Defective (D). Some form of (single) sampling inspection
is contemplated based on counts of G's, D's and M's. There will be a
per-item inspection cost of k₁ for any item inspected, while any M's going uninspected will eventually produce a cost of k₂, and any D's going
uninspected will produce a cost of k₃ > k₂. Adopt "perspective B," i.e. that
any given incoming lot was produced under some set of stable conditions,
characterized here by probabilities p_G, p_M and p_D that any given item in
that lot is respectively G, M or D.
(a) Argue carefully that the "All or None" criterion is in force here and
identify the condition on the p's under which "All" is optimal and
the condition under which "None" is optimal.
(b) If p_G, p_M and p_D are not known, but rather are described by a joint
probability distribution, n other than N or 0 can turn out to be optimal. A particularly convenient distribution to use in describing the
p's is the Dirichlet distribution (it is the multivariate generalization
of the Beta distribution for variables that must add up to 1). For a
Dirichlet distribution with parameters α_G > 0, α_M > 0 and α_D > 0,
it turns out that if X_G, X_M and X_D are the counts of G's, M's and
D's in a sample of n items, then

E[p_G | X_G, X_M, X_D] = (α_G + X_G)/(α_G + α_M + α_D + n),

E[p_M | X_G, X_M, X_D] = (α_M + X_M)/(α_G + α_M + α_D + n)

and

E[p_D | X_G, X_M, X_D] = (α_D + X_D)/(α_G + α_M + α_D + n).

Use these expressions and describe what an optimal lot disposal (acceptance or rejection) is, if a Dirichlet distribution is used to describe
the p's and a sample of n items yields counts X_G, X_M and X_D.
5.13. Consider the "Deming Inspection Problem" exactly as discussed in §5.3.
Suppose that k₁ = $50, k₂ = $500, N = 200 and one's a priori beliefs
are such that one would describe p with a (Beta) distribution with mean
.1 and standard deviation .090453. For what values of n are respectively
c = 0, 1 and 2 optimal? If you are brave (and either have a pretty good
calculator or are fairly quick with computing) compute the expected total
costs associated with these values of n (obtained using the corresponding
c_opt(n)). From these calculations, what (n, c) pair appears to be optimal?
5.14. Consider the problem of estimating the "process fraction defective" based
on the results of an inverse sampling plan that samples until 2 defective
items have been found. Find the UMVUE of p in terms of the random
variable n = the number of items required to find the second defective.
Show directly that this estimator of p is unbiased (i.e. has expected value
equal to p). Write out a series giving the variance of this estimator.
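A quick numeric check of the unbiasedness claim (assuming, as is standard for this plan, that the UMVUE turns out to be 1/(n − 1); the truncation of the series at n = 5000 is numerically negligible for the p used):

```python
# E[1/(n-1)] should equal p when n is the trial on which the 2nd
# defective appears: P(n) = (n-1) p^2 (1-p)^(n-2), n = 2, 3, ...
p = 0.2
mean = 0.0
var = 0.0
for n in range(2, 5001):
    prob = (n - 1) * p**2 * (1 - p) ** (n - 2)
    mean += prob / (n - 1)                 # estimator value is 1/(n-1)
    var += prob * (1 / (n - 1) - p) ** 2   # partial sums of the variance series
```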
5.15. The paper "The Economics of Sampling Inspection" by Bernard Smith
(that appeared in Industrial Quality Control in 1965 and is based on earlier
theoretical work of Guthrie and Johns) gives a closed form expression for
an approximately optimal n in the Deming inspection problem for cases
where p has a Beta(α, β) "prior" distribution and both α and β are integers.
Smith says

n_opt ≈ sqrt[ N·B(α, β)·p₀^α (1 − p₀)^β / ( p₀·Bi(α | α + β + 1, p₀)/β + Bi(α + 1 | α + β, p₀)/α ) ]

for p₀ = k₁/k₂ the break-even quantity, B(α, β) the usual beta function and
Bi(x | n, p) the probability that a binomial (n, p) random variable takes a
value of x or more. Suppose that k₁ = $50, k₂ = $500, N = 200 and our a
priori beliefs about p (or the "process curve") are such that it is sensible
to describe p as having mean .1 and standard deviation .090453. What
fixed n inspection plan follows from the Smith formula?
5.16. Consider the Deming inspection scenario as discussed in §5.3. Suppose
that N = 3, k₁ = 1.5, k₂ = 10 and a prior distribution G assigns P[p =
.1] = .5 and P[p = .2] = .5. Find the optimal fixed n inspection plan by
doing the following.
(a) For sample sizes n = 1 and n = 2, determine the corresponding
optimal acceptance numbers, c_G^opt(n).
(b) For sample sizes n = 0, 1, 2 and 3 find the expected total costs
associated with those sample sizes if corresponding best acceptance
numbers are used.
5.17. Consider the Deming inspection scenario once again. With N = 100,
k₁ = 1 and k₂ = 10, write out the fixed p expected total cost associated
with a particular choice of n and c. Note that "None" is optimal for p < .1
and "All" is optimal for p > .1. So, in some sense, what is exactly optimal
is highly discontinuous in p. On the other hand, if p is near .1, it doesn't
matter much what inspection plan one adopts, "All," "None" or anything
else for that matter. To see this, write out as a function of p

[worst possible expected total cost(p) − best possible expected total cost(p)] / best possible expected total cost(p).

How big can this quantity get, e.g., on the interval [.09, .11]?
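A quick look at this relative penalty near the break-even point, comparing just the "All" and "None" plans (an assumption of mine: the best and worst costs in the display are over all plans, but at these p values "All" and "None" bracket the extremes):

```python
# Relative cost penalty (worst - best)/best over [.09, .11],
# with N = 100, k1 = 1, k2 = 10.
N, k1, k2 = 100, 1, 10

def cost_all(p):
    return k1 * N          # inspect everything, no escapes

def cost_none(p):
    return k2 * N * p      # pay k2 for each undetected defective

def rel_penalty(p):
    hi = max(cost_all(p), cost_none(p))
    lo = min(cost_all(p), cost_none(p))
    return (hi - lo) / lo

worst = max(rel_penalty(.09 + i * .0001) for i in range(201))
```

The maximum on [.09, .11] occurs at the endpoint p = .09 and is only about 11%, which is the point of the exercise.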
5.18. Consider the following "percent defective" acceptance sampling scheme. One
will sample items one at a time up to a maximum of 8 items. If at any
point in the sampling, half or more of the items inspected are defective,
sampling will cease and the lot will be rejected. If the maximum 8 items
are inspected without rejecting the lot, the lot will be accepted.
(a) Find expressions for the type B Operating Characteristic and the
ASN of this plan.
(b) Find an expression for the type A Operating Characteristic of this
plan if lots of N = 50 items are involved.
(c) Find expressions for the type B AOQ and ATI of this plan for lots
of size N = 50.
(d) What is the (uniformly) minimum variance unbiased estimator of p
for this plan? (Say what value one should estimate for every possible
stop-sampling point.)
5.19. Vardeman argued in §5.3 that if one adopts "perspective B" with known p
and costs are assessed as the sum of identically calculated costs associated
with individual items, either "All" or "None" inspection plans will be
optimal. Consider the following two scenarios (that lack one or the other
of these assumptions) and show that in each the "All or None" paradigm
fails to hold.
(a) Consider the Deming inspection scenario discussed in §5.3, with k₁ =
$1 and k₂ = $100 and suppose lots of N = 5 are involved. Suppose
that one adopts not "perspective B," but instead "perspective A," and
that p is known to be .2 (a lot contains exactly 1 defective). Find the
expected total costs associated with "All" and then with "None" inspection. Then suggest a sequential inspection plan that has smaller
expected total cost than either "All" or "None." (Find the expected
total cost of your suggested plan and verify that it is smaller than
that for both "All" and "None" inspection plans.)
(b) Consider "perspective B" with p known to be .4. Suppose lots of size
N = 5 are involved and costs are assessed as follows. Each inspection costs $1 and defective items are replaced with good items at no
charge. If the lot fails to contain at least one good item (and this
goes undetected) a penalty of $1000 will be incurred, but otherwise
the only costs charged are for inspection. Find the expected total
costs associated with "All" and then with "None" inspection. Then
argue convincingly that there is a better fixed n plan. (Say clearly
what plan is superior and show that its expected total cost is less
than both "All" and "None" inspection.)
5.20. Consider the following nonstandard variables acceptance sampling situation. A supplier has both a high quality/low variance production line
(#1) and a low quality/high variance production line (#2) used to manufacture widgets ordered by Company V. Coded values of a critical dimension of these widgets produced on the high quality line are normally
distributed with μ₁ = 0 and σ₁ = 1, while coded values of this dimension
produced on the low quality line are normally distributed with μ₂ = 0 and
σ₂ = 2. Coded specifications for this dimension are L = −3 and U = 3.
The supplier is known to mix output from the two lines in lots sent to
Company V. As a cost saving measure, this is acceptable to Company V,
provided the fraction of out-of-spec. widgets does not become too large.
Company V expects

α = the proportion of items in a lot coming from the high variance line (#2)

to vary lot to lot and decides to institute a kind of incoming variables
acceptance sampling scheme. What will be done is the following. The
critical dimension, X, will be measured on each of n items sampled from a
lot. For each measurement X, the value Y = X² will be calculated. Then,
for a properly chosen constant, k, the lot will be accepted if Ȳ ≤ k and
rejected if Ȳ > k. The purpose of this problem is to identify suitable n
and k, if Pa ≥ .95 is desired for lots with p = .01 and Pa ≤ .05 is desired
for lots with p = .03.
(a) Find an expression for p (the long run fraction defective) as a function of α. What values of α correspond to p = .01 and p = .03
respectively?
(b) It is possible to show (you need not do so here) that EY = 3α + 1
and Var Y = −9α² + 39α + 2. Use these facts, your answer to (a) and
the Central Limit Theorem to help you identify suitable values of n
and k to use at Company V.
5.21. On what basis is it sensible to criticize the relevance of the calculations
usually employed to characterize the performance of continuous sampling
plans?
5.22. Individual items produced on a manufacturer's line may be graded as
"Good" (G), "Marginal" (M) or "Defective" (D). Under stable process
conditions, each successive item is (independently) G with probability p_G,
M with probability p_M and D with probability p_D, where p_G + p_M + p_D = 1.
Suppose that ultimately, defective items cause three times as much extra
expense as marginal ones.
Based on the kind of cost information alluded to above, one might give
each inspected item a score s according to

s = 3 if the item is D,
    1 if the item is M,
    0 if the item is G.

It is possible to argue (don't bother to do so here) that Es = 3p_D + p_M
and Var s = 9p_D(1 − p_D) + p_M(1 − p_M) − 6p_D p_M.
(a) Give formulas for standards-given Shewhart control limits for average
scores s̄ based on samples of size n. Describe how you would obtain
the information necessary to calculate limits for future control of s̄.
(b) Ultimately, suppose that standard values are set at p_G = .90, p_M =
.07 and p_D = .03 and n = 100 is used for samples of a high volume
product. Use a normal approximation to the distribution of s̄ and
find an approximate ARL for your scheme from part (a) if in fact the
mix of items shifts to where p_G = .85, p_M = .10 and p_D = .05.
(c) Suppose that one decides to use a high side CUSUM scheme to monitor individual scores as they come in one at a time. Consider a
scheme with k₁ = 1 and no head-start that signals the first time that
a CUSUM of scores of at least h₁ = 6 is reached. Set up an appropriate transition matrix and say how you would use that matrix
to find an ARL for this scheme for an arbitrary set of probabilities
(p_G, p_M, p_D).
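The transition-matrix recipe can be sketched directly (an illustrative reading: the CUSUM is S_t = max(0, S_{t−1} + s − 1) with transient states 0 through 5 and a signal at 6 or more; since the scores are integers the state space is exact, and the ARLs solve (I − Q)L = 1, here by plain Gaussian elimination so nothing beyond the standard library is needed):

```python
def arl(pG, pM, pD, h=6):
    """ARL from a zero start for the high side CUSUM of scores 0/1/3 with k1 = 1."""
    Q = [[0.0] * h for _ in range(h)]
    for i in range(h):                   # i = current CUSUM value 0..h-1
        Q[i][max(0, i - 1)] += pG        # score 0: increment -1 (floored at 0)
        Q[i][i] += pM                    # score 1: increment 0
        if i + 2 < h:                    # score 3: increment +2
            Q[i][i + 2] += pD            # else absorbed (signal)
    # solve (I - Q) L = 1 by Gauss-Jordan elimination with partial pivoting
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(h)] + [1.0]
         for i in range(h)]
    for col in range(h):
        piv = max(range(col, h), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(h):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    L = [A[i][h] / A[i][i] for i in range(h)]
    return L[0]                          # no head start

arl_std = arl(.90, .07, .03)
arl_shift = arl(.85, .10, .05)
```

As a check, with pD = 1 the chain marches 0 → 2 → 4 → signal, so the ARL is exactly 3.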
(d) Suppose that inspecting an item costs 1/5th of the extra expense
caused by an undetected marginal item. A plausible (single sampling)
acceptance sampling plan for lots of N = 10,000 of these items then
accepts the lot if

s̄ ≤ .20.

If rejection of the lot will result in 100% inspection of the remainder, consider the ("perspective B") economic choice of sample size
for plans of this form, in particular the comparison of n = 100 and
n = 400 plans. The following table gives some approximate acceptance probabilities for these plans under two sets of probabilities
p = (p_G, p_M, p_D).

                        n = 100    n = 400
p = (.9, .07, .03)      Pa ≈ .76   Pa ≈ .92
p = (.85, .10, .05)     Pa ≈ .24   Pa ≈ .08

Find expected costs for these two plans (n = 100 and n = 400) if
costs are accrued on a per-item and per-inspection basis and "prior"
probabilities of these two sets of process conditions are respectively
.8 for p = (.9, .07, .03) and .2 for p = (.85, .10, .05).
5.23. Consider variables acceptance sampling for a quantity X that has engineering specifications L = 3 and U = 5. We will further suppose that X
has standard deviation σ = .2.
(a) Suppose that X is uniformly distributed with mean μ. That is, suppose that X has probability density

f(x) = 1.4434 if μ − .3464 < x < μ + .3464,
       0 otherwise.

What means μ₁ and μ₂ correspond to fractions defective p₁ = .01
and p₂ = .03?
(b) Find a sample size n and number k such that a variables acceptance
sampling plan that accepts a lot when 4 − k < x̄ < 4 + k and rejects
it otherwise, has Pa₁ ≈ .95 for p₁ = .01 and Pa₂ ≈ .10 for p₂ = .03
when, as in part (a), observations are uniformly distributed with
mean μ and standard deviation σ = .2.
(c) Suppose that one applies your plan from (b), but instead of being
uniformly distributed with mean μ and standard deviation σ = .2,
observations are normal with that mean and standard deviation.
What acceptance probability then accompanies a fraction defective
p₁ = .01?
5.24. A large lot of containers are each full of a solution of several gases. Suppose
that in a given container the fraction of the solution that is gas A can be
described with the probability density

f(x) = (θ + 1)x^θ for x ∈ (0, 1),
       0 otherwise.

For this density, it is possible to show that EX = (θ + 1)/(θ + 2) and
Var X = (θ + 1)/((θ + 2)²(θ + 3)). Containers with X < .1 are considered
defective and we wish to do acceptance sampling to hopefully screen lots
with large p.
(a) Find the values of θ corresponding to fractions defective p₁ = .01 and
p₂ = .03.
(b) Use the Central Limit Theorem and find a number k and a sample
size n so that an acceptance sampling plan that rejects if x̄ < k has
Pa₁ = .95 and Pa₂ = .10.
5.25. A measurement has an upper specification U = 5.0. Making a normal
distribution assumption with σ = .015 and desiring Pa₁ = .95 for p₁ = .03
and Pa₂ = .10 for p₂ = .10, a statistician sets up a variables acceptance
sampling plan for a sample of size n = 23 that rejects a lot if x̄ > 4.97685.
In fact, a Weibull distribution with shape parameter β = 400 and scale
parameter λ is a better description of this characteristic than the normal
distribution the statistician used. This alternative distribution has cdf

F(x | λ) = 0 if x < 0,
           1 − exp(−(x/λ)⁴⁰⁰) if x > 0,

and mean ≈ .9986λ and standard deviation ≈ .0032λ.
Show how to obtain an approximate OC curve for the statistician's acceptance sampling plan under this Weibull model. (Use the Central Limit
Theorem.) Use your method to find the "real" acceptance probability if
p = .03.
5.26. Here's a prescription for a possible fraction nonconforming attributes acceptance sampling plan:

stop and reject the lot the first time that $X \ge \dfrac{n}{2} + \dfrac{\sqrt{n}}{2}$

stop and accept the lot the first time that $n - X \ge \dfrac{n}{2} + \dfrac{\sqrt{n}}{2}$

(a) Find a formula for the OC for this symmetric wedge-shaped plan. (One never samples more than 7 items and there are exactly 8 stop-sampling points prescribed by the rules above.)
(b) Consider the use of this plan where lots of size $N = 100$ are subjected to rectifying inspection and inspection error is possible. (Assume that any item inspected and classified as defective is replaced with one drawn from a population that is in fact a fraction $p$ defective and has been inspected and classified as good.) Use the parameters $w_G$ and $w_D$ defined in §5.2 of the notes and give a formula for the real AOQ of this plan as a function of $p$, $w_G$ and $w_D$.
5.27. Consider a "perspective A" economic analysis of some fraction defective fixed-$n$ inspection plans. (Don't simply try to use the "type B" calculations made in class. They aren't relevant. Work this out from first principles.)
Suppose that $N = 10$, $k_1 = 1$ and $k_2 = 10$ in a Deming Inspection Problem cost structure. Suppose further that a prior distribution for $p$ (the actual lot fraction defective) places equal probabilities on $p = 0$, $.1$ and $.2$. Here we will consider only plans with $n = 0$, $1$ or $2$. Let

$$X = \text{the number of defectives in a simple random sample from the lot}.$$

(a) For $n = 1$, find the conditional distributions of $p$ given $X = x$.
For $n = 2$, it turns out that the joint distribution of $X$ and $p$ is:

                x = 0    x = 1    x = 2
     p = 0      .333     0        0        .333
     p = .1     .267     .067     0        .333
     p = .2     .207     .119     .007     .333
                .807     .185     .007

and the conditionals of $p$ given $X = x$ are:

                x = 0    x = 1    x = 2
     p = 0      .413     0        0
     p = .1     .330     .360     0
     p = .2     .257     .640     1.00
(b) Use your answer to (a) and show that the best $n = 1$ plan REJECTS if $X = 0$ and ACCEPTS if $X = 1$. (Yes, this is correct!) Then use the conditionals above for $n = 2$ and show that the best $n = 2$ plan REJECTS if $X = 0$ and ACCEPTS if $X = 1$ or $2$.

(c) Standard acceptance sampling plans REJECT FOR LARGE $X$. Explain in qualitative terms why the best plans from (b) are not of this form.

(d) Which sample size ($n = 0$, $1$ or $2$) is best here? (Show calculations to support your answer.)
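The posterior calculations in (a) and the cost comparisons in (b) can be checked mechanically. A sketch under the Deming cost structure as we read it (accepting costs $k_2$ per defective left in the lot, rejecting costs $k_1$ per remaining item inspected; the function names are ours):

```python
from math import comb

N, k1, k2 = 10, 1.0, 10.0
lots = {0.0: 0, 0.1: 1, 0.2: 2}  # prior prob 1/3 on each p; D defectives in lot

def joint(n):
    """Joint probabilities P(p, X = x) for a SRS of size n (X hypergeometric)."""
    out = {}
    for p, D in lots.items():
        for x in range(n + 1):
            if x <= D and n - x <= N - D:
                out[(p, x)] = comb(D, x) * comb(N - D, n - x) / comb(N, n) / 3.0
    return out

def best_action(n, x):
    """Posterior expected cost of accepting (k2 per defective remaining)
    versus rejecting (k1 per uninspected item)."""
    j = joint(n)
    px = sum(v for (p, xx), v in j.items() if xx == x)
    # posterior expected defectives among the N - n uninspected items
    e_rem = sum(v / px * (lots[p] - x) for (p, xx), v in j.items() if xx == x)
    return "accept" if k2 * e_rem < k1 * (N - n) else "reject"
```

Running `joint(2)` reproduces the table above, and `best_action` reproduces the surprising decisions claimed in (b).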
A Useful Probabilistic Approximation

Here we present the general "delta method" or "propagation of error" approximation that stands behind several variance approximations in these notes as well as much of §5.4 of V&J. Suppose that a $p \times 1$ random vector
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$
has a mean vector
$$\mu = \begin{pmatrix} EX_1 \\ EX_2 \\ \vdots \\ EX_p \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$
and $p \times p$ variance-covariance matrix
$$\Sigma = \begin{pmatrix}
\operatorname{Var} X_1 & \operatorname{Cov}(X_1,X_2) & \cdots & \operatorname{Cov}(X_1,X_{p-1}) & \operatorname{Cov}(X_1,X_p) \\
\operatorname{Cov}(X_1,X_2) & \operatorname{Var} X_2 & \cdots & \operatorname{Cov}(X_2,X_{p-1}) & \operatorname{Cov}(X_2,X_p) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\operatorname{Cov}(X_1,X_{p-1}) & \operatorname{Cov}(X_2,X_{p-1}) & \cdots & \operatorname{Var} X_{p-1} & \operatorname{Cov}(X_{p-1},X_p) \\
\operatorname{Cov}(X_1,X_p) & \operatorname{Cov}(X_2,X_p) & \cdots & \operatorname{Cov}(X_{p-1},X_p) & \operatorname{Var} X_p
\end{pmatrix}
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1,p-1} & \sigma_{1p} \\
\sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2,p-1} & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\sigma_{1,p-1} & \sigma_{2,p-1} & \cdots & \sigma_{p-1}^2 & \sigma_{p-1,p} \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{p-1,p} & \sigma_p^2
\end{pmatrix}
= (\sigma_{ij})$$
(Recall that if $X_i$ and $X_j$ are independent, $\sigma_{ij} = 0$.)
Then for a $k \times p$ matrix of constants
$$A = (a_{ij})$$
consider the random vector
$$\mathop{Y}_{k \times 1} = \mathop{A}_{k \times p} \mathop{X}_{p \times 1}.$$
It is a standard piece of probability that $Y$ has mean vector
$$\begin{pmatrix} EY_1 \\ EY_2 \\ \vdots \\ EY_k \end{pmatrix} = A\mu$$
and variance-covariance matrix
$$\operatorname{Cov} Y = A \Sigma A'.$$
(The $k = 1$ version of this for uncorrelated $X_i$ is essentially quoted in (5.23) and (5.24) of V&J.)
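For $k = 1$ the matrix form $A \Sigma A'$ reduces to the familiar variance of a linear combination, $\operatorname{Var}(a_1 X_1 + a_2 X_2) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + 2 a_1 a_2 \sigma_{12}$. A quick numerical check of that equivalence (the numbers are arbitrary):

```python
# k = 1, p = 2: compare the quadratic form a' Sigma a with the
# written-out variance-of-a-linear-combination formula.
a = [2.0, -1.0]
sigma = [[4.0, 1.5],
         [1.5, 9.0]]  # Var X1 = 4, Var X2 = 9, Cov(X1, X2) = 1.5

# a' Sigma a computed as a double sum
var_y = sum(a[i] * sigma[i][j] * a[j] for i in range(2) for j in range(2))

# the same thing written out term by term
direct = (a[0] ** 2 * sigma[0][0] + a[1] ** 2 * sigma[1][1]
          + 2 * a[0] * a[1] * sigma[0][1])
```

Both computations give the same number, as the matrix identity promises.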
The propagation of error method says that if instead of the relationship $Y = AX$, I concern myself with $k$ functions $g_1, g_2, \dots, g_k$ (each mapping $\mathbb{R}^p$ to $\mathbb{R}$) and define
$$Y = \begin{pmatrix} g_1(X) \\ g_2(X) \\ \vdots \\ g_k(X) \end{pmatrix}$$
a multivariate Taylor's Theorem argument and the facts above provide an approximate mean vector and an approximate covariance matrix for $Y$. That is, if the functions $g_i$ are differentiable, let
$$\mathop{D}_{k \times p} = \left( \frac{\partial g_i}{\partial x_j}\bigg|_{x = \mu} \right)_{\substack{i = 1,\dots,k \\ j = 1,2,\dots,p}}.$$
A multivariate Taylor approximation says that for each $x_i$ near $\mu_i$,
$$y = \begin{pmatrix} g_1(x) \\ g_2(x) \\ \vdots \\ g_k(x) \end{pmatrix} \approx \begin{pmatrix} g_1(\mu) \\ g_2(\mu) \\ \vdots \\ g_k(\mu) \end{pmatrix} + D(x - \mu).$$
So if the variances of the $X_i$ are small (so that with high probability $X$ is near $\mu$, that is, that the linear approximation above is usually valid) it is plausible
that $Y$ has mean vector
$$\begin{pmatrix} EY_1 \\ EY_2 \\ \vdots \\ EY_k \end{pmatrix} \approx \begin{pmatrix} g_1(\mu) \\ g_2(\mu) \\ \vdots \\ g_k(\mu) \end{pmatrix}$$
and variance-covariance matrix
$$\operatorname{Cov} Y \approx D \Sigma D'.$$
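As a concrete illustration (our example, not one from the text), take $k = 1$ and $g(x_1, x_2) = x_1 x_2$ with independent inputs. Then $D = (\mu_2, \mu_1)$ evaluated at the mean, so $D \Sigma D' = \mu_2^2 \sigma_1^2 + \mu_1^2 \sigma_2^2$, which a Monte Carlo simulation confirms when the input standard deviations are small:

```python
import random

# Delta-method variance for g(x1, x2) = x1 * x2 with independent inputs:
# gradient at the mean is (mu2, mu1), so Var g ≈ mu2^2 s1^2 + mu1^2 s2^2.
mu1, mu2 = 2.0, 3.0
s1, s2 = 0.01, 0.02  # small standard deviations, as the argument requires

approx_var = (mu2 * s1) ** 2 + (mu1 * s2) ** 2

# Monte Carlo check under independent normal inputs
random.seed(0)
draws = [random.gauss(mu1, s1) * random.gauss(mu2, s2) for _ in range(200_000)]
m = sum(draws) / len(draws)
mc_var = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
```

Here the exact variance of a product of independents is $\mu_2^2\sigma_1^2 + \mu_1^2\sigma_2^2 + \sigma_1^2\sigma_2^2$, so the delta method drops only the (tiny) $\sigma_1^2\sigma_2^2$ term, exactly the higher-order error the Taylor argument ignores.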