Anderson Darling 1952

Asymptotic Theory of Certain "Goodness of Fit" Criteria Based on Stochastic
Processes
T. W. Anderson; D. A. Darling
The Annals of Mathematical Statistics, Vol. 23, No. 2. (Jun., 1952), pp. 193-212.
Stable URL:
http://links.jstor.org/sici?sici=0003-4851%28195206%2923%3A2%3C193%3AATOC%22O%3E2.0.CO%3B2-R
The Annals of Mathematical Statistics is currently published by Institute of Mathematical Statistics.
ASYMPTOTIC THEORY OF CERTAIN "GOODNESS OF FIT" CRITERIA BASED ON STOCHASTIC PROCESSES
BY T. W. ANDERSON AND D. A. DARLING
Columbia University and University of Michigan
1. Summary. The statistical problem treated is that of testing the hypothesis that n independent, identically distributed random variables have a specified continuous distribution function F(x). If $F_n(x)$ is the empirical cumulative distribution function and $\psi(t)$ is some nonnegative weight function ($0 \le t \le 1$), we consider

$$n^{1/2} \sup_{-\infty < x < \infty} \bigl\{ |F(x) - F_n(x)|\, \psi^{1/2}[F(x)] \bigr\} \quad\text{and}\quad n \int_{-\infty}^{\infty} [F(x) - F_n(x)]^2\, \psi[F(x)]\, dF(x).$$

A general method for calculating the limiting distributions of these criteria is developed by reducing them to corresponding problems in stochastic processes, which in turn lead to more or less classical eigenvalue and boundary value problems for special classes of differential equations. For certain weight functions including $\psi = 1$ and $\psi = 1/[t(1 - t)]$ we give explicit limiting distributions. A table of the asymptotic distribution of the von Mises $\omega^2$ criterion is given.
2. Introduction. One method of testing the hypothesis that n observations have been drawn from a population with specified distribution function F(x) is to compare the empirical histogram based on dividing the real line into intervals with the hypothetical histogram by means of the $\chi^2$ test. A test which does not involve a subjective grouping of the data consists of comparing the empirical cumulative distribution function with the hypothetical distribution function. Let $F_n(x)$ be the empirical distribution function based on n observations; that is, $F_n(x) = k/n$ if k observations are $\le x$ for k = 0, 1, ..., n. We wish to consider a convenient measure of the discrepancy or "distance" between two distribution functions. (For a more detailed discussion cf. Wald and Wolfowitz [21].) In accordance with the usual notions of a metric in function space, we treat the following measures:

(2.1) $W_n^2 = n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2\, \psi[F(x)]\, dF(x)$,

(2.2) $K_n = \sup_{-\infty < x < \infty} \sqrt{n}\, |F_n(x) - F(x)|\, \sqrt{\psi[F(x)]}$,

where $\psi(t)$ ($\ge 0$) is some preassigned weight function.

If a measure $W_n^2$ is adopted, the hypothesis is rejected for those samples for which $W_n^2 > z_1$, say, and if a measure $K_n$ is adopted, the hypothesis is rejected when $K_n > z_2$, say. The numbers $z_1$ and $z_2$ are to be chosen so that when the hypothesis is true the probability of rejection is some specified number (for example, .01 or .05). The main purpose of this paper is to give methods for finding the asymptotic distributions of $W_n^2$ and $K_n$, and, hence, approximate values of the significance points, $z_1$ and $z_2$. We assume that the hypothetical distribution is continuous.

1 This work was done mainly at the Rand Corporation.

The fundamental ideas for tests of this nature are due to Kolmogorov [11],
Smirnov [17], Cramér [2], and von Mises [19], and for large n certain tests have
been developed by them. The present paper treats these tests in somewhat
more detail, the analysis being greatly expedited by reducing the problems to
straightforward considerations in the theory of continuous Gaussian stochastic
processes. This reduction was developed by Doob [6], and used by him to give
a simplified proof of Kolmogorov's fundamental result.
The principal innovation in this paper is the incorporation of a weight function to allow more flexibility in the tests. Although we are able to make explicit
calculations for only a few simple types of weight functions, the principal mathematical problems are reduced to classical problems in the theory of differential
equations.
The function $\psi(t)$, $0 \le t \le 1$, is to be chosen by the statistician so as to weight the deviations according to the importance attached to various portions of the distribution function. This choice depends on the power against the alternative distributions considered most important. The selection of $\psi(t) \equiv 1$ yields $n\omega^2$, the criterion of von Mises, for $W_n^2$, and the criterion of Kolmogorov for $K_n$. For $W_n^2$ to exist for all samples except a set with probability zero, it is necessary and sufficient that the following integrals exist:

(2.3) $\int_0^{u_1} u^2\, \psi(u)\, du$ for every $u_1$ ($0 < u_1 < 1$),

(2.4) $\int_{u_2}^{1} (1 - u)^2\, \psi(u)\, du$ for every $u_2$ ($0 < u_2 < 1$).

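Given the data $x_1, x_2, \dots, x_n$ arranged in increasing magnitude (with probability one there are no equalities between any two of them, since the distribution is assumed continuous), we obtain for practical computations the simpler variants of (2.1) and (2.2),

(2.5) $W_n^2 = 2 \sum_{j=1}^{n} \Bigl[ \phi_1(u_j) - \frac{2j - 1}{2n}\, \phi_2(u_j) \Bigr] + n \int_0^1 (1 - u)^2\, \psi(u)\, du$,

(2.6) $K_n = \frac{1}{\sqrt{n}} \max_{j = 1, \dots, n} \Bigl\{ \sqrt{\psi(u_j)}\, \max\bigl[ nF(x_j) - (j - 1),\; j - nF(x_j) \bigr] \Bigr\}$,

where

(2.7) $u_j = F(x_j), \quad \phi_1(t) = \int_0^t u\, \psi(u)\, du, \quad \phi_2(t) = \int_0^t \psi(u)\, du$.

For (2.5) to hold the integrals $\phi_1(t)$, $\phi_2(t)$ must exist; for (2.6) to hold it is necessary and sufficient that

(2.8) [...]

if $\psi(t)$ is differentiable (substituting the difference quotient in (2.8) if $\psi(t)$ is not differentiable).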
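
As an illustration of the computing formulas of this section, here is a minimal Python sketch. It assumes the data have already been transformed to $u_j = F(x_j)$, and it takes the integrated weights $\phi_1(t) = \int_0^t u\psi(u)\,du$, $\phi_2(t) = \int_0^t \psi(u)\,du$ and the constant $\int_0^1 (1-u)^2\psi(u)\,du$ as caller-supplied arguments; the function names and the defaults (the closed forms for $\psi \equiv 1$) are choices made here, not part of the paper. The maximum formula for $K_n$ is only valid for weights for which the supremum is attained at the jump points, which is the case for $\psi \equiv 1$.

```python
import math

def w2_and_k(u, psi=lambda t: 1.0, phi1=lambda t: t * t / 2.0,
             phi2=lambda t: t, tail=1.0 / 3.0):
    """Return (W_n^2, K_n) for probability-transformed data u_j = F(x_j).

    phi1(t) = int_0^t s*psi(s) ds, phi2(t) = int_0^t psi(s) ds and
    tail = int_0^1 (1-s)^2 psi(s) ds must match psi; the defaults are
    the closed forms for psi = 1 (an assumption of this sketch).
    """
    u = sorted(u)
    n = len(u)
    # Integral criterion: 2*sum[phi1(u_j) - (2j-1)/(2n)*phi2(u_j)] + n*tail
    w2 = n * tail + 2.0 * sum(
        phi1(uj) - (2 * j - 1) / (2.0 * n) * phi2(uj)
        for j, uj in enumerate(u, start=1))
    # Supremum criterion evaluated at the jump points of the empirical cdf
    k = max(math.sqrt(psi(uj)) * max(n * uj - (j - 1), j - n * uj)
            for j, uj in enumerate(u, start=1)) / math.sqrt(n)
    return w2, k

def cramer_von_mises(u):
    """Classical formula 1/(12n) + sum (u_j - (2j-1)/(2n))^2, as a cross-check."""
    u = sorted(u)
    n = len(u)
    return 1.0 / (12 * n) + sum((uj - (2 * j - 1) / (2.0 * n)) ** 2
                                for j, uj in enumerate(u, start=1))
```

For $\psi \equiv 1$ the first function agrees with the familiar computing formula for $n\omega^2$, which gives a quick consistency check on the integrated-weight form.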
3. Reduction to a continuous stochastic process. Since F(x) is assumed continuous, we can make the transformation u = F(x). Then the observations are $u_i = F(x_i)$ (i = 1, 2, ..., n), and under the null hypothesis these can be considered as drawn from the uniform distribution between 0 and 1. Let $G_n(u)$ be the empirical distribution derived from $u_1, \dots, u_n$. Then $W_n^2$ and $K_n$ are, respectively,

(3.1) $W_n^2 = n \int_0^1 [G_n(u) - u]^2\, \psi(u)\, du$,

(3.2) $K_n = \sup_{0 \le u \le 1} \sqrt{n}\, |G_n(u) - u|\, \sqrt{\psi(u)}$.

For every $0 \le u \le 1$, $Y_n(u) = \sqrt{n}\,[G_n(u) - u]$ is a random variable and the set of these random variables may be considered a stochastic process with parameter u. Putting

(3.3) $A_n(z) = \Pr\Bigl\{ \int_0^1 Y_n^2(u)\, \psi(u)\, du \le z \Bigr\} = \Pr\{ W_n^2 \le z \}$,

(3.4) $B_n(z) = \Pr\Bigl\{ \sup_{0 \le u \le 1} |Y_n(u)|\, \sqrt{\psi(u)} \le z \Bigr\} = \Pr\{ K_n \le z \}$,

we wish to calculate $A(z) = \lim A_n(z)$, $n \to \infty$, and $B(z) = \lim B_n(z)$, $n \to \infty$, if these limits exist.
For fixed $u_1, u_2, \dots, u_k$ the joint distribution of $Y_n(u_1), Y_n(u_2), \dots, Y_n(u_k)$ approaches a k-variate normal distribution as $n \to \infty$. Thus the asymptotic process is Gaussian (normal) and is specified by its mean and covariance functions. For finite n we have

(3.5) $E(Y_n(u)) = 0, \quad E(Y_n(u) Y_n(v)) = \min(u, v) - uv$.

The limiting process is a Gaussian process $y(u)$, $0 \le u \le 1$, for which

(3.6) $E(y(u)) = 0, \quad E(y(u) y(v)) = \min(u, v) - uv$,

such that the probability is 1 that y(u) is continuous [6]. Putting

(3.7) $a(z) = \Pr\Bigl\{ \int_0^1 y^2(u)\, \psi(u)\, du \le z \Bigr\}$,

(3.8) $b(z) = \Pr\Bigl\{ \sup_{0 \le u \le 1} |y(u)|\, \sqrt{\psi(u)} \le z \Bigr\}$,

we wish to conclude that A(z) = a(z) and B(z) = b(z). Having established these equalities we shall be in a position to use certain representation theorems for stochastic processes to great advantage.
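
The finite-n covariance of $Y_n$ quoted in this section follows from a short computation with indicator variables. The sketch below spells it out; the indicator notation $\mathbf{1}\{\cdot\}$ is introduced here for convenience and is not the paper's.

```latex
\begin{aligned}
Y_n(u) &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \bigl( \mathbf{1}\{u_i \le u\} - u \bigr), \\
E\bigl( Y_n(u)\, Y_n(v) \bigr)
  &= \frac{1}{n} \sum_{i=1}^{n}
     \operatorname{Cov}\bigl( \mathbf{1}\{u_i \le u\},\, \mathbf{1}\{u_i \le v\} \bigr)
     \qquad \text{(independence of the } u_i \text{)} \\
  &= \Pr\{ u_1 \le \min(u, v) \} - uv
   = \min(u, v) - uv .
\end{aligned}
```

The covariance does not depend on n, which is why the limiting Gaussian process has the same covariance function.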
In [4] Donsker has given the following theorem: Let R be the space of real, single-valued functions g(t) which are continuous except for at most a finite number of finite jumps, and let C be the space of continuous functions. Let F(g) be a functional defined on R and continuous in the uniform topology, i.e., $\sup_{0 \le t \le 1} |g_n(t) - g_0(t)| \to 0$, $n \to \infty$, implies $|F(g_n) - F(g_0)| \to 0$, $n \to \infty$, $g_n \in R$, $g_0 \in C$, except for a set of $g_0(t)$ with 0 probability according to the probability associated with y(t). Then

(3.9) $\lim_{n \to \infty} \Pr\{ F[Y_n(t)] \le z \} = \Pr\{ F[y(t)] \le z \}$.

It follows from this result that if $\psi(u)$ is bounded A(z) = a(z) and B(z) = b(z).
To handle more general weight functions for the case of integrals we want to extend this result. We shall assume that $\psi(u)$ is continuous in any interval $0 < u_1 \le u \le u_2 < 1$. Secondly we assume that

(3.10) $\int_0^{u_1} \psi(t)\, t \log\log\frac{1}{t}\, dt, \qquad \int_{u_1}^{1} \psi(t)\, (1 - t) \log\log\frac{1}{1 - t}\, dt$

exist for every $u_1$ ($0 < u_1 < 1$). It is shown in Section 5 that

(3.11) $X(t) = (1 + t)\, y\bigl(t/(1 + t)\bigr)$

is the Wiener process which has the property ([12] p. 242 and p. 247)

there exists a $t_0$ such that $X^2(t) \le 2t \log\log\frac{1}{t}$ for $0 < t < t_0$

with probability 1. This implies that

(3.12) there exists a $u_0$ such that $y^2(u) \le 2u(1 - u) \log\log\frac{1 - u}{u}$ for $0 < u < u_0$.

Thus with probability 1 $\psi(t) y^2(t)$ is majorized by $k\, \psi(t)\, t \log\log(1/t)$ for $k \ge 2(1 - u_0)$. Thus if the first integral in (3.10) exists

(3.13) $\int_0^{u_1} y^2(t)\, \psi(t)\, dt$

exists with probability 1 (taking the principal value when the integral is improper). A similar argument holds for the existence of $\int_{u_1}^{1} y^2(t)\, \psi(t)\, dt$. Thus $\int_0^1 y^2(t)\, \psi(t)\, dt$ exists with probability 1. This defines a functional continuous in the uniform topology. Hence from Donsker's theorem A(z) = a(z).
4. The limiting distribution of the integral criterion. In this section we show how to find a(z) in terms of the solution of a certain differential equation and give two examples of this method. The statistic $W_n^2$ is essentially that introduced by Cramér [2]; in the case of $\psi(t) \equiv 1$, it is n times the $\omega^2$ criterion studied by von Mises [19] and Smirnov [17].

The method we use is analogous to the technique of Kac and Siegert [10]. We shall sketch briefly the extension of their results.

By Mercer's theorem a symmetric continuous correlation function k(s, t), $0 \le s, t \le 1$, which is square integrable (in one variable and in both variables), can be expressed as

(4.1) $k(s, t) = \sum_{j=1}^{\infty} \frac{1}{\lambda_j} f_j(s) f_j(t)$,

where $\lambda_j$ is an eigenvalue and $f_j(t)$ is the corresponding normalized eigenfunction of the integral equation

(4.2) $f(t) = \lambda \int_0^1 k(t, s) f(s)\, ds$,

and

(4.3) $\int_0^1 f_i(t) f_j(t)\, dt = \delta_{ij}$,

the Kronecker delta. In most cases k(0, 0) = k(1, 1) = 0; hence $f_j(0) = f_j(1) = 0$. Since k(s, t) is positive definite, $\lambda_j > 0$. The series (4.1) converges absolutely and uniformly in the unit square.

Let $X_1, X_2, \dots$ be independently, normally distributed with means zero and variances 1. If $k(t, t) < \infty$, then we can define

(4.4) $z(t) = \sum_{j=1}^{\infty} \frac{1}{\sqrt{\lambda_j}} X_j f_j(t)$;

the series converges in the mean and with probability one for each t. Then z(t) is a Gaussian process with $Ez(t) = 0$ and $Ez(s)z(t) = k(s, t)$. Thus z(t) gives the same stochastic process as $\sqrt{\psi(t)}\, y(t)$ when $k(s, t) = \sqrt{\psi(s)} \sqrt{\psi(t)}\, [\min(s, t) - st]$. From this it follows that with probability 1

(4.5) $W^2 = \int_0^1 \psi(t)\, y^2(t)\, dt = \int_0^1 z^2(t)\, dt = \sum_{j=1}^{\infty} \frac{X_j^2}{\lambda_j}$.

See [10] for details of this proof. Thus

(4.6) $E(e^{iuW^2}) = \prod_{j=1}^{\infty} \Bigl( 1 - \frac{2iu}{\lambda_j} \Bigr)^{-1/2}$.

The infinite product converges absolutely and uniformly for all real u, and in general $1/\lambda_n = O(1/n^2)$.
We desire a more general result, however, because one weight function we treat leads to a kernel that is not continuous at (0, 0) and (1, 1). We use the following theorem of Hammerstein [9]: Let k(s, t) be continuous in the unit square except possibly at the corners of the square; let $\partial k(s, t)/\partial s$ be continuous in the interior of both triangles in which the square is divided by the line between (0, 0) and (1, 1), and let the partial derivative be bounded in the domain $\epsilon \le s \le 1 - \epsilon$ and $0 \le t \le 1$ for each $\epsilon$ (> 0). Then the series on the right of (4.1) converges uniformly to k(s, t) in every domain in the interior of the unit square.
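Since $k(s, t) = \sqrt{\psi(s)} \sqrt{\psi(t)}\, [\min(s, t) - st]$, the condition is that $\psi(t)$ be continuous for $0 < t < 1$ and that the partial derivative $\partial k(s, t)/\partial s$ in each triangle, namely

$$t \sqrt{\psi(t)}\, \frac{d}{ds} \bigl[ (1 - s) \sqrt{\psi(s)} \bigr]$$

be continuous for $0 \le t \le s \le 1 - \epsilon$ and

$$(1 - t) \sqrt{\psi(t)}\, \frac{d}{ds} \bigl[ s \sqrt{\psi(s)} \bigr]$$

be continuous for $\epsilon \le s \le t \le 1$.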
In this case (4.4) converges in the mean and with probability one for every t ($\epsilon \le t \le 1 - \epsilon$), and z(t) is the same process as $x(t) = \sqrt{\psi(t)}\, y(t)$ in this interval.

If $\int_0^1 k(t, t)\, dt < \infty$, then $\sum_j 1/\lambda_j < \infty$ (by Bessel's inequality) and $\sum_j X_j^2/\lambda_j$ converges with probability one. Further, with probability one, $\sum_j X_j f_j(t)/\sqrt{\lambda_j}$ converges in the mean (integral with respect to t) and it converges to z(t). Thus we have with probability one

$$\int_0^1 z^2(t)\, dt = \sum_{j=1}^{\infty} \frac{X_j^2}{\lambda_j}.$$

For $\epsilon$ small enough [...] for any $\delta > 0$. Thus the distribution of $W^2 = \int_0^1 x^2(t)\, dt$ is the limiting distribution of $\int_\epsilon^{1-\epsilon} z^2(t)\, dt$. With a similar argument for the integral of [...] we argue that the distribution of $W^2$ is the distribution of $\sum_{j=1}^{\infty} X_j^2/\lambda_j$ with characteristic function (4.6).

THEOREM 4.1. If

(4.12) $k(s, t) = \sqrt{\psi(s)} \sqrt{\psi(t)}\, [\min(s, t) - st]$

is continuous or if k(s, t) is continuous except at (0, 0) and (1, 1) with $\partial k(s, t)/\partial s$ continuous for $0 < s, t < 1$, $s \ne t$, and bounded in $\epsilon \le s \le 1 - \epsilon$, $0 \le t \le 1$ for every $\epsilon$ (> 0), then the characteristic function of $W^2$ is given by (4.6), where $\{\lambda_j\}$ are the eigenvalues of k(s, t) defined by (4.2).
In our case the integral equation is

(4.13) $f(t) = \lambda \int_0^1 [\min(t, s) - ts]\, \sqrt{\psi(t)} \sqrt{\psi(s)}\, f(s)\, ds$.

It can be shown that if f(t) satisfies (4.13) for some $\lambda$, then $h(t) = f(t)\, \psi^{-1/2}(t)$ satisfies

(4.14) $h''(t) + \lambda\, \psi(t)\, h(t) = 0$

for that $\lambda$ (see [8], Sections 604 and 605) and h(0) = h(1) = 0 when k(0, 0) = k(1, 1) = 0. Let $h(t, \lambda)$ be the solution of (4.14) for which

(4.15) $h(0, \lambda) = 0, \quad \frac{\partial}{\partial t} h(t, \lambda) \Big|_{t=0} = 1$.

If $\psi(t)$ is continuous ($0 \le t \le 1$), such a solution exists and $h(t, \lambda)$ is continuous in t ($0 \le t \le 1$). Since $h(1, \lambda) = 0$ for $\lambda$ an eigenvalue of (4.13), the roots of $h(1, \lambda) = 0$ are the roots of the Fredholm determinant $D(\lambda)$ associated with k(s, t). It can be shown that

(4.16) $D(\lambda) = \prod_{j=1}^{\infty} \Bigl( 1 - \frac{\lambda}{\lambda_j} \Bigr) = \frac{h(1, \lambda)}{h(1, 0)}$.

The characteristic function (4.6) is

(4.17) $\frac{1}{\sqrt{D(2iu)}} = \sqrt{\frac{h(1, 0)}{h(1, 2iu)}}$.

The square root is taken so as to make (4.17) real and positive when the characteristic function is real and positive. The details of this proof are given in [8], Section 605.

THEOREM 4.2. Let $\psi(t)$ be continuous for $0 \le t \le 1$. Then the equation (4.14) has a unique solution $h(t, \lambda)$ for every $\lambda > 0$ satisfying (4.15). Then the characteristic function of $W^2$ is

(4.18) $\phi(u) = \sqrt{\frac{h(1, 0)}{h(1, 2iu)}}$.

Thus we have reduced the problem of finding the characteristic function of $W^2$ to finding the general solution of a differential equation.
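
To make the reduction concrete, the following Python sketch integrates the equation $h'' + \lambda\psi(t)h = 0$ with $h(0) = 0$, $h'(0) = 1$ by a classical fourth-order Runge-Kutta scheme and returns $h(1, \lambda)$, whose zeros in $\lambda$ are the eigenvalues. The function name, the step count and the restriction to real $\lambda$ are assumptions of this sketch; the paper works analytically and evaluates at $\lambda = 2iu$.

```python
import math

def h_at_one(lam, psi=lambda t: 1.0, steps=2000):
    """Integrate h'' + lam*psi(t)*h = 0, h(0)=0, h'(0)=1 by RK4; return h(1).

    lam is taken real here (a simplification made for this sketch).
    """
    dt = 1.0 / steps
    h, v, t = 0.0, 1.0, 0.0                     # h, its derivative, and time

    def acc(time, value):
        # Second derivative implied by the differential equation
        return -lam * psi(time) * value

    for _ in range(steps):
        k1h, k1v = v, acc(t, h)
        k2h, k2v = v + 0.5 * dt * k1v, acc(t + 0.5 * dt, h + 0.5 * dt * k1h)
        k3h, k3v = v + 0.5 * dt * k2v, acc(t + 0.5 * dt, h + 0.5 * dt * k2h)
        k4h, k4v = v + dt * k3v, acc(t + dt, h + dt * k3h)
        h += dt * (k1h + 2 * k2h + 2 * k3h + k4h) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
    return h
```

For $\psi \equiv 1$ the exact solution is $\sin(\sqrt{\lambda}\,t)/\sqrt{\lambda}$, so `h_at_one` should vanish at $\lambda = \pi^2 j^2$; for other continuous weights the same routine can be combined with a root finder to locate eigenvalues numerically.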

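The semi-invariants $\kappa_n$ of $W^2$ are given quite easily (when they exist) through the eigenvalues. Since

(4.19) $\phi(t) = \prod_{j=1}^{\infty} \Bigl( 1 - \frac{2it}{\lambda_j} \Bigr)^{-1/2}$,

the coefficient of $(it)^n/n!$ in the power series expansion of $\log \phi(t)$ is

(4.20) $\kappa_n = 2^{n-1} (n - 1)! \sum_{j=1}^{\infty} \frac{1}{\lambda_j^n}$.

Hence we obtain for the mean and variance, for instance,

(4.21) $\mu = \kappa_1 = \sum_{j=1}^{\infty} \frac{1}{\lambda_j}, \qquad \sigma^2 = \kappa_2 = 2 \sum_{j=1}^{\infty} \frac{1}{\lambda_j^2}$.

Even without knowing the eigenvalues, the moments can be calculated in terms of the iterates of the kernel k(s, t). Putting $k_1(s, t) = k(s, t) = (\min(s, t) - st) \sqrt{\psi(s)} \sqrt{\psi(t)}$, $k_{n+1}(s, t) = \int_0^1 k_n(s, u)\, k(u, t)\, du$, we have by means of the bilinear expansion

(4.22) $k_n(s, t) = \sum_{j=1}^{\infty} \lambda_j^{-n} f_j(s) f_j(t)$.

Hence,

(4.23) $\kappa_n = 2^{n-1} (n - 1)! \int_0^1 k_n(s, s)\, ds$,

and, in particular,

(4.24) $\mu = \int_0^1 k(s, s)\, ds = \int_0^1 s(1 - s)\, \psi(s)\, ds, \qquad \sigma^2 = 2 \int_0^1 \!\! \int_0^1 k^2(s, t)\, ds\, dt = 4 \int_0^1 (1 - s)^2\, \psi(s) \int_0^s t^2\, \psi(t)\, dt\, ds$.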
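
The semi-invariant formula $\kappa_n = 2^{n-1}(n-1)! \sum_j \lambda_j^{-n}$ is easy to check numerically. The Python sketch below truncates the sum and evaluates it for two eigenvalue sequences written in the convention of (4.2): $\lambda_j = \pi^2 j^2$ for $\psi \equiv 1$ and $\lambda_j = j(j+1)$ for $\psi = 1/[t(1-t)]$, the two cases treated in the examples of this section. The function name and the truncation length are assumptions of this sketch.

```python
import math

def cumulant(n, eigenvalue, terms=100000):
    """kappa_n = 2^(n-1) (n-1)! sum_j lambda_j^(-n), truncated after `terms`."""
    s = sum(float(eigenvalue(j)) ** (-n) for j in range(1, terms + 1))
    return 2 ** (n - 1) * math.factorial(n - 1) * s

def cvm_eigenvalue(j):
    # psi = 1: zeros of sin(sqrt(lambda)) / sqrt(lambda)
    return (math.pi * j) ** 2

def ad_eigenvalue(j):
    # psi = 1/[t(1-t)]: eigenvalues attached to the Legendre functions
    return j * (j + 1)

mean_cvm, var_cvm = cumulant(1, cvm_eigenvalue), cumulant(2, cvm_eigenvalue)
mean_ad, var_ad = cumulant(1, ad_eigenvalue), cumulant(2, ad_eigenvalue)
```

The results can be compared with the kernel integrals: $\int_0^1 s(1-s)\,ds = 1/6$ and $\int_0^1 ds = 1$ for the two means, and the double integral for $\sigma^2$ gives $1/45$ and $2(\pi^2/3 - 3)$ respectively.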
We now present two applications of this method.

Example 1. Let $\psi(t) \equiv 1$; then $W_n^2 = n\omega^2$. The differential equation $h''(t) + \lambda h(t) = 0$ has a solution

(4.25) $h(t, \lambda) = \frac{1}{\sqrt{\lambda}} \sin \sqrt{\lambda}\, t$

satisfying (4.15). Taking h(1, 0) as $\lim_{\lambda \to 0} h(1, \lambda) = 1$, we find that $1/\sqrt{D(2it)}$ is

(4.26) $\phi(t) = \sqrt{\frac{\sqrt{2it}}{\sin \sqrt{2it}}}$.

This expression was given by Smirnov [15] and later by von Mises [20] using entirely different methods. A formal method for finding the distribution (by inverting the Fourier transform) was given later by Smirnov [16], but his expression is not amenable to numerical calculation. The following procedure expresses $a_1(z) = \Pr\{W^2 \le z\}$ in terms of tabulated functions.

It appears convenient to work with the Laplace transform. We have

$$f(t) = E(e^{-tW^2}) = \int_0^\infty e^{-tz}\, da_1(z) = \sqrt{\frac{\sqrt{2t}}{\sinh \sqrt{2t}}}.$$

Using integration by parts, we obtain

$$\int_0^\infty e^{-tz}\, a_1(z)\, dz = \frac{f(t)}{t}$$

for the cdf $a_1(z)$. We wish to invert this Laplace transform. Now

$$\frac{f(t)}{t} = \frac{2^{3/4}}{t^{3/4}}\, e^{-\sqrt{2t}/2}\, \bigl( 1 - e^{-2\sqrt{2t}} \bigr)^{-1/2}.$$

We suppose in the sequel that the real part of t, R(t) > 0 and apply the binomial expansion to the last expression; thus

$$\frac{f(t)}{t} = \frac{2^{3/4}}{t^{3/4}} \sum_{j=0}^{\infty} (-1)^j \binom{-1/2}{j} e^{-(4j+1)\sqrt{2t}/2},$$

where $\binom{-1/2}{j} = (-1)^j\, \Gamma(j + \tfrac12) / [\Gamma(\tfrac12)\, j!]$. It may be readily verified that the complex inversion formula can be used termwise here since the abscissa of convergence of f(t)/t is R(t) = 0, and the above series converges absolutely and uniformly in the half plane $R(t) \ge \beta > 0$.

Since [...] we have [...] where [...] by virtue of the convolution property of the Laplace transform. In this integral we change variables, putting $x = u\, \mathrm{sech}^2 \theta$ to give [...] where $K_{1/4}(x)$ is the standard Bessel function [22].

Having inverted the typical term, we finally obtain by summing

$$a_1(z) = \frac{1}{\pi \sqrt{z}} \sum_{j=0}^{\infty} (-1)^j \binom{-1/2}{j} (4j + 1)^{1/2}\, e^{-(4j+1)^2/(16z)}\, K_{1/4}\Bigl( \frac{(4j + 1)^2}{16z} \Bigr).$$

The convergence of this series is very rapid. If $a_1(z) = \sum_{j=0}^{\infty} u_j(z)$, we find that $u_{j+1}(z)/u_j(z) < k_j\, e^{-(4j+3)/(2z)}$ (using the fact that $K_{1/4}(t)$ is a decreasing function of t), where $k_0 < 1.12$, $k_1 < 1.007$, $k_2 < 1.002$, $k_j < 1.0007$ for $j \ge 3$. Since $K_{1/4}(t)$ is positive, $u_j(z) > 0$. Using a crude geometric series bound for $R_4(z) = \sum_{j=4}^{\infty} u_j(z)$, we can show that for $z \le 2$, $R_4(z) < .0002$. Moreover, for $z \le 2$, $R_4(z) < u_3(z) < u_2(z) < u_1(z)$. In computation, therefore, one need only take as many terms in the series as are different from 0 in the number of decimal places carried. We give below a table of z for equal increments (.01) of $a_1(z)$ with the 5%, 1% and .1% significance points. The calculations have been carried to 6 figures before rounding off. The authors are indebted to Mr. Jack Laderman of Columbia University and the Numerical Analysis Department of the Rand Corporation for their assistance in preparing the table.

The semi-invariants of this distribution are easily obtained since the eigenvalues are given by $1/\lambda_j = 1/(\pi^2 j^2)$. Thus

$$\kappa_n = \frac{2^{n-1} (n - 1)!}{\pi^{2n}} \sum_{j=1}^{\infty} \frac{1}{j^{2n}} = \frac{2^{3n-2} (n - 1)!}{(2n)!}\, B_n,$$

where $B_n$ are the Bernoulli numbers: $B_1 = 1/6$, $B_2 = 1/30$, etc.
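
A rough numerical sketch of the series for $a_1(z)$ in Example 1, namely $a_1(z) = (\pi\sqrt{z})^{-1} \sum_j [\Gamma(j+\tfrac12)/(\Gamma(\tfrac12) j!)]\, (4j+1)^{1/2} e^{-x_j} K_{1/4}(x_j)$ with $x_j = (4j+1)^2/(16z)$, is given below in Python. To stay within the standard library it evaluates $e^{-x}K_\nu(x)$ from the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ by Simpson's rule; that quadrature, the function names, and the truncation parameters are choices made for this sketch and are not the procedure used to prepare the paper's table.

```python
import math

def scaled_bessel_k(nu, x, steps=4000):
    """exp(-x) * K_nu(x) from K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt."""
    # Beyond `upper` the integrand is smaller than exp(-60) times its peak.
    upper = math.acosh(1.0 + 60.0 / x)
    h = upper / steps

    def g(t):
        return math.exp(-x * (1.0 + math.cosh(t))) * math.cosh(nu * t)

    total = g(0.0) + g(upper)
    for i in range(1, steps):                   # composite Simpson rule
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0

def cvm_limit_cdf(z, terms=8):
    """Truncated series for a_1(z) = lim Pr{n*omega^2 <= z}."""
    if z <= 0.0:
        return 0.0
    total = 0.0
    for j in range(terms):
        # (-1)^j * binom(-1/2, j) = Gamma(j + 1/2) / (Gamma(1/2) * j!)
        coef = math.exp(math.lgamma(j + 0.5) - math.lgamma(0.5)
                        - math.lgamma(j + 1.0))
        x = (4 * j + 1) ** 2 / (16.0 * z)
        total += coef * math.sqrt(4 * j + 1) * scaled_bessel_k(0.25, x)
    return total / (math.pi * math.sqrt(z))
```

Because successive terms decrease roughly like $e^{-(4j+3)/(2z)}$, a handful of terms suffices for moderate z; evaluating at the commonly tabulated 5% and 1% points of the limiting distribution (z = 0.46136 and z = 0.74346) should return values close to 0.95 and 0.99.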

Example 2. $\psi(t) = 1/[t(1 - t)]$. Since the variance of $Y_n(t) = \sqrt{n}\, [G_n(t) - t]$ is t(1 - t), an interesting weight function for $Y_n^2(t)$ is the reciprocal of this variance.² In a certain sense, this function assigns to each point of the distribution F(x) equal weights. A statistician may prefer to use this weight function when he feels that $\psi(t) \equiv 1$ does not give enough weight to the tails of the distribution.

2 This suggestion was first made by L. J. Savage.

TABLE 1. Limiting Distribution of $n\omega^2$: $a_1(z) = \lim_{n \to \infty} \Pr\{ n\omega^2 \le z \}$ (table entries not reproduced).

In this example

$$k(s, t) = \frac{\min(s, t) - st}{\sqrt{s(1 - s)}\, \sqrt{t(1 - t)}}$$

is not continuous at (t, s) = (0, 0) or (1, 1); hence we need the extended result of Theorem 4.1 to justify our procedure. It is known that the Ferrer associated Legendre polynomials $f_j(t) = P_j^1(2t - 1) \propto \sqrt{t(1 - t)}\, P_j'(2t - 1)$ satisfy the integral equation with $1/\lambda_j = 1/[j(j + 1)]$ (see [23], p. 324). Thus the characteristic function of $W^2$ is

$$\phi(t) = \prod_{j=1}^{\infty} \Bigl( 1 - \frac{2it}{j(j + 1)} \Bigr)^{-1/2}.$$

An analysis similar to that used in Example 1 shows that the cdf, $a_2(z)$, can be expressed as

$$a_2(z) = \frac{\sqrt{2\pi}}{z} \sum_{j=0}^{\infty} \binom{-1/2}{j} (4j + 1)\, e^{-(4j+1)^2 \pi^2/(8z)} \int_0^\infty e^{z/(8(w^2+1)) - (4j+1)^2 \pi^2 w^2/(8z)}\, dw.$$
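
The series for $a_2(z)$, that is, $a_2(z) = (\sqrt{2\pi}/z) \sum_j \binom{-1/2}{j} (4j+1)\, e^{-(4j+1)^2\pi^2/(8z)} \int_0^\infty \exp\{ z/(8(w^2+1)) - (4j+1)^2\pi^2 w^2/(8z) \}\, dw$, can also be evaluated with elementary quadrature. The Python sketch below uses Simpson's rule on a truncated range for the inner integral; the function name, the truncation point and the number of terms are assumptions of this sketch rather than anything prescribed by the paper.

```python
import math

def ad_limit_cdf(z, terms=10, steps=2000):
    """Truncated series for a_2(z), the limiting cdf with psi = 1/[t(1-t)]."""
    if z <= 0.0:
        return 0.0
    total = 0.0
    for j in range(terms):
        b = (4 * j + 1) ** 2 * math.pi ** 2 / (8.0 * z)
        # binom(-1/2, j) = (-1)^j Gamma(j + 1/2) / (Gamma(1/2) j!)
        coef = (-1) ** j * math.exp(math.lgamma(j + 0.5) - math.lgamma(0.5)
                                    - math.lgamma(j + 1.0))
        upper = math.sqrt(40.0 / b)       # exp(-b w^2) < exp(-40) beyond this
        h = upper / steps
        integral = 0.0
        for i in range(steps + 1):        # composite Simpson rule on [0, upper]
            w = i * h
            weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
            integral += weight * math.exp(z / (8.0 * (w * w + 1.0)) - b * w * w)
        integral *= h / 3.0
        total += coef * (4 * j + 1) * math.exp(-b) * integral
    return math.sqrt(2.0 * math.pi) / z * total
```

For z near the usual upper percentage points only the j = 0 term matters; at z = 2.492, the commonly quoted asymptotic 5% point for this weight function, the sketch should return a value close to 0.95.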
5. Theory of deviations. The second test criterion led to the calculation of

$$B_n(z) = \Pr\Bigl\{ \sup_{0 \le u \le 1} \sqrt{n}\, |G_n(u) - u|\, \sqrt{\psi(u)} \le z \Bigr\}.$$

In order to handle the limiting distribution we consider the functional

(5.1) $K = \sup_{0 \le u \le 1} |y(u)|\, \sqrt{\psi(u)}$.

It follows from the theorem of Donsker [4] that for $\psi(u)$ bounded we have

$$\lim_{n \to \infty} B_n(z) = \Pr\{ K \le z \} = b(z),$$

and the problem is reduced to that of calculating the distribution of (5.1). This is the elegant idea of Doob [6], who treated the case $\psi \equiv 1$.

This is known as an "absorption probability" problem on account of its very suggestive analogy with a simple diffusion model. It is clear that the event that $\{ -z(\psi(u))^{-1/2} \le y(u) \le z(\psi(u))^{-1/2},\ 0 \le u \le 1 \}$ is equivalent to the event $\{ K \le z \}$; thus the probability b(z) is, very crudely speaking, the "proportion" of all those paths y(u) of the diffusing particle which do not get "absorbed into" (i.e., intersect) the "barriers" $y = \pm z(\psi(u))^{-1/2}$.

It is convenient to make a transformation due to Doob [6] which renders the analysis simpler. If we put

$$X(t) = (t + 1)\, y\Bigl( \frac{t}{t + 1} \Bigr),$$

it is easy to verify that X(t) is the Wiener-Einstein process; that is, X(t) is Gaussian, X(0) = 0, E(X(t)) = 0, E(X(t)X(s)) = min(s, t). Then

$$b(z) = \Pr\{ |X(t)| \le \xi(t),\ 0 \le t < \infty \},$$

where

(5.2) $\xi(t) = \frac{z(t + 1)}{\sqrt{\psi\bigl( t/(t + 1) \bigr)}}$.

Thus we have the absorption probability problem for the free particle with barriers $x = \pm\xi(t)$ for $t \ge 0$.
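
The claim that the transformed process $X(t) = (1 + t)\, y(t/(1 + t))$ has covariance min(s, t) can be verified in two lines from the covariance $\min(u, v) - uv$ of y. The following derivation is a sketch of that check for $s \le t$, using only the fact that $u \mapsto u/(1 + u)$ is increasing.

```latex
\begin{aligned}
E\bigl( X(s) X(t) \bigr)
  &= (1 + s)(1 + t)\, E\Bigl[ y\Bigl( \tfrac{s}{1+s} \Bigr)\, y\Bigl( \tfrac{t}{1+t} \Bigr) \Bigr]
   = (1 + s)(1 + t) \Bigl[ \frac{s}{1 + s} - \frac{s\,t}{(1 + s)(1 + t)} \Bigr] \\
  &= s(1 + t) - st = s = \min(s, t), \qquad 0 \le s \le t .
\end{aligned}
```

Under the same substitution $u = t/(1 + t)$, the condition $|y(u)| \sqrt{\psi(u)} \le z$ becomes $|X(t)| \le z(1 + t)/\sqrt{\psi(t/(1 + t))}$, which is the barrier function used in this section.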
The method of solution is to treat the corresponding diffusion problem as a boundary value problem with the diffusion equation

(5.3) $\frac{\partial f}{\partial t} = \frac{1}{2} \frac{\partial^2 f}{\partial x^2}$

associated with the region $t \ge 0$, $|x| \le \xi(t)$. In line with the preceding analogy f(t, x) will be the "density" of paths X(u) which for $0 \le u \le t$ have not been "absorbed" and for which X(t) = x; hence

$$\int_{-\xi(t)}^{\xi(t)} f(t, x)\, dx$$

will give the probability of nonabsorption up to time t. It is the limit of this expression for $t \to \infty$ which will yield b(z). For a more detailed discussion of these points see Lévy [12], pp. 78 et seq.

We need the following existence and uniqueness theorem:
THEOREM 5.1. Given that $\xi(t)$ of (5.2) has a bounded derivative for $t_0 \le t \le t_1$, there exists a unique function $p(t_0, y; t, x)$ such that for any continuous function g(y), $|y| < \xi(t_0)$, the function

(5.4) $f(t, x) = \int_{|y| < \xi(t_0)} g(y)\, p(t_0, y; t, x)\, dy$

has the following properties:

(1) f(t, x) satisfies (5.3) in the domain $t_0 < t < t_1$, $|x| < \xi(t)$,

(2) $\lim_{x \to \pm\xi(t)} f(t, x) = 0$, $t_1 > t > t_0$,

(3) $\lim_{t \to t_0} f(t, x) = g(x)$.

The proof of this theorem is contained quite explicitly in the fundamental paper of Fortet [7] (especially ch. V), who considers in great detail the general problem of absorption probabilities. Fortet treats only the case of one absorbing barrier, but his results are easily extended to the above case of two barriers.

The differential $p(t_0, y; t, x)\, dx$ can be interpreted, to terms of order $(dx)^2$, as the probability that if the diffusing particle starts at $(t_0, y)$ it will not have been absorbed in the barriers $\pm\xi(t)$ during the interval $(t_0, t)$, and will lie between x and x + dx at time t.

We have not stated the best theorem possible. If $\xi(t)$ is merely continuous the absorption probability density f(t, x) exists. For the existence of a solution to (5.3) satisfying (2) and (3) of Theorem 5.1 it is sufficient to require that $\xi(t)$ satisfy a Lipschitz condition associated with the law of the iterated logarithm. Finally we remark in passing that unless f(t, x) is of the form (5.4) (the so called "normal" solution of Fortet) its uniqueness is not assured (cf. Doetsch [3]).
If in the theorem $\xi(t)$ has a bounded derivative for $t \ge 0$ then we plainly have

(5.5) $b(z) = \lim_{t \to \infty} \int_{-\xi(t)}^{\xi(t)} p(0, 0; t, x)\, dx$,

but if $\xi(t)$ does not have a bounded derivative for $t \ge 0$, (5.5) can no longer be employed to determine b(z). However, if there are a finite number of intervals in each of which $\xi(t)$ has a bounded derivative and between which $\xi(t)$ has a simple jump discontinuity it is easy to modify the above result; in fact over some of the intervals $\xi(t)$ may be infinite. A piecewise determination can be made and the solution can be continued to beyond the last discontinuity, and then (5.5) can be used. Suppose the points of discontinuity of $\xi(t)$ are $0 < t_1 < t_2 < \dots < t_n$ and suppose $\xi(t)$ is, say, left continuous. In the region $(0, t_1)$ we have the solution $g_0(t, x) = p_0(0, 0; t, x)$ by the above theorem. Now if $\xi(t_1) < \xi(t_1 + 0)$ we define $g_1^*(t_1, x)$ by

$$g_1^*(t_1, x) = \begin{cases} g_0(t_1, x), & |x| \le \xi(t_1), \\ 0, & \xi(t_1) < |x| < \xi(t_1 + 0), \end{cases}$$

and if $\xi(t_1) > \xi(t_1 + 0)$ we define $g_1^*(t_1, x) = g_0(t_1, x)$, $|x| \le \xi(t_1 + 0)$. Then $g_1^*(t_1, x)$ is continuous in $|x| < \xi(t_1 + 0)$ and we have for $t_1 < t < t_2$ a function $g_1(t, x)$ defined by Theorem 5.1;

$$g_1(t, x) = \int_{|y| < \xi(t_1 + 0)} g_1^*(t_1, y)\, p(t_1, y; t, x)\, dy.$$

In the same way we can define a function $g_2^*(t_2, x)$ which will yield a function $g_2(t, x)$ for $t_2 < t < t_3$. This process will ultimately yield a unique function $g_n(t, x)$ for $t > t_n$. Finally

(5.6) $b(z) = \lim_{t \to \infty} \int_{-\xi(t)}^{\xi(t)} g_n(t, x)\, dx$.

It is clear that if $\xi(t) = \infty$ in some of the intervals the successive determination of the functions $g_k(t, x)$ may still be carried forward. This would correspond to an absence of the absorbing barrier over the interval.
Using the relation (5.2) and the above remarks we have the following theorem for the weight function $\psi(u)$:

THEOREM 5.2. Suppose there is a finite sequence $0 = u_0 < u_1 < u_2 < \dots < u_n < u_{n+1} = 1$ such that in the interval $(u_k, u_{k+1}]$ $\psi(t)$ is either (1) identically zero or (2) is bounded away from zero and has a bounded derivative. Then there exists a unique sequence of functions $\{ p_k(t_k, y; t, x) \}$ such that for t in the interval $(u_k/(1 - u_k) = t_k < t < t_{k+1} = u_{k+1}/(1 - u_{k+1}))$ the conclusions of Theorem 5.1 hold for the functions $p_k(t_k, y; t, x)$, k = 0, 1, ..., n, $\xi(t)$ being defined by (5.2).

From this theorem we can generate a set of functions $g_k(t, x)$, $t_k < t < t_{k+1}$, k = 0, 1, ..., n, and another set $g_k^*(t_k, x)$, k = 1, 2, ..., n, as before. $g_{k+1}^*(t_{k+1}, x)$ agrees with $g_k(t_{k+1}, x)$ over the set of x for which the latter is defined; that is, $|x| < \xi(t_{k+1})$, and is zero for other values; namely, $\xi(t_{k+1} + 0) > |x| > \xi(t_{k+1})$ if $\xi(t)$ has a positive jump at $t_{k+1}$. Putting

$$g_k(t, x) = \int_{|y| < \xi(t_k + 0)} g_k^*(t_k, y)\, p_k(t_k, y; t, x)\, dy,$$

we finally have (5.6) for b(z).

In a formal way the problem is thus solved, but the analytical difficulties of getting an explicit solution may be prohibitive. If $\xi(t)$ consists of a set of linear arcs (which implies that $\sqrt{\psi}$ is of the form $(\alpha u + \beta)^{-1}$ in a piecewise way) then b(z) can be determined by quadratures (see, for example, Goursat [8], ch. 29, Ex. 3). We make an application of this remark below.

It is clear that if $\psi(u)$ becomes infinite for some $0 < u < 1$ then b(z) = 0 for every z > 0. But since y(0) = y(1) = 0 it is possible that $\psi(u)$ may become infinite for u = 0 or 1 and still yield a nondegenerate b(z). But in this case it is necessary that $\psi(u)$ not dominate $[2u(1 - u) \log\log 1/(u(1 - u))]^{-1}$ for u near 0 or 1.

We shall consider several examples.
Example 1. Let $\psi(u)$ be a constant over a set of intervals,

$$\psi(u) = q_k, \qquad u_k < u \le u_{k+1}, \quad k = 0, 1, \dots, n.$$

By choosing enough intervals, an arbitrary weight function can be approximated, in a manner of speaking.

It follows that the problem will be essentially solved if we can determine the functions $p_k(t_k, y; t, x)$ of Theorem 5.2. In this case the function $\xi(t)$ becomes, by (5.2),

$$\xi(t) = \frac{z}{\sqrt{q_k}}\, (1 + t), \qquad t_k < t \le t_{k+1},$$

and we must find the solution to equation (5.3) which satisfies the conditions (2) and (3) of Theorem 5.1.

As before we put $t_k = u_k/(1 - u_k)$, and it follows by a classical procedure of superposing an infinite system of sources and sinks along the line $t = t_k$ that we may get the Green's solution. In fact, let us put a source at $t = t_k$, $x = y_j$, of strength $s_j$, where

$$y_j = \frac{2jz}{\sqrt{q_k}}\, (1 + t_k) + (-1)^j y, \qquad s_j = (-1)^j \exp\Bigl\{ -\frac{2z^2}{q_k} (1 + t_k)\, j^2 - \frac{2z}{\sqrt{q_k}}\, y\, (-1)^j j \Bigr\},$$

for $j = 0, \pm 1, \pm 2, \dots$. Then for $t_k < t \le t_{k+1}$ and $|y| < (z/\sqrt{q_k})(1 + t_k)$ we obtain

$$p_k(t_k, y; t, x) = \sum_{j=-\infty}^{\infty} s_j\, \frac{1}{\sqrt{2\pi (t - t_k)}}\, e^{-\frac{1}{2} (x - y_j)^2/(t - t_k)},$$

which may be directly verified by substitution to be a solution. It has been tacitly assumed that $q_k > 0$; if $q_k = 0$ we obtain only the term corresponding to j = 0 in the above solution, namely, the fundamental solution

$$p_k(t_k, y; t, x) = \frac{1}{\sqrt{2\pi (t - t_k)}}\, e^{-\frac{1}{2} (x - y)^2/(t - t_k)}.$$

Now on putting [...] and using the method outlined above, we obtain [...] for $p_k(t_k, x_k; t_{k+1}, x_{k+1})$ as in (5.7), and finally as an "explicit" solution,

$$b_1(z) = \lim_{t \to \infty} \int_{|x| < (z/\sqrt{q_n})(1 + t)} g_n(t, x)\, dx.$$

The resulting function $b_1(z)$ is a multiply infinite sum of integrals of an n-variate Gaussian distribution over an n-dimensional rectangle.
We consider now the following special case of the above result

$$\psi(u) = \begin{cases} 1, & 0 \le a < u \le b \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

Thus the test of the hypothesis is confined to detecting discrepancies over only a central portion of the interval [0, 1]. Using the preceding notation we have n = 2 and [...] and hence

$$y_j = 2jz(t_1 + 1) + (-1)^j x_1, \qquad s_j = (-1)^j \exp\bigl( -2z^2 (t_1 + 1)\, j^2 - 2z x_1 (-1)^j j \bigr),$$

$$p_2(t_2, x_2; t, x) = \frac{1}{\sqrt{2\pi (t - t_2)}}\, e^{-\frac{1}{2} (x - x_2)^2/(t - t_2)}.$$

Thus, putting $b_1(z) = P(a, b, z)$,

$$P(a, b, z) = \int_{-z(1 + t_2)}^{z(1 + t_2)} \int_{-z(1 + t_1)}^{z(1 + t_1)} p_0(0, 0; t_1, x_1)\, p_1(t_1, x_1; t_2, x_2)\, dx_1\, dx_2$$

for $s_j$ and $y_j$ as in (5.8).

The double integral is seen to be over a bivariate normal distribution, and if we let $n(x_1, x_2, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ be the normal bivariate density in $x_1$, $x_2$ with means $\mu_1$, $\mu_2$, variances $\sigma_1^2$, $\sigma_2^2$ and correlation $\rho$ we obtain by rewriting the above integral [...] where [...].

A somewhat simpler way of writing this result is as follows. Let $M(u, v, \xi, \eta, \rho)$ be the volume under the normal bivariate surface with means zero and variances 1 and correlation $\rho$ which is above the rectangle with vertices [...]. Then, remembering that $t_1 = a/(1 - a)$, $t_2 = b/(1 - b)$ and $M(u, v, \xi, \eta, \rho) = M(-u, v, \xi, \eta, -\rho)$, we obtain after a simple transformation of the above integral

(5.9) [...]

There are tables available in which the function M is tabulated; see also Pólya [14]. Also, if either a = 0 or b = 1 then $\rho = 0$ and the function can be calculated with the ordinary univariate Gaussian tables. Putting a = 0, b = 1 simultaneously we obtain Kolmogorov's result

$$b(z) = \sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2 z^2},$$

which has been tabulated [18]. In the general case the convergence is very rapid and good results can be obtained by using a few central terms (in (5.9) the terms corresponding to $\pm j$ are clearly equal).
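
For the unweighted case a = 0, b = 1 the limiting distribution is the classical Kolmogorov series $\sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2z^2}$, which is simple to evaluate directly. The Python sketch below pairs the terms for $\pm j$; the function name and the fixed number of terms are assumptions of this sketch, and the fixed truncation is adequate only when z is not very small (the alternating series converges slowly as z tends to 0).

```python
import math

def kolmogorov_limit_cdf(z, terms=100):
    """b(z) = sum over all integers j of (-1)^j exp(-2 j^2 z^2), truncated."""
    if z <= 0.0:
        return 0.0
    # The j and -j terms are equal, so sum j >= 1 once and double it.
    return 1.0 + 2.0 * sum((-1) ** j * math.exp(-2.0 * j * j * z * z)
                           for j in range(1, terms + 1))
```

As a usage check, `kolmogorov_limit_cdf(1.36)` is close to 0.95, matching the familiar large-sample 5% point of the Kolmogorov statistic.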
The formula (5.9) is in disagreement with a recent announcement (without
proof) of Maniya [13]. Maniya's note appeared subsequent to a restricted paper
by the authors.
By using the general formula above it is possible to get, for example, a weight
function to test discrepancies over only the tails of the distribution, etc.
Example 2. We next investigate

$$\psi(u) = \begin{cases} \dfrac{1}{u(1 - u)}, & 0 < a \le u \le b < 1, \\ 0, & \text{otherwise}, \end{cases}$$

which is the weight function considered before with the $W^2$ test. By an earlier remark we must have a > 0 and b < 1, else absorption is certain and b(z) is degenerate. The transformation (5.2) yields

$$b_2(z) = \Pr\Bigl\{ |X(t)| \le z\sqrt{t},\ \frac{a}{1 - a} \le t \le \frac{b}{1 - b} \Bigr\},$$

where X(t) is the Wiener-Einstein process. Here it appears convenient to make another transformation. Let u(t) be the Uhlenbeck process with correlation parameter $\beta$; that is, u(t) is stationary Gaussian and Markovian with $E(u(s)u(t)) = \exp(-\beta |t - s|)$. Then from the known correspondence (cf. Doob [5])

$$u(t) = e^{-\beta t}\, X(e^{2\beta t})$$

we obtain

$$b_2(z) = \Pr\Bigl\{ |u(t)| \le z,\ \frac{1}{2\beta} \log\frac{a}{1 - a} \le t \le \frac{1}{2\beta} \log\frac{b}{1 - b} \Bigr\},$$

or since the process is strictly stationary

$$b_2(z) = \Pr\Bigl\{ |u(t)| \le z,\ 0 \le t \le \frac{1}{2\beta} \log\frac{b(1 - a)}{a(1 - b)} \Bigr\},$$

which is an absorption probability with a uniform barrier.

The function $b_2(z)$ is of some importance in the theories of communications and statistical equilibrium (cf. Bellman and Harris [1]), and may eventually be tabulated. It seems very difficult to give a complete analysis, but the following partial result is given without proof.

Let $\alpha = \tfrac{1}{2} \log\, (b(1 - a))/(a(1 - b))$ so that $b_2(z)$ is a function of $\alpha$. Then it is possible to find the Laplace transform of $b_2(z)$ in the following form:

[...]

where $D_\nu(z)$ is the Weber function [23]. It seems very difficult to get even any qualitative information from this formula.
REFERENCES
[I] R. BELLMAN
AND T. HARRIS,
"Recurreqce times for the Ehrenfest model," Pacific
Jour. Math., Vol. 1 (1951)) pp. 179-193.
C R A M ~"On
R ,
the composition of elementary errors," Skandinavisk Aktuarie[2] HARALD
lidskrift, Vol. 11 (1928), pp. 13-74, 141-180.
[3] G. DOETSCK,
"Les Bquations aux ddrivdes partielles du type parabolique," Enseignement Math., Vol. 35 (1936), pp. 43-87.
212
T. W. ANDERSON A N D D. A. DARLING
[4] M . D . DONSKER,"Justification and extension of Doob's heuristic approach t o the

Kolmogorov-Smirnov theorems," Annals of Math. Stat., Vol. 23 (1952), pp.
277-281.
[5] J . L. DOOB,"The brownian motion and stochastic equations," Annals of Mathematics,
Vol. 2 (1942), pp. 351-369.
[6] J . L. DOOB,"Heuristic approach to the Kolmogorov-Smirnov theorems," Annals of
Math. Stat., Vol. 20 (1949), pp. 393-403.
[7] R . FORTET,"Les fonctions alBatoires du type de Markoff associBes A certaines Bquations
lindaires aux dBrivBes partielles du type parabolique," Jour. Math. Pures Appl.,
Vol. 22 (1943), pp. 177-243.
[8] E . GOURSAT,
COUTSd'Analyse MathSmatique, Vol. 3, 2nd ed., Gauthier-Villars, Paris,
1915.
[9] A . HAMIMERSTEIN,
"Ober Entwicklungen gegebener Funktionen nach Eigenfunktionen
von Randjvertaufgaben," Math. Zeit., Vol. 27 (1927), pp. 269-311.
[10] M. KAC AND A. J. F. SIEGERT, "An explicit representation of a stationary Gaussian
process," Annals of Math. Stat., Vol. 18 (1947), pp. 438-442.
[11] A. N. KOLMOGOROV, "Sulla determinazione empirica delle leggi di probabilità," Giorn.
Ist. Ital. Attuari, Vol. 4 (1933), pp. 1-11.
[12] P. LÉVY, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris, 1948.
[13] G. M. MANIYA, "Generalization of the criterion of A. N. Kolmogorov," Doklady Akad.
Nauk SSSR (NS), Vol. 69 (1949), pp. 495-497.
[14] G. PÓLYA, "Remarks on computing the probability integral," Proceedings of the
Berkeley Symposium on Mathematical Statistics and Probability, University of
California Press, 1949, pp. 63-78.
[15] N. V. SMIRNOV, "Sur la distribution de ω²," C. R. Acad. Sci. Paris, Vol. 202 (1936),
p. 449.
[16] N. V. SMIRNOV, "On the distribution of the ω² criterion," Rec. Math. (Mat. Sbornik)
(NS), Vol. 2 (1937), pp. 973-993.
[17] N. V. SMIRNOV, "On the deviation of the empirical distribution function," Rec. Math.
(Mat. Sbornik) (NS), Vol. 6 (1939), pp. 3-26.
[18] N. V. SMIRNOV, "Table for estimating goodness of fit of empirical distributions,"
Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.
[19] R. VON MISES, Wahrscheinlichkeitsrechnung, Deuticke, Vienna, 1931.
[20] R. VON MISES, "Differentiable statistical functions," Annals of Math. Stat., Vol. 18
(1947), pp. 309-318.
[21] A. WALD AND J. WOLFOWITZ, "Confidence limits for continuous distribution functions,"
Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.
[22] G. N. WATSON, A Treatise on the Theory of Bessel Functions, Cambridge University
Press, 1922.
[23] E. T. WHITTAKER AND G. N. WATSON, A Course of Modern Analysis, Cambridge
University Press, 1927.
