
Explaining the Saddlepoint Approximation

Constantino GOUTIS and George CASELLA

Saddlepoint approximations are powerful tools for obtaining accurate expressions for densities and distribution functions. We give an elementary motivation and explanation of approximation techniques, starting with Taylor series expansions and progressing to the Laplace approximation of integrals. These approximations are illustrated with examples of the convolution of simple densities. We then turn to the saddlepoint approximation and, using both the Fourier inversion formula and Edgeworth expansions, we derive the saddlepoint approximation to the density of a single random variable. We next approximate the density of the sample mean of iid random variables, and also demonstrate the technique for approximating the density of a maximum likelihood estimator in exponential families.

KEY WORDS: Edgeworth expansions; Fourier transform; Laplace method; Maximum likelihood estimators; Moment-generating functions; Taylor series.

1. INTRODUCTION

The saddlepoint approximation has been a valuable tool in asymptotic analysis. Various techniques of accurate approximation, relying on it in one way or another, have been developed since the seminal article by Daniels (1954). Reid (1988, 1991) gave a comprehensive review of the applications and a broad coverage of the relevant literature.

The number of applications of the saddlepoint approximation is quite impressive, as befits this extremely powerful approximation (see Section 5 for a partial list). Typically, derivations and implementations of saddlepoint approximations rely on tools such as exponential tilting, Edgeworth expansions, Hermite polynomials, complex integration, and other advanced notions. Although these are important tools for researchers in the area, they may obscure the fundamental idea of the saddlepoint approximation. A goal of this article is to illustrate that there is a simple basic idea behind this useful technique. Namely, write the quantity one wishes to approximate as an integral, expand the integrand with respect to the dummy variable of integration, keep the first few terms, and integrate. The integral can be over the complex plane, corresponding to the inversion formula of a Fourier transform, but this is a secondary point.

We start with an elementary motivation of the technique, stressing familiar Taylor series expansions. At the beginning we will somewhat ignore the statistical applications, because saddlepoint approximations are general techniques, and quite often references to random variables and distributions may be more confusing than illuminating. Once the approximation is developed, however, we will examine some statistical applications. Throughout the article, we assume that the functions are as regular as needed. In other words, when we write a derivative or an integral, we assume that it exists. Furthermore, we develop the methods in the univariate case. This is almost without loss of generality, as the multivariate case is essentially the same but with somewhat more complicated notation.

To keep the technical level reasonable, we avoid any rigorous asymptotic analysis, but a few remarks are in order. The accuracy of an approximation is assessed by examining the size of the error of approximation. We use the notation $O(a_n)$, which denotes a function satisfying $\lim_{n \to \infty} a_n^{-1} O(a_n) = \text{constant}$; for a random sample of size $n$, standard techniques typically give approximations of order $O(n^{-1/2})$. The saddlepoint can improve this to $O(n^{-1})$, and even $O(n^{-3/2})$ in some circumstances. Unfortunately, calculation of these error terms often requires detailed technical arguments, and we do not include them here (see Field and Ronchetti 1990 or Kolassa 1994). This omission does not speak to the importance of such calculations. Indeed, in application, the size of the approximation error is perhaps the most important concern, and this error must be assessed through either analytical or numerical means.

In Section 2 we introduce approximation techniques from the point of view of Taylor series and Laplace approximations, and give some examples. Section 3 is an attempt to explain the original derivation of the saddlepoint, which has its roots in Fourier transforms and complex analysis, along with another derivation based on Edgeworth expansions. Those unwilling to wade through these derivations need only look at formula (25), which gives the density approximation for a sample mean. Section 4 treats the case of the MLE in exponential families, with the important formulas being (34) and (36). Section 5 contains a short discussion.

George Casella is Liberty Hyde Bailey Professor of Biological Statistics, Department of Biometrics, Cornell University, 434 Warren Hall, Ithaca, NY 14853. The authors thank Luis Tenorio for useful discussions, and the reviewers for providing detailed comments on earlier versions of this article, which resulted in a much improved presentation. This research was supported by NSF Grants DMS 9305547 and DMS 9625440, and this is paper BU-1311-M in the Biometrics Unit, Cornell University, Ithaca, NY 14853. The original version of this article was written in December 1995, before the tragic death of Costas Goutis in July 1996.

2. FIRST EXPANSIONS

We begin by looking at some basic principles of approximation, using the familiar tool of the Taylor expansion. As we will see, the underlying strategy of this approximation carries through to more sophisticated approximations. Note, however, that the approximations in this section can have large errors. As we are working with densities of single random variables rather than means, the order of the error can be expected to be about $O(1)$.
2.1 From Taylor to Laplace

Perhaps the simplest way to approximate a positive function $f(x)$ is to use the first few terms of its Taylor series expansion. We will use that idea, but not for $f(x)$ itself but rather for $h(x) = \log f(x)$. Writing $f(x) = \exp h(x)$ and choosing $x_0$ as the point to expand around, we obtain

\[
f(x) \approx \exp\left\{ h(x_0) + (x - x_0)\, h'(x_0) + \frac{(x - x_0)^2}{2}\, h''(x_0) \right\}. \tag{1}
\]

The above approximation simplifies if we choose $x_0 = \hat{x}$, where $h'(\hat{x}) = 0$. The second term disappears and we have

\[
f(x) \approx \exp\left\{ h(\hat{x}) + \frac{(x - \hat{x})^2}{2}\, h''(\hat{x}) \right\}. \tag{2}
\]

The approximation (2) will be exact if $h(x)$ is a quadratic function. If not, obviously for an $x$ far from $\hat{x}$, the omitted terms of order $(x - \hat{x})^3$ and higher will be important and the approximation will not be good.

Although the approximation (2) may be useful in its own right, it also can be used for computing integrals of positive functions, such as $\int f(x)\,dx$. Expanding the integrand as in (2), we obtain

\[
\int f(x)\,dx \approx \int \exp\left\{ h(\hat{x}) + \frac{(x - \hat{x})^2}{2}\, h''(\hat{x}) \right\} dx. \tag{3}
\]

If $\hat{x}$ is a maximum, $h''(\hat{x})$ is negative and the right side of (3) can be explicitly computed by recognizing that the kernel of the integral is the same as the kernel of a normal density with mean $\hat{x}$ and variance $-1/h''(\hat{x})$. Hence

\[
\int f(x)\,dx \approx \exp\{h(\hat{x})\} \left( \frac{2\pi}{-h''(\hat{x})} \right)^{1/2}. \tag{4}
\]

The above technique is called the Laplace approximation and, as its name suggests, it has been known for ages. It implicitly requires that the integral is computed over the whole line, but it may be sufficiently accurate as long as the mass of the approximating function is within the limits of the integration.
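The Laplace approximation (4) is easy to try numerically. The following minimal Python sketch (the names h, xhat, and a are our own illustrative choices, not part of the formulas above) applies (4) to the unnormalized gamma kernel $f(x) = x^{a-1}e^{-x}$, whose exact integral is $\Gamma(a)$; the result is nothing but Stirling's approximation.

```python
# A minimal sketch of the Laplace approximation (4): the integrand is the
# unnormalized gamma kernel f(x) = x**(a-1) * exp(-x), whose exact integral
# is Gamma(a).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gamma

a = 6.0
h = lambda x: (a - 1.0) * np.log(x) - x            # h(x) = log f(x)

# locate xhat with h'(xhat) = 0 (analytically xhat = a - 1; we solve numerically)
xhat = minimize_scalar(lambda x: -h(x), bounds=(1e-8, 100.0), method="bounded").x

eps = 1e-4                                          # numerical h''(xhat)
h2 = (h(xhat + eps) - 2.0 * h(xhat) + h(xhat - eps)) / eps ** 2

laplace = np.exp(h(xhat)) * np.sqrt(2.0 * np.pi / -h2)   # formula (4)
print(laplace, gamma(a))   # about 118.0 versus 120.0: Stirling's approximation
```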
The next, perhaps most natural, step is to combine the two methods (2) and (4), that is, to try to approximate a function by a Laplace-type approximation of an integral. We first write the function $f$ as

\[
f(x) = \int m(x, t)\, dt, \tag{5}
\]

for some positive $m(x, t)$. This is always possible by considering, for example, $m(x, t) = f(x)\, m_0(t)$, where $m_0(t)$ is a function integrating to one, but the latter representation is not particularly useful or illuminating. (We will later see a number of cases where the representation (5) arises fairly naturally.)

By defining $k(x, t) = \log m(x, t)$ we consider the Laplace approximation of the integral of $\exp k(x, t)$ with respect to the variable $t$. For any fixed $x$, we write

\[
f(x) = \int \exp\{k(x, t)\}\, dt \approx \exp\{k(x, \hat{t}(x))\}
\left( \frac{2\pi}{ -\,\partial^2 k(x, t)/\partial t^2 \big|_{\hat{t}(x)} } \right)^{1/2}, \tag{6}
\]

where, for each $x$, $\hat{t}(x)$ satisfies $\partial k(x, t)/\partial t = 0$ and $\partial^2 k(x, t)/\partial t^2 < 0$, and hence maximizes $k(x, t)$.

In the above expressions, the notation makes it explicit that the maximum $\hat{t}$ depends on $x$. In that sense, (6) is a set of integrated Taylor expansions, one for each $x$, as opposed to (2), which is a single series around $\hat{x}$. Because we are continually recentering the approximation, there is hope that it is more accurate than (2). However, there is a price to be paid: if we want the values of $f(x)$ at various points $x$, we must compute $\hat{t}(x)$, $k(x, \hat{t}(x))$, and $\partial^2 k(x, t)/\partial t^2$ afresh each time.

It is clear that (6) will be exact if $k(x, t)$ is a quadratic function of $t$ for each $x$ but, as we will see, there are other cases where the approximation is also exact. However, the choice of the form of $k(x, t)$ usually is determined by other considerations. Furthermore, it is worth noting that the accuracy of (6) depends on the value of $x$ where we want to approximate $f(x)$ because, in general, the omitted third and higher order derivatives with respect to $t$ depend on $x$.

We illustrate the foregoing approximations by deriving the distribution of the sum of two random variables by approximating the convolution integral.

Example 1: Gamma Distribution. We first look at a simple, somewhat artificial example. Suppose that the function we wish to approximate is a gamma density with shape parameter equal to $2\alpha$ (with $\alpha > 1$) and scale parameter equal to 1; that is,

\[
f_X(x) = \frac{1}{\Gamma(2\alpha)}\, x^{2\alpha - 1} \exp(-x) \tag{7}
\]

for positive $x$. The Taylor series approximation is obtained by applying (2). The maximum is achieved at $\hat{x} = 2\alpha - 1$ and the second derivative of the logarithm of (7) evaluated at $\hat{x}$ is equal to $-1/(2\alpha - 1)$. Hence

\[
f_X(x) \approx \frac{1}{\Gamma(2\alpha)} \exp\left\{ (2\alpha - 1)\left[\log(2\alpha - 1) - 1\right]
- \frac{(x - (2\alpha - 1))^2}{2(2\alpha - 1)} \right\}. \tag{8}
\]

Recalling that $f(x)$ is actually a density function, we can renormalize the approximation by calculating the constant of the right side of (8) so that $\int f(x)\,dx = 1$. Doing this, we obtain

\[
f_X(x) \approx \left[2\pi(2\alpha - 1)\right]^{-1/2} \exp\left\{ -\frac{(x - (2\alpha - 1))^2}{2(2\alpha - 1)} \right\}, \tag{9}
\]

a normal density with mean and variance both equal to $2\alpha - 1$.

We can probably write $f(x)$ in the form (5) in several ways, but a simple one is motivated from elementary distribution theory. We know that the sum of two independent $\Gamma(\alpha, 1)$ random variables is a $\Gamma(2\alpha, 1)$ random variable, so if $g(\cdot)$ is the $\Gamma(\alpha, 1)$ density then $f(x)$ is a convolution of the form

\[
f(x) = \int_0^x g(x - y)\, g(y)\, dy
= \int_0^x \frac{\exp(-x)}{\Gamma(\alpha)^2}\, (x - y)^{\alpha - 1} y^{\alpha - 1}\, dy
= \frac{\exp(-x)}{\Gamma(\alpha)^2} \int_0^x \exp\left\{ (\alpha - 1)\left[\log(x - y) + \log y\right] \right\} dy. \tag{10}
\]

The exponent in the integrand is maximized for $y = x/2$, and its second derivative with respect to $y$ is equal to $-(\alpha - 1)\left\{(x - y)^{-2} + y^{-2}\right\}$. Applying (6) we obtain

\[
\int_0^x \exp\left\{ (\alpha - 1)\left[\log(x - y) + \log y\right] \right\} dy
\approx \left( \frac{\pi}{\alpha - 1} \right)^{1/2} \left( \frac{x}{2} \right)^{2\alpha - 1}. \tag{11}
\]

Substituting in (10), we see that the kernel of the approximating function is that of a gamma density with shape parameter equal to $2\alpha$ and scale parameter equal to one. Thus, if we renormalize it so that it integrates to one, we obtain the $\Gamma(2\alpha, 1)$ density exactly.
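A short numerical check of Example 1 (a sketch under the same setup, with our own variable names): applying (6) to the convolution integral (10) and comparing with the exact $\Gamma(2\alpha, 1)$ density shows a relative error that is constant in $x$, so renormalization makes the approximation exact, as claimed.

```python
# Laplace approximation (6) to the convolution (10) of two Gamma(alpha, 1)
# densities, versus the exact Gamma(2*alpha, 1) density.
import numpy as np
from scipy.special import gamma
from scipy.stats import gamma as gamma_dist

alpha = 3.0
x = np.linspace(0.5, 15.0, 30)

yhat = x / 2.0                                         # maximizer of the exponent
k = (alpha - 1.0) * (np.log(x - yhat) + np.log(yhat))  # exponent at yhat
k2 = -(alpha - 1.0) * ((x - yhat) ** -2 + yhat ** -2)  # its second derivative
approx = np.exp(-x) / gamma(alpha) ** 2 * np.exp(k) * np.sqrt(2.0 * np.pi / -k2)

exact = gamma_dist(2.0 * alpha).pdf(x)
print(approx / exact)   # constant in x (about 1.175 for alpha = 3), so the
                        # renormalized approximation is exactly Gamma(2*alpha, 1)
```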
It is interesting to note that the convolution formula (10) can be solved for the maximum in some generality. If we write

\[
k(x, y) = \log[g(x - y)\, g(y)] = \log[g(x - y)] + \log[g(y)],
\]

then under mild regularity conditions, $\partial k(x, y)/\partial y$ has a zero at $y = x/2$, and we can apply the convolution formula somewhat straightforwardly to other densities.

Example 2: Student's t Distribution. Let $X$ be a random variable with a Student's $t$ distribution with $\nu$ degrees of freedom ($X \sim t_\nu$). Then $X$ has density

\[
f(x) = \frac{C}{\left(1 + x^2/\nu\right)^{(\nu + 1)/2}},
\quad \text{where } C = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}}.
\]

To approximate the density of the sum $T = X_1 + X_2$, where $X_1$ and $X_2$ are independent $t_\nu$ random variables, we use (10) and find the zeros of the $y$-derivative of

\[
k(x, y) = \log\left(1 + (x - y)^2/\nu\right) + \log\left(1 + y^2/\nu\right),
\]

which carries the $y$-dependence of the logarithm of the integrand (up to the factor $-(\nu + 1)/2$). The three solutions of $\partial k(x, y)/\partial y = 0$ are $y = x/2$ and $y = \left(x \pm \sqrt{x^2 - 4\nu}\right)/2$. For $x^2 < 4\nu$ the last two roots are complex, and $y = x/2$ maximizes the integrand. Applying (6) we can approximate the density of $T$ by

\[
f_T(x) \approx C^2 \left[ \frac{\pi\,(4\nu + x^2)^2}{4(\nu + 1)(4\nu - x^2)} \right]^{1/2}
\left( 1 + \frac{x^2}{4\nu} \right)^{-(\nu + 1)}. \tag{12}
\]

To evaluate the approximation, we compare it to the exact density of $T$ for $\nu = 9$, which is

\[
f_T(x) = C^2\,\frac{15\pi}{64}\;
\frac{622336 + 56576\left(\frac{x^2}{9}\right) + 48496\left(\frac{x^2}{9}\right)^2
+ 272\left(\frac{x^2}{9}\right)^3 + 7\left(\frac{x^2}{9}\right)^4}{\left(\frac{x^2}{9} + 4\right)^9}. \tag{13}
\]

(The exact density of the sum of two iid $t_\nu$'s was found with Roger Berger during a fit of Mathematica-induced frenzy.)

Figure 1 compares the approximation (12), and its renormalized version, with the exact density. The agreement is quite good.

[Figure 1. Exact density of the sum of two $t_9$ random variables (solid line), together with the approximation (12) (short dashes) and its renormalized version (long dashes). The exact density and the renormalized approximation are virtually the same in the tails.]
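Example 2 is just as easy to check numerically. The sketch below (names ours; it differentiates $k$ numerically rather than using the closed form behind (12)) compares the Laplace approximation of the convolution with the convolution computed by quadrature, for $x^2 < 4\nu$ where $y = x/2$ is the maximizer.

```python
# Laplace approximation (6) for the density of the sum of two independent
# t_v random variables, checked against numerical convolution.
import numpy as np
from scipy.stats import t as t_dist
from scipy.integrate import quad

v = 9.0
f = t_dist(v).pdf

def laplace_sum_pdf(x, g):
    # expand k(y) = log g(x-y) + log g(y) around the maximizer y = x/2
    k = lambda y: np.log(g(x - y)) + np.log(g(y))
    yhat, eps = x / 2.0, 1e-5
    k2 = (k(yhat + eps) - 2.0 * k(yhat) + k(yhat - eps)) / eps ** 2
    return np.exp(k(yhat)) * np.sqrt(2.0 * np.pi / -k2)

for x in [0.0, 1.0, 2.0, 3.0]:    # x**2 < 4*v, so y = x/2 is the maximum
    exact = quad(lambda y: f(x - y) * f(y), -np.inf, np.inf)[0]
    print(x, laplace_sum_pdf(x, f), exact)
```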
2.2 Approximating Marginals

Another application, where a function is naturally represented by an integral, arises in the calculation of marginal densities. For example, if $(X, Y) \sim f(x, y)$, then by a direct application of (6) the marginal density of $X$, $f_X(x)$, can be approximated by

\[
f_X(x) = \int f(x, y)\, dy \approx f(x, \hat{y})
\left( \frac{2\pi}{ -\,\partial^2 \log f(x, y)/\partial y^2 \big|_{\hat{y}} } \right)^{1/2}, \tag{14}
\]

where $\hat{y} = \hat{y}(x)$ satisfies $\partial \log f(x, y)/\partial y = 0$ and is a local maximum. It is easy to see that by repeating this approximation, we can also marginalize higher dimensional densities.

Formula (14) can also be applied to integrated likelihoods, a desirable enterprise in the presence of nuisance parameters. Given a likelihood function $L(\theta, \lambda \mid x)$, where $\theta$ is the parameter of interest, $\lambda$ is a nuisance parameter, and $x$ is the data, an integrated likelihood $L_I$ for $\theta$ is obtained by integrating out $\lambda$. This leads to the approximation

\[
L_I(\theta \mid x) = \int L(\theta, \lambda \mid x)\, d\lambda
\approx L(\theta, \hat{\lambda}_\theta \mid x)\,(2\pi)^{1/2}
\left( -\,\frac{\partial^2 \log L(\theta, \lambda \mid x)}{\partial \lambda^2}\bigg|_{\hat{\lambda}_\theta} \right)^{-1/2}, \tag{15}
\]

where $\hat{\lambda}_\theta$ maximizes $L(\theta, \lambda \mid x)$ in $\lambda$ for fixed $\theta$. This approximation is the Cox and Reid (1987) approximate conditional likelihood, as first noted by Sweeting (1987), and can be considered a version of the modified profile likelihood of Barndorff-Nielsen (1983). The factor that is "adjusting" the likelihood is the observed Fisher information. See Barndorff-Nielsen (1988) for an alternate derivation and further discussion.
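As a concrete instance of (15), consider a normal model with mean $\theta$ (the parameter of interest) and variance $\lambda$ (the nuisance). The sketch below (the simulated data and all names are our own choices) compares the Laplace-integrated likelihood with the exact integral; in this model the two turn out to be exactly proportional, so the approximation recovers the shape of $L_I$.

```python
# Laplace approximation (15) to an integrated likelihood: X_i ~ N(theta, lam),
# with the nuisance variance lam integrated out numerically for comparison.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=20)
n = len(x)

def loglik(theta, lam):
    return -0.5 * n * np.log(lam) - np.sum((x - theta) ** 2) / (2.0 * lam)

for theta in [0.0, 0.5, 1.0, 1.5]:
    lam_hat = np.sum((x - theta) ** 2) / n        # maximizer of L in lam
    eps = 1e-5 * lam_hat                          # numerical second derivative
    d2 = (loglik(theta, lam_hat + eps) - 2.0 * loglik(theta, lam_hat)
          + loglik(theta, lam_hat - eps)) / eps ** 2
    approx = np.exp(loglik(theta, lam_hat)) * np.sqrt(2.0 * np.pi / -d2)
    exact = quad(lambda lam: np.exp(loglik(theta, lam)), 0.0, np.inf)[0]
    print(theta, approx / exact)                  # a constant ratio in theta
```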
3. THE REAL THING

The examples in the previous section are, of course, rather artificial. The function $f(x)$ has a closed form and there is no need to use any approximations. Indeed, we do not gain anything, as the approximating functions are equally as complicated as the original one. As one might expect, this is not always the case. In this section we look at a number of important statistical problems where saddlepoint approximations are useful.

The first statistical application of the saddlepoint approximation was derived by Daniels (1954). He approached the problem of finding a density approximation through the inversion of a Fourier transform. Such an approach has the advantage of automatically providing a function $m(\cdot)$ satisfying (5), but also carries the disadvantage of making us deal with complex integration. We begin with some details about the inversion formula.

3.1 The Inversion Formula

We recall that for a density $f(x)$, the moment generating function (mgf) is defined as

\[
\phi_X(t) = \int_{-\infty}^{+\infty} \exp(tx)\, f(x)\, dx, \tag{16}
\]

provided that the integral is finite for $t$ in some open neighborhood of zero. (That is, there exists $c > 0$ for which $\phi_X(t) < \infty$ for $-c < t < c$.) From $\phi_X(t)$, we can obtain $f(x)$ by using the inversion formula (Feller 1971, chap. XV; Billingsley 1995, sec. 26)

\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \phi_X(it) \exp(-itx)\, dt
= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \exp\left\{ K_X(it) - itx \right\} dt, \tag{17}
\]

where $i = \sqrt{-1}$ and we have defined $K_X(t) = \log \phi_X(t)$. These formulas are common in statistical contexts where $f(x)$ is a density and $\phi_X(it)$ is the characteristic function. [Waller, Turnbull, and Hardin (1995) discussed exact numerical inversion of (17).] The function $K_X(t) = \log \phi_X(t)$ is also called the cumulant generating function (cgf) of a random variable $X$. (Mathematically, the cgf and mgf are equivalent. The fact that the cgf generates the mean and variance, instead of the uncentered moments, is statistically more appealing. We can think of $K''$ as a variance.) However, we do not have to think of a random variable at all, and (17) is applicable even if $f(x)$ is negative or does not integrate to one. There is a complication in that we have to deal with complex rather than real numbers, but (17) is similar to (5), and this suggests that we can use the same ideas as in the previous section.

We first make a change of variable $t' = it$. We can then write (17) as

\[
f(x) = \frac{1}{2\pi i} \int_{\tau - i\infty}^{\tau + i\infty} \exp\left\{ K_X(t) - tx \right\} dt \tag{18}
\]

for $\tau$ in a neighborhood of zero. It is a consequence of a theorem from complex analysis (the closed curve theorem) that the integral (18) is the same over all paths that are parallel to the imaginary axis in a neighborhood of zero where $\phi_X(t)$ exists. Thus, we are free to choose a value for $\tau$ over which to do the integration.

We take $k(x, t) = K_X(t) - tx$ and, as in (6), we find the point $\hat{t}(x)$ that satisfies

\[
K_X'(t) = x. \tag{19}
\]

Expanding the exponent in (18) around $\hat{t}(x)$ we have

\[
K_X(t) - tx \approx K_X(\hat{t}(x)) - \hat{t}(x)\,x
+ \frac{(t - \hat{t}(x))^2}{2}\, K_X''(\hat{t}(x)). \tag{20}
\]

We now substitute in (18) and integrate with respect to $t$ along the line parallel to the imaginary axis through the point $\hat{t}(x)$; that is, we choose the point $\tau$ in the limits of the integral to be $\hat{t}(x)$. To treat this maneuver with rigor requires great care (as in Daniels 1954 or Field and Ronchetti 1990, chap. 3), but if we proceed informally (as if it were a real integral) we again see that there is the kernel of a normal density. Similar to (6) we obtain

\[
f_X(x) \approx \left( \frac{1}{2\pi K_X''(\hat{t}(x))} \right)^{1/2}
\exp\left\{ K_X(\hat{t}(x)) - \hat{t}(x)\,x \right\}. \tag{21}
\]

Viewed as a point in the complex plane, $\hat{t}(x)$ is neither a maximum nor a minimum but a saddlepoint of $K_X(t) - tx$, as the function is constant in the imaginary direction and has an extremum in the real direction. (See Field and Ronchetti 1990, sec. 3.2, for details.) Note that here this extremum is a minimum, and to do the integration here we require $K_X''(\hat{t}(x)) > 0$ (compare to (3), where we needed $\hat{x}$ to be a maximum). [It follows from the Cauchy-Schwarz inequality that $K'' > 0$. Daniels (1954, sec. 6) gave conditions under which the equation $K_X'(t) = x$ has a unique root.] We also note that choosing $\tau = \hat{t}(x)$ in (18) is an application of a method called steepest ascent. This approximation method takes advantage of the fact that, since $\hat{t}(x)$ is an extreme point, the function falls away rapidly as we move from this point. Thus, the influence on the integral of neighboring points is diminished, making the approximation (21) seem more reasonable. (See Field and Ronchetti 1990, sec. 3.2, or Kolassa 1994, sec. 4.4, for details.)

Expression (21) is what is commonly thought of as the saddlepoint approximation to a density. Its error of approximation is much better than that of the Taylor series approximation to a function. In classical, or "first-order," asymptotics, the error terms usually decrease at the rate $n^{-1/2}$ for a sample of size $n$. The saddlepoint is "second-order" asymptotics, and can have error terms that decrease as fast as $n^{-3/2}$, which yields a big improvement in accuracy for small samples. We return to this point in Section 3.2.
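Formula (21) is simple to use whenever the cgf is available. The following sketch (names ours) solves the saddlepoint equation (19) numerically for the cgf $K(t) = -a\log(1 - t)$ of a $\Gamma(a, 1)$ random variable and evaluates (21); the ratio to the exact density is constant in $x$, illustrating a case where the renormalized approximation is exact (the gamma case of Daniels 1980).

```python
# A sketch of the saddlepoint density (21) driven by a cgf: here
# K(t) = -a*log(1 - t) for a Gamma(a, 1) random variable, with the
# saddlepoint equation (19) solved numerically.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import gamma as gamma_dist

a = 4.0
K  = lambda t: -a * np.log(1.0 - t)
K1 = lambda t: a / (1.0 - t)             # K'(t)
K2 = lambda t: a / (1.0 - t) ** 2        # K''(t)

def saddlepoint_pdf(x):
    that = brentq(lambda t: K1(t) - x, -50.0, 1.0 - 1e-10)   # K'(that) = x
    return np.exp(K(that) - that * x) / np.sqrt(2.0 * np.pi * K2(that))

for x in [1.0, 3.0, 6.0, 10.0]:
    print(x, saddlepoint_pdf(x), gamma_dist(a).pdf(x))
# the ratio is constant in x, so the renormalized version is exact here
```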
Example 3: Noncentral Chi-Squared. Hougaard (1988) presented an interesting application of the saddlepoint approximation, of which the following is a special case. The noncentral chi-squared density has no closed form, and is usually written

\[
f(x) = \sum_{k=0}^{\infty} \frac{x^{p/2 + k - 1}\, e^{-x/2}}{\Gamma(p/2 + k)\, 2^{p/2 + k}}\;
\frac{\lambda^k e^{-\lambda}}{k!}, \tag{22}
\]

where $p$ is the degrees of freedom and $\lambda$ is the noncentrality parameter. The density is an infinite mixture of central chi-squared densities, where the weights are Poisson probabilities. It turns out that calculation of the moment-generating function is simple, and it can be expressed in closed form as

\[
\phi_X(t) = \frac{e^{2\lambda t/(1 - 2t)}}{(1 - 2t)^{p/2}}. \tag{23}
\]

Solving the saddlepoint equation $\partial \log \phi_X(t)/\partial t = x$ yields the saddlepoint

\[
\hat{t}(x) = \frac{2x - p - \sqrt{p^2 + 8\lambda x}}{4x}, \tag{24}
\]

and applying (21) yields the approximate density. For $p = 7$ and $\lambda = 5$, Figure 2 displays the saddlepoint, renormalized saddlepoint, and exact densities. Here, the saddlepoint and the renormalized saddlepoint are remarkably accurate.

[Figure 2. Exact density of the noncentral chi-squared distribution with $p = 7$ and $\lambda = 5$ (solid line), together with the saddlepoint approximation (short dashes) and renormalized saddlepoint approximation (long dashes). The renormalized saddlepoint approximation is virtually the same as the exact density.]
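The computations of Example 3 take only a few lines. In the sketch below (names ours), note that scipy parameterizes the noncentrality as $2\lambda$ relative to the Poisson rate $\lambda$ used in (22); this is an implementation detail of the library, not of the approximation.

```python
# A sketch of Example 3: the saddlepoint density (21) for the noncentral
# chi-squared, with cgf from (23) and the explicit root (24).
import numpy as np
from scipy.stats import ncx2

p, lam = 7.0, 5.0
K  = lambda t: 2.0 * lam * t / (1.0 - 2.0 * t) - 0.5 * p * np.log(1.0 - 2.0 * t)
K2 = lambda t: 8.0 * lam / (1.0 - 2.0 * t) ** 3 + 2.0 * p / (1.0 - 2.0 * t) ** 2

def saddlepoint_pdf(x):
    that = (2.0 * x - p - np.sqrt(p ** 2 + 8.0 * lam * x)) / (4.0 * x)   # (24)
    return np.exp(K(that) - that * x) / np.sqrt(2.0 * np.pi * K2(that))

for x in [5.0, 10.0, 17.0, 30.0]:
    print(x, saddlepoint_pdf(x), ncx2.pdf(x, p, 2.0 * lam))  # nc = 2*lam
```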
3.1.1 Saddlepoints for Sums

A useful application of the saddlepoint is the approximation of the density of a sum or average. Perhaps the simplest nontrivial example, which is also the oldest one (Daniels 1954), is the derivation of the density of a sample mean of independent and identically distributed random variables. The key here is that the moment generating function of a sum of iid random variables can be easily computed from the original moment generating function, and (21) can be directly applied.

Consider $\bar{X}$ to be the sample mean of $X_1, X_2, \ldots, X_n$, iid random variables. Each $X_i$ has a moment generating function $\phi_X(t)$ and a cumulant generating function $K_X(t)$. An elementary statistical argument shows that the moment generating function of $\bar{X}$ is $\phi_{\bar{X}}(t) = \phi_X(t/n)^n$ and the cumulant generating function is $K_{\bar{X}}(t) = nK_X(t/n)$. A direct application of (21) then gives

\[
f_{\bar{X}}(\bar{x}) \approx \left( \frac{n}{2\pi K_X''(\hat{t}(\bar{x}))} \right)^{1/2}
\exp\left\{ n\left[ K_X(\hat{t}(\bar{x})) - \hat{t}(\bar{x})\,\bar{x} \right] \right\}, \tag{25}
\]

where $\hat{t}(\bar{x})$ satisfies $K_X'(\hat{t}(\bar{x})) = \bar{x}$. The right side of expression (25) is the saddlepoint approximation to the density of $\bar{X}$. Of course, there are several loose ends in the derivation which should be formalized in order for it to be legitimate. Daniels (1954, 1987) presented all the details. Note the fact that we are dealing with densities, and random variables enter only in the derivation of the cumulant generating function $K_X(t)$ from the cumulants of the individual random variables. We can also appeal to it to renormalize $f_{\bar{X}}(\bar{x})$ so that it integrates to 1, which amounts to adjusting the constant $(n/(2\pi))^{1/2}$. This typically requires numerical integration.

The saddlepoint approximation can also be used on discrete distributions, as the next example shows.

Example 4: Poisson Distribution. Let $X_1, \ldots, X_n$ be iid from the Poisson distribution with mean $\lambda$. The cumulant generating function of $X_i$ is $K_X(t) = \lambda(\exp(t) - 1)$, which yields $\hat{t}(\bar{x}) = \log(\bar{x}/\lambda)$ as the saddlepoint. The formula (25) can be used directly, but now the mean can take only the values $\bar{x} = r/n$ for integer $r$. Substituting, we obtain

\[
f_{\bar{X}}(\bar{x}) \approx \left( \frac{n}{2\pi\bar{x}} \right)^{1/2}
\exp\left\{ n\left[ \bar{x} - \lambda - \bar{x}\log\!\left(\bar{x}/\lambda\right) \right] \right\}
= \frac{n\, e^{-n\lambda} (n\lambda)^r}{\sqrt{2\pi}\; r^{r + 1/2}\, e^{-r}}.
\]

This amounts to replacing the factorial in the exact distribution of $\bar{x}$ by Stirling's approximation.
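A quick check of Example 4 (a sketch, with our own variable names): evaluate (25) at the support points $\bar{x} = r/n$ and compare with the exact rescaled Poisson probabilities $n\,P(\sum_i X_i = r)$.

```python
# Saddlepoint approximation (25) for the mean of n iid Poisson(lam) variables.
import numpy as np
from scipy.stats import poisson

n, lam = 10, 2.0
for r in [5, 10, 20, 30, 40]:
    xbar = r / n
    that = np.log(xbar / lam)                   # saddlepoint: K'(t) = xbar
    Kval = lam * (np.exp(that) - 1.0)           # K(that)
    K2 = xbar                                   # K''(that) = lam * exp(that)
    approx = np.sqrt(n / (2.0 * np.pi * K2)) * np.exp(n * (Kval - that * xbar))
    exact = n * poisson.pmf(r, n * lam)         # lattice density of the mean
    print(r, approx, exact)                     # agrees to Stirling accuracy
```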
It is interesting to compare the approximation (25) with the simpler one obtained by expanding the integrand in the inversion formula around 0 rather than around its maximum. If we keep the first two terms we obtain

\[
f_{\bar{X}}(\bar{x}) = \frac{n}{2\pi} \int \exp\left\{ n\left[ K_X(it) - it\bar{x} \right] \right\} dt
\approx \frac{n}{2\pi} \int \exp\left\{ n\left[ K_X(0) + \left(K_X'(0) - \bar{x}\right)(it)
- \frac{K_X''(0)\, t^2}{2} \right] \right\} dt.
\]

This corresponds to the inversion of the quadratic cumulant generating function $K_X'(0)\,t + K_X''(0)\,t^2/(2n)$, and it can easily be seen that we have nothing more or less than the central limit theorem. So the gain in accuracy of the saddlepoint approximation over the central limit theorem can be attributed to a more clever expansion of the logarithm of the integrand. This expansion is around a value that depends on $\bar{x}$, the value at which we would like to approximate the density, rather than around 0. The implementation of such recentering, and its advantages, will become more apparent in the next section.

3.2 The Edgeworth Approach

Although the original saddlepoint derivation of Daniels (1954) was based on inversion of the characteristic function, there is an alternate derivation, which Reid (1988) called "a more statistical version," based on Edgeworth expansions. In many ways it is similar to the original Daniels derivation (as it involves inversion of an integral), but it has the advantage of helping us make our statements about the "order of the approximation" somewhat clearer. Moreover, it provides a more precise picture of how the greater accuracy is achieved.

An Edgeworth expansion of a distribution is accomplished by first expanding the cumulant generating function in a Taylor series around 0, and then inverting (see Feller 1971, chap. XVI, or Stuart and Ord 1987, chap. 6 and secs. 11.13-11.16).

Let $X_1, X_2, \ldots, X_n$ be iid with density $f$ with mean $\mu$ and variance $\sigma^2$. A useful form of an Edgeworth expansion was given by Hall (1992, equation 2.17), and can be written

\[
P\left( \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le w \right)
= \Phi(w) + \phi(w)\left[ \frac{\beta}{6\sigma^3\sqrt{n}}\,(1 - w^2)
+ O\!\left(\frac{1}{n}\right) \right], \tag{26}
\]

where $\Phi$ and $\phi$ are, respectively, the distribution and density function of a standard normal and $\beta = E(X_1 - \mu)^3$ is the skewness. For the Edgeworth expansion, the $O(1/n)$ term is of the form $p(w)/n$, where $p$ is a polynomial. Because it is multiplied by $\phi(w)$, it follows that its derivatives (in $w$) maintain the same order of approximation. Thus, (26) can be differentiated to obtain a density approximation with the same order of accuracy. If we do that, and then make the transformation $x = \sigma w/\sqrt{n} + \mu$, we obtain the approximation to the density of $\bar{X}$ to be

\[
f_{\bar{X}}(x) = \frac{\sqrt{n}}{\sigma}\,\phi\!\left( \frac{\sqrt{n}(x - \mu)}{\sigma} \right)
\left[ 1 + \frac{\beta}{6\sigma^3\sqrt{n}}
\left\{ \left( \frac{\sqrt{n}(x - \mu)}{\sigma} \right)^{\!3}
- 3\,\frac{\sqrt{n}(x - \mu)}{\sigma} \right\}
+ O\!\left(\frac{1}{n}\right) \right]. \tag{27}
\]

Ignoring the term in braces produces the usual normal approximation, which is thus accurate to $O(n^{-1/2})$. If we are using (27) for values of $x$ near $\mu$, then the value of the expression in braces is close to zero, and the approximation will then be accurate to $O(n^{-1})$. The trick of the saddlepoint approximation is to make this always be the case.
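The content of (26)-(27) can be seen numerically. The sketch below (names ours) uses the mean of $n$ iid Exponential(1) variables, for which $\mu = 1$, $\sigma = 1$, and $\beta = 2$, and the exact mean density is Gamma($n$, scale $1/n$); the one-term Edgeworth correction recovers most of the skewness that the normal approximation misses.

```python
# A sketch of the Edgeworth density (27) for the mean of n iid
# Exponential(1) variables: mu = 1, sigma = 1, skewness beta = 2.
import numpy as np
from scipy.stats import norm, gamma as gamma_dist

n, mu, sigma, beta = 10, 1.0, 1.0, 2.0
for x in [0.6, 0.8, 1.0, 1.2, 1.6]:
    w = np.sqrt(n) * (x - mu) / sigma
    normal = np.sqrt(n) / sigma * norm.pdf(w)                # drop the braces
    edgeworth = normal * (1.0 + beta / (6.0 * sigma ** 3 * np.sqrt(n))
                          * (w ** 3 - 3.0 * w))              # keep them
    exact = gamma_dist(n, scale=1.0 / n).pdf(x)
    print(x, normal, edgeworth, exact)
```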

3.2.1 A Family of Edgeworth Expansions

To make the Edgeworth expansion accurate to $O(n^{-1})$, we use a family of densities so that, for each $x$, we can choose a density from the family to zero out the term in braces in (27). One method of creating such a family is through a technique known as exponential tilting (see Efron 1981; Stuart and Ord 1987, sec. 11.13; or Reid 1988). Starting with a density $f$ of interest, we create the family

\[
\mathcal{F} = \left\{ f(\cdot \mid \tau) : f(x \mid \tau) = \exp\{\tau x - K(\tau)\}\, f(x) \right\},
\]

where $K(\tau)$ is the cumulant generating function of $f$. It immediately follows that if $X_1, X_2, \ldots, X_n$ are iid from $f(x \mid \tau)$, the density of $\bar{X}$ is

\[
f_{\bar{X}}(\bar{x} \mid \tau) = \exp\left\{ n\left[\tau\bar{x} - K(\tau)\right] \right\} f_{\bar{X}}(\bar{x}), \tag{28}
\]

where $f_{\bar{X}}(\bar{x})$ is the density of the mean of an iid sample from $f$. (This can be verified by calculating the mgf of an iid sample from $f(x \mid \tau)$, and then directly calculating the mgf of $f_{\bar{X}}(\bar{x} \mid \tau)$.) Let $\mu_\tau$, $\sigma_\tau^2$, and $\beta_\tau$ denote the mean, variance, and skewness of $f(\cdot \mid \tau)$, respectively, and apply (27) to $f_{\bar{X}}(\bar{x} \mid \tau)$ to obtain a family of Edgeworth expansions for $f_{\bar{X}}(\bar{x})$ given by

\[
f_{\bar{X}}(\bar{x}) = \exp\left\{ -n\left[\tau\bar{x} - K(\tau)\right] \right\}
\frac{\sqrt{n}}{\sigma_\tau}\,\phi\!\left( \frac{\sqrt{n}(\bar{x} - \mu_\tau)}{\sigma_\tau} \right)
\left[ 1 + \frac{\beta_\tau}{6\sigma_\tau^3\sqrt{n}}
\left\{ \left( \frac{\sqrt{n}(\bar{x} - \mu_\tau)}{\sigma_\tau} \right)^{\!3}
- 3\,\frac{\sqrt{n}(\bar{x} - \mu_\tau)}{\sigma_\tau} \right\}
+ O\!\left(\frac{1}{n}\right) \right]. \tag{29}
\]

Now the parameter $\tau$ is free for us to choose in (28) and, given $\bar{x}$, we choose $\tau$ so that $\mu_\tau = \bar{x}$. Recalling that $K(\tau)$ is the cumulant generating function, we can equivalently choose $\tau$ so that $K'(\tau) = \bar{x}$, the familiar saddlepoint equation. Denoting this value by $\hat{\tau}$, we get the approximation

\[
f_{\bar{X}}(\bar{x}) = \exp\left\{ -n\left[\hat{\tau}\bar{x} - K(\hat{\tau})\right] \right\}
\frac{\sqrt{n}}{\sigma_{\hat{\tau}}}\,\phi(0)\left[ 1 + O\!\left(\frac{1}{n}\right) \right]. \tag{30}
\]

Since $\sigma_{\hat{\tau}}^2 = K''(\hat{\tau})$, we see that (30) is equivalent to (25), the saddlepoint approximation.

However, we now clearly see the order of the approximation, and what we needed to do to obtain that order. The saddlepoint improves on the usual normal approximation by eliminating the skewness term in the asymptotic expansion of the density. And this is accomplished by using a new approximating density for each value of $x$. So this is similar in flavor to the derivation in Section 2.1.

We have only shown that the order of the approximation is $O(n^{-1})$, not the $O(n^{-3/2})$ that is often claimed. This better error rate is obtained by renormalizing (30) so that it integrates to one. Details of this are contained in Daniels's original 1954 paper, Field and Ronchetti (1990, chap. 3), or Kolassa (1994, chap. 4). We omit them here, but note that such renormalization may be quite computer intensive; see, for example, the discussion in Section 3.3 of Field and Ronchetti (1990).

Last, there is an interesting analogy between the exponential tilting derivation and the inversion formula derivation of the saddlepoint approximation. Here, by using a family of densities, for any value $x$ we were able to select the member of the family with the value of $\mu_\tau$ in (28) that zeroed out a term in the expansion. In the inversion formula derivation of Section 3.1, the complex integration resulted in a family of paths over which we could integrate. There, we chose the path that also zeroed out a term in the expansion (20). Thus, either derivation results in us having an extra parameter that we can control to help produce a good approximation.
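Exponential tilting itself is easy to visualize. In the sketch below (names ours), tilting an Exponential(1) density, with cgf $K(\tau) = -\log(1 - \tau)$, by the solution of $K'(\tau) = x$ produces a legitimate density whose mean sits exactly at $x$, which is what zeroes out the leading Edgeworth term at that point.

```python
# A sketch of exponential tilting (28) for f = Exponential(1), with cgf
# K(tau) = -log(1 - tau) for tau < 1. Tilting by tau moves the mean to
# K'(tau) = 1/(1 - tau).
import numpy as np
from scipy.integrate import quad

K = lambda tau: -np.log(1.0 - tau)
f = lambda x: np.exp(-x)                           # Exponential(1) density
f_tilt = lambda x, tau: np.exp(tau * x - K(tau)) * f(x)

x_target = 2.5
tau_hat = 1.0 - 1.0 / x_target                     # solves K'(tau) = x_target

mass = quad(lambda x: f_tilt(x, tau_hat), 0.0, np.inf)[0]
mean = quad(lambda x: x * f_tilt(x, tau_hat), 0.0, np.inf)[0]
print(mass, mean)   # 1.0 and 2.5: a density recentered exactly at x_target
```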
4. MAXIMUM LIKELIHOOD ESTIMATION

The saddlepoint approximation is useful in obtaining approximations to the density of the maximum likelihood estimator (MLE), particularly in exponential families (see Daniels 1983 or Field and Ronchetti 1990, chap. 4, for generalizations). Consider $X_1, \ldots, X_n$, independent random variables with density

\[
f(x \mid \theta) = \exp\left\{ \theta s(x) - K(\theta) - d(x) \right\}. \tag{31}
\]

A version of the sufficient statistic is $S = \sum_i s(X_i)$, which has a density

\[
f(s \mid \theta) = \exp\left\{ \theta s - nK(\theta) - h(s) \right\}. \tag{32}
\]

From sufficiency considerations, we need to consider only $f(s \mid \theta)$. We also realize that there is no need to apply the saddlepoint approximation to the entire density (32), but only to the function $\exp\{-h(s)\}$, as the part of (32) that involves $\theta$ is relatively simple.

Next, note that because $f(s \mid \theta)$ is a density, it integrates to one, and hence by integrating and rearranging (32) it follows that

\[
\exp\{nK(\theta)\} = \int \exp(\theta s) \exp\{-h(s)\}\, ds. \tag{33}
\]

The right side is exactly (16) with $\theta$ instead of $t$ and $s$ instead of $x$, so the cgf of $\exp\{-h(s)\}$ is $nK(\theta)$. (Here, we have to think of $\theta$ as a dummy variable rather than as a parameter of a distribution.) Hence, we can apply the approximation (21) directly to $\exp\{-h(s)\}$.

Alternatively, we can realize that (32) is in the form of an "exponential tilt" of $\exp\{-h(s)\}$, so the method of Section 3.2.1 can be applied. In either case, we obtain that the density $f(s \mid \theta)$ is approximated by

\[
f(s \mid \theta) \approx \frac{1}{\left[ 2\pi n K''(\hat{t}(s)) \right]^{1/2}}
\exp\left\{ \left[ \theta - \hat{t}(s) \right] s
- n\left[ K(\theta) - K(\hat{t}(s)) \right] \right\}, \tag{34}
\]

where $\hat{t}(s)$ solves the equation

\[
nK'(\hat{t}) = s. \tag{35}
\]

Thus, the approximate density of the sum is obtained almost automatically from $K(\theta)$, the cgf of an individual observation. By using an approximation for $h(s)$ we avoid the possibly difficult job of computing it.

Now, it seems almost an accident that if we take the derivative of the log-likelihood corresponding to the density of the sufficient statistic $s$ and set it equal to zero, we obtain (35), and, hence, $\hat{t}(s)$ is the maximum likelihood estimate. Equation (35) suggests that we can obtain the density of the maximum likelihood estimate by a transformation of $f(s \mid \theta)$. Since $\hat{t}$ and $s$ are related by $s = nK'(\hat{t})$, the Jacobian of the transformation is $nK''(\hat{t})$. Writing $s$ as $s(\hat{t})$, a function of $\hat{t}$, we obtain

\[
f(\hat{t} \mid \theta) \approx \left( \frac{nK''(\hat{t})}{2\pi} \right)^{1/2}
\exp\left\{ \left[ \theta - \hat{t}\, \right] s(\hat{t})
- n\left[ K(\theta) - K(\hat{t}) \right] \right\}. \tag{36}
\]
Note how both the MLE $\hat{t}$ and the parameter $\theta$ enter the formula. Though the approximation is quite accurate, it typically requires numerical methods to compute $s(\hat{t})$, $K''(\hat{t})$, and the exact normalizing constant.

As it turns out, the renormalized (36) is exact in many of the simple cases that we examined. [Daniels (1980) showed that the three cases where the saddlepoint approximation to the density of a sample mean or sum is exact are the gamma, the normal, and the inverse Gaussian distributions.] To take full advantage of (36) requires more complicated settings than we will look at here, so we content ourselves with a simple, exact example.

Example 5: Pareto Distribution. Let $X_1, \ldots, X_n$ be iid from the Pareto distribution with known lower limit $\alpha$. The density is

\[
f(x \mid \beta) = \frac{\beta\alpha^\beta}{x^{\beta + 1}}, \qquad x > \alpha, \tag{37}
\]

a member of the exponential family. From (32) we see that $s = -\sum_i \log x_i$, $K(\beta) = -\log(\beta\alpha^\beta)$, and the saddlepoint is given by $\hat{t} = -n/(s + n\log\alpha)$. The saddlepoint approximation is straightforward to compute and, from (36), the density of $\hat{\beta}$, the maximum likelihood estimator of $\beta$, is approximated by

\[
f(\hat{\beta} \mid \beta) \approx \left( \frac{n}{2\pi} \right)^{1/2}
\frac{1}{\hat{\beta}} \left( \frac{e\beta}{\hat{\beta}} \right)^{\!n}
\exp\left\{ -\frac{n\beta}{\hat{\beta}} \right\}. \tag{38}
\]

Figure 3 shows this approximation and its renormalized version, which is exact.

[Figure 3. Exact density of the MLE from a Pareto distribution based on $n = 10$ observations (solid line), together with the saddlepoint approximation (short dashes) and renormalized saddlepoint approximation (long dashes). The renormalized saddlepoint approximation is exact.]
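The exactness claim can be verified directly: $\hat{\beta} = n/\sum_i \log(x_i/\alpha)$ is distributed as an inverse gamma with shape $n$ and scale $n\beta$. A sketch (names ours):

```python
# A sketch of Example 5: the saddlepoint density (38) versus the exact
# density of the Pareto MLE, which is inverse gamma with shape n and
# scale n*beta.
import numpy as np
from scipy.stats import invgamma

n, beta = 10, 2.0

def saddle(bhat):   # formula (38)
    return (np.sqrt(n / (2.0 * np.pi)) / bhat
            * (np.e * beta / bhat) ** n * np.exp(-n * beta / bhat))

for bhat in [1.0, 1.5, 2.0, 3.0, 4.0]:
    print(bhat, saddle(bhat), invgamma.pdf(bhat, n, scale=n * beta))
# the ratio is Stirling's approximation to Gamma(n): constant in bhat, so
# the renormalized saddlepoint density is exact
```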

5. BEYOND BASICS

The technique of the saddlepoint has widespread applicability, far beyond that illustrated here. In Section 4 we applied the saddlepoint to MLEs in exponential families, but more general classes of estimators can be handled. For example, approximations to the density of multivariate M-estimators can be obtained (Field 1982; Field and Ronchetti 1990, chap. 4). Moreover, the cgf itself may be estimated, with application to L estimators (Easton and Ronchetti 1986) and general nonlinear statistics (Gatto and Ronchetti 1996).

Other applications include finite population models (Wang 1993a), bootstrapping and related confidence methods (Wang 1993b; Booth, Hall, and Wood 1992; DiCiccio, Martin, and Young 1992), ANOVA and MANOVA (Butler, Huzurbazar, and Booth 1992a,b, 1993), prior distributions (Eichenauer-Herrmann and Ickstadt 1993), generalized linear models (Strawderman, Casella, and Wells 1996), exponential linear models (Fraser et al. 1991), multiparameter exponential families (Pierce and Peters 1992), studentized means (Daniels and Young 1991), and 2 × 2 tables (Strawderman and Wells 1998).

There is another use of the saddlepoint approximation that is, perhaps, even more important than the approximation of a density function. That is the use of the saddlepoint to approximate the tail area of a distribution. From (25) we have the approximation

\[
P(\bar{X} > a) \approx \int_a^{\infty} \left( \frac{n}{2\pi K_X''(\hat{t}(x))} \right)^{1/2}
\exp\left\{ n\left[ K_X(\hat{t}(x)) - \hat{t}(x)\,x \right] \right\} dx
= \left( \frac{n}{2\pi} \right)^{1/2} \int_{\hat{t}(a)}
\left[ K_X''(t) \right]^{1/2}
\exp\left\{ n\left[ K_X(t) - t\,K_X'(t) \right] \right\} dt,
\]

where we make the transformation $K_X'(t) = x$, and $\hat{t}(a)$ satisfies $K_X'(\hat{t}(a)) = a$ (the $t$-integral runs over $\{t : K_X'(t) > a\}$). This transformation was noted by Daniels (1983), and allows the evaluation of the integral with only one saddlepoint evaluation. However, one still must do the integration, probably using a numerical method. [See Robert and Casella (1999, sec. 6.3) for an illustration of the use of the Metropolis algorithm to do such a calculation.]

Tail area approximations have seen much more development. The work of Lugannani and Rice (1980) produced a very accurate approximation that requires only the evaluation of one saddlepoint, and no integration. It is derived by further transformations of the saddlepoint approximation; see the discussions by Field and Ronchetti (1990, sec. 6.2) or Kolassa (1994, sec. 5.3). There are other approaches to tail area approximations; for example, the work of Barndorff-Nielsen (1991), which takes advantage of ancillary statistics, or the Bayes-based approximation of DiCiccio and Martin (1993). Also, Wood, Booth, and Butler (1993) gave generalizations of the Lugannani and Rice formula.
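For a concrete version of this tail-area calculation, the sketch below (ours) integrates the transformed saddlepoint density for the mean of $n$ iid Exponential(1) variables, for which the exact tail probability is available through the Gamma($n$, 1) distribution of $n\bar{X}$.

```python
# A sketch of the tail-area integral above for the mean of n iid
# Exponential(1) variables: K(t) = -log(1 - t), and the exact tail comes
# from n*Xbar ~ Gamma(n, 1).
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma as gamma_dist

n, a = 10, 1.5
K  = lambda t: -np.log(1.0 - t)
K1 = lambda t: 1.0 / (1.0 - t)
K2 = lambda t: 1.0 / (1.0 - t) ** 2

that = 1.0 - 1.0 / a                        # saddlepoint: K'(that) = a
integrand = lambda t: (np.sqrt(n * K2(t) / (2.0 * np.pi))
                       * np.exp(n * (K(t) - t * K1(t))))
approx = quad(integrand, that, 1.0)[0]      # t ranges over K'(t) > a
exact = gamma_dist(n).sf(n * a)             # P(Xbar > a)
print(approx, exact)                        # about 0.0704 versus 0.0699
```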

[Received December 1995. Revised June 1998.]

REFERENCES

Barndorff-Nielsen, O. (1983), "On a Formula for the Distribution of the Maximum Likelihood Estimator," Biometrika, 70, 343-365.

--- (1988), Discussion of "Saddlepoint Methods and Statistical Inference," by N. Reid, Statistical Science, 3, 228-229.

--- (1991), "Modified Signed Log-Likelihood Ratio," Biometrika, 78, 557-563.

Barndorff-Nielsen, O., and Cox, D. R. (1994), Inference and Asymptotics, London: Chapman and Hall.

Billingsley, P. (1995), Probability and Measure (3rd ed.), New York: Wiley.

Booth, J. G., Hall, P., and Wood, A. (1992), "Bootstrap Estimation of Conditional Distributions," The Annals of Statistics, 20, 1594-1610.

Butler, R. W., Huzurbazar, S., and Booth, J. G. (1992a), "Saddlepoint Approximations for the Generalized Variance and Wilks' Statistic," Biometrika, 79, 157-169.

--- (1992b), "Saddlepoint Approximations for the Bartlett-Nanda-Pillai Trace Statistic in Multivariate Analysis," Biometrika, 79, 705-715.

--- (1993), "Saddlepoint Approximations for Tests of Block Independence, Sphericity, and Equal Variances and Covariances," Journal of the Royal Statistical Society, Ser. B, 55, 171-183.

Cox, D. R., and Reid, N. (1987), "Parameter Orthogonality and Approximate Conditional Inference" (with discussion), Journal of the Royal Statistical Society, Ser. B, 49, 1-39.

Daniels, H. E. (1954), "Saddlepoint Approximations in Statistics," Annals of Mathematical Statistics, 25, 631-650.

--- (1980), "Exact Saddlepoint Approximations," Biometrika, 67, 59-63.

--- (1983), "Saddlepoint Approximations for Estimating Equations," Biometrika, 70, 89-96.

--- (1987), "Tail Probability Approximations," International Statistical Review, 55, 37-48.

Daniels, H. E., and Young, G. A. (1991), "Saddlepoint Approximations for the Studentized Mean," Biometrika, 78, 169-179.

DiCiccio, T. J., and Martin, M. A. (1993), "Simple Modifications for Signed Roots of Likelihood Ratio Statistics," Journal of the Royal Statistical Society, Ser. B, 55, 305-316.

DiCiccio, T. J., Martin, M. A., and Young, G. A. (1992), "Fast and Accurate Double Bootstrap Confidence Intervals," Biometrika, 79, 285-295.

Easton, G. S., and Ronchetti, E. (1986), "General Saddlepoint Approximations With Application to L Statistics," Journal of the American Statistical Association, 81, 420-430.

Efron, B. (1981), "Nonparametric Standard Errors and Confidence Intervals" (with discussion), Canadian Journal of Statistics, 9, 139-172.

Eichenauer-Herrmann, J., and Ickstadt, K. (1993), "A Saddlepoint Characterization for Classes of Priors With Shape-Restricted Densities," Statistics and Decisions, 11, 175-179.

Feller, W. (1971), An Introduction to Probability Theory and Its Applications (Vol. II), New York: Wiley.

Field, C. (1982), "Small Sample Asymptotic Expansions for Multivariate M-Estimators," The Annals of Statistics, 10, 672-689.

Field, C., and Ronchetti, E. (1990), Small Sample Asymptotics, Hayward, CA: Institute of Mathematical Statistics.

Fraser, D. A. S., Reid, N., and Wong, A. (1991), "Exponential Linear Models: A Two-Pass Procedure for Saddlepoint Approximation," Journal of the Royal Statistical Society, Ser. B, 53, 483-492.

Gatto, R., and Ronchetti, E. (1996), "General Saddlepoint Approximations of Marginal Densities and Tail Probabilities," Journal of the American Statistical Association, 91, 666-673.

Hall, P. (1992), The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

Hougaard, P. (1988), Discussion of "Saddlepoint Methods and Statistical Inference," by N. Reid, Statistical Science, 3, 230-231.

Kolassa, J. E. (1994), Series Approximation Methods in Statistics, New York: Springer-Verlag.

Lugannani, R., and Rice, S. (1980), "Saddlepoint Approximation for the Distribution of the Sum of Independent Random Variables," Advances in Applied Probability, 12, 475-490.

McCullagh, P. (1987), Tensor Methods in Statistics, London: Chapman and Hall.

Pierce, D. A., and Peters, D. (1992), "Practical Use of Higher Order Asymptotics for Multiparameter Exponential Families" (with discussion), Journal of the Royal Statistical Society, Ser. B, 54, 701-737.

Reid, N. (1988), "Saddlepoint Methods and Statistical Inference" (with discussion), Statistical Science, 3, 213-238.

--- (1991), "Approximations and Asymptotics," in Statistical Theory and Models, Essays in Honor of D. R. Cox, London: Chapman and Hall, pp. 287-334.

Robert, C. P., and Casella, G. (1999), Monte Carlo Statistical Methods, New York: Springer-Verlag.

Strawderman, R. W., Casella, G., and Wells, M. T. (1996), "Practical Small Sample Asymptotics for Regression Problems," Journal of the American Statistical Association, 91, 643-654.

Strawderman, R. W., and Wells, M. T. (1998), "Approximately Exact Inference for the Common Odds Ratio in 2 × 2 Tables" (with discussion), Journal of the American Statistical Association, 93, 1294-1320.

Stuart, A., and Ord, J. K. (1987), Kendall's Advanced Theory of Statistics (Vol. I, 5th ed.), New York: Oxford University Press.

Sweeting, T. J. (1987), Discussion of "Parameter Orthogonality and Approximate Conditional Inference," by D. R. Cox and N. Reid, Journal of the Royal Statistical Society, Ser. B, 49, 20-21.

Waller, L. A., Turnbull, B. W., and Hardin, J. M. (1995), "Obtaining Distribution Functions by Numerical Inversion of Characteristic Functions," The American Statistician, 49, 346-350.

Wang, S. (1993a), "Saddlepoint Expansions in Finite Population Problems," Biometrika, 80, 583-590.

--- (1993b), "Saddlepoint Methods for Bootstrap Confidence Bands in Nonparametric Regression," Australian Journal of Statistics, 35, 93-101.

Wood, A. T. A., Booth, J. G., and Butler, R. W. (1993), "Saddlepoint Approximations to the CDF of Some Statistics With Nonnormal Limit Distributions," Journal of the American Statistical Association, 88, 680-686.