© 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
Numerical Tools for the Bayesian Analysis of
Stochastic Frontier Models
JACEK OSIEWALSKI
Department of Econometrics, Academy of Economics, 31-510 Kraków, Poland
MARK F. J. STEEL*
CentER and Department of Econometrics, Tilburg University, 5000 LE Tilburg, The Netherlands
Abstract
In this paper we describe the use of modern numerical integration methods for making posterior inferences in composed error stochastic frontier models for panel data or individual cross-sections. Two Monte Carlo methods have been used in practical applications. We survey these two methods in some detail and argue that Gibbs sampling methods can greatly reduce the computational difficulties involved in analyzing such models.
Keywords: Efficiency analysis, composed error models, posterior inference, Monte Carlo-importance sampling, Gibbs sampling
Introduction
The stochastic frontier or composed error framework was first introduced in Meeusen and van den Broeck (1977) and Aigner, Lovell and Schmidt (1977) and has been used in many empirical applications. The reader is referred to Bauer (1990) for a survey of the literature. In previous papers (van den Broeck, Koop, Osiewalski and Steel, 1994, hereafter BKOS; Koop, Steel and Osiewalski, 1995; Koop, Osiewalski and Steel, 1994, 1997a,b,c, hereafter KOS) we used Bayesian methods to analyze stochastic frontier models and argued that such methods had several advantages over their classical counterparts in the treatment of these models. Most importantly, the Bayesian methods we outlined enabled us to provide exact finite sample results for any feature of interest and to take parameter uncertainty fully into account. In addition, they made it relatively easy to treat uncertainty about which model to use, since they recommended taking weighted averages over all models, where the weights were posterior model probabilities. We successfully applied the Bayesian approach in various empirical problems, ranging from hospital efficiencies (KOS, 1997c) to analyses of the growth of countries in KOS (1997a,b). Throughout the paper, we shall label the individual units as firms, but the same models can, of course, be used in other contexts.
* We acknowledge many helpful discussions with Gary Koop in the course of our joint work on previous stochastic frontier papers, as well as useful comments by an anonymous referee and the Editor of this Special Issue. The first author was supported by the Polish Committee for Scientific Research (KBN; grant no. 1-H02B-015-11) during the work on the present version and acknowledges the hospitality of the Center for Operations Research and Econometrics (Louvain-la-Neuve, Belgium) at the initial stage of the research.
KOS (1997c) investigates hospital efficiency from a large panel of U.S. hospitals from 1987 to 1991. The techniques proposed are shown to be computationally feasible, even in very high dimensions (the largest model involved a Gibbs sampler in 803 dimensions). The consequences of certain prior assumptions are made explicit, and the usual within estimator for the fixed effects case (see Schmidt and Sickles, 1984) is reinterpreted in this context. The latter estimator corresponds to an implied prior which strongly favours low efficiency. In addition to measuring hospital-specific efficiencies, we also explain hospital efficiency through exogenous variables (i.e. non-profit, for-profit or government-run dummies and a measure of workers per patient).
KOS (1997a,b) apply the concept of a stochastic frontier to economic growth in a wide variety of countries. Using a translog production frontier, KOS (1997a) decomposes output change into technical, efficiency and input changes and contrasts the stochastic frontier approach with Data Envelopment Analysis (DEA). Since inefficiency terms were treated as distinct quantities across time (10 years) and countries (17 OECD countries), the dimension of the required numerical integration always exceeded the number of data points. Nevertheless, Gibbs sampling proved both feasible and reliable in all models considered. KOS (1997b) proposes an extension of this model for data from 44 countries with different levels of development (yearly data for the period 1965-1990). Since it is unreasonable to assume a constant input quality across these countries, we view output as depending on effective inputs rather than actual inputs. The relationship between effective and actual factors depends on observables such as education. In addition, we allow for measurement error in observed capital. The added complications in the model lead to a frontier that is bilinear in the parameters, which can easily be handled by Gibbs sampling. Despite the increased size of the data set, the numerical methods proposed perform very well. In this application, the largest model required integration in 1180 dimensions.
In this paper we describe numerical integration methods which are used to perform a fully
Bayesian analysis of stochastic frontier models. We present these methods in a much more
systematic way than in previous papers and summarize our empirical experiences. For a
more general survey of numerical methods in Bayesian econometrics see Geweke (1995).
The paper is organized as follows. Section 1 introduces the Bayesian model we consider
in the paper. Section 2 discusses the necessity of modern (Monte Carlo type) numerical
integration for making posterior inferences in this context. Section 3 is devoted to Monte
Carlo with Importance Sampling and Section 4 presents Gibbs sampling.
1. The Sampling Model and Prior Distribution
The basic stochastic frontier sampling model considered here is given by:
y_ti = h(x_ti, β) + v_ti − z_ti,   (1)
where y_ti is the natural logarithm of output (or the negative of log cost) for firm i at time t (i = 1, ..., N; t = 1, ..., T); x_ti is a row vector of exogenous variables; h, a known measurable function, and β, a vector of k unknown parameters, define the deterministic part of the frontier; v_ti is a symmetric disturbance capturing the random nature of the
frontier itself (due to, e.g., measurement error); z_ti is a nonnegative disturbance capturing the level of inefficiency of firm i at time t; and z_ti and v_ti are independent of each other and across firms and time. Thus, we do not allow for autocorrelated errors at this stage of our research. Generally, efficiency will be measured as r_ti = exp(−z_ti), which is an easily interpretable quantity in (0, 1). If (1) represents a production function then r_ti measures technical efficiency. In the case of a cost function, z_ti captures the overall cost inefficiency, reflecting cost increases due to both technical and allocative inefficiency of the firm i at time t. Often the translog specification is used as a flexible functional form for h(x_ti, β) in (1). For the translog cost function, Kumbhakar (1997) derives the exact relationship between allocative inefficiency in the cost share equations and in the cost function, which indicates that the z_ti's in (1) cannot be independent of the exogenous variables and the parameters in the cost function. However, this independence assumption will usually be maintained as a crude approximation because it leads to simpler posterior analysis.
Note that our framework is suitable for panel data, but the case of just one cross-section, considered in many previous papers, is easily covered as it corresponds to T = 1. Here we make the assumption that z_ti is independent (conditionally upon whatever parameters are necessary to describe its sampling distribution) over both i and t, as in KOS (1997a,b); see also Pitt and Lee (1981, Model II). KOS (1997c) follow an alternative modeling strategy and assume that the inefficiency level is an individual (firm) effect, i.e. z_ti = z_i (t = 1, ..., T); see also Pitt and Lee (1981, Model I) and Schmidt and Sickles (1984). In a Bayesian context, KOS (1997c) define the difference between fixed and random effects models as a difference in the structure of the prior information. Whenever the effects are marginally prior independent, they talk of fixed effects, whereas the situation with prior dependent effects, linked through a hierarchical prior, is termed Bayesian random effects. They provide both theoretical and empirical support for the latter type of model. Thus, we shall focus on Bayesian random effects models in the sequel.
In general, a fully parametric Bayesian analysis requires specifying
(i) a sampling distribution parameterized by a finite-dimensional vector (say, θ),
(ii) a prior distribution for that θ.
In order to satisfy (i) and obtain the likelihood function we assume that v_ti is N(0, σ²), i.e. Normal with zero mean and constant variance σ², and z_ti is Exponential with mean (and standard deviation) λ_ti, which can depend on some (say, m ≥ 1) exogenous variables explaining possible systematic differences in efficiency levels. In particular, we assume

λ_ti = ∏_{j=1}^m φ_j^{−w_tij},   (2)

where φ_j > 0 are unknown parameters and w_ti1 = 1. If m > 1, the distribution of z_ti can differ for different t or i or both, and thus we call this case the Varying Efficiency Distribution (VED) model. If m = 1, then λ_ti = φ_1^{−1} and all inefficiency terms constitute independent draws from the same distribution. This case is called the Common Efficiency Distribution (CED) model. The choice of the Exponential distribution for z_ti stems from the findings in BKOS, where a more general family of Erlang distributions and a truncated Normal were considered as possible specifications; their results clearly indicated that an Exponential distribution is least sensitive to the particular choice of the prior on λ_ti. Note
that the sampling density of the observable y_ti, given x_ti, w_ti = (w_ti1, ..., w_tim)′ and θ = (β′, σ², φ_1, ..., φ_m)′, takes the form

p(y_ti | x_ti, w_ti, θ) = ∫_0^∞ f_N^1(y_ti | h(x_ti, β) − z_ti, σ²) f_G(z_ti | 1, ∏_{j=1}^m φ_j^{w_tij}) dz_ti,   (3)
where f_G(· | a, b) indicates the Gamma density with mean a/b and variance a/b². Alternatively, the sampling density can be represented as

p(y_ti | x_ti, w_ti, θ) = λ_ti^{−1} exp{−λ_ti^{−1} [m_ti + (σ²/2) λ_ti^{−1}]} Φ(m_ti/σ),   (4)
where

m_ti = h(x_ti, β) − y_ti − σ²/λ_ti,

Φ(·) denotes the distribution function of N(0, 1), and λ_ti is given by (2). See Greene (1990) for a similar expression and BKOS for the generalization to the Erlang context. The likelihood function, L(θ | data), is the product of the densities (4) over t and i. As a result of integrating out the inefficiency terms z_ti, the form of (4) is quite complicated indeed, and even the numerical evaluation of the ensuing likelihood function is nontrivial, as shown in BKOS.
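To make the evaluation of (4) concrete, the following Python sketch (our own illustration, not code from the paper) computes the log of the density for one observation; working on the log scale, with log Φ obtained via the complementary error function, keeps the evaluation reasonably stable. Here h stands for the frontier value h(x_ti, β) and lam for λ_ti:

```python
import math

def log_norm_cdf(x):
    # log Phi(x), using Phi(x) = 0.5 * erfc(-x / sqrt(2))
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))

def logdens_composed(y, h, sigma, lam):
    """Log of the sampling density (4) of y_ti, given the frontier value
    h = h(x_ti, beta), the frontier scale sigma and the Exponential
    mean lam = lambda_ti of the inefficiency term."""
    m = h - y - sigma ** 2 / lam
    return (-math.log(lam)
            - (m + 0.5 * sigma ** 2 / lam) / lam
            + log_norm_cdf(m / sigma))
```

The log-likelihood of a sample is then simply the sum of these terms over t and i.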
An important aspect of any efficiency analysis is making inferences on the individual efficiencies of observed firms. It is easy to show that, conditionally on the parameters and the data, the unobserved inefficiency term z_ti of an observed y_ti has a truncated Normal distribution with density

p(z_ti | y_ti, x_ti, w_ti, θ) = [Φ(m_ti/σ)]^{−1} f_N^1(z_ti | m_ti, σ²) I(z_ti ≥ 0),   (5)
and the first two moments about 0 equal

E(z_ti | y_ti, x_ti, w_ti, θ) = m_ti + σ [Φ(m_ti/σ)]^{−1} f_N^1(m_ti/σ | 0, 1)   (6)

and

E(z_ti² | y_ti, x_ti, w_ti, θ) = σ² + m_ti² + m_ti σ [Φ(m_ti/σ)]^{−1} f_N^1(m_ti/σ | 0, 1);   (7)
see Greene (1990) and BKOS. Both the density value at a prespecified point z_ti and the moments are complicated functions of the model parameters (given the data). If, instead of the efficiency of a particular firm i, we are interested in the efficiency of a hypothetical (unobserved) firm belonging to that industry, we should simply focus on the Exponential distribution with mean λ_ti as in (2), given certain (representative) values for w_ti if m > 1.
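As an illustration, the moments (6)-(7) and the implied point estimate of efficiency are straightforward to compute once the inverse Mills ratio f_N^1(m_ti/σ | 0, 1)/Φ(m_ti/σ) is available; the helper names below are ours:

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def z_moments(y, h, sigma, lam):
    """First two conditional moments (6)-(7) of the inefficiency term
    z_ti, plus exp(-E[z_ti]) as a crude point estimate of efficiency;
    h stands for h(x_ti, beta) and lam for lambda_ti."""
    m = h - y - sigma ** 2 / lam
    mills = norm_pdf(m / sigma) / norm_cdf(m / sigma)
    ez = m + sigma * mills                        # equation (6)
    ez2 = sigma ** 2 + m * m + m * sigma * mills  # equation (7)
    return ez, ez2, math.exp(-ez)
```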
In principle, the prior distribution of θ can be any distribution, but it is usually preferred not to introduce too much subjective information about the parameters. Therefore, we propose the following prior structure:

p(θ) = p(σ^{−2}) p(β) p(φ_1, ..., φ_m) = f_G(σ^{−2} | n_0/2, s_0/2) f(β) ∏_{j=1}^m f_G(φ_j | a_j, g_j),   (8)
which reflects lack of prior knowledge about the frontier parameters β, possibly except for regularity conditions imposed by economic theory. That is, we assume f(β) ∝ 1 if there are no regularity constraints, and, if such conditions are imposed, f(β) = 1 for all β satisfying them and f(β) = 0 for all other β. Alternatively, we could use a proper prior distribution on β, possibly truncated to the region of regularity. Typically, we shall choose the prior hyperparameters n_0 > 0 and s_0 > 0 so as to represent very weak prior information on the precision of the stochastic frontier. Note that we cannot take as the prior density for σ^{−2} the kernel of the limiting case where s_0 = 0, because this would result in the lack of existence of the posterior distribution (see Fernández, Osiewalski and Steel, 1997). Thus, the use of the usual Jeffreys type prior for σ^{−2} (which corresponds to the Gamma kernel with n_0 = s_0 = 0) is precluded, unless we put some panel structure on the inefficiency terms by assuming that, e.g., they are time-invariant individual effects as in KOS (1997c). For the m parameters of the efficiency distribution we take proper, independent Gamma priors in order to avoid the pathology described by Ritter (1993) and discussed in more general terms by Fernández, Osiewalski and Steel (1997). Following KOS (1997b,c), we suggest using a_j = g_j = 1 for j > 1, a_1 = 1 and g_1 = −ln(r*), where r* ∈ (0, 1) is a hyperparameter to be elicited; in the CED model (m = 1), r* can be interpreted as the prior median efficiency.

2. The Need for Monte Carlo Integration

Consider p competing models M_1, ..., M_p, with prior probabilities P(M_i). By Bayes' theorem, the posterior probability of model M_j is

P(M_j | data) = K_j P(M_j) / Σ_{i=1}^p K_i P(M_i),   (9)
where, for each model, K_j denotes the integrating constant of the posterior density, i.e.

K_j = ∫ p(θ) L_j(θ | data) dθ,

with L_j(θ | data) the likelihood corresponding to M_j. The ratio K_j/K_i is called the Bayes factor of model j against model i and summarizes the relative evidence in the data supporting M_j as opposed to M_i. Posterior model probabilities can guide us in choosing a single model, but are more naturally used as weights in a pooled analysis (over the whole set of p models), thus formally taking model uncertainty into account. In BKOS, three Erlang distributions and a general truncated Normal distribution for the nonnegative disturbance constituted the class of models under consideration, and their posterior probabilities were used as weights to produce the overall results.
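Once (estimates of) the constants K_j are available, usually on the log scale because they can be extremely small, turning them into posterior model probabilities as in (9) is simple arithmetic; a sketch with our own function name:

```python
import math

def posterior_model_probs(log_consts, prior_probs):
    """Posterior model probabilities (9) from the log K_j and P(M_j);
    subtracting the largest log constant avoids underflow."""
    mx = max(log_consts)
    scaled = [math.exp(lk - mx) * p for lk, p in zip(log_consts, prior_probs)]
    total = sum(scaled)
    return [s / total for s in scaled]
```

The Bayes factor of M_1 against M_2 is exp(log K_1 − log K_2), so with equal prior probabilities a log-constant difference of one gives posterior odds of e to one.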
Using g(θ; data) as a generic notation for any function of interest, we can represent our Bayesian inference problem as the ratio of integrals

E[g(θ; data) | data] = ∫ g(θ; data) p(θ) L(θ | data) dθ / ∫ p(θ) L(θ | data) dθ.   (10)

The main numerical difficulty amounts to evaluating this ratio. In the case of the stochastic frontier model, the likelihood is too complex to analytically calculate any such posterior summary. Numerical integration methods are unavoidable. Most quantities of interest, such as moments of the parameters or of functions of the parameters, probabilities of certain regions for the parameters, etc., can be expressed as expectations of some g(·; ·) in (10). The integrating constant of the posterior, which is crucial for the calculation of Bayes factors, as mentioned above, exactly corresponds to the integral in the denominator of (10).
Note that traditional quadratures, like Cartesian product rules, are not helpful because they are feasible only when the dimension of θ, equal to k + m + 1, is very small. Iterative quadrature, as discussed in Naylor and Smith (1982), combined with a judicious choice of parameterization, has been known to handle problems up to 5 or 6 dimensions. However, our model is inherently non-standard and the dimension of θ is already 5 in the simplest case of the CED model (m = 1) with a Cobb-Douglas frontier depending on β = (β_1, β_2, β_3)′ (k = 3), and will be much higher for more complicated models. This effectively renders this type of numerical integration procedure quite useless.
Another interesting approach is based on Laplace's method to approximate the integrals in (10). Tierney and Kadane (1986) show how the ratio form of (10) can be exploited in applying this method. This method is quite inexpensive to implement as it requires no generation of drawings from the posterior, but it relies crucially upon properties of the integrand (such as unimodality) and quickly becomes problematic in multidimensional settings. As a consequence, our stochastic frontier problems do not lend themselves easily to analysis through such approximations.
A straightforward Monte Carlo method would be to generate i.i.d. drawings from the posterior distribution and use them to calculate empirical sampling counterparts of the integrals in (10). Such a Direct Monte Carlo integration would be fairly easy to implement, but requires the ability to draw from the posterior distribution. The latter is often not a trivial matter, and certainly not feasible for the class of models treated here. Ley and Steel (1992) use Rejection Sampling (see e.g. Devroye (1986) or Smith and Gelfand (1992)) to draw directly from the posterior for a version of our model with no measurement error v_ti in (1). However, when measurement error is allowed for, these rejection methods can no longer be used.
An important issue in conducting parametric inference is the choice of parameterization. Hills and Smith (1992) argue that the adequacy and efficiency of numerical and analytical techniques for Bayesian inference can depend on the parameterization of the problem. These issues will not be investigated in the present paper.
In the next sections, we will discuss in more detail two numerical integration procedures that have successfully been used for a Bayesian analysis of the type of stochastic frontier models introduced in Section 1. In particular, these methods are Monte Carlo-Importance Sampling integration and Markov Chain Monte Carlo methods.
3. Monte Carlo with Importance Sampling
The Monte Carlo-Importance Sampling (MC-IS) approach addresses the calculation of the generic integral

I = ∫ g(θ; data) p(θ) L(θ | data) dθ = ∫ A(θ) dθ.   (11)

Given an importance function s(θ), i.e. a density which is easy to draw from and whose support contains that of A(θ), we can write

I = ∫ A(θ) dθ = ∫ [A(θ)/s(θ)] s(θ) dθ = E_s[A(θ)/s(θ)],   (12)

where E_s denotes expectation with respect to s, and we estimate I by the sample mean

Î = (1/M) Σ_{l=1}^M A(θ^(l))/s(θ^(l))   (13)
based on M independent drawings θ^(1), ..., θ^(M) from s(θ). The ratio of integrals in (10) is estimated by the appropriate ratio of sample means. The asymptotic variance of the latter ratio (as M tends to infinity) can be estimated from the same sample (θ^(1), ..., θ^(M)). For theoretical details see e.g. O'Hagan (1994, pp. 223-224) and Geweke (1989). An alternative way of interpreting I in (11) is the following:

I = ∫ g(θ; data) [p(θ) L(θ | data)/s(θ)] s(θ) dθ = E_s[g(θ; data) p(θ) L(θ | data)/s(θ)],   (14)
which immediately shows us that the sample estimate of I in (13) can also be written as

Î = (1/M) Σ_{l=1}^M g(θ^(l); data) w(θ^(l)),   (15)

where w(θ) = p(θ) L(θ | data)/s(θ) indicates the weight attached to each drawing. Note that through choosing g(θ; data) ≡ 1, we approximate K_j in (9), i.e. the integrating constant, by the sample average of the weights. Typically, this integrating constant of the posterior will not be known, so that we only have the posterior kernel at our disposal. This essentially means that we will need to evaluate the ratio of integrals in (10) in order to obtain a correctly scaled Monte Carlo estimate of the particular g(θ; data) we have in mind.
To achieve convergence in practice, it is imperative that we avoid having wildly varying weights w(θ^(l)); in particular, we want to avoid having one or two drawings that completely dominate the sample average in (15). Thus, we choose the importance function with thick enough tails, so that the ratio of the posterior to the importance function does not blow up if we happen to draw far out in the tails. Usually, a multivariate Student t density with a small number of degrees of freedom (like a Cauchy density), centered around the maximum likelihood estimate, will do.
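The mechanics of (13) and (15) can be illustrated on a deliberately simple toy problem (our own example, not the frontier model): a univariate Normal posterior kernel paired with a Cauchy importance function, i.e. a Student t with one degree of freedom, centered at the posterior mode as suggested above. The self-normalized ratio of weighted sample means estimates E[g(θ) | data]:

```python
import math
import random

def mc_is(log_kernel, draw_s, log_s, g, M, seed=0):
    """Self-normalized MC-IS estimate of E[g(theta) | data]: the ratio of
    the sample means of g(theta) * w(theta) and of the weights w(theta),
    with w = posterior kernel / importance density, as in (13)/(15)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(M):
        th = draw_s(rng)
        w = math.exp(log_kernel(th) - log_s(th))
        num += g(th) * w
        den += w
    return num / den

# toy posterior kernel: N(2, 1); Cauchy importance function centered at 2
log_kernel = lambda t: -0.5 * (t - 2.0) ** 2
draw_s = lambda rng: 2.0 + math.tan(math.pi * (rng.random() - 0.5))
log_s = lambda t: -math.log(math.pi * (1.0 + (t - 2.0) ** 2))
```

Because the Cauchy tails dominate the Normal tails, the weights are bounded and no single drawing can dominate the average, which is exactly the property argued for above.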
BKOS presented an application of MC-IS to simple 7- and 8-parameter stochastic frontier cost models for cross-section data (T = 1) and found this numerical method feasible, but very time-consuming and requiring a lot of effort on the part of the researcher. First, finding (and fine-tuning!) a good importance function was not a trivial task. Second, even about 150,000 drawings, obtained in a typical overnight run on a PC, did not lead to smooth density plots, as a result of some very large weights. Those experiences were not promising for highly dimensional problems. Perhaps adopting a different parameterization (see the discussion in Section 2) could improve matters, but a different method, requiring much less judgmental user input, is readily available. Using the same models and data, Koop, Steel and Osiewalski (1995) showed that Gibbs sampling, a chief competitor to MC-IS, is superior to the latter method both in terms of the total computing time (required to obtain reliable final results) and the effort of the researcher. We will discuss Gibbs sampling in the next section.
It is worth noting that the actual implementation of MC-IS presented by BKOS did not rely on analytical formulae for features of the conditional distribution of the z_ti's given the data and parameters [the strategy of using knowledge of the conditional properties is often referred to as an application of the Rao-Blackwell Theorem; see e.g. Gelfand and Smith (1990)]. Instead of evaluating expressions like (5) for the density plot or (6) and (7) for moments at each draw θ^(l), z_ti^(l) was randomly drawn from the conditional density p(z_ti | y_ti, x_ti, w_ti, θ = θ^(l)), and the posterior summaries were calculated on the basis of the sample (z_ti^(1), ..., z_ti^(M)). However, the increase in dimensionality of the Monte Carlo due to sampling the z_ti's for a few selected firms is not that important because each z_ti is drawn from its actual conditional posterior distribution. In the joint posterior density

p(z_ti, θ | data) = p(z_ti | y_ti, x_ti, w_ti, θ) p(θ | data),

only the marginal density of θ has to be mimicked by the importance function s(θ), which creates the main challenge for MC-IS. The conditional density of z_ti is a univariate Normal density truncated to be nonnegative. Simulation from this density is simple using the algorithm described in Geweke (1991). This algorithm is a combination of several different algorithms depending on where the truncation point is. For example, if the truncation point is far in the left tail the algorithm simply draws from the unrestricted Normal, discarding the rare draws which are beyond the truncation point. For other truncation points the algorithm uses rejection methods based on densities which tend to work well in the relevant region.
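A compact version of such a mixed strategy is sketched below (our own code; the tail step uses the shifted-Exponential proposal of Robert (1995), a refinement in the same spirit as Geweke's): naive rejection when the truncation point is not too far out on the standardized scale, and Exponential-based rejection otherwise.

```python
import math
import random

def draw_trunc_normal(m, sigma, rng):
    """Draw z ~ N(m, sigma^2) truncated to z >= 0, switching algorithms
    with the position of the standardized truncation point a = -m/sigma."""
    a = -m / sigma
    if a < 0.5:
        # truncation point near the centre or in the left tail:
        # draw from the untruncated Normal, discard draws below a
        while True:
            x = rng.gauss(0.0, 1.0)
            if x >= a:
                return m + sigma * x
    # truncation point deep in the right tail: rejection sampling with
    # a shifted Exponential proposal of optimal rate (Robert, 1995)
    lam = 0.5 * (a + math.sqrt(a * a + 4.0))
    while True:
        x = a + rng.expovariate(lam)
        if rng.random() <= math.exp(-0.5 * (x - lam) ** 2):
            return m + sigma * x
```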
4. Gibbs Sampling
Gibbs sampling is a technique for obtaining a sample from the joint distribution of a random vector θ by taking random draws only from the full conditional distributions. A detailed description of the technique can be found in e.g. Casella and George (1992), Gelfand and Smith (1990), and Tierney (1994).
Suppose we are able to partition θ into p subvectors, θ = (θ_1′, ..., θ_p′)′, such that we can draw from the full conditional distribution of each θ_r given all the remaining subvectors. Pass q + 1 of the Gibbs sampler then consists of the following steps:

θ_1^(q+1) is drawn from p(θ_1 | θ_2 = θ_2^(q), ..., θ_p = θ_p^(q)),
θ_2^(q+1) is drawn from p(θ_2 | θ_1 = θ_1^(q+1), θ_3 = θ_3^(q), ..., θ_p = θ_p^(q)),
...
θ_p^(q+1) is drawn from p(θ_p | θ_1 = θ_1^(q+1), ..., θ_{p−1} = θ_{p−1}^(q+1)).
Note that each pass consists of p steps, i.e. drawings of the p subvectors of θ. The starting point, θ^(0), is arbitrary. Under certain general conditions [irreducibility and aperiodicity, as described in e.g. Tierney (1994)], the distribution of θ^(q) converges to the joint distribution, p(θ), as q tends to infinity. Thus, we draw a sample directly from the joint distribution in an asymptotic sense. In practical applications we have to discard a (large) number of passes before convergence is reached.
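The pass structure can be seen at work on a toy target (our own example) for which the full conditionals are known exactly: a bivariate Normal with unit variances and correlation ρ, where θ_1 | θ_2 ~ N(ρθ_2, 1 − ρ²) and vice versa:

```python
import random

def gibbs_bivariate_normal(rho, n_passes, burn_in, seed=0):
    """Gibbs sampler for a bivariate Normal with correlation rho: each
    pass draws th1 from p(th1 | th2) and then th2 from p(th2 | th1),
    and the first burn_in passes are discarded, as described above."""
    rng = random.Random(seed)
    th1 = th2 = 0.0                     # arbitrary starting point
    sd = (1.0 - rho * rho) ** 0.5
    kept = []
    for q in range(n_passes):
        th1 = rng.gauss(rho * th2, sd)  # step 1 of the pass
        th2 = rng.gauss(rho * th1, sd)  # step 2 of the pass
        if q >= burn_in:
            kept.append((th1, th2))
    return kept
```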
In contrast to the MC-IS approach, we now no longer have independent drawings from the posterior, but the Markovian structure of the sampler induces serial correlation between the drawings. This will, in itself, tend to reduce the numerical efficiency of the Gibbs method [see Geweke (1992) for a formal evaluation of numerical efficiency based on spectral analysis]. However, variation of the weights has a similar negative effect on the efficiency of the MC-IS method, and the Gibbs sampler does not require a judicious choice of an importance function s(θ). Gibbs sampling is, in a sense, more automatic and less dependent upon critical input from the user. As the drawings in Gibbs sampling are (asymptotically) from the actual posterior distribution, which is properly normalized, there is no need to evaluate the integrating constant K_j separately. This has two consequences: we do not require the evaluation of a ratio of integrals as in (10) in order to normalize our estimates of g(θ; data), but, on the other hand, the Gibbs sampler does not easily lend itself to approximating K_j, which is naturally obtained by integrating the likelihood with the prior, as in (9). This makes the calculation of Bayes factors, used in assessing posterior model probabilities (see Section 2), less immediate than in the MC-IS context. However, a number of solutions based on the harmonic mean of the likelihood function are presented in Newton and Raftery (1994) and in Gelfand and Dey (1994). Chib (1995) proposes an entirely different approach based on a simple identity for K_j.
In order to efficiently use Gibbs sampling to make posterior inferences on both the parameters and firm efficiencies, we have to consider the joint posterior density of θ and z, p(θ, z | data), where z is a TN × 1 vector of all the z_ti's. Now, instead of integrating out z, which was shown to lead to a very nonstandard likelihood function in Section 1, we shall consider θ given z and the data, which is quite easy to deal with. On the other hand, this implies that we also need to include z in the Gibbs sampler. The underlying idea of adding random variables to the sampler in order to obtain simple conditional distributions is often called Data Augmentation, after Tanner and Wong (1987). Note that the dimension is then NT + k + m + 1, greater than the number of observations. Despite this high dimensionality, the steps involved in the Gibbs sampler are very easy to implement, and the resulting sampler is found in Koop, Steel and Osiewalski (1995) to have very good numerical properties, and to be far preferable to Monte Carlo with Importance Sampling in the particular application used. The conditional posterior density p(z | data, θ) is the product of the TN independent truncated Normal densities given by (5), so we can very easily draw the z_ti's given the data and the parameters. These draws are immediately transformed into efficiency indicators defined as r_ti = exp(−z_ti). Thus, this NT-dimensional step of each pass of our Gibbs sampler is quite simple. It is worth stressing at this stage that the Gibbs sampler, unlike the Monte Carlo approach outlined in BKOS, yields a draw of the whole vector z at each pass, and that for this reason, the efficiency measures for all N firms and all T periods are obtained as a by-product of our Gibbs sampling methodology.
However, the main difference with respect to MC-IS is in drawing θ. Now, the unwieldy form of the marginal posterior, p(θ | data), is not used at all, and we focus instead on the conditional posterior densities of subvectors of θ given the remaining parameters and z. Given z, the frontier parameters (β, σ²) are independent of (φ_1, ..., φ_m) and can be treated as the parameters of the (linear or nonlinear) Normal regression model in (1). Thus we obtain the following full conditionals for σ^{−2} and β:

p(σ^{−2} | data, z, β) = f_G(σ^{−2} | (n_0 + TN)/2, (1/2){s_0 + Σ_{t,i} [y_ti + z_ti − h(x_ti, β)]²}),   (16)

p(β | data, z, σ²) ∝ f(β) exp{−(1/(2σ²)) Σ_{t,i} [y_ti + z_ti − h(x_ti, β)]²}.   (17)
The full conditional posterior densities of the φ_j (j = 1, ..., m) have the general form

p(φ_j | data, z, φ_(−j)) ∝ f_G(φ_j | a_j + Σ_{t,i} w_tij, g_j) exp{−Σ_{t,i} z_ti φ_j^{w_tij} D_tij},   (18)

where

D_tir = ∏_{j≠r} φ_j^{w_tij} for r = 1, ..., m (D_ti1 = 1 when m = 1),

and φ_(−j) denotes φ without its j-th element. Since w_ti1 = 1, the full conditional of φ_1 is just Gamma, with parameters a_1 + NT and g_1 + Σ_{t,i} z_ti D_ti1.
Depending on the form of the frontier and on the values of the w_tij's for j ≥ 2, the full conditionals for β and for φ_j (j = 2, ..., m) can be quite easy or very difficult to draw from. Drawing from nonstandard conditional densities within the Gibbs sampler requires special techniques, like rejection methods or the Metropolis-Hastings algorithm [see e.g. Tierney (1994) or O'Hagan (1994)]. Alternatively, lattice-based methods such as the Griddy Gibbs sampler as described in Ritter and Tanner (1994) can be used. In the context of stochastic frontiers, Koop, Steel and Osiewalski (1995) use rejection sampling to deal with non-Exponential Erlang inefficiency distributions for z_ti, and KOS (1994) apply an independence chain Metropolis-Hastings algorithm (see Tierney, 1994) to handle a complicated flexible form of h(x_ti, β). Especially the latter procedure implies a substantial added complexity in the numerical integration and requires additional input from the user, not unlike the choice of an importance function in MC-IS. Therefore, we stress two important special cases where considerable simplifications are possible:
(i) linearity of the frontier,
(ii) 0-1 dummies for the w_tij (j = 2, ..., m).
If h(x_ti, β) = x_ti β, then (17) is a k-variate Normal density, possibly truncated due to regularity conditions. That is, we have

p(β | data, z, σ²) ∝ f(β) f_N^k(β | β̂, σ²(X′X)^{−1}),   (19)

where

β̂ = (X′X)^{−1} X′(y + z),

and y and X denote an NT × 1 vector of the y_ti's and an NT × k matrix with the x_ti's as rows, respectively. Cobb-Douglas or translog frontiers serve as examples of linearity in β; see Koop, Steel and Osiewalski (1995) and KOS (1997a,c). It is worth stressing that linearity in all elements of β is not essential for obtaining tractable conditionals. Suppose that β = (β_1′, β_2′)′ and the frontier is linear in β_1 given β_2. Then the full conditional of β_1 alone is multivariate (truncated) Normal. Even if the conditional of β_2 is not standard, its dimension is smaller than that of β. Of course, a bilinear frontier (linear in β_1 given β_2 and in β_2 given β_1) leads to (truncated) Normal conditional densities for both subvectors. Bilinear frontiers are thus quite easy to handle, as evidenced by KOS (1997b), who consider a Cobb-Douglas production frontier with simple parametric effective factor corrections.
The dichotomous (0-1) character of the variables explaining efficiency differences (w_tij; j = 2, ..., m) greatly simplifies (18), which then becomes a Gamma density:

p(φ_j | data, z, φ_(−j)) = f_G(φ_j | a_j + Σ_{t,i} w_tij, g_j + Σ_{t,i} w_tij z_ti D_tij).   (20)

From the purely numerical perspective, it thus pays to dichotomize original variables which are not 0-1 dummies.
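Putting the pieces together for the simplest case, a linear frontier with a CED (m = 1), one pass of the sampler needs only standard draws. The sketch below is our own illustration: hyperparameter values are arbitrary (with g1 = −ln(0.8), i.e. prior median efficiency 0.8), no regularity truncation is imposed in the β step, and the truncated Normal draw reuses the mixed rejection strategy discussed at the end of Section 3:

```python
import numpy as np

def draw_trunc_normal(m, sigma, rng):
    # z ~ N(m, sigma^2) truncated to [0, inf); naive rejection near the
    # centre, shifted-Exponential rejection deep in the right tail
    a = -m / sigma
    if a < 0.5:
        while True:
            x = rng.standard_normal()
            if x >= a:
                return m + sigma * x
    lam = 0.5 * (a + np.sqrt(a * a + 4.0))
    while True:
        x = a + rng.exponential(1.0 / lam)
        if rng.uniform() <= np.exp(-0.5 * (x - lam) ** 2):
            return m + sigma * x

def gibbs_pass(y, X, beta, sigma2, phi1, rng,
               n0=1.0, s0=1.0, a1=1.0, g1=-np.log(0.8)):
    """One pass for the linear CED model: z from (5), sigma^{-2} from
    (16), beta from (19) without truncation, and phi1 from its Gamma
    full conditional with parameters a1 + NT and g1 + sum of the z_ti."""
    n = len(y)
    sigma = np.sqrt(sigma2)
    m = X @ beta - y - sigma2 * phi1      # m_ti, with lambda_ti = 1/phi1
    z = np.array([draw_trunc_normal(mi, sigma, rng) for mi in m])
    resid = y + z - X @ beta
    sigma2 = 1.0 / rng.gamma((n0 + n) / 2.0, 2.0 / (s0 + resid @ resid))
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ (y + z))  # as in (19)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    phi1 = rng.gamma(a1 + n, 1.0 / (g1 + z.sum()))
    return z, beta, sigma2, phi1
```

Iterating gibbs_pass and discarding an initial burn-in yields draws of (β, σ², φ_1) and, as a by-product, of all firm efficiencies exp(−z_ti) at once.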
The above discussion confirms that the stochastic frontier model considered in this paper can be analyzed using Gibbs sampling methods. That is, even though the marginal posteriors of θ and the z_ti's are unwieldy, the conditionals for a suitable partition of the set of parameters should be much easier to work with. By taking a long enough sequence of successive draws from the conditional posterior densities, each conditional on previous draws from the other conditional densities, we can create a sample which can be treated as coming from the joint posterior distribution.
The posterior expectation of any arbitrary function of interest, g(θ, z; data), can be approximated by its sample mean, ḡ