
Preface

As with the earlier volumes in this series, the main purpose of this volume of the Handbook of Statistics is to serve as a source reference and teaching supplement to courses in empirical finance. Many graduate students and researchers in the finance area today use sophisticated statistical methods, but there is as yet no comprehensive reference volume on this subject. The present volume is intended to fill this gap.

The first part of the volume covers the area of asset pricing. In the first paper, Ferson and Jagannathan present a comprehensive survey of the literature on econometric evaluation of asset pricing models. The next paper by Harvey and Kirby discusses the problems of instrumental variable estimation in latent variable models of asset pricing. The next paper by Lehmann reviews semi-parametric methods in asset pricing models. Chapter 23 by Shanken also falls in the category of asset pricing.

Part II of the volume, on the term structure of interest rates, consists of only one paper, by Pagan, Hall and Martin. The paper surveys both the econometric and finance literature in this area, and shows some similarities and divergences between the two approaches. The paper also documents several stylized facts in the data that prove useful in assessing the adequacy of the different models.

Part III of the volume deals with different aspects of volatility. The first paper by Ghysels, Harvey and Renault presents a comprehensive survey on the important topic of stochastic volatility models. These models have their roots both in mathematical finance and financial econometrics and are an attractive alternative to the popular ARCH models. The next paper by LeRoy presents a critical review of the literature on variance-bounds tests for market efficiency. The third paper by Palm, on GARCH models of stock price volatility, surveys some more recent developments in this area. Several surveys on the ARCH models have appeared in the literature and these are cited in the paper. The paper surveys developments since the appearance of these surveys.

Part IV of the volume deals with prediction problems. The first paper by Diebold and Lopez deals with the statistical methods of evaluation of forecasts. The second paper by Kaul reviews the literature on the predictability of stock returns. This area has always fascinated those involved in making money in financial markets as well as academics who presumably are interested in studying whether one can, in fact, make money in the financial markets. The third paper by Lahiri reviews statistical



evidence on interest rate spreads as predictors of business cycles. Since there is not much of a literature to survey in this area, Lahiri presents some new results.

Part V of the volume deals with alternative probabilistic models in finance. The first paper by Brock and de Lima surveys several areas subsumed under the rubric "complexity theory." This includes chaos theory, nonlinear time series models, long memory models and models with asymmetric information. The next paper by Cameron and Trivedi surveys the area of count data models in finance. In some financial studies, the dependent variable is a count, taking non-negative integer values. The next paper by McCulloch surveys the literature on stable distributions. This area was very active in finance in the early 60's due to the work by Mandelbrot, but since then has not received much attention until recently, when interest in stable distributions has revived. The last paper by McDonald reviews the variety of probability distributions which have been and can be used in the statistical analysis of financial data.

Part VI deals with the application of specialized statistical methods in finance. This part covers important statistical methods that are of general applicability (to all the models considered in the previous sections) and not covered adequately in the other chapters. The first paper by Maddala and Li covers the area of bootstrap methods. The second paper by Rao covers the area of principal component and factor analyses, which has, during recent years, been widely used in financial research, particularly in arbitrage pricing theory (APT). The third paper by Maddala and Nimalendran reviews the area of errors in variables models as applied to finance. Almost all variables in finance suffer from the errors in variables problem. The fourth paper by Qi surveys the applications of artificial neural networks in financial research. These are general nonparametric nonlinear models. The final paper by Maddala reviews the applications of limited dependent variable models in financial research.

Part VII of the volume contains surveys of miscellaneous other problems. The first paper by Bates surveys the literature on testing option pricing models. The next paper by Evans discusses what are known in the financial literature as "peso problems." The next paper by Hasbrouck covers market microstructure, which is an active area of research in finance. The paper discusses the time series work in this area. The final paper by Shanken gives a comprehensive survey of tests of portfolio efficiency.

One important area left out has been the use of Bayesian methods in finance. In principle, all the problems discussed in the several chapters of the volume can be analyzed from the Bayesian point of view. Much of this work remains to be done.

Finally, we would like to thank Ms. Jo Ducey for her invaluable help at several stages in the preparation of this volume and patient assistance in seeing the manuscript through to publication.

G. S. Maddala
C. R. Rao

Contributors

D. S. Bates, Department of Finance, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA (Ch. 20)
W. A. Brock, Department of Economics, University of Wisconsin, Madison, WI 53706, USA (Ch. 11)
A. C. Cameron, Department of Economics, University of California at Davis, Davis, CA 95616-8578, USA (Ch. 12)
P. J. F. de Lima, Department of Economics, The Johns Hopkins University, Baltimore, MD 21218, USA (Ch. 11)
F. X. Diebold, Department of Economics, University of Pennsylvania, Philadelphia, PA 19104, USA (Ch. 8)
M. D. D. Evans, Department of Economics, Georgetown University, Washington, DC 20057-1045, USA (Ch. 21)
W. E. Ferson, Department of Finance, University of Washington, Seattle, WA 98195, USA (Ch. 1)
E. Ghysels, Department of Economics, The Pennsylvania State University, University Park, PA 16802 and CIRANO (Centre interuniversitaire de recherche en analyse des organisations), Université de Montréal, Montréal, Quebec, Canada H3A 2A5 (Ch. 5)
A. D. Hall, School of Business, Bond University, Gold Coast, QLD 4229, Australia (Ch. 4)
A. C. Harvey, Department of Statistics, London School of Economics, Houghton Street, London WC2A 2AE, UK (Ch. 5)
C. R. Harvey, Department of Finance, Fuqua School of Business, Box 90120, Duke University, Durham, NC 27708-0120, USA (Ch. 2)
J. Hasbrouck, Department of Finance, Stern School of Business, 44 West 4th Street, New York, NY 10012-1126, USA (Ch. 22)
R. Jagannathan, Finance Department, School of Business and Management, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (Ch. 1)
G. Kaul, University of Michigan Business School, Ann Arbor, MI 48109-1234 (Ch. 9)
C. M. Kirby, Department of Finance, College of Business & Mgm., University of Maryland, College Park, MD 20742, USA (Ch. 2)
K. Lahiri, Department of Economics, State University of New York at Albany, Albany, NY 12222, USA (Ch. 10)



B. N. Lehmann, Graduate School of International Relations, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0519, USA (Ch. 3)
S. F. LeRoy, Department of Economics, University of California at Santa Barbara, Santa Barbara, CA 93106-9210 (Ch. 6)
H. Li, Department of Management Science, The Chinese University of Hong Kong, 302 Leung Kau Kui Building, Shatin, NT, Hong Kong (Ch. 15)
J. A. Lopez, Department of Economics, University of Pennsylvania, Philadelphia, PA 19104, USA (Ch. 8)
G. S. Maddala, Department of Economics, Ohio State University, 1945 N. High Street, Columbus, OH 43210-1172, USA (Chs. 15, 17, 19)
V. Martin, Department of Economics, University of Melbourne, Parkville, VIC 3052, Australia (Ch. 4)
J. H. McCulloch, Department of Economics and Finance, 410 Arps Hall, 1945 N. High Street, Columbus, OH 43210-1172, USA (Ch. 13)
J. B. McDonald, Department of Economics, Brigham Young University, Provo, UT 84602, USA (Ch. 14)
M. Nimalendran, Department of Finance, College of Business, University of Florida, Gainesville, FL 32611, USA (Ch. 17)
A. R. Pagan, Economics Program, RSSS, Australian National University, Canberra, ACT 0200, Australia (Ch. 4)
F. C. Palm, Department of Quantitative Economics, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands (Ch. 7)
M. Qi, Department of Economics, College of Business Administration, Kent State University, P.O. Box 5190, Kent, OH 44242 (Ch. 18)
C. R. Rao, The Pennsylvania State University, Center for Multivariate Analysis, Department of Statistics, 325 Classroom Bldg., University Park, PA 16802-6105, USA (Ch. 16)
E. Renault, Institut d'Economie Industrielle, Université des Sciences Sociales, Place Anatole France, F-31042 Toulouse Cedex, France (Ch. 5)
J. Shanken, Department of Finance, Simon School of Business, University of Rochester, Rochester, NY 14627, USA (Ch. 23)
P. K. Trivedi, Department of Economics, Indiana University, Bloomington, IN 47405-6620, USA (Ch. 12)
J. G. Wang, AT&T, Rm. N460-WOS, 412 Mt. Kemble Avenue, Morristown, NJ 07960, USA (Ch. 10)

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14. 1996 Elsevier Science B.V. All rights reserved.

Econometric Evaluation of Asset Pricing Models*

Wayne E. Ferson and Ravi Jagannathan

We provide a brief review of the techniques that are based on the generalized method of moments (GMM) and used for evaluating capital asset pricing models. We first develop the CAPM and multi-beta models and discuss the classical two-stage regression method originally used to evaluate them. We then describe the pricing kernel representation of a generic asset pricing model; this representation facilitates use of the GMM in a natural way for evaluating the conditional and unconditional versions of most asset pricing models. We also discuss diagnostic methods that provide additional insights.

1. Introduction
A major part of the research effort in finance is directed toward understanding why we observe a variety of financial assets with different expected rates of return. For example, the U.S. stock market as a whole earned an average annual return of 11.94% during the period from January of 1926 to the end of 1991. U.S. Treasury bills, in contrast, earned only 3.64%. The inflation rate during the same period was 3.11% (see Ibbotson Associates 1992). To appreciate the magnitude of these differences, note that in 1926 a nice dinner for two in New York would have cost about $10. If the same $10 had been invested in Treasury bills, by the end of 1991 it would have grown to $110, still enough for a nice dinner for two. Yet $10 invested in stocks would have grown to $6,756. The point is that the average return differentials among financial assets are both substantial and economically important. A variety of asset pricing models have been proposed to explain this phenomenon. Asset pricing models describe how the price of a claim to a future payoff is determined in securities markets. Alternatively, we may view asset

* Ferson acknowledges financial support from the Pigott-PACCAR Professorship at the University of Washington. Jagannathan acknowledges financial support from the National Science Foundation, grant SBR-9409824. The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System.


pricing models as describing the expected rates of return on financial assets, such as stocks, bonds, futures, options, and other securities. Differences among the various asset pricing models arise from differences in their assumptions that restrict investors' preferences, endowments, production, and information sets; the stochastic process governing the arrival of news in the financial markets; and the type of frictions allowed in the markets for real and financial assets. While there are differences among asset pricing models, there are also important commonalities. All asset pricing models are based on one or more of three central concepts. The first is the law of one price, according to which the prices of any two claims which promise the same future payoff must be the same. The law of one price arises as an implication of the second concept, the no-arbitrage principle. The no-arbitrage principle states that market forces tend to align the prices of financial assets to eliminate arbitrage opportunities. Arbitrage opportunities arise when assets can be combined, by buying and selling, to form portfolios that have zero net cost, no chance of producing a loss, and a positive probability of gain. Arbitrage opportunities tend to be eliminated by trading in financial markets, because prices adjust as investors attempt to exploit them. For example, if there is an arbitrage opportunity because the price of security A is too low, then traders' efforts to purchase security A will tend to drive up its price. The law of one price follows from the no-arbitrage principle, when it is possible to buy or sell two claims to the same future payoff. If the two claims do not have the same price, and if transaction costs are smaller than the difference between their prices, then an arbitrage opportunity is created. The arbitrage pricing theory (APT, Ross 1976) is one of the most well-known asset pricing models based on arbitrage principles. The third central concept behind asset pricing models is financial market equilibrium. Investors' desired holdings of financial assets are derived from an optimization problem. A necessary condition for financial market equilibrium in a market with no frictions is that the first-order conditions of the investors' optimization problem be satisfied. This requires that investors be indifferent at the margin to small changes in their asset holdings. Equilibrium asset pricing models follow from the first-order conditions for the investors' portfolio choice problem and from a market-clearing condition. The market-clearing condition states that the aggregate of investors' desired asset holdings must equal the aggregate "market portfolio" of securities in supply. The earliest of the equilibrium asset pricing models is the Sharpe-Lintner-Mossin-Black capital asset pricing model (CAPM), developed in the early 1960s. The CAPM states that expected asset returns are given by a linear function of the assets' betas, which are their regression coefficients against the market portfolio. Merton (1973) extended the CAPM, which is a single-period model, to an economic environment where investors make consumption, savings, and investment decisions repetitively over time. Econometrically, Merton's model generalizes the CAPM from a model with a single beta to one with multiple betas. A multiple-beta model states that assets' expected returns are linear functions of a number of betas. The APT of Ross (1976) is another example of a multiple-beta


asset pricing model, although in the APT the expected returns are only approximately a linear function of the relevant betas. In this paper we emphasize (but not exclusively) the econometric evaluation of asset pricing models using the generalized method of moments (GMM, Hansen 1982). We focus on the GMM because, in our opinion, it is the most important innovation in empirical methods in finance within the past fifteen years. The approach is simple, flexible, valid under general statistical assumptions, and often powerful in financial applications. One reason the GMM is "general" is that many empirical methods used in finance and other areas can be viewed as special cases of the GMM. The rest of this paper is organized as follows. In Section 2 we develop the CAPM and multiple-beta models and discuss the classical two-stage regression procedure that was originally used to evaluate these models. This material provides an introduction to the various statistical issues involved in the empirical study of the models; it also motivates the need for multivariate estimation methods. In Section 3 we describe an alternative representation of the asset pricing models which facilitates the use of the GMM. We show that most asset pricing models can be represented in this stochastic discount factor form. In Section 4 we describe the GMM procedure and illustrate how to use it to estimate and test conditional and unconditional versions of asset pricing models. In Section 5 we discuss model diagnostics that provide additional insight into the causes for statistical rejections and that help assess specification errors in the models. In order to avoid a proliferation of symbols, we sometimes use the same symbols to mean different things in different subsections. The definitions should be clear from the context. We conclude with a summary in Section 6.

2. Cross-sectional regression methods for testing beta pricing models


In this section we first derive the CAPM and generalize its empirical specification to include multiple-beta models. We then describe the intuitively appealing cross-sectional regression method that was first employed by Black, Jensen, and Scholes (1972, abbreviated here as BJS) and discuss its shortcomings.

2.1. The capital asset pricing model


The CAPM was the first equilibrium asset pricing model, and it remains one of the foundations of financial economics. The model was developed by Sharpe (1964), Lintner (1965), Mossin (1966), and Black (1972). There are a huge number of theoretical papers which refine the necessary assumptions and provide derivations of the CAPM. Here we provide a brief review of the theory. Let Rit denote one plus the return on asset i during period t, i = 1, 2, ..., N. Let Rmt denote the corresponding gross return for the market portfolio of all assets in the economy. The return on the market portfolio envisioned by the theory is not observable. In view of this, empirical studies of the CAPM commonly assume


that the market return is an exact linear function of the return on an observable portfolio of common stocks.1 Then, according to the CAPM,

E(Rit) = δ0 + δ1βi   (2.1)

where

βi = Cov(Rit, Rmt)/Var(Rmt) .


According to the CAPM, the market portfolio with return Rmt is on the minimum-variance frontier of returns. A return is said to be on the minimum-variance frontier if there is no other portfolio with the same expected return but lower variance. If investors are risk averse, the CAPM implies that Rmt is on the positively sloped portion of the minimum-variance frontier, which implies that the coefficient δ1 > 0. In equation (2.1), δ0 = E(R0t), where the return R0t is referred to as a zero-beta asset to Rmt because of the condition Cov(R0t, Rmt) = 0. To derive the CAPM, assume that investors choose asset holdings at each date t − 1 so as to maximize the following one-period objective function:

V[E(Rpt | I), Var(Rpt | I)]   (2.2)

where Rpt denotes the date t return on the optimally chosen portfolio and E(· | I) and Var(· | I) denote the expectation and variance of return, conditional on the information set I of the investor as of time t − 1. We assume that the function V[·,·] is increasing and concave in its first argument, decreasing in its second argument, and time-invariant. For the moment we assume that the information set I includes only the unconditional moments of asset returns, and we drop the symbol I to simplify the notation. The first-order conditions for the optimization problem given above can be manipulated to show that the following must hold:

E(Rit) = E(R0t) + βip E(Rpt − R0t)   (2.3)

for every asset i = 1, 2, ..., N, where Rpt is the return on the optimally chosen portfolio, R0t is the return on the asset that has zero covariance with Rpt, and βip = Cov(Rit, Rpt)/Var(Rpt).
To get from the first-order condition for an investor's optimization problem, as stated in equation (2.3), to the CAPM, it is useful to understand some of the properties of the minimum-variance frontier, that is, the set of portfolio returns with the minimum variance, given their expected returns. It can be readily verified that the optimally chosen portfolio of the investor is on the minimum-variance frontier. One property of the minimum-variance frontier is that it is closed to portfolio formation. That is, portfolios of frontier portfolios are also on the frontier.
1 When this assumption fails, it introduces market proxy error. This source of error is studied by Roll (1977), Stambaugh (1982), Kandel (1984), Kandel and Stambaugh (1987), Shanken (1987), Hansen and Jagannathan (1994), and Jagannathan and Wang (1996), among others. We will ignore proxy error in our discussion.


Suppose that all investors have the same beliefs. Then every investor's optimally chosen portfolio will be on the same frontier, and hence the market portfolio of all assets in the economy - which is a portfolio of every investor's optimally chosen portfolio - will also be on the frontier. It can be shown (Roll 1977) that equation (2.3) will hold if Rpt is replaced by the return of any portfolio on the frontier and Rot is replaced by its corresponding zero-beta return. Hence we can replace an investor's optimal portfolio in equation (2.3) with the return on the market portfolio to get the CAPM, as given by equation (2.1).

2.2. Testable implications of the CAPM


Given an interesting collection of assets, and if their expected returns and market-portfolio betas βi are known, a natural way to examine the CAPM would be to estimate the empirical relation between the expected returns and the betas and see if that relation is linear. However, neither betas nor expected returns are observed by the econometrician. Both must be estimated. The finance literature first attacked this problem by using a two-step, time-series, cross-sectional approach. Consider the following sample analogue of the population relation given in (2.1):
Ri = δ0 + δ1 bi + ei,   i = 1, ..., N   (2.4)

which is a cross-sectional regression of Ri on bi, with regression coefficients equal to δ0 and δ1. In equation (2.4), Ri denotes the sample average return of asset i, and bi is the (OLS) slope coefficient estimate from a regression of the return, Rit, over time on the market index return, Rmt, and a constant. Let ui = Ri − E(Rit) and vi = βi − bi. Substituting these relations for E(Rit) and βi in (2.1) leads to (2.4) and specifies the composite error as ei = ui + δ1vi. This gives rise to a classic errors-in-variables problem, as the regressor bi in the cross-sectional regression model (2.4) is measured with error. Using finite time-series samples for the estimate of bi, the regression (2.4) will deliver inconsistent estimates of δ0 and δ1, even with an infinite cross-sectional sample. However, the cross-sectional regression will provide consistent estimates of the coefficients as the time-series sample size T (which is used in the first step to estimate the beta coefficient βi) becomes very large. This is because the first-step estimate of βi is consistent, so as T becomes large, the errors-in-variables problem of the second-stage regression vanishes. The measurement error in beta may be large for individual securities, but it is smaller for portfolios. In view of this fact, early research focused on creating portfolios of securities in such a way that the betas of the portfolios could be estimated precisely. Hence one solution to the errors-in-variables problem is to work with portfolios instead of individual securities. This creates another problem. Arbitrarily chosen portfolios tend to exhibit little dispersion in their betas. If all the portfolios available to the econometrician have the same betas, then equation (2.1) has no empirical content as a cross-sectional relation. Black, Jensen, and Scholes (BJS, 1972) came up with an innovative solution to overcome


this difficulty. At every point in time for which a cross-sectional regression is run, they estimate betas on individual securities based on past history, sort the securities based on the estimated values of beta, and assign individual securities to beta groups. This results in portfolios with a substantial dispersion in their betas. Similar portfolio formation techniques have become standard practice in the empirical finance literature. Suppose that we can create portfolios in such a way that we can view the errors-in-variables problem as being of second-order importance. We still have to determine how to assess whether there is empirical support for the CAPM. A standard approach in the literature is to consider specific alternative hypotheses about the variables which determine expected asset returns. According to the CAPM, the expected return for any asset is a linear function of its beta only. Therefore, one natural test would be to examine if any other cross-sectional variable has the ability to explain the deviations from equation (2.1). This is the strategy that Fama and MacBeth (1973) followed by incorporating the square of beta and measures of nonmarket (or residual time-series) variance as additional variables in the cross-sectional regressions. More recent empirical studies have used the relative size of firms, measured by the market value of their equity, the ratio of book-to-market equity, and related variables. 2 For example, the following model may be specified:

E(Rit) = δ0 + δ1βi + δsize LMEi   (2.5)

where LMEi is the natural logarithm of the total market value of the equity capital of firm i. In what follows we will first show that these ideas extend easily to the general multiple-beta model. We will then develop a sampling theory for the cross-sectional regression estimators.
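To fix ideas, here is a minimal sketch of the two-pass procedure just described, written in Python with illustrative array names (R, Rm, and lme are assumed inputs, not objects defined in the chapter): a first-pass time-series regression estimates the betas, and a second-pass cross-sectional regression of average returns on the estimated betas and the size characteristic produces estimates of the coefficients in (2.5).

```python
import numpy as np

def two_pass_regression(R, Rm, lme):
    """Two-pass estimation of equation (2.5).

    R   : T x N matrix of asset (or portfolio) returns
    Rm  : T-vector of market index returns
    lme : N-vector of log market equity (the size characteristic)
    Returns the cross-sectional coefficients (delta_0, delta_1, delta_size).
    """
    T, N = R.shape

    # First pass: time-series OLS of each return on a constant and the market.
    X1 = np.column_stack([np.ones(T), Rm])
    first_pass, *_ = np.linalg.lstsq(X1, R, rcond=None)   # 2 x N coefficients
    b = first_pass[1, :]                                   # estimated betas

    # Second pass: cross-sectional OLS of average returns on the betas and size.
    X2 = np.column_stack([np.ones(N), b, lme])
    gamma, *_ = np.linalg.lstsq(X2, R.mean(axis=0), rcond=None)
    return gamma
```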

2.3. Multiple-beta pricing models and cross-sectional regression methods


According to the CAPM, the expected return on an asset is a linear function of its market beta. A multiple-beta model asserts that the expected return is a linear function of several betas, i.e.,

E(Rit) = δ0 + Σk=1,...,K δk βik   (2.6)

where βik, k = 1, ..., K, are the multiple regression coefficients of the return of asset i on K economy-wide pervasive risk factors, fk, k = 1, ..., K. The coefficient δ0 is the expected return on an asset that has β0k = 0, for k = 1, ..., K; i.e., it is the expected return on a zero- (multiple-) beta asset. The coefficient δk, corresponding to the kth factor, has the following interpretation: it is the expected return differential, or premium, for a portfolio that has βik = 1 and βij = 0 for all j ≠ k,

2 Fama and French (1992) is a prominent recent example of this approach. Berk (1995) provides a justification for using relative market value and book-to-price ratios as measures of expected returns.


measured in excess of the zero-beta asset's expected return. In other words, it is the expected return premium per unit of beta risk for the risk factor, k. Ross (1976) showed that an approximate version of (2.6) will hold in an arbitrage-free economy. Connor (1984) provided sufficient conditions for (2.6) to hold exactly in an economy with an infinite number of assets in general equilibrium. This version of the multiple-beta model, the exact APT, has received wide attention in the finance literature. When the factors, fk, are observed by the econometrician, the cross-sectional regression method can be used to empirically evaluate the multiple-beta model. 3 For example, the alternative hypothesis that the size of the firm is related to expected returns, given the factor betas, may be examined by using cross-sectional regressions of returns on the K factor betas and the LMEi, similar to equation (2.5), and by examining whether the coefficient δsize is different from zero.

2.4. Sampling distributions for coefficient estimators: The two-stage, cross-sectional regression method
In this section we follow Shanken (1992) and Jagannathan and Wang (1993, 1996) in deriving the asymptotic distribution of the coefficients that are estimated using the cross-sectional regression method. For the purposes of developing the sampling theory, we will work with the following generalization of equation (2.6):

E(Rit) = Σk=0,...,K1 γ1k Aik + Σk=1,...,K2 γ2k βik   (2.7)

where {Aik} are observable characteristics of firm i, which are assumed to be measured without error (the first "characteristic," when k = 0, is the constant 1.0). One of the attributes may be the size variable LMEi. The βi are regression betas on a set of K2 economic risk factors, which may include the market index return. Equation (2.7) can be written more compactly using matrix notation as

μ = Xγ   (2.8)

where Rt = [R1t, ..., RNt], μ = E(Rt), X = [A : β], and the definitions of the matrices A and β and the vector γ follow from (2.7). The cross-sectional method proceeds in two stages. First, β is estimated by time-series regressions of Rit on the risk factors and a constant. The estimates are denoted by b. Let x = [A : b], and let R denote the time-series average of the return vector Rt. Let g denote the estimator of the coefficient vector obtained from the following cross-sectional regression:

g = (x'x)^{-1} x'R   (2.9)

3 See Chen (1983), Connor and Korajczyk (1986), Lehmann and Modest (1987), and McElroy and Burmeister (1988) for discussions on estimating and testing the model when the factor realizations are not observable under some additional auxiliary assumptions.


where we assume that x is of rank 1 + K1 + K2. If b and R converge respectively to β and E(Rt) in probability, then g will converge in probability to γ. Black, Jensen, and Scholes (1972) suggest estimating the sampling errors associated with the estimator, g, as follows. Regress Rt on x at each date t to obtain gt, where
gt = (x'x)^{-1} x'Rt .   (2.10)

The BJS estimate of the covariance matrix of T^{1/2}(g − γ) is given by

v = T^{-1} Σt (gt − g)(gt − g)'   (2.11)

which uses the fact that g is the sample mean of the gt's. Substituting the expression for gt given in (2.10) into the expression for v given in (2.11) gives

v = (x'x)^{-1} x'[T^{-1} Σt (Rt − R)(Rt − R)'] x (x'x)^{-1} .   (2.12)
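The following sketch implements the calculations in equations (2.10)–(2.12), assuming a T x N return matrix R and an N x q regressor matrix x of characteristics and first-stage beta estimates are already available; the names are ours and the code only illustrates the algebra.

```python
import numpy as np

def bjs_coefficients_and_cov(R, x):
    """Period-by-period coefficients g_t and the BJS covariance estimate.

    R : T x N matrix of asset returns
    x : N x q matrix of regressors (characteristics and estimated betas)
    Returns the time-series average g and the matrix v of equation (2.11).
    """
    T = R.shape[0]
    proj = np.linalg.solve(x.T @ x, x.T)   # (x'x)^{-1} x'
    g_t = R @ proj.T                       # T x q; row t is g_t of equation (2.10)
    g_bar = g_t.mean(axis=0)
    dev = g_t - g_bar
    v = dev.T @ dev / T                    # equation (2.11); v/T estimates Var(g_bar)
    return g_bar, v
```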

To analyze the BJS covariance matrix estimator, we write the average return vector, R, as

R = xγ + (R − μ) − (x − X)γ .   (2.13)

Substitute this expression for R into the expression for g in (2.9) to obtain

g − γ = (x'x)^{-1} x'[(R − μ) − (b − β)γ2] .   (2.14)

Assume that b is a consistent estimate of β and that T^{1/2}(R − μ) →a u and T^{1/2}(b − β) →a h, where u and h are random variables with well-defined distributions and →a indicates convergence in distribution. We then have

T^{1/2}(g − γ) →a (x'x)^{-1} x'u − (x'x)^{-1} x'hγ2 .   (2.15)

In (2.15) the first term on the right side is that component of the sampling error that arises from replacing μ by the sample average R. The second term is the component of the sampling error that arises due to replacing β by its estimate b. The usual consistent estimate of the asymptotic variance of u is given by

T^{-1} Σt (Rt − R)(Rt − R)' .   (2.16)

Therefore, a consistent estimate of the variance of the first term in (2.15) is given by

(x'x)^{-1} x'[T^{-1} Σt (Rt − R)(Rt − R)'] x (x'x)^{-1} ,

which is the same as the expression for the BJS estimate for the covariance matrix of the estimated coefficients, v, given in (2.12). Hence if we ignore the sampling error that arises from using estimated betas, then the BJS covariance estimator


provides a consistent estimate of the variance of the estimator g. However, if the sampling error associated with the betas is not small, then the BJS covariance estimator will have a bias. While it is not possible to determine the magnitude of the bias in general, Shanken (1992) provides a method to assess the bias under additional assumptions. 4 Consider the following univariate time-series regression for the return of asset i on a constant and the kth economic factor:

Rit = αik + βik fkt + εikt .   (2.17)

We make the following additional assumptions about the error terms in (2.17): (1) the error εikt is mean zero, conditional on the time series of the economic factors fk; (2) the conditional covariance of εikt and εjlt, given the factors, is a fixed constant σijkl. We denote the matrix of the {σijkl}ij by Σkl. Finally, we assume that (3) the sample covariance matrix of the factors exists and converges in probability to a constant positive definite matrix Ω, with typical element Ωkl.

THEOREM 2.1. (Shanken, 1992/Jagannathan and Wang, 1996). T^{1/2}(g − γ) converges in distribution to a normally distributed random variable with zero mean and covariance matrix V + W, where V is the probability limit of the matrix v given in (2.12) and

W = (x'x)^{-1} x' { Σl,k=1,...,K2 γ2k γ2l (Ωkk^{-1} Πkl Ωll^{-1}) } x (x'x)^{-1}   (2.18)

where Πkl is defined in the appendix.

PROOF. See the appendix.

Theorem 2.1 shows that in order to obtain a consistent estimate of the covariance matrix of the BJS two-step estimator g, we first estimate v (a consistent estimate of V) by using the BJS method. We then estimate W by its sample analogue. Although the cross-sectional regression method is intuitively very appealing, the above discussion shows that in order to assess the sampling errors associated with the parameter estimators, we need to make rather strong assumptions. In addition, the econometrician must take a stand on a particular alternative hypothesis against which to reject the model. The general approach developed in Section 4 below has, among its advantages, weaker statistical assumptions and the ability to handle both unspecified as well as specific alternative hypotheses.

4 Shanken (1992) uses betas computed from multiple regressions. The derivation which follows uses betas computed from univariate regressions, for simplicity of exposition. The two sets of betas are related by an invertible linear transformation. Alternatively, the factors may be orthogonalized without loss of generality.



3. Asset pricing models and stochastic discount factors


Virtually all financial asset pricing models imply that any gross asset return Ri,t+1, multiplied by some market-wide random variable mt+1, has a constant conditional expectation:

Et{mt+1 Ri,t+1} = 1,  all i.   (3.1)

The notation Et{·} will be used to denote the conditional expectation, given a market-wide information set. Sometimes it will be convenient to refer to expectations conditional on a subset Zt of the market information, which are denoted as E(· | Zt). For example, Zt can represent a vector of instrumental variables for the public information set which are available to the econometrician. When Zt is the null information set, the unconditional expectation is denoted as E(·). If we take the expected values of equation (3.1), it follows that versions of the same equation must hold for the expectations E(· | Zt) and E(·). The random variable mt+1 has various names in the literature. It is known as a stochastic discount factor, an equivalent martingale measure, a Radon-Nikodym derivative, or an intertemporal marginal rate of substitution. We will refer to an mt+1 which satisfies (3.1) as a valid stochastic discount factor. The motivation for use of this term arises from the following observation. Write equation (3.1) as Pit = Et{mt+1 Xi,t+1}, where Xi,t+1 is the payoff of asset i at time t + 1 (the market value plus any cash payments) and Ri,t+1 = Xi,t+1/Pit. Equation (3.1) says that if we multiply a future payoff Xi,t+1 by the stochastic discount factor mt+1 and take the expected value, we obtain the present value of the future payoff. The existence of an mt+1 that satisfies (3.1) says that all assets with the same payoffs have the same price (i.e., the law of one price). With the restriction that mt+1 is a strictly positive random variable, equation (3.1) becomes equivalent to a no-arbitrage condition. The condition is that all portfolios of assets with payoffs that can never be negative, but are positive with positive probability, must have positive prices. The no-arbitrage condition does not uniquely identify mt+1 unless markets are complete, which means that there are as many linearly independent payoffs available in the securities markets as there are states of nature at date t + 1. To obtain additional insights about the stochastic discount factor and the no-arbitrage condition, assume for the moment that the markets are complete. Given complete markets, positive state prices are required to rule out arbitrage opportunities. 5 Let qts denote the time t price of a security that pays one unit at date t + 1 if, and only if, the state of nature at t + 1 is s. Then the time t price of a

5 See Debreu (1959) and Arrow (1970) for models of complete markets. See Beja (1971), Rubinstein (1976), Ross (1977), Harrison and Kreps (1979), and Hansen and Richard (1987) for further theoretical discussions.



security that promises to pay {Xi,s,t+1} units at date t + 1, as a function of the state of nature s, is given by

Σs qts Xi,s,t+1 = Σs πts (qts/πts) Xi,s,t+1

where πts is the probability, as assessed at time t, that state s occurs at time t + 1. Comparing this expression with equation (3.1) shows that ms,t+1 = qts/πts is the value of the stochastic discount factor in state s, under the assumption that the markets are complete. Since the probabilities are positive, the condition that the random variable defined by {ms,t+1} is strictly positive is equivalent to the condition that all state prices are positive. Equation (3.1) is convenient for developing econometric tests of asset pricing models. Let Rt+1 denote the vector of gross returns on the N assets on which the econometrician has observations. Then (3.1) can be written as

E{Rt+1 mt+1} − 1 = 0   (3.2)

where 1 denotes the N vector of ones and 0 denotes the N vector of zeros. The set of N equations given in (3.2) will form the basis for tests using the generalized method of moments. It is the specific form of mt+1 implied by a model that gives the equation empirical content.
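As a concrete illustration of equation (3.2), the sketch below computes the vector of sample pricing errors for a candidate stochastic discount factor series; R and m are hypothetical inputs, and a valid model should make every element of the output close to zero.

```python
import numpy as np

def sample_pricing_errors(R, m):
    """Sample analogue of E{R_{t+1} m_{t+1}} - 1 in equation (3.2).

    R : T x N matrix of gross returns
    m : T-vector of realizations of a candidate stochastic discount factor
    """
    return (R * m[:, None]).mean(axis=0) - 1.0
```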

3.1. Stochastic discount factor representations of the CAPM and multiple-beta asset pricing models
Consider the CAPM, as given by equation (2.1):

E(Rit+1) = δ0 + δ1βi

where

βi = Cov(Rit+1, Rmt+1)/Var(Rmt+1) .

The CAPM can also be expressed in the form of equation (3.1), with a particular specification of the stochastic discount factor. To see this, expand the expected product in (3.1) into the product of the expectations plus the covariance, and then rearrange to obtain

E(Rit+1) = 1/E(mt+1) + Cov(Rit+1, −mt+1/E(mt+1)) .   (3.3)

Equating terms in equations (2.1) and (3.3) shows that the CAPM of equation (2.1) is equivalent to a version of equation (3.1), where

E(Rit+1 mt+1) = 1

where

mt+1 = c0 − c1 Rmt+1

c0 = [1 + E(Rmt+1)δ1/Var(Rmt+1)]/δ0   (3.4)



and c1 = δ1/[δ0 Var(Rmt+1)]. Equation (3.4) was originally derived by Dybvig and Ingersoll (1982). Now consider the following multiple-beta model which was given in equation (2.6):

E(Rit+1) = δ0 + Σk=1,...,K δk βik .

It can be readily verified by substitution that this model implies the following stochastic discount factor representation:

E(Rit+1 mt+1) = 1

where

mt+1 = c0 + c1 f1t+1 + ··· + cK fKt+1

with

c0 = [1 + Σk {δk E(fk)/Var(fk)}]/δ0   (3.5)

and cj = −{δj/[δ0 Var(fj)]}, j = 1, ..., K .
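For the single-beta case, the mapping in equation (3.4) from the CAPM parameters to the stochastic discount factor coefficients can be computed directly; the small sketch below does this, with argument names that are ours rather than the chapter's.

```python
def capm_sdf_coefficients(delta0, delta1, mean_Rm, var_Rm):
    """Coefficients of m_{t+1} = c0 - c1 * R_{mt+1} in equation (3.4).

    delta0  : zero-beta expected return E(R_{0t})
    delta1  : expected market premium per unit of beta
    mean_Rm : E(R_{mt+1})
    var_Rm  : Var(R_{mt+1})
    """
    c0 = (1.0 + mean_Rm * delta1 / var_Rm) / delta0
    c1 = delta1 / (delta0 * var_Rm)
    return c0, c1
```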

The preceding results apply to the CAPM and multiple-beta models, interpreted as statements about the unconditional expected returns of the assets. These models are also interpreted as statements about conditional expected returns in some tests where the expectations are conditioned on predetermined, publicly available information. All of the analysis of this section can be interpreted as applying to conditional expectations, with the appropriate changes in notation. In this case, the parameters c0, c1, δ0, δ1, etc., will be functions of the time t information set.

3.2. Other examples of stochastic discount factors

In equilibrium asset pricing models, equation (3.1) arises as a first-order condition for a consumer-investor's optimization problem. The agent maximizes a lifetime utility function of consumption (including possibly a bequest to heirs). Denote this function by V(·). If the allocation of resources to consumption and to investment assets is optimal, it is not possible to obtain higher utility by changing the allocation. Suppose that an investor considers reducing consumption at time t to purchase more of (any) asset. The utility cost at time t of the forgone consumption is the marginal utility of consumption expenditures Ct, denoted by (∂V/∂Ct) > 0, multiplied by the price Pi,t of the asset, measured in the same units as the consumption expenditures. The expected utility gain of selling the share and consuming the proceeds at time t + 1 is



Et{(Pi,t+1 + Di,t+1)(∂V/∂Ct+1)}

where Di,t+1 is the cash flow or dividend paid at time t + 1. If the allocation maximizes expected utility, the following must hold:

Pi,t Et{(∂V/∂Ct)} = Et{(Pi,t+1 + Di,t+1)(∂V/∂Ct+1)} .

This intertemporal Euler equation is equivalent to equation (3.1), with

mt+1 = (∂V/∂Ct+1)/Et{(∂V/∂Ct)} .   (3.6)

The mt+1 in equation (3.6) is the intertemporal marginal rate of substitution (IMRS) of the representative consumer. The rest of this section shows how many models in the asset pricing literature are special cases of (3.1), where mt+1 is defined by equation (3.6). 6 If a representative consumer's lifetime utility function V(·) is time-separable, the marginal utility of consumption at time t, (∂V/∂Ct), depends only on variables dated at time t. Lucas (1978) and Breeden (1979) derived consumption-based asset pricing models of the following type, assuming that the preferences are time-separable and additive:

V = Σt β^t u(Ct)

where β is a time discount parameter and u(·) is increasing and concave in current consumption Ct. A convenient specification for u(·) is

u(C) = [C^{1−α} − 1]/(1 − α) .   (3.7)

In equation (3.7), α > 0 is the concavity parameter of the period utility function. This function displays constant relative risk aversion equal to α. 7 Based on these assumptions and using aggregate consumption data, a number of empirical studies test the consumption-based asset pricing model. 8 Dunn and Singleton (1986) and Eichenbaum, Hansen, and Singleton (1988), among others, model consumption expenditures that may be durable in nature. Durability introduces nonseparability over time, since the flow of consumption services depends on the consumer's previous expenditures, and the utility is

6 Asset pricing models typically focus on the relation of security returns to aggregate quantities. It is therefore necessary to aggregate the Euler equations of individuals to obtain equilibrium expressions in terms of aggregate quantities. Theoretical conditions which justify the use of aggregate quantities are discussed by Gorman (1953), Wilson (1968), Rubinstein (1974), Constantinides (1982), Lewbel (1989), Luttmer (1993), and Constantinides and Duffie (1994).
7 Relative risk aversion in consumption is defined as −Cu″(C)/u′(C). Absolute risk aversion is −u″(C)/u′(C), where a prime denotes a derivative. Ferson (1983) studies a consumption-based asset pricing model with constant absolute risk aversion.
8 Substituting (3.7) into (3.6) shows that mt+1 = β(Ct+1/Ct)^{−α}. Empirical studies of this model include Hansen and Singleton (1982, 1983), Ferson (1983), Brown and Gibbons (1985), Jagannathan (1985), Ferson and Merrick (1987), and Wheatley (1988).



defined over the services. Current expenditures increase the consumer's future utility of services if the expenditures are durable. The consumer optimizes over the expenditures Ct; thus, durability implies that the marginal utility, (∂V/∂Ct), depends on variables dated other than date t. Another form of time-nonseparability arises if the utility function exhibits habit persistence. Habit persistence means that consumption levels at two points in time are complements. For example, the utility of current consumption is evaluated relative to what was consumed in the past. Such models are derived by Ryder and Heal (1973), Becker and Murphy (1988), Sundaresan (1989), Constantinides (1990), Detemple and Zapatero (1991), and Novales (1992), among others. Ferson and Constantinides (1991) model both the durability of consumption expenditures and habit persistence in consumption services. They show that the two combine as opposing effects. In an example where the effect is truncated at a single lag, the derived utility of expenditures is

V = (1 − α)^{-1} Σt β^t (Ct + bCt−1)^{1−α} .   (3.8)

The marginal utility at time t is

(∂V/∂Ct) = β^t (Ct + bCt−1)^{−α} + β^{t+1} b Et{(Ct+1 + bCt)^{−α}} .   (3.9)

The coefficient b is positive and measures the rate of depreciation if the good is durable and there is no habit persistence. If habit persistence is present and the good is nondurable, the lagged expenditures enter with a negative effect (b < 0). Ferson and Harvey (1992) and Heaton (1995) consider a form of time-nonseparability which emphasizes seasonality. The utility function is

V = (1 − α)^{-1} Σt β^t (Ct + bCt−4)^{1−α}

where the consumption expenditure decisions are assumed to be quarterly. The subsistence level (in the case of habit persistence) or the flow of services (in the case of durability) is assumed to depend only on the consumption expenditure in the same quarter of the previous year. Abel (1990) studies a form of habit persistence in which the consumer evaluates current consumption relative to the aggregate consumption in the previous period, consumption that he or she takes as exogenous. The utility function is like equation (3.8), except that the "habit stock," bCt−1, refers to the aggregate consumption. The idea is that people care about "keeping up with the Joneses." Campbell and Cochrane (1995) also develop a model in which the habit stock is taken as exogenous by the consumer. This approach results in a simpler and more tractable model, since the consumer's optimization does not have to take account of the effects of current decisions on the future habit stock. Epstein and Zin (1989, 1991) consider a class of recursive preferences which can be written as Vt = F(Ct, CEQt(Vt+1)). CEQt(·) is a time t "certainty



lent" for the future lifetime utility V t + 1 . The function F(.,CEQt(.)) generalizes the. usual expected utility function of lifetime consumption and may be time-nonseparable. Epstein and Zin (1989) study a special case of the recursive preference model in which the preferences are

Vt = [(1 - fl)Ctp + flEt(Vtl-I~)P/O-~)] 1/p

(3.10)

They show that when ρ ≠ 0 and 1 − α ≠ 0, the IMRS for a representative agent becomes

[β(Ct+1/Ct)^{ρ−1}]^{(1−α)/ρ} {Rm,t+1}^{((1−α−ρ)/ρ)}   (3.11)

where Rm,t+1 is the gross market portfolio return. The coefficient of relative risk aversion for timeless consumption gambles is α, and the elasticity of substitution for deterministic consumption is (1 − ρ)^{−1}. If α = 1 − ρ, the model reduces to the time-separable, power utility model. If α = 1, the log utility model of Rubinstein (1976) is obtained.

In summary, many asset pricing models are special cases of equation (3.1). Each model specifies that a particular function of the data and the model parameters is a valid stochastic discount factor. We now turn to the issue of estimating the models stated in this form.
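As the simplest example, under the time-separable power utility of equation (3.7) the stochastic discount factor in (3.6) reduces to mt+1 = β(Ct+1/Ct)^{−α} (see footnote 8). The sketch below constructs this m(θ, xt+1) from a consumption series; the variable names are illustrative.

```python
import numpy as np

def power_utility_sdf(C, beta, alpha):
    """m_{t+1} = beta * (C_{t+1}/C_t)**(-alpha), t = 1, ..., T-1 (footnote 8).

    C     : length-T array of consumption expenditures
    beta  : time discount parameter
    alpha : concavity (relative risk aversion) parameter
    """
    growth = C[1:] / C[:-1]
    return beta * growth ** (-alpha)
```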

4. The generalized method of moments

In this section we provide an overview of the generalized method of moments and a brief review of the associated asymptotic test statistics. We then show how the GMM is used to estimate and test various specifications of asset pricing models.

4.1. An overview of the generalized method of moments in asset pricing models


Let xt+1 be a vector of observable variables. Given a model which specifies mt+1 = m(θ, xt+1), estimation of the parameters θ and tests of the model can then proceed under weak assumptions, using the GMM as developed by Hansen (1982) and illustrated by Hansen and Singleton (1982) and Brown and Gibbons (1985). Define the following model error term:

ui,t+1 = m(θ, xt+1)Ri,t+1 − 1 .   (4.1)

Equation (3.1) implies that Et{ui,t+1} = 0 for all i. Given a sample of N assets and T time periods, combine the error terms from (4.1) into a T × N matrix u, with typical row ut+1. By the law of iterated expectations, the model implies that E(ui,t+1 | Zt) = 0 for all i and t (for any Zt in the information set at time t), and therefore E(ut+1 Zt) = 0 for all t. The condition E(ut+1 Zt) = 0 says that ut+1 is orthogonal to Zt and is therefore called an orthogonality condition. These



orthogonality conditions are the basis of tests of asset pricing models using the GMM. A few points deserve emphasis. First, GMM estimates and tests of asset pricing models are motivated by the implication that E(ui,t+1 | Zt) = 0, for any Zt in the information set at time t. However, the weaker condition E(ut+1 Zt) = 0, for a given set of instruments Zt, is actually used in the estimation. Therefore, GMM tests of asset pricing models have not exploited all of the predictions of the theories. We believe that further refinements to exploit the implications of the theories more fully will be useful. Empirical work on asset pricing models relies on rational expectations, interpreted as the assumption that the expectation terms in the model are mathematical conditional expectations. For example, the rational expectations assumption is used when the expected value in equation (3.1) is treated as a mathematical conditional expectation to obtain expressions for E(· | Z) and E(·). Rational expectations implies that the difference between observed realizations and the expectations in the model should be unrelated to the information that the expectations are conditioned on. Equation (3.1) says that the conditional expectation of the product of mt+1 and Ri,t+1 is the constant 1.0. Therefore, the error term mt+1 Ri,t+1 − 1 in equation (4.1) should not be predictably different from zero when we use any information available at time t. If there is variation over time in a return Ri,t+1 that is predictable using instruments Zt, the model implies that the predictability is removed when Ri,t+1 is multiplied by a valid stochastic discount factor, mt+1. This is the sense in which conditional asset pricing models are asked to "explain" predictable variation in asset returns. This idea generalizes the "random walk" model of stock values, which implies that stock returns should be completely unpredictable. That model is a special case which can be motivated by risk neutrality. Under risk neutrality the IMRS is a constant. In this case, equation (3.1) implies that the return Ri,t+1 should not differ predictably from a constant. GMM estimation proceeds by defining an N × L matrix of sample mean orthogonality conditions, G = (u'Z/T), and letting g = vec(G), where Z is a T × L matrix of observed instruments with typical row Zt, a subset of the available information at time t. 9 The vec(·) operator means to partition G into row vectors, each of length L: (h1, h2, ..., hN). Then one stacks the h's into a vector, g, with length equal to the number of orthogonality conditions, NL. Hansen's (1982) GMM estimates of θ are obtained by searching for parameter values that make g close to zero by minimizing a quadratic form g'Wg, where W is an NL × NL weighting matrix. Somewhat more generally, let ut+1(θ) denote the random N vector Rt+1 m(θ, xt+1) − 1, and define gT(θ) = T^{-1} Σt (ut(θ) ⊗ Zt−1). Let θT denote the parameter values that minimize the quadratic form gT'AT gT, where AT is any positive definite NL × NL matrix that may depend on the sample, and let JT



denote the minimized value of the quadratic form gT'AT gT. Jagannathan and Wang (1993) show that JT will have a weighted chi-square distribution which can be used for testing the hypothesis that (3.1) holds.

THEOREM 4.1. (Jagannathan and Wang, 1993). Suppose that the matrix AT converges in probability to a constant positive definite matrix A. Assume also that √T gT(θ0) →a N(0, S), where N(·,·) denotes the multivariate normal distribution, θ0 are the true parameter values, and S is a positive definite matrix. Let

D = E[∂gT/∂θ]|θ=θ0

and let

Q = (S^{1/2})(A^{1/2})[I − (A^{1/2})'D(D'AD)^{−1}D'(A^{1/2})](A^{1/2})'(S^{1/2})'

where A^{1/2} and S^{1/2} are the upper triangular matrices from the Cholesky decompositions of A and S. Then the matrix Q has NL − dim(θ) nonzero, positive eigenvalues. Denote these eigenvalues by λi, i = 1, 2, ..., NL − dim(θ). Then JT converges to

λ1χ1 + ··· + λNL−dim(θ) χNL−dim(θ)


where χi, i = 1, 2, ..., NL − dim(θ), are independent random variables, each with a chi-square distribution with one degree of freedom.

PROOF. See Jagannathan and Wang (1993).

Notice that when the matrix A is W = S^{−1}, the matrix Q is idempotent of rank NL − dim(θ). Hence the nonzero eigenvalues of Q are unity. In this case, the asymptotic distribution reduces to a simple chi-square distribution with NL − dim(θ) degrees of freedom. This is the special case considered by Hansen (1982), who originally derived the asymptotic distribution of the JT-statistic. The JT-statistic and its extension, as provided in Theorem 4.1, provide a goodness-of-fit test for models estimated by the GMM. Hansen (1982) shows that the estimators of θ that minimize g'Wg are consistent and asymptotically normal, for any fixed W. If the weighting matrix W is chosen to be the inverse of a consistent estimate of the covariance matrix of the orthogonality conditions S, the estimators are asymptotically efficient in the class of estimators that minimize g'Wg for fixed W's. The asymptotic variance matrix of this optimal GMM estimator of the parameter vector is given as

Cov(θ) = [E(∂g/∂θ)'W E(∂g/∂θ)]^{−1}   (4.2)

where ∂g/∂θ is an NL × dim(θ) matrix of derivatives. A consistent estimator for the asymptotic covariance of the sample mean of the orthogonality conditions is used in practice. That is, we replace W in (4.2) with Cov(g)^{−1} and replace E(∂g/∂θ) with its sample analogue. An example of a consistent estimator for the optimal weighting matrix is given by Hansen (1982) as




Cov(g) = [(1/T) Σt Σj (ut+1 u't+1−j) ⊗ (Zt Z't−j)]   (4.3)

where ⊗ denotes the Kronecker product. A special case that often proves useful arises when the orthogonality conditions are not serially correlated. In that special case, the optimal weighting matrix is the inverse of the matrix Cov(g), where

Cov(g) = [(1/T) Σt (ut+1 u't+1) ⊗ (Zt Zt')] .   (4.4)

The GMM weighting matrices originally proposed by Hansen (1982) have some drawbacks. The estimators are not guaranteed to be positive definite, and they may have poor finite sample properties in some applications. A number of studies have explored alternative estimators for the GMM weighting matrix. A prominent example by Newey and West (1987a) suggests weighting the autocovariance terms in (4.3) with Bartlett weights to achieve a positive semi-definite matrix. Additional refinements to improve the finite sample properties are proposed by Andrews (1991), Andrews and Monahan (1992), and Ferson and Foerster (1994).
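To illustrate the mechanics, the following sketch carries out a simple two-step GMM estimation for a stochastic discount factor that is linear in a single factor, using the identity matrix as the first-step weighting matrix and the no-serial-correlation estimator of equation (4.4) in the second step. It is a stylized illustration with hypothetical array names, not the procedure of any particular study; in practice one would use analytic derivatives and, where errors are serially correlated, a heteroskedasticity- and autocorrelation-consistent weighting matrix such as Newey-West.

```python
import numpy as np
from scipy.optimize import minimize

def sample_moments(theta, R, f, Z):
    """g_T(theta) for the linear SDF m_t = theta[0] + theta[1]*f_t.

    R: T x N gross returns, f: T-vector factor, Z: T x L instruments.
    Returns the N*L vector of averaged orthogonality conditions."""
    u = R * (theta[0] + theta[1] * f)[:, None] - 1.0          # errors, eq. (4.1)
    return (u[:, :, None] * Z[:, None, :]).mean(axis=0).ravel()

def weighting_matrix(theta, R, f, Z):
    """Inverse of the no-serial-correlation estimator Cov(g) in equation (4.4)."""
    u = R * (theta[0] + theta[1] * f)[:, None] - 1.0
    T = len(f)
    S = sum(np.kron(np.outer(u[t], u[t]), np.outer(Z[t], Z[t]))
            for t in range(T)) / T
    return np.linalg.inv(S)

def gmm_two_step(R, f, Z, theta_start=(1.0, 0.0)):
    def objective(theta, W):
        g = sample_moments(theta, R, f, Z)
        return g @ W @ g
    NL = R.shape[1] * Z.shape[1]
    step1 = minimize(objective, theta_start, args=(np.eye(NL),), method="Nelder-Mead")
    W = weighting_matrix(step1.x, R, f, Z)
    step2 = minimize(objective, step1.x, args=(W,), method="Nelder-Mead")
    J = len(f) * step2.fun          # T times the minimized quadratic form
    return step2.x, J
```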
4.2. Testing hypotheses with the GMM

As we noted above, the Jr-statistic provides a goodness-of-fit test for a model that is estimated by the GMM, when the model is overidentified. Hansen's J:rstatistic is the most commonly used test in the finance literature that has used the GMM. Other standard statistical tests based on the GMM are also used in the finance literature for testing asset pricing models. One is a generalization of the Wald test, and a second is analogous to a likelihood ratio test statistic. Additional test statistics based on the GMM are reviewed by Newey (1985) and Newey and West (1987b). For the Wald test, consider the hypothesis to be tested as expressed in the Mvector valued function H(O) = 0, where M < dim(0). The GMM estimates of 0 are asymptotically normal, with mean 0 and variance matrix t~ov(0). Given standard regularity conditions, it follows that the estimates of/z/are asymptotically normal, with mean zero and variance matrix//0Cov(0)//~, where subscripts denote partial derivatives, and that the quadratic form

is asymptotically chi-square with M degrees of freedom, providing a standard Wald test. A likelihood-ratio-type test is described by Newey and West (1987b), Eichenbaum, Hansen, and Singleton (1988, appendix C), and Gallant (1987). Newey and West (1987b) call this the D test. Assume that the null hypothesis implies that the orthogonality conditions E(g*) = 0 hold, while, under the alternative, only a subset E(g) = 0 hold. For example, g* = (g, h). When we estimate the model under the null hypothesis, the quadratic form g*'W*g* is minimized. Let W*_{11} be the upper left block of W*; that is, let it be the estimate of Cov(g)^{-1} under the null. When we

Econometric evaluation o f asset pricing models

19

hold this matrix fixed, the model can be estimated under the alternative by minimizing g'W*_{11}g. The difference of the two quadratic forms, T[g*'W*g* − g'W*_{11}g], is asymptotically chi-square, with degrees of freedom equal to M if the null hypothesis is true. Newey and West (1987b) describe additional variations on these tests.
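A minimal sketch of the D statistic just described might look as follows; the function name and argument layout are hypothetical, and the restricted and unrestricted minimized sample moment vectors are assumed to have been computed already.

```python
import numpy as np
from scipy.stats import chi2

def d_test(g_star, W_star, g, W11, T, M):
    """Difference-of-criteria (D) statistic as described in the text.
    g_star : minimized sample moments under the null (with the extra conditions)
    W_star : weighting matrix estimated under the null
    g      : minimized sample moments under the alternative
    W11    : upper-left block of W_star, held fixed in the second minimization
    M      : number of extra orthogonality conditions imposed under the null"""
    D = T * (g_star @ W_star @ g_star - g @ W11 @ g)
    p_value = 1.0 - chi2.cdf(D, df=M)
    return D, p_value
```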

4.3. Illustrations: Using the GMM to test the conditional CAPM


The CAPM imposes nonlinear overidentifying restrictions on the first and second moments of asset returns. These restrictions can form a basis for econometric tests. To see these restrictions more clearly, notice that if an econometrician knows or can estimate Cov(R_{it}, R_{mt}), E(R_{mt}), Var(R_{mt}), and E(R_{0t}), it is possible to compute E(R_{it}) from the CAPM, using equation (2.1). Given a direct sample estimate of E(R_{it}), the expected return is overidentified. It is possible to use the overidentification to construct a test of the CAPM by asking if the expected return on the asset is different from the expected return assigned by the model. In this section we illustrate such tests by using both the traditional, return-beta formulation and the stochastic discount factor representation of the CAPM. These examples extend easily to the multiple-beta models.

4.3.1. Static or unconditional CAPMs

If we make the assumption that all the expectation terms in the CAPM refer to unconditional expectations, we have an unconditional version of the CAPM. It is straightforward to estimate and then test an unconditional version of the CAPM, using equation (3.1) and the stochastic discount factor representation given in equation (3.4). The stochastic discount factor is
m_{t+1} = c_0 + c_1 R_{mt+1} ,

where c_0 and c_1 are fixed parameters. Using only the unconditional expectations, the model implies that E{(c_0 + c_1 R_{mt+1}) R_{t+1} − 1} = 0, where R_{t+1} is the vector of gross asset returns. The vector of sample orthogonality conditions is
g_T = g_T(c_0, c_1) = (1/T) Σ_t {(c_0 + c_1 R_{mt+1}) R_{t+1} − 1} .

With N > 2 assets, the number of orthogonality conditions is N and the number of parameters is 2, so the J_T-statistic has N − 2 degrees of freedom. Tests of the unconditional CAPM using the stochastic discount factor representation are conducted by Carhart et al. (1995) and Jagannathan and Wang (1996), who reject the model using monthly data for the postwar United States.
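To make the steps concrete, here is a minimal two-step GMM sketch for this unconditional stochastic discount factor model. The function name, the use of SciPy's generic optimizer, and the simple (serially uncorrelated) weighting matrix are illustrative assumptions; they are not the specific implementations used in the studies cited.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def gmm_sdf_capm(R, Rm):
    """Two-step GMM for m_{t+1} = c0 + c1*R_{m,t+1}, a sketch of the
    orthogonality conditions g_T(c0, c1) = (1/T) sum{(c0 + c1*Rm)*R - 1}.
    R: (T, N) gross asset returns; Rm: (T,) gross market return."""
    T, N = R.shape

    def moments(c):
        m = c[0] + c[1] * Rm              # stochastic discount factor
        return (m[:, None] * R) - 1.0     # (T, N) pricing errors E[mR] - 1

    def objective(c, W):
        g = moments(c).mean(axis=0)
        return g @ W @ g

    c0 = np.array([1.0, 0.0])
    # First step: identity weighting matrix
    step1 = minimize(objective, c0, args=(np.eye(N),), method="Nelder-Mead")
    # Second step: inverse covariance of the moments (equation 4.4 style)
    h = moments(step1.x)
    W = np.linalg.inv(np.cov(h, rowvar=False, bias=True))
    step2 = minimize(objective, step1.x, args=(W,), method="Nelder-Mead")
    J = T * objective(step2.x, W)         # J_T statistic, N - 2 dof
    return step2.x, J, 1 - chi2.cdf(J, df=N - 2)
```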


Tests of the unconditional CAPM may also be conducted using the linear, return-beta formulation of equation (2.1) and the GMM. Let r_t = R_t − R_{0t} 1 be the vector of excess returns, where R_{0t} is the gross return on some reference asset and 1 is an N-vector of ones; also let u_t = r_t − β r_{mt}, where β is the N-vector of the betas of the excess returns, relative to the market, and r_{mt} = R_{mt} − R_{0t} is the excess return on the market portfolio. The model implies that E(u_t) = E(u_t r_{mt}) = 0. Let the instruments be Z_t = (1, r_{mt})'. The sample orthogonality condition is then
g_T(β) = (1/T) Σ_t (r_t − β r_{mt}) ⊗ Z_t .

The number of orthogonality conditions is 2N and the number of parameters is N, so the model is overidentified and may be tested using the J_T-statistic. An alternative approach to testing the model using the return-beta formulation is to estimate the model under the hypothesis that expected returns depart from the predictions of the CAPM by a vector of parameters α, which are called Jensen's alphas. Redefining u_t = r_t − α − β r_{mt}, the model has 2N parameters and 2N orthogonality conditions, so it is exactly identified. It is easy to show that the GMM estimators of α and β are the same as the OLS estimators, and equation (4.4) delivers White's (1980) heteroskedasticity-consistent standard errors. The CAPM may be tested using a Wald test or the D-statistic, as described above. Tests of the unconditional CAPM using the linear return-beta formulation are conducted with the GMM by MacKinlay and Richardson (1991), who reject the model for monthly U.S. data.
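The following sketch implements the exactly identified return-beta system just described, recovering the OLS estimates of Jensen's alphas and betas together with White (1980) heteroskedasticity-consistent standard errors; the function name and input conventions are hypothetical.

```python
import numpy as np

def jensen_alpha_white(r, rm):
    """Exactly identified GMM for u_t = r_t - alpha - beta*r_mt with
    instruments Z_t = (1, r_mt)': equivalent to asset-by-asset OLS, with
    White (1980) heteroskedasticity-consistent standard errors (a sketch).
    r: (T, N) excess returns; rm: (T,) excess market return."""
    T, N = r.shape
    X = np.column_stack([np.ones(T), rm])          # (T, 2) regressors
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ r                       # rows: (alpha, beta), shape (2, N)
    resid = r - X @ coef
    se = np.empty_like(coef)
    for i in range(N):
        meat = X.T @ (X * resid[:, i:i+1] ** 2)    # sum over t of x_t x_t' u_t^2
        V = XtX_inv @ meat @ XtX_inv               # White covariance matrix
        se[:, i] = np.sqrt(np.diag(V))
    return coef[0], coef[1], se[0]                 # alphas, betas, alpha std errors
```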
4.3.2. Conditional CAPMs

Empirical studies that rejected the unconditional CAPM, as well as mounting evidence of predictable variation in the distribution of security rates of return, led to empirical work on conditional versions of the CAPM starting in the early 1980s. In a conditional asset pricing model it is assumed that the expectation terms in the model are conditional expectations, given a public information set that is represented by a vector of predetermined instrumental variables Z_t. The multiple-beta models of Merton (1973) and Cox, Ingersoll, and Ross (1985) are intended to accommodate conditional expectations. Merton (1973, 1980) and Cox-Ingersoll-Ross also showed how a conditional version of the CAPM may be derived as a special case of their intertemporal models. Hansen and Richard (1987) describe theoretical relations between conditional and unconditional versions of mean-variance efficiency. The earliest empirical formulations of conditional asset pricing models were the latent variable models developed by Hansen and Hodrick (1983) and Gibbons and Ferson (1985) and later refined by Campbell (1987) and Ferson, Foerster, and Keim (1993). These models allow time-varying expected returns, but maintain the assumption that the conditional betas are fixed parameters. Consider the


linear, return-beta representation of the CAPM under these assumptions, writing E(r_t|Z_{t−1}) = β E(r_{mt}|Z_{t−1}). The returns are measured in excess of a risk-free asset. Let r_{1t} be some reference asset with nonzero β_1, so that E(r_{1t}|Z_{t−1}) = β_1 E(r_{mt}|Z_{t−1}). Solving this expression for E(r_{mt}|Z_{t−1}) and substituting, we have

E(r_t|Z_{t−1}) = C E(r_{1t}|Z_{t−1}) ,

where C = (β ./ β_1) and ./ denotes element-by-element division. With this substitution, the expected market risk premium is the latent variable in the model, and C is the N-vector of the model parameters. When we form the error term u_t = r_t − C r_{1t}, the model implies E(u_t|Z_{t−1}) = 0 and we can estimate and test the model by using the GMM. Gibbons and Ferson (1985) argued that the latent variable model is attractive in view of the difficulties in measuring the true market portfolio, but Wheatley (1989) emphasized that it remains necessary to assume that ratios of the betas, measured with respect to the unobserved market portfolio, are constant parameters. Campbell (1987) and Ferson and Foerster (1995) show that a single-beta latent variable model is rejected in U.S. data. This finding rejects the hypothesis that there is a (conditional) minimum-variance portfolio such that the ratios of conditional betas on this portfolio are fixed parameters. Therefore, the empirical evidence suggests that conditional asset pricing models should be consistent with either (1) a time-varying beta or (2) more than one beta for each asset.¹⁰

¹⁰ A model with more than one fixed beta, and with time-varying risk premiums, is generally consistent with a single, time-varying beta for each asset. For example, assume that there are two factors with constant betas and time-varying risk premiums, where a time-varying combination of the two factors is a minimum-variance portfolio.

Conditional, multiple-beta models with constant betas are examined empirically by Ferson and Harvey (1991), Evans (1994), and Ferson and Korajczyk (1995). They reject such models with the usual statistical tests but find that they still capture a large fraction of the predictability of stock and bond returns over time. When allowing for time-varying betas, these studies find that the time-variation in betas contributes a relatively small amount to the time-variation in expected asset returns. Intuition for this finding can be obtained by considering the following approximation. Suppose that time-variation in expected excess returns is E(r|Z) = λβ, where λ is a vector of time-varying expected risk premiums for the factors and β is a matrix of time-varying betas. Using a Taylor series, we can approximate

Var[E(r|Z)] ≈ E(β)'Var[λ]E(β) + E(λ)'Var[β]E(λ) .

The first term in the decomposition reflects the contribution of the time-varying risk premiums; the second reflects the contribution of time-varying betas. Since the average beta E(β) is on the order of 1.0 in monthly data, while the average risk


premium E(λ) is typically less than 0.01, the first term dominates the second term. This means that time-variation in conditional betas is less important than time-variation in expected risk premiums, from the perspective of modeling predictable variation in expected asset returns. While from the perspective of modeling predictable time-variation in asset returns, time-variation in conditional betas is not as important as time-variation in expected risk premiums, this does not imply that beta variation is empirically unimportant. From the perspective of modeling the cross-sectional variation in expected asset returns, beta variation over time may be very important. To see this, consider the unconditional expected excess return vector, obtained from the model as

E{E(r|Z)} = E{λβ} = E(λ)E(β) + Cov(λ, β) .

Viewed as a cross-sectional relation, the term Cov(λ, β) may vary significantly in a cross section of assets. Therefore, the implications of a conditional version of the CAPM for the cross section of unconditional expected returns may depend importantly on common time-variation in betas and expected market risk premiums. The empirical tests of Jagannathan and Wang (1996) suggest that this is the case. Harvey (1989) replaced the constant-beta assumption with the assumption that the ratio of the expected market premium to the conditional market variance is a fixed parameter, as in
E(r_{mt}|Z_{t−1}) / Var(r_{mt}|Z_{t−1}) = γ .

The conditional expected returns may then be written according to the conditional CAPM as

E(r_t|Z_{t−1}) = γ Cov(r_t, r_{mt}|Z_{t−1}) .

Harvey's version of the conditional CAPM is motivated by Merton's (1980) model in which the ratio γ, called the market price of risk, is equal to the relative risk aversion of a representative investor in equilibrium. Harvey also assumes that the conditional expected risk premium on the market (and the conditional market variance, given fixed γ) is a linear function of the instruments, as in
E(r_{mt}|Z_{t−1}) = δ'_m Z_{t−1} ,
where δ_m is a coefficient vector. Define the error terms v_t = r_{mt} − δ'_m Z_{t−1} and w_t = r_t(1 − v_t γ). The model implies that the stacked error term u_t = (v_t, w_t) satisfies E(u_t|Z_{t−1}) = 0, so it is straightforward to estimate and then test the model using the GMM. Harvey (1989) rejects this version of the conditional CAPM for monthly data in the U.S. In Harvey (1991) the same formulation is rejected when applied using a world market portfolio and monthly data on the stock markets of 21 developed countries. The conditional CAPM may be tested using the stochastic discount factor representation given by equation (3.4): m_{t+1} = c_{0t} + c_{1t} R_{mt+1}. In this case the


coefficients c_{0t} and c_{1t} are measurable functions of the information set Z_t. To implement the model empirically it is necessary to specify functional forms for c_{0t} and c_{1t}. From the expression (3.4) it can be seen that these coefficients are nonlinear functions of the conditional expected market return and its conditional variance. As yet there is no theoretical guidance for specifying the functional forms. Cochrane (1996) suggests approximating the coefficients using linear functions, and this approach is followed by Carhart et al. (1995), who reject the conditional CAPM for monthly U.S. data. Jagannathan and Wang (1993) show that the conditional CAPM implies an unconditional two-factor model. They show that
m_{t+1} = a_0 + a_1 E(r_{mt+1}|I_t) + R_{mt+1}

(where I_t denotes the information set of investors and a_0 and a_1 are fixed parameters) is a valid stochastic discount factor in the sense that E(R_{i,t+1} m_{t+1}) = 1 for this choice of m_{t+1}. Using a set of observable instruments Z_t, and assuming that E(r_{mt+1}|Z_t) is a linear function of Z_t, they find that their version of the model explains the cross section of unconditional expected returns better than does an unconditional version of the CAPM. Bansal and Viswanathan (1993) develop conditional versions of the CAPM and multiple-factor models in which the stochastic discount factor m_{t+1} is a nonlinear function of the market or factor returns. Using nonparametric methods, they find evidence to support the nonlinear versions of the models. Bansal, Hsieh, and Viswanathan (1993) compare the performance of nonlinear models with linear models, using data on international stocks, bonds, and currency returns, and they find that the nonlinear models perform better. Additional empirical tests of the conditional CAPM and multiple-beta models, using stochastic discount factor representations, are beginning to appear in the literature. We expect that future studies will further refine the relations among the various empirical specifications.
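As an illustration of the linear-approximation idea attributed to Cochrane (1996) above, the sketch below writes the conditional coefficients as linear functions of the instruments, so that the stochastic discount factor is linear in Z_t and the scaled factor Z_t R_{mt+1}, and forms the instrumented pricing-error moments. The function name, the argument shapes, and the exact parameterization are our own assumptions, not a reproduction of any cited study's specification.

```python
import numpy as np

def scaled_factor_moments(params, R, Rm, Z):
    """Moment conditions for a conditional CAPM SDF with coefficients
    approximated as linear in the instruments:
        m_{t+1} = Z_t'a + (Z_t'b) * R_{m,t+1}.
    A sketch: a and b are stacked in `params`.
    R: (T, N) gross returns; Rm: (T,) gross market return;
    Z: (T, L) instruments (including a constant) known at the start of t+1."""
    T, N = R.shape
    L = Z.shape[1]
    a, b = params[:L], params[L:]
    m = Z @ a + (Z @ b) * Rm                 # SDF, linear in scaled factors
    u = m[:, None] * R - 1.0                 # E[m R] - 1 pricing errors
    # condition the pricing errors on the instruments: u_t (x) Z_t
    g = (u[:, :, None] * Z[:, None, :]).reshape(T, N * L)
    return g.mean(axis=0)                    # sample orthogonality conditions
```

With L instruments and N assets this gives N·L orthogonality conditions and 2L parameters, so the system is overidentified whenever N > 2 and can be handed to a standard GMM routine.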

5. Model diagnostics
We have discussed several examples of stochastic discount factors corresponding to particular theoretical asset pricing models, and we have shown how to test whether these models assign the right expected returns to financial assets. The stochastic discount factors corresponding to these models are particular parametric functions of the data observed by the econometrician. While empirical studies based on these parametric approaches have led to interesting insights, the parametric approach makes strong assumptions about the economic environment. In this section we discuss some alternative econometric approaches to the evaluation of asset pricing models.

5.1. Moment inequality restrictions

Hansen and Jagannathan (1991) derive restrictions from asset pricing models while assuming as little structure as possible. In particular, they assume that the financial markets obey the law of one price and that there are no arbitrage opportunities. These assumptions are sufficient to imply that there exists a stochastic discount factor m_{t+1} (which is almost surely positive, if there is no arbitrage) such that equation (3.1) is satisfied. Note that if the stochastic discount factor is a degenerate random variable (i.e., a constant), then equation (3.1) implies that all assets must earn the same expected return. If assets earn different expected returns, then the stochastic discount factor cannot be a constant. In other words, cross-sectional differences in expected asset returns carry implications for the variance of any valid stochastic discount factor which satisfies equation (3.1). Hansen and Jagannathan make use of this observation to derive a lower bound on the volatility of stochastic discount factors. Shiller (1979, 1981), Singleton (1980), and Leroy and Porter (1981) derive a related volatility bound in specific models, and their empirical work suggests that the stochastic discount factors implied by these simple models are not volatile enough to explain expected returns across assets. Hansen and Jagannathan (1991) show how to use the volatility bound as a general diagnostic device. In what follows we derive the Hansen and Jagannathan (1991) bound and discuss their empirical application. To simplify the exposition, we focus on an unconditional version of the bound, using only unconditional expectations. We posit a hypothetical, unconditional, risk-free asset with return R_f = E(m_{t+1})^{-1}. We take the value of R_f, or equivalently E(m_{t+1}), as a parameter to be varied as we trace out the bound. The law of one price guarantees the existence of some stochastic discount factor which satisfies equation (3.1). Consider the following projection of any such m_{t+1} on the vector of gross asset returns, R_{t+1}:
m_{t+1} = R'_{t+1} β + ε_{t+1} ,    (5.1)

where E(ε_{t+1} R_{t+1}) = 0 and where β is the projection coefficient vector. Multiply both sides of equation (5.1) by R_{t+1} and take the expected value of both sides of the equation, using E[R_{t+1} ε_{t+1}] = 0, to arrive at an expression which may be solved for β. Substituting this expression back into (5.1) gives the "fitted values" of the projection as

m*_{t+1} = R'_{t+1} β = R'_{t+1} E(R_{t+1} R'_{t+1})^{-1} 1 .    (5.2)

By inspection, the m*_{t+1} given by equation (5.2) is a valid stochastic discount factor, in the sense that equation (3.1) is satisfied when m*_{t+1} is used in place of m_{t+1}. We have therefore constructed a stochastic discount factor m*_{t+1} that is also a payoff on an investment position in the N given assets, where the vector


E(R_{t+1} R'_{t+1})^{-1} 1 provides the weights. This payoff is the unique linear least-squares approximation of every admissible stochastic discount factor in the space of available asset payoffs. Substituting m*_{t+1} for R'_{t+1}β in equation (5.1) shows that we may write any stochastic discount factor, m_{t+1}, as

m_{t+1} = m*_{t+1} + ε_{t+1} ,

where E(ε_{t+1} m*_{t+1}) = 0. It follows that Var(m_{t+1}) ≥ Var(m*_{t+1}). This expression is the basis of the Hansen-Jagannathan bound¹¹ on the variance of m_{t+1}. Since m*_{t+1} depends only on the second moment matrix of the N returns, the lower bound depends only on the assets available to the econometrician and not on the particular asset pricing model that is being studied. To obtain an explicit expression for the variance bound in terms of the underlying asset-return moments, substitute from the previous expressions to obtain

Var(m_{t+1}) ≥ Var(m*_{t+1}) = β'Var(R_{t+1})β
  = [Cov(m, R')Var(R)^{-1}] Var(R) [Var(R)^{-1}Cov(m, R')]
  = [1 − E(m)E(R')] Var(R)^{-1} [1 − E(m)E(R)] ,    (5.3)

where the time subscripts are suppressed to conserve notation and the last line follows from E(mR) = 1 = E(m)E(R) + Cov(m, R). As we vary the hypothetical values of E(m) = R_f^{-1}, equation (5.3) traces out a parabola in (E(m), σ(m)) space, where σ(m) is the standard deviation of m_{t+1}. If we place σ(m) on the y axis and E(m) on the x axis, the Hansen-Jagannathan bounds resemble a cup, and the implication is that any valid stochastic discount factor m_{t+1} must have a mean and standard deviation that place it within the cup.

The lower bound on the volatility of a stochastic discount factor, as given by equation (5.3), is closely related to the standard mean-variance analysis that has long been used in the financial economics literature. To see this, recall that if r = R − R_f is the vector of excess returns, then (3.1) implies that 0 = E(mr) = E(m)E(r) + ρσ(m)σ(r). Since −1 ≤ ρ ≤ 1, we have that
σ(m)/E(m) ≥ E(r_i)/σ(r_i)

for all i. The right side of this expression is the Sharpe ratio for asset i. The Sharpe ratio is defined as the expected excess return on an asset, divided by the standard deviation of the excess return (see Sharpe 1994 for a recent discussion of this ratio). Consider plotting every portfolio that can be formed from the N assets in the Standard Deviation (x axis) - Mean (y axis) plane. The set of such portfolios
¹¹ Related bounds were derived by Kandel and Stambaugh (1987), MacKinlay (1987, 1995), and Shanken (1987).


with the smallest possible standard deviation for a given mean return is the minimum-variance boundary. Consider the tangent to the minimum-variance boundary from the point 1/E(m) on the y axis. The tangent point is a portfolio of the asset returns, and the slope of this tangent line is the maximum Sharpe ratio that can be attained with a given set of N assets and a given risk-free rate, R_f = 1/E(m). The slope of this line is also equal to R_f multiplied by the Hansen-Jagannathan lower bound on σ(m) for a given E(m) = R_f^{-1}. That is, we have that

σ(m) ≥ E(m) |Max_i {E(r_i)/σ(r_i)}|


for the given R_f. The preceding analysis is based on equation (3.1), which is equivalent to the law of one price. If there are no arbitrage opportunities, then there exists a stochastic discount factor m_{t+1} that is a strictly positive random variable. Hansen and Jagannathan (1991) show how to obtain a tighter bound on the standard deviation of m_{t+1} by making use of the restriction that there are no arbitrage opportunities. They also show how to incorporate conditioning variables into the analysis. Snow (1991) extends the Hansen-Jagannathan analysis to include higher moments of the asset returns. His extension is based on the Hölder inequality, which implies that for given values of δ and p such that (1/δ) + (1/p) = 1 it is true that E(mR) ≤ E(m^δ)^{1/δ} E(R^p)^{1/p}. Cochrane and Hansen (1992) refine the Hansen-Jagannathan bound to consider information about the correlation between a given stochastic discount factor and the vector of asset returns. This provides a tighter set of restrictions than the original bounds, which only make use of the fact that the correlation must be between −1 and +1.
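A minimal sketch of tracing out the bound in (5.3) from a panel of gross returns follows; the function name and the use of sample moments in place of population moments are illustrative assumptions.

```python
import numpy as np

def hj_bound(R, Em_grid):
    """Hansen-Jagannathan volatility bound from equation (5.3):
    Var(m) >= [1 - E(m)E(R)]' Var(R)^{-1} [1 - E(m)E(R)].
    R: (T, N) gross returns; Em_grid: candidate values of E(m).
    Returns the lower bound on the standard deviation of m at each E(m)."""
    mu = R.mean(axis=0)                       # sample E(R)
    Sigma_inv = np.linalg.inv(np.cov(R, rowvar=False))
    sigmas = []
    for Em in Em_grid:
        d = 1.0 - Em * mu                     # 1 - E(m)E(R)
        sigmas.append(np.sqrt(d @ Sigma_inv @ d))
    return np.array(sigmas)                   # traces out the "cup"
```

Plotting the returned values against `Em_grid` reproduces the cup-shaped region described above; a candidate discount factor whose sample mean and standard deviation fall below the curve violates the bound.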

5.2. Statistical inference for moment inequality restrictions


Cochrane and Hansen (1992), Burnside (1994), and Cecchetti, Lam, and Mark (1994) show how to take sampling errors into account when examining whether a particular candidate stochastic discount factor satisfies the Hansen-Jagannathan bound. In what follows we outline a computation which allows for sampling errors, following the discussion in Cochrane and Hansen (1992). Assume that the econometrician has a time series of T observations on a candidate for the stochastic discount factor, denoted by y_t, and the N asset returns R_t. We also assume that the risk-free asset is not one of the N assets. Hence v = E(m) = 1/R_f is an unknown parameter to be estimated. Consider a linear regression of m_t onto the unit vector and the vector of asset returns, m_t = α + R'_t β + u_t. We use the regression function in the following system of population moment conditions:

E(α + R'_t β) = v
E[R_t (α + R'_t β)] = 1_N
E(y_t) = v
E[(α + R'_t β − v)²] − E[(y_t − v)²] ≤ 0    (5.4)

The first equation says that the expected value of m_t ≡ α + R'_t β is v. The second equation says that the regression function for m_t is a valid stochastic discount factor. The third equation says that v is the expected value of the particular candidate discount factor that we wish to test. The fourth equation states that the Hansen-Jagannathan bound is satisfied by the particular candidate stochastic discount factor. We can estimate the parameters v, α, and the N-vector β, using the N + 3 equations in (5.4), by treating the last inequality as an equality and using the GMM. Treating the last equation as an equality corresponds to the null hypothesis that the mean and variance of y_t place it on the Hansen-Jagannathan boundary. Under the null hypothesis that the last equation of (5.4) holds as an equality, the minimized value of the GMM criterion function J_T, multiplied by T, has a chi-square distribution with one degree of freedom. Cochrane and Hansen (1992) suggest testing the inequality relation using a one-sided test.
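The sketch below stacks the sample analogues of the N + 3 conditions in (5.4), treating the inequality as an equality as described; the GMM minimization itself is left to any standard routine. The function name and parameter ordering are hypothetical.

```python
import numpy as np

def hj_boundary_moments(params, y, R):
    """Sample versions of the moment conditions in (5.4), with the final
    inequality treated as an equality (a sketch).
    params = (v, alpha, beta_1, ..., beta_N); y: (T,) candidate discount
    factor; R: (T, N) gross returns."""
    v, alpha, beta = params[0], params[1], params[2:]
    proj = alpha + R @ beta                        # fitted SDF alpha + R'beta
    g1 = proj.mean() - v                           # E(alpha + R'beta) = v
    g2 = (R * proj[:, None]).mean(axis=0) - 1.0    # E[R(alpha + R'beta)] = 1_N
    g3 = y.mean() - v                              # E(y) = v
    g4 = ((proj - v) ** 2).mean() - ((y - v) ** 2).mean()  # bound as equality
    return np.concatenate(([g1], g2, [g3], [g4]))
```

Minimizing the usual quadratic form in these moments and multiplying the minimized criterion by T gives the statistic that is compared, one-sided, with a chi-square(1) critical value under the null described above.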
5.3. Specification error bounds

The methods we have examined so far are developed, for the most part, under the null hypothesis that the asset pricing model under consideration by the econometrician assigns the right prices (or expected returns) to all assets. An alternative is to assume that the model is wrong and to examine how wrong the model is. In this section we follow Hansen and Jagannathan (1994) and discuss one possible way to examine what is missing in a model and to assign a scalar measure of the model's misspecification.¹² Let y_t denote the candidate stochastic discount factor corresponding to a given asset pricing model, and let m*_t denote the unique stochastic discount factor that we constructed earlier as a combination of asset payoffs. We assume that E[y_t R_t] does not equal 1_N, the N-vector of ones; i.e., the model does not correctly price all of the gross returns. We can project y_t on the N asset returns to get y_t = R'_t α + u_t, and project m*_t on the vector of asset returns to get m*_t = R'_t β. Since the candidate y_t does not correctly price all of the assets, α and β will not be the same. Define p_t = (β − α)'R_t as the modifying payoff to the candidate stochastic

¹² GMM-based model specification tests are examined in a general setting by Newey (1985). Other related work includes that by Boudoukh, Richardson, and Smith (1993), who compute approximate bounds on the probabilities of the test statistics in the presence of inequality restrictions; Chen and Knez (1992), who develop nonparametric measures of market integration by using related methods; and Hansen, Heaton, and Luttmer (1995), who show how to compute specification error and volatility bounds when there are market frictions such as short-sale constraints and proportional transaction costs.


discount factor y_t. Clearly, (y_t + p_t) is a valid stochastic discount factor, satisfying equation (3.1). Hansen and Jagannathan (1994) derive specification tests based on the size of the modifying payoff, which measures how far the model's candidate for a stochastic discount factor y_t is from a valid stochastic discount factor. Hansen and Jagannathan (1994) show that a natural measure of this distance is δ = [E(p_t²)]^{1/2}, which provides an economic interpretation for the model's misspecification. Payoffs that are orthogonal to p_t are correctly priced by the candidate y_t, and δ is the maximum amount of mispricing by using y_t for any payoff normalized to have a unit second moment. The modifying payoff p_t is also the minimal modification that is sufficient to make y_t a valid stochastic discount factor. Hansen and Jagannathan (1994) consider an estimator of the distance measure δ given as the solution to the following maximization problem:
δ_T = { Max_α T^{-1} Σ_t [ y_t² − (y_t + α'R_t)² + 2α'1_N ] }^{1/2} .    (5.5)
If α_T is the solution to (5.5), then the estimate of the modifying payoff is α'_T R_t. It can be readily verified that the first-order condition to (5.5) implies that y_t + α'_T R_t satisfies the sample counterpart to the asset pricing equation (3.1). To obtain an estimate of the sampling error associated with the estimated value δ_T, consider
u_t = y_t² − (y_t + α'_T R_t)² + 2α'_T 1_N .

The sample mean of u_t is δ_T². We can obtain a consistent estimator of the variance of δ_T² by applying the frequency-zero spectral density estimators described in Newey and West (1987a) or Andrews (1991) to the time series {u_t − δ_T²}_{t=1,...,T}. Let s_T denote the estimated standard deviation of δ_T² obtained this way. Then, under standard assumptions, we have that T^{1/2}(δ_T² − δ²)/s_T converges to a standard normal random variable. Hence, using the delta method, we obtain
2 T^{1/2} δ_T (δ_T − δ) / s_T ~ N(0, 1) .    (5.6)

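A compact sketch of the computation in (5.5)-(5.6) follows. It uses the first-order condition of (5.5) in closed form and, for brevity, only the lag-zero variance of u_t in place of the frequency-zero spectral density estimators recommended in the text; the function name and data conventions are our own assumptions.

```python
import numpy as np

def hj_distance(y, R):
    """Specification-error (distance) estimate sketched from (5.5)-(5.6).
    y: (T,) candidate SDF; R: (T, N) gross returns."""
    T, N = R.shape
    # First-order condition of (5.5): (1/T) sum_t R_t (y_t + alpha'R_t) = 1_N
    alpha = np.linalg.solve(R.T @ R / T, 1.0 - (R * y[:, None]).mean(axis=0))
    u = y ** 2 - (y + R @ alpha) ** 2 + 2.0 * alpha.sum()   # u_t in the text
    delta_sq = u.mean()                                     # = delta_T^2
    delta = np.sqrt(max(delta_sq, 0.0))
    s = np.sqrt(((u - delta_sq) ** 2).mean())               # lag-0 sd of u_t
    se_delta = s / (2.0 * delta * np.sqrt(T))               # delta method, (5.6)
    return delta, se_delta
```

The ratio of the estimated distance to its standard error can then be compared with standard normal critical values, in the spirit of (5.6).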
6. Conclusions

In this article we have reviewed econometric tests of a wide range of asset pricing models, where the models are based on the law of one price, the no-arbitrage principle, and models of market equilibrium with investor optimization. Our review included the earliest of the equilibrium asset pricing models, the CAPM, and also considered dynamic multiple-beta and arbitrage pricing models. We provided some results for the asymptotic distribution of traditional two-pass estimators for asset pricing models stated in the linear, return-beta formulation. We emphasized the econometric evaluation of asset pricing models by using


Hansen's (1982) generalized method of moments. Our examples illustrate the simplicity and flexibility of the GMM approach. We showed that most asset pricing models can be represented in a stochastic discount factor form, which makes the application of the GMM straightforward. Finally, we discussed model diagnostics that provide additional insight into the causes of the statistical rejections in GMM tests and which help assess the specification errors in these models.

Appendix
PROOF OF THEOREM 2.1. The proof comes from Jagannathan and Wang (1996). We first introduce some additional notation. Let I_N be the N-dimensional identity matrix and 1_T be a T-dimensional vector of ones. It follows from equation (2.17) that
R̄ − μ = T^{-1} (I_N ⊗ 1'_T) ε_k ,    k = 1, ..., K_2 ,

where

ε_k = (ε_{1k1}, ..., ε_{1kT}, ..., ε_{Nk1}, ..., ε_{NkT})' .

By the definition of b_k, we have that

b_k − β_k = [I_N ⊗ ((f'_k f_k)^{-1} f'_k)] ε_k ,

where f_k is the vector of demeaned factor realizations, conformable to the vector ε_k. In view of the assumption that the conditional covariance of ε_{ikt} and ε_{jlt}, given the time series of the factors (denoted by f_k), is a fixed constant σ_{ijkl}, we have that

E[(b_k − β_k)(R̄ − μ)' | f_k]
  = T^{-1} [I_N ⊗ ((f'_k f_k)^{-1} f'_k)] E[ε_k ε'_l | f_k] (I_N ⊗ 1_T)
  = T^{-1} [I_N ⊗ ((f'_k f_k)^{-1} f'_k)] (Σ_{kl} ⊗ I_T) (I_N ⊗ 1_T)
  = T^{-1} Σ_{kl} ⊗ [(f'_k f_k)^{-1} f'_k 1_T] = 0 ,


where we denote the matrix of the {σ_{ijkl}}_{ij} by Σ_{kl}. The last line follows from the fact that f'_k 1_T = 0. Hence we have shown that (b_k − β_k) is uncorrelated with (R̄ − μ). Therefore, the terms u and hγ_2 are uncorrelated, and the asymptotic variance of T^{1/2}(γ̂ − γ) in equation (2.15) is given by

(x'x)^{-1} x' [Var(u) + Var(hγ_2)] x (x'x)^{-1} .

Let π_{ijkl} denote the limiting value of Cov(T^{-1/2} f'_k ε_{ik}, T^{-1/2} f'_l ε_{jl}) as T → ∞. Let the matrix with π_{ijkl} as its ij-th element be denoted by Π_{kl}. We assume that the sample covariance matrix of the factors exists and converges in probability to a constant positive definite matrix Ω, with typical element Ω_{kl}. Since T^{1/2}(b_{ik} − β_{ik}) converges in distribution to the same limit as Ω_{kk}^{-1} T^{-1/2} f'_k ε_{ik}, we have


Var(hγ_2) = Σ_{k,l=1,...,K_2} γ_{2k} γ_{2l} (Ω_{kk}^{-1} Π_{kl} Ω_{ll}^{-1})

and

W = (x'x)^{-1} x' Var(hγ_2) x (x'x)^{-1}
  = Σ_{k,l=1,...,K_2} (x'x)^{-1} x' {γ_{2k} γ_{2l} (Ω_{kk}^{-1} Π_{kl} Ω_{ll}^{-1})} x (x'x)^{-1} ,

where Π_{kl} is the matrix whose ij-th element is the limiting value of Cov(T^{-1/2} f'_k ε_{ik}, T^{-1/2} f'_l ε_{jl}) as T → ∞.

Q.E.D.
References
Abel, A. (1990). Asset prices under habit formation and catching up with the Jones. Amer. Econom. Rev. Papers Proc. 80, 38M2. Andrews, D. W. K. (1991). Heteroskedasticity and autoeorrelation consistent covariance matrix estimation. Econometrica 59, 817-858. Andrews, D. W. K. and J. C. Monahan (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, 953-966. Arrow, K. J. (1970). Essays in the Theory o f Risk-Bearing. Amsterdam: North-Holland. Bansal, R. and S. Viswanathan (1993). No- arbitrage and arbitrage pricing: A new approach. 3". Finance 8, 1231-1262. Bansal, R., D. A. Hsieh and S. Viswanathan (1993). A new approach to international arbitrage pricing. J. Finance 48, 1719-1747. Becker, G. S. and K. M. Murphy (1988). A theory of rational addiction. J. Politic. Econom. 96, 675700. Beja, A. (1971). The structure of the cost of capital under uncertainty. Rev. Econom. Stud. 38(8), 359368. Berk, J. B. (1995). A critique of size-related anomalies. Rev. Financ. Stud. 8, 27.%286. Black, F. (1972). Capital market equilibrium with restricted borrowing. J. Business 45, 444-455. Black, F., M. C. Jensen and M. Scholes (1972). The capital asset pricing model: Some empirical tests. In: Studies in the Theory of Capital Markets, M. C. Jensen, ed., New York: Praeger, 79-121. Boudoukh, J., M. Richardson and T. Smith (1993). Is the ex ante risk premium always positive? A new approach to testing conditional asset pricing models. J. Financ. Econom. 34, 387M08. Breeden, D. T. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities. J. Financ. Econom. 7, 265-296. Brown, D. P. and M. R. Gibbons (1985). A simple econometric approach for utility-based asset pricing models. J. Finance 40, 359-381. Burnside, C. (1994). Hansen-Jagannathan bounds as classical tests of asset-pricing models. J. Business Econom. Statist. 12, 57-79. Campbell, J. Y. (1987). Stock returns and the term structure. J. Financ. Econom. 18, 373-399. Campbell, J. Y. and J. Cochrane (1995). By force of habit. Manuscript, Harvard Institute of Economic Research, Harvard University. Carhart, M., K. Welch, R. Stevens and R. Krail (1995). Testing the conditional CAPM. Working Paper, University of Chicago. Cecchetti, S. G., P. Lam and N. C. Mark (1994). Testing volatility restrictions on intertemporal marginal rates of substitution implied by Euler equations and asset returns. J. Finance 49, 123-152.


Chen, N. (1983). Some empirical tests of the theory of arbitrage pricing. J. Finance 38, 1393-1414. Chen, Z. and P. Knez (1992). A measurement framework of arbitrage and market integration. Working Paper, University of Wisconsin. Cochrane, J. H. (1996). A cross-sectional test of a production based asset pricing model. Working Paper, University of Chicago. Cochrane, J. H. and L. P. Hansen (1992). Asset pricing explorations for macroeconomics. In: NBER Macroeconomics Annual 1992, O. J. Blanchard and S. Fischer, eds., Cambridge, Mass.: MIT Press. Connor, G. (1984). A unified beta pricing theory. J. Econom. Theory 34, 13-31. Connor, G. and R. A. Korajczyk (1986). Performance measurement with the arbitrage pricing theory: A new framework for analysis. J. Financ. Econom. 15, 373-394. Constantinides, G. M. (1982). Intertemporal asset pricing with heterogeneous consumers and without demand aggregation. J. Business 55, 253-267. Constantinides, G. M. (1990). Habit formation: A resolution of the equity premium puzzle. J. Politic. Econom. 98, 519-543. Constantinides, G. M. and D. Duffle (1994). Asset pricing with heterogeneous consumers. Working Paper, University of Chicago and Stanford University. Cox, J. C., J. E. Ingersoll, Jr. and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53, 385-407. Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium. New York: Wiley. Detemple, J. B. and F. Zapatero (1991). Asset prices in an exchange economy with habit formation. Econometrica 59, 1633-1657. Dunn, K. B. and K. J. Singleton (1986). Modeling the term structure of interest rates under nonseparable utility and durability of goods. J. Financ. Econom. 17, 27-55. Dybvig, P. H. and J. E. Ingersoll, Jr., (1982). Mean-variance theory in complete markets. J. Business 55, 233-251. Eichenbaum, M. S., L. P. Hansen and K. J. Singleton (1988). A time series analysis of representative agent models of consumption and leisure choice under uncertainty. Quart. J. Econom. 103, 51-78. Epstein, L. G. and S. E. Zin (1989). Substitution, risk aversion and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica 57, 937-969. Epstein, L. G. and S. E. Zin (1991). Substitution, risk aversion and the temporal behavior of consumption and asset returns. J. Politic. Econom. 99, 263-286. Evans, M. D. D. (1994). Expected returns, time-varying risk, and risk premia. J. Finance 49, 655-679. Fama, E. F. and K. R. French. (1992). The cross-section of expected stock returns. J. Finance 47, 427465. Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, 607436. Ferson, W. E. (1983). Expectations of real interest rates and aggregate consumption: Empirical tests. J. Financ. Quant. Anal. 18, 477-497. Ferson, W. E. and G. M. Constantinides (1991). Habit persistence and durability in aggregate consumption: Empirical tests. J. Financ. Econom. 29, 199-240. Ferson, W. E. and S. R. Foerster (1994). Finite sample properties of the generalized method of moments tests of conditional asset pricing models. J. Financ. Econom. 36, 29-55. Ferson, W. E. and S. R. Foerster (1995). Further results on the small-sample properties of the generalized method of moments: Tests of latent variable models. In: Res. Financ., Vol. 13. Greenwich, Conn.: JAI Press, pp. 91-114. Ferson, W. E., S. R. Foerster and D. B. Keim (1993). General tests of latent variable models and mean-variance spanning. J. Finance 48, 131-156. 
Ferson, W. E. and C. R. Harvey (1991). The variation of economic risk premiums. J. Politic. Econom. 99, 385-415. Ferson, W. E. and C. R. Harvey (1992). Seasonality and consumption-based asset pricing. J. Finance 47, 511-552.


Ferson, W. E. and R. A. Korajczyk (1995). Do arbitrage pricing models explain the predictability of stock returns? J. Business 68, 309-349. Ferson, W. E. and J. J. Merrick, Jr. (1987). Non-stationarity and stage-of-the-business-cycle effects in consumption-based asset pricing relations. J. Financ. Econom. 18, 127-146. Gallant, R. (1987). Nonlinear Statistical Models. New York: Wiley. Gibbons, M. R. and W. Ferson (1985). Testing asset pricing models with changing expectations and an unobservable market portfolio. J. Financ. Econom. 14, 217-236. Gorman, W. M. (1953). Community preference fields. Econometrica 21, 63-80. Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054. Hansen, L. P., J. Heaton and E. G. J. Luttmer (1995). Econometric evaluation of asset pricing models. Rev. Financ. Stud. 8, 237-274. Hansen, L. P. and R. Hodrick (1983). Risk averse speculation in the forward foreign exchange market: An econometric analysis of linear models. In: Exchange Rates and International Macroeconomics, J. A. Frenkel, ed., Chicago: University of Chicago Press. Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. J. Politic. Econom. 99, 225-262. Hansen, L. P. and R. Jagannathan (1994). Assessing specification errors in stochastic discount factor models. NBER Technical Working Paper No. 153. Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econornetrica 55, 587~13. Hansen, L. P. and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, 1269-1286. Hansen, L. P. and K. J. Singleton (1983). Stochastic consumption, risk aversion, and the temporal behavior of asset returns. J. Politic. Econorn. 91,249-265. Harrison, M. and D. Kreps (1979). Martingales and arbitrage in multi-period securities markets. J. Econom. Theory 20, 381-408. Harvey, C. R. (1989). Time-varying conditional covariances in tests of asset pricing models. J. Financ. Econom. 24, 289-317. Harvey, C. R. (1991). The world price of covariance risk. J. Finance 46, 111-157. Heaton, J. (1995). An empirical investigation of asset pricing with temporally dependent preference specifications. Econometrica 63, 681-717. Ibbotson Associates. (1992). Stocks, bonds, bills, and inflation. 1992 Yearbook. Chicago: Ibbotson Associates. Jagannathan, R. (1985). An investigation of commodity futures prices using the consumption-based intertemporal capital asset pricing model. J. Finance 40, 175-191. Jagannathan R. and Z. Wang (1993). The CAPM is alive and well. Federal Reserve Bank of Minneapolis Research Department Staff Report 165. Jagannathan, R. and Z. Wang (1996). The conditional-CAPM and the cross-section of expected returns. J. Finance 51, 3-53. Kandel, S. (1984). On the exclusion of assets from tests of the mean-variance efficiency of the market portfolio. J. Finance 39, 63-75. Kandel, S. and R. F. Stambaugh (1987). On correlations and inferences about mean-variance efficiency. J. Financ. Econom. 18, 61 90. Lehmann, B. N. and D. M. Modest (1987). Mutual fund performance evaluation: A comparison of benchmarks and benchmark comparisons. J. Finance 42, 233-265. Leroy, S. F. and R. D. Porter (1981). The present value relation: Tests based on implied variance bounds. Econometrica 49, 555-574. Lewbel, A. (1989). Exact aggregation and a representative consumer. 
Quart. J. Econom. 104, 621~533. Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev. Econom. Statist. 47, 13-37. Lucas, R. E. Jr. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1445. Luttmer, E. (1993). Asset pricing in economies with frictions. Working Paper, Northwestern University.


McElroy, M. B. and E. Burmeister (1988). Arbitrage pricing theory as a restricted nonlinear multivariate regression model. J. Business Econom. Statist. 6, 29-42. MacKinlay, A. C. (1987). On multivariate tests of the CAPM. J. Financ. Econom. 18, 341-371. MacKinlay, A. C. and M. P. Richardson (1991). Using generalized method of moments to test meanvariance efficiency. J. Finance 46, 511-527. MacKinlay, A. C. (1995). Mulifactor models do not explain deviations from the CAPM. J. Financ. Econom. 38, 3-28. Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, 867-887. Merton, R. C. (1980). On estimating the expected return on the market: An exploratory investigation. J. Financ. Econom. 8, 323-361. Mossin, J. (1966). Equilibrium in a capital asset market. Econometrica 34, 768-783. Newey, W. (1985). Generalized method of moments specification testing. J. Econometrics 29, 229-256. Newey, W. K. and K. D. West (1987a). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-708. Newey, W. K. and K. D. West. (1987b). Hypothesis testing with efficient method of moments estimation, lnternat. Econom. Rev. 28, 777-787. Novales, A. (1992). Equilibrium interest-rate determination under adjustment costs. J. Econom. Dynamic Control 16, 1-25. Roll, R. (1977). A critique of the asset pricing theory's tests: Part 1: On past and potential testability of the theory: J. Financ. Econom. 4, 129-176. Ross, S. A. (1976). The arbitrage pricing theory of capital asset pricing. J. Econom. Theory 13, 341360. Ross, S. (1977). Risk, return and arbitrage. In: Risk and Return in Finance, I. Friend and J. L. Bicksler, eds. Cambridge, Mass.: BaUinger. Rubinstein, M. (1974). An aggregation theorem for securities markets. J. Financ. Econom. 1, 225-244. Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell J. Econom. Mgmt. Sci. 7, 407-425. Ryder H. E., Jr. and G. M. Heal (1973). Optimum growth with intertemporally dependent preferences. Rev. Econom. Stud. 40, 1-33. Shanken, J. (1987). Multivariate proxies and asset pricing relations: Living with the roll critique. J. Financ. Econom. 18, 91-110. Shanken, J. (1992). On the estimation of beta-pricing models. Rev. Financ. Stud. 5, 1-33. Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19, 425-442. Sharpe, W. F. (1994). The Sharpe ratio. J. Port. Mgmt. 21, 49-58. Shiller, R. J. (1979). The volatility of long-term interest rates and expectations models of the term structure. J. Politic. Econom. 87, 1190-1219. Shiller, R. J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? Amer. Econom. Rev. 71, 421-436. Singleton, K. J. (1980). Expectations models of the term structure and implied variance bounds. J. Politic. Econom. 88, 1159-1176. Snow, K. N. (1991). Diagnosing asset pricing models using the distribution of asset returns. J. Finance 46, 955-983. Stambaugh, R. F. (1982). On the exclusion of assets from tests of the two-parameter model: A sensitivity analysis. J. Financ. Econom. 10, 237-268. Sundaresan, S. M. (1989). Intertemporally dependent preferences and the volatility of consumption and wealth. Rev. Financ. Stud. 2, 73-89. Wheatley, S. (1988). Some tests of international equity integration. J. Financ. Econom. 21, 177 212. Wheatley, S. M. (1989). A critique of latent variable tests of asset pricing models. J. Financ. Econom. 23, 325-338. 
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-838. Wilson, R. (1968). The theory of syndicates. Econometrica 36, 119-132.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 © 1996 Elsevier Science B.V. All rights reserved.

")
z~

Instrumental Variables Estimation of Conditional Beta Pricing Models

Campbell R. Harvey and Chris Kirby

A number of well-known asset pricing models imply that the expected return on an asset can be written as a linear function of one or more beta coefficients that measure the asset's sensitivity to sources of undiversifiable risk. This paper provides an overview of the econometric evaluation of such models using the method of instrumental variables. We present numerous examples that cover both single-beta and multi-beta models. These examples are designed to illustrate the various options available to researchers for estimating and testing beta pricing models. We also examine the implications of a variety of different assumptions concerning the time-series behavior of conditional betas, covariances, and reward-to-risk ratios. The techniques discussed in this paper have applications in other areas of asset pricing as well.

1. Introduction

Asset pricing models often imply that the expected return on an asset can be written as a linear combination of market-wide risk premia, where each risk premium is multiplied by a beta coefficient that measures the sensitivity of the return on the asset to a source of undiversifiable risk in the economy. Indeed, this type of tradeoff between risk and expected return is implied by some of the most famous models in financial economics. The Sharpe (1964) - Lintner (1965) capital asset pricing model (CAPM), the Black (1972) CAPM, the Merton (1973) intertemporal CAPM, the arbitrage pricing theory (APT) of Ross (1976), and the Breeden (1979) consumption CAPM can all be classified under the general heading of beta pricing models. Although these models differ in terms of underlying structural assumptions, each implies a pricing relation that is linear in one or more betas. The fundamental difference between conditional and unconditional beta pricing models is the specification of the information environment that investors use to form expectations. Unconditional models imply that investors set prices based on an unconditional assessment of the joint probability distribution of future returns. Under such a scenario we can construct an estimate of an investor's


expected return on an asset by taking an average of past returns. Conditional models, on the other hand, imply that investors have time-varying expectations concerning the joint probability distribution of future returns. In order to construct an estimate of an investor's conditional expected return on an asset, we have to use the information available to the investor at time t − 1 to forecast the return for time t. Both conditional and unconditional models attempt to explain the cross-sectional variation in expected returns. Unconditional models imply that differences in average risk across assets determine differences in average returns. There are no time-series predictions other than that expected returns are constant. Conditional models have similar cross-sectional implications: differences in conditional risk determine differences in conditional expected returns. But conditional models have implications concerning the time-series properties of expected returns as well. Conditional expected returns vary with changes in conditional risk and fluctuations in market-wide risk premiums. In theory, we can test a conditional beta pricing model using a single asset. Empirical tests of beta pricing models can be interpreted within the familiar framework of mean-variance analysis. Unconditional tests seek to determine whether a certain portfolio is on the efficient portion of the unconditional mean-variance frontier. The unconditional frontier is determined by the unconditional means, variances, and covariances of the asset returns. Conditional tests of beta pricing models are designed to answer a similar question: does a certain portfolio lie on the efficient portion of the mean-variance frontier at each point in time? In conditional tests, however, the mean-variance frontier is determined by the conditional means, conditional variances, and conditional covariances of asset returns. As a general rule, the rejection of unconditional efficiency does not imply a rejection of conditional mean-variance efficiency. This is easily demonstrated using an example given by Dybvig and Ross (1985) and Hansen and Richard (1987). Suppose we are testing whether the 30-day Treasury bill is unconditionally efficient using monthly data. Unconditionally, the 30-day bill does not lie on the efficient frontier. It is a single risky asset (albeit a low-risk one) whose return has non-zero variance. Thus it is surely dominated by an appropriately chosen portfolio. At the conditional level, however, the conclusion is much different. Conditionally, the 30-day bill is nominally risk free. At the end of each month we know precisely what the return will be over the next month. Because the conditional variance of the return on the T-bill is zero, it must be conditionally efficient. A number of different methods have been proposed for testing beta pricing models. This paper focuses on one in particular: the method of instrumental variables. Instrumental variables are a set of data, specified by the econometrician, that proxy for the information that investors use to form expectations. The primary advantage of the instrumental variables approach is that it provides a highly tractable way of characterizing time-varying risk and expected returns. Our discussion of the instrumental variables methodology is organized along the


following lines. Section 2 uses the conditional version of the Sharpe (1964) - Lintner (1965) CAPM to illustrate how the instrumental variables approach can be employed to estimate and test single-beta models. Section 3 extends the analysis to multi-beta models. Section 4 introduces the technique of latent variables. Section 5 provides an overview of the estimation methodology. The final section offers some brief closing remarks.

2. Single beta models

A. The conditional CAPM

The conditional version of the Sharpe (1964) - Lintner (1965) CAPM is undoubtedly one of the most widely studied conditional beta pricing models. We can express the pricing relation associated with this model as:

E[r_{jt}|Ω_{t−1}] = (Cov[r_{jt}, r_{mt}|Ω_{t−1}] / Var[r_{mt}|Ω_{t−1}]) E[r_{mt}|Ω_{t−1}]    (1)

where r_{jt} is the return on portfolio j from time t − 1 to time t measured in excess of the risk-free rate, r_{mt} is the excess return on the market portfolio, and Ω_{t−1} represents the information set that investors use to form expectations. The ratio of the conditional covariance between the return on portfolio j and the return on the market, Cov[r_{jt}, r_{mt}|Ω_{t−1}], to the variance of the return on the market, Var[r_{mt}|Ω_{t−1}], is the conditional beta of portfolio j with respect to the market. Any cross-sectional variation in expected returns can be attributed solely to differences in conditional beta coefficients. As it stands the pricing relation shown in (1) is untestable. To make it testable we have to impose additional structure on the model. In particular, we have to specify a model for conditional expectations. Thus any test of (1) will be a joint test of the conditional CAPM and the assumed specification for conditional expectations. In theory any functional form could be used. Let f(Z_{t−1}) denote the statistical model that generates conditional expectations, where Z is a set of instrumental variables. The function f(·) could be a linear regression model, a Fourier flexible form [Gallant (1982)], a nonparametric kernel estimator [Silverman (1986), Harvey (1991), and Beneish and Harvey (1995)], a semi-nonparametric density [Gallant and Tauchen (1989)], a neural net [Gallant and White (1990)], an entropy encoder [Glodjo and Harvey (1995)], or a polynomial series expansion [Harvey and Kirby (1995)]. Once we take a stand on the functional form of the conditional expectations operator it is straightforward to construct a test of the conditional CAPM. First we use f(·) to obtain fitted values for the conditional mean of r_{jt}. This nails down the left-hand side of the pricing relation in (1). Then we apply f(·) again to get fitted values for the three components on the right-hand side of (1). Combining the fitted values for the conditional mean of r_{mt}, those for the conditional covariance between r_{jt} and r_{mt}, and those for the conditional variance of r_{mt} yields


fitted values for the right-hand side of (1). If the conditional CAPM is valid, then the pricing errors - the differences between the fitted values for the left-hand and right-hand sides of (1) - should be small and unpredictable. This is the basic intuition behind all tests of conditional beta pricing models. In the presentation that follows we focus on one particular specification for conditional expectations: the linear model. This model, though very simple, has distinct advantages over the many nonlinear alternatives. The linear model is exceedingly easy to implement, and Harvey (1991) shows that it performs well against nonlinear alternatives in out-of-sample forecasting of the market return. In addition, the linear specification is actually more general than it may seem. Recent work has shown that many nonlinear models can be consistently approximated via an expanding sequence of finite-dimensional linear models. Harvey and Kirby (1995) exploit this fact to develop a simple procedure for constructing analytic tests of both single-beta and multi-beta pricing models.

B. Linear conditional expectations


The easiest way to motivate the linear specification for conditional expectations is to assume that the joint distribution of the asset returns and instrumental variables is spherically invariant. This class of distributions is analyzed in Vershik (1964), who shows that it is sufficient for linear conditional expectations, and applied to tests of the conditional CAPM in Harvey (1991). Vershik (1964) provides the following characterization. Consider a set of random variables, {x_1, ..., x_n}, that have finite second moments. Let H denote the linear manifold spanned by this set. If all random variables in the linear manifold H that have the same variance have the same distribution, then: (i) H is a spherically invariant space; (ii) {x_1, ..., x_n} is spherically invariant; and (iii) every distribution function of any variable in H is a spherically invariant distribution. The above requirements are satisfied, for example, by both the multivariate normal and multivariate t distributions. A potential disadvantage of Vershik's (1964) definition is that it does not encompass processes like the Cauchy for which the variance is undefined. Blake and Thomas (1968) and Chu (1973) propose a definition for an elliptical class of distributions that addresses this shortcoming. A random vector x is said to have an elliptical distribution if and only if its probability density function p(x) can be expressed as a function of a quadratic form, p(x) = f(x'C^{-1}x), where C is positive definite. When the variance-covariance matrix of x exists it is proportional to C, and the Vershik (1964), Blake and Thomas (1968) and Chu (1973) definitions are equivalent.² But the quadratic form of the density also covers processes like the Cauchy that imply linear conditional expectations where the projection constants depend on the characteristic matrix.

² Implicit in Chu's (1973) definition is the existence of the density function. Kelker (1970) provides an alternative approach in terms of the characteristic function. See also Devlin, Gnanadesikan and Kettenring (1976).
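A small simulation can make the linearity property concrete for one member of this class, the multivariate normal: the least-squares projection of one variable on another recovers the analytic conditional expectation. The variable names and parameter values below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a "return" (x1) and an "instrument" (x2) from a joint normal
# distribution, one member of the spherically invariant class.
mu = np.array([0.01, 0.00])
Sigma = np.array([[0.0025, 0.0010],
                  [0.0010, 0.0016]])
x = rng.multivariate_normal(mu, Sigma, size=100_000)

# Analytic conditional expectation: E[x1|x2] = mu1 + (S12/S22)(x2 - mu2),
# which is linear in x2.
analytic_slope = Sigma[0, 1] / Sigma[1, 1]

# The least-squares projection of x1 on (1, x2) recovers the same line.
Z = np.column_stack([np.ones(len(x)), x[:, 1]])
ols_intercept, ols_slope = np.linalg.lstsq(Z, x[:, 0], rcond=None)[0]
print(analytic_slope, ols_slope)  # close, up to sampling error
```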


C. A general framework for testing the CAPM

A linear specification for conditional expectations implies that the return on portfolio j can be written as:

r_{jt} = Z_{t−1} δ_j + u_{jt} ,    (2)

where u_{jt} is the error in forecasting the return on portfolio j at time t, Z_{t−1} is a row vector of g instrumental variables, and δ_j is a g × 1 vector of time-invariant weights. Substituting the expression shown in (2) into equation (1) yields the restriction:
Zt-l~m E. . . . Z t - l ~ J -- 1 7 [ , , - ~ ' ~ ] [UjtUmt]~St--lJ ' ~t~mtlL, t-l ]

(3)

where Umt is the error in forecasting the return on the market portfolio. Note that both the variance term, E[U2m~IZ,_I],and the covariance term, E[ujtUmtlZt_l], are conditioned on Zt-1. Therefore, the pricing relation in (3) should be regarded as an approximation. This is the case because the expectation of the true conditional covariance is not the covariance conditioned on Zt-1. The two are connected via the relation: E [ C o v ( r j t , r r n t l ~ t - 1 ) l Z t - l ] = Cov(rj,, rmtlZt_l ) - Cov(E[rjtll2t_l ], E[rmt ]12t_l]lZt_l). An analogous relation holds for the true conditional variance of rmt and the variance conditioned on Zt-1. There is no way to construct a test of the original version of pricing restriction given that the true information set 12 is unobservable. If we multiply both sides of (3) by the conditional variance of the return on the market portfolio we obtain the restriction:

E[um,

E[ujtu,.tZ,-,

mlZ,_l]

(4)

Notice that the conditional expected return on both the market portfolio and portfolio j have been moved inside the expectations operator. This can be done because both of these quantities are known conditional on Zt-a. As a result, we do not need to specify an explicit model for the conditional variance and covariance terms. We simply note that, under the null hypothesis, the disturbance:

ejt :-- umtZt-l j - u j t u m t Z t - l a m

(5)

should have mean zero and be uncorrelated with the instrumental variables. If we divide ejr by the conditional variance of the market return, then the resulting quantity can be interpreted as the deviation of the observed return from the return predicted by the model. Thus ejr is essentially just a pricing error. A negative pricing error implies the model is overpricing while a positive pricing error indicates that the model is underpricing. The generalized method of moments (GMM), which is discussed in detail in Section 5, provides a direct way to test the above restriction. Suppose we have a total of n assets. We can stack the disturbances in (2) and the pricing errors in (5) into the (2n + 1) 1 vector:

40

c. R. Harvey and C. Kirby

It, - z,_l a]'


st - ( ut Umt et )'= [rmt - Zt-l~m]' [u2tZt_ l ~ -- UmttltZt_ l t~m]t

)
, (6)

where u is the innovation in the 1 n vector o f conditional means and e is the 1 n vector o f pricing errors. The conditional C A P M implies that st should be uncorrelated with Zt-1. So if we f o r m the K r o n e c k e r p r o d u c t o f st with the vector o f instrumental variables: st ZJt_l , (7)

and take unconditional expectations, we obtain the vector o f orthogonality conditions: E[st @ ZJ,_I] = 0 .

(8)

With n assets there are n -4- 1 columns o f innovations for the conditional means and n columns o f pricing errors. Thus, with g instrumental variables we have g(2n + 1) orthogonality conditions. Note, however, that there are g(n + 1) parameters to estimate. This leaves ng overidentifying restrictions. 3 We can obtain consistent estimates o f the ng matrix o f coefficients ~ and the g 1 vector o f coefficients ~m by minimizing the quadratic objective function: JT = g r S ~ g~ , where:
1 T
t --I

(9)

g r =~ -~ t~l St Q Zrt-1 , and ST denotes a consistent estimate of:


O(3

(10)

So = ~
j~--oo

El(st Z t _ l ) ( S t - j @ Zt_j_l)

(11)

I f the conditional C A P M is true then T times the minimized value o f the objective function converges to a central chi-square r a n d o m variable with ng degrees o f freedom. Thus we can use this criterion as a measure o f the overall goodness-of-fit o f the model.

3 An econometric specification of this form is explored for New York Stock Exchange returns in Harvey (1989) and Huang (1989), for 17 international equity returns in Harvey (1991), for international bond returns in Harvey, Solnik and Zhou (1995), and for emerging equity market returns in Harvey (1995).

Instrumental variablesestimation of conditional betapricing models D. Constant conditional betas

41

The econometric specification shown in (6) assumes that all of the conditional moments - the means, variances and covariances - change through time. I f some of these moments are constant then we can construct more powerful tests of the conditional C A P M by imposing this additional structure. Traditionally, tests of the C A P M have focused on whether expected returns are proportional to the expected return on a benchmark portfolio. We can construct the same type of test within our conditional pricing framework with a specification of the form:

= (r, -

(12)

where p is a row vector of n beta coefficients. The coefficient/~j represents the ratio of conditional covariance between the return on portfolio j and the return on the benchmark to the conditional variance of the benchmark return. Typically, we think of r,,t as a proxy for the market portfolio. It is important to note, however, that the beta coefficients in (12) are left unrestricted. Thus (12) can also be interpreted as a test of a single factor latent variables model. 4 In the latent variables framework, flj represents the ratio of conditional covariance between the return on portfolio j and an unobserved factor to the conditional covariance between the return on the benchmark portfolio and this factor. The testable implication is that E[etlZt-1] = 0 where nt is the vector of pricing errors associated with the constant conditional beta model. There are ng orthogonality conditions and n parameters to estimate so we have g ( n - 1) overidentifying restrictions. O f course we can easily incorporate the restrictions on the conditional beta coefficients by changing the specification to:

( [r,-Z,_,n]'
gt=(U, Urn, bt et) ,= [ F m Z ta l tm ] 2 l t
[u ,p -

I /

(13)

where b is the disturbance vector associated with the constant conditional beta assumption. Tests based on this specification may shed additional light on the plausibility of the assumption of constant conditional betas. With n assets there are n + 1 columns of innovations in the conditional means, n columns in b and n columns in e. Thus there are g(3n + 1) orthogonality conditions, g(n + 1 ) + n parameters to estimate, and n(2g- 1) overidentifying restrictions.

E. Constant conditional reward-to-risk ratio


Another formulation of the conditional C A P M assumes that the conditional reward-to-risk ratio is constant. The conditional reward-to-risk ratio,

4 See, for example, Hansen and Hodrick (1983), Gibbons and Ferson (1985) and Ferson (1990).

42

C. R. Harvey and C. Kirby

E[r,nt[~2t_l]/Var[rmtlI2t_l], is simply the price of covariance risk. This version of the conditional C A P M is examined in Campbell (1987) and Harvey (1989). The vector of pricing errors for the model becomes: et = rt
211tUmt ~

(14)

where 2 is the conditional expected return on the market divided by its conditional variance. To complete the econometric specification we have to include models for the conditional means. The overall system is:

,t=(ut

Umt et)'=

[rmt-Zt-lt~m] t
[r, -

(15)

With n assets there are n + 1 columns of innovations in the conditional means and n columns in e. Thus with g instrumental variables there are g(2n + 1) orthogonality conditions and 1 + (e(n + 1)) parameters. This leaves ng - 1 overidentifying restrictions. One way to simplify the estimation in (15) is to note that E[umtujtllt_l] = E[umtrjtlZt-l]. This follows from the fact that: E[umttljt[Zt_l] = E[umt(Fjt - Zt_llfj)lZt_l]

= E[umtrjtlZ,-l] - E[umtZt-l,~SlZt-1] = E[umtrjtlZt-1] - E [ t l m t l Z t - l ] Z t - l ~ j


: E[umtrjtlZt_l] .

As a result, we can drop n of the conditional mean equations. The more parsimonious system is: gt :- (Umt et) = ( [ r m-t - ,~(Umti't)] Zt-15m]t) (16)

N o w we have n + 1 equations and g(n + 1) orthogonality conditions. With g + 1 parameters there are ( n g ) - 1 overidentifying restrictions. The specifications shown in (t5) and (16) are asymptotically equivalent. But (16) is more computationally manageable. The specifications in (15) and (16) do not restrict 2 to be the conditional covariance to variance ratio. We can easily add this restriction:

"t:-(ut

Umt mt

et) '=

[U2t ~ -- Zt_lam] !
[i, t - ,~(Umtrt)] !

[rmt-Zt-l~m]'

'

(17)

where m is the disturbance associated with the constant reward-to-risk assumption. Tests of this specification should shed additional light on the plausibility of the assumption of a constant price of covariance risk. With n assets there are n columns in u, one column in urn, one column in m and n columns in e. Thus there

Instrumental variables estimation of conditional beta pricing models

43

are g(2n + 2) orthogonality conditions, g(n + 1) + 1 parameters, and n - 1 overidentifying restrictions.


F. Linear conditional betas

Ferson and Harvey (1994, 1995) explore specifications where the conditional betas are modelled as a linear functions of the instrumental variables. We could, for example, specify an econometric system of the form:
i~w Ulit ~- ?'it -- Z t _ l ~ i

U2t = rmt -- Z t _ l ~ m
bl3it = 2 [U2t (Z i,w t t-1 Ki ) --

rmtU,it]'

(l 8) lain)!

U4it = # i -- Z t - l a i i,w usit = (-oci + I~i) - Z , I Ki(Z ~

where the elements of Zi,wri are the fitted conditional betas for portfolio i, #i is the mean return on portfolio i, and ~i is the difference between the unrestricted mean return and the mean return that incorporates the pricing restriction of the conditional CAPM. Note that (18) uses two sets of instruments. The set used to estimate the conditional mean return on portfolio i and the conditional beta for the portfolio, Z i,w, includes both asset specific (/) and market-wide (w) instruments. The conditional mean return on the market is estimated using only the market-wide instruments. This yields an exactly identified system of equations, s The intuition behind the system shown in (18) is straightforward. The first two equations follow from our assumption of linear conditional expectations. They represent statistical models for expected returns. The third equation follows from the definition of the conditional beta:
flit = (g[u2,]Z,-l])
2 w - 1

E[rmtulitlZt-1]

i,w

(19)

In (18) the conditional beta is modelled as a linear function of both the assetspecific and market-wide information. The last two equations deliver the average pricing error for the conditional CAPM. Note that & is the average fitted return from the statistical model. Thus ~i is the difference between the average fitted return from our statistical model and the fitted return implied by the pricing relation of conditional CAPM. It is analogous to the Jensen ~. In the current analysis, however, both the betas and the risk premiums are changing through time. Because of the complexity and size of the above system it is difficult to estimate from more one asset at a time. Thus, in general, not all the cross-sectional restrictions of conditional C A P M can be imposed, and it is not possible to report a multivariate test of whether the ~i are equal to zero. Note, however, that (18) 5For analysis of related systems see Ferson (1990), Shanken (1990), Ferson and Harvey (1991), Ferson and Harvey (1993), Ferson and Korajzcyk (1995), Ferson (1995), Harvey (1995) and Jagannathan and Wang (1996).

44

C.R. Harvey and C. Kirby

does impose one important cross-sectional restriction. Because the system is exactly identified, the market risk premium, Z~t_16m, will be identical for every asset examined. There are no overidentifying restrictions, so tests of the model are based on whether the coefficient ~i is significantly different from zero. Additional insights might be gained by analyzing the time-series properties of the disturbance:
i~w U6it = ?'it - - Z t _ l l i ( ~ t _ l O) t i~w

(20)

Under the null hypothesis, E[u6itlZt_l] is equal to zero. Thus diagnostics can be conducted by regressing u6it on various information variables. We could also construct tests for time-varying of betas based on the coefficient estimates associated with Zi,wxi.

3. Models with multiple betas


A. The multi-beta conditional C A P M The conditional CAPM can easily be generalized to a model that has multiple sources of risk. Consider, for example, a k-factor pricing relation of the form: E[rtIZt_I ] =

E lZ,_d (E[u'I,u/,IZt_, l ) -

(21)

where r is a row vector of n asset returns, f i s 1 x K vector of factor realizations, u f is a vector of innovations in the conditional means of the factors, and u is a vector of innovations in the conditional means of the returns. The first term on the right-hand side of (21) represents the conditional expectation of the factor realizations. It has dimension 1 k. The second term is the inverse of the k x k conditional variance-covariance matrix of the factors. The final term measures the conditional covariance of the asset returns with the factors. Its dimension is k n. The multi-beta pricing relation shown in (21) cannot be tested in the same manner as its single-beta counterpart. Recall that in our analysis of single-beta models it was possible to take the conditional variance of the market return to the left-hand side of the pricing relation. As a result, we could move the conditional means inside the expectations operator. This is not possible with a multi-beta specification. We can, however, get around this problem by focusing on specializations of the multi-beta model that parallel those discussed in the previous section. We begin by considering specifications that restrict the conditional betas to be linear functions of the instruments. B. Linear conditional betas The multi-beta analogue of the linear conditional beta specification shown in (18) takes the form:

Instrumental variables estimation o f conditional beta pricing models


i~W

45

Uli t = rit -- Z t _ l ~ i U2t ~ ' f t -- Z t - l ~ f I i,w I il3i ` = [/./2,/./2t ( Z t - 1!i)


U4i t = ]2i - - l i,W t _ l ~i

-ftulit]

(22)
'

U5it = (__0~ i 71_ ]2i) - - Z t _i,w l K i ( Z t _ l ~ fw )

where the elements of Zi'wri are the fitted conditional betas associated with the k sources of risk a n d f i s a row vector of factor realizations. Note that as before the system is exactly identified, and the vector of conditional betas:

/~.

= (E[u2,u2,1~_l])

"~W

--1

E[f,ul.l~_l]

',W

(23)

is modelled as a linear function, Zi, Wri, of the instruments. This specification can be tested by assessing the statistical significance of the pricing errors and checking to see whether the disturbance:
ll6i t ~- tit -- Z t _ l K i ( Z t _ l ~ f )
i,w w
t

(24)

is orthogonal to instruments. The primary advantage of the above formulation is that fitted values are obtained for the risk premiums, the expected returns, and the conditional betas. Thus it is simple to conduct diagnostics that focus on the performance of the model. Its main disadvantage is that it requires a heavy parameterization.

C. Constant conditional reward-to-risk ratios


Harvey (1989) suggests an alternative approach for testing multi-beta pricing relations. His strategy is to assume that the conditional reward-to-risk ratio is constant for each factor. This results is a multi-beta analogue of the specification shown in (15):

\
~.t = ( gt

ttft et ) ' =

~ -- Zt-laf]' /
t,., -

(25)

where 2 is a row vector of k time-invariant reward-to-risk measures. The above system can be simplified to:
g't : ( lift et

)'=

[rt - 2(lltftrt)] t )

'

(26)

using the same approach that allowed us to simplify the single-beta specification discussed earlier. 6
6 Kan and Zhang (1995) generalize this formulation by modelling the conditional reward-to-risk ratios as linear functions of the instrumental variables. Their approach eliminates the need for assetspecific instruments and permits joint estimation of the pricing relation using multiple portfolios. But the type of diagnostics that fall out of the linear conditional beta model - fitted expected returns, betas, etc. - are no longer available.

46

C. R. Harvey and C. Kirby

4. Latent variables models The latent variables technique introduced by Hansen and Hodrick (1983) and Gibbons and Ferson (1985) provides a rank restriction on the coefficients of the linear specifications that are assumed to describe expected returns. Suppose we assume that ratio formed by taking the conditional beta for one asset and dividing it by the corresponding conditional beta another asset is constant. Under these circumstances, the k-factor conditional beta pricing model implies that all of the variation in the expected returns is driven by changes in the k conditional risk premiums. We can still form our estimates of the conditional means by projecting returns on the g-dimensional vector of instrumental variables. But if all the variation in expected returns is being driven changes in the k risk premiums then we should not need all ng projection coefficients to characterize the time variation in the n returns. Thus the basic idea of the latent variables technique is to test restrictions on the rank of the projection coefficient matrix.
A. Constant conditional beta ratios

First we take the vector of excess returns on our set of portfolios and partition it
as:

rt-~ ( t i t

" r2t),

(34)

where rlt is a 1 k vector of returns on the reference assets and r2t is a 1 (n - k) vector of returns on the test assets. Then we partition the matrix of conditional beta coefficients associated with our multi-factor pricing model accordingly: ~=(Pl " /~2), (35)

where/~1 is k x k and/~2 is k x (n - k). The pricing relation for the multi-beta model tells us that: E[r~tlZt_l] = ~,t/~l and
E[r2tlZt-1] = ~tIl2 ,

(36)

(37)

where ~'t is a 1 x k vector of time-varying market-wide risk premiums. We can manipulate (36) to obtain the relation ~t = E [ r l t l Z t - l ] ~ 1. Substituting this expression for ~t into (37) yields the pricing restriction: E[r2,1/,-l] = E[rl,l/,-1]/~i-l/~2 (38)

This says that the conditional expected returns on the test assets are proportional to the conditional expected returns on the reference assets. The constants of proportionality are determined by ratios of conditional betas.

Instrumental variablesestimation of conditional beta pricing models

47

The pricing relation in (38) can be tested in much the same manner as the models discussed earlier. 7 The only real difference is that we no longer have to identify the factors. One possible specification is:
\

,~t -~- ( Ult U2t et )'=

[Zt_162 -

[l'2t -- Zt_162] t ] Zt_lal~]t/

(39)

where = flllfl2. There are k columns in ult, n - K columns in u2, and n - k columns in et. Thus we have g(2n - k) orthogonality conditions and gn + k(n - k) parameters. This leaves ( - k)(n - k) overidentifying restrictions. Note that both the number of instrumental variables and the total number of assets must be greater than the number of factors.
B. Linear conditional covariance ratios

An important disadvantage of (39) is that the ratio of conditional betas, ~/i = fltlfl2, is assumed to be constant. One way to generalize the latent variables model is to assume the elements of ~/i are linear in the instrumental variables. 8 This assumption follows naturally from the previous specifications that imposed the assumption of linear conditional betas. The resulting latent variables system is:

ilt = (lilt U2t gt )t= (

[rlt - Zt-161]' It2, - z , _ 1 6 2 ] '


[z,_162 - z,_la (,

(40)

w h e r e , is a k 1 vector of ones. With the original set of instruments the dimension of ~* in the final set of moment conditions is g(n - k) and the system is not identified. Thus the researcher must specify some subset of the original instruments, Z*, with dimension g* < g to be used in the estimation. Finally, the parameterization in both (39) and (40) can be reduced by substituting the third equation block into the second block. For example,
~t = (/glt e,) t = ( [Plt-Zt-161]t ) [r2t -- Z,_161~] / (41) '

In this system, it is not necessary to estimate 62.


5. Generalized method of moments estimation

Contemporary empirical research in financial economics makes frequent use of a wide variety of econometric techniques. The generalized method of moments has proven to be particularly valuable, however, especially in the area of estimating and testing asset pricing models. This section provides an overview of the gen7 Harvey, Solnik and Zhou (1995) and Zhou (1995) show to construct analytic tests of latent variables models. s See Ferson and Foerster (1994).

48

C. R. Harvey and C Kirby

eralized method of moments (GMM) procedure. We begin by illustrating the intuition behind G M M using a simple example of classical method of moments estimation This is followed by brief discussion of the assumptions underlying the G M M approach to estimation and testing along with a review of some of the key distributional results. For detailed proofs of the consistency and asymptotic normality of G M M estimators see Hansen (1982), Gallant and White (1988), and Potscher and Prucha (1991a,b).

A. The Classical method of moments


The easiest way to illustrate the intuition behind the G M M procedure is to consider a simple example of classical method of moments (CMM) estimation. Suppose we observe a random sample xl,x2,... ,xr of T observations drawn from a distribution with probability density function f(x; 0), where 0 =- [01,02,..., Ok] denotes a k 1 vector of unknown parameters The CMM approach to estimation exploits the fact that in general the jth population moment of x about zero:
mj ~

E[xj] ,

(42)

can be written as known function of 0. To implement the C M M procedure we first compute the jth sample moment of x about zero:
1 T /.'1"~ .

rhj = ~ - ~

(43)

Then we set the jth sample moment equal to the corresponding population moment f o r j = 1 , 2 , . . . , k :
?~v/1 ~ml(0
.

)
(44)

r~2

=
.

m2(0)
ink(O)

~nk

This yields a set of k equations in k unknowns that can be solved to obtain an estimator for the unknown vector 0. Thus the basic idea behind the C M M procedure is to estimate 0 by replacing population moments with their sample analogues Now let's take a more concrete version of the above example. Suppose that xa, x 2 , . . . , xr is a random sample of size T drawn from a normal distribution with mean # and variance ~r 2. To obtain the classical method of moments estimators of # and O"2 w e note that 0 "2 = m2 - (ml)2 This implies that the system of moments equations takes the form:

1 Ti~lXi~ ]~
(45)
__ 1L X2 i ~ - 0"2 -}- //2
.

Ti= 1

Instrumental variables estimation of conditional beta pricing models

49

Consequently, the C M M estimators for the mean and variance are:

=~xi
(46)

1 u

~2

1 L X~i=Ti=I

xi

Notice that these are also the maximum likelihood estimators of/~ and a 2.

B. The Generalized method of moments


The classical method of moments is just a special case of the generalized method of moments developed by Hansen (1982). This latter procedure provides a general framework for estimation and hypothesis testing that can be used to analyze a wide variety of dynamic economic models. Consider, for example, the class of models that generate conditional moment restrictions of the form:

Et[ut+j = 0 ,

(47)

where Et[. ] is the expectations operator conditional on the information set at time t, ut+, - h(Xt+,, 00) is an n x 1 vector of vector of disturbance terms, Xt+~ is an s x 1 vector of observable random variables, and 00 is an m x 1 vector of unknown parameters. The basic idea behind the G M M procedure is to exploit the moment restrictions in (47) to construct a sample objective function whose minimizer is a consistent and asymptotically normal estimate of the unknown vector 00. In order to construct such an objective function, however, we need to make some assumptions about the nature of the data generating process. Let Zt denote the date t realization of an g x 1 vector of observable instrumental variables. We assume, following Hansen (1982), that the vector process {Xt, Zt}~=_~o is strictly stationary and ergodic. Note that this assumption rules out a number of features sometimes encountered in economic data such as deterministic trends, unit roots, and unconditional heteroskedasticity. It accommodates many common forms of conditional heterogeneity, however, and it does not appear to be overly restrictive in most applications. 9 With suitable restrictions on the data generating process in place we can proceed to construct the G M M objective function. First we form the Kronecker product:
f ( x , + ~, z , , 0o) - u,+ ~ z , .

(48)

Then we note that because Zt is in the information set at time t, the model in (47) implies that:
9 Although is possible to establish consistency and asymptotic normality of GMM estimators under weaker assumptions, the associated arguments are too complex for an introductory discussion. The interested reader can consult Potscher and Prucha (1991 a,b) for an overview of recent advances in the asymptotic theory of dynamic nonlinear econometric models.

50

c. R. Harveyand C. Kirby Et[f(Xt+~, Zt, 00)] = 0 .


(49)

Applying the law of iterated expectations to equation (49) yields the unconditional restriction:

E[f(Xt+,, Zt, 00)] = 0 .

(50)

Equation (50) represents a set of n population orthogonality conditions. The sample analogue of E[f(Xt+~, Zt, 0)]:
1 r

gr(0) -- ~ ~-~f(Xt+~, Z,, O) ,


t=l

(51)

forms the basis for the G M M objective function. Note that for any given value of 0 the vector 9r(0) is just the sample mean of T realizations of the random vector f(Xt+~, Zt, 0). Given that f(.) is continuous and {Xt, Zt}t~=_~ is strictly stationary and ergodic we have:

gr(O) P--~E[f(Xt+~,Zt, 0)]

(52)

by the law of large numbers. Thus if the economic model is valid the vector gr(00) should be close to zero when evaluated for a large number of observations. The G M M estimator of 00 is obtained by choosing the value of 0 that minimizes the overall deviation o f g r ( 0 ) from zero. As long as E[f(Xt+~, Zt, 0)] is continuous in 0 it follows that this estimator is consistent under fairly general regularity conditions. If the model is exactly identified (m = ng), the G M M estimator is the value of 0 that sets the sample moments equal to zero. For the more common situation where the model is overidentified (m < n~), finding a vector of parameters that sets all of the sample moments equal to zero is not feasible. It is possible, however, to find a value of 0 that sets m linear combinations of the ng sample moment conditions equal to zero. We simply let Ar be an m x ng matrix such that ATgT(O) = 0 has a well-defined solution. The value of 0 that solves this system of equations is the G M M estimator. Although we have considerable leeway in choosing the weighting matrix At, Hansen (1982) shows that the variancecovariance matrix of the estimator is minimized by letting Ar equal D'rST a where Dr and ST are consistent estimates of: Do =-- E|-

o)L ] 00 Z~

and

So ~ ~
j=--OO

F0(j) ,

(53)

with F0(j) - E[f(Xt+~, Zt, Oo)f(Xt+~-j, Zt-j, 00)']. Before considering how to derive this result we first have to establish the asymptotic normality of G M M estimators.

C. Asymptotic normality of GMM estimators


We begin by expressing equation (51) as:

Instrumental variables estimation of conditional beta pricing models


1 T

51

v/Tot(O) = ~

f(Xt+~, Zt, O) .

(54)

The assumption that {Xt, Zt}t~=_~ is stationary and ergodic, along with standard regularity conditions, implies that a version of the central limit theorem holds. In particular we have that:

v/TgT(Oo) a-~N(O, So) ,

(55)

with So given by (53). This result allows us to establish the limiting distribution of the G M M estimator Or. First we make the following assumptions: 1. The estimator Or converges in probability to 00. 2. The weighting matrix Ar converges in probability to A0 where Ao has rank m. 3. Define:

Dr =- - A . ~

T t= 1 ~

"

O0

oT )

Zt

(56)

For any Or such that 0rP00 the matrix Dr converges in probability to Do where Do has rank m. Then we apply the mean value theorem to obtain:

gr(Or) = gr(O0) + O*~(Or -- 00) ,

(57)

where D~ is given by (56) with Or replaced by a vector 0~ that lies somewhere within the interval whose endpoints are given by Or and 00. Recall that Or is the solution to the system of equations ATgr(O) = 0. So if we premultiply equation (57) by AT we have:

AmT(Oo) + ATD* (OT -

00) = 0.

(58)

Solving (58) for (0r - 00) and multiplying by v/T gives:


v/T(OT -- 00) = --[ATID*T]-IATV~aT(O0) ,

(59)

and by Slutsky's theorem we have:

x/T ( OT - 0o) L -[AoOo]-lAo


{the limiting distribution of v/TgT(00)) . Thus the limiting distribution of the GMM estimator is:

(60)

v/-T(OT -- 00)~a N(0, (AoOo)-~ AoSoA'o(AoOo) -1') .

(61)

Now that we know the limiting distribution of the generic G M M estimator we can determine the best choice for the weighting matrix AT. The natural metric by

52

c. R. Harvey and C. Kirby

which to measure our choice is the variance-covariance matrix of the distribution shown in (61). We want, in other words, to choose the Ar that minimizes the variance-covariance matrix of the limiting distribution of the G M M estimator.

D. The asymptotically efficient weighting matrix


The first step in determining the efficient weighting matrix is to note that So is symmetric and positive definite. Thus So can be written as So = p p r where P is nonsingular, and we can express the variance-covariance matrix in (61) as:

V -- (AoDo)-IAoSoA~o(AoDo)-I' = (AoDo)-IAoP((AoDo)-IAoP) ' = ( H + (D~oS~o1Do)-ID~o(P')-')(H+ (D~oS~olDo)-ID~o(P')-I)t


where: (62)

H-

(AoDo)-IAoP-

' 1Do)-1 Do(t~) , -l (Oo~

At first it may appear a bit odd to define H in this manner, but it simplifies the problem of finding the efficient choice for At. To see why this is true note that:

I t P - l Do = ( AoDo)-l AoPP-1Do - ( Dro~oolDo)-l Dto( p ' ) - l p-1Do _- i t - I


=0 As a consequence equation (62) reduces to: V = H / - / + (D~)So-o 100) -1

(63)

(64)

Because H is an m x ng matrix with rank m it follows that HEft is positive definite. Thus (D~S~01D0)-1 is the lower bound on the asymptotic variance-covariance matrix of the G M M estimator. It is easily verified by direct substitution that choosing A0 = D~S0-01 achieves this lower bound. This completes our review of the distribution theory for G M M estimators. Next we want to consider some of the practical aspects of G M M estimation and see how we might go about testing the restrictions implied economic models. We begin with a strategy for implementing the G M M procedure.

E. The estimation procedure


To obtain an estimate for the vector of unknown parameters 00 we have to solve the system of equations:
.4m~(O) = o

Substituting the optimal choice for the weighting matrix into this expression yields:

Instrumental variables estimation of conditional beta pricing models

53 (65)

g (o) = o ,

where ST is a consistent estimate of the matrix So. But it is apparent that (65) is just the first-order condition for the problem: min 0 JT(O)
=

9T(O)tSTlgT(O)

(66)

So given a consistent estimate of So we can obtain the G M M estimator for 0o by minimizing the quadratic form shown in equation (66). In order to estimate 00 we need a consistent estimate of So. But, in general, So is a function of 00. The solution to this dilemma is to perform a two-step estimation procedure. Initially we set ST equal to the identify matrix and perform the minimization to get a first-stage estimate for 00. Although this estimate is not asymptotically efficient it is still consistent. Thus we can use it to construct a consistent estimate of So. Once we have a consistent estimate of So we obtain the second-stage estimate for 00 by minimizing the quadratic form shown above. Let's assume that we have performed the two-step estimation procedure and obtained the efficient G M M estimate of the vector of parameters 00. Typically we would like to have some way of evaluating how well the model fits the observed data. One way of obtaining such a goodness-of-fit measure is to construct a test of the overidentifying restrictions.

F. The test for overidentifying restrictions


Suppose the model under consideration is overidentified (m < ng). Under such circumstances we can develop a test for the overall goodness-of-fit of the model. Recall that by the mean value theorem we can express 9r(0r) as:

gT(OT) = gr(O0) + D*r(OT -- 00) .

(67)

If we multiply equation (67) by v ~ and substitute for v~(OT -- 0O) from equation (59) we obtain:

v/-TgT(Or) = (I-- D*r(ArD*r)-I AT)v/Tgr(Oo)


Substituting in the optimal choice for AT yields:

(68)

v /-Tg T ( OT ) = (I-- D ~ ( D~rS~r l D*T) - l D~rS~r l ) ,u/-fg r ( Oo) ,


so that by Slutsky's theorem:

(69)

v/-TgT(OT)L(I-- Do(DtoSffo l Do)-lDtoS~o l ) >( N(O, So) .

(70)

Because So is symmetric and positive definite it can be factored as So = P ~ where P is nonsingular. Thus (70) can be written as:

54

C. R. Harvey and C. Kirby

v / T P - 1 9 r ( O r ) a-+(l - p-lDo(D~oSolDo)-lD~o(P')-I ) N(0,13

(71)

The matrix premultiplying the normal distribution in (71) is idempotent with rank ng - m. It follows, therefore, that the overidentifying test statistic:
MT -- TgT(Or)'~oo ~gT(OT)

(72)

converges to a central chi-square random variable with n g - m degrees of freedom. The limiting distribution of Mr remains the same if we use a consistent estimate S r in place of So. Note that in many respects the test for overidentifying restrictions is analogous to the Lagrange multiplier test in maximum likelihood estimation. The G M M estimator of 00 is obtained by setting m linear combinations of the ng orthogonality conditions equal to zero. Thus there are ng - m linearly independent combinations which have not been set equal to zero. Suppose we took these ng - m linear combinations of the moment conditions and set them equal to a (n - m) x 1 vector of unknown parameters e. The system would then be exactly identified and Mr would be identically equal to zero. Imposing the restriction that =0 yields the efficient G M M estimator along with a quantity Tgr(Or)'S~rlgr(Or) that can be viewed as the G M M analogue of the score form of the Lagrange multiplier test statistic. The test for overidentifying restrictions is appealing because it provides a simple way to gauge how well the model fits the data. It would also be convenient, however, to be able to test restrictions on the vector of parameters for the model. As we shall see, such tests can be constructed in a straightforward manner.
G. H y p o t h e s i s testing in G M M

Suppose that we are interested in testing restrictions on the vector of parameters of the form:

q(Oo) =

(73)

where q is a known p x 1 vector of functions. Let the p x m matrix Qo =- Oq/00~ denote the Jacobian of q(O) evaluated at 00. By assumption Q0 has rank p. We know that for the efficient choice of the weighting matrix the limiting distribution of the G M M estimator is:
,v/--T(OT _ o0) d N ( o , (D~S0-01D0)-1) .

(74)

Thus under fairly general regularity conditions the standard large-sample test criteria are distributed asymptotically as central chi-square random variables with p degrees of freedom when the restrictions hold. Let O~ and O} denote the unrestricted estimator and the estimator obtained by minimizing Jr(O) subject to q(O) = O. The Wald test statistic is based on the unrestricted estimator. It takes the form:

Instrumental variables estimation of conditional beta pricing models


u t t -1 -1 t -1 u

55

Wr -- Tq(Or) (QT(DTS~. OT)

Q r)

q(OT) ,

(75)

where QT, Dr and Sr are consistent estimates of Q0, Do and So computed using 0~. The Lagrange multiplier test statistic is constructed using the gradient of Jr(O) evaluated at restricted estimator. It is given by:

LMr ~ TgT(Or) S T DT(DTS: ~ Dr)

-1

-1

-1

DTSTr 9r(O~r) ,

(76)

where Dr and Sr are consistent estimates of Do and So computed from ~r- The likelihood ratio type test statistic is equal to the difference between the overidentifying test statistic for the restricted and unrestricted estimations:
err - g r ( o r ). ,

1O r ( 0 r .) )

(77)

The same estimate ST must be used for both estimations. It should be clear from the foregoing discussion that a consistent estimate of So is one of the key elements of the G M M approach to estimation and testing. In practice there are a number of different methods for estimating So, and the appropriate method often depends on the specific characteristics of the model under consideration. The discussion below provides an introduction to heteroskedasticity and autocorrelation consistent estimation of the variancecovariance matrix. A more detailed treatment can be found in Andrews (1991).

H. Robust estimation of the variance-covariance matrix


The variance-covariance matrix of v/-TgT(OO) is given by:
O~3

So -

Z
j~-oo

F0(j) ,

(78)

where F0(j) - E[f(Xt+~, Zt, Oo)f(Xi+~_j, Zt-j, 00)']. Because we have assumed stationarity, this matrix can also be written as:
oo

So = F0(0) + Z ( F 0 ( J ' ) + F0(j)') ,


j=l

(79)

using the relation Fo(-j) = Fo(j)'. Now we want to consider how we might go about estimating So consistently. First take the scenario where the vector f(Xt+~, Zt, 00) is serially uncorrelated. Under such circumstances the second term on the right-hand side of equation (79) drops out and
T

rr(0)

_-- 1 / T
t=l

z,,

z,, 0r)'

provides a consistent estimate for So. The case where f(-) exhibits serial correlation is more complicated. Note that the sum in equation (79) contains an infinite number of terms. It is obviously

56

C. R. Harvey and C. Kirby

impossible to estimate each of these terms. One way to proceed would be to treat f(.) as if it were serially correlated for a finite number of lags L. Under such circumstances a natural estimator for So would be:
L

sT = rT(0) + ~ ( r T ( j )
j=l

+ rT(])') ,

(80)

where Fr(j)=--1/T~rt=l+jf(Xt+~,Zt, Or)f(Xt+~:_j, Zt_j, Or)'. As long as the individual Fr(j) in equation (80) are consistent the estimator ST will be consistent providing that L is allowed to increase at suitable rate as the sample size T increases. But the estimator of So in (80) is not guaranteed to be positive semidefinite. This can lead to problems in empirical work. The solution to this difficulty is to calculate ST as a weighted sum of the Fr(j) where the weights gradually decline to zero as j increases. If these weights are chosen appropriately then ST will be both consistent and positive semidefinite. Suppose we begin by defining the ng(L + l) ng(L + l) partitioned matrix:

CT(L)=

rr(O) rr(1)

rT(1)'

...

rT(L)'
. rT(0)

rr(o)

...

r T ( L - 1)'1

/
J

(81)

krTin)

rT(L-1)

..:

The matrix Cr(L) can always be written in the form Cr(L) = Y Y where Y is an (T + L) ng(L + 1) partitioned matrix. Take L = 2 as an example. The matrix Y is given by:

0
f(Xl+~, Zl, Or)'
:

f(Xl+~, Z1,0r)':
f ( XT +z ' Z T , OT) t

y=!

0
f(Xl+~, Zl, OT)'

v~

(82)

: f(XT+~, ZT, Or)' y(xr+~, zr, or)' o


From this result it follows that consider the matrix:

0 o

CT(L) is

a positive semidefinite matrix. Next

ST(L)=[~ol

~ll...aL1]

rT(0) ... rr(1)......, ...

rT(r~)' r~(L.- 1)' rT(0)

~,oX]
(83)

LrriL)

where the ~i are scalars. Because St(L) is the partitioned-matrix equivalent of a quadratic form in a positive semidefinite matrix it must also be positive semidefinite. Equation (83) can be rearranged to show that:

Instrumental variables estimation of conditional beta pricing models

57

St(L) = (82 + . . . + CCL2)Fr(O)

:,:,+:)(r:(j)+rTo)')

(84)

The weighted sum on right-hand side of equation (84) has the general form of an estimator for the variance-covariance matrix So. Thus if we select the ~+ so that the weights in (84) are a decreasing function of L and we allow L to increase with the sample size at an appropriately slow rate we obtain a consistent positive semidefinite estimator for So. The modified Bartlett weights proposed by Newey and West (1987) have been used extensively in empirical research. Let wj be the weight placed on FT(j) in the calculation of the variance-covariance matrix. The weighting function for modified Bartlett weights takes the form:

wj=

L+I

j=0,1,2,...,L j>L,

(85)

where L is the lag truncation parameter. Note that these weights are obtained by setting ei = 1/v/~ + 1 for i = 0, 1 , . . . ,L. Newey and West (1987) show that ifL is allowed to increase at a rate proportional to T 1/3 then ST based on these weights will be a consistent estimator of So. Although the weighting scheme proposed by Newey and West (1987) is popular, recent research has shown that other schemes may be preferable. Andrews (1991) explores both the theoretical and empirical performance of a variety of different weighting functions. Based on his results Parzen weights seem to offer an good combination of analytic tractability and overall performance. The weighting function for Parzen weights is: 1
6j2 -'} -6j3 o<J< 1

wj =

2(1 _ j ) 3 0

1 < j_< 1 ~>1

(86)

The final question we need to address is how choose the lag truncation parameter L in (86). The simplest strategy is to follow the suggestions of Gallant (1987) and set L equal to the integer closest to T 1/5. The main advantage of this plug-in approach is that it is yields an estimator that depends only on the sample size for the data set in question. An alternative strategy developed by Andrews (1991), however, may lead to better performance in small samples. He suggests the following data-dependent approach: use the first-stage estimate of 00 to construct the sample analogue of f(Xt+~,Z,,Oo). Then estimate a first-order autoregressive model for each element of this vector. The autocorrelation coefficients along with the residual variances can be used to estimate the value of L that minimizes the asymptotic truncated mean-squared-error of the estimator. Andrews (1991) presents Monte Carlo results that suggest that estimators of So constructed in this manner perform well under most circumstances.

58

C. R. Harvey and C. Kirby

6. Closing remarks
Asset pricing models often imply that the expected return on an asset can be written as a linear function o f one or m o r e beta coefficients that measure the asset's sensitivity to sources o f undiversifiable risk in the economy. This linear tradeoff between risk and expected return makes such models b o t h intuitively appealing and analytically tractable. A n u m b e r o f different methods have been p r o p o s e d for estimating and testing beta pricing models, but the m e t h o d o f instrumental variables is the a p p r o a c h o f choice in most situations. The p r i m a r y advantage o f the instrumental variables a p p r o a c h is that it provides a highly tractable way o f characterizing time-varying risk and expected returns. This paper provides an introduction the econometric evaluation o f b o t h conditional and unconditional beta pricing models. We present n u m e r o u s examples o f h o w the instrumental variable m e t h o d o l o g y can be applied to various models. W e began with a discussion o f the conditional version o f the Sharpe (1964) - Lintner (1965) C A P M and used it to illustrate h o w the instrumental variables a p p r o a c h could be used to estimate and test single beta models. Then we extended the analysis to models with multiple betas and introduced the concept o f latent variables. We also provided an overview o f the generalized m e t h o d o f m o m e n t s a p p r o a c h ( G M M ) to estimation and testing. All o f the techniques developed in this paper have applications in other areas o f asset pricing as well.

References
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858. Bansal, R. and C. R. Harvey (1995). Performance evaluation in the presence of dynamic trading strategies. Working Paper, Duke University, Durham, NC. Beneish, M. D. and C. R. Harvey (1995). Measurement error and nonlinearity in the earnings-returns relation. Working Paper, Duke University, Durham, NC. Black, F. (1972). Capital market equilibrium with restricted borrowing. J. Business 45, 444-454. Blake, I. F. and J. B. Thomas (1968). On a class of processes arising in linear estimation theory. IEEE Transactions on Information Theory IT-14, 12-16. Bollerslev, T., R. F. Engle and J. M. Wooldridge (1988). A capital asset pricing model with time varying covariances. J. Politic. Econom. 96, 116-31. Breeden, D. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities. J. Financ. Econom. 7, 265-296. Campbell, J. Y. (1987). Stock returns and the term structure. J. Financ. Econom. 18, 373~400. Carhart, M. and R. J. Krail (1994). Testing the conditional CAPM. Working Paper, University of Chicago. Chu, K. C. (1973). Estimation and decision for linear systemswith elliptically random processes. IEEE Transactions on Automatic Control AC-18, 499-505. Cochrane, J. (1994). Discrete time empirical finance. Working Paper, University of Chicago. Devlin, S. J. R. Gnanadesikan and J. R. Kettenring, Some multivariate applications of elliptical distributions. In: S. Ideka et al., eds., Essays in probability and statistics, Shinko Tsusho, Tokyo, 365-393.

Instrumental variables estimation of conditional beta pricing models

59

Dybvig, P. H. and S. A. Ross (1"985). Differential information and performance measurement using a security market line. J. Finance 40, 383-400. Dumas, B. and B. Solnik (1995). The world price of exchange rate risk. J. Finance 445-480. Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, 607~36. Ferson, W. E. (1990). Are the latent variables in time-varying expected returns compensation for consumption risk. J. Finance 45, 397-430. Ferson, W. E. (1995). Theory and empirical testing of asset pricing models. In: Robert A. J. W. T. Ziemba and V. Maksimovic, eds. North Holland 145~200 Ferson, W. E., S. R. Foerster and D. B. Keim (1993). General tests of latent variables models and mean-variance spanning. J. Finance 48, 131-156. Ferson, W. E. and C. R. Harvey (1991). The variation of economic risk premiums. J. Politic. Econom. 99, 285-315. Ferson, W. E. and C. R. Harvey (1993). The risk and predictability of international equity returns. Rev. Financ. Stud. 6, 522566. Ferson, W. E. and C. R. Harvey (1994a). An exploratory investigation of the fundamental determinants of national equity market returns. In: Jeffrey Frankel, ed., The internationalization of equity markets, Chicago: University of Chicago Press, 59-138. Ferson, W. E. and R. A. Korajczyk (1995) Do arbitrage pricing models explain the predictability of stock returns. J. Business, 309-350. Ferson, W. E. and Stephen R. Foerster (1994). Finite sample properties of the Generalized Method of Moments in tests of conditional asset pricing models. J. Financ. Econom. 36, 29-56. Gallant, A. R. (1981). On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form. 3". Econometrics 15, 211-224. Gallant, A. R. (1987). Nonlinear statistical models. John Wiley and Sons, NY. Gallant, A. R. and G. E. Tauchen (1989). Seminonparametric estimation of conditionally constrained heterogeneous processes. Econometrica 57, 1091-1120. Gallant, A. R. and H. White (1988). A unified theory of estimation and inference for nonlinear dynamic models. Basil Blackwell, NY. Gallant, A. R. and H. White (1990). On learning the derivatives of an unknown mapping with multilayer feedforward networks. University of California at San Diego. Gibbons, M. R. and W. E. Ferson (1985). Tests of asset pricing models with changing expectations and an unobservable market portfolio. J. Financ. Econom. 14, 217-236. Glodjo, A. and C. R. Harvey (1995). Forecasting foreign exchange market returns via entropy coding. Working Paper, Duke University, Durham NC. Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054. Hansen, L. P. and R. J. Hodrick (1983). Risk averse speculation in the forward foreign exchange market: An econometric analysis of linear models. In: Jacob A. Frenkel, ed., Exchange rates and international macroeconomics, University of Chicago Press, Chicago, IL. Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. J. Politic. Econom. 99, 225-262. Hansen, L. P. and R. Jagannathan (1994). Assessing specification errors in stochastic discount factor models. Unpublished working paper, University of Chicago, Chicago, IL. Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55, 587~513. Hansen, L. P. and K. J. Singleton (1982). 
Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica, 50, 1269-1285. Harvey, C. R. (1989). Time-varying conditional covariances in tests of asset pricing models. J. Financ. Econom. 24, 289-317. Harvey, C. R. (1991a). The world price of covariance risk. J. Finance 46, 111-157. Harvey, C. R. (1991b). The specification of conditional expectations. Working Paper, Duke University.

60

C. R. Harvey and C. Kirby

Harvey, C. R. (1995), Predictable Risk and returns in emerging markets, Rev. Financ. Stud. 773-816. Harvey, C. R. and C. Kirby (1995). Analytic tests of factor pricing models. Working Paper, Duke University, Durham, NC. Harvey, C. R., B. H. Solnik and G. Zhou (1995). What determines expected international asset returns? Working Paper, Duke University, Durham, NC. Huang, R. D. (1989). Tests of the conditional asset pricing model with changing expectations. Unpublished working Paper, Vanderbilt University, Nashville, TN. Jagannathan, R. and Z. Wang (1996). The CAPM is alive and well. J. Finance 51, 3-53. Kan, R. and C. Zhang (1995). A test of conditional asset pricing models. Working Paper, University of Alberta, Edmonton, Canada. Keim, D. B. and R. F. Stambaugh (1986). Predicting returns in the bond and stock market. J. Financ. Econom. 17, 357-390. Kelker, D. (1970). Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhy~, series A, 419-430. Kirby, C (1995). Measuring the predictable variation in stock and bond returns. Working Paper, Rice University, Houston, Tx. Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev. Econom. Statist. 47, 13-37. Merton, R. C. (1973). An intertemporal capital asset pricing model. Eeonometrica 41, 867-887. Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity-consistent covariance matrix. Eeonometrica 55, 703-708. Potscher, B. M. and I. R. Prucha (1991a). Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part I: Consistency and approximation concepts. Econometric Rev. 10, 125-216. Potscher, B. M. and I. R. Prucha (1991b). Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part II: Asymptotic normality. Econometric Rev. 10, 253-325. Ross, S. A. (1976). The arbitrage theory of capital asset pricing. J. Econom. Theory 13, 341-360. Shanken, J. (1990). Intertemporal asset pricing: An empirical investigation. J. Econometrics 45, 99120. Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19, 425-442. Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall. Solnik, B. (1991). The economic significance, of the predictability of international asset returns. Working Paper, HEC-School of Management. Vershik, A. M. (1964). Some characteristics properties of Gaussian stochastic processes. Theory Probab. Appl. 9, 353-356. White, H. (1980). A heteroskedasticity consistent covariance matrix estimator and a direct test of heteroskedasticity. Econometrica 48, 817-838. Zhou, G. (1995). Small sample rank tests with applications to asset pricing. J. Empirical Finance 2, 7194.

G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved.

"2
J

Semiparametric Methods for Asset Pricing Models

Bruce N. Lehmann

This paper discusses semiparametric estimation procedures for asset pricing models within the generalized method of moments (GMM) framework. G M M is widely applied in the asset pricing context in its unconditional form but the conditional mean restrictions implied by asset pricing theory are seldom fully exploited. The purpose of this paper is to take some modest steps toward removing these impediments. The nature of efficient G M M estimation is cast in a language familiar to financial economists: the language of maximum correlation or optimal hedge portfolios. Similarly, a family of beta pricing models provides a natural setting for identifying the sources of efficiency gains in asset pricing applications. My hope is that this modest contribution will facilitate more routine exploitation of attainable efficiency gains.
1. Introduction

Asset pricing relations in frictionless markets are inherently semiparametric. That is, it is commonplace for valuation models to be cast in terms of conditional moment restrictions without additional distributional assumptions. Accordingly, a natural estimation strategy replaces population conditional moments with their sample analogues. Put differently, the generalized method of moments (GMM) framework of Hansen (1982) tightly links the economics and econometrics of asset pricing relations. While applications of G M M abound in the asset pricing literature, empirical workers seldom make full use of the G M M apparatus. In particular, researchers generally employ the unconditional forms of the procedures which do not exploit all of the efficiency gains inherent in the moment conditions implied by asset pricing models. There are two plausible reasons for this: (1) the information requirements are often sufficiently daunting to make full exploitation seem infeasible and (2) the literature on efficient semiparametric estimation is somewhat dense. The purpose of this paper is to take some modest steps toward removing these impediments. The nature of efficient G M M estimation is cast in terms familiar to financial economists: the language of maximum correlation or optimal hedge
61

62

B. N. Lehmann

portfolios. Similarly, a family of beta pricing models provides a natural setting for identifying the sources of efficiency gains in asset pricing applications. My hope is that this modest contribution will facilitate more routine exploitation of attainable efficiency gains. The layout of the paper is as follows. The next section provides an outline of G M M basics with a view toward the subsequent application to asset pricing models. The third section lays out the links between the economics of asset prices when markets do not permit arbitrage opportunities and the econometrics of asset pricing model estimation given the conditional moment restrictions implied by the absence of arbitrage. The general efficiency gains discussed in these two sections are worked out in detain in the fourth section, which documents the sources of efficiency gains in beta pricing models. The final section provides some concluding remarks.

2. Some relevant aspects of the generalized method of moments (GMM)


Before elucidating the links between G M M and asset pricing theory, it is worthwhile to lay out some G M M basics with an eye toward the applications that follow. The coverage is by no means complete. For example, the relevant large sample theory is only sketched (and not laid out rigorously) and that which is relevant is only a subset of the estimation and inference problems that can be addressed with G M M . The interested reader is referred to the three surveys in Volume 11 of this series Hall (1993), Newey (1993), and Ogaki (1993) for more thorough coverage and references. The starting point for G M M is a moment restriction of the form:

E[gt(O_o)lZt_l ] = E[_gt(_00) ]= 0

(2.1)

where 9t(_00) is the conditional mean zero random q x 1 vector in the model, 00 is the associated p x l vector of parameters in the model, and It-1 is some unspecified information set that at least includes lagged values of 9_t(_00). The restriction to zero conditional mean random variables means that 9_t(_00) follows a martingale difference sequence and, thus, is serially uncorrelated.1 A variety of familiar econometric models take this form. Consider, for example, the linear regression model:

Yt = X/~o + 5,

(2.2)

where yt is the tth observation on the dependent variable, x t is a p 1 vector of explanatory variables, and st is a random disturbance term. In this model, suppose that the econometrician observes a vector z t for which it is known that E[~t[Zt_l] = 0. Then this model is characterized by the conditional moment condition: l The behavior of GMM estimatorscan be readily establishedwhen O_t(_0 ) is serially dependentso long as a law of large numbers and central limit theorem apply to its time series average.

Semiparametric methodsfor asset pricing models

63

9t(~_o) = etz_t_1;

E[etz_,_lIZt_l] = E[stz_t_l] = E[et]zt_ 1 = _0 .

(2.3)

When z~_1 = x_ t this is the linear regression model with possibly stochastic regressors; otherwise, it is an instrumental variables estimator. G M M involves setting sample analogues of these moment conditions as close to zero as possible Of course, they cannot all be set to zero if the number of linearly independent moment conditions exceeds the number of unknown parameters Instead, G M M takes p linear combinations of these moment conditions and seeks values of _0 for which these linear combinations are zero First, consider the unconditional version of the moment condition - that is, E[9_t(0_0)] = 0. In order for the model to be identified, assume that gt(O_o) possesses a nonsingular population covariance matrix and that E[Ogt(Oo)l/O0] has full row rank. The G M M estimator can be derived in two ways. Following Hansen (1982), the G M M estimator _O r minimizes the sample quadratic form based on a sample of T observations on _gt(00):

rr~n#r(O_)'Wr(O_o)#_r(O_O_ );
-

_#r(_O) = ~Zg__t(_O)
t=l

7"

(2.4)

given a positive definite weighting matrix Wr(_00) converging in probability to a positive definite limit W(_00). In this variant, the econometrician chooses WT(O_o) to give the G M M estimator desirable asymptotic properties. Alternatively, we can simply define the estimator __O~.as the solution to the equation system:
1
= t=l

T
= 0 (2.5)

where Ar(Oo) is a sequence of p x q 0p(1) matrices converging to a limit A(_0) with row rank p. In this formulation, At(_00) is chosen to give the resulting estimator desirable asymptotic properties. The estimating equations for the two variants are, of course, identical in form since: AT(00)OT(_O~) = G~_O~rV~(00)O~(_OT)= 0 ;

{00
_ P . . . . . . . , ^ p - - __ , .

(2.6)

For my purpose, equation (2.5) is a more suggestive formulation. The large sample behavior of -Or is straightforward, particularly in this case where _9 t (0_0) a martingale difference sequence. 2 An appropriate weak law of large numbers insures that g r ( 0 0 ) ~ 0 , which, coupled with the identification condmons, lmphes that __0T-+__00.So long as the necessary time series averages converge:

2 The standard referenceon estimation and inferencein this frameworkis Hansen (1982).

64

B. N. Lehmann
s (_00) =
t=l

] < s(0_0) ; 0)_g,COo)'


(2.7)

IS(O_o)[ > 0

aT(O_o)Pa(O_o)

the standard first order T a y l o r expansion coupled with Slutsky's t h e o r e m yields:


1 T

X/-f( O--T--O-o) P~ - D(O-o)~Et=l gt(O) ;


n(_00) = [G(_00)

(2.8)

W(O_.o)G(O)t]-IG(O_o) W(O_o)
(2.9)

and an appropriate central limit t h e o r e m for martingales ensures that v~(_0r - _00) --~ N[0,

D(O_o)S(O_o)n(O_o)' ] .

Consistent standard error estimates that are robust to conditional heteroskedasticity can be calculated f r o m this expression by replacing 00 with _0r.3 W h a t choice Of AT(Oo) or, equivalently, o f Wr(0_0) is optimal? All fixed weight estimators - that is, those that apply the same matrix Ar(Oo) to each gt(_00) for fixed T - are consistent under the weak regularity conditions sketched above. Accordingly, it is natural to c o m p a r e the asymptotic variances o f estimators, a criterion that can, of course, be justified m o r e formally by confining attention to the class of regular estimators that rules out superefficient estimators. The asymptotically optimal A(_00) is obtained by equating WT(O_o) with ST(00) -1, 1 t 1 yielding an asymptotic covariance matrix of [G(OO_o)S(Oo)-^G(O_o) ]- . Once again, ST(O_o) can be estimated consistently by replacing _0 with _0r.4 The optimal unconditional G M M estimator has a clear connection with the m a x i m u m likelihood estimator (MLE), even though we do not k n o w the probability law generating the data. Let &at(0_o,r/) denote the logarithm o f the p o p u l a t i o n conditional distribution of the data underlying g_t(00) where 17 is a possibly infinite dimensional set of nuisance parameters. Similarly, let ~frt(-_00,~_) denote the true score function, the vector of derivatives o f ~t(00, ~/), with respect to 0. Consider the 'unconditional p o p u l a t i o n projection o f &aTt(00,r/) on the m o m e n t conditions g t(_00):

3 Autocorrelation is not present under the hypothesis that ~(_0) has conditional mean zero and is sampled only once per period (that is, the data are not overlapping). If the data are overlapping, the moment conditions will have a moving average error structure. See Hansen and Hodrick (1980) for a discussion of covariance matrix estimation in this case and Hansen and Singleton (1982) and Newey and West (1987) for methods appropriate for more general autocorrelation. 4 The possible singularity of ST(0) is discussed indirectly in Section 4.3 as part of the justification for factor structure assumptions. While my focus is not on hypothesis testing, the quadratic form in the fitted value of the moment conditions and the optimal weighting matrix yields the test statistic -- p) since p degrees of freedom are used in estimating _0.This test of T O__T(O_T)'ST(O_r)-I~T(O_T)~z2(q overidentifying conditions is known as Hansen's J test.

Semiparametric methods for asset pricing models

65

L~'t(Oo,_q) = Cov[~'t(O_o, _q),~(__Oo)']Var[o_t(_Oo)]-l_gt(Oo)+ = -~-~(_00)


_ [oo,(Oo)']
T = E [~(_00)Ot(_00)']

V_~ut ;

+_V~u, ;
(2.1o)

since E[Lf~t(0_0,_q),gt(_00)'] = - ~ is zero given sufficient regularity to allow differentiation of the moment condition E[g~(_00)] = 0 under the integral sign. In this notation, the asymptotic variance of the unconditional G M M estimator is
[~ I/.t- 1 (~1]-1.

Hence, the optimal fixed linear combination of moment conditions A (00) has the largest unconditional correlation with the true, but unknown, conditional score in finite samples. This fact does not lead to finite sample efficiency statements for at least two reasons. First, the M L E itself has no obvious efficiency properties in finite samples outside the case where the score takes the linear form 1(0o)(0_ - __00)where I(0o) is the Fisher information matrix. Second, the feasible optimal estimator replaces 0_ 0 with _O r in A(O_o), yielding a consistent estimator with no obvious finite sample efficiency properties. Nevertheless, the optimal fixed weight G M M estimator retains this optimality property in large samples. Now consider the conditional version of the moment condition; that is, E[gt(Oo)[It_l ] = 0. The prior information available to the econometrician is that _gt(_00) is a martingale difference sequence. Hence, the econometrician knows only that linear combinations of the g~(_00) with weights based on information available at time t - 1 have zero means - nonlinear functions of _gt(_00) have unknown moments given only the martingale difference assumption. Since the econometrician is free to use time varying weights, consider estimators of the form: 5
1 T

; A,_, 1,_1
t=l

/211/

where At-1 is a sequence of p x q Op(1) matrices chosen by the econometrician. In order to identify the model, assume 9t(Oo) has a nonsingular population conditional covariance matrix E[gt(00)gt(0~/']li_l] and that E[Og_t(Oo)'/OO_llt_l ] has full row rank. The basic principles of asymptotically optimal estimation and inference in the conditional and unconditional cases are surprisingly similar ignoring the difficulties associated with the calculation of conditional expectations E[e[It]. 6 Once again, under suitable conditional versions of the regularity conditions sketched above:
5 The estimators could, in principle, involve nonlinear functions of these time series averages but their asymptotic linearity means that their effect is absorbed in At-1. 6 Hansen (1985), Tauchen (1986), Chamberlain (2987), Hansen, Heaton, and Ogaki (1988), Newey (1990), Robinson (1991), Chamberlain (1992), and Newey (1993) discuss efficient G M M estimation in related circumstances.

66

B. N. Lehmann T E 1t~_tAt-l~t-I 1 ---+Dc(O);

[T t~=l

ao_

-+

,-,=E L @

II,

(2.12)

1 < A t_lOt(Oo)Ot(Oo)At_l_ 1 T ' ' p ~EAt_lE[gt(O_o)gt(Oo)t[lt_l]Att_l


- - - t = l - - -

Lsc( O_o)
the sample m o m e n t condition (2.11) is asymptotically linear so that:

x/T(O_T -- 0o) P -- Dc (00) ~ ~ A t-19_t(0_o)


and x/T(_0 r - _00) --~ N[0,

(2.12)

Dc(O_o)Sc(O_o)Dc(O_O_o)' ] .

(2.13)

T h e econometrician can choose the weighting matrices At-1 to minimize the asymptotic variance of this estimator. The weighting matrices A l which are optimal in this sense are given by:

Att 1 =

~t_l I//~_11 ;

I/it_ 1 =

E[o_t(O__o)Y_t(O_o)tllt_l]

(2.14)

and the resulting minimal asymptotic variance is:

Var[v~(_Or - _0_0o)] ~

~t-1 ~21~t-m'
~t-l')] -1

(2.15)

P[E(~t-1 ~1

The evaluation of A_l need not be straightforward and doing so in asset pricing applications is the m a i n preoccupation of Section 4. 7 The relations between the optimal conditional G M M estimator and the M L E are similar to the relations arising in the unconditional case. The conditional population projection of Aa't(_00,q_) on the m o m e n t conditions ~(00) reveals that:

7 The implementation of this efficient estimator is straightforward given the ability to calculate the relevant conditional expectations. Under weak regularity conditions, the estimator can be implemented in two steps by first obtaining an initial consistent estimate (perhaps using the unconditional GMM estimator (2.5)), estimating the optimal weighting matrix At-i using this preliminary estimate, and then solving (2.14) for the efficient conditional GMM estimator. Of course, equations (2.11) and (2.14) can be iterated until convergence, although the iterative and two step estimators are asymptotically equivalent to first order.

Semiparametric methods for asset pricing models

67

-~v't(_00,_q) = Cov[5~(_00, q), 9~(O_o)'[It-1]Var[9_t(O_o)[It_l]-1 _gt(_00)+ _v~ct = -Or-1 ~--al_9t(_00) + v~ect (2.16)

l since E[~CPt(_00, q_)_gt(_00)/ + O~(Oo)'/O0_llt_l ] is zero given sufficient regularity to interchange the order of differentiation and integration of the conditional moment condition El9 (Oo)llt_l ] = 0. Hence, the optimal linear combination of moment conditions~t-i has the largest conditional correlation with the true, but unknown, conditional score in finite samples. While this observation does not translate into clear finite sample efficiency statements, the G M M estimator based on At_1 is that which is most highly correlated with the M L E asymptotically. It is easy to characterize the relative efficiency of the optimal conditional and unconditional G M M estimators. As is usual, the variance of the difference between the optimal unconditional and conditional G M M estimators is the difference in their variances since the latter is efficient relative to the former. The difference in the optimal weights given to the martingale increments _gt(_00) is:

A , - A(00) = [a~,_l - G~(0)] W_2, + Gr(0)[W-'I - S~(00) ~]


-

(2.17)

e] ';21 +

[W21 -

Note that the law of iterated expectations applies to both ~t-1 and ~Pt-1 separately but not to the composite At_, so that E[A_l - A(0~)] does not generally converge to zero. In any event, the relative efficiency of the conditional estimator is higher when there is considerable time variation in both ~t-1 and ~Pt-l. Finally, the conventional application of the G M M procedure lies somewhere between the conditional and unconditional cases. It involves the observation that zero conditional mean random variables are also uncorrelated with the elements of the information set. Let Zt-1 E It-1 denote an r x q(r > p) matrix of predetermined variables and consider the revised moment conditions E[Zt_l_0t(0_0)]/t_l ] ~-~-E[/t_lg,(_00) ] = 0 V Zt-1 C It-1 .
(2.18)

In the unconditional G M M procedure discussed above, Zt-1 is Iq, the q q identity matrix. In many applications, the same predetermined variables gt-1 multiply each element of _gt(_00) so that Z t _ 1 takes the form Iq zt_ 1. Finally, different subsets of the information available to the econometrician z_it_1 E It-1 can be applied to each element of ~(0~) so that Zt_l is given by

fZll :!j/
_0
~x

Zt-1 =

-z2t-1 "'"
0 --" 1

(2.19)

. . . . . . . .

While optimal conditional G M M can be applied in this case, the main point of this procedure is to modify unconditional GMM. As before, the unconditional population projection of ~ ' t ( ~ ) on the moment conditions Zt_l~(O~) yields

68

B. N. Lehmann
't(_00,q) = Cov[L,e't(_00,t/), gt(O_o)'Z[_,]Var[Zt_19t(O_o)]-lzt_lgt(O_o)

+ v_~,,zt

= -Cbz~kzlZt_lg_t(O_o) + v_.,~uZt
z= E ~ -

fog,(Oo)', , ]
Z;_,J
I t

(2.20)

~ z =- E{Zt-l#_i(O__oo)~(O_o) Z;_I}
since E{La't(0_0,_q)_gt(0_0)'Z[_a} = - ~ z given sufficient regularity to allow differentiation under the integral sign. The weights q~zTJzlZt_l can also be viewed as a linear approximation to the optimal conditional weights Atl = ~t_lt/J~-_ll. Put differently, At_1 would generally be a nonlinear function of Zt-1 if Zt-i were the relevant conditioning information from the perspective of the econometrician.

3. Asset pricing relations and their econometric implications Modern asset pricing theory follows from the restrictions on security prices that arise when markets do not permit arbitrage opportunities. That the absence of arbitrage implies substantive restrictions is somewhat surprising. Outside of international economics, it is not commonplace for the notion that two eggs should sell for the same price in the absence of transactions costs to yield meaningful economic restrictions on egg prices - after all, two eggs of equal grade and freshness are obviously perfect substitutes. 8 By contrast, the no-arbitrage assumption yields economically meaningful restrictions on asset prices because of the nature of close substitutes in financial markets. Different assets or, more generally, portfolios of assets may be perfect substitutes in terms of their random payoffs but this might not be obvious by inspection since the assets may represent claims on seemingly very different cash flows. The asset pricing implications of the absence of arbitrage have been elucidated in a number of papers including Rubinstein (1976), Ross (1978b), Harrison and Kreps (1979), and Chamberlain and Rothschild (1983), Hansen and Richard (1987). Consider trade in a securities market on two dates: date t - 1 (i.e., today) and date t (i.e., tomorrow). There are N risky assets, indexed by i = 1 , . . - , N , which need not exhaust the asset menu available to investors. The nominal price of asset i today is Pit-l. Its value tomorrow - that is, its price tomorrow plus any cash flow distribution between today and tomorrow - is uncertain from the perspective of today and takes on the random value Pit + Dit tomorrow. Hence, its gross return (that is, one plus its percentage return) is given by Rit = (Pit + Dit)/Pit-1. Finally, the one period riskless asset, if one exists, has the sure gross return Rft = 1 / P f t - i and ! always denotes a suitably conformable vector of ones. 8 This observationwas translatedinto a livelydiatribe by Summers(1985, 1986).

Semiparametric methodsfor assetpricing models

69

The market has two crucial elements: one environmental and one behavioral. First, the market is frictionless: trade takes place with no taxes, transactions costs, or other restrictions such as short sales constraints. 9 Second, investors vigorously exploit any arbitrage opportunities, behavior that is facilitated by the no frictions assumption, that is, investors are delighted to make something for nothing and they can costlessly attempt to do so. In order to illustrate the asset pricing implications of the absence of arbitrage, suppose that a finite number of possible states of nature s = 1, ...,S can occur tomorrow and that the possible security values in these states are Pist q-Dist .10 Clearly, there can be at most min [N, S] portfolios with linearly independent payoffs. Hence, the prices of pure contingent claims - securities that pay one unit of account if state s occurs and zero otherwise - are uniquely determined if N ___S and if there are at least S assets with linearly independent payoffs. If N < S, the prices of such claims are not uniquely determined by arbitrage considerations alone, although they are restricted to lie in an N-dimensional subspace if the asset payoffs are linearly independent. Let I]lst_ 1 denote the price of a pure contingent claim that pays one unit of account if state s occurs tomorrow and zero otherwise. These state prices are all positive so long as each state occurs with positive probability according to the beliefs of all investors. The price of any asset is the sum of the values of its payoffs state by state.ll In particular:
s s

Pit-1 : ~
s=l

I/Ist_l(eist q- Dist) ; eft-1 -~- ~


s=l

@st-1

(3.1)

or, equivalently:
s s

~/st_lRist = | ;
s=l

Rft_, ~
s=l

~/st-1 = | .

(3.2)

Since they are non-negative, scaling state prices so that they sum to one gives them all of the attributes of probabilities. Hence, these risk neutral probabilities:

9 Some frictions can be easily accommodated in the no-arbitrage framework but general frictions present nontrivial complications. For recent work that accommodates proportional transactions costs and short sales constraints, see Hansen, Heaton, and Luttmer (1993), He and Modest (1993), and Luttmer (1993). l0 The restriction to two dates involves little loss o f generality as the abstract states of nature could just as easily index both different dates and states of nature. In addition, most of the results for finite S carry over to the infinite dimensional case, although some technical issues arise in the limit of continuous trading. See Harrison and Kreps (1979) for a discussion. 11 The frictionless market assumption is implicit in this statement. In markets with frictions, the return of a portfolio of contingent claims would not be the weighted average of the returns on the component securities across states but would also depend on the trading costs or taxes incurred in this portfolio.

70

B. N . L e h m a n n

Xs*t-I - -

I[ s t -l 1

IPst- 1
-- RftlPst-1 -P ft- 1

(3.3)

E s = I I]l s t - 1

comprise the risk neutral martingale measure, so called because the price of any asset under these probability beliefs is given by:
S

Pit-1 = Pft-1 E Xs*t-l(Pist+ Dist)


s=l

(3.4)

that is, its expected present value. Risk neutral probabilities are one summary of the implications of the absence of arbitrage; they exist if and only if there is no arbitrage. This formulation of the state pricing problem is extremely convenient for pricing derivative claims. Under the risk neutral martingale measure, the riskless rate is the expected return of any asset or portfolio that does not change the span of the market and for which there is a deterministic mapping between its cash flows and states of nature. However, it is not a convenient formulation for empirical purposes. Actual return data is provided according to the true (objective) probability measure. That is, actual returns are generated under rational expectations. Accordingly, let lr~t_l be the objective probability that state s occurs at time t given some arbitrary set of information available at time t-1 denoted by It-l. The reformulation of the pricing relations (3.1) and (3.2) in terms of state prices per unit probability qst-1 = I]lst-l/gst-I reveals:

Pit-1 = E l Lqst-l(Pists=l +Dist)[lt-1] =-E[Qt(Pit +Dit)llt-1]

(3.5)
Pft-1 = E
q,t-lllt-1 =-

E[Q, IZ,-I]

or, equivalently, in their expected return form:

E
Ls=l

qst_lRistllt_l = E[QtRit[It-1] = 1
(3.6)

qst_lgft_llIt_l

=_Rftg[Qtllt_l] = 1 .

At this level of generality, these conditional moment restrictions are the only implications of the hypothesis that markets are frictionless and that market prices are marked by the absence of arbitrage. Asset pricing theory endows these conditional moment conditions with expirical content through models for the pricing kernel Qt couched in terms of

Semiparametric methods for asset pricing models

71

potential observables. 12 Such models equate the state price per unit probability qst-1, the cost per unit probability of receiving one unit of account in state s, with some corresponding measure of the marginal benefit of receiving one unit of account in state s. 13 Most equilibrium models equate Qt, adjusted for inflation, with the intertemporal marginal rate of substitution of a hypothetical, representative optimizing investor. 14 The most common formulation is additively separable, constant relative risk aversion preferences for which Qt = p ( c t / c t - 1 ) - ~ where p is the rate of time preference, Ct/Ct_ 1 is the rate of consumption growth, and ~ is the coefficient of relative risk aversion, all for the representative agent. 15 Accordingly, let x_ t denote the relevant observables that characterize these marginal benefits in some asset pricing model. Hence, pricing kernel models take the general form:
a t = Q(xt,

O_Q) ;

Ot > 0 ;

x t E It

(3.7)

where _0Q is a vector of unknown parameters. To be sure, the parametric component can be further weakened in settings where it is possible to estimate the function Q(o) nonparametrically given only observations on R__ t and x__ t, However, the bulk of the literature involves models in the form (3.7). 16 Equations (3.5) through (3.7) are what make asset pricing theory inherently semiparametric. 17 The parametric component of these asset pricing relations is a
12 It is also possible to identify the pricing kernel nonparametrically with the returns of particular portfolios. For example, the return of growth optimal portfolio which solves max E{ln~dat_lRt[It_x; wot_l E/t-I) is equal to Q[l. Of course, it is hard to solve this maximum problem without parametric distributional assumptions. See Bansal and Lehmann (1955) for an application to the term structure of interest rates. The addition of observables can serve to identify payoff relevant states, giving nonparametric estimation a somewhat semiparametric flavor. Put differently, the econometrician typically observes a sequence of returns without information on which states have been realized; the vector x_. t provides is an indicator of the payoff relevant state of nature realized at time t that helps identify similar outcomes (i.e., states with similar state prices per unit probability). Bansal and Viswanathan (1993) estimate a model along these lines. 13 The marginal benefit side of this equation rationalizes the peculiar dating convention for Qt when it is equal to the time t-1 state price per unit probability. 14 Embedding inflation in Qt eliminates the need for separate notation for real and nominal pricing kernels. That is, Qt is equal to Q~tealect/Pct_l where Pet is an appropriate index for translating real cash flows and the real pricing kernel Q~eal into nominal cash flows and kernels. 15 More general models allow for multiple goods and nonseparability of preferences in consumption over time and states as would arise from durability in consumption goods and from preferences marked by habit formation and non-expected utility maximization. Constantinides and Ferson (1991) summarize much of the durability and habit formation literatures, both theoretically and empirically. See Epstein and Zin (1991a) and Epstein and Zin (1991b) for similar models for Qt which do not impose state separability. Cochrane (1991) exploits the corresponding marginal conditions for producers. 16 Exceptions include Bansal and Viswanathan (1993) and the linear model Qt = ~_~_lX~with ~--t-1 unobserved, a model discussed in the next section. 17 To be sure, the econometrician could specify a complete parametric probability model for asset returns and such models figure prominently in asset pricing theory. Examples include the Capital Asset Pricing Model (CAPM) when it is based on normally distributed returns and the family of continuous time intertemporal asset pricing models when prices are assumed to follow lt6 processes.

72

B. N, Lehmann

model for the pricing kernel Q(x_t , Oo). The conditional moment conditions (3.6) can then be used to identify any unknown parameters in the model for Qt and to test its overidentifying restrictions without additional distributional assumptions. Note also that the structure of asset pricing theory confers an obvious econometric simplification. The constructed variables Q t R i t - 1 constitute a martingale difference sequence and, hence, are serially uncorrelated. This fact greatly simplifies the calculation of the second moments of sample analogues of (3.6), which in turn simplifies estimation and inference) s Moreover, the economics of these relations constrains how these conditional moment restrictions can be used for estimation and interference. Ross (1978b) observed that portfolios are the only derivative assets that can be priced solely as a function of observables, time, and primary asset values given only the absence of arbitrage opportunities in frictionless markets. The same is true for econometricians - for a given asset menu, the econometrician knows only the prices and payoffs of portfolios with weights w__t_ 1 E It-1. Hence, only linear combinations of the conditional moment conditions based on information available at time t - 1 can be used to estimate the model. Accordingly, in the absence of distributional restrictions, the econometrician must base estimation and inference on estimators of the form:
1 T ^

-f ZAt-I[R_tQ(X__t,O_Q) - ! ] = 0 ;
t=l

At-1 G It-1

(3.8)

where At-i is a sequence of p x N Op(1) matrices chosen by the econometrician and p is the number of elements in _0Q.The matrices At-1 can be interpreted as the weights of p portfolios with random payoffs At_tR_ t that cost A t - l ! units of account. 0 19 How would a financial econometrician choose At_ . An econometrician who favors likelihood methods for their desirable asymptotic properties might prefer the p portfolios with maximal conditional correlation with the true, but unknown, conditional score. In this application, the conditional projection of ~-~tt(O0)~) o n [RtQ(x_t,O_Q) - ! ] is given by:
~,~tt(O0 , q__)= Cov[~f~tt (00, r/), R_tQ(xt, OQ)tllt_l]Var[RtQ(x__t, OQ)lit_l] -1

x [R,Q(xt, O_Q)- 1] + V_~ecQt ,


= --(~t-11/tt21

[RtQ(xt, OQ) - !] + VzcQt ;

(3.9)

q~t-1 = OE[Q(xt, O_o)Rt]I,_, ]'

O0
tJ~/t-1 :

E{[R_tQ(x_r,_0 o) - !] [RtQ(xr, 0 o) - !1'1I,-1}

l~ This observation fails if returns and Qt are sampled more than once per period. For example, consider the two period total return (i.e., with full reinvestment of intermediate cash flows) Rit,t+l = RitRit+l which satisfies the two period moment condition E[QtQt+lRit:+l I It-l] = 1. In this case, the constructed random variable QtQt+lR~t:+1-1 follows a first order moving average process. See Hansen and Hodrick (1980) and Hansen, Heaton, and Ogaki (1988) for more complete discussions.

Semiparametric methods for asset pricing models

73

since E{~'t(O_o,q__)[R_tQ(xt, Oo)-!]'lit_l} = - ~ t - 1 given sufficient regularity to permit differentiation under the integral sign. The p portfolios with payoffs 4~t_l~u~__llRt that cost ~t-1 gt~-111 units of account have no obvious optimality properties from the perspective of prospective investors. However, they are definitely optimal from the perspective of financial econometricians - they are the optimal hedge portfolios for the conditional score of the true, but unknown, log likelihood function. Put differently, the economics and the econometrics coincide here. The econometrician can only observe conditional linear combinations of the conditional moment conditions and seeks portfolios whose payoffs provide information about the parameters of the pricing kernel Q(_~,_0Q). The optimal portfolio weights are ~t_1~u~_11 and the payoffs ~bt_l~u~-jlR__t maximize the information content of each observation, resulting in an incremental contribution of ~t_l~-_ll~t_l, to the information about _0Q. In other words, the Fisher information matrix of the true score is ~t_17~_ll~'t 1 - C and the positive semidefinite matrix C is the smallest such matrix produced by linear combinations of the conditional moment conditions. This development conceals a host of implementation problems associated with the evaluation of conditional expectations. 19 To be sure, ~t-1 and ~t-1 can be estimated with nonparametric methods when they are time invariant functions ~(_Zt_l) and ~(_zt_l) for _z t 1 E It-1. The extension of the methods of Robinson (1987), Newey (1990), Robinson (1991), and Newey (1993) to the present setting, in which RrQ(X_t,_0Q)-! is serially uncorrelated but not independently distributed over time or homoskedastic, appears to be straightforward. However, the circumstances in which A_l is a time invariant function of_zt_1 would appear to be the exception rather than the rule. Accordingly, the econometrician generally must place further restrictions on the no-arbitrage pricing model in order to proceed with efficient estimation based on conditional moment restrictions, a subject that occupies the next section. Alternatively, the econometrician can work with weaker moment conditions like the unconditional moment restrictions. The analysis of this case parallels that of optimal conditional GMM. Once again, the fixed weight matrices At(_00) from (2.10) are the weights of p portfolios with random payoffs AT(Oo)R t that cost Ar(_00)Z units of account. As noted in the previous section, the price of these random payoffs is ~P-l_t which generally differs from E(At_l)!. These portfolios produce the fixed weight moment condition that has maximum unconditional correlation with the derivatives of the true, but unknown, log likelihood function.

19 The nature of the information set itself is less of an issue. While investors might possess more information than econometricians, this is not a problem because the law of iterated expectations implies that E[Ri, Qt[/7_I]= 1 VI~ic__It_l. o f course, the conditional probabilities nff_1 implicit in this m o m e n t condition generally differ from those implicit in E[RuQt]lt-1] = 1 as will the associated values of the pricing kernel ~ (i.e., qff-1 = ud,t-~/r~t-~)i The dependence of Q//~ o n nsff_1 is broken in models for Qt that equate the state price per unit probability qst-I with the marginal benefit of receiving one unit of account in state s.

74

B. N. Lehmann

Of course, conventional GMM implementations use conditioning information within the optimal unconditional GMM procedure as discussed in the previous section. Let Zt_l E It-i denote an r x N matrix of predetermined variables and consider the revised moment conditions: E[Zt-1 (RtQ(x_t, O_Q)-~_)l/~_d

= E[Zt_I(R_tQ(x__t,O_Q)-L)]=O V Zt-1 EIt-1.

(3.10)

In the preceding paragraph, Zt-1 is 1N, the N x N identity matrix; otherwise, it could reflect identical or different elements of the information set available to investors (i.e., z~_1 in IN z_t_l and z_it_1 in (2.19), respectively) being applied to each element of R_.tQ(x_t,OQ)-t_ as given in the previous section. The introduction of z_,-t_ 1 and zt_ 1 into the unconditional moment condition (3.10) is often described as invoking trading strategies in estimation and inference following Hansen and Jagannathan (1991) and Hansen and Jagannathan (1994). This characterization arises because security returns are given different weights temporally and, when z_it_t zt_ l, cross-sectionally after the fashion of an active investor. In unconditional GMM, the returns weighted in this fashion are then aggregated into p portfolios with weights that are refined as information is added to (3.10) in the form of additional components of Zt-l. Once again, there is an optimal fixed weight portfolio strategy for the revised moment conditions based on Zt_l (R__tO(x_t, OQ)-!).-From (2.20), the active portfolio strategy with portfolio weights ~Z~PzlZt_l has random payoffs bzgSzlZt_lRt and costs ~zgtzlZt_l! units of account. The resulting moment conditions have the largest unconditional correlation with the true, but unknown, unconditional score in finite samples within the class of time varying portfolios with weights that are fixed linear combinations of predetermined variables Zt-1. Of course, optimal conditional weights can be obtained from the appropriate reformulation of (3.9) above but the whole point of this approach is that the implementation of this linear approximation to the optimal procedure is straightforward.

4. Efficiency gains within alternative beta pricing formulations


The moment condition E[Q(x~, OQ)Ritllt_l] = 1 is often translated into the form of a beta pricing model, so named for its resemblance to the expected return relation that arises in the Capital Asset Pricing Model (CAPM). Beta pricing models serve another purpose in the present setting; they highlight specific dimensions in which fruitful constraints on the pricing kernel model can be added to facilitate more efficient estimation and inference. Put differently, beta pricing models point to assumptions that permit consistent estimation of the components of At_l. Accordingly, consider the population projection of the vector of risky asset returns R _ t _ on O(x_t,0_Q):

Semiparametric methods for asset pricing models Rt=~_t+fltQ(x_t,O_Q)+e_t fl-t = Var[Q(x_t, O_Q)]It_l ]
;
E[_qlI,-1] = 0

75

Cov[Rt, Q(x__,09. ) lit_ 1]

(4.1)

and Var[e] and Cov[.] denote the variance and covariance of their arguments, respectively. Asset pricing theory restricts the intercept vector ~ in this projection which are determined by substituting (4.1) into the m o m e n t condition (3.6):

t = E[~Q(xt, O_Q)II,_I ] =~_tE[Q(x_t,O_Q)llt_l] +B_E[Q(x_t,O_O)2II,_I]


which, after rearranging terms and insertion into (4.1), yields: R_, = ,20, +~[Q(~,_09-) - 29-,] + ~ 2o, = E[Q(x_t, O9-)llt-l] -1 ; ; E[ctII,_~ ] = 0 ;

(4.2)

(4.3)

29.t = )~otE[O(x_t,OQ)21It-1]

The riskless asset, if one exists, earns )~0t; otherwise, 20t is the expected return of all assets with returns uncorrelated with Qt. As noted earlier, the lack o f serial correlation in the residual vector -~t is econometrically convenient. The bilinear f o r m of (4.3) is a distinguishing characteristic of these beta pricing models. Put differently, the m o m e n t conditions (3.6) constrain expected returns to be linear in the covariances of returns with the pricing kernel. This linear structure is a central feature of all models based on the absence of arbitrage in frictionless markets; that is, the portfolio with returns that are maximally correlated with Qt is conditionally mean-variance efficient, z Hence, these asset pricing relations differ f r o m semiparametric multivariate regression models in their restrictions on risk p r e m i u m s like 2Qt and ).0t .21 The multivariate representation o f these no-arbitrage models produces a s o m e w h a t different, though arithmetically equivalent, description of efficient G M M estimation. The estimator is based on the m o m e n t conditions:
I T

~ZA#t_l~t
t=l

= 0 ;

~ = R t - Z_2ot - fl_t[Q(x_t,O_Q) - 2Ot]

(4.4)

and, after solving in terms of the expressions for 20t and )~Qt (in particular, that E[Q(xt, OQ) - 2Qt[It-1] = -2otVar[Q(x t, OQ)[/t-l]) and given sufficient regularity to allow differentiation under the integral sign, the optimal choice of A~t_l is: 20 A portfolio is (conditionally) mean-variance efficient if it minimizes (conditional) variance for given level of (conditional) mean return. A portfolio is (conditionally) mean-variance efficient for a given set of assets if only if the (conditional) expected returns of all assets in the set are linear in their (conditional) convariances with the portfolio. See Merton (1972), Roll (1977), and Hansen and Richard (1987). 21 They differ in at least one other respect - most regression specifications with serially uncorrelated errors have E[~_tlQt ] = 0_, which need not satisfied by (4.3).

76

B. N. Lehmann
A ~ t - 1 = ~/3t-I I//~tl 1 ;

tIl~t_l = ~[Xt-1 -~ flt-1

-fit-fit'Var[Q(xt, O_Q)I/t_l]: E[~t~_tt lit_l]

E {~_Q~t'lit-1 ,}
= 20t

Var[Q(~,o_O_o)lI,-1]O--~-'O-~

O0_o

-fit'

(4.5)

O)~ot (t-- Var[Q(xt, O_Q)llt_l]flt)'

= 2or

OCov[Q(xt, O_Q), R_t]lt_~]' 02or(t 0_0 0_00

Cov[Q(xt, O_Q), R tlI,_l])'

The last line in the expression for ~t~t-1 illustrates the relations with (3.9) in the previous section. Note that the observation of the riskless rate eliminates the term Q.22 involving 0 2ot/ OO There is no generic advantage to casting no-arbitrage models in this beta pricing form unless the econometrician is willing to make additional assumptions about the stochastic processes followed by returns. 23 As is readily apparent, there are only three places where useful restrictions can be placed on beta pricing models: (1) constraints on the behavior of the conditional betas, (2) additional restrictions on the model Q(xt, O_Q),and (3) on the regression residuals. We discuss each of these in turn in the Sections 4.1-4.3 and these ingredients are combined in Section 4.4.

4.1. Conditional beta models


The benefits of a model for conditional betas are obvious. Conditional beta models facilitate the estimation of the pricing kernel model Q(xt, O_Q)by sharpening the general moment restrictions (3.6) with a model for the covariances embedded in them (i.e., E[Q(xt, O__Q)Ritllt-1] = Cov[Q(x_t,O_Q),Ritllt]q-,~ot 1E[Rit]/,_l]). They also mitigate some of the problems associated with efficient of asset pricing relations. Put differently, the econometrician is explicitly modeling some of the components of ~ t - 1 in this case.

22 In the case of risk neutral pricing, ~ t - t collapses to -(020,/0_0)! since Var[Q(x_t, _0Q)lit_l] is zero and to zero if, in addition, the econometrician measures the riskless rate. 23 The law of iterated expectations does not apply to the second moments in these multivariate regression models so that this representation alone does nothing to sharpen unconditional G M M estimation. Additional covariances are introduced in the passage from conditional to unconditional moments because of the bilinear form of beta pricing models. The unconditional moment condition for security i is E[git~t-1 lit-l] = E[gitz_it_l] 0 '7' Z~t_1 6 /t-1 and the sum of the two offending covarianees Cov(flit(E[O(x,,O-)-2Qt)llt-1], g-it-l} q- Cov{flit, (E[O(x,, 0) ,~Qt}E[z_it_l] cannot be separated without further restrictions.
= -

Semipararnetric methods for asset pricing models

77

Accordingly, suppose the econometrician observes a set of variables _2t_1 E It-l, perhaps also contained in x~ (i.e., z t_ 1 c x_a), and specifies a model of the form:
--fit ~-----fl-(-gt-l'--Ofl ) ; Z-t-1 E

It-1

(4.6)

where 0# is the vector of unknown parameters in the model for fit" In these circumstances, the beta pricing model becomes: Rt = z20t + _fl(_zt_1, _0#)[Q(x,, _0Q) - 2Qt] +-~t (4.7)

In the most common form of this model, the conditional betas are constant, the z t_ a is simply the scalar 1, and 0~ is the corresponding vector of constant conditional betas ft. All serial correlation in returns is mediated through the risk premiums given constant conditional betas. 24 Models for conditional betas make efficient G M M estimation more feasible by refining the optimal weighting matrices since: ~ , - 1 = g~-~_O lit-1 = Rot Var[Q(xt, OQ)llt_l] O-~(-zt-i'o00~)' /
/

-~

OVar[Q(x_t, _OQ)I/t-l]

a__o

X __fl(Zt_l, _Off)'

OJ't (l_ - Var[Q(xt, _ OQ)llt_l]fl__(z_t_l,0p))

(4.8)

where, as before, an observed riskless rate eliminates the last line of (4.8). Since the parameter vector _0is (_0Q r__0p r), ~zt-i and tT~flt_ 1 in (4.5) differ in two respects:

( oe_t' - oet3t' lIt-1 I E~ -O~Q= \

) ( OCov[Q(x-t'O-O-Q )'Rtllt-i]t
ao

OVar[Q(x-t, Qo)llt-1] ....

"~'
(4.9)

E ( Oe_~t' ~ Ofl(zt_l, 0_~)' ~ ff~_~ Ilt-I j = 2tVar[Q(x-t' -)]lt-1] - 0o0_

24 Linear models o f the form flit = O-i~rSi#z~-iare also common where Si# is a selection matrix that picks the elements ofz~_ l relevant for flit- Linear models for conditional betas naturally arise when the APT holds both conditionally and unconditionally (cf., Lehmann (1992)). Some commercial risk management models allow 0~/~to vary both across securities and over time; see Rosenberg (1974) and Rosenberg and Marathe (1979) for early examples. Error terms can be added to these conditional beta models when their residuals are orthogonal to the instruments _zt_1 c It-1. Nonlinear models can be thought of as specifications o f the relevant components of ~ t - 1 by the econometrician.

78

B. N. Lehmann

A tedious calculation using partitioned matrix inversion verifies that the variance of the efficient G M M estimator of O-Ofalls after the imposition of the conditional beta model, both because of the reduction in dimensionality in the transition from the derivatives of Cov[Q(xt, O_Q),Rtllt_l] to the derivatives of Var[Q(xt, 0Q)l/t_l] in the first line of (4.9) and because of the additional moment conditions arising from the conditional beta model in the second line of (4.9). Hence, the problem of constructing estimates of the covariances between returns and the derivatives of the pricing kernel in (3.9) is replaced by the somewhat simpler problem of estimating the conditional variance of the pricing kernel along with its derivatives in these models. Both formulations require estimation of the conditional mean of Q(xt, O-Q)and its derivatives through 20t, a requirement eliminated by observation of a riskless asset. While stochastic process assumptions are required to compute E[Q(xt, 0Q)litl], Var[Q(x t, O_Q)lit_l], and their derivatives, a conditional beta model and, when possible, measurement of the riskless rate simplifies efficient G M M estimation considerably. 25 Note also that the optimal conditional weighting matrix q%_lT~tl_l has a portfolio interpretation similar to that in the last section. The portfolio interpretation in this case has a long standing tradition in financial econometrics. Ignoring scale factors, the portfolio weightsassociated with the estimation of the premium 2Qt are proportional to _fl_(gt_l,_0~). Similarly, the portfolio weights associated with the estimation of the 20t are proportional to l-fl_(z_t_l, 0p) after scaling Var[Q(xt, 09.)lit_l] to equal one, as is appropriate when the econometrician observes the return of portfolio perfectly correlated with Qt but not a model for Qt itself (a case discussed briefly below). Such procedures have been used assuming returns are independently and identically distributed with constant betas beginning with Douglas (1968) and Lintner (1965) and maturing into a widespread tool in Black, Jensen, and Scholes (1972), Miller and Scholes (1972), and Fama and MacBeth (1973). Shanken (1992) provides a comprehensive and rigorous description of the current state of the art for the independently and identically distributed case. Models for the determinants of conditional betas have another use-they make it possible to identify aspects of the no-arbitrage model without an explicit model for the pricing kernel Qt. Given only fl__(zt_l,O-~), expected returns are given by: E[Rt]It-t] = !20t + fl_(z_,_l,__0a)[2pt- 20t] (4.10)

The potentially estimable conditional risk premiums 20t and )~pt are the expected returns of conditionally mean-variance efficient portfolios since the expected returns on the assets in this menu are linear in their conditional betas. 26 However, 25The presenceof Var[Q(x t, 0~)I/,-d and its derivativesin (4.8) arises because (4.6) is a model for conditional betas, not for conditionalcovariances. In most applications, conditional beta models are more appropriate. 26 The CAPM is the best known model which takes this form, in which portfolio p is the market portfolio of all risky assets. The market portfolio return is maximallycorrelatedwith aggregatewealth (which is proportional to Qt in this model)in the CAPM in general;it is perfectlycorrelatedif markets are complete.

Semiparametric methods for asset pricing models

79

these parameters are also the expected returns of any assets of portfolios that cost. one unit of account and have conditional betas of one and zero, respectively. Portfolios constructed to have given betas are often called mimicking or basis portfolios in the literatures Mimicking portfolios arise in the portfolio interpretation of efficient conditional GMM estimation in this case and delimit what can be learned from conditional beta models alone. Given only the beta model (4.6):
et =l-}cOt -[- fl_(z-t-1, _Off)[}cpt - }cOt] -~- E-flpt ; Itt flpt_ 1 =
(4.11)

Z[~_~pte#pt'llt_l ]
-

~)flpt--I

(}cpt

~Ot) Ofl---(Z-'~ ~'

02ot r
~ ~ t-~ - ~ ( ~ - 1 , 0 ~ ) ]

+ a}cpo
Note that if we treat the risk premiums as unknown parameters in each period, the limiting parameter space is infinite dimensional. Ignoring this obvious problem, the optimal conditional moment restrictions are given by:

~ I (}c,'-}co,) O~(-z'-"/l o0~ ] _, l! J ~lflpt-1 ,=i _~(_~,_~,0~)]'


x

[~ -- l-}cOt- fl(Z-t-1,0fl)(}cpt-

}COt)] = _0

(4.12)

and the solution for each }cot and }cpt -- }cOt is:

#pt --

hot]

hOt J =[(-/--fl(-Zt-l'

O--fl))!tlAfl;t-1 (-/---fl(-Zt-l' Off))]-1


I -I
(4.13)

X (l_tiff_ (Z_t_ 1, O0_fl ) ) ItAflp t _ 1RRt

27 See Grinblatt and Titman (1987), Huberman, Kandel, and Stambaugh (1987), Lehmann (1987),

Lehmann and Modest (1988), Lehmann (1990), and Shanken (1992) for related discussions. In econometric terms, the portfolio weights that implicitly arise in cross-sectional regression models with arbitrary matrices F solve the programming problems: min W_rpt_ WEpt -1
!

l rw_r pt_ 1 subject to WtFpt_l t = 1 and w_tF pt_ l fl(Zt_ l , O_f) = 1


subject to W~Ot_l! : 1 and W~Ot_l_fl(Zt_l,_Ofl) = 0

w mrinwZrot_lWrOt_l

Ordinary least squares corresponds to F : I, F = Diag{Var[R_t[It_l]} to weighted least squares, and


F = Var[R~[It_t] to generalized least squares.

80

B. N. Lehmann

which are, in fact, the actual, not the expected, returns of portfolios that cost one and zero units of account and that have conditional betas of zero and one, respectively Hence, there are three related limitations on what can be measured from risky asset returns given only a conditional beta model. First, the conditional beta model is identified only up to scale: _fl(z_t_l,Ofl)(2pt -- Ot) is observationally equivalent to ~fl_(Zt_l,O__fl)(,~pt- ,~Ot)/~O for any ~o 0. Second, the portfolio returns 20t and "~pt- "~Othave expected returns 20t and Apt- "~Ot,respectively, but the expected returns can only be recovered with an explicit time series model for E[Rt[It_l]. 28 Third, the pricing kernel Qt cannot be recovered from this model - only Rpt, the return of the portfolio of these N risk assets that is maximally correlated with Qt, can be identified f r o m ~pt in the limit (i.e., as _fl(zt_l,_0#)~fl_(z__t_l,_0~)).
^ p

4.2 Multifactor models


Another parametric assumption that facilitates estimation and inference is a linear model for Qt. The typical linear models found in the literature simultaneously strengthen and weaken the assumptions concerning the pricing kernel Clearly, linearity is more restrictive than possible nonlinear functional forms. However, linear models generally involve weakening the assumption that Qt is known up to an unknown parameter vector since the weights are usually treated as unobservable variables. Some equilibrium models restrict Qt to be a linear combination (that is, a portfolio) of the returns of portfolios. In intertemporal asset pricing theory, these portfolios let investors hedge against fluctuations in investment opportunities (cf., Merton (1973) and Breeden (1979)) Related results are available from portfolio separation theory, in which such portfolios are optimal for particular preferences (ef., Cass (1970)) or for particular distributions of returns (cf., Ross (1978a)). Similarly, the Arbitrage Pricing Theory (APT) of Ross (1976) and Ross (1977) combines the no-arbitrage assumption with distributional assumptions describing diversification prospects to produce an approximate linear model for Qt .29 In these circumstances, the pricing kernel Qt (typically without any adjustment for inflation) follows the linear model:

Qt = gtxt-lx-t -[- Ctmt-lR--mt ;

Qt > 0 ; ~xt_l, ogmt__l E It-1

(414)

where xt is a vector of variables that are not asset returns while R_mtis a vector of portfolio returns These models typically place no restrictions on the (unobserved) weights ~xt-1 and ~mt-1 save for the requirement that they are based on information available at time t-1 and that they result in strictly positive values of
28 M o m e n t s of 20t and 2pt - 20t can be estimated. For example, the projection of Jot and ~-pt - jot on z~_1 E It-1 recovers the unconditional projection of 20t and 2pt - 20t on zt_ l c It-1 in large samples, 29 The A P T as developed by Ross (1976) and Ross (1977) places insufficient restrictions on asset prices to identify Qt. In order to obtain the formulation (4.14), sufficient restrictions m u s t be placed on preferences and investment opportunities so that diversifiable risk c o m m a n d s no risk premium.

Semiparametrie methods for asset priciqg models

81

Qt .30 Put differently, a model takes the more general form Q(x_t,_0) when ~___xt-1 and m--~t-1 are parameterized as o)x (z_ t_ l, _0) and m__,n (zt_ 1, _0). Accordingly, consider the linear conditional multifactor model:
R--t = ~t-[- Bx(z-t-l,O-Bx)X-t -I- Brn(Z-t-l,OBm)R--mt q- ~Bt "

(4.15)

The imposition of the moment conditions (2.6) yields the associated restriction on the intercept vector:

~_, = [l - Bm(Zt_l,O__Bm)t_]20t -- gx(zt_l, OBx)2xt ~xt : AOt [E[x_~tt]It-1]~___xt-1 q- E[xtR__'rnt)It-1]~___mt-l]


(4.16)

so that, in principle, oJxt_1 and OOmt_1 can be inverted from the expression for 2xr Finally, insertion of this expected return relation into the multifactor model yields:
R, = l_~O t @ gx (Zt_l, ~Bx ) [X, -- L,] -[- em (z-t- 1, OBm ) [e.-~nt - -L~0t]

+ -~Bt;E[-eBt[It-l] = 0 .

(4.17)

Once again, the residual vector has conditional mean zero because expected returns are spanned by the factor loading matrix B(z~_l, 0B) and a vector of ones. 31 As is readily apparent, this model requires estimates of the conditional mean vector and covariance matrix o f (x_ttRtmt) '. Note t h a t no restrictions are placed on E[Rmt][t_l ] in (4.17). If the econometrician observes the returns R_~t and the variables x_ t with no additional information on Qt, the absence of a model linking R~n t with Qt eliminates the restrictions on E[R_R_.mt[It_l]that arise from the moment condition E[R__mtQt[It_l ] = !. The same observation would hold if the returns of portfolio p were observed in (4.10)-(4.13). Put differently, a linear combination of the returns R_~t or of the r e t u r n Rpt provides a scale-flee proxy for Qt. In the absence of data on or of a model for Qt, asset pricing relations explain relative asset prices and expected returns, not the levels of asset prices and risk premiums. As with the imposition of conditional beta models, linear factor models simplify estimation and inference by weakening the information requirements. Linearity of the pricing kernel confers three modest advantages compared with the conditional beta models of the previous section: (1) the derivatives of the conditional mean and variance of Q(xt, O_Q) are no longer required; (2) the conditional covariance matrices involving x t and R_~t contains no unknown model parameters (in contrast to Var[Q(xt,_00)[It_l]); and (3) the linear model permits c%_ 1 and m_~_mt1 _ to remain unobservable. The third point comes at a cost - the
30 Imposing the positivity constraint in linear models is sometimes quite difficult. 31 Since the multifactor models described above are cast in terms o f Qt, [1 - B", (~_1,0-sm)Z] will not be identically zero. In multifactor models with no explicit link between Qt and the underlying common factors, this remains a possibility. See Huberman, Kandel, and Stambaugh (1987), Huberman and Kandel (1987), and Lehmann and Modest (1988) for a discussion o f this issue.

82

B. N. Lehmann

model places no restrictions on the levels of asset prices and risk premiums. Once again, additional simplifications arise if there is an observed riskless rate. Multifactor models also take the form of prespecified beta models. The analysis of these models parallels that of the single beta case in (4.10)-(4.1 3). A conditional factor loading model B(_Zt_l, 0B) can only be identified up to scale and, at best, the econometrician can estimate the returns of the minimum variance basis portfolios, each with a loading of one on one factor and loadings of zero on the others. In terms of the single beta representation, a portfolio of these optimal basis portfolios with time-varying weights has returns that are maximally correlated with Qt or, equivalently, a linear combination ofB(z_4_l, _0B) is proportional to the conditional betas ~ in this multifactor prespecified beta model. 4.3. Diversifiable residual models and estimation in large cross-sections One other simplifying assumption is often made in these models: that the residual vectors are only weakly correlated cross-sectionally. This restriction is the principal assumption of the APT and it implies that residual risk can be eliminated in large, well-diversified portfolios. It is convenient econometrically for the same reason; the impact of residuals on estimation can be eliminated through diversification in large cross-sections. In terms of efficient estimation of beta pricing models, this assumption facilitates estimation of 7%_1, the remaining component of the efficient G M M weighting matrix. To be sure, efficient estimation could proceed by postulating a model for 7J/~t_l in (4.7) of the form ~(zt_l). However, it is unlikely that an econometrician, particularly one using semiparametric methods, would possess reliable prior information of this form save for the factor models of Section 4.2. Accordingly, consider the addition of a linear factor model to the conditional beta models. Once again, consider the projection: 32 R_t = s t + ~_(zt_ 1, O_#)Q(xt, O_O_Q) + Bx(z_t_,, OBx)~ -t- gm (z-t-l, OBm)Rm, q- (;fiB, and the application of the pricing relation to the intercept vector: ~t ~" [l--- Bm(Z-t-l,O--Bm)l-]);Ot-- fl(2-t-l~Ofl)~Qt-- gx(~-l,OBx)Lt which, after rearranging terms and insertion into (4.19), yields: R_t = !20, + fl_(z_t_l,_0/~)[Q(x,,O_Q)- 2Ot] + Bx(z,_l, OBx) [x~ - 2xt] + Bm(z_,_l, OBm)[Rmt -- _t20t] q- ~-BBt 2Qt = 2otE[Q(x_,, 0O)2j/t-,]; ~t"t~Bt_, = t [~_eBt~_mt'llt_, ]
ax, = ,~o,E[x,O(x,,_0o)II,-1] .

(4.19)

(4.20)

(4.21)

32 Of course, one element of (x/R_,,J) must be dropped if (x_/R_mt')and Q(x_4, 0_0)are linearlydependent.

Semiparametric methodsfor asset pricing models

83

When all of these components are present in the model, assume that a vector of ones does not lie in the column span of either Bx(_Zt_l,__0Bx ) or Bm(z_t_l, OBm). This formulation nests all of the models in the preceding subsections. When Bx(z~_l,0_Bx ) and Bm(Zt_l,O_Bm) a r e identically zero, equations (4.21) yield the conditional beta model (4.7) or, in the absence of the pricing kernel model Q(x~,0O), the prespecified beta model (4.11). Similarly, when __fl(_Zt_l,_00/~ ) is identically zero, equations (4.21) yield the observable linear factor model (4.17) or, without observations on xt and R_R~t, the multifactor analogue of the prespecified beta model. When all components are included simultaneously, the conditional factor model places structure on the conditional covariance matrix of the residuals ~/~t_lin the conditional beta model (4.7). This factor model represents more than mere elegant variation - it makes it plausible to place a a priori restrictions on the conditional variance matrix ~#Bt-1. In terms of the conditional beta model (4.7), the residual covariance matrix 7%_1 has an observable factor structure in this model given by: 33

tr2tflt-l~(Bx(z-t-l,OBx)Bm(z-t-t~O-Bm))Var[(R~t)'lt-1 ] (Bx(z-t-l'OBx)') 4- ~[IBt-I ~kBm(z_,_l, OBm), BIJBt-1 V~Bt-IB3Bt-I' + ~#Bt-I


and its inverse is given by:

(4.22)

-1 1 -- trglflBt_lBflBt_ -1 7s~tl l z tllflBt_ l (VflBt-1 4- BflBt_l Itlfl~t_l B t_ l ]


' -1 BflBt-1 ~flBt-1 " (4.23)

Hence, the factor model provides the final input necessary for the efficient estimation of beta pricing models. Chamberlain and Rothschild (1983) provide a convenient characterization of diversifiability restrictions for residuals like _e/~Bt.They assume that the largest eigenvalue of the conditional residual covariance matrix 7~Bt_l remains bounded as the number of assets grows without bound. This condition is sufficient for a weak law of large numbers to apply because the residual variance of a portfolio with weights of order 1IN (i.e., one for which ~_lwt_l ~ 0 as

N---+ oo V wt_ 1 C It-1)

converges

to zero since

ffwt-2 1 = ~t---lttlflBt-lWt-1 --<W---tt-lW---t-1

~max(~pBt-1)---' 0 as N ~ oo where Cmax(*) is the largest eigenvalue of its argument.

33 Unobservable factor models can be imposed as well as long as the associated conditional betas are constant. The methods developed for the iid case in Chamberlain and Rothschild (1983), Connor and Korajczyk (1988) and L e h m a n n and Modest (1988) apply since the residuals in this application are serially uncorrelated. L e h m a n n (1992) discusses the serially correlated case.

84

B. N. Lehmann

Unfortunately, there is no obvious way to estimate ItlflBt_ 1 subject to this boundedness condition. 34 Hence, the imposition of diversification constraints in practice generally involves the stronger assumption of a strict factor structure: that is, that ~'#Bt-I is diagonal. Of course, there is no guarantee that a diagonal specification leads to an estimator of higher efficiency than an identity matrix (that is, ordinary least squares) when generalized least squares is appropriate, as would be the case if ~Bt-1 is unrestricted save for the diversifiability condition lim ~max(ttlflBt_l) < 00. While weighted least squares may in fact be superior in most applications, conservative inference can be conducted assuming that this specification is false. In any event, the econometrician can allow for a generous amount of dependence in the idiosyncratic variances in the diagonal specification. What is the large cross-section behavior of G M M estimators assuming that a weak law applies to the residuals? To facilitate large N analysis, append the subscript N to the residuals ~-flBNt and to the associated parameter vectors and matrices flN(Zt_l, O~N) ,BxN(Z_t_l, O_BxN),BmN(Zt_I, OBmN), and I~BNt_ 1 and take all limits as N grows without bound by adding elements to vectors and rows to matrices as securities are added to the asset menu. An arbitrary conditional G M M estimator can be calculated from: T

1 ZApBNt_I~_3BNt N T t=t

~_I~BN, = R_t - l_~o, - ~ (z_,_ 1, O--,SU)[Q(x-t,O-Q) - 2Qt] - BxN(Zt-l, OBxN)[X_t -- 2xt]

-- BmN(g_t_l, OBrnN)[R_ant- t,~0t] .


t

(4.24)

where A~BNt-I is a sequence of p Nop(1) matrices chosen by the econometrician having full row rank for which Cmin(A3BNt_IA3BNt_I) ~ (X~ as N ~ (x) where ~min(e) is the smallest eigenvalue of its argument. This latter condition ensures that the weights are diversified across securities and not concentrated on only a few assets. Examination of the estimating equations (4.24) reveals the benefits of large cross-sections when residuals are diversifiable. The sample and population residuals are related by:

~-flBN, = ~BNt-~-/(~0,- ~0t) ~- {~(Z__t-l,0/~)[Q(xt, 0o) - 2Qt]


- ~(zt_l, _0/~N)[Q(x__t,O_Q) - 2Qt]}

Jr- [OxN(Z,_l, O_BxU)-- BxU(Z_,_l, O_BxN)]X_t q- [BxN(Z_t_l, O_BxN)~_xt-- BxN(Zt_I, OBxN)L,] -~ [BmN(Zt_l, O--BraN)-- BmN(Z__t_l, OBmN)]Rmt Jr- {BraN (ZC_I, ~BmN)l_~o, -- BmN(Z,_I, OBmN)l.~O,}
(4.25)

the first component of which is the population residual vector I?,flBUt and the remaining components of which represent the difference between the population and 34 Recently,Ledoit (1994)has proposedestimatingcovariancematricesusing shrinkageestimators of the eigenvalues,an approach that might work here.

Semiparametric methods for asset pricing models

85

fitted part o f the model. Clearly, ~_BBN t c a n be eliminated t h r o u g h diversification and, hence, the application of ABsNt_ 1 to ~-BBNt will do so since it places implicit weights o f order 1IN on each asset as the n u m b e r o f assets grows without bound. However, the benefits o f diversification have a limit because o f the difference between the population and fitted part o f the model. F o r example, the sampling errors in Q(x_t , O__Q), "~Ot, ~xt, BxN(Z_t_I,OBxN) and BmN(Z_t_l, O BmN) generally c a n n o t be diversified away in a single cross-section. To be sure, some c o m p o n e n t s o f ~-BBNt are amenable to diversification in some models. F o r example, if fl(Zt_l,0_B ) is identically zero (i.e., if the pricing kernel at is given by CO'xt_lx__ t + OJmt_-lRmt) and, if the models for both BxN(Zt_I,OBxN) and BmN(Z_t_I,OBmN) ^ are linear, the sam) --BmU(Zt_a, Osmu) pling errors BxN(Zt_l, OBxN) -- BxN(Zt_l ' O--BxU) and B,nN(Zt_l, O_BmN can, in principle, be eliminated t h r o u g h diversification. In this case, the only risk p r e m i u m that can be consistently estimated f r o m a single cross-section is 20t since the difference 2-xt - 2-xt can only be eliminated in large time series samples. 35

4.4. Feasible (nearly efficient) conditional G M M estimation of beta pricing models


With these preliminaries in mind, we n o w consider efficient conditional G M M estimation o f the composite conditional beta model (4.21). In this model, the optimal choice o f ABBt_ 0 1 is q~BBt-1 -1 where these matrices are given by:

~ flBt-1 --

02or O_B ) - Bx(zt_l ' OBx) O0 { -l - Var[Q(xt, - -OQ)]It_1]3(z_t_l, -X C o v [xt , Q (xt, OQ)[It_ 1]

-- gm

(z-t-l,

OBm)I-} t ~- E[Rmt -

_z20tlI,-l]' OBm(z_t_l, O~m)'


O0

+ 20,{Var[Q(xt, O_Q)1I,_1]

Ofl(Z_t--1, O0

oB)'
OCov[xt, Q(x~, O_o)lit-l]' O0 (4.26)

+ OVar[Q(xt, OQ)]I,-1]
00 fl(-gt-1 '-0B)t -}

Bx(z-t-l, 0-Bx)t + Cv[x-t, Q(x-t, O--Q)llt-1]'OBx(Z-to--~lO'O-Bx)' }

35This point has resulted in much confusion in the beta pricing literature. The literature abounds with inferences drawn from cross-sectional regressions of returns on the betas of individual assets computed with respect to particular portfolios. If the betas in these prespecified beta models are computed with respect to an efficient portfolio, the best one can do in a single cross-section (with a priori knowledgeof the populationbetas and return covariance matrix) is to recover the returns of the efficient portfolio. Information on the risk premium of portfolios like p in Section 4.1 can only be recovered over time while the return of portfolio 0 converges to the riskless rate in a single crosssection if the residuals of the prespecified beta model are diversifiable given the population value of ~pt-1. Shanken (1992) shows that this is the case using the sample analogue of ~pt-1 in a model with constant conditional betas and independently and identically distributed idiosyncratic disturbances given appropriate corrections for biases induced by sampling error. See also Lehmann (1988) and Lehmann (1990).

86

B. N. Lehmann

I//~tl i = ~3Bt-1 -- ~#Bt_IB3Bt-I \V[3Bt-1 + B#Bt-1 r3Bt_lDt--1 )


t --I B3Bt-1 ~3Bt-I "

In the original formulation (3.6)-(3.9), efficient estimation required ~t-l, the derivatives of the conditional expectation of Q(x_t, OQ)Rt, and ~/t-1, the conditional covariance matrix of R__tQ(xt,OQ)- t_. Equations (4.21) reflect the kinds of assumptions that the econometrician can make to facilitate efficient estimation. The conditional beta model eases the evaluation of the beta pricing version of #t-1 and the factor model assumption places structure on the associated analogue of ~t-1. Consistent estimation of A~Bt 1 requires the evaluation of a number of conditional moments - 20t, E[Rmt-~I-I], Var[Q(x_t, O_Q)lit-l], and Cov[xt, Q(xt,_0a)] /t-l] and their derivatives, when necessary, along with B~Bt-l, VBBt-1, and kU/~Bt_I. The most common strategy by far is simply to assume that the relevant conditional moments are time invariant functions of available informations. This strategy was taken throughout this section in the models for conditional betas and conditional factor loadings. For the evaluation of A~Bt_I, this approach requires the econometrician to posit relations of the form:
2 o ( z t _ l , o) =

E [Q(xt, _0O)Iz_t_L] -1

, Q(xt , O_O)Iz_t_l]) ~m(__.t_l,_00)= E[Rmt Iz_t_l] = -20(zt_l, 0)(l - Cov[R__mt


o)=
=

Var[Q(x_t, OQ)lz_t_l] Cov[x_t, Q(x_t, O_Q)Iz_t_l]

(4.27)

= Var [ (R_-X:t) Iz-t-,]

War[_~aB,lz_t_l] which permit the consistent estimation ofAgBt_ 1 using initial consistent estimates of __0. It is far from obvious that a financial econometrician can be expected to have reliable to prior information in this form. In most asset pricing applications, the possession of such information about the conditional second moments a S (zt_l, 0) and ~rax(z_t_l,O_ ) is somewhat more plausible than the existence of the corresponding conditional first moment specifications 20(Zt_l,_0) and 2re(zt_1,_0_0) in its conditional mean from. However, observation of the riskless rate eliminates the need to model )~0(_Zt_l,0) and models for Cov[Rmt , Q(x~, %)[Z_t_l] seem no more demanding than those for other conditional second moments. The conditional covariance matrix V~B(Zt_l,0) is somewhat less problematic as well, although the specification of multivariate conditional covariance models is in its infancy. The discussion in Section 4.3 left some ambiguity concerning the availability of plausible models of this sort for ~/~st-I due to the inability to impose the general bounded eigenvalue condition. As noted there, the specification of idiosyncratic

Semiparametric methods for asset pricing models

87

variances is comparatively straightforward if ~l~Bt_ 1 is diagonal. Finally, conservative inference is always available through the use of the asymptotic covariance matrix in (2.13). Equations (4.27) can either represent parametric models for these conditional moments or functions that are estimable by semiparametric or nonparametric methods. Robinson (1987), Newwy (1990), Robinson (1991), and Newey (1993) discuss bootstrap, nearest neighbor, and series estimation of functions such as those appearing in (4.27). All of these methods suffer from the curse of dimensionality so their invocation must be justified on a case by case basis. Neural network approximations promising somewhat less impairment from this source might be employed as well. 36

5. Concluding remarks
This paper shows that efficient semiparametric estimation of asset pricing relations is straightforward in principle if not in practice. Efficiency follows from the maximum correlation property of the optimal GMM estimators described in the second section, a property that has analogues in the optimal hedge portfolios that arise in asset pricing theory. The semiparametric nature of asset pricing relations naturally leads to a search for efficiency gains in the context of beta pricing models. The structure of these models suggests that efficient estimation is made feasible by the imposition of conditional beta models and/or multifactor models with residuals that satisfy a law of large numbers in the cross-section, models that exist in various incarnations in the beta pricing literature. Hence, strategies that have proved useful in the iid environment have natural, albeit nonlinear and perhaps nonparametric, analogues in this more general setting, the details of which are worked out in the paper. While it has offered no evidence on the magnitude of possible efficiency gains, the paper has surely pointed to more straightforward interpretation and implementation than has been heretofore attainable. What remains is to extend there results in two dimensions. The analysis sidestepped the development of the most general approximations of the conditional moments that comprise the optimal conditional weighting matrices, the subtleties of which arise from the martingale difference nature of the residuals in no-arbitrage asset pricing models as opposed to the independence assumption frequently made in other applications. The second dimension involves examination of less parametric semiparametric estimators. In the asset pricing arena, this amounts to semiparametric estimation of pricing kernels and state price densities, a more ambitious and perhaps more interesting task.

36 Barron(1993) and Horniket al. (1993) discussthe superiorapproximationproperties of neural networks in the multidimensionalcase.

88

B. N. Lehrnann

References
Bansal, R. and B. N. Lehmann (1995). Bond returns and the prices of state contingent claims. Graduate School of International Relations and Pacific Studies, University of California at San Diego. Bansal, R. and S. Viswanathan (1993). No arbitrage and arbitrage pricing: A new approach. J. Finance 48, pp. 1231-1262. Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 39, pp. 930-945. Black, F., M. C. Jensen and M. Scholes (1972). The capital assest pricing model: Some empirical tests. In: M. C. Jensen, ed., Studies in the Theory of Capital Markets, New York: Praeger. Breeden, D. T. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities. J. Financ. Econom. 7, pp. 265-299. Cass, D. and J. E. Stiglitz (1970). The structure of investor preferences and asset returns and separability in portfolio allocation: A contribution to the pure theory of mutual funds. J. Econom. Theory 2, pp. 122-160. Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment conditions. J. Econometrics 34, pp. 305-334. Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica 60, pp. 567596. Chamberlain, G. and M. Rothschild (1983). Arbitrage and mean-variance analysis on large asset markets. Econometrica 51, pp. 1281-1304. Cochrane, J. (1991). Production-based asset pricing and the link between stock returns and economic fluctuations. J. Finance 146, pp. 207-234. Connor, G. and R. A. Korajczyk (1988). Risk and return in an equilibrium APT: Application of a new test methodology. J. Financ. Econom. 21, pp. 255-289. Constantinides, G. and W. Ferson (1991). Habit persistence and durability in aggregate consumption: Empirical tests. J. Financ. Econom. 29, pp. 199-240. Douglas, G. W. (1968). Risk in the Equity Markets: An Empirical Appraisal of Market Efficiency. Ann Arbor, Michigan: University Microfilms, Inc. Epstein, L. G. and S. E. Zin (1991a). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica 57, pp. 937 969. Epstein, L. G. and S. E. Zin (1991b). Substitution, risk averison, and the temporal behavior of consumption and asset returns: An empirical analysis. J. Politic. Eeonom. 96, pp. 263-286. Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, pp. 60%636. Grinblatt, M. and S. Titman (1987). The relation between mean-variance efficiency and arbitrage pricing. J. Business 60, pp. 97-112. Hall, A. (1993). Some aspects of generalized method of moments estimation. In: G. S. Maddala, C. R. Rao and H. D. Vinod, ed., Handbook of Statistics: Econometrics. Amsterdam, The Netherlands: Elsevier Science Publishers, pp. 393~418. Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, pp. 1029-1054. Hansen, L. P. (1985). A method for calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators. J. Econometrics 30, pp. 203-238. Hansen, L. P., J. Heaton and E. Luttmer (1995). Econometric evaluation of assest pricing models. Rev. Financ. Stud. g pp. 237-274. Hansen, L. P., J. Heaton and M. Ogaki (1988). Efficiency bounds implied by multi-period conditional moment conditions. J. Amer. Stat. Assoc. 83, pp. 863-871. Hansen, L. P. and R. J. Hodrick (1980). 
Forward exchange rates as optimal predictors of future spot rates: An econometric analysis. J. Politic. Econom. 88, pp. 829-853. Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic Economies. J. Politic. Econom. 99, pp. 225-262.

Semiparametric methods for asset pricing models

89

Hansen, L. P. and R. Jagannathan (1994). Assessing specification errors in stochastic discount factor models. Research Department, Federal Reserve Bank of Minneapolis, Staff Report 167. Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55, pp. 587-613. Hansen, L. P. and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, pp. 1269-1286. Harrison, M. J. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. J. Econom. Theory 20, pp. 381-408. He, H. and D. Modest (1995). Market frictions and consumption-based asset pricing. J. Politic. Econom. 103, pp. 94-117. Hornik, K., M. Stinchcombe, H. White and P. Auer (1993). Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Computation 6, pp. 1262-1275. Hubennan, G. and S. Kandel (1987). Mean-variance spanning. J. Finance 42, pp. 873-888. Huberman, G., S. Kandel and R. F. Stambaugh (1987). Mimicking portfolios and exact asset pricing. J. Finance 42, pp. 1-9. Ledoit, O. (1994). Portfolio selection: Improved covariance matrix estimation. Sloan School of Management, Massachusetts Institute of Technology, Lehmann, B. N. (1987). Orthogonal portfolios and alternative mean-variance efficiency tests. J. Finance 42, pp. 601-619. Lehmann, B. N. (1988). Mean-variance efficiency tests in large cross-sections. Graduate School of International Relations and Pacific Studies, University of California at San Diego. Lehmann, B. N. (1990). Residual risk revisited. J. Econometrics 45, pp. 71-97. Lehmann, B. N. (1992) Notes of dynamic factor pricing models. Rev. Quant. Finance Account. 2, pp. 69-87. Lehmann, B. N. and David M. Modest (1988), The empirical foundations of the arbitrage pricing theory. J. Financ. Econorn. 21, pp. 213-254. Lintner, J. (1965). Security prices and risk: The theory and a comparative analysis of A.T &T. and leading industrials. Graduate School of Business, Harvard University. Luttmer, E. (1993). Asset pricing in economies with frictions. Department of Finance, Northwestern University. Merton, R. C. (1972). An analytical derivation of the efficient portfolio frontier. J. Financ. Quant. Anal. 7, pp. 1851-1872. Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, pp. 867-887. Miller, M. H. and M. Scholes (1972). Rates of return in relation to risk: A reexamination of some recent findings. In: M.C. Jensen, ed., Studies in the Theory of Capital Markets, New York: Praeger, pp. 79-121. Newey, W. K. (1990). Efficient instrumental variables estimation of nonlinear models. Econometrica 58, pp. 809-837. Newey, W. K. (1993). Efficient estimation of models with conditional moment restrictions. In: G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics: Econometrics. Amsterdam, The Netherlands: Elsevier Science Publishers. Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, pp. 703-708. Ogaki, M. (1993). Generalized method of moments: Econometric applications. In: G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics: Econometrics, Amsterdam, The Netherlands: Elsevier Science Publishers, pp. 455-488. Robinson, P. M. (1987). 
Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form. Econometrica 59, pp. 875-891. Robinson, P. M. (1991). Best nonlinear three-stage least squares estimation of certain econometric models. Econometrica 59, pp. 755-786. Roll, R. W. (1977). A critique of the asset Pricing Theory's Tests - Part I: On past and potential testability of the theory. J. Financ. Econom. 4, pp. 129-176.

90

B. N. Lehmann

Rosenberg, B. (1974). Extra-market components of covariance in security returns. J. Financ. Quant. Anal. 9, pp. 262-274. Rosenberg, B. and V. Marathe (1979). Tests of capital asset pricing hypotheses. Research in Finance: A Research Annual 1, pp. 115-223. Ross, S. A. (1976). The arbitrage theory of capital assest pricing. J. Economic Theory 13, pp. 341-360. Ross, S. A. (1977). Risk, return, and arbitrage. In: I. Friend and J.L. Bicksler, eds., Risk and Return in Finance. Cambridge, Mass.: Ballinger. Ross, S. A. (1978a). Mutual fund separation and financial theory - the separating distributions. J. Econom. Theory 17, pp. 254-286. Ross, S. A. (1978b). A simple approach to the valuation of risky streams. J. Business 51, pp. 1-40 Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell J. Econom. Mgmt. Sci. 7, pp. 407-425. Shanken, J. (1992). On the estimation of beta pricing models. Rev. Financ. Stud. 5, pp. 1-33. Summers, L. H. (1985). On economics and finance. J. Finance 411, pp. 633-636. Summers, L. H. (1986). Does the stock market rationally reflect fundamental values? J. Finance 41, pp. 591-600. Tauchen, G. (1986). Statistical properties of generalized method of moments estimators of structural parameters obtained from financial market data. J. Business Econom. Statist. 4, pp. 397-425.

G.S. Maddala and C.R. Rao, eds., Handbookof Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved,

a'i"

Modeling the term structure*

A. R. Pagan, A. D. Hall and V. Martin

1. Introduction

Models of the term structure of interest rates have assumed increasing importance in recent years in line with the need to value interest rate derivative assets. Economists and econometricians have long held an interest in the subject, as an understanding of the determinants of the the term structure has always been viewed as crucial to an understanding of the impact of monetary policy and its transmission mechanism. Most of the approaches taken to the question in the finance literature have revolved around the search for c o m m o n factors that are thought to underlie the term structure and little has been borrowed from the economic or econometrics literature on the subject. The converse can also be said about the small amount of attention paid in econometric research to the finance literature models. The aim of the present chapter is to look at the connections between the two literatures with the aim of showing that a synthesis of the two may well provide some useful information for both camps. The paper begins with a description of a standard set of data on the term structure. This results in a set of stylized facts pertaining to the nature of the stochastic processes generating yields as well as their spreads. Such a set of facts is useful in forming an opinion of the likelihood of various approaches to term structure modelling being capable of replicating the data. Section 3 outlines the various models used in both the economics and finance literature, and assesses how well these models perform in matching the stylized facts. Section 4 presents a conclusion.

* We are grateful for comments on previous versions of this paper by John Robertson, Peter Phillips and Ken Singleton. All computations were performed with a beta version of MICROFIT 4 and GAUSS 3.2 91

92

A. R. Pagan, A. D. Hall and V. Martin

2. Characteristics of term structure data


2.1. Univariate properties

T h e data set examined involves m o n t h l y observations on 1, 3, 6 and 9 m o n t h and 10 year zero c o u p o n b o n d yields over the period D e c e m b e r 1946 to F e b r u a r y 1991, constructed by McCulloch and K w o n (1993); this is an u p d a t e d version of McCulloch (1989). Table 1 records the autocorrelation characteristics o f the series, with f3j being the jth autocorrelation coefficient, D F the Dickey-Fuller test, A D F ( 1 2 ) the A u g m e n t e d Dickey-Fuller test with 12 lags, rt('c) the yield on zero-coupon bonds with m a t u r i t y o f z m o n t h s and spt (z) is the spread rt (z) - rt (1). It shows that there is strong evidence of a unit root in all interest rate series, Because this would imply the possibility of negative interest rates, finance modellers have generally maintained that either there is no unit root and the series feature m e a n reversion or, in continuous time, that an appropriate model is given by the stochastic differential equation
drt = ~dt + trrtdtlt ,

where, t h r o u g h o u t the paper, dtlt is a Wiener process. Because of the "levels effect" o f rt u p o n the volatility o f interest rate changes, we can think of this as an equation in d log rt with constant volatility, and the logarithmic t r a n s f o r m a t i o n ensures that rt remains positive. 1 In any case, the i m p o r t a n t point to be m a d e here is that interest rates seem to behave as integrated processes, certainly over the samples o f data we possess. It m a y be that the autoregressive root is close to unity, rather than identical to it, but such " n e a r integrated" processes are best handled with the integrated process technology rather than that for stationary processes.

Table 1 Autocorrelation features, yields, full sample


DF ADF(12) Pl

P2

P6

P12

Pl(At)
.02 .11 .15 .15 .07

r(1) -2.41 r(3) -2.15 r(6) -2.12 r(9) -2.12 r(120) -1.41 sp(3) -15.32 sp(6) -11.67 sp(9) -10.38 sp(120) -5.60

-2.02 -1.89 -1.91 -1.89 -1.53 -3.37 -4.21 -4.38 -4.15

.98 .98 .99 .99 .99 .38 .59 .66 .89

.33 .51 .55 .80

.21 .26 .27 .55

.38 .30 .26 .32

The 5% critical value for the DF and ADF tests is -2.87. It is known that, ifrt is replaced by rt ~, the restriction y > .5 ensures a positive interest rate while, if y = .5, a < 2a is needed.

Modeling the term structure

93

Instead of the yields one might examine the time series characteristics of the forward rates. The forward rate Fkt(z) contracted at time t for a z period bond to be bought at t + k is F~(v) = [1] [(z + k)rt(z + k) - krt(k)]. For a forward contract one period ahead this becomes FI(T) = Ft(z) = ~[(~ + 1)r_,(z + 1) - r~(1)]. For reasons that become apparent later it is also of interest to examine the properties of the forward "spreads" Fpt(z, - 1) = Ft(v - 1) - F t - 1 (~). These results are to be found in Table 2. Generally, the conclusions would be the same as for yields, except that the persistence in forward rate spreads is not as marked, particularly as the maturity gets longer. As Table 1 also shows, there is a lot of persistence in spreads between shortdated maturities; after fitting an AR(2) to spt(3) the LM test for serial correlation over 12 lags is 80.71. This persistence shows up in other transformations of the yield series, e.g. the realized excess holding yield ht+l('C)= zrt(z)- ( z - 1)rt+l (z - 1) - rt(1), when ~ = 3, has serial correlation coefficients of .188 (lag 1), .144 (lag 8), a n d . 111 (lag 10). Such processes are persistent, but not integrated, as the ADF(12) for ht+l (3) clearly shows with its value of -5.27. Papers have appeared concluding that the excess holding yield is a non-stationary process-Evans and Lewis (1994) and Hejazi (1994). That conclusion was reached by the authors performing a Phillips-Hansen (1990) regression of ht+l(z) on Ft-l(z). Applying the same test to our data, with McCulloch's forward rate series, produces an estimated coefficient on Ft-1 (~) o f . 11 with a t ratio of 10, quite consistent with both Evans and Lewis' and Hejazi's results. However, it does not seem reasonable to interpret this as evidence of non-stationarity. Certainly the series is persistent, and an I(1) series like the forward rate exhibits extreme persistence, so that regressing one upon the other can be expected to lead to some "correlation", but to conclude, therefore, that the excess holding yield is non-stationary is quite incorrect. A fractionally integrated process that is stationary would also show such a relationship with an I(1) process. Indeed, the autocorrelation functions of the spreads and excess yields are reminiscent of those for the squares of yield changes, which have been modelled by fractionally integrated processes - see

Table 2 Autocorrelafion features, forward rates, full sample DF F(1) F(3) Y(6) F(9) -2.28 -2.18 -2.39 -2.14 -17.08 -19.52 -20.61 -19.73 ADF(12) -1.92 -1.97 -1.91 -1.77 -4.07 -5.17 -5.77 -4.69 Pl .98 .98 .98 .98 .29 .16 .11 .15 P2 P6 Pl2 p1(A~ .07 .04 .09 .07 .18 .06 -.12 -.00 .11 .01 -.05 .08 .18 -.02 -.03 .02

Fp(O, 1) Fp(2, 3)
Fp(5, 6) Fp(8, 9)

The 5% critical value for the DF and ADF tests is -2.87.

94

A. R. Pagan, A. D. Hall and V. Martin

Baillie et al. (1993). Nevertheless, the s t r o n g persistence in spreads is a characteristic which is a substantial challenge to t e r m structure models. 2 A s is well k n o w n , there was a switch in m o n e t a r y p o l i c y in the U S in O c t o b e r 1979 a w a y f r o m targeting interest rates, a n d this fact generally m e a n s t h a t a n y analyses have to be r e - d o n e to ensure t h a t the results d o n o t simply reflect outcomes f r o m 1979 to 1982. T a b l e 3 therefore presents the same statistics as in T a b l e 1 b u t using only p r e O c t o b e r 1979 data. It is a p p a r e n t t h a t the conclusions d r a w n a b o v e are quite robust. It is also well k n o w n t h a t there is a substantial d e p e n d e n c e o f the c o n d i t i o n a l volatility o f A r t ( z ) u p o n the past, b u t the exact n a t u r e o f this d e p e n d e n c e has been subject to m u c h less analysis. A s will b e c o m e clear, the m o s t i m p o r t a n t issue is w h e t h e r the c o n d i t i o n a l variance, a~t 2 , exhibits a levels effect a n d , if so, exactly w h a t r e l a t i o n s h i p is likely to hold. H e r e we e x a m i n e the evidence for a "levels 2 d e p e n d s on rt(z), c o n c e n t r a t i n g u p o n the five yields effect" in volatility, i.e. a~t m e n t i o n e d earlier. Evidence o f the effect can be m a r s h a l l e d in a n u m b e r o f ways. By far the simplest a p p r o a c h is to p l o t (Art(z) - #)2 a g a i n s t rt-1 (z), (and this is d o n e in Fig. 1 for rt(1)). 3 The evidence o f a levels effect l o o k s very strong. A m o r e s t r u c t u r e d a p p r o a c h is to estimate the p a r a m e t e r s o f a diffusion process for yields o f the f o r m
drt = (~l - flirt) d t + r~ldtlt ,

(1)

a n d to e x a m i n e the estimate o f 7t. T o estimate this requires some a p p r o x i m a t i o n scheme. C h a n et al. (1992) c o n s i d e r a discretization b a s e d on the E u l e r scheme with h = 1 (ht being the discretized steps) p r o d u c i n g Table 3 Autocorrelation features, pre October 1979 DF r(1) -.76 r(3) -.64 r(6) -.52 r(9) -.55 r(120) -.14 sp(3) -11.99 sp(6) -8.80 sp(9) -8.20 sp(120) -4.05 ADF(12) -.79 -1.03 -1.00 -.89 .33 -2.64 -2.89 -3.10 -4.30

/91
.97 .97 .98 .98 .99 .46 .70 .71 .90

/92

/36

/912

Pl (Ar) -.14 -.07 .08 .11 .04

.34 .56 .60 .84

.22 .39 .38 .59

.34 .38 .36 .23

2 Throughout the paper we will take the term structure data as corresponding to actual observations. In practice this may not be so, as a complete term structure is interpolated from observations on parts of the curve. This may introduce some biases of unknown magnitude into relationships between yields. McCulloch and Kwon (1993) interpolate with spline functions. Others e.g. Gouri6roux and Scaillet (1994), actually utilize some of the factor models discussed later to suggest forms for the yield curve that may be used for interpolation. 3 Marsh and Rosenfeld (1983) also did this and commented on the relation.

M o d e l i n g the t e r m s t r u c t u r e

95

24.1132[

15.3448

6.5763

-2.1921 24900

1 5.5693

10.8897

16.2100

Fig. 1. Plot of squared changes in one month yield against lagged yield

AF t :

o~1 - - f l l r t _ l

-~-

GrtTl_18 , ,

(2)

where here and in the remainder of the paper et is n.i.d.(O, 1). Equation (2) can be estimated by OLS simply by defining the dependent variable as Artr~7~, while the regressors become x, = [r/_~] r]-~], as the error term is then O'er, which is n.i.d(O, or2). Because the conditional mean for rt depends only on ~1, ill, while the conditional variance of ut = rt - Et-1 (rt) is o'2rt2_~11,which does not involve these parameters, we could estimate the parameters in the following way. 1. Regress Art on 1 and rt-1 to get &l and ill. 2. Since
Et_l [u2]
:

,.~2 ~ t t _ 2,, 1 ,

(3)

then
U t2 = ~:r27~ + vt ,

(4)

where g t - l ( V t ) = E[u 2 - Et-1 (u2)] = 0. Hence we can estimate 71 by using a non-

linear regression program. 3. We can re-estimate ~1 ,ill by then doing a weighted regression of Artrt_~ against rt~ and r~_-i ~. The above steps would produce a maximum likelihood estimator if et was taken to be sff(0, 1) and the estimation of ~1 was done by a weighted non-linear

96

A. R. Pagan, A. D. Hall and V. Martin

regression on (3) using the conditional standard deviation o f vt as weights. 4 C h a n et al. (1992) use a G M M estimator, which jointly estimates cq, ill, 71 and a from the set o f m o m e n t s E(et) = 0,
E ( r t - l e t ) = 0,

E(vt) = 0,

E(rt_lVt ) = 0 .

Their estimator would coincide with the one described above if the last m o m e n t condition was replaced by E(r~_lVt ) -- 0. A potential problem with all the estim a t o r s is that, if fll is likely to be close to zero, the regressors in (2) and (4) will be close to I(1), and so n o n - s t a n d a r d distribution theory almost certainly applies to the G M M estimator. Table 4 presents estimates o f the parameters ~1, fll and 71 f o u n d by using three estimation methods. The first one is based on estimating the diffusion with an Euler approximation, Arth = ~lh - fllhr(t-1)h 5- flit
. -1/2

r(t_l)h~ t ,

~1

(5)

with h = 1. It is the estimator described above as G M M . The others stem f r o m the m o d e r n a p p r o a c h o f indirect estimation p r o p o s e d by Gouri6roux et al. (1993) Table 4 Estimates of diffusion process parameters rt(1) GMM ~1 fll ~l
MLE

rt(3) .090 (1.82) .015 (1.24) 1.424 (5.61) .047 (2.48) .007 (7.89) .648 (1.92) .044 (1.89) -.010 (1.57) .974 (5.73)

rt(6) .089 (1.80) .015 (1.25) 1.532 (4.99) .041 (3.00) .005 (2.08) .694 (2.31) .045 (4.34) -.008 (2.24) .947 (7.21)

rt(9) .091 (1.87) .015 (1.31) 1.516 (5.12) .037 (3.82) .004 (1.08) .753 (3.34) .043 (1.67) -.008 (2.04) .941 (3.09)

rt(120) .046 (1.77) .006 (.98) 1.178 (9.80) .015 (3.74) -.001 (2.35) 1.136 (19.30) .009 (3.30) -.009 (2.36) 1.104 (4.88)

.106 (2.19) .020 (1.52) 1.351 (6.73) .071 (2.17) .012 (.74) .583 (2.39) .107 (1.63) -.004 (2.15) .838 (2.67)

~l t1 ~1
EGARCH

~1 fll ~

Asymptotic t-ratios in parentheses 4 Frydman (1994) argues that the distribution of the MLE of fix is non-standard when Yl = 1/2 and there is no drift.

Modeling the term structure

97

and Gallant and Tauchen (1992). In these methods one simulates K multiple sets of observations L'th (k = l, ...,K) from (5), with given values of h (we use 1/100) and 0 ' = (el fll ~1 ~2), and then finds the estimates of 0 that set ~tr=l {K -1 ~--1 d(~h; ~b)} to zero, where ~ is an estimator of the parameters of some auxiiiary model found by solving ~tr=l dc~(rt; ~) = 0. 5 The logic of the estimator is that, if the model (5) is true, then ~b ~ ~b*,where E[d,(rt; ~b*)] = 0, and the term in curly brackets estimates this expectation by simulation. Consistency and asymptotic normality of the indirect estimator follows from the properties of under mis-specification. It is important to note that the auxiliary model need not be correct, but it should be a good representation of the data, otherwise the indirect estimator will be very inefficient. We use two auxiliary models and, in each instance, d o are the scores for ~bfrom those models. The first is (5) with h = 1 and et being assumed n.i.d.(O, 1)(MLE), while the second has rt being an AR(1) with EGARCH(1,1) errors. The visual evidence of Figure 1 is strongly supported by the estimated parametric models, although there is considerable diversity in the estimates obtained. Perhaps the most interesting aspect of the table is the fact that 71 tends to increase with maturity. Based on the evidence from the indirect estimators, 71 = 1/2 seems a reasonable choice for the shortest maturity, which would correspond to the diffusion process used by Cox et.al. (1985). A problem in simply fitting a model with a "levels" effect is that the observed conditional heteroskedasticity in the data might be better accounted for by a G A R C H process, and so the appropriate questions should either be whether there is evidence of a levels effect after removing a G A R C H process, or, whether a levels representation fits the data better than a G A R C H model does. To shed some light on these questions, our strategy was to fit augmented EGARCH(1,1) models to Art(z) = # + a~te~t, e~t ~JV'(0, 1), of the form log ~t a0~ + al~ log tTzt-2 1

This specification is used to generate a diagnostic test for the presence of a levels effect, and is not intended to be a good representation of the actual volatility. Hence the t-statistic for testing if 6 is zero can be regarded as a valid test for more general specifications, e.g. 6 g ( r t _ l ( Z ) ) , where g(.) is some function, provided rt-l(z) is correlated with g(rt-l (z) ). Table 5 gives the estimates of 6 and the
Table 5 and t Ratios for Levels Effect rt(1) .050 3.73 r,(3) .025 3.51 rt(6) .023 3.42 r,(9) .021 3.04 r,(120) .019 2.42

5 A Mihlstein (1974) rather than Euler approximation of (5) was also tried, but there were very minor differences in the results.

98

A. R. Pagan, A. D. Hall and V. Martin

associated t ratios. Every yield displays a levels effect, although with the 10 year m a t u r i t y it seems weaker. 6 The same conclusion applies to the spreads between forward rates, F p t ( z , z - 1). Fitting E G A R C H ( 1 , 1 ) models to these series for z = 1, 3, 6 and 9 m o n t h s maturity, and allowing the levels effect to be a function of Ft-1 (r), the t-ratios that this coefficient was zero were 3.85, 3.72, 17.25 and 12.07 respectively. A n u m b e r of studies have a p p e a r e d that look at this p h e n o m e n o n . A p a r t f r o m our own work, Chan et al. (1992), Broze et al. (1993), Koedijk et al. (1994), and Brenner et al. (1994) have all considered the question, while Vetzal (1992) and K e a r n s (1993) have tried to allow for stochastic volatility, i.e. at2 is not only a function of the past history o f yields. T o date no formal c o m p a r i s o n o f the different models is available, unlike the situation for stock returns e.g. Gallant et al. (1994). All studies find strong evidence for a levels effect on volatility. Brenner et al. provide M L estimates of the p a r a m e t e r s of a discretized joint G A R C H / l e v e l s model in which the volatility function, a 2, is the p r o d u c t of a 2 G A R C H ( 1 , 1 ) process and a levels effect i.e. at2 = (a0 + a l a L l ~ _ l + a2at_l)rt_ 1. T h e estimated value o f V falls to a r o u n d .5, but remains highly significant. Koedijk et al. (1993) have a similar formulation except that a 2 is driven by ~t-1 r a t h e r than at_let_1.2 2 Again V is reduced but remains highly significant. One might question the use o f conventional significance levels for the " r a w " t ratios, owing to the fact that one of the regressors is a near-integrated process. To examine the effects of this we simulated data f r o m an estimated model, equation (6) for rt(1), treating the estimates obtained by M L E estimation as the true p a r a m e t e r values, and then found the distribution of the t ratio for the hypothesis that 6 = 0 using the M L E , constructed by taking one step f r o m the true values o f the coefficients (this would be a simulation o f the a s y m p t o t i c distribution). The results indicate that the distribution of the t-ratio has fatter tails than the n o r m a l with critical values for two tailed tests of 2.90 (5%) and 2.41 (10%), but use of these would not change the decisions.

2.2. Multivariate properties 2.2.1. The level of the yield curve As was mentioned in the introduction a great deal of w o r k on the term structure views yields as being driven by a set of M factors
M j=l 6 It is interesting to observe that the distribution of the Dickey-Fuller test is very sensitive to whether there is a levels effect or not. To see this we simulated a model in which Art = .001 + .01rt~_let, where et nid(O,1) and Veither took the value of zero or unity. A small drift was added, although its influence upon the distribution is likely to be small. The simulated critical values for 1%, 2.5% and 5% significance levels when 7 = 0, 1 are (-3.14, ~6.41), (-2.71, -4.97) and (-2.39, -4.03) respectively. Clearly, the presence of a levels effect in volatility means that the critical values are much larger (in absolute terms), strengthening the claim that Table 1 suggests a unit root in yields.
~

Modeling the term structure

99

and it is important to investigate whether this is a reasonable characterization of the data. It is useful here to recognise that the modern econometrics literature on multivariate relations admits just such a parameterization. Suppose the yields are collected into an (n 1) vector Yt and that it is assumed that y t can be represented as a VAR. Then, if Yt are I(1) and, in the n yields there are k co-integrating vectors, Stock and Watson (1988) showed this to mean that the yields can be described in the format
Y' =

~t =

+ "
~t--I ~-13t

(8)

where ~t are the n - k c o m m o n trends to the system, and E t - l V t -~ O. The format (8) is commonly referred to as the Beveridge-Nelson-Stock-Watson (BNSW) representation. I f there are (n - 1) co-integrating vectors, there will be a single c o m m o n f a c t o r , ~lt, that determines the level of the yields. H o w the yields relate to one another is governed by Yt - J i l t = ut i.e. the yield curve is a function of ut. Johansen's (1988) tests for the number of co-integrating vectors may be applied to the data described earlier. Table 6 provides the two most commonly used - the m a x i m u m eigenvalue (Max) and trace (Tr) tests - for the five yields under investigation, and assuming a V A R of order one. 7 F r o m this table there appears to be four co-integrating vectors, i.e. a single c o m m o n trend. Johnson (1994), Engsted and Tanggaard (1994) and Hall et al. (1992) reach the same conclusion. Zhang (1993) argues that there are three c o m m o n trends but Johnson shows that this is due to Zhang's use of a mixture of yields from zero and non-zero coupon bonds. W h a t is the c o m m o n trend? There is no unique answer to this. One solution is to find a yield that is determined outside of the system, as that will be the driving force. For a small country, that rate is likely to be the "world interest rate", which in practice either means a Euro-Dollar rate or some combination of the US, G e r m a n and Japanese interest rates. Another candidate for the c o m m o n trend is

Table 6 Tests for cointegration amongst yields Max 5 vs 4 vs 3 vs 2 vs 1 vs 4 trends 3 trends 2 trends 1 trends 0 trends 273.4 184.7 95.6 30.9 2.4 Crit. Value (.05) 33.5 27.1 21.0 14.1 3.8 Tr. 586.9 313.5 128.8 33.3 2.4 Crit. Value (.05) 68.5 47.2 29.6 15.4 3.8

7 Changing this order to four does not affect any conclusions, but restricting it to unity fits in better with the theoretical discussion.

100

A. R. Pagan, A. D. Hall and V. Martin

the simple average of the rates) In any case we will take this to be the first factor ~lt in (7).

2.2.2. The shape of the yield curve The existence of k co-integrating vectors ~ (~ is an (n x k) matrix), such that ~t = ,'yt is I(0), means that any VAR in Yt has the ECM format Ayt : 7~t-1 + D(L)Ayt_I + et ,
w h e r e Et-1

(9)

(et) : 0 and D(L) is a polynomial in the lag operator. It is also possible to show that ut in (8) can be written as a function of the k EC (error correction) terms (t and this suggests that we might take these to be the remaining factors ~/t (j = 2 , . . . , K) in (7). To make the following discussion more concrete assume that the expectations theory of the term structure holds i.e. a -c period yield is the weighted average of the expected one period yields into the future. In the case of discount bonds the weights are equal to ~ so that the theory states
lZ-1
rt(z) = T Et(rt+k(1)) .

Of course this is an hypothesis, albeit one that seems quite sensible. It implies that

"c--1
r t ( z ) - r t ( 1 ) = {l-

/
Etrt+k(1)- Etrt(1)

[,~" k=0

=[l~-~EtArt+j(1)}

/'L" i=1 j=l


Now, if the yields are I(1) processes, the yield spread rt(z) - rt(1) should be I(0), i.e. rt(z) and rt(1) should be co-integrated with co-integrating vector [1 - 1], and these spreads are the EC terms. Therefore, to test the expectations hypothesis for the five yields we need to test if the matrix of co-integrating vectors has the form

~,=

0 0 0

1 0 0

(10)

Johansen's (1988) test for this gives a Z2(4) of 36.8, leading to a very strong rejection of the hypothesis. Such an outcome has also been observed by Hall et al. (1992), Johnson (1994) and Engsted and Tanggaard (1994). A number of possible explanations for the rejection were canvassed in those papers, involving the size of the test statistic being incorrect etc. One's inclination is to examine the estimated matrix of co-integrating vectors given by Johansen's

8 See Gonzalo and Granger (1991) for other alternatives.

Modeling the term structure

101

procedure, &, and to see how closely these correspond to the hypothesized values but, unfortunately, the vectors are not unique and the estimated quantities will always be linear combinations of the true values. Some structural information is needed to recover the latter, and to this end we write a' = Aa, where -/76 -f19
--fl120

a=

0 0
0

1 0
0

'

and then proceed to solve the equations ~ = ~ia, where ~i is some non-singular matrix. This produces/73 = 1.038,/76 = 1.063,/79 = 1.075 and/~120 = 1.076, which indicates that the point estimates are quite close to those predicted by the expectations theory. It is also possible to estimate the fl, by "limited information" rather than "full-information" methods. To that end the Phillips-Hansen (1990) estimator was adopted with a Parzen kernel and eight lags being used to form the long-run covariance matrices, producing/73 = 1.021, f16 = 1.034,/79 = 1.034 and fl120 = .91. With the exception of the 10 year rate, neither set of estimates seems to be greatly divergent from that predicted. Some insight into why the rejection occurs may be had from (9). Given that cointegration has been established, and working with a VAR(I), i.e. D(L) = 0 in (9), the change in each yield should be governed by
5

Art(z) = Z Tj,(rt-1 ~) - ~jrt-1 (1)) + ezt ,


j=2

(11)

where j = 2 , . . . , 5 maps one to one into the elements z = 3, 6, 9, 120. If the expectations theory is valid flj = 1 and the system becomes
5

art(z) = ~ yj,(rt-1 ~) - rt-l (1) ) + e,t ,


j=2

and the hypothesis H0 : flj = 1 can be tested by computing the likelihood ratio statistic. It is well known that such a test will be distributed as a Z2(4) under the null hypothesis. If the yields were taken to be I(0), the simplest way to test if flj = 1 would be to re-write (11) as

art(~)=j~=2"~Jdrt-l(j)-rt-l(1))+

?jdl-fls

rt-l(1)+e,t,

(12)

and to test if the coefficient of rt-i (1) in each of the equations for Art(~) was zero. For a number of reasons this does not reproduce the )~2(4) test cited above - there are five coefficients being tested and rt-l(1) will be I(1), making the distribution non-standard. Nevertheless, the separate single equation tests might still be informative. In this case the t-values that rt-l(1) has a zero coefficient in each

102

A. R. Pagan, A. D. Hall and V. Martin

equation were -4.05, -1.77, -.72, -.24 and .55 respectively, suggesting that the rejection of (10) lies in the behaviour of the one month rate i.e. the spreads are not capable of fully accounting for its movement. Engsted and Tanggaard (1994) also reach this conclusion. It may be that rt-l(1) is proxying for some omitted variable, and the literature has in fact canvassed the possibility of non-linear effects upon the short-term rate. Anderson (1994) makes the influence of spreads upon Art(l) non-linear, while Pfann et al. (1994) take the process driving rt(1) to be a non-linear autoregression - in particular, the latter allow for a number of regimes according to the magnitude of rt(1), with some of these regimes featuring I(1) behaviour of the rate while others do not. Another possibility, used in Conley et.al. (1994) is that the "drift term" in a continuous time model has the form ~]jm___ ma_jr{ and this would induce a non-linearity into the relation between Art and rt-1. Instead of a mis-specification in the mean, rejection of (10) may be due to levels effects in ezt. As noted earlier, the Dickey-Fuller test critical values are very sensitive to this effect, and the test that rlt-1 has a zero coefficient in the Art(l) equation in (12) is actually an A D F test, if the augmenting variables are taken to be the spreads. This led us to produce a small Monte Carlo simulation of Johansen's test for (10) under different assumptions about levels effects in the errors of the VAR. The example is a simplified version of the system above featuring only two variables Ylt and y2t with co-integrating vector [1 - 1], and being generated from the vector ECM,
Aylt A yzt
= = -.8(Ylt-i -. 1 (Ylt-1 Y2t-1) +

.lY~t_l~lt

Y2t-1 ) + . 1Y~t-1 ~2t

The 95 % critical value for Johansen's test that the co-integrating vector is the true one varies according to the value of 7 : 3.95(7 = 0), 4.86(y = .5), 5.87(y = .6), 11.20(7 = .8), and 23.63(7 = 1). Clearly, there is a major impact of the levels effect upon the sampling distribution of Johansen's test, and the phenomenon needs much closer investigation, but it is conceivable that rejection of (10) may just be due to the use of critical values that are too small. Even if one rejects the co-integrating vectors predicted by the expectations theory, the evidence is still that there are k = n - 1 error correction terms. It is natural to equate the remaining M - 1 factors in (7) (after elimination of the common trend) with these EC terms, but this is not very helpful, as it would mean that M = n, i.e. the number of factors would equal the number of yields. Hall et al. (1992) provide an example of forecasting the term structure using the ECM relation (9), imposing the expectations theory co-integrating vectors to form ~t, and then regressing Ayt on (t-1 and any lags needed in Ayt. Hence their model is equivalent to using a single factor, the common trend, to forecast the level, and (n - 1) factors to forecast the slope (the EC or spread terms). In practice however they impose the feature that some of the coefficients in ~ were zero, i.e. the number of factors determining the yield varies with the maturity being examined. It is interesting to note that their representation for Art(4) has no EC terms i.e. it is

Modeling the term structure

103

effectively determined outside the system and plays the role of the "world interest rate" mentioned earlier. In an attempt to reduce the number of non-trend factors below n - 1, it is tempting to assume that (say) only m = M - 1 of the (n - 1) terms in (t appear as determinants of Art(z) and that these constitute the requisite c o m m o n factors, but such a restriction would necessitate m of the columns of 7 being zero, thereby violating the rank condition, P(7) = n - 1. Consequently the factors will need to be combinations of the EC terms. Now, pre-multiplying (7) by ce' gives
M

ct'yt=='~-~bj{jt ,
j=l

(13)

where b} = [flj1..-fljn] is a 1 x n vector. I f we designate the first factor as the c o m m o n trend then it must be that ='bl = 0 as the LHS is I(0) by construction, meaning that
K

(t = ~' ~__~ bj{jt = ~'BEt ,


j=2

(14)

where t is the (K - 1) x 1 vector containing {2t--. {Kt, and B is an n x (K - 1) matrix with p(B) = K - 1, where p(.) designates rank. Equation (14) enables us to draw a number of interesting conclusions. Firstly, p[cov(t)] = rain[p(=), p(B)], provided cov(Et) has rank K - 1. Since K < n implies K - 1 < n - 1, it must be that p(B) < p(~), and therefore p[cov(~t)] = K - 1 i.e. the number o f factors in the term structure (other than the c o m m o n trend) m a y be found by examining the rank of the covariance matrix of the co-integrating errors. Secondly, since C = ~'B has p(C) = K - 1, Ft = (C'C)-IC'(t, and hence the factors will be linear combinations of the EC terms. Applying principal components to the data set composed of spreads spt(3), spt(6), spt(9) and spt(120)~ the eigenvalues of the covariance matrix are 4.1, .37, .02 and .002, pointing to the fact that these four spreads can be summarized very well by three components (at most). 9 The three components are:

9 The principal components approach, or variants of it, has been used in a number of papers Litterman and Scheinkman (1991), Dybvig (1989) and Egginton and Hall (1993). This technique finds linear combinations o f the yields such that the variance o f each combination is as small as possible. Thus the i'th principal component of Yt will be b~yt, where b/is a set of weights. Because one could always multiply through by a scale factor the bi are normalized, i.e. b~bi = 1. With this restriction b becomes the eigenvectors of var(yt). Since b/is an eigenvector it is clear that b'var(yt)b = A, where A is a diagonal matrix with the eigenvalues (21 ... An) on it, and that tr[b'var(yt)b] = )-'~=l 2/. It is conventional to order the components according to the magnitude o f 2i; the first principal component having the largest 2i. There is a connection between principal components and common trends. Both seek linear combinations o f Yt and, in many cases, one o f the components can be interpreted as the common trend, e.g. in Egginton and Hall (1993) the first component is effectively the average of the interest rates, which we have mentioned as a possible common trend earlier.

104

A. R. Pagan, A. D. Hall and V. Martin .32spt(3) -.86spt(6) -.37spt(9) + .17spt(120) q~2t = -.78spt(3) + .OOspt(6) -.55spt(9) + .29spt(120) ~b3t = .54spt(3) + .52spt(6) -.58spt(9) + .37spt(120).
=

~lt

3. Models of the term structure

In this section we describe some popular ways of modelling the term structure. In order to assess whether these models are capable of replicating observed term structures, it is necessary to decide on some way to compare them to the data. There is a small literature wherein formal statistical tests have been performed on how well the models replicate the data in some designated dimension. Generally, however, the reasons for any rejection of the models remain unclear, as m a n y characteristics are being tested at the one time. In contrast, this chapter uses the method of "stylized facts", i.e. it seeks to match up the predictions of the model with the nature of the data as summarized in Section 2. Thus, we look at whether the models predict that yields are near-integrated, have levels effects in volatility, exhibit specific co-integrating vectors, produce persistence in spreads, and would be compatible with two or (at most) three factors in the term structure. 1

3.1. Solutions from the consumer's Euler equations


Consider a consumer maximising expected utility over a period subject to a budget constraint, i.e.

m a x E t [Ls=t ~U(C~)fls]
where/3 is a discount factor, and Cs is consumption at time s. It is well known that a first order condition for this is

U(Ct)vt = Et{ff-tu'(C~)v~}

where vt is the value of an asset (or portfolio) in terms of consumption goods. This can be re-arranged to give

Assuming that the asset is a discount bond, and the general price level is fixed, consider setting s = t + z giving vt = ft(z). The solution of this equation will then

10 There are many other characteristics of these yields that we ignore in this paper but which are challenging to explain e.g. the extreme leptokurtosis in the density of the change in yields and in the spreads.

Modeling the term structure

105

provide a complete set o f discount b o n d prices for any maturity. It is useful to reexpress (15) as f t(z) = Et[ff U'(Ct+~)/U'(Ct)] , (16)

imposing the restriction that f t ( t + z) = 1, so as to find the price o f a zero c o u p o n b o n d paying $1 at maturity. Hence the term structure would then be determined. I f the price level is not fixed (16) needs to be modified to i t ( z ) = Et[fl~PtUt(Ct+,)/(U'(Ct)Pt+~)] , (17)

where Pt is the price level at time t. There have been a few attempts to price bonds f r o m (16) or (17). C a n o v a and M a r r i n a n (1993) and B o u d o u k h (1993) do this by assuming that ct = log (Ct+l/Ct) - 1 and pt = log (Pt+l/Pt) - 1, follow a V A R process with some volatility in the errors, and that the utility function has the C R A A form, U(Ct) = C ] - r / ( 1 - 7), where 7 is the coefficient o f risk aversion. 11 It is necessary to evaluate (17) for the yield rt(v) = - z -1 l o g f t ( z ) . rt(z) = - - 1 "c log Et[ff(Ct+~/Ct)-r(Pt/Pt+~)]

-where

l log Et [fl~(1 + c,~)-'(1 +

pt~)-1]

ct~ = G+~/Ct - 1 -~ log Ct+~ - log Ct Pt~ = Pt+~/Pt - 1 ~ logPt+~ - log Pt . E x p a n d i n g a r o u n d Et(ct~) and Et(pt~), and ignoring all cross terms and terms o f higher order than a quadratic, 12 - l o g f l - - 1 log { [(1 + Et(ctr))-~(1 + Et(p'~))-I]3 v + al~tvart(ct~) + a2~tvart(P~t)} , where
alrt a2rt
=

(18)

1/2(1 + 7) (1 +

+ Et(Ptz)) -1

= (1 + Et(ptz))-3(1 + Et(ct ))

11 Canova and Marrinan actually use the Cambridge equation for the price level, Pt = Mt/Yt, and so their VAR involves the growth in money, output and consumption. 12 The conditional covariance terms between eta:and Pt, are ignored as one is a real and the other a nominal quantity and most general equilibrium models would make this zero. Boudoukh (1993) however argues that the conditional covariance is important for explaining the term structure.

106

A. R. Pagan, A. D. Hall and V. Martin

_~ - l o g f l + y--log (1 + Et(ct~)) + - log (1 + E,(pt~)) -- - log {bl~tvart(ct~) + b2~tvart(Pt~) } ,


T

(19)

where

blot = 1 (1 + y)y(1 + Et(ctx)) -2 bz~t = (1 + Et(pt~)) -2


Equation (19) points to a four factor model of the term structure with the level being driven by the first two conditional moments of the inflation rate and consumption growth. However, the relation is not easily interpreted as a linear one, since the weights attached to volatilities are functions of the conditional means. The problem remains to evaluate the conditional moments. To complete the model it is necessary to assume something about the evolution of Zlt Ctl and z2t = Ptl.These are generally taken to be AR processes of the form
=

Zjt ~-" ~Oj -Jr"~ l j Z j t - 1 "~ ejt

Canova and Marrinan (1993) take a~+l = vart(ejt+l) to be G A R C H processes of the form
o'2~+1 = aoj +

alja~.t

+ a2je2t ,

whereby the formulae in Baillie and Bollerslev (1992) can be used to evaluate Et(zjt~) and vart(zjt~), while Boudoukh (1993) has a2t as a stochastic volatility process. For G A R C H models vart(zjt~) is a linear function of a}t+l. How well does this model perform in replicating the stylized facts of the term structure? To produce a near unit root in yields it is necessary that log (1 + Et(pt~)) ~ Et(pt~) be near integrated i.e. inflation must be a near integrated process, as it is the only one of the two series that has such persistence in either mean or variance - see Boudoukh (1993) for a description of the time series properties of the two series. Then the inflation rate becomes the common trend in the term structure, and the spreads will depend upon consumption growth and the two volatilities. As there is rather weak evidence for much dependence in either inflation or consumption volatility - see the test statistics in Boudoukh- it is difficult to see the persistence in spreads being explained by these models. 13 Whether a levels effect in Art(z) can be produced is unclear; the G A R C H structures used by Canova and Marrinan will not produce it, but Boudoukh's stochastic volatility formulation does allow for a levels effect in vart(pt). Moreover, even if volatilities were constant, the conditional means enter the weights attached
13 Although Boudoukh finds much more in his estimated stochastic volatility specification than G A R C H specifications.

Modeling the term structure

107

to them, and this dependence might be used to induce a levels effect into Art(z). Whilst Et(ctz) is likely to be close to a constant due to the weak autocorrelation in consumption growth, there is strong serial correlation in inflation rates, and, with inflation as the common trend, it is conceivable that the requisite effect could be found in that way, although the question was not addressed by the authors. 14 Another attempt at working within this framework is Constantinides (1992) who writes (17) as f t(z) = Et[Kt+~/K,] , where Kt = f f U ' ( C t ) / P t is referred to as a "pricing kernel". He then makes assumptions about the evolution of Kt, in particular that Kt=exp 9+ t+zot+Z(zit-ei)2
i=1

He works in continuous time and makes zot a Weiner process while the other zit are Ornstein-Uhlenbeck diffusion processes with parameters 2i and variances o-/2. Each of the zit are taken to be independent. Under these assumptions it turns out that f , ( ~ ) = {l-If=~gi(~)}-~/2exp - 0+ 2~ ~

i=1

i=l

where Hi(r) = ~r2i/2i + (1 - a2 / 2i)e 2~i~. Consequently, rt(z) has the format
N N

rt('r) ---6o~ q- Z
i=1

61i,:(zi, -- c~ie'Z~z) 2 + Z'r-l(zit _~i)2 .


i=1

Terms such as (z~t - ~i) 2 reflect the fact that the "variance" of the change in Zit of an Orstein-Uhlenbeck process depends upon the level of the variable z~t. Constantinides' model will have trouble producing the right outcomes. After converting to yields his model has no factor that would be I(1). The difficulty arises from his specification of the "pricing kernel". The pricing kernel used to evaluate (17) has an (2) variable Pt as it is the inflation rate which is I(1). Consequently it is the assumption implictly made by Constantinides that the kernel is only I(1) through the presence of the term zot which is the root of the problem with his model.
14 Essentially, these are "calibrated" models that emphasise the use of a highly specified theory to explain an observed phenomenon. Hence, one should really distinguish between the model prediction of yields, r~ (z), and the observed outcomes, rt (z). The gap between the two variables is due to factors not captured within the model, or perhaps to specification errors. Examination of the characteristics of the gap may be very informative.

108

A. R. Pagan, A. D. Hall and V. Martin

3.2. One f a c t o r models f r o m f i n a n c e

Finance theory has developed by working with factor models to determine the term structure. Common to the material just discussed is the use of models of an economy in which there is inter-temporal optimization, but a notable difference is the introduction of a production sector and a concern with ensuring that the pricing formulae prohibit the possibility of arbitrage i.e the solution tends to be closer to a general rather than partial equilibrium solution. The basic work horse of the literature is the model due to Cox, Ingersoll and Ross (1985) (CIR). Essentially they propose an economy driven by a number of processes that affect the rate of return to assets e.g. technological change and (possibly) an inflation factor. Dealing with the simplest case where there is just a single state vector, #t, perhaps total factor productivity (TFP), it is assumed that this variable follows a diffusion process of the form
dla t = (b - #t)dt + q)&
1/2.

ar b .

General equilibrium in asset markets for such an economy results in an expression for the instantaneous rate of interest of the form
drt = (a - flrt)dt + ar t
1/2.

atlt .

(20)

Once one has the expression for the instantaneous rate the whole term structure f t ( z ) is priced according to a partial differential equation 1/2 a2r f ~ + (~ - fll r) f r + f t - )~rf r - r f = 0 , (21)

where frr = 0 2 f / O r O r, f r = O f / O r , f t = O f lOt and the term 2 r f r , which depends upon the covariance of the change in the price of the factor with the percentage change in the optimal portfolio, is the "market price of risk" associated with that factor. This partial differential equation comes from the fact that a zero coupon riskless bond maturing at t + z must be valued at
f t ( z ) = Et exp

E(r
,It

r(~)d~b

)1

(22)

Since the expected rate of change of the price of the bond is given by
r + 2r f J f , it also can be interpreted as a liquidity premium. It is clear that we could group together the terms (~ - f l r ) f r and - 2 r f r and treat the problem as

one of pricing an asset using a "hypothetical " instantaneous rate that is generated by
drt = (a _ flrt - 2rt)dt +
= (a-Trt)dt+art
ar t

1/2a~ , t (23)

1/2,
aqt

The distinction is between the true probability measure in (20) and the "equivalent martingale measure " in (23).

Modeling the term structure

109

The analytic solution for the term structure in the CIR model is then (see Cox et al. (p. 393)) ft(z) = A1 ('c) exp(-Bl('c)rt) , where 26 exp((6 + 7)z/2 ) [ 2(exp(&c)-l) 6] 2e/"= ]

~1(~) = (6 + ~
Converting to a yield

-- 1) + 26 ,and 6 = (~ + 2 ) ~/2.

rt(v) = { - log (A1 ('c)) q- B1 ( v ) r t } / z .

(24)

This is a single factor model with the instantaneous rate or, more fundamentally, the "returns" factor, driving the whole term structure, i.e. the level of the term structure depends on the value of rt at any point in time. The slope of the yield curve depends upon the parameters of the diffusion equation as well as the market price of risk. Perhaps the biggest problem with this methodology is that it will never exactly reproduce an observed yield curve. This bothers practitioners a lot. One response has been to allow a to change according to -c and t. What this does is to add on "fudge factors" to the model based yield curve so that the modified curve equals the observed yield structure. Then, after forecasting rt+l and finding the predicted term structure, the "fudge factors" from the previous period are added on. The need for "fudge factors" suggests that there is substantial mis-specification in the CIR model as a description of the term structure, just as "intercept corrections" in macro econometric models were given such an interpretation. Brown and Dybvig (1986) estimated the parameters of the ClR model by maximum likelihood and then computed the residuals defined by the gap between the observed bond prices ( f t ) and the predictions of the model (f~). Examination of the residuals pointed to specification errors in the model. 15 Looking at the ClR model in the light of stylized facts, the data should posess the characteristic that interest rates are near-integrated processes and possibly co-integrated with cointegrating vectors between any pair of rates of [1 -1] i.e. the spreads should be I(0). The question that arises is whether the ClR model would deliver such a prediction. One problem to be overcome is quantifying the market price of risk, 2, in the ClR bond formulae. As ClR point out, ~. = 0 if the factor had no effect on the real economy e.g. if it was some nominal quantity such as the inflation rate. Accordingly, we will adopt this interpretation, allowing us to set 2 = 0. To induce a unit root we set fl = 0, and we also put the drift term ~ = 0. This makes 15 Sincethere are n yieldsbut only one factor they neededto add on a vector of errors to the model to produce a non-singularcovariancematrix for f~, in order to be able to form a likelihood.It may be that the mis-specificationreflectsthe assumptions made in this step.

110

A. R. Pagan, A. D. Hall and V. Martin

6 = v/2a,Al("c) = 1,Bl('C)
Now the spread spt('c) will be

2(exp(6z) - 1) 6[exp(&) - 1)] + 26 '

rt('c) -- rt(1) = [r-lBl(r) - Bl (1)]rt ,


so that we will not get spreads to be I(0) unless the term in square brackets is zero. Generally it will not be. Realistic values for a, fl and ~ might be the G M M estimates for rt(1) of .049,.02 and .106. These produce values of r-lBa(z) = .990, .967, .930, .890 a n d . 181 for the five maturities. In the limit (z ~ o c ) B 1 ( r ) = 2/6, and so the spreads between adjoining yields tend to zero as the maturity lengthens. The source of the failure of the spreads to be I(0) is the fact that 6 0. If 6 = 0 then, using L'Hopital's rule, B1 (z) = r, and so the spreads should be identically zero. By making a very small we can always produce results in which the spreads will be very close to being I(0) i.e. even if a is not exactly zero it can be regarded as sufficiently close to zero that the spreads are nearly non-integrated, although the longer the maturity which the spread is based on the less likely we are to see such an outcome. Another way of understanding the problem is to look at a discrete form of the t+zfundamental pricing equation (22), ft(r) = Et[e X p ( - ~ j = t 1 rj)]. Suppose that rt is I(1) with martingale difference innovations that are normally distributed. Then /r--,t+~-I it(v) = e x p ( - r r t ) { I ] ~ - ~ [ 1 / 2 ( z - J ) ~2 V art~2..,j=t+l Art+j)]}. If the conditional variance is a constant the spreads will therefore be I(0). However, if it depends upon the level of the instantaneous rate, the spreads at any maturity would be equal to a non-linear function of rt. For example, substituting the "square-root" formulation of CIR gives vart(Art+j)= a2rt, and s p t ( ' c ) = e n s t - ( 1 - ~ ) log rt. Thus, it is important to determine the nature of the conditional variances in the data. Most econometric models of the term structure make these conditional variances G A R C H processes, which effectively means that they are functions of Art_ j. But, as seen in the section examining the term structure data, there is prima facie evidence of a levels effect after allowing for a G A R C H specification of the conditional variance. Given the conflicting evidence in Section 2, one might look at other co-integrating vectors when performing the comparison with CIR. In general, the CIR model points towards co-integrating vectors that are of the form

rt(r) = d('r)rt(1) ,
where d(r) < 1 and decreasing with ~. As seen in Section 2, with one exception both the Johansen and Phillips-Hansen estimates of d(r) have d ( r ) > 1 and

Modeling the term structure

111

increasing in z. The predictions from CIR type models are therefore diametrically opposed to the data. 16

3.3. Two factor models from finance


Another response to the discrepancy between the model based prediction of a yield curve and the observed one, is to seek to make the model more complex. It is not uncommon in this literature to see people "bypassing" the step between the instantaneous rate and the fundamental driving forces and simply postulating a process for the instantaneous rate, after which this is used to price all the bonds. An example of this is the paper by Chen and Scott (1992) who assume that the instantaneous rate is the sum of two factors rt = {it + {2t , where (25)

d~lt = 0Zl -- fll~lt)dt +

1/2 O'l~lt dqlt

d~2t = (~2 - f12~2t) d t + 0"2~2t d?12t ,

V2

where dqj t are independent, thereby making each factor independent. Then the solution for the bond price is
f t ( z ) = A1 ( z ) A 2 ( z ) e x p { - B 1 ('t)~lt - B 2 ( z ) ~ 2 t } ,

where A2 and B2 are defined analogously to A1 and B1. Obviously this framework could be extended to encompass any number of factors, provided they are assumed to be independent. Another method is that of Longstaff and Schwartz (1992) who also have two factors but these are related to the underlying rate of return process #t rather than directly to the instantaneous rate. In particular they wish to have the two factors being linear combinations of the instantaneous rate and its conditional variance. The model is interesting because the second factor they use, ~2t, affects only the conditional variance of the Pt process, whereas both factors affect the conditional mean. This is unlike Chen and Scott's model which has ~lt and ~2t affecting both the mean and variance. Empirically, the two factors are regarded as the short term rate and its conditional volatility, where the latter is estimated by a G A R C H

16 Brown and Schaefer (1994) find that the CIR model closely fits the term structure of real yields, where these are computed from British government index-linked bonds. Note in constructing the Johansen and Phillips-Hansen estimators that an intercept was allowed into the relations in order to correspond to A(z).

112

A. R. Pagan, A. D. Hall and V. Martin

process when assessing the quality of the model, x7 Tests of the model are limited to how well it replicates the unconditional standard deviations of yield changes. There are a number of other two factor models. Brennan and Schwartz (1979) and Edmister and M a d a n (1993) begin with the long and short rates following a joint diffusion process. After imposing the "no arbitrage condition" and assuming that the long rate is a traded instrument, Brennan and Schwatz find that the price of the instantaneous risk associated with the long rate can be eliminated, and the two factors then effectively become the instantaneous rate and the yield spread between that rate and the long rate. Eliminating the price of risk for the long rate makes the model non-linear and they need to linearize to find a solution. Even then there is no analytical solution for the yield curve as with CIR. Another possibility for a two factor model might be to allow for stochastic volatility as a factor. Edmister and M a d a n find closed form solutions for the term structure in their formulation. Suppose that the first factor in Chen and Scott's model is a "near I ( 1 ) " process whereas the second factor is I(0).Then the instantaneous rate has the c o m m o n trend format (compare (25) and (8) recognising that J can be regarded as the unit column vector). Using the same parameter values for the first factor as the polar case discussed in the preceding sub-section i.e. /~l = 0, 2 1 - - 0 , o'1 = 0, the first factor disappears from the spreads, which now equal
r t ( z ) - rt(1) ~- log ( A z ( 1 ) / A 2 ( ' r ) ) + [z-lB2(z) - B2(1)]~2t .

Hence, they are now stochastic and inherit the properties of the second factor. For them to be persistent, it is necessary that the second factor have that characteristic. Notice also that rt('c) - r t ( z - 1) will tend to zero as ~ --+ c% and this may make it implausible to use this model with a large range of maturities. Consequently, this two factor model can be made to reproduce the standard results of the co-integration approach in the sense that the EC terms are decomposed into a smaller number of factors. Of course the model would predict that the coefficients on the factors would be negative as ~-1B2(~) _< B2(1). The conclusion of negative weights extends to any number of factors, provided they are independent, so it is interesting to look at the evidence upon the signs of the coefficients of the factors in our data set, where the non-trend factors are equated with the principal components. Although one cannot uniquely move from the principal components/spreads relation to a spreads/principal components relation, a simple way to get some information on the relationship between spreads and factors is to regress each of the spreads against the principal components. Doing so the R 2 a r e .999, .999, .98 and .99 respectively, showing that the spreads are well explained by the three components. The results from the regressions are 17 Volatility affects the term structure here by its impact upon rt in (25). Shen and Starr (1992) raise the interesting question of why volatility should be priced; if one thinks of bonds as part of a larger portfolio only their covariances with the market portfolio would be relevant. To justify the observed importance of volatility they note that the bid/ask spread will be a function of volatility and that has an immediate effect upon yields.

M o d e l i n g the term structure

113

spt(3) = .36~1t - .831P2t + .48~t3t


spt(6) = -.76~01t - .09~k2t + .42~k3t

spt(9) = --1.28~1 t + .33~t2t + .44~3 t spt(120) = --l.44~% + 1.84~,2t + 2.12~k3t ,


where qJjt are the first three principal components. It is clear that independent factor models would not generate the requisite signs. Formal testing of two factor pricing models is in its infancy. Pearson and Sun (1994) and Chen and Scott (1993) estimate the parameters of the model by maximum likelihood and provide some evidence that at least two factors are needed to capture the term structure adequately. The two factor model is also useful for examining some of the literature on the validity of the expectations hypothesis. Campbell and Shiller (1991) pointed out that the hypothesis implies that

rt+l(Z -- 1) -- r,(z) = ~o +

1 ~-1

[rt(z) -- rt(1)]

(26)

if the liquidity premium was a constant. They found that this restriction was strongly rejected by the data. With McCulloch and Kwon's data and T = 3, the regression of rt+l (2) - rt(3) against rt(3) - rt(1) yields an estimated coefficient of -.09, well away from the predicted value of .5. Of course, the assumption of a constant premium is incorrect. Bond prices are determined by (22) which, when discretized, would be,

ft(z)=Et

exp

- ErJ)/
J=t ." .a

_exp(_Et(t~lrj))vt
:

(27)

where fEt(z ) is the bond price predicted by the expectations theory. Thus rt(z) differs from that of the expectations theory by the term - z -1 log vt, and this in turn will be a function of the conditional moments of Art. In the case where Art is conditionally normal it depends upon the conditional variance, and the equation corresponding to (26) will now feature a time varying ~0 that depends on this moment. If the conditional variance relates to the spreads with a negative coefficient, then that could cause there to be a negative bias in the coefficient of rt(z) - r t ( 1 ) in the Campbell and Shiller regressions. One scenario in which this happens is if the conditional variance depended upon Art, as happens with an E G A R C H model. Then, due to cointegration amongst yields, Art could also be replaced by the lagged spreads, and these will have negative coefficients. More generally, since we observed in Section 2 that the factors influencing the term structure, such as volatility, could be written as linear combinations of the

114

A. R. Pagan,A. D. Halland V. Martin

spreads, there is a possibility that term structure anomalies might be explained in this way.

3.4. Multiple non-independent factor models in finance


DuNe and Kan (1993) present a multi-factor model of the term structure where the factors may not be independent. As for the two factor models it is assumed that the instantaneous rate is a linear function of M factors, collected in an M x 1 vector it, which evolves according to the diffusion process

d~t = #(~t)dt + a(~t)d~lt ,


where dqt is a vector of standard Brownian motions and #(it), o'(~t) are vectors and matrices corresponding to drift and volatility functions. They then ask what type of functions #(.) and a(-) are capable of producing a solution for the n bond prices ft(z), z = 1 , . . . , n, of the exponential affine form

ft('c) = exp[(A(v) + B('c)~t)] = exp [(A(z) + ff~__l Bi('c)~it)]


It turns out that #(it) and a(~t) should be linear (aNne) functions of it- Thereupon the solution for B(z) can be found by solving an ordinary differential equation of the form /~(z) = B(B(z)), B(0) = 0 .

In most cases only numerical solutions for B(z) are available. DuNe and Kan consider some special cases, differing according to the evolution of it. When the ~it are joint diffusions driven by Brownian motion with covariance matrix f~ that is not diagonal, there is the possibility that the weights attached to the factors can have different signs, and so the principal defect with the two factor models of the preceding sub-section might be overcome. To date little empirical work seems to be available on these models, with the exception of E1 Karoui and Lacoste (1992) who make it Gaussian with constant volatility.

3.5. Forward rate models


In recent years it has become popular to model the forward rate structure directly rather than the yields, e.g. in Ho and Lee (1986) and Heath, Jarrow and Morton (1992) (HJM). Since the forward rates are linear combinations of the yields, specifications based on the nature of the forward rate structure imply some restriction upon the nature of the yield curve, and conversely. In the light of what is known about the behavior of yields, this sub-section considers the likelihood that popular models of forward rates can replicate the term structure. In what follows, one step ahead forward rates are used along with the HJM framework. In the

Modeling the term structure

115

interest of space only a simple Euler discretization of the HJM stochastic differential equations describing the evolution of the forward rate curve is used. Many variants of these equations have emerged, but they have the common format, Ft(z - 1) - F t _ l ( Z ) = ct,-i + at,-let,.-I , where et,-i is n.i.d.(O, 1). Differences among the models reflect differences in the assumptions made about volatilities. Examples would be a constant volatility model in which ct,-1 = a0 + a2z and o't,z_1 = o', or a proportional volatility model that has ct,~-i =-6Ft('c))~ + ffFt(z)(~nk=lFt(k)) and o't,z-1 = riFt(z). The nature of ct,z-1 reflects the no-arbitrage assumption. After some manipulation it can be shown that Ft(z - 1) - Ft-, ('r) = + z+l
"c

spt(z) - T A r t ( z + 1) 1

~.rAz + 1) - rt(z) ) Art(l) ,

so that the equation used by HJM for the evolution of the forward rate incorporates spreads and changes in yields. In turn, using co-integration ideas, Art(z + 1) depends upon spreads, and this shows quite clearly that the characteristics of F t ( z - I ) - F t - l ( ' c ) will be those of the s p r e a d s - see Table 2. Consequently, at least for small z, constant volatility models with martingale difference errors could not adequately describe the data. It is possible that proportional volatility models might do so due to the dependence of their ct,~-i upon Ft('c), as the latter is near integrated. To check this out we regressed F t ( 2 ) - F t - l ( 3 ) against ct, 2 and s p t - l ( 3 ) for n = 9 and a variety of values for the market price of risk 2. For 2 = 0 the t ratio of the coefficient ofspt_l (3) was -4.37, while for very large 2 it was -4.70. Adopting other values for 2 resulted in t ratios between these extremes. Hence, the conditional mean for the forward rates is far more complex than that found in HJM models. Moreover, the rank of the covariance matrix of the errors et,~-I must reflect the number of factors in the term structure, which appears to be two or three, so that the common assumption of a single error to drive all forward spreads seems inaccurate. A number of formal investigations have been made into the compatibility of the HJM model with the data - Abken(1993) and Thurston(1994) fitted HJM models to forward rate data by G M M whilst Amin and Morton(1994) used options prices to recover implied volatilities whose evolution was compared to those of the most popular variants of the HJM model. Abken and Thurston reach conflicting conclusions-the latter favours a constant volatility formulation and the former a proportional one, although his general conclusion was that all models were rejected by the data. Consequently, it seems interesting to look at the stylized facts regarding volatility and to compare them with model specifications. Equation (28) is useful for this task. As it has been shown that there is a levels effect in Art(k), in order to have constant volatility it would be necessary that

116

A. R. Pagan, A. D. Hall and 1I. Martin

there be some "co-levels" effect, analogous to the co-persistence phenomenon of the G A R C H literature - Bollerslev and Engle (1993) - i.e. even though Art(k) displays a levels effect the linear combination ~-~!Art(z 1) - ~ A r t ( 1 ) does not. This contention is easily rejected - a plot of that variable squared against rt-l (3) looks almost identical to Figure 1, and such an observation points to the proportional volatility model as being the appropriate one.

4. Conclusion This chapter has described methods of modeling the term structure that are to be found in the econometrics and finance literatures. By utilizing a factor representation we have been able to show that there are many similarities in the two approaches. However, there were also some differences. Within the econometrics literature it is common to assume that yields are integrated processes and that spreads constitute the co-integrating relations. Although the finance literature takes the stance that yields are near integrated but stationary, it emerges that the models used in that literature would not predict that the spreads are co-integrating errors if we actually replaced the stationarity assumption by one of a unit root. The reason for this outcome is found to lie in the assumption that the conditional volatility of yields is a function of the level of the yields. Empirical work tends to support such an hypothesis and we suggest that the consequences of such a relationship can be profound for testing propositions about the term structure. We also document a number of stylized facts about a set of data on yields that prove useful in assessing the likely adequacy of many of the models that are used in finance for capturing the term structure

References
Abken, P. A. (1993). Generalized method of moments tests of forward rate processes. Working Paper, 93-7. Federal Reserve Bank of Atlanta. Amin, K. I. and A. J. Morton (1994). Implied volatility functions in arbitrage-free term structure models. J. Financ. Econom. 35, 141-180. Anderson, H. M. (1994). Transaction costs and nonlinear adjustment towards equilibrium in the US treasury bill market. Mimeo, University of Texas at Austin. Baillie, R.T. and T. Bollerslev (1992). Prediction in dynamic models with time-dependent conditional variances, J. Econometrics 52, 91-113. Baillie, R. T., T. Bollerslev and H. O. Mikkelson (1993). Fractionally integrated autoregressive conditional heteroskedasticity. Mimeo, Michigan State University. Bollerslev T. and R. F. Engle (1993). Common persistence in conditional variances: Definition and representation. Econometrica 61, 167-186. Boudoukh, J. (1993). An equilibrium model of nominal bond prices with inflation-output correlation and stochastic volatility. J. Money, Credit and Banking 25, 636~65. Brennan M. J. and E. S. Schwartz (1979). A continuous time approach to the pricing of bonds. J. Banking Finance 3, 133-155. Brenner R. J., R. H. Harjes and K. F. Kroner (1994). Another look at alternative models of the shortterm interest rate. Mimeo, University of Arizona.

Modeling the term structure

117

Brown, S. J. and P. H. Dybvig (1986). The empirical implications of the Cox-Ingersoll-Ross theory of the term structure of intestest rates. J. Finance XLI, 617-632. Brown, R. H. and S. M. Schaefer (1994). The term structure of real interest rates and the Cox, Ingersoll and Ross model. J. Financ. Econom. 35, 3-42. Broze, L. O. Scaillet and J. M. Zakoian (1993). Testing for continuous-time models of the short-term interest rates. CORE Discussion Paper 9331. Campbell, J. Y. and R. J. Shiller (1991). Yield spreads and interest rate movements: A bird's eye view. Rev. Econom. Stud. 58, 495-514. Canova F. and J. Marrinan (1993). Reconciling the term structure of interest rates with the consumption based ICAP model. Mimeo, Brown University. Chan K. C., G. A. Karolyi, F. A. Longstaff and A. B. Sanders (1992). An empirical comparison of alternative models of the short-term interest rate. J. Finance XLVII. 1209-1227. Chen R. R. and L. Scott (1992). Pricing interest rate options in a two factor Cox-Ingersoll-Ross model of the term structure. Rev. Financ. Stud. 5, 613~536. Chen R. R. and L. Scott (1993). Maximum likelihood estimation for a multifactor equilibrium model of the term structure of interest rates. J. Fixed Income 3, 14-31. Conley T., L. P. Hansen, E. Luttmer and J. Scheinkman (1994). Estimating subordinated diffusions from discrete time data. Mimeo, University of Chicago. Constantinides, G. (1992). A theory of the nominal structure of interest rates. Rev. Financ. Stud. 5, 531-552. Cox, J. C., J. E. Ingersoll and S. A. Ross. (1985). A theory of the term structure of interest rates. Econometrica 53, 385-408. Duffie, D. and R. Kan (1993). A yield-factor model of interest rates. Mimeo, Graduate School of Business, Stanford University. Dybvig, P. H. (1989). Bonds and bond option pricing based on the current term structure. Working Paper, Washington University in St. Louis. Edmister, R. O. and D. B. Madan (1993). Informational content in interest rate term structures. Rev. Econom. Statist. 75, 695-699. Egginton, D. M. and S. G. Hall (1993). An investigation of the effect of funding on the slope of the yield curve. Working Paper No. 6, Bank of England. E1 Karoui, N. and V. Lacoste, (1992). Multifactor models of the term structure of interest rates. Working Paper. University of Paris VI. Engsted, T. and C. Tanggaard (1994). Cointegration and the US term structure. J. Banking Finance 18, 167-181. Evans, M. D. D. and K. L. Lewis (1994). Do stationary risk premia explain it all? Evidence from the term structure. J. Monetary Econom. 33, 285-318. Frydman, H. (1994). Asymptotic inference for the parameters of a discrete-time square-root process. Math. Finance 4, 169-181. Gallant, A. R. and G. Tauchen (1992). Which moments to match? Mimeo, Duke University. Gallant, A. R., D. Hsieh and G. Tauchen (1994). Estimation of stochastic volatility models with diagnostics. Mimeo, Duke University. Gonzalo, J. and C. W. J. Granger, (1991). Estimation of common long-memory components in cointegrated systems. UCSD, Discussion Paper 91-33. Gourirroux, C., A. Monfort and E. Renault (1993). Indirect inference. J. AppL Econometrics 8, $85Sl18. Gourirroux, C. and O. Scaillet (1994). Estimation of the term structure from bond data. Working Paper No. 9415 CEPREMAP. Hail, A. D., H. M. Anderson and C. W. J. Granger. (1992). A cointegration analysis of treasury bill yields. Rev. Econom. Statist. 74, 116-126. Heath, D., R. Jarrow and A. Morton (1992). 
Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica 60, 77-105. Hejazi, W. 1994. Are term premia stationary? Mimeo, University of Toronto.

118

A. R. Pagan, A. D. Hall and V. Martin

Ho, T. S. and S-B Lee (1986). Term structure movements and pricing interest rate contingent claims. J. Finance 41, 1011-1029. Johansen, S. (1988). Statistical analysis of cointegrating vectors. J. Econom. Dynamic Control 12, 231254. Johnson, P. A. (1994). On the number of common unit roots in the term structure of interest rates. Appl. Econom. 26, 815-820. Kearns, P. (1993). Volatility and the pricing of interest rate derivative claims. Unpublished doctoral dissertation, University of Rochester. Koedijk, K. G., F. G. J. A. Nissen, P. C. Schotman and C. C. P. Wolff (1993). The dynamics of shortterm interest rate volatility reconsidered. Mimeo, Limburg Institute of Financial Economics. Litterman, R and J. Scheinkman (1991). Common factors affecting bond returns. J. Fixed Income 1, 54-61. Longstaff, F. and E. S. Schwartz (1992). Interest rate volatility and the term structure: A two factor general equilibrium model. J. Finance XLVII 1259-1282. Marsh, T. A. and E. R. Rosenfeld (1983). Stochastic processes for interest rates and equilibrium bond prices. J. Finance XXXVIII, 635450. Mihlstein, G. N. (1974). Approximate integration of stochastic differential equations. Theory Probab. Appl. 19, 557-562. McCulloch, J. H. (1989). US term structure data. 1946-1987, Handbook of Monetary Economics 1, 672-715. McCulloch, J. H. and H. C. Kwon (1993). US term structure data. 1947-1991. Ohio State University Working Paper 93-6. Pearson, N. D. and T-S Sun (1994). Exploiting the conditional density in estimating the term structure: An application to the Cox, Ingersoll and Ross model, d. Fixed Income XLIX, 1279-1304. Pfann, G. A., P. C. Schotman and R. Tschernig (1994). Nonlinear interest rate dynamics and implications for the term structure. Mimeo, University of Limburg. Phillips, P. C. B. and B. E. Hansen (1990). Statistical inference in instrumental variables regression with I(1) processes. Rev. Econom. Stud. 57, 99-125. Shen, P. and R. M. Start (1992). Liquidity of the treasury bill market and the term structure of interest rates. Discussion paper 92-32. University of California at San Diego. Stock, J. H. and M. W. Watson (1988). Testing for common trends. J. Amer. Statist. Assoc. 83, 10971107. Thurston, D. C. (1994). A generalized method of moments comparison of discrete Heath-JarrowMorton interest rate models. Asia Pac. J. Mgmt. 11, 1-19. Vetzal, K. R. (1992). The impact of stochastic volatility on bond option prices. Working Paper 92-08. University of Waterloo. Institute of Insurance and Pension Research, Waterloo, Ontario. Zhang, Z. (1993). Treasury yield curves and cointegration. Appl. Econom. 25, 361-367.

G. S. Maddala, and C. R. Rao, eds., Handbookof Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved.

..)

Stochastic Volatility*
Eric Ghysels, A n d r e w C. Harvey and Eric Renault

1. Introduction The class of stochastic volatility (SV) models has its roots both in mathematical finance and financial econometrics. In fact, several variations of SV models originated from research looking at very different issues. Clark (1973), for instance, suggested to model asset returns as a function of a r a n d o m process of information arrival. This so-called time deformation approach yielded a time-varying volatility model of asset returns. Later Tauchen and Pitts (1983) refined this work proposing a mixture of distributions model of asset returns with temporal dependence in information arrivals. Hull and White (1987) were not directly concerned with linking asset returns to information arrival but rather were interested in pricing European options assuming continuous time SV models for the underlying asset. They suggested a diffusion for asset prices with volatility following a positive diffusion process. Yet another approach emerged from the work of Taylor (1986) who formulated a discrete time SV model as an alternative to Autoregressive Conditional Heteroskedasticity ( A R C H ) models. Until recently estimating Taylor's model, or any other SV model, remained almost infeasible. Recent advances in econometric theory have made estimation of SV models much easier. As a result, they have become an attractive class of models and an alternative to other classes such as A R C H . Contributions to the literature on SV models can be found both in mathematical finance and econometrics. Hence, we face quite a diverse set of topics. We say very little about A R C H models because several excellent surveys on the subject have appeared recently, including those by Bera and Higgins (1995), Bollerslev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994) and

* We benefitedfrom helpful comments from Torben Andersen, David Bates, Frank Diebold, Ren6 Garcia, Eric Jacquier and Neil Shephard on preliminary drafts of the paper. The first author would like to acknowledge the financial support of FCAR (Qurbec), SSHRC (Canada) as well as the hospitality and support of CORE (Louvain-la-Neuve,Belgium). The second author wishes to thank the ESRC for financial support. The third author would like to thank the Institut Universitairede France, the Frd~ration Frangaise des Socirt~s d'Assurance as well as CIRANO and C.R.D.E. for financial support. 119

120

E. Ghysels, A. C. Harvey and E. Renault

Diebold and Lopez (1995). Furthermore, since this chapter is written for the Handbook of Statistics, we keep the coverage of the mathematical finance literature to a minimum. Nevertheless, the subject of option pricing figures prominently out of necessity. Indeed, Section 2, which deals with definitions of volatility has extensive coverage of Black-Scholes implied volatilities. It also summarizes empirical stylized facts and concludes with statistical modeling of volatility. The reader with a greater interest in statistical concepts may want to skip the first three subsections of Section 2 which are more finance oriented and start with Section 2.4. Section 3 discusses discrete time models, while Section 4 reviews continuous time models. Statistical inference of SV models is the subject of Section 5. Section 6 concludes.

2. Volatility in financial markets


Volatility plays a central role in the pricing of derivative securities. The BlackScholes model for the pricing of an European option is by far the most widely used formula even when the underlying assumptions are known to be violated. Section 2.1 will therefore take the Black-Scholes model as a reference point from which to discuss several notions of volatility. A discussion of stylized facts regarding volatility and option prices will appear next in Section 2.2. Both sections set the scene for a formal framework defining stochastic volatility which is treated in Section 2.3. Finally, Section 2.4 introduces the statistical models of stochastic volatility.

2.1. The Black-Scholes model and implied volatilities


More than half a century after the seminal work of Louis Bachelier (1900), continuous time stochastic processes have become a standard tool to describe the behavior of asset prices. The work of Black and Scholes (1973) and Merton (1990) has been extremely influential in that regard. In Section 2.1.1 we review some of the assumptions that are made when modeling asset prices by diffusions, in particular to present the concept of instantaneous volatility. In Section 2.1.2 we turn to option pricing models and the various concepts of implied volatility.

2.1.1. An instantaneous volatility concept We consider a financial asset, say a stock, with today's (time t) market price denoted by St. 2 Let the information available at time t be described by It and consider the conditional distribution of the return St+h/St of holding the asset over the period [t,t + hi given It. 3 A maintained assumption throughout this chapter will be that asset returns have finite conditional expectation given It or:
2 Here and in the remainder of the paper we will focus on options written on stocks or exchange rates. The large literature on the term structure of interest rates and related derivative securities will not be covered. 3 Section 2.3 will provide a more rigorous discussion of information sets. It should also be noted that we will indifferently be using conditional distributions of asset prices St+h and of returns St+h/St since St belongs to It.

Stochast& volatility

121

Et(St+h/St) = S~-lEtSt+h < Vt(St+h/St) = St2VtSt+h

+co

(2.1.1)

and likewise finite conditional variance given < +~ .

It,

namely (2.1.2)

The continuously compounded expected rate of return will be characterized by h -1 log Et(St+h/St). Then a first assumption can be stated as follows: ASSUMPTION 2.1.1.A. The continuously compounded expected rate of return converges almost surely towards a finite value i~s(It) when h > 0 goes to zero. F r o m this assumption one has EtSt+h - St ~'~ h#s(It)St or in terms of its differential representation: d Et(S~) /~s (It)St almost surely (2.1.3)

where the derivatives are taken from the right. Equation (2.1.3) is sometimes loosely defined as: Et(dSt)= ps(lt)Stdt. The next assumption pertains to the conditional variance and can be stated as: ASSUMPTION 2.1.1.B. The conditional variance of the return h-~Vt(St+h/St) converges almost surely towards a finite value a2s(It) when h > 0 goes to zero. Again, in terms of its differential representation this amounts to:

ff--~Vart(Sz) ~=t= a2(It)S_2 almost

surely

(2.1.4)

and one loosely associates with the expression Vt(dSt) = a~(It)S2tdt. Both assumptions 2.1.1.A and B lead to a representation of the asset price dynamics by an equation of the following form:

dSt = #s(It)Stdt + as(lt)StdWt

(2.1.5)

where Wt is a standard Brownian Motion. Hence, every time a diffusion equation is written for an asset price process we have automatically defined the so-called instantaneous volatility process as(It) which from the above representation can also be written as:
q 1/2

as(It)

= [lim Lhi h -1

Vt(St+h/St)J

(2.1.6)

Before turning to the next section we would like to provide a brief discussion of some of the foundations for the Assumptions 2.1.1.A and B. It was noted that Bachelier (1900) proposed Brownian Motion process as a model of stock price movements. In modern terminology this amounts to the random walk theory of asset pricing which claims that asset returns ought not to be predictable because of the informational efficiency of financial markets. Hence, it assumes that returns

122

E. Ghysels, A. C. Harvey and E. Renault

on consecutive regularly sampled periods [t + k, t + k + 1],k = 0 , 2 , . . . ,h - 1 are independently (identically) distributed. With such a benchmark in mind, it is natural to view the expectation and the variance of the continuously compounded rate of return log (St+h/St) as proportional to the maturity h of the investment. Obviously we no longer use Brownian Motions as a process for asset prices but it is nevertheless worth noting that Assumptions 2.1.1.A and B also imply that the expected rate of return and the associated squared risk (in terms of variance of the rate of return) of an investment over an infinitely-short interval [t, t + hi is proportional to h. Sims (1984) provided some rationale for both assumptions through the concept of "local unpredictability". To conclude, let us briefly discuss a particular special case of (2.1.5) predominantly used in theoretical developments and also highlight an implicit restriction we made. When #s(It) = #s and as(It) = as are constants for all t the asset price is a Geometric Brownian Motion. This process was used by Black and Scholes (1973) to derive their well-known pricing formula for European options. Obviously, since as(It) is a constant we no longer have an instantaneous volatility process but rather a single parameter as - a situation which undoubtedly greatly simplifies many things including the pricing of options. A second point which needs to be stressed is that Assumptions 2.1.1.A and B allow for the possibility of discrete jumps in the asset price process. Such jumps are typically represented by a Poisson process and have been prominent in the option pricing literature since the work of Merton (1976). Yet, while the assumptions allow in principle for jumps, they do not appear in (2.1.5). Indeed, throughout this chapter we will maintain the assumption of sample path continuity and exclude the possibility of jumps as we focus exclusively on SV models.
2.1.2. Option prices and implied volatilities It was noted in the introduction that SV models originated in part from the literature on the pricing of options. We have witnessed over the past two ,decades a spectacular growth in options and other derivative security markets. Such markets are sometimes characterized as places where "volatilities are traded". In this section we will provide the rationale for such statements and study the relationship between so-called options implied volatilities and the concepts of instantaneous and averaged volatilities of the underlying asset return process. The Black-Scholes option pricing model is based on a Log-Normal or Geometric Brownian Motion model for the underlying asset price: dSt = ~sStdt + asStdWt

(2.1.7)

where #s and as are fixed parameters. A European call option with strike price K and maturity t + h has a payoff:
f St+h - K if St+h >_ K [St+h - K] += ~. 0 otherwise

(2.1.8)

Stochastic volatility

123

Since the seminal Black and Scholes (1973) paper, there is now a well established literature proposing various ways to derive the pricing formula of such a contract. Obviously, it is beyond the scope of this paper to cover this literature in detail. 4 Instead, the bare minimum will be presented here allowing us to discuss the concepts of interest regarding volatility. With continuous costless trading assumed to be feasible, it is possible to form in the Black-Scholes economy a portfolio using one call and a short-sale strategy for the underlying stock to eliminate all risk. This is why the option price can be characterized without ambiguity, using only arbitrage arguments, by equating the market rate of return of the riskless portfolio containing the call option with the risk-free rate. Moreover, such arbitrage-based option pricing does not depend on individual preferences. 5 This is the reason why the easiest way to derive the Black-Scholes option pricing formula is via a "risk-neutral world", where asset price processes are specified through a modified probability measure, referred to as the risk neutral probability measure denoted Q (as discussed more explicitly in Section 4.2). This fictitious world where probabilities in general do not coincide with the Data Generating Process (DGP), is only used to derive the option price which remains valid in the objective probability setup. In the risk neutral world we have:

dSt/St = rtdt + asdWt Ct = C(St, K, h, t) = B(t, t + h)EQ(st+h - K) +

(2.1.9) (2.1.10)

where Et Q is the expectation under Q, B(t, t + h) is the price at time t of a pure discount bond with payoff one unit at time t + h and rt = -~ln~ ~ Log B(t, t + h)
1

(2.1.11)

is the riskless instantaneous interest rate. 6 We have implicitly assumed that in this market interest rates are nonstochastic (Wt is the only source of risk) so that:

B(t,t + h) = e x p [ - ft+hr~d~] .

(2.1.12)

By definition, there are no risk premia in a risk neutral context. Therefore rt coincides with the instantaneous expected rate of return of the stock and hence
4 See however Jarrow and Rudd (1983), Cox and Rubinstein (1985), Duffie (1989), Duffle (1992), Hull (1993) or Hull (1995) among others for more elaborate coverage of options and other derivative securities. 5 This is sometimes refered to as preferencefree option pricing. This terminology may somewhat be misleading since individual preferences are implicitly taken into account in the market price of the stock and of the riskless bond. However, the option price only depends on individual preferences through the stock and bond market prices. 6 For notational convenience we denote by the same symbol Wt a Brownian Motion under P (in 2.1.7) and under Q (in 2.1.9). Indeed, Girsanov's theorem establishes the link between these two processes (see e.g. Duffle (1992) and section 4.2.1).

124

E. Ghysels, A. C. Harvey and E. Renault

the call option price Ct is the discounted value of its terminal payoff (St+h -- K ) + as stated in (2.1.10). The log-normality of St+h given St allows one to compute the expectation in (2.1.10) yielding the call price formula at time t:
Ct = St4)(dt) - KB(t, t + h)c~(dt - asx/h)

(2.1.13)

where ~b is the cumulative standard normal distribution function while dt will be defined shortly. Formula (2.1.13) is the so-called Black-Scholes option pricing formula. Thus, the option price Ct depends on the stock price St, the strike price K and the discount factor B(t, t + h). Let us now define:
xt = Log St/KB(t, t + h) .

(2.1.14)

Then we have:
C,/St = 4)(dr) - e-X'4)(dt - asX/~)

(2.1.15)

with dt = ( x t / a s x f h ) + asx/~/2. It is easy to see the critical role played by the quantity xt, called the moneyness of the option.
-

I f x t = 0, the current stock price St coincides with the present value of the strike price K. In other words, the contract m a y appear to be fair to somebody who would not take into account the stochastic changes of the stock price between t and t + h. We shall say that we have in this case an at the m o n e y option. - I f xt > 0 (respectively xt < 0) we shall say that the option is in the money (respectively out the money). 7

It was noted before that the Black-Scholes formula is widely used among practitioners, even when its assumptions are known to be violated. In particular the assumption of a constant volatility as is unrealistic (see Section 2.2 for empirical evidence). This motivated Hull and White (1987) to introduce an option pricing model with stochastic volatility assuming that the volatility itself is a state variable independent of Wt: 8
dSt/St = rtdt + astdWt (ast)te[o,r], (Wt)tE[O,T] independent Markovian .

(2.1.16)

It should be noted that (2.1.16) is still written in a risk neutral context since rt coincides with the instantaneous expected return of the stock. On the other hand the exogenous volatility risk is not directly traded, which prevents us from de-

7 We use here a slightly modified terminology with respect to the usual one. Indeed, it is more common to call at the money/in the money/out of the money options, when St = K/St > K/St < K respectively. From an economic point of view, it is more appealing to compare St with the present value of the strike price K. 8 Other stochastic volatility models similar to Hull and White (1987) appear in Johnson and Shanno (1987), Scott (1987), Wiggins (1987), Chesney and Scott (1989), Stein and Stein (1991) and Heston (1993) among others.

Stochastic volatility

125

fining unambiguously a risk neutral probability measure, as discussed in more detail in Section 4.2. Nevertheless, the option pricing formula (2.1.10) remains valid provided the expectation is computed with respect to the joint probability distribution of the Markovian process (S, as), given (St, ast).9 We can then rewrite (2.1.10) as follows:
Ct = B(t, t + h)Et(St+h - K) + = B ( t , t + h)Et{E[(St+h - K)+[(~7Sz)t<z<t+h] }

(2.1.17)

where the expectation inside the brackets is taken with respect to the conditional probability distribution of St+h given It and a volatility path a&, t < z < t + h. However, since the volatility process O'sT is independent of Wt, we obtain using (2.1.15)
B ( t , t + h)Et[(St+h - K)+[(trs~)t<~<t+h] = StEt[c~(dlt) - e-X'49(d2t)]

(2.1.18)

Here dlt and d2t are defined as follows:


dlt = (xt/v(t, t + h ) v ~ ) + ~(t, t + h ) v ~ / 2 d2t = dl - ~;(t, t + h ) v ~

where 7(t, t + h) > 0 and:


__ 1 [ t + h

72(t,t + h) - ~ j t

tr2~dz .

(2.1.19)

This yields the so-called Hull and White option pricing formula:
Ct = StEt[~9(dlt) - e-X'q~(d2t)] ,

(2.1.20)

where the expectation is taken with respect to the conditional probability distribution (for the risk neutral probability measure) of y(t, t + h) given rrSt .10 In the remainder of this section we will assume that observed option prices obey Hull and White's formula (2.1.20). Then option prices would yield two types of implied volatility concepts: (1) an instantaneous implied volatility, and (2) an averaged implied volatility. To make this more precise, let us assume that the risk neutral probability distribution belongs to a parametric family, Po, 0 E 6). Then, the Hull and White option pricing formula yields an expression for the option price as a function:
Ct = StF[crst, xt, 0o]

(2.1.21)

9 We implicitly assume here that the available information This assumption will be discussed in Section 4.2.

It contains the past values (Sz, a~)~<t.

10 The conditioning is with respect to at since it summarizes the relevant information taken from It (the process a is assumed to be Markovian and independent of W).

126

E. Ghysels, A. C. Harvey and E. Renault

where 0o is the true unknown value of the parameters. Formula (2.1.21) reveals why it is often claimed that "option markets can be thought of as markets trading volatility" (see e.g. Stein (1989)). As a matter of fact, if for any given (xt, 0), F(.,xt, O) is one-to-one, then equation (2.1.21) can be inverted to yield an implied

instantaneous volatility: ix
o'~P(0) = GISt, Ct,xt, 0] . (2.1.22)

Bajeux and Rochet (1992), by showing that this one-to-one relationship between option prices and instantaneous volatility holds, in fact formalize the use of option markets as an appropriate instrument to hedge volatility risk. Obviously implied instantaneous volatilities (2.1.22) could only be useful in practice for pricing or hedging derivative instruments when we know the true unknown value 0o or, at least, are able to compute a sufficiently accurate estimate of it. However, the difficulties involved in estimating SV models has for long prevented their widespread use in empirical applications. This is the reason why practitioners often prefer another concept of implied volatility, namely the socalled Blaek-Scholes implied volatility introduced by Latane and Rendleman (1976). It is a process ~oimp(t,t + h) defined by:

dlt =

Ct = St[4(dlt) - e-X'c~(d2t)] (Xt/o~ilnP(t, t + h)x/h) + (oimP(t, t + h)v/-h/2

(2.1.23)

d2t = d l t - ogimP(t, t h ) v f h

where Ct is the observed option price. 12 The Hull and White option pricing model can indeed be seen as a theoretical foundation for this practice; the comparison between (2.1.23) and (2.1.20) allows us to interpret the Black-Scholes implied volatility fOimP(t, t + h) as an implied averaged volatility since f~imP(t, t h) is something like a conditional expectation of y(t, t + h) (assuming observed option prices coincide with the Hull and White pricing formula). To be more precise, let us consider the simplest case of at the money options (the general case will be studied in Section 4.2). Since xt = 0 it follows that dzt = - d l t and therefore: q~(dat) - e-X@(d2t) = 2qS(dlt) - 1. Hence, ~Oio mp(t, t + h) (the index o is added to make explict that we consider at the money options) is defined by:

\Phi\big(\omega_o^{imp}(t,t+h)\sqrt{h}/2\big) = E_t\,\Phi\big(\gamma(t,t+h)\sqrt{h}/2\big) ,   (2.1.24)

that is, by equating the observed (Hull and White) price to the Black-Scholes price of the at the money option. Since the cumulative standard normal distribution function is roughly linear in the neighborhood of zero, it follows that (for small maturities h):

\omega_o^{imp}(t,t+h) \simeq E_t\,\gamma(t,t+h) .

This yields an interpretation of the Black-Scholes implied volatility ω^imp(t, t+h) as an implied average volatility:

\omega_o^{imp}(t,t+h) \simeq E_t\left[\frac{1}{h}\int_t^{t+h}\sigma_\tau^2\, d\tau\right]^{1/2} .   (2.1.25)

11 The fact that F(·, x_t, θ) is one-to-one is shown to be the case for any diffusion model on σ_t under certain regularity conditions, see Bajeux and Rochet (1992).
12 We do not explicitly study here the dependence between ω^imp(t, t+h) and the various related processes C_t, S_t, x_t. This is the reason why, for the sake of simplicity, this dependence is not apparent in the notation ω^imp(t, t+h).
2.2. Some stylized facts


The search for model specification and selection is always guided by empirical stylized facts. A model's ability to reproduce such stylized facts is a desirable feature and failure to do so is most often a criterion to dismiss a specification, although one typically does not try to fit or explain all possible empirical regularities at once with a single model. Stylized facts about volatility have been well documented in the ARCH literature, see for instance Bollerslev, Engle and Nelson (1994). Empirical regularities regarding derivative securities and implied volatilities are also well covered, for instance, by Bates (1995a). In this section we will summarize empirical stylized facts, complementing and updating some of the material covered in the aforementioned references.

(a) Thick tails


Since the early sixties it has been observed, notably by Mandelbrot (1963) and Fama (1963, 1965), among others, that asset returns have leptokurtic distributions. As a result, numerous papers have proposed to model asset returns as i.i.d. draws from fat-tailed distributions such as the Paretian or Lévy.

(b) Volatility clustering


Casual observation of financial time series reveals bunching of high and low volatility episodes. In fact, volatility clustering and thick tails of asset returns are intimately related. The latter is a static explanation, whereas a key insight provided by ARCH models is a formal link between dynamic (conditional) volatility behavior and (unconditional) heavy tails. ARCH models, introduced by Engle (1982) and the numerous extensions thereafter, as well as SV models, are essentially built to mimic volatility clustering. It is also widely documented that ARCH effects disappear with temporal aggregation, see e.g. Diebold (1988) and Drost and Nijman (1993).

(c) Leverage effects


A phenomenon coined by Black (1976) as the leverage effect suggests that stock price movements are negatively correlated with volatility. Because falling stock prices imply an increased leverage of firms, it is believed that this entails more uncertainty and hence volatility. Empirical evidence reported by Black (1976), Christie (1982) and Schwert (1989) suggests, however, that leverage alone is too


small to explain the empirical asymmetries one observes in stock prices. Others reporting empirical evidence regarding leverage effects include Nelson (1991), Gallant, Rossi and Tauchen (1992, 1993), Campbell and Kyle (1993) and Engle and Ng (1993).

(d) Information arrivals

Asset returns are typically measured and modeled with observations sampled at fixed frequencies such as daily, weekly or monthly observations. Several authors, including Mandelbrot and Taylor (1967) and Clark (1973), suggested linking asset returns explicitly to the flow of information arrival. In fact, it was already noted that Clark proposed one of the early examples of SV models. Information arrival is non-uniform through time and quite often not directly observable. Conceptually, one can think of asset price movements as the realization of a process Y_t = Y*_{Z_t}, where Z_t is a so-called directing process. This positive nondecreasing stochastic process Z_t can be thought of as being related to the arrival of information. This idea of time deformation or subordinated stochastic processes was used by Mandelbrot and Taylor (1967) to explain fat tailed returns, by Clark (1973) to explain volatility, and was recently refined and further explored by Ghysels, Gouriéroux and Jasiak (1995a). Moreover, Easley and O'Hara (1992) provide a microstructure model involving time deformation. In practice, it suggests a direct link between market volatility and (1) trading volume, (2) quote arrivals, (3) forecastable events such as dividend announcements or macroeconomic data releases, (4) market closures, among many other phenomena linked to information arrival. Regarding trading volume and volatility there are several papers documenting stylized facts, notably linking high trading volume with market volatility, see for example Karpoff (1987) or Gallant, Rossi and Tauchen (1992).^13 The intraday patterns of volatility and market activity measured, for instance, by quote arrivals are also well known and documented. Wood, McInish and Ord (1985) and Harris (1986) studied this phenomenon for securities markets and found a U-shaped pattern with volatility typically high at the open and close of the market. The around the clock trading in foreign exchange markets also yields a distinct volatility pattern which is tied to the intensity of market activity and produces strong seasonal patterns. The intraday patterns for FX markets are analyzed, for instance, by Müller et al. (1990), Baillie and Bollerslev (1991), Harvey and Huang (1991), Dacorogna et al. (1993), Bollerslev and Ghysels (1994), Andersen and Bollerslev (1995), Ghysels, Gouriéroux and Jasiak (1995b) among others. Another related empirical stylized fact is the effect of overnight and weekend market closures on volatility. Fama (1965) and French and Roll (1986) have found that information accumulates more slowly when the NYSE and AMEX are closed, resulting in higher volatility on those markets after weekends
13 There are numerous models, theoretical and empirical, linking trading volume and asset returns which we cannot discuss in detail. A partial list includes Foster and Viswanathan (1993a,b), Ghysels and Jasiak (1994a,b), Hausman and Lo (1991), Huffman (1987), Lamoureux and Lastrapes (1990, 1993), Wang (1993) and Andersen (1995).


and holidays. Similar evidence for FX markets has been reported by Baillie and Bollerslev (1989). Finally, numerous papers have documented increased volatility of financial markets around dividend announcements (Cornell (1978), Patell and Wolfson (1979, 1981)) and macroeconomic data releases (Harvey and Huang (1991, 1992), Ederington and Lee (1993)).
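To make the time deformation idea concrete, the following sketch (in Python; all parameter values are illustrative assumptions) draws a persistent information-arrival intensity, a Poisson number of information events per day, and returns built as Brownian increments evaluated on that random operational clock, in the spirit of Mandelbrot and Taylor (1967) and Clark (1973). Even with Gaussian building blocks, the subordinated returns display fat tails and volatility clustering.

import numpy as np

rng = np.random.default_rng(0)
T = 50_000
# persistent (log) intensity of information arrivals -- illustrative AR(1) dynamics
log_lam = np.full(T, 3.0)
for t in range(1, T):
    log_lam[t] = 3.0 + 0.95 * (log_lam[t - 1] - 3.0) + 0.25 * rng.standard_normal()
n_events = rng.poisson(np.exp(log_lam))                   # increments of the directing process Z_t
r = np.sqrt(n_events) * 0.001 * rng.standard_normal(T)    # returns = W(Z_t) - W(Z_{t-1})

kurtosis = ((r - r.mean()) ** 4).mean() / r.var() ** 2
print("kurtosis (3 for a normal):", round(kurtosis, 2))
print("corr(r_t^2, r_{t-1}^2):", round(np.corrcoef(r[1:] ** 2, r[:-1] ** 2)[0, 1], 3))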

(e) Long memory and persistence


Generally speaking, volatility is highly persistent. Particularly for high frequency data one finds evidence of near unit root behavior of the conditional variance process. In the ARCH literature numerous estimates of GARCH models for stock market, commodity, foreign exchange and other asset price series are consistent with an IGARCH specification. Likewise, estimation of stochastic volatility models shows similar patterns of persistence (see for instance Jacquier, Polson and Rossi (1994)). These findings have led to a debate regarding modeling persistence in the conditional variance process either via a unit root or a long memory process. The latter approach has been suggested both for ARCH and SV models, see Baillie, Bollerslev and Mikkelsen (1993), Breidt et al. (1993), Harvey (1993) and Comte and Renault (1995). Ding, Granger and Engle (1993) studied the serial correlations of |r(t, t+1)|^c for positive values of c, where r(t, t+1) is a one-period return on a speculative asset. They found |r(t, t+1)|^c to have quite high autocorrelations for long lags, while the strongest temporal dependence was for c close to one. This result, initially found for the daily S&P500 return series, was also shown to hold for other stock market indices, commodity markets and foreign exchange series (see Granger and Ding (1994)).
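The Ding, Granger and Engle (1993) calculation is easy to reproduce. The sketch below (Python) computes sample autocorrelations of |r_t|^c for several powers c; the simulated return series is only an illustrative stand-in for real data.

import numpy as np

def acf_abs_power(r, c, max_lag=20):
    # sample autocorrelations of |r_t|^c, as studied by Ding, Granger and Engle (1993)
    x = np.abs(r) ** c
    x = x - x.mean()
    denom = (x ** 2).sum()
    return np.array([(x[lag:] * x[:-lag]).sum() / denom for lag in range(1, max_lag + 1)])

# illustrative stand-in series with volatility clustering (replace r with observed returns)
rng = np.random.default_rng(1)
h = np.zeros(10_000)
for t in range(1, len(h)):
    h[t] = 0.97 * h[t - 1] + 0.25 * rng.standard_normal()
r = np.exp(h / 2) * rng.standard_normal(len(h))

for c in (0.5, 1.0, 1.5, 2.0):
    print(c, acf_abs_power(r, c, max_lag=1)[0])   # first-order autocorrelation by power c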

(f) Volatility comovements


There is an extensive literature on international comovements of speculative markets. Whether globalization of equity markets increases price volatility and correlations of stock returns has been the subject of many recent studies, including von Furstenberg and Jeon (1989), Hamao, Masulis and Ng (1990), King, Sentana and Wadhwani (1994), Harvey, Ruiz and Sentana (1992), and Lin, Engle and Ito (1994). Typically one uses factor models to model the commonality of international volatility, as in Diebold and Nerlove (1989), Harvey, Ruiz and Sentana (1992), Harvey, Ruiz and Shephard (1994), or explores so-called common features, see e.g. Engle and Kozicki (1993), and common trends as studied by Bollerslev and Engle (1993).

(g) Implied volatility correlations


Stylized facts are typically reported as model-free empirical observations.^14

14 This is in some part fictitious even for macroeconomic data, for instance when they are detrended or seasonally adjusted. Both detrending and seasonal adjustment are model-based. For the potentially severe impact of detrending on stylized facts see Canova (1992) and Harvey and Jaeger (1993), and for the effect of seasonal adjustment on empirical regularities see Ghysels et al. (1993).

Implied volatilities are obviously model-based as they are calculated from a pricing


equation of a specific model, namely the Black and Scholes model as noted in Section 2.1.3. Since they are computed on a daily basis there is obviously an internal inconsistency since the model presumes constant volatility. Yet, since many option prices are in fact quoted through their implied volatilities it is natural to study the time series behavior of the latter. Often one computes a composite measure since synchronous option prices with different strike prices and maturities for the same underlying asset yield different implied volatilities. The composite measure is usually obtained from a weighting scheme putting more weight on the near-the-money options which are the most heavily traded in organized markets. 15 The time series properties of implied volatilities obtained from stock, stock index and currency options are quite similar. They appear stationary and are well described by a first order autoregressive model (see Merville and Pieptea (1989) and Sheikh (1993) for stock options, Poterba and Summers (1986), Stein (1989), Harvey and Whaley (1992) and Diz and Finucane (1993) for the S&P100 contract and Taylor and Xu (1994), Campa and Chang (1995) and Jorion (1995) for currency options). It was noted from equation (2.1.25) that implied (average) volatilities are expected to contain information regarding future volatility and therefore should predict the latter. One typically tests such hypotheses by regressing realized volatilities on past implied ones. The empirical evidence regarding the predictable content of implied volatilities is mixed. The time series study of Lamoureux and Lastrapes (1993) considered options on non-dividend paying stocks and compared the forecasting performance of GARCH, implied volatility and historical volatility estimates and found that implied volatility forecasts, although biased as one would expect from (2.1.25), outperform the others. In sharp contrast, Canina and Figlewski (1993) studied S&P100 index call options for which there is an extremely active market. They found that implied volatilities were virtually useless in forecasting future realized volatilities of the S&P100 index. In a different setting using weekly sampling intervals for S&P100 option contracts and a different sample Day and Lewis (1992) not only found that implied volatilities had a predictive content but also were unbiased. Studies examining options on foreign currencies, such as Jorion (1995), also found that implied volatilities were predicting future realizations and that GARCH as well as historical volatilities were not outperforming the implied measures of volatility.
(h) The term structure of implied volatilities

The Black-Scholes model predicts a flat term structure of volatilities. In reality, the term structure of at-the-money implied volatilities is typically upward sloping when short term volatilities are low and the reverse when they are high (see Stein (1989)).

15 Different weighting schemes have been suggested, see for instance Latane and Rendleman (1976), Chiras and Manaster (1978), Beckers (1981), Whaley (1982), Day and Lewis (1988), Engle and Mustafa (1992) and Bates (1995b).

Taylor and Xu (1994) found that the term structure of implied


volatilities from foreign currency options reverses slope every few months. Stein (1989) also found that the actual sensitivity of medium to short term implied volatilities was greater than the estimated sensitivity from the forecast term structure and concluded that medium term implied volatilities overreacted to information. Diz and Finucane (1993) used different estimation techniques and rejected the overreaction hypothesis, and instead reported evidence suggesting underreaction.

(i) Smiles

If option prices in the market were conformable with the Black-Scholes formula, all the Black-Scholes implied volatilities corresponding to various options written on the same asset would coincide with the volatility parameter σ of the underlying asset. In reality this is not the case, and the Black-Scholes implied volatility ω^imp(t, t+h) defined by (2.1.23) depends heavily on the calendar time t, the time to maturity h and the moneyness x_t = log S_t/KB(t, t+h) of the option. This may produce various biases in option pricing or hedging when BS implied volatilities are used to evaluate new options with different strike prices K and maturities h. These price distortions, well known to practitioners, are usually documented in the empirical literature under the terminology of the smile effect, where the so-called "smile" refers to the U-shaped pattern of implied volatilities across different strike prices. More precisely, the following stylized facts are extensively documented (see for instance Rubinstein (1985), Clewlow and Xu (1993), Taylor and Xu (1993)):
- The U-shaped pattern of ω^imp(t, t+h) as a function of K (or log K) has its minimum centered at near-the-money options (discounted K close to S_t, i.e. x_t close to zero).
- The volatility smile is often but not always symmetric as a function of log K (or of x_t). When the smile is asymmetric, the skewness effect can often be described as the addition of a monotonic curve to the standard symmetric smile: if a decreasing curve is added, implied volatilities tend to rise more for decreasing than for increasing strike prices and the implied volatility curve has its minimum out of the money. In the reverse case (addition of an increasing curve), implied volatilities tend to rise more with increasing strike prices and their minimum is in the money.
- The amplitude of the smile increases quickly when time to maturity decreases. Indeed, for short maturities the smile effect is very pronounced (BS implied volatilities for synchronous option prices may vary between 15% and 25%) while it almost completely disappears for longer maturities.

It is widely believed that volatility smiles have to be explained by a model of stochastic volatility. This is natural for several reasons: First, it is tempting to propose a model of stochastically time varying volatility to account for stochastically time varying BS implied volatilities. Moreover, the decreasing amplitude of the smile as a function of time to maturity is conformable with a formula like (2.1.25). Indeed, it shows that, when time to maturity is increased,


temporal aggregation of volatilities erases conditional heteroskedasticity, which decreases the smile phenomenon. Finally, the skewness itself may also be attributed to the stochastic feature of the volatility process and, above all, to the correlation of this process with the price process (the so-called leverage effect). Indeed, this effect, while sizable for stock price data, is small for interest rate and exchange rate series, which is why the skewness of the smile is more often observed for options written on stocks. Nevertheless, it is important to be cautious about tempting associations: stochastic implied volatility and stochastic volatility; asymmetry in stocks and skewness in the smile. As will be discussed in Section 4, such analogies are not always rigorously proven. Moreover, other arguments to explain the smile and its skewness (jumps, transaction costs, bid-ask spreads, non-synchronous trading, liquidity problems, ...) have also to be taken into account, both for theoretical and empirical reasons. For instance, there exists empirical evidence suggesting that the most expensive options (the upper parts of the smile curve) are also the least liquid; skewness may therefore be attributed to specific configurations of liquidity in option markets.

2.3. Information sets


So far we have left the specification of information sets vague. This was done on purpose, to focus on one issue at a time. In this section we need to be more formal regarding the definition of information, since it will allow us to clarify several missing links between the various SV models introduced in the literature and also between SV and ARCH models. We know that SV models emerged from research looking at a very diverse set of issues. In this section we will try to define a common thread and a general unifying framework. We will accomplish this through a careful analysis of information sets and associate with it notions of non-causality in the Granger sense. These causality conditions will allow us to characterize in Section 2.4 the distinct features of ARCH and SV models.^16

2.3.1. State variables and information sets

The Hull and White (1987) model is a simple example of a derivative asset pricing model where the stock price dynamics are governed by some unobservable state variables, such as random volatility. More generally, it is convenient to assume that a multivariate diffusion process U_t summarizes the relevant state variables in the sense that:

dS_t/S_t = \mu_t\, dt + \sigma_t\, dW_t
dU_t = \gamma_t\, dt + \delta_t\, dW_t^U   (2.3.1)
Cov(dW_t, dW_t^U) = \rho_t\, dt

16 The analysis in this section has some features in common with Andersen (1992) regarding the use of information sets to clarify the difference between SV and ARCH type models.


where the stochastic processes μ_t, σ_t, γ_t, δ_t and ρ_t are I_t^U = σ[U_τ, τ ≤ t] adapted (Assumption 2.3.1). This means that the process U summarizes the whole dynamics of the stock price process S (which justifies the terminology "state" variable) since, for a given sample path (U_τ)_{0≤τ≤T} of state variables, consecutive returns S_{t_{k+1}}/S_{t_k}, 0 < t_1 < t_2 < ... < t_k ≤ T, are stochastically independent and lognormal (as in the benchmark BS model). The arguments of Section 2.1.2 can be extended to the state variables framework (see Garcia and Renault (1995)) discussed here. Indeed, such an extension provides a theoretical justification for the common use of the Black and Scholes model as a standard method of quoting option prices via their implied volatilities.^17 In fact, it is a way of introducing neglected heterogeneity in the BS option pricing model (see Renault (1995), who draws attention to the similarities with introducing heterogeneity in microeconometric models of labor markets, etc.). In continuous time models, available information at time t for traders (whose information determines option prices) is characterized by continuous time observations of both the state variable sample path and the stock price sample path, namely:

I_t = \sigma[U_\tau, S_\tau;\ \tau \le t] .   (2.3.2)

2.3.2. Discrete sampling and Granger noncausality

In the next section we will treat discrete time models explicitly. This necessitates formulating discrete time analogues of equation (2.3.1). The discrete sampling and Granger noncausality conditions discussed here will bring us a step closer to building a formal framework for statistical modeling using discrete time data. Clearly, a discrete time analogue of equation (2.3.1) is:

\log S_{t+1}/S_t = \mu(U_t) + \sigma(U_t)\,\varepsilon_{t+1}   (2.3.3)

provided we impose some restrictions on the process ε_t. The restrictions we want to impose must be flexible enough to accommodate phenomena such as leverage effects. A setup that does this is the following:

ASSUMPTION 2.3.2.A. The process ε_t in (2.3.3) is i.i.d. and not Granger-caused by the state variable process U_t.

ASSUMPTION 2.3.2.B. The process ε_t in (2.3.3) does not Granger-cause U_t.

Assumption 2.3.2.B is useful for the practical use of BS implied volatilities, as it is the discrete time analogue of Assumption 2.3.1 where it is stated that the coefficients of the process U are I^U adapted (for further details see Garcia and

17 Garcia and Renault (1995) argued that Assumption 2.3.1 is essential to ensure the homogeneity of option prices with respect to the pair (stock price, strike price), which in turn ensures that BS implied volatilities do not depend on the stock price level but only on the moneyness S/K. This homogeneity property was first emphasized by Merton (1973).


Renault (1995)). Assumption 2.3.2.A is important for the statistical interpretation of the functions μ(U_t) and σ(U_t) respectively as trend and volatility coefficients, namely,

E[\log S_{t+1}/S_t \mid (S_\tau/S_{\tau-1};\ \tau \le t)]
= E\big[\,E[\log S_{t+1}/S_t \mid (U_\tau, \varepsilon_\tau;\ \tau \le t)] \mid (S_\tau/S_{\tau-1};\ \tau \le t)\big]   (2.3.4)
= E[\mu(U_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)]

since E[ε_{t+1} | (U_τ, ε_τ; τ ≤ t)] = E[ε_{t+1} | ε_τ; τ ≤ t] = 0 due to the Granger noncausality from U_t to ε_t of Assumption 2.3.2.A. Likewise, one can easily show that

Var[\log S_{t+1}/S_t - \mu(U_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)] = E[\sigma^2(U_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)] .   (2.3.5)

Implicitly we have introduced a new information set in (2.3.4) and (2.3.5) which, besides I_t defined in (2.3.2), will be useful as well for further analysis. Indeed, one often confines (statistical) analysis to information conveyed by a discrete time sampling of the stock return series, which will be denoted by the information set

I_t^R = \sigma[S_\tau/S_{\tau-1};\ \tau = 0, 1, \ldots, t-1, t]   (2.3.6)

where the superscript R stands for returns. By extending Andersen (1994), we shall adopt as the most general framework for univariate volatility modelling the setup given by Assumptions 2.3.2.A, 2.3.2.B and:

ASSUMPTION 2.3.2.C. μ(U_t) is I_t^R measurable.

Therefore in (2.3.4) and (2.3.5) we have essentially shown that:

E[\log S_{t+1}/S_t \mid I_t^R] = \mu(U_t)   (2.3.7)

Var[\log S_{t+1}/S_t \mid I_t^R] = E[\sigma^2(U_t) \mid I_t^R] .   (2.3.8)

2.4. Statistical modelling of stochastic volatility

Financial time series are observed at discrete time intervals while a majority of theoretical models are formulated in continuous time. Generally speaking there are two statistical methodologies to resolve this tension. Either one considers for the purpose of estimation statistical discrete time models of the continuous time processes, or alternatively, the statistical model may be specified in continuous time and inference done via a discrete time approximation. In this section we will discuss in detail the former approach while the latter will be introduced in Section 4. The class of discrete time statistical models discussed here is general. In Section 2.4.1 we introduce some notation and terminology. The next section discusses the so-called stochastic autoregressive volatility model introduced by


Andersen (1994) as a rather general and flexible semi-parametric framework to encompass various representations of stochastic volatility already available in the literature. Identification of parameters and the restrictions required for it are discussed in Section 2.4.3.
2.4.1. Notation and terminology

In Section 2.3 we left unspecified the functional forms taken by the trend μ(·) and volatility σ(·). Indeed, in some sense we built the nonparametric framework recently proposed by Lezan, Renault and de Vitry (1995), which they introduced to discuss a notion of stochastic volatility of unknown form.^18 This nonparametric framework encompasses standard parametric models (see Section 2.4.2 for a more formal discussion). For the purpose of illustration let us consider two extreme cases, assuming for simplicity that μ(U_t) = 0: (i) the discrete time analogue of the Hull and White model (2.1.16) is obtained when σ(U_t) = σ_t is a stochastic process independent of the stock return standardized innovation process ε, and (ii) σ_t may be a deterministic function h(ε_τ, τ < t) of past innovations. The latter is the complete opposite of (i) and leads to a large variety of choices of parameterized functions for h yielding X-ARCH models (GARCH, EGARCH, QTARCH, Periodic GARCH, etc.). Besides these two polar cases, where Assumption 2.3.2.A is fulfilled in a trivial degenerate way, one can also accommodate leverage effects.^19 In particular, the contemporaneous correlation structure between innovations in U and the return process can be nonzero, since the Granger non-causality assumptions deal with temporal causal links rather than contemporaneous ones. For instance, we may have σ(U_t) = σ_t with:

\log S_{t+1}/S_t = \sigma_t\, \varepsilon_{t+1}   (2.4.1)

Cov(\sigma_{t+1}, \varepsilon_{t+1} \mid I_t^R) \ne 0 .   (2.4.2)

A negative covariance in (2.4.2) is a standard case of leverage effect, without violating the non-causality Assumptions 2.3.2.A and B. A few concluding observations are worth making to deal with the burgeoning variety of terminology in the literature. First, we have not considered the distinction due to Taylor (1994) between "lagged autoregressive random variance models" given by (2.4.1) and "contemporaneous autoregressive random variance models" defined by:

\log S_{t+1}/S_t = \sigma_{t+1}\, \varepsilon_{t+1}   (2.4.3)

18 Lezan, Renault and de Vitry (1995) discuss in detail how to recover phenomena such as volatility clustering in this framework. As a nonparametric framework it also has certain advantages regarding (robust) estimation. They develop, for instance, methods that can be useful as a first estimation step for efficient algorithms assuming a specific parametric model (see Section 5).
19 Assumption 2.3.2.B is fulfilled in case (i) but may fail in the GARCH case (ii). When it fails to hold in the latter case, it makes the GARCH framework not very well suited for option pricing.


Indeed, since the volatility process σ_t is unobservable, the settings (2.4.1) and (2.4.3) are observationally equivalent as long as they are not completed by precise (non)-causality assumptions. For instance: (i) (2.4.1) and Assumption 2.3.2.A together appear to be a correct and very general definition of a SV model, possibly completed by Assumption 2.3.2.B for option pricing and (2.4.2) to introduce leverage effects; (ii) (2.4.3) associated with (2.4.2) would not be a correct definition of a SV model since in this case, in general, E[log S_{t+1}/S_t | I_t^R] ≠ 0, and the model would introduce via the process σ a forecast which is related not only to volatility but also to the expected return. For notational simplicity, the framework (2.4.3) will be used in Section 3 with the leverage effect captured by Cov(σ_{t+1}, ε_t) ≠ 0 instead of Cov(σ_{t+1}, ε_{t+1}) ≠ 0. Another terminology was introduced by Amin and Ng (1993) for option pricing. Their distinction between "predictable" and "unpredictable" volatility is very close to the leverage effect concept and can also be analyzed through causality concepts, as discussed in Garcia and Renault (1995). Finally, it will not be necessary to make a distinction between weak, semi-strong and strong definitions of SV models in analogy with their ARCH counterparts (see Drost and Nijman (1993)). Indeed, the class of SV models as defined here can accommodate parameterizations which are closed under temporal aggregation (see also Section 4.1 on the subject of temporal aggregation).

2.4.2. Stochastic autoregressive volatility

For simplicity, let us consider the following univariate volatility process:

y_{t+1} = \mu_t + \sigma_t\, \varepsilon_{t+1}   (2.4.4)

where μ_t is a measurable function of observables y_τ ∈ I_t^R, τ ≤ t. While our discussion will revolve around (2.4.4), we will discuss several issues which are general and not confined to that specific model; extensions will be covered more explicitly in Section 3.5. Following the result in (2.3.8) we know that:

Var[y_{t+1} \mid I_t^R] = E[\sigma_t^2 \mid I_t^R]   (2.4.5)

suggesting (1) that volatility clustering can be captured via autoregressive dynamics in the conditional expectation (2.4.5) and (2) that thick tails can be obtained in any one of three ways, namely (a) via heavy tails of the white noise ε_t distribution, (b) via the stochastic features of E[σ_t² | I_t^R], and (c) via specific randomness of the volatility process σ_t which makes it latent, i.e. σ_t ∉ I_t^R.^20 The volatility dynamics that follow from (1) and (2) are usually an AR(1) model for some nonlinear function of σ_t. Hence, the volatility process is assumed to be stationary and Markovian of order one, but not necessarily linear AR(1) in σ_t itself. This is

20 Kim and Shephard (1994), using data on weekly returns on the S&P500 Index, found that a t-GARCH model has an almost identical likelihood as the normal based SV model. This example shows that a specific randomness in σ_t may produce the same level of marginal kurtosis as a heavy tailed Student distribution of the white noise ε.


precisely what motivated Andersen (1994) to introduce the Stochastic Autoregressive Variance or SARV class of models where σ_t (or σ_t²) is a polynomial function g(K_t) of a Markov process K_t with the following dynamic specification:

K_t = w + \beta K_{t-1} + [\gamma + \alpha K_{t-1}]\, u_t   (2.4.6)

where ũ_t = u_t − 1 is zero-mean white noise with unit variance. Andersen (1994) discusses sufficient regularity conditions which ensure stationarity and ergodicity for K_t. Without entering into the details, let us note that the fundamental non-causality Assumption 2.3.2.A implies that the u_t process in (2.4.6) does not Granger-cause ε_t in (2.4.4). In fact, the non-causality condition suggests a slight modification of Andersen's (1994) definition. Namely, it suggests assuming ε_{t+1} independent of u_{t−j}, j ≥ 0, for the conditional probability distribution given ε_{t−j}, j ≥ 0, rather than for the unconditional distribution. This modification does not invalidate Andersen's SARV class of models as the most general parametric statistical model studied so far in the volatility literature. The GARCH(1,1) model is straightforwardly obtained from (2.4.6) by letting K_t = σ_t², γ = 0 and u_t = ε_t². Note that the deterministic relationship u_t = ε_t² between the stochastic components of (2.4.4) and (2.4.6) emphasizes that, in GARCH models, there is no randomness specific to the volatility process. The Autoregressive Random Variance model popularized by Taylor (1986) also belongs to the SARV class. Here:

\log \sigma_{t+1} = \xi + \phi \log \sigma_t + \eta_{t+1}   (2.4.7)

where η_{t+1} is a white noise disturbance such that Cov(η_{t+1}, ε_{t+1}) ≠ 0 to accommodate leverage effects. This is a SARV model with K_t = log σ_t, α = 0 and η_{t+1} = γ u_{t+1}.^21
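To see how (2.4.6) nests both polar cases, the sketch below (Python; all parameter values are illustrative assumptions, not estimates) simulates the GARCH(1,1) special case, where u_t = ε_t² so that volatility has no randomness of its own, and Taylor's autoregressive random variance model, where the volatility equation has its own disturbance.

import numpy as np

rng = np.random.default_rng(0)
T = 2000

# (i) GARCH(1,1) as a SARV model: K_t = sigma_t^2, gamma = 0, u_t = eps_t^2 (illustrative w, alpha, beta)
w, alpha, beta = 0.05, 0.10, 0.85
eps = rng.standard_normal(T)
K = np.empty(T); y = np.empty(T)
K[0] = w / (1.0 - alpha - beta)                  # unconditional variance as a starting value
y[0] = np.sqrt(K[0]) * eps[0]
for t in range(1, T):
    K[t] = w + beta * K[t - 1] + alpha * K[t - 1] * eps[t - 1] ** 2   # no volatility-specific noise
    y[t] = np.sqrt(K[t]) * eps[t]

# (ii) Taylor's autoregressive random variance model: K_t = log sigma_t, alpha = 0 (illustrative xi, phi, sig_eta)
xi, phi, sig_eta = 0.0, 0.95, 0.26
eps2 = rng.standard_normal(T)
eta = sig_eta * rng.standard_normal(T)           # randomness specific to the volatility process
log_sig = np.empty(T); y2 = np.empty(T)
log_sig[0] = xi / (1.0 - phi)
y2[0] = np.exp(log_sig[0]) * eps2[0]
for t in range(1, T):
    log_sig[t] = xi + phi * log_sig[t - 1] + eta[t]
    y2[t] = np.exp(log_sig[t]) * eps2[t]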

2.4.3. Identification of parameters

Introducing a general class of processes for volatility, like the SARV class discussed in the previous section, prompts questions regarding identification. Suppose again that

y_{t+1} = \sigma_t\, \varepsilon_{t+1}, \qquad \sigma_t^q = g(K_t), \quad q \in \{1, 2\}   (2.4.8)
K_t = w + \beta K_{t-1} + [\gamma + \alpha K_{t-1}]\, u_t

Andersen (1994) noted that the model is better interpreted by considering the zero-mean white noise process ũ_t = u_t − 1:

K_t = (w + \gamma) + (\alpha + \beta) K_{t-1} + (\gamma + \alpha K_{t-1})\, \tilde u_t   (2.4.9)

It is clear from the latter that it may be difficult to distinguish empirically the constant w from the "stochastic" constant γu_t. Similarly, the identification of the α and β parameters separately is also problematic, as (α + β) governs the persistence of shocks to volatility.

21 Andersen (1994) also shows that the SARV framework encompasses another type of random variance model that we have considered as ill-specified since it combines (2.4.2) and (2.4.3).


These identification problems are usually resolved by imposing (arbitrary) restrictions on the pairs of parameters (w, γ) and (α, β). The GARCH(1,1) and Autoregressive Random Variance specifications assume that γ = 0 and α = 0 respectively. Identification of all parameters without such restrictions generally requires additional constraints, for instance via some distributional assumptions on ε_{t+1} and u_t, which restrict the semi-parametric framework of (2.4.6) to a parametric statistical model. To address the issue of identification more rigorously, it is useful to consider, following Andersen (1994), the reparameterization (assuming for notational convenience that α ≠ 0):

K = (w + \gamma)/(1 - \alpha - \beta)
\rho = \alpha + \beta   (2.4.10)
\delta = \gamma/\alpha .

Hence equation (2.4.9) can be rewritten as:

K_t = K + \rho(K_{t-1} - K) + (\delta + K_{t-1})\, \nu_t

where ν_t = α ũ_t. It is clear from (2.4.10) that only three functions of the original parameters α, β, γ, w may be identified, and that the three parameters K, ρ, δ are identified from the first three unconditional moments of the process K_t, for instance. To give these identification results an empirical content, it is essential to know: (1) how to go from the moments of the observable process y_t to the moments of the volatility process σ_t, and (2) how to go from the moments of the volatility process σ_t to the moments of the latent process K_t. The first point is easily solved by specifying the corresponding moments of the standardized innovation process ε. If we assume for instance a Gaussian probability distribution, we obtain:

E|y_t| = \sqrt{2/\pi}\; E\sigma_t
E[\,|y_t|\,|y_{t-j}|\,] = (2/\pi)\; E(\sigma_t \sigma_{t-j}) .   (2.4.11)

The solution of the second point requires in general the specification of the mapping g and of the probability distribution of u_t in (2.4.6). For the so-called Log-normal SARV model, it is assumed that α = 0 and K_t = log σ_t (Taylor's autoregressive random variance model) and that u_t is normally distributed (lognormality of the volatility process). In this case, it is easy to show that:

E\sigma_t^n = \exp[\,n\, EK_t + n^2\, Var K_t/2\,]
E(\sigma_t^m \sigma_{t-j}^n) = E\sigma_t^m\; E\sigma_{t-j}^n\; \exp[\,mn\, Cov(K_t, K_{t-j})\,]   (2.4.12)
Cov(K_t, K_{t-j}) = \beta^j\, Var K_t .

This model will be studied in much more detail in Sections 3 and 5, from both probabilistic and statistical points of view, with and without the normality assumption (i.e. QML, mixtures of normals, Student distributions, ...). Moreover, it is a template for studying other specifications of the SARV class of models. In addition, various specifications will be considered in Section 4 as proxies of continuous time models.
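A quick way to see where the moment relations in (2.4.12) come from is to simulate the log-normal SARV model and compare sample moments of σ_t with the lognormal formula; the sketch below (Python) does this for illustrative parameter values, with φ playing the role of β.

import numpy as np

rng = np.random.default_rng(0)
phi, sig_u, T, n = 0.90, 0.30, 200_000, 2        # illustrative values
K = np.zeros(T)
K[0] = np.sqrt(sig_u ** 2 / (1 - phi ** 2)) * rng.standard_normal()
for t in range(1, T):
    K[t] = phi * K[t - 1] + sig_u * rng.standard_normal()
sigma = np.exp(K)                                # log-normal SARV: K_t = log sigma_t, EK_t = 0
var_K = sig_u ** 2 / (1 - phi ** 2)
print("sample E sigma^n:", (sigma ** n).mean())
print("exp(n EK + n^2 VarK/2):", np.exp(n ** 2 * var_K / 2))
j = 5
print("sample Cov(K_t, K_{t-j}):", np.cov(K[j:], K[:-j])[0, 1], " phi^j VarK:", phi ** j * var_K)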

3. Discrete time models

The purpose of this section is to discuss the statistical handling of discrete time SV models, using simple univariate cases. We start by defining the most basic SV model, corresponding to the autoregressive random variance model discussed earlier in (2.4.7). We study its statistical properties in Section 3.2 and provide a comparison with ARCH models in Section 3.3. Section 3.4 is devoted to filtering, prediction and smoothing. Various extensions, including multivariate models, are covered in the last section. Estimation of the parameters governing the volatility process is discussed later in Section 5.

3.1. The discrete time SV model

The discrete time SV model may be written as

y_t = \sigma_t\, \varepsilon_t, \qquad t = 1, \ldots, T ,   (3.1.1)

where y_t denotes the demeaned return process, y_t = log(S_t/S_{t−1}) − μ, and log σ_t² follows an AR(1) process. It will be assumed that ε_t is a series of independent, identically distributed random disturbances. Usually ε_t is specified to have a standard distribution, so its variance σ_ε² is known. Thus for a normal distribution σ_ε² is unity, while for a t-distribution with v degrees of freedom it will be v/(v − 2). Following a convention often adopted in the literature we write h_t ≡ log σ_t²:
y_t = \sigma\, \varepsilon_t\, e^{0.5 h_t}   (3.1.2)

where σ is a scale parameter, which removes the need for a constant term in the stationary first-order autoregressive process

h_{t+1} = \phi h_t + \eta_t, \qquad \eta_t \sim IID(0, \sigma_\eta^2), \quad |\phi| < 1 .   (3.1.3)
It was noted before that if ε_t and η_t are allowed to be correlated with each other, the model can pick up the kind of asymmetric behavior which is often found in stock prices. Indeed, a negative correlation between ε_t and η_t induces a leverage effect. As in Section 2.4.1, the timing of the disturbance in (3.1.3) ensures that the observations are still a martingale difference, the equation being written in this way so as to tie in with the state space literature. It should be stressed that the above model is only an approximation to the continuous time models of Section 2 observed at discrete intervals. The accuracy of the approximation is examined in Dassios (1995) using Edgeworth expansions (see also Sections 4.1 and 4.3 for further discussion).
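As a minimal illustration of (3.1.2)-(3.1.3), the sketch below (Python) simulates the discrete time SV model, allowing ε_t and η_t to be contemporaneously correlated so that ρ < 0 produces a leverage effect; the parameter values are illustrative assumptions only.

import numpy as np

def simulate_sv(T, phi=0.98, sigma_eta=0.15, sigma=0.01, rho=0.0, seed=0):
    # y_t = sigma * eps_t * exp(h_t / 2),  h_{t+1} = phi * h_t + eta_t,  Corr(eps_t, eta_t) = rho
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho * sigma_eta], [rho * sigma_eta, sigma_eta ** 2]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=T)
    eps, eta = z[:, 0], z[:, 1]
    h = np.zeros(T)
    h[0] = np.sqrt(sigma_eta ** 2 / (1 - phi ** 2)) * rng.standard_normal()  # stationary start
    for t in range(T - 1):
        h[t + 1] = phi * h[t] + eta[t]
    return sigma * eps * np.exp(h / 2), h

y, h = simulate_sv(2000, rho=-0.5)
print(y[:5])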


3.2. Statistical properties


The following properties of the SV model hold even if ε_t and η_t are contemporaneously correlated. Firstly, as noted, y_t is a martingale difference. Secondly, stationarity of h_t implies stationarity of y_t. Thirdly, if η_t is normally distributed, it follows from the properties of the lognormal distribution that E[exp(a h_t)] = exp(a²σ_h²/2), where a is a constant and σ_h² is the variance of h_t. Hence, if ε_t has a finite variance, the variance of y_t is given by

Var(y_t) = \sigma^2 \sigma_\varepsilon^2 \exp(\sigma_h^2/2) .   (3.2.1)

Similarly, if the fourth moment of ε_t exists, the kurtosis of y_t is κ exp(σ_h²), where κ is the kurtosis of ε_t, so y_t exhibits more kurtosis than ε_t. Finally, all the odd moments are zero. For many purposes we need to consider the moments of powers of absolute values. Again, η_t is assumed to be normally distributed. Then for ε_t having a standard normal distribution, the following expressions are derived in Harvey (1993):

E|y_t|^c = \sigma^c\, 2^{c/2}\, \frac{\Gamma(c/2 + 1/2)}{\Gamma(1/2)}\, \exp\!\left(\frac{c^2 \sigma_h^2}{8}\right), \qquad c > -1, \quad c \ne 0   (3.2.2)

and

Var|y_t|^c = \sigma^{2c}\, 2^c \left\{ \frac{\Gamma(c + 1/2)}{\Gamma(1/2)} \exp\!\left(\frac{c^2 \sigma_h^2}{2}\right) - \left[\frac{\Gamma(c/2 + 1/2)}{\Gamma(1/2)}\right]^2 \exp\!\left(\frac{c^2 \sigma_h^2}{4}\right) \right\}, \qquad c > -0.5, \quad c \ne 0 .

Note that Γ(1/2) = √π and Γ(1) = 1. Corresponding expressions may be computed for other distributions of ε_t, including Student's t and the General Error Distribution (see Nelson (1991)). Finally, the square of the coefficient of variation of σ_t² is often used as a measure of the relative strength of the SV process. This is Var(σ_t²)/[E(σ_t²)]² = exp(σ_h²) − 1. Jacquier, Polson and Rossi (1994) argue that this is more easily interpretable than σ_h². In the empirical studies they quote it is rarely less than 0.1 or greater than 2.
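These moment formulas are easy to check by simulation. The sketch below (Python; illustrative parameter values) compares sample moments of a long simulated SV series against (3.2.1) and the kurtosis κ exp(σ_h²), with κ = 3 for Gaussian ε_t.

import numpy as np

rng = np.random.default_rng(0)
phi, sig_eta, sigma, T = 0.95, 0.20, 0.01, 500_000      # illustrative values
sig_h2 = sig_eta ** 2 / (1 - phi ** 2)                  # Var(h_t) for the AR(1) in (3.1.3)
h = np.zeros(T)
h[0] = np.sqrt(sig_h2) * rng.standard_normal()          # start from the stationary distribution
for t in range(T - 1):
    h[t + 1] = phi * h[t] + sig_eta * rng.standard_normal()
y = sigma * rng.standard_normal(T) * np.exp(h / 2)
print("Var(y): sample", y.var(), " (3.2.1):", sigma ** 2 * np.exp(sig_h2 / 2))   # sigma_eps^2 = 1
kurt = ((y - y.mean()) ** 4).mean() / y.var() ** 2
print("kurtosis: sample", round(kurt, 2), " formula 3*exp(sig_h2):", round(3 * np.exp(sig_h2), 2))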

3.2.1. Autocorrelation functions


If we assume that the disturbances ε_t and η_t are mutually independent, and η_t is normal, the ACF of the absolute values of the observations raised to the power c is given by

\rho_\tau^{(c)} = \frac{E(|y_t|^c |y_{t-\tau}|^c) - \{E(|y_t|^c)\}^2}{E(|y_t|^{2c}) - \{E(|y_t|^c)\}^2} = \frac{\exp\!\left(\frac{c^2}{4}\sigma_h^2 \rho_{h,\tau}\right) - 1}{\kappa_c \exp\!\left(\frac{c^2}{4}\sigma_h^2\right) - 1}, \qquad \tau \ge 1, \quad c > -0.5, \quad c \ne 0   (3.2.3)

where κ_c is


\kappa_c = E(|\varepsilon_t|^{2c})/\{E(|\varepsilon_t|^c)\}^2 ,   (3.2.4)

and ρ_{h,τ}, τ = 0, 1, 2, ..., denotes the ACF of h_t. Taylor (1986) gives this expression for c equal to one and two and ε_t normally distributed. When c = 2, κ_c is the kurtosis, and this is three for a normal distribution. More generally,

\kappa_c = \Gamma(c + 1/2)\,\Gamma(1/2)/\{\Gamma(c/2 + 1/2)\}^2 , \qquad c \ne 0 .

For Student's t-distribution with v degrees of freedom:

r(c + 1/2)r(-c + v/Z)r(1/2)r(v/2)


14,c

{r(c/2 + 1/2)r(-c/2

+ v/2)} 2

'

(3.2.5)

Icl <

v/2 ,

c # 0

Note that v must be at least five if c is two. The ACF, ρ_τ^(c), has the following features. First, if σ_h² is small and/or ρ_{h,τ} is close to one,

\rho_\tau^{(c)} \simeq \rho_{h,\tau}\, \frac{\exp\!\left(\frac{c^2}{4}\sigma_h^2\right) - 1}{\kappa_c \exp\!\left(\frac{c^2}{4}\sigma_h^2\right) - 1} , \qquad \tau \ge 1 ;   (3.2.6)

compare Taylor (1986, p. 74-5). Thus the shape of the ACF of h_t is approximately carried over to ρ_τ^(c), except that it is multiplied by a factor of proportionality, which must be less than one for c positive as κ_c is greater than one. Secondly, for the t-distribution, κ_c declines as v goes to infinity. Thus ρ_τ^(c) is a maximum for a normal distribution. On the other hand, a distribution with less kurtosis than the normal will give rise to higher values of ρ_τ^(c). Although (3.2.6) gives an explicit relationship between ρ_τ^(c) and c, it does not appear possible to make any general statements regarding ρ_τ^(c) being maximized for certain values of c. Indeed, different values of σ_h² lead to different values of c maximizing ρ_τ^(c). If σ_h² is chosen so as to give values of ρ_τ^(c) of a similar size to those reported in Ding, Granger and Engle (1993), then the maximum appears to be attained for c slightly less than one. The shape of the curve relating ρ_τ^(c) to c is similar to the empirical relationships reported in Ding, Granger and Engle, as noted by Harvey (1993).
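For an AR(1) volatility process, ρ_{h,τ} = φ^τ, so (3.2.3) can be evaluated directly. The sketch below (Python) computes the theoretical ACF of |y_t|^c for Gaussian ε_t and shows how the factor of proportionality in (3.2.6) varies with c; the inputs are illustrative, not estimates.

import numpy as np
from scipy.special import gamma as G

def sv_acf_abs_power(c, sig_h2, phi, lags):
    # ACF of |y_t|^c for the SV model with Gaussian eps_t and AR(1) h_t; cf. (3.2.3)-(3.2.5)
    kappa_c = G(c + 0.5) * G(0.5) / G(c / 2 + 0.5) ** 2
    rho_h = phi ** np.asarray(lags, dtype=float)
    return (np.exp(c ** 2 * sig_h2 * rho_h / 4) - 1) / (kappa_c * np.exp(c ** 2 * sig_h2 / 4) - 1)

lags = [1, 5, 10, 20]
for c in (0.5, 1.0, 1.5, 2.0):
    print(c, np.round(sv_acf_abs_power(c, sig_h2=0.5, phi=0.95, lags=lags), 3))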

3.2.2. Logarithmic transformation

Squaring the observations in (3.1.2) and taking logarithms gives

\log y_t^2 = \log \sigma^2 + h_t + \log \varepsilon_t^2 .   (3.2.7)

Alternatively

\log y_t^2 = \omega + h_t + \xi_t   (3.2.8)


where ω = log σ² + E log ε_t², so that the disturbance ξ_t has zero mean by construction. The mean and variance of log ε_t² are known to be −1.27 and π²/2 = 4.93 when ε_t has a standard normal distribution; see Abramovitz and Stegun (1970). However, the distribution of log ε_t² is far from being normal, being heavily skewed with a long tail. More generally, if ε_t has a t-distribution with v degrees of freedom, it can be expressed as:

\varepsilon_t = \zeta_t\, \kappa_t^{-0.5}

where ζ_t is a standard normal variate and κ_t is independently distributed such that vκ_t is chi-square with v degrees of freedom. Thus log ε_t² = log ζ_t² − log κ_t, and again using results in Abramovitz and Stegun (1970), it follows that the mean and variance of log ε_t² are −1.27 − ψ(v/2) + log(v/2) and 4.93 + ψ'(v/2) respectively, where ψ(·) is the digamma function. Note that the moments of ξ_t exist even if the model is formulated in such a way that the distribution of ε_t is Cauchy, that is v = 1. In fact, in this case ξ_t is symmetric with excess kurtosis two, compared with excess kurtosis four when ε_t is Gaussian. Since log ε_t² is serially independent, it is straightforward to work out the ACF of log y_t² for h_t following any stationary process:

\rho_\tau^{(0)} = \rho_{h,\tau}/\{1 + \sigma_\xi^2/\sigma_h^2\} , \qquad \tau \ge 1 .   (3.2.9)

The notation ρ_τ^(0) reflects the fact that the ACF of a power of an absolute value of the observation is the same as that of the Box-Cox transform, that is {|y_t|^c − 1}/c, and hence the logarithmic transform of an absolute value, raised to any (non-zero) power, corresponds to c = 0. (But note that one cannot simply set c = 0 in (3.2.3).) Note that even if η_t and ε_t are not mutually independent, the η_t and ξ_t disturbances are uncorrelated if the joint distribution of ε_t and η_t is symmetric, that is f(ε_t, η_t) = f(−ε_t, −η_t); see Harvey, Ruiz and Shephard (1994). Hence the expression for the ACF in (3.2.9) remains valid.

3.3. Comparison with ARCH models

The GARCH(1,1) model has been applied extensively to financial time series. The variance in (3.1.1) is assumed to depend on the variance and squared observation in the previous time period. Thus

\sigma_t^2 = \gamma + \alpha y_{t-1}^2 + \beta \sigma_{t-1}^2 , \qquad t = 1, \ldots, T .   (3.3.1)

The GARCH model was proposed by Bollerslev (1986) and Taylor (1986), and is a generalization of the ARCH model formulated by Engle (1982). The


ARCH(1) model is a special case of GARCH(1,1) with β = 0. The motivation comes from forecasting; in an AR(1) model with independent disturbances, the optimal prediction of the next observation is a fraction of the current observation, and in ARCH(1) it is a fraction of the current squared observation (plus a constant). The reason is that the optimal forecast is constructed conditional on the current information, and in an ARCH model the variance in the next period is assumed to be known. This construction leads directly to a likelihood function for the model once a distribution is assumed for ε_t. Thus estimation of the parameters upon which σ_t² depends is straightforward in principle. The GARCH formulation introduces terms analogous to moving average terms in an ARMA model, thereby making forecasts a function of a distributed lag of past squared observations. It is straightforward to show that y_t is a martingale difference with (unconditional) variance γ/(1 − α − β). Thus α + β < 1 is the condition for covariance stationarity. As shown in Bollerslev (1986), the condition under which the fourth moment exists in a Gaussian model is 2α² + (α + β)² < 1. The model then exhibits excess kurtosis. However, the fourth moment condition may not always be satisfied in practice. Somewhat paradoxically, the conditions for strict stationarity are much weaker and, as shown by Nelson (1990), even include the case α + β = 1. The specification of GARCH(1,1) means that we can write

y_t^2 = \gamma + \alpha y_{t-1}^2 + \beta \sigma_{t-1}^2 + v_t = \gamma + (\alpha + \beta) y_{t-1}^2 + v_t - \beta v_{t-1}

where v_t = y_t² − σ_t² is a martingale difference. Thus y_t² has the form of an ARMA(1,1) process and so its ACF can be evaluated in the same way. The ACF of the corresponding ARMA model seems to be indicative of the type of patterns likely to be observed in practice in correlograms of y_t². The GARCH model extends by adding more lags of σ_t² and y_t². However, GARCH(1,1) seems to be the most widely used. It displays similar properties to the SV model, particularly if φ is close to one. This should be clear from (3.2.6), which has the pattern of an ARMA(1,1) process. Clearly φ plays a role similar to that of α + β. The main difference in the ACFs seems to show up most at lag one. Jacquier et al. (1994, p. 373) present a graph of the correlogram of the squared weekly returns of a portfolio on the New York Stock Exchange together with the ACFs implied by fitting SV and GARCH(1,1) models. In this case the ACF implied by the SV model is closer to the sample values. The SV model displays excess kurtosis even if φ is zero, since y_t is a mixture of distributions. The σ_η² parameter governs the degree of mixing independently of the degree of smoothness of the variance evolution. This is not the case with a GARCH model, where the degree of kurtosis is tied to the roots of the variance equation, α and β in the case of GARCH(1,1). Hence, it is very often necessary to use a non-Gaussian GARCH model to capture the high kurtosis typically found in a financial time series. The basic GARCH model does not allow for the kind of asymmetry captured by a SV model with contemporaneously correlated disturbances, although it can


be modified as suggested in Engle and Ng (1993). The EGARCH model, proposed by Nelson (1991), handles asymmetry by taking log σ_t² to be a function of past squares and absolute values of the observations.

3.4. Filtering, smoothing and prediction

For the purposes of pricing options, we need to be able to estimate and predict the variance, σ_t², which of course is proportional to the exponent of h_t. An estimate based on all the observations up to, and possibly including, the one at time t is called a filtered estimate. On the other hand, an estimate based on all the observations in the sample, including those which came after time t, is called a smoothed estimate. Predictions are estimates of future values. As a matter of historical interest we may wish to examine the evolution of the variance over time by looking at the smoothed estimates. These might be compared with the volatilities implied by the corresponding options prices as discussed in Section 2.1.2. For pricing "at the money" options we may be able to simply use the filtered estimate at the end of the sample and the predictions of future values of the variance, as in the method suggested for ARCH models by Noh, Engle and Kane (1994). More generally, it may be necessary to base prices on the full distribution of future values of the variance, perhaps obtained by simulation techniques; for further discussion see Section 4.2. One can think of constructing filtered and smoothed estimates in a very simple, but arbitrary, way by taking functions (involving estimated parameters) of moving averages of transformed observations. Thus:
\hat\sigma_t^2 = g\left(\sum_{j=r}^{t-1} w_{tj}\, f(y_{t-j})\right), \qquad t = 1, \ldots, T ,   (3.4.1)

where r = 0 or 1 for a filtered estimate and r = t − T for a smoothed estimate. Since we have formulated a stochastic volatility model, the natural course of action is to use this as the basis for filtering, smoothing and prediction. For a linear and Gaussian time series model, the state space form can be used as the basis for optimal filtering and smoothing algorithms. Unfortunately, the SV model is nonlinear. This leaves us with three possibilities:
a. compute inefficient estimates based on a linear state space model;
b. use computer intensive techniques to estimate the optimal filter to a desired level of accuracy;
c. use an (unspecified) ARCH model to approximate the optimal filter.
We now turn to examine each of these in some detail.

3.4.1. Linear state space form

The transformed observations, the log y_t²'s, can be used to construct a linear state space model as suggested by Nelson (1988) and Harvey, Ruiz and Shephard (1994). The measurement equation is (3.2.8) while (3.1.3) is the transition equa-


tion. The initial conditions for the state, h_t, are given by its unconditional mean and variance, that is zero and σ_η²/(1 − φ²) respectively. While it may be reasonable to assume that η_t is normal, ξ_t would only be normal if the absolute value of ε_t were lognormal. This is unlikely. Thus application of the Kalman filter and the associated smoothers yields estimators of the state, h_t, which are only optimal within the class of estimators based on linear combinations of the log y_t²'s. Furthermore, it is not the h_t's which are required, but rather their exponents. Suppose h_{t|T} denotes the smoothed estimator obtained from the linear state space form. Then exp(h_{t|T}) is of the form (3.4.1), multiplied by an estimate of the scaling constant, σ². It can be written as a weighted geometric mean. This makes the estimates vulnerable to very small observations and is an indication of the limitations of this approach. Working with the logarithmic transformation raises an important practical issue, namely how to handle observations which are zero. This is a reflection of the point raised in the previous paragraph, since obviously any weighted geometric mean involving a zero observation will be zero. More generally, we wish to avoid very small observations. One possible solution is to remove the sample mean. A somewhat more satisfactory alternative, suggested by Fuller and studied by Breidt and Carriquiry (1995), is to make the following transformation based on a Taylor series expansion:
\log y_t^2 = \log(y_t^2 + c s_y^2) - \frac{c s_y^2}{y_t^2 + c s_y^2} , \qquad t = 1, \ldots, T ,   (3.4.2)

where s_y² is the sample variance of the y_t's and c is a small number, the suggested value being 0.02. The effect of this transformation is to reduce the kurtosis in the transformed observations by cutting down the long tail made up of the negative values obtained by taking the logarithms of the "inliers". In other words, it is a form of trimming. It might be more satisfactory to carry out this procedure after correcting the observations for heteroskedasticity by dividing by preliminary estimates, σ̃_t²'s. The log σ̃_t²'s are then added to the transformed observations. The σ̃_t²'s could be constructed from a first round or by using a totally different procedure, perhaps a nonparametric one. The linear state space form can be modified so as to deal with asymmetric models. It was noted earlier that even if η_t and ε_t are not mutually independent, the disturbances in the state space form are uncorrelated if the joint distribution of ε_t and η_t is symmetric. Thus the above filtering and smoothing operations are still valid, but there is a loss of information stemming from the squaring of the observations. Harvey and Shephard (1993) show that this information may be recovered by conditioning on the signs of the observations, denoted by s_t, a variable which takes the value +1 (−1) when y_t is positive (negative). These signs are, of course, the same as the signs of the ε_t's. Let E_+ (E_−) denote the expectation conditional on ε_t being positive (negative), and assign a similar interpretation to variance and covariance operators. The distribution of ξ_t is not affected by conditioning on the signs of the ε_t's, but, remembering that E(η_t|ε_t) is an odd function of ε_t,


\mu^* = E_+(\eta_t) = E_+[E(\eta_t|\varepsilon_t)] = -E_-(\eta_t) ,

and

\gamma^* = Cov_+(\eta_t, \xi_t) = E_+(\eta_t \xi_t) - E_+(\eta_t)E(\xi_t) = E_+(\eta_t \xi_t) = Cov_-(\eta_t, \xi_t) ,

because the expectation of ξ_t is zero and

E_+(\eta_t \xi_t) = E_+[E(\eta_t|\varepsilon_t)\log \varepsilon_t^2] - \mu^* E(\log \varepsilon_t^2) = -E_-(\eta_t \xi_t) .

Finally,

Var_+(\eta_t) = E_+(\eta_t^2) - [E_+(\eta_t)]^2 = \sigma_\eta^2 - \mu^{*2} .

The linear state space form is now

\log y_t^2 = \omega + h_t + \xi_t
h_{t+1} = \phi h_t + s_t \mu^* + \eta_t^*   (3.4.3)

where η_t* = η_t − s_tμ* and, conditionally on s_t, the disturbance vector (ξ_t, η_t*)′ has zero mean and covariance matrix

\begin{pmatrix} \sigma_\xi^2 & \gamma^* s_t \\ \gamma^* s_t & \sigma_\eta^2 - \mu^{*2} \end{pmatrix} .

The Kalman filter may still be initialized by taking h_0 to have mean zero and variance σ_η²/(1 − φ²). The parameterization in (3.4.3) does not directly involve a parameter representing the correlation between ε_t and η_t. The relationship between μ* and γ* and the original parameters in the model can only be obtained by making a distributional assumption about ε_t as well as η_t. When ε_t and η_t are bivariate normal with Corr(ε_t, η_t) = ρ, E(η_t|ε_t) = ρσ_η ε_t, and so

\mu^* = E_+(\eta_t) = \rho\sigma_\eta\, E_+(\varepsilon_t) = \rho\sigma_\eta \sqrt{2/\pi} \simeq 0.7979\, \rho\sigma_\eta .   (3.4.4)

Furthermore,

\gamma^* = \rho\sigma_\eta\, E(|\varepsilon_t| \log \varepsilon_t^2) - 0.7979\, \rho\sigma_\eta\, E(\log \varepsilon_t^2) = 1.1061\, \rho\sigma_\eta .   (3.4.5)

When ε_t has a t-distribution, it can be written as ζ_t κ_t^{−0.5}, and ζ_t and η_t can be regarded as having a bivariate normal distribution with correlation ρ, while κ_t is independent of both. To evaluate μ* and γ* one proceeds as before, except that the initial conditioning is on ζ_t rather than on ε_t, and the required expressions are found to be exactly as in the Gaussian case. The filtered estimate of the log volatility h_t, written as h_{t+1|t}, takes the form:

h_{t+1|t} = \phi h_{t|t-1} + \frac{\phi p_{t|t-1} + \gamma^* s_t}{p_{t|t-1} + \sigma_\xi^2}\,\big(\log y_t^2 - \omega - h_{t|t-1}\big) + s_t \mu^* ,

where p_{t|t−1} is the corresponding mean square error of h_{t|t−1}. If ρ < 0, then γ* < 0, and the filtered estimator will behave in a similar way to the EGARCH


model estimated by Nelson (1991), with negative observations causing bigger increases in the estimated log volatility than corresponding positive values.
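The following sketch (Python) shows the linear state space approach in its simplest, symmetric form: it runs the Kalman filter on log y_t², treating ξ_t as if it were Gaussian with variance π²/2, which is the basis of quasi-maximum likelihood estimation. The parameters (φ, σ_η², ω) are assumed known here purely for illustration; in practice they would be estimated, and the small constant guarding against zero returns stands in for the transformation (3.4.2).

import numpy as np

def sv_kalman_filter(y, phi, sig_eta2, omega):
    # one-step-ahead Kalman filter for log y_t^2 = omega + h_t + xi_t, h_{t+1} = phi h_t + eta_t
    sig_xi2 = np.pi ** 2 / 2                            # variance of log eps_t^2 for Gaussian eps_t
    x = np.log(y ** 2 + 1e-10)                          # crude guard against zeros; cf. (3.4.2)
    h_pred, p_pred = 0.0, sig_eta2 / (1.0 - phi ** 2)   # unconditional mean and variance of h_t
    h_out = np.empty(len(y))
    loglik = 0.0
    for t, xt in enumerate(x):
        h_out[t] = h_pred                               # h_{t|t-1}
        v = xt - omega - h_pred                         # innovation
        f = p_pred + sig_xi2                            # innovation variance
        loglik += -0.5 * (np.log(2 * np.pi * f) + v ** 2 / f)   # Gaussian quasi-likelihood
        h_filt = h_pred + p_pred * v / f
        p_filt = p_pred - p_pred ** 2 / f
        h_pred = phi * h_filt
        p_pred = phi ** 2 * p_filt + sig_eta2
    return h_out, loglik                                # exp(h_{t|t-1}) tracks sigma_t^2 up to scale

# illustrative use on a simulated series (phi, sig_eta2, omega assumed known)
rng = np.random.default_rng(0)
h = np.zeros(1500)
for t in range(1, len(h)):
    h[t] = 0.95 * h[t - 1] + 0.2 * rng.standard_normal()
returns = 0.01 * rng.standard_normal(len(h)) * np.exp(h / 2)
omega = np.log(0.01 ** 2) - 1.27
h_filtered, quasi_ll = sv_kalman_filter(returns, 0.95, 0.2 ** 2, omega)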

3.4.2. Nonlinear filters


In principle, an exact filter may be written down for the original (3.1.2) and (3.1.3), with the former taken as the measurement equation. Evaluating such a filter requires approximating a series of integrals by numerical methods. Kitagawa (1987) has proposed a general method for implementing such a filter and Watanabe (1993) has applied it to the SV model. Unfortunately, it appears to be so time consuming as to render it impractical with current computer technology. As part of their Bayesian treatment of the model as a whole, Jacquier, Polson and Rossi (1994) show how it is possible to obtain smoothed estimates of the volatilities by simulation. What is required is the mean vector of the joint distribution of the volatilities conditional on the observations. However, because simulating this joint distribution is not a practical proposition, they decompose it into a set of univariate distributions in which each volatility is conditional on all the others. These distributions may be denoted p(σ_t|σ_{−t}, y), where σ_{−t} denotes all the volatilities apart from σ_t. What one would like to do is to sample from each of these distributions in turn, with the elements of σ_{−t} set equal to their latest estimates, and repeat several thousand times. As such this is a Gibbs sampler. Unfortunately, there are difficulties. The Markov structure of the SV model may be exploited to write

p(\sigma_t|\sigma_{-t}, y) = p(\sigma_t|\sigma_{t-1}, \sigma_{t+1}, y_t) \propto p(y_t|h_t)\, p(h_t|h_{t-1})\, p(h_{t+1}|h_t)


but although the right hand side of the above expression can be written down explicitly, the density is not of a standard form and there is no analytic expression for the normalizing constant. The solution adopted by Jacquier, Polson and Rossi is to employ a series of Metropolis accept/reject independence chains. Kim and Shephard (1994) argue that the single mover algorithm employed by Jacquier, Polson and Rossi will be slow if φ is close to one and/or σ_η² is small. This is because σ_t changes slowly; in fact, when it is constant, the algorithm will not converge at all. Another approach, based on the linear state space form, is to capture the non-normal disturbance term in the measurement equation, ξ_t, by a mixture of normals. Watanabe (1993) suggested an approximate method based on a mixture of two moments. Kim and Shephard (1994) propose a multimove sampler based on the linear state space form. Blocks of the h_t's are sampled, rather than taking them one at a time. The technique they use is based on mixing an appropriate number of normal distributions to get the required level of accuracy in approximating the disturbance in (3.2.7). Mahieu and Schotman (1994a) extend this approach by introducing more degrees of freedom in the mixture of normals, where the parameters are estimated rather than fixed a priori. Note that the distribution of the σ_t's can be obtained from the simulated distribution of the h_t's. Jacquier, Polson and Rossi (1994, p. 416) argue that no matter how many mixture components are used in the Kim and Shephard method, the tail behavior of log ε_t² can never be satisfactorily approximated. Indeed, they note that, given
! ! !

148

E. Ghysels, A. C. Harvey and E. Renault

the discreteness of the Kim and Shephard state space, not all states can be visited in the small number of draws mentioned, i.e. the so called inlier problem (see also Section 3.4.1 and Nelson (1994)) is still present. As a final point it should be noted that when the hyperparameters are unknown, the simulated distribution of the state produced by the Bayesian approach allows for their sampling variability. 3.4.3. A R C H models as approximate filters The purpose here is to draw attention to a subject that will be discussed in greater detail in Section 4.3. In an A R C H model the conditional variance is assumed to be an exact function of past observations. As pointed out by Nelson and Foster (1994, p.32) this assumption is ad hoe on both economic and statistical grounds. However, because A R C H models are relatively easy to estimate, Nelson (1992) and Nelson and Foster (1994) have argued that a useful strategy is to regard them as filters which produce estimates of the conditional variance. Thus even if we believe we have a continuous time or discrete time SV model, we may decide to estimate a GARCH(1,1) model and treat the aZts as an approximate filter, as in (3.4.1). Thus the estimate is a weighted average of past squared observations. It delivers an estimate of the mean of the distribution of a2 conditional on the t~ observations at time t - 1 . As an alternative, the model suggested by Taylor (1986) and Schwert (1989), in which the conditional standard deviation is set up as a linear combination of the previous conditional standard deviation and the previous absolute value, could be used. This may be more robust to outliers as it is a linear combination of past absolute values. Nelson and Foster derive an A R C H model which will give the closest approximation to the continuous time SV formulation (see Section 4.3 for more details). This does not correspond to one of the standard models, although it is fairly close to E G A R C H . For discrete time SV models the filtering theory is not as extensively developed. Indeed, Nelson and Foster point out that a change from stochastic differential equations to difference equations makes a considerable difference in the limit theorems and optimality theory. They study the case of near diffusions as an example to illustrate these differences. 3.5. Extensions of the model 3.5.1. Persistence and seasonality The simplest nonstationary SV model has ht following a random walk. The dynamic properties of this model are easily obtained if we work in terms of the logarithmically transformed observations, log ~ . All we have to do is first difference to give a stationary process. The untransformed observations are nonstationary but the dynamic structure of the model will appear in the ACF of ]yt/yt_l] c, provided that e < 0.5. The model is an alternative to I G A R C H , that is (3.3.1) with ~ + fl = 1. The I G A R C H model is such that the squared observations have some of the features of an integrated A R M A process and it is said to exhibit persistence; see Bollerslev

Stochastic volatility

149

and Engle (1993). However, its properties are not straightforward. For example it must contain a constant, 7, otherwise, as Nelson (1990) has shown, o-2 converges almost surely to zero and the model has the peculiar feature of being strictly stationary but not weakly stationary. The nonstationary SV model, on the other hand, can be analyzed on the basis that ht is a standard integrated process of order one. Filtering and smoothing can be carried out within the linear state space framework, since log y2 is just a random walk plus noise. The initial conditions are handled in the same way as is normally done with nonstationary structural time series models, with a proper prior for the state being effectively formed from the first observation; see Harvey (1989). The optimal filtered estimate of ht within the class of estimates which are linear in past log ~ ' s , that is htlt-1, is a constant plus an equally weighted moving average (EWMA) of past log ~ ' s . In IGARCH o-t z is given exactly by a constant plus an EWMA of past squared observations. The random walk volatility can be replaced by other nonstationary specifications. One possibility is the doubly integrated random walk in which A2ht is white noise. When formulated in continuous time, this model is equivalent to a cubic spline and is known to give a relatively smooth trend when applied in levels models. It is attractive in the SV context if the aim is to find a weighting function which fits a smoothly evolving variance. However, it may be less stable for prediction. Other nonstationary components can easily be brought into hr. For example, a seasonal or intra-daily component can be included; the specification is exactly as in the corresponding levels models discussed in Harvey (1989) and Harvey and Koopman (1993). Again the dynamic properties are given straightforwardly by the usual transformation applied to log y~, and it is not difficult to transform the absolute values suitably. Thus if the volatility consists of a random walk plus a slowly changing, nonstationary seasonal as in Harvey (1989, p. 40-3), the appropriate transformations are A~ log ~ and [ Yt/Yt-s [c where s is the number of seasons. The state space formulation follows along the lines of the corresponding structural time series models for levels. Handling such effects is not so easy within the GARCH framework. Different approaches to seasonality can also be incorporated in SV models using ideas of time deformation as discussed in a later sub-section. Such approaches may be particularly relevant when dealing with the kind of abrupt changes in seasonality which seem to occur in high frequency, like five minute or tick-by-tick, foreign exchange data.

3.5.2. Interventions and other deterministic effects Intervention variables are easily incorporated into SV models. For example, a sudden structural change in the volatility process can be captured by assuming that
2 =_ log log ~r t ~r2

+ ht +

2Wt

150

E. Ghysels, A. C. Harvey and E. Renault

where wt is zero before the break and one after, and 2 is an unknown parameter. The logarithmic transformation gives (3.2.8) but with 2wt added to the right hand side. Care needs to be taken when incorporating such effects into A R C H models. For example, in the GARCH(1,1) a sudden break has to be modelled as
2

with 2 constrained so that a~ is always positive. More generally observable explanatory variables, as opposed to intervention dummies, may enter into the model for the variance.

3.5.3. Multivariate models The multivariate model corresponding to (3.1.2) assumes that each series is generated by a model of the form
Yit = CiiEit eO'5hit
~

t = 1 , . . . , T,

(3.5.1)

with the covariance (correlation) matrix of the vector et = (elt,...,eNt) t being denoted by Ez. The vector of volatilities, ht, follows a VAR(1) process, that is

ht+l

= right -}- tl t ,

where t/t ,-~ lID(O, ~ ) . This specification allows the movements in volatility to be correlated across different series via E n. Interactions can be picked up by the offdiagonal elements of 4 . The logarithmic transformation of squared observations leads to a multivariate linear state space model from which estimates of the volatilities can be computed as in Section 3.4.1. A simple nonstationary model is obtained by assuming that the volatilities follow a multivariate random walk, that is = I. If Y~ is singular, of rank K < N, there are only K components in volatility, that is each hit in (3.5.1) is a linear combination of K < N common trends, that is

ht = Oh~ + h

(3.5.2)

where h~ is the K x 1 vector of common random walk volatilities, h is a vector of constants and O is an N x K matrix of factor loadings. Certain restrictions are needed on O and h to ensure identifiability; see Harvey, Ruiz and Shephard (1994). The logarithms of the squared observations are "co-integrated" in the sense of Engle and Granger (1987) since there are N - K linear combinations of them which are white noise and hence stationary. This implies, for example, that if two series of returns exhibit stochastic volatility, but this volatility is the same with O' = (1, 1), then the ratio of the series will have no stochastic volatility. The application of the related concept of "co-persistence" can be found in Bollerslev and Engle (1993). However, as in the univariate case there is some ambiguity about what actually constitutes persistence.

Stochastic volatility

151

There is no reason why the idea of common components in volatility should not extend to stationary models. The formulation of (3.5.2) would apply, without the need for h, and with h~ modelled, for example, by a VAR(1). Bollerslev, Engle and Wooldridge (1988) show that a multivariate GARCH model can, in principle, be estimated by maximum likelihood, but because of the large number of parameters involved computational problems are often encountered unless restrictions are made. The multivariate SV model is much simpler than the general formulation of a multivariate GARCH. However, it is limited in that it does not model changing covariances. In this sense it is analogous to the restricted multivariate GARCH model of Bollerslev (1986) in which the conditional correlations are assumed to be constant. Harvey, Ruiz and Shephard (1994) apply the nonstationary model to four exchange rates and find just two common factors driving volatility. Another application is in Mahieu and Schotman (1994b). A completely different way of modelling exchange rate volatility is to be found in the latent factor ARCH model of Diebold and Nerlove (1989).

3.5.4. Observation intervals, aggregation and time deformation Suppose that a SV model is observed every b time periods. In this case, h~, where z denotes the new observation (sampling) interval, is still AR(1) but with parameter ~b ~. The variance of the disturbance, t/t, increases, but a 2 remains the same. This property of the SV model makes it easy to make comparisons across different sampling intervals; for example it makes it clear why if q5 is around 0.98 for daily observations, a value of around 0.9 can be expected if an observation is made every week (assuming a week has 5 days). If averages of observations are observed over the longer period, the comparison is more complicated, as h~ will now follow an ARMA(1, l) process. However, the AR parameter is still q5 ~. Note that it is difficult to change the observation interval of ARCH processes unless the structure is weakened as in Drost and Nijman (1993); see also Section 4.4.1. Since, as noted in Section 2.4, one typically uses a discrete time approximation to the continuous time model, it is quite straightforward to handle irregularly spaced observations by using the linear state space form as described, for example, in Harvey (1989). Indeed the approach originally proposed by Clark (1973) based on subordinated processes to describe asset prices and their volatility fits quite well into this framework. The techniques for handling irregularly spaced observations can be used as the basis for dealing with time deformed observations, as noted by Stock 0988). Ghysels and Jasiak (1994a,b) suggest a SV model in which the operational time for the continuous time volatility equation is determined by the flow of information. Such time deformed processes may be particularly suited to dealing with high frequency data. If z = g(t) is the mapping between calendar time z and operational time t, then dSt = #Stdt + a(g(t) )StdWlt
and

152

E. Ghysels, A. C. Harvey and E. Renault

dlog a(z) = a((b - log a ( z ) ) d z + cdW2~ where Wit and W2~ are standard, independent Wiener processes. The discrete time approximation generalizing (3.1.3), but including a term which in (3.1.2) is incorporated in the constant scale factor a, is then
ht+l = [1 - e-aAo(t)]b + e-ag(t)ht + ~lt

where Ag(t) is the change in operational time between two consecutive calendar time observations and qt is normally distributed with mean zero and variance e 2 ( 1 - e-2aAo(t))/2a. Clearly if A g ( t ) = 1, ~b = e -a in (3.1.3). Since the flow of information, and hence Ag(t), is not directly observable, a mapping to calendar time must be specified to make the model operational. Ghysels and Jasiak (1994a) discuss several specifications revolving around a scaled exponential function relating g(t) to observables such as past volume of trade and past price changes with asymmetric leverage effects. This approach was also used by Ghysels and Jasiak (1994b) to model return-volume co-movements and by Ghysels, Gouri6roux and Jasiak (1995b) for modeling intra-daily high frequency data which exhibit strong seasonal patterns (cf. Section 3.5.1).
3.5.5. L o n g m e m o r y Baillie, Bollerslev and Mikkelsen (1993) propose a way of extending the G A R C H class to account for long memory. They call their models Fractionally Integrated G A R C H (FIGARCH), and the key feature is the inclusion of the fractional difference operator, (1 - L) d, where L is the lag operator, in the lag structure of past squared observations in the conditional variance equation. However, this model can only be stationary when d = 0 and it reduces to GARCH. In a later paper, Bollerslev and Mikkelsen (1995) consider a generalization of the E G A R C H model of Nelson (1991) in which log o3 is modelled as a distributed lag of past et ~s involving the fractional difference operator. This F I E G A R C H model is stationary and invertible i f [ d I< 0.5. Breidt, Crato and de Lima (1993) and Harvey (1993) propose a SV model with ht generated by fractional noise

h, = nt/(1 - L ) u ,

tlt ~ NID(O, a~) ,

0< d < 1 .

(3.5.1)

Like the AR(1) model in (3.1.3), this process reduces to white noise and a random walk at the boundary of the parameter space, that is d = 0 and 1 respectively. However, it is only stationary if d < 0.5. Thus the transition from stationarity to nonstationarity proceeds in a different way to the AR(1) model. As in the AR(1) case it is reasonable to constrain the autocorrelations in (3.5.1) to be positive. However, a negative value of d is quite legitimate and indeed differencing ht when it is nonstationary gives a stationary "intermediate memory" process in which -0.5 < d < 0. The properties of the long memory SV model can be obtained from the formulae in sub-Section 3.2. A comparison of the ACF for ht following a long

Stochastic volatility

153

memory process with d = 0.45 and ah 2 = 2 with the corresponding ACF when ht is AR(1) with ~b = 0.99 can be found in Harvey (1993). Recall that a characteristic property of long memory is a hyperbolic rate of decay for the autocorrelations instead of an exponential rate, a feature observed in the data (see Section 2.2e). The slower decline in the long memory model is very clear and, in fact, for z = 1000, the long memory autocorrelation is still 0.14, whereas in the AR case it is only 0.000013. The long memory shape closely matches that in Ding, Granger and Engle (1993, p. 86-8). The model may be extended by letting ~/t be an A R M A process and/or by adding more components to the volatility equation. As regards smoothing and filtering, it has already been noted that the state space approach is approximate because of the truncation involved and is relatively cumbersome because of the length of the state vector. Exact smoothing and filtering, which is optimal within the class of estimators linear in the log Yt2,s , can be carried out by a direct approach if one is prepared to construct and invert the T x T covariance matrix of the log y~ ' s .

4. Continuous time models

At the end of Section 2 we presented a framework for statistical modelling of SV in discrete time and devoted the entire Section 3 to specific discrete time SV models. To motivate the continuous time models we study first of all the exact relationship (i.e. without approximation error) between differential equations and SV models in discrete time. We examine this relationship in Section 4.1 via a class of statistical models which are closed under temporal aggregation and proceed (1) from high frequency discrete time to lower frequencies and (2) from continuous time to discrete time. Next, in Section 4.2, we study option pricing and hedging with continuous time models and elaborate on features such as the smile effect. The practical implementation of option pricing formulae with SV often requires discrete time SV and/or A R C H models as filters and forecasters of the continuous time volatility processes. Such filters, covered in Section 4.3, are in general discrete time approximations (and not exact discretizations as in Section 4.1) of continuous time SV models. Section 4.4 concludes with extensions of the basic model.
4.1. From discrete to continuous time

The purpose of this section is to provide a rigorous discussion of the relationship between discrete and continuous time SV models. The presentation will proceed first with a discussion of temporal aggregation in the context of the SARV class of models and focus on specific cases including G A R C H models. This material is covered in Section 4.1.1. Next we turn our attention to the aggregation of continuous time SV models to yield discrete time representations. This is the subject matter of Section 4.1.2.

154

E. Ghysels, A. C. Harvey and E. Renault

4.1.1. Temporal aggregation of discrete time models Andersen's SARV class of models was presented in Section 2.4 as a general discrete time parametric SV statistical model. Let us consider the zero-mean case, namely:
Yt+I :- at~t+l

(4.1.1)

and a q for q = 1 or 2 is a polynomial function o(Kt) of the Markov process Kt with stationary autoregressive representation:

Kt = co + flKt_l +Or
where I/~1 < 1 and E[et+l lez, ~ z < t] ----0

(4.1.2)

(4.1.3a) (4.1.3b) (4.1.3c)

E[e2+lle~, o~ <_ t] = 1
E[Vt+l lez, oz z < t] = 0 .

The restrictions (4.1.3a-c) imply that o is a martingale difference sequence with respect to the filtration ~ t = a[e~, v~, z < t].22 Moreover, the conditional moment conditions in (4.3.1a-c) also imply that e in (4.1.1) is a vCnite noise process in a semi-strong sense, i.e. E[et+lle~,z < t] = 0 and E[e2+t[e,,z <__t] = 1, and is not Granger-caused by 0.23 From the very beginning of Section 2 we choose the continuously compounded rate of return over a particular time horizon as the starting point for continuous time processes. Therefore, let Yt+l in (4.1.1) be the continuously compounded rate of return for [t, t + 1] of the asset price process St, consequently:
Yt+l -~

log St+l/St

(4.1.4)

Since the unit of time of the sampling interval is to a large extent arbitrary, we would surely want the SV model defined by equations (4.1.1) through (4.1.3), (for given q and function O) to be closed under temporal aggregation. As rates of return are flow variables, closure under temporal aggregation means that for any integer m:
m-I

y}m ~) -- log Stm/atm_ m = E


k=O

ytm-k

is again conformable to a model of the type (4.1.1) through (4.1.3) for the same choice of q and 9 involving suitably adapted parameter values. The analysis in this section follows Meddahi and Renault (1995) who study temporal aggregation of SV models in detail, particularly the case a 2 = Kt, i.e. q = 2 and 9 is the identity
22 Note that we do not use here the decomposition appearing in (2.4.9) namely, ot = [y + aKt 1]fit. 23 The Granger noncausality considered here for et is weaker than Assumption 2.3.2.A as it applies only to the first two conditional moments.

Stochastic volatility

155

function. It is related to the so called continuous time G A R C H approach of Drost and Werker (1994). Hence, we have (4.1.1) with:
= co + f l a 2 , + vt

(4.1.5)

With conditional moment restrictions (4.1.3a-c) this model is closed under aggregation. For instance, for m = 2:

y(2) (2) (2) t+l ~ Yt+l q- Yt = o't_lt+l


with:

0(2)((2) -~2__ (2) (0.121)2= W(2) q- P ~O't-3) -I-Dr-1


where: w 2) = 2o)(1 + r)

fl(2) V}21

= r2 = ( f l + l)[flOt-2+Ot-l]

Moreover, it also worth noting that whenever a leverage effect is present at the aggregate level, i.e.:

CoY [D}221,C}22l] 5 0
with e}z) 1 = (Y,-I + Yt-2)/0-}2_)3, it necessarily appears at the disaggregate level, i.e.

Cov(v,, e,) 0.
For the general case Meddahi and Renault (1995) show that model (4.1.5) together with conditional moment restrictions (4.1.3a-c) is a class of processes closed under aggregation. Given this result, it is of interest to draw a comparison with the work of Drost and Nijman (1993) on temporal aggregation of GARCH. While establishing this link between Meddahi and Renault (1995) and Drost and Nijman (1993) we also uncover issues of leverage properties in G A R C H models. Indeed, contrary to what is often believed, we find leverage effect restrictions in G A R C H processes. Moreover, we also find from the results of Meddahi and Renault that the class of weak G A R C H processes includes certain SV models. To find a class of G A R C H processes which is closed under aggregation Drost and Nijman (1993) weakened the definition of GARCH, namely for a positive stationary process 0-:
0-t 2 = W@ ay2_l + b0-2_1

(4.1.6)

where a + b < 1, they defined:


-

strong G A R C H if Yt+l/0-t is i.i.d, with mean zero and variance 1

156 -

E. Ghysels, A. C. Harvey and E. Renault

semi-strong G A R C H if E [Yt$11Y~, ~ -< tl = 0 and E[y2+I[Y~'2 z2_ < t]t]= if2 o-2.24 weak G A R C H ifEL[yt+lly~,y2,z < t] = 0; E L [Yt+lly~,yz,z < =

D r o s t and N i j m a n show that weak G A R C H processes temporally aggregate and provide explicit formulae for their coefficients. In Section 2.4 it was noted that the f r a m e w o r k of S A R V includes G A R C H processes whenever there is no r a n d o m n e s s specific to the volatility process. This p r o p e r t y will allow us to show that the class of weak G A R C H processes - as defined above - in fact includes m o r e general SV processes which are strictly speaking not G A R C H . The arguments, following M e d d a h i and Renault (1995), require a classification o f the models defined by (4.1.3) and (4.1.5) according to the value of the correlation between ut and ~ , namely: (a) Models with perfect correlation: This first class, henceforth denoted C1, is characterized by a linear correlation between ot and ~ conditional on (e~, v,, z < t) which is either 1 o r - 1 for the model in (4.1.5). (b) Models without perfect correlation: This second class, henceforth denoted C2, has the above conditional correlation less than one in absolute value. The class C1 contains all semi-strong G A R C H processes, indeed whenever V a r [ ~ ] e t , u~,z < t] is p r o p o r t i o n a l to Var[v, le~,o~,z < t] in C 1 w e have a semistrong G A R C H . Consequently, a semi-strong G A R C H processes is a model (4.1.5) with (1) restrictions (4.1.3), (2) a perfect conditional correlation as in C1, and (3) restrictions on the conditional kurtosis dynamics. 25 Let us consider n o w the following assumption: ASSUMPTION 4.1.1. The following two conditional expectations are zero:

E[etotle~, ~,-c < t] = 0 E[e~le~,o~,z < t] = 0


.

(4.1.7a) (4.1.7b)

This assumption a m o u n t s to an absence of leverage effects, where the latter is defined in a conditional covariance sense to capture the notion o f instantaneous causality discussed in Section 2.4.1 and applied here in the context o f weak white noise. 26 It should also be noted that (4.1.7a) and (4.1.7b) are in general not equivalent except for the processes of class C1. The class C2 allows for r a n d o m n e s s p r o p e r to the volatility process due to the imperfect correlation. Yet, despite this volatility-specific r a n d o m n e s s one can 24 For any Hilbert space H of L2, EL[xtlz, z C 11] is the best linear predictor ofxt in terms of I and z E H. It should be noted that a strong GARCH process is afortiori semi-strong which itself is also a weak GARCH process. 25 In fact, Nelson and Foster (1994) observed that the most commonly used ARCH models effectively assume that the variance of the variance rises linearly in az 4, which is the main drawback of ARCH models in approximating SV models in continuous time (see also Section 4.3). 26 The conditional expectation (4.1.7b) can be viewed as a conditional covariance between et and e2. It is this conditional covariance which, if nonzero, produces leverage effects in GARCH.

Stochastic volatility

157

show that under Assumption 4.1.1 processes of C2 sat!fly the weak G A R C H definition. Afortiori, any SV model conformable to (4.1.3a-c), (4.1.5), (4.7. l a b ) and Assumption 4.1.1 is a weak G A R C H process. It is indeed the symmetry assumptions (4.1.7a-b), or restrictions on leverage in G A R C H , that make EL [yt2+t y2, z < t] = o'2 (together with the conditional moment restrictions (4.1.3a-c)) and yield the internal consistency for temporal aggregation found by Drost and Nijman (1993, example 2, p. 915) for the class of so called symmetric weak GARCH(1,1). Hence, this class of weak GARCH(1,1)processes can be viewed as a subclass of processes satisfying (4.1.3) and (4.1.5). 27

lye,

4.1.2. Temporal aggregation of continuous time models


To facilitate our discussion we will specialize the general continuous time model (2.3.1) to processes with zero drift, i.e.: d logSt = atdWt
dat :

(4.1.8a) (4.1.8b) (4.1.8c)

ytdt + 6tdWt

Cov (d Wt, d WT) = ptdt

where the stochastic processes o-t,~Jt,6t and Pt are I t = [aT; z < t] adapted. To ensure that at is a nonnegative process one typically follows either one of two strategies: (1) considering a diffusion for log o-2 or (2) describing at2 as a CEV process (or Constant Elasticity of Variance process following Cox (1975) and Cox and Ross (1976)). 28 The former is frequently encountered in the option pricing literature (see e.g. Wiggins (1987)) and is also clearly related to Nelson (199l), who introduced E G A R C H , and to the log-Normal SV model of Taylor (1986). The second class of CEV processes can be written as

da2t = k ( O - aZ)dt + ~(at2)adW7

(4.1.9)

where 6 < 1/2 ensures that o-2 is a stationary process with nonnegative values. Equation (4.1.9) can be viewed as the continuous time analogue of the discrete time SARV class of models presented in Section 2.4. This observation establishes links with the discussion of the previous Section 4.1.1 and yields exact discretization results of continuous time SV models. Here, as in the previous section, it will be tempting to draw comparisons with the G A R C H class of models, in particular the diffusions proposed by Drost and Werker (1994) in line with the temporal aggregation of weak G A R C H processes.

27 AS noted before, the class of processes satisfying (4.1.3) and (4.1.5) is closed under temporal aggregation, including processeswith leverageeffectsnot satisfyingAssumption4.1.1. 28 Occasionallyone encountersspecificationswhich do not ensure nonnegativityof the at process. For the sake of computational simplicity some authors for instance have considered Ornstein-Uhlenbeck processesfor at or at 2 (see e.g. Stein and Stein (1991)).

158

E. Ghysels, A. C. Harvey and E. Renault

Firstly, one should note that the CEV process in (4.1.9) implies an autoregressive model in discrete time for a 2 , namely: 2 fit+At : 0 ( 1 -- e - k a t ) -I- e -kAt a t 2 +e-kat

it+At ek(U-t)7(a2)6dW~ . ,It


(4.1.10)

Meddahi and Renault (1995) show that whenever (4.1.9) and its discretization (4.1.10) govern volatility, the discrete time process log St+(k+l)At/St+kAt~ k C ~- is a SV process satisfying the model restrictions (4.1.3a-c) and (4.1.5). Hence, from the diffusion (4.1.9) we obtain the class of discrete time SV models which is closed under temporal aggregation, as discussed in the previous section. To be more specific, consider for instance At = 1 , then from (4.1.10) it follows that:

Yt+l

log St+l/St

= fit(1)~ t + l

(fill))2 : W fl(fi~l))2Ot
where from (4.1.10):

(4.1.11)

fl=e-k,w=O(1--e-k),
{1-e ~'~ -k rt+I
e

3,

e {u-t) (fi2u) Wg

(4.1.12)

It is important to note from (4.1.12) that absence of leverage effect in continuous time, i.e. Pt 0 in (4.1.8c), means no such effect at low frequencies and the two symmetry conditions of Assumption 4.1.1 are fulfilled. This line of reasoning also explains the temporal aggregation result of Drost and Werker (1994), but one more generally can interpret discrete time SV models with leverage effects as exact discretizations of continuous time SV models with leverage.
-~-

4.2. Option pricing and hedging


Section 4.2.1 is devoted to the basic option pricing model with SV, namely the Hull and White model of Section 2. We are better equipped now to elaborate on its theoretical foundations. The practical implications appear in Section 4.2.2 while 4.2.3 concludes with some extensions of the basic model.

4.2.1. The basic option pricing formula Consider again formula (2.1.10) for a European option contract maturing at time t + h =-T. As noted in Section 2.1.2, we assume continuous and frictionless trading. Moreover no arbitrage profits can be made from trading in the underlying asset and riskless bonds ; interest rates are nonstochastic so that B(t, T) defined by (2.1.12) denotes the time t price of a unit discount bond maturing at time T. Consider now the probability space (fl , ~ , P ) , which is the fundamental space of the underlying asset price process S:

Stochastic volatility

159

ast/st = #(t, St, ut)at + atdWts a 2 = f(Ut) aut = a(t, Ut)dt + b(t, u t ) a w 7

(4.2.1)

where Wt = (W s, WT) is a standard two dimensional Brownian Motion (Ws and W7 are independent, zero-mean and unit variance) defined on (f~ ,if,P). The function f , called the volatility function, is assumed to be one-to-one. In this framework (under suitable regularity conditions) the no free lunch assumption is equivalent to the existence of a probability distribution Q on (D,~), equivalent to P, under which discounted price processes are martingales (see Harrison and Kreps (1979)). Such a probability is called an equivalent martingale measure and is unique if and only if the markets are complete (see Harrison and Pliska (1981)).29 From the integral form of martingale representations (see Karatzas and Shreve (1988), p. 184), the (positive) density process of any probability measure Q equivalent to P can be written as: t 1
t $2

Mt=exp[-fo

2sdWS-~fo(2U)

du
(4.2.2)

a a - ~ 1 f0 t (2 - ~ t 2udW: ~~ ) 2du ]

where the processes 2s and 2 ~ are adapted to the natural filtration tTt -- o'[Wz,"c < t], t > 0, and satisfy the integrability conditions (almost surely): 2 <+c~and 2 <+c~ . defined by: (4.2.3)

By Girsanov's theorem the process W = ( w S ~ ) ' ~s = ~s +

/0'
7t

2Sdu and W7 = W7 +

/0

22du

is a two dimensional Brownian Motion under Q. The dynamic of the underlying asset price under Q is obtained directly from (4.2.l) and (4.2.3). Moreover, the discounted asset price process StB(O, t), 0 < t < T, is a Q-martingale if and only if for rt defined in (2.1.11):

2s t _ #(t, St, Ut) - rt

(4.2.4)

Since S is the only traded asset, the process 2 ~ is not fixed. The process 2s defined by (4.2.4) is called the asset risk premium. By analogy, any process 2~ satisfying the required integrability condition can be viewed as a volatility risk

29 Here, the market is seen as incomplete(before taking into account the market pricing of the option) so that we have to characterizea set of equivalentmartingalemeasures.

160

E. Ghysels, A. C. Harvey and E. Renault

premium and for any choice of 2 ~ , the probability Q(2 ~) defined by the density process M in (4.2.2) is an equivalent martingale measure. Therefore, given the volatility risk premium process 2~: C [ = B(t, T)E Q(~)[Max[0,Sr - K]] , 0< t< T (4.2.5)

is an admissible price process of the European call option. 3 The Hull and White option pricing model relies on the following assumption, which restricts the set of equivalent martingale measures: ASSUMPTION 4.2.1. The volatility risk premium 27 only depends on the current value of the volatility process: 47 = 2~(t, Ut), Vt c [0, T]. This assumption is consistent with an intertemporal equilibrium model where the agent preferences are described by time separable isoelastic utility functions (see He (1993) and Pham and Touzi (1993)). It ensures that wS and W" are independent, so that the Q(2 ") distribution of log St~St, conditionally on ~t and the volatility path (at, 0 < t < T) is normal with mean ftr r u d u - 72(t, T) and variance y2(t, T ) = ftr a2du. Under Assumption 4.2.1 one can compute the expectation in (4.2.5) conditionally on the volatility path, and obtain finally: C [ = StE Q(x~)[q~(dlt) - e-X'dp(d2t)] (4.2.6)

with the same notation as in (2.1.20). To conclude it is worth noting that many option pricing formulae available in the literature have a feature common with (4.2.6) as they can be expressed as an expectation of the Black-Scholes price over a heterogeneous distribution of the volatility parameter (see Renault (1995) for an elaborate discussion on this subject). 4.2.2. Pricing and hedging with the Hull and White model The Markov feature of the process (S, G) implies that the option price (4.2.6) only depends on the contemporaneous values of the underlying asset prices and its volatility. Moreover, under mild regularity conditions, this function is differentiable. Therefore, a natural way to solve the hedging problem in this stochastic volatility context is to hedge a given option of price C] by A~ units of the underlying asset and ~ t units of any other option of price Ct2 where the hedging ratios solve:

{ o C / o s , - A* 2 t - ~ t OC;/OSt = 0 /Oa, ~ , * OC,2/Oa, = 0

(4.2.7)

Such a procedure, known as the delta-sigma hedging strategy, has been studied by Scott (1991). By showing that any European option completes the market, i.e. OQ2,/Oat # O, 0 < t < T, Bajeux and Rochet (1992) justify the existence of an 30 Here elsewhere E~(.) = EQ(.I~t) stands for the conditional expectation operator given o~t when the price dynamics are governed by Q.

Stochastic volatility

161

unique solution to the delta-sigma hedging problem (4.2.7) and the implicit assumption in the previous sections that the available information/t contains the past values (St, at), z < t. In practice, option traders often focus on the risk due to the underlying asset price variations and consider the imperfect hedging strategy ~ t = 0 and At = oclt/ost Then, the Hull and White option pricing formula (4.2.6) provides directly the theoretical value of At:
At = OCt2~/OSt = EQ(X')~b(dlt)

(4.2.8)

This theoretical value is hard to use in practice since: (1) even if we knew the Q(2 ) conditional probability distribution of dlt given It (summarized by at), the derivation of the expectation (4.2.8) might be computationally demanding and (2) the conditional probability is directly related to the conditional probability distribution of 72(t, T) = ftr ~r2du given at, which in turn may involve nontrivially the parameters of the latent process at. Moreover, these parameters are those of the conditional probability distribution o f ])2(t, T) given at under the risk-neutral probability Q(2 ~) which is generally different from the Data Generating Process P. The statistical inference issues are therefore quite complicated. We will argue in Section 5 that only tools like simulation-based inference methods involving both asset and option prices (via an option pricing model) may provide some satisfactory solutions. Nevertheless, a practical way to avoid these complications is to use the BlackScholes option pricing model, even though it is known to be misspecified. Indeed, option traders know that they cannot generally obtain sufficiently accurate option prices and hedge ratios by using the BS formula with historical estimates of the volatility parameters based on time series of the underlying asset price. However, the concept of Black-Scholes implied volatility (2.1.23) is known to improve the pricing and hedging properties of the BS model. This raises two issues: (1) what is the internal consistency of the simultaneous use of the BS model (which assumes constant volatility) and of BS implied volatility which is clearly time-varying and stochastic and (2) how to exploit the panel structure of option pricing errors? 31 Concerning the first issue, we noted in Section 2 that the Hull and White option pricing model can indeed be seen as a theoretical foundation for this practice of pricing. Hedging issues and the panel structure of option pricing errors are studied in detail in Renault and Touzi (1992) and Renault (1995).
4.2.3. Smile or smirk? As noted in Section 2.2, the smile effect is now a well documented empirical stylized fact. Moreover the smile becomes sometimes a smirk since it appears more or less lopsided (the so called skewness effect). We cautioned in Section 2 that some explanations of the smile/smirk effect are often founded on tempting analogies rather than rigorous proofs.

3l The valueof a whichequates the BS formulato the observedmarket price of the option heavily depends on the actual date t, the strike price K, the time to maturity (T - t) and thereforecreates a panel data structure.

162

E. Ghysels,A. C. Harvey and E. Renault

To the best of our knowledge, the state of the art is the following: (i) the first formal proof that a Hull and White option pricing formula implies a symmetric smile was provided by Renault and Touzi (1992), (ii) the first complete proof that the smile/smirk effects can alternatively be explained by liquidity problems (the upper parts of the smile curve, i.e. the most expensive options are the least liquid) was provided by Platten and Schweizer (1994) using a microstructure model, (iii) there is no formal proof that asymmetries of the probability distribution of the underlying asset price process (leverage effect, non-normality .... ) are able to capture the observed skewness of the smile. A different attempt to explain the observed skewness is provided by Renault (1995). He showed that a slight discrepancy between the underlying asset price St used to infer BS implied volatilities and the stock price St considered by option traders may generate an empirically plausible skewness in the smile. Such nonsynchronous St and St may be related to various issues: bid-ask spreads, non-synchronous trading between the two markets, forecasting strategies based on the leverage effect, etc. Finally, to conclude it is also worth noting that a new approach initiated by Gouri6roux, Monfort, Tenreiro (1994) and followed also by Ait-Sahalia, Bickel, Stoker (1994) is to explain the BS implied volatility using a nonparametric function of some observed state variables. Gouri~roux, Monfort, Tenreiro (1995) obtain for example a good nonparametric fit of the following form: crt(St,K) = a(K) + b(K)(log St/St_l) 2 . A classical smile effect is directly observed on the intercept a(K) but an inverse smile effect appears for the path-dependent effect parameter b(K). For American options a different nonparametric approach is pursued by Broadie, Detemple, Ghysels and Torr6s (1995) where, besides volatility, exercise boundaries for the option contracts are also obtained. 32 4.3. Filtering and discrete time approximations In Section 3.4.3 it was noted that the A R C H class of models could be viewed as filters to extract the (continuous time) conditional variance process from discrete time data. Several papers were devoted to the subject, namely Nelson (1990, 1992, 1995a,b) and Nelson and Foster (1994, 1995). It was one of Nelson's seminal contributions to bring together A R C H and continuous time SV. Nelson's first contribution in his 1990 paper was to show that A R C H models, which model volatility as functions of past (squared) returns, converge weakly to a diffusion process, either a diffusion for log cr~ or a CEV process as described in Section 4.1.2. In particular, it was shown that a GARCH(1,1) model observed at finer and finer time intervals At = h with conditional variance parameters COh=hOg,~h=~(h/2) 1/2 and f l h = l - ~ ( h / 2 ) l / Z - O h and conditional mean

32 See also Bossaertsand Hillion(1995) for the use of a nonparametrichedgingprocedure and the smile effect.

Stochastic volatility

163

#h = hca2 converges to a diffusion limit quite similar to equations (4.1.8a) combined with (4.1.9) with 6 = 1, namely
d logSt = ca2dt + fftdWt d .,2 = - 04)at + 4aW7

Similarly, it was also shown that a sequence of AR(1)-EGARCH(1,1) models converges weakly to an Ornstein-Uhlenbeck diffusion for In a2: d In o-2 t = ~(fl - In a2t)dt + d W t Hence, these basic insights showed that the continuous time stochastic difference equations emerging as diffusion limits of A R C H models were no longer A R C H but instead SV models. Moreover, following Nelson (1992), even when misspecified, A R C H models still kept desirable properties regarding extracting the continuous time volatility. The argument was that for a wide variety of misspecified A R C H models the difference between the A R C H filter volatility estimates and the true underlying diffusion volatilities converges to zero in probability as the length of the sampling time interval goes to zero at an appropriate rate. For instance the GARCH(1,1) model with ~oh, c~hand ]~h described before estimates &t 2 as follows:
^2 7 t : (.Oh( 1 __

flh)-l+
i=o

O~hflhYt_h(i+l)

where yt : log St/St-h. This filter can be viewed as a particular case of equation (3.4.1). The GARCH(1,1) and many other models, effectively achieve consistent estimation of at via a lag polynomial function of past squared returns close to time t. The fact that a wide variety of misspecified A R C H models consistently extract at from high frequency data raises questions regarding efficiency of filters. The answers to such questions are provided in Nelson (1995a,b) and Nelson and Foster (1994, 1995). In Section 3.4 it was noted that the linear state space Kalman filter can also be viewed as a (suboptimal) extraction filter for O"t. Nelson and Foster (1994) show that the asymptotically optimal linear Kalman filter has asymptotic variance for the normalized estimation error h-1/4[ln(~-2) - - l n f f ~ ] equal to 2Y(1/2) V2 where Y ( x ) = d[lnF(x)]/dx and 2 is a scaling factor. A model, closely related to E G A R C H of the following form: ln(~-2+h) = ln(?r2) + p2(St+h- St)6t l + 2 ( 1 - p2)V2[F(1/2)'/2F(3/2)1/21St+h - St]&t 1 -

2-1/2]

yields the asymptotically optimal A R C H filter with asymptotic variance for the normalized estimation error equal to 2 1 2 ( 1 - p2)]l/Zwhere the parameter p measures the leverage effect. These results also show that the differences between

164

E. Ghysels, A. C. Harvey and E. Renault

the most efficient suboptimal Kalman filter and the optimal A R C H filter can be quite substantial. Besides filtering one must also deal with smoothing and forecasting. Both of these issues were discussed in Section 3.4 for discrete time SV models. The prediction properties of (misspecified) A R C H models were studied extensively by Nelson and Foster (1995). Nelson (1995) takes A R C H models a step further by studying smoothing filters, i.e. A R C H models involving not only lagged squared returns but also future realizations, i.e. r = t - T in equation (3.4.1).
4.4. L o n g m e m o r y

We conclude this section with a brief discussion of long memory in continuous time SV models. The purpose is to build continuous time long memory stochastic volatility models which are relevant for high frequency financial data and for (long term) option pricing. The reasons motivating the use of long memory models were discussed in sections 2.2 and 3.5.5. The advantage of considering continuous time long memory is their relative ability to provide a more structural interpretation of the parameters governing short term and long term dynamics. The first subsection defines fractional Brownian Motion. Next we will turn our attention to the fractional SV model followed by a section on filtering and discrete time approximations.
4.4.1. Stochastic integration with respect to f r a c t i o n a l Brownian M o t i o n

We recall in this subsection a few definitions and properties of fractional and long memory processes in continuous time, extensively studied for instance in Comte and Renault (1993). Consider the scalar process:
xt = a(t - s)dWs .

(4.4.1)

Such a process is asymptotically equivalent in quadratic mean to the stationary process:


Yt =

.L'
O0

a(t - s)dWs

(4.4.2)

whenever f o ~ aZ(x)dx < +e~. Such processes are called fractional processes if a(x) = x a(x)/r(1 + )for < 1/2, a continuously differentiable on [0, T] and where F(1 + ~) is a scaling factor useful for normalizing fractional derivative operators on [0, T]. Such processes admit several representations, and in particular they can also be written:
xt =

fo' c(t -

s)dW~s,

W~t

= Jo [, F(1 (t :-+

~) dW~

(4.4.3)

where W~ is the so-called fractional Brownian Motion of order ~ (see Mandelbrot and Van Ness (1968)).

Stochastic volatility

165

The relation between the functions a and c is one-to-one. One can show that W~ is not a semi-martingale (see e.g. Rogers (1995)) but stochastic integration with respect to W~ can be defined properly. The processes xt are long memory if:
X----~ -}-O0

lim x ? t ( x ) = a o ~ , O < ~ < 1/2

and

O<ao~<+cx~ ,

(4.4.4)

for instance,
dxt = - k r t d t + crdW~t xt = O,k > O ,

0<~<

1/2

(4.4.5)

with its solution given by: xt = (t - s)~(r(1 + e ) ) - l d x f (4.4.6a)

x}~) =

I'

e -k(t-s) a d Ws

(4.4.6b)

Note that, x}~) the derivative of order ~ of xt, is a solution of the usual SDE:
dzt : - k z t d t + a d W t .

4,4.2. The fractional S V model

To facilitate comparison with both the F I E G A R C H model and the fractional extensions of the log-Normal SV model discussed in Section 3.5.5 let us consider the following fractional SV model (henceforth FSV):
d S t / S t = tTtdm t

(4.4.7a) (4.4.7b)

d log at = - k l o g trtdt + 7dW~t

where k > 0 and 0 _< ~ < 1/2. If nonzero, the fractional exponent ~ will provide some degree of freedom in the order of regularity of the volatility process, namely the greater ~ the smoother the path of the volatility process. If we denote the autocovariance function of o- by r~(.) then:
>O=~(r~(h)-r~(O))/h~O

as

h~0

This would be incorrectly interpreted as near-integrated behavior, widely found in high frequency data for instance, when:
ro(h)-r~(O)/h= (ph_X)/h~logp

as

h~0

and ~rt is a continuous time AR(1) with correlation p near 1. The long memory continuous time approach allows us to model persistence with the following features:(1) the volatility process itself (and not just its logarithm) has hyperbolic decay of the correlogram ; (2) the persistence of volatility shocks yields leptokurtic features for returns which vanishes with temporal

166

E. Ghysels, A. C. Harvey and E. Renault

aggregation at a slow hyperbolic rate of decay. 33 Indeed for rate of return on

[0,h]:

E[log St+h/St - E(log St+h/St)] 4 ---, 3 (E[log St+h/St-E(log St+h/St)]2) 2 as h --* ~ at a rate h 2~-1 if ~ 6 [0, 1/2] and a rate exp(-kh/2) if ~ = 0.

4.4.3. Filtering and discrete time approximations


The volatility process dynamics are described by the solution to the SDE (4.4.5), namely: log o't = (t - s)~/F(1 + e)dlog ~!~) (4.4.6)

where log o-(~) follows the O-U process: d log a}~) -~ - k log a}~)dt + 7dWt (4.4.7)

To compute a discrete time approximation one must evaluate numerically the integral (4.4.6) using only values of the process log ~(~) on a discrete partition of [o, t] at points j / n , j = 0, 1 . . . , [nt]. 34 m natural way to proceed is to use step functions, generating the following proxy process: [nt] log~ = ~(t-(jj=l

1)/n)~/F(1 + e ) A l o g o ' ~

(4.4.8)

where A loga(~ ) = log o-(~ ) -loga!~. ) . . . . Comte and Renault (1995) show that J/n j/n tJ - t)/n log &,t converges to the log o-t process for n ---+~ uniformly on compact sets. Moreover, by rearranging (4.4.8) one obtains:

i.1 where L~ is the lag operator corresponding to the sampling scheme j/n,
(1 -

loggr~/~ = [~=o([(i+l)~-i~]/n~r(l+c~))L

logcr (~)j/n

(4.4.9) i.e.

L, Zj/, = Z(j-1)/n. With this sampling scheme logo-(~) is a discrete time AR(1)
deduced from the continuous time process with the following representation:

pnL,)logcr~ = Uj/n

(4.4.10)

where Pn = exp(-k/n) and uj/n is the associated innovations process. Since the process is stationary we are allowed to write (assuming log a~.~ = uj/. = 0 for j < 0):
33 With usual G A R C H or SV models, it vanishes at an exponential rate (see Drost and Nijman (1993) and Drost and Werker (1994) for these issues in the short memory case). 34 [Z] is the integer k such that k < z < k + 1.

Stochastic volatility

167

lg'(j~ = L/=~n~r(1 +

~) .]

(1 - pnLn)-luj/n

(4.4.11)

which gives a parameterization of the volatility dynamics in two parts: (1) a long memory part which corresponds to the filter Z+=~aiLin/n ~ with ai = [(i + 1)~-i~]/F(1 + ~) and (2) a short memory part which is characterized by the AR(1) process: (1 - PnLn)-luj/n. Indeed, one can show that the long memory filter is "long-term equivalent" to the usual discrete time long memory filters ( 1 - L ) -~ i n the sense that there is a long term relationship (a cointegration relation) between the two types of processes. However, this long-term equivalence between the long-memory filter and the usual discrete time one (1 - L)-~ does not imply that the standard parametrization FARIMA(1, a,0) is well-suited in our framework. Indeed, one can show that the usual discrete time filter ( 1 - L) -~ introduces some mixing between long and short term characteristics whereas the parsimonious continuous time model doesn't. 35 This feature clearly puts the continuous time FSV at an advantage with regard to the discrete time SV and G A R C H long-memory models.

5. Statistical inference

Evaluating the likelihood function of A R C H models is a relatively straightforward task. In sharp contrast for SV models it is impossible to obtain explicit expressions for the likelihood function. This is a generic feature common to almost all nonlinear latent variable models. The lack of estimation procedures for SV models made them for a long time an unattractive class of models in comparison to ARCH. In recent years, however, remarkable progress has been made regarding the estimation of nonlinear latent variable models in general and SV models in particular. A flurry of methods are now available and are up and running on computers with ever increasing CPU performance. The early attempts to estimate SV models used a G M M procedure. A prominent example is Melino and Turnbull (1990). Section 5.1 is devoted to G M M estimation in the context of SV models. Obviously, G M M is not designed to handle continuous time diffusions as it requires discrete time processes satisfying certain regularity conditions. A continuous time G M M approach, developed by Hansen and Scheinkman (1994), involves moment conditions directly drawn from the continuous time representation of the process. This approach is discussed in Section 5.3. In between, namely in Section 5.2, we discuss the QML approach suggested by Harvey, Ruiz and Shephard (1994) and Nelson (1988). It relies on the fact that the nonlinear (Gaussian) SV model can be transformed into a linear non-Gaussian state space model as in Section 3, and from this a Gaussian quasi-likelihood can be computed. None of the methods covered in Sections 5.1 through 5.3 involve simulation. However, increased computer power has made simulation-based es35 Namely, (1 -Ln)~log~/n is not an AR(1) process.

168

E. Ghysels, A. C. Harvey and E. Renault

timation techniques increasingly popular. The simulated method of moments, or simulation-based GMM approach proposed by Duffle and Singleton (1993), is a first example which is covered in Section 5.4. Next we discuss the indirect inference approach of Gouri&oux, Monfort and Renault (1993) and the moment matching methods of Gallant and Tauchen (1994) in Section 5.5. Finally, Section 5.6 covers a very large class of estimators using computer intensive Markov Chain Monte Carlo methods applied in the context of SV models by Jacquier, Polson and Rossi (1994) and Kim and Shephard (1994), and simulation based ML estimation proposed in Danielsson (1994) and Danielsson and Richard (1993). In each section we will only try to limit our focus to the use of estimation procedures in the context of SV models and avoid details regarding econometric theory. Some useful references to complement the material which will be covered are (1) Hansen (1992), Gallant and White (1988), Hall (1993) and Ogaki (1993) for G M M estimation, (2) Gouri6roux and Monfort (1993b) and Wooldridge (1994) for QMLE, (3) Gouri&oux and Monfort (1995) and Tauchen (1995) for simulation based econometric methods including indirect inference and moment matching, and finally (4) Geweke (1995) and Shephard (1995) for Markov Chain Monte Carlo methods. 5.1. Generalized method of moments Let us consider the simple version of the discrete time SV as presented in equations (3.1.2) and (3.1.3) with the additional assumption of normality for the probability distribution of the innovation process (et, t/t). This log-normal SV model has been the subject of at least two extensive Monte Carlo studies on GMM estimation of SV models. They were conducted by Andersen and Sorensen (1993) and Jacquier, Polson and Rossi (1994). The main idea is to exploit the stationary and ergodic properties of the SV model which yield the convergence of sample moments to their unconditional expectations. For instance, the second and fourth moments are simple expressions of 0-2 and 0-h 2, namely ~2exp(0-]/2) and 30-4exp(20-2) respectively. If these moments are computed in the sample, 0-2 can be estimated directly from the sample kurtosis, k, which is the ratio of the fourth moment to the second moment squared. The expression is just &2 = log(~/3). The parameter 0-2 can then be estimated from the second moment by substituting in this estimate of 0-2. We might also compute the first-order autocovariance of ~ , or simply the sample mean of ~y2_ 1 which has expectation a4exp({ 1 + q~}0-h 2) and from which, given the estimate of 0-2 and 0-h 2 , it is straightforward to get an estimate of ~b. The above procedure is an example of the application of the method of moments. In general terms, m moments are computed. For a sample of size T, let gr(fl) denote the m x 1 vector of differences between each sample moment and its theoretical expression in terms of the model parameters/L The generalized method of moments (GMM) estimator is constructed by minimizing the criterion function ]~r = Arg min gr(fl)' Wrgr(fl) P

Stochastic volatility

169

where Wr is an matching each of quier, Poison and by (3.2.2) for c =

m m weighting matrix reflecting the importance given to the moments. When et and r/t are mutually independent, JacRossi (1994) suggest using 24 moments. The first four are given 1,2, 3, 4, while the analytic expression for the others is:

E[I Y;Yt~-~ I] -c = 1,2 ,

~r2c2c F

/zc

---~ah[1 + ~]

z = 1,2, .., 10 .36

In the more general case when et and qt are correlated, Melino and Turnbull (1990) included estimates of: E[I Yt [ Yt-~], "c = 0, 1, -4-2,..., 10. They presented an explicit expression in the case of z = 1 and showed that its sign is entirely determined by p. The G M M method may also be extended to handle a non-normal distribution for et. The required analytic expressions can be obtained as in Section 3.2. On the other hand, the analytic expression of unconditional moments presented in Section 2.4 for the general SARV model may provide the basis of G M M estimation in more general settings (see Andersen (1994)). From the very start we expect the G M M estimator not to be efficient. The question is how much inefficiency should be tolerated in exchange for its relative simplicity. The generic setup of G M M leaves unspecified the number of moment conditions, except for the minimal number required for identification, as well as the explicit choice of moments. Moreover, the computation of the weighting matrix is also an issue since many options exist in practice. The extensive Monte Carlo studies of Andersen and Sorensen (1993) and Jacquier, Poison and Rossi (1994) attempted to answer these outstanding questions. In general they find that G M M is a fairly inefficient procedure primarily stemming from the stylized fact, noted in Section 2.2, that in equation (3.1.3) is quite close to unity in most empirical findings because volatility is highly persistent. For parameter values of close to unity convergence to unconditional moments is extremely slow suggesting that only large samples can rescue the situation. The Monte Carlo study of Andersen and Sorensen (1993) provides some guidance on how to control the extent of the inefficiency, notably by keeping the number of moment conditions small. They also provide specific recommendations for the choice of weighting matrix estimators with data-dependent bandwidth using the Bartlett kernel.

5.2. Quasi maximum likelihood estimation

5.2.1. The basic model


Consider the linear state space model described in sub-Section 3.4.1, in which (3.2.8) is the measurement equation and (3.1.3) is the transition equation. The



QML estimators of the parameters φ, σ_η² and the variance of ξ_t, σ_ξ², are obtained by treating ξ_t and η_t as though they were normal and maximizing the prediction error decomposition form of the likelihood obtained via the Kalman filter. As noted in Harvey, Ruiz and Shephard (1994), the quasi maximum likelihood (QML) estimators are asymptotically normal, with covariance matrix given by applying the theory in Dunsmuir (1979, p. 502). This assumes that η_t and ξ_t have finite fourth moments and that the parameters are not on the boundary of the parameter space.

The parameter ω can be estimated at the same time as the other parameters. Alternatively, it can be estimated as the mean of the log y_t²'s, since this is asymptotically equivalent when φ is less than one in absolute value.

Application of the QML method does not require the assumption of a specific distribution for ε_t. We will refer to this as unrestricted QML. However, if a distribution is assumed, it is no longer necessary to estimate σ_ξ², as it is known, and an estimate of the scale factor, σ², can be obtained from the estimate of ω. Alternatively, it can be obtained as suggested in sub-Section 3.4.1. If unrestricted QML estimation is carried out, the value of a parameter determining a particular distribution within a class may be inferred from the estimated variance of ξ_t. For example, in the case of the Student's t, ν may be determined from the knowledge that the theoretical value of the variance of ξ_t is 4.93 + ψ'(ν/2), where ψ'(·) is the digamma function introduced in Section 3.2.2.

5.2.2. Asymmetric model

In an asymmetric model, QML may be based on the modified state space form in (3.4.3). The parameters σ_ξ², σ_η², φ, μ* and γ* can be estimated via the Kalman filter without any distributional assumptions, apart from the existence of fourth moments of η_t and ξ_t and the joint symmetry of ξ_t and η_t. However, if an estimate of ρ is wanted, it is necessary to make distributional assumptions about the disturbances, leading to formulae like (3.4.4) and (3.4.5). These formulae can be used to set up an optimization with respect to the original parameters σ², σ_η², φ and ρ. This has the advantage that the constraint |ρ| < 1 can be imposed. Note that any t-distribution gives the same relationship between the parameters, so within this class it is not necessary to specify the degrees of freedom.

Using the QML method with both of the original disturbances assumed to be Gaussian, Harvey and Shephard (1993) estimate a model for the CRSP daily returns on a value weighted US market index for 3rd July 1962 to 31st December 1987. These data were used in the paper by Nelson (1991) to illustrate his EGARCH model. The empirical results indicate a very high negative correlation.

5.2.3. QML in the frequency domain

For a long memory SV model, QML estimation in the time domain becomes relatively less attractive because the state space form (SSF) can only be used by expressing h_t as an autoregressive or moving average process and truncating at a suitably high lag. Thus the approach is cumbersome, though the initial state covariance matrix is easily constructed, and the truncation does not affect the


asymptotic properties of the estimators. If the autoregressive approximation, and therefore the SSF, is not used, time domain QML requires the repeated construction and inversion of the T × T covariance matrix of the log y_t²'s; see Sowell (1992). On the other hand, QML estimation in the frequency domain is no more difficult than it is in the AR(1) case. Cheung and Diebold (1994) present simulation evidence which suggests that although time domain estimation is more efficient in small samples, the difference is less marked when a mean has to be estimated. The frequency domain (quasi) log-likelihood function is, neglecting constants,
log L = −½ Σ_{j=1}^{T−1} log g_j − π Σ_{j=1}^{T−1} I(λ_j)/g_j        (5.2.1)

where I(λ_j) is the sample spectrum of the log y_t²'s and g_j is the spectral generating function (SGF), which for (3.5.1) is

g_j = σ_η² [2(1 − cos λ_j)]^{−d} + σ_ξ² .
Note that the summation in (5.2.1) is from j = 1 rather than j = 0. This is because g_0 cannot be evaluated for positive d. However, the omission of the zero frequency does remove the mean. The unknown parameters are σ_η², σ_ξ² and d, but σ_η² may be concentrated out of the likelihood function by a reparameterisation in which σ_η² is replaced by the signal-noise ratio q = σ_η²/σ_ξ². On the other hand, if a distribution is assumed for ε_t, then σ_ξ² is known. Breidt, Crato and de Lima (1993) show the consistency of the QML estimator. When d lies between 0.5 and one, h_t is nonstationary, but differencing the log y_t²'s yields a zero mean stationary process, the SGF of which is

g_j = σ_η² [2(1 − cos λ_j)]^{1−d} + 2(1 − cos λ_j) σ_ξ² .


One of the attractions of long memory models is that inference is not affected by the kind of unit root issues which arise with autoregressions. Thus a likelihood based test of the hypothesis that d = 1 against the alternative that it is less than one can be constructed using standard theory; see Robinson (1993).
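As an illustration of (5.2.1), the following sketch (our own helper, with hypothetical names) evaluates the frequency domain quasi log-likelihood of the long memory SV model for given values of (σ_η², σ_ξ², d), using the sample spectrum of the demeaned log y_t²'s; the function can be passed to any numerical optimizer.

```python
import numpy as np

def whittle_loglik(params, y):
    """Frequency domain quasi log-likelihood (5.2.1) for the long memory SV
    model, a sketch: x_t = log y_t^2 (demeaned) is assumed to have SGF
    g_j = sig_eta2 * [2(1 - cos(lam_j))]**(-d) + sig_xi2."""
    sig_eta2, sig_xi2, d = params
    x = np.log(np.asarray(y) ** 2)
    x = x - x.mean()
    T = len(x)
    lam = 2.0 * np.pi * np.arange(1, T) / T               # zero frequency omitted
    # sample spectrum I(lam_j) = |sum_t x_t exp(-i lam_j t)|^2 / (2 pi T)
    I = np.abs(np.fft.fft(x)[1:]) ** 2 / (2.0 * np.pi * T)
    g = sig_eta2 * (2.0 * (1.0 - np.cos(lam))) ** (-d) + sig_xi2
    return -0.5 * np.sum(np.log(g)) - np.pi * np.sum(I / g)
```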

5.2.4. Comparison of GMM and QML


Simulation evidence on the finite sample performance of GMM and QML can be found in Andersen and Sorensen (1993), Ruiz (1994), Jacquier, Polson and Rossi (1994), Breidt and Carriquiry (1995), Andersen and Sorensen (1996) and Harvey and Shephard (1996). The general conclusion seems to be that QML gives estimates with a smaller MSE when the volatility is relatively strong, as reflected in a high coefficient of variation. This is because the normally distributed volatility component in the measurement equation, (3.2.8), is large relative to the nonnormal error term. With a lower coefficient of variation, GMM dominates. However, in this case Jacquier, Polson and Rossi (1994, p. 383) observe that ". . . the performance of both the QML and GMM estimators deteriorates rapidly." In


other words the case for one of the more computer intensive methods outlined in Section 5.6 becomes stronger. Other things being equal, an AR coefficient, φ, close to one tends to favor QML because the autocorrelations are slow to die out and are hence captured less well by the moments used in GMM. For the same reason, GMM is likely to be rather poor in estimating a long memory model. The attraction of QML is that it is very easy to implement and it extends easily to more general models, for example nonstationary and multivariate ones. At the same time, it provides filtered and smoothed estimates of the state, and predictions. The one-step ahead prediction errors can also be used to construct diagnostics, such as the Box-Ljung statistic, though in evaluating such tests it must be remembered that the observations are non-normal. Thus even if the hyperparameters are eventually estimated by another method, QML may have a valuable role to play in finding a suitable model specification.
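The ease of implementation can be seen from the following sketch of the QML criterion for the basic model of sub-Section 5.2.1 (function and variable names are our own): the measurement equation is x_t = log y_t² = ω + h_t + ξ_t with Var(ξ_t) = σ_ξ² (equal to π²/2 if ε_t is Gaussian), the transition equation is h_{t+1} = φ h_t + η_t, and the Kalman filter delivers the prediction error decomposition, which is then maximized numerically over (φ, σ_η², σ_ξ², ω).

```python
import numpy as np

def qml_loglik(params, y):
    """Prediction error decomposition (quasi) log-likelihood for the
    log-linearized SV model x_t = omega + h_t + xi_t, h_{t+1} = phi*h_t + eta_t.
    A sketch; sig_xi2 can be fixed at pi**2 / 2 when eps_t is assumed Gaussian."""
    phi, sig_eta2, sig_xi2, omega = params
    x = np.log(np.asarray(y) ** 2)
    a, P = 0.0, sig_eta2 / (1.0 - phi ** 2)     # stationary initial state
    loglik = 0.0
    for xt in x:
        v = xt - omega - a                      # one-step-ahead prediction error
        F = P + sig_xi2                         # and its variance
        loglik += -0.5 * (np.log(2.0 * np.pi) + np.log(F) + v ** 2 / F)
        a_upd = a + (P / F) * v                 # updating step
        P_upd = P - P ** 2 / F
        a = phi * a_upd                         # prediction step
        P = phi ** 2 * P_upd + sig_eta2
    return loglik
```

Maximizing this criterion (for instance by passing its negative to a numerical optimizer) gives the QML estimates; the same recursions, run with the estimated parameters, also deliver the filtered and smoothed estimates of h_t referred to above.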

5.3. Continuous time GMM


Hansen and Scheinkman (1995) propose to estimate continuous time diffusions using a GMM procedure specifically tailored for such processes. In Section 5.1 we discussed estimation of SV models which are either explicitly formulated as discrete time processes or else are discretizations of continuous time diffusions. In both cases inference is based on minimizing the difference between unconditional moments and their sample equivalent. For continuous time processes Hansen and Scheinkman (1995) draw directly upon the diffusion rather than its discretization to formulate moment conditions. To describe the generic setup of the method they propose, let us consider the following (multivariate) system of n diffusion equations:

dy_t = μ(y_t; θ) dt + σ(y_t; θ) dW_t .        (5.3.1)

A comparison with the notation in Section 2 immediately draws attention to certain limitations of the setup. First, the functions μ_θ(·) = μ(·; θ) and σ_θ(·) = σ(·; θ) are parameterized by y_t only, which restricts the state variable process U_t in Section 2 to contemporaneous values of y_t. The diffusion in (5.3.1) involves a general vector process y_t, hence y_t could include a volatility process to accommodate SV models. Yet, the y_t vector is assumed observable. For the moment we will leave these issues aside, but return to them at the end of the section. Hansen and Scheinkman (1995) consider the infinitesimal operator A_θ defined for a class of square integrable functions ϕ: ℝⁿ → ℝ as follows:

A_θ ϕ(y) = [dϕ(y)/dy]′ μ_θ(y) + ½ tr( σ_θ(y) σ_θ(y)′ [d²ϕ(y)/dy dy′] ) .        (5.3.2)

Because the operator is defined as a limit, namely

A_θ ϕ(y) = lim_{t→0} t⁻¹ [ E(ϕ(y_t) | y_0 = y) − ϕ(y) ] ,


it does not necessarily exist for all square integrable functions ϕ but only on a restricted domain D. A set of moment conditions can now be obtained for this class of functions ϕ ∈ D. Indeed, as shown for instance by Revuz and Yor (1991), the following equalities hold:

E[ A_θ ϕ(y_t) ] = 0 ,        (5.3.3)
E[ A_θ ϕ(y_{t+1}) ψ(y_t) − ϕ(y_{t+1}) A*_θ ψ(y_t) ] = 0 ,        (5.3.4)

where A*_θ is the adjoint infinitesimal operator of A_θ for the scalar product associated with the invariant measure of the process y.³⁷ By choosing an appropriate set of functions, Hansen and Scheinkman exploit moment conditions (5.3.3) and (5.3.4) to construct a GMM estimator of θ. The choice of the functions ϕ ∈ D and ψ ∈ D* determines what moments of the data are used to estimate the parameters. This obviously raises questions regarding the choice of functions to enhance efficiency of the estimator, but first and foremost also the identification of θ via the conditions (5.3.3) and (5.3.4). It was noted in the beginning of the section that the multivariate process y_t, in order to cover SV models, must somehow include the latent conditional variance process. Gouriéroux and Monfort (1994, 1995) point out that since the moment conditions based on ϕ and ψ cannot include any latent process it will often (but not always) be impossible to attain identification of all the parameters, particularly those governing the latent volatility process. A possible remedy is to augment the model with observations indirectly related to the latent volatility process, in a sense making it observable. One possible candidate would be to include in y_t both the security price and the Black-Scholes implied volatilities obtained through option market quotations for the underlying asset. This approach is in fact suggested by Pastorello, Renault and Touzi (1993), although not in the context of continuous time GMM but instead using indirect inference methods which will be discussed in Section 5.5.³⁸ Another possibility is to rely on the time deformation representation of SV models as discussed in the context of continuous time GMM by Conley et al. (1995).
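As a simple illustration of how (5.3.3) generates estimating equations (this worked example is ours, not one taken from the papers cited above), consider a scalar mean-reverting diffusion dy_t = κ(α − y_t) dt + σ dW_t. For a twice differentiable test function ϕ the generator is A_θ ϕ(y) = κ(α − y) ϕ′(y) + ½ σ² ϕ″(y), and choosing ϕ(y) = y and ϕ(y) = y² in (5.3.3) gives

E[ κ(α − y_t) ] = 0   ⟹   E y_t = α ,
E[ 2κ(α − y_t) y_t + σ² ] = 0   ⟹   Var(y_t) = σ²/(2κ) .

These two conditions identify α and the ratio σ²/κ but not κ and σ² separately, since they only involve the stationary distribution; additional test functions, or the second set of conditions (5.3.4), are needed. This is a small-scale version of the identification concerns discussed above when part of y_t is latent.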

5.4. Simulated method of moments


The estimation procedures discussed so far do not involve any simulation techniques. From now on we cover methods combining simulation and estimation, beginning with the simulated method of moments (SMM) estimator, which is covered by Duffie and Singleton (1993) for time series processes.³⁹ In Section 5.1

³⁷ Note that A*_θ is again associated with a domain D*, so that ϕ ∈ D and ψ ∈ D* in (5.3.4).
³⁸ It was noted in Section 2.1.3 that implied volatilities are biased. The indirect inference procedures used by Pastorello, Renault and Touzi (1993) can cope with such biases, as will be explained in Section 5.5. The use of option price data is further discussed in Section 5.7.
³⁹ SMM was originally proposed for cross-section applications, see Pakes and Pollard (1989) and McFadden (1989). See also Gouriéroux and Monfort (1993a).


we noted that GMM estimation of SV models is based on minimizing the distance between a set of chosen sample moments and unconditional population moments expressed as analytical functions of the model parameters. Suppose now that such analytical expressions are hard to obtain. This is particularly the case when they involve marginalizations with respect to a latent process such as a stochastic volatility process. Could we then simulate data from the model for a particular value of the parameters and match moments from the simulated data with sample moments as a substitute? This strategy is precisely what SMM is all about. Indeed, quite often it is fairly straightforward to simulate processes and therefore take advantage of the SMM procedure. Let us consider again as point of reference and illustration the (multivariate) diffusion of the previous section (equation (5.3.1)) and conduct H simulations i = 1, ..., H using a discretization:
Δŷ_t^i(θ) = μ(ŷ_t^i(θ); θ) + σ(ŷ_t^i(θ); θ) ε_t ,    i = 1, ..., H and t = 1, ..., T ,

where the ŷ_t^i(θ) are simulated given a parameter value θ and ε_t is i.i.d. Gaussian.⁴⁰ Subject to identification and other regularity conditions one then considers
θ̂_T = Arg min_θ ‖ f(y_1, ..., y_T) − (1/H) Σ_{i=1}^H f(ŷ_1^i(θ), ..., ŷ_T^i(θ)) ‖

with a suitable choice of norm, i.e. weighting matrix for the quadratic form as in GMM, and function f of the data, i.e. moment conditions. The asymptotic distribution theory is quite similar to that of GMM, except that simulation introduces an extra source of random error affecting the efficiency of the SMM estimator in comparison with its GMM counterpart. The efficiency loss can be controlled by the choice of H.⁴¹

⁴⁰ We discuss the simulation techniques in detail in the next section. Indeed, to control for the discretization bias, one has to simulate with a finer sampling interval.
⁴¹ The asymptotic variance of the SMM estimator depends on H through a factor (1 + H⁻¹), see e.g. Gouriéroux and Monfort (1995).
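A minimal sketch of the SMM recipe (with our own, illustrative moment choices, model parameterization and function names) is given below. The simulation shocks are drawn once and reused for every candidate θ, so that the criterion is a smooth function of the parameters and the optimizer is not chasing simulation noise; a full implementation would also impose |φ| < 1 and replace the identity weighting matrix by an optimal one.

```python
import numpy as np
from scipy.optimize import minimize

def simulate_sv(theta, shocks):
    """Simulate y_t = exp(h_t / 2) * eps_t with h_{t+1} = omega + phi*h_t + sig_eta*eta_t."""
    omega, phi, sig_eta = theta
    eps, eta = shocks
    H, T = eps.shape
    h = np.zeros((H, T))
    for t in range(1, T):
        h[:, t] = omega + phi * h[:, t - 1] + sig_eta * eta[:, t]
    return np.exp(h / 2.0) * eps

def moments(y):
    """Moment vector f: a small, illustrative choice."""
    return np.array([np.mean(np.abs(y)), np.mean(y ** 2), np.mean(y ** 4),
                     np.mean(np.abs(y[..., 1:] * y[..., :-1]))])

def smm_estimate(y_data, H=10, seed=0):
    rng = np.random.default_rng(seed)
    T = len(y_data)
    # common random numbers: held fixed across evaluations of the criterion
    shocks = (rng.standard_normal((H, T)), rng.standard_normal((H, T)))
    m_data = moments(np.asarray(y_data))
    def criterion(theta):
        d = m_data - moments(simulate_sv(theta, shocks))
        return d @ d                           # identity weighting matrix
    return minimize(criterion, x0=np.array([0.0, 0.9, 0.3]), method="Nelder-Mead")
```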

5.5. Indirect inference and moment matching


The key insight of the indirect inference approach of Gouriéroux, Monfort and Renault (1993) and the moment matching approach of Gallant and Tauchen (1994) is the introduction of an auxiliary model, parameterized by a vector, say β, in order to estimate the model of interest. In our case the latter is the SV model.⁴² In the first subsection we will describe the general principle while the second will focus exclusively on estimating diffusions.

⁴² It is worth noting that the simulation based inference methods we describe here are applicable to many other types of models for cross-sectional, time series and panel data.

5.5.1. The principle


We noted at the beginning of Section 5 that ARCH type models are relatively easy to estimate in comparison to SV models. For this reason an ARCH type model


may be a possible candidate as an auxiliary model. An alternative strategy would be to try to summarize the features of the data via an SNP density as developed by Gallant and Tauchen (1989). This empirical SNP density, or more specifically its score, could also fulfill the role of auxiliary model. Other possibilities could be considered as well. The idea is then to use the auxiliary model to estimate β, so that:
β̂_T = Arg max_β Σ_{t=1}^T log f*(y_t | y_{t−1}, β)        (5.5.1)

where we restrict our attention here to a simple dynamic model with one lag for the purpose of illustration. The objective function f* in (5.5.1) can be a pseudolikelihood function when the auxiliary model is deliberately misspecified to facilitate estimation. As an alternative, f* can be taken from the class of SNP densities.⁴³ Gouriéroux, Monfort and Renault then propose to estimate the same parameter vector β, not using the actual sample data but instead using samples {ŷ_t^i(θ)}_{t=1}^T simulated i = 1, ..., H times from the model of interest given θ. This yields a new estimator of β, namely:
β̂_{HT}(θ) = Arg max_β (1/H) Σ_{i=1}^H Σ_{t=1}^T log f*(ŷ_t^i(θ) | ŷ_{t−1}^i(θ), β)        (5.5.2)

The next step is to minimize a quadratic distance using a weighting matrix W_T to choose an indirect estimator of θ based on H simulation replications and a sample of T observations, namely:

θ̂_{HT} = Arg min_θ (β̂_T − β̂_{HT}(θ))′ W_T (β̂_T − β̂_{HT}(θ))        (5.5.3)

The approach of Gallant and Tauchen (1994) avoids the step of estimating β̂_{HT}(θ) by computing the score function of f* and minimizing a quadratic distance similar to (5.5.3), but involving the score function evaluated at β̂_T and replacing the sample data by simulated series generated by the model of interest. Under suitable regularity conditions the estimator θ̂_{HT} is root-T consistent and asymptotically normal. As with GMM and SMM there is again an optimal weighting matrix. The resulting asymptotic covariance matrix depends on the number of simulations in the same way the SMM estimator depends on H.

Gouriéroux, Monfort and Renault (1993) illustrated the use of the indirect inference estimator with a simple example that we would like to briefly discuss here. Typically AR models are easy to estimate while MA models require more elaborate procedures. Suppose the model of interest is a moving average model of order one with parameter θ. Instead of estimating the MA parameter directly from the data they propose to estimate an AR(p) model involving the parameter
43 The discussion should not leave the impression that the auxiliary model can only be estimated via ML-type estimators. Any root T consistent asymptotically normal estimation procedure may be used.


vector β. The next step then consists of simulating data using the MA model and proceeding further as described above.⁴⁴ They found that the indirect inference estimator θ̂_{HT} appeared to have better finite sample properties than the more traditional maximum likelihood estimators of the MA parameter. In fact the indirect inference estimator exhibited features similar to the median unbiased estimator proposed by Andrews (1993). These properties were confirmed and clarified by Gouriéroux, Renault and Touzi (1994), who studied the second order asymptotic expansion of indirect inference estimators and their ability to reduce finite sample bias.

⁴⁴ Again, one could use a score principle here, following Gallant and Tauchen (1994). In fact, in a linear Gaussian setting the SNP approach to fit data generated by an MA(1) model would be to estimate an AR(p) model. Ghysels, Khalaf and Vodounou (1994) provide a more detailed discussion of score-based and indirect inference estimators of MA models as well as their relation with more standard estimators.
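The MA(1) example translates directly into the following sketch (function names and the choice p = 3 are ours): the auxiliary AR(p) model is fitted by OLS to the data and to MA(1) paths simulated with shocks that are held fixed across candidate values of θ, and the binding function is inverted by minimizing the distance between the two sets of auxiliary estimates.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_ar(y, p=3):
    """OLS estimates of an AR(p) auxiliary model for zero-mean data."""
    Y = y[p:]
    X = np.column_stack([y[p - j:-j] for j in range(1, p + 1)])
    return np.linalg.lstsq(X, Y, rcond=None)[0]

def simulate_ma1(theta, e):
    """MA(1) paths y_t = e_t + theta * e_{t-1} from an (H, T+1) array of shocks."""
    return e[:, 1:] + theta * e[:, :-1]

def indirect_inference_ma1(y, p=3, H=10, seed=0):
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((H, len(y) + 1))   # common random numbers
    beta_data = fit_ar(np.asarray(y), p)
    def distance(theta):
        beta_sim = np.mean([fit_ar(path, p) for path in simulate_ma1(theta, e)], axis=0)
        d = beta_data - beta_sim
        return d @ d
    return minimize_scalar(distance, bounds=(-0.99, 0.99), method="bounded").x
```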

5.5.2. Estimating diffusions


Let us consider the same diffusion equation as in Section 5.3, which dealt with continuous time GMM, namely:

dy_t = μ(y_t; θ) dt + σ(y_t; θ) dW_t .        (5.5.4)

In Section 5.3 we noted that the above equation holds under certain restrictions, such as the functions μ and σ being restricted to y_t as arguments. While these restrictions were binding for the setup of Section 5.3, this will not be the case for the estimation procedures discussed here. Indeed, equation (5.5.4) is only used as an illustrative example. The diffusion is then simulated either via exact discretizations or some type of approximate discretization (e.g. Euler or Mil'shtein, see Pardoux and Talay (1985) or Kloeden and Platten (1992) for further details). More precisely, we define the process y_t^{(δ)} such that:

y_{(k+1)δ}^{(δ)} = y_{kδ}^{(δ)} + μ(y_{kδ}^{(δ)}; θ) δ + σ(y_{kδ}^{(δ)}; θ) δ^{1/2} ε_{(k+1)δ}        (5.5.5)

Under suitable regularity conditions (see for instance Strook and Varadhan (1979)) we know that the diffusion admits a unique solution (in distribution) and the process y_t^{(δ)} converges to y_t as δ goes to zero. Therefore one can expect to simulate y_t quite accurately for δ sufficiently small. The auxiliary model may be a discretization of (5.5.4) obtained by choosing δ = 1. Hence, one formulates an ML estimator based on the nonlinear AR model appearing in (5.5.5) with δ = 1. To control for the discretization bias one can simulate the underlying diffusion with δ = 1/10 or 1/20, for instance, and aggregate the simulated data to correspond with the sampling frequency of the DGP. Broze, Scaillet and Zakoian (1994) discuss the effect of the simulation step size on the asymptotic distribution.

The use of simulation-based inference methods becomes particularly appropriate and attractive when diffusions involve latent processes, such as is the case


with SV models. Gouriéroux and Monfort (1994, 1995) discuss several examples and study their performance via Monte Carlo simulation. It should be noted that estimating the diffusion at a coarser discretization is not the only possible choice of auxiliary model. Indeed, Pastorello, Renault and Touzi (1993), Engle and Lee (1994) and Gallant and Tauchen (1994) suggest the use of ARCH-type models. There have been several successful applications of these methods to financial time series. They include Broze et al. (1995), Engle and Lee (1994), Gallant, Hsieh and Tauchen (1994), Gallant and Tauchen (1994, 1995), Ghysels, Gouriéroux and Jasiak (1995b), Ghysels and Jasiak (1994a,b), Pastorello et al. (1993), among others.
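The simulation step itself is straightforward. The sketch below (parameter names are ours) applies the Euler scheme (5.5.5) to a log-normal SV diffusion with a fine step δ = 1/10 and then sums the intra-period returns, so that the simulated series matches the unit sampling frequency of the data, as suggested above.

```python
import numpy as np

def simulate_sv_diffusion(k, a, c, T, delta=0.1, seed=0):
    """Euler scheme with step delta for d log(sig2) = k*(a - log sig2) dt + c dW1
    and dp = sig dW2, aggregated to T unit-interval returns. A sketch; a smaller
    delta (or an exact discretization of the volatility equation) reduces the bias."""
    rng = np.random.default_rng(seed)
    n = int(round(1.0 / delta))                # Euler steps per unit of time
    log_sig2 = a                               # start the log-variance at its mean
    returns = np.empty(T)
    for t in range(T):
        r = 0.0
        for _ in range(n):
            w1, w2 = rng.standard_normal(2)
            r += np.exp(0.5 * log_sig2) * np.sqrt(delta) * w2
            log_sig2 += k * (a - log_sig2) * delta + c * np.sqrt(delta) * w1
        returns[t] = r                         # aggregated (unit-frequency) return
    return returns
```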

5.6. Likelihood-based and Bayesian methods


In a Gaussian linear state space model the likelihood function is constructed from the one step ahead prediction errors. This prediction error decomposition form of the likelihood is used as the criterion function in QML, but of course it is not the exact likelihood in this case. The exact filter proposed by Watanabe (1993) will, in principle, yield the exact likelihood. However, as was noted in Section 3.4.2, because this filter uses numerical integration, it takes a long time to compute, and if numerical optimization is to be carried out with respect to the hyperparameters it becomes impractical. Kim and Shephard (1994) work with the linear state space form used in QML but approximate the log χ² distribution of the measurement error by a mixture of normals. For each of these normals, a prediction error decomposition likelihood function can be computed. A simulated EM algorithm is used to find the best mixture and hence calculate approximate ML estimates of the hyperparameters. The exact likelihood function can also be constructed as a mixture of distributions for the observations conditional on the volatilities, that is

L(y; φ, σ_η², σ²) = ∫ p(y | h) p(h) dh

where y and h contain the T elements of y_t and h_t respectively. This expression can be written in terms of the σ_t²'s, rather than their logarithms, the h_t's, but it makes little difference to what follows. Of course the problem is that the above likelihood has no closed form, so it must be calculated by some kind of simulation method. Excellent discussions can be found in Shephard (1995) and in Jacquier, Polson and Rossi (1994), including the comments. Conceptually, the simplest approach is to use Monte Carlo integration by drawing from the unconditional distribution of h for given values of the parameters (φ, σ_η², σ²), and estimating the likelihood as the average of the p(y|h)'s. This is then repeated, searching over the parameter values until the maximum of the simulated likelihood is found. As it stands this procedure is not very satisfactory, but it may be improved by using ideas of importance sampling. This has been implemented for ML estimation of SV


models by Danielsson and Richard (1993) and Danielsson (1994). However, the method becomes more difficult as the sample size increases. A more promising way of attacking likelihood estimation by simulation techniques is to use Markov Chain Monte Carlo (MCMC) to draw from the distribution of volatilities conditional on the observations. Ways in which this can be done were outlined in sub-Section 3.4.2 on nonlinear filters and smoothers. Kim and Shephard (1994) suggest a method of computing ML estimators by putting their multimove algorithm within a simulated EM algorithm.

Jacquier, Polson and Rossi (1994) adopt a Bayesian approach in which the specification of the model has a hierarchical structure: a prior distribution for the hyperparameters, ω = (σ_η², φ, σ)′, joins the conditional distributions y|h and h|ω. (Actually the σ_t's are used rather than the h_t's.) The joint posterior of h and ω is proportional to the product of these three distributions, that is p(h, ω|y) ∝ p(y|h) p(h|ω) p(ω). The introduction of h makes the statistical treatment tractable and is an example of what is called data augmentation; see Tanner and Wong (1987). From the joint posterior, p(h, ω|y), the marginal p(h|y) solves the smoothing problem for the unobserved volatilities, taking account of the sampling variability in the hyperparameters. Conditional on h, the posterior of ω, p(ω|h, y), is simple to compute from the standard Bayesian treatment of linear models. If it were also possible to sample directly from p(h|ω, y) at low cost, it would be straightforward to construct a Markov chain by alternating back and forth between draws from p(ω|h, y) and p(h|ω, y). This would produce a cyclic chain, a special case of which is the Gibbs sampler. However, as was noted in sub-Section 3.4.2, Jacquier, Polson and Rossi (1994) show that it is much better to decompose p(h|ω, y) into a set of univariate distributions in which each h_t, or rather σ_t, is conditioned on all the others.

The prior distribution for ω, the parameters of the volatility process in JPR (1994), is the standard conjugate prior for the linear model, a (truncated) Normal-Gamma. The priors can be made extremely diffuse while remaining proper. JPR conduct an extensive sampling experiment to document the performance of this and more traditional approaches. Simulating stochastic volatility series, they compare the sampling performance of the posterior mean with that of the QML and GMM point estimates. The MCMC posterior mean exhibits root mean squared errors anywhere between half and a quarter of the size of those of the GMM and QML point estimates. Even more striking are the volatility smoothing performance results. The root mean squared error of the posterior mean of h_t produced by the Bayesian filter is 10% smaller than that of the point estimate produced by an approximate Kalman filter supplied with the true parameters.

Shephard and Kim, in their comment on JPR (1994), point out that for very high φ and small σ_η² the rate of convergence of the JPR algorithm will slow down. More draws will then be required to obtain the same amount of information. They propose to approximate the volatility disturbance with a discrete mixture of normals. The benefit of the method is that a draw of the whole vector h is then possible, which is faster than T draws of the individual h_t's. However this comes at the cost that the draws navigate in a much higher dimensional space due to the discretisation involved.


Also, the convergence of chains based upon discrete mixtures is sensitive to the number of components and their assigned probability weights. Mahieu and Schotman (1994) add some generality to the Shephard and Kim idea by letting the data produce estimates of the characteristics of the discretized state space (probabilities, mean and variance). The original implementation of the JPR algorithm was limited to a very basic model of stochastic volatility, an AR(1) with uncorrelated mean and volatility disturbances. In a univariate setup, correlated disturbances are likely to be important for stock returns, i.e., the so-called leverage effect. The evidence in Gallant, Rossi, and Tauchen (1994) also points to non-normal conditional errors with both skewness and kurtosis. Jacquier, Polson, and Rossi (1995a) show how the hierarchical framework allows the convenient extension of the MCMC algorithm to more general models. Namely, they estimate univariate stochastic volatility models with correlated disturbances, and skewed and fat-tailed variance disturbances, as well as multivariate models. Alternatively, the MCMC algorithm can be extended to a factor structure. The factors exhibit stochastic volatility and can be observable or non-observable.
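To convey the flavour of the single-move samplers discussed above, the following sketch (ours, and much simplified; it is not the JPR algorithm itself) updates each element of h in turn by a random walk Metropolis step whose target is p(h_t | h_{t−1}, h_{t+1}, y_t) ∝ p(y_t | h_t) p(h_t | h_{t−1}) p(h_{t+1} | h_t), holding the parameters fixed. In a full hierarchical scheme this sweep alternates with draws of the parameters from their conditional posterior p(ω | h, y).

```python
import numpy as np

def single_move_sweep(h, y, phi, sig_eta2, step=0.5, rng=None):
    """One sweep of random-walk Metropolis updates of the (demeaned) log-volatilities
    h_t, assuming y_t ~ N(0, exp(h_t)) and h_{t+1} = phi*h_t + eta_t. A sketch only."""
    rng = rng if rng is not None else np.random.default_rng()
    T = len(h)
    for t in range(T):
        # conditional prior of h_t given its neighbours (stationary prior at the ends)
        if t == 0:
            mu, v = phi * h[1], sig_eta2
        elif t == T - 1:
            mu, v = phi * h[T - 2], sig_eta2
        else:
            mu = phi * (h[t - 1] + h[t + 1]) / (1.0 + phi ** 2)
            v = sig_eta2 / (1.0 + phi ** 2)
        def log_target(ht):
            return (-0.5 * ht - 0.5 * y[t] ** 2 * np.exp(-ht)   # N(0, exp(h_t)) likelihood
                    - 0.5 * (ht - mu) ** 2 / v)                  # conditional AR(1) prior
        prop = h[t] + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_target(prop) - log_target(h[t]):
            h[t] = prop
    return h
```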

5.7. Inference and option price data

Some of the continuous time SV models currently found in the literature were developed to answer questions regarding derivative security pricing. Given this rather explicit link between derivatives and SV diffusions it is perhaps somewhat surprising that relatively little attention has been paid to the use of option price data to estimate continuous time diffusions. Melino (1994) in his survey in fact notes: "Clearly, information about the stochastic properties of an asset's price is contained both in the history of the asset's price and the price of any options written on it. Current strategies for combining these two sources of information, including implicit estimation, are uncomfortably ad hoc. Statistically speaking, we need to model the source of the prediction errors in option pricing and to relate the distribution of these errors to the stock price process". For example implicit estimation, like the computation of BS implied volatilities, is certainly uncomfortably ad hoc from a statistical point of view. In general, each observed option price introduces one source of prediction error when compared to a pricing model. The challenge is to model the joint nondegenerate probability distribution of options and asset prices via a number of unobserved state variables. This approach has been pursued in a number of recent papers, including Christensen (1992), Renault and Touzi (1992), Pastorello et al. (1993), Duan (1994) and Renault (1995).

Christensen (1992) considers a pricing model for n assets as a function of a state vector x_t which is (l + n)-dimensional and divided into an l-dimensional observed (z_t) and an n-dimensional unobserved (ω_t) component. Let p_t be the price vector of the n assets; then:

p_t = m(z_t, ω_t, θ) .        (5.7.1)


Equation (5.7.1) provides a one-to-one relationship between the n latent state variables ω_t and the n observed prices p_t, for given z_t and θ. From a financial viewpoint, it implies that the n assets are appropriate instruments to complete the markets if we assume that the observed state variables z_t are already mimicked by the price dynamics of other (primitive) assets. Moreover, from a statistical viewpoint it allows full structural maximum likelihood estimation provided the log-likelihood function for observed prices can be deduced easily from a statistical model for x_t. For instance, in a Markovian setting where, conditionally on x_0, the joint distribution of x^T = (x_t)_{1≤t≤T} is given by the density:
f_x(x^T | x_0, θ) = ∏_{t=1}^T f(z_t, ω_t | z_{t−1}, ω_{t−1}, θ)        (5.7.2)

the conditional distribution of the data D^T = (p_t, z_t)_{1≤t≤T} given D_0 = (p_0, z_0) is obtained by the usual Jacobian formula:
f(D^T | D_0, θ) = ∏_{t=1}^T f[ z_t, m_θ⁻¹(z_t, p_t) | z_{t−1}, m_θ⁻¹(z_{t−1}, p_{t−1}), θ ] × |∇_ω m(z_t, m_θ⁻¹(z_t, p_t), θ)|⁻¹        (5.7.3)

where m_θ⁻¹(z, ·) is the ω-inverse of m(z, ·, θ), defined formally by m_θ⁻¹(z, m(z, ω, θ)) = ω, while ∇_ω m(·) represents the columns corresponding to ω of the Jacobian matrix. This MLE using price data of derivatives was proposed independently by Christensen (1992) and Duan (1994). Renault and Touzi (1992) were instead more specifically interested in the Hull and White option pricing formula with z_t = S_t the observed underlying asset price and ω_t = σ_t the unobserved stochastic volatility process. Then with the joint process x_t = (S_t, σ_t) being Markovian we have a call price of the form:

C_t = m(x_t, θ, K)
where the parameter vector θ involves two types of parameters: (1) the parameters describing the dynamics of the joint process x_t = (S_t, σ_t), which under the equivalent martingale measure allow one to compute the expectation with respect to the (risk-neutral) conditional probability distribution of V²(t, t + h) given σ_t; and (2) the vector γ of parameters which characterize the risk premia determining the relation between the risk neutral probability distribution of the x process and the Data Generating Process. Structural MLE is often difficult to implement. This motivated Renault and Touzi (1992) and Pastorello, Renault and Touzi (1993) to consider less efficient but simpler and more robust procedures involving some proxies of the structural likelihood (5.7.3). To illustrate these procedures let us consider the standard log-normal SV model in continuous time:

d log σ_t = k(a − log σ_t) dt + c dW_t        (5.7.4)

Standard option pricing arguments allow us to ignore misspecifications of the drift of the underlying asset price process. Hence, a first step towards simplicity and robustness is to isolate from the likelihood function the volatility dynamics, namely:

∏_{i=1}^n (2πλ²)^{−1/2} exp[ −(2λ²)⁻¹ ( log σ_{t_i} − e^{−kΔt} log σ_{t_{i−1}} − a(1 − e^{−kΔt}) )² ]        (5.7.5)
associated with a sample σ_{t_i}, i = 1, ..., n, with t_i − t_{i−1} = Δt, where λ² = c²(1 − e^{−2kΔt})/(2k) is the conditional variance of log σ_{t_i} given log σ_{t_{i−1}} implied by (5.7.4). To approximate this expression one can consider a direct method, as in Renault and Touzi (1992), or an indirect method, as in Pastorello et al. (1993). The former involves calculating implied volatilities from the Hull and White model to create pseudo samples σ̂_{t_i} parameterized by k, a and c and computing the maximum of (5.7.5) with respect to those three parameters.⁴⁵ Pastorello et al. (1993) proposed several indirect inference methods, described in Section 5.5, in the context of (5.7.5). For instance, they propose to use an indirect inference strategy involving GARCH(1,1) volatility estimates obtained from the underlying asset (also independently suggested by Engle and Lee (1994)). This produces asymptotically unbiased but rather inefficient estimates. Pastorello et al. indeed find that an indirect inference simplification of the Renault and Touzi direct procedure involving option prices is far more efficient. It is a clear illustration of the intuition that the use of option price data, paired with suitable statistical methods, should largely improve the accuracy of estimating volatility diffusion parameters.
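In practice the direct method amounts to maximizing (5.7.5) over (k, a, c) given the sample of (implied) volatilities. A sketch of that step is given below (hypothetical names; the extraction of the implied volatilities themselves is left aside):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik_ou(params, log_sigma, dt):
    """Negative log-likelihood (5.7.5) of a discretely sampled Ornstein-Uhlenbeck
    log-volatility process d log(sigma) = k*(a - log sigma) dt + c dW."""
    k, a, c = params
    if k <= 0 or c <= 0:
        return np.inf
    e = np.exp(-k * dt)
    lam2 = c ** 2 * (1.0 - np.exp(-2.0 * k * dt)) / (2.0 * k)   # conditional variance
    resid = log_sigma[1:] - e * log_sigma[:-1] - a * (1.0 - e)
    return 0.5 * np.sum(np.log(2.0 * np.pi * lam2) + resid ** 2 / lam2)

# hypothetical usage, with `vols` a series of Hull-White implied volatilities:
# result = minimize(neg_loglik_ou, x0=[1.0, np.log(0.1), 0.5],
#                   args=(np.log(np.asarray(vols)), 1.0 / 252), method="Nelder-Mead")
```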
5.8. Regression models with stochastic volatility

A single equation regression model with stochastic volatility in the disturbance term may be written
y_t = x_t′β + u_t ,    t = 1, ..., T ,        (5.8.1)

where y_t denotes the t-th observation, x_t is a k × 1 vector of explanatory variables, β is a k × 1 vector of coefficients and u_t = σ ε_t exp(0.5 h_t) as discussed in Section 3. As a special case, the observations may simply have a non-zero mean, so that

y_t = μ + u_t .

Since u_t is stationary, an OLS regression of y_t on x_t yields a consistent estimator of β. However, it is not efficient.

⁴⁵ The direct maximization of (5.7.5) using BS implied volatilities has also been proposed, see e.g. Heynen, Kemna and Vorst (1994). Obviously the use of BS implied volatility induces a misspecification bias due to the BS model assumptions.


For given values of the SV parameters, φ and σ_η², a smoothed estimator of h_t, h_{t|T}, can be computed using one of the methods outlined in Section 3.4. Multiplying (5.8.1) through by exp(−0.5 h_{t|T}) gives

y_t e^{−½h_{t|T}} = x_t′β e^{−½h_{t|T}} + ũ_t ,    t = 1, ..., T ,        (5.8.2)

where the ũ_t's can be thought of as heteroskedasticity-corrected disturbances. Harvey and Shephard (1993) show that these disturbances have zero mean, constant variance and are serially uncorrelated, and hence suggest the construction of the feasible GLS estimator

β̃ = [ Σ_{t=1}^T e^{−h_{t|T}} x_t x_t′ ]⁻¹ Σ_{t=1}^T e^{−h_{t|T}} x_t y_t .        (5.8.3)

In the classical heteroskedastic regression model h_t is deterministic and depends on a fixed number of unknown parameters. Because these parameters can be estimated consistently, the feasible GLS estimator has the same asymptotic distribution as the GLS estimator. Here h_t is stochastic and the MSE of its estimator is of O(1). The situation is therefore somewhat different. Harvey and Shephard (1993) show that, under standard regularity conditions on the sequence of x_t, β̃ is asymptotically normal with mean β and a covariance matrix which can be consistently estimated by

avâr(β̃) = [ Σ_{t=1}^T e^{−h_{t|T}} x_t x_t′ ]⁻¹ [ Σ_{t=1}^T (y_t − x_t′β̃)² e^{−2h_{t|T}} x_t x_t′ ] [ Σ_{t=1}^T e^{−h_{t|T}} x_t x_t′ ]⁻¹ .        (5.8.4)
When h_{t|T} is the smoothed estimate given by the linear state space form, the analysis in Harvey and Shephard (1993) suggests that, asymptotically, the feasible GLS estimator is almost as efficient as the GLS estimator and considerably more efficient than the OLS estimator. It would be possible to replace exp(h_{t|T}) by a better estimate computed from one of the methods described in Section 3.4, but this may not have much effect on the efficiency of the resulting feasible GLS estimator of β. When h_t is nonstationary, or nearly nonstationary, Hansen (1995) shows that it is possible to construct a feasible adaptive least squares estimator which is asymptotically equivalent to GLS.
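The computations in (5.8.3) and (5.8.4) are a few lines of linear algebra once the smoothed h_{t|T}'s are available; a sketch with hypothetical names:

```python
import numpy as np

def feasible_gls(X, y, h_smooth):
    """Feasible GLS estimator (5.8.3) and its sandwich covariance estimator (5.8.4),
    given smoothed log-volatilities h_{t|T}. X is T x k; y and h_smooth have length T."""
    w = np.exp(-h_smooth)                      # heteroskedasticity weights
    A = X.T @ (w[:, None] * X)                 # sum_t e^{-h_t|T} x_t x_t'
    beta = np.linalg.solve(A, X.T @ (w * y))
    resid = y - X @ beta
    B = X.T @ ((resid ** 2 * np.exp(-2.0 * h_smooth))[:, None] * X)
    A_inv = np.linalg.inv(A)
    return beta, A_inv @ B @ A_inv             # (beta_tilde, avar in sandwich form)
```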

Conclusions

No survey is ever complete. There are two particular areas we expect will flourish in the years to come but which we were not able to cover. The first is the area of market microstructures, which is well surveyed in a recent review paper by Goodhart and O'Hara (1995). With the ever increasing availability of high


frequency data series, we anticipate more work involving game theoretic models. These can now be estimated because of recent advances in econometric methods, similar to those enabling us to estimate diffusions. Another area where we expect interesting research to emerge is that involving nonparametric procedures to estimate SV continuous time and derivative securities models. Recent papers include Ait-Sahalia (1994), Ait-Sahalia et al. (1994), Bossaerts, Hafner and Härdle (1995), Broadie et al. (1995), Conley et al. (1995), Elsheimer et al. (1995), Gouriéroux, Monfort and Tenreiro (1994), Gouriéroux and Scaillet (1995), Hutchinson, Lo and Poggio (1994), Lezan et al. (1995), Lo (1995), Pagan and Schwert (1992).

Research into the econometrics of Stochastic Volatility models is relatively new. As our survey has shown, there has been a burst of activity in recent years drawing on the latest statistical technology. As regards the relationship with ARCH, our view is that SV and ARCH are not necessarily direct competitors, but rather complement each other in certain respects. Recent advances such as the use of ARCH models as filters, the weakening of GARCH and temporal aggregation, and the introduction of nonparametric methods to fit conditional variances, illustrate that a unified strategy for modelling volatility needs to draw on both ARCH and SV.

References
Abramowitz, M. and N. C. Stegun (1970). Handbook of Mathematical Functions. Dover Publications Inc., New York. Ait-Sahalia, Y. (1994). Nonparametric pricing of interest rate derivative securities. Discussion Paper, Graduate School of Business, University of Chicago. Ait-Sahalia0 Y. S. J. Bickel and T. M. Stoker (1994). Goodness-of-Fit tests for regression using kernel methods. Discussion Paper, University of Chicago. Amin, K. L. and V. Ng (1993). Equilibrium option valuation with systematic stochastic volatility. J. Finance 48, 881-910. Andersen, T. G. (1992). Volatility. Discussion paper, Northwestern University. Andersen, T. G. (1994). Stochastic autoregressive volatility: A framework for volatility modeling. Math. Finance 4, 75-102. Andersen, T. G. (1996). Return volatility and trading volume: An information flow interpretation of stochastic volatility. J. Finance, to appear. Andersen, T. G. and T. Bollerslev (1995). Intraday seasonality and volatility persistence in financial Markets. J. Emp. Finance, to appear. Andersen, T. G. and B. Sorensen (1993). GMM estimation of a stochastic volatility model: A Monte Carlo study. J. Business Econom. Statist. to appear. Andersen, T. G. and B. Sorensen (1996). GMM and QML asymptotic standard deviations in stochastic volatility models: A response to Ruiz (1994). J. Econometrics, to appear. Andrews, D. W. K. (1993). Exactly median-unbiased estimation of first order autoregressive unit root models. Econometrica 61, 139-165. Bachelier, L. (1900). Th6orie de la sp6culation. Ann. Sci. Ecole Norm. Sup. 17, 21-86, [On the Random Character of Stock Market Prices (Paul H. Cootner, ed.) The MIT Press, Cambridge, Mass. 1964]. Baillie, R. T. and T. Bollerslev (1989). The message in daily exchange rates: A conditional variance tale. J. Business Econom. Statist. 7, 297-305. Baillie, R. T. and T. Bollerslev (1991). Intraday and lnterday volatility in foreign exchange rates. Rev. Econom. Stud. 58, 565-585.


Baillie, R. T., T. Bollersle' and H. O. Mikkelsen (1993). Fractionally integrated generalized autoregressive conditional heteroskedasticity. J. Econometrics, to appear. Bajeux, I. and J. C. Rochet (1992). Dynamic spanning: Are options an appropriate instrument? Math. Finance, to appear. Bates, D. S. (1995a). Testing option pricing models. In: G. S. Maddala ed., Handbook of Statistics, Vol. 14, Statistical Methods in Finance. North Holland, Amsterdam, in this volume. Bates, D. S. (1995b). Jumps and stochastic volatility: Exchange rate processes implicit in PHLX Deutschemark options. Rev. Financ. Stud., to appear. Beckers, S. (1981). Standard deviations implied in option prices as predictors of future stock price variability. J. Banking Finance 5, 363-381. Bera, A. K. and M. L. Higgins (1995). On ARCH models: Properties, estimation and testing. In: L. Exley, D. A. R. George, C. J. Roberts and S. Sawyer eds., Surveys in Econometrics. Basil Blackwell: Oxford, Reprinted from J. Econom. Surveys. Black, F. (1976). Studies in stock price volatility changes. Proceedings of the 1976 Business Meeting of the Business and Economic Statistics Section, Amer. Statist. Assoc. 177-181. Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Politic. Econom. 81, 637-654. Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31, 307-327. Bollerslev, T., Y. C. Chou and K. Kroner (1992). ARCH modelling in finance: A selective review of the theory and empirical evidence. J. Econometrics 52, 201-224. Bollerslev, T. and R. Engle (1993). Common persistence in conditional variances. Econometrica 61, 166-187. Bollerslev, T., R. Engle and D. Nelson (1994). ARCH models. In: R. F. Engle and D. McFadden eds., Handbook of Econometrics, Volume IV. North-Holland, Amsterdam. Bollerslev, T., R. Engle and J. Wooldridge (1988). A capital asset pricing model with time varying eovariances. J. Politic. Econom. 96, 116-131. Bollerslev, T. and E. Ghysels (1994). On periodic autoregression conditional heteroskedasticity. J. Business Econom. Statist., to appear. Bollerslev, T. and H. O. Mikkelsen (1995). Modeling and pricing long-memory in stock market volatility. J. Econometrics, to appear. Bossaerts, P , C. Harrier and W. Hfirdle (1995). Foreign exchange rates have surprising volatility. Discussion Paper, CentER, University of Tilburg. Bossaerts, P. and P. Hillion (1995). Local parametric analysis of hedging in discrete time. o r. Econometrics, to appear. Breidt, F. J., N. Crato and P. de Lima (1993). Modeling long-memory stochastic volatility. Discussion paper, Iowa State University. Breidt, F. J. and A. L. Carriquiry (1995). Improved quasi-maximum likelihood estimation for stochastic volatility models. Mimeo, Department of Statistics, University of Iowa. Broadie, M., J. Detemple, E. Ghysels and O. Torr& (1995). American options with stochastic volatility: A nonparametric approach. Discussion Paper, CIRANO. Broze, L., O. Scaitlet and J. M. Zakoian (1994). Quasi indirect inference for diffusion processes. Discussion Paper CORE. Broze, L., O. Scaillet and J. M. Zakoian (1995). Testing for continuous time models of the short term interest rate. J. Emp. Finance, 199-223. Campa, J. M. and P. H. K. Chang (1995). Testing the expectations hypothesis on the term structure of implied volatilities in foreign exchange options. J. Finance 50, to appear. Campbell, J. Y. and A. S. Kyle (1993). Smart money, noise trading and stock price behaviour. Rev. Econom. Stud. 
60, 1-34. Canina, L. and S. Figlewski (1993). The informational content of implied volatility. Rev. Financ. Stud. 6, 659-682. Cauova, F. (1992). Detrending and Business Cycle Facts. Discussion Paper, European University Institute, Florence.


Chesney, M. and L. Scott (1989). Pricing European currency options: A comparison of the modified Black-Scholes model and a random variance model. J. Financ. Quant. Anal. 24, 267-284. Cheung, Y.-W. and F. X. Diebold (1994). On maximum likelihood estimation of the differencing parameter of fractionally-integrated noise with unknown mean. J. Econometrics 62, 301-316. Chiras, D. P. and S. Manaster (1978). The information content of option prices and a test of market efficiency. J. Financ. Econom. 6, 213-234. Christensen, B. J. (1992). Asset prices and the empirical martingale model. Discussion Paper, New York University. Christie, A. A. (1982). The stochastic behavior of common stock variances: Value, leverage, and interest rate effects. J. Financ. Econom. 10, 407-432. Clark, P. K. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-156. Clewlow, L and X. Xu (1993). The dynamics of stochastic volatility. Discussion Paper, University of Warwick. Comte, F. and E. Renault (1993). Long memory continuous time models. J. Econometrics, to appear. Comte, F. and E. Renault (1995). Long memory continuous time stochastic volatility models. Paper presented at the HFDF-I Conference, Ziirich. Conley, T., L. P. Hansen, E. Luttmer and J. Scheinkman (1995). Estimating subordinated diffusions from discrete time data. Discussion paper, University of Chicago. Cornell, B. (1978). Using the options pricing model to measure the uncertainty producing effect of major announcements. Financ. Mgmt. 7, 54-59. Cox, J. C. (1975). Notes on option pricing I: Constant elasticity of variance diffusions. Discussion Paper, Stanford University. Cox, J. C. and S. Ross (1976). The valuation of options for alternative stochastic processes. J. Financ. Econom. 3, 145-166. Cox, J. C. and M. Rubinstein (1985). Options Markets. Englewood Cliffs, Prentice-Hall, New Jersey. Dacorogna, M. M., U. A. Miiller, R. J. Nagler, R. B. Olsen and O. V. Pictet (1993). A geographical model for the daily and weekly seasonal volatility in the foreign exchange market. J. Internat. Money Finance 12, 413-438. Danielsson, J. (1994). Stochastic volatility in asset prices: Estimation with simulated maximum likelihood. J. Econometrics 61, 375-400. Danielsson, J. and J. F. Richard (1993). Accelerated Gaussian importance sampler with application to dynamic latent variable models. ,/. AppL Econometrics 3, S153-S174. Dassios, A. (1995). Asymptotic expressions for approximations to stochastic variance models. Mimeo, London School of Economics. Day, T. E. and C. M. Lewis (1988). The behavior of the volatility implicit in the prices of stock index options. J. Financ. Econom. 22, 103-122. Day, T. E. and C. M. Lewis (1992). Stock market volatility and the information content of stock index options. J. Econometrics 52, 267-287. Diebold, F. X. (1988). Empirical Modeling of Exchange Rate Dynamics. Springer Verlag, New York. Diebold, F. X. and J. A. Lopez (1995). Modeling Volatility Dynamics. In: K. Hoover ed., Macroeconomics: Developments, Tensions and Prospects. Diebold, F. X. and M. Nerlove (1989). The dynamics of exchange rate volatility: A multivariate latent factor ARCH Model. J. AppL Econometrics 4, 1-22. Ding, Z., C. W. J. Granger and R. F. Engle (1993). A long memory property of stock market returns and a new model. J. Emp. Finance 1, 83-108. Diz, F. and T. J. Finucane (1993). Do the options markets really overreact? J. Futures Markets 13, 298-312. Drost, F. C. and T. E. Nijman (1993). 
Temporal aggregation of GARCH processes. Econometrica 61, 90~927. Drost, F. C. and B. J. M. Werker (1994). Closing the GARCH gap: Continuous time GARCH modelling. Discussion Paper CentER, University of Tilburg. Duan, J. C. (1994). Maximum likelihood estimation using price data of the derivative contract. Math. Finance 4, 155-167.


Duan, J. C. (1995). The GARCH option pricing model. Math. Finance 5, 13-32. Duffle, D. (1989). Futures Markets. Prentice-Hall International Editions. Duffle, D. (1992). Dynamic Asset Pricing Theory. Princeton University Press. Duffle, D. and K. J. Singleton (1993). Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929-952. Dunsmuir, W. (1979). A central limit theorem for parameter estimation in stationary vector time series and its applications to models for a signal observed with noise. Ann. Statist. 7, 490-506~ Easley, D. and M. O'Hara (1992). Time and the process of security price adjustment. J. Finance, 47, 577~505. Ederington, L. H, and J. H. Lee (1993). How markets process information: News releases and volatility. J. Finance 48, 1161-1192. Elsheimer, B., M. Fisher, D. Nychka and D. Zirvos (1995). Smoothing splines estimates of the discount function based on US bond Prices. Discussion Paper Federal Reserve, Washington, D.C. Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007. Engle, R. F. and C. W. J. Granger (1987). Co-integration and error correction: Representation, estimation and testing. Econometrica 55, 251-576. Engle, R. F. and S. Kozicki (1993). Testing for common features. J. Business Econom. Statist. 11, 369379. Engle, R. F. and G. G. J. Lee (1994). Estimating diffusion models of stochastic volatility. Discussion Paper, Univeristy of California at San Diego. Engle, R. F. and C. Mustafa (1992). Implied ARCH models from option prices. J. Econometrics 52, 289-311. Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. J. Finance 48, 1749-1801. Fama, E. F. (1963). Mandelbrot and the stable Paretian distribution. J. Business 36, 420~29. Fama, E. F. (1965). The behavior of stock market prices. J. Business 38, 34-105. Foster, D. and S. Viswanathan (1993a). The effect of public information and competition on trading volume and price volatility. Rev. Financ. Stud. 6, 23-56. Foster, D. and S. Viswanathan (1993b). Can speculative trading explain the volume volatility relation. Discussion Paper, Fuqua School of Business, Duke University. French, K. and R. Roll (1986). Stock return variances: The arrival of information and the reaction of traders. J. Financ. Econom. 17, 5-26. Gallant, A. R., D. A. Hsieh and G. Tauchen (1994). Estimation of stochastic volatility models with suggestive diagnostics. Discussion Paper, Duke University. Gallant, A. R., P. E. Rossi and G. Tauchen (1992). Stock prices and volume. Rev. Financ. Stud. 5, 199242. Gallant, A. R., P. E. Rossi and G. Tauchen (1993). Nonlinear dynamic structures. Econometrica 61, 871-907. Gallant, A. R. and G. Tauchen (1989). Semipararnetric estimation of conditionally constrained heterogeneous processes: Asset pricing applications. Econometrica 57, 1091-1120. Gallant, A. R. and G. Tauchen (1992). A nonparametric approach to nonlinear time series analysis: Estimation and simulation. In: E. Parzen, D. Brillinger, M. Rosenblatt, M. Taqqu, J. Geweke and P. Caines eds., New Dimensions in Time Series Analysis. Springer-Verlag, New York. Gallant, A. R. and G. Tauchen (1994). Which moments to match. Econometric Theory, to appear. Gallant, A. R. and G. Tauchen (1995). Estimation of continuous time models for stock returns and interest rates. Discussion Paper, Duke University. Gallant, A. R. and H. White (1988). 
A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, Oxford. Garcia, R. and E. Renault (1995). Risk aversion, intertemporal substitution and option pricing. Discussion Paper CIRANO. Geweke, J. (1994). Comment on Jacquier, Poison and Rossi. J. Business Econom. Statist. 12, 397-399.


Geweke, J. (1995). Monte Carlo simulation and numerical integration. In: H. Amman, D. Kendrick and J. Rust eds., Handbook of Computational Economics. North Holland. Ghysels, E., C. Gourirroux and J. Jasiak (1995a). Market time and asset price movements: Theory and estimation. Discussion paper CIRANO and C.R.D.E., Univerist6 de Montrral. Ghysels, E., C. Gourirroux and J. Jasiak (1995b). Trading patterns, time deformation and stochastic volatility in foreign exchange markets. Paper presented at the HFDF Conference, Zfirich. Ghysels, E. and J. Jasiak (1994a). Comments on Bayesian analysis of stochastic volatility models. J. Business Econom. Statist. 12, 399-401. Ghysels, E. and J. Jasiak (1994b). Stochastic volatility and time deformation a n application of trading volume and leverage effects. Paper presented at the Western Finance Association Meetings, Santa Fe. Ghysels, E., L. Khalaf and C. Vodounou (1994). Simulation based inference in moving average models. Discussion Paper, CIRANO and C.R.D.E. Ghysels, E., H. S. Lee and P. Siklos (1993). On the (mis)specification of seasonality and its consequences: An empirical investigation with U.S. Data. Empirical Econom. 18, 747-760. Goodhart, C. A. E. and M. O'Hara (1995). High frequency data in financial markets: Issues and applications. Paper presented at HFDF Conference, Z0a-'ich. Gourirroux, C. and A. Monfort (1993a). Simulation based Inference: A survey with special reference to panel data models, J. Econometrics 59, 5-33. Gourirroux, C. and A. Monfort (1993b). Pseudo-likelihood methods in Maddala et al. ed., Handbook of Statistics Vol. 11, North Holland, Amsterdam. Gouri~roux, C. and A. Monfort (1994). Indirect inference for stochastic differential equations. Discussion Paper CREST, Paris. Gouri~roux, C. and A. Monfort (1995). Simulation-Based Econometric Methods. CORE Lecture Series, Louvain-la-Neuve. Gourirroux, C., A. Monfort and E. Renault (1993). Indirect inference. J. Appl. Econometrics 8, $85Sl18. Gourirroux, C., A. Monfort and C. Tenreiro (1994). Kernel M-estimators: Nonparametric diagnostics for structural models. Discussion Paper, CEPREMAP. Gouri+roux, C., A. Monfort and C. Tenreiro (1995). Kernel M-estimators and functional residual plots. Discussion Paper CREST - ENSAE, Paris. Gourirroux, C., E. Renault and N. Touzi (1994). Calibration by simulation for small sample bias correction. Discussion Paper CREST. Gourirroux, C. and O. Scaillet (1994). Estimation of the term structure from bond data. J. Emp. Finance, to appear. Granger, C. W. J. and Z. Ding (1994). Stylized facts on the temporal and distributional properties of daily data for speculative markets. Discussion Paper, University of California, San Diego. Hall, A. R. (1993). Some aspects of generalized method of moments estimation in Maddala et al. ed., Handbook o f Statistics Vol. 11, North Holland, Amsterdam. Hamao, Y., R. W. Masulis and V. K. Ng (1990). Correlations in price changes and volatility across international stock markets. Rev. Financ. Stud. 3, 281-307. Hansen, B. E. (1995). Regression with nonstationary volatility. Econometrica 63, 1113-1132. Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054. Hansen, L. P. and J. A. Scheinkman (1995). Back to the future: Generating moment implications for continuous-time Markov processes. Econometrica 63, 767-804. Harris, L. (1986). A transaction data study of weekly and intradaily patterns in stock returns. J. Financ. Econom. 16, 99-117. Harrison, M. and D. 
Kreps (1979). Martingale and arbitrage in multiperiod securities markets. J. Econom. Theory 20, 381-408. Harrison, J. M. and S. Pliska (1981). Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications 11, 215-260.


Harrison, P. J. and C. F. Stevens (1976). Bayesian forecasting (with discussion). J. Roy. Statis. Soc., Ser. B, 38, 205-247. Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press. Harvey, A. C. and A. Jaeger (1993). Detrending, stylized facts and the business cycle. J. Appl. Econometrics 8, 231-247. Harvey, A. C. (1993). Long memory in stochastic volatility. Discussion Paper, London School of Economics. Harvey, A. C. and S. J. Koopman (1993). Forecasting hourly electricity demand using time-varying splines. J. Amer. Statist. Assoc. 88, 1228--1236. Harvey, A. C., E. Ruiz and E. Sentana (1992). Unobserved component time series models with ARCH Disturbances, J. Econometrics 52, 129-158. Harvey, A. C., E. Ruiz and N. Shephard (1994). Multivariate stochastic variance models. Rev. Econom. Stud. 61, 247-264. Harvey, A. C. and N. Shephard (1993). Estimation and testing of stochastic variance models, STICERD Econometrics. Discussion paper, EM93/268, London School of Economics. Harvey, A. C. and N. Shephard (1996). Estimation of an asymmetric stochastic volatility model for asset returns. J. Business Econom. Statist. to appear. Harvey, C. R. and R. D. Huang (1991). Volatility in the foreign currency futures market. Rev. Financ. Stud. 4, 543-569. Harvey, C. R. and R. D. Huang (1992). Information trading and fixed income volatility. Discussion Paper, Duke University. Harvey, C. R. and R. E. Whaley (1992). Market volatility prediction and the efficiency of the S&P 100 index option market. J. Financ. Econom. 31, 43-74. Hausman, J. A. and A. W. Lo (1991). An ordered probit analysis of transaction stock prices. Discussion paper, Wharton School, University of Pennsylvania. He, H. (1993). Option prices with stochastic volatilities: An equilibrium analysis. Discussion Paper, University of California, Berkeley. Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev. Financ. Stud. 6, 327-343. Heynen, R., A. Kemna and T. Vorst (1994). Analysis of the term structure of implied volatility. J.
Financ. Quant. Anal.

Hull, J. (1993). Options, futures and other derivative securities. 2nd ed. Prentice-Hall International Editions, New Jersey. Hull, J. (1995). Introduction to Futures and Options Markets. 2nd ed. Prentice-Hall, Englewood Cliffs, New Jersey. Hull, J. and A. White (1987). The pricing of options on assets with stochastic volatilities. J. Finance 42, 281-300. Huffman, G. W. (1987). A dynamic equilibrium model of asset prices and transactions volume. J. Politic. Econom. 95, 138-159. Hutchinson, J. M., A. W. Lo and T. Poggio (1994). A nonparametric approach to pricing and hedging derivative securities via learning networks. J. Finance 49, 851-890. Jacquier, E., N. G. Poison and P. E. Rossi (1994). Bayesian analysis of stochastic volatility models (with discussion). J. Business Econom. Statist. 12, 371-417. Jacquier, E., N. G. Poison and P. E. Rossi (1995a). Multivariate and prior distributions for stochastic volatility models. Discussion paper CIRANO. Jacquier, E., N. G. Polson and P. E. Rossi (1995b). Stochastic volatility: Univariate and multivariate extensions. Rodney White center for financial research. Working Paper 19-95, The Wharton School, University of Pennsylvania. Jacquier, E., N. G. Poison and P. E. Rossi (1995c). Efficient option pricing under stochastic volatility. Manuscript, The Wharton School, University of Pennsylvania. Jarrow, R. and Rudd (1983). Option Pricing. Irwin, Homewood III. Johnson, H. and D. Shanno (1987). Option pricing when the variance is changing. J. Financ. Quant. Anal. 22, 143-152.

Stochastic volatility

189

Jorion, P. (1995). Predicting volatility in the foreign exchange market. J. Finance 50, to appear. Karatzas, l. and S. E. Shreve (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag: New York, NY. Karpoff, J. (1987). The relation between price changes and trading volume: A survey. J. Financ. Quant. Anal. 22, 109-126. Kim, S. and N. Shephard (1994). Stochastic volatility: Optimal likelihood inference and comparison with ARCH Model. Discussion Paper, Nuffield College, Oxford. King, M., E. Sentana and S. Wadhwani (1994). Volatility and links between national stock markets. Econometrica 62, 901-934. Kitagawa, G. (1987). Non-Gaussian state space modeling of nonstationary time series (with discussion). J. Amer. Statist. Assoc. 79, 378-389. Kloeden, P. E. and E. Platten (1992). Numerical Solutions of Stochastic Differential Equations. Springer-Verlag, Heidelberg. Lamoureux, C. and W. Lastrapes (1990). Heteroskedasticity in stock return data: Volume versus GARCH effect. J. Finance 45, 221-229. Lamoureux, C. and W. Lastrapes (1993). Forecasting stock-return variance: Towards an understanding of stochastic implied volatilities. Rev. Financ. Stud. 6, 293-326. Latane, H. and R. Jr. Rendleman (1976). Standard deviations of stock price ratios implied in option prices. J. Finance 31, 369-381. Lezan, G., E. Renault and T. deVitry (1995) Forecasting foreign exchange risk. Paper presented at 7th World Congres of the Econometric Society, Tokyo. Lin, W. L., R. F. Engle and T. Ito (1994). Do bulls and bears move across borders? International transmission of stock returns and volatility as the world turns. Rev. Financ. Stud., to appear. Lo, A. W. (1995). Statistical inference for technical analysis via nonparametric estimation. Discussion Paper, MIT. Mahieu, R. and P. Schotrnan (1994a). Stochastic volatility and the distribution of exchange rate news. Discussion Paper, University of Limburg. Mahieu, R. and P. Schotman (1994b). Neglected common factors in exchange rate volatility. J. Emp. Finance 1, 279 311. Mandelbrot, B. B. (1963). The variation of certain speculative prices. J. Business 36, 394-416. Mandelbrot, B. and H. Taylor (1967). On the distribution of stock prices differences. Oper. Res. 15, 1057-1062. Mandelbrot, B. B. and J.W. Van Ness (1968). Fractal Brownian motions, fractional noises and applications. S l A M Rev. 1O, 422-437. McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57, 1027-1057. Meddahi, N. and E. Renault (1995). Aggregations and marginalisations of GARCH and stochastic volatility models. Discussion Paper, GREMAQ. Melino, A. and M. Turnbull (1990). Pricing foreign currency options with stochastic volatility. J. Econometrics 45, 239-265. Melino, A. (1994). Estimation of continuous time models in finance. In: C.A. Sims ed., Advances in Econometrics (Cambridge University Press). Merton, R. C. (1973). Rational theory of option pricing. Bell J. Econom. Mgmt. Sci. 4, 141-183. Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. J. Pinanc. Econom. 3, 125-144. Merton, R. C. (1990). Continuous Time Finance. Basil Blackwell, Oxford. Merville, L. J. and D. R. Pieptea (1989). Stock-price volatility, mean-reverting diffusion, and noise. J. Financ. Econom. 242, 193-214. Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller (1954). Equation of state calculations by fast computing machines. J. Chem. Physics 21, 1087-1092. Miiller, U. A., M. M. 
Dacorogna, R. B. Olsen, W. V. Pictet, M. Schwarz and C. Morgenegg (1990). Statistical study of foreign exchange rates. Empirical evidence of a price change scaling law and intraday analysis. J. Banking Finance 14, 1189-1208.

190

E. Ghysels, A. C. Harvey and E. Renault

Nelson, D. B. (1988). Time series behavior of stock market volatility and returns. Ph.D. dissertation, MIT. Nelson, D. B. (1990). ARCH models as diffusion approximations. J. Econometrics 45, 7-39. Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347-370. Nelson, D. B. (1992). Filtering and forecasting with misspecified ARCH Models I: Getting the right variance with the wrong model. J. Econometrics 25, 61-90. Nelson, D. B. (1994). Comment on Jacquier, Poison and Rossi. J. Business Eeonom. Statist. 12, 403 406. Nelson, D. B. (1995a). Asymptotic smoothing theory for ARCH Models. Econometrica, to appear. Nelson, D. B. (1995b). Asymptotic filtering theory for multivariate ARCH models. J. Econometrics, to appear. Nelson, D. B. and D. P. Foster (1994). Asymptotic filtering theory for univariate ARCH models. Econometrica 62, 1-41. Nelson, D. B. and D. P. Foster (1995). Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model. J. Econometrics, to appear. Noh, J., R. F. Engle and A. Kane (1994). Forecasting volatility and option pricing of the S&P 500 index. J. Derivatives, 17-30. Ogaki, M. (1993). Generalized method of moments: Econometric applications. In: Maddala et al. ed., Handbook of Statistics Vol. 11, North Holland, Amsterdam. Pagan, A. R. and G. W. Schwert (1990). Alternative models for conditional stock volatility. J. Econometrics 45, 267-290. Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators. Eeonometrica 57, 995-1026. Pardoux, E. and D. Talay (1985). Discretization and simulation of stochastic differential equations. Acta AppL Math. 3, 23-47. Pastorello, S., E. Renault and N. Touzi (1993). Statistical inference for random variance option pricing. Discussion Paper, CREST. Patell, J. M. and M. A. Wolfson (1981). The ex-ante and ex-post price effects of quarterly earnings announcement reflected in option and stock price. J. Account. Res. 19, 434-458. Patell, J. M. and M. A. Wolfson (1979). Anticipated information releases reflected in call option prices. J. Account. Econom. 1, 117-140. Pham, H. and N. Touzi (1993). Intertemporal equilibrium risk premia in a stochastic volatility model. Math. Finance, to appear. Platten, E. and Schweizer (1995). On smile and skewness. Discussion Paper, Australian National University, Canberra. Poterba, J. and L. Summers (1986). The persistence of volatility and stock market fluctuations. Amer. Eeonom. Rev. 76, 1142-1151. Renault, E. (1995). Econometric models of option pricing errors. Invited Lecture presented at 7th W.C.E.S., Tokyo, August. Renault, E. and N. Touzi (1992). Option hedging and implicit volatility. Math. Finance, to appear. Revuz, A. and M. Yor (1991). Continuous Martingales and Brownian Motion. Springer-Verlag, Berlin. Robinson, P. (1993). Efficient tests of nonstationary hypotheses. Mimeo, London School of Economics. Rogers, L. C. G. (1995). Arbitrage with fractional Brownian motion. University of Bath, Discussion paper. Rubinstein, M. (1985). Nonparametric tests of alternative option pricing models using all reported trades and quotes on the 30 most active CBOE option classes from August 23, 1976 through August 31, 1978. J, Finance 40, 455-480. Ruiz, E. (1994). Quasi-maximum likelihood estimation of stochastic volatility models. J. Econometrics 63, 289-306. Schwert, G. W. (1989). Business cycles, financial crises, and stock volatility. 
Carneg&-Rochester Conference Series on Public Policy 39, 83-126.

Stochastic volatility

191

Scott, L. O. (1987). Option pricing when the variance changes randomly: Theory, estimation and an application. J. Finane. Quant. Anal. 22, 419438. Scott, L. (1991). Random variance option pricing. Advances in Futures and Options Research, Vol. 5, 113-135. Sheikh, A. M. (1993). The behavior of volatility expectations and their effects on expected returns. J. Business 66, 93-116. Shephard, N. (1995). Statistical aspect of ARCH and stochastic volatility. Discussion Paper 1994, Nuffield College, Oxford University. Sims, A. (1984). Martingale-like behavior of prices. University of Minnesota. Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. J. Econometrics 53, 165-188. Stein, J. (1989): Overreactions in the options market. J. Finance 44, 1011-1023. Stein, E. M. and J. Stein (1991). Stock price distributions with stochastic volatility: An analytic approach. Rev. Financ. Stud. 4, 727-752. Stock, J. H. (1988). Estimating continuous time processes subject to time deformation. J. Amer. Statist. Assoc. 83, 77-84. Strook, D. W. and S. R. S. Varadhan (1979). Multi-dimensional Diffusion Processes. Springer-Verlag, Heidelberg. Tanner, T. and W. Wong (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82, 52~549. Tauchen, G. (1995). New minimum chi-square methods in empirical finance. Invited Paper presented at the 7th World Congress of the Econometric Society, Tokyo. Tauchen, G. and M. Pitts (1983). The price variability-volume relationship on speculative markets. Econometrica 51,485-505. Taylor, S. J. (1986). Modeling Financial Time Series. John Wiley: Chichester. Taylor, S. J. (1994). Modeling stochastic volatility: A review and comparative study. Math. Finance 4, 183-204. Taylor, S. J. and X. Xu (1994). The term structure of volatility implied by foreign exchange options. J. Finane. Quant Anal. 29, 57-74. Taylor, S. J. and X. Xu (1993). The magnitude of implied volatility smiles: Theory and empirical evidence for exchange rates. Discussion Paper, University of Warwick. Von Furstenberg, G. M. and B. Nam Jeon (1989). International stock price movements: Links and messages. Brookings Papers on Economic Activity 1,125-180. Wang, J. (1993). A model of competitive stock trading volume. Discussion Paper, MIT. Watanabe, T. (1993). The time series properties of returns, volatility and trading volume in financial markets. Ph.D. Thesis, Department of Economics, Yale University. West, M. and J. Harrison (1990). Bayesian Forecasting and Dynamic Models. Springer-Verlag, Berlin. Whaley, R. E. (1982). Valuation of American call options on dividend-paying stocks. J. Financ. Econom. 10, 29-58. Wiggins, J. B. (1987). Option values under stochastic volatility: Theory and empirical estimates. J. Financ. Econom. 19, 351-372. Wood, R. T. McInish and J. K. Ord (1985). An investigation of transaction data for NYSE Stocks. J. Finance 40, 723-739. Wooldridge, J. M. (1994). Estimation and inference for dependent processes. In: R.F. Engle and D. McFadden eds., Handbook of Econometrics Vol. 4. North Holland, Amsterdam.

Stock Price Volatility

Stephen F. LeRoy
1. Introduction

In the early days of the efficient capital markets literature, discourse between finance academics and practitioners was characterized by mutual incomprehension. Academics held that security prices were governed exclusively by their prospective payoffs - in fact, the former equaled the discounted expected value of the latter. Practitioners, on the other hand, made no secret of their opinion that only naive academics could take the present value relation seriously as a theory of asset pricing: everyone knows that traders routinely ignore cash flows, and that large price changes often occur in the complete absence of news about future cash flows. Academics, at least since Samuelson's (1965) paper, responded that rejection of the present value relation implies the existence of profitable trading rules. Given that no one appeared to be identifying a trading rule that significantly outperforms buy-and-hold, academics saw no grounds for rejecting the present-value relation. Prior to the 1980's, empirical tests of market efficiency were conducted on the home court of the academics: one searched for evidence of return predictability; failing to find it, one concluded in favor of market efficiency. The variance-bounds tests introduced by Shiller (1981) and LeRoy and Porter (1981), however, can be interpreted as shifting the locus of the debate from the home court of the academics to that of the practitioners - instead of looking for patterns in returns that are ruled out by market efficiency, one looked for the price patterns that are implied by market efficiency. Specifically, one asked whether security price changes are of about the magnitude one would expect if they were generated exclusively by fundamentals. The implications of this shift from returns tests to price-level tests were at first difficult to sort out since finding a predictable pattern has opposite interpretations in the two cases: finding that fundamentals predict future security returns argues against market efficiency, whereas finding that fundamentals predict current prices supports market efficiency. In both cases the early evidence suggested that the correlation being sought was not in the data; hence the returns tests accepted market efficiency, whereas the variance-bounds tests rejected efficiency.

To understand the relation between returns and variance-bounds tests of market efficiency, note that the simplest specification of the efficient markets model (applied to stock prices) says that
$E_t(r_{t+1}) = \rho$ ,   (1.1)

where rt is the (gross) rate of return on stock, p is a constant greater than one, and Et denotes mathematical expectation conditional on some information set /t. Equation 1.1 says that no matter what agents' information is, the conditional expected rate of return on stock is p; past information, such as past realized stock returns, should not be correlated with future returns. Conventional efficiency tests directly investigated this implication. Variance-bounds tests, on the other hand, used the definition of the rate of return, r,+l = dt+l + Pt+l , Pt (1.2)

to derive from (1.1) the relation

$p_t = \beta E_t(d_{t+1} + p_{t+1})$ ,   (1.3)

where $\beta \equiv 1/(1 + \rho)$. After successive substitution and application of the law of iterated expectations, (1.3) may be written as

$p_t = E_t(\beta d_{t+1} + \beta^2 d_{t+2} + \cdots + \beta^{n+1} d_{t+n+1} + \beta^{n+1} p_{t+n+1})$ .   (1.4)

Assuming the convergence condition

$\lim_{n \to \infty} \beta^{n+1} E_t(p_{t+n+1}) = 0$   (1.5)

is satisfied, sending n to infinity in (1.4) results in

$p_t = E_t(p_t^*)$ ,   (1.6)

where $p_t^*$ is the ex-post rational stock price; i.e., the value the stock would have if future dividends were perfectly forecastable:

$p_t^* = \sum_{n=1}^{\infty} \beta^n d_{t+n}$ .   (1.7)

Because the conditional expectation of any random variable is less volatile than that random variable itself, (1.6) implies the variance bounds inequality

$V(p_t) \le V(p_t^*)$ .   (1.8)

Both Shiller and LeRoy-Porter reported reversal of the empirical counterpart to inequality (1.8): prices appear to be more volatile than the upper bound implied by the volatility of dividends under market efficiency.
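To see the bound in a concrete setting, the following sketch (an illustration added here, not part of LeRoy's exposition; the discount factor, AR(1) coefficient, sample size and truncation horizon are assumed values) simulates first-order autoregressive dividends, computes the efficient-markets price implied by (1.3) and a truncated version of the ex-post rational price (1.7), and compares their sample variances.

import numpy as np

rng = np.random.default_rng(0)
beta, lam, T, H = 0.95, 0.9, 5000, 400   # assumed discount factor, AR(1) coefficient, sample, horizon

# AR(1) dividends: d_{t+1} = lam * d_t + e_{t+1}
e = rng.normal(size=T + H)
d = np.zeros(T + H)
for t in range(1, T + H):
    d[t] = lam * d[t - 1] + e[t]

# Present-value price under (1.3): p_t = sum_n beta^n E_t d_{t+n} = beta*lam/(1 - beta*lam) * d_t
p = beta * lam / (1 - beta * lam) * d[:T]

# Ex-post rational price (1.7), truncated at horizon H
w = beta ** np.arange(1, H + 1)
p_star = np.array([w @ d[t + 1:t + H + 1] for t in range(T)])

print(np.var(p), np.var(p_star))   # in population, V(p_t) <= V(p_t^*)

In such simulated data the inequality (1.8) holds comfortably; the empirical puzzle is precisely that it appears to fail in actual stock-price data.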

2. Statistical issues

Several statistical issues must be considered in interpreting the fact that the empirical counterpart of inequality (1.8) is apparently reversed. These are (1) bias in parameter estimation, (2) nuisance parameter problems, and (3) sample variation of parameter estimates. Of these, discussion in the variance-bounds literature has concentrated almost exclusively on bias. However, bias is not a serious problem in the absence of nuisance parameter or sample variability problems since the rejection region can always be modified to allow for bias. In contrast, nuisance parameter problems - which occur whenever the sample distribution of the test statistic is materially affected by a parameter which is unrestricted under the null hypothesis - make it difficult or impossible to set rejection regions so that rejection will occur with prespecified probability if the null hypothesis is true. Therefore they are much more serious. High sample variability in the test statistic is also a serious problem since it diminishes the ability of the test to distinguish between the null and the alternative, therefore reducing the power of the test for given size. In testing (1.8) one immediately encounters the fact that $p_t^*$ cannot be directly constructed from any finite sample since dividends after the end of the sample are unobservable. The problems of bias, nuisance parameters and sample variability in testing (1.8) take different forms depending on how this problem is addressed. Two methods for estimating $V(p_t^*)$ are available, the model-free estimator used by Shiller and the model-based estimator used by LeRoy-Porter. The model-free estimator simply replaces the unobservable $p_t^*$ with the expected value of $p_t^*$ conditional on the sample, which is observable. This is given by setting the terminal value $p^*_{T|T}$ of the observable proxy series $p^*_{t|T}$ equal to the actual $p_T$:

$p^*_{T|T} = p_T$ ,   (2.1)

and computing earlier values $p^*_{t|T}$ from the backward recursion

$p^*_{t|T} = \beta\,(p^*_{t+1|T} + d_{t+1})$ ,   (2.2)

which has the required property

$E(p_t^* \mid p_1, d_1, \ldots, p_T, d_T) = p^*_{t|T}$   (2.3)

(under the assumption that the population value of $\beta$ is used in the discounting). The estimated series is model-free in the sense that its construction requires no assumptions about how dividends are generated, an attractive property. Using the model-free $p^*_{t|T}$ series to construct $\hat V(p_t^*)$ has several less attractive consequences. Most important, if the model-builder is unwilling to commit to a model for dividends, there is no prospect of evaluating the sample variability of $\hat V(p_t^*)$, rendering construction of confidence intervals impossible. Thus it was no accident that Shiller reported point estimates of $V(p_t)$ and $V(p_t^*)$, but no t-statistics.
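For concreteness, a minimal sketch of the backward recursion (2.1)-(2.2), assuming the price and dividend series are held in NumPy arrays (the function name and array layout are choices made here, not Shiller's code):

import numpy as np

def ex_post_proxy(p, d, beta):
    # Model-free proxy p*_{t|T}: terminal condition (2.1) plus backward recursion (2.2).
    # p and d are length-T arrays of prices and dividends; beta is the assumed discount factor.
    T = len(p)
    p_star = np.empty(T)
    p_star[-1] = p[-1]                                   # (2.1): p*_{T|T} = p_T
    for t in range(T - 2, -1, -1):
        p_star[t] = beta * (p_star[t + 1] + d[t + 1])    # (2.2)
    return p_star

The sample variance of the resulting series is the model-free estimate of $V(p_t^*)$ discussed in the text; as emphasized above, its sampling properties cannot be assessed without further assumptions about dividends.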

196

S.~LeRoy

One can, however, investigate the statistical properties of $\hat V(p_t^*)$ under particular models of dividends, and this has been done. As Flavin (1983) and Kleidon (1986) showed, because of the very high serial correlation of $p^*_{t|T}$, $\hat V(p_t^*)$ is severely biased downward as an estimator of $V(p_t^*)$; see Gilles and LeRoy (1991) for an intuitive interpretation. As noted above, this by itself is not a problem since the rejection region can always be modified to offset the effect of bias. However, such modification cannot be implemented without committing to a dividends model, so if one takes this route the advantage of the model-free estimator is foregone. Also, it is known that the model-free estimator $\hat V(p_t^*)$ has higher sample variability than its model-based counterpart, discussed below. A model-based estimator of $V(p_t^*)$ can be constructed if one is willing to specify a statistical model assumed to generate dividends. For example, suppose dividends are generated by a first-order autoregression:

$d_{t+1} = \lambda d_t + \epsilon_{t+1}$ .   (2.4)

Then an expression for the population value of $V(p_t^*)$ is readily computed as a function of $\lambda$, $\sigma_\epsilon^2$ and $\beta$, and a model-based estimator $\hat V(p_t^*)$ can be constructed by substituting parameter estimates for their population counterparts. Assuming the dividends model is correctly specified, the model-based estimator has little bias (at least in some settings) and, more important, very low sample variability (LeRoy-Parke (1992)). In the setting of LeRoy-Parke the model-based point estimate of $V(p_t^*)$ is about three times greater than the estimate of $V(p_t)$, suggesting acceptance of (1.8). However, due to the nuisance-parameter problem to be discussed now, this result is not of much importance. Besides the ambiguities resulting from the various methods of constructing $\hat V(p_t^*)$, an even more basic problem arises from the fact that (1.8) is an inequality rather than an equality. Assuming that the null hypothesis is true, the population value of $V(p_t)$ depends on the magnitude of the error in investors' estimates of future dividends. Therefore the same is true of the volatility parameter $V(p_t^*) - V(p_t)$, the sample counterpart of which constitutes the test statistic. This error variance is not restricted by the assumption of market efficiency, leading to its characterization as a nuisance parameter. In LeRoy-Parke it is argued that this problem is very serious quantitatively: there is no way to set a rejection region for the volatility statistic $\hat V(p_t^*) - \hat V(p_t)$. It is argued there that because of this nuisance parameter problem, directly testing (1.8) is essentially impossible. Since (1.8) is the best-known of the variance-bounds relations, this is not a minor conclusion. There exist other variance-bounds tests that are better-behaved econometrically than inequality (1.8). To develop these, define $\epsilon_{t+1}$ as the innovation in stock payoffs:

$\epsilon_{t+1} \equiv d_{t+1} + p_{t+1} - E_t(d_{t+1} + p_{t+1})$ ,   (2.5)

so that the present-value relation (1.3) can be written as

$p_t = \beta E_t(d_{t+1} + p_{t+1}) = \beta(d_{t+1} + p_{t+1} - \epsilon_{t+1})$ .   (2.6)

Substituting recursively, using the definition (1.7) of $p_t^*$ and assuming convergence, (2.6) becomes

$p_t^* = p_t + \sum_{i=1}^{\infty} \beta^i \epsilon_{t+i}$ ,   (2.7)

so that the difference between $p_t^*$ and $p_t$ is expressible as a weighted sum of payoff innovations. Equation (2.7) implies

$V(p_t^*) = V(p_t) + \dfrac{\beta^2}{1-\beta^2}\, V(\epsilon_t)$ .   (2.8)
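The coefficient in (2.8) follows from (2.7) once it is noted that the payoff innovations $\epsilon_{t+i}$, $i \ge 1$, are serially uncorrelated and uncorrelated with $p_t$ (each is an innovation relative to an information set containing $I_t$). Under the additional assumption, maintained throughout this section, that the innovation variance is constant over time, a sketch of the variance calculation is

$V\!\left(\sum_{i=1}^{\infty} \beta^{i} \epsilon_{t+i}\right) = \sum_{i=1}^{\infty} \beta^{2i} V(\epsilon_t) = \dfrac{\beta^{2}}{1-\beta^{2}}\, V(\epsilon_t)$ ,

which, added to $V(p_t)$, gives (2.8).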

Put this result aside for the moment. The upper bound for price volatility is derived by considering the volatility of a hypothetical price series that would obtain if investors had perfect information about future dividends. LeRoy-Porter also showed that a lower bound on price volatility could be derived if one was willing to specify that investors have at least some minimal information about future dividends. Suppose that one assumes that investors know at least current and past dividends; they may or may not have access to other variables that predict future dividends. Let $\hat p_t$ denote the stock price that would prevail under this minimal information specification:

$\hat p_t = E(p_t^* \mid d_t, d_{t-1}, d_{t-2}, \ldots)$ .   (2.9)

Then because $I_t$ is a refinement of the information partition induced by $d_t, d_{t-1}, d_{t-2}, \ldots$, we have

$\hat p_t = E\bigl([E(p_t^* \mid I_t)] \mid d_t, d_{t-1}, d_{t-2}, \ldots\bigr)$ ,   (2.10)

by the law of iterated expectations, or

$\hat p_t = E(p_t \mid d_t, d_{t-1}, d_{t-2}, \ldots)$ ,   (2.11)

using (1.6). Therefore, by exactly the same reasoning used to derive (1.8), we obtain

$V(\hat p_t) \le V(p_t)$ ,   (2.12)

so the variance of $\hat p_t$ is a lower bound for the variance of $p_t$. This lower bound is without direct empirical interest since no one has seriously suggested that stock prices are less volatile than is implied by the present-value model under the assumption that investors know current and past dividends. However, the lower bound may be put to a more interesting use. By defining $\hat\epsilon_{t+1}$ as the payoff innovation under the information set generated by $d_t, d_{t-1}, d_{t-2}, \ldots$,

$\hat\epsilon_{t+1} \equiv d_{t+1} + \hat p_{t+1} - E(d_{t+1} + \hat p_{t+1} \mid d_t, d_{t-1}, d_{t-2}, \ldots)$ ,   (2.13)

we derive

$p_t^* = \hat p_t + \sum_{i=1}^{\infty} \beta^i \hat\epsilon_{t+i}$   (2.14)

by following exactly the derivation of (2.7). Equation (2.14) implies

$V(p_t^*) = V(\hat p_t) + \dfrac{\beta^2}{1-\beta^2}\, V(\hat\epsilon_t)$ .   (2.15)

Equations (2.8) and (2.15) plus the lower bound inequality (2.12) imply

$V(\hat\epsilon_t) \ge V(\epsilon_t)$ .   (2.16)

Thus the present-value relation implies not just that prices are less volatile than they would be if investors had perfect information, but also that net one-period payoffs are less volatile than they would be if investors had less information than they (by assumption) do. To test (2.16), one simply fits a univariate time-series model to dividends and uses it to compute $\hat V(\hat\epsilon_t)$, while $\hat V(\epsilon_t)$ is just the estimated residual variance in the regression

$d_t + p_t = \beta^{-1} p_{t-1} + \epsilon_t$ .   (2.17)

This adaptation of LeRoy-Porter's lower bound on price volatility to the formally equivalent - but much more interesting econometrically - upper bound on payoff volatility is due to West (1988). The West test, like Shiller and LeRoy-Porter's upper bound tests on price volatility, resulted in rejection. West reported statistically significant rejection (as noted, Shiller did not compute confidence intervals, while LeRoy-Porter's rejections were only of borderline statistical significance). Generally, the West test is free of the most serious econometric problems that beset the price bounds tests. Most important, under the null hypothesis payoff innovations are serially uncorrelated, so sample means yield good estimates of population means (recall that model-free tests of price volatility are subject to the problem that $p_t$ and $p_t^*$ are highly serially correlated). Further, the associated t-statistics can be used to compute rejection regions. Finally, there is no need to specify investors' information since a model-free estimate of $V(\epsilon_t)$ is used, implying that the nuisance parameter problem that occurs under model-based price bounds tests does not appear here.
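As an illustration of the mechanics (not a reproduction of West's actual procedure), the sketch below computes the two variance estimates compared in (2.16) under the simplifying assumption that dividends follow the first-order autoregression (2.4), in which case $\hat p_t = [\beta\lambda/(1-\beta\lambda)]\, d_t$ and the payoff innovation under dividend-only information has variance $\sigma_u^2/(1-\beta\lambda)^2$, where $\sigma_u^2$ is the dividend innovation variance. The function name and the use of a known $\beta$ are choices made here.

import numpy as np

def west_style_check(p, d, beta):
    # (i) Dividend-only information: fit d_{t+1} = lam * d_t + u_{t+1} by OLS.
    x, y = d[:-1], d[1:]
    lam = x @ y / (x @ x)
    u = y - lam * x
    var_eps_hat = u.var() / (1.0 - beta * lam) ** 2      # V(eps_hat) under the AR(1) assumption

    # (ii) Market information: residual variance in regression (2.17), with beta imposed.
    e = (d[1:] + p[1:]) - p[:-1] / beta
    var_eps = e.var()

    return var_eps_hat, var_eps                          # (2.16) requires the first to be at least the second

Under the null hypothesis, reversal of this inequality in a large sample signals excess payoff volatility.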

3. Dividend-smoothing and nonstationarity

One objection sometimes raised against the variance-bounds tests is that corporate managers smooth dividends. That being the case, and because the ex-post rational stock price is in turn a highly smoothed average of dividends, it is argued that we should not be surprised that actual stock prices are choppier than ex-post rational prices. This point was raised most forcefully by Marsh and Merton (1983), (1986).1

1 This discussion is drawn from the 1988 version of Gilles-LeRoy (1991), available from the author. Discussion of Marsh-Merton was deleted from the published version of that paper in response to an editor's request.

Marsh-Merton asserted that the variance-bounds theorems require for their derivation the assumption that dividends are exogenous, and also that the resulting series is stationary. If these assumptions are not satisfied the variance-bounds theorems are reversed. To prove this, Marsh-Merton (1986) assumed that managers set dividends as a distributed lag on past stock prices:

$d_t = \sum_{i=1}^{N} \lambda_i\, p_{t-i}$ .   (3.1)

Further, from (1.7) the ex-post rational stock price can be written as

$p_t^* = \sum_{i=1}^{T-t} \beta^i d_{t+i} + \beta^{T-t} p_T^*$ .   (3.2)

Finally, Marsh-Merton took the terminal ex-post rational stock price to be given by the sample average stock price:

$p_T^* = \dfrac{1}{T} \sum_{t=1}^{T} p_t$ .   (3.3)

Substituting (3.1) and (3.3) into (3.2), it is seen that $p_t^*$ is expressible as a weighted average of the in-sample $p_t$'s. Using this result, Marsh-Merton proved that in every sample $p_t^*$ has lower variance than $p_t$, just the opposite of the variance-bounds theorem. Questions emerge about Marsh-Merton's assertion that the variance-bounds inequality is reversed if managers smooth dividends. The most important question arises from the fact that none of the rigorous derivations of the variance-bounds theorems available in the literature make use, explicitly or implicitly, of any assumption of exogeneity or stationarity: instead, the theorems depend only on the fact that the conditional expectation of a random variable is less volatile than the random variable itself. How, then, does dividend smoothing reverse the variance-bounds theorem? It turns out that Marsh-Merton are not in fact asserting that the variance-bounds theorems are incorrect, but only that in the setting they specify the sample counterparts of the variance of $p_t$ and $p_t^*$ reverse the population inequality; Marsh-Merton's failure to use notation that distinguishes population from sample moments renders careful reading of their paper needlessly difficult. Marsh-Merton's dividend specification implies that dividends and prices are necessarily nonstationary (this is proved explicitly in Shiller's (1986) comment on Marsh-Merton). Sample moments cannot be expected to satisfy the same inequalities as population moments if the latter are infinite (or time-varying, depending on the interpretation). In nonstationary populations, in fact, there is essentially no relation between population moments and the corresponding sample moments2 - indeed, the very idea that there is a correspondence between
2 Gilles-LeRoy (1991) set out an example, adapted from Kleidon (1986), in which the martingale convergence theorem implies that the sample counterpart of the variance-bounds inequality is reversed with arbitrarily high probability in arbitrarily long samples despite being true at each date in the population. As with Marsh-Merton, nonstationarity is the culprit.

sample and population moments in time-series analysis derives its meaning from the analysis of stationary series. Thus there is no inconsistency whatever between the assertion that the population variance-bounds inequality is satisfied at every date, as it is in Marsh-Merton's model, and Marsh-Merton's demonstration that under their specification its sample counterpart is reversed for every possible sample. What Marsh-Merton's example demonstrates is that if one uses analytical methods appropriate under stationarity when the data under investigation are nonstationary, one can be misled. Thus formulated, Marsh-Merton's conclusion is surely correct. The logical implication is that if one wishes to make progress with the analysis of stock price volatility, one should go on to formulate statistical procedures that are appropriate in the nonstationary setting they assume. Marsh-Merton did not do so, and no easy extension of their model would have allowed them to take this next step. The reason is that Marsh-Merton's model does not contain any specification of what exogenous variables drive their model; the only behavior they model is managers' response to stock prices, treated as exogenous, in setting dividends. Marsh-Merton made two criticisms of the variance-bounds tests: (1) that they depend on the assumption that dividends are stationary, and (2) that they depend on the assumption that dividends are exogenous, as opposed to being smoothed by managers (this second criticism is especially prominent in Marsh-Merton's unpublished paper (1983) dealing with LeRoy-Porter (1981)). Marsh-Merton treated the two points as interchangeable, so that exogeneity was taken to imply stationarity, and dividend-smoothing nonstationarity. In fact dividend exogeneity neither implies nor is implied by stationarity, and the variance-bounds theorems require neither one, as we saw above. It is true that the specific empirical implementation adopted by Shiller has attractive econometric properties only when dividends are stationary in levels.3 However, whether or not the analyst chooses to model the dividend-payout decision, as Marsh-Merton did, or directly assigns dividends a probabilistic model, as LeRoy-Porter did, is immaterial: if the assumed dividends model under the latter coincides with the behavior implied for dividends in the former case, the two are equivalent. It follows that any implementation of the variance-bounds tests that accurately characterizes dividend behavior is acceptable, regardless of whether corporate managers are smoothing dividends and regardless of whether such behavior, if occurring, is modeled. Whether or not Shiller's assumption of trend-stationarity is acceptable has been controversial: many analysts believe that major macroeconomic time series, such as GNP, have a unit root. The debate about trend-stationarity vs. unit roots in macroeconomic time-series is not reviewed here, except to note that (1) of all
3 LeRoy-Porter used a trend correction based on reversing the effect of earnings retention that should have resulted in stationary data, but in fact produced series with a downward trend (which explained why their rejections of the variance-bounds theorems were of only marginal statistical significance). The reasons for the failure of LeRoy-Porter's trend correction are unclear.

the major macroeconomic time series, aggregate dividends appears closest to trend-stationarity, and (2) many econometricians believe that it is difficult to distinguish empirically between the trend-stationary and unit-root cases. Kleidon (1986) showed that if dividends have a unit root, so that dividend shocks have a permanent component, then stock prices should be more volatile than they would be if dividends were stationary. Kleidon expressed the opinion that the evidence of excess volatility reflects nothing more than the nonstationarity of dividends. However, this opinion cannot be sustained. First, the West test is valid if dividends are generated by a linear time-series process with a unit root, so that, if the expected present-value model is correct, dividends and stock prices are cointegrated. West, it is recalled, found significant excess volatility. Other tests, of which Campbell and Shiller (1988) was the first to be published, dealt with dividend nonstationarity by working with the price-dividend ratio instead of price levels. Again the conclusion was that stock prices are excessively volatile. LeRoy-Parke (1992) showed that the variance equality that LeRoy-Porter had used,
$V(p_t^*) = V(p_t) + \dfrac{\beta^2}{1-\beta^2}\, V(\epsilon_t)$ ,   (3.4)

could be adapted to apply to the intensive price-dividend variables, yielding

$V(p_t^*/d_t) = V(p_t/d_t) + \delta\, V(r_t)$ ,   (3.5)

where $\delta$ is a function of various parameters, under the assumption that all variances of the intensive variables $p_t/d_t$, $p_t^*/d_t$ and $r_t$ remain constant over time (this is the counterpart of the assumption, required to derive (3.4), that variances of extensive variables like $p_t$, $p_t^*$ and $\epsilon_t$ remain constant over time). LeRoy-Parke also found excess volatility (see also LeRoy and Steigerwald, 1993). Thus the debate about whether dividends are trend-stationary or have a unit root is, from the point of view of the variance-bounds tests, irrelevant: either way, volatility exceeds that predicted by the present-value model.

4. Bubbles

These results show that excess volatility occurs under at least some forms of dividend nonstationarity. However, they do not necessarily completely dispose of Marsh-Merton's criticisms; any model-based variance-bounds test requires some specification of the probability law, stationary or nonstationary, assumed to generate dividends, and critics can always question this specification. For example, LeRoy-Parke assumed that dividends follow a geometric random walk, a characterization that appears not to do great violence to the data. However, it may be that the dividend-smoothing behavior of managers results in a less parsimonious model for dividends, in which case LeRoy-Parke's results may reflect nothing more than misspecification of the dividends model.
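In one common parameterization (the notation here is not taken from the original chapter), the geometric random walk for dividends is

$\ln d_{t+1} = \ln d_t + \mu + \sigma u_{t+1}, \qquad u_{t+1} \sim \text{i.i.d. } N(0,1),$

so that dividend growth rates are independently and identically distributed; the regime-shift and peso-problem concerns discussed next are ways in which such a maintained law of motion could fail in a manner that even a century of data cannot detect.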

Two sets of circumstances might invalidate variance-bounds tests based on a particular dividend specification such as the geometric random walk. First, it may be that even data sets as long as a century (the length of Shiller's 1981 data set, which was also used in several of the subsequent variance-bounds papers) are too short to allow accurate estimation of dividend volatility. Regime shift models, for example, require very long data sets for accurate estimation. Alternatively, the stock market may be subject to a "peso problem" - investors might attach time-varying probabilities to an event which did not occur in the finite sample. The second circumstance that might invalidate variance-bounds tests is rational speculative bubbles. Thus consider an extreme case of Marsh-Merton's dividend-smoothing behavior: suppose that firms pay some positive (but low) level of dividends that is deterministic.4 Thus all fluctuations in earnings show up as additions to (or subtractions from) capital. In this setting the market value of the firm will reflect the value of its capital, which by assumption does not depend on past dividends. Price volatility will obviously exceed the volatility implied by dividends, since the latter is zero, so the variance-bounds theorem is violated. Theoretically, what is happening in this case is that the limiting condition (1.5) is not satisfied, so that stock prices do not equal the limit of the present value of dividends. Models in which (1.5) fails are defined as rational speculative bubbles: prices are higher than the present value of future dividends but, because they are expected to rise still higher, (1.3) is satisfied. Thus insofar as they are suggesting that dividend smoothing invalidates empirical tests of the variance-bounds relations even in infinite samples, Marsh-Merton are asserting the existence of rational speculative bubbles. Bubbles have received much study in the recent economics literature, partly because of their potential role in resolving the excess volatility puzzle (for theoretical studies of rational bubbles, see Gilles and LeRoy (1992) and the sources cited there; for a summary of the empirical results as they apply to variance-bounds, see Flood and Hodrick (1990)). This is not the place for a complete discussion of bubbles; we remark only that the widely-held impression that bubbles cannot occur in models incorporating rationality is incorrect. This impression is fostered by the practice of referring incorrectly to (1.5) as a transversality condition (a transversality condition is associated with an optimization problem; no such problem has been specified here), suggesting that its satisfaction is somehow virtually automatic. In fact, (1) there exist well-posed optimization problems that do not have necessary transversality conditions, and (2) transversality conditions, even when necessary for optimization, do not always imply (1.5). Examples are found in Gilles-LeRoy (1992). These examples, it is true, appear recondite. However, recall that the goal here is to explain behavior -

4 This specification conflicts with limited liability, which in conjunction with random earnings implies that firm managers may not be able to commit to paying positive dividends with certainty into the infinite future. This objection, while valid, is extraneous to the present concern, and hence is set aside.

excess volatility - that is itself counterintuitive; given this, we should not readily dismiss out of hand counterintuitive specifications of preferences. If (1.3) is satisfied but (1.5) fails, then the price of stock differs from the expected present value of dividends by a bubble term that satisfies

$b_{t+1} = (1 + \rho)\, b_t + \eta_{t+1}$ ,   (4.1)

so that a bubble is a martingale with drift $\rho$. Since the bubble increases in value at average rate $\rho$, which exceeds the growth rate of dividends (otherwise stock prices would be infinite), stock prices rise more rapidly than dividends. Therefore the dividend-price ratio will decrease over time. Informal examination of a plot of the dividend-price ratio shows no clear downward trend, and the majority of the empirical studies surveyed by Flood-Hodrick (1990) do not find evidence of bubbles. This literature is under rapid development, however, from both the theoretical and empirical sides, and this conclusion may shortly be reversed. For now, however, it is difficult to find support for the contention that firms are smoothing dividends in such a way as to invalidate the stationarity presumed in the variance-bounds tests.
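A small simulation may help fix ideas (all numbers are assumptions chosen for illustration; this is not an empirical claim about actual prices). With dividends growing deterministically at rate g < $\rho$, the fundamental price is proportional to the dividend, while a bubble obeying (4.1) grows at rate $\rho$, so the dividend-price ratio drifts toward zero.

import numpy as np

rho, g, T = 0.05, 0.02, 200                  # assumed discount rate, dividend growth rate, horizon
beta = 1.0 / (1.0 + rho)

d = (1.0 + g) ** np.arange(T)                                  # deterministic dividends, d_0 = 1
fundamental = beta * (1.0 + g) / (1.0 - beta * (1.0 + g)) * d  # present value of future dividends

b = 0.5 * (1.0 + rho) ** np.arange(T)        # bubble satisfying (4.1) with the innovation set to zero
p = fundamental + b

dp_ratio = d / p                             # declines steadily as the bubble comes to dominate the price

The absence of a clear downward trend in the observed dividend-price ratio is precisely the informal evidence against bubbles cited in the text.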

5. Time-varying discount rates

One possible explanation for the apparent excess volatility of securities prices is that conditionally expected rates of return depend on the values taken on by the conditioning variables, contradicting (1.1). There is no reason, other than a desire for simplification, to adopt the restriction that the conditional expected return on stock is constant over time, as implied by (1.1). If agents are risk averse, one would expect the conditions of equilibrium in asset markets to reflect a risk-return tradeoff, so that (1.1) would be replaced by a term involving the higher moments of return distributions as well as the conditional mean (consider CAPM, for example). Thus equilibrium conditions like (1.1) are best interpreted as obtaining in efficient markets under the additional assumption of risk-neutrality (LeRoy (1973), Lucas (1978)). Further, in simple models in which agents are risk averse, price volatility is likely to exceed that predicted by risk-neutrality. The intuition is simple: under risk aversion agents try to transfer consumption from dates when income and consumption are high to dates when they are low. Decreasing returns in production mean that this transfer is increasingly costly, so security prices must behave in such a way as to penalize agents who make this transfer. If stock prices are high (low) when income is high (low), then agents are motivated to adapt their saving or dissaving to the production technology, as they must in equilibrium. Thus the more risk averse agents are, the more choppy equilibrium stock prices will be (LaCivita and LeRoy (1981), Grossman and Shiller (1981)). This raises the possibility that the apparent volatility is nothing more than an artifact of the misspecification of risk neutrality implicit in (1.1).
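A sketch of the standard way to relax (1.1), in the spirit of Lucas (1978) and of the consumption-based work of Hansen and Singleton cited below (the consumption and utility notation is introduced here and is not part of the original chapter): with a representative agent having period utility u and subjective discount factor $\delta$, equilibrium requires

$p_t = E_t\!\left[\delta\,\dfrac{u'(c_{t+1})}{u'(c_t)}\,(d_{t+1} + p_{t+1})\right]$ ,

which collapses to (1.3) with a constant discount factor when marginal utility is constant, i.e., under risk neutrality. When marginal utility covaries with payoffs, the effective discount rate moves over time, and price volatility need not be bounded by (1.8).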

A very simple modification of the efficient markets model is seen to be, in principle, sufficient to explain existing price volatility. Providing other explanations subsequently became a minor cottage industry, perhaps because it is so easy to modify the characterization of market efficiency so as to alter its volatility prediction (1.8) (see Eden and Jovanovic 1994, Romer 1993 or Allen and Gale 1994, for example, for recent contributions). For example, consider an overlapping generations model in which the aggregate endowment is deterministic, but some stochastic factor like a random wealth transfer or monetary shock affects individual agents. In general this random shock will affect equilibrium stock prices. This juxtaposition of deterministic aggregate dividends and stochastic prices contradicts the simplest formulation of market efficiency, since deterministic dividends means that the right-hand side of (1.8) is zero, while the left-hand side is strictly positive. Evidently, however, such models are efficient in any reasonable sense of the word: transactions costs are excluded and agents are assumed to be rational and to have rational expectations. Models with asymmetric information can be shown to predict price volatility that exceeds that associated with the conventional market efficiency definition. These efforts have been instructive, but should not be viewed as disposing of the volatility puzzle. The variance-bounds literature was never properly interpreted as pointing to a puzzle for which potential theoretical explanations were in short supply. Rather, it consisted in showing that a simple model which had served well in some contexts did not appear to serve so well in another context. Resolving the puzzle would consist not in pointing out that other more general models do not generate the volatility implication that the data contradict - this was never in doubt - but in showing that these models actually explain the observed variations in security prices. Such explanations have not been forthcoming. For example, attempts to incorporate the effects of risk aversion in security pricing have not succeeded (Hansen and Singleton (1983), Mehra and Prescott (1985)), nor have any of the other proposed explanations of excess volatility been successfully implemented empirically. The enduring interest of the variance-bounds controversy lies in the fact that it was here that it was first pointed out that we do not have good explanations, even ex post, for why security prices behave as they do. It is hard to imagine a more important conclusion, and nothing in the recent development of empirical finance has altered it.

6. Interpretation
Variance-bounds tests as currently formulated appear to be essentially free of major econometric problems - for example, LeRoy-Parke (1992) relied on Monte Carlo simulations to assess the behavior of test statistics, thus ensuring that any econometric biases in the real-world statistics appear equally in the simulated statistics. Therefore econometric problems are automatically accommodated in

setting the rejection region. These reformulated variance-bounds tests have continued to find excess price volatility. The debate about statistical problems with the variance-bounds tests has died out in recent years: it is no longer seriously argued that there does not exist excess price volatility relative to that implied by the simplest expected present-value relation. As important as the above-mentioned refinements of the variance-bounds tests were in leading to this outcome, another development was still more important: conventional market efficiency tests were themselves evolving at the same time as the variance-bounds tests were being developed. The most important modification of the conventional return market efficiency tests was that they investigated return autocorrelations over much longer time horizons than had the earlier tests. Fama and French (1988) found significant predictability in returns. These return autocorrelations are most significant when returns are averaged over five to ten years; earlier studies, such as those reported in Fama (1970), had investigated return autocorrelations over weeks or months rather than years. There are several general methodological lessons to be learned from comparison of conventional market efficiency tests and variance bounds tests about econometric testing of economic theories. Since the same null hypothesis is tested, one would presume that there exist no grounds for a different interpretation of rejection in one case relative to the other. Yet it is extraordinarily difficult to keep this in mind: the existence of excess volatility suggests the conclusion that "we cannot explain security prices", whereas the return autocorrelation results suggest the more workaday conclusion that "average security returns are subject to gradual shifts over time". To bring home the point that this difference in interpretation is unjustified, assume that security prices equal those predicted by the present-value model plus a random term independent of dividends which has low innovation variance, but is highly autocorrelated. One can interpret that random term either as representing an irrational fad or as capturing smooth shifts in security returns due to changes in investment opportunities, shifts in social conditions, or whatever. This modification will generate excess volatility, and will also generate return autocorrelations of the type observed. With the same alternative hypothesis generating both the excess volatility and the return autocorrelations by assumption, there can be no justification for attaching different verbal interpretations to the two rejections. The lesson to be learned is that rejection of a model is just that: rejection of a model. One must be careful about basing interpretations of the rejection on the particular test leading to the rejection, rather than on the model being rejected.

he has been a major contributor) that there exists high negative autocorrelation in returns at long horizons, remarking that this is statistically equivalent to "long swings away from fundamental value" (p. 1581). However, in discussing the variance-bounds tests, Fama expressed the opinion that, despite the fact that they are "another useful way to show that expected returns vary through time", variance-bounds tests "are not informative about market efficiency". Contrary to this, it would seem that the joint-hypothesis problem applies no less or more to variance-bounds tests than to return autocorrelation tests: if one type of evidence is relevant to market efficiency, so is the other. Another lesson is that one must be careful about applying implicit psychological metrics that seem appropriate, but in fact are not. For example, it is easy to regard the apparently spectacular rejections of the variance bounds tests as justifying a strong verbal characterization, whereas the extraneous random term that accounts for return autocorrelations appears too small to justify a similar interpretation. This too is incorrect: a random term that adds and subtracts two or three percentage points, on average, to real stock returns (which average some six or eight per cent) will, if it is highly autocorrelated, routinely translate into a large increase in price variance. The small change in real stock returns is the same thing arithmetically as the large increase in price volatility, so the two should be accorded a similar verbal characterization.
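A back-of-the-envelope calculation (the round numbers are assumptions, used only to illustrate the metric problem): in a constant-growth valuation $p = d/(r-g)$, a persistent shift $\Delta r$ in the required return moves the price by roughly

$\dfrac{\Delta p}{p} \approx -\dfrac{\Delta r}{r-g} = -\dfrac{0.02}{0.04} = -50\%$

when $r - g = 0.04$. A two- or three-point swing in expected returns, if persistent, therefore produces price swings of the same order as those the variance-bounds tests label excess volatility.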

7. Conclusion

In the introduction it was noted that the early interchanges between academics and finance practitioners about capital market efficiency generated more heat than light. Models derived from market efficiency, such as CAPM-based portfolio management models, made some inroads among practitioners, but for the most part the debate between proponents and opponents of rationality in financial markets died down. Parties on both sides agreed to disagree. The evidence of excess price volatility reopened the debate, since it seemed at first to give unambiguous testimony to the existence of irrational elements in security price determination. Now it is clear that there exist other more conservative ways to interpret the evidence of excess volatility: for example, that we simply do not know what causes changes in the rates at which future expected dividends are discounted. The variance-bounds controversy, together with parallel developments in financial economics, permit a considerable narrowing of the gap separating proponents and opponents of market efficiency. The existence of excess volatility implies that there are profitable trading rules, but it is known that these generate only small utility gains to those employing them. In fact, this juxtaposition between large departures from present-value pricing and small gains to those who try to exploit these departures provides the key to finding some middle ground in the efficiency debate. Proponents of market efficiency are vindicated because no one has identified trading rules that are more than marginally profitable.

Detractors of market efficiency are vindicated because a large proportion of the variation in security prices remains unexplained by market fundamentals. Both are correct; both are discussing the same sets of stylized facts. Some proponents of market efficiency go to great lengths to argue that it is unscientific to interpret excess volatility as evidence in favor of the importance of psychological elements in security price determination; see, for example, Cochrane's otherwise excellent review (1991) of Shiller's (1989) book. On this view, evidence is scientific only when it is incontrovertible and, presumably, not susceptible to interpretations other than that proposed. At best this is an unconventional use of the term "scientific". Indeed, if the term "unscientific" is to be applied at all, should it not be to those who feel no embarrassment about the continuing presence in their models of an uninterpreted residual that accounts for most of the variation in the data? Given the continuing failure of financial models based exclusively on received neoclassical economics to provide ex-post explanations of security price behavior, why does being scientific rule out broadening the field of inquiry to include psychological considerations?

References
Allen, F. and D. Gale (1994). Limited market participation and volatility of asset prices. Amer. Econom. Rev. 84, 933-955.
Campbell, J. Y. and R. J. Shiller (1988). The dividend-price ratio and expectations of future dividends and discount factors. Rev. Financ. Stud. 1, 195-228.
Cochrane, J. (1991). Volatility tests and efficient markets: A review essay. J. Monetary Econom. 27, 463-485.
Eden, B. and B. Jovanovic (1994). Asymmetric information and the excess volatility of stock prices. Economic Inquiry 32, 228-235.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. J. Finance 25, 383-417.
Fama, E. F. (1991). Efficient capital markets: II. J. Finance 46, 1575-1617.
Fama, E. F. and K. R. French (1988). Permanent and transitory components of stock prices. J. Politic. Econom. 96, 246-273.
Flavin, M. (1983). Excess volatility in the financial markets: A reassessment of the empirical evidence. J. Politic. Econom. 91, 929-956.
Flood, R. P. and R. J. Hodrick (1990). On testing for speculative bubbles. J. Econom. Perspectives 4, 85-101.
Gilles, C. and S. F. LeRoy (1992). Bubbles and charges. Internat. Econom. Rev. 33, 323-339.
Gilles, C. and S. F. LeRoy (1991). Economic aspects of the variance-bounds tests: A survey. Rev. Financ. Stud. 4, 753-791.
Grossman, S. J. and R. J. Shiller (1981). The determinants of the variability of stock prices. Amer. Econom. Rev. Papers Proc. 71, 222-227.
Hansen, L. and K. J. Singleton (1983). Stochastic consumption, risk aversion, and the temporal behavior of asset returns. J. Politic. Econom. 91, 249-265.
Kleidon, A. W. (1986). Variance bounds tests and stock price valuation models. J. Politic. Econom. 94, 953-1001.
LaCivita, C. J. and S. F. LeRoy (1981). Risk aversion and the dispersion of asset prices. J. Business 54, 535-547.

LeRoy, S. F. (1973). Risk aversion and the martingale model of stock prices. Internat. Econom. Rev. 14, 436-446.
LeRoy, S. F. and W. R. Parke (1992). Stock price volatility: Tests based on the geometric random walk. Amer. Econom. Rev. 82, 981-992.
LeRoy, S. F. and R. D. Porter (1981). Stock price volatility: Tests based on implied variance bounds. Econometrica 49, 555-574.
LeRoy, S. F. and D. G. Steigerwald (1993). Volatility. University of Minnesota.
Lucas, R. E. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1445.
Marsh, T. A. and R. C. Merton (1986). Dividend variability and variance bounds tests for the rationality of stock market prices. Amer. Econom. Rev. 76, 483-498.
Marsh, T. A. and R. C. Merton (1983). Earnings variability and variance bounds tests for stock market prices: A comment. Reproduced, MIT.
Mehra, R. and E. C. Prescott (1985). The equity premium: A puzzle. J. Monetary Econom. 15, 145-161.
Romer, D. (1993). Rational asset price movements without news. Amer. Econom. Rev. 83, 1112-1130.
Samuelson, P. A. (1965). Proof that properly anticipated prices fluctuate randomly. Indust. Mgmt. Rev. 6, 41-49.
Shiller, R. J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? Amer. Econom. Rev. 71, 421-436.
Shiller, R. J. (1989). Market Volatility. MIT Press, Cambridge, MA.
Shiller, R. J. (1986). The Marsh-Merton model of managers' smoothing of dividends. Amer. Econom. Rev. 76, 499-503.
West, K. (1988). Bubbles, fads and stock price volatility: A partial evaluation. J. Finance 43, 636-656.

GARCH Models of Volatility*

F. C. Palm

1. Introduction

Until some fifteen years ago, the focus of statistical analysis of time series centered on the conditional first moment. The increased role played by risk and uncertainty in models of economic decision making and the finding that common measures of risk and volatility exhibit strong variation over time led to the development of new time series techniques for modeling time-variation in second moments. In line with Box-Jenkins type models for conditional first moments, Engle (1982) put forward the Autoregressive Conditional Heteroskedastic (ARCH) class of models for conditional variances which proved to be extremely useful for analyzing economic time series. Since then an extensive literature has been developed for modeling higher order conditional moments. Many applications can be found in the field of financial time series. This vast literature on the theory and empirical evidence from ARCH modeling has been surveyed in Bollerslev et al. (1992), Nijman and Palm (1993), Bollerslev et al. (1994), Diebold and Lopez (1994), Pagan (1995) and Bera and Higgins (1995). A detailed treatment of ARCH models at a textbook level is also given by Gouriéroux (1992). The purpose of this chapter is to provide a selective account of certain aspects of conditional volatility modeling in finance using ARCH and GARCH (generalized ARCH) models and to compare the ARCH approach to alternative lines of research. The emphasis will be on recent developments, for instance in multivariate modeling using factor-ARCH models. Finally, an evaluation of the state of the art will be given. In Section 2, we introduce the univariate and multivariate GARCH models (including ARCH models), discuss their properties and the choice of the functional form and compare them with alternative volatility models. Section 3 will be devoted to problems of inference in these models. In Section 4, the statistical properties of GARCH models, their relationships with continuous time diffusion

* The author acknowledgesmany helpfulcommentsby G. S. Maddala on an earlier version of the paper. 209

210

F. C P a l m

models and the forecasting volatility will be discussed. Finally in Section 5 we conclude and comment on potentially fruitful directions of future research.

2. GARCH models
2.1. Motivation

GARCH models have been developed to account for empirical regularities in financial data. As emphasized by Pagan (1995) and Bollerslev et al. (1994), many financial time series have a number of characteristics in common. First, asset prices are generally nonstationary and often have a unit root, whereas returns are usually stationary. There is increasing evidence that some financial series are fractionally integrated. Second, return series usually show no or little autocorrelation. Serial independence between the squared values of the series, however, is often rejected, pointing towards the existence of nonlinear relationships between subsequent observations. Volatility of the return series appears to be clustered: large fluctuations persist over longer periods, and small values of returns tend to be followed by small values. These phenomena point towards time-varying conditional variances. Third, normality has to be rejected frequently in favor of some thick-tailed distribution. The presence of unconditional excess kurtosis in the series could be related to the time-variation in the conditional variance. Fourth, some series exhibit so-called leverage effects [see Black (1976)], that is, changes in stock prices tend to be negatively correlated with changes in volatility. Some series have skewed unconditional empirical distributions, pointing towards the inappropriateness of the normal distribution. Fifth, volatilities of different securities very often move together, indicating that there are linkages between markets and that some common factors may explain the temporal variation in conditional second moments. In the next subsection, we shall present several models which account for temporal dependence in conditional variances, for skewness and for excess kurtosis.
2.2. Univariate GARCH models

Consider stochastic models of the form

y_t = h_t^{1/2} ε_t                                        (2.1)

h_t = α_0 + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{i=1}^{q} α_i y²_{t−i}        (2.2)

with Eε_t = 0, Var(ε_t) = 1, α_0 > 0, β_i ≥ 0, α_i ≥ 0, and Σ_{i=1}^{p} β_i + Σ_{i=1}^{q} α_i < 1. This is the (p,q)th order GARCH model introduced by Bollerslev (1986). When β_i = 0, i = 1, 2, ..., p, it specializes to the ARCH(q) model put forward in a seminal paper by Engle (1982). The nonnegativity conditions imply a nonnegative variance, while the condition on the sum of the α_i's and β_i's is required for wide-sense stationarity.


These sufficient conditions for a nonnegative conditional variance can be substantially weakened, as shown by Nelson and Cao (1992). The conditional variance of y_t can become larger than the unconditional variance given by σ² = α_0/(1 − Σ_{i=1}^{p} β_i − Σ_{i=1}^{q} α_i) if past squared realizations of y_t have been larger than σ². As shown by Anderson (1992), the GARCH model belongs to the class of deterministic conditional heteroskedasticity models in which the conditional variance is a function of variables that are in the information set available at time t. Adding the assumption of normality, the model can be written as

y_t | ψ_{t−1} ~ N(0, h_t) ,        (2.3)

with h_t being given by (2.2) and ψ_{t−1} being the set of information available at time t−1. Anderson (1994) distinguishes between deterministic, conditionally heteroskedastic, conditionally stochastic and contemporaneously stochastic volatility processes. Loosely speaking, the volatility process is deterministic if the information set (σ-field) is identical to the σ-field of all random vectors in the system up to and including time t = 0; the process is conditionally heteroskedastic if the information set contains information available and observable at time t−1; the process is conditionally stochastic if it contains all random vectors up to period t−1; and the volatility process is contemporaneously stochastic if the information set contains the random vectors up to period t. Notice the order imposed on the information structure of the various volatility representations. When Σ_{i=1}^{p} β_i + Σ_{i=1}^{q} α_i = 1, the integrated GARCH (IGARCH) model arises [see Engle and Bollerslev (1986)]. From the GARCH(p,q) model in (2.2), we obtain that [1 − α(L) − β(L)] y_t² = α_0 + [1 − β(L)] v_t, where v_t = y_t² − h_t are the innovations in the conditional variance process, α(L) = Σ_{i=1}^{q} α_i L^i and β(L) = Σ_{i=1}^{p} β_i L^i. The fractionally integrated GARCH model [FIGARCH(p,d,q)] proposed by Baillie, Bollerslev and Mikkelsen (1993) arises when the polynomial in the lag operator L, 1 − α(L) − β(L), can be factorized as φ(L)(1 − L)^d, where the roots of φ(z) = 0 lie outside the unit circle and 0 < d < 1. The FIGARCH model nests the GARCH(p,q) model for d = 0 and the IGARCH(p,q) model for d = 1. Allowing d to take a value in the interval between zero and one gives additional flexibility that may be important when modeling long-run dependence in the conditional variance. In the empirical analysis of financial data, GARCH(1,1) or GARCH(1,2) models have often been found to account appropriately for conditional heteroskedasticity. This finding is similar to the observation that low order ARMA models usually describe the dynamics of the conditional mean of many economic time series quite well.
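As a concrete illustration of the recursions (2.1)-(2.2), the following minimal sketch (Python, assuming Gaussian ε_t and purely illustrative parameter values) simulates a GARCH(1,1) process and compares the sample variance and kurtosis with their theoretical counterparts.

```python
import numpy as np

def simulate_garch11(T, alpha0, alpha1, beta1, seed=0):
    """Simulate y_t = sqrt(h_t)*e_t with h_t = alpha0 + alpha1*y_{t-1}^2 + beta1*h_{t-1}."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T)
    h = np.zeros(T)
    h[0] = alpha0 / (1.0 - alpha1 - beta1)      # start at the unconditional variance
    y[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * y[t - 1] ** 2 + beta1 * h[t - 1]
        y[t] = np.sqrt(h[t]) * rng.standard_normal()
    return y, h

y, h = simulate_garch11(10000, alpha0=0.05, alpha1=0.10, beta1=0.85)
print("sample variance :", y.var())
print("theoretical     :", 0.05 / (1 - 0.10 - 0.85))
print("sample kurtosis :", ((y - y.mean()) ** 4).mean() / y.var() ** 2)
```

With α_1 + β_1 < 1 the simulated series is covariance stationary; its sample kurtosis typically exceeds 3, reflecting the unconditional fat tails discussed above.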


It is important to notice that in the above models positive and negative past values have a symmetric effect on the conditional variance. Many financial series, however, are strongly asymmetric: negative equity returns are followed by larger increases in volatility than equally large positive returns. Black (1976) interpreted this phenomenon as the leverage effect, according to which large declines in equity values are not matched by a decrease in the value of debt and therefore raise the debt to equity ratio. Models such as the exponential GARCH (EGARCH) model put forward by Nelson (1991), the quadratic GARCH (QGARCH) model of Sentana (1991) and Engle (1990), and the threshold GARCH (TGARCH) of Zakoian (1994) allow for asymmetry. Nelson's EGARCH model reads as follows

ln h_t = α_0 + Σ_{i=1}^{p} β_i ln h_{t−i} + Σ_{i=1}^{q} α_i (φ ε_{t−i} + ψ[|ε_{t−i}| − E|ε_{t−i}|])        (2.4)

where the parameters α_0, α_i, β_i are not restricted to be nonnegative. A negative shock to the returns, which would increase the debt to equity ratio and therefore increase uncertainty about future returns, can be accounted for when ψ ≥ 0 and φ ≤ 0. Similarly, when fractional integration is allowed for in an exponential GARCH model, the FIEGARCH model is obtained. The QGARCH model is written by Sentana (1991) as

h_t = σ² + ψ'x_{t−q} + x'_{t−q} A x_{t−q} + Σ_{i=1}^{p} β_i h_{t−i}        (2.5)

where x_{t−q} = (y_{t−1}, y_{t−2}, ..., y_{t−q})'. The linear term allows for asymmetry. The off-diagonal elements of A account for interaction effects of lagged values of x_t on the conditional variance. The various quadratic variance functions proposed in the literature are nested in (2.5). The augmented GARCH (GAARCH) model of Bera and Lee (1990) assumes ψ = 0. Engle's (1982) ARCH model restricts ψ = 0, β_i = 0 and A to be diagonal. The asymmetric GARCH model of Engle (1990) and Engle and Ng (1993) assumes A to be diagonal. The linear standard deviation model studied by Robinson (1991) restricts β_i = 0, σ² = ρ², ψ = 2ρω and A = ωω', a matrix of rank 1. The conditional variance then becomes

h_t = (ρ + ω'x_{t−q})² .
The TGARCH model put forward by Zakoian (1994) is given by

h_t = α_0 + Σ_{i=1}^{p} β_i h_{t−i} + Σ_{i=1}^{q} (α_i⁺ y⁺_{t−i} + α_i⁻ y⁻_{t−i}) ,        (2.6)

where y_t⁺ = max{y_t, 0} and y_t⁻ = min{y_t, 0}. It accounts for asymmetries by allowing the coefficients α_i⁺ and α_i⁻ to differ. As shown by Hentschel (1994), many members of the family of GARCH models (taking p = q = 1) can be embedded in a Box-Cox transformation of the absolute GARCH (AGARCH) model

(σ_t^λ − 1)/λ = α_0 + α_1 σ_{t−1}^λ f^ν(ε_{t−1}) + β(σ_{t−1}^λ − 1)/λ ,        (2.7)

where σ_t = h_t^{1/2} and f(ε_t) = |ε_t − b| − c(ε_t − b) is the news impact curve introduced by Pagan and Schwert (1990). For λ ≥ 1, the Box-Cox transformation is convex; for λ ≤ 1, it is concave. For λ = ν = 1 and |c| ≤ 1, expression (2.7) specializes to the AGARCH model. The model for the conditional standard deviation suggested by Taylor (1986) and Schwert (1989) arises when λ = ν = 1 and b = c = 0. The exponential GARCH model (2.4) for p = q = 1 arises from (2.7)


when λ = 0, ν = 1 and b = 0. The TGARCH model for the standard deviation is obtained from (2.7) when λ = ν = 1, b = 0 and |c| ≤ 1. The GARCH model (2.2) arises if λ = ν = 2 and b = c = 0. Engle and Ng's (1993) nonlinear asymmetric GARCH corresponds to λ = ν = 2 and c = 0, whereas the GARCH model proposed by Glosten, Jagannathan and Runkle (1993) is obtained when λ = ν = 2 and b = 0. The nonlinear ARCH model of Higgins and Bera (1992) leaves λ free with ν = λ and b = c = 0. The asymmetric power ARCH (APARCH) of Ding, Granger and Engle (1993) leaves λ free with ν = λ, b = 0 and |c| ≤ 1. Sentana's (1991) QGARCH is not nested in the specification (2.7). As shown by Hentschel (1994), nesting existing GARCH models in a general specification like (2.7) highlights the relations between these models and offers opportunities for testing sequences of nested hypotheses regarding the functional form for conditional second order moments. Crouhy and Rockinger (1994) put forward the general so-called hysteresis GARCH (HGARCH) model, in which, in addition to a threshold GARCH part, they include a short term (up to a few days) and a long term (up to a few weeks) impact of returns on volatility. Engle, Lilien and Robins (1987) introduce the ARCH in mean (ARCH-M) model in which the conditional mean is a function of the conditional variance of the process

y_t = g(z_{t−1}, h_t) + h_t^{1/2} ε_t ,        (2.8)

where z_{t−1} is a vector of predetermined variables, g is some function of z_{t−1} and h_t, and h_t is generated by an ARCH(q) process. Of course, when h_t follows a GARCH process, expression (2.8) will be a GARCH in mean equation. The simplest ARCH-M model has g(z_{t−1}, h_t) = δh_t. GARCH in mean models arise in a natural way in theories of finance where, for instance, g(z_{t−1}, h_t) could denote the expected return on some asset with h_t being a measure of risk. The mean equation (2.8) would then reflect the trade-off between risk and expected return. Pagan and Ullah (1988) refer to these models as models with risk terms.
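The risk-return trade-off in (2.8) can be made concrete with a small simulation sketch of the simplest GARCH(1,1)-in-mean specification g(z_{t−1}, h_t) = δh_t; the code below (Python, Gaussian shocks, illustrative parameter values) only shows how the risk term enters the mean.

```python
import numpy as np

def simulate_garch_m(T, delta, alpha0, alpha1, beta1, seed=1):
    """GARCH(1,1)-in-mean: y_t = delta*h_t + sqrt(h_t)*e_t, with a GARCH(1,1) variance h_t."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T)
    h = np.zeros(T)
    eps = np.zeros(T)                      # innovations sqrt(h_t)*e_t
    h[0] = alpha0 / (1.0 - alpha1 - beta1)
    eps[0] = np.sqrt(h[0]) * rng.standard_normal()
    y[0] = delta * h[0] + eps[0]
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * h[t - 1]
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()
        y[t] = delta * h[t] + eps[t]
    return y, h

y, h = simulate_garch_m(5000, delta=2.0, alpha0=0.02, alpha1=0.08, beta1=0.90)
print(np.corrcoef(y, h)[0, 1])   # positive: risky periods carry higher expected returns
```

Periods of high conditional variance then coincide with higher expected returns, which is the trade-off the mean equation is meant to capture.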

2.3. Alternative models for conditional volatility


Measures of volatility which are not based on ARCH type specifications have also been put forward in the literature. For instance, French et al. (1987) construct monthly stock return variance estimates by taking the average of the squared daily returns and fit ARMA models to these monthly variance estimates. A procedure which uses high frequency data to estimate the conditional variance of low frequency observations does not make efficient use of all the data. Also, the conventional standard errors from the second stage estimation may not be appropriate. Nevertheless, the computational simplicity of this procedure, and of a related one put forward by Schwert (1989) in which the conditional standard deviation is measured by the absolute value of the residuals from a first step estimate of the conditional mean, makes them appealing alternatives to more complicated ARCH type models for preliminary data analysis.
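The construction used by French et al. (1987) can be sketched as follows; the function below (Python, with hypothetical month identifiers and simulated daily returns) simply averages squared daily returns within each month to obtain a crude monthly variance series to which an ARMA model could then be fitted.

```python
import numpy as np

def monthly_variance_from_daily(daily_returns, month_ids):
    """Crude monthly variance estimates: the average of squared daily returns within each month."""
    months = np.unique(month_ids)
    return np.array([np.mean(daily_returns[month_ids == m] ** 2) for m in months])

# illustration with simulated daily returns: 24 "months" of 21 trading days each
rng = np.random.default_rng(2)
r = 0.01 * rng.standard_normal(24 * 21)
ids = np.repeat(np.arange(24), 21)
print(monthly_variance_from_daily(r, ids)[:5])
```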


A related estimator for the volatility may be obtained from the inter-period highs and lows. As shown by Parkinson (1980), a high-low estimator for the variance in a random walk with constant variance and continuous time parameter is more efficient than the conventional sample variance based on the same number of end-of-interval observations. Along these lines, the relationship between volatility and the bid-ask spread for prices could be used to construct variance estimates for returns [see e.g. Bollerslev and Domowitz (1993)]. Similarly, the recent efforts to develop option pricing formulae in the presence of stochastic volatility [see e.g. Melino and Turnbull (1990)] have established a positive relationship between the value of an option and the variance of the underlying security, which could be used to assess the volatility of the security price. Finally, information on the distribution of price returns across assets at given points in time could also be used to quantify market volatility. When deciding on the form of the specification for the conditional variance, one has to define the conditioning set of information and to select a functional form for the mapping between the conditioning set and the conditional variance. Usually, the conditioning set is restricted to include past values of the series itself. A simple two-step estimator of the conditional residual variance can be obtained from a regression of the squared residuals on their own lagged values [see Davidian and Carroll (1987)]. Pagan and Schwert (1990) show that the OLS estimator is consistent although not efficient. This two-step estimator's role is that of a benchmark which can be computed in a straightforward way. Jump or mixture models, possibly combined with a GARCH specification for the conditional variance, have been used to describe time-variation in volatility measures, fat tails and skewness of financial series. In the Poisson jump model it is assumed that upon the arrival of abnormal information a jump occurs in the returns. The number of jumps occurring at time t, n_t, is generated by a Poisson distribution with parameter λ. Conditionally on the number of jumps n_t, returns are normally distributed with mean n_t θ and variance σ_t² = σ² + n_t σ_J². The parameter θ denotes the expected jump size. The conditional mean and variance of the returns depend on the number of jumps at period t. Additional time dependency could be introduced by assuming that σ² is generated by a GARCH-type process. In the finance literature, stochastic jumps have usually been modeled by means of a Poisson process [see e.g. Ball and Torous (1985), Jorion (1988), Hsieh (1989), Nieuwland et al. (1991) and Ball and Roma (1993)]. Vlaar and Palm (1993) compare the Poisson jump process with the Bernoulli jump model for weekly exchange rate data from the European Monetary System (EMS). The performance of both models is very similar in most instances. Using the Bernoulli process has the advantage that one avoids making a truncation error when cutting off the infinite sum in a Poisson process. The mixing parameter λ could be allowed to vary over time. For instance, Vlaar and Palm (1994) assume that the mixing parameter λ of a Bernoulli jump model for risk premia on European currencies depends on the inflation differential with respect to Germany.
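A minimal simulation sketch of a Bernoulli jump mixture of the kind discussed above is given below (Python; the values chosen for λ, θ and the jump variance are illustrative): with probability λ a normally distributed jump with mean θ is added to an otherwise normal return, which already produces the fat tails and skewness the mixture models are meant to capture.

```python
import numpy as np

def simulate_bernoulli_jump(T, mu, sigma, lam, theta, sigma_jump, seed=3):
    """Bernoulli jump mixture: with probability lam a N(theta, sigma_jump^2) jump is added."""
    rng = np.random.default_rng(seed)
    jumps = rng.random(T) < lam
    r = mu + sigma * rng.standard_normal(T)
    r += jumps * (theta + sigma_jump * rng.standard_normal(T))
    return r, jumps

r, jumps = simulate_bernoulli_jump(20000, mu=0.0, sigma=0.01, lam=0.05,
                                   theta=-0.01, sigma_jump=0.03)
print("excess kurtosis:", ((r - r.mean()) ** 4).mean() / r.var() ** 2 - 3.0)
```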


Another way of allowing for time dependence is to assume that the probabilities of being in state 1 during period t differ depending on whether the economy was in state 1 or state 2 in period t−1. Such a model has been put forward by Hamilton (1989) and applied to exchange rates [Engel and Hamilton (1990)], interest rates [Hamilton (1988)] and stock returns [Pagan and Schwert (1990)]. In Hamilton's basic model, an unobserved state variable z_t can take the values 0 or 1. The transition probabilities p_ij of moving from state i in period t−1 to state j in period t are constant and given by p_11 = p, p_10 = 1 − p, p_00 = q and p_01 = 1 − q. As shown by Pagan (1995), z_t evolves as an AR(1) process. Observed returns y_t in Hamilton's model are assumed to be generated by

y_t = β_0 + β_1 z_t + (σ² + φ z_t)^{1/2} ε_t ,        (2.9)

with ε_t ~ NID(0, 1). The expected values of y_t in the two states are β_0 and β_0 + β_1 respectively; the variances are σ² and σ² + φ. The model therefore generates states with high volatility and states with low volatility. Expected returns can also vary across these types of states. The variance of returns conditional on the state in period t−1 can be expressed as

Var(y_t | z_{t−1}) = [σ² + (1 − q)φ](1 − z_{t−1}) + [σ² + pφ] z_{t−1}        (2.10)

Quite obviously, the conditional variance (2.10) exhibits time dependence. Hamilton and Susmel (1994) generalize the Markov switching regime model by allowing the disturbances to be ARCH. Their model is called the switching regime ARCH (SWARCH) model. As in equation (2.9), the conditional mean of the SWARCH model depends linearly on the state variable z_t. The disturbance term of y_t is assumed to follow an autoregressive process of order p with an error u_t = √(γ_{z_t}) ũ_t, where ũ_t follows an ARCH(q) process with leverage effects as in the model of Glosten et al. (1993) and γ_{z_t} is a constant scale factor which differs across regimes. The innovation ũ_t is assumed to have a conditional Student t-distribution with mean zero. Transitions between regimes are governed by an unobserved Markov chain. The authors use weekly returns on the value-weighted portfolio of stocks traded on the New York Stock Exchange for the period July 3, 1962 to December 29, 1987. Various ARCH models are compared to SWARCH models allowing for up to four regimes. The SWARCH specification with leverage terms, a conditional Student t-distribution with a low number of degrees of freedom, and four regimes is found to perform best. Along similar lines, using a two-state SWARCH model, Cai (1994) examines the issue of volatility persistence in monthly returns of three-month treasury bills in the period 1964,8 to 1991,11. The persistence in ARCH processes found in previous studies can be accounted for by discrete shifts in the intercept of the conditional variance of the process. Two periods during which a regime shift occurred are the period of the oil crisis, 1974,2-1974,8, and the period 1979,9-1982,8 associated with a policy change of the Federal Reserve Bank.
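The time dependence in (2.10) is easy to verify by simulation. The sketch below (Python) draws the two-state Markov chain, generates returns as in (2.9) with β_1 set to zero so that only the variance switches, and compares the sample variance conditional on z_{t−1} = 1 with σ² + pφ; all parameter values are illustrative.

```python
import numpy as np

def simulate_switching(T, p, q, beta0, beta1, sigma2, phi, seed=4):
    """Two-state model in the spirit of (2.9): P(z_t=1|z_{t-1}=1)=p, P(z_t=0|z_{t-1}=0)=q,
    and the conditional variance of the return is sigma2 + phi*z_t."""
    rng = np.random.default_rng(seed)
    z = np.zeros(T, dtype=int)
    for t in range(1, T):
        stay = p if z[t - 1] == 1 else q
        z[t] = z[t - 1] if rng.random() < stay else 1 - z[t - 1]
    y = beta0 + beta1 * z + np.sqrt(sigma2 + phi * z) * rng.standard_normal(T)
    return y, z

y, z = simulate_switching(100000, p=0.95, q=0.90, beta0=0.0, beta1=0.0,
                          sigma2=1.0, phi=4.0)
mask = z[:-1] == 1                       # condition on the previous state being 1
print(y[1:][mask].var(), 1.0 + 0.95 * 4.0)   # empirical versus sigma2 + p*phi from (2.10)
```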


Estimates of the conditional variance which do not depend on specific assumptions about the functional form can be obtained using nonparametric methods. Pagan and Schwert (1990) and Pagan and Hong (1991) use a nonparametric kernel estimator and a nonparametric flexible Fourier form estimator. The kernel estimator of a conditional moment of y_t, denoted by g(y_t), with a finite number of conditioning variables x_t reads as

E[g(y_t) | x_t] = Σ_{s=1}^{T} g(y_s) K(x_t − x_s) / Σ_{s=1}^{T} K(x_t − x_s) ,        (2.11)

where K is a kernel function which smoothes the data. Various types of kernels might be employed. A popular one is the normal kernel which has also been used by Pagan and Schwert (1990)
K(x_t − x_s) = (2π)^{−1/2} |H|^{−1/2} exp[−½ (x_t − x_s)' H (x_t − x_s)]        (2.12)

H is a diagonal matrix with kth diagonal element set equal to the bandwidth σ̂_k T^{−1/(4+q)}, with σ̂_k being the standard deviation of x_{kt}, k = 1, ..., q, and q being the dimension of the conditioning set. An alternative nonparametric estimator involves a global approximation of the conditional variance using a series expansion. Among the many existing series expansions, the Flexible Fourier Form (FFF) proposed by Gallant (1981) has been used extensively in finance. The conditional variance is represented as the sum of a low-order polynomial and trigonometric terms constructed from past ε̂_t's (the residuals from a regression for y_t). The specification for σ_t² then becomes

σ_t² = σ² + Σ_{j=1}^{L} {α_j ε̂_{t−j} + β_j ε̂²_{t−j} + Σ_{k=1}^{2} [γ_{jk} cos(k ε̂_{t−j}) + δ_{jk} sin(k ε̂_{t−j})]} .        (2.13)

In theory, the number of trigonometric terms should tend to infinity, but in practice, in terms of significance, it is often not worthwhile to go beyond an order of two. A drawback of (2.13) is the possibility that estimates of σ_t² can be negative. The estimator in (2.13) has been applied to stock returns by Pagan and Schwert (1990) for L = 1. The estimate of σ_t² is roughly constant and similar for the kernel, GARCH(1,2) and FFF estimation methods across most of the range of ε̂_{t−1}. Only for large positive and negative values of ε̂_{t−1} do the estimators exhibit a different behavior. For negative values of ε̂_{t−1}, the volatility estimates increase dramatically. Also, the trigonometric terms in (2.13) appear to be highly significant when tested jointly using an F-test. The nonparametric estimates of conditional volatility using kernels or Fourier series differ from the parametric estimates for the GARCH, EGARCH and Hamilton models in periods when stock prices fall. In particular, large negative unexpected returns lead to a large increase in volatility. Parametric estimates appear to adjust slowly to large shocks, and the effects of these shocks exhibit persistence. The parametric methods capture the persistent aspects, while the nonparametric methods capture the highly nonlinear response to large negative shocks. While the nonparametric estimators of conditional volatility have a much higher explanatory power than the parametric GARCH, EGARCH and Hamilton models, in particular in explaining asymmetries, they are inefficient compared with parametric methods. This suggests that improvements could be obtained by merging the two approaches to capture a richer set of specifications than are currently employed. Other nonparametric approaches have been put forward in the literature. Gouriéroux and Monfort (1992) propose to approximate the unknown relation between y_t and ε_t by a step function of the form
y_t = Σ_{j=1}^{J} α_j 1_{A_j}(y_{t−1}) + Σ_{j=1}^{J} β_j 1_{A_j}(y_{t−1}) ε_t        (2.14)

where A_j, j = 1, 2, ..., J is a partition of the set of values of y_{t−1}, 1_{A_j}(y_{t−1}) is an indicator variable taking the value 1 when y_{t−1} is in A_j and zero otherwise, and ε_t is white noise. This model is called the Qualitative Threshold Autoregressive Conditionally Heteroskedastic (QTARCH) model. If regime j applies to the variable y_{t−1}, the conditional mean and variance of y_t are given by α_j and β_j respectively. The process of y_t is determined by qualitative state variables z_t = (1_{A_1}(y_t), ..., 1_{A_J}(y_t)) which are generated by a Markov chain. For instance, the partition A_1, ..., A_J may correspond to the different stages of expansion and contraction of the financial market. By refining the partition A_1, ..., A_J sufficiently, one can use (2.14) to approximate more complex specifications for the conditional mean and variance of y_t. Alternatively, the conditional variance specification could be refined by adding a GARCH term. The pseudo-maximum likelihood estimators of α_j and β_j are the sample mean and variance computed for regime j. The QTARCH model approximates the conditional mean and variance by step functions, whereas the TARCH model of Zakoian (1994) relies on a piecewise linear approximation of the conditional variance function. The nonparametric kernel estimators smooth the conditional moments, and the FFF estimators approximate the conditional moments using functions which are smoother than piecewise linear or step functions. Along similar lines, Engle and Ng (1993) use linear splines to estimate the shape of the response to news. Their procedure is called partially nonparametric (PNP) as the long memory component is modeled parametrically while the relationship between news and volatility is treated nonparametrically. Among semiparametric methods extensively used in analyzing dependencies in financial data, we should mention the seminonparametric (SNP) models based on a series expansion with a Gaussian VAR leading term proposed by Gallant and Tauchen (1989). Assume that the conditional distribution of an N × 1 vector y_t given the entire past depends only on a finite number L of lagged values of y_t, denoted by x_{t−1} = (y'_{t−L}, y'_{t−L+1}, ..., y'_{t−1})', which is a vector of length L·N. The procedure consists of approximating the conditional density of y_t given x_{t−1} by a truncated Hermite expansion which has the form of a polynomial in z_t times the standard normal density, where z_t is the centered and scaled value of y_t, z_t = R^{−1}(y_t − b_0 − Bx_{t−1}).


The truncated expansion is the semiparametric model. The conditional SNP density for z_t given x_{t−1} is approximated by

f(z_t | x_{t−1}) = [Σ_{|α|=0}^{K_z} a_α(x_{t−1}) z_t^α]² φ(z_t) / ∫ [Σ_{|α|=0}^{K_z} a_α(x_{t−1}) u^α]² φ(u) du ,        (2.15)

where φ denotes the standard Gaussian density, α = (α_1, α_2, ..., α_N)', z^α = Π_{i=1}^{N} (z_i)^{α_i} which is of degree |α| = Σ_{i=1}^{N} |α_i|, a_α(x) = Σ_{|β|=0}^{K_x} a_{αβ} x^β with β = (β_1, β_2, ..., β_N)', |β| = Σ_{i=1}^{N} |β_i| and x^β = Π_{i=1}^{N} (x_i)^{β_i}, and K_z and K_x are positive integers. The conditional density of y_t given x_{t−1} is h(y_t | x_{t−1}) = f[R^{−1}(y_t − b_0 − Bx_{t−1}) | x_{t−1}]/det(R). As pointed out by Gallant and Tauchen (1989), by increasing K_z and K_x simultaneously, an SNP model will yield arbitrarily accurate approximations to a class of models which includes fat-tailed distributions (t-like distributions) and skewed distributions. As the stationary distribution of ARCH models is not known in closed form, one cannot say that the ARCH model belongs to the above class. However, the stationary distribution of the ARCH model has fat tails and only a finite number of moments, as does the t-distribution. Conditionally, the variances of ARCH and SNP models are polynomials in a finite number of lags. One might therefore expect that the conditional density of an ARCH model could be approximated arbitrarily closely by SNP for large K_z and K_x. For large L, this may also be true for GARCH models, of which the conditional variance is a polynomial in an infinite number of lags. An alternative to using the ARCH framework is to assume the changing variance to follow some latent process. This leads to a stochastic variance or volatility (SV) model [see e.g. Ghysels et al. (1995)]. Assuming for the sake of simplicity of exposition that the drift parameter is zero, a simple SV model for returns y_t has been proposed by Taylor (1986)
y_t = ε_t exp(ξ_t/2),  ε_t ~ NID(0, 1) ,        (2.16)

ξ_{t+1} = φ_0 + φ_1 ξ_t + η_t,  η_t ~ NID(0, σ_η²) ,


where the random variables ε_t and η_t are independent. This model has been used by Hull and White (1987), for instance, in pricing foreign currency options. Its time series properties are discussed by Taylor (1986, 1994). The statistical properties of SV models are documented in Taylor (1994), who refers to these models as autoregressive variance (ARV) models. A major difficulty arises with the estimation of SV models, which are nonlinear and not conditionally Gaussian. Many estimation methods used for SV models, such as the generalized method of moments (GMM) or the quasi maximum likelihood method (QML), are inefficient. But methods relying on simulation-based techniques make it possible to perform Bayesian estimation or classical likelihood analysis [see e.g. Kim and Shephard (1994)]. Currently, only a few studies compare the performance of the GARCH and SV approaches to modeling volatility.


Ruiz (1993) compares the GARCH(1,1), EGARCH(1,0) and ARV(1) models when applied to daily exchange rates from 1/10/1981 to 28/6/1985 for the Pound sterling, Deutsche mark, Yen and Swiss franc vis-à-vis the U.S. dollar. Within-sample performance of the three models is very similar. When the models are used to forecast out-of-sample volatility, the ARCH models exhibit severe biases which do not occur for the SV volatilities. For daily and weekly returns on the S&P 500 index over the periods 7/3/1962 to 12/31/1987 and 7/11/1962 to 12/30/1992 respectively, Kim and Shephard (1994) conclude that a simple first order SV model fits the data as well as the popular ARCH models. For daily data on the S&P 500 index for the years 1980 to 1987, Danielsson (1994) finds that the EGARCH(2,1) model performs better than ARCH(5), GARCH(1,2) and IGARCH(1,1,0) models. It also outperforms a simple SV model estimated by simulated maximum likelihood. The difference between the dynamic SV model and the EGARCH log-likelihood values is 25.5 in favor of the SV model, which has four parameters whereas the EGARCH model has five parameters.
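For comparison with the GARCH recursions, the following sketch simulates the SV model (2.16) (Python, with illustrative values for φ_0, φ_1 and σ_η); the log-variance ξ_t follows a Gaussian AR(1), and the resulting returns display fat tails even though ε_t is Gaussian.

```python
import numpy as np

def simulate_sv(T, phi0, phi1, sigma_eta, seed=5):
    """Stochastic volatility model (2.16): y_t = e_t*exp(xi_t/2), xi_{t+1} = phi0 + phi1*xi_t + eta_t."""
    rng = np.random.default_rng(seed)
    xi = np.zeros(T)
    xi[0] = phi0 / (1.0 - phi1)            # unconditional mean of the log-variance
    for t in range(1, T):
        xi[t] = phi0 + phi1 * xi[t - 1] + sigma_eta * rng.standard_normal()
    y = np.exp(xi / 2.0) * rng.standard_normal(T)
    return y, xi

y, xi = simulate_sv(10000, phi0=-0.7, phi1=0.93, sigma_eta=0.3)
print("kurtosis:", ((y - y.mean()) ** 4).mean() / y.var() ** 2)   # exceeds 3: fat tails
```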

2.4. Multivariate GARCH models


With the exception of the SNP model, the models presented in Sections 2.2 and 2.3 are univariate. The analysis of many issues in asset pricing and portfolio allocation requires a multivariate framework. Consider an N × 1 vector stochastic process {y_t} which we write as

y_t = Ω_t^{1/2} ε_t        (2.17)

with ε_t being an N × 1 i.i.d. vector with Eε_t = 0 and Var(ε_t) = I_N, and Ω_t being the N × N covariance matrix of y_t conditional on the information available at time t. In a multivariate linear GARCH(p,q) model, Bollerslev, Engle and Wooldridge (1988) assume that Ω_t is given by a linear function of the lagged cross squared errors and lagged values of Ω_t

vech(Ω_t) = α_0 + Σ_{i=1}^{q} A_i vech(ε_{t−i} ε'_{t−i}) + Σ_{i=1}^{p} B_i vech(Ω_{t−i}) ,        (2.18)

where vech(·) denotes the operator that stacks the lower portion of an N × N matrix into an N(N+1)/2 by 1 vector. In (2.18), α_0 is an N(N+1)/2 vector and the A_i's and B_i's are N(N+1)/2 × N(N+1)/2 matrices. The number of unknown parameters in (2.18) equals N(N+1)[1 + N(N+1)(p+q)/2]/2, and in practice some simplifying assumptions have to be imposed to achieve parsimony. For instance, Bollerslev et al. (1988) use the diagonal GARCH(p,q) model, assuming that the matrices A_i and B_i are diagonal. Other representations include the constant conditional correlation model used by Baillie and Bollerslev (1990) and Vlaar and Palm (1993), who assume the conditional variances to be GARCH processes. Conditions on the parametrization (2.18) ensuring that Ω_t is positive definite for all values of ε_t are difficult to check in practice.
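As an illustration of the constant conditional correlation simplification mentioned above, the sketch below (Python) builds the conditional covariance of a bivariate system from two univariate GARCH(1,1) variances and a fixed correlation ρ; the data and parameter values are placeholders.

```python
import numpy as np

def ccc_covariances(y, alpha0, alpha1, beta1, rho):
    """Constant-conditional-correlation GARCH(1,1): each series has a univariate GARCH(1,1)
    variance h_{i,t}; the conditional covariance is rho*sqrt(h_{1,t}*h_{2,t})."""
    T, N = y.shape                              # here N = 2
    h = np.zeros((T, N))
    h[0] = alpha0 / (1.0 - alpha1 - beta1)      # unconditional variances as starting values
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * y[t - 1] ** 2 + beta1 * h[t - 1]
    cov = rho * np.sqrt(h[:, 0] * h[:, 1])
    return h, cov

rng = np.random.default_rng(6)
y = 0.01 * rng.standard_normal((500, 2))        # placeholder return data
h, cov = ccc_covariances(y, alpha0=np.array([1e-6, 2e-6]),
                         alpha1=np.array([0.05, 0.08]),
                         beta1=np.array([0.90, 0.88]), rho=0.4)
print(cov[:3])
```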


Engle and Kroner (1995) propose a parametrization of the multivariate GARCH process to which they refer as the BEKK (Baba, Engle, Kraft and Kroner) representation

Ω_t = C*_0' C*_0 + Σ_{k=1}^{K} Σ_{i=1}^{q} A*_{ik}' ε_{t−i} ε'_{t−i} A*_{ik} + Σ_{k=1}^{K} Σ_{i=1}^{p} G*_{ik}' Ω_{t−i} G*_{ik} ,        (2.19)

where C*_0, A*_{ik} and G*_{ik} are N × N parameter matrices, with C*_0 being triangular, and the summation limit K determines the generality of the process. The covariance matrix in (2.19) will be positive definite under weak conditions. This representation is also sufficiently general to include all positive definite diagonal representations and most positive definite vec representations of the form (2.18). The representation (2.19) is usually more parsimonious in terms of the number of parameters than (2.18). Given that the two parametrizations are found to be equivalent under quite general circumstances, the BEKK parametrization might be preferred because positive definiteness is then ensured quite easily. Engle, Ng and Rothschild (1990) have proposed the factor-ARCH model as a parsimonious structure for the conditional covariance matrix of asset excess returns. These models incorporate the notion that the risk on financial assets can be decomposed into a limited number of common factors f_t and an asset-specific (idiosyncratic) disturbance term. A factor structure arises from the Arbitrage Pricing Theory (APT), although APT does not imply that the number of factors is finite. The factor-ARCH model is used by Engle, Ng and Rothschild (1990) to model interest rate risk, while in a companion paper, Ng et al. (1992) consider risk premia and anomalies to the capital asset pricing model (CAPM) on the U.S. stock market. Diebold and Nerlove (1989) apply a one factor model to exchange rates, whereas King, Sentana and Wadhwani (1994) analyze the links between national stock markets using a factor model. The factor model reads as follows
y_t = μ_t + B f_t + ε_t ,        (2.20)

with y_t being an N × 1 vector of returns, μ_t an N × 1 vector of expected returns, B an N × k matrix of factor loadings, f_t a k × 1 vector of factors with conditional covariance matrix Λ_t, and ε_t an N × 1 vector of idiosyncratic shocks with conditional covariance matrix Ψ_t. The factors and the idiosyncratic shocks are uncorrelated. The conditional covariance matrix of y_t is then given by
Ω_t = B Λ_t B' + Ψ_t .        (2.21)

When Ψ_t is constant and Λ_t has constant (possibly zero) off-diagonal elements, the covariance matrix Ω_t can be expressed as

Ω_t = Σ_{i=1}^{k} b_i b_i' λ_it + Ψ ,        (2.22)

where b_i denotes the i-th column of B and Ψ groups the off-diagonal elements of Λ_t with the constant elements of the covariance matrix of ε_t. As pointed out by


Engle et al. (1990), the model in (2.22) is observationally equivalent to a similar model with constant λ's but time-varying b's. An implication of the factor model (2.22) is that if k ≤ N, we can construct N − k portfolios of assets, i.e. linear combinations of y_t, which have constant variance. There are k portfolios which have λ_it plus a constant as conditional variance. The factor model (2.20) has to be completed by specifying processes for the factor variances. One could, for instance, assume that λ_it is generated by a univariate GARCH process. Applying a one factor model to weekly data on the log differences of seven exchange rates vis-à-vis the US dollar for the period July 1973 to August 1985, Diebold and Nerlove (1989) assume that the single common factor has a variance λ_t = α_0 + θ Σ_{i=1}^{12} (13 − i) f²_{t−i}. Notice that their covariance matrix is of dimension seven by seven but contains only nine unknown parameters, namely those of Ψ, α_0 and θ. By imposing a linearly decreasing pattern on the ARCH coefficients, they achieve a substantial reduction of the number of parameters to estimate. A GARCH(1,1) specification would instead yield geometrically decreasing ARCH coefficients. An alternative proposed by Engle et al. (1990) consists in assuming that the returns of each of the k factor-representing portfolios follow a GARCH process. For i = 1, ..., k, the conditional variance of the i-th portfolio is then given by
φ_i' Ω_t φ_i = ω_i + α_i (φ_i' y_{t−1})² + β_i φ_i' Ω_{t−1} φ_i ,        (2.23)

where, for simplicity, a GARCH(1,1) model is assumed and φ_i is an N × 1 vector of portfolio weights. The conditional variances of the portfolios differ from λ_it by a constant term only, i.e. φ_i'Ω_tφ_i = λ_it + φ_i'Ψφ_i, which together with (2.23) can be substituted into (2.22) so as to express the conditional covariance matrix Ω_t in (2.22) in terms of the conditional portfolio variances. Notice that φ_i'b_i = 1 and φ_i'b_j = 0, j ≠ i. While the factor-GARCH model has theoretically appealing features, its estimation requires highly nonlinear methods. Maximum likelihood estimation has been considered among others by Lin (1992). Also, an identification issue has to be resolved when the factor portfolios are not directly observed before the model can be estimated [see Sentana (1992)]. In particular, the factor-representing portfolios have to be identified. In some instances, it is appropriate to assume that the factor-representing portfolios are known and observed. For example, Engle et al. (1990) explain the monthly returns on Treasury bills with maturities ranging from one to twelve months and the value-weighted index of NYSE-AMSE stocks for the period from August 1964 to November 1985. They select two factor-representing portfolios, one having equal weights on each of the bills and zero weight on the stock index, and the other having zero weights on the bills and all weight on the stock index. Models with observed factor-representing portfolios can be consistently estimated in two steps. One can first estimate the univariate models for the portfolios. Using the estimates obtained in the first step, the factor loadings can be estimated consistently up to a sign, as individual assets have a variance which is linear in the factor variances with coefficients that are equal to the squared factor loadings.
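A one-factor special case of (2.22), with the factor variance λ_t following a GARCH(1,1) driven by an observed factor return, can be sketched as follows (Python; the loadings, idiosyncratic covariance and GARCH parameters are illustrative placeholders).

```python
import numpy as np

def one_factor_garch_cov(f, b, psi, alpha0, alpha1, beta1):
    """One-factor model: Omega_t = b b' * lambda_t + Psi, with lambda_t a GARCH(1,1)
    driven by the observed factor return f_t (a special case of (2.22))."""
    T = len(f)
    lam = np.zeros(T)
    lam[0] = alpha0 / (1.0 - alpha1 - beta1)
    for t in range(1, T):
        lam[t] = alpha0 + alpha1 * f[t - 1] ** 2 + beta1 * lam[t - 1]
    # conditional covariance matrices of the asset returns, one per period
    omegas = lam[:, None, None] * np.outer(b, b)[None, :, :] + psi[None, :, :]
    return lam, omegas

rng = np.random.default_rng(7)
f = 0.01 * rng.standard_normal(300)              # placeholder factor returns
b = np.array([0.9, 1.1, 0.6])                    # illustrative factor loadings
psi = np.diag([1e-5, 2e-5, 1.5e-5])              # constant idiosyncratic covariance
lam, omegas = one_factor_garch_cov(f, b, psi, alpha0=1e-6, alpha1=0.07, beta1=0.90)
print(omegas[0])
```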


King et al. (1994) estimate a multivariate factor model as in (2.20) from monthly data on US dollar excess returns for 16 national stock markets for the period 1970,1 to 1988,10, using the maximum likelihood method. They assume that the risk premium μ_t can be expressed as μ_t = BΛ_t z, with Λ_t being a diagonal matrix and z a k × 1 vector of constant parameters representing the price of risk for each factor. King et al. (1994) consider the model for k = 6, with 4 observed and 2 unobserved factors. The observable factors represent the unanticipated shocks to asset returns. These shocks are estimated as the common factors extracted from a four-factor model applied to the residuals from a vector autoregression for x_t, a set of 10 observed macroeconomic variables. The variances of the common and idiosyncratic terms are assumed to follow univariate GARCH(1,1) processes in which the past squared values of the factors are replaced by their linear projection given some available information set. Notice that when the covariance matrix of the factor-GARCH model depends on prior unobservables, the return components have a conditionally stochastic volatility representation [see Anderson (1992), Harvey et al. (1992)]. A major finding is that only a small proportion of the covariances between national stock markets and their time-variation can be explained by observed factors. Conditional second moments are explained to a large extent by unobserved factors. This finding underlines the usefulness of models allowing for unobservable factors in explaining volatility within markets and volatility spillovers between markets. The application in King et al. (1994) also illustrates the appropriateness and feasibility of the use of factor models to explain the time-dependence in second order moments of a multivariate time series of dimension 16. While it was possible to jointly estimate the factor model with some 200 parameters, the authors had to estimate the vector autoregression for x_t separately in a first step. Given that the dimension of the parameter space of multivariate factor-GARCH models will usually be high, two-step estimation procedures will be a feasible alternative to fully joint estimation procedures based on the likelihood principle.

2.5. Persistence in the conditional variance


For high-frequency time series data, the conditional variance estimated using a GARCH(p,q) process (2.2) often exhibits persistence, that is, Σ_{i=1}^{p} β_i + Σ_{i=1}^{q} α_i is close to one. When this sum is equal to one, the IGARCH model arises. This means that current information remains important when forecasting the conditional variance at all horizons. The unconditional variance does not exist in that case. Bollerslev (1986) has shown that under normality, the GARCH process (2.2) is wide-sense stationary with unconditional variance var(y_t) = α_0(1 − Σ_{i=1}^{p} β_i − Σ_{i=1}^{q} α_i)^{−1} and cov(y_t, y_s) = 0 for t ≠ s if and only if Σ_{i=1}^{p} β_i + Σ_{i=1}^{q} α_i < 1. Nelson (1990a) and Bougerol and Picard (1992) prove that the IGARCH model is strictly stationary and ergodic but not covariance stationary. Similarly, as shown in Bollerslev and Engle (1993), the multivariate GARCH(p,q) process (2.18) is covariance stationary if and only if the roots of the


characteristic polynomial det[I − A(λ^{−1}) − B(λ^{−1})] = 0 lie inside the unit circle. In that case, there will be no persistence in the variance. On the other hand, if some eigenvalues lie on the unit circle, shocks to the conditional covariance matrix remain important for forecasts at all horizons. If the eigenvalues are outside the unit circle, the effect of a shock to the covariance matrix will explode over time. Notice that the above conditions on the roots of the characteristic polynomial also apply to the BEKK model (2.19), as shown by Engle and Kroner (1995). In many empirical studies of financial data using univariate GARCH(p,q) models, the estimated parameters are found to have a sum close to one. A detailed survey of the literature can be found in Bollerslev, Chou and Kroner (1992). The multivariate k factor model (2.20) with a GARCH(p,q) process of the form (2.23) for the factors will be covariance stationary if the portfolios and ε_t are covariance stationary. In line with the concept of cointegration between a set of variables, Bollerslev and Engle (1993) put forward a definition of co-persistence in variance. The basic idea is that several time series may show persistence in the variance while at the same time some linear combinations of the variables may exhibit no persistence in the variance. Bollerslev and Engle (1993) derive necessary and sufficient conditions for co-persistence in the variances of a multivariate GARCH(p,q) process. In practice, co-persistence in the variances allows one to construct portfolios with stationary volatilities from assets which have nonstationary return volatilities. The finding of unit roots in multivariate GARCH models has led to new developments in factor-ARCH models. Engle and Lee (1993) formulate a factor model of the form of King et al. (1994) within which they allow for permanent IGARCH(1,0,1) and transitory GARCH(1,1) components in the volatilities. Engle and Lee (1993) apply several variants of the component model to daily returns on the CRSP value-weighted index and fourteen individual stocks of large U.S. companies for a sample period from July 1, 1962 to December 31, 1991. Their major empirical finding is that the persistence of individual return volatilities is due to the persistence of both market volatility (assumed to be a common factor) and the idiosyncratic volatilities of individual stocks. These results imply that the hypothesis that stock return volatility is co-persistent with market volatility is rejected when market shocks are assumed not to affect idiosyncratic volatility. Using a factor-component-GARCH model with observed factors, Palm and Urbain (1995) also find significant persistence in the volatilities of the common and idiosyncratic factors, using daily observations on returns of stock price indices for Europe, the Far East and North America for the period February 1982-August 1995. While the use of factor-component-GARCH models is still in its infancy, the empirical finding of persistence in return volatilities [see e.g. French, Schwert and Stambaugh (1987), Chou (1988), Pagan and Schwert (1990), Ding et al. (1993) and Engle and Gonzalez-Rivera (1991)] and in common factor and/or idiosyncratic factor volatilities raises a number of important questions. For instance, is the finding of persistence in volatilities in agreement with the stationarity assumption for asset returns which has often been made in the literature?


Would finance theory not predict that nonstationarity in the volatility leads to nonstationarity in asset returns? What is the precise form of the persistence in volatilities and in the return series? Should it be modeled as a unit root in the permanent component of the conditional variances, should one allow for fractional integration, or should it be modeled as regime switches as, e.g., in Cai (1994) or in Hamilton and Susmel (1994)? There is increasing evidence that return series exhibit fractional integration [see e.g. Baillie (1994)]. The difficulty of empirically distinguishing between persistence arising from unit roots and persistence arising from fractional differencing is due to the low power of many existing testing procedures.

3. Statistical inference

3.1. Estimation and testing


GARCH models are usually estimated by the method of maximum likelihood (ML) or quasi-maximum likelihood (QML). In some applications, the generalized method of moments (GMM) has been used [see e.g. Glosten et al. (1993)]. Stochastic volatility models were usually estimated by GMM. More recently, indirect inference methods [see e.g. Gouriéroux and Monfort (1993) and Gallant et al. (1994)] have been advocated and used to estimate stochastic volatility models. Bayesian methods have been developed for volatility models [see e.g. Jacquier et al. (1994) for the estimation of stochastic volatility models and Geweke (1994) for the estimation of stochastic volatility and GARCH models]. For simplicity, we discuss ML estimation of the GARCH(1,1) model (2.1) and (2.2) under the assumption that ε_t is distributed as IN(0,1). The log-likelihood function L for T observations on y_t, denoted by y = (y_1, y_2, ..., y_T)', can be written as
L(y | θ) = Σ_{t=1}^{T} L_t ,        (3.1)

where L_t = c − ½ ln h_t − y_t²/(2h_t), with θ = (α_0, α_1, β_1)', h_1 = σ² = α_0/(1 − α_1 − β_1) and h_t given by (2.2) for t > 1. Given initial values for the parameter vector θ, the log-likelihood function (3.1) can be evaluated by computing h_t, t = 1, 2, ..., T, recursively and substituting the values into (3.1). Standard numerical algorithms can be used to compute the maximum of (3.1). As is well known, under regularity conditions given for instance in Crowder (1976), the value of θ which maximizes L, θ̂_ML, is consistent, asymptotically normally distributed and efficient:

√T(θ̂_ML − θ) → N(0, Var(θ̂_ML)) ,        (3.2)

where Var(θ̂_ML) = −[T^{−1} Σ_{t=1}^{T} E ∂²L_t/∂θ∂θ']^{−1}. The asymptotic covariance matrix of θ̂_ML can be consistently estimated by the inverse of the Hessian matrix associated with (3.1), evaluated at θ̂_ML.
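In practice the maximization of (3.1) is carried out numerically. The sketch below (Python) codes minus the Gaussian GARCH(1,1) log-likelihood with h_1 set to the unconditional variance, as in the text, and can be handed to a standard numerical optimizer; the stationarity restriction imposed inside the function is a simplification for this sketch.

```python
import numpy as np

def neg_loglik_garch11(theta, y):
    """Minus the Gaussian log-likelihood (3.1) for a GARCH(1,1); theta = (alpha0, alpha1, beta1)."""
    alpha0, alpha1, beta1 = theta
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        return np.inf                       # stay in the stationarity region for this sketch
    T = len(y)
    h = np.empty(T)
    h[0] = alpha0 / (1.0 - alpha1 - beta1)  # h_1 set to the unconditional variance
    for t in range(1, T):
        h[t] = alpha0 + alpha1 * y[t - 1] ** 2 + beta1 * h[t - 1]
    return 0.5 * np.sum(np.log(2.0 * np.pi) + np.log(h) + y ** 2 / h)

# e.g. scipy.optimize.minimize(neg_loglik_garch11, x0=[0.05, 0.05, 0.90],
#                              args=(y,), method="Nelder-Mead")
```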


A proof of the consistency and asymptotic normality of the ML estimator in GARCH(1,1) and IGARCH(1,1) models is given by Lumsdaine (1992) under the condition that E[ln(α_1 ε_t² + β_1)] < 0. The existence of finite fourth moments of ε_t is not required. Unlike models with a unit root in the conditional mean, the ML estimators in models with and without a unit root in the conditional variance have the same limiting distribution. As shown by Weiss (1986) for time series models with ARCH errors, and by Bollerslev and Wooldridge (1992) and Gouriéroux (1992) for GARCH processes, the quasi-ML or pseudo-ML estimator of θ is obtained by maximizing the normal log-likelihood function (3.1) even though the true probability density function is non-normal. Under regularity conditions, the QML estimator has the following asymptotic distribution

√T(θ̂_QML − θ) → N(0, B^{−1} A B^{−1}) ,        (3.3)

where A = E_0[∂L_t/∂θ ∂L_t/∂θ'] is the covariance matrix of the score vector of L and B = −E_0[∂²L_t/∂θ∂θ'], with E_0 denoting the expectation with respect to the true probability density function of the data. Of course, if the latter is the normal distribution, the asymptotic distributions in (3.2) and (3.3) will be identical. Lee and Hansen (1994) prove consistency and asymptotic normality of the QML estimator of the Gaussian GARCH(1,1) model. The disturbance scaled by its conditional standard deviation need not be normally distributed nor independent over time. The GARCH process may be integrated, α_1 + β_1 = 1, and even explosive, α_1 + β_1 > 1, provided the conditional fourth moment of the scaled disturbance is bounded. In finite samples, for symmetric departures from conditional normality, the QML estimator has been found to be close to the exact ML estimator in a simulation study by Bollerslev and Wooldridge (1992). For non-symmetric conditional true distributions, both in small and large samples the loss of efficiency of QML compared to exact ML can be quite substantial. Semi-parametric density estimation as proposed by Engle and Gonzalez-Rivera (1991), using a linear spline with smoothness priors, will then be an attractive alternative to QML. With respect to ML and QML methods for estimating GARCH models, some comments can be made. First, although GARCH generates fat tails in the unconditional distribution, when combined with conditional normality it does not fully account for the excess kurtosis present in many financial data. The Student t-distribution, with the number of degrees of freedom to be estimated, has been used by several authors. Other densities which have been used in the estimation of GARCH models are the normal-Poisson mixture [see e.g. Jorion (1988), Nieuwland et al. (1991)], the normal-lognormal mixture distribution [e.g. Hsieh (1989)], the generalized error distribution [see e.g. Nelson (1991)] and the Bernoulli-normal mixture [Vlaar and Palm (1993)]. De Vries (1991) proposes to use a GARCH-like process with conditional stable distribution which models the clustering of volatility, has fat tails and has an unconditional stable distribution. Second, for some models, such as the regression model under conditionally normal ARCH disturbances, the information matrix is block-diagonal [see e.g. Engle (1982)].


The implications are important in that the regression coefficients and the ARCH parameters can be estimated separately without loss of asymptotic efficiency. Also, their variances can be obtained separately. These results have been generalized by Linton (1993), who shows that the parameters of the conditional mean are adaptive in the sense of Bickel when the errors follow a stationary ARCH(q) process with an unknown conditional density which is symmetric about zero. In other words, estimating the unknown score function using the kernel method based on the normal density function yields parameter estimates of the conditional mean which have the same asymptotic distribution as the ML estimator based on the true distribution. This block-diagonality does not hold for the ARCH-M model, as there the conditional mean of a series depends on parameters of the conditional variance process. Also for an EGARCH disturbance process, the block-diagonality of the information matrix fails to hold. Indirect inference put forward by Gouriéroux and Monfort (1993) and the efficient method of moments of Gallant et al. (1994) will be attractive when it is difficult to apply QML or ML but it is possible to estimate some function of the parameters of interest from the data. The indirect estimator has been used by Engle and Lee (1994) to estimate diffusion models of stochastic volatility. As a starting point, they estimate GARCH(1,1) models from daily returns on the S&P 500 Index for the period 1991,1-1990,9. The resulting QML estimates for θ are used to estimate the parameters of the underlying diffusion model for the asset price p_t and its conditional variance σ_t²

(a)  y_t = μ dt + σ_t dw_{yt}
(b)  dσ_t² = φ(m − σ_t²) dt + δ σ_t² dw_{σt}        (3.4)
(c)  corr(dw_y, dw_σ) = ρ

with y_t = dp_t/p_t, and dw_y and dw_σ being Wiener processes, using the relationships which match the first and second order conditional moments of the GARCH model and the diffusion model [see Nelson (1990b)]: m = α_0, φ = (1 − α_1 − β_1)/dt and δ = α_1(√(κ − 1))/dt, with κ being the conditional kurtosis of the shocks of the GARCH model. Indirect estimation based on estimates of a discrete time GARCH model appears to be an appropriate way to estimate the parameters of the underlying diffusion process. To estimate stochastic volatility models, Gallant et al. (1994) use an indirect method based on the score of two auxiliary models. Both auxiliary models assume an SNP density as given in (2.15). When the SNP density is in the form of an ARCH model with conditionally homogeneous non-Gaussian innovations, it is termed the nonparametric ARCH model because it is similar to the nonparametric ARCH process considered by Engle and Gonzalez-Rivera (1991). In the second model, the homogeneity constraint is dropped and the model is called the fully nonparametric specification. The SNP models are estimated by QML. Gallant et al. (1994) use daily observations on the S&P Composite Index for the period 1928-1987 to estimate a univariate model, and daily observations for the period 1977-1992 to estimate a trivariate model for the S&P NYSE Index, the


DM/$ exchange rate and the three-month Eurodollar interest rate. The stochastic volatility model is found to be able to match the ARCH part of the nonparametric ARCH score for stock prices and interest rates. However, it does not match the moments of the distribution of the innovations. For the exchange rate series, the stochastic volatility model fails to fit the ARCH part. Testing for the presence of ARCH(q) has also been considered extensively in the literature. A simple and frequently used test of the hypothesis H_0: α_1 = α_2 = ... = α_q = 0 against the alternative H_1: α_1 ≥ 0, ..., α_q ≥ 0 with at least one strict inequality is the Lagrange multiplier (LM) test proposed by Engle

LM = ½ f̂_0' z (z'z)^{−1} z' f̂_0 ,        (3.5)

where z_t = (1, y²_{t−1}, ..., y²_{t−q})', z = (z_1, ..., z_T)' and f̂_0 is the column vector with elements (y_t²/α̂_0 − 1).

An asymptotically equivalent statistic is LM = TR², where R² is the squared multiple correlation between f̂_0 and z and T is the sample size. This is also the R² of a regression of y_t² on an intercept and q lagged values of y_t². As shown by Engle (1982), a two-sided LM test has an asymptotic χ²-distribution with q degrees of freedom. Demos and Sentana (1991) report critical values for the one-sided LM test which are robust to non-normality. A difficulty in constructing LM tests for GARCH disturbances is that the block of the information matrix whose inverse is required is singular, as pointed out by Bollerslev (1986). This is due to the fact that under the null hypothesis, β_1 in the GARCH(1,1) model is not identified. Lee (1991) has shown how this difficulty can be avoided and that the LM tests for ARCH and GARCH errors are identical. Lee and King (1993) derive a locally most mean powerful (LMMP)-based score (LBS) test for the presence of ARCH and GARCH disturbances. The test is based on the sum of the scores evaluated at the null hypothesis, with nuisance parameters replaced by their ML estimates. In the absence of nuisance parameters, the test is LMMP. The sum of the scores is then standardized by dividing it by its large sample standard error. The resulting test statistic has an asymptotic N(0,1) distribution. The test statistics used to test against an ARCH(q) process can also be used to test against a GARCH(p,q) process. In small samples, the LBS test appears to have better power than the LM test, and its asymptotic critical values were found to be at least as accurate. Wald and likelihood ratio (LR) criteria could be used to test the hypothesis of conditional homoskedasticity, e.g. against a GARCH(1,1) alternative. The statistics associated with H_0: α_1 = 0 and β_1 = 0 against H_1: α_1 ≥ 0 or β_1 ≥ 0 with at least one strict inequality do not have a χ²-distribution with two degrees of freedom, as the standard assumption that the true parameter value under H_0 does not lie on the boundary of the parameter space does not hold. An LR test which uses a χ²-distribution with two degrees of freedom can be shown to be conservative [see e.g. Kodde and Palm (1986)]. Also, the problem of lack of identification of some parameters mentioned above can lead to a breakdown of standard Wald and LR testing procedures.
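The TR² form of the LM test is straightforward to compute from the auxiliary regression of y_t² on an intercept and its own q lags; a minimal sketch (Python) is given below.

```python
import numpy as np

def lm_arch_test(y, q):
    """LM test for ARCH(q): T*R^2 from regressing y_t^2 on a constant and q lags of y_t^2;
    asymptotically chi-squared with q degrees of freedom under the null (two-sided version)."""
    y2 = y ** 2
    T = len(y2) - q
    Y = y2[q:]
    X = np.column_stack([np.ones(T)] + [y2[q - i:-i] for i in range(1, q + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    r2 = 1.0 - resid.var() / Y.var()
    return T * r2

rng = np.random.default_rng(8)
print(lm_arch_test(rng.standard_normal(1000), q=4))   # small value expected for i.i.d. data
```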


These ARCH statistics test for specific forms of conditional heteroskedasticity. Many tests, however, have been designed to detect general departures from independently, identically distributed random variables. For instance, the BDS test put forward by Brock, Dechert and Scheinkman (1987) tests for general nonlinear dependence. Its power against ARCH alternatives is similar to that of the LM-ARCH test [see e.g. Brock, Hsieh and LeBaron (1991)]. For other alternatives, the power of the BDS test may be higher. The application by Bera and Lee (1993) of White's Information Matrix (IM) criterion to the linear regression model with autoregressive disturbances leads to a generalization of Engle's LM test for ARCH in which ARCH processes are specified as random coefficient autoregressive models. Several authors have noted that ARCH can be given a random coefficient interpretation [see e.g. Tsay (1987)]. Bera, Lee and Higgins (1992) point out the dangers of tackling specification problems one at a time rather than considering them jointly, and provide a framework for analyzing autocorrelation and ARCH simultaneously. That such a framework is needed has been illustrated in a convincing way by e.g. Diebold (1987), who shows that in the presence of ARCH, standard tests for serial correlation will lead to over-rejection of the null hypothesis. Notice that the presence of ARCH could be interpreted in several ways, such as nonnormality (excess kurtosis, skewness for asymmetric ARCH) [see e.g. Engle (1982)] and nonlinearity [see e.g. Higgins and Bera (1992)]. Recently, Bollerslev and Wooldridge (1992) have developed robust LM tests for the adequacy of the jointly parametrized mean and variance. Their test is based on the gradient of the log-likelihood function evaluated at the constrained QML estimator and can be computed from simple auxiliary regressions. Only first derivatives of the conditional mean and variance functions are required. The authors present simulation results revealing that in most cases the robust test statistics compare favorably to nonrobust (standard) Wald and LM tests. This conclusion is in line with findings by Lumsdaine (1995), who compares GARCH(1,1) and IGARCH(1,1) models in a simulation study of the finite-sample properties of the ML estimator and related test statistics. While the asymptotic distribution is found to be well approximated by the estimated t-statistics, parameter estimators are skewed for finite sample sizes, Wald tests have the best size, and the standard LM test is highly oversized, although versions that are robust to possible nonnormality perform better. Various model diagnostics have been proposed in the literature. For instance, Li and Mak (1994) examine the asymptotic distribution of the squared standardized residual autocorrelations from a Gaussian process with time-dependent conditional mean and variance estimated by ML. The residuals are standardized by dividing them by their conditional standard deviation and subtracting their sample mean. The conditional mean and variance of the process can be nonlinear functions of the information available at time t. These functions are assumed to have continuous second order derivatives. When the data generating process is ARCH(q), a Box-Pierce type portmanteau test based on autocorrelations of squared standardized residuals of order r up to M will have an asymptotic χ²-distribution with M − r degrees of freedom when r > q. These types of diagnostics are very useful for checking the adequacy of the model.


Specific kinds of hypotheses can arise in multivariate GARCH models. For instance, GARCH can be a common feature of several time series. Engle and Kozicki (1993) define a feature that is present in a group of time series as common to those series if there exists a nonzero linear combination of the series that does not have the feature. As an example, consider the bivariate version of the factor-ARCH model in (2.20) with one factor and a constant idiosyncratic factor covariance matrix. If the variance of ft follows a GARCH process, the series yit will also be GARCH, but the linear combination y1t − (b1/b2)y2t will have a constant conditional variance. In this example, the series y1t and y2t share a common feature in the form of a common factor with a time-varying conditional variance. Engle and Kozicki (1993) put forward tests for common features. Engle and Susmel (1993) apply the procedure to test for ARCH as a common feature in international equity markets. The approach is as follows. First, test for the presence of ARCH in the individual time series. Second, if the ARCH effects are significant in both series, consider the linear combination y1t − δy2t, regress its squared value on lagged squared values and lagged cross products of the series yit up to lag q, and minimize TR²(δ) over the coefficient δ. If instead of two series a set of k series is considered, δ becomes a (k − 1) × 1 vector. As shown by Engle and Kozicki (1993), the minimized value of TR²(δ) has a χ²-distribution with degrees of freedom given by the number of lagged squared values included in the regressions minus (k − 1). Engle and Susmel (1993) applied the test to weekly returns on stock market indexes for 18 major stock markets in the world over the period January 1980 to January 1990. They found two groups of countries, one of European countries and one of Far East countries, which show similar time-varying volatility. The common feature tests therefore confirm the existence of a common factor-ARCH structure for each group.
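An illustrative sketch of the two-series version of this test follows; the grid over δ, the regressor set (lagged squares and cross products up to lag q) and the resulting degrees of freedom are one possible reading of the procedure described above, not a definitive implementation.

import numpy as np
from scipy import stats

def common_arch_feature_test(y1, y2, q, deltas=np.linspace(-5, 5, 501)):
    """Grid-search version of the common ARCH feature test: minimize
    T*R^2(delta) from regressing (y1 - delta*y2)^2 on an intercept and
    lagged squares and cross products of the two series up to lag q."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)

    def tr2(delta):
        u2 = (y1 - delta * y2) ** 2
        dep = u2[q:]
        lags = ([y1[q - j:len(y1) - j] ** 2 for j in range(1, q + 1)]
                + [y2[q - j:len(y2) - j] ** 2 for j in range(1, q + 1)]
                + [(y1 * y2)[q - j:len(y1) - j] for j in range(1, q + 1)])
        X = np.column_stack([np.ones(len(dep))] + lags)
        fit = X @ np.linalg.lstsq(X, dep, rcond=None)[0]
        rss = np.sum((dep - fit) ** 2)
        tss = np.sum((dep - dep.mean()) ** 2)
        return len(dep) * (1 - rss / tss)

    values = np.array([tr2(d) for d in deltas])
    i = int(np.argmin(values))
    df = 3 * q - 1                  # lagged regressors minus (k - 1), here k = 2
    return deltas[i], values[i], stats.chi2.sf(values[i], df=df)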

4. Statistical properties
In this section, we shall summarize the main results about the statistical properties of GARCH models and give appropriate references to the literature.

4.1. Moments

Bollerslev (1986) has shown that under conditional normality, the GARCH process (2.2) is wide-sense stationary with Eyt = 0, var(yt) = α0[1 − α(1) − β(1)]⁻¹ and cov(yt, ys) = 0 for t ≠ s if and only if α(1) + β(1) < 1. For the GARCH(1,1) model given in (2.2), a necessary and sufficient condition for the existence of the 2r-th moment is Σ_{j=0}^{r} (r choose j) aj α1^j β1^{r−j} < 1, where a0 = 1 and aj = Π_{i=1}^{j} (2i − 1), j = 1, 2, .... Bollerslev (1986) also provides a recursive formula for even moments of yt when p = q = 1. The fourth moment of a conditionally normal GARCH(1,1) variable, if it exists, is Eyt⁴ = 3(Eyt²)²[1 − (β1 + α1)²]/[1 − (β1 + α1)² − 2α1²]. As a result of the symmetry of the normal distribution, odd moments are zero if they exist. These results extend results for the ARCH(q) process given in Engle (1982). The condition given above is sufficient for strict stationarity but not necessary.
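The moment formulas above translate directly into a small calculator; the sketch below assumes conditional normality, as in Bollerslev (1986), and returns the unconditional variance and kurtosis.

import numpy as np

def garch11_moments(alpha0, alpha1, beta1):
    """Unconditional variance and kurtosis of a conditionally normal
    GARCH(1,1) process, using the formulas quoted above."""
    if alpha1 + beta1 >= 1:
        raise ValueError("alpha1 + beta1 must be < 1 for a finite variance")
    var = alpha0 / (1 - alpha1 - beta1)
    s = (alpha1 + beta1) ** 2
    if s + 2 * alpha1 ** 2 >= 1:
        return var, np.inf          # fourth moment does not exist
    ey4 = 3 * var ** 2 * (1 - s) / (1 - s - 2 * alpha1 ** 2)
    return var, ey4 / var ** 2      # kurtosis E y^4 / (E y^2)^2

print(garch11_moments(0.1, 0.1, 0.8))   # variance 1.0, kurtosis above 3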


As shown in Krengel (1985), strict stationarity of a vector ARCH process yt with Ωt ≡ Q(yt−1, yt−2, ...) is equivalent to the conditions that Ωt is measurable and trace(ΩtΩ′t) < ∞ a.s. [see also Bollerslev et al. (1994)]. Moment boundedness, i.e. E[trace(ΩtΩ′t)^r] being finite for some r > 0, implies trace(ΩtΩ′t) < ∞ a.s. Nelson (1990a) has shown that for the GARCH(1,1) model (2.2), yt is strictly stationary if and only if E[ln(β1 + α1εt²)] < 0, with εt being i.i.d. (not necessarily conditionally normal) and yt nondegenerate. This requirement is much weaker than α1 + β1 < 1. He also has shown that the IGARCH(1,1) model without drift converges almost surely to zero, while in the presence of a positive drift it is strictly stationary and ergodic. Extensions to general univariate GARCH(p, q) processes have been obtained by Bougerol and Picard (1992).
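Nelson's strict-stationarity criterion is easy to check numerically; the sketch below estimates E[ln(β1 + α1εt²)] by Monte Carlo under the assumption of standard normal innovations.

import numpy as np

def nelson_strict_stationarity(alpha1, beta1, n=1_000_000, seed=0):
    """Monte Carlo estimate of E[ln(beta1 + alpha1 * eps^2)] for i.i.d.
    standard normal eps; a negative value indicates strict stationarity
    of the GARCH(1,1) process by Nelson's (1990a) criterion."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    return np.mean(np.log(beta1 + alpha1 * eps ** 2))

# IGARCH(1,1): alpha1 + beta1 = 1, yet the expectation is negative, so the
# process is strictly stationary even though its variance is infinite.
print(nelson_strict_stationarity(0.3, 0.7))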

4.2. GARCH and continuous time models

GARCH models are nonlinear stochastic difference equations which can be estimated more easily than the stochastic differential equations used in the theoretical finance literature to model time-varying volatility. In practice, observations are usually recorded at discrete points in time, so that a discrete time model or a discrete time approximation to a continuous model will have to be used in statistical inference. Nelson (1990b) derives conditions for the convergence of stochastic difference equations, among which are ARCH processes, to stochastic differential equations as the length of the interval between observations, h, goes to zero. He applies these results to the GARCH(1,1) and the EGARCH model. Nelson (1992) investigates the properties of estimates of the conditional covariance matrix generated by a misspecified ARCH model. When a diffusion process is observed at discrete time intervals of length h, the difference between an estimate of its conditional instantaneous covariance matrix based on a GARCH(1,1) model or on an EGARCH model and the true value converges to zero in probability as h → 0. The required regularity conditions are that the distribution does not have fat tails and that the conditional covariance matrix moves smoothly over time. Using high-frequency data, misspecified ARCH models can therefore yield accurate estimates of volatility. In a way, the GARCH model, which averages squared values of variables, can be interpreted as a nonparametric estimate of the conditional variance at time t. Discrete time models can also be approximated by continuous time diffusion models. Different ARCH models will in general have different diffusion limits. As shown by Nelson (1990b), the continuous limit may yield convenient approximations for forecast and other moments when a discrete time model leads to intractable distributions. Nelson and Foster (1994) examine the issue of selecting an ARCH process to consistently and efficiently estimate the conditional variance of the diffusion process generating the data. They obtain the approximate distribution of the measurement error resulting from the use of an approximate ARCH filter. Their result allows one to compare the efficiency of various ARCH filters and to characterize asymptotically optimal ARCH conditional variance estimates. They derive optimal ARCH filters for three diffusion models and examine the filtering properties of several GARCH models.


For instance, if the data generating process is given by the diffusion equations (3.4) with independent Brownian motions (ρ = 0) and δ = 1, the asymptotically optimal filter for σt² sets the drift for yt equal to μ and the conditional variance equal to
σ²t+h = ωh + (1 − φh − θh^{1/2})σ²t + θh^{1/2} ε²y,t+h        (4.1)

with εy,t+h = h^{−1/2}[yt+h − yt − Et(yt+h − yt)], ω = mφ and θ determined by the parameters of the variance diffusion. The asymptotically optimal filter for (3.4) with independent Brownian motions therefore is the GARCH(1,1) model. When the Brownian motions Wy and Wσ are correlated, the GARCH(1,1) model (4.1) is no longer optimal. Nelson and Foster (1994) show that the nonlinear asymmetric GARCH model proposed by Engle and Ng (1993) fulfills the optimality conditions in this case. Nelson and Foster (1994) also study the properties of various ARCH filters when the data are generated by a discrete time near-diffusion process. Their findings have important implications for the choice of a functional form for the ARCH filter in empirical research. The use of continuous record asymptotics has greatly enhanced our understanding of the relationship between continuous time stochastic differential equations and discrete time ARCH models as the sampling frequency increases. Similarly, issues of temporal aggregation play an important role in modeling time-varying volatilities, in particular when an investigator has the choice between using data observed with a high frequency or using observations sampled less frequently. More efficient parameter estimates may be obtained from the high frequency data. On other occasions, an investigator may be interested in the parameters of the high frequency model while only low frequency observations are available. The temporal aggregation problem has been addressed by Diebold (1988), who has shown that the conditional heteroskedasticity disappears in the limit as the sampling frequency decreases and that in the case of flow variables the marginal distribution of the low frequency observations converges to the normal distribution. Drost and Nijman (1993) study the question whether the class of GARCH processes is closed under temporal aggregation when either stock or flow variables are modeled. The question can be answered if some qualifications are made. Three definitions of GARCH are adopted. The sequence of variables yt in (2.2) is defined to be generated by a strong GARCH process if α0, αi, i = 1, 2, ..., q, and βi, i = 1, 2, ..., p, can be chosen such that εt = yt ht^{−1/2} is i.i.d. with mean zero and variance 1. The sequence yt is said to be semi-strong GARCH if E[yt | yt−1, yt−2, ...] = 0 and E[yt² | yt−1, yt−2, ...] = ht, whereas it is weak GARCH(p, q) if P[yt | yt−1, yt−2, ...] = 0 and P[yt² | yt−1, yt−2, ...] = ht, where P denotes the best linear predictor in terms of a constant, yt−1, yt−2, ..., yt−1², yt−2², .... The main finding of Drost and Nijman (1993) is that the class of symmetric weak GARCH processes for either stock or flow variables is closed under temporal aggregation.


This means that if the high frequency process is symmetric (weak) GARCH, the low frequency process will also be symmetric weak GARCH. The parameters of the conditional variance of the low frequency process depend upon the mean, variance and kurtosis of the corresponding high frequency process. The conditional heteroskedasticity disappears as the sampling frequency decreases for GARCH processes with Σ_{i=1}^{q} αi + Σ_{i=1}^{p} βi < 1. The class of strong or semi-strong GARCH processes is generally not closed under temporal aggregation, suggesting that strong or semi-strong GARCH processes will often be approximations only to the data generating process if the observation frequency does not exactly correspond with the frequency of the data generating process. In a companion paper, Drost and Werker (1995) study the properties of a continuous time GARCH process, i.e. a process of which the increments Xt+h − Xt, t ∈ hN, are weak GARCH for each fixed time interval h > 0. Obviously, in the light of the results by Drost and Nijman (1993), a continuous time GARCH process cannot be strong or semi-strong GARCH, as the classes of these processes are not closed under temporal aggregation. The assumption of an underlying continuous time GARCH process leads to a kurtosis in excess of three for the associated discrete GARCH models, implying thick tails. Drost and Werker (1995) show how the parameters of the continuous time diffusion process can be identified from the discrete time GARCH parameters. The relations between the parameters of the continuous and discrete time models can be used to estimate the diffusion model from discrete time observations in a fairly straightforward way. Nijman and Sentana (1993) complement the results of Drost and Nijman (1993) by showing that contemporaneous aggregation of independent univariate GARCH processes yields a weak GARCH process. They then generalize this finding by showing that a linear combination of variables generated by a multivariate GARCH process will also be weak GARCH. The marginal processes of multivariate GARCH models will be weak GARCH as well. Finally, from simulation experiments the authors conclude that in many instances, estimators which are ML under the assumption that the process is strong GARCH with conditional normal distribution converge to values close to the weak GARCH parameters as the sample size increases. The findings on temporal and contemporaneous aggregation of GARCH processes indicate that linear transformations of GARCH processes are generally only weak GARCH.
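The aggregation results can be illustrated by simulation. The sketch below does not reproduce the Drost-Nijman parameter mapping; it simply shows the first-order autocorrelation of squared aggregated (flow) returns shrinking as the aggregation level grows when α1 + β1 < 1.

import numpy as np

def simulate_garch11(T, alpha0, alpha1, beta1, seed=0):
    """Simulate a strong GARCH(1,1) process with standard normal innovations."""
    rng = np.random.default_rng(seed)
    y = np.empty(T)
    h = alpha0 / (1 - alpha1 - beta1)        # start at the unconditional variance
    for t in range(T):
        y[t] = np.sqrt(h) * rng.standard_normal()
        h = alpha0 + alpha1 * y[t] ** 2 + beta1 * h
    return y

def acf_squares(x, lag=1):
    v = x ** 2 - np.mean(x ** 2)
    return np.sum(v[lag:] * v[:-lag]) / np.sum(v ** 2)

y = simulate_garch11(200_000, 0.05, 0.05, 0.90)
for m in (1, 5, 20):                         # aggregate flows over m periods
    ym = y[: len(y) // m * m].reshape(-1, m).sum(axis=1)
    print(m, round(acf_squares(ym), 3))      # ARCH effect weakens as m grows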

4.3. Forecasting volatility


Time series models are often built to generate out-of-sample forecasts. The issue of forecasting in models with time-dependent conditional heteroskedasticity has been investigated by several authors. Engle and Kraft (1983) and Engle and Bollerslev (1986) obtain expressions for the multi-step forecast error variance for time series models with ARCH and GARCH errors respectively.


Bollerslev (1986) and Granger, White and Kamstra (1989) are concerned with the construction of one-step-ahead forecast intervals with time-varying variances. Baillie and Bollerslev (1992) consider a single equation regression model with ARMA-GARCH disturbances, for which they derive the minimum MSE forecast. They also derive the moments of the forecast error distribution for the dynamic model with GARCH(1,1) disturbances. These moments are used in the construction of forecast intervals using the Cornish-Fisher asymptotic expansion. Geweke (1989) obtains the multi-step ahead forecast error density for linear models with ARCH disturbances by numerical integration within a Bayesian context. Nelson and Foster (1995) derive conditions under which, for data observed at high frequency, a misspecified ARCH model performs well in forecasting a time series process and its volatility. In line with the conditions for successful filtering obtained by Nelson and Foster (1994), the basic requirement is that the ARCH model correctly specifies the functional form of the first two conditional moments of all state variables. To illustrate the construction of estimates of the forecast error variance, consider a stationary AR(1) process
yt = φyt−1 + ut ,        (4.2)

where ut = εt ht^{1/2} is a GARCH(1,1) process as in (2.2). The minimum MSE forecast of yt+s at period t is Et(yt+s) = φ^s yt. The forecast error wts = yt+s − φ^s yt can be expressed as wts = ut+s + φut+s−1 + ... + φ^{s−1}ut+1. Its conditional variance at time t,
var(wts) = Σ_{i=0}^{s−1} φ^{2i} Et(u²t+s−i) ,   s > 0 ,        (4.3)

can be computed recursively. The GARCH(1,1) process for ut leads to an ARMA representation for ut² [see Bollerslev (1986)]:

ut² = α0 + (α1 + β1)u²t−1 − β1vt−1 + vt ,        (4.4)

with vt = ut² − ht. The expectations on the r.h.s. of (4.3) can be readily obtained from expression (4.4):
Et(ht+s) = Et(u²t+s) = α0 + (α1 + β1)Et(u²t+s−1) ,   s > 1 ,        (4.5)

as shown by Engle and Bollerslev (1986). As the forecast horizon increases, the optimal forecast converges monotonically to the unconditional variance α0/(1 − α1 − β1). For the IGARCH(1,1) model, shocks to the conditional variance are persistent and Et(ht+s) = α0(s − 1) + ht. The expression (4.5) can be used as a forecast of future volatility. Baillie and Bollerslev (1992) derive an expression for the conditional MSE of Et(ht+s) as a forecast of the conditional variance at period t + s.
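The recursions (4.3)-(4.5) are straightforward to implement. The sketch below assumes a GARCH(1,1) for ut and an AR(1) conditional mean, with purely illustrative parameter values.

import numpy as np

def garch11_variance_forecasts(alpha0, alpha1, beta1, h_t, u_t, S):
    """Multi-step conditional variance forecasts E_t(h_{t+s}), s = 1..S,
    from a GARCH(1,1) model, using the recursion in (4.5)."""
    f = np.empty(S)
    f[0] = alpha0 + alpha1 * u_t ** 2 + beta1 * h_t   # E_t(h_{t+1}) = h_{t+1}
    for s in range(1, S):
        f[s] = alpha0 + (alpha1 + beta1) * f[s - 1]
    return f

def ar1_forecast_error_variance(phi, var_forecasts):
    """Conditional variance of the s-step-ahead AR(1) forecast error (4.3):
    sum_{i=0}^{s-1} phi^(2i) * E_t(u^2_{t+s-i})."""
    S = len(var_forecasts)
    return np.array([sum(phi ** (2 * i) * var_forecasts[s - 1 - i]
                         for i in range(s)) for s in range(1, S + 1)])

ev = garch11_variance_forecasts(0.05, 0.05, 0.90, h_t=1.2, u_t=0.8, S=10)
print(ar1_forecast_error_variance(0.9, ev))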

5. Conclusions


In this paper, we have surveyed the literature on modeling time-varying volatility using GARCH processes. In reviewing the vast number of contributions, we have put most emphasis on recent developments. In less than fifteen years since the path-breaking publication of Engle (1982), much progress has been made in understanding GARCH models and in applying them to economic time series. This progress has drastically changed the way in which empirical time series research is carried out. At the same time, statistical properties of time series, in particular financial time series, which were not accounted for by existing models have led to new developments in the field of volatility modeling. The finding of skewness and of skewed correlations, defined as [(Σt yt²yt+k)/(Tσ̂y³)], fostered the development of asymmetric GARCH models. The presence of excess kurtosis in GARCH models with conditionally normally distributed innovations has led to the use of Student-t GARCH models and GARCH-jump models. Persistence in conditional variances was modeled using variance component models with a stochastic trend component. The finding of time-variation in conditional covariances and correlations resulted in the development of multivariate GARCH and factor-GARCH models. Factor-GARCH models have several attractive features. First, they can be easily interpreted in terms of economic theory (factor models like the arbitrage pricing theory have been used extensively in finance). Second, they allow for a parsimonious representation of time-varying variances and covariances for a high dimensional vector of variables. Third, they can account for both observed and unobserved factors. Fourth, they have interesting implications for common features of the variables. These common features can be tested in a straightforward way. Fifth, they have appeared to fit well in several instances. As has become apparent in Section 2, the functional form of time-varying volatility has attracted a lot of attention from researchers, to an extent where one wonders whether the returns from designing new GARCH specifications are still positive. While some specifications are close if not perfect substitutes for others, the results by Nelson and Foster on the use of GARCH models as filters to estimate the conditional variance of an underlying diffusion model put the issue of choosing a functional form for the GARCH model in a new perspective. For a given diffusion process, some GARCH model will be an optimal (efficient) filter whereas others with similar properties might not be optimal. The research by Nelson and Foster (1994) suggests that prior knowledge about the form of the underlying diffusion process will be useful when choosing the functional form for the GARCH model. As shown by Anderson (1992, 1994), GARCH processes belong to the class of deterministic, conditionally heteroskedastic volatility processes. The ease of evaluating the GARCH likelihood function and the ability of the GARCH specification to accommodate time-varying volatility, in particular to yield a flexible, parsimonious representation of the correlation found for the squared values of many series (comparable to the parsimonious representation of conditional means using ARMA schemes), has led to the widespread use of GARCH models.


tional means using A R M A schemes) has led to the widespread use of G A R C H models. The history of the stochastic volatility model is brief. This model has been put forward as a parsimoniously parameterized alternative to G A R C H models. While one of its attractive features is the low number of parameters needed to fit the time-variation of volatility of many time series, likelihood-based inference of stochastic volatility models requires numerical integration or the use of the Kalman filter. As mentioned in Section 3, many of these problems have by now been resolved. The statistical properties of G A R C H models and stochastic volatility models differ. Comparisons of these models [see for instance Danielson (1994), Hsieh (1991), Jacquier et al. (1995) and Ruiz (1993)] on the basis of financial time series led to the conclusion that these models put different weights on various moments functions. The choice among these models will very often be an empirical question. In other instances, a G A R C H model will be preferred because it yields an optimal filter of the variance of the underlying diffusion model. Factor-GARCH models with unobserved factors will lead to stochastic volatility components when one has to condition on the latent factors. The borders between the two classes of volatility models are expected to lose sharpness. Results on temporal aggregation of G A R C H processes indicate that weak G A R C H is the most common case. For reasons of aggregation, models relying on strong G A R C H are at best approximations to the data generating process, a situation in which a pragmatic view of using data information to select the model might be the most appropriate. Topics for future research are improving our understanding and the modeling of relationships between volatilities of different series and markets. Multivariate G A R C H , factor-GARCH and stochastic volatility models will be used and extended. Questions regarding the nature and the transmission of persistence in volatility from one series to another, the transmission of persistence in volatility into the conditional expected return will have to receive more attention in the future. Finally, statistical methods for testing and estimating volatility models and for forecasting volatility will be on the research agenda for a while. In particular, nonparametric and semiparametric methods appear to open up new perspectives to modeling time-variation in conditional distributions of economic time series.

References
Anderson, T. G. (1992). Volatility. Department of Finance, Working Paper No. 144, Northwestern University.
Anderson, T. (1994). Stochastic autoregressive volatility: A framework for volatility modeling. Math. Finance 4, 75-102.
Baillie, R. T. and T. Bollerslev (1990). A multivariate generalized ARCH approach to modeling risk premia in forward foreign exchange rate markets. J. Internat. Money Finance 9, 309-324.
Baillie, R. T. and T. Bollerslev (1992). Prediction in dynamic models with time-dependent conditional variances. J. Econometrics 52, 91-113.


Baillie, R. T., T. Bollerslev and H. O. Mikkelsen (1993). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Michigan State University, Working Paper.
Baillie, R. T. (1994). Long memory processes and fractional integration in econometrics. Michigan State University, Working Paper.
Ball, C. A. and A. Roma (1993). A jump diffusion model for the European Monetary System. J. Internat. Money Finance 12, 475-492.
Ball, C. A. and W. N. Torous (1985). On jumps in common stock prices and their impact on call option pricing. J. Finance 40, 155-173.
Bera, A. K. and S. Lee (1990). On the formulation of a general structure for conditional heteroskedasticity. University of Illinois at Urbana-Champaign, Working Paper.
Bera, A. K., S. Lee and M. L. Higgins (1992). Interaction between autocorrelation and conditional heteroskedasticity: A random coefficient approach. J. Business Econom. Statist. 10, 133-142.
Bera, A. K. and S. Lee (1993). Information matrix test, parameter heterogeneity and ARCH. Rev. Econom. Stud. 60, 229-240.
Bera, A. K. and M. L. Higgins (1995). On ARCH models: Properties, estimation and testing. In: Oxley, L., D. A. R. George, C. J. Roberts and S. Sayer, eds., Surveys in Econometrics, Oxford, Basil Blackwell, 215-272.
Black, F. (1976). Studies in stock price volatility changes. Proc. Amer. Statist. Assoc., Business and Economic Statistics Section, 177-181.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31, 307-327.
Bollerslev, T., R. F. Engle and J. M. Wooldridge (1988). A capital asset pricing model with time varying covariances. J. Politic. Econom. 96, 116-131.
Bollerslev, T., R. Y. Chou and K. F. Kroner (1992). ARCH modeling in finance: A review of the theory and empirical evidence. J. Econometrics 52, 5-59.
Bollerslev, T. and J. M. Wooldridge (1992). Quasi maximum likelihood estimation and inference in dynamic models with time varying covariances. Econometric Rev. 11, 143-172.
Bollerslev, T. and I. Domowitz (1993). Trading patterns and the behavior of prices in the interbank foreign exchange market. J. Finance, to appear.
Bollerslev, T. and R. F. Engle (1993). Common persistence in conditional variances. Econometrica 61, 166-187.
Bollerslev, T. and H. O. Mikkelsen (1993). Modeling and pricing long-memory in stock market volatility. Kellogg School of Management, Northwestern University, Working Paper No. 134.
Bollerslev, T., R. F. Engle and D. B. Nelson (1994). ARCH models. Northwestern University, Working Paper, prepared for The Handbook of Econometrics, Vol. 4.
Bougerol, Ph. and N. Picard (1992). Stationarity of GARCH processes and of some nonnegative time series. J. Econometrics 52, 115-128.
Brock, A. W., W. D. Dechert and J. A. Scheinkman (1987). A test for independence based on correlation dimension. Manuscript, Department of Economics, University of Wisconsin, Madison.
Brock, A. W., D. A. Hsieh and B. LeBaron (1991). Nonlinear Dynamics, Chaos and Instability: Statistical Theory and Economic Evidence. MIT Press, Cambridge, MA.
Cai, J. (1994). A Markov model of switching-regime ARCH. J. Business Econom. Statist. 12, 309-316.
Chou, R. Y. (1988). Volatility persistence and stock valuations: Some empirical evidence using GARCH. J. Appl. Econometrics 3, 279-294.
Crouhy, M. and C. M. Rockinger (1994). Volatility clustering, asymmetry and hysteresis in stock returns: International evidence. Paris, HEC-School of Management, Working Paper.
Crowder, M. J. (1976). Maximum likelihood estimation with dependent observations. J. Roy. Statist. Soc. Ser. B 38, 45-53.
Danielson, J. (1994). Stochastic volatility in asset prices: Estimation with simulated maximum likelihood. J. Econometrics 64, 375-400.
Davidian, M. and R. J. Carroll (1987). Variance function estimation. J. Amer. Statist. Assoc. 82, 1079-1091.


Demos, A. and E. Sentana (1991). Testing for GARCH effects: A one-sided approach. London School of Economics, Working Paper.
De Vries, C. G. (1991). On the relation between GARCH and stable processes. J. Econometrics 48, 313-324.
Diebold, F. X. (1987). Testing for correlation in the presence of ARCH. Proceedings from the ASA Business and Economic Statistics Section, 323-328.
Diebold, F. X. (1988). Empirical Modeling of Exchange Rates. Berlin, Springer-Verlag.
Diebold, F. X. and M. Nerlove (1989). The dynamics of exchange rate volatility: A multivariate latent factor ARCH model. J. Appl. Econometrics 4, 1-21.
Diebold, F. X. and J. A. Lopez (1994). ARCH models. Paper prepared for Hoover, K., ed., Macroeconometrics: Developments, Tensions and Prospects.
Ding, Z., R. F. Engle and C. W. J. Granger (1993). A long memory property of stock market returns and a new model. J. Empirical Finance 1, 83-106.
Drost, F. C. and T. E. Nijman (1993). Temporal aggregation of GARCH processes. Econometrica 61, 909-927.
Drost, F. C. and B. J. M. Werker (1995). Closing the GARCH gap: Continuous time GARCH modeling. Tilburg University, paper to appear in J. Econometrics.
Engel, C. and J. D. Hamilton (1990). Long swings in the exchange rate: Are they in the data and do markets know it? Amer. Econom. Rev. 80, 689-713.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987-1008.
Engle, R. F. and D. F. Kraft (1983). Multiperiod forecast error variances of inflation estimated from ARCH models. In: Zellner, A., ed., Applied Time Series Analysis of Economic Data, Bureau of the Census, Washington D.C., 293-302.
Engle, R. F. and T. Bollerslev (1986). Modeling the persistence of conditional variances. Econometric Rev. 5, 1-50.
Engle, R. F., D. M. Lilien and R. P. Robins (1987). Estimating time varying risk premia in the term structure: The ARCH-M model. Econometrica 55, 391-407.
Engle, R. F. (1990). Discussion: Stock market volatility and the crash of 87. Rev. Financ. Stud. 3, 103-106.
Engle, R. F., V. K. Ng and M. Rothschild (1990). Asset pricing with a factor ARCH covariance structure: Empirical estimates for treasury bills. J. Econometrics 45, 213-238.
Engle, R. F. and G. Gonzalez-Rivera (1991). Semiparametric ARCH models. J. Business Econom. Statist. 9, 345-359.
Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. J. Finance 48, 1749-1778.
Engle, R. F. and G. G. J. Lee (1993). Long run volatility forecasting for individual stocks in a one factor model. Unpublished manuscript, Department of Economics, UCSD.
Engle, R. F. and S. Kozicki (1993). Testing for common features (with discussion). J. Business Econom. Statist. 11, 369-380.
Engle, R. F. and R. Susmel (1993). Common volatility and international equity markets. J. Business Econom. Statist. 11, 167-176.
Engle, R. F. and G. G. J. Lee (1994). Estimating diffusion models of stochastic volatility. Mimeo, University of California at San Diego.
Engle, R. F. and K. F. Kroner (1995). Multivariate simultaneous generalized ARCH. Econometric Theory 11, 122-150.
French, K. R., G. W. Schwert and R. F. Stambaugh (1987). Expected stock returns and volatility. J. Financ. Econom. 19, 3-30.
Gallant, A. R. (1981). On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form. J. Econometrics 15, 211-244.
Gallant, A. R. and G. Tauchen (1989). Seminonparametric estimation of conditionally constrained heterogeneous processes: Asset pricing applications. Econometrica 57, 1091-1120.


Gallant, A. R., D. Hsieh and G. Tauchen (1994). Estimation of stochastic volatility models with suggestive diagnostics. Duke University, Working Paper.
Geweke, J. (1989). Exact predictive densities for linear models with ARCH disturbances. J. Econometrics 40, 63-86.
Geweke, J. (1994). Bayesian comparison of econometric models. Federal Reserve Bank of Minneapolis, Working Paper.
Ghysels, E., A. C. Harvey and E. Renault (1995). Stochastic volatility. Prepared for Handbook of Statistics, Vol. 14.
Glosten, L. R., R. Jagannathan and D. Runkle (1993). Relationship between the expected value and the volatility of the nominal excess return on stocks. J. Finance 48, 1779-1801.
Gouriéroux, C. and A. Monfort (1992). Qualitative threshold ARCH models. J. Econometrics 52, 159-199.
Gouriéroux, C. (1992). Modèles ARCH et Applications Financières. Paris, Economica.
Gouriéroux, C., A. Monfort and E. Renault (1993). Indirect inference. J. Appl. Econometrics 8, S85-S118.
Granger, C. W. J., H. White and M. Kamstra (1989). Interval forecasting: An analysis based upon ARCH-quantile estimators. J. Econometrics 40, 87-96.
Hamilton, J. D. (1988). Rational-expectations econometric analysis of changes in regime: An investigation of the term structure of interest rates. J. Econom. Dynamic Control 12, 385-423.
Hamilton, J. D. (1989). Analysis of time series subject to changes in regime. J. Econometrics 64, 307-333.
Hamilton, J. D. and R. Susmel (1994). Autoregressive conditional heteroskedasticity and changes in regime. J. Econometrics 64, 307-333.
Harvey, A. C., E. Ruiz and E. Sentana (1992). Unobserved component time series models with ARCH disturbances. J. Econometrics 52, 129-158.
Hentschel, L. (1994). All in the family: Nesting symmetric and asymmetric GARCH models. Paper presented at the Econometric Society Winter Meeting, Washington D.C., to appear in J. Financ. Econom. 39, no. 1.
Higgins, M. L. and A. K. Bera (1992). A class of nonlinear ARCH models. Internat. Econom. Rev. 33, 137-158.
Hsieh, D. A. (1989). Modeling heteroskedasticity in daily foreign exchange rates. J. Business Econom. Statist. 7, 307-317.
Hsieh, D. (1991). Chaos and nonlinear dynamics: Applications to financial markets. J. Finance 46, 1839-1877.
Hull, J. and A. White (1987). The pricing of options on assets with stochastic volatilities. J. Finance 42, 281-300.
Jacquier, E., N. G. Polson and P. E. Rossi (1994). Bayesian analysis of stochastic volatility models. J. Business Econom. Statist. 12, 371-389.
Jorion, P. (1988). On jump processes in foreign exchange and stock markets. Rev. Finan. Stud. 1, 427-445.
Kim, S. and N. Sheppard (1994). Stochastic volatility: Likelihood inference and comparison with ARCH models. Mimeo, Nuffield College, Oxford.
King, M., E. Sentana and S. Wadhwani (1994). Volatility links between national stock markets. Econometrica 62, 901-933.
Kodde, D. A. and F. C. Palm (1986). Wald criteria for jointly testing equality and inequality restrictions. Econometrica 54, 1243-1248.
Krengel, U. (1985). Ergodic Theorems. Walter de Gruyter, Berlin.
Lee, J. H. H. (1991). A Lagrange multiplier test for GARCH models. Econom. Lett. 37, 265-271.
Lee, J. H. H. and M. L. King (1993). A locally most mean powerful based score test for ARCH and GARCH regression disturbances. J. Business Econom. Statist. 11, 17-27.
Lee, S. W. and B. E. Hansen (1994). Asymptotic theory for the GARCH(1,1) quasi-maximum likelihood estimator. Econometric Theory 10, 29-52.
Li, W. K. and T. K. Mak (1994). On the squared residual autocorrelations in non-linear time series with conditional heteroskedasticity. J. Time Series Analysis 15, 627-636.


Lin, W.-L. (1992). Alternative estimators for factor GARCH models - A Monte Carlo comparison. J. Appl. Econometrics 7, 259-279.
Linton, O. (1993). Adaptive estimation in ARCH models. Econometric Theory 9, 539-569.
Lumsdaine, R. L. (1992). Asymptotic properties of the quasi-maximum likelihood estimator in GARCH(1,1) and IGARCH(1,1) models. Unpublished manuscript, Department of Economics, Princeton University.
Lumsdaine, R. L. (1995). Finite-sample properties of the maximum likelihood estimator in GARCH(1,1) and IGARCH(1,1) models: A Monte Carlo investigation. J. Business Econom. Statist. 13, 1-10.
Melino, A. and S. Turnbull (1990). Pricing foreign currency options with stochastic volatility. J. Econometrics 45, 239-266.
Nelson, D. B. (1990a). Stationarity and persistence in the GARCH(1,1) model. Econometric Theory 6, 318-334.
Nelson, D. B. (1990b). ARCH models as diffusion approximations. J. Econometrics 45, 7-38.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347-370.
Nelson, D. B. (1992). Filtering and forecasting with misspecified ARCH models I. J. Econometrics 52, 61-90.
Nelson, D. B. and C. Q. Cao (1992). Inequality constraints in univariate GARCH models. J. Business Econom. Statist. 10, 229-235.
Nelson, D. B. and D. P. Foster (1994). Asymptotic filtering theory for univariate ARCH models. Econometrica 62, 1-41.
Nelson, D. B. and D. P. Foster (1995). Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model. J. Econometrics 67, 303-335.
Ng, V., R. F. Engle and M. Rothschild (1992). A multi-dynamic-factor model for stock returns. J. Econometrics 52, 245-266.
Nieuwland, F. G. M. C., W. F. C. Verschoor and C. C. P. Wolff (1991). EMS exchange rates. J. Internat. Financial Markets, Institutions and Money 2, 21-42.
Nijman, T. E. and F. C. Palm (1993). GARCH modelling of volatility: An introduction to theory and applications. In: De Zeeuw, A. J., ed., Advanced Lectures in Quantitative Economics II, London, Academic Press, 153-183.
Nijman, T. E. and E. Sentana (1993). Marginalization and contemporaneous aggregation in multivariate GARCH processes. Tilburg University, CentER, Discussion Paper No. 9312, to appear in J. Econometrics.
Pagan, A. R. and A. Ullah (1988). The econometric analysis of models with risk terms. J. Appl. Econometrics 3, 87-105.
Pagan, A. R. and G. W. Schwert (1990). Alternative models for conditional stock volatility. J. Econometrics 45, 267-290.
Pagan, A. R. and Y. S. Hong (1991). Nonparametric estimation and the risk premium. In: Barnett, W. A., J. Powell and G. Tauchen, eds., Nonparametric and Semiparametric Methods in Econometrics and Statistics, Cambridge University Press, Cambridge.
Pagan, A. R. (1995). The econometrics of financial markets. ANU and the University of Rochester, Working Paper, to appear in J. Empirical Finance.
Palm, F. C. and J. P. Urbain (1995). Common trends and transitory components of stock price volatility. University of Limburg, Working Paper.
Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. J. Business 53, 61-65.
Ruiz, E. (1993). Stochastic volatility versus autoregressive conditional heteroskedasticity. Universidad Carlos III de Madrid, Working Paper.
Robinson, P. M. (1991). Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. J. Econometrics 47, 67-84.
Schwert, G. W. (1989). Why does stock market volatility change over time? J. Finance 44, 1115-1153.


Sentana, E. (1991). Quadratic ARCH models: A potential re-interpretation of ARCH models. Unpublished manuscript, London School of Economics.
Sentana, E. (1992). Identification of multivariate conditionally heteroskedastic factor models. London School of Economics, Working Paper.
Taylor, S. (1986). Modeling Financial Time Series. J. Wiley & Sons, New York, NY.
Taylor, S. J. (1994). Modeling stochastic volatility: A review and comparative study. Math. Finance 4, 183-204.
Tsay, R. S. (1987). Conditional heteroskedastic time series models. J. Amer. Statist. Assoc. 82, 590-604.
Vlaar, P. J. G. and F. C. Palm (1993). The message in weekly exchange rates in the European Monetary System: Mean reversion, conditional heteroskedasticity and jumps. J. Business Econom. Statist. 11, 351-360.
Vlaar, P. J. G. and F. C. Palm (1994). Inflation differentials and excess returns in the European Monetary System. CEPR Working Paper Series of the Network in Financial Markets, London.
Weiss, A. A. (1986). Asymptotic theory for ARCH models: Estimation and testing. Econometric Theory 2, 107-131.
Zakoian, J. M. (1994). Threshold heteroskedastic models. J. Econom. Dynamic Control 18, 931-955.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved.


Forecast Evaluation and Combination*

Francis X. Diebold and Jose A. Lopez

It is obvious that forecasts are of great importance and widely used in economics and finance. Quite simply, good forecasts lead to good decisions. The importance of forecast evaluation and combination techniques follows immediately - forecast users naturally have a keen interest in monitoring and improving forecast performance. More generally, forecast evaluation figures prominently in many questions in empirical economics and finance, such as:
- Are expectations rational? (e.g., Keane and Runkle, 1990; Bonham and Cohen, 1995)
- Are financial markets efficient? (e.g., Fama, 1970, 1991)
- Do macroeconomic shocks cause agents to revise their forecasts at all horizons, or just at short- and medium-term horizons? (e.g., Campbell and Mankiw, 1987; Cochrane, 1988)
- Are observed asset returns "too volatile"? (e.g., Shiller, 1979; LeRoy and Porter, 1981)
- Are asset returns forecastable over long horizons? (e.g., Fama and French, 1988; Mark, 1995)
- Are forward exchange rates unbiased and/or accurate forecasts of future spot prices at various horizons? (e.g., Hansen and Hodrick, 1980)
- Are government budget projections systematically too optimistic, perhaps for strategic reasons? (e.g., Auerbach, 1994; Campbell and Ghysels, 1995)
- Are nominal interest rates good forecasts of future inflation? (e.g., Fama, 1975; Nelson and Schwert, 1977)

Here we provide a five-part selective account of forecast evaluation and combination methods. In the first, we discuss evaluation of a single forecast, and in particular, evaluation of whether and how it may be improved. In the second, we discuss the evaluation and comparison of the accuracy of competing forecasts. In the third, we discuss whether and how a set of forecasts may be combined to produce a superior composite forecast. In the fourth, we describe a number of

* We thank Clive Granger for useful comments, and we thank the National Science Foundation, the Sloan Foundation and the University of Pennsylvania Research Foundation for financial support.



forecast evaluation topics of particular relevance in economics and finance, including methods for evaluating direction-of-change forecasts, probability forecasts and volatility forecasts. In the fifth, we conclude. In treating the subject of forecast evaluation, a tradeoff emerges between generality and tedium. Thus, we focus for the most part on linear least-squares forecasts of univariate covariance stationary processes, or we assume normality so that linear projections and conditional expectations coincide. We leave it to the reader to flesh out the remainder. However, in certain cases of particular interest, we do focus explicitly on nonlinearities that produce divergence between the linear projection and the conditional mean, as well as on nonstationarities that require special attention.

1. Evaluating a single forecast

The properties of optimal forecasts are well known; forecast evaluation essentially amounts to checking those properties. First, we establish some notation and recall some familiar results. Denote the covariance stationary time series of interest by yt. Assuming that the only deterministic component is a possibly nonzero mean, μ, the Wold representation is yt = μ + εt + b1εt−1 + b2εt−2 + ..., where εt ~ WN(0, σ²), and WN denotes serially uncorrelated (but not necessarily Gaussian, and hence not necessarily independent) white noise. We assume invertibility throughout, so that an equivalent one-sided autoregressive representation exists. The k-step-ahead linear least-squares forecast is ŷt+k,t = μ + bkεt + bk+1εt−1 + ..., and the corresponding k-step-ahead forecast error is

et+k,t = yt+k − ŷt+k,t = εt+k + b1εt+k−1 + ... + bk−1εt+1 .        (1)

Finally, the k-step-ahead forecast error variance is

σk² ≡ var(et+k,t) = σ² Σ_{i=0}^{k−1} bi² ,   with b0 ≡ 1 .        (2)
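As an illustration of (2), the sketch below computes k-step-ahead forecast error variances from a vector of Wold (moving average) coefficients; the AR(1) example is purely illustrative.

import numpy as np

def k_step_error_variance(b, sigma2, k):
    """k-step-ahead forecast error variance from Wold (MA) coefficients,
    as in (2): sigma^2 * sum_{i=0}^{k-1} b_i^2, with b_0 = 1."""
    coeffs = np.r_[1.0, np.asarray(b, float)[: k - 1]]
    return sigma2 * np.sum(coeffs ** 2)

# An AR(1) with coefficient phi has Wold coefficients b_i = phi**i, so the
# error variance approaches the unconditional variance sigma2/(1 - phi^2).
phi, sigma2 = 0.8, 1.0
b = phi ** np.arange(1, 50)
print([round(k_step_error_variance(b, sigma2, k), 3) for k in (1, 2, 5, 20)])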

Four key properties of errors from optimal forecasts, which we discuss in greater detail below, follow immediately: (1) Optimal forecast errors have a zero mean (follows from (1)); (2) 1-step-ahead optimal forecast errors are white noise (special case of (1) corresponding to k = 1); (3) k-step-ahead optimal forecast errors are at most MA(k−1) (general case of (1)); (4) The k-step-ahead optimal forecast error variance is non-decreasing in k (follows from (2)). Before proceeding, we now describe some exact distribution-free nonparametric tests for whether an independently (but not necessarily identically) distributed series has a zero median. The tests are useful in evaluating the properties


of optimal forecast errors listed above, as well as other hypotheses that will concern us later. Many such tests exist; two of the most popular, which we use repeatedly, are the sign test and the Wilcoxon signed-rank test. Denote the series being examined by xt, and assume that T observations are available. The sign test proceeds under the null hypothesis that the observed series is independent with a zero median. 1 The intuition and construction of the test statistic are straightforward - under the null, the number of positive observations in a sample of size T has the binomial distribution with parameters T and 1/2. The test statistic is therefore simply
S = Σ_{t=1}^{T} I+(xt) ,

where I+(xt) = 1 if xt > 0, and 0 otherwise. In large samples, the studentized version of the statistic is standard normal,

(S − T/2) / √(T/4) ~ N(0, 1) .
Thus, significance may be assessed using standard tables of the binomial or normal distributions. Note that the sign test does not require distributional symmetry. The Wilcoxon signed-rank test, a related distribution-free procedure, does require distributional symmetry, but it can be more powerful than the sign test in that case. Apart from the additional assumption of symmetry, the null hypothesis is the same, and the test statistic is the sum of the ranks of the absolute values of the positive observations,
W ≡ Σ_{t=1}^{T} I+(xt) Rank(|xt|) ,

where the ranking is in increasing order (e.g., the largest absolute observation is assigned a rank of T, and so on). The intuition of the test is simple - if the underlying distribution is symmetric about zero, a "very large" (or "very small") sum of the ranks of the absolute values of the positive observations is "very unlikely." The exact finite-sample null distribution of the signed-rank statistic is free from nuisance parameters and invariant to the true underlying distribution, and it has been tabulated. Moreover, in large samples, the studentized version of the statistic is standard normal,

1 If the series is symmetrically distributed, then a zero median of course corresponds to a zero mean.



(W − T(T + 1)/4) / √[T(T + 1)(2T + 1)/24] ~ N(0, 1) .
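For concreteness, a minimal implementation of the sign and (studentized) signed-rank statistics defined above; the exact binomial p-value relies on scipy.stats.binomtest, which assumes a recent SciPy release.

import numpy as np
from scipy import stats

def sign_test(x):
    """Exact sign test for a zero median of an independent series."""
    x = np.asarray(x, float)
    S = int(np.sum(x > 0))
    T = int(np.sum(x != 0))                 # zeros are conventionally dropped
    return S, stats.binomtest(S, T, 0.5).pvalue

def signed_rank_test(x):
    """Large-sample (studentized) Wilcoxon signed-rank test for symmetry
    about zero, following the formulas quoted above."""
    x = np.asarray(x, float)
    T = len(x)
    W = np.sum((x > 0) * stats.rankdata(np.abs(x)))
    z = (W - T * (T + 1) / 4) / np.sqrt(T * (T + 1) * (2 * T + 1) / 24)
    return W, 2 * stats.norm.sf(abs(z))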

Testing properties of optimal forecasts

Given a track record of forecasts, ŷt+k,t, and corresponding realizations, yt+k, forecast users will naturally want to assess forecast performance. The properties of optimal forecasts, cataloged above, can readily be checked.
a. Optimal forecast errors have a zero mean

A variety of standard tests of this hypothesis can be performed, depending on the assumptions one is willing to maintain. For example, if et+k,t is Gaussian white noise (as might be the case for 1-step-ahead errors), then the standard t-test is the obvious choice because it is exact and uniformly most powerful. If the errors are non-Gaussian but remain independent and identically distributed (iid), then the t-test is still useful asymptotically. However, if more complicated dependence or heterogeneity structures are (or may be) operative, then alternative tests are required, such as those based on the generalized method of moments. It would be unfortunate if non-normality or richer dependence/heterogeneity structures mandated the use of asymptotic tests, because sometimes only short track records are available. Such is not the case, however, because exact distribution-free nonparametric tests are often applicable, as pointed out by Campbell and Ghysels (1995). Although the distribution-free tests do require independence (sign test) and independence and symmetry (signed-rank test), they do not require normality or identical distributions over time. Thus, the tests are automatically robust to a variety of forecast error distributions, and to heteroskedasticity of the independent but not identically distributed type. For k > 1, however, even optimal forecast errors are likely to display serial correlation, so the nonparametric tests must be modified. Under the assumption that the forecast errors are (k − 1)-dependent, each of the following k series of forecast errors will be free of serial correlation: {e1+k,1, e1+2k,1+k, e1+3k,1+2k, ...}, {e2+k,2, e2+2k,2+k, e2+3k,2+2k, ...}, {e3+k,3, e3+2k,3+k, e3+3k,3+2k, ...}, ..., {e2k,k, e3k,2k, e4k,3k, ...}. Thus, a Bonferroni bounds test (with size bounded above by α) is obtained by performing k tests, each of size α/k, on each of the k error series, and rejecting the null hypothesis if the null is rejected for any of the series. This procedure is conservative, even asymptotically. Alternatively, one could use just one of the k error series and perform an exact test at level α, at the cost of reduced power due to the discarded observations. In concluding this section, let us stress that the nonparametric distribution-free tests are neither unambiguously "better" nor "worse" than the more common tests; rather, they are useful in different situations and are therefore complementary. To their credit, they are often exact finite-sample tests with good finite-sample power, and they are insensitive to deviations from the standard assumptions of normality and homoskedasticity required to justify more standard tests in small samples.


Against them, however, is the fact that they require independence of the forecast errors, an assumption even stronger than conditional-mean independence, let alone linear-projection independence. Furthermore, although the nonparametric tests can be modified to allow for k-dependence, a possibly substantial price must be paid either in terms of inexact size or reduced power.
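A sketch of the Bonferroni bounds procedure just described, using the exact sign test on each of the k subseries; as before, the binomial p-value assumes a recent SciPy.

import numpy as np
from scipy import stats

def bonferroni_zero_median_test(errors, k, alpha=0.05):
    """Bonferroni bounds test for a zero median of k-step-ahead forecast
    errors: split the errors into k subseries that are serially independent
    under (k-1)-dependence and apply an exact sign test at level alpha/k."""
    e = np.asarray(errors, float)
    reject = False
    for j in range(k):
        sub = e[j::k]                               # every k-th error
        S = int(np.sum(sub > 0))
        n = int(np.sum(sub != 0))
        p = stats.binomtest(S, n, 0.5).pvalue
        reject = reject or (p < alpha / k)
    return reject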
b. 1-Step-ahead optimal forecast errors are white noise

More precisely, the errors from linear least squares forecasts are linear-projection independent, and the errors from least squares forecasts are conditional-mean independent. The errors never need be fully serially independent, because dependence can always enter through higher moments, as for example with the conditional-variance dependence of GARCH processes. Under various sets of maintained assumptions, standard asymptotic tests may be used to test the white noise hypothesis. For example, the sample autocorrelation and partial autocorrelation functions, together with Bartlett asymptotic standard errors, may be useful graphical diagnostics in that regard. Standard tests based on the serial correlation coefficient, as well as the Box-Pierce and related statistics, may be useful as well. Dufour (1981) presents adaptations of the sign and Wilcoxon signed-rank tests that yield exact tests for serial dependence in 1-step-ahead forecast errors, without requiring normality or identical forecast error distributions. Consider, for example, the null hypothesis that the forecast errors are independent and symmetrically distributed with zero median. Then median(et+1,t et+2,t+1) = 0; that is, the product of two symmetric independent random variables with zero median is itself symmetric with zero median. Under the alternative of positive serial dependence, median(et+1,t et+2,t+1) > 0, and under the alternative of negative serial dependence, median(et+1,t et+2,t+1) < 0. This suggests examining the cross-product series zt = et+1,t et+2,t+1 for symmetry about zero, the obvious test for which is the signed-rank test, WD = Σ_{t=1}^{T} I+(zt) Rank(|zt|). Note that the zt sequence will be serially dependent even if the et+1,t sequence is not, in apparent violation of the conditions required for validity of the signed-rank test (applied to zt). Hence the importance of Dufour's contribution - Dufour shows that the serial correlation is of no consequence and that the distribution of WD is the same as that of W.

c. k-Step-ahead optimal forecast errors are at most MA(k−1)

Cumby and Huizinga (1992) develop a useful asymptotic test for serial dependence of order greater than k − 1. The null hypothesis is that the et+k,t series is MA(q) (0 ≤ q ≤ k − 1) against the alternative hypothesis that at least one autocorrelation is nonzero at a lag greater than k − 1. Under the null, the sample autocorrelations of et+k,t, ρ̂ = [ρ̂q+1, ..., ρ̂q+s], are asymptotically distributed as √T ρ̂ ~ N(0, V).² Thus, the statistic T ρ̂′V̂⁻¹ρ̂ is asymptotically distributed as χ²s under the null, where V̂ is a consistent estimator of V.

2 s is a cutoff lag selected by the user.


Dufour's (1981) distribution-free nonparametric tests may also be adapted to provide a finite-sample bounds test for serial dependence of order greater than k − 1. As before, separate the forecast errors into k series, each of which is serially independent under the null of (k − 1)-dependence. Then, for each series, take zk,t ≡ et+k,t et+2k,t+k and reject at significance level bounded above by α if one or more of the subset test statistics rejects at the α/k level.

d. The k-step-ahead optimal forecast error variance is non-decreasing in k

The k-step-ahead forecast error variance, σk² = var(et+k,t) = σ² Σ_{i=0}^{k−1} bi², is non-decreasing in k. Thus, it is often useful simply to examine the sample k-step-ahead forecast error variances as a function of k, both to be sure the condition appears satisfied and to see the pattern with which the forecast error variance grows with k, which often conveys useful information.³ Formal inference may also be done, so long as one takes care to allow for dependence of the sample variances across horizons.
Assessing optimality with respect to an information set

The key property of optimal forecast errors, from which all others follow (including those cataloged above), is unforecastability on the basis of information available at the time the forecast was made. This is true regardless of whether linear-projection optimality or conditional-mean optimality is of interest, regardless of whether the relevant loss function is quadratic, and regardless of whether the series being forecast is stationary. Following Brown and Maital (1981), it is useful to distinguish between partial and full optimality. Partial optimality refers to unforecastability of forecast errors with respect to some subset, as opposed to all subsets, of available information, Ωt. Partial optimality, for example, characterizes a situation in which a forecast is optimal with respect to the information used to construct it, but the information used was not all that could have been used. Thus, each of a set of competing forecasts may have the partial optimality property if each is optimal with respect to its own information set. One may test partial optimality via regressions of the form et+k,t = α′xt + ut, where xt ⊂ Ωt. The particular case of testing partial optimality with respect to ŷt+k,t has received a good deal of attention, as in Mincer and Zarnowitz (1969). The relevant regression is et+k,t = α0 + α1ŷt+k,t + ut, or yt+k = β0 + β1ŷt+k,t + ut, where partial optimality corresponds to (α0, α1) = (0, 0) or (β0, β1) = (0, 1).⁴

3 Extensions of this idea to nonstationary long-memory environments are developed in Diebold and Lindner (1995).
4 In such regressions, the disturbance should be white noise for 1-step-ahead forecasts but may be serially correlated for multi-step-ahead forecasts.


One may also expand the regression to allow for various sorts of nonlinearity. For example, following Ramsey (1969), one may test whether all coefficients in the regression et+k,t = Σ_{j=0}^{J} αj ŷ^j_{t+k,t} + ut are zero. Full optimality, in contrast, requires the forecast error to be unforecastable on the basis of all information available when the forecast was made (that is, the entirety of Ωt). Conceptually, one could test full rationality via regressions of the form et+k,t = α′xt + ut. If α = 0 for all xt ⊂ Ωt, then the forecast is fully optimal. In practice, one can never test for full optimality, but rather only partial optimality with respect to increasing information sets. Distribution-free nonparametric methods may also be used to test optimality with respect to various information sets. The sign and signed-rank tests, for example, are readily adapted to test orthogonality between forecast errors and available information, as proposed by Campbell and Dufour (1991, 1995). If, for example, et+1,t is linear-projection independent of xt ∈ Ωt, then cov(et+1,t, xt) = 0. Thus, in the symmetric case, one may use the signed-rank test for whether E[zt] = E[et+1,t xt] = 0, and more generally, one may use the sign test for whether median(zt) = median(et+1,t xt) = 0.⁵ The relevant sign and signed-rank statistics are then S = Σ_{t=1}^{T} I+(zt) and W = Σ_{t=1}^{T} I+(zt) Rank(|zt|). Moreover, one may allow for nonlinear transformations of the elements of the information set, which is useful for assessing conditional-mean as opposed to simply linear-projection independence, by taking zt = et+1,t g(xt), where g(·) is a nonlinear function of interest. Finally, the tests can be generalized to allow for k-step-ahead forecast errors as before. Simply take zt = et+k,t g(xt), divide the zt series into the usual k subsets, and reject the orthogonality null at significance level bounded by α if any of the subset test statistics are significant at the α/k level.⁶
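A minimal sketch of the Mincer-Zarnowitz-type partial optimality regression discussed above, with a conventional Wald test of (β0, β1) = (0, 1); the interface is illustrative, and for multi-step errors the iid covariance estimator below would be replaced by a serial-correlation robust one.

import numpy as np
from scipy import stats

def mincer_zarnowitz(y, yhat):
    """Regression y_{t+k} = b0 + b1 * yhat_{t+k,t} + u; returns OLS estimates
    and a Wald statistic for (b0, b1) = (0, 1) with conventional iid
    standard errors."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    X = np.column_stack([np.ones_like(yhat), yhat])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    s2 = u @ u / (len(y) - 2)
    V = s2 * np.linalg.inv(X.T @ X)              # OLS covariance of b
    r = b - np.array([0.0, 1.0])                 # restriction (b0, b1) = (0, 1)
    wald = r @ np.linalg.solve(V, r)
    return b, wald, stats.chi2.sf(wald, df=2)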

2. Comparing the accuracy of multiple forecasts

Measures of forecast accuracy


In practice, it is unlikely that one will ever stumble upon a fully-optimal forecast; instead, situations often arise in which a number of forecasts (all of them suboptimal) are compared and possibly combined. The crucial object in measuring forecast accuracy is the loss function, L(yt+k, ŷt+k,t), often restricted to L(et+k,t), which charts the "loss," "cost" or "disutility" associated with various pairs of forecasts and realizations. In addition to the shape of the loss function, the forecast horizon (k) is also of crucial importance.

5 Again, it is not obvious that the conditions required for application of the sign or signed-rank test to zt are satisfied, but they are; see Campbell and Dufour (1995) for details.
6 Our discussion has implicitly assumed that both et+1,t and g(xt) are centered at zero. This will hold for et+1,t if the forecast is unbiased, but there is no reason why it should hold for g(xt). Thus, in general, the test is based on g(xt) − μt, where μt is a centering parameter such as the mean, median or trend of g(xt). See Campbell and Dufour (1995) for details.


Rankings of forecast accuracy may be very different across different loss functions and/or different horizons. This result has led some to argue the virtues of various "universally applicable" accuracy measures. Clements and Hendry (1993), for example, argue for an accuracy measure under which forecast rankings are invariant to certain transformations. Ultimately, however, the appropriate loss function depends on the situation at hand. As stressed by Diebold (1993) among many others, forecasts are usually constructed for use in particular decision environments; for example, policy decisions by government officials or trading decisions by market participants. Thus, the appropriate accuracy measure arises from the loss function faced by the forecast user. Economists, for example, may be interested in the profit streams (e.g., Leitch and Tanner, 1991, 1995; Engle et al., 1993) or utility streams (e.g., McCulloch and Rossi, 1990; West, Edison and Cho, 1993) flowing from various forecasts.

Nevertheless, let us discuss a few stylized statistical loss functions, because they are used widely and serve as popular benchmarks. Accuracy measures are usually defined on the forecast errors, e_{t+k,t} = y_{t+k} - \hat{y}_{t+k,t}, or percent errors, p_{t+k,t} = (y_{t+k} - \hat{y}_{t+k,t})/y_{t+k}. For example, the mean error, ME = \frac{1}{T}\sum_{t=1}^{T} e_{t+k,t}, and mean percent error, MPE = \frac{1}{T}\sum_{t=1}^{T} p_{t+k,t}, provide measures of bias, which is one component of accuracy. The most common overall accuracy measure, by far, is mean squared error, MSE = \frac{1}{T}\sum_{t=1}^{T} e_{t+k,t}^2, or mean squared percent error, MSPE = \frac{1}{T}\sum_{t=1}^{T} p_{t+k,t}^2. Often the square roots of these measures are used to preserve units, yielding the root mean squared error, RMSE = \sqrt{\frac{1}{T}\sum_{t=1}^{T} e_{t+k,t}^2}, and the root mean squared percent error, RMSPE = \sqrt{\frac{1}{T}\sum_{t=1}^{T} p_{t+k,t}^2}. Somewhat less popular, but nevertheless common, accuracy measures are mean absolute error, MAE = \frac{1}{T}\sum_{t=1}^{T} |e_{t+k,t}|, and mean absolute percent error, MAPE = \frac{1}{T}\sum_{t=1}^{T} |p_{t+k,t}|.

MSE admits an informative decomposition into the sum of the variance of the forecast error and its squared bias,

MSE = E[(y_{t+k} - \hat{y}_{t+k,t})^2] = var(y_{t+k} - \hat{y}_{t+k,t}) + (E[y_{t+k}] - E[\hat{y}_{t+k,t}])^2 ,

or equivalently

MSE = var(y_{t+k}) + var(\hat{y}_{t+k,t}) - 2 cov(y_{t+k}, \hat{y}_{t+k,t}) + (E[y_{t+k}] - E[\hat{y}_{t+k,t}])^2 .

This result makes clear that MSE depends only on the second moment structure of the joint distribution of the actual and forecasted series. Thus, as noted in Murphy and Winkler (1987, 1992), although MSE is a useful summary statistic for the joint distribution of y_{t+k} and \hat{y}_{t+k,t}, in general it contains substantially less information than the actual joint distribution itself. Other statistics highlighting different aspects of the joint distribution may therefore be useful as well. Ultimately, of course, one may want to focus directly on estimates of the joint distribution, which may be available if the sample size is large enough to permit relatively precise estimation.
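The following minimal sketch (our own illustration, in Python with numpy) computes the accuracy measures just defined from vectors of realizations and forecasts; the percent-error measures implicitly assume the realizations are nonzero.

```python
import numpy as np

def accuracy_measures(y, yhat):
    """Standard accuracy measures for a forecast series yhat of realizations y."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    e = y - yhat                 # forecast errors e_{t+k,t}
    p = e / y                    # percent errors (assumes y != 0)
    return {
        "ME":    e.mean(),                   # bias
        "MPE":   p.mean(),
        "MSE":   (e ** 2).mean(),
        "RMSE":  np.sqrt((e ** 2).mean()),
        "MSPE":  (p ** 2).mean(),
        "RMSPE": np.sqrt((p ** 2).mean()),
        "MAE":   np.abs(e).mean(),
        "MAPE":  np.abs(p).mean(),
    }
```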

Measuring forecastability
It is natural and informative to evaluate the accuracy of a forecast. We hasten to add, however, that actual and forecasted values may be dissimilar, even for very good forecasts. To take an extreme example, note that the linear least squares forecast for a zero-mean white noise process is simply zero - the paths of forecasts and realizations will look very different, yet there does not exist a better linear forecast under quadratic loss. This example highlights the inherent limits to forecastability, which depends on the process being forecast; some processes are inherently easy to forecast, while others are hard to forecast. In other words, sometimes the information on which the forecaster optimally conditions is very valuable, and sometimes it isn't. The issue of how to quantify forecastability arises at once. Granger and Newbold (1976) propose a natural definition of forecastability for covariance stationary series under squared-error loss, patterned after the familiar R^2 of linear regression,

G = \frac{var(\hat{y}_{t+1,t})}{var(y_{t+1})} = 1 - \frac{var(e_{t+1,t})}{var(y_{t+1})} ,

where both the forecast and forecast error refer to the optimal (that is, linear least squares or conditional mean) forecast. In closing this section, we note that although measures of forecastability are useful constructs, they are driven by the population properties of processes and their optimal forecasts, so they don't help one to evaluate the "goodness" of an actual reported forecast, which may be far from optimal. For example, if the variance of \hat{y}_{t+1,t} is not much lower than the variance of the covariance stationary series y_{t+1}, it could be that either the forecast is poor, the series is inherently almost unforecastable, or both.
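As a hedged numerical illustration of the forecastability measure G, the sketch below (ours) simulates an AR(1) process, for which the population value of G under the optimal one-step forecast is \phi^2; the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1): y_t = phi * y_{t-1} + eps_t.  The optimal one-step
# forecast is phi * y_t, so in population G = phi**2.
phi, T = 0.8, 10_000
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + eps[t]

yhat = phi * y[:-1]              # optimal linear forecast of y[1:]
e = y[1:] - yhat                 # one-step forecast errors
G = 1 - e.var() / y[1:].var()
print(G, phi ** 2)               # sample estimate versus population value 0.64
```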

Statistical comparison of forecast accuracy 7


Once a loss function has been decided upon, it is often of interest to know which of the competing forecasts has smallest expected loss. Forecasts may of course be ranked according to average loss over the sample period, but one would like to have a measure of the sampling variability in such average losses. Alternatively, one would like to be able to test the hypothesis that the difference of expected losses between forecasts i and j is zero (i.e., E[L(y_{t+k}, \hat{y}^i_{t+k,t})] = E[L(y_{t+k}, \hat{y}^j_{t+k,t})]), against the alternative that one forecast is better.

7 This section draws heavily upon Diebold and Mariano (1995).


Stekler (1987) proposes a rank-based test of the hypothesis that each of a set of forecasts has equal expected loss.⁸ Given N competing forecasts, assign to each forecast at each time a rank according to its accuracy (the best forecast receives a rank of N, the second-best receives a rank of N - 1, and so forth). Then aggregate the period-by-period ranks for each forecast,

H_i = \sum_{t=1}^{T} Rank(L(y_{t+k}, \hat{y}^i_{t+k,t})) ,   i = 1, ..., N ,

and form the chi-squared goodness-of-fit test statistic,

H = \sum_{i=1}^{N} \frac{(H_i - T(N+1)/2)^2}{T(N+1)/2} .
Under the null, H ~ \chi^2_{N-1}. As described here, the test requires the rankings to be independent over space and time, but simple modifications along the lines of the Bonferroni bounds test may be made if the rankings are temporally (k - 1)-dependent. Moreover, exact versions of the test may be obtained by exploiting Fisher's randomization principle.⁹

One limitation of Stekler's rank-based approach is that information on the magnitude of differences in expected loss across forecasters is discarded. In many applications, one wants to know not only whether the difference of expected losses differs from zero (or the ratio differs from 1), but also by how much it differs. Effectively, one wants to know the sampling distribution of the sample mean loss differential (or of the individual sample mean losses), which in addition to being directly informative would enable Wald tests of the hypothesis that the expected loss differential is zero.

Diebold and Mariano (1995), building on earlier work by Granger and Newbold (1986) and Meese and Rogoff (1988), develop a test for a zero expected loss differential that allows for forecast errors that are nonzero mean, non-Gaussian, serially correlated and contemporaneously correlated. In general, the loss function is L(y_{t+k}, \hat{y}^i_{t+k,t}). Because in many applications the loss function will be a direct function of the forecast error, L(y_{t+k}, \hat{y}^i_{t+k,t}) = L(e^i_{t+k,t}), we write L(e^i_{t+k,t}) from this point on to economize on notation, while recognizing that certain loss functions (such as direction-of-change) don't collapse to the L(e^i_{t+k,t}) form.¹⁰ The null hypothesis of equal forecast accuracy for two forecasts is E[L(e^i_{t+k,t})] = E[L(e^j_{t+k,t})], or E[d_t] = 0, where d_t = L(e^i_{t+k,t}) - L(e^j_{t+k,t}) is the loss differential. If d_t is a covariance stationary, short-memory series, then standard results may be used to deduce the asymptotic distribution of the sample mean loss differential,

\sqrt{T}(\bar{d} - \mu) \stackrel{d}{\rightarrow} N(0, 2\pi f_d(0)) ,

8 Stekler uses RMSE, but other loss functions may be used.
9 See, for example, Bradley (1968), Chapter 4.
10 In such cases, the L(y_{t+k}, \hat{y}^i_{t+k,t}) form should be used.


where \bar{d} = \frac{1}{T}\sum_{t=1}^{T}[L(e^i_{t+k,t}) - L(e^j_{t+k,t})] is the sample mean loss differential, f_d(0) = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty}\gamma_d(\tau) is the spectral density of the loss differential at frequency zero, \gamma_d(\tau) = E[(d_t - \mu)(d_{t-\tau} - \mu)] is the autocovariance of the loss differential at displacement \tau, and \mu is the population mean loss differential. The formula for f_d(0) shows that the correction for serial correlation can be substantial, even if the loss differential is only weakly serially correlated, due to the cumulation of the autocovariance terms. In large samples, the obvious statistic for testing the null hypothesis of equal forecast accuracy is the standardized sample mean loss differential,

B = \frac{\frac{1}{T}\sum_{t=1}^{T}[L(e^i_{t+k,t}) - L(e^j_{t+k,t})]}{\sqrt{2\pi \hat{f}_d(0)/T}} ,

where \hat{f}_d(0) is a consistent estimate of f_d(0).

It is useful to have available exact finite-sample tests of forecast accuracy to complement the asymptotic tests. As usual, variants of the sign and signed-rank tests are applicable. When using the sign test, the null hypothesis is that the median of the loss differential is zero, median(L(e^i_{t+k,t}) - L(e^j_{t+k,t})) = 0. Note that the null of a zero median loss differential is not the same as the null of zero difference between median losses; that is, median(L(e^i_{t+k,t}) - L(e^j_{t+k,t})) \neq median(L(e^i_{t+k,t})) - median(L(e^j_{t+k,t})). For this reason, the null differs slightly in spirit from that associated with the asymptotic Diebold-Mariano test, but nevertheless, it has the intuitive and meaningful interpretation that

P(L(e^i_{t+k,t}) > L(e^j_{t+k,t})) = P(L(e^i_{t+k,t}) < L(e^j_{t+k,t})) .


When using the Wilcoxon signed-rank test, the null hypothesis is that the loss differential series is symmetric about a zero median (and hence mean), which corresponds precisely to the null of the asymptotic Diebold-Mariano test. Symmetry of the loss differential will obtain, for example, if the distributions of L(e^i_{t+k,t}) and L(e^j_{t+k,t}) are the same up to a location shift. Symmetry is ultimately an empirical matter and may be assessed using standard procedures. The construction and intuition of the distribution-free nonparametric test statistics are straightforward. The sign test statistic is S_B = \sum_{t=1}^{T} I_+(d_t), and the signed-rank test statistic is W_B = \sum_{t=1}^{T} I_+(d_t) Rank(|d_t|). Serial correlation may be handled as before via Bonferroni bounds. It is interesting to note that, in multistep forecast comparisons, forecast error serial correlation may be a "common feature" in the terminology of Engle and Kozicki (1993), because it is induced largely by the fact that the forecast horizon is longer than the interval at which the data are sampled and may therefore not be present in loss differentials even if present in the forecast errors themselves. This possibility can of course be checked empirically.

West (1994) takes an approach very much related to, but nevertheless different from, that of Diebold and Mariano. The main difference is that West assumes that forecasts are computed from an estimated regression model and explicitly accounts for the effects of parameter uncertainty within that framework. When the estimation sample is small, the tests can lead to different results. However, as the estimation period grows in length relative to the forecast period, the effects of parameter uncertainty vanish, and the Diebold-Mariano and West statistics are identical. West's approach is both more general and less general than the Diebold-Mariano approach. It is more general in that it corrects for nonstationarities induced by the updating of parameter estimates. It is less general in that those corrections are made within the confines of a more rigid framework than that of Diebold and Mariano, in whose framework no assumptions need be made about the often unknown or incompletely known models that underlie forecasts.

In closing this section, we note that it is sometimes informative to compare the accuracy of a forecast to that of a "naive" competitor. A simple and popular such comparison is achieved by Theil's (1961) U statistic, which is the ratio of the 1-step-ahead MSE for a given forecast relative to that of a random walk forecast \hat{y}_{t+1,t} = y_t; that is,
U = \frac{\sum_{t=1}^{T}(y_{t+1} - \hat{y}_{t+1,t})^2}{\sum_{t=1}^{T}(y_{t+1} - y_t)^2} .

Generalization to other loss functions and other horizons is immediate. The statistical significance of the MSE comparison underlying the U statistic may be ascertained using the methods just described. One must remember, of course, that the random walk is not necessarily a naive competitor, particularly for many economic and financial variables, so that values of the U statistic near one are not necessarily "bad." Several authors, including Armstrong and Fildes (1995), have advocated using the U statistic and close relatives for comparing the accuracy of various forecasting methods across series.
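To make the preceding comparison machinery concrete, here is a minimal sketch (ours, not the authors' code) of the Diebold-Mariano statistic under a user-supplied loss, together with Theil's U; a simple truncated uniform-weight estimator of 2\pi f_d(0) is used, though any consistent spectral estimator could be substituted.

```python
import numpy as np
from scipy.stats import norm

def dm_test(e1, e2, k=1, loss=lambda e: e ** 2):
    """Diebold-Mariano test of equal expected loss for two forecast-error series.

    A uniform-weight estimator with k-1 autocovariance terms approximates
    2*pi*f_d(0); it can occasionally turn negative in small samples, in which
    case another spectral estimator should be used instead.
    """
    d = loss(np.asarray(e1, float)) - loss(np.asarray(e2, float))   # loss differential
    T = len(d)
    dbar = d.mean()
    dc = d - dbar
    lrv = dc @ dc / T                           # gamma_d(0)
    for j in range(1, k):                       # add 2 * gamma_d(j), j = 1,...,k-1
        lrv += 2.0 * (dc[j:] @ dc[:-j]) / T
    stat = dbar / np.sqrt(lrv / T)
    return stat, 2 * norm.sf(abs(stat))         # asymptotic N(0,1) p-value

def theil_u(y, yhat):
    """Theil's U: 1-step MSE of yhat relative to the no-change (random walk) forecast.
    yhat[t] is the forecast of y[t] made at t-1; the first observation is dropped."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.sum((y[1:] - yhat[1:]) ** 2) / np.sum((y[1:] - y[:-1]) ** 2)
```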

3. Combining forecasts

In forecast accuracy comparison, one asks which forecast is best with respect to a particular loss function. Regardless of whether one forecast is "best," however, the question arises as to whether competing forecasts may be fruitfully combined - in similar fashion to the construction of an asset portfolio - to produce a composite forecast superior to all the original forecasts. Thus, forecast combination, although obviously related to forecast accuracy comparison, is logically distinct and of independent interest.

Forecast encompassing tests


Forecast encompassing tests enable one to determine whether a certain forecast incorporates (or encompasses) all the relevant information in competing forecasts. The idea dates at least to Nelson (1972) and Cooper and Nelson (1975), and was formalized and extended by Chong and Hendry (1986). For simplicity, let us focus on the case of two forecasts, \hat{y}^1_{t+k,t} and \hat{y}^2_{t+k,t}. Consider the regression
y_{t+k} = \beta_0 + \beta_1 \hat{y}^1_{t+k,t} + \beta_2 \hat{y}^2_{t+k,t} + \varepsilon_{t+k,t} .

If (\beta_0, \beta_1, \beta_2) = (0, 1, 0), one says that model 1 forecast-encompasses model 2, and if (\beta_0, \beta_1, \beta_2) = (0, 0, 1), then model 2 forecast-encompasses model 1. For any other (\beta_0, \beta_1, \beta_2) values, neither model encompasses the other, and both forecasts contain useful information about y_{t+k}. Under certain conditions, the encompassing hypotheses can be tested using standard methods.¹¹ Moreover, although it does not yet seem to have appeared in the forecasting literature, it would be straightforward to develop exact finite-sample tests (or bounds tests when k > 1) of the hypothesis using simple generalizations of the distribution-free tests discussed earlier. Fair and Shiller (1989, 1990) take a different but related approach based on the regression
(y_{t+k} - y_t) = \beta_0 + \beta_1(\hat{y}^1_{t+k,t} - y_t) + \beta_2(\hat{y}^2_{t+k,t} - y_t) + \varepsilon_{t+k,t} .

As before, forecast-encompassing corresponds to coefficient values of (0,1,0) or (0,0,1). Under the null of forecast encompassing, the Chong-Hendry and Fair-Shiller regressions are identical. When the variable being forecast is integrated, however, the Fair-Shiller framework may prove more convenient, because the specification in terms of changes facilitates the use of Gaussian asymptotic distribution theory.
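A minimal sketch of the Chong-Hendry encompassing regression, assuming only numpy; the function name and the use of plain OLS without HAC standard errors are our simplifications, so the output should be read as point estimates rather than a formal test.

```python
import numpy as np

def encompassing_regression(y, f1, f2):
    """OLS of realizations on two forecasts: y = b0 + b1*f1 + b2*f2 + u.

    Returns (b0, b1, b2); (0, 1, 0) corresponds to forecast 1 encompassing
    forecast 2, and (0, 0, 1) to the reverse.  For k-step-ahead forecasts
    (k > 1), serially correlated errors call for robust standard errors.
    """
    y = np.asarray(y, float)
    X = np.column_stack([np.ones_like(y),
                         np.asarray(f1, float),
                         np.asarray(f2, float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

The Fair-Shiller variant would simply difference y, f1 and f2 against the last observed value before running the same regression.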
Forecast combination

Failure of one model's forecasts to encompass other models' forecasts indicates that all the models examined are misspecified. It should come as no surprise that such situations are typical in practice, because all forecasting models are surely misspecified - they are intentional abstractions of a much more complex reality. What, then, is the role of forecast combination techniques? In a world in which information sets can be instantaneously and costlessly combined, there is no role; it is always optimal to combine information sets rather than forecasts. In the long run, the combination of information sets may sometimes be achieved by improved model specification. But in the short run - particularly when deadlines must be met and timely forecasts produced - pooling of information sets is typically either impossible or prohibitively costly. This simple insight motivates the pragmatic idea of forecast combination, in which forecasts rather than models are the basic object of analysis, due to an assumed inability to combine information sets. Thus, forecast combination can be viewed as a key link between the short-run, real-time forecast production process, and the longer-run, ongoing process of model development.

11 Note that MA(k - 1) serial correlation will typically be present in \varepsilon_{t+k,t} if k > 1.

Many combining methods have been proposed, and they fall roughly into two groups, "variance-covariance" methods and "regression-based" methods. Let us consider first the variance-covariance method due to Bates and Granger (1969). Suppose one has two unbiased forecasts from which a composite is formed as¹²
\hat{y}^c_{t+k,t} = \omega \hat{y}^1_{t+k,t} + (1 - \omega) \hat{y}^2_{t+k,t} .

Because the weights sum to unity, the composite forecast will necessarily be unbiased. Moreover, the combined forecast error will satisfy the same relation as the combined forecast; that is,
e^c_{t+k,t} = \omega e^1_{t+k,t} + (1 - \omega) e^2_{t+k,t} ,

with a variance \sigma^2_c = \omega^2 \sigma^2_{11} + (1 - \omega)^2 \sigma^2_{22} + 2\omega(1 - \omega)\sigma_{12}, where \sigma^2_{11} and \sigma^2_{22} are unconditional forecast error variances and \sigma_{12} is their covariance. The combining weight that minimizes the combined forecast error variance (and hence the combined forecast error MSE, by unbiasedness) is

\omega^* = \frac{\sigma^2_{22} - \sigma_{12}}{\sigma^2_{22} + \sigma^2_{11} - 2\sigma_{12}} .

Note that the optimal weight is determined by both the underlying variances and covariances. Moreover, it is straightforward to show that, except in the case where one forecast encompasses the other, the forecast error variance from the optimal composite is less than min(\sigma^2_{11}, \sigma^2_{22}). Thus, in population, one has nothing to lose by combining forecasts and potentially much to gain. In practice, one replaces the unknown variances and covariances that underlie the optimal combining weights with consistent estimates; that is, one estimates \omega^* by replacing \sigma_{ij} with \hat{\sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T} e^i_{t+k,t} e^j_{t+k,t}, yielding

\hat{\omega}^* = \frac{\hat{\sigma}_{22} - \hat{\sigma}_{12}}{\hat{\sigma}_{22} + \hat{\sigma}_{11} - 2\hat{\sigma}_{12}} .

In finite samples of the size typically available, sampling error contaminates the combining weight estimates, and the problem of sampling error is exacerbated by the collinearity that typically exists among primary forecasts. Thus, while one hopes to reduce out-of-sample forecast MSE by combining, there is no guarantee. In practice, however, it turns out that forecast combination techniques often perform very well, as documented in Clemen's (1989) review of the vast literature on forecast combination.

12 The generalization to the case of M > 2 competing unbiased forecasts is straightforward, as shown in Newbold and Granger (1974).

Now consider the "regression method" of forecast combination. The form of the Chong-Hendry and Fair-Shiller encompassing regressions immediately suggests combining forecasts by simply regressing realizations on forecasts. Granger and Ramanathan (1984) showed that the optimal variance-covariance combining weight vector has a regression interpretation as the coefficient vector of a linear projection of y_{t+k} onto the forecasts, subject to two constraints: the weights sum to unity, and no intercept is included. In practice, of course, one simply runs the regression on available data. In general, the regression method is simple and flexible. There are many variations and extensions, because any "regression tool" is potentially applicable. The key is to use generalizations with sound motivation. We shall give four examples: time-varying combining weights, dynamic combining regressions, Bayesian shrinkage of combining weights toward equality, and nonlinear combining regressions.
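The sketch below (our illustration) computes the estimated Bates-Granger weight from two forecast-error series and, alternatively, unrestricted combining weights from a regression of realizations on forecasts; imposing the sum-to-one, no-intercept restrictions would recover the variance-covariance weights, as noted above.

```python
import numpy as np

def bates_granger_weight(e1, e2):
    """Estimated optimal weight on forecast 1 in a two-forecast combination,
    built from the forecast-error second moments."""
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    s11, s22 = np.mean(e1 * e1), np.mean(e2 * e2)
    s12 = np.mean(e1 * e2)
    return (s22 - s12) / (s22 + s11 - 2 * s12)

def regression_combination(y, forecasts):
    """Unrestricted combining regression (with intercept): regress realizations
    on the individual forecasts and use the fitted coefficients as weights."""
    y = np.asarray(y, float)
    F = np.column_stack([np.ones_like(y)] +
                        [np.asarray(f, float) for f in forecasts])
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    return beta          # beta[0] is the intercept, beta[1:] the weights
```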
a. Time-varying combining weights

Time-varying combining weights were proposed in the variance-covariance context by Granger and Newbold (1973) and in the regression context by Diebold and Pauly (1987). In the regression framework, for example, one may undertake weighted or rolling estimation of combining regressions, or one may estimate combining regressions with explicitly time-varying parameters. The potential desirability of time-varying weights stems from a number of sources. First, different learning speeds may lead to a particular forecast improving over time relative to others. In such situations, one naturally wants to weight the improving forecast progressively more heavily. Second, the design of various forecasting models may make them relatively better forecasting tools in some situations than in others. For example, a structural model with a highly developed wage-price sector may substantially outperform a simpler model during times of high inflation. In such times, the more sophisticated model should receive higher weight. Third, the parameters in agents' decision rules may drift over time, and certain forecasting techniques may be relatively more vulnerable to such drift.
b. Dynamic combining regressions

Serially correlated errors arise naturally in combining regressions. Diebold (1988) considers the covariance stationary case and argues that serial correlation is likely to appear in unrestricted regression-based forecast combining regressions when \beta_1 + \beta_2 \neq 1. More generally, it may be a good idea to allow for serial correlation in combining regressions to capture any dynamics in the variable to be forecast not captured by the various forecasts. In that regard, Coulson and Robins (1993), following Hendry and Mizon (1978), point out that a combining regression with serially correlated disturbances is a special case of a combining regression that includes lagged dependent variables and lagged forecasts, which they advocate.


c. Bayesian shrinkage of combining weights toward equality


Simple arithmetic averages of forecasts are often found to perform very well, even relative to "optimal" composites. 13 Obviously, the imposition of an equal weights constraint eliminates variation in the estimated weights at the cost of possibly introducing bias. However, the evidence indicates that, under quadratic loss, the benefits of imposing equal weights often exceed this cost. With this in mind, Clemen and Winkler (1986) and Diebold and Pauly (1990) propose Bayesian shrinkage techniques to allow for the incorporation of varying degrees of prior information in the estimation of combining weights; least-squares weights and the prior weights then emerge as polar cases for the posterior-mean combining weights. The actual posterior mean combining weights are a matrix weighted average of those for the two polar cases. For example, using a natural conjugate normal-gamma prior, the posterior-mean combining weight vector is

\beta_{posterior} = (Q + F'F)^{-1}(Q \beta_{prior} + F'F \hat{\beta}) ,

where \beta_{prior} is the prior mean vector, Q is the prior precision matrix, F is the design matrix for the combining regression, and \hat{\beta} is the vector of least squares combining weights. The obvious shrinkage direction is toward a measure of central tendency (e.g., the arithmetic mean). In this way, the combining weights are coaxed toward the arithmetic mean, but the data are still allowed to speak, when (and if) they have something to say.
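A minimal sketch of the posterior-mean shrinkage formula just displayed, assuming the user supplies the prior mean vector and prior precision matrix Q (for example, equal weights with Q = cI, where a larger c shrinks harder toward the prior); function and variable names are ours.

```python
import numpy as np

def shrinkage_weights(F, y, beta_prior, Q):
    """Posterior-mean combining weights:
    beta_post = (Q + F'F)^(-1) (Q beta_prior + F'F beta_ols).

    F is the T x M matrix of forecasts (the combining-regression design),
    y the T-vector of realizations, beta_prior the M-vector of prior weights,
    and Q the M x M prior precision matrix.
    """
    F, y = np.asarray(F, float), np.asarray(y, float)
    FtF = F.T @ F
    beta_ols, *_ = np.linalg.lstsq(F, y, rcond=None)
    return np.linalg.solve(Q + FtF, Q @ np.asarray(beta_prior, float) + FtF @ beta_ols)
```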

d. Nonlinear combining regressions


There is no reason, of course, to force combining regressions to be linear, and various of the usual alternatives may be entertained. One particularly interesting possibility is proposed by Deutsch, Granger and Teräsvirta (1994), who suggest

\hat{y}^c_{t+k,t} = I(s_t = 1)(\beta_{11}\hat{y}^1_{t+k,t} + \beta_{12}\hat{y}^2_{t+k,t}) + I(s_t = 2)(\beta_{21}\hat{y}^1_{t+k,t} + \beta_{22}\hat{y}^2_{t+k,t}) .

The states that govern the combining weights can depend on past forecast errors from one or both models or on various economic variables. Furthermore, the indicator weight need not be simply a binary variable; the transition between states can be made more gradual by allowing weights to be functions of the forecast errors or economic variables.

4. Special topics in evaluating economic and financial forecasts

Evaluating direction-of-change forecasts


Direction-of-change forecasts are often used in financial and economic decision-making (e.g., Leitch and Tanner, 1991, 1995; Satchell and Timmermann, 1992).

13 See Winkler and Makridakis (1983), Clemen (1989), and many of the references therein.


The question of how to evaluate such forecasts immediately arises. Our earlier results on tests for forecast accuracy comparison remain valid, appropriately modified, so we shall not restate them here. Instead, we note that one frequently sees assessments of whether direction-of-change forecasts "have value," and we shall discuss that issue. The question as to whether a direction-of-change forecast has value by necessity involves comparison to a naive benchmark - the direction-of-change forecast is compared to a "naive" coin flip (with success probability equal to the relevant marginal). Consider a 2 x 2 contingency table. For ease of notation, call the two states into which forecasts and realizations fall "i" and "j". Commonly, for example, i = "up" and j = "down." Tables 1 and 2 make clear our notation regarding observed cell counts and unobserved cell probabilities.

The null hypothesis that a direction-of-change forecast has no value is that the forecasts and realizations are independent, in which case P_{ij} = P_{i.}P_{.j}, for all i, j. As always, one proceeds under the null. The true cell probabilities are of course unknown, so one uses the consistent estimates \hat{P}_{i.} = O_{i.}/O and \hat{P}_{.j} = O_{.j}/O. Then one consistently estimates the expected cell counts under the null, E_{ij} = P_{i.}P_{.j}O, by \hat{E}_{ij} = \hat{P}_{i.}\hat{P}_{.j}O = O_{i.}O_{.j}/O. Finally, one constructs the statistic

C = \sum_{i,j} (O_{ij} - \hat{E}_{ij})^2 / \hat{E}_{ij} .

Under the null, C \stackrel{d}{\rightarrow} \chi^2_1.

An intimately-related test of forecast value was proposed by Merton (1981) and Henriksson and Merton (1981), who assert that a forecast has value if P_{ii}/P_{i.} + P_{jj}/P_{j.} > 1. They therefore develop an exact test of the null hypothesis that P_{ii}/P_{i.} + P_{jj}/P_{j.} = 1 against the inequality alternative. A key insight, noted in varying degrees by Schnader and Stekler (1990) and Stekler (1994), and formalized by Pesaran and Timmermann (1992), is that the Henriksson-Merton null is equivalent to the contingency-table null if the marginal probabilities are fixed at the observed relative frequencies, O_{i.}/O and O_{.j}/O. The same unpalatable assumption is necessary for deriving the exact finite-sample distribution of the Henriksson-Merton test statistic.
Table 1
Observed cell counts

             Actual i    Actual j    Marginal
Forecast i   O_{ii}      O_{ij}      O_{i.}
Forecast j   O_{ji}      O_{jj}      O_{j.}
Marginal     O_{.i}      O_{.j}      Total: O

Table 2
Unobserved cell probabilities

             Actual i    Actual j    Marginal
Forecast i   P_{ii}      P_{ij}      P_{i.}
Forecast j   P_{ji}      P_{jj}      P_{j.}
Marginal     P_{.i}      P_{.j}      Total: 1


Asymptotically, however, all is well; the square of the Henriksson-Merton statistic, appropriately normalized, is asymptotically equivalent to C, the chisquared contingency table statistic. Moreover, the 2 x 2 contingency table test generalizes trivially to the N x N case, with

C_N = \sum_{i,j=1}^{N} (O_{ij} - \hat{E}_{ij})^2 / \hat{E}_{ij} .
Under the null, C_N \stackrel{d}{\rightarrow} \chi^2_{(N-1)(N-1)}. A subtle point arises, however, as pointed out by Pesaran and Timmermann (1992). In the 2 x 2 case, one must base the test on the entire table, as the off-diagonal elements are determined by the diagonal elements, because the two elements of each row must sum to one. In the N x N case, in contrast, there is more latitude as to which cells to examine, and for purposes of forecast evaluation, it may be desirable to focus only on the diagonal cells.

In closing this section, we note that although the contingency table tests are often of interest in the direction-of-change context (for the same reason that tests based on Theil's U-statistic are often of interest in more standard contexts), forecast "value" in that sense is neither a necessary nor sufficient condition for forecast value in terms of a profitable trading strategy yielding significant excess returns. For example, one might beat the marginal forecast but still earn no excess returns after adjusting for transactions costs. Alternatively, one might do worse than the marginal but still make huge profits if the "hits" are "big," a point stressed by Cumby and Modest (1987).
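The following sketch (ours) implements the contingency-table test of forecast value for categorical direction-of-change forecasts and realizations, using the chi-squared approximation with (N-1)^2 degrees of freedom; it assumes both series take values in the same set of states and that no row or column is empty.

```python
import numpy as np
from scipy.stats import chi2

def direction_of_change_test(forecast_state, actual_state):
    """Chi-squared contingency-table test that a direction-of-change forecast
    has no value (forecasts independent of realizations)."""
    f = np.asarray(forecast_state)
    a = np.asarray(actual_state)
    states = np.unique(np.concatenate([f, a]))
    N, O = len(states), len(f)
    counts = np.zeros((N, N))
    for i, si in enumerate(states):
        for j, sj in enumerate(states):
            counts[i, j] = np.sum((f == si) & (a == sj))
    # expected counts under independence: O_{i.} O_{.j} / O
    expected = counts.sum(1, keepdims=True) @ counts.sum(0, keepdims=True) / O
    C = np.sum((counts - expected) ** 2 / expected)
    dof = (N - 1) ** 2
    return C, chi2.sf(C, dof)
```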

Evaluating probability forecasts


Oftentimes economic and financial forecasts are issued as probabilities, such as the probability that a business cycle turning point will occur in the next year, the probability that a corporation will default on a particular bond issue this year, or the probability that the return on the S&P 500 stock index will be more than ten percent this year. A number of specialized considerations arise in the evaluation of probability forecasts, to which we now turn. Let P_{t+k,t} be a probability forecast made at time t for an event at time t + k, and let R_{t+k} = 1 if the event occurs and zero otherwise. P_{t+k,t} is a scalar if there are only two possible events. More generally, if there are N possible events, then P_{t+k,t} is an (N - 1) x 1 vector.¹⁴ For notational economy, we shall focus on scalar probability forecasts. Accuracy measures for probability forecasts are commonly called "scores," and the most common is Brier's (1950) quadratic probability score, also called the Brier score,

14 The probability forecast assigned to the Nth event is implicitly determined by the restriction that the probabilities sum to 1.


QPS = \frac{1}{T}\sum_{t=1}^{T} 2(P_{t+k,t} - R_{t+k})^2 .

Clearly, QPS \in [0, 2], and it has a negative orientation (smaller values indicate more accurate forecasts).¹⁵ To understand the QPS, note that the accuracy of any forecast refers to the expected loss when using that forecast, and typically loss depends on the deviation between forecasts and realizations. It seems reasonable, then, in the context of probability forecasting under quadratic loss, to track the average squared divergence between P_{t+k,t} and R_{t+k}, which is what the QPS does. Thus, the QPS is a rough probability-forecast analog of MSE.

The QPS is only a rough analog of MSE, however, because P_{t+k,t} is in fact not a forecast of the outcome (which is 0-1), but rather a probability assigned to it. A more natural and direct way to evaluate probability forecasts is simply to compare the forecasted probabilities to observed relative frequencies - that is, to assess calibration. An overall measure of calibration is the global squared bias, GSB = 2(\bar{P} - \bar{R})^2, where \bar{P} = \frac{1}{T}\sum_{t=1}^{T} P_{t+k,t} and \bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_{t+k}. GSB \in [0, 2] with a negative orientation.

Calibration may also be examined locally in any subset of the unit interval. For example, one might check whether the observed relative frequency corresponding to probability forecasts between 0.6 and 0.7 is also between 0.6 and 0.7. One may go farther to form a weighted average of local calibration across all cells of a partition of the unit interval into J subsets chosen according to the user's interest and the specifics of the situation.¹⁶ This leads to the local squared bias measure,

LSB = \frac{1}{T}\sum_{j=1}^{J} 2 T_j (\bar{P}_j - \bar{R}_j)^2 ,

where T_j is the number of probability forecasts in set j, \bar{P}_j is the average forecast in set j, and \bar{R}_j is the average realization in set j, j = 1, ..., J. Note that LSB \in [0, 2], and LSB = 0 implies that GSB = 0, but not conversely.

Testing for adequate calibration is a straightforward matter, at least under independence of the realizations. For a given event and a corresponding sequence of forecasted probabilities \{P_{t+k,t}\}_{t=1}^{T}, create J mutually exclusive and collectively exhaustive subsets of forecasts, and denote the midpoint of each range \pi_j, j = 1, ..., J. Let R_j denote the number of observed events when the forecast was in set j, and define "range j" calibration statistics,

15 The "2" that appears in the QPS formula is an artifact from the full vector case. We could of course drop it without affecting the QPS rankings of competing forecasts, but we leave it to maintain comparability to other literature.
16 For example, Diebold and Rudebusch (1989) split the unit interval into ten equal parts.

Z_j = \frac{R_j - e_j}{w_j^{1/2}} = \frac{R_j - T_j \pi_j}{(T_j \pi_j (1 - \pi_j))^{1/2}} ,    j = 1, ..., J ,

and an overall calibration statistic,

Z_0 = \frac{R_+ - e_+}{w_+^{1/2}} ,

where R_+ = \sum_{j=1}^{J} R_j, e_+ = \sum_{j=1}^{J} T_j \pi_j, and w_+ = \sum_{j=1}^{J} T_j \pi_j (1 - \pi_j). Z_0 is a joint test of adequate local calibration across all cells, while the Z_j statistics test cell-by-cell local calibration.¹⁷ Under independence, the binomial structure would obviously imply that Z_0 \stackrel{a}{\sim} N(0, 1), and Z_j \stackrel{a}{\sim} N(0, 1), for all j = 1, ..., J. In a fascinating development, Seillier-Moiseiwitsch and Dawid (1993) show that the asymptotic normality holds much more generally, including in the dependent situations of practical relevance.

One additional feature of probability forecasts (or more precisely, of the corresponding realizations), called resolution, is of interest:

RES = \frac{1}{T}\sum_{j=1}^{J} 2 T_j (\bar{R}_j - \bar{R})^2 .

RES is simply the weighted average squared divergence between \bar{R} and the \bar{R}_j's, a measure of how much the observed relative frequencies move across cells. RES \geq 0 and has a positive orientation. As shown by Murphy (1973), an informative decomposition of QPS exists,

QPS = QPS_{\bar{R}} + LSB - RES ,

where QPS_{\bar{R}} is the QPS evaluated at P_{t+k,t} = \bar{R}. This decomposition highlights the tradeoffs between the various attributes of probability forecasts. Just as with Theil's U-statistic for "standard" forecasts, it is sometimes informative to compare the performance of a particular probability forecast to that of a benchmark. Murphy (1974), for example, proposes the statistic

M = QPS - QPS_{\bar{R}} = LSB - RES ,

which measures the difference in accuracy between the forecast at hand and the benchmark forecast \bar{R}. Using the earlier-discussed Diebold-Mariano approach, one can also assess the significance of differences in QPS and QPS_{\bar{R}}, differences in QPS or various other measures of probability forecast accuracy across forecasters, or differences in local or global calibration across forecasters.

17 One may of course test for adequate global calibration by using a trivial partition of the unit interval - the unit interval itself.
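As an illustration of these scores and calibration statistics, the sketch below (ours) computes QPS, GSB, LSB, RES and the cell-by-cell Z_j statistics for scalar probability forecasts, using an equal-width partition of the unit interval with cell midpoints standing in for \pi_j.

```python
import numpy as np
from scipy.stats import norm

def probability_forecast_scores(P, R, bins=10):
    """QPS, global/local squared bias, resolution, and local calibration
    Z-statistics for probability forecasts P of 0/1 realizations R."""
    P, R = np.asarray(P, float), np.asarray(R, float)
    T = len(P)
    qps = np.mean(2 * (P - R) ** 2)
    gsb = 2 * (P.mean() - R.mean()) ** 2

    edges = np.linspace(0, 1, bins + 1)
    cell = np.clip(np.digitize(P, edges[1:-1]), 0, bins - 1)
    lsb = res = 0.0
    Z = np.full(bins, np.nan)
    for j in range(bins):
        idx = cell == j
        Tj = idx.sum()
        if Tj == 0:
            continue                                   # empty cell: skip
        Pbar, Rbar = P[idx].mean(), R[idx].mean()
        lsb += 2 * Tj * (Pbar - Rbar) ** 2 / T
        res += 2 * Tj * (Rbar - R.mean()) ** 2 / T
        pij = 0.5 * (edges[j] + edges[j + 1])          # cell midpoint pi_j
        Z[j] = (R[idx].sum() - Tj * pij) / np.sqrt(Tj * pij * (1 - pij))
    return {"QPS": qps, "GSB": gsb, "LSB": lsb, "RES": res,
            "Z": Z, "p_Z": 2 * norm.sf(np.abs(Z))}
```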

Evaluating volatility forecasts


Many interesting questions in finance, such as options pricing, risk hedging and portfolio management, explicitly depend upon the variances of asset prices. Thus, a variety of methods have been proposed for generating volatility forecasts. As opposed to point or probability forecasts, evaluation of volatility forecasts is complicated by the fact that actual conditional variances are unobservable. A standard "solution" to this unobservability problem is to use the squared realization \varepsilon^2_{t+k} as a proxy for the true conditional variance h_{t+k}, because E[\varepsilon^2_{t+k} | \Omega_{t+k-1}] = E[h_{t+k} v^2_{t+k} | \Omega_{t+k-1}] = h_{t+k}, where v_{t+k} \sim WN(0, 1).¹⁸ Thus, for example, MSE = \frac{1}{T}\sum_{t=1}^{T}(\varepsilon^2_{t+k} - \hat{h}_{t+k,t})^2. Although MSE is often used to measure volatility forecast accuracy, Bollerslev, Engle and Nelson (1994) point out that MSE is inappropriate, because it penalizes positive volatility forecasts and negative volatility forecasts (which are meaningless) symmetrically. Two alternative loss functions that penalize volatility forecasts asymmetrically are the logarithmic loss function employed in Pagan and Schwert (1990),

LL = \frac{1}{T}\sum_{t=1}^{T} [\ln(\varepsilon^2_{t+k}) - \ln(\hat{h}_{t+k,t})]^2 ,


and the heteroskedasticity-adjusted MSE of Bollerslev and Ghysels (1994),

HMSE = \frac{1}{T}\sum_{t=1}^{T} \left[ \frac{\varepsilon^2_{t+k}}{\hat{h}_{t+k,t}} - 1 \right]^2 .
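A minimal sketch (ours) of these volatility-forecast loss functions, including the Gaussian QMLE-based loss introduced immediately below; the squared realizations must be strictly positive for the logarithmic loss to be defined.

```python
import numpy as np

def volatility_losses(eps2, h_hat):
    """Common volatility-forecast loss functions, with squared realizations
    eps2 = eps_{t+k}**2 proxying for the unobservable conditional variance."""
    eps2, h = np.asarray(eps2, float), np.asarray(h_hat, float)
    return {
        "MSE":  np.mean((eps2 - h) ** 2),
        "LL":   np.mean((np.log(eps2) - np.log(h)) ** 2),   # Pagan-Schwert, needs eps2 > 0
        "HMSE": np.mean((eps2 / h - 1) ** 2),                # Bollerslev-Ghysels
        "GMLE": np.mean(np.log(h) + eps2 / h),               # Gaussian QMLE-implied loss
    }
```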

Bollerslev, Engle and Nelson (1994) suggest the loss function implicit in the Gaussian quasi-maximum likelihood function often used in fitting volatility models; that is,

GMLE = \frac{1}{T}\sum_{t=1}^{T} \left[ \ln(\hat{h}_{t+k,t}) + \frac{\varepsilon^2_{t+k}}{\hat{h}_{t+k,t}} \right] .

As with all forecast evaluations, the volatility forecast evaluations of most interest to forecast users are those conducted under the relevant loss function. West, Edison and Cho (1993) and Engle et al. (1993) make important contributions along those lines, proposing economic loss functions based on utility maximization and profit maximization, respectively. Lopez (1995) proposes a framework for volatility forecast evaluation that allows for a variety of economic loss functions. The framework is based on transforming volatility forecasts into probability forecasts by integrating over the assumed or estimated distribution of \varepsilon_t. By selecting the range of integration corresponding to an event of interest, a

18 Although \varepsilon^2_{t+k} is an unbiased estimator of h_{t+k}, it is an imprecise or "noisy" estimator. For example, if v_{t+k} \sim N(0, 1), \varepsilon^2_{t+k} = h_{t+k} v^2_{t+k} has a conditional mean of h_{t+k} because v^2_{t+k} \sim \chi^2_1. Yet, because the median of a \chi^2_1 distribution is 0.455, \varepsilon^2_{t+k} < \frac{1}{2} h_{t+k} more than fifty percent of the time.


forecast user can incorporate elements of her loss function into the probability forecasts. For example, given \varepsilon_{t+k} | \Omega_t \sim D(0, h_{t+k,t}) and a volatility forecast \hat{h}_{t+k,t}, an options trader interested in the event \varepsilon_{t+k} \in [L_{\varepsilon,t+k}, U_{\varepsilon,t+k}] would generate the probability forecast

P_{t+k,t} = Pr(L_{\varepsilon,t+k} < \varepsilon_{t+k} < U_{\varepsilon,t+k})
          = Pr\left( \frac{L_{\varepsilon,t+k}}{\hat{h}^{1/2}_{t+k,t}} < z_{t+k} < \frac{U_{\varepsilon,t+k}}{\hat{h}^{1/2}_{t+k,t}} \right)
          = \int_{l_{\varepsilon,t+k}}^{u_{\varepsilon,t+k}} f(z_{t+k}) \, dz_{t+k} ,

where z_{t+k} is the standardized innovation, f(z_{t+k}) is the functional form of D(0, 1), and [l_{\varepsilon,t+k}, u_{\varepsilon,t+k}] is the standardized range of integration. In contrast, a forecast user interested in the behavior of the underlying asset, y_{t+k} = \mu_{t+k,t} + \varepsilon_{t+k}, where \mu_{t+k,t} = E[y_{t+k} | \Omega_t], might generate the probability forecast

P_{t+k,t} = Pr(L_{y,t+k} < y_{t+k} < U_{y,t+k})
          = Pr\left( \frac{L_{y,t+k} - \hat{\mu}_{t+k,t}}{\hat{h}^{1/2}_{t+k,t}} < z_{t+k} < \frac{U_{y,t+k} - \hat{\mu}_{t+k,t}}{\hat{h}^{1/2}_{t+k,t}} \right)
          = \int_{l_{y,t+k}}^{u_{y,t+k}} f(z_{t+k}) \, dz_{t+k} ,


where \hat{\mu}_{t+k,t} is the forecasted conditional mean and [l_{y,t+k}, u_{y,t+k}] is the standardized range of integration. Once generated, these probability forecasts can be evaluated using the scoring rules described above, and the significance of differences across models can be tested using the Diebold-Mariano tests. The key advantage of this framework is that it allows the evaluation to be based on observable events and thus avoids proxying for the unobservable true variance.

The Lopez approach to volatility forecast evaluation is based on time-varying probabilities assigned to a fixed interval. Alternatively, one may fix the probabilities and vary the widths of the intervals, as in traditional confidence interval construction. In that regard, Christoffersen (1995) suggests exploiting the fact that if a (1 - \alpha)% confidence interval (denoted [L_{y,t+k}, U_{y,t+k}]) is correctly calibrated, then

E[I_{t+k,t} | I_{t,t-k}, I_{t-1,t-k-1}, ..., I_{k+1,1}] = (1 - \alpha) ,

where I_{t+k,t} = 1 if y_{t+k} \in [L_{y,t+k}, U_{y,t+k}], and 0 otherwise.


That is, Christoffersen suggests checking conditional coverage.¹⁹ Standard evaluation methods for interval forecasts typically restrict attention to unconditional coverage, E[I_{t+k,t}] = (1 - \alpha). But simply checking unconditional coverage is insufficient in general, because an interval forecast with correct unconditional coverage may nevertheless have incorrect conditional coverage at any particular time. For one-step-ahead interval forecasts (k = 1), the conditional coverage criterion becomes

E[I_{t+1,t} | I_{t,t-1}, I_{t-1,t-2}, ..., I_{2,1}] = (1 - \alpha) ,

or equivalently,

I_{t+1,t} \sim iid Bernoulli(1 - \alpha) .

Given T values of the indicator variable for T interval forecasts, one can determine whether the forecast intervals display correct conditional coverage by testing the hypothesis that the indicator variable is an iid Bernoulli(1 - \alpha) random variable. A likelihood ratio test of the iid Bernoulli hypothesis is readily constructed by comparing the log likelihoods of restricted and unrestricted Markov processes for the indicator series \{I_{t+1,t}\}. The unrestricted transition probability matrix is
\Pi = \begin{pmatrix} \pi_{11} & 1 - \pi_{11} \\ 1 - \pi_{00} & \pi_{00} \end{pmatrix} ,

where \pi_{11} = P(I_{t+1,t} = 1 | I_{t,t-1} = 1), and so forth. The transition probability matrix under the null is

\Pi_0 = \begin{pmatrix} 1 - \alpha & \alpha \\ 1 - \alpha & \alpha \end{pmatrix} .

The corresponding approximate likelihood functions are

L(\Pi | I) = \pi_{11}^{n_{11}} (1 - \pi_{11})^{n_{10}} (1 - \pi_{00})^{n_{01}} \pi_{00}^{n_{00}}

and

L(\alpha | I) = (1 - \alpha)^{(n_{11} + n_{01})} \alpha^{(n_{10} + n_{00})} ,

where n_{ij} is the number of observed transitions from i to j and I is the indicator sequence.²⁰ The likelihood ratio statistic for the conditional coverage hypothesis is

LR_{cc} = 2[\ln L(\hat{\Pi} | I) - \ln L(\alpha | I)] ,

19 In general, one wants to test whether E[I_{t+k,t} | \Omega_t] = (1 - \alpha), where \Omega_t is all information available at time t. For present purposes, \Omega_t is restricted to past values of the indicator sequence in order to construct general and easily applied tests.
20 The likelihoods are approximate because the initial terms are dropped. All the likelihood ratio tests presented are of course asymptotic, so the treatment of the initial terms is inconsequential.


where \hat{\Pi} are the maximum likelihood estimates. Under the null hypothesis,

LR_{cc} \stackrel{a}{\sim} \chi^2_2 .
The likelihood ratio test of conditional coverage can be decomposed into two separately interesting hypotheses, correct unconditional coverage, E[I_{t+1,t}] = (1 - \alpha), and independence, \pi_{11} = 1 - \pi_{00}. The likelihood ratio test for correct unconditional coverage (given independence) is

LR_{uc} = 2[\ln L(\hat{\pi} | I) - \ln L(\alpha | I)] ,

where L(\pi | I) = (1 - \pi)^{(n_{11} + n_{01})} \pi^{(n_{10} + n_{00})} and \hat{\pi} is its maximum likelihood estimate. Under the null hypothesis, LR_{uc} \stackrel{a}{\sim} \chi^2_1. The independence hypothesis is tested separately by
LR_{ind} = 2[\ln L(\hat{\Pi} | I) - \ln L(\hat{\pi} | I)] .

Under the null hypothesis, LR_{ind} \stackrel{a}{\sim} \chi^2_1. It is apparent that LR_{cc} = LR_{uc} + LR_{ind}, in small as well as large samples.

The independence property can also be checked in the case where k = 1 using the group test of David (1947), which is an exact and uniformly most powerful test against first-order dependence. Define a group as a string of consecutive zeros or ones, and let r be the number of groups in the sequence \{I_{t+1,t}\}. Under the null that the sequence is iid, the distribution of r given the total number of ones, n_1, and the total number of zeros, n_0, is f_r, for r \geq 2, where n = n_0 + n_1 and

f_{2s} = \frac{2 \binom{n_1 - 1}{s - 1} \binom{n_0 - 1}{s - 1}}{\binom{n}{n_1}}    for r = 2s (even),

f_{2s+1} = \frac{\binom{n_1 - 1}{s - 1} \binom{n_0 - 1}{s} + \binom{n_1 - 1}{s} \binom{n_0 - 1}{s - 1}}{\binom{n}{n_1}}    for r = 2s + 1 (odd).
Finally, the generalization to k > 1 is simple in the likelihood ratio framework, in spite of the fact that k-step-ahead prediction errors are serially correlated in general. The basic framework remains intact but requires a kth-order Markov chain. A kth-order chain, however, can always be written as a first-order chain with an expanded state space, so that direct analogs of the results for the first-order case apply.
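To illustrate, the sketch below (ours, using the approximate conditional likelihood described above) computes LR_uc, LR_ind and LR_cc for a 0/1 hit sequence from one-step interval forecasts; a small constant guards the logarithms when a transition count is zero.

```python
import numpy as np
from scipy.stats import chi2

def conditional_coverage_tests(hits, alpha):
    """Christoffersen-style likelihood ratio tests for a 0/1 hit sequence from
    one-step interval forecasts with nominal coverage 1 - alpha."""
    hits = np.asarray(hits, int)
    prev, curr = hits[:-1], hits[1:]
    n = np.zeros((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            n[i, j] = np.sum((prev == i) & (curr == j))      # transitions i -> j
    pi_hat = n[:, 1].sum() / n.sum()                          # MLE of P(I_t = 1)
    pi01 = n[0, 1] / max(n[0].sum(), 1)                       # P(1 | previous 0)
    pi11 = n[1, 1] / max(n[1].sum(), 1)                       # P(1 | previous 1)

    def loglik(p1_given0, p1_given1):
        eps = 1e-12                                           # guard against log(0)
        return (n[0, 0] * np.log(1 - p1_given0 + eps) + n[0, 1] * np.log(p1_given0 + eps)
              + n[1, 0] * np.log(1 - p1_given1 + eps) + n[1, 1] * np.log(p1_given1 + eps))

    LR_uc = 2 * (loglik(pi_hat, pi_hat) - loglik(1 - alpha, 1 - alpha))
    LR_ind = 2 * (loglik(pi01, pi11) - loglik(pi_hat, pi_hat))
    LR_cc = LR_uc + LR_ind
    return {"LR_uc": (LR_uc, chi2.sf(LR_uc, 1)),
            "LR_ind": (LR_ind, chi2.sf(LR_ind, 1)),
            "LR_cc": (LR_cc, chi2.sf(LR_cc, 2))}
```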
5. Concluding remarks

Three modern themes permeate this survey, so it is worth highlighting them explicitly. The first theme is that various types of forecasts, such as probability forecasts and volatility forecasts, are becoming more integrated into economic and financial decision making, leading to a derived demand for new types of forecast evaluation procedures.


The second theme is the use of exact finite-sample hypothesis tests, typically based on distribution-free nonparametrics. We explicitly sketched such tests in the context of forecast-error unbiasedness, k-dependence, orthogonality to available information, and when more than one forecast is available, in the context of testing equality of expected loss, testing whether a direction-of-change forecast has value, etc. The third theme is use of the relevant loss function. This idea arose in many places, such as in forecastability measures and forecast accuracy comparison tests, and may readily be introduced in others, such as orthogonality tests, encompassing tests and combining regressions. In fact, an integrated tool kit for estimation, forecasting, and forecast evaluation (and hence model selection and nonnested hypothesis testing) under the relevant loss function is rapidly becoming available; see Weiss and Andersen (1984), Weiss (1995), Diebold and Mariano (1995), Christoffersen and Diebold (1994, 1995), and Diebold, Ohanian and Berkowitz (1995).

References
Armstrong, J. S. and R. Fildes (1995). On the selection of error measures for comparisons among forecasting methods. J. Forecasting 14, 67-71.
Auerbach, A. (1994). The U.S. fiscal problem: Where we are, how we got here and where we're going. NBER Macroeconomics Annual, MIT Press, Cambridge, MA.
Bates, J. M. and C. W. J. Granger (1969). The combination of forecasts. Oper. Res. Quart. 20, 451-468.
Bollerslev, T., R. F. Engle and D. B. Nelson (1994). ARCH models. In: R. F. Engle and D. McFadden, eds., Handbook of Econometrics, Vol. 4, North-Holland, Amsterdam.
Bollerslev, T. and E. Ghysels (1994). Periodic autoregressive conditional heteroskedasticity. Working Paper No. 178, Department of Finance, Kellogg School of Management, Northwestern University.
Bonham, C. and R. Cohen (1995). Testing the rationality of price forecasts: Comment. Amer. Econom. Rev. 85, 284-289.
Bradley, J. V. (1968). Distribution-free statistical tests. Prentice Hall, Englewood Cliffs, NJ.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 75, 1-3.
Brown, B. W. and S. Maital (1981). What do economists know? An empirical study of experts' expectations. Econometrica 49, 491-504.
Campbell, B. and J.-M. Dufour (1991). Over-rejections in rational expectations models: A nonparametric approach to the Mankiw-Shapiro problem. Econom. Lett. 35, 285-290.
Campbell, B. and J.-M. Dufour (1995). Exact nonparametric orthogonality and random walk tests. Rev. Econom. Statist. 77, 1-16.
Campbell, B. and E. Ghysels (1995). Federal budget projections: A nonparametric assessment of bias and efficiency. Rev. Econom. Statist. 77, 17-31.
Campbell, J. Y. and N. G. Mankiw (1987). Are output fluctuations transitory? Quart. J. Econom. 102, 857-880.
Chong, Y. Y. and D. F. Hendry (1986). Econometric evaluation of linear macroeconomic models. Rev. Econom. Stud. 53, 671-690.
Christoffersen, P. F. (1995). Predicting uncertainty in the foreign exchange markets. Manuscript, Department of Economics, University of Pennsylvania.
Christoffersen, P. F. and F. X. Diebold (1994). Optimal prediction under asymmetric loss. Technical Working Paper No. 167, National Bureau of Economic Research, Cambridge, MA.


Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. Internat. J. Forecasting 5, 559-581.
Clemen, R. T. and R. L. Winkler (1986). Combining economic forecasts. J. Econom. Business Statist. 4, 39-46.
Clements, M. P. and D. F. Hendry (1993). On the limitations of comparing mean squared forecast errors. J. Forecasting 12, 617-638.
Cochrane, J. H. (1988). How big is the random walk in GNP? J. Politic. Econom. 96, 893-920.
Cooper, D. M. and C. R. Nelson (1975). The ex-ante prediction performance of the St. Louis and F.R.B.-M.I.T.-Penn econometric models and some results on composite predictors. J. Money, Credit and Banking 7, 1-32.
Coulson, N. E. and R. P. Robins (1993). Forecast combination in a dynamic setting. J. Forecasting 12, 63-67.
Cumby, R. E. and J. Huizinga (1992). Testing the autocorrelation structure of disturbances in ordinary least squares and instrumental variables regressions. Econometrica 60, 185-195.
Cumby, R. E. and D. M. Modest (1987). Testing for market timing ability: A framework for forecast evaluation. J. Financ. Econom. 19, 169-189.
David, F. N. (1947). A power function for tests of randomness in a sequence of alternatives. Biometrika 34, 335-339.
Deutsch, M., C. W. J. Granger and T. Teräsvirta (1994). The combination of forecasts using changing weights. Internat. J. Forecasting 10, 47-57.
Diebold, F. X. (1988). Serial correlation and the combination of forecasts. J. Business Econom. Statist. 6, 105-111.
Diebold, F. X. (1993). On the limitations of comparing mean square forecast errors: Comment. J. Forecasting 12, 641-642.
Diebold, F. X. and P. Lindner (1995). Fractional integration and interval prediction. Econom. Lett., to appear.
Diebold, F. X. and R. Mariano (1995). Comparing predictive accuracy. J. Business Econom. Statist. 13, 253-264.
Diebold, F. X., L. Ohanian and J. Berkowitz (1995). Dynamic equilibrium economies: A framework for comparing models and data. Technical Working Paper No. 174, National Bureau of Economic Research, Cambridge, MA.
Diebold, F. X. and P. Pauly (1987). Structural change and the combination of forecasts. J. Forecasting 6, 21-40.
Diebold, F. X. and P. Pauly (1990). The use of prior information in forecast combination. Internat. J. Forecasting 6, 503-508.
Diebold, F. X. and G. D. Rudebusch (1989). Scoring the leading indicators. J. Business 62, 369-391.
Dufour, J.-M. (1981). Rank tests for serial dependence. J. Time Ser. Anal. 2, 117-128.
Engle, R. F., C.-H. Hong, A. Kane and J. Noh (1993). Arbitrage valuation of variance forecasts with simulated options. In: D. Chance and R. Tripp, eds., Advances in Futures and Options Research, JAI Press, Greenwich, CT.
Engle, R. F. and S. Kozicki (1993). Testing for common features. J. Business Econom. Statist. 11, 369-395.
Fair, R. C. and R. J. Shiller (1989). The informational content of ex-ante forecasts. Rev. Econom. Statist. 71, 325-331.
Fair, R. C. and R. J. Shiller (1990). Comparing information in forecasts from econometric models. Amer. Econom. Rev. 80, 375-389.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. J. Finance 25, 383-417.
Fama, E. F. (1975). Short-term interest rates as predictors of inflation. Amer. Econom. Rev. 65, 269-282.
Fama, E. F. (1991). Efficient markets II. J. Finance 46, 1575-1617.
Fama, E. F. and K. R. French (1988). Permanent and temporary components of stock prices. J. Politic. Econom. 96, 246-273.


Granger, C. W. J. and P. Newbold (1973). Some comments on the evaluation of economic forecasts. Appl. Econom. 5, 35-47.
Granger, C. W. J. and P. Newbold (1976). Forecasting transformed series. J. Roy. Statist. Soc. B 38, 189-203.
Granger, C. W. J. and P. Newbold (1986). Forecasting economic time series. 2nd ed., Academic Press, San Diego.
Granger, C. W. J. and R. Ramanathan (1984). Improved methods of forecasting. J. Forecasting 3, 197-204.
Hansen, L. P. and R. J. Hodrick (1980). Forward exchange rates as optimal predictors of future spot rates: An econometric investigation. J. Politic. Econom. 88, 829-853.
Hendry, D. F. and G. E. Mizon (1978). Serial correlation as a convenient simplification, not a nuisance: A comment on a study of the demand for money by the Bank of England. Econom. J. 88, 549-563.
Henriksson, R. D. and R. C. Merton (1981). On market timing and investment performance II: Statistical procedures for evaluating forecast skills. J. Business 54, 513-533.
Keane, M. P. and D. E. Runkle (1990). Testing the rationality of price forecasts: New evidence from panel data. Amer. Econom. Rev. 80, 714-735.
Leitch, G. and J. E. Tanner (1991). Economic forecast evaluation: Profits versus the conventional error measures. Amer. Econom. Rev. 81, 580-590.
Leitch, G. and J. E. Tanner (1995). Professional economic forecasts: Are they worth their costs? J. Forecasting 14, 143-157.
LeRoy, S. F. and R. D. Porter (1981). The present value relation: Tests based on implied variance bounds. Econometrica 49, 555-574.
Lopez, J. A. (1995). Evaluating the predictive accuracy of volatility models. Manuscript, Research and Market Analysis Group, Federal Reserve Bank of New York.
Mark, N. C. (1995). Exchange rates and fundamentals: Evidence on long-horizon predictability. Amer. Econom. Rev. 85, 201-218.
McCulloch, R. and P. E. Rossi (1990). Posterior, predictive and utility-based approaches to testing the arbitrage pricing theory. J. Financ. Econom. 28, 7-38.
Meese, R. A. and K. Rogoff (1988). Was it real? The exchange rate interest differential relation over the modern floating-rate period. J. Finance 43, 933-948.
Merton, R. C. (1981). On market timing and investment performance I: An equilibrium theory of value for market forecasts. J. Business 54, 363-406.
Mincer, J. and V. Zarnowitz (1969). The evaluation of economic forecasts. In: J. Mincer, ed., Economic Forecasts and Expectations, National Bureau of Economic Research, New York.
Murphy, A. H. (1973). A new vector partition of the probability score. J. Appl. Meteor. 12, 595-600.
Murphy, A. H. (1974). A sample skill score for probability forecasts. Monthly Weather Review 102, 48-55.
Murphy, A. H. and R. L. Winkler (1987). A general framework for forecast evaluation. Monthly Weather Review 115, 1330-1338.
Murphy, A. H. and R. L. Winkler (1992). Diagnostic verification of probability forecasts. Internat. J. Forecasting 7, 435-455.
Nelson, C. R. (1972). The prediction performance of the F.R.B.-M.I.T.-Penn model of the U.S. economy. Amer. Econom. Rev. 62, 902-917.
Nelson, C. R. and G. W. Schwert (1977). Short term interest rates as predictors of inflation: On testing the hypothesis that the real rate of interest is constant. Amer. Econom. Rev. 67, 478-486.
Newbold, P. and C. W. J. Granger (1974). Experience with forecasting univariate time series and the combination of forecasts. J. Roy. Statist. Soc. A 137, 131-146.
Pagan, A. R. and G. W. Schwert (1990). Alternative models for conditional stock volatility. J. Econometrics 45, 267-290.
Pesaran, M. H. (1974). On the general problem of model selection. Rev. Econom. Stud. 41, 153-171.
Pesaran, M. H. and A. Timmermann (1992). A simple nonparametric test of predictive performance. J. Business Econom. Statist. 10, 461-465.


Ramsey, J. B. (1969). Tests for specification errors in classical least-squares regression analysis. J. Roy. Statist. Soc. B 31, 350-371.
Satchell, S. and A. Timmermann (1992). An assessment of the economic value of nonlinear foreign exchange rate forecasts. Financial Economics Discussion Paper FE-6/92, Birkbeck College, Cambridge University.
Schnader, M. H. and H. O. Stekler (1990). Evaluating predictions of change. J. Business 63, 99-107.
Seillier-Moiseiwitsch, F. and A. P. Dawid (1993). On testing the validity of sequential probability forecasts. J. Amer. Statist. Assoc. 88, 355-359.
Shiller, R. J. (1979). The volatility of long term interest rates and expectations models of the term structure. J. Politic. Econom. 87, 1190-1219.
Stekler, H. O. (1987). Who forecasts better? J. Business Econom. Statist. 5, 155-158.
Stekler, H. O. (1994). Are economic forecasts valuable? J. Forecasting 13, 495-505.
Theil, H. (1961). Economic Forecasts and Policy. North-Holland, Amsterdam.
Weiss, A. A. (1995). Estimating time series models using the relevant cost function. Manuscript, Department of Economics, University of Southern California.
Weiss, A. A. and A. P. Andersen (1984). Estimating forecasting models using the relevant forecast evaluation criterion. J. Roy. Statist. Soc. A 147, 484-487.
West, K. D. (1994). Asymptotic inference about predictive ability. Manuscript, Department of Economics, University of Wisconsin.
West, K. D., H. J. Edison and D. Cho (1993). A utility-based comparison of some models of exchange rate volatility. J. Internat. Econom. 35, 23-45.
Winkler, R. L. and S. Makridakis (1983). The combination of forecasts. J. Roy. Statist. Soc. A 146, 150-157.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

Predictable Components in Stock Returns*

Gautam Kaul

1. Introduction

Predictability of stock returns has always fascinated practitioners (for obvious reasons) and academics (for not so obvious reasons). In this paper, I attempt to review empirical methods used in the financial economics literature to uncover predictable components in stock returns. Given the amazing growth in the recent literature on predictability, I cannot conceivably review all the papers in this area. I will therefore concentrate primarily on the empirical techniques introduced and/or adapted to gauge the extent of predictability in stock returns in the recent literature. Also, consistent with the emphasis in the empirical literature, I will concentrate on the predictability of the returns of large portfolios of stocks, as opposed to predictability in individual-security returns. With the exception of some studies that uncover interesting empirical regularities, I will not review papers that are primarily "results oriented." Also, this review concentrates on the commonly used statistical procedures implemented in the recent literature to determine the importance of predictable components in stock returns.¹ Given that predictability of stock returns is inextricably linked with the concept of "market efficiency," I will discuss some of the issues related to the behavior of asset prices in an informationally efficient market [see Fama (1970, 1991) for outstanding reviews of market efficiency]. To keep the scope of this review manageable, I do not review the rich and growing literature on market microstructure and its implications for return predictability. Finally, even for the papers reviewed in this article, I will concentrate
* I really appreciate the time and effort spent by John Campbell, Jennifer Conrad, Wayne Ferson, Tom George, Campbell Harvey, David Heike, David Hirshleifer, Bob Hodrick, Ravi Jagannathan, Charles Jones, Bob Korajczyk, G.S. Maddala, M. Nimalendran, Richard Roll, Nejat Seyhun, and Robert Shiller in providing valuable feedback on earlier drafts of this paper. Partial funding for the project is provided by the School of Business Administration, University of Michigan, Ann Arbor, MI.
1 For example, I do not review frequency-domain-based procedures [see, for example, Granger and Morgenstern (1963)] or the relatively infrequently used tests of dependence in stock prices based on the rescaled range [see Goetzmann (1993), Lo (1991), and Mandelbrot (1972)]. Also, more recent applications of genetic algorithms to discover profitable trading rules [see Allen and Karjalainen (1993)] are not reviewed in this paper.


virtually exclusively on the empirical methodology and minimize the discussion of the empirical results. To the extent that stylized facts themselves are inextricably linked to subsequent methodological developments, however, some discussion of the empirical evidence is imperative.

2. Why predictability?
Before discussing the economic importance of predictability and the recent advances made in empirical methodology, I need to explicitly define predictability. Let the return on a stock, R_t, follow a stationary and ergodic stochastic process with finite expectation E(R_t) = \mu and finite autocovariances E[(R_t - \mu)(R_{t-k} - \mu)] = \gamma_k. Let \Omega_{t-1} denote the information set that exists at time t-1, of which X_{t-1} (an M x 1 vector) is the subset of information that is available to the econometrician. We then define predictability as specific restrictions on the parameters of the linear projection of R_t on X_{t-1}:

R_t = \mu + \beta X_{t-1} + \varepsilon_t ,    (1)

where β (1 × M) ≠ 0 (1 × M). Therefore, for the purposes of this paper, predictability is defined strictly in terms of the predictability of returns. I do not review the rich and growing literature on the predictability of the second moment of asset returns [see Bollerslev, Chou, and Kroner (1992)]. Therefore, for convenience, and unless explicitly stated otherwise, I assume that the errors, ε_t, are conditionally normal, with mean zero and constant variance σ². From a conceptual standpoint we can, in fact, assume that returns follow a random walk process because we are not directly interested in predictability in the second (or higher) moments of returns. Consequently, the otherwise important difference between martingales and random walks becomes irrelevant [see Fama (1970)]. Clearly, statistical inferences based on estimates of (1) will depend on any departures from normality, homoskedasticity, and/or autocorrelation in the ε_t's. Given that the use of statistical procedures to obtain heteroskedasticity- and/or autocorrelation-consistent standard errors has been widespread in economics and finance for over a decade, I will not discuss these procedures. The interested reader is referred to Hansen (1982), Hansen and Hodrick (1980), Newey and West (1987), and White (1980).²

2 The assumption of homoskedasticity unfortunately precludes this review from covering the obviously important literature on the relation between conditional volatility and expected returns [see, for example, French, Schwert, and Stambaugh (1987) and Stambaugh (1993)]. It is also important to realize that the assumption of normality for stock returns is made for convenience so that the coverage of this review is limited to a finite set of papers. Nevertheless, to the extent that normality may be critical to some of the results reviewed in this paper, the readers are cautioned against generalizing these results.
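For readers who want a concrete template, the following numpy-only sketch estimates the linear projection in (1) and computes Newey-West (1987) autocorrelation- and heteroskedasticity-consistent standard errors. The simulated series, the lag truncation, and all parameter values are illustrative assumptions and not part of the survey.

import numpy as np

def newey_west_ols(y, X, lags):
    """OLS slope estimates with Newey-West HAC standard errors."""
    T, m = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                            # residuals
    # long-run variance of the score X_t * u_t with Bartlett weights
    S = (X * u[:, None]).T @ (X * u[:, None]) / T
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        G = (X[j:] * u[j:, None]).T @ (X[:-j] * u[:-j, None]) / T
        S += w * (G + G.T)
    V = T * XtX_inv @ S @ XtX_inv               # asymptotic covariance of beta-hat
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)                      # hypothetical predictor X_{t-1}
r = 0.01 + 0.0 * x + rng.standard_normal(T)     # returns with no true predictability
X = np.column_stack([np.ones(T), x])
beta, se = newey_west_ols(r, X, lags=4)
print("beta-hat:", beta, "HAC s.e.:", se)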

2.1. The economic importance of predictability

Having defined predictability in statistical terms, it appears natural to wonder why it has received such overwhelming attention since the advent of trading in financial securities. Clearly, as so eloquently emphasized by Roll (1988) in his American Finance Association Presidential Address, the ability to predict important phenomena is the hallmark of any mature science.³ However, predictability takes on several different connotations, for practitioners, individual investors, and academics, when it comes to stock markets. Practitioners and individual investors have understandably been excited about predictability in asset returns because, more often than not, they equate predictability with "beating the market." Though some academics exhibit similar unabashed excitement over discovering predictability, the academic profession's preoccupation with predictability is also based on more complex implications of return-predictability.

Consider the model for speculative prices presented by Samuelson (1965). Suppose that the world is populated by risk-neutral agents, all of whom have common and constant time preferences and common beliefs about future states of nature. In this world, stock prices will follow submartingales and, consequently, stock returns are a fair game [see also Mandelbrot (1966)]. Specifically, let p_t, the logarithm of the stock price, follow a submartingale, that is,

E(p_t \mid \Omega_{t-1}) = p_{t-1} + r ,    (2)

where r > 0 is the exogenously given risk-free rate. Stock returns, R_t, will therefore be given by a fair game, or,

E(R_t \mid \Omega_{t-1}) = r .⁴    (3)

In a risk-neutral world, therefore, it is clear that any predictability in stock returns as defined in (1) (that is, β ≠ 0) would have very strong implications for financial economics: any predictability in stock returns would necessarily imply that the stock market is informationally inefficient. An important assumption for this result to hold is that the risk-free rate is exogenously determined and does not vary through time. In fact, Roll (1968) shows that expected returns on Treasury bills would vary if there is any time-variation in expected inflation. This is probably the first recognition in the financial economics literature of the fact that

3 Roll's main focus is of course different from the focus of this paper. While we are interested in the predictability of future returns, he investigates our ability to explain movements in current stock returns using both past and current information.

4 It is important to note that the stock price p_t itself will not generally be a martingale in a risk-neutral world. Technically p_t should be understood as the "price" inclusive of reinvested dividends [see LeRoy (1989)]. Also, in this paper, the martingale behavior of stock prices is assumed to be an implication of risk-neutrality. It is important to note, however, that (a) risk neutrality does not ensure that stock prices will follow martingales [see Lucas (1978)] and (b) stock prices can follow martingales even if agents are risk-averse [see Ohlson (1977)].


asset prices may be predictable even in efficient stock markets, without the predictability resulting from changes in risk premia (see discussion below). Of course, market efficiency could be defined on a finer grid [see, for example, Roberts (1959) and Fama (1970)] depending on the type of information used at time t−1 to predict future returns. The stock market is weak-form, semi-strong-form, or strong-form efficient if stock returns are unpredictable using past stock prices, past publicly available information, and past private information, respectively.

Until the early seventies, the critical role of risk neutrality in determining the martingale behavior of stock prices was not evident. Consequently, it is not surprising that predictability became synonymous with market inefficiency in the financial economics literature. In fact, the academic literature reinforced the "real world" belief that predictability of stock returns was obvious evidence of mispricing of financial assets. This occurred in spite of the fact that, as early as 1970, Fama (1970) provided a very clear and precise discussion of the critical role of expected returns in determining the time-series properties of asset returns, and the unavoidable link between the basic assumption about expected returns and tests of market efficiency. By the late seventies, however, the work of LeRoy (1973) and Lucas (1978) had demonstrated the critical role played by risk preferences in the martingale behavior of stock prices in efficient markets [see also Hirshleifer (1975)]. And today most academics realize that predictability is not immediately synonymous with market inefficiencies because in a risk-averse world rational time-varying risk premia could lead to return-predictability. Nevertheless, one cannot a priori rule out the possibility that predictability in stock returns arises due to the irrational "animal spirits" of agents. Today, therefore, the existence of predictability has complex implications for financial economics.

Given the history of the economic implications of return-predictability, the past two decades have witnessed a fast-flowing stream of research on (a) whether stock returns are predictable, and (b) whether predictability reflects rational time-varying risk premia or irrational mispricing of securities [see Fama (1991)]. Fortunately, my task is limited to a review of the empirical methodology used to address issue (a) above; that is, to describe and evaluate the empirical techniques used to uncover any predictability in stock returns.

One final thought on the importance of return-predictability for the financial economics literature. There has been a fascination with testing capital asset pricing model(s), which is understandable because without a theoretically sound and empirically verifiable model (or models) of relative expected returns of fundamental financial securities such as common stock, the foundations of modern finance would be shaky. Return-predictability plays a crucial part in at least a subset of these tests; specifically, without reliable predictability of stock returns, the important distinction between unconditional and conditional tests becomes irrelevant. [The distinction between conditional and unconditional tests of asset pricing models is well elucidated by Gibbons and Ferson (1985).]


3. Predictability of stock returns: The methodology


I discuss the methodological contributions made to determining return-predictability under two broad categories. The first category includes all tests conducted to gauge predictability of stock returns based on information in past stock prices alone. The second category covers tests that use other publicly available past information to predict stock returns.

3.1. Predictability based on past returns


The simplest and most obvious test for gauging return-predictability is the auto-regression approach used in early studies that investigated predictability primarily in the short-run.

3.1.1. The regression approach: Short-term

Let X_{t−1} in (1) be limited to one variable: the past return on the stock, R_{t−1}. We
can then rewrite (1) as:
R_t = \mu + \phi_1 R_{t-1} + \varepsilon_t ,    (4)

where

\phi_1 = \frac{\mathrm{Cov}(R_t, R_{t-1})}{\mathrm{Var}(R_t)} = \frac{\gamma_1}{\gamma_0} .

We can similarly regress R_t on returns from any past period, t−k, to gauge predictability, with the corresponding autocorrelation coefficient being denoted by φ_k. The statistical significance of any predictability can be gauged, for example, by conducting a hypothesis test that any particular coefficient φ_j = 0. Such a test can be implemented using the asymptotic distribution of the vector of j-th-order autocorrelations [see Bartlett (1946)]

\sqrt{T}\,\hat{\phi} = \sqrt{T}\,[\hat{\phi}_1, \ldots, \hat{\phi}_j]' \stackrel{a}{\sim} N(0, I) ,    (5a)

where

\hat{\phi}_j = \frac{\frac{1}{T}\sum_{t=j+1}^{T}(R_t - \hat{\mu})(R_{t-j} - \hat{\mu})}{\frac{1}{T}\sum_{t=1}^{T}(R_t - \hat{\mu})^2} ,    (5b)

and T = total number of time-series observations in the sample. A joint test of the hypothesis φ_k = 0 for all k can also be conducted under the null hypothesis of no predictability using the Q-statistic introduced by Box and Pierce (1970), where

Q = T\sum_{j=1}^{k}\hat{\phi}_j^2 \stackrel{a}{\sim} \chi^2_k .    (6)
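As an illustration, the following short numpy sketch computes the first k sample autocorrelations of a return series and the Box-Pierce Q-statistic in (6); the simulated white-noise "returns" and the choice of k = 10 are purely illustrative.

import numpy as np

def autocorrelations(r, k):
    r = np.asarray(r, dtype=float)
    T = len(r)
    rc = r - r.mean()
    gamma0 = np.sum(rc * rc) / T
    return np.array([np.sum(rc[j:] * rc[:-j]) / T / gamma0 for j in range(1, k + 1)])

def box_pierce_q(r, k):
    T = len(r)
    phi = autocorrelations(r, k)
    return T * np.sum(phi ** 2), phi

rng = np.random.default_rng(1)
returns = rng.standard_normal(1000) * 0.02        # i.i.d. "returns" under the null
Q, phi = box_pierce_q(returns, k=10)
# Under the null each sqrt(T)*phi_j is roughly N(0,1), so |phi_j| > 2/sqrt(T) is the
# usual informal 5% cut-off; Q is compared with the chi-squared(10) critical value.
print("autocorrelations:", np.round(phi, 3))
print("Q(10) =", round(Q, 2), " (chi-squared(10) 5% critical value is about 18.3)")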

Given the early preoccupation with random walks, and Working's (1934) claim that random walks characteristically develop patterns similar to those observed in stock prices, several of the earlier studies concentrated on autocorrelation-based tests of randomness in stock prices [see Kendall (1953) and Fama (1965, 1970)]. These early empirical studies concluded that stock prices either follow random walks or that the observed autocorrelations in returns, though occasionally statistically significant, are economically trivial.⁵ The economic implications of any small autocorrelations in returns were also suspect once Working (1960) and Fisher (1966) showed that temporal and/or cross-sectional aggregation of stock prices could induce spurious predictability in returns, both at the individual-security and portfolio levels.

More recently, however, the short-term autocorrelation-based tests have taken different forms and have been motivated by different factors. Given that risk-aversion could lead to time-varying risk premia in stock returns, Conrad and Kaul (1988) hypothesize a parsimonious AR(1) model for conditional expected returns and test whether realized returns follow the implied ARMA representation. Specifically, let
R_t = E_{t-1}(R_t) + \varepsilon_t    (7a)

and

E_{t-1}(R_t) = \mu + \psi_1 E_{t-2}(R_{t-1}) + u_{t-1} ,    (7b)

where E_{t−1}(R_t) = conditional expectation of R_t at time t−1, ε_t = unexpected stock return, and |ψ_1| ≤ 1. Given the model in (7a) and (7b), realized stock returns will follow an ARMA(1,1) model of the form:

R_t = \mu + \psi_1 R_{t-1} + a_t + \theta_1 a_{t-1} ,    (8)

where |θ_1| ≤ 1. Note that the positive autocovariance in expected stock returns [see (7b)] will also induce positive autocovariance in realized returns. A positive shock to future expected returns, however, causes a contemporaneous capital loss which, in turn, leads to negative autocovariance in realized returns. Specifically, in (8) the autoregressive coefficient denotes the positive persistence parameter ψ_1, but the moving average parameter, θ_1, is negative [see Conrad and Kaul (1988) and Campbell (1991)]. Some researchers therefore argue that it may be very difficult to uncover any predictability in stock returns due to the confounding effects of changes in expected returns on stock prices. Nevertheless, using weekly returns Conrad and Kaul (1988) find that: (a) estimates of the autoregressive coefficient, ψ_1, are positive and range between 0.40 and 0.60, and (b) more importantly,

5 Granger and Morgenstern (1963) used spectral analysis to reach similar conclusions.


predictability in stock returns can explain up to 25 percent of the variation in the returns to a portfolio of small NYSE/AMEX firms. Given the rapidly mean-reverting component in weekly stock returns (recall that the ψ_1's range between 0.40 and 0.60), Conrad and Kaul (1989) show that predictability of monthly returns can be substantial when decreasing weights are given to past intra-month information. This occurs because the most recent intra-month information is most informative about next month's expected returns; using monthly data to predict monthly returns effectively ignores intra-month information by assigning equal weights to all past intra-month information. Specifically, define monthly continuously compounded stock returns R^m_t as
R^m_t = \sum_{k=0}^{3} R^w_{t-k} ,    (9)

where R^w_{t−k} = continuously compounded stock return in week t−k. From (7b) it follows that the monthly expected stock return for the current month is given by

E_{t-4}(R^m_t) = E_{t-4}\left[\sum_{k=0}^{3} R^w_{t-k}\right]
             = (1 + \psi_1 + \psi_1^2 + \psi_1^3)\, E_{t-4}(R^w_{t-3})
             = \pi_1 R^w_{t-4} + \pi_2 R^w_{t-5} + \cdots ,

where π_i = (−θ_1)^{i−1}(ψ_1 + θ_1)(1 + ψ_1 + ψ_1² + ψ_1³) for all i = 1, 2, 3, .... Therefore, the typical weights for past intra-month data would decline dramatically if we were interested in predicting monthly stock returns. Using geometrically declining weights on past weekly and daily returns, Conrad and Kaul (1989) show that up to 45 percent of the monthly returns of a portfolio of small firms can be explained based on ex ante information. On the other hand, studies using past monthly returns typically explain only 3 to 5 percent of variation in realized returns since they implicitly weigh all past intra-month information equally. Although recent autoregression-based (and variance-ratio-based, see Section 3.2) tests conducted on short-term returns reveal statistically and economically significant return predictability, a caveat is in order. Most of the short-run studies use weekly portfolio returns, and at least some of the observed predictability may be spuriously induced by market microstructure effects. Specifically, nonsynchronous trading could lead to nontrivial positive autocovariance in portfolio returns [see, for example, Boudoukh, Richardson, and Whitelaw (1994), Fisher (1966), Lo and MacKinlay (1990b), Muthuswamy (1988), and Scholes and Williams (1977)].
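To make the weighting scheme concrete, the sketch below computes the weights π_i implied by the expression above for assumed values of ψ_1 and θ_1 (chosen only for illustration, with ψ_1 in the range reported above and θ_1 negative); the weights decline geometrically at rate −θ_1.

import numpy as np

def intra_month_weights(psi1, theta1, n_weights=8):
    """Weights pi_i = (-theta1)^(i-1) (psi1 + theta1) (1 + psi1 + psi1^2 + psi1^3)."""
    scale = (psi1 + theta1) * (1.0 + psi1 + psi1**2 + psi1**3)
    i = np.arange(1, n_weights + 1)
    return (-theta1) ** (i - 1) * scale

# psi1 and theta1 below are illustrative assumptions, not estimates
weights = intra_month_weights(psi1=0.5, theta1=-0.2)
print(np.round(weights, 4))   # weights on R^w_{t-4}, R^w_{t-5}, ... decline geometrically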

3.1.2. The regression approach: Long-term


The early literature on short-term predictability in stock returns found small autocorrelations and concluded that this evidence supported market efficiency.


Alternatively, it was claimed that the lack of reliable predictability of returns implied that stock prices are close to their intrinsic value. There are however two problems with this conclusion. First, recent research (see above) has revealed nontrivial predictability of short-horizon returns [Conrad and Kaul (1988, 1989) and Lo and MacKinlay (1988)]. Second, as shown by Campbell (1991), small but very persistent variation in expected returns can have a dramatic impact on a security's stock price. In fact, Shiller (1984) and Summers (1986) argue that stock prices contain an important irrational component which takes long swings away from the fundamental value. This slowly mean-reverting component, however, cannot be detected in short-term stock returns. Stambaugh (1986a), in a discussion of Summers (1986), argues that although these long swings away from intrinsic value will not be detectable in short-term data, long-term returns should be significantly negatively autocorrelated. Fama and French (1988) formalize this basic intuition by proposing a model for asset prices which now forms the alternative hypothesis for virtually all (long-run) tests of market efficiency. Let the logarithm of the stock price, p_t, contain a random walk component, q_t, and a slowly decaying stationary component, z_t. Specifically,
p_t = q_t + z_t ,    (10)

where

q_t = \mu + q_{t-1} + \eta_t , \qquad \eta_t \sim \mathrm{iid}(0, \sigma^2_\eta) ,
z_t = \phi_1 z_{t-1} + \varepsilon_t , \qquad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2_\varepsilon) ,

and |φ_1| < 1 and E(η_t ε_t) = 0. The two components of stock prices, q_t and z_t, are also labeled the permanent and temporary components. Given the model for stock prices in (10), stock returns can be written as:

R_t = p_t - p_{t-1} = [q_t - q_{t-1}] + [z_t - z_{t-1}]
    = \mu + \eta_t + \varepsilon_t + (\phi_1 - 1)\sum_{i=1}^{\infty}\phi_1^{i-1}\varepsilon_{t-i} .    (11)

Fama and French (1988) suggest using the multiperiod autocorrelation coefficient to detect predictability by regressing a k-period return on its own value lagged one period (of length k). Specifically,
\sum_{i=1}^{k} R_{t+i} = \alpha(k) + \beta(k)\sum_{i=1}^{k} R_{t-i+1} + u_t(k) .    (12)

From (12) it is clear that β(k) measures the multiperiod autocorrelation, and the ordinary least squares estimator of this parameter is given by


\hat{\beta}(k) = \frac{\mathrm{Cov}\left[\sum_{i=1}^{k} R_{t+i},\; \sum_{i=1}^{k} R_{t-i+1}\right]}{\mathrm{Var}\left[\sum_{i=1}^{k} R_{t-i+1}\right]} .    (13a)

Some algebraic manipulation shows that the probability limit of β̂(k) is given by [see, for example, Jegadeesh (1991)]

\mathrm{plim}[\hat{\beta}(k)] = \frac{-(1-\phi_1^k)^2}{2\gamma k(1-\phi_1) + 2(1-\phi_1^k)} ,    (13b)

where γ = (1 + φ_1)σ²_η / 2σ²_ε = ratio of the unconditional variances of the returns attributable to the permanent versus temporary components, and the asymptotic variance of β̂(k) under the null hypothesis is given by

T\,\mathrm{Var}[\hat{\beta}(k)] = \frac{2k^2 + 1}{3k} .    (14)
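A minimal numpy sketch of the Fama-French regression (12) follows: it computes β̂(k) from overlapping k-period returns and compares it with the asymptotic standard error implied by (14). The simulated return series and the horizons are illustrative assumptions.

import numpy as np

def multiperiod_autocorrelation(r, k):
    r = np.asarray(r, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    kret = csum[k:] - csum[:-k]            # overlapping k-period returns
    y, x = kret[k:], kret[:-k]             # R_{t+1..t+k} on R_{t-k+1..t}
    x_c, y_c = x - x.mean(), y - y.mean()
    return np.sum(x_c * y_c) / np.sum(x_c * x_c)

rng = np.random.default_rng(2)
T = 720                                    # e.g. 60 years of monthly data (illustrative)
r = 0.01 + 0.04 * rng.standard_normal(T)   # returns with no mean reversion
for k in (12, 36, 60):
    beta_k = multiperiod_autocorrelation(r, k)
    se = np.sqrt((2 * k**2 + 1) / (3 * k * T))
    print(f"k={k:3d}  beta(k)={beta_k:+.3f}  asymptotic s.e.={se:.3f}")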

It is clear from (13) that the temporary component is entirely responsible for any predictability in stock returns [that is, if φ_1 = 1, plim[β̂(k)] = 0]. More importantly, with φ_1 close to unity, it follows that short-term returns [that is, small values of k in (12)] will exhibit small autocorrelations, while the negative autocorrelation will be large at long horizons (that is, for large k). Specifically, Fama and French (1988) argue that the negative autocorrelations in returns may exhibit a U-shaped pattern: close to zero at very short and long horizons, but significantly negative at reasonably long horizons. As the cumulation interval for returns k → ∞, plim[β̂(k)] → −1/2 due to the temporary component, but the variance of the permanent component of a k-period return will eventually dominate the variance of the temporary component since it increases linearly with k (that is, kγ → ∞ for very large k). This, in turn, will push plim[β̂(k)] up toward zero for large k.

Jegadeesh (1991) provides an alternative estimator of long-term return predictability [see also Hodrick (1992)]. He argues that, if stock prices follow the process in (10), power considerations (see Section 4) dictate that a single-period return should be regressed on a multi-period return. Specifically,
R_t = \alpha + \beta(1,k)\sum_{i=1}^{k} R_{t-i} + u_t .    (15)

The OLS estimator of β(1,k) is given by

\hat{\beta}(1,k) = \frac{\mathrm{Cov}\left[R_t,\; \sum_{i=1}^{k} R_{t-i}\right]}{\mathrm{Var}\left[\sum_{i=1}^{k} R_{t-i}\right]} .    (16a)

From (13) it follows that


\mathrm{plim}[\hat{\beta}(1,k)] = \frac{-(1-\phi_1)(1-\phi_1^k)}{2\gamma k(1-\phi_1) + 2(1-\phi_1^k)} ,    (16b)

and the asymptotic variance of β̂(1,k) under the null hypothesis of no predictability is given by

T\,\mathrm{Var}[\hat{\beta}(1,k)] = 1/k .    (17)

Comparing (16) with (13), we see that increasing the measurement interval of the dependent variable leads to a larger slope coefficient of the regression of long-term returns on lagged long-term returns if the alternative hypothesis is the model shown in equation (10). However, increasing the measurement interval of the dependent variable will also increase the standard error of the estimate [compare (17) with (14)]. Using Geweke's (1981) approximate-slope procedure to gauge the relative asymptotic power of β̂(k) versus β̂(1,k), Jegadeesh (1991) shows that the latter effect always dominates. Consequently, for reasonable parameter values, the optimal choice of k for the dependent variable is always unity. The choice of the measurement interval for the independent variable, however, depends on plausible parameter specifications for the alternative hypothesis. Not surprisingly, for φ_1 close to one, long measurement intervals are required to uncover predictability, while shorter measurement intervals are recommended if the share of the permanent component in the variance of returns, γ, is large. [A more detailed discussion of the power issues is presented in Section 4.]
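The following sketch makes this trade-off concrete by evaluating the probability limits in (13b) and (16b) together with the asymptotic standard deviations implied by (14) and (17); the values of φ_1, γ, and T are illustrative assumptions.

import numpy as np

def plim_beta_k(k, phi1, gamma):            # equation (13b)
    return -(1 - phi1**k) ** 2 / (2 * gamma * k * (1 - phi1) + 2 * (1 - phi1**k))

def plim_beta_1k(k, phi1, gamma):           # equation (16b)
    return -(1 - phi1) * (1 - phi1**k) / (2 * gamma * k * (1 - phi1) + 2 * (1 - phi1**k))

phi1, gamma, T = 0.98, 3.0, 720             # hypothetical parameter values
for k in (12, 36, 60):
    b_k, b_1k = plim_beta_k(k, phi1, gamma), plim_beta_1k(k, phi1, gamma)
    sd_k, sd_1k = np.sqrt((2 * k**2 + 1) / (3 * k * T)), np.sqrt(1 / (k * T))
    print(f"k={k:3d}  plim/sd for beta(k): {b_k/sd_k:+.2f}   for beta(1,k): {b_1k/sd_1k:+.2f}")

How the two signal-to-noise ratios compare depends on the assumed φ_1, γ, and k, which is precisely the point of the approximate-slope analysis in Section 4.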

3.2. The variance-ratio statistic


Another methodology extensively used in the literature to uncover the statistical and economic importance of the predictable component in economic time-series is the variance-ratio methodology. The variance-ratio statistic, however, was first used extensively by French and Roll (1986) to compare the behavior of stock-return volatility during trading and non-trading periods. Cochrane (1988) uses the variance-ratio statistic to measure the importance of the random walk (or permanent component) in aggregate output; Poterba and Summers (1988) use this methodology to assess the long-term predictability in returns within the context of mean reversion in prices [see (10)]; and Lo and MacKinlay (1988, 1989) provide the most formal analysis of the variance-ratio statistic to date to test the random walk hypothesis using short-term stock returns [see also Faust (1992)]. Despite the different contexts in which the variance-ratio statistic has been used in the economics literature, the ultimate purpose has been the same: to assess the importance of the predictable component in stock returns (or other economic time-series).⁶

6 As pointed out by Frank Diebold [see LeRoy (1989)], almost forty years before its introduction to finance, Working (1949) proposed that statistical series be modeled as the sum of a random walk and stationary components. More significantly, he also proposed the use of variance ratio tests to determine the relative importance of each component.


The basic intuition for the variance-ratio statistic follows directly from the random walk model for asset prices. If stock prices follow random walks, then the variance of a k-period return should be k times the variance of a single-period return. In other words, the variances of returns should increase in proportion to the measurement interval, k. The k-period variance ratio is defined as:

\hat{V}(k) = \frac{\mathrm{Var}\left(\sum_{i=1}^{k} R_{t+i}\right)}{k\,\mathrm{Var}(R_t)} - 1 ,    (18)

where, for convenience, the factor k is used in the denominator of the variance ratio and unity is subtracted from the ratio. The intuitively appealing aspect of the variance-ratio statistic, V̂(k), is that it will be equal to zero under the null hypothesis of no predictability. Moreover, as shown below, V̂(k) > 0 (< 0) depending on whether single-period returns are positively (negatively) autocorrelated (or, equivalently, whether there is mean reversion in security returns or security prices). Under the null hypothesis of no predictability, the asymptotic variance of V̂(k) is given by [see Lo and MacKinlay (1988) and Richardson and Smith (1991)]:

T\,\mathrm{Var}[\hat{V}(k)] = \frac{2(2k-1)(k-1)}{3k} .    (19)
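The following numpy sketch computes V̂(k) from overlapping k-period returns together with the asymptotic standard error implied by (19); the simulated i.i.d. series is an illustrative stand-in for the null of no predictability.

import numpy as np

def variance_ratio(r, k):
    r = np.asarray(r, dtype=float)
    T = len(r)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    kret = csum[k:] - csum[:-k]                       # overlapping k-period returns
    vr = np.var(kret, ddof=1) / (k * np.var(r, ddof=1)) - 1.0
    se = np.sqrt(2 * (2 * k - 1) * (k - 1) / (3 * k * T))
    return vr, se

rng = np.random.default_rng(3)
r = 0.002 + 0.02 * rng.standard_normal(1500)          # i.i.d. returns under the null
for k in (2, 4, 8, 16):
    vr, se = variance_ratio(r, k)
    print(f"k={k:2d}  V(k)={vr:+.3f}  asymptotic s.e.={se:.3f}")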

3.3. A synthesis
In this section, we present a synthesis of all the statistics presented to test for the existence of predictability in stock returns based on the information contained in past stock prices.⁷ All tests of return predictability discussed above are (approximately) linear combinations of autocorrelations in single-period returns. Under the null hypothesis of no predictability, all these statistics will therefore have zero expected values. However, the behavior of the various statistics could be substantially different under different alternative hypotheses because they place different weights on single-period autocorrelations of different lags. Recall from Section 3.1.1 that the asymptotic distribution of the vector of j-th-order autocorrelations is given by

\sqrt{T}\,\hat{\phi}(k) = \sqrt{T}\,[\hat{\phi}_1(k), \ldots, \hat{\phi}_j(k)]' \stackrel{a}{\sim} N(0, I) ,    (20a)

where k = the length of the measurement interval, and φ̂_j(k) = the j-th-order autocorrelation. For convenience, we redefine the j-th-order autocorrelation coefficient such that:

7 The discussion in this section is based in large part on the analysis in Richardson and Smith (1994). See also Daniel and Torous (1993).

\hat{\phi}_j(k) = \frac{\frac{1}{T}\sum_{t=j+1}^{T}(R_t - \hat{\mu})(R_{t-j} - \hat{\mu})}{\frac{1}{Tk}\sum_{t=k}^{T}\left(\sum_{i=0}^{k-1} R_{t-i} - k\hat{\mu}\right)^2} .    (20b)

Note that the j-th-order autocorrelation coefficient in (20b) is different from the one in (5b) in that the autocovariance is not weighted by the single-period variance. Instead, since the independent variables in both the Fama and French (1988) multiperiod autoregression (12) and Jegadeesh's (1991) modified autoregression (15) are k-period returns, the autocovariance in (20b) is weighted by a k-period variance. Clearly, under the null hypothesis of no predictability this modification to the j-th-order autocorrelation coefficient has no effect in large samples. However, under different alternative hypotheses, this seemingly minor modification could have nontrivial effects on inferences. As mentioned earlier, all the statistics discussed so far can be rewritten as weighted averages of the j-th-order autocorrelations, albeit with different weights. We can define the entire set of test statistics as linear combinations of autocorrelations, such that

\lambda_s(k) = \sum_{j} \omega_{js}\,\hat{\phi}_j(k) ,    (21)

where ω_{js} = weights assigned to the j-th-order autocorrelation by a particular test statistic, λ_s(k) [where s is the index for the test statistic]. Under the null hypothesis of no predictability, from (20a) it follows that

\sqrt{T}\,\hat{\lambda}_s(k) \stackrel{a}{\sim} N\!\left(0,\; \sum_j \omega_{js}^2\right) .    (22)

The normality of all the test statistics follows because each one is an (approximately) linear combination of j-th-order autocorrelations which, in turn, have asymptotically normal distributions under the null hypothesis [see (20a)]. And using (21), the three estimators may be rewritten as [see Cochrane (1988), Jegadeesh (1990), Lo and MacKinlay (1988), and Richardson and Smith (1994)]:

\hat{\beta}(k) = \sum_{j=1}^{2k-1} \frac{\min(j,\, 2k-j)}{k}\,\hat{\phi}_j(k) ,    (23a)

\hat{\beta}(1,k) = \frac{1}{k}\sum_{j=1}^{k} \hat{\phi}_j(k) ,    (23b)

and
8 A related stream of research measures the profitability of linear trading strategies of various horizons [see DeBondt and Thaler (1985) and Lehmann (1990)]. In these studies, the profits of trading strategies are functions of average autocovariances, both for individual securities and portfolios [see Ball, Kothari, and Shanken (1995), Conrad and Kaul (1994), Jegadeesh (1990), Jegadeesh and Titman (1993), and Lo and MacKinlay (1990a)].


\hat{V}(k) = 2\sum_{j=1}^{k-1}\frac{(k-j)}{k}\,\hat{\phi}_j(1) .    (23c)

Given the weights and the exact formulae in (23a)-(23c), it is simple to calculate the asymptotic variances of each of the estimators under the null hypothesis [or any other estimator of the form λ_s(k) = Σ_j ω_{js} φ̂_j(k)]. Specifically, T Var[λ_s(k)] = Σ_j ω²_{js}. Therefore, the asymptotic variances of the three estimators can be calculated as:

T\,\mathrm{Var}[\hat{\beta}(k)] = \frac{2k^2 + 1}{3k} ,    (24a)

T\,\mathrm{Var}[\hat{\beta}(1,k)] = 1/k , \quad\text{and}    (24b)

T\,\mathrm{Var}[\hat{V}(k)] = \frac{2(2k-1)(k-1)}{3k} .    (24c)

The appropriateness of a particular test statistic λ_s(k) will depend entirely on the alternative hypothesis under consideration. For example, suppose stock prices reflect "true" value but are recorded with well-behaved measurement errors caused by market microstructure effects, that is, the observed price p̃_t = p_t + e_t (where p_t = true price and e_t = random measurement error). Then clearly the alternative model for stock returns will follow an MA(1) process, and the optimal weights to detect such predictability would be ω_j = 0 for all j > 1. Any alternative weighting scheme would make the resulting test statistic inefficient [see Kaul and Nimalendran (1990)]. A more detailed examination of this important dependence between the choice of a particular test statistic λ_s(k) and the alternative hypothesis is provided in Section 4.

An additional important point made by Richardson and Smith (1994) in the context of the alternative test statistics used in the literature is that if the null hypothesis is true, then the estimators will be strongly correlated with each other. This occurs because β̂(k), β̂(1,2k), and V̂(2k) will tend to capture common sampling errors. Specifically, the asymptotic variance-covariance matrix of the three estimators can be written as:⁹

T\,\mathrm{Var}\begin{pmatrix}\hat{\beta}(k)\\ \hat{\beta}(1,2k)\\ \hat{V}(2k)\end{pmatrix} =
\begin{pmatrix}
\dfrac{2k^2+1}{3k} & \dfrac{1}{2} & k \\[6pt]
\dfrac{1}{2} & \dfrac{1}{2k} & \dfrac{2k-1}{2k} \\[6pt]
k & \dfrac{2k-1}{2k} & \dfrac{2(4k-1)(2k-1)}{6k}
\end{pmatrix} .    (25)

For large k, the correlations vary between 75% and 88%, and Richardson and Smith (1994) confirm the existence of high correlation between the three estimators in small samples. This issue is particularly important because Richardson (1993), for example, shows that the U-shaped patterns in autocorrelations predicted by the alternative fads model in (10) can obtain even if true prices are completely unpredictable. Given that we can falsely reject the null hypothesis based on β̂(k), it would not be very surprising if use of β̂(1,2k) and V̂(2k) also lead to the same conclusion.

9 Note that for ease of comparison across the three estimators, the variance-covariance matrix is calculated for β̂(k), β̂(1,2k), and V̂(2k).
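The weight representation makes these calculations easy to verify numerically. The sketch below builds the weight vectors in (23a)-(23c) for β̂(k), β̂(1,2k), and V̂(2k), recovers the asymptotic variances in (24a)-(24c) as Σ_j ω²_{js}, and computes the implied asymptotic correlations, which for the illustrative k used here fall in the 75%-88% range mentioned above.

import numpy as np

def weights_beta_k(k):        # Fama-French beta(k): w_j = min(j, 2k-j)/k, j = 1..2k-1
    j = np.arange(1, 2 * k)
    return np.minimum(j, 2 * k - j) / k

def weights_beta_1k(m):       # Jegadeesh beta(1,m): w_j = 1/m, j = 1..m
    return np.full(m, 1.0 / m)

def weights_vr(m):            # variance ratio V(m): w_j = 2(m-j)/m, j = 1..m-1
    j = np.arange(1, m)
    return 2.0 * (m - j) / m

def pad(w, n):                # embed a weight vector in a common length-n space
    out = np.zeros(n)
    out[:len(w)] = w
    return out

k = 10                        # illustrative horizon
n = 2 * k
W = np.vstack([pad(weights_beta_k(k), n),
               pad(weights_beta_1k(2 * k), n),
               pad(weights_vr(2 * k), n)])
names = ["beta(k)", "beta(1,2k)", "V(2k)"]

# T*Var and T*Cov of the statistics are W W' because the phi_j are asymptotically iid N(0, 1/T)
V = W @ W.T
print("T*Var:", dict(zip(names, np.round(np.diag(V), 3))))
corr = V / np.sqrt(np.outer(np.diag(V), np.diag(V)))
print("asymptotic correlations:\n", np.round(corr, 3))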


3.4. Predictability based on fundamental variables

Although predictability of stock returns based on past information in stock prices has received the overwhelming share of attention, several researchers gauge the predictability of stock returns using "fundamental" variables. In a seminal contribution to the predictability literature, Fama and Schwert (1977) use treasury bill rates to predict stock and bond returns [see also Fama (1981)]. Over the past decade, several new fundamental variables have been used to predict stock returns. For example, Campbell (1987), Campbell and Shiller (1988), Cutler, Poterba, and Summers (1991), Fama and French (1988, 1989), Flood, Hodrick, and Kaplan (1987), and Keim and Stambaugh (1986), among others, use financial variables such as dividend yields, price-earnings ratios, term structure variables, etc., to predict future stock returns. In a similar vein, Balvers, Cosimano, and McDonald (1990), Fama (1990), and Schwert (1990) have used macroeconomic fundamentals, such as output and inflation, to predict stock returns [see also Chen (1991)], while Seyhun (1992) uses aggregate insider-trading patterns to uncover predictable components in stock returns. Some recent papers by Ferson and Harvey (1991), Evans (1994), and Ferson and Korajczyk (1995) focus on the relation between predictability of stock returns based on lagged variables and economic "factors" similar to those identified by Chen, Roll, and Ross (1986). Ferson and Schadt (1996) show that conditioning on predetermined public information removes biases in commonly used unconditional measures of the performance of mutual fund managers; mutual fund managers "look better" using conditional measures. Finally, Jagannathan and Wang (1996) show that models that allow for time-varying expected returns on the market portfolio also have the potential to explain the rich cross-sectional variation in average returns on different stocks.

The typical regression estimated to uncover predictable components in stock returns using fundamental variables is similar to regression (12):

\sum_{i=1}^{k} R_{t+i} = \alpha(k) + \beta(k) X_t + u_t(k) ,    (26)

where X_t = dividend yield, output, .... The only difference between (12) and (26) lies in the use of past fundamentals in the latter versus past returns in (12). Also, with the exception of Hodrick (1992), multiperiod returns are regressed on the fundamentals typically measured over a fixed interval.¹⁰ The most significant findings of the studies estimating regressions similar to (26) are: (1) several different variables predict stock returns; and (2) in virtually all cases, the R̄²'s of the regressions increase dramatically as the length of the measurement interval for the dependent variable is increased. In effect, therefore, there is strong predictability in long-term stock returns.

10 Following Jegadeesh (1991), Hodrick (1992) regresses single-period returns on past dividends measured over multiple periods. See Section 4.1 for a discussion of the efficacy of this approach.


The more recent literature on return-predictability based on fundamental variables has therefore concentrated on long-term stock returns. This is quite natural, especially given that the most commonly used alternative time-series model for returns [see (10)] also implies greater predictability of long-term returns. In fact, the "excess volatility" literature, pioneered by Shiller (1981) and LeRoy and Porter (1981), can be viewed as the precursor of the vast literature on long-term return-predictability. This literature suggests that if stock prices are excessively volatile relative to subsequent movements in dividends, then long-term returns (or, more specifically, the "infinite-period log returns") are forecastable [see also Shiller (1989)]. [Also see the discussion below on the forecastability of long-term stock returns using past dividend yields.] It would also be fair to say that among all the potential variables that could be used to predict stock returns, dividend yields have received overwhelming attention [see, for example, Campbell and Shiller (1988a,b), Fama and French (1988b), Flood, Hodrick, and Kaplan (1987), Goetzmann and Jorion (1993), Hodrick (1992), and Rozeff (1984)]. The choice of the dividend yield variable again is no accident; fairly simple models of asset prices can be used to justify (a) the role of dividend yields in predicting stock returns, and (b) the stronger predictive power of dividend yields at long versus short horizons. Following Campbell and Shiller (1988a), consider the present value model of discounted dividends:
P_t = E_t\left[\sum_{i=1}^{\infty} \frac{D_{t+i}}{(1+R)^i}\right] .    (27)

Given constant growth rate of dividends, G, and constant expected returns, we obtain the Gordon (1962) model for stock prices (for R > G):
P_t = \left(\frac{1+G}{R-G}\right) D_t .    (28)

Campbell and Shiller (1988a) show that with time-varying expected returns, it is useful to study the loglinear approximation of the relation between prices, dividends, and returns. Using this approximation, the "dynamic" version of the dividend-growth model in (28) may be written as:
p_t = \frac{k}{1-\rho} + E_t\sum_{j=0}^{\infty} \rho^j\left[(1-\rho)d_{t+1+j} - r_{t+1+j}\right] ,    (29)

where ρ = 1/[1 + exp(d − p)], k = −log(ρ) − (1 − ρ) log(1/ρ − 1), all lower-case letters indicate logs of the respective variables, and (d − p) is the fixed mean of the (log) dividend-price ratio, which follows a stationary process.
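As a small numerical illustration, the sketch below computes ρ and k from an assumed mean dividend-price ratio (the roughly 4 percent dividend yield used here is only a hypothetical value).

import numpy as np

dp_bar = np.log(0.04)                       # assumed mean of d_t - p_t (illustrative)
rho = 1.0 / (1.0 + np.exp(dp_bar))          # rho = 1/[1 + exp(mean of d - p)]
k = -np.log(rho) - (1 - rho) * np.log(1 / rho - 1)
print(f"rho = {rho:.4f}, k = {k:.4f}")      # rho is close to, but below, one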


To demonstrate the importance of the dividend yield variable for predicting future stock returns, equation (29) can be rewritten in terms of the (log) dividend yield [see also Campbell, Lo, and MacKinlay (1993)]:
d_t - p_t = \frac{-k}{1-\rho} + E_t\sum_{j=0}^{\infty} \rho^j\left[-\Delta d_{t+1+j} + r_{t+1+j}\right] .    (30)

From (30) the potential predictive ability of dividend yields becomes obvious: the current dividend yield would proxy for future expectations of stock returns (the second term in brackets) as long as future dividend growth rates (the first term in brackets) are not too variable. Also, since we discount all future returns in (30), the current yield is likely to have greater predictive ability for long-term stock returns.¹¹

Given the economic justification for estimating regressions similar to (26), instead of comparatively ad hoc autoregressions similar to (12) or (15), until recently the startling evidence from the "fundamental regressions" was not viewed with suspicion. For example, Jegadeesh (1991), in investigating the power of autoregressions such as (12), reflects the general belief that "... the evidence that the returns at various horizons can be predicted using these [fundamental] variables does not seem to be controversial" (p. 1428). However, there are statistical problems associated with (long-run) regressions such as (26) caused by the unavoidable use of small sample sizes when k is large. The first problem [analyzed by Nelson and Kim (1993) and Goetzmann and Jorion (1993)] deals with bias in the OLS estimator of β(k) because dividend yields (or other fundamental variables) are lagged endogenous variables. The second statistical problem results from the fact that the OLS standard errors of β̂(k) are also biased [see Hodrick (1992), Kim, Nelson, and Startz (1991), Richardson and Smith (1991), and Richardson and Stock (1989)]. The analysis of Mankiw and Shapiro (1986) and Stambaugh (1986b) suggests that the small-sample bias in β̂(k) could be substantial. Consider, for example, the bivariate system [see also Nelson and Kim (1993)]:
Y_t = \alpha + \beta X_{t-1} + \varepsilon_t , \qquad \varepsilon_t \sim \mathrm{iid}(0, \sigma^2_\varepsilon)    (30a)

and

X_t = \mu + \phi X_{t-1} + \eta_t , \qquad \eta_t \sim \mathrm{iid}(0, \sigma^2_\eta)    (30b)

E(\varepsilon_t \varepsilon_{t-k}) = E(\eta_t \eta_{t-k}) = E(\varepsilon_t \eta_{t-k}) = 0 \quad \forall\; k \neq 0 .

It can be shown that although β̂_OLS in (30a) is consistent, it is biased in small samples, and the bias is proportional to the bias in the OLS estimator of φ [see Stambaugh (1986b)]:

11 Campbell, Lo, and MacKinlay (1993) also demonstrate how a highly persistent expected return component [that is, a ψ_1 close to 1 in (7b)] could also lead to increased predictive ability of dividend yields (and other fundamental variables) at long horizons.


E[(\hat{\beta} - \beta)] = \frac{\mathrm{Cov}(\varepsilon_t, \eta_t)}{\mathrm{Var}(\eta_t)}\, E[(\hat{\phi} - \phi)] .    (31a)

And Kendall (1954) shows that the bias in φ̂_OLS is approximately of the order of −(1 + 3φ)/T, where T is the sample size. Consequently,

E[(\hat{\beta} - \beta)] \simeq \frac{\mathrm{Cov}(\varepsilon_t, \eta_t)}{\mathrm{Var}(\eta_t)}\left[-\frac{(1+3\phi)}{T}\right] .    (31b)
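A small Monte Carlo sketch of this bias follows: it simulates the system (30a)-(30b) with β = 0, a persistent predictor, and strongly negatively correlated innovations, and compares the average OLS slope with the approximation in (31b). All parameter values (T, φ, and the correlation) are illustrative assumptions.

import numpy as np

def simulate_bias(T=60, phi=0.95, corr=-0.9, n_sim=5000, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.empty(n_sim)
    for s in range(n_sim):
        # contemporaneously correlated innovations (eps_t, eta_t) with unit variances
        z = rng.standard_normal((T + 1, 2))
        eps = z[:, 0]
        eta = corr * z[:, 0] + np.sqrt(1 - corr**2) * z[:, 1]
        x = np.empty(T + 1)
        x[0] = 0.0
        for t in range(1, T + 1):
            x[t] = phi * x[t - 1] + eta[t]
        y = eps[1:]                          # beta = 0, so Y_t is just its innovation
        xl = x[:-1]                          # lagged predictor X_{t-1}
        xc = xl - xl.mean()
        betas[s] = np.sum(xc * (y - y.mean())) / np.sum(xc * xc)
    return betas.mean()

T, phi, corr = 60, 0.95, -0.9
mc_bias = simulate_bias(T, phi, corr)
approx = corr * (-(1 + 3 * phi) / T)         # (31b) with Var(eta) = Var(eps) = 1
print(f"Monte Carlo mean of beta-hat: {mc_bias:+.4f}   approximation (31b): {approx:+.4f}")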

From (31a) and (31b), it follows that even if X_{t−1} truly has no explanatory power in predicting Y_t, the small-sample bias in estimating φ results in spurious predictability. The spurious predictability will be stronger: (a) the higher the correlation coefficient between the innovations ε_t and η_t; (b) the higher the autocorrelation in X_t; and (c) the smaller the sample size.

The second problem with regression (26) is that due to small sample sizes, most researchers use overlapping observations for k-period returns (that is, the dependent variable) which, in turn, induces serial correlation in the errors. Traditional OLS standard errors are appropriate asymptotically only if there is no serial correlation in returns. Hansen and Hodrick (1980) provide autocorrelation-consistent asymptotic standard errors which can be modified for heteroskedasticity [see Hodrick (1992)]. Richardson and Smith (1991) use an innovative approach to derive asymptotic standard errors that replace the Hansen and Hodrick (1980) standard-error adjustments with a very simple form independent of the data. For example, the asymptotic variances of the three autocorrelation-based estimators take the same form as in (24a)-(24c). Hodrick (1992) provides heteroskedasticity-consistent counterparts to the Richardson and Smith (1991) standard errors within the context of regression (26).¹² [Section 4.1 contains a detailed analysis of the efficiency gains from using overlapping observations in estimating regressions similar to (26).]

Nelson and Kim (1993) address both problems of biased OLS estimators of β(k) and biases in their standard errors by jointly modeling stock returns and dividend yields as a first-order vector autoregressive (VAR) process [see also Hodrick (1992)]. Specifically, let
Z_t = A Z_{t-1} + u_t ,    (32)

where Z_t represents stock returns and lagged dividend yields. To assess the bias in β̂(k) and the properties of the asymptotic standard errors in small samples, both Hodrick (1992) and Nelson and Kim (1993) simulate the VAR model in (32) under the null that the slope coefficients in the return equation are zero. The VAR approach is attractive because it directly addresses the issue of persistence in dividend yields [see φ in (30b)] and the strong (negative) contemporaneous

12 See also Newey and West (1987) for autocorrelation- and heteroskedasticity-consistent variance estimators that are positive semidefinite.


correlation between innovations in stock returns and dividend yields [proxied by ε_t and η_t in (30a) and (30b), respectively]. Both Hodrick (1992) and Nelson and Kim (1993) find that inferences could be substantially altered by correcting for (a) the small-sample bias in β̂(k) induced by the endogeneity of dividend yields; and (b) the small-sample bias in the asymptotic standard errors suggested in the literature [see also Goetzmann and Jorion (1993)].¹³

On a more general level, however, all the tests of predictability will run into data-snooping problems. For example, Lo and MacKinlay (1990c) show how grouping stocks into portfolios based on an empirical regularity (such as the size effect) can bias statistical tests. Of more direct concern to us, however, is the work of Foster and Smith (1992) and Lo and MacKinlay (1992), who analyze the properties of the maximal R², a widely used measure of the extent of predictability in several scientific contexts [see, for example, Roll (1988)]. Foster and Smith (1992) derive the distribution of the maximal R² when a researcher chooses predictor variables from a set of available ones. Consider, for example, a multiple regression:
Y_t = \alpha + \beta X_t + \varepsilon_t , \qquad \varepsilon_t \sim N(0, \sigma^2) ,    (33)

where X_t is a matrix of k regressors. Under the null hypothesis that the vector β = 0, the R² of regression (33) is distributed Beta[k/2, (T−(k+1))/2], where T is the sample size. The distribution of the R² can then be used to assess the goodness-of-fit of regression (33). The assumption is that researchers choose k predictors from a potential pool of M regressors, and the cut-off R² needs to be adjusted for this choice. Using order-statistic arguments, Foster and Smith (1992) show that for independent regressions the distribution function of the maximal R² is given by

F_{R^2_{\max}}(r) = \Pr\left[R^2_1 \leq r,\; R^2_2 \leq r, \ldots, R^2_M \leq r\right] = \left[\mathrm{Beta}(r)\right]^M ,    (34)

where Beta(r) is the cumulative distribution function of the beta density function with k/2 and (T−(k+1))/2 degrees of freedom. Given that non-independent regressions are estimated in the literature, equation (34) provides a lower bound for the true distribution function of the maximal R². Foster and Smith (1992) show that we could generate reasonably high R²'s that do not exceed the maximal R² under the assumption of β = 0 in (33), even if we "snoop" a few predictors from a limited set of potential regressors. Since the (independent and even overlapping) observations (T) in long-run studies are

13 The regression of overlapping returns (even under the null hypothesis of no predictability) on highly autocorrelated dividend yields and/or prices potentially also suffers from the spurious regression phenomenon illustrated by Granger and Newbold (1974).


likely to be small, from (34) it follows that one can more easily produce spuriously high values of R²'s in long-run versus short-run regressions.¹⁴

In a related paper, Lo and MacKinlay (1992) explicitly maximize the predictability of stock returns to, among other things, provide a gauge of whether the predictability uncovered in the literature is economically significant or not. They maximize predictability by varying the dependent variable (specifically, the composition or portfolio weights of the stock portfolios whose returns are being predicted), while holding fixed the regressors in (33). Foster and Smith (1992), on the other hand, maximize predictability across subsets of predictors while keeping fixed the asset returns being predicted. Nevertheless, both studies provide useful bounds on maximal R² values that can be achieved in empirical studies purely by chance.
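The order-statistic argument in (34) is easy to illustrate by simulation. The sketch below draws M independent null R² values from the Beta(k/2, (T−k−1)/2) distribution and records the largest one; the values of k, T, and M are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
k, T, M, n_sim = 3, 60, 25, 20000

# draw M independent null R-squared values per replication and keep the maximum
r2 = rng.beta(k / 2.0, (T - k - 1) / 2.0, size=(n_sim, M))
max_r2 = r2.max(axis=1)
print("95th percentile of a single null R-squared:", round(np.quantile(r2[:, 0], 0.95), 3))
print("95th percentile of the maximal R-squared  :", round(np.quantile(max_r2, 0.95), 3))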

4. Power comparisons
Until now we have concentrated on the statistical properties of test statistics used in the literature to gauge predictability in stock returns under the null hypothesis of no predictability. However, critical to any statistic is its power in discerning departures from the null hypothesis. The power of a test statistic can be determined within the context of a specific alternative hypothesis. The most common approach for evaluating the power of a statistic is to use computer-intensive simulations under different alternative hypotheses [see, for example, Hodrick (1992), Lo and MacKinlay (1989), Kim and Nelson (1993), and Poterba and Summers (1988)]. A classic example of such power comparisons is the exhaustive investigation of the size and power (against several alternative hypotheses) of the variance-ratio statistic in finite samples by Lo and MacKinlay (1989).

Although small sample sizes that are characteristic of long-run studies may make a computer-intensive approach unavoidable for determining the finite-sample properties of any particular statistic, some recent studies suggest that asymptotic power comparisons can help us understand the reasons for the different (or similar) behavior of test statistics under alternative hypotheses. Specifically, Campbell, Lo, and MacKinlay (1993), Hansen and Hodrick (1980), Jegadeesh (1991), and Richardson and Smith (1991, 1994), among others, use the Bahadur (1960) and Geweke (1981) procedure to compare the relative asymptotic power of test statistics, which requires a comparison of their approximate slopes. The approximate slope of a test statistic, denoted by c_s, is defined as the rate at which the logarithm of the asymptotic marginal significance level of the statistic declines, under a given alternative hypothesis, as the sample size is increased. Geweke (1981) shows that when the limiting distribution of a test statistic λ_s(k) under the null hypothesis is χ², its approximate slope is equal to the probability limit, under the alternative, of 1/T times the test statistic.

14 The unreliability of R²'s in long-run studies that use overlapping stock returns as dependent variables to increase T is also emphasized in Granger and Newbold (1974).


As an illustration of power comparisons, let us assume that the alternative hypothesis is described by the temporary-permanent stock price model shown in (10). The choice of this alternative is attractive because of its widespread use in the literature. Also, following Jegadeesh (1991) and Richardson and Smith (1991, 1994), let us compare the relative asymptotic powers of the three main autocorrelation-based statistics, β̂(k), β̂(1,2k), and V̂(2k). Note that the choice of these statistics is also natural because, given that they are linear combinations of consistent autocorrelation estimators [see (21)], they have limiting χ² distributions. This, in turn, enables us to directly use Geweke's (1981) procedure to conduct power comparisons. Noting that all the autocorrelation-based statistics are given by λ_s(k) = Σ_j ω_{js} φ̂_j(k), we need to choose ω and k to maximize the approximate slope of a particular test statistic λ_s(k) [see Richardson and Smith (1994)]:
c_s = \left\{\omega\,\mathrm{plim}[\hat{\phi}(k)]\right\}'\left\{\omega\omega'\right\}^{-1}\left\{\omega\,\mathrm{plim}[\hat{\phi}(k)]\right\} .    (35)

The only unknowns in (35) are the probability limits of φ̂(k), which can be determined easily given the alternative model in (10). Specifically,

\mathrm{plim}[\hat{\phi}_j(k)] = \frac{-[1/(1+\gamma)]\,\phi_1^{j-1}(1-\phi_1)^2}{2[\gamma/(1+\gamma)](1-\phi_1) + 2[1/(1+\gamma)](1-\phi_1^k)/k} .    (36)

Substituting the values of plim[φ̂_j(k)] from (36) into (35), we can find the test with the maximal approximate slope and use it as a benchmark to gauge the relative power of all existing test statistics. Specifically, maximizing c_s in (35) with respect to ω and k, we obtain:
\max_{\omega,k}\, c_s = \max_{\omega,k}\left[\frac{[1/(1+\gamma)](1-\phi_1)^2}{2[\gamma/(1+\gamma)](1-\phi_1) + 2[1/(1+\gamma)](1-\phi_1^k)/k}\right]^2 \left[\frac{\left(\sum_j \omega_j \phi_1^{j-1}\right)^2}{\sum_j \omega_j^2}\right], \qquad \omega_j = \phi_1^{j-1} .    (37)

As Richardson and Smith (1994) note, there are two separate parts to this maximization problem in (37). The first part in brackets is clearly maximized as k is increased, but the marginal gain from increasing k decreases at a rate which is a function of the two unknowns, γ (the share of the variance of the permanent versus the temporary component of stock prices) and φ_1 (the persistence parameter of the temporary component). The second component involves a choice of the weights, ω, which depend only on φ_1 because it fully explains the autocorrelation pattern under the alternative model in (10). And given a fixed φ_1, the optimal weights are ω_j = φ_1^{j−1} for all j; that is, the optimal weights for the asymptotically most powerful statistic will decline geometrically.

From the above discussion it would appear that the variance-ratio statistic, V̂(2k), which places declining weights on autocorrelations, should exhibit the maximum power compared to both the β̂(1,2k) statistic, which places equal weights on autocorrelations, and β̂(k), which places virtually no weight on the


very informative low-order autocorrelations [see (23a)-(23c)]. However, Richardson and Smith's (1994) explicit approximate-slope comparisons reveal that the β̂(1,2k) statistic fares as well as the V̂(2k) statistic in detecting departures from the null when the alternative model is of the form in (10). The answer to this puzzling result lies in the use of multiple-period returns in β̂(1,2k) versus single-period returns in V̂(2k) for weighting the autocovariances [compare (16a) with (18)]. Thus, the choice of k = 1 for the variance ratio, V̂(2k), reduces its power because the first term in (37) is not maximized. Conversely, the choice of k > 1 for β̂(1,2k) increases its power; however, the flat (as opposed to geometrically declining) weights hurt its power. This useful insight, obtained from theoretical power comparison of the tests, helps us understand the sources of the apparently similar power [given the alternative model in (10)] of two seemingly different test statistics.
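The following sketch carries out a version of this approximate-slope comparison: it evaluates c_s = (ω′ plim φ̂)²/(ω′ω) for the weight vectors of β̂(k), β̂(1,2k), V̂(2k), and the geometrically declining weights from (37), using the probability limits in (36). The parameter values, and the truncation of the weight vectors at 2k−1 lags, are illustrative assumptions.

import numpy as np

def plim_phi(j, k, phi1, gamma):            # equation (36) for the j-th autocorrelation
    num = -(1.0 / (1 + gamma)) * phi1 ** (j - 1) * (1 - phi1) ** 2
    den = 2 * (gamma / (1 + gamma)) * (1 - phi1) + 2 * (1.0 / (1 + gamma)) * (1 - phi1**k) / k
    return num / den

def approx_slope(weights, k_den, phi1, gamma):
    j = np.arange(1, len(weights) + 1)
    p = plim_phi(j, k_den, phi1, gamma)
    return (weights @ p) ** 2 / (weights @ weights)

phi1, gamma, k = 0.98, 3.0, 36              # hypothetical parameter values
j = np.arange(1, 2 * k)
stats = {
    "beta(k)":    (np.minimum(j, 2 * k - j) / k,   k),      # k-period returns on both sides
    "beta(1,2k)": (np.full(2 * k, 1.0 / (2 * k)),  2 * k),  # one-period on 2k-period returns
    "V(2k)":      (2.0 * (2 * k - j) / (2 * k),    1),      # single-period denominator
    "geometric":  (phi1 ** (j - 1),                2 * k),  # weights from (37), truncated
}
for name, (w, k_den) in stats.items():
    print(f"{name:10s} approximate slope = {approx_slope(w, k_den, phi1, gamma):.5f}")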
4.1. Overlapping observations

A large part of the literature on stock-return predictability has concentrated on long-run predictability, using both past returns and/or fundamental variables. However, since "theory" is silent about what constitutes the long run, empirical studies have used holding periods of five to 10 years in gauging the existence of predictability. A paucity of historical data, however, makes it difficult to obtain more than a handful of independent (that is, nonoverlapping) observations on long-term returns. For example, between 1926 (the starting date of the CRSP tapes) and 1994, there are only 14 nonoverlapping five-year intervals. Such small samples make inferences very unreliable, and it is not surprising that the past decade has witnessed several attempts to extract as much information as possible out of the limited historical data at hand.

A natural solution to the small-sample problem is to use overlapping data, and this has been the choice of most empiricists. Hansen and Hodrick (1980) use the asymptotic slope procedure of Bahadur (1960) and Geweke (1981) to show that overlapping data leads to an increase in the asymptotic efficiency of estimators of long-run relations. Richardson and Smith (1991) quantify the efficiency gains from the use of overlapping data when past returns are used to predict future returns (see Section 3.1). They show that overlapping data provide approximately 50% more "observations" relative to the nonoverlapping data used for the same period. However, Boudoukh and Richardson (1994) demonstrate that the efficiency gains from the use of overlapping data may be severely diluted when long-term predictability is measured by estimating the information content in fundamental variables [see regression (26)]. Specifically, if the fundamental variables used to predict stock returns are highly autocorrelated, which they invariably are [see, for example, Keim and Stambaugh (1986) and Fama and French (1988b)], the efficiency gains from the use of overlapping data dwindle rapidly. Also, other commonly suggested procedures may actually be even more inefficient than using overlapping observations.


Consider, for example, regression (26) estimated using nonoverlapping data and a single predictor variable; that is, the data are sampled every k periods, leading to a sample size of T/k k-period observations. The asymptotic variance of β̂(k) is given by

T\,\mathrm{Var}[\hat{\beta}(k)] = k^2\,\frac{\sigma^2_R}{\sigma^2_X} ,    (38)

where σ²_R and σ²_X are the variances of single-period returns and the independent variable X_t. Suppose overlapping observations are used to estimate (26) instead, and let the predictor variable follow an autoregressive model of the form X_t = μ_x + φ_x X_{t−1} + e_t, with 0 < φ_x < 1.0.¹⁵ Under these conditions, Boudoukh and Richardson (1994) show that the asymptotic variance of the overlapping estimator of β(k), denoted by β̂_O(k), is given by

T\,\mathrm{Var}[\hat{\beta}_O(k)] = \frac{\sigma^2_R}{\sigma^2_X}\left[k + \frac{2\phi_x}{1-\phi_x}\left(k - 1 - \frac{\phi_x\left(1-\phi_x^{k-1}\right)}{1-\phi_x}\right)\right] .    (39)

Note that while the asymptotic variances of both the nonoverlapping and overlapping estimators, β̂(k) and β̂_O(k), increase with an increase in the measurement interval of returns, k, the asymptotic variance of the latter also increases with φ_x, the autoregressive parameter of the predictor variable process. In fact, Boudoukh and Richardson (1994) show that with 720 months of data and φ_x = 0.99 (a sample size and autoregressive parameter common to several long-run studies), β̂_O(k) based on five-year overlapping intervals would be as efficient as the estimator β̂(k) based on only 14 five-year nonoverlapping intervals! The importance of the autoregressive parameter φ_x in reducing the efficiency gains from using overlapping data can be seen directly from a comparison of (38) and (39): with φ_x = 0, the nonoverlapping data is less efficient by a factor of k, the length of the long-term interval.

Unfortunately, an intuitively appealing alternative approach to resolving this small-sample problem may actually be worse than using overlapping data, in spite of the fact that this approach has the advantage of avoiding the calculation of autocorrelation-consistent standard errors. Specifically, following Jegadeesh (1991), Hodrick (1992) suggests that β(k) in (26) be estimated by using single-period returns as the dependent variable, while using the predictor variable aggregated over k periods [see also Cochrane (1991)]. Although the asymptotic efficiencies of this alternative estimator, β̂_A(k), and the overlapping estimator, β̂_O(k), are identical under the assumption that φ_x = 0, Boudoukh and Richardson (1994) show that given the finite history of data available to us, the efficiency of β̂_A(k) is much lower than the efficiency of β̂_O(k), especially the larger the measurement interval, k, and the higher the autocorrelation in the predictor variable. This lower efficiency is primarily due to the fact that the denominator of β̂_A(k) is a k-period variance of X_t, while the denominator of β̂_O(k) is only a single-period variance of X_t.

15 A first-order autoregressive model for X_t may be appropriate because, although most predictor variables have autocorrelations at lag 1 that are close to 1.0, higher-order autocorrelations typically decay fairly rapidly [see Keim and Stambaugh (1986)].


In finite samples, the k-period variance of X_t will be measured much more inefficiently than its single-period variance.

The above discussion therefore suggests that commonly used approaches to resolving the small-sample problem inherent to long-run studies may be unsatisfactory. Does this imply that long-run regressions have a bleak future? The answer clearly is no. From an economic standpoint, most rational or irrational sources of predictability may be discernible only in the long run (see Sections 3.1 and 3.4). And ongoing research suggests that even from a statistical standpoint long-run regressions may be informative, in spite of the small-sample-related efficiency problems associated with such regressions. For example, Stambaugh's (1993) recent work suggests that violations of OLS assumptions for regressions similar to (26) [for example, the well-documented heteroskedasticity in stock returns not directly dealt with in this review] may actually enhance the efficiency of long-run regressions relative to their short-run counterparts; and the relative efficiency gain is even greater for overlapping versus nonoverlapping long-run regressions. Also, the work of Campbell (1993) and Stambaugh (1993) shows that the efficiency gains from overlapping data are magnified for nonzero β(k) alternatives in (26).
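To see the magnitudes involved, the sketch below evaluates the ratio of the overlapping to the nonoverlapping asymptotic variance from (38) and (39), with σ²_R = σ²_X = 1, for a five-year horizon in monthly data; the values of k and φ_x are illustrative assumptions.

import numpy as np

def var_nonoverlap(k):                       # equation (38) with unit variances
    return float(k ** 2)

def var_overlap(k, phi_x):                   # equation (39) with unit variances
    return k + (2 * phi_x / (1 - phi_x)) * (k - 1 - phi_x * (1 - phi_x ** (k - 1)) / (1 - phi_x))

k = 60                                       # five years of monthly observations
for phi_x in (0.0, 0.9, 0.99):
    ratio = var_overlap(k, phi_x) / var_nonoverlap(k)
    print(f"phi_x={phi_x:4.2f}  T*Var(overlapping) / T*Var(nonoverlapping) = {ratio:.3f}")

With φ_x = 0 the ratio is 1/k, the full efficiency gain; as φ_x approaches one the ratio moves toward unity, illustrating how a highly persistent predictor dilutes the gains from overlapping data.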

5. Conclusion

In this paper, I attempt to provide a review of the broad spectrum of empirical methods commonly used to uncover predictable patterns in stock returns. I have made a conscious effort to limit discussion of empirical facts to the extent that they are relevant to (and perhaps motivate) the development and/or application of new statistical techniques. This review therefore concentrates on the statistical properties of the most widely used techniques. I have presented both the strengths and shortcomings of the statistical procedures because there is no substitute for robust empirical "facts." Robust facts become the basis for most subsequent theoretical and empirical research.¹⁶ Specifically, given that stock returns contain predictable components, it is then imperative to determine the economic significance of such predictability.

Broadly speaking, two approaches have recently been used to evaluate the economic significance of stock-return predictability. The first approach attempts to assess whether the predictability is due to "animal spirits" or time-varying risk premia using different econometric and modeling techniques [see, for example, Bekaert and Hodrick (1992), Bollerslev and Hodrick (1995), Fama and French (1993), Ferson and Harvey (1991), Ferson and Korajczyk (1995), and Jones and Kaul (1996)].

16 Of course, given that most empirical studies in finance are based on historical data of surviving firms, any stylized fact has to outlive biases induced by the use of survived data [see Brown, Goetzmann, and Ross (1995)].


The second approach involves a determination of the uses of predictability to investors making asset allocation decisions. For example, Breen, Glosten, and Jagannathan (1989) show that the predictability of stock returns using Treasury bill rates has economic significance in the sense that the services of a portfolio manager who makes use of the forecasting model to shift funds between bills and stocks would be worth an annual management fee of 2% of the value of the managed assets [see also Pesaran and Timmermann (1995)]. In a more recent paper, Kandel and Stambaugh (1996) demonstrate that even statistically weak predictability of asset returns can materially affect a risk-averse Bayesian investor's portfolio decisions.
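As a purely illustrative sketch of this second approach (this is not the actual procedure of Breen, Glosten, and Jagannathan (1989); the return series, bill rates, and forecast signal below are placeholder arrays), the following Python fragment compares a switching strategy, which holds stocks only when the forecast excess return is positive, with holding stocks throughout, expressed as an annualized return difference that can be read as a rough break-even management fee.

```python
import numpy as np

def switching_value(pred_excess, stock_ret, bill_ret):
    """Annualized return difference between a bills/stocks switching strategy
    (hold stocks only when the forecast excess return is positive) and holding
    stocks throughout; a rough proxy for a break-even management fee."""
    strategy = np.where(pred_excess > 0.0, stock_ret, bill_ret)
    annualize = lambda ret: (1.0 + ret).prod() ** (12.0 / len(ret)) - 1.0
    return annualize(strategy) - annualize(stock_ret)

# Placeholder inputs: in an actual application pred_excess would be an out-of-
# sample forecast from a model of stock returns on lagged Treasury bill rates.
rng = np.random.default_rng(1)
stock = rng.normal(0.008, 0.04, size=360)   # hypothetical monthly stock returns
bills = np.full(360, 0.004)                 # hypothetical monthly bill returns
noisy_forecast = (stock - bills) + rng.normal(0.0, 0.03, size=360)
print(round(switching_value(noisy_forecast, stock, bills), 4))
```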

References
Allen, F. and R. Karjalainen (1993). Using genetic algorithms to find technical trading rules. Working Paper, University of Pennsylvania, Philadelphia, PA.
Bahadur, R. R. (1960). Stochastic comparison of tests. Ann. Math. Statist. 31, 276-297.
Balvers, R. J., T. F. Cosimano, and B. McDonald (1990). Predicting stock returns in an efficient market. J. Finance 45, 1109-1128.
Ball, R., S. P. Kothari, and J. Shanken (1995). Problems in measuring portfolio performance: An application to contrarian investment strategies. J. Financ. Econom. 38, 79-107.
Bartlett, M. S. (1946). On the theoretical specification of sampling properties of autocorrelated time series. J. Roy. Statist. Soc. 27, 1120-1135.
Bekaert, G. and R. J. Hodrick (1992). Characterizing predictable components in equity and foreign exchange rates of return. J. Finance 47, 467-509.
Bollerslev, T., R. Y. Chou, and K. F. Kroner (1992). ARCH modeling in finance: A review of theory and empirical evidence. J. Econometrics 52, 5-59.
Bollerslev, T. and R. J. Hodrick (1995). Financial market efficiency tests. In: M. Hashem Pesaran and Mike Wickens, eds., Handbook of Applied Econometrics. Basil Blackwell, Oxford, UK.
Boudoukh, J. and M. P. Richardson (1994). The statistics of long-horizon regressions revisited. Math. Finance 4, 103-119.
Boudoukh, J., M. P. Richardson, and R. F. Whitelaw (1994). A tale of three schools: Insights on autocorrelations of short-horizon security returns. Rev. Financ. Stud. 7, 539-573.
Box, G. E. P. and D. A. Pierce (1970). Distribution of the residual autocorrelations in autoregressive moving average time series models. J. Amer. Statist. Assoc. 65, 1509-1526.
Breen, W., L. R. Glosten, and R. Jagannathan (1989). Economic significance of predictable variations in stock returns. J. Finance 44, 1177-1189.
Brown, S. J., W. N. Goetzmann, and S. A. Ross (1995). Survival. J. Finance 50, 853-873.
Campbell, J. Y. (1987). Stock returns and the term structure. J. Financ. Econom. 18, 373-399.
Campbell, J. Y. (1991). A variance decomposition for stock returns. Econom. J. 101, 157-179.
Campbell, J. Y. (1993). Why long horizons? A study of power against persistent alternatives. Working Paper, Princeton University, Princeton, NJ.
Campbell, J. Y. and R. J. Shiller (1988a). The dividend-price ratio and expectations of future dividends and discount factors. Rev. Financ. Stud. 1, 195-227.
Campbell, J. Y. and R. J. Shiller (1988b). Stock prices, earnings, and expected dividends. J. Finance 43, 661-676.
Campbell, J. Y., A. W. Lo, and A. C. MacKinlay (1993). Present value relations. In: The Econom. of Financ. Markets. Massachusetts Institute of Technology, Cambridge, MA.
Chen, N. (1991). Financial investment opportunities and the macroeconomy. J. Finance 46, 529-554.
Chen, N., R. Roll, and S. A. Ross (1986). Economic forces and the stock market. J. Business 59, 383-403.
Cochrane, J. H. (1988). How big is the random walk in GNP? J. Politic. Econom. 96, 893-920.


Cochrane, J. H. (1991). Volatility tests and efficient markets: A review essay. J. Monetary Econom. 27, 463-485.
Conrad, J. and G. Kaul (1988). Time-variation in expected returns. J. Business 61, 409-425.
Conrad, J. and G. Kaul (1989). Mean reversion in short-horizon expected returns. Rev. Financ. Stud. 2, 225-240.
Conrad, J. and G. Kaul (1994). An anatomy of trading strategies. Working Paper, University of Michigan, Ann Arbor, MI.
Cutler, D. M., J. M. Poterba, and L. M. Summers (1991). Speculative dynamics. Rev. Econom. Stud. 58, 529-546.
Daniel, K. and W. Torous (1993). Common stock returns and the business cycle. Working Paper, University of Chicago, Chicago, IL.
DeBondt, W. and R. Thaler (1985). Does the stock market overreact? J. Finance 40, 793-805.
Evans, M. D. D. (1994). Expected returns, time-varying risk, and risk premia. J. Finance 49, 655-679.
Fama, E. F. (1965). The behavior of stock market prices. J. Business 38, 34-105.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. J. Finance 25, 383-417.
Fama, E. F. (1990). Stock returns, expected returns, and real activity. J. Finance 45, 1089-1108.
Fama, E. F. (1991). Efficient capital markets: II. J. Finance 46, 1575-1617.
Fama, E. F. and K. R. French (1988a). Permanent and temporary components of stock prices. J. Politic. Econom. 96, 246-273.
Fama, E. F. and K. R. French (1988b). Dividend yields and expected stock returns. J. Financ. Econom. 22, 3-27.
Fama, E. F. and K. R. French (1989). Business conditions and expected returns on stocks and bonds. J. Financ. Econom. 25, 23-49.
Fama, E. F. and G. W. Schwert (1977). Asset returns and inflation. J. Financ. Econom. 5, 115-146.
Faust, J. (1992). When are variance ratio tests for serial dependence optimal? Econometrica 60, 1215-1226.
Ferson, W. E. and C. R. Harvey (1991). The variation of economic risk premiums. J. Politic. Econom. 99, 385-415.
Ferson, W. E. and R. A. Korajczyk (1995). Do arbitrage pricing models explain predictability of stock returns? J. Business 68, 309-349.
Ferson, W. E. and R. W. Schadt (1995). Measuring fund strategy and performance in changing economic conditions. J. Finance, to appear.
Fisher, L. (1966). Some new stock-market indexes. J. Business 39, 191-225.
Flood, R., R. J. Hodrick, and P. Kaplan (1987). An evaluation of recent evidence on stock market bubbles. Working Paper 1971, National Bureau of Economic Research, Cambridge, MA.
Foster, F. D. and T. Smith (1992). Assessing goodness-of-fit of asset pricing models: The distribution of the maximal R². Working Paper, Duke University, Durham, NC.
French, K. R., G. W. Schwert, and R. F. Stambaugh (1987). Expected stock returns and volatility. J. Financ. Econom. 19, 3-29.
Fuller, W. (1976). Introduction to Statistical Time Series. Wiley & Sons, New York.
Geweke, J. (1981). The approximate slope of econometric tests. Econometrica 49, 1427-1442.
Gibbons, M. and W. E. Ferson (1985). Testing asset pricing models with changing expectations and an unobservable market portfolio. J. Financ. Econom. 14, 217-236.
Goetzmann, W. N. (1993). Patterns in three centuries of stock market prices. J. Business 66, 249-270.
Goetzmann, W. N. and P. Jorion (1993). Testing the predictive power of dividend yields. J. Finance 48, 663-679.
Gordon, M. J. (1962). The Investment, Financing, and Valuation of the Corporation. Irwin, Homewood, IL.
Granger, C. W. J. and O. Morgenstern (1963). Spectral analysis of New York stock market prices. Kyklos 16, 1-27.
Granger, C. W. J. and P. Newbold (1974). Spurious regressions in econometrics. J. Econometrics 2, 111-120.


Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1057.
Hansen, L. P. and R. J. Hodrick (1980). Forward exchange rates as optimal predictors of future spot rates: An econometric analysis. J. Politic. Econom. 88, 829-853.
Hirshleifer, J. (1975). Speculation and equilibrium: Information, risk, and markets. Quart. J. Econom. 89, 519-542.
Hodrick, R. J. (1992). Dividend yields and expected stock returns: Alternative procedures for inference and measurement. Rev. Financ. Stud. 5, 357-386.
Jagannathan, R. and Z. Wang (1996). The conditional CAPM and the cross-section of expected returns. J. Finance 51, 3-54.
Jegadeesh, N. (1990). Evidence of predictable behavior of security returns. J. Finance 45, 881-898.
Jegadeesh, N. (1991). Seasonality in stock price mean reversion: Evidence from the U.S. and the U.K. J. Finance 46, 1427-1444.
Jegadeesh, N. and S. Titman (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. J. Finance 48, 65-91.
Jones, C. M. and G. Kaul (1996). Oil and the stock markets. J. Finance 51, 463-492.
Kandel, S. and R. F. Stambaugh (1989). Modeling expected stock returns for short and long horizons. Working Paper, University of Chicago, Chicago, IL.
Kandel, S. and R. F. Stambaugh (1990). Expectations and volatility of consumption and asset returns. Rev. Financ. Stud. 3, 207-232.
Kandel, S. and R. F. Stambaugh (1996). On the predictability of stock returns: An asset-allocation perspective. J. Finance 51, 385-424.
Kaul, G. and M. Nimalendran (1990). Price reversals: Bid-ask errors or market overreaction? J. Financ. Econom. 28, 67-83.
Keim, D. and R. F. Stambaugh (1986). Predicting returns in the stock and bond markets. J. Financ. Econom. 17, 357-390.
Kendall, M. G. (1953). The analysis of economic time-series, Part I: Prices. J. Roy. Statist. Soc. 96, 11-25.

Kendall, M. G. and A. Stuart (1976). The Advanced Theory of Statistics. Vol. 1. Charles Griffin, London.
Kim, M. J., C. Nelson, and R. Startz (1991). Mean reversion in stock prices? A reappraisal of the empirical evidence. Rev. Econom. Stud. 58, 515-528.
Lehmann, B. N. (1990). Fads, martingales, and market efficiency. Quart. J. Econom. 105, 1-28.
LeRoy, S. F. (1973). Risk aversion and the martingale property of stock returns. Internat. Econom. Rev. 14, 436-446.
LeRoy, S. F. (1989). Efficient capital markets and martingales. J. Econom. Literature 27, 1583-1621.
LeRoy, S. F. and R. D. Porter (1981). Stock price volatility: Tests based on implied variance bounds. Econometrica 49, 97-113.
Lo, A. W. (1991). Long-term memory in stock prices. Econometrica 59, 1279-1314.
Lo, A. W. and A. C. MacKinlay (1988). Stock market prices do not follow random walks: Evidence from a simple specification test. Rev. Financ. Stud. 1, 41-66.
Lo, A. W. and A. C. MacKinlay (1989). The size and power of the variance ratio test in finite samples: A Monte Carlo investigation. J. Econometrics 40, 203-238.
Lo, A. W. and A. C. MacKinlay (1990a). When are contrarian profits due to market overreaction? Rev. Financ. Stud. 3, 175-205.
Lo, A. W. and A. C. MacKinlay (1990b). An econometric analysis of nonsynchronous trading. J. Econometrics 45, 181-211.
Lo, A. W. and A. C. MacKinlay (1990c). Data-snooping biases in tests of financial asset pricing models. Rev. Financ. Stud. 3, 431-467.
Lo, A. W. and A. C. MacKinlay (1992). Maximizing predictability in the stock and bond markets. Working Paper, Massachusetts Institute of Technology, Cambridge, MA.
Lucas, R. E. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1446.


Mandelbrot, B. (1966). Forecasts of future prices, unbiased markets, and 'martingale' models. J. Business 39, 394-419.
Mandelbrot, B. (1972). Statistical methodology for non-periodic cycles: From the covariance to R/S analysis. Ann. Econom. Social Measurement 1, 259-290.
Mankiw, N. G., D. Romer, and M. D. Shapiro (1991). Stock market forecastability and volatility: A statistical appraisal. Rev. Econom. Stud. 58, 455-477.
Mankiw, N. G. and M. D. Shapiro (1986). Do we reject too often? Econom. Lett. 20, 139-145.
Marriott, F. H. C. and J. A. Pope (1954). Bias in estimation of autocorrelations. Biometrika 41, 390-402.
Muthuswamy, J. (1988). Asynchronous closing prices and spurious autocorrelations in portfolio returns. Working Paper, University of Chicago, Chicago, IL.
Nelson, C. R. and M. J. Kim (1993). Predictable stock returns: The role of small sample bias. J. Finance 48, 641-661.
Newey, W. K. and K. D. West (1987). A simple, positive definite, heteroscedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-707.
Ohlson, J. (1977). Risk-aversion and the martingale property of stock prices: Comments. Internat. Econom. Rev. 18, 229-234.
Pesaran, M. H. and A. Timmermann (1995). Predictability of stock returns: Robustness and economic significance. J. Finance 50, 1201-1228.
Poterba, J. and L. H. Summers (1988). Mean reversion in stock returns: Evidence and implications. J. Financ. Econom. 22, 27-60.
Richardson, M. P. (1993). Temporary components of stock prices: A skeptic's view. J. Business Econom. Statist. 11, 199-207.
Richardson, M. P. and J. H. Stock (1989). Drawing inferences from statistics based on multiyear asset returns. J. Financ. Econom. 25, 323-347.
Richardson, M. P. and T. Smith (1991). Tests of financial models in the presence of overlapping observations. Rev. Financ. Stud. 4, 227-257.
Richardson, M. P. and T. Smith (1994). A unified approach to testing for serial correlation in stock returns. J. Business 67, 371-399.
Roberts, H. V. (1959). Stock-market 'patterns' and financial analysis: Methodological suggestions. J. Finance 14, 1-10.
Roll, R. (1988). R². J. Finance 43, 541-566.
Roll, R. (1968). The efficient market model applied to U.S. treasury bill rates. Unpublished Ph.D. thesis, Graduate School of Business, University of Chicago, Chicago, IL.
Rozeff, M. (1984). Dividend yields are equity risk premiums. J. Port. Mgmt. 11, 68-75.
Samuelson, P. A. (1965). Proof that properly anticipated prices fluctuate randomly. Ind. Mgmt. Rev. 6, 41-49.
Scholes, M. S. and J. Williams (1977). Estimating beta from nonsynchronous data. J. Financ. Econom. 5, 309-327.
Schwert, G. W. (1989). Why does stock market volatility change over time? J. Finance 44, 1115-1153.
Schwert, G. W. (1990). Stock returns and real activity: A century of evidence. J. Finance 45, 1237-1257.
Seyhun, N. S. (1992). Why does aggregate insider trading predict future stock returns? Quart. J. Econom. 107, 1303-1331.
Shiller, R. J. (1981). Do stock prices move too much to be justified by subsequent movements in dividends? Amer. Econom. Rev. 71, 421-436.
Shiller, R. J. (1984). Stock prices and social dynamics. Brookings Papers on Economic Activity 2, 457-497.
Shiller, R. J. (1989). Market Volatility. MIT Press, Cambridge, MA.
Stambaugh, R. F. (1986a). Discussion. J. Finance 41, 601-602.
Stambaugh, R. F. (1986b). Bias in regression with lagged stochastic regressors. Working Paper, University of Chicago, Chicago, IL.


Stambaugh, R. F. (1993). Estimating conditional expectations when volatility fluctuates. Working Paper, University of Pennsylvania, Philadelphia, PA.
Summers, L. H. (1986). Does the stock market rationally reflect fundamental values? J. Finance 41, 591-601.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-828.
White, H. (1984). Asymptotic Theory for Econometricians. Academic Press, Orlando, FL.
Working, H. (1934). A random difference series for use in the analysis of time series. J. Amer. Statist. Assoc. 29, 11-24.
Working, H. (1949). The investigation of economic expectations. Amer. Econom. Rev. 39, 150-166.
Working, H. (1960). Note on the correlation of first differences of averages in a random chain. Econometrica 28, 916-918.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
1996 Elsevier Science B.V. All rights reserved.

10

Interest Rate Spreads as Predictors of Business Cycles

Kajal Lahiri and Jiazhuo G. Wang

1. Introduction

Financial economists have long understood that financial market variables like stock prices and interest rates contain considerable information about the future of the economy. In recent years a number of studies have demonstrated that interest rate spreads - i.e. the differences on a given date between interest rates on alternative financial assets - have remarkable power in predicting future economic activity. The spread between the six-month commercial paper and the six-month Treasury bill rates (cf. Friedman and Kuttner (1992, 1993b) and Bernanke (1990)), between the Federal funds rate and the long-term Treasury bond rate (cf. Laurent (1988, 1989) and Bernanke and Blinder (1993)), and between short-term and long-term Treasury bond rates (cf. Estrella and Hardouvelis (1991), Fama (1990), Harvey (1989), and Stambaugh (1988)) have appeared prominently in the literature. Using vector-autoregressive techniques (see Sims (1993)) and the concept of Granger causality, researchers in this area have established the marginal predictive power of the spread variables with a high level of confidence. Stock and Watson (1989, 1990a,b, 1993), in their attempt to develop a new comprehensive index of leading indicators, have found that the paper-bill spread and the "tilt" of the term structure (i.e., the slope of the yield curve) are two of the most potent leading variables from the perspective of business cycle forecasting. There is a presumption, however, that these interest rate variables might have lost some of their predictive power during the 1990s due to a variety of factors. A number of changes in the Federal Reserve operating procedures during the 1980s might have reduced the reliability of interest rates as indicators of monetary policy. Also, financial innovation and deregulation, deepening of the commercial paper market, increasing globalization and integration of the international financial markets, and other factors might have increased the substitutability amongst various money market instruments. 1 This can reduce the sensitivity of interest rate spreads to monetary policy innovations. In fact, the failure of the

1 See, for instance, Bernanke (1990), Bernanke and Mishkin (1993), Estrella and Hardouvelis (1991), Kashyap, Stein and Wilcox (1993), and Stock and Watson (1993).


experimental recession index of Stock and Watson (1993) to predict the latest recession has been attributed to its excessive reliance on these financial variables. A business-cycle predictor will be useful in an ex ante sense only if one has developed an appropriate filter rule which will map changes in the predictor variable into turning point predictions. McNees (1991) has pointed out that, unfortunately, this rather obvious point has seldom been adequately emphasized in the literature. A number of ad hoc filtering rules have been developed to interpret monthly movements in the Composite Index of Leading Indicators - the classic one is the "three-consecutive declines" rule for signaling a downturn. 2 In financial circles, an inversion of the yield curve has long been used as a signal for an impending recession. Any empirical rule will typically involve trade-offs of accuracy for timeliness and missed signals for false alarms. Rather than predicting turning points, Stock and Watson (1993) used stochastic simulation of their dynamic single index model to capture the probability that the economy will be in recession during a future month, where a recession is defined as a particular pattern of movements in the unobserved state of the economy. In the present chapter we evaluate the relative performance of various interest rate spread variables as predictors of business cycle turning points in a non-linear framework. All aforementioned studies have followed a linear time-series approach where recessionary episodes are "extrinsic" to the system. Many earlier scholars including Keynes (1935) and Hicks (1950) have emphasized the issue of asymmetric business cycles before the linear time series methodology became popular in empirical economics. Specifically, these authors observed that expansions are more persistent but less sharp than recessions. Burns and Mitchell (1946, p. 134) noted "that contraction is a more violent change than expansion is a common finding". In recent years, a number of authors including Neftci (1984), Sichel (1991) and De Gooijer and Kumar (1992) have found evidence for nonlinearity and asymmetry in macroeconomic time series. We have demonstrated (Lahiri and Wang (1994)) that the usual criterion functions for estimation and prediction in linear time series models are inadequate for the purpose of characterizing and identifying the different dynamics over the alternate stages of the business cycle. Stock and Watson (1990a) have cautioned that the relationship between spreads and subsequent economic activity might better be represented by a non-linear rather than a linear model. In addition, recessions and expansions should not be treated symmetrically. We emphasize the issue of the prediction of turning points with reasonable lead times. Many authors, including McNees (1992) and Zarnowitz (1992, Ch. 13), have concluded that the general accuracy and usefulness of macroeconomic forecasts can be greatly enhanced if the sizable errors that are typically found around turning points can be minimized. In our framework, the economy is modeled to shift between two regimes - expansions and recessions - where the dynamic behavior of the process is allowed to vary greatly from one regime to another. The switch between the two is governed by a

2 Further analysis of various filter rules to identify turning points can be found in Zarnowitz (1992, Ch. 11). See also Zellner and Hong (1989).


two-state Markov process. We assume that the econometrician does not observe the shifts directly, but instead makes probabilistic inferences about the unobserved underlying state. Hamilton's (1989, 1993) non-linear filter algorithm also permits maximum likelihood (ML) estimation of population parameters in a flexible manner. Our analysis reveals that the interest rate spreads performed remarkably well over the period 1953:01-1993:03. In many ways, the slope of the yield curve was the best predictor, followed closely by the spread between the Federal funds rate and the long-term Treasury bond rate. The former predicted all fifteen peak and trough turning points over our sample period with comfortable lead times and without any false alarms. The spread based on the Federal funds rate could not predict the recessions of 1957-58 and 1960-61. It also gave a false signal during 1966. Contrary to current thinking, these two spreads successfully predicted the peak and the trough of the latest recessionary episode. The performance of the matched maturity paper-bill spread was less impressive. The signal for the 1990 recession came only after five months; otherwise, it predicted all peak turning points with an average lead time of nearly six months. Furthermore, unlike the other two, this variable has consistently failed to predict trough turning points with any reasonable lead time. The signals came with a little lag. This result is consistent with the observation of Friedman and Kuttner (1993b) that the paper-bill spread tends to be wide not only just before recessions but during recessions as well. Using linear time-series analysis, Bernanke (1990) ran a "horse race" between a number of interest rate variables to predict nine different monthly measures of real macroeconomic activity as well as the inflation rate. While many of the interest rate variables have been excellent predictors of the economy over 1961-89, he found the best single variable to be the spread between commercial paper and the Treasury bill rate. It should, however, be pointed out that in his analysis no special consideration was given to the forecasting errors around business cycle peaks and troughs. The chapter is organized as follows: Section 2 introduces Hamilton's (1989) two-regime Markov switching model and the estimation procedure. Section 3 contains the empirical results. Their implications for the monetary transmission mechanism are given in Section 4. Finally, concluding remarks are presented in Section 5.

2. Hamilton's non-linear filter

The model postulates a data generating process with two different regimes - expansions and recessions. We further assume that the process is subject to discrete shifts governed by a two-state Markov process. The observed time series is drawn from two different states, S_t = 1, 2. Both the mean and the variance are functions of the prevailing state, y_t | S_t ~ N(μ_{S_t}, Ω_{S_t}), where μ_{S_t} = (μ_1, μ_2) = mean value of y_t in expansion and recession, respectively; Ω_{S_t} = (σ_1, σ_2) = the regime-dependent standard deviations; and S_t = unobserved state variable taking values acc-


ording to a first-order Markov chain: p_ij = Pr(S_t = i | S_{t-1} = j) with p_11 + p_21 = p_12 + p_22 = 1 (i, j = 1, 2). Let λ = (μ_1, μ_2, σ_1, σ_2, p_11, p_22) denote the vector of population parameters that characterize the probability density P(y_1, y_2, ..., y_T; λ) of the observed data. The task is to estimate the parameters which best fit the data, and to make inferences about the unobserved states given the observations up to t. Since we take y_t - a particular interest spread variable - as a leading indicator, the calculated probability can be interpreted as a direct prediction of the underlying state of the economy in the near future. The inference about the unobserved state is conducted in two stages. 3 First, the population parameters are estimated. Second, inference about the unobserved state is made using the estimated parameters. Since the state is not directly observable, the inference takes the form of a probability:

P(S_t = i | y_t, y_{t-1}, ..., y_1; λ), i = 1, 2 ,   (1)

which denotes the probability that the process will be in state i at time t, conditional on the data observed through time t and given a value of λ. Let us first consider the inference procedure assuming that the value of λ is known. Starting from the unconditional probability of state 1 at time t = 1, given by the well-known formula P(S_1 = 1) = (1 - p_22)/((1 - p_11) + (1 - p_22)), we can calculate P(S_2, S_1) = P(S_2 | S_1) P(S_1), which is the joint probability of the states at t = 1 and t = 2. Given the joint normal density of (y_1, y_2) conditional on S_1 and S_2, the joint probability density of states and observations is given by

P(y_2, y_1, S_2, S_1) = P(y_2, y_1 | S_2, S_1) P(S_2, S_1) .   (2)

Summing over states, we obtain:

P(y_2, y_1) = \sum_{S_1=1}^{2} \sum_{S_2=1}^{2} P(y_2, y_1, S_2, S_1) .   (3)

We can make an inference about the states in the first two periods conditional on the data by calculating P(S_2, S_1 | y_2, y_1) = P(y_2, y_1, S_2, S_1)/P(y_2, y_1). Then, an inference about the state i at t = 2 is obtained as:

P(S_2 = i | y_2, y_1) = P(S_2 = i, S_1 = 1 | y_2, y_1) + P(S_2 = i, S_1 = 2 | y_2, y_1) , i = 1, 2 .   (4)

Similarly, using (4) as the initial value and repeating the above procedure, we obtain the inference about the state of the process at time t conditional on the observed time series through t:

P(S_t | Y_t) = \sum_{S_{t-1}=1}^{2} P(S_t, S_{t-1} | Y_t) , t = 2, 3, ..., T   (5)

3 See Hamilton (1988, 1989, 1990, 1993) for details.


where Y_t = (y_t, y_{t-1}, ..., y_1). Note that a byproduct of the filter is the sample likelihood function based on all observations:

P(y_1, y_2, ..., y_T; λ) = \sum_{S_1=1}^{2} \cdots \sum_{S_T=1}^{2} P(y_1, y_2, ..., y_T, S_1, S_2, ..., S_T; λ)   (6)

which can be maximized directly to estimate λ using numerical methods. The parameters obtained can then be used to make inferences using the filter described above. As the outcome of this procedure, we can obtain a sequence of probabilities that the economy will fall either into an expansion or into a recession at time t. In this way we can forecast turning points of the business cycle.
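A minimal Python sketch of the recursion in (1)-(6) is given below for concreteness (the function and variable names are my own, and the EM/quasi-Bayesian estimation of λ used later in the chapter is omitted): given a candidate parameter vector λ = (μ_1, μ_2, σ_1, σ_2, p_11, p_22), it returns the filtered probabilities P(S_t = i | Y_t; λ) and the log of the sample likelihood in (6).

```python
import numpy as np
from scipy.stats import norm

def hamilton_filter(y, mu, sigma, p11, p22):
    """Two-state Markov-switching filter: returns the filtered probabilities
    P(S_t = i | y_1, ..., y_t; lambda) and the log of the sample likelihood."""
    y = np.asarray(y, dtype=float)
    # Transition matrix with P[i, j] = Pr(S_t = i | S_{t-1} = j); columns sum to one.
    P = np.array([[p11, 1.0 - p22],
                  [1.0 - p11, p22]])
    # Unconditional probability of state 1 used to start the recursion.
    prob = np.empty(2)
    prob[0] = (1.0 - p22) / ((1.0 - p11) + (1.0 - p22))
    prob[1] = 1.0 - prob[0]
    filtered = np.zeros((len(y), 2))
    loglik = 0.0
    for t, yt in enumerate(y):
        prior = P @ prob if t > 0 else prob          # Pr(S_t | Y_{t-1})
        dens = norm.pdf(yt, loc=mu, scale=sigma)     # f(y_t | S_t = i)
        joint = prior * dens                         # as in (2)
        lik_t = joint.sum()                          # summing over states, as in (3)
        prob = joint / lik_t                         # filtered probabilities, (4)-(5)
        filtered[t] = prob
        loglik += np.log(lik_t)                      # log of the likelihood in (6)
    return filtered, loglik

# Hypothetical use with made-up parameter values for a spread series; the second
# column of `probs` would then play the role of the recession probabilities.
spread = np.random.default_rng(2).normal(size=200)
probs, ll = hamilton_filter(spread, mu=np.array([1.4, -0.1]),
                            sigma=np.array([0.6, 0.7]), p11=0.97, p22=0.97)
```

Maximizing the returned log likelihood over λ with a numerical optimizer would correspond to the direct ML approach mentioned above; the chapter instead uses the EM algorithm with sample-based priors, as described in the next section.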

3. Empirical results
The predictive performance of three interest spread variables is analyzed below. They are (i) the spread between the Federal funds rate and the ten-year Treasury bond rate (FR_10TB); (ii) the spread between the ten-year Treasury bond rate and the one-year Treasury bill rate (10TB_1TB); and (iii) the difference between the commercial paper rate and the Treasury bill rate at six months' maturity (6CP_6TB). Monthly observations ranged from 1955:01-1993:03 for FR_10TB, from 1953:01-1993:03 for 10TB_1TB and from 1959:01-1993:03 for 6CP_6TB. They were obtained from the Citibase data bank. These series are depicted in Figures 1-3, where the boxed areas represent NBER-dated recessions. 4
Fig. 1. One-year Treasury bill rate minus ten-year Treasury bond rate (1TB_10TB), 1953:01-1993:03.

4 We also experimented with the spread between the 10-year Treasury bond rate and the 3-month Treasury bill rate (10TB_3TB) and the commercial paper-Treasury bill rate spread at 3 months' maturity (3CP_3TB). The performance of these two spreads was very similar to those of 10TB_1TB and 6CP_6TB, respectively; hence we have not reported these results separately.

302
8

K. Lahiri and J. G. Wang

iii iii 6 4 2 0 -2

i i

!i Vi'
55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93

-4

Fig. 2. Federal Funds Rate minus Ten-year Treasury Bond Rate (FR_10TB) 1955:01-1993:03
4 3.5 3 2.5 2 1.5
1
m

iii ii i

iiii

i i

ii i
ii 59 61 63 65 67 69 71 73 75 77 79 81 i i 83 85 87 89 91 93

0.5 O -0.5

Fig. 3. CommercialPaper Rate minus Treasury Bill Rate (6CP_6TB) 1959:01-1993:03 augmented Dickey-Fuller tests rejected the null hypothesis of non-stationarity at the one percent level for all of these series. M L solution was obtained by using the so called EM algorithm described in Hamilton (1990). In order to avoid the wellknown singularity problem associated with estimating parameters of mixtures of normal distributions, we used certain sample-based priors for the parameters of the two regimes following Hamilton's (1991) "quasi-Bayesian" approach. Parameter estimates together with their standard errors are displayed in Table 1. The mean values of FR_10TB and the difference between the one-year Treasury bill and the ten-year bond rates are found to be negative during expansions and positive during recessions. Also, on the average, the private-public spread is much wider during recessions (#2 = 0.95 percent per annum) than during expansions (#1 = 0.27 percent per annum). The estimated standard errors

Interest rate spreads as predictors o f business cycles

303

Table 1 Parameter estimates of the two-regime Markov switching Model Parameter #1 /~2 P11
P22

FR_10TB -1.4849 (0.0738) 1.3493 (0.2431) 0.9839 (0.0072) 0.9539 (0.0209) 0.7759 (0.0713) 2.5764 (0.3269)

10TB_ITB 1.4437 (0.0479) ~ . 1289 (0.0515) 0.9691 (0.0366) 0.9683 (0.0414) 0.3902 (0.0111) 0.4300 (0.0114)

6CP_6TB 0.2711 (0.0124) 0.9531 (0.0418) 0.9583 (0.0138) 0.9364 (0.0205) 0.0268 (0.0030) 0.2404 (0.0264)

cr~ cr~

Note: Numbers in the parentheses are standard errors of parameters.

of the parameters suggest that the parameters are estimated quite precisely. The variance of errors during recessions (~r22)is found to be considerably larger than that during expansions (~). This regime-dependent heteroskedasticity is well recognized. 5 The estimated transitional probabilities (Pll and P22 ) are above 0.90 for all these series, indicating that the tendency to stay in the existing regime is very dominant. 6 Figure 4 depicts the estimated probabilities (filter inference) that the economy will be in a recessionary state, i.e. P(St = 2 / y t , Y t - l , . . . ,yl;2) using 10TB_ITB from the mid-1950s untill 1993:03 on a month-by-month basis. We find that the simple two-regime Markov switching model gave sharp signals in all three cases. It is also noteworthy that the probabilities increase sharply to values close to one just prior to turning points. As a result, very little lead time is lost solely due to the filter rule. For instance, the "three consecutive declines" rule necessitates a three months lag before it can signal. We used a critical value of 0.90 to trigger a peak signal for all three series. A trough turning point was signaled whenever P(St = 1 / Y t , Y t - l , . . . , Y l ; 2 ) exceeded the critical value of 0.90 for FR 10TB and 0.50 for 6CP 6TB and 10TB 1TB. These critical values were chosen to balance the need to signal each turning point over the sample without
5 See Neftci (1984), French and Sichel (1993) and Dasgupta and Lahiri (1993). 6 Following Hamilton (1988, 1989), we also experimented with more complicated models by adding autoregressive terms to the error process, i.e. Yt = ,us, + q51(yt-i - #s,_l ) + - . . + et,, where et is N(0, or2). We allowed A R terms up to 4. In terms of the conventional model fit criterion ( e.g. the maximized value of the likelihood function ) which typically assigns the same weight to all observations, these models were marginally better than the one without any autoregressive error term. However, the probabilistic forecasts generated by these models were considerably worse than those reported in the paper. They missed majority of the turning points. Thus, the "best-fitted" model is not necessarily the best for the purpose of turning point predictions. See Lahiri and Wang (1994) for more details on this point. The model reported in the text is the same as that in Engel and Hamilton (1990).

304

K. Lahiri and J. G.

Wang

1 0.9 0.8 I
0,7 0.6 0.5

o.3

0.2 + , i
0
53 55 57 59 61 63 65

ii

67

69

71 73

75

77 79

81 83 85 87 89

91 93

F i g . 4. Probability of Recession Using 10TB_ITB, 1953:01-1993:03

too many false alarms. 7 Fortunately, the probability estimates were such that these choices were natural and hence, did not affect the results reported here. Over the sample, there were fifteen NBER-defined peaks and troughs Tables 2 and 3 summarize the performance of the three spreads in signaling peak and trough turning points. The slope of the yield curve (i.e. 10TB_ITB) performed best - it signaled all turning points with no false signal. The average lead time in signaling a recession was nearly 20 months and was a little less than three months in signaling a trough turning point. Estrella and Hardouvelis (1991) found that the spread had maximum forecasting power at the 5-6 quarters forecasting horizon. The signal for the peak of December 1969 had a long lead time. It came in October 1964 and was

Table 2 Trough turning point signals NBER-Trough


May-54 Apr-58 Feb-61 Nov-70 May-75

FR_10TB
NA NO NO -1 -2

10TB_ITB
~6 -1 -3 0 -3

6CP_6TB
NA NA + 1 +8 -1

LEI
-2 + 1 + 1 + 1 -1

Jul-80 Nov-82 Mar-91

+2
-3

-1
3

+2
+ 1

0
-1

~5

-6

+ 1

+2

Note: Leads (-) and Lags ( + ) actual troughs. NO = no signals when actual troughs occurred. NA = not available.
7See Neftci (1982), Diebold and Rudebusch (1989, 1991), Koenig and Emery (1991), and Lahiri

and Wang (1994).

Interest rate spreads as predictors of business cycles

305

Table 3 Peak turning point signals NBER-Peak Aug-57 Apr-60 Dec-69 Nov-73 Jan-80 Jul-81 Jul-90 FR_10TB NO NO -16 -6 -12 -8 -15 10TB 1TB -19 -11 -62 -7 -16 -9 -18 6CP_6TB NA -1 -8 -5 -5 -8 +5 LEI -15 -6 -5 -3 -9 -5 -5

Note: Leads (-) and Lags (+) actual peaks. NO = no signals when actual peaks occurred. NA = not available. never canceled till the onset of the 1969-70 recession. We should, however, point out that, as predicted, the slope of the yield curve did revert and stayed positive during all of 1966. The 1966-67 period was later characterized as a growth recessionary period. It is possible that a full-fledged recession was avoided by the tax cut of 1964, and by the tremendous growth of defense spending which began in the fourth quarter of 1965 (as the Vietnam war escalated) and lasted until the end of 1968. In these years, defense spending increased from an annualized rate of $65.5 billion in the fourth quarter of 1965 to an annualized rate of $80 billion at the end of 1968, an increase of more than fifty percent. The output increase induced by the jump in defense spending stimulated a minor investment boom which added further to demand. It is this excess fiscal stimulus which could have delayed the onset of the recession temporarily. As is well known, this led to increased inflationary pressures. Over 1966-68, the 10TB_ITB spread continued to stay relatively high by historical standards in anticipation of a tighter monetary policy. The monetary brake finally came at the end of 1968, and the recession precipitated in December 1969. Since 1969, the track record of FR_10TB is equally impressive. It signaled all turning points with lead times similar to those of 10TB_ITB. However, FR_10TB gave one pair of false signals during 1966-67 and failed to signal the 1957-58 and 1960-61 recessions. The false signal can be explained by the subsequent 1966-67 growth recession, which was essentially caused by the "credit crunch" of 1966 (cf. Bernanke and Blinder (1992), pp. 911-12). The failure of FR_10TB to signal the recessions of 1957-1958 and 1960-61 is not unexpected. Bernanke and Blinder (1992) have argued that the Federal funds rate is a good predictor of future economic activity because it is a good indicator of the monetary policy stance. Before 1966, the funds rate was generally below the discount rate, and hence was not a good indicator of monetary policy. When the funds rate is below the discount rate, borrowing declines to frictional levels. Then the Federal funds rate is no longer sensitive to the spread between the funds rate and the discount rate. On the other hand, 10TB_ITB performed well because the Treasury bill market has been large and well developed throughout the postwar period and hence sensitively recorded monetary policy innovations and other economy-wide de-

306

K. Lahiri and J. G. Wang

velopment. This supports Romer and Romer (1993) who have argued that there has been an interest rate channel throughout the postwar era. Both FR_10TB and 10TB_1TB were successful in predicting the latest peak of July 1990 with a lead time of 15-18 months and the trough of March 1991 with a lead time of six months. This result is striking since most researchers working in this area have thought that the latest recessionary episode was not forecastable on the basis of the behavior of the spreads before July 1990. 8 The nine-variable probabilistic VAR model of Sims (1993) could not forecast the 1990 recession. Fair (1993) has commented that the latest recession was not an easy event to predict. Another interesting point to note is that the predictive prowess of FR_10TB and 10TB_1TB did not diminish during the 1979-82 era when the Fed is thought to have shifted its reliance from the Federal funds rate to nonborrowed reserves as an intermediate target. However, since reserve requirements during the early 1980s were lagged, weekly nonborrowed reserve targeting was closely related to borrowed reserve targeting. The latter was essentially a noisy Federal funds targeting procedure that the Fed has historically used before. 9

With one exception, the private-public spread (6CP_6TB) signaled all cyclical peaks since 1960 with an average lead time of 5-6 months. The signal for the July 1990 peak came after five months. Another discouraging aspect of 6CP_6TB's predictive capacity is that it did not predict the cyclical troughs with any lead time. On the average, it lags by 2 months. Even though this result is understandable in view of the fact that the average duration of post-war recessions has only been just over 11 months, FR_10TB and 10TB_1TB performed admirably well even in predicting these troughs. The failure of 6CP_6TB to lead cyclical troughs is consistent with the observation by Friedman and Kuttner (1993b) that the spread is especially wide not only before recessions but during recessions as well. This can be explained by the fact that, apart from monetary factors, the 6CP_6TB spread also reflects default risk and business financing needs, which tend to stay high throughout the recession. We should also note that the recession of 1973-75 was anticipated by all three interest spread variables. The signals came during the second quarter of 1973, which was clearly prior to the tightening of the monetary policy during 1973-74, cf. Romer and Romer (1993). Thus, we can conclude that these spread variables carry information beyond the monetary policy stance. 10 We also noted that 6CP_6TB signaled seven additional recessions which failed to materialize. The false peak signals came in June 1966, November 1966, August 1968, September 1971, November 1978, July 1984, and May 1987. Arguably, five of these were associated with NBER growth recessions of June 1966, March 1969,

8 The only exception is Laurent (1989), who clearly predicted the 1990 recession based on the spread between the Federal funds and the long-term government bond rates.
9 See Goodfriend (1991), Karamouzis and Lombra (1989) and Feinman and Poole (1989) for further discussion on this point.
10 This is consistent with Bernanke (1990), Estrella and Hardouvelis (1991) and Friedman and Kuttner (1993).


December 1979, June 1984 and February 1989. 11 Even then, the private-public spread tends to give too many false signals for business cycles in comparison to FR_10TB and 10TB_1TB. Many observers have indicated that the predictive power of 6CP_6TB has deteriorated considerably in recent years because the commercial paper market has increasingly become deeper and more liquid during the 1980s. However, we find that during the 80s, 6CP_6TB has been very active and gave a total of five pairs of turning point signals, even though two of these turned out to be false. In our analysis, 10TB_1TB and FR_10TB are clearly superior to 6CP_6TB in forecasting business cycles. This may seem inconsistent with the evidence in Bernanke (1990), and in Friedman and Kuttner (1993b). We should, however, point out that the optimal forecasting horizon in our framework is free and turns out to be much longer than the one-month horizon typical in most studies. In fact, Bernanke and Mishkin (1993) have reported that 6CP_6TB ceases to be the best predictor once the forecasting horizon is changed from one month to twelve months. Another important result is that the optimal forecasting horizon for predicting troughs is significantly shorter than the horizon for predicting peaks. The standard VAR literature ignores this asymmetry between expansions and recessions, and assumes one single forecasting horizon over the whole time-series.

It is interesting to compare the performance of the three interest spread variables with that of the Commerce Department's Index of Leading Economic Indicators (LEI). The last columns of Tables 2 and 3 present the performance of LEI in predicting NBER-defined expansions and recessions. These columns give the lead times associated with the currently available LEI data using the same filter. Details can be found in Lahiri and Wang (1994). We find that it predicted all peaks with an average lead time of nearly seven months. The record in foreshadowing cyclical troughs is less attractive - on the average, LEI tracked all troughs with a mean lag of 0.125 month. Thus, the signals were almost coincidental. However, like 6CP_6TB, LEI gave five pairs of additional turning point signals in 1956:05, 1962:05, 1966:06, 1984:06 and 1987:11, when there were no corresponding NBER-defined recessions afterwards. Most of these signals can again be justified in terms of growth recessions that occurred subsequently. Thus, the overall performance of LEI is very similar to that of 6CP_6TB. We should point out that, unlike many components of LEI, the interest rate predictors do not go through data revisions and occasional major definitional revisions. 12 Also, interest rate data are more promptly available. The LEI figure for a particular month is available only after the end of the following month. Given these additional advantages, the performance of the three interest rate spreads - particularly that of 10TB_1TB and FR_10TB - is truly remarkable when compared to the Index of Leading Indicators.

11 See Zarnowitz (1992, pp. 342-344) for these chronologies.
12 See Diebold and Rudebusch (1991a, 1991b), Koenig and Emery (1991) and Lahiri and Wang (1994), who have studied the performance of LEI in real time.
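To illustrate how filtered probabilities such as those in Figure 4 are mapped into the lead/lag entries of Tables 2 and 3, a stylized Python sketch of the trigger rule described in this section is given below (the threshold values follow the text, but the probability series, the NBER dates expressed as sample indices, and the nearest-signal matching rule are illustrative simplifications, not the authors' exact procedure).

```python
import numpy as np

def turning_point_signals(rec_prob, peak_cut=0.90, trough_cut=0.50):
    """Indices at which a peak signal (recession probability crosses peak_cut
    from below) or a trough signal (expansion probability 1 - rec_prob crosses
    trough_cut while a recession signal is outstanding) is triggered."""
    peaks, troughs, in_recession = [], [], False
    for t, p in enumerate(rec_prob):
        if not in_recession and p >= peak_cut:
            peaks.append(t)
            in_recession = True
        elif in_recession and (1.0 - p) >= trough_cut:
            troughs.append(t)
            in_recession = False
    return peaks, troughs

def lead_or_lag(signal_dates, nber_dates):
    """Signed distance in months from each NBER turning point to the nearest
    signal: negative values are leads, positive values are lags."""
    return [min(signal_dates, key=lambda s: abs(s - d)) - d if signal_dates
            else None for d in nber_dates]

# Hypothetical usage with a recession-probability series from the filter sketch
# above and NBER peak months expressed as positions in the sample.
rec_prob = np.random.default_rng(3).random(480)
peaks, troughs = turning_point_signals(rec_prob)
print(lead_or_lag(peaks, [120, 240, 360]))
```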


There are several advantages of the approach used in this study. First, the results are not based on any specific macroeconomic time series like the real GNP, unemployment or the index of industrial production. A recession is a comprehensive concept defined in terms of a well-diffused and significant fall in the overall level of economic activity. McNees (1991) has shown that it is practically impossible to characterize a recession using only one or two individual series. The NBER considers a wide variety of monthly data to make retrospective decisions about cyclical turning points, where the relative importance of these diverse sources is essentially determined by expert judgment. 13 Secondly, our results are independent of whether revised or preliminary data are used. For instance, Estrella and Hardouvelis (1991) found that revised rather than preliminary GNP figures are better predicted by their term structure variable. Finally, our analysis, except for the use of certain sample-based priors in the maximum likelihood estimation, is completely ex ante. The downturn probabilities that we report are not "smoothed" inferences based on the full sample (y_1, y_2, ..., y_T), but rather "filter" inferences based on (y_1, y_2, ..., y_t). In contrast, Estrella and Hardouvelis (1991) have reported recession probabilities based on an estimated Probit model where the dependent variable takes value one during NBER-defined recessions and value zero otherwise. The independent variable in their analysis was (10TB_1TB) lagged four quarters. The probabilities they reported were in-sample fitted values of the dependent variable, rather than out-of-sample ex ante predictions. Also, their specific choice of the independent variable inadvertently assumes a fixed lead time of four quarters for predicting all expansions and recessions, which would impose a severe specification error into the analysis.

13 See Hall (1991).

4. Implications for the monetary transmission mechanism

The monetary transmission mechanism is the process through which monetary policy decisions are transmitted to real GDP and inflation. An understanding of the nature of the transmission mechanism is necessary for an efficient conduct of the monetary policy. We have found that FR_10TB and 10TB_1TB signaled expansions and recessions with very similar lead times. Since the mid-1960s, the funds rate has represented the Fed's conscious and intended policy actions better than any other variables. Presumably, output and prices do not respond directly to the Federal funds rate but to real interest rates of at least 3-6 months' maturity. The Treasury bill rates are determined by expectations of the funds rate over the life of the instruments. Thus, the Fed targets the funds rate with the aim of anchoring the term structure of interest rates, which in turn changes the real rate in the short- and intermediate-run, cf. Mishkin (1990). Bernanke (1990, Table 7) has shown that funds target change announcements get fully reflected in actual Federal funds rates within two weeks. Cook and Hahn (1989) have demonstrated that, during the 1970s, the 3-, 6-, and 12-month bill rates moved by


about 50 basis points in response to a one percent change in the funds rate target. This suggests that about half of each target change is expected by the time it is realized. This also explains the finding of Estrella and Hardouvelis (1991) that the tilt in the term structure contains information in addition to monetary policy changes. However, since the Fed reacts purposefully to economic events, we cannot automatically say that the Federal funds rate changes are the fundamental causes of interest rate changes - both could be driven by more fundamental shocks. These could be technology shocks, taste shocks, demand shocks or supply shocks. Of course, as Goodfriend (1991) has pointed out, many of these shocks may originate in the Fed as policy mistakes or shifts in political pressures on the Fed. In fact, Bernanke and Blinder (1992) have shown that innovations in the funds rate overwhelmingly represent policy-induced shocks to the supply of reserves. Note that variations in 10TB_1TB, even though somewhat muted, are very similar to those in FR_10TB. Since the Treasury bill rate primarily affects the behavior of those who save and invest rather than of those who borrow (cf. Friedman and Kuttner, 1993a), it seems that monetary policy works primarily by affecting this group of agents first, and 10TB_1TB, like FR_10TB, fundamentally represents the stance of the monetary policy.

In recent years, many authors including Bernanke and Blinder (1992), Friedman and Kuttner (1993b) and Kashyap, Stein and Wilcox (1993) have emphasized the importance of an independent "credit" channel of the monetary transmission mechanism. According to the credit channel, the direct effects of monetary policy on interest rates are amplified by endogenous changes in the external finance premium, which is the difference in costs between funds raised externally and internally. Bernanke and Gertler (1995) suggest two avenues through which monetary policy changes will affect the external finance premium in credit markets: the balance sheet channel (or net worth channel) and the bank lending channel. The balance sheet channel arises because a tight monetary policy directly and indirectly weakens borrowers' balance sheet positions. Beyond its impact on borrowers' balance sheets, monetary policy may also affect the supply of loans by commercial banks. This is the bank lending channel. Thus, in addition to the usual "money" channel which affects liabilities (i.e. deposits), monetary policy also operates via affecting bank assets (i.e. loans) and the net worth of firms. Bernanke and Blinder (1992) showed that the effect on deposits begins immediately and is complete in about nine months. Bank loans, on the other hand, start reacting only after approximately six months, and the entire effect of a decline in deposits is reflected in loans by the end of the second year. Bernanke and Gertler (1995) showed that following a monetary tightening, the adverse balance sheet effects on corporate cash flows and profits tend to peak in about six to nine months. In the U.S., major recessions are often attributed to tight monetary policies implemented primarily to deal with inflationary pressures. The average postwar expansion in the U.S. has lasted a little more than four years. 14 Thus, the time needed for the money or credit channels to work subsequent to a monetary
14 Cf. Diebold, Rudebusch and Sichel (1993, p. 262).


contraction was present. Thus, we cannot tell which of the two channels has been relatively more effective. Most of the previous analysis in this area of research assumed symmetry, so that the explanation of the slow fall in loans after a monetary tightening also explains why loans are slow to rise after a monetary easing, cf. Ramey (1992). However, on the average, a postwar recession lasted only 10-11 months. After recognizing the onset of a recession (which may take at least 2-3 months), it is expected that the Fed will relax its monetary policy. Romer and Romer (1994) have shown that monetary policy has been instrumental in ending each of the eight post-war recessions. The very fact that the economy has always turned around in such a short time following the monetary policy stimulus indicates that the relaxed monetary policy acted not through the credit/loan channel but through the money and balance sheet channels. Our observation does not suggest that an independent loan channel does not exist; in fact, the long expansions can be explained by the delayed effects on output via expanded loan supply. However, our analysis does suggest that the money channel together with the balance sheet effects are adequately effective by themselves as counter-cyclical policy instruments. Ramey (1992) and Romer and Romer (1991) have reached a similar conclusion emphasizing the role of only the conventional money channel. 15 The independent and prompt role of the balance sheet channel is consistent with the fact that post-war recessions have been steeper and shorter than expansions. Gertler and Gilchrist (1994) and Oliner and Rudebusch (1994) have found striking differences in the behavior between large and small firms when they face a corporate cash squeeze. Larger firms, which are more likely to have recourse to commercial paper markets and other sources of short-term credit, typically respond to an unanticipated decline in cash flows by increasing their short-term borrowing. In contrast, small firms - which in most cases have more limited access to short-term credit markets - respond to a cash squeeze by cutting production. Furthermore, these differences between large and small firms are expected to be more important just before recessions in tight money periods. During booms, small firms appear to smooth production in much the same way that large firms do. Thus, during recessions, when liquidity constraints are likely to be binding for many of these firms, an expansionary monetary policy will have a more drastic effect on the economy than during booms. This is consistent with the evidence of asymmetry over the phase of the business cycle that we find in this study. Using a similar framework, Garcia and Schaller (1995) found that monetary policy is more potent during recessions than during expansions.

By comparing 6CP_6TB with FR_10TB and 10TB_1TB (see Tables 2 and 3), we found that the former leads cyclical peaks consistently and with much less lead time than those of the other two. Also, on the average, 6CP_6TB lagged behind
15 Romer and Romer (1993) have recently shown that a large part of the impact of tight monetary policy on bank lending can be attributed to Fed's actions like explicit credit controls, special reserve requirements, moral suasion, etc. aimed at reducing bank loans directly, rather than to an inherent feature of the monetary transmission mechanism. See also Bernanke (1993).


each cyclical trough by nearly two months. On the other hand, FR_10TB and 10TB_1TB always predicted the troughs with a lead time of 2-3 months. These results are consistent with Friedman and Kuttner's (1993b) explanation of why the private-public spread co-moves with business cycles. Based on the presumed imperfect portfolio substitutability between commercial paper and Treasury bills, they proposed three independent explanations. First, the spread directly reflects the perceived default risk, which sensitively summarizes disparate information. Second, a widening paper-bill spread is a symptom of contraction in bank lending due to tighter monetary policy. Finally, cyclical variation of firms' cash flows can impact the commercial paper market in such a way that the paper-bill spread will widen just before and during recessions. We can see that none of these factors would make the paper-bill spread change much in advance of the recessions. For instance, unlike FR_10TB or 10TB_1TB, monetary policy is reflected in 6CP_6TB only after lending starts to contract, which does not occur until at least six months after the initial monetary tightening. Also, as we pointed out earlier, the default risk and changing cash requirements tend to increase not only immediately before recessions but also well into the recessions. Finally, the last recession was signaled by 6CP_6TB with a lag of 5 months, whereas FR_10TB and 10TB_1TB predicted the peak with lead times of 15-18 months. Bernanke and Lown's (1991) analysis has revealed that the lending slowdown caused by a weakened state of borrowers' balance sheets together with the banking sector's "credit crunch" in the prerecession period had precipitated the recession. They have shown that during the year before the beginning of the 1990 recession, the slowdown in bank lending was accompanied by expansions in both commercial paper and finance company lending, which is consistent with the hypothesis that a constraint on bank loan supply initiated the downturn. Owens and Schreft (1992) and Cantor and Wenninger (1993) have also produced evidence in favor of a credit crunch in the prerecession period, and Romer and Romer (1993) have identified December 1988 as one of the seven episodes of significant monetary contraction in the postwar era. This explains why FR_10TB and 10TB_1TB could predict the recession. However, due to the overall weakened state of demand and other factors, the loan channel was not sufficiently powerful to induce a sufficient widening of the paper-bill spread ahead of the peak to generate the recessionary signal. However, like previous recessions, it did give the trough signal with a lag of one month. This simply means that the factors which helped 6CP_6TB to track the recoveries in the past were also present during the last turning point.

5. Conclusion

We have studied the comparative performance of a number of interest rate spreads as predictors of U.S. business cycle turning points over the period 1953-93. In order to map changes in the predictor variables into turning point predictions, we used a non-linear filter developed by Hamilton (1989). In our framework, the dynamic behavior of the economy is allowed to vary between expansions and recessions in terms of duration and volatility.


We concentrated on three spreads which have shown maximum potential in past research. They were the difference between the Federal funds rate and the ten-year Treasury bond rate (FR_10TB), the difference between the ten-year Treasury bond rate and the one-year Treasury bill rate (10TB_1TB), and the spread between the six-month commercial paper and six-month Treasury bill rates (6CP_6TB). Over 1953-1993 the second one, i.e. the tilt of the term structure, did best - it signaled all turning points (peaks and troughs) without any false signal. The peak signals came with an average lead time of nearly 20 months and the trough signals with an average lead time of nearly 3 months. The behavior of the spread based on the Federal funds rate was similar to that of the yield curve, with very similar lead times.

All earlier studies have emphasized the success of the spread variables in predicting peaks and seldom looked into their performance in predicting recoveries. Our analysis reiterates the view that the characteristics of a recessionary regime are quite different from those of an expansionary regime, and that the optimal forecasting horizon for predicting a recession is apt to be much longer than the one for predicting an expansion. We also found that the latest cyclical peak of July 1990 and the trough of March 1991 were forecastable on the basis of 10TB_1TB and FR_10TB alone.

The funds rate spread did not anticipate the recessions of 1957 and 1961, and issued a false signal in 1966. This undesirable performance is not entirely unexpected. During the 1950s the Federal funds market was not fully developed and the variations in the funds rate did not reflect the stance of monetary policy. The sole false alarm was a reflection of the credit crunch of 1966, which was followed by the growth recession of 1966-67. The paper-bill spread did not anticipate the recession of 1990; otherwise it signaled all other recessions with an average lead time of nearly six months. Unlike 10TB_1TB and FR_10TB, however, 6CP_6TB signaled trough turning points with a lag of two months on the average. It also issued six pairs of false signals, most of which, arguably, were associated with growth recessions. Even though the performance of the paper-bill spread was the worst of the three, its record is very similar to that of the Commerce Department's Composite Index of Leading Indicators. Thus, given that interest rates are promptly available and are never revised, the overall performance of the three interest rate spreads has been truly remarkable.

Our empirical results also suggest that the usual "money" and "balance sheet" channels of the monetary transmission mechanism, which work by directly affecting the term structure of interest rates, bank deposits and lending, are more instrumental than the so-called "loan channel" in the conduct of a countercyclical monetary policy. From the standpoint of practical forecasting, the most important empirical result of this study is that the interest rate spreads are capable of signaling business cycles consistently on an ex-ante basis with admirable lead times.


Acknowledgement
An earlier version of this paper was presented at The 7th World Congress of the Econometric Society, Tokyo, August 22-29, 1995. We thank Paul Fisher, Kenneth Kuttner, G. S. Maddala, John Taylor and Victor Zarnowitz for many helpful comments and suggestions.

References
Bernanke, B. S. (1990). On the predictive power of interest rates and interest rate spreads. New England Econom. Rev., Federal Reserve Bank of Boston, November-December, 51-68.
Bernanke, B. S. (1993). How important is the credit channel in the transmission of monetary policy? A comment. Carnegie-Rochester Conf. Ser. on Pub. Pol. 39, 47-52.
Bernanke, B. S. and A. S. Blinder (1992). The federal funds rate and the channels of monetary transmission. Amer. Econom. Rev. 82, 901-921.
Bernanke, B. S. and M. Gertler (1995). Inside the black box: The credit channel of monetary policy transmission. J. Econom. Perspectives 9, 27-48.
Bernanke, B. S. and F. S. Mishkin (1993). The predictive power of interest rate spreads: Evidence from six industrialized countries. Paper presented at the American Economic Association meeting, Anaheim, California.
Bernanke, B. S. and C. Lown (1991). The credit crunch. Brookings Papers on Econom. Activity 2, 205-247.
Burns, A. F. and W. C. Mitchell (1946). Measuring Business Cycles. Cambridge, Mass: NBER.
Cantor, R. and J. Wenninger (1993). Perspective on the credit slowdown. Fed. Res. Bank of N.Y. Quart. Rev. 18, 3-36.
Cook, T. and T. Hahn (1989). The effect of changes in the federal funds rate target on market interest rates in the 1970s. J. Monetary Econom. 24, 331-349.
Dasgupta, S. and K. Lahiri (1993). On the use of dispersion measures from NAPM surveys in business cycle forecasting. J. Forecasting 12, 239-253.
De Gooijer, J. G. and K. Kumar (1992). Some recent developments in non-linear time series modelling, testing, and forecasting. Internat. J. Forecast. 8, 135-156.
Diebold, F. X. and G. D. Rudebusch (1989). Scoring the leading indicators. J. Business 64, 369-391.
Diebold, F. X. and G. D. Rudebusch (1991a). Turning point prediction with the composite leading index: An ex ante analysis. In: K. Lahiri and G. H. Moore, eds., Leading Economic Indicators: New Approaches and Forecasting Records, Cambridge Univ. Press, 231-256.
Diebold, F. X. and G. D. Rudebusch (1991b). Forecasting output with the composite leading index: A real-time analysis. J. Amer. Statist. Assoc. 86, 603-610.
Diebold, F. X., G. D. Rudebusch and D. F. Sichel (1993). Further evidence on business cycle duration dependence. In: J. H. Stock and M. W. Watson, eds., New Research on Business Cycles, Indicators and Forecasting, Univ. Chicago Press for NBER, Chicago, 255-284.
Engel, C. M. and J. D. Hamilton (1990). Long swings in the dollar: Are they in the data and do markets know it? Amer. Econom. Rev. 80, 689-713.
Estrella, A. and G. A. Hardouvelis (1991). The term structure as a predictor of real economic activity. J. Finance 46, 555-576.
Fair, R. C. (1993). Estimating event probabilities from macroeconometric models using stochastic simulation. In: J. H. Stock and M. W. Watson, eds., New Research in Business Cycles, Indicators, and Forecasting, Univ. Chicago Press for NBER, Chicago, 157-176.
Fama, E. F. (1990). Term structure forecasts of interest rates, inflation, and real returns. J. Monetary Econom. 25, 59-76.


Feinman, J. and W. Poole (1989). Federal reserve policy-making: An overview and analysis of the policy process: A comment. Carnegie-Rochester Conf. Series on Pub. Pol. 30, 63-74.
French, M. W. and D. F. Sichel (1993). Cyclical patterns in the variance of economic activity. J. Business Econom. Statist. 11, 113-119.
Friedman, B. M. and K. N. Kuttner (1992). Money, income, prices and interest rates. Amer. Econom. Rev. 82, 472-492.
Friedman, B. M. and K. N. Kuttner (1993a). Another look at the evidence on money-income causality. J. Econometrics 44, 189-203.
Friedman, B. M. and K. N. Kuttner (1993b). Why does the paper-bill spread predict real economic activity? In: J. H. Stock and M. W. Watson, eds., New Research in Business Cycles, Indicators, and Forecasting, Univ. Chicago Press for NBER, Chicago, 213-249.
Garcia, R. and H. Schaller (1995). Are the effects of monetary policy asymmetric? Mimeo, Univ. Montreal, Canada.
Gertler, M. and S. Gilchrist (1994). Monetary policy, business cycles, and the behavior of small manufacturing firms. Quart. J. Econom. 109, 309-340.
Goodfriend, M. (1991). Interest rates and the conduct of monetary policy. Carnegie-Rochester Conf. Ser. on Pub. Pol. 34, 7-30.
Hall, R. E. (1991). The business cycle dating process. NBER Reporter, NBER Inc., Winter 1991/2, 1-3.
Hamilton, J. D. (1988). Rational-expectations econometric analysis of changes in regime: An investigation of the term structure of interest rates. J. Econom. Dynamic Control 12, 385-423.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357-384.
Hamilton, J. D. (1990). Analysis of time series subject to changes in regime. J. Econometrics 45, 39-70.
Hamilton, J. D. (1991). A quasi-Bayesian approach to estimating parameters for mixtures of normal distributions. J. Business Econom. Statist. 9, 27-39.
Hamilton, J. D. (1993). Estimation, inference, and forecasting of time series subject to changes in regime. In: G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics, Vol. 11, North-Holland, Amsterdam, 231-260.
Harvey, C. R. (1988). The real term structure and consumption growth. J. Financ. Econom. 22, 305-333.
Hicks, J. (1950). A Contribution to the Theory of the Trade Cycle. Oxford: Clarendon Press.
Karamouzis, N. and R. Lombra (1989). Federal reserve policymaking: An overview and analysis of the policy process. Carnegie-Rochester Conf. Series on Pub. Pol. 30, 7-62.
Kashyap, A. K., J. C. Stein and D. W. Wilcox (1993). Monetary policy and credit conditions: Evidence from the composition of external finance. Amer. Econom. Rev. 83, 79-98.
Keynes, J. M. (1936). The General Theory of Employment, Interest, and Money. London: Macmillan.
Koenig, E. F. and K. M. Emery (1991). Misleading indicators? Using the composite leading indicators to predict cyclical turning points. Fed. Res. Bank of Dallas Econom. Rev. (July), 1-14.
Koenig, E. F. and K. M. Emery (1993). Why the composite index of leading indicators doesn't lead. Contemp. Pol. Issues 12, 52-66.
Laurent, R. D. (1988). An interest rate-based indicator of monetary policy. Econom. Perspectives, Fed. Res. Bank of Chicago, January/February, 3-14.
Laurent, R. D. (1989). Testing the 'Spread'. Econom. Perspectives, Fed. Res. Bank of Chicago, July/August, 22-34.
Lahiri, K. and J. G. Wang (1994). Predicting cyclical turning points with leading index in a Markov switching model. J. Forecasting 13, 245-263.
McNees, S. K. (1991). Forecasting cyclical turning points: The record in the past three recessions. In: K. Lahiri and G. H. Moore, eds., Leading Economic Indicators: New Approaches and Forecasting Records, Cambridge University Press, Cambridge, 151-168.
McNees, S. K. (1992). How large are the economic forecast errors? New Engl. Econom. Rev., Fed. Res. Bank of Boston, July/August, 25-42.


Mishkin, F. S. (1990). What does the term structure tell us about future inflation? J. Monetary Econom. 25, 77-95.
Neftci, S. N. (1982). Optimal prediction in cyclical downturns. J. Econom. Dynamic Control 4, 225-241.
Neftci, S. N. (1984). Are economic time series asymmetric over the business cycle? J. Politic. Econom. 92, 305-328.
Oliner, S. and G. Rudebusch (1994). Is there a broad credit channel? Mimeo, Board of Governors, Washington, D.C.
Owens, R. E. and S. L. Schreft (1993). Identifying credit crunches. Fed. Res. Bank of Richmond, Working Paper No. 93-2, Richmond, Virginia.
Ramey, V. A. (1993). How important is the credit channel in the transmission of monetary policy? Carnegie-Rochester Conf. Ser. on Pub. Pol. 39, 1-45.
Romer, C. D. and D. H. Romer (1994). What ends recessions? In: S. Fischer and J. Rotemberg, eds., NBER Macroeconomics Annual 1994, MIT Press: Cambridge, Mass., 13-57.
Romer, C. D. and D. H. Romer (1993). Credit channels or credit actions? An interpretation of the postwar transmission mechanism. NBER Working Paper No. 4485, October.
Romer, C. D. and D. H. Romer (1990). New evidence on the monetary transmission mechanism. Brookings Papers on Econom. Activity 1, 149-213.
Sichel, S. (1989). Are business cycles asymmetric? A correction. J. Politic. Econom. 97, 1255-1260.
Sims, C. A. (1993). A nine-variable probabilistic macroeconomic forecasting model. In: J. H. Stock and M. W. Watson, eds., New Research on Business Cycles, Indicators, and Forecasting, University of Chicago Press, Chicago, 179-212.
Stambaugh, R. F. (1988). The information in forward rates: Implications for models of the term structure. J. Finan. Econom. 21, 41-70.
Stock, J. H. and M. W. Watson (1989). New indexes of leading and coincident economic indicators. In: O. Blanchard and S. Fischer, eds., NBER Macroeconomics Annual, 351-394.
Stock, J. H. and M. W. Watson (1990a). Business cycle properties of selected U.S. economic time series, 1959-1988. NBER Working Paper No. 3376.
Stock, J. H. and M. W. Watson (1990b). A probability model of the coincident economic indicators. In: K. Lahiri and G. H. Moore, eds., Leading Economic Indicators: New Approaches and Forecasting Records, Cambridge University Press, 63-89.
Stock, J. H. and M. W. Watson (1993). A procedure for predicting recessions with leading indicators: Econometric issues and recent experience. In: J. H. Stock and M. W. Watson, eds., New Research on Business Cycles, Indicators, and Forecasting, University of Chicago Press, Chicago, 95-153.
Zarnowitz, V. (1992). Business Cycles: Theory, History, Indicators, and Forecasting. The University of Chicago Press, Chicago.
Zellner, A. and C. Hong (1989). Forecasting international growth rates using Bayesian shrinkage and other procedures. J. Econometrics 40, 183-202.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

11

Nonlinear Time Series, Complexity Theory, and Finance*

William A. Brock and Pedro J. F. de Lima

1. Introduction

This article describes statistical aspects of a line of recent work in finance that is associated with the words "nonlinearity," "long term dependence," "fat tails," "chaos theory," and "complexity theory." We shall give a rather lengthy introduction in order to give the reader a road map through the issues taken up here. In the spirit of a road map, we shall indicate Section headings where each issue is discussed in detail or give references if the issue is not dealt with in this article.

Before we begin let us give a brief overview of some recent "trendy" topics which shall play a role in this article. Centers of research in complexity theory such as the Brussels School, the Stuttgart School, the Santa Fe Institute, and hosts of other related Centers and Institutes springing up around the world are turning to computer based methods as well as analytical methods to study phenomena that lie within the rubric of "complex systems." Indeed highly publicized centers such as the Santa Fe Institute (SFI) place the computer and various types of "Adaptive Computation Methods" and "Artificial Life" at the center of their research strategies. In general the SFI methods blend together ideas from economics, evolutionary biology, computer science, interacting systems theory, and statistical mechanics. A good statement of the SFI approach for economics and finance is in the SFI July, 1993 newsletter, edited by LeBaron. A good example of SFI style work in finance is the work on "artificial stock markets" by Arthur, Holland, LeBaron, Palmer, and Taylor (1993). In that work different species of trading strategies coevolve as they strive to maximize a measure of fitness, i.e., profits. The system is designed to run on a desktop computer and could be viewed as a form of "artificial economic life" in the SFI sense of that term. There are no analytical results available for the Arthur et al. system.

* The first author would like to thank the National Science Foundation (Grant SBR-9422670) and the Vilas Trust for financial support. The paper has benefited from comments by Michelle Barnes, Craig Hiemstra, Blake LeBaron, G. S. Maddala, and J. Huston McCulloch. The usual disclaimer applies.


The book edited by Friedman and Rust (1993) contains somewhat related work along analytical, experimental, and empirical lines. There is a section on the design and experience of the SFI evolutionary tournament where trading strategies competed against each other in a setting reminiscent of Axelrod's famous work on evolutionary tournaments for prisoner's dilemma games.

Finance works such as Brock (1993), Friggit (1994), and Vaga (1994) fall into the interacting systems category. Vaga (1994) builds on his earlier works which apply statistical mechanics to build a stock market model that can exhibit phase transitions. Friggit (1994) uses statistical mechanics type methods to propose and study a theory of evolutive dynamics for high frequency foreign exchange markets. Brock (1993) builds a theory based on a unification of discrete choice theoretic modelling from econometrics, received asset pricing theories, and statistical mechanics. More will be said about this kind of theory in Section 4 below.

The field of statistics itself has been moving in a related direction. Simulation based methods such as the Bootstrap (P. Hall (1994), Maddala and Li (1995)) and the Dynamic Method of Simulated Moments (Duffie and Singleton (1993) and references to McFadden (1989) and Pakes and Pollard (1989)) are pushing analytical methods such as asymptotic expansions (first and higher order) off the center of the stage. We shall devote part of this article to an argument for a style of research in statistical finance where models inspired by direct theoretical arguments are estimated by computer assisted methods such as MSM, and where model adequacy (specification testing) is done by bootstrapping financially relevant quantities under the null. That is to say, the quantities that are inputted into the specification tests are themselves motivated by the type of economic and financial behavior one is trying to study. For example, distributions of statistics gleaned off of trading strategies are bootstrapped under the null model being tested in Brock, Lakonishok, and LeBaron (1992), and Levich and Thomas (1993).
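
To make the idea of bootstrapping a financially relevant quantity under a null model concrete, here is a minimal Python sketch. It is our own illustration, not code from Brock, Lakonishok, and LeBaron (1992) or Levich and Thomas (1993); the moving-average rule, the iid-normal null, and all parameter values are arbitrary stand-ins.

import numpy as np

def ma_rule_mean_return(returns, window=50):
    """Mean next-period return earned when the (log) price sits above its moving average."""
    prices = np.cumsum(returns)                  # log-price path built from returns
    picked = []
    for t in range(window, len(prices) - 1):
        if prices[t] > prices[t - window:t].mean():   # long when above the MA
            picked.append(returns[t + 1])
    return np.mean(picked) if picked else 0.0

rng = np.random.default_rng(0)
observed = rng.normal(0.0005, 0.01, 2000)        # stand-in for a real return series
stat_obs = ma_rule_mean_return(observed)

# Bootstrap the rule's mean return under an iid-normal null calibrated to the data.
mu, sigma = observed.mean(), observed.std(ddof=1)
boot = np.array([ma_rule_mean_return(rng.normal(mu, sigma, observed.size))
                 for _ in range(500)])
p_value = np.mean(boot >= stat_obs)              # one-sided bootstrap p-value
print(f"rule mean return {stat_obs:.5f}, bootstrap p-value {p_value:.3f}")

The point of the design is that the bootstrapped quantity (the trading-rule return) is itself economically motivated, rather than a generic goodness-of-fit statistic.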
1.1. Complexity theory & finance

While "complexity theory" sometimes is taken to include chaos theory, we shall not spend much time on chaos theoretic applications to finance here. That topic has been covered by many reviews including Abhyankar, Copeland, and Wong (1995), Brock, Hsieh, and LeBaron (1991), Creedy and Martin (1994), LeBaron (1994), and Scheinkman (1992). "Complexity" theory is a rather vague term. We use it here to refer to the research practices of centers such as the Brussels School (e.g. Prigogine and Sanglier (1987)), the Stuttgart School (e.g. Weidlich (1991)), and the Santa Fe Institute. Indeed the notion of "complexity" is so hard to define that a recent SCIENTIFIC AMERICAN article on the subject by Horgan (1995) quotes the MIT physicist Seth Lloyd as having compiled a list of at least 31 different definitions of "complexity" that have been proposed. We shall take the strategy here of an "intellectual factor analysis." That is, we extract a few broad themes that capture the bulk of the research practices of "complexity research" that we wish to cover for this particular article.


An important subset of these research practices includes building dynamical systems models of the form Y_t = h(X_t, ε_t), X_t = F(X_{t-1}, η_t, θ), where X_t is the state vector at date t, Y_t is the vector of observables emitted by the system, ε_t is a stochastic shock that may hit the observer function h at time t, η_t is a stochastic shock that may hit the system's law of motion F at time t, and θ is a vector of "tuning" parameters or "slow changing" parameters. The long run behavior of the system for each fixed θ is studied by a mixture of analytic and computer-based methods. Then θ is varied to study how this long run behavior changes. These changes are associated with "emergent behavior" or "emergent structure."

There is also a major subtheme of this line of research which emphasizes how "simple rules" F can induce complicated behavior of the observables Y. The hope of this subtheme of research is to use a combination of computer based and analytic based methods to catalogue "universality classes of F's" as mechanisms to generate different types of "complexity" and to use this research strategy to unearth a small number of universality classes of F's that generate the complex behavior we see in Nature. Wide classes of systems are searched to catalogue similar "species" of emergent structure. "Routes to chaos" such as period doubling cascades of bifurcations are well known examples of this type of methodology. Descriptions of this type of research are in Allen and McGlade's study of fisheries (in Prigogine and Sanglier (1987)), Weidlich's survey of Stuttgart School research (Weidlich (1991)), and Krugman's discussion of uses of this style of research in international trade and economic geography (Krugman (1993)).

An interesting subtheme of "complexity" theory is the research into complicated systems whose inner mechanisms are so complex that they are studied by searching for "scaling laws" in observables emitted by the systems, where "scaling laws" are broadly interpreted to include regularities of autocorrelations and cross correlations of asset returns, volatility of asset returns, and volume of trading across different assets such as different stocks, foreign exchange, etc. The intent in searching for these "scaling laws" is the hope that they will be robust to the details of a particular complex system and that they will be approximately the same across broad classes of complex systems. Note the similarity to the objective of finding broad classes of dynamical systems where the "emergent behavior" as θ changes is the same within that class. The hope that there are "universal scaling laws" across widely disparate complex systems is one of the wellsprings that drives this style of research.

A drawback of this "universal scaling laws" style of research is that most of the scaling laws are unconditional statistical objects, whereas, in finance at least, we are much more interested in conditional probabilities. In many cases the set of data generating stochastic processes consistent with a given "scaling law" may just be too large to be of much interest in finance. As an extreme example consider the set of stochastic processes {X_t} consistent with the Central Limit Theorem. That particular kind of "universal scaling law" is of limited usefulness in discriminating among alternative data generating mechanisms in finance. Let us explain what we mean in more detail.


Much of statistics and econometrics centers around "Root N Central Limit Theorem" scaling:
n^{-1/2} Σ_{i=1}^{n} (X_i − EX_i) ⇒ N(0, V),    n → ∞.

Here ⇒ denotes weak convergence; N(0, V) denotes a normal distribution with mean zero and variance V; and {X_t} is a stochastic process with enough regularity so that the CLT is valid. For example {X_t} could be weakly nonstationary and weakly dependent and the CLT would still be valid. But, while such scaling is used in other ways, such as hypothesis testing, it is not very useful as a discriminator across the class of potential data generating mechanisms. We shall be concerned in this article with mechanisms that lead to scaling that is not Root N. While such scaling still suffers from being a crude discriminator across the class of potential data generating processes, the hope is that scaling different from the CLT will lead to useful insights into what classes of data generating processes can generate such non Root N scaling.

A good example of this particular style of research is Bak and Chen (1991), who attempt to show that a particular class of probabilistic cellular automata, called "sandpile" models, are good abstractions for a broad variety of complex systems encountered in nature. Think of a real sandpile sitting on a table with sand being dropped upon it from above, and think of "sandslides," i.e., "avalanches," of various sizes being triggered by this falling sand when the sandpile reaches "criticality." Furthermore they argue that sandpile models exhibit "power law scaling" of observables such as the distribution of avalanche size, and such power law scaling, such as "1/f" noise, is widely observed in nature. They argue that the robustness of the power law scaling to the details of particular sandpile automata is a "universal" property which makes the sandpile automaton a particularly useful metaphor for mechanisms that lead to power law scaling. In economics Scheinkman and Woodford (1994) have argued that local interactions and strong nonlinearities can combine through forward and backward linkages to create a breakdown of the Root N central limit theorem in a model of inventory dynamics which is built along the lines of the sandpile model. "Final demand" plays the role of the driving force of falling sand in Scheinkman and Woodford.

A similar theme shows up with scaling near "phase transitions" in interacting systems models, where it is argued that the form of this scaling is surprisingly robust to the details of the particular model under scrutiny (Ellis (1985, pp. 178-9)). The common criticism that interacting systems models require "tuning" of an exogenous parameter to generate non Root N scaling can be blunted by reformulation within the context of discrete choice random utility theory where the intensity of choice (the "tuning parameter") becomes endogenous along the lines of Brock (1993).


All that needs to be done to endogenize the intensity of choice is to make it a function of the difference between the utilities of the choice alternatives. This can be motivated by modelling the tradeoff between costly choice effort and the gain in utility from expending such effort. One tractable way to do this is to set up a two stage problem where the first stage chooses {p_i} to maximize the entropy E = −Σ_i p_i ln(p_i) subject to Σ_i p_i U_i = U(e) and Σ_i p_i = 1, and the second stage chooses effort, e, to maximize Σ_i p_i(e) U_i − c(e), where p_i(e) is the probability of choice i from the first stage. This can be viewed as an adaptation of the ideas of E. T. Jaynes into an economic tradeoff where U(e) represents the average amount of utility garnered from random choice when effort level e is put into it. See Brock (1993) for references to Jaynes and more on the relationship between maximum entropy, discrete choice, and statistical mechanics. In any event, whatever one's opinion on the need for "tuning" an outside parameter to "criticality," interactions models that generate non Root N scaling may play a role in understanding financial forces that lead to long term dependence and apparent non Root N scaling in the empirical work described in Section 3.

Turn now to discussion of structural and empirical modelling by frequency. This is motivated by the belief that the economic forces differ by frequency.
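
Before turning to frequencies, a minimal numerical sketch of the two-stage entropy/effort problem just described may help fix ideas. The utilities, the linear link between effort and the intensity of choice, and the quadratic effort cost below are our own illustrative assumptions, not taken from Brock (1993); the first stage yields logit ("maximum entropy") choice probabilities, and the second stage picks effort by grid search.

import numpy as np

U = np.array([1.0, 0.4, -0.2])          # utilities of the discrete alternatives (illustrative)

def choice_probs(beta):
    """First stage: maximum-entropy probabilities subject to a mean-utility
    constraint take the logit form with intensity of choice beta."""
    w = np.exp(beta * (U - U.max()))     # subtract max for numerical stability
    return w / w.sum()

def net_utility(effort, cost=0.5):
    """Second stage objective: expected utility of the random choice minus a
    convex effort cost; intensity of choice is assumed to rise linearly with effort."""
    beta = 2.0 * effort
    return choice_probs(beta) @ U - cost * effort**2

grid = np.linspace(0.0, 3.0, 301)
best = grid[np.argmax([net_utility(e) for e in grid])]
print(f"effort* = {best:.2f}, beta* = {2.0*best:.2f}, "
      f"choice probabilities = {np.round(choice_probs(2.0*best), 3)}")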

1.2. Frequency based study

1.2.1. Theoretic models

It is useful to organize discussion of structural theoretic based models and empirical/statistical models in finance by frequencies. At the highest frequencies, tic by tic for example, the market microstructural institutions surely matter. Phenomena such as bid/ask bounce and nonsynchronous trading surely loom large. See the work of Grossman, Miller, Froot, Schwartz and others in the Smith Report (1990) and the work of Domowitz and his co-authors in Friedman and Rust (1993) for discussion of institutional rules, their impact on price discovery and volatility, as well as time series properties of returns at high frequency. Domowitz shows that an institutional quantity which he calls "the length of the order book" plays a key role in inducing time series properties of returns, the bid/ask spread, and volatility at the very high frequencies. For another example, the reader should examine the work of Froot, Gammill, and Perold for the Smith Report (1990) in order to see how the autocorrelation function at the 15 minute frequency for the S&P 500 cash index has moved dramatically closer to zero over the period 1983-1989, and the possible explanations given there. They argue that reduction in transactions costs coupled with new trading practices such as portfolio and futures trading have acted to impound new information into prices much more rapidly than before. The possibilities that changes in bid/ask bounce or changes in non trading effects explain the drop in predictability are discounted by Froot et al.


Experimental and theoretical work on auction theory and market microstructural institutions (for example see the discussion in Friedman and Rust (1993) and the Roger Smith Report (1990)) has documented differences in performance of different "auction" systems. We put the noisy rational expectations models discussed in Grossman's book (1989) into the Medium to High frequency class. The Highest frequency class contains market microstructure models like those surveyed by Goodhart and O'Hara (1994) as well as the differences in auction institutions discussed above. Recent surveys at the Very High frequency are Goodhart and O'Hara (1994) and Guillaume et al. (1994). Methods designed to analyze financial phenomena at approximately weekly frequencies go into the Medium Frequency class.

In order to organize our discussion in this article we shall view the market micro structure as operating at the highest frequency (from tic by tic to the 15 minute frequency, perhaps), and the information arrival process and price discovery itself as taking place at the next highest frequency (15 minute frequency to daily frequency, perhaps). We shall view the "discovered" prices themselves as moving at the next highest frequency. We shall also view phenomena such as bid/ask bounce and nonsynchronous trading as occurring at, perhaps, a slightly higher frequency than price discovery itself.

It is well known that there are "daily seasonalities" in volume of trade and volatility of returns associated with the open and the close. This intraday seasonality causes problems for time series analysis. Andersen and Bollerslev (1994) show that application of "traditional time series methods [...] to raw high frequency returns may give rise to erroneous inference about the return volatility dynamics. [...] Moreover, de-seasonalization appears critical in uncovering the complex link between the short- and long-run return components, which may help explain the apparent conflict between the long-memory volatility characteristic observed in interday data and the rapid short-run decay associated with news arrivals in intraday data."

For example Brock and Kleidon (1992) propose a model that "explains" bid/ask spreads over the trading day from open to close and discuss evidence interpreted in the context of their model versus alternative models. We view this type of phenomena as, possibly, taking place at a higher frequency than the frequency of asymmetric information based theories such as Grossman (1989), but taking place at a lower frequency than phenomena induced by Domowitz's trading institutions in Friedman and Rust (1993).

At the other extreme, the lowest frequency is the growth frequency studied by Mehra (1991), for example. At this frequency long run movements in (i) technical change, (ii) institutional change in the private sector, (iii) institutional change in the government sector, (iv) the age distribution of the population, etc. play the major role. We shall put methods designed to analyze monthly and lower frequencies into the Low Frequency class. One should think of these frequencies as business cycle frequencies or lower. For example, we put the Euler Equation and Consumption Based Capital Asset Pricing Model (CCAPM) based methods surveyed by Altug and Labadie (1994), Campbell, Lo, and MacKinlay (1993), and Singleton (1990), as well as models which focus on finance constraints and mean reversion such as Jog and Schaller (1994), into the low frequency class.


We also put the structural exchange rate models based on explicit modelling of the demand for money, which are surveyed by Altug and Labadie (1994), into the low frequency class. Of course some of these phenomena may operate at a higher frequency. The boundary we are trying to draw here is very vague.
1.2.2. Statistical models

In this article we wish to exposit some work that lies at the boundary of theory based structural approaches and "econometric" approaches. We also wish to put forward and motivate a view of specification testing in financial econometrics that may be somewhat controversial. Before we get formal let us try to explain in plain English what we mean.

In the work surveyed by Singleton (1990) and the recent related line of work by Duffie and Singleton (1993) an explicit theoretical economic model forms the basic launch point of the statistical analysis. In Singleton (1990) much of the analysis flows from the Lucas (1978) pure exchange asset pricing model and its relatives. Singleton (1990) concludes that "comovements in consumption and various asset returns are not well described by a wide variety of representative agent models of price determination." de Fontnouvelle (1995) surveys studies, including his own, which take transactions costs into account. Potentially realistic transactions costs appear to reduce some of the conflict with data. In Duffie and Singleton (1993) the production based asset pricing models of Brock (1982) and Michener (1984) serve as the launch point for the statistical analysis, which itself is a dynamic extension of the Simulated Method of Moments of McFadden (1989) and Pakes and Pollard (1989).

Contrast this approach with the ARCH literature, which constructs statistical models of asset returns and estimates them with few attempts to directly derive such models from an underlying theoretic structure. Here the pure economic theory that serves as the foundation of the type of work surveyed by Singleton (1990) lies, at best, in the background of the "purely statistical" work discussed in surveys like Bollerslev, Engle, and Nelson (1994).

To illustrate this point, consider the following example. Most asset pricing models such as those treated in the book by Altug and Labadie (1994) generate an equilibrium asset pricing function of the form p_t = p(y_t) where y_t is a low dimensional state vector for the system. ARCH-type models are intended to model the innovations ε_t = p_t − E_{t-1}[p_t], where E_{t-1}[X] denotes the expectation of the random variable X conditional on the information available at t − 1. Consider the broad class of ARCH models ε_t = σ_t Z_t where {Z_t} is a sequence of independent and identically distributed (iid) random variables with mean zero and variance one, with a distribution that is symmetric about zero (e.g. normal), and σ_t^2 (the conditional variance of ε_t) is a function of past ε's and σ's. Call these ARCH processes "symmetric ARCH processes." We shall show that {ε_t} symmetric ARCH almost implies that p(·) is essentially linear, i.e.,

E_{t-1}[p(y_t)] = p(E_{t-1}[y_t])    for all past y's.

This may imply unpleasant restrictions on the primitives of asset pricing models like those used in Lucas (1978), Brock (1982), and Duffie and Singleton (1993). For example, in the context of the models of Brock (1982) and Duffie and Singleton (1993), this is close to requiring that the utility function be logarithmic and the production function be Cobb-Douglas with multiplicative shocks. One may not wish to impose such structure on the primitives of the model. In any event the implication that p(y_t) is linear in the state variable y_t is potentially unpleasant. We state

PROPOSITION 1. Assume y_t is one dimensional, p_t = p(y_t), p(·) increasing in y. Furthermore assume η_t ≡ y_t − E_{t-1}[y_t] and ε_t ≡ p_t − E_{t-1}[p_t] are conditionally (on past y's) symmetrically distributed with mean zero and finite variance, with unique conditional medians of zero. Then p(E_{t-1}[y_t]) = E_{t-1}[p(y_t)] for all past y's.

PROOF. By assumption, Prob{ε_t = p(E_{t-1}[y_t] + η_t) − E_{t-1}[p(E_{t-1}[y_t] + η_t)] ≤ 0} = 1/2 = Prob{η_t ≤ p^{-1}(E_{t-1}[p(E_{t-1}[y_t] + η_t)]) − E_{t-1}[y_t]}. Now, by assumption η_t is conditionally symmetrically distributed about zero, so the conditional median of η_t is zero. Hence p^{-1}(E_{t-1}[p(E_{t-1}[y_t] + η_t)]) − E_{t-1}[y_t] = 0. Thus E_{t-1}[p(E_{t-1}[y_t] + η_t)] = p(E_{t-1}[y_t]). Q.E.D.

This type of proposition can be generalized to p(y_t, y_{t-1}, ..., y_{t-L}) by following the above argument for the first component. While ARCH models can easily accommodate non symmetrically distributed innovations, empirical applications of ARCH models commonly assume symmetry of the innovations.1 Furthermore, the survey of Bollerslev, Engle, and Nelson (1994) contains no work which studies the "inverse mapping" between the statistical structure assumed in the ARCH-type model being estimated and the underlying structure imposed upon the utilities, production functions, and market institutions of the underlying asset pricing model that would give inspiration or motivation for the ARCH-type model being estimated. We gave a sample above of what such research might look like.2

1 Normal, Student-t, generalized Student-t, and generalized error distributions appear to be the more commonly used distributions. An exception is the semiparametric ARCH model of Engle and Rivera (1993).
2 Note that one could test the symmetry of the innovations' distribution by testing the hypothesis that Prob(S_t = 1) = 0.5, where S_t = sgn(ε_t) and sgn(x) equals the sign of x. This result holds even if the innovation sequence {Z_t} is a dependent process, as is the case for some more general ARCH representations - the weak-ARCH structure presented in Bollerslev, Engle and Nelson (1994).
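
The following toy simulation illustrates the message of Proposition 1 from the converse direction: with a conditionally symmetric state shock η_t, a nonlinear increasing pricing function produces a price innovation ε_t that is skewed with nonzero median, so the symmetry premises can hold only when p(·) is essentially linear. The functional forms and parameter values are our own illustrative choices.

import numpy as np

rng = np.random.default_rng(1)
Ey = 0.0                                   # conditional mean of the state y_t
eta = rng.normal(0.0, 0.5, 200_000)        # symmetric state shock eta_t

def innovation(p):
    """Price innovation eps_t = p(Ey + eta) - E[p(Ey + eta)] for pricing rule p."""
    prices = p(Ey + eta)
    return prices - prices.mean()

for name, p in [("linear p(y) = 1 + 2y   ", lambda y: 1.0 + 2.0 * y),
                ("nonlinear p(y) = exp(y)", np.exp)]:
    eps = innovation(p)
    skew = np.mean(eps**3) / np.mean(eps**2) ** 1.5
    print(f"{name}: median(eps) = {np.median(eps):+.4f}, skewness = {skew:+.3f}")
# Only the linear rule keeps eps symmetric with median near zero, consistent with
# symmetric-ARCH innovations (essentially) forcing a linear pricing function.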


At the daily frequency it is typical to associate movements in returns, volume, and volatility with the arrival of information. Development of a structural based approach that parallels the research in Singleton (1990) is still a topic for future research. For example, Lamoureux and Lastrapes (1994) quote Gallant, Rossi, and Tauchen (1992, p. 202) as saying the following about theoretically based models: "...they have not evolved sufficiently to guide the specification of an empirical model of daily stock market data." Lamoureux and Lastrapes (1994) go on to develop a "statistical" model of daily stock returns and daily volume. In Section 4 below, we briefly describe some structural modelling that attempts to go part way towards an empirical model of the daily stock market. We believe that this gap between the structure of the theoretic models that inspire the econometric models and the structure of the econometric models which are actually estimated will vanish as developments in extensions of the bootstrap to financial time series problems and developments in extensions of dynamic methods of simulated moments proceed. Computational advances such as those techniques discussed in Judd's forthcoming book (1995) will play a key role.
1.3. Organization of the paper

This review is organized as follows. Section 1 contains the introduction. Section 2 discusses several tests of nonlinearity including the bispectral skewness test of Subba Rao and Gabr (1980), Hinich (1982), and the BDS test of Brock, Dechert, and Scheinkman (1987). It is pointed out that these tests are inconsistent. There are departures from linearity that these tests cannot detect. A discussion of some consistent tests follows. However, rejections of linearity of asset returns are common when these tests are used. The main issue in finance does not seem to be the inability to detect departures from linearity because rejections of linearity are so frequent. The main issue is to find reasons for the rejections. The discussion turns to the possibility that fat tailed returns distributions may be responsible for the rejections. This motivates methods of estimation of tail thickness. The discussion will present evidence that some of the tests reject the null too frequently under moment conditions appropriate for use on the heavy tailed data common in financial applications. Section 3 explores possible nonstationarities and long term dependencies, such as long memory, in asset returns. In view of the recent interest in long memory processes both in academic finance and in more popular writings in finance, we provide a fairly complete discussion of long memory both in returns and volatility of returns. Topics covered include Fractionally Integrated Generalized AutoRegressive Conditional Heteroskedasticity (FIGARCH), a cousin (FIEGARCH), Stochastic Volatility Models, Hurst Exponents, and Rescaled Range statistics. It is shown that the rescaled range test for long term dependence can be fooled by short term dependent Markov switching stochastic processes such as Hamilton and Susmel's (1994) SWARCH models. But the Hurst Exponent itself is more robust against this form of short term dependence. Section 4 gives a brief


discussion of the use of asymmetric information theory to generate potential explanations for the stylized features of autocorrelations and cross correlations among returns, volatility of returns, and trading volume. Furthermore we show how a modification of received asymmetric information theory can serve as a potential explanation of abrupt changes in returns, volatility of returns, and trading volume that seem inexplicable by changes in "news." We provide some concluding remarks at the end of the paper.

2. Nonlinearity in stock returns

We shall confine the meaning of the word "nonlinearity" to methods or models that cannot be analyzed by reduction to linearity via a change of units or by extension of analogues of linear methods to higher conditional moments beyond conditional means. We must define what we mean by "stochastic linearity." Following Brock and Potter (1993, and references therein to Hall and Heyde, and Priestley), call a zero mean strictly stationary stochastic process {y_t} with enough regularity so that it possesses a one sided (causal) Wold representation iid (mds) linear if it has a representation y_t = Σ_{j=0}^{∞} ψ_j ε_{t-j} where the {ε_s} are Independent and Identically Distributed (a Martingale Difference Sequence). We say {ε_s} is an mds (with respect to the σ-algebra generated by past ε's) if

E[ε_s | ε_{s-1}, ε_{s-2}, ...] = 0,    for all s.


Note that GARCH models with zero conditional means are mds-linear. Also note that since the Wold representation is essentially unfalsifiable (unless one tested for nonstationarity itself), it is not useful to call a strictly stationary process {y_t} "linear" if it has a moving average representation with uncorrelated errors. For this reason the notions of iid (mds) linearity are introduced. Note also that mds linearity implies that the best Mean Squared Error predictor is the best Linear predictor.
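
As a quick illustration of the claim that GARCH processes with zero conditional mean are mds-linear, the sketch below simulates a GARCH(1,1) series (illustrative parameter values) and compares sample autocorrelations of the levels, which are negligible, with those of the squares, which are not.

import numpy as np

def simulate_garch11(n, omega=0.05, alpha=0.10, beta=0.85, seed=2):
    """eps_t = sigma_t*Z_t with sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    for t in range(n):
        eps[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * eps[t]**2 + beta * sigma2
    return eps

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

eps = simulate_garch11(50_000)
for k in (1, 2, 5):
    print(f"lag {k}: acf(eps) = {acf(eps, k):+.3f}   acf(eps^2) = {acf(eps**2, k):+.3f}")
# Levels are serially uncorrelated (an mds), while squares are persistent.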

2.1. Lagrange multiplier and portmanteau tests of nonlinearity


A wide variety of tests for nonlinearity is available in the literature. We can broadly divide these tests into two categories, namely, tests designed with an alternative in mind - such as the Lagrange multiplier class of tests (Rao's score test) - and portmanteau tests. Granger and Teräsvirta (1993) show that many of the available tests of nonlinearity have a Lagrange multiplier (LM) type interpretation. This class includes the Tsay (1986) test, the RESET tests of Ramsey (1969) and Thursby and Schmidt (1977), the neural network test of Lee, White and Granger (1993), White's (1987) dynamic information test, LM tests against ARCH effects (Engle (1982) and McLeod and Li (1984)), the LM tests of Saikkonen and Luukkonen (1988) against bilinear alternatives and exponential autoregressive models, and the LM test of Luukkonen, Saikkonen and Teräsvirta (1988) against smooth transition autoregressive models.


Two portmanteau tests of linearity are the bispectrum test of Subba Rao and Gabr (1980) and Hinich (1982) and the BDS test (1987). These two tests are among the few nonlinearity tests that do not have a Lagrange multiplier type-test interpretation, and both tests are known to have power against a wide variety of nonlinear alternatives. This last characteristic has made these two tests quite popular among practitioners. The bispectrum test is based on the fact that for a zero-mean linear process y_t the skewness function

|B(ω_1, ω_2)|^2 / [S(ω_1) S(ω_2) S(ω_1 + ω_2)]        (2.1)

is constant for all pairs of frequencies (ω_1, ω_2). B(ω_1, ω_2) is the power bispectrum - the Fourier transform of the third-order cumulant E[y_t y_{t+h} y_{t+k}] - and S(ω) is the power spectrum - the Fourier transform of E[y_t y_{t+k}]. Hinich's (1982) test of linearity looks at the dispersion of estimates of the skewness function at different frequencies.

The BDS test is a function of the Grassberger-Procaccia correlation integral, C_{δ,m} = [N(N−1)/2]^{-1} Σ_{1≤s<t≤N} χ_δ(||Y_t^m − Y_s^m||), for N observations of the time series y_t, where Y_t^m ≡ (y_t, y_{t+1}, ..., y_{t+m-1}), ||·|| is the max-norm, and χ_δ(·) is the symmetric indicator kernel with χ_δ(x) = 1 if |x| < δ and 0 otherwise. BDS (1987) show that if y_t is iid, then C_{δ,m} → (C_{δ,1})^m as N → ∞, and the statistic

BDS_{δ,m} = √N [C_{δ,m} − (C_{δ,1})^m] / S_{δ,m}        (2.2)
converges in distribution to a standard normal distribution, for δ > 0 and m = 2, 3, .... S_{δ,m} is an estimate of the asymptotic standard deviation of √N(C_{δ,m} − (C_{δ,1})^m) under the null of iid. A simple interpretation of the test can be given by noting that C_{δ,m} is an estimator of Prob{||Y_t^m − Y_s^m|| < δ}, while C_{δ,1} is an estimator of Prob{|y_t − y_s| < δ}. Under the null of iid, Prob{||Y_t^m − Y_s^m|| < δ} = Prob{|y_t − y_s| < δ, ..., |y_{t+m-1} − y_{s+m-1}| < δ} = (Prob{|y_t − y_s| < δ})^m; that is, the BDS test estimates the difference between the joint distribution and the product of the marginal distributions over the appropriate intervals. Note that this analogy is not complete because there might be some overlap between y_{t+i} and y_{s+j}.
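
A minimal sketch of the correlation-integral computation behind the BDS statistic follows; it is our own simplified illustration, and in particular the scale in the denominator comes from a naive permutation bootstrap rather than from the asymptotic standard deviation S_{δ,m} of BDS (1987).

import numpy as np

def correlation_integral(y, m, delta):
    """C_{delta,m}: fraction of pairs of m-histories within distance delta (max-norm)."""
    n = len(y) - m + 1
    Y = np.column_stack([y[i:i + n] for i in range(m)])        # m-histories as rows
    dist = np.max(np.abs(Y[:, None, :] - Y[None, :, :]), axis=2)
    iu = np.triu_indices(n, k=1)
    return np.mean(dist[iu] < delta)

def bds_like_statistic(y, m=2, delta=None, n_boot=100, seed=3):
    """Compare C_{delta,m} with (C_{delta,1})^m; the scale comes from shuffling y
    (an iid permutation bootstrap) rather than from the BDS asymptotic formula."""
    rng = np.random.default_rng(seed)
    delta = delta if delta is not None else 0.5 * y.std()
    stat = correlation_integral(y, m, delta) - correlation_integral(y, 1, delta) ** m
    null = [correlation_integral(rng.permutation(y), m, delta)
            - correlation_integral(y, 1, delta) ** m
            for _ in range(n_boot)]
    return stat / np.std(null)

rng = np.random.default_rng(4)
iid = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.6 * ar1[t - 1] + rng.standard_normal()
print("iid series  :", round(bds_like_statistic(iid), 2))   # near zero
print("AR(1) series:", round(bds_like_statistic(ar1), 2))   # large, dependence detected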

The BDS test becomes a portmanteau test of linearity if applied to the estimated residuals of a linear model. The null distribution of the test is not affected by this procedure, provided that √N-consistent estimation of the parameters of the null model is possible.3


Proofs of this result are available in the original BDS (1987) paper, as well as in Brock, Hsieh and LeBaron (1991) and de Lima (1995). The first two papers derive their result using continuous approximations to the indicator kernel χ_δ(·). The approach taken by de Lima (1995) generalizes results by Randles (1982) to deal with χ_δ(·) directly. In particular, these results show that if the data generating process is an ARMA(p, q) model driven by iid innovations with finite second moments, the estimation of the parameters of the ARMA process does not affect the null distribution of the BDS test. Furthermore, this statement remains valid if the linear process has an autoregressive representation driven by iid innovations whose distribution is a member of the family of stable distributions; that is, the nuisance parameter-free property of the BDS statistic applies to a large class of linear processes with infinite variances - see de Lima (1995).

While the local power properties of most LM-type tests of nonlinearity are relatively easy to characterize - see for example Granger and Teräsvirta (1993) - the distributions of the bispectrum and BDS tests are not known under the alternative hypothesis. For that reason, a considerable number of papers have studied the power properties of these tests by means of Monte Carlo simulations, cf. Brock, Hsieh and LeBaron (1991), Lee, White and Granger (1993), and Barnett et al. (1994). As expected, the corresponding LM-tests seem to dominate for alternatives that are local to the null hypothesis. However, these tests are usually not very powerful against other departures from the null, while the BDS test appears quite powerful against almost every departure from the null - for example, as documented by Brock, Hsieh and LeBaron (1991), the power of the BDS test against ARCH alternatives is close to that of Engle's (1982) LM test. This is true for both nonlinear stochastic processes and nonlinear deterministic, chaotic alternatives.

Finally, note that whereas the BDS statistic is a natural test for the hypothesis that a stationary time series is iid-linear, the bispectrum test can be designed to test the hypothesis that the series {y_t} has the one-sided representation y_t = Σ_{j=0}^{∞} ψ_j ε_{t-j}, where {ε_t} is a symmetrically distributed mds with E|ε_t| < ∞. Assume, without loss of generality, that E[y_t] = 0, all t. Compute third order cumulants μ(s_1, s_2) = E[y_t y_{t+s_1} y_{t+s_2}] as in Priestley (1981). One is led to examination of terms of the form E[ε_t ε_{t+k} ε_{t+l}]. The mds property of {ε_t} allows one to show that E[ε_t ε_{t+k} ε_{t+l}] = 0 except for k = l > 0. A version of the bispectral test could, perhaps, be designed to test the general mds property by shutting off power against terms of the form E[ε_t ε_{t+k}^2], k > 0. See Barnett et al. (1994, especially references to the work of Hinich and his co-authors) for discussion of bispectral tests.

3 This nuisance parameter-free property of the BDS test remains valid if the test is applied to data generating processes that are additive in the error term, y_t = G(X_t, β) + U_t, where β is a vector of parameters and X_t is a (vector of) time series satisfying a mixing property. Moreover, this property carries through to some multiplicative models of the type y_t = G(X_t, β) U_t, provided that the test is applied to ln(Û_t^2), where Û_t are the estimated residuals. This last result shows that, by means of an appropriate transformation of the residuals, the null asymptotic distribution of the BDS test is not affected by the use of estimated residuals from GARCH and EGARCH processes. See Brock and Potter (1993) and de Lima (1995) for both analytical and simulation results.


It can be shown that E[ε_t ε_{t+k}^2] = 0, for k > 0, for a large class of symmetric ARCH-type processes. Consider the GARCH(p, q) class, ε_t = σ_t Z_t, where

σ_t^2 = α_0 + α_1 ε_{t-1}^2 + ... + α_p ε_{t-p}^2 + β_1 σ_{t-1}^2 + ... + β_q σ_{t-q}^2,    {Z_t} ~ iid(0, 1),

and Z is symmetrically distributed around the origin. A direct computation shows that E[ε_t ε_{t+k}^2] = 0, for all t, k, for the GARCH(p, q) class.4 Hence all third order cumulants are zero for GARCH(p, q)-driven linear processes, and hence the bispectrum is zero for such processes. To put it another way, the bispectrum is zero for any stationary process with a "Wold" type representation which is driven by GARCH(p, q) innovations. Since, in financial applications, the conditional mean of returns is small relative to the conditional variance, this suggests a potentially useful screening test for linear models driven by GARCH(p, q) innovations.

However, there is a potential difficulty in carrying out this useful research strategy. Innovations in models fitted to financial returns tend to have heavy tails - see Section 2.3. de Lima (1994a) shows that the bispectral test is badly sized for heavy tailed data. In particular he shows that the bispectral test requires finite sixth moments to be valid. Many financial datasets do not appear to have finite fourth moments, much less finite sixth moments, and the bispectral test tends to reject a Pareto iid null too often when its tail exponent is chosen compatible with that estimated for financial data sets. Hence this poses a potential practical problem to implementing the above "portmanteau test" for linear processes driven by GARCH(p, q) innovations. Nevertheless we believe that research into uses of variations on the bispectrum would be useful. For example, one possible strategy to deal with de Lima's size problem is to bootstrap the bispectral skewness statistic under the null that the returns data under scrutiny lie in the GARCH(p, q) class. Of course this application of the bootstrap is well beyond the scope of the asymptotic theory that we have been able to find for the bootstrap (cf. LePage and Billard (1992), Leger, Politis, and Romano (1992), Li and Maddala (1995)). While there has been a lot of work on the "moving block bootstrap," work on bootstrapping the null distribution of interesting quantities (interesting to economists, at least) under parametric time series volatility models such as GARCH seems sparse.
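
The following Monte Carlo sketch checks the moment condition just discussed: for a simulated GARCH(1,1) process with symmetric (Gaussian) innovations, the sample analogue of E[ε_t ε_{t+k}^2] is close to zero for k > 0. The parameter values are illustrative, and this is a numerical check, not a formal test.

import numpy as np

def simulate_garch11(n, omega=0.05, alpha=0.10, beta=0.85, seed=5):
    """GARCH(1,1): eps_t = sigma_t*Z_t, sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)            # symmetric innovations
    eps = np.empty(n)
    s2 = omega / (1.0 - alpha - beta)
    for t in range(n):
        eps[t] = np.sqrt(s2) * z[t]
        s2 = omega + alpha * eps[t]**2 + beta * s2
    return eps

eps = simulate_garch11(500_000)
for k in (1, 2, 5):
    c = np.mean(eps[:-k] * eps[k:]**2)    # sample analogue of E[eps_t * eps_{t+k}^2]
    print(f"k = {k}: mean(eps_t * eps_(t+k)^2) = {c:+.5f}")
# All values hover around zero, as implied for symmetric GARCH(p, q) innovations.
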
2.2. Consistent tests of linearity

It should be noted that neither the BDS test nor the bispectrum test is a consistent test of nonlinearity; that is, there are known departures from linearity for which these tests have zero power. Dechert (1988) presents an example of a dependent process that the BDS test has no power to detect. Also, there are nonlinear processes that exhibit a flat skewness function, such as GARCH processes.

4 We assume that both Z and ε have finite absolute third moments.


The asymptotic power of the bispectrum test of linearity against GARCH processes is zero, because of the test's failure to recognize the nonlinearity behind the flat skewness function.5

Bierens (1990) presents a consistent conditional moment test. The test is closely related to the neural network test described in Lee, White and Granger (1993), and it can be used as a consistent test of linearity in the mean. The null hypothesis for the test is defined as E[y|X] = X'β, almost surely, where (y, X) is a vector of iid random variables defined on ℝ × ℝ^k and β is a k × 1 vector of parameters. Alternatively, one could define the random variable u = y − E[y|X] and test the hypothesis that E[u|X] = 0. The mean independence between u and X implies that E[u Ψ(X)] = 0 for any function Ψ(X). Bierens shows that the choice Ψ(X) = exp(s'φ(X)) generates a consistent conditional moment test. Here φ is an arbitrary bounded one-to-one mapping from ℝ^k into ℝ^k, and s ∈ S, where S is some subset of ℝ^k. de Jong (1992) extends Bierens' results into a framework that allows for data dependence and for the fact that the conditional expectation of y_t might depend on an infinite number of random variables, that is, y_t = E[y_t | z_{t-1}, z_{t-2}, ...] + u_t, where z_t = (y_t, X_t). In other words, under the null of linearity, the disturbance terms u_t are a martingale difference sequence.

The practical implementation of this consistent conditional moment test faces some difficulties. First, not much is known about the size and power properties of this test. In particular, different mappings φ are likely to have a significant impact on the small sample properties of the test. However, from a distributional point of view, the choice of s is a more delicate issue. Consistency is achieved by considering some functional of the process
M(s) = N^{-1/2} Σ_{t=1}^{N} (y_t − X_t'β) exp(s'φ(X_t))
with M(s) viewed as a random element of the space of continuous functions on a compact subset of ℝ^k. Bierens presents two alternative approaches to construct a consistent test from the empirical process M(s). The first one (Bierens 1990, Theorem 3, p. 1450) gives rise to a statistic with an asymptotic distribution function that depends on the distribution of the data. Therefore, critical values for the test statistic have to be simulated each time the test is applied to a different data set. The second approach (Bierens 1990, Theorem 4, p. 1451) produces a tractable null distribution, but the resulting test statistic is discontinuous in the sample size.

A few alternatives to this conditional moment test have been proposed in the literature. Wooldridge (1992) proposes a test that compares least squares estimation of the null model with a sieve estimator - e.g. White and Wooldridge (1991) - of a compact approximation to the alternative model. Note that the alternative hypothesis defines an infinite dimensional set; therefore, as the sample size grows, the sieve estimator must be defined on an increasingly larger dimensional space.
5 It has been suggested that for such types of processes nonlinearity can be detected using higher-order polyspectrum based tests. The sample size requirements of such tests appear exceedingly demanding - see Barnett et al. (1994).


alternative hypothesis defines an infinite dimensional set. Therefore, as the sample size grows, the sieve estimator must be defined on an increasingly larger dimensional space. Similarly, de Jong and Bierens (1994) consider a consistent chi-square test where the (possibly) misspecified conditional mean function is approximated by means of series expansions. Hong and White (1995) also propose consistent specification tests that compare the least squares estimator with a nonparametric estimator of E[y|X], specifically Fourier series and regression splines. One problem with the direct comparison of the parametric and nonparametric estimators is that the resulting test statistics converge in probability to zero if the usual standardization by √N is employed - see Lee (1988). As a consequence, previous work avoids this degeneracy by using weighting devices as in Lee (1988), sample splitting as in Yatchew (1992), or by preventing the nonparametric model from nesting the parametric model - Wooldridge (1992). The novelty in Hong and White (1995) is that they exploit this degeneracy and present two statistics that diverge under misspecification faster than the standard √N rate. Bradley and McClelland (1994a,b) propose a modification of the Bierens test that provides an (asymptotically) most powerful test among the class of consistent conditional moment tests. Let û be the estimated residuals from least squares estimation of the model y_i = X_i′β + u_i, where the observations {(y_i, X_i) : i = 1, 2, …, N} are a random sample from a distribution function F(y, x), such that E[y|X] = X′β. Bradley and McClelland (1994a) show that g(X) = E[û|X] is the function that maximizes E[û g(X)] among the set of bounded functions. This guarantees consistency - E[û E[û|X]] is different from zero whenever E[û exp(s′φ(X))] is nonzero. E[û|X] is estimated by nonparametric kernel methods with bandwidth selection determined by cross-validation. To avoid the overfitting problems associated with this procedure - which would result in size distortions - Bradley and McClelland apply resampling techniques to the estimated residuals. This may be a potential problem for time series applications, namely if the conditional variance is not constant over time. Also, the nonparametric kernel method used to estimate E[û|X] under the alternative may not be appropriate in a time series context, as the misspecified conditional mean function might involve an infinite number of variables.
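As a rough illustration of how the Bierens-type moment functional might be computed in practice, the sketch below evaluates M(s) on a user-supplied grid of s values, taking φ to be the arctangent of the standardized regressors (one admissible bounded one-to-one choice); the names and implementation details are ours and are not taken from the papers cited above.

import numpy as np

def bierens_moment_function(y, X, s_grid):
    """Evaluate M(s) = N^(-1/2) * sum_t uhat_t * exp(s' phi(X_t)) on a grid of s vectors.

    y has length N, X is an (N, k) array, s_grid is an (m, k) array of s vectors;
    phi is arctan applied to the standardized regressors.
    """
    N = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # least squares fit of the linear null
    uhat = y - X @ beta                             # restricted residuals
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                               # leave a constant column unscaled
    phi = np.arctan((X - X.mean(axis=0)) / sd)
    weights = np.exp(phi @ s_grid.T)                # exp(s' phi(X_t)), one column per s
    return (uhat @ weights) / np.sqrt(N)            # vector of M(s) values

Following Bierens' Theorem 3, critical values for a functional such as sup over s of |M(s)| would then have to be obtained by simulation for each data set.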

2.3. Nonlinearities and fat-tailed distributions


The derivation of the (asymptotic) null distribution of statistical tests requires technical assumptions on the nature of the distribution that generates the data. In particular, some moment conditions are usually imposed so that a central limit theorem can be applied to the test statistic under study. A simple test of the hypothesis that the mean of a random variable X is μ₀ illustrates the problem quite clearly. Two types of auxiliary assumptions are brought in: the type of temporal dependence in the data and a moment condition. If random sampling can be assumed, a finite second moment guarantees that the Lindeberg-Lévy central limit theorem can be used to approximate the distribution of the sample mean.


The same type of auxiliary moment condition assumptions need to be made in the derivation of the asymptotic distribution of nonlinearity tests. All the tests summarized in Section 3 assume that the data is generated by distributions with at least finite fourth-order moments. The only exception is the BDS test - see de Lima (1994a). This is a consequence of the fact that the moment conditions required for convergence of the BDS statistic to a normal random variable apply to the indicator kernel χ_ε(·). Because χ_ε(·) is a binary variable, all its moments are finite. However, some moment conditions still need to be imposed because the BDS test is applied to the estimated residuals of an ARMA(p, q) model, and this estimation step should involve √N-consistent estimation techniques. As mentioned previously, iid innovations with finite variances are sufficient for √N-consistent estimation of the parameters of an ARMA(p, q) model. The robustness of nonlinearity tests to moment condition failure is of particular relevance for financial time series. The fatness of the tails of the distribution of stock and other financial asset returns is a well established stylized fact. Financial time series exhibit excess kurtosis. Furthermore, Mandelbrot (1963) provides evidence that unconditional second moments might not exist for commodity price changes. This led him to suggest the family of stable distributions as an alternative to the Gaussian model. It should be noted that although the normal distribution is itself a stable distribution, it is the only member of the stable family that has a finite second moment (and all other higher-order moments). Random variables that belong to the family of stable distributions have some nice theoretical properties. For example, they are the only family of distributions that have domains of attraction and are closed under addition.6 Their usefulness as a model for financial time series has been strongly contested, though. Alternative characterizations of the marginal distribution of stock returns have been proposed - e.g. the Student-t distribution of Blattberg and Gonedes (1974), and the mixture model of Clark (1974). Hsu, Miller and Wichern (1974) provide evidence that nonstationarities in the variance may bias Mandelbrot's statistical methods in favor of the stable model. Comparisons of these different approaches, as well as discussions of the efficiency of the statistical methods involved in the estimation of the distributions, are described in, among others, Fielitz and Rozelle (1983), Akigary and Booth (1988), Akigary and Lamoureux (1989), for stock returns, and Boothe and Glassman (1987) and Koedijk, Schafgans, and de Vries (1990) for exchange rates. More recently, Jansen and de Vries (1991) and Loretan and Phillips (1994) take a more direct approach to the problem of determining the existence of moments. Instead of trying to characterize the entire distribution, these two papers concentrate on the tails of the distribution, because the existence of moments is ultimately determined by the rate of decay of the tails of the density function. Loretan and Phillips (1994) present estimates of the maximal moment exponent,
6 See Zolotarev (1986) for an extensive survey, Samorodnitsky and Taqqu (1994) for some recent developments, and McCulloch (1996) for a survey of applications to finance.


α = sup{q > 0 : E|X|^q < ∞}, for a group of stock market and exchange rate returns. The parameter α is estimated using the procedure developed by Hill (1975) and Hall (1982). Let X₁, X₂, …, X_N be a sample of independent observations on a distribution with (asymptotically) Pareto-type tails, and let X_{N,1}, X_{N,2}, …, X_{N,N} represent the ordered sample values. The maximal moment exponent can then be consistently estimated by

α̂_s = [ s^(−1) Σ_{j=1}^{s} ( ln X_{N,N−j+1} − ln X_{N,N−s} ) ]^(−1)

for some positive integer s. Letting s grow with the sample size (although at a smaller rate), Hall (1982) shows that s^(1/2)(α̂_s − α) converges to a N(0, α²) random variable. Loretan and Phillips's (1994) estimates suggest that variances are finite but fourth moments may not exist. In other words, these results provide strong evidence against Gaussianity but also show little support for the stable model. McCulloch (1995), Mittnik and Rachev (1993) and Pagan (1995) argue, however, that the estimator used by Loretan and Phillips is not a very reliable measure of the shape of the tails of the unconditional distribution of asset returns. First, different choices for s - the number of order statistics - appear to produce significantly different estimates of the maximal moment exponent α, especially when the number of observations is not very large. However, for reasonably sized samples, Loretan's (1991) simulations indicate that α̂_s is a robust estimator of α if s does not exceed 10% of the sample size. This rule of thumb was first suggested by DuMouchel (1983). Second, the tail index estimator is a maximum likelihood estimator and it assumes that the sample is drawn from a population with Pareto tails. Mittnik and Rachev (1993) present a small simulation study for iid data generated by a Weibull distribution - for which α = ∞ - reporting a mean estimated maximal moment exponent of 3.785. McCulloch (1995) presents evidence that the parameter estimates reported by Jansen and de Vries (1991) and Loretan and Phillips (1994) are consistent with the estimates of the tail index obtained for data generated by stable distributions with α < 2. Third, the convergence results provided by Hall (1982) assume random sampling. Pagan (1995) reports simulation results showing that the standard deviation of α̂_s can be significantly larger than predicted from the iid case if the data is generated from a GARCH process. Note that ARCH-type processes generate heavy-tailed distributed data: de Haan, Resnick, Rootzén, and de Vries (1989) show that the unconditional distribution of ARCH variates has Pareto tails and de Vries (1991) presents a GARCH-type model where the unconditional distribution is stable. Furthermore, estimation of ARCH models for high frequency stock returns data usually produces parameter estimates that imply that fourth moments do not exist. Nelson (1990) shows that an IGARCH(1,1) model, although strictly stationary, does not have a finite variance. The consequences of using nonlinearity tests when moment condition failure is an issue are investigated in de Lima (1994a). From the point of view of asymptotic theory it is shown that the distribution of the tests becomes non-standard. As an


example, for iid sequences that do not have finite fourth moments, it is shown that the normalization of the sum of the squares of the first h autocorrelations of the process by the number of observations does not provide convergence to a nondegenerate random variable (see de Lima 1994a, Proposition 1). In other words, for this type of process the McLeod-Li statistic collapses asymptotically to
zero.7

Simulation experiments presented in de Lima (1994a) show that most nonlinearity tests behave as predicted by the asymptotic result derived for the McLeod-Li test. In particular, the sampling distributions of those tests exhibit a pole around the origin. This would suggest that under moment condition failure and without the appropriate scaling of the tests' statistics, the empirical sizes would always be below the tests' nominal sizes. However, the simulation experiments also reveal that the variance of the tests can be extremely large, giving rise to a significant number of large values for the tests' statistics. This effect is especially pronounced for extreme cases of moment failure. Further, tests that are designed to have maximal power against misspecification of the conditional variance, as well as the bispectrum test, seem to be especially sensitive to the non-existence of moments.8 Overall, the only test that appears robust to moment condition failure - in both the asymptotic and the sampling distributions - is the BDS statistic. de Lima (1994a) presents a study of the relationship between nonlinearities and moment condition failure in a sample of 2165 individual stock returns listed in the 1991 Daily Stock files of the CRSP tapes. The median value of α̂_s in the sample is 2.8, with more than 95% of the estimates above 2 (finite variance) and less than 2% above four. The application of the nonlinearity tests to randomly shuffled series shows a remarkable resemblance to the simulation experiments. This empirical study also shows that the evidence of nonlinearity in stock returns cannot all be attributed to the non-robustness of nonlinearity tests to moment condition failure. However, it shows that some of those tests are not very trustworthy in testing situations involving heavy-tailed data.
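For reference, the Hill-type estimator α̂_s discussed above can be sketched as follows, applied for instance to absolute returns; the default choice of s follows the 10% rule of thumb attributed to DuMouchel (1983), and all names are ours.

import numpy as np

def hill_tail_index(x, s=None):
    """Hill-type estimate of the maximal moment exponent alpha from the upper tail.

    x: positive data (e.g. absolute returns); s: number of upper order statistics,
    defaulting to 10% of the sample size.
    """
    x = np.sort(np.asarray(x, dtype=float))          # ascending order statistics
    n = len(x)
    if s is None:
        s = max(1, int(0.10 * n))
    tail = x[n - s:]                                 # the s largest observations
    threshold = x[n - s - 1]                         # X_{N,N-s}
    inv_alpha = np.mean(np.log(tail) - np.log(threshold))
    return 1.0 / inv_alpha

Point estimates can then be compared with 2 (finite variance) and 4 (finite fourth moment), as in the discussion of the CRSP sample above.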

7 Note that an appropriately scaled version of the McLeod-Li statistic converges to a well defined random variable, although the limiting random variable does not have a chi-square distribution and the rate of convergence is slower than for the standard case.
8 The bispectrum test appears particularly sensitive to the problem of moment condition failure. Simulation experiments reported in de Lima (1994a) show that for iid sequences generated from the Pareto family of distributions satisfying

P(X > x) = P(X < −x) = 0.5(x + 1)^(−α),  x > 0        (P_α)

with α = 1.5 (the maximal moment exponent) and 5000 observations, the 1%-sized test rejects the null of iid in 60% of the cases. Similarly large type-I errors are found for values of α between 2 and 6.

2.4. Other topics in nonlinearity testing


2.4.1. Nonlinearities and nonstationarities

Constancy of the moments of the unconditional distribution of asset returns is a typical assumption of many time series models, including volatility processes such as ARCH. However, given the rate at which new financial and technological tools have been introduced in financial markets, the case for the existence of structural changes (and thus for lack of stationarity) seems quite strong, especially when relatively large periods of time are considered. For example, Pagan and Schwert (1990) and Loretan and Phillips (1994) reject the hypothesis that stock returns are covariance-stationary. Therefore, it is of particular interest to determine whether findings of nonlinearity might be due to nonstationarities in the data. In terms of ARCH models, Diebold (1986) and Lamoureux and Lastrapes (1990) suggest that shifts in the unconditional variance could explain common findings of persistence in the conditional variance. Simonato (1992) applies a GARCH process with changes in regime - using the Goldfeld and Quandt (1973) switching-regression method - to a group of European exchange rates and finds that consideration of structural breaks greatly reduces evidence for ARCH effects. Another model that tries to capture the idea that several volatility periods are present in the data is the Markov switching ARCH (SWARCH) model of Cai (1994) and Hamilton and Susmel (1994).9 A characterization of stock returns as nonstationary processes with discrete shifts in the unconditional variance can be traced back to Hsu, Miller and Wichern (1974). Hinich and Patterson (1985) challenge this view, supporting the alternative hypothesis that stock prices are realizations of nonlinear stochastic processes. They argue that nonstationarities would bias the bispectrum test used in their analysis toward acceptance of linearity. Given that their test statistics clearly reject this hypothesis, they discard the existence of nonstationarities in daily stock returns during the period July 1962 through December 1977. Using the BDS test, Hsieh (1991) rejects the hypothesis that structural breaks are responsible for the rejection of linearity by means of subsample analysis and by looking at data with different (higher) frequencies. Because the BDS test rejects the null hypothesis for all different subsamples and frequencies, Hsieh concludes that "[...] it is unlikely that infrequent structural changes are causing the rejection of iid [...]". The distinction between nonlinearity and nonstationarity is also central to Inclán (1993), who presents a nonparametric approach to distinguish between shifts in the unconditional variance and a time-varying conditional variance. de Lima (1994b) uses a generalization of the BDS test to investigate whether rejections of linearity for stock market returns are due to nonstationarities in the data. This paper uses the fact that normalized partial sums of the BDS statistic converge to standard Brownian motion and analyses common stock returns

9 See Section 3 for a more general discussionof variance persistence and the SWARCHmodel.


indexes between January 1980 and December 1990. It is shown that the period from October 15, 1987 to November 20, 1987 assumes an extremely influential role in the rejection of linearity provided by the BDS statistic for the entire period: for any subsample period starting in January 1980 and ending before October 15, 1987 the BDS test would not reject the null of linearity. Note that Diebold and Lopez (1995), using the autocorrelation function of the squared returns, conclude that evidence for GARCH effects in stock returns during the eighties is also small. However, de Lima's (1994b) results also indicate that nonlinearities seem to play an active role in the dynamics of stock indexes after October 1987.

2.4.2. Identification of nonlinear alternatives


Despite their usefulness as general tests for nonlinearity, a rejection of the null by either of the two portmanteau tests described above gives the applied researcher little or no guidance on the actual nature of the nonlinearities that might be causing the rejection of the null hypothesis. A test closely related to the BDS test, due to Savit and Green (1991) and Wu, Savit and Brock (1993), is of particular interest in this regard. Instead of relying on estimates of unconditional probabilities, these two papers propose a test that uses correlation integral type estimators of the sequence of conditional probability statements

Prob{A_{t,s} | A_{t−1,s−1}} = Prob{A_{t,s}}
Prob{A_{t,s} | A_{t−1,s−1}, A_{t−2,s−2}} = Prob{A_{t,s} | A_{t−1,s−1}}
⋮
Prob{A_{t,s} | A_{t−1,s−1}, …, A_{t−k,s−k}} = Prob{A_{t,s} | A_{t−1,s−1}, …, A_{t−k+1,s−k+1}}        (2.3)

where A_{t,s} = {(Y_t, Y_s) : |Y_t − Y_s| < ε}. These equalities hold under iid and, using the definition of conditional probabilities, it can be shown that they can be estimated by correlation integral-type quantities. Savit and Green's (1991) insight is that, under the alternative, these conditional probabilities can be used to detect at which lag temporal dependence is strongest. This type of analysis is of particular interest for Markov processes, commonly used in nonparametric time series analysis - e.g. Robinson (1983) and Gallant, Rossi and Tauchen (1993). Alternative approaches to the identification of nonlinear time series processes include the nonparametric version of the final prediction error criterion of Auestad and Tjøstheim (1990) and Tjøstheim and Auestad (1994) and Granger and Lin's (1994) mutual information coefficient (relative entropy),

δ(f, f_x f_y) = ∫∫ f(x, y) log[ f(x, y) / (f_x(x) f_y(y)) ] dx dy


where (x, y) is a pair of random variables with joint density function f(x, y) and marginals f_x(x) and f_y(y). See also Granger and Teräsvirta (1993) for a general discussion of the use of nonparametric techniques in nonlinear modeling.


2.4.3. Multivariate extensions


Conditional probability statements of the type described in (2.3) can also be used to detect whether there are nonlinear causal relations between variables. Baek and Brock (1992a) define nonlinear Granger causality in the following terms: a time series {y_t} does not cause {x_t} if

Prob{A_{t,s}(X^m) | A_{t−h,s−h}(X^h), A_{t−k,s−k}(Y^k)} = Prob{A_{t,s}(X^m) | A_{t−h,s−h}(X^h)}        (2.4)

where A_{t,s}(W^m) = {(W_t^m, W_s^m) : ||W_t^m − W_s^m|| < ε}, for W = X, Y. This means that the random variable y has no predictive power for x. Rewriting expression (2.4) in terms of ratios of unconditional probabilities and estimating the corresponding terms by correlation integral type statistics, Baek and Brock (1992a) show that (a normalized version of) the resulting statistic converges to a normal random variable under the null hypothesis of noncausality from y to x. Baek and Brock (1992a) and Hiemstra and Jones (1994a) present alternative estimators of the asymptotic variance under different assumptions about the dependence properties of y and x. As for the univariate testing procedures involving the BDS statistic, the tests for nonlinear Granger causality are applied to estimated residuals of linear models. In the present case, nonlinear predictive power consists of any remaining predictive power that is left in the series after the data is filtered by a vector autoregressive model. Hiemstra and Jones (1994a) apply this testing strategy to daily stock returns and percentage changes in trading volume. Their work provides evidence of nonlinear Granger causality in both directions. However, note that the nonlinear impulse response analysis of Gallant, Rossi and Tauchen (1993), while supporting the idea that returns Granger-cause trading volume, does not detect a significant feedback mechanism from volume to prices. Correlation integral based methods have also been employed to detect general nonlinearities in multivariate setups. Baek and Brock (1992b) generalize the BDS test to the null hypothesis that a vector of time series is temporally and cross-sectionally independent.
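All of the statistics in (2.3) and (2.4) are built from correlation-integral counts of close m-histories. A deliberately simple (O(N²)) sketch of such a count under the maximum norm is given below; ε and m are user choices, and the function name is ours.

import numpy as np

def correlation_integral(x, m, eps):
    """Fraction of pairs of m-histories of x that lie within eps in the maximum norm.

    This count is the building block of the BDS statistic, of the Savit-Green
    probabilities in (2.3), and of the causality statistics built on (2.4).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    hist = np.column_stack([x[i:i + n] for i in range(m)])      # rows are m-histories
    close, pairs = 0, 0
    for t in range(n - 1):
        dist = np.max(np.abs(hist[t + 1:] - hist[t]), axis=1)   # max-norm distances
        close += int(np.sum(dist < eps))
        pairs += n - 1 - t
    return close / pairs

Ratios of such counts at different embedding dimensions estimate the conditional probabilities above; the BDS and nonlinear Granger causality statistics are normalized functions of these ratios.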

3. Long memory in stock returns

3.1. Long memory in the mean


The random walk hypothesis has dominated the empirical work on the characterization of the long run behavior of asset prices. The methods used to test this hypothesis include autoregressions of multiperiod returns - Fama and French (1988) - and variance ratio tests - Lo and MacKinlay (1988) and Poterba and Summers (1988). These two methods are closely related - see, for example, Kim, Nelson and Startz (1991) - and their application reflects a concern with the power of traditional tests to detect interesting alternatives to the null hypothesis of market efficiency.
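For concreteness, the variance-ratio statistic referred to above can be sketched as follows, using overlapping q-period sums and omitting the finite-sample bias corrections used by Lo and MacKinlay; the names are ours.

import numpy as np

def variance_ratio(returns, q):
    """VR(q): variance of overlapping q-period returns over q times the one-period variance."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    var1 = np.mean(r ** 2)
    rq = np.convolve(r, np.ones(q), mode="valid")   # overlapping q-period sums
    return np.mean(rq ** 2) / (q * var1)

Values near one are consistent with a random walk; the pattern discussed in the next paragraph - ratios above one at short horizons and below one at long horizons - is the mean-reversion signature.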


One commonly studied alternative is the mean-reverting behavior of stock prices, corresponding to the idea that a given change in prices will be followed, over long time horizons, by predictable changes with the opposite sign. This hypothesis describes stock prices p_t as the sum of a random walk p_t* and a stationary component u_t. Summers (1986) argues that the transitory component is a slowly decaying process, namely an AR(1) process u_t = ρu_{t−1} + ε_t, where ε_t is a white noise process and ρ is close to but less than one. Lo and MacKinlay (1988) and Poterba and Summers (1988) report variance ratio statistics that give some support to the hypothesis that stock prices are mean reverting. In particular, variance ratios appear to be greater than one for lags shorter than a year and below unity for longer lags. As the variance ratio statistic at lag q is a weighted sum of the first q autocorrelations of stock returns - Cochrane (1988) and Lo and MacKinlay (1988) - the observed pattern of variance ratios implies that stock returns are positively correlated over short time horizons, and negatively correlated over longer intervals. Note that this predictability of long-horizon returns is consistent with models where (some) agents behave irrationally (noise traders) as well as with efficient markets with time-varying equilibrium expected returns. Kim, Nelson and Startz (1991) and Richardson (1993), among others, have presented evidence that the tests used to detect mean reverting behavior might produce spurious results. A new disaggregated approach to the study of mean reversion which uses data on individual firms, and stresses the structural role of variation of financing constraints across different classes of firms, is in Jog and Schaller (1994). Differential variation of finance constraints across different classes of firms (such as different size classes) appears to be a promising way to explain the well known variations in mean reversion across periods of financial stress such as the Great Depression, as well as a promising way to respect scale economies in raising funds that can be exploited by larger firms. Also one expects the impact of central bank policy to vary across different classes of firms. Lo (1991) takes a somewhat different approach than the rest of the aggregative literature, to present a simple alternative model that generates a similar pattern for the variance ratio statistics. Lo's (1991) example assumes that stock returns are the sum of an AR(1) and a long memory process. Long memory stationary processes are characterized by the slow (hyperbolic) decay of their autocorrelation function, as opposed to short memory processes (such as ARMA) whose autocorrelation function exhibits geometric decay. Alternatively, a long memory process can be characterized by the behavior of its spectral density function at the origin.10 Long memory processes can generate
10 The autocorrelation function ρ(k) of a long memory process satisfies ρ(k) ~ C k^(2H−2), C > 0, for 0 < H < 1. For H > 1/2, Σρ(k) = ∞, whereas for H < 1/2, Σ|ρ(k)| < ∞ and Σρ(k) = 0. Correspondingly, the spectral density f(ω) = Σ_k e^(−iωk)ρ(k)/2π diverges at the origin for H > 1/2 and tends to zero as |ω| → 0. Some authors reserve the term long memory for the first type of processes, and label the second type as "intermediate" memory or anti-persistent. See Beran (1994) and Brockwell and Davis (1991). For a survey of long memory processes and their application to economics, see Baillie (1995).


non-periodical cyclical patterns like the ones observed by Hurst (1951) for the Nile River, where long periods of dryness are followed by long flood periods. Mandelbrot and Wallis (1968) coined the term Joseph or Hurst effect for this phenomenon. The first paper that discusses the importance of long memory processes in asset markets is Mandelbrot (1971). Mandelbrot shows that under long range dependence perfect arbitraging is not possible. Mandelbrot has raised an important point here, which has been expanded upon by Hodges (1995) to show that Fractal Brownian Motion is not a promising model for stock returns unless the market is grossly inefficient. He calculates that "for a market with a Hurst Exponent outside the range 0.4 to 0.6 less than 300 transactions would be required" to obtain "essentially riskless profits." He provides a useful table which relates Hurst exponent values, Sharpe Ratios, and numbers of transactions needed to capture profits under options strategies. Hodges has cast a lot of doubt on the plausibility of "long memory in mean" with Hurst exponents that deviate very far from 1/2. This is so because it is very easy to manufacture profits and control the risks in a mean/variance setting if the returns data are truly generated by a Fractal Brownian Motion with Hurst exponent very far from 1/2. Whatever the surface plausibility of long memory, because traditional methods of financial economics rely heavily on the possibility of arbitraging, the detection of long memory in stock returns has emerged as a relevant empirical question. Greene and Fielitz (1977) is the first empirical investigation of the long memory hypothesis for stock returns. Their analysis relies heavily on the rescaled range (R/S) statistic first proposed by Hurst (1951). For a time series X_t and any arbitrary time interval of width s and starting point t, the sample sequential range R(t, s) is defined as

R(t, s) = max_{0≤k≤s} { X*_{t+k} − X*_t − (k/s)[X*_{t+s} − X*_t] } − min_{0≤k≤s} { X*_{t+k} − X*_t − (k/s)[X*_{t+s} − X*_t] }

where X*_t is the cumulative sum of X_t over the interval from 0 to t, that is, X*_t = Σ_{u=1}^{t} X_u, with X*_0 = 0, for convenience. The sample range is usually normalized by the standard deviation for the lag s,

S(t, s) = [ s^(−1) Σ_{k=1}^{s} ( X_{t+k} − s^(−1) Σ_{k=1}^{s} X_{t+k} )² ]^(1/2)

and the resulting ratio is known as the rescaled range R/S. In a series of papers, Mandelbrot and some of his co-workers have shown that the rescaled range statistic can distinguish between short and long memory processes, in the sense that for a stationary process with short range dependence the R/S statistic converges to a non-degenerate random variable at rate s^(1/2), whereas for processes


that exhibit long range dependence the R/S statistic converges to a non-degenerate random variable at rate s^H, where H, the Hurst coefficient, is different from 1/2 - see Mandelbrot (1975). Moreover, Theorem 6 in Mandelbrot (1975) establishes that the rate of convergence is also s^(1/2) for iid sequences in the domain of attraction of stable distributions with infinite variance. In practical terms, the plot of the logarithm of the R/S statistic against the logarithm of s, for different values of s, should reveal whether the data was generated by a short-range or long-range dependent process: the different points should be spread around a straight line with slope 1/2 for short-range dependent processes and slope H ≠ 1/2 for long-range dependent processes. Wallis and Matalas (1970) present a Monte Carlo simulation of two alternative procedures for selecting lags and starting points, known as F Hurst and G Hurst. In both cases, the estimate of H, the exponent of long-range dependence, is the slope of the least squares regression of log(R/S) on a constant and on log(s). Greene and Fielitz (1977) conduct such an analysis on the daily returns to 200 common stocks listed on the New York Stock Exchange, concluding that long-term dependence characterizes a significant percentage of the sample. More recently, Peters (1994) also uses R/S analysis and provides evidence of the Hurst effect in the returns to some common financial assets. These findings of long memory in stock returns have been disputed on the grounds that classical R/S analysis is biased by the presence of short-term dependence, a fact already discussed by Wallis and Matalas (1970) and further studied by Davies and Harte (1987). Aydogan and Booth (1988) suggest that the Greene and Fielitz (1977) results might indeed be the outcome of the non-robustness of classical R/S analysis to serial dependency and nonstationarities. To correct for the bias induced by serial correlation, Peters (1994) applies classical R/S analysis to the estimated residuals of first order autoregressive processes. Furthermore, he compares the values of the R/S statistics obtained for different lag lengths with the expected value of the R/S statistic. This expected value was computed by Anis and Lloyd (1976) for white noise processes. The value used by Peters (1994) reflects a correction term determined by simulation. However, note that Peters' (1994) method still does not allow for formal hypothesis testing, and his working assumption that an AR(1) filter removes short-term serial dependence for all series under test is highly questionable. Lo (1991) presents a refinement of R/S methods that allows formal statistical testing and is robust to serial correlation and some forms of non-stationarity. Under the null of short memory,11 Lo shows that the statistic Q(n) = n^(−1/2) R(1, n)/S(1, n) converges weakly to the range of a Brownian bridge on the unit interval, a random variable with mean √(π/2) and variance π²/6 − π/2 and whose distribution function is positively skewed. The main innovation of Lo's procedure is the use of the Newey-West heteroskedasticity and autocorrelation consistent estimator,

11 A short-memory process is defined by Lo as a strong mixing process whose mixing coefficients decay sufficiently fast to zero.

S_q(1, n)² = (1/n) Σ_{k=1}^{n} (X_k − X̄)² + (2/n) Σ_{j=1}^{q} ω_j(q) [ Σ_{k=j+1}^{n} (X_k − X̄)(X_{k−j} − X̄) ]

in place of S(1, n)², where the ω_j(q)'s are the Bartlett weights. Furthermore, Lo's test does not have to rely on subsample analysis as the classical R/S analysis does. Lo (1991) applies the Q(n) statistic to daily and monthly stock returns indexes (the equally and the value weighted indexes on the CRSP files) and concludes that Greene and Fielitz (1977) methods overstate the existence of long memory in stock returns. The Q(n) statistic - also known as the modified R/S statistic - has been applied by several researchers to other financial data sets, namely by Cheung and Lai (1993) to gold market returns, Cheung, Lai, and Lai (1993) and Crato (1994) to international stock markets, Goetzmann (1993) to historical stock returns series, Hiemstra and Jones (1994b) to a panel of stock returns, and Mills (1993) to monthly UK stock returns - see also Baillie (1995). The evidence produced by these papers is largely concurrent with Lo's (1991) results, with the transformed R/S statistic finding little evidence of long memory in the returns to those financial assets. However, Pagan (1995) stresses that the choice of q, the number of autocorrelations included in the Newey-West estimator S_q(1, n)², is critical in terms of the results, with a small q usually providing evidence favorable to the alternative (as in the traditional Greene and Fielitz application where q is set to zero), and a large q supporting the null. Andrews (1991) provides an automatic selection rule for q (also used by Lo (1991) in his application). However, this rule has optimal properties only for AR(1) processes. One additional problem with the Q(n) statistic appears to be its sensitivity to moment condition failure. Hiemstra and Jones (1994b) uncover a positive relation between maximal moment estimates and the probability of a left-tail rejection by the R/S test in their sample of stock returns. The relationship appears reversed for right-tail rejections. Note that, as mentioned previously, Mandelbrot (1975) and Mandelbrot and Taqqu (1979) show that the classical R/S analysis provides an almost surely consistent estimator of the Hurst coefficient even for iid data generated by infinite variance processes. However, these two papers provide no characterization of the limiting distribution of the R/S statistic. Furthermore, Lo (1991) proves convergence to the range of a Brownian bridge under the assumption that the first 4 + δ (δ > 0) moments of the distribution of the data are finite. A simple simulation study, reported in Table 1, appears to confirm that while heavy-tailed data do not seem to affect the properties of the R/S estimator of the Hurst coefficient, the sampling distribution of the test is shifted to the left relative to the asymptotic distribution, as observed by Hiemstra and Jones (1994b). Table 1 reports the results of computing the R/S statistic over 1000 series with 5000 observations generated from the family of Pareto distributions - see expression (P_α) in Section 2 - with parameters α = 1.5 and α = 4. The average estimate of the Hurst coefficient is close to 0.5, as determined by Mandelbrot. However, rejection rates on the left tail are above the nominal sizes given by the asymptotic distribution, whereas rejection rates on the right tail are below the nominal sizes given by the asymptotic distribution.12


Table 1
Estimated sizes of the Rescaled Range (R/S) test under moment condition failure

                              α = 1.5                                  α = 4
Nominal size   q:    0      20     40     60     A           0      20     40     60     A
Left tail
  0.01             0.023  0.020  0.017  0.014  0.023       0.018  0.013  0.010  0.007  0.018
  0.05             0.098  0.092  0.081  0.075  0.093       0.071  0.062  0.051  0.044  0.072
  0.10             0.174  0.172  0.166  0.156  0.177       0.136  0.124  0.110  0.099  0.130
Right tail
  0.10             0.030  0.024  0.022  0.016  0.030       0.092  0.086  0.078  0.080  0.093
  0.05             0.009  0.000  0.005  0.006  0.009       0.038  0.039  0.035  0.031  0.036
  0.01             0.000  0.000  0.000  0.000  0.000       0.009  0.007  0.008  0.007  0.008
Mean               0.510  0.510  0.510  0.511  0.510       0.521  0.521  0.521  0.522  0.521
Std                0.028  0.027  0.027  0.026  0.028       0.031  0.030  0.030  0.029  0.031

The data were generated from a Pareto distribution with α = 1.5 and α = 4, respectively. Each of the 1000 series had N = 5000 observations. The rows labeled Mean and Std report the mean estimate of the Hurst coefficient and its standard error in the simulations. Each column reports the empirical size of the R/S test for a different number of autocorrelations q included in the estimator S_q(1, n). The column labeled A corresponds to Andrews' (1991) optimal rule.
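For concreteness, the modified R/S statistic used in the Table 1 experiment can be sketched as follows, with Bartlett weights in the Newey-West denominator; q = 0 reproduces the classical statistic, and the implementation details are ours.

import numpy as np

def modified_rs(x, q):
    """Lo's modified rescaled-range statistic Q(n); q = 0 gives the classical R/S."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    partial = np.cumsum(d)                                   # partial sums of deviations
    r = partial.max() - partial.min()                        # range R(1, n)
    s2 = np.mean(d ** 2)                                     # variance term of S_q(1, n)^2
    for j in range(1, q + 1):
        w = 1.0 - j / (q + 1.0)                              # Bartlett weight
        s2 += 2.0 * w * np.sum(d[j:] * d[:-j]) / n           # weighted autocovariances
    return r / (np.sqrt(s2) * np.sqrt(n))                    # Q(n) = n^(-1/2) R / S_q

Under the short-memory null, Q(n) has the Brownian-bridge-range limit described above, with mean √(π/2); left-tail (right-tail) rejections point toward anti-persistent (persistent) alternatives, as in footnote 12.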

The shift in the empirical distribution is more pronounced for data generated with α = 1.5 than for data generated with α = 4. This fact should not come as a surprise given the moment assumptions made by Lo (1991). Other tests of the long memory hypothesis are available in the literature. This set includes the Geweke and Porter-Hudak (1983) test - hereafter GPH - the locally optimal and beta-optimal tests of Davies and Harte (1987), the Lagrange multiplier tests developed by Robinson (1991a) and Agiakloglou, Newbold and Wohar (1994), and the locally best invariant test of Wu (1992), closely related to the goodness of fit statistic of Beran (1992).13 In contrast to the modified R/S statistic, all these tests assume a parametric form for the alternative hypothesis, although the GPH test only requires a parametric specification of the long run dynamics of the alternative process. For this reason, the GPH test is sometimes designated as a semiparametric test. The dominant parametric discrete-time model that exhibits hyperbolic decay of its autocorrelation function is the fractionally integrated autoregressive moving average model (ARFIMA) introduced independently by Granger and Joyeux (1980) and Hosking (1981) - see Viano, Deniau, and Oppenheim (1994) for a continuous time version. For −0.5 < d < 0.5, X_t is said to follow an ARFIMA(p, d, q) model if it is the unique stationary solution to the equation
12 Left-tail rejections correspond to rejection of the null hypothesis H = 1/2 against the alternative H < 1/2 (anti-persistent long memory) while right-tail rejections correspond to rejection of the null hypothesis H = 1/2 against the alternative H > 1/2 (persistent long memory).
13 Cheung (1993a) provides a Monte Carlo investigation of the small sample properties of some of the more popular tests of the long memory hypothesis.


(1 − B)^d φ(B) X_t = θ(B) η_t,   η_t ~ iid N(0, σ_η²)

where B is the backshift operator (B^j X_t = X_{t−j}, j = 0, ±1, ±2, …), φ(z) = 1 − φ₁z − φ₂z² − … − φ_p z^p and θ(z) = 1 − θ₁z − θ₂z² − … − θ_q z^q. Furthermore, the fractional differencing operator is defined through the expansion

(1 − B)^d = Σ_{j=0}^{∞} [ Γ(j − d) / (Γ(j + 1)Γ(−d)) ] B^j.        (3.1)

See Brockwell and Davis (1991) for a detailed treatment of this model. The spectral density of an ARFIMA(p, d, q) model is proportional to C|λ|^(−2d) as |λ| → 0, for C > 0. The Geweke and Porter-Hudak (1983) test for long memory is based on this fact: regress the logarithm of the periodogram at low frequencies on some function of those frequencies and estimate d by the slope of this least squares regression.14 GPH argued that the resulting estimator of d could capture the long-memory behavior without being contaminated by the short-memory behavior of the process. Robinson (1993) showed that this argument is asymptotically correct if, besides truncation of the higher periodogram frequencies, an additional truncation of the very first ordinates is performed. The usual t-test of the hypothesis that d = 0 against d ≠ 0 is a test of the null hypothesis of short memory against long-memory alternatives. It should be noted that the small sample properties of both the GPH and Lo's rescaled-range test can be very sensitive to large autoregressive and moving average effects - see Cheung (1993a). Using the GPH approach, Cheung (1993b) finds some evidence of long memory in a set of nominal exchange rates, and Cheung and Lai (1993) show that some linear combinations of foreign and domestic prices are long range dependent, that is, foreign and domestic prices are fractionally cointegrated. In their cross-section of stock returns, Hiemstra and Jones (1994b) find a close relationship between rejections of the short-memory null using the R/S statistic and the GPH test.
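A minimal sketch of the GPH log-periodogram regression is given below, using the lowest m = N^(1/2) Fourier frequencies and the regressor log(4 sin²(λ_j/2)); the optional trimming of the very first ordinates follows Robinson's (1993) suggestion, and the bandwidth choice is only illustrative.

import numpy as np

def gph_estimate(x, power=0.5, trim=0):
    """Geweke/Porter-Hudak log-periodogram estimate of the fractional parameter d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)                                      # number of low frequencies used
    j = np.arange(1 + trim, m + 1)                           # Fourier frequency indices
    lam = 2.0 * np.pi * j / n
    dft = np.fft.fft(x - x.mean())
    periodogram = np.abs(dft[j]) ** 2 / (2.0 * np.pi * n)
    regressor = np.log(4.0 * np.sin(lam / 2.0) ** 2)
    slope = np.polyfit(regressor, np.log(periodogram), 1)[0]
    return -slope                                            # d_hat; H_hat = d_hat + 0.5

The t-ratio on the slope gives the test of d = 0 against d ≠ 0 described above; applied to squared or log-squared returns, the same regression underlies the volatility results reported in Tables 2 and 3 below.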
3.2. Long memory in volatilities

One of the more active research areas in long memory models is their application to volatility processes. This follows the analysis of conditional variance models started with Engle's (1982) seminal paper on autoregressive conditional heteroskedasticity (ARCH) models. ARCH models are defined as y_t = σ_t Z_t, where Z_t is usually taken to be an independent, identically distributed process, with E[Z_t] = 0 and Var[Z_t] = 1. The variable σ_t² is a positive, ℱ_{t−1}-measurable function, where ℱ_{t−1} is the sigma-algebra generated by (Z_{t−1}, Z_{t−2}, …). Therefore, σ_t² is the conditional variance of the process y_t.

14 For alternative estimation procedures for this regression equation see Beran (1993) and Robinson (1993).


Typically, the sample autocorrelation function of stock returns series resembles the autocorrelation function of a white noise process. However, the sample autocorrelation function of measures of volatility, such as the squared returns, the absolute returns, or the logarithm of squared returns, is positive with very slow decay. This fact explains why many applications of ARCH-type models involving high-frequency data indicate the presence of an approximate unit root in the univariate representation for volatility. This feature is present in the original Engle (1982) paper, and it has motivated some of the extensions of Engle's original work, namely Bollerslev's (1986) generalized ARCH (GARCH) and Engle and Bollerslev's (1986) integrated GARCH (IGARCH). Furthermore, applications of Nelson's (1991) exponential GARCH (EGARCH) model usually find roots of the autoregressive polynomial close to the unit circle. That is, high-frequency stock market data displays highly persistent volatility. The very slow decay of the autocorrelation function of the squared residuals motivated Crato and de Lima (1994) to apply the modified R/S and the GPH test to the squared residuals of various filtered U.S. stock returns indexes. The hypothesis that volatilities are short memory processes is clearly rejected for high frequency series. The rationale for applying long memory tests to the squared series comes from the fact that the conditional variance σ_t² of a GARCH(p, q) process can be written as an infinite-dimensional ARCH(∞), as in Bollerslev (1986). Therefore, this testing procedure parallels the Lagrange multiplier tests for GARCH effects, which are also performed on the squared series. Ding, Granger and Engle (1993) also study the decay of the autocorrelations of fractional moments of returns series. For returns y_t on the SP500 index, they construct the series |y_t|^v for different positive values of v and find very slowly decaying autocorrelations. This has led them to introduce a new class of ARCH models, the asymmetric power-ARCH, where v becomes a parameter to be estimated. However, this model is still finitely parameterized, making it a short-memory model. Two classes of models have been proposed to capture the slow decay of the autocorrelation function of volatility series. One such class includes the fractionally integrated GARCH (FIGARCH) and the fractionally integrated EGARCH models of Baillie, Bollerslev and Mikkelson (1993) and Bollerslev and Mikkelson (1994), and it is the natural extension of the ARCH class of models that allows a hyperbolic rate of decay for lagged squared innovations. The second class of long memory volatility models are the stochastic volatility models of Harvey (1993) and Breidt, Crato and de Lima (1994). The FIGARCH(p, d, q) model is defined as
(1 − B)^d φ(B) y_t² = ω + θ(B) ν_t,   ν_t = y_t² − σ_t²

where φ(z) and θ(z) are pth and qth order polynomials, respectively, and (1 − B)^d is defined as in (3.1). Like IGARCH processes, the FIGARCH process is strictly stationary but not covariance stationary, because the variance is not finite. Consequently, the autocovariance function of y_t² is not defined and the use of
where ~b(z) and O(z) are pth and qth order polynomials, respectively and (1 - B) a is defined as in (3.1). Like IGARCH processes, the FIGARCH process is strictly stationary but not covariance stationary, because the variance is not finite. Consequently, the autocovariance function of y2 is not defined and the use of

Nonlinear time series, complexity theory, and finance

345

spectral and autocovariance methods is not directly possible. Furthermore, the asymptotic properties of the (quasi)-maximum likelihood estimators discussed by Baillie, Bollerslev and Mikkelson (1993) rely on verification of a set of conditions put forward by Bollerslev and Wooldridge (1992). At this point, it is not yet known whether those conditions are satisfied for F I G A R C H processes. "The F I E G A R C H (p, d, q) model, defined as log a 2 t = #t + 0 (B) q6(B)-1 ( 1 -- B)-aO (Zt-l) defines a strictly stationary and ergodic process. Moreover, (log a ~ - #t) is a covariance stationary process if d < 0.5. Note that the function
g(Z_t) = γ₁ Z_t + γ₂ (|Z_t| − E|Z_t|)

was introduced by Nelson (1991) to capture the fact that stock price changes tend to be negatively correlated with changes in stock volatility, the so-called leverage effect. The asymptotic properties of the maximum likelihood estimator of the parameters of the FIEGARCH model are also dependent on verification of the same set of conditions put forward by Bollerslev and Wooldridge (1992). Simulation experiments in Baillie, Bollerslev, and Mikkelson (1993) show that if a GARCH process is fitted to data generated by a FIGARCH model, the estimates obtained for the autoregressive polynomial imply roots that are very close to the unit circle, as is typical in financial data. Moreover, in their application of the FIGARCH model to the exchange rate between the US dollar and the German mark, the hypothesis of IGARCH behavior against fractionally integrated behavior is clearly rejected. Similar results are obtained by Bollerslev and Mikkelson (1993) in their application of the FIEGARCH model to daily stock returns on the Standard and Poor's 500 stock index. The second class of models that allows for long memory in volatilities is the stochastic volatility class of models of Harvey (1993) and Breidt, Crato and de Lima (1994). A stochastic volatility model is an unobserved components model obtained as the product of two stochastic processes, say y_t = σ_t Z_t, where Z_t can be defined as in the ARCH model case, but σ_t² is no longer an ℱ_{t−1}-measurable process. Taylor (1986) assumes that the volatility logarithm ln(σ_t) follows a stationary, Gaussian AR(1) process. Note that stochastic volatility processes can be seen as the Euler approximation to the continuous time models used in theoretical finance, where the asset price P(t) and the volatility σ(t) each follow a diffusion process. Taylor (1994) presents a recent survey of the alternative specifications assumed for the volatility process. Breidt, Crato and de Lima (1994) propose a stochastic volatility model that captures the slow decay of the autocorrelation function of the (logarithm of the) squared returns through an ARFIMA process for a function of the volatility process. Specifically, it is assumed that σ_t = σ̄ exp(v_t/2), where v_t is a long memory process independent of Z_t. It is straightforward to show that both y_t and σ_t are covariance and strictly stationary. After some transformations the model can be written as x_t = μ + v_t + e_t, where x_t = log y_t², μ = log σ̄² + E[log Z_t²] and e_t is iid with mean zero and variance


π²/2, under the assumption that Z_t is Gaussian. x_t inherits the long memory properties from v_t. In their application of the long memory stochastic volatility model to stock returns, v_t is an ARFIMA(1, d, 0) model with an estimated value for the differencing parameter d of 0.444. A standard t-statistic test clearly rejects the hypothesis that a short-memory process generated the data. The model is estimated by maximizing Whittle's frequency-domain approximation to the Gaussian likelihood of the model. It is shown that this procedure gives consistent estimators of the parameters of the model. As with many other parameterizations concerning volatility processes, the robustness of the findings of long memory in the variance of stock returns processes has yet to be addressed. In the first place, there are not many economic arguments available to support these statistical findings. Bollerslev and Mikkelson (1994) suggest that long memory in the volatilities of stock market indexes is a consequence of aggregation, because individual returns appear to have less persistent volatility. Granger (1980) shows that the sum of AR(1) processes with coefficients drawn randomly from a suitable distribution approaches a long memory process, as the number of terms in the sum increases. The same result can be derived in the context of short-memory stochastic volatility models, with aggregation generating the observed long memory in the market index. However, a simple application of long memory tests to a sample of 2165 returns extracted from the CRSP tapes seems to contradict this hypothesis. The results presented in Table 2 for the level series are consistent with the results obtained by Hiemstra and Jones (1994b) for a similar sample, displaying little evidence of long memory in the means. However, both long memory tests indicate that a large percentage of the series exhibits some evidence of long memory in volatilities. These results should be taken with extreme care, though, because, as shown in Crato and de Lima (1994), short memory volatility processes such as GARCH can lead to rejections of the short memory null by any of the tests of long memory considered in Table 3. Models of conditional heteroskedasticity are likely to be misspecified. One way of comparing alternative specifications is by concentrating on the ability of the models to track some of the sample features. Breidt, Crato and de Lima (1994) show that the autocorrelation function of the logarithm of the squared process estimated from their long memory stochastic volatility model fits the sample autocorrelation quite closely. In particular, the model can replicate the slow decay of the sample autocorrelation function, a feature that a short-memory process like Nelson's (1991) EGARCH cannot match. The traditional GARCH(1,1) and IGARCH(1,1) models also show problems in generating this type of autocorrelation function. However, it is well known that the presence of nonstationarities can generate spurious evidence of extremely persistent features in the data. As mentioned in Section 2, nonstationarities have been suggested as an explanation for the findings of persistence in the variance. Simulation results in Cheung (1993a) show that the R/S and the GPH test have robustness problems with shifts in the level of the series, which in terms of testing long memory in

N o n l i n e a r time series, c o m p l e x i t y theory, a n d f i n a n c e

347

Table 2
Rejections of short-memory in a sample of stock returns using the Geweke and Porter-Hudak (GPH) and the Rescaled Range (R/S) tests

                        GPH                      R/S
                   X         X²             X         X²
10% Test        16.4%      72.1%         12.0%      51.8%
5% Test         10.5%      65.3%          4.0%      41.6%
Mean             0.511      0.794         0.524      0.563
Std              0.165      0.188         0.031      0.034

The rows labeled Mean and Std report the mean estimate of the Hurst coefficient and its standard error across return series for the GPH and R/S methods. The GPH test was computed for frequencies between N^0.1 and N^0.5. The number of autocorrelations considered in the R/S test follows Andrews (1991).

Table 3
Estimated sizes of the Geweke and Porter-Hudak (GPH) and Rescaled Range (R/S) tests for SWARCH models

                        GPH                      R/S
Size               X         X²             X         X²
0.10             0.162      0.327         0.067      0.466
0.05             0.099      0.236         0.029      0.353
Mean             0.501      0.658         0.525      0.565
Std              0.173      0.168         0.035      0.042

The data were generated from the Student-t SWARCH-L(3,2) model reported in Hamilton and Susmel (1994). Each of the 1000 series had N = 1024 observations. The rows labeled Mean and Std report the mean estimate of the Hurst coefficient and its standard error across return series for the GPH and R/S methods. The GPH test was computed for frequencies between N^0.1 and N^0.5. The number of autocorrelations considered in the R/S test follows Andrews (1991).

volatilities would mean that these two tests might have robustness problems to shifts in the variance. In this regard, a particularly interesting model is the Hamilton and Susmel (1994) switching ARCH (SWARCH) model. In this model, there are a finite number of volatility states s_t and the state variable is governed by a Markov chain with transition probabilities
Prob(s_t = j | s_{t−1} = i, s_{t−2} = k, …, y_{t−1}, y_{t−2}, …) = Prob(s_t = j | s_{t−1} = i) = p_ij

The return process is then defined as y_t = g(s_t)^(1/2) u_t, where g(s_t)^(1/2) is constant within each regime s_t and u_t is an ARCH-type model. Hamilton and Susmel


(1994) consider several alternative ARCH specifications for u_t, including the Glosten, Jagannathan, and Runkle (1994) parameterization that incorporates leverage effects into the ARCH framework. In this particular parameterization - designated by SWARCH-L(p, q), where L stands for leverage effects - σ_t² is given by
σ_t² = ω + α₁ u²_{t−1} + α₂ u²_{t−2} + … + α_q u²_{t−q} + ξ d_{t−1} u²_{t−1}

where d_{t−1} is a dummy variable that discriminates between positive and negative values of u_{t−1}. In the particular class of SWARCH models considered by Hamilton and Susmel (1994), the scale of the process changes with the regime but the parameters of u_t are independent of the volatility state.
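To illustrate the kind of data-generating process involved, the sketch below simulates a two-state switching-scale ARCH(2) process with a leverage term; the transition matrix, ARCH coefficients, and Gaussian innovations are illustrative choices of ours and are not the Student-t SWARCH-L(3,2) estimates of Hamilton and Susmel (1994).

import numpy as np

def simulate_swarch_l(n, P, g, omega, alpha, xi, seed=0):
    """Simulate y_t = g(s_t)^(1/2) u_t, where s_t follows a Markov chain with
    transition matrix P and u_t is an ARCH(q) process with a leverage term."""
    rng = np.random.default_rng(seed)
    P, g, alpha = np.asarray(P), np.asarray(g), np.asarray(alpha)
    q = len(alpha)
    u = np.zeros(n + q)                                   # first q entries are start-up lags
    y = np.empty(n)
    s = 0                                                 # initial regime
    for t in range(n):
        s = rng.choice(len(g), p=P[s])                    # draw the next regime
        lagged = u[t:t + q][::-1]                         # u_{t-1}, ..., u_{t-q}
        sigma2 = omega + np.dot(alpha, lagged ** 2)
        sigma2 += xi * (lagged[0] < 0) * lagged[0] ** 2   # leverage term d_{t-1} u_{t-1}^2
        u[t + q] = np.sqrt(sigma2) * rng.standard_normal()
        y[t] = np.sqrt(g[s]) * u[t + q]
    return y

# Illustrative call with two regimes and an ARCH(2) with leverage:
# y = simulate_swarch_l(1024, P=[[0.98, 0.02], [0.05, 0.95]],
#                       g=[1.0, 4.0], omega=0.2, alpha=[0.2, 0.1], xi=0.2)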


Hamilton and Susmel (1994) fit SWARCH models to weekly stock returns. This class of models presents slightly better one-period ahead forecasts (depending on the loss function considered) than more conventional GARCH models. For example, a SWARCH model with four volatility states is a conditional heteroskedastic model that has smaller mean squared error than a model with constant variance. It should be noted that this last model is a two parameter model (mean and variance) whereas the SWARCH model involves the estimation of fifteen different parameters. Some SWARCH type models may also lead to multimodal unconditional distributions, which may be counterfactual. To address the question of whether data generated from a SWARCH model would appear like a long memory volatility process to the R/S and GPH tests, we ran a small Monte Carlo simulation experiment. We took the Student-t SWARCH-L(3,2) model estimated by Hamilton and Susmel (1994) and generated two sets of 1000 series, the first one with 1024 observations and the second one with 2048 observations. We computed the two tests on the level series, and again on the squared series. As expected, when applied to the levels, the tests indicate no evidence of long memory. However, Table 3 shows that when applied to the squares of the series the tests spuriously detect evidence of long memory. Furthermore, the percentage of rejections is likely to increase if the data were generated from a SWARCH-L model estimated with higher frequency - e.g. daily - data. Similar results are reported by Crato and de Lima (1994) for data generated by Gaussian GARCH and IGARCH models, where it is shown that the generated data tend to produce larger values of the long memory test statistics than the ones actually observed in the data. However, the estimate of the Hurst coefficient provided by the R/S analysis might provide some useful information in discriminating between spurious rejections of the hypothesis that volatility processes are short memory processes and the alternative that they have long memory characteristics. Breidt, Crato and de Lima (1994) provide some Monte Carlo evidence that while the R/S statistic itself tends to over-reject the short-memory null, the estimate of the Hurst coefficient provided by the R/S statistic is close to its theoretical value of 0.5, when the number of autocorrelations included in the estimator S_q(1, n) is given by Andrews' (1991) optimal rule. Note that the mean of the estimated Hurst coefficients reported in Table 3 using the R/S method is 0.525 for the level series and 0.565 for the squared series, with little variation in the simulation results. Using the same estimator, Breidt, Crato and de Lima (1994) report estimated values of the Hurst coefficient above 0.65 for the squared returns from the value weighted and equally weighted CRSP daily series. This point seems worthy of further investigation. Turn now to a brief discussion of recent efforts to bring asymmetric information models closer to an explanation of empirical features of high to medium frequency asset market data.

4. Asymmetric information structural models and stylized features of stock returns

Recent works by Sargent (1993), Wang (1993, 1994), Brock and LeBaron (1995), de Fontnouvelle (1995), and references to the works of Admati, Campbell, Grossman, Hellwig, Lang, Litzenberger, Madrigal, Pfleiderer, Singleton, Stiglitz and others, have pushed the theory of asymmetric information models closer to an empirical model capable of explaining features of market data at higher frequencies than the business cycle frequencies stressed in the macrofinance works surveyed by Singleton (1990) and Altug and Labadie (1994). Without getting into formal detail, let us attempt to give a description of some of this work and the stylized features of market activity that we wish the models to reproduce. Here are the stylized features:
(i) The autocorrelation function of returns on individual assets is approximately zero at all leads and lags. This is a stylized statement of a version of the Efficient Markets Hypothesis.
(ii) The autocorrelation function of a measure of volatility, such as squared returns or the absolute value of returns, is positive with a slowly decaying tail (slower decay for indices). Feature (ii) is a stylized version of the "ARCH" type phenomenon which has stimulated a voluminous "statistical" literature (cf. Bollerslev, Engle, and Nelson (1994)). Evidence for the slow decay of the autocorrelation function of volatility was discussed in Section three of this article.
(iii) The autocorrelation function of trading volume has a similar shape to that of volatility. We shall call features (ii) and (iii) volatility and volume "persistence."
(iv) The cross correlation function of volume and volatility is positive for volatility with current volume and falls off rapidly to zero for leads and lags. There may be some asymmetry in the falling off in leads versus lags (e.g. Antoniewicz (1992)).
(v) Short term predictability in the near future increases when near-past volatility falls (LeBaron (1992)).
(vi) Abrupt changes in returns, volatility, and trading volume occur which are hard to attach to "news."
Turn now to an informal description of asymmetric information models. At each point in time risk averse traders receive signals on components of the actual future value of assets that are being traded today. Signals are random variables which are equal to the component of future value plus noise. Precision is the ratio of the component variance to the signal noise variance. A background level of trading volume is generated by different realizations of signals even
though the precision is the same. Trading volume is also generated by disparity in the precisions of signals across traders. If the structure of the model is common knowledge and traders are rationally conditioning on price and signals, then the famous no-trade theorems of Milgrom, Stokey, and Tirole (cf. Sargent (1993) for a nice exposition) assert that volume will dry up unless a source of randomness is added so that traders are forced to "signal process." Wang's papers (1993, 1994) give elegant closed form solutions to a class of dynamic heterogeneous agent asymmetric information models which reproduce some of the stylized features of market data. However, no work except that of Brock and LeBaron (1995) and de Fontnouvelle (1995) both endogenizes the information structure and calibrates the resulting models to see how closely they replicate the features (i)-(vi) above.

Brock and LeBaron (1995) build an asymmetric information model with short-lived assets and short-lived traders where traders decide whether to spend resources on the purchase of a precise signal to sharpen their conditional expectation of the end-of-period value of the asset, or spend nothing and get a publicly available crude conditional expectation. Call the actual end-of-period value of the asset the "fundamental". The fundamental is a random variable which the market is pricing. The information purchase decision is based upon a discrete choice random utility model where the deterministic part of the utility is based upon a distributed lag measure of trading profits. The trading profits are calculated along an equilibrium path.

de Fontnouvelle (1995) develops a much more sophisticated model along the same lines, but with infinitely lived assets. He shows how persistence in the profit measure that governs the choice of signal purchase generates persistence in volatility and volume. It appears that if his profit measure decays slowly enough his model may be able to produce slowly decaying autocorrelation functions for volatility and volume. This may shed some light on the slow decay of volatility autocorrelations documented in Section three. de Fontnouvelle "solves" his model by developing an expansion around a known solution. Both models discussed here reproduce features (i)-(iv) with some limited success. Hence, since it has infinitely lived assets, the de Fontnouvelle (1995) model may be a candidate for estimation on high frequency returns and volume data somewhat along the lines of Duffie and Singleton (1993). If one "backs off" from "ultra" rationality and does not allow traders to condition on the equilibrium price function, then this kind of model generates trading volume which is persistent provided the heterogeneity of traders is persistent. The trader heterogeneity can be made persistent in the Brock and LeBaron (1995) model provided that the decision whether or not to purchase the signal is made on a slower time scale than the time scale of data observation. Infinitely lived assets, together with slow decay of the distributed lags in the profit measure, allow de Fontnouvelle to produce persistence without introduction of a slower time scale for information purchase decisions.
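The information-purchase mechanism can be sketched very crudely as follows. This is a stylized illustration of the discrete-choice idea only, not the Brock and LeBaron (1995) or de Fontnouvelle (1995) models themselves; the parameter values, profit processes, and variable names are all hypothetical.

```python
import numpy as np

# Stylized sketch: each period traders choose whether to buy a precise
# signal.  The choice is a logit in the difference between distributed-lag
# profit measures of "informed" and "uninformed" trading, net of the
# signal cost, scaled by the intensity of choice.  All values are
# illustrative assumptions.
rng = np.random.default_rng(1)
T, cost, intensity, decay = 500, 0.05, 4.0, 0.9
u_inf = u_un = 0.0                    # distributed-lag profit measures
frac_informed = np.empty(T)
for t in range(T):
    pi_inf = 0.10 + 0.05 * rng.standard_normal()   # profits of informed traders
    pi_un = 0.05 + 0.05 * rng.standard_normal()    # profits of uninformed traders
    u_inf = decay * u_inf + (1 - decay) * (pi_inf - cost)
    u_un = decay * u_un + (1 - decay) * pi_un
    # logit choice probability: fraction of traders purchasing the signal
    frac_informed[t] = 1.0 / (1.0 + np.exp(-intensity * (u_inf - u_un)))

print(frac_informed[-5:])
```

Because the distributed-lag profit measures move slowly, the fraction of informed traders, and hence market precision, is persistent; this is the channel through which such models generate volatility and volume persistence.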

Volatility of price changes (or returns) depends upon the average precision of the market. The average precision of the market is defined to be the weighted average of the precision of each trader type, with weights equal to the fraction of traders of that type. Volatility of price change is higher when market precision is high because the market is closely "tracking" the random end-of-period value which it is attempting to price. When precision is lowest, price change is proportional to the change in publicly available conditional expectations. If the publicly available information is very "coarse" this price change could be small. This observation contains a lesson, which is, perhaps, obvious to academics, but maybe not to commentators in the press: observed market volatility cannot automatically be associated with problematic "excess" volatility. It can be shown that volatility persistence may be magnified provided that the precision purchase decision is made on a slower time scale than the data time scale. The precision purchase decision might be considered as a metaphor for the "style" of the traders, i.e. whether they are "short term," "medium term," or "long term" traders. This is so because at least part of the cost of signal precision is the opportunity cost of traders in maintaining their trading expertise and information base. Hence, for high frequency data, it may be plausibly realistic that the "style" of the traders does not change as fast as the data are collected. It is of interest to ask whether volatility persistence is inherent in the fundamental which the market is attempting to price, i.e. "estimate," or whether the market pricing process itself adds the volatility persistence. If traders are risk averse, volatility persistence in the fundamental can make them timid in their trading, so that the contemporaneous correlation between volume and volatility damps enough to conflict with stylized feature (iv). Brock and LeBaron (1995) and de Fontnouvelle (1995) discuss this potential conflict with the stylized features, which arises unless the volatility persistence is being added by the market pricing process itself. Even though the above argument suggests the possibility that the market pricing process itself may be adding volatility persistence over and above the volatility persistence which is in the fundamental, the jury is still out on this issue.

Now consider the impact of adding "outside" shares which the trading community as a whole must hold in equilibrium. This creates risk which the community as a whole cannot avoid. The trading community must be compensated to hold this risk. This effect creates a risk premium which discounts the equilibrium stock price. Randomness in the net supply of these "outside" shares is introduced in much of the asymmetric information literature in order to prevent common knowledge and price conditioning from drying up volume in equilibrium (see Sargent (1993)). If changes in the net supply of outside shares are positively correlated then the LeBaron effect (v) can be explained within the context of the Brock and LeBaron model. Here is why. Near-past volatility increases when near-past market precision increases. When market precision is infinite, autocorrelation in outside share supply has zero effect on autocorrelation of price change. This is so because the depressing effect upon equilibrium price caused by these outside shares is caused
by the risk that the community must bear in holding these outside shares. But when market precision is infinite this risk is zero. Autocorrelation of near-future price changes with current price changes is a ratio of a covariance to the product of standard deviations. A rise in market precision increases the standard deviations for the reasons we gave above. The covariance would be zero because of fact (i). But it is positive in the Brock-LeBaron model when the covariance in the net supply of outside shares is positive and the market precision is finite. If the market precision increases, this covariance is decreased for the reasons given above. We have an explanation for fact (v) within the context of this model. It remains to be seen whether this corresponds to any reason found in reality. However, de Fontnouvelle (1995) is able to produce the LeBaron (1992) effect in his more realistic model.

Let us discuss fact (i). In the Brock and LeBaron (1995) model, observed market price is a predictor of the fundamental. Hence price differences represent differences in predictors, which makes it fairly easy for the model to reproduce stylized fact (i) provided that the fundamental is a random walk. de Fontnouvelle's (1995) model can do a better job of reproducing this feature because the intertemporal forces that act to produce low autocorrelation at higher frequencies are better captured by his model.

Finally, let us discuss the last fact, (vi). Brock and LeBaron briefly discuss embedding their model in the general asset pricing framework with social interactions developed by Brock (1993). This framework grafts social interactions in the choice decision of whether to buy more precise information onto conventional asset pricing models and generates asset pricing formulae that can display abrupt changes in equilibrium asset values provided the social interactions are strong enough. This is due to the interactions causing a breakdown of the cross sectional central limit theorem as the large economy limit is taken. In the Brock and LeBaron model all that is needed for the breakdown of the cross sectional central limit theorem, in the large economy limit, is that the product of the intensity of choice with the strength of the social interactions be large enough. In other words, high intensity of choice, i.e. a lot of "rationality," can combine with a small amount of "sociology" to produce large responses to small changes in the environment. If the intensity of choice is parameterized as a function of the difference in profit measures from buying the signal versus not buying the signal, then this kind of model can not only endogenize "jumps" in market data but also lead to "phases" in the market where volatility and "excess returns" differ. In high precision phases volatility is high because the market is "tracking" well, but "excess" returns are not high because very little risk is being borne by holding the outside shares. This can be viewed as an integration of Vaga's (1994) "Coherent Market Hypothesis" with more conventional asset pricing theories. This kind of modelling can produce behavior which looks more like the "Markov switching" models of Hamilton and Susmel (1994) which are discussed in Section three. The different regimes correspond to the different phases when most traders are well informed and when most traders are poorly informed. The
social interactions magnify the coherence of traders' decisions so that the trading group acts more like a "clump" than like a group of independent random variables. This clumping can generate behavior that looks more like Markov switching. Section three shows how Markov switching models can produce "spurious" long term dependence in volatility. Of course, we do not wish to imply that social interactions are the only realistic forces that may produce abrupt changes in market data. See Jacklin, Kleidon, and Pfleiderer (1992) for a discussion of the role of other forces, such as portfolio insurance, stale prices, trading institutions, etc., in producing abrupt changes such as the October crashes.

In this section we have discussed very recent work on calibration of "structural" models to reproduce common features of financial data at relatively high frequencies. Furthermore, these kinds of models appear tractable enough to estimate on returns and volume data with computer intensive methods, like those of Duffie and Singleton (1993). It may be possible to use bootstrap-based specification tests along the lines discussed in this section to judge the models, provided that advances in computer technology continue to drive computation costs down. Specification tests based upon quantities of direct financial interest, like trading profits, may give us better information than conventional specification testing on how to fix the model if it is rejected by the specification test. Turn now to some brief closing remarks.

5. Concluding remarks
This article has given a highly selective survey of some recent work in finance. It has briefly discussed: (i) "complexity theory" and its possible role in generating "fat tailed" returns data in finance, (ii) phenomena by frequency, (iii) nonlinearity testing, (iv) testing for long memory, (v) cautions raised by moment condition failure of popular tests, (vi) problems raised by testing for the existence of moments, (vii) bootstrap-based specification testing based upon quantities of interest in finance such as trading profits, and (viii) some recent efforts in asymmetric information structural modeling with calibration. In view of the challenges posed to conventional analytics by this type of work, we believe that future progress will make use of computer intensive methods such as those of Judd and Bernardo (1993), Judd (1994), and Rust (1994). Computer intensive methods will allow a closer dialogue between features of the data, structural modeling, and specification testing which uses financially relevant quantities such as trading profits.

References
Abhyankar, A., L. Copeland, and W. Wang (1995). Nonlinear dynamics in real-time equity market indices: Evidence from the UK. Econom. J. to appear.

Abhyankar, A. (1994). Linear and nonlinear granger causality: Evidence from the F T - SE100 index futures and cash markets. Department of Accountacy and Finance, University of Stifling, Scotland. Abu-Mostafa, Y. Chin. (1994). Proceedings o f Neural Networks in the Capital Markets: N N C M '94, California Institute of Technology. Agiakloglou, C. P. Newbold, and M. Woahr (1994). Lagrange multiplier tests for fractional difference. J. Time Ser. Anal. 15, 253-262. Akgiray, V. and G. C. Booth (1988). The stable-law model of stock returns. J. Business Econom. Statist. 6, 51-57. Akgiray, V. and C. Lamoureux (1989). Estimation of stable parameters: A comparative study. J. Business Econom. Statist. 7, 85-93. Altug, S. and P. Labadie (1994). Dynamic Choice and Asset Markets. New York: Academic Press. Andersen, T. (1995). Return volatility and trading volume: An information flow intepretation of stochastic volatility. Department of Finance, Kellogg School of Management, Northwestern University W.P. #170. Andersen, T. and T. Bollerslev (1994). Intraday seasonality and volatility persistence in foreign exchange and equity markets. Department of Finance, Kellogg School of Management, Northwestern University, W.P. #186. Andrews, D. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858. Anis, A. and E. Lloyd (1976). The expected values of the adjusted rescaled Hurst range of independent normal summands. Biometrika 63, 111-116. Antoniewicz, R. (1992). A causal relationship between stock returns and volume. Board of Governors, Federal Reserve System, Washington, D.C. Antoniewicz, R. (1993). Relative volume and subsequent stock price movements. Board of Governors, Federal Reserve System, Washington, D.C. Arthur, B., J. Holland, B. LeBaron, R. Palmer, and P. Tayler (1993). Artificial economic life: A simple model of a stockmarket. Santa Fe Institute, Working Paper. Aydogan, K. and G. Booth (1988). Are there long cycles in common stock returns? South. Econom. J. 55, 141-149. Auestad, B. and D. Tj6stheim (1990). Identification of nonlinear time series: 1st order characterization and order determination. Biometrika 77, 669-687. Back, E. and W. Brock (1992a). A general test for nonlinear Granger causality: Bivariate model. Mimeo, Department of Economics, University of Wisconsin-Madison. Baek, E. and W. Brock (1992b). A nonparametric test for temporal dependence in a vector of time series. Statist. Sinica 2, 137-156. Baillie, R. (1995). Long memory processes and fractional integration in Econometrics. J. Econometrics to appear. Baillie, R., T. Bollerslev, and H. Mikkelsen, (1993). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Working Paper No. 168, Department of Finance, Northwestern University. Bak, R. and D. Chen (1991). Self-organized criticality. Scientific American, January. Barnett, W., R. Gallant, M. Hinich, J. Jungeilges, D. Kaplan and M. Jensen (1994). A single-blind controlled competition between tests for nonlinearity and chaos. Working Paper No. 190, Department of Economics, Washington University in St. Louis. Beran, J. (1992). A goodness of fit test for time series with slowly decaying serial correlations. J. Roy. Statist. Soc., Ser. B 54, 749-760. Beran, J. (1993). Fitting long-memory models by generalized linear regression. Biometrika 80, 817-822. Beran, J. (1994). Statistics f o r Long-Memory Processes. Chapman and Hall, New York. Bierens, H. (1990). 
A consistent conditional moment test of functional form. Econometrica 58, 1443-1458. Blattberg, R. C. and N. J. Gonedes (1974). A comparison of the stable and student distributions as statistical models for stock prices. J. Business 47, 244-280.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31, 307-327. Bollerslev, T., R. Engle, and D. Nelson (1994). ARCH models. In: R. Engle and D. McFadden, eds., The Handbook of Econometrics, Vol. IV, North-Holland, Amsterdam. Bollerslev, T. and Mikkelsen, H. (1993). Modeling and pricing long-memory in stock market volatility. Working Paper No. 134, Department of Finance, Northwestern University. Bollerslev, T. and J. Wooldridge (1992). Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Rev. 11, 143-172. Boothe, P. and D. Glassman (1987). The statistical distribution of exchange rates: Empirical evidence and economic implications. J. Internat. Economics 22, 297-320. Bradley, R. and R. McClelland (1994a). An improved nonparametric test for misspecification of functional form. Mimeo, Bureau of Labor Statistics. Bradley, R. and R. McClelland (1994b). A kernel test for neglected nonlinearity. Mimeo, Bureau of Labor Statistics. Breidt, J., N. Crato, and P. de Lima (1994). Modeling long memory stochastic volatility. J. Econometrics, to appear. Working Papers in Economics No. 323, Department of Economics, The Johns Hopkins University. Brock, W. (1982). Asset prices in a production economy. In: The Economics of Uncertainty, ed. by J.J. McCall, Chicago: University of Chicago Press. Brock, W. (1993). Pathways to randomness in the economy: Emergent nonlinearity and chaos in economics and finance. Estudios Economicos 8, E1 Colegio de Mexico, Enero-junio, 3 55. Brock, W. A., W. D. Dechert, and J. Scheinkman (1987). A test for independence based on the correlation dimension. Department of Economics, University of Wisconsin, University of Houston and University of Chicago. (Revised Version, 1991: Brock, W. A., W. D. Dechert, J. Scheinkman, and B. D. LeBaron), Econometric Rev. to appear. Brock, W., D. Hsieh, and B. LeBaron (1991). A Test of Nonlinear Dynamics, Chaos and Instability: Theory and Evidence. M.I.T Press, Cambridge. Brock, W. and A. Kleidon (1992). Periodic market closure and trading volume: A model of intraday bids and asks. J. Econ. Dynamic Control 16, 451-489. Brock, W., J. Lakonishok, and B. LeBaron (1992). Simple technical trading rules and the stochastic properties of stock returns. J. Finance 47, 1731 1764. Brock, W. and B. LeBaron (1995). A dynamic structural model for stock return volatility and trading volume. Rev. Econ. Stat. to appear, NBER W.P. #4988. Brock, W. A. and S. M. Potter (1993). Nonlinear time series and macroeconometrics. In: G. S. Maddala, C. R. Rao, and H. Vinod, eds., Handbook of Statistics Volume 11: Econometrics, North Holland, New York. Brockwell, P. and R. Davis (1991). Time Series." Theory and Models. Springer-Verlag, New York. Cai, J. (1994). A Markov model of unconditional variance in ARCH. J. Business Econom. Statist. 12, 309-316. Campbell, J., S. Grossman, and J. Wang (1993). Trading volume and serial correlation in serial returns. Quart. J. Econom. 108, 905-939. Campbell, J., A. Lo, and C. MacKinlay (1993). The Econometrics of Financial Markets. Princeton University Press, to appear. Cheung, Y. (1993a). Tests for fractional integration: A Monte Carlo investigation. J. Time Ser. Anal. 14, 331-345. Cheung, Y. (1993b). Long memory in foreign exchange rates. J. Business Econom. Statist. 11, 93 101. Cheung, Y. and K. Lai (1993). Do gold markets have long-memory? Financ. Rev. 28, 181~02. Cheung, Y., K. Lai and M. Lai (1993). 
Are there long cycles in foreign stock returns? J. Internat. Financ. Markets, Institut. Money 3, 33-47. Clark, P. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-155. Cochrane, J. (1988). How big is the random walk in GNP? J. Politic. Econom. 96, 893-920.

Crato, N. (1994). Some international evidence regarding the stochastic memory of stock returns. Appl. Finane. Econom. 4, 33 39. Crato, N. and P. de Lima (1994). Long range dependence in the conditional variance of stock returns. Econom. Lett. 45, 281-285. Creedy, J. and V. Martin, (1994). Chaos and Non-linear Models in Economics: Theory and Applications. Brookfield, Vermont: Edward Elgar. Davies, R. and D. Harte (1987). Tests for Hurst effect. Biometrika 74, 95-102. De Fontnouvelle, P. (1995). Three Models of Stock Trading. PhD Thesis, Department of Economics, The University of Wisconsin, Madison. De Haan, L., S. Resnik, H. Rootzen and C. de Vries (1989). Extremal behavior of solutions to a stochastic difference equation with applications to ARCH-processes. Stochastic Processes and their Applications 32, 213-224. De Jong, R. (1992). The Bierens test under data dependence. Mimeo, Free University of Amsterdam. De Jong, R. and H. Bierens (1994). On the limit behavior of a chi-square type test if the number of conditional moments tested approaches infinity. Econometric Theory 9, 70-90. De Lima, P. (1994a). On the robustness of nonlinearity tests to moment condition failure. J. Econometrics, to appear, Working Papers in Economics No. 336, Department of Economics, The Johns Hopkins University. De Lima, P. (1994b). Nonlinearities and nonstationarities in stock returns. Mimeo, Department of Economics, The Johns Hopkins University. De Lima, P. (1995). Nuisance parameter free properties of correlation integral based statistics. Econometric Rev., to appear. De Vries, C. (1991). On the relation between GARCH and stable processes. Y. Econometrics 48, 313324. Dechert, W. D. (1988). A characterization of independence for a Gaussian process in terms of the correlation integral. University of Wisconsin SSRI W.P. 8812. Diebold, F. (1986). Modeling the persistence of conditional variances: Comment. Econometric Rev. 5, 51-56. Diebold, F. and J. Lopez (1995). Modeling volatility dynamics. In: K. Hoover, ed., Macroeconometrics: Developments, Tensions and Prospects, Kluwer Publishing Co. Ding, Z., C. Granger, and R. Engle (1993). A long memory property of stock market returns and a new model. J. Emp. Finance 1, 83 106. Duffle, D. and K. Singleton (1993). Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929-952. DuMouchel, W. (1983). Estimating the stable index ~ in order to measure the tail thickness: A critique. Ann. Statist. 11, 1019-1031. Efron, B. and R. Tibshirani (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statist. Sci. 1, 54-77. Ellis, R. (1985), Entropy, Large Deviations and Statistical Mechanics. New York, Springer-Verlag. Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007. Engle, R. and T. Bollerslev (1986). Modelling the persistence of conditional variances. Econometric Rev. 5, 1 50. Fama, E. and K. French (1988). Permanent and temporary components of stock prices. J. Politic. Econom. 96, 246-273. Fielitz B. and J. Rozelle (1983). Stable distributions and the mixtures of distributions hypothesis for common stock returns. J. Amer. Statist. Assoc. 78, 28-36. Friedman, D. and J. Rust, eds., (1993). The Double Auction Market: Institutions, Theories, and Evidence. Addison-Wesley, Redwood City, California. Friggit, J. (1995). 
Statistical mechanics of evolutive financial markets: Application to short term FOREX dynamics. Essec Business School, near Paris, France. Gallant, R., P. Rossi, and G. Tauchen (1992). Stock prices and volume. Rev. Financ. Stud. 5, 199-242.

Gallant, R., P. Rossi, and G. Tauchen (1993). Nonlinear dynamic structures. Econometrica 61, 871907. Geweke, J. and S. Porter-Hudak (1983). The estimation and application of long memory time series models. J. Time Ser. Anal. 4, 221-238. Glosten, L., R. Jagannathan and D. Runkle (1994). Reltionship between the expected value and the volatility of the nominal excess return on stocks. J. Finance 48, 1779-1802. Goetzman, W. (1993). Patterns in three centuries of stock market prices. J. Business 66, 249-270. Goldfeld, S. and R. Quandt (1973). A Markov model for switching regressions. J. Econometrics 1, 315. Goodhart, C. and M. O'Hara (1995). High frequency data in financial markets: Issues and applications. London School of Economics and Johnson Graduate School of Management, Cornell University. Granger, C. (1980). Long memory relationships and the aggregation of dynamic models. 3.. Econometrics 14, 227-238. Granger, C. and R. Joyeux (1980). An introduction to long-range time series models and fractional differencing. J. Time Ser. Anal. 1, 15-30. Granger, C. and J. Lin (1994). Using the mutual information coefficient to identify lags in nonlinear models. J. Time Ser. Anal. 15, 371-384. Granger, C. and T. Terfisvirta, (1993). Modeling Nonlinear Economic Relationships. Oxford University Press, Oxford. Greene M. and B. Fielitz (1977). Long-term dependence in common stock returns. J. Financ. Econom. 4, 339-349. Grossman, S. (1989). The Informational Role o f Prices. Cambridge, MA.: MIT Press. Guillaume, D., M. Dacorogna, R. Dave', U. Muller, R. Olsen, and O. Pictet (1994). From the bird's eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange markets. Olsen and Associates, Zurich, Switzerland. Hall, P. (1982). On some simple estimates of an exponent of regular variation. J. Roy. Statist. Soc. 44, 37-42. Hall, P. (1994). Methodology and theory for the bootstrap. In: R. Engle and D. McFadden, eds., The Handbook o f Econometrics, Vol. IV, North-Holland, Amsterdam. Hamilton, J. D. and R. Susmel (1994). Autoregressive conditional heteroskedasticity and changes in regime. J. Econometrics 64, 302333. Harvey, A. C. (1993). Long memory in stochastic volatility. Mimeo, London School of Economics. Hiemstra, C. and J. Jones (1994a). Testing for linear and nonlinear Granger causality in the stockvolume retlation. J. Finance 49, 1639-1664. Hiemstra, C. and J. Jones (1994b). Another look at long memory in common stock returns. Discussion Paper 94/077, University of Strathclyde. Hiemstra, C. and C. Kramer (1994). Nonlinearity and endogeneity in macro-asset pricing. Department of Finance, University of Strathclyde, Scotland. Hill, B. (1975). A simple general approach to inference about the tail of a distribution. Ann. Math. Statist. 3, 1163-1174. Hinich, M. (1982). Testing for Gaussianity and linearity of a stationary time series. J. Time Ser. Anal. 3, 443-451. Hinich, M. and D. Patterson (1985). Evidence of nonlinearity in stock returns. J. Business Econom. Statist. 3, 69-77. Hodges, S. (1995). Arbitrage in a fractal Brownian motion market. Financial Options Research Centre, University of Warwick. Hong, Y. and H. White (1995). Consistent specification testing via nonparametric series regression. Econometrica 63, 1133-1159. Horgan, J. (1995). From complexity to perplexity: Can science achieve a unified theory of complex systems? Even at the Santa Fe Institute, some researchers have their doubts. Sci. Amer. 276, 104109.

Horowitz, J. (1995). Lecture notes on bootstrap. Lecture notes prepared for World Congress of the Econometric Soc., Tokyo, Japan, 1995. Hosking, J. (1981). Fractional differencing. Biometrika 68, 165-176. Hsieh, D. A. (1991). Chaos and nonlinear dynamics: Application to financial markets. J. Finance 46, 183%1877. Hsu, D., R. Miller, and D. Wichern (1974). On the stable paretian behavior of stock-market prices. J. Amer. Statist. Assoc. 69, 108-113. Hurst, H. (1951). Long-term storage capacity of reservoirs. Transactions of the American Socitey of Civil Engineers 116, 770-799. Incl~m, C. (1993). GARCH or sudden changes in variance? An empirical study. Mimeo, Georgetown University. Jacklin, C., A. Kleidon, and P. Pfleiderer (1992). Underestimation of portfolio insurance and the crash of October 1987. Rev. Financ. Stud. 5, 35-63. Jaditz, T. and C. Sayers, (1993). Is chaos generic in economic data? Internat. J. Bifurcations Chaos, 745 755. Jansen, D. and C. de Vries (1991). On the frequency of large stock returns: Putting booms and busts into perspective. Rev. Econom. Statis. 73, 18-24. Jog, V. and H. Schaller (1994). Finance constraints and asset pricing: Evidence on mean reversion. J. Emp. Finance 1, 193-209. Judd, K. (1994). Numerical Methods in Economics, to appear, Hoover Institute. Judd, K., A. Bernardo (1993). Asset market equilibrium with general securities, tastes, returns, and information, asymmetries. Working paper, Hoover Institution. Kim, M., C. Nelson and R. Startz (1991). Mean reversion in stock prices? A reappraisal of the empirical evidence. Rev. Econom. Stud., 58, 515 528. Koedijk, K., M. Schafgans and C. de Vries (1990). The tail index of exchange rate returns. J. Internat. Econom. 29, 93-108. Kramer, C. (1994). Macroeconomic seasonality and the January effect. J. Finance 49, 1883-1891. Krugman, P. (1993). Complexity and emergent structure in the international economy. Department of Economics, Stanford University. Lamoureux, C. and W. Lastrapes (1990). Persistence in variance, structural change, and the GARCH model. J. Business Econom. Statist. 8, 225-234. Lamoureux, C. and W. Lastrapes (1994). Endogenous trading volume and momentum in stock return volatility. J. Business Econom. Statist. 12, 253-260. LeBaron, B. (1992). Some relations between volatility and serial correlations in stock returns. J. Business 65, 199-219. LeBaron, B. (1993). Emergent Structures: a Newsletter of the Economics Research Program at the Santa Fe Institute. LeBaron, B. (1994). Chaos and nonlinear forecastiblity in economics and finance. Philos. Trans. Roy. Soc. London, Ser. A 348, 397--404. Lee, B.-J. (1988) A model specification test against the nonparametric altrenative. Ph.D. Dissertation, University of Wisconsin. Lee, T., H. White, and C. Granger (1993). Testing for neglected nonlinearity in time series models, a comparison of neural network methods and alternative tests. J. Econometrics 56, 269-290 Leger, C., D. Politis, and J. Romano (1992). Bootstrap technology and applications. Technometrics 34, 378-398. LePage, R. and L. BiUard (1992). Exploring the Limits of Bootstrap. John Wiley and Sons: New York. Levich, R. and L. Thomas (1993). The significance of technical trading-rule profits in the foreign exchange market: A bootstrap approach. J. lnternat. Money Finance 12, 451-474. Li, H. and G. S. Maddala (1995). Bootstrapping time series models. Econometric. Rev. to appear. Lo, A. (1991). Long-term memory in stock market prices. Econometrica 59, 1279-1313. Lo, A. and C. MacKinlay (1988). 
Stock markets do not follow random walks: Evidence from a simple specification test. Rev. Financ. Stud. 1, 41-66.

Loretan, M. (1991). Testing covariance stationarity of heavy-tailed economic time series, Ch. 3, Ph. D. Dissertation, Yale University. Loretan, M. and P. C. B. Phillips, (1994). Testing the covariance stationarity of heavy-tailed time series: An overview of the theory with applications to several financial datasets. J. Emp. Finance 1, 211-248. Lucas, R. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1445. Luukkonen, R., P. Saikkonen, and T. Terasvirta (1988). Testing linearity against smooth transition autoregressions. Biometrika 75, 491-499. Maddala, G. S. and H. Li (1996). Bootstrap based tests in financial models. In: G. S. Maddala, C. R. Rao, eds., Handbook of Statistics 14: Statistical Methods in Finance, North Holland, New York. Mandelbrot, B. (1963). The variation of certain speculative prices. J. Business 36, 394-419. Mandelbrot, B. (1971). When can price be arbitraged efficiently? A limit to the validity of the random walk and martingale models. Rev. Econom. and Statist. 53, 543-553. Mandelbrot, B. (1975). Limit theorems of the self-normalized range for weakly and strongly dependent processes. Z. Wahrsch. Verw. Geb. 31, 271-285. Mandelbrot, B. and M. Taqqu (1979). Robust R/S analysis of long run serial correlation. 42nd session of the International Statistical Institute, Manila, Book 2, 69-99. Mandelbrot, B. and J. Wallis (1968). Noah, Joseph, and operational hydrology. Water Resources Research 4, 967-988. McCulloch, H. (1995). Measuring tail thickness in order to estimate the stable index ~: A critique Department of Economics, Ohio State University. McCulloch, H. (1996). Financial applications of stable distributions. In: G. S. Maddala, and C. R. Rao, eds., Handbook of Statistics Volume 14: Statistical Methods in Finance. North Holland, New York. McLeod, A. and W. Li (1983). Diagnostic checking ARMA time series models using squared-residual autocorrelations. J. Time Ser. AnaL 4, 269-273. McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57, 995-1026. Mehra, R. (1991). On the volatility of stock market prices. Working Paper, Department of Economics, The University of California, Santa Barbara, J. Emp. Finance, to appear. Michener, R. (1984). Permanent income in general equilibrium. J. Monetary Econom. 14, 297-305. Mills, T. (1993). Is there long-term memory in UK stock returns?. Appl. Financ. Econom. 3, 293-302. Mittnik, S. and S. Rachev (1993). Modeling asset returns with alternative stable distributions. Econometric Rev. 12, 261-330. Nelson, D. B. (1990). Stationarity and persistence in the GARCH(1,1) model. Econometric Theory 6, 318-334. Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347-370. Pagan, A. (1995). The econometrics of financial markets. Mimeo, The Australian National University and The University of Rochester. Pagan, A. and G. Schwert (1990). Testing for covariance stationarity in stock market data. Econom. Lett. 33, 165 170. Pakes, A. and D. Pollard (1989). Simulation and Asymptotics of optimization estimators. Econometrica 57, 1027-1057. Peters, E. (1994). Fractal Market Analysis. John Wiley & Sons, New York. Poterba, J. and L. Summers (1988). Mean reversion in stock returns: Evidence and implications. J. Financ. Econom. 22, 27-60. Prigogine, I. and M. Sanglier, eds., (1987), Laws of Nature and Human Conduct: Specificties and Unifying Themes. G.O.R.D.E.S. 
Task Force of Research Information and Study on Science, Bruxelles, Belgium. Priestley, M. (1988). Non-linear and Non-stationary Time Series Analysis, Academic Press, New York. Ramsey, J. B. (1969). Tests for specification errors in classical linear least-squares regression analysis. J. Roy. Statist. Soc. 31, 350-371.

Randles, R. (1982). On the asymptotic normality of statistics with estimated parameters. Ann. Statist. 10, 462-474. Richardson, M. (1993). Temporary components of stock prices: A skeptic's View. J. Business Econom. Statist. 11, 199-207. Robinson, P. (1983). Nonparametric estimators for time series. J. Time Set. Anal. 4, 185 207. Robinson, P. (1991 a). Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. J. Econometrics 47, 67-84. Robinson, P. (1991b). Consistent nonparametric entropy-based testing. Rev. Econom. Stud., 58, 437453. Robinson, P. (1993). Log-periodogram regression for time series with long range-dependence. Mimeo, London School of Economics. Rosen, S., K. Murphy, and J. Scheinkman (1994). Cattle Cycles. J. Politic. Econom. 102, 468-492. Rust, J. (1994). Structural estimation of Markov decision processes. In: R. Engle and D. McFadden, eds., The Handbook of Econometrics, Vol. IV, North-Holland, Amsterdam. Saikkonen, P. and R. Luukkonen (1988). Lagrange multiplier tests for testing non-linearities in time series models. Scand. J. Statist. 15, 55-58. Samorodnitsky, G. and M. Taqqu (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, New York. Sargent, T. (1993). Bounded Rationality in Macroeconomics. Oxford: Clarendon Press. Sargent, T. (1995). Adaptation of macro theory to rational expectations. Working Paper, Department of Economics, University of Chicago and Hoover Institution. Savit, R. and M. Green (1991). Time series and dependent variables. Physica D 50, 521-544. Scheinkman, J. (1992). Stock returns and nonlinearities. In: P. Newman, M. Milgate, and J. Eatwell, The New Palgrave Dictionary of Money and Finance. London: MacMillan, 591-593. Scheinkman, J. and M. Woodford, (1994). Self-organized criticality and economic fluctuations. Amer. Econom. Rev. Papers Proc. May, 417-421. Simonato, J. G. (1992). Estimation of GARCH processes in the presence of structural change. Econom Lett. 40, 155-158. Singleton, K. (1990). Specification and estimation of intertemporal asset pricing models. In: B. Friedman and F. Hahn, eds., Handbook of Monetary Economics: L North Holland, Amsterdam. Smith, R., Chm. (1990). Market Volatility and Investor Confidence: Report of the Board of Directors of the New York Stock Exchange, Inc., New York Stock Exchange: New York. Subba Rao, T. and M. Gabr (1980). A test for linearity of stationary time series. J. Time Ser. Anal. 1, 145-158. Summers, L. (1986). Does the stock market rationally reflect fundamental values? 9".Finance 44, 11151153. Taylor, S. (1994). Modeling stochastic volatilty: A review and comparative study. Math. Finance 4, 183-204. Thursby, J. G. and P. Schmidt (1977). Some properties of tests for specification error in a linear regression model. J. Amer. Statist. Assoc. 72, 635-641. Tj6stheim, D. and B. Auestad (1994). Nonparametric identification of nonlinear time series: Selecting significant lags. J. Amer. Statist. Assoc. 89, 1410-1419. Tsay, R. (1986). Nonlinearity tests for time series. Biometrika 73, 461-466. Vaga, T. (1994). Profiting from Chaos: Using Chaos Theory for Market Timing, Stock Selection, and Option Valuation. New York: McGraw-Hill. Viano, M., C. Deniau and G. Oppenheim (1994). Continuous-time fractional ARMA processes. Statist. & Probab. Lett. 21, 323-336. Wallis, J. and N. Matalas (1970). Small sample properties of H and K, estimators of the Hurst coefficient h. Water Resources Research 6, 332. Wang, J. (1993). 
A model of intertemporal asset prices under asymmetric information. Rev. Econom. Stud. 6, 405-434. Wang, J. (1994). A model of competitive stock trading volume. J. Politic. Econom. 102, 127-168. Weidlich, W. (1991). Physics and social science: The approach of synergetics. Phys. Rep. 204, 1-163.

West, K., H. Edison, D. Cho (1993). A Utility-based comparison of some models of exchange rate volatility. J. Internat. Econom. 35, 23-45. White, H. (1987). Specification testing in dynamic models. In: Bewley T., ed., Advances in Econometrics, Fifth World Congress, Volume 1, Cambridge University Press, Cambridge. White, H. and J. Wooldridge (1991). Some results on sieve estimation with dependent observations. In: W. Barnett, J. Powel and G. Tauchen, eds., Semiparametric and Nonparametric Methods in Economics and Statistics, Cambridge University Press, New York. Wooldridge, J. (1992). A test for functional form against nonparametric alternatives. Econometric Theory 8, 452-475. Wu, P. (1992). Testing fractionally integrated time series. Mimeo, Victoria .University of Wellington. Wu, K., R. Savit, and W. Brock (1993). Statistical tests for deterministic effects in broad band time series. Physica D 69, 172-188. Yatchew, A. (1992). Nonparametric regression tests bsaed on an infinite dimensional least squares procedure. Econometric Theory 8, 452-475. Zolatarev, V. (1986). One-dimensional Stable Distributions, Vol. 65 of Translations of mathematical monographs. American Mathematical Society. Translation from the original 1983 Russian edition.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.


Count Data Models for Financial Data

A. Colin Cameron and Pravin K. Trivedi

In some financial studies the dependent variable is a count, taking nonnegative integer values. Examples include the number of takeover bids received by a target firm, the number of unpaid credit installments (useful in credit scoring), the number of accidents or accident claims (useful in determining insurance premia) and the number of mortgage loans prepaid (useful in pricing mortgage-backed securities). Models for count data, such as the Poisson and negative binomial, are presented, with emphasis placed on the underlying count process and links to dual data on durations. A self-contained discussion of regression techniques for the standard models is given, in the context of financial applications.

1. Introduction

In count data regression, the main focus is the effect of covariates on the frequency of an event, measured by non-negative integer values or counts. Count models, such as Poisson and negative binomial, are similar to binary models, such as probit and logit, and other limited dependent variable models, notably tobit, in that the sample space of the dependent variable has restricted support. Count models are used in a wide range of disciplines. For an early application and survey in economics see Cameron and Trivedi (1986), for more recent developments see Winkelmann (1994) and Winkelmann and Zimmermann (1995), and for a comprehensive survey of the current literature see Gurmu and Trivedi (1994). The benchmark model for count data is the Poisson. If the discrete random variable Y is Poisson distributed with parameter λ, it has density e^{-λ}λ^y/y!, mean λ and variance λ. Frequencies and sample means and variances for a number of finance examples are given in Table 1. The data of Jaggia and Thosar (1993) on the number of takeover bids received by a target firm after an initial bid illustrate the preponderance of small counts in a typical application of the Poisson model. The data of Greene (1994) on the number of major derogatory reports in the credit history of individual credit card applicants illustrate overdispersion, i.e. the sample variance is considerably greater than the sample mean, compared to the Poisson, which imposes equality of the population mean and variance, and excess zeros, since the observed proportion of zero counts of .804 is considerably greater
Table 1
Frequencies for some count variables

Author            Jaggia-Thosar    Greene            Guillen     Davutyan
Count variable    Takeover bids    Derogatory        Credit      Bank
                  after first      credit reports    defaults    failures
Sample size       126              1319              4691        40
Mean              1.738            0.456             1.581       6.343
Variance          2.051            1.810             10.018      11.820
Counts:
 0                9                1060              3002        0
 1                63               137               502         0
 2                31               50                187         2
 3                12               24                138         7
 4                6                17                233         4
 5                1                11                160         4
 6                2                5                 107         4
 7                1                6                 80          1
 8                0                0                 59          3
 9                0                2                 53          5
 10               1                1                 41          3
 11               0                4                 28          0
 12               0                1                 34          0
 13               0                0                 10          0
 14               0                1                 13          1
 15               0                0                 11          0
 16               0                0                 4           0
 >= 17            0                0                 28 a        5 b

a The large counts are 17 (5 times), 18 (8), 19 (6), 20 (3), 22 (1), 24 (1), 28 (1), 29 (1), 30 (1), 34 (1).
b The large counts are 17 (1), 42 (1), 48 (1), 79 (1), 120 (1), 138 (1).

than the predicted probability of e^{-0.456} = 0.633. The negative binomial distribution, defined below, can potentially accommodate this overdispersion. In fact, the negative binomial with mean 0.456 and variance 1.810 gives a predicted probability of zero counts of 0.809. A related example is the data of Dionne, Artis and Guillen (1996), who modeled the number of unpaid installments by creditors of a bank. The data of Davutyan (1989) on the annual number of bank failures has the added complication of being a time series. The data may be serially correlated, as the five largest counts are the last five observations in the latter part of the sample period.

In econometric applications with count data, analysis focuses on the role of regressors X, introduced by specifying λ = exp(X'β), where the parameter vector β may be estimated by maximum likelihood. For example, the mean number of takeover bids for a firm may be related to the size of the firm.

There are important connections between count regressions and duration (or waiting time) models. These connections can be understood by studying the underlying stochastic process for the waiting time between events, which involves
the three concepts of states, spells and events. A state is a classification of an individual or a financial entity at a point in time; a spell is defined by the state, the time of entry and the time of exit; and an event is simply the instantaneous transition from one state to another state. A regression model for durations involves the relationship between the (nonnegative) length of the spell spent in a particular state and a set of covariates. Duration models are often recast as models of the hazard rate, which is the instantaneous rate of transition from one state to another. A count regression involves the relationship between the number of events of interest in a fixed time interval and a set of covariates. Which approach is adopted in empirical work will depend not only on the research objectives but also on the form in which the data are available. Econometric models of durations or transitions provide an appropriate framework for modelling the duration in a given financial state; count data models provide a framework for modelling the frequency of the event per unit time period. This article differs from many treatments in emphasizing the connections between the count regression and the underlying process, and the associated links with duration analysis.

To fix concepts, consider the event of mortgage prepayment, which involves exit from the state of holding a mortgage and termination of the associated spell. If the available data provide sample information on the complete or incomplete life of individual mortgages, for those that were either initiated or terminated at some date, together with data on the characteristics of the mortgage holders and mortgage contracts, a duration regression is a natural method of analyzing the role of covariates.¹ Now, it is often the case that data may not be available on individual duration intervals, but may be available on the frequency of a repeated event per some unit of time, e.g. the number of mortgages that were prepaid within some calendar time period. Such aggregated data, together with information on covariates, may form the basis of a count data regression. Yet another data situation, which we do not pursue, is that in which one has sample information on a binary outcome, viz., whether or not a mortgage was terminated within some time interval. A binary regression such as logit or probit is the natural method for analyzing such data.

Further examples of duration models are: the duration between the initiation of a hostile bid for the takeover of a firm and the resolution of the contest for corporate control; the time spent in bankruptcy protection; the time to bank failure; the time interval to the dissolution of a publicly traded fund; and the time interval to the first default on repayment of a loan. Several examples of count data models in the empirical finance literature have already been given. We reiterate that for each example it is easy to conceive of the data arising in the form of durations or counts.

¹ A spell may be in progress (incomplete) at the time of sampling. Inclusion of such censored observations in regression analysis is a key feature of duration models.

In Section 2 we exposit the relation between econometric models of durations and of counts. A self-contained discussion of regression techniques for count data is given in Section 3, in the context of financial applications. Concluding remarks are made in Section 4.
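Before turning to the underlying stochastic processes, here is a quick check of the zero-count calculations quoted above for the Greene (1994) credit data, computing the Poisson and negative binomial probabilities of a zero count from the sample mean and variance. The sketch assumes the NB2 parameterization with variance μ + αμ², which is one common convention; the text does not state which parameterization is used.

```python
from math import exp, lgamma, log

mu, var = 0.456, 1.810
print(exp(-mu))                        # Poisson P(0), about 0.633

alpha = (var - mu) / mu ** 2           # implied overdispersion parameter (NB2 assumption)
def nb2_pmf(y, mu, alpha):
    r = 1.0 / alpha                    # negative binomial "size" parameter
    p = r / (r + mu)
    return exp(lgamma(y + r) - lgamma(r) - lgamma(y + 1)
               + r * log(p) + y * log(1 - p))
print(nb2_pmf(0, mu, alpha))           # about 0.809, matching the figure quoted above
```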
2. Stochastic process models for count and duration data

Fundamentally, models of durations and models of counts are duals of each other. This duality relationship is most transparent when the underlying data generating process obeys the strict assumptions of a stationary (memoryless) Poisson process. In this case it is readily shown that the frequency of events follows the Poisson distribution and the duration of spells follows the exponential distribution. For example, if takeover bids for firms follow a Poisson process, then the number of bids for a firm in a given interval of time is Poisson distributed, while the elapsed time between bids is exponentially distributed. In this special case econometric models of durations and counts are equivalent as far as the measurement of the effect of covariates (exogenous variables) is concerned. Stationarity is a strong assumption. Often the underlying renewal process exhibits dependence or memory. The length of time spent in a state, e.g. the time since the last takeover bid, may affect the chances of leaving that state; or the frequency of the future occurrences of an event may depend upon the past frequency of the same event. In such cases, the information content of duration and count models may differ considerably. However, it can be shown that either type of model can provide useful information about the role of covariates on the event of interest. The main focus in the remainder of the paper is on count data models.
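The duality is easy to verify by simulation. The sketch below generates a stationary Poisson process of rate λ from i.i.d. exponential durations and checks that the counts per unit interval have mean and variance close to λ; the rate and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n_events = 2.0, 200_000
gaps = rng.exponential(1.0 / lam, size=n_events)   # i.i.d. exponential durations
arrival = np.cumsum(gaps)                          # event times of the Poisson process
counts = np.bincount(arrival.astype(int))[:-1]     # events per unit interval (drop partial last)
print(counts.mean(), counts.var())                 # both approximately lam
print(gaps.mean())                                 # approximately 1/lam
```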

2.1. Preliminaries
We observe data over an interval of length t. For nonstationary processes behavior may also depend on the starting point of the interval, denoted s. The random variables (r.v.'s) of particular interest are N(s, s + t), which denotes the number of events occurring in (s, s + t], and T(s), which denotes the duration of time to the occurrence of the next event given that an event occurred at time s. The distribution of the number of events is usually represented by the probability density function

Pr{N(s, s + t) = r} ,   r = 0, 1, 2, …

The distribution of the durations is represented in several ways, including

F_{T(s)}(t) = Pr{T(s) ≤ t}
S_{T(s)}(t) = Pr{T(s) ≥ t}
f_{T(s)}(t) = lim_{dt→0} Pr{t ≤ T(s) < t + dt} / dt
h_{T(s)}(t) = lim_{dt→0} Pr{t ≤ T(s) < t + dt | T(s) ≥ t} / dt
H_{T(s)}(t) = ∫_s^{s+t} h_{T(s)}(u) du

where the functions F, S, f, h and H are called, respectively, the cumulative distribution function, survivor function, density function, hazard function and integrated hazard function. For duration r.v.'s the distribution is often specified in terms of the survivor and hazard functions, rather than the more customary c.d.f. or density function, as they have a more natural physical interpretation. In particular, the hazard function gives the instantaneous rate (or probability in the discrete case) of transition from one state to another given that it has not occurred to date, and is related to the density, distribution and survivor functions by

h_{T(s)}(t) = f_{T(s)}(t) / (1 − F_{T(s)}(t)) = f_{T(s)}(t) / S_{T(s)}(t) .

As an example, consider the length of time spent by firms under bankruptcy protection. Of interest is how the hazard varies with time and with firm characteristics. If the hazard function is decreasing in t, then the probability of leaving bankruptcy decreases the longer the firm is in bankruptcy protection, while if the hazard function increases with the interest burden of the firm, then firms with a higher interest burden are more likely to leave bankruptcy than are firms with a low interest burden. Modeling of the hazard function should take into account the origin state and the destination state. Two-state models are the most common, but multi-state models may be empirically appropriate in some cases. For example, a firm currently under bankruptcy protection may subsequently either be liquidated or resume its original operations; these possibilities call for a three-state model.
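A numerical illustration of the relation h(t) = f(t)/S(t) is given below for a Weibull duration with integrated hazard λt^γ (the functional form used in Section 2.3); the parameter values are illustrative only.

```python
import numpy as np

# Check h(t) = f(t) / S(t) for a Weibull duration with hazard
# lam * gam * t**(gam - 1); lam and gam are illustrative assumptions.
lam, gam = 0.5, 1.3
t = np.linspace(0.1, 5.0, 50)
S = np.exp(-lam * t ** gam)                 # survivor function
f = lam * gam * t ** (gam - 1) * S          # density = hazard * survivor
h = f / S                                   # hazard recovered from the relation above
print(np.allclose(h, lam * gam * t ** (gam - 1)))   # True
```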

2.2. Poisson process


Define the constant λ to be the rate of occurrence of the event. A (pure) Poisson process of rate λ occurs if events occur independently with probability equal to λ times the length of the interval. Formally, as t → 0,

Pr{N(s, s + t) = 0} = 1 − λt + o(t)
Pr{N(s, s + t) = 1} = λt + o(t) ,

and N(s, s + t) is statistically independent of the number and position of events in (0, s]. Note that in the limit the probability of 2 or more events occurring is zero, while 0 and 1 events occur with probabilities of, respectively, (1 − λt) and λt. For this process it can be shown that the number of events occurring in the interval (s, s + t], for nonlimit t, is Poisson distributed with mean λt and probability

Pr{N(s, s + t) = r} = e^{−λt} (λt)^r / r! ,   r = 0, 1, 2, …


f r(s) (t) = 2e -~t


The corresponding hazard rate h_{T(s)}(t) = λ is constant and does not depend on the time since the last occurrence of the event, exhibiting the so-called memoryless property of the Poisson process. Note also that the distributions of both the counts and the durations are independent of the starting time s. Set s = 0 and consider a time interval of unit length. Then N, the number of events in this interval, has mean

E[N] = λ ,

while the mean of T, the duration between events, is given by

E[T] = 1/λ .

Intuitively, a high frequency of events per period implies a short average interevent duration. The conditional mean function for a regression model is obtained by parameterizing λ in terms of covariates X, e.g. λ = exp(X'β). Estimation can be by maximum likelihood, or by (nonlinear) regression which for more efficient estimation uses Var(N) = λ or Var(T) = (1/λ)² for a Poisson process. The Poisson process may not always be the appropriate model for data. For example, the probability of one occurrence may increase the likelihood of further occurrences. Then a Poisson distribution may overpredict the number of zeros, underpredict the number of nonzero counts, and have variance in excess of the mean.
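A minimal sketch of Poisson regression with conditional mean λ_i = exp(x_i'β), fitted by Newton-Raphson on the log-likelihood, is given below. The simulated data, starting values and number of iterations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([0.2, 0.7])
y = rng.poisson(np.exp(X @ beta_true))     # simulated counts with mean exp(x'beta)

beta = np.zeros(2)
for _ in range(25):
    lam = np.exp(X @ beta)
    grad = X.T @ (y - lam)                 # score of the Poisson log-likelihood
    hess = -(X * lam[:, None]).T @ X       # Hessian: -X' diag(lam) X
    beta -= np.linalg.solve(hess, grad)    # Newton step
print(beta)                                # close to beta_true
```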

2.3. Time-dependent Poisson process


The time-dependent Poisson process, also called the non-homogeneous or nonstationary Poisson process, is a nonstationary point process which generalizes the (pure) Poisson process by specifying the rate of occurrence to depend upon the elapsed time since the start of the process, i.e. we replace λ by λ(s + t).² The counts N(s, s + t) are then distributed as Poisson with mean Λ(s, s + t), where

Λ(s, s + t) = ∫_s^{s+t} λ(u) du .

The durations T(s) are distributed with survivor and density functions

S_{T(s)}(t) = exp(-Λ(s, s + t))

f_{T(s)}(t) = λ(s + t) exp(-Λ(s, s + t)) .

² The process begins at time 0, while the observed time interval starts at time s.


Hence h_{T(s)}(t) = λ(s + t), so that λ(·) is the hazard function. Also H_{T(s)}(t) = Λ(s, s + t), so that Λ(·) is the integrated hazard function. One convenient choice of functional form is the Weibull, λ(s + t) = λγ(s + t)^{γ-1}, in which case Λ(s, s + t) = λ(s + t)^γ - λs^γ. In this case, the time-dependent component of λ(·) enters multiplicatively with exponent γ - 1. The parameter γ indicates duration dependence; γ > 1 indicates positive duration dependence, which means the probability that the spell in the current state will terminate increases with the length of the spell. Negative duration dependence is indicated by γ < 1. The mean number of events in (s, s + t] also depends on s, increasing or decreasing in s as γ > 1 or γ < 1. This process is therefore nonstationary. The case γ = 1 gives the pure Poisson process, in which case the Weibull reduces to the exponential. The standard parametric model for econometric analysis of durations is the Weibull. Regression models are formed by specifying λ to depend on regressors, e.g. λ = exp(X'β), while γ does not. This is an example of the proportional hazards or proportional intensity factorization:

λ(t, X, γ, β) = λ_0(t, γ) g(X, β) ,    (2.1)

where λ_0(t, γ) is a baseline hazard function, and the only role of regressors is as a scale factor for this baseline hazard. This factorization simplifies interpretation, as the conditional probability of leaving the state for an observation with X = X_1 is g(X_1, β)/g(X_2, β) times that when X = X_2. Estimation is also simpler, as the role of regressors can be separated from the way in which the hazard function changes with time. For single-spell duration data this is the basis of the partial likelihood estimator of Cox (1972a). When the durations of multiple spells are observed this leads to estimation methods where most information comes from the counts, see Lawless (1987). Similar methods can be applied to grouped count data. For example, Schwartz and Torous (1993) model the number of active mortgages that are terminated in a given interval of time.
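A short sketch of the Weibull case may help fix ideas. The parameter values below are arbitrary, and the durations are drawn by inverting the survivor function S(t) = exp(-λt^γ) for a process starting at s = 0.

```python
# Sketch: Weibull hazard lam*gam*t**(gam-1) and inverse-transform simulation
# of durations when the process starts at s = 0.
import numpy as np
from scipy.special import gamma as gamma_fn

rng = np.random.default_rng(1)
lam, gam = 1.5, 0.7               # gam < 1: negative duration dependence

def hazard(t):
    return lam * gam * t ** (gam - 1.0)

# S(t) = exp(-lam * t**gam), so T = (-log(U)/lam)**(1/gam) for U ~ Uniform(0,1)
u = rng.uniform(size=100_000)
t = (-np.log(u) / lam) ** (1.0 / gam)

print(hazard(0.5), hazard(2.0))        # hazard falls with t since gam < 1
print(t.mean(), lam ** (-1.0 / gam) * gamma_fn(1 + 1 / gam))  # simulated vs. theoretical mean
```
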

2.4. Renewal process

A renewal process is a stationary point process for which the durations between occurrences of events are independently and identically distributed (i.i.d.). The (pure) Poisson process is a renewal process, but the time-dependent process is not since it is not stationary. For a renewal process f_{T(s)}(t) = f_{T(s')}(t) for all s, s', and it is convenient to drop the dependence on s. We define N_t as the number of events (renewals) occurring in (0, t), which in earlier notation would be N(0, t) and will have the same distribution as N(s, s + t). Also define T_r as the time up to the r-th renewal.

Then


Pr{N_t = r} = Pr{N_t < r + 1} - Pr{N_t < r} = Pr{T_{r+1} > t} - Pr{T_r > t} = F_r(t) - F_{r+1}(t) ,

where F_r is the cumulative distribution function of T_r. The second line of the last equation array suggests an attractive approach to the derivation of parametric distributions for N_t based on (or dual to) specified distributions for durations. For example, one may want a count distribution that is dual to the Weibull distribution since the latter can potentially accommodate certain types of time dependence.³ Unfortunately, the approach is often not practically feasible. Specifically, T_r is the sum of r i.i.d. duration times whose distribution is most easily found using the (inverse) Laplace transform, a modification for nonnegative r.v.'s of the moment generating function.⁴ Analytical results are most easily found when the Laplace transform is simple and exists in a closed form. When the durations are i.i.d. exponentially distributed, N_t is Poisson distributed as expected. Analytical results can also be obtained when durations are i.i.d. Erlangian distributed, where the Erlangian distribution is a special case of the 2-parameter gamma distribution that arises when the first parameter is restricted to being a positive integer; see Feller (1966), Winkelmann (1995). For many standard duration time distributions, such as the Weibull, analytical expressions for the distribution of T_r and hence N_t do not exist. In principle a numerical approach could be used, but currently there are no studies along these lines. Some useful asymptotic results are available. If the i.i.d. durations between events have mean μ and variance σ², then the r.v.

Z = (N_t - t/μ) / (σ²t/μ³)^{1/2}  ~  N(0, 1)  asymptotically.
The expected number of renewals E[N_t], called the renewal function, satisfies E[N_t] = t/μ + O(1) as t → ∞, so that a halving of the duration times will approximately double the mean number of renewals. Thus if a renewal process is observed for a long period of time, analysis of count data will be quite informative about the mean duration time. For a Poisson process the relationship is exact.

³ The rate of occurrence for a renewal Weibull process is determined by the time since the previous event, when it is "renewed". For a time-dependent Weibull process it is instead determined by the time since the start of the process.

⁴ If F(t) is the distribution function of a random variable T, T > 0, then the Laplace transform of F is L(s) = ∫_0^∞ e^{-st} dF(t) = E[e^{-sT}]. If T = t_1 + t_2 + ... + t_n, then the Laplace transform of T is L(s) = ∏_{i=1}^n L_i(s). Laplace transforms have a property of uniqueness in the sense that to any transform there corresponds a unique probability distribution.


Parametric analysis of a renewal process begins with the specification of the distribution of the i.i.d. durations. Analysis is therefore straightforward if data on the duration lengths are available. Most econometric analysis of renewal processes focuses on the implications when spells are incomplete or censored. The observed data may be the backward recurrence time, i.e. the length of time from the last renewal to a fixed time point t, or the forward recurrence time, i.e. the time from t to the next renewal, but not the duration of the completed spell, which is the sum of the backward and forward recurrence times; see Lancaster (1990, p. 94).
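The long-run relationship E[N_t] ≈ t/μ is again easy to check by simulation; the sketch below (with an arbitrarily chosen gamma duration distribution) counts renewals up to a fixed time t over many replications.

```python
# Sketch: a renewal process with i.i.d. gamma durations, chosen only for
# illustration; for large t the renewal function satisfies E[N_t] ~ t/mu.
import numpy as np

rng = np.random.default_rng(2)
shape, scale = 2.0, 0.5            # gamma durations with mean mu = shape*scale
mu = shape * scale
t, n_rep = 50.0, 2_000

counts = np.empty(n_rep)
for r in range(n_rep):
    d = rng.gamma(shape, scale, size=int(5 * t / mu))  # more than enough draws
    counts[r] = np.searchsorted(np.cumsum(d), t)       # renewals before time t

print("simulated E[N_t]:", counts.mean(), "   t/mu:", t / mu)
```
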
2.5. Other stochastic processes

There are many other stochastic processes that could potentially be applied to financial data. A standard reference for stochastic processes is Karlin and Taylor (1975). Like many such references it does not consider estimation of statistical models arising from this theory. A number of monographs by Cox do emphasize statistical applications, including Cox and Lewis (1966) and Cox (1962). The standard results for the Poisson process are derived in Lancaster (1990, pp. 86-87). Some basic stochastic process theory is presented in Lancaster (1990, Chapter 5), where renewal theory and its implications for duration analysis are emphasized, and in Winkelmann (1994, Chapter 2). Markov chains are a subclass of stochastic processes that are especially useful for modelling count data. A Markov chain is a Markov process, i.e. one whose future behavior given complete knowledge of the current state is unaltered by additional knowledge of past behavior, that takes only a finite or denumerable range of values, and can be characterized by the transition probabilities from one state (discrete value) to another. If these discrete values are non-negative integers, or can be rescaled to non-negative integer values, the Markov chain describes a probabilistic model for counts. This opens up a wide range of models for counts, as many stochastic processes are Markov chains. One example, a branching process, is considered in Section 3.6.

3. Econometric models of counts

The Poisson regression is the common starting point for count data analysis, and is well motivated by assuming a Poisson process. Data frequently exhibit important "non-Poisson" features, however, including:
1. Overdispersion: the conditional variance exceeds the conditional mean, whereas the Poisson distribution imposes equality of the two.
2. Excess zeros: a higher frequency of zeros (or some other integer count) than that predicted by the Poisson distribution with a given mean.
3. Truncation from the left: small counts (particularly zeros) are excluded.
4. Censoring from the right: counts larger than some specified integer are grouped.


The use of Poisson regression in the presence of any of these features leads to a loss of efficiency (and sometimes consistency), incorrect reported standard errors, and a poor fit. These considerations motivate the use of distributions other than the Poisson. These models for count data are usually specified with little consideration of the underlying stochastic process. For convenient reference, Table 2 gives some commonly used distributions and their moment properties. Each sub-section considers a class of models for count data, presented before consideration of applications and the stochastic data generating process. Table 3 provides a summary of applications from the finance literature and the models used, in the order discussed in the text.

3.1. Preliminaries

Typical data for applied work consist of n observations, the i-th of which is (y_i, X_i), i = 1, ..., n, where the scalar dependent variable y_i is the number of

Table 2
Standard parametric count distributions and their moments

Family              Density                                                      Count           Mean; Variance
Poisson             f(y) = exp(-λ) λ^y / y!                                      y = 0, 1, ...   λ; λ
Negative Binomial   f(y) = [Γ(y+ν)/(Γ(ν)Γ(y+1))] (ν/(λ+ν))^ν (λ/(λ+ν))^y         y = 0, 1, ...   λ; λ + ν^{-1}λ²
Positive Counts     f(y | y > 0) = f(y)/(1 - f(0))                               y = 1, 2, ...   Vary with f
Hurdle              f(y) = f_1(0)                                                y = 0           Vary with f_1, f_2
                    f(y) = [(1 - f_1(0))/(1 - f_2(0))] f_2(y)                    y = 1, 2, ...
With Zeros          f(y) = f_1(0) + (1 - f_1(0)) f_2(0)                          y = 0           Vary with f_1, f_2
                    f(y) = (1 - f_1(0)) f_2(y)                                   y = 1, 2, ...

Table 3
Finance applications

Example                      Dependent Variable                 Model
1. Jaggia and Thosar         Bids received by target firm       Poisson
2. Davutyan                  Bank failures per year             Poisson
3. Dionne and Vanasse        Accidents per person               Negative Binomial
4. Dean et al.               Accident claims                    Poisson-Inverse Gaussian
5. Dionne et al.             Unpaid instalments                 Truncated Negative Binomial
6. Greene                    Derogatory credit reports          With Zeros Negative Binomial
7. Bandopadhyaya             Time in bankruptcy protection      Censored Weibull
8. Jaggia and Thosar         Time to tender offer accepted      Censored Weibull-gamma
9. Green and Shoven          Mortgage prepayments               Proportional hazards
10. Schwartz and Torous      Mortgage prepayment or default     Grouped proportional hazards
11. Hausman et al.           Stock price change                 Ordered probit
12. Epps                     Normalized stock price change      Poisson compound-events


occurrences of the event of interest, and X_i is the k × 1 vector of covariates that are thought to determine y_i. Except where noted we assume independence across observations. Econometric models for the counts y_i are nonlinear in parameters. Maximum likelihood (ML) estimation has been especially popular, even though closely related methods of estimation based on the first two moments of the data distribution can also be used. Interest focuses on how the mean number of events changes due to changes in one or more of the regressors. The most common specification for the conditional mean is

E[y_i | X_i] = exp(X_i'β) ,    (3.1)

where β is a k × 1 vector of unknown parameters. This specification ensures the conditional mean is nonnegative and, using ∂E[y_i|X_i]/∂X_ij = exp(X_i'β)β_j, strictly monotonic increasing (or decreasing) in X_ij according to the sign of β_j. Furthermore, the parameters can be directly interpreted as semi-elasticities, with β_j giving the proportionate change in the conditional mean when X_ij changes by one unit. Finally, if one regression coefficient is twice as large as another, then the effect of a one-unit change of the associated regressor is double that of the other. Throughout we give results for this particular specification of the mean. As an example, let y_i be the number of bids after the initial bid received by the i-th takeover target firm and S_i denote firm size, measured by book value of total assets of the firm in billions of dollars. Then Poisson regression of y_i on S_i using the same sample as Jaggia and Thosar (1993) yields a conditional mean E[y_i|S_i] = exp(0.499 + 0.037 S_i), so that a one billion dollar increase in total assets leads to a 3.7 percent increase in the number of bids. Sometimes regressors enter logarithmically in (3.1). For example, we may have

E[y_i | X_i] = exp(β_1 log(X_{1i}) + X_{2i}'β_2) = X_{1i}^{β_1} exp(X_{2i}'β_2) ,    (3.2)

in which case β_1 is an elasticity. This formulation is particularly appropriate when X_{1i} is a measure of exposure, such as the number of miles driven if modelling the number of automobile accidents, in which case we expect β_1 to be close to unity.
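A brief sketch of this specification in practice (data and coefficient values are simulated, not from any of the studies cited): the slope in a log-link Poisson regression is a semi-elasticity, and an exposure variable can be entered with its coefficient constrained to unity via an offset, as in (3.2).

```python
# Sketch: Poisson regression with conditional mean exposure*exp(X'beta),
# estimated with the exposure entered as an offset (hypothetical data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5_000
size = rng.exponential(scale=2.0, size=n)       # e.g. firm size in billions
exposure = rng.uniform(0.5, 5.0, size=n)        # e.g. insured automobile-years
X = sm.add_constant(size)
y = rng.poisson(exposure * np.exp(0.5 + 0.04 * size))

res = sm.GLM(y, X, family=sm.families.Poisson(),
             offset=np.log(exposure)).fit()
print(res.params)   # slope is a semi-elasticity: about 4% more events per unit of size
```
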

3.2. Poisson, negative binomial and inverse-Gaussian models

3.2.1. Maximum likelihood estimation

The Poisson regression model assumes that y_i given X_i is Poisson distributed with density

f(y_i | X_i) = e^{-λ_i} λ_i^{y_i} / y_i! ,    y_i = 0, 1, 2, . . .    (3.3)

and mean parameter λ_i = exp(X_i'β) as in (3.1). Given independent observations, the log-likelihood is

log L = Σ_{i=1}^n { y_i X_i'β - exp(X_i'β) - log y_i! } .    (3.4)

Estimation is straightforward. The log-likelihood function is globally concave, many statistical packages have built-in Poisson ML procedures, or the Newton-Raphson algorithm can be implemented by iteratively reweighted OLS. The first-order conditions are

Σ_{i=1}^n (y_i - exp(X_i'β)) X_i = 0 ,

or that the unweighted residual (y_i - exp(X_i'β̂)) is orthogonal to the regressors. Applying the usual ML theory yields β̂ asymptotically normal with mean β and

Var(β̂) = ( Σ_{i=1}^n exp(X_i'β) X_i X_i' )^{-1} ,    (3.5)

using E[∂² log L / ∂β ∂β'] = -Σ_{i=1}^n exp(X_i'β) X_i X_i'. The Poisson distribution imposes equality of the variance and mean. In fact observed data are often overdispersed, i.e. the variance exceeds the mean. Then the Poisson MLE is still consistent if the mean is correctly specified, i.e. (3.1) holds, but it is inefficient and the reported standard errors are incorrect.⁵ More efficient parameter estimates can be obtained by ML estimation for a specified density less restrictive than the Poisson. The standard two-parameter distribution for count data that can accommodate overdispersion is the negative binomial, with mean λ_i, variance λ_i + αλ_i², and density
f(y_i | X_i) = [Γ(y_i + α^{-1}) / (Γ(y_i + 1) Γ(α^{-1}))] (α^{-1}/(λ_i + α^{-1}))^{α^{-1}} (λ_i/(λ_i + α^{-1}))^{y_i} ,    y_i = 0, 1, 2, . . .    (3.6)

The log-likelihood for mean parameter λ_i = exp(X_i'β) as in (3.1) equals

log L = Σ_{i=1}^n { log[Γ(y_i + α^{-1}) / (Γ(y_i + 1) Γ(α^{-1}))] - (y_i + α^{-1}) log(1 + α exp(X_i'β)) + y_i log α + y_i X_i'β } .    (3.7)

There are alternative parameterizations of the negative binomial, with different variance functions. The one above is called the Negbin 2 model by Cameron and Trivedi (1986), and is computed for example by LIMDEP. It nests as a special case the geometric, which sets α = 1. An alternative model, called Negbin 1, has
5 This is entirely analogous to the consequences of estimating the linear regression model by MLE under the assumption of normality and homoskedastic error, when in fact the error is non-normal and heteroskedastic but still has mean zero so that the conditional mean is correctly specified.


variance (1 + α)λ_i which is linear rather than quadratic in the mean. This Negbin 1 model is seldom used and is not formally presented here. For both models estimation is by maximum likelihood, with (α̂, β̂) asymptotically normal with variance matrix the inverse of the information matrix. Both models reduce to the Poisson in the special case where the overdispersion parameter α equals zero. One motivation for the negative binomial model is to suppose that y_i is Poisson with parameter λ_i υ_i rather than λ_i, where υ_i is unobserved individual heterogeneity. If the distribution of υ_i is i.i.d. gamma with mean 1 and variance α, then while y_i conditional on λ_i and υ_i is Poisson, conditional on λ_i alone it is negative binomial with mean λ_i and variance λ_i + αλ_i² (i.e. Negbin 2). This unobserved heterogeneity derivation of the negative binomial assumes that the underlying stochastic process is a Poisson process. An alternative derivation of the negative binomial assumes a particular form of nonstationarity for the underlying stochastic process, with occurrence of an event increasing the probability of further occurrences. Cross section data on counts are insufficient on their own to discriminate between the two. Clearly a wide range of models, called mixture models, can be generated by specifying different distributions of υ_i. One such model is the Poisson-inverse Gaussian model of Dean et al. (1989), which assumes υ_i has an inverse Gaussian distribution. This leads to a distribution with heavier tails than the negative binomial. Little empirical evidence has been provided to suggest that such alternative mixture models are superior to the negative binomial. Mixture models cannot model underdispersion (variance less than mean), but this is not too restrictive as most data are overdispersed. Parametric models for underdispersed data include the Katz system, see King (1989), and the generalized Poisson, see Consul and Famoye (1992). When data are in the form of counts a sound practice is to estimate both Poisson and negative binomial models. The Poisson is the special case of the negative binomial with α = 0. This can be tested by a likelihood ratio test, with -2 times the difference in the fitted log-likelihoods of the two models distributed as χ²(1) under the null hypothesis of no overdispersion. Alternatively a Wald test can be performed, using the reported "t-statistic" for the estimated α in the negative binomial model, which is asymptotically normal under the null hypothesis of no overdispersion. A third method, particularly attractive if a package program for negative binomial regression is unavailable, is to estimate the Poisson model, construct λ̂_i = exp(X_i'β̂), and perform the auxiliary OLS regression (without constant)

{(y_i - λ̂_i)² - y_i} / λ̂_i = α λ̂_i + u_i .    (3.8)

The reported t-statistic for α̂ is asymptotically normal under the null hypothesis of no overdispersion against the alternative of overdispersion of the Negbin 2 form. This last test coincides with the score or LM test for Poisson against negative binomial, but is more general as its motivation is one based on using only the specified mean and variance. It is valid against any alternative distribution with overdispersion of the Negbin 2 form, and it can also be used for testing under-


dispersion; see Cameron and Trivedi (1990). To test overdispersion of the Negbin 1 form, replace (3.8) with
{(y_i - λ̂_i)² - y_i} / λ̂_i = α + u_i .    (3.9)
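The workflow recommended here, fitting the Poisson, testing for overdispersion with the auxiliary regression (3.8), and fitting the Negbin 2 model if equidispersion is rejected, is easy to carry out with standard software. The sketch below uses simulated data (all numbers are arbitrary), so it illustrates the mechanics rather than any of the applications in Table 3.

```python
# Sketch: Poisson and Negbin 2 ML fits plus the auxiliary overdispersion
# regression (3.8), on simulated overdispersed data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(size=n)
X = sm.add_constant(x)
lam = np.exp(0.3 + 0.6 * x)
v = rng.gamma(shape=2.0, scale=0.5, size=n)   # heterogeneity: mean 1, variance 0.5
y = rng.poisson(lam * v)                      # Negbin 2-type overdispersion

pois = sm.Poisson(y, X).fit(disp=0)
nb2 = sm.NegativeBinomial(y, X).fit(disp=0)   # Negbin 2 parameterization
print(pois.params, nb2.params)

# Auxiliary OLS regression (3.8), no constant; a significant alpha-hat
# signals overdispersion of the Negbin 2 form.
lam_hat = pois.predict()
lhs = ((y - lam_hat) ** 2 - y) / lam_hat
aux = sm.OLS(lhs, lam_hat).fit()
print("alpha-hat:", aux.params[0], "  t-statistic:", aux.tvalues[0])
```
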

3.2.2. Estimation based on first moment

To date we have considered fully parametric approaches. An alternative is to use regression methods that use information on the first moment, or the first and second moments, following Gouriéroux, Monfort and Trognon (1984), Cameron and Trivedi (1986) and McCullagh and Nelder (1989). The simplest approach is to assume that (3.1) holds, estimate β by the inefficient but nonetheless consistent Poisson MLE, denoted β̂, and calculate correct standard errors. This is particularly easy if it is assumed that the variance is a multiple τ of the mean,

Var(y_i | X_i) = τ exp(X_i'β) ,    (3.10)

which is overdispersion of the Negbin 1 form. Then for the Poisson MLE

Var(β̂) = τ ( Σ_{i=1}^n exp(X_i'β) X_i X_i' )^{-1} ,    (3.11)

so that correct standard errors (or t-statistics) can be obtained from those reported by a standard Poisson package by multiplying (or dividing) by √τ̂, where

τ̂ = (1/(n - k)) Σ_{i=1}^n (y_i - exp(X_i'β̂))² / exp(X_i'β̂) .    (3.12)

This can often be directly calculated from computer output, as it is simply the Pearson statistic (3.19) divided by the degrees of freedom. If τ̂ = 4, for example, the reported t-statistics need to be deflated by a factor of two. If instead the variance is quadratic in the mean, i.e.

Var(y_i | X_i) = exp(X_i'β) + α (exp(X_i'β))² ,    (3.13)

use the sandwich form

Var(β̂) = ( Σ_{i=1}^n exp(X_i'β) X_i X_i' )^{-1} ( Σ_{i=1}^n {exp(X_i'β) + α(exp(X_i'β))²} X_i X_i' ) ( Σ_{i=1}^n exp(X_i'β) X_i X_i' )^{-1} ,    (3.14)

evaluated at a consistent estimate of α such as

α̂ = [ Σ_{i=1}^n (exp(X_i'β̂))² {(y_i - exp(X_i'β̂))² - exp(X_i'β̂)} ] / [ Σ_{i=1}^n (exp(X_i'β̂))^4 ] .    (3.15)

Finally, a less restrictive approach is to use the Eicker-White robust estimator

Var(β̂) = ( Σ_{i=1}^n exp(X_i'β) X_i X_i' )^{-1} ( Σ_{i=1}^n (y_i - exp(X_i'β))² X_i X_i' ) ( Σ_{i=1}^n exp(X_i'β) X_i X_i' )^{-1} ,    (3.16)

which does not assume a particular model for the conditional variance. Failure to make such corrections when data are overdispersed leads to overstatement of the statistical significance of regressors.
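As a sketch of these corrections in practice (reusing the simulated y and X from the previous code block), the sandwich variance (3.16) and the Negbin 1-type scaling by τ̂ in (3.11)-(3.12) can be computed directly from the Poisson fit.

```python
# Sketch: robust (3.16) and scaled (3.11)-(3.12) standard errors for the
# Poisson MLE; y and X are the simulated data from the previous sketch.
import numpy as np
import statsmodels.api as sm

res = sm.Poisson(y, X).fit(disp=0)
lam_hat = res.predict()

A = X.T @ (lam_hat[:, None] * X)                 # sum_i lam_i x_i x_i'
B = X.T @ (((y - lam_hat) ** 2)[:, None] * X)    # sum_i (y_i - lam_i)^2 x_i x_i'
V_robust = np.linalg.inv(A) @ B @ np.linalg.inv(A)

print(res.bse)                        # usual ML standard errors (too small here)
print(np.sqrt(np.diag(V_robust)))     # Eicker-White standard errors, form (3.16)

# Negbin 1-type correction: scale by the square root of tau-hat in (3.12)
tau_hat = np.sum((y - lam_hat) ** 2 / lam_hat) / (len(y) - X.shape[1])
print(np.sqrt(tau_hat) * res.bse)
```
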
3.2.3. Estimation based on first two moments

The previous sub-section used information on the second moment only in calculating the standard errors. Directly using this information in the method of estimation of β can improve efficiency. When the variance is a multiple of the mean, the most efficient estimator using only (3.1) and (3.10) can be shown to equal the Poisson MLE, with correct standard errors calculated using (3.11) and (3.12). When the variance is quadratic in the mean, the most efficient estimator using only (3.1) and (3.13) solves the first-order conditions

Σ_{i=1}^n [ (y_i - exp(X_i'β)) / (exp(X_i'β) + α̂(exp(X_i'β))²) ] exp(X_i'β) X_i = 0 ,    (3.17)

where the estimator α̂ is given in (3.15), and has asymptotic variance

Var(β̂) = ( Σ_{i=1}^n {exp(X_i'β) + α(exp(X_i'β))²}^{-1} (exp(X_i'β))² X_i X_i' )^{-1} .    (3.18)

Such estimators, based on the first two moments, are called quasi-likelihood estimators in the statistics literature and quasi-generalized pseudo-maximum likelihood estimators by Gouriéroux, Monfort and Trognon (1984). Finally, we note that an adaptive semi-parametric estimator which requires specification of only the first moment, but is as efficient as any estimator based on knowledge of the first two moments, is given by Delgado and Kniesner (1996).
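One way to obtain an estimator solving (3.17) with standard software is as a generalized linear model with log link and Negbin 2 variance function, holding α fixed at a first-stage estimate; the following sketch (an assumption about implementation, not the chapter's own code) reuses y, X and the auxiliary regression output aux from the earlier sketch.

```python
# Sketch: quasi-generalized PML solving (3.17) via a GLM with log link and
# variance mu + alpha*mu^2, with alpha fixed at the estimate from (3.8).
import statsmodels.api as sm

alpha_hat = float(aux.params[0])      # from the auxiliary regression (3.8)
qgpml = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=alpha_hat)).fit()
print(qgpml.params)
print(qgpml.bse)
```
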


3.2.4. Model evaluation

An indication of the likely magnitude of underdispersion and overdispersion can be obtained by comparing the sample mean and variance of the dependent count variable, as subsequent Poisson regression will decrease the conditional variance of the dependent variable somewhat but leave the average of the conditional mean unchanged (the average of the fitted means equals the sample mean as Poisson residuals sum to zero if a constant term is included). If the sample variance is less than the sample mean, the data will be even more underdispersed once regressors are included, while if the sample variance is more than twice the sample mean the data are almost certain to still be overdispersed upon inclusion of regressors. Formal tests for overdispersion and underdispersion, and for discrimination between Poisson and negative binomial, have been given in Section 3.2.1. The choice between negative binomial models with different specification of the variance function, e.g. Negbin 1 and Negbin 2, can be made on the basis of the highest likelihood. The choice between different non-nested mixture models can also be made on the basis of highest likelihood, or using Akaike's information criterion if models have different numbers of parameters. A more substantive choice is whether to use a fully parametric approach, such as negative binomial, or whether to use estimators that use information on only the first and second moments. In theory, fully parametric estimators have the advantage of efficiency but the disadvantage of being less robust to model departures, as even if the mean is correctly specified the MLE for count data models (aside from the Poisson and Negbin 2) will be inconsistent if other aspects of the distribution are misspecified. In practice, studies such as Cameron and Trivedi (1986) and Dean et al. (1989) find little difference between ML estimators and estimators based on weaker assumptions. Such potential differences can be used as the basis for a Hausman test; see, for example, Dionne and Vanasse (1992). And for some analysis, such as predicting count probabilities rather than just the mean, specification of the distribution is necessary.

There are a number of ways to evaluate the performance of the model. A standard procedure is to compare the Pearson statistic

P = Σ_{i=1}^n (y_i - exp(X_i'β̂))² / v(X_i, β̂, α̂) ,    (3.19)

where v(X_i, β, α) = Var(y_i | X_i), to (n - k), the number of degrees of freedom. This is useful for testing the adequacy of the Poisson, where v(X_i, β, α) = exp(X_i'β). But its usefulness for other models is more limited. In particular, if one specifies v(X_i, β, α) = τ exp(X_i'β), and estimates τ by (3.12), then P always equals (n - k). Cameron and Windmeijer (1996) propose various R-squareds for count data models. For the Poisson model their preferred deviance-based R-squared measure is

R²_{DEV,P} = [ Σ_{i=1}^n y_i log(exp(X_i'β̂)/ȳ) ] / [ Σ_{i=1}^n y_i log(y_i/ȳ) ] ,    (3.20)


where y log y = 0 when y = 0. If a package reports the log-likelihood for the fitted model, this can be computed as (l_fit - l_0)/(l_y - l_0), where l_fit is the log-likelihood for the fitted model, l_0 is the log-likelihood in the intercept-only model, and l_y is the log-likelihood for the model with mean equal to the actual value, i.e. l_y = Σ_{i=1}^n [y_i log(y_i) - y_i - log(y_i!)], which is easily calculated separately. This same measure is applicable to estimation of the model with overdispersion of the form (3.10). For ML estimation of the negative binomial with overdispersion of the form (3.13), i.e. Negbin 2, the corresponding R-squared measure is

R²_{DEV,NB2} = 1 - [ Σ_{i=1}^n {y_i log(y_i/λ̂_i) - (y_i + α̂^{-1}) log((y_i + α̂^{-1})/(λ̂_i + α̂^{-1}))} ] / [ Σ_{i=1}^n {y_i log(y_i/ȳ) - (y_i + α̂^{-1}) log((y_i + α̂^{-1})/(ȳ + α̂^{-1}))} ] ,    (3.21)

where λ̂_i = exp(X_i'β̂). A crude diagnostic is to calculate a fitted frequency distribution as the average over observations of the predicted probabilities fitted for each count, and to compare this to the observed frequency distribution. Poor performance on this measure is reason for rejecting a model, though good performance is not necessarily a reason for acceptance. As an extreme example, if only counts 0 and 1 are observed and a logit model with constant term is estimated by ML, it can be shown that the average fitted frequencies exactly equal the observed frequencies.
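Continuing the simulated example from the earlier sketches (y, X and lam_hat as defined there), the Pearson statistic (3.19), the Poisson deviance R-squared (3.20), and the fitted-versus-observed frequency comparison suggested above can all be computed in a few lines.

```python
# Sketch: Pearson statistic, deviance R-squared (3.20) and a fitted-vs-observed
# frequency table for the Poisson fit on the simulated data used earlier.
import numpy as np
from scipy.stats import poisson

pearson = np.sum((y - lam_hat) ** 2 / lam_hat)
print("Pearson statistic:", pearson, "  degrees of freedom:", len(y) - X.shape[1])

ybar = y.mean()
num = np.sum(y * np.log(lam_hat / ybar))
pos = y > 0                                   # convention: y log y = 0 at y = 0
den = np.sum(y[pos] * np.log(y[pos] / ybar))
print("R2_DEV,P:", num / den)

# average predicted probabilities versus observed relative frequencies
for j in range(6):
    print(j, round((y == j).mean(), 4), round(poisson.pmf(j, lam_hat).mean(), 4))
```
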

3.2.5. Some applications to financial data

Examples 1-4 illustrate, respectively, Poisson (twice), negative binomial and mixed Poisson-inverse Gaussian.
EXAMPLE 1. Jaggia and Thosar (1993) model the number of bids received by 126 U.S. firms that were targets of tender offers during the period 1978-1985 and were actually taken over within 52 weeks of the initial offer. The dependent count variable y_i is the number of bids after the initial bid received by the target firm, and takes values given in Table 1. Jaggia and Thosar find that the number of bids increases with defensive actions taken by target firm management (legal defense via lawsuit and invitation of bid by friendly third party), decreases with the bid premium (bid price divided by price 14 working days before bid), initially increases and then decreases in firm size (quadratic in size), and is unaffected by intervention by federal regulators. No overdispersion is found using (3.8).

EXAMPLE 2. Davutyan (1989) estimates a Poisson model for data summarized in Table 1 on the annual number of bank failures in the U.S. over the period 1947 to 1986. This reveals that bank failures decrease with increases in overall bank profitability, corporate profitability, and bank borrowings from the Federal Reserve Bank. No formal test for the Poisson is undertaken. The sample mean and variance of bank failures are, respectively, 6.343 and 11.820, so that moderate overdispersion may still be present after regression, and t-statistics are accordingly somewhat upwardly biased. More problematic is the time series nature of the


data. Davutyan tests for serial correlation by applying the Durbin-Watson test for autocorrelation in the Poisson residuals, but this test is inappropriate when the dependent variable is heteroskedastic. A better test for first-order serial correlation is based on the first-order serial correlation coefficient, r_1, of the standardized residual (y_t - λ̂_t)/√λ̂_t: T r_1² is asymptotically χ²(1) under the null hypothesis of no serial correlation in y_t, where T is the sample size; see Cameron and Trivedi (1993). Time series regression models for count data are in their infancy; see Gurmu and Trivedi (1994) for a brief discussion.

EXAMPLE 3. Dionne and Vanasse (1992) use data on the number of accidents with damage in excess of $250 reported to police during August 1982 - July 1983 by 19013 drivers in Quebec. The frequencies are very low, with sample mean of 0.070. The sample variance of 0.078 is close to the mean, but the Negbin 2 model is preferred to Poisson as the dispersion parameter is statistically significant, and the chi-square goodness-of-fit statistic is much better. The main contribution of this paper is to then use these cross-section negative binomial parameter estimates to derive predicted claims frequencies, and hence insurance premia, from data on different individuals with different characteristics and records. It is assumed that the number of claims (y_{i1}, . . . , y_{iT}) by individual i over time periods 1, . . . , T are independent Poisson with means (λ_{i1}υ_i, . . . , λ_{iT}υ_i), where λ_{it} = exp(X_{it}'β) and υ_i is a time-invariant unobserved component that is gamma distributed with mean 1 and variance α.⁶ Then the optimal predictor at time T + 1 of the number of claims of the i-th individual, given knowledge of past claims and of current and past characteristics (but not the unobserved component υ_i), is exp(X_{i,T+1}'β)[(1/α + T ȳ_i)/(1/α + T λ̄_i)], where ȳ_i = (1/T) Σ_{t=1}^T y_{it} and λ̄_i = (1/T) Σ_{t=1}^T exp(X_{it}'β). This is evaluated at the cross-section negative binomial estimates (α̂, β̂). This is especially easy to implement when the regressors are variables such as age, sex and marital status whose changes over time are easily measured.

EXAMPLE 4. Dean et al. (1989) analyze data published in Andrews and Herzberg (1985) on the number of accident claims on third party motor insurance policies in Sweden during 1977 in each of 315 risk groups. The counts take a wide range of values - the median is 10 while the maximum is 2127 - so there is clearly a need to control for the size of risk group. This is done by defining the mean to equal T_i exp(X_i'β), where T_i is the number of insured automobile-years for the group, which is equivalent to including log T_i as a regressor and constraining its coefficient to equal unity, see (3.2). Even after including this and other regressors, the data are overdispersed. For Poisson ML estimates the Pearson statistic is 485.1 with 296 degrees of freedom, which for overdispersion of form (3.10) implies, using (3.12), that τ̂ = 1.638, considerably greater than 1. Dean et al. control for overdispersion by estimating by ML a mixed Poisson-inverse Gaussian model, with overdispersion of form (3.13). These ML estimates are found to be within one percent of estimates from solving (3.17) that use only the first two moments.
6 This implies that in each time period the claims are Negbin 2 distributed.


No attempt is made to compare the estimates with those from a more conventional negative binomial model.

3.3. Truncated, censored and modified count models


In some cases only individuals who experience the event of interest are sampled, in which case the data are left-truncated at zero and only positive counts are observed. Let f(y_i | X_i) denote the untruncated parent density, usually the Poisson or Negbin 2 defined in (3.3) or (3.6). Then the truncated density, which normalizes by 1 - f(0 | X_i), the probability of the conditioning event that y_i exceeds zero, is f(y_i | X_i)/(1 - f(0 | X_i)), y_i = 1, 2, 3, . . ., and the log-likelihood function is

log L = Σ_{i: y_i > 0} [ log f(y_i | X_i) - log(1 - f(0 | X_i)) ] .    (3.22)

Estimation is by maximum likelihood. For the Poisson model, f(0 | X_i) = exp(-exp(X_i'β)), while for the Negbin 2 model, f(0 | X_i) = (1 + α exp(X_i'β))^{-1/α}. One could in principle estimate the model by nonlinear regression on the truncated mean, but there is little computational advantage to doing this rather than maximum likelihood. Other straightforward variations, such as left-truncation at a point greater than zero and right-truncation, are discussed in Grogger and Carson (1991) and Gurmu and Trivedi (1992). More common than right-truncation is right-censoring, when counts above a maximum value, say m, are recorded only as a category m or more. Then the log-likelihood function is

log L = Σ_{i: y_i < m} log f(y_i | X_i) + Σ_{i: y_i ≥ m} log(1 - Σ_{j=0}^{m-1} f(j | X_i)) .    (3.23)

Even if the counts are completely recorded, it may be the case that not all values for counts come from the same process. In particular, the process for zero counts may differ from the process for positive counts, due to some threshold for zero counts. An example for continuous data is the sample selectivity model used in labor supply, where the process determining whether or not someone works, i.e. whether or not hours are positive, differs from the process determining positive hours. Similarly for count data, the process for determining whether or not a credit installment is unpaid may differ from the process determining the number of unpaid installments by defaulters. Modified count models allow for such different processes. We consider modification of zero counts only, though the methods can be extended to other counts. One modified model is the hurdle model of Mullahy (1986). Assume zeros come from the density f_1(y_i | X_i), e.g. Negbin 2 with regressors X_{1i} and parameters α_1 and β_1, while positives come from the density f_2(y_i | X_i), e.g. Negbin 2 with regressors X_{2i} and parameters α_2 and β_2. Then the probability of a zero value is clearly f_1(0 | X_i), while to ensure that probabilities sum to 1, the probability of a positive count is [(1 - f_1(0 | X_i))/(1 - f_2(0 | X_i))] f_2(y_i | X_i), y_i = 1, 2, . . . The log-likelihood function is

log L = Σ_{i: y_i = 0} log f_1(0 | X_i) + Σ_{i: y_i > 0} { log(1 - f_1(0 | X_i)) - log(1 - f_2(0 | X_i)) + log f_2(y_i | X_i) } .    (3.24)

An alternative modification is the with zeros model, which combines binary and count processes in the following way. If the binary process takes value 0, an event that occurs with probability f_1(0 | X_i), say, then y_i = 0. If the binary process takes value 1, an event that occurs with probability 1 - f_1(0 | X_i), then y_i can take count values 0, 1, 2, . . . with probabilities f_2(y_i | X_i) determined by a density such as Poisson or negative binomial. Then the probability of a zero value is f_1(0 | X_i) + (1 - f_1(0 | X_i)) f_2(0 | X_i), while the probability of a positive count is (1 - f_1(0 | X_i)) f_2(y_i | X_i), y_i = 1, 2, . . . The log-likelihood is
log L = Σ_{i: y_i = 0} log{ f_1(0 | X_i) + (1 - f_1(0 | X_i)) f_2(0 | X_i) } + Σ_{i: y_i > 0} { log(1 - f_1(0 | X_i)) + log f_2(y_i | X_i) } .    (3.25)

This model is also called the zero inflated counts model, though it is possible that it can also explain too few zero counts. This model was proposed by Mullahy (1986), who set f_1(0 | X_i) equal to a constant, while Lambert (1992) and Greene (1994) use a logit model, in which case f_1(0 | X_i) = (1 + exp(-X_{1i}'β_1))^{-1}. Problems of too few or too many zeros (or other values) can be easily missed by reporting only the mean and variance of the dependent variable. It is good practice to also report frequencies, and to compare these with the fitted frequencies.

EXAMPLE 5. In an earlier version, Dionne et al. (1996) analyze the number of unpaid installments for a sample of 4691 individuals granted credit by a Spanish bank. The raw data exhibit considerable overdispersion, with a mean of 1.581 and variance of 10.018. This overdispersion is still present after inclusion of regressors on age, marital status, number of children, net monthly income, housing ownership, monthly installment, credit card availability, and the amount of credit requested. For the Negbin 2 model α̂ = 1.340. Interest lies in determining bad credit risks, and a truncated Negbin 2 model (3.22) is separately estimated. If the process determining zero counts is the same as that determining positive counts, then estimating just the positive counts leads to a loss of efficiency. If instead the process determining zero counts differs from that determining positive counts, then estimating the truncated model is equivalent to maximizing a subcomponent of the hurdle log-likelihood (3.24) with no efficiency loss.⁷

⁷ The hurdle log-likelihood is additive in f_1 and f_2, the f_2 subcomponent equals (3.22), and the information matrix is diagonal if there are no common parameters in f_1 and f_2.
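For the truncated case, the log-likelihood (3.22) is simple enough to maximize directly; the sketch below does this for a zero-truncated Poisson on simulated data (all values hypothetical), which is the Poisson analogue of the truncated Negbin 2 estimation in Example 5.

```python
# Sketch: ML estimation of the zero-truncated Poisson log-likelihood (3.22)
# by direct numerical optimization, on simulated data.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.2 + 0.5 * x))
keep = y > 0                               # only positive counts are observed
yt, Xt = y[keep], X[keep]

def neg_loglik(beta):
    lam = np.exp(Xt @ beta)
    # log f(y|X) - log(1 - f(0|X)), with f Poisson so f(0|X) = exp(-lam)
    ll = yt * np.log(lam) - lam - gammaln(yt + 1) - np.log1p(-np.exp(-lam))
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)     # close to the true (0.2, 0.5) because the truncation is modelled
```
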


EXAMPLE 6. Greene (1994) analyzes the number of major derogatory reports (MDR), a delinquency of sixty days or more on a credit account, of 1319 individual applicants for a major credit card. MDR's are found to decrease with increases in the expenditure-income ratio (average monthly expenditure divided by yearly income), while age, income, average monthly credit card expenditure and whether the individual holds another credit card are statistically insignificant. The data are overdispersed, and the Negbin 2 model is strongly preferred to the Poisson. Greene also estimates the Negbin 2 with zeros model, using logit and probit models for the zeros with regressors on age, income, home ownership, self-employment, number of dependents, and average income of dependents. A with zeros model may not be necessary, as the standard Negbin 2 model predicts 1070 zeros, close to the observed 1060 zeros. The log-likelihood of the Negbin 2 with zeros model of -1020.6, with 7 additional parameters, is not much larger than that of the Negbin 2 model of -1028.3, with the former model preferable on the basis of Akaike's information criterion. Greene additionally estimates a count data variant of the standard sample selection model for continuous data.

3.4. Exponential and Weibull for duration data


The simplest model for duration data is the exponential, the duration distribution implied by the pure Poisson process, with density λe^{-λt} and constant hazard rate λ. If data are completely observed, and the exponential is estimated when a different model such as the Weibull is correct, then the exponential MLE is consistent if the mean is still correctly specified, but inefficient, and usual ML output gives incorrect standard errors. This is similar to using Poisson when negative binomial is correct. A more important reason for favoring more general models than the exponential, however, is that data are often incompletely observed, in which case incorrect distributional choice can lead to inconsistent parameter estimates. For example, observation for a limited period of time may mean that the longer spells are not observed to their completion. The restriction of a constant hazard rate is generally not appropriate for econometric data, and we move immediately to analysis of the Weibull, which nests the exponential as a special case. Our treatment is brief, as the focus of this paper is on counts rather than durations. Standard references include Kalbfleisch and Prentice (1980), Kiefer (1988) and Lancaster (1990). The Weibull is most readily defined by its hazard rate λ(t), or h(t) in earlier notation, which equals λγt^{γ-1}. A regression model is formed by specifying λ to depend on regressors, viz. λ = exp(X_i'β), while γ does not. The hazard for observation i is therefore

λ_i(t_i | X_i) = γ t_i^{γ-1} exp(X_i'β) ,    (3.27)

with corresponding density

f_i(t_i | X_i) = γ t_i^{γ-1} exp(X_i'β) exp(-t_i^γ exp(X_i'β)) .    (3.28)


The conditional mean for this process is somewhat complicated:

E[t_i | X_i] = (exp(X_i'β))^{-1/γ} Γ(1 + 1/γ) .    (3.29)

Studies usually consider the impact of regressors on the hazard rate rather than the conditional mean. If β_j > 0 then an increase in X_ij leads to an increase in the hazard and a decrease in the mean duration, while the hazard increases (or decreases) with duration if γ > 1 (or γ < 1). In many applications durations are only observed to some upper bound. If the event does not occur before this time the spell is said to be incomplete, more specifically right-censored. The contribution to the likelihood is the probability of observing a spell of at least t_i, or the survivor function

S_i(t_i | X_i) = exp(-t_i^γ exp(X_i'β)) .    (3.30)

Combining, the log-likelihood when some data are incomplete is

log L = Σ_{i: complete} { log γ + (γ - 1) log t_i + X_i'β - t_i^γ exp(X_i'β) } - Σ_{i: incomplete} t_i^γ exp(X_i'β) ,    (3.31)

and γ and β are estimated by ML. With incomplete data, the Weibull MLE is inconsistent if the model is not correctly specified. One possible misspecification is that while t_i is Weibull, the parameters are γ and λ_i υ_i rather than γ and λ_i, where υ_i is unobserved individual heterogeneity. If the distribution of υ_i is i.i.d. gamma with mean 1 and variance α, this leads to the Weibull-gamma model with survivor function

S_i(t_i | X_i) = [1 + α t_i^γ exp(X_i'β)]^{-1/α} ,    (3.33)

from which the density and log-likelihood function can be obtained in the usual manner. The standard general model for duration data is the proportional hazards or proportional intensity model, introduced in (2.1). This factorizes the hazard rate as
λ_i(t_i, X_i, γ, β) = λ_0(t_i, γ) exp(X_i'β) ,    (3.34)

where λ_0(t_i, γ) is a baseline hazard function. Different choices of λ_0(t_i, γ) correspond to different models, e.g. the Weibull is λ_0(t_i, γ) = γ t_i^{γ-1} and the exponential is λ_0(t_i, γ) = 1. The only role of regressors is as a scale factor for this baseline hazard. The factorization of the hazard rate also leads to a factorization of the log-likelihood, with a subcomponent not depending on the baseline hazard, which is especially useful for right-censored data. Define R(t_i) = {j : t_j ≥ t_i} to be the risk set of all spells which have not yet been completed at time t_i. Then Cox (1972a) proposed the estimator which maximizes the partial likelihood


log L = Σ_{i: complete} { X_i'β - log Σ_{j ∈ R(t_i)} exp(X_j'β) } .    (3.35)

This estimator is not fully efficient, but has the advantage of being consistent, with correct standard errors given by those reported by an ML package, regardless of the true functional form of the baseline hazard.

EXAMPLE 7. Bandopadhyaya (1993) analyzes data on 74 U.S. firms that were under chapter 11 bankruptcy protection in the period 1979-90. 31 firms were still under bankruptcy protection, in which case data are incomplete, and ML estimates of the censored Weibull model (3.31) are obtained. The dependent variable is the number of days in bankruptcy protection, with mean duration (computed for complete and incomplete spells) of 714 days. The coefficient of interest amount outstanding is positive, implying an increase in the hazard and decrease in mean duration of bankruptcy protection. The other statistically significant variable is a capacity utilization measure, also with positive effect on the hazard. The estimated γ̂ = 1.629 exceeds unity, so that firms are more likely to leave bankruptcy protection the longer they are in protection. The associated standard error, 0.385, leads to a "t-statistic" for testing the null hypothesis of exponential, γ = 1, equal to 1.63, which is borderline insignificant for a one-sided test at 5 percent. The Weibull model is preferred to the exponential and the log-logistic on grounds that it provided the "best fit".

EXAMPLE 8. Jaggia and Thosar (1995) analyze data on 161 U.S. firms that were the targets of tender offers contested by management during 1978-85. In 26 instances the tender offer was still outstanding, and the data censored. The dependent variable is the length of time in weeks from public announcement of the offer to the requisite number of shares being tendered, with mean duration (computed for complete and incomplete spells) of 18.1 weeks. The paper estimates and performs specification tests on a range of models. Different models give similar results for the relative statistical significance of different regressors, but different results for how the hazard rate varies with time since the tender offer. Actions by management to contest the tender offer, mounting a legal defense and proposing a change in financial structure, are successful in decreasing the hazard and increasing the mean duration time to acceptance of the bid, while competing bids increase the hazard and decrease the mean. The preferred model is the censored Weibull-gamma (3.33). The estimated hazard, evaluated at X_i = X̄, initially increases rapidly and then decreases slowly with t, whereas the Weibull gives a monotone increasing hazard rate. A criticism of models such as Weibull-gamma is that they assume that all spells will eventually be complete, whereas here some firms may never be taken over. Jaggia and Thosar give a brief discussion of estimation and rejection of the split-population model of Schmidt and Witte (1989) which allows for positive probability of no takeover. This study is a good model for other similar studies, and uses techniques readily available in LIMDEP.
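As a sketch of the censored Weibull estimation used in Example 7, the log-likelihood (3.31) can be maximized directly; the data below are simulated, and the single regressor and fixed censoring time are purely illustrative assumptions.

```python
# Sketch: ML estimation of the right-censored Weibull log-likelihood (3.31)
# on simulated durations with a fixed censoring time.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
gamma_true, beta_true = 1.5, np.array([-0.5, 0.8])
u = rng.uniform(size=n)
t = (-np.log(u) / np.exp(X @ beta_true)) ** (1.0 / gamma_true)   # Weibull durations
c = 2.0                                     # all spells censored at time c
complete = t <= c
t_obs = np.minimum(t, c)

def neg_loglik(params):
    g = np.exp(params[0])                   # gamma > 0 via log parameterization
    xb = X @ params[1:]
    ll_complete = np.log(g) + (g - 1) * np.log(t_obs) + xb - t_obs ** g * np.exp(xb)
    ll_censored = -t_obs ** g * np.exp(xb)
    return -np.sum(np.where(complete, ll_complete, ll_censored))

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(np.exp(res.x[0]), res.x[1:])          # compare with gamma_true and beta_true
```
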


3.5. Poisson for grouped duration data


A leading example of state transitions in financial data is the transition from the state of having a mortgage to mortgage termination either by pre-payment of the mortgage debt or by default. Practically this is important in pricing mortgage-backed securities. Econometrically this involves modeling the time interval between a mortgage loan origination and its pre-payment or default. Specific interest attaches to the shape of the hazard as a function of the age of the mortgage and the role of covariates. The Cox proportional hazards (PH) model for durations has been widely used in this context (Green and Shoven (1986), Lane et al. (1986), Baek and Bandopadhyaya (1996)). One can alternatively analyze grouped duration data as counts (Schwartz and Torous (1993)).

EXAMPLE 9. Green and Shoven (1986) analyze terminations between 1975 and 1982 of 3,938 Californian 30-year fixed rate mortgages issued between 1947 and 1976. 2,037 mortgages were paid off. Interest lies in estimating the sensitivity of mortgage prepayments to the differential between the prevailing market interest rate and the fixed rate on a given mortgage, the so-called "lock-in magnitude". The available data are quite limited, and an imputed value of this lock-in magnitude is the only regressor, so that other individual specific factors such as changes in family size or income are ignored. (The only individual level data that the authors had was the length of tenure in the house and an imputed measure of the market value of the house.) The transition probability for a mortgage of age a_i, where a_i = t_i - t_{0i} and t_{0i} denotes the mortgage origination date, is given by λ_i(a_i, X, β) = λ_0(a_i, γ_i) exp(X'β). The authors used the Cox partial likelihood estimator to estimate (β, γ_i, i = 1, ..., 30); the (nonparametric) estimate of the sequence {γ_i, i = 1, 2, ...}, somewhat akin to estimates of coefficients of categorical variables corresponding to each mortgage age, yields the baseline hazard function. The periods 1975-78 and 1978-82 are treated separately to allow for a possible structural change in the β coefficient following a 1978 court ruling which prohibited the use of due-on-sale clauses for the sole purpose of raising mortgage rates. The authors were able to show the sensitivity of the average mortgage prepayment period to interest rate changes.

EXAMPLE 10. Schwartz and Torous (1993) offer an interesting alternative to the Green-Shoven approach, combining the Poisson regression approach with the proportional hazard structure. Their Freddie Mac data on 30-year fixed rate mortgages over the period 1975 to 1990 has over 39,000 pre-payments and over 8,500 defaults. They use monthly grouped data on mortgage pre-payments and defaults, the two being modelled separately. Let n_j denote the number of known outstanding mortgages at the beginning of the quarter j, y_j the number of prepayments in that quarter, and X(j) the set of time-varying covariates. Let λ(a, X(j), β) = λ_0(a, γ) exp(X(j)'β) denote the average monthly prepayment rate expressed as a function of exogenous variables X(j) and a baseline hazard function λ_0(a, γ). Then the expected number of quarterly prepayments will be n_j λ_0(a, γ) exp(X(j)'β), and ML estimation is based on the Poisson density


f(y_j | n_j, X(j)) = [n_j λ_0(a, γ) exp(X(j)'β)]^{y_j} exp(-n_j λ_0(a, γ) exp(X(j)'β)) / y_j! .    (3.36)

The authors use dummy variables for region, quarter, and the age of mortgage in years at the time of pre-payment. Other variables include the loan to value ratio at origination, refinancing opportunities and regional housing returns. Their results indicate significant regional differences and a major role for refinancing opportunities.

3.6. Other count models

U.S. stock prices are measured in units of one-eighth dollar (or tick), and for short time periods should be explicitly modelled as integer. For the six stocks studied in detail by Hausman, Lo and MacKinlay (1994), 60 percent of same-stock consecutive trades had no price change and a further 35 percent changed by only one tick. Even daily closing prices can experience changes of only a few ticks. This discreteness in stock prices is generally ignored, though some studies using continuous pricing models have allowed for it (Gottlieb and Kalay (1985) and Ball (1988)). One possible approach is to model the price level (measured in number of ticks) as a count. But this count will be highly serially correlated, and time series regression models for counts are not yet well developed. More fruitful is to model the price change (again measured in number of ticks) as a count, though the standard count models are not appropriate as some counts will be negative. A model that permits negative counts is the ordered probit model, presented for example in Maddala (1983). Let y_i* denote a latent (unobserved) r.v. measuring the propensity for price to change, where y_i* = X_i'β + ε_i, ε_i is N(0, σ_i²) distributed, and usually σ_i² = 1. Higher values of y_i* are associated with higher values j of the actual discrete price change y_i in the following way: y_i = j if α_j < y_i* ≤ α_{j+1}. Then some algebra yields

Pr{y_i = j} = Pr{α_j - X_i'β < ε_i ≤ α_{j+1} - X_i'β} = Φ((α_{j+1} - X_i'β)/σ_i) - Φ((α_j - X_i'β)/σ_i) .    (3.37)

Let d_ij be a dummy variable equal to one if y_i = j and zero otherwise. The log-likelihood function can be expressed as

log L = Σ_{i=1}^n Σ_j d_ij log[ Φ((α_{j+1} - X_i'β)/σ_i) - Φ((α_j - X_i'β)/σ_i) ] .    (3.38)

This model can be applied to nonnegative count data, in which case j = 0, 1, 2, ..., max(y_i). Cameron and Trivedi (1986) obtained qualitatively similar results regarding the importance and significance of regressors in their


application when ordered probit was used rather than Poisson or negative binomial. For discrete price change data that may be negative, Hausman et al. (1992) use the ordered probit model, with j = -m, -m + 1, ..., 0, 1, 2, ..., m, where the value m is actually m or more, and -m is actually -m or less. Parameters to be estimated are then the parameters in the model for σ_i², the regression parameters β, and the threshold parameters α_{-m+1}, ..., α_m, while α_{-m} = -∞ and α_{m+1} = ∞.
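A compact sketch of the ordered probit for price changes follows; the log-likelihood (3.38) is maximized directly on simulated data with σ_i = 1 and price changes restricted to -2, ..., 2, so the design is purely illustrative and far simpler than the specification in Example 11.

```python
# Sketch: ordered probit for discrete price changes, maximizing (3.38)
# directly on simulated data with sigma_i = 1.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 10_000
x = rng.normal(size=n)
ystar = 0.8 * x + rng.normal(size=n)                 # latent propensity to change
cuts_true = np.array([-1.5, -0.5, 0.5, 1.5])
y = np.digitize(ystar, cuts_true) - 2                # observed change in ticks: -2,...,2

def neg_loglik(params):
    beta, cuts = params[0], np.sort(params[1:])      # crude way to keep thresholds ordered
    bounds = np.concatenate(([-np.inf], cuts, [np.inf]))
    j = y + 2                                        # category index 0,...,4
    p = norm.cdf(bounds[j + 1] - beta * x) - norm.cdf(bounds[j] - beta * x)
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

start = np.array([0.0, -1.0, -0.3, 0.3, 1.0])
res = minimize(neg_loglik, x0=start, method="BFGS")
print(res.x)    # compare with 0.8 and cuts_true
```
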

EXAMPLE 11. Hausman et al. (1992) use 1988 data on time-stamped (to the nearest second) trades on the New York and American Stock Exchanges for one hundred stocks, with results reported in detail for six of the stocks. Each stock is modelled separately, with one stock (IBM) having as many as 206,794 trades. The dependent variable is the price change (measured in units of $1/8) between consecutive trades. The ordered probit model is estimated, with m = 4 for most stocks. Regressors include the time elapsed since the previous trade, the bid/ask spread at the time of the previous trade, three lags of the price change and three lags of the dollar volume of the trade, while the variance σ_i² is a linear function of the time elapsed since the previous trade and the bid/ask spread at the time of the previous trade. This specification is not based on stochastic process theory, though arithmetic Brownian motion is used as a guide. Hausman et al. conclude that the sequence of trades affects price changes and that larger trades have a bigger impact on price.

4. Concluding remarks
The basic Poisson and negative binomial count models (and other Poisson mixture models) are straightforward to estimate with readily available software, and in many situations are appropriate. Estimation of a Poisson regression model should be followed by a formal test of underdispersion or overdispersion, using the auxiliary regressions (3.8) or (3.9). If these tests reject equidispersion, then


standard errors should be calculated using (3.11), (3.14) or (3.16). If the data are overdispersed it is better to instead obtain ML estimates of the Negbin 2 model (3.6). However, it should be noted that overdispersion tests have power against other forms of model misspecification, for example the failure to account for excess zeros. A common situation in which these models are inadequate is when the process determining zero counts differs from that determining positive counts. This may be diagnosed by comparison of fitted and observed frequencies. Modified count models, such as the hurdle or with zeros model, or models with truncation and censoring, are then appropriate. This study has emphasized the common basis of count and duration models. When data on both durations and counts are available, modelling the latter can be more informative about the role of regressors, especially when data on multiple spells for a given individual are available or when data are grouped. Grouping by a uniform time interval is convenient but sometimes the data on counts will not pertain to the same interval. One may obtain time series data on the number of events for different time intervals. Such complications can be accommodated by the use of proportional intensity Poisson process regression models (Lawless (1987)). The assumptions of the simplest stochastic processes are sometimes inadequate for handling financial data. An example is the number of transactions or financial trades that may be executed per small unit of time. Independence of events will not be a convincing assumption in such a case, so renewal theory is not appropriate. One approach to incorporating interdependence is the use of modulated renewal processes (Cox (1972b)). For time series data on durations, rather than counts, Engle and Russell (1994) introduce the autoregressive conditional duration model, which is the duration data analog of the GARCH model. This model is successful in explaining the autocorrelation in data on the number of seconds between consecutive trades of IBM stock on the New York Stock Exchange. Time series count regression models are relatively undeveloped, except for the pure time series case, which is very limited. In fact, techniques for handling most of the standard complications considered by econometricians, such as simultaneity and selection bias, are much less developed for count data than they are for continuous data. A useful starting point is the survey by Gurmu and Trivedi (1994).

Acknowledgement
The authors thank Arindam Bandopadhyaya, Sanjiv Jaggia, John Mullahy and Per Johansson for comments on an earlier draft of this paper.



References
Andrews, D. F. and A. M. Herzberg (1985). Data. Springer-Verlag, New York.
Back, I-M. and A. Bandopadhyaya (1996). The determinants of the duration of commercial bank debt renegotiation for sovereigns. J. Banking Finance 20, 673-685.
Ball, C. A. (1988). Estimation bias induced by discrete security prices. J. Finance 43, 841-865.
Bandopadhyaya, A. (1994). An estimation of the hazard rate of firms under Chapter 11 protection. Rev. Econom. Statist. 76, 346-350.
Cameron, A. C. and P. K. Trivedi (1986). Econometric models based on count data: Comparisons and applications of some estimators and tests. J. Appl. Econom. 1(1), 29-54.
Cameron, A. C. and P. K. Trivedi (1990). Regression based tests for overdispersion in the Poisson model. J. Econometrics 46(3), 347-364.
Cameron, A. C. and P. K. Trivedi (1993). Tests of independence in parametric models with applications and illustrations. J. Business Econom. Statist. 11, 29-43.
Cameron, A. C. and F. Windmeijer (1995). R-squared measures for count data regression models with applications to health care utilization. J. Business Econom. Statist. 14(2), 209-220.
Consul, P. C. and F. Famoye (1992). Generalized Poisson regression model. Communications in Statistics: Theory and Methods 21(1), 89-109.
Cox, D. R. (1962). Renewal Theory. Methuen, London.
Cox, D. R. (1972a). Regression models and life tables. J. Roy. Statist. Soc. Ser. B 34, 187-220.
Cox, D. R. (1972b). The statistical analysis of dependencies in point processes. In: P. A. W. Lewis, ed., Stochastic Point Processes. John Wiley and Sons, New York.
Cox, D. R. and P. A. W. Lewis (1966). The Statistical Analysis of Series of Events. Methuen, London.
Davutyan, N. (1989). Bank failures as Poisson variates. Econom. Lett. 29(4), 333-338.
Dean, C., J. F. Lawless and G. E. Wilmot (1989). A mixed Poisson-inverse Gaussian regression model. Canad. J. Statist. 17(2), 171-181.
Delgado, M. A. and T. J. Kniesner (1996). Count data models with variance of unknown form: An application to a hedonic model of worker absenteeism. Rev. Econom. Statist., to appear.
Dionne, G., M. Artis and M. Guillen (1996). Count data models for a credit scoring system. J. Empirical Finance, to appear.
Dionne, G. and C. Vanasse (1992). Automobile insurance ratemaking in the presence of asymmetric information. J. Appl. Econometrics 7(2), 149-166.
Engle, R. F. and J. R. Russell (1994). Forecasting transaction rates: The autoregressive conditional duration model. Working Paper No. 4966, National Bureau of Economic Research, Cambridge, Massachusetts.
Epps, W. (1993). Stock prices as a branching process. Department of Economics, University of Virginia, Charlottesville.
Feller, W. (1966). An Introduction to Probability Theory, Vol. II. Wiley, New York.
Gottlieb, G. and A. Kalay (1985). Implications of the discreteness of observed stock prices. J. Finance 40(1), 135-153.
Gouriéroux, C., A. Monfort and A. Trognon (1984). Pseudo maximum likelihood methods: Applications to Poisson models. Econometrica 52(3), 681-700.
Green, J. and J. Shoven (1986). The effects of interest rates on mortgage prepayments. J. Money, Credit and Banking 18(1), 41-59.
Greene, W. H. (1994). Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Discussion Paper EC-94-10, Department of Economics, New York University, New York.
Grogger, J. T. and R. T. Carson (1991). Models for truncated counts. J. Appl. Econometrics 6(3), 225-238.
Gurmu, S. and P. K. Trivedi (1992). Overdispersion tests for truncated Poisson regression models. J. Econometrics 54, 347-370.
Gurmu, S. and P. K. Trivedi (1994). Recent developments in models of event counts: A survey. Discussion Paper No. 261, Thomas Jefferson Center, University of Virginia, Charlottesville.
Hausman, J. A., A. W. Lo and A. C. MacKinlay (1992). An ordered probit analysis of transaction stock prices. J. Financ. Econom. 31, 319-379.
Jaggia, S. and S. Thosar (1993). Multiple bids as a consequence of target management resistance: A count data approach. Rev. Quant. Finance Account., December, 447-457.
Jaggia, S. and S. Thosar (1995). Contested tender offers: An estimate of the hazard function. J. Business Econom. Statist. 13(1), 113-119.
Kalbfleisch, J. and R. Prentice (1980). The Statistical Analysis of Failure Time Data. John Wiley and Sons, New York.
Karlin, S. and H. Taylor (1975). A First Course in Stochastic Processes. 2nd ed., Academic Press, New York.
Kiefer, N. M. (1988). Econometric duration data and hazard functions. J. Econom. Literature 26(2), 646-679.
King, G. (1989). Variance specification in event count models: From restrictive assumptions to a generalized estimator. Amer. J. Politic. Sci. 33, 762-784.
Lambert, D. (1992). Zero-inflated Poisson regression with an application to defects in manufacturing. Technometrics 34, 1-14.
Lancaster, T. (1990). The Econometric Analysis of Transition Data. Cambridge University Press, Cambridge.
Lane, W., S. Looney and J. Wansley (1986). An application of the Cox proportional hazards model to bank failures. J. Banking Finance 18(4), 511-532.
Lawless, J. F. (1987). Regression methods for Poisson process data. J. Amer. Statist. Assoc. 82(399), 808-815.
Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge.
McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models. 2nd ed., Chapman and Hall, London.
Mullahy, J. (1986). Specification and testing of some modified count data models. J. Econometrics 33(3), 341-365.
Schmidt, P. and A. Witte (1989). Predicting criminal recidivism using split population survival time models. J. Econometrics 40(1), 141-159.
Schwartz, E. S. and W. N. Torous (1993). Mortgage prepayment and default decisions: A Poisson regression approach. AREUEA Journal: J. American Real Estate Institute 21(4), 431-449.
Winkelmann, R. (1994). Count Data Models: Econometric Theory and an Application to Labor Mobility. Springer-Verlag, Berlin.
Winkelmann, R. (1995). Duration dependence and dispersion in count-data models. J. Business Econom. Statist. 13, 467-474.
Winkelmann, R. and K. F. Zimmermann (1995). Recent developments in count data modelling: Theory and application. J. Econom. Surveys 9, 1-24.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 © 1996 Elsevier Science B.V. All rights reserved.

13

Financial Applications of Stable Distributions

J. Huston McCulloch
Life is a gamble, at terrible odds; if it were a bet, you wouldn't take it.
Tom Stoppard, Rosencrantz and Guildenstern Are Dead

1. Introduction

Financial asset returns are the cumulative outcome of a vast number of pieces of information and individual decisions arriving continuously in time. According to the Central Limit Theorem, if the sum of a large number of iid random variates has a limiting distribution after appropriate shifting and scaling, the limiting distribution must be a member of the stable class (Lévy 1937, Zolotarev 1986: 6). It is therefore natural to assume that asset returns are at least approximately governed by a stable distribution if the accumulation is additive, or by a log-stable distribution if the accumulation is multiplicative. The Gaussian is the most familiar and tractable stable distribution, and therefore either it or the log-normal has routinely been postulated to govern asset returns. However, returns are often much more leptokurtic than is consistent with normality. This naturally leads one to consider also the non-Gaussian stable distributions as a model of financial returns, as first proposed by Benoit Mandelbrot (1960, 1961, 1963a,b). If asset returns are truly governed by the infinite-variance stable distributions, life is fundamentally riskier than in a Gaussian world. Sudden price movements like the 1987 stock market crash turn into real-world possibilities, and the risk immunization promised by "programmed trading" becomes mere wishful thinking, at best. These price discontinuities render the arbitrage argument of the celebrated Black-Scholes (1973) option pricing model inapplicable, so that we must look elsewhere in order to value options. Nevertheless, we shall see that the Capital Asset Pricing Model works as well in the infinite-variance stable cases as it does in the normal case. Furthermore, the Black-Scholes formula may be extended to the non-Gaussian stable cases by means of a utility maximization argument. Two serious empirical objections that have been raised against the stable hypothesis are shown to be inconclusive.



Section 2 of this paper surveys the basic properties of univariate stable distributions, of continuous time stable processes, and of multivariate stable distributions. Section 3 reviews the literature on portfolio theory with stable distributions, and extends the CAPM to the most general MV stable case. Section 4 develops a formula for pricing European options with log-stable uncertainty and shows how it may be applied to options on commodities, stocks, bonds, and foreign exchange rates. Section 5 treats the estimation of stable parameters and surveys empirical applications for returns on various assets, including foreign exchange rates, stocks, commodities, and real estate. Empirical objections that have been raised against the stable hypothesis are considered, and alternative leptokurtic distributions that have been proposed are discussed.

2. Basic properties of stable distributions

2.1. Univariate stable distributions


Stable distributions S(x; α, β, c, δ) are determined by four parameters. The location parameter δ ∈ (−∞, ∞) shifts the distribution to the left or right, while the scale parameter c ∈ (0, ∞) expands or contracts it about δ, so that

S(x; α, β, c, δ) = S((x − δ)/c; α, β, 1, 0) .   (1)

We will write the standard stable distribution function with shape parameters α and β as S_{αβ}(x) = S(x; α, β, 1, 0), and use s(x; α, β, c, δ) and s_{αβ}(x) for the corresponding densities. If X has distribution S(x; α, β, c, δ), we write X ∼ S(α, β, c, δ). The characteristic exponent α ∈ (0, 2] governs the tail behavior and therefore the degree of leptokurtosis. When α = 2, a normal distribution results, with variance 2c². For α < 2, the variance is infinite. When α > 1, EX = δ, but if α < 1, the mean is undefined. The case α = 1, β = 0 gives the Cauchy (arctangent) distribution. Expansions due to Bergstrom (1952) imply that as x → ∞,

S_{αβ}(−x) ≈ (1 − β) (Γ(α)/π) sin(πα/2) x^{−α} ,
1 − S_{αβ}(x) ≈ (1 + β) (Γ(α)/π) sin(πα/2) x^{−α} .   (2)

When α < 2, stable distributions therefore have one or more "Paretian" tails that behave asymptotically like x^{−α} and give the stable distributions infinite absolute population moments of order greater than or equal to α. In this case, the skewness parameter β ∈ [−1, 1] indicates the limiting ratio of the difference of the two tail probabilities to their sum. We here follow Zolotarev (1957) by defining β so that β > 0 indicates positive skewness for all α. If β = 0, the distribution is symmetric stable (SS). As α ↑ 2, β loses its effect and becomes unidentified. Stable distributions are defined most concisely in terms of their log characteristic functions:



log Ee^{iXt} = iδt + ψ_{αβ}(ct) ,   (3)
where
ψ_{αβ}(t) = −|t|^α [1 − iβ sign(t) tan(πα/2)] ,  α ≠ 1 ,
          = −|t| [1 + iβ(2/π) sign(t) log|t|] ,  α = 1 ,   (4)

is the log c.f. for S_{αβ}(x).¹ The stable distribution and density may be computed either by using Zolotarev's (1986: 74, 68) proper integral representations, or by evaluating the inverse Fourier transform of the c.f. DuMouchel (1971) tabulates the stable distributions, while Holt and Crow (1973) tabulate and graph the density.² See also Fama and Roll (1968) and Panton (1992). A fast numerical and reasonably accurate approximation to the SS distribution and density for α ∈ [0.84, 2.00] has been developed by McCulloch (1994b). The formulas for S_{αβ}(x) are calculable for α > 2 or |β| > 1, but the resulting function is not a proper probability distribution since one or both tails will then lie outside [0, 1], as may be seen from (2). Stable distributions are therefore constrained to have α ∈ (0, 2] and β ∈ [−1, 1]. Let X ∼ S(α, β, c, δ) and a be any real constant. Then (3) implies
aX ∼ S(α, sign(a)β, |a|c, aδ) .   (5)

Let X₁ ∼ S(α, β₁, c₁, δ₁) and X₂ ∼ S(α, β₂, c₂, δ₂) be independent drawings from stable distributions with a common α. Then X₃ = X₁ + X₂ ∼ S(α, β₃, c₃, δ₃), where
c₃^α = c₁^α + c₂^α ,   (6)
β₃ = (β₁c₁^α + β₂c₂^α)/c₃^α ,   (7)
δ₃ = δ₁ + δ₂ ,  α ≠ 1 ;
δ₃ = δ₁ + δ₂ + (2/π)(β₃c₃ log c₃ − β₁c₁ log c₁ − β₂c₂ log c₂) ,  α = 1 .   (8)

When β₁ = β₂, β₃ equals their common value, so that X₃ has the same shaped distribution as X₁ and X₂. This is the "stability" property of stable distributions that leads directly to their role in the CLT, and makes them particularly useful in financial portfolio theory. If β₁ ≠ β₂, β₃ lies between β₁ and β₂. For α < 2 and β > −1, the long upper Paretian tail makes Ee^X infinite. However, when X ∼ S(α, −1, c, δ), Zolotarev (1986: 112) has shown that

¹ (3) follows DuMouchel (1973a) and implies (1) and (5). Samorodnitsky and Taqqu (1994), following Zolotarev (1957), use (4), but give the general log c.f. as iμt + ψ_{αβ}(ct). This is equivalent to (3) for α ≠ 1, with μ = δ. For α = 1, however, their μ becomes δ − (2/π)βc log c. McCulloch (1986) erroneously attributes to this "μ" formulation the properties of (3). See McCulloch (in press b) for details.
² Holt and Crow, following the 1949 work of Kolmogorov and Gnedenko, reverse the sign on β in (4) for α ≠ 1, with the unfortunate but easily corrected result that their "β" > 0 indicates negative skewness and vice versa, unless α = 1. Cf. Hall (1981).


log Ee^X = δ − c^α sec(πα/2) ,  α ≠ 1 ;
         = δ + (2/π) c log c ,   α = 1 .   (9)

This formula greatly facilitates asset pricing under log-stable uncertainty.³ A simulated stable r.v. may be computed directly from a pair of independent uniform pseudo-random variables, without using the inverse cdf, by the method of Chambers, Mallows and Stuck (1976).⁴
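A minimal sketch of the Chambers-Mallows-Stuck simulator follows, in the variant popularized by Weron. It is intended to return a standard variate in the parameterization of (3)-(4) with c = 1 and δ = 0 for α ≠ 1, but, as footnote 4 warns, location conventions differ across implementations, so the output should be checked against the target parameterization before serious use.

# Sketch of the Chambers-Mallows-Stuck method for simulating standard stable
# variates; location conventions differ across implementations (see footnote 4),
# so verify the output against the target parameterization.
import numpy as np

def rstable(alpha, beta, size, rng=None):
    """Simulate approximately standard stable variates S(alpha, beta, 1, 0) by CMS."""
    rng = np.random.default_rng(rng)
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit exponential
    if alpha != 1.0:
        B = np.arctan(beta * np.tan(np.pi * alpha / 2)) / alpha
        S = (1 + beta ** 2 * np.tan(np.pi * alpha / 2) ** 2) ** (1 / (2 * alpha))
        return (S * np.sin(alpha * (V + B)) / np.cos(V) ** (1 / alpha)
                * (np.cos(V - alpha * (V + B)) / W) ** ((1 - alpha) / alpha))
    # alpha == 1 case
    return (2 / np.pi) * ((np.pi / 2 + beta * V) * np.tan(V)
                          - beta * np.log((np.pi / 2) * W * np.cos(V) / (np.pi / 2 + beta * V)))

x = rstable(1.5, 0.0, 100_000, rng=42)
print(np.mean(np.abs(x) > 10))   # heavy tails: far larger than the Gaussian benchmark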

2.2. Continuous time stable processes


Because stable distributions are infinitely divisible, they are particularly attractive for continuous time modeling (Samuelson 1965: 15-16; McCulloch 1978). The stable generalization of the familiar Brownian motion or Wiener process is called an α-Stable Lévy Motion, and is the subject of two recent monographs, by Samorodnitsky and Taqqu (1994) and Janicki and Weron (1994). Such a process is a self-similar fractal in the sense of Mandelbrot (1983). In Peters' (1994) terminology, a fractal distribution is thus a stable distribution. A standard α-Stable Lévy Motion ξ(t) is a continuous time stochastic process whose increments ξ(t + Δt) − ξ(t) are distributed S(α, β, Δt^{1/α}, 0) for α ≠ 1 or S(1, β, Δt, (2/π)βΔt log Δt) for α = 1, and whose non-overlapping increments are independent. Such a process has infinitesimal increments dξ(t) = ξ(t + dt) − ξ(t), with scale dt^{1/α}. The process itself may then be reconstructed as the integral of these increments:
ξ(t) = ξ(0) + ∫₀ᵗ dξ(τ) .

The more general process z(t) = cξ(t) + δt has scale c over unit time intervals and, for α ≠ 1, drift δ per unit time. Unlike a Brownian motion, which is almost surely (a.s.) everywhere continuous, an α-Stable Lévy Motion is a.s. dense with discontinuities. Applying (2) to S(α, β, c dt^{1/α}, 0) (cf. eqs. (18)-(19) of McCulloch 1978), the probability that dz > x is

= k_{αβ} c^α x^{−α} dt ,   (10)
where
k_{αβ} = (1 + β) (Γ(α)/π) sin(πα/2) .   (11)

³ The author is grateful to Vladimir Zolotarev for confirming that his Theorem 2.6.1 is, through a reparameterization, equivalent to (9). When α = 2, (9) becomes the familiar formula log Ee^X = μ + σ²/2.
⁴ A call to IMSL subroutine GGSTA, which is based on their method, generates a simulated stable variate with argument BPRIME equal to our β, c = 1, and ζ = 0, where ζ = δ + βc tan(πα/2) for α ≠ 1 and ζ = δ for α = 1, rather than δ = 0. See Zolotarev (1957: 454, 1987: 11) and McCulloch (1986: 1121-26, in press b) concerning this shift. See also Panton (1989) for computational details concerning the CMS paper.



Eq. (10) in turn implies that values of dz greater than any threshold x₀ > 0 occur at rate
k_{αβ}(c/x₀)^α ,   (12)
and that conditional on their occurrence, they have a Pareto distribution:
P(dz < x | dz > x₀) = 1 − (x₀/x)^α ,  x > x₀ .   (13)

Likewise, negative discontinuities dz < −x₀ also have a conditional Pareto distribution, and occur at a rate determined by (12), but with k_{αβ} replaced by k_{α,−β}. In the case α = 2, k_{αβ} = 0, so that discontinuities a.s. never occur. With α < 2, the frequency of discontinuities greater than x₀ in absolute value approaches infinity as x₀ ↓ 0. If β = 1, discontinuities a.s. occur only in the direction of the single Paretian tail. Because the scale of Δξ falls to 0 as Δt ↓ 0, an α-Stable Lévy Motion is everywhere a.s. continuous, despite the fact that it is not a.s. everywhere continuous. That is to say, every individual point t is a.s. a point of continuity, even though on any finite interval there will a.s. be an infinite number of points for which this is not true. Even though they are a.s. dense, the points of discontinuity a.s. constitute only a set of measure zero, so that with probability one any point chosen at random will in fact be a point of continuity. Such a point of continuity will a.s. be a limit point of discontinuity points, but whose jumps approach zero as the point in question is approached. The scale of Δξ/Δt is (Δt)^{(1/α)−1}, so that if α > 1, ξ(t) is everywhere a.s. not differentiable, just as in the case of a Brownian motion. If α < 1, ξ(t) is everywhere a.s. differentiable, though of course there will be an infinite number of points (the discontinuities) for which this will not be true. The discontinuities in an α-Stable Lévy Motion imply that the bottom may occasionally fall out of the market faster than trades can be executed, as occurred, most spectacularly, in October of 1987. When such events have a positive probability of occurrence, the portfolio risk insulation promised by "programmed trading" becomes wishful thinking, at best. Furthermore, the arbitrage argument of the Black-Scholes model (1973) cannot be used to price options, and options are not the redundant assets they would be if the underlying price were continuous.
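The jump structure just described is easy to simulate directly. The sketch below, with purely illustrative parameter values, draws the upward jumps of size greater than x₀ over an interval of length T as a Poisson number of arrivals with the rate (12) and Pareto sizes as in (13).

# Illustration of (12)-(13): for an alpha-stable Levy motion with scale c per unit
# time, upward jumps exceeding a threshold x0 arrive as a Poisson process with
# rate k_ab * (c / x0)**alpha, and conditional on occurring they are Pareto(alpha)
# distributed above x0.  Parameter values are illustrative.
import numpy as np
from math import gamma, sin, pi

alpha, beta, c, x0, T = 1.7, 0.0, 0.01, 0.05, 250.0

k_ab = (1 + beta) * gamma(alpha) * sin(pi * alpha / 2) / pi        # eq. (11)
rate = k_ab * (c / x0) ** alpha                                    # eq. (12), per unit time

rng = np.random.default_rng(7)
n_jumps = rng.poisson(rate * T)                                    # number of jumps > x0 in [0, T]
jump_sizes = x0 * (1 - rng.uniform(size=n_jumps)) ** (-1 / alpha)  # Pareto(alpha) above x0, eq. (13)
jump_times = np.sort(rng.uniform(0, T, n_jumps))

print(f"expected jumps > {x0}: {rate * T:.2f}, simulated: {n_jumps}")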
2.3. Multivariate stable distributions

Multivariate stable distributions are in general much richer than MV normal distributions. This is because "iid" and "spherical" are not equivalent for α < 2, and because MV stable distributions are not in general completely characterized by a simple covariation matrix as are MV normal distributions. If x₁ and x₂ are iid stable with α < 2, their joint distribution will not have circular density contours. Near the center of the distribution the contours are nearly circular, but as we move away from the center, the contours have bulges in the directions of the axes (Mandelbrot 1963b: 403).
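This axis-directed behaviour of iid stable pairs is easy to see by simulation. The sketch below assumes scipy is available, uses an illustrative α, and compares, for stable and Gaussian pairs, the share of the most extreme observations that lie close to a coordinate axis.

# For iid symmetric stable components with alpha < 2, the largest observations
# tend to be large in one coordinate only, producing the axis-directed bulges
# described in the text; a Gaussian pair shows no such effect.  The alpha below
# is illustrative, and scipy's default parameterization is used.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(3)
n, alpha = 20_000, 1.5
x = levy_stable.rvs(alpha, 0.0, size=(n, 2), random_state=rng)
g = rng.normal(size=(n, 2))

def axis_share(z, q=0.99):
    """Among points with large norm, the share lying within 10 degrees of an axis."""
    r = np.hypot(z[:, 0], z[:, 1])
    big = z[r > np.quantile(r, q)]
    ang = np.abs(np.arctan2(big[:, 1], big[:, 0]))
    near_axis = np.minimum(ang % (np.pi / 2), np.pi / 2 - ang % (np.pi / 2)) < np.radians(10)
    return near_axis.mean()

print("stable:", axis_share(x), "gaussian:", axis_share(g))   # stable share is far higher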



Let z be an m x 1 vector of iid stable r a n d o m variables, each of whose components is S(a, 1, 1,0), and let A = (a/j) be a d m matrix of rank d _< m. The d x 1 vector x = Az then has a d-dimensional M V stable distribution with atoms in the directions of each of the columns aj of A. I f any two of these columns have the same direction, say a2 = 2al for some 2 > 0, they may, with no loss of generality, be merged into a single column equal to (1 + 2~)l/~at, by (5) and (6). Each a t o m will create a bulge in the joint density in the direction o f aj. If the columns come in pairs with opposite directions but equal norms, x will be SS. The (discrete) speetral representation represents aj as cjsj, where cj = [[aj[[ and sj = aj/ej is the point on the unit sphere S a c R a in the direction of aj. Then x m a y be written
m

x = Z
j-1

cjsjzj,

(14)

and for e 1 has log c.f.


m

log Ee x't = Z 7j0~1 (s)t) , j=l

(15)

where 7j = c 7 . 5 The m o s t general M V stable distributions m a y be generated by contributions coming from all conceivable directions, with some or even all of the cj in (14) infinitessimal. Abstracting f r o m location, the log c.f. m a y then be written log Ee i'e' = fssa 0~1 (s't)r(ds) , where F is a finite spectral measure defined on the Borel subsets of Sd. In the case d = 2, (16) m a y be simplified to (16)

logEe ix'' =

f0 2=

O~l(s'ot)dr(O) ,

(17)

where so = (cos 0, sin 0)' is the point on the unit circle at angle 0 and F is a nondecreasing, left-continuous function with F(0) = 0 and F(2rc) < oo. (Cp. Hardin, S a m o r o d n i t s k y and T a q q u 1991: 585; Mittnik and Rachev 1993b: 355-56; W u and C a m b a n i s 1991: 86.) Such a r a n d o m vector x = (xl, x2)' m a y be constructed f r o m a maximally positively skewed (fl = 1) e-stable L~vy m o t i o n {(0), whose iid increments d{(O) have zero drift and scale (dO) I/~, by

X=)o

f2=

so

(dr(O)),/=d{(O)

(,8)

5 Because the 6 of (3) is not additive for e = 1, fl 0 (see (8)), the formulas in this section require modification in this special case.



(Cp. Modarres and Nolan 1994.) This integrand has the following interpretation: If F'(O) exists, 0 contributes so (F'(O))1/~d~(O) to the integral; if F instead jumps by AF at 0,0 contributes an atom so(AF)l/~Zo, where Z o = ( d O ) - / d ~ ( O ) ~-, S(a, 1, 1, 0) is independent of d~(O') for all 0' 0. If x has such a bivariate stable distribution, and a = (al, a2)' is a vector of constants,

a'x =

fo

(al cos 0 + a2 sin 0) (dF(O))'/~d~(O)

(ao)V

(19)

is univariate stable. By (5) and (6), a'x will have scale determined by

c~(a'x) =

f0

]al cos 0 + a2 sin ol dr(o) .

(20)

M. Kanter (as reported by Hardin et al. 1991) showed in 1972 that if dF is symmetric and e > 1, E(x21x,) = K2,,x, , where, setting x {a) = sign(x)Ix] a ,
1 f02*tsin 0(COS O)(=-l)dF(O) , ~C2,1-- C=(Xl)

(21)

(22)

(xl) =

f0

2~
Icos

O?dr(O)

(23)

The integral in (22) is called the covariation of x2 onxl. Hardin et al. (1991) demonstrate that if dF is asymmetrical, E(x2]xl) is non-linear in xl, but still is a simple function involving this 2,1. They note that (21) may be valid in the symmetric cases even for a < 1. If dF, and therefore the distribution of x, is symmetric, ~ 1 (s't) in (16) and (17) may be replaced by ~t~o(S't ) = -Is't] ~, and d~(O) in (18) taken to be symmetric. In this case, the integrals may be taken over any half ofSa, provided F is doubled. One particularly important special case of MV stable distributions is the elliptical class emphasized by Press (1982: 158, 172-3). 6 If dF(s) in (16) simply equals a constant times ds, all directions will make equal contributions to x. Such a distribution will, after appropriate scaling to give the marginal distribution of each component the desired scale, have spherically symmetrical joint density f(x) = ~b~d(r), for some function ~9~d(r ) depending only on r = [Ixll, ~, and the dimensionality d of x. The log c.f. of such a distribution must be propor-

6 The particular case presented here is Press's "order m"= 1. His higher order cases (with his m > 1) are not so useful. In (1972), Press asserted that these were the most general MV symmetric stable distributions, but in (1982: 158) concedesthat this is not the case.

400 tional to 0~0(lltll)

J. H. McCulloch

= -(ft) ~/2. Such a spherical stable distribution is also called isotropic. Press prefers to select the scale factor for spherical M V stable distributions in such a way that in the standard spherical n o r m a l case, the variance o f each c o m p o n e n t is unity. The univariate counterpart of this would be to replace c in (3) by a/21/~. If this is done, the normalized scale a then equals 2~/~c, and equals the standard deviation when ~ = 2. 7 Accordingly, Press specifies what we call the standard normalized spherical stable log c.f. to be

log Ee ix't = 0~0(lltll)/2 = - ({t)~/2/2 .

(24)

In the case d = 2 of (17) and (18), the requisite constant value o f d F is, by (23),

dF(O)=

(/?
2

I cos~l~do)

)'

dO.

I f z has such a d-dimensional spherical stable distribution, and x = Hz for some non-singular d x d matrix H, then x will have a d-dimensional (normalized) elliptical stable distribution with log c.f. log E exp (ix't) = - ( t' ~,t) ~/2/ 2 and joint density

(25)

f(x) =

IZl-l/%d((X'Z-lx)1/2)

(26)

where I; = (aij) = H I [ . C o m p o n e n t xi of x will then have normalized scale a(xi) = tr]i/2 = 21/~c(x~). Y, thus acts m u c h like the M V n o r m a l covariance matrix, which indeed it is for e = 2. F o r e > 1, E(xilxj) exists and equals (agj/trjj)xj. 8 I f ~ is diagonal, the c o m p o n e n t s of x will be uncorrelated, in the sense E(x~lxs) = 0, but not independent unless e = 2. A symmetric stable r a n d o m variable C with distribution S(e, 0, c, 0) m a y be obtained as the p r o d u c t BA 2/~, where A is distributed S(~/2, 1, c*, 0) and B is distributed S(2, 0, c, 0), with c * = (cos(Tzc~/4)) 2/~ ( S a m o r o d n i t s k y and T a q q u 1994: 20-21). F u r t h e r m o r e , if B is a spherically distributed d-vector whose c o m p o n e n t s are S(2, 0, c, 0), then C is also a spherically distributed d-vector, with c o m p o n e n t s that are marginally S(e, 0, c, 0). Setting e(llCII < r) = P([[B[[A2/=< r) then implies that our density generating function m a y be c o m p u t e d f r o m a maximally skewed univariate stable density (see McCulloch and Panton, in press) as

7 Ledoux and Talagrand (1991: 123) in effect make this substitution in the univariate case. We follow the traditional parameterization here, except in the MV elliptical case. 8 Wu and Cambanis (1991) demonstrate that var(xilxj) actually exists in cases like this.

Financial applications of stable distributions

401

cp~(r)

2c.(4~c2)a/2

exp -

x~/2-1s~/2,l(x~/2/c*)dx

(27)

where c = 2 -1/~ for the Press normalization. (See also Zolotarev (1981))

3. Stable portfolio theory


Tobin (1958) noted that preferences over probability distributions for wealth w can be expressed by a two-parameter indirect utility function if all distributions under consideration are indexed by these two parameters. He further demonstrated that if utility U(w) is a concave function of wealth and this two-parameter class is affine, i.e. indexed by a location and scale parameter like the stable 6 and c, the indirect utility function V(6, c) generated by expected utility maximization must be quasi-concave, while the opportunity sets generated by portfolios of risky assets and a risk-free asset will be straight lines. Furthermore, if such a twoparameter affine class is closed under addition, convex portfolios of assets will be commensurate using the same quasi-concave indirect utility function. I f the class is symmetrical, even non-convex portfolios, with short sales of some assets, may be thus compared. The normal distribution of course has this closure property, as do all the stable distributions (Samuelson 1967). 9 F a m a and Miller (1972: 259-74, 313-319) show that the conclusions of the traditional Capital Asset Pricing Model (CAPM) carry over to the special class of MV SS distributions in which the relative arithmetic return Ri = (Pi(t+ 1) -Pi(t))/Pi(t) on asset i is generated by the " m a r k e t model":

Ri = ai + biM + gi ,

(28)

where ai and bi are asset-specific constants, M ~ S(~, 0, 1,0) is a market-wide factor affecting all assets, and ei ~ S(~, 0,ci, 0) is an asset-specific disturbance independent of M and across assets. Under (28), the returns R = (R1 .... RN)' on N assets have an N + 1-atom MV SS distribution of form (14), generated by

R=a+(b

IN)(M)

(29)

where a = (al,... aN)', etc. This distribution has N symmetrical atoms aligned with each axis, along with an N + 1st extending into the positive orthant. F M show that when ~ > 1, diversification will reduce the effect of the firmspecific risks, as in the normal case, though at a slower rate. They note that if two different portfolios of such assets are mixed in proportions x and ( l - x ) , the scale

90wen and Rabinovitch (1983) show that the general class of elliptical distributions also shares this property. However, except for the elliptical stable distributions, these cannot arise from the accumulation of iid shocks, and have no compellingrationale.



of the mixed portfolio will be a strictly convex function of x and therefore (providing the two portfolios have different mean returns) of its mean return. On the efficient set of portfolios, where mean is an increasing function of scale, maximized mean return will therefore be a concave function of scale, as in the normal case. Given Tobin's quasi-concavity of the indirect utility function, a tangency between the efficient frontier and an indirect utility indifference curve then implies a global expected utility maximum for an individual investor. When trading in an artificial asset paying a riskless real return RU is introduced, all agents will choose to mix positive or negative quantities of the riskfree asset with the market portfolio, as in the normal case. Letting 0 = (01~... ON)' represent the shares of the N assets in the market portfolio, the market return will be given by,
Rm = OrR = a,n + b , m + e m ,

(30)

where a,, = O'a, b,, = O'b, and em= 0'e. Thus, (Rm, Ri) t will have a three-atom BV SS distribution generated by

IRml Ibm l O l l ( Ri = b i 0 1
where e~ = e m -

M) e~ ,

(31)

Oiei.

The variability of R m will be given by (32)

cct(em) = b~ -~ c:(em) ,

where c~(em) = ~ O~c~ is the contribution of the firm-specific risks to the risk of the market portfolio. The conventional CAPM predicts that the prices of the N assets, and therefore their rnean returns ai, will be determined by the market in such a way that

ERi - Rf = (ERm - Rf)flCAP M ,

(33)

where the CAPM "fl" (not to be confused with the stable "fl") is ordinarily computed as
flCAPM = cov(Ri, Rm)/var(Rm)

(34)

This variance and covariance are both infinite for e < 2. However, F M point out that the market equilibrium condition in fact only requires a) that the market portfolio be an efficient portfolio and therefore minimize its scale given its mean return, and b) that in (E(R), e(R)) space, the slope of the efficient set at the market portfolio equal (ERm - R f ) / c ( R m ) . They note that these in turn imply (33), with
flCAPM --

10e(R,n) c(gm ) O0i

(35)

In the finite variance case, (35) yields (34), but the variance and covariance are in fact inessential.



In the market model of (28), F M show that (35) becomes 1


ct-1 ct flCAPM = bib~n-1 + Oi ei

(36)

As 0 i .L O, c(Rm) .L bin, and hence flCAPM --+ bi/bm. FM did not explore more general MV stable distributions, other than to suggest (p. 269) adding industryspecific factors to (28). Press (1982:379-81) demonstrates that portfolio analysis with elliptical MV stable distributions is even simpler than in the multi-atom model of FM. Let R ER have a normalized elliptical stable distribution with log c.f. (25) and N x N covariation matrix I2. Then the 2 x 2 covariation matrix I2" of (Rm, Ri) t will be
=

',,',=
i th

4)

e;

z ( o ei)

'

(37)

where ei is the

unit N-vector. It can easily be shown that (35) implies

flCAPM =

aimla~

(38)

In the general symmetric MV stable case, not considered by either Fama and Miller or Press, x = (Rm - ERm, Ri - ERi)' will have a bivariate symmetric stable distribution of the type (17). It then may readily be shown that the Fama-Miller rule (35) implies
flCAPM ~-

Kim ,

(39)

where Xim = E(R~ - ERiIRm -- ERm)/(Rm - ERm) is as given by Kanter's formula (22) above. This generalized formulation of the stable CAPM was first noted by Gamrowski and Rachev (1994, 1995). The possibility that e < 2 therefore adds no new difficulties to the traditional CAPM. However, we are still left with its original problems. One of these is that it assumes that there is a single consumption good consumed at a single point in time. If there are several goods with variable relative prices, or several points in time with a non-constant real interest rate structure, there may in effect be different CAPM fi's for different types of consumption risk. A second problem with the CAPM is that if arithmetic returns have a stable distribution with e > 1 and c > 0, there is a positive probability that any individual stock price, or even wealth and therefore consumption as a whole, will go negative. Ziemba (1974) considers restrictions on the utility function that will keep expected utility and expected marginal utility finite under these circumstances, but a non-negative distribution would be preferred, given free disposal and limited liability, not to mention the difficulty of negative consumption. A further complication is that it is more reasonable to assume that relative, rather than absolute, arithmetic returns are homoskedastic over time. Yet if relative onel0 This follows immediately from their (7.51), when the "efficient portfolio" considered there is the market portfolio.



period arithmetic returns have any iid distribution, then over multiple time periods they will accumulative multiplicatively, not additively as required to retain a stable distribution. A normal or stable distribution for logarithmic asset returns, log(Pi(t+ 1) /Pi(t)), keeps asset prices non-negative, and could easily arise from the multiplicative accumulation of returns. However, the log-normal or log-stable is no longer an affine two-parameter class of distributions, and so Tobin's demonstration of the quasi-concavity of the indirect utility function may no longer be invoked. Furthermore, while the closure property of stable distributions under addition implies that log-normal and log-stable distributions are closed under multiplication, as may take place for an individual stock over time, it does not imply that they are closed under addition, as takes place under portfolio formation. A portfolio of log-normal or log-stable stocks therefore does not necessarily have a distribution in the same class. As a consequence, such portfolios may not be precisely commensurate in terms of any two-parameter indirect utility function, whether quasi-concave or not. Conceivably, two random variables might have a joint distribution with logstable marginals, whose contours are somehow deformed in such a way that linear combinations of them are nevertheless still log-stable. However, Boris Mityagin (in McCulloch and Mityagin 1991) has shown that this cannot be the case if the log-stable marginal distributions have finite mean, i.e. e = 2 or /3 = - 1 . This result makes it highly unlikely that the infinite mean cases would have the desired property, either. In the Gaussian case, the latter set of problems has been avoided by focussing on continuous time Wiener processes, for which negative outcomes may be ruled out by a log-normal assumption, but for which instantaneous logarithmic and relative arithmetic returns differ only by a drift term governed by It6's lemma. With e < 2, however, the discontinuities in continuous-time stable processes make even instantaneous logarithmic and relative arithmetic returns behave fundamentally differently. It therefore appears that the stable CAPM, like the Gaussian CAPM, provides at best only an approximation to the equilibrium pricing of risky assets. There is, after all, nothing in theory that guarantees that asset pricing will actually have the simplicity and precision that was originally sought in the two-parameter asset pricing model.
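On the practical side, the covariation-based beta in (39) can be estimated without second moments. A standard moment identity for jointly symmetric stable variables states that the covariation ratio equals E[R_i sign(R_m)|R_m|^{p−1}]/E|R_m|^p for suitable p between 1 and α, so a sample analogue with a user-chosen p serves as an estimator. The sketch below, which assumes scipy is available, illustrates this on returns simulated from a one-factor market model in the spirit of (28), with p = 1.2 chosen arbitrarily below the assumed α.

# Moment estimator of the stable-CAPM beta in (39): a sample analogue of the
# covariation ratio, with an assumed p in (1, alpha).  Returns are simulated
# from a one-factor market model as an illustration.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(11)
T, alpha, b_i = 2500, 1.7, 0.8
M = levy_stable.rvs(alpha, 0.0, size=T, random_state=rng)              # market-wide factor
e = levy_stable.rvs(alpha, 0.0, scale=0.5, size=T, random_state=rng)   # idiosyncratic term
Rm = M                                                                 # market return proxy
Ri = b_i * M + e                                                       # asset i return

p = 1.2                                                                # any p with 1 < p < alpha
beta_hat = np.sum(Ri * np.sign(Rm) * np.abs(Rm) ** (p - 1)) / np.sum(np.abs(Rm) ** p)
print(beta_hat)                                                        # should be close to b_i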

4. Log-stable option pricing 11


An option is a derivative financial security that gives its owner the right, but not the obligation, to buy or sell a specified quantity of an underlying asset at a contractual price called the striking price or exercise price, within a specified period of time. An option to buy is a call option, while an option to sell is a put
11 This section draws heavily on, and supplants, McCulloch (1985b).



option. I f the option may only be exercised on its maturity date it is said to be European, while if it may be exercised at any time prior to its final maturity it is said to be American. In practice, most options are "American," but " E u r o p e a n " options are easier to evaluate, and under some circumstances the two will have equal value. Black and Scholes (BS; 1973) find a precise formula for the value of a European option on a stock whose price on maturity has a log-normal distribution, by means of an arbitrage argument involving the a.s. everywhere continuous path of the stock price during the life of the option. Merton (1976) noted early on that deep-in-the money, deep-out-of-the money, and shorter maturity options tend to sell for more than their BS predicted value. Furthermore, if the BS formula were based on the true distribution, implicit volatilities calculated from it using synchronous prices for otherwise identical options with different striking prices would be constant across striking prices. In practice, the resulting implicit volatility curve instead often bends up at the ends, to form what is often referred to as the volatility smile (Bates 1996). This suggests that the market, at least, believes that large price movements have a higher probability, relative to small price movements, than is consistent with the log-normal assumption of the BS formula. The logic of the BS model cannot be adapted to the log-stable case, because of the discontinuities in the time path of an a-stable L~vy process. 12 Furthermore if the log stock price is stable with ~ < 2 and/~ > - 1, the expected payoff on a call is infinite. This left Paul Samuelson (as quoted by Smith 1976: 19) "inclined to believe in [Robert] Merton's conjecture that a strict L6vy-Pareto [stable] distribution on log(S*/S) would lead, with 1 < ~ < 2, to a 5-minute warrant or call being worth 100 percent of the c o m m o n . " Merton further conjectured (1976: 127n) that an infinite expected future price for a stock would require the risk free discount rate to be infinite, in order for the current price to be finite. We show below that these fears are unfounded, even in the extreme case ~ < 1. Furthermore, the value of European options under generalized log-stable uncertainty may be evaluated using fundamental expected utility maximization principles, rather than the BS arbitrage argument or even risk-neutrality.
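The smile diagnostic mentioned above is straightforward to compute. The sketch below backs out Black-Scholes implied volatilities across striking prices from a forward-form (Black 1976 style) call formula by root finding; the quoted prices are purely hypothetical and serve only to show how a non-flat implied-volatility curve arises.

# Implied-volatility "smile" diagnostic: under the log-normal model the implied
# volatility backed out at each striking price would be flat in X/F.  The quoted
# option prices below are purely illustrative.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call_forward(F, X, sigma, r, T):
    """Black-Scholes value of a European call written on the forward price F."""
    d1 = (np.log(F / X) + 0.5 * sigma ** 2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return np.exp(-r * T) * (F * norm.cdf(d1) - X * norm.cdf(d2))

def implied_vol(price, F, X, r, T):
    return brentq(lambda s: bs_call_forward(F, X, s, r, T) - price, 1e-4, 5.0)

F, r, T = 100.0, 0.05, 0.25
strikes = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
quotes = np.array([20.1, 11.0, 4.4, 1.5, 0.6])      # hypothetical market quotes

for X, q in zip(strikes, quotes):
    print(X, round(implied_vol(q, F, X, r, T), 3))  # a non-flat pattern is the "smile"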

4.1. Spot and forward asset prices


Let there be two assets, AI and A2, that give a representative household utility U(A1, A2), with marginal utilities Ul and U2. Let

Sr = U2/U~

(40)

12 Rachev and Samorodnitsky (1993) attempt to price a log-symmetricstable option, using a hedging argument with respect to the directionsof the jumps in an underlying e-stable L6vymotion, but not with respect to their magnitudes. Furthermore, their hedge ratio is computed as a function of the still unobserved magnitude of the jumps. These drawbacks render their formula less than satisfactory, even apart from its difficulty of calculation. Jones (1984) calculates option values for a compound jump/diffusion process in which the jumps, and therefore the process, have infinite variance, but this is neither a stable nor a log-stable distribution.

406

J. H. McCulloch

be the random spot price of A2 in terms of A1 at future time T. If log U1 and log U2 are both stable with a common characteristic exponent, then log Sr will also be stable, with the same exponent. It will be apparent from context whether " S " represents the spot price of a security, as generally used in the option pricing literature, or a stable c.d.f. Let F be the forward price in the market at present time 0 on a contract to deliver 1 unit of A2 at time T, with unconditional payment of F units of A1 to be made at time T. The expected utility from a position of size E in this contract is EU(A1 - eF, A2 ~). Maximizing over e and imposing the equilibrium condition e = 0 yields
F = EU2/EU1 .

(41)

The expectations in (41) are both conditional on present (time 0) information. In order for the EUi to be finite when the log Ui are stable with e < 2, the latter must both be maximally negatively skewed, i.e. have/~ = - 1 , per (9). We presently see no alternative but to make this assumption in order to evaluate logstable options. However, this restriction does not prevent log S r from being intermediately skew-stable, or even SS, since log ST may receive an upper Paretian tail from U2, as well as a lower Paretian tail from U1, and have intermediate skewness governed by (7). Let u~ ~ S(e, +1, cl, 6l) and u2 ~ S(c~, +1, C2, 62) be independent asset-specific maximally positively skewed stable variates contributing negatively to log U1 and log U2, respectively. In order to add some generality, let u3 ~ S(e, +1, c3,63) be a common component, contributing negatively and equally to both log U~ and log U2, and which is independent of ul and u2, so that log Ul = - u l log U2 =
-u2 u3 ,

(42) (43)

u3

Let (c~,//, c, 6) be the parameters of logSr = ul - u2 (44)

We assume that e,/3, c, and F are known, but that 6, cl, c2, e3, 61,62, and 63 are not directly observed. We have, by (5)-(8),
6 = 61 -- 6 2 , C( 1 ,

(45)
(46)

c ~' Z l

~' C 2 ~'
C2

1 -/ c~~ C C

(47)

We will return to the case ~ = 1, but for the moment assume ~ 1. Equations (46) and (47) may be solved for

Financialapplications of stabledistributions
Cl = ((1 +
2 = ((1 -

407

fl)/2)1/% ,

(48)

Using Zolotarev's formula (9) and setting 0 = rca/2, we have


EUi = e -6'-63-(cT+c~)see ,

i = 1,2 ,

(49)

so that (41) gives us


F = e ~-~2+(c~-~) see o = e6+~c see0

(50)

Iffl = 0 (because ca = c2), (50) implies l o g F -----"ElogSr. This special case does not require logarithmic utility, but only that U1 and U2 make equal contributions to the uncertainty of St.
4.2. Option pricing

Let C be the value, in units of A1 to be delivered unconditionally at time 0, of a European call on 1 unit of asset A2 to be exercised at time T, with exercise (striking) price X. Let rl be the default-free interest rate on loans denominated in A1 with maturity T. C units of A1 at time 0 are thus marginally equivalent to C exp(rl T) units at T. If ST > X at time T, the option will be exercised. Its owner will receive 1 unit of A2, in exchange for X units of A 1. If ST <_ X, the option will not be exercised. In either event, its owner will be out the interest-augmented C exp(rl T) units of A1 originally paid for the option. In order for the expected utility gain from a small position in this option to be zero, we must have

/S
T>X

(U2-XUI)dP(U1,U2)-CerIT f
J all

Sr

Uldp(U1,U2)=O

(51)

or, using (41),


C=e-~,T[ F-

x fST>xUldP(el, u21] . f U2dP(Ul,U2)-~-~l LEU2 JsT>x


(52)

In the above, P(U1, U2) represents the joint probability distribution for U1 and U2. (52) is valid for any joint distribution for which the expectations exist. It is shown in the Appendix that for our stable model with ~ 1, (52) becomes
C = F e -r' Tc~See 011 _ Xe-rl r+c~ see 012 , (53)

where, setting Sa~l = 1 - Sal,


[1 =

f~
OQ

-c2z Sul (z)S~l c ( ( c2z + log X + tic ~ sec 0) ~el ) dz ,

(54)

408

J. H. McCulloch

Eq. (53) effectively gives C as a function C ( X , F , ~ , fl, c, r l , T ) , since cl and c2 are determined by (48), and 0 = rc~/2. Note that (5 is not directly required, since all we need to know about it is contained in F through (50). The common component of uncertainty, u3, completely drops out. Rubinstein (1976) demonstrates that (52) leads to the Black-Scholes formula when log U1 and log U2 have a general bivariate normal distribution. Eq. (53) therefore generalizes BS to the case ~ < 2. If the forward price F is not directly observed, we may use the current spot price So to construct a proxy for it if we know the default-free interest rate r2 on Az-denominated loans, since arbitrage requires
F = S o e (r'-r2)T .

(56)

The value P of a European p u t option giving one the right to s e l l 1 unit of A2 at striking price X at future time T m a y be evaluated by (53), along with the put-call parity arbitrage condition
P = C + (X - F)e -r'r

(57)

Equations (50) and (53) are valid even for ~ < 1. When a = 1, (50) and (53) become
F = e 6-(2/n)'oclgc ,

(58) , (59)

C = F e -r~ T- (2/.)c2logc2I1 -- X e -rl T - (2/~)cllogc112

where cl and c2 are as in (48), but now,


11 =
oo

e-C2Zsll(Z)S~l

ezzq-

q---(czlogc2 - cl logc~
7~

Cl

dz

(6o)
I2 =

e-~l~sll(z)Sll
O(3

((

c l z - log X -

(c2 logc2 - cl lOgcl)

)/)
c2

dz

(61)

4.3. A p p l i c a t i o n s

The stable option pricing formula (53) may be applied without modification to options on commodities, stocks, bonds, and foreign exchange rates, simply by appropriately varying the interpretation of the two assets A1 and A2.
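Before turning to the individual applications, note that two arbitrage relations recur in each of them: the forward-price proxy (56) and put-call parity (57). The helpers below implement only these two bookkeeping relations, with illustrative inputs; they do not implement the stable call formula (53) itself, which requires the maximally skewed stable distribution function.

# Two arbitrage relations used throughout this section: the forward-price proxy
# (56), F = S0 * exp((r1 - r2) * T), and put-call parity (57),
# P = C + (X - F) * exp(-r1 * T).  Inputs below are illustrative only.
import math

def forward_price(S0: float, r1: float, r2: float, T: float) -> float:
    """Forward price implied by the spot S0 and the two default-free rates, eq. (56)."""
    return S0 * math.exp((r1 - r2) * T)

def put_from_call(C: float, X: float, F: float, r1: float, T: float) -> float:
    """European put value from the call value via put-call parity, eq. (57)."""
    return C + (X - F) * math.exp(-r1 * T)

# At X = F the parity adjustment vanishes, so the put equals the call.
F = forward_price(S0=99.0, r1=0.05, r2=0.01, T=0.25)
print(F, put_from_call(C=6.78, X=F, F=F, r1=0.05, T=0.25))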

Financial applications of stable distributions

409

4.3.a. Commodities
Let A L and A2 be two consumption goods, both available for consumption on some future date T. A1 could be an aggregate of all goods other than A2. Let rl be the default-free interest rate on A l-denominated loans. Let U1 and U2 be the random future marginal utilities of A1 and A2, and suppose that log U1 and log U2 have both independent (ul and u2) and common (u3) components, as in (42) and (43). The price ST of A2 in terms of A1, as determined by (40), is then log-stable as in (44), with current forward price F as in (50). The price C of a call on 1 unit of A2 at time T is then given by (53) above. Such a scenario might, for example, arise from an additively separable C R R A utility function
U(A1,A2) z ~ l
- 1-q (A 1 _~ A~-t/), r/ >

0,

t/

(62)

with the physical endowments given by Ai = e vi+v3, i = 1,2, where vl, v2 and v3 are independent stable variates with a common e and/~ = + 1.

4.3.b. Stocks
Suppose now that there is a single good G, which serves as our numeraire, A1. Let A2 be a share of stock in a firm that produces a random amount y of G per share at T. Let rl be the default-free interest rate on G-denominated loans with maturity T. The firm pays continuous dividends, in stock, at rate 1"2, and its stock has no valuable voting rights before time T, so that one share for spot delivery is equivalent to exp(r2T) shares at T. Let Uo be the random future marginal utility of one unit of G at time T, and suppose that log Uc = - u l - u3 , log y
= Ul u2 ,

(63) (64)

where the ui ~ S(c~, +1, Ci, (~i) are independent. The marginal utility of one share is then yUa = exp(-u2 - u3), and the stock price per share using unconditional claims on G as numeraire, ST = (yUa)/UG, is as in (44) above. The forward price of one share, F = E(yUa)/E(Ua), is as in (50) above. The value of a European call on 1 share at exercise price X is then given by (53). If the forward price of the stock is not directly observed, it may be constructed from rl, r2, and the current spot stock price So by (56). Equation (64) states that to the extent there is firm-specific good news (-u2), it is assumed to have no upper Paretian tail. This means that the firm will produce a fairly predictable amount if successful, but may still be highly speculative, in the sense of having a significant probability of producing much less or virtually nothing at all. To the extent there is firm non-specific good news (ul), the marginal utility of G, given by (63), is assumed to be correspondingly reduced. De-

410

J. H. McCulloch

spite this admittedly restrictive scenario, the stock price S T c a n take on a completely general log-stable distribution, with any permissible a, r, c, or 6. Note that in terms of expected arithmetic returns, the population equity premium is infinite for a log-stable stock, unless fl = - 1 .

4.3.c. Bonds 13
N o w suppose that there is a single consumption good, G, that m a y be available at each of two future dates, T2 > T1 > 0. Let A1 and A2 be unconditional claims on one unit of G at T1 and T2, resp., and let U1 and U2 be the marginal utility of G at these two dates. Let E1 U2 be the expectation of U2 as of Tl. As of present time 0, b o t h U 1 and E1U2 are random. Assume log U1 = - U l u3 and log EiU2 = - u 2 - u3, where the ui are independently S(~, +1, ei, 6i). The price at Tl of a bond that pays 1 unit of G at/'2, B(TI, T2) = E~U2/UI, is then given by (44) above, and the current forward price F of such a bond implicit in the term structure at present time 0, F = B(0, T2)/B(O, 1"1)= EoU2/EoU1 = E0(E1U2)/ EoUI, is governed by (50) above. 14 The price of a European call is then given by (53) above, where rl is now the time 0 real interest rate on loans maturing at time T1, and " T " is replaced by/'1.

4.3.d. Foreign exchange rates 15


To the extent that real exchange rates fluctuate, they may simply be modeled as real commodity price fluctuations, as in Subsection 4.3.a above. However, the purchasing power parity (PPP) model of exchange rate movements provides an instructive alternative interpretation of the stable option model, in terms of purely

nominal risks. Let P1 and P2 be the price levels in countries 1 and 2 at future time T. Price level uncertainty itself is generally positively skewed. Astronomical inflations are easily arranged, simply by throwing the printing presses into high gear, and this policy has considerable fiscal appeal. Comparable deflations would be fiscally intolerable, and are in practice unheard of. It is therefore particularly reasonable to assume that log P1 and log P2 are both maximally positively skewed. Let ul and u2 be independent country specific components of log P1 and log P2, respectively, and let u3 be an international component of both price levels, re-

13 McCulloch (1985a) uses the results of this section, in the short-lived limit treated below, to evaluate deposit insurance in the presence of interest-rate risk. 14 This model leads to the Log Expectation Hypothesis logF =ElogB(T1,Ta) when f l - 0. McCulloch (1993) demonstrates with a counterexample that the 1981 claim of Cox et al., that this necessarily violates a no-arbitrage condition in continuous time with c~= 2, is invalid. The requisite forward price Fmay be computed as exp(rl T1 - R2T2),whereR2 is the time 0 real interest rate on loans maturing at 7"2. 15 The present subsection draws heavily on McCulloch (1987), q.v. for extensions. Eq. (12.18) of that paper contains an error which is corrected in Eq. (56) of the present paper.

Financial applications of stable distributions

411

flecting the "herd instincts" of central bankers, that is independent of both ul and u2, so that logPi = ui + u3, i = 1,2. Let ST be the exchange rate giving the time T value of currency 2 (A2) in terms of currency 1 (A1). Under PPP, ST = P1/P2 is then as given in (44) above. The lower Paretian tail of log X will give the density of X itself a mode (with infinite density but no mass) at 0, as well as a second mode (unless c is large relative to unity) near exp(ElogX). Thus log-stable distributions achieve the bimodality sought by Krasker (1980) to explain the "peso problem," all in terms of a single story about the underlying process, requiring as few as three parameters (if log-symmetric). Assuming that inflation uncertainty involves no systematic risk, the forward exchange rate F must equal E(1/P2)/E(1/P1) in order to set expected profits in terms of purchasing power equal to zero, and will be determined by (50) above. Let rl and r2 be the default-free nominal interest rates in countries 1 and 2. Then the shadow price of a European call on one unit of currency 2 that sets the expected purchasing power gain from a small position in the option equal to zero is given by (53). The forward price F may, if necessary, be inferred from the current spot price So by means of covered interest arbitrage (56).

4.3.e. Pseudo-hedge ratio


The risk exposure from writing a call on one unit of an asset can be partially neutralized (to a first-order approximation) by simultaneously taking a long forward position on

O(C exp(rl T)) _ eC~seeoi1 OF

(65)

units of the underlying asset. Unfortunately, the discontinuities leave this position imperfectly hedged if ~ < 2. At the same time, this imperfect ability to hedge implies that options are not redundant financial instruments.

4.4. Put/call inversion and in/out duality C(X, F, or, ~, c, rl, T) in equation (53) above may be written as C(X, F, ~, ~, c, rl , T) = e-r~ TFc* ( X , o~,fl, c) ,
(66)

where C~(X/F, 1, ~,/~, c) = C(X/F, ~,//, c, 0, 1) (cp. Merton 1976: 139). Similarly, the value of a put on 1 unit of A2 may be written as

P(X, F, ~, fl, c, rl , T) = e-rl r Fp* ( X , o~,fl, c) ,


where, using (57),

(67)

412

J. H. McCulloch

= c

+p-

1 .

(68)

Now a call on 1 unit of A2 at exercise price X [units A1/unit A2] is the same contract as a put on X units of A1 at exercise price 1/X [units A2/unit A1]. The value of the latter, in units of A2 for spot delivery, is XP(1/X, 1/F, ~, -[1, c, r2, 7), since the forward price measured in units of A2 is l/F, and since log 1~St has parameters c~,-[1 and c. Multiplying by the current spot price So so as to give units of A1 for spot delivery, we have the put-call inversion relationship,

C(X, F, ~, [1,c, rl, T) = SoX~ ~ ,

p/1

ff'l

~, -[1, c, r2, T

(69)

Using (57) and (68), this implies the following in/out of the money duality re-

lationship: C* ~,o~,[1, c = F
=ffC ~

\ X '~'-[1'c
,c~,-[1, c -ff+l . (70)

Puts and calls for all interest rates, maturities, forward prices, and exercise prices may therefore be evaluated from C* (X/F, ~, [1,c) for X / F >_ 1.

4.5. Numerical option values


Table 1 gives illustrative values of 100 C*(X/F,o~,fl, c). 16 This is the interestincremented value, in terms of A1, of a European call on an amount of A2 equal in value (at the forward price) to 100 units of A1. E.g., if A1 is the dollar and A2 is a stock, the table gives the value, in dollars and cents to be paid at the maturity of the option, of a call on $100 worth of stock. Panel a of Table 1 holds e and/~ fixed at 1.5 and 0.0, while c and X/Fvary. The call value declines with X/F, and increases with c. The reader may confirm that the first and last columns satisfy (70). Panels l b ~ l hold c fixed at 0.1 and allow e and [1 to vary for three values of X/ F representing "at the money" (in terms of the forward, not spot, price) with X/F = 1.0; "out of the money" but still on the shoulder of the distribution with X/F = 1.1; and "deep out of the money" with X/F = 2.0. When e = 2, [1 has no effect

16The requisite skew-stabledistributionand densitymay obtained from the tables of McCulloch and Panton (in press), though Table 1 was based on cubic interpolation off the earlier tables of DuMouchel (1971). See McCulloch (1985b) for details. Option values are tabulated extensivelyin McCulloch (1984).

Financial applications o f stable distributions

413

Table 1 IOOC*(X/F, ~, fl, c)


a) ~ = 1.5,/~ =

o.o
X/F

c 0.01 0.03 0.10 0.30 1.00

0.5 50.007 50.038 50.240 51.704 64.131

1.0 0.787 2.240 6.784 17.694 45.642

1.1 0.079 0.458 3.466 14.064 43.065

2.0 0.014 0.074 0.481 3.408 28.262

b) e = O . 1 , X / F - 1.0

-1.0 2.0 1.8 1.6 1.4 1.2 1.0 0.8 5.637 6.029 6.670 7.648 9.115 11.319 14.685

-0.5 5.637 5.993 6.523 7.300 8.455 10.200 12.893

0.0 5.637 5.981 6.469 7.157 8.137 9.558 11.666

0.5 5.637 5.993 6.523 7.300 8.455 10.200 12.893

1.0 5.637 6.029 6.670 7.648 9.115 11.319 14.685

e) c = O . 1 , X / F = 1.1

-1.0 2.0 1.8 1.6 1.4 1.2 1.0 0.8 2.211 2.271 2.499 2.985 3.912 5.605 8.596

-0.5 2.211 2.423 2.772 3.303 4.116 5.391 7.516

0.0 2.211 2.590 3.123 3.870 4.943 6.497 8.803

0.5 2.211 2.764 3.510 4.530 5.957 8.002 11.019

1.0 2.211 2.944 3.902 5.175 6.924 9.410 13.067

d) e = O . 1 , X / F -- 2.0

P
a 2.0 1.8 1.6 1.4 1.2 1.0 0.8 -1.0 0.000 a 0.000 0.000 0.000 0.000 0.000 0.000 -0.5 0.000 a 0.055 0.160 0.351 0.691 1.287 2.333 0.0 0.000 a 0.110 0.319 0.695 1.354 2.488 4.438 0.5 0.000 ~ 0.165 0.477 1.032 1.991 3.619 6.372 1.0 0.000 ~ 0.220 0.634 1.361 2.604 4.689 8.164

Note: aActual value 1.803 x 10-6 rounds to 0.000.

414

J. H. McCulloch

on the option value, even though the underlying story in terms of the two marginal utilities is changingJ 7 Implicit parameter values may be numerically computed from market option values by means of the stable option formulas above. If fl is assumed to be 0, this may be done by using the synchronous prices of two otherwise identical options with different striking prices. McCulloch (1987) shows, using actual quotations on the DM for 9/17/84, how this may be done graphically. The rounding error in the two quotations used accommodated a range of (1.766, 1.832) for e, and a range of (0.0345, 0.0365) for c. The market clearly did not believe the DM was log-normal on this arbitrarily chosen date. If asymmetry is not assumed away, three option values may be used to calculate implicit values of e, fl, and c.

4.5. Low probability and short-lived options


Assume X > F and that c is small relative to log(X/F). Holding fl constant, el and c2 are then small as well. Equation (2) then implies (see McCulloch 1985b for details) that the call value C behaves like
Fe

rlTcCt(1 -]- fl) gt(e,X/F) ,

(71)

where

7 t ( e ' x ) - F ( e ) s~i n O [ e( l g (x ) - ~ ( -~C ni o g(x)

~ld(]

"

(72)

This function is tabulated in some detail in Table 2. It becomes infinite as x + 1, and 0 as e T 2. By the put/call inversion formula (69) (with the roles of C and P reversed), P behaves like

Xe -r'r e~(1 - fl) ~P(e,F/X) .

(73)

In an a-Stable L6vy Motion, the scale that accumulates in T time units is

CoT 1/~. As T .L 0, the forward price F converges on the spot price So. Therefore lira(C/T) = S0(1 + fl)e~g~(e,X/So)
r,~o

(74) (75)

lim(P/T) = X(1 - fl)e~ 7"(e, So~X) .


T+0

Eq. (75) has been employed by McCulloch (1981, 1985a) to evaluate the put option implicit in deposit insurance for banks and thrifts that are exposed to

17 The values for : = 2 reported here were, as a check, computed independently by the same numerical procedure used to obtain the sub-Gaussian values, and then checked against the BlackScholes formula, with a - cv/2. Using the approximation 1 N(x) ~ n ( x ) / x for large x, the BS formula becomes C* = N ( d t ) - X N ( d 2 ) F ~ 6 N ( d l ) / ( d l d 2 ) for large values of l o g ( X / F ) / c , where dl = - - l o g ( X / F ) / a + a/2, d2 = dl - ~r, n(x) = N'(x), and F is determined by (56).

Financial applications of stable distributions


Table 2 x = x/F 1.001 1.01 1.02 1.04 1.06 1.10 .000 .190 .343 .560 .688 .753 .777 .774 .753 .724 .689 .654 .619 1.15 .000 .124 .227 .382 .484 .547 .582 .596 .597 .589 .575 .558 .541 1.20 .000 .091 .169 .291 .376 .434 .471 .492 .503 .505 .502 .496 .489 1.40 .000 .043 .082 .149 .203 .246 .280 .306 .327 .343 .356 .366 .375 2.00 4.00 .0000 .0168 .0329 .0633 .0914 .1172 .1411 .1634 .1842 .2039 .2227 .2411 .2592 .0000 .0062 .0126 .0256 .0391 .0531 .0676 .0827 .0985 .1150 .1325 .1511 .1710

415

10.00 .0000 .0028 .0059 .0125 .0199 .0282 .0375 .0479 .0594 .0723 .0868 .1031 .1215

2.00 0.00 0.000 0.000 0.000 0.000 1.95 18.10 1.962 0.989 0.492 0.324 1.90 26.43 3.199 1.665 0.854 0.573 1.80 28.38 4.275 2.369 1.291 0.896 1.70 23.13 4.319 2.544 1.471 1.056 1.60 17.01 3.916 2.448 1.498 1.112 1.50 11.93 3.365 2.227 1.441 1.103 1.40 8.22 2.812 1.966 1.341 1.059 1.30 5.65 2.319 1.707 1.225 0.995 1.20 3.92 1.904 1.471 1.106 0.923 1.10 2.77 1.567 1.266 0.995 0.852 1.00 2.02 1.300 1.092 0.894 0.784 0.90 1.51 1.090 0.949 0.806 0.722

interest rate risk, using SS M L estimates o f the p a r a m e t e r s o f r e t u r n s on U.S. T r e a s u r y securities to q u a n t i f y p u r e interest rate risk.

5. Paramcter estimation and empirical issues


I f ~ > 1, O L S p r o v i d e s a consistent e s t i m a t o r o f the stable l o c a t i o n p a r a m e t e r 6. H o w e v e r , it has an infinite v a r i a n c e stable d i s t r i b u t i o n with the s a m e ~ as the o b s e r v a t i o n s , a n d has 0 efficiency. F u r t h e r m o r e , e x p e c t a t i o n s proxies b a s e d on a false n o r m a l a s s u m p t i o n will generate s p u r i o u s evidence o f i r r a t i o n a l i t y if the true d i s t r i b u t i o n is stable with ~ < 2 (Batchelor 1981).

5.1. Univariate stable parameter estimation


D u M o u c h e l (1973) d e m o n s t r a t e s t h a t M L m a y be used to estimate the f o u r stable p a r a m e t e r s , a n d t h a t the M L estimates have the usual a s y m p t o t i c n o r m a l i t y g o v e r n e d b y the i n f o r m a t i o n m a t r i x , except in the n o n - s t a n d a r d b o u n d a r y cases = 2 a n d / ~ = zkl. I n (1975), he t a b u l a t e s the i n f o r m a t i o n m a t r i x , which m a y be used for a s y m p t o t i c h y p o t h e s i s testing except in the b o u n d a r y cases where, as he p o i n t s out, M L is actually super-efficient. M o n t e C a r l o critical values o f the l i k e l i h o o d r a t i o for the n o n - s t a n d a r d null h y p o t h e s i s ~ = 2 with a s y m m e t r i c stable a l t e r n a t i v e have been t a b u l a t e d b y M c C u l l o c h (in press a). D u M o u c h e l (1983) suggests t h a t the M L e s t i m a t o r o f c~is b i a s e d d o w n w a r d s when the true c~is n e a r 2.00, b u t this is n o t b o r n e o u t ( a p a r t f r o m the effect o f the ~ _< 2 b o u n d a r y restriction) in larger s a m p l e s i m u l a t i o n s r e p o r t e d b y M c C u l l o c h (in press a). In the SS cases, the n u m e r i c a l a p p r o x i m a t i o n o f M c C u l l o c h (1994b) p e r m i t s fast c o m p u t a t i o n o f the l i k e l i h o o d w i t h o u t r e s o r t i n g to the b r a c k e t i n g p r o c e d u r e

416

J. H. McCulloeh

proposed by DuMouchel. SS ML using an early version of this approximation was applied to interest rate data in McCulloch (1981, 1985a). Asymmetric stable ML has been performed by Stuck (1976), using the Bergstrom series, by Feuerverger and McDunnough (1981), using Fourier inversion of the log c.f., and by Brorsen and Yang (1990) and Liu and Brorsen (1995) using Zolotarev's integral representation of the stable density. See also the algorithm of Chen (1991), reported and employed by Mittnik and Rachev (1993a). ML linear regression with stable residuals has been implemented for the SS case by McCulloch (1979) and for the general case by Brorsen and Preckel (1993). Buckle (1995) and Tsionas (1995) go beyond ML to explore the Bayesian posterior distribution of stable parameters. A much simpler, but at the same time less efficient, method of estimating SS distribution parameters from order statistics was proposed by Fama and Roll (1971), and has been widely implemented. This method has been extended to the asymmetric cases, and a small asymptotic bias in the Fama-Roll estimator of c in the SS cases removed, by McCulloch (1986). A large body of work, following Press (1972), has focussed on fitting the empirical log c.f. to its theoretical counterpart (3), (4). See Paulson, Holcomb and Leitch (1975); Feuerverger and McDunnough (1977, 1981a,b); Arad (1980); Koutrouvelis (1980, 1981); and Paulson and Delehanty (1984, 1985). Practitioners report a high degree of efficiency relative to the ML benchmark. 18 Mantegna and Stanley (1995) implement a novel method of estimating the stable index from the modal density of returns at different sampling intervals. Stable parameters have been estimated for stock returns by Fama (1965), Leitch and Paulson (1975), Arad (1980), McCulloch (1994b), Buckle (1995), and Manegna and Stanley (1995); for interest rate movements by Roll (1970), McCulloch (1985), Oh (1994); for foreign exchange rate changes by Bagshaw and Humpage (1987), So (1987a,b), Liu and Brorsen (1995), and Brousseau and Czarnecki (1993); for commodities price movements by Dusak (1973), Cornew, Town and Crowson (1984), and Liu and Brorsen (in press); and for real estate returns by Young and Graft (1995), to mention only a few studies.
5.2. Empirical objections to stable distributions

The initial interest in the stable model of financial returns has undeservedly waned, largely because of two groups of statistical tests. The first group of tests is based on the observation that if daily returns are iid stable, weekly and monthly returns must be also be stable, with the same characteristic exponent. Blattberg and Gonedes (1974), and many subsequent investigators, notably Akgiray and Booth (1988) and Hall, Brorsen and Irwin (1989), have found that weekly and monthly returns typically yield higher estimates of e than do daily returns. Such 18On estimation see also Blattberg and Sargent (1971), Kadiyala (1972), Brockwelland Brown (1979, 1981), Fielitz and Roselle(1981), Cs6rg6 (1984, 1987), Zolotarev (1986: 217f0, Akgiray and Lamoureux (1987), and Klebanov, Melamedand Rachev(1994).

Financial applications of stable distributions

417

evidence has led even Fama (1976: 26-38) to abandon the stable model of stock prices. However, as Diebold (1993) has pointed out, all that such evidence really rejects is the compound hypothesis of iid stability. It demonstrates either that returns are not identical, or that they are not independent, or that they are not stable. If returns are not lid, then it should come as no surprise that they are not iid stable. It is now generally acknowledged (Bollerslev, Chou and Kroner, 1992) that most time series on financial returns exhibit serial dependence of the type characterized by A R C H or G A R C H models. The unconditional distribution of such disturbances will be more leptokurtic than the conditional distribution, and therefore would generate misleadingly low e estimates under a false iid stable assumption. Baillie (1993) wrongly characterizes A R C H and G A R C H models as "competing" with the stable hypothesis. See also Ghose and Kroner (1995), Groenendijk et al. (1995). In fact, if conditional heteroskedasticity (CH) is present, it is as desirable to remove it in the infinite variance stable case as in the Gaussian case. And if after removing it there is still leptokurtosis, it is as desirable to model the adjusted residuals correctly as it is in the iid case. McCulloch (1985b) and Oh (1994) thus fit GARCH-like and G A R C H models, respectively, to monthly bond returns by symmetric stable ML, and find significant evidence of both CH and residual non-normality. Liu and Brorsen (in press) similarly find, contrary to the findings of Gribbin, Harris and Lau (1992), that a stable model for commodity and foreign exchange futures returns cannot be rejected, once G A R C H effects are removed. Their observations apply also to the objections of Lau, Lau and Wingender (1990) to a stable model for stock price returns. De Vries (1991) proposes a potentially important class of GARCH-like subordinated stable processes, but this model has not yet been empirically implemented. Day-of-the-week effects are also well known to be present in both stock market (Gibbons and Hess 1981) and foreign exchange (McFarland, Pettit and Sung 1982) data. Whether such hebdomodalities are present in the mean or the volatility, they imply that daily data is not identically distributed. It is again as important to remove these, along with any end-of-the month effects and seasonals that may be present, in the infinite variance stable case as in the normal case. Lau and Lau (1994) demonstrate that mixtures of stable distributions with different scales tend to reduce estimates of e below its true value, whereas mixtures with different locations tend to increase estimates above the true value. A second group of tests that purport to reject a stable model of asset returns is based on estimates of the Paretian exponent of the tails, using either the Pareto distribution itself (Hill 1975), or the generalized Pareto (GP) distribution (DuMouchel 1983). Numerous investigators, including DuMouchel (1983), Akgiray and Booth (1988), Jansen and de Vries (1991), Hols and de Vries (1991), and Loretan and Phillips (1994), have applied this type of test to data that includes interest rate changes, stock returns, and foreign exchange rates. They typically have found an exponent greater than 2, and have used this to "reject" the stable model on the basis of asymptotic tests.

418

J. H. McCulloch

However, McCulloch (1994b) demonstrates that tail index estimates greater than 2 are to be expected from stable distributions with c~ greater than approximately 1.65 in finite samples of sizes comparable to those that have been used in these studies. These estimates may even appear to be "significantly" greater than 2 on the basis of asymptotic tests. The studies cited are therefore in no way inconsistent with a Paretian stable distribution. 19 Several alternative distributions have been proposed to account for the conspicuously leptokurtic behavior of financial returns. Blattberg and Gonedes (1974) and Boothe and Glassman (1987) thus propose the Student's t distributions, which may be computed for fractional degrees of freedom, and which, like the stable distributions, include the Cauchy and the normal. Others (e.g. Hall, Brorsen and Irwin 1989; Durbin and Cordero 1993) consider a mixture of normals. Boothe and Glassman (1987) find somewhat higher likelihood for the Student distribution than for either the mixture of normals or stable, but these hypotheses are not nested, so that the likelihood ratio does not necessarily have a Z2 distribution. Lee and Brorsen (1995) have had some success formally comparing such non-nested hypotheses using Cox-like tests. However, such distributions are intrinsically difficult to differentiate without extremely large samples, as noted already by DuMouchel (1973b). The choice among leptokurtic distributions may in the end depend primarily on whatever desirable properties they may have, in particular divisibility, parsimony, and central limit attributes. Cs6rg6 (1987) constructs a formal test for one aspect of stability, and fails to reject it using selected stock price data. Mittnik and Rachev (1993a) generalize the concept of "stability" beyond the stability under summation and multiplication that leads to the stable and logstable distributions, respectively, to include stability under the maximum and minimum operators, as well as stability under a random repetition of these accumulation and extremum operations, with the number of repetitions governed by a geometric distribution. They find that the Weibull distribution has two of these generalized stability properties. Since it has only positive support, they propose a double Weibull distribution (two Weibull distributions back-to-back) as a model for asset returns. This distribution has the unfortunate property that its density is, with only one exception, either infinite or zero at the origin. The sole exception is the back-to-back exponential distribution, which still has a cusp at the origin. The stable densities, on the other hand, are finite, unimodal, absolutely differentiable, and have closed support.
5.3. State-space models

Stable state-space models may be estimated using the Bayesian approach of Kitagawa (1987). When there is only one state variable, the marginal retrospective posterior (filter) distribution of the state variable and the likelihood requires 19 Mittnik and Rachev (1993b: 264-5) similarlyfind that the Wiebull distribution gives tail index estimators in the range 2.5 5.5, even though the Weibull distribution has no Paretian tail.

Financial applications of stable distributions

419

approximately mn numerical integrations with m nodes, where n is the sample size. The hyperparameters of the model may then be estimated by ML, and the marginal full sample posterior (smoother) distribution then computed by another n numerical integrations. If the disturbances are SS, the density approximation of McCulloch (1994b) makes these calculations feasible, even on a personal computer, despite the numerous iterations required by the ML step. Oh (1994) thus estimates an AR(1) time-varying term premium (the state variable) for excess returns on U.S. Treasury securities. After also adjusting for pronounced state-space GARCH effects, he finds ML ~ values ranging from 1.61 to 1.80 and LR statistics (2A log L) for the null hypothesis ~ = 2 in the range 12.95 to 25.26. These all reject normality at the 0.996 level or higher, using the critical values in McCulloch (1994b). (See also Bidarkota and McCulloch (1996)). Multiple state variables greatly increase the number of numerical integrals, and therefore the calculation time, required for Kitagawa's approach. However, the state variable may still be estimated in a reasonable amount of time by instead using the Posterior Mode Estimator approach of McCulloch (1994a, following Durbin and Cordero 1993). In many cases the hyperparameters may be estimated (though without the efficiency of full information ML) by applying pooled M L to various linear combinations of the data. Mikosch, Gadrich, Klfippelberg and Adler (1995) consider a standard ARMA process in which the innovations belong to the domain of attraction of a SS law. Since they did not have access to a numerical density approximation, they employ the Whittle estimator, based on the sample periodogram, rather than the more readily interpretable ML.
5.4. Estimation o f multivariate stable distributions

The estimation of multivariate stable distribution parameters is still in its infancy, despite the great importance of these distributions for financial theory and practice. Mittnik and Rachev (1993b: 365-66) propose a method of estimating the general bivariate spectral measure for a vector whose distribution lies in this domain of attraction. Cheng and Rachev (in press) apply this method to the $/ DM and S/yen exchange rates, with the interesting result that there is considerable density near the center of the first and third quadrants, as would be expected if a dollar-specific factor were affecting both exchange rates equally, but very little along the axes. The latter effect seems to indicate that there are negligible DM- or yen-specific shocks. Nolan, Panovska and McCulloch (1996) propose an alternative method based on ML, which uses the entire data set, whereas the Mittnik and Rachev method employs only a small subset of the data, drawn from the extreme tails of the sample. This method does not necessitate the often arduous task of actually computing the MV stable density (see Byczkowski et al., 1993; Nolan and Rajput, 1995), but relies only on the standard univariate stable density. This method expressly assumes that x actually has a bivariate stable distribution, rather than that it merely lies in its domain of attraction.

420

J. H. McCulloch

Appendix Derivation of (53) from (52) In this appendix, we let si(ui) and Si(ui) represent S(Ui;O~,+l,ci,(~i) and S(ui;~,+l,ci,~i), respectively, for i = 1 , 2 , 3 . We have S r > X whenever u2 < Ul - logX. Then, setting z = (u2 - 62)/c2 and S~ = 1 - Si, we have
oo u l - l o g X oo

f g2dP(gl'g2)= / f / e-u2 U3SI(Ul)S2(U2)S3(N3) du3du2du1 Sr>X --c~o --oo --c,o


(K~ = E e -u3 f e-U2s2(u2)

/Sl(Ul)duldu2
u2+logX

O0

-~xD
O0

= ge -u3 /e--U2sa(bla)S~(bl2 -~-l o g X ) du2


--00 OC)

= Ee-U3e -I~2 fe-C~%,(z)S~(caz + 62 +


O0

logX) dz

--00

= Ee-U3e-6211,

where, using (50), 11 is as given in (54) in the text. Similarly, but now setting z = (ul - 61)/,:1,
oo u l - l o g X 00

f UldP(UI'U2)= / I / e-u~-u3Sl(ul)S2(u2)S3(ug)dbl3du2dbll Sr>X -oo -e,o oo


oo ul - l o g X

--0(3

O0

= Ee -"3 f
--(X)

e-Uls~(ul)S2(Ul - logX) dUl

= Ee-U3e -6~ f
--00

e-C~Zs~l (z)S2(clz + 61 - logX) dz

= Ee-U3e-6U2,

where I2 is as given in (55). Substituting into (52) yields (53).

Financial applications o f stable distributions

421

Acknowledgment
T h e a u t h o r w o u l d like t o t h a n k J a m e s B o d u r t h a , S t a n l e y H a l e s , S e r g e i K l i m i n , Benoit Mandelbrot, Richard May, Svetlozar Rachev, Gennady Samorodnitsky, and Walter Torous for their comments on various aspects of this paper, and the P h i l a d e l p h i a S t o c k E x c h a n g e f o r f i n a n c i a l s u p p o r t o n S e c t i o n 4.

References
Akgiray, V. and G. G. Booth (1988). The stable-law model of stock returns. J. Business Econom. Statist. 6, 51 57. Akgiray, V. and C. G. Lamoureux (1989). Estimation of the stable law parameters: A comparative study. J. Business Econom. Statist. 7, 85-93. Arad, R. W. (1980). Parameter estimation for symmetric stable distribution. Internat. Econom. Rev. 21, 209-220. Bagshaw, M. L. and O. F. Humpage (1987). Intervention, exchange-rate volatility, and the stable Paretian distribution. Federal Reserve Bank of Cleveland Res. Dept. Baillie, R. T. (1993). Comment on modeling asset returns with alternative stable distributions. Econometric Rev. 12, 343-345. Batchelor, R. A. (1981). Aggregate expectations under the stable laws. J. Econometrics 16, 199-210. Bates, D. S. (1996). Testing option pricing models. Handbook o f Statistics. Vol. 14, Noth Holland, Amsterdam, in this volume. Bergstrom, H. (1952). On some expansions of stable distribution functions. Arkivffir Mathematik 2, 375-378. Bidarkota P. V. and J. H. McCulloh (1996). Sate-space modeling with symmetric stable shocks; The case of U.S. Inflation. Ohio Sate Univ. W.P. 96-02. Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Politic. Econom. 81 637-659. Blattberg, R. C. and N. J. Gonedes (1974). A comparison of the stable and student distributions as statistical models for stock prices. J. Business 47, 244-280. Blattberg, R. C. and T. Sargent (1971). Regression with non-Gaussian stable disturbances: Some sampling results. Econometrica 39, 501-510. Bollerslev, T., R. Y. Chou and K. F. Kroner (1992). ARCH modeling in finance. J. Econometrics 52, 5-60 Boothe, P. and D. Glassman (1987). The statistical distribution of exchange rates. J. lnternat. Econom. 22, 297-319. Brockwell, P. J. and B. M. Brown (1979). Estimation for the positive stable laws. I. Austral. J. Statist. 21, 139 148. Brockwell, P. J. and B. M. Brown (1981). High-efficiency estimation for the positive stable laws. J. Amer. Statist. Assoc. 76, 626-631. Brorsen, B. W. and P. V. Preckel (1993). Linear Regression with stably distributed residuals. Comm. Statist. Thy. Meth. 22, 659467. Brorsen, B. W. and S. R. Yang (1990). Maximum likelihood estimates of symmetric stable distribution parameters. Comm. Statist. Sim. & Comp. 19, 1459-1464. Brousseau, V. and M. O. Czarnecki (1993). Modelisation des taux de change: Le mod+le stable. Cahiers Eco & Maths, no. 93.72, Univ. de Paris I. Buckle, D. J. (1995). Bayesian inference for stable distributions. J. Amer. Statist. Assoc. 90, 605~513. Byczkowski, T., J. P. Nolan and B. Rajput (1993). Approximation of multidimensional stable densities. J. Multivariate Anal. 46, 13-31. Chambers, J. M., C. L. Mallows and B. W. Stuck (1976). A method for simulating stable random variables. J. Amer. Statist. Assoc. 71, 340-344. Corrections 82 (1987): 704, 83 (1988): 581. Chen, Y. (1991). Distributions for asset returns. Ph.D. dissertation, SUNY-Stony Brook, Econom.

422

J. H. McCulloeh

Cheng, B. N. and S. T. Rachev (in press). Multivariate stable commodities in the futures market. Math. Finance. Cornew, R. W., D. E. Town, and L. D. Crowson (1984). Stable distributions, futures prices, and the measurement of trading performance. J. Futures Markets 4, 531-557. Cs6rg6, S. (1984). Adaptive estimation of the parameters of stable laws. In: P. Rrvrsz, ed., Coll. Math. Soc. Jdnos Bolyai 36, Limit Theorem in Probability and Statistics. North Holland, Amsterdam. Cs6rgS, S. (1987). Testing for stability. In: P. Rrvrsz et al., eds., Coll. Math Soc. Jdnos Bolyai 36, Goodness-of-Fit. North Holland, Amsterdam. De Vries, C. G. (1991). On the relation between GARCH and stable processes. J. Econometrics 48, 313-324. Diebold, F. X. (1993). Comment on 'Modeling asset returns with alternative stable distributions.' Econometric Rev. 12, 339 342. DuMouchel, W. H. (1971). Stable Distributions in Statistical Inference. Ph.D. dissertation, Yale Univ. DuMouchel, W. H. (1973a). On the asymptotic normality of the maximum-likelihood estimate when sampling from a stable distribution. Ann. Statist. 1, 948457. DuMouchel, W. H. (1973b). Stable distributions in statistical inference: 1. Symmetric stable distributions compared to other long-tailed distributions. J. Amer. Statist. Assoc. 68(342): 469-477. DuMouchel, W. H. (1975). Stable distributions in statistical inference: 2. Information from stably distributed samples. J. Amer. Statist. Assoc. 70, 386-393. DuMouchel, W. H. (1983). Estimating the stable index ~ in order to measure tail thickness: A critique. Ann. Statist. 11, 1019 1031. Durbin, J. and M. Cordero (1993). Handling structural shifts, outliers and heavy-tailed distributions in state space models. Statist. Res. Div., U.S. Census. Bur. Dusak [Miller], K. (1973). Futures trading and investor returns: An investigation of commodity risk premiums. J. Politic. Econom. 81, 1387-1406. Fama, E. F. (1965). Portfolio analysis in a stable Paretian market. Mgmt. Sci. 11, 404-419. Fama, E. F. (1976). Foundations of Finance. Basic Books, New York. Fama, E. F. and R. Roll (1968). Some properties of symmetric stable distributions. J. Amer. Statist. Assoc. 63, 817-836. Fama, E. F. (1971). Parameter estimates for symmetric stable distributions. J. Amer. Statist. Assoc. 66, 331 338. Feuerverger, A. and P. McDunnough (1977). The empirical characteristic function and its applications. Ann. Statist. 5, 88-97. Feuerverger, A. (1981a). On the efficiency of empirical characteristic function procedures. J. Roy. Statist. Soc. 43B(1): 2(~27. Feuerverger, A. (1981b). On efficient inference in symmetric stable laws and processes. In: M. Cs6rg8 et al., eds., Statistics and Related Topics. North-Holland, Amsterdam. Fielitz B. D. and J. P. Roselle (1981). Method of moments estimators for stable distribution parameters. Appl. Math. Comput. 8, 303-320. Gamrowski, B. and S. T. Rachev (1994). Stable models in testable asset pricing. In: G. Anastassiou and S. T. Rachev, eds., Approximation, Probability, and Related Fields. Plenum, New York. Gamrowski, B. and S. T. Rachev (1995). A testable version of the Pareto-stable CAPM. Ecole Polytechnique and Univ. of Calif., Santa Barbara. Ghose, D. and K. F. Kroner (1995). The relationship between GARCH and symmetric stable processes: Finding the source of fat tails in financial data. J. Empirical Finance 2, 225-251. Gibbons, M. and P. Hess (1981). Day of the week effects and asset returns. J. Business 54, 579-596. Gribbin, D. W., R. W. Harris, and H. Lau (1992). 
Futures prices are not stable-Paretian distributed. J. Futures Markets 12, 475-487. Groenendijk, P. A., A. Lucas, and C. G. de Vries (1995). A note on the relationship between GARCH and symmetric stable processes. J. Empirical Finance 2, 253-264. Hall, P. (1981). A comedy of errors: The canonical form for a stable characteristic function. Bull. London Math. Soc. 13, 23-27. Hall, J. A., B. W. Brorsen, and S. H. Irwin (1989). The distribution of futures prices: A test of the stable Paretian and mixture of normals hypotheses. J. Financ. Quant. Anal. 24, 105-116.

Financial applications of stable distributions

423

Hardin, C. D., G. Samorodnitsky and M. S. Taqqu (1991). Nonlinear regression of stable random variables. Ann. Appl. Prob. 1, 582-612. Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Ann. Statist. 3, 1163-1174. Holt, D. and E. L. Crow (1973). Tables and graphs of the stable probability density functions. J. Res. Natl. Bur. Standards 77B, 143-198. Hols, M. C. A. B. and C. G. de Vries (1991). The limiting distribution of extremal exchange rate returns. J. Appl. Econometrics 6, 287-302. Janicki, A. and A. Weron (1994). Simulation and Chaotic Behavior of a-stable Stochastic Processes. Dekker, New York. Jansen, D. W. and C. G. de Vries (1991 ). On the frequency of large stock returns. Rev. Econom. Statist. 73, 18-24. Jones, E. P. (1984). Option arbitrage and strategy with large price changes. J. Financ. Econom. 13, 91 113. Kadiyala, K. R. (1972). Regression with non-Gaussian stable disturbances. Econometrica 40, 719-722. Kitagawa, G. (1987). Non-Gaussian state-space modeling of nonstationary time series. J. Amer. Statist. Assoc. 82, 103~1063. Klebanov, L. B., J. A. Melamed and S. T. Rachev (1994). On the joint estimation of stable law parameters. In: G. Anastassiou and S. T. Rachev, eds., Approximation, Prob., and Related Fields. Plenum, New York. Koedijk, K. G., M. M. A. Schafgans, and C. G. de Vries (1990). The tail index of exchange rate returns. J. Internat. Econom. 29, 93 108. Koutrouvelis, I. A. (1980). Regression-type estimation of the parameters of stable laws. J. Amer. Statist. Assoc. 75, 918-928. Koutrouvelis, I. A. (1981). An iterative procedure for the estimation of the parameters of stable laws. Comm. Statist. Sim. & Comp. B10(1), 17-28. Krasker, W. S. (1980). The "peso problem" in testing the efficiency of forward exchange markets. J. Monetary Econom. 6, 269-276. Lau, A. H. L., H. S. Lau and J. R. Wingender (1990). The distribution of stock returns: New evidence against the stable model. J. Business Econom. Statist. 8, 217-233. Lau, H. S. and A. H. L. Lau (1994). The reliability of the stability-under-addition test for the stableParetian hypothesis. 3. Statist. Comp. & Sim. 48, 67 80. Ledottx, M. and M. Talagrand (1991). Probability in Banach Spaces. Springer, New York. Lee, J. H. and B. W. Brorsen (1995). A Cox-type non-nested test for time series models. Oklahoma State Univ. Leitch, R. A. and A. S. Paulson (1975). J. Amer. Statist. Assoc. 70, 690~597. Lbvy, P. (1937). La th~orie de l'addition des variables al~atoires. Gauthier-Villars, Paris. Liu, S. M. and B. W. Brorsen (1995). Maximum likelihood estimation of a GARCH-stable model. J. Appl. Econometrics 10, 273-285. Liu, S. M. and B. W. Brorsen (In press). GARCH-stable as a model of futures price movements. Rev. Quant. Finance & Accounting. Loretan, M. and P. C. B. Phillips (1994). Testing the covariance stationarity of heavy-tailed time series. J. Empirical Finance 1, 211-248. Mandelbrot, B. (1960). The Pareto-Lrvy law and the distribution of income. Internat. Econom. Rev. 1, 79-106. Mandelbrot, B. (1961). Stable Paretian random fluctuations and the multiplicative variation of income. Econometrica 29, 517-543. Mandelbrot, B. (1963a). New methods in statistical economics. J. Politic. Econom. 71, 421440. Mandelbrot, B. (1963b) The variation of certain speculative prices. J. Business 36, 394~419. Mandelbrot, B. (1983). The Fractal Geometry of Nature. New York: Freeman. Mantegna, R. N. and H. E. Stanley (1995). Scaling behaviour in the dynamics of an economic index. 
Nature 376 (6 July), 4&49. McCulloch, J. H. (1978). Continuous time processes with stable increments. J. Business 51, 601 619.

424

J. H. McCulloch

McCulloch, J. H. (1979). Linear regression with symmetric stable disturbances. Ohio State Univ. Econom. Dept. W. P. #63. McCulloch, J. H. (1981). Interest rate risk and capital adequacy for traditional banks and financial intermediaries. In: S. J. Maisel, ed., Risk and Capital Adequacy in Commercial Banks, NBER, Chicago, 223-248. McCulloch, J. H. (1984). Stable option tables. Ohio State Univ. Econom. Dept. McCulloch, J. H. (1985a). Interest-risk sensitive deposit insurance premia: Stable ACH estimates. J. Banking Finance 9, 132156. McCulloch, J. H. (1985b). The value of European options with log-stable uncertainty. Ohio State Univ. Econom. Dept. McCulloch, J. H. (1986). Simple consistent estimators of stable distribution parameters. Comm. Statist. Sire. & Comput. 15, 1109-1136. McCulloch, J. H. (1987). Foreign exchange option pricing with log-stable uncertainty. In: S. J. Khoury and A. Ghosh, eds. Recent Developments in Internat. Banking andFinance 1. Lexington, Lexington, MA., 231-245. McCulloch, J. H. (1993). A reexamination of traditional hypotheses about the term structure: A comment. J. Finance 48, 779-789. McCulloch, J. H. (1994a). Time series analysis of state-space models with symmetric stable errors by posterior mode estimation. Ohio State Univ. Econom. Dept. W.P. 944)1. McCulloch, J. H. (1994b) Numerical approximation of the symmetric stable distribution and density. Ohio State Univ. Econom. Dept. McCulloch, J. H. (in press a). Measuring tail thickness in order to estimate the stable index ~: A critique. J. Business Econom. Statist. McCulloch, J. H. (in press b). On the parameterization of the afocal stable distributions. Bull. London Math. Soc. McCulloch, J. H. and B. S. Mityagin (1991). Distributional closure of financial portfolio returns. In: C.V. Stanojevic and O. Hadzic, eds., Proc. Internat. Workshop in Analysis and its Applications. (4th Annual Meeting, 1990). Inst. of Math., Novi Sad, 269-280. McCulloch, J. H. and D. B. Panton (in press). Precise fractiles and fractile densities of the maximallyskewed stable distributions. Computational Statistics and Data Analysis. McFarland, J. W., R. R. Pettit and S. K. Sung (1982). The distribution of foreign exchange prices: Trading day effect and risk measurement. J. Finance 37, 693 715. Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. J. Financ. Econom. 3, 125-144. Mikosch, T., T. Gadrich, C. Klfippelberg and R. J. Adler (1995). Parameter estimation for ARMA models with infinite variance innovations. Ann. Statist. 23, 305-326. Mittnik, S. and S. T. Rachev (1993a). Modeling Asset Returns with Alternative Stable Distributions, Econometric Rev. 12 (3), 261-330. Mittnik, S. and S. T. Rachev (1993b). Reply to comments on Modeling asset returns with alternative stable distributions, and some extensions. Econometric Rev. 12, 347-389. Modarres, R. and J. P. Nolan (1994). A method for simulating stable random vectors. Computional Statist. 9, 11-19. Nolan, J. P., A. K. Panorska and J. H. McCulloch (1996). Estimation of stable spectral measures. American Univ. Dept. of Math. and Statistics. Nolan, J. P. and B. Rajput (1995) Calculation of multidimensional stable densities. Comm. Statist. Sim. & Comp. 24, 551-566. Oh, C. S. (1994). Estimation of Time Varying Term Premia of U. S. Treasury Securities: Using a STARCH Model with Stable Distributions. Ph.D. dissertation, Ohio State Univ. Panton, D. B. 
(1989) The relevance of the distributional form of common stock returns to the construction of optimal portfolios: Comment. J. Financ. Quant. Anal. 24, 129-131. Panton, D. B. (1992). Cumulative distribution function values for symmetric standardized stable distributions. Comm. Statist. Sire. & Comp. 21, 485492. Paulson, A. S. and T. A. Delehanty (1984) Some properties of modified integrated squared error

Financial applications of stable distributions

425

estimators for the stable laws. Comm. Statist. Sim. & Comp. 13, 337 365. Paulson, A. S. and T. A. Delehanty (1985). Modified weighted squared error estimation procedures with special emphasis on the stable laws. Comm Statist. Sim. & Comp. 14, 922972. Paulson, A. S., W. E. Holcomb and R. A. Leitch (1975). The estimation of the parameters of the stable laws. Biometrika 62, 163-170. Peters, E. E. (1994). Fractal Market Analysis. Wiley, New York. Press, S. J. (1972). Estimation in univariate and multivariate stable distributions. J. Amer. Statist. Assoc. 67, 84~846. Press, S. J. (1982). Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of lnference. 2rid ed. Krieger, Malabar, FL. Rachev, S. R., and G. Samorodnitsky (1993). Option pricing formulae for speculative prices modelled by subordinated stochastic processes. SERDICA 19, 175-190. Roll, R. (1970). The Behavior of Interest Rates: The Application of the Efficient Market Model to U.S. Treasury Bills. Basic Books, New York. Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell J. Eeonom. 7, 407-422. Samorodnitsky, G. and M. S. Taqqu (1994). Stable Non-Gaussian Random Processes. Chapman and Hall, New York. Samuelson, P. A. (1965). Rational theory of warrant pricing. Industrial Mgmt. Rev. 6, 13-31. Samuelson, P. A. (1967). Efficient portfolio selection for Pareto-L~vy investments. J. Finane. Quant. Anal. 2, 107 122. Smith, C. (1976). Option pricing: A review. J. Finane. Eeonom. 3, 3-51. So, J. C. (1987a). The Distribution of Foreign Exchange Price Changes: Trading Day Effects and Risk Measurement - A Comment. J. Finance 42, 181 188. So, J. C. (1987b). The Sub-Gaussian Distribution of Currency Futures: Stable Paretian or Nonstationary? Rev. Eeonom. Statist. 69, 100-107. Stuck, B. W. (1976). Distinguishing stable probability measures. Part I: Discrete time. Bell System Tech. J. 55, 1125-1182. Tobin, J. (1958). Liquidity preference as behavior towards risk. Rev. Econom. Stud. 25, 65-86. Tsionas, E.G. (1995). Exact inference in econometric models with stable disturbances. Univ. of Toronto Econom. Dept. Young, M. S., and R. A. Graft (1995). Real estate is not normal: A fresh look at real estate return distributions. J. Real Estate Finance and Eeonom. 10, 225-259. Wu, W. and S. Cambanis (1991). Conditional variance of symmetric stable variables. In: S. Cambanis, G. Samorodnitsky and M. S. Taqqu, eds., Stable Processes andRelated Topics. Birkh/iuser, Boston, 85-99. Ziemba, W. T. (1974). Choosing investments when the returns have stable distributions. In: P. L. Hammer and G. Zoutendijk, eds., Mathematical Programming in Theory and Practice. NorthHolland, Amsterdam. Zolotarev, V. M. (1957). Mellin-Stieltjes transforms in probability theory. Theory Probab. Appl. 2, 433-460. Zolotarev, V. M. (1981). Integral transformations of distributions and estimates of parameters of spherically symmetric stable laws. In: J. Gani and V. K. Rohatgi, eds., Contributions to Probability. Academic Press, New York, 283-305. Zolotarev, V. M. (1986). One-Dimensional Stable Laws. Amer. Math. Soc., (Translation of Odnomernye Ustoichivye Raspredeleniia, NAUKA, Moscow, 1983.).

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 1996 ElsevierScienceB.V. All rights reserved.

1A

1%

Probability Distributions for Financial Models

James

B. McDonald

1. Introduction This paper reviews probability distributions which have been and can be applied to problems arising in finance and examines some of these applications. Viewed from a purely statistical perspective, financial data provide a rich source of variables with diverse distributional characteristics ranging from normally distributed variates to variables characterized by various degrees of skewness and kurtosis. While the normal or lognormal distributions may provide an adequate representation for many financial series, other series are not so conveniently modeled. This paper reviews some important alternatives to the normal, lognormal, and stable paretian distributions. Financial data are of great interest to individual investors, corporate planners, politicians, and government policy makers. Financial data are constantly changing and are highly visible in daily reports on stock prices, interest rates, currency exchange rates, and gold prices. Many of these data are characterized by a high degree of uncertainty, and changes have the potential to generate huge gains or losses. Stocks, currencies, commodities and many other goods are traded at different financial markets and exchanges throughout the world. Various financial instruments and transactions are possible. Spot markets are used to facilitate the immediate transfer of ownership of goods and financial instruments. Futures markets facilitate the exchange of goods at a particular price at some specified future date. Options give the right to participate in a spot or futures transaction at a previously agreed price. However, the right does not have to be exercised. Options exist for stocks, currency, metals, and commodities. Each of these is characterized by a high degree of uncertainty. The most extensive source of data on U.S. stock prices and returns is the Center for Research in Security Prices (CRSP) at the University of Chicago. This database includes daily returns on every common stock listed on the New York and American stock exchanges, beginning in 1962. The CRSP data base also contains some over the counter returns and monthly data back to 1926. Data for future prices can be obtained from the Center for the Study of Future Markets at 427

428

J. B. McDonald

Columbia University, (cf. Taylor (1986), p. 26). The Futures Industry Institute, a nonprofit educational foundation, has compiled a database that would be useful for those conducting research on futures and related option markets. This database includes data on currencies and commodities. The PACAP data base includes data on Asian Markets. This paper reviews alternative probability distributions which can be used to model return distributions on financial assets. Section two reviews the normal, student's t, lognormal, stable, Pearson family and three additional families of probability distributions. Section three considers applications of these distributions in describing return distributions, stochastic dominance, and option pricing. Section three, the conclusion, discusses the application of families of probability distributions to providing partially adaptive estimators of the betas for stocks.

2. Alternative models
2.1. S o m e b a c k g r o u n d

Two common approaches can be taken to model returns to financial instruments. The first describes the underlying stochastic process that generates prices; the second specifies a statistical distribution which provides a good fit to the empirical data. This paper reviews models that can be used to describe returns and does not investigate the underlying stochastic process; however, some of the models have structural interpretations. Let Pt denote the nominal price of a financial instrument on trading day t. Further let dt denote dividends, if any, paid on that day. We will consider two definitions of returns which are independent of the price units
yt = (Pt + d t ) / P t - 1 0 < y,

and < zt < cx~,

zt = ln(yt) = ln(P, + dr) - ln(Pt_l), - ~

where (yt - 1) is the simple return and zt is the compound return. Since the value o f l n (1 + e) is very close to ~, for small c, the results of empirical studies based on yt (or yt - 1) generally yield similar conclusions to studies based on zt. Satistical models for data in both forms, Y for positive variables and Z for any real value will be reviewed. For example, if the random variable Y is lognormally distributed, then Z = In (Y) will be normally distributed.
2.2. Basic concepts and definitions

Let F(s) denote the cumulative distribution function corresponding to the random variable S. The first four moments of S are often involved in the analysis of financial data. Let/~i denote the ith moment about the mean (/~) :

Probability distributions for financial models

429

lai = EF(S - #)i =

foo
oo

(s - ~)'dF(s)

(2.1)

where #2 is the variance; and common measures of the skewness (x/ill), and kurtosis (f12) are defined by 71 = V#fil f12 ~-- .~ /x92

#3 3/2 I~2

(2.2a)
(2.2b)

Symmetric distributions are characterized by 71 = 0. //2 is a measure of tail thickness and peakedness. 72 = //2 - 3 is referred to as excess kurtosis. A distribution is said to be platykurtic, mesokurtic, or leptokurtic as//2 is <, =, or > 3. (Stuart and Ord 1987, p. 107). Leptokurtic distributions are more peaked and have thicker tails than the normal. Normalized incomplete moments or moment distributions for positive random variables are defined by ~b(y; h) = fY-oosh f ( s ) ds E(sa ) (2.3)

rb(y, 0) is merely the cumulative distribution and gives the probability of S _< y. ~b(y; 1) represents the fraction of total S which corresponds to S < y. Each of the q~(y, h) has the properties of a cumulative distribution (nondecreasing in y and approaching 1 as y + 0) - hence the name m o m e n t distributions, q~(y, 0) and q~(y, 1) will be used in the discussion of option pricing and stochastic dominance. We now turn to a discussion of specific probability density functions in section 2.3 and 2.4.

2.3. Some statistical distributions." normal, student's t and lognormal.

The normal, student's t, and lognormal distributions have become widely used in the financial literature. We briefly review some important definitions and properties of these important distributions. The normal distribution function is defined by the probability density function (pdf)
e-(Z-#)2 /2G2 N ( z ; # , ~) --

v/~r ~

, - e c < z < oc .

(2.4)

The normal is symmetric (71 = 0) with f12 = 3; it provides a good fit for many financial time series. However, significantly higher values of kurtosis (f12 > 3) are often observed in financial return data. Student's t-distribution is symmetric about the origin with kurtosis 3 + 6 / ( v 3), where v denotes the '"degrees of freedom" parameter, and allows for thicker

430

J. B. McDonald

tails than the normal. The corresponding pdf, with an arbitrary scale coefficient (o-), is defined by
1

T(z; v, er) = v/aB(1/2, v/2) (1 + 2z2/va2) v+1/2

(2.5)

where B ( , ) denotes the beta function (defined in appendix A). The h th order, heven, moments corresponding to equation (2.5) are given by -hl. /2xh/2BIh+l v-h) Er(Zh) = o k~/ ) k~-, 2
1 v

(2.6)

for h < v. Equation (2.5) approaches the normal, N(z; # = 0, o) as v grows indefinitely large. Blattberg and Gonedes (1974) and Blattberg and Sargent (1971) have used student's t in the finance literature. Many return distributions are not only thick-tailed, but also exhibit positive skewness, Taylor (1986, p. 44). While student's t-distribution can account for kurtosis, it does not allow for modeling skewed data. The lognormal LN(y; #, o-) is also widely used in finance and is defined by e-( ln(y)-,u)2/2~r2 LN(y; #, o) -y o_V / ~

0< y .

(2.7)

The mean and variance, respectively, of the lognormal are E(Y) = e"+~/2 var(Y) = t/2e2u+a2where t/2 = e ~2 - 1 . (2.8a) (2.8b)

Aitchison and Brown (1969, p.8) report expressions for the corresponding skewness and kurtosis, respectively, Yl = t/(~2 + 3) and/~2 = t/8 + 676 + 15@ + 16//2 + 3. Thus 71 is positive and increases with increases in the parameter a. The measure of kurtosis is greater than three and also increases with a. Note that for small values of o-, skewness and kurtosis approach 0 and 3, respectively. The cumulative distribution function for the lognormal is given by 1 (ln(y)-#)[~3 LN(y; #, o') = ~ q v/~a 1F1
;2;

(ln(~_a

/~)2_1

(2.9)

where lFl[ ] denotes the confluent hypergeometric series defined in appendix A. Estimation of the normal and lognormal parameters is relatively simple. Ease of estimation and a theoretical foundation have provided a motivation to use these models in finance. While the normal and lognormal provide adequate descriptive models for many cases, unfortunately many data sets are not accurately modeled by these relatively tractable models. Two approaches to this problem are to select a model from a family of flexible parametric distributions or the use of semi-

Probability distributionsfor financial models

431

parametric models. This paper focuses on the use of flexible parametric distribution functions.

2.4. Some families of statistical distributions


Since some financial data series are not accurately modeled by the normal, lognormal, or student's t, more flexible distributions are often called for. These include the stable, Pearson, generalized beta, and exponential generalized beta of the second kind, and generalized t families of distributions. Each of these distributions includes many common distributions as special cases. Thus a researcher can test whether a more general form yields a statistically significant improved fit relative to any of its special cases.

The stable distribution Mandlebrot (1963) is often credited with the reexamination of the assumption of the normality of stock returns. He found that empirical distributions of price changes were often too peaked and long-tailed to be consistent with the normal distribution. Mandelbrot (1963) investigated the stable family of distributions defined by the log of its characteristic function given by K(t)=lnC(t)=i6tt )tan(~2) ] ~[t]~ [ 1 + i f l ( ~ (2.10)

The underlying density function is symmetric if fl = 0, and in this case 6 is the median. The density is skewed to the left or right as fl < 0 or /3 > 0. The parameter ~, referred to as the characteristic exponent of the stable family, is restricted to the range [1, 2], with the Cauchy and normal distributions corresponding to ~ being eqal to 1 or 2 (with/3 = 0), respectively. These are the only two distributions in this range having known closed-form expressions for the pdf. must be in the range (1, 2] for a finite mean to exist. The variance is not defined if ~ < 2. Fama and Roll (1968) demonstrate that tail-thickness increases as the value of ~ decreases. They also outline a method for estimating ~ and give expressions for other distributions in terms of a Bergstrom series expansion. The stable family exhibits closure under addition, i.e., the distribution of the sum of identically and independently distributed stable variates, is in the stable family. Officer (1972) found the stable distribution to provide a reasonable model for monthly stock returns. However, he found the estimated ~ to be sensitive to the number of daily returns in the sum; this raises questions about the closure property and about the appropriateness of the stable distribution. Hagerman (1978), also investigating estimates of ~, found that the estimated value of ~ tends to increase from approximately 1.5 for daily returns to 1.9 for returns for 35 days; hence he not only questions the closure property, but also provides some evidence of a limiting normal distribution, particularly for monthly or longer periods. Since the distribution of stock returns tends to be fat-tailed relative to the normal, such as can be modeled by the symmetric stable family, Akgiray and Booth (1988) studied the tails of the distributions of 200 common stocks. These stocks were

432

J. B. McDonald

drawn from some of the most actively traded 1,000 stocks. They found significant differences between the empirical and fitted distributions. Lau, Lau, and Wingender (1990) demonstrate that the empirical behavior of estimates of moments of order four and six based on the stable family is generally inconsistent with observed empirical characteristics of stock returns. See Blattberg and Gonedes (1979) for another example.
The Pearson family

The Pearson family of distributions provides another approach to modeling return distributions which are not acurately modeled by the normal or lognormal. The well-known Pearson (1895, 1901, 1916) family of distributions is defined by solutions to the differential equation
~P(s) - d l n ( f ( s ) ) ds _ (s - a) bo + b l s + b2 s2

(2.11)
"

The denominator will have two real roots, which will either be (1) real with the same sign, (2) real with different signs, or (3) imaginary. The properties of the Pearson family of distributions are discussed in Elderton and Johnson (1969), Kendall and Stuart (1969), and Ord (1972). The Pearson family includes, among others, the beta of the first and second kind, gamma, student's t and normal distributions as special and limiting cases. Specific members of the Pearson family can be selected by analyzing the values of/31 and/32 o r using the kappa criterion defined by b2
- 4bob~2 -

/31(/32 + 3) 2 4(2/32 - 3/31 - 6)(4/32 - 3/31) '

(2.12)

For example, the normal is obtained if ~: =/31 = 0 (with/32 = 3) and ~: = 1 yields an inverse gamma. Ord (1972, pp. 8-9) mentions some extensions of the Pearson family in which the numerator and denominator of the defining differential equation (2.11) may be polynomials of arbitrary degree (Pad6 approximations). Numerous methods of estimating members of the Pearson family have been considered. Pearson used the method of moments to fit probability density functions to the data. Method of moment estimators are inefficient for the Pearson family except for the normal pdf. Maximum likelihood estimation yields efficient estimators. Distributional classification, based upon either method of moments or maximum likelihood estimators of/31, /32, or x, should consider sample variation. Ord (1972) cites studies pointing out the importance of grouping corrections when applying these methods to grouped data. Hirschberg, Mazumdar, Slottje, and Zhang (1992) apply the kappa criterion to the problem of model identification for stock return distributions. Lau, Wingender, and Lau (1989) found that accurate estimates of the skewness coefficient required very large samples. Thus sample variation of the kappa criterion should be considered in the analysis. A number of authors have argued that the distribution underlying price changes need not have a constant variance. If returns, conditional on the var-

Probability distributionsfor financial models

433

iance, have a well-defined p d f and the stochastic variance has a known distribution, then the corresponding return distribution is said to be characterized by stochastic volatility or heterogeneity and can be expressed as a mixture distribution. Mixture distributions will be considered in more detail later. However, two early examples of mixture distributions in finance were considered by Praetz (1972) and Clark (1973), who both assume that returns, conditional on variance, are distributed normally. Clark (1973) assumes that the variance is distributed as a lognormal that leads to a thick-tailed distribution for observed returns. Praetz (1972) also assumes that the variance is stochastic and is distributed as an inverse gamma. This mixture leads to a Student's t-distribution for observed returns. It has already been noted that Student's t permits much thicker tails than the normal and includes the normal as a special case. Blattberg and Gonedes (1974) use the Student t distribution to model return distributions and find it dominates the stable family. We now discuss three families of distributions which permit mixture interpretations and can thus accommodate a wide variety of tail-thickness, and in one case permits asymmetry as well. These distributions are the generalized beta of the second kind (GB2), the generalized t (GT), and the exponential generalized beta of the second kind (EGB2) distributions.
Generalized Beta o f the second kind The generalized Beta o f the second kind (GB2)lis defined by the p d f

GB2(y; a, b, p, q) =

la]YP-1 bapB(p~q) (1 q- (y/b)a) p+q

y ~- 0

(2.13)

where the parameters b, p, and q are positive. The GB2 distribution is referred to as a generalized F by Kalbfleisch and Prentice (1980), and a modified version (with a non-zero threshold) as a Feller-Pareto distribution by Arnold (1983). The ~ ( y ) function of the GB2 is given by
7*(y) -- d l n f ( y ) _ a p - 1 - (aq + 1) ( y / b ) a dy y(1 + ( y / b ) a)

(2.14)

and neither includes nor is included as a special case of the ~(s), equation (2.11), for the Pearson family. The parameters a, b, p, and q determine the shape and location of the density in a complex manner. The h th order moments of Y are given by

IA generalization of the GB2 is given by the generalized beta (GB) defined by talyap-l(1 - (1 -c) (Y/bd)a)q lforO < ya < ba GB(y; a, b, c, p, q) = ~ f f ( p , q~ ~ ~ ~ ( y ~ 1- c The GB2 is obtained from the GB by letting c = 1. This particular case seemsto be of greatest interest in studying return distributions. However, c = 0 yields a generalization of the beta of the first kind which has other important applications in financial and economic models. See McDonald and Xu (1995) for additional details.

434

J. B. McDonaM
EGB2(yh) = bhB(p + h/a, q - h/a) B(p,q)

(2.15)

for -p < h/a < q and permit the analysis of situations characterized by infinite variance. The parameter b is merely a scale parameter and depends on the units of measure. Generally speaking, the larger the value of a or q, the thinner the tails of the density function. In fact, for large values of the parameter a, the corresponding GB2 density function is characterized by the probability mass being concentrated near the value of the parameter b. This can be verified by noting that for large values of a the mean is approximately b and the variance approaches zero. The relative values of the parameters p and q play an important role in determining the value of skewness and permit positive or negative skewness. This is in contrast to such distributions as the lognormal, which is always positively skewed. The cumulative distribution for the GB2 is given by

G B 2 ( y ; a , b , p , q ) = (zP)2Fl[p, 1 - q; p + l;z]/ p B(p,q)

(2.16)

where z = [(y/b)~/(1 + (y/b)~)] and 2F1[ ] is a hypergeometric series (defined in appendix A). The four parameters in the GB2 provide a great deal of flexibility and nest many important statistical distributions as special or limiting cases. These include, among others, the Beta of the second kind (B2 = GB2(y; a = 1, b, p,q)), the Burr type 3 (BR3 = GB2 (y; a, b, p, q = 1)), the Burr type 12 (BR12 = GB2 (y : a, b, p = 1, q)), and the generalized gamma (GG)

GG(y;a, fl, p) =

]a]yap-l e-(Y/~)a flapF(p )

0< y

(2.17)

as a limiting case of the GB2 GG(y; a, 13, p) = Limitq_~ GB2(y; a,/3ql/~, p, q) . The generalized gamma includes the gamma (GA = G G (y; a = 1, /3, p)), the Weibull (W = GG(y; a,/3, p = 1)), and the Lognormal LN(y; #, ~r) = Limita-~0 GG(y; a,/3 = (~2a2)l/a, p (a# + 1)//3 a) . The h th order moments (h/a < p) for the generalized gamma are given by E~G(yh ) _ / 3 h r ( p - h/a)

r(p)

(2.18)

Negative values of the parameter a, yield inverse generalized gamma (IGG) distributions which arise in models for stochastic volatility and heterogeniety. The cumulative distribution function for the generalized gamma is given by

GG(y;a,/3, p) =

e-(Y/~) (y/ /3)ap F ( p + 1) 1El [ 1 ; p + 1; ( y / f l ) a ]

(2.19)

Probability distributions for financial models

435

The GB2 also includes Fisher's F, the Lomax, Fisk, half normal, half Student's t, Chi-square, and Rayleigh distributions as special cases. The interrelationships can be visualized by means of a distribution tree in McDonald (1984) or McDonald and Xu (1995). The GB2 can be generated from mixing a generalized gamma with a scale parameter which is randomly distributed as an inverse generalized gamma, GB2(y; a, b, p, q) = GG(y; a, s, p)IGG(s; a, b, q)ds . (2.20)

Equation (2.20) permits Bayesian interpretations, models for heterogeneity or stochastic volatility, and certain types of measurement error. In a model for unobserved heterogeneity, the first distribution can be thought of as the structural distribution for subpopulations; the second represents the mixing distribution of the scale parameter s. The mixing distribution approaches a degenerate distribution at s = b in the case of q increasing in accordance with Limitq~oo GG(s; a, qUab, q); then the corresponding GB2 would approach a G G distribution, McDonald and Butler (1987). In the context of a financial model, the generalized gamma would be the distribution of returns, conditional on scale which is assumed to be distributed as an inverse generalized gamma. This mixture interpretation provides a structural interpretation (stochastic volatility) for the GB2 as a model for returns. Generalized T The generalized T (GT) is a symmetric three-parameter pdf which can model very diverse levels of kurtosis for returns zt = ln(Pt + dr) - ln(Pt) and is defined by the pdf GT(z; or, p, q) = P 2crql/pB(1/p, q)(1 + [z[P/qaP) q+l/p (2.21)

for - ~ < z < oo with positive parameters a, p, and q. The G T was introduced into the literature in McDonald and Newey (1988) and can be shown to include the Box-Tiao (BT) as a limiting case BT(z; a, p) = Limitq_~GT(z; ~, p , q ) pe-(Izl/a: 2aF(l/p) " (2.22)

The BT is symmetric and is also called the power exponential distribution. The normal distribution is a special case of a BT distribution with p = 2. The double exponential or Laplace and Student's t (with v degrees of freedom and without unitary variance) are given as the following special cases of the BT and G T distributions: e-(Izl/~) 2~r (2.23a-b) r(z; v, or) = Gr(z; a, p = 2, q = v)
z

Laplace(z; or) = BT(z; a, p = 1) --

436

J. B. McDonald

The h th order moments (h even) of the G T and BT distributions are given by EOT(Zh) = ffhqh/p F((1 q- h)/p)F(q - h/p)

F(1/p)F(q)
EBT(Zh) = ~rhF((1 + h)/p)

(2.24a - b)

r(1/p)

The BT has finite moments of all orders; whereas, the h th order moment of the G T is defined only for h < qp. The Cauchy is a special case of the G T with p = 2 and q = 1/2 and does not have finite integer moments. The G T is symmetric and can accommodate tails that are thicker or thinner than the normal. The G T also provides the basis for "robust" or partially adaptive estimation of regression and time series models. Applications of these will be considered in a latter section. The G T can be interpreted as a mixture of a B T distribution having a scale parameter 0r), which is distributed as an inverse generalized gamma (IGG) : GT(z; a, p, q) = BT(z; s, p)IGG(s; p, ~r,q) ds. (2.25)

This result is a generalization of the result for a student-t corresponding to a normal with a scale parameter being distributed as an inverse gamma, Praetz (1972).

Exponential Generalized Beta of the Second Kind


While the tail-flexibility of the G T is important, many return distributions are also skewed. Another distribution for real valued random variables which permits skewness as well as leptokurtosis is the exponential generalized beta of the second kind ( E G B 2 ) , with pdf defined by EGB2 (z; 6, a, p, q) =
ep(Z-a)/~

I~] B(p,q) (1 + e(Z-a)/~)p+q


--oc <-z < o~ .

(2.26)

Since the EGB2 and GB2 are related by the logarithmic transformation, many special cases of the EGB2 can be readily determined. However, several of these distributions are of special interest in the statistics literature. The exponential generalized gamma can be defined as EGG(z; 6, ~r, p) = Limitq-~EGB2(Z; 6. : alnq + 6, p,q)
ep(Z-a)/a e -e(z-a)/~
(2.27)

[~l r(p)
The EGB2 and EGG, for a > 0, are merely alternative representations of the generalized logistic and gompertz distributions reviewed in Johnson and Kotz

Probability distributionsfor financial models

437

(1970, Vol. 2) and Patil et al. (1984). The generalized Gumbell corresponds to the EGB2 with p--= q. The EBR3 is the Burr type 2 distribution; the exponential Weibull is more commonly known as the extreme value type I distribution. The first four moments for the EGB2 and E G G are given in Table 1 (see McDonald and Xu, 1995, for details).
Table 1 Moments for the EGB2 and E G G Moments Mean (/~) variance (#2) Skewness (/t3) EGB2 6 + a[g~(p) - gt(q)] a2[~t(P) + ~'(q)] a317:(p) _ 7a,(q)] EGG

6 + a~(p)
tr2~V'(P)

a3~.(p)

Kurtosis (P4 - 3#~)

a4[~g"(p) + ~"(q)]

aa~'(p)

#i denotes the ith moment about the mean, and O(s) denotes the digamma function [dlnF(s)]/ds. (See McDonald and Xu, 1995, for details.)

6 is a location parameter, o- is a scale parameter, and p and q are shape parameters. Changing the sign of o"changes the sign of the skewness. The EGB2 is symmetric for p = q. The kurtosis (#4/kt~) is greater than or equal to three. The EGB2 includes the normal as a limiting case and can be used to characterize errors in regression, time series, or other models in which we may want to allow for departures from normality. The EGB2 provides the basis for partially adaptive estimation with bounded influence functions. The EGB2 has the following mixture interpretation: EGB2(z; 6, a, p, q) =

fo

1 GG(U; ~, s, p ) g I G G ( s ; ~, e 6, q)ds.

(2.28)

Estimation Maximum likelihood estimation of the unknown parameters in the GB2, GT or EGB2 families require nonlinear optimization. These estimators are asymptotically efficient and asymptotically normal. We now consider applications of these distributions in the financial literature.

3. Applications in finance We now turn to four applications of the distribution discussed in the second section: distributions for stock returns, stochastic dominance, option pricing, and partially adaptive estimation for betas for stocks.

3.1. Distribution of security price returns


There are two common approaches to modeling the distribution of security returns described in the finance literature. The first begins with the specification of

438

J. B. McDonald

an underlying stochastic process which is assumed to generate prices. The second is empirical and is based on a statistical distribution function that provides a reasonably accurate representation of the observed returns. The actual data are frequently distributed with thicker tails and are more peaked than the normal or lognormal. As noted earlier, this observation led to the consideration of the symmetric-stable and other distributions. A popular hypothesis is that security price distributions involve a mixture of distributions. For example, mixing a lognormal distribution of returns with an inverse gamma distribution for volatility has led to a distribution with corresponding kurtosis that more nearly matches observed kurtosis than the lognormal. This particular mixture, known as a log-t distribution, includes the lognormal as a limiting case. It has already been mentioned that Student's t results from mixing a normal with an inverse gamma distribution for a. In the previous section, the GB2 was shown to be obtained by mixing a generalized gamma with an inverse generalized gamma for the scale parameter (volatility): GB2(y; a, b, p, q) = GG(y; a, s, p)IGG(s; b, q)ds. (3.1)

The GG(y; a, s, p) distribution in (3.1) can be interpreted as the conditional distribution of returns, given s, where s is assumed to be distributed according to the indicated inverse generalized gamma. Since the generalized gamma includes the lognormal as a limiting case, the GB2 generalizes the lognormal-gamma mixture studied by Praetz (1972). It is important to recall that the I G G distribution in (3.1) approaches a degenerate pdf as the parameter q grows indefinitely large; thus the GB2 permits, but need not imply, models of stochastic volatility. Furthermore, the GB2 has finite moments o f order up to aq. Distributions in which aq < 2 are not characterized by finite variance. Bookstaber and McDonald (1987) investigated the distribution of 500 daily stock returns (Yt = (Pt + dt)/Pt-1) dating from December 30, 1981 for twentyone randomly selected stocks. Twice the difference between the maximized loglikelihood values (LR = 2 (gGB2 -- gLN)) provides the basis for a likelihood ratio test of the hypothesis H0 : GB2 = LN. Theuse of critical values based on X2(2) yields a conservative test of statistical significance. Bookstaber and McDonald (1987) find that 19 of the 21 cases exceed the .995 confidence value of 10.6. Thus the more flexible GB2 provides a statistically significant improved fit relative to the lognormal. In a separate study conducted for this paper, 60 monthly stock returns, with dividends, for 45 randomly selected companies for the period January 1988 through December 1992 were investigated. The 45 selected companies are listed in Appendix B. Several distributions were fit to each data set using maximum likelihood procedures. In testing the hypothesis H0 : GB2 = LN, in only ten of the 45 cases did the value of LR exceed 5.99 (95% level), and in only six cases was the value of L R greater than 10.6. These results further confirm previous studies that have found return distributions for longer time periods to be more nearly lognormal (normal) than for short time periods.

Probability distributionsfor financial models

439

We report estimation results for one of the companies and for the New York Stock Exchange in tables 2 and 3. Table 2 shows the results of using M L E to estimate the GB2, BRI2, GA, and L N to return data for Ampco-Pittsburgh Corporation (AMPCO). Parameter estimates, estimated moments (corresponding to estimated parameters), and maximized log-likelihood values (g) are reported. The mean, variance, skewness, and kurtosis reported on the fifth through eighth lines of table 2 are obtained by substituting the estimated parameter values into the equations for the theoretical moments, e.q. equation (2.15) for the GB2. The estimated moments reported at the b o t t o m of the table are obtained using the sample moments. The estimated two- parameter L N distribution is able to model the sample mean and variance quite well, but does not have the flexibility to represent the sample skewness and kurtosis. The additional two parameters of the GB2 provide a statistically significant increased flexibility in modeling skewness and kurtosis. Note that these results are based on m a x i m u m likelihood estimation and not method of moments. It is interesting to note that the three-parameter BR12 gives results very similar to those of the GB2. The BR12 is a three-parameter distribution having a closed form cumulative distribution. The same four statistical distributions were fit to monthly returns on the valueweighted New Y o r k Stock Exchange imdex (VWNYSE). These results are given in Table 3. The corresponding L R is not statistically significant at conventional levels of significance; however, the hypothesis H0 : GB2 = L N involves parameters on the boundary of the parameter space. This raises the question of the accuracy of inferences based on an asymptotic Z2(2). The data for A M P C O and V W N Y S E are included in Appendix B.

3.2. Stochastic dominance


This section will review alternative ways in which different return distributions can be compared and some applications of probability density functions to this Table 2 AMPCO-Pittsburgh Co. estimated monthly return distributions (January 1988 - December 1992) GB2 a(/2) b(~r) p q Mean Variance Skewness Kurtosis 29.34 .9642 .7726 .4977 1.0001 .0092 1.184 7.505 60.3 BR12 24.97 .9583 1.000 .6006 1.0002 .0091 1.164 7.164 60.2 GA 1.000 .009592 104.3 N/A 1.0005 .0096 .196 3.06 54.4 LN (-.004276) (.09625) N/A N/A 1.0004 ,0093 .290 3.15 55.6

N/A-not applicable Sample moments : (mean, var, skew, kurt) = (1.0005, .0105, 1.73, 9.13)

440

J. B. McDonald

Table 3 VWNYSE estimated monthly return distributions (January 1988 December 1992) GB2 a(~) b(cr) p q Mean Variance Skewness Kurtosis g 118.6 1.013 .3464 .3672 1.012 .0013 .129 5.39 116.8 BR12 53.09 1.010 1.000 .9721 1.012 .0013 .198 4.31 116.5 GA 1.000 .001239 816.8
N/A

LN (.01106) (.03501) N/A N/A 1.012 .0013 .1051 3.02 115.3

1.012 .0013 .0700 3.01 115.3

Sample moments : (mean, var, skew, kurt) = (1.012, .0012, .0511, 3.79) important problem. The concepts of mean-variance rankings, and first and second order stochastic dominance will first be reviewed. The relationship between these rankings and expected utility provides a notion of optimality. Parametric restrictions on some probability density functions leading to stochastic dominance will be reviewed. Finally, the concepts of Lorenz dominance and meanGini dominance will be reviewed and their relationship to stochastic dominance.
Mean-variance and stochastic dominance

Let F1 and F2 denote cumulative return distributions corresponding to two different assets X1 and X2. Further, let #i and a/2 denote the mean and variance of X,., respectively. Markowitz (1959) and Tobin (1958) propose the mean-variance (MV) criterion to rank distributions. Distribution F1 is said to dominate (is preferred to) distribution F2, according to the mean-variance (MV) criterion F1 >MV F2 MV: or X1 >Mv X: #1 -> #2 and a 2 < 0-~ if and only if :

(3.e)

with at least one strict inequality. The mean-variance criterion partitions the set of alternatives into an "admissible or efficient" set (SMv) and an "inadmissable or inefficient" set. The admissible set is obtained by deleting assets having a lower mean and higher variance than a member of the original set of assets. Thus the inadmissable set will not contain any assets with a higher mean and smaller variance than any asset in the admissible set. As a numerical example we note, from tables 2 and 3, V W N Y S E >MV AMPCO. The mean-variance efficient set corresponding to the 45 randomly selected firms contains Aileem, Atlantic Energy, General Public Utilities, N U C O R , Union Pacific, and Walgreen.

Probability distributionsfor financial models

441

The concepts of first and second order stochastic dominance provide alternative decision rules from ranking distributions. A distribution F1 is said to be first order stochastic dominant (FSD) over F2

FSD:

F1 ~>FSD F2 if and only ifi F1 (x) < F2 (x) for allx, -oo and
FI (xo) < F2 (x0) for some x0.

<

<

0o,

(3.3)

Thus, F1 >FSD F2 requires that F1 never lie above and somewhere lie below F2. It follows that a necessary, but not sufficient, condition for FSD is that the mean(if defined) of the preferred asset is at least as large as for the dominated asset. The corresponding efficient set will be denoted SFSD and is not necessarily the same as SMV. The distribution Fl is said to be second order stochastic dominant (SSD) relative to F2, denoted F1 >SSD E2, if and only if: SSD:

F
f
oo

Fl (t)dt <_

(ND

F2 (t) dt or
0(3 --~ < X < oo

(3.4)

X [F1 (t) - F2 (t)] dt < 0 for all x,

and with a strict inequality for at least one x. F1 >SSD F2 requires that the integral of F1 never live above and somewhere lie below the integral of F2. In contrast to FSD, SSD allows F1 and F2 to intersect many times, as long as the negative areas (where F1 > F2) are smaller in absolute value than the accumulated positive areas where F2 > F1. First order stochastic dominance implies second order stochastic dominance. Hence SSSD C SFSD. We again note that the admissible sets corresponding to the MV, FSD, and SSD need not be the same and may lead to different decisions. The concept of expected utility provides an approach to resolving the differences.

Expected utility and optimality Von Neumann and Morgenstern(1953) demonstrated that expected utility can be used as a foundation for decision-making under uncertainty. Thus if U(x) denotes a utility function, distributions could be ranked according to expected utility.
Ei(Y) = f U(Y)dFi(Y). (3.5)

Clearly, rankings based on expected utility depend on assumptions made about the utility function and may differ from the MV, FSD, or SSD criteria. An

442

J. B. McDonaM

optimal efficient set is the set of distributions (or assets) made up of distributions that maximize expected utility corresponding to utility functions with different assumptions.Hence, SSSD and SMV can be optimal under certain restrictive assumptions. The mean-variance criterion is valid (the mean-variance admissible set SMV is optimal) if either the utility function is quadratic or the return distributions are normal, Tobin (1958) and Hanoch and Levy (1969). Pratt (1964) and Arrow (1965) have discussed the limitations of quadratic utility functions (increasing absolute risk aversion). Further, the assumption of normally distributed returns rules out skewness and leptokurtosis, which characterize many return distributions. Quirk and Saposnik (1962), Fishburn (1964), and Hanoch and Levy (1969) demonstrated that FSD is optimal if and only if the utility function is nondescreasing. This follows from equation (3.6) EF, U(X) - EF2 U(X) =

/5

[F2 (t) - F1 (t)]dU(t) .

(3.6)

O(3

SSD has been shown to provide optimal rankings in the case of a non-decreasing and concave utility function, see. Hanoach and Levy (1969) for details. Stochastic dominance and parametric families Ali (1975) investigates stochastic dominance when the distributions belong to various parametric families of distributions. Ali uses a result on monotone likelihood ratios reported in Lehmann (1959) to identify subsets of the parameter space for different families corresponding to FSD and SSD. He considers the gamma, beta, t, F, ~(2, and lognormal families of distributions. As an example, consider the gamma density.
yp-1 e-y/~

GA(Y; r, p) = G G ( Y ; a = 1 , r, p) Ali (1975) finds

flPF(p).

(3.7)

GA(Y; fix, Pl) >FSD GA(y; f12, p2) if and only if r2 ~ fll and P2 _< Pl

(3.8)

with at least one strict inequality. Thus, in determining whether one member of the gamma family dominates another, one need only compare parameter values. 2 This does not facilitate comparing distributions from two different families. Since the GB2 nests the gamma and beta families, the same approach could be considered in an attempt to obtain corresponding results to facilitate a comparison of members from different families. 2Pope and Zimer (1984) study the impact of samplingvariationin estimatingthe mean, variance, and parameter values on the power of tests for efficiency.

Probabilitydistributionsfor financialmodels

443

To apply the methodology outlined in Lehmann (1959), the likelihood ratio is first calculated: LR(y; 1~1, ~}2) = l n f ( y ; 191)-lnf(y; 02). If d L R (y; O 1 , 0 2 ) / d y is monotonically non-decreasing for Ol > 02, then /701 ~>FSD FO2- As a further illustration, the derivative of the log-likelihood ratio for the generalized gamma is given by

dLRGG alp~--azp2 +az (y~a2 d~y Y \flzJ

al ( y ) a Y -~1 "

(3.9)

Increases in the value of parameters p of fl are seen to lead to first order stochastic dominance corresponding to the larger parameter values. This is true for any value of a. This verifies some of the previously cited results for the gamma. The impact of changes in the parameter a are unclear, as are combinations of increases in values of either p or fl and decreases in the other. Similarly, the derivative of the log-likelihood ratio for the generalized beta of the second kind can be written as

dLRG132alPl-a2p2a2(p2+q2)[.1] dy y y 1 + (b2/y) a2
al(Ply+ q l ) [ 1 +
1 (bl/y) af]

(3.10)

Stochastic dominance and Lorenz dominance 3


Atkinson (1970) showed that the rules for stochastic dominance can be restructured in terms of Lorenz curves, which have been used to compare income distributions in the economics literature. The Lorenz curve, for an income distribution, plots the percent of total income held by different fractions of the population. Thus the Lorenz curve is a plot of the incomplete moments (~b(y; 0), ~b(y; 1)) where q~(y; 0) denotes the fraction of the population with income less than y, and ~b(y; 1) is the fraction of total income held by those with incomes less than y. Atkinson (1970) demonstrates that for two distributions with equal means, F1 >SSD F2 implies that the Lorenz curve of F1 lies above that of F2. The literature on Lorenz dominance has adopted the definition F2 Lorenz dominates F 1 F2 >Z F1 if and only if L: the Lorenz curve ofF1 lies above that of F2. (3.11a)

3 Shorrocks (1983) and Kakwani (1984) developa generalizedLorenz curve that takes account of differentmeans in ranking distributions. The generalizedcurve is constructedby scalingup the Lorenz curve by the mean of the distribution. Generalized Lorenz dominance is equivalent to preference according to S-concave social welfare functions. There is a duality between generalized Lorenz dominance and second-orderstochastic dominance. Bishop, Chakraborti, and Thistle (1989) outline some distribution-freeinferenceprocedures for generalized Lorenz curves.

444

J. B. McDonaM

It might be useful to think of an inverse Lorenz ranking IL: F1 <IL F2 if and only if (3.11b)

the Lorenz curve of F1 lies above that of F2. to remind us that the direction of ranking of L is opposite to that of SSD, FSD, etc. Furthermore, Aitkinson (1970) states F1 >SSD F2 is equivalent to the Lorenz or inverse Lorenz dominance (F1 >IL F2) for distributions having the same mean. In this case the rankings of nonintersecting Lorenz curves are independent of the form of a social welfare function except that it be nondecreasing and concave. In the case of intersecting Lorenz curves different welfare functions can yield different rankings. For the case of unequal means. I ~ l - #2 and FI >IL F2 implies F1 >SSD F2
o

(3.12)

Some distributions, such as the gamma, Pareto, and lognormal, do not permit intersecting Lorenz curves; the rankings are characterized by a single shape parameter. Other distributions, such as the Burr distributions or generalized gamma distributions, permit intersecting Lorenz curves and require more complicated parameter restrictions to characterize Lorenz dominance. Some of these results will be reviewed.
Lorenz dominance: Burr type 12 Wilting and Kramer (1993) find parametric restrictions to characterize Lorenz dominance for Burr type 12 distributions, GB2 (3,; a, b, p = 1, q):

GB2 ( y ; a l , b l , p = 1,ql) )IL GB2 (y;a2, b2, p = 1,q2) if and only if al >_ a2 and alql >_ a2 q2 .

(3.13)

A comparison of the estimated parameters for the Burr 12 distribution reported in tables 2 and 3 implies VWNYSE >IL AMPCO.
Lorenz dominance: Generalized beta of the second kind For the more general case of the GB2, Wilting and Kramer (1993) find the following necessary condition for Lorenz dominance:

GB2 (y;al,bl, p l , q l ) ) m G B 2 (y;a2,b2, p2,q2)implies


alp1 ~ a2P2 and alqt ~ a2q2 .

(3.14)

Wilting (1992) finds a sufficient condition: al >_ a2, and Pl >_ P2, and ql >_ q2 implies GB2 (y; al, bl, Pl, ql)

)IL GB2

(3.15)

(y; a2, b2 p2, q2).

Probability distributionsfor financial models

445

Hence, increases in the parameter a, p or q lead to inverse Lorenz dominance. Based on the estimated parameters for the GB2 reported in tables 2 and 3,we note that the necessary, but not sufficient conditions for VWNYSE to Lorenz dominande A M P C O are satisfied.
Lorenz dominance: Generalized g a m m a

Taille (1981, p. 190) investigates generalized gamma distributions with two-shape parameters. He reports parametric restrictions associated with nonintersecting Lorenz curves, G G ( y ; a l , b l , p l ) >_ IL G G (y;a2, b2, P2) if and only if
al >_ a2 and alPl >_ a2P2
Mean-Gini dominance

(3.16)

The mean-variance ordering has well recognized limitations. An alternate ordering which is related to Lorenz orderings uses the Gini coefficients. The Gini coefficient is twice the area between the 45 degree line of equality and the Lorenz curve, has a long history as a scalar measure of inequality, and has been used as a criterion for comparing return distributions. This approach was introduced into the finance literature by the papers of Yitzhaki (1982) and Shalit and Yitzhaki (1984). The Gini coefficient is defined by:

G, =
2fli J - ~ J-oo

Is - tldFi(s)dF,'(t).

(3.17)

Lorenz dominance FI >IL F2 implies G1 < G2. Yitzhaki (1982) argues that the use of the mean and Gini coefficient can be used to characterize necessary conditions for stochastic dominance for general distributions, which is not possible with the mean-variance criterion. F1 is said to dominate F2 according to the mean-Gini criterion (MG): F1 >Me F2, if and only if MG: #1 -> #2 G1 _~G2 (3.18)

with at least one strict inequality. Applying the mean-Gini criterion to the 45 stocks discussed earlier yields the same efficient set as based on the mean-variance criterions i.e. Aileen, Atlantic, Energy, General public utilities, N U C O R , Union, Pacific and Walgren. Yitzhaki (1982) proposes an additional criterion for ranking distributions, based on the following proposition: PROPOSITION 1. The condition 2n _> 0, for n = 1,2,. -, is a necessary condition for FSD and for SSD, where

446

J. B. McDonald

f 2. = J [ [ 1 - F,(t)] n - [1 - F2(t)]"]dt Evaluating 21 and 22 gives 21 = #1 - #2 >- 0 and


g

(3.19)

22 = #l(1 - GI) - #2(1 - G2) = / [ 1

- F l ( t ) ] 2 - [1 - F 2 ( t ) ] Z d t > O.

These conditions lead to a different m e a n - G i n i ( M G 1 ) criterion where F1 is said to d o m i n a t e F2 in the sense of M G 1 . F1 >MGI F2, if and only if MGl:
]21 ~ #2
#1(1 -

(3.20) G1) _> ,u2(1 - G2)

with at least one inequality. F1 >MG F2 implies that F1 >MG1 f2, but the converse it not true. Hence the efficient set corresponding to M G 1 will be contained in the efficient set obtained from the M G criterion. The weaker the criterion, the smaller the efficient set. F o r cumulative distributions that intersect no m o r e than once, Shalit and Yitzhaki (1984) argue that " > M G I " (with identical means) is sufficient for first and second degree dominance and SMG1 = SSSD. In applying M G 1 to the 45 stocks, Atlantic Energy is deleted f r o m the M V and M G efficient sets. Table 4 reports expressions for the Gini coefficients corresponding to the normal, lognormal, g a m m a , beta (types 1 and 2), Burr 12, generalized g a m m a and GB2 distributions.

Table 4 Gini coefficients Distribution Normal Lognormal Gamma B1 Bz BR12 GG GB2 2LN(~; 0, 1) - 1
~r(p+l)

Gini coefficient

r(p+l/2)

B(p+q,1/2 )B(p+ l /2,1/2 )

r,B(q,l/2) 28(2p,Zq-U

1 - r(q 1/a)r(2q) G~G GGB2

Probability distributionsfor financial models

447

where

GGG = [(1/P)2F1 [1,2p + 1/a; p + 1; 1/2] [22p+1/aB(p, p + l/a)]


-(p-~U~) 2F1 [1,2p + 1/a; p + 1/a + 1; 1/2]] [22p+UaB(p, p l/a)]

GGB2 = [(1/p)3F2[1, p + q, 2p + 1/a; p + 1,2(p + q); 1] B(p, q)B(p, p + 1/a)B(Zq - 1/a, 2p + l/a)] -(p+~/~)3F2[1, p + q, Zp + 1/a; p + 1/a + 1,2(p + q); 1] B(p, q)B(p, p + 1/a)B(2q - 1/a, 2p + l/a)]
For references to these formulas see Nair (1936), Aitkinson and Brown (1970), McDonald (1984), Salem and Mount (1974), and Singh and Maddala (1976). These formulas can be used to construct MG and MG1 efficient sets. Non parametric estimates of the Gini can also be used.

Relationships between alternative rankings The following figure summarizes some of the relationships between the rankings FSD, SSD, IL, MG, and MGI:

If the cumulative distributions have at most one intersection and equal means, then MG1 implies SSD. In the case of equal means, SSD and IL are equivalent. The results in Table 4 can be used in forming MG or MG1 efficient sets for different parametric families. It can be shown that the following relationships between efficient sets hold for normal distributions:
SMG1 C SMG i= SSSD SMV,

Yitzhaki (1982)

The relationships between efficient sets is different in the case of lognormal returns and can be shown to be, SMG1 C SSSD C SMG =SMv. Thus the lognormal provides an example in which the mean-variance criterion can be inconsistent with stochastic dominance Yitzhaki (1982). Also see Elton and Greber (1973).

448

J. B. McDonald

3.3. Option pricing


The Black Scholes (1973) option pricing formula has been widely used to price financial assets. This formula is based on the assumption of lognormally distributed returns that may be in poor agreement with the data. One approach to this problem is to approximate the option pricing formula based on the distribution generating the returns with a generalized beta distribution. As noted, the GB2 distribution includes the lognormal as a limiting case and thus allows for departures from the lognormal. The interpretation of the GB2 as a mixture (see equation (2.20)) also allows for departures from the lognormal due to stochastic volatility. Cox and Ross (1976) derive the relationship between the cumulative distribution function of the security process and the equilibrium value of an option of that security. If we can assume risk neutrality in pricing financial assets, the equilibrium price of a European call option is given by the present value of its expected return at expiration,

C(Sr, T, X) = e-rrE[C(So,O)]
= e -rr
JX

(S-X)f(S[Sr, T)dS

(3.21)

where C, T, r, X and St, denote respectively, the price of the option, the time to expiration, the interest rate, the exercise price, and the price of the stock (T periods from the expiration date), Bookstaber (1987). It will be convenient to rewrite this expression in terms of normalized incomplete moments ~b(y; h) : fy-~ shf(s)ds E(y h) Further, let q~(y; h) = 1 - qS(y;h). The equilibrium pric e for the European call option (3.21) can be rewritten as

C(Sr, T,X)=Sr~(~-~;1)- e-r~xq~ (S~ ; 0) ,

(3.22)

McDonald and Bookstaber (1991). The Black Scholes (1973) option pricing formula is obtained by selecting f( ) to be the lognormal and noting that the normalized incomplete moments for the lognormal are cumulative distribution functions for the lognormal with a modification of the parameters: 4

4Aitchison and Brown (1969, p. 12) give the expression for the normalized incomplete moments or moment distributions for the log normal. Also see equation (2.9). < >

Probability distributionsfor financial models

449

4~LN(y;h) = LN(y; # + h0-2, 0"2) . Similar expressions for the value of the European call option can be obtained corresponding to the GB2 and G G distributions by noting that ~bGBz(y;h) = GB2 y; a, b, p + , qa

C~GG(y;h)~-GG(y;a, fl,p+h),

(3.23a-b)

Butler and McDonald (1989). Note that the incomplete moments for the G G and GB2 distributions are members of the G G and GB2 families of cumulative distribution functions (equations (2.18) and (2.16)) and thus exhibit a form of closure. McDonald and Bookstaber (1991) investigate the use of the European option pricing model based on the GB2 in the presence of values of skewness and kurtosis that may differ from those associated with the lognormal. They find that for increases in kurtosis, relative to the lognormal, the Black-Scholes model overprices options that are at the money. For options that are sufficiently far in the money, the Black-Scholes model begins to underprice options. The pricing departures from the Black-Scholes formula are sensitive to both kurtosis and skewness. These findings are illustrated by means of a numerical example. Consider, for example, the case of T = .25, r = .10, X = 100, and 0-2 = .40. These values yield a Black Scholes (BS) price of $13.68. The corresponding skewness and kurtosis in the lognormal case are 1.0007 and 4.856 respectively. Now consider incrementally increasing the kurtosis or decreasing the skewness and fitting a GB2, using method of moments. Given the estimated GB2, option prices can be derived using (3.22) and (3.23a). Table 5 reports option prices for a few representative cases. These entries provide an indication of the impact of non-normality (lognormality) on the accuracy of the Black-Scholes pricing formula. For example, if a lognormal accurately represents the return distribution the option price for a stock with price 100 and an exercise price of 100 is $13.68. If the return distribution is characterized by the same mean, variance, and skewness as the lognormal just considered, but the kurtosis is 9.72 (twice 4.86), the option price based on a GB2 valuation is $13.20.
Table 5 GB2 Option Prices (T = .25,r = .10,x = 100, 0-2 = .40) Sr BS % A Kurtosis 50 90 100 110 8.39 13.68 20.19 7.94 13.40 20.21 100 7.53 13.20 20.30 % A Skewness -25 8.20 13.72 20.50 -50 7.98 13.76 20.81 -75 7.75 13.96 21.19

450

J. B. McDonald

Hull and White (1987) and Wiggens (1987) also consider option pricing formulas in the presence of stochastic volatility. Since the GB2 distribution lends itself to a mixture interpretation, the GB2-based option price formula can also be interpreted as being based on a form of stochastic volatility.

3.4. Estimation of Beta's: adaptive and partially adaptive estimation, ARCH, GARCH, and an application
Regression analysis is an important tool in financial modeling. The basic linear regression model is defined by

Yt = Xtfl + et

(3.24)

where Yt and Xt denote the t th observations on the dependent variable and a 1 x K vector of explanatory variables, and/~ is a K x 1 vector of unknown constants. et , the random disturbance, is assumed to be independently and identically distributed with a zero mean and constant variance: E(et) = 0
E(4) =
2=0-2

(3.25)

If we assume that the limit of (X~X/n) as n grows indefinitely large is a positive definite matrix C where X' = (X~X~... X~n) , then the ordinary least squares (OLS) estimator of/~ = ( X ' X ) - I X ' Y has an asymptotic distribution [N(/~; ~2C/n)] . The least squares estimator will be efficient if the random disturbances are normally distributed. However, if the normality assumption is not satisfied, least squares can still be minimum variance of all linear unbiased estimators, but there may be more efficient non linear estimators. It is well known that OLS is very sensitive to outliers such as are often encountered with thick-tailed return distributions. Numerous alternative estimation procedures have been considered in the finance and statistical literature which are less sensitive to outliers than OLS. One of the most commonly applied methods is that of least absolute deviations (LAD), defined by /~LAD = arg min~--2lYt
t
Xt~l

LAD:

(3.26)

Basset and Koenker (1978) demonstrate that this estimator is asymptotically normal if the pdf of e, f(c), is continuous and has positive density at its median. The LAD estimator has been shown to be more efficient, at least asymptotically, than least squares for many thick-tailed distributions; e.g., see Smith and Hall (1972), Kadiyala and Murthy (1977), and Coursey and Nyquist (1983). LAD is the maximum likelihood estimator for random disturbances that are distributed according to the Laplace pdf. Sharpe (1971) and Cornell and Dietrich (1978) use LAD to estimate the betas in the market model.

Probability distributionsfor financial models Lp estimators, defined by


Lp /~Lp = arg m i n ~
t

451

IYt - xd~l p

(3.27)

provide a generalization of both least squares (p = 2) and LAD (p = 1) . Some early studies of Lp estimators included recommendations for the value of p; see, for example, Hogg (1974). M-estimators are another class of estimators that can accommodate possible non-normalities. These estimators are defined by M: /~M = arg m i n ~ p ( ( Y t - Xtfl)la)
t

(3.28)

where cr is a scale estimate for the distribution. The function p0 assigns "weights" to values of the errors. The function 7J(c) = p'(e) measures the "influence" that a random disturbance will have in the estimation process. M-estimators will have an asymptotically normal distribution if E(7~(e)) = 0 and Var (~(e)) is finite. Least squares, LAD, and Lp estimators are special cases of M-estimators. Huber (1981) considers additional M-estimators. The critical question with M-estimation is the selection of an appropriate p(e) function. M-estimators yield MLE and are efficient if p(e) is selected to be { - l n f ( e ) } . Koenker (1982) provides an excellent survey of related material. Since the form of f(c) is rarely known, a couple of approaches have been developed in the literature. One approach, which could be thought of as being "partially adaptive," is to select p(e) to be the negative of the logarithm of a flexible parametric pdf, which may include the normal and allow for thick tails and possible asymmetry. Early papers by Blattberg and Sargent (1971), which assume stable Paretian errors, and by Zeckhauser and Thompson (1970), based on power exponential or BT errors, characterize partially adaptive procedures. Another approach uses methods that are "fully adaptive." Kernel estimators or methods based on generalized method of moments are examples of fully adaptive procedure. Fully adaptive estimators are as efficient, asymptotically, as maximum likelihood estimators based on the actual distribution for the errors. However, fully adaptive estimators need not exhibit the same efficiency characteristics for samples sizes encountered in practice.

Partially adaptive estimation The BT, GT, and EGB2 pdf's provide the basis for estimating regression models in the presence of possible departures from normality. The BT and GT are symmetric, but allow for different degrees of kurtosis. The EGB2 doesn't permit as wide a range of kurtosis, but allows for symmetric and asymmetric error distributions. To illustrate these methods, consider the log-likelihood function obtained from the Box-Tiao pdf equation (2.22)

452

Z B. M c D o n a l d

eBT(fl, cr, p) = n[ln(p) --

ln(2aF(1/p))]

- ~([Y,
t

- X,131/cr) p .

(3.29)

Maximizing BT0 over fl for p = 1 or 2, respectively, yields LAD and OLS. Maximizing gRT0 over fl a n d p endogenizes the selection of p. Thick tailed error distributions would tend to be associated with small values of p and near normal data would tend to be associated with an estimated value of p near 2. The use of the generalized t distribution would not only accommodate error distributions that can be approximated by members of the Student-t family, but would include the Box-Tiao (power exponential family) - both of which include the normal distribution. 7~aT for finite q is redescending and "discounts" outliers in the estimation process. The use of the EGB2 family permits thick tails and asymmetry. 7~E~2 is bounded, for finite q, but not redescending.
A d a p t i v e e s t i m a t o r s - the n o r m a l k e r n e l

A normal-kernel estimator of the regression parameters can be obtained by assuming the errors have a pdf which can be approximated by

where ~b and enN, denote respectively, the standard normal density function and the least squares residuals
enN = r n - X n f l

and/~ is the least squares estimator of ft. s is a smoothing parameter. Trimming parameters can also be introduced, Hseih and Manski (1987). McDonald and White (1993) use a small Monte Carlo simulation study to compare the finite sample performance of LAD, OLS, partially adaptive (EGB2, BT, GT), normal kernel, and a generalized method of moments estimator. They find that the adaptive and partially adaptive estimators dominate OLS and LAD over several non-normal error distributions with minimal efficiency loss in the case of a normal error distribution. Furthermore, they EGB2-estimators dominated all other estimators in the case of an asymmetric error distribution.
ARCH and GARCH models

Numerous applications in finance have found regression errors to be characterized by clusters of small and large residuals that cannot be described by traditional regression models. In these applications large (small) residuals tend to be followed by large (small) residuals. This empirical finding has suggested an autoregressive conditional heteroscedasticity (ARCH) representation for the errors such as
~t = ut[o~o +

~let_l]

.5

3.31

where ut is independently and indentically N[0, 1]. It can be shown that

Probability distributionsfor financial models


2 Var [et[et-1] = at2 = ~o + ~1~t-1

453

(3.32a - b)

Var [et] = c~0/(1 - al) if al < 1, Engle (1982). This model (3.31) is referred to as an A R C H model of the first order, A R C H (1). OLS estimators will still be the best linear unbiased estimators of fl if the errors are A R C H (1) or even if the errors are non-normal; however, they will not be efficient in the class of non-linear estimators. A R C H models of order p, ARCH(P), can be defined by
2 = ~o + ~ et_l 2 + . . . + ~p e t2 A R C H (p) : a t -p
"

(3.33)

Bollerslev (1986) has proposed a generalized A R C H (GARCH) model defined by G A R C H (p,q) : ~rt = ~0 + ~let_l + . - .
2 2

~_O~p~Lp @ 10.tL

1 -~-.-. @

6qGL q

(3.34)

The G A R C H specification permits a parsimonious parameterization of many models; which would require a high order A R C H model. The G A R C H formulation allows the variance to evolve over time in a much more general way than permitted with an A R C H model. Bollerslev reports conditions for stability of moments up to order 12 for a G A R C H (1, 1) model. Greene (1993) presents an overview of A R C H and G A R C H models. Bollerslev, Chou, and Kroner (1992) provide an extensive survey of the theory and empirical applications. Nelson (1991) used the BT as a flexible parametric model in his applications of A R C H and G A R C H models. The EGB2 and G T formulations would provide additional flexibility. Partially and fully adaptive procedures could be combined with A R C H and G A R C H specifications to account for non-normalities (skewness/or leptokurtic error distributions) and clustering found in some empirical finance applications.
A n application to the m a r k e t model.. (AMPCO)

We use the monthly return data referred to in Section 3.1 to estimate the beta of a stock. The dependent variable is Y = ln((Pt + d t ) / P t - 1 ) - re where Pt and dt denote the price and dividends in period t for A M P C O and rt denotes monthly returns on 30 day treasure bills (a proxy for the risk-free rate). The independent variable is constructed as X --- the logarithm of the monthly return on the valueweighted New York Stock Exchange (VWNYSE) less the risk-free rate. The estimated least squares results are = -.0169 + 1.085X (Rtl (-1.44) (3.4) = .166 D W = 1.56 Log-likelihood = g = 60.62 Skewness = 1.56 Kurtosis = 8.7

454

J. B. McDonald

Table 6 Estimates of/~: AMPCO - monthly returns Market Model: Yt ~- ~ + BXt + et (January 1988 - December 1992) OLS LAD BT GT EGB2 KERNEL -.0193 1.149 .166 --

p q R2

-.0169 -.0186 -.0187 .024 -.016 1.085 1.176 1.187 .878 .993 2.000 1.000 1.11 6303.4 .984 ~ cxD .0003 .552 .166 .166 .165 .160 .165 60.6 65.3 65.4 69.1 66.9

The skewness and kurtosis values suggest a problem with the assumption of normally distributed errors. This is confirmed bY a Jarque-Bera test as well as a goodness of fit test using 6 groups. The model was reestimated using LAD, BT, GT, EGB2 and Kernel specifications for the error distribution. The results are reported in Table 6: The BT, GT, and EGB2 specifications provide a statistically significant improvement in the log-likelihood value relative to the normality assumption (i.e. using least squares). There is considerable variation in the estimated of value of/L Only the EGB2 and Kernel estimators allow for skewed error distributions. The properties of these estimators need additional study. Two applications of partial adaptive estimation (not Kernel) can be found in Butler et.al (1990) and McDonald and Nelson (1993). The beta's were estimated for each of 45 randomly selected firms. N o n e of the 45 cases considered exhibited serious A R C H behavior of the error terms. This behavior would more likely be observed in weekly or daily returns.

3.5. Other applications

These applications of flexible parametric families of probability distributions are only suggestive of the breadth of potential uses of flexible parametric distributions. Other applications in finance might include models for A R I M A forecasting models with A R C H or G A R C H components, qualitative response models, and models for duration of business cycles. Estimation of these models is tractable. Still another application would be to make the parameters of the underlying distributions estimable functions of exogenous variables. This would permit possible modeling predicted shifts in distributions of interest.

Appendix A: Special functions


This section reviews some functions and notation discussed in the body of the paper. Abramowitz and Stegun (1964), Luke (1969), Rainville (1960), and

Probability distributions for financial models

455

Sneddon (1961) are useful references for those interested in additional background in this area. The gamma function, F(z) , is defined by

F(z) =

f0

e-ttZ-ldt

(A.1)

for real (z) > 0. Integrating (A.1) by parts yields the recurrence relation

F(z) = ( z - 1 ) F ( z - 1) .
Two helpful results are F(.5) = x/~ and

(A.2)

(A.3)

F(z) ~ e-ZzZ-5(Zrc) 5 as z ~ oc ,
Rainville (1960). The second result is known as Stirling's approximation. The beta function, B(p, q) , is defined by

(A.4)

B(p,q) =

tp-l(1 - t)q-ldt

(A.5)

t p-1 p+4dt = fo (I -+-~


for positive p and q. B(p, q) can also be expressed in terms of gamma functions as

r(p)r(q) ~(P' q) - r ( p + q)

(A.6)

The cumulative distribution functions considered in this paper can be expressed in terms of hypergeometric series whose representation is facilitated by the pochammer notation

(a)n (a)(a + 1)(a + 2 ) . . . (a + n -- 1)


= 1 forn=0 .

r ( r(a) a + n) for

l<n

(A.7)

The generalized hypergeometric series is defined by

pFq[al~a2,.. . ,ap;bl,b2,... ~bq;x] = ~

(al)i(a2)i" "'-(ap)iy~i (bl)i(b2)i "'" (bq)ii!

(A.8)

Two important special cases of the generalized hypergeometric series are the confluent hypergeometric series with (p -- q -- l)

456

J. B. McDonald
(al)i x i i=0 (bl)ii]

1Fl[al; bl;x] =

(A.9)

a n d the h y p e r g e o m e t r i c series with ( p = 2, q = 1) 2Fl [al, a2; bl;X] = ~ (al)i(a2)ixi i=0 (bl)i i! (A.10)

A s an e x a m p l e o f the flexibility o f these functions, the e x p o n e n t i a l f u n c t i o n e ~ a n d b i n o m i a l e x p a n s i o n o f (1 - x)" can be expressed as special cases o f generalized h y p e r g e o m e t r i c series

ex = tFl[a;a;x] a n d (1 - x ) n = lFo[-n;x] .
C u m u l a t i v e d i s t r i b u t i o n s functions for m a n y o f the r a n d o m variables considered in this p a p e r can be expressed in terms o f the i n c o m p l e t e g a m m a a n d i n c o m p l e t e b e t a functions defined by

7x(P) =

/o xe-ttp-ldt
(A.11)

= (x-fPp)lFl[p, p + l;-x]
Rainville (1960, p. 127) a n d

Bx(p, q) = fo x t p-1 (1 -- t)q-ldt = xP - - 2 F 1 [p, 1 - q; p + 1;x] ,


P L u k e (1969, Vol 2, p. 178)

(A.12)

Appendix B
DATA: 1. Selected Firms 1. Aileen Inc. 2. Aluminum Company Amer 3. American Home Products Corp. 4. Ampco-Pittsburg Corp. 5. Armatron International Inc. 6. Atlantic Energy Inc. N.J. 7. Becton Dickinson & Co. 8. Bethlehem Corp. 9. Brascan Ltd. 10. Brown Forman Inc. 11. Caterpillar Inc. 23. LVI Group Inc. 24. MEI Diversified Inc. 25. Manville Corp. 26. Masco Corp. 27. Mesabi Trust 28. Minnesota Power & Light 29. Nevada Power Co. 30. Nucor Corp. 31. Oneida Ltd. 32. Perkin Elmer Corp. 33. Proler International Corp.

Probability distributionsfor financial models

457

Appendix B

(Contd.)
34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. Quaker State Corp. Quantum Chemical Rockwell International Corp. Russell Corp. Ryder Systems Inc. SPS Technologies Inc. Speed O Print Business Mach. Thomas Industries Inc. Union Pacific Corp. Walgreen Co. Wheeling Pittsburgh Corp. Witco. Corp.

12, Cleveland Cliffs Inc. 13. Coastal Corp. 14. Cominco Ltd. 15. Crowley Milner & Co. 16. Curtiss Wright Corp. 17. Dole Food Co. 18. FPL Group Inc. 19. General Public Utils Corp. 20. Hapmpton Industries Inc. 21. Hershey Foods Corp. 22. KATV Industries Inc.

2. DATA AMPC 1.064220 1.017241 0.932203 1,041818 0.921053 1,000000 1.024762 0.934579 1.050000 0.977143 0.990196 1.075248 1.074074 1.017241 1.050847 0.948387 1.042735 0.950820 0.970690 1.080357 0.958678 0.936207 0.953704 0.899029 0.858696 0.924051 1.041096 1.021053 0.948052 0.917808 0.994030 0,924242 0.786885 0.887500 V-WNYSE 1.046050 1.048949 0.975659 1.010124 1.005238 1.048774 0.994076 0.971946 1.038680 1.023418 0.985484 1.018858 1.067892 0.980880 1.021679 1.047366 1.038815 0.997407 1.083675 1.020178 0.996213 0.972320 1.020250 1.021427 0.932313 1.013580 1.023607 0,973387 1.089550 0.994140 0.996466 0.912555 0.951331 0.992493 TREASURY BILLS 1.002942 1.004556 1.004407 1.004616 1.005053 1.004853 1.005072 1.005938 1.006167 1.006101 1.005662 1.00634l 1.005514 1.006131 1.006706 1.006748 1.007873 1.007093 1.006955 1.007392 1.006545 1.006765 1.006866 1.006069 1,005670 1,005679 1,006441 1,006873 1.006771 1.006251 1.006771 1.006572 0.005984 1.006818

458

J. B. McDonald

1.119048 1.119149 1.038462 1.240741 1.074627 0.952778 0.955882 0.892308 0.975862 0.875000 1.244898 1.059016 0.937500 1.033333 1.041935 1.046875 0.955224 0.978125 0.919355 0.982456 1.028571 0.842105 1.000000 0.991667 1.042553 1.469388

1.063259 1.028244 1.042467 1.072492 1.024085 1.002769 1.040264 0.957727 1.045763 1.024660 0.986335 1.014917 0.962219 1.106464 0.988254 1.011946 0.981095 1.023507 1.005404 0.984084 1.041030 0.980654 1.009555 1.006615 1.034350 1.014859

1.005651 1.005989 1.005177 1.004767 1.004391 1.005335 1.004721 1.004171 1.004884 1.004610 1.004558 1.004246 1.003915 1.003792 1.003391 1.002828 1.003376 1.003249 1.002758 1.003201 1.003077 1.002605 1.002573 0.002286 1.002346 1.002823

Acknowledgement
The Author expresses appreciation to Darin Clay and Julia Sunny for research assistance and to Scott Carson and Grant McQueen for their comments on an earlier draft of this paper.

References
Abramowitz, M. and I. A. Stegun (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards, Applied Mathematics Series No. 55, Washington, D.C. Aitchison, J. and J. A. C. Brown (1969). The Lognormal Distribution with Special References to lts Uses in Economics. Cambridge University press, Cambridge. Akgiray, V. and G. G. Booth (1988). The stable-law model of stock returns. J. Business Econom. Statist. 6(1), 51 57. Ali, M. M. (1975). Stochastic dominance and portfolio analysis. J. Financ. Econom. 2, 205-229. Arnold, B. (1983). Pareto Distributions. International Cooperative, Burtousville, MD. Arrow, J. K. (1965). Aspects of the Theory of Risk Bearing. Helsinki. Atkinson, A. B. (1970). On the measurement of inequality. J. Econom. Theory 2, 244-63.

Probability distributions for financial models

459

Basset, G. and R. Koenker (1978). Asymptotic theory of least absolute error regression. J. Amer.Statis. Assoc. 73, 618-622. Bishop, J. A., S. Chakraborti and P. D. Thistle (1989). Asymptotically distribution free statistical inference for generalized Lorenz curves. Rev. Econom. Statist. 71,725-727. Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Politic. Econom. 81, 637-659. Blattberg, R. C. and N. J. Gonedes (1974). A comparison of the stable and student distributions as statistical models for stock prices. J. Business 47, 244~280. Blattberg, R. and T. Sargent (1971). Regression with non-Gaussian disturbances: Some sampling results. Econometrica 39, 501-510. Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. J. Econometrics 31, 307-327. Bollerslev, T., R. Y. Chou and K. F. Kroner (1992). ARCH modeling in finance. J. Econometrics 52, 5-59. Bookstaber, R. M. (1987). Option Pricing and Investment Strategies. Probus Publishing Co., Chicago. Bookstaber, R. M. and J. B. McDonald (1987). A general distribution for describing security price returns. J. Business 60, 401~424. Butler, R. J. and J. B. McDonald (1989). Using incomplete moments to measure inequality. Jr. Econometrics 42, 109-119. Butler, R. J., J. B. McDonald, R. Nelson, and S. White (1990). Partially adaptive estimation of regression models. Rev. Econom. Statist. 72, 321-327. Clark, P. K. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-155. Cornell, D. and J. K. Dietrich (1978). Mean-absolute-deviation versus least squares regression estimation of beta coefficients. J. Financ. Quant. Anal. 13, 123-131. Coursey, D. and H. Nyquist (1983). On least absolute error estimation with linear regression models with dependent stable residuals. Rev. Econom. Statist. 65, 687 692. Cox, J. C. and S. A. Ross (1976). The valuation of options for alternative stochastic processes. J. Financ. Econom. 3, 145-166. Elderton, Sir W. P. and N. L. Johnson (1969). Systems o f Frequency Curves. Cambridge University Press, London. Elton, E. J. and M. J. Gruber (1974). Portfolio theory when investment relatives are lognormally distributed. J. Finance 29, 126~1273. Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflations. Econometrica 50, 987-1008. Fama, E. F. and R. Roll (1968). Some properties for symmetric stable distributions. J. Amer. Statist. Assoc. 63, 817-836. Fishburn, P. C. (1964). Decision and Value Theory. Wiley, New York. Greene, W. H. (1993). Econometric Analysis. Macmillan, New York. Hagerman, R. L. (1978). More evidence on the distribution of security returns. J. Finance 33, 1213-1221. Hanoch, G. and H. Levy (1969). The efficiency analysis of choices involving risk. Rev. Econom. Stud. 36, 33~346. Hirschberg, J., S. Mazumdar, D. Slottje and G. Zhang (1992). Analyzing functional forms of stock returns. J. Appl. Financ. Econom. 2(4), 221-227. Hogg, R. V. (1974). Adaptive robust procedures: A partial review and some suggestions for future applications and theory. J. Amer. Statist. Assoc. 69, 909-927. Hsieh, D. A. and C. F. Manski (1987). Monte Carlo evidence on adaptive maximum likelihood estimation of a regression. Ann. Statist. 15, 541-551. Huber, P. J. (1981). Robust Statistics. Wiley, New York. Hull, J. and A. White (1987). The pricing of options on assets with stochastic volatilities. J. Finance. 52, 281-300. Johnson, N. L. and S. Kotz (1970). 

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved.

15

Bootstrap Based Tests in Financial Models*

G. S. Maddala and Hongyi Li

1. Introduction

Bootstrap methods initiated by Efron (1979) have been widely used, during the last decade, in the financial literature for a variety of purposes including the following:

(i) To obtain small sample standard errors, e.g., Akgiray and Booth (1988) and Badrinath and Chatterjee (1991).
(ii) To get significance levels for tests, e.g., Hsieh and Miller (1988) and Shea (1989a,b).
(iii) To get significance levels for trading rule profits, e.g., Levich and Thomas (1993) and LeBaron (1994).
(iv) To develop empirical approximations to population distributions, e.g., Bookstaber and McDonald (1987).
(v) To use trading rules on bootstrapped data as a test for model specification, e.g., Brock, Lakonishok and LeBaron (1992), LeBaron (1991, 1992), Kim (1994) and Karolyi and Kho (1994).
(vi) To check the validity of long-horizon predictability, e.g., Goetzmann and Jorion (1993), Nelson and Kim (1993), Mark (1995), Choi (1994), and Chen (1995).
(vii) Impulse response analysis in non-linear models, e.g., Gallant, Rossi and Tauchen (1993) and Tauchen, Zhang and Liu (1994).

Some of the applications of bootstrap methods in finance that are reviewed here are defective only in light of recent developments in bootstrap methods. However, it is best to review them in light of these recent developments so that refinements can be made in the use of bootstrap methods in the future. Before we review these studies, we will first discuss the relevant issues in the application of bootstrap methods in financial models.

* We would like to thank Steve Cosslett and Nelson Mark for many helpful comments. The usual disclaimer applies.


2. A review of different bootstrap methods

Most financial models involve time series data. With time series data, the standard bootstrap method relevant for IID observations is not valid. Some alternatives are the recursive bootstrap, the moving block bootstrap and the stationary bootstrap. We will give a brief outline of these alternatives. First we start with the standard bootstrap.
2.1. The standard bootstrap

Let $(y_1, y_2, \ldots, y_n)$ be a random sample from a distribution characterized by a parameter $\theta$. Inference about $\theta$ will be based on a statistic $T$. The basic bootstrap approach consists of drawing repeated samples (with replacement) of size $m$, which may or may not be equal to $n$, although it usually is. Call this sample $(y_1^*, y_2^*, \ldots, y_m^*)$; this is the bootstrap sample. We do this $N_B$ times and for each bootstrap sample we compute the statistic $T$. Call this $T^*$. The distribution of $T^*$ based on the $N_B$ bootstrap samples is known as the bootstrap distribution of $T$. We use this to make inferences about $\theta$. This procedure has been extended to classical regressions by Freedman (1981a, b). In the case of the classical regression model, it is the residuals that are resampled. Needless to say, when the errors are not IID, one needs to modify this procedure.
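As a concrete illustration of these steps, here is a minimal sketch in Python (ours, not from the chapter): the simulated sample, the choice of the mean as the statistic $T$, and the number of replications $N_B$ are all illustrative.

```python
import numpy as np

def standard_bootstrap(y, stat, NB=1000, m=None, seed=0):
    """IID bootstrap: resample y with replacement NB times and return
    the bootstrap distribution of the statistic `stat`."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    m = len(y) if m is None else m           # resample size, usually m = n
    return np.array([stat(rng.choice(y, size=m, replace=True))
                     for _ in range(NB)])

# Illustrative use: bootstrap distribution of the sample mean
y = np.random.default_rng(1).standard_normal(100)
T_star = standard_bootstrap(y, np.mean, NB=2000)
print(T_star.mean(), T_star.std())            # centre and spread of T*
```

For a classical regression, the same resampling step would be applied to the residuals rather than to the observations themselves.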
2.2. The recursive bootstrap

To deal with lagged dependent variables and serially correlated errors with a well specified structure (say stationary ARMA(p, q) models with known p and q), one can use the recursive bootstrap method, first introduced by Freedman and Peters (1984). This method was also used by Efron and Tibshirani (1986) for bootstrapping the AR(1) and AR(2) models. In the recursive bootstrap method one estimates the model by OLS, or some other consistent method, obtains the residuals and (after rescaling and centering) resamples them. With the resampled residuals, one next generates the bootstrap samples recursively. In the case of a regression model with, say, AR(1) errors, such as

$$y_t = \beta x_t + u_t \qquad (1)$$
$$u_t = \rho u_{t-1} + \epsilon_t \qquad (2)$$

where $\epsilon_t \sim \text{IID}(0, \sigma^2)$, one estimates equation (1) by OLS, then, using the estimated residuals $\hat{u}_t$, one estimates $\rho$ using the Cochrane-Orcutt or Prais-Winsten procedure and obtains $\hat{\epsilon}_t$. Then one resamples $\hat{\epsilon}_t$ and, using a recursive procedure, generates $u_t^*$ and the bootstrap sample on $y_t$.
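A minimal sketch of this recursive scheme for the model (1)-(2) is given below; it is our illustration, with simple OLS and autocorrelation estimates standing in for whichever consistent estimators one prefers, and the rescaling of the residuals mentioned in the text is omitted for brevity.

```python
import numpy as np

def recursive_bootstrap_ar1(y, x, NB=1000, seed=0):
    """Recursive bootstrap for y_t = beta*x_t + u_t, u_t = rho*u_{t-1} + e_t.
    Returns NB bootstrap estimates of beta."""
    rng = np.random.default_rng(seed)
    y, x = np.asarray(y, float), np.asarray(x, float)
    beta = (x @ y) / (x @ x)                      # OLS estimate of beta
    u = y - beta * x                              # regression residuals
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])    # AR(1) coefficient of u
    e = u[1:] - rho * u[:-1]                      # innovation residuals
    e = e - e.mean()                              # centre before resampling
    betas = np.empty(NB)
    for b in range(NB):
        e_star = rng.choice(e, size=len(y), replace=True)
        u_star = np.empty(len(y))
        u_star[0] = u[0]                          # initial condition
        for t in range(1, len(y)):                # recursive generation of u*
            u_star[t] = rho * u_star[t - 1] + e_star[t]
        y_star = beta * x + u_star                # bootstrap sample on y
        betas[b] = (x @ y_star) / (x @ x)         # re-estimate beta
    return betas
```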
2.3. The moving block bootstrap

Application of the recursive bootstrap method is straightforward if the error distribution is specified to be a stationary ARMA(p, q) process with known p and q. However, if the structure of serial correlation is not tractable or is misspecified, the residual based methods will give inconsistent estimates (if lagged dependent variables are present in the system). Other approaches which do not require fitting the data into a parametric form have been developed to deal with general dependent time series data. Carlstein (1986) first discussed the idea of bootstrapping blocks of observations rather than the individual observations. The blocks he considers are non-overlapping. Later, Künsch (1989) and Liu and Singh (1992) (the paper was available as a discussion paper in 1988) independently introduced a more general bootstrap procedure, the moving block bootstrap, which is applicable to stationary time series data. In this method the blocks of observations are overlapping.

The methods of Carlstein (non-overlapping blocks) and Künsch (overlapping blocks) both divide the data of $n$ observations into blocks of length $l$ and select $b$ of these blocks (with repeats allowed) by resampling with replacement from all the possible blocks. Let us for simplicity assume $n = bl$. In the Carlstein procedure, there are just $b$ blocks. In the Künsch procedure there are $n - l + 1$ blocks. The blocks are $L_k = \{x_k, x_{k+1}, \ldots, x_{k+l-1}\}$ for $k = 1, 2, \ldots, n - l + 1$. For example, with $n = 6$ and $l = 3$ suppose the data are: $x_t = \{7, 2, 3, 6, 1, 5\}$. The blocks according to Carlstein are {(7,2,3), (6,1,5)}. The blocks according to Künsch are {(7,2,3), (2,3,6), (3,6,1), (6,1,5)}. Now draw a sample of two blocks with replacement in each case. Suppose the first draw gave (7,2,3). The probability that the observations in the block (6,1,5) are missed entirely is 1/2 in Carlstein's scheme and 1/4 in the moving block scheme. Thus there is a higher probability of missing entire blocks in the Carlstein scheme. For this reason, it is not popular, and is not often used. The literature on blocking methods is mostly on the estimation of the sample mean and its variance, although Liu and Singh (1992) talk about the applicability of the results to more general statistics, and Künsch (1989, p. 1235) discusses the AR(1) and MA(1) models.
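The resampling step of the moving block (Künsch) scheme can be sketched as follows, under the simplifying assumption $n = bl$ used in the text; with block length $l = 3$ the blocks are exactly those of the $n = 6$ example above. This is our illustration, not a library routine.

```python
import numpy as np

def moving_block_bootstrap(x, l, seed=0):
    """Resample a series by drawing overlapping blocks of length l
    (Kuensch-style) with replacement and concatenating them."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    blocks = np.array([x[k:k + l] for k in range(n - l + 1)])  # n-l+1 blocks
    b = int(np.ceil(n / l))                        # blocks needed to cover n
    idx = rng.integers(0, len(blocks), size=b)     # draw blocks with replacement
    return np.concatenate(blocks[idx])[:n]         # truncate to original length

x = np.array([7, 2, 3, 6, 1, 5])
print(moving_block_bootstrap(x, l=3))              # one resampled series of length 6
```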

2.4. The stationary bootstrap


The pseudo time series generated by the moving block method is not stationary, even if the original series $\{x_t\}$ is stationary. For this reason, Politis and Romano (1994) suggest the stationary bootstrap method. The basic steps for the stationary bootstrap are the same as those of the moving block bootstrap. However, there is one major difference between the sampling schemes of the moving block bootstrap and the stationary bootstrap. The stationary bootstrap resamples data blocks of random length, where the length of each block has a geometric distribution with parameter $p$, while the moving block bootstrap resamples blocks of data of the same length. There is some discussion of the optimal choice of the block length and of $p$ in the papers by Carlstein (1986), Künsch (1989), Hall and Horowitz (1993) and Politis and Romano (1994). These rules are merely suggestive in small sample cases. More experience is needed on these choices. Furthermore, when using the blocking methods one needs to modify the test statistics as well, as discussed in Hall and Horowitz (1994). There are, as yet, no applications in the financial literature using the blocking methods. Here we mention them as viable alternatives.
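For comparison with the moving block scheme, here is a sketch of the stationary bootstrap resampling step: block starting points are drawn at random and block lengths are geometric with parameter p (so the mean block length is 1/p). The wrap-around at the end of the series is the usual convention, and the default value of p is an illustrative choice of ours.

```python
import numpy as np

def stationary_bootstrap(x, p=0.2, seed=0):
    """Politis-Romano stationary bootstrap: concatenate blocks whose
    lengths are Geometric(p), wrapping around the end of the series."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.integers(0, n)                 # random starting point
        length = rng.geometric(p)                  # random block length
        out.extend(x[(start + j) % n] for j in range(length))  # circular block
    return np.array(out[:n])
```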

3. Issues in the generation of bootstrap samples and the test statistics


There are three important issues that need to be resolved in the use of bootstrap methods in financial models. These are:

(1) Whether to bootstrap the residuals or the data.
(2) If it is the residuals, how should the residuals be generated?
(3) How should the appropriate test statistics be defined?

Regarding question (1), although bootstrapping the residuals is a common procedure, there have been some examples in the literature where bootstrapping the data has been suggested. This alternative, however, is not a valid one in the case of time series models. There are quite a few applications of this method in finance, which are reviewed in the next section. For the case of random regressors (which he calls the "correlation model" as opposed to the "regression model"), Freedman (1981a) suggests resampling the pairs $(y, x)$, which have a joint distribution with $E(y|x) = x\beta$. Efron (1981) uses the direct method of resampling the data in a problem involving censored data. The direct method of bootstrapping the data has also been advocated in Efron and Gong (1983). The main problem with the direct method is that no specific model is assumed, and for this reason it can result in investigators not doing any specification testing before bootstrapping. This is the case, for instance, in the study by Levich and Thomas (1993). It is always important to do some specification testing before bootstrapping; otherwise we would be bootstrapping the wrong model. For this reason we do not recommend bootstrapping the data, particularly in the case of time series models and cointegrating regressions. In the case of I(1) data, resampling the data destroys the I(1) property. Resampling the residuals uses more information because, after all, we are interested in estimating a model with the bootstrap data, and whatever model we estimate should also be a part of the information used in the process of bootstrap data generation. This point is elaborated in Li and Maddala (1996a).

The next question is: If it is the residuals that we use in resampling, how should the residuals be generated? To focus on the issues, consider a simple regression model:
$$y_t = \beta x_t + u_t \ . \qquad (3)$$

Let $\hat{\beta}$ be the OLS estimator of $\beta$, and $\hat{u}_t$ the OLS residual. If we are testing the hypothesis $\beta = \beta_0$, then we should use the residuals $\tilde{u}_t = y_t - \beta_0 x_t$ for resampling. The reason for this is that, if the null $H_0: \beta = \beta_0$ is true but the OLS estimator $\hat{\beta}$ gives a value of $\beta$ far away from $\beta_0$, the empirical distribution of the OLS residuals will suffer from a poor approximation of the distribution of the errors under the null.


If equation (3) is a cointegrating regression, so that $y_t$ and $x_t$ are I(1) and $u_t$ is I(0), then just bootstrapping the residuals $\tilde{u}_t$ is not enough. We should also make use of the information that $x_t$ is I(1). Suppose we write $\Delta x_t = v_t$, where $v_t$ is I(0). Then we resample the pairs $(\tilde{u}_t, \hat{v}_t)$ in the bootstrap data generation. This is what was done in Li and Maddala (1996b). Thus, it is important to take account of the structure of the model in the generation of bootstrap samples.

Coming next to the problem of using bootstrap methods for bias correction, in this case it makes more sense to bootstrap the residuals $\hat{u}_t$ rather than $\tilde{u}_t$. However, since bootstrap methods are time consuming, one might use the same bootstrap samples for both hypothesis testing and for bias correction. But the formulae to be used for bias correction will be different depending on whether $\hat{u}_t$ or $\tilde{u}_t$ are resampled. If we denote $\hat{\beta}_i^*$ as the estimator of $\beta$ from the $i$-th bootstrap sample and define $\bar{\beta}^* = (N_B)^{-1} \sum_i \hat{\beta}_i^*$, where $N_B$ is the number of bootstrap samples, then the bias-corrected estimator of $\beta$ is

$$\hat{\beta}_{bc} = \hat{\beta} + (\hat{\beta} - \bar{\beta}^*) \qquad (4)$$

if we use $\hat{u}_t$ for bootstrapping, and

$$\hat{\beta}_{bc} = \hat{\beta} + (\beta_0 - \bar{\beta}^*) \qquad (5)$$

if we use $\tilde{u}_t$ for bootstrapping. Thus, which residuals to use for bootstrapping hinges on the purpose of the bootstrap method: whether it is hypothesis testing or bias correction. In the former case we should use $\tilde{u}_t$ and not $\hat{u}_t$. In the latter case either one can be used, but the bias correction formulae will be different.

Other resampling schemes have also been discussed in the literature (see Giersbergen and Kiviet, 1994). Let $u_t^*$ be the resampled residuals obtained by resampling the OLS residuals $\hat{u}_t$. Then consider the two sampling schemes:

$$S_1: \quad y_t^* = \hat{\beta} x_t + u_t^* \qquad (6)$$
$$S_2: \quad y_t^* = \beta_0 x_t + u_t^* \qquad (7)$$

The resampling we discussed earlier is

$$S_3: \quad y_t^* = \beta_0 x_t + \tilde{u}_t^* \qquad (8)$$

where $\tilde{u}_t^*$ is the residual obtained by resampling $\tilde{u}_t = y_t - \beta_0 x_t$. Note that both $S_1$ and $S_2$ use the OLS residuals $\hat{u}_t$ for resampling. Hall and Wilson (1991) provide two general guidelines for hypothesis testing using sampling scheme $S_1$. It should be noted that these guidelines were not discussed explicitly in the context of regression models, but they hold in these cases. The first suggests using the bootstrap distribution of $(\hat{\beta}^* - \hat{\beta})$ but not $(\hat{\beta}^* - \beta_0)$, where $\hat{\beta}^*$ is the estimate of $\beta$ from the bootstrap sample. The second guideline suggests using a properly studentized statistic, that is, $(\hat{\beta}^* - \hat{\beta})/\hat{\sigma}^*$ and not $(\hat{\beta}^* - \hat{\beta})/\hat{\sigma}$ or just $(\hat{\beta}^* - \hat{\beta})$, where $\hat{\sigma}^*$ is the estimate of $\sigma$ from the bootstrap sample and $\hat{\sigma}$ is the estimate of $\sigma$ from the OLS regression.


Suppose we define the test statistics:

$$T_1: \quad T(\hat{\beta}) = (\hat{\beta}^* - \hat{\beta})/\hat{\sigma}^* \qquad (9)$$
$$T_2: \quad T(\beta_0) = (\hat{\beta}^* - \beta_0)/\hat{\sigma}^* \qquad (10)$$

$T_1$ is the appropriate test statistic for $S_1$, and $T_2$ is the appropriate test statistic for sampling schemes $S_2$ and $S_3$. As mentioned earlier, for hypothesis testing, sampling scheme $S_3$ is the most appropriate one. Rayner (1990) used sampling scheme $S_2$ and test statistic $T_2$. In the case of unit root models, Basawa et al. (1991a) prove that sampling scheme $S_1$ is not appropriate. Basawa et al. (1991b) use the test statistic $n(\hat{\beta}^* - 1)$ with sampling scheme $S_3$. Ferretti and Romo (1994) show that the test statistic $n(\hat{\beta}^* - 1)$ with sampling scheme $S_2$ can also be used in bootstrap tests of unit roots. The preceding discussion outlines the different resampling schemes for generating bootstrap samples and their applicability in different contexts. These results should be borne in mind while using bootstrap methods for hypothesis testing and/or bias correction.

Finally, there is the issue relating to the type of statistics to use for bootstrapping when procedures like the moving block bootstrap are used. Davison and Hall (1993) argue that this creates problems in using the percentile-t method with the moving block bootstrap. They suggest that the usual variance estimator $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2$ be modified to $\hat{\sigma}^2 = n^{-1}\{\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 + 2\sum_{k=1}^{l-1}\sum_{i=1}^{n-k}(x_i - \bar{x}_n)(x_{i+k} - \bar{x}_n)\}$, where $l$ is the block length. With this modification the bootstrap-t can improve substantially on the normal approximation. The reason for this bias in the estimator of the variance is that the block bootstrap method damages the dependence structure of the data. Unfortunately this formula is valid only for the variance of $\sqrt{n}\,\bar{x}_n$. For more complicated problems there is no such simple correction available.

In a subsequent paper, Hall and Horowitz (1994) investigate this problem in the context of tests based on GMM estimators. They argue that because the blocking methods do not replicate the dependence structure of the original data, it is necessary to develop special bootstrap versions of the test statistics, and these must have the same distribution as the sample version of the test statistics through $O_p(n^{-1})$. They derive the bootstrap versions of the test statistics with Carlstein's blocking scheme (non-overlapping blocks) but argue that Künsch's blocking scheme is more difficult to analyze owing to its use of overlapping blocks. In the case of hypothesis tests in cointegrating regressions based on the moving block scheme, the derivation of the appropriate bootstrap versions of the test statistics is still more complicated. Although the use of the bootstrap version of the usual test statistics cannot be theoretically justified, the Monte Carlo results reported in Li and Maddala (1996b) unequivocally indicate considerable improvement over the asymptotic results. Thus, in spite of the lack of an explicit theoretical justification, using the usual test statistics and bootstrapping them produces substantial improvement over asymptotic results.
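As a concrete illustration of the recommended combination, the following sketch implements sampling scheme $S_3$ with the studentized statistic $T_2$ for testing $H_0: \beta = \beta_0$ in the simple regression (3), under the IID-error assumption. It is a minimal illustration of ours: the centering of the restricted residuals and the homoskedastic standard error are simplifying choices.

```python
import numpy as np

def bootstrap_test_beta(y, x, beta0, NB=999, seed=0):
    """Bootstrap test of H0: beta = beta0 in y_t = beta*x_t + u_t using
    sampling scheme S3 (resample restricted residuals) and statistic T2."""
    rng = np.random.default_rng(seed)
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)

    def ols(yy):
        b = (x @ yy) / (x @ x)
        resid = yy - b * x
        se = np.sqrt(resid @ resid / (n - 1) / (x @ x))   # SE of beta-hat
        return b, se

    b_hat, se_hat = ols(y)
    t_sample = (b_hat - beta0) / se_hat            # statistic from the data
    u_tilde = y - beta0 * x                        # restricted residuals
    u_tilde = u_tilde - u_tilde.mean()
    t_star = np.empty(NB)
    for i in range(NB):
        y_star = beta0 * x + rng.choice(u_tilde, size=n, replace=True)  # S3
        b_star, se_star = ols(y_star)
        t_star[i] = (b_star - beta0) / se_star     # T2, studentized under H0
    pval = np.mean(np.abs(t_star) >= abs(t_sample))   # two-sided p-value
    return t_sample, pval
```

The bootstrap estimates $\hat{\beta}^*$ computed inside the loop could also be stored and averaged to form the bias correction in (5).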


4. A critique of the application of bootstrap methods in financial models


In the light of the preceding discussion we will now review some studies in finance using the bootstrap methods. The main problem with most studies is the use of the standard bootstrap based on the assumption of IID observations (or residuals). This is particularly questionable with the use of cointegrating regressions.
4.1. Bootstrapping the data

Bootstrapping the data has been in wide use in the financial literature. For instance, Bookstaber and McDonald (1987) (referred to as B-M) use it to generate a large number of samples from the original data. They need a large data set so that they can discriminate well between the different classes of distributions they consider. They start with 500 daily return observations dating from December 30, 1981, on 21 randomly chosen stocks. From the sample of 500 observations, they sample randomly with replacement 250,000 times. The resulting bootstrapped sample can be regarded as one 250,000-element data set. They then multiply the first 250 observations on daily returns to get a 250-day return and do this for each group of 250 observations. They thus have a sample of 1,000 observations on 250-day returns. The main problem with this study is the use of the standard bootstrap method, which assumes that the observations are IID.

Chatterjee and Pari (1990) consider the bootstrap method to determine the number of factors in the return generating process assumed by the APT (arbitrage pricing theory). They argue that the usual chi-square test overestimates the number of factors. The bootstrap alternative, in their example, suggests a one-factor model to be plausible. There are two problems with this study. The bootstrap approach in this study (as with many others) is based on the assumption that daily returns are independent. There is now substantial evidence against this assumption. The second issue is the use of t-statistics, essentially from what amounts to a percentile method. There is, again, substantial evidence to show that the bootstrap-t method or the bias-corrected percentile methods are more reliable than the simple percentile method. Thus both the process of generating bootstrap samples and the construction of test statistics can be substantially improved in light of the developments in bootstrap methodology since Efron's 1979 paper.

Hsieh and Miller (1990) (abbreviated as H-M) also use the method of bootstrapping the data. They are interested in estimating the effect of margin requirements on stock market volatility. The original sample consists of 14,118 daily stock returns covering the period October 1934-December 1987. There were 22 margin changes during the period. In their first tests, H-M use the modified Levene statistic suggested by Brown and Forsythe (1974) to test whether the standard deviation of stock returns in the 25 days preceding the margin changes is the same as that in the succeeding 25 days. To assess the distribution of this statistic, they obtain the significance levels from a bootstrap distribution.


Since the assumption of independence of daily returns may not be valid, H-M next consider monthly returns. Monthly returns show very little autocorrelation but the distribution departs significantly from normality. The data consist of 629 observations on y = monthly returns and x = margin requirements. H-M leave y as fixed and resample only the observations on x. This is different from the resampling advocated by Efron, which resamples the (y, x) pairs. They argue that the Efron procedure breaks the conditional heteroskedasticity of the stock market returns whereas their procedure preserves it. But the resampling scheme used by H-M is not valid because it violates the relationship between y and x. A better procedure than the one followed by H-M is to estimate a regression of the form

$$\text{volatility} = \alpha + \beta\,(\text{margin}) \qquad (11)$$

and resample the residuals from this regression. The main interest is in this regression rather than the bootstrap distribution of the modified Levene statistic. This statistic tests the equality of variances before and after the margin change (a two-sided test), whereas the null hypothesis calls for a one-sided test, that a margin increase (decrease) decreases (increases) stock market volatility.

Levich and Thomas (1993) is another example of bootstrapping the data. They use the bootstrap method to get standard errors for trading rule profits in the foreign exchange markets and to test their "statistical significance". They generate bootstrap samples by random sampling from the first differences of the data, holding the starting and ending period values fixed, and calculating trading rule profits from the bootstrap samples. This type of resampling is valid only if the original series is a random walk. Thus, the standard errors and significance tests that Levich and Thomas use are valid only under a very restrictive assumption about the time series. In fact, specification tests using trading rule profits (discussed later) have shown that the random walk model is not a valid characterization of the data. Levich and Thomas discuss the statistical significance of trading rule profits. A more interesting question is to investigate the "economic significance" as done in LeBaron (1991, 1994). He tests whether profits in the foreign exchange markets are significantly different from those in alternative investments. Bootstrap methods with trading rules have been more fruitfully used as a tool for model specification tests. These are discussed later in Section 5.
4.2. Bootstrap methods for standard errors

The earliest applications of bootstrap methods consisted of using bootstrap distributions to get small sample standard errors of estimates. It was soon recognized that the bootstrap distribution can be skewed, and getting the standard errors and applying the usual tests of significance (based on symmetric distributions like the t and Normal) is not advisable. To solve this asymmetry problem the bootstrap distribution can be directly used to construct the confidence intervals. If $\hat{\theta}$ is a consistent estimator of $\theta$ and $\hat{\theta}^*$ is the bootstrap estimator of $\theta$, then the two-sided $(100 - 2\alpha)\%$ confidence interval for $\theta$ is

$$(\hat{\theta}^*_{\alpha},\ \hat{\theta}^*_{1-\alpha}) \ . \qquad (12)$$

This is a two-sided equal-tailed interval which is often non-symmetric. This method is known as the percentile method. Later it was discovered that this simple percentile method does not give accurate coverage probabilities, and Efron (1987) suggested the bias-corrected and accelerated bias-corrected confidence interval methods. However, these are rather complicated to compute, and an alternative, computationally simpler procedure is the percentile-t method (see Hall 1988, 1992). This is the percentile method based on the bootstrap distribution of the t-statistic

$$t = \sqrt{n}(\hat{\theta} - \theta)/s \qquad (13)$$

where $s^2$ is a $\sqrt{n}$-consistent estimator of the variance of $\sqrt{n}(\hat{\theta} - \theta)$. This procedure is often referred to as studentization, and $t$ is said to be "asymptotically pivotal" (a pivotal statistic is one whose distribution is independent of the true parameter $\theta$). Hartigan (1986) stressed the importance of using a pivotal statistic. See also Beran (1987, 1988). These procedures for the construction of confidence intervals are all reviewed in DiCiccio and Romano (1988) and Hall (1988b, 1992) and we shall not repeat the details.

In the financial literature, however, we see the use of standard errors and the simple percentile method. Akgiray and Booth (1988), for instance, use the bootstrap method to get standard errors for estimates from 4-parameter stable laws. Badrinath and Chatterjee (1991) use bootstrap methods to get standard errors for estimates of parameters from Tukey's g and h distributions and compare the bootstrap standard errors with the asymptotic standard errors. There are several other cases in the financial literature that rely on just bootstrap standard errors and the simple percentile method. There are some cases where the asymptotic variance is not readily available and the percentile method is the only alternative. In these cases one has to be satisfied with the percentile method. Of course, one can use the double bootstrap method of Beran (1987, 1988) or some other iterative procedure, but this could be computationally very cumbersome in these situations.
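To show the difference in mechanics, here is a small sketch (ours) that computes both the simple percentile interval (12) and the percentile-t interval based on (13) for the mean of an IID sample; the statistic, the number of replications and the nominal level are illustrative.

```python
import numpy as np

def percentile_and_t_intervals(y, NB=2000, alpha=0.05, seed=0):
    """Return the simple percentile CI and the percentile-t CI for the mean."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    n = len(y)
    theta_hat = y.mean()
    s_hat = y.std(ddof=1)
    theta_star, t_star = np.empty(NB), np.empty(NB)
    for b in range(NB):
        yb = rng.choice(y, size=n, replace=True)
        theta_star[b] = yb.mean()
        t_star[b] = np.sqrt(n) * (yb.mean() - theta_hat) / yb.std(ddof=1)
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    ci_pct = np.percentile(theta_star, [lo, hi])                   # as in (12)
    q_lo, q_hi = np.percentile(t_star, [lo, hi])
    ci_t = (theta_hat - q_hi * s_hat / np.sqrt(n),                 # percentile-t
            theta_hat - q_lo * s_hat / np.sqrt(n))
    return ci_pct, ci_t
```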
4.3. Bootstrap based tests of hypotheses

An example of this is the study by Lamoureux and Lastrapes (1990), to be referred to as L-L. However, in this study the hypotheses to be tested are not correctly formulated. Hence the use of the bootstrap method is suspect, although the conclusions are perhaps valid. Although there are several studies using the bootstrap approach to hypothesis testing, we discuss here the paper by L-L. Other papers are discussed in the following sections.

The point that L-L want to make is that the IGARCH model can arise from a GARCH model with structural change, and thus the empirical evidence in favor of the IGARCH is suspect. They estimate two GARCH(1,1) models, one without structural change and another allowing for structural change through the introduction of 13 dummy variables. The data are daily stock returns on 30 large companies over the period January 1, 1963 to November 13, 1979 (a total of 4,228 observations). Denoting by $h_t$ the conditional variance of stock returns, the two GARCH(1,1) models L-L consider are:

Model A:
$$y_t = x_t\beta + \epsilon_t \qquad (14)$$
$$(\epsilon_t \mid \epsilon_{t-1}, \epsilon_{t-2}, \ldots) \sim N(0, h_t) \qquad (15)$$
$$h_t = \omega + \lambda h_{t-1} + \alpha v_{t-1} \qquad (16)$$

where $v_{t-1}$ is a serially uncorrelated innovation.

Model B: same as Model A with 13 dummies added to allow for structural change in $\omega$ (they are exogenously picked on the basis of some prior information).

The average value of $\lambda$ for the 30 companies was 0.978 under Model A, and 0.817 under Model B, thus suggesting that the IGARCH model can arise from a GARCH model with structural shifts. For some companies (# 16, 18, 20 for instance) the difference was large, but for a few (# 23 for instance) the change was very small. The results were (value of $\lambda$):

Company    Model A    Model B
# 16        0.938      0.641
# 18        0.964      0.587
# 20        1.012      0.687
# 23        0.992      0.981

L-L argue (p. 228) that "the desired test is the null hypothesis that $\lambda$ in the restricted model equals $\lambda$ in the unrestricted model against the alternative that the latter parameter is less than the former". This formulation is not appropriate. A classical hypothesis test cannot refer to two incompatible models. There are two alternative tests that one can conduct:

(i) Test the hypothesis that the structural shift dummies are zero. If this hypothesis is rejected, then Model B is the correct model and Model A is misspecified.
(ii) Test the hypothesis that $\lambda = 1$ in Model B, i.e., test the hypothesis that the IGARCH specification holds for the model with structural change. If this hypothesis is rejected, the observed IGARCH is due to ignoring structural change (as the authors argue).

The appropriate way of generating the bootstrap samples depends on which of these hypotheses is considered. For (i), one generates the data under the null that the structural dummies are zero and considers the bootstrap distribution of the relevant F-statistic. For hypothesis (ii), one has to generate the bootstrap data for Model B under the null $\lambda = 1$ (or 0.99) and consider the bootstrap distribution of $\hat{\lambda}$. This of course is more complicated. In both cases the relevant tests are conducted starting with Model B.

The bootstrap data generation actually used by L-L is as follows (p. 228): "500 bootstrap samples are drawn from the standardized residuals of the restricted GARCH(1,1) model for company # 16 ... The bootstrap residuals ... are transformed into a GARCH(1,1) with $\lambda = 0.99$. For each of the 500 realizations, the general GARCH model (Model B) is estimated and the parameters saved. The 500 estimates of $\lambda$ define the empirical distribution under the null." The bootstrap data generation used by L-L is correct for testing the null hypothesis that $\lambda = 0.99$ in Model A (for company 16). It is not appropriate for the hypotheses of interest here. The hypothesis refers to the validity of IGARCH for Model B. Thus, the data generation has to start with Model B under the null that $\lambda = 0.99$. The basic issue here is that Model A is misspecified in the sense that it ignores structural change, and that Model B is the correctly specified model. One should not generate samples with a misspecified model and start making inferences about the parameters of a correctly specified model. This example illustrates the importance of correct formulation of the hypotheses and of a correct way of bootstrap data generation before jumping on the bootstrap bandwagon.
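As an illustration of the kind of data generation advocated here, the following sketch builds one bootstrap error series from a GARCH(1,1) model with the persistence parameter held at its null value (e.g. 0.99) by resampling standardized residuals. It is not L-L's code: the inputs (`std_resid`, `omega`, `alpha`, `h0`) stand for whatever the estimation of Model B delivers, and the sketch uses the standard GARCH(1,1) recursion in terms of the squared lagged innovation rather than the $v_{t-1}$ notation of (16). The bootstrap returns would then be obtained by adding these errors to the fitted mean equation (with the structural dummies included) before re-estimating Model B.

```python
import numpy as np

def garch_bootstrap_sample(std_resid, omega, alpha, lam, h0, seed=0):
    """One bootstrap GARCH(1,1) error series with the persistence parameter
    lam imposed (e.g. 0.99 under the null), built by resampling standardized
    residuals z_t and using h_t = omega + alpha*eps_{t-1}^2 + lam*h_{t-1}."""
    rng = np.random.default_rng(seed)
    z = rng.choice(np.asarray(std_resid, float),
                   size=len(std_resid), replace=True)
    n = len(z)
    h = np.empty(n)
    eps = np.empty(n)
    h[0] = h0                          # starting value for the conditional variance
    eps[0] = np.sqrt(h[0]) * z[0]
    for t in range(1, n):
        h[t] = omega + alpha * eps[t - 1] ** 2 + lam * h[t - 1]
        eps[t] = np.sqrt(h[t]) * z[t]
    return eps, h                      # bootstrap errors and conditional variances
```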
4.4. Bootstrap methods for cointegrating systems

There have not been many applications of bootstrap methods applied to cointegrated systems in the financial literature. Shea (1989a, b) is an exception. He is concerned with the biases in the test statistics in tests of the present value relation and uses bootstrap methods. To do this he starts with the cointegration model developed by Campbell and Shiller (1987). The present value relation for two variables $x_t$ and $y_t$ states that $y_t$ is a linear function of the present discounted value of expected values of $x_t$. Campbell and Shiller show that the present value relation implies that stock prices and dividends are cointegrated when prices and dividends are both I(1). Shea considers two methods of estimating the present value relation:

Method 1: The cointegrating regression
$$P_t = k_1 + \theta D_t + u_t \qquad (17)$$

Method 2: The error correction regression
$$\Delta D_t = k_2 + \beta_1 \Delta D_{t-1} + \beta_2 \Delta P_t + \beta_3 D_{t-1} + \beta_4 P_t + u_t \qquad (18)$$

which implies $\theta = -\beta_3/\beta_4$. Method 2 involves estimation of one of the error correction equations. The models were estimated by OLS and the OLS residuals were resampled. Next


bootstrap estimates of the parameters and the bootstrap variance were calculated. Shea argues that the bootstrap method of obtaining standard errors is a viable alternative to estimating the asymptotic standard errors in small samples. The discussion in the preceding sections shows two shortcomings in the bootstrap procedures used by Shea. (Although, admittedly, these were not so well known at the time Shea wrote his paper in 1987.) The first refers to the way bootstrap data were generated. As discussed in the previous section, in a cointegrated regression model, it is not enough to resample the residuals from the cointegrating regression. One has to resample pairs of residuals that take into account the I(1) properties of the data. The second shortcoming refers to the concentration on bootstrap standard errors. The bootstrap distribution may be skewed, in which case the standard errors should not be used. One can make confidence statements directly from the bootstrap distribution. This second point is also related to the need to bootstrap a pivotal (or asymptotically pivotal) statistic - see Hall and Wilson's guidelines quoted in the previous section. The need for the use of pivotal statistics is also clearly emphasized in Horowitz (1995). In the case of cointegrating regressions, in Method 1 considered by Shea, even though the estimator of $\theta$ is superconsistent, it is now well known that its asymptotic distribution involves nuisance parameters arising from endogeneity of the regressors and serial correlation in the errors. Thus, this method does not provide an asymptotically pivotal statistic to bootstrap. One could use the prepivoting method of Beran (1987, 1988) or the bias correction methods suggested by Efron (1987). But these are computationally burdensome and need not be used when asymptotically pivotal statistics are available. In the case of cointegrating regressions, these are provided by the use of, for instance, Phillips and Hansen's (1990) fully modified least squares (FMOLS) or Johansen's (1988) ML method for the vector error correction model (VECM). This is what is illustrated in the paper by Li and Maddala (1996b).
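To illustrate what resampling pairs of residuals means for a cointegrating regression such as (17), the following sketch (ours, not Shea's procedure) jointly resamples the levels-equation residual and the first-difference innovation of the regressor, rebuilds the I(1) regressor by cumulation, and re-estimates the cointegrating coefficient on each bootstrap sample. For hypothesis testing one would in addition impose the null value of $\theta$ when constructing the residuals, as discussed in Section 3.

```python
import numpy as np

def cointegration_pair_bootstrap(P, D, NB=500, seed=0):
    """Bootstrap a cointegrating regression P_t = k + theta*D_t + u_t by
    resampling the pairs (u_hat_t, v_hat_t), where v_t = D_t - D_{t-1},
    so that the bootstrap regressor remains I(1)."""
    rng = np.random.default_rng(seed)
    P, D = np.asarray(P, float), np.asarray(D, float)
    X = np.column_stack([np.ones(len(D)), D])
    k_hat, theta_hat = np.linalg.lstsq(X, P, rcond=None)[0]
    u_hat = P - k_hat - theta_hat * D            # cointegrating residuals
    v_hat = np.diff(D)                           # I(0) innovations of D
    pairs = np.column_stack([u_hat[1:], v_hat])  # keep (u_t, v_t) together
    thetas = np.empty(NB)
    for b in range(NB):
        idx = rng.integers(0, len(pairs), size=len(pairs))
        u_star, v_star = pairs[idx, 0], pairs[idx, 1]
        D_star = D[0] + np.concatenate([[0.0], np.cumsum(v_star)])  # rebuild I(1) D
        P_star = k_hat + theta_hat * D_star + np.concatenate([[u_hat[0]], u_star])
        Xs = np.column_stack([np.ones(len(D_star)), D_star])
        thetas[b] = np.linalg.lstsq(Xs, P_star, rcond=None)[0][1]
    return theta_hat, thetas
```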
4.5. GMM and tests of conditional asset pricing models

Because of its simplicity, flexibility and generality, the generalized method of moments (GMM) has become an important technique for estimating and testing asset pricing models. If the number of moment conditions exceeds the number of parameters to be estimated, the GMM provides tests of the overidentifying restrictions. Monte Carlo experiments with GMM have revealed that asymptotic theory often provides a poor approximation to the distributions of test statistics obtained from GMM. It is not unusual for the true and nominal sizes of the GMM test statistics to differ from one another when asymptotic critical values are used. See, for instance, Tauchen (1986) and Kocherlakota (1990). Ferson and Foerster (1994) conduct a detailed Monte Carlo study of the size and power of GMM test statistics (for asset pricing models), the sampling properties of the coefficient estimators, their standard errors and t-ratios. They investigate two versions of GMM - two-stage and iterative GMM estimators. The


two procedures have the same asymptotic properties, and studies typically use one of the two. They find that in larger models the two-stage GMM tests reject the null hypothesis too often, while an iterated GMM test statistic conforms more closely to the asymptotic distribution. They also find that the GMM coefficient estimators are approximately unbiased in simpler models, but the use of asymptotic formulae results in an underestimation of the standard errors. This understatement is more severe in systems with a large number of assets and small sample sizes. However, in more complex models there are large biases in both the coefficient estimators and their standard errors. These authors also investigate simple adjustments to reduce the finite sample bias.

There is a small bootstrap experiment in the Ferson-Foerster paper, but not much can be concluded from this. They generate 500 samples of artificial data that satisfy the single latent variable model of asset pricing using N = 12 assets and T = 60 observations. From these samples they compute the small sample distributions of the test statistics, and use the "empirical" critical values as the "true values". Then they use the bootstrap method with 1,000 bootstrap samples and compare the critical values from the bootstrap method with the "true" critical values. However, the bootstrap was applied to only 5 of the 5,000 samples (which they call experiments 1-5). They argue that for samples (experiments) 3 and 4 the bootstrap critical values differ substantially from the "true" critical values. This is not a valid conclusion. The bootstrap critical values from any particular sample can be different from those obtained from the 5,000 samples because of an unusual sample. The bootstrap method should be applied to all the 5,000 samples, and the average compared with the "true values". It is true the computational burden is enormous, but it can be done. See Li and Maddala (1996b) and Horowitz (1995). Thus, the bootstrap results presented in Ferson and Foerster do not throw any light on the validity of the bootstrap method.

There is, however, another problem with the use of bootstrap methods to study small sample corrections for GMM based test statistics. Hall and Horowitz (1995) argue that with dependent data, one should use the bootstrap method with caution. In the case of GMM we do not have a structural model (e.g. an ARMA model) that reduces the data-generating process to a transformation of independent random variables to which we can apply the bootstrap method. The bootstrap sample must be drawn in such a way that it suitably captures the dependence of the data-generating process. This cannot be done by the usual bootstrap methods. Hall and Horowitz argue that one cannot apply bootstrap methods to the usual GMM based test statistics and that it is necessary to develop special versions of the test statistics, and these must have the same distributions as the sample versions of the statistics through $O_p(n^{-1})$. They do this for a non-overlapping block resampling method (Carlstein's method) and argue that the case of the overlapping block method (Künsch's method) is more difficult. They investigate the performance of their modified bootstrap test statistics through a small Monte Carlo investigation and found that, for the models and sample sizes investigated, the bootstrap corrects the finite sample size distortions of GMM based test statistics, although it does not eliminate them.

5. Bootstrap methods for model selection using trading rules


LeBaron (1991), Brock et al. (1992), Kim (1994) and Karolyi and Kho (1994) use bootstrap methods and trading rules (based on moving average rules) for the purpose of checking the adequacy of several commonly used models like the random walk (RW), GARCH, and the Markov switching regression (MSR) models. The bootstrap procedure used is that of bootstrapping the residuals from a fitted model and hence is not subject to the criticism we made earlier regarding bootstrapping the raw data. The procedure involves the following steps: First get a measure of the profits generated by a trading rule, using the actual data. Next estimate the postulated model and bootstrap the residuals and the estimated parameters to generate bootstrap samples. Next compute the trading rule profits for each of the bootstrap samples and compare this bootstrap distribution with the trading rule profits derived from the actual data. The basic idea is to compare the time series properties of the generated data from the given model with those of the actual data. Trading rule profits are one convenient measure for this purpose. R2's and other goodness of fit measures do not capture the time series structure of the data. Brock et al. (1992) tried this procedure with the random walk (RW), AR(1), GARCH-M, and E-GARCH models on 90 years of daily data on the Dow Jones Industrial Average covering the period 1897-1986. They found that none of these models replicate the trading rule profits (based on moving average trading rules) from the actual data. LeBaron (1991) considers RW, GARCH, regime shifting and interest-rate adjusted models and finds that none of them replicates the trading rule profits from the actual data, although GARCH does better than the other models. Thus, more complicated formulations are called for. Besides using trading rule profits as a model specification test, LeBaron also tests the "economic significance" of trading rule profits in the foreign exchange markets, by accounting for transaction costs and interest rates and attempting to measure the riskiness of trading strategies in the foreign exchange market relative to the strategies in the other markets (these are taken to be buying and holding stocks in the U.S. market). The CRSP value weighted index including dividends is the representative asset used. We will not discuss LeBaron's results in detail but broadly speaking, his conclusion is that the use of technical trading rules in the foreign exchange market generate returns similar to those from a domestic stock portfolio but further tests are necessary to completely answer the question of the "economic significance" of trading rules in the foreign exchange market. (LeBaron considers weekly exchange rates on the currencies British Pound (BP), Deutsche Mark (DM) and Japanese Yen (JY) sampled every Wednesday at 12:00 pm EST from January 1974-February 1991. Returns are created using log first differences of the exchange rates $/fx).
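To make the mechanics of this specification test concrete, here is a minimal sketch (ours, not the procedure of any of the cited papers): compute the return earned by a simple moving-average rule on the actual prices, then compare it with the distribution of the same quantity across series simulated from the fitted null model. The rule (long when the price is above its moving average), the window length, and the user-supplied simulator `simulate_null` are all illustrative assumptions.

```python
import numpy as np

def ma_rule_return(prices, window=50):
    """Average next-period log return on days when price > moving average.
    (Returns NaN if the rule is never long.)"""
    prices = np.asarray(prices, float)
    logret = np.diff(np.log(prices))
    ma = np.convolve(prices, np.ones(window) / window, mode='valid')
    signal = prices[window - 1:-1] > ma[:-1]      # long signal at end of day t
    return logret[window - 1:][signal].mean()

def rule_based_specification_test(prices, simulate_null, NB=500):
    """Compare the actual trading-rule return with its distribution under a
    fitted null model; `simulate_null()` must return one simulated price
    path of the same length (e.g. a bootstrapped RW or GARCH path)."""
    actual = ma_rule_return(prices)
    sims = np.array([ma_rule_return(simulate_null()) for _ in range(NB)])
    pval = np.mean(sims >= actual)                # right-tail comparison
    return actual, sims, pval
```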


Kim (1994) also uses trading rule profits as a tool for model specification tests. The moving average trading rules are applied to actual and generated data from the RW, GARCH-M, Hamilton's Markov switching model, the SWARCH (ARCH with Markov switching), and the CAPM models. He finds, as do others earlier, that the random walk model cannot capture the moving average trading rule profits generated by the actual data. As found by Brock et al. (1992) and LeBaron (1991) he finds that the GARCH-M and Hamilton's Markov switching model also do not capture the trading rule profits. However, the SWARCH model does well in replicating trading rule profits from the actual data. It outperforms the GARCH-M and Hamilton's Markov switching model. This, of course, does not mean that the SWARCH model is the only one or the best model characterizing returns in the foreign exchange market. It does mean that the other models are inadequate. Karolyi and Kho (1994) use bootstrap methods in conjunction with trading rules to reexamine the profitability of positive feedback investment strategies which buy stocks that have performed well in the past and sell stocks that have performed poorly in the past. The significant returns to such a strategy were confirmed in Jegadeesh and Titman (1993). Karolyi and Kho conclude that their overall findings for NYSE and AMEX stocks from 1965-89 indicate that the profitability of the relative strength strategies may simply represent fair compensation for the risks assumed by these strategies. As found by others for moving average trading rules, Karolyi and Kho find that the random walk model cannot explain the significant returns of the positive investment strategy, even within size- and beta-based subgroups of stocks with similar risk exposures. They, therefore, try to see whether the profitability of the relative strength trading rules is significant after adjusting for time-varying risk. They find that the trading rule profits are consistent with those simulated using a simple conditional CAPM equilibrium model of time-varying expected returns. Both Kim (1994) and Karolyi and Kho (1994) found models that replicate the trading rule profits considered. The strongest conclusion in all the four papers we have considered is the rejection of the random walk model. The trading rules considered in Brock et al. (1992), LeBaron (1991) and Kim (1994) are moving average rules and those considered by Karolyi and Kho are the positive feedback investment rules. In all cases the bootstrap method in conjunction with the trading rule has been used as a tool for model specification. Although many papers quote the Levich and Thomas (1993) along with the study of Brock et al. (1992) as examples of the application of bootstrap methods in finance, there is a conflict in the conclusions drawn. From the observation that the trading rule profits from the actual data do not fall in the (say) 95% interval of the bootstrap distribution, Levich and Thomas conclude that the trading rule profits are statistically significant. From the same observation the studies by Brock et al., LeBaron, and Kim, referred to earlier, conclude that the random walk model is an inadequate specification. Thus, the "statistical significance" is interpreted in two different (conflicting) ways. The use of bootstrapping trading rule profits for model selection is a more fruitful approach than the one in Levich and Thomas.


The problem of model checking using the bootstrap method has also been discussed in Tsay (1993) with different functionals of the sample observations. In the preceding discussion, the functional used for model checking is trading rule profits. In LeBaron (1992), it is pointed out that the particular method of estimation, used for the model before bootstrapping, has an effect on whether the model is considered valid or not on the basis of replicating the trading rule profits from the original data. For instance, in the case of foreign exchange data, Kim (1994) shows that the SWARCH model does well in replicating trading rule profits. This is a non-linear model. LeBaron shows that a linear model like ARMA(1,1), using the simulated method of moments (SMM) estimated parameters (but not using the ML estimated parameters) does well in replicating trading rule profits. It is worth investigating further how different methods of estimation affect model selection using bootstrap methods and trading rules.

6. Bootstrap methods in long-horizon regressions


Bootstrap methods have been extensively used in the analysis of long-horizon regressions to determine the small sample bias in the coefficient estimates and significance levels in tests of hypotheses. See, for instance, Goetzmann (1990), Goetzmann and Jorion (1993), Mark (1995), Choi (1994) and Chen (1995). Although the final results may not change much, the bootstrap methods used in these studies can be improved upon. The bootstrap studies are also different in the sense that the model used to generate the bootstrap data and the models estimated with the bootstrap data are different. Hence the validity of bootstrap procedures is not so obvious. The long horizon regressions were motivated by the observation that although stock returns are not predictable in the short-run, long-run returns are predictable. In fact several studies (reviewed in Kaul, 1996) show evidence in favor of long run predictability. A typical long-horizon regression takes the form
$$\sum_{i=1}^{k} R_{t+i} = \alpha_k + \beta_k X_t + u_{tk} \qquad (19)$$

where Rt is the log of stock return and Xt is some variable measuring fundamental value (dividend yield is the most commonly used variable). Fama and French (1988) show that dividend yield predicts a significant portion of multiple year return to the NYSE index. They observe that the explanatory power of the dividend yield increases with k, the horizon of the returns. Similar results are reported in Campbell and Shiller (1988). There are two problems with the inferences made from regressions of the form (19), noted in the literature. First, equation (19) is estimated by using overlapping returns because with a small sample size T, the use of non-overlapping returns reduces the sample size to T/k. The use of overlapping returns induces serial correlation in the errors. Hence heteroskedastic and serial correlation consistent


(HAC) estimators are used to compute the standard errors. The second problem is that $X_t$ in equation (19) is predetermined but is also stochastic, and it is often correlated with lagged values of $u_{tk}$. Because of this it is argued that there is a small sample bias in the estimates of $\beta_k$. See Mankiw and Shapiro (1986) and Stambaugh (1986). However, the model considered in these papers has $X_t$ correlated with current $u_t$. The model considered is as follows

$$Y_t = \alpha + \beta X_t + \epsilon_t \qquad (20)$$
$$X_t = \mu + \rho X_{t-1} + \eta_t \qquad (21)$$
$$(\epsilon_t, \eta_t) \sim \text{IID}(0, \Sigma), \quad \text{where } \Sigma = \begin{pmatrix} \sigma^2_{\epsilon} & \sigma_{\epsilon\eta} \\ \sigma_{\epsilon\eta} & \sigma^2_{\eta} \end{pmatrix} \qquad (22)$$

Then it is shown that

$$E(\hat{\beta} - \beta) = \frac{\sigma_{\epsilon\eta}}{\sigma^2_{\eta}}\, E(\hat{\rho} - \rho) \ . \qquad (23)$$
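The following small simulation sketch illustrates the mechanism in (20)-(23): with correlated innovations and a persistent regressor, the OLS slope inherits the small-sample bias of the estimated autoregressive coefficient. It is our illustration, not taken from the cited papers; it uses the predictive timing ($Y_t$ regressed on $X_{t-1}$) that is standard in this literature, and all parameter values are arbitrary.

```python
import numpy as np

def predictive_regression_bias(n=60, rho=0.95, corr=-0.9, nsim=5000, seed=0):
    """Simulate the system (20)-(21) with beta = 0 and mu = 0 and report the
    average OLS estimate of beta, i.e. its small-sample bias."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, corr], [corr, 1.0]])
    L = np.linalg.cholesky(cov)
    est = np.empty(nsim)
    for s in range(nsim):
        e = rng.standard_normal((n + 1, 2)) @ L.T    # correlated (eps, eta)
        X = np.empty(n + 1)
        X[0] = 0.0
        for t in range(1, n + 1):
            X[t] = rho * X[t - 1] + e[t, 1]          # eq. (21) with mu = 0
        Y = e[1:, 0]                                  # eq. (20) with beta = 0
        x = X[:-1] - X[:-1].mean()                    # lagged regressor, demeaned
        est[s] = (x @ (Y - Y.mean())) / (x @ x)       # OLS beta-hat
    return est.mean()                                 # average bias of beta-hat
```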

The HAC corrections to the standard errors are only asymptotically valid, and hence Monte Carlo and bootstrap methods have been used to investigate the small sample problems of corrections for biases in the coefficients and their estimated standard errors, so that reliable inference can be made on the significance of the coefficients in the long-horizon regressions. The study by Hodrick (1992) is based on a Monte Carlo study (which can also be considered as a parametric bootstrap). Since it forms the basis of subsequent papers using bootstrap methods we will discuss it briefly. Hodrick explores three methods:

(i) A regression based on (19) with $X_t$ = dividend yield.
(ii) A regression of returns on cumulative lagged dividend yields

$$R_{t+1} = \alpha_k + \beta_k \left(\sum_{i=0}^{k-1} X_{t-i}\right) + v_{tk} \qquad (24)$$

This is also often referred to as a "backward" regression.
(iii) A VAR model with stock returns, dividend yields, and the t-bill rate.

He argues that a VAR completely characterizes the autocovariances of the time series, and explores how it can be used to generate implicit long-horizon statistics. Hodrick first estimates a first-order VAR model based on monthly data for (A) 1927-1987, (B) 1952-1987, and (C) 1927-1951. If returns are not predictable, then the coefficients of the lagged variables in the returns equation must be zero. The $\chi^2$ test statistics are significant especially for sample period B, thus indicating return predictability.


To investigate the small sample validity of this inference, Hodrick performs a Monte Carlo experiment. He generates data using the results for time period (B), generating the errors from a multivariate distribution following a GARCH process. There are two sets of data generated: one setting the coefficients of the lagged variables in the return equation at zero (assuming the null of no predictability) and the other using the actual estimated coefficients (to assess the power of the different estimation procedures). We will not go into the details of Hodrick's paper, but the main conclusions are that (i) the VAR approach is the preferred of the three techniques for making inferences about long-horizon regressions, and (ii) the Monte Carlo results support the conclusion that changes in dividend yields forecast significant persistent changes in expected stock returns. The first conclusion is not surprising because the data were generated using the VAR model. The other models are misspecified in this framework. Also, there is one puzzling result in Hodrick's paper. The implied slope coefficients of long-horizon regressions from the VAR (reported in Table 4) are much higher than the slope coefficients estimated from methods (i) and (ii) (reported in Table 3).

The subsequent studies essentially follow Hodrick's approach of generating data under the null from a VAR but resample the actual residuals from the fitted VARs. Nelson and Kim (1993) (to be referred to as N-K) investigated regressions of total return on log dividend yield for the S&P over the period 1872-1986. They find, as do others, that the t-ratios (and hence R2's) increase with the return horizon. The question is how biased the coefficient estimates and the t-ratios are. To determine this they simulated artificial pairs of returns $r_t$ and dividend yields $d_t$ using the fitted VAR approximation of the present-value model, drawing samples from the residual pairs. N-K do not use the bootstrap but use a procedure called randomization (see Noreen, 1988), which is the same as the bootstrap but with sampling without replacement. The VAR model used is, however, not presented in their paper. N-K conclude that the coefficient estimates in the long-horizon regressions are biased upwards and that the standard errors are biased downwards even when HAC estimates are used, and that these biases increase with the return horizon. Thus, there are two biases in the inference on return predictability. Their basic conclusion is that, in studies on return predictability, one needs to use simulation methods to get the correct significance levels. Asymptotically valid procedures like HAC suffer from substantial small sample biases. As far as the predictability issue is concerned, their study shows that return predictability is a post-World War II phenomenon.

Goetzmann and Jorion (1993) (to be referred to as G-J) use the bootstrap method, and arrive at the conclusion that there is no strong statistical evidence indicating that dividend yields can be used to forecast stock returns. However, their bootstrap method is not based on an explicit model. They start with randomly sampling the total returns from their distribution. They argue that because total returns have been randomized, there is no relationship between returns and dividends. This is correct only if the distribution of $\hat{\beta}$ did not depend on the time


series structure of the returns series. The bootstrap data generation is similar to the one used by Hsieh and Miller (1990) discussed earlier and is not valid. G-J also estimate a VAR model and present bootstrap results from the VAR model to compare with the results of Nelson and Kim (1993) and Hodrick (1992), and find that the results are more in favor of predictability than in their bootstrap. For instance, for the GMM statistic the upper 5% critical value is 2.1; it is 3.9 with the VAR and 5.5 with their bootstrap. G-J argue (p. 675) that the rejections (of the null of no predictability) with the VAR are misleading because they do not explicitly incorporate the dynamics of regression with lagged dependent variables. However, since no explicit model is presented by G-J, it is hard to give an accurate interpretation of their results.

Mark (1995) does a detailed analysis, using bootstrap methods, of long-horizon predictability in the foreign exchange markets. He considers quarterly data on the currencies Canadian Dollar (CD), Deutsche Mark (DM), Swiss Franc (SF) and Japanese Yen (JY) over the period 1973-1991. He first estimates equations of the form

$$e_{t+k} - e_t = \alpha_k + \beta_k Z_t + v_{t+k,k}, \qquad k = 1, 4, 8, 12, 16 \qquad (25)$$

where $e_t$ is the log exchange rate at time $t$, $Z_t = f_t - e_t$, and $f_t$ is the date-$t$ fundamental. $Z_t$ is the deviation of the exchange rate from its fundamental value at time $t$; $f_t$ is obtained from a monetary model of the exchange rate. He finds that $\hat{\beta}_k$ and its significance (t-ratio) increase with the horizon $k$. The next step is to correct for the biases in the coefficient estimates and their SE's. This is done using bootstrap methods. Mark first discusses the asymptotic corrections for bias in the coefficient estimates given by Stambaugh (1986) and corrections in the SE's using the HAC. The bootstrap method used follows the lines of data generation used in Hodrick (1992) and Nelson and Kim (1993). A VAR is estimated under the null and the residual pairs are bootstrapped to generate new series. The VAR used is:

$$\Delta e_t = a_0 + e_{1t} \qquad (26)$$
$$Z_t = b_0 + \sum_{j=1}^{p} b_j Z_{t-j} + e_{2t} \qquad (27)$$

Let $(\hat{a}_0, \hat{b}_0, \hat{b}_j)$ be the estimated coefficients, $\hat{e}_{1t}$ and $\hat{e}_{2t}$ the residuals, and $\hat{\Sigma}$ the covariance matrix of $(\hat{e}_{1t}, \hat{e}_{2t})$. There are two methods of resampling:

(i) Draw samples from $N(0, \hat{\Sigma})$.
(ii) Draw samples from $(\hat{e}_{1t}, \hat{e}_{2t})$ with replacement.

Procedure (i) is what Efron calls parametric bootstrap (see Efron and Tibshirani (1993), Appendix). Procedure (ii) could, in principle, be called "semiparametric" bootstrap because part (the regression function) is parametrized and part (the error distribution) is not. This procedure is not what Efron calls "non-parametric" bootstrap, but it is often referred to in the econometric literature as a nonparametric bootstrap because the parametric nature of the regression function is taken as given, and the only issue is whether the error distribution is parametrized or not. Mark also performs a specification analysis of the VAR model estimated under the null (of no predictability) to check for serial correlation and ARCH effects. The bootstrap data are used

(i) to correct for the biases in β̂_k obtained from the estimation of equation (25),
(ii) to get small-sample significance levels for testing the null that β_k = 0,
(iii) to assess out-of-sample predictions.

The overall conclusion is that there is exchange rate predictability in the long-horizon regressions. This analysis is pursued in Choi (1994) using alternative models of exchange rates and thus different specifications of the fundamental value. In Chen (1995) alternative estimation methods are considered. In addition to the estimation of equation (25) and a backward regression of the form (24), a vector error correction model (VECM) was considered and the implied long-horizon regression coefficients β_k were derived from the VECM following the analysis in Hodrick (1992) for the VAR. This paper arrives at the conclusion that the VECM is the best approach because it has the highest empirical power to reject the false null hypothesis, but this is not surprising (as in the case of Hodrick's paper) because the data were generated using the VECM. However, substantial small-sample biases and size distortions persist even with the VECM. There is one argument in favor of the VECM: the estimation of the VECM conducted with the bootstrap data is valid because the bootstrap data have been generated using the VECM model. For the other models the validity is not so obvious, because the data are generated from a VAR model, and inference is made on a separate set of regressions (the long-horizon regressions).

The appropriate method for making bootstrap based inference on the long-horizon regressions, if one starts with a VAR model, is to first estimate the VAR model, next generate the bootstrap sample under the null of no (return or foreign exchange) predictability, setting the coefficients of the lagged variables (in the return or exchange rate equation) at zero, and then make inferences on the coefficients of the long-horizon regressions implied by the VAR. Since the asymptotic variances of these coefficients (which are nonlinear functions of the coefficients of the VAR) can be computed, one can bootstrap the (asymptotically) pivotal t-statistics. Note, however, (as mentioned earlier) that in Hodrick's study the implied coefficients from the VAR of the long-horizon regressions are much higher than the slope coefficients estimated from the long-horizon regressions directly. This discrepancy needs to be investigated. There is, however, no such discrepancy in the study by Chen (1995).

Although it is not clear from these papers, it seems that the motivation in starting with a VAR is that it is more flexible and will give a better representation of the true process. If this is so, since the bootstrap data generation is also done using the VAR model under the null, hypothesis testing on long-horizon coefficients also must be conducted in the framework of the VAR and not from the direct (or indirect) long-horizon regressions. For the purpose of bias correction, the direct estimation of the long-horizon regressions might still be all right. Suppose that we want to apply bootstrap procedures to equation (19) directly (otherwise we have to do this separately for each k). The problem is complicated because of the serial correlation in the errors and the possible endogeneity of z_t. But once an appropriate estimation procedure is devised, it is straightforward to generate bootstrap samples.

There is yet another issue with the use of bootstrap methods in all these studies. The bootstrap confidence intervals or significance levels obtained are based on what are known as percentile methods. It has been documented in the bootstrap literature that these are biased. Thus, a bias correction method suggested by Efron and discussed in the Appendix of Efron and Tibshirani (1993) is needed. An alternative is the bootstrap-t method. Another alternative is the "bootstrap after bootstrap" suggested by Kilian (1995): use the first bootstrap for bias correction (as done in the studies by Mark (1995), Choi (1994), and Chen (1995)) and then bootstrap the bias-corrected estimate. In any case, there is substantial scope for improving the significance levels reported in all these papers in light of the fact that the simple percentile methods have been discarded long ago in the bootstrap literature.
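To make the contrast concrete, the sketch below compares the simple percentile interval with a bootstrap-t interval built from the (asymptotically pivotal) studentized statistic. The names `estimate`, `std_err` and `resample` are placeholders for the application-specific estimator, (HAC) standard error and data-generating scheme, so this is a generic illustration rather than a reproduction of any of the studies discussed above.

```python
import numpy as np

def percentile_and_bootstrap_t(data, estimate, std_err, resample, B=999, alpha=0.05,
                               rng=np.random.default_rng(0)):
    """Compare the simple percentile interval with the bootstrap-t interval.
    `resample(data, rng)` must return one bootstrap data set (e.g., built from a
    VAR fitted under the null), `estimate` the point estimate, `std_err` its SE."""
    theta, se = estimate(data), std_err(data)
    theta_b, t_b = np.empty(B), np.empty(B)
    for b in range(B):
        d = resample(data, rng)
        theta_b[b] = estimate(d)
        t_b[b] = (theta_b[b] - theta) / std_err(d)   # studentized (pivotal) statistic
    lo, hi = np.quantile(theta_b, [alpha / 2, 1 - alpha / 2])
    t_lo, t_hi = np.quantile(t_b, [alpha / 2, 1 - alpha / 2])
    percentile_ci = (lo, hi)
    bootstrap_t_ci = (theta - t_hi * se, theta - t_lo * se)
    return percentile_ci, bootstrap_t_ci
```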

7. Impulse response analysis in nonlinear models

Financial time series are known to exhibit several types of nonlinearities. Various nonlinear models have been fitted to them, the ARCH/GARCH types of models and Markov switching models being the most common. These models are all parametric and incorporate prior constraints on the shape of low order moments of the conditional distributions. Gallant and Tauchen (1992) develop a nonparametric approach to this problem. In Gallant et al. (1993) and Tauchen et al. (1994) this nonparametric approach is used to study the dynamic properties of the time series through nonlinear impulse response analysis. This is done by perturbing the vector of conditional arguments in the conditional density function and tracing out the multistep ahead expectations of the conditional mean and variance functions. These are known as conditional moment profiles. It is not possible for us to go into the details of their procedures. But to derive the confidence bands for the moment profiles, Gallant et al. and Tauchen et al. use the bootstrap approach. The method of bootstrapping is neither of the two methods described earlier (bootstrapping the data and bootstrapping the residuals). It is a third method - of bootstrapping the conditional density function. Additional data sets of the same length as the original data are generated from the fitted conditional density f̂(y|x) using the initial conditions of the original data. Then these are used to compute the moment profiles. It is not clear to us how the time series structure of the original data is preserved in this procedure of bootstrapping (perhaps by having lagged variables in the x in f̂(y|x)).


In any case, these authors have used the bootstrap approach in the nonparametric context and derived some new conclusions about the dynamic response of stock prices and volume to several types of shocks. There have been earlier discussions of the bootstrap in nonparametric regression; see Hardle and Marron (1991). Gallant et al. and Tauchen et al. extend this to nonlinear time series analysis.

Error bands for impulse responses in dynamic models have also been discussed in Kilian (1995) and Sims and Zha (1995), although in the context of linear models. Sims and Zha argue that Bayesian intervals have a firmer theoretical foundation in small samples, are easier to compute and are about as good in small samples by classical criteria as are the best bootstrap intervals. Bootstrap intervals without bias correction perform very badly. Kilian suggests a bias corrected confidence interval different from that discussed by Efron (1987) and Efron and Tibshirani (1993). He suggests what he calls "bootstrap after bootstrap". This is motivated as follows. Let θ̂(x) be the initial estimator of θ, which we use in generating bootstrap samples. Let the mean of the bootstrap estimators θ̂(x*) be denoted by θ̄*. Then the bias-corrected estimate is

θ̂_bc(x) = θ̂(x) + (θ̂(x) − θ̄*) .                                                     (28)
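A minimal sketch of the two-stage idea in (28) — bias-correct with a first bootstrap, then bootstrap again from the bias-corrected estimate — is given below. The names `estimate` and `resample_from` are hypothetical placeholders for the model-specific estimator and simulator.

```python
import numpy as np

def bootstrap_after_bootstrap(data, estimate, resample_from, B1=500, B2=999,
                              alpha=0.05, rng=np.random.default_rng(0)):
    """Two-stage bootstrap: the first bootstrap estimates the bias as in (28),
    the second is run from the bias-corrected estimate.
    `resample_from(theta, rng)` must generate one data set from the model at theta."""
    theta_hat = estimate(data)
    # stage 1: bias correction, theta_bc = theta_hat + (theta_hat - mean of bootstrap estimates)
    first = np.array([estimate(resample_from(theta_hat, rng)) for _ in range(B1)])
    theta_bc = theta_hat + (theta_hat - first.mean())
    # stage 2: bootstrap again, now generating data from the bias-corrected estimate
    second = np.array([estimate(resample_from(theta_bc, rng)) for _ in range(B2)])
    ci = np.quantile(second, [alpha / 2, 1 - alpha / 2])
    return theta_bc, ci
```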

Kilian's idea is that if we bootstrap θ̂_bc we will get better confidence intervals than if we bootstrap θ̂. Thus, use the first bootstrap to get the bias correction and then another bootstrap to get the confidence interval. Note that the term bias correction in the literature on bootstrap confidence intervals, as suggested by Efron, does not refer to correction of the bootstrap estimator for bias, which is what Kilian's method involves. However, he shows that his method works very well in his application, compared with the percentile method. More detailed studies are necessary to compare it with Efron's procedures as well as the bootstrap-t.

8. Conclusions

The paper points out some shortcomings in some of the applications of bootstrap methods in financial models. There is frequent reference to Efron's 1979 paper, but subsequent developments in the bootstrap literature have often been ignored. Taking these into account would result in a better use of bootstrap methods in financial models.

It is important to distinguish between two procedures of bootstrapping: bootstrapping the data and bootstrapping the residuals. There is also a third method, noted in Section 7 of the paper. Even when bootstrapping the residuals, there are different sampling schemes. These are discussed in Section 3.

It is important to bear in mind that the model estimated with the bootstrap data and the method of bootstrap data generation should be consistent. Otherwise, the bootstrap is not a valid bootstrap. If the bootstrap sample is generated assuming model A, then a different model, model B, should not be estimated with the same data. The inferences drawn will not be valid.

An important use of bootstrap methods in financial models is the use of trading rules in conjunction with bootstrap methods as a tool for model selection. It appears that how the models are estimated before bootstrap data are generated makes a difference in the conclusions. These methods need to be explored further.

We have surveyed several papers in finance and outlined some shortcomings in the use of bootstrap methods. Have the papers drawn the wrong conclusions because the bootstrap methods are flawed? In some cases perhaps the results are quite robust and the use of correct methods is not going to change the conclusions. This is, for instance, the case with long-horizon predictability discussed in Section 6 and structural change and IGARCH discussed in Section 4.3. In any case the use of the correct method will give different results, whether the conclusions change or not.

One other issue is: Is a defective bootstrap method still better than asymptotic inference? There are several examples in the literature where this is not so. One case of current interest is the case of bootstrapping unit root models (see Basawa et al. (1991a)). However, when no asymptotic inference is available, it is better to use a bootstrap method. Also, when the correct bootstrap method is complicated and not feasible, a theoretically imperfect bootstrap method might improve on asymptotic inference, as discussed in Li and Maddala (1996b). Thus, unless proven otherwise, some bootstrap may be better than no bootstrap. But when a correct bootstrap method is available, it is important to avoid the wrong bootstrap.

References
Akgiray, V. and G. G. Booth (1988). Mixed diffusion Jump process modeling of exchange rate movements. Rev. Econom. Statist. 70, 631-7. Badrinath, S. G. and S. Chatterjee (1991). A data-analytical look at skewness and elongation in common-stock return distributions. J. Business Econom. Statist. 9, 223-33. Basawa, I. V., A. K. Mallik, W. P. McCormick and R. L. Taylor (1991a). Bootstrapping unstable first order autoregressive processes. Ann. Statist. 19, 1098 1101. Basawa, I. V., A. K. Mallik, W. P. McCormick and R. L. Taylor (1991b). Bootstrapping test of significance and sequential bootstrap estimation for unstable first order autoregressive processes. Commun. Statist. -Theory Meth. 20, 1015-1026. Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74, 457~468. Beran, R. (1988). Prepivoting test statistics: A bootstrap view of asymptotic refinements. J. Amer. Statist. Assoc. 83, 687~597. Bookstaber, R. M. and J. B. McDonald (1987). A general distribution for describing security price returns. J. Business 60, 401-24. Brock, W., J. Lakonishok and B. LeBaron (1992). Simple technical trading rules and the stochastic properties of stock returns. J. Finance 47, 1731-64. Brown, M. B. and A. B. Forsythe (1974). Robust tests for the equality of variances. J. Amer. Statist. Assoc. 69, 364~7.


Campbell, J. Y. and R. J. Shiller (1987). Cointegration and tests of present value models. J. Politic. Econom. 95, 106~1088. Campbell, J. Y. and R. J. Shiller (1988). Stock prices, earnings and expected dividends. J. Finance 43, 661-676. Carlstein, E. (1986). The use of subseries values for estimating the variance of a general statistic from a stationary sequence. Ann. Statist. 14, 1171-1179. Chatterjee, S. and R. A. Pari (1990). Bootstrapping the number of factors in the arbitrage pricing theory. J. Financ. Res., XIII, 15-21. Chert, J. (1995). Long-horizon predictability of foreign currency prices and excess returns: Alternative procedures for estimation and inference. Unpublished Ph.D. dissertation, The Ohio State University. Choi, D. Y. (1994). Real exchange rate prediction by long horizon regression. Unpublished Ph.D. dissertation. The Ohio State University. Diebold, F. X. and R. S. Mariano (1995). Comparing predictive accuracy. J. Business Econom. Statist. 13, 253-263. Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7, 1-26. Efron, B. (1981). Censored data and the bootstrap. Y. Amer. Statist. Assoc. 76, 312-319. Efron, B. (1987). Better bootstrap confidence intervals. J. Amer. Statist. Assoc. 82, 171-200. Efron, B. and G. Gong (1983). A leisurely look at the bootstrap, the jackknife, and cross validation. Amer. Statist. 37, 36~8. Efron, B. and R. Tibshirani (1986). Bootstrap methods for standard errors, confidence intervals, an"d other measures of statistical accuracy. Statist. Sci. 1, 54-77. Efron, B. and R. J. Tibshirani (1993). An introduction to the bootstrap. New York and London, Chapman Hall. Fama, E. and K. French (1988). Dividend yields and expected stock returns. J, Financ. Econom. 22, 3 26. Ferretti, N. and J. Romo (1994). Unit root bootstrap tests for AR(I) models. Working Paper, Division of Economics, Universidad Carlos III de Madrid. Ferson, W. E. and S. R. Foerster (1994). Finite sample properties of the generalized method of moments in tests of conditional asset pricing models. J. Financ. Econom. 36, 29 55. Freedman, D. A. (1981a). Bootstrapping regression models. Ann. Statist. 9, 1218-1228. Freedman, D. A. (1981b). Bootstrapping regression models. Ann. Statist. 9, 1229-1238. Freedman, D. A. and S. C. Peters (1984). Bootstrapping a regression equation: Some empirical results. J. Amer. Statist. Assoc. 79, 97-106. Gallant, A. R., P. E. Rossi and G. Tauchen (1993). Nonlinear dynamic structures. Econometrica 61, 871-907. Gallant, A. R. and G. Tauchen (1992). A non-parametric approach to non-linear time-series analysis: Estimation and simulation. In: E. Parzen et al., eds., New Dimensions in Time Series Analysis, New York, Springer-Verlag. Goetzmann, W. N. (1990). Bootstrapping and simulation tests of long-term patterns in stock market behaviour. Ph.D. thesis, Yale University. Goetzmann, W. N. and P. Jorion (1993). Testing the predictive power of dividend yields. J. Finance 48, 663-679. Hall, P. (1988). Theoretical comparison of bootstrap confidence intervals. Ann. Statist. 16, 927-953. Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Spriuger-Verlag, New York. Hall, P. and J. L. Horowitz (1993). Corrections and blocking rules for the block bootstrap with dependent data. Working Paper #93-11, Department of Economics, University of Iowa. Hall, P. and J. L. Horowitz (1995), Bootstrap critical values for tests based on generalized method of moments estimators. To appear in Econometrica. Hall, P. and S. R. Wilson (1991). 
Two guidelines for bootstrap hypothesis testing. Biometrics 47, 757762. Hardle, W. and J. S. Marron (1991). Bootstrap simultaneous error bars for nonparametric regression. Ann. Statist. 19, 778-796.


Hartigan, J. A. (1986). Comment on the paper by Efron and Tibshirani. Statist. Sci. 1, 75-76. Hodrick, R. J. (1992). Dividend yields and expected stock returns: Alternative procedures for inference and measurement. Rev. Financ. Stud. 5, 357-86. Horowitz, J. (1995). Bootstrap methods in econometrics: Theory and numerical performance. Paper presented at the 7th World Congress of the Econometric Society, Tokyo. Hsieh, D. A. and M. H. Miller (1990). Margin regulation and stock market volatility. J. Finance 45, 329. Jegadeesh, N. and S. Titman (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. J. Finance 48, 65-91. Jeong, J. and G. S. Maddala (1993). A perspective on application of bootstrap methods in econometrics. Handbook of Statistics, Vol. 11,573-610. North Holland Publishing Co. Johansen, S. (1988). Statistical analysis of cointegration vectors. J. Econom. Dynamic Control 12, 231255. Karolyi, G. A. and B-C. Kho (1994). Time-varying risk premia and the returns to buying winners and selling losers: Caveat emptor et venditor. Ohio State University working paper. Kaul, G. (1996). Predictable components in stock returns. In: G.S. Maddala and C.R. Rao eds., Handbook of Statistics, Vol 14, Statistical Methods in Finance. Kilian, L. (1995). Small sample confidence intervals for impulse response functions. Manuscript, University of Pennsylvania. Kim, B. (1994). A study of risk premiums in the foreign exchange market. Ph.D. dissertation, Ohio State University. Kocherlakota, N. R. (1990). On tests of representative consumer asset pricing models. J. Monetary Econom. 26, 285-304. Kiinsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 12121241. Lamoureux, C. G. and W. D. Lastrapes (1990). Persistence in variance, structural change, and the GARCH model. J. Business Econom. Statist. 8, 225-34. LeBaron, B. (1991). Technical trading rules and regime shifts in foreign exchange. Manuscript, University of Wisconsin. LeBaron, B. (1992). Do moving average trading rule results imply non-linearities in foreign exchange markets. SSRI, University of Wisconsin. Working Paper # 9222. LeBaron, B. (1994). Technical trading rules profitability and foreign exchange intervention. SSRI, University of Wisconsin. Working Paper # 9445. Levich, R. M. and L. R. Thomas, III (1993). The significance of technical trading-rule profits in the foreign exchange market: A bootstrap approach. J. lnternat. Money Finance 12, 451-474. Li, Hongyi and G. S. Maddala (1996a). Bootstrapping time series models. Econometric Rev. 16, 115195 Li, Hongyi and G. S. Maddala (1996b). Bootstrapping cointegrating regressions. Presented at the Fourth Meeting of the European Conference Series in Quantitative Economics and Econometrics: Oxford, Dec. 1618, 1993. To appear. J. Econometrics. Liu, R. Y. and K. Singh (1992). Moving blocks jackknife and bootstrap capture weak dependence. In: Exploring the Limits of Bootstrap, LePage, R. and Billard, L. eds., New York: John Wiley &s, Inc., 225548. Mankiw, N. G. and M. D. Shapiro (1986). Do we reject too often? Econom. Lett. 20, 139-45. Mark, N. C. (1995). Exchange rates and fundamentals: Evidence on long-horizon predictability. Amer. Econom. Rev. 85, 201-218. Nelson, C. R. and M. J. Kim (1993). Predictable stock returns: The role of small-sample bias. J. Finance 48, 641-661. Noreen, E. (1989). Computer intensive methods for testing hypothesis: An introduction. Wiley, New York. Phillips, P. C. B. and B. E. 
Hansen (1990). Statistical inference in instrumental variables regression with I(1) process. Rev. Econom. Stud. 57, 99-125. Politis, D. N. and J. P. Romano (1994). The stationary bootstrap. J. Amer. Statist. Assoc. 89,1303-13


Rayner, R. K. (1990). Bootstrapping p-values and power in the first-order autoregression: A Monte Carlo investigation. J. Business Eeonom. Statist. 8, 251-263. Shea, G. S. (1989a). Ex-post rational price approximations and the empirical reliability of the presentvalue relation. J. Appl. Econometrics 4, 139-159. Shea, G. S. (1989b). A re-examination of excess rational price approximations and excess volatility in the stock market. R. C. Guimaraes et al. eds., A Re-appraisal of the Efficiency of Financial Markets, pp. 469-94. Shea, G. S. (1990). Testing stock market efficiency with volatility statistics: Some exact finite sample results. Manuscript, Pennsylvania State University. Sims, C. A. and T. Zha (1995). Error bands for impulse responses. Working Paper # 95-6, Federal Reserve Bank of Atlanta. Stambaugh, R. F. (1986). Bias in regression with lagged stochastic regressors. CRSP working papers #156, University of Chicago. Tauchen, G. (1986). Statistical properties of generalized method-of-moments estimators of structural parameters obtained from financial market data. J. Business Eeonom. Statist. 4, 397-425. Tauchen, G., H. Zhang and M. Liu (1994). Volume volatility and leverage analysis. Manuscript, Duke University. Tsay, R. S. (1992). Model checking via parametric bootstraps in time series analysis. Appl. Statist. 41, 1-15 Van Giersbergen, N. P. A. and J. F. Kiviet (1994). How to implement bootstrap hypothesis testing in static and dynamic regression models. Discussion paper #TI94~130, Tinbergen Institute, Rotterdam.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14


1996 Elsevier Science B.V. All rights reserved.

16

Principal Component and Factor Analyses


C. Radhakrishna Rao

1. Introduction

Principal component and factor analyses (PCA and FA) are exploratory multivariate techniques used in studying the covariance (or correlation) structure of measurements made on individuals. The object may vary from reduction of high dimensional data by finding a few latent variables which explain the variations of or the associations between the observable measurements, grouping of similar measurements and detecting multicollinearity, to graphical representation of high dimensional data in lower dimensional spaces to visually examine the scatter of the data, and detection of outliers. PCA was developed by Pearson (1901) and Hotelling (1933); a general theory with some extensions and applications are given in Rao (1964). FA originated with the work of Spearman (1904) and developed by Lawley (1940) under the assumption of multivariate normality. A general theory of FA, under the title Canonical Factor Analysis (CFA), without any distributional assumptions was given in Rao (1955). Now there are a number of excellent full length monographs devoted to the computational aspects and uses of PCA and FA in social and physical scientific research. Reference may be made to Bartholomew (1987), Basilevsky (1994), Cattel (1978), Jackson (1991), and Jolliffe (1986) to mention a few authors. A technique related to PCA, when the measurements are qualitative, is correspondence analysis (CA), developed by Benzecri (1973) based on a method of scaling qualitative categories suggested by Fisher (1936). A monograph by Greenacre (1984) gives the theory and applications of CA in the analysis of contingency tables. A recent paper by Rao (1995) contains an alternative to CA, which seems to have some advantages over the earlier approach, for the same purpose CA is used. In this paper a general survey is given of PCA and FA with some recent theoretical results and practical applications.


2. Principal components
2.1. The general problem
The problem of principal components can be stated in a very general setup as follows. Let x be a p-vector variable and y be a q-vector variable, where some components of x and y may be the same. We want to replace y by z = Ay, where A is an r × q matrix and r < q, in such a way that the loss in predicting x by using z instead of y is as small as possible. If

Σ = ( Σ11  Σ12 )
    ( Σ21  Σ22 )                                                    (2.1)

is the covariance matrix of x and y, then the covariance matrix of the errors in predicting x by z = Ay is

W = Σ11 − Σ12 A'(A Σ22 A')⁻¹ A Σ21 .                                (2.2)

We choose A such that ||W||, for a suitably chosen norm, is small. If we choose ||W|| = tr W, then the optimum choice is

A* = arg max_A  tr Σ12 A'(A Σ22 A')⁻¹ A Σ21 .

The maximum is attained at

A*' = (C1 : ... : Cr)                                               (2.3)

where C1, ..., Cr are the r eigenvectors associated with the first r eigenvalues λ1² ≥ λ2² ≥ ... ≥ λr² of Σ21Σ12 with respect to Σ22, i.e., the eigenvectors and values are those arising out of the determinantal equation

|Σ21Σ12 − λ²Σ22| = 0 .                                              (2.4)

The relative loss of information in using z* = A*y for predicting x is

tr(Σ11 − Σ12 A*'(A* Σ22 A*')⁻¹ A* Σ21) / tr Σ11 = 1 − (λ1² + ... + λr²) / tr Σ11 .    (2.5)

We consider some special choices of x and y and derive the optimal transformation A as characterized in (2.3).

2.2. The choice x = y


The special choice x = y leads to the usual principal components C1'x, ..., Cr'x, where C1, ..., Cr are the first r eigenvectors associated with the first r eigenvalues λ1² ≥ ... ≥ λr² of the determinantal equation |Σ11 − λ²I| = 0. In such a case, the loss of information (2.5) is

(λ_{r+1}² + ... + λ_p²) / (λ1² + ... + λ_p²) ,                      (2.6)

usually expressed as a percentage. The choice of r is determined by the magnitude of (2.6). In practice, we have to estimate λ_i² and C_i from a sample of n independent observations on the p-vector random variable x, which we denote by the p × n matrix X = (x1 : ... : xn). An estimate of Σ11 is

S = (n − 1)⁻¹ X(I − (1/n)ee')X'                                     (2.7)

where e is an n-vector of unities. The estimates ℓ_i of λ_i and c_i of C_i are obtained from the spectral decomposition

S = ℓ1² c1 c1' + ... + ℓ_p² c_p c_p' .                              (2.8)

The principal components of the observations on the i-th individual are then

q_i = (c1'x_i, ..., c_p'x_i)' .                                     (2.9)

In the sequel, we denote

s_{ii} = the i-th diagonal element of S,  c_j = (c_{j1}, ..., c_{jp})',  j = 1, ..., p,    (2.10.1)
ĉ_{ji} = ℓ_j c_{ji},  i = 1, ..., p,                                                       (2.10.2)
q_i = (q_{i1}, ..., q_{ip})',  i = 1, ..., n,                                              (2.11.1)
q̂_{ij} = ℓ_j⁻¹ q_{ij},  i = 1, ..., n.                                                     (2.11.2)

It may be noted that the vectors c_i and q_i (apart from a translation of coordinates) can be obtained in one step from the singular value decomposition (SVD)

X(I − (1/n)ee') = ℓ1 c1 d1' + ... + ℓ_p c_p d_p'                    (2.12)

with the relationship (ℓ1 d1 : ... : ℓ_p d_p)' = (q1 : ... : qn).

2.3. Interpretation of principal components


For an interpretation of principal components in terms of the influence of the original measurements on them, we need the computations exhibited in Table 1. The magnitudes of the correlations in Table 1 indicate how well each variable is represented in each PC and overall in the first r PC's (judged by the values of R_i²).

Table 1
original     correlation with principal component               multiple correlation of
variable     z1                 ...   z_p                       x_i on z1, ..., z_r
x1           ĉ_{11}/√s_{11}     ...   ĉ_{p1}/√s_{11}            Σ_{j=1}^{r} ĉ_{j1}²/s_{11} = R_1²
...
x_p          ĉ_{1p}/√s_{pp}     ...   ĉ_{pp}/√s_{pp}            Σ_{j=1}^{r} ĉ_{jp}²/s_{pp} = R_p²

The values of R_i² computed for r = 1, 2, ... enable us to decide on r, the number of PC's to be chosen. If for some r the values of R_i² are high except for one value of i, say j, then we may decide to include x_j along with z1, ..., z_r or add other PC's where x_j is well represented.

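A minimal numpy sketch (added here for illustration; it is not part of the original chapter) of the computations in (2.7)-(2.9) and of the Table 1 quantities:

```python
import numpy as np

def pca_summary(X, r):
    """X is the p x n data matrix (columns are individuals). Computes S as in (2.7),
    its spectral decomposition (2.8), the PC scores (2.9), and the Table 1 quantities:
    the correlation of each variable with each PC and the multiple correlation R_i^2
    of x_i on the first r PC's."""
    p, n = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    S = Xc @ Xc.T / (n - 1)                         # (2.7)
    ell2, C = np.linalg.eigh(S)
    order = np.argsort(ell2)[::-1]
    ell2, C = ell2[order], C[:, order]              # ell2[j] = l_j^2, C[:, j] = c_j, cf. (2.8)
    Q = C.T @ Xc                                    # PC scores, cf. (2.9)
    c_hat = C * np.sqrt(ell2)                       # c_hat[i, j] = l_j c_{ji}, cf. (2.10.2)
    corr = c_hat / np.sqrt(np.diag(S))[:, None]     # corr(x_i, z_j), as in Table 1
    R2 = (corr[:, :r] ** 2).sum(axis=1)             # multiple correlation of x_i on z_1,...,z_r
    return ell2, C, Q, corr, R2
```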
2.4. Graphical display of data

To represent the individuals in terms of the original measurements we need a p-dimensional space. But for visual examination, we need a plot of the individuals in a two or a three dimensional space, which reflects the configuration of the individuals in the p-space (distances between individuals) to the extent possible. For this purpose, we use the PC's either as in (2.11.1) or in the standardized form [SPC as in (2.11.2)]. The full set of new coordinates in different dimensions, from which the first few may be selected, is displayed in Table 2.

Table 2
individuals   dim 1             dim 2             ...   dim p
              PC      SPC       PC      SPC             PC      SPC
1             q_{11}  q̂_{11}    q_{12}  q̂_{12}     ...   q_{1p}  q̂_{1p}
2             q_{21}  q̂_{21}    q_{22}  q̂_{22}     ...   q_{2p}  q̂_{2p}
...
n             q_{n1}  q̂_{n1}    q_{n2}  q̂_{n2}     ...   q_{np}  q̂_{np}
Variance      ℓ1²     1         ℓ2²     1          ...   ℓ_p²    1

If we plot the individuals in the first r (< p) dimensions using the coordinates q_{i1}, ..., q_{ir} for the i-th individual, then the Euclidean distance between the individuals i and j in such a plot will be an approximation to the Euclidean distance in the full p-space

d_{ij} = [(x_i − x_j)'(x_i − x_j)]^{1/2} .

On the other hand, if we plot the individuals in the first r (< p) dimensions using the coordinates q̂_{i1}, ..., q̂_{ir}, then the Euclidean distance between individuals i and j in such a plot will be an approximation to the Mahalanobis distance in the p-space

d_{ij} = [(x_i − x_j)'S⁻¹(x_i − x_j)]^{1/2} .

In practice, one may have to choose the appropriate distance we want to preserve in the reduced space. Usually, two or three dimensional plots may suffice to capture the original configuration. If more than three dimensions are necessary, other graphical displays for visualizing higher dimensional plots may be used. See for instance the paper by Wegman, Carr and Luo (1993).

We can also represent the variables in a lower dimensional space to provide a visual examination of the associations between them. The full set of coordinates for this purpose is given in Table 3.

Table 3
variables    coordinates
1            ĉ_{11}   ĉ_{21}   ...   ĉ_{p1}
2            ĉ_{12}   ĉ_{22}   ...   ĉ_{p2}
...
p            ĉ_{1p}   ĉ_{2p}   ...   ĉ_{pp}

Let us denote the vector connecting the point representing the i-th variable in the r-dimensional space to the origin by v_i. Then v_i'v_i is a good approximation to s_{ii}, the variance of the i-th variable, and the cosine of the angle between the vectors v_i and v_j will be a good approximation of the correlation between the i-th and j-th variables.
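The distance-preserving property described above can be checked numerically. The following sketch (an illustration added here, using simulated data) verifies that distances between the q coordinates of (2.11.1) reproduce Euclidean distances between individuals, while distances between the standardized coordinates of (2.11.2) reproduce Mahalanobis distances based on S.

```python
import numpy as np

def pc_coordinates(X):
    """Return the PC coordinates (2.11.1) and standardized coordinates (2.11.2)
    for the columns of the p x n data matrix X, together with S."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    S = Xc @ Xc.T / (n - 1)
    ell2, C = np.linalg.eigh(S)
    ell2, C = ell2[::-1], C[:, ::-1]        # eigenvalues in decreasing order
    Q = C.T @ Xc                            # column i holds q_i for individual i
    Q_std = Q / np.sqrt(ell2)[:, None]      # standardized PC's, q_{ij} / l_j
    return Q, Q_std, S

X = np.random.default_rng(0).normal(size=(4, 50))
Q, Q_std, S = pc_coordinates(X)
i, j = 0, 1
d_euclid = np.linalg.norm(X[:, i] - X[:, j])
d_mahal = float(np.sqrt((X[:, i] - X[:, j]) @ np.linalg.solve(S, X[:, i] - X[:, j])))
# both checks print True: PC distances match Euclidean distances, SPC distances
# match Mahalanobis distances based on S
print(np.isclose(d_euclid, np.linalg.norm(Q[:, i] - Q[:, j])),
      np.isclose(d_mahal, np.linalg.norm(Q_std[:, i] - Q_std[:, j])))
```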

2.5. Analysis of residuals and detection of outliers

If we retain the first r PC's, we can compute the error in the approximation x̂_i to x_i, the p-vector of measurements on the i-th individual, by

x_i − x̂_i = (c_{r+1}c_{r+1}' + ... + c_p c_p') x_i

and an overall measure of difference is

d_i² = (x_i − x̂_i)'(x_i − x̂_i) = q_{i,r+1}² + ... + q_{ip}² .

If some d_i² is large compared to the others, we have an indication that x_i may be an outlier.

Note 1. The PC's are not invariant for linear transformations of the original variables. For instance, if the original variables are scaled by different numbers or if they are rotated by a linear transformation, the PC's will be different. This suggests that an initial decision has to be made on transforming the original measurements to a new set and then extracting the PC's.
The recommendation usually made is to scale the measurements by the inverse of the standard deviations, which is equivalent to finding the PC's based on the correlation matrix rather than the covariance matrix.

Note 2. There are tests available on the eigenvalues and eigenvectors of a covariance matrix when the original measurements have a multivariate normal distribution [see Chapter 4 of Basilevsky (1994)]. In practice, it may be necessary to test for normality of the original measurements if these tests are to be applied. It may be useful to try transformations of the measurements by using the Box-Cox family of transformations to induce normality if necessary. Several computer programs allow for this option. In such a case, we will be computing the PC's of transformed variables.

Note 3. In some problems such as the analysis of growth curves, the PC's are computed from the matrix S = XX' without making a correction for the mean. The references to such methods are Rao (1958, 1987).

Note 4. It has been suggested by Jolicoeur and Mosimann (1960) that the first principal component, which has the maximum variance, may be interpreted as a size factor provided all the coefficients are positive, and other principal components with positive and negative coefficients as shape factors. A justification for such an interpretation may be given as follows. Consider the i-th variable x_i in x and the j-th PC, c_j'x, of x. The regression coefficient of x_i on c_j'x is c_{ji}, the i-th element in the j-th eigenvector c_j. Now a unit increase in c_j'x produces on the average an increase c_{ji} in x_i. If all the elements in c_j are positive, a unit increase in c_j'x increases the value of each of the measurements, in which case c_j'x may be described as a size factor. If some coefficients are positive and others are negative, then an increase in c_j'x increases the values of some measurements and decreases the values of the others, in which case c_j'x may be interpreted as a shape factor. It may be of interest to note that if all the original measurements are non-negative, then the first PC of the uncorrected sum of squares and products matrix will have all its coefficients non-negative.

Note 5. Another particular case of the general problem stated in Section 2.1 is when x and y are completely different sets of variables. Such a situation arises when we have a large number of what are called instrumental variables, represented by y, and we wish to predict each dependent variable in the set x using certain linear functions of y. Such a procedure may be more economical and sometimes more efficient due to multicollinearity in y.

2.6. Principal components of x uncorrelated with concomitant variables z


In some problems it is of interest to find the principal components of a p-vector x uncorrelated with a q-vector of concomitant variables z. Let

Σ = ( Σ11  Σ12 )
    ( Σ21  Σ22 )                                                    (2.13)

denote the covariance matrix of (x', z')' in the partitioned form. We need k principal components L1'x, ..., Lk'x such that L_i'L_i = 1, L_i'L_j = 0 (i ≠ j), cov(L_i'x, z) = L_i'Σ12 = 0, i, j = 1, ..., k, and

L1'Σ11 L1 + ... + Lk'Σ11 Lk                                         (2.14)

is a maximum. It is shown in Rao (1964) that the optimum choice of L1, ..., Lk are the first k right eigenvectors of the matrix

(I − Σ12(Σ21Σ12)⁻¹Σ21)Σ11 .                                         (2.15)

As an application, let us consider a p-vector time series representing some blocks of economic transactions considered by Stone (1947).

Economic transactions              Time periods
                                   1       2       ...     T
1                                  x_{11}  x_{12}  ...     x_{1T}
...
p                                  x_{p1}  x_{p2}  ...     x_{pT}
Concomitants (functions of time)
  linear                           1       2       ...     T
  quadratic                        1       2²      ...     T²

We compute the (p + 2) order covariance matrix arising out of the main variables and concomitants, considering T as the sample size,

S = ( S11  S12 )
    ( S21  S22 )                                                    (2.16)

where S11 is of order p × p, S12 of order p × 2 and S22 of order 2 × 2. The necessary number of right eigenvectors of

(I − S12(S21S12)⁻¹S21)S11                                           (2.17)

provide principal components of x unaffected by linear and quadratic trends of the transactions over time. Elimination of lower order or higher order trends is possible by suitably choosing the concomitant variables as powers of time.

Stone (1947) considered the above problem of isolating linear functions of x which have an intrinsic economic significance from those which represent trend with time and those which measure random errors. For this purpose he computed the covariance matrix of the x variables alone and found the PC's using the eigenvectors of the S11 part of the matrix without any reference to the time factor. The problem was then posed as that of identifying the dominant PC which accounted for a large variance. This was interpreted as linear trend and other PC's were
interpreted in economic terms. It is believed that the method suggested of obtaining the PC's using the matrix (2.17) is more flexible and provides a better technique of eliminating trend of any order and providing linear functions with intrinsic economic significance.
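A numpy sketch of the computation in (2.16)-(2.17) is given below (an illustration added here; the choice of linear and quadratic concomitants and the use of scipy's general eigen-solver for the non-symmetric matrix are the only assumptions).

```python
import numpy as np
from scipy.linalg import eig

def trend_free_pcs(X, k=2):
    """X is p x T (rows: transaction blocks, columns: time periods).
    Returns k linear combinations of the rows uncorrelated with linear and
    quadratic time trends, following (2.16)-(2.17)."""
    p, T = X.shape
    t = np.arange(1, T + 1, dtype=float)
    Z = np.vstack([t, t ** 2])                     # concomitants: linear and quadratic time
    W = np.vstack([X, Z])
    Wc = W - W.mean(axis=1, keepdims=True)
    S = Wc @ Wc.T / (T - 1)                        # (p+2) x (p+2) covariance matrix (2.16)
    S11, S12 = S[:p, :p], S[:p, p:]
    S21 = S12.T
    M = (np.eye(p) - S12 @ np.linalg.solve(S21 @ S12, S21)) @ S11   # matrix in (2.17)
    vals, vecs = eig(M)                            # right eigenvectors of a non-symmetric matrix
    order = np.argsort(vals.real)[::-1]
    L = vecs[:, order[:k]].real                    # imaginary parts should be negligible here
    return L, L.T @ X                              # coefficient vectors and the derived series
```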

3. Model based principal components


3.1. An analogy with the factor analytic model

Let us suppose that the measurement p-vector x_i on individual i can be expressed as

x_i = μ + A f_i + e_i,  i = 1, ..., n                               (3.1)

where μ is a p-vector and A is a p × r matrix common to all individuals, f_i is an r-vector specific to individual i, and e_i is a random variable such that E(e_i) = 0 and V(e_i) = σ²I for i = 1, ..., n. The model (3.1) is analogous to the FA model except that in FA the covariance matrix of e_i is diagonal with possibly different elements (see Section 4 of the paper). The problem we consider is one of estimating μ, A, f_1, ..., f_n and σ² from the model (3.1). Note that the solution is not unique unless we impose certain restrictions such as that the columns of A are orthonormal. We can write the joint model (3.1) as

X = μe' + AF + E                                                    (3.2)

where X = (x1 : ... : xn) is a p × n matrix, e is an n-vector of unities, and F is an r × n matrix. We may estimate μ, A and F by minimizing

||X − μe' − AF||                                                    (3.3)

for an appropriately chosen norm. The choice of the Frobenius norm leads to an extended method of least squares where the expression

Σ_{i=1}^{n} (x_i − μ − A f_i)'(x_i − μ − A f_i)                     (3.4)

is minimized with respect to μ, A and f_1, ..., f_n. One possible solution (see Rao (1995)) is

μ̂ = x̄,   Â = (c1 : ... : c_r)                                       (3.5)

where c1, ..., c_r are the first r eigenvectors of S = X(I − (1/n)ee')X'. Then f̂_i is the vector of r PC's for the individual i. We thus have the same solution as that discussed in Sections 2.2 - 2.5. An estimate of σ² is

σ̂² = (n − 1)(ℓ_{r+1}² + ... + ℓ_p²) / [(n − r − 1)(p − r)]          (3.6)

where ℓ_{r+1}², ..., ℓ_p² are the last (p − r) eigenvalues of S.

Principal component and factor analyses

497

In some problems, it may be appropriate to consider f_i in the model (3.1) as a random variable with the identity I as covariance matrix. In such a case

E(S) = AA' + σ²I,                                                   (3.7)

an estimate of A is

Â = (ℓ1 c1 : ... : ℓ_r c_r),                                        (3.8)

and an estimate of σ² is

σ̂² = (n − 1)(ℓ_{r+1}² + ... + ℓ_p²) / [(n − r − 1)(p − r)]          (3.9)

which are the same as in (3.5) and (3.6) except for scaling factors. If it is desired to estimate (predict) f_i, one may use the regression of f_i on x_i, which is of the form

f̂_i = Â'(ÂÂ' + σ̂²I)⁻¹(x_i − x̄)                                      (3.10)

and differs from the expression (3.5). A similar situation arises when we want to estimate the parameters simultaneously from several linear models having the same design matrix. Reference may be made to Rao (1975) for a discussion of such a problem.

3.2. Regression problem based on a PC model

We have n independent observations on a (p + 1)-vector random variable (y, x), where x is a p-vector and y is a scalar,

(y1, x1), ..., (yn, xn)                                             (3.11)

and only x_{n+1} for the (n + 1)-th sample. The problem is to predict y_{n+1}, the unobserved value, under the PC model

x_i = α1 + A f_i + e_i                                              (3.12)
y_i = α2 + b'f_i + η_i                                              (3.13)

i = 1, ..., n + 1, where cov(e_i, η_i) = 0, cov(e_i) = σ²I, V(η_i) = σ_0², and the rest of the assumptions are the same as in the model (3.1). The above problem was considered in a series of papers (see Rao (1975, 1976, 1978, 1987) and Rao and Boudreau (1985)). Recently, the model (3.12)-(3.13) has been used in the development of partial least squares (see Helland (1988) and the references therein). There are several possible approaches to the problem.

1) Let f̂_1, ..., f̂_{n+1} be the estimates of f_1, ..., f_{n+1} using the observational equations (3.12) only. Then find estimates α̂2 and b̂ of α2 and b, using the first n observational equations of (3.13) and assuming f̂_1, ..., f̂_{n+1} as known, by the usual least squares method. Finally predict y_{n+1} by the formula

ŷ_{n+1} = α̂2 + b̂'f̂_{n+1} .                                          (3.14)

2) Let α̂1, α̂2, Â and b̂ be the estimates of α1, α2, A and b using the first n observational equations in (3.12) and (3.13). Then estimate f_{n+1} using the equations

x_{n+1} = α1 + A f_{n+1} + e_{n+1}                                  (3.15)

assuming α1 and A as known, by the least squares method. If f̂_{n+1} is the estimate of f_{n+1}, then y_{n+1} is predicted by

ŷ_{n+1} = α̂2 + b̂'f̂_{n+1} .                                          (3.16)

3) Substitute a value, say y, for y_{n+1} to make the equations (3.12)-(3.13) complete. Then find the singular value decomposition of the partitioned matrix

( x1  ...  xn  x_{n+1} )
( y1  ...  yn  y       ) (I − (n + 1)⁻¹ee') = ℓ1 c1 q1' + ... + ℓ_{p+1} c_{p+1} q_{p+1}'

where the ℓ_i depend on y, and compute

S_r(y) = ℓ_{r+1}²(y) + ... + ℓ_{p+1}²(y) .                          (3.17)

Finally predict y_{n+1} as the value of y which minimizes (3.17). The solution may be obtained graphically or by an iterative algorithm as described in Rao and Boudreau (1985).

4) Another method is to consider f_i as a random variable with zero mean vector and covariance matrix Γ. Then

cov( x_i )  =  ( AΓA' + σ²I    AΓb        )
   ( y_i )     ( b'ΓA'         b'Γb + σ_0² )                        (3.18)

Using (3.12) and the first n observational equations in (3.13), obtain the estimates of A, Γ, b, σ² and σ_0². Methods described by Bentler (1983), Sörbom (1974) and Rao (1983, 1985) may be used for this purpose. Then y_{n+1} may be predicted by

ŷ_{n+1} = ȳ + b̂'Γ̂Â'(ÂΓ̂Â' + σ̂²I)⁻¹(x_{n+1} − x̄)                      (3.19)

where ȳ = n⁻¹Σ y_i, x̄ = (n + 1)⁻¹Σ x_i, and for b, Γ, A and σ² their estimates are substituted.
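A numpy sketch of approach 1 (added here for illustration): estimate the f_i by principal components from (3.12), fit (3.13) to the first n observations by least squares, and predict y_{n+1} by (3.14). The treatment of the mean and the fixed choice of r are assumptions of the sketch.

```python
import numpy as np

def pc_regression_predict(X, y, x_new, r):
    """X: p x n matrix of regressor observations, y: length-n response,
    x_new: p-vector for the (n+1)-th case. Approach 1 of Section 3.2."""
    p, n = X.shape
    xbar = X.mean(axis=1, keepdims=True)
    S = (X - xbar) @ (X - xbar).T / (n - 1)
    ell2, C = np.linalg.eigh(S)
    A_hat = C[:, np.argsort(ell2)[::-1][:r]]        # first r eigenvectors, cf. (3.5)
    F = A_hat.T @ (X - xbar)                        # estimated scores f_hat_i (r x n)
    f_new = A_hat.T @ (x_new - xbar.ravel())        # f_hat_{n+1}: least squares in (3.15)
    # least squares fit of (3.13): y_i = alpha_2 + b' f_i + eta_i
    D = np.column_stack([np.ones(n), F.T])
    coef = np.linalg.lstsq(D, y, rcond=None)[0]
    return coef[0] + coef[1:] @ f_new               # prediction (3.14)
```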

4. Factor analysis

4.1. General discussion


In FA, a p-vector variable x is endowed with a stochastic structure

x = μ + A f + e                                                     (4.1)

where μ is a p-vector and A is a p × r matrix of parameters, f is an r-vector of latent variables called common factors and e is a p-vector of variables called specific factors, with the following assumptions:

E(e) = 0,  cov(e) = Δ, a diagonal matrix,
E(f) = 0,  cov(f, e) = 0,  cov(f) = I .                             (4.2)

As a consequence of (4.2), we have

Σ = cov(x) = AA' + Δ .                                              (4.3)

Note that (4.3) reduces to the PC model considered in (3.1) when Δ = σ²I. The problems generally discussed in FA, on the basis of n independent observations x1, ..., xn made on x, are:

1) What is the minimum r for which the representation (4.3) holds?
2) How do we estimate A, called the matrix of factor loadings?
3) How do we interpret the factors?
4) How do we estimate f for a given individual, given the observable x?

It may be noted that the equation (4.3) does not ensure the existence of a unique A even for a given r, and so also f in (4.1). However, the object is to obtain any particular solution, and consider transformations of A and f for an interpretation. References to a discussion of the non-identifiability of A and f and rotation of factors are Basilevsky (1994, pp. 355-360, 402-404), Jackson (1991, pp. 393-396), Jolliffe (1986, pp. 117-118).

Denoting X = (x1, ..., xn), we compute

x̄ = n⁻¹Xe,   S = (n − 1)⁻¹X(I − n⁻¹ee')X'

as estimates of μ and Σ. Then estimate A and Δ starting with S. The most commonly used method is maximum likelihood (ML) under the assumption of multivariate normality of the vector variable x. There are a number of computer packages for the estimation of r, the number of factors, A, the matrix of factor loadings, and Δ, the matrix of specific factor variances. (See for instance SPSS, SAS, OSIRIS, BMD, COFAMM etc., which also offer alternatives other than ML estimates and also compute rotations of factor loadings for interpretation.) Let us denote the ML estimates of A and Δ by Â and Δ̂. The likelihood ratio test criterion for testing the hypothesis that there are r common factors is

(n − 1) log ( |ÂÂ' + Δ̂| / |S| )                                     (4.5)


which is asymptotically distributed as χ² on [(p − r)² − p − r]/2 degrees of freedom in large samples. This is valid under the assumption of multivariate normality. A slight improvement to the χ² approximation is obtained by replacing the multiplier (n − 1) in (4.5) by

n − 1 − (2p + 5)/6 − 2r/3 .                                         (4.6)

An alternative method, called canonical factor analysis (CFA), for the estimation of A and Δ was developed by Rao (1955) without making any distributional assumptions. The solution turns out to be the same as the ML estimate. However, the χ²-test of (4.5) requires the assumption of multivariate normality. A general recommendation is to test for multivariate normality based on the observed data x1, ..., xn using some of the techniques available in computer packages. Some references to a discussion of tests of normality are Basilevsky (1994, Section 4.6.2) and Gnanadesikan (1977, Section 5.4.2). It may also be worthwhile making transformations of variables to achieve normality, but in such a case the factor structure has to be imposed on the transformed variables. It may be noted that, unlike PCA, FA is invariant under scaling of variables if one uses scale-free extraction methods such as ML and CFA. In these cases, one can use either the covariance or the correlation matrix to start with. If the covariance matrix is used and the scales vary very widely, scale factors will complicate the interpretation of results. In such a case, there is some advantage in using the correlation matrix. The covariance matrix is preferable when a comparison of factor structures between groups is involved (see Sörbom (1974)).
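Given ML estimates Â and Δ̂ from any factor-analysis routine, the statistic (4.5) with the corrected multiplier (4.6) and the degrees of freedom quoted above can be computed as in the following sketch (added here for illustration; the estimates themselves are taken as inputs).

```python
import numpy as np
from scipy.stats import chi2

def lr_test_r_factors(S, A_hat, delta_hat, n, r):
    """Likelihood ratio test of the hypothesis of r common factors, eqs. (4.5)-(4.6).
    S: p x p sample covariance (or correlation) matrix, A_hat: p x r ML loadings,
    delta_hat: length-p specific variances, n: sample size."""
    p = S.shape[0]
    Sigma_hat = A_hat @ A_hat.T + np.diag(delta_hat)
    log_ratio = np.linalg.slogdet(Sigma_hat)[1] - np.linalg.slogdet(S)[1]
    multiplier = n - 1 - (2 * p + 5) / 6 - 2 * r / 3     # corrected multiplier (4.6)
    stat = multiplier * log_ratio
    df = ((p - r) ** 2 - p - r) / 2
    return stat, df, chi2.sf(stat, df)                   # statistic, d.f., p-value
```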
4.2. Estimation of factor scores

Using the estimates Â and Δ̂ of A and Δ in the representation of Σ, we can estimate the factor score f_i of the i-th individual with measurements x_i by

f̂_i = Â'(ÂÂ' + Δ̂)⁻¹(x_i − x̄),   i = 1, ..., n.                      (4.7)

The expression (4.7) is simply the regression of f on x_i with the estimates substituted for the unknowns. There are other expressions suggested for the estimates of factor scores (see Jackson (1991, p. 409)).
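A direct numpy transcription of (4.7), again taking the estimated loadings and specific variances as given (an illustrative sketch added here):

```python
import numpy as np

def regression_factor_scores(X, A_hat, delta_hat):
    """Factor scores by the regression method (4.7). X is p x n; A_hat is the
    p x r loading matrix and delta_hat the vector of specific variances."""
    xbar = X.mean(axis=1, keepdims=True)
    Sigma_hat = A_hat @ A_hat.T + np.diag(delta_hat)        # fitted covariance, cf. (4.3)
    return A_hat.T @ np.linalg.solve(Sigma_hat, X - xbar)   # r x n matrix of scores f_hat_i
```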
4.3. Prediction problem

We consider a (p + 1)-vector variable (x, y) with the factor structure

x = μ + A f + e
y = β + a'f + η                                                     (4.8)

where β is a scalar, a is an r-vector and η is such that E(η) = 0, cov(f, η) = 0, V(η) = σ_{p+1}². Suppose that we have observations (x1, y1), ..., (xn, yn) on n individuals and only x_{n+1} on an (n + 1)-th individual. The problem is to predict y_{n+1}, given all the other observations. By considering the factor structure of the (p + 1)-vector variable

( x )   ( μ )   ( A  )       ( e )
(   ) = (   ) + (    ) f  +  (   )                                  (4.9)
( y )   ( β )   ( a' )       ( η )

and using the observations (x1, y1), ..., (xn, yn), we estimate all the unknown parameters. Let μ̂, β̂, Â, â, Δ̂ and σ̂_{p+1} be estimates of the corresponding parameters using the CFA or ML method. Then the regression estimate of y_{n+1} on x_{n+1} is

ŷ_{n+1} = β̂ + â'Â'(ÂÂ' + Δ̂)⁻¹(x_{n+1} − x̄) .                        (4.10)

In this case, we are not utilizing the information provided by x_{n+1} on the parameters a, A and Δ.

4.4. What is the difference between PCA and FA?


In PCA, we do not impose any structure on the p-vector random variable x. Suppose that E(x) = 0 and cov(x) = Σ. We wish to replace x by a smaller number of linear combinations y = L'x, where L is a p × r matrix of rank r. Then the predicted value of x given y (i.e., the regression of x on y) is

x̂ = ΣL(L'ΣL)⁻¹y                                                     (4.11)

and the covariance matrix of the residual x − x̂ is

Σ − ΣL(L'ΣL)⁻¹L'Σ .                                                 (4.12)

We wish to choose L to minimize a suitable norm of (4.12). The choice of the Frobenius norm leads to the solution

L = (e1 : ... : e_r)                                                (4.13)

where e1, ..., e_r are the first r eigenvectors of Σ, in which case L'x represents the first r principal components as explained in Section 3. The aim is to account for the entire covariance matrix of x, to the extent possible, in terms of a reduced number of variables.

In FA, we are fitting an expression of the type AA' + Δ to R, the correlation matrix of the p-vector variable x. Since Δ is a diagonal matrix of free parameters, the matrix A is virtually determined by minimizing the differences between the off-diagonal elements of AA' and R. Thus, the matrix of factor loadings is designed to explain the correlations between the observed variables. The variances in the variables unexplained by the factors, irrespective of their magnitudes, are characterized as specific variances. In PCA, the emphasis is more on explaining the overall variances arising out of both the common and specific factors. Thus, the objectives of PCA and FA are different and so are the solutions.


Note 1. Fitting an expression of the type AA' + Δ to R imposes an automatic upper bound on r, the number of factors. So, in a given situation, one is forced to interpret the data in terms of far fewer factors than those that may have influenced the data. In the CFA developed by the author (Rao (1955)), no limit is placed on the number of common factors, but the method allows for the requisite number of dominant factors to be extracted from the data. No fixed number of factors is postulated to begin with, and the problem is treated as one of estimation rather than testing of hypotheses on the number of factors.

Note 2. It may be of interest to note that in the formulation of the FA model, only the second order properties of the common and specific factors are used. However, if we demand independence of the distribution of all these variables, the problem becomes more complex, as the following theorem proved in Rao (1969, 1973) shows.

THEOREM. Let x be a p-vector random variable with a linear structure x = Ay, where y is a q-vector of independent r.v.'s. Then x admits the decomposition

x = x1 + x2

where x1 and x2 are independent, x1 has essentially a unique structure (x1 = A1 y1 with a unique A1 apart from scaling and y1 as a vector of a fixed number of independent non-normal variables) and x2 has a p-variate normal distribution with a non-unique linear structure (x2 = B2 y2 with B2 not necessarily unique and y2 as a vector of independent univariate normal variables).

In view of this theorem, if some of the factors have a non-normal distribution, the uniqueness of A1 automatically specifies a lower bound to the number of factor variables, which may have no relationship with p. The limitations placed on the FA model by considering only second order properties of the variables involved need some investigation.

4.5. The arbitrage pricing theory model (APT)


The classical FA model is extended to a statistical model of the APT by Ross (1976), which is similar to the growth curve model of Rao (1958, equation 9, Section 3). Consider the usual FA model, using the notation used in the finance literature,

R = μ + B f + u                                                     (4.18)

where R denotes the N-vector of returns on N assets, μ = E(R), E(f) = 0, E(u) = 0, E(fu') = 0, cov(f) = Φ and cov(u) = Δ, a diagonal matrix. The matrix B of order N × k is the matrix of factor loadings. [In the earlier sections p is used for N and r for k.] From the assumptions made,

Σ = cov(R) = BΦB' + Δ .                                             (4.19)

Now, we model μ as

μ = R_f e + Bλ                                                      (4.20)

where R_f is described as the riskless return on a riskless asset. The sample we have over T time periods is

(R_1, R_{f1}), ..., (R_T, R_{fT})                                   (4.21)

where in (4.21), R_{ft} is known and varies over time, and λ is a k-vector of unknown parameters called the factor premiums. Writing r_t = R_t − R_{ft} e, we can write the model for the t-th observation as

r_t = B(f_t + λ) + u_t,   t = 1, ..., T                             (4.22)

which is exactly the model considered in Rao (1958). The marginal model for r_t is

r_t = Bλ + v_t,   t = 1, ..., T                                     (4.23)

with cov(v_t) = Σ. If B and Σ are known, the least squares estimate of λ is

λ̂ = (B'Σ⁻¹B)⁻¹B'Σ⁻¹ r̄                                               (4.24)

where r̄ = T⁻¹(r_1 + ... + r_T). If B and Σ are not known, it is suggested by Roll and Ross (1980) and also Rao (1958) that they can be estimated by ML or an appropriate nonparametric method, considering the model (4.18) with unrestricted μ as discussed in Section 4.2 of this article, and substituted in (4.23). If multivariate normality is assumed for the distribution of f and u in the model (4.18), it is possible to write down the likelihood for all the unknown parameters B, λ, Φ and Δ based on the observations r_1, ..., r_T and obtain the ML estimates for all the unknown parameters. We can then also apply likelihood ratio tests for the specification of Σ, i.e., for the number of factors, and the structure (4.20) on μ. Such a procedure is fully worked out in Christensen (1995), where the method is applied to New York Stock Exchange data.
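With estimates B̂ and Σ̂ in hand (for instance from an unrestricted factor analysis of the excess returns), the estimate (4.24) is a one-step GLS computation; the sketch below (added for illustration) takes those estimates as inputs.

```python
import numpy as np

def factor_premiums(excess_returns, B_hat, Sigma_hat):
    """Estimate the k-vector of factor premiums lambda from (4.23)-(4.24).
    excess_returns: T x N matrix of r_t = R_t - R_ft * e; B_hat: N x k loadings;
    Sigma_hat: N x N covariance of v_t."""
    rbar = excess_returns.mean(axis=0)                      # r_bar = T^{-1} sum r_t
    Si_B = np.linalg.solve(Sigma_hat, B_hat)                # Sigma^{-1} B
    lam = np.linalg.solve(B_hat.T @ Si_B, Si_B.T @ rbar)    # (B'Sigma^{-1}B)^{-1} B'Sigma^{-1} r_bar
    return lam
```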

5. Conclusions

Both PCA and FA may be considered as multivariate methods for exploratory data analysis. The aim of both the analyses is to understand the structure of the data, through reducing the number of variables, which in some sense can replace the original data and which are easier to study through graphical representation and multivariate inference techniques. Some caution is necessary, as there are many decisions to be made on the number of reduced variables and the criterion by which the adequacy of the reduced set of variables in representing the whole set of original variables is judged.

Some practitioners consider PCA and FA as alternative techniques of multivariate data analysis intended to answer the same questions. It is also claimed that each technique has evolved into a useful data-analytic tool and has become an invaluable aid to other statistical methods such as cluster and discriminant
analysis, least squares regression, graphical data displays, and so forth. As discussed in the present article, the purposes of reduction of data in PCA and FA are different. In PCA, the reduced data are intended to approximate, to the maximum possible extent, the dispersion of the original data in terms of the entire covariance matrix, while in FA, the emphasis is on explaining the correlations or associations between the original variables. The objectives are different, and a decision has to be made as to the appropriateness of PCA or FA in a particular situation and the purpose of data analysis.

While the roles of PCA and FA in exploratory data analysis are clear, the exact uses of the estimated PC's and factors in inferential data analysis, or in planning further investigations, do not seem to be satisfactorily laid out. Some conditions under which the factor scores and principal components are close to each other have been given by Schneeweiss and Mathes (1995). It would be of interest to pursue such theoretical investigations and also examine in individual data sets the actual differences between principal components and factor scores.

References
Bartholomew, D. J. (1987). Latent Variable Models andFactor Analysis. Oxford University Press, New York. Basilevsky, A. (1994). Statistical Factor Analysis and Related Methods. Wiley, New York. Bentler, P. M. (1983). Some contributions to efficient statistics in structural models: Specification and estimation of moment structures. Psychometrika 48, 493-517. Benzecri, J. P. (1973). L'analyze des Donnes, Tome II, L'Analyse des Correspondences. Dunod, Paris. Cartel, R. B. (1978). The Scientific Use of Factor Analysis in Behavioural and Life Science. Plenum Press. Christensen, B. J. (1995). The likelihood ratio test of the APT with unobservable factors against the unrestricted factor model. Tech. Rept. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Ann. Eugen, London 7, 179-188. Gnanadesikan, R. (1977). Methods for Statistical Analysis of Multivariate Observations. Wiley, New York. Greenacre, M. J. (1984). Theory and Applications of Correspondence Analysis. Academic, London. Helland, I. S. (1988). On the structure of partial least squares regression. Commun. Statist. Simula. 17, 581-607. Hotelling, H. (1933). Analysis of a complex of statistical variable into principal components. Psychometrika 1, 27-35. Jackson, J. E. (1991). A User's Guide to Principal Components. Wiley, New York. Jolicoeur, P. and J. E. Mosiman (1960). Size and shape variation in the painted turtle, a principal component analysis. Growth 24, 339-354. Joliffe, I. T. (1986). Principal Component Analysis. Springer-Verlag, New York. Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proc. Roy. Soc. Edinburgh (A), 60, 64-82. Pearson, K. (1901). On lines and planes of closest fit to a system of points in space. Philosophical Magazine 2, 6-th Series, 557-572. Rao, C. R. (1955). Estimation and tests of significance in factor analysis. Psychometrika 20, 93-111. Rao, C. R. (1958). Some statistical methods for comparison of growth curves. Biometrics 14, 1-17.


Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhyd A 26, 329-358. Rao, C. R. (1969). A decomposition theorem for vector variables with a linear structure. Ann. Math. Statist. 40, 1845-1849. Rao, C. R. (1973). Linear Statistical Inference and its Applications, 2nd ed., Wiley, New York. Rao, C. R. (1975). Simultaneous estimation of parameters in different linear models and applications to biometric problems. Biometrics 31, 545-554. Rao, C. R. (1976). Prediction of future observations with special reference to linear models. In: P. R. Krishnaiah, ed., Multivariate Analysis VI, North Holland, 193-208. Rao, C. R. (1983). Likelihood ratio tests for relationships between covariance matrices. In: S. Karlin, T. Ameniya and L. A. Goodman, eds., Studies in Economics, Time Series and Multivariate Statistics. Academic, New York, 529-543. Rao, C. R. and R. Boudreau, (1985). Prediction of future observations in factor analytic type growth model. In: P. R. Krishnaiah, ed., Multivariate Analysis VI. Elsevier, Amsterdam, 449-466. Rao, C. R. (1987). Prediction of future observations in growth curve models. J. Statist. Science 2, 434-471. Rao, C. R. (1995). A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qiiestii6 19, 23-63. Roll, R. and S. A. Ross (1980). An empirical investigation of the arbitrage pricing theory. J. Finance 35, 1073-1103. Ross, S. A. (1976). The arbitrage theory of capital asset pricing. J. Econom. Theory 13, 341-360. Schneeweiss, H. and Mathes, H. (1995). Factor analysis and principal components. J. Multivariate Analysis 55, 105-124. S6rbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British J. Math. Statist. Psych. 27, 229-239. Spearman, C. (1904). General intelligence, objectively determined and measured. Am. J. Psych. 15, 201-293. Stone, R. (1947). An interdependence of blocks of transactions. J. Roy. Statist. Soc. (Supple), 8, 1-32. Wegman, E. J., D. B. Cart and Q. Luo (1993). Visualizing multivariate data. In: C. R. Rao, ed., Multivatiate Analysis: Future Directions. North Holland, 423-466.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

17

Errors-in-Variables Problems in Financial Models

G. S. Maddala and M. Nimalendran

1. Introduction

The errors-in-variables (EIV) problems in finance arise from using incorrectly measured variables or proxy variables in regression models. Errors in measuring the dependent variables are incorporated in the disturbance term and they cause no problems. However, when an independent variable is measured with error, this error appears in both the regressor variable and in the error term of the new regression model. This results in contemporaneous correlation between the regressor and the error term, and leads to a biased OLS (Ordinary Least Squares) estimator (even asymptotically) and inconsistent standard errors. The biases introduced by measurement errors can be significant and can lead to incorrect inferences. Further, when there is more than one regressor variable in the model, the direction of the bias is unpredictable. The effect of measurement errors on OLS estimators is discussed extensively in several econometrics texts including Maddala (1992) and Greene (1993). A comprehensive discussion of errors-in-variables models is in Fuller (1987) and a discussion in the context of econometric models is in Griliches (1985), and Chamberlain and Goldberger (1990). The errors in the regressor variable could be due to several causes. We can classify them into the following two groups: (1) measurement errors, and (2) use of proxy variables for unobservable theoretical concepts, constructs or latent variables. Measurement errors could be introduced by using estimated values in the regression model. Examples of this are the use of estimated betas as regressors in cross-sectional tests of the CAPM (Capital Asset Pricing Model), and two-pass tests of the APT (Arbitrage Pricing Theory) where estimated rather than actual factor loadings are used in the second-pass tests. The second major source of errors arises from the use of proxy variables for unobservable or latent variables. An example of this in finance would be the testing of signaling models where the econometrician observes only a noisy signal of the underlying attribute that is being signaled. In this article we examine several alternative models and techniques employed in financial models to mitigate the errors-in-variables problems. Some areas in finance where errors-in-variables problems are encountered are described below:


I. Testing asset pricing models: There are several potential problems in these tests; these include measurement errors associated with the use of estimates for risk measures and the problem associated with the unobservability of the true market portfolio.

II. Performance measurements: Measuring the performance of managed portfolios (mutual funds, pension funds etc.) is an important exercise that provides information about the ability of managers to provide superior returns. However, any method used to measure performance must specify a benchmark, and an incorrect specification of the benchmark would introduce errors in the performance measures.

III. Market response to corporate announcements: Several articles analyze the response of the market to unexpected earnings, unexpected dividends, unexpected splits and other announcements. To obtain the unexpected component of the variable one needs to specify a model for the expected component. An incorrect specification of the expectation model or estimation errors can result in the unexpected component being measured with error.

IV. Testing of signaling models: In signaling models it is argued that managers with private information can employ indicators such as dividends, earnings, splits, capital structure etc. to signal their private information to the market. In testing these models one has to realize that the indicators are noisy measures of the underlying attribute that is signaled (investment opportunities, future cash flows etc.).

A researcher can employ several approaches to correct for the errors-in-variables problem, and to obtain consistent estimates and standard errors. We examine these approaches under the following eight classifications: (1) Grouping Methods, (2) Direct and Reverse Regressions, (3) Alternatives to Two Pass Methods, (4) MIMIC Models, and (5) Artificial Neural Networks (ANN) models. We also discuss other models where the errors-in-variables problems are relevant. These are examined under the categories: (6) Signal Extraction Models, (7) Qualitative Limited Dependent Variable Models, and (8) Factor Analysis with Measurement Errors.

2. Grouping methods
Grouping methods have been commonly used in finance as a solution to the errors-in-variables problem. See, for instance, Black, Jensen and Scholes (1972), Fama and MacBeth (1973) and Fama and French (1992) for a recent illustration. We will refer to these papers as BJS, FM and FF respectively in subsequent discussion. The basic approach involves a two-pass technique. In the first pass, time series data on each individual security are used to estimate betas for each security. In the second pass a cross-section regression (CSR) for the average returns on the securities is estimated using the betas obtained from the first pass as regressors. This introduces the errors-in-variables problem. Since grouping
methods can be viewed as instrumental variable (IV) methods, grouping is used to solve this errors-in-variables problem. There are frequent references to Wald's classic paper in this literature but the simple grouping method used by Wald is not the one used in these papers. Wald's method consists of ranking the observations, forming two groups and then passing a line between the means of the two groups. Later articles suggested that the efficiency of the estimator could be improved by dividing the data into three groups, discarding the observations in the middle group, and passing the line between the means of the upper and lower groups. Wald's procedure amounts to using rank as an instrumental variable, but since rank depends on the measurement error, this cannot produce a consistent estimator (a point noted by Wald himself). Pakes (1982) argues that contrary to the statements often made in several textbooks (including the text by Maddala, 1977, which has been corrected in Introduction to Econometrics, 2nd ed., 1992) the grouping estimator is not consistent. This problem has also been pointed out in the finance literature in a recent paper by Lys and Sabino (1992) although there is no reference in this paper to the work of Pakes (1982). The grouping method used in FM and FF is not the simple grouping method used by Wald. The procedure is to estimate the betas with, say, monthly observations on the first 5 years and then rank the securities based on these estimated betas to form 20 groups (portfolios). Then the estimation sample (omitting the first 5 years of data) is used to estimate a cross-section regression of asset returns on the betas for the different groups.

2.1. Cross-sectional tests


In the cross-sectional tests of the CAPM, the average return on a cross-sectional sample of securities over some time period is regressed against each security's beta (β) with respect to a market portfolio. In the first stage, β_i is estimated from a time-series regression of the individual stock returns R_it on the return on a market index R_Mt:
R_it = α_i + β_i R_Mt + v_it .    (1)

In the second stage, the average return on the individual security, R̄_i, is regressed on the estimate of beta in a cross-sectional regression:

R̄_i = γ_0 + γ_1 β̂_i + u_i .    (2)

Finally, the estimated coefficient γ̂_0 is compared to the risk-free rate (R_f) in the period under examination and γ̂_1 is compared to an estimate of the risk premium on the market (R̄_M - R_f) estimated from the same estimation period. The first direct test based on cross-sectional regression was by Douglas (1969). In this test Douglas estimated a cross-sectional model of the average return on a large number of common stocks on the stocks' own variance and on their covariance with a market index. The tests were inconsistent with the CAPM because the
coefficient on the variance term was significant while the coefficient on the covariance term was not significant. A detailed analysis of the econometric problems that arise from a cross-sectional test was first given by Miller and Scholes (1972). They concluded that measurement error in fli was a significant source of bias that contributed toward the findings by Douglas. Fama and MacBeth (1973) use a portfolio approach to reduce the errors-in-variables problem. In particular, they estimate the following cross-sectional-time-series model.
R_pt = γ_0t + γ_1t β̄_p,t-1 + γ_2t β̄²_p,t-1 + γ_3t σ̄_p,t-1(ε̂) + η_pt ,    (3)

where β̄_p is the average of the betas for the individual stocks in a portfolio, β̄²_p is the average of the squared betas, and σ̄_p(ε̂) is the average residual variance from the market model given by equation (1). If β̂_i is estimated with an unbiased measurement error v_i, then the regression estimate of γ_1 for the model described by equation (2) is given by

plim γ̂_1 = γ_1 / (1 + Var(v_i)/Var(β_i)) ,    (4)

where Var(v_i) is the variance of the measurement errors, and Var(β_i) is the cross-sectional sample variance of the true risk measures β_i. Thus, even for large samples, as long as the β_i's are measured with errors, the estimated coefficient γ̂_1 will be biased toward zero and γ̂_0 will be biased away from its true value. The idea behind the grouping or portfolio technique is to minimize Var(v_i) through the portfolio diversification effect, and at the same time to maximize Var(β_i) by forming portfolios by ranking on the β̂_i's.
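To make the attenuation in equation (4) concrete, the following is a minimal simulation sketch (not from the chapter): the parameter values, the prior-period ranking variable and the 20-portfolio design are illustrative assumptions, and numpy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
N, gamma0, gamma1 = 2000, 0.005, 0.008            # hypothetical parameter values

beta = rng.normal(1.0, 0.3, N)                     # "true" betas
beta_hat = beta + rng.normal(0.0, 0.3, N)          # estimation-period estimates (error v_i)
beta_rank = beta + rng.normal(0.0, 0.3, N)         # prior-period estimates, used only for ranking
r_bar = gamma0 + gamma1 * beta + rng.normal(0.0, 0.002, N)   # average returns

# Second-pass OLS on the error-ridden betas: the slope is attenuated toward zero.
g1_ols = np.polyfit(beta_hat, r_bar, 1)[0]

# Grouping on the prior-period estimates (as in FM/FF): averaging within portfolios
# shrinks Var(v_i) while the ranking preserves the spread in the true betas.
portfolios = np.array_split(np.argsort(beta_rank), 20)
bp = np.array([beta_hat[p].mean() for p in portfolios])
rp = np.array([r_bar[p].mean() for p in portfolios])
g1_grouped = np.polyfit(bp, rp, 1)[0]

print(g1_ols, g1_grouped, gamma1 / (1 + 0.3**2 / beta.var()))  # last term: eq. (4)
```

With these hypothetical values the OLS slope sits well below γ_1, close to the equation (4) prediction, while the grouped slope is much nearer to γ_1.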
2.2. Time series and multivariate tests


Black, Jensen and Scholes (1972) employ a time-series procedure to test the CAPM that avoids the errors-in-variables problem. They estimate the following model:
(R_pt - R_Ft) = α_p + β_p (R_Mt - R_Ft) + ε_pt ,    (5)

where R_pt is the return on a portfolio of stocks ranked by their betas estimated from a prior period, R_Ft is the risk-free rate, and R_Mt is the return on the market portfolio. In this specification, the test is based on the hypothesis that α_p = 0 if the CAPM is valid. Gibbons (1982) employs a multivariate regression framework in which the asset pricing models are cast as nonlinear parameter restrictions. The approach avoids the errors-in-variables problems introduced by the two-pass cross-sectional tests. Gibbons uses the method to test Black's (1972) version of the CAPM which specifies the following linear relationship between expected return on the security and risk:

E(R_it) = γ + β_i [E(R_mt) - γ] ,    (6)

where E(R_it) is the expected return on security i for period t, E(R_mt) is the expected return on the market portfolio for period t, γ is the expected return on a zero-beta portfolio, and β_i = cov(R_it, R_mt)/var(R_mt). In addition, if asset returns are stationary with a multivariate normal distribution, then they can be described by the "market model"
R_it = α_i + β_i R_mt + η_it ,    i = 1, ..., N ,   t = 1, ..., T .    (7)

In terms of equation (7), Black's model given by equation (6) implies the restrictions

α_i = γ(1 - β_i)    for all i = 1, ..., N .    (8)

Thus, Black's version of the CAPM places nonlinear restrictions on a system of N regression equations. The errors-in-variables problems with the two-pass procedure are avoided by estimating γ and the β's simultaneously. Gibbons employs a likelihood ratio statistic to test the restrictions implied by the CAPM. One important point to note in the cross-sectional tests is that grouping to take care of errors in variables is not necessary. The problem here is not the one in the usual EIV models where the variance of the measurement error is not known. Note that the betas are estimated but their variance is known. This knowledge is used in Litzenberger and Ramaswamy (1979) (referred to later as L-R) to get bias-corrected estimates. In the statistical literature this method is known as the consistent adjusted least squares (CAL) method and has been discussed by Schneeweiss (1976), Fuller (1980) and Kapteyn and Wansbeek (1984), although the conditions under which the error variances are estimated are different in the statistical literature and the financial literature. The L-R method involves subtracting an appropriate expression from the cross-product matrix of the estimated beta vector to neutralize the impact of the measurement error. The modified estimator is consistent as the number of securities tends to infinity. However, in practice, this adjustment does not always yield a cross-product matrix that is positive definite. In fact, Shanken and Weinstein (1990) observe this in their work and argue that more work is needed on the properties of the L-R method. Banz (1981) also mentions "serious problems in applying the Litzenberger-Ramaswamy estimator" in his analysis of the firm size effect. Besides the L-R method, another promising alternative to the traditional grouping procedure for correcting the EIV bias is the maximum likelihood method. Shanken (1992) discusses the relationship between the L-R method and the ML method. In addition to the bias correction problem there is the problem of correcting the standard errors of the estimated coefficients. Shanken (1992) derives the correction factors for the standard errors in the presence of errors-in-variables.
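The following is a minimal sketch of the adjustment idea behind such bias-corrected estimators, assuming the measurement-error covariance of the estimated betas is known; it illustrates the general principle rather than the exact Litzenberger-Ramaswamy formulae, and numpy is assumed.

```python
import numpy as np

def adjusted_least_squares(X, y, Sigma_v):
    """Slope estimates when the regressors in X (e.g. estimated betas, expressed
    in deviations from their means) carry measurement error with known covariance
    Sigma_v.  The error contribution is subtracted from the cross-product matrix
    before solving the normal equations."""
    n = X.shape[0]
    Sxx = X.T @ X / n - Sigma_v        # bias-corrected moment matrix
    Sxy = X.T @ y / n
    # In finite samples Sxx need not be positive definite -- the practical
    # difficulty noted by Shanken and Weinstein (1990).
    return np.linalg.solve(Sxx, Sxy)
```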


2.3. Grouping in the presence of multiple proxies


The above discussion refers only to simple regression models with one regressor (the estimated beta). However, there are models where several regressors are measured with error. Here, grouping by only one variable amounts to using only one instrumental variable, and therefore cannot produce consistent estimates. An example of multiple proxies is the paper by Chen, Roll and Ross (1986), which uses the Fama-MacBeth procedure. We will refer to this paper as CRR. They consider five variables describing the economic conditions (monthly growth in industrial production, change in expected inflation, unexpected inflation, term structure, and a risk premium measured as the difference between the return on low grade (Baa) bonds and long-term government bonds). They use a two-pass procedure. In the first pass the returns on a sample of assets are regressed on the five economic state variables over some estimation period (the previous five years). In the second pass the beta estimates from the first pass are used as independent variables in 12 cross-sectional regressions, one for each of the next 12 months, with asset returns for the month being the dependent variable. Each coefficient in this regression provides an estimate of the risk premium associated with the corresponding state variable. The two-pass procedure is repeated for each year in the sample, yielding time-series estimates of the risk premia associated with the macro variables. The time series means are then tested by a t-test for significant difference from zero. CRR argue (p. 394) that "to control the errors-in-variables problem that arises from step c of the beta estimates obtained in step b, and to reduce the noise in individual asset returns, the securities were grouped into portfolios." They use size (total market value at the beginning of each test period) as the variable for grouping. CRR further argue that the economic variables were significant in explaining stock returns and in addition these variables are "priced" (as revealed by significant coefficients in the second-pass cross-sectional regression). Shanken and Weinstein (1990), however, argue that the CRR results are sensitive to the grouping method used and that the significance of the coefficients in the cross-sectional regression is altered if an EIV adjustment is made to the standard errors. There are two issues that arise in the CRR approach. First, when there are multiple proxies, does grouping by a single variable give consistent estimates? Since grouping by size is equivalent to the use of size as an instrumental variable, what CRR have done is to use one instrumental variable (IV). The number of IVs used should be at least equal to the number of proxies in the case of multiple proxies. The second issue is that of alternatives to the grouping methods. One can use adjusted least squares as in the L-R method discussed earlier, although there would be the problem of the resulting moment matrix not being positive definite. Shanken and Weinstein (1990) discuss adjusting only the standard errors, but adjustments should be made for both the coefficient bias and the standard errors.


3. Alternatives to the two-pass estimation method


In the estimation of the CAPM model, the errors-in-variables problem is created by using the estimated betas from the first stage as explanatory variables in a second-stage cross-section regression. Similar problems arise in the two-pass tests of the arbitrage pricing theory (APT) developed by Roll and Ross (1980), Chen (1983), Connor and Korajczyk (1988), Lehmann and Modest (1988) among others. While Gibbons' (1982) approach avoids the errors-in-variables problem introduced by a two-pass method, the methodology does not address the issue of the unobservability of the "true" market portfolio. As pointed out by Roll (1977), the test of the asset pricing model is essentially a test of whether the proxy used for the "market portfolio" is mean-variance efficient. Gibbons and Ferson (1985) argue that asset pricing models can be tested without observing the "true" market portfolio if the assumption of a constant risk premium is relaxed. This requires a model for conditional expected returns which is used to estimate ratios of betas without observing the market portfolio. The problems due to the unobservability of the market portfolio and the errors-in-variables problems can be avoided by using one-step methods where the underlying factors are treated as unobservables. We discuss models with unobservables in Section 5, and factor analysis with measurement errors in Section 9. Geweke and Zhou (1995) provide an alternative procedure for testing the APT without first estimating separately the factors or factor loadings. Their approach is Bayesian. The basic APT assumes that returns on a vector of N assets are related to k underlying factors by a factor model:
r_it = α_i + β_i1 f_1t + β_i2 f_2t + ... + β_ik f_kt + e_it ,    i = 1, ..., N ,   t = 1, ..., T ,    (9)

where α_i = E(r_it), the β_ik are the factor loadings, and e_it are idiosyncratic errors for the i-th asset during period t. This model can be written compactly, in vector notation, as
r_t = α + β f_t + e_t ,    (10)

where r_t is an N-vector of returns during period t, α and e_t are N x 1 vectors, f_t is a k x 1 vector and β is an N x k matrix. The standard assumptions of the factor model are the following:

E(f_t) = 0 ,   E(f_t f_t') = I ,   E(e_t | f_t) = 0   and   E(e_t e_t' | f_t) = Σ ,    (11)

where Σ = diag[σ_1², ..., σ_N²].

Also, e_t and f_t are independent and follow multivariate normal distributions. It has been shown that the absence of riskless arbitrage opportunities implies an approximate linear relation between the expected returns and their risk exposures. That is,


α_i ≈ λ_0 + λ_1 β_1i + ... + λ_k β_ki ,    i = 1, ..., N ,    (12)

as N → ∞, where λ_0 is the zero-beta rate and λ_k is the risk premium on the k-th factor. Shanken (1992) gives alternative approximate pricing relationships under weaker conditions. A much stronger assumption of competitive equilibrium gives the equilibrium version of the APT where the condition (12) holds as an equality. Existing studies based on the classical methods test only the equilibrium version. Geweke and Zhou (1995) argue that their approach measures the closeness of (12) directly by obtaining the posterior distribution of Q defined as
Q = (1/N) Σ_{i=1}^{N} (α_i - λ_0 - λ_1 β_1i - ... - λ_k β_ki)² .    (13)

For the equilibrium version of the APT, Q ≡ 0. Geweke and Zhou argue that inference about Q in the classical framework is extremely complicated. They use the Bayesian approach to derive the posterior distribution of Q based on priors for α, β, λ and Σ. Since the Bayesian approach involves the integration of nuisance parameters from the joint posterior distribution and since analytical integration is not possible in this case, they outline a numerical integration procedure based on Gibbs sampling. The most flexible two-pass approach is the one developed by Connor and Korajczyk (1986, 1988), which is a cross-section approach that can be applied to a large number of assets to extract the factors. By contrast the approach of Geweke and Zhou is a time-series approach and therefore has a restriction on the number of assets that can be considered (N ≤ T - k). However, the former approach ignores the EIV problem but the latter does not. Geweke and Zhou illustrate their methodology by using monthly portfolio returns grouped by industry and market capitalization. An important finding is that there is little improvement in reducing the pricing errors by including more factors beyond the first one. (See also the conclusions in Section 9 which argue in favor of fewer factors.)
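As a small illustration, the average squared pricing error of equation (13) can be evaluated for any single draw of the parameters; in the Geweke-Zhou approach this would be done at each Gibbs draw to build up the posterior of Q. The sketch below is hypothetical and assumes numpy.

```python
import numpy as np

def pricing_error_Q(alpha, beta, lam0, lam):
    """Equation (13): average squared deviation of the intercepts from the exact
    APT pricing relation.  alpha is an N-vector, beta an N x k matrix of loadings,
    lam0 the zero-beta rate and lam a k-vector of factor risk premia."""
    resid = alpha - lam0 - beta @ lam
    return np.mean(resid ** 2)
```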
4. Direct and reverse regression methods

In his 1921 paper in Metron, Gini stated that the coefficient of the error-ridden variable lies between the probability limit of the OLS coefficient and the probability limit of the "reverse" regression estimate of the same coefficient. This result, which has also been derived in Frisch (1934), does not carry over to the multiple regression case in general. This generalization, due to Koopmans (1937), is discussed, with a new proof, in Bekker et al. (1985). Apart from Koopmans' proof, later proofs have been given by Kalman (1982) and Klepper and Leamer (1984). It has also been extended to equation systems by Leamer (1987). All these results require that the measurement errors be uncorrelated with the equation errors. This assumption is not valid in many applications. Erickson
(1993) derives the implications of placing upper and lower bounds on this correlation in a multiple regression model with exactly one mis-measured regressor. Other extensions of the bounds literature are those by Krasker and Pratt (1986), who use a prior lower bound on the correlation between the proxy and the true regressor, and Bekker et al. (1987), who use as their prior input an upper bound on the covariance matrix of the errors. Iwata (1992) considers a different problem: the case where instrumental variables are correlated with errors. In this case, the instrumental variable method does not give consistent estimates, but Iwata shows that tighter bounds can be found if one has prior information restricting the extent of the correlation between the instrumental variables and the regression equation errors. In the financial literature the effect of correlated errors has been discussed in Booth and Smith (1985). They consider the case where the errors and the systematic parts of both y and x are correlated (all other error correlations are assumed to be zero). They also give arguments as to why allowing for these correlations is important. This analysis has been applied by Rahman, Fabozzi and Lee (1991) to judge the performance measurement of mutual fund shares, which depends on the intercept term in the capital asset pricing model. They derive upper and lower bounds for the constant term using direct and reverse regressions. These results on performance measurement are based on the CAPM. There is, however, discussion in the financial literature of performance measurement based on the APT (arbitrage pricing theory), which is a multiple-index/factor model. See Connor and Korajczyk (1986, 1994). In this case, the bounds on performance measurement are difficult to derive. The results by Klepper and Leamer (1984) can be used but they will be based on the restrictive assumption that the errors and systematic parts are uncorrelated (an assumption relaxed in the paper by Booth and Smith). The relaxation of this assumption is important, as argued in Booth and Smith.
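For the single mis-measured regressor case, the direct and reverse regression bounds can be computed directly from the data. A minimal sketch, assuming numpy, uncorrelated measurement and equation errors, and a positive true slope:

```python
import numpy as np

def slope_bounds(x, y):
    """Direct and reverse regression slopes of y on a single error-ridden
    regressor x.  Under the classical assumptions the true slope lies between
    the direct OLS slope and the inverse of the slope from regressing x on y."""
    cov_xy = np.cov(x, y)[0, 1]
    direct = cov_xy / np.var(x, ddof=1)     # attenuated toward zero
    reverse = np.var(y, ddof=1) / cov_xy    # biased away from zero
    return direct, reverse
```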
5. Latent variables / structural equation models with measurement errors and MIMIC models
5.1. Multiple indicator models

Many models in finance are formulated in terms of theoretical or hypothetical concepts or latent variables which are not directly observable or measurable. However, often several indicators or proxies are available for these unobserved variables. The indicator or proxy variables can be considered as measuring the unobservable variable with measurement errors. Therefore, the use of these indicator variables directly as a regressor variable in a regression model would lead to errors-in-variables problems. However, if a single unobservable (or latent) variable occurs in different equations as an explanatory variable (multiple indicators of a latent variable), then one can get (under some identifiability conditions) consistent estimates of the coefficients of the unobserved variable. These models are discussed in Zellner (1970), Goldberger (1972), Griliches (1974),
Joreskog and Goldberger (1975), and popularized by the LISREL program of Joreskog and Sorbom (1989, 1993).¹ Although many problems in finance fall in this category, there are not many applications of these models in finance. Notable exceptions in corporate finance are the models estimated by Titman and Wessels (1988), Maddala and Nimalendran (1995), and Desai, Nimalendran and Venkataraman (1995). Titman and Wessels (TW) investigate the determinants of corporate capital structure in terms of unobserved attributes for which they have indicators or proxies which are measured with error. The model consists of two parts: a measurement model and a structural model, which are jointly estimated. In the measurement model, the errors in the proxy variables (e.g. accounting and market data) used for the unobservable attributes are explicitly modeled as follows:

X = ΛZ + δ ,    (14)

where X (q x 1) is a vector of proxy variables, Z (m x 1) is a vector of unobservable attributes, Λ (q x m) is a matrix of coefficients, and δ (q x 1) is a vector of errors. In the above measurement model, the observed proxy variables are expressed as a linear combination of one or more attributes and a random measurement error. The structural model consists of the relationship between different measures of capital structure (short term debt/equity, long term debt/equity etc.), Y (p x 1), and the unobservable attributes Z. The model is specified as follows, where ε is a vector of errors:

Y = ΓZ + ε .    (15)

Equations (14) and (15) are estimated jointly using the maximum likelihood technique (estimation techniques are described later in this section). TW estimate the model for 15 proxy variables, 8 attributes and 3 different capital structure variables. In order to identify the model, additional restrictions are placed. In particular, it is assumed that the errors are uncorrelated, and 105 of the elements of the coefficient matrix are constrained to be zero. The principal advantage of the above model over traditional regression models is that it explicitly models the errors in the proxy variables. Further, if the model is identified then it can be estimated by full information maximum likelihood (FIML), which gives consistent and asymptotically efficient estimates under certain regularity conditions. Maddala and Nimalendran [MN] (1995) employ an unobserved components panel data model to estimate the effects of unexpected earnings on the change in price, the change in bid-ask spreads and the change in trading volume. Traditionally, the unexpected earnings (actual minus analysts' forecast), ΔE, are employed as a regressor in a regression model to explain the changes in spreads (ΔS) or changes in volume (ΔV).²

¹ These models have also been discussed extensively under the titles: linear structural models with measurement errors, analysis of covariance structures, path analysis, causal models and content variable models. Bentler and Bonett (1980) and Bollen (1989) provide excellent introductions to the subject.
² Morse and Ushman (1983) examined a sample of OTC (Over the Counter) firms and found no evidence of change in the spread around earnings announcements. Skinner (1991), using a sample of NASDAQ firms, found only weak evidence of an increase in spread prior to an earnings announcement. Skinner used the change in price around the earnings announcement as a proxy for the forecast errors.


However, the unexpected earnings are error-ridden proxies for the true unexpected earnings. Therefore, the estimates and the standard errors suffer from all the problems associated with errors in variables. MN employ an unobserved components model to obtain consistent estimates of the coefficients on the unobserved variable and consistent standard errors. In the 3-equation model they consider, it is assumed that the absolute value of the change in price |ΔP|, the change in spread ΔS, and the change in volume ΔV are three indicator variables of the unobserved absolute value of the true unexpected earnings |ΔE*|. The specification of the model is
|ΔP| = α_0 + α_1 |ΔE*| + e_1 ,
 ΔS  = β_0 + β_1 |ΔE*| + e_2 ,    (16)
 ΔV  = γ_0 + γ_1 |ΔE*| + e_3 ,

where it is assumed that the errors e_l, l = 1, 2, 3, are uncorrelated and they are also uncorrelated with the unobserved variable |ΔE*|. Then the covariance matrix of the observed variables implied by the model is given by
Σ = [ α_1²σ_e² + σ_1²     α_1β_1σ_e² + σ_12     α_1γ_1σ_e² + σ_13
             ·             β_1²σ_e² + σ_2²       β_1γ_1σ_e² + σ_23
             ·                    ·               γ_1²σ_e² + σ_3²  ] ,    (17)

where σ_ij = cov(e_i, e_j), i, j = 1, 2, 3, and σ_e² = Var(ΔE*). Since the sample estimates of the variance-covariance matrix are consistent estimates of the population parameters, one can estimate the parameters α_1, β_1, γ_1, σ_1², σ_2², σ_3², and σ_e² by setting the sample estimates equal to the population variance-covariance elements. However, there are seven unknown parameters and only six pieces of sample information. Therefore the system is under-identified and only β_1/α_1 and γ_1/α_1 are estimable; the parameters α_1, β_1, and γ_1 are not separately estimable. Among the variances, σ_1², σ_2² and σ_3² are estimable, and so is α_1²σ_e². Let the variance-covariance matrix based on sample data be given by

S = Var( |ΔP|, ΔS, ΔV )' = [ s_11   s_12   s_13
                                ·    s_22   s_23
                                ·      ·    s_33 ] .    (18)

Then consistent estimates for the parameters are given by:

β̂_1/α̂_1 = s_23/s_13 ,    γ̂_1/α̂_1 = s_23/s_12 ,    α̂_1² σ̂_e² = s_12 s_13 / s_23 ,
σ̂_1² = s_11 - α̂_1² σ̂_e² ,    σ̂_2² = s_22 - (β̂_1/α̂_1)² α̂_1² σ̂_e² ,    σ̂_3² = s_33 - (γ̂_1/α̂_1)² α̂_1² σ̂_e² .    (19)

It should also be noted that the model described by equations (16) can be written as:

ΔS = β_0* + (β_1/α_1) |ΔP| + e_2* ,
ΔV = γ_0* + (γ_1/α_1) |ΔP| + e_3* ,    (20)

where β_0* = β_0 - (β_1/α_1)α_0 and e_2* = e_2 - (β_1/α_1)e_1 ,
with γ_0* and e_3* defined similarly. From equations (19) and (20), it is easy to see that β̂_1/α̂_1 is the IV (instrumental variable) estimator from the regression of ΔS on |ΔP| using ΔV as an instrumental variable, and γ̂_1/α̂_1 is the IV estimator from the regression of ΔV on |ΔP| using ΔS as an instrumental variable. The above model shows that it is not necessary to observe the unobservable variable to estimate the parameters of the model. The sample moments contain sufficient information to identify the structural parameters. Also, since the above model is exactly identified, the method-of-moments estimators are also the maximum likelihood estimates under the normality assumption, with all their desirable properties. The above model gives estimates of the effects of unexpected earnings on the other variables that are free of the errors-in-variables bias involved in studies that use |ΔE| or |ΔP| as a proxy for |ΔE*|. MN find that errors-in-variables can result in substantial biases in OLS estimates leading to incorrect inferences. Maddala and Nimalendran (1995) also estimate a 4-equation model in which the absolute value of the unexpected earnings (|ΔE|) is used as an additional proxy. When there are more than 3 indicator variables, the model is over-identified (assuming that the errors are mutually uncorrelated and they are uncorrelated with the latent variable). That is, there are more unique sample pieces of information than unknown parameters. If there are N indicators then there are N(N + 1)/2 sample moments (variances and covariances) but there are only 2N unknown parameters. The additional information allows one to estimate additional parameters such as some of the covariances between the error terms. More importantly, MN use the panel data structure (quarterly earnings for a cross-section of firms) to obtain within-group and between-group estimates that provide information about the short term and long term effects of earnings surprises on microstructure variables.
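A minimal sketch of the exactly identified three-indicator calculation, assuming numpy: the input is the 3 x 3 sample covariance matrix of (|ΔP|, ΔS, ΔV) in that order, and the outputs correspond to equation (19); the function name is illustrative only.

```python
import numpy as np

def three_indicator_estimates(S):
    """Method-of-moments estimates of equation (19).  Rows/columns of S are
    ordered as (|dP|, dS, dV).  Only the ratios beta1/alpha1 and gamma1/alpha1
    and the product alpha1^2 * Var(|dE*|) are identified."""
    S = np.asarray(S, dtype=float)
    s11, s12, s13 = S[0, 0], S[0, 1], S[0, 2]
    s22, s23, s33 = S[1, 1], S[1, 2], S[2, 2]
    b1_over_a1 = s23 / s13            # also the IV estimator using dV as instrument
    g1_over_a1 = s23 / s12            # also the IV estimator using dS as instrument
    a1sq_var_e = s12 * s13 / s23      # alpha1^2 * sigma_e^2
    sig1 = s11 - a1sq_var_e
    sig2 = s22 - b1_over_a1 ** 2 * a1sq_var_e
    sig3 = s33 - g1_over_a1 ** 2 * a1sq_var_e
    return b1_over_a1, g1_over_a1, a1sq_var_e, sig1, sig2, sig3
```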

5.2. Testing signaling models


The study of the relationship between signals and markets' response to them is an important area of financial research. In these models it is argued that managers with private information employ indicators such as dividends, earnings, splits,
capital structure etc. to convey their private information to the market. In testing these models one has to realize that the indicators are only "error ridden" proxies for the "true" underlying attribute being signaled. Therefore, the latent variable/structural equation models would be more suitable compared to the traditional regression models. Israel, Ofer and Siegel (1990) discuss several studies that use changes in equity value as a measure of the information content of an event (earnings announcement, dividend announcement, etc.) and use this as an explanatory variable in other equations. See, for instance, Ofer and Siegel (1987). All these studies test the null hypothesis that there is no information content about earnings embodied in a given announcement, by testing for a zero coefficient on the change in equity value ΔP. Israel et al. assume that ΔP is a noisy measure of the true information content ΔP*, and they investigate the power of standard tests of hypotheses by simulation for given values of the slope coefficient, and the ratio of the error variance to Var(ΔP). The information in dividend announcements above that in earnings data, and whether such announcements lead to subsequent changes in earnings estimates, have been studied inter alia in Aharony and Swary (1980) and Ofer and Siegel (1987). Ofer and Siegel use the change in equity value surrounding the dividend announcement as a proxy for the information content and use this as an explanatory variable in the dividend change equation. However, a more reasonable model to estimate, one that is free of the errors-in-variables bias, is to treat the information content as an unobserved signal and use the change in equity value, unexpected dividends, and the change in expected earnings as functions of the unobserved signal. This is illustrated in the paper by Desai, Nimalendran and Venkataraman [DNV] (1995). DNV estimate a latent variable/structural equation model to examine the information conveyed by stock splits which are announced contemporaneously with dividends. They also examine whether dividends and stock splits convey a single piece of information or whether they provide information about more than a single attribute. Their analysis shows that dividends and splits convey information about two attributes, and more importantly the latent variable approach gives unbiased and asymptotically efficient estimators. Several recent papers in the area of signaling have argued that management may use a combination of signals to reduce the cost of signaling. It is also possible that management can signal in a sequential manner using insider trading and cash dividends (see for example John and Mishra (1990) and the references in it). Many of the signals used by management are changes in dividends, stock splits, stock repurchases, investment and financial policies, insider trading and so on. In testing these models one has to measure the price reaction around the announcement date and also estimate the unexpected component of the signal used (such as the unexpected component of a dividend change). Generally, simple models, such as setting the expected dividend equal to the past dividend, are used. These naive models can lead to substantial errors.

5.3. MIMIC models

If there are multiple indicators and multiple causes, then these models are called MIMIC models (Joreskog and Goldberger (1975)). Note that the multiple indicators of a single or multiple latent variables model is a special case of the MIMIC model. The structural form is

Y = Λz* + ε ,
z* = λ'X + v ,    (21)

where Y (m x 1) represents the vector of indicator variables, z* is unobservable and is related to several causes given by the vector X (k x 1), and λ (k x 1) is a vector of parameters. A potential application of the above model in financial research involves the effects of trading mechanisms (or information disclosure) on liquidity and the cost of trading. One function of a stock market is to provide liquidity. Several theoretical and empirical papers have addressed this issue (see for example Grossman and Miller (1988), Amihud and Mendelson (1986), Christie and Huang (1994)). The effect of market structure on liquidity is generally examined by analyzing the change in spreads (effective or quoted) associated with stocks that move from one market to another (as in Christie and Huang (1994)). However, spread is only one of several proxies that measure liquidity (other proxies are volume of trade, market depth, number of trades, time between trades etc.). More important, there could be several causes driving a stock's liquidity; these include: an optimum price, the trading mechanism, the frequency and type of information, the type of investors, and the type of underlying assets or investment opportunities of the firm. Given multiple indicators and multiple causes, a MIMIC model is more suitable to evaluate the effects of trading mechanism and market structure on liquidity.
5.4. Limitations with MIMIC/latent variable models

5.4.1. Problem of poor proxies and choice of proxies
There are several limitations of the latent variable or MIMIC models. Since the model formulation amounts to using the proxies as instrumental variables in the equations other than the one in which they occur, the problem of poor proxies is related to the problem of poor instrumental variables, on which there is now considerable literature. Therefore the problems associated with the use of poor instruments suggest that caution should be exercised in employing too many indicators. For instance, Titman and Wessels (1988) use 15 indicators and impose 105 restrictions on the coefficient matrix. The problems arising from poor instruments are not likely to be revealed when one includes every conceivable indicator variable in the model. Very often there are several proxy variables available for the same unobserved variable. For instance, Datar (1994) investigates the effect of 'liquidity' on equity returns. He considers two proxies for liquidity: volume of trading, and size (market value). Apart from the shortcoming that his analysis is based on size-based and volume-based grouping (which amounts to using the proxy variables as
instrumental variables), he argues for the choice of volume as the preferred proxy for liquidity based on conventional t-statistics. The problem of choosing between different proxy variables cannot be handled within the framework of conventional analysis. A recent paper by Zabel (1994) analyzes this problem within the framework of likelihood ratio tests for non-nested hypotheses. However, instead of formulating the problem as a choice between different proxies, it would be advisable to investigate how best to use all the proxies to analyze the effect of, say, "liquidity" on stock returns. This can be accomplished by using the MIMIC model (or multiple indicator model) approach. Standard asymptotic theory leads us to expect that a weak instrument will result in a large standard error, thus informing us that there is not much information in that variable. However, in small samples a weak instrument can produce a small standard error and a large t-statistic which can be spurious. Dufour (1994) argues that confidence intervals based on asymptotic theory have zero probability coverage in the weak instrument case. The question of how to detect weak instruments in the presence of several instruments is an unresolved issue. There are studies, such as Hall, Rudenbusch and Wilcox (1994), that discuss this, but that study also relies on an asymptotic test. Jeong (1994) suggests alternative criteria based on an exact distribution. Thus the issue of which indicators to use and which to discard in MIMIC models needs further investigation. It might often be the case that there are some strong theoretical reasons in favor of some indicators and these anyhow need to be included (as done in the study by DNV).
5.4.2. Violation of assumptions
The second important limitation arises from the assumption that the errors are uncorrelated with the systematic component and among themselves. In the multiple indicator models, some of the correlations among the errors or between the errors and the systematic parts may be introduced only if the number of indicators is more than three. The third problem arises from possible non-normality of the errors. In this case the estimates are still consistent, but the standard errors and other test statistics are not valid. Browne (1984) suggests a weighted least squares (WLS) approach which is asymptotically efficient, and provides the correct standard errors and test statistics under general distributional assumptions. Finally, there is the question of small-sample performance of the different tests based on the latent variable model and FIML.

5.5. Estimation

All the models described in this section can be estimated by FIML. See Aigner and Goldberger (1977), Aigner, Hsiao, Kapteyn and Wansbeek (1984), and Bollen (1989). The FIML approach provides an estimator that is consistent, asymptotically efficient, scale invariant, and scale free. Further, through the Hessian matrix one can obtain standard errors for the parameter estimates. However, these standard errors are consistent only under the assumption that the
observed variables are multivariate normal. If the observed variables have significant excess kurtosis, the asymptotic covariance matrix, standard errors, and the χ² statistic (for model evaluation) based on the estimator are incorrect (even though the estimator is still consistent). Under these conditions, the correct standard errors and test statistics can be obtained by using the asymptotically distribution free WLS estimators suggested by Browne (1984). The FIML estimates for the model are obtained by maximizing the following likelihood function:
L(θ) = constant - (N/2) [ log|Σ(θ)| + tr( S Σ⁻¹(θ) ) ] ,    (22)

where S is the sample variance-covariance matrix for the observed variables, Σ(θ) is the covariance matrix implied by the model, and N is the number of observations. Several statistical packages including LISREL and SAS provide FIML estimates and their standard errors. LISREL also provides the asymptotically distribution free WLS estimates.
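A minimal sketch of what such a package does, assuming numpy and scipy: minimize the discrepancy implied by equation (22) over θ, given a user-supplied map from θ to the model-implied covariance matrix Σ(θ). The function and argument names are illustrative; identification checks, standard errors and fit statistics are omitted.

```python
import numpy as np
from scipy.optimize import minimize

def fit_covariance_structure(S, sigma_of_theta, theta0):
    """Minimize F(theta) = log|Sigma(theta)| + tr(S Sigma(theta)^{-1}), which is
    equivalent to maximizing the likelihood in equation (22) up to constants."""
    def discrepancy(theta):
        Sigma = sigma_of_theta(theta)
        sign, logdet = np.linalg.slogdet(Sigma)
        if sign > 0:
            return logdet + np.trace(S @ np.linalg.inv(Sigma))
        return np.inf   # keep the search away from non-positive-definite Sigma(theta)
    return minimize(discrepancy, theta0, method="Nelder-Mead")
```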

6. Artificial neural networks (ANN) as alternatives to MIMIC models

One other limitation of the models considered in the previous section is the assumption of linearity in the relationships. The artificial neural network (ANN) approach is similar in structure to the MIMIC models (apart from differences in terminology) but allows for unspecified forms of non-linearity. In the ANN terminology the input layer corresponds to the causes in the MIMIC models, and the middle or hidden layer corresponds to the unobservables. In principle, the model can consist of several hidden or middle layers but in practice there is only one hidden layer. The ANN models were proposed by cognitive scientists as flexible non-linear models inspired by certain features of the way the human brain processes information. These models have only recently received attention from statisticians and econometricians. Cheng and Titterington (1994) provide a statistical perspective and Kuan and White (1994) provide an econometric perspective. An introduction to the computational aspects of these models can be found in Hertz et al. (1991) and the relationship between neural networks and non-linear least squares in Angus (1989). The ANN is just a kind of black box with very little said about the nature of the non-linear relationships. Because of their simplicity and flexibility and because they have been shown to have some success compared with linear models, they have been used in several financial applications for the purpose of forecasting. See Trippi and Turban (1993), Kuan and White (1994) and Hutchinson, Lo and Poggio (1994). Apart from the linear vs. nonlinear difference, another major difference is that the MIMIC models have a structural interpretation, but the ANN models do not. However, for forecasting purposes detailed specifications of the structure may not be important. There is considerable discussion about identification in the case of ANN, but the whole emphasis is on approximation and forecasting with a black box. Hornik, Stinchcombe and White (1990), for
instance, show that multilayer feedforward networks with a single hidden layer can approximate the derivatives of an arbitrary non-linear mapping arbitrarily well as the number of hidden units increases. Most of the papers on ANN appear in the journal Neural Networks. However, not much work has been done on comparing the MIMIC models discussed in the previous section with ANN models (with the exception of Qi, 1995).
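For concreteness, a single-hidden-layer forward pass can be written in a few lines; the sketch below is hypothetical (the weights are arbitrary placeholders rather than fitted values) and assumes numpy.

```python
import numpy as np

def single_hidden_layer(X, W1, b1, W2, b2):
    """One-hidden-layer network: the columns of X play the role of the MIMIC
    'causes', the hidden units the role of the unobservables, and the outputs
    the role of the indicators -- but the mapping is an unspecified nonlinearity
    rather than a linear structural relation."""
    H = np.tanh(X @ W1 + b1)    # hidden layer activations
    return H @ W2 + b2          # output layer (linear)
```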

7. Signal extraction methods and tests for rationality


The signal extraction problem is that of predicting the true values of the error-ridden variables. In the statistical literature this problem has been investigated by Fuller (1990). In the finance literature the problem has been discussed by Orazem and Falk (1989). The set-up of the two models is, however, different. This problem can be analyzed within the context of the MIMIC models discussed in the previous section. Consider, for instance, the problem analyzed by Maddala and Nimalendran (1995). Suppose we now have a proxy ΔE for ΔE* which can be described by the equation

ΔE = ΔE* + e_4 ,    (23)

where ΔE is the unanticipated earnings from, say, the IBES survey. The estimation of the MIMIC model considered in the previous section gives us an estimate of Var(ΔE*). The signal extraction approach gives us an estimate of ΔE* as

ΔÊ* = λ(ΔE) ,   where   λ = Var(ΔE*)/Var(ΔE) .    (24)

Thus, if we have a noisy measure of ΔE*, then this, in conjunction with the other equations in which ΔE* occurs as an explanatory variable, enables us to get an estimate of λ. This method can also be used to test the rationality of earnings forecasts (say those from the IBES survey). For an illustration of this approach see Jeong and Maddala (1991).
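A minimal sketch of the extraction step in equation (24), assuming numpy and treating ΔE as measured in deviations from its mean; the estimate of Var(ΔE*) would come from the unobserved-components estimates of Section 5, and the function name is illustrative only.

```python
import numpy as np

def extract_signal(dE, var_dE_star):
    """Shrink the noisy proxy toward zero by lambda = Var(dE*)/Var(dE),
    as in equation (24).  dE is assumed to be expressed as deviations
    from its sample mean."""
    lam = var_dE_star / np.var(dE)
    return lam * dE
```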

8. Qualitative and limited dependent variable models


Qualitative variable models and limited dependent variable models also fall in the category of unobserved variable models. However, in these cases there is partial observability (the variable is observed in a range or in a qualitative fashion). The unobserved variable models discussed in the previous section are of a different category. There is, however, a need to combine the two approaches in the analysis of event studies. For instance, in the signaling models, there are different categories of signals: dividends, stock splits, stock repurchases, etc. In connection with these models there are two questions: whether or not to signal, and how best to signal. When considering the information content of different announcements
(say, a dividend change or a stock split), it is customary to consider only the firms that have made these signals. But given that signaling is an endogenous event (the firm has decided to signal), there is a selection bias problem in the computation of abnormal returns computed at the time of the announcement (during the period of the announcement window). There are studies such as McNichols and Dravid (1990) that consider a matched sample and analyze the determinants of dividends and stock splits. However, the computation of abnormal returns does not make any allowance for the endogeneity of the signals. In addition, there are some conceptual problems involved with the "matched sample" method almost universally used in financial research of this kind. The problem here is the following. Suppose we are investigating the determinants of dividends. We have firms that pay dividends and we get a "matched sample" of firms that do not pay dividends. The match is based on some attribute X that is common to both. Usually the variable X is also used as an explanatory variable in a (logit) model to explain the determinants of dividends. If we have a perfect match, then we have the situation that one firm with the value of X has paid a dividend, and another with the same value of X has not. Obviously, X cannot explain the determinants of dividends. The determinants of dividend payments must be some other variables besides the ones that we use to get matched samples. The LISREL program can deal with ordinal and censored variables besides continuous variables. However, combining MIMIC models with selection bias in the more relevant financial applications, as in the example of McNichols and Dravid (1990), is more complicated if we allow for endogeneity of the signals. It is, however, true that the self-selection model has as its reduced form a censored regression model. Thus the LISREL program can be used to account for selection bias in its reduced form. But the estimation of MIMIC models with selection bias in the structural form needs further work.

9. Factor analysis with measurement errors

In the econometric testing of the APT (arbitrage pricing theory) many investigators have suggested that the unobserved factors might be equated with observed macroeconomic variables. See inter alia Chen, Roll and Ross (1986); Chan, Chen and Hsieh (1985); and Conway and Reinganum (1988). The papers using observed variables to represent the factors treat these variables as accurate measures of a linear transformation of the underlying factors so that the regression coefficients are estimates of the factor loadings. However, these observed macroeconomic variables are only proxies which at best measure the factors subject to errors of measurement. Cragg and Donald (1992) develop a framework for testing the APT considering the fact that the factors are measured with error. They apply this technique to monthly returns over the period 1971-90 (inclusive) for 60 companies selected at random from the CRSP tape. They consider 18 macroeconomic
variables but found that they represent only four or five factors. The method they used, as outlined in Cragg and Donald (1995), is based on the GLS approach to factor analysis, which is an extension of earlier work by Joreskog and Goldberger (1972) and Dahm and Fuller (1986). Cragg and Donald argue that there is no way of estimating the underlying factors in an APT model without measurement error. In particular this holds for macroeconomic variables that are possible proxies. However, as argued in the previous sections, an alternative method to handle the measurement error problem is to use the unobserved components model where the macroeconomic variables (used as proxies) are treated as indicators of unobserved factors. The LISREL program can be used to estimate this model. Tests of the APT can be conducted within this framework as well, and they will be free of the errors-in-variables problem. The LISREL program handles both the GLS and ML estimation methods. However, the MIMIC models impose more structure than the Cragg-Donald approach. A comparison of the two approaches, the multiple indicator approach and the approach of factor analysis with measurement errors, is a topic for further research.

10. Conclusion

This article surveys several problems in financial models caused by errors-in-variables and the use of proxies. In addition, the article also examines alternative models and techniques that can be employed to mitigate the problems due to errors-in-variables. As noted in the different places, several important gaps exist in the financial literature. First, many models in finance use grouping methods to mitigate errors-in-variables problems. This approach can be viewed as the use of instrumental variable (IV) methods. Therefore, it is appropriate to make use of the recent econometrics literature on instrumental variables, which discusses the problem of poor instruments, judging instrument relevance, and choice among several instruments. Second, since the use of proxy variables for unobservables is also very pervasive, use can be made of the vast econometrics literature on latent and unobservable variables. For instance, MIMIC models are not used as often as they should be. Also, the interrelationships and comparative performance of MIMIC models, ANN models and factor analytic models with measurement errors need to be studied.

References

Aharony, J. and I. Swary (1980). Quarterly dividend and earnings announcements and stockholders' returns: An empirical analysis. J. Finance 35, 1-12.
Aigner, D. J. and A. S. Goldberger, eds. (1977). Latent Variables in Socio-Economic Models. North Holland, Amsterdam.
Aigner, D. J., C. Hsiao, A. Kapteyn and T. Wansbeek (1984). Latent variable models in econometrics. In: Z. Griliches and M. D. Intrilligator, eds., Handbook of Econometrics, Vol. II. North Holland, 1321-1393.
Amihud, A. R. and H. Mendelson (1986). Asset pricing and the bid-ask spread. J. Financ. Econom. 17, 223-249.
Angus, J. E. (1989). On the connection between neural network learning and multivariate non-linear least squares estimation. Neural Networks 1, 42-47.
Banz, R. (1981). The relations between returns and market values of common stocks. J. Financ. Econom. 9, 3-18.
Bekker, P., A. Kapteyn and T. Wansbeek (1985). Errors in variables in econometrics: New developments and recurrent themes. Statistica Neerlandica 39, 129-141.
Bentler, P. M. and D. G. Bonett (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88, 588-606.
Black, F., M. C. Jensen and M. Scholes (1972). The capital asset pricing model: Some empirical tests. In: M. Jensen, ed., Studies in the Theory of Capital Markets. Praeger, New York, 79-121.
Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley, New York.
Booth, J. R. and R. L. Smith (1985). The application of errors-in-variables methodology to capital market research: Evidence on the small-firm effect. J. Financ. Quant. Anal. 20, 501-515.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. Brit. J. Math. Statist. Psych. 37, 62-83.
Chamberlain, G. and A. S. Goldberger (1990). Latent variables in econometrics. J. Econom. Perspectives 4, 125-152.
Chan, K. C., N. F. Chen and D. A. Hsieh (1985). An exploratory investigation of the firm size effect. J. Financ. Econom. 14, 451-471.
Chen, N. F., R. Roll and S. A. Ross (1986). Economic forces and the stock market. J. Business 59, 383-403.
Cheng, B. and D. M. Titterington (1994). Neural networks: A review from the statistical perspective (with discussion). Statist. Sci. 9, 2-54.
Chen, N. (1983). Some empirical tests of the theory of arbitrage pricing. J. Finance 38, 1393-1414.
Christie, W. G. and R. D. Huang (1994). Market structures and liquidity: A transactions data study of exchange listings. J. Financ. Intermed. 3, 300-326.
Connor, G. and R. A. Korajczyk (1986). Performance measurement with the arbitrage pricing theory. J. Financ. Econom. 15, 373-394.
Connor, G. and R. A. Korajczyk (1988). Risk and return in an equilibrium APT: An application of a new methodology. J. Financ. Econom. 21, 255-289.
Connor, G. and R. A. Korajczyk (1994). Arbitrage pricing theory. In: R. Jarrow, V. Maksimovic and W. T. Ziemba, eds., The Finance Handbook. North Holland Publishing Co.
Conway, D. A. and M. C. Reinganum (1988). Stable factors in security returns: Identification using cross-validation. J. Business Econom. Statist. 6, 1-15.
Cragg, J. G. and S. G. Donald (1992). Testing and determining arbitrage pricing structure from regressions on macro variables. University of British Columbia, Discussion paper #14.
Cragg, J. G. and S. G. Donald (1995). Factor analysis under more general conditions with reference to heteroskedasticity of unknown form. In: G. S. Maddala, P. C. B. Phillips and T. N. Srinivasan, eds., Advances in Econometrics and Quantitative Economics: Essays in Honor of C. R. Rao. Blackwell.
Datar, V. (1994). Value of liquidity in financial markets. Unpublished Ph.D. dissertation, University of Florida.
Desai, A. S., M. Nimalendran and S. Venkataraman (1995). Inferring the information conveyed by multiple signals using latent variables/structural equation models. Manuscript, University of Florida, Department of Finance, Insurance and Real Estate.
Dahm, P. F. and W. A. Fuller (1986). Generalized least squares estimation of the functional multivariate linear errors in variables model. J. Multivar. Anal. 19, 132-141.
Douglas, G. W. (1969). Risk in the equity markets: An empirical appraisal of market efficiency. Yale Economic Essays 9, 3-45.
Dufour, J. M. (1994). Some impossibility theorems in econometrics with applications to instrumental variables, dynamic models and cointegration. Paper presented at the Econometric Society European Meetings, Maastricht.
Erickson, T. (1993). Restricting regression slopes in the errors-in-variables model by bounding the error correlation. Econometrica 61, 959-969.





G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.


Financial Applications of Artificial Neural Networks

Min Qi

1. Introduction

Data-driven modeling approaches, such as Artificial Neural Networks (ANNs), are becoming more and more popular in financial applications. Broadly speaking, ANNs are nonlinear nonparametric models. They allow one to utilize the data fully and let the data determine the structure and parameters of a model without restrictive parametric modeling assumptions. They are appealing in the financial area because of the abundance of high-quality financial data and the paucity of testable financial models. As the speed of computers increases and the cost of computing declines exponentially, this computer-intensive method becomes increasingly attractive. The present paper first outlines ANNs in Section 2, and briefly points out their relation to some of the traditional statistical methods in Section 3. Section 4 provides some useful ANN modeling methodologies, Section 5 reviews empirical studies in several major areas of financial application, and Section 6 presents the conclusions.

2. Artificial neural networks

The past decade has seen an explosive growth in studies of neural networks, after three consecutive cycles of enthusiasm and skepticism since the 1940's. This has been brought about largely by the realization that ANNs have powerful pattern recognition properties that may outperform other existing modeling techniques in many applications. ANNs have attracted the attention of researchers from a diverse range of application fields, including signal processing, medical imaging, and economic and financial modeling, to name only a few. Meanwhile, researchers from cognitive science, neuroscience, psychology, biology, computer science, mathematics, physics and statistics have contributed to the structural and methodological development of ANNs. Many different networks, such as multilayer feedforward networks, recurrent and statistical networks, associative memory networks and self-organization networks, have thus been developed for different purposes. A variety of supervised and unsupervised learning rules are now available to train a


network from data. Among these, the multilayer feedforward backpropagation network is the most popular one in financial applications and is the focus of the present paper. Wide-ranging introductions to neural network theory can be found in Hecht-Nielsen (1990), Hertz, Krogh and Palmer (1991), Wasserman (1993) and Bose and Liang (1996). White, Gallant, Hornik, Stinchcombe and Wooldridge (1992) present a collection of papers that carry out mathematical analyses of the approximation and learning abilities of ANNs, for readers who are familiar with neural networks or mathematical statistics. Gately (1996) provides a very nontechnical, step-by-step approach to neural network applications for beginners.

2.1. ANN structure
Inspired by studies of the brain and nervous system, neural networks simulate a highly interconnected, parallel computational structure built from many relatively simple individual units. The units are organized in layers: the input, middle and output layers. Feedforward networks map inputs into outputs with signals flowing in one direction only, from the input layer to the middle layer and then to the output layer. Each unit in the middle and output layers has a transfer function which transforms the signal it receives. The input layer units do not have a transfer function; they simply distribute the input signals to the network. Each connection has a numerical weight, which modifies the signals that pass through it. Consider a three-layer feedforward network with a single output unit, k middle layer units and n input units (see Figure 1). The input layer can be represented by a vector X = (x_1, x_2, ..., x_n)', the middle layer by a vector M = (m_1, m_2, ..., m_k)', and y is the output. Any middle layer unit receives the weighted sum of all inputs and a bias term (denoted by x_0, which always equals one), and produces an output signal

m_j = F\left(\sum_{i=0}^{n} \beta_{ij} x_i\right), \qquad j = 1, 2, \ldots, k, \quad i = 0, 1, 2, \ldots, n, \qquad (2.1)

[Figure: output layer y; weight vector α = (α_0, α_1, ..., α_k)'; middle layer m = (m_0, m_1, ..., m_k)'; weight matrix β; input layer X = (x_0, x_1, ..., x_n)'; bias units x_0 and m_0.]

Fig. 1. A three-layer feedforward neural network


where F is the transfer function, x_i is the ith input signal, and β_ij is the weight of the connection from the ith input unit to the jth middle layer unit. In the same way, the output unit receives the weighted sum of the output signals of the middle layer units, and produces a signal

y = G\left(\sum_{j=0}^{k} \alpha_j m_j\right), \qquad j = 0, 1, 2, \ldots, k, \qquad (2.2)

where G is the transfer function, α_j is the weight of the connection from the jth middle layer unit to the output unit, and j = 0 indexes a bias unit m_0 which always equals one. Substituting (2.1) into (2.2), we get

y = G\left(\alpha_0 + \sum_{j=1}^{k} \alpha_j F\left(\sum_{i=0}^{n} \beta_{ij} x_i\right)\right) = f(X, \theta), \qquad (2.3)

where X is the vector of inputs, and θ = (α_0, α_1, α_2, ..., α_k, β_01, β_02, ..., β_0k, β_11, β_12, ..., β_1k, ..., β_n1, β_n2, ..., β_nk)' is the vector of network weights. F and G can take several functional forms, such as the threshold function, which produces binary (±1) or (0/1) output; the sigmoid (or logistic) function, which produces output between 0 and 1, F(a) = G(a) = 1/(1 + exp(-a)); or F(a) = a (identity) and G(a) = 1/(1 + exp(-a)). (2.3) can be interpreted as a nonlinear function which represents the described three-layer feedforward neural network. As will be shown in Section 3, this representation nests many familiar statistical models, such as regression (linear and nonlinear), classification (logit, probit), latent variable models (MIMIC), principal component analysis, and time series analysis (ARMA, GARCH). The basic ANN structure represented by (2.3) can be generalized in many different ways. For example, Poli and Jones (1994) introduce a multilayer feedforward ANN with observation noise and random connections between units. Based on some distributional assumptions about the noise and the randomness of the connections, such an ANN can be estimated by a Kalman filtering procedure, which has been shown to have greater predictive accuracy than the Newton algorithm for a chaotic time series generated from a logistic map.
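To make the composition in (2.1)-(2.3) concrete, here is a minimal sketch of the forward pass of such a three-layer network, assuming a logistic middle-layer transfer function F and an identity output transfer function G. The NumPy implementation, array names and shapes are illustrative choices, not taken from the chapter.

```python
import numpy as np

def logistic(a):
    """Sigmoid transfer function F(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, beta, alpha):
    """Forward pass of the three-layer network in (2.1)-(2.3).

    x     : (n,) input vector; the bias unit x0 = 1 is appended below
    beta  : (n+1, k) input-to-middle weights; row 0 holds the bias weights
    alpha : (k+1,) middle-to-output weights; alpha[0] is the output bias
    """
    x_aug = np.concatenate(([1.0], x))   # bias unit x0 = 1
    m = logistic(x_aug @ beta)           # middle-layer signals, eq. (2.1)
    m_aug = np.concatenate(([1.0], m))   # bias unit m0 = 1
    return m_aug @ alpha                 # eq. (2.2)-(2.3) with G the identity

# tiny illustration with arbitrary weights
rng = np.random.default_rng(0)
n, k = 3, 4
beta = rng.normal(size=(n + 1, k))
alpha = rng.normal(size=k + 1)
print(forward(rng.normal(size=n), beta, alpha))
```

Stacking the β weights in an (n+1) × k array whose first row holds the bias weights simply keeps the bias term x_0 = 1 of (2.1) explicit.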

2.2. ANN learning
The most widely used estimation method (or so-called learning rule) for the ANN described in the previous section is error backpropagation (Rumelhart, Hinton and Williams, 1986a,b), which is considered to be a major reason for the explosive reemergence of interest in multilayer neural networks in the mid-1980's. A good discussion of various estimation methods is given in Kuan and White (1994). Backpropagation is a recursive gradient descent method that minimizes the sum of the squared errors of the system by moving down the gradient of the error surface. More specifically, the network weight vector θ is chosen to minimize the loss function,

\min_{\theta} L = \frac{1}{N}\sum_{t=1}^{N} (y_t - \hat{y}_t)^2, \qquad (2.4)

where N is the sample size, y_t is the desired (or target, actual) output value and ŷ_t is the calculated output value,

\hat{y}_t = f(X_t, \theta) = G\left(\alpha_0 + \sum_{j=1}^{k} \alpha_j F\left(\sum_{i=0}^{n} \beta_{ij} x_{it}\right)\right). \qquad (2.5)

Then the iterative step of the gradient descent algorithm takes θ to θ + Δθ, with

\Delta\theta = \eta\, \nabla f(X_t, \theta)\,(y_t - f(X_t, \theta)), \qquad (2.6)

where η > 0 is the step size, or learning rate, ∇f(X_t, θ) is the gradient of f(X_t, θ) with respect to θ (a column vector), and the chain rule is used to calculate ∇f(X_t, θ). The error surface is multi-dimensional and may contain many local minima. As a result, training the network often requires experimentation with different starting weights, adjusting the learning rate, or adding a momentum term to avoid getting stuck in local optima or converging slowly. For most studies that aim at comparing the ANN with some alternative model, it is not necessary to search for the global minimum as long as the ANN performs significantly better than its counterpart. For studies that do try to locate the global minimum, a grid search method is often used (see Gorr, Nagin and Szczypula, 1994, for example). Other results are also available; for example, Baldi and Hornik (1989) find that the error surface has a unique minimum which corresponds to the projection onto the subspace generated by the first principal component vectors of the covariance matrix of the data. White, Gallant, Hornik, Stinchcombe and Wooldridge (1992) provide more discussion of global optimization. The iteration stops when either the prespecified maximum number of iterations or the error goal has been reached.
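The following is a minimal sketch of the online gradient-descent (backpropagation) update in (2.4)-(2.6), again assuming a logistic middle layer and a linear output unit; the learning rate, number of epochs and weight initialization are arbitrary illustrative settings rather than recommendations from the text.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_backprop(X, y, k=5, eta=0.01, epochs=200, seed=0):
    """Online gradient descent on the squared-error loss (2.4)-(2.6);
    logistic middle layer, linear output unit."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    beta = rng.normal(scale=0.1, size=(n + 1, k))   # input-to-middle weights (bias row included)
    alpha = rng.normal(scale=0.1, size=k + 1)       # middle-to-output weights (bias included)
    for _ in range(epochs):
        for t in rng.permutation(N):
            x_aug = np.concatenate(([1.0], X[t]))
            m = logistic(x_aug @ beta)
            m_aug = np.concatenate(([1.0], m))
            y_hat = m_aug @ alpha                   # eq. (2.5) with G the identity
            err = y[t] - y_hat
            grad_alpha = m_aug                      # derivative of y_hat w.r.t. alpha
            grad_beta = np.outer(x_aug, alpha[1:] * m * (1.0 - m))  # chain rule for beta
            alpha = alpha + eta * err * grad_alpha  # gradient-descent step, eq. (2.6)
            beta = beta + eta * err * grad_beta
    return beta, alpha

# illustration: learn a simple nonlinear relation from simulated data
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=400)
beta, alpha = train_backprop(X, y)
```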

2.3. Universal approximation

A major advantage of ANNs is their ability to provide a flexible mapping between inputs and outputs. Based on a series of studies by Kolmogorov (1957), Sprecher (1965), Lorentz (1976), and Hecht-Nielsen (1987, 1990), any continuous function can be computed using linear summations and a single properly chosen nonlinear function. Therefore, the arrangement of the simple units into a multilayer framework produces a mapping between inputs and outputs that is consistent with any underlying functional relationship regardless of its "true" functional form. Having a general mapping between the input and output vectors eliminates the need for unjustified a priori restrictions which are needed in common statistical and econometric modeling.


However, to implement a perfectly general mapping between inputs and outputs, suitable transfer functions are needed. A sigmoid middle-layer transfer function has been shown to serve the purpose by studies such as Cybenko (1989), Funahashi (1989), Hecht-Nielsen (1989) and Hornik et al. (1989). Stinchcombe and White (1989) show that some non-sigmoid functions can also be used. Thus, an ANN can be viewed as a "universal approximator", i.e., a flexible functional form that can approximate an arbitrary function arbitrarily well, given sufficiently many middle layer units and properly adjusted weights.

3. Relationship between ANN and traditional statistical models


Most of the development in neural networks has been achieved primarily by nonstatisticians. Consequently, few statistical concepts and methods have been applied in this development. Nevertheless, some familiar statistical models can be represented in a general ANN framework, and many concepts and constructs can be expressed in a neural network notation (Cheng and Titterington, 1994). On the other hand, ANNs can be considered as a particular class of nonlinear parametric models, and "learning" corresponds to statistical estimation of the model parameters. As a result, modern theory of estimation and inference for nonlinear models can be applied to neural network learning (White, 1989a; Kuan and White, 1994). This section briefly outlines the relationship between ANNs and some of the traditional statistical methods.
3.1. Linear regression

Multiple linear regression models can be represented by a simple two-layer feedforward network with a linear transfer function F(a) = a, the ADALINE network of Widrow and Hoff (1960) (see Figure 2),

y = \sum_{i=0}^{n} \beta_i x_i = X'\beta, \qquad (3.1)

where y is the output value, X = (x_0, x_1, ..., x_n)' is the input vector, and β = (β_0, β_1, ..., β_n)' is the weight vector. While such a network has proved useful in a variety of applications, it cannot generalize, or perform well on patterns that have never been presented.

[Figure: output layer y; weight vector β = (β_0, β_1, ..., β_n)'; input layer X = (x_0, x_1, ..., x_n)'.]

Fig. 2. ADALINE network


It is also computationally more cumbersome than linear regression. However, it does not impose assumptions such as homoscedasticity and orthogonality on the true data-generating process, as linear regression does, and thus is more robust than classical linear regression. A multiple adaptive linear network, the MADALINE of Widrow and Hoff (1960), can be used to represent the standard system of seemingly unrelated regressions (Figure 3):

y_1 = \sum_{i=0}^{n} \beta_{1i} x_i = X'\beta_1,
y_2 = \sum_{i=0}^{n} \beta_{2i} x_i = X'\beta_2,
\vdots
y_k = \sum_{i=0}^{n} \beta_{ki} x_i = X'\beta_k. \qquad (3.2)

If lagged outputs are used as network inputs in an ADALINE network, we get the linear AR(d) time series equation:

y_t = \sum_{i=1}^{d} \beta_i y_{t-i}. \qquad (3.3)
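As a small illustration of the equivalence between the linear network (3.1) and least squares, the sketch below trains an ADALINE-style unit by batch gradient descent on simulated data and compares the resulting weights with the ordinary least squares solution; the simulated coefficients, step size and iteration count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, n))])   # x0 = 1 plus three regressors
beta_true = np.array([0.5, 1.0, -2.0, 0.3])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# ADALINE-style training: batch gradient descent on the squared error of (3.1)
beta = np.zeros(n + 1)
eta = 0.01
for _ in range(2000):
    err = y - X @ beta
    beta += eta * X.T @ err / N

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta, 3))       # gradient-descent weights ...
print(np.round(beta_ols, 3))   # ... essentially coincide with the OLS coefficients
```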

3.2. Logit and probit models


In the two-layer ADALINE network with a linear discriminant transfer function, units are not activated until some threshold level is reached, i.e.,

y = F\left(\sum_{i=0}^{n} \beta_i x_i\right), \qquad (3.4)

where the transfer function F(a) = 1 if a > 0 and F(a) = 0 if a ≤ 0. The output unit is thus a threshold unit. Networks with a threshold output unit are suited for classification and pattern recognition problems. Since the transfer function F can be any continuous, nondecreasing function, F can represent a cumulative distribution function (cdf).

[Figure: output layer Y = (y_1, y_2, ..., y_k)'; weight matrix β; input layer X = (x_0, x_1, x_2, ..., x_n)'.]

Fig. 3. MADALINE network


When F is the logistic cumulative distribution function, F(Σ_{i=0}^{n} β_i x_i) is the conditional expectation of the familiar binary logit model. When F is the normal cumulative distribution function, F(Σ_{i=0}^{n} β_i x_i) is the conditional expectation of a binary random variable generated by a probit model. For a more detailed introduction to logit and probit models, see Maddala (1983). Therefore, a two-layer neural network can represent the familiar logit and probit regression models, which are very popular in financial applications where binary classifications or decisions are involved. However, due to the limitations of a two-layer neural network, most of the classification applications of ANNs use one or more middle layers. It has been shown by Tam and Kiang (1992) that a two-layer ANN has a performance similar to that of linear discriminant analysis, but the incorporation of a hidden layer considerably improves the predictive accuracy. More work on ANNs and related methods for classification is discussed in Ripley (1994).

3.3. Principal component analysis


Principal component analysis (PCA) is a common statistical method of data analysis, often used to reduce the dimension of a data matrix. The purpose is to find a set of m orthogonal vectors in data space that account for as much of the data variance as possible. Typically m is smaller than the dimension of the original data; thus PCA performs a dimension reduction that retains most of the intrinsic information in the data and makes the reduced data much easier to handle. For a more detailed discussion, see Rao (1964), who examines in what sense principal components provide a reduction of the data without much loss of the information we seek from the data. Specifically, the first principal component is taken to be along the direction with maximum variance. The second principal component is constrained to lie in the subspace perpendicular to the first, within which it is taken along the direction with maximum variance. The third principal component is then taken in the maximum-variance direction in the subspace perpendicular to the first two, and so on. In general, the kth principal component direction is along an eigenvector belonging to the kth largest eigenvalue of the full covariance matrix. Several ANNs can perform PCA (Hertz, Krogh and Palmer, 1991). Let us first consider a two-layer linear feedforward network (see Figure 3),

y_j = \sum_{i=1}^{n} \beta_{ij} x_i = X'\beta_j, \qquad (3.5)

where the input vector X = (x_1, x_2, ..., x_n)' is n-dimensional, and β_j is the weight vector for the jth output. Under either of the following learning rules,

\Delta\beta_{ij} = \eta\, y_j\left(x_i - \sum_{k=1}^{j} y_k \beta_{ik}\right) \quad \text{(Sanger, 1989)}, \qquad (3.6)

or


\Delta\beta_{ij} = \eta\, y_j\left(x_i - \sum_{k=1}^{n} y_k \beta_{ik}\right) \quad \text{(Oja, 1989)}, \qquad (3.7)

when an equilibrium has been reached, the average weight change is expected to be zero. It can be shown that

\text{mean}(\Delta\beta_j) = C\beta_j - (\beta_j' C \beta_j)\beta_j = 0, \qquad (3.8)

where C is the correlation matrix. An equilibrium weight vector thus must satisfy

C\beta_j = \lambda_j \beta_j, \qquad (3.9)

with

\lambda_j = \beta_j' C \beta_j = \lambda_j \beta_j'\beta_j = \lambda_j |\beta_j|^2. \qquad (3.10)

(3.9) shows clearly that an equilibrium β_j must be an eigenvector of the correlation matrix C, and (3.10) proves that |β_j| = 1. It can also be shown that λ_j is the jth largest eigenvalue. PCA can also be performed by a three-layer linear ANN with n inputs, n outputs, and m < n middle layer units, using a self-supervised backpropagation approach (Sanger, 1989). The idea is to make the target outputs equal to the inputs. As the outputs become arbitrarily close to the inputs in the training set, the m middle layer units end up projecting onto the subspace of the first m principal components. Various generalizations of neural PCA-type learning algorithms containing nonlinearities have been derived and discussed in Karhunen and Joutsensalo (1995).
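Below is a minimal sketch of the single-output version of the Oja-type update discussed above, illustrating how the weight vector converges (up to sign) to the leading eigenvector of the correlation matrix with unit norm, consistent with (3.8)-(3.10); the simulated two-dimensional data, step size and number of passes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# correlated two-dimensional data, standardized so that C is the correlation matrix
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=5000)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

beta = rng.normal(size=2)
eta = 0.001
for _ in range(3):                         # a few passes over the sample
    for x in Z:
        y = x @ beta
        beta += eta * y * (x - y * beta)   # single-unit Oja update

C = np.corrcoef(Z.T)
eigval, eigvec = np.linalg.eigh(C)         # eigenvalues in ascending order
print(np.round(beta, 3), np.round(np.linalg.norm(beta), 3))
print(np.round(eigvec[:, -1], 3))          # leading eigenvector; beta matches it up to sign, |beta| near 1
```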

3.4. Latent variable model with multiple indicators and multiple causes (MIMIC model)
Causal models which contain latent variables have been extensively applied in several areas of social science, such as psychology, economics and education. They are potentially useful in financial applications. The latent variables are hypothetical and not directly observable, but have implications for relationships among observable variables. The observable variables may be effects ("indicators") of the latent variables, or causes of them, or both. Causal models with multiple indicators and multiple causes of latent variables are sometimes called MIMIC models. Such a MIMIC model can be easily represented in a three-layer feedforward linear ANN (see Figure 4):

M = \beta' X, \qquad (4.1)
Y = \alpha' M, \qquad (4.2)

where β is the weight matrix of connections between the input and middle layers, and α is the weight matrix of connections between the middle and output layers.


[Figure: output layer Y; weight matrix α; middle layer M; weight matrix β; input layer X.]

Fig. 4. A three-layer feedforward linear ANN

In (4.1), the vector of middle layer units of the ANN, M = (m_1, m_2, ..., m_k)' (comparable to the latent variables in a MIMIC model), is linearly determined by the input vector of the ANN, X = (x_1, x_2, ..., x_n)' (corresponding to a set of observable exogenous causes). In (4.2), the middle layer units of the ANN linearly determine the output units of the ANN, Y = (y_1, y_2, ..., y_m)', which correspond to a set of observable endogenous indicators. Under some assumptions about the disturbances added to (4.1) and (4.2), and some restrictions on the reduced form, the MIMIC model can be estimated by maximum likelihood or some limited-information approach (Jöreskog and Goldberger, 1975). While no additional restrictions are needed to train an ANN MIMIC model, such a multilayer linear network has the same limitations as a two-layer one: it can only work if the input patterns are linearly independent (Hertz, Krogh and Palmer, 1991). A multilayer nonlinear network, which can represent nonlinear MIMIC models, would be more interesting.
4. ANN implementation and interpretation

It is well known that several limitations may restrict the use of neural networks. First, there is no formal theory for determining the optimal network structure, and the appropriate number of layers and middle layer units must be determined by experimentation. Second, there is no algorithm guaranteed to reach the global minimum, because the error surface has multiple minima. Third, the statistical properties of ANNs are generally not available, so no statistical inference can be carried out. Fourth, it is difficult to interpret a trained ANN model. These limitations call for further studies in three broad areas outlined in Cheng and Titterington (1994): (1) mathematical modeling of real cognitive processes; (2) theoretical investigations of networks and neurocomputing; (3) development of useful tools for practical prediction and pattern recognition. While the first two areas are certainly important, they are not the focus of the present paper. In this section, we outline some useful techniques and procedures that aim to overcome the aforementioned limitations.

4.1. Model selection


Though ANNs can be universal approximators, the optimal network structure is not determined automatically. Failures in applications are sometimes due to a suboptimal ANN structure. To develop the optimal network in any financial application, one needs to (1) identify the relevant inputs and outputs; (2) choose an appropriate network structure, including the necessary number of hidden layers and hidden layer units; and (3) use proper model evaluation criteria. We now discuss these points one by one.

4.1.1. ANN inputs and outputs

The choice of network input and output variables and the quality of the data are critical to the success of ANN applications. The choice depends heavily on the type of task that an ANN is expected to perform and is largely subject to the modeler's discretion regarding the model and the scope of the study. It is common practice to use independent variables as network inputs and dependent variables as network outputs. For example, in a seminal study aimed at extracting nonlinear regularities from economic time series, White (1988) uses the lagged one-day returns on IBM stock, r_{t-1}, r_{t-2}, ..., r_{t-p}, as the network inputs and the one-day return on day t, r_t, as the network output. The goodness of fit of such an ANN provides evidence for or against the efficient markets hypothesis and the presence of nonlinear regularities in the case of IBM daily stock returns. However, as the author points out, in order to expand the scope of the search for evidence against the efficient markets hypothesis, the network needs to be elaborated by allowing additional inputs, such as volume, other stock prices and volumes, leading indicators, macroeconomic data, etc. In another study, by Grudnitski and Osburn (1993), 24 input units and one output unit are used in their ANN, based on the belief that general economic conditions and traders' expectations about the futures market are related to price movements of futures. The input units represent six input variables per month (price change, price volatility, the money growth rate, and the percentage commitments of large speculators, large hedgers, and small traders) presented four months at a time. The output is the change of the monthly centered price mean for the forecast month. Sometimes, if there are more independent variables than one desires to include in the network input, dimension reduction techniques can be used. One can choose a smaller group of statistically significant variables from a regression of the dependent variable on a large group of independent variables. Principal component analysis and stepwise regression can also be used. For example, Salchenberger, Cinar and Lash (1992) perform a stepwise regression on 29 financial ratios, which results in the identification of five variables. These five financial variables are then used as inputs of a neural network to forecast the probability of failure of thrift institutions.


In order to minimize the effect of differences in magnitude among the inputs and outputs and to increase the effectiveness of the learning algorithm, the data set is often normalized (or scaled) to lie within a specific range that depends on the transfer function. For example, if an ANN has a sigmoid or logistic transfer function in the output unit, the output needs to be scaled to fall in the range [0, 1]. Otherwise, a target output which falls outside that range will constantly create large backpropagation errors, and the network will be unable to learn the input-output relationship implied by the particular training pattern. Typically, variables are normalized to have zero mean and unit standard deviation. The quality of the data and the degree to which the data set properly represents the population are very important, as in any econometric and statistical modeling. To train and test an ANN, it is also important to have enough data.
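A small sketch of the two scalings just described, min-max scaling to [0, 1] for a logistic output unit and standardization to zero mean and unit standard deviation, is given below; computing the scaling constants from the training sample only and reusing them on the test sample is an added assumption, not something prescribed by the text.

```python
import numpy as np

def minmax_scale(train, test):
    """Scale each column to [0, 1] using the training-sample range only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    return (train - lo) / (hi - lo), (test - lo) / (hi - lo)

def standardize(train, test):
    """Zero mean and unit standard deviation, using training-sample moments."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    return (train - mu) / sd, (test - mu) / sd
```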
4.1.2. ANN architecture
After specifying the network input and output layers, the ANN architecture remains undetermined until the necessary number of hidden layers and hidden layer units has been determined. Consider layered networks of continuous-valued units with logistic transfer functions for hidden units and linear transfer functions for output units. Overall, such a network implies a function, y = f(X), from the input variables, X = (x_1, x_2, ..., x_n)', to the output value, y. Due to the limited capacity of a two-layer ANN (Hertz, Krogh and Palmer, 1991), networks with at least one middle layer are often used. Cybenko (1988) proves that an ANN with at most two hidden layers can approximate a particular set of functions with arbitrary accuracy given enough units per layer. It has also been proved that only one hidden layer is enough to approximate any continuous function (Cybenko, 1989; Hornik, Stinchcombe and White, 1989). Many empirical studies, such as Collins, Ghosh and Scofield (1988), Dutta and Shekhar (1988), and Salchenberger, Cinar and Lash (1992), to name a few, have confirmed this. The correctness of these results, however, hinges on the appropriate number of hidden units. The choice of k, the number of hidden units, represents a compromise. If k is too small, an ANN may not approximate y = f(X) with the desired accuracy. However, if k is too large, an ANN may overfit and cannot generalize (or forecast) out of sample. A useful method is cross-validation, by which the number of middle layer units is selected to optimize out-of-sample performance (White, 1990). Another related model selection criterion, predictive stochastic complexity (PSC) (defined in (4.10)), can also be used (Kuan and Liu, 1995). Other common methods for optimal network design have been reviewed by Refenes (1995b). These methods fall into three groups. The first is analytic techniques, in which algebraic or statistical analysis is used to determine the hidden unit size a priori. Several rules of thumb have been cited, such as that the number of connections should be less than 0.1T and that the number of hidden units is of the order of (T-1) or log2 T, where T is the sample size. The main problem with these techniques is that they perform static analysis and can only provide a very rough

estimate of the hidden unit size. However, they compare well with current experimental methods for network design. The second group is constructive techniques, such as cascade correlation (Fahlman and Lebiere, 1990), the tiling algorithm (Mezard and Nadal, 1989), the neural decision tree (Gallant, 1986), the upstart algorithm (Frean, 1989) and the CLS procedure (Refenes and Vithlani, 1991). These methods construct the hidden units, layer by layer, one by one as they are needed. Though these techniques guarantee network convergence, generalization and stability are not guaranteed. The last group, network pruning, operates in the opposite direction, by pruning the network and removing "redundant" or least sensitive connections. These methods include network pruning (Sietsma and Dow, 1991) and artificial selection (Hergert, Finnoff and Zimmermann, 1992). However, optimal pruning is not always possible.

4.1.3. ANN evaluation criteria
A criterion is always needed to compare the performance of alternative models and select the best one. Let (ŷ_1, ŷ_2, ..., ŷ_N) denote the predicted values and (y_1, y_2, ..., y_N) the actual values, where N is the sample size. Some of the commonly used criteria are listed below.

(1) Mean square error (MSE) and root mean square error (RMSE):
\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2, \qquad (4.1)

\text{RMSE} = \sqrt{\text{MSE}}. \qquad (4.2)

(2) Mean absolute error (MAE) and mean absolute percentage error (MAPE):

\text{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|, \qquad (4.3)

\text{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \qquad (4.4)

(MAPE is not available for samples in which y_i has zero actual values).

(3) Coefficient of determination (R²):

R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}, \qquad (4.5)

where \bar{y} = \frac{1}{N}\sum y_i.

(4) Pearson correlation (ρ): ρ measures the linear correlation between the predicted values and the actual values,

\rho = \frac{\sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum (y_i - \bar{y})^2}\,\sqrt{\sum (\hat{y}_i - \bar{\hat{y}})^2}}. \qquad (4.6)

(5) Theil's coefficient of inequality (U): U gives the prediction performance relative to the random walk prediction,

U = \frac{\text{RMSE}}{\text{RMSE of the random walk prediction}}. \qquad (4.7)

(6) Akaike information criterion (AIC): AIC adjusts the MSE to account for model complexity,

\text{AIC} = \text{MSE}\left(\frac{N + k}{N - k}\right), \qquad (4.8)

where k is the number of free parameters in the model, or the number of free weights in an ANN.

(7) Schwarz information criterion (SIC), or Bayesian information criterion (BIC): SIC or BIC is another way of adjusting the MSE to account for model complexity,

\text{SIC} = \text{BIC} = \ln(\text{MSE}) + \frac{k}{N}\ln(N). \qquad (4.9)

(8) Predictive stochastic complexity (PSC):

\text{PSC} = \frac{1}{N - k}\sum_{i=k+1}^{N} (y_i - \hat{y}_i)^2, \qquad (4.10)

where ŷ_i is the predicted value based on parameters obtained from the data up to the (i-1)th observation.

(9) Direction accuracy (DA) and confusion rate (CR):

\text{DA} = \frac{1}{N}\sum_i a_i, \qquad a_i = \begin{cases} 1 & \text{if } (y_{i+1} - y_i)(\hat{y}_{i+1} - y_i) > 0, \\ 0 & \text{otherwise}, \end{cases} \qquad (4.11)

\text{CR} = 1 - \text{DA}. \qquad (4.12)
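The sketch below computes several of the criteria (4.1)-(4.7) and (4.11)-(4.12) for a vector of forecasts; the function name and the optional random-walk benchmark argument are illustrative assumptions, and the model-complexity criteria (4.8)-(4.10) are omitted because they require the number of free weights.

```python
import numpy as np

def forecast_criteria(y, y_hat, y_rw=None):
    """Several of the evaluation criteria in (4.1)-(4.7) and (4.11)-(4.12)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y - y_hat
    mse = np.mean(e ** 2)                                        # (4.1)
    out = {"MSE": mse,
           "RMSE": np.sqrt(mse),                                 # (4.2)
           "MAE": np.mean(np.abs(e)),                            # (4.3)
           "R2": 1.0 - e.dot(e) / np.sum((y - y.mean()) ** 2),   # (4.5)
           "rho": np.corrcoef(y, y_hat)[0, 1]}                   # (4.6)
    if np.all(y != 0):
        out["MAPE"] = np.mean(np.abs(e / y))                     # (4.4), undefined if some y_i = 0
    if y_rw is not None:                                         # random-walk benchmark forecasts
        out["U"] = out["RMSE"] / np.sqrt(np.mean((y - y_rw) ** 2))   # (4.7)
    hits = (np.diff(y) * (y_hat[1:] - y[:-1])) > 0               # direction hits, as in (4.11)
    out["DA"] = hits.mean()
    out["CR"] = 1.0 - out["DA"]                                  # (4.12)
    return out
```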

Sometimes the significance of the difference in the performance of alternative models needs to be tested. A t-test or the Diebold-Mariano test (Diebold and Mariano, 1995) is often used to test the null hypothesis that there is no difference in the squared errors of two alternative models. The hypothesis of independence between the actual and predicted directions can be tested by the HM test (Henriksson and Merton, 1981; Pesaran and Timmermann, 1994). It is worth noting that the in-sample performance of any properly designed and well-trained ANN, evaluated by the above measures, is usually much better than that of its traditional statistical counterparts. This is not surprising given the universal approximation property of ANNs. To avoid spurious fit or overfitting, it is important to test the trained ANN using a hold-out sample, i.e., to evaluate the


trained ANN using data that have not been used in training the ANN. Whether the selected ANN model is useful or not depends primarily on its out-of-sample performance. Swanson and White (1995a,b) show that, compared to a variety of out-of-sample forecast-based model selection criteria, such as forecast mean squared error, forecast direction accuracy, or forecast-based trading system profitability, an in-sample Schwarz information criterion (SIC) does not appear to be a reliable guide to out-of-sample performance. In cases where out-of-sample performance measures are used as model selection criteria, it is important to test the model using a strictly untouched data set, i.e., data not used in training and validation. Otherwise, an upward bias in out-of-sample forecasting accuracy is likely to occur.
4.2. Statistical inference in ANN

Very few empirical studies of ANN applications report confidence intervals or conduct hypothesis tests, because the classical statistical properties are generally not available. However, if we view (2.4) as a nonlinear least squares regression, then the estimator of θ will have the statistical properties of a nonlinear least squares estimator. Thus, statistical inference can be carried out; for details, see White (1989a,b) and Kuan and White (1994). A bootstrap method has been proposed by LeBaron and Weigend (1994) to determine the quality and reliability of a neural network predictor. Though the method is extremely computationally intensive, it does provide more robust forecasting along with the probability distribution of the forecast results. In their multivariate time series prediction of daily total trading volume on the New York Stock Exchange, the bootstrapping results show that the performance variation due to different splits between training, cross-validation, and testing samples is significantly larger than the variance due to different network architectures and initial weights.
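The following is a rough sketch in the spirit of bootstrapping a predictor to gauge forecast variability, not the LeBaron-Weigend procedure itself: the training pairs are resampled with replacement, the model is re-estimated on each replication, and the spread of the resulting predictions is reported. A linear least-squares fit stands in for the (much slower) network re-training step; all names and settings are illustrative.

```python
import numpy as np

def bootstrap_forecast_spread(X, y, X_test, B=200, seed=0):
    """Pairs bootstrap of the training sample to gauge forecast variability.

    A linear least-squares fit stands in for the network training step; in
    practice each bootstrap replication would re-train the ANN on the
    resampled data, which is what makes the method so computer intensive.
    """
    rng = np.random.default_rng(seed)
    N = len(y)
    preds = np.empty((B, len(X_test)))
    for b in range(B):
        idx = rng.integers(0, N, size=N)        # resample training pairs with replacement
        coef = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        preds[b] = X_test @ coef
    lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
    return preds.mean(axis=0), lo, hi           # point forecasts and a 95% bootstrap band

# illustrative usage on simulated data
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.2 * rng.normal(size=300)
mean_f, lo, hi = bootstrap_forecast_spread(X, y, X[:5])
```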
4.3. Model implications

Artificial neural networks are often viewed as "black boxes", because the estimated models are difficult to explain due to their complex functional forms. However, the relationship between weights, inputs and outputs is clearly defined, which allows us to look into the "black boxes" and find the economic implications of ANN models. Following the notation of Section 2.1 for a three-layer ANN as shown in Figure 1, several practical methods have been proposed to interpret the relative significance of each input variable for the output.

(1) Pseudo weights. In an application of ANN to price call options using five input variables, Qi and Maddala (1995a) use the weighted average of the input weights, or so-called pseudo weights, to approximate the marginal contribution of an input variable to the output. The pseudo weight for the ith input variable is defined as

\text{PW}_i = \sum_{j=1}^{k} \alpha_j \beta_{ij} = \alpha'\beta_i. \qquad (4.13)

It is reported in their paper that the economic implications of the PW are consistent with the call option properties.

(2) Sum of input weights. Sen, Oliver and Sen (1995) proposed, and Refenes, Zapranis and Francis (1995) adopted, the idea of summing the absolute values of the input weights for each input variable to approximate the degree of impact that an input variable has on the outcome. The sum of input weights (SW) for the ith input variable is calculated as
\text{SW}_i = \sum_{j=1}^{k} |\beta_{ij}|. \qquad (4.14)

Sen, Oliver and Sen (1995) find that all the variables found significant in a logit analysis to predict corporate mergers are included in the set of five variables with the highest sum of input weights. Notice the difference between PW and SW: SW loses information about the negative effect of an input variable on the output by taking absolute values. If the weights were all positive, PW and SW would end up with the same rank order of the different input variables. More importantly, Qi (1996) points out that in the presence of substantial nonlinearity, both PW and SW are no longer relevant, and a useful tool of model interpretation is sensitivity analysis.

(3) Sensitivity analysis. Sensitivity analysis shows the sensitivity of the network output to changes in the input variables. To perform the sensitivity analysis, the minimum, maximum and mean (or median) of each input variable are first determined. The value of each input variable is then varied one at a time, holding the values of the other input variables fixed at their means (or medians). For each predictor being varied, the values are spread over a certain number of equal intervals over its whole range. The neural network model is then used to compute the output. The plot of the neural network output against the value of the input variable indicates how the network output changes with a particular input variable while the other input variables are held fixed. This sensitivity analysis has been utilized by Sen, Oliver and Sen (1995) and Refenes, Zapranis and Francis (1995) to gain insights into their models.

(4) Sensitivity index. Sen, Oliver and Sen (1995) use a sensitivity index to find out the relative strength of the influence of an input variable on the output. The index for the ith input variable is computed by averaging the changes of the output over a certain number (M) of equal-interval changes over the whole range of that input variable:
\text{SI}_i = \frac{1}{M}\sum_{j=1}^{M} (\hat{y}_{j+1} - \hat{y}_j). \qquad (4.15)


The sensitivity index provides a measure of the "significance" of the input variables in predicting the output. The results in Sen et al. (1995) agree, in part, with those of the logistic regression.
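Here is a minimal sketch of the one-at-a-time sensitivity analysis described in (3) together with the sensitivity index (4.15): each input is varied over a grid between its minimum and maximum while the other inputs are held at their medians, and the average change in the output is recorded. The function accepts any fitted predictor as a callable; the grid size and the use of medians follow the text loosely and are otherwise illustrative.

```python
import numpy as np

def sensitivity_profile(predict, X, var, n_grid=20):
    """One-at-a-time sensitivity analysis for input variable `var`.

    predict : callable mapping an (n,) input vector to a scalar output
    X       : (N, n) sample of inputs, used for the grid range and the medians
    """
    base = np.median(X, axis=0)                       # hold the other inputs at their medians
    grid = np.linspace(X[:, var].min(), X[:, var].max(), n_grid)
    outputs = []
    for v in grid:
        x = base.copy()
        x[var] = v
        outputs.append(predict(x))
    outputs = np.asarray(outputs, dtype=float)
    sensitivity_index = np.mean(np.diff(outputs))     # average output change, as in (4.15)
    return grid, outputs, sensitivity_index

# illustrative usage with a toy predictor standing in for a trained network
rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
grid, out, si = sensitivity_profile(lambda x: float(x @ np.array([1.0, -2.0, 0.5])), X, var=1)
```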

5. Financial applications
ANNs have been successfully applied in several financial areas, such as option pricing, bankruptcy prediction, exchange rate forecasting, and stock market prediction. In this section, we review some of the well-designed and carefully assessed empirical studies in each area.

5.1. Option pricing


There are only a few published studies regarding neural networks and option pricing. Much of the success and growth of the options market may be traced to the seminal Black-Scholes model and its extensions. While these parametric option pricing formulas are preferred where they are available, nonparametric neural network alternatives can be useful when parametric methods fail, and considerable success has been achieved by using ANNs. The first well-known experiment in option pricing using neural networks is by Hutchinson, Lo and Poggio (1994). First, the potential value of a neural network pricing formula is shown by the fact that neural networks can discover the Black-Scholes formula from a two-year training set of simulated daily option prices. The option prices are simulated under all the Black-Scholes assumptions, such as geometric Brownian motion with constant mean and volatility, a constant interest rate, etc. The resulting network formula is shown to be successful in pricing and delta-hedging options out of sample. The network is then applied to the pricing and delta-hedging of S&P 500 futures options from 1987 to 1991. The results show that the neural networks outperform the Black-Scholes formula. However, Hutchinson et al. (1994) assume a constant risk-free interest rate and constant volatility of the underlying asset. They further assume that the return of the underlying asset is independent of the level of the stock price, so that the option pricing formula is homogeneous of degree one in both S, the asset price, and X, the exercise price. Thus, their networks have only two inputs, S/X and T (time to maturity), and one output, C/X, the ratio of the call price to the exercise price. It is reasonable to doubt whether such a network can capture all the option price variation. Another study of option pricing using ANN is by Qi and Maddala (1995a). Unlike Hutchinson, Lo and Poggio (1994), Qi and Maddala use variables that are believed to be important in determining option prices as network inputs, and use option prices as the network output. The input variables are the underlying asset price (S), the exercise price (X), the risk-free rate (r), the time to maturity (T), and the open interest (V). Such a network provides superior performance


to the Black-Scholes formula, both in and out of sample, for S&P 500 index call options, and the results are better than those reported by Hutchinson, Lo and Poggio (1994). Moreover, by analyzing the network weights, Qi and Maddala find that the economic implications of the neural network model are consistent with the option price properties, and the open interest is found to be important in determining option prices. Option pricing using ANN is an ongoing research area. Qi (1995) uses an ANN to examine put-call parity and shows that the previous evidence of market inefficiency based on traditional put-call parity may have been exaggerated. Other option data sets and input variables are worth exploring using ANNs. More evidence on option pricing using ANNs can be found in Bailey et al. (1988).
5.2. Bankruptcy prediction

In contrast to option pricing, many studies have been done on bankruptcy prediction. The standard tools are discriminant analysis (DA) and the logit model. Given the pattern matching, classification and prediction abilities of ANNs, they may improve upon these traditional statistical counterparts. Tam and Kiang (1992) compare the neural network approach with linear discriminant analysis, the logit model and other approaches in predicting the failure of Texas banks from 1985 to 1987 using 19 financial ratios. A jackknife method is used to obtain unbiased estimates of the misclassification rates. The original backpropagation algorithm was modified to include prior probabilities of bank failure and misclassification costs. The modified algorithm allows decision makers to choose a tradeoff between type I errors (misclassifying a failed bank into the non-failed group) and type II errors (misclassifying a non-failed bank into the failed group). The empirical results show that neural networks offer better predictive accuracy than the alternative approaches, and that a neural network with a hidden layer performs better than a two-layer network. Tam and Kiang also point out that the ANN offers a competitive alternative to classification techniques in terms of adaptability, robustness, and the ability to deal with multimodal distributions. Salchenberger, Cinar and Lash (1992) present a neural network developed to predict the probability of failure for savings and loan associations (S&Ls), using financial variables that signal an institution's deteriorating financial condition. Unlike Tam and Kiang (1992), who use all 19 financial ratios as network inputs, Salchenberger et al. reduce the data dimension from 29 variables to 5 by stepwise regression, and use these five variables as network inputs. Nevertheless, the results are similar: the ANN performs as well as or better than the best logit model on their data. Moreover, in some cases, when the cutoff point was lowered (a higher probability of predicting failure), the reduction in Type I errors was accompanied by a greater increase in Type II errors for the logit model than for the neural network model. An improved degree of accuracy and other beneficial characteristics of ANNs relative to linear discriminant analysis have been reported by Altman, Marco and


Varetto (1994). Their study diagnoses corporate distress for over 1,000 healthy, vulnerable and unsound industrial Italian firms from 1982 to 1992, and suggests a combined approach for predictive reinforcement. Other studies of bankruptcy prediction using ANNs are Tam and Kiang (1990), Odom and Sharda (1990), Raghupathi, Schkade and Raju (1991), Coats and Fant (1992), Huang (1993) and Poddig (1995). Bankruptcy prediction is just one class of classification problems. Other classification problems include corporate merger prediction (Sen, Oliver and Sen, 1995), market response models (Dasgupta, Dispensa and Ghose, 1994), bond rating (Dutta and Shekhar, 1988; Surkan and Singleton, 1990; Utans and Moody, 1991; Moody and Utans, 1995), and mortgage underwriting (Collins, Ghosh and Scofield, 1988).
5.3. Exchange rate forecasting

Exchange rates are notorious for their unpredictability. Most conclusions of unpredictability are drawn from linear time series techniques, so the apparent unpredictability of exchange rates may be due to the limitations of linear models. Evidence of nonlinearity has been found since the 1980s. As a class of flexible functional form nonlinear models, ANNs may provide improvements in forecasting accuracy. Kuan and Liu (1995) investigate the out-of-sample forecasting ability of neural networks for five exchange rates against the US dollar: the British pound, the Canadian dollar, the Deutsche mark, the Japanese yen and the Swiss franc. The data are daily opening bid prices of the NY Foreign Exchange Market from March 1, 1980 to January 28, 1985, which consist of 1245 observations. A two-step procedure is used to select a suitable network. First, networks are selected based on the predictive stochastic complexity (PSC) criterion (defined in Section 4.1.3). Then the selected networks are estimated using both recursive Newton algorithms and nonlinear least squares methods. For the Japanese yen and the British pound, ANNs are found to have significant market timing ability and/or significantly lower out-of-sample MSE relative to the random walk model in different testing periods; for the Canadian dollar and the Deutsche mark, however, the selected networks have only mediocre performance. The results show that nonlinearity in exchange rates may be exploited to improve both point and sign forecasts, in contrast with the conclusion of Diebold and Nason (1990). The results also differ from those in Tsibouris (1993), in which the ANN is found to be useful in forecasting the direction of the exchange rate change, but not the magnitude. Other applications of ANNs to exchange rates are Abu-Mostafa (1995), who reports a statistically significant improvement in performance in four major foreign exchange markets by an ANN with a simple symmetry hint, and Hsu, Hsu and Tenorio (1995), who use an ANN to select predictive indicators and show that the forecasting accuracy for direction is better than that from the unprocessed universe of indicators. However, these studies do not compare the model performance to that of a benchmark model. Although they provide useful methodologies,


they cannot be counted as evidence in favor of the ANN in forecasting exchange rates.
5.4. Stock market prediction

Traditional models, such as the market model, the CAPM, and the APT, have been very useful in expanding our understanding of stock price behavior. However, their practical use is often limited, given their limited success in forecasting stock returns. Because of the inductive, adaptive, and robust nature of ANNs, a great deal of effort has been devoted to developing ANNs for predicting stock returns. Limited success has been achieved so far. White (1988) investigates the forecastability of IBM daily stock returns using historical data. Though a surprisingly good fit (R² = 0.175) is found in-sample, which is inconsistent with the efficient markets hypothesis, the out-of-sample correlation between actual and forecasted returns is -0.0699 (the in-sample correlation is 0.0751). Such results do not provide evidence for the forecastability of the ANN, and at present the ANN is not a "money-making machine". Nevertheless, it is capable of capturing some of the dynamic behavior of the stock returns. However, the question of forecastability remains open because of the simple network used in White's study; some elaborations of the ANN structure and learning method may improve the performance. Chuah (1993) uses an ANN to forecast NYSE stock index returns using data from January 1963 to December 1988, and compares the predictability and profitability of the network forecasts with those from a benchmark linear model using the same data. The predictability tests show that the forecast errors of the network are not significantly different from those of the benchmark linear model, and that the network has no market timing ability. The profitability test examines profits generated from a trading simulation over a five-year forecast period, in comparison with a benchmark buy-and-hold strategy. The nonlinear network generated a total return of 116% versus 94% from the buy-and-hold strategy, while the linear network generated only a 38% total return. Similar results have been obtained by Qi and Maddala (1995b) on S&P 500 index returns using data from January 1959 to June 1995. Refenes, Zapranis and Francis (1995) show that neural networks are a superior substitute for linear regression in a dynamic multi-factor model of stock returns, a dynamic version of the APT. Other studies in this area are Kamijo and Tanigawa (1990), Schoneburg (1990), Refenes, Zapranis and Francis (1994), and Haefke and Helmenstein (1994, 1995). More references can be found in Trippi and Turban (1993) and Refenes (1995a).

6. Conclusions

In the present paper, we briefly introduce ANN and point out its relation to some familiar statistical models. Some practical ANN modeling methods are reviewed.


We have also reviewed empirical studies in several major fields of financial application, including option pricing, forecasting of foreign exchange rates, bankruptcy prediction, and stock market prediction. While ANNs have achieved great success in option pricing and classification problems, the gains from using ANNs in exchange rate forecasting and stock market prediction are less spectacular. While this calls for broadening the scope of applications and developing better ANN modeling methods, the reasons for this lack of improvement need to be analyzed further. Ramsey (1995) points out that open and non-isolated systems cannot usually be forecast, and that the extent to which economic systems are closed and isolated provides the true pragmatic limits to forecastability. Instead of being due to the lack of an optimal network structure or learning method, the empirical evidence of unpredictability may be a result of the open and non-isolated economic system in which exchange rates and stock returns are determined.

Acknowledgement
I am grateful to G. S. Maddala for helpful discussions and for providing some of the papers reviewed in the present paper. I also thank Stephen R. Cosslett, Hongyi Li and Yong Yin for helping me collect the papers.

References
Abu-Mostafa, Y. S. (1995). Financial market applications of learning from hints. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 221532.
Altman, E., G. Marco and F. Varetto (1994). Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). J. Banking Finance 18, 505-529.
Bailey, D. B., D. M. Thompson and J. L. Feinstein (1988). Option trading using neural networks. In: J. Herault and N. Giamisas, eds., Proc. Internat. Workshop on Neural Networks and Their Applications, Neuro-Nimes, 395-402.
Baldi, P. and K. Hornik (1989). Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks 2, 53-58.
Bose, N. K. and P. Liang (1996). Neural Network Fundamentals with Graphs, Algorithms, and Applications. McGraw-Hill, New York.
Cheng, B. and D. Titterington (1994). Neural networks: A review from a statistical perspective. Statist. Sci. 9, 2-54.
Chuah, K. L. (1993). A nonlinear approach to return predictability in the securities markets using a feedforward neural network. Dissertation, Washington State University.
Coats, P. and L. Fant (1992). A neural network approach to forecasting financial distress. J. Business Forecasting 10, 9-12.
Collins, E., S. Ghosh and C. Scofield (1988). An application of a multiple neural-network system to emulation of mortgage underwriting judgments. Proc. IEEE Internat. Conf. Neural Networks 2, 459-466.
Cybenko, G. (1988). Continuous valued neural networks with two hidden layers are sufficient. Technical Report, Department of Computer Science, Tufts University, Medford, MA.


Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Control, Signals, and Systems 2, 303-314.
Dasgupta, C. G., G. S. Dispensa and S. Ghose (1994). Comparing the predictive performance of a neural network model with some traditional market response models. Internat. J. Forecasting 10, 235-244.
Diebold, F. X. and R. S. Mariano (1995). Comparing predictive accuracy. J. Business Econom. Statist. 13, 253-263.
Diebold, F. X. and J. A. Nason (1990). Nonparametric exchange rate prediction? J. Internat. Econom. 28, 315-332.
Dutta, S. and S. Shekhar (1988). Bond rating: A non-conservative application of neural networks. Proc. IEEE Internat. Conf. Neural Networks 2, 443-450.
Fahlman, S. E. and C. Lebiere (1990). The cascade-correlation learning algorithm. In: D. S. Touretzky, ed., Advances in Neural Information Processing Systems 2. Morgan Kaufmann, San Mateo, CA, 525-532.
Frean, M. R. A. (1989). The upstart algorithm: A method for constructing and training feed-forward neural networks. Neural Computation 2, 198-209.
Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183-192.
Gallant, S. I. (1986). Three constructive algorithms for neural learning. Proc. 8th Annual Conf. of Cognitive Science Soc.

Gately, E. (1996). Neural Networks for Financial Forecasting. John Wiley & Sons, New York.
Gorr, W. L., D. Nagin and J. Szczypula (1994). Comparative study of artificial neural network and statistical models for predicting student grade point average. Internat. J. Forecasting 10, 1-34.
Grudnitski, G. and L. Osburn (1993). Forecasting S&P and gold futures prices: An application of neural networks. J. Futures Markets 13, 631-643.
Haefke, C. and C. Helmenstein (1994). Stock price forecasting of Austrian initial public offerings using artificial neural networks. Proc. Neural Networks in the Capital Markets.
Haefke, C. and C. Helmenstein (1995). Predicting stock market averages to enhance profitable trading strategies. Proc. Neural Networks in the Capital Markets.
Hecht-Nielsen, R. (1987). Kolmogorov's mapping neural network existence theorem. Proc. IEEE 1st Internat. Conf. Neural Networks 3, 11-14.
Hecht-Nielsen, R. (1989). Theory of the back-propagation neural network. Proc. Internat. Joint Conf. Neural Networks, Washington D.C. IEEE Press, New York, 1, 593-606.
Hecht-Nielsen, R. (1990). Neurocomputing. Addison-Wesley, MA.
Henriksson, R. O. and R. C. Merton (1981). On market timing and investment performance II: Statistical procedures for evaluating forecasting skills. J. Business 54, 513-533.
Hergert, F., W. Finnoff and H. G. Zimmermann (1992). A comparison of weight elimination methods for reducing complexity in neural networks. Internat. Joint Conf. on Neural Networks, Baltimore, III, 980-987.
Hertz, J., A. Krogh and R. Palmer (1991). Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City.
Hornik, K., M. Stinchcombe and H. White (1989). Multilayer feedforward networks are universal approximators. Neural Networks 2, 359-366.
Hsu, W., L. S. Hsu and M. F. Tenorio (1995). A neural network procedure for selecting predictive indicators in currency trading. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 245-257.
Huang, C. S. (1993). Neural networks in financial distress prediction: An application to the life insurance industry. Dissertation, University of Mississippi.
Hutchinson, J., A. Lo and T. Poggio (1994). A nonparametric approach to pricing and hedging derivative securities via learning networks. J. Finance 49, 851-889.
Jöreskog, K. G. and A. S. Goldberger (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. J. Amer. Statist. Assoc. 70, 631-639.


Kamijo, K.-I. and T. Tanigawa (1990). Stock price recognition - A recurrent neural network approach. Proc. Internat. Joint Conf. Neural Networks, San Diego, CA.
Karhunen, J. and J. Joutsensalo (1995). Generalizations of principal component analysis, optimization problems and neural networks. Neural Networks 8, 549-562.
Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk USSR 114, 953-956.
Kuan, C. and T. Liu (1995). Forecasting exchange rates using feedforward and recurrent neural networks. J. Appl. Econometrics 10, 347-364.
Kuan, C. and H. White (1994). Artificial neural networks: An econometric perspective. Econometric Rev. 13, 1-91.
LeBaron, B. and A. S. Weigend (1994). Evaluating neural network predictors by bootstrapping. University of Wisconsin-Madison, SSRI Working Paper #9447.
Lorentz, G. G. (1976). The 13th Problem of Hilbert. Proc. Symposia Pure Math., American Mathematical Society 28.
Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press.
Mezard, M. and J. Nadal (1989). Learning in feedforward layered network: The tiling algorithm. J. Physics A 22, 2191-2203.
Moody, J. and J. Utans (1995). Architecture selection strategies for neural networks: Application to corporate bond rating prediction. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 277-300.
Odom, M. and R. Sharda (1990). A neural network for bankruptcy prediction. Proc. Internat. Joint Conf. Neural Networks, San Diego, CA, 2, 163-168.
Oja, E. (1989). Neural networks, principal components, and subspace. Internat. J. Neural Systems 1, 61-68.
Pesaran, M. H. and A. G. Timmerman (1994). A generalization of the non-parametric Henriksson-Merton test of market timing. Econom. Lett. 44, 1-7.
Poddig, T. (1995). Bankruptcy prediction: A comparison with discriminant analysis. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 311-323.
Poli, I. and R. D. Jones (1994). A neural net model for prediction. J. Amer. Statist. Assoc. 89, 117-121.
Qi, M. (1995). A reexamination of put-call parity on index options: An artificial neural network approach. Paper presented at the 3rd ICSA Statistical Conference, Beijing.
Qi, M. (1996). Applications of generalized nonlinear nonparametric econometric methods (ANNs). Dissertation, The Ohio State University.
Qi, M. and G. S. Maddala (1995a). Option pricing using ANN: The case of S&P 500 index call options. Neural Networks in Financial Engineering: Proc. 3rd Internat. Conf. on Neural Networks in the Capital Markets, London, 78-91.
Qi, M. and G. S. Maddala (1995b). Economic factors and the stock market: A new perspective. Working Paper, Department of Economics, The Ohio State University.
Raghupathi, W., L. L. Schkade and B. S. Raju (1991). A neural network approach to bankruptcy prediction. Proc. IEEE 24th Annual Hawaii Conf. Systems Sciences.
Ramsey, J. B. (1995). If nonlinear models cannot forecast, what use are they? Manuscript, New York University.
Rao, C. R. (1964). The use and interpretation of principal component analysis in applied research. Sankhya Ser. A 26, 329-358.
Refenes, A.-P., ed. (1995a). Neural Networks in the Capital Markets. John Wiley & Sons, Chichester.
Refenes, A.-P. (1995b). Methods for optimal network design. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 33-54.
Refenes, A.-P. and S. Vithlani (1991). Constructive learning by specialization. Proc. Internat. Conf. Artificial Neural Networks, Helsinki, Finland.
Refenes, A.-P., A. D. Zapranis and G. Francis (1994). Stock performance modeling using neural networks: A comparative study with regression models. Neural Networks 7, 375-388.


Refenes, A.-P., A. D. Zapranis and G. Francis (1995). Modeling stock returns in the framework of APT: A comparative study with regression models. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 101-125.
Ripley, B. (1993). Statistical aspects of neural networks. In: O. E. Barndorff-Nielsen, J. Jensen and W. Kendall, eds., Networks and Chaos - Statistical and Probabilistic Aspects. Chapman and Hall, London.
Ripley, B. (1994). Neural networks and related methods for classification. J. Roy. Statist. Soc. Ser. B 56, 409-456.
Rumelhart, D. E., G. E. Hinton and R. J. Williams (1986a). Learning internal representation by error propagation. In: D. E. Rumelhart and J. C. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructures of Cognition 1. MIT Press, Cambridge, 318-362.
Rumelhart, D. E., G. E. Hinton and R. J. Williams (1986b). Learning internal representation by backpropagating errors. Nature 323, 533-536.
Salchenberger, L., E. Cinar and N. Lash (1992). Neural networks: A new tool for predicting bank failures. Decision Sciences 23, 899-916.
Sanger, T. D. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks 2, 459-473.
Schoneburg, E. (1990). Stock price prediction using neural networks: A project report. Neurocomputing 2, 17.
Sen, T. K., R. Oliver and N. Sen (1995). Predicting corporate mergers. In: A.-P. Refenes, eds., Neural Networks in the Capital Markets. John Wiley & Sons, Chichester, 325-340.
Sietsma, J. and R. F. J. Dow (1991). Creating artificial neural networks that generalize. Neural Networks 4, 67-79.
Singleton, J. and A. Surkan (1991). Modeling the judgment of bond rating agencies: Artificial intelligence applied to finance. J. Midwest Finance Assoc. 20, 72-80.
Sprecher, D. A. (1965). On the structure of continuous functions of several variables. Trans. Amer. Math. Soc. 115, 340-355.
Stinchcombe, M. and H. White (1989). Universal approximation using feedforward networks with non-sigmoid hidden layer activation function. Proc. Internat. Joint Conf. Neural Networks, San Diego. IEEE Press, New York, 1, 612-617.
Surkan, A. J. and J. C. Singleton (1990). Neural networks for bond rating improved by multiple hidden layers. Proc. IEEE Internat. Conf. Neural Networks, San Diego, CA, 2, 163-168.
Swanson, N. R. and H. White (1995a). A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks. J. Business Econom. Statist. 13, 265-275.
Swanson, N. R. and H. White (1995b). A model-selection approach to real-time macroeconomic forecasting using linear models and artificial neural networks. Working Paper, Department of Economics, Penn State University.
Tam, K. Y. and Y. M. Kiang (1990). Predicting bank failures: A neural network approach. Appl. Artificial Intelligence 4, 265-282.
Tam, K. Y. and Y. M. Kiang (1992). Managerial application of neural networks: The case of bank failure predictions. Mgmt. Sci. 38, 926-947.
Trippi, R. and E. Turban, eds. (1993). Neural Networks in Finance and Investing. Probus Publishing Company.
Tsibouris, G. C. (1993). Essays on nonlinear models of foreign exchange. Dissertation, University of Wisconsin-Madison.
Utans, J. and J. Moody (1991). Selecting neural network architectures via the prediction risk: Application to corporate bond rating prediction. Proc. 1st Internat. Conf. Artificial Intelligence Applications on Wall Street. IEEE Computer Society Press, Los Alamitos, CA.
Wasserman, P. (1993). Advanced Methods in Neural Computing. Van Nostrand Reinhold, New York.


White, H. (1988). Economic prediction using neural networks: The case of IBM daily stock returns. Proc. IEEE Internat. Conf. Neural Networks.
White, H. (1989a). Learning in artificial neural networks: A statistical perspective. Neural Computation 1, 425-464.
White, H. (1989b). Some asymptotic results for learning in single hidden-layer feedforward network models. J. Amer. Statist. Assoc. 84, 1003-1013.
White, H. (1990). Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings. Neural Networks 3, 535-549.
White, H., A. R. Gallant, K. Hornik, M. Stinchcombe and J. Wooldridge, eds. (1992). Artificial Neural Networks: Approximation and Learning Theory. Blackwell Publishers, Cambridge.
Widrow, B. and M. E. Hoff (1960). Adaptive switching circuits. Institute of Radio Engineers WESCON Convention Record 4, 96-104.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 © 1996 Elsevier Science B.V. All rights reserved.


Applications of Limited Dependent Variable Models in Finance*

G. S. Maddala

1. Introduction

The purpose of the present paper is to review some applications of limited dependent variable models in finance and to suggest some improvements where the methods used are defective. Some of the problems in this area have been discussed in Maddala (1991) but they are reviewed again in the light of more recent research. For the sake of brevity, duration models are excluded from this survey. The specific areas discussed include: (1) Studies on loan discrimination and default, (2) Studies on bond ratings, (3) Event studies, (4) Savings and Loan and bank failures, (5) Miscellaneous other applications: corporate takeovers, corporate choice of debt, market microstructure and futures markets.

2. Studies on loan discrimination and default

Models of discrimination in granting loans usually use logit analysis or discriminant functions. The two are related (see Maddala, 1983 and 1991, Section II). There is a latent variable $R_t^*$ defined as

$$R_t^* = \beta_0 + \beta_1 L_t + \beta_2 C_t + \beta_3 M_t + \varepsilon_t$$

where $R_t^*$ is an unobserved index of the lender's decision to reject, $L_t$ is a vector of loan terms, $C_t$ is a vector of variables measuring credit worthiness, and $M_t$ denotes the demographic characteristics of the borrower (race, sex, age and so on). Coefficients of the variables in $M_t$ that identify protected groups are interpreted as indicators of the presence or absence of discriminatory treatment (or even reverse discrimination).

* I would like to thank Hongyi Li for his help and useful comments in the preparation of this paper.


To provide adequate representation to the protected groups in the sample, it is customary to sample the rejected and accepted applications at different rates. For instance, if the proportion of rejected applicants is 5 percent and that of accepted ones is 95 percent, and one wants to draw roughly a 10 percent sample, then one would sample the rejected applications at a 100 percent rate and the accepted applications at a 5 percent rate. This sampling scheme is known as 'choice-based sampling'. For this reason, it is customary in some financial applications to suggest the use of the Manski-Lerman weighted ML estimator; see e.g. Palepu (1986) and Boyes et al. (1989). However, this estimator was suggested for the McFadden conditional logit model, which includes attributes of the choices. This is not the case with the logit model used in financial applications. In this case only the constant term needs to be adjusted; the slope coefficients and their standard errors are all valid. Suppose the dummy variable is defined as

$$y_i = \begin{cases} 1 & \text{if the observation belongs to group 1} \\ 0 & \text{otherwise.} \end{cases}$$

Let $p_1$ and $p_2$ be the proportions sampled in the two groups. Then, after estimating the logit model from the choice-based sample, the constant term needs to be decreased by $\log p_1 - \log p_2$. (On p. 91 of Maddala (1983) the word "increased" should be "decreased".) Further discussion of the examples in accounting and finance, as well as a detailed criticism of the applicability of the Manski-Lerman estimator and its extensions in these cases, can be found in Maddala (1991, pp. 793-794) and hence will not be repeated here.

Two extensions of the single equation model considered here are worth mentioning: Boyes et al. (1989) and Yezer et al. (1994). They both extend this model to include default probabilities. Boyes et al. consider a two-equation model involving credit granting and default:

$$y_1^* = Z'\alpha_1 + \varepsilon_1, \qquad y_1 = \begin{cases} 1 & \text{if loan granted} \\ 0 & \text{if not;} \end{cases}$$

$$y_2^* = Z'\alpha_2 + \varepsilon_2, \qquad y_2 = \begin{cases} 1 & \text{if loan defaulted} \\ 0 & \text{if not.} \end{cases}$$

They argue that $y_2$ is observed in this censored probit model only if $y_1 = 1$. They extend the Manski-Lerman WESML (weighted exogenous sampling maximum likelihood) estimator to this censored probit model. Although this extension is interesting, there is an alternative CML (conditional maximum likelihood) estimator that is more efficient than the WESML and, as noted earlier, these estimation methods have been suggested in the context of the McFadden conditional logit model. (See the discussion of the CML estimator and these issues in the context of financial models in Maddala (1991, pp. 793-794).)
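The intercept correction described above is easy to apply with any standard logit routine. The following is a minimal sketch on simulated data with hypothetical sampling rates, meant only to illustrate that the slope coefficients are unaffected while the constant is adjusted by $\log p_1 - \log p_2$:

```python
# Hypothetical illustration: a population with roughly 5% rejections, and a
# choice-based sample that keeps all rejections (p1 = 1.0) and 5% of the
# acceptances (p2 = 0.05). Only the logit intercept needs correcting.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=(n, 2))                      # loan terms / creditworthiness proxies
true_beta = np.array([-3.0, 1.0, 0.5])           # intercept chosen to give few rejections
p_reject = 1 / (1 + np.exp(-(true_beta[0] + x @ true_beta[1:])))
y = rng.binomial(1, p_reject)                    # 1 = rejected, 0 = accepted

p1, p2 = 1.00, 0.05                              # sampling rates for y = 1 and y = 0
keep = np.where(y == 1, True, rng.random(n) < p2)
ys, xs = y[keep], x[keep]

res = sm.Logit(ys, sm.add_constant(xs)).fit(disp=0)
beta = res.params.copy()
beta[0] -= np.log(p1) - np.log(p2)               # only the constant is adjusted

print("slopes (valid as estimated):", res.params[1:])
print("intercept: raw %.2f, corrected %.2f, true %.2f"
      % (res.params[0], beta[0], true_beta[0]))
```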


A more important problem in the Boyes et al. paper is the use of the censored probit model itself. It is true that the actual default is observed only for those who have been granted credit, but the default equation is in principle defined for all individuals, and the ex-ante probability of default determines the loan granting process. Thus, if $y_1^*$ is the latent variable determining the probability of granting the loan and $y_2^*$ is the latent variable determining the probability of default, then $y_2^*$ should appear as an explanatory variable in the equation for $y_1^*$; $y_1^*$ and $y_2^*$ are jointly determined.

Yezer et al. (1994) consider a three-equation model consisting of two latent variables, $R_t^*$ (the decision to reject the application on the part of the lender) and $D_t^*$ (determining the probability of default), and the loan terms $L_t$. Their model consists of the equations

$$R_t^* = \beta_0 + \beta_1 L_t + \beta_2 D_t^* + \beta_3 C_t + \beta_4 M_t + \varepsilon_{1t}$$
$$D_t^* = \gamma_0 + \gamma_1 L_t + \gamma_2 C_t + \gamma_3 M_t + \varepsilon_{2t}$$
$$L_t = \alpha_0 + \alpha_1 R_t^* + \alpha_2 D_t^* + \alpha_3 C_t + \alpha_4 M_t + \varepsilon_{3t}$$

where $C_t$ and $M_t$ have been defined earlier, and

$$R_t = \begin{cases} 1 & \text{if loan rejected} \\ 0 & \text{if loan not rejected.} \end{cases}$$

We observe

$$D_t = \begin{cases} 1 & \text{(default) if } D_t^* > 0 \text{ and } R_t = 0 \\ 0 & \text{(repayment) if } D_t^* \le 0 \text{ and } R_t = 0. \end{cases}$$

Note that this system is not identified without more prior information. Yezer et al. do not actually estimate this equation system (they say (p. 242) that this is not easy). Instead, they gather information on some of the parameters in the system from a data set on mortgage lending in Boston made available by the Federal Reserve Bank of Boston and use Monte Carlo methods to investigate the biases in estimates of loan discrimination arising from the use of single equation methods. Their major finding is that there are substantial biases in the estimates of the coefficients of the variables in $M_t$ (measuring loan discrimination) arising from the use of single equation models that do not account for simultaneity and self-selection. From a purely statistical viewpoint there are several deficiencies in the procedures followed by Yezer et al., but the paper addresses some important problems in this area that other papers have ignored, and it provides a useful guide to the literature in this area.

3. Studies on bond ratings and bond yields

The paper by Kaplan and Urwitz (1979) was the earliest in this area to use limited dependent variable models. It criticizes earlier studies and applies the


ordinal probit model developed by McKelvey and Zavoina (1975) to study the determinants of bond ratings. Kao and Wu (1990) extend this to a study of the determinants of bond yields, with the bond yield formulated as a function of default risk and other factors, and default risk measured by bond ratings. The model they consider is as follows:

$$y_{1i} = \beta_1' x_{1i} + \beta_2' x_{2i} + \gamma y_{2i}^* + \varepsilon_{1i}$$
$$y_{2i}^* = \beta_3' x_{2i} + \varepsilon_{2i}$$
$$\mathrm{Cov}(\varepsilon_{1i}, \varepsilon_{2i}) = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$

where

$y_{1i}$ = corporate bond yield,
$y_{2i}^*$ = latent variable measuring default risk,
$x_{1i}$ = a set of explanatory variables that determine bond yields only,
$x_{2i}$ = a set of explanatory variables that determine both bond yields and default risk,
$y_{2i}$ = an observed ordinal variable of bond ratings based on $y_{2i}^*$.

In the first step $\beta_3$ is estimated using the bond ratings $y_{2i}$ and an ordinal probit model. In the next step $E(y_{2i}^*)$ is derived and used as an explanatory variable in the bond yield equation for $y_{1i}$, with $E(y_{2i}^*)$ substituted for $y_{2i}^*$. Kao and Wu derive the asymptotic covariance matrix of the two-step estimators. Since $E(y_{2i}^*)$ is a nonlinear function of $x_{2i}$, there is no multicollinearity problem in the equation for $y_{1i}$ and hence $\beta_2$ and $\gamma$ are estimable. Note that if the errors $\varepsilon_{1i}$ and $\varepsilon_{2i}$ are correlated with correlation $\rho$, then $\beta_2$ and $\gamma$ are not separately estimable by the two-step method, because in this case the equation for $y_{1i}$ can be written as

$$y_{1i} = \beta_1' x_{1i} + \beta_2' x_{2i} + \gamma y_{2i}^* + \frac{\rho\sigma_1}{\sigma_2}\,(y_{2i}^* - \beta_3' x_{2i}) + v_i .$$

Thus, the estimable parameters are $(\gamma + \rho\frac{\sigma_1}{\sigma_2})$ and $(\beta_2 - \rho\frac{\sigma_1}{\sigma_2}\beta_3)$. Kao and Wu do not use the ML method of estimation, which is feasible.

Moon and Stotsky (1993) is another extension of the Kaplan-Urwitz study. They consider municipal bond rating analysis for 892 cities (727 of which have actual ratings, and 165 do not), taking account of sample selectivity and simultaneous equation biases. Their argument is that some cities choose to have a rating because this saves on their interest cost. Thus, a bond rating analysis based on just those cities that have an actual rating introduces a self-selection bias. Their model consists of:

(i) two continuous latent variables $y_1^*$ and $y_2^*$ defined as
$y_1^*$ = propensity to obtain a rating,
$y_2^*$ = measure of credit worthiness;

(ii) two ordered latent variables
$\bar{y}_2$ = potential rating for each city determined by the rating agency,
$\tilde{y}_2$ = the city's perceived potential rating
(it is important to note that the authors assume that $\tilde{y}_2 = \bar{y}_2$, so there is no difference between these two variables);

(iii) two observed variables:

$$y_1 = \begin{cases} 1 & \text{if the city has a bond rating} \\ 0 & \text{otherwise,} \end{cases}$$

$y_2$ = an observed categorical variable giving the actual rating, which is observed only if $y_1 = 1$, i.e., $y_{2i} = \tilde{y}_{2i}$ if $y_{1i} = 1$.

$y_2^*$ determines $\bar{y}_2$, which in turn determines $y_1^*$; $y_1^*$ determines $y_1$, and $y_1$ and $\tilde{y}_2$ determine $y_2$. Let $u_{1i}$ and $u_{2i}$ be the errors in the equations for $y_{1i}^*$ and $y_{2i}^*$, respectively. Moon and Stotsky estimate this model by ML, assuming that $(u_{1i}, u_{2i})$ are bivariate normal with means zero, unit variances and correlation $\rho$. They estimate four cases of this model: with no selectivity ($\bar{y}_2$ does not determine $y_1$) and no simultaneity ($\rho = 0$), with simultaneity only, with selectivity only, and with both selectivity and simultaneity. They compare the four models by comparing the number of correct predictions of the ratings for the 727 cities with actual ratings. They find that correction for simultaneity is more important than correction for sample selectivity bias, but making both corrections gives the best results.
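Returning to the Kao-Wu two-step procedure described earlier in this section, it can be sketched as follows on simulated data: an ordinal probit is fit by maximum likelihood, the conditional mean $E(y_{2i}^* \mid \text{observed rating}, x_{2i})$ is computed from the fitted thresholds, and it is then used as a generated regressor in the yield equation. The parameterization and data are hypothetical, and the asymptotic covariance correction derived by Kao and Wu is not implemented here.

```python
# Minimal two-step sketch: (1) hand-rolled ordinal probit for ratings,
# (2) conditional mean of the latent default-risk index given the observed
# rating category, (3) OLS yield regression with the generated regressor.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y2_star = 1.0 * x2 + rng.normal(size=n)              # latent default risk (sigma2 = 1)
cuts = np.array([-1.0, 0.0, 1.0])                    # hypothetical rating thresholds
rating = np.searchsorted(cuts, y2_star)              # ordinal rating 0..3
yield_ = 2.0 + 0.5 * x1 - 0.3 * x2 + 0.8 * y2_star + rng.normal(scale=0.5, size=n)

def neg_loglik(params):
    """Ordered probit: params = [beta, tau1, log-increments of remaining taus]."""
    beta = params[0]
    tau = np.concatenate(([-np.inf],
                          np.cumsum(np.r_[params[1], np.exp(params[2:])]),
                          [np.inf]))
    lo, hi = tau[rating] - beta * x2, tau[rating + 1] - beta * x2
    return -np.sum(np.log(norm.cdf(hi) - norm.cdf(lo) + 1e-12))

res = minimize(neg_loglik, x0=np.array([0.5, -1.0, 0.0, 0.0]), method="BFGS")
beta3 = res.x[0]
tau = np.concatenate(([-np.inf],
                      np.cumsum(np.r_[res.x[1], np.exp(res.x[2:])]),
                      [np.inf]))

# E(y2* | observed rating category): truncated-normal mean, nonlinear in x2.
a, b = tau[rating] - beta3 * x2, tau[rating + 1] - beta3 * x2
ey2 = beta3 * x2 + (norm.pdf(a) - norm.pdf(b)) / (norm.cdf(b) - norm.cdf(a))

X = np.column_stack([np.ones(n), x1, x2, ey2])       # second-step yield regression
coef = np.linalg.lstsq(X, yield_, rcond=None)[0]
print("second-step estimates (const, beta1, beta2, gamma):", coef.round(2))
```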

4. Event studies

Almost all papers on event studies use some form of the methodology developed by Fama et al. (1969) to determine the economic impact on stock prices of events like stock splits, debt and equity issues, stock buybacks, dividend and earnings announcements and so on. The approach developed by Fama et al. requires a model for expected returns before the event (often the CAPM model is used). This model is then used to determine excess or "abnormal returns" caused by the event. This can also be accomplished more easily by using a dummy variable method, with the dummies defined over the event period. (See Maddala, 1992, chapter 8, on using dummies for prediction.) Subsequent econometric work studied the effects of (i) misspecification of the event date, (ii) changes in volatility caused by the event, and (iii) misspecification of the underlying stochastic process. For a review of some of these problems see Strong (1992). Nimalendran (1994) criticizes the traditional event study methodology on the grounds that it does not model the process by which private information is incorporated into prices through strategic trading. He uses a mixed jump-diffusion model to estimate the separate effects of information surprises and strategic trading around corporate events, and shows the potential of this new methodology in an example of block holding and subsequent targeted purchases, as well as in a simulation study. Another problem with the standard event study methodology is that events are treated as exogenous. With voluntary corporate events such as those cited earlier,


economically motivated managers can control the timing, type and magnitude of the announcements. This introduces a self-selection bias in the estimation of the returns equation used to compute abnormal returns, and necessitates the use of corrections for the truncation of residuals. These problems have been discussed in Acharya (1986, 1988, 1993a) and Eckbo et al. (1990). In the standard event study methodology, as mentioned earlier, the events are treated as exogenous when in fact they are often endogenous. The model considered in Acharya (1993a) is as follows. There is a latent variable $I_{it}^*$ which measures the evaluation by firm $i$, at time $t$, of the net present value of announcing the event minus the net present value of not announcing the event. Let

$$I_{it}^* = \gamma' z_{i,t-1} + \varepsilon_{it}$$

where $z_{i,t-1}$ denotes firm-specific characteristics for firm $i$ at time $t-1$ and $\varepsilon_{it}$ denotes an error. The observed indicator is

$$I_{it} = \begin{cases} 1 & \text{if } I_{it}^* > 0, \text{ i.e., firm } i \text{ announces the event at time } t \\ 0 & \text{otherwise.} \end{cases}$$

It is customary to study the determinants of the announcements by estimating the parameter vector $\gamma$ using the logit model, with data on firms that experienced the event and some (matching) firms that did not. In the next section we shall discuss problems of analysis with such "matched" samples. In any case, the estimation of the logit model implies that the event is endogenous and not exogenous. The returns equation estimated in studies of abnormal returns is

$$R_{it} = \beta_i' X_{it} + u_{it}$$

where $E(u_{it}|X_{it}) = 0$ and $X_{it}$ is a set of firm-specific variables. The computation of the abnormal returns amounts to estimating this model using the dummy variable method. The advantage of the dummy variable method (compared to the procedure of Fama et al., 1969) is that we can readily get the standard errors of the abnormal returns. (See Maddala, 1992, chapter 8.) This methodology is, of course, valid only for the case of exogenous events. For endogenous events, there is a truncated residual problem because $E(u_{it}|I_{it} = 1, X_{it}) \ne 0$. Specifically, if $\mathrm{Cov}(u_{it}, \varepsilon_{it})$ is denoted by $q$ and $\mathrm{Var}(u_{it})$ by $\sigma_u^2$, then we assume that $(u_{it}, \varepsilon_{it})$ have a joint normal distribution with means zero and covariance matrix

$$\begin{pmatrix} \sigma_u^2 & q \\ q & 1 \end{pmatrix} .$$

Then we have

$$E(u_{it}|I_{it} = 1, X_{it}) = q\,\frac{\phi_{it}}{\Phi_{it}}, \qquad E(u_{it}|I_{it} = 0, X_{it}) = -q\,\frac{\phi_{it}}{1 - \Phi_{it}},$$

where $\phi_{it}$ and $\Phi_{it}$ are, respectively, the density function and cumulative distribution function of the standard normal evaluated at $\gamma' z_{i,t-1}$. (See Maddala, 1983, chapter 8.)


We can now write the return equation as

$$R_{it} = \beta_i' X_{it} + q\,w_{it} + v_{it}, \qquad \text{where } w_{it} = I_{it}\,\frac{\phi_{it}}{\Phi_{it}} - (1 - I_{it})\,\frac{\phi_{it}}{1 - \Phi_{it}} .$$

This equation can be estimated using a cross-section of firms that experienced the event and firms that had the possibility but did not experience the event. If the latter group of firms cannot be identified, they can be proxied by non-event observations on firms that experienced the event. The estimation method is a two-stage method. In the first stage we use a probit model to estimate the parameter vector $\gamma$. Then, using $\hat\gamma$ for $\gamma$ in $\phi_{it}$ and $\Phi_{it}$, we estimate the return equation for $R_{it}$. Once we have estimates of $\beta_i$, $q$ and $\sigma_u^2$, we can compute $E(R_{it}|X_{it}, I_{it} = 0)$ for those observations for which $I_{it} = 1$. A measure of the event-induced change in expected return is

$$E(R_{it}|X_{it}, I_{it} = 1) - E(R_{it}|X_{it}, I_{it} = 0) = q\,\frac{\phi_{it}}{\Phi_{it}} + q\,\frac{\phi_{it}}{1 - \Phi_{it}} = \frac{q\,\phi_{it}}{\Phi_{it}(1 - \Phi_{it})} .$$

This is the measure of abnormal return. Note that if, in the estimation of the return equation, $q$ is found to be not significant, then we have an exogenous event and the traditional abnormal return methodology should be used. One can, in principle, use only the event period observations ($I_{it} = 1$) and estimate the return equation as

$$R_{it} = \beta_i' X_{it} + q\,\frac{\phi_{it}}{\Phi_{it}} + v_{it}$$

or, using just the nonevent data ($I_{it} = 0$), estimate the equation

$$R_{it} = \beta_i' X_{it} - q\,\frac{\phi_{it}}{1 - \Phi_{it}} + v_{it} .$$

Acharya (1993a) considers only the first of these two equations and calls it the truncated regression model. Actually, this is a censored regression model, because the explanatory variables are observed for all the observations (see Maddala, 1983, chapter 6). A truncated regression model cannot be estimated by two-stage methods. Applications of this selection model can be found in Acharya (1991, 1993b and 1994) and in Eckbo et al. (1990).
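A minimal sketch of the two-stage procedure just described, on simulated data: a probit for the announcement decision, followed by a return regression that includes the correction term $w_{it}$. The variable names and parameter values are hypothetical, and this is not Acharya's own implementation.

```python
# Two-stage estimation with an endogenous event: probit first stage, then OLS
# on the return equation augmented with w_it built from the fitted index.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                         # firm characteristics (selection eq.)
x = rng.normal(size=n)                         # firm-specific return regressors
eps = rng.normal(size=n)
I = (0.8 * z + eps > 0).astype(float)          # event announced if latent index > 0
q = 0.02                                       # Cov(u, eps): event-induced component
u = q * eps + rng.normal(scale=0.03, size=n)
R = 0.001 + 0.01 * x + u                       # returns for event and non-event firms

# Stage 1: probit for the announcement decision.
probit = sm.Probit(I, sm.add_constant(z)).fit(disp=0)
idx = sm.add_constant(z) @ probit.params
phi, Phi = norm.pdf(idx), norm.cdf(idx)

# Stage 2: return equation with the correction term w_it.
w = I * phi / Phi - (1 - I) * phi / (1 - Phi)
ols = sm.OLS(R, sm.add_constant(np.column_stack([x, w]))).fit()
q_hat = ols.params[-1]
print("estimated q:", round(q_hat, 4))
```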

5. Savings and loan and bank failures

Again, the commonly used methods in this area are discriminant analysis and logit analysis. There are also problems arising from unequal sampling rates of the two groups: failed and non-failed institutions. These problems have been discussed earlier in Section 2. One other method commonly used in this area is that of


creating a "matched sample". Very often, the logit analysis or discriminant analysis is conducted with the failed institutions and a "matched" sample of nonfailed institutions that have characteristics similar to those of each failed institution. This practice widely used in this area gives wrong measures of the effect of the explanatory variables on the failure rate. Consider the following case A: failed institution, B: non-failed institution with the same measured characteristics. The question is: why did A fail and B did not? Clearly, the measured characteristics do not explain why A has failed and B did not. The failure of A and not B has to be attributed to some unmeasured characteristics. Thus, a logit analysis based on "matched" samples cannot tell us anything about the effects of measured characteristics on failure rates. Many of the problems of econometric analysis of savings and loan failure rates have been surveyed in Maddala (1986) and will not be repeated here. Instead some further work, that appeared since the publication of that paper, will be reviewed. Barth et al. (1990) extend the simple failure models to study resolution costs of failed thrift institutions. The model (with a slight change of notation) consists of two equation:
t

zi = fllXli + uli
t

closure rule, cost of resolution equation.

ci = fl2x2i + u2i

The observed dichotomous indicator is Yi= 1 0


if zi >_0

otherwise.

The discussion of the econometric issues concerned with the estimation of this model in Barth et al. is not accurate. There is a discussion of selection bias and Heckman procedure but this is confusing as well. First they define
Yi =

1 0

if the institution is CAAP solvent or resolved otherwise,

i.e., solvent institutions and insolvent but resolved institutions are combined. Next they argue that since the Heckman procedure is not fully efficient, a M L procedure is used to estimate the equation for z~ (closure rule, p. 737). A probit estimation of this equation is the M L procedure and hence, it is not clear what the authors are talking about. It is the Heckman two-stage estimation of the cost of resolution estimation that is not efficient but this is not what the authors are talking about. Barth et al. argue that they were "uncomfortable" with the results of the Heckman procedure and that the value of ~ was outside the unit interval (p is not defined). They, therefore, estimated the cost of resolution equation by the tobit method. However, the tobit model is inapplicable in this case. The tobit model is a censored regression model and the dependent variable is i n p r i n c i p l e defined for all observations but is not observed due to censoring - not being above a


threshold (here zero). In the case under consideration the non-observability is not due to censoring; it is due to a decision not to close the (insolvent) institution. Cole (1990) and Cole, Mckenzie and White (1990) use the selection model to examine the determinants of resolution costs. This is an improvement over the tobit model used by Barth et al. However, a more appropriate model would involve first the determinants of insolvency based on the solvent and insolvent institutions, then the determinants of closure among the insolvent institutions, and then the resolution costs for the closed institutions. The model would then consist of the following equations:

$$y_{1i}^* = \beta_1' x_{1i} + u_{1i},$$

an equation determining insolvency, for which the observed dichotomous variable is

$$y_{1i} = \begin{cases} 1 & \text{if } y_{1i}^* > 0, \text{ institution } i \text{ solvent} \\ 0 & \text{otherwise;} \end{cases}$$

$$y_{2i}^* = \beta_2' x_{2i} + u_{2i},$$

an equation determining closure, for which the observed dichotomous indicator is

$$y_{2i} = \begin{cases} 1 & \text{if } y_{2i}^* > 0 \text{ and the institution is not closed} \\ 0 & \text{otherwise.} \end{cases}$$

The third equation is

$$c_i = \beta_3' x_{3i} + u_{3i},$$

the cost of resolution equation; $c_i$ is observed only if $y_{1i} = 0$ and $y_{2i} = 0$. In models like this there is the question of whether to treat $y_{1i}^*$ and $y_{2i}^*$ as joint decision variables or as sequential decision variables. The problems of classification between joint and sequential decision models, and the analysis of selection bias in the latter models, are discussed in Lee and Maddala (1985). It is important to note that in the joint decision model there is a double selection bias in estimating the cost of resolution equation that needs to be taken into account. A simpler procedure is, of course, to consider only the insolvent institutions and use a single selection model to study resolution costs. Thus solvent institutions would not be combined with those which are insolvent and closed, as in Barth et al.

Cole (1993) analyzes insolvency and closure using a bivariate probit model. Thus he treats $y_{1i}^*$ and $y_{2i}^*$ as joint decision variables. The errors $u_{1i}$ and $u_{2i}$ are assumed to be bivariate normal with zero means, unit variances and correlation $\rho$. There were 3552 institutions, 2513 solvent and 1039 insolvent. Of the insolvent institutions, 769 were closed and 270 were still open. Cole estimates a bivariate probit model using the indicators

$$y_1 = \begin{cases} 1 & \text{for the 2513 solvent institutions} \\ 0 & \text{for the 1039 insolvent institutions} \end{cases}$$

and

$$y_2 = \begin{cases} 1 & \text{for the 2783 non-closed institutions} \\ 0 & \text{for the 769 closed institutions.} \end{cases}$$

The model is estimated using the same explanatory variables for both equations, with the LIMDEP program. The curious result is that $\hat\rho = 0.99$. It has often been observed with the bivariate probit program in LIMDEP that $\hat\rho$ is close to 1. This could be a consequence of the poor starting values that LIMDEP uses; see Maddala (1995) for a discussion of this point. A more important issue in the paper by Cole concerns the use of the joint decision model and the bivariate probit model. The question of closure does not arise for the solvent institutions. Thus, the model has to be treated as a sequential decision model. Cole, in fact, later estimates a probit model using the insolvent institutions only. One important variable explaining the closure decision is the number of months in insolvency. One other point worth mentioning with respect to the sample selection model used in the estimation of resolution costs is that the Heckman two-stage method often referred to is not only not fully efficient, but has recently been found to give worse results than ML, which is easy to implement with current computer technology. See Maddala (1995) for the references on this and the relevant discussion.
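The sequential treatment suggested above can be sketched as follows on simulated data with hypothetical variable names; the double-selection corrections for the cost equation discussed in Lee and Maddala (1985) are deliberately omitted to keep the sketch short.

```python
# Sequential decision structure: insolvency probit on all institutions, closure
# probit on insolvent institutions only, resolution-cost regression on closed ones.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 3500
x = rng.normal(size=(n, 2))                       # balance-sheet characteristics
solvent = (0.5 + x[:, 0] + rng.normal(size=n) > 0).astype(int)
months_insolvent = np.where(solvent == 0, rng.integers(1, 48, n), 0)
closed = ((solvent == 0) &
          (-1.0 + 0.05 * months_insolvent + x[:, 1] + rng.normal(size=n) > 0)).astype(int)
cost = np.where(closed == 1, 10 + 2 * x[:, 1] + rng.normal(size=n), np.nan)

# Stage 1: insolvency probit on all institutions.
ins = sm.Probit(1 - solvent, sm.add_constant(x)).fit(disp=0)

# Stage 2: closure probit on insolvent institutions only (sequential, not joint).
mask = solvent == 0
Z = sm.add_constant(np.column_stack([x[mask], months_insolvent[mask]]))
clo = sm.Probit(closed[mask], Z).fit(disp=0)

# Stage 3: resolution costs for closed institutions (no selection correction here).
cmask = closed == 1
res = sm.OLS(cost[cmask], sm.add_constant(x[cmask])).fit()
print(ins.params.round(2), clo.params.round(2), res.params.round(2), sep="\n")
```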

6. Miscellaneous other applications


6.1. Corporate takeovers

There are two problems that have been analyzed in the context of corporate takeovers: one is that of the determinants of takeovers, and the second is that of the method of financing takeovers - cash, stock or both. In the case of explanatory models of takeovers, the model often used is the logit model. There are two problems in this area. The first is the use of matched samples before the use of logit analysis. The problems with this procedure have been discussed in Section 5. The second problem is that of choice-based samples, or unequal sampling rates of the two groups (takeovers and non-takeovers). For this problem, Palepu (1986) uses the Manski-Lerman estimator. A criticism of this has been presented in Section 2 and in Maddala (1991, pp. 793-794). The other problem is that of the choice of the method of financing takeovers. Amihud et al. (1990) classify firms as choosing stock or cash and use a probit model to study the determinants of the method of financing. Mayer and Walker (1996) consider the trichotomous classification: all cash, all stock, and part cash and part stock. They use the two-limit tobit model (Maddala, 1983, pp. 160-162) to study the choice of payment method in corporate acquisitions. They also extend the analysis in Maddala to cover the case of heteroskedasticity, which they find to be important. In their sample, 115 of the takeovers involved all cash, 32 involved a mixture of cash and stock and 34 involved all stock. The results indicate the usefulness of the two-limit tobit model.

6.2. Corporate choice of debt financing

The earliest studies on corporate choice between short-term and long-term debt used logit models. A recent application that uses the two-stage tobit method is Bronsard et al. (1994). They use data from business surveys conducted by the French National Institute of Statistics (INSEE) during the period May 1979 to December 1988. The surveys are biennial and cover over two thousand firms. The data are qualitative: what is observed is whether the firm used short-term debt, long-term debt, or both. The model Bronsard et al. use is similar to the models used in studies of labor supply with a reservation wage and an offered wage. Denote the short-term interest rate by $r$ and the long-term interest rate by $R$. Bronsard et al. hypothesize that $r^*$ and $R^*$ are the reservation interest rates at which the firm is willing to undertake short-term and long-term debt, respectively, and $r$ and $R$ are the corresponding interest rates offered to the firm by the bank. There are four equations explaining $r$, $R$, $r^*$ and $R^*$ in terms of variables denoting the financial condition of the bank. The two observed variables are

$$y_1 = \begin{cases} \log r & \text{if } \log r \le \log r^* \text{ (short-term debt is observed)} \\ 0 & \text{otherwise} \end{cases}$$

and

$$y_2 = \begin{cases} \log R & \text{if } \log R \le \log R^* \text{ (long-term debt is observed)} \\ 0 & \text{otherwise.} \end{cases}$$

The authors estimate the model by the ML method (although the likelihood function for the full model is not presented in the paper).

6.3. Market microstructure

During recent years there has been increased use of limited dependent variable models in the study of market microstructure. The models that have been used are the ordered probit model, to account for the discreteness of the observations, and the friction model, to allow for no transactions at certain prices. Hausman et al. (1992) use an ordered probit model to study the price impacts of trades of a given size, the tendency towards price reversals from one transaction to the next, and the empirical significance of price discreteness. Bollerslev and Melvin (1994) use an ordered probit model to study the relationship between bid-ask spreads and volatility in the foreign exchange markets, the volatility being measured using a GARCH model. Lesmond (1995) and Lesmond et al. (1995) use the friction model (see Rosett (1959) and Maddala, 1983, chapter 6) to get a new measure of the transaction costs implicit in data on stock returns. They argue that a rational informed investor will trade on new information only if the investor can realize a profit net of transaction costs. Consequently, unless the threshold of transaction costs is exceeded, the price of the security will not change. Using data on zero and non-zero returns, they estimate a friction model. As expected, they find that zero returns occur more frequently among small-firm stocks, for which transaction costs are


likely to be higher. The friction model implicitly gives a measure of transaction costs. These authors find that the transaction costs generated by the friction model are substantially lower than the transaction cost measure usually used, which is the bid-ask spread plus the broker commission.
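The friction model itself is straightforward to estimate by maximum likelihood. The following is a minimal sketch on simulated data with hypothetical cost thresholds: the thresholds $\alpha_1$ and $\alpha_2$ are the implicit transaction costs, and their difference is a round-trip cost measure. This is not the Lesmond et al. implementation.

```python
# Rosett-type friction model: observed return is zero whenever the latent
# return falls between the two cost thresholds; estimate by ML.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 2500
rm = rng.normal(0, 0.01, n)                      # market return (single regressor)
r_star = 1.2 * rm + rng.normal(0, 0.02, n)       # latent "true" return
a1, a2 = -0.01, 0.015                            # hypothetical cost thresholds
r = np.where(r_star < a1, r_star - a1, np.where(r_star > a2, r_star - a2, 0.0))

def neg_loglik(p):
    beta, alpha1, alpha2, sigma = p[0], -np.exp(p[1]), np.exp(p[2]), np.exp(p[3])
    mu = beta * rm
    ll = np.where(
        r < 0, norm.logpdf((r + alpha1 - mu) / sigma) - np.log(sigma),
        np.where(
            r > 0, norm.logpdf((r + alpha2 - mu) / sigma) - np.log(sigma),
            np.log(norm.cdf((alpha2 - mu) / sigma) - norm.cdf((alpha1 - mu) / sigma) + 1e-12),
        ),
    )
    return -ll.sum()

start = np.array([1.0, np.log(0.005), np.log(0.005), np.log(0.02)])
est = minimize(neg_loglik, start, method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10})
alpha1, alpha2 = -np.exp(est.x[1]), np.exp(est.x[2])
print(f"alpha1 {alpha1:.4f}, alpha2 {alpha2:.4f}, round-trip cost {alpha2 - alpha1:.4f}")
```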
6.4. Futures markets

Futures markets are characterized by limits on price movements. The implication of this is that models estimated with such data have to use the disequilibrium models discussed in Maddala (1983, chapter 10). Monroe (1983) applies the disequilibrium model to study demand and supply functions in interest rate futures markets. Other applications of this methodology include studying the effect of margin requirements, and of changes in margin requirements, on price volatility in futures markets.

7. Suggestions for future research

We have surveyed the literature on limited dependent variable models in finance and noted some deficiencies in the methods used. In addition to these, there are two major problems that have not received attention and on which further work needs to be done. These are the problems of non-normality and of incorporating expectations into the models. The first problem is that the papers are mostly based on the assumption of normality; the corrections for selection bias are all based on the normal distribution. It is well known that the assumption of normality is very unreasonable in the case of financial variables (see Chapters 13 and 14 in this volume). In view of this, some specification tests for normality should be standard practice. Such tests in the context of limited dependent variable models are described in Maddala (1995). That paper also gives references to semiparametric methods in limited dependent variable models. These methods should be used to analyze the problems reviewed in the previous sections. The second problem that has been ignored is the incorporation of expectations. In event studies, it is the unexpected component of dividend and earnings announcements, stock repurchases, etc. that has any information content and effect on stock price changes. Similarly, dividend changes depend on expected earnings. Thus, expectations enter almost everywhere in financial modeling. A friction model of dividends with rational expectations is presented in Maddala (1993). Other approaches to incorporating rational expectations in limited dependent variable models are also surveyed in that paper. More work remains to be done in incorporating expectations into the limited dependent variable models in finance surveyed in the previous sections.


References
Acharya, S. (1986). A generalized model of stock price reaction to corporate policy announcement: Why are convertibles called late? Ph.D. Dissertation, Northwestern University, Evanston, Ill.
Acharya, S. (1988). A generalized econometric model and tests of a signalling hypothesis with two discrete signals. J. Finance 43, 413-429.
Acharya, S. (1991). Debt buybacks signal sovereign countries' creditworthiness: Theory and tests. Federal Reserve Board, Working Paper 80.
Acharya, S. (1993a). Value of latent information: Alternative event study methods. J. Finance 48, 363-385.
Acharya, S. (1993b). An econometric model of multi-player corporate merger games. Federal Reserve Board, Working Paper.
Acharya, S. (1994). Measuring gains to bidders and successful bidders. Federal Reserve System, Board of Governors, Working Paper.
Amihud, Y., B. Lev and N. G. Travlos (1990). Corporate control and the choice of investment financing: The case of corporate acquisitions. J. Finance 45, 603-616.
Barth, J. R., P. F. Bartholomew and M. G. Bradley (1990). Determinants of thrift institution resolution costs. J. Finance 45, 731-754.
Bollerslev, T. and M. Melvin (1994). Bid-ask spreads and volatility in the foreign-exchange market. J. Internat. Econom. 36, 355-372.
Boyes, W. J., D. L. Hoffman and S. A. Low (1989). An econometric analysis of the bank credit scoring problem. J. Econometrics 40, 3-14.
Bronsard, C., F. Rosenwald and L. Salvas-Bronsard (1994). Evidence on corporate private debt finance and the term structure of interest rates. INSEE, Discussion Paper, Paris.
Cole, R. A. (1990). Agency conflicts and thrift resolution costs. Federal Reserve Bank of Dallas, Financial Industry Studies Department, Working Paper #3-90.
Cole, R. A. (1993). When are thrifts closed? An agency-theoretic model. J. Financ. Serv. Res. 7, 283-307.
Cole, R. A., J. Mckenzie and L. White (1990). The causes and costs of thrift institution failures. Salomon Brothers Center for the Study of Financial Institutions, Working Paper #S-90-26.
Eckbo, B. E., V. Maksimovic and J. Williams (1990). Consistent estimation of cross-sectional models in event studies. Rev. Financ. Stud. 3, 343-365.
Fama, E. F., L. Fisher, M. Jensen and R. Roll (1969). The adjustment of stock prices to new information. Internat. Econom. Rev. 10, 1-21.
Hausman, J. A., A. M. Lo and A. C. Mackinlay (1992). An ordered probit analysis of transaction stock prices. J. Financ. Econom. 31, 319-379.
Kao, C. and C. Wu (1990). Two-step estimation of linear models with ordinal unobserved variables: The case of corporate bonds. J. Business Econom. Statist. 8, 317-325.
Kaplan, R. S. and G. Urwitz (1979). Statistical models of bond ratings: A methodological inquiry. J. Business 53, 231-261.
Lee, L. F. and G. S. Maddala (1985). Sequential selection rules and selectivity in discrete choice econometric models. Paper presented at the Econometric Society Meetings, San Francisco; reprinted in G. S. Maddala, Econometric Methods and Applications, Vol. II, Edward Elgar, London.
Lesmond, D. A. (1995). Transaction costs and security return behavior: The effect on systematic risk estimation and firm size. Unpublished doctoral dissertation, State University of New York at Buffalo.
Lesmond, D. A., J. P. Ogden and C. A. Trzcinka (1995). Do stock returns reflect investors' trading thresholds? Empirical tests and a new measure of transaction costs. Paper presented at the Silver Anniversary Meeting of the Financial Management Association, New York, October 1995.
Maddala, G. S. (1983). Limited Dependent and Qualitative Variables in Econometrics. New York, Cambridge University Press.
Maddala, G. S. (1986). Econometric issues in the empirical analysis of thrift institutions' insolvency and failure. Federal Home Loan Bank Board, Working Paper 56.


Maddala, G. S. (1991). A perspective on the use of limited-dependent and qualitative variables models in accounting research. Account. Rev. 66, 788-807.
Maddala, G. S. (1993). Rational expectations in limited dependent variable models. In: Handbook of Statistics, Vol. 11, North Holland Publishing Co., Amsterdam, pp. 175-194.
Maddala, G. S. (1995). Specification tests in limited dependent variable models. In: Advances in Econometrics and Quantitative Economics, Essays in Honor of C. R. Rao, Blackwell, Oxford, pp. 1-49.
Mayer, W. J. and M. M. Walker (1996). An empirical analysis of the choice of payment method in corporate acquisitions during 1979-1990. Quart. J. Business Econom. 35, 48-65.
McKelvey, R. and W. Zavoina (1975). A statistical model for the analysis of ordinal level dependent variables. J. Math. Soc. 4, 103-120.
McNichols, M. and A. Dravid (1990). Stock dividends, stock splits, and signaling. J. Finance 45, 857-879.
Monroe, M. A. (1983). On the estimation of supply and demand functions: The case of interest rate futures markets. Res. Financ. 4, 91-122.
Moon, C. G. and J. G. Stotsky (1993). Municipal bond rating analysis. Regional Science and Urban Economics 23, 29-50.
Nimalendran, M. (1994). Estimating the effects of information surprises and trading on stock returns using a mixed jump-diffusion model. Rev. Financ. Stud. 7, 451-475.
Palepu, K. G. (1986). Predicting takeover targets: A methodological and empirical analysis. J. Account. Econom. 8, 3-35.
Rosett, R. (1959). A statistical model of friction in economics. Econometrica 27, 263-267.
Strong, N. (1992). Modelling abnormal returns: A review article. J. Business Financ. Account. 19, 533-553.
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica 26, 24-36.
Yezer, A. M. J., R. F. Phillips and R. P. Trost (1994). Bias in estimates of discrimination and default in mortgage lending: The effects of simultaneity and self-selection. J. Real Estate Financ. Econom. 9, 197-215.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 © 1996 Elsevier Science B.V. All rights reserved.

"~1"~

LU

Testing Option Pricing Models


David S. Bates

1. Introduction

Since Black and Scholes published their seminal article on option pricing in 1973, there has been an explosion of theoretical and empirical work on option pricing. While most papers maintained Black and Scholes' assumption of geometric Brownian motion, the possibility of alternate distributional hypotheses was soon raised. Cox and Ross (1976b) derived European option prices under various alternatives, including the absolute diffusion, pure-jump, and square root constant elasticity of variance models. Merton (1976) proposed a jump-diffusion model. Stochastic interest rate extensions first appeared in Merton (1973), while models for pricing options under stochastic volatility appeared in Hull and White (1987), Johnson and Shanno (1987), Scott (1987), and Wiggins (1987). New models for pricing European options under alternate distributional hypotheses continue to appear; for instance, Naik's (1993) regime-switching model and the implied binomial tree models of Dupire (1994), Derman and Kani (1994), and Rubinstein (1994).

Since options are derivative assets, the central issue in empirical option pricing is whether option prices are consistent with the time series properties of the underlying asset price. Three aspects of consistency (or lack thereof) have been examined, corresponding to second moments, changes in second moments, and higher-order moments. First, are option prices consistent with the levels of conditional volatility in the underlying asset? Tests of this hypothesis include the early cross-sectional tests of whether high-volatility stocks tend to have high-priced options, while more recent papers have tested in a time series context whether the volatility inferred from option prices using the Black-Scholes model is an unbiased and informationally efficient predictor of future volatility of the underlying asset price. The extensive tests for arbitrage opportunities from dynamic option replication strategies are also tests of the consistency between option prices and the underlying time series, although it is not generally easy to identify which moments are inconsistent when substantial profits are reported. Second, the evidence from ARCH/GARCH time series estimation regarding persistent mean-reverting volatility processes has raised the question whether the



term structure of volatilities inferred from options of different maturities is consistent with predictable changes in volatility. There has been some work on this issue, although more recent papers have focussed on whether the term structure of implicit volatilities predicts changes in implicit rather than actual volatilities. Finally, there has been some examination of whether option prices are consistent with higher moments (skewness, kurtosis) of the underlying conditional distribution. The focus here has largely been on explaining the "volatility smile" evidence of leptokurtosis implicit in option prices. The pronounced and persistent negative skewness implicit in U.S. stock index option prices since the 1987 stock market crash is starting to attract attention.

The objective of this paper is to discuss empirical techniques employed in testing option pricing models, and to summarize major conclusions from the empirical literature. The paper focusses on three categories of financial options traded on centralized exchanges: stock options, options on stock indexes and stock index futures, and options on currencies and currency futures. The parallel literature on commodity options is largely ignored; partly because of lack of familiarity, and partly because of unique features in commodities markets (e.g., short-selling constraints in the spot market that decouple spot and futures prices; harvest seasonals) that create unique difficulties for pricing commodity options. The enormous literature on interest rate derivatives deserves its own chapter; perhaps its own book. The tests of consistency between options and time series are divided into two approaches: those that estimate distributional parameters from time series data and examine the implications for option prices, and those that estimate model-specific parameters implicit in option prices and test the distributional predictions for the underlying time series. The two approaches employ fundamentally different econometric techniques. The former approach can in principle draw upon methods of time series-based statistical inference, although in practice few have done so. By contrast, implicit parameter "estimation" lacks an associated statistical theory. A two-stage procedure is therefore commonplace; the parameters inferred from option prices are assumed known with certainty and their informational content is tested using time series data. Hybrid approaches are sorted largely on whether their testable implications are with regard to option prices or the underlying asset price.
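As an illustration of the unbiasedness and efficiency regressions mentioned above, the following sketch regresses subsequently realized volatility on Black-Scholes implied volatility and tests the joint hypothesis that the intercept is zero and the slope is one. Both series are simulated placeholders; in applications they would be constructed from option quotes and returns on the underlying asset.

```python
# Unbiasedness regression: realized_vol = a + b * implied_vol + error,
# with a joint test of a = 0, b = 1 (HAC standard errors for overlap).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T = 300                                           # e.g. monthly observations
true_vol = 0.15 + 0.05 * np.sin(np.linspace(0, 12, T))
implied = true_vol + rng.normal(0, 0.01, T)       # implied vol with noise
realized = true_vol + rng.normal(0, 0.03, T)      # realized vol over option life

X = sm.add_constant(implied)                      # regressors: [const, implied vol]
res = sm.OLS(realized, X).fit(cov_type="HAC", cov_kwds={"maxlags": 6})
print(res.params)                                 # unbiasedness requires (0, 1)
print(res.f_test("const = 0, x1 = 1"))            # joint test of a = 0, b = 1
```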

2. Option pricing fundamentals


2.1. Theoretical underpinnings: actual and "risk-neutral" distributions

The option pricing models discussed in this survey have typically employed special cases of the following general specification:


$$dS/S = [\mu - \lambda\bar{k}]\,dt + \sigma S^{\rho-1}\,dW + k\,dq$$
$$d\sigma = \mu_\sigma(\sigma)\,dt + v(\sigma)\,dW_\sigma$$
$$dr = \mu_r(r)\,dt + v_r(r)\,dW_r \qquad (1)$$

where

$S$ is the option's underlying asset price, with instantaneous (and possibly stochastic) expected return $\mu$ per unit time; $\sigma$ is a volatility state variable; $2(\rho - 1)$ is the elasticity of variance (0 for geometric Brownian motion); $r$ is the instantaneous nominal discount rate; $dW$, $dW_\sigma$, and $dW_r$ are correlated innovations to Wiener processes; $k$ is the random percentage jump in the underlying asset price conditional upon a jump occurring, with $1 + k$ lognormally distributed: $\ln(1 + k) \sim N[\ln(1 + \bar{k}) - \tfrac{1}{2}\delta^2,\ \delta^2]$; and $q$ is a Poisson counter with constant intensity $\lambda$: $\mathrm{Prob}(dq = 1) = \lambda\,dt$.

This general specification nests the constant elasticity of variance, stochastic volatility, stochastic interest rate, and jump-diffusion models. Most attention has focussed upon the Black and Scholes (1973) assumption of geometric Brownian motion:

dS/S = \mu\,dt + \sigma\,dW ,                    (2)

with σ and r assumed constant. Excluded from consideration are option pricing models with jumps in the underlying volatility; e.g., the regime-switching model of Naik (1993). Such models, while interesting and relevant, have not to my knowledge been tested in an option pricing context.

Fundamental to testing option pricing models against time series data is the issue of identifying the relationship between the actual processes followed by the underlying state variables, and the "risk-neutral" processes implicit in option prices. Representative agent equilibrium models such as Cox, Ingersoll, and Ross (1985a), Ahn and Thompson (1988), and Bates (1988, 1991) indicate that European options that pay off only at maturity are priced as if investors priced options at their expected discounted payoffs under an equivalent "risk-neutral" representation that incorporates the appropriate compensation for systematic asset, volatility, interest rate, and jump risk. For instance, a European call option on a non-dividend paying stock that pays off max(S_T − X, 0) at maturity T for exercise price X is priced as

c = E^*\left[\exp\left(-\int_0^T r_t\,dt\right)\max(S_T - X,\,0)\right] .                    (3)

E* is the expectation using the "risk-neutral" specification for the state variables:

dS/S = [r - \lambda^*\bar{k}^*]\,dt + \sigma S^{\rho-1}\,dW^* + k^*\,dq^*
d\sigma = [\mu_\sigma(\sigma) + \Phi_\sigma]\,dt + \nu(\sigma)\,dW_\sigma^*                    (4)
dr = [\mu_r(r) + \Phi_r]\,dt + \nu_r(r)\,dW_r^*

where

\Phi_\sigma = \mathrm{Cov}(d\sigma,\, dJ_W/J_W)
\Phi_r = \mathrm{Cov}(dr,\, dJ_W/J_W)
\lambda^* = \lambda\,E(1 + \Delta J_W/J_W)                    (5)
\bar{k}^* = \bar{k} + \frac{\mathrm{Cov}(k,\, \Delta J_W/J_W)}{E[1 + \Delta J_W/J_W]} ,

and q* is a Poisson counter with intensity λ*. J_W is the marginal utility of nominal wealth of the representative investor, ΔJ_W/J_W is the random percentage jump conditional on a jump occurring, and dJ_W/J_W is the percentage shock in the absence of jumps. The correlations between innovations in the risk-neutral Wiener processes W* are the same as between innovations in the actual processes. The "risk-neutral" specification incorporates the appropriate required compensation for systematic asset, volatility, interest rate, and jump risk. For assets such as foreign currency that pay a continuous dividend yield r*, the risk-neutral process for the asset price is

dS/S = (r - r^* - \lambda^*\bar{k}^*)\,dt + \sigma S^{\rho-1}\,dW^* + k^*\,dq^* .                    (6)

The process for r* must also be modelled if stochastic. Discrete dividend payments on stocks cause a discrete drop in the actual and risk-neutral asset price. The drop is typically assumed predictable in time and magnitude.

Black and Scholes (1973) emphasize the derivation of the "risk-neutral" process under geometric Brownian motion as an equilibrium resulting from the continuous-time capital asset pricing model - a property also captured by the discrete-time equilibrium models of Rubinstein (1976) and Brennan (1979). However, as emphasized by Merton (1973), the Black-Scholes model is relatively unique in that the distributional assumption (2) plus the important assumption of no transaction costs suffice to generate an arbitrage-based justification for pricing options on non-dividend paying stock at discounted expected terminal value under the "risk-neutral" process
dS/S = r\,dt + \sigma\,dW^* ,                    (7)

a feature also shared with other diffusion models for which instantaneous asset volatility is a deterministic function of the asset price. The arbitrage pricing reflects the fact that a self-financing dynamic trading strategy in the underlying asset and risk-free bonds can replicate the option payoff given the distributional restrictions and assumed absence of transaction costs, and that therefore the option price must equal the initial cost of the replicating portfolio. It is, however, important that the Black-Scholes model has an equilibrium as well as a no-arbitrage justification, given that even minuscule transaction costs vitiate the continuous-time no-arbitrage argument and preclude risk-free exploitation of "arbitrage" opportunities.

Other models require some assessment of the appropriate pricing of systematic volatility risk, interest rate risk, and/or jump risk. Standard approaches for pricing that risk have typically involved either assuming the risk is nonsystematic and therefore has zero price (Φ_σ = Φ_r = 0; λ* = λ, k* = k), or imposing a tractable functional form on the risk premium, with extra (free) parameters to be estimated from observed option prices. It has not been standard practice in the empirical option pricing literature to price volatility risk or other sorts of risk using asset pricing models such as the consumption-based capital asset pricing model.¹ These risk premia can potentially introduce a wedge between the "risk-neutral" distribution inferred from option prices and the true conditional distribution of the underlying asset price.

Even in the case of Black-Scholes, it is not possible to test the consistency of option prices and time series without further restrictions on the relationship between the "actual" and "risk-neutral" processes. For whereas the instantaneous conditional volatility should theoretically be identical across both processes, and therefore should be common to both the time series and option prices, estimation of that parameter on the discretely sampled time series data typically available requires restrictions on the functional form of μ. The issue is discussed in Grundy (1991) and Lo and Wang (1995), who point out that strong mean reversion such as μ(S) = β ln(S̄/S) could introduce a substantial disparity between the discrete-time sample volatility and the instantaneous conditional volatility of log-differenced asset prices. Tests of option pricing models therefore also rely to a certain extent on hypotheses regarding the asset market equilibrium for the risk premium μ − r, or alternatively on empirically based knowledge of the appropriate functional form for μ. In the above example, for instance, one might argue in favor of a constant or slow-changing risk premium and against such strong mean reversion as "implausible", either because of the magnitude of the speculative opportunities from buying when S < S̄ and selling when S > S̄, or because of the empirical evidence regarding unit roots in asset prices. Conditional upon a constant risk premium, of course, the probability limit of the volatility estimate from log-differenced asset prices will be the volatility parameter σ observed in option prices, assuming Black-Scholes distributional assumptions.²

1 For the consumption CAPM, the marginal utility of nominal wealth is related to the instantaneous marginal utility of consumption: J_W = U_c(c)/P, where c is real consumption and P is the price level.
2 Fama (1984) noted that the standard rejections of uncovered interest parity could be interpreted, assuming rational expectations, as evidence for a highly time-varying risk premium on foreign currencies. For surveys of the resulting literature, including alternate explanations, see Hodrick (1987), Froot and Thaler (1990) and Lewis (1995).
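The operational content of (3)-(6) is that European options can be valued as discounted expectations of their payoffs under the risk-neutral dynamics. The following sketch, which is not from the chapter, does this by Monte Carlo for a special case with constant volatility and interest rate; it works with the forward price so that dividend or foreign-interest adjustments are absorbed into F, and all parameter values are invented for illustration.

```python
import numpy as np

def mc_call_price_jump_diffusion(F, X, T, r, sigma, lam, k_bar, delta,
                                 n_paths=100_000, n_steps=100, seed=0):
    """Monte Carlo price of a European call under a risk-neutral jump-diffusion
    for the forward price (constant sigma and r assumed):
        dF/F = -lam*k_bar dt + sigma dW* + k dq*,
    with ln(1+k) ~ N(ln(1+k_bar) - delta^2/2, delta^2) and q* a Poisson(lam) counter."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    lnF = np.full(n_paths, np.log(F))
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        n_jumps = rng.poisson(lam * dt, n_paths)
        # Sum of n_jumps iid lognormal jump sizes, in logs
        jump = n_jumps * (np.log(1 + k_bar) - 0.5 * delta**2) \
             + np.sqrt(n_jumps) * delta * rng.standard_normal(n_paths)
        lnF += (-lam * k_bar - 0.5 * sigma**2) * dt + sigma * dW + jump
    payoff = np.maximum(np.exp(lnF) - X, 0.0)    # at maturity the forward equals the spot
    return np.exp(-r * T) * payoff.mean()

# Example: one-year at-the-money-forward call with occasional -5% jumps
print(mc_call_price_jump_diffusion(F=100, X=100, T=1.0, r=0.05,
                                   sigma=0.15, lam=0.5, k_bar=-0.05, delta=0.07))
```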


2.2. Terminology and notation

The forward price F on the underlying asset is the price contracted now for future delivery. For assets that pay a continuous dividend yield, such as foreign currencies, the forward and spot prices are related by the "cost-of-carry" relationship F = Se^{(r-r*)T}, where r is the continuously compounded yield from a discount bond of comparable maturity T, and r* is the continuous dividend yield (continuously compounded foreign bond yield for foreign currency). For stock options with known discrete dividend payments, the comparable relationship is F = e^{rT}[S - \sum_t e^{-r_t t}D_t], where dividends are discounted at the relevant discount bond yields r_t. Futures prices have zero cost of carry.

A call option will be referred to as in-the-money (ITM), at-the-money (ATM), or out-of-the-money (OTM) if the strike price is less than, approximately equal to, or greater than the forward price on the underlying asset. For futures options, the futures price will be used instead of the forward price. Similarly, put options will be in-, at-, or out-of-the-money if the strike is greater than, approximately equal to, or less than the forward or futures price. This is standard terminology in most of the literature, although some use the spot price/strike price relationship as a gauge of moneyness. An ITM put corresponds in moneyness to an OTM call.

European call and put options that can be exercised only at maturity will be denoted c and p respectively, while American options that can be exercised at any time prior to maturity will be denoted C and P. The intrinsic value of a European option is the discounted difference between the forward and strike prices: e^{-rT}(F - X) for calls, e^{-rT}(X - F) for puts. The intrinsic value of American options is the value attainable upon immediate exercise: S - X for calls, X - S for puts. Intrinsic value is important as an arbitrage-based lower bound on option prices. The time value of an option is the difference between the option price and its intrinsic value.

The implicit volatility is the value for the annualized standard deviation of log-differenced asset prices that equates the theoretical option pricing formula premised on geometric Brownian motion with the observed option price. It is also commonly if ungrammatically called the "implied" volatility. Implicit volatilities should in principle be computed using an American option pricing formula when options are American, although this is not always done. Historical volatility is the sample standard deviation for log-differenced asset prices over a fixed window preceding the option transaction; e.g., 30 days.
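These definitions are mechanical enough to express directly in code. The helpers below are not from the chapter: they compute cost-of-carry forward prices, classify moneyness relative to the forward, and compute European intrinsic value; the tolerance used to call an option "approximately" at-the-money is an arbitrary choice.

```python
import numpy as np

def forward_from_yield(S, r, r_star, T):
    """Cost-of-carry forward price for an asset paying a continuous yield r_star."""
    return S * np.exp((r - r_star) * T)

def forward_from_dividends(S, r, T, divs):
    """Forward price for a stock with discrete dividends, each given as a
    (payment time t_i, amount D_i, discount-bond yield r_i) tuple."""
    pv = sum(D * np.exp(-r_i * t) for t, D, r_i in divs)
    return (S - pv) * np.exp(r * T)

def moneyness(F, X, kind="call", tol=0.005):
    """ITM/ATM/OTM classification relative to the forward price."""
    signed = np.log(F / X) if kind == "call" else np.log(X / F)
    if abs(signed) < tol:
        return "ATM"
    return "ITM" if signed > 0 else "OTM"

def european_intrinsic(F, X, r, T, kind="call"):
    """Discounted intrinsic value - the arbitrage-based lower bound for European options."""
    payoff = (F - X) if kind == "call" else (X - F)
    return np.exp(-r * T) * max(payoff, 0.0)

# Example: a 6-month currency call quoted against the forward
F = forward_from_yield(S=1.60, r=0.05, r_star=0.03, T=0.5)
print(F, moneyness(F, X=1.58), european_intrinsic(F, X=1.58, r=0.05, T=0.5))
```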

2.3. Tests of no-arbitrage conditions


A necessary prerequisite for testing the consistency of time-series distributions and option prices is that option prices satisfy certain basic no-arbitrage constraints. First, call and put option prices relative to the synchronous underlying asset price cannot be below intrinsic value, while American option prices cannot be below European prices. Second, American and European option prices must be monotone and convex functions of the strike price.
Third, synchronous European call and put prices of common strike price and maturity must satisfy put-call parity, while synchronous American call and put prices must satisfy specific inequality constraints discussed in Stoll and Whaley (1986). Violation of these constraints either implies rejection of the fundamental economic hypothesis of nonsatiation, or more plausibly indicates severe market synchronization or data recording problems, bid-ask spreads, or transaction costs that have not been taken into account. Furthermore, as discussed in Cox and Ross (1976a), these no-arbitrage constraints reflect extremely fundamental properties of the risk-neutral distribution implicit in option prices. Monotonicity in European option prices with respect to the strike price is equivalent to the risk-neutral distribution function being nondecreasing, while convexity is equivalent to risk-neutral probability densities being nonnegative. If these no-arbitrage constraints are severely violated, there is no distributional hypothesis consistent with observed option prices.

In general, there is reason to be skeptical of papers that report arbitrage violations based on Wall Street Journal closing prices for options and for the underlying asset. Option prices are extremely sensitive to the underlying asset price, and a lack of synchronization by even 15 minutes can yield substantial yet spurious "arbitrage" opportunities. An early illustration is provided in Galai (1979), who found that most of the convexity violations observed for Chicago Board Options Exchange (CBOE) stock option closing prices over April to October, 1973 (24 violations out of 1000 relevant observations) disappeared when intradaily transactions data were used.

Nevertheless, studies that use more carefully synchronized transactions data have found that substantial proportions of option prices violate lower bound constraints. Bhattacharya (1983) examined CBOE American options on 58 stocks over August 24, 1976 to June 2, 1977 and found that 1,120 out of 86,137 records (1.30%) violated the immediate-exercise lower bound, while 1,304 quotes out of a 54,735-record subset of the data (2.38%) violated the European intrinsic value lower bound. Bhattacharya found very few violations net of estimated transaction costs, however. Culumovic and Welsh (1994) found that the proportion of CBOE stock option lower bound violations had declined by 1987-89, but was still substantial. Evnine and Rudd (1985) examined the CBOE's American options on the S&P 100 index and the American Stock Exchange's options on the Major Market Index using on-the-hour data over June 26 to August 30, 1984, during the first year the contracts were offered. They found 2.7% of the S&P 100 call quotations and 1.6% of the MMI call quotations violated intrinsic-value bounds, all during turbulent market conditions in early August. The underlying indexes are not traded contracts, but rather aggregate prices on the constituent stocks. Consequently, the apparent arbitrage opportunities were not easily exploitable, and may reflect deviations of the reported index from its "true" value because of stale prices.


Bodurtha and Courtadon (1986) examined Philadelphia Stock Exchange (PHLX) American foreign currency options for five currencies during the market's first two years (February 28, 1983 to September 14, 1984), and found that .9% of the call transaction prices and 6.7% of the put prices violated the immediate-exercise lower bounds computed from the Telerate spot quotations provided by the exchange. Most violations disappeared when transaction costs were taken into account. Ogden and Tucker (1987) examined 1986 pound, Deutschemark, and Swiss franc call and put options time-stamped off the nearest preceding CME foreign currency futures prices. They found only .8% violated intrinsic-value bounds, and that most violations were small. Bates (1996b) found roughly 1% of the PHLX Deutschemark call and put transaction prices over January 1984 to June 1991 mildly violated intrinsic value bounds computed from futures prices. Hsieh and Manas-Anton (1988) examined noon transactions for Deutschemark futures options during the first year of trading (January 24 to October 10, 1984), and found 1.03% violations for calls and .61% for puts, all of which were less than 4 price ticks.

Violations of intrinsic value constraints will only be observed for short-maturity, in-the-money and deep-in-the-money options with little time value remaining - a small proportion of the options traded at any given time. The magnitude rather than the frequency of violations is consequently more relevant. The fact that the violations are generally less than estimated transaction costs is reassuring, and suggests that the violations may originate either in imperfect synchronization between the options market and underlying asset market, or in bid-ask spreads. Further evidence of imperfect synchronization is provided by Stephan and Whaley (1990), who found that stock options lagged behind price changes in individual stocks by as much as 15 minutes in 1986, and by Fleming, Ostdiek, and Whaley (1996), who found that S&P 100 stock index options anticipated subsequent changes in the underlying stock index by about 5 minutes over January 1988 to March 1991. The violations suggest measurement error in the observed option price/underlying asset price relationship even for high-quality intradaily transactions data.
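The constraints described in this section are straightforward to check mechanically. The sketch below, which is only an illustration and not taken from any of the studies cited, flags violations for a single-maturity cross-section of synchronous European quotes; the tolerance and the assumption of a sorted, evenly spaced strike grid (needed for the simple second-difference convexity check) are simplifications.

```python
import numpy as np

def check_european_no_arbitrage(strikes, calls, puts, F, r, T, tol=1e-8):
    """Flag violations of the basic no-arbitrage restrictions for synchronous
    European quotes of one maturity: (i) prices at or above intrinsic value,
    (ii) monotonicity and convexity in the strike, (iii) put-call parity
    c - p = exp(-rT)(F - X).  Assumes a sorted, evenly spaced strike grid."""
    X = np.asarray(strikes, float)
    c, p = np.asarray(calls, float), np.asarray(puts, float)
    disc = np.exp(-r * T)
    flags = {}
    flags["call_below_intrinsic"] = X[c < disc * np.maximum(F - X, 0) - tol]
    flags["put_below_intrinsic"]  = X[p < disc * np.maximum(X - F, 0) - tol]
    flags["call_not_decreasing"]  = X[1:][np.diff(c) > tol]       # calls should fall in X
    flags["put_not_increasing"]   = X[1:][np.diff(p) < -tol]      # puts should rise in X
    flags["call_not_convex"]      = X[1:-1][np.diff(c, 2) < -tol]
    flags["put_not_convex"]       = X[1:-1][np.diff(p, 2) < -tol]
    flags["parity_gap"]           = (c - p) - disc * (F - X)      # should be roughly zero
    return flags

# Example with a small, hypothetical strike grid
out = check_european_no_arbitrage([90, 100, 110], [12.1, 5.6, 2.0],
                                  [1.8, 5.2, 11.5], F=100.5, r=0.05, T=0.25)
print(out)
```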

3. Time series-based tests of option pricing models


3.1. Statistical methodologies

If log-differenced asset prices were drawn from a stationary distribution, such as the Gaussian distribution for log-differenced asset prices assumed by Black and Scholes (1973), then empirical tests of the consistency of option prices with time series data would be relatively easy. The methods of estimating the parameters of stationary distributions are well-established, and the resulting testable implications for option prices are straightforward applications of statistical inference. For instance, Lo (1986) proposed maximum likelihood parameter estimation, which given the invariance properties yields maximum likelihood estimates of option prices conditional upon time series information. Associated asymptotic confidence intervals for option prices can similarly be established, based upon asymptotic unbiasedness and normality of estimated option prices. For the lognormal distribution, the maximum likelihood estimator for data spaced at regular time intervals \Delta t is of course

\hat{\sigma}^2\,\Delta t = \frac{1}{N}\sum_{n=1}^{N}\left[\ln(S_n/S_{n-1}) - \overline{\ln(S_n/S_{n-1})}\right]^2 ,                    (8)

closely related to the usual unbiased estimator of variance

s^2\,\Delta t = \frac{1}{N-1}\sum_{n=1}^{N}\left[\ln(S_n/S_{n-1}) - \overline{\ln(S_n/S_{n-1})}\right]^2 .                    (9)
And since under geometric Brownian motion, N can be increased either by using more observations or by sampling at higher frequency, arbitrarily tight confidence regions could in principle be constructed for testing whether observed option prices are consistent with the underlying time series. The only caveat is the distinction between the actual and "risk-neutral" mean of the distribution - which, however, becomes decreasingly important as the data sampling frequency increases. The approach of using high-frequency (e.g., intradaily) data for academic tests was initially precluded by lack of data, and subsequently by the recognition of substantial intradaily market microstructure effects such as bid-ask bounce that reduce the usefulness of that data. The appeal of extending the length of the data sample was reduced by the recognition of time-varying volatility.

Tests of the Black-Scholes model have, therefore, typically involved some recognition that the model is misspecified and that its underlying distributional assumption of constant-volatility geometric Brownian motion with probability one is false. Assorted alternate estimators premised on geometric Brownian motion have been proposed for deriving time series-based predictions of appropriate option prices conditional on the use of a relatively short data interval. Parkinson's (1980) high-low estimator exploits the information implicit in the standard reporting of the day's high and low for a stock price, assuming intradaily geometric Brownian motion. Garman and Klass (1980) discuss potential sources of bias in Parkinson's volatility estimate, including noncontinuous recording (which biases reported highs and lows), bid-ask spreads, and the (justified) concern that intradaily and overnight volatility can diverge. Butler and Schachter (1986) note that although sample variance is an unbiased estimator of the true variance, pricing options off of sample variance yields biased option price estimates given the nonlinear transformation. They consequently develop the small-sample minimum-variance unbiased estimator for Black-Scholes option prices, by expanding option prices in a power series in σ and using unbiased estimators of the powers of σ based upon the postulated normal distribution for log-differenced asset prices. Butler and Schachter (1994), however, subsequently conclude that the small-sample bias
induced by using a 30-day sample variance is negligible for standard tests of option market efficiency, especially relative to the noise in the small-sample volatility estimate. Bayesian methods have been proposed that exploit prior information regarding the volatility (Boyle and Ananthanarayanan (1977)) or the cross-sectional distribution of volatilities across different stocks (Karolyi (1993)). Finally, of course, the enormous literature on ARCH and GARCH models explicitly addresses the issue of optimally estimating conditional variances when volatility is time-varying. The potential value of these methods for option markets is examined by Engle, Kane, and Noh (1993), who conduct a trading game in volatility-sensitive straddles (1 ATM call + 1 ATM put) between fictitious traders who use alternative variance forecasting techniques. They conclude based on 1968-91 stock index data that GARCH(1,1) traders would make substantial profits off moving-average "historical" volatility traders, especially when trading very short-maturity straddles. Their results are substantially affected by the 1987 stock market crash, however.
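As a concrete illustration of the estimators in (8)-(9), the following sketch computes the maximum likelihood and unbiased volatility estimates from a price series, together with the asymptotic confidence interval discussed above; the 252-day annualization and the simulated data are illustrative assumptions, not part of the chapter.

```python
import numpy as np

def ml_volatility(prices, dt):
    """Annualized volatility estimates from log-differenced prices:
    the ML estimator (divide by N) and the unbiased variant (divide by N-1),
    plus an asymptotic 95% confidence interval using s.e.(sigma_hat) ~ sigma/sqrt(2N)."""
    x = np.diff(np.log(np.asarray(prices, float)))
    N = x.size
    demeaned = x - x.mean()
    sigma_ml = np.sqrt(np.sum(demeaned**2) / N / dt)
    sigma_unbiased = np.sqrt(np.sum(demeaned**2) / (N - 1) / dt)
    half_width = 1.96 * sigma_ml / np.sqrt(2 * N)
    return sigma_ml, sigma_unbiased, (sigma_ml - half_width, sigma_ml + half_width)

# Example on simulated daily prices (252 trading days per year assumed)
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(0.15 * np.sqrt(1/252) * rng.standard_normal(500)))
print(ml_volatility(prices, dt=1/252))
```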

3.2. The Black-Scholes model

3.2.1. Option pricing

The original Black-Scholes specification of geometric Brownian motion for the underlying asset price has been and continues to be the dominant option pricing model, against which all other models are measured. For European call options, the Black-Scholes formula can be written as

c = e^{-rT}\left\{ F\,N\!\left[\frac{\ln(F/X) + \sigma^2 T/2}{\sigma\sqrt{T}}\right] - X\,N\!\left[\frac{\ln(F/X) - \sigma^2 T/2}{\sigma\sqrt{T}}\right] \right\} ,                    (10)

where F is the forward price on the underlying asset, T is the maturity of the option, X is the strike price, r is the continuously compounded interest rate, σ² is the instantaneous conditional variance per unit time, and N(·) is the normal distribution function.³ A related formula evaluates European put options. American call and put option prices depend on similar inputs but generally have no closed-form solutions, and must be evaluated numerically. The dominance of the Black-Scholes model is reflected in the fact that the implicit volatility - the value of σ that equates the appropriate option pricing formula to the observed option price - has become the standard method of quoting option prices.

3 The classic Black-Scholes (1973) formula can be obtained from (10) using F = Se^{rT}, which is the appropriate forward price on a non-dividend paying asset.
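A minimal implementation of (10) and of the implicit-volatility inversion it defines is sketched below for the European case; American options would require a numerical early-exercise adjustment, as noted above, and the numerical inputs here are arbitrary.

```python
from math import erf, exp, log, sqrt
from scipy.optimize import brentq

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_call(F, X, T, r, sigma):
    """European call written on the forward price, as in eq. (10)."""
    d1 = (log(F / X) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d2))

def implicit_volatility(price, F, X, T, r, lo=1e-4, hi=5.0):
    """Volatility that equates eq. (10) to an observed European call price."""
    return brentq(lambda s: black_call(F, X, T, r, s) - price, lo, hi)

# Round-trip check: price an option at 20% volatility, then back out sigma
px = black_call(F=100, X=105, T=0.5, r=0.05, sigma=0.20)
print(px, implicit_volatility(px, 100, 105, 0.5, 0.05))
```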


Most theoretical option pricing papers have maintained the geometric Brownian motion assumption in some form, and have focussed upon the impact of dividends and/or early exercise upon option valuation. While Black and Scholes (1973) assumed non-dividend paying stocks, European option pricing extensions to stocks with constant continuous dividend yields (Merton (1973)), currency options (Garman and Kohlhagen (1983)), and futures options (Black (1976b)) proved straightforward and are nested in the above formula. The discrete dividend payments observed with stocks proved more difficult to handle, especially in conjunction with the American option valuation problem. For tractability reasons, papers such as Whaley (1982) assumed that the forward price rather than the cum-dividend stock price follows geometric Brownian motion.⁴ This yields a relatively simple formula for American call options when at most one dividend payment will be made, and permits recombinant lattice techniques for numerically evaluating American options under multiple dividend payments (Harvey and Whaley (1992a)).

Evaluating the early-exercise premium associated with American options has proved formidable even under geometric Brownian motion. Computationally intensive numerical solutions to the underlying partial differential equation are typically necessary, although good approximations can be found in some cases.⁵ And although Kim (1990) and Carr, Jarrow, and Myneni (1992) have provided a clearer understanding of the "free-boundary" American option valuation problem, this has only recently yielded more efficient American option valuation techniques.⁶ Concerns over the correct specification of boundary conditions and their impact on option prices continue to surface (e.g., the "wild card" feature of S&P 100 index options discussed in Valerio (1993)), and are of course fundamental to exotic option valuation.

A major issue in the early empirical literature was whether the use of European option pricing models with ad hoc corrections for the early-exercise premium was responsible for reported option pricing errors; e.g., Whaley (1982), Sterk (1983), and Geske and Roll (1984). Many papers consequently concentrated upon cases in which American option prices are well approximated by their European counterparts. For stock options, this involves examining only call options on stocks with no or low dividend payments. American call (put) currency options are well approximated by European currency option prices when the domestic interest rate is greater (less) than the foreign interest rate (Shastri and Tandon (1986)).

4 Whaley's assumption that the stock price net of the present value of escrowed dividends follows geometric Brownian motion is equivalent to the assumption of geometric Brownian motion for the forward price F = e^{rT}[S - \sum_t e^{-r_t t}D_t].
5 Examples include the MacMillan (1987) and Barone-Adesi and Whaley (1987) quadratic approximation for pricing American options on geometric Brownian motion. A good survey of the efficiency of alternative numerical methods is in Broadie and Detemple (1996).
6 See, e.g., Allegretto, Barone-Adesi, and Elliott (1995) and Broadie and Detemple (1996).


3.2.2. Tests of the Black-Scholes model

There have in fact been relatively few papers that estimate volatility from the past history of log-differenced asset prices, and then test whether observed option prices are consistent with the resulting predicted Black-Scholes option prices. One reason is that the no-arbitrage foundations of the Black-Scholes model suggested proceeding directly to a "market efficiency" test of the profits from dynamic option replication, as in Black and Scholes (1972). A second factor was that early recognition of time-varying volatility made it more natural to reverse the test and examine whether volatilities inferred from option prices did in fact correctly assess future asset volatility. The former tests are discussed in the following section; the latter are surveyed in Section 4.3 below.

Nevertheless, several papers used cross-sectional and event study methodologies to examine the overall consistency of stock volatility with stock option prices. Black and Scholes (1972) and Latané and Rendleman (1976) did find that high-volatility stocks tended to have high option prices (equivalently, high implicit volatilities). However, Black and Scholes (1972) expressed concern that the cross-sectional relationship was imperfect, with high-volatility stocks overpredicting and low-volatility stocks underpredicting subsequent option prices. Black and Scholes examined over-the-counter stock options during 1966-69; but a similar relationship was found by Karolyi (1993) for CBOE stock options over 1984-85. The possibility that this originates in an errors-in-variables problem given noisy volatility estimates has not as yet been ruled out. Choi and Shastri (1989) conclude that bid/ask-related biases in volatility estimation cannot explain the puzzle. Blomeyer and Johnson (1988) found that Parkinson (1980) stock volatility estimates substantially underestimated stock put option prices in 1978 even after adjusting for the early-exercise premium.

Event studies of predictable volatility changes have had mixed results. Patell and Wolfson (1979) found that stock implicit volatilities increased up until earnings announcements and then dropped substantially, which is consistent with predictable changes in uncertainty. Maloney and Rogalski (1989) found that predictable end-of-year and January seasonal variations in common stock volatility were in fact reflected in call option prices. By contrast, Sheikh (1989) found that predictable increases in stock volatility following stock splits were not reflected in CBOE option prices over 1976-83 at the time the split was announced, but did influence option prices once the split had occurred.

Cross-sectional evidence for currency and stock index options appears qualitatively consistent with the risk on the underlying assets. Implicit volatilities reported in Lyons (1988) for Deutschemark, pound and yen options over 1984-85 are comparable in magnitude to the underlying currency volatility of 10-15% per annum. Options on S&P 500 futures typically had implicit volatilities of 15-20% over the three years prior to the stock market crash of 1987 (Bates (1991)), which is comparable in magnitude to standard estimates of pre-crash stock market volatility. That high-volatility assets typically have options with high implicit volatilities is reassuring, especially given volatilities ranging from 5% on the Canadian dollar
to 30%-40% on individual stocks. The evidence of time-varying volatility from implicit volatilities and from ARCH/GARCH models is sufficiently pronounced as to call into question the utility of more detailed time series/option price comparisons premised upon constant volatility.

3.2.3. Trading strategy tests of option market efficiency

Starting with Black and Scholes (1972), many have tested for dynamic arbitrage opportunities that would indicate option mispricing. Such tests start with some assessment of volatility; Black and Scholes used historical volatility from the preceding year, while others have used lagged daily implicit volatilities. All options on a given day are evaluated using the Black-Scholes model (or an American option variant) and "overvalued" and "undervalued" options are identified. Appropriate option positions are taken along with an offsetting hedge position in the underlying asset that is adjusted daily using a "delta" based on the assessed volatility. Any resulting substantial and statistically significant profits are interpreted as a rejection of the Black-Scholes model. Profits are often reported net of the transaction costs associated with the daily alterations in the hedge positions. Since daily hedging is typically imperfect and profits are risky, average profits are sometimes reported on a risk-adjusted basis using Sharpe ratios or Jensen's alpha.⁷

The major problem with market efficiency tests is that they are extremely vulnerable to selection bias. Imperfect synchronization with the underlying asset price and bid-ask spreads (on options or on the underlying asset) can generate large percentage errors in option prices, especially for low-priced out-of-the-money options.⁸ Consequently, even a carefully constructed ex ante test that only uses information from earlier periods doesn't guarantee that one can actually transact at the option price/asset price combination identified as "overvalued" or "undervalued". An illustration of this is Shastri and Tandon's (1987) observation with transactions data that delaying exploitation of apparent opportunities by a single trade dramatically reduces average profits. The problem is of course exacerbated in early studies that used badly synchronized closing price data.

7 See Galai (1983) for a survey of early market efficiency tests.
8 The elasticity of the Black-Scholes option price with regard to the underlying asset price approaches infinity for options increasingly out-of-the-money, indicating a large impact from small percentage errors in the appropriate underlying asset price. George and Longstaff (1993) report that bid-ask spreads on S&P 100 index options ranged from 2% to 20% of the option price in 1989.

A further statistical problem is that the distribution of profits from option trading strategies is typically extremely skewed and leptokurtic. This is obviously true for unhedged option positions, since buying options involves limited liability but substantially unlimited potential profit. Merton (1976) points out that this is also the case with delta-hedged positions and specification error. If the true process is a jump-diffusion and options are priced correctly, profits from a correctly delta-hedged option position follow a pure jump process: "excess" returns most of the time that are offset by substantial losses on those occasions when the asset price jumps. And although skewed and leptokurtic profit distributions may not pose problems asymptotically, whether t-statistic tests of no average excess returns are reliable on the 1-3 year samples typically used has not been investigated.

A third problem with most "market efficiency" studies is that they give no information about which options are mispriced. The typical approach pools options of different strike prices, maturities, even options on different stocks. The "underpriced" options are purchased, the "overpriced" are sold, and the overall profits are reported. Such tests do constitute a valid test of the hypothesis that all options are priced according to the Black-Scholes model - subject, of course, to the data and statistical problems noted above. However, the omnibus rejections reported offer little guidance as to why Black-Scholes is rejected, and which alternative distributional hypotheses would do better. More detail is needed. Bad market volatility assessments, for instance, would affect all options, while mispriced higher moments affect options of different strike prices differently. Greater detail would also be useful in identifying whether the major apparent profit opportunities are in out-of-the-money options, which are especially vulnerable to data problems. Studies such as Fleming (1994) that restrict attention to at-the-money calls and puts appear more reliable and informative.

Many studies find excess profits that disappear after taking into account the transaction costs from hedging the position in discrete time; e.g., Fleming (1994). While relevant from a practitioner's viewpoint, these failures to reject Black-Scholes are not conclusive. Transaction costs vitiate the arbitrage-based foundation of Black-Scholes, and it is not surprising that few arbitrage opportunities net of transactions costs are found under daily hedging. The model does, however, have equilibrium as well as no-arbitrage foundations. Testing these requires examining whether investing in or writing "mispriced" options represents a speculative opportunity with an excessively favorable return/risk tradeoff. Unfortunately, testing option pricing models in an asset pricing context requires substantially longer data bases than those employed hitherto - especially given the skewed and leptokurtic properties of option returns.
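A stylized version of such a trading-strategy test is sketched below. It writes a single call at a Black-Scholes value computed from an assumed volatility and delta-hedges it daily, while the asset actually follows geometric Brownian motion at a different volatility. All parameter values are illustrative, and the frictions, bid-ask spreads, and synchronization problems emphasized above are ignored; real studies work with observed option prices rather than model prices.

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call_and_delta(S, X, tau, r, sigma):
    """Black-Scholes price and delta of a call on a non-dividend-paying asset."""
    d1 = (log(S / X) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * norm_cdf(d1) - X * exp(-r * tau) * norm_cdf(d2), norm_cdf(d1)

def write_and_hedge_pnl(S0=100.0, X=100.0, T=0.25, r=0.05, sigma_model=0.25,
                        sigma_true=0.20, n_steps=63, seed=0):
    """Terminal P&L from writing one call at the value implied by sigma_model and
    delta-hedging it daily, when the asset follows GBM with volatility sigma_true.
    On average the writer gains when sigma_model exceeds the realized volatility."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = S0
    price, delta = bs_call_and_delta(S, X, T, r, sigma_model)
    cash = price - delta * S                     # premium received, hedge shares bought
    for i in range(1, n_steps + 1):
        S *= exp((r - 0.5 * sigma_true**2) * dt + sigma_true * sqrt(dt) * rng.standard_normal())
        cash *= exp(r * dt)                      # interest on the cash account
        tau = T - i * dt
        if tau > 0:                              # rebalance the hedge before expiry
            new_delta = bs_call_and_delta(S, X, tau, r, sigma_model)[1]
            cash -= (new_delta - delta) * S
            delta = new_delta
    return cash + delta * S - max(S - X, 0.0)    # liquidate hedge, settle the written call

# Average hedged profit over many simulated paths
print(np.mean([write_and_hedge_pnl(seed=s) for s in range(500)]))
```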

3.3. The constant elasticity of variance model

The constant elasticity of variance (CEV) option pricing model

dS/S = \mu\,dt + \sigma S^{\rho-1}\,dW                    (11)

first appeared in Cox and Ross (1976b) for the special cases ρ = 1/2 and ρ = 0. The more general model subsequently appeared in MacBeth and Merville (1980), Emmanuel and MacBeth (1982), and Cox and Rubinstein (1985). The model received attention for several reasons. First, the model is grounded in the same
no-arbitrage argument as the Black-Scholes model. Second, the model is consistent with Black's (1976a) observation that volatility changes are negatively correlated with stock returns - a correlation subsequently if somewhat misleadingly referred to as "leverage effects."⁹ As such, there was initially some hope that the model could both explain and identify time-varying volatility. Third, the model is potentially consistent with option pricing biases relative to the Black-Scholes model. Fourth, the model is compatible with bankruptcy. Recent models of "implied binomial trees" (Dupire (1994), Derman and Kani (1994), and Rubinstein (1994)), which model instantaneous conditional volatility as a flexible but deterministic function of the asset price and time, can be viewed as generalizations of the CEV model.

9 Black (1976a) noted that models of financial or operational leverage (i.e., that stockholders receive corporate income net of interest payments and other fixed costs) offered a partial explanation of the correlation. Black also noted, however, that leverage effects were insufficient to explain the magnitude of the price/volatility cross-effects.

Beckers (1980) estimated the CEV parameters for 47 stocks using daily data over 1972-77, and found return distributions were invariably less positively skewed than the lognormal (ρ < 1) and typically negatively skewed (ρ < 0). He simulated option prices for the ρ = 1/2 and ρ = 0 cases, although he did not explicitly test for compatibility with observed option prices. Gibbons and Jacklin (1988) examined stock prices over a longer 1962-85 data sample, and almost invariably estimated ρ between 0 and 1. Melino and Turnbull (1991) estimated CEV processes for 5 currencies over 1979-86 with ρ constrained to discrete values between 0 and 1, inclusive, and typically rejected the geometric Brownian motion hypothesis (ρ = 1). Re-estimation over two subsamples of the 1983-85 period for which they had currency option data revealed that all values considered were essentially observationally equivalent both from time series data and with regard to predicted option prices. All CEV models substantially underpredicted option prices during these first two years of the Philadelphia currency option market.

In general, the CEV model seems unsuitable for stock index and currency options, and not especially desirable for stock options. While bankruptcy is possible for stocks, it seems inconceivable for stock indexes or currencies. Perhaps more important even for stock options, however, is that the variance of asset returns is modelled as a deterministic and monotonic function of the underlying nominal asset price. Given that asset prices have unit roots and typically non-zero drift, the CEV model for ρ ≠ 1 implies that variance either approaches infinity or zero in the long run. The "implied binomial tree" models suffer from a similar problem. Such models therefore require repeated parameter recalibration, indicating fundamental misspecification.
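The recalibration problem noted here is easy to see in simulation: under the CEV specification (11), instantaneous volatility is a deterministic, monotone function of the price level, so a drifting price drags local volatility with it. The following Euler-scheme sketch uses illustrative parameter values only and is not drawn from any of the cited studies.

```python
import numpy as np

def simulate_cev(S0, mu, sigma, rho, T, n_steps, seed=0):
    """Euler scheme for the CEV process dS/S = mu dt + sigma * S**(rho-1) dW.
    Returns the simulated path and the local volatility sigma * S**(rho-1) along it."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty(n_steps + 1)
    S[0] = S0
    for i in range(n_steps):
        local_vol = sigma * S[i] ** (rho - 1.0)
        S[i + 1] = S[i] * (1.0 + mu * dt + local_vol * np.sqrt(dt) * rng.standard_normal())
        S[i + 1] = max(S[i + 1], 1e-8)   # crude guard against the absorbing barrier at zero
    return S, sigma * S ** (rho - 1.0)

# With rho < 1 and positive drift, local volatility trends downward as the price grows;
# sigma is scaled so that the initial local volatility is 20% at S0 = 100.
S, vol = simulate_cev(S0=100.0, mu=0.10, sigma=0.20 * 100**0.5, rho=0.5, T=10.0, n_steps=2500)
print(vol[0], vol[-1])
```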
3.4. Stochastic volatility and ARCH models

Given the substantial evidence summarized in Bollerslev, Chou and Kroner (1992) regarding substantial and persistent changes in the volatility of asset returns, theorists in the 1970's developed numerical methods for pricing options under stochastic volatility processes. The most popular specification has been an Ornstein-Uhlenbeck process for the log of instantaneous conditional volatility,

d(\ln\sigma) = (\alpha - \beta\ln\sigma)\,dt + \nu\,dW_\sigma                    (12)

with the log transformation enforcing nonnegativity constraints on volatility. The square root stochastic variance process used inter alia by Cox, Ingersoll, and Ross (1985b) has also received attention:
d\sigma^2 = (\alpha - \beta\sigma^2)\,dt + \nu\sqrt{\sigma^2}\,dW_\sigma                    (13)

with a reflecting barrier at zero that is attainable when 2α < ν². Assorted assumptions are made regarding the correlations between volatility shocks and asset and interest rate shocks. European option pricing tractability (but not necessarily plausibility) is substantially increased for the former process when shocks are uncorrelated. By contrast, Fourier inversion techniques proposed by Heston (1993a) and Scott (1994) facilitate European option pricing for the latter process even when there are non-zero volatility shock correlations with asset and interest rate shocks. There has been relatively little empirical research thus far as to the correct specification, or indeed as to whether the diffusion assumption is warranted. As discussed in Section 2.1, assumptions regarding the form and magnitude of the volatility risk premium are also necessary when pricing options off the risk-adjusted versions of (12) or (13).

Estimation of stochastic volatility processes on discrete-time data has proved difficult, in two dimensions. First, the fact that volatility is not directly observed implies that maximum likelihood estimation of the parameters of the subordinated volatility process is at best computationally intensive and often essentially impossible. Consequently, stochastic volatility parameter estimates have relied either on time series analysis of volatility proxies such as short-horizon sample variances, or on method of moments estimation using moments of the unconditional distribution of asset returns. Second, testing the implications of time series estimates for option prices under stochastic volatility processes requires an assessment of the current level of instantaneous conditional volatility. The filtration issue of identifying that volatility level given past information on asset returns is difficult. Melino and Turnbull (1990), who used an extended Kalman filter, is one of the few papers to directly tackle the issue in an option pricing context.¹⁰

10 Scott (1987) proposed using a Kalman filter approach to infer the level of volatility - an approach implemented by Harvey, Ruiz, and Shephard (1994). Kim and Shephard (1993) discuss the problems posed by the failure of the asset return and volatility processes to satisfy the jointly Gaussian assumptions underlying the Kalman filter, and propose a remedy.

Other option pricing "tests" of stochastic volatility models have either involved simulations of the implications for option prices of the parameter estimates (e.g., Wiggins (1987)), or alternatively have inferred the instantaneous conditional volatility from option prices conditional upon the parameter estimates. Examples of the latter hybrid and two-stage
approach include Scott (1987) for stock options, and Chesney and Scott (1989) for currency options.

There are three relevant tests of the stochastic volatility option pricing model relative to Black-Scholes. First, variations over time in assessed volatility should outpredict option prices (equivalently, implicit volatilities) relative to the Black-Scholes assumption of a constant volatility inferred from log-differenced asset prices. Second, if volatility is mean-reverting then the term structure of implicit volatilities across different option maturities should be upward (downward) sloping whenever current volatility is below (above) its long-run average level.¹¹ Third, the leptokurtic and possibly skewed asset return distributions implicit in stochastic volatility models should be reflected in option price/implicit volatility patterns across different strike prices that deviate from those generated by a lognormal distribution.

11 A caveat is that the implicit volatility is roughly the expected average risk-neutral volatility, which can deviate from the expected average volatility because of a volatility risk premium. Other potential problems with implicit volatilities are discussed in Section 4.1 below.

None of the above papers employed the first test. This test is not possible under the hybrid approaches, while Melino and Turnbull (1990) used the time-varying assessed volatility as an input to both the stochastic volatility model and an ad hoc Black-Scholes model with continuously re-adjusted σ_t. Consequently, these papers effectively focussed on whether the estimated stochastic volatility parameters can explain the cross-sectional patterns of option prices at different strike prices and maturities relative to those generated by assuming a Gaussian distribution with variance \hat{\sigma}_t^2 T for maturity T.

Melino and Turnbull found that the stochastic volatility model did reduce the average and root mean squared pricing errors on predicted Canadian dollar option prices over February 1983 to January 1985 relative to the continuously readjusted and ad hoc Black-Scholes model, although the volatility assessments do underpredict option prices on average. Most of the improvement appears attributable to superior predictions of the term structure of implicit volatilities relative to the Black-Scholes assumption of a flat term structure. Further substantial reconciliation of predicted and actual option prices was achieved by judicious choice of the volatility risk premium - a free parameter in the model that substantially influences the term structure of implicit volatilities. Whether the sign and magnitude reflect plausible compensation for volatility risk was not examined.

Melino and Turnbull (1990) used 47 moment conditions in conjunction with Hansen's (1982) generalized method of moments (GMM) methodology, and estimated fairly tight standard errors on their parameter estimates. It is difficult to have equal confidence in the parameter estimates and option pricing predictions from other papers, given that the results appear sensitive to the limited choice of moments. Wiggins (1987), for instance, estimated stochastic volatility parameters primarily off of the moments of sample variances, and found the results quite
sensitive to whether 2-, 4-, or 8-day sample variances were used. Scott (1987) and Chesney and Scott (1989) used exactly identified method of moments estimation based in part upon the unconditional second and fourth moments of asset returns. The standard errors reported in Chesney and Scott (1989) indicate considerable imprecision. Furthermore, the use of fourth moments is vulnerable to specification error, given the attribution to volatile volatility of any unconditional leptokurtosis originating in fat-tailed independent shocks to the underlying asset price.¹²

12 As discussed in Bollerslev, Chou and Kroner (1992), GARCH modelers have concluded that time-varying variance cannot explain all of the leptokurtosis in unconditional asset returns. Current GARCH models tend to assume fat-tailed shocks to the asset price. Ho, Perraudin and Sorensen (1996) estimated a stochastic volatility asset pricing model with jumps via GMM, and noted that inclusion of the jump component substantially affected parameter estimates.

The various autoregressive conditionally heteroskedastic (ARCH) models of time-varying volatility are better designed for the twin problems of process and current volatility estimation from discrete-time asset price data. These models converge in the continuous-time data sampling limit to stochastic volatility models (Nelson (1990)), and provide consistent filtration-based estimates of conditional variance even under misspecification (Nelson (1992)), provided the true volatility process follows a diffusion. ARCH models consequently appear well suited for examining whether volatility inferences from time series data are consistent with observed option prices. The downside is that it can be difficult to price options off an estimated ARCH process. Conditional upon assumptions about the appropriate volatility risk premium, European options can be priced via Monte Carlo simulations of the risk-adjusted asset price/asset volatility processes. Most exchange-traded options are American, however, for which Monte Carlo methods cannot readily be used.

Studies that have tested ARCH-based volatility assessments on option prices include Cao (1992) for currency options, Myers and Hanson (1993) for commodity options, and Amin and Ng (1994) for stock options. All three papers use ARCH-based volatility assessments as inputs to both an ad hoc Black-Scholes option pricing model and the ARCH option pricing model. As with stochastic volatility papers, therefore, the focus is again on whether the ARCH models' predictions of volatility mean reversion and higher-moment abnormalities fit option prices of different strike prices and maturities better than assuming a Gaussian distribution with variance \hat{\sigma}_t^2 T for maturity T. All three papers found some ability of ARCH-based option pricing models to correct Black-Scholes pricing errors, albeit for different reasons. Cao (1992) found that Nelson's (1991) EGARCH model outpredicted DM option prices in 1988 relative to a comparable-volatility Black-Scholes model. The reasons for the superior performance are unclear. Myers and Hanson (1993) estimated a rolling-regression GARCH(1,1)/Student's t process for soybean futures. They found that the major gain for soybean futures option pricing prediction relative to Black's (1976b) geometric Brownian motion model originated in the GARCH recognition of volatility mean reversion.
Amin and Ng (1994) examined the degree to which various ARCH models estimated on a 3-year moving window that included the 1987 stock market crash could predict post-crash stock option prices over July 1988 to December 1989. All models overpredicted observed option prices, and had substantial moneyness- and maturity-related biases. However, the substantially negatively skewed and leptokurtic models such as EGARCH outpredicted the leptokurtic but essentially symmetric GARCH(1,1) model in terms of overall option pricing mean absolute error, while the GARCH model outperformed a comparable-volatility Black-Scholes forecast. Amin and Ng's option pricing improvements clearly originate in superior modelling of the negatively skewed and leptokurtic distributions implicit in post-crash stock option prices.

Overall, the tests of stochastic volatility and ARCH/GARCH option pricing models estimated from time series data are still at an early stage, and far from conclusive. The simulated option trading game in Engle, Kane and Noh (1993) suggests that GARCH(1,1) models are efficient volatility estimators relative to moving-average estimates of sample volatility, but whether this translates into superior predictions of option prices has not in fact been tested directly. Similarly, while some calibrations of stochastic volatility models (e.g., Heston (1993a)) suggest that the higher-moment implications of stochastic volatility shocks do not have a large impact on option prices, the time series plausibility of the calibrations has not been definitively established. Indeed, the Amin and Ng (1994) estimates offer evidence to the contrary, although their modelling assumption that the 1987 stock market crash was just a bad draw from a conditionally normal distribution is questionable.

For currency options, the primary testable implications of time-varying volatility models appear to lie in whether the conditional volatility is comparable to volatilities inferred from option prices. Whether the typical estimates of a mean-reverting volatility process are consistent with the term structure of implicit volatilities can also be tested. For stock and stock index options, an outlier of the magnitude of October 19, 1987 poses possibly insurmountable problems for estimating stochastic volatility-based option prices from time series data on the underlying asset price.
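The ARCH-based Monte Carlo pricing route described above can be sketched as follows. This is an illustrative outline only: the GARCH(1,1) parameters are invented rather than estimated, the volatility risk premium is set to zero, and the particular risk-neutralization (conditionally normal shocks with the drift shifted to the riskless rate) is one simple choice among several used in this literature.

```python
import numpy as np

def garch11_filter(returns, omega, alpha, beta):
    """Filter conditional variances h_t from demeaned returns under a GARCH(1,1):
        h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1}.
    In practice the parameters would be estimated by (quasi-) maximum likelihood."""
    h = np.empty(len(returns))
    h[0] = np.var(returns)                       # start the recursion at the sample variance
    for t in range(1, len(returns)):
        h[t] = omega + alpha * returns[t - 1]**2 + beta * h[t - 1]
    return h

def mc_call_under_garch(S0, X, T_days, r_daily, h0, omega, alpha, beta,
                        n_paths=50_000, seed=0):
    """European call priced by Monte Carlo off a risk-neutralized GARCH(1,1)
    with conditionally normal shocks and a zero volatility risk premium."""
    rng = np.random.default_rng(seed)
    lnS = np.full(n_paths, np.log(S0))
    h = np.full(n_paths, h0)
    for _ in range(T_days):
        z = rng.standard_normal(n_paths)
        lnS += r_daily - 0.5 * h + np.sqrt(h) * z
        h = omega + alpha * h * z**2 + beta * h
    payoff = np.maximum(np.exp(lnS) - X, 0.0)
    return np.exp(-r_daily * T_days) * payoff.mean()

# Illustrative parameter values only
rets = 0.01 * np.random.default_rng(1).standard_normal(1000)
h = garch11_filter(rets, omega=1e-6, alpha=0.05, beta=0.90)
print(mc_call_under_garch(100, 100, 63, 0.05 / 252, h[-1], 1e-6, 0.05, 0.90))
```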

3.5. Jump-diffusion processes

Merton (1976) suggested that distributions with fatter tails than the lognormal might explain the tendency for deep-in-the-money, deep-out-of-the-money, and short-maturity options to sell for more than their Black-Scholes value, and the tendency of near-the-money and longer-maturity options to sell for less. Merton priced options on jump-diffusion processes under the assumption of diversifiable jump risk and independent lognormally distributed jumps. Subsequent work by Jones (1984), Naik and Lee (1990), and Bates (1991) indicates that Merton's model with modified parameters is still relevant even under nondiversifiable jump risk. Others have proposed alternate option pricing models under fat-tailed
shocks: McCulloch's (1987) stable Paretian model, Madan and Seneta's (1990) variance-gamma model, and Heston's (1993b) gamma process.

As of current writing, only Merton's (1976) model has been used in time series-based tests of option pricing models. Apart from early work by Press (1967) using the method of cumulants, most papers have used maximum likelihood estimation along with a truncation of the infinite series representation of the likelihood function. Ball and Torous (1985) estimated jump-diffusion processes with mean-zero jumps for 30 NYSE stocks, using daily cum-dividend returns over January 1, 1981 to December 31, 1982. They generated theoretical Merton and Black-Scholes European option prices with strike prices and maturities matching those observed for CBOE and AMEX American call options on these stocks on January 3, 1983. They concluded that the Merton and Black-Scholes option prices were essentially indistinguishable for the estimated parameters, except for out-of-the-money January options with less than a month to maturity. Trautmann and Beinert (1994) estimated high-frequency (0.3-2.2 jumps/day) low-amplitude jumps for 14 German stocks based on daily data over 1981-85 and 1986-90, and found that the resulting option prices are virtually identical to those generated from a comparable-volatility no-jump specification.

Jorion (1988) similarly estimated jump-diffusion parameters for the $/DM exchange rate and the CRSP value-weighted stock index using weekly and monthly data over January 1974 to December 1985, both with and without an ARCH(1) specification for non-jump conditional volatility. His estimate for $/DM of 1.32 jumps per week with mean jump size essentially 0 and standard deviation of 1.17% induces substantial percentage pricing biases (relative to Black-Scholes values) in OTM options of less than 1-month maturity, but has negligible impact on longer maturities. Jorion noted that the biases are partially but not fully consistent with biases in DM options over 1983-85 reported by Bodurtha and Courtadon (1987), but did not explicitly test that consistency. For the CRSP stock index, Jorion estimated .17 jumps/week with jump mean of 0 and standard deviation of 3.34%. Simulations again indicate the largest pricing impact for options of less than 1 month maturity, but also some substantial impact on longer maturities. Whether the estimated pricing biases are consistent with those observed in stock index options was not discussed.

Jump-diffusion parameter estimates from daily or weekly data typically find high-frequency low-amplitude jump components of relevance only to options with very short maturities. It seems likely that such estimates are picking up lumpy information flows associated with macroeconomic or firm-specific data announcements, as discussed in Ederington and Lee (1993). Whether there is also a low-frequency large-amplitude component such as would be more consistent with 1-6 month option pricing anomalies is difficult to ascertain. It is hard to identify low-frequency jumps on the short data intervals (less than 10 years) typically employed, so parameter estimates for a single jump process naturally gravitate towards the identifiable high-frequency phenomena. A possible solution would be to expand the data set and have two or more independent jump
processes, but I know of no paper that has implemented this approach on financial data.¹³

13 The problem of maximum likelihood estimation given a multiple infinite summation series representation for transition densities can be finessed by instead using Fourier inversion of the characteristic function to evaluate those densities.
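The truncated-series maximum likelihood approach used in these papers can be sketched as follows; the truncation point, starting values, optimizer, and simulated data are all illustrative choices rather than those of any particular study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, poisson

def merton_loglik(params, x, dt, n_max=20):
    """Log likelihood of log returns x under Merton's jump-diffusion, with the
    Poisson mixture truncated at n_max jumps per interval:
        x | n jumps ~ N(mu*dt + n*(ln(1+k_bar) - delta^2/2), sigma^2*dt + n*delta^2)."""
    mu, sigma, lam, k_bar, delta = params
    if sigma <= 0 or lam < 0 or delta <= 0 or k_bar <= -1:
        return -1e10                              # penalty for infeasible parameters
    dens = np.zeros_like(x)
    for n in range(n_max + 1):
        mean = mu * dt + n * (np.log(1 + k_bar) - 0.5 * delta**2)
        std = np.sqrt(sigma**2 * dt + n * delta**2)
        dens += poisson.pmf(n, lam * dt) * norm.pdf(x, mean, std)
    return np.sum(np.log(dens + 1e-300))

def fit_merton(x, dt, start=(0.05, 0.15, 1.0, -0.02, 0.03)):
    """Maximize the truncated likelihood; the start values are arbitrary guesses."""
    res = minimize(lambda p: -merton_loglik(p, x, dt), start, method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-8})
    return res.x

# Example on simulated daily log returns with 5 jumps per year of mean size -3%
rng = np.random.default_rng(0)
n_obs, dt = 2000, 1 / 252
n_jumps = rng.poisson(5.0 * dt, n_obs)
x = (0.05 - 0.5 * 0.15**2) * dt + 0.15 * np.sqrt(dt) * rng.standard_normal(n_obs) \
    + n_jumps * (np.log(0.97) - 0.5 * 0.03**2) + np.sqrt(n_jumps) * 0.03 * rng.standard_normal(n_obs)
print(fit_merton(x, dt))
```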

4. Implicit parameter estimation


It has been common when examining option pricing models to infer some or all of the distributional parameters from option prices conditional upon the postulated model, rather than estimating parameters from time series data on the underlying asset price. The interest in implicit parameters reflects the fact that options are forward-looking assets, with prices sensitive to distributional moments such as future volatility. Much of the academic interest in options has reflected the potential ability of option prices to offer insights into market expectations of future distributions that are more difficult to infer from time series analysis. A major problem with implicit parameter estimation is that we have no associated statistical theory. Option pricing models are premised upon the underlying parameters and distributional structure being known with certainty, so that implicit parameters should in principle be a matter of inversion rather than estimation. An obvious overidentification problem arises when there are K parameters and N + K option prices. And although measurement error in option prices offers one justification for aggregating information from different option prices, the alternative hypothesis that inconsistencies across options may reflect specification error must constantly be kept in mind. Tests involving implicit parameters are inherently two-stage: information (e.g., implicit volatilities) is inferred from option prices under some aggregation scheme, and is treated as the null hypothesis to be tested using time series data.

4.1. Implicit volatility estimation


Within the Black-Scholes paradigm, a single option quote suffices to identify the implicit parameter a ; see (10). Since synchronous option prices of different strike prices and maturities yield different o's, assorted schemes have been proposed for aggregating the information from different options into a single volatility assessment. The major methods are summarized in Table 1. Most involve weighting schemes that assign equal weight to in- and out-of-the-money options, and most give heavier weight to near-the-money options. The exception is Chiras and Manaster (1978), where a focus on percentage pricing errors results in the heaviest weight falling on the deepest out-of-the-money call and put options. TM A further issue is the choice between point-in-time option prices (e.g., closing or settlement 13 The problem of maximum likelihood estimation given a multiple infinite summation series representation for transition densities can be finessed by instead using Fourier inversion of the characteristic function to evaluate those densities. 14 See Day and Lewis (1988) for a comparison of the Chiras and Manaster (1978) and Whaley (1982) weighting schemes.

588

D . S. B a t e s

prices) and pooled transactions data over some interval (e.g., daily). Since nearthe-money call and put options are typically most heavily traded on centralized exchanges, and trading activity differs for in- and out-of-the-money options, the use of transactions data further affects the relative weights. Given time-varying volatility, it is desirable to construct maturity-specific implicit volatilities from options of a common maturity. Some studies, however, pool across maturities. Underlying the alternate weighting schemes is an implicit presumption of independent measurement error in option prices. Given nonconstant "vega" O 0 / O a across different strike prices, this can translate into substantial noise in implicit volatilities, especially from deep in- and out-of-the-money options. There has, however, been little explicit scrutiny of the nature of this presumed measurement error across strike prices and maturities, and what it implies for optimal weights. For instance, while Whaley's (1982) methodology is consistent with homoskedastic white noise in option prices, there has been little verification of that underlying assumption. Plausible explanations of measurement error include bidask spreads or imperfect synchronization with the underlying asset price - both of
Table 1
Alternate methods for computing weighted implicit standard deviations

Schmalensee and Trippi (1978)
Formula: σ̄ = (1/N) Σᵢ σᵢ, where σᵢ is the implicit volatility from the ith option price Oᵢ.
Comments: Equal weights. Typically implemented on a restricted set of options (e.g., excluding deep out-of-the-money options).

Latané and Rendleman (1976)
Formula: σ̂² = Σᵢ wᵢ²σᵢ² / √(Σᵢ wᵢ⁴), with wᵢ = ∂Oᵢ/∂σᵢ.
Comments: Weights don't sum to one, creating biased volatility estimates.

Modified Latané and Rendleman
Formula: σ̂ = Σᵢ wᵢσᵢ / Σᵢ wᵢ, with wᵢ = ∂Oᵢ/∂σᵢ.
Comments: Heaviest weight on near-the-money options. In- and out-of-the-money options weighted symmetrically.

Whaley (1982)
Formula: σ̂ = argmin_σ Σᵢ [Oᵢ - Ôᵢ(σ)]².
Comments: Even heavier weight on near-the-money options than the modified Latané-Rendleman. Typically implemented on transactions data, which affects the relative weights.

Beckers (1981)
Formula: σ̂ = argmin_σ Σᵢ wᵢ[Oᵢ - Ôᵢ(σ)]², with wᵢ = (∂Oᵢ/∂σᵢ) / Σⱼ (∂Oⱼ/∂σⱼ).
Comments: Even heavier weight on near-the-money options than Whaley (1982).

Chiras and Manaster (1978)
Formula: σ̄ = Σᵢ wᵢσᵢ / Σᵢ wᵢ, with wᵢ = |(∂Oᵢ/∂σᵢ)(σᵢ/Oᵢ)|.
Comments: Elasticity-weighted, with heaviest weight on low-priced, deep out-of-the-money options.

At-the-money
Formula: σ̂ = σ_ATM.
Comments: Increasingly standard. A readily replicable benchmark based on actively traded options.
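To make the weighting schemes in Table 1 concrete, the sketch below computes two of them - a vega-weighted average of per-option implicit volatilities and a Whaley-style least-squares fit of a single volatility to all quotes - for three made-up call prices. The formulas follow the reconstruction in the table above and are illustrative only.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq, minimize_scalar

def bs_call(S, X, T, r, sigma):
    d1 = (np.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return S * norm.cdf(d1) - X * np.exp(-r * T) * norm.cdf(d1 - sigma * np.sqrt(T))

def bs_vega(S, X, T, r, sigma):
    d1 = (np.log(S / X) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return S * norm.pdf(d1) * np.sqrt(T)                  # dO/d(sigma)

def implied_vol(price, S, X, T, r):
    return brentq(lambda s: bs_call(S, X, T, r, s) - price, 1e-6, 5.0)

S, T, r = 100.0, 0.25, 0.05
strikes = np.array([90.0, 100.0, 110.0])
quotes = np.array([12.10, 4.60, 1.05])                    # made-up call prices
ivs = np.array([implied_vol(q, S, X, T, r) for q, X in zip(quotes, strikes)])

# Vega-weighted average of the per-option implicit volatilities
w = bs_vega(S, strikes, T, r, ivs)
print("vega-weighted ISD:", np.sum(w * ivs) / np.sum(w))

# Whaley-style: a single sigma minimizing the sum of squared price errors
sse = lambda s: np.sum((quotes - bs_call(S, strikes, T, r, s))**2)
print("least-squares ISD:", minimize_scalar(sse, bounds=(0.01, 2.0), method="bounded").x)
```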


Engle and Mustafa (1992) and Bates (1996b) propose a nonlinear generalized least squares methodology that allows the appropriate weights to be determined endogenously by the data.

Apart from measurement error in option prices or in the underlying asset prices, there are other potential sources of bias when inferring the volatility parameter from observed option prices. First is the issue of selecting the appropriate short-term interest rate to put into the Black-Scholes formula, whether from Treasury bills, commercial paper, or Eurodollars. Most academic studies use Treasury bill yields, but this is less common among practitioners. Furthermore, most empirical tests use the same daily interest rate for evaluating all options on a given day, even when intradaily transactions data are used. Simulations by Hammer (1989) indicate a fairly small impact on at-the-money implicit volatilities from using the wrong interest rate. 16 Some have attempted to infer which is the appropriate interest rate using pairs of options; e.g., Brenner and Galai (1986) and French and Martin (1987). Results are somewhat inconclusive, but suggest that the Treasury bill rate is probably too low.

Second, the common practice of using a new interest rate every day suggests that a stochastic interest rate model would be more appropriate. However, the fact that interest rates are stochastic does not appear to be a major concern when inferring volatilities from short-term European option prices. If the instantaneous nominal domestic interest rate follows an Ornstein-Uhlenbeck process, then a Black-Scholes formula still applies:

c(F, T; X, r, σ̄_F) = e^{-rT} [F N(d_1) - X N(d_2)],    d_{1,2} = [ln(F/X) ± ½ σ̄_F² T] / (σ̄_F √T)    (14)

where r is the continuously-compounded yield from a discount bond of comparable maturity T and σ̄_F², the average conditional variance of the forward price over the lifetime of the option, is a deterministic function of time under this interest rate process. 17
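A direct transcription of (14) as reconstructed above, pricing a European call off the forward price and a comparable-maturity discount yield; the inputs are illustrative.

```python
import numpy as np
from scipy.stats import norm

def call_on_forward(F, X, T, r, sigma_F):
    """European call priced off the forward price F, as in (14); sigma_F is the
    average volatility of the forward price over the option's lifetime."""
    d1 = (np.log(F / X) + 0.5 * sigma_F**2 * T) / (sigma_F * np.sqrt(T))
    d2 = d1 - sigma_F * np.sqrt(T)
    return np.exp(-r * T) * (F * norm.cdf(d1) - X * norm.cdf(d2))

print(call_on_forward(F=100.0, X=100.0, T=0.5, r=0.06, sigma_F=0.15))
```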

15 See George and Longstaff (1993) for evidence of irregular bid-ask spreads across different strike prices and maturities.
16 If the true parameters are σ = 20% and r = 10%, erroneously using a 9.7% interest rate yields a 20.22% implicit volatility from a 90-day at-the-money option on a non-dividend-paying stock, with comparable effects at longer maturities but different effects for different strike prices. Most of this error is attributable to the interest rate error's impact on the assessed forward price F = Se^{rT} used in (10). Less error arises when that forward price can be inferred more directly; e.g., from futures prices.
17 Stochastic interest rate and bond price models that generate option prices of this form are in Merton (1973), Grabbe (1983), Rabinovitch (1989), Hilliard, Madura, and Tucker (1991), and Amin and Jarrow (1991). For foreign currency options it is necessary to impose comparable distributions on foreign interest rates or foreign bond prices.


This specification is not valid for other interest rate processes (e.g., the square root interest rate process of Cox, Ingersoll, and Ross (1985b)), 18 nor of course is it valid for American options. Nevertheless, the model suggests that the standard practice of using a contemporaneous and comparable-maturity money market yield captures the major impact of changing interest rates over time. Furthermore, the fact that interest rates are stochastic and possibly correlated with the underlying asset price is largely captured by the recognition that it is the volatility of the forward price rather than the spot price that is implicit in option prices. There is little difference between the two for options maturing in less than a year, although the difference can matter at longer maturities. Ramaswamy and Sundaresan (1985) examine American futures option pricing under square root stochastic interest rate processes, and conclude that the term structure of interest rates significantly affects short-term American option prices but the fact that interest rates are stochastic does not.

Many have pointed out the internal inconsistency involved in re-estimating implicit conditional volatilities daily using a model premised on constant volatility. The impact of the specification error can be assessed using the observation by Hull and White (1987) and Scott (1987) that if volatility evolves independently of the asset price, then the true European option price is the expected value under the risk-neutral distribution of the Black-Scholes option price conditional on the realized average variance over the option's maturity: 19
c = ∫₀^∞ c^{BS}(V̄) f*(V̄) dV̄ = E_t*[c^{BS}(V̄)]    (15)

A similar relationship holds for Merton's (1976) jump-diffusion model with mean-zero jumps. Using a Taylor series expansion,

c^{BS}(σ̂²) = c ≈ c^{BS}(E_t*V̄) + ½ [∂²c^{BS}/∂(σ²)²] Var_t*(V̄)    (16)

which indicates that the implicit variance σ̂² inferred using the Black-Scholes formula will be biased upward (downward) relative to risk-neutral expected average variance in regions where the Black-Scholes formula is predominantly convex (concave) in σ². For at-the-money options, the second-order Taylor approximation 20 c^{BS} ≈ e^{-rT} F σ √(T/2π) can be used in conjunction with (16) to further clarify the relationship between implicit and risk-neutral expected average variance:

18 Scott (1994) develops stock option pricing formulas applicable in the Cox et al. (1985b) environment. 19 It is important to note that (15) is an expectation over average v a r i a n c e - not average volatility. A confusion between the two has led some to erroneously conclude that at-the-money implicit volatilities should be unbiased estimates of future volatility. 20 For at-the-money options, F = X and (10) can be written as c s s = e - r r F [ 2 N ( a ~ / T ) - 1]. Expanding N(*) in a second-order Taylor series around 0 yields the approximation.


σ̂ / √(E_t*V̄) ≈ 1 - (1/8) Var_t*(V̄) / [E_t*(V̄)]²    (17)

There are three caveats. First, the expected average variance under the risk-neutral measure will differ from the true expected average variance if there is a volatility risk premium. Second, (15) is invalid for options on stocks and stock indexes, given the strong negative correlations observed between price and volatility shocks for these assets. Equation (15) is also invalid for Merton's jump-diffusion model when jumps have non-zero mean - another skewed distribution. Consequently, the reliability of implicit volatilities premised on lognormality when the actual distribution is substantially skewed has not been established. Third, (15)-(17) are only valid for European options. Nevertheless, at-the-money implicit volatilities appear relatively robust estimates of future volatility under the alternative distributional hypotheses typically considered, although it is certainly possible to identify parameter values for which this is not the case. Estimates of the volatility of volatility from the time series properties of implicit volatilities suggest that the Jensen's inequality bias in implicit volatilities is typically less than .5% for 1- to 12-month at-the-money options. The difference between actual and "risk-neutral" expected average variance is unknown, but is not likely to be a major factor for short-maturity options. Finally, estimates of implicit parameters under moderately skewed jump-diffusion processes in Bates (1991, 1996a) almost invariably yield implicit volatilities that diverge by less than 1% from the volatilities inferred using an American option variant of the Black-Scholes model.
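A small simulation illustrating (15)-(17) as reconstructed above: average variance is drawn from an arbitrary distribution that is independent of the price path, the option is valued as the expected Black-Scholes price, and the resulting implicit volatility is compared with the second-order approximation in (17). The gamma distribution and all parameter values are made up.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call_forward(F, X, T, r, sigma):
    d1 = (np.log(F / X) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    return np.exp(-r * T) * (F * norm.cdf(d1) - X * norm.cdf(d1 - sigma * np.sqrt(T)))

rng = np.random.default_rng(1)
F, X, T, r = 100.0, 100.0, 0.25, 0.05

# Arbitrary risk-neutral distribution for the average variance over the option's life,
# independent of the price path (mean 0.04, i.e., a 20% volatility level)
Vbar = 0.04 * rng.gamma(shape=20.0, scale=1.0 / 20.0, size=200_000)

# (15): the option value is the expected Black-Scholes value over average variance
c = np.mean(bs_call_forward(F, X, T, r, np.sqrt(Vbar)))

# Black-Scholes implicit volatility recovered from that price
iv = brentq(lambda s: bs_call_forward(F, X, T, r, s) - c, 1e-6, 5.0)

# (17): second-order approximation to the ratio of implicit vol to sqrt of expected average variance
approx = 1.0 - 0.125 * np.var(Vbar) / np.mean(Vbar)**2
print(iv / np.sqrt(np.mean(Vbar)), approx)     # both should be slightly below one
```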
4.2. Time series properties of implicit volatilities

There has been substantial interest in the time series properties of implicit volatilities. First, since implicit volatilities are a direct proxy for option prices, such analyses offer direct and readily interpretable insights into the stochastic evolution of those prices. Second, if implicit volatilities are good proxies for expected future volatility of the underlying asset price, then further insights into volatility processes can be obtained. Poterba and Summers (1986), for instance, use implicit volatility dynamics to assess how much stock prices should respond to volatility shocks.

Several procedural issues arise with regard to time series analysis of implicit volatilities. First, the volatilities should ideally be inferred using a stochastic volatility option pricing model that is consistent with the model fitted to the resulting time series of implicit volatilities. 21 As discussed above, however, implicit variances as measures of expected average variances appear relatively robust to specification error in the option pricing model.


Examining volatilities inferred under the Black-Scholes model is consequently a reasonable and informative initial diagnostic of volatility dynamics. A second problem is the quarterly expiration cycle of exchange-traded options. The average maturity of implicit volatilities steadily decreases as options approach maturity, followed by a jump increase upon introduction of a new option contract. Most papers acknowledge the problem; not all do something about it. Provided that a linear process in variance is specified, such as the AR(1) in (13) above, it is relatively straightforward to estimate the ARMA process for instantaneous conditional variances from the (approximate) expected average variances inferred from exchange-traded option prices; see, e.g., Taylor and Xu (1994). 22 Alternate volatility processes are more complicated, and implicitly involve further approximations not typically recognized by the authors when identifying the dynamics of instantaneous conditional volatilities. 23

Time series analyses of implicit volatilities have been perhaps surprisingly consistent in their results, given substantial differences in data construction. Most studies agree that implicit volatilities from stock, stock index, and currency options are substantially serially correlated and follow stationary, mean-reverting processes. Most conclude that a parsimonious AR(1) specification captures the time series properties quite well, with a typical half-life to volatility shocks of 1 to 3 months. Examples include Schmalensee and Trippi (1978), Merville and Pieptea (1989), and Sheikh (1993) for stock options; Poterba and Summers (1986), Stein (1989), Harvey and Whaley (1992b), and Diz and Finucane (1993) for S&P 100 index options; and Taylor and Xu (1994), Campa and Chang (1995), Jorion (1995), and Bates (1996b) for currency options. Merville and Pieptea (1989) argue for a mixed mean-reverting diffusion plus white noise for stock implicit volatilities; the noise is perhaps attributable to their use of closing price data. Schmalensee and Trippi (1978) and Sheikh (1993) found substantial negative correlations between stock returns and stock implicit volatilities, qualitatively comparable to the "leverage effect" negative correlations typically observed between returns and actual volatility. Franks and Schwartz (1991) found similar effects for implicit volatilities from stock index options on the British FTSE 100. Taylor and Xu (1994) present evidence of long-term nonstationarities in the AR(1) specification for currency implicit variances.

22 For (13), there is a parameter-dependent linear mapping between the expected average variance E_t V̄ and the instantaneous conditional variance V_t: E_t V̄ = α[1 - w(T - t)] + w(T - t) V_t, where w(T - t) = [1 - e^{-β(T-t)}]/[β(T - t)] and T - t is the option maturity at time t. This can be used to estimate the parameters α and β of the V_t process given E_t V̄ data. The procedure does of course involve assuming σ̂² ≈ E_t*V̄ ≈ E_t V̄. A bias correction based on (17) can improve the first approximation.
23 For instance, Stein (1989) uses a linear volatility process and assumes that expected average volatilities equal implicit volatilities from at-the-money option prices. That assumption reflects a confusion between standard deviations and variances, but may nevertheless be a reasonable approximation. (15)-(17) above indicate the relationship between implicit and expected average variances.
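A minimal implementation of the linear mapping in footnote 22, assuming the mean-reverting variance specification reconstructed there; α, β, and the variance levels are made-up numbers. Inverting the mapping recovers the instantaneous variance from an implicit average variance of a given maturity.

```python
import numpy as np

def w(tau, beta):
    """Weight on the current instantaneous variance in the expected average
    variance over maturity tau, for a mean-reverting (AR(1)-type) variance process."""
    return (1.0 - np.exp(-beta * tau)) / (beta * tau)

def expected_average_variance(V_t, tau, alpha, beta):
    # E_t[average variance over tau] = alpha * (1 - w) + w * V_t
    return alpha * (1.0 - w(tau, beta)) + w(tau, beta) * V_t

def instantaneous_variance(EV_bar, tau, alpha, beta):
    # Invert the linear mapping to recover V_t from an implicit average variance
    return (EV_bar - alpha * (1.0 - w(tau, beta))) / w(tau, beta)

alpha, beta = 0.04, 4.0      # illustrative long-run variance and mean-reversion rate
V_t = 0.09                   # current instantaneous variance
EV = expected_average_variance(V_t, tau=0.25, alpha=alpha, beta=beta)
print(EV, instantaneous_variance(EV, 0.25, alpha, beta))   # second number round-trips to 0.09
```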

4.3. Implicit volatilities as forecasts of future volatility


The informational content of the volatilities inferred from option prices is usually tested by regressing some measure of realized volatility upon implicit volatilities. Three issues arise. First, whether implicit volatilities are informative with regard to future volatility is typically examined by looking at the statistical significance of the slope coefficient. Second, whether implicit volatilities are unbiased forecasts of future volatility is examined by testing for zero intercept and unitary slope. Third, there is the issue of whether implicit volatilities are informationally efficient forecasts; i.e., whether they incorporate all readily available information regarding future volatility. This has been tested by adding the additional information (e.g., historical volatilities) in a multivariate "encompassing regression" framework and testing the statistical significance of the additional variable(s).

Early studies of the forecasting power of stock option implicit volatilities were typically cross-sectional. Perhaps the earliest example was Black and Scholes' (1972) observation that the ex post sample volatility over the option's lifetime better captured the cross-sectional dispersion of option prices than did ex ante historical volatility. Latané and Rendleman (1976) similarly observed that their (biased) implicit volatility estimates from CBOE call options on 24 stocks over 1973-74 had a higher cross-sectional correlation with concurrent and subsequent realized stock volatilities than did historical volatility estimates from an earlier 4-year sample. Chiras and Manaster (1978) concluded that the cross-sectional informativeness of their weighted implicit standard deviation (WISD) measure increased over June 1973 to April 1975 (the early years of the CBOE option market), with higher R² from 20-day volatility forecasts in the last 14 months than in the first nine. Furthermore, 20-day historical volatilities typically contributed no statistically significant additional information to the WISD volatility forecasts in the last 14 months. However, the WISD was a substantially biased forecast of cross-sectional stock volatility, with monthly slope coefficients ranging from .29 to .83. Beckers (1981) looked at various implicit standard deviation methodologies (at-the-money, modified Latané-Rendleman, his own method) predominantly using daily closing price data on 62-115 CBOE stock options over October 13, 1975 to January 23, 1976. He concluded that at-the-money implicit volatilities were at least as good as other methodologies, and that all implicit volatility methods outperformed quarterly historical estimates with regard to cross-sectional stock volatility forecasting. However, he also noted that implicit volatilities were biased and not informationally efficient, since historical volatilities contributed additional information.

Subsequent tests of implicit volatilities have regressed realized upon implicit volatilities in a time series context. Realized volatility is typically computed as the sample volatility either over the lifetime of the option, or over some fixed future horizon (e.g., 1 week). The former method is more consistent with the maturity of the implicit volatility, but typically results in overlapping observations given 1-6 month option maturities.
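A sketch of the regression tests just described - an unbiasedness regression of realized on implicit volatility, and an encompassing regression that adds historical volatility - using simulated placeholder series and ordinary least squares (ignoring the overlapping-observation complications discussed below).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
implicit = 0.15 + 0.03 * rng.standard_normal(n)                    # implicit volatility (placeholder)
historical = implicit + 0.02 * rng.standard_normal(n)              # lagged historical volatility (placeholder)
realized = 0.02 + 0.90 * implicit + 0.02 * rng.standard_normal(n)  # subsequently realized volatility

# Informativeness / unbiasedness: realized = a + b * implicit + e; test a = 0, b = 1
m1 = sm.OLS(realized, sm.add_constant(implicit)).fit()

# Informational efficiency: does historical volatility add anything beyond the implicit volatility?
m2 = sm.OLS(realized, sm.add_constant(np.column_stack([implicit, historical]))).fit()

print(m1.params, m1.bse)      # slope near one and intercept near zero suggest unbiasedness
print(m2.tvalues)             # insignificant t-statistic on the historical term suggests efficiency
```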


Furthermore, as discussed in Fleming (1994), the standard Hansen-Hodrick (1980) GMM correction for the moving average component in overlapping fixed-horizon forecast errors is inappropriate given that the option maturity shrinks over time as the option approaches expiration. 24 Using fixed-horizon volatility over shorter intervals typically yields nonoverlapping observations, allowing standard ordinary least squares regressions. The downside is the maturity mismatch between realized and implicit volatility, which may affect the results.

Lamoureux and Lastrapes (1993) examined implicit volatilities from CBOE call options on 10 non-dividend paying stocks over April 19, 1982 to March 31, 1984, and compared the 1-day and option-lifetime volatility forecasts with those from GARCH and historical volatility estimates. They concluded that implicit volatilities were biased but informative, and that historical volatilities provided additional information for volatility forecasting. Canina and Figlewski (1993) examined the ability of implicit volatilities from closing prices of S&P 100 index call options over March 1983 to March 1987 to forecast future realized volatility over the lifetime of the option. Rather startlingly, they found that implicit volatilities from options of assorted moneynesses and maturities were virtually useless in forecasting future S&P 100 index volatility. And although implicit volatilities from noisy closing data undoubtedly suffer from an errors-in-variables problem, biasing slope coefficients towards 0, simulations in Jorion (1995) suggest that this effect should not be large enough to explain Canina and Figlewski's results. By contrast, Day and Lewis (1992) found that S&P 100 implicit volatilities' forecasts of subsequent weekly volatility for 319 weeks over November 1983 to December 1989 (including the stock market crashes of 1987 and 1989) were definitely informative and close to unbiased. Day and Lewis also concluded, however, that GARCH and EGARCH volatility assessments contain additional information not captured by the implicit volatility. Fleming (1994) regressed first-differenced realized volatility (options' lifetime and 28-day) on first-differenced implicit volatilities using daily transactions data over October 1985-April 1992, excluding the 1987 crash period. He concluded that the implicit volatility was a biased but substantially informative forecast of future volatility, and that implicit volatilities were informationally efficient relative to other variables such as 28-day historical volatility. Reconciling the three papers is difficult, given differences in sample period, methodology, and data construction. Perhaps the appropriate conclusion is that the extremely active S&P 100 option market was inefficient in its early years, but has improved over time.

Foreign currency options have been examined by Scott (1992), Jorion (1995), and Bates (1996a). Scott (1992) examined the implicit volatility less intraquarterly historical volatility as a forecast of changes in future intraquarterly volatility over 1983 to 1989, using non-overlapping data. He concluded that pound, Deutschemark and Swiss franc implicit volatilities were informative and close to unbiased forecasts of future volatility, but that yen implicit volatilities had no informational content.

24 Fleming develops a modified GMM estimator to handle the problem.


A similar conclusion was reached by Bates (1996a) with regard to weekly volatility forecasts from Deutschemark and yen futures options over 1984-92 and 1986-92, respectively. Jorion (1995) examined Deutschemark, yen, and Swiss franc futures options over January 1985 to February 1992. He found that implicit volatilities were almost unbiased forecasts of the next day's absolute return, but were more biased forecasts of the volatility over the lifetime of the option. In both cases, 20-day historical volatility and GARCH-based volatility assessments contributed no additional information.

Almost all studies have, therefore, found implicit volatilities to contain information with regard to future volatility. The volatility forecasts from implicit volatilities are apparently biased for stock options, stock index options, and yen options, but are close to unbiased for other currency options. Other sources of volatility information can be used to improve on a bias-adjusted implicit volatility forecast in some cases, depending upon the security and the period.

There are several possible explanations why implicit volatility forecasts might be biased forecasts of actual volatility. As noted in Section 4.1 above, implicit variances can potentially deviate from risk-neutral expected average variances for a number of reasons, while risk-neutral and actual expected average variances will diverge in the presence of a substantial volatility risk premium. Alternatively, options may be mispriced. Fleming (1994) and Engle, Kane, and Noh (1994) explore the last explanation by examining the profits from trading volatility-sensitive straddles (1 call plus 1 put) on the S&P 100 index. Fleming reports substantial profits that disappear when trading costs are taken into account. Engle, Kane, and Noh used a GARCH-based straddle trading strategy and found substantial profits net of transaction costs. Both studies include the post-crash period, which may be atypical given the trauma of the crash.

4.4. Implicit volatility patterns: evidence for alternate distributional hypotheses

The Black-Scholes hypothesis of geometric Brownian motion implies that all options, regardless of strike price and maturity, depend upon the single parameter σ. Various methods are commonly employed to examine the cross-sectional pricing errors of the Black-Scholes model, in order to assess which alternative distributional hypotheses are more compatible with observed option prices. One approach is to compute a single daily implicit volatility from at-the-money or pooled options, price all options conditional on that implicit volatility, and describe how the resulting option pricing residuals vary by moneyness and maturity. An alternate technique proposed by Rubinstein (1985) computes option-specific implicit standard deviations (ISD's), and uses carefully synchronized pairs of option transactions to identify typical patterns in implicit volatilities across different strike prices and maturities. Since implicit volatilities are monotonically increasing functions of option prices, the two methods are substantially equivalent. A divergent focus on mean pricing errors versus median ISD patterns necessitates different tests of statistical significance.


The first derivative of the European call or put option price with respect to the strike price is proportional to the relevant risk-neutral tail probability, while the second derivative is proportional to the probability density. The pattern of residuals or implicit volatilities across different strike prices (moneyness biases) consequently provides direct evidence for European options of the shape of the risk-neutral density and distribution, relative to the benchmark hypothesis of a lognormal distribution. A symmetric leptokurtic distribution implies out-of-the-money calls and puts (which pay off under realizations in the tails) are more valuable than predicted by a lognormal distribution, and consequently generates a symmetric U-shaped pattern or "volatility smile" in implicit volatilities across different strike prices. Skewness "tilts" the ISD patterns, with positive (negative) skewness typically increasing (decreasing) the values and implicit volatilities of OTM calls/ITM puts relative to the values and implicit volatilities of correspondingly OTM puts/ITM calls. 25 The early-exercise premium associated with American options complicates the analysis, especially if the implicit volatilities are erroneously computed using a European option pricing model.

A comparison of ISD's across maturities is primarily indicative of whether the term structure of implicit volatilities was typically upward or downward sloping, suggesting equivalent patterns for expected average variances over different option maturities. Typical estimates of volatility mean reversion indicate that either or both patterns can occur repeatedly within a typical 1- to 3-year data interval. 26 Consequently, while instantaneous maturity biases are interesting, median maturity patterns in ISD's from data aggregated over a longer interval appear uninformative.

The strike price/maturity cross-effects are perhaps of greater interest. Leptokurtic models such as Merton (1976) that rely on independent fat-tailed finite-variance shocks to the underlying asset price imply by the central limit theorem an inverse relationship between implicit skewness/leptokurtosis magnitudes and option maturity. By contrast, standard stochastic volatility models are instantaneously lognormal and imply skewness and leptokurtosis magnitudes initially increase with option maturity. The two models therefore alternately predict decreasingly/increasingly pronounced strike price patterns for short-maturity options as maturity increases, provided the strike price spacing is adjusted proportionally to the appropriate standard deviation at different horizons. For a flat term structure of annualized volatilities, this implies increasing strike price spacing with the square root of maturity. Further adjustments are necessary if the term structure is not flat. Absent these adjustments, it is more difficult to distinguish between these alternative distributional hypotheses from moneyness/maturity cross-effects.
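The strike-price derivatives mentioned at the start of the preceding paragraph can be illustrated numerically: finite differences of European call prices across strikes recover the risk-neutral tail probability and density. The sketch below uses a lognormal benchmark, so the recovered quantities should match the lognormal ones; the grid and parameter choices are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def bs_call_forward(F, X, T, r, sigma):
    d1 = (np.log(F / X) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    return np.exp(-r * T) * (F * norm.cdf(d1) - X * norm.cdf(d1 - sigma * np.sqrt(T)))

F, T, r, sigma = 100.0, 0.25, 0.05, 0.20
X = np.linspace(60.0, 160.0, 501)
c = bs_call_forward(F, X, T, r, sigma)

# First strike derivative: proportional to the risk-neutral tail probability P[S_T > X]
tail_prob = -np.exp(r * T) * np.gradient(c, X)
# Second strike derivative: proportional to the risk-neutral density of S_T
density = np.exp(r * T) * np.gradient(np.gradient(c, X), X)

# Check at the forward price: under lognormality P[S_T > F] = N(-sigma*sqrt(T)/2)
i = int(np.argmin(np.abs(X - F)))
print(tail_prob[i], norm.cdf(-0.5 * sigma * np.sqrt(T)))
```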

25 Hull (1993, pp. 436-438) discusses the impact of skewness and leptokurtosis upon option prices and Black-Scholes option pricing residuals. See also Bates (1991, 1994) for the impact of skewed distributions on the relative prices of OTM call and put options, and Shastri and Wethyavivorn (1987) for some illustrations of implicit volatility patterns under alternate distributional hypotheses.
26 Taylor and Xu (1994) found that the term structure of implicit volatilities from foreign currency options reversed slope every few months over 1985-89.


Finally, studies that look at both call and put options have compared implicit volatilities from the two and reported significant differences; e.g., Whaley's (1986) study of 1983 S&P 500 futures options. There is no obvious theoretical explanation why the two should diverge, since put-call parity implies that European call and put options of identical moneyness and maturity should have identical implicit volatilities. Whaley's results are probably attributable to the fact that the puts have a lower average strike price than the calls, 27 so that the put-call comparison is picking up the moneyness biases also reported in Whaley (1986). Bates (1991) found little difference between at-the-money call and put prices on S&P 500 futures over 1985-87, indicating comparable implicit volatilities.

Alternate nonparametric and parametric methods also exist that shed light on which distributional hypotheses would be more consistent with observed option prices. The "skewness premium," or percentage deviation between call and put prices for options comparably out-of-the-money, is shown in Bates (1991, 1994) to be a useful diagnostic of which distributions are consistent with the skewness implicit in option prices. The intuition is that since OTM call and put options pay off only under realizations in the upper and lower tails, respectively, the relative price of those options is a direct indication of asymmetries in the tails. A related measure based on implicit standard deviations is in Gemmill (1991). Multiparameter distributions that include the lognormal as a special case have been fitted to daily option prices; examples include the constant elasticity of variance model used by MacBeth and Merville (1980) and Emmanuel and MacBeth (1982); the pure-jump model used by Borensztein and Dooley (1987); and the jump-diffusion model used by Bates (1991, 1996a). Finally, Dupire (1994), Derman and Kani (1994), and Rubinstein (1994) have proposed estimating implicit distributions using an "implied binomial tree" methodology, which can be viewed as a flexible generalization of the constant elasticity of variance model.

Instantaneous maturity effects clearly reject the original Black-Scholes assumption of a flat term structure of implicit volatilities. Furthermore, the term structure of at-the-money implicit volatilities is typically suggestive of a mean-reverting volatility process: upward sloping when short-term implicit volatilities are low, inverted when short-term volatilities are high. See Taylor and Xu (1994) for evidence from currency options, and Stein (1989) for evidence from S&P 100 index options.

Option pricing residuals, implicit volatility patterns, and implicit parameter estimates from stock options indicate that there is no single alternative distributional hypothesis that can eliminate the Black-Scholes strike price biases. The biases change sign over time, indicating changes in implicit skewness relative to the slightly positively skewed lognormal distribution underlying Black-Scholes.

27 See Table II in Whaley (1986). The average strike price is relevant because Whaley's implicit standard deviation measure is transaction-weighted.


For instance, evidence favoring a distribution less positively skewed than the lognormal and possibly negatively skewed has been found by Rubinstein (1985) for 30 stock options over August 1976-October 1977; by MacBeth and Merville (1980) and Emmanuel and MacBeth (1982) for 6 stock options in 1976; by Chen and Welsh (1993) for the fourth quarter of 1979; and by Culumovic and Welsh (1994) for stock options in the six quarters following the stock market crash of October 19, 1987. By contrast, evidence favoring a distribution more positively skewed than the lognormal has been found by Rubinstein (1985) for October 1977-August 1978; by Emmanuel and MacBeth (1982) for most of 1978; by Chen and Welsh (1993) for 1978 and most of 1979; by Karolyi (1993) for 74 stock options over 1984-85; and by Culumovic and Welsh (1994) for the last three quarters of 1989. And while there is a tendency for most stocks to exhibit similar moneyness patterns at the same time, 28 Culumovic and Welsh found that this is not fully reliable over 1987-89.

Stock index options also evince substantial evolution in moneyness biases over time. Whaley (1986) documented S&P 500 futures option residuals in 1983 (the first year of trading) that were consistent with a distribution more negatively skewed than the lognormal. Sheikh (1991) examined ISD patterns for options on the S&P 100 index over 1983-85, and found relatively negatively skewed distributions in 1983-84 and leptokurtic distributions of mixed skewness in 1985. Bates (1991) found substantial evolution in implicit skewness in S&P 500 futures options over 1985-87: positive in 1985, roughly symmetric over most of 1986, and periods of substantial negative skewness in late 1986, early and mid-87, and following the stock market crash in October 1987. Bates (1994) found persistent and strongly negative implicit skewness in S&P 500 futures options throughout the post-crash period of October 20, 1987 to December 31, 1993. A comparison of Culumovic and Welsh (1994) and Bates (1994) indicates that the moneyness biases in stock index options were at times of opposite sign from those observed contemporaneously in most stock options.

Foreign currency option pricing biases can roughly be divided into two periods: the 1983-87 period when options on foreign currencies and foreign currency futures were first introduced on centralized exchanges and the dollar was initially quite strong, and the subsequent 1988-92 period. The early years of the currency option markets were characterized by substantial positive implicit skewness (on foreign currencies) and leptokurtosis. Bodurtha and Courtadon (1987) found option pricing residuals from five foreign currency options over 1983-85 that were consistent with a distribution more positively skewed than the lognormal for all currencies. Estimates of pure-jump parameters on the same data base by Borensztein and Dooley yielded substantial positive implicit skewness, 29 as did implicit parameter estimates for pooled 1984-85 and 1986-87 Deutschemark options by Bates (1996b) using stochastic volatility and stochastic volatility/jump-diffusion models.

28 See, e.g., the comovements in stock-specific CEV parameter estimates reported in Emmanuel and MacBeth (1982). The CEV parameter is directly related to implicit skewness.
29 Since Borensztein and Dooley constrained jump magnitudes to be positive, negative skewness was precluded. Nevertheless, the model did allow for implicit skewness arbitrarily close to zero, via the possibility of a high-frequency low-amplitude jump component observationally equivalent to geometric Brownian motion.


Exceptions are Adams and Wyatt (1987), who used 1983 closing data, and Shastri and Tandon (1987), who used 1983-84 transactions data. These papers regressed currency option pricing residuals on moneyness and maturity and found few clear-cut moneyness and maturity effects. It is possible that regression-based summaries of pricing biases are too crude, given intrinsic nonlinearities in residuals when both skewness and leptokurtosis are present. Hsieh and Manas-Anton (1988) found implicit volatility patterns in 1984 Deutschemark futures options roughly consistent with a leptokurtic, positively skewed distribution. Bates (1996a) found substantial positive implicit skewness in DM futures options over 1984-87, especially during the appreciating-dollar period of 1984 and early 1985.

The 1987-92 period appears to have been predominantly characterized by a leptokurtic but roughly symmetric distribution implicit in currency options. Ben Khelifa (1991) found that a "volatility smile" was typically observed in five currency options over 1984-89; Cao (1992) found similar results for the 1988 Deutschemark options. Implicit parameter estimates on pooled DM options data over 1988-89 and 1990-91 in Bates (1996b) using a stochastic volatility/jump-diffusion model indicate overall a leptokurtic, symmetric distribution. Daily implicit parameter estimates on DM and yen futures options over 1986-92 in Bates (1996a) indicate oscillating skewness that is small in magnitude relative to 1984-85 levels. The oscillations are typically but not invariably synchronized across the two currency options, and are strongly correlated with the relative trading activity in calls versus puts.

The historical fluctuations in the sign of implicit skewness observed in stock, stock index, and currency options imply that none of the current alternative distributional hypotheses can consistently outperform Black-Scholes with regard to fitting option prices. All current models are consistently either more or less skewed than the lognormal. We need models of time-varying skewness, to complement our existing models of time-varying volatility. Furthermore, many of the existing alternate models do not differ substantially from the lognormal. Thus, while Rubinstein (1985) and Sheikh (1991) argue that volatility patterns are at times consistent with "leverage" models of equity, Bates (1991, 1994) points out that leverage models imply future stock price distributions intermediate between the normal and lognormal - a very narrow range compared with values of implicit skewness typically observed. A similar point emerges from MacBeth and Merville's (1980) and Emmanuel and MacBeth's (1982) estimates of constant elasticity of variance parameters well outside the 0 < p < 1 leverage range. Implicit skewness is not only time-varying, but can also be large relative to many standard models.


5. Implicit parameter tests of alternate distributional hypotheses

The interpretation of Black-Scholes option pricing biases as evidence of skewed and/or leptokurtic distributions is of course premised upon option prices being representative of the underlying risk-neutral distribution. An alternate hypothesis is that the options are mispriced, either because of market frictions or possibly because of data problems. For instance, as discussed in Section 2.3, option price violations of intrinsic-value lower bounds are commonly observed - probably because of synchronization error between option and asset price data. Canina and Figlewski (1993) point out that the common practice of throwing out the violations involves one-sided data censoring, biasing upward average in-the-money option prices. If options are correctly priced, then any abnormalities implicit in option prices should be reflected in the underlying time series - subject, as always, to the caveat that the risk-neutral and actual distributions can differ.

There have, however, been relatively few tests of the informativeness of implicit distributions inferred under alternate distributional hypotheses. Much of implicit parameter estimation has been essentially descriptive: an examination of what would better fit option prices. Whether these implicit parameters are plausible when measured against the time series properties of the underlying asset price has been less thoroughly examined.

Part of the reason is that inferring parameters from American options under alternative distributional hypotheses is typically computationally intensive. Stochastic volatility models involve an additional state variable, dramatically increasing the cost of finite-difference methods. Finite-difference methods for jump-diffusions have similarly higher costs, although Bates (1991) develops a good approximation for quickly evaluating American options on jump-diffusion processes. And although American option evaluation under CEV processes is simplified by a transformation of variables discussed in Nelson and Ramaswamy (1990), the transformation can only be used in the limited and uninteresting parameter range 0 ≤ p ≤ 2 (Bates (1991)). An often-exploited loophole is that American option prices are well approximated by European prices in some cases. Furthermore, there are more implicit parameters to be estimated from option prices than the single volatility parameter of the geometric Brownian motion model. Nonlinear multi-parameter techniques such as quadratic hill-climbing can be used, but require substantially more option evaluations. Globally optimal implicit parameter estimates cannot be guaranteed for these more general models. 30

The sections below discuss the limited existing research on implicit parameter-based tests of various alternative distributional hypotheses, with an emphasis on the testable predictions of these alternate specifications.

30 Bates (1991, 1996a) frequently found multiple locally optimal equilibria when inferring 4 jump-diffusion parameters daily from stock index and currency futures options.
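Since globally optimal implicit parameter estimates cannot be guaranteed, one common safeguard is simply to restart the nonlinear fit from several initial values and keep the best local optimum. A generic sketch; the two-parameter "model" is a stand-in, not any of the option pricing models discussed here.

```python
import numpy as np
from scipy.optimize import least_squares

def model_prices(params, strikes):
    """Stand-in for a multi-parameter option pricing model (purely illustrative)."""
    a, b = params
    return a * np.exp(-b * (strikes - 100.0)**2 / 1000.0)

rng = np.random.default_rng(3)
strikes = np.linspace(80.0, 120.0, 21)
observed = model_prices([5.0, 2.0], strikes) + 0.05 * rng.standard_normal(strikes.size)

best = None
for start in [(1.0, 0.5), (5.0, 5.0), (10.0, 0.1)]:     # several starting values
    fit = least_squares(lambda p: model_prices(p, strikes) - observed, x0=np.array(start))
    if best is None or fit.cost < best.cost:
        best = fit                                       # keep the lowest-cost local optimum
print(best.x, best.cost)
```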

5.1. Constant elasticity of variance processes


The constant elasticity of variance (CEV) model predicts that both asset return volatility and Black-Scholes implicit volatilities should change deterministically over time as a function of the underlying asset price. Whereas the original MacBeth and Merville (1980) implicit CEV parameter estimation was essentially descriptive of moneyness biases, subsequent papers have tested the above propositions to some extent. Emmanuel and MacBeth (1982) found that daily implicit CEV parameters varied over 1976 and 1978, yielding implicit distributions less positively skewed than the lognormal and sometimes negatively skewed over 1976 for 6 stock options, and distributions more positively skewed than the lognormal over April-November 1978 for 4 out of 6 stock options. Since stock return volatility innovations were negatively correlated with stock returns in 1976 and in 1978, only the 1976 option pricing patterns were qualitatively consistent with observed price/volatility correlations. Furthermore, Emmanuel and MacBeth found little ability of the CEV model to fit next month's option prices better than Black-Scholes conditional on the stock price change over the month, although results were better for 1976 than for 1978. There was some ability to outpredict Black-Scholes' forecast of the next day's option prices - probably because of serial correlation in the Black-Scholes moneyness biases "explained" by the CEV model. Peterson, Scott, and Tucker (1988) estimated the CEV parameters implicit in foreign currency options (5 currencies, 4 contracts, Sept. 1983-June 1984) at contract inception, and generally found implicit foreign currency distributions more positively skewed than the lognormal (p > 1). Their test of the forecasting power for future option prices essentially indicates that the moneyness biases captured by the CEV model were persistent at 1-3 day horizons, but that the predicted changes in implicit volatilities given exchange rate changes were not discernable. Scott and Tucker (1989) found that CEV-based implicit volatilities did about the same as Black-Scholes in predicting actual currency volatility over 1983-87, despite substantial changes in exchange rates.

5.2. Stochastic volatility processes

At first blush, it does not appear possible to substantially refine the distributional predictions of the stochastic volatility model for asset returns beyond the existing tests of whether implicit volatilities from an ad hoc Black-Scholes model are unbiased and informationally efficient forecasts of future volatility. While in principle the volatilities inferred using a stochastic volatility model are less biased than an at-the-money Black-Scholes implicit volatility, the bias appears small for standard estimates of the volatility of volatility. Second, the ad hoc approach, by computing sample variances over options' lifetimes, effectively captures any volatility changes that would be predicted by a stochastic volatility model. Finally, although stochastic volatility models predict conditionally and unconditionally leptokurtic distributions, the magnitude is small relative to sample leptokurtosis.


There are, however, two additional testable distributional predictions from stochastic volatility models. First, the stochastic volatility model typically predicts volatility changes relative to the Black-Scholes assumption of constant volatility. Testing this requires a maturity mismatch between options and time series; e.g., testing whether daily or weekly asset return volatility subsequently tends to increase (decline) whenever the term structure of implicit volatilities is upward sloping (inverted). Second, stochastic volatility models attribute any skewness implicit in option prices to a corresponding correlation between volatility and asset return shocks. As with CEV models, whether the predicted correlations are in fact observed can be tested.

Stochastic volatility models contain a number of testable predictions for the time series properties of implicit volatilities - or, equivalently, for the stochastic evolution of option prices. First, since stochastic volatility option pricing models are premised upon an explicit volatility process, whether the time series properties of volatilities inferred from option prices are consistent with the postulated process can be tested. 31 Probably the most important issue is whether implicit volatilities actually follow the one-factor mean-reverting AR(1) specification typically postulated for some transform of volatility. Issues regarding the volatility of volatility and whether implicit volatilities follow a diffusion can also be examined.

Stein (1989) argued that the observed average term structure of S&P 100 implicit volatilities over December 1983 to September 1987 was inconsistent with the time series properties of implicit volatilities. Stein's argument was based on two tests. First, the average half-life to volatility shocks implicit in the term structure was 17.9 weeks, substantially and statistically significantly higher than the 5.4-week half-life estimated from the time series properties of implicit volatilities. Stein described this difference as "overreaction" of long-maturity options to short-maturity volatility shocks. Second, Stein tested and rejected the expectations hypothesis that the current forecast of next month's 1-month implicit volatility inferred from 1- and 2-month options is unbiased and informationally efficient. The former test is heavily dependent upon Stein's AR(1) specification for volatility; the latter test less so. Stein's results are disputed by Diz and Finucane (1993), who found no evidence of overreaction over December 1985 - November 1988 under either test - not even for a 1985-87 data sample that overlaps with Stein's data. 32 Diz and Finucane attribute the difference in results to their use of cleaner intradaily data. Omission of the early years of the S&P 100 index option market may also have had an effect. Analyses of the term structure of implicit volatilities from foreign currency options have found qualitative agreement with the time series properties of implicit volatilities.

31 A similar question regarding the compatibility of the time series properties of interest rates with postulated bond pricing models is a central issue in the bond pricing literature.
32 Diz and Finucane report in their paper only the AR(1)-based tests. They also tested and could not reject the expectations hypothesis (private communication).
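The half-life comparisons in Stein's first test map directly into AR(1) persistence at the sampling frequency; a two-line illustration using the 5.4-week and 17.9-week half-lives quoted above (the functional form assumes a simple AR(1), as in Stein's specification).

```python
import numpy as np

def rho_from_halflife(h):
    """Weekly AR(1) coefficient implied by a half-life of h weeks."""
    return 0.5 ** (1.0 / h)

def halflife_from_rho(rho):
    """Half-life (in sampling intervals) of shocks to an AR(1) process with coefficient rho."""
    return np.log(0.5) / np.log(rho)

print(rho_from_halflife(5.4))    # persistence estimated from the time series of implicit volatilities
print(rho_from_halflife(17.9))   # persistence implicit in the average term structure
```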


Taylor and Xu (1994) found that both the term structure and the time series estimates over 1985-89 yielded a typical half-life to foreign currency volatility shocks around 1 month. Bates (1996b) found that the term structure from Deutschemark options yielded plausible half-lives of 1-3 months over 1986-87, 1988-89, and 1990-91. The earliest 1984-85 period had 12-24 month half-lives, sharply inconsistent with observed volatility mean reversion. Campa and Chang (1995) tested and failed to reject the expectations hypothesis using December 1989 to March 1992 volatility quotes from the interbank foreign currency option market.

Bates (1996b) also found that the volatility of volatility inferred from Deutschemark option prices under a stochastic volatility model was significantly different from the volatility of implicit volatilities. Ludicrously high values of the volatility of volatility were necessary to generate implicit leptokurtosis of a magnitude consistent with the "volatility smile" in currency options. Under such values, implicit volatilities should be repeatedly reflecting off zero and attaining enormous values; neither was observed. The implication is that either the implicit leptokurtosis is attributable to fat-tailed exchange rate shocks, or options are mispriced. A further implication is that volatile volatility imparts little bias to Black-Scholes implicit volatilities under "reasonable" values of the volatility of volatility.
5.3. Jump processes

Most papers that estimate jump processes implicit in option prices have been descriptive. And although jump processes appear qualitatively consistent with many features of asset return distributions (e.g., leptokurtosis that is more pronounced at daily and weekly frequencies than at monthly or quarterly), there have been very few tests of whether the distributions inferred from option prices using a model with jumps are in fact consistent with observed asset returns. Borensztein and Dooley (1987), for instance, showed that a substantially positively skewed pure-jump model fitted foreign currency option prices better in 1983-85 than the Black-Scholes model, but did not test the model's plausibility against exchange rate data. Bates (1991) used jump-diffusion parameters inferred daily from S&P 500 futures options over 1985-87 to gauge crash fears prior to the stock market crash of 1987. Although there were periods when the jump-diffusion model fitted option prices substantially better than the nested geometric Brownian motion model, whether those periods represented ex post a better description of the conditional distribution of futures prices was not tested. 33

Testing jump-diffusion implicit parameters against no-jump implicit volatilities on asset prices is primarily a test of third and fourth moments, since the implicit second moments are typically comparable (Bates (1991, 1996a)).

33 Pre-crash option prices in September and October 1987 certainly did not predict a stock market crash.


Bates (1996a) inferred jump-diffusion parameters daily from 1-4 month Deutschemark and yen futures options over 1984-92 and 1986-92, respectively. For Deutschemark options, the higher-moment distributional abnormalities inferred from option prices did in fact contain statistically significant information for subsequent abnormal distributions in weekly log-differenced $/DM futures prices, although the predictions were not unbiased. Yen futures options contained no information whatsoever for subsequent $/yen futures price distributions. Bates (1996b) estimated a stochastic volatility/jump-diffusion process implicit in Deutschemark options over 1984-91, imposing constant parameters over the full data sample. An infrequent (biannual) substantial jump process was inferred from option prices, qualitatively consistent with one "outlier" in weekly log-differenced $/DM futures prices over the period. Owing to a fundamental lack of power when testing an infrequent jump hypothesis on eight years of data, the hypothesis of no jumps was as plausible as the hypothesis that jump magnitudes matched those inferred from option prices.

6. Summary and conclusions

This paper has argued that the central empirical issue in option pricing is whether the distributions implicit in option prices are consistent with the conditional distributions of the underlying asset prices. Tests of consistency are almost invariably conducted within the framework of a particular distributional hypothesis, and therefore to some extent involve a joint test of consistency and of that distributional hypothesis. The most common framework by far has been the geometric Brownian motion hypothesis underlying the Black-Scholes model. This one-parameter model has been used extensively to examine whether volatility assessments inferred from option prices are consistent with the conditional volatility of the underlying asset price. Results have been mixed: implicit volatilities from most currency options are relatively unbiased forecasts of future currency volatility, whereas substantial biases have been found in implicit volatilities from stock and stock index options. There also seems to have been substantial evolution in the sophistication of option markets. Results including the early years of options markets typically involve more noise (e.g., more arbitrage violations) and a greater divergence from the time series properties of asset prices and implicit volatilities than found in studies from later periods.

By comparison with the studies of volatility compatibility between options and time series, studies of expected volatility changes and of higher moments are still in their infancy. To some degree, this is appropriate, given a somewhat hierarchical ordering among these three issues. If the volatility assessments diverge between options and time series, there is little reason to believe that moving to a more complicated model with time-varying variances or fat-tailed shocks will yield greater agreement regarding conditional distributions. The (risk-neutral) expected average variance over the lifetime of the option is the single most important determinant of near-the-money option prices.


Other factors that induce skewness or excess kurtosis are typically second-order by comparison. 34 And although model misspecification can in principle affect volatility inferences from option prices, the alternate models considered hitherto suggest that misspecification does not have a large impact in practice.

It is of course important to keep in mind alternate explanations for observed deviations between option prices and time series. Option prices are not actuarially fair when compensation for systematic risk is required. Volatility risk premia could in principle explain a divergence between implicit variances and expected average variances over a finite horizon. It would, however, be easier to have confidence in this explanation if there had been more serious work in an asset pricing context on the plausible magnitude of these risk premia. The possibility that reported divergences represent data synchronization problems, bid-ask spreads, or outright errors in the option pricing methodology must also be kept in mind. Small errors can have large effects in option pricing research; e.g., using an option maturity that is off by a few days.

Nevertheless, option prices do indicate an assortment of interesting phenomena that are worth modelling and testing against the time series properties of the underlying asset price. Predicted volatility changes and higher-moment phenomena are implicit in option prices; whether they are subsequently realized by the underlying asset price requires additional investigation. Fluctuations in moneyness biases over time suggest the need for models of time-varying skewness. It may be that these phenomena are attributable to market microstructure effects. The fluctuations in implicit skewness are highly correlated with relative trading activity in calls versus puts for foreign currency futures options (Bates 1996a) and for S&P 500 futures options (Bates 1994). An alternate hypothesis is, for instance, that it represents price-gouging by option writers as the relative demand for out-of-the-money calls versus puts by the end-users of options fluctuates. But the initial null hypothesis must always be that options are in fact priced rationally - i.e., consistently with the time series properties of the underlying asset price. Conclusive tests of that hypothesis are an important and necessary first step before alternative explanations can be put forward.

References
Adams, P. D. and S. B. Wyatt (1987). Biases in option prices: Evidence from the foreign currency option market. J. Banking Finance 11, 549-562.
Ahn, C. M. and H. E. Thompson (1988). Jump-diffusion processes and the term structure of interest rates. J. Finance 43, 155-174.
Allegretto, W., G. Barone-Adesi and R. J. Elliott (1995). Numerical evaluation of the critical price and American options. Europ. J. Finance 1, 69-78.



Amin, K. I. and R. A. Jarrow (1991). Pricing foreign currency options under stochastic interest rates. J. Internat. Money Finance 10, 310-329.
Amin, K. I. and V. K. Ng (1994). A comparison of predictable volatility models using option data. Research Department Working Paper, International Monetary Fund.
Ball, C. A. and W. N. Torous (1985). On jumps in common stock prices and their impact on call option pricing. J. Finance 40, 155-173.
Barone-Adesi, G. and R. E. Whaley (1987). Efficient analytic approximation of American option values. J. Finance 42, 301-320.
Bates, D. S. (1988). Pricing options on jump-diffusion processes. Rodney L. White Center Working Paper 37-88, Wharton School.
Bates, D. S. (1991). The crash of '87: Was it expected? The evidence from options markets. J. Finance 46, 1009-1044.
Bates, D. S. (1994). The skewness premium: Option pricing under asymmetric processes. Advances in Futures and Options Research, to appear.
Bates, D. S. (1996a). Dollar jump fears, 1984-1992: Distributional abnormalities implicit in currency futures options. J. Internat. Money Finance 15, 65-93.
Bates, D. S. (1996b). Jumps and stochastic volatility: Exchange rate processes implicit in PHLX Deutsche mark options. Rev. Financ. Stud. 9, 69-107.
Beckers, S. (1980). The constant elasticity of variance model and its implications for option pricing. J. Finance 35, 661-673.
Beckers, S. (1981). Standard deviations implied in option prices as predictors of future stock price variability. J. Banking Finance 5, 363-381.
Ben Khelifa, Z. (1991). Parametric and nonparametric tests of the pure diffusion model adjusted for the early exercise premium applied to foreign currency options. In: Essays in International Finance, Wharton School Dissertation, 1-48.
Bhattacharya, M. (1983). Transactions data tests of efficiency of the Chicago Board Options Exchange. J. Financ. Econom. 12, 161-185.
Black, F. (1976a). Studies of stock price volatility changes. Proceedings of the 1976 Meetings of the American Statistical Association, 177-181.
Black, F. (1976b). The pricing of commodity contracts. J. Financ. Econom. 3, 167-179.
Black, F. and M. Scholes (1972). The valuation of option contracts in a test of market efficiency. J. Finance 27, 399-417.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Politic. Econom. 81, 637-659.
Blomeyer, E. C. and H. Johnson (1988). An empirical examination of the pricing of American put options. J. Financ. Quant. Anal. 23, 13-22.
Bodurtha, J. N. and G. R. Courtadon (1986). Efficiency tests of the foreign currency options market. J. Finance 41, 151-162.
Bodurtha, J. N. and G. R. Courtadon (1987). Tests of an American option pricing model on the foreign currency options market. J. Financ. Quant. Anal. 22, 153-167.
Bollerslev, T., R. Y. Chou and K. F. Kroner (1992). ARCH modeling in finance. J. Econometrics 52, 5-59.
Borensztein, E. R. and M. P. Dooley (1987). Options on foreign exchange and exchange rate expectations. IMF Staff Papers 34, 642-680.
Boyle, P. P. and A. Ananthanarayanan (1977). The impact of variance estimation in option valuation models. J. Financ. Econom. 5, 375-387.
Brennan, M. J. (1979). The pricing of contingent claims in discrete time models. J. Finance 34, 53-68.
Brenner, M. and D. Galai (1986). Implied interest rates. J. Business 59, 493-507.
Broadie, M. N. and J. Detemple (1996). American option valuation: New bounds, approximations, and a comparison of existing bounds. Rev. Financ. Stud. 9, to appear.
Butler, J. S. and B. Schachter (1986). Unbiased estimation of the Black/Scholes formula. J. Financ. Econom. 15, 341-357.


Butler, J. S. and B. Schachter (1994). Unbiased estimation of option prices: An examination of the return from hedging options against stocks. Advances in Futures and Options Research 7, 167-176.
Campa, J. M. and P. H. K. Chang (1995). Testing the expectations hypothesis on the term structure of implied volatilities in foreign exchange options. J. Finance 50, 529-547.
Canina, L. and S. Figlewski (1993). The informational content of implied volatility. Rev. Financ. Stud. 6, 659-682.
Cao, C. (1992). Pricing foreign currency options with stochastic volatility. University of Chicago Working Paper.
Carr, P., R. A. Jarrow and R. Myneni (1992). Alternative characterizations of American put options. Math. Finance 2, 87-106.
Chen, D. and R. Welch (1993). Relative mispricing of American calls under alternative dividend models. Advances in Futures and Options Research 6.
Chesney, M. and L. O. Scott (1989). Pricing European currency options: A comparison of the modified Black-Scholes model and a random variance model. J. Financ. Quant. Anal. 24, 267-284.
Chiras, D. P. and S. Manaster (1978). The information content of option prices and a test of market efficiency. J. Financ. Econom. 6, 213-234.
Choi, J. Y. and K. Shastri (1989). Bid-ask spreads and volatility estimates: The implications for option pricing. J. Banking Finance 13, 207-219.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985a). An intertemporal general equilibrium model of asset prices. Econometrica 53, 363-384.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985b). A theory of the term structure of interest rates. Econometrica 53, 385-407.
Cox, J. C. and S. A. Ross (1976a). A survey of some new results in financial option pricing theory. J. Finance 31, 383-402.
Cox, J. C. and S. A. Ross (1976b). The valuation of options for alternative stochastic processes. J. Financ. Econom. 3, 145-166.
Cox, J. C. and M. Rubinstein (1985). Options Markets. Prentice-Hall, Englewood Cliffs, New Jersey.
Culumovic, L. and R. L. Welch (1994). A reexamination of constant-variance American call mispricing. Advances in Futures and Options Research 7, 177-221.
Day, T. E. and C. M. Lewis (1988). The behaviour of the volatility implicit in the prices of stock index options. J. Financ. Econom. 22, 103-122.
Day, T. E. and C. M. Lewis (1992). Stock market volatility and the information content of stock index options. J. Econometrics 52, 267-287.
Derman, E. and I. Kani (1994). Riding on a smile. Risk 7, 32-39.
Diz, F. and T. J. Finucane (1993). Do the options markets really overreact? J. Futures Markets 13, 298-312.
Dupire, B. (1994). Pricing with a smile. Risk 7, 18-20.
Ederington, L. H. and J. H. Lee (1993). How markets process information: News releases and volatility. J. Finance 48, 1161-1192.
Emmanuel, D. C. and J. D. MacBeth (1982). Further results on the constant elasticity of variance option pricing model. J. Financ. Quant. Anal. 17, 533-554.
Engle, R. F., A. Kane and J. Noh (1993). Index-option pricing with stochastic volatility and the value of accurate variance forecasts. Advances in Futures and Options Research 6, 393-415.
Engle, R. F., A. Kane and J. Noh (1994). Forecasting volatility and option prices of the S&P 500 index. J. Derivatives 2, 17-30.
Engle, R. F. and C. Mustafa (1992). Implied ARCH models from options prices. J. Econometrics 52, 289-311.
Evnine, J. and A. Rudd (1985). Index options: The early evidence. J. Finance 40, 743-756.
Fama, E. F. (1984). Forward and spot exchange rates. J. Monetary Econom. 14, 319-338.
Fleming, J. (1994). The quality of market volatility forecasts implied by S&P 100 index option prices. Rice University Working Paper.
Fleming, J., B. Ostdiek and R. E. Whaley (1996). Trading costs and the relative rates of price discovery in the stock, futures, and option markets. J. Futures Markets 16, 353-387.


Franks, J. R. and E. S. Schwartz (1991). The stochastic behaviour of market variance implied in the prices of index options. Econom. J. 101, 1460-1475.
French, D. W. and D. W. Martin (1987). The characteristics of interest rates and stock variances implied in option prices. J. Econom. Business 39, 279-288.
Froot, K. A. and R. H. Thaler (1990). Anomalies: Foreign exchange. J. Econom. Perspectives 4, 179-192.
Galai, D. (1979). A convexity test for traded options. Quart. Rev. Econom. Business 19, 83-90.
Galai, D. (1983). A survey of empirical tests of option-pricing models. In: Menachem Brenner, ed., Option Pricing: Theory and Applications. Lexington Books, Lexington, MA, 45-80.
Garman, M. B. and M. Klass (1980). On the estimation of security price volatilities from historical data. J. Business 53, 67-78.
Garman, M. B. and S. W. Kohlhagen (1983). Foreign currency option values. J. Internat. Money Finance 2, 231-237.
Gemmill, G. (1991). Using options' prices to reveal traders' expectations. City University Business School (London) Working Paper.
George, T. J. and F. A. Longstaff (1993). Bid-ask spreads and trading activity in the S&P 100 index options market. J. Financ. Quant. Anal. 28, 381-398.
Geske, R. and R. Roll (1984). On valuing American call options with the Black-Scholes European formula. J. Finance 39, 443-455.
Gibbons, M. and C. Jacklin (1988). CEV diffusion estimation. Stanford University Working Paper.
Grabbe, J. O. (1983). The pricing of call and put options on foreign exchange. J. Internat. Money Finance 2, 239-253.
Grundy, B. D. (1991). Option prices and the underlying asset's return distribution. J. Finance 46, 1045-1069.
Hammer, J. A. (1989). On biases reported in studies of the Black-Scholes option pricing model. J. Econom. Business 41, 153-169.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimation. Econometrica 50, 1029-1054.
Hansen, L. P. and R. J. Hodrick (1980). Forward exchange rates as optimal predictors of future spot rates: An econometric analysis. J. Politic. Econom. 88, 829-853.
Harvey, A., E. Ruiz and N. Shephard (1994). Multivariate stochastic variance models. Rev. Econom. Stud. 61, 247-264.
Harvey, C. R. and R. E. Whaley (1992a). Dividends and S&P 100 index option valuation. J. Futures Markets 12, 123-137.
Harvey, C. R. and R. E. Whaley (1992b). Market volatility prediction and the efficiency of the S&P 100 index option market. J. Financ. Econom. 31, 43-74.
Heston, S. L. (1993a). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev. Financ. Stud. 6, 327-344.
Heston, S. L. (1993b). Invisible parameters in option prices. J. Finance 48, 933-948.
Hilliard, J. E., J. Madura and A. L. Tucker (1991). Currency option pricing with stochastic domestic and foreign interest rates. J. Financ. Quant. Anal. 26, 139-151.
Ho, M. S., W. R. M. Perraudin and B. E. Sorensen (1996). A continuous time arbitrage pricing model with stochastic volatility and jumps. J. Business Econom. Statist. 14, 31-43.
Hodrick, R. J. (1987). The Empirical Evidence on the Efficiency of Forward and Futures Foreign Exchange Markets. Harwood Academic Publishers, New York.
Hsieh, D. A. and L. Manas-Anton (1988). Empirical regularities in the Deutsche mark futures options. Advances in Futures and Options Research 3, 183-208.
Hull, J. (1993). Options, Futures, and Other Derivative Securities. 2nd ed. Prentice-Hall, Inc., New Jersey.
Hull, J. and A. White (1987). The pricing of options on assets with stochastic volatility. J. Finance 42, 281-300.
Johnson, H. and D. Shanno (1987). Option pricing when the variance is changing. J. Financ. Quant. Anal. 22, 143-151.


Jones, E. P. (1984). Option arbitrage and strategy with large price changes. J. Financ. Econom. 13, 91-113.
Jorion, P. (1988). On jump processes in the foreign exchange and stock markets. Rev. Financ. Stud. 1, 427-445.
Jorion, P. (1995). Predicting volatility in the foreign exchange market. J. Finance 50, 502-528.
Karolyi, G. A. (1993). A Bayesian approach to modeling stock return volatility for option valuation. J. Financ. Quant. Anal. 28, 579-594.
Kim, I. J. (1990). The analytic valuation of American options. Rev. Financ. Stud. 3, 547-572.
Kim, S. and N. Shephard (1993). Stochastic volatility: New models and optimal likelihood inference. Nuffield College Working Paper, Oxford University.
Lamoureux, C. G. and W. D. Lastrapes (1993). Forecasting stock-return variance: Toward an understanding of stochastic implied volatilities. Rev. Financ. Stud. 6, 293-326.
Latané, H. A. and R. J. Rendleman (1976). Standard deviations of stock price ratios implied in option prices. J. Finance 31, 369-381.
Lewis, K. K. (1995). Puzzles in international financial markets. In: G. Grossman and K. Rogoff, eds., Handbook of International Economics, Vol. 3. North Holland, Amsterdam, 1911-1967.
Lo, A. W. (1986). Statistical tests of contingent-claims asset-pricing models: A new methodology. J. Financ. Econom. 17, 143-173.
Lo, A. W. and J. Wang (1995). Implementing option pricing formulas when asset returns are predictable. J. Finance 50, 87-129.
Lyons, R. K. (1988). Tests of the foreign exchange risk premium using the expected second moments implied by option pricing. J. Internat. Money Finance 7, 91-108.
MacBeth, J. D. and L. J. Merville (1980). Tests of the Black-Scholes and Cox call option valuation models. J. Finance 35, 285-301.
MacMillan, L. W. (1987). Analytic approximation for the American put option. Advances in Futures and Options Research 1:A, 119-139.
Madan, D. B. and E. Seneta (1990). The Variance Gamma (V.G.) model for share market returns. J. Business 63, 511-525.
Maloney, K. J. and R. J. Rogalski (1989). Call-option pricing and the turn of the year. J. Business 62, 539-552.
McCulloch, J. H. (1987). Foreign exchange option pricing with log-stable uncertainty. In: Sarkis J. Khoury and Ghosh Alo, eds., Recent Developments in International Banking and Finance. Lexington Books, Lexington, MA.
Melino, A. and S. M. Turnbull (1990). Pricing foreign currency options with stochastic volatility. J. Econometrics 45, 239-265.
Melino, A. and S. M. Turnbull (1991). The pricing of foreign currency options. Canad. J. Economics 24, 251-281.
Merton, R. C. (1973). Theory of rational option pricing. Bell J. Econom. Mgmt. Sci. 4, 141-183.
Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. J. Financ. Econom. 3, 125-144.
Merville, L. J. and D. R. Pieptea (1989). Stock-price volatility, mean-reverting diffusion, and noise. J. Financ. Econom. 24, 193-214.
Myers, R. J. and S. D. Hanson (1993). Pricing commodity options when the underlying futures price exhibits time-varying volatility. Amer. J. Agricult. Econom. 75, 121-130.
Naik, V. (1993). Option valuation and hedging strategies with jumps in the volatility of asset returns. J. Finance 48, 1969-1984.
Naik, V. and M. H. Lee (1990). General equilibrium pricing of options on the market portfolio with discontinuous returns. Rev. Financ. Stud. 3, 493-522.
Nelson, D. B. (1990). ARCH models as diffusion approximation. J. Econometrics 45, 7-38.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347-370.
Nelson, D. B. (1992). Filtering and forecasting with misspecified ARCH models I: Getting the right variance with the wrong model. J. Econometrics 52, 61-90.


Nelson, D. B. and K. Ramaswamy (1990). Simple binomial processes as diffusion approximations in financial models. Rev. Financ. Stud. 3, 393-430.
Ogden, J. P. and A. L. Tucker (1987). Empirical tests of the efficiency of the currency futures options markets. J. Futures Markets 7, 695-703.
Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. J. Business 53, 61-65.
Patell, J. M. and M. A. Wolfson (1979). Anticipated information releases reflected in call option prices. J. Account. Econom. 1, 117-140.
Peterson, D. R., E. Scott and A. L. Tucker (1988). Tests of the Black-Scholes and constant elasticity of variance currency call option valuation models. J. Financ. Research 11, 201-212.
Poterba, J. and L. Summers (1986). The persistence of volatility and stock market fluctuations. Amer. Econom. Rev. 76, 1142-1151.
Press, S. J. (1967). A compound events model for security prices. J. Business 40, 317-355.
Rabinovitch, R. (1989). Pricing stock and bond options when the default-free rate is stochastic. J. Financ. Quant. Anal. 24, 447-457.
Ramaswamy, K. and S. M. Sundaresan (1985). The valuation of options on futures contracts. J. Finance 40, 1319-1340.
Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell J. Econom. Mgmt. Sci. 7, 407-425.
Rubinstein, M. (1985). Nonparametric tests of alternative option pricing models using all reported trades and quotes on the 30 most active CBOE option classes from August 23, 1976 through August 31, 1978. J. Finance 40, 455-480.
Rubinstein, M. (1994). Implied binomial trees. J. Finance 49, 771-818.
Schmalensee, R. and R. R. Trippi (1978). Common stock volatility expectations implied by option premia. J. Finance 33, 129-147.
Scott, E. and A. L. Tucker (1989). Predicting currency return volatility. J. Banking Finance 13, 839-851.
Scott, L. O. (1987). Option pricing when the variance changes randomly: Theory, estimation, and an application. J. Financ. Quant. Anal. 22, 419-438.
Scott, L. O. (1992). The information content of prices in derivative security markets. IMF Staff Papers 39, 596-625.
Scott, L. O. (1994). Pricing stock options in a jump-diffusion model with stochastic volatility and interest rates: Applications of Fourier inversion methods. University of Georgia Working Paper.
Shastri, K. and K. Tandon (1986). On the use of European models to price American options in foreign currency. J. Futures Markets 6, 93-108.
Shastri, K. and K. Tandon (1987). Valuation of American options on foreign currency. J. Banking Finance 11, 245-269.
Shastri, K. and K. Wethyavivorn (1987). The valuation of currency options for alternate stochastic processes. J. Financ. Res. 10, 283-293.
Sheikh, A. M. (1989). Stock splits, volatility increases, and implied volatilities. J. Finance 44, 1361-1372.
Sheikh, A. M. (1991). Transaction data tests of S&P 100 call option pricing. J. Financ. Quant. Anal. 26, 459-475.
Sheikh, A. M. (1993). The behavior of volatility expectations and their effects on expected returns. J. Business 66, 93-116.
Stein, J. C. (1989). Overreactions in the options market. J. Finance 44, 1011-1023.
Stephan, J. A. and R. E. Whaley (1990). Intraday price change and trading volume relations in the stock and stock option markets. J. Finance 45, 191-220.
Sterk, W. (1983). Comparative performance of the Black-Scholes and Roll-Geske-Whaley option pricing models. J. Financ. Quant. Anal. 18, 345-354.
Stoll, H. R. and R. E. Whaley (1986). New option instruments: Arbitrageable linkages and valuation. Advances in Futures and Options Research 1:A, 25-62.


Taylor, S. J. and X. Xu (1994). The term structure of volatility implied by foreign exchange options. J. Financ. Quant. Anal. 29, 57-74.
Trautmann, S. and M. Beinert (1994). Stock price jumps and their impact on option valuation. University of Mainz (Germany) Working Paper.
Valerio, N. (1993). Valuation of cash-settlement options containing a wild-card feature. J. Financ. Engg. 2, 335-364.
Whaley, R. E. (1982). Valuation of American call options on dividend-paying stocks. J. Financ. Econom. 10, 29-58.
Whaley, R. E. (1986). Valuation of American futures options: Theory and empirical tests. J. Finance 41, 127-150.
Wiggins, J. B. (1987). Option values under stochastic volatility: Theory and empirical estimates. J. Financ. Econom. 19, 351-377.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved.


Peso Problems: Their Theoretical and Empirical Implications*

Martin D. D. Evans

* I am grateful to Jeff Frankel, Karen Lewis, James Lothian, Richard Lyons, and Stan Zin for their comments on an earlier draft.

This paper examines how the theoretical and empirical implications of asset pricing models are affected by the presence of a "peso problem"; a situation where the potential for discrete shifts in the distribution of future shocks to the economy affects the rational expectations held by market participants. The paper examines the ways in which "peso problems" can induce behavior in asset prices that apparently contradicts conventional rational expectations assumptions. This analysis covers the relationship between realized and expected returns, asset prices and fundamentals, and the determination of risk premia.

1. Introduction

One common feature of asset pricing models is that current asset prices incorporate market participants' expectations of future economic variables. When market participants act in a stable economic environment, their rational expectations are based on a subjective probability distribution for shocks hitting the economy that coincides with the distribution generating past realizations of variables. In an unstable environment, by contrast, expectations may be based on a subjective probability distribution that differs from the distribution generating past realizations if market participants rationally anticipate discrete shifts in the distribution of future shocks. The "peso problem" refers to the behavior of asset prices in this situation. In particular, "peso problem" models focus on how the potential for discrete shifts in the distribution of future shocks to the economy can affect the rational expectations held by market participants, and hence the behavior of asset prices. In this chapter, I shall review how the presence of "peso problems" can affect the predictions of standard asset pricing models. In particular, I shall show how discrete shifts in the distribution of economic determinants can induce behavior in


asset prices that apparently contradicts conventional rational expectations assumptions. Since these assumptions are widely used in empirical research, "peso problems" can have potentially far-reaching implications for the estimation and evaluation of asset pricing models.

Although the precise origins of the term "peso problem" are unknown, a number of economists attribute its first use to Milton Friedman in his examination of the Mexican peso market during the early 1970's. During this period, Mexican deposit rates remained substantially above U.S. dollar interest rates even though the exchange rate remained fixed at 0.08 dollars per peso. Friedman argued that this interest differential reflected the market's expectation of a devaluation of the peso. Subsequently, in August 1976, these expectations were justified when the peso was allowed to float and fell in value by 46% to a new rate of 0.05 dollars per peso. The first written discussion of the "peso problem" appears in Rogoff (1980). He argued that the behavior of Mexican peso-futures prices and spot exchange rates from June 1974 to June 1976 was consistent with participants anticipating the devaluation of the peso [see also Frankel (1980)]. Krasker (1980) and Lizondo (1983) provide models that make the reasoning behind this argument clear. Let s_{t+1} be the logarithm of the spot exchange rate (dollars per peso). From April 1954 to August 1976 the spot exchange rate was fixed at 0.08 dollars per peso, s_t = s̄. If s¹ (< s̄) is the level of the spot rate after devaluation, the expected spot rate can be written as
E[s_{t+1} | Ω_t] = π_t s¹ + (1 − π_t) s̄ ,

where π_t is the market's assessed probability that the peso will be devalued between period t and t + 1. While the peso remained fixed at s̄, the difference between the realized spot rate and the rate expected in the market was
s̄ − E[s_{t+1} | Ω_t] = π_t (s̄ − s¹) .

Thus, so long as market participants assessed there to be a positive probability of devaluation so that π_t > 0, their forecast errors would be systematically positive. This example illustrates how the potential for discrete events can affect the forecast errors made by market participants during periods where the events do not materialize. This idea lies at the heart of recent models that allow for the presence of "peso problems". One important difference between these models and the analysis of the Mexican peso market is that they generally do not focus on a single event. Rather, they examine the extent to which repeated but infrequent discrete shifts in the distribution of shocks hitting the economy could induce "peso problems" in the observed behavior of asset prices. This is an important distinction because "peso problem" models designed to explain the behavior of asset prices around a particular event have little predictive content. In the case of the Mexican peso, for example, the model places no restrictions on market expectations unless the probability of devaluation, π_t, and the new value for the exchange rate, s¹, are pinned down.
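The arithmetic of this example is easy to check numerically. The devaluation probability used below is purely hypothetical (the chapter does not report the market's assessed probability); the point is only that the pre-float forecast error s̄ − E[s_{t+1} | Ω_t] equals π_t(s̄ − s¹) and is positive whenever π_t > 0.

```python
import math

s_bar = math.log(0.08)   # log spot rate while the peso stayed fixed
s_1 = math.log(0.05)     # log spot rate after the (anticipated) devaluation
pi_t = 0.15              # hypothetical devaluation probability held by the market

expected = pi_t * s_1 + (1 - pi_t) * s_bar      # E[s_{t+1} | Omega_t]
error_while_fixed = s_bar - expected            # realized minus expected, pre-float
print(round(error_while_fixed, 4), round(pi_t * (s_bar - s_1), 4))  # identical, about 0.07
```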


The problem of how to identify market expectations in the presence of a "peso problem" is tricky. It is always possible that market expectations are being influenced by the possibility of discrete shifts in the distribution of economic determinants that are never observed in the data. In such circumstances, it is impossible to distinguish between rational expectations influenced by a "peso problem" and irrational expectations. Many recent models avoid these "pathological peso problems" by explicitly linking market expectations to discrete shifts estimated in the data. For this purpose, researchers have used variants on the regime switching model originally due to Hamilton (1988, 1989). Regime switching models provide a simple, tractable framework in which to identify the rational expectations of market participants influenced by the possibility of discrete shifts. Importantly, this modelling approach allows us to make a distinction between irrational expectations and the expectations of rational market participants affected by the presence of "peso problems".

In this chapter, I shall use the regime switching framework to discuss how the presence of "peso problems" can affect both the theoretical and empirical implications of asset pricing models. In recent years, "peso problem" models have been developed to examine the behavior of stock prices, interest rates and foreign exchange returns. This chapter makes no attempt to survey the general literature on these topics. Rather, I shall focus on the potential for "peso problem" models to shed light on some of the well-documented puzzles, such as the equity premium and forward premium puzzles. I begin, in Section 2, by considering how the presence of "peso problems" affects the properties of forecast errors made by rational market participants. Section 3 examines how the presence of "peso problems" can affect the relationships between asset prices and fundamentals. This analysis identifies the conditions under which regime switching in the process for fundamentals will lead to "peso problems". Section 4 considers how "peso problems" can affect the assessment of risk. Here I evaluate several recent models of the equity risk premium that employ regime switching. In Section 5, I consider a number of econometric issues that arise in the modelling of "peso problems". The paper concludes in Section 6 with a discussion of the directions future research on "peso problems" might usefully take.

2. Peso problems and forecast errors

Although "peso problems" can affect the behavior of asset prices through a number of different channels, in the literature researchers have paid most attention to their impact on the errors made by rational market participants when forecasting returns. In this section, I examine both the theoretical origins and empirical implications of these effects. I will begin by considering cases where market participants face uncertainty about the future regime. Here there exists a "pure peso problem" in the sense that there is no uncertainty about the current regime. I then consider the implications of "generalized peso problems". Here the


effects of "pure peso problems" and learning combine to alter the properties of forecast errors in cases where market participants are uncertain about both current and future regimes.

2.1. Pure peso problems

2.1.1. Theoretical implications

Let R_{t+1} be the return on an asset between periods t and t + 1. By definition, we can write this as the sum of the ex ante expected return held by market participants given information at t, E[R_{t+1} | Ω_t], and the forecast error:
R_{t+1} ≡ E[R_{t+1} | Ω_t] + e_{t+1} .

(1)

Under standard rational expectations assumptions, the forecast error, e_t, should have mean zero and be uncorrelated with variables in the market's information set, Ω_t.

To see how these properties of the forecast errors are affected by the presence of discrete shifts in the returns process, consider the simple case where R_{t+1} can switch between two processes. Throughout this chapter I shall assume that switches in the process are indicated by changes in a discrete-valued variable, Z_t = {0, 1}. Let R_{t+1}(z) denote realized returns in regime Z_{t+1} = z. Our aim, therefore, is to consider the behavior of the forecast errors, R_{t+1}(z) − E[R_{t+1} | Ω_t]. For this purpose, it is useful to decompose realized returns into the conditionally expected return in regime z, E[R_{t+1}(z) | Ω_t], and a residual w_{t+1}:
R_{t+1} = E[R_{t+1}(0) | Ω_t] + ∇E[R_{t+1} | Ω_t] Z_{t+1} + w_{t+1} ,

(2)

with ∇E[R_{t+1} | Ω_t] ≡ E[R_{t+1}(1) | Ω_t] − E[R_{t+1}(0) | Ω_t]. Notice that it will always be possible to decompose returns in this way irrespective of the process they follow in each regime or the specification of the market's information set, Ω_t. In order for (2) to be useful in the analysis of market forecast errors, we have to say something about the properties of the residuals, w_{t+1}. When market participants hold rational expectations, their forecasts, E[R_{t+1}(z) | Ω_t], coincide with the mathematical expectation of R_{t+1} conditioned on the market's information set. Taking expectations on both sides of (2) conditioned on Ω_t for Z_{t+1} = {0, 1} implies that E[w_{t+1} | Ω_t] = 0. Thus, the residual, w_{t+1}, inherits the properties of conventional rational expectations forecast errors. Since it represents the error that rational market participants would make when the t + 1 regime is known, I shall refer to it as the within-regime forecast error. When market participants are unaware of the time t + 1 regime, their forecast errors will differ from the within-regime errors. To see this, we must first identify the market's forecasts by taking expectations on both sides of (2). Using the fact that E[w_{t+1} | Ω_t] = 0, this gives

E[R_{t+1} | Ω_t] = E[R_{t+1}(0) | Ω_t] + ∇E[R_{t+1} | Ω_t] E[Z_{t+1} | Ω_t] .

(3)


Substituting (2) and (3) into (1) and rearranging, we obtain the following expression for the market's forecast errors, R_{t+1} − E[R_{t+1} | Ω_t]:
e_{t+1} = w_{t+1} + ∇E[R_{t+1} | Ω_t] (Z_{t+1} − E[Z_{t+1} | Ω_t]) .

(4)

This equation shows how the market's forecast errors, e_{t+1}, are related to the within-regime errors, w_{t+1}. Clearly, when the future regime is known, Z_{t+1} = E[Z_{t+1} | Ω_t], so the second term vanishes. In this case there is no "peso problem" and the market's forecast errors inherit the conventional rational expectations properties of the within-regime errors.¹ When the future regime is unknown, the second term in (4) makes a contribution to the market's forecast errors. It is under these circumstances that the presence of a "peso problem" may affect the properties of the market's forecast errors. To see this more clearly, suppose that returns are generated from the regime 1 process in period t + 1. Under these circumstances, the market's ex post forecast error in (4) is

e_{t+1}(1) = w_{t+1} + ∇E[R_{t+1} | Ω_t] (1 − E[Z_{t+1} | Ω_t])
           = w_{t+1} + ∇E[R_{t+1} | Ω_t] Pr(Z_{t+1} = 0 | Ω_t) .

(5)

As noted above, when market participants have rational expectations, the first term on the right has mean zero and is uncorrelated with any variables in Ω_t. The second term is equal to the difference between the within-regime forecasts, ∇E[R_{t+1} | Ω_t], multiplied by the market's subjective probability that regime 0 occurs next period. A "peso problem" will exist in this case if the market believes that regime 0 is possible, so that Pr(Z_{t+1} = 0 | Ω_t) > 0. These beliefs will make the second term in (5) non-zero provided the within-regime forecasts differ from one another. If they do, the term may have a non-zero mean and may be correlated with elements in Ω_t. Thus, the presence of a "peso problem" can cause the market's forecast errors to appear biased and correlated with ex ante information when viewed ex post even though market participants form their expectations rationally.

The presence of a "peso problem" can have these effects on ex post forecast errors more generally. As (4) shows, so long as some uncertainty exists about the future regimes governing returns, the term ∇E[R_{t+1} | Ω_t](Z_{t+1} − E[Z_{t+1} | Ω_t]) will be present in the realized forecast errors within a regime. As a result, these errors may appear biased and correlated with ex ante information when viewed ex post. The extent to which these properties are found in a particular sample of forecast errors depends upon the frequency of regime shifts in the sample. In the extreme case where only regime 1 occurs, the sample properties of the forecast errors will match those of e_{t+1}(1) in (5). Alternatively, when there are a number of regime changes during the sample, the forecast errors will inherit a combination of the properties of e_{t+1}(1) and e_{t+1}(0) [defined analogously with e_{t+1}(1)]. As (4) indicates, in this case, the resulting effect on the forecast errors depends on the sample properties of Z_{t+1} − E[Z_{t+1} | Ω_t]. If the frequency of regime shifts in the sample is representative of the underlying distribution of regime changes upon which rational market participants base their forecasts, in a typical sample Z_{t+1} − E[Z_{t+1} | Ω_t] will have a mean close to zero and will be uncorrelated with elements in Ω_t. Equation (4) shows that the sample forecast errors will inherit these properties because, as we noted above, E[w_{t+1} | Ω_t] = 0. Thus, under these circumstances, the forecast errors will display the conventional rational expectations properties.

From this discussion, it should be clear that the impact of a "peso problem" on the forecast errors made by rational market participants depends upon the frequency of regime shifts in the sample. When the number of shifts is representative of the underlying distribution, the forecast errors will display the conventional rational expectations properties. In other cases where the number of shifts is unrepresentative, the forecast errors may appear biased and correlated with ex ante information. Thus, there is a sense in which the presence of a "peso problem" can only impact upon the forecast errors made by rational market participants in "small" samples. Of course, the term "small" in this context refers to a sample with an unrepresentative number of regime shifts rather than the number of observations on returns, or even the time span of the data.

¹ Fullenkamp and Wizman (1992) coin the term "surety" when referring to a situation where market participants know the process governing realizations of future returns. Here "surety" implies that Z_{t+1} = E[Z_{t+1} | Ω_t].
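The small-sample logic of (4) and (5) is easy to illustrate by simulation. The sketch below assumes an illustrative two-regime returns process with hypothetical means, volatilities, and transition probabilities (none of them taken from the chapter); it computes the ex post errors of a rational forecaster who knows the current regime and shows that their average is close to zero over the full sample but positive in the periods where the rare regime never materializes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime returns process: regime 0 is a rare "crash" regime.
mu = np.array([-0.10, 0.01])      # within-regime mean returns
sigma = np.array([0.05, 0.02])    # within-regime volatilities
P = np.array([[0.50, 0.50],       # P[i, j] = Pr(Z_{t+1} = j | Z_t = i)
              [0.02, 0.98]])

T = 100_000
Z = np.empty(T, dtype=int)
Z[0] = 1
for t in range(1, T):
    Z[t] = rng.choice(2, p=P[Z[t - 1]])
R = mu[Z] + sigma[Z] * rng.standard_normal(T)

# Rational one-step-ahead forecast when the current regime is observed:
# E[R_{t+1} | Omega_t] = sum_z Pr(Z_{t+1} = z | Z_t) * mu_z
forecast = P[Z[:-1]] @ mu
errors = R[1:] - forecast               # e_{t+1} in equation (1)

in_regime1 = Z[1:] == 1                 # periods in which the crash did not occur
print("mean error, full sample:      %+.5f" % errors.mean())
print("mean error, regime-1 periods: %+.5f" % errors[in_regime1].mean())
```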

2.1.2. Empirical implications

A number of papers have examined whether "peso problems" can account for some of the anomalous behavior of asset returns. To summarize this research, it will prove useful to write returns in terms of spot and forward rates. Define s_t as the logarithm of the spot rate on an asset at time t and f_t^k as the logarithm of the time t forward rate on a contract to buy or sell the asset k periods in the future. Then, the speculative return on a forward contract to sell the asset in the future period is

s_{t+k} − f_t^k = φ_t + e_{t+k} ,

(6)

where φ_t is the risk premium on this speculative position and e_{t+k} is the market's error in forecasting the spot rate given information available at time t.

The forward premium puzzle: It is natural, given the origins of the term, that the foreign exchange literature has paid a good deal of attention to the potential role of "peso problems". In particular, researchers have considered whether "peso problems" could account for the behavior of foreign exchange returns implied by the following regression of the change in the (log) spot exchange rate, Δs_t, on the forward premium, f_t^1 − s_t, due to Fama (1984):

Δs_{t+1} = b_0 + b(f_t^1 − s_t) + u_{t+1} .

(7)

Using the fact that Δs_{t+1} ≡ f_t^1 − s_t + φ_t + e_{t+1}, and the standard rational expectations assumption that the covariance between f_t^1 − s_t and the forecast error,


e_{t+1}, is zero, least squares theory implies that in a sample of T observations, the estimate of b is:

b̂ = 1 + Cov_T(φ_t, f_t^1 − s_t) / Var_T(f_t^1 − s_t) ,                    (8)

where Var_T(·) and Cov_T(·) denote the sample variance and covariance. Thus, under conventional rational expectations assumptions, an estimate of b different from one implies that the risk premium covaries with the forward premium. Since excess returns can be written as the sum of the risk premium and forecast error, this is equivalent to saying that excess returns can be predicted with the forward premium.

Table 1 shows the results from estimating this regression with dollar exchange rates against the German Mark, British Pound and Japanese Yen over the period 1975 to 1989.

Table 1
This table reports the results of estimating the Fama regression

Δs_{t+1} = b_0 + b(f_t^1 − s_t) + u_{t+1}

where s_t and f_t^1 are the spot and the one-period forward exchange rates, over the period 1975-1989. Column (1) reports OLS estimates of b. Column (2) reports the p-value for H_0: b = 1, based on Wald tests that allow for heteroskedasticity in the residuals u_{t+1}. Column (3) reports the bias in the estimate of c implied by b̂ under the hypothesis that the risk premium is related to the forward discount by:

φ_t = c_0 + c(f_t^1 − s_t) + v_t .

The bias is measured as c* − c, where c* is the value of c implied by the Fama regression based on simulated data from a switching model. The table reports the mean bias with the standard deviation (in parentheses) of the empirical distribution based on 1000 simulations. Column (4) reports the mean and standard deviation of the ratio c*/c.

Currency          (1)        (2)             (3)                  (4)
                  b̂          p-value         Monte Carlo experiments
                             H_0: b = 1      Bias                 Ratio
Monthly data
  Pound           -2.266     < 0.001         -0.726 (3.438)       1.222 (1.053)
  Mark            -3.502       0.001         -1.068 (3.253)       1.237 (0.722)
  Yen             -2.022     < 0.001         -0.107 (0.607)       1.035 (0.201)
Quarterly data
  Pound           -2.347       0.001         -0.724 (2.691)       1.216 (0.804)
  Mark            -3.448       0.004         -0.720 (2.735)       1.162 (0.615)
  Yen             -2.955     < 0.001         -0.124 (0.700)       1.031 (0.177)

Source: Evans and Lewis (1995b)


In common with the findings of other researchers, all the estimates of b are significantly less than zero. Based upon the decomposition of b̂ in (8), these negative coefficient estimates imply that the variance of the risk premium is greater than the variance of the forward premium [see Fama (1984)]. There is now quite a large literature trying to reconcile this interpretation of the regression results with the predictions of theoretical asset-pricing models [see, for example, Backus, Foresi and Telmer (1994)]. However, as Lewis (1994) notes in a recent survey, none of the models in the literature have been very successful in generating variability in the risk premia sufficient to explain the regression results. From this perspective, therefore, the results in Table 1 present something of a puzzle.

"Peso problems" provide one potential resolution to this puzzle because their presence provides an additional channel through which the forward premium can have predictive power for excess returns within a sample. This can be seen if we rewrite the expression for the OLS estimate of b as

b̂ = 1 + Cov_T(φ_t, f_t^1 − s_t) / Var_T(f_t^1 − s_t) + Cov_T(e_{t+1}, f_t^1 − s_t) / Var_T(f_t^1 − s_t) ,

(9)

where e_{t+1} = s_{t+1} − E[s_{t+1} | Ω_t]. As we have seen, the presence of a "peso problem" can create a small sample correlation between the rational forecast errors, e_{t+1}, and variables in Ω_t, such as the forward premium f_t^1 − s_t. Thus, in contrast to Fama's analysis, the third term on the right may actually contribute to the estimate of b in "small" samples where a "peso problem" exists. Evans and Lewis (1995b) provide some evidence on the size of the third term in (9). Using estimates of a switching model for the spot exchange rates, they ran Monte Carlo experiments to look at the small sample bias in b̂ due to "peso problems". In these experiments, the forward rates are driven by both market expectations of future spot rates (which incorporate the effects of potential switches in the spot rate process) and variations in the risk premia according to
t = co + c ( f ] - st) + vt ,

(10)

where vt is an i.i.d, error. In each experiment, a sample of spot and forward rates was generated and used to find the estimate of c implied by the regression in (7), i.e., c* = b - 1. An empirical distribution for c* was built by repeating this procedure. Columns (3) and (4) of Table 1 reproduce the results of these Monte Carlo experiments. Column (3) reports the mean value of c* - c. This is negative for all three currencies indicating that the Fama coeffcient may indeed be biased downwards by the presence of a "peso problem". Column (4) reports the mean and standard deviation of c*/c. This ratio measures the ratio of lower bounds on the standard deviations of the risk premia and gives an indication of how much "peso problems" may contribute to the apparent variability of the risk premia. For all currencies, the mean value of c * / c implies that the standard deviation of the measured risk premium exceeds the true risk premium from the model. In the case of the Pound and the Mark, the standard deviations are about 20% higher.

Peso problems: Their theoretical and empirical implications

621

Thus standard inferences may overstate the variability of the risk premia when "peso problems" are not taken into account. These results illustrate how the presence of a "peso problem" can affect coefficient estimates found in conventional regressions that characterize the short run properties of returns. "Peso problems" may also affect inferences about the long-run properties of asset prices and returns as represented by cointegration relationships estimated in the data. Cointegration : A good deal of recent empirical research has focused on the longrun properties of asset prices and returns. This interest has been spurred by the observation that many asset prices and returns appear to be well characterized as following processes with permanent shocks. Under these circumstances, many asset pricing models make predictions about the long-run behavior of prices and returns. These predictions can be easily understood by referring back to the expression for returns in (6): St+k
-

f ~ = q~t + et+k

(6)

Standard models with rational expectations imply that both the risk premia, (~t, and forecast errors, et+k, should follow a covariance stationary process, called "I(0)" in the literature. Since the sum of two stationary variables must be stationary, (6) implies that s t + k - fkt must also follow a stationary process. By contrast, observed spot and forward rates have typically been found to contain very persistent shocks, well-approximated as permanent disturbances which cumulate into so called "stochastic trends". These processes are covariance stationary after first differencing, called "I(1)" in the literature. Clearly, if spot and forward rates are I(1), st+k - fkt will only be I(0) stationary when the permanent shocks to st+k and f tk cancel out. For this to happen two requirements must be met. First, the variables in the vector Xt = [st+k, f~] must be cointegrated. That is to say, there exists a "cointegrating v e c t o r " , such that ~'Xt is I(0) stationary. Second, the cointegrating vector must be ~ ' = [1,-1] since premultiplying by this vector, ~'Xt, gives the excess returns. Testing for the number o f trends: Evans and Lewis (1993) provide an example of how to test the first of these requirements. First they test for the number of trends in a vector of spot rates and a vector of forward rates individually using the methodology developed by Johansen (1988). Next, they test for the number of trends in a vector that combines all the spot and forward rates. If each pair of spot and forward rates share a common trend, the number of trends should not increase when the spot and forward rates are combined in the same vector. Using data for the US Dollar against the German Mark, British Pound and Japanese Yen currencies over the period 1975 to 1989, Evans and Lewis find that vectors containing spot and forward rates contain one more trend than the vector of spot rates. They then examine whether these results could reflect the presence of a "peso problem". Using the estimates from a switching model for the Dollar/ Pound rate, their Monte Carlo study shows that there is a reasonably high

622

M.D.D. Evans

probability of observing an additional trend in forward rates when market participants rationally anticipate shifts in the spot rate process. They also show that standard tests would be very unlikely to detect the trends in excess returns due to the "peso problem" associated with these shifts.
Testing f o r one-to-one eointegration: "Peso problems" may also affect estimates of the cointegrating vector between spot and forward rates. Recall that excess returns will only be stationary when spot and forward rates are cointegrated onefor-one. Thus, in the context of the cointegrating regression, st+k = ao + alfkt + vt+k ,

(11)

al must be equal to one under the null hypothesis of stationary excess returns. Comparing (11) with the identity, st+k - fkt =- (ot + e+k, reveals that we should find al = 1 if the sum of the risk premium and forecast errors follow a stationary I(0) process. Evans and Lewis (1994) examine the relationship in (11) using monthly returns from the U.S. Term Structure for the period June 1964 to December 1988. In this application, st+k is the rate on a one month T-bill at t k, and f tk is the forward rate on a contract at month t for a one month bill at m o n t h t k. They show that the null hypothesis o f a l = 1 can be rejected at horizons o f k = 1, to 10 months. Could these results be attributable to a "peso problem"? To address this possibility, consider the case where k = 1 and let Rt+l =St+l and f~ = E[Rt+I ]~2t] - ~br Let us also assume that the one period rate switches between two processes that share the same trend: R,+I (z) = I~zZt+l et+l (z), "C,+l = zt t/t+ 1 , (12)

for z = {0, 1}, where ~t is the common stochastic trend with i.i.d, innovations qt and et+l (z) following stationary I(0) processes. Using (12) to find the forecasts of Rt+l (z), it is easy to show that f~ = zt[~qPr(Zt+l = l l(2t) + ~0Pr(Zt+l = 01(2t)] + I(0) terms st+l - f ) = z,(~q - 4,0)(Zt+l - E[Zt+I [(2,]) + I(0) terms. (13)

In data samples where the frequency of regime shifts differs from the underlying distribution used by market participants in forming their forecasts, ( Z t + l - E[Z,+I]~2t]) will be serially correlated. Under these circumstances, (13) shows that the stochastic trend, zt, will appear in realized excess returns when ~'1 # ~0. And, since this same trend drives forward rates, the cointegrating coefficient al in (11) will be different from one.
2.2. Generalized peso problems

In the models considered so far, market participants are assumed to know the current regime so that the "small" sample properties of the forecast errors are only affected by uncertainty about future regimes. Other models assume that

Peso problems: Their theoretical and empirical implications

623

market participants cannot directly observe current or past regimes. These models introduce an element of learning that can be another source of small sample bias and serial correlation into the ex post forecast errors.

2.2.1. Theoretical implications To illustrate how learning can contribute to peso effects in forecast errors,, suppose that the only information available to market participants when forecasting future returns are current and past returns so that fat = {Rt,Rt-1,...}. Under these circumstances, the degree of uncertainty about the current regime is represented by the conditional probability distribution, Pr(Ztlfat). In extreme cases where the observed history of returns is fully revealing about the current regime, Zt = z, there is no uncertainty. Thus, Pr(Zt = zlfat ) = 1 and the analysis goes through as before. I shall therefore consider cases where the history of returns is not fully revealing so that 1 > Pr(Ztlfat) > 0 for Zt = {0, 1}. Here new observations on returns within a regime may allow market participants to learn about the current regime so that Pr(Ztlfat) can vary from period to period. To see how changes in Pr(Ztlfat) can affect the properties of forecast errors, substitute the identity Pr(Zt+I -- Olfat) - Pr(Zt+l = OlZt = 1, fJt)Pr(Zt+l = OlZt = 1, fat) - Pr(Zt+l = O[fat)) into (5) to obtain the following expression for the ex post forecast error in regime 1:
et+l (1) = wt+l + ~ E [ R t + I Ifat]Pr(Zt+l = OIZt = 1, fat)

- VE[Rt+llfat](Pr(Zt+l = 0lZt = 1 , a t ) - Pr(Zt+l = 0lOt)) .

(14)

The first two terms in this equation are the same as those in (5). The third term shows how learning about the current regime can affect the forecast error. We can rewrite this term as

∇E[R_{t+1} | Ω_t] (Pr(Z_{t+1} = 0 | Z_t = 1, Ω_t) − Pr(Z_{t+1} = 0 | Z_t = 0, Ω_t)) × Pr(Z_t = 0 | Ω_t) .                    (15)

Notice that this term will be zero if the probability of regime 0 occurring in t + 1 is independent of the current regime. In this special case, uncertainty about the current regime, as measured by Pr(Z_t = 0 | Ω_t), makes no contribution to the forecast errors. In other cases, changes in Pr(Z_t = 0 | Ω_t) due to learning will contribute to the dynamics of this term. Kaminsky (1993) refers to the combined effect of the second and third terms in (14) as the "generalized peso problem".

If market participants use Bayes' Law to update their probability distributions on the current state using current and past returns, we can describe the learning dynamics by

Pr(Z_t = 0 | Ω_t) = Pr(Z_t = 0 | Ω_{t−1}) ℒ(R_t | Z_t = 0, Ω_{t−1}) / Σ_z Pr(Z_t = z | Ω_{t−1}) ℒ(R_t | Z_t = z, Ω_{t−1})                    (16)


and

Pr(Z_t = z | Ω_{t−1}) = Σ_{Z_{t−1}} Pr(Z_t = z | Z_{t−1}, Ω_{t−1}) Pr(Z_{t−1} | Ω_{t−1}) ,                    (17)

where ℒ(· | Z_t, Ω_{t−1}) denotes the likelihood of observing the return given regime Z_t and past information, Ω_{t−1}. The first equation is simply a statement of Bayes' Law showing how observations on current returns are used to update the market's probability of being in regime 0. The second equation shows how the probability distributions of future and current regimes are linked.

Equations (16) and (17) have two potentially important implications for the evolution of Pr(Z_t = 0 | Ω_t) and hence the behavior of the forecast errors. First, uncertainty about the current regime will persist while market participants place some likelihood on current returns coming from regime 0, i.e., while ℒ(R_t | Z_t = 0, Ω_{t−1}) > 0. Second, as the number of consecutive observations from regime 1 becomes large, Pr(Z_t = 0 | Ω_t) will approach zero. In other words, if a regime persists long enough, rational market participants will eventually learn which regime they are in. These features of the learning process suggest that uncertainty about the current regime is unlikely to make a large contribution to the small sample bias and serial correlation of the forecast errors within a single regime if (i) current and past returns contain a lot of information about the current regime, and (ii) the regime persists for a long time. Both these features depend upon whether market participants view regime changes as being once-and-for-all or not.

Lewis (1989a,b) studies the effects of learning on asset prices. In particular, she considers how the exchange rate would behave during a period where market participants are learning about a past change in regime induced by a once-and-for-all shift in the process for fundamentals. In the context of equation (14), this situation is equivalent to the case where the switch to regime z = 1 is viewed as permanent, so that Pr(Z_{t+1} = 0 | Z_t = 1, Ω_t) = 0. Imposing this restriction on (14), we can write the forecast errors following the regime switch as

e_{t+1}(1) = w_{t+1} + ∇E[R_{t+1} | Ω_t] Pr(Z_{t+1} = 0 | Ω_t) .

Thus, the ex post forecast errors will only differ from the within-regime errors until market participants have learned that the switch in regime has taken place. In such circumstances, forecast errors are affected by a pure learning problem rather than a "generalized peso problem".
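The filtered probabilities defined by (16) and (17) are straightforward to compute recursively. The sketch below assumes a two-regime model in which returns are conditionally normal within each regime, so the likelihood ℒ is a normal density; the parameter values in the commented call are hypothetical and it is not the specification estimated in the studies discussed below.

```python
import numpy as np
from scipy.stats import norm

def filter_regime_probs(returns, mu, sigma, P, p0):
    """Filtered probabilities Pr(Z_t = z | Omega_t) from the recursion in (16)-(17),
    for a model with conditionally normal returns in each regime."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    P, prob = np.asarray(P, float), np.asarray(p0, float)
    out = np.empty((len(returns), len(prob)))
    for t, r in enumerate(returns):
        pred = prob @ P                          # (17): Pr(Z_t = z | Omega_{t-1})
        post = pred * norm.pdf(r, mu, sigma)     # (16): prior times likelihood of R_t
        prob = post / post.sum()                 # (16): normalise to get Pr(Z_t | Omega_t)
        out[t] = prob
    return out

# Illustrative call with made-up parameters (Z_t = 0 is the rare regime):
# probs = filter_regime_probs(returns, mu=[-0.10, 0.01], sigma=[0.05, 0.02],
#                             P=[[0.50, 0.50], [0.02, 0.98]], p0=[0.0, 1.0])
```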

2.2.2. Empirical implications


To what extent are the empirical implications of "peso problems" affected by the presence of learning? This issue has recently been addressed in papers by Kaminsky (1993) and Evans and Lewis (1995a). Evans and Lewis consider the effects of "peso problems" caused by shifts in the inflation process on the long-term relationship between nominal interest rates and realized inflation; the so-called long-term Fisher relation. As part of this study, they conduct Monte Carlo experiments on the following cointegrating regression,

E[π_{t+1} | Ω_t^π] = d_0 + d_1 π_{t+1} + v_t ,                    (18)

where E[π_{t+1} | Ω_t^π] is the expected inflation rate and π_{t+1} is the realized inflation rate, both generated from a switching model for quarterly inflation. The experiments reveal that the presence of both a "pure" and a "generalized peso problem" creates bias in the estimates of d_1 in typical data samples. They also show that the bias is smaller in the "generalized peso" case. Thus, it is quite possible for pure peso and learning effects to have partially offsetting influences on forecast errors.

Kaminsky (1993) provides another perspective on the effects of learning in her study of the dollar/pound exchange rate. She examines the properties of exchange rate forecast errors using a variant of the switching model in Engel and Hamilton (1990) where market participants use both the past history of exchange rates and monetary policy announcements made by the Federal Reserve to make inferences about the current regime. As in (14) and (15), the forecast errors depend upon Pr(Z_t | Ω_t). These filtered probabilities are found from the Bayesian updating equations in (16) and (17) using the maximized value of a likelihood function that combines data on the spot exchange rate with a monetary policy indicator.² Kaminsky shows that the forecast errors obtained from the model contain a good deal of small sample bias. She then compares them with forecast errors that are constructed using the "smoothed" probabilities, Pr(Z_t | Ω_T), in place of the filtered probabilities. These probabilities can be calculated recursively from

Pr(Z_{t−i} | Ω_t) = ℒ(R_t | Z_{t−i}, Ω_{t−1}) Pr(Z_{t−i} | Ω_{t−1}) / Σ_z ℒ(R_t | Z_{t−i} = z, Ω_{t−1}) Pr(Z_{t−i} = z | Ω_{t−1}) ,                    (19)

starting with t = T, i = 1, and working back through the sample. Notice that these probabilities incorporate all the information in the sample. Thus, if the subsequent behavior of the exchange rate makes clear what process was being followed at t, this new set of forecast errors will be purged of the effects of learning. Kaminsky shows that there is little difference between the sample properties of the two sets of errors. Again, learning appears to contribute little to the small sample effects of the "peso problem". 2.3. Summary In this section, we have seen how the presence of a "peso problem" can affect the forecast errors made by rational market participants. In"small" data samples where the number of regime shifts are unrepresentative of the underlying distribution used by market participants to forecast, their forecast errors may appear biased and correlated with ex ante information when viewed ex post by a researcher. In these cases, the size of these peso effects depends upon the difference 2 Kaminsky refers to this model as an "Imperfect Regime Classification" model because market participants recognize that policy announcements may not provide correct information about the regime. Kaminsky and Lewis (1992) use a similar model to study the impact of foreign exchange intervention.

2.3. Summary

In this section, we have seen how the presence of a "peso problem" can affect the forecast errors made by rational market participants. In "small" data samples where the number of regime shifts is unrepresentative of the underlying distribution used by market participants to forecast, their forecast errors may appear biased and correlated with ex ante information when viewed ex post by a researcher. In these cases, the size of these peso effects depends upon the difference between the within-regime forecasts of future returns, ∇E[R_{t+1}|Ω_t], the dynamics of Z_t, and the degree to which the current regime is known. Examples from the literature show that the presence of "peso problems" can significantly affect the relationship between asset prices and returns estimated from typical data samples. Moreover, these effects appear robust to the presence of learning.

3. Peso problems, asset prices and fundamentals

So far we have seen how the presence of "peso problems" can affect the properties of forecast errors via their impact on the rational market forecasts. Since asset prices also incorporate forecasts of future fundamentals, the analysis above suggests that the presence of "peso problems" will also affect the link between asset prices and their economic fundamentals. In this section, I shall examine these effects.

3.1. Peso problems in present value models


Present value models are among the simplest asset pricing models in which market expectations of future variables affect current asset prices and returns. I shall examine the impact of "peso problems" in the context of a generic present value model:

P_t = θ_0 + θ(1 - ρ) Σ_{i=0}^∞ ρ^i E[X_{t+i}|Ω_t] ,    (20)

where θ_0 is a constant, θ is a coefficient of proportionality, and ρ is the discount factor. Models of this form have been used to examine the behavior of interest rates, stock prices, and exchange rates. For the present, I shall simply refer to P_t and X_t as the asset price and fundamental. Since P_t and X_t often appear to follow non-stationary I(1) processes in applications, it is useful to consider an alternative form of (20) expressed in terms of stationary I(0) variables. Subtracting θX_t from both sides of the equation and rearranging, we obtain the following expression for the "spread":

Y_t ≡ P_t - θX_t = θ_0 + θ Σ_{i=1}^∞ ρ^i E[ΔX_{t+i}|Ω_t] .    (21)

Notice that when X_t follows a non-stationary I(1) process, E[ΔX_{t+i}|Ω_t] must be stationary under conventional rational expectations assumptions. Thus, the spread, Y_t, will follow a stationary I(0) process even when P_t is I(1). To see how the presence of a "peso problem" affects the link between asset prices and fundamentals, I shall focus on (21) and study how switches in the process for ΔX_t affect the behavior of the spread. As above, I shall confine my attention to the case where ΔX_t switches between two processes governed by the discrete valued state variable Z_t = {0, 1}. Realizations of ΔX_{t+1} are assumed to depend upon the regime during period t determined by the value of Z_t = z, and will be written as ΔX_{t+1}(z). Since E[ΔX_{t+i}|Ω_t] = Σ_z E[ΔX_{t+i}|Ω_t, Z_t = z] Pr(Z_t = z|Ω_t), we can take expectations on both sides of (21) conditioned on the market's information Ω_t [with Y_t ∈ Ω_t] to obtain

Y_t = Y_t(0) Pr(Z_t = 0|Ω_t) + Y_t(1) Pr(Z_t = 1|Ω_t) ,    (22)

where

Y_t(z) = θ_0 + θ Σ_{i=1}^∞ ρ^i E[ΔX_{t+i}|Ω_t, Z_t = z] .    (23)

The observed spread is shown in (22) as a probability weighted average of the regime-contingent spreads, Y_t(z). These are defined in (23) as the value of the spread when market participants know the current regime. To examine the effects of switching, we need to solve for the regime-contingent spreads, Y_t(z). The first step is to iterate (23) one period forward:

Y_t(z) = θ_0 + θρ E[ΔX_{t+1}|Ω_t, Z_t = z] + θ Σ_{i=2}^∞ ρ^i E[ΔX_{t+i}|Ω_t, Z_t = z] .    (24)

Next, we note that

E[ΔX_{t+i}|Ω_t, Z_t] = Σ_z E[ E[ΔX_{t+i}|Ω_{t+1}, Z_{t+1} = z] | Ω_t, Z_{t+1} = z ] Pr(Z_{t+1} = z|Ω_t, Z_t) .

Substituting this expression in the second term on the right hand side of (24) and rearranging, gives

Y_t(z) = θ_0(1 - ρ) + ρ Σ_{z'} E[Y_{t+1}(z')|Ω_t] Pr(Z_{t+1} = z'|Ω_t, Z_t = z) + θρ E[ΔX_{t+1}(z)|Ω_t] ,    (25)

where E[ΔX_{t+1}(z)|Ω_t] = E[ΔX_{t+1}|Ω_t, Z_t = z].

The next step is to solve (25) for both regimes, z = {0, 1}. In models where the transition probabilities governing regime switches are either unknown to market participants or depend upon other variables, the probabilities Pr(Z_{t+1} = z'|Ω_t, Z_t = z) will be time-varying, making (25) a non-linear difference equation. To avoid the complications of solving such an equation, I shall consider the case where Z_t follows an independent Markov process with constant transition probabilities known to market participants. Under these circumstances, we can rewrite (25) as a linear matrix difference equation:

[Y_t(1)]   [θ_0(1 - ρ)]       [E[Y_{t+1}(1)|Ω_t]]      [E[ΔX_{t+1}(1)|Ω_t]]
[Y_t(0)] = [θ_0(1 - ρ)] + ρ A [E[Y_{t+1}(0)|Ω_t]] + θρ [E[ΔX_{t+1}(0)|Ω_t]] ,    (26)

where A is the matrix of transition probabilities with ij'th element equal to Pr(Z_{t+1} = i|Z_t = j, Ω_t). Iterating (26) forward and applying the condition lim_{i→∞} ρ^i E[Y_{t+i}(z)|Ω_t] = 0, we obtain

Y_t(1) = θ_0 + θ Σ_{i=1}^∞ ρ^i E[ΔX_{t+i}(1)|Ω_t] - (1 - λ_1)Φ_t ,
Y_t(0) = θ_0 + θ Σ_{i=1}^∞ ρ^i E[ΔX_{t+i}(0)|Ω_t] + (1 - λ_0)Φ_t ,    (27)

where λ_z is the probability of remaining in regime z = {0, 1} from one period to the next, and

Φ_t = Σ_{i=1}^∞ ρ^i E[Y_{t+i}(1) - Y_{t+i}(0)|Ω_t] .

Equations (22) and (27) allow us to examine how switches in the process for ΔX_t affect the behavior of the spread under a variety of conditions. For example, consider the case of a "pure peso problem" in which market participants only face uncertainty about the future regime. Here Y_t = Y_t(z) so all the effects of switching can be examined using (27). This equation shows that news about fundamentals can affect the spread through two channels. First, news that leads to revisions in the expected present value of ΔX_{t+i} within the current regime affects Y_t(z) through the second term on the right of each equation. Second, new information on the expected size of the jump in prices when a regime switch occurs affects Y_t(z) through Φ_t. This jump term is equal to the present value of expected future changes in the regime-contingent spread induced by switches in regimes. Since Y_t = Y_t(z) in the "pure peso problem" case, Φ_t represents the effects of expected capital gains induced by future regime switching. In the case of a "generalized peso problem", where market participants face uncertainty about both the current and future regimes, news can affect the spread through a third channel. Recall that under these circumstances the observed spread is linked to the regime-contingent spreads by

Y_t = Y_t(0) Pr(Z_t = 0|Ω_t) + Y_t(1) Pr(Z_t = 1|Ω_t) ,

with 1 > Pr(Z_t|Ω_t) > 0. Thus news that leads market participants to revise their estimate of the current state will in general lead to a change in the spread even when the regime-contingent spreads remain unchanged.
Equation (27) makes clear that the presence of a "peso problem" affects the relationship between Y_t(z) and the present value of expected future fundamentals growth within a regime because market participants take account of future capital gains and losses associated with regime switches. To examine these capital gains, we need to solve for Y_t(1) - Y_t(0). Taking the difference between the two equations in (27), and rearranging, we find that

Y_t(1) - Y_t(0) = θρ Σ_{j=1}^∞ φ^{j-1} E[ΔX_{t+j}(1) - ΔX_{t+j}(0)|Ω_t] ,    (28)

where φ = ρ(λ_1 + λ_0 - 1). Thus, the jump in the regime-contingent spread when a switch in regime occurs depends upon the present value of the difference between the within-regime forecasts of the future ΔX_t's.
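The sketch below evaluates (28) in the special case where the within-regime forecasts of fundamentals growth are constant at all horizons, so that the infinite sum collapses to a geometric series. The constant-forecast assumption and all parameter values are illustrative; they are not taken from any model estimated in the chapter.

```python
def spread_jump(theta, rho, lam1, lam0, dx1, dx0):
    """Y_t(1) - Y_t(0) from equation (28) when the within-regime forecasts of
    fundamentals growth are constant at dx1 and dx0 (an assumption made purely
    for illustration)."""
    phi = rho * (lam1 + lam0 - 1.0)                 # phi = rho*(lambda_1 + lambda_0 - 1)
    # sum_{j>=1} phi^(j-1) * (dx1 - dx0) = (dx1 - dx0) / (1 - phi)
    return theta * rho * (dx1 - dx0) / (1.0 - phi)

# Serially independent regimes (lambda_1 + lambda_0 = 1): only the one-period
# cross-regime difference matters, so the jump is theta*rho*(dx1 - dx0).
print(spread_jump(theta=1.0, rho=0.95, lam1=0.6, lam0=0.4, dx1=0.03, dx0=0.01))
# Persistent regimes amplify the jump through the present-value term.
print(spread_jump(theta=1.0, rho=0.95, lam1=0.95, lam0=0.95, dx1=0.03, dx0=0.01))
```

The two calls contrast serially independent regimes with persistent regimes, anticipating the point made below that regime persistence magnifies the jump in the spread.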


Equation (28) has two important implications for the behavior of the spread when there is a change in regime. First, the size of any jump in Y_t(z) depends upon both the difference in expected future fundamentals growth across regimes and the dynamics of regime switching. In this two regime example, the value of λ_1 + λ_0 - 1 determines the serial correlation structure of regimes. If λ_1 + λ_0 = 1, regimes are serially independent so the continuation of the current regime is as likely as a switch. In this case, (28) shows that Y_t(1) - Y_t(0) = θρ E[ΔX_{t+1}(1) - ΔX_{t+1}(0)|Ω_t]. Thus, cross-regime differences in ΔX_t's further into the future have no effect on the size of the jump. The reason is that a switch in regime this period has no impact on markets' expectations for future ΔX_t's when regimes are serially independent. In other cases where there is serial dependence in the regimes (i.e., when λ_1 + λ_0 ≠ 1), market participants will revise their forecasts of future ΔX_t's when the regime switches so that the cross-regime differences in forecasts far into the future can affect the size of the jump. For example, in the case where λ_1 + λ_0 > 1 so that continuation of the current regime is more likely than a switch, (28) indicates that the spread will jump upwards when there is a switch from regime 0 to 1 if E[ΔX_{t+j}(1)|Ω_t] > E[ΔX_{t+j}(0)|Ω_t] for j > 0.
The second implication of (28) is that jumps can occur in Y_t(z) even when the change in regime is not accompanied by a jump in ΔX_{t+1}. For example, suppose that a switch in regime only affects forecasts of ΔX_{t+2}. So long as regimes are not serially independent, a change in regime at t will be accompanied by a jump in the regime-contingent spread. In the case of a "pure peso problem", this jump will be matched by the observed spread. Thus, a regime switch can generate jumps in the spread even when there is no change in the current behavior of fundamentals. In this case, a switch in regime could have the appearance of a financial crisis, or crash.
We can also use (22) and (27) to see how switches in the process for fundamentals can give rise to the appearance of a rational bubble. In the context of the present value model, the spread contains a bubble when Y_t satisfies the difference equation implied by (21), namely,

Y_t = θ_0(1 - ρ) + ρ E[Y_{t+1}|Ω_t] + θρ E[ΔX_{t+1}|Ω_t] ,

but not the transversality condition lim_{T→∞} E[ρ^T Y_{t+T}|Ω_t] = 0. For example, if ΔX_{t+1} is constant, one bubble process for the spread is

Y_{t+1} = const. + (1/ρ) Y_t + η_{t+1} ,

with E[η_{t+1}|Ω_t] = 0. In this case, the spread varies because expectations of future spreads vary and not because there is any fundamentals' news. Bubble models are therefore quite different from present value models with switching in the fundamentals process because in switching models all the variations in Y_t are driven by fundamentals' news.
Flood and Hodrick (1986) noted that this theoretical distinction between peso and bubble models may be impossible to spot empirically. Suppose that during regime one, news arrives about the future fundamental in regime zero. Equations (22) and (27) indicate that this news would affect the current spread insofar as it alters the expected future capital gain in the event of a regime switch. If this news is uncorrelated with the behavior of fundamentals in regime one, some of the variations in the spread in regime one would appear unrelated to the observed fundamentals. In the extreme case where all the observations come from a single regime, there would be no way to distinguish between this manifestation of a "peso problem" and the presence of a bubble.

3.2. Empirical implications

3.2.1. The term structure of interest rates


The first application of a switching model to a fundamentals-based asset pricing model appears in Hamilton (1988). He considers the following model [based on Shiller (1979)] for the yield on ten-year Treasury bonds, R^l_t, and the three month T-bill rate, R^1_t:

R^l_t = θ_0 + θ(1 - ρ) Σ_{i=0}^{l-1} ρ^i E[R^1_{t+i}|Ω_t] ,    (29)

R^1_t = α_0 + α_1 Z_t + v_t ,    (30)

with 0 < ρ < 1. Here v_t follows an AR(4) process with regime dependent heteroskedasticity, and Z_t = {0, 1} follows an independent first-order Markov process. Market participants are assumed to forecast future short rates only using the past history of short rates [i.e., Ω_t = {R^1_t, R^1_{t-1}, ...}], so a "generalized peso problem" is present. The model places a complicated set of rational expectations restrictions on the joint behavior of the long and short rates. Using quarterly U.S. data from 1962:1 to 1978:3, Hamilton estimates the restricted process for the long rate as

R^l_t = 0.051 + 2.454 Pr(Z_t = 1|Ω_t) + 1.89 E[v_t|Ω_t] + 0.009 E[v_{t-1}|Ω_t] + 0.011 E[v_{t-2}|Ω_t] + 0.001 E[v_{t-3}|Ω_t] + ξ_t ,    (31)

with Pr(Z_t = 1|Z_{t-1} = 1) = 0.997, and Pr(Z_t = 0|Z_{t-1} = 0) = 0.998.

What do these model estimates imply about the importance of a "peso problem" in the U.S. term structure? Surprisingly, they suggest that "peso problems" were almost completely absent. In the analysis above, we saw that "peso problems" will only affect the spread when market participants take account of the capital gains and losses associated with future changes in regime [i.e., via (1 - λ_z)Φ_t in (27)]. Although the estimated coefficient of 2.454% on the Pr(Z_t = 1|Ω_t) term in (31) indicates that these capital gains are quite large, market participants largely ignore them because the estimates of Pr(Z_t|Z_{t-1}) indicate that the probability of a regime switch from one period to the next is very close to zero.
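A quick back-of-the-envelope calculation makes the same point. For a two-regime Markov chain, the expected duration of a regime with continuation probability λ is 1/(1 - λ), so Hamilton's quarterly estimates imply regimes that are expected to last for many decades. The snippet below simply evaluates this formula; the conversion to years assumes the quarterly sampling frequency used above.

```python
def expected_duration(lam):
    # A regime with per-period continuation probability lam lasts
    # 1/(1 - lam) periods on average (geometric distribution).
    return 1.0 / (1.0 - lam)

for name, lam in [("regime 1", 0.997), ("regime 0", 0.998)]:
    q = expected_duration(lam)
    print(f"{name}: {q:.0f} quarters, roughly {q / 4:.0f} years")
# regime 1: 333 quarters, roughly 83 years
# regime 0: 500 quarters, roughly 125 years
```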


Sola and Driffill (1994) come to somewhat different conclusions in their study of the U.S. term structure. Unlike Hamilton, they consider the implications for the behavior of the yield spread when there are switches in the process for short rate changes. With this formulation, the variables in the switching model are I(0) stationary even when long and short rates follow I(1) processes. This is an important feature because, as Pagan and Schwert (1990) point out, the validity of Hamilton's procedure for modelling regime switching requires that the variables in the model are I(0). Although the estimated timing of regime switches in Sola and Driffill's model is very similar to that found in Hamilton (1988), their estimated transition probabilities are a good deal smaller. As a result, their model estimates indicate that the behavior of the U.S. term structure was significantly affected by "peso problems".³ The contrast between these results suggests that it is perilous to draw conclusions about the importance of peso effects from the estimates of a single switching model.

³ This finding is consistent with the results of Lewis (1991) and Evans and Lewis (1994) for U.S. rates and Kugler (1994) for Eurodollar rates.

3.2.2. Stock prices


Switching models have also been used to examine the behavior of stock prices. For example, in Evans (1993), I examine the effects of switches in dividend growth within the context of the dividend ratio model developed by Campbell and Shiller (1989). This model relates the natural log of the dividend price ratio at the beginning of period t, δ_t, to expected future dividend growth:

δ_t = θ_0 - Σ_{j=1}^∞ ρ^j E[Δd_{t+j}|Ω_t] ,    (32)

where Δd_{t+1} is the dividend growth rate during year t and ρ is close to but smaller than one. Notice that this equation has the same form as the equation for the spread in (21) with Δd_t = -ΔX_t and θ = 1, so the analysis above can be used to examine the effects of switching in the dividend growth process. I assume that market participants observe the current regime and that dividend growth switches between two processes, with switches determined by Z_t = {0, 1} following an independent first order Markov process. As in Campbell and Shiller (1989), the empirical implications of the model are derived within a VAR framework for the joint behavior of log dividend prices and dividend growth. For the case of a first-order system, the VAR takes the form:

[δ_{t+1} ]                    [δ_t ]                      [γ(Z_{t+1})v_{t+1} + η_{t+1}]
[Δd_{t+1}] = A(Z_{t+1}, Z_t)  [Δd_t]  +  c(Z_{t+1}, Z_t) + [v_{t+1}                   ] ,    (33)


where the elements of A(Z_{t+1}, Z_t) and c(Z_{t+1}, Z_t) are built up from coefficients α(z), β(z), g(z), γ(z) and π(z) that depend upon the regime, and E[η_{t+1}|δ_t, Δd_t] = E[v_{t+1}|δ_t, Δd_t] = 0. Under rational expectations, the dividend ratio model in (32) imposes a complicated set of restrictions on these coefficients. Table 2 shows estimates of the model in (33) using annual series for stock prices and dividends for the Standard and Poors Composite Stock price index from 1871 to 1987. The estimates of α(z) and β(z) show how the predictability of dividend growth varies across regimes. In particular, the estimates of α(z) indicate that past dividend growth is a useful predictor of future dividend growth over short to medium forecasting horizons in regime 1 but not regime 0. As we saw above, differences in the forecasts of fundamentals across regimes only create "peso problems" when market participants place a significant probability on a regime switching from one period to the next. In this model, the probabilities are approximately 10% when in regime 1 and 1% in regime 0, so "peso problems" do affect the behavior of dividend-prices. One way to gauge the importance of "peso problems" is to examine the sample behavior of stock returns implied by the model estimates. Campbell and Shiller (1989) show that the log return on stocks between periods t and t + 1 can be well approximated by

r_{t+1} ≈ κ + δ_t - ρδ_{t+1} + Δd_{t+1} ,    (34)

where κ is a constant. Iterating this approximation forward, imposing the terminal condition lim_{i→∞} ρ^i δ_{t+i} = 0, and taking expectations conditioned on Ω_t, gives

δ_t = - κ/(1 - ρ) - Σ_{i=1}^∞ ρ^i E[Δd_{t+i}|Ω_t] + Σ_{j=1}^∞ ρ^j E[r_{t+j}|Ω_t] .    (35)

Comparing (35) and (32), we see that ex ante expected stock returns are constant in the dividend ratio model. Thus, variations in r_{t+1} should not be forecastable with any variables in Ω_t when market participants hold rational expectations and "peso problems" are absent. When they are present, realized returns will appear forecastable in "small" samples for the reasons discussed in Section 2. The lower panels of Table 2 examine the predictability of returns with the regressions

A:  r^m_{t+m} = a_0 + a_1 δ_t + u_{t+m} ,

and

B:  r_{t+1} = b_0 + b_1 Σ_{j=0}^{m-1} δ_{t-j} + w_{t+1} ,

where r^m_{t+m} ≡ Σ_{i=1}^m r_{t+i} is the m-period return. Under the null hypotheses of no predictability, a_1 = 0 and b_1 = 0.⁴

⁴ See Hodrick (1992) for a discussion of these regression tests.


Table 2
The upper panel of the table reports the maximum likelihood estimates of the switching VAR model in (33). The parameters γ(z) and π(z) depend on α(z), β(z), and g(z) through the cross-equation restrictions implied by the dividend ratio model in which rational market participants anticipate switches between two regimes. Switches are governed by Z_t = {0, 1}, which follows an independent first-order Markov process with transition probabilities Pr(Z_t = z|Z_{t-1} = z) = λ_z. The model is estimated with S&P annual data of 117 years starting in 1871. The lower panels of the table report the percentiles of the empirical distribution for the t-statistics in the return regressions A and B. The empirical distribution is derived from 1000 replications of Monte Carlo experiments based on the estimated switching model. All the t-statistics correct for the presence of conditional heteroskedasticity. In addition, the statistics in Panel A correct for the presence of an MA(m - 1) process in the residuals induced by the forecast overlap under the null hypothesis of no predictability in returns.
Maximum Likelihood Estimates

Parameter   Estimate   Std. Error        Parameter   Estimate   Std. Error
α(1)          0.575      0.133           g(1)        -22.367      20.100
α(0)          0.095      0.070           g(0)        -89.889      13.881
β(1)         -0.066      0.584           λ_1           0.898       0.067
β(0)         -0.307      0.048           λ_0           0.985       0.026

Return Predictability
A:  r^m_{t+m} = a_0 + a_1 δ_t + u_{t+m}
B:  r_{t+1} = b_0 + b_1 Σ_{j=0}^{m-1} δ_{t-j} + w_{t+1}

Panel A                  m=1      m=2      m=3      m=4
a_1                     0.115    0.285    0.379    0.540
t-statistic             2.175    3.073    3.168    3.739
Percentiles    5        4.560    4.118    3.799    3.397
              10        5.101    4.588    4.201    3.987
              25        5.794    5.365    5.054    4.896
              50        6.627    6.311    6.036    5.994
              75        7.437    7.224    7.093    7.224
              90        8.295    8.157    8.228    8.327
              95        8.725    8.834    8.960    9.076

Panel B                          m=2      m=3      m=4
b_1                             0.087    0.058    0.059
t-statistic                     2.717    2.574    2.847
Percentiles    5                2.909    2.189    1.771
              10                3.172    2.419    2.003
              25                3.630    2.825    2.382
              50                4.180    3.292    2.835
              75                4.758    3.768    3.271
              90                5.244    4.175    3.713
              95                5.555    4.562    3.937

Source: Evans (1993)
As the upper rows of the panel show, this null can be rejected at standard significance levels when the regressions are estimated with the S&P data. The conventional interpretation of this regression evidence is that market participants' forecasts of future returns vary with the log dividend-price ratio. The lower rows of the panel provide us with an alternative interpretation. Reported here are Monte Carlo distributions for the t-statistics associated with a_1 and b_1 estimated from simulated data based on the maximum likelihood estimates of the switching model in (33). There is only one case where there is a greater than 5% probability of observing a t-statistic less than the asymptotic critical value 1.95. Thus, peso effects appear to have a significant impact on stock returns in this model.
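The logic of such a Monte Carlo exercise can be sketched in a few lines of code. The version below uses a deliberately simplified data-generating process: a two-regime Markov chain drives mean dividend growth, the log dividend-price ratio is set to its regime-contingent present value so that expected returns are constant by construction, and returns follow the Campbell-Shiller approximation in (34). Repeatedly simulating samples of 117 observations and re-running regression A for m = 1 (with plain OLS t-statistics rather than the heteroskedasticity- and overlap-corrected statistics used in Table 2) builds an empirical null distribution for the t-statistic. All parameter values are assumptions chosen for illustration, not the estimates reported above.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, kappa = 0.94, 0.0
mu = np.array([0.08, -0.02])          # regime-dependent mean dividend growth (assumed)
sig = 0.05                            # std deviation of dividend growth shocks (assumed)
P = np.array([[0.90, 0.10],           # P[i, j] = Pr(Z_{t+1} = j | Z_t = i)
              [0.02, 0.98]])

# Regime-contingent log dividend-price ratio implied by constant expected returns:
# delta(z) = -[(I - rho*P)^{-1} mu](z), up to an irrelevant constant.
delta = -np.linalg.solve(np.eye(2) - rho * P, mu)

def simulate(T=117):
    """Simulate (delta_t, r_{t+1}); returns are unforecastable by construction,
    so any estimated predictability is a small-sample (peso-type) artifact."""
    z = np.zeros(T + 1, dtype=int)
    dd = np.zeros(T + 1)
    for t in range(T):
        dd[t + 1] = mu[z[t]] + sig * rng.standard_normal()
        z[t + 1] = rng.choice(2, p=P[z[t]])
    d = delta[z]
    r = kappa + d[:-1] - rho * d[1:] + dd[1:]   # Campbell-Shiller approximation (34)
    return d[:-1], r

def ols_tstat(x, y):
    """t-statistic on the slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (len(y) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return b[1] / se

tstats = np.array([ols_tstat(*simulate()) for _ in range(1000)])
print(np.percentile(tstats, [5, 25, 50, 75, 95]))
```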


3.3. Summary
In this section, I have examined how the prospect of discrete shifts in the behavior of fundamentals can affect the forecasts of rational market participants, and hence the behavior of asset prices. When market participants anticipate a switch in the fundamentals' process, current asset prices will depend on both the forecasts of fundamentals under the current process, and forecasts of the jump in prices if a switch takes place in the future. In "small" samples, variations in this latter term can induce movements in asset prices that appear unrelated to fundamentals and can complicate inferences about the link between prices and fundamentals in particular applications. To illustrate how important these effects may be in practice, I considered models of the term structure and stock prices that incorporate switching in fundamentals. The findings from these models exemplify two important points. First, the presence of switching in fundamentals need not imply that "peso problems" significantly affect the behavior of asset prices. Second, it can be perilous to draw conclusions about the importance of "peso problems" from the estimates of a single switching model.
4. Risk aversion and peso problems

So far we have seen how the presence of "peso problems" can affect the behavior of asset prices and returns through their effect on market participants' expectations. In particular, we have seen how the prospect of a shift in regime can affect the link between asset prices and fundamentals and the properties of rational forecast errors in "small" samples. In this section, I shall consider how the prospect of regime shifts affects the market's assessment of risk. I will begin by examining the impact of "peso problems" in a fairly general theoretical setting. This provides us with the framework to consider recent research on the behavior of asset prices in general equilibrium models with regime switching. In the second half of this section, I will examine how regime switching may provide a potential explanation for the equity premium and forward premium puzzles.

4.1. Peso problems in dynamic asset pricing models


In modern dynamic asset pricing theory, asset prices are constrained by the behavior of a pricing kernel: a stochastic process governing the prices of state-contingent claims. Let γ_{t+1} be a random variable that prices one-period state-contingent claims. If the economy admits no pure arbitrage opportunities, it can be shown that the one-period returns on all traded assets, i, must satisfy

E[γ_{t+1} R^i_{t+1}|Ω_t] = 1 ,    (36)

where R^i_{t+1} is the real gross return on asset i between t and t + 1 [see Duffie (1992)]. I shall refer to γ_{t+1} as the pricing kernel. In economies where there is a complete set of markets for state-contingent claims, there is a unique random variable γ_{t+1} satisfying (36). Under other circumstances, this no arbitrage condition still holds but for a range of γ_{t+1}'s. In economies with a representative agent, γ_{t+1} is the intertemporal marginal rate of substitution so that (36) also represents a first-order condition. For the present, I shall keep the specification of γ_{t+1} general so that the analysis of "peso problems" can be applied to a wide class of asset pricing models. Since (36) applies to all traded assets, the pricing kernel will be related to the return on a risk-free asset, R^0_{t+1}, by E[γ_{t+1}|Ω_t] = 1/R^0_{t+1}. Combining this expression with (36), we obtain an equation for the risk premium on asset i:

E[R^i_{t+1}/R^0_{t+1}|Ω_t] = 1 - Cov(γ_{t+1}, R^i_{t+1}|Ω_t) .    (37)

It is clear from (37) that the presence of a "peso problem" will only affect the risk premium insofar as it influences the conditional covariance term. To examine this influence, consider the simple case where the vector X_{t+1} ≡ [R^i_{t+1}, γ_{t+1}] switches between two regimes. As in Section 2, we can write the realized values of X_{t+1} as

X_{t+1} = E[X_{t+1}(0)|Ω_t] + ∇E[X_{t+1}|Ω_t] Z_{t+1} + W_{t+1} ,    (38)

where ∇E[X_{t+1}|Ω_t] ≡ E[X_{t+1}(1)|Ω_t] - E[X_{t+1}(0)|Ω_t] and E[W_{t+1}|Ω_t] = 0. From (38), it is easy to show that, with W_{t+1} ≡ [w^R_{t+1}, w^γ_{t+1}],

Cov(γ_{t+1}, R^i_{t+1}|Ω_t) = Cov(w^R_{t+1}, w^γ_{t+1}|Ω_t) + ∇E[R^i_{t+1}|Ω_t] ∇E[γ_{t+1}|Ω_t] Var(Z_{t+1}|Ω_t) .    (39)

This decomposition of the conditional covariance allows us to see clearly how the presence of a "peso problem" can affect the risk premium. In the cases where the future regime is known [i.e., Z_{t+1} ∈ Ω_t], there is no "peso problem" and the risk premium only depends on the conditional covariance between the within-regime forecast errors, Cov(w^R_{t+1}, w^γ_{t+1}|Ω_t). Here the variations in the risk premium originate from conditional heteroskedasticity in a regime [i.e., changes in Cov(w^R_{t+1}, w^γ_{t+1}|Ω_t) for a given value of Z_{t+1}] and/or conditional heteroskedasticity induced by a change in Z_{t+1}. By contrast, when a "peso problem" is present [i.e., Z_{t+1} ∉ Ω_t], the risk premium includes the conditional covariance between E[R^i_{t+1}(z)|Ω_t] and E[γ_{t+1}(z)|Ω_t]. This term accounts for the forecast uncertainty market participants face across regimes. It is clear from (39) that the importance of a "peso problem" depends on several factors. In particular, the second term in (39) will make no contribution to the risk premium in cases where the within-regime forecasts of the pricing kernel are the same so that ∇E[γ_{t+1}|Ω_t] = 0. Thus, it is quite possible for a "peso problem" to generate small sample bias and serial correlation in R^i_{t+1} - E[R^i_{t+1}|Ω_t] because ∇E[R^i_{t+1}|Ω_t] ≠ 0, and yet have no effect on the risk premium. While this may appear to be a special case and therefore of limited interest, it turns out to be a feature of some models in the literature.
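A small numerical illustration of the decomposition in (39) may help. The function below adds the within-regime covariance to the peso term ∇E[R^i_{t+1}|Ω_t]∇E[γ_{t+1}|Ω_t]Var(Z_{t+1}|Ω_t), using the fact that for a 0/1 regime indicator Var(Z_{t+1}|Ω_t) = p(1 - p), with p = Pr(Z_{t+1} = 1|Ω_t). The numerical inputs are purely hypothetical.

```python
def conditional_cov(cov_within, dER, dEgamma, p_switch):
    """Decomposition in (39): conditional covariance between the pricing kernel
    and a return = within-regime covariance + peso term driven by cross-regime
    forecast differences.

    cov_within : Cov(w^R, w^gamma | Omega_t), the within-regime term
    dER        : cross-regime difference in expected returns
    dEgamma    : cross-regime difference in the expected pricing kernel
    p_switch   : Pr(Z_{t+1} = 1 | Omega_t); Var(Z_{t+1}|Omega_t) = p(1-p)
    """
    var_z = p_switch * (1.0 - p_switch)
    return cov_within + dER * dEgamma * var_z

# All numbers below are illustrative assumptions.
# Case 1: the kernel does not differ across regimes -> no peso effect on the premium,
# even though dER != 0 still biases small-sample forecast errors.
print(conditional_cov(cov_within=-0.0004, dER=0.05, dEgamma=0.0, p_switch=0.02))
# Case 2: both forecast differences are non-zero -> the peso term alters the premium.
print(conditional_cov(cov_within=-0.0004, dER=0.05, dEgamma=-0.3, p_switch=0.02))
```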

"Peso problems" will contribute to the risk premium in varying degrees depending upon the amount of information market participants have about the future regime. This is easily seen by writing the conditional variance of Z_{t+1} in (39) as

Var(Z_{t+1}|Ω_t) = E[Var(Z_{t+1}|Ω_t, Z_t)|Ω_t] + Var(E[Z_{t+1}|Ω_t, Z_t]|Ω_t) .    (40)

When market participants observe the current regime, the second term in (40) vanishes. The behavior of Var(Z_{t+1}|Ω_t) will then depend entirely on the dynamics governing regime changes. For example, when there is no serial dependence in Z_t, Var(Z_{t+1}|Ω_t, Z_t) will be a constant. In this case, the presence of a "peso problem" introduces a constant into the risk premium. Otherwise, Var(Z_{t+1}|Ω_t, Z_t) will vary with Z_t so that the "peso problems" will introduce another source of variability in the risk premium when there is a change in regime. In cases where market participants do not observe the current regime, the presence of a "peso problem" can contribute to variations in the risk premium within a regime. Here the probabilities Pr(Z_t = z|Ω_t) will change as market participants learn about the current regime and this will lead to variations in both the terms on the right of (40).

4.1.1. Peso problems and the equity premium puzzle


A number of papers have recently used switching models in an effort to relate the observed behavior of equity returns to general equilibrium asset pricing models. In particular, Cecchetti, Lam and Mark (1990, 1993) and Kandel and Stambaugh (1990) have used estimates of switching processes for consumption and dividends to examine the behavior of stock returns in variants of Lucas' model [Lucas (1978)]. These papers nicely illustrate the conditions under which "peso problems" can contribute to the behavior of the returns. In all the papers, the presence of a representative agent with isoelastic utility makes γ_{t+1} = β(C_{t+1}/C_t)^{-η}, where C_t is equilibrium consumption, η is the coefficient of relative risk aversion, and 0 < β < 1. One important difference between the papers is their specification for the switching process governing consumption and dividends. These specifications are summarized in the table below:
Model   Dividend and consumption growth                           Paper
I       Δd_{t+1} = μ_0 + μ_1 Z_t + ε_{t+1}                         Cecchetti, Lam and Mark (1990)
        ΔC_{t+1} = μ_0 + μ_1 Z_t + ε_{t+1}
II      Δd_{t+1} = I_d(Z_t),  ΔC_{t+1} = I_c(Z_t)                  Kandel and Stambaugh (1990)
III     Δd_{t+1} = μ_{0,d} + μ_{1,d} Z_{t+1} + ε_{d,t+1}           Cecchetti, Lam and Mark (1993)
        ΔC_{t+1} = μ_{0,c} + μ_{1,c} Z_{t+1} + ε_{c,t+1}

In Models I and III, Z_t is assumed to follow an independent first-order Markov process that switches between two regimes z = {0, 1}. The errors, ε_{t+1}, are assumed to be independent and identically distributed normal variates with zero mean. The presence of these errors creates uncertainty about growth within each regime. By contrast, in Model II, all the variations in growth originate from changes in Z_t via the indicator functions I_d(.) and I_c(.) that take a different value according to the regime. Here Z_t follows an independent first-order Markov process between four regimes. Although these models are similar in many respects, they have quite different implications for the role played by "peso problems" in determining the behavior of equity returns.
In Model I, equilibrium dividends and consumption are identically equal. Moreover, growth between period t and t + 1 depends upon the current regime Z_t. Since market participants are assumed to observe the current regime in all the models, this implies that there is no uncertainty about the distribution of growth over the next period. To understand the implications this timing assumption has for the role of "peso problems", consider the equilibrium expressions for the pricing kernel and stock returns derived from Model I:

γ_{t+1} = β exp(-ημ_0 - ημ_1 Z_t - ηε_{t+1}) ,
                                                                              (41)
R^s_{t+1} = [exp(δ(Z_t) - δ(Z_{t+1})) + exp(δ(Z_t))] exp(μ_0 + μ_1 Z_t + ε_{t+1}) ,

where δ(z) is the equilibrium log dividend price ratio in regime z. The important thing to note in (41) is that Z_{t+1} only affects realized stock returns. This means that there is no difference between the within-regime forecasts of the pricing kernel, i.e., ∇E[γ_{t+1}|Ω_t] = 0. As a result, uncertainty about the future regime makes no contribution to the equity risk premium because the coefficient on Var(Z_{t+1}|Ω_t) is zero in the expression for Cov(γ_{t+1}, R^s_{t+1}|Ω_t) shown in (39).
While "peso problems" have no effect on the equity premium in this model, they do affect the small sample properties of equity returns, R^s_{t+1}. As the second equation in (41) shows, realized returns depend upon Z_{t+1} through the log dividend-price ratio in t + 1, δ(Z_{t+1}). Provided the ratio varies across regimes [i.e., δ(1) ≠ δ(0)], the within-regime forecasts of future returns will differ from one another so that ∇E[R^s_{t+1}|Ω_t] ≠ 0. As we saw in Section 2, "peso problems" will affect the small sample properties of the rational forecast errors under these circumstances.
Model II has very similar implications. Although Kandel and Stambaugh's model implies a somewhat different expression for the equilibrium log dividend price ratio, the pricing kernel in their model depends upon the current regime as in (41). Consequently, "peso problems" have no effect on the equity premium or expected returns, E[R^s_{t+1}|Ω_t]. As in Model I, the dividend price ratio does vary across regimes creating a dependence between realized returns and the future regime. This, in turn, is the source of a "peso problem" in the rational forecast errors which is reflected in realized returns.
Model III allows uncertainty about the future regime to affect the pricing kernel. This can be clearly seen from the equilibrium expression for the pricing kernel and stock returns:

γ_{t+1} = β exp(-ημ_{0,c} - ημ_{1,c} Z_{t+1} - ηε_{c,t+1}) .    (42)

The most important difference between (42) and (41) is that the pricing kernel now depends upon the future regime, Z_{t+1}, rather than the current regime. This means that there is now the potential for "peso problems" to affect the size of Cov(γ_{t+1}, R^s_{t+1}|Ω_t) through the second term in (39), and hence the behavior of the equity premium. To examine the strength of this peso effect, it is useful to reconsider equation (39), shown below:
Cov(γ_{t+1}, R^s_{t+1}|Ω_t) = Cov(w^R_{t+1}, w^γ_{t+1}|Ω_t) + ∇E[R^s_{t+1}|Ω_t] ∇E[γ_{t+1}|Ω_t] Var(Z_{t+1}|Ω_t) .

As the last term in the equation shows, uncertainty about the future regime will only affect Cov(γ_{t+1}, R^s_{t+1}|Ω_t) when both ∇E[γ_{t+1}|Ω_t] and ∇E[R^s_{t+1}|Ω_t] are non-zero. From (42) we see that the size of ∇E[γ_{t+1}|Ω_t] depends upon the degree of risk aversion via the term -ημ_{1,c}, and the size of ∇E[R^s_{t+1}|Ω_t] depends upon the cross-regime differences in the equilibrium log dividend price ratio, δ(1) - δ(0). Cecchetti, Lam and Mark's estimates imply that δ(1) - δ(0) is close to zero because there is very little serial dependence in regimes [the estimated value of λ_1 + λ_0 - 1 is only 0.06]. As a result, "peso problems" have little impact on the equity risk premium in this model.
There are two lessons to be drawn from the analysis of these models. The first is that the presence of switching need not lead to peso effects in risk premia even though market participants are aware that small sample problems will exist in the errors they make in forecasting future returns. As Models I and II illustrate, peso effects on the risk premia can be ruled out by the (implicit) choice of specification for the equilibrium pricing kernel. The second lesson is more subtle. Even if the specification for the pricing kernel means that peso effects can potentially affect risk premia, the importance of these effects depends upon the dynamics of regime changes. Thus, the presence of switching in fundamentals need not imply that "peso problems" contribute significantly to the behavior of returns.
So far I have only examined the implications of these switching models for the behavior of the conditional equity premium, E[R^s_{t+1}/R^0_{t+1}|Ω_t]. Abel (1993) considers their implications for the unconditional premium, E[R^s_{t+1}/R^0_{t+1}]. Taking unconditional expectations on both sides of (37), and applying the law of iterated expectations, we can write the unconditional premium as

E[R^s_{t+1}/R^0_{t+1}] = 1 - E[Cov(γ_{t+1}, R^s_{t+1}|Ω_t)]
                       = 1 - Cov(γ_{t+1}, R^s_{t+1}) + Cov(E[γ_{t+1}|Ω_t], E[R^s_{t+1}|Ω_t]) ,    (43)

where Cov(.) denotes the unconditional covariance. Abel points out that if the conditionally expected growth rates of consumption and dividends are positively correlated, the last term on the right hand side of (43) will be negative in models with conditional lognormality and constant relative risk aversion. Thus, in these cases, the unconditional risk premium will be lower in the presence of Markov switching than would emerge from a model using the unconditional distribution of shocks. Abel confirms this prediction for the Markov switching specifications in Models I, II, and III.
What implications do these findings have for the potential effects of "peso problems" on the unconditional equity premium? Equation (43) shows that switching in fundamentals will affect the size of the unconditional risk premium through the covariance between E[γ_{t+1}|Ω_t] and E[R^s_{t+1}|Ω_t]. "Peso problems" will therefore only affect the unconditional equity premium to the extent they alter this covariance. This observation suggests that "peso problems" will be of little help in resolving the equity premium puzzle in models where Cov(E[γ_{t+1}|Ω_t], E[R^s_{t+1}|Ω_t]) < 0. However, as we shall see, "peso problems" can have significant effects on the unconditional moments of returns estimated in "small" samples. It is therefore possible that the sample estimates of E[R^s_{t+1}/R^0_{t+1}] and Cov(γ_{t+1}, R^s_{t+1}) used to characterize the equity premium puzzle are quite different from the unconditional population moments.
4.1.2. Peso problems and the forward premium puzzle

In Section 2, we saw how the presence of switching in the spot exchange rate process could generate "peso problems" in exchange rate forecast errors. We also saw how estimates of peso effects could explain some, but not all, of the predictability of foreign exchange returns in the context of Fama's regression. In view of these findings, it is worthwhile investigating whether "peso problems" could contribute to the predictability of returns via the foreign exchange risk premia. Hansen and Jagannathan (1991) provide a suitable framework for this purpose. To begin, write the nominal return on asset i as R^i_{t+1} ≡ L^i_{t+1}/V^i_t, where V^i_t is the dollar value of the asset at t and L^i_{t+1} is the cash flow one period later. The no arbitrage condition in (36) can now be written as V^i_t = E[γ_{t+1}L^i_{t+1}|Ω_t], where γ_{t+1} is the nominal pricing kernel denominated in dollars. Note that γ_{t+1} will be equal to the nominal intertemporal marginal rate of substitution in representative agent models. Next, let L^i_{t+1} = F_t - S_{t+1}, where F_t is the one period forward price and S_{t+1} is the future spot price of foreign currency. Since this cash flow can be generated by selling domestic currency to buy the forward contract, it involves no (net) payments at time t. Thus, the no arbitrage condition in (36) implies that E[γ_{t+1}(F_t - S_{t+1})|Ω_t] = 0. Applying the law of iterated expectations, we can rewrite this restriction as
Cov_T(γ_{t+1}, F_t - S_{t+1}) = -E_T[γ_{t+1}] E_T[F_t - S_{t+1}] ,    (44)

where E_T[.] and Cov_T(.) represent the mean and covariance based on a sample of T observations. Using the Cauchy-Schwarz inequality, (44) implies the following bound on the coefficient of variation for the nominal pricing kernel:

√Var_T(γ_{t+1}) / E_T[γ_{t+1}] ≥ |E_T[F_t - S_{t+1}]| / √Var_T(F_t - S_{t+1}) .    (45)
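Computing the sample quantity on the right hand side of (45) is straightforward. The sketch below does so for the payoff F_t - S_{t+1} on a one-period forward position; the simulated series stands in for actual forward and spot rates, and the magnitudes are illustrative assumptions rather than estimates of the kind reported by Bekaert and Hodrick (1992).

```python
import numpy as np

def hj_lower_bound(forward, spot_next):
    """Sample version of the lower bound in (45) for the coefficient of
    variation of the nominal pricing kernel, computed from the payoff
    F_t - S_{t+1} on a forward position."""
    payoff = np.asarray(forward) - np.asarray(spot_next)
    return np.abs(payoff.mean()) / payoff.std(ddof=1)

# Illustrative use with simulated data (the numbers are assumptions, not estimates):
rng = np.random.default_rng(2)
spot = np.cumsum(rng.normal(0, 0.03, 301)) + 1.5     # stand-in spot exchange rate series
forward = spot[:-1] + 0.002                          # forward rate with a small premium
bound = hj_lower_bound(forward, spot[1:])
print(f"Lower bound on sqrt(Var(gamma))/E(gamma): {bound:.3f}")
```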


The Hansen-Jagannathan bound in (45) applies not only to investments in foreign exchange but also to investments in equities or bonds, or in portfolios that combine all these assets, so long as the associated cash flow at date t is zero. Bekaert and Hodrick (1992) estimate the bounds using equity and foreign exchange returns in the U.S., Japan, U.K. and Germany. For the three exchange rates, they estimate the bound to be as large as 0.48 with a standard error of 0.08. By contrast, the bound for U.S. equity is estimated to be 0.12 with a standard error of 0.10. These estimates appear to be very high when compared against the behavior of the pricing kernel implied by standard asset pricing models with moderate degrees of risk aversion. For example, Bekaert (1994) calculates the left hand side of (45) from an extended version of the Lucas (1982) model to be approximately 0.01 assuming the coefficient of relative risk aversion is equal to 2. From this perspective, the behavior of foreign exchange appears to be even more of a challenge for asset pricing theory than the behavior of equity returns.
To see how the presence of a "peso problem" might help explain these results, consider an economy where equilibrium foreign exchange returns and the nominal pricing kernel switch between two processes. In particular, let X_{t+1} ≡ [F_t - S_{t+1}, γ_{t+1}] so that the joint switching process for the two variables can be represented by (38). Further, let us assume that γ_{t+1}(0) is constant. Now suppose that the researcher calculates the variance bound from a sample of foreign exchange returns that only contains observations from regime zero. Under these circumstances, the no arbitrage condition in (36) implies that

E[F_t - S_{t+1}|Ω_t] = -Cov(γ_{t+1}, F_t - S_{t+1}|Ω_t) / E[γ_{t+1}|Ω_t] ,

where Cov(γ_{t+1}, F_t - S_{t+1}|Ω_t) = ∇E[F_t - S_{t+1}|Ω_t] ∇E[γ_{t+1}|Ω_t] Var(Z_{t+1}|Ω_t). The absolute value of the mean excess return from such a sample is therefore

|E_T[F_t - S_{t+1}]| = |E_T[ ∇E[F_t - S_{t+1}|Ω_t] ∇E[γ_{t+1}|Ω_t] Var(Z_{t+1}|Ω_t) / E[γ_{t+1}|Ω_t] ]| .    (46)

Thus, the absolute value of the mean excess return will be greater than zero whenever the term in the numerator is non-zero. We saw above that this term determines whether a "peso problem" is present in the risk premium. When a "peso problem" is present, (46) indicates that the sample estimate of the lower bound on the right hand side of (45) is greater than zero. Now suppose that a researcher compared the predictions of a particular general equilibrium asset pricing model against this bound. If the model ignored regime switching, and the data used to calibrate the model was from regime zero, the implied value of √Var_T(γ_{t+1})/E_T[γ_{t+1}] will be close to zero. This value could easily violate the lower bound in (45) based on the sample behavior of returns.
This example illustrates the potential effects of "peso problems" on variance bound calculations. The violation of the variance bounds in the example occurs because the sample distribution of F_t - S_{t+1} and γ_{t+1} is unrepresentative of the underlying distribution used by market participants in their assessment of risk. In


this particular case, the sample distribution of the pricing kernel implied that there was no foreign exchange risk premium because Cov(γ_{t+1}(0), F_t - S_{t+1}) = 0. In reality however, market participants accounted for the risk associated with the switch to regime 1 through ∇E[F_t - S_{t+1}|Ω_t] ∇E[γ_{t+1}|Ω_t] Var(Z_{t+1}|Ω_t). Of course, these effects should disappear in large samples as the sample distribution of data approaches the underlying distribution.
4.1.3. Summary

The discussion above shows that "peso problems" can potentially affect the behavior of returns through their implications for the market's assessment of risk. I have identified the conditions under which uncertainty about the process driving future fundamentals can lead to a peso effect in the risk premium. Importantly, these conditions differ from those needed to generate "peso problems" in forecast errors and may not be met by every switching model. I have also shown how variance bounds can be affected in "small" samples when "peso problems" affect the risk premia. One question for future research is whether standard general equilibrium models extended to include peso effects in the risk premia are capable of meeting the bound requirements implied by the observed behavior of equity and foreign exchange returns.

5. Econometric issues

The central point to emerge from the analysis above is that the presence of a "peso problem" can complicate inferences about the behavior of asset prices and returns in "small" samples. Once this point has been recognized, the researcher faces two related problems. The first concerns the size of the available data sample. As we have seen, size in this context means much more than the number of data periods. Theoretically, the size of a sample depends on the difference between the sample distribution of the data and the underlying distribution used by market participants. A data sample is "small" when there is a significant difference between the two. In conventional rational expectations models without regime switching, the span of the data set is often used as a reliable indicator of size. While there are no hard and fast rules, researchers have routinely used asymptotic inferences in data sets as short as 15 years. Unfortunately, the simulation results in the literature indicate that data spans of over 100 years can be considered "small" when regimes switch infrequently. This suggests that there is no way to judge whether a data set is "small" without a model characterizing regime switches in the sample. The second problem concerns the modelling of regime switching. Following the pioneering work of Hamilton (1988, 1989), a plethora of switching specifications have been used to characterize regime switching in various applications. As we saw above, the choice of switching specifications can have far-reaching consequences for the potential role of peso effects. It is therefore important that the switching model be appropriately specified if we want to accurately gauge the


importance of "peso problems". Unfortunately, this requirement forces the researcher to face some thorny econometric issues. In this section, I will try to provide some practical guidance towards addressing these problems. I will not discuss the techniques used to estimate particular switching models since they are well covered in Hamilton (1994).

5.1. Small samples


At the outset, it should be clear that there is no way to definitively tell whether a data sample is "small" in a finite sample. It is always possible that market participants are influenced by the possibility of a switch to a regime that never occurred during the sample period. In this case, we can never hope to uncover the underlying distribution used by market participants in decision-making, however well we manage to characterize the distribution of regime switches that took place in the sample. Pathological small sample problems of this type could only be detected in an infinite sample.
Putting these pathological cases aside, how might a researcher proceed? One approach is to assume that the sample is well characterized by a single regime and then look for evidence against this null hypothesis. Although the details of this approach will vary according to the application, the general idea is that the presence of regime switching will manifest itself as parameter instability in the reduced form equations of the model. For example, for the dividend ratio model described in Section 3, regime switching generates parameter instability in a standard VAR for δ_t and Δd_t:

[δ_{t+1} ]   [A_{11}  A_{12}] [δ_t ]   [μ_1]   [v_{1,t+1}]
[Δd_{t+1}] = [A_{21}  A_{22}] [Δd_t] + [μ_2] + [v_{2,t+1}] .    (47)

In this case, the proposed procedure would be to estimate (47) and then test for instability in the estimated coefficients A_{ij} and μ_i. The tests developed by Hansen (1991) could be used for this purpose. Of course, evidence of parameter instability need not imply that the sample contains more than one regime. It may reflect other forms of misspecification instead. Nevertheless, finding evidence of parameter instability should lead to the consideration of regime switching.
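As a rough illustration of the first step, the sketch below estimates the VAR in (47) by OLS and compares the coefficient estimates across the two halves of the sample. This split-sample comparison is only a crude screening device, not Hansen's (1991) test; the simulated data and all numerical values are assumptions.

```python
import numpy as np

def estimate_var1(y):
    """OLS estimates of a first-order VAR like (47):
    y_t = A y_{t-1} + mu + v_t, with y_t = (delta_t, dividend growth_t)'."""
    Y, X = y[1:], np.column_stack([y[:-1], np.ones(len(y) - 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B[:-1].T, B[-1]                  # A (2x2), mu (2,)

def split_sample_check(y):
    """Crude stability diagnostic: compare coefficients across the two halves."""
    h = len(y) // 2
    (A1, m1), (A2, m2) = estimate_var1(y[:h]), estimate_var1(y[h:])
    return np.abs(A1 - A2).max(), np.abs(m1 - m2).max()

# Illustrative use with simulated data; parameter values are assumed.
rng = np.random.default_rng(3)
T, y = 120, np.zeros((120, 2))
A_true = np.array([[0.9, -0.2], [0.1, 0.3]])
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(0, 0.05, 2)
print(split_sample_check(y))
```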

5.2. Alternative switching models


Once the researcher finds some evidence of parameter instability and decides to investigate the possibility of regime switching, the natural question arises of how to model the switching process. Since economic theory rarely provides any specific guidance on this issue, the common approach has been to select a model on econometric grounds. In particular, researchers have typically first estimated an ad hoc switching specification and then evaluated how well it characterizes the data sample with a series of specification tests. As switching models are highly nonlinear, inferences from these tests are usually based on asymptotic distribution


theory. Unfortunately, as Hansen (1992) points out, the regularity conditions used in standard asymptotic theory are often violated in situations where we want to conduct specification tests on switching models. In particular, tests for the number of regimes require non-standard distribution theory. To address this problem, Lam (1990) and Cecchetti, Lam and Mark (1990) use Monte Carlo simulations in which they repeatedly estimate their proposed switching model on data generated under the null hypothesis of a single regime, i.e., no switching. The results from these simulations are then used to derive the empirical distribution of the test statistics under the null hypothesis. Although this procedure appears reasonably straightforward, it may not be easy to implement in practice for two reasons. First, the switching model has to be repeatedly estimated in order to build the empirical distribution. This can require a significant amount of computation. Second, since the data used to estimate these models is generated under the null hypothesis of no switching, the likelihood function for the switching model is likely to be very ill-behaved. As a result, nonlinear optimization techniques may have a very hard time finding the global maximum.
Hansen (1992) has advocated an alternative to this Monte Carlo simulation approach. He uses the theory of empirical processes to derive a bound on the asymptotic distribution of a standardized likelihood ratio statistic that is applicable even when conventional regularity conditions are violated. Unfortunately, calculating this bound also requires an enormous amount of computation in all but the simplest models.
Where does this leave the researcher? At present, there does not appear to be an easy way to conduct correct asymptotic inferences about the number of regimes to include in a model. In simple models it may be feasible to use either of the methods described above, but in others the CPU requirements appear well beyond the reach of most researchers. Perhaps the best approach in these latter cases is to consider the implications of alternative models with a different number of regimes. Recall from Sections 3 and 4 that the presence of regime switching need not lead to peso effects in asset pricing models. In particular, we examined switching models that did not generate peso effects because the estimated transition probabilities implied that the regimes were serially independent. Thus, there is little a priori reason to think that spurious peso effects will be present in a model with "too many" regimes. We may be able to side-step the question of how many regimes exist by showing that similar peso effects are present in models that use switching processes with different numbers of regimes.
Aside from choosing the number of regimes, the researcher also has to specify the process for regime switching. Following Hamilton (1988, 1989), most models in the literature have assumed that the process governing the regime, Z_t, follows an independent first-order Markov process. As we saw in Section 3, this assumption simplifies the calculations needed to quantify the effects of switching in dynamic asset pricing models. However, a number of authors have argued that this assumption may be unduly restrictive in certain applications. As an alternative, Diebold, Lee and Weinbach (1994) suggest that the transition probabilities


be modelled as logistic functions of a vector of variables x_t. In the case of a two regime model, the transition probabilities are given by

Pr(Z_{t+1} = z|Z_t = z, x_t) = exp(x_t'β_z) / (1 + exp(x_t'β_z)) ,    (48)

for z = {0, 1}. When x_t includes a constant, the constant probability model is nested within this specification. Papers using this more flexible switching specification include Engel and Hakkio (1994) and Filardo (1994). If our objective is to provide a parsimonious yet flexible switching representation for a time series process, allowing for endogenous transition probabilities is certainly attractive. But if the estimated switching model is to be used to represent the dynamics of fundamentals in an asset pricing model, the presence of endogenous transition probabilities greatly complicates the model. In this situation, it may be more attractive to think about alternative specifications for the switching process while maintaining the assumption of constant probabilities.
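The mapping in (48) is easy to implement. The helper below evaluates the continuation probability for a given regime; the choice of conditioning variables (a constant plus, say, an interest rate spread) and the coefficient values are hypothetical.

```python
import numpy as np

def transition_prob(x_t, beta_z):
    """Probability of remaining in regime z next period, as in (48):
    Pr(Z_{t+1} = z | Z_t = z, x_t) = exp(x_t'beta_z) / (1 + exp(x_t'beta_z))."""
    u = float(np.dot(x_t, beta_z))
    return np.exp(u) / (1.0 + np.exp(u))

# With x_t = (1,) and a constant beta_z, the fixed-probability model is recovered.
print(transition_prob(np.array([1.0]), np.array([2.2])))          # about 0.90
# Adding another variable (values below are hypothetical) lets the probability move over time.
x_t = np.array([1.0, 0.5])             # constant plus, say, an interest rate spread
beta_z = np.array([2.2, -1.5])
print(transition_prob(x_t, beta_z))
```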
5.3. Summary

Researchers interested in examining the empirical importance of "peso problems" face a number of difficulties. Since the theoretical impact of "peso problems" is confined to "small" samples, the question of whether a particular sample is "small enough" is an important one. Unfortunately, it is very hard to judge whether a sample is "small" without the explicit use of switching models. Furthermore, modelling regime switching presents a number of challenges. Since conventional asymptotic inference cannot be used to differentiate between models with different numbers of regimes, in practice it will often be impossible to provide sound statistical evidence supporting a particular switching specification. Thus, the best practical way forward may be to make sure that the significance of estimated peso effects obtained using a particular switching specification is robust to alternative specifications.

6. Conclusion

In this chapter, I have examined the channels through which the presence of "peso problems" may affect the behavior of asset prices. Although the peso effects described above will only be present in "small" samples, this theoretical constraint does not appear to limit the potential for "peso problems" to affect the observed behavior of asset prices in many applications using typical data sets. Thus, the question of whether "peso problems" contribute to the well-known asset pricing puzzles in the literature is largely an empirical one. If there is strong econometric evidence to support the presence of discrete shifts in the distribution of the data, "peso problems" can potentially affect asset prices. Going beyond this to make a strong case for the significance of peso effects in a particular application is challenging.


Nevertheless, there are a number of directions that future research on "peso problems" may profitably take. Although most research to date has focused on the implications of "peso problems" for the behavior of rational forecast errors, "peso problems" can also affect the link between fundamentals and asset prices and the assessment of risk. To examine these effects, we need to consider the behavior of asset prices in a general equilibrium setting allowing for both risk aversion and switching in the fundamental processes. With such models, we will be able to consider all the potential implications of "peso problems" for the behavior of a single asset price. These models will also allow us to consider the implications of "peso problems" across asset markets. Insofar as "peso problems" have a common source, like shifts in government policy, it seems likely that cross-market information will be very useful in estimating the significance of peso effects.

References
Abel, A. B. (1993). Exact solutions for expected rates of returns under Markov regime switching: Implications for the equity premium puzzle. J. Money Credit Banking 26, 345-361.
Backus, D., S. Foresi and C. Telmer (1994). The forward premium anomaly: Three examples in search of a solution. Manuscript, Stern School of Business, New York University.
Bekaert, G. (1994). Exchange rate volatility and deviations from unbiasedness in a cash-in-advance model. J. Internat. Econom. 36, 29-52.
Bekaert, G. and R. J. Hodrick (1992). Characterizing the predictable components in equity and foreign exchange rates of return. J. Finance 47, 467-509.
Campbell, J. Y. and R. J. Shiller (1989). The dividend-price ratio and expectations of future dividends and discount factors. Rev. Financ. Stud. 1, 195-228.
Cecchetti, S. J., P. Lam and N. C. Mark (1990). Mean reversion in equilibrium asset prices. Amer. Econom. Rev. 80, 398-418.
Cecchetti, S. J., P. Lam and N. C. Mark (1993). The equity premium and the risk-free rate: Matching the moments. J. Monetary Econom. 31, 21-46.
Diebold, F. X., J. Lee and G. C. Weinbach (1994). Regime switching with time-varying transition probabilities. In: Hargreaves, ed., Nonstationary Time Series Analysis and Cointegration (Advanced Texts in Econometrics). Oxford: Oxford University Press, 283-302.
Duffie, D. (1992). Dynamic Asset Pricing Theory. Princeton, N.J.: Princeton University Press.
Engel, C. and C. S. Hakkio (1994). The distribution of exchange rates in the EMS. NBER Working Paper No. 4834.
Engel, C. and J. D. Hamilton (1990). Long swings in the dollar: Are they in the data and do the markets know it? Amer. Econom. Rev. 80, 689-713.
Evans, M. D. D. (1993). Dividend variability and stock market swings. Manuscript, Stern School of Business, New York University.
Evans, M. D. D. and K. K. Lewis (1994). Do risk premia explain it all? Evidence from the term structure. J. Monetary Econom. 33, 285-318.
Evans, M. D. D. and K. K. Lewis (1995a). Do inflation expectations affect the real rate? J. Finance L, 225-253.
Evans, M. D. D. and K. K. Lewis (1995b). Do long-term swings in the dollar affect estimates of the risk premia? Rev. Financ. Stud., to appear.
Fama, E. (1984). Forward and spot exchange rates. J. Monetary Econom. 14, 319-338.
Filardo, A. J. (1994). Business-cycle phases and their transitional dynamics. J. Business Econom. Statist. 12, 299-308.
Flood, R. P. and R. J. Hodrick (1986). Asset price volatility, bubbles, and process switching. J. Finance XLI, 831-841.

Frankel, J. A. (1980). A test of rational expectations in the forward exchange market. South. Econom. J. 46.
Fullenkamp, C. R. and T. A. Wizman (1992). Returns on capital assets and variations in economic growth and volatility. Manuscript, Department of Finance and Business Economics, University of Notre Dame.
Hamilton, J. D. (1988). Rational expectations analysis of changes in regime: An investigation of the term structure of interest rates. J. Econom. Dynamic Control 12, 385-423.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57, 357-384.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, N.J.: Princeton University Press.
Hansen, B. E. (1991). Testing for parameter instability in linear models. Manuscript, University of Rochester.
Hansen, B. E. (1992). The likelihood ratio test under nonstandard conditions: Testing the Markov switching model of GNP. J. Appl. Econometrics 7, S61-S82.
Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. J. Politic. Econom. 99, 225-262.
Hodrick, R. J. (1992). Dividend yields and expected stock returns: Alternative procedures for inference and measurement. Rev. Financ. Stud. 5, 357-386.
Johansen, S. (1988). Statistical analysis of cointegration vectors. J. Econom. Dynamic Control 12, 231-254.
Kaminsky, G. (1993). Is there a peso problem? Evidence from the dollar/pound exchange rate, 1976-1987. Amer. Econom. Rev. 83, 450-472.
Kaminsky, G. and K. K. Lewis (1992). Does foreign exchange intervention signal future monetary policy? Working Paper No. 93-3, The Wharton School, University of Pennsylvania.
Kandel, S. and R. Stambaugh (1990). Expectations and volatility of consumption and asset returns. Rev. Financ. Stud. 3, 207-232.
Krasker, W. S. (1980). The peso problem in testing the efficiency of the forward exchange markets. J. Monetary Econom. 6, 269-276.
Kugler, P. (1994). The term structure of interest rates and regime shifts: Some empirical results. Manuscript, Institut für Wirtschaftswissenschaften.
Lam, P. (1990). The Hamilton model with a general autoregressive component. J. Monetary Econom. 26, 409-432.
Lewis, K. K. (1989a). Changing beliefs and systematic forecast errors. Amer. Econom. Rev. 79, 621-636.
Lewis, K. K. (1989b). Can learning affect exchange-rate behavior? J. Monetary Econom. 23, 79-100.
Lewis, K. K. (1991). Was there a peso problem in the U.S. term structure of interest rates: 1979-1982? Internat. Econom. Rev. 32, 159-173.
Lewis, K. K. (1994). Puzzles in international financial markets. NBER Working Paper No. 4951, to appear in Grossman and Rogoff, eds., The Handbook of International Economics. Amsterdam: North Holland.
Lizondo, J. S. (1983). Foreign exchange futures prices and fixed exchange rates. J. Internat. Econom. 14, 69-84.
Lucas, R. E. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1445.
Lucas, R. E. (1982). Interest rates and currency prices in a two-country world. J. Monetary Econom. 10, 335-360.
Pagan, A. and G. W. Schwert (1990). Alternative models for conditional stock volatility. J. Econometrics 45, 267-290.
Rogoff, K. S. (1980). Essays on expectations and exchange rate volatility. Unpublished Ph.D. Dissertation, Massachusetts Institute of Technology.
Shiller, R. J. (1979). The volatility of long-term interest rates and expectations models of the term structure. J. Politic. Econom. 87, 1190-1219.
Sola, M. and J. Driffill (1994). Testing the term structure of interest rates using a stationary vector autoregression with regime switching. J. Econom. Dynamic Control 18, 601-628.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14. 1996 Elsevier Science B.V. All rights reserved.

~"1'~

Modeling Market Microstructure Time Series*

Joel Hasbrouck

1. Introduction

Market microstructure is the area of financial economics that focuses on the trading process. Factors both practical and academic are motivating research here. On the practical side, innovation in financial markets has resulted in increased trading volume in standard securities (stocks, bonds, etc.), creation of new types of securities, and greater experimentation with alternative trading mechanisms. From the academic perspective comes a fuller understanding of the role played by trading in the incorporation of new information into security prices. Empirical work in the area has also benefited from the increasing availability of detailed transaction data.

Microstructure research seeks to address two sorts of questions. The first belongs to the study of markets narrowly defined: how should transaction costs be estimated; what are the optimal trading strategies; and how should markets be organized? The second and broader set of questions arises from the role that the market plays in price discovery (the incorporation of new information into the security price): how can we characterize the determinants of security value that we loosely refer to as public and private information? Ultimately these two types of questions are related. The organization of a market may affect the transaction costs, and therefore the net return to an investor, the valuation of the asset and the allocation of real resources (Amihud and Mendelson (1986)). Conversely, the characteristics of an asset (risk, return, homogeneity, divisibility) may favor certain holding patterns among investors and certain market structures (Grossman and Miller (1988)).

Empirical microstructure analyses draw on three areas of knowledge. The first comprises the formal economic models of individual behavior that offer substantive predictions about how observable variables should behave. The second area is statistical time series analysis. The third area concerns the institutional realities: the actual procedures by which individuals and automated systems work to accomplish trades in a particular market.

*All errors are my own responsibility.


The theoretical work in market microstructure has centered around several reasonably well-defined paradigms that serve as a common basis for variations. The evolution of thought on security transaction price behavior has passed from basic martingale models, to noninformational cost models (order processing and inventory control paradigms), and finally to models that incorporate the distinctly informational and strategic aspects of trading. Although this paper will describe the intuitions behind these models, it does not present a rigorous discussion. O'Hara (1994) provides a comprehensive textbook discussion that establishes much of the economic background for this paper.

Present empirical work in microstructure is characterized by a wide diversity of techniques. Market data exhibit a panoply of features that are hostile to statistical modeling: complex dynamics, nonlinearities, nonstationarities, and irregular timing to name a few. The impracticality of modeling all of these features jointly, in a specification that can also potentially resolve alternative economic hypotheses, leads to a multitude of more modest models that simply try to capture one or two phenomena relevant to the problem at hand. To establish a common footing, however, the models considered in this paper are cast in the framework of linear multivariate time series analysis.

Most of the statistical techniques discussed here were originally developed and applied to macroeconomic time series. (Lütkepohl (1993) and Hamilton (1994) are excellent textbook presentations.) The reader approaching the present paper from a macro perspective will find most of the time series results familiar. But time series analysis is not a mechanical procedure, and the application of any technique to a new problem involves some reflection on the economics of the situation and the nature of the data. Some issues that cause great difficulty in macro applications are conveniently absent in microstructure data: microstructure observations are exceedingly numerous and the fine time intervals over which the data are collected greatly mitigate the simultaneity induced by time aggregation. On the other hand, microstructure data often exhibit troublesome properties such as discreteness that rarely arise in macro analyses.

Except as necessary to motivate the economic or statistical material, this paper does not discuss the institutional details of particular markets. For reasons of data availability, however, most empirical work has focused on U.S. equity markets, particularly the New York Stock Exchange (NYSE). Hasbrouck, Sofianos and Sosebee (1993) discuss the NYSE in detail. The NYSE and other U.S. and non-U.S. equity markets are described in Schwartz (1988 and 1991).

In contemplating the various empirical approaches to microstructure modeling, it is useful to bear in mind two dichotomies or principles of differentiation. The first dichotomy arises from the issues to which microstructure analysis is commonly addressed: the narrowly defined questions of market design and operational market performance vs. the broader informational and security valuation issues. From an economic perspective, the actual security price in many microstructure models can be interpreted as an idealized "informationally efficient" price, corrupted by perturbations attributable to the frictions of the trading process. From an empirical viewpoint, the distinction can loosely be viewed as


one based on time horizon. New information imparts a permanent revision to the expectation of a security's value, while microstructure effects are short-lived and transient. The first principle, then, is the dichotomy of security price variations into permanent (informational) and transitory (market-friction-related) components.

The second dichotomy addresses the source of the price variations, as to whether or not they are trading-related, i.e., attributable to one or more transactions. This distinction is more subtle than the first, because while the difference between permanent and transitory components arises frequently in economic analysis, the preoccupation with the role of trades per se in price determination is largely peculiar to microstructure studies. For the present purpose, the most important aspects of a trade are the fact and time of its occurrence, the price and volume (quantity), and whether the trade was initiated by the buyer or the seller.

This last characteristic may require some elaboration. Academic economists have long reacted to lay statements like, "Heavy buying drove stock prices higher today," with retorts along the lines of, "So, there were no sellers?" Certainly there must be a seller for every buyer. At a fine level of observation, however, it is often sensible to identify the active and passive sides of the transaction. The active transactor can be viewed (in the sense of Demsetz (1968)) as the agent who seeks to trade immediately, and is willing to pay a price to do so. The passive transactor is the supplier of immediacy. In many security markets, for example, the passive traders are those who post bid and offer quotes (indicated prices at which they are willing to buy or sell), and wait. The traders who impatiently demand an immediate trade, and accept one of the quotes (hitting the bid or lifting the offer), are active.

A trade can affect both the permanent and transitory components of the price. The permanent effect is informational. In asymmetric information models, the informational impact of a trade is attributed to the market's estimate of the private information content of the trade. The price rises in response to a buyer-initiated trade, for example, in accordance with the market's assessment of the chances that the trade was initiated by positive information known to the buyer, but not to the public. The portion of the permanent price movements that can be attributed to trades is therefore related to the degree of information asymmetry concerning the firm's value. From a statistical viewpoint, it may be measured by the explanatory power of trade-related variables in accounting for price changes.

The transitory price effect of a trade is a perturbation induced by the trade that drives the current (and possibly subsequent) transaction prices away from the corresponding informationally accurate (permanent component) prices. For a particular trade, this divergence may sometimes be interpreted as a trading cost. In simple bid-ask spread models, for example, the divergence corresponds to a cost paid by the active trader to the passive trader. More generally, the trade-related transitory effect will reflect influences such as price discreteness and inventory control (position management) by dealers.

For the sake of completeness it should be mentioned that both permanent and transitory price components may be due to considerations not directly related to


trades. Security prices (or indicated prices) react to public information, such as news releases. The permanent effect of a public news release is informational. Any lagged adjustment toward the new permanent price would constitute a transitory component.

The principal dichotomies of permanent vs. transitory and trade-related vs. trade-unrelated are summarized in Table 1. For each combination, the table gives economic examples and also considerations useful in empirical resolution. These will be discussed at length in the following sections. Although these distinctions are useful for classification and exposition, this simplicity comes at the cost of neglecting economic considerations that cross over these dichotomies. As noted earlier, the operational features of a security market may affect the informational characteristics of a security and vice versa. However, many useful analyses can proceed under plausible ceteris paribus assumptions. Assuming that market structure stays fixed, one may want to examine shifts in information characteristics surrounding corporate announcements. Alternatively, assuming that the informational structure stays fixed, one might want to examine the effect of a change in the tick size (minimum price increment). The literature contains examples of both sorts of analyses.

While an overview of any sort requires the imposition of some classification scheme, the particular perspective adopted here follows from a personal preoccupation with the dynamic properties of microstructure data. One could organize a survey historically or from the perspective of different market participants, perhaps with equal justification. Nor is the perspective adopted here
Table 1
A classification of microstructure effects

Trade-induced price change (attributable to an actively initiated transaction):
  Permanent (informational) --
    Economic: Market's assessment of the information content of the trade (asymmetric information).
    Statistical: Random-walk component of price attributable to trade variables.
  Transient (market related) --
    Economic: Non-informational spread effects, transaction costs, dealer inventory control effects, price discreteness.
    Statistical: Stationary component of price attributable to trade variables.

Price change not trade-induced:
  Permanent (informational) --
    Economic: Public information.
    Statistical: Random-walk component of price change not attributable to trade variables.
  Transient (market related) --
    Economic: Lagged adjustment to public information, price discreteness.
    Statistical: Stationary component of price not explained by trades.


is an exhaustive one. I attempt to point the reader to approaches that lie outside of this framework, but cannot claim to do justice to these studies.1

The organization of the paper is as follows. The next two sections describe the basic economic paradigms of market microstructure using simple structural models. Section 4 presents a general statistical framework in which the diverse microstructure effects can be accommodated while maintaining the two distinctions described above. The next sections address particular characteristics of microstructure data that lie beyond (or at least at the fringes of) conventional techniques: irregular timing of market events such as trades (Section 5); price discreteness (Section 6); nonlinearities in the trade-price relation (Section 7); and multiple security / multiple market situations (Section 8). A summary concludes the paper in Section 9.

1 A recent survey by Goodhart and O'Hara (1995) provides more background on volatility modeling and non-equity market applications.

2. Simple univariate models of prices


2.1. Martingales and the random-walk model

The efficient markets hypothesis of financial economics generally implies that a security price (perhaps normalized to reflect an expected return) behaves as a martingale, a stochastic process with unforecastable changes (Samuelson (1965) and Fama (1970)). A special case useful for empirical work is the homoskedastic random walk, wherein the evolution of the security price p_t is given by

    p_t = p_{t-1} + w_t                                          (2.1)

where the w_t are disturbances with E[w_t] = 0, E[w_t^2] = σ_w^2 and E[w_t w_τ] = 0 for t ≠ τ. These unforecastable increments derive from updates to the market's information set (cf. Table 1). This model is often generalized to include an unconditional expected price change or return, but for reasons both expositional and practical (described below) this component is omitted in the present discussion.

The martingale property typically arises because the fundamental security valuation in many models is characterized as a conditional expectation of the security's terminal (liquidation) cash flow. A sequence of conditional expectations is a martingale (Karlin and Taylor (1975, p. 246)). For the actual security price to behave as a martingale, however, additional structure must be imposed. The hypothesis that transaction prices behave as a random walk rests on assumptions (most importantly, the absence of transaction costs) that do not hold even approximately at the level of the microstructure phenomena considered in this paper.

The random-walk model is nevertheless a useful point of departure. Even if the (martingale) conditional expectation does not completely determine the security


price, it certainly constitutes a component that is large and economically important. Accordingly, even for models in which actual transaction price processes exhibit complicated dependencies, examination of the random-walk component of the price will illuminate the informational structure of the market. Furthermore, the departure of actual prices from the implicit martingale component may be used to illuminate the costs of transacting in the market.

In embedding the random-walk model in microstructure frameworks, however, one should bear in mind the importance of the conditioning information. A price p_t is said to be a martingale with respect to a (possibly vector-valued) information process φ_t if E[p_{t+1} | φ_0, φ_1, ..., φ_t] = p_t. If the conditioning information includes the price (p_t ∈ φ_t), then E[p_{t+1} | p_0, p_1, ..., p_t] = p_t. This ensures that the increments w_t in (2.1) are unforecastable. The assertion that p_t ∈ φ_t is frequently supported by institutional fact. Most of the early theoretical and empirical work on market efficiency focused on U.S. equity markets, for which transaction prices are promptly reported and widely disseminated. Many markets, however, such as the U.S. government securities market, do not enforce trade reporting, or, as in the case of the London equities market, permit delayed reporting of certain trades (Naik, Neuberger and Viswanathan (1994)). In the absence of prompt trade reporting, the fallback justification of (2.1) is that the transaction price is redundant, i.e., that it contains no new information beyond that available in the public information set. This view is unattractive because current economic thought accords great significance to the role played by prices as aggregators or signals of private information.

In summary then, the random-walk model, which is a component of most of the specifications discussed in this paper, is only appropriate in markets with prompt transaction reporting. Absent this disclosure, other approaches must be used. Instead of using transaction prices that may not be widely disseminated, for example, it may be preferable to use dealer bid and offer quotes.

Correct specification of the conditioning information at the transaction level may be exceedingly difficult because knowledge will often differ in a subtle fashion across participants by reason of proximity to the market and cost. For example, the contents of the book (pending orders) on the Tokyo Stock Exchange are publicly available in the sense that anyone may obtain the information from his or her broker. But the data are electronically transmitted only in response to an inquiry and only to the broker's lead office (Hamao and Hasbrouck (1995), Lehmann and Modest (1994)). Costs of information acquisition that are small at long time lags may become large over microstructure time frames. Daily closing security prices are available for the price of a newspaper, for example, while immediate updates require expensive real-time data feeds.

The preceding remarks are intended to heighten the reader's sensitivity to informational issues that are often suppressed (in the interests of tractability) in the formal models. When aspects of these models are incorporated into specifications and estimated for real market data, these considerations usually warrant at least some qualification of the conclusions.


Equation (2.1) is specified in terms of price levels. It is often useful to interpret p_t as the natural logarithm of the price, in which case the first difference is a continuously compounded rate of return. This is particularly convenient when the analysis covers multiple securities spanning a wide range of prices, and in many applications does not affect the conclusions. It should be borne in mind, however, that most of the formal models are constructed using price levels. Furthermore, certain microstructure phenomena (discreteness, in particular) depend fundamentally on the price level.

Many tests have been proposed and applied to the problem of determining whether stock prices follow a random walk over daily or longer intervals (Fama (1970) and Lo and MacKinlay (1988)). At the level of transaction prices, however, the random-walk conjecture is a straw man, a hypothesis that is very easy to reject in most markets even in small data samples. In microstructure, the question is not "whether" transaction prices diverge from a random walk, but rather "how much?" and "why?" For the present, however, it is useful to discuss several aspects of estimation in random-walk models that will also apply in more realistic situations.

Microstructure data sets typically contain large numbers of observations (often in the thousands for each security) over a relatively brief period of calendar time (such as a few months). To the econometrician seeking to estimate the parameters of a microstructure model, the abundance of observations appears to hold out the promise of high precision. Unfortunately, when the number of observations is a consequence of fine sampling (rather than a long span of calendar time), the increase in precision is partially illusory. In particular, Merton (1980) shows that while the precision of the estimate of variance per unit time increases, that of the mean estimate does not. In view of the large estimation errors for the mean, Merton suggests estimating the variance using the noncentral sample moment. There are two practical implications of this for transaction-level analyses. First, if we are willing to accept a small bias in our estimates, the precision of these estimates is enhanced by ignoring the unconditional expected return (suppressing the intercept in price-change specifications). The discussions that follow do this as a matter of routine, although it is usually a simple matter to add a nonzero expected return. Second, tests of economic hypotheses that are based on second moments (variances and covariances) are likely to be more powerful than those that rely on first moments.
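The force of Merton's observation can be illustrated with a minimal simulation sketch (Python; the drift and volatility values are hypothetical and chosen only for illustration): over a fixed calendar span, finer sampling sharpens the variance estimate but leaves the precision of the mean estimate essentially unchanged.

```python
import numpy as np

# A minimal sketch (hypothetical parameters) of Merton's (1980) point:
# over a fixed calendar span, finer sampling sharpens the variance
# estimate but not the estimate of the mean (drift).
rng = np.random.default_rng(0)
mu, sigma, span = 0.10, 0.20, 1.0      # drift and volatility per unit time
n_rep = 2000                           # simulation replications

for n_obs in (12, 10_000):             # coarse vs. fine sampling of the same span
    dt = span / n_obs
    # increments of the random walk, shape (replications, observations)
    dw = rng.normal(mu * dt, sigma * np.sqrt(dt), size=(n_rep, n_obs))
    mean_hat = dw.sum(axis=1) / span               # drift per unit time
    var_hat = (dw ** 2).sum(axis=1) / span         # noncentral second moment
    print(f"n = {n_obs:6d}:  s.e.(mean) = {mean_hat.std():.4f}, "
          f"s.e.(variance) = {var_hat.std():.5f}")
```

Across replications the standard error of the variance estimate falls sharply as the number of observations grows, while that of the mean estimate does not.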

2.2. Models with random pricing errors

It is useful to generalize the random-walk model by allowing the security price to reflect a stationary disturbance in addition to the random-walk component. The general structural model is:
    m_t = m_{t-1} + w_t
    p_t = m_t + s_t                                              (2.2)


Here, the random-walk term is m_t, which may be interpreted as an implicit efficient price, where (as in (2.1)) the w_t are unforecastable increments arising from updates to the conditional expectation of the security's terminal value. The second component in the price equation (s_t) is a stationary component that for the moment can be viewed in an ad hoc fashion as a residual or perturbation that drives the transaction price away from the implicit efficient price.

Model (2.2) establishes the first of the principal dichotomies alluded to in the introduction (cf. Table 1). The informational aspects of a model may be characterized by analysis of the m_t or the w_t. The noninformational features show up in the s_t. Since the dichotomy is not observable, some additional structure must be imposed on the problem in order to make substantive statements. It is often useful to estimate the w_t and the s_t at a point in time (as a function of various sets of conditioning information), to estimate the variances σ_w^2 and σ_s^2, and to ascertain the components of these variances. In a sense, most of this paper is devoted to consideration of the full generality of (2.2).

The motivation for and interpretation of w_t are essentially the same as in the random-walk model. The new feature that has been introduced is the stationary pricing error. The terminology stems from its role as a discrepancy between the implicit efficient price and the actual transaction price. If s_t > 0, then there is a sense in which the buyer lost (paid in excess of the efficient price) and the seller gained. Aggregating over the buyer and seller, s_t is a zero-sum game. If s_t were randomly distributed over trades and traders, then one would be tempted to argue its irrelevance by the law of large numbers. Equality of traders in real markets, however, is a poor assumption. Agents' characteristics (small trader, large trader or dealer) have a large effect on the sort of prices they give and take, and it is therefore likely that the pricing error will induce systematic distributional effects.
2.3. The simple bid-ask spread model

A useful special case of the preceding model arises from the following trading process. The implicit efficient price is common knowledge to all participants. A market-maker or dealer in the security posts a price at which he is willing to buy (the bid price) and a price at which he is willing to sell (the offer or ask price). These bid and ask quotes will be denoted q_t^b and q_t^a, and the difference between them is termed the spread, S_t = q_t^a - q_t^b. In economic terms, this spread can be viewed as a consequence of the dealer's need to recover fixed transaction costs and a normal profit (Tinic (1972)). Alternatively, the spread may arise endogenously from the choices of traders deciding between market (active) and limit (passive) orders, as in Cohen, Maier, Schwartz and Whitcomb (1981). These are noninformational spread models; other alternatives will be considered below.

Assume that the spread is constant at S, that the bid and ask quotes are set to bracket symmetrically the implicit efficient price (q_t^b = m_t - S/2 and q_t^a = m_t + S/2), and that at each time point an agent arrives at the dealer and either buys (at price q_t^a) or sells (at price q_t^b) a single unit of the security. The full model is now


    m_t = m_{t-1} + w_t
    p_t = m_t + c_t                                              (2.3)
    c_t = ±S/2

The vacillations of c_t are sometimes called "bid-ask bounce". The market mechanics imply that c_t in (2.3) is a stationary random process with the following properties: E[c_t] = 0; E[c_t^2] = σ_c^2; E[c_t c_τ] = 0 for t ≠ τ; and E[c_t w_τ] = 0 for all t, τ. The first three properties establish c_t as a zero-mean homoskedastic random variable with no serial correlation. The fourth property asserts that it is uncorrelated with the information process, i.e., that the increments in the implicit efficient price are not trade-related. By comparing this model with (2.2) it is apparent that c_t = s_t, the pricing error. The variance of the pricing error is a useful summary measure of how closely actual transaction prices track the implicit efficient price. In this model, σ_s^2 = σ_c^2 = S^2/4.

In this model s_t is clearly driven by the incoming trade (buy or sell). In modern microstructure data sets, these trades (or convenient proxies) are often observable, and it is possible to model them directly. Representative bivariate price and trade models will be discussed extensively below. Many older historical data sets, however, are limited to transaction prices. We therefore consider inference based only on these prices. We are in effect attempting to make inferences about the two unobserved components of the transaction price, m_t and s_t (= c_t). The price changes are:
    Δp_t = p_t - p_{t-1} = w_t + s_t - s_{t-1}                   (2.4)

with variance and first-order autocovariance given by γ_0 = E[Δp_t^2] = σ_w^2 + 2σ_s^2 and γ_1 = E[Δp_t Δp_{t-1}] = -σ_s^2. The autocovariances at higher orders are zero. From these first two autocovariances (or estimates thereof), we may solve for σ_s^2 and σ_w^2. Most importantly, the spread is given by
    S = 2σ_c = 2σ_s = 2√(-γ_1) .                                 (2.5)

The last expression is commonly known as Roll's (1984) estimate of the spread. This obviously requires γ_1 ≤ 0. Harris (1990) discusses the statistical properties of this estimator.

Another useful characterization of this model is the innovations or moving average form. A process that possesses zero autocovariances beyond the first lag may be characterized as a first-order moving average (MA(1)) process:
    Δp_t = ε_t + θε_{t-1}                                        (2.6)

where the ε_t are serially uncorrelated homoskedastic increments. By equating the price change autocovariances implied by (2.4) and (2.6), the correspondence between the two sets of parameters may be established. In the one direction, σ_w^2 = (1 + θ)^2 σ_ε^2 and σ_s^2 = -θσ_ε^2.

There is a useful intuition behind the expression for σ_w^2. The impulse response function of a time series model specifies how the variables react to particular initial


shocks. Suppose in the present case that the lagged innovations ε_{t-1}, ε_{t-2}, ... are zero. If the innovation at time t is nonzero, the expected current and subsequent price changes implied by equation (2.6) are E[Δp_t | ε_t] = ε_t, E[Δp_{t+1} | ε_t] = θε_t, and E[Δp_{t+k} | ε_t] = 0 for k > 1. The cumulative expected price change is therefore

    E[Δp_t + Δp_{t+1} + Δp_{t+2} + ··· | ε_t] = (1 + θ)ε_t       (2.7)

This is the long-run expected price impact of an innovation, i.e., the informational impact of the innovation. This implies w_t = (1 + θ)ε_t, from which the expression for σ_w^2 follows immediately. In the discussions that follow, impulse response functions are often used to characterize the dynamic properties of structural models.

While many economic hypotheses of interest can be addressed by considering the variances of the random-walk and pricing error components, it is often desirable to know w_t and s_t at a particular time. On the basis of the transaction prices these quantities are not identified in this model (even if we condition on prices subsequent to t), although filtered estimates are attainable.
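The moment calculations above can be checked with a short simulation sketch (Python; the spread and efficient-price volatility below are hypothetical values chosen for illustration). The spread is recovered from the first-order autocovariance via (2.5), and the random-walk variance from σ_w^2 = γ_0 + 2γ_1.

```python
import numpy as np

# Sketch of Roll's (1984) spread estimator under model (2.3); the
# parameter values below are hypothetical, chosen only for illustration.
rng = np.random.default_rng(1)
n, spread, sigma_w = 100_000, 0.25, 0.10

w = rng.normal(0.0, sigma_w, n)
m = np.cumsum(w)                                   # implicit efficient price m_t
c = np.where(rng.random(n) < 0.5, 1.0, -1.0) * spread / 2.0   # bid-ask bounce
p = m + c                                          # transaction price p_t
dp = np.diff(p)

gamma0 = dp.var()                                  # variance of price changes
gamma1 = np.mean((dp[1:] - dp.mean()) * (dp[:-1] - dp.mean()))  # first-order autocovariance

spread_hat = 2.0 * np.sqrt(-gamma1)                # equation (2.5)
sigma_w2_hat = gamma0 + 2.0 * gamma1               # random-walk variance under (2.4)
print(f"estimated spread {spread_hat:.3f} (true {spread})")
print(f"estimated sigma_w^2 {sigma_w2_hat:.5f} (true {sigma_w**2:.5f})")
```

This is a numerical check of the formulas only, not an estimation recipe for real transaction data.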
2.4. Lagged price adjustment

The simple bid-ask model predicts that the price change will exhibit a negative first-order autocovariance. This is in fact usually the case in transaction price data. The model may be generalized to permit price change dependencies at orders higher than one by introducing lagged price adjustment. Goldman and Beja (1979) suggest that security dealers do not instantaneously adjust their quotes to new information, but do so gradually. More generally, lagged adjustment can arise from lagged dissemination of information, price smoothing by market makers and discreteness. Other analyses that feature lagged adjustment are Amihud and Mendelson (1987), Beja and Goldman (1980), Damodaran (1992) and Hasbrouck and Ho (1987). A simple lagged-adjustment model is given by:
    m_t = m_{t-1} + w_t
    p_t = p_{t-1} + α(m_t - p_{t-1}),                            (2.8)

where α is an adjustment speed parameter. (The spread is suppressed here in order to focus on the lagged adjustment.) The price dynamics implied by this model may be illustrated with an impulse response function. Figure 1 depicts the price subsequent to a one-unit shock in the efficient price (w_0 = 1), assuming an adjustment parameter of α = 0.5. At each step, half of the remaining adjustment is made toward the efficient price. If 0 < α < 1, this adjustment is monotonic.

By substitution from (2.8), it is seen that price changes are generated as the first-order autoregressive process: Δp_t = (1 - α)Δp_{t-1} + αw_t. If the estimated model is Δp_t = φΔp_{t-1} + ε_t, the structural parameters may be computed as: σ_w^2 = σ_ε^2/(1 - φ)^2, α = 1 - φ. As in the simple bid-ask spread model, σ_w^2 has an impulse response interpretation.


Fig. 1. The Impulse Response Function for the Lagged Price Adjustment Model. The adjustment of the transaction price (p) subsequent to an initial shock of +1 in the efficient price. The model is the lagged price adjustment model given in equation (2.8), with parameter α = 0.5.

The random-walk innovation may be computed as w_t = (1 + φ + φ^2 + ···)ε_t = (1 - φ)^{-1}ε_t, which effectively sums each period's contribution to price subsequent to the initial disturbance. The pricing error is s_t = p_t - m_t, which implies s_t = (1 - α)s_{t-1} - (1 - α)w_t = φs_{t-1} - φw_t and σ_s^2 = φ^2σ_w^2/(1 - φ^2). Since there is one disturbance driving this model (w_t), both w_t and s_t can be recovered from the price record. This is a stronger result than obtained in the simple bid-ask spread model. From a time-series perspective, this is due to the fact that the stationary component in the present model is an exact linear function of past w's. In the simple bid-ask model, whether the trade took place at the bid or the ask (i.e., the value of s_t) is independent of w_t.
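The mapping from the reduced-form AR(1) coefficient back to the structural parameters of (2.8) can likewise be verified by simulation. The sketch below (Python) uses hypothetical values for α and σ_w.

```python
import numpy as np

# Sketch of the lagged price adjustment model (2.8) with hypothetical
# parameters; the AR(1) coefficient phi of the price changes recovers
# alpha = 1 - phi and sigma_w^2 = sigma_e^2 / (1 - phi)^2.
rng = np.random.default_rng(2)
n, alpha, sigma_w = 100_000, 0.5, 0.10

w = rng.normal(0.0, sigma_w, n)
m = np.cumsum(w)                       # efficient price
p = np.empty(n)
p[0] = m[0]
for t in range(1, n):                  # partial adjustment toward m_t
    p[t] = p[t - 1] + alpha * (m[t] - p[t - 1])

dp = np.diff(p)
y, x = dp[1:], dp[:-1]
phi = (x @ y) / (x @ x)                # OLS slope, no intercept
resid = y - phi * x
alpha_hat = 1.0 - phi
sigma_w2_hat = resid.var() / (1.0 - phi) ** 2
print(f"alpha_hat = {alpha_hat:.3f}, sigma_w^2_hat = {sigma_w2_hat:.5f}")
```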

3. Simple bivariate models of prices and trades


The univariate price models described above are capable of exhibiting dynamics that reflect microstructure phenomena and can also capture the first dichotomy mentioned in the introduction, that between permanent (informational) and transient (market) effects. The models described in this section encompass trades as well, with a view toward establishing the second important distinction, that between trade-related and -unrelated sources of price variation.
3.1. Inventory models

Buyers and sellers in the simple bid-ask spread model are assumed to arrive independently and with equal probability. Let x_t denote the signed trade quantity, positive if the arriving trader buys from the dealer and negative if the trader sells. The cumulative quantity from time zero through time t is Σ_{i=0}^{t} x_i. In the paper that introduced the term "microstructure", Garman (1976) pointed out that as t increased, this sum would diverge, implying that the dealer bought or sold (net)


an infinite amount. Real-world dealers face capital constraints, however, and would in any event avoid large positions due to risk-aversion. This motivates the need for some sort of inventory control or position management. The inventory control problem in classical microeconomics is one of specifying a restocking strategy subject to order and stock-out costs. The security market dealer, on the other hand, has traditionally been supposed to achieve inventory control by shifting the quotes to elicit an imbalance of buy and sell orders. Formal models of this effect include Amihud and Mendelson (1980), Ho and Stoll (1981), O'Hara and Oldfield (1986) and Stoll (1978). As an illustration, consider a generalization of the simple bid-ask spread model in which quote-setting depends on the dealer's inventory position and incoming order flow depends on the quotes:
    m_t = m_{t-1} + w_t
    q_t = m_t - bI_{t-1}
    I_t = I_{t-1} - x_t                                          (3.1)
    x_t = -a(q_t - m_t) + v_t
    p_t = q_t + cx_t

The first equation describes the random-walk evolution of the efficient price. The quotes are summarized by the quote midpoint (the average of the bid and ask quotes), q_t. This is equal to the efficient price plus an inventory control component, where I_t is the dealer's inventory at the close of period t. Without loss of generality, the dealer's target inventory is assumed to be zero. The quote-midpoint equation specifies that with b > 0, the dealer lowers his price if he has a long position. The net demand, x_t, is driven by a price-sensitive component (a > 0) and a random component. The usefulness of the quote position as an inventory-management tool is based on the demand price elasticity. Since the dealer is assumed to be the counterparty to all trades, the change in inventory is equal to the negative of the net demand. The transaction price is equal to the quote midpoint, plus a cost component cx_t. This cost is proportional to trade size: rather than quoting a bid and offer price, the dealer quotes a linear bid and offer schedule. A trader wanting to buy an amount |x_t| will be quoted an ask price of q_t^a = q_t + c|x_t|, and a trader wanting to sell will be quoted a bid price of q_t^b = q_t - c|x_t|. The trade innovation v_t is assumed to be serially uncorrelated, and uncorrelated at all leads and lags with w_t.

The essential features of this model can be illustrated by examining the impulse response function for a particular set of parameter values. Let a = 0.8, b = 0.04 and c = 0.5, and consider the paths of price and inventory subsequent to a trade shock at time zero of v_0 = 1, i.e., a purchase of one unit from the dealer. These paths are graphed in Figure 2. The buy is associated with an immediate price jump due to the cost component. Reversion is not immediate, however. Subsequent to the trade, the dealer has an inventory shortfall and must raise his quotes to elicit an incoming sell order.


Fig. 2. The Impulse Response Function for the Inventory Model. The adjustment of the transaction price (p) and the dealer's inventory (I) subsequent to an initial purchase of one unit. The model is the inventory control model given in equation (3.1) with parameters a = 0.8, b = 0.04 and c = 0.5.

As the sell orders arrive (in expectation), the dealer resets the quotes to the initial level. The inventory path reflects the initial depletion caused by the purchase (from the dealer) and the subsequent sales (to the dealer). At the end of the adjustment process, both price and inventory have completely reverted. There is no permanent price impact of a trade in this model because trades are independent of information. The permanent component of the price change is w_t, which is due entirely to public information. The pricing error is:
    s_t = p_t - m_t = cx_t - bI_{t-1}                            (3.2)

This is entirely trade-driven. As in the simple bid-ask model, the buyer pays the half-spread cx_t. The second term depends on the dealer's previous inventory position. If the dealer happened to have an inventory surplus, the buyer's cost would be reduced.

If both p_t and I_t are observable, the model may be written as: Δp_t = -cI_t + (2c - b)I_{t-1} + (b - c)I_{t-2} + w_t and I_t = (1 - ab)I_{t-1} - v_t. Formally, this is a bivariate vector autoregressive (VAR) model, with a contemporaneous recursive structure, which may be estimated directly by least squares. There is sufficient structure here to recover both w_t and s_t from current and past observations. Among the various sorts of microstructure data available, however, dealer inventory data are about the rarest. Implicit in these data are the dealer's trading strategies and trading profits, both of which are usually kept private. If I_t is not known, then inference must proceed solely from prices. On the basis of the univariate time-series representation of the price changes, the structural model is underidentified. Two important structural parameters are identified, however: the variances of the random-walk and pricing error components.

Due to the paucity of inventory data, there are few analyses of pure inventory control models. In a U.S.S.E.C. (1971) study, Smidt presents some results for NYSE stock specialists based on daily positions and price changes. Ho and Macris (1984) estimate a transaction-level model for an American Stock Exchange options specialist. Most recent studies allow for the possibility of asymmetric information in addition to inventory control, and these are discussed below.
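The expected price and inventory paths in Figure 2 follow directly from iterating (3.1) with all shocks other than v_0 set to zero. A minimal sketch (Python), using the same illustrative parameter values as the text:

```python
import numpy as np

# Sketch: expected impulse response of the inventory model (3.1) to a
# one-unit purchase (v_0 = 1), holding all other shocks at zero.
# Parameters follow the text's illustration: a = 0.8, b = 0.04, c = 0.5.
a, b, c = 0.8, 0.04, 0.5
horizon = 21

price = np.zeros(horizon)
inventory = np.zeros(horizon)
I_prev = 0.0
for t in range(horizon):
    v = 1.0 if t == 0 else 0.0        # demand innovation
    q = -b * I_prev                   # quote midpoint (efficient price held at 0)
    x = -a * q + v                    # net incoming demand
    I_prev = I_prev - x               # dealer inventory
    price[t] = q + c * x              # transaction price
    inventory[t] = I_prev

print(np.round(price[:6], 4))         # immediate jump of c, then gradual quote reversion
print(np.round(inventory[:6], 4))     # inventory shortfall decays at rate (1 - a*b)
```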

3.2. Asymmetric information


The models considered to this point have assumed that all market participants possess the same information. This sort of public information may be thought of as instantaneous news releases, in response to which bid and offer quotes would adjust with no necessity of trading. The most important recent developments in theoretical microstructure, however, have been models that allow for heterogeneously informed traders. If a trade might be motivated by superior information, the occurrence of a trade (a public event in most models) will communicate to the market something about this private information. Some studies that initially addressed this phenomenon in microstructure settings are Bagehot (1971), Copeland and Galai (1983), Glosten and Milgrom (1985), Kyle (1985) and Easley and O'Hara (1987). O'Hara (1994, Ch. 3) provides an overview. A simple model of private information with fixed transaction costs can be given as:
    m_t = m_{t-1} + w_t
    w_t = u_t + gx_t                                             (3.3)
    q_t = m_{t-1} + u_t
    p_t = q_t + cx_t

Relative to the earlier models, the novelty here is in the random-walk innovation, w_t. It is now composed of two components. The first, u_t, is assumed to reflect updates to the public information set. The second, gx_t, with g > 0, reflects the market's estimate of the information contained in the trade. For this component to be serially uncorrelated, it must be the case that x_t is serially uncorrelated, i.e., we are back to assuming that buy and sell orders arrive randomly. This model is a variant of one suggested by Glosten (1987).

Actual transaction prices are subject to a bid-ask spread related to the direction of the trade. There are two ways of interpreting the cx_t term in the price specification. First, if the magnitude of the trade is fixed, say x_t ∈ {-1, +1}, then c is one-half the bid-ask spread (S/2), with transactions occurring at the bid and offer prices (q_t^b = q_t - S/2 and q_t^a = q_t + S/2). Alternatively, if trade size is continuous, then c gives the slope of the dealer's linear bid and offer schedule.

The dynamic behavior of prices and trades may be illustrated by the impulse response function based on parameter values c = 0.5 and g = 0.2, subsequent to an initial buy order of one unit (x_0 = 1). These are graphed in Figure 3. The initial price jump simply reflects the bid-ask bounce, but in contrast with the inventory control model, the reversion is not total. Of the initial 0.5 price jump, 0.2 is the inferred information content, which remains permanently impounded in the stock price. By assumption there are no serial dependencies in trades: the initial purchase engenders no subsequent order flow effects.


Fig. 3. The Impulse Response Function for the Asymmetric Information Model. The adjustment of the transaction price (p) and the incoming trade (x) subsequent to an initial purchase of one unit. The model is the asymmetric information model given in Equation (3.3) with parameters c = 0.5 and g = 0.2.

The evolution in the efficient price now reflects both public and private information components, so

    σ_w^2 = σ_u^2 + g^2σ_x^2,                                    (3.4)

which isolates the non-trade and trade-related components of the efficient price change. A useful summary measure of the relative importance of trades in explaining movements in the efficient price is the proportion
    R^2_{w,x} = g^2σ_x^2 / σ_w^2                                 (3.5)

The R^2 notation denotes the usual "proportion of total variance explained." This measure generalizes beyond the present model, and is a useful proxy for the extent of asymmetric information.

The private information effects in this model reflect the market's beliefs about the probabilistic structure of the private information, not the actual level of private information. That is, the price impact of a particular trade depends only on the market's general beliefs about the extent and nature of private information, and not directly on the actual information possessed by the trader. A model of this sort cannot be used to identify, for example, illegal insider trades in a sample of data. The pricing error is
    s_t = p_t - m_t = (c - g)x_t                                 (3.6)

The pricing error is entirely trade driven. Relative to the simple bid-ask model with no private information, however, s_t is reduced by the information content of the trade, gx_t. It is generally assumed that c > g because the dealer is setting the half-spread to recover both information costs g and additional order processing costs. The return series is given by:
    Δp_t = p_t - p_{t-1} = u_t + cx_t - (c - g)x_{t-1}           (3.7)

If trades and prices are observable, this may be estimated directly. Early transaction-based estimations of trade impacts on price are Marsh and Rock (1986), Glosten and Harris (1988), and Hasbrouck (1988).
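As a sketch of such an estimation, the following (Python, hypothetical parameter values) simulates the asymmetric information model (3.3) and recovers c and g by regressing Δp_t on x_t and x_{t-1} as in (3.7); it also computes the trade-related proportion R^2_{w,x} of (3.5). It is illustrative only, not a reproduction of the cited studies.

```python
import numpy as np

# Sketch: OLS estimation of the trade-impact regression (3.7) under the
# asymmetric information model (3.3). Parameter values are hypothetical.
rng = np.random.default_rng(3)
n, c, g, sigma_u = 100_000, 0.5, 0.2, 0.05

x = np.where(rng.random(n) < 0.5, 1.0, -1.0)      # signed trades, +1 buy / -1 sell
u = rng.normal(0.0, sigma_u, n)
w = u + g * x                                     # efficient-price innovation
m = np.cumsum(w)
p = np.concatenate(([0.0], m))[:-1] + u + c * x   # p_t = m_{t-1} + u_t + c x_t
dp = np.diff(p)

X = np.column_stack([x[1:], x[:-1]])              # regressors x_t, x_{t-1}
beta, *_ = np.linalg.lstsq(X, dp, rcond=None)
c_hat = beta[0]
g_hat = beta[0] + beta[1]                         # coefficient on x_{t-1} is -(c - g)

sigma_w2 = w.var()
r2_wx = (g_hat ** 2) * x.var() / sigma_w2         # trade-related share, equation (3.5)
print(f"c_hat = {c_hat:.3f}, g_hat = {g_hat:.3f}, R2_wx = {r2_wx:.3f}")
```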


When trades are not observed, however, the inference must proceed solely on the basis of transaction prices. This model superficially resembles the simple bid-ask model considered in section 2.3. Like the earlier model, it possesses an MA(1) representation of the form (2.6). Here, however, the two parameters of the MA model {σ_ε^2, θ} are insufficient to identify the four parameters of the structural model {c, g, σ_u^2, σ_x^2}. The random-walk variance is identified as before: σ_w^2 = (1 + θ)^2σ_ε^2 = σ_u^2 + g^2σ_x^2. In contrast with the earlier model, however, we cannot assume that the pricing error is uncorrelated with the increment to the efficient price.

The connection to the simple model may be illustrated by considering the estimate of the spread given in equation (2.5). Suppose that x_t ∈ {-1, +1}, σ_x^2 = 1 (from the assumption of equiprobable buy and sell orders) and that c is the half-spread S/2. From (3.6) the pricing error variance is σ_s^2 = (c - g)^2. The estimate of the spread implied by the simple bid-ask model will generally be biased downward. In the present model, the first-order autocovariance is γ_1 = -c(c - g)σ_x^2 = -c(c - g). For example, if c = g, i.e., if the spread is entirely information-based, then the transaction price changes will exhibit no autocorrelation, and the simple estimate of the spread will be zero.

From a statistical viewpoint, the pricing error in the simple model is uncorrelated with w_t (the increment in the efficient price). In the present model, since s_t = (c - g)x_t and w_t = u_t + gx_t, the two are correlated due to the shared influence of trades. This correlation will not be perfect, except in the special case where σ_u^2 = 0, i.e., where there is no nontrade public information. Although this case is not attractive from an economic viewpoint, the value of σ_s^2 implied by this restriction possesses the useful property that it establishes a lower bound for σ_s^2 (over all correlations between w_t and s_t, holding constant the parameters of the observed return model {σ_ε^2, θ}). In terms of the moving average representation (2.6), the assumption of perfect correlation implies that both s_t and w_t are proportional to ε_t. Equating w_t to the cumulative effect of a disturbance (cf. the discussion following equation (2.7)) gives w_t = (1 + θ)ε_t. From (2.2), Δp_t = ε_t + θε_{t-1} = (1 + θ)ε_t + s_t - s_{t-1}, which implies by inspection that s_t = -θε_t, and σ_s,lower bound^2 = θ^2σ_ε^2. Since -1 < θ < 0, this is obviously less than or equal to the estimate of σ_s^2 implied by the simple model, -θσ_ε^2. This lower bound is generalized in section 4.

In summary, based on knowing the parameters of the return process for this model (autocovariances or, equivalently, the moving average parameters), we can compute the random-walk (implicit efficient price) variance. Neither the pricing error variance nor derived measures such as the spread, however, are identified in the absence of further restrictions. Unfortunately, neither of the two identification restrictions considered above is particularly attractive, as they involve a choice between suppressing all public information or alternatively all private information.
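A small numerical sketch of the bounds discussed above (Python; the autocovariance values are hypothetical, not estimates from the text): given γ_0 and γ_1 for Δp_t, the invertible MA(1) parameters follow from moment matching, and the random-walk variance, the simple-model pricing-error variance -θσ_ε^2, and the lower bound θ^2σ_ε^2 follow directly.

```python
import numpy as np

# Sketch: pricing-error variance bounds implied by an MA(1) fit to price
# changes. gamma0 and gamma1 are hypothetical sample autocovariances.
gamma0, gamma1 = 0.0412, -0.0150

rho = gamma1 / gamma0                   # requires -0.5 <= rho < 0 for a real, negative root
theta = (1.0 - np.sqrt(1.0 - 4.0 * rho ** 2)) / (2.0 * rho)   # invertible MA(1) root
sigma_e2 = gamma1 / theta               # MA innovation variance

sigma_w2 = (1.0 + theta) ** 2 * sigma_e2        # random-walk variance (identified)
sigma_s2_simple = -theta * sigma_e2             # simple bid-ask model: s_t uncorrelated with w_t
sigma_s2_lower = theta ** 2 * sigma_e2          # lower bound: s_t perfectly correlated with w_t
print(f"theta = {theta:.3f}, sigma_w^2 = {sigma_w2:.5f}")
print(f"sigma_s^2: simple model {sigma_s2_simple:.5f}, lower bound {sigma_s2_lower:.5f}")
```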


3.3. Models with both asymmetric information and inventory control

The following model combines inventory control and asymmetric information in an additive fashion:
    m_t = m_{t-1} + w_t
    w_t = u_t + gv_t
    q_t = m_{t-1} + u_t - bI_{t-1}                               (3.8)
    I_t = I_{t-1} - x_t
    x_t = -a(q_t - (m_{t-1} + u_t)) + v_t
    p_t = q_t + cx_t

The m_t and w_t expressions are the same as in the asymmetric information model of the last section. The quote-midpoint expression includes an inventory control component. When information is entering the model from two sources, one must pay particular attention to the timing. At time t, public information (u_t) arrives, quotes are set (q_t), net demand is realized (x_t), which leads to a transaction at price p_t. Finally, the new efficient price m_t is set to reflect the information contained in the trade. The increment to the efficient price is driven by the trade innovation v_t and not simply the total trade. (Any new information imputed to the trade should come from the trade innovation.) The quote midpoint is set to reflect the current public information (u_t) and the inventory imbalance, but not the private information inferred from the time-t trade (which is not known at the time the quote is set). The incoming net demand reflects the difference between the current quote and the efficient price inclusive of public information.

The essential features of this model are illustrated by the impulse response function. The same parameter values are used as for the pure inventory control case in Figure 2, with g = 0.2. Figure 4 depicts the time path subsequent to a one-unit innovation in the demand (v_0 = 1, a one-unit purchase from the dealer). The essential difference between this and Figure 2 is that the price reversion is incomplete. There is a permanent price effect of the buy order innovation, equal to gv_0 = 0.2(1). The pricing error is
    s_t = p_t - m_t = cx_t - gv_t - bI_{t-1}                     (3.9)

The cx_t - gv_t term is analogous to the (c - g)x_t expression for the pricing error in the pure asymmetric information model (3.6). Note, however, that the half-spread c is paid on the full trade, while the information update is driven solely by the trade innovation. The role of the -bI_{t-1} term is the same as in the inventory control model (cf. equation (3.2)). Both terms are trade-driven. The joint specification for returns and inventory levels may be written as a bivariate VAR in which all structural parameters are identified. If only transaction prices are available, only the random-walk variance (not the pricing error variance) may be identified from the reduced form.


Fig. 4. The Impulse Response Function for the Inventory Control/Asymmetric Information Model. The adjustment of the transaction price (p) and inventory (I) subsequent to an initial purchase of one unit. The model is the inventory control/asymmetric information model given in Equation (3.8) with parameters a = 0.8, b = 0.04, c = 0.5, and g = 0.2.

By comparing the price impulse responses for the inventory control model (Figure 2), the asymmetric information model (Figure 3) and the combined model (Figure 4), it is apparent that the short-run price effects implied by the inventory and asymmetric information effects are very similar. In the pure inventory control model, the price rises in response to a buy because the dealer now has an inventory deficit and must attract more selling interest. In the asymmetric information model, the price rise reflects the new information revealed by the trade. The similarity of the short-run price responses engendered by the inventory and information effects makes resolution of the two very difficult.

Since the inventory control paradigm arose first, it was natural for early studies detecting a positive impact of trades on prices to affirm the existence of inventory effects. Empirical tests of (more recent) asymmetric information models tended to attribute the initial price rise to the information content of a trade. In practice, the two mechanisms can be resolved only by a dynamic analysis of both short and long-run effects.

Studies of dealer (specialist) trading in equities on the NYSE suggest that inventory control is indeed practiced. However, the mechanism is considerably more complicated than that allowed for by the simple models considered here. The hypothetical impulse response functions discussed here depict a rapid inventory adjustment process, spanning a dozen trades at most. Trades are hypothetically negatively autocorrelated: a purchase should (in expectation) be followed in short order by sales. In actuality, however, trades exhibit strong positive autocorrelation in the short run (Hasbrouck and Ho (1987) and Hasbrouck (1988)). Furthermore, NYSE specialist positions appear to possess large long-run components (on the order of weeks or months). The ability of the available data samples to support reliable identification of transient inventory-control quote effects at these horizons is poor. See Hasbrouck and Sofianos (1993) and Madhavan and Smidt (1991 and 1993).

As noted above, this simple model combines inventory and asymmetric information effects in an additive fashion. The demand of an informed trader (and the market's estimate of the information content of a trade), however, will in principle depend on the prevailing bid and offer quotes, which are also determined by the dealer's inventory position. The Madhavan and Smidt models illuminate these interactions.
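The expected paths in Figure 4 can be reproduced by iterating (3.8) after a unit demand innovation. The sketch below (Python) uses the text's illustrative parameter values; it is a minimal check of the model's timing, not an estimation procedure.

```python
import numpy as np

# Sketch: expected impulse response of the combined inventory control /
# asymmetric information model (3.8) to a unit demand innovation v_0 = 1.
# Parameters follow the text's illustration: a = 0.8, b = 0.04, c = 0.5, g = 0.2.
a, b, c, g = 0.8, 0.04, 0.5, 0.2
horizon = 21

price = np.zeros(horizon)
m_prev, I_prev = 0.0, 0.0
for t in range(horizon):
    v = 1.0 if t == 0 else 0.0            # trade innovation (public news u_t set to zero)
    q = m_prev - b * I_prev               # quote midpoint with inventory adjustment
    x = -a * (q - m_prev) + v             # incoming net demand
    I_prev = I_prev - x                   # dealer inventory
    price[t] = q + c * x                  # transaction price
    m_prev = m_prev + g * v               # efficient price updated by the trade innovation

print(np.round(price[:6], 4))             # price reverts toward the permanent impact g = 0.2
```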

3.4. Prices, inventories and trades


The preceding analyses suggest that in the presence of asymmetric information or some combination of asymmetric information and inventory control, the results available from reduced-form price-change specifications are meager: σ_w^2 is identified, but σ_s^2 is not. It was also noted, however, that data sets that include dealer inventory data are rare. (There are presently none to my knowledge that exist in the public domain.) It is often possible, however, to obtain good proxies for the trade series, x_t. A common practice when trade prices and volumes are reported and bid and ask quotes are available is to construct the proxy
    x_t = +(volume)_t,   if p_t > q_t;
    x_t = 0,             if p_t = q_t;                           (3.10)
    x_t = -(volume)_t,   if p_t < q_t;

where q_t is the quote midpoint prevailing at the time the trade occurred. In the pure asymmetric information model of section 3.2, this proxy is sufficient. When inventory control is present, however, matters become more complicated. By construction in the models discussed to this point, the dealer inventory is related to the trade by I_t = I_{t-1} - x_t. Because trades convey information only about the inventory changes, but not about the levels, they are generally inadequate proxies.

From a statistical viewpoint, the problem is one of overdifferencing. When a variable such as a security price contains a random walk component, it is common to specify a stationary model in terms of the first difference (the price change, as we have done here). If one takes the first difference of a variable that is already stationary, however, the first difference will still be stationary, but it will not possess a convergent autoregressive representation. The overdifferenced variable is said to be noninvertible. The general role of the invertibility assumption in microstructure models will be discussed in section 4.1. But the consequences for the specification of inventory control models can be illustrated with the simple models considered here.

In the pure inventory control model of section 3.1, the specification given in equations (3.1) may be reworked to give a univariate representation for the inventory level: I_t = (1 - ab)I_{t-1} - v_t, a simple first-order autoregression that is easily estimated. The trade series obtained by taking the negative of the first difference of the inventory is x_t = -(I_t - I_{t-1}) = (1 - ab)x_{t-1} + v_t - v_{t-1}, a mixed autoregressive-moving average (ARMA) form. No recursive substitution will yield an autoregressive representation for x_t with declining coefficients. The dilemma is not solved by adding the price change: there does not exist a convergent vector autoregressive representation for {Δp_t, x_t}. Nor is it generally convenient to estimate the ARMA specification given for x_t directly, since most techniques assume invertibility. (Exceptions are those based on exact maximum-likelihood Kalman filter methods. See Hamilton (1994).)

Despite this cautionary note, there are many situations in which models based on trades will in fact be invertible. The noninvertibility of the trade specifications

The noninvertibility of the trade specifications arises from the fact that the trade series is the (negative) first difference of the (presumably stationary) inventory series. In some data sets this is indeed the case: transactions are identified as to sign (buy or sell) and counterparty (e.g., the London Stock Exchange data used by Neuberger (1992) or the computerized trade reconstruction (CTR) data used by Manaster and Mann (1992)). The trade series composed of all the buys and sells to and from a particular dealer is, by construction, the first difference of the dealer inventory, and it is implausible to assume invertibility. In many markets, however, the dealer is not invariably the counterparty to the outside order. On the NYSE, for example, the dealer (specialist) participates in a relatively small portion of the trades. Often the bid and ask quotes represent nonspecialist orders. There is a strong presumption of mean reversion in dealer inventories. But the other traders effectively placing bid and ask quotes represent a large, diverse and changing population of agents. There is little reason to suspect that the aggregate trades of this group integrate up to a stationary series, and therefore little concern that trades will constitute an overdifferenced and noninvertible time series.

As an example, consider the following ad hoc model, designed to capture many of the essential features of the inventory and asymmetric information models but specified without direct reference to inventories:
$$\begin{aligned}
m_t &= m_{t-1} + w_t \\
w_t &= u_t + g v_t \\
q_t &= m_{t-1} + u_t + d\,(q_{t-1} - (m_{t-2} + u_{t-1})) + b x_t \\
x_t &= -a\,(q_t - (m_{t-1} + u_t)) + v_t \\
p_t &= q_t + c x_t
\end{aligned} \qquad (3.11)$$

The essential difference between this and (3.8) is in the quote midpoint equation. The inventory dependence has been replaced by an explicit mean-reversion component that mimics the behavior associated with inventory control. This model was originally suggested by Lawrence Glosten, and is discussed in Hasbrouck (1991). That the model exhibits characteristics of both inventory control and asymmetric information models can be seen from the impulse response functions (Figure 5) subsequent to a one-unit purchase innovation. The cumulative trade series is plotted as an analog to the (negative) inventory level. The parameter values are a = 0.8, b = 0.4, c = 0.5, g = 0.2 and d = 0.5. Like the basic inventory control model, there is a decaying reversion in the transaction price. Like the asymmetric information model, the reversion is not complete.
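A hedged sketch of how this behavior can be traced out by simulation follows; the function name is mine and the within-period timing is an assumption (the q_t and x_t equations in (3.11) are simultaneous, and the sketch resolves them jointly), so it illustrates the qualitative pattern described in the text rather than reproducing Figure 5 exactly. Because the system is linear, the zero-noise path after a single unit buy innovation equals the impulse response.

```python
import numpy as np

def simulate_model_311(a=0.8, b=0.4, c=0.5, g=0.2, d=0.5, horizon=20):
    """Zero-noise path of model (3.11) after a one-unit buy innovation at t = 1."""
    m = np.zeros(horizon + 1)   # efficient price
    q = np.zeros(horizon + 1)   # quote midpoint
    p = np.zeros(horizon + 1)   # transaction price
    x = np.zeros(horizon + 1)   # signed trade
    v = np.zeros(horizon + 1)
    v[1] = 1.0                  # the purchase innovation (u_t = 0 throughout)

    for t in range(1, horizon + 1):
        z_lag = q[t - 1] - m[t - 2] if t >= 2 else 0.0   # lagged deviation q_{t-1} - (m_{t-2} + u_{t-1})
        z = (d * z_lag + b * v[t]) / (1.0 + a * b)       # q_t - (m_{t-1} + u_t), solving the simultaneity
        x[t] = -a * z + v[t]
        q[t] = m[t - 1] + z
        m[t] = m[t - 1] + g * v[t]                       # w_t = u_t + g v_t with u_t = 0
        p[t] = q[t] + c * x[t]

    return p[1:], np.cumsum(x[1:])

prices, cum_trades = simulate_model_311()
print(np.round(prices, 3))      # decaying, incomplete reversion in the transaction price
print(np.round(cum_trades, 3))  # cumulative trades, the analog of (minus) the dealer inventory
```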

3.5. Summary remarks on the simple models

This section and the one preceding have illustrated the basic economic paradigms that underlie modern microstructure. The results may be summarized as follows. The bid-ask spread reflects fixed-cost and asymmetric information factors. The cost effect introduces a short-run transient "bounce" in price movements, while the asymmetric information effect is associated with a relatively rapid and permanent impact of a trade on the security price. Neither effect should necessarily induce any particular behavior in subsequent trades. Lagged price adjustment and inventory control create transients of longer duration. The price transients caused by the former, however, tend to smooth informational responses, while those induced by inventory control induce price reversals. Inventory control should furthermore be associated with endogenous effects on the incoming trades.

Fig. 5. The impulse response function for the asymmetric information/trade model. The adjustment of the transaction price (p) and cumulative trades (Σ x) subsequent to an initial purchase of one unit. The model is the asymmetric information/trade model given in equation (3.11) with parameters a = 0.8, b = 0.4, c = 0.5, g = 0.2 and d = 0.5.
4. General specifications

The last section introduced basic microstructure concepts using simple structural models. These models are useful for calibrating the economist's intuition, but they are generally not good candidates for direct estimation. Key variables (such as the dealer's inventory) are often unobserved; the mechanisms are often more complicated than the stylized models suggest; the effects are often operating in concert; and finally, they are complicated by a host of other (primarily institutional) considerations discussed below. While it is always preferable to base a statistical model on a well-specified theoretical model, these considerations impose limitations on what can be achieved. The models discussed in this section are, in contrast, nonrestrictive statistical models of microstructure data. The perspective here is one of forgoing precise estimates of structural parameters in hopes of achieving a characterization of microstructure effects that is both broad and robust. Most importantly, it is still possible under minimal assumptions to characterize the permanent/transient and trade-related/trade-unrelated dichotomies set forth in the introduction.
4.1. Vector autoregressions (VARs)

A vector autoregression is a linear regression specification in which current values of all variables are regressed against lagged values of all variables. The inventory and asymmetric information models discussed in the last section, for example, can be specified as bivariate vector autoregressions. More general and flexible models can be obtained by extending the number of lags in estimation. VARs are relatively easy to estimate (least squares usually suffices) and interpret (via the impulse response functions or other transformations considered below). Their value in microstructure studies also rests, however, on their ability to characterize very general time series models. It is useful at this point to outline the assumptions underlying this generality, and also the ways in which they might be violated in microstructure applications.

The broad applicability of VARs ultimately rests on the Wold theorem. A zero-mean vector time series y_t is said to be weakly stationary (covariance stationary) if the autocovariances do not depend on t: E[y_t y_{t-j}'] = Γ_j. The Wold theorem states that a zero-mean weakly stationary nondeterministic process can be written as a convergent vector moving average (VMA) process (possibly of infinite order):
$$y_t = e_t + B_1 e_{t-1} + B_2 e_{t-2} + \cdots = B(L) e_t, \qquad (4.1)$$

where the e_t are serially uncorrelated homoskedastic increments with covariance matrix Ω and L is the backshift operator, L(·)_t = (·)_{t-1} (Hamilton (1994) and Sargent (1987)). This is nothing more than the innovations representation of the process. This section assumes that the conditions of the Wold theorem are satisfied. The stationarity assumption will be examined in greater detail in Section 5. Suppose that we are working with price changes and trades (as in the model of section 3.4), so that the state vector is
$$y_t = \begin{bmatrix} \Delta p_t \\ x_t \end{bmatrix}, \qquad e_t = \begin{bmatrix} u_t \\ v_t \end{bmatrix}, \qquad \mathrm{Var}(e_t) = \Omega = \begin{bmatrix} \sigma_u^2 & 0 \\ 0 & \sigma_v^2 \end{bmatrix} \qquad (4.2)$$

The orthogonality of the residuals is based on the economic assumption that contemporaneous causality flows from trade to the transaction price. This characterized all of the simple structural models discussed in the last section. It is easy to contemplate market structures in which this assumption might be violated, but in many settings it is a reasonable approximation. If all of the roots of the polynomial equation det(B(z)) = 0 lie outside of the unit circle, then the VMA representation is said to be invertible, that is, it may be reworked to give a (possibly infinite) convergent VAR representation:
$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + \cdots + e_t = A(L) y_t + e_t. \qquad (4.3)$$

In microstructure applications, the invertibility assumption is commonly violated by overdifferencing or cointegration. As noted in section 3.4, overdifferencing is a real possibility when the model involves inventories, but the data contain only trades (the first difference of the inventory). Cointegration arises when the state vector includes two or more price variables for the same security (like the bid and ask quotes, or the transaction price and either quote), and is discussed further in section 8. All of the simple models discussed in the preceding sections may be represented in the form (4.3).


A minor inconvenience arises because all of the bivariate VAR models in the last section include a contemporaneous term on the right-hand side: y_t = A_0^* y_t + A_1^* y_{t-1} + A_2^* y_{t-2} + ... + e_t^*. It is easy to rework this into the form (4.3) by noting y_t = (I − A_0^*)^{-1} A_1^* y_{t-1} + (I − A_0^*)^{-1} A_2^* y_{t-2} + ... + (I − A_0^*)^{-1} e_t^*. Estimating the model in the form that includes the contemporaneous term is a convenient way of forcing orthogonality on the estimated residuals. Most econometric texts, however, employ the form (4.3), and this will be used here as well. There are several ways of computing the VMA (4.1) from the VAR. Conceptually, the simplest procedure involves simulating the behavior of the system subsequent to one-unit initial shocks (Hamilton (1994)).
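The simulation approach is straightforward to code. The following is a minimal sketch (the function name and the coefficient values are illustrative assumptions, not estimates from the chapter): given VAR coefficient matrices in the form (4.3), it propagates a one-unit shock forward with all later innovations set to zero, which recovers the corresponding column of the VMA coefficients, i.e., the impulse response function.

```python
import numpy as np

def impulse_responses(A, shock, horizon=10):
    """Impulse responses implied by a VAR y_t = sum_j A_j y_{t-j} + e_t.

    A     : list of (k x k) VAR coefficient matrices A_1, A_2, ...
    shock : length-k vector e_0 (e.g., a one-unit trade innovation).
    """
    k, p = len(shock), len(A)
    y = np.zeros((horizon + 1, k))
    y[0] = shock                              # period-0 response is the shock itself
    for t in range(1, horizon + 1):
        for j in range(1, min(t, p) + 1):
            y[t] += A[j - 1] @ y[t - j]       # no further shocks after period 0
    return y

# Illustrative bivariate example (price changes and trades); coefficients are made up.
A1 = np.array([[0.0, 0.3],
               [0.0, 0.4]])
print(impulse_responses([A1], shock=np.array([0.0, 1.0])).round(3))
```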

4.2. Random-walk decompositions


In the simple models the distinction between permanent and transitory price changes was expressed by equation (2.2). In the earlier sections, the specification of s_t was implicitly given by the structural form of the model. In this section, we take a more frankly statistical perspective, defining m_t and s_t in terms of their time series properties. Formally, the model is equation (2.2), but with the additional statistical assumptions that:

1. m_t follows a homoskedastic random walk: E w_t = 0, E w_t² = σ_w², and E w_t w_τ = 0 for t ≠ τ; and
2. s_t is a covariance stationary stochastic process.

It is worth emphasizing that the pricing error is not assumed to be serially uncorrelated or uncorrelated with w_t. To establish the connection between the random walk decomposition (2.2) and the VAR described in (4.3), we will be working with the component of the VMA representation that corresponds to the price changes:

$$\Delta p_t = b(L)\, e_t \qquad (4.4)$$

where b(L) is the first row of the B(L) matrix in (4.1). We assume that the pricing error can be written as a linear combination of current and lagged e_t plus (to allow for other sources of variation) current and lagged η_t, where η_t is a scalar disturbance uncorrelated with e_t:

$$s_t = c(L)\, e_t + d(L)\, \eta_t \qquad (4.5)$$

In terms of the random-walk decomposition model, the price changes can be written as:

$$\Delta p_t = (1 - L)\, m_t + (1 - L)\, s_t = w_t + (1 - L)\, s_t \qquad (4.6)$$

The autocovariance generating function for a vector process y_t is

$$h_y(z) = \cdots + \Gamma_{-2} z^{-2} + \Gamma_{-1} z^{-1} + \Gamma_0 + \Gamma_1 z + \Gamma_2 z^2 + \cdots \qquad (4.7)$$


where z is a complex scalar (Hamilton (1994), p. 266). For a VMA process such as (4.1), h_y(z) = B(z) Ω B(z^{-1})'. Equations (4.4) and (4.6) lead to two alternative representations for the autocovariance generating function of Δp_t:

$$h_{\Delta p}(z) = b(z)\, \Omega\, b(z^{-1})' = \sigma_w^2 + (1 - z)\, h_s(z)\, (1 - z^{-1}) \qquad (4.8)$$

where h_{Δp}(z) and h_s(z) are the autocovariance generating functions for Δp and s. By setting z = 1, we obtain:

$$\sigma_w^2 = b(1)\, \Omega\, b(1)' \qquad (4.9)$$

This expression for the random-walk variance depends only on the parameters of the observed model, and hence is always identified. For example, the bid-ask model (with or without asymmetric information) can be represented as a first-order moving average model given by equation (2.6). In this case, b(L) = 1 + θL and Ω = σ², which implies σ_w² = (1 + θ)² σ². Returning to the bivariate case with price changes and trades, let b(L) be partitioned as b(L) = [b_{Δp}(L)  b_x(L)]. Given the diagonal structure of Ω, the random-walk variance can be decomposed as:
$$\sigma_w^2 = [\,b_{\Delta p}(1)\,]^2 \sigma_u^2 + [\,b_x(1)\,]^2 \sigma_v^2 \qquad (4.10)$$

The two variance terms correspond to the non-trade and trade-related contributions to the efficient price variance. The R² measure introduced in (3.5) as a summary of the extent of asymmetric information can be generalized as:

$$R_{w,x}^2 = [\,b_x(1)\,]^2 \sigma_v^2 / \sigma_w^2 \qquad (4.11)$$

Turning to the pricing error, we find that most results require further structure. If it is assumed that the pricing error is driven entirely by e_t, then we may eliminate the d(L)η_t term in (4.5). This yields b(L)e_t = w_t + (1 − L)c(L)e_t, which implies w_t = [b(L) − (1 − L)c(L)]e_t. A solution for this is w_t = b(1)e_t, which is obviously consistent with the random-walk variance described above. By solving b(L) = b(1) + (1 − L)c(L), the coefficients of the c(L) polynomial are found to be c_j = −Σ_{i=j+1}^∞ b_i. Once the c(L) coefficients are obtained, we may compute the value for s_t at a point in time, the unconditional variance of the pricing error, and also the trade- and nontrade-related components of this error. Given the diagonality of the innovation covariance matrix, these may be partitioned into trade-related and -unrelated components following the same procedure used in the analysis of σ_w² above. The restriction that d(L)η_t = 0 was originally suggested in macro applications by Beveridge and Nelson (1981).

If the pricing error is assumed to be orthogonal to the random-walk increment, then the c(L)e_t term in (4.5) vanishes. In this case, the coefficients of the d(L) polynomial must be found by factoring the autocovariance generating function. The autocovariance generating function for s_t is h_s(z) = d(z) σ_η² d(z^{-1}), with d_0 normalized to unity. This may be substituted into (4.8) and the d(L) coefficients found by factorization. This identification restriction is due to Watson (1986).


Watson also establishes some filtering results that are very useful in microstructure applications. We are assumed to possess a VMA for the observed processes (equation (4.1)) and wish to establish a correspondence to an unobserved components model (equation (2.2) with pricing error given by (4.5)). Watson shows that the best one-sided linear estimate (i.e., linear function of current and past observables) of the stationary component (pricing error) is the one associated with the Beveridge-Nelson identification restriction. (Since η_t in (4.5) is orthogonal to the e_t, the best one-sided projection involves only the e_t.) This one-sided projection, denoted ŝ_t, is:
$$\hat{s}_t = E^*[\,s_t \mid e_t, e_{t-1}, \ldots\,] = c(L)\, e_t \qquad (4.12)$$

where the c(L) coefficients are given above. Hasbrouck (1993) notes that the variance of the error in the one-sided projection is E(s_t − ŝ_t)² = E s_t² − E ŝ_t² ≥ 0, where the equality follows from the fact that the projection errors are uncorrelated with the projection: E[(s_t − ŝ_t) ŝ_t] = 0. This implies E s_t² ≥ E ŝ_t²: the variance of the one-sided (Beveridge-Nelson) projection establishes a lower bound on the variance of the pricing error. A related result is discussed in Eckbo and Liu (1993).

The tightness of the lower bound for the pricing error variance depends on the nature of the unobserved components model and also on the available data. In the asymmetric information model of section 3.2, the lower bound is exact (coincides with the true pricing error variance) if the model is estimated using both prices and trades. The actual variance exceeds the computed lower bound, however, if the model is estimated solely on the basis of prices. Hasbrouck (1993) discusses implementation considerations.
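The mapping from estimated VMA coefficients to the quantities in (4.9)-(4.12) is mechanical. The following is a minimal sketch under the assumptions above (the array layout, function name and illustrative numbers are mine, not the chapter's): given a truncated price-change row of the VMA and the diagonal innovation variances, it computes σ_w², the trade-related share R²_{w,x}, the Beveridge-Nelson coefficients c_j, and the implied lower-bound pricing-error variance.

```python
import numpy as np

def rw_decomposition(b_dp, b_x, var_u, var_v):
    """Random-walk variance, trade-related share, and BN pricing-error lower bound.

    b_dp, b_x : VMA coefficients (lags 0, 1, 2, ...) in the price-change row,
                multiplying the non-trade innovation u_t and the trade innovation v_t.
    var_u, var_v : innovation variances (the diagonal of Omega).
    """
    b_dp, b_x = np.asarray(b_dp, float), np.asarray(b_x, float)

    # Equations (4.9)-(4.10)
    sigma_w2 = b_dp.sum() ** 2 * var_u + b_x.sum() ** 2 * var_v
    r2_wx = b_x.sum() ** 2 * var_v / sigma_w2            # trade-related share, eq. (4.11)

    # Beveridge-Nelson coefficients: c_j = -sum_{i > j} b_i, column by column
    c_dp = -(b_dp[::-1].cumsum()[::-1][1:])
    c_x = -(b_x[::-1].cumsum()[::-1][1:])

    # Variance of the one-sided projection s_hat, the lower bound on Var(s_t)
    var_s_bound = (c_dp ** 2).sum() * var_u + (c_x ** 2).sum() * var_v
    return sigma_w2, r2_wx, var_s_bound

# Illustrative use with a short, made-up VMA truncation:
print(rw_decomposition(b_dp=[1.0, 0.2, 0.1], b_x=[0.5, 0.3, 0.1], var_u=1.0, var_v=1.0))
```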
4.3. Model order

The VAR and VMA representations discussed above are possibly infinite in length. In most applications these will be approximated by truncated specifications. This raises the question of how many lags should be included in the specification. It is tempting here to rely on the usual statistical tests for model order (see Lutkepohl (1993), Ch. 4). In macroeconomic applications these tests usually (and conveniently) lead to models of modest order. This may be a consequence, however, of the low power of these tests to identify weak long-term dependencies in typical macroeconomic data sets. In contrast, the large number of observations in microstructure applications is often sufficient to suggest statistical significance of weak dependencies at lags that would drive the number of model parameters beyond the capacity of most computer programs.

Many empirical and theoretical considerations do in fact militate in favor of extremely long lags. A number of studies, for example, have documented stock return dependencies over horizons on the order of five or ten years. A correct specification for stock price changes at the transaction level should in principle also account for observed behavior over longer horizons as well. It would therefore appear that estimations limited to, say, the five or ten most recent transactions are seriously misspecified.

If the concern is the behavior of stock returns over annual and longer cycles, however, it can be argued that the misspecification in short-run transaction studies is both economically irrelevant (for microstructure) and small in magnitude. The long-term swings in stock prices are generally held to reflect changes in expected returns. These are presumably due to business cycle factors in the real economy that have little connection to the short-run trading characteristics. Microstructure phenomena are almost by definition confined to short horizons. A truncated transaction-level model may not achieve an accurate resolution of transitory and permanent effects, but it may nevertheless still satisfactorily resolve microstructure and non-microstructure effects.

It must be acknowledged, however, that between horizons that are clearly microstructure-related (five transactions) and those that are clearly macroeconomic (five years) lie hourly or daily horizons over which microstructure phenomena might be important but difficult to detect. It was noted that dealer inventories often exhibit long-term components. Furthermore, traders sometimes employ strategies that spread order placement over many days. Such effects may not be detected in short-run transaction studies. This point is particularly important when the variable set includes nonpublic data, as discussed below.
4.4. Expanding the variable set

Since the models discussed in sections 2 and 3 involve only prices and trades or inventories, the discussion has been limited to bivariate VARs. It is not difficult, however, to imagine hypotheses that would involve additional variables. For example, Huang and Stoll (1994) incorporate futures market variables into stock return specifications; Hasbrouck (1996) includes order flow; and Laux and Furbush (1994) examine program trades. Such studies typically attempt to test hypotheses concerning the informational content of particular data that are usually associated with the trading process. While the details of these models lie beyond the present discussion, it is appropriate here to raise certain issues of modeling philosophy.

In contemplating the addition of a variable to a stock price specification, perhaps the most important question is whether or not, or in what sense, it is public knowledge. Given the complexities of the trading process, the usual situation is a murky one in which the data are known by a subset of agents (see section 2.1). Transaction-level microstructure VARs typically reflect the explanatory or predictive power of a variable over a relatively short time horizon. If the variable does not enter the public information set within the horizon, however, then its information content will not be measured correctly. The information content of a trade, for example, can plausibly be assessed by short-run analyses because in most markets trades are reported quickly.

But suppose the econometrician possesses a series of trades that has been identified (some months after the fact) as originating from corporate insiders illegally trading on advance knowledge of earnings announcements. If the insiders trade a week in advance of the public announcement, then the association between an insider purchase and the price rise occurring a week later will not be detected in a short-run microstructure VAR. The VAR will pick up the information content of a purchase, but not the additional informational content of an insider purchase.

Addition of other variables may cloud attribution of information effects in another respect. The simple models were constructed with explicit timing assumptions that generally sufficed to impose a recursive structure on the disturbances. In each time interval for the asymmetric information model, for example, the quote is revised to reflect public information, then a trade arrives, and then expectations are updated. This recursive economic structure gives rise to the statistical property that trade innovations are uncorrelated with public information, which in turn supports a clear resolution of trade and non-trade information effects. Often, however, particularly when the data are collected from diverse sources, the time-stamps may not be clear enough to establish a recursive structure. The econometrician's imposition of a particular choice may exaggerate the informational content of variables appearing early in the assumed recursion. In such situations, the behavior of the model may be investigated by examining alternative recursion assumptions. It is often possible, for example, to establish bounds on the variance decomposition components in expressions such as (4.10) using Cholesky factorizations of the innovation covariance matrix. Hamilton (1994) discusses general principles; Hasbrouck (1995) presents a microstructure application.
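As a hedged illustration of the last point (the matrix values and the function name are hypothetical, not taken from the chapter): with correlated innovations, a Cholesky factorization attributes the shared covariance to whichever variable is ordered first, so computing the decomposition under both orderings brackets the trade-related contribution.

```python
import numpy as np

def trade_share_bounds(b1, b_x, omega):
    """Bounds on the trade-related share of sigma_w^2 under the two Cholesky orderings.

    b1, b_x : long-run VMA sums b(1) for the non-trade and trade innovations.
    omega   : 2x2 innovation covariance matrix (possibly non-diagonal).
    """
    b_row = np.array([b1, b_x], float)
    sigma_w2 = b_row @ omega @ b_row                     # eq. (4.9): b(1) Omega b(1)'

    shares = []
    for order in ([0, 1], [1, 0]):                       # trade innovation ordered last, then first
        perm = np.ix_(order, order)
        chol = np.linalg.cholesky(omega[perm])           # lower-triangular factor
        contrib = (b_row[order] @ chol) ** 2             # contribution of each orthogonalized shock
        shares.append(contrib[order.index(1)] / sigma_w2)
    return min(shares), max(shares)

# Hypothetical numbers: correlated innovations give a range rather than a point estimate.
omega = np.array([[1.0, 0.3], [0.3, 1.0]])
print(trade_share_bounds(b1=1.0, b_x=0.6, omega=omega))
```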

5. Time

The microstructure models studied in the earlier sections were implicitly cast in real time, sometimes referred to as "calendar time" by macroeconometricians or "wall-clock time" by microstructure students. In the interest of simplicity we implicitly took the time subscript t in the usual sense, as an index of equally spaced points in real time. The stationarity assumptions necessary to support inference were assumed to hold with respect to this time index. Timing considerations in actual markets, however, are considerably more involved. Markets do not usually operate continuously. The few that are in principle open twenty-four hours per day exhibit strong concentration of activity. Furthermore, trades usually take place at random times throughout the market session. This section discusses ways in which more realistic notions of time can be incorporated into statistical models.
5.1. Deterministic time considerations

Some of the time properties of markets appear to be deterministic, like the regular or predictable seasonalities encountered in macro time series. Two related examples in microstructure data are market closures and intraday patterns.


In most markets, trading takes place continuously during organized trading sessions. In between are periods of nontrading, typically over a lunch break, overnight, or over a weekend or holiday. If we are interested only in the behavior of the market during a trading session, we may drop from the sample all observations that span trading sessions, e.g., we might ignore an overnight return. If the aim of the analysis is a comprehensive model of the market evolution during periods of trading and nontrading, however, the econometrician must first take a position on whether or not the market evolution is time homogeneous, i.e., whether prices (security values) behave in the same way during trading and nontrading periods. If homogeneity is assumed, then we are taking the view that the timing of the observations in our sample is merely an artifact of some sampling process that is not related to the behavior of the system. Obviously for models in which trading plays a central role (such as those involving asymmetric information), time homogeneity is not an attractive assumption. In testing less refined hypotheses, however, the conjecture might be a workable approximation. This motivates consideration of how time homogeneity is empirically examined.

Most of what we know about the role of time in microstructure data derives from the analysis of price-change variances (rather than means). This reliance on second moment properties characterizes not only the analysis of trading vs. nontrading periods, but also most of the work done on intra-trading session evolution. The reasons for this emphasis are the ones raised in Section 2.1: if the price follows a random walk, the precision of variance estimates is improved by more frequent sampling; the precision of mean estimates is not.

In U.S. equity markets, at least, the hypothesis that the return variance per unit time is constant over trading and nontrading periods is easily rejected (Fama (1965), Granger and Morgenstern (1970), Oldfield and Rogalski (1980) and Christie (1981)). Based on an analysis of returns computed using daily closing prices, French and Roll (1986) estimate that the return variance per unit time is at least an order of magnitude higher when the market is open than when it is closed. This is due in part to the fact that production of public information (such as news releases) is more likely to occur during normal business hours, but it is also due to the role of trading itself in the price discovery process.

Having rejected time homogeneity in the large, that is, over trading and nontrading periods, might we still provisionally assume that it holds during trading sessions, at least well enough to support intraday analysis? There is considerable evidence to the contrary. As a general rule, microstructure data exhibit distinctive behavior at the beginning and end of trading sessions. Most notably, return variances per unit time exhibit "U"-shapes, i.e., elevations at the session endpoints. Marked intraday patterns are also found in measures of trading activity such as transaction frequency, trading volume rates and bid-ask spreads (Jain and Joh (1988), McInish and Wood (1990), McInish and Wood (1992) and Wood, McInish and Ord (1985)).

5.2. Stochastic time effects

Although trading processes unfold in continuous time, they are marked by discrete events (e.g., trades or quote revisions). The determination of these occurrence times is at least in part random. Ideally, then, how should these processes be modeled from a purely statistical perspective? Furthermore, what is the economic significance of the occurrence times?

Specification of continuous-time models that allow for random intervals between events is difficult. There is a well-established literature on the analysis of irregularly spaced time series (see Parzen (1984), Jones (1985), and the references therein). It is commonly assumed in these models that the irregularity is a property of the observational process per se, i.e., that the underlying process evolves homogeneously in real time, and that the irregular observation times are either fixed or are at least exogenous to the evolution of the process. In microstructure applications both of these assumptions are problematic, the former on account of intraday volatility patterns and the latter for reasons yet to be discussed. Nevertheless, this approach does achieve an appealing unity in capturing the discrete and continuous time aspects of a simple model. Furthermore, the techniques used to specify and estimate these models may yet be generalized to more complicated and realistic situations.

Garbade and Lieber (1976) specify a variant on the simple bid-ask model in which the implicit random-walk variance per unit time is constant and the random-walk variance over a transaction interval is scaled by the intertransaction time. It is also necessary to assume that the intertransaction times are identically and independently distributed exponential random variables (i.e., a Poisson trade arrival process). Garbade and Lieber find that the model performs well in a study of transaction data for IBM and Potlatch over ten trading days. The data suggest, however, more clustering of trades (over intervals shorter than approximately ten minutes) than is consistent with the hypothesized Poisson arrival process. In a more recent and comprehensive study of stock transaction data, Engle and Russell (1994) also find clustering and suggest an autoregressive duration model.

Although the Garbade and Lieber model predated the advent of the inventory control and asymmetric information models, it could easily be adapted to incorporate these effects. The principal limitation of the approach from a current perspective is the assumed independence of the observation ("transaction generation") process. The model implies, for example, that the probability that a trade will occur is independent of the size of the innovation in the security value, i.e., that we would be no more likely to witness a trade in the one minute following the close of a major press conference than we would in the middle of an uneventful August afternoon. This independence is not realistic.

Alternative approaches to the transaction occurrence problem have been employed in multiple security settings. The principle that (for a random walk) precision of variance estimates is enhanced by refinement of the observation interval also applies to estimates of covariances and betas, both of which are central to the standard portfolio problem.

In addition, portfolio groupings are often employed to reduce measurement errors in certain applications, particularly the estimation of the return autocorrelations. Yet as the use of daily closing prices has become common, it has also been recognized that trading and reporting practices can induce significant estimation error in betas and significant autocorrelation in measured portfolio returns. Campbell, Lo and MacKinlay (1993) provide an overview of these developments. Applications with asynchronous trading and last-trade reporting have historically attracted the most attention. Fisher (1966) discusses implications for stock index construction and interpretation. Analyses focusing on beta and covariance estimations are given in Scholes and Williams (1977), Dimson (1979), Cohen, Hawawini, Maier, Schwartz and Whitcomb (1983a,b) and Shanken (1987). Studies emphasizing the effects on portfolio return autocorrelations include Atchison, Butler and Simonds (1987), Boudoukh, Richardson and Whitelaw (1994), Cohen, Maier, Schwartz and Whitcomb (1986), Conrad and Kaul (1989), Conrad, Kaul and Nimalendran (1991), Lo and MacKinlay (1988a,b, 1990a,b), McInish and Wood (1991) and Mech (1993).

Traders sometimes characterize a market at a given time as being "slow" or "fast". The description extends beyond the speed of price changes. Prices do tend to move quickly in a fast market, but the frequency of order arrival and transaction occurrence is also higher. It is as if "an hour's worth of trading is packed into five minutes." From a modeling viewpoint, this is more than figurative speech. It is calling attention to the distinction between real time and operational time, the time scale over which the process evolves at a constant rate. Stock (1988) describes this as time deformation.

Time deformation themes have been advanced in many empirical microstructure studies (not always using this terminology). Although the asymmetric information link between trades and prices has been formalized relatively recently, the idea that price variance is related to trading activity is older. Clark (1973) suggests that stock prices follow a subordinated stochastic process, one in which the "clock" of the process is trades. A number of studies find that over fixed real time intervals (such as a day or hour), the variance of equity price changes is positively related to the number of transactions and/or the trading volume (Harris (1987), Tauchen and Pitts (1992)). McInish and Wood (1991) and Jones, Kaul and Lipson (1994) suggest that the association between return variance and trade frequency is higher than that between return variance and trade volume.

From an economic perspective, time deformation in market data is usually assumed to result from variation in the "information intensity" of the market, the rate at which the informational primitives (public and private signals) evolve. This is difficult to operationalize because these primitives, with the exception of sharply defined events like press conferences, are rarely observed. Also, in most theoretical models, the informational primitives are exogenous, implying that the resulting time deformation would also be exogenous. Other economic considerations, however, strongly suggest endogenous time effects.

A market-maker, for example, might diminish the frequency of incoming order arrival simply by widening the bid-ask spread. This sometimes occurs in response to a particularly significant informational announcement. In this instance, the econometrician relying on trade frequency as a proxy for informational intensity will draw exactly the wrong inference. Easley and O'Hara (1992), Easley, Kiefer and O'Hara (1993, 1994) and Easley, O'Hara and Paperman (1995) discuss these effects and suggest empirical tests. Strategic quote-setting behavior that can also lead to trade frequency effects is discussed by Leach and Madhavan (1992, 1993).
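The duration clustering noted above is frequently modeled with an autoregressive conditional duration (ACD) specification in the spirit of Engle and Russell. The following is a minimal simulation sketch under assumed parameter values (the model form used here and all names are illustrative, not taken from the chapter): expected durations follow a GARCH-like recursion, so short intertrade times tend to be followed by short intertrade times.

```python
import numpy as np

def simulate_acd(omega=0.1, alpha=0.2, beta=0.7, n=50_000, seed=0):
    """Simulate an ACD(1,1)-type duration process: x_i = psi_i * eps_i,
    psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}, eps_i ~ iid exponential(1)."""
    rng = np.random.default_rng(seed)
    eps = rng.exponential(1.0, n)
    x = np.empty(n)                        # intertrade durations
    psi = omega / (1.0 - alpha - beta)     # start at the unconditional mean duration
    for i in range(n):
        x[i] = psi * eps[i]
        psi = omega + alpha * x[i] + beta * psi
    return x

durations = simulate_acd()
# First-order autocorrelation: positive, i.e., short durations cluster together.
d = durations - durations.mean()
print(np.dot(d[1:], d[:-1]) / np.dot(d, d))
```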
5.3. Recommendations

Incorporating realistic time effects into microstructure models is a difficult task that is likely to call forth more and better research efforts. But if time per se is not the focus of a particular analysis, the econometrician needs to match the method to the immediate problem and the data. For investigating broad hypotheses about intraday patterns in market data and associations in these patterns, it appears sufficient to rely on data aggregated over fixed time intervals (e.g., hours). For investigating causal relations (such as trade price impacts) that would be obscured by aggregation, the econometrician should lean toward modeling the data purely in event time, i.e., where t indexes trades, quote revisions, etc. This is generally preferable to real-time modeling because it mitigates the effect of intraday patterns, and it incorporates some of the intuition of the formal time deformation approach: the "clock" of the process is assumed to be events.

6. Discreteness

Although the models discussed to this point have assumed that both prices and quantities are continuous random variables, both are in fact discrete. Of course, most economic data are discrete in the sense that they are collected and reported subject to rounding or truncation errors. Market data are different, though, firstly because the discreteness is not merely an artifact of the observational process and secondly because the discreteness is economically significant. On the NYSE, for example, the standard transaction size is a "round lot" of 100 shares. Deviations from multiples of this transaction size may lead to more difficulty in completing the trade and higher proportional transaction costs. Also, a stock priced at $5 or more per share trades in ticks of 1/8 dollar (12.5 cents). By way of comparison, the per share commission on an institutional trade is roughly five cents per share.

Inability to smoothly adjust prices and quantities plays havoc with the intuition behind the simple models discussed earlier. Discreteness effectively transforms the decisions faced by agents from relatively tractable continuous optimization problems to complicated integer programming problems. In the simple asymmetric information model of section 3.2, for example, it might be conjectured that a dealer contemplating a one-tick quote increase would wait until a sequence of buy orders had occurred.

It appears to be all but inevitable that discreteness will induce dynamic effects. Economic models that incorporate these and other aspects of discreteness include Bernhardt and Hughson (1990, 1992), Harris (1991, 1994), Chordia and Subrahmanyam (1992) and Glosten (1994).

6.1. The statistical modeling of discreteness


Although investigation of the economic aspects of discreteness is coming into its own as an important subject for inquiry, its status in empirical models has traditionally been that of a nuisance effect. Discreteness is often viewed as a feature of market data that needs to be addressed or controlled for in some fashion while investigating other hypotheses. Most of the initial work on discreteness arose in response to the need to estimate return variances for purposes of option valuation. From a statistical viewpoint it is most convenient to model discreteness as a rounding disturbance (possibly to a floor or ceiling) (Ball (1990), Cho and Frees (1988), Gottlieb and Kalay (1985) and Harris (1990)).

At first glance, discreteness would seem to cause intractable problems for the simple models of Section 3 and the generalized VAR models of Section 4, for the reasons usually given in econometrics texts regarding the estimation of limited dependent variable models using linear specifications. Consistency of least squares estimation does not require that the residuals be independent of the explanatory variables, however, only that they be uncorrelated. In many situations, absence of correlation can be motivated by appeal to the Wold Theorem, which is not contingent on an assumption that the variables are continuous. If the assumption of joint covariance stationarity is tenable in the time scale used to specify the model (usually either wall-clock time or transaction time), then there is no particular reason why discreteness should pose problems for estimating general VAR microstructure models and related constructs such as impulse response functions and variance decompositions.

For many purposes, this approach will suffice. The characterization of the market obtained in this fashion, however, is incomplete. The implied impulse response functions, for example, represent the continuous paths of the expected evolution of the market, which will look quite different from the sample paths that arise in discrete data. Furthermore, this perspective is ill-suited for examining hypotheses in which discreteness parameters (such as the tick size) are of interest.

Hausman, Lo and MacKinlay (1992) present an ordered probit model of price changes. This is a single-equation model in which trades and other explanatory variables (notably including the time between trades) drive a latent continuous price variable, which is in turn mapped onto the set of discrete prices using ordered breakpoints (that are estimated). Conditional on particular values of the explanatory variables, the predictions from this sort of model are given as probabilities of prespecified discrete price changes.
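A hedged sketch of the latent-variable structure just described follows (the breakpoints, coefficients and names are illustrative assumptions, not estimates from the paper): a continuous latent price change driven by the signed trade is mapped to discrete tick multiples through ordered thresholds, and the implied category probabilities follow from the normal distribution function.

```python
import numpy as np
from scipy.stats import norm

def tick_change_probs(x_t, beta=0.05, sigma=0.10,
                      cutoffs=(-0.1875, -0.0625, 0.0625, 0.1875)):
    """P(price change = -2, -1, 0, +1, +2 ticks | signed trade x_t).

    Latent model: dp* = beta * x_t + sigma * e, e ~ N(0, 1); the observed change falls
    in tick category k when dp* lies between the k-th pair of ordered cutoffs.
    """
    edges = np.concatenate(([-np.inf], cutoffs, [np.inf]))
    cdf = norm.cdf((edges - beta * x_t) / sigma)
    return np.diff(cdf)                    # probabilities of the five categories

for trade in (-5, 0, 5):                   # hypothetical sell, no net trade, buy
    print(trade, np.round(tick_change_probs(trade), 3))
```

Estimation would replace the assumed parameters with maximum-likelihood values fitted to observed discrete price changes.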

6.2. Clustering

Market prices have an affinity for whole numbers that is difficult to justify on economic grounds. In most economic and statistical models, discreteness is specified as a grid on which strategies and outcomes must lie, but no distinctive properties are attributed to particular points on the grid. In a discrete random walk with 1/8 ticks, for example, the price change is equally likely to be +1/8 or −1/8. If the current stock price is 50 1/8, it is equally likely that the next price will be 50 or 50 1/4. Yet, as Harris (1991) notes, "Stock prices cluster on round fractions. Integers are more common than halves; halves are more common than odd quarters; odd quarters are more common than odd eighths; other fractions are rarely observed. This phenomenon is remarkably persistent across stocks." Similar effects are found in NYSE limit order prices (Niederhoffer (1965, 1966)), NYSE quotes (Harris (1994)) and, to a striking degree, in U.S. National Market System quotes (Christie and Schultz (1994a,b)). Clustering suggests the existence of an implicit price grid that is coarser than the one mandated by the market rules. The economics of why these trading conventions arise and persist are not well understood.
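A simple way to quantify the clustering Harris describes is to tabulate the frequency of each eighth in a sample of prices. The sketch below does this for simulated data (the data and names are purely illustrative, not an empirical result).

```python
import numpy as np

def eighth_frequencies(prices):
    """Relative frequency of each price fraction 0/8, 1/8, ..., 7/8 in a price sample."""
    eighths = np.round(np.asarray(prices) * 8).astype(int) % 8
    counts = np.bincount(eighths, minlength=8)
    return counts / counts.sum()

# Illustrative only: prices uniform on the 1/8 grid give roughly 0.125 in each cell;
# clustered data would show elevated frequencies at 0/8 (integers) and 4/8 (halves).
rng = np.random.default_rng(0)
fake_prices = 50 + rng.integers(0, 8, size=10_000) / 8
print(np.round(eighth_frequencies(fake_prices), 3))
```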

7. Nonlinearity
The models in Sections 2-4 express current variables as linear functions of past variables and disturbances. Although one can construct theoretical models for which linearity is appropriate, such a requirement is uncomfortably restrictive in applications to actual markets. This section discusses the motivation and approaches for nonlinear generalizations.

Among all of the aspects of microstructure modeling which we have examined so far, the one in which accurate functional specification is most important is the relation linking trades and price changes. Implicit in this relation are both the mapping from trades to inferred private information content and also the mapping from trades to trading costs. These mappings are determinants of individual agents' order placement strategies: how much to trade and whether to split the total quantity across different orders. From a social viewpoint, these mappings may admit or reject the possibility of market manipulation.

Most of the structural models that allow for nonlinearity in the trade/price impact mapping are single-equation specifications of price changes in which trades are assumed exogenous and the dynamic aspects of the market are not explicitly modeled. One standard model of this sort is due to Glosten and Harris (1988). Their specification can be viewed as a generalization of the asymmetric information model of Section 3.2 in which there is an implied intercept in the cost and information functions. Variations of this model include George, Kaul and Nimalendran (1991), Neuberger and Roell (1991), Huang and Stoll (1994) and Madhavan, Richardson and Roomans (1994).


Intercepts and other nonlinearities can be incorporated into the general VAR models of section 4 in an ad hoc fashion. If price changes and signed trades are jointly stationary, then any transformations of price changes and signed trades are also jointly stationary. This suggests that the dynamic VAR models can be generalized by expanding the state vector to include nonlinear transformations. Hasbrouck (1991a,b, 1993) employs polynomial functions. Although a continuous function of a real variable can generally be approximated by a polynomial of sufficiently high degree, there is no assurance that the approximation is a parsimonious one, an important consideration in practical applications.

This motivates consideration of more flexible characterizations of the trade-price change relation, of the sort provided by nonparametric analysis. Algert (1992) applies locally weighted regression to NYSE price and trade data, and concludes that the price change maps most closely to a low fractional power of the trade, suggesting that a square root transformation is preferable to the quadratic. Further applications of nonparametric and semiparametric methods in characterizing microstructure relations are likely to be illuminating.

Related studies focus primarily on the price impact of large (block) trades in the U.S. equity market: Holthausen, Leftwich and Mayers (1987) and Barclay and Warner (1993). Such trades are of interest not only because of their size, but also due to their trade mechanism, as discussed in the next section.
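One way to implement the state-vector expansion just described is to augment the trade variable with nonlinear transformations of signed volume before fitting the VAR. A minimal sketch follows; the particular columns chosen are illustrative assumptions, not the chapter's specification.

```python
import numpy as np

def trade_regressors(signed_volume):
    """Stack nonlinear transformations of the signed trade for an expanded VAR state vector.

    Columns: trade sign, signed volume, signed square root of volume, signed squared volume.
    """
    x = np.asarray(signed_volume, float)
    return np.column_stack([
        np.sign(x),                        # direction only
        x,                                 # linear term
        np.sign(x) * np.sqrt(np.abs(x)),   # concave impact (square-root transformation)
        np.sign(x) * x ** 2,               # convex impact (polynomial term)
    ])

print(trade_regressors([1000, -500, 0, 200]).round(1))
```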

8. Multiple mechanisms and markets

The basic market paradigm used in this paper is one in which patient or passive traders (including dealers) post bid and offer quotes in some centralized venue like a stock exchange. Trades occur when impatient active traders arrive and hit these quotes. While this is the most common mechanism, actual markets exhibit considerable diversity. It is in fact rare for a security to trade solely in one market setting using one procedure. Most continuous equity markets, for example, employ a batching procedure to open a trading session or to handle large order imbalances. There may be special mechanisms to handle large trades. Finally, multiple markets in the same security may simply operate in parallel, with varying degrees of formal integration. The important economic issues in these situations concern the merits of alternative market structures and the nature of the competition between markets (see, for example, Chowdhry and Nanda (1991)). The empirical challenges involve the building of specifications general enough to handle the diverse trading mechanisms while retaining enough structure to address the economic hypotheses of interest. We consider in this section some common situations.
8.1. Call auctions

A call auction is a procedure that approximates the Walrasian auction often used as a conceptual device to explain price determination in an idealized competitive market. Over some order entry period, traders submit supply and demand schedules specifying how much they intend to buy or sell at a particular price. At some clearing time, orders are crossed at the price given by the intersection of the aggregate supply and demand curves. Although conceptually simple, the practical aspects of implementation are decidedly nontrivial, ranging from how much information to display before clearing to the pricing of order entry and exchange services.

There is much current interest in the economic analysis of call and continuous markets. This is perhaps a consequence of the realization that with current communications technology, a call auction simultaneously involving large numbers of geographically dispersed participants is, for the first time, feasible. Advocates of call auctions argue that pricing errors will be minimized because the aggregate supply and demand schedules will reduce (by the law of large numbers) the impact of idiosyncratic randomness in individual demands and arrivals (Mendelson (1982), Schwartz and Economides (1995) and Schwartz (1996)). Advocates of continuous markets place a high value on the availability of immediate execution, which is of particular importance in hedging and dynamic portfolio strategies.

At the NYSE, a call is used to open continuous trading, and also to reopen continuous trading after a trading halt. A call (itayose) is also used to initiate continuous trading on the Tokyo Stock Exchange (Lehmann and Modest (1994), Hamao and Hasbrouck (1995)). The Frankfurt Bourse runs a noon call, at which time most of the retail orders for German equities are traded.

If the primary aim of a study is characterization of the continuous trading mechanism (which usually accounts for the bulk of the trading activity and most of the price change variance), then one commonly drops the opening price (and the overnight price change) from the analysis. For hypotheses that specify the joint behavior of the two mechanisms, however, other methods are required. It is rare in empirical studies for the two mechanisms to be modeled jointly with fully specified models of both mechanisms. Instead, the merits are usually investigated by comparing opening call prices with one or more prices from the continuous session.

Suppose that the time index t = 1, 2, ... is constructed so that the odd times t = 1, 3, 5, ... correspond to market opening times, and the even times t = 2, 4, ... correspond to market closing times (or some other point taken from the continuous trading session). Using the basic random walk decomposition model from section 2.2, a two-period price change may be written as Δp_t^{[2]} = (w_t + w_{t-1}) + s_t − s_{t-2}. Assuming that the w_t and s_t are mutually and serially uncorrelated, the variance of the two-period price change is
$$\mathrm{Var}(\Delta p_t^{[2]}) = \mathrm{Var}(w_t) + \mathrm{Var}(w_{t-1}) + \mathrm{Var}(s_t) + \mathrm{Var}(s_{t-2}) \qquad (8.1)$$

We now consider how this variance depends on whether t is odd (an open-to-open price change) or even (close-to-close). There are two random walk terms. Whether or not t is even, one of the pair t and t − 1 is even and the other is odd. Therefore Var(w_t) + Var(w_{t-1}) does not depend on whether t is even. It is the variance of the 24-hour innovation in the efficient price. The pricing error time subscripts, on the other hand, will be both even or both odd. We may therefore write:

$$\begin{aligned}
\mathrm{Var}(\Delta p_t^{\mathrm{open}}) &= \mathrm{Var}(w_t) + \mathrm{Var}(w_{t-1}) + 2\,\mathrm{Var}(s_t^{\mathrm{open}}) \\
\mathrm{Var}(\Delta p_t^{\mathrm{close}}) &= \mathrm{Var}(w_t) + \mathrm{Var}(w_{t-1}) + 2\,\mathrm{Var}(s_t^{\mathrm{close}})
\end{aligned} \qquad (8.2)$$

The difference between these two variances is therefore twice the difference in the variances of the opening and closing pricing errors. If the variance of the opening pricing error is greater than that of the closing pricing error, this difference is positive. Alternatively, the variance ratio of the first variance to the second is greater than one. Amihud and Mendelson (1987) and Stoll and Whaley (1990) find that on average for NYSE stocks this ratio is indeed greater than one (larger variance of pricing error at the opening call).

These results have not settled the mechanism debate. It has been argued that the elevated opening variance at the NYSE is due to particular features of the NYSE call (selective ability of traders to "recontract", the last-move advantage of the specialist, etc.). It may also be that the period of overnight market closure is associated with transient opening effects that are not associated with the call mechanism per se. The Tokyo Stock Exchange trading day is broken into morning and afternoon sessions, both of which begin with a call. Amihud and Mendelson (1991) find that while the variance of the morning open is elevated (consistent with U.S. findings), the variance of the afternoon call is not. Related studies include Amihud, Mendelson and Murgia (1990) (Italy), Gerety and Mulherin (1994) (long-run U.S.) and Masulis and Ng (1991) (London). Smith (1994) and Ronen (1994) discuss the general statistical properties of variance ratio estimates in these applications. Lee, Ready and Seguin (1994) discuss calls subsequent to trading halts.

Variance ratios of a more general type arise in microstructure studies as a summary measure of the extent to which a price series deviates from a random walk. It is a property of a homoskedastic random walk that the variance of the increments is a linear function of the time interval over which the increment is computed. That is, in the simple random-walk model (section 2.1) the variance of the one-period price change is Var(Δp_t) = Var(p_t − p_{t-1}) = σ_w²; that of the two-period change is Var(Δp_t^{[2]}) = Var(p_t − p_{t-2}) = 2σ_w², and so on. The ratio of these two variances, each scaled by its time interval, Var(Δp_t^{[2]})/2 divided by Var(Δp_t), is equal to unity. More generally, the variance ratio formed from the n-period price change relative to the one-period change is
$$V_n = \frac{\mathrm{Var}(\Delta p_t^{[n]})}{n\,\mathrm{Var}(\Delta p_t)} \qquad (8.3)$$

For a random walk, V_n = 1 for all n. The extent to which this ratio deviates from unity is sometimes taken as a measure of how much the process deviates from a random walk.


A useful alternative form for V_n is obtained by expanding Var(Δp_t^{[n]}) in terms of the price-change autocovariances and dividing through by Var(Δp_t), yielding V_n = 1 + 2 Σ_{i=1}^{n-1} (1 − i/n) ρ_i, where ρ_i is the price-change autocorrelation at lag i. Written in this fashion, it becomes apparent that for the simple bid-ask model of section 2.3, the only non-zero autocorrelation is ρ_1 < 0, which will in turn drive V_n below unity. On the other hand, positive autocorrelation (induced perhaps by lagged adjustment) can lead to variance ratios above one. A mixed pattern of positive and negative autocorrelations can lead to a variance ratio equal to unity for a price-change process that is distinctly different from a random walk.

An early application of variance ratios to stock return data is Barnea (1974), who interprets the nine-day/one-day variance ratio as a performance measure for New York Stock Exchange specialists (designated dealers). Hasbrouck and Schwartz (1988) estimate variance ratios using transaction data for stocks traded on the New York, American and National Market System ("over-the-counter") exchanges. Kaul and Nimalendran (1994) use variance ratios to resolve bid-ask and overreaction effects. Lo and MacKinlay (1988) employ variance ratios to examine the random walk hypothesis in weekly stock return data, and describe the asymptotic properties of the variance ratio and related estimates under the null (random walk) hypothesis. Their paper also contains citations to other occurrences of variance ratios in the statistical and economics literature.
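A minimal sketch of the variance-ratio computation in (8.3) follows (the function and variable names are mine): it aggregates a one-period price-change series into non-overlapping n-period changes and scales the variances accordingly.

```python
import numpy as np

def variance_ratio(dp, n):
    """V_n = Var(n-period price change) / (n * Var(one-period price change)), eq. (8.3)."""
    dp = np.asarray(dp, float)
    dp = dp[: len(dp) // n * n]                 # truncate to a multiple of n
    dp_n = dp.reshape(-1, n).sum(axis=1)        # non-overlapping n-period price changes
    return dp_n.var() / (n * dp.var())

# Illustrative check: an MA(1) price change with a negative coefficient (bid-ask bounce)
# pushes the ratio below one, while white noise gives a ratio near one.
rng = np.random.default_rng(0)
e = rng.normal(size=200_000)
dp_bounce = e[1:] - 0.4 * e[:-1]
print(round(variance_ratio(rng.normal(size=200_000), 5), 3))   # ~1.0
print(round(variance_ratio(dp_bounce, 5), 3))                  # < 1.0
```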

8.2. Large trade mechanisms

Trade cost is related to trade size. When a trader is contemplating a transaction that is much larger than the normal trade size for a market, this cost might be reduced by breaking the order into smaller pieces brought to the market over time. For traders demanding immediacy in large size, however, alternative trading procedures have often evolved. On the NYSE, for example, large (block) trades are typically negotiated in the "upstairs" market, and then formally transacted ("crossed") on the exchange and reported to the transaction tape. Economic issues are considered by Burdett and O'Hara (1987), Grossman (1992), Seppi (1990, 1992). The last section cited studies of the price impact of block trades. As in the case of different opening mechanisms, there are no analyses employing fully realized joint specifications of the regular ("downstairs") and upstairs markets. In fact it is not possible to infer from the public quote and transaction record which trades were negotiated in the upstairs market. Accordingly, most empirical studies simply treat block trades as "large" trades, ignoring the details of the negotiation process.
8.3. Parallel markets

It is convenient to view opening call auctions and block trades (at least in the U.S. equities markets) as alternative mechanisms functioning as close adjuncts to regular trading in a single market. When the alternative trading mechanisms for a security diverge greatly with respect to their clientele, locations or procedures, it may be more natural to view the alternatives as distinctly different markets. For example, equities listed on the NYSE also trade on the U.S. regional exchanges. Although there are electronic links among the exchanges, trading and quote-setting may vary considerably across venues. As a second example, while the Paris Bourse accounts for much of the trading volume in French equities, large trades are frequently done on the London Stock Exchange. There is no formal integration of the two, although it is likely that someone contemplating a trade would check the prices in both markets (de Jong, Nijman and Roell (1993)). Grunbichler, Longstaff and Schwartz (1992) discuss multiple markets in German equities. The current trend toward increased dispersal of trading activity is termed "fragmentation".

It might be hoped that with market data on a single security trading in two or more markets, one could estimate the market dynamics jointly, simply by "stacking" the market data to combine them in a single estimation. If these data include two or more price series for the security, however, specification becomes tricky. The complexities can be illustrated in a simple model of a single security trading in two markets, with imperfect flows of information. The implicit efficient price follows a random walk, but with increments that are "revealed" to each market separately:
$$\begin{aligned}
m_t &= m_{t-1} + w_t \\
w_t &= u_{1,t} + u_{2,t} \\
p_{1,t} &= m_{t-1} + u_{1,t} + (1 - a_1)\,u_{2,t} = m_t - a_1 u_{2,t} \\
p_{2,t} &= m_{t-1} + u_{2,t} + (1 - a_2)\,u_{1,t} = m_t - a_2 u_{1,t}
\end{aligned} \qquad (8.4)$$

The price equations are consistent with lagged adjustment to information originating in the other market. The price in the first market, for example, reflects only (1 − a_1) of the contemporaneous innovation in the second market. The remaining portion is reflected in the subsequent time period. If the u_i are uncorrelated, the total variance of the implicit efficient price changes is σ_w² = Var(u_{1,t}) + Var(u_{2,t}). The proportion of information contributed by the i-th market, termed the "information share" in Hasbrouck (1995), is Var(u_{i,t})/σ_w².

It may be shown that although a VMA representation for the price changes exists in this model, it is not invertible: a convergent VAR representation for the price changes does not exist. This is not a consequence of the stylized nature of the model. It is rather a reflection of the fact that even though both price series possess random-walk components (formally, possess unit roots), the difference between the prices is stationary. Such systems are said to be cointegrated. (See Davidson, Hendry, Srba and Yeo (1978), Engle and Granger (1987), and, at a textbook level, Hamilton (1994) and Banerjee, Dolado, Galbraith and Hendry (1994).)

Cointegrated systems can often be represented in numerous alternative ways, some of which are more useful for interpretation and others for estimation. Of particular importance in the present application is the Stock-Watson common trends representation. If two prices are cointegrated, they may be written:

  (p_{1,t}, p_{2,t})' = (1, 1)' m_t + (s_{1,t}, s_{2,t})'        (8.5)
This is a multivariate generalization of the basic dichotomy between permanent and transitory components. It is important to note that the two prices share the same permanent component. In a cointegrated system, a convergent VAR representation for the price changes will never exist. One generally has more success with a slightly modified specification, the so-called error correction model (ECM). For a two-price model, a typical ECM is:

  Δp_t = α(p_{1,t-1} - p_{2,t-1}) + A_1 Δp_{t-1} + A_2 Δp_{t-2} + ... + u_t        (8.6)

where the A_i are (2 × 2) coefficient matrices and α is a (2 × 1) vector of coefficients. From (8.6) a VMA representation for the price changes may be recovered. This in turn will support computation of the market information contributions described above (see Hasbrouck (1995)). Although ECMs are frequently employed as general reduced-form specifications, their existence is not guaranteed. If a_1 = a_2 = 1, the model given in equation (8.4) will not possess a convergent ECM representation, although state-space estimation may remain feasible.

In macroeconomic applications, the presence of cointegration and the coefficients of the cointegrating vectors (or a linear basis for these vectors) are often problematic. Matters are usually simpler in microstructure settings. When the cointegration involves two or more prices associated with the same security (such as the price in different markets or the bid and ask quotes in the same market), a basis for the cointegrating vectors can plausibly be specified a priori. If there are n price variables, there are n - 1 linearly independent price differences. Rejection of this set of cointegrating vectors is tantamount to asserting that two or more prices will tend over time to diverge without bound. This is not plausible if the prices all pertain to the same security. Harris, McInish, Shoesmith and Wood (1992) and Hasbrouck (1995) discuss these issues and describe applications to the U.S. equities markets.

A similar situation exists when the multiple prices apply not to the same security, but instead to the security and a derivative such as a futures or options contract. Here it is often the case that arbitrage relationships between the derivative and the underlying will lead to cointegration between the price of the underlying and some function of the price of the derivative. Cointegration is likely to arise, therefore, in studies of spot and forward prices and of stock and option prices.
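To make the mechanics concrete, the following sketch (not part of the original chapter) simulates the two-market model (8.4) for illustrative parameter values, computes the information shares Var(u_{i,t})/σ_w² implied by the assumed innovation variances, and fits a one-lag version of the ECM (8.6) by least squares. The one-lag truncation, the parameter values and the variable names are all assumptions made for the sake of the example.

```python
# Illustrative simulation of the two-market model (8.4) and the ECM (8.6).
import numpy as np

rng = np.random.default_rng(0)
T, a1, a2 = 5000, 0.6, 0.3          # sample size and lagged-adjustment parameters
s1, s2 = 1.0, 0.5                   # std. deviations of u_{1,t} and u_{2,t}

u1 = rng.normal(0.0, s1, T)
u2 = rng.normal(0.0, s2, T)
m = np.cumsum(u1 + u2)              # implicit efficient price (random walk)
p1 = m - a1 * u2                    # market 1 impounds u_{2,t} with a one-period lag
p2 = m - a2 * u1                    # market 2 impounds u_{1,t} with a one-period lag

# Information shares Var(u_i)/sigma_w^2 implied by the assumed variances.
sw2 = s1**2 + s2**2
print("information shares:", s1**2 / sw2, s2**2 / sw2)

# One-lag ECM: dp_t = alpha*(p1_{t-1} - p2_{t-1}) + A1*dp_{t-1} + u_t, by OLS.
dp = np.column_stack([np.diff(p1), np.diff(p2)])    # price changes, (T-1) x 2
ec = (p1 - p2)[1:-1]                                # lagged price difference
Y = dp[1:]                                          # dp_t
X = np.column_stack([ec, dp[:-1]])                  # error-correction term, dp_{t-1}
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("error-correction coefficients:", B[0])
print("lag-1 coefficient matrix A1:\n", B[1:].T)
```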

9. Summary and directions for further work

This paper has attempted to provide an overview of the various approaches to modeling microstructure time series. Rather than recapitulate these developments, it is perhaps more useful to return to the questions that motivated them.

It was claimed in the introduction that microstructure models can potentially examine both narrow questions of trading behavior and market organization and also broader issues of valuation and the nature of information. The present paper has focused, however, almost exclusively on the former. This emphasis can be justified on the grounds that any study using market transaction data must employ methods that reflect the market realities. But as a practical matter, the economic importance of security valuation and the implications for the allocation of real assets almost certainly outweigh the welfare improvements that might result from modest changes in the trading mechanisms for most securities. It is therefore appropriate to indicate briefly some of the ways in which microstructure studies can illuminate aspects of corporate finance.

The classic event study measures the impact of a public information event by the associated change in the security price. The insight of the asymmetric information models is that when the "event" is a trade, the price reaction summarizes the market's estimation of the private information behind the trade. Studies of the price impact of trades, the spread (under certain assumptions), or the summary R²_x measure introduced in section 3.2 thereby broadly characterize the market's beliefs about the magnitude of information asymmetries. Since these beliefs cannot usually be measured directly, the window offered by microstructure data may well be the only vantage point. Recent studies that explore asymmetric information in the vicinity of corporate announcements include Foster and Viswanathan (1995) (takeover announcements) and Lee, Mucklow and Ready (1993) (earnings announcements). Neal and Wheatley (1994) discuss the asymmetric information characteristics of closed-end mutual funds.

We now return to the narrower microstructure issues. From a statistical perspective, the current state of the art falls considerably short of a plausible comprehensive model of transactions data. The reader who has skimmed over the discussion of time, discreteness, nonlinearities and multiple markets in the earlier sections can hardly avoid getting a sense of the tentativeness that marks modeling efforts in these areas, and the need for further work. But statistical models in this area must ultimately be judged by their implications for the economic questions.

From an economic perspective, the standing questions are those of how information enters market prices, how traders should behave (private welfare) and how markets should be organized (social welfare). Studies of trade-price behavior have yielded a modest understanding of the first issue. It is an empirical fact that trades seem to explain part, but not all, of price changes. This confirms the existence of private information and establishes the importance of trading for the revelation or incorporation of this information. Answers to the other two fundamental questions, however, remain elusive. Trading strategy in most markets remains the province of human judgment, guided by experience and intuition, beyond the limits of existing normative models, and even outside the realm of most ex post performance measurement, excepting that of the roughest sort ("Did our investment strategy make money, net of trading costs?"). Nor have academic efforts to define economically efficient trading arrangements been particularly successful.

While we have garnered greater insights into the workings of existing markets, we have yet to create yardsticks capable of ranking potential alternative arrangements. No consensus on these questions among academics, practitioners and regulators has yet emerged. It is certainly to be hoped that improved econometric models will provide useful insights.

References
Algert, P. (1992). Estimates of nonlinearity in the response of stock prices to order imbalances. Working Paper, Graduate School of Management, University of California at Davis.
Amihud, Y. and H. Mendelson (1980). Dealership market: Market making with inventory. J. Financ. Econom. 8, 31-53.
Amihud, Y. and H. Mendelson (1986). Asset pricing and the bid-ask spread. J. Financ. Econom. 17, 223-49.
Amihud, Y. and H. Mendelson (1987). Trading mechanisms and stock returns. J. Finance 42, 533-53.
Amihud, Y. and H. Mendelson (1991). Volatility, efficiency and trading: Evidence from the Japanese stock market. J. Finance 46, 1765-89.
Amihud, Y., H. Mendelson and M. Murgia (1990). Stock market microstructure and return volatility. J. Banking Finance 14, 423-40.
Atchison, M., K. Butler and R. Simonds (1987). Nonsynchronous security trading and market index autocorrelation. J. Finance 42, 533-53.
Banerjee, A., J. Dolado, J. W. Galbraith and D. F. Hendry (1994). Co-integration, Error-correction, and the Econometric Analysis of Non-stationary Data. Oxford University Press, London.
Barclay, M. J. and J. B. Warner (1993). Stealth trading and volatility: Which trades move prices. J. Financ. Econom. 34, 281-306.
Barnea, A. (1974). Performance evaluation of New York Stock Exchange specialists. J. Financ. Quant. Anal. 9, 511-535.
Beja, A. and M. Goldman (1980). On the dynamic behavior of prices in disequilibrium. J. Finance 35, 235-48.
Bernhardt, D. and E. Hughson (1990). Discrete pricing and dealer competition. Working Paper, California Institute of Technology.
Bernhardt, D. and E. Hughson (1992). Discrete pricing and institutional design of dealership markets. Working Paper, California Institute of Technology.
Beveridge, S. and C. R. Nelson (1981). A new approach to the decomposition of economic time series into permanent and transitory components with particular attention to the measurement of the 'business cycle'. J. Monetary Econom. 7, 151-174.
Blume, M. and M. Goldstein (1992). Displayed and effective spreads by market. Working Paper, University of Pennsylvania.
Boudoukh, J., M. P. Richardson and R. F. Whitelaw (1994). A tale of three schools: Insights on the autocorrelations of short-horizon stock returns. Rev. Financ. Stud. 7, 539-73.
Burdett, K. and M. O'Hara (1987). Building blocks: An introduction to block trading. J. Banking Finance 11, 193-212.
Campbell, J. Y., A. W. Lo and A. C. MacKinlay. The econometrics of financial markets, Chapter 3: Aspects of market microstructure. Working Paper No. RPCF-1013-93, Research Program in Computational Finance, Sloan School of Management, Massachusetts Institute of Technology.
Cheng, M. and A. Madhavan (1994). In search of liquidity: Block trades in the upstairs and downstairs markets. Working Paper, New York Stock Exchange.
Cho, D. C. and E. W. Frees (1988). Estimating the volatility of discrete stock prices. J. Finance 43, 451-466.
Chordia, T. and A. Subrahmanyam (1992). Off-floor market-making, payment-for-order-flow and the tick size. Working Paper, UCLA.

Chowdhry, B. and V. Nanda (1991). Multimarket trading and market liquidity. Rev. Financ. Stud. 4, 483-512.
Christie, A. A. (1981). On efficient estimation and intra-week behavior of common stock variances. Working Paper, University of Rochester.
Christie, W. G. and P. H. Schultz (1994a). Why did NASDAQ market makers stop avoiding odd-eighth quotes? J. Finance 49, 1841-60.
Christie, W. G. and P. H. Schultz (1994b). Why do NASDAQ market makers avoid odd-eighth quotes? J. Finance 49, 1813-40.
Clark, P. K. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-155.
Cohen, K., D. Maier, R. Schwartz and D. Whitcomb (1981). Transaction costs, order placement strategy and the existence of the bid-ask spread. J. Politic. Econom. 89, 287-305.
Cohen, K., D. Maier, R. Schwartz and D. Whitcomb (1986). The microstructure of security markets. Prentice-Hall, Englewood Cliffs, NJ.
Cohen, K., G. Hawawini, S. Maier, R. Schwartz and D. Whitcomb (1983a). Friction in the trading process and the estimation of systematic risk. J. Financ. Econom. 29, 135-148.
Cohen, K., G. Hawawini, S. Maier, R. Schwartz and D. Whitcomb (1983b). Estimating and adjusting for the intervalling-effect bias in beta. Mgmt. Sci. 29, 135-148.
Conrad, J. and G. Kaul (1989). Mean reversion in short-horizon expected returns. Rev. Financ. Stud. 2, 225-40.
Conrad, J., G. Kaul and M. Nimalendran (1991). Components of short-horizon individual security returns. J. Financ. Econom. 29, 365-84.
Copeland, T. and D. Galai (1983). Information effects and the bid-ask spread. J. Finance 38, 1457-1469.
Damodaran, A. (1993). A simple measure of price adjustment coefficients. J. Finance 48, 387-400.
Davidson, J. E. H., D. F. Hendry, F. Srba and S. Yeo (1978). Econometric modeling of the aggregate time series relationship between consumers' expenditure and income in the United Kingdom. Econom. J. 88, 661-92.
De Jong, F., T. Nijman and A. Roell (1993). A comparison of the cost of trading French shares on the Paris Bourse and on SEAQ International. London School of Economics, Discussion Paper No. 169.
Dimson, E. (1979). Risk measurement when shares are subject to infrequent trading. J. Financ. Econom. 7, 197.
Easley, D. and M. O'Hara (1987). Price, size and information in securities markets. J. Financ. Econom. 19, 69-90.
Easley, D. and M. O'Hara (1991). Order form and information in securities markets. J. Finance 46, 905-927.
Easley, D. and M. O'Hara (1992). Time and the process of security price adjustment. J. Finance 47, 577-606.
Easley, D., N. M. Kiefer and M. O'Hara (1993). One day in the life of a very common stock. Working Paper, Cornell University.
Easley, D., N. M. Kiefer and M. O'Hara (1994). Sequential trading in continuous time. Working Paper, Cornell University.
Easley, D., N. M. Kiefer, M. O'Hara and J. B. Paperman (1995). Liquidity, information and infrequently traded stocks. Working Paper, Cornell University.
Eckbo, B. E. and J. Liu (1993). Temporary components of stock prices: New univariate results. J. Financ. Quant. Anal. 28, 161-176.
Engle, R. F. and C. W. J. Granger (1987). Co-integration and error correction: Representation, estimation and testing. Econometrica 55, 251-76.
Engle, R. F. and J. R. Russell (1994). Forecasting transaction rates: The autoregressive conditional duration model. Working Paper No. 4966, National Bureau of Economic Research, Cambridge, MA.
Fama, E. F. (1965). The behavior of stock market prices. J. Business 38, 34-105.
Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. J. Finance.

Fisher, L. (1966). Some new stock market indexes. J. Business 39, 191-225.
Foster, F. D. and S. Viswanathan (1990). A theory of the interday variations in volumes, variances and trading costs in securities markets. Rev. Financ. Stud. 3, 593-624.
Foster, F. D. and S. Viswanathan (1995). Trading costs of target firms and corporate takeovers. In: Advances in Financial Economics, JAI Press.
French, K. R. and R. Roll (1986). Stock return variances: The arrival of information and the reaction of traders. J. Financ. Econom. 17, 5-26.
Garbade, K. and Z. Lieber (1977). On the independence of transactions on the New York Stock Exchange. J. Banking Finance 1, 151-172.
Garman, M. (1976). Market microstructure. J. Financ. Econom. 3, 257-275.
George, T. J., G. Kaul and M. Nimalendran (1991). Estimation of the bid-ask spread and its components: A new approach. Rev. Financ. Stud. 4, 623-656.
Gerety, M. S. and J. H. Mulherin (1994). Price formation on the stock exchanges: The evolution of trading within the day. Rev. Financ. Stud. 7, 609-29.
Glosten, L. (1987). Components of the bid-ask spread and the statistical properties of transaction prices. J. Finance 42, 1293-1307.
Glosten, L. (1994). Is the electronic open limit order book inevitable? J. Finance 49, 1127-1161.
Glosten, L. and L. Harris (1988). Estimating the components of the bid-ask spread. J. Financ. Econom. 21, 123-142.
Glosten, L. R. and P. R. Milgrom (1985). Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. J. Financ. Econom. 14, 71-100.
Goldman, M. and A. Beja (1979). Market prices vs. equilibrium prices: Return variances, serial correlation and the role of the specialist. J. Finance 34, 595-607.
Goodhart, C. A. E. and M. O'Hara (1995). High frequency data in financial markets: Issues and applications. Working Paper, London School of Economics.
Granger, C. W. J. and O. Morgenstern (1970). Predictability of stock market prices. Heath-Lexington, Lexington, MA.
Grossman, S. J. and M. H. Miller (1988). Liquidity and market structure. J. Finance 43, 617-33.
Grossman, S. J. (1992). The informational role of upstairs and downstairs trading. J. Business 65, 509-28.
Grunbichler, A., F. A. Longstaff and E. Schwartz (1992). Electronic screen trading and the transmission of information: An empirical examination. Working Paper, UCLA.
Hamao, Y. and J. Hasbrouck (1995). Securities trading in the absence of dealers: Trades and quotes on the Tokyo Stock Exchange. Rev. Financ. Stud., to appear.
Hamilton, J. D. (1994). Time series analysis. Princeton University Press, Princeton.
Harris, F. H. deB., T. H. McInish, G. L. Shoesmith and R. A. Wood (1992). Cointegration, error correction, and price discovery on the New York, Philadelphia and Midwest Stock Exchanges. Working Paper, Fogelman College of Business and Economics.
Harris, L. (1990). Statistical properties of the Roll serial covariance bid/ask spread estimator. J. Finance 45, 579-90.
Harris, L. (1991). Stock price clustering and discreteness. Rev. Financ. Stud. 4, 389-415.
Harris, L. (1994). Minimum price variations, discrete bid-ask spreads and quotation sizes. Rev. Financ. Stud. 7, 149-178.
Harvey, A. C. (1990). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Hasbrouck, J. and G. Sofianos (1993). The trades of market makers: An empirical analysis of NYSE specialists. J. Finance 48, 1565-1593.
Hasbrouck, J. and T. S. Y. Ho (1987). Order arrival, quote behavior and the return-generating process. J. Finance 42, 1035-1048.
Hasbrouck, J. (1988). Trades, quotes, inventories and information. J. Financ. Econom. 22, 229-252.
Hasbrouck, J. (1991a). Measuring the information content of stock trades. J. Finance 46, 179-207.
Hasbrouck, J. (1991b). The summary informativeness of stock trades: An econometric investigation. Rev. Financ. Stud. 4, 571-95.

Hasbrouck, J. (1993). Assessing the quality of a security market: A new approach to measuring transaction costs. Rev. Financ. Stud. 6, 191-212.
Hasbrouck, J. (1996). Order characteristics and stock price evolution: An application to program trading. J. Financ. Econom. 41, 129-149.
Hasbrouck, J. (1995). One security, many markets: Determining the contributions to price discovery. J. Finance 50, 1175-1199.
Hasbrouck, J., G. Sofianos and D. Sosebee (1993). Orders, trades, reports and quotes at the New York Stock Exchange. NYSE Working Paper, Research and Planning Section.
Hausman, J., A. Lo and A. C. MacKinlay (1992). An ordered probit analysis of stock transaction prices. J. Financ. Econom. 31, 319-379.
Ho, T. S. Y. and H. R. Stoll (1981). Optimal dealer pricing under transactions and returns uncertainty. J. Finance 28, 1053-1074.
Holthausen, R. W., R. W. Leftwich and D. Mayers (1987). The effect of large block transactions on security prices. J. Financ. Econom. 19, 237-67.
Huang, R. D. and H. R. Stoll (1994a). Market microstructure and stock return predictions. Rev. Financ. Stud. 7, 179-213.
Huang, R. D. and H. R. Stoll (1994b). The components of the bid-ask spread: A general approach. Working Paper 94-33, Owen Graduate School of Management, Vanderbilt University.
Jain, P. C. and G. H. Joh (1988). The dependence between hourly prices and trading volume. J. Financ. Quant. Anal. 23, 269-83.
Jones, R. H. (1985). Time series analysis with unequally spaced data. In: E. J. Hannan, P. R. Krishnaiah and M. M. Rao, eds., Handbook of Statistics, Volume 5, Time Series in the Time Domain, Elsevier Science Publishers, Amsterdam.
Karlin, S. and H. M. Taylor (1975). A first course in stochastic processes. Academic Press, New York.
Kaul, G. and M. Nimalendran (1990). Price reversals: Bid-ask errors or market overreaction? J. Financ. Econom. 28, 67-93.
Kyle, A. S. (1985). Continuous auctions and insider trading. Econometrica 53, 1315-1336.
Laux, P. and D. Furbush (1994). Price formation, liquidity, and volatility of individual stocks around index arbitrage. Working Paper, Case Western Reserve University.
Leach, J. C. and A. N. Madhavan (1992). Intertemporal discovery by market makers. J. Financ. Intermed. 2, 207-235.
Leach, J. C. and A. N. Madhavan (1993). Price experimentation and security market structure. Rev. Financ. Stud. 6, 375-404.
Lee, C. M. C. and M. Ready (1991). Inferring trade direction from intradaily data. J. Finance 46, 733-746.
Lee, C. M. C., B. Mucklow and M. J. Ready (1993). Spreads, depths and the impact of earnings information: An intraday analysis. Rev. Financ. Stud. 6, 345-374.
Lee, C. M. C., M. J. Ready and P. J. Seguin (1994). Volume, volatility and New York Stock Exchange trading halts. J. Finance 49, 183-214.
Lehmann, B. and D. Modest (1994). Trading and liquidity on the Tokyo Stock Exchange: A bird's eye view. J. Finance 44, 951-84.
Lo, A. and A. C. MacKinlay (1988a). Stock prices do not follow random walks: Evidence from a simple specification test. Rev. Financ. Stud. 1, 41-66.
Lo, A. and A. C. MacKinlay (1988b). Notes on a Markov model of nonsynchronous trading. Working Paper, Sloan School of Management, Massachusetts Institute of Technology.
Lo, A. and A. C. MacKinlay (1990a). An econometric analysis of nonsynchronous trading. J. Econometrics 45, 181-212.
Lo, A. and A. C. MacKinlay (1990b). When are contrarian profits due to stock market overreaction? Rev. Financ. Stud. 3, 175-205.
Lo, A. and A. C. MacKinlay (1990c). Data-snooping biases in tests of financial asset pricing models. Rev. Financ. Stud. 3, 431-468.
Madhavan, A. and S. Smidt (1991). A Bayesian model of intraday specialist pricing. J. Financ. Econom. 30, 99-134.

Madhavan, A. and S. Smidt (1993). An analysis of changes in specialist inventories and quotations. J. Finance 48, 1595-1628.
Madhavan, A., M. Richardson and M. Roomans (1994). Why do security prices change? A transaction level analysis of NYSE stocks. Working Paper, Wharton School.
Manaster, S. and S. Mann (1992). Life in the pits: Competitive market making and inventory control. Working Paper, University of Utah.
Marsh, T. and K. Rock (1986). The transactions process and rational stock price dynamics. Working Paper, University of California at Berkeley.
Masulis, R. W. and V. K. Ng (1991). Stock return dynamics over intra-day trading and non-trading periods in the London stock market. Working Paper No. 91-33, Mitsui Life Financial Research Center, University of Michigan.
McInish, T. H. and R. A. Wood (1990). A transactions data analysis of the variability of common stock returns during 1980-1984. J. Banking Finance 14, 99-112.
McInish, T. H. and R. A. Wood (1991a). Hourly returns, volume, trade size, and number of trades. J. Financ. Res. 14, 303-15.
McInish, T. H. and R. A. Wood (1991b). Autocorrelation of daily index returns: Intraday-to-intraday vs. close-to-close intervals. J. Banking Finance 15, 193-206.
McInish, T. H. and R. A. Wood (1992). An analysis of intraday patterns in bid/ask spreads for NYSE stocks. J. Finance 47, 753-64.
Mech, T. (1993). Portfolio return autocorrelation. J. Financ. Econom. 34, 307-44.
Mendelson, H. (1982). Market behavior in a clearing house. Econometrica 50, 1505-24.
Merton, R. (1980). Estimating the expected rate of return. J. Financ. Econom. 8, 323-62.
Naik, N., A. Neuberger and S. Viswanathan (1994). Disclosure regulation in competitive dealership markets: Analysis of the London Stock Exchange. Working Paper, London Business School.
Neal, R. and S. Wheatley (1994). How reliable are adverse selection models of the bid-ask spread? Working Paper, Federal Reserve Bank of Kansas City.
Neuberger, A. J. and A. Roell (1991). Components of the bid-ask spread: A Glosten-Harris approach. Working Paper, London Business School.
Neuberger, A. J. (1992). An empirical examination of market maker profits on the London Stock Exchange. J. Financ. Serv. Res., 343-372.
Niederhoffer, V. and M. F. M. Osborne (1966). Market making and reversals on the stock exchange. J. Amer. Statist. Assoc. 61, 897-916.
Niederhoffer, V. (1965). Clustering of stock prices. Oper. Res. 13, 258-262.
Niederhoffer, V. (1966). A new look at clustering of stock prices. J. Business 39, 309-313.
O'Hara, M. and G. S. Oldfield (1986). The microeconomics of market making. J. Financ. Quant. Anal. 21, 361-76.
Oldfield, G. S. and R. J. Rogalski (1980). A theory of common stock returns over trading and nontrading periods. J. Finance 37, 857-870.
Parzen, E., ed. (1984). Time series analysis of irregularly observed data. Springer-Verlag, New York.
Petersen, M. and S. Umlauf (1991). An empirical examination of intraday quote revisions on the New York Stock Exchange. Working Paper, Graduate School of Business, University of Chicago.
Roll, R. (1984). A simple implicit measure of the effective bid-ask spread in an efficient market. J. Finance 39, 1127-1139.
Ronen, T. (1994). Essays in market microstructure: Variance ratios and trading structures. Unpub. Ph.D. Dissertation, New York University.
Samuelson, P. (1965). Proof that properly anticipated prices fluctuate randomly. Indust. Mgmt. Rev.

Sargent, T. J. (1987). Macroeconomic Theory. 2nd ed., Academic Press, Boston.
Scholes, M. and J. Williams (1977). Estimating betas from nonsynchronous data. J. Financ. Econom. 5, 309.
Schwartz, R. A. and N. Economides (1995). Making the trade: Equity trading practices and market structure. J. Port. Mgmt., to appear.

Schwartz, R. A. (1988). Equity markets: Structure, trading and performance. Harper and Row, New York.
Schwartz, R. A. (1991). Reshaping the equity markets. Harper Business, New York.
Schwartz, R. A. (1996). Electronic call market trading. Symposium Proceedings, Irwin Professional.
Seppi, D. J. (1990). Equilibrium block trading and asymmetric information. J. Finance 45, 73-94.
Seppi, D. J. (1992). Block trading and information revelation around quarterly earnings announcements. Rev. Financ. Stud. 5, 281-305.
Shanken, J. (1987). Nonsynchronous data and the covariance-factor structure of returns. J. Finance 42, 221-232.
Stock, J. (1988). Estimating continuous time processes subject to time deformation. J. Amer. Statist. Assoc. 83, 77-85.
Stock, J. H. and M. W. Watson (1988). Testing for common trends. J. Amer. Statist. Assoc. 83, 1097-1107.
Smith, T. (1994). Econometrics of financial models and market microstructure effects. J. Financ. Quant. Anal. 29, 519-540.
Stoll, H. R. (1978). The supply of dealer services in securities markets. J. Finance 33, 1133-1151.
Stoll, H. R. (1989). Inferring the components of the bid-ask spread: Theory and empirical tests. J. Finance 44, 115-34.
Tinic, S. (1972). The economics of liquidity services. Quart. J. Econom. 86, 79-93.
U.S. Securities and Exchange Commission (1971). Institutional Investor Study Report. Arno Press, New York.
Watson, M. W. (1986). Univariate detrending methods with stochastic trends. J. Monetary Econom. 18, 49-75.
Wood, R. A., T. H. McInish and J. K. Ord (1985). An investigation of transactions data for NYSE stocks. J. Finance 40, 723-39.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14 1996 Elsevier Science B.V. All rights reserved.


Statistical Methods in Tests of Portfolio Efficiency: A Synthesis*

Jay Shanken

This paper provides a review of statistical methods that have been used in testing the mean-variance efficiency of a portfolio, with or without a riskless asset. Topics considered include asymptotic properties of the two-pass methodology for estimating coefficients in the linear relation between expected returns and betas; the errors-in-variables problem in two-pass estimation; small-sample properties and economic interpretation of multivariate tests of expected return linearity in beta.

1. Introduction

The tradeoff between risk and expected return in the formation of an investment portfolio is a central focus of modern financial theory. In this review, we explore the ways in which statistical methods have been used to evaluate this tradeoff and test the "efficiency" of a portfolio. The emphasis is on methodology rather than empirical findings. Formally, a portfolio is characterized by a set of security or asset weights that sum to one. The return on the portfolio is the corresponding weighted average of security returns. Here, return refers to the change in price over the period plus any cash flow received (interest or dividends) at the end of the period, all divided by the beginning-of-period price. In a single-period context, if the rates of return on the available investments are jointly normally distributed, then a risk-averse (strictly concave utility function) investor will exhibit a preference for expected return and an aversion to variance of return.¹ In order to maximize expected utility, such an investor will combine securities in what is termed an efficient portfolio, i.e., a portfolio that (i) has the smallest possible variance of return given its expected return and (ii) has the largest possible expected return given its variance.

* Thanks to Dave Chapman, Aditya Kaul, Jonathan Lewellen, John Long, Ane Tamayo, and Guofu Zhou for helpful comments on earlier drafts.
1 See Chamberlain (1983) for more general conditions.

More generally, any portfolio that satisfies condition (i) is said to be a minimum-variance portfolio.² We now consider statistical methods for testing whether a given portfolio satisfies these conditions. Assume that a set of N risky securities and a portfolio p are given. The return on security i over period t is denoted R_it and the return on the portfolio is R_pt. The N + 1 returns are taken to be linearly independent. It is well known [Fama (1976), Roll (1977) and Ross (1977)] that p is a minimum-variance portfolio if and only if there is a constant, γ_0p, such that the vector of expected security returns, r_1, ..., r_N, is an exact linear function of the vector of security betas on R_p; i.e.,

  r_i = γ_0p + β_i(r_p - γ_0p),   i = 1, 2, ..., N,        (1.1)

where rp is the expected return on portfolio p and the betas are slope coefficients in the time-series regressions of (realized) security returns on the returns of p:

  R_it = α_i + β_i R_pt + ε_it   and   E(ε_it) = E(ε_it R_pt) = 0.        (1.2)

Moreover, a minimum-variance portfolio p is efficient if and only if the additional restriction, r_p > γ_0p, is satisfied, where the "zero-beta rate," γ_0p, is the expected return on any security (or portfolio) that has a beta of zero relative to p. Thus, in the efficient portfolio case, expected return is an increasing linear function of beta. The equivalence between the minimum-variance property and the expected return-beta relation arises from the fact that the beta coefficient determines the marginal contribution that a security makes to the total risk (variance) of portfolio p. This equivalence is of great import for the testing of portfolio efficiency since the hypothesis can be viewed as a restriction on the parameters in the multivariate linear regression system (1.2). Combining (1.1) and (1.2), we have the hypothesis

  H_01: α_i = γ_0p(1 - β_i),   i = 1, ..., N,        (1.3)

a joint restriction on the intercepts and slopes in the time-series regressions. This condition asserts the existence of a single number, γ_0p, for which the intercept-slope relation holds across the given N securities. If investors can borrow or lend at a known riskfree rate, r_f, and p is presumed efficient with respect to the set of all portfolios of both the risky securities and the riskless asset, then γ_0p = r_f.³ Otherwise, γ_0p is unknown and must be estimated. According to H_01, the ratio of alpha to one minus beta for any N - 1 securities is equal to the ratio for the remaining security. Thus, 2N parameters (the alphas and betas) are reduced to a set of just N + 1 parameters (the betas and γ_0p) under the N - 1 restrictions implicit in (1.3) [Gibbons (1982)].

2 It is convenient to exclude from this definition the global minimum variance portfolio, i.e., the portfolio with the lowest variance of return, regardless of expected return. Also, we assume below that at least two portfolios have distinct expected returns. 3 A negative position in the riskless asset amounts to borrowing, and the riskless rate is assumed to be the same for both borrowing and lending.

The restriction is nonlinear in a statistical sense when γ_0p is unknown, since γ_0p and β_i enter multiplicatively and both must be estimated.

2. Testing efficiency with a riskless asset


2.1. Univariate tests

Before going on to the general case, we focus on the much simpler scenario in which γ_0p is known and equal to the return on a riskless security. In this case, it is convenient to consider the excess-return version of the system (1.2); i.e., we now view R_it as the return on security i in excess of the riskless rate and r_i is the corresponding expected excess return.⁴ The excess zero-beta rate in (1.1) is then zero, and hence, by (1.3), so are the time-series regression intercepts in (1.2). Thus, the main hypothesis of interest is now

  H_02: α_i = 0,   i = 1, ..., N.        (2.1)

A test of this restriction on the excess-return regression model is a test that the given portfolio satisfies the minimum-variance property in the presence of a riskless asset. An early study by Black, Jensen, and Scholes (1972) examines the efficiency of an equal-weighted stock market index using monthly excess returns over the period 1931-65. The equal-weighted index is used as a proxy for the value-weighted market portfolio of all financial assets. The latter portfolio is predicted to be an efficient portfolio under the assumptions of the capital asset pricing model (CAPM) of Sharpe (1964) and Lintner (1965), a theory of financial market equilibrium. Black, Jensen, and Scholes report t-tests on the intercepts for a set of ten stock portfolios, with two of the ten significant at the 0.05 level (two-sided). The estimated intercepts are negative for the portfolios with relatively high estimated betas and positive for those with lower betas.
2.2. Multivariate tests

2.2.1. F-test on the intercepts

More recently, Gibbons, Ross, and Shanken (1989) apply a multivariate F-test of H_02 to the Black, Jensen, and Scholes data and fail to reject the joint hypothesis that the intercepts are all zero [see related work by Jobson and Korkie (1982, 1985) and MacKinlay (1987)]. Use of the F-test presumes that the disturbances in (1.2) are independent over time and jointly normally distributed, each period, with mean zero and nonsingular cross-sectional covariance matrix Σ, conditional on the vector of returns, R_p.

4 In this context, all probability statements can be viewed as conditional on the riskless rate series. In general, the total return and excess return time-series specifications need not be strictly consistent when the riskless rate varies over time.

Let T equal the length of the given time series of returns for the N assets and portfolio p. The F-statistic, with degrees of freedom N and T - N - 1, equals (T - N - 1)N⁻¹(T - 2)⁻¹ times the Hotelling T² statistic
  Q = T α̂' Σ̂⁻¹ α̂ / [1 + R̄_p²/s_p²],        (2.2)

where R̄_p and s_p are the sample mean and standard deviation of excess return for p; α̂ is the N-vector of OLS intercept estimates and Σ̂ is the unbiased estimate of Σ, computed from cross-products of OLS residuals divided by T - 2. The conditional covariance matrix of the alpha estimates, given R_p, equals the product of the denominator in (2.2), a function of R_p, and the residual covariance matrix, Σ, divided by T. Thus, the T² statistic is a quadratic form in the alphas, weighted by the inverse of the estimated covariance matrix of the alphas. When N = 1, Q is just the square of the usual univariate t-statistic on the intercept. More generally, it can be shown that Q is the maximum squared (univariate) t-statistic for alpha, where the maximum is taken over all portfolios of the N assets.⁵

Since Q has the same distribution unconditionally and conditional on R_p, the F-test does not require that R_p itself be normally distributed; the disturbances are assumed to be jointly normally distributed, however. Affleck-Graves and McDonald (1989) present simulation evidence indicating that the multivariate tests are robust to deviations from normality of the residuals, although MacKinlay and Richardson (1991) report a sensitivity to conditional heteroskedasticity. Zhou (1993) reaches similar conclusions.

Given our assumptions, the zero intercept restriction implies that expected excess returns for the N assets are proportional to the betas, both unconditionally and conditional on R_p. Extremely high or low returns for p, in a given sample period, tell us nothing about whether the intercepts are zero. Accordingly, the test statistic in (2.2) depends on the mean return of portfolio p only through its squared value, not its level. Portfolio efficiency entails the additional restriction that the ex ante mean excess return, r_p, exceeds zero, however, and this hypothesis can and should be evaluated separately through a simple t-test on the sample mean, R̄_p.⁶
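As an illustration of how the statistic in (2.2) is assembled, the sketch below computes Q and the associated F-statistic from hypothetical inputs: a T × N matrix of asset excess returns and the length-T vector of excess returns on the candidate portfolio p. It follows the formulas in the text under the i.i.d. normal assumptions; the variable names and the Python/scipy implementation are my own.

```python
# Sketch of the Hotelling T^2 / F-test in (2.2); `excess` and `rp` are hypothetical.
import numpy as np
from scipy import stats

def grs_test(excess, rp):
    T, N = excess.shape
    X = np.column_stack([np.ones(T), rp])               # regressors: constant, R_p
    coef, *_ = np.linalg.lstsq(X, excess, rcond=None)   # 2 x N OLS estimates
    alpha = coef[0]                                      # intercept estimates
    resid = excess - X @ coef
    Sigma = resid.T @ resid / (T - 2)                    # unbiased residual covariance
    rbar, sp = rp.mean(), rp.std(ddof=1)                 # sample mean, std. dev. of R_p
    Q = T * alpha @ np.linalg.solve(Sigma, alpha) / (1 + rbar**2 / sp**2)
    F = Q * (T - N - 1) / (N * (T - 2))                  # F with (N, T-N-1) d.f.
    return Q, F, stats.f.sf(F, N, T - N - 1)             # statistic, F, p-value
```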

2.2.2. Power and economic interpretation of the F-test


Gibbons, Ross, and Shanken (1989) provide an interesting economic interpretation of the F-statistic that requires some additional notation. Let SH(p) equal the ratio, r_p/σ_p, of expected excess return to standard deviation of return for portfolio p, and let sh(p) be the corresponding sample quantity.

5 See Gibbons, Ross, and Shanken, section 6, for a proof and an economic interpretation of this relation.
6 Since Q is independent of R_p under the null hypothesis of efficiency, the p-value for the joint hypothesis that the intercepts are zero and r_p > 0 (probability that at least one of the two statistics is in the relevant tail areas) equals the sum of the two p-values minus their product.

These reward/risk measures are referred to as Sharpe ratios. Using this terminology, an efficient portfolio can be characterized as one with the maximum possible Sharpe ratio, while a minimum-variance portfolio maximizes the squared (absolute) Sharpe ratio.⁷ If portfolios are plotted as points in a graph with expected excess return on the vertical axis and standard deviation of return on the horizontal axis, then the Sharpe ratio for p equals the slope of a ray through p emanating from the origin; in the case of a minimum-variance portfolio, the ray is tangent to the graph. Gibbons, Ross, and Shanken show that Q in (2.2) equals
  T[sh(*)² - sh(p)²] / [1 + sh(p)²],        (2.3)

where sh(*) is the sample Sharpe ratio with maximum squared value over all portfolios. Examining the numerator of (2.3), we see that, other things equal, the F-statistic is larger the lower is the squared Sharpe ratio for portfolio p in relation to the maximum squared sample ratio. Thus, the F-statistic is large when p is "far" from the ex post minimum-variance frontier. Of course, in any sample, there will be portfolios whose sample Sharpe ratios dominate p's, even if p is truly an ex ante minimum-variance portfolio. The F-test provides a basis for inferring whether the difference, sh(*)² - sh(p)², is within the range of random outcomes that would reasonably be anticipated under the null hypothesis. This assessment naturally depends on the precision of the alpha estimates. Given the assumptions above, Gibbons, Ross, and Shanken show, further, that the F-statistic is distributed, under the alternative, as noncentral F with noncentrality parameter

  λ = T[SH(*)² - SH(p)²] / [1 + sh(p)²].        (2.4)

Again, the distribution is conditioned on R_p, the independent variable in the time-series regressions, and depends on R_p through the ex post Sharpe ratio. In this context, sh(p) may be viewed as a constant, and hence the noncentrality parameter in (2.4) is just the (conditional) population counterpart of the sample statistic, Q, in (2.3). Under the null hypothesis that p is a minimum-variance portfolio, p attains the maximum squared ex ante ratio. In this case, λ equals zero and we have a central F distribution as earlier.

The power of the F-test is known to be an increasing function of the noncentrality parameter. Therefore, given sh(p), power is greater the further is the square of SH(p) from the maximum squared (population) ratio; i.e., the greater is the deviation from ex ante efficiency in this metric. Holding the ex ante deviation constant, power decreases as the square of sh(p) increases, reflecting the lower (conditional) precision with which the intercepts are estimated when this sample quantity is high.

7 See Merton (1973a) and Litzenberger and Huang (1988).

In order to implement the F-test, the residual covariance matrix, Σ, must be invertible, which requires that N be at most equal to T - 2. Analysis in Gibbons, Ross, and Shanken (1989) suggests that much smaller values of N should be used in order to maximize power, however. This is related to the fact that the number of covariances that must be estimated increases rapidly with the number of assets. Although increasing N can increase the noncentrality parameter in (2.4), by increasing the maximum Sharpe measure, apparently this benefit is eventually offset by the additional noise in estimating Σ and its inverse.

Given the thousands of stocks available for analysis and the requirement that N be (much) less than T, some procedure is needed to reduce the number of assets. Although subsets of stocks could be used, the test is more commonly applied to portfolios of stocks. This has the advantage, for a given N, of reducing the residual variances, thereby increasing the precision with which the alphas are estimated.⁸ On the other hand, as Roll (1979) has noted, individual stock expected return deviations can cancel out in portfolios, which would reduce power. The expected power of the test thus depends on the researcher's prior beliefs as to the likely sources of portfolio inefficiency.⁹
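A rough power calculation implied by (2.4) can be sketched as follows, treating sh(p) as fixed at its population value SH(p) and holding the maximum Sharpe ratio constant as N varies; both simplifications, like the illustrative monthly Sharpe ratios below, are assumptions of the example and not part of the text.

```python
# Approximate power of the F-test via the noncentral F distribution and (2.4).
from scipy import stats

def grs_power(T, N, sh_star, sh_p, size=0.05):
    lam = T * (sh_star**2 - sh_p**2) / (1 + sh_p**2)     # noncentrality parameter (2.4)
    crit = stats.f.ppf(1 - size, N, T - N - 1)           # central-F critical value
    return stats.ncf.sf(crit, N, T - N - 1, lam)         # rejection probability

# Illustrative numbers: T = 120 months, maximum Sharpe ratio 0.3, sh(p) = 0.2.
for N in (5, 10, 20, 40):
    print(N, round(grs_power(T=120, N=N, sh_star=0.3, sh_p=0.2), 3))
```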
2.3. Other tests

The likelihood ratio test (LRT) and the Lagrange multiplier (Rao's score) test statistics are both monotonic transformations of the T² statistic (modified Wald test) in (2.2) and thus need not be considered separately from the F-test.¹⁰ In particular,
  LRT = T ln[1 + Q/(T - 2)].        (2.5)
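The monotonic relation (2.5) is trivial to evaluate; the values of T and Q below are illustrative (Q would come from the F-test sketch above).

```python
# LRT statistic (2.5) as a monotonic transformation of Q (illustrative values).
import numpy as np
T, Q = 120, 18.5
print(T * np.log(1 + Q / (T - 2)))
```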

Lo and MacKinlay (1990) have emphasized that the use of portfolio grouping in multivariate tests, together with the exploration of a wide variety of potentially relevant firm ranking variables, can lead to substantial "data-snooping" biases; i.e., the appearance of statistical significance even when the null hypothesis of efficiency is true. An alternative diagonal version of the multivariate test, suggested by Affleck-Graves and McDonald (1990), is interesting in this regard since it does not require grouping. As such, it also avoids Roll's concerns about the use of portfolio-based tests. The diagonal test appears to have desirable power characteristics in simulations, but the distribution of the test statistic is unknown.

8 There are additional motivations for the use of portfolios. Some stocks come and go over time, and using portfolios allows one to use longer time series than would otherwise be possible. Also, portfolios formed by periodically ranking on some economic characteristic may have fairly constant betas even though individual security betas change over time. Note that the composition of each portfolio changes over time in this context.
9 See the related analysis of power issues in MacKinlay (1995).
10 See related work by Evans and Savin (1982).

It would be helpful to have some sort of approximate distribution theory for this approach. In the remainder of this section, we consider several different variations on the multivariate framework: joint confidence intervals, tests of approximate efficiency, Bayesian approaches to testing efficiency, and tests of conditional efficiency.
2.3.1. Joint confidence intervals

In some contexts, one is interested in the mean-variance efficiency of an index primarily for the purpose of obtaining (statistically) efficient estimates of asset expected returns, via the linear relation (1.1). For example, in capital budgeting applications, the required discount rates for a set of projects might equal the expected returns (adjusted for financial leverage) of some industry portfolios. Here, the magnitude of deviations from the expected return relation is important. Shanken (1990, p. 110) suggests examining joint confidence intervals for the alphas, in such a case, since the p-value for the F-test is not very informative in this regard.

The simultaneous confidence interval approach exploits the fact, noted earlier, that the T² statistic in (2.2) equals the maximum squared univariate t-statistic for the alphas, where the maximum is taken over all possible portfolios of the given assets.¹¹ The intervals consist of alphas within k sample standard errors of the OLS estimates, where the constant k is the relevant fractile of the T² distribution or, equivalently, N(T - 2)(T - N - 1)⁻¹ times the fractile of an F distribution with degrees of freedom N and T - N - 1. Alternatively, the Bonferroni approach may be used to obtain (conservative) joint confidence intervals for the N alphas. In this case, one divides the designated error probability by N and then computes conventional confidence intervals based on a t distribution with T - 2 degrees of freedom.

2.3.2. Tests of approximate efficiency

In a portfolio investment context, one may not be interested in the expected returns alone, but rather in the extent to which a given portfolio deviates from efficiency. This, recall, is reflected in the noncentrality parameter λ in (2.4), which depends on both the alphas and the residual covariance matrix, Σ. Kandel and Stambaugh (1987) and Shanken (1987b) utilize the multivariate framework to formulate tests of approximate efficiency. This enables the researcher to test for "economically significant" departures from mean-variance efficiency. It is also of interest in testing positive theories like the CAPM, mentioned earlier. Roll (1977) emphasizes that inferences about the efficiency of a stock index proxy do not tell us whether the true market portfolio is efficient, as required by the asset pricing theory.

11 See Morrison (1976), Chapter 4, for a discussion of joint confidence intervals. Asymptotic versions of these methods [e.g., Shanken (1990)] based on chi-square or normal distributions follow the same logic.

Kandel and Stambaugh and Shanken show, however, that efficiency of the true market portfolio, along with an a priori belief about the correlation between the proxy and the market, can be used to bound the extent to which the proxy is inefficient. If the bound is violated, efficiency of the true market portfolio is rejected. For example, Shanken rejects efficiency of the true market portfolio, over the period 1953-83, assuming the correlation with an equal-weighted stock index proxy exceeds 0.7. This tempers the concerns about testability raised by Roll somewhat, as he also conjectured that most reasonable proxies would be fairly highly correlated with the true market portfolio, whether the latter is efficient or not.

2.3.3. Bayesian tests of efficiency

Making use of the fact that the distribution of the test statistic for the minimum-variance property is known under both the null and the alternative, given normality, Shanken (1987a) explores a Bayesian approach to testing portfolio efficiency. Harvey and Zhou (1990) and Kandel, McCulloch, and Stambaugh (1995) extend this analysis by considering prior distributions formulated over the entire parameter space of the multivariate regression model.¹² The relation (2.4) is important in this context, as it facilitates an assessment of the economic significance of deviations from the null hypothesis and the related formulation of meaningful priors on the unknown parameters.

2.3.4. Tests of conditional efficiency

We have assumed, thus far, that asset betas are constant over time. However, if we condition on variables characterizing different states of the economy, betas may well vary. The regression framework is easily extended to accommodate changes in the betas if one is willing to specify the relevant state variables, say interest rates, and postulate some functional relation to beta. For example, suppose there is a single, stationary, mean-zero state variable, z_{t-1}, known at the beginning of period t, and the conditional beta is

  β_{i,t-1} = β̄_i + c_i z_{t-1}.        (2.6)

Here, β̄_i is the long-run average beta for security i and c_i indicates the sensitivity of i's conditional beta to variation in the state variable. Substituting β_{i,t-1} for β_i in (1.2) and assuming ε_it has zero mean conditional on both z_{t-1} and R_pt,

  R_it = α_i + β̄_i R_pt + c_i(z_{t-1} R_pt) + ε_it        (2.7)

is an expanded regression equation from which the parameters of interest may be estimated and the zero-intercept restriction tested.

12 Also see related work by McCulloch and Rossi (1990).

This approach to efficiency tests is developed in Campbell (1985) and Shanken (1990) in the context of an intertemporal CAPM [Merton (1973b)].¹³ In addition to time-varying betas, the expected return or risk of portfolio p may change over time. This does not pose a problem, though, since the regression analysis is conditioned on the returns for p, as noted earlier. An F-test of the joint zero-intercept restriction is still appropriate if the disturbances in (2.7) have constant variance (over time) conditional on both R_pt and z_{t-1}. Shanken (1990) finds strong evidence of conditional residual heteroskedasticity, however, and employs an asymptotic chi-square test based on the heteroskedasticity-consistent covariance matrix of the intercept estimates [White (1984)]. This approach is also adopted by MacKinlay and Richardson (1991), in exploring the impact of residual heteroskedasticity conditional on the contemporaneous realization of R_p.
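A rough sketch of the conditional-beta regression (2.7), with a heteroskedasticity-consistent (White-type) chi-square statistic for the joint zero-intercept restriction, is given below. The inputs `excess` (T × N), `rp` (T,) and `z` (T,), with z[t] known at the start of period t, are hypothetical, and the particular sandwich construction is only one way to form such a statistic; it is in the spirit of, but not taken from, Shanken (1990) or MacKinlay and Richardson (1991).

```python
# Sketch: regression (2.7) with a White-type test of the zero-intercept restriction.
import numpy as np

def conditional_alpha_test(excess, rp, z):
    T, N = excess.shape
    X = np.column_stack([np.ones(T), rp, z * rp])     # constant, R_p, z_{t-1} R_p
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ excess                     # 3 x N coefficient estimates
    resid = excess - X @ coef
    alphas = coef[0]
    h = X @ XtX_inv[:, 0]                             # alpha_i = sum_t h_t * R_it
    hr = resid * h[:, None]
    V = hr.T @ hr                                     # White covariance of the alphas
    chi2 = alphas @ np.linalg.solve(V, alphas)        # asymptotically chi-square, N d.f.
    return alphas, chi2
```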

3. Testing efficiency without a riskless asset

Since U.S. Treasury bills are only nominally riskless, the assumption that there is a riskless asset may not be appropriate if one is concerned with the efficiency of a portfolio in real (inflation-adjusted) terms. Even in the nominal case, if there are restrictions on borrowing [Black (1972)], or an investor's riskless borrowing rate exceeds the T-bill rate [Brennan (1971)], then the zero-beta rate for an efficient portfolio can be greater than the T-bill rate and must be estimated. In this section, therefore, we treat γ_0p as an unknown parameter and consider tests of the nonlinear restriction (1.3). The regression variables in (1.2) can now be viewed as either total returns or excess returns; in the latter case, γ_0p is the excess zero-beta rate.
3.1. Traditional two-pass estimation techniques

Given the "bilinear" nature [Brown and Weinstein (1983)] of the relation (1.3), an intuitively appealing approach to estimation entails first estimating the alphas and betas from the time-series regressions (1.2), for each security, and then running a cross-sectional regression of the N alpha estimates on one minus the N beta estimates (no constant) in order to estimate γ_0p. This is effectively the approach adopted by Black, Jensen, and Scholes (1972) [see related discussion in Blume and Friend (1973)]. Another approach, essentially that of Fama and MacBeth (1973), is to regress the cross-section of mean security returns on the betas and a constant.¹⁴ The intercept in this cross-sectional regression (CSR) is taken as the estimate of γ_0p and the slope coefficient on beta is an estimate of γ_1p = r_p - γ_0p.¹⁵

13 Also see related work by Ferson, Kandel, and Stambaugh (1987) and Harvey (1989).
14 There are many variations on this approach. Here, we assume that each asset beta is estimated from a single time-series regression over the entire period. See Jensen (1972) for a review of the early development of the literature.

We focus primarily on the Fama-MacBeth version of the "two-pass" methodology in the remainder of this review, as it is the approach used most often in the literature.¹⁶

It is well known that security returns are cross-sectionally correlated, due to common market and industry factors, and also heteroskedastic. For example, small-firm returns tend to be more volatile than large-firm returns. As a result, the usual formulas for standard errors, based on a scalar covariance matrix assumption, are not appropriate for the OLS CSR's run by Black, Jensen, and Scholes and Fama and MacBeth. Recognizing this problem, Fama and MacBeth run CSR's each month, generating time series of estimates for both γ_0p and γ_1p. Means, standard errors, and "t-statistics" are then computed from these time series and inference proceeds in the usual manner, as if the time series are independently and identically distributed. Since the true variance of each monthly estimator depends on the covariance matrix of returns, cross-sectional correlation and heteroskedasticity are reflected in the time series of monthly estimates.

However, given the fact that the same beta estimates are used in each monthly cross-sectional regression, the monthly gamma estimates are not serially independent. This dependence is ignored by the traditional two-pass procedure. The fact that there is an error component common to each of the monthly cross-sectional regressions, due to beta estimation error, makes the small-sample distribution of the mean gamma estimator difficult to evaluate. This is a form of the "generated regressor" problem [Pagan (1983)], as it is sometimes called in the econometrics literature. While consistency (as T → ∞) of the beta estimates implies consistency of the gamma estimates, the "Fama-MacBeth standard errors" computed from the time series of CSR estimates are generally inconsistent estimates of the asymptotic standard errors [Shanken (1983, 1992)].

Let X be the N × 2 matrix [1_N, β] of ones and betas and X̂ the corresponding matrix, [1_N, β̂], with estimated betas. Let R_t be the N-vector of security returns for period t and R̄ the N-vector of sample mean returns. In this notation, equation (1.1) implies
  R_t = XΓ + error = X̂Γ + [error - γ_1p(β̂ - β)],        (3.1)

where Γ = (γ_0p, γ_1p)' and "error" is the unexpected component of return. If A = (X'X)⁻¹X' and Â is the corresponding estimator (with X̂ in place of X), then the second-pass estimator of the gammas is Γ̂ = (γ̂_0p, γ̂_1p)' = ÂR̄, the mean of the monthly estimators, Γ̂_t = ÂR_t.

15 Although γ_0p and γ_1p are treated as separate parameters, the constraint that γ_1p = r_p - γ_0p is implicitly imposed if p is an equal-weighted portfolio of the N assets used in an OLS CSR. The Fama-MacBeth approach can also be used in asset pricing tests where the "factor" is, say, a macroeconomic variable, rather than a portfolio return [e.g., Chen, Roll, and Ross (1986) and Shanken and Weinstein (1990)], and the constraint on the gammas is no longer appropriate.
16 The various results summarized here all have straightforward extensions to the Black, Jensen, and Scholes specification. See Shanken (1992).

Since the gamma estimates are linear combinations of asset returns, they have an intuitively appealing portfolio interpretation [Fama (1976, Chapter 9)]. Note that AX is a 2 × 2 identity matrix. Focusing on the first row of A, we see that the estimate of γ_0p is the sample mean return on a standard (weights sum to one) portfolio with a beta (weighted-average asset beta) of zero. Similarly, the estimate of the risk premium γ_1p is the mean return on a zero-investment portfolio (weights sum to zero) with a beta of one - properties shared by the mean excess return for p in the riskless asset case.

Using (3.1), Shanken (1992) shows that the sample covariance matrix of the Γ̂_t's, used in computing Fama-MacBeth standard errors, converges to AΣA' + M, where M is a 2 × 2 matrix with σ_p² in the lower right corner and zeroes elsewhere.¹⁷ The first term, AΣA', arises from the return residuals in (1.2); the diagonal elements capture the residual variation in the portfolio estimators. The second term, M, accounts for "systematic" variation related to R_p and reflects the fact that the estimates of γ_0p and γ_1p are returns on portfolios with betas of zero and one, respectively. It follows that the variance of the mean excess return for p is a lower bound on the variance of γ̂_1p.

As noted earlier, the traditional method of computing standard errors for the gamma estimates ignores beta estimation error. When this measurement error is recognized, the asymptotic covariance matrix of Γ̂, i.e., the covariance matrix of the limiting multivariate normal distribution of √T(Γ̂ - Γ), is:¹⁸

  (1 + γ_1p²/σ_p²) AΣA' + M.        (3.2)

The additional term in (3.2) arises from the fact that (i) the asymptotic covariance matrix for β̂ is Σ/σ_p² and (ii) the impact of measurement error in β̂ on the CSR disturbance is, by (3.1), proportional to γ_1p. Thus, the traditional standard errors are too low, except for the case in which measurement error in beta is irrelevant, i.e., under the null hypothesis that γ_1p equals zero.¹⁹ Asymptotic confidence intervals for the gammas always require the use of adjusted standard errors. Asymptotically valid standard errors are easily obtained from (3.2) by substituting consistent estimates for the various parameters. For γ_0p, this amounts to multiplying the Fama-MacBeth variance by the errors-in-variables adjustment term, (1 + γ̂_1p²/s_p²). For γ_1p, s_p² is subtracted from the Fama-MacBeth variance before multiplying by the adjustment term and is then added back.

17 This follows from the fact that the covariance matrix of R_t is Σ + ββ'σ_p², and that Aβ is the second column of the identity matrix AX.
18 Gibbons (1980) independently derives the asymptotic distribution for the Black, Jensen, and Scholes estimator, a special case of Shanken (1992).
19 In the "multifactor" context, the adjustment term is a quadratic form in the vector of factor risk premia with weighting matrix equal to the inverse of the factor covariance matrix. Now, an asymptotic "t-statistic" for the null hypothesis that a given factor's risk premium is zero always requires that the adjustment term be incorporated, since the other factor premia need not be zero under the null.
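As a rough illustration of this adjustment, the sketch below scales the Fama-MacBeth variances as just described; it reuses the hypothetical gammas and factor arrays from the earlier sketch and is only one reading of (3.2), not code from Shanken (1992). Note that the Fama-MacBeth variance of the mean divides by T, so the portfolio-p variance term is removed and restored on that same scale.

import numpy as np

def shanken_adjusted_se(gammas, factor):
    """Errors-in-variables-adjusted standard errors for two-pass gamma estimates,
    following the adjustment implied by (3.2); gammas is T x 2, factor is length T."""
    T = gammas.shape[0]
    g0, g1 = gammas.mean(axis=0)
    fm_var = gammas.var(axis=0, ddof=1) / T         # traditional FM variances of the means
    s_p2 = factor.var(ddof=1)                       # sample variance of portfolio p's return
    c = 1.0 + g1**2 / s_p2                          # errors-in-variables adjustment term
    var0 = fm_var[0] * c                            # gamma_0: multiply by the adjustment
    var1 = (fm_var[1] - s_p2 / T) * c + s_p2 / T    # gamma_1: remove s_p^2/T, scale, add back
    return np.sqrt([var0, var1])

# Example, reusing the arrays from the previous sketch:
# adj_se = shanken_adjusted_se(gammas, factor)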


3.2. Tests of linearity against a specific alternative


The estimation results above are relevant for testing whether γ_1p > 0, a necessary condition for p to be an efficient portfolio. The analysis assumes linearity of the expected return relation, however, and this must be tested separately. The simplest approach is to include other independent variables along with beta in the CSR and test whether the coefficients on the additional variables differ from zero. If so, then beta is not the sole determinant of cross-sectional variation in expected returns and efficiency is rejected. This is the approach taken by Fama and MacBeth (1973), who use beta-squared and residual variance as additional variables. Their evidence supports linearity in beta with a positive risk premium. Consistent with the results of Black, Jensen, and Scholes (1972), they also find that γ_0 is significantly greater than the T-bill rate while γ_1 is less than the mean excess market index return.

Supposing, for simplicity, that the additional cross-sectional variables are constant over time and measured without error, the asymptotic analysis above is easily modified. The additional variables are included in the X matrix and a row and column of zeroes are added to the matrix M for each extra variable. The asymptotic covariance matrix of the expanded gamma estimator is then given by (3.2). Note that measurement error in the betas affects the standard errors of the additional coefficients, even though the associated independent variables are measured without error. Moreover, the adjustment term, 1 + γ_1p²/σ_p², must always be included in testing linearity, as γ_1p need not be zero under the linearity hypothesis.

In contrast to the multivariate approach, the coefficient-based test of this section requires that the researcher formulate a specific alternative hypothesis to linearity. This can be an advantage if the null hypothesis is rejected, as the test provides concrete information concerning the deviations from linearity. The downside is that the test will have limited power, or none at all, against other potentially relevant alternatives. In addition, there is the inherent invitation to data mining, i.e., the tendency of researchers to explore various alternatives and to publish the results of experiments which, nominally, indicate statistical significance, while discarding the "unsuccessful" experiments.

The multivariate approach to testing has the potential to reject any deviation from expected return linearity, with power converging to one as T→∞. The general nature of this "goodness-of-fit" approach is not without its downside, however, as it is likely to be less powerful against some alternatives than a more focused test. As discussed earlier, it also has its own data-mining problems.
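The following sketch shows one way to code the coefficient-based linearity test described above, augmenting each monthly CSR with beta-squared and a residual-risk measure. The inputs (returns, beta_hat, resid_sd, factor) are the illustrative simulated objects from the earlier sketches, and the t-statistics use the adjusted variances discussed in the previous subsection.

import numpy as np

def linearity_test(returns, beta_hat, resid_sd, factor):
    """Monthly CSRs of returns on [1, beta, beta^2, residual s.d.]; the test asks
    whether the coefficients on the added variables differ from zero (a sketch)."""
    T, N = returns.shape
    X = np.column_stack([np.ones(N), beta_hat, beta_hat**2, resid_sd])
    A = np.linalg.pinv(X)                      # (X'X)^{-1} X'
    g = returns @ A.T                          # T x 4 monthly coefficient estimates
    g_bar = g.mean(axis=0)
    fm_var = g.var(axis=0, ddof=1) / T
    adj = 1.0 + g_bar[1]**2 / factor.var(ddof=1)   # EIV adjustment, needed even under linearity
    t_extra = g_bar[2:] / np.sqrt(fm_var[2:] * adj)
    return g_bar, t_extra

# resid_sd could be the residual standard deviations from the first-pass regressions.
# Large t-statistics on the added variables indicate a rejection of linearity, and hence of efficiency.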

3.3. Maximum likelihood and modified regression estimation


Gibbons (1982) proposes that classical maximum likelihood estimation (MLE) be used to estimate the betas and gammas in (1.3) simultaneously. Since MLE is asymptotically efficient (as T→∞), it is of interest to compare the efficiency of two-pass estimation to that of MLE. The asymptotic analysis of the OLS second-pass


estimator, considered above, easily generalizes to weighted-least-squares (WLS) or generalized-least-squares (GLS) versions of the estimator based on sample estimates of the variances and covariances.²⁰ One merely redefines the matrix A. It turns out that the asymptotic covariance matrix of the second-pass GLS estimator is the same as that for MLE and hence GLS is asymptotically efficient.²¹ In fact, the second-pass GLS estimator of Γ is identical to a one-step Gauss-Newton (linearization) procedure that Gibbons uses to simplify the computations. A straightforward computational procedure for exact MLE was subsequently developed in Kandel (1984) and extended in Shanken (1992).

Although two-pass estimation is consistent, as T→∞, it suffers from an errors-in-variables problem since β̂, the independent variable in the cross-sectional relation, is measured with error. Thus, the slope (risk premium) estimator is biased toward zero and the bias is not eliminated asymptotically by increasing the number of securities; i.e., the estimator is not N-consistent.²² Recognizing this, the early studies group securities into portfolios in order to reduce the variance of the error in estimating betas. Concerned about possible reductions in efficiency, elaborate techniques are used to ensure that a substantial spread in portfolio betas is maintained. Assuming the residual covariance matrix is (approximately) diagonal, Black, Jensen, and Scholes (1972) show that the resulting estimator is N-consistent.

In proposing MLE, Gibbons (1982) conjectures that simultaneous estimation of betas and gammas should provide a solution to the errors-in-variables problem. However, simulation evidence in Amsler and Schmidt (1985) indicates that the GLS CSR (they call it "Newton-Raphson") estimator outperforms MLE in terms of mean-square error; GLS is biased upward while MLE is biased downward. Some support for Gibbons' conjecture is provided in Shanken (1992), however, in that a version of MLE with the residual covariance matrix constrained to be diagonal is shown to be N-consistent. Thus, the benefits of MLE may only be realized with a large number of assets.

20 In fact, the same estimator is obtained whether the residual covariance matrix or the (total) covariance matrix of returns is used. This was first noted by Litzenberger and Ramaswamy (1979) for WLS.
21 This is true despite the fact that the OLS estimator of β, used in the CSR, is inefficient. Also, we assume that the constraint, γ_1p = r_p − γ_0p, is imposed when appropriate.
22 More formally, it does not converge to the sample mean return on p minus the zero-beta rate, the "ex post price of risk."

Although simultaneous estimation of betas and gammas is one path to N-consistency, a modified version of the second-pass estimator is also N-consistent [Litzenberger and Ramaswamy (1979) and Shanken (1992)]. The modified estimator is based on the observation that inconsistency of the second-pass estimator is driven by systematic bias in the lower right element of the X̂'X̂ matrix. Conditioning on the time series of returns for portfolio p, we have:

E(β̂'β̂) = β'β + tr(Σ)/(T s_p²),        (3.3)

where tr(·) is the sum of the diagonal elements of a matrix. Subtracting off tr(Σ̂)/(T s_p²) from the lower right element of X̂'X̂, therefore, yields an N-consistent estimator of Γ, provided the residual covariance matrix, Σ, is (approximately) diagonal.²³ The asymptotic distribution of the estimator, as T→∞, is unaffected by this modification.²⁴

Recall, from classical errors-in-variables analysis, that the slope estimator (γ̂_1) is attenuated toward zero by a factor equal to the variance of the true independent variable (β) divided by the variance of the proxy variable (β̂). This attenuation factor is less than one, since the latter variance equals the sum of the true variance and the measurement error variance. It is easily verified that the slope component of the modified estimator, described above, equals the regression slope estimator divided by an estimate of the attenuation factor.²⁵

The results for MLE and modified CSR estimation suggest that the traditional use of portfolio grouping techniques to address the errors-in-variables problem may be unnecessary. An interesting issue that has not been adequately explored, however, concerns the relative efficiency of (modified) OLS or WLS estimation with a very large set of securities and MLE or GLS estimation with a more modest number of portfolios and a full covariance matrix.
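The modified second-pass estimator translates directly into code. The sketch below (illustrative names again; resid_var would hold the first-pass residual variances, and a diagonal residual covariance matrix is assumed, as in the text) subtracts tr(Σ̂)/(T s_p²) from the lower-right element of X̂'X̂ before solving for the gammas; a WLS or GLS variant would simply redefine the weighting of the cross-sectional regression.

import numpy as np

def modified_csr(mean_returns, beta_hat, resid_var, factor):
    """N-consistent modified OLS CSR estimator: correct the lower-right element of
    X'X for beta measurement error, as in (3.3) (a sketch, not Shanken's own code)."""
    T = factor.shape[0]
    N = mean_returns.shape[0]
    X = np.column_stack([np.ones(N), beta_hat])
    XtX = X.T @ X
    s_p2 = factor.var(ddof=1)
    XtX[1, 1] -= resid_var.sum() / (T * s_p2)    # subtract tr(Sigma_hat)/(T s_p^2)
    gamma = np.linalg.solve(XtX, X.T @ mean_returns)
    return gamma                                 # (gamma_0, gamma_1), slope de-attenuated

# mean_returns is the N-vector of sample mean returns; resid_var holds the N
# first-pass residual variances.

Its slope component equals the ordinary CSR slope divided by the implied estimate of the attenuation factor, consistent with the errors-in-variables interpretation above.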

3.4. Multivariate tests

3.4.1. Likelihood ratio and CSRT tests


The first step toward a multivariate test of linearity is taken by MacBeth (1975), who uses a variation on Hotelling's T² test to evaluate whether the residuals from Fama-MacBeth CSR's systematically deviate from zero. The test does not fully take into account all of the existing parameter uncertainty, however. Gibbons (1982) formulates a likelihood ratio test (LRT) of the nonlinear restriction (1.3) under the assumption of temporally independent and identically jointly normally distributed returns. Inference is then based on the usual asymptotic chi-square distribution. Unlike MacBeth's approach, the LRT accounts, at least asymptotically, for all relevant parameter uncertainty. As we shall see, though, the asymptotic test suffers from serious small-sample problems.

23 Unfortunately, this can result in a negative diagonal element in finite samples.
24 WLS and GLS versions of the modified CSR estimator have also been derived, and additional variables measured without error can be included as in Section 3.2. See the references cited earlier. Kim (1995) develops an MLE procedure that accommodates the use of betas estimated from prior data. The modified regression approach can also be applied using prior betas. In this case, T, s_p, and the residual variance estimates substituted in (3.3) come from the time-series regressions used to estimate the betas.
25 Banz (1981) considers errors-in-variables biases in the gammas when additional variables like firm size are considered along with beta in cross-sectional regressions. The coefficient on beta is still biased toward zero, while the "size effect" is overstated.


The connection between the LRT and the multivariate T² test is explored in Shanken (1985). He shows that the relation (2.5) continues to hold for this model with the following expression substituted for Q:

Q_MLE ≡ T ê'Σ̂⁻¹ê / (1 + γ̂_1MLE²/s_p²),        (3.4)

where ê ≡ R̄ − X̂Γ̂_MLE, Σ̂ is the unbiased sample estimate of the residual covariance matrix, s_p² is the sample variance of return for portfolio p, and Γ̂_MLE = (γ̂_0MLE, γ̂_1MLE)' is the MLE for Γ. Shanken refers to the corresponding test based on the GLS CSR estimate of Γ as the CSR test (CSRT).²⁶

3.4.2. Small-sample inference


The test statistic in (3.4) is a direct generalization of Q in (2.2) for the riskless asset case, as the residual vector in (2.2) is obtained from ê by substituting the riskless rate and portfolio p's mean excess return for γ̂_0MLE and γ̂_1MLE, respectively.²⁷ In other words, Q in (2.2) is just a constrained version of Q_MLE in (3.4). This parallel suggests that the T² distribution might be useful in approximating the small-sample distributions of the LRT and the CSRT.²⁸ By this logic, (T−N+1)(N−2)⁻¹(T−2)⁻¹ Q_MLE (and the corresponding CSRT statistic) should be approximately distributed as F with degrees of freedom N−2 and T−N+1. Here, N−2 replaces N from the riskless asset case, since two additional cross-sectional parameters, γ_0p and γ_1p, are now estimated.

Shanken (1985) shows, further, that ignoring estimation error in the betas and omitting the errors-in-variables adjustment term (denominator of (3.4)) in computing the CSRT "F-statistic" yields a lower bound on the exact p-value for the test. On the other hand, ignoring estimation error in Γ̂_MLE and treating the gammas as if they were known yields an upper bound on the true p-value. In this case, the "F-statistic" is computed as in Section 2.2.1 with degrees of freedom N and T−N−1 [Shanken (1986)]. Zhou (1991) derives the exact distribution of the LRT and finds that it depends on a nuisance parameter that must be estimated. Optimal bounds that do not depend on the unknown parameter are also provided.

Inferences based on small-sample analysis of the multivariate test differ dramatically from those based on the asymptotic chi-square distribution.

26 See Kandel (1984) and Roll (1985) for geometric perspectives on the LRT and CSRT, respectively.
27 This follows from the usual relation between the (time-series) regression estimates and the means of the regression variables.
28 This observation is made with the benefit of hindsight. In fact, most of the work on the multivariate statistical model with γ_0p unknown was done before the riskless asset case was analyzed in depth.


For example, whereas Gibbons (1982) obtains an asymptotic p-value less than 0.001 in testing the efficiency of a stock index, Shanken (1985) reports that a small-sample lower bound on the true p-value is 0.75. This difference is driven by the fact that error in estimating the residual covariance matrix is not reflected in the limiting chi-square distribution. The estimate of the inverse of the residual covariance matrix is quite noisy in small samples and severely biased upward when the number of assets, N, is large relative to the time-series length, T.²⁹ In Gibbons' case, the test was applied over subperiods with N = 40 and T = 60. Jobson and Korkie (1982) reach a similar conclusion about Gibbons' test using a Bartlett correction factor [also see Stambaugh (1982)]. Amsler and Schmidt (1985) find that this correction and Shanken's CSRT both perform quite well in simulations under joint normality.
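To illustrate these small-sample calculations, the following sketch converts a given value of the statistic in (3.4) into the approximate F-based p-value using the degrees of freedom stated above. It assumes scipy is available for the F distribution, and the numerical inputs are placeholders rather than the actual subperiod statistics from Gibbons (1982); the p-value bounds described in the text would use the same routine with modified statistics and degrees of freedom N and T−N−1.

import numpy as np
from scipy.stats import f

def csrt_approx_pvalue(Q, T, N):
    """Approximate small-sample p-value for the CSRT/LRT statistic Q of (3.4),
    using the F approximation (T-N+1)(N-2)^{-1}(T-2)^{-1} Q ~ F(N-2, T-N+1)."""
    F_stat = (T - N + 1) * Q / ((N - 2) * (T - 2))
    return f.sf(F_stat, N - 2, T - N + 1)

# Illustrative values only (not Gibbons' subperiod statistics):
print(csrt_approx_pvalue(Q=45.0, T=60, N=40))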

4. Related work

Given a subset of a larger set of assets, it is natural to ask whether some portfolio of the assets in the subset is a minimum-variance portfolio with respect to the larger set. The minimum-variance problem considered in this review is a special case in which the subset consists of a single portfolio. Most of the results discussed here have straightforward generalizations to the multiple-portfolio or "multifactor" case.

A related question is whether a given subset of risky assets actually spans the entire minimum-variance frontier of the larger set. This is a stronger restriction than that considered above, which Huberman and Kandel (1987) refer to as "intersection." They show that the spanning condition amounts to a joint restriction that the intercepts equal zero and the betas for each asset sum to one in the multifactor version of (1.2). This is tested using a small-sample F-statistic.

There is also a literature that treats the efficient portfolio as an unobserved "latent variable." A time-series model of conditional expectations is postulated and used to derive testable cross-sectional restrictions on the joint distribution of observed security returns. See Gibbons and Ferson (1985) and Hansen and Hodrick (1983) for early examples of latent variable models. A recent paper by Zhou (1994) provides analytical generalized method of moments tests for latent variable models, permitting applications with many more assets than was previously computationally feasible.
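The Huberman and Kandel spanning restriction can also be illustrated with a few lines of code: regress each test asset's return on the benchmark returns and examine whether the intercept is zero and the slopes sum to one. The sketch below uses simulated data and an asset-by-asset check of the restriction only; it is not the joint small-sample F-statistic of Huberman and Kandel (1987).

import numpy as np

def spanning_restrictions(test_returns, bench_returns):
    """For each test asset, OLS of its return on [1, benchmark returns] and the
    values of (intercept, sum of slopes - 1); spanning requires both to be zero."""
    T, K = bench_returns.shape
    Z = np.column_stack([np.ones(T), bench_returns])
    coefs = np.linalg.lstsq(Z, test_returns, rcond=None)[0]   # (K+1) x number of test assets
    intercepts = coefs[0]
    slope_sums = coefs[1:].sum(axis=0)
    return intercepts, slope_sums - 1.0

rng = np.random.default_rng(1)
bench = rng.normal(0.005, 0.04, (120, 2))                     # simulated benchmark returns
test = bench @ np.array([[0.6], [0.4]]) + rng.normal(0, 0.01, (120, 1))
print(spanning_restrictions(test, bench))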

29 The first and second moments of the distribution of the sample covariance matrix do not depend on N, whereas the moments of the distribution of the inverse involve expressions with T - N in the denominator. See Press (1982), pp. 107-120, for the basic properties of Wishart and inverted Wishart distributions.


References
Amsler, C. and P. Schmidt (1985). A Monte Carlo investigation of the accuracy of CAPM tests. J. Financ. Econom. 14, 359-375.
Affleck-Graves, J. and B. McDonald (1989). Nonnormalities and tests of asset pricing theories. J. Finance 44, 889-908.
Affleck-Graves, J. and B. McDonald (1990). Multivariate tests of asset pricing: The comparative power of alternative statistics. J. Financ. Quant. Anal. 25, 163-183.
Banz, R. (1981). The relationship between return and market value of common stocks. J. Financ. Econom. 9, 3-18.
Black, F. (1972). Capital market equilibrium with restricted borrowing. J. Business 45, 444-455.
Brennan, M. (1971). Capital market equilibrium with divergent borrowing and lending rates. J. Financ. Quant. Anal. 6, 1197-1205.
Black, F., M. C. Jensen and M. Scholes (1972). The capital asset pricing model: Some empirical tests. In: M. C. Jensen, ed., Studies in the theory of capital markets. Praeger, New York, NY.
Blume, M. and I. Friend (1973). A new look at the capital asset pricing model. J. Finance 28, 19-34.
Brown, S. and M. Weinstein (1983). A new approach to testing asset pricing models: The bilinear paradigm. J. Finance 38, 711-743.
Campbell, J. (1985). Stock returns and the term structure. NBER Working Paper.
Chamberlain, G. (1983). A characterization of the distributions that imply mean-variance utility functions. J. Econom. Theory 29, 185-201.
Chen, N. F., R. Roll and S. Ross (1986). Economic forces and the stock market. J. Business 59, 383-403.
Evans, G. and N. Savin (1982). Conflict among the criteria revisited: The W, LR and LM tests. Econometrica 50, 737-748.
Fama, E. F. (1976). Foundations of Finance. Basic Books, New York, NY.
Fama, E. F. and J. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, 607-636.
Ferson, W., S. Kandel and R. Stambaugh (1987). Tests of asset pricing with time-varying expected risk premiums and market betas. J. Finance 42, 201-220.
Gibbons, M. (1980). Estimating the parameters of the capital asset pricing model: A minimum expected loss approach. Unpublished manuscript, Graduate School of Business, Stanford University.
Gibbons, M. (1982). Multivariate tests of financial models: A new approach. J. Financ. Econom. 10, 3-27.
Gibbons, M. and W. Ferson (1985). Testing asset pricing models with changing expectations and an unobservable market portfolio. J. Financ. Econom. 14, 217-236.
Gibbons, M., S. Ross and J. Shanken (1989). A test of the efficiency of a given portfolio. Econometrica 57, 1121-1152.
Hansen, L. and R. Hodrick (1983). Risk-averse speculation in the forward foreign exchange market: An econometric analysis of linear models. In: J. Frenkel, ed., Exchange rates and international macroeconomics. National Bureau of Economic Research, Cambridge, MA, 113-146.
Harvey, C. (1989). Time-varying conditional covariances in tests of asset-pricing models. J. Financ. Econom. 24, 289-317.
Harvey, C. and G. Zhou (1990). Bayesian inference in asset pricing tests. J. Financ. Econom. 26, 221-254.
Huberman, G. and S. Kandel (1987). Mean-variance spanning. J. Finance 42, 873-888.
Jensen, M. (1972). Capital markets: Theory and evidence. Bell J. Econom. Mgmt. Sci. 3, 357-398.
Jobson, J. D. and B. Korkie (1982). Potential performance and tests of portfolio efficiency. J. Financ. Econom. 10, 433-466.
Jobson, J. D. and B. Korkie (1985). Some tests of linear asset pricing with multivariate normality. Canad. J. Administ. Sci. 2, 114-138.


Kandel, S. (1984). The likelihood ratio test statistic of mean-variance efficiency without a riskless asset. J. Financ. Econom. 13, 575-592.
Kandel, S. and R. F. Stambaugh (1987). On correlations and inferences about mean-variance efficiency. J. Financ. Econom. 18, 61-90.
Kandel, S., R. McCulloch and R. F. Stambaugh (1995). Bayesian inference and portfolio efficiency. Rev. Financ. Stud. 8, 1-53.
Kim, D. (1995). The errors in the variables problem in the cross-section of expected stock returns. J. Finance 50, 1605-1634.
Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev. Econom. Statist. 47, 13-37.
Litzenberger, R. and K. Ramaswamy (1979). The effect of personal taxes and dividends on capital asset prices: Theory and empirical evidence. J. Financ. Econom. 7, 163-195.
Litzenberger, R. and C.-f. Huang (1988). Foundations for Financial Economics. Elsevier Science Publishing Company, Inc., North Holland.
Lo, A. W. and A. C. MacKinlay (1990). Data-snooping biases in tests of financial asset pricing models. Rev. Financ. Stud. 3, 431-467.
MacBeth, J. (1975). Tests of the two parameter model of capital market equilibrium. Ph.D. Dissertation, University of Chicago, Chicago, IL.
MacKinlay, A. C. (1987). On multivariate tests of the CAPM. J. Financ. Econom. 18, 341-371.
MacKinlay, A. C. (1995). Multifactor models do not explain deviations from the CAPM. J. Financ. Econom. 38, 3-28.
MacKinlay, A. C. and M. Richardson (1991). Using generalized method of moments to test mean-variance efficiency. J. Finance 46, 511-527.
McCulloch, R. and P. E. Rossi (1990). Posterior, predictive, and utility based approaches to testing the arbitrage pricing theory. J. Financ. Econom. 28, 7-38.
Merton, R. (1973a). An analytic derivation of the efficient portfolio frontier. J. Financ. Quant. Anal., 1851-1872.
Merton, R. (1973b). An intertemporal capital asset pricing model. Econometrica 41, 867-887.
Morrison, D. (1976). Multivariate statistical methods. McGraw-Hill, New York.
Pagan, A. (1983). Econometric issues in the analysis of regressions with generated regressors. Internat. Econom. Rev. 25, 221-247.
Press, S. J. (1982). Applied Multivariate Analysis. Robert E. Krieger Publishing Company, Malabar, Florida.
Roll, R. (1977). A critique of the asset pricing theory's tests - Part I: On past and potential testability of the theory. J. Financ. Econom. 4, 129-176.
Roll, R. (1979). A reply to Mayers and Rice. J. Financ. Econom. 7, 391-399.
Roll, R. (1985). A note on the geometry of Shanken's CSR T² test for mean/variance efficiency. J. Financ. Econom. 14, 349-357.
Ross, S. (1977). The capital asset pricing model, short sales restrictions and related issues. J. Finance 32, 177-183.
Shanken, J. (1983). An asymptotic analysis of the traditional risk-return model. Ph.D. Dissertation, Carnegie Mellon University, Chapter 2.
Shanken, J. (1985). Multivariate tests of the zero-beta CAPM. J. Financ. Econom. 14, 327-348.
Shanken, J. (1986). Testing portfolio efficiency when the zero-beta rate is unknown: A note. J. Finance 41, 269-276.
Shanken, J. (1987a). A Bayesian approach to testing portfolio efficiency. J. Financ. Econom. 19, 195-215.
Shanken, J. (1987b). Multivariate proxies and asset pricing relations: Living with the Roll critique. J. Financ. Econom. 18, 91-110.
Shanken, J. (1990). Intertemporal asset pricing: An empirical investigation. J. Econometrics 45, 99-120.
Shanken, J. and M. Weinstein (1990). Macroeconomic variables and asset pricing: Further results. Working Paper, University of Rochester.


Shanken, J. (1992). On the estimation of beta-pricing models. Rev. Financ. Stud. 5, 1-33.
Stambaugh, R. F. (1982). On the exclusion of assets from tests of the two-parameter model: A sensitivity analysis. J. Financ. Econom. 10, 237-268.
Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19, 425-442.
White, H. (1984). Asymptotic theory for econometricians. Academic Press, Orlando, Florida.
Zhou, G. (1991). Small sample tests of portfolio efficiency. J. Financ. Econom. 30, 165-191.
Zhou, G. (1993). Asset pricing tests under alternative distributions. J. Finance 48, 1927-1942.
Zhou, G. (1994). Analytical GMM tests: Asset pricing with time-varying risk premiums. Rev. Financ. Stud. 7, 687-709.

Subject Index

Absolute GARCH (AGARCH) 212 Active transactor 649 ADALINE network 533 Adaptive estimators 452 ANN evaluation criteria 540 implementation and interpretation 537 inputs and outputs 538 learning 531 statistical inference 542 - structure 530 Approximate efficiency 699 APT 2, 7, 220,502, 547 Arbitrage 2, 10, 24, 28, 339 ARCH filters 230-231 in mean 213 ARCH-M 213, 226 ARFIMA 342 Asset prices 613-615, 621,624, 626, 630, 635, 640, 643-644 - pricing models 1-2, 11, 13, 15-16, 22, 24, 29, 474 - theory 640 tests 39-40 Asymmetric business cycles 298 - GARCH 212, 234 information 349, 351,660 - power ARCH (APARCH) 213 Augmented GARCH 212 Autoregressive variance ARV 218

Beta 3, 5-6, 22, 432, 434, 442 - pricing 35-60, 74 - multiple beta models 1, 44M7 Bias correction 195, 467 Bid-ask spread 654, 655 Bispectrum test 327, 328, 329, 334 Bivariate probit model 561 Black Scholes formula 448, 449, 544 - model 120 Block trades 683 Bond ratings 546, 555 Bonferroni bounds test 244, 250 Bootstrap 318, 329, 353 Bootstrapping the data 469 Box-Cox transformation 494 Box-Tiao distribution 435, 452 Brier score 258 Bubbles 201,629-630 Burr 12 type distribution 444 Business cycle 298, 301,307, 310, 311 Calibration 259 Call auctions 680 Canonical Factor Analysis 489, 500 CAPM (capital asset pricing model) 2-6, 1112, 19, 21, 23, 24, 33, 37-44, 401-404, 695, 699, 701 Cauchy distribution 43 I, 436 Censored count 371 duration 365, 371,385 Chaos theory 317-318 Characteristic exponent 431 Co-persistence in the variances 223 COFAMM for factor analysis 499 Cointegration 201,473, 621,684, 688 Commercial paper market 311 Common features 229 Common stochastic trends 622 Complexity theory 317-319 Composite forecast 241,254

Backpropagation 531 Balance sheet effect 309, 312 Bank failures 559 Bankruptcy prediction 545 Bayesian methods 177, 576, 623, 624 shrinkage 255-256 - tests of efficiency 700 BDS test (Brock, Dechert and Scheinkman) 228, 327, 328, 329, 332, 334, 335, 337



Errors-in-variables 507-513, 515, 517, 518519, 525, 703, 705-707 Euclidean distance 492 European call option 449 Event studies 557 Ex-post rational stock price 194 Exact ML 225 Exccess returns 621-622 Exchange rate 614, 619, 6244525 - forecast errors 625 forecasting 546 Exogeneity 199 Expected returns 274-276, 283, 613, 637 - utility 440-441 Exploratory data analysis 503 Exploratory multivariate techniques 489 Exponential GARCH (EGARCH) 211,216219, 226, 230 Extensive variables 201 External finance premium 309

Conditional - asset pricing 32, 37-38 - beta models 76, 35-60 coverage 263-264 - efficiency 700 - equity premium 638 Confluent hypergeometric series 455 Consistent moment test 330, 331 Constant elasticity of variance 580,601 Continuous time GARCH 232 time stochastic process 396 Corporate merger prediction 546 Corporate takeovers 562 Correspondence analysis 489 Counter-cyclical policy 310 Counts Hurdle model 381-382, 389 - mixture model 375, 378 modified count model 381 - negative binomial model 363, 373-375, 378, 382, 388 - poisson models 366, 371,375, 379-383, 389 zero inflated 382 - positive 381-382 - truncated 372 Credit crunch 305, 311 Cross-section regression T2 tests 706, 707 Cross-validation 539

Data mining 704 Default risk 306, 311 Definitional revisions 307 Demand shocks 309 Detection of outliers 493 Dickey-Fuller test 302 Diebold-Mariano test 251,262 Direct and reverse regession methods 508, 514, 515 Direction-of-change forecasts 242, 256-257, 265 Discount rate 305, 699 Discriminant analysis 545 Dividend ratio model 631-632 smoothing 198 Duration models 365-371,383-385, 389 Efficient portfolio 693-695, 699 set 440, 442 tests of linearity 704 - tests of efficiency 701-707 time-varying betas 701 EM algorithm 302 Equilibrium asset pricing models 2, 12 Equity premium puzzle 636, 639 risk premium 615, 637-638 Error correction model 685

Factor analysis 489, 498, 508, 513, 524-525 analytic model 220, 496 loadings 499 scores 500 - ARCH 220, 223, 229 - GARCH 221-222, 234-235 False signal 299, 304 Fat tails 317, 329, 332 Feasible conditional GMM 85 Federal funds targeting 306 Feller-Pareto distribution 433 Filter rule 298, 303, 308 Filtration 582 Financial assets 2, 24 crisis 629 - markets 24, 297 First order stochastic dominant 441 Fisher relation 624 Fisk distribution 435 Flexible Fourier Form FFF 216 - functional form 533 Forecast accuracy 247, 250-252, 257, 260, 265 - combination 241,252-254 encompassing tests 252 - error 6144518, 621,625, 634, 637 - evaluation 241-242, 258, 261,264-265 turning points 301 Forecasting errors 299 - horizon 307 Foreign exchange returns 618 risk premia 639, 641 Forward premium puzzle 615, 618, 634, 639 Fractional stochastic volatility 165 Friction model 563 Full Optimality 246-247

Fully Adaptive 451,453 Futures markets 564 GARCH-jump models 234 General mapping 532 Generalized beta distribution 444, 448 gamma distribution 438, 443, 445-446 - hypergeometric series 455 method of moments 1, 3, 11, 15, 29, 33, 4757, 218, 224, 244, 451-452, 468 - peso problem 623, 625, 628, 630 - poisson 375 - t 435, 452 Geometric random walk 201 Gini coefficients 446 Global minima 532 GPH test (Geweke and Porter Hudak) 342 Granger causality 297 Graphic display of data 492 Graphical display of data 492 Grouped duration 386 Grouping methods 509, 512, 525 Growth recessions 305, 307
-

Jump-diffusion processes 585-587, 591, 603

Kernel estimators 216, 226, 451 Kurtosis 427, 429-430, 435, 437-439, 449, 451,454 Lp estimators 450~52, 454 Lagged price adjustment 656 Laplace distribution 435, 450 Large cross-sections 82 Latent variables 33, 46-47, 499, 515, 519-520, 521, 536 Leading indicator 300, 307 Learning 532, 616, 623, 625 Leptokurtosis 429, 436, 442 Leverage effects 581 Likelihood ratio test 706 Liquidity constraints 310 Loan discrimination 553 Log-t distribution 438 Logistic functions 644 Logit model 535, 545 Lognormal distribution 427, 432, 438, 442, 444, 448-449 Long memory 152, 164, 317, 338-340 horizon regressions 266, 478 term prediction 275-278, 282-284, 289 Lorenz dominance 444-445 Loss function 246-250, 252, 261-262, 265

Habit Persistence 14 Hansen-Jagannathan bound 640 Hazard function 367, 369, 384, 386 - rate 365, 368, 383-384 Heavy tails 329 Heterogeneity 433-435 Heteroskedasticity 30, 303, 343 344, 347 Hotelling Tz test 696, 706 Hull and White model 126, 160 Hurst exponent 339, 348 Hypergeometric series 434, 456 Hysteresis (GARCH) HGARCH 213 IGARCH 211,219, 222-223, 225, 228, 233 Implicit efficient price 654 volatility 62, 576, 587 Impulse response analysis 483, 655 Incomplete moments 443, 449 Index of Leading Economic Indicators 297, 307 Indirect inference 174 Inflationary pressures 305, 309 Information sets 246-247, 253 Instrumental variable estimation 35~60 Integrated hazard function 367, 369 Intensive variables 201 Interacting systems 318, 320 Intermediate target 306 Intraday patterns 673 Inventory models 657

M-estimators 451 Mahalanobis distance 493 Market - closures 673 - coherent hypothesis 352 - efficiency 193, 269, 272, 275-276 - efficiency tests 579-580 - microstructure 563, 605 - model 450 - portfolio 2, 4-5, 20-22 Markov process 263, 299, 627, 630-631,636637, 643 switching model 299, 303, 639 Martingale 270, 272, 651 Matched samples 558 Maximum likelihood estimation 224, 226, 228, 299, 439, 499, 574, 582, 586-587 Mean absolute percent error 248 reversion 338 squared percent error 248 Gini 440, 445-446 Mesokurtic 429 Method of simulated moments 318 MIMIC model 515, 520-525, 536 Minimum-variance portfolio 694, 697 Model order 671
-

716 Model selection 538, 542 Model-free estimator 195 Modified regression estimation 704 Moment condition failure 333-334, 341 - distributions 429 matching 174 Monte Carlo experiments 624, 6194520, 633 simulation 643 Moving block bootstrap 464 Multifactor models 80 Multiple markets 680 - beta models 3, 6-7, 12, 19, 28 Multivariate - approach 704 - GARCH 222, 229, 232, 234 normality 500 - tests 695-696, 698, 706, 707
-

Subject index
Peso problem 613-615, 617, 620, 622, 624626, 628, 630, 634, 641-642, 644 Platykurtic 429 Poisson jump model 214 Policy-induced shocks 309 Portfolio efficiency 694 substitutability 311 - theory 401-404 Power comparisons of predictive tests 287289 - exponential distribution 435, 451 Predictive performance 297, 301 Predictive stochastic complexity (PSC) 541 Present value models 201,626, 629 Pricing error 654 - kernel 1,634-635, 6374539, 641 Principal components analysis 489, 490, 535 interpretation 491 Private-public spread 302, 306-307, 311 Probabilistic inferences 299 Probability forecast 242, 258-262, 264 Probit model 535 Proportional hazards model 369, 384, 386 Proxy variables 507, 515-516, 520, 525 Pseudo weights 542
-

NBER-defined recessions 308 Network pruning 540 No-arbitrage constraints 572 Non-linear filter 299, 311 Non-normality 564 Non-stationarity 198, 201,302, 335, 346 Nonlinear ARCH 213 - combining regressions 255-256 - Granger causality 337 Nonlinearity 317, 326, 679 Normal kernal 452 lognormal 431 student's t and lognormal 429 Normalized incomplete moments 429 Nuisance parameter problems 195
-

Quadratic GARCH 212 hill-climbing 600 - probability score 258 Qualitative and limited dependent variable models 523 Quasi maximum likelihood method QML 170, 218, 225, 226, 228 Bayesian 302
-

Observation noise 531 Optimal forecast 242, 244, 249 - network design 539 Option pricing 158, 404~15, 428, 437, 448, 450, 544 Ordered probit 387-388 Ornstein-Uhlenbeck process 582 OSIRIS for factor analysis 499 Outliers 489 Overdifferencing 665, 668 Overdispersion tests 375-379 Overlapping generations model 204 Paper-bill spread 297, 299, 311 Parallel markets 683 Parameter instability 642 Partial optimality 246-247 Partially adaptive 451-452 Passive transactor 649 Pearson distributions 428, 432 Permanent and transitory components

649

R/S and the GPH test 339-341,344, 348 Random walk model 270, 274, 276, 278-279, 651,669 Rational expectations 32, 206, 564, 6134519, 621,626, 629, 630, 632, 641 Rayleigh distribution 435 Recursive bootstrap 464 Regime switching 569, 615, 6284532, 640644 Regression based forecast combination 254 Regularity conditions 643 Rejection region 196, 205 Renewal process 366, 369-371,389 Reserve requirements 306 Return autocorrelations 205 Right-censored 384 Risk aversion 204 - and peso problems 634 - premia 613, 6194520, 6384541 neutral distribution 600 neutral processes 569
-

Scaling laws 319 Second order stochastic dominance 440 pass estimator 705 Self-selection bias 558 Sensitivity analysis 543 Shape factors 494 Sharpe ratios 697 Sign test 243-245, 247, 251 Signal extraction methods 508, 523 Signed-rank test 244-245, 247, 251 Simulated method of moments 173, 318, 323 Singular value decomposition 491 Size factor 494 Skewed data 431,434, 436 Small-sample inference 697, 707 Semi-non-parametric models 218-219, 226 Specification tests 642 Spectral representation 398-399 SSD 441-442, 447 Stable distributions 332-333, 340, Bayesian estimation 416 - continuous time processes 396-397 - elliptical 399-401 - empirical objections to 416 - estimation 415-419 - multivariate 397-401,419 - option pricing 404~15 paretian 431,438 - portfolio theory 401-404 - properties 394-396 spectral representation 398-399 Standard bootstrap 464 State-space models 418-4 19 Stationarity 199, 276 Stationary bootstrap 465 Stochastic - dominance 440, 442-443, 447 interest rate 589 simulation 298 trends 621 variance or volatility (SV) 218 volatility 235, 344-345, 438, 448, 450, 581585, 590, 601 Stock market prediction 547 Stock returns 636~37 Strong GARCH 231-232 Structural models 321,353 Student GARCH 234 Studentization 471 Stylized features of market activity 349-350 Submartingale 271 Supply shocks 309 Survivor functions 367-368, 384 Switching model 619, 625, 631,636, 643-644 regime ARCH (SWARCH) 215

Synchronization error 571-574, 600

Technology shocks 309 Temporal aggregation 154, 157 Temporary component 276, 277, 288 Term structure of interest rates 297, 309, 312, 630-631,634 characteristics 92 forward rate models 114 - GARCH models 9 7 interest rates 92 multiple factor models 114 one factor models 108 two factor models 111 Term structure of volatilities 130 Threshold ARCH 217 - GARCH 212, 213 Theil's U-statistic 252, 258, 260 Time deformation 676 dependent Poisson process 368 Trade reporting 652 Trader heterogeneity 350 Trading rules 476 Transaction costs 563 Transfer functions 530-531,533 Transitional probabilities 303, 627, 643-644 Transmission mechanism 308-309, 312 Transversality condition 202 Treasury bill market 305, 312 Trend-stationarity 200 Turning points 298, 305, 311-312 Two-pass methodology 693, 702 Two-stage regression 3

Unconditional equity premium 639 Uncovered interest parity 571 Underdispersion 375-378, 388 Unit root models 200, 468 Universal approximation 532 Variance lower bound 671 - ratios 278, 287, 682-683 bounds tests 193 covariance forecast combination methods 254 Vector ARCH 230 Vector Autoregressions (VARs) 297, 667 Volatility forecast 261-262, 264 statistic 196

Weibull distribution 369-370, 383-384, 385 West test 198 Wilcoxon signed-rank test 243, 245, 251 Within regime forecasts 616, 626, 628, 637 Yield curve 298, 305

Handbook of Statistics Contents of Previous Volumes

Volume 1. Analysis of Variance Edited by P. R. Krishnaiah 1980 xviii + 1002 pp.

1. Estimation of Variance Components by C. R. Rao and J. Kleffe 2. Multivariate Analysis of Variance of Repeated Measurements by N. H. Timm 3. Growth Curve Analysis by S. Geisser 4. Bayesian Inference in MANOVA by S. J. Press 5. Graphical Methods for Internal Comparisons in ANOVA and MANOVA by R. Gnanadesikan 6. Monotonicity and Unbiasedness Properties of ANOVA and MANOVA Tests by S. Das Gupta 7. Robustness of ANOVA and MANOVA Test Procedures by P. K. Ito 8. Analysis of Variance and Problem under Time Series Models by D. R. Brillinger 9. Tests of Univariate and Multivariate Normality by K. V. Mardia 10. Transformations to Normality by G. Kaskey, B. Kolman, P. R. Krishnaiah and L. Steinberg 11. ANOVA and MANOVA: Models for Categorical Data by V. P. Bhapkar 12. Inference and the Structural Model for ANOVA and MANOVA by D. A. S. Fraser 13. Inference Based on Conditionally Specified ANOVA Models Incorporating Preliminary Testing by T. A. Bancroft and C. -P. Han 14. Quadratic Forms in Normal Variables by C. G. Khatri 15. Generalized Inverse of Matrices and Applications to Linear Models by S. K. Mitra 16. Likelihood Ratio Tests for Mean Vectors and Covariance Matrices by P. R. Krishnaiah and J. C. Lee

719

720 17. 18. 19. 20. 21. 22. 23. 24. 25.

Contents of previous volumes

Assessing Dimensionality in Multivariate Regression by A. J. Izenman Parameter Estimation in Nonlinear Regression Models by H. Bunke Early History of Multiple Comparison Tests by H. L. Harter Representations of Simultaneous Pairwise Comparisons by A. R. Sampson Simultaneous Test Procedures for Mean Vectors and Covariance Matrices by P. R. Krishnaiah, G. S. Mudholkar and P. Subbiah Nonparametric Simultaneous Inference for Some MANOVA Models by P. K. Sen Comparison of Some Computer Programs for Univariate and Multivariate Analysis of Variance by R. D. Bock and D. Brandt Computations of Some Multivariate Distributions by P. R. Krishnaiah Inference on the Structure of Interaction in Two-Way Classification Model by P. R. Krishnaiah and M. Yochmowitz

Volume 2. Classification, Pattern Recognition and Reduction o f Dimensionality Edited by P. R. Krishnaiah and L. N. Kanal 1982 xxii + 903 pp.

1. Discriminant Analysis for Time Series by R. H. Shumway 2. Optimum Rules for Classification into Two Multivariate Normal Populations with the Same Covariance Matrix by S. Das Gupta 3. Large Sample Approximations and Asymptotic Expansions of Classification Statistics by M. Siotani 4. Bayesian Discrimination by S. Geisser 5. Classification of Growth Curves by J. C. Lee 6. Nonparametric Classification by J. D. Broffitt 7. Logistic Discrimination by J. A. Anderson 8. Nearest Neighbor Methods in Discrimination by L. Devroye and T. J. Wagner 9. The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis by G. J. McLachlan 10. Graphical Techniques for Multivariate Data and for Clustering by J. M. Chambers and B. Kleiner 11. Cluster Analysis Software by R. K. Blashfield, M. S. Aldenderfer and L. C. Morey 12. Single-link Clustering Algorithms by F. J. Rohlf 13. Theory of Multidimensional Scaling by J. de Leeuw and W. Heiser 14. Multidimensional Scaling and its Application by M. Wish and J. D. Carroll 15. Intrinsic Dimensionality Extraction by K. Fukunaga

Contents of previous volumes

721

16. Structural Methods in Image Analysis and Recognition by L. N. Kanal, B. A. Lambird and D. Lavine 17. Image Models by N. Ahuja and A. Rosenfeld 18. Image Texture Survey by R. M. Haralick 19. Applications of Stochastic Languages by K. S. Fu 20. A Unifying Viewpoint on Pattern Recognition by J. C. Simon, E. Backer and J. Sallentin 21. Logical Functions in the Problems of Empirical Prediction by G. S. Lbov 22. Inference and Data Tables and Missing Values by N. G. Zagoruiko and V. N. Yolkina 23. Recognition of Electrocardiographic Patterns by J. H. van Bemmel 24. Waveform Parsing Systems by G. C. Stockman 25. Continuous Speech Recognition: Statistical Methods by F. Jelinek, R. L. Mercer and L. R. Bahl 26. Applications of Pattern Recognition in Radar by A. A. Grometstein and W. H. Schoendorf 27. White Blood Cell Recognition by E. S. Gelsema and G. H. Landweerd 28. Pattern Recognition Techniques for Remote Sensing Applications by P. H. Swain 29. Optical Character Recognition - Theory and Practice by G. Nagy 30. Computer and Statistical Considerations for Oil Spill Identification by Y. T. Chinen and T. J. Killeen 31. Pattern Recognition in Chemistry by B. R. Kowalski and S. Wold 32. Covariance Matrix Representation and Object-Predicate Symmetry by T. Kaminuma, S. Tomita and S. Watanabe 33. Multivariate Morphometrics by R. A. Reyment 34. Multivariate Analysis with Latent Variables by P. M. Bentler and D. G. Weeks 35. Use of Distance Measures, Information Measures and Error Bounds in Feature Evaluation by M. Ben-Bassat 36. Topics in Measurement Selection by J. M. Van Campenhout 37. Selection of Variables Under Univariate Regression Models by P. R. Krishnaiah 38. On the Selection of Variables Under Regression Models Using Krishnaiah's Finite Intersection Tests by J. L Schmidhammer 39. Dimensionality and Sample Size Considerations in Pattern Recognition Practice by A. K. Jain and B. Chandrasekaran 40. Selecting Variables in Discriminant Analysis for Improving upon Classical Procedures by W. Schaafsma 41. Selection of Variables in Discriminant Analysis by P. R. Krishnaiah

722

Contents of previous volumes

Volume 3. Time Series in the Frequency D o m a i n Edited by D. R. Brillinger and P. R. Krishnaiah 1983 xiv + 485 pp.

1. Wiener Filtering (with emphasis on frequency-domain approaches) by R. J. Bhansali and D. Karavellas 2. The Finite Fourier Transform of a Stationary Process by D. R. Brillinger 3. Seasonal and Calender Adjustment by W. S. Cleveland 4. Optimal Inference in the Frequency Domain by R. B. Davies 5. Applications of Spectral Analysis in Econometrics by C. W. J. Granger and R. Engle 6. Signal Estimation by E. J. Hannan 7. Complex Demodulation: Some Theory and Applications by T. Hasan 8. Estimating the Gain of a Linear Filter from Noisy Data by M. J. Hinich 9. A Spectral Analysis Primer by L. H. Koopmans 10. Robust-Resistant Spectral Analysis by R. D. Martin 11. Autoregressive Spectral Estimation by E. Parzen 12. Threshold Autoregression and Some Frequency-Domain Characteristics by J. Pemberton and H. Tong 13. The Frequency-Domain Approach to the Analysis of Closed-Loop Systems by M. B. Priestley 14. The Bispectral Analysis of Nonlinear Stationary Time Series with Reference to Bilinear Time-Series Models by T. Subba Rao 15. Frequency-Domain Analysis of Multidimensional Time-Series Data by E. A. Robinson 16. Review of Various Approaches to Power Spectrum Estimation by P. M. Robinson 17. Cumulants and Cumulant Spectral Spectra by M. Rosenblatt 18. Replicated Time-Series Regression: An Approach to Signal Estimation and Detection by R. H. Shumway 19. Computer Programming of Spectrum Estimation by T. Thrall 20. Likelihood Ratio Tests on Covariance Matrices and Mean Vectors of Complex Multivariate Normal Populations and their Applications in Time Series by P. R. Krishnaiah, J. C. Lee and T. C. Chang

Contents of previous volumes

723

Volume 4. Nonparametric Methods Edited by P. R. Krishnaiah and P. K. Sen 1984 xx + 968 pp.

1. Randomization Procedures by C. B. Bell and P. K. Sen 2. Univariate and Multivariate Mutisample Location and Scale Tests by V. P. Bhapkar 3. Hypothesis of Symmetry by M. Hugkovfi 4. Measures of Dependence by K. Joag-Dev 5. Tests of Randomness against Trend or Serial Correlations by G. K. Bhattacharyya 6. Combination of Independent Tests by J. L. Folks 7. Combinatorics by L. Tak/tcs 8. Rank Statistics and Limit Theorems by M. Ghosh 9. Asymptotic Comparison of Tests-A Review by K. Singh 10. Nonparametric Methods in Two-Way Layouts by D. Quade 11. Rank Tests in Linear Models by J. N. Adichie 12. On the Use of Rank Tests and Estimates in the Linear Model by J. C. Aubuchon and T. P. Hettmansperger 13. Nonparametric Preliminary Test Inference by A. K. Md. E. Saleh and P. K. Sen 14. Paired Comparisons: Some Basic Procedures and Examples by R. A. Bradley 15. Restricted Alternatives by S. K. Chatterjee 16. Adaptive Methods by M. Hugkovfi 17. Order Statistics by J. Galambos 18. Induced Order Statistics: Theory and Applications by P. K. Bhattacharya 19. Empirical Distribution Function by E. Cs/tki 20. Invariance Principles for Empirical Processes by M. Cs6rg6 21. M-, L- and R-estimators b y J. Jure6kov/t 22. Nonparametric Sequantial Estimation by P. K. Sen 23. Stochastic Approximation by V. Dupa6 24. Density Estimation by P. R~v6sz 25. Censored Data by A. P. Basu 26. Tests for Exponentiality by K. A. Doksum and B. S. Yandell 27. Nonparametric Concepts and Methods in Reliability by M. Hollander and F. Proschan 28. Sequential Nonparametric Tests by U. M~ller-Funk 29. Nonparametric Procedures for some Miscellaneous Problems by P. K. Sen 30. Minimum Distance Procedures by R. Beran 31. Nonparametric Methods in Directional Data Analysis by S. R. Jammalamadaka 32. Application of Nonparametric Statistics to Cancer Data by H. S. Wieand

724

Contents of previous volumes

33. Nonparametric Frequentist Proposals for Monitoring Comparative Survival Studies by M. Gail 34. Meterological Applications of Permutation Techniques based on Distance Functions by P. W. Mielke, Jr. 35. Categorical Data Problems Using Information Theoretic Approach by S. Kullback and J. C. Keegel 36. Tables for Order Statistics by P. R. Krishnaiah and P. K. Sen 37. Selected Tables for Nonpararnetric Statistics by P. K. Sen and P. R. Krishnaiah

Volume 5. Time Series in the Time D o m a i n Edited by E. J. Hannan, P. R. Krishnaiah and M. M. R a o 1985 xiv + 490 pp.

1. Nonstationary Autoregressive Time Series by W. A. Fuller 2. Non-Linear Time Series Models and Dynamical Systems by T. Ozaki 3. Autoregressive Moving Average Models, Intervention Problems and Outlier Detection in Time Series by G. C. Tiao 4. Robustness in Time Series and Estimating ARMA Models by R. D. Martin and V. J. Yohai 5. Time Series Analysis with Unequally Spaced Data by R. H. Jones 6. Various Model Selection Techniques in Time Series Analysis by R. Shibata 7. Estimation of Parameters in Dynamical Systems by L. Ljung 8. Recursive Identification, Estimation and Control by P. Young 9. General Structure and Parametrization of ARMA and State-Space Systems and its Relation to Statistical Problems by M. Deistler 10. Harmonizable, Cram6r, and Karhunen Classes of Processes by M. M. Rao 11. On Non-Stationary Time Series by C. S. K. Bhagavan 12. Harmonizable Filtering and Sampling of Time Series by D. K. Chang 13. Sampling Designs for Time Series by S. Cambanis 14. Measuring Attenuation by M. A. Cameron and P. J. Thomson 15. Speech Recognition Using LPC Distance Measures by P. J. Thomson and P. de Souza 16. Varying Coefficient Regression by D. F. Nicholls and A. R. Pagan 17. Small Samples and Large Equation Systems by H. Theil and D. G. Fiebig

Contents of previous volumes

725

Volume 6. Sampling Edited by P. R. Krishnaiah and C. R. Rao 1988 xvi + 594 pp.

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.

21. 22. 23. 24.

A Brief History of Random Sampling Methods by D. R. Bellhouse A First Course in Survey Sampling by T. Dalenius Optimality of Sampling Strategies by A. Chaudhuri Simple Random Sampling by P. K. Pathak On Single Stage Unequal Probability Sampling by V. P. Godambe and M. E. Thompson Systematic Sampling by D. R. Bellhouse Systematic Sampling with Illustrative Examples by M. N. Murthy and T. J. Rao Sampling in Time by D. A. Binder and M. A. Hidiroglou Bayesian Inference in Finite Populations by W. A. Ericson Inference Based on Data from Complex Sample Designs by G. Nathan Inference for Finite Population Quantiles by J. Sedransk and P. J. Smith Asymptotics in Finite Population Sampling by P. K. Sen The Technique of Replicated or Interpenetrating Samples by J. C. Koop On the Use of Models in Sampling from Finite Populations by I. Thomsen and D. Tesfu The Prediction Approach to Sampling theory by R. M. Royall Sample Survey Analysis: Analysis of Variance and Contingency Tables by D. H. Freeman, Jr. Variance Estimation in Sample Surveys by J. N. K. Rao Ratio and Regression Estimators by P. S. R. S. Rao Role and Use of Composite Sampling and Capture-Recapture Sampling in Ecological Studies by M. T. Boswell, K. P. Burnham and G. P. Patil Data-based Sampling and Model-based Estimation for Environmental Resources by G. P. Patil, G. J. Babu, R. c. Hennemuth, W. L. Meyers, M. B. Rajarshi and C. Taillie On Transect Sampling to Assess Wildlife Populations and Marine Resources by F. L. Ramsey, C. E. Gates, G. P. Patil and C. Taillie A Review of Current Survey Sampling Methods in Marketing Research (Telephone, Mall Intercept and Panel Surveys) by R. Velu and G. M. Naidu Observational Errors in Behavioural Traits of Man and their Implications for Genetics by P. V. Sukhatme Designs in Survey Sampling Avoiding Contiguous Units by A. S. Hedayat, C. R. Rao and J. Stufken

726

Contents of previous volumes

Volume 7. Quality Control and Reliability Edited by P. R. Krishnaiah and C. R. R a o 1988 xiv + 503 pp.

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23.

Transformation of Western Style of Management by W. Edwards Deming Software Reliability by F. B. Bastani and C. V. Ramamoorthy Stress-Strength Models for Reliability by R. A. Johnson Approximate Computation of Power Generating System Reliability Indexes by M. Mazumdar Software Reliability Models by T. A. Mazzuchi and N. D. Singpurwalla Dependence Notions in Reliability Theory by N. R. Chaganty and K. Joagdev Application of Goodness-of-Fit Tests in Reliability by H. W. Block and A. H. Moore Multivariate Nonparametric Classes in Reliability by H. W. Block and T. H. Savits Selection and Ranking Procedures in Reliability Models by S. S. Gupta and S. Panchapakesan The Impact of Reliability Theory on Some Branches of Mathematics and Statistics by P. J. Boland and F. Proschan Reliability Ideas and Applications in Economics and Social Sciences by M. C. Bhattacharjee Mean Residual Life: Theory and Applications by F. Guess and F. Proschan Life Distribution Models and Incomplete Data by R. E. Barlow and F. Proschan Piecewise Geometric Estimation of a Survival Function by G. M. Mimmack and F. Proschan Applications of Pattern Recognition in Failure Diagnosis and Quality Control by L. F. Pau Nonparametric Estimation of Density and Hazard Rate Functions when Samples are Censored by W. J. Padgett Multivariate Process Control by F. B. Alt and N. D. Smith QMP/USP-A Modem Approach to Statistical Quality Auditing by B. Hoadley Review About Estimation of Change Points by P. R. Krishnaiah and B. Q. Miao Nonparametric Methods for Changepoint Problems by M. Cs6rg6 and L. Horv~tth Optimal Allocation of Multistate Components by E. E1-Neweihi, F. Proschan and J. Sethuraman Weibull, Log-Weibull and Gamma Order Statistics by H. L. Herter Multivariate Exponential Distributions and their Applications in Reliability by A. P. Basu

Contents of previous volumes

727

24. Recent Developments in the Inverse Gaussian Distribution by S. Iyengar and G. Patwardhan

Volume 8. Statistical Methods in Biological and Medical Sciences Edited by C. R. R a o and R. C h a k r a b o r t y 1991 xvi + 554 pp.

1. Methods for the Inheritance of Qualitative Traits by J. Rice, R. Neuman and S. O. Moldin 2. Ascertainment Biases and their Resolution in Biological Surveys by W. J. Ewens 3. Statistical Considerations in Applications of Path Analytical in Genetic Epidemiology by D. C. Rao 4. Statistical Methods for Linkage Analysis by G. M. Lathrop and J. M. Lalouel 5. Statistical Design and Analysis of Epidemiologic Studies: Some Directions of Current Research by N. Breslow 6. Robust Classification Procedures and Their Applications to Anthropometry by N. Balakrishnan and R. S. Ambagaspitiya 7. Analysis of Population Structure: A Comparative Analysis of Different Estimators of Wright's Fixation Indices by R. Chakraborty and H. DankerHopfe 8. Estimation of Relationships from Genetic Data by E. A. Thompson 9. Measurement of Genetic Variation for Evolutionary Studies by R. Chakraborty and C. R. Rao 10. Statistical Methods for Phylogenetic Tree Reconstruction by N. Saitou 11. Statistical Models for Sex-Ratio Evolution by S. Lessard 12. Stochastic Models of Carcinogenesis by S. H. Moolgavkar 13. An Application of Score Methodology: Confidence Intervals and Tests of Fit for One-Hit-Curves by J. J. Gart 14. Kidney-Survival Analysis of IgA Nephropathy Patients: A Case Study by O. J. W. F. Kardaun 15. Confidence Bands and the Relation with Decision Analysis: Theory by O. J. W. F. Kardaun 16. Sample Size Determination in Clinical Research by J. Bock and H. Toutenburg

728

Contents of previous volumes

Volume 9. Computational Statistics Edited by C. R. R a o 1993 xix + 1045 pp.

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28.

Algorithms by B. Kalyanasundaram Steady State Analysis of Stochastic Systems by K. Kant Parallel Computer Architectures by R. Krishnamurti and B. Narahari Database Systems by S. Lanka and S. Pal Programming Languages and Systems by S. Purushothaman and J. Seaman Algorithms and Complexity for Markov Processes by R. Varadarajan Mathematical Programming: A Computational Perspective by W. W. Hager, R. Horst and P. M. Pardalos Integer Programming by P. M. Pardalos and Y. Li Numerical Aspects of Solving Linear Lease Squares Problems by J. L. Barlow The Total Least Squares Problem by S. Van Huffel and H. Zha Construction of Reliable Maximum-Likelihood-Algorithms with Applications to Logistic and Cox Regression by D. B6hning Nonparametric Function Estimation by T. Gasser, J. Engel and B. Seifert Computation Using the QR Decomposition by C. R. Goodall The EM Algorithm by N. Laird Analysis of Ordered Categorial Data through Appropriate Scaling by C. R. Rao and P. M. Caligiuri Statistical Applications of Artificial Intelligence by W. A. Gale, D. J. Hand and A. E. Kelly Some Aspects of Natural Language Processes by A. K. Joshi Gibbs Sampling by S. F. Arnold Bootstrap Methodology by G. J. Babu and C. R. Rao The Art of Computer Generation of Random Variables by M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie Jackknife Variance Estimation and Bias Reduction by S. Das Peddada Designing Effective Statistical Graphs by D. A. Burn Graphical Methods for Linear Models by A. S. Hadi Graphics for Time Series Analysis by H. J. Newton Graphics as Visual Language by T. Selker and A. Appel Statistical Graphics and Visualization by E. J. Wegman and D. B. Carr Multivariate Statistical Visualization by F. W. Young, R. A. Faldowski and M. M. McFarlane Graphical Methods for Process Control by T. L. Ziemer

Contents of previous volumes

729

Volume 10. Signal Processing and its Applications Edited by N. K. Bose and C. R. R a o 1993 xvii + 992 pp.

1. Signal Processing for Linear Instrumental Systems with Noise: A General Theory with Illustrations for Optical Imaging and Light Scattering Problems by M. Bertero and E. R. Pike 2. Boundary Implication Rights in Parameter Space by N. K. Bose 3. Sampling of Bandlimited Signals: Fundamental Results and Some Extensions by J. L. Brown, Jr. 4. Localization of Sources in a Sector: Algorithms and Statistical Analysis by K. Buckley and X.-L. Xu 5. The Signal Subspace Direction-of-Arrival Algorithm by J. A. Cadzow 6. Digital Differentiators by S. C. Dutta Roy and B. Kumar 7. Orthogonal Decompositions of 2D Random Fields and their Applications for 2D Spectral Estimation by J. M. Francos 8. VLSI in Signal Processing by A. Ghouse 9. Constrained Beamforming and Adaptive Algorithms by L. C. Godara 10. Bispectral Speckle Interferometry to Reconstruct Extended Objects from Turbulence-Degraded Telescope Images by D. M. Goodman, T. W. Lawrence, E. M. Johansson and J. P. Fitch 11. Multi-Dimensional Signal Processing by K. Hirano and T. Nomura 12. On the Assessment of Visual Communication by F. O. Huck, C. L. Fales, R. Alter-Gartenberg and Z. Rahman 13. VLSI Implementations of Number Theoretic Concepts with Applications in Signal Processing by G. A. Jullien, N. M. Wigley and J. Reilly 14. Decision-level Neural Net Sensor Fusion by R. Y. Levine and T. S. Khuon 15. Statistical Algorithms for Noncausal Gauss Markov Fields by J. M. F. Moura and N. Balram 16. Subspace Methods for Directions-of-Arrival Estimation by A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu and T. Kailath 17. Closed Form Solution to the Estimates of Directions of Arrival Using Data from an Array of Sensors by C. R. Rao and B. Zhou 18. High-Resolution Direction Finding by S. V. Schell and W. A. Gardner 19. Multiscale Signal Processing Techniques: A Review by A. H. Tewfik, M. Kim and M. Deriche 20. Sampling Theorems and Wavelets by G. G. Walter 21. Image and Video Coding Research by J. W. Woods 22. Fast Algorithms for Structured Matrices in Signal Processing by A. E. Yagle


Volume 11. Econometrics
Edited by G. S. Maddala, C. R. Rao and H. D. Vinod
1993 xx + 783 pp.

1. Estimation from Endogenously Stratified Samples by S. R. Cosslett
2. Semiparametric and Nonparametric Estimation of Quantal Response Models by J. L. Horowitz
3. The Selection Problem in Econometrics and Statistics by C. F. Manski
4. General Nonparametric Regression Estimation and Testing in Econometrics by A. Ullah and H. D. Vinod
5. Simultaneous Microeconometric Models with Censored or Qualitative Dependent Variables by R. Blundell and R. J. Smith
6. Multivariate Tobit Models in Econometrics by L.-F. Lee
7. Estimation of Limited Dependent Variable Models under Rational Expectations by G. S. Maddala
8. Nonlinear Time Series and Macroeconometrics by W. A. Brock and S. M. Potter
9. Estimation, Inference and Forecasting of Time Series Subject to Changes in Regime by J. D. Hamilton
10. Structural Time Series Models by A. C. Harvey and N. Shephard
11. Bayesian Testing and Testing Bayesians by J.-P. Florens and M. Mouchart
12. Pseudo-Likelihood Methods by C. Gouriéroux and A. Monfort
13. Rao's Score Test: Recent Asymptotic Results by R. Mukerjee
14. On the Strong Consistency of M-Estimates in Linear Models under a General Discrepancy Function by Z. D. Bai, Z. J. Liu and C. R. Rao
15. Some Aspects of Generalized Method of Moments Estimation by A. Hall
16. Efficient Estimation of Models with Conditional Moment Restrictions by W. K. Newey
17. Generalized Method of Moments: Econometric Applications by M. Ogaki
18. Testing for Heteroskedasticity by A. R. Pagan and Y. Pak
19. Simulation Estimation Methods for Limited Dependent Variable Models by V. A. Hajivassiliou
20. Simulation Estimation for Panel Data Models with Limited Dependent Variable by M. P. Keane
21. A Perspective on Application of Bootstrap Methods in Econometrics by J. Jeong and G. S. Maddala
22. Stochastic Simulations for Inference in Nonlinear Errors-in-Variables Models by R. S. Mariano and B. W. Brown
23. Bootstrap Methods: Applications in Econometrics by H. D. Vinod
24. Identifying Outliers and Influential Observations in Econometric Models by S. G. Donald and G. S. Maddala
25. Statistical Aspects of Calibration in Macroeconomics by A. W. Gregory and G. W. Smith


26. Panel Data Models with Rational Expectations by K. Lahiri
27. Continuous Time Financial Models: Statistical Applications of Stochastic Processes by K. R. Sawyer

Volume 12. Environmental Statistics
Edited by G. P. Patil and C. R. Rao
1994 xix + 927 pp.

1. Environmetrics: An Emerging Science by J. S. Hunter
2. A National Center for Statistical Ecology and Environmental Statistics: A Center Without Walls by G. P. Patil
3. Replicate Measurements for Data Quality and Environmental Modeling by W. Liggett
4. Design and Analysis of Composite Sampling Procedures: A Review by G. Lovison, S. D. Gore and G. P. Patil
5. Ranked Set Sampling by G. P. Patil, A. K. Sinha and C. Taillie
6. Environmental Adaptive Sampling by G. A. F. Seber and S. K. Thompson
7. Statistical Analysis of Censored Environmental Data by M. Akritas, T. Ruscitti and G. P. Patil
8. Biological Monitoring: Statistical Issues and Models by E. P. Smith
9. Environmental Sampling and Monitoring by S. V. Stehman and W. Scott Overton
10. Ecological Statistics by B. F. J. Manly
11. Forest Biometrics by H. E. Burkhart and T. G. Gregoire
12. Ecological Diversity and Forest Management by J. H. Gove, G. P. Patil, B. F. Swindel and C. Taillie
13. Ornithological Statistics by P. M. North
14. Statistical Methods in Developmental Toxicology by P. J. Catalano and L. M. Ryan
15. Environmental Biometry: Assessing Impacts of Environmental Stimuli Via Animal and Microbial Laboratory Studies by W. W. Piegorsch
16. Stochasticity in Deterministic Models by J. J. M. Bedaux and S. A. L. M. Kooijman
17. Compartmental Models of Ecological and Environmental Systems by J. H. Matis and T. E. Wehrly
18. Environmental Remote Sensing and Geographic Information Systems-Based Modeling by W. L. Myers
19. Regression Analysis of Spatially Correlated Data: The Kanawha County Health Study by C. A. Donnelly, J. H. Ware and N. M. Laird
20. Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications by P. Guttorp and P. D. Sampson


21. Meta-analysis in Environmental Statistics by V. Hasselblad
22. Statistical Methods in Atmospheric Science by A. R. Solow
23. Statistics with Agricultural Pests and Environmental Impacts by L. J. Young and J. H. Young
24. A Crystal Cube for Coastal and Estuarine Degradation: Selection of Endpoints and Development of Indices for Use in Decision Making by M. T. Boswell, J. S. O'Connor and G. P. Patil
25. How Does Scientific Information in General and Statistical Information in Particular Input to the Environmental Regulatory Process? by C. R. Cothern
26. Environmental Regulatory Statistics by C. B. Davis
27. An Overview of Statistical Issues Related to Environmental Cleanup by R. Gilbert
28. Environmental Risk Estimation and Policy Decisions by H. Lacayo Jr.

Volume 13. Design and Analysis of Experiments
Edited by S. Ghosh and C. R. Rao
1996 xviii + 1230 pp.

1. The Design and Analysis of Clinical Trials by P. Armitage
2. Clinical Trials in Drug Development: Some Statistical Issues by H. I. Patel
3. Optimal Crossover Designs by J. Stufken
4. Design and Analysis of Experiments: Nonparametric Methods with Applications to Clinical Trials by P. K. Sen
5. Adaptive Designs for Parametric Models by S. Zacks
6. Observational Studies and Nonrandomized Experiments by P. R. Rosenbaum
7. Robust Design: Experiments for Improving Quality by D. M. Steinberg
8. Analysis of Location and Dispersion Effects from Factorial Experiments with a Circular Response by C. M. Anderson
9. Computer Experiments by J. R. Koehler and A. B. Owen
10. A Critique of Some Aspects of Experimental Design by J. N. Srivastava
11. Response Surface Designs by N. R. Draper and D. K. J. Lin
12. Multiresponse Surface Methodology by A. I. Khuri
13. Sequential Assembly of Fractions in Factorial Experiments by S. Ghosh
14. Designs for Nonlinear and Generalized Linear Models by A. C. Atkinson and L. M. Haines
15. Spatial Experimental Design by R. J. Martin
16. Design of Spatial Experiments: Model Fitting and Prediction by V. V. Fedorov
17. Design of Experiments with Selection and Ranking Goals by S. S. Gupta and S. Panchapakesan


18. Multiple Comparisons by A. C. Tamhane
19. Nonparametric Methods in Design and Analysis of Experiments by E. Brunner and M. L. Puri
20. Nonparametric Analysis of Experiments by A. M. Dean and D. A. Wolfe
21. Block and Other Designs in Agriculture by D. J. Street
22. Block Designs: Their Combinatorial and Statistical Properties by T. Calinski and S. Kageyama
23. Developments in Incomplete Block Designs for Parallel Line Bioassays by S. Gupta and R. Mukerjee
24. Row-Column Designs by K. R. Shah and B. K. Sinha
25. Nested Designs by J. P. Morgan
26. Optimal Design: Exact Theory by C. S. Cheng
27. Optimal and Efficient Treatment-Control Designs by D. Majumdar
28. Model Robust Designs by Y-J. Chang and W. I. Notz
29. Review of Optimal Bayes Designs by A. DasGupta
30. Approximate Designs for Polynomial Regression: Invariance, Admissibility, and Optimality by N. Gaffke and B. Heiligers
