A GUIDE TO ECONOMETRICS

PETER KENNEDY
Simon Fraser University

The MIT Press
Cambridge, Massachusetts

© 1998 Peter Kennedy

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying and information storage and retrieval) without permission in writing from the publisher.

Printed and bound in the United Kingdom.

ISBN 0-262-11235-3 (hardcover); 0-262-61180-6 (paperback)

CONTENTS

1 Introduction
   1.1 What is Econometrics?
   1.2 The Disturbance Term
   1.3 Estimates and Estimators
   1.4 Good and Preferred Estimators
   General Notes
   Technical Notes

2 Criteria for Estimators
   2.1 Introduction
   2.2 Computational Cost
   2.3 Least Squares
   2.4 Highest R²
   2.5 Unbiasedness
   2.6 Efficiency
   2.7 Mean Square Error (MSE)
   2.8 Asymptotic Properties
   2.9 Maximum Likelihood
   2.10 Monte Carlo Studies
   2.11 Adding Up
   General Notes
   Technical Notes

3 The Classical Linear Regression Model
   3.1 Textbooks as Catalogs
   3.2 The Five Assumptions
   3.3 The OLS Estimator in the CLR Model
   General Notes
   Technical Notes

4 Interval Estimation and Hypothesis Testing
   4.1 Introduction
   4.2 Testing a Single Hypothesis: the t Test
   4.3 Testing a Joint Hypothesis: the F Test
   4.4 Interval Estimation for a Parameter Vector
   4.5 LR, W, and LM Statistics
   4.6 Bootstrapping
   General Notes
   Technical Notes

5 Specification
   5.1 Introduction
   5.2 Three Methodologies
   5.3 General Principles for Specification
   5.4 Misspecification Tests/Diagnostics
   5.5 R² Again
   General Notes
   Technical Notes

6 Violating Assumption One: Wrong Regressors, Nonlinearities, and Parameter Inconstancy
   6.1 Introduction
   6.2 Incorrect Set of Independent Variables
   6.3 Nonlinearity
   6.4 Changing Parameter Values
   General Notes
   Technical Notes

7 Violating Assumption Two: Nonzero Expected Disturbance
   General Notes

8 Violating Assumption Three: Nonspherical Disturbances
   8.1 Introduction
   8.2 Consequences of Violation
   8.3 Heteroskedasticity
   8.4 Autocorrelated Disturbances
   General Notes
   Technical Notes

9 Violating Assumption Four: Measurement Errors and Autoregression
   9.1 Introduction
   9.2 Instrumental Variable Estimation
   9.3 Errors in Variables
   9.4 Autoregression
   General Notes
   Technical Notes

10 Violating Assumption Four: Simultaneous Equations
   10.1 Introduction
   10.2 Identification
   10.3 Single-equation Methods
   10.4 Systems Methods
   10.5 VARs
   General Notes
   Technical Notes

11 Violating Assumption Five: Multicollinearity
   11.1 Introduction
   11.2 Consequences
   11.3 Detecting Multicollinearity
   11.4 What to Do
   General Notes
   Technical Notes

12 Incorporating Extraneous Information
   12.1 Introduction
   12.2 Exact Restrictions
   12.3 Stochastic Restrictions
   12.4 Pre-test Estimators
   12.5 Extraneous Information and MSE
   General Notes
   Technical Notes

13 The Bayesian Approach
   13.1 Introduction
   13.2 What is a Bayesian Analysis?
   13.3 Advantages of the Bayesian Approach
   13.4 Overcoming Practitioners' Complaints
   General Notes
   Technical Notes

14 Dummy Variables
   14.1 Introduction
   14.2 Interpretation
   14.3 Adding Another Qualitative Variable
   14.4 Interacting with Quantitative Variables
   14.5 Observation-specific Dummies
   14.6 Fixed and Random Effects Models
   General Notes
   Technical Notes

15 Qualitative Dependent Variables
   15.1 Dichotomous Dependent Variables
   15.2 Polychotomous Dependent Variables
   15.3 Ordered Logit/Probit
   15.4 Count Data
   General Notes
   Technical Notes

16 Limited Dependent Variables
   16.1 Introduction
   16.2 The Tobit Model
   16.3 Sample Selection
   16.4 Duration Models
   General Notes
   Technical Notes

17 Time Series Econometrics
   17.1 Introduction
   17.2 ARIMA Models
   17.3 SEMTSA
   17.4 Error-correction Models
   17.5 Testing for Unit Roots
   17.6 Cointegration
   General Notes
   Technical Notes

18 Forecasting
   18.1 Introduction
   18.2 Causal Forecasting/Econometric Models
   18.3 Time Series Analysis
   18.4 Forecasting Accuracy
   General Notes
   Technical Notes

19 Robust Estimation
   19.1 Introduction
   19.2 Outliers and Influential Observations
   19.3 Robust Estimators
   19.4 Non-parametric Estimation
   General Notes
   Technical Notes

Appendix A: Sampling Distributions, the Foundation of Statistics
Appendix B: All About Variance
Appendix C: A Primer on Asymptotics
Appendix D: Exercises
Appendix E: Answers to Even-numbered Questions
Glossary
Bibliography
Author Index
Subject Index

PREFACE

In the preface to the third edition of this book I noted that upper-level undergraduate and beginning graduate students are as likely to learn about this book from their instructor as by word-of-mouth, the phenomenon that made the first edition of this book so successful. Sales of the third edition indicate that this trend has continued: more and more instructors are realizing that students find this book to be of immense value to their understanding of econometrics.

What is it about this book that students have found to be of such value? This book supplements econometrics texts, at all levels, by providing an overview of the subject and an intuitive feel for its concepts and techniques, without the usual clutter of notation and technical detail that necessarily characterizes an econometrics textbook. It is often said of econometrics textbooks that their readers miss the forest for the trees. This is inevitable: the terminology and techniques that must be taught do not allow the text to convey a proper intuitive sense of "What's it all about?" and "How does it all fit together?" All econometrics textbooks fail to provide this overview. This is not from lack of trying; most textbooks have excellent passages containing the relevant insights and interpretations. They make good sense to instructors, but they do not make the expected impact on the students. Why? Because these insights and interpretations are broken up, appearing throughout the book, mixed with the technical details.
In their struggle to keep up with notation and to learn these technical details, students miss the overview so essential to a real understanding of those details. This book provides students with a perspective from which it is possible to assimilate more easily the details of these textbooks.

Although the changes from the third edition are numerous, the basic structure and flavor of the book remain unchanged. Following an introductory chapter, the second chapter discusses at some length the criteria for choosing estimators, and in doing so develops many of the basic concepts used throughout the book. The third chapter provides an overview of the subject matter, presenting the five assumptions of the classical linear regression model and explaining how most problems encountered in econometrics can be interpreted as a violation of one of these assumptions. The fourth chapter exposits some concepts of inference, and the fifth turns to the specification of an econometric model, setting the stage for the next six chapters, each of which deals with violations of an assumption of the classical linear regression model, describes their implications, discusses relevant tests, and suggests means of resolving resulting estimation problems. The remaining eight chapters and Appendices A, B and C address selected topics. Appendix D provides some student exercises and Appendix E offers suggested answers to the even-numbered exercises. A set of suggested answers to the odd-numbered questions is available from the publisher upon request to instructors adopting this book for classroom use.

There are several major changes in this edition. The chapter on qualitative and limited dependent variables was split into a chapter on qualitative dependent variables (adding a section on count data) and a chapter on limited dependent variables (adding a section on duration models). The time series chapter has been extensively revised to incorporate the huge amount of work done in this area since the third edition. A new appendix on the sampling distribution concept has been added, to deal with what I believe is students' biggest stumbling block to understanding econometrics. In the exercises, a new type of question has been added, in which a Monte Carlo study is described and students are asked to explain the expected results. New material has been added to a wide variety of topics such as bootstrapping, generalized method of moments, neural nets, VARs, and instrumental variables. Minor changes have been made throughout to update results and references, and to improve exposition.

To minimize readers' distractions, there are no footnotes. All references, peripheral points and details worthy of comment are relegated to a section at the end of each chapter entitled "General Notes". The technical material that appears in the book is placed in end-of-chapter sections entitled "Technical Notes". This technical material continues to be presented in a way that supplements rather than duplicates the contents of traditional textbooks. Students should find that this material provides a useful introductory bridge to the more sophisticated presentations found in the main text. Students are advised to wait until a second or third reading of the body of a chapter before addressing the material in the General or Technical Notes. A glossary explains common econometric terms not found in the body of this book.
Errors in or shortcomings of this book are my responsibility, but for improvements I owe many debts, mainly to scores of students, both graduate and undergraduate, whose comments and reactions have played a prominent role in shaping this fourth edition. Jan Kmenta and Terry Seaks have made major contributions in their role as "anonymous" referees, even though I have not always followed their advice. I continue to be grateful to students throughout the world who have expressed thanks to me for writing this book; I hope this fourth edition continues to be of value to students both during and after their formal course work.

DEDICATION

To Anna and Red, who, until they discovered what an econometrician was, were very impressed that their son might become one. With apologies to K. A. C. Manderville, I draw their attention to the following, adapted from The Undoing of Lamia Gurdleneck.

"You haven't told me yet," said Lady Nuttal, "what it is your fiancé does for a living."

"He's an econometrician," replied Lamia, with an annoying sense of being on the defensive.

Lady Nuttal was obviously taken aback. It had not occurred to her that econometricians entered into normal social relationships. The species, she would have surmised, was perpetuated in some collateral manner, like mules.

"But Aunt Sara, it's a very interesting profession," said Lamia warmly.

"I don't doubt it," said her aunt, who obviously doubted it very much. "To express anything important in mere figures is so plainly impossible that there must be endless scope for well-paid advice on how to do it. But don't you think that life with an econometrician would be rather, shall we say, humdrum?"

Lamia was silent. She felt reluctant to discuss the surprising depth of emotional possibility which she had discovered below Edward's numerical veneer.

"It's not the figures themselves," she said finally, "it's what you do with them that matters."

1 INTRODUCTION

1.1 WHAT IS ECONOMETRICS?

Strange as it may seem, there does not exist a generally accepted answer to this question. Responses vary from the silly "Econometrics is what econometricians do" to the staid "Econometrics is the study of the application of statistical methods to the analysis of economic phenomena," with sufficient disagreements to warrant an entire journal article devoted to this question (Tintner, 1953).

This confusion stems from the fact that econometricians wear many different hats. First, and foremost, they are economists, capable of utilizing economic theory to improve their empirical analyses of the problems they address. At times they are mathematicians, formulating economic theory in ways that make it appropriate for statistical testing. At times they are accountants, concerned with the problem of finding and collecting economic data and relating theoretical economic variables to observable ones. At times they are applied statisticians, spending hours with the computer trying to estimate economic relationships or predict economic events. And at times they are theoretical statisticians, applying their skills to the development of statistical techniques appropriate to the empirical problems characterizing the science of economics. It is to the last of these roles that the term "econometric theory" applies, and it is on this aspect of econometrics that most textbooks on the subject focus.
This guide is accordingly devoted to this "econometric theory" dimension of econometrics, discussing the empirical problems typical of economics and the statistical techniques used to overcome these problems.

What distinguishes an econometrician from a statistician is the former's preoccupation with problems caused by violations of statisticians' standard assumptions; owing to the nature of economic relationships and the lack of controlled experimentation, these assumptions are seldom met. Patching up statistical methods to deal with situations frequently encountered in empirical work in economics has created a large battery of extremely sophisticated statistical techniques. In fact, econometricians are often accused of using sledgehammers to crack open peanuts while turning a blind eye to data deficiencies and the many questionable assumptions required for the successful application of these techniques:

Econometric theory is like an exquisitely balanced French recipe, spelling out precisely with how many turns to mix the sauce, how many carats of spice to add, and for how many milliseconds to bake the mixture at exactly 474 degrees of temperature. But when the statistical cook turns to raw materials, he finds that hearts of cactus fruit are unavailable, so he substitutes chunks of cantaloupe; where the recipe calls for vermicelli he uses shredded wheat; and he substitutes green garment dye for curry, ping-pong balls for turtle's eggs, and, for Chalifougnac vintage 1883, a can of turpentine. (Valavanis, 1959)

How has this state of affairs come about? One reason is that prestige in the econometrics profession hinges on technical expertise rather than on the hard work required to collect good data:

It is the preparation skill of the econometric chef that catches the professional eye, not the quality of the raw materials in the meal, or the effort that went into procuring them. (Griliches, 1994, p. 14)

Criticisms of econometrics along these lines are not uncommon. Rebuttals cite improvements in data collection, extol the fruits of the computer revolution and provide examples of improvements in estimation due to advanced techniques. It remains a fact, though, that in practice good results depend as much on the input of sound and imaginative economic theory as on the application of correct statistical methods. The skill of the econometrician lies in judiciously mixing these two essential ingredients; in the words of Malinvaud:

The art of the econometrician consists in finding the set of assumptions which are both sufficiently specific and sufficiently realistic to allow him to take the best possible advantage of the data available to him. (Malinvaud, 1966, p. 514)

Modern econometrics texts try to infuse this art into students by providing a large number of detailed examples of empirical application. This important dimension of econometrics texts lies beyond the scope of this book. Readers should keep this in mind as they use this guide to improve their understanding of the purely statistical methods of econometrics.

1.2 THE DISTURBANCE TERM

A major distinction between economists and econometricians is the latter's concern with disturbance terms. An economist will specify, for example, that consumption is a function of income, and write C = f(Y), where C is consumption and Y is income. An econometrician will claim that this relationship must also include a disturbance (or error) term, and may alter the equation to read C = f(Y) + ε, where ε (epsilon) is a disturbance term. Without the disturbance term the relationship is said to be exact or deterministic; with the disturbance term it is said to be stochastic.

The word "stochastic" comes from the Greek "stokhos," meaning a target or bull's eye.
A stochastic relationship is not always right on target in the sense that it predicts the precise value of the variable being explained, just as a dart thrown at a target seldom hits the bull's eye. The disturbance term is used to capture explicitly the size of these "misses" or "errors." The existence of the disturbance term is justified in three main ways. (Note: these are not mutually exclusive.)

(1) Omission of the influence of innumerable chance events  Although income might be the major determinant of the level of consumption, it is not the only determinant. Other variables, such as the interest rate or liquid asset holdings, may have a systematic influence on consumption. Their omission constitutes one type of specification error: the nature of the economic relationship is not correctly specified. In addition to these systematic influences, however, are innumerable less systematic influences, such as weather variations, taste changes, earthquakes, epidemics and postal strikes. Although some of these variables may have a significant impact on consumption, and thus should definitely be included in the specified relationship, many have only a very slight, irregular influence; the disturbance is often viewed as representing the net influence of a large number of such small and independent causes.

(2) Measurement error  It may be the case that the variable being explained cannot be measured accurately, either because of data collection difficulties or because it is inherently unmeasurable and a proxy variable must be used in its stead. The disturbance term can in these circumstances be thought of as representing this measurement error. Errors in measuring the explaining variable(s) (as opposed to the variable being explained) create a serious econometric problem, discussed in chapter 9. The terminology errors in variables is also used to refer to measurement errors.

(3) Human indeterminacy  Some people believe that human behavior is such that actions taken under identical circumstances will differ in a random way. The disturbance term can be thought of as representing this inherent randomness in human behavior.

Associated with any explanatory relationship are unknown constants, called parameters, which tie the relevant variables into an equation. For example, the relationship between consumption and income could be specified as

C = β₁ + β₂Y + ε

where β₁ and β₂ are the parameters characterizing this consumption function. Economists are often keenly interested in learning the values of these unknown parameters. The existence of the disturbance term, coupled with the fact that its magnitude is unknown, makes calculation of these parameter values impossible. Instead, they must be estimated. It is on this task, the estimation of parameter values, that the bulk of econometric theory focuses. The success of econometricians' methods of estimating parameter values depends in large part on the nature of the disturbance term; statistical assumptions concerning the characteristics of the disturbance term, and means of testing these assumptions, therefore play a prominent role in econometric theory.
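As an illustrative sketch of the disturbance term at work (the parameter values, the income figures and the normal disturbance below are arbitrary assumptions, chosen only for illustration), the following code generates consumption observations from a stochastic version of the consumption function above; each observation misses the deterministic line by exactly the amount of its disturbance.

import numpy as np

rng = np.random.default_rng(0)
beta1, beta2 = 20.0, 0.8                 # hypothetical "true" parameter values
Y = rng.uniform(100, 200, size=50)       # hypothetical income observations

deterministic = beta1 + beta2 * Y        # exact relationship C = f(Y)
epsilon = rng.normal(0, 5, size=50)      # disturbance: net effect of omitted influences
C = deterministic + epsilon              # stochastic relationship C = f(Y) + epsilon

print(C[:5] - deterministic[:5])         # the "misses" captured by the disturbance term

In practice, of course, only C and Y would be observed; β₁, β₂ and the disturbances would be unknown.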
1.3 ESTIMATES AND ESTIMATORS

In their mathematical notation, econometricians usually employ Greek letters to represent the true, unknown values of parameters. The Greek letter most often used in this context is beta (β). Thus, throughout this book, β is the parameter value that the econometrician is seeking to learn. Of course, no one ever actually learns the value of β, but it can be estimated: via statistical techniques, empirical data can be used to take an educated guess at β. In any particular application, an estimate of β is simply a number.

In general, however, econometricians are seldom interested in estimating a single parameter; economic relationships are usually sufficiently complex to require more than one parameter, and because these parameters occur in the same relationship, better estimates of these parameters are obtained if they are estimated together (i.e., the influence of one explaining variable is more accurately captured if the influence of the other explaining variables is simultaneously accounted for). As a result, β seldom refers to a single parameter value; it almost always refers to a set of parameter values, individually called β₁, β₂, ..., βₖ, where k, the dimension of β, is the number of different parameters in the relationship being estimated. β is then referred to as a vector and is often written as β = [β₁, β₂, ..., βₖ]′.

Econometric theory focuses not on the estimate itself, but on the estimator: the formula or "recipe" by which the data are transformed into an actual estimate. The reason for this is that the justification of an estimate rests on the justification of the estimation method (the estimator). The econometrician has no way of knowing the actual values of the disturbances inherent in a sample of data; depending on these values, the estimate calculated from that sample could be quite inaccurate. It is therefore impossible to justify the estimate itself. However, it may be the case that the econometrician can justify the estimator by showing, for example, that the estimator "usually" produces an estimate that is "quite close" to the true parameter value regardless of the particular sample chosen. (The meaning of this sentence, in particular the meaning of "usually" and of "quite close," is discussed at length in the next chapter.) Thus an estimate of β from a particular sample is defended by justifying the estimator.

Because attention is focused on estimators of β, a convenient way of denoting those estimators is required. An easy way of doing this is to place a mark over the β or a superscript on it. Thus β̂ (beta-hat) and β* (beta-star) are often used to denote estimators of beta. One estimator, the ordinary least squares (OLS) estimator, is very popular in econometrics; the notation β^OLS is used throughout this book to represent it. Alternative estimators are denoted by β̂, β*, or something similar. Many textbooks use the letter b to denote the OLS estimator.

1.4 GOOD AND PREFERRED ESTIMATORS

Any fool can produce an estimator of β, since literally an infinite number of them exists; i.e., there exists an infinite number of different ways in which a sample of data can be used to produce an estimate of β, all but a few of these ways producing "bad" estimates. What distinguishes an econometrician is the ability to produce "good" estimators, which in turn produce "good" estimates. One of these "good" estimators could be chosen as the "best" or "preferred" estimator and be used to generate the "preferred" estimate of β. What further distinguishes an econometrician is the ability to provide "good" estimators in a variety of different estimating contexts. The set of "good" estimators (and the choice of "preferred" estimator) is not the same in all estimating problems. In fact, a "good" estimator in one estimating situation could be a "bad" estimator in another situation.

In any particular application, the choice of a "preferred" estimator depends upon the subjective values of the person doing the estimating: it is value judgements that determine which of the criteria for a good estimator is considered the most important, and this value judgement may well be influenced by the purpose for which the estimate is sought, in addition to the subjective prejudices of the individual.

Clearly, our investigation of the subject of econometrics can go no further until the possible criteria for a "good" estimator are discussed. This is the purpose of the next chapter.
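The distinction between an estimator (a recipe) and an estimate (a number) can be made concrete with a short sketch (illustrative only; the data-generating values below are arbitrary assumptions). Two different recipes applied to the same sample produce two different estimates of the same unknown β; defending either number requires arguing about the properties of the recipe that produced it.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=30)   # sample generated with slope beta = 0.5

def ols_slope(x, y):
    # least squares estimator of the slope
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

def endpoint_slope(x, y):
    # a cruder estimator: slope of the line through the two extreme-x observations
    i, j = np.argmin(x), np.argmax(x)
    return (y[j] - y[i]) / (x[j] - x[i])

print(ols_slope(x, y), endpoint_slope(x, y))    # two different estimates of the same beta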
GENERAL NOTES

1.1 What is Econometrics?

• Econometrics first came into prominence with the formation in the early 1930s of the Econometric Society and the founding of the journal Econometrica. The introduction of Dowling and Glahe (1970) surveys briefly the landmark publications in econometrics. Pesaran (1987) is a concise history and overview of econometrics. Hendry and Morgan (1995) is a collection of papers of historical importance in the development of econometrics. Epstein (1987), Morgan (1990) and Qin (1993) are extended histories; see also Morgan (1990). Hendry (1980) notes that the word econometrics should not be confused with "economystics," "economic-tricks," or "icon-ometrics."

• The discipline of econometrics has grown so rapidly, and in so many different directions, that disagreement regarding the definition of econometrics has grown rather than diminished over the past decade. Reflecting this, at least one prominent econometrician, Goldberger (1989, p. 151), has concluded that "nowadays my definition would be that econometrics is what econometricians do." One thing that econometricians do that is not discussed in this book is serve as expert witnesses in court cases. Fisher (1986) has an interesting account of this dimension of econometric work. Judge et al. (1988, p. 81) remind readers that "econometrics is fun!"

• A distinguishing feature of econometrics is that it focuses on ways of dealing with data that are awkward or dirty because they were not produced by controlled experiments. In recent years, however, controlled experimentation in economics has become more common. Burtless (1995) summarizes the nature of such experimentation and argues for its continued use. Heckman and Smith (1995) is a strong defense of using traditional data sources. Much of this argument is associated with the selection bias phenomenon (discussed in chapter 16): people in an experimental program inevitably are not a random selection of all people, particularly with respect to their unmeasured attributes, and so results from the experiment are compromised. Friedman and Sunder (1994) is a primer on conducting economic experiments. Meyer (1995) discusses the attributes of "natural" experiments in economics.

• Mayer (1993, chapter 10), Summers (1991), Brunner (1973), Rubner (1970) and Streissler (1970) are good sources of cynical views of econometrics, summed up dramatically by McCloskey (1994, p. 350): "... most allegedly empirical research in economics is unbelievable, uninteresting or both." More comments appear in this book in chapter 9 on errors in variables and chapter 18 on prediction. Fair (1972) and Fromm and Schink (1973) are examples of studies defending the use of sophisticated econometric techniques. The use of econometrics in the policy context has been hampered by the (inexplicable?) operation of "Goodhart's Law" (1978), namely that all econometric models break down when used for policy. The finding of Dewald et al. (1986), that there is a remarkably high incidence of inability to replicate empirical studies in economics, does not promote a favorable view of econometricians.

• What has been the contribution of econometrics to the development of economic science? Some would argue that empirical work frequently uncovers empirical regularities which inspire theoretical advances. For example, the difference between time-series and cross-sectional estimates of the MPC prompted development of the relative, permanent and life-cycle consumption theories. But many others view econometrics with scorn, as evidenced by the following quotes:
We don't genuinely take empirical work seriously in economics. It's not the source by which economists accumulate their opinions, by and large. (Leamer, in Hendry et al., 1990)

Very little of what economists will tell you they know, and almost none of the content of the elementary text, has been discovered by running regressions. Regressions on government-collected data have been used mainly to bolster one theoretical argument over another. But the bolstering they provide is weak, inconclusive, and easily countered by someone else's regressions. (Bergmann, 1987, p. 192)

No economic theory was ever abandoned because it was rejected by some empirical econometric test, nor was a clear cut decision between competing theories made in light of the evidence of such a test. (Spanos, 1986, p. 660)

This reflects the belief that economic data are not powerful enough to test and choose among theories, and that as a result econometrics has shifted from being a tool for testing theories to being a tool for exhibiting or displaying theories. Because economics is a non-experimental science, often the data are weak, and because of this the empirical evidence provided by econometrics is frequently inconclusive; in such cases it should be qualified as such. Griliches (1986) comments at length on the role of data in econometrics, and notes that they are improving; Aigner (1988) stresses the potential role of improved data.

• Critics might choose to paraphrase the Malinvaud quote as "The art of drawing a crooked line from an unproved assumption to a foregone conclusion." The importance of a proper understanding of econometric techniques in the face of a potential inferiority of econometrics to inspired economic theorizing is captured nicely by Samuelson (1965): "Even if a scientific regularity were less accurate than the intuitive hunches of a virtuoso, the fact that it can be put into operation by thousands of people who are not virtuosos gives it a transcendental importance." This is reassuring for those of us who are not virtuosos!

• Feminist economists have complained that traditional econometrics contains a male bias. They urge econometricians to broaden their teaching and research methodology to encompass the collection of primary data of different types, such as survey or interview data, and the use of qualitative studies which are not based on the exclusive use of "objective" data. See MacDonald (1995) and Nelson (1995). King, Keohane and Verba (1994) discuss how research using qualitative studies can meet traditional scientific standards.

• Several books focus on the empirical applications dimension of econometrics. Some recent examples are Thomas (1993), Berndt (1991) and Lott and Ray (1992). Manski (1991, p. 49) notes that "in the past, advances in econometrics were usually motivated by a desire to answer specific empirical questions. This symbiosis of theory and practice is less common today." He laments the distancing of methodological research from its applied roots.

1.2 The Disturbance Term

• The error term associated with a relationship need not necessarily be additive, as it is in the example cited. For some nonlinear functions it is often convenient to specify the error term in a multiplicative form. In other instances it may be appropriate to build the stochastic element into the relationship by specifying the parameters to be random variables rather than constants. (This is called the random-coefficients model.)

• Some econometricians prefer to define the relationship between C and Y discussed earlier as "the mean of C conditional on Y is f(Y)," written as E(C|Y) = f(Y). This spells out more explicitly what econometricians have in mind when using this specification.

• In terms of the throwing-darts-at-a-target analogy, characterizing disturbance terms refers to describing the nature of the misses: are the darts distributed uniformly around the bull's eye?
Is the average miss large or small? Does the average miss depend on who is throwing the darts? Is a miss to the right likely to be followed by another miss to the right? In later chapters the statistical specification of these characteristics, and the related terminology (such as "homoskedasticity" and "autocorrelated errors"), are explained in considerable detail.

1.3 Estimates and Estimators

• An estimator is simply an algebraic function of a potential sample of data; once the sample is drawn, this function creates an actual numerical estimate.

• Chapter 2 discusses in detail the means whereby an estimator is "justified" and compared with alternative estimators.

1.4 Good and Preferred Estimators

• The terminology "preferred" estimator is used instead of the term "best" estimator because the latter has a specific meaning in econometrics. This is explained in chapter 2.

• Estimation of parameter values is not the only purpose of econometrics. Two other major themes can be identified: testing of hypotheses and economic forecasting. Because both these problems are intimately related to the estimation of parameter values, it is not misleading to characterize econometrics as being primarily concerned with parameter estimation.

TECHNICAL NOTES

1.1 What is Econometrics?

• In the macroeconomic context, in particular in research on real business cycles, a computational simulation procedure called calibration is often employed as an alternative to traditional econometric analysis. In this procedure economic theory plays a much more prominent role than usual, supplying the ingredients to a general equilibrium model designed to address a specific economic question. This model is then "calibrated" by setting parameter values equal to average values of economic ratios known not to have changed much over time, or equal to empirical estimates from microeconomic studies. A computer simulation produces output from the model, with adjustments to model and parameters made until the output from these simulations has the desired qualitative characteristics.

• See Goldberger (1968b) for an interpretation of the OLS estimator in these terms. Manski (1988) gives a more complete treatment. This approach is sometimes called the method of moments because it implies that a moment of the population distribution should be estimated by the corresponding moment of the sample. See the technical notes.

• Nearness/concentration  Some estimators have infinite variances and for that reason are often dismissed. With this in mind, Fiebig (1985) suggests using as a criterion the probability of nearness (prefer β̂ to β* if prob(|β̂ − β| ≤ |β* − β|) ≥ 0.5) or the probability of concentration (prefer β̂ to β* if prob(|β̂ − β| ≤ δ) ≥ prob(|β* − β| ≤ δ)).

• Two good introductory references for the material of this chapter are Kmenta (1986, pp. 9-16, 97-108, 156-72) and Kane (1968, chapter 8).

TECHNICAL NOTES

2.5 Unbiasedness

• The expected value of a variable x is defined formally as E(x) = ∫ x f(x) dx, where f is the probability density function (sampling distribution) of x. Thus E(x) can be viewed as a weighted average of all possible values of x, where the weights are proportional to the heights of the density function (sampling distribution) of x.
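The expected value of an estimator can also be approximated by brute force: generate many samples, compute the estimate in each, and average. The following sketch (illustrative; the population values and the 0.9 shrinkage factor are arbitrary assumptions) does this for the sample mean and for a deliberately biased alternative.

import numpy as np

rng = np.random.default_rng(2)
mu = 4.0                                        # "true" population mean, unknown in practice

mean_estimates, shrunk_estimates = [], []
for _ in range(20000):                          # repeated samples trace out the sampling distributions
    sample = rng.normal(mu, 3.0, size=25)
    mean_estimates.append(sample.mean())        # unbiased estimator
    shrunk_estimates.append(0.9 * sample.mean())    # biased alternative

print(np.mean(mean_estimates))                  # approximately 4.0: expected value equals mu
print(np.mean(shrunk_estimates))                # approximately 3.6: expected value differs from mu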
2.6 Efficiency

• In this author's experience, student assessment of sampling distributions is hindered, more than anything else, by confusion about how to calculate an estimator's variance. This confusion arises for several reasons.

(1) There is a crucial difference between a variance and an estimated variance, something that often is not well understood.
(2) Many instructors assume that some variance formulas are "common knowledge," retained from previous courses.
(3) It is frequently not apparent that the derivations of variance formulas all follow a generic form.
(4) Students are expected to recognize that some formulas are special cases of more general formulas.
(5) Discussions of variances and appropriate formulas are seldom gathered together in one place for easy reference.

Appendix B has been included at the end of this book to alleviate this confusion, supplementing the material in these technical notes.

• In our discussion of unbiasedness, no confusion could arise from β being multidimensional: an estimator's expected value is either equal to β (in every dimension) or it is not. But in the case of the variance of an estimator, confusion could arise. An estimator β* that is k-dimensional really consists of k different estimators, one for each dimension of β. These k different estimators all have their own variances. If all k of the variances associated with the estimator β* are smaller than their respective counterparts of the estimator β̂, then it is clear that the variance of β* can be considered smaller than the variance of β̂.

For example, if β is two-dimensional, consisting of two separate parameters β₁ and β₂, an estimator β* would consist of two estimators β₁* and β₂*. If β* were an unbiased estimator of β, β₁* would be an unbiased estimator of β₁, and β₂* would be an unbiased estimator of β₂. The estimators β₁* and β₂* would each have variances. Suppose their variances were 3.1 and 7.4, respectively. Now suppose β̂, consisting of β̂₁ and β̂₂, is another unbiased estimator, where β̂₁ and β̂₂ have variances 5.6 and 8.3, respectively. In this example, since the variance of β₁* is less than the variance of β̂₁ and the variance of β₂* is less than the variance of β̂₂, it is clear that the "variance" of β* is less than the variance of β̂. But what if the variance of β̂₂ were 6.8 instead of 8.3? Then it is not clear which "variance" is smallest.

• An additional complication exists in comparing the variances of estimators of a multidimensional β. There may exist nonzero covariances between the estimators of the separate components of β. For example, a positive covariance between β̂₁ and β̂₂ implies that, whenever β̂₁ overestimates β₁, there is a tendency for β̂₂ to overestimate β₂, making the complete estimate of β worse than would be the case were this covariance zero. Comparison of the "variances" of multidimensional estimators should therefore somehow account for this covariance phenomenon.

• The "variance" of a multidimensional estimator is called a variance-covariance matrix. If β* is an estimator of a k-dimensional β, then the variance-covariance matrix of β*, denoted by V(β*), is defined as a k × k matrix (a table with k entries in each direction) containing the variances of the k elements of β* along the diagonal and the covariances in the off-diagonal positions. Thus

V(β*) = [ V(β₁*), C(β₁*, β₂*), ..., C(β₁*, βₖ*); C(β₂*, β₁*), V(β₂*), ...; ...; C(βₖ*, β₁*), ..., V(βₖ*) ]

where V(βᵢ*) is the variance of the ith element of β* and C(βᵢ*, βⱼ*) is the covariance between βᵢ* and βⱼ*. All this variance-covariance matrix does is array the relevant variances and covariances in a table. Once this is done, the econometrician can draw on mathematicians' knowledge of matrix algebra to suggest ways in which the variance-covariance matrix of one unbiased estimator could be considered "smaller" than the variance-covariance matrix of another unbiased estimator.
Consider four alternative ways of measuring smallness among variance-covariance matrices, all accomplished by transforming the matrices into single numbers and then comparing those numbers:

(1) choose the unbiased estimator whose variance-covariance matrix has the smallest trace (sum of diagonal elements);
(2) choose the unbiased estimator whose variance-covariance matrix has the smallest determinant;
(3) choose the unbiased estimator for which any given linear combination of its elements has the smallest variance;
(4) choose the unbiased estimator whose variance-covariance matrix minimizes a risk function consisting of a weighted sum of the individual variances and covariances. (A risk function is the expected value of a traditional loss function, such as the square of the difference between an estimate and what it is estimating.)

This last criterion seems sensible: a researcher can weight the variances and covariances according to the importance he or she subjectively feels their minimization should be given in choosing an estimator. It happens that in the context of an unbiased estimator this risk function can be expressed in an alternative form, as the expected value of a quadratic function of the difference between the estimate and the true parameter value, i.e., E(β̂ − β)′Q(β̂ − β). This alternative interpretation also makes good intuitive sense as a choice criterion for use in the estimating context.

If the weights in the risk function described above, the elements of Q, are chosen so as to make it impossible for this risk function to be negative (a reasonable request, since if it were negative it would be a gain, not a risk), then a very fortunate thing occurs. Under these circumstances all four of these criteria lead to the same choice of estimator. What is more, this result does not depend on the particular weights used in the risk function.

Although these four ways of defining a smallest matrix are reasonably straightforward, econometricians have chosen, for mathematical reasons, to use as their definition an equivalent but conceptually more difficult idea. This fifth rule says: choose the unbiased estimator whose variance-covariance matrix, when subtracted from the variance-covariance matrix of any other unbiased estimator, leaves a non-negative definite matrix. (A matrix A is non-negative definite if the quadratic function formed by using the elements of any conformable vector x as parameters, x′Ax, takes on only non-negative values. Thus, to ensure a non-negative risk function as described above, the weighting matrix Q must be non-negative definite.)

Proofs of the equivalence of these five selection rules can be constructed by consulting Rothenberg (1973, p. 8), Theil (1971, p. 121), and Goldberger (1964, p. 38).

• A special case of the risk function is constructed by choosing the weighting such that the variance of any one element of the estimator has a very heavy weight, with all other weights negligible. This implies that each of the elements of the estimator with the "smallest" variance-covariance matrix has individual minimum variance. (Thus, the example given earlier of one estimator with individual variances 3.1 and 7.4 and another with variances 5.6 and 6.8 is unfair; these two estimators could be combined into a new estimator with variances 3.1 and 6.8.) This special case also indicates that, in general, covariances play no role in determining the best estimator.
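These matrix comparisons are easy to carry out numerically. The sketch below (illustrative; the variances are taken from the example above, but the covariances are arbitrary assumptions) compares two variance-covariance matrices by trace, by determinant, and by the non-negative definiteness of their difference.

import numpy as np

# hypothetical variance-covariance matrices of two unbiased two-dimensional estimators
V_star = np.array([[3.1, 0.5],
                   [0.5, 7.4]])
V_hat = np.array([[5.6, 1.0],
                  [1.0, 8.3]])

print(np.trace(V_star), np.trace(V_hat))              # criterion (1): smaller trace
print(np.linalg.det(V_star), np.linalg.det(V_hat))    # criterion (2): smaller determinant

# fifth rule: V_hat minus V_star should be non-negative definite
difference = V_hat - V_star
print(np.all(np.linalg.eigvalsh(difference) >= 0))    # all eigenvalues non-negative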
2.7 Mean Square Error (MSE)

• In the multivariate context the MSE criterion can be interpreted in terms of the "smallest" (as defined in the technical notes to section 2.6) MSE matrix. This matrix, given by the formula E(β̂ − β)(β̂ − β)′, is a natural matrix generalization of the MSE criterion. In practice, however, this generalization is shunned in favor of the sum of the MSEs of all the individual components of β̂, a definition of risk that has come to be the usual meaning of the term.

2.8 Asymptotic Properties

• The econometric literature has become full of asymptotics, so much so that at least one prominent econometrician, Leamer (1988), has complained that there is too much of it. Appendix C of this book provides an introduction to the technical dimension of this important area of econometrics, supplementing the notes that follow.

• The reason for the important result that E g(x) ≠ g(E x) for g nonlinear is illustrated in figure 2.8. On the horizontal axis are measured values of β̂, the sampling distribution of which is portrayed by pdf(β̂), with values of g(β̂) measured on the vertical axis. Values A and B of β̂, equidistant from E β̂, are traced to give g(A) and g(B). Note that g(A) is much farther from g(E β̂) than is g(B): high values of β̂ lead to values of g(β̂) considerably above g(E β̂), but low values of β̂ lead to values of g(β̂) only slightly below g(E β̂). Consequently the sampling distribution of g(β̂) is asymmetric, as shown by pdf(g(β̂)), and in this example the expected value of g(β̂) lies above g(E β̂).

Figure 2.8 Why the expected value of a nonlinear function is not the nonlinear function of the expected value

If g were a linear function, the asymmetry portrayed in figure 2.8 would not arise and thus we would have E g(β̂) = g(E β̂). For g nonlinear, however, this result does not hold.

Suppose now that we allow the sample size to become very large, and suppose that plim β̂ exists and is equal to E β̂ in figure 2.8. As the sample size becomes very large, the sampling distribution pdf(β̂) begins to collapse on plim β̂; i.e., its variance becomes very, very small. The points A and B are no longer relevant since values near them now occur with negligible probability. Only values of β̂ very, very close to plim β̂ are relevant; such values, when traced through g(β̂), are very, very close to g(plim β̂). Clearly, the distribution of g(β̂) collapses on g(plim β̂) as the distribution of β̂ collapses on plim β̂. Thus plim g(β̂) = g(plim β̂), for g a continuous function.

For a simple example of this phenomenon, let g be the square function, so that g(β̂) = β̂². From the well-known result that V(x) = E(x²) − (E x)², we can deduce that E(β̂²) = (E β̂)² + V(β̂). Clearly, E(β̂²) ≠ (E β̂)², but if the variance of β̂ goes to zero as the sample size goes to infinity, then plim(β̂²) = (plim β̂)². The case of β̂ equal to the sample mean statistic provides an easy example of this.

• Note that in figure 2.8 the modes, as well as the expected values, of the two densities do not correspond. An explanation of this can be constructed with the help of the "change of variable" theorem discussed in the technical notes to section 2.9.

• An approximate correction factor can be estimated to reduce the small-sample bias discussed here. For example, suppose an estimate β̂ of β is distributed normally with mean β and variance V(β̂). Then exp(β̂) is distributed log-normally with mean exp[β + ½V(β̂)], suggesting that exp(β) could be estimated by exp[β̂ − ½V̂(β̂)], which, although biased, should have less bias than exp(β̂). If in this same example the original error were not distributed normally, so that β̂ was not distributed normally, a Taylor series expansion could be used to deduce an appropriate correction factor. Expand exp(β̂) around E β̂ = β:

exp(β̂) = exp(β) + (β̂ − β)exp(β) + ½(β̂ − β)²exp(β)

plus higher-order terms, which are neglected. Taking the expected value of both sides produces

E exp(β̂) = exp(β)[1 + ½V(β̂)]

suggesting that exp(β) could be estimated by exp(β̂)/[1 + ½V̂(β̂)]. For discussion and examples of these kinds of adjustments, see Miller (1984), Kennedy (1981a, 1982) and Goldberger (1968a).
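The lognormal case just described is easy to verify by simulation. The sketch below (illustrative; the values of β and V(β̂) are arbitrary assumptions) shows the upward bias of exp(β̂) and the effect of the correction factor.

import numpy as np

rng = np.random.default_rng(3)
beta, var = 1.0, 0.25                          # suppose beta_hat ~ N(beta, var)
beta_hat = rng.normal(beta, np.sqrt(var), size=200000)   # draws from the sampling distribution

print(np.exp(beta))                            # g(E beta_hat) = exp(1), about 2.72
print(np.exp(beta_hat).mean())                 # E g(beta_hat), about exp(beta + var/2) = 3.08
print(np.exp(beta_hat - var / 2).mean())       # corrected estimator: mean back near exp(beta)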
• An alternative way of producing an estimate of a nonlinear function g(β) is to calculate many values of g(β̂ + ε), where ε is an error with mean zero and variance equal to the estimated variance of β̂, and average them. For more on this "smearing" estimate see Duan (1983).

• When g is a linear function, the variance of g(β̂) is given by the square of the slope of g times the variance of β̂; i.e., V(aβ̂) = a²V(β̂). When g is a continuous nonlinear function its variance is more difficult to calculate. As noted above in the context of figure 2.8, when the sample size becomes very large only values of β̂ very, very close to plim β̂ are relevant, and in this range a linear approximation to g(β̂) is adequate. The slope of such a linear approximation is given by the first derivative of g with respect to β̂. Thus the asymptotic variance of g(β̂) is often calculated as the square of this first derivative times the asymptotic variance of β̂, with this derivative evaluated at β̂ = plim β̂ for the theoretical variance, and evaluated at β̂ for the estimated variance.

2.9 Maximum Likelihood

• The likelihood of a sample is often identified with the "probability" of obtaining that sample, something which is, strictly speaking, not correct. The use of this terminology is accepted, however, because of an implicit understanding, articulated by Press et al. (1986, p. 50): "If the yᵢ take on continuous values, the probability will always be zero unless we add the phrase, '... plus or minus some fixed Δy on each data point.' So let's always take this phrase as understood."

• The likelihood function is identical to the joint probability density function of the given sample. It is given a different name (i.e., the name "likelihood") to denote the fact that in this context it is to be interpreted as a function of the parameter values (since it is to be maximized with respect to those parameter values) rather than interpreted as a function of the sample data.

• The mechanics of finding a maximum likelihood estimator are explained in most econometrics texts. Because of the importance of maximum likelihood estimation in the econometric literature, an example is presented here. Consider the typical econometric problem of trying to find the maximum likelihood estimator of the vector β in the relationship y = β₁ + β₂x + β₃z + ε, where T observations on y, x and z are available.

(1) The first step is to specify the nature of the distribution of the disturbance term ε. Suppose the disturbances are identically and independently distributed with probability density function f(ε). For example, it could be postulated that ε is distributed normally with mean zero and variance σ², so that

f(ε) = (2πσ²)^(−1/2) exp(−ε²/2σ²).

(2) The second step is to rewrite the given relationship as ε = y − β₁ − β₂x − β₃z, so that for the tth value of ε we have

f(εₜ) = (2πσ²)^(−1/2) exp[−(yₜ − β₁ − β₂xₜ − β₃zₜ)²/2σ²].

(3) The third step is to form the likelihood function, the formula for the joint probability distribution of the sample, i.e., a formula proportional to the probability of drawing the particular error terms inherent in this sample. If the error terms are independent of each other, this is given by the product of all the f(εₜ), one for each of the T sample observations. For the example at hand, this creates the likelihood function

L = (2πσ²)^(−T/2) exp[−(1/2σ²) Σ (yₜ − β₁ − β₂xₜ − β₃zₜ)²],

a complicated function of the sample data and the unknown parameters β₁, β₂ and β₃, plus any unknown parameters inherent in the probability density function f, in this case σ².

(4) The fourth step is to find the set of values of the unknown parameters (β₁, β₂, β₃ and σ²), as functions of the sample data, that maximize this likelihood function. Since the parameter values that maximize L also maximize ln L, and the latter task is easier, attention usually focuses on the log-likelihood function

ln L = −(T/2) ln(2πσ²) − (1/2σ²) Σ (yₜ − β₁ − β₂xₜ − β₃zₜ)².

In some simple cases, such as this one, the maximizing values of this function (i.e., the MLEs) can be found using standard algebraic maximizing techniques. In most cases, however, a numerical search technique (described in section 6.3) must be employed to find the MLE.
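The four steps can be mimicked numerically. The sketch below is illustrative only: the data are simulated with arbitrary parameter values, and scipy's general-purpose minimizer stands in for the numerical search technique mentioned above. It maximizes the log-likelihood by minimizing its negative; in this linear-normal case the answer reproduces the least squares coefficients, with the MLE of σ² equal to SSE/T.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T = 200
x, z = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 2.0 * x - 0.5 * z + rng.normal(0, 1.5, size=T)   # arbitrary "true" parameters

def neg_log_likelihood(params):
    b1, b2, b3, log_sigma = params             # parameterize with log(sigma) so sigma stays positive
    sigma2 = np.exp(2 * log_sigma)
    resid = y - b1 - b2 * x - b3 * z
    return 0.5 * T * np.log(2 * np.pi * sigma2) + resid @ resid / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=np.zeros(4), method="BFGS")
b1, b2, b3, log_sigma = result.x
print(b1, b2, b3, np.exp(2 * log_sigma))       # MLEs of beta1, beta2, beta3 and sigma^2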
• There are two circumstances in which the technique presented above must be modified.

(1) Density of y not equal to density of ε  We have observations on y, not ε. Thus the likelihood function should be structured from the density of y, not the density of ε. The technique described above implicitly assumes that the density of y, f_y(y), is identical to f(ε), the density of ε, with ε replaced in this formula by y − β₁ − β₂x − β₃z; but this is not necessarily the case. The probability of obtaining a value of ε in the small range dε is given by f(ε)dε; this implies an equivalent probability for y of f_y(y)|dy|, where f_y(y) is the density function of y and |dy| is the absolute value of the range of y values corresponding to dε. Thus, because f(ε)dε = f_y(y)|dy|, we can calculate f_y(y) as f(ε)|dε/dy|.

In the example given above, f_y(y) and f(ε) are identical since |dε/dy| is one. But suppose our example were such that we had

ε = y^λ − β₁ − β₂x − β₃z

where λ is some (known or unknown) parameter. In this case |dε/dy| = λy^(λ−1), and the likelihood function would become

L = [Π λyₜ^(λ−1)] × Q

where Q is the likelihood function of the original example, with each yₜ raised to the power λ. Finding the density of y, when y is a function of another variable ε whose density is known, is referred to as the change-of-variable technique. The multivariate analogue of |dε/dy| is the absolute value of the Jacobian of the transformation, the determinant of the matrix of first derivatives of the vector ε with respect to the vector y. Judge et al. (1988, pp. 30-6) have a good exposition.

(2) Observations not independent  In the examples above, the observations were independent of one another, so that the density values for each observation could simply be multiplied together to obtain the likelihood function. When the observations are not independent, for example if a lagged value of the regressand appears as a regressor, or if the errors are autocorrelated, an alternative means of finding the likelihood function must be employed. There are two ways of handling this problem.

(a) Using a multivariate density  A multivariate density function gives the density of an entire vector of ε rather than of just one element of that vector (i.e., it gives the "probability" of obtaining the entire set of εₜ). For example, the multivariate normal density function for the vector ε is given (in matrix terminology) by the formula

f(ε) = (2π)^(−T/2) |Ω|^(−1/2) exp(−½ ε′Ω⁻¹ε)

where Ω is the variance-covariance matrix of the vector ε. This formula itself can serve as the likelihood function (i.e., there is no need to multiply a set of densities together, since this formula has implicitly already done that, as well as taking account of interdependencies among the data). Note that this formula gives the density of the vector ε, not the vector y. Since what is required is the density of y, a multivariate adjustment factor equivalent to the univariate |dε/dy| used earlier is necessary. This adjustment factor is |det ∂ε/∂y|, where ∂ε/∂y is a matrix containing in its ijth position the derivative of the ith observation of ε with respect to the jth observation of y.
It is called the Jacobian of the transformation from y to ε. Watts (1973) has a good explanation of the Jacobian.

(b) Using a transformation  It may be possible to transform the variables of the problem so as to be able to work with errors that are independent. For example, suppose we have

y = β₁ + β₂x + ε

but ε is such that εₜ = ρεₜ₋₁ + uₜ, where u is a normally distributed error with mean zero and variance σᵤ². The ε's are not independent of one another, so the density for the vector ε cannot be formed by multiplying together all the individual densities; the multivariate density formula given earlier must be used, where Ω is a function of ρ and σᵤ². But the u errors are distributed independently, so the density of the u vector can be formed by multiplying together all the individual u densities. Some algebraic manipulation allows uₜ to be expressed as

uₜ = (yₜ − ρyₜ₋₁) − β₁(1 − ρ) − β₂(xₜ − ρxₜ₋₁).

(There is a special transformation for u₁; see the technical notes to section 8.3, where autocorrelated errors are discussed.) The density of the y vector, and thus the required likelihood function, is then calculated as the density of the u vector times the Jacobian of the transformation from u to y. In the example at hand this second method turns out to be easier, since the first method (using a multivariate density function) requires that the determinant of Ω be calculated, a difficult task.

• Working through examples in the literature of the application of these techniques is the best way to become comfortable with them and to become aware of the uses to which MLEs can be put. To this end see Beach and MacKinnon (1978a), Savin and White (1978), Lahiri and Egy (1981), Spitzer (1982), Seaks and Layson (1983), and Layson and Seaks (1984).

• The Cramer-Rao lower bound is a matrix given by the formula

[−E(∂² ln L / ∂θ ∂θ′)]⁻¹

where θ is the vector of unknown parameters (including σ²) for the MLE estimates of which the Cramer-Rao lower bound is the asymptotic variance-covariance matrix. Its estimation is accomplished by inserting the MLE estimates of the unknown parameters. The inverse of the Cramer-Rao lower bound is called the information matrix. Incidentally, the MLE estimator of σ² is SSE/T.

• Drawing on similar examples reported in preceding sections, we have that the variance of a normally distributed population can be estimated as SSE/(T − 1), SSE/T or SSE/(T + 1), which are, respectively, the unbiased estimator, the maximum likelihood estimator, and the minimum MSE estimator. Here SSE is Σ(xₜ − x̄)².

2.11 Adding Up

• The analogy principle of estimation is often called the method of moments because typically moment conditions (such as E X′ε = 0, i.e., that the covariance between the explanatory variables and the error is zero) are utilized to derive estimators using this technique. For example, consider a variable x with unknown mean μ. The mean μ of x is the first moment, so we estimate μ by the first moment (the average) of the data. This procedure is not always so easy. Suppose, for example, that the density of x is given by f(x) = λx^(λ−1) for 0 ≤ x ≤ 1 and zero elsewhere. The expected value of x is λ/(λ + 1), so the method of moments estimator λ* of λ is found by setting x̄ = λ*/(λ* + 1) and solving to obtain λ* = x̄/(1 − x̄).
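The single-parameter example just given can be checked numerically. In the sketch below (illustrative; the "true" λ is an arbitrary assumption) data are drawn from f(x) = λx^(λ−1) by inverse-transform sampling, and the method of moments estimate x̄/(1 − x̄) is computed.

import numpy as np

rng = np.random.default_rng(5)
lam = 3.0                                 # arbitrary "true" lambda

# f(x) = lam * x**(lam - 1) on [0, 1]; if u is uniform(0, 1), x = u**(1/lam) has this density
x = rng.uniform(size=5000) ** (1.0 / lam)

xbar = x.mean()                           # first sample moment
lam_mm = xbar / (1.0 - xbar)              # solve xbar = lam/(lam + 1) for lam
print(lam_mm)                             # close to 3.0 in large samples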
In general we are usually interested in estimating several parameters, and so will require as many moment conditions as there are parameters to be estimated, in which case finding the estimates involves solving these equations simultaneously.

Consider, for example, estimating α and β in y = α + βx + ε. Because ε is specified to be an independent error, the expected value of the product of x and ε is zero, an "orthogonality" or "moment" condition. This suggests that estimation could be based on setting the product of x and the residual y − α* − β*x equal to zero, where α* and β* are the desired estimates of α and β. Similarly, the expected value of ε (its first moment) is specified to be zero, suggesting that estimation could be based on setting the average of the residuals y − α* − β*x equal to zero. This gives rise to two equations in two unknowns:

Σ(y − α* − β*x) = 0
Σ x(y − α* − β*x) = 0

which a reader might recognize as the normal equations of the ordinary least squares estimator. It is not unusual for a method of moments estimator to turn out to be a familiar estimator, a result which gives it some appeal. Greene (1997, pp. 145-53) has a good textbook exposition.
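The two moment conditions are linear in α* and β*, so they can be solved directly; doing so reproduces the OLS fit, as noted above. The sketch below (illustrative; the data-generating values are arbitrary assumptions) makes the point numerically.

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 1.5 + 0.7 * x + rng.normal(size=100)    # arbitrary "true" alpha and beta

# sample analogues of E(eps) = 0 and E(x * eps) = 0:
#   sum(y - a - b*x) = 0  and  sum(x * (y - a - b*x)) = 0
A = np.array([[len(x), x.sum()],
              [x.sum(), (x * x).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a_mm, b_mm = np.linalg.solve(A, rhs)        # method of moments estimates

b_ols, a_ols = np.polyfit(x, y, 1)          # OLS fit for comparison
print(a_mm, b_mm)
print(a_ols, b_ols)                         # identical: the moment conditions are the normal equations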
The catalog works in a straightforward way: the estimating situation is modeled in the general mold of the CLR model and then the researcher pinpoints the way in which this situation differs from the standard situation as described by the CLR model (i.e., finds out which assumption of the CLR model is violated in this problem); he or she then turns to the textbook (catalog) to see whether the OLS estimator retains its desirable properties, and if not, what alternative estimator should be used. Because econometricians often are not certain of whether the estimating situation they face is one in which an assumption of the CLR model is violated, the catalog also includes a listing of techniques useful in testing whether or not the CLR model assumptions are violated.

3.2 THE FIVE ASSUMPTIONS

The CLR model consists of five basic assumptions about the way in which the observations are generated.

(1) The first assumption of the CLR model is that the dependent variable can be calculated as a linear function of a specific set of independent variables, plus a disturbance term. The unknown coefficients of this linear function form the vector β and are assumed to be constants. Several violations of this assumption, called specification errors, are discussed in chapter 6:
(a) wrong regressors - the omission of relevant independent variables or the inclusion of irrelevant independent variables;
(b) nonlinearity - when the relationship between the dependent and independent variables is not linear;
(c) changing parameters - when the parameters (β) do not remain constant during the period in which data were collected.

(2) The second assumption of the CLR model is that the expected value of the disturbance term is zero; i.e., the mean of the distribution from which the disturbance term is drawn is zero. Violation of this assumption leads to the biased intercept problem, discussed in chapter 7.

(3) The third assumption of the CLR model is that the disturbance terms all have the same variance and are not correlated with one another. Two major econometric problems, as discussed in chapter 8, are associated with violations of this assumption:
(a) heteroskedasticity - when the disturbances do not all have the same variance;
(b) autocorrelated errors - when the disturbances are correlated with one another.

(4) The fourth assumption of the CLR model is that the observations on the independent variables can be considered fixed in repeated samples; i.e., it is possible to repeat the sample with the same independent variable values. Three important econometric problems, discussed in chapters 9 and 10, correspond to violations of this assumption:
(a) errors in variables - errors in measuring the independent variables;
(b) autoregression - using a lagged value of the dependent variable as an independent variable;
(c) simultaneous equation estimation - situations in which the dependent variables are determined by the simultaneous interaction of several relationships.

(5) The fifth assumption of the CLR model is that the number of observations is greater than the number of independent variables and that there are no exact linear relationships between the independent variables. Although this is viewed as an assumption for the general case, for a specific case it can easily be checked, so that it need not be assumed. The problem of multicollinearity (two or more independent variables being approximately linearly related in the sample data) is associated with this assumption; it is discussed in chapter 11.
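The "fixed in repeated samples" idea in assumption 4, and the repeated-sample properties discussed in the next section, can be made concrete with a small simulation. This is a minimal sketch assuming NumPy; all parameter values are illustrative, not from the text.

```python
# Repeated-samples sketch of the CLR model: x is held fixed, a fresh set of
# disturbances is drawn for each sample, and the OLS estimates are averaged.
import numpy as np

rng = np.random.default_rng(1)
T, beta0, beta1, sigma = 50, 1.0, 2.0, 3.0
x = rng.uniform(0, 10, size=T)            # fixed in repeated samples
X = np.column_stack([np.ones(T), x])

estimates = []
for _ in range(5000):                     # 5000 hypothetical repeated samples
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=T)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    estimates.append(b)

print(np.mean(estimates, axis=0))         # close to (1.0, 2.0): unbiasedness
```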
All this is summarized in table 3.1, which presents these five assumptions of the CLR model, shows the appearance they take when dressed in mathematical notation, and lists the econometric problems most closely associated with violations of these assumptions.

Table 3.1 The assumptions of the CLR model

Assumption | Mathematical expression | Chapter in which discussed | Violations
1 Dependent variable a linear function of a specific set of independent variables, plus a disturbance | y = Xβ + ε | 6 | Wrong regressors; nonlinearity; changing parameters
2 Expected value of disturbance term is zero | Eε = 0 | 7 | Biased intercept
3 Disturbances have uniform variance and are uncorrelated with one another | Eεε' = σ²I | 8 | Heteroskedasticity; autocorrelated errors
4 Observations on independent variables can be considered fixed in repeated samples | X fixed in repeated samples | 9, 10 | Errors in variables; autoregression; simultaneous equations
5 No exact linear relationships between independent variables, and more observations than independent variables | Rank of X = K ≤ T | 11 | Perfect multicollinearity

Later chapters in this book comment on the meaning and significance of these assumptions, note implications of their violation for the OLS estimator, discuss ways of determining whether or not they are violated, and suggest new estimators appropriate to situations in which one of these assumptions must be replaced by an alternative assumption. Before we move on to this, however, more must be said about the character of the OLS estimator in the context of the CLR model, because of the central role it plays in the econometrician's "catalog."

3.3 THE OLS ESTIMATOR IN THE CLR MODEL

The central role of the OLS estimator in the econometrician's catalog is that of a standard against which all other estimators are compared. The reason for this is that the OLS estimator is extraordinarily popular. This popularity stems from the fact that, in the context of the CLR model, the OLS estimator has a large number of desirable properties, making it the overwhelming choice for the "optimal" estimator when the estimating problem is accurately characterized by the CLR model. This is best illustrated by looking at the eight criteria listed in chapter 2 and determining how the OLS estimator rates on these criteria in the context of the CLR model.

(1) Computational cost Because of the popularity of the OLS estimator, many packaged computer routines exist, and for simple cases hand-held calculators can be used to perform the required calculations quickly. (Some hand-held calculators have OLS estimation built in.) Whenever the functional form being estimated is linear, as it is in the CLR model, the OLS estimator involves very little computational cost.

(2) Least squares Because the OLS estimator is designed to minimize the sum of squared residuals, it is automatically "optimal" on this criterion.

(3) Highest R² Because the OLS estimator is optimal on the least squares criterion, it will automatically be optimal on the highest R² criterion.

(4) Unbiasedness The assumptions of the CLR model can be used to show that the OLS estimator β^OLS is an unbiased estimator of β.

(5) Best unbiasedness In the CLR model β^OLS is a linear estimator; i.e., it can be written as a linear function of the errors. As noted earlier, it is unbiased. Among all linear unbiased estimators of β, it can be shown (in the context of the CLR model) to have the "smallest" variance-covariance matrix. Thus the OLS estimator is the BLUE in the CLR model. If we add the additional assumption that the disturbances are distributed normally (creating the CNLR model, the classical normal linear regression model), it can be shown that the OLS estimator is the best unbiased estimator (i.e., best among all unbiased estimators, not just linear unbiased estimators).
(6) Mean square error It is not the case that the OLS estimator is the minimum mean square error estimator in the CLR model. Even among linear estimators, it is possible that a substantial reduction in variance can be obtained by adopting a slightly biased estimator. This is the OLS estimator's weakest point; chapters 11 and 12 discuss several estimators whose appeal is that they may beat OLS on the MSE criterion.

(7) Asymptotic criteria Because the OLS estimator in the CLR model is unbiased, it is also unbiased in samples of infinite size and thus is asymptotically unbiased. It can also be shown that the variance-covariance matrix of β^OLS goes to zero as the sample size goes to infinity, so that β^OLS is also a consistent estimator of β. Further, in the CNLR model it is asymptotically efficient.

(8) Maximum likelihood It is impossible to calculate the maximum likelihood estimator given the assumptions of the CLR model, because these assumptions do not specify the functional form of the distribution of the disturbance terms. However, if the disturbances are assumed to be distributed normally (the CNLR model), it turns out that β^MLE is identical to β^OLS.

Thus, whenever the estimating situation can be characterized by the CLR model, the OLS estimator meets practically all of the criteria econometricians consider relevant. It is no wonder, then, that this estimator has become so popular. It is in fact too popular: it is often used, without justification, in estimating situations that are not accurately represented by the CLR model. If some of the CLR model assumptions do not hold, many of the desirable properties of the OLS estimator no longer hold. If the OLS estimator does not have the properties that are thought to be of most importance, an alternative estimator must be found. Before moving to this aspect of our examination of econometrics, however, we will spend a chapter discussing some concepts of and problems in inference, to provide a foundation for later chapters.

GENERAL NOTES

3.1 Textbooks as Catalogs

• Bibby and Toutenburg (1977, pp. 72-3), who refer to the CLR model as the GLM (general linear model), warn that its seductive simplicity, and the many computer packages built around it, can tempt the hurried user to apply it without thinking about how well it simplifies the complexities of the data; algorithms are dangerous things and should be used warily, not brashly.
• Casting an estimating problem in the mold of the CLR model narrows the question of which estimator to use by focusing attention on which assumption is being violated. This practice is not viewed favorably by all; some feel that framing empirical questions in terms of testing for violated assumptions technicalizes the debate at the expense of the substantive issues of interest.
• More than one of the CLR model assumptions may be violated simultaneously. These situations will be discussed when appropriate.

3.3 The OLS Estimator in the CLR Model

• The process whereby the OLS estimator is applied to the data at hand is usually referred to by the terminology "running a regression." The dependent variable (the "regressand") is said to be "regressed" on the independent variables (the "regressors") to produce the OLS estimates. This terminology comes from a pioneering empirical study in which it was found that the mean height of children born of parents of a given height tends to "regress," or move towards, the population average height. See Maddala (1977, pp. 97-101) for further comment on this and for discussion of the meaning and interpretation of regression analysis. Critics note that one dictionary defines regression as "the diversion of psychic energy into channels of fantasy."
• The result that the OLS estimator in the CLR model is the BLUE is often referred to as the Gauss-Markov theorem.
• The formula for the OLS estimator of a specific element of the β vector usually involves observations on all the independent variables (as well as observations on the dependent variable), not just observations on the independent variable corresponding to that particular element of β. This is because, to obtain an accurate estimate of the influence of one independent variable on the dependent variable, the simultaneous influence of the other independent variables on the dependent variable must be taken into account. Doing this ensures that the jth element of β^OLS reflects the influence of the jth independent variable on the dependent variable, holding all the other independent variables constant. Similarly, the formula for the variance of an element of β^OLS also usually involves observations on all the independent variables.
• Because the OLS estimator is so popular, and because it so often plays a role in the formulation of alternative estimators, it is important that its mechanical properties be well understood. The most effective way of expositing these characteristics is through the use of a Venn diagram called the Ballentine. Suppose the CLR model applies, with Y determined by X and an error term. In figure 3.1 the circle Y represents variation in the dependent variable Y and the circle X represents variation in the independent variable X. The overlap of X with Y, the blue area, represents variation that Y and X have in common, in the sense that this variation in Y can be explained by X via an OLS regression. The blue area reflects information employed by the estimating procedure in estimating the slope coefficient β_x; the larger this area, the more information is used to form the estimate and thus the smaller is its variance.

[Figure 3.1 Defining the Ballentine Venn diagram]

Now consider figure 3.2, in which a Ballentine for a case of two explanatory variables, X and Z, is portrayed (i.e., Y is determined by both X and Z). In general, the X and Z circles will overlap, reflecting some collinearity between the two; this is shown in figure 3.2 by the red-plus-orange area. If Y were regressed on X alone, information in the blue-plus-red area would be used to estimate β_x, and if Y were regressed on Z alone, information in the green-plus-red area would be used to estimate β_z. What happens, though, if Y is regressed on X and Z together?

[Figure 3.2 Interpreting multiple regression with the Ballentine]

In the multiple regression of Y on X and Z together, the OLS estimator uses the information in the blue area to estimate β_x and the information in the green area to estimate β_z, discarding the information in the red area. The blue area corresponds to variation in Y that matches up uniquely with variation in X; using this information should therefore produce an unbiased estimate of β_x. Similarly, the green area corresponds to variation in Y that matches up uniquely with variation in Z; using this information should produce an unbiased estimate of β_z. The information in the red area is not used because it reflects variation in Y that is determined by variation in both X and Z jointly, the relative contributions of which are not a priori known: matching up this variation in Y with variation in X alone, for example, would wrongly credit X with variation in Y that may actually be due to Z.
Notice that the regression of Y on X and Z together creates an estimate of β_x different from that created by a regression of Y on X alone. The latter regression uses more information (the blue-plus-red area rather than just the blue area), so its estimate has a smaller variance; but, because it wrongly credits X with the red area, that estimate is biased (unless X and Z are orthogonal). As is invariably the case in econometrics, the price of obtaining unbiased estimates is higher variances.
• Whenever X and Z are orthogonal (uncorrelated in the sample), their circles do not overlap and the red area does not exist; this dilemma does not arise. Regressing Y on X alone and regressing Y on X and Z together then produce the same estimate of β_x, each using the information in the blue area, and each multiple-regression coefficient estimate is the same as the corresponding simple-regression slope coefficient estimate.
• Whenever X and Z are highly collinear, their circles overlap substantially and the red area becomes large; because only the blue and green areas are used in estimation, little information is available to estimate β_x and β_z, causing the variances of these estimates to be very large. Thus, the impact of multicollinearity is to increase the variances of the OLS estimates. Perfect collinearity causes the X and Z circles to overlap completely; the blue and green areas disappear and estimation is impossible. Multicollinearity is discussed at length in chapter 11.
• In figure 3.1 the blue area represents the variation in Y explained by X. Thus, R² is given as the ratio of the blue area to the entire Y circle. In figure 3.2 the blue-plus-red-plus-green area represents the variation in Y explained by X and Z together. (Note that the red area is discarded only for the purpose of estimating the coefficients, not for predicting Y; once the coefficients are estimated, all variation in X and Z is used to predict Y.) Thus the R² resulting from the multiple regression is given by the ratio of the blue-plus-red-plus-green area to the entire Y circle. Notice that there is no way of allocating portions of the total R² to X and Z, because the red area represents variation in Y explained by X and Z jointly, in a way that cannot be disentangled. Only if X and Z are orthogonal, so that the red area disappears, can the total R² be allocated unequivocally to X and Z separately.
• The yellow area represents variation in Y attributable to the error term, and thus the magnitude of the yellow area represents the magnitude of σ², the variance of the error term. This implies, for example, that if, in the context of figure 3.2, Y had been regressed on only X, omitting Z, σ² would be estimated by the yellow-plus-green area.
• The Ballentine was named, by its originators Cohen and Cohen (1975), after a brand of US beer whose logo resembles figure 3.2. Their use of the Ballentine was confined to the exposition of concepts related to R²; Kennedy (1981b) extended its use to the exposition of other aspects of regression. A limitation of the Ballentine is that it is necessary in certain cases for the red area to represent a negative quantity. (Suppose the two explanatory variables X and Z each have positive coefficients, but in the data X and Z are negatively correlated: X alone could do a poor job of explaining variation in Y because, for example, the impact of a high value of X is offset by a low value of Z.) This suggests that the explanations offered above are lacking and should be revised; for example, the result that regressing on X alone reduces the variance of its coefficient estimate should be explained in terms of this regression incorporating a greater range of variation of X (i.e., the entire X circle as opposed to just the blue-plus-brown area). This problem notwithstanding, the interpretation advanced earlier is retained in this book, on the grounds that the benefits of its illustrative power outweigh the danger that it will lead to error. The Ballentine is used here as a metaphoric device illustrating some regression results; it should not be given meaning beyond that.
• An alternative geometric analysis of OLS, using vector geometry, is possible; Davidson and MacKinnon (1993, chap. 1) have a good exposition.
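The bias/variance trade-off just described can be checked numerically. The following is a minimal sketch (NumPy assumed; all numbers illustrative, not from the text): over repeated samples, the estimate of β_x from the regression omitting Z has the smaller variance but is biased, while the multiple-regression estimate is unbiased but has the larger variance.

```python
# Omitting a collinear regressor Z: lower variance but biased estimate of beta_x.
import numpy as np

rng = np.random.default_rng(2)
T, bx, bz = 40, 1.0, 1.0
z = rng.normal(size=T)
x = 0.8 * z + 0.6 * rng.normal(size=T)      # x and z collinear (the "red area")
Xshort = np.column_stack([np.ones(T), x])   # omits z
Xfull = np.column_stack([np.ones(T), x, z])

short, full = [], []
for _ in range(5000):
    y = bx * x + bz * z + rng.normal(size=T)
    short.append(np.linalg.lstsq(Xshort, y, rcond=None)[0][1])
    full.append(np.linalg.lstsq(Xfull, y, rcond=None)[0][1])

print(np.mean(short), np.var(short))   # centered away from 1.0, smaller variance
print(np.mean(full), np.var(full))     # centered on 1.0, larger variance
```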
TECHNICAL NOTES

3.2 The Five Assumptions

• The regression model y = g(x1, x2) + ε is really a specification of how the conditional means E(y | x1, x2) are related to each other through x. The population regression function is written as E(y | x1, x2) = g(x1, x2); it describes how the average or expected value of y varies with the x's. Suppose g is a linear function, so that the regression function is y = β1 + β2x2 + β3x3 + ... + βKxK + ε. Each element of β^OLS, for example β3^OLS, is an estimate of the effect on the conditional expectation of y of a unit change in x3, with all other x's held constant.
• In the CLR model, the regression model is specified as y = β1 + β2x2 + ... + βKxK + disturbance, a formula that can be written down T times, once for each set of observations on the dependent and independent variables. This gives a large stack of equations, which can be consolidated via matrix notation as Y = Xβ + ε. Here Y is a vector containing the T observations on the dependent variable y; X is a matrix consisting of K columns, each column being a vector of T observations on one of the independent variables; and ε is a vector containing the T unknown disturbances.

3.3 The OLS Estimator in the CLR Model

• The formula for β^OLS is (X'X)^(-1)X'Y. A proper derivation of this is accomplished by minimizing the sum of squared errors. An easy way of remembering the formula is to premultiply Y = Xβ + ε by X' to get X'Y = X'Xβ + X'ε, drop X'ε, and then solve for β.
• The variance-covariance matrix of β^OLS is σ²(X'X)^(-1), where σ² is the variance of the disturbance term. For the simple case in which the regression function is y = β1 + β2x, this gives the familiar formula σ²/Σ(x - x̄)² for the variance of β2^OLS. Note that if the variation in the regressor values is substantial, the denominator of this expression will be large, tending to make the variance of β^OLS small.
• The variance-covariance matrix of β^OLS is usually unknown because σ² is usually unknown. It is estimated by s²(X'X)^(-1), where s² is an estimator of σ². The estimator s² is usually given by the formula s² = SSE/(T - K), where SSE is the sum of squared residuals and T - K is the degrees of freedom. In the CLR model s² is the best quadratic unbiased estimator of σ²; in the CNLR model it is best unbiased.
• By discarding the red area in figure 3.2, the OLS formula ensures that its estimates of the influence of one independent variable are calculated while controlling for the simultaneous influence of the other independent variables; i.e., the jth element of β^OLS is an estimate of the influence of the jth explanatory variable, holding all the other explanatory variables constant. That the red area is discarded can be emphasized by noting that the OLS estimate of β_x can be calculated from either the regression of Y on X and Z together or the regression of Y on X "residualized" with respect to Z (i.e., with the influence of Z removed). In figure 3.2, if we were to regress X on Z we would be able to explain the red-plus-orange area; the residuals from this regression, the variation in X not explained by Z, are represented by the blue-plus-brown area. Now suppose that Y is regressed on X residualized for Z. The overlap of the Y circle with the blue-plus-brown area is the blue area, so exactly the same information is used to estimate β_x in this method as is used when Y is regressed on X and Z together, resulting in an identical estimate of β_x.
• Notice further that if Y were also residualized for Z, producing the yellow-plus-blue area, regressing the residualized Y on the residualized X would also produce the same estimate of β_x, since their overlap is the blue area. An important implication of this result is that, for example, running a regression on data from which a linear time trend has been removed will produce exactly the same coefficient estimates as including the linear time trend among the regressors in a regression run on the raw data. As another example, consider the removal of a linear seasonal influence: running a regression on linearly deseasonalized data will produce exactly the same coefficient estimates as would be obtained if the linear seasonal influence were included as an extra regressor in a regression run on the raw data.
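The residualizing results above are easy to verify numerically. A minimal sketch, assuming NumPy and using illustrative data (not from the text):

```python
# The multiple-regression coefficient on x equals the coefficient from
# regressing y on x "residualized" for z (and the intercept).
import numpy as np

rng = np.random.default_rng(3)
T = 100
z = rng.normal(size=T)
x = 0.5 * z + rng.normal(size=T)
y = 1.0 + 2.0 * x - 1.0 * z + rng.normal(size=T)

ones = np.ones(T)
Xfull = np.column_stack([ones, x, z])
b_full = np.linalg.lstsq(Xfull, y, rcond=None)[0]

Z = np.column_stack([ones, z])
x_resid = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # purge z (and the mean) from x
b_resid = np.linalg.lstsq(x_resid[:, None], y, rcond=None)[0]

print(b_full[1], b_resid[0])    # identical estimates of the coefficient on x
```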
• A variant of OLS called stepwise regression is to be avoided. It regresses Y on each explanatory variable separately and keeps the regression with the highest R². This determines the estimate of the slope coefficient of that regression's explanatory variable. Then the residuals from this regression are used as the dependent variable in a new search using the remaining explanatory variables, and the procedure is repeated. Suppose that, in the example of figure 3.2, the regression of Y on X produced a higher R² than the regression of Y on Z. Then the estimate of β_x would be formed using the information in the blue-plus-red area. Note that this estimate is biased.
• The Ballentine can be used to illustrate several variants of R². Consider, for example, the simple R² between Y and Z in figure 3.2. If the area of the Y circle is normalized to be unity, this simple R², denoted R²_YZ, is given by the red-plus-green area. The partial R² measures the influence of Z on Y after accounting for the influence of X. It is measured by obtaining the R² from the regression of Y corrected for X on Z corrected for X, and is denoted R²_YZ·X. From our earlier use of the Ballentine it is given as the green area divided by the yellow-plus-green area. The reader might like to verify that it is given by the formula
R²_YZ·X = (R² - R²_YX)/(1 - R²_YX),
where R² here denotes the R² from the multiple regression of Y on X and Z together.
• The OLS estimator has several well-known mechanical properties with which students should become intimately familiar; instructors tend to assume this knowledge after the first lecture or two on OLS. Listed below are the more important of these properties; proofs can be found in most textbooks. The context is y = α + βx + ε.
(1) If β = 0, so that the only regressor is the intercept, y is regressed on a column of ones, producing α^OLS = ȳ, the average of the y observations.
(2) If α = 0, so that there is no intercept and one explanatory variable, y is regressed on a column of x values, producing β^OLS = Σxy/Σx².
(3) If there is an intercept and one explanatory variable, β^OLS = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)².
(4) If observations are expressed as deviations from their means, y* = y - ȳ and x* = x - x̄, then β^OLS = Σx*y*/Σ(x*)². This follows from (3) above; lower-case letters are sometimes reserved to denote deviations from sample means.
(5) The intercept can be estimated as ȳ - β^OLS x̄ or, if there are more explanatory variables, as ȳ - Σβ_i^OLS x̄_i. This comes from the first normal equation, the equation that results from setting the partial derivative of SSE with respect to α equal to zero (to minimize the SSE).
(6) An implication of (5) is that the sum of the OLS residuals equals zero; in effect, the intercept is estimated by the value that causes the sum of the OLS residuals to equal zero.
(7) The predicted, or estimated, y values are calculated as ŷ = Xβ^OLS.
(8) An implication of (6) is that the mean of the ŷ values equals the mean of the y values.
(9) An implication of the preceding properties is that the OLS regression line passes through the overall mean of the data points.
(10) Adding a constant to a variable, or scaling a variable, has a predictable impact on the OLS estimates. For example, multiplying the x observations by 10 will multiply β^OLS by one-tenth, and adding 6 to the y observations will increase α^OLS by 6.
(11) A linear restriction on the parameters can be incorporated into a regression by eliminating one coefficient from that equation and running the resulting regression using transformed variables. For an example see the general notes to section 4.3.
(12) The "variation" in the dependent variable is the "total sum of squares," SST = Σ(y - ȳ)², where the sum runs over the T observations.
(13) The "variation" explained linearly by the explanatory variables is the "regression sum of squares," SSR = Σ(ŷ - ȳ)².
(14) The sum of squared errors from a regression is SSE = Σ(y - ŷ)² = SST - SSR. (Note that textbook notation varies. Some authors use SSE for "explained sum of squares" and SSR for "sum of squared residuals," reversing the meaning given here.) SSE is often calculated as Σê², or in matrix notation as ê'ê.
(15) The coefficient of determination, R² = SSR/SST = 1 - SSE/SST, is maximized by OLS because OLS minimizes SSE. R² is the squared correlation between y and ŷ; it is the fraction of the "variation" in y that is explained linearly by the explanatory variables.
(16) When no intercept is included, it is possible for R² to lie outside the zero to one range. See the general notes to section 2.4.
(17) Minimizing with some extra help cannot make the minimization less successful. Thus SSE decreases (or in unusual cases remains unchanged) when an additional explanatory variable is added; R² must therefore rise (or remain unchanged).
(18) Because the explanatory variable(s) is (are) given as much credit as possible for explaining changes in y, and the error as little credit as possible, the residual vector ê is uncorrelated with the explanatory variable(s) and thus with ŷ (because ŷ is a linear function of the explanatory variables).
(19) The estimated coefficient of the ith regressor can be obtained by regressing y on this regressor "residualized" for the other regressors (the residuals from a regression of the ith regressor on all the other regressors). The same result is obtained if the "residualized" y is used as the regressand, instead of y. These results were explained earlier in these technical notes with the help of the Ballentine.
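Several of the mechanical properties listed above can be checked directly. A minimal sketch assuming NumPy, with illustrative data (not from the text):

```python
# Check: residuals sum to zero, fitted values have the same mean as y,
# SST = SSR + SSE, and R-squared equals the squared correlation of y and yhat.
import numpy as np

rng = np.random.default_rng(4)
T = 60
x = rng.uniform(0, 5, size=T)
y = 2.0 + 1.5 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

b = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ b
e = y - yhat

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((yhat - y.mean()) ** 2)
SSE = np.sum(e ** 2)

print(e.sum())                                       # ~0: residuals sum to zero
print(y.mean(), yhat.mean())                         # equal means
print(SST, SSR + SSE)                                # SST = SSR + SSE
print(SSR / SST, np.corrcoef(y, yhat)[0, 1] ** 2)    # two routes to R-squared
```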
INTERVAL ESTIMATION AND HYPOTHESIS TESTING

4.1 INTRODUCTION

In addition to estimating parameters, econometricians often wish to construct confidence intervals for their estimates and test hypotheses concerning parameters. To strengthen the perspective from which violations of the CLR model are viewed in the following chapters, this chapter provides a brief discussion of these principles of inference in the context of traditional applications found in econometrics.

Under the null hypothesis most test statistics have a distribution that is tabulated in appendices at the back of statistics books, the most common of which are the standard normal, the t, the chi-square, and the F distributions. In small samples the applicability of all these distributions depends on the errors in the CLR model being normally distributed, something that is not one of the CLR model assumptions. For situations in which the errors are not distributed normally, it turns out that in most cases a traditional test statistic has an asymptotic distribution equivalent to one of these tabulated distributions; with this as justification, testing/interval estimation proceeds in the usual way, ignoring the small-sample bias. For expository purposes, this chapter's discussion of inference is couched in terms of the classical normal linear regression (CNLR) model, in which the assumptions of the CLR model are augmented by assuming that the errors are distributed normally.

4.2 TESTING A SINGLE HYPOTHESIS: THE t TEST

Hypothesis tests on and interval estimates of single parameters are straightforward applications of techniques familiar to all students of elementary statistics. In the CNLR model the OLS estimator β^OLS generates estimates that are distributed joint-normally in repeated samples. This means that β1^OLS, β2^OLS, ..., βK^OLS are all connected to one another (through their covariances). In particular, this means that β3^OLS, say, is distributed normally with mean β3 (since the OLS estimator is unbiased) and variance V(β3^OLS) equal to the third diagonal element of the variance-covariance matrix of β^OLS. The square root of V(β3^OLS) is the standard deviation of β3^OLS. Using the normal table and this standard deviation, interval estimates can be constructed and hypotheses can be tested.

A major drawback to this procedure is that the variance-covariance matrix of β^OLS is not usually known (because σ², the variance of the disturbances, which appears in the formula for this variance-covariance matrix, is not usually known). Estimating σ² by s², as discussed in the technical notes to section 3.3, allows an estimate of this matrix to be created. The square root of the third diagonal element of this matrix is the standard error of β3^OLS, an estimate of the standard deviation of β3^OLS. With this estimate the t-table can be used in place of the normal table to test hypotheses or construct interval estimates.

The use of such t tests, as they are called, is so common that most packaged computer programs designed to compute the OLS estimators (designed to run OLS regressions) have included in their output a number called the t statistic for each parameter estimate. This gives the value of the parameter estimate divided by its estimated standard deviation (the standard error). This value can be compared directly to critical values in the t-table to test the hypothesis that that parameter is equal to zero. In some research reports, this t statistic is printed in parentheses underneath the parameter estimates, creating some confusion because sometimes the standard errors appear in this position. (A negative number in parentheses would have to be a t value, so that this would indicate that these numbers were t values rather than standard errors.)
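A minimal sketch of the t test just described, assuming NumPy and SciPy (for the critical value); the data and hypothesis are illustrative, not from the text:

```python
# t statistic = estimate / standard error, compared with a t critical value
# with T - K degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T = 40
x = rng.uniform(0, 10, size=T)
y = 1.0 + 0.3 * x + rng.normal(0, 2, size=T)
X = np.column_stack([np.ones(T), x])
K = X.shape[1]

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
s2 = e @ e / (T - K)                        # estimate of sigma-squared
V = s2 * np.linalg.inv(X.T @ X)             # estimated variance-covariance matrix
se = np.sqrt(np.diag(V))

t_stat = b[1] / se[1]                       # testing H0: slope = 0
print(t_stat, stats.t.ppf(0.975, T - K))    # reject if |t| exceeds the critical value
```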
4.3 TESTING A JOINT HYPOTHESIS: THE F TEST

Suppose that a researcher wants to test the joint hypothesis that, say, the fourth and fifth elements of β are equal to 1.0 and 2.0, respectively. That is, he wishes to test the hypothesis that the sub-vector (β4, β5)' is equal to the vector (1.0, 2.0)'. This is a different question from the two separate questions of whether β4 is equal to 1.0 and whether β5 is equal to 2.0. It is possible, for example, to accept the hypothesis that β4 is equal to 1.0 and also to accept the hypothesis that β5 is equal to 2.0, but to reject the joint hypothesis that (β4, β5)' is equal to (1.0, 2.0)'. The purpose of this section is to explain how the F test is used to test such joint hypotheses; the following section explains how a difference between results based on separate tests and joint tests could arise.

The F statistic for testing a set of J linear constraints in a regression with K parameters (including the intercept) and T observations takes the generic form

F = [ (SSE(constrained) - SSE(unconstrained))/J ] / [ SSE(unconstrained)/(T - K) ],

where the degrees of freedom for this F statistic are J and T - K. This generic form is worth memorizing; it is extremely useful for structuring F tests for a wide variety of special cases, such as Chow tests (chapter 6) and tests involving dummy variables (chapter 14).

When the constraints are true, because of the error term they will not be satisfied exactly by the data, so the SSE will increase when the constraints are imposed: minimization subject to constraints will not be as successful as minimization without constraints. But if the constraints are true the per-constraint increase in SSE should not be large, relative to the influence of the error term. The numerator has the "per-constraint" change in SSE due to imposing the constraints and the denominator has the "per-error" contribution to SSE. (The minus K in this expression corrects for degrees of freedom, explained in the general notes.) If their ratio is "too big" we would be reluctant to believe that it happened by chance, concluding that it must have happened because the constraints are false. High values of this F statistic thus lead us to reject the null hypothesis that the constraints are true.

How does one find the constrained SSE? A constrained regression is run to obtain the constrained SSE. The easiest example is the case of constraining a coefficient to be equal to zero: just run the regression omitting that coefficient's variable. To run a regression constraining β4 to be 1.0 and β5 to be 2.0, subtract 1.0 times the fourth regressor and 2.0 times the fifth regressor from the dependent variable and regress this new, constructed dependent variable on the remaining regressors. In general, to incorporate a linear restriction into a regression, use the restriction to solve out one of the parameters, and rearrange the resulting equation to form a new regression involving constructed variables. An explicit example is given in the general notes to this section.
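The generic F statistic can be computed directly from the constrained and unconstrained sums of squared errors. A minimal sketch assuming NumPy and SciPy, with an illustrative joint hypothesis (both slope coefficients equal to zero), not taken from the text:

```python
# Generic F statistic: per-constraint increase in SSE over per-error SSE.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
T, J = 50, 2
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.5 * x1 + 0.0 * x2 + rng.normal(size=T)

X_u = np.column_stack([np.ones(T), x1, x2])   # unconstrained: all regressors
X_r = np.ones((T, 1))                         # constrained: both slopes set to zero
K = X_u.shape[1]

def sse(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

F = ((sse(X_r, y) - sse(X_u, y)) / J) / (sse(X_u, y) / (T - K))
print(F, stats.f.ppf(0.95, J, T - K))          # reject the joint null if F is larger
```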
4.4 INTERVAL ESTIMATION FOR A PARAMETER VECTOR

Interval estimation in the multidimensional case is best illustrated by a two-dimensional example. Suppose that the sub-vector (β4, β5)' is of interest. The OLS estimate of this sub-vector is shown as the point in the center of the rectangle in figure 4.1. Using the t-table and the square root of the fourth diagonal term in the estimated variance-covariance matrix of β^OLS, a 95% confidence interval can be constructed for β4. This is shown in figure 4.1 as the interval from A to B; β4^OLS lies halfway between A and B. Similarly, a 95% confidence interval can be constructed for β5; it is shown in figure 4.1 as the interval from C to D, and is drawn larger than the interval AB to reflect an assumed larger standard error for β5^OLS.

[Figure 4.1 A confidence region with zero covariance]

An interval estimate for the sub-vector (β4, β5)' is a region or area that, when constructed in repeated samples, covers the true value (β4, β5) in, say, 95% of the samples. Furthermore, this region should for an efficient estimate be the smallest such region possible. A natural region to choose for this purpose is the rectangle formed by the individual interval estimates, as shown in figure 4.1. If β4^OLS and β5^OLS have zero covariance, then in repeated sampling rectangles calculated in this fashion will cover the unknown point (β4, β5) in 0.95 x 0.95 = 90.25% of the samples. (In repeated samples the probability is 0.95 that the β4 confidence interval covers β4, as is the probability that the β5 confidence interval covers β5; thus the probability for both β4 and β5 to be covered simultaneously is 0.95 x 0.95.)

Evidently, this rectangle is not "big" enough to serve as a 95% joint confidence region. Where should it be enlarged? Because the region must be kept as small as possible, the enlargement must come in those parts that have the greatest chance of covering (β4, β5) in repeated samples. The corners of the rectangle will cover (β4, β5) in a repeated sample whenever β4^OLS and β5^OLS are simultaneously a long way from their mean values of β4 and β5. The probability in repeated samples of having these two unlikely events occur simultaneously is very small. Thus the areas just outside the rectangle near the points A, B, C and D are more likely to cover (β4, β5) in repeated samples than are the areas just outside the corners of the rectangle: the rectangle should be made bigger near the points A, B, C and D. Further thought suggests that the areas just outside the points A, B, C and D are more likely, in repeated samples, to cover (β4, β5) than the areas just inside the corners of the rectangle. Thus the total region could be made smaller by chopping a lot of area off the corners and extending slightly the areas near the points A, B, C and D. In fact, the F statistic described earlier allows the econometrician to derive the confidence region as an ellipse, as shown in figure 4.1.

The ellipse in figure 4.1 represents the case of zero covariance between β4^OLS and β5^OLS. If β4^OLS and β5^OLS have a positive covariance (an estimate of this covariance is found in either the fourth column and fifth row or the fifth column and fourth row of the estimate of the variance-covariance matrix of β^OLS), then whenever β4^OLS is an overestimate of β4, β5^OLS is likely to be an overestimate of β5, and whenever β4^OLS is an underestimate of β4, β5^OLS is likely to be an underestimate of β5. This means that the area near the top right-hand corner of the rectangle and the area near the bottom left-hand corner are no longer as unlikely to cover (β4, β5) in repeated samples; it also means that the areas near the top left-hand corner and bottom right-hand corner are even less likely to cover (β4, β5). In this case the ellipse representing the confidence region is tilted to the right, as shown in figure 4.2. In the case of negative covariance between β4^OLS and β5^OLS, the ellipse is tilted to the left. In all cases, the ellipse remains centered on the point (β4^OLS, β5^OLS).

[Figure 4.2 A confidence region with positive covariance]

This two-dimensional example illustrates the possibility, mentioned earlier, of accepting two individual hypotheses but rejecting the corresponding joint hypothesis. Suppose the hypothesis is that β4 = 0 and β5 = 0, and suppose the point (0, 0) lies inside a corner of the rectangle in figure 4.1, but outside the ellipse. Testing the hypothesis β4 = 0 using a t test concludes that β4 is insignificantly different from zero (because the interval AB contains zero), and testing the hypothesis β5 = 0 concludes that β5 is insignificantly different from zero (because the interval CD contains zero). But testing the joint hypothesis that (β4, β5)' equals the zero vector using an F test concludes that (β4, β5)' is significantly different from the zero vector, because (0, 0) lies outside the ellipse. In this example one can confidently say that at least one of the two variables has a significant influence on the dependent variable, but one cannot with confidence assign that influence to either of the variables individually. The typical circumstance in which this comes about is the case of multicollinearity (see chapter 11), in which independent variables are related so that it is difficult to tell which of the variables deserves credit for explaining variation in the dependent variable. Figure 4.2 is representative of the multicollinearity case.

In three dimensions the confidence region becomes a confidence volume and is represented diagrammatically by an ellipsoid. In higher dimensions diagrammatic representation is impossible, but the hyper-surface corresponding to a critical value of the F statistic can be called a multidimensional ellipsoid.
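The possibility described above, individually insignificant t statistics together with a significant joint F statistic, is easy to reproduce with collinear regressors. A minimal sketch assuming NumPy and SciPy; the data-generating values are illustrative, not from the text:

```python
# With highly collinear regressors the individual t tests are often
# insignificant while the joint F test rejects that both slopes are zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T = 40
x1 = rng.normal(size=T)
x2 = x1 + 0.05 * rng.normal(size=T)            # nearly collinear with x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=T)

X = np.column_stack([np.ones(T), x1, x2])
K = X.shape[1]
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
V = (e @ e / (T - K)) * np.linalg.inv(X.T @ X)
t_vals = b[1:] / np.sqrt(np.diag(V))[1:]       # individual t statistics: typically small

sse_u = e @ e
sse_r = np.sum((y - y.mean()) ** 2)            # constrained: both slopes zero
F = ((sse_r - sse_u) / 2) / (sse_u / (T - K))
print(t_vals, stats.t.ppf(0.975, T - K))
print(F, stats.f.ppf(0.95, 2, T - K))          # F typically rejects even when the t's do not
```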
4.5 LR, W, AND LM STATISTICS

The F test discussed above is applicable whenever we are testing linear restrictions in the context of the CNLR model. Whenever the problem cannot be cast into this mold, for example if the restrictions are nonlinear, the model is nonlinear in the parameters or the errors are distributed non-normally, this procedure is inappropriate and is usually replaced by one of three asymptotically equivalent tests. These are the likelihood ratio (LR) test, the Wald (W) test, and the Lagrange multiplier (LM) test. The test statistics associated with these tests have unknown small-sample distributions, but are each distributed asymptotically as a chi-square with degrees of freedom equal to the number of restrictions being tested.

These three test statistics are based on three different rationales. Consider figure 4.3, in which the log-likelihood (lnL) function is graphed as a function of β, the parameter being estimated. β^MLE is, by definition, the value of β at which lnL attains its maximum. Suppose the restriction being tested is written as g(β) = 0, satisfied at the restricted estimate where the function g(β) cuts the horizontal axis.

(1) The LR test If the restriction is true, then lnL_R, the maximized value of lnL imposing the restriction, should not be significantly less than lnL_max, the unrestricted maximum value of lnL. The LR test tests whether (lnL_max - lnL_R) is significantly different from zero.

(2) The W test If the restriction g(β) = 0 is true, then g(β^MLE) should not be significantly different from zero. The W test tests whether β^MLE (the unrestricted estimate of β) violates the restriction by a significant amount.

(3) The LM test The log-likelihood function lnL is maximized at the point where the slope of lnL with respect to β is zero. If the restriction is true, then the slope of lnL at the restricted estimate should not be significantly different from zero. The LM test tests whether the slope of lnL, evaluated at the restricted estimate, is significantly different from zero.

[Figure 4.3 Explaining the LR, W, and LM statistics]

When faced with three statistics with identical asymptotic properties, econometricians would usually choose among them on the basis of their small-sample properties, as determined by Monte Carlo studies. In this case, however, it happens that computational cost plays a dominant role in this respect. To calculate the LR statistic, both the restricted and the unrestricted estimates of β must be calculated; if neither is difficult to compute, then the LR test is computationally the most attractive of the three tests. To calculate the W statistic only the unrestricted estimate is required; if the restricted but not the unrestricted estimate is difficult to compute, owing to a nonlinear restriction, for example, the W test is computationally the most attractive. To calculate the LM statistic, only the restricted estimate is required; if the unrestricted but not the restricted estimate is difficult to compute, for example when imposing the restriction transforms a nonlinear functional form into a linear functional form, the LM test is the most attractive.
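As an illustration of the LR rationale, the following minimal sketch (NumPy and SciPy assumed, illustrative data) uses the standard result that for a normal linear regression the LR statistic for linear restrictions reduces to T times the log of the ratio of restricted to unrestricted SSE; this computational shortcut is supplied here for illustration rather than taken from the text.

```python
# LR test in a normal linear regression: LR = T * ln(SSE_r / SSE_u),
# compared with a chi-square critical value (one restriction here).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
T = 60
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.4 * x1 + 0.0 * x2 + rng.normal(size=T)

def sse(X, y):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

sse_u = sse(np.column_stack([np.ones(T), x1, x2]), y)   # unrestricted
sse_r = sse(np.column_stack([np.ones(T), x1]), y)       # restriction: coefficient on x2 is zero

LR = T * np.log(sse_r / sse_u)              # 2 * (lnL_max - lnL_R) for normal errors
print(LR, stats.chi2.ppf(0.95, 1))          # reject the restriction if LR is larger
```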
4.6 BOOTSTRAPPING

Testing hypotheses exploits knowledge of the sampling distributions of test statistics when the null is true, and constructing confidence intervals requires knowledge of estimators' sampling distributions. Unfortunately, this "knowledge" is often questionable, or unavailable, for a variety of reasons.

(1) Assumptions made concerning the distribution of the error term may be false. For example, the error may not be distributed normally, or even approximately normally, as is often assumed.
(2) Algebraic difficulties in calculating the characteristics of a sampling distribution often cause econometricians to undertake such derivations assuming that the sample size is very large. The resulting "asymptotic" results may not be close approximations for the sample size of the problem at hand.
(3) For some estimating techniques, such as minimizing the median squared error, even asymptotic algebra cannot produce formulas for estimator variances.
(4) A researcher may obtain an estimate by undertaking a series of tests, the results of which lead eventually to adoption of a final estimation formula. This search process makes it impossible to derive algebraically the character of the sampling distribution.

One way of dealing with these problems is to perform a Monte Carlo study: data are simulated to mimic the process thought to be generating the data, the estimate or test statistic is calculated, and this process is repeated several thousand times to allow computation of the character of the sampling distribution of the estimator or test statistic. To tailor the Monte Carlo study to the problem at hand, initial parameter estimates are used as the "true" parameter values, and the actual values of the explanatory variables are employed as the "fixed in repeated samples" values of the explanatory variables. But this tailoring is incomplete because in the Monte Carlo study the errors must be drawn from a known distribution such as the normal. This is a major drawback of the traditional Monte Carlo methodology in this context.

The bootstrap is a special Monte Carlo procedure which circumvents this problem. It does so by assuming that the unknown distribution of the error term can be adequately approximated by a discrete distribution that gives equal weight to each of the residuals from the original estimation. Assuming a reasonable sample size, in typical cases most of the residuals should be small in absolute value, so that although each residual is given equal weight (and thus is equally likely to be chosen in random draws from this distribution), small residuals predominate, causing random draws from this distribution to produce small values much more frequently than large values. This procedure, which estimates sampling distributions by using only the original data (and so "pulls itself up by its own bootstraps"), has proved to be remarkably successful. In effect it substitutes computing power, the price of which has dramatically decreased, for theorem-proving, whose price has held constant or even increased as we have adopted more complicated estimation procedures.

The bootstrap begins by estimating the model in question and saving the residuals. It performs a Monte Carlo study, using the estimated parameter values as the "true" parameter values and the actual values of the explanatory variables as the fixed explanatory variable values. During this Monte Carlo study errors are drawn, with replacement, from the set of original residuals. In this way account is taken of the unknown distribution of the true errors. This "residual-based" technique is only appropriate whenever each error is equally likely to be drawn for each observation. If this is not the case, an alternative bootstrapping method is employed. See the general notes to this section for further discussion.
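A minimal sketch of the residual-based bootstrap just described, assuming NumPy; the sample size, model and number of bootstrap replications are illustrative, not from the text:

```python
# Residual bootstrap: resample the original residuals with replacement,
# rebuild y using the estimated coefficients as "true" values, and re-estimate.
import numpy as np

rng = np.random.default_rng(9)
T = 30
x = rng.uniform(0, 10, size=T)
y = 1.0 + 0.5 * x + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

boot = []
for _ in range(2000):
    e_star = rng.choice(resid, size=T, replace=True)   # draw errors from the residuals
    y_star = X @ b + e_star                            # same X, estimated b as "true" values
    boot.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])

print(np.std(boot))                                    # bootstrap standard error of the slope
print(np.percentile(boot, [2.5, 97.5]))                # a simple percentile interval
```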
GENERAL NOTES

4.1 Introduction

• It is extremely convenient to assume that errors are distributed normally, but there is little justification for this assumption. Tiao and Box (1973, p. 13) speculate that belief in the near-normality of disturbances may be traced to early feeding on a diet of asymptotic normality results for maximum likelihood and other estimators. It has also been claimed that "everyone believes in the [Gaussian] law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an empirical fact." Several tests for normality exist; for a textbook exposition see Maddala (1977, pp. 305-8). See also Judge et al. (1988, pp. 882-7). The consequences of non-normality of the fat-tailed kind, implying infinite variance, are quite serious, since hypothesis testing and interval estimation cannot be undertaken meaningfully. Faced with such non-normality, two options exist. First, one can employ robust estimators, as described in chapter 19. And second, one can transform the data to create transformed errors that are closer to being normally distributed. For discussion see Maddala (1977, pp. 314-17).
• Testing hypotheses is viewed by some with scorn. Consider for example the remark of Johnson (1971, p. 2): "The 'testing of hypotheses' is frequently merely a euphemism for obtaining plausible numbers to provide ceremonial adequacy for a theory chosen and defended on a priori grounds." For a completely opposite cynical view, Blaug (1980, p. 257) feels that econometricians "express a hypothesis in terms of an equation, estimate a variety of forms for that equation, select the best fit, discard the rest, and then adjust the theoretical argument to rationalize the hypothesis that is being tested."
• It should be borne in mind that, despite the power, or lack thereof, of hypothesis tests, conclusions are often convincing to a researcher only if supported by personal experience. Nelson (1995, p. 141) captures this subjective element of empirical research by noting that what often really seems to matter in convincing a male colleague of the existence of sex discrimination is not a study with thousands of "objective" observations, but rather a particular single direct observation: the experience of his own daughter.
• Hypothesis tests are usually conducted using a type I error rate (probability of rejecting a true null) of 5%, but there is no good reason why 5% should be preferred to some other percentage. It is chosen so often that it has become a tradition, prompting Kempthorne and Doerfler (1969, p. 231) to opine that "statisticians are people whose aim in life is to be wrong 5% of the time!"
• For a number of reasons, tests of significance can sometimes be misleading. A good discussion can be found in Bakan (1966). One of the more interesting problems in this respect is the fact that almost any parameter can be found to be significantly different from zero if the sample size is sufficiently large. (Almost every relevant independent variable will have some influence, however small, on a dependent variable; increasing the sample size will reduce the variance and eventually make this influence statistically significant.) Thus, although a researcher wants a large sample size to generate more accurate estimates, too large a sample size might cause difficulties in interpreting the usual tests of significance. McCloskey and Ziliak (1996) look carefully at a large number of empirical studies in economics and conclude that researchers seem not to appreciate that statistical significance does not imply economic significance. One must ask if the magnitude of the coefficient in question is large enough for its explanatory variable to have a meaningful (as opposed to "significant") influence on the dependent variable.
This is called the too-large sample size problem. It is suggested that the significance level be adjusted downward as the sample size grows; for a formalization see Leamer (1978, pp. 88-9, 104-5). See also Attfield (1982). Leamer would also argue (1988, p. 331) that this problem would be resolved if researchers recognized that genuinely interesting hypotheses are neighborhoods, not points. Another interesting dimension of this problem is the question of what significance level should be employed when replicating a study with new data; conclusions must be drawn by considering both sets of data as a unit, not just the new set of data. For discussion see Busche and Kennedy (1984). A third interesting example in this context is the propensity for published studies to contain a disproportionately large number of type I errors: studies with statistically significant results tend to get published, whereas those with insignificant results do not. For comment see Feige (1975). Yet another example that should be mentioned here is pre-test bias, discussed in chapter 12.
• Inferences from a model may be sensitive to the model specification, the validity of which may be in doubt. A fragility analysis is recommended to deal with this; it examines the range of inferences resulting from the range of believable model specifications. See Leamer and Leonard (1983) and Leamer (1983).
• Armstrong (1978, pp. 406-7) advocates the use of the method of multiple hypotheses, in which research is designed to compare two or more reasonable hypotheses, in contrast to the usual advocacy strategy in which a researcher tries to find confirming evidence for a favorite hypothesis. (Econometricians, like artists, tend to fall in love with their models!) It is claimed that the latter procedure biases the way scientists perceive the world, and that scientists employing the former strategy progress more rapidly.

4.2 Testing a Single Hypothesis: the t Test

• A t test can be used to test any single linear constraint. Suppose y = α + βx + δw + ε and we wish to test β + δ = 1. A t test is formulated by rewriting the constraint so that it is equal to zero, in this case as β + δ - 1 = 0, estimating the left-hand side as β^OLS + δ^OLS - 1, and dividing this by the square root of its estimated variance to form a t statistic with degrees of freedom equal to the sample size minus the number of parameters estimated in the regression. Estimation of the variance of (β^OLS + δ^OLS) is a bit messy, but can be done using the elements in the estimated variance-covariance matrix from the OLS regression. This messiness can be avoided by using an F test, as explained in the general notes to the following section.
• Nonlinear constraints are usually tested by using a W, LR or LM test, but sometimes an "asymptotic" t test is encountered: the nonlinear constraint is written with its right-hand side equal to zero, the left-hand side is estimated and then divided by the square root of an estimate of its asymptotic variance to produce the asymptotic t statistic. It is the square root of the corresponding W test statistic. The asymptotic variance of a nonlinear function was discussed in chapter 2.
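The "messy" variance calculation mentioned above uses the relevant elements of the estimated variance-covariance matrix. A minimal sketch assuming NumPy and SciPy, with an illustrative model and the hypothesis that the two slope coefficients sum to one (values not from the text):

```python
# t test of H0: beta1 + beta2 = 1, using
# var(b1 + b2) = var(b1) + var(b2) + 2*cov(b1, b2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
T = 50
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 0.5 + 0.6 * x1 + 0.4 * x2 + rng.normal(size=T)
X = np.column_stack([np.ones(T), x1, x2])
K = X.shape[1]

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
V = (e @ e / (T - K)) * np.linalg.inv(X.T @ X)   # estimated variance-covariance matrix

diff = b[1] + b[2] - 1.0
var_diff = V[1, 1] + V[2, 2] + 2 * V[1, 2]
t_stat = diff / np.sqrt(var_diff)
print(t_stat, stats.t.ppf(0.975, T - K))          # compare with the two-sided critical value
```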
4.3 Testing a Joint Hypothesis: the F Test

• If there are only two observations, a linear function with one independent variable (i.e., two parameters) will fit the data perfectly, regardless of what independent variable is used. Adding a third observation will destroy the perfect fit, but the fit will remain quite good, simply because there is effectively only one observation to explain. It is to correct this phenomenon that statistics are adjusted for degrees of freedom, the number of "free" or linearly independent observations used in the calculation of the statistic. For all of the F tests cited in this section, the degrees of freedom appropriate for the numerator is the number of restrictions being tested. The degrees of freedom for the denominator is T - K, the number of observations less the number of parameters being estimated. T - K is also the degrees of freedom for the t statistic mentioned in section 4.2.
• The degrees of freedom of a statistic is the number of quantities that enter into the calculation of the statistic minus the number of constraints connecting these quantities. For example, the formula used to compute the sample variance involves the sample mean statistic. This places a constraint on the data: given the sample mean, any one data point can be determined by the other (N - 1) data points. Consequently there are in effect only (N - 1) unconstrained observations available to estimate the sample variance; the degrees of freedom of the sample variance statistic is (N - 1).
• A special case of the F statistic is automatically reported by most regression packages: the F statistic for the "overall significance of the regression." This F statistic tests the hypothesis that all the slope coefficients are zero. The constrained regression in this case would have only an intercept.
• To clarify further how one runs a constrained regression, suppose for example that y = α + βx + δw + ε and we wish to impose the constraint that β + δ = 1. Substitute δ = 1 - β and rearrange to get y - w = α + β(x - w) + ε. The restricted SSE is obtained from regressing the constructed variable (y - w) on a constant and the constructed variable (x - w). Note that because the dependent variable has changed it is not appropriate to compare the R² of this regression with that of the original regression.
• In the preceding example it should be clear that it is easy to construct an F test of the hypothesis that β + δ = 1. The resulting F statistic will be the square of the t statistic that could be used to test this same hypothesis (described in the preceding section); this reflects the general result that the square of a t statistic is an F statistic (with degrees of freedom one and the degrees of freedom for the t test). For testing a single coefficient equal to a specific value, it is usually easier to perform the t test.
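The constrained regression in the example above can be formed directly. A minimal sketch assuming NumPy, with illustrative data (not from the text):

```python
# Imposing beta + delta = 1 by substitution: regress (y - w) on an
# intercept and (x - w); the restricted SSE comes from this regression.
import numpy as np

rng = np.random.default_rng(11)
T = 50
x, w = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.7 * x + 0.3 * w + rng.normal(size=T)   # true coefficients satisfy the constraint

Xr = np.column_stack([np.ones(T), x - w])
br = np.linalg.lstsq(Xr, y - w, rcond=None)[0]

alpha_hat, beta_hat = br
delta_hat = 1.0 - beta_hat                          # recovered from the constraint
resid = (y - w) - Xr @ br
print(alpha_hat, beta_hat, delta_hat, resid @ resid)   # last value is the restricted SSE
```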
Using the dictonar Is impossible to tt For {cconometzicians should say ariables proves neither ‘meaning af causality. salty. Granger developed 4 special definition of causal { which economeinicians use in place of the dictionary definition, sai speaking Hranger-cause” in place of “cause.” but usually they do hag eanhble «is said to Granger-cause y if prediction of the eutent valee of ve ‘enhanced by using past values ofr. This definition ing by regressing y on pas, current and future val fron © t0_v, the set of coelicienis of the future different from the vero vector (via an F testy both data sets are transformed (using the sane sutocorteltion in the error attached to this regtess format Df the F tess chaptcr ® exwmines the problem of aulocor ‘ery exists over the appropriate way of conducting ths tr lo which the results are sensitive 10 the transforman (on the possibility oF expected fotuns values of t af plemented for empirical tes. of rs if eausaily runs one way. values of should test insignificantly and the set of coefficients of the past val- tics of 2 should test significantly different from zero, Belong ra ing this regression tion), so as to eliminate any on, (This is required to pct use led errors.) Grea! vontrox sfomnation and the exten: oxen. Other erticisms focus the Curent value ofr and, werseause Cheistmas> ices the mr studies 08 this top (Consider, for example the fact that Chest Bishop (1979) has cameise review and refer Darnell (1994, pp. 41-3) has a concise textbook expos 4.4 Interval Estimation for a Parameter Vector © Figure 4.2 can be used to illustrate another curiosity ~ the possibilty of accepting the . ty chasis oan tes wl acting te Rypotess ht = aa she hypothesis tht Don ns of mn oth wou Neth st he ‘one at hand, the point (9, 0) fell in either of the small shaded! areas {in the upper pet or Ser thelist 42. Fer assay ds a he posible ean that coud aie hee, long ian example of hs seldom encountered cant’ se Ges ana ese 8 4.5 LR, W. and LM Statistics sion of the W, LR and LM stanisties, noting, for example that he scone ain es an nee statistics through variances (recalt that these second derivatives Inainize subj Teens, the Farange malin tshnige 6 asl expansions of the W. LR aod LM tess are different and upon esaraination the ER test os to be favored un small samples, Dagens and Datour (1991, 19829 coche that W tests and some fists of LM tests ate not invariant to ekimges a the measreme lunits the represenation of the null hypothesis and rapatameterizations, an so recon vend the LB test Because it requires estimation under only dhe mall hypothess, the 1.M tests kes spe Cie than other tests coneeming the precise nagure of the altersative hypothesis. Thi ‘could be viewed ay an ausantage, since allows testing to be eondueted inthe content ‘ofa more general alermtive hypothesis, oF as disadvantage, since it dos not permit the prceise nature of an alternative hypothesis to play a role sand thereby inerease the power of the test, Monte Carlo studies, however, have show that this potential dans back isnot oP gees cone For the special ease of i linear restrictions ithe CNL. 
4.4 Interval Estimation for a Parameter Vector

• Figure 4.2 can be used to illustrate another curiosity: the possibility of accepting the hypothesis that β1 = β2 = 0 on the basis of an F test while rejecting the hypothesis that β1 = 0 and the hypothesis that β2 = 0 on the basis of individual t tests. This would be the case if, for the sample at hand, the point (β̂1, β̂2) fell in either of the small shaded areas (in the upper right or the lower left) of figure 4.2. For a summary discussion of the possible cases that could arise here, along with an example of this seldom encountered case, see Geary and Leser (1968).

4.5 LR, W, and LM Statistics

• Good expositions of the W, LR, and LM statistics note, for example, how the score and the second derivatives of the log-likelihood relate these statistics through variances (recall that when maximizing subject to constraints the Lagrange multiplier technique is used). The small-sample expansions of the W, LR, and LM tests are different, and upon examination the LR test appears to be favored in small samples. Dagenais and Dufour (1991, 1992) conclude that W tests and some forms of LM tests are not invariant to changes in the measurement units, the representation of the null hypothesis, and reparameterizations, and so recommend the LR test.

• Because it requires estimation under only the null hypothesis, the LM test is less specific than other tests concerning the precise nature of the alternative hypothesis. This could be viewed as an advantage, since it allows testing to be conducted in the context of a more general alternative hypothesis, or as a disadvantage, since it does not permit the precise nature of an alternative hypothesis to play a role and thereby increase the power of the test. Monte Carlo studies, however, have shown that this potential drawback is not of great concern.

• For the special case of testing linear restrictions in the CNLR model with σ² known, the LR, W, and LM tests are equivalent to the F test (which in this circumstance, because σ² is known, becomes a χ² test). When σ² is unknown, see Vandaele (1981) for the relationships among these tests.

• In many cases it turns out that the parameters characterizing several misspecifications are functionally independent of each other, so that the information matrix is block-diagonal. In this case the LM statistic for testing all the misspecifications jointly is the sum of the LM statistics for testing each of the misspecifications separately. The same is true for the W and LR statistics.

• A nonlinear restriction can be written in different ways. For example, the restriction αβ − 1 = 0 could be written as α − 1/β = 0. Gregory and Veall (1985) find that the Wald test statistic is sensitive to the way in which the restriction is written; for this example they recommend the former version.

4.6 Bootstrapping

• Jeong and Maddala (1993) is a good survey of bootstrapping in an econometric context. Li and Maddala (1996) extend this survey, concentrating on time series data. Veall (1987, 1992) are good examples of econometric applications, and Veall (1989, 1998) are concise surveys of such applications. Efron and Tibshirani (1993) is a detailed exposition.

• An implicit assumption of bootstrapping is that the errors are exchangeable, meaning that each error, which in this case is one of the N residuals (sample size N), is equally likely to occur with each observation. This may not be true. For example, larger error variances might be associated with larger values of one of the explanatory variables, in which case large errors are more likely to occur whenever there are large values of this explanatory variable. A variant of the bootstrap called the complete bootstrap can be employed to deal with this problem. Each of the N observations in the original sample is written as a vector of values containing an observation on the dependent variable and an associated observation on each of the explanatory variables. Observations for a Monte Carlo repeated sample are drawn with replacement from the set of these vectors. This technique introduces three innovations. First, it implicitly employs the true, unknown errors, because they are part of the dependent variable values, and keeps these unknown errors paired with the original explanatory variable values with which they were associated. Second, it does not employ estimates of the unknown parameters, implicitly using the true parameter values. And third, it no longer views the explanatory variable values as fixed in repeated samples, assuming instead that these values were drawn from a distribution adequately approximated by a discrete distribution giving equal weight to each observed vector of values on the explanatory variables. This makes sense in a context in which the observations are a small subset of a large population of similar observations. Unfortunately, it does not make sense if the original observations exhaust the population, as would be the case, for example, if they were observations on all large Canadian cities. Nor would it make sense in a context in which a researcher selected the values of the explanatory variables to suit the study rather than having them drawn via some random process. It also would not be suitable for problems in which the errors are autocorrelated, so that the error for one observation is related to the error for another; in this case a bootstrapping-residuals technique would have to be used, with an appropriate modification to create the desired error correlation in each bootstrapped sample. The message here is that the bootstrapping procedure must be carefully thought out for each application.
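As one concrete illustration, here is a minimal sketch of the complete (pairs) bootstrap just described, resampling whole (y, x) observations with replacement; the simulated data (with error variance growing with |x|), the number of replications, and the choice of the OLS slope as the statistic of interest are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + (1.0 + np.abs(x)) * rng.normal(size=n)  # error variance grows with |x|

X = np.column_stack([np.ones(n), x])

def slope(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]

B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)   # resample whole (y, x) rows, keeping each pair intact
    boot[b] = slope(X[idx], y[idx])

print(slope(X, y), boot.std(ddof=1))   # OLS slope and its pairs-bootstrap standard error
```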
• An alternative computer-based means of estimating the sampling distribution of a test statistic is associated with a randomization or permutation test. The rationale behind this testing methodology is that if an explanatory variable has no influence on a dependent variable, then it should make little difference to the outcome of the test statistic if the values of this explanatory variable are shuffled and matched up with different dependent variable values. By performing this shuffling thousands of times, each time calculating the test statistic, the hypothesis can be tested by seeing how extreme the original test statistic value is relative to the thousands of test statistic values created by the shufflings. Notice how different is the meaning of the sampling distribution: it no longer corresponds to "what would happen if we drew different bundles of errors"; now it corresponds to "what would happen if the independent variable values were paired with different dependent variable values." Hypothesis testing is based on viewing the test statistic as having resulted from playing a game of chance; the randomization view of testing claims that there is more than one way to play a game of chance with one's data! For further discussion of this testing methodology in the econometrics context see Kennedy (1995) and Kennedy and Cade (1995); Noreen (1989) is a good elementary reference.
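A minimal sketch of this shuffling procedure, assuming simulated data, the OLS slope as the test statistic, and 5,000 shuffles (all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

def slope(x, y):
    """OLS slope of y on a constant and x, used here as the test statistic."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]

observed = slope(x, y)

R = 5000
shuffled = np.empty(R)
for r in range(R):
    shuffled[r] = slope(rng.permutation(x), y)   # shuffle x, breaking any link with y

# two-sided p-value: how unusual is the observed statistic among the shuffled ones?
print(observed, np.mean(np.abs(shuffled) >= np.abs(observed)))
```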
TECHNICAL NOTES

4.1 Introduction

• A type I error is concluding the null hypothesis is false when it is true; a type II error is concluding the null hypothesis is true when it is false. Traditional testing methodologies set the probability of a type I error (called the size of the test, or the significance level) equal to an arbitrarily determined value (typically 5%) and then maximize the power (one minus the probability of a type II error) of the test. A test is called uniformly most powerful (UMP) if it has greater power than any other test of the same size for all degrees of falseness of the hypothesis. Econometric theorists work hard to develop fancy tests, but, as noted by McAleer (1994, p. 334), a test that is never used has zero power, suggesting that tests must be simple to perform if they are to be adopted by practitioners.

• A test is consistent if its power goes to one as the sample size grows to infinity, something that usually happens if the test is based on a consistent estimate.

First, in the words of Hendry and Richard (1983, p. 112), "the data generation process is complicated, data are scarce and of uncertain relevance, experimentation is uncontrolled and available theories are highly abstract and rarely uncontroversial." Second, most econometricians would agree that specification is an innovative/imaginative process that cannot be taught: "Even with a vast arsenal of diagnostics, it is very hard to write down rules that can be used to guide a data analysis. So much is really subjective and subtle.... A great deal of what we teach in applied statistics is not written down, let alone in a form suitable for formal encoding. It is just simply 'lore'" (Welch, 1986, p. 405). And third, there is no accepted "best" way of going about finding a correct specification.

There is little that can be done about items one and two above; they must be lived with. Item three, however, is worthy of further discussion: regardless of how difficult a specification problem, or how limited a researcher's powers of innovation/imagination, an appropriate methodology should be employed when undertaking empirical work. The purpose of this chapter is to discuss this issue; it should be viewed as a prelude to the examination in chapter 6 of specific violations of the first assumption of the CLR model.

5.2 THREE METHODOLOGIES

Until about the mid-1970s, econometricians were too busy doing econometrics to worry about the principles that were or should be guiding empirical research. Sparked by the predictive failure of large-scale econometric models, and fueled by dissatisfaction with the gap between how econometrics was taught and how it was applied by practitioners, the profession began to examine with a critical eye the way in which econometric models were specified. This chapter is in part a summary of the state of this ongoing methodological debate. At considerable risk of oversimplification, three main approaches to the specification problem are described below in stylized form.

(1) AVERAGE ECONOMIC REGRESSION (AER)

This approach describes what is thought to be the usual way in which empirical work in economics is undertaken. The researcher begins with a specification that is viewed as being known to be correct, with data being used primarily to determine the orders of magnitude of a small number of unknown parameters. Significant values of diagnostic test statistics, such as the Durbin-Watson statistic, are initially interpreted as suggesting estimation problems that should be dealt with by adopting more sophisticated estimation methods, rather than as pointing to a misspecification of the chosen model. If these more sophisticated methods fail to "solve" the problem, the researcher then conducts "specification" tests, hunting for an alternative specification that is "better," using age-old criteria such as correct signs, high R²s, and significant t values on coefficients "known" to be nonzero. Thus in the AER approach the data ultimately do play a role in the specification, despite the researcher's initial attitude regarding the validity of the theoretical specification. This role may be characterized as proceeding from a simple model and "testing up" to a more general specification.
(2) TEST, TEST, TEST (TTT)

This approach uses econometrics to discover which models of the economy are tenable, and to test rival views. To begin, the initial specification is made more general than the researcher expects the specification ultimately chosen to be, and testing of various restrictions, such as sets of coefficients equal to the zero vector, is undertaken to simplify this general specification; this testing can be characterized as "testing down" from a general to a more specific model. Following this, the model is subjected to a battery of diagnostic, or misspecification, tests, hunting for signs that the model is misspecified. (Note the contrast with AER "specification" tests, which hunt for specific alternative specifications.) A significant diagnostic, such as a small DW value, is interpreted as pointing to a model misspecification rather than as pointing to a need for a more sophisticated estimation method. The model is continually respecified until a battery of diagnostic tests allows a researcher to conclude that the model is satisfactory on several criteria (discussed in the general notes), in which case it is said to be congruent with the evidence.

(3) FRAGILITY ANALYSIS

The specification ultimately arrived at by the typical AER or TTT search may be inappropriate because its choice is sensitive to the initial specification investigated, the order in which tests were undertaken, type I and type II errors, and innumerable prior beliefs of researchers concerning the parameters that subtly influence decisions taken (through the exercise of innovation/imagination) throughout the specification process. It may, however, be the case that the different possible specifications that could have arisen from the AER or the TTT approaches would all lead to the same conclusion with respect to the purpose for which the study was undertaken, in which case why worry about the specification? This is the attitude towards specification adopted by the third approach.

Suppose that the purpose of the study is to estimate the coefficients of some "key" variables. The first step of this approach, after identifying a general family of models, is to undertake an "extreme bounds analysis," in which the coefficients of the key variables are estimated using all combinations of included/excluded "doubtful" variables (a sketch of this step appears at the end of this subsection). If the resulting range of estimates is too wide for comfort, an attempt is made to narrow this range by conducting a "fragility analysis." A Bayesian method (see chapter 13) is used to incorporate non-sample information into the estimation, but in such a way as to allow for a range of this Bayesian information, corresponding to the range of such information held by researchers interested in this estimation. This range of information will produce a range of estimates of the parameters of interest; a narrow range ("sturdy" estimates) implies that the data at hand yield useful information, but if this is not the case ("fragile" estimates), it must be concluded that inferences from these data are too fragile to be believed.
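The extreme bounds step referenced above lends itself to a short sketch: re-estimate the coefficient on a hypothetical "key" variable over every combination of included/excluded "doubtful" variables and report the range of estimates. The data, variable names, and number of doubtful regressors below are illustrative assumptions, not part of the text.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 120
key = rng.normal(size=n)                          # the "key" variable of interest
doubtful = rng.normal(size=(n, 4))                # four "doubtful" regressors
y = 1.0 + 0.5 * key + 0.3 * doubtful[:, 0] + rng.normal(size=n)

def key_coef(cols):
    """OLS coefficient on the key variable when the doubtful columns in cols are included."""
    X = np.column_stack([np.ones(n), key] + [doubtful[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]

estimates = [key_coef(cols)
             for m in range(doubtful.shape[1] + 1)
             for cols in combinations(range(doubtful.shape[1]), m)]  # every subset

print(min(estimates), max(estimates))             # the extreme bounds for the key coefficient
```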
Which is the best of these three general approaches? There is no agreement that one of these methodologies is unequivocally the best to employ. Each has faced criticism, a general summary of which is provided below.

(1) The AER is the most heavily criticized, perhaps because it reflects most accurately what researchers actually do. It is accused of using econometrics merely to illustrate assumed-known theories. The attitude that significant diagnostics reflect estimation problems rather than specification errors is viewed in an especially negative light, even by those defending this approach. "Testing up" is recognized as inappropriate, inviting type I errors through loss of control over the overall probability of a type I error. The ad hoc use of extraneous information, such as the "right" signs on coefficient estimates, is deplored, especially by those with a Bayesian bent. The use of statistics such as R², popular with those following this methodology, is frowned upon. Perhaps most frustrating to critics is the lack of a well-defined structure and set of criteria for this approach; there is never an adequate description of the path taken to the ultimate specification.

(2) The TTT methodology is also criticized for failing in practice to provide an adequate description of the path taken to the ultimate specification, reflecting an underlying suspicion that practitioners using this methodology find it necessary to use many of the ad hoc rules of thumb followed in the AER approach. This could in part be a reflection of the role played in specification by innovation/imagination, which cannot adequately be explained or defended, but it is nonetheless unsettling. The heavy reliance on testing in this methodology raises fears of a proliferation of type I errors (creating pretest bias, discussed in section 12.4 of chapter 12), exacerbated by the small degrees of freedom due to the very general initial specification and by the fact that many of these tests have only asymptotic justification. When "testing up" the probability of a type I error is neither known nor controlled; using the "testing down" approach can allay these fears by the adoption of a lower α value for the tests, but this is not routinely done.

(3) Objections to fragility analysis usually come from those not comfortable with the Bayesian approach, even though care has been taken to make it palatable to non-Bayesians. Such objections are theological in nature and not easily resolved. There is vagueness regarding how large a range of parameter estimates has to be to conclude that it is fragile; attempts to formalize this lead to measures comparable to the test statistics this approach seeks to avoid. The methodology never does lead to the adoption of a specific specification, something that researchers find unsatisfactory. There is no scope for the family of models initially chosen to be changed in the light of what the data have to say. Many researchers find Bayesian prior formulation both difficult and alien. Some object that this analysis too often concludes that results are fragile.

5.3 GENERAL PRINCIPLES FOR SPECIFICATION

Although the controversy over econometric methodology has not yet been resolved, the debate has been fruitful in that some general principles have emerged to guide model specification.

(1) Although "letting the data speak for themselves" through econometric estimation and testing is an important part of model specification, economic theory should be the foundation of and guiding force in a specification search.

(2) Models whose residuals do not test as insignificantly different from white noise (random errors) should be initially viewed as containing a misspecification, not as needing a special estimation procedure, as researchers are prone to do.

(3) "Testing down" is more suitable than "testing up": one should begin with a general, unrestricted model and then systematically simplify it in light of the sample evidence. In doing this a researcher should control the overall probability of a type I error by adjusting the α value used at each stage of the testing (as explained in the technical notes), something which too many researchers neglect to do; a small numerical illustration of why this matters follows these principles.
This approach, deliberate overfitting, involves a loss of efficiency (and thus a loss of power) when compared to a search beginning with a correct simple model. But this simple model may not be correct, in which case the approach of beginning with a simple model and expanding as the data permit runs the danger of biased inference resulting from underspecification.

(4) Tests of misspecification are better undertaken by testing simultaneously for several misspecifications rather than testing one-by-one for these misspecifications. By such an "overtesting" technique one avoids the problem of one type of misspecification adversely affecting a test for some other type of misspecification. This approach helps to deflect the common criticism that such tests rely for their power on aspects of the maintained hypothesis about which little is known.

(5) Regardless of whether or not it is possible to test simultaneously for misspecifications, models should routinely be exposed to a battery of misspecification diagnostic tests before being accepted. A subset of the data should be set aside before model specification and estimation, so that these tests can include tests for predicting extra-sample observations.

(6) Researchers should be obliged to show that their model encompasses rival models, in the sense that it can predict what results would be obtained were one to run the regression suggested by a rival model.
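As promised in principle (3), here is a small numerical illustration of why the overall type I error probability needs to be controlled when several tests are run in a "testing down" sequence. The Bonferroni-style split of the overall significance level shown here is only one simple possibility, not necessarily the specific adjustment the technical notes have in mind; the numbers of tests and the α level are illustrative.

```python
# With m independent tests each run at significance level alpha, the chance of at
# least one spurious rejection is 1 - (1 - alpha)**m.  Splitting alpha across the
# tests (a Bonferroni-style adjustment) keeps the overall probability near alpha.
alpha, m = 0.05, 5
print(1 - (1 - alpha) ** m)        # about 0.23: overall type I error with no adjustment
print(1 - (1 - alpha / m) ** m)    # about 0.049: each test run at alpha / m instead
```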
