SCHAUM'S OUTLINE OF THEORY AND PROBLEMS OF STATISTICS AND ECONOMETRICS
SECOND EDITION

Updated examples with the most current U.S. and world data. Two complete self examinations. New chapter on Time Series Econometrics. Perfect for pre-test review.

Use with these courses: Statistics and Econometrics, Statistical Methods in Economics, Quantitative Methods in Economics, Mathematical Economics, Micro-Economics, Macro-Economics, Math for Economists, Math for Social Sciences.

DOMINICK SALVATORE, Ph.D.
Professor and Chairperson, Department of Economics, Fordham University

DERRICK REAGLE, Ph.D.
Assistant Professor of Economics, Fordham University

Schaum's Outline Series
McGraw-Hill

Copyright © 2002 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-139568-7

The material in this eBook also appears in the print version of this title: 0-07-134852-2.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. ("McGraw-Hill") and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill's prior consent.
You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED "AS IS". McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071395687

PREFACE

This book presents a clear and concise introduction to statistics and econometrics. A course in statistics or econometrics is often one of the most useful but also one of the most difficult of the required courses in colleges and universities. The purpose of this book is to help overcome this difficulty by using a problem-solving approach. Each chapter begins with a statement of theory, principles, or background information, fully illustrated with examples. This is followed by numerous theoretical and practical problems with detailed, step-by-step solutions.
While primarily intended as a supplement to all current standard textbooks of statistics and/or econometrics, the book can also be used as an independent text, as well as to supplement class lectures.

The book is aimed at college students in economics, business administration, and the social sciences taking a one-semester or a one-year course in statistics and/or econometrics. It also provides a very useful source of reference for M.A. and M.B.A. students and for all those who use (or would like to use) statistics and econometrics in their work. No prior statistical background is assumed.

The book is completely self-contained in that it covers the statistics (Chaps. 1 to 5) required for econometrics (Chaps. 6 to 11). It is applied in nature, and all proofs appear in the problems section rather than in the text itself. Real-world socioeconomic and business data are used, whenever possible, to demonstrate the more advanced econometric techniques and models. Several sources of online data are used, and Web addresses are given for the student's and researcher's further use (App. 12). Topics frequently encountered in econometrics, such as multicollinearity and autocorrelation, are clearly and concisely discussed as to the problems they create, the methods to test for their presence, and possible correction techniques.

In this second edition, we have expanded the computer applications to provide a general introduction to data handling, and specific programming instruction to perform all estimations in this book by computer (Chap. 12) using Microsoft Excel, Eviews, or SAS statistical packages. We have also added sections on nonparametric testing, matrix notation, binary choice models, and a chapter on time series analysis (Chap. 11), a field of econometrics which has expanded as of late. A sample statistics and econometrics examination is also included.

The methodology of this book and much of its content has been tested in undergraduate and graduate classes in statistics and econometrics at Fordham University.
Students found the approach and content of the book extremely useful and made many valuable suggestions for improvement. We have also received very useful advice from Professors Mary Beth Combs, Edward Dowling, and Damodar Gujarati. The following students carefully read through the entire manuscript and made many useful comments: Luca Bonardi, Kevin Coughlin, Sean Hennessy, and James Santangelo. To all of them we are deeply grateful. We owe a great intellectual debt to our former professors of statistics and econometrics: J. S. Butler, Jack Johnston, Lawrence Klein, and Bernard Okun.

We are indebted to the Literary Executor of the late Sir Ronald A. Fisher, F.R.S., to Dr. Frank Yates, F.R.S., and the Longman Group Ltd., London, for permission to adapt and reprint Tables III and IV from their book, Statistical Tables for Biological, Agricultural and Medical Research.

In addition to Statistics and Econometrics, the Schaum's Outline Series in Economics includes Microeconomic Theory, Macroeconomic Theory, International Economics, Mathematics for Economists, and Principles of Economics.

DOMINICK SALVATORE
DERRICK REAGLE
New York, 2001

CONTENTS

CHAPTER 1  Introduction
  1.1 The Nature of Statistics
  1.2 Statistics and Econometrics
  1.3 The Methodology of Econometrics

CHAPTER 2  Descriptive Statistics
  2.1 Frequency Distributions
  2.2 Measures of Central Tendency
  2.3 Measures of Dispersion
  2.4 Shape of Frequency Distributions

CHAPTER 3  Probability and Probability Distributions
  3.1 Probability of a Single Event
  3.2 Probability of Multiple Events
  3.3 Discrete Probability Distributions: The Binomial Distribution
  3.4 The Poisson Distribution
  3.5 Continuous Probability Distributions: The Normal Distribution

CHAPTER 4  Statistical Inference: Estimation
  4.1 Sampling
  4.2 Sampling Distribution of the Mean
  4.3 Estimation Using the Normal Distribution
  4.4 Confidence Intervals for the Mean Using the t Distribution

CHAPTER 5  Statistical Inference: Testing Hypotheses
  5.1 Testing Hypotheses
  5.2 Testing Hypotheses about the Population Mean and Proportion
  5.3 Testing Hypotheses for Differences between Two Means or Proportions
  5.4 Chi-Square Test of Goodness of Fit and Independence
  5.5 Analysis of Variance
  5.6 Nonparametric Testing

STATISTICS EXAMINATION

CHAPTER 6  Simple Regression Analysis
  6.1 The Two-Variable Linear Model
  6.2 The Ordinary Least-Squares Method
  6.3 Tests of Significance of Parameter Estimates
  6.4 Test of Goodness of Fit and Correlation
  6.5 Properties of Ordinary Least-Squares Estimators

CHAPTER 7  Multiple Regression Analysis
  7.1 The Three-Variable Linear Model
  7.2 Tests of Significance of Parameter Estimates
  7.3 The Coefficient of Multiple Determination
  7.4 Test of the Overall Significance of the Regression
  7.5 Partial-Correlation Coefficients
  7.6 Matrix Notation

CHAPTER 8  Further Techniques and Applications in Regression Analysis
  8.1 Functional Form
  8.2 Dummy Variables
  8.3 Distributed Lag Models
  8.4 Forecasting
  8.5 Binary Choice Models
  8.6 Interpretation of Binary Choice Models

CHAPTER 9  Problems in Regression Analysis
  9.1 Multicollinearity
  9.2 Heteroscedasticity
  9.3 Autocorrelation
  9.4 Errors in Variables

CHAPTER 10  Simultaneous-Equations Methods
  10.1 Simultaneous-Equations Models
  10.2 Identification
  10.3 Estimation: Indirect Least Squares
  10.4 Estimation: Two-Stage Least Squares

CHAPTER 11  Time-Series Methods
  11.1 [illegible]
  11.2 [illegible]
  11.3 [illegible]
  11.4 Testing for Unit Root
  11.5 Cointegration and Error Correction
  11.6 Causality

CHAPTER 12  Computer Applications in Econometrics
  12.1 Data Formats
  12.2 Microsoft Excel
  12.3 Eviews
  12.4 SAS

ECONOMETRICS EXAMINATION

Appendix 1   Binomial Distribution
Appendix 2   Poisson Distribution
Appendix 3   Standard Normal Distribution
Appendix 4   Table of Random Numbers
Appendix 5   Student's t Distribution
Appendix 6   Chi-Square Distribution
Appendix 7   F Distribution
Appendix 8   Durbin-Watson Statistic
Appendix 9   Wilcoxon
Appendix 10  Kolmogorov-Smirnov Critical Values
Appendix 11  ADF Critical Values
Appendix 12  Data Sources on the Web

INDEX

CHAPTER 1
Introduction

1.1 THE NATURE OF STATISTICS

Statistics refers to the collection, presentation, analysis, and utilization of numerical data to make inferences and reach decisions in the face of uncertainty in economics, business, and other social and physical sciences.

Statistics is subdivided into descriptive and inferential. Descriptive statistics is concerned with summarizing and describing a body of data. Inferential statistics is the process of reaching generalizations about the whole (called the population) by examining a portion (called the sample). In order for this to be valid, the sample must be representative of the population and the probability of error also must be specified.

Descriptive statistics is discussed in detail in Chap. 2. This is followed by (the more crucial) statistical inference: Chap. 3 deals with probability, Chap. 4 with estimation, and Chap. 5 with hypothesis testing.

EXAMPLE 1. Suppose that we have data on the incomes of 1000 U.S. families. This body of data can be summarized by finding the average family income and the spread of these family incomes above and below the average. The data also can be described by constructing a table, chart, or graph of the number or proportion of families in each income class. This is descriptive statistics. If these 1000 families are representative of all U.S.
families, we can then estimate and test hypotheses about the average family income in the United States as a whole. Since these conclusions are subject to error, we also would have to indicate the probability of error. This is statistical inference.

1.2 STATISTICS AND ECONOMETRICS

Econometrics refers to the application of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses and estimating and forecasting economic phenomena. Econometrics has become strongly identified with regression analysis. This relates a dependent variable to one or more independent or explanatory variables. Since relationships among economic variables are generally inexact, a disturbance or error term (with well-defined probabilistic properties) must be included (see Prob. 1.8).

Chapters 6 and 7 deal with regression analysis; Chap. 8 extends the basic regression model; Chap. 9 deals with methods of testing and correcting for violations in the assumptions of the basic regression model; and Chaps. 10 and 11 deal with two specific areas of econometrics, specifically simultaneous-equations and time-series methods. Thus Chaps. 1 to 5 deal with the statistics required for econometrics (Chaps. 6 to 11). Chapter 12 is concerned with using the computer to aid in the calculations involved in the previous chapters.

EXAMPLE 2. Consumption theory tells us that, in general, people increase their consumption expenditure $C$ as their disposable (after-tax) income $Y_d$ increases, but not by as much as the increase in their disposable income.
This can be stated in explicit linear equation form as

$C = b_0 + b_1 Y_d$   (1.1)

where $b_0$ and $b_1$ are unknown constants called parameters. The parameter $b_1$ is the slope coefficient representing the marginal propensity to consume (MPC). Since even people with identical disposable income are likely to have somewhat different consumption expenditures, the theoretically exact and deterministic relationship represented by Eq. (1.1) must be modified to include a random disturbance or error term, $u$, making it stochastic:

$C = b_0 + b_1 Y_d + u$   (1.2)

1.3 THE METHODOLOGY OF ECONOMETRICS

Econometric research, in general, involves the following three stages:

1. Specification of the model or maintained hypothesis in explicit stochastic equation form, together with the a priori theoretical expectations about the sign and size of the parameters of the function.
2. Collection of data on the variables of the model and estimation of the coefficients of the function with appropriate econometric techniques (presented in Chaps. 6 to 8).
3. Evaluation of the estimated coefficients of the function on the basis of economic, statistical, and econometric criteria.

EXAMPLE 3. The first stage in econometric research on consumption theory is to state the theory in explicit stochastic equation form, as in Eq. (1.2), with the expectation that $b_0 > 0$ (i.e., at $Y_d = 0$, $C > 0$ as people dissave and/or borrow) and $0 < b_1 < 1$. The second stage involves the collection of data on consumption expenditure and disposable income and estimation of Eq. (1.2). The third stage in econometric research involves (1) checking to see if the estimated value of $b_0 > 0$ and if $0 < b_1 < 1$; (2) determining if a "satisfactory" proportion of the variation in $C$ is "explained" by changes in $Y_d$ and if $b_0$ and $b_1$ are "statistically significant at acceptable levels" [see Prob. 1.13(c) and Sec. 5.2]; and (3) testing to see if the assumptions of the basic regression model are satisfied or,
if not, how to correct for violations. If the estimated relationship does not pass these tests, the hypothesized relationship must be modified and reestimated until a satisfactory estimated consumption relationship is achieved.

Solved Problems

THE NATURE OF STATISTICS

1.1 What is the purpose and function of (a) The field of study of statistics? (b) Descriptive statistics? (c) Inferential statistics?

(a) Statistics is the body of procedures and techniques used to collect, present, and analyze data on which to base decisions in the face of uncertainty or incomplete information. Statistical analysis is used today in practically every profession. The economist uses it to test the efficiency of alternative production techniques; the businessperson may use it to test the product design or package that maximizes sales; the sociologist to analyze the result of a drug rehabilitation program; the industrial psychologist to examine workers' responses to plant environment; the political scientist to forecast voting patterns; the physician to test the effectiveness of a new drug; the chemist to produce cheaper fertilizers; and so on.

(b) Descriptive statistics summarizes a body of data with one or two pieces of information that characterize the whole data. It also refers to the presentation of a body of data in the form of tables, charts, graphs, and other forms of graphic display.

(c) Inferential statistics (both estimation and hypothesis testing) refers to the drawing of generalizations about the properties of the whole (called a population) from the specific or a sample drawn from the population. Inferential statistics thus involves inductive reasoning. (This is to be contrasted with deductive reasoning, which ascribes properties to the specific starting with the whole.)

1.2 (a) Are descriptive or inferential statistics more important today? (b) What is the importance of a representative sample in statistical inference? (c) Why is probability theory required?

(a) Statistics started as a purely descriptive science, but it grew into a powerful tool of decision making as its inferential branch was developed.
Modern statistical analysis refers primarily to inferential or inductive statistics. However, deductive and inductive statistics are complementary. We must study how to generate samples from populations before we can learn to generalize from samples to populations.

(b) In order for statistical inference to be valid, it must be based on a sample that fully reflects the characteristics and properties of the population from which it is drawn. A representative sample is ensured by random sampling, whereby each element of the population has an equal chance of being included in the sample (see Sec. 4.1).

(c) Since the possibility of error exists in statistical inference, estimates or tests of a population property or characteristic are given together with the chance or probability of being wrong. Thus probability theory is an essential element in statistical inference.

1.3 How can the manager of a firm producing lightbulbs summarize and describe to a board meeting the results of testing the life of a sample of 100 lightbulbs produced by the firm?

Providing the (raw) data on the life of each in the sample of 100 lightbulbs produced by the firm would be very inconvenient and time-consuming for the board members to evaluate. Instead, the manager might summarize the data by indicating that the average life of the bulbs tested is 360 h and that 95% of the bulbs tested lasted between 320 and 400 h. By doing this, the manager is providing two pieces of information (the average life and the spread in the average life) that characterize the life of the 100 bulbs tested. The manager also might want to describe the data with a table or chart indicating the number or proportion of bulbs tested that lasted within each 10-h classification. Such a tabular or graphic representation of the data is also very useful for gaining a quick overview of the data. In summarizing and describing the data in the ways indicated, the manager is engaging in descriptive statistics.
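As a rough illustration of the kind of summary described in Prob. 1.3, the following Python sketch computes an average life, a measure of spread, and a frequency table with 10-h classes. The lifetimes are simulated (hypothetical data, not the firm's test results), and the names `lifetimes` and `class_width` are assumptions made only for this example.

```python
import random
import statistics
from collections import Counter

# Hypothetical data: 100 simulated bulb lifetimes in hours (not from the text).
rng = random.Random(0)
lifetimes = [rng.gauss(360, 20) for _ in range(100)]

# Two summary measures: the average life and the spread around it.
mean_life = statistics.mean(lifetimes)
sd_life = statistics.stdev(lifetimes)  # sample standard deviation

# Frequency table with 10-hour classes, keyed by each class's lower bound.
class_width = 10
counts = Counter(int(life // class_width) * class_width for life in lifetimes)

print(f"average life: {mean_life:.1f} h, standard deviation: {sd_life:.1f} h")
for lower in sorted(counts):
    share = counts[lower] / len(lifetimes)
    print(f"{lower}-{lower + class_width} h: {counts[lower]:3d} bulbs ({share:.0%})")
```

The printed table gives both the number and the proportion of bulbs in each class, which are the two forms of tabular summary mentioned in the answer.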
It should be noted that descriptive statistics can be used to summarize and describe any body of data, whether it is a sample (as above) or a population (when all the elements of the population are known and its characteristics can be calculated).

1.4 (a) Why may the manager in Prob. 1.3 want to engage in statistical inference? (b) What would this involve and require?

(a) Quality control requires that the manager have a fairly good idea about the average life and the spread in the life of the lightbulbs produced by the firm. However, testing all the lightbulbs produced would destroy the entire output of the firm. Even when testing does not destroy the product, testing the entire output is usually prohibitively expensive and time-consuming. The usual procedure is to take a sample of the output and infer the properties and characteristics of the entire output (population) from the corresponding characteristics of a sample drawn from the population.

(b) Statistical inference requires first of all that the sample be representative of the population being sampled. If the firm produces lightbulbs in different plants, with more than one workshift, and with raw materials from different suppliers, these must be represented in the sample in the proportion in which they contribute to the total output of the firm. From the average life and spread in the life of the bulbs in the sample, the firm manager might estimate, with 95% probability of being correct and 5% probability of being wrong, the average life of all the lightbulbs produced by the firm to be between 320 and 400 h (see Sec. 4.3). Instead, the manager may use the sample information to test, with 95% probability of being correct and 5% probability of being wrong, that the average life of the population of all the bulbs produced by the firm is greater than 320 h (see Sec. 5.2). In estimating or testing the average for a population from sample information, the manager is engaging in statistical inference.

STATISTICS AND ECONOMETRICS
1.5 What is meant by (a) Econometrics? (b) Regression analysis? (c) Disturbance or error term? (d) Simultaneous-equations models?

(a) Econometrics is the integration of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses about economic phenomena, estimating coefficients of economic relationships, and forecasting or predicting future values of economic variables or phenomena. Econometrics is subdivided into theoretical and applied econometrics. Theoretical econometrics refers to the methods for measurement of economic relationships in general. Applied econometrics examines the problems encountered and the findings in particular fields of economics, such as demand theory, production, investment, consumption, and other fields of applied economic research. In any case, econometrics is partly an art and partly a science, because often the intuition and good judgment of the econometrician play a crucial role.

(b) Regression analysis studies the causal relationship between one economic variable to be explained (the dependent variable) and one or more independent or explanatory variables. When there is only one independent or explanatory variable, we have simple regression. In the more usual case of more than one independent or explanatory variable, we have multiple regression.

(c) A (random) disturbance or error must be included in the exact relationships postulated by economic theory and mathematical economics in order to make them stochastic (i.e., in order to reflect the fact that in the real world, economic relationships among economic variables are inexact and somewhat erratic).

(d) Simultaneous-equations models refer to relationships among economic variables expressed with more than one equation and such that the economic variables in the various equations interact. Simultaneous-equations models are the most complex aspect of econometrics and are discussed in Chap. 10.

1.6 (a) What are the functions of econometrics?
(b) What aspects of econometrics (and other social sciences) make it basically different from most physical sciences?

(a) Econometrics has basically three closely interrelated functions. The first is to test economic theories or hypotheses. For example, is consumption directly related to income? Is the quantity demanded of a commodity inversely related to its price? The second function of econometrics is to provide numerical estimates of the coefficients of economic relationships. These are essential in decision making. For example, a government policymaker needs to have an accurate estimate of the coefficient of the relationship between consumption and income in order to determine the stimulating (i.e., the multiplier) effect of a proposed tax reduction. A manager needs to know if a price reduction increases or reduces the total sales revenues of the firm and, if so, by how much. The third function of econometrics is the forecasting of events. This, too, is necessary in order for policymakers to take appropriate corrective action if the rate of unemployment or inflation is predicted to rise in the future.

(b) There are two basic differences between econometrics (and other social sciences) on one hand, and most physical sciences (such as physics) on the other. One is that (as pointed out earlier) relationships among economic variables are inexact and somewhat erratic. The second is that most economic phenomena occur contemporaneously, so that laboratory experiments cannot be conducted. These differences require special methods of analysis (such as the inclusion of a disturbance or error term with the exact relationships postulated by economic theory)
and multivariate analysis (such as multiple regression analysis). The latter isolates the effect of each independent or explanatory variable on the dependent variable in the face of contemporaneous change in all explanatory variables.

1.7 In what way and for what purpose are (a) economic theory, (b) mathematics, and (c) statistical analysis combined to form the field of study of econometrics?

(a) Econometrics presupposes the existence of a body of economic theories or hypotheses requiring testing. If the variables suggested by economic theory do not provide a satisfactory explanation, the researcher may experiment with alternative formulations and variables suggested by previous tests or opposing theories. In this way, econometric research can lead to the acceptance, rejection, and reformulation of economic theories.

(b) Mathematics is used to express the verbal statements of economic theories in mathematical form, expressing an exact or deterministic functional relationship between the dependent and one or more independent or explanatory variables.

(c) Statistical analysis applies appropriate techniques to estimate the inexact and nonexperimental relationships among economic variables by utilizing relevant economic data and evaluating the results.

1.8 What justifies the inclusion of a disturbance or error term in regression analysis?

The inclusion of a (random) disturbance or error term (with well-defined probabilistic properties) is required in regression analysis for three important reasons. First, since the purpose of theory is to generalize and simplify, economic relationships usually include only the most important forces at work. This means that numerous other variables with slight and irregular effects are not included. The error term can be viewed as representing the net effect of this large number of small and irregular forces at work.
Second, the inclusion of the error term can be justified in order to take into consideration the net effect of possible errors in measuring the dependent variable, or variable being explained. Finally, since human behavior usually differs in a random way under identical circumstances, the disturbance or error term can be used to capture this inherently random human behavior. This error term thus allows for individual random deviations from the exact and deterministic relationships postulated by economic theory and mathematical economics.

1.9 Consumer demand theory states that the quantity demanded of a commodity $D_X$ is a function of, or depends on, its price $P_X$, consumer's income $Y$, and the price of other (related) commodities, say, commodity Z (i.e., $P_Z$). Assuming that consumers' tastes remain constant during the period of analysis, state the preceding theory in (a) specific or explicit linear form or equation and (b) in stochastic form. (c) Which are the coefficients to be estimated? What are they called?

(a) $D_X = b_0 + b_1 P_X + b_2 Y + b_3 P_Z$   (1.3)

(b) $D_X = b_0 + b_1 P_X + b_2 Y + b_3 P_Z + u$   (1.4)

(c) The coefficients to be estimated are $b_0$, $b_1$, $b_2$, and $b_3$. They are called parameters.

THE METHODOLOGY OF ECONOMETRICS

1.10 With reference to the consumer demand theory in Prob. 1.9, indicate (a) what the first step is in econometric research and (b) what the a priori theoretical expectations are of the sign and possible size of the parameters of the demand function given by Eq. (1.4).

(a) The first step in econometric analysis is to express the theory of consumer demand in stochastic equation form, as in Eq. (1.4), and indicate the a priori theoretical expectations about the sign and possibly the size of the parameters of the function.

(b) Consumer demand theory postulates that in Eq.
(1.4), $b_1 < 0$ (indicating that price and quantity are inversely related), $b_2 > 0$ if the commodity is a normal good (indicating that consumers purchase more of the commodity at higher incomes), $b_3 > 0$ if X and Z are substitutes, and $b_3 < 0$ if X and Z are complements.

1.11 Indicate the second stage in econometric research (a) in general and (b) with reference to the demand function specified by Eq. (1.4).

(a) The second stage in econometric research involves the collection of data on the dependent variable and on each of the independent or explanatory variables of the model and utilizing these data for the empirical estimation of the parameters of the model. This is usually done with multiple regression analysis (discussed in Chap. 7).

(b) In order to estimate the demand function given by Eq. (1.4), data must be collected on (1) the quantity demanded of commodity X by consumers, (2) the price of X, (3) consumers' incomes, and (4) the price of commodity Z per unit of time (i.e., per day, month, or year) and over a number of days, months, or years. Data on $D_X$ are then regressed against data on $P_X$, $Y$, and $P_Z$, and estimates of parameters $b_0$, $b_1$, $b_2$, and $b_3$ are obtained.

1.12 How does the type of data required to estimate the demand function specified by Eq. (1.4) differ from the type of data that would be required to estimate the consumption function for a group of families at one point in time?

In order to estimate the demand function given by Eq. (1.4), numerical values of the variables are required over a period of time. For example, if we want to estimate the demand function for coffee, we need the numerical value of the quantity of coffee demanded, say, per year, over a number of years, say, from 1960 to 1980. Similarly, we need data on the average price of coffee, consumers' income, and the price of, say, tea (a substitute for coffee) per year from 1960 to 1980. Data that give numerical values for the variables of a function from period to period are called time-series data.
However, to estimate the consumption function for a group of families at one point in time, we need cross-sectional data (i.e., numerical values for the consumption expenditures and disposable incomes of each family in the group at a particular point in time, say, in 1982).

1.13 What is meant by (a) The third stage in econometric analysis? (b) A priori theoretical criteria? (c) Statistical criteria? (d) Econometric criteria? (e) The forecasting ability of the model?

(a) The third stage in econometric analysis refers to the evaluation of the estimated model on the basis of the a priori theoretical criteria, the statistical criteria, the econometric criteria, and the forecasting ability of the model.

(b) The a priori economic criteria refer to the sign and size of the parameters of the model postulated by economic theory. If the estimated coefficients do not conform to those postulated, the model must be revised or rejected.

(c) The statistical criteria refer to (1) the proportion of variation in the dependent variable "explained" by changes in the independent or explanatory variables and (2) verification that the dispersion or spread of each estimated coefficient around the true parameter is sufficiently narrow to give us "confidence" in the estimates.

(d) The econometric criteria refer to tests that the assumptions of the basic regression model, and particularly those about the disturbance or error term, are satisfied.

[...]

1.14 [...]

(a) [...] $b_1 < 0$, $b_2 > 0$ (if X is a normal good), and $b_3 > 0$ (if Z is a substitute for X), as postulated by demand theory.

(b) The statistical criteria are satisfied only if a "high" proportion of the variation in $D_X$ over time is "explained" by changes in $P_X$, $Y$, and $P_Z$, and if the dispersion of estimated $b_1$, $b_2$, and $b_3$ around the true parameters is "sufficiently narrow." There is no generally accepted answer as to what is a "high" proportion of the variation in $D_X$ "explained" by $P_X$, $Y$, and $P_Z$. However, because of common trends in time-series data, we would expect more than 50 to 70% of the variation in the dependent variable to be explained by the independent or explanatory variables for the model to be judged satisfactory.
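The a priori sign check and the "proportion of variation explained" criterion can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data are simulated (hypothetical, not from the text), and for brevity the demand function is reduced to a single explanatory variable, price, rather than the full Eq. (1.4), which would call for multiple regression (Chap. 7).

```python
import random

# Hypothetical yearly data (not from the text): price and quantity demanded.
rng = random.Random(1)
prices = [1.0 + 0.1 * t for t in range(21)]
quantities = [50.0 - 8.0 * p + rng.gauss(0, 1.5) for p in prices]

n = len(prices)
mean_p = sum(prices) / n
mean_q = sum(quantities) / n

# Ordinary least-squares estimates for the simplified model D = b0 + b1*P + u.
s_pq = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
s_pp = sum((p - mean_p) ** 2 for p in prices)
b1 = s_pq / s_pp
b0 = mean_q - b1 * mean_p

# Proportion of the variation in D "explained" by P (R squared).
fitted = [b0 + b1 * p for p in prices]
ss_total = sum((q - mean_q) ** 2 for q in quantities)
ss_resid = sum((q - f) ** 2 for q, f in zip(quantities, fitted))
r_squared = 1 - ss_resid / ss_total

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, R^2 = {r_squared:.2f}")
print("a priori sign criterion (b1 < 0) met:", b1 < 0)
```

With real time-series data, the same two checks would be applied to the estimated coefficients and the explained variation of the full demand function.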
Similarly, in order for each estimated coefficient to be "statistically significant," we would expect the dispersion of each estimated coefficient about the true parameter (measured by its standard deviation; see Sec. 2.3) to be generally less than half the estimated value of the coefficient.

(c) The econometric criteria are used to determine if the assumptions of the econometric methods used are satisfied in the estimation of the demand function of Eq. (1.4). Only if these assumptions are satisfied will the estimated coefficients have the desirable properties of unbiasedness, consistency, efficiency, and so forth (see Sec. 6.4).

(d) One way to test the forecasting ability of the demand model given by Eq. (1.4) is to use the estimated function to predict the value of $D_X$ for a period not included in the sample and to check that this predicted value is "sufficiently close" to the actual observed value of $D_X$ for that period.

1.15 Stages of econometric research (schematic):

Stage 1: Mathematical model -> Econometric (stochastic) model
Stage 2: Collection of appropriate data -> Estimation of the parameters of the model
Stage 3: Evaluation of the model on the basis of economic, statistical, and econometric criteria

Outcomes of Stage 3:
  Accept theory if compatible with data -> Prediction
  Reject theory if incompatible with data
  Revise theory if incompatible with data -> Confrontation of revised theory with new data

Supplementary Problems

THE NATURE OF STATISTICS

1.16 (a) In which field of study is statistical analysis important? (b) What are the most important functions of descriptive statistics? (c) What is the most important function of inferential statistics?
Ans. (a) In economics, business, and other social and physical sciences (b) Summarizing and describing a body of data (c) Drawing inferences about the characteristics of a population from the corresponding characteristics of a sample drawn from the population

1.17 (a) Is statistical inference associated with deductive or inductive reasoning?
(b) What are the conditions required in order for statistical inference to be valid?
Ans. (a) Inductive reasoning (b) A representative sample and probability theory

STATISTICS AND ECONOMETRICS

1.18 Express in the form of an explicit linear equation the statement that the level of investment spending \(I\) is inversely related to the rate of interest \(R\).
Ans. \(I = b_0 + b_1 R\) with \(b_1\) postulated to be negative (1.5)

1.19 What is the answer to Prob. 1.18 an example of?
Ans. An economic theory expressed in (exact or deterministic) mathematical form

1.20 Express Eq. (1.5) in stochastic form.
Ans. \(I = b_0 + b_1 R + u\) (1.6)

1.21 Why is a stochastic form required in econometric analysis?
Ans. Because the relationships among economic variables are inexact and somewhat erratic, as opposed to the exact and deterministic relationships postulated by economic theory and mathematical economics

THE METHODOLOGY OF ECONOMETRICS

1.22 What are stages (a) one, (b) two, and (c) three in econometric research?
Ans. (a) Specification of the theory in stochastic equation form and indication of the expected signs and possible sizes of estimated parameters (b) Collection of data on the variables of the model and estimation of the coefficients of the function (c) Economic, statistical, and econometric evaluation of the estimated parameters

1.23 What is the first stage of econometric analysis for the investment theory in Prob. 1.18?
Ans. Stating the theory in the form of Eq. (1.6) and predicting \(b_1 < 0\)

1.24 What is the second stage in econometric analysis for the investment theory in Prob. 1.18?
Ans. Collection of time-series data on \(I\) and \(R\) and estimation of Eq. (1.6)

1.25 What is the third stage of econometric analysis for the investment theory in Prob. 1.18?
Ans. Determination that the estimated coefficient \(b_1 < 0\), that an "adequate" proportion of the variation in \(I\) over time is "explained" by changes in \(R\), that \(b_1\) is "statistically significant at customary levels," and that the econometric assumptions of the model are satisfied

Descriptive Statistics

2.1 FREQUENCY DISTRIBUTIONS

It is often useful to organize or arrange a body of data into a frequency distribution. This breaks up the data into groups or classes and shows the number of observations in each class. The number of classes is usually between 5 and 15. A relative frequency distribution is obtained by dividing the number of observations in each class by the total number of observations in the data. The sum of the relative frequencies equals 1. A histogram is a bar graph of a frequency distribution, where classes are measured along the horizontal axis and frequencies along the vertical axis. A frequency polygon is a line graph of a frequency distribution resulting from joining the frequency of each class plotted at the class midpoint. A cumulative frequency distribution shows, for each class, the total number of observations in all classes up to and including that class. When plotted, this gives a distribution curve, or ogive.

EXAMPLE 1. A student received the following grades (measured from 0 to 10) on the 10 quizzes he took during a semester: 6, 7, 6, 8, 5, 7, 6, 9, 10, and 6. These grades can be arranged into frequency distributions as in Table 2.1 and shown graphically as in Fig. 2-1.

Table 2.1 Frequency Distributions of Grades

Grades   Absolute Frequency   Relative Frequency
5        1                    0.1
6        4                    0.4
7        2                    0.2
8        1                    0.1
9        1                    0.1
10       1                    0.1
Total    10                   1.0

[Fig. 2-1]

EXAMPLE 2. The cans in a sample of 20 cans of fruit contain net weights of fruit ranging from 19.3 to 20.9 oz, as given in Table 2.2. If we want to group these data into 6 classes, we get class intervals of 0.3 oz [(21.0 - 19.2)/6 = 0.3 oz]. The weights given in Table 2.2 can be arranged into the frequency distributions given in Table 2.3 and shown graphically in Fig.
2-2.

Table 2.2 Net Weight in Ounces of Fruit

[...]

Table 2.3 Frequency Distribution of Weights

Weight, oz    Absolute Frequency   Relative Frequency   Cumulative Frequency
19.2-19.4     1                    0.05                 1
19.5-19.7     2                    0.10                 3
19.8-20.0     8                    0.40                 11
20.1-20.3     4                    0.20                 15
20.4-20.6     3                    0.15                 18
20.7-20.9     2                    0.10                 20
Total         20                   1.00

[Fig. 2-2. Panel A: Histogram and relative-frequency histogram; Panel B: Frequency polygon]

2.2 MEASURES OF CENTRAL TENDENCY

Central tendency refers to the location of a distribution. The most important measures of central tendency are (1) the mean, (2) the median, and (3) the mode. We will be measuring these for populations (i.e., the collection of all the elements that we are describing) and for samples drawn from populations, as well as for grouped and ungrouped data.

1. The arithmetic mean, or average, of a population is represented by \(\mu\) (the Greek letter mu) and, for a sample, by \(\bar{X}\) (read "X bar"). For ungrouped data, \(\mu\) and \(\bar{X}\) are calculated by the following formulas:

\[ \mu = \frac{\sum X}{N} \quad \text{and} \quad \bar{X} = \frac{\sum X}{n} \tag{2.1a,b} \]

where \(\sum X\) refers to the sum of all the observations, while \(N\) and \(n\) refer to the number of observations in the population and sample, respectively. For grouped data, \(\mu\) and \(\bar{X}\) are calculated by

\[ \mu = \frac{\sum fX}{N} \quad \text{and} \quad \bar{X} = \frac{\sum fX}{n} \tag{2.2a,b} \]

where \(\sum fX\) refers to the sum of the frequency of each class \(f\) times the class midpoint \(X\).

2. The median for ungrouped data is the value of the middle item when all the items are arranged in either ascending or descending order in terms of values:

\[ \text{Median} = \text{the } \left(\frac{N+1}{2}\right)\text{th item in the data array} \tag{2.3} \]

where \(N\) refers to the number of items in the population (\(n\) for a sample). The median for grouped data is given by the formula

\[ \text{Median} = L + \frac{n/2 - F}{f_m}\, c \tag{2.4} \]

where \(L\) = lower limit of the median class (i.e., the class that contains the middle item of the distribution)
\(n\) = the number of observations in the data set
\(F\) = sum of the frequencies up to but not including the median class
\(f_m\) = frequency of the median class
\(c\) = width of the class interval

3. The mode is the value that occurs most frequently in the data set. For grouped data, we obtain

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\, c \tag{2.5} \]

where \(L\)
= lower limit of the modal class (i.e., the class with the greatest frequency)
\(d_1\) = frequency of the modal class minus the frequency of the previous class
\(d_2\) = frequency of the modal class minus the frequency of the following class
\(c\) = width of the class interval

The mean is the most commonly used measure of central tendency. The mean, however, is affected by extreme values in the data set, while the median and the mode are not. Other measures of central tendency are the weighted mean, the geometric mean, and the harmonic mean (see Probs. 2.7 to 2.9).

EXAMPLE 3. The mean grade for the population on the 10 quizzes given in Example 1, using the formula for ungrouped data, is

\[ \mu = \frac{\sum X}{N} = \frac{6+7+6+8+5+7+6+9+10+6}{10} = \frac{70}{10} = 7 \]

To find the median for the ungrouped data, we first arrange the 10 grades in ascending order: 5, 6, 6, 6, 6, 7, 7, 8, 9, 10. Then we find the grade of the \((N+1)/2\) or \((10+1)/2 = 5.5\)th item. Thus the median is the average of the 5th and 6th items in the array, or \((6+7)/2 = 6.5\). The mode for the ungrouped data is 6 (the value that occurs most frequently in the data set).

EXAMPLE 4. We can estimate the mean for the grouped data given in Table 2.3 with the aid of Table 2.4:

\[ \bar{X} = \frac{\sum fX}{n} = \frac{401.6}{20} = 20.08 \text{ oz} \]

This calculation could be simplified by coding (see Prob. 2.6).

Table 2.4 Calculation of the Sample Mean for the Data in Table 2.3

Weight, oz    Class Midpoint X   Frequency f   fX
19.2-19.4     19.3               1             19.3
19.5-19.7     19.6               2             39.2
19.8-20.0     19.9               8             159.2
20.1-20.3     20.2               4             80.8
20.4-20.6     20.5               3             61.5
20.7-20.9     20.8               2             41.6
                                 n = 20        Sum fX = 401.6

\[ \text{Median} = L + \frac{n/2 - F}{f_m}\, c = 19.8 + \frac{20/2 - 3}{8}(0.3) = 19.8 + 0.2625 \approx 20.06 \text{ oz} \]

where \(L\) = 19.8 = lower limit of the median class (i.e., the 19.8-20.0 class, which contains the 10th and 11th observations)
\(n\) = 20 = number of observations or items
\(F\) = 3 = sum of frequencies up to but not including the median class
\(f_m\) = 8 = frequency of the median class
\(c\) = 0.3 = width of class interval

Similarly,

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\, c = 19.8 + \frac{6}{6+4}(0.3) = 19.8 + 0.18 = 19.98 \text{ oz} \]

As noted in Prob.
2.4, the mean, median, and mode for grouped data are estimates used when only the grouped data are available or to reduce calculations with a large ungrouped data set.

2.3 MEASURES OF DISPERSION

Dispersion refers to the variability or spread in the data. The most important measures of dispersion are (1) the average deviation, (2) the variance, and (3) the standard deviation. We will measure these for populations and samples, as well as for grouped and ungrouped data.

1. Average deviation. The average deviation (AD), also called the mean absolute deviation (MAD), is given by

\[ AD = \frac{\sum |X - \mu|}{N} \quad \text{for populations} \tag{2.6a} \]
\[ AD = \frac{\sum |X - \bar{X}|}{n} \quad \text{for samples} \tag{2.6b} \]

where the two vertical bars indicate the absolute value, or the values omitting the sign, with the other symbols having the same meaning as in Sec. 2.2. For grouped data

\[ AD = \frac{\sum f|X - \mu|}{N} \quad \text{for populations} \tag{2.7a} \]
\[ AD = \frac{\sum f|X - \bar{X}|}{n} \quad \text{for samples} \tag{2.7b} \]

where \(f\) refers to the frequency of each class and \(X\) to the class midpoints.

2. Variance. The population variance \(\sigma^2\) (the Greek letter sigma squared) and the sample variance \(s^2\) for ungrouped data are given by

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \tag{2.8a,b} \]

For grouped data

\[ \sigma^2 = \frac{\sum f(X - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} \tag{2.9a,b} \]

3. Standard deviation. The population standard deviation \(\sigma\) and sample standard deviation \(s\) are the positive square roots of their respective variances. For ungrouped data

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \tag{2.10a,b} \]

For grouped data

\[ \sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} \tag{2.11a,b} \]

The most widely used measure of (absolute) dispersion is the standard deviation. Other measures (besides the variance and average deviation) are the range, the interquartile range, and the quartile deviation (see Probs. 2.11 and 2.12).

4. The coefficient of variation (\(V\)) measures relative dispersion:

\[ V = \frac{\sigma}{\mu} \quad \text{for populations} \tag{2.12a} \]
\[ V = \frac{s}{\bar{X}} \quad \text{for samples} \tag{2.12b} \]

EXAMPLE 5. The average deviation, variance, standard deviation, and coefficient of variation for the ungrouped data given in Example 1 can be found with the aid of Table 2.5 (\(\mu = 7\); see Example 3):
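Table 2.5 Calculations on the Data in Example 1

Grade \(X\)   Mean \(\mu\)   \(X - \mu\)   \(|X - \mu|\)   \((X - \mu)^2\)
6             7              -1            1               1
7             7               0            0               0
6             7              -1            1               1
8             7               1            1               1
5             7              -2            2               4
7             7               0            0               0
6             7              -1            1               1
9             7               2            2               4
10            7               3            3               9
6             7              -1            1               1
Sums                          0            12              22

\[ AD = \frac{\sum |X - \mu|}{N} = \frac{12}{10} = 1.2 \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{22}{10} = 2.2 \]
\[ \sigma = \sqrt{2.2} \approx 1.48 \qquad V = \frac{\sigma}{\mu} = \frac{1.48}{7} \approx 0.21, \text{ or } 21\% \]

EXAMPLE 6. The average deviation, variance, standard deviation, and coefficient of variation for the frequency distribution of weights (grouped data) given in Table 2.3 can be found with the aid of Table 2.6 (\(\bar{X} = 20.08\) oz; see Example 4):

\[ AD = \frac{\sum f|X - \bar{X}|}{n} = \frac{6.36}{20} = 0.318 \text{ oz} \]
\[ s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1} = \frac{2.9520}{19} \approx 0.1554 \text{ oz squared} \]
\[ s = \sqrt{0.1554} \approx 0.3942 \text{ oz} \qquad V = \frac{s}{\bar{X}} = \frac{0.3942 \text{ oz}}{20.08 \text{ oz}} \approx 0.0196, \text{ or } 1.96\% \]

Note that in the formula for \(s^2\) and \(s\), \(n - 1\) rather than \(n\) is used in the denominator (see Prob. 2.16 for the reason). The formulas for the variance and standard deviation given in this section can be simplified to reduce the calculations for a large body of data (see Probs. 2.17 to 2.19 for their derivation and application).

Table 2.6 Calculations on the Data in Table 2.4

Weight, oz   Midpoint \(X\)   \(f\)   \(\bar{X}\)   \(X - \bar{X}\)   \(|X - \bar{X}|\)   \(f|X - \bar{X}|\)   \((X - \bar{X})^2\)   \(f(X - \bar{X})^2\)
19.2-19.4    19.3             1       20.08         -0.78             0.78                0.78                 0.6084                0.6084
19.5-19.7    19.6             2       20.08         -0.48             0.48                0.96                 0.2304                0.4608
19.8-20.0    19.9             8       20.08         -0.18             0.18                1.44                 0.0324                0.2592
20.1-20.3    20.2             4       20.08          0.12             0.12                0.48                 0.0144                0.0576
20.4-20.6    20.5             3       20.08          0.42             0.42                1.26                 0.1764                0.5292
20.7-20.9    20.8             2       20.08          0.72             0.72                1.44                 0.5184                1.0368
Sums                          20                                                          6.36                                       2.9520

2.4 SHAPE OF FREQUENCY DISTRIBUTIONS

The shape of a distribution refers to (1) its symmetry or lack of it (skewness) and (2) its peakedness (kurtosis).

1. Skewness. A distribution has zero skewness if it is symmetrical about its mean. For a symmetrical (unimodal) distribution, the mean, median, and mode are equal. A distribution is positively skewed if the right tail is longer. Then, mean > median > mode. A distribution is negatively skewed if the left tail is longer. Then, mode > median > mean (see Fig. 2-3).

[Fig. 2-3. Panel A: Symmetrical; Panel B: Positively skewed; Panel C: Negatively skewed]

Skewness can be measured by the Pearson coefficient of skewness:

\[ Sk = \frac{3(\mu - \text{med})}{\sigma} \quad \text{for populations} \tag{2.13a} \]
\[ Sk = \frac{3(\bar{X} - \text{med})}{s} \quad \text{for samples} \tag{2.13b} \]

Mean and variance are the first and second moments of a distribution, respectively. Skewness can also be measured by the third moment [the numerator of Eq.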
(2.14a.b)] divided by the cube of the standard deviation: sea ZL or popattons (2 and SELEY compte eum For symmetric distributions, Sk = 0. 2 Kurtosts, A peaked curve is called leprolerric, as opposed to a flat one (plarykurric), relative te fone that is mesokurtic sce Fig. 2-4). Kurtosis can be measured by the fouth emament [the numerator of Eg. (2.154.01] divided by the standard deviation raised to the fourth power. The kurtosis for a mesokurtic curve is 3. Lepeokutic Meese 16 DESCRIPTIVE STATISTICS Eset for populations (2.154) : ana E LUT por sampes 2.090) 3. Joint moment, The comovement of two separate distributions can be measured by covariance: er Tyr -F) year) N N E(Y- WF) Ey) eo(¥, 1) — XY for populations cov(¥ Y= YF for samples ‘A positive covariance indicates that 1" and ¥ move together in rel negative covariance Indicates that they move In opposlie directions. jon to their means. A EMAMPLE 7, We cin fl the Poissons coslfict of keane fu the grinds givin Cosas 1 Ry nag ye 5 (see Example 3), and o = 18 (se Example 5): Heal 6a sie 2 Similarly, by using V = 20.08 o2, med = 2kox sce Example 4), and Pearson coefficient of skewness for the frequency distribution of weights 347 — med) _ 30 Sk 239. (see Example 6), we can fod the Table 2.3 as follows: Sk= 28015 toe Fi Le), For kurtosis, see Prob, 223, Solved Problems FREQUENCY DISTRIBUTIONS: ZL Table 2.7 gives the grades on a quis for a cass of 40 students, (a) Arrange these grades éraw data set) into an array from the lowest grade to the highest grade. (B) Construct a table showing class Intervals and class midpolats and the atsolute, ratlve, and cumulative frequencles for each grade, (@) Present the data in the form of a histogram. relative-frequency histogram, frequency polygon, and ogive, Taille 2.7 Grades on u Quite for x Class of 40 Statens (a) See Table 28. Table 28 Data Array of Grades > 2 2s 3 3 @ @ @ @ 4 5 5 5 § 5 6 6 6 6 Boe FF a 8 os 8 8 9 9 9 9 wo cuar. 
2) DESCRIPTIVE STATISTICS " () See Table 2:9 Note that sinos we ars dealing here with discrete data (is. data expressed in whole snambere), we weed the actual grades ae the clues misdpoints. ‘Table 29 Frequency Distribution of Grades ‘Class Absolute | Relative Grade | Midpomt | Frequency | Frequency isa 2 3 ons > 2sh4 3 3 aus 8 asada 4 3 0133 a asa 5 5 as 6 5564 6 6 4.150 2 674 1 8 200 ” 1884 8 4 100 ™ 808 9 4 4.100 8 9s 10 uso 40 10 (0) See Fig. 25. Panel A: Mistogears Panel B: Relative Frequency Dicribution Fegan Relate fregeney : , ale we + Be i i i Panel ©: Frequency polbigon i ? i Gendee 2 DESCRIPTIVE STATISTICS lomar. 2 A sample of 25 workers in plant receive the hourly wages given in Table 2.10, a) Arrange thet caw data into ai aivay fiom the lowest to the highest wage, (2) Group the ata isto classes. (o} Present the data in the form of a histogram, relative-frequency histogram, frequency polygon, aud ogive. Table £10 Hoarly Wages is Dottars TAS M7e O8F 998 400 410 435 RSS ORE nme Sad 390 426 378 39S gOS ame 41S 380 ans 388 393 40d 4a dos (See Table 2.11 300 68 7S 378 380 SRS BAS ORAS 395 398 198 3.96 400 405 ans 405 406 48 40 413 48 42S 4.26 (@)Thshourly woges in Table 2.10 range from $3.55 to $4.25, This can the conveniently subslvided imio ® cqwal classes of $0.10 cach. ‘That is, {$8.30 ~ £3.50]/8 = 8080/8 = 80.1, Note that the range was extended from 3,50 to $4.30 s9 thatthe lowest wags, $3.55, falls win the lowest cass and the largest wage, $4.26, falls widhiv the largest class. 
Tt is also convenient (and needed for Plotting the frequency polygon} to find the class mark or midpoint of each class These are shou in Table 2 ‘Table 212 Froqucacy Distribution of Wages [Hourly Wage] Class ‘Absolute | Relative | Cumulative 5 ‘Mispoint, $ | Frequency | Froqueney | Frequcr ‘na sa) 360) 3.69 3 o.08 370-3.79 o.00 330 3,89 a0 4004.09 om 410-419 an 420-429 uns Loo) (6) See Figs 26 Armley of ttn thesrsive slo plo the eunvulalve esa wpe 53.595, 3.695,, 3.795, and so on (so asto include the upper limit of each class). The-values 53.595, 3.695, 3795, etc. are ‘often refern 10.45 the clase hoamdaricyaresct its, Moke hat the clays midints are obtained by ‘adding together the lower and upp class houndarizsaand divideng by 2. Forename, the second class smicposnt se goven by’ (3.508 4 3.688)/2 — 7.2002 — 3.65 (nee Table 2.12). cuar, DESCRIPTIVE STATISTICS ro Panel Ac Hisgram Pan Neate rogue sention gE : 1 fe =. ‘ gon ass = 3 olay Precl De Ogre MEASURES OF CENTRAL TENDENCY 24 Find the mean, median, and mode (a) for the grades om the quiz for the class of 40 students given in Table 2.7 (the ungrouped data) and (6) for the grouped data of these grades given in Table 29, (a) Since we are dealing with aif grades, we want the population smear DN TES 46445 MO ey x cr “ay = SPH ‘That ix, jb obtained by adding together all the 40 grades given in Table 27 and dividing by 40 [the three centered dats flips) were pat i 19 sNoid repeating the 40 values in Table 2.7] ‘The median i siren by the values of the [(W 4 1)/2th tem in the data array in Table 28 Therefore, the median ix the vale of the (40-4 1)/3 oF 20.5th, oF the average ofthe 20th and 2Ist item. Since they are both qual tn 6, the metinn is, The mind is 7 (the vale that qssare mot frequently in the ata set) (6) We can find the paputarian mean for the grouped data in Table 2.9 with the aid of Table 2.13 This isthe some mean we found for the ungronped data, Note that the som of the frequencies, $f. 
equals the number of observations in the population, \(N\), and \(\sum fX = \sum X\). The median for the grouped data of Table 2.13 is given by

\[ \text{Median} = L + \frac{n/2 - F}{f_m}\, c = 5.5 + \frac{40/2 - 16}{6}(1) = 5.5 + 0.67 = 6.17 \]

where \(L\) = 5.5 = lower limit of the median class (i.e., the 5.5-6.4 class, which contains the 20th and 21st observations)
\(n\) = 40 = number of observations
\(F\) = 16 = sum of observations up to but not including the median class
\(f_m\) = 6 = frequency of the median class
\(c\) = 1 = width of class interval

The mode for the grouped data in Table 2.13 is given by

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\, c = 6.5 + \frac{2}{2+4}(1) = 6.5 + 0.33 = 6.83 \]

where \(L\) = 6.5 = lower limit of the modal class (i.e., the 6.5-7.4 class with the highest frequency of 8)
\(d_1\) = 2 = frequency of the modal class, 8, minus the frequency of the previous class, 6
\(d_2\) = 4 = frequency of the modal class, 8, minus the frequency of the following class, 4
\(c\) = 1 = width of the class interval

Note that while the mean calculated from the grouped data is in this case identical to the mean calculated for the ungrouped data, the median and the mode are only (good) approximations.

Table 2.13 Calculation of the Population Mean for the Grouped
Data in Table 2.9

Grade      Class Midpoint \(X\)   Frequency \(f\)   \(fX\)
1.5-2.4    2                      3                 6
2.5-3.4    3                      3                 9
3.5-4.4    4                      5                 20
4.5-5.4    5                      5                 25
5.5-6.4    6                      6                 36
6.5-7.4    7                      8                 56
7.5-8.4    8                      4                 32
8.5-9.4    9                      4                 36
9.5-10.4   10                     2                 20
                                  N = 40            Sum fX = 240

2.4 Find the mean, median, and mode (a) for the sample of hourly wages received by the 25 workers recorded in Table 2.10 (the ungrouped data) and (b) for the grouped data of these wages given in Table 2.12.

(a) \[ \bar{X} = \frac{\sum X}{n} = \frac{\$98.65}{25} = \$3.946, \text{ or } \$3.95 \]

Median = \$3.95 (the value of the \((n+1)/2 = (25+1)/2 = 13\)th item in the data array in Table 2.11).
Mode = \$3.95 and \$4.05, since there are three of each of these wages. Thus the distribution is bimodal (i.e., it has two modes).

(b) We can find the sample mean for the grouped data in Table 2.12 with the aid of Table 2.14:

\[ \bar{X} = \frac{\sum fX}{n} = \frac{\$98.75}{25} = \$3.95 \]

Note that in this case \(\sum fX = 98.75 \neq \sum X = 98.65\) (found in part a), since the average of the observations in each class is not equal to the class midpoint for all classes [as it was in Prob. 2.3(b)]. Thus \(\bar{X}\) calculated from the grouped data is only a very good approximation for the true value of \(\bar{X}\) calculated for the ungrouped data. In the real world, we often have only the grouped data, or if we have a very large body of ungrouped data, it will save on calculations to estimate the mean by first grouping the data.

\[ \text{Median} = L + \frac{n/2 - F}{f_m}\, c = \$3.90 + \frac{25/2 - 9}{5}(\$0.10) = \$3.90 + \$0.07 = \$3.97 \]

as compared with the true median of \$3.95 found from the ungrouped data (see part a).

\[ \text{Mode} = L + \frac{d_1}{d_1 + d_2}\, c = \$4.00 + \frac{1}{1+3}(\$0.10) = \$4.00 + \$0.025 = \$4.025, \text{ or } \$4.03 \]

as compared with the true modes of \$3.95 and \$4.05 found from the ungrouped data (see part a). Sometimes the mode is simply given as the midpoint of the modal class.

Table 2.14 Calculation of the Sample Mean for the Grouped Data in Table 2.12

Hourly Wage, $   Class Midpoint \(X\), $   Frequency \(f\)   \(fX\)
3.50-3.59        3.55                      1                 3.55
3.60-3.69        3.65                      2                 7.30
3.70-3.79        3.75                      2                 7.50
3.80-3.89        3.85                      4                 15.40
3.90-3.99        3.95                      5                 19.75
4.00-4.09        4.05                      6                 24.30
4.10-4.19        4.15                      3                 12.45
4.20-4.29        4.25                      2                 8.50
                                           n = 25            Sum fX = 98.75

2.5 Compare the advantages and disadvantages of (a) the mean, (b) the median, and (c) the mode as measures of central tendency.

(a) The advantages of the mean are (1) it is familiar and understood by virtually everyone, (2) all the observations in the data are taken into account, and (3) it is used in performing many other statistical procedures and tests.
The disadvantages of the mean are (1) it is affected by extreme values, (2) it is time-consuming to compute for a large body of ungrouped data, and (3) it cannot be calculated when the last class of grouped data is open-ended (i.e., it includes the lower limit of the last class "and over").

(b) The advantages of the median are (1) it is not affected by extreme values, (2) it is easily understood (i.e., half the data are smaller than the median and half are greater), and (3) it can be calculated even when the last class is open-ended and when the data are qualitative rather than quantitative. The disadvantages of the median are (1) it does not use much of the information available, and (2) it requires that observations be arranged into an array, which is time-consuming for a large body of ungrouped data.

(c) The advantages of the mode are the same as those for the median. The disadvantages of the mode are (1) as for the median, the mode does not use much of the information available, and (2) sometimes no value of the data is repeated more than once, so that there is no mode, while at other times there may be many modes. In general, the mean is the most frequently used measure of central tendency and the mode is the least used.

2.6 Find the mean for the grouped data in Table 2.12 by coding [i.e., by assigning the value of \(u = 0\) to the 5th class, \(u = -1\), \(u = -2\), etc. to each lower class, and \(u = 1\), \(u = 2\), etc. to each larger class, and then using the formula

\[ \bar{X} = X_0 + \frac{\sum fu}{n}\, c \tag{2.16} \]

where \(X_0\) is the midpoint of the class assigned \(u = 0\) and \(c\) is the width of the class intervals].

See Table 2.15.

Table 2.15 Calculation of the Sample Mean by Coding for the Grouped Data in Table 2.12

Hourly Wage, $   Class Midpoint \(X\), $   Code \(u\)   Frequency \(f\)   \(fu\)
3.50-3.59        3.55                      -4           1                 -4
3.60-3.69        3.65                      -3           2                 -6
3.70-3.79        3.75                      -2           2                 -4
3.80-3.89        3.85                      -1           4                 -4
3.90-3.99        3.95                       0           5                  0
4.00-4.09        4.05                       1           6                  6
4.10-4.19        4.15                       2           3                  6
4.20-4.29        4.25                       3           2                  6
                                                        n = 25            Sum fu = 0

\[ \bar{X} = X_0 + \frac{\sum fu}{n}\, c = \$3.95 + \frac{0}{25}(\$0.10) = \$3.95 \]

\(\bar{X}\) for the grouped data found by coding is identical to that found in Prob. 2.4(b) without coding.
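The coding shortcut of Prob. 2.6 can be checked with a minimal Python sketch. The function names `coded_mean` and `direct_mean` are illustrative choices, not from the text; the midpoints and frequencies are those of Table 2.12, and the sketch assumes equal class widths, as the coding formula itself does.

```python
# Grouped hourly-wage data from Table 2.12: class midpoints and frequencies.
midpoints = [3.55, 3.65, 3.75, 3.85, 3.95, 4.05, 4.15, 4.25]
freqs = [1, 2, 2, 4, 5, 6, 3, 2]
width = 0.10  # class width c


def coded_mean(midpoints, freqs, width, zero_index):
    """Mean by coding: X0 + (sum(f*u) / n) * c, with u = 0 at zero_index."""
    n = sum(freqs)
    # Codes count classes away from the class chosen as u = 0.
    codes = [i - zero_index for i in range(len(midpoints))]
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    return midpoints[zero_index] + (sum_fu / n) * width


def direct_mean(midpoints, freqs):
    """Grouped mean without coding: sum(f*X) / n."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)


coded = coded_mean(midpoints, freqs, width, zero_index=4)    # u = 0 at the 5th class
shifted = coded_mean(midpoints, freqs, width, zero_index=0)  # u = 0 at the 1st class
direct = direct_mean(midpoints, freqs)
print(f"{coded:.2f} {shifted:.2f} {direct:.2f}")
```

All three values agree at 3.95, which suggests that the choice of the class assigned the code 0 only affects how convenient the arithmetic is, not the result.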
Coding eliminates the problem of having to deal with possibly large and inconvenient class midpoints; thus it may simplify the calculations.

2.7 A firm pays a wage of \$4 per hour to its 25 unskilled workers, \$6 to its 15 semiskilled workers, and \$8 to its 10 skilled workers. What is the weighted average, or weighted mean, wage paid by this firm?

To find the weighted mean, or weighted average, of a population, \(\mu_w\), or sample, \(\bar{X}_w\), the weights, \(w\), have the same function as the frequency in finding the mean for the grouped data. Thus

\[ \mu_w \text{ or } \bar{X}_w = \frac{\sum wX}{\sum w} \tag{2.17} \]

For this problem, the weights are the number of workers employed at each wage, and \(\sum w\) equals the sum of all the workers:

\[ \mu_w = \frac{(\$4)(25) + (\$6)(15) + (\$8)(10)}{25 + 15 + 10} = \frac{\$100 + \$90 + \$80}{50} = \frac{\$270}{50} = \$5.40 \]

This weighted average compares with the simple average of \$6 [(\$4 + \$6 + \$8)/3 = \$6] and is a better measure of the average wages.

2.8 A nation faces a rate of inflation of 2% in one year, 5% in the second year, and 12.5% in the third year. Find the geometric mean of the inflation rates (the geometric mean, \(\mu_G\) or \(\bar{X}_G\), of a set of \(n\) positive numbers is the \(n\)th root of their product and is used mainly to average rates of change and index numbers):

\[ \mu_G \text{ or } \bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} \tag{2.18} \]

where \(X_1, X_2, \ldots, X_n\) refer to the \(n\) (or \(N\)) observations.

\[ \mu_G = \sqrt[3]{(2)(5)(12.5)} = \sqrt[3]{125} = 5\% \]

This compares with \(\mu = (2 + 5 + 12.5)/3 = 19.5/3 = 6.5\%\). When all the numbers are equal, \(\mu_G\) equals \(\mu\); otherwise \(\mu_G\) is smaller than \(\mu\). In practice, \(\mu_G\) is calculated by logarithms:

\[ \log \mu_G = \frac{\sum \log X}{N} \tag{2.19} \]

The geometric mean is used primarily in the mathematics of finance and financial management.

2.9 A commuter drives 10 mi on the highway at 60 mi/h and 10 mi on local streets at 15 mi/h. Find the harmonic mean. The harmonic mean \(\mu_H\) is used primarily to average ratios:

\[ \mu_H = \frac{N}{\sum (1/X)} \tag{2.20} \]
\[ \mu_H = \frac{2}{(1/60) + (1/15)} = \frac{2}{(1 + 4)/60} = \frac{120}{5} = 24 \text{ mi/h} \]

as compared with \(\mu = \sum X / N = (60 + 15)/2 = 75/2 = 37.5\) mi/h. Note that if the commuter had averaged 37.5 mi/h, it would have taken her (20 mi / 37.5 mi/h) x 60 min = 32 min to drive the 20 mi.
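The three averages of Probs. 2.7 to 2.9 can be reproduced with a short Python sketch. The helper names below are illustrative, not from the text, and the geometric mean is computed through logarithms, following the remark that this is how it is done in practice.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(w*X) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def geometric_mean(values):
    """n-th root of the product, computed as exp(mean of the logs)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    """Harmonic mean: N / sum(1/X)."""
    return len(values) / sum(1 / x for x in values)


wage = weighted_mean([4, 6, 8], [25, 15, 10])  # Prob. 2.7: wages and worker counts
inflation = geometric_mean([2, 5, 12.5])       # Prob. 2.8: yearly inflation rates, %
speed = harmonic_mean([60, 15])                # Prob. 2.9: mi/h over equal distances
minutes = 20 / speed * 60                      # time to cover the 20 mi at that speed

print(round(wage, 2), round(inflation, 2), round(speed, 2), round(minutes, 1))
```

The results match the hand calculations: a weighted wage of \$5.40, a geometric mean of 5%, a harmonic mean of 24 mi/h, and a 50-minute trip.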
Instead, she drives 10 min on the highway (10 mi at 60 mi/h) and 40 min on local streets (10 mi at 15 mi/h) for a total of 50 min, and this is the (correct) answer we get by using \(\mu_H = 24\) mi/h. That is, (20 mi / 24 mi/h) x 60 min = 50 min.

2.10 (a) For the ungrouped data in Table 2.7, find the first, second, and third quartiles and the third decile and sixtieth percentile. (b) Do the same for the grouped data in Table 2.12. (Quartiles divide the data into 4 parts, deciles into 10 parts, and percentiles into 100 parts.)

(a) \(Q_1\) (first quartile) = 4 (the average of the 10th and 11th values in Table 2.8)
\(Q_2\) (second quartile) = 6 = the value of the 20.5th item = the median
\(Q_3\) (third quartile) = 7.5 = the value of the 30.5th item
\(D_3\) (third decile) = 5 = the value of the 12.5th item
\(P_{60}\) (sixtieth percentile) = 7 = the value of the 24.5th item

(b)
\[ Q_1 = L + \frac{n/4 - F}{f}\, c = \$3.80 + \frac{25/4 - 5}{4}(\$0.10) = \$3.80 + \$0.0313 \approx \$3.83 \tag{2.21} \]
\[ Q_2 = L + \frac{n/2 - F}{f}\, c = \$3.90 + \frac{25/2 - 9}{5}(\$0.10) = \$3.90 + \$0.07 = \$3.97 = \text{median} \tag{2.22} \]
\[ Q_3 = L + \frac{3n/4 - F}{f}\, c = \$4.00 + \frac{18.75 - 14}{6}(\$0.10) = \$4.00 + \$0.0792 \approx \$4.08 \tag{2.23} \]
\[ D_3 = L + \frac{3n/10 - F}{f}\, c = \$3.80 + \frac{7.5 - 5}{4}(\$0.10) = \$3.80 + \$0.0625 \approx \$3.86 \tag{2.24} \]
\[ P_{60} = L + \frac{60n/100 - F}{f}\, c = \$4.00 + \frac{15 - 14}{6}(\$0.10) = \$4.00 + \$0.0167 \approx \$4.02 \tag{2.25} \]

MEASURES OF DISPERSION

2.11 (a) Find the range for the ungrouped data in Table 2.7. (b) Find the range for the ungrouped data in Table 2.10 and for the grouped data in Table 2.12. (c) What are the advantages and disadvantages of the range?

(a) The range for ungrouped data is equal to the value of the largest observation minus the value of the smallest observation in the data set. The range for the ungrouped data in Table 2.7 is from 2 to 10, or 8 points.

(b) The range for the ungrouped data in Table 2.10 is from \$3.55 to \$4.26, or \$0.71. For grouped data, the range extends from the lower limit of the smallest class to the upper limit of the largest class. For the grouped data in Table 2.12, the range extends from \$3.50 to \$4.29.

(c) The advantages of the range are that it is easy to find and understand. Its disadvantages are that it considers only the lowest and highest values of a distribution, it is greatly influenced by extreme values, and it cannot be found for open-ended distributions.
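The grouped-data quartile, decile, and percentile calculations of Prob. 2.10(b) all follow one pattern, which can be sketched in Python as below. The function name `grouped_quantile` is an illustrative choice, and the sketch follows the text's convention of using the stated lower class limit (for example \$3.80) rather than the class boundary (\$3.795).

```python
# Grouped hourly-wage data from Table 2.12: lower class limits and frequencies.
lower_limits = [3.50, 3.60, 3.70, 3.80, 3.90, 4.00, 4.10, 4.20]
freqs = [1, 2, 2, 4, 5, 6, 3, 2]
width = 0.10  # class width c


def grouped_quantile(p, lower_limits, freqs, width):
    """Return L + ((p*n - F) / f) * c for the class that contains item p*n."""
    n = sum(freqs)
    target = p * n
    cumulative = 0  # F: sum of frequencies before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 1")


q1 = grouped_quantile(0.25, lower_limits, freqs, width)
q2 = grouped_quantile(0.50, lower_limits, freqs, width)
q3 = grouped_quantile(0.75, lower_limits, freqs, width)
d3 = grouped_quantile(0.30, lower_limits, freqs, width)
p60 = grouped_quantile(0.60, lower_limits, freqs, width)

for name, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("D3", d3), ("P60", p60)]:
    print(f"{name} = {value:.2f}")
```

Rounded to cents, the sketch gives \$3.97 for the median, \$4.08 for the third quartile, and \$4.02 for the sixtieth percentile, in line with the hand calculations.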
Bectuse af these disadvantages, the range is of tel usefulness (except in quality control. Find the interquastile ange aval Ue quantile deviation (2) fox the wrod it Fable 27 and (4) for the grouped data in Table 2.12 (w) The interquartile range is equal tothe difference hetwcem the tind and frst quartiles; - 21-9 1226 For the ungrouped data in Table 2.7, [R = 7.5 —4 = 35 points ftilizing the values of Q; and Q« found in Prob. 210 (a) Note that he antrguartl ange iv aot afte By careme values becane a lies cooly the mide Kalf ofthe data, Its thus better than the range, but ite no as widely used. the other measures of cispersion, For the quartile deviatio o = (22) QD Therefore, QD = (9.6 4)/2= 3.6/2. ‘one-fourth of the da (R= Q, ~ 0, = SA8 ~ $3.82 = $0.25 otilering the values of Qy and Qy Found ip Prob > 10(6¥ p= 21-21 _ $4.08 S383 1.78 points, Quartile devindon measures the average mange of 02s Find the average deviation for (a) the ungrouped data in Table 2.7 and (B) for the grouped data in Tabls 29. (a) Since ps = 6 [see Prob. 2a). Eu DHLSOFAH2ETSOS1ESESEAE IE LEIS DOE TEED EE $ASISOFIES424EG42EIES42FOS 1424340404 34441 n ap DL. Lspointe cur, 2d 1 DESCRIPTIVE STATISTICS 28 [Note that the average deviation takes every ebscrvation into aecount. It measures the average of the abvolute deviation of each abusrvation from the mean. It taker the absalute value (indicated by the to vertical bars) Because SO(¥ — 2} =O (see Example Sh. (oy We sae fal rstnes evant fv Une sane rpm da wits Une abd of Table 216 DA wl 72 Ap ND the same as we Found for the wngroupod: data, ‘Table 216 Calewtalons forthe Average estat for the Grouped Data im Tabbe 29 Clans Midpoint r Frequency. | Moan v—p| fra 2 3 6 a 3 3 6 3 4 4 5 6 2 0 . . 
«6 | 1 5 6 6 6 o fo ® 8 4 6 2 | 2 8 ° 4 6 sf 3 2 S104 0 2 6 a | a 8 Dyeve@ Elr-a=7 Find the average deviation for the grouped data In Table 2.12, ‘We can dnd the average deviation for the grouped data of hourly wages in Table 2.12 with the aid of Table 217 (F = 3:95, ee Prob, 2.463): Note thatthe average deviation found forthe srouped data sm estate of the “rus” average deviation ther comid be wad ke the agent ata Th sally es saat fers tbe Fran average devitin because we use the estimate af the mean for the grouped data in our ealculations [compare the values of T Found in Prob. 2.0) and (6) ‘Table 2.17 Calculations forthe Average Deviation for the Grouped Data in Table 2.12 Hourly Wage, [Class Midpoint] Frequency [Mean J ¥—¥,]|— HL] f= ¥h, s XS f 5 si] os s Sa-h60 hos 040 | 030 bs 30-478 335 120 | 020 oa 380-389 385 4 1 | 010 pap 30-398 398 5 o.08 | 0.00 boo 400-409) 4.05 6 ow | a0 a0 410-419 48 3 20 | 020 050 420-429 4 2 0.30 | 030 ba Lfaaas Eri T = 300 26 DESCRIPTIVE STATISTICS lomar. 2 AS Pind the warianoe and the standard deviation for (a) the ungrouped data in Table 2.7 and (@) the grouped data ia Table 29. (°) What is the advantage of the standard deviation over the variance? fa Te and 6 Goce Prob. 234) SUV Wh UGTA OFS ELA OS TE IE OE WS TELS E OFTHE ESE LG HOPLAOFLS OAS IG H4 SFOS ESO E TEA O FETE OS ICH =i 2h .8 points squared Eww _ (_ ae, on Pe a pe VEE 219 pons (6) We can find the variance and the standard deviation for the grouped dats of grades with the aid of Tale 218 SEyiy =u _ 92 Poet ints. square ° w ay = 48 points squared and om Var = VER 219 points the same as we Found for the wngrouped data “Table 2.13 Calculations for the Variance and Standard Deviation for the Data in Table 2.9 Frequency f tm?) fora? 
” 2 16 36 2 Tifa = py = 192 (6) Tisacvantags af me stand deviating wer the waa is thatthe stata oval is mepesiel the same units a& the data rather than in “the wideh squad,” which is how the variance is expressed ‘The standard deviation is by for the most widely used measure of (absolute) dispersion. {E10 Find the variance and the stangard deviation for the grouped data in Table £10 ‘We san find the varie andthe standart deviation forthe groped data hourly wage withthe nit of Table 2.19 [¥ = $3.95; soe Prob, 2416): obs aT and cuar. DESCRIPTIVE STATISTICS 7 ‘Table 219 Caleulatlons forthe Variance and Standard Iestation fur the Data in Table 2.12 Hourly | Class tea Wage, S [Midpoint ¥. 3] Froguency/ |S iw - 8 fit = 3 yas] ass 1 335 016 Oe aap369 | 3.65 2 395 0.09 01s aman] 31s 2 39s oat sama | 38s 4 398 oat song | 95 5 39s 00 on-s00 | ans ® a9s oat wea] 4s a ass 04 amar] 43s 2 393 008 Epan-3 IT 18 ote that in the Formula for and s,m — I rather than 9 i used in the denominator. The reason for this is that if we take many samples from a polation, the average of the sample varianees does not ead to qual population variance, 0°, unlce we we» 1 i the donominator of the Formula for «(mora wll be sald oF this im Chap, 5). Furthermore, ° and s for the grouped data are estimates for the true Fane £ thot com be found foe the grouped data because ae ie the coimate of W from the grote eat i our ealeulations, Starting with the formula for a” and s' given In See, 2.3, prove that (@) (2.280,4) (6) (2.2¥0,4) oy We can get by simply replacing wih Tang 4 sth im the numerator and WHR A — 1 mn abe denominator of the Formal for EF =F EPO = tea DAF = eT A Nt @ x N x AEP ae pe EM m We can pet rin the ame way as we did in part a The preceding formulas will simplify the-aleulutions for of and st fora large body of data. Cadi also helps (see Prob. 2.6) Find the variance and the standard deviation for (a) the ungrouped data in Table 2.7 and (0) the groupe lata in Tate 2.9. 
wine rhe style canpuarianal fowmnulas in Prob 217 28 DESCRIPTIVE STATISTICS lomar. 2 ) SENT Hh 28 Nh dy 6b 8D 4 36 I Se BT 106 15 4 254 25 4164 36-449 4 18444494 254 B64 ADF RY Hd 16 RTE 164 16494 Od 4 Op SbF DY BT LOD 9 2S = 162 1.637 (any(36) L482 —1aan _ 197 = 4.8 points squared the samme a2 in Prob. 2.1548), (6) We can dnd o” and o-for the grouped data in Table 2-9 with the aid of Table 220 11.832 — (409136) _ 1,682 — 1.840 192 ae Vee = VER 2 19 points 48 polats squared the same as in part @ and Prob 2.15 Table 2.20, Calcutations for the Vartance and Standard Deviation for tn Tabte 2.9 Gente [Cae Mutpaior Y | Feeney re v ne 1S24 2 3 é 4 asa a 3 9 ° 1344 4 5 x 16 4354 s 5 3B 3 S84 6 4 % 36 6574 T 8 56 ” 184 8 4 2 a asa4 » 4 Ea a yo 2 a ow Lrewean| Ser 219 Find the variance and the standard deviation for the grouped data in Table 2.12 using the simpler computational formula given in Prab. 2.17(b) “We can tind ¢ and » foe the grouped data in Table 2.12 with the ld of Table 2.21 0.0342 dollars sauazed and os VOORE 50.18 the came se-we found in Prob, 216. cur, 220 DESCRIPTIVE STATISTICS » ‘Table 2.21 Calewlations for the Variance and Standard Deviation forthe Grouped Data i “Vane 212 Hourly Css ‘Wape,'S | Midpoint x8] Frequency |X, $ a a saess9 | ass 1 338 1200 s03.09 | 365 2 730 265450 amo3% | 375 2 7.50 28.1250 asos9 | as 4 15.40 9.2000 s903.99] 39s 5 19.35 7a0128 4oo-409 | 40s 6 ux | 164025] geal aoa | as 3 yas [inzmas| S167 amar | 42s 2 aso [isms] 361280 Efean8 [Es = 9075 Ee = ORs Find the coefficient of variation V for the data in {aj Table 27 and (6) Table 2.12. fe) Whats the usefulness of the cocificient of variation’? (a) with je~ 6 and 2.19 (se Prob, 2.19) a 219 points eo Gpeints 0835, of 4.38% (6) With = 93.95 and » 30.18 (oor Prob. 2.19) (©) The coefficient of variation measures the relatiw dispersion in the data and is expressed as a pure number without any units. 
The coefficient of variation is to be contrasted with the standard deviation and other measures of absolute dispersion, which are expressed in the units of the problem. Thus the coefficient of variation can be used to compare the relative dispersion of two or more distributions expressed in different units, as well as when the true mean values differ. For example, we can say that the dispersion of the data in Table 2.7 is greater than that in Table 2.12. The coefficient of variation also can be used to compare the relative dispersion of the same type of data over different time periods (when $\mu$ or $\bar{X}$ and $\sigma$ or $s$ change).

SHAPE OF FREQUENCY DISTRIBUTIONS

2.21 Find the Pearson coefficient of skewness for the (grouped) data in (a) Table 2.9 and (b) Table 2.12.

(a) With $\mu = 6$, med = 6.17 [see Prob. 2.3(b)], and $\sigma \approx 2.19$ [see Prob. 2.15(b)],

$Sk = \frac{3(\mu - \text{med})}{\sigma} \approx \frac{3(6 - 6.17)}{2.19} \approx -0.23$ (a pure number)

Note that the median is greater than the mean and that the distribution is slightly negatively skewed (see the skewness figure in Sec. 2.4).

(b) With $\bar{X}$ = $3.95, med = $3.97 [see Prob. 2.4(b)], and $s \approx$ $0.18 (see Prob. 2.16),

$Sk = \frac{3(\bar{X} - \text{med})}{s} \approx \frac{3(3.95 - 3.97)}{0.18} = \frac{3(-0.02)}{0.18} \approx -0.33$

2.22 Using the formula for skewness based on the third moment, find the coefficient of skewness for the data in (a) Table 2.9 and (b) Table 2.12.

(a) We can find the coefficient of skewness for the data in Table 2.9 using the formula based on the third moment with the aid of Table 2.22:

$Sk = \frac{\sum f(X - \mu)^3}{\sigma^3} \approx \frac{-42}{(2.19)^3} \approx -4$

This indicates that the distribution is negatively skewed, but the degree of skewness is measured differently than in Prob. 2.21.

Table 2.22 Calculations for Skewness for the Data in Table 2.9

Grade | Class Midpoint $X$ | Frequency $f$ | Mean $\mu$ | $X - \mu$ | $(X - \mu)^3$ | $f(X - \mu)^3$
1.5-2.4 | 2 | 3 | 6 | -4 | -64 | -192
2.5-3.4 | 3 | 3 | 6 | -3 | -27 | -81
3.5-4.4 | 4 | 5 | 6 | -2 | -8 | -40
4.5-5.4 | 5 | 5 | 6 | -1 | -1 | -5
5.5-6.4 | 6 | 6 | 6 | 0 | 0 | 0
6.5-7.4 | 7 | 8 | 6 | 1 | 1 | 8
7.5-8.4 | 8 | 4 | 6 | 2 | 8 | 32
8.5-9.4 | 9 | 4 | 6 | 3 | 27 | 108
9.5-10.4 | 10 | 2 | 6 | 4 | 64 | 128
| | $\sum f = N = 40$ | | | | $\sum f(X - \mu)^3 = -42$

(b) See Table 2.23. [Note that regardless of the measure of skewness used, the [...]

[...]

Hourly Wage, $ | Class Midpoint $X$, $ | Frequency $f$ | Mean $\bar{X}$, $ | $X - \bar{X}$, $ | $(X - \bar{X})^4$ | $f(X - \bar{X})^4$
[...]
3.60-3.69 | 3.65 | 2 | 3.95 | -0.30 | 0.0081 | 0.0162
3.70-3.79 | 3.75 | 2 | 3.95 | -0.20 | 0.0016 | 0.0032
3.80-3.89 | 3.85 | 4 | 3.95 | -0.10 | 0.0001 | 0.0004
3.90-3.99 | 3.95 | 5 | 3.95 | 0 | 0 | 0
4.00-4.09 | 4.05 | 6 | 3.95 | 0.10 | 0.0001 | 0.0006
4.10-4.19 | 4.15 | 3 | 3.95 | 0.20 | 0.0016 | 0.0048
4.20-4.29 | 4.25 | 2 | 3.95 | 0.30 | 0.0081 | 0.0162
| | | | | | $\sum f(X - \bar{X})^4 = 0.0670$

2.24 Find the covariance between hourly wage $X$ and education $Y$, measured in years of schooling, in the data in Table 2.26.

Table 2.26 Employee Hourly Wages and Years of Schooling

Employee Number | Hourly Wage $X$, $ | Years of Schooling $Y$
1 | 12.00 | 14
2 | 8.50 | 12
3 | 9.00 | 10
4 | 10.50 | 12
5 | 11.00 | 16
6 | 15.00 | 16
7 | 25.00 | 18
8 | 12.00 | 18
9 | 6.50 | 12
10 | 8.25 | 10

From the calculations in Table 2.27, $\text{cov}(X, Y) = 103.55/10 = 10.355$. When $X$ and $Y$ are both above or both below their means, covariance is increased. When $X$ and $Y$ move in opposite directions relative to their means (employee 5), covariance is decreased. Since in this case $\text{cov}(X, Y) > 0$, $X$ and $Y$ move together relative to their means.

Table 2.27 Calculations for Covariance

Employee Number | Hourly Wage $X$, $ | Years of Schooling $Y$ | $X - \bar{X}$ | $Y - \bar{Y}$ | $(X - \bar{X})(Y - \bar{Y})$
1 | 12.00 | 14 | 0.225 | 0.2 | 0.045
2 | 8.50 | 12 | -3.275 | -1.8 | 5.895
3 | 9.00 | 10 | -2.775 | -3.8 | 10.545
4 | 10.50 | 12 | -1.275 | -1.8 | 2.295
5 | 11.00 | 16 | -0.775 | 2.2 | -1.705
6 | 15.00 | 16 | 3.225 | 2.2 | 7.095
7 | 25.00 | 18 | 13.225 | 4.2 | 55.545
8 | 12.00 | 18 | 0.225 | 4.2 | 0.945
9 | 6.50 | 12 | -5.275 | -1.8 | 9.495
10 | 8.25 | 10 | -3.525 | -3.8 | 13.395
| $\bar{X} = 11.775$ | $\bar{Y} = 13.8$ | | | $\sum (X - \bar{X})(Y - \bar{Y}) = 103.55$

2.25 Compute the covariance from Table 2.26 using the alternate formula.

Computations are given in Table 2.28.

$\text{cov}(X, Y) = \frac{\sum XY}{N} - \bar{X}\bar{Y} = \frac{1728.5}{10} - (11.775)(13.8) = 172.85 - 162.495 = 10.355$

Table 2.28 Calculations for Covariance with Alternate Formula

Employee Number | $X$ | $Y$ | $XY$
1 | 12.00 | 14 | 168.00
2 | 8.50 | 12 | 102.00
3 | 9.00 | 10 | 90.00
4 | 10.50 | 12 | 126.00
5 | 11.00 | 16 | 176.00
6 | 15.00 | 16 | 240.00
7 | 25.00 | 18 | 450.00
8 | 12.00 | 18 | 216.00
9 | 6.50 | 12 | 78.00
10 | 8.25 | 10 | 82.50
| $\sum X = 117.75$ | $\sum Y = 138$ | $\sum XY = 1728.5$

Supplementary Problems

FREQUENCY DISTRIBUTIONS

2.26 Table 2.29 gives the frequency for gasoline prices at 48 stations in a town. Present the data in the form of a histogram, a relative-frequency histogram, a frequency polygon, and an ogive.

Table 2.29 Frequency Distribution of Gasoline Prices

Price, $ | Frequency
1.00-1.04 | 4
1.05-1.09 | 6
1.10-1.14 | 10
1.15-1.19 | 15
1.20-1.24 | 8
1.25-1.29 | 5
| 48

2.27 Table 2.30 gives the frequency distribution of family incomes for a sample of 100 families in a city. Graph the data into a histogram, a relative-frequency histogram, a frequency polygon, and an ogive.

Table 2.30 Frequency Distribution of Family Incomes

Family Income, $ | Frequency
10,000-11,999 | 12
12,000-13,999 | 14
14,000-15,999 | 24
16,000-17,999 | 15
18,000-19,999 | 13
20,000-21,999 | 7
22,000-23,999 | 6
24,000-25,999 | 4
26,000-27,999 | 3
28,000-29,999 | 2
| 100

MEASURES OF CENTRAL TENDENCY

2.28 Find (a) the mean, (b) the median, and (c) the mode for the grouped data in Table 2.29.
Ans. (a) $\mu$ = $1.15 (b) Median = $1.16 (c) Mode = $1.17

2.29 Find (a) the mean, (b) the median, and (c) the mode for the frequency distribution of incomes in Table 2.30.
Ans. (a) $\bar{X}$ = $17,000 (b) Median = $16,000 (c) Mode = $15,053

2.30 Find the mean for the grouped data in (a) Table 2.29 and (b) Table 2.30 by coding.
Ans. (a) $\mu \approx$ $1.15 (b) $\bar{X}$ = $17,000

2.31 A firm pays 5/12 of its labor force an hourly wage of $5, 1/3 of the labor force a wage of $6, and 1/4 a wage of $7. What is the weighted average paid by this firm?
Ans. $\mu_w \approx$ $5.83

2.32 For the same amount of capital invested in each of 3 years, an investor earned a rate of return of 1% during the first year, 4% during the second year, and 16% during the third. (a) Find $\mu_G$. (b) Find $\mu$. (c) Which is appropriate?
Ans. (a) $\mu_G$ = 4% (b) $\mu$ = 7% (c) $\mu_G$

2.33 A plane traveled 200 mi at 600 mi/h and 100 mi at 500 mi/h. What was its average speed?
Ans. $\mu_H$ = 562.5 mi/h

2.34 A driver purchases $10 worth of gasoline at $0.90 a gallon and $10 at $1.10 a gallon. What is the average price per gallon?
Ans. $\mu_H$ = $0.99 per gallon

2.35 For the grouped data of Table 2.29, find (a) the first quartile, (b) the second quartile, (c) the third quartile, (d) the fourth decile, and (e) the seventieth percentile.
Ans. (a) $Q_1$ = $1.11 (b) $Q_2$ = $1.16 (c) $Q_3 \approx$ $1.21 (d) $D_4 \approx$ $1.146 (e) $P_{70}$ = $1.195

2.36 For the grouped data in Table 2.30, find (a) the first quartile, (b) the third quartile, (c) the third decile, and (d) the sixtieth percentile.
Ans. (a) $Q_1$ = $13,857 (b) $Q_3$ = $19,538 (c) $D_3$ = $14,333 (d) $P_{60}$ = $17,333
MEASURES OF DISPERSION

2.37 What is the range of the distribution of (a) gasoline prices in Table 2.29 and (b) family incomes in Table 2.30?
Ans. (a) $0.29 (b) $10,000 to $29,999, or $20,000

2.38 Find the interquartile range and quartile deviation for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) IR = $0.10 and QD = $0.05 (b) IR = $5,681 and QD = $2,840.50

2.39 Find the average deviation for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) $0.0575 (b) $3,520

2.40 Find (a) the variance and (b) the standard deviation for the frequency distribution of gasoline prices in Table 2.29.
Ans. (a) $\sigma^2 \approx 0.0048$ dollars squared (b) $\sigma \approx$ $0.0693

2.41 Find (a) the variance and (b) the standard deviation for the frequency distribution of family incomes in Table 2.30.
Ans. (a) 19,760,000 dollars squared (b) $4,445.22

2.42 Using the easier computational formulas, find (a) the variance and (b) the standard deviation for the distribution of gasoline prices in Table 2.29.
Ans. (a) $\sigma^2 \approx 0.0048$ dollars squared (b) $\sigma \approx$ $0.0693

2.43 Using the easier computational formulas, find (a) the variance and (b) the standard deviation for the family incomes in Table 2.30.
Ans. (a) 19,760,000 dollars squared (b) $4,445.22

2.44 Find the coefficient of variation $V$ for (a) the data in Table 2.29 and (b) the data in Table 2.30. (c) Which data have the greater dispersion?
Ans. (a) 0.060, or 6% (b) 0.261, or 26.1% (c) The data of Table 2.30

SHAPE OF FREQUENCY DISTRIBUTIONS

2.45 Find the Pearson coefficient of skewness for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) -0.43 (b) 0.67

2.46 Find the coefficient of skewness using the formula based on the third moment for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) -1.88 (b) 75.5

2.47 Find the coefficient of kurtosis for the data in (a) Table 2.29 and (b) Table 2.30.
Ans. (a) 177 (b) 300

2.48 For covariance, (a) in what range should the covariance for directly related data fall? (b) for inversely related data?
(c) for unrelated data?
Ans. (a) cov > 0 (b) cov < 0 (c) cov = 0

CHAPTER 3

Probability and Probability Distributions

3.1 PROBABILITY OF A SINGLE EVENT

If event $A$ can occur in $n_A$ ways out of a total of $N$ possible and equally likely outcomes, the probability that event $A$ will occur is given by

$P(A) = \frac{n_A}{N}$ (3.1)

where $P(A)$ = probability that event $A$ will occur
$n_A$ = number of ways that event $A$ can occur
$N$ = total number of equally possible outcomes

Probability can be visualized with a Venn diagram. In Fig. 3-1, the circle represents event $A$, and the total area of the rectangle represents all possible outcomes.

Fig. 3-1

$P(A)$ ranges between 0 and 1:

$0 \le P(A) \le 1$ (3.2)

If $P(A) = 0$, event $A$ cannot occur. If $P(A) = 1$, event $A$ will occur with certainty. If $P(A')$ represents the probability of nonoccurrence of event $A$, then

$P(A) + P(A') = 1$ (3.3)

EXAMPLE 1. A head (H) and a tail (T) are the two equally possible outcomes in tossing a balanced coin. Thus

$P(H) = \frac{1}{2}$ and $P(T) = \frac{1}{2}$

EXAMPLE 2. In rolling a fair die once, there are six possible and equally likely outcomes: 1, 2, 3, 4, 5, and 6. Thus

$P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$

The probability of not rolling a 1 is

$P(1') = 1 - P(1) = 1 - \frac{1}{6} = \frac{5}{6}$ and $P(1) + P(1') = \frac{1}{6} + \frac{5}{6} = 1$

EXAMPLE 3. A card deck has 52 cards divided into 4 suits (diamonds, hearts, clubs, and spades) with 13 cards in each suit (1, 2, 3, ..., 10, jack, queen, king). If the deck is well shuffled, each of the 52 cards is equally likely to be picked. Since there are 4 jacks, the probability of picking a jack, J, on a single pick is

$P(J) = \frac{4}{52} = \frac{1}{13}$

Since there are 13 diamonds, D,

$P(D) = \frac{13}{52} = \frac{1}{4}$, $P(D') = 1 - P(D) = \frac{3}{4}$, and $P(D) + P(D') = 1$

EXAMPLE 4. Suppose that in 100 tosses of a balanced coin we get 53 heads and 47 tails. The relative frequency of heads is 53/100, or 0.53. This is the relative frequency or empirical probability, which is to be distinguished from the a priori or classical probability of $P(H) = 0.5$.
As the number of tosses increases and approaches infinity in the limit, the relative frequency or empirical probability approaches the a priori or classical probability. For example, the relative frequency or empirical probability might be 0.517 for 1000 tosses, 0.508 for 10,000 tosses, and so on.

3.2 PROBABILITY OF MULTIPLE EVENTS

1. Rule of addition for nonmutually exclusive events. Two events, $A$ and $B$, are not mutually exclusive if the occurrence of $A$ does not preclude the occurrence of $B$, or vice versa. Then

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ (3.4)

$P(A \text{ and } B)$ is subtracted to avoid double counting. This can be seen with the Venn diagram in Fig. 3-2.

Fig. 3-2

2. Rule of addition for mutually exclusive events. Two events, $A$ and $B$, are mutually exclusive if the occurrence of $A$ precludes the occurrence of $B$, or vice versa [$P(A \text{ and } B) = 0$]. Then

$P(A \text{ or } B) = P(A) + P(B)$ (3.5)

3. Rule of multiplication for dependent events. Two events are dependent if the occurrence of one is connected in some way with the occurrence of the other. Then the joint probability of $A$ and $B$ is

$P(A \text{ and } B) = P(A) \cdot P(B/A)$ (3.6)

This reads: "The probability that both events $A$ and $B$ will take place equals the probability of event $A$ times the probability of event $B$, given that event $A$ has already occurred."

$P(B/A)$ = conditional probability of $B$, given that $A$ has already occurred (3.7)

and $P(A \text{ and } B) = P(B \text{ and } A)$ (3.8)

[See Prob. 3.15(c) and (d).]

4. Rule of multiplication for independent events. Two events, $A$ and $B$, are independent if the occurrence of $A$ is not connected in any way to the occurrence of $B$ [$P(B/A) = P(B)$]. Then

$P(A \text{ and } B) = P(A) \cdot P(B)$ (3.9)

EXAMPLE 5. On a single toss of a die, we can get only one of six possible outcomes: 1, 2, 3, 4, 5, or 6. These are mutually exclusive events. If the die is fair, $P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6$.
The probability of getting a 2 or a 3 on a single toss of the die is

$P(2 \text{ or } 3) = P(2) + P(3) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$

Similarly

$P(2 \text{ or } 3 \text{ or } 4) = P(2) + P(3) + P(4) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}$

EXAMPLE 6. Picking at random a spade or a king on a single pick from a well-shuffled card deck does not constitute two mutually exclusive events because we could pick the king of spades. Thus

$P(S \text{ or } K) = P(S) + P(K) - P(S \text{ and } K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$

Using set theory, the preceding statement can be rewritten in an equivalent way as

$P(S \cup K) = P(S) + P(K) - P(S \cap K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$

where the symbol $\cup$ (read "union") replaces or and $\cap$ (read "intersection") replaces and.

EXAMPLE 7. The outcomes of two successive tosses of a balanced coin are independent events. The outcome of the first toss in no way affects the outcome on the second toss. Thus

$P(H \text{ and } H) = P(H \cap H) = P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

Similarly,

$P(H \text{ and } H \text{ and } H) = P(H \cap H \cap H) = P(H) \cdot P(H) \cdot P(H) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$

EXAMPLE 8. The probability that on the first pick from a deck we get the king of diamonds is

$P(K_D) = \frac{1}{52}$

If the first card picked was indeed the king of diamonds and if the first card was not replaced, the probability of getting another king on the second pick is dependent on the first pick because there are now only 3 kings and 51 cards left in the deck. The conditional probability of picking another king, given that the king of diamonds was already picked and not replaced, is

$P(K/K_D) = \frac{3}{51}$

Thus the probability of picking the king of diamonds on the first pick and, without replacement, picking another king on the second pick is

$P(K_D \text{ and } K) = P(K_D) \cdot P(K/K_D) = \frac{1}{52} \cdot \frac{3}{51} = \frac{3}{2652}$

or about 1 in 1000. Related to probability theory are Bayes' theorem (see Prob. 3.17) and combinations and permutations, or "counting techniques" (see Prob. 3.18 and the problems that follow it).

3.3 DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

A random variable is a variable whose values are associated with some probability of being observed. A discrete (as opposed to continuous) random variable is one that can assume only finite and distinct values. The set of all possible values of a random variable and its associated probabilities is called a probability distribution. The sum of all probabilities equals 1 (see Example 9).

One discrete probability distribution is the binomial distribution. This is used to find the probability of $X$ number of occurrences or successes of an event, $P(X)$, in $n$ trials of the same experiment when (1) there are only two possible and mutually exclusive outcomes, (2) the $n$ trials are independent, and (3) the probability of occurrence or success, $p$, remains constant in each trial. Then

$P(X) = \frac{n!}{X!(n - X)!} p^X (1 - p)^{n - X}$ (3.10)

where $n!$ (read "$n$ factorial") $= n \cdot (n - 1) \cdot (n - 2) \cdots 3 \cdot 2 \cdot 1$, and $0! = 1$ by definition (see Prob. 3.18). The mean of the binomial distribution is

$\mu = np$ (3.11)

The standard deviation is

$\sigma = \sqrt{np(1 - p)}$ (3.12)

If $p = 1 - p = 0.5$, the binomial distribution is symmetrical; if $p < 0.5$, it is skewed to the right; and if $p > 0.5$, it is skewed to the left.

EXAMPLE 9. The possible outcomes in 2 tosses of a balanced coin are TT, TH, HT, and HH. Thus

$P(0H) = \frac{1}{4}$, $P(1H) = \frac{1}{2}$, and $P(2H) = \frac{1}{4}$

The number of heads is therefore a discrete random variable, and the set of all possible outcomes with their associated probabilities is a discrete probability distribution (see Table 3.1 and Fig. 3-3).

Table 3.1 Probability Distribution of Heads in Two Tosses of a Balanced Coin

Number of Heads | Possible Outcomes | Probability
0 | TT | 0.25
1 | TH, HT | 0.50
2 | HH | 0.25
| | 1.00

Fig. 3-3 Probability Distribution of Heads in Two Tosses of a Balanced Coin (horizontal axis: number of heads; vertical axis: probability)

EXAMPLE 10. Using the binomial distribution, we can find the probability of 4 heads in 6 flips of a balanced coin as follows:

$P(4) = \frac{6!}{4!(6 - 4)!} \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^2 = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(4 \cdot 3 \cdot 2 \cdot 1)(2 \cdot 1)} \cdot \frac{1}{16} \cdot \frac{1}{4} = 15 \cdot \frac{1}{64} = \frac{15}{64} \approx 0.23$

When $n$ and $X$ are large numbers, long calculations to find probabilities can be avoided by using App. 1. The expected number of heads in 6 flips is $\mu = np = (6)(1/2) = 3$ heads. The standard deviation of the probability distribution of 6 flips is

$\sigma = \sqrt{np(1 - p)} = \sqrt{(6)(1/2)(1/2)} = \sqrt{1.5} \approx 1.22$ heads

Because $p = 0.5$, this probability distribution is symmetrical. If we were not dealing with a coin and the trials were not independent (as in sampling without replacement), we would have had to use the hypergeometric distribution (see Prob. 3.27).

3.4 THE POISSON DISTRIBUTION

The Poisson distribution is another discrete probability distribution. It is used to determine the probability of a designated number of successes per unit of time, when the events or successes are independent and the average number of successes per unit of time remains constant. Then

$P(X) = \frac{\lambda^X e^{-\lambda}}{X!}$ (3.13)

where $X$ = designated number of successes
$P(X)$ = probability of $X$ number of successes
$\lambda$ (Greek letter lambda) = average number of successes per unit of time
$e$ = base of the natural logarithmic system, or 2.71828

Given the value of $\lambda$ (the expected value or mean and variance of the Poisson distribution), we can find $e^{-\lambda}$ from App. 2, substitute it in Eq. (3.13), and find $P(X)$.

EXAMPLE 11. A police department receives an average of 5 calls per hour. The probability of receiving 2 calls in a randomly selected hour is

$P(X) = \frac{\lambda^X e^{-\lambda}}{X!} = \frac{5^2 e^{-5}}{2!} \approx \frac{(25)(0.00674)}{2} \approx 0.0842$

The Poisson distribution can be used as an approximation to the binomial distribution when $n$ is large and $p$ or $1 - p$ is small [say, $n \ge 30$ and $np < 5$ [...]

[...] $np > 5$ and $n(1 - p) > 5$, and it approximates the Poisson distribution when $\lambda \ge 10$ (see Probs. 3.37 and 3.38). Another continuous probability distribution is the exponential distribution (see Prob. 3.39). Chebyshev's theorem, or inequality, states that regardless of the shape of a distribution, the proportion of the observations or area falling within $K$ standard deviations of the mean is at least $1 - 1/K^2$, for $K > 1$ (see Probs. 3.40 and 3.72).

Solved Problems

PROBABILITY OF A SINGLE EVENT

3.1 (a) Distinguish among classical or a priori probability, relative frequency or empirical probability, and subjective or personalistic probability. (b) What is the disadvantage of each? (c) Why do we study probability theory?

(a) According to classical probability, the probability of an event $A$ is given by

$P(A) = \frac{n_A}{N}$

where $P(A)$ = probability that event $A$ will occur
$n_A$ = number of ways event $A$ can occur
$N$ = total number of equally possible outcomes

By the classical approach, we can make probability statements about balanced coins, fair dice, and standard card decks a priori, or without tossing a coin, rolling a die, or drawing a card. Relative frequency or empirical probability is given by the ratio of the number of times an event occurs to the total number of actual outcomes or observations. As the number of experiments or trials (such as the tossing of a coin) increases, the relative frequency or empirical probability approaches the classical or a priori probability.
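This limiting behavior can be watched in a small simulation. The sketch below is not from the text: it assumes that a pseudo-random number generator is an acceptable stand-in for a balanced coin, and the function name and the seed are arbitrary choices made for illustration.

```python
import random


def relative_frequency_of_heads(n_tosses, rng):
    """Toss a simulated balanced coin n_tosses times and return heads / tosses."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


rng = random.Random(2002)  # fixed seed only so that repeated runs give the same numbers
for n_tosses in (100, 1_000, 10_000, 100_000):
    freq = relative_frequency_of_heads(n_tosses, rng)
    # The gap between the empirical and the a priori probability tends to shrink as n grows.
    print(n_tosses, freq, abs(freq - 0.5))
```

Any single run can wander, so a larger number of tosses does not guarantee a smaller gap every time; it is the tendency over many trials that the relative-frequency definition relies on.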
Subjective or personalistic probability refers to the degree of belief of an individual that the event will occur, based on whatever evidence is available to the individual.

(b) The classical or a priori approach to probability can only be applied to games of chance (such as tossing a balanced coin, rolling a fair die, or picking cards from a standard deck of cards) where we can determine a priori, or without experimentation, the probability that an event will occur. In real-world problems of economics and business, we often cannot assign probabilities a priori and the classical approach cannot be used. The relative-frequency or empirical approach overcomes the disadvantages of the classical approach by using the relative frequencies of past occurrences as probabilities. The difficulty with the relative-frequency or empirical approach is that we get different probabilities (relative frequencies) for different numbers of trials or experiments. These probabilities stabilize, or approach a limit, as the number of trials or experiments increases. Because this may be expensive and time-consuming, people may end up using it without a "sufficient" number of trials or experiments. The disadvantage of the subjective or personalistic approach to probability is that different people faced with the same situation may come up with completely different probabilities.

(c) Most of the decisions we face in economics, business, science, and everyday life involve risks and probabilities. These probabilities are easier to understand and illustrate for games of chance because objective probabilities can easily be assigned to various events. However, the primary reason for studying probability theory is to help us make intelligent decisions in economics, business, science, and everyday life when risk and uncertainty are involved.

3.2 What is the probability of (a) A head in one toss of a balanced coin? A tail? A head or a tail? (b) A 2 in one rolling of a fair die? Not a 2? A 2 or not a 2?

(a) $P(H) = \frac{1}{2}$, $P(T) = \frac{1}{2}$, and $P(H) + P(T) = 1$

(b)
Since each of the 6 sides of a fair die is equally likely to come up and a 2 is one of the possibilities, $P(2) = 1/6$. The probability of not rolling a 2, that is, $P(2')$, is given by

$P(2') = 1 - P(2) = 1 - \frac{1}{6} = \frac{5}{6}$ and $P(2) + P(2') = \frac{1}{6} + \frac{5}{6} = 1$

3.3 What is the probability that in picking a card from a well-shuffled standard deck we get (a) a king, (b) a spade, (c) the king of spades, (d) not the king of spades, or (e) the king of spades or not the king of spades?

(a) Since there are 4 kings K in the 52 cards of the standard deck, $P(K) = 4/52 = 1/13$.
(b) Since there are 13 spades S in the 52 cards, $P(S) = 13/52 = 1/4$.
(c) There is only one king of spades in the deck; therefore $P(K_S) = 1/52$.
(d) The probability of not picking the king of spades is $P(K_S') = 1 - 1/52 = 51/52$.
(e) $P(K_S) + P(K_S') = 1/52 + 51/52 = 52/52 = 1$, or certainty.

3.4 An urn (vase) contains 10 balls that are exactly alike except that 5 are red, 3 are blue, and 2 are green. What is the probability that, in picking up a single ball, the ball is (a) Red? (b) Blue? (c) Green? (d) Nonblue? (e) Nongreen? (f) Green or nongreen? (g) What are the odds of picking a blue ball? (h) What are the odds of not picking a blue ball?

(a) $P(R) = \frac{n_R}{N} = \frac{5}{10} = 0.5$
(b) $P(B) = \frac{n_B}{N} = \frac{3}{10} = 0.3$
(c) $P(G) = \frac{n_G}{N} = \frac{2}{10} = 0.2$
(d) $P(B') = 1 - P(B) = 1 - 0.3 = 0.7$
(e) $P(G') = 1 - P(G) = 1 - 0.2 = 0.8$
(f) $P(G) + P(G') = 0.2 + 0.8 = 1$
(g) The odds of picking a blue ball are given by the ratio of the number of ways of picking a blue ball to the number of ways of not picking a blue ball. Since there are 3 blue balls and 7 nonblue balls, the odds in favor of picking a blue ball are 3 to 7, or 3:7.
(h) The odds of not (against) picking a blue ball are 7 to 3, or 7:3.

3.5 Suppose that a 3 comes up 106 times in 600 tosses of a die. (a) What is the relative frequency of the 3? How does this differ from classical or a priori probability? (b) What would you expect to be the relative frequency or empirical probability if you increased the number of times the die is rolled?

(a) The relative frequency or empirical probability of the 3 is given by the ratio of the number of times 3 comes up (106) out of the total number of times the die is rolled (600).
Thus the relative frequency or empirical probability of the 3 is $106/600 \approx 0.177$ in 600 rolls. According to the classical or a priori approach (and without rolling the die at all), $P(3) = 1/6 \approx 0.167$. If the die is fair, we expect the 3 to come up 100 times in 600 rolls of the die, as compared with the actual, observed, or empirical 106 times.

(b) If the number of times the same die is rolled is increased from 600, we expect the relative frequency or empirical probability to approach (i.e., to become less unequal with) the classical or a priori probability.

3.6 The production process results in 27 defective items for each 1000 items produced. (a) What is the relative frequency or empirical probability of a defective item? (b) How many defective items do you expect out of the 1600 items produced each day?

(a) The relative frequency or empirical probability of a defective item is 27/1000 = 0.027.
(b) By multiplying the number of items produced each day (1600) by the relative frequency or empirical probability of a defective item (0.027), we get the number of defective items we expect out of each day's output. This is $(1600)(0.027) \approx 43$, to the nearest item.

PROBABILITY OF MULTIPLE EVENTS

3.7 Define and give some examples of events that are (a) mutually exclusive, (b) not mutually exclusive, (c) independent, and (d) dependent.

(a) Two or more events are mutually exclusive, or disjoint, if the occurrence of one of them precludes or prevents the occurrence of the other(s). When one event takes place, the other(s) will not. For example, in a single flip of a coin, we get either a head or a tail, but not both. Heads and tails are therefore mutually exclusive events. In a single toss of a die, we get one and only one of six possible outcomes: 1, 2, 3, 4, 5, or 6. The outcomes are therefore mutually exclusive. A card picked at random can be of only one suit: diamonds, hearts, clubs, or spades. A child is born either a boy or a girl. An item produced on an assembly line is either good or defective.
(b) Two or more events are not mutually exclusive if they may occur at the same time. The occurrence of one does not preclude the occurrence of the other(s). For example, a card picked at random from a deck of cards can be both an ace and a club. Therefore, aces and clubs are not mutually exclusive events, because we could pick the ace of clubs. Because we can have inflation and recession at the same time, inflation and recession are not mutually exclusive events.

(c) Two or more events are independent if the occurrence of one of them in no way affects the occurrence of the other(s). For example, in two successive flips of a balanced coin, the outcome of the second flip in no way depends on the outcome of the first flip. The same is true for two successive tosses of a pair of dice or picks of two cards from a deck with replacement.

(d) Two or more events are dependent if the occurrence of one of them affects the probability of the occurrence of the other(s). For example, if we pick a card from a deck and do not replace it, the probability of picking the same card on the second pick is 0. All other probabilities also are affected, since there are now only 51 cards in the deck. Similarly, if the proportion of defective items is greater for the evening than for the morning shift, the probability that an item picked at random from the evening output is defective is greater than for the morning output.

3.8 Draw a Venn diagram for (a) mutually exclusive events and (b) not mutually exclusive events. (c) Are mutually exclusive events dependent or independent? Why?

(a) Figure 3-6 illustrates the Venn diagram for events $A$ and $B$, which are mutually exclusive.
(b) Figure 3-7 illustrates the Venn diagram for events $A$ and $B$, which are not mutually exclusive.

Fig. 3-6    Fig. 3-7

(c) Mutually exclusive events are dependent events. When one event occurs, the probability of the other occurring is 0. Thus the occurrence of the first affects (precludes) the occurrence of the other.
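The claim in Prob. 3.8(c) can be verified by enumerating the outcomes of a single fair die. The sketch below is illustrative only: the helper names are made up for this example, and the check for independence uses the condition $P(A \text{ and } B) = P(A) \cdot P(B)$ from Eq. (3.9).

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # the six faces of one fair die


def prob(event):
    """Classical probability: favorable outcomes over equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(b, a):
    """P(B/A) = P(A and B) / P(A), rearranged from the rule of multiplication."""
    return prob(a & b) / prob(a)


def mutually_exclusive(a, b):
    return prob(a & b) == 0


def independent(a, b):
    return prob(a & b) == prob(a) * prob(b)


low = frozenset({1, 2})       # "less than 3"
high = frozenset({4, 5, 6})   # "more than 3"
even = frozenset({2, 4, 6})

# low and high cannot both happen: once low occurs, P(high/low) drops to 0.
print(mutually_exclusive(low, high), independent(low, high), conditional(high, low))
# low and even can both happen (a 2), and here knowing low leaves P(even) unchanged.
print(mutually_exclusive(low, even), independent(low, even), conditional(even, low))
```

The first pair is mutually exclusive and therefore dependent; the second pair is not mutually exclusive and happens to be independent, which shows that the two properties are distinct.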
3.9 What is the probability of getting (a) Less than 3 on a single roll of a fair die? (b) Hearts or clubs on a single pick from a well-shuffled standard deck of cards? (c) A red or a blue ball from an urn containing 5 red balls, 3 blue balls, and 2 green balls? (d) More than 3 on a single roll of a fair die?

(a) Getting less than 3 on a single roll of a fair die means getting a 1 or a 2. These are mutually exclusive events. Applying the rule of addition for mutually exclusive events, we get

$P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$

Using set theory, $P(1 \text{ or } 2)$ can be rewritten in an equivalent way as $P(1 \cup 2)$, where $\cup$ is read "union" and stands for or.

(b) Getting a heart or a club on a single pick from a well-shuffled deck of cards also constitutes two mutually exclusive events. Applying the rule of addition, we get

$P(H \text{ or } C) = P(H \cup C) = P(H) + P(C) = \frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2}$

(c) $P(R \text{ or } B) = P(R \cup B) = P(R) + P(B) = \frac{5}{10} + \frac{3}{10} = \frac{8}{10} = 0.8$

(d) $P(4 \text{ or } 5 \text{ or } 6) = P(4 \cup 5 \cup 6) = P(4) + P(5) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}$

3.10 (a) What is the probability of getting an ace or a club on a single pick from a well-shuffled standard deck of cards? (In all remaining problems, it will be implicitly assumed that coins are balanced, dice are fair, and decks of cards are standard and well shuffled and cards are picked at random without replacement.) (b) What is the function of the negative term in the rule of addition for events that are not mutually exclusive?

(a) Getting an ace or a club does not constitute two mutually exclusive events because we could get the ace of clubs. Applying the rule of addition for events that are not mutually exclusive, we get

$P(A \text{ or } C) = P(A) + P(C) - P(A \text{ and } C) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$

The preceding probability statement can be rewritten in an equivalent way using set theory as

$P(A \cup C) = P(A) + P(C) - P(A \cap C)$

where $\cap$ is read "intersection" and stands for and.

(b) The function of the negative term in the rule of addition for events that are not mutually exclusive is to avoid double counting.
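The double counting can be made visible by listing the deck explicitly. This is a minimal sketch rather than part of the original solution; the card encoding and the variable names are assumptions chosen for the illustration.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["diamonds", "hearts", "clubs", "spades"]
DECK = {(rank, suit) for rank in RANKS for suit in SUITS}  # 52 equally likely cards


def prob(event):
    """Classical probability of an event given as a set of cards."""
    return Fraction(len(event), len(DECK))


aces = {card for card in DECK if card[0] == "A"}
clubs = {card for card in DECK if card[1] == "clubs"}

naive_sum = prob(aces) + prob(clubs)                # counts the ace of clubs twice
rule_of_addition = naive_sum - prob(aces & clubs)   # subtracts the overlap once
direct_count = prob(aces | clubs)                   # counts the 16 distinct cards

print(naive_sum, rule_of_addition, direct_count)
```

The naive sum corresponds to 17 cards out of 52, one more than the 16 distinct cards that are an ace or a club; subtracting $P(A \cap C)$ removes exactly that extra count.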
For example, in calculating $P(A \text{ or } C)$ in part a, the ace of clubs is counted twice, once as an ace and once as a club. Therefore, we subtract the probability of getting the ace of clubs in order to avoid this double counting. If the events are mutually exclusive, the probability that both events will occur simultaneously is 0, and no double counting is involved. This is why the rule of addition for mutually exclusive events does not contain a negative term.

3.11 What is the probability of (a) Inflation I or recession R if the probability of inflation is 0.3, the probability of recession is 0.2, and the probability of inflation and recession is 0.06? (b) Drawing an ace, a club, or a diamond on a single pick from a deck?

(a) Since the probability of inflation and recession is not 0, inflation and recession are not mutually exclusive events. Applying the rule of addition, we get

$P(I \text{ or } R) = P(I) + P(R) - P(I \text{ and } R)$ or $P(I \cup R) = P(I) + P(R) - P(I \cap R)$

$P(I \text{ or } R) = P(I \cup R) = 0.3 + 0.2 - 0.06 = 0.44$

(b) Getting an ace, a club, or a diamond does not constitute mutually exclusive events because we could get the ace of clubs or the ace of diamonds. Applying the rule of addition for events that are not mutually exclusive, we get

$P(A \text{ or } C \text{ or } D) = P(A) + P(C) + P(D) - P(A \text{ and } C) - P(A \text{ and } D) = \frac{4}{52} + \frac{13}{52} + \frac{13}{52} - \frac{1}{52} - \frac{1}{52} = \frac{28}{52} = \frac{7}{13}$

3.12 What is the probability of (a) Two 6s on 2 rolls of a die? (b) A 6 on each die in rolling 2 dice once? (c) Two blue balls in 2 successive picks with replacement from the urn in Prob. 3.4? (d) Three girls in a family with 3 children?

(a) Getting a 6 on each of 2 rolls of a die constitutes independent events. Applying the rule of multiplication for independent events, we get

$P(6 \text{ and } 6) = P(6 \cap 6) = P(6) \cdot P(6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$

(b) Getting a 6 on each die in rolling 2 dice once also constitutes independent events. Therefore

$P(6 \text{ and } 6) = P(6 \cap 6) = P(6) \cdot P(6) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$

(c) Since we replace the first ball picked, the probability of getting a blue ball on the second pick is the same as on the first pick. The events are independent.
Therefore

$P(B \text{ and } B) = P(B \cap B) = P(B) \cdot P(B) = \frac{3}{10} \cdot \frac{3}{10} = \frac{9}{100} = 0.09$

(d) The probability of a girl, G, on each birth constitutes independent events, each with a probability of 0.5. Therefore

$P(G \text{ and } G \text{ and } G) = P(G \cap G \cap G) = P(G) \cdot P(G) \cdot P(G) = (0.5)(0.5)(0.5) = 0.125$

or 1 chance in 8.

3.13 (a) List all possible outcomes in rolling 2 dice simultaneously. (b) What is the probability of getting a total of 5 in rolling 2 dice simultaneously? (c) What is the probability of getting a total of 4 or less in rolling 2 dice simultaneously? More than 4?

(a) Each die has 6 possible and equally likely outcomes, and the outcome on each die is independent. Since each of the 6 outcomes on the first die can be associated with each of the 6 outcomes on the second die, there are a total of 36 possible outcomes; that is, the sample space $N$ is 36. (In Table 3.2, the first number refers to the outcome on the first die, and the second number refers to the second die. The dice can be distinguished by different colors.) The total of the 36 possible outcomes also can be shown by a tree (or sequential) diagram, as in Fig. 3-8.

Table 3.2 Outcomes in Rolling Two Dice Simultaneously

1,1 | 2,1 | 3,1 | 4,1 | 5,1 | 6,1
1,2 | 2,2 | 3,2 | 4,2 | 5,2 | 6,2
1,3 | 2,3 | 3,3 | 4,3 | 5,3 | 6,3
1,4 | 2,4 | 3,4 | 4,4 | 5,4 | 6,4
1,5 | 2,5 | 3,5 | 4,5 | 5,5 | 6,5
1,6 | 2,6 | 3,6 | 4,6 | 5,6 | 6,6

(b) Out of the 36 possible and equally likely outcomes, 4 of them give a total of 5. These are 1, 4; 2, 3; 3, 2; and 4, 1. Thus the probability of a total of 5 (event $A$) in rolling 2 dice simultaneously is given by

$P(A) = \frac{n_A}{N} = \frac{4}{36} = \frac{1}{9}$

(c) Rolling a total of 4 or less involves rolling a total of 2, 3, or 4. There are 6 possible and equally likely ways of rolling a total of 4 or less. These are 1, 1; 1, 2; 1, 3; 2, 1; 2, 2; and 3, 1. Thus, with event $A$ defined as rolling a total of 4 or less, $P(A) = 6/36 = 1/6$. The probability of getting a total of more than 4 equals 1 minus the probability of getting a total of 4 or less. This is $1 - 1/6 = 5/6$.

Fig. 3-8 Tree Diagram for Rolling Two Dice Simultaneously (branches: outcome on the first die, then outcome on the second die)

3.14 What is the probability of (a) Picking a second red ball from the urn in Prob. 3.4 when a red ball was already obtained on the first pick and not replaced? (b) A red ball on the second pick when the first ball picked was not red and was not replaced? (c) A red ball on the third pick when a red and a nonred ball were obtained on the first two picks and were not replaced?

(a) Picking a second red ball from the urn when a red ball was already picked on the first pick and was not replaced is a dependent event, since there are now only 4 red balls and 5 nonred balls remaining in the urn. The conditional probability of picking a second red ball when a red ball was already obtained on the first pick and was not replaced is $P(R/R) = 4/9$.

(b) The conditional probability of obtaining a red ball on the second pick when the first ball picked was not red ($R'$) and was not replaced in the urn before the second ball is picked is $P(R/R') = 5/9$.

(c) Since 2 balls, one of which was red, were already picked and not replaced, there remains a total of 8 balls, of which 4 are red, in the urn. The (conditional) probability of picking another red ball is $P(R/R \text{ and } R') = P(R/R' \text{ and } R) = 4/8 = 1/2$.

3.15 What is the probability of obtaining (a) Two red balls from the urn in Prob. 3.4 in 2 picks without replacement? (b) Two aces from a deck in 2 picks without replacement? (c) The ace of clubs and a spade in that order in 2 picks from a deck without replacement? (d) A spade and the ace of clubs in that order in 2 picks from a deck without replacement? (e) Three red balls from the urn of Prob. 3.4 in 3 picks without replacement? (f) Three red balls from the same urn in 3 picks with replacement?
(a) Applying the rule of multiplication for dependent events, we get

\[ P(R \text{ and } R) = P(R) \cdot P(R/R) = \frac{5}{10} \cdot \frac{4}{9} = \frac{20}{90} = \frac{2}{9} \]

(b) \[ P(A \text{ and } A) = P(A) \cdot P(A/A) = \frac{4}{52} \cdot \frac{3}{51} = \frac{12}{2652} = \frac{1}{221} \]

(c) \[ P(A_C \text{ and } S) = P(A_C) \cdot P(S/A_C) = \frac{1}{52} \cdot \frac{13}{51} = \frac{13}{2652} = \frac{1}{204} \]

(d) \[ P(S \text{ and } A_C) = P(S) \cdot P(A_C/S) = \frac{13}{52} \cdot \frac{1}{51} = \frac{13}{2652} = \frac{1}{204} \]

(e) \[ P(R \text{ and } R \text{ and } R) = P(R \cap R \cap R) = P(R) \cdot P(R/R) \cdot P(R/R \text{ and } R) = \frac{5}{10} \cdot \frac{4}{9} \cdot \frac{3}{8} = \frac{60}{720} = \frac{1}{12} \]

(f) With replacement, picking three balls from an urn constitutes three independent events. Therefore

\[ P(R \text{ and } R \text{ and } R) = P(R) \cdot P(R) \cdot P(R) = \frac{5}{10} \cdot \frac{5}{10} \cdot \frac{5}{10} = \frac{125}{1000} = \frac{1}{8} = 0.125 \]

3.16 Past experience has shown that for every 100,000 items produced in a plant by the morning shift, 200 are defective, and for every 100,000 items produced by the evening shift, 500 are defective. During a 24-h period, 1000 items are produced by the morning shift and 600 by the evening shift. What is the probability that an item picked at random from the total of 1600 items produced during the 24-h period (a) Was produced by the morning shift and is defective? (b) Was produced by the evening shift and is defective? (c) Was produced by the evening shift and is not defective? (d) Is defective, whether produced by the morning or the evening shift?

(a) The probabilities of picking an item produced by the morning shift \(M\) and the evening shift \(E\) are

\[ P(M) = \frac{1000}{1600} = 0.625 \quad \text{and} \quad P(E) = \frac{600}{1600} = 0.375 \]

The probabilities of picking a defective item \(D\) from the morning and evening outputs separately are

\[ P(D/M) = \frac{200}{100{,}000} = 0.002 \quad \text{and} \quad P(D/E) = \frac{500}{100{,}000} = 0.005 \]

The probability that an item picked at random from the total of 1600 items produced during the 24-h period was produced by the morning shift and is defective is

\[ P(M \text{ and } D) = P(M) \cdot P(D/M) = (0.625)(0.002) = 0.00125 \]

(b) \[ P(E \text{ and } D) = P(E) \cdot P(D/E) = (0.375)(0.005) = 0.001875 \]

(c) \[ P(E \text{ and } D') = P(E) \cdot P(D'/E) = (0.375)(0.995) = 0.373125 \]

(d) The expected number of defective items from the morning shift is equal to the probability of a defective item from the morning output times the number of items produced by the morning shift; that is, \((0.002)(1000) = 2\). From the evening shift we expect \((0.005)(600) = 3\) defective items.
Thus we expect 5 defective items from the 1600 items produced during the 24-h period. If there are indeed 5 defective items, the probability of picking at random any of the 5 defective items out of a total of 1600 items is \(5/1600 = 1/320\), or 0.003125.

3.17 (a) From the rule of multiplication for dependent events \(B\) and \(A\), derive the formula for \(P(A/B)\) in terms of \(P(B/A)\) and \(P(B)\). This is known as Bayes' theorem and is used to revise probabilities when additional relevant information becomes available. (b) Using Bayes' theorem, find the probability that a defective item picked at random from the 24-h output of 1600 items in Prob. 3.16 was produced by the morning shift; by the evening shift.

(a) \[ P(B \text{ and } A) = P(B) \cdot P(A/B) \]

By dividing both sides by \(P(B)\) and rearranging, we get

\[ P(A/B) = \frac{P(B \text{ and } A)}{P(B)} \]

However, \(P(B \text{ and } A) = P(A \text{ and } B)\); see Prob. 3.15(c) and (d). Therefore

\[ P(A/B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A) \cdot P(B/A)}{P(B)} \quad \text{Bayes' theorem} \tag{3.15} \]

(b) Applying Bayes' theorem to the statement in Prob. 3.16, letting \(A\) signify the morning shift \(M\) and \(B\) signify defective \(D\), and utilizing the results of Prob. 3.16, we get

\[ P(M/D) = \frac{P(M) \cdot P(D/M)}{P(D)} = \frac{(0.625)(0.002)}{0.003125} = \frac{0.00125}{0.003125} = 0.4 \]

That is, the probability that a defective item picked at random from the total 24-h output of 1600 items was produced by the morning shift is 40%. Similarly,

\[ P(E/D) = \frac{P(E) \cdot P(D/E)}{P(D)} = \frac{(0.375)(0.005)}{0.003125} = \frac{0.001875}{0.003125} = 0.6, \text{ or } 60\% \]

Bayes' theorem can be generalized, for example, to find the probability that a defective item, \(B\), picked at random was produced by any of \(n\) plants (\(A_i\), \(i = 1, 2, \ldots, n\)), as follows:

\[ P(A_i/B) = \frac{P(A_i) \cdot P(B/A_i)}{\sum P(A_i) \cdot P(B/A_i)} \tag{3.16} \]

where \(\sum\) refers to the summation over the \(n\) plants (the only ones producing the output). Bayes' theorem is applied in business decision theory but is seldom used in the field of economics. (However, Bayesian econometrics is becoming increasingly important.)

3.18 A club has 5 members. (a) How many different committees of 3 members each can be formed from the club? (Two committees are different even when only one member is different.)
(b) How many committees of 3 members each can be formed from the club if each committee is to have a president, a treasurer, and a secretary?

(a) We are interested here in finding the number of combinations of 5 people taken 3 at a time without concern for the order:

\[ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)} = 10 \]

In general, the number of arrangements of \(n\) things taken \(X\) at a time without concern for the order is a combination given by

\[ {}_nC_X = \binom{n}{X} = \frac{n!}{X!\,(n-X)!} \tag{3.17} \]

where \(n!\) (read "\(n\) factorial") \(= n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1\) and \(0! = 1\) by definition.

(b) Since each committee of 3 has to have a president, a treasurer, and a secretary, we are now interested in finding the number of permutations of 5 people taken 3 at a time, when the order is important:

\[ {}_5P_3 = \frac{5!}{(5-3)!} = \frac{5!}{2!} = 5 \cdot 4 \cdot 3 = 60 \]

In general, the number of arrangements, in a definite order, of \(n\) things taken \(X\) at a time is a permutation given by

\[ {}_nP_X = \frac{n!}{(n-X)!} \tag{3.18} \]

Permutations and combinations (often referred to as counting techniques) are helpful in counting the number of equally likely ways event \(A\) can occur in relation to the total of all possible and equally likely outcomes. Combinations and permutations were not used in previous problems because those problems were simple enough without them.

DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

3.19 Define what is meant by and give an example of (a) a random variable, (b) a discrete random variable, and (c) a discrete probability distribution. (d) What is the distinction between a probability distribution and a relative-frequency distribution?

(a) A random variable is a variable whose values are associated with some probability of being observed. For example, on 1 roll of a fair die, we have 6 mutually exclusive outcomes (1, 2, 3, 4, 5, or 6), each associated with a probability of occurrence of 1/6. Thus the outcome from the roll of a die is a random variable.
(b) A discrete random variable is one that can assume only finite or distinct values. For example, the outcomes from rolling a die constitute a discrete random variable because they are limited to the values 1, 2, 3, 4, 5, and 6. This is to be contrasted with continuous variables, which can assume an infinite number of values within any given interval [see Prob. 3.31(a)].

(c) A discrete probability distribution refers to the set of all possible values of a (discrete) random variable and their associated probabilities. The set of the 6 outcomes in rolling a die and their associated probabilities is an example of a discrete probability distribution. The sum of the probabilities associated with all the values that the discrete random variable can assume always equals 1.

(d) A probability distribution refers to the classical or a priori probabilities associated with all the values that a random variable can assume. Because those probabilities are assigned a priori and without any experimentation, a probability distribution is often referred to as a theoretical (relative) frequency distribution. This differs from an empirical (relative) frequency distribution, which refers to the ratio of the number of times each outcome actually occurs to the total number of actual trials or observations. For example, in actually rolling a die a number of times, we are not likely to get each outcome exactly 1/6 of the time. However, as the number of rolls increases, the empirical (relative) frequency distribution stabilizes at the (uniform) probability or theoretical relative-frequency distribution of 1/6.

3.20 Derive the formula for (a) the mean \(\mu\) or expected value \(E(X)\) and (b) the variance for a discrete probability distribution.

(a) The formula for the arithmetic mean for grouped population data [Eq. (2.2a)] is

\[ \mu = \frac{\sum fX}{N} \]

where \(\sum fX\) is the sum of the frequency of each class \(f\) times the class midpoint \(X\) and \(N = \sum f\), which is the number of all observations or frequencies.
In dealing with probability distributions, the mean \(\mu\) is often referred to as the "expected value" \(E(X)\). The formula for \(\mu\) or \(E(X)\) for a discrete probability distribution can be derived by starting with Eq. (2.2a) and letting \(f = P(X)\), which is the probability of each of the possible outcomes \(X\). Then \(\sum fX = \sum XP(X)\), which is the sum of the value of each outcome times its probability of occurrence, and \(N = \sum f = \sum P(X)\), which is the sum of the probabilities of each outcome, which is 1. Thus

\[ E(X) = \mu = \frac{\sum XP(X)}{\sum P(X)} = \sum XP(X) \tag{3.20} \]

(b) The formula for the variance of grouped population data (given in Chap. 2) is

\[ \sigma^2 = \frac{\sum f(X - \mu)^2}{N} \]

Once again letting \(f = P(X)\), the probability of each outcome, and \(N = \sum f = \sum P(X) = 1\), we get the formula for the variance of a discrete probability distribution:

\[ \text{Var } X = \sigma_X^2 = \sum [X - \mu]^2 P(X) = \sum X^2 P(X) - \Big[\sum XP(X)\Big]^2 = E(X^2) - \mu^2 \tag{3.21} \]

3.21 Table 3.3 gives the number of job applications processed at a small employment agency during the past 100-day period. Determine the expected number of applications processed and the variance and standard deviation.

Table 3.3 Number of Job Applications Processed during the Past 100-Day Period

Number of Applications (X)   Number of Days (f)
 7                           10
 8                           10
10                           20
11                           30
12                           20
14                           10
Total                       100

To the extent that we believe that the experience of the past 100 days is typical, we can find the relative-frequency distribution and equate it to the probability distribution. This and the other calculations to find \(E(X)\) and \(\text{Var } X\) are shown in Table 3.4.

Table 3.4 Calculations to Find the Expected Value and Variance

Number (X)   Days (f)   P(X)   XP(X)   X^2   X^2 P(X)
 7           10         0.1    0.7      49     4.9
 8           10         0.1    0.8      64     6.4
10           20         0.2    2.0     100    20.0
11           30         0.3    3.3     121    36.3
12           20         0.2    2.4     144    28.8
14           10         0.1    1.4     196    19.6
Total       100         1.0   10.6           116.0

\[ E(X) = \mu = \sum XP(X) = 10.6 \text{ applications} \]

\[ \text{Var } X = \sigma_X^2 = \sum X^2 P(X) - \Big[\sum XP(X)\Big]^2 = 116 - (10.6)^2 = 116 - 112.36 = 3.64 \text{ applications squared} \]

\[ \text{SD } X = \sigma_X = \sqrt{\sigma_X^2} = \sqrt{3.64} \approx 1.91 \text{ applications} \]

3.22 (a) State the conditions required to apply the binomial distribution. (b) What is the probability of 3 heads in 5 flips of a balanced coin? (c) What is the probability of less than 3 heads in 5 flips of a balanced coin?
(a) The binomial distribution is used to find the probability of \(X\) number of occurrences or successes of an event, \(P(X)\), in \(n\) trials of the same experiment when (1) there are only 2 mutually exclusive outcomes, (2) the \(n\) trials are independent, and (3) the probability of occurrence or success, \(p\), remains constant in each trial.

(b) \[ P(X) = {}_nC_X\, p^X (1-p)^{n-X} = \frac{n!}{X!\,(n-X)!}\, p^X (1-p)^{n-X} \]

See Eqs. (3.10) and (3.17). In some books, \(1 - p\) (the probability of failure) is defined as \(q\). Here \(n = 5\), \(X = 3\), \(p = 1/2\), and \(1 - p = 1/2\). Substituting these values into the preceding equation, we get

\[ P(3) = \frac{5!}{3!\,(5-3)!} (1/2)^3 (1/2)^2 = \frac{5!}{3!\,2!} (1/2)^5 = 10(1/32) = 0.3125 \]

(c) \[ P(X < 3) = P(0) + P(1) + P(2) \]

\[ P(0) = \frac{5!}{0!\,5!} (1/2)^0 (1/2)^5 = \frac{1}{32} = 0.03125 \]

\[ P(1) = \frac{5!}{1!\,4!} (1/2)^1 (1/2)^4 = \frac{5}{32} = 0.15625 \]

\[ P(2) = \frac{5!}{2!\,3!} (1/2)^2 (1/2)^3 = \frac{10}{32} = 0.3125 \]

Thus \(P(X < 3) = P(0) + P(1) + P(2) = 0.03125 + 0.15625 + 0.3125 = 0.5\)

3.23 (a) Suppose that the probability of parents having a child with blond hair is 1/4. If there are 6 children in the family, what is the probability that half of them will have blond hair? (b) If the probability of hitting a target on a single shot is 0.3, what is the probability that in 4 shots the target will be hit at least 3 times?

(a) Here \(n = 6\), \(X = 3\), \(p = 1/4\), and \(1 - p = 3/4\). Substituting these values into the binomial formula, we get

\[ P(3) = \frac{6!}{3!\,3!} (1/4)^3 (3/4)^3 = 20 (1/64)(27/64) = \frac{540}{4096} \approx 0.13 \]

(b) Here \(n = 4\), \(X \ge 3\), \(p = 0.3\), and \(1 - p = 0.7\):

\[ P(X \ge 3) = P(3) + P(4) \]

\[ P(3) = \frac{4!}{3!\,1!} (0.3)^3 (0.7)^1 = 4(0.027)(0.7) = 0.0756 \]

\[ P(4) = \frac{4!}{4!\,0!} (0.3)^4 (0.7)^0 = 0.0081 \]

Thus \(P(X \ge 3) = P(3) + P(4) = 0.0756 + 0.0081 = 0.0837\)

3.24 (a) A quality inspector picks a sample of 10 tubes at random from a very large shipment of tubes known to contain 20% defective tubes. What is the probability that no more than 2 of the tubes picked are defective? (b) An inspection engineer picks a sample of 15 items at random from a manufacturing process known to produce 85% acceptable items. What is the probability that 10 of the items picked are acceptable?

(a) Here \(n = 10\), \(X \le 2\), \(p = 0.2\), and \(1 - p = 0.8\):

\[ P(X \le 2) = P(0) + P(1) + P(2) \]

\(P(0) = \frac{10!}{0!\,10!}(0.2)^0(0.8)^{10} = 0.1074\) (looking up \(n = 10\), \(X = 0\), and \(p = 0.2\) in App. 1)

\(P(1) = 0.2684\) (looking up \(n = 10\), \(X = 1\), and \(p = 0.2\) in App.
1)

\(P(2) = 0.3020\) (looking up \(n = 10\), \(X = 2\), and \(p = 0.2\) in App. 1)

Thus \(P(X \le 2) = P(0) + P(1) + P(2) = 0.1074 + 0.2684 + 0.3020 = 0.6778\)

(b) Here \(n = 15\), \(X = 10\), \(p = 0.85\), and \(1 - p = 0.15\). Since App. 1 only gives binomial probabilities for \(p\) up to 0.5, we should transform the problem. The probability of \(X = 10\) acceptable items with \(p = 0.85\) equals the probability of \(X = 5\) defective items with \(p = 0.15\). Using \(n = 15\), \(X = 5\) defective, and \(p\) (of defective) \(= 0.15\), we get 0.0449 (from App. 1).

3.25 (a) If 4 balanced coins are tossed simultaneously (or 1 balanced coin is tossed 4 times), compute the entire probability distribution and plot it. (b) Compute and plot the probability distribution for a sample of 5 items taken at random from a production process known to produce 30% defective items.

(a) Using \(n = 4\); \(X\) = 0H, 1H, 2H, 3H, or 4H; \(p = 1/2\); and App. 1, we get P(0H) = 0.0625, P(1H) = 0.2500, P(2H) = 0.3750, P(3H) = 0.2500, and P(4H) = 0.0625. Thus

P(0H) + P(1H) + P(2H) + P(3H) + P(4H) = 0.0625 + 0.2500 + 0.3750 + 0.2500 + 0.0625 = 1

See Fig. 3-9. Note that \(p = 0.5\) and the probability distribution in Fig. 3-9 is symmetrical.

Fig. 3-9 Probability Distribution of Heads in Tossing Four Balanced Coins

Fig. 3-10 Probability Distribution of Defective Items

(b) Using \(n = 5\); \(X\) = 0, 1, 2, 3, 4, or 5 defective; and \(p = 0.3\), we get P(0) = 0.1681, P(1) = 0.3602, P(2) = 0.3087, P(3) = 0.1323, P(4) = 0.0284, and P(5) = 0.0024. Therefore

P(0) + P(1) + P(2) + P(3) + P(4) + P(5) = 0.1681 + 0.3602 + 0.3087 + 0.1323 + 0.0284 + 0.0024 = 1 (apart from rounding)

See Fig. 3-10. Note that \(p < 0.5\) and the probability distribution in Fig. 3-10 is skewed to the right.

3.26 Calculate the expected value and standard deviation and determine the symmetry or asymmetry of the probability distribution of (a) Prob. 3.23(a), (b) Prob. 3.23(b), (c) Prob. 3.24(a), and (d) Prob. 3.24(b).
(a) \(E(X) = \mu = np = (6)(1/4) = 3/2 = 1.5\) blond children

\(\text{SD } X = \sqrt{np(1-p)} = \sqrt{(6)(1/4)(3/4)} = \sqrt{18/16} = \sqrt{1.125} \approx 1.06\) blond children

Because \(p < 0.5\), the probability distribution of blond children is skewed to the right.

(b) \(E(X) = \mu = np = (4)(0.3) = 1.2\) hits

\(\text{SD } X = \sqrt{np(1-p)} = \sqrt{(4)(0.3)(0.7)} = \sqrt{0.84} \approx 0.92\) hits

Because \(p < 0.5\), the probability distribution is skewed to the right.

(c) \(E(X) = \mu = np = (10)(0.2) = 2\) defective tubes

\(\text{SD } X = \sqrt{np(1-p)} = \sqrt{(10)(0.2)(0.8)} = \sqrt{1.6} \approx 1.26\) defective tubes

Because \(p < 0.5\), the probability distribution is skewed to the right.

(d) \(E(X) = \mu = np = (15)(0.85) = 12.75\) acceptable items

\(\text{SD } X = \sqrt{np(1-p)} = \sqrt{(15)(0.85)(0.15)} = \sqrt{1.9125} \approx 1.38\) acceptable items

Because \(p > 0.5\), the probability distribution is skewed to the left.

3.27 When sampling is done from a finite population without replacement, the binomial distribution cannot be used because the events are not independent. Then the hypergeometric distribution is used. This is given by

\[ P_H = \frac{\binom{N - X_t}{n - X}\binom{X_t}{X}}{\binom{N}{n}} \quad \text{hypergeometric distribution} \tag{3.22} \]

It measures the number of successes \(X\) in a sample of size \(n\) taken at random and without replacement from a population of size \(N\), of which \(X_t\) items have the characteristic denoting success. (a) Using the formula, determine the probability of picking 2 men in a sample of 6 selected at random without replacement from a group of 10 people, 5 of which are men. (b) What would the result have been if we had (incorrectly) used the binomial distribution?

(a) Here \(X = 2\) men, \(n = 6\), \(N = 10\), and \(X_t = 5\):

\[ P_H = \frac{\binom{10 - 5}{6 - 2}\binom{5}{2}}{\binom{10}{6}} = \frac{(5)(10)}{210} = \frac{50}{210} \approx 0.24 \]

(b) \[ P(2) = \frac{6!}{2!\,4!} (1/2)^2 (1/2)^4 = \frac{15}{64} \approx 0.23 \]

It should be noted that when the sample is very small in relation to the population (say, less than 5% of the population), sampling without replacement has little effect on the probability of success in each trial, and the binomial distribution (which is easier to use) is a good approximation for the hypergeometric distribution. This is the reason the binomial distribution was used in Prob. 3.24(a).

THE POISSON DISTRIBUTION

3.28 (a) What is the difference between the binomial and the Poisson distributions?
(b) Give some examples of when we can apply the Poisson distribution. (c) Give the formula for the Poisson distribution and the meaning of the various symbols. (d) Under what conditions can the Poisson distribution be used as an approximation to the binomial distribution? Why can this be useful?

(a) Whereas the binomial distribution can be used to find the probability of a designated number of successes in \(n\) trials, the Poisson distribution is used to find the probability of a designated number of successes per unit of time. The other conditions required to apply the binomial distribution also are required to apply the Poisson distribution; that is, (1) there must be only two mutually exclusive outcomes, (2) the events must be independent, and (3) the average number of successes per unit of time must remain constant.

(b) The Poisson distribution is often used in operations research in solving management problems. Some examples are the number of telephone calls to the police per hour, the number of customers arriving at a gasoline pump per hour, and the number of traffic accidents at an intersection per week.

(c) The probability of a designated number of successes per unit of time, \(P(X)\), can be found by

\[ P(X) = \frac{\lambda^X e^{-\lambda}}{X!} \]

where \(X\) = designated number of successes
\(\lambda\) = the average number of successes over a specific time period
\(e\) = the base of the natural logarithm system, or 2.71828

Given the value of \(\lambda\), we can find \(e^{-\lambda}\) from App. 2, substitute it into the formula, and find \(P(X)\). Note that \(\lambda\) is the mean and variance of the Poisson distribution.

(d) We can use the Poisson distribution as an approximation to the binomial distribution when \(n\), the number of trials, is large and \(p\) or \(1 - p\) is small (rare events). A good rule of thumb is to use the Poisson distribution when \(n \ge 30\) and \(np\) or \(n(1-p) < 5\). When \(n\) is large, it can be very time-consuming to use the binomial distribution, and tables for binomial probabilities for very small values of \(p\) may not be available.
If \(n(1-p) < 5\), success and failure should be redefined so that \(np < 5\) to make the approximation accurate.

3.29 Past experience indicates that an average number of 6 customers per hour stop for gasoline at a gasoline pump. (a) What is the probability of 3 customers stopping in any hour? (b) What is the probability of 3 customers or less in any hour? (c) What is the expected value, or mean, and standard deviation for this distribution?

(a) \[ P(3) = \frac{6^3 e^{-6}}{3!} = \frac{(216)(0.00248)}{6} = \frac{0.53568}{6} = 0.08928 \]

(b) \[ P(X \le 3) = P(0) + P(1) + P(2) + P(3) \]

\[ P(0) = \frac{6^0 e^{-6}}{0!} = 0.00248 \qquad P(1) = \frac{6^1 e^{-6}}{1!} = (6)(0.00248) = 0.01488 \]

\[ P(2) = \frac{6^2 e^{-6}}{2!} = \frac{(36)(0.00248)}{2} = 0.04464 \qquad P(3) = 0.08928 \text{ [from part } (a)] \]

Thus \(P(X \le 3) = 0.00248 + 0.01488 + 0.04464 + 0.08928 = 0.15128\)

(c) The expected value, or mean, of this Poisson distribution is \(\lambda = 6\) customers, and the standard deviation is \(\sqrt{\lambda} = \sqrt{6} \approx 2.45\) customers.

3.30 Past experience shows that 1% of the lightbulbs produced in a plant are defective. Find the probability that more than 1 bulb is defective in a random sample of 30 bulbs, using (a) the binomial distribution and (b) the Poisson distribution.

(a) Here \(n = 30\), \(p = 0.01\), and we are asked to find \(P(X > 1)\). Using App. 1, we get

\[ P(2) + P(3) + P(4) + \cdots = 0.0328 + 0.0031 + 0.0002 = 0.0361, \text{ or } 3.61\% \]

(b) Since \(n = 30\) and \(np = (30)(0.01) = 0.3\), we can use the Poisson approximation of the binomial distribution. Letting \(\lambda = np = 0.3\), we have to find \(P(X > 1) = 1 - P(X \le 1)\), where \(X\) is the number of defective bulbs. Using Eq. (3.13), we get

\[ P(1) = \frac{\lambda^1 e^{-\lambda}}{1!} = (0.3)(0.74082) = 0.222246 \]

\[ P(0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-0.3} = 0.74082 \]

\[ P(X \le 1) = P(1) + P(0) = 0.222246 + 0.74082 = 0.963066 \]

Thus \(P(X > 1) = 1 - P(X \le 1) = 1 - 0.963066 = 0.036934\), or 3.69%.

As \(n\) becomes larger, the approximation becomes even closer.

CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

3.31 (a) Define what is meant by a continuous variable and give some examples. (b) Define what is meant by a continuous probability distribution. (c) Derive the formula for the expected value and variance of a continuous probability distribution.

(a) A continuous variable is one that can assume any value within any given interval.
A continuous variable can be measured with any degree of accuracy simply by using smaller and smaller units of measurement. For example, if we say that a production process takes 10 h, this means anywhere between 9.5 and 10.4 h (10 h rounded to the nearest hour). If we used minutes as the unit of measurement, we could have said that the production process takes 10 h and 20 min. This means anywhere between 10 h and 19.5 min and 10 h and 20.4 min, and so on. Time is thus a continuous variable, and so are weight, distance, and temperature.

(b) A continuous probability distribution refers to the range of all possible values that a continuous random variable can assume, together with the associated probabilities. The probability distribution of a continuous random variable is often called a probability density function, or simply a probability function. It is given by a smooth curve such that the total area (probability) under the curve is 1. Since a continuous random variable can assume an infinite number of values within any given interval, the probability of a specific value is 0. However, we can measure the probability that a continuous random variable \(X\) assumes any value within a given interval (say, between \(X_1\) and \(X_2\)) by the area under the curve within that interval:

\[ P(X_1 < X < X_2) = \int_{X_1}^{X_2} f(X)\, dX \tag{3.23} \]

where \(f(X)\) is the equation of the probability density function, and the integration sign, \(\int\), is analogous to the summation sign \(\sum\) for discrete variables. Probability tables for some of the most used continuous probability distributions are given in the appendixes, thus eliminating the need to perform the integration ourselves.

(c) The expected value, or mean, and variance for continuous probability distributions can be derived by substituting \(\int\) for \(\sum\) and \(f(X)\) for \(P(X)\) into the formulas for the expected value and variance for discrete probability distributions [Eqs. (3.20) and (3.21)]:

\[ E(X) = \mu = \int X f(X)\, dX \tag{3.24} \]

\[ \text{Var } X = \sigma^2 = \int [X - E(X)]^2 f(X)\, dX \tag{3.25} \]

3.32 (a) What is a normal distribution? (b) What is its usefulness? (c) What is the standard normal distribution?
What is its usefulness?

(a) The normal distribution is a continuous probability function that is bell-shaped, symmetrical about the mean, and mesokurtic (defined in Sec. 2.4). As we move farther away from the mean in both directions, the normal curve approaches the horizontal axis (but never quite touches it). The equation of the normal probability function is given by

\[ f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2\right] \tag{3.26} \]

where \(f(X)\) = height of the normal curve
\(\mu\) = mean of the distribution
\(\sigma\) = standard deviation of the distribution
and the total area under the curve, from minus infinity to plus infinity, equals 1.

(b) The normal distribution is the most commonly used of all probability distributions in statistical analysis. Many distributions actually found in nature and industry are normal. Some examples are the IQs (intelligence quotients), weights, and heights of a large number of people and the variations in dimensions of a large number of parts produced by a machine. The normal distribution often can be used to approximate other distributions, such as the binomial and the Poisson distributions (see Probs. 3.37 and 3.38). Distributions of sample means and proportions are often normal, regardless of the distribution of the parent population (see Sec. 4.2).

(c) The standard normal distribution is a normal distribution with \(\mu = 0\) and \(\sigma^2 = 1\). Any normal distribution (defined by a particular value for \(\mu\) and \(\sigma^2\)) can be transformed into a standard normal distribution by letting \(\mu = 0\) and expressing deviations from \(\mu\) in standard deviation units. We often can find areas (probabilities) by converting \(X\) values into corresponding \(z\) values [that is, \(z = (X - \mu)/\sigma\)] and looking up these \(z\) values in App. 3.

3.33 Find the area under the standard normal curve (a) between \(z \pm 1\), \(z \pm 2\), and \(z \pm 3\); (b) from \(z = 0\) to \(z = 0.88\); (c) from \(z = -1.60\) to \(z = 2.55\); (d) to the left of \(z = -1.60\); (e) to the right of \(z = 2.55\); (f) to the left of \(z = -1.60\) and to the right of \(z = 2.55\).

(a) The area (probability) included under the standard normal curve between \(z = 0\) and \(z = 1\) is obtained by looking up the value of 1.0 in App. 3.
This is accomplished by moving down the \(z\) column in the table to 1.0 and then across until we are below the column headed .00. The value that we get is 0.3413. This means that 34.13% of the total area (of 1, or 100%) under the curve lies between \(z = 0\) and \(z = 1.00\). Because of symmetry, the area between \(z = 0\) and \(z = -1\) is also 0.3413, or 34.13%. Therefore, the area between \(z = -1\) and \(z = 1\) is 68.26% (see Fig. 3-4). Similarly, the area between \(z = 0\) and \(z = 2\) is 0.4772, or 47.72% (by looking up \(z = 2.00\) in the table), so that the area between \(z \pm 2\) is 95.44% (see Fig. 3-4). The area between \(z \pm 3\) is 99.74% (see Fig. 3-4). Note that the table only gives detailed \(z\) values up to 2.99 because the area under the curve outside \(z \pm 3\) is negligible.

(b) The area between \(z = 0\) and \(z = 0.88\) is obtained by looking up 0.88 in the table. This is 0.3106.

(c) The area between \(z = 0\) and \(z = -1.60\) is obtained by looking up \(z = 1.60\) in the table. This is 0.4452. The area between \(z = 0\) and \(z = 2.55\) is obtained by looking up \(z = 2.55\) in the table. This is 0.4946. Thus the area under the standard normal curve from \(z = -1.60\) to \(z = 2.55\) equals 0.4452 plus 0.4946. This is 0.9398, or 93.98% (see Fig. 3-11). In all problems of this nature, it is helpful to sketch a figure.

(d) We know that the total area under the normal curve is equal to 1. Because of symmetry, 0.5 of the area is on either side of \(\mu = 0\). Since 0.4452 extends from \(z = 0\) to \(z = -1.60\), \(0.5 - 0.4452 = 0.0548\), or 5.48%, is the area in the left tail, to the left of \(z = -1.60\) (see Fig. 3-11).

(e) \(0.5 - 0.4946 = 0.0054\), or 0.54%, is the area in the right tail, to the right of \(z = 2.55\) (see Fig. 3-11).

(f) The area to the left of \(z = -1.60\) and to the right of \(z = 2.55\) is equal to \(1 - 0.9398\) [see part (c)]. This is 0.0602, or 6.02% of the total.

Fig. 3-11

3.34 The lifetime of lightbulbs is known to be normally distributed with \(\mu = 100\) h and \(\sigma = 8\) h. What is the probability that a bulb picked at random will have a lifetime between 110 and 120 burning hours?
We are asked here to find \(P(110 < X < 120)\), where \(X\) refers to time measured in hours of burning time. Given \(\mu = 100\) h and \(\sigma = 8\) h, and letting \(X_1 = 110\) h and \(X_2 = 120\) h, we get

\[ z_1 = \frac{X_1 - \mu}{\sigma} = \frac{110 - 100}{8} = 1.25 \quad \text{and} \quad z_2 = \frac{X_2 - \mu}{\sigma} = \frac{120 - 100}{8} = 2.50 \]

Thus we want the area (probability) between \(z_1 = 1.25\) and \(z_2 = 2.50\) (the shaded area in Fig. 3-12). Looking up \(z_2 = 2.50\) in App. 3, we get 0.4938. This is the area from \(z = 0\) to \(z_2 = 2.50\). Looking up \(z_1 = 1.25\), we get 0.3944. This is the area from \(z = 0\) to \(z_1 = 1.25\). Subtracting 0.3944 from 0.4938, we get 0.0994, or 9.94%, for the shaded area that gives \(P(110 < X < 120)\).

Fig. 3-12

3.35 Assume that family incomes are normally distributed with \(\mu\) = $16,000 and \(\sigma\) = $2000. What is the probability that a family picked at random will have an income: (a) Between $15,000 and $18,000? (b) Below $15,000? (c) Above $18,000? (d) Above $20,000?

(a) We want P($15,000 < X < $18,000), where \(X\) is family income:

\[ z_1 = \frac{X_1 - \mu}{\sigma} = \frac{15{,}000 - 16{,}000}{2000} = -0.5 \quad \text{and} \quad z_2 = \frac{X_2 - \mu}{\sigma} = \frac{18{,}000 - 16{,}000}{2000} = 1 \]

Thus we want the area (probability) between \(z_1 = -0.5\) and \(z_2 = 1\) (the shaded area in Fig. 3-13). Looking up \(z = 0.5\) in App. 3, we get 0.1915 for the area from \(z = 0\) to \(z = -0.5\). Looking up \(z = 1\), we get 0.3413 for the area from \(z = 0\) to \(z = 1\). Thus, P($15,000 < X < $18,000) = 0.1915 + 0.3413 = 0.5328, or 53.28%.

Fig. 3-13

(b) P(X < $15,000) = 0.5 − 0.1915 = 0.3085, or 30.85% (the unshaded area in the left tail of Fig. 3-13).

(c) P(X > $18,000) = 0.5 − 0.3413 = 0.1587, or 15.87% (the unshaded area in the right tail of Fig. 3-13).

(d) X = $20,000 corresponds to \(z = (20{,}000 - 16{,}000)/2000 = 2\). Therefore, P(X > $20,000) = 0.5 − 0.4772 = 0.0228, or 2.28%.

3.36 The grades on the midterm examination in a large statistics section are normally distributed with a mean of 78 and a standard deviation of 8. The professor wants to give the grade of A to 10% of the students. What is the lowest grade point that can be designated an A on the midterm?
In this problem we are asked to find the point grade such that 10% of the students will have higher grades. This involves finding the grade point \(X\) such that 10% of the area under the normal curve will be to the right of \(X\) (the shaded area in Fig. 3-14). Since the total area under the curve to the right of 78 is 0.5, the unshaded area in Fig. 3-14 to the right of 78 must be 0.4. We must look into the body of App. 3 for the value closest to 0.4. This is 0.3997, which corresponds to the \(z\) value of 1.28. The \(X\) value (the grade point) that corresponds to the \(z\) value of 1.28 is obtained by substituting the known values into \(z = (X - \mu)/\sigma\) and solving for \(X\):

\[ 1.28 = \frac{X - 78}{8} \]

This gives \(10.24 = X - 78\). Therefore \(X = 78 + 10.24 = 88.24\), or 88 to the nearest whole number.

Fig. 3-14

3.37 Experience indicates that 30% of the people entering a store make a purchase. Using (a) the binomial distribution and (b) the normal approximation to the binomial, find the probability that out of 30 people entering the store, 10 or more will make a purchase.

(a) Here \(n = 30\), \(X \ge 10\), \(p = 0.3\), and \(1 - p = 0.7\). Using App. 1, we get

\[ P(X \ge 10) = P(10) + P(11) + P(12) + \cdots + P(30) \]
\[ = 0.1416 + 0.1103 + 0.0749 + 0.0444 + 0.0231 + 0.0106 + 0.0042 + 0.0015 + 0.0005 + 0.0001 = 0.4112 \]

(b) \(\mu = np = (30)(0.3) = 9\) persons, and \(\sigma = \sqrt{np(1-p)} = \sqrt{(30)(0.3)(0.7)} = \sqrt{6.3} \approx 2.51\) persons. Since \(n = 30\) and both \(np\) and \(n(1-p) > 5\), we can approximate the binomial probability with the normal. However, the number of people is a discrete variable. In order to use the normal distribution, we must treat the number of people as if it were a continuous variable and find \(P(X \ge 9.5)\). Thus

\[ z = \frac{X - \mu}{\sigma} = \frac{9.5 - 9}{2.51} = \frac{0.5}{2.51} \approx 0.20 \]

For \(z = 0.20\), we get 0.0793 (from App. 3). This means that 0.0793 of the area under the standard normal curve lies from \(z = 0\) to \(z = 0.20\). Therefore, \(P(X \ge 9.5) = 0.5 - 0.0793 = 0.4207\) (the normal approximation). As \(n\) becomes even larger, the approximation becomes even closer. [If we had not treated the number of people as a continuous variable, we would have found that \(P(X \ge 10) \approx 0.34\), and the approximation would not have been as close.]
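As a numerical cross-check of Prob. 3.37, the following Python sketch computes the exact binomial tail and the normal approximation with and without the continuity correction. It is not part of the original text: the function names are illustrative, and only the Python standard library is assumed. Small differences from the table-based answers are to be expected, because App. 1 and App. 3 round to four decimals.

```python
import math


def binomial_pmf(n, x, p):
    """Exact binomial probability of x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for a normal variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    # 0.5 * erfc(z / sqrt(2)) is the area under the standard normal curve to the right of z
    return 0.5 * math.erfc(z / math.sqrt(2))


n, p = 30, 0.3                      # 30 people, each buying with probability 0.3
mu = n * p                          # mean of the binomial, np
sigma = math.sqrt(n * p * (1 - p))  # standard deviation, sqrt(np(1 - p))

# (a) exact binomial: P(10) + P(11) + ... + P(30)
exact = sum(binomial_pmf(n, x, p) for x in range(10, n + 1))

# (b) normal approximation, treating the count as continuous (X >= 9.5)
with_correction = normal_upper_tail(9.5, mu, sigma)

# the same approximation without the continuity correction (X >= 10)
without_correction = normal_upper_tail(10, mu, sigma)

print(f"exact binomial tail:           {exact:.4f}")
print(f"normal, continuity correction: {with_correction:.4f}")
print(f"normal, no correction:         {without_correction:.4f}")
```

The comparison of the last two printed values with the first illustrates why the count is treated as a continuous variable starting at 9.5 rather than 10.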
3.38 A production process produces 10 defective items per hour. Find the probability that 4 or less items are defective out of the output of a randomly chosen hour using (a) the Poisson distribution and (b) the normal approximation of the Poisson.

(a) Here \(\lambda = 10\) and we are asked to find \(P(X \le 4)\), where \(X\) is the number of defective items from the output of a randomly chosen hour. The value of \(e^{-10}\) from App. 2 is 0.00005. Therefore

\[ P(0) = \frac{10^0 e^{-10}}{0!} = 0.00005 \qquad P(1) = \frac{10^1 e^{-10}}{1!} = 0.0005 \qquad P(2) = \frac{10^2 e^{-10}}{2!} = 0.0025 \]

\[ P(3) = \frac{10^3 e^{-10}}{3!} = \frac{0.05}{6} = 0.0083335 \qquad P(4) = \frac{10^4 e^{-10}}{4!} = \frac{0.5}{24} = 0.0208335 \]

\[ P(X \le 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0.00005 + 0.0005 + 0.0025 + 0.0083335 + 0.0208335 = 0.032217, \text{ or about } 3.22\% \]

(b) Treating the items as continuous [see Prob. 3.37(b)], we are asked to find \(P(X \le 4.5)\), where \(X\) is the number of defective items, \(\mu = \lambda = 10\), and \(\sigma = \sqrt{\lambda} = \sqrt{10} \approx 3.16\). Thus

\[ z = \frac{X - \mu}{\sigma} = \frac{4.5 - 10}{3.16} = \frac{-5.5}{3.16} = -1.74 \]

For \(z = 1.74\) in App. 3, we get 0.4591. This means that \(0.5 - 0.4591 = 0.0409\) of the area (probability) under the standard normal curve lies to the left of \(z = -1.74\). Thus \(P(X \le 4.5) = 0.0409\), or 4.09%. As \(\lambda\) becomes larger, we get a better approximation. [If we had not treated the number of defective items as a continuous variable, we would have found that \(P(X \le 4) = 0.0287\).]

3.39 If events or successes follow a Poisson distribution, we can determine the probability that the first event occurs within a designated period of time, \(P(T \le t)\), by the exponential probability distribution. Because we are dealing with time, the exponential is a continuous probability distribution. This is given by

\[ P(T \le t) = 1 - e^{-\lambda} \tag{3.27} \]

where \(\lambda\) is the average number of occurrences for the interval of interest and \(e^{-\lambda}\) can be obtained from App. 2. The expected value and variance are

\[ E(T) = \frac{1}{\lambda} \tag{3.28} \]

\[ \text{Var } T = \frac{1}{\lambda^2} \tag{3.29} \]

(a) For the statement of Prob. 3.29, find the probability that, starting at a random point in time, the first customer stops at the gasoline pump within a half hour. (b) What is the probability that no customer stops at the gasoline pump within a half hour? (c) What is the expected value and variance of the exponential distribution,
where the continuous variable is time \(T\)?

(a) Since an average of 6 customers stop at the pump per hour, \(\lambda\) = average of 3 customers per half hour. The probability that the first customer will stop within the first half hour is

\[ 1 - e^{-\lambda} = 1 - e^{-3} = 1 - 0.04979 \text{ (from App. 2)} = 0.95021, \text{ or } 95.02\% \]

(b) The probability that no customer stops at the pump within a half hour is

\[ e^{-\lambda} = e^{-3} = 0.04979 \]

(c) \(E(T) = 1/\lambda = 1/6 \approx 0.17\) h per car, and \(\text{Var } T = 1/\lambda^2 = 1/36 \approx 0.03\) h per car squared. The exponential distribution also can be used to calculate the time between two successive events.

3.40 The mean level of schooling for a population is 8 years and the standard deviation is 1 year. What is the probability that a randomly selected individual from the population will have had between 6 and 10 years of schooling? Less than 6 years or more than 10 years?

Since we have not been told the form of the distribution, we can use Chebyshev's theorem, which applies to any distribution. With \(\mu = 8\) years and \(\sigma = 1\) year, 6 years of schooling is 2 standard deviations below \(\mu\), and 10 years of schooling is 2 standard deviations above \(\mu\). Using Chebyshev's theorem or inequality, we obtain

\[ P(|X - \mu| \le K\sigma) \ge 1 - \frac{1}{K^2} \tag{3.30} \]

The probability that an individual picked at random from the population will be within 2 standard deviations from the mean is

\[ P(|X - \mu| \le 2\sigma) \ge 1 - \frac{1}{2^2} = \frac{3}{4}, \text{ or at least } 75\% \]

Therefore, the probability that the individual will have had either less than 6 or more than 10 years of schooling is at most 25%.

Supplementary Problems

PROBABILITY OF A SINGLE EVENT

3.41 What approach to probability is involved in the following statements? (a) The probability of a head in the toss of a balanced coin is 1/2. (b) The relative frequency of a head in 100 tosses of a coin is 0.53. (c) The probability of rain tomorrow is 29%.
Ans. (a) The classical or a priori approach (b) The relative-frequency or empirical approach (c) The subjective or personalistic approach
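The relative-frequency approach named in Prob. 3.41(b) can be illustrated with a small simulation. The Python sketch below is an illustration only and is not part of the original text: the function name, the fixed seed, and the numbers of tosses are assumptions chosen for the example. It tosses a simulated balanced coin and reports the proportion of heads, which should settle near the classical probability of 1/2 as the number of tosses grows.

```python
import random


def relative_frequency_of_heads(tosses, seed=0):
    """Proportion of heads in a number of simulated tosses of a balanced coin."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


# The empirical relative frequency for increasingly long runs of tosses
for tosses in (100, 10_000, 200_000):
    print(tosses, relative_frequency_of_heads(tosses))
```

With only 100 tosses the proportion can easily differ from 0.5 by several percentage points, as with the 0.53 quoted in the problem; with many more tosses the deviation becomes small.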
3.42 What is the probability that in tossing a balanced coin we get (a) a tail, (b) a head, (c) not a tail, or (d) a tail or not a tail?
Ans. (a) P(T) = 1/2 (b) P(H) = 1/2 (c) P(T') = 1/2 (d) P(T) + P(T') = 1

3.43 What is the probability that in one roll of a fair die we get (a) a 1, (b) a 6, (c) not a 1, or (d) a 1 or not a 1?
Ans. (a) P(1) = 1/6 (b) P(6) = 1/6 (c) P(1') = 5/6 (d) P(1) + P(1') = 1

3.44 What is the probability that in a single pick from a standard deck of cards we pick (a) a club, (b) an ace, (c) the ace of clubs, (d) not a club, or (e) a club or not a club?
Ans. (a) P(C) = 13/52 = 1/4 (b) P(A) = 4/52 = 1/13 (c) P(ace of clubs) = 1/52 (d) P(C') = 3/4 (e) P(C) + P(C') = 1

3.45 An urn contains 12 balls that are exactly alike except that 4 are blue, 3 are red, 3 are green, and 2 are white. What is the probability that by picking a single ball we pick (a) A blue ball? (b) A red ball? (c) A green ball? (d) A white ball? (e) A nonred ball? (f) A nonwhite ball? (g) A white or nonwhite ball? Also (h) What are the odds of picking a green ball? (i) What are the odds of picking a nongreen ball?
Ans. (a) P(B) = 1/3 or 0.33 (b) P(R) = 1/4 or 0.25 (c) P(G) = 1/4 or 0.25 (d) P(W) = 1/6 or 0.167 (e) P(R') = 0.75 (f) P(W') = 0.833 (g) P(W) + P(W') = 1 (h) 3:9 (i) 9:3

3.46 Suppose that a card is picked from a well-shuffled standard deck. The card is then replaced, the deck reshuffled, and another card is picked. As this process is repeated 520 times, we obtain 136 spades. (a) What is the relative frequency or empirical probability of getting a spade? (b) What is the classical or a priori probability of getting a spade? (c) What would you expect the relative frequency or empirical probability of getting a spade to be if the process is repeated many more times?
Ans.
(a) 136/520 ≅ 0.26 (b) P(S) = 1/4 (c) To approach 1/4 or 0.25

3.47 An insurance company found that from a sample of 10,000 men between the ages of 30 and 40, 87 become seriously ill during a 1-year period. (a) What is the relative frequency or empirical probability of men between 30 and 40 becoming seriously ill during a 1-year period? (b) Why is the insurance company interested in these results? (c) Suppose that the company subsequently sells health insurance to 1,387,684 men in the 30 to 40 age group. How many claims can the company expect during a 1-year period?
Ans. (a) The relative frequency or empirical probability is 87/10,000 = 0.0087. (b) The insurance company is interested in the relative frequency or empirical probability in order to determine its insurance premiums. (c) 12,073, to the nearest person

PROBABILITY OF MULTIPLE EVENTS

3.48 What types of events are the following? (a) Picking hearts or clubs on a single pick from a deck. (b) Picking diamonds or a queen on a single pick from a deck. (c) Two successive flips of a balanced coin. (d) Two successive tosses of a fair die. (e) Picking two cards from a deck with replacement. (f) Picking two cards from a deck without replacement. (g) Picking two balls from an urn without replacement.
Ans. (a) Mutually exclusive (b) Not mutually exclusive (c) Independent (d) Independent (e) Independent (f) Dependent (g) Dependent

3.49 What is the probability of getting (a) Four or more on a single toss of a fair die? (b) Ace or king on a single pick from a well-shuffled standard deck of cards? (c) A green or white ball from the urn of Prob. 3.45?
Ans. (a) 1/2 (b) 8/52 or 2/13 (c) 5/12

3.50 What is the probability of getting (a) A diamond or a queen on a single pick from a deck of cards? (b) A diamond, a queen, or a king?
(c) An African-American or a woman president of the United States if the probability of an African-American president is 0.25, of a woman is 0.15, and of an African-American woman is 0.07?
Ans. (a) 16/52 or 4/13 (b) 19/52 (c) 0.33

3.51 What is the probability of (a) Two ones in 2 rolls of a die? (b) Three tails in 3 flips of a coin? (c) A total of 6 in rolling 2 dice simultaneously? (d) A total of less than 5 in rolling 2 dice simultaneously? (e) A total of 10 or more in rolling 2 dice simultaneously?
Ans. (a) 1/36 (b) 1/8 (c) 5/36 (d) 1/6 (e) 1/6

3.52 What is the probability of obtaining the following from a deck of cards: (a) A diamond on the second pick when the first card picked and not replaced was a diamond? (b) A diamond on the second pick when the first card picked and not replaced was not a diamond? (c) A king on the third pick when a queen and a jack were already obtained on the first and second picks and not replaced?
Ans. (a) 12/51 (b) 13/51 (c) 4/50

3.53 What is the probability of picking (a) The king of clubs and a diamond in that order in 2 picks from a deck without replacement? (b) A white ball and a green ball in that order in 2 picks without replacement from the urn of Prob. 3.45? (c) A green ball and a white ball in that order in 2 picks without replacement from the urn of Prob. 3.45? (d) A green and a white ball in any order in 2 picks without replacement from the same urn? (e) Three green balls in 3 picks without replacement from the urn?
Ans. (a) 13/2652 or 1/204 (b) 6/132 or 1/22 (c) 1/22 (d) 1/11 (e) 6/1320 or 1/220

3.54 Suppose that the probability of rain on a given day is 0.1 and the probability of my having a car accident is 0.005 on any day and 0.012 on a rainy day. (a) What rule should I use to calculate the probability that on a given day it will rain and I will have a car accident? (b) State the rule asked for in part a, letting A signify accident and R signify rain. (c) Calculate the probability asked for in part a.
Ans.
(a) The rule of multiplication for dependent events (b) P(R and A) = P(R) · P(A/R) (c) 0.0012

3.55 (a) What rule or theorem should I use to calculate for the statement in Prob. 3.54 the probability that it was raining when I had a car accident? (b) State the rule or theorem applicable to part a. (c) Answer the question in part a.
Ans. (a) Bayes' theorem (b) P(R/A) = P(R) · P(A/R)/P(A) (c) 0.24

3.56 In how many different ways can 6 qualified individuals be assigned to (a) Three trainee positions available if the positions are identical? (b) Three trainee positions eventually if the positions differ? (c) Six trainee positions available if the positions differ?
Ans. (a) 20 (b) 120 (c) 720

DISCRETE PROBABILITY DISTRIBUTIONS: THE BINOMIAL DISTRIBUTION

3.57 The probability distribution of lunch customers at a restaurant is given in Table 3.5. Calculate (a) the expected number of lunch customers, (b) the variance, and (c) the standard deviation.

Table 3.5 Probability Distribution of Lunch Customers at a Restaurant
Number of Customers: 100   110   118   120   125

Ans. (a) 113.1 customers (b) 65.69 customers squared (c) 8.10 customers

3.58 What is the probability of (a) Getting exactly 4 heads and 2 tails in 6 tosses of a balanced coin? (b) Getting 3 sixes in 4 rolls of a fair die?
Ans. (a) 0.23 (b) 0.0154321

3.59 (a) If 20% of the students entering college drop out before receiving their diplomas, find the probability that out of 20 students picked at random from the very large number of students entering college, less than 3 drop out. (b) If 90% of the bulbs produced in a plant are acceptable, what is the probability that out of 10 bulbs picked at random from the very large output of the plant, 8 are acceptable?
Ans. (a) 0.206 (b) 0.1937

3.60 Calculate the expected value and standard deviation and determine the symmetry or asymmetry of the probability distribution of (a) Prob. 3.58(a), (b) Prob.
3.59(a), and (c) Prob. 3.59(b).
Ans. (a) E(X) = 3 heads, SD X = 1.22 heads, and the distribution is symmetrical (b) E(X) = 4 students, SD X = 1.79 students, and the distribution is skewed to the right (c) E(X) = 9 bulbs, SD X = 0.95 bulbs, and the distribution is skewed to the left

3.61 What is the probability of picking (a) Two women in a sample of 5 drawn at random and without replacement from a group of 9 people, 4 of whom are women? (b) Eight men in a sample of 10 drawn at random and without replacement from a population of 1000, half of which are men.
Ans. (a) About 0.476 (using the hypergeometric distribution) (b) About 0.04 (using the binomial approximation to the hypergeometric probability)

THE POISSON DISTRIBUTION

3.62 Past experience shows that there are two traffic accidents at an intersection per week. What is the probability of: (a) Four accidents during a randomly selected week? (b) No accidents? (c) What is the expected value and standard deviation of the distribution?
Ans. (a) About 0.09 (b) About 0.14 (c) \(E(X) = \lambda = 2\) accidents, and SD \(X = \sqrt{\lambda} = \sqrt{2} \cong 1.41\) accidents

3.63 Past experience shows that 0.3% of the national labor force get seriously ill during a year. If 1000 persons are randomly selected from the national labor force: (a) What is the expected number of workers that will get sick during a year? (b) What is the probability that 5 workers will get sick during the year?
Ans. (a) 3 workers (b) About 0.1 (using the Poisson approximation to the binomial distribution)

CONTINUOUS PROBABILITY DISTRIBUTIONS: THE NORMAL DISTRIBUTION

3.64 Give the formula for (a) the probability that continuous variable X falls between \(X_1\) and \(X_2\), (b) the normal distribution, (c) the expected value and variance of the normal distribution, and (d) the standard normal distribution. (e) What is the mean and variance of the standard normal distribution?
Ans.
(a) \(P(X_1 < X < X_2)\) …

… = 4, without the finite correction factor, instead of \(\sigma_{\bar X}\) = …

EXAMPLE 4. The probability that the mean of a random sample \(\bar X\) of 36 elements from the population in Example 3 falls between 18 and 24 units is computed as follows:

\[z_1 = \frac{\bar X_1 - \mu_{\bar X}}{\sigma_{\bar X}} = -1 \qquad \text{and} \qquad z_2 = \frac{\bar X_2 - \mu_{\bar X}}{\sigma_{\bar X}} = 2\]

Looking up \(z_1\) and \(z_2\) in App. 3, we get
\[P(18 < \bar X < 24) = 0.3413 + 0.4772 = 0.8185, \text{ or } 81.85\%\]
See Fig. 4-2.

CHAP. 4] STATISTICAL INFERENCE: ESTIMATION

4.3 ESTIMATION USING THE NORMAL DISTRIBUTION

We can get a point or an interval estimate of a population parameter. A point estimate is a single number. Such a point estimate is unbiased if in repeated random samplings from the population, the expected or mean value of the corresponding statistic is equal to the population parameter. For example, \(\bar X\) is an unbiased (point) estimate of \(\mu\) because \(\mu_{\bar X} = \mu\), where \(\mu_{\bar X}\) is the expected value of \(\bar X\). The sample standard deviation s [as defined in Eqs. (2.10b) and (2.11b)] is an unbiased estimate of \(\sigma\) [see Prob. 4.13(b)], and the sample proportion \(\bar p\) is an unbiased estimate of p (the proportion of the population with a given characteristic).

An interval estimate refers to a range of values together with the probability, or confidence level, that the interval includes the unknown population parameter. Given the population standard deviation or its estimate, and given that the population is normal or that a random sample is equal to or larger than 30, we can find the 95% confidence interval for the unknown population mean as

\[P(\bar X - 1.96\sigma_{\bar X} < \mu < \bar X + 1.96\sigma_{\bar X}) = 0.95 \qquad (4.4)\]

This states that in repeated random sampling, we expect that 95 out of 100 intervals such as Eq. (4.4) include the unknown population mean and that our confidence interval (based on a single random sample) is one of these.

A confidence interval can be constructed similarly for the population proportion (see Example 7), where

\[\mu_{\bar p} = p \quad \text{(the proportion of successes in the population)} \qquad (4.5)\]
\[\sigma_{\bar p} = \sqrt{\frac{p(1 - p)}{n}} \quad \text{(the standard error of the proportion)} \qquad (4.6)\]

EXAMPLE 5.
A random sample of 144 with a mean of 100 and a standard deviation of 60 is taken from a population of 1000. The 95% confidence interval for the unknown population mean is

\[\mu = \bar X \pm 1.96\sigma_{\bar X} \qquad \text{since } n > 30\]
\[= \bar X \pm 1.96\frac{\sigma}{\sqrt n}\sqrt{\frac{N - n}{N - 1}} \qquad \text{since } n > 0.05N\]
\[= 100 \pm 1.96\frac{60}{\sqrt{144}}\sqrt{\frac{1000 - 144}{1000 - 1}} \qquad \text{using } s \text{ as an estimate of } \sigma\]
\[= 100 \pm 1.96(5)(0.93) = 100 \pm 9.11\]

Thus \(\mu\) is between 90.89 and 109.11 with a 95% degree of confidence. Other frequently used confidence intervals are the 90 and 99% levels, corresponding to the z values of 1.64 and 2.58, respectively (see App. 3).

EXAMPLE 6. A manager wishes to estimate the mean number of minutes that workers take to complete a particular manufacturing process within ±3 min and with 90% confidence. From past experience, the manager knows that the standard deviation \(\sigma\) is 15 min. The minimum required sample size (n > 30) is found as follows:

\[z\sigma_{\bar X} = \bar X - \mu\]
\[1.64\frac{\sigma}{\sqrt n} = 3 \qquad \text{assuming } n < 0.05N \text{, and since the interval extends 3 min on either side of the mean}\]
\[\sqrt n = \frac{1.64(15)}{3} = 8.2\]
\[n = 67.24, \text{ or } 68 \text{ (rounded to the next higher integer)}\]

EXAMPLE 7. A state education department finds that in a random sample of 100 persons who attended college, 40 received a college degree. To find the 99% confidence interval for the proportion of college graduates out of all the persons who attended college, we proceed as follows. First, we note that this problem involves the binomial distribution (see Sec. 3.3). Since n > 30 and both np > 5 and n(1 - p) > 5, the binomial distribution approaches the normal distribution (which is simpler to use; see Sec. 3.5).
Then

\[p = \bar p \pm z\sigma_{\bar p} = \bar p \pm z\sqrt{\frac{p(1 - p)}{n}} \qquad \text{assuming } n < 0.05N\]
\[= 0.4 \pm 2.58\sqrt{\frac{(0.4)(0.6)}{100}} \qquad \text{using } \bar p \text{ as an estimate of } p\]
\[= 0.4 \pm 2.58(0.05) = 0.4 \pm 0.13\]

Thus p is between 0.27 and 0.53 with a 99% level of confidence.

4.4 CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

When the population is normally distributed but \(\sigma\) is not known and n < 30, we cannot use the normal distribution for determining confidence intervals for the unknown population mean, but we can use the t distribution. This is symmetrical about its zero mean but is flatter than the standard normal distribution, so that more of its area falls within the tails. While there is a single standard normal distribution, there is a different t distribution for each sample size, n. However, as n becomes larger, the t distribution approaches the standard normal distribution (see Fig. 4-3) until, when n ≥ 30, they are approximately equal.

Appendix 5 gives the values of t to the right of which we find 10, 5, 2.5, 1, and 0.5% of the total area under the curve for various degrees of freedom. Degrees of freedom (df) are defined in this case as n - 1 (or the sample size minus 1 for the single parameter \(\mu\) we wish to estimate).

[Fig. 4-3: the standard normal distribution and the t distribution]

The 95% confidence interval for the unknown population mean when the t distribution is used is given by

\[P\left(\bar X - t\frac{s}{\sqrt n} < \mu < \bar X + t\frac{s}{\sqrt n}\right) = 0.95\]

…

The standard error of the mean \(\sigma_{\bar X}\) is given by the standard deviation of the parent population \(\sigma\) divided by the square root of the sample size \(\sqrt n\); that is, \(\sigma_{\bar X} = \sigma/\sqrt n\). For finite populations of size N, a finite correction factor must be added, and \(\sigma_{\bar X} = (\sigma/\sqrt n)\sqrt{(N - n)/(N - 1)}\). However, if the sample size is very small in relation to the population size, \(\sqrt{(N - n)/(N - 1)}\) is close to 1 and can be dropped from the formula. By convention, this is done whenever n < 0.05N. Independently of this finite correction factor, \(\sigma_{\bar X}\) is directly related to \(\sigma\) and inversely related to \(\sqrt n\) [see Eq.
(4.2a, b)]. Thus increasing the sample size 4 times increases the accuracy of \(\bar X\) as an estimate of \(\mu\) by cutting \(\sigma_{\bar X}\) in half. Note also that \(\sigma_{\bar X}\) is always smaller than \(\sigma\). The reason for this is that the sample means, as averages of the sample observations, exhibit less variability or spread than the population values. Furthermore, the larger the sample size, the more the sample means are averaged down with respect to the population values (see Fig. 4-1).

4.6 For a population composed of the following 5 numbers: 1, 3, 5, 7, and 9, find (a) \(\mu\) and \(\sigma\), (b) the theoretical sampling distribution of the mean for the sample size of 2, and (c) \(\mu_{\bar X}\) and \(\sigma_{\bar X}\).

(a)
\[\mu = \frac{\sum X}{N} = \frac{1 + 3 + 5 + 7 + 9}{5} = \frac{25}{5} = 5\]
\[\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{16 + 4 + 0 + 4 + 16}{5}} = \sqrt 8 \cong 2.83\]

(b) The theoretical sampling distribution of the sample mean for the sample size of 2 from the given finite population is given by the means of all the possible different samples that can be obtained from this population. The number of combinations of 5 numbers taken 2 at a time without concern for the order is 5!/(2!3!) = 10 (see Prob. 3.18). These 10 samples are 1, 3; 1, 5; 1, 7; 1, 9; 3, 5; 3, 7; 3, 9; 5, 7; 5, 9; and 7, 9. The means of the preceding 10 samples are 2, 3, 4, 5, 4, 5, 6, 6, 7, 8. The theoretical sampling distribution of the mean is given in Table 4.1. Note that the variability or spread of the sample means (from 2 to 8) is less than the variability or spread of the values in the parent population (from 1 to 9), confirming the statement made at the end of Prob. 4.5.

(c) By applying theorem 1 (Sec. 4.2), \(\mu_{\bar X} = \mu = 5\). Since the sample size of 2 is greater than 5% of the population size (that is, n > 0.05N),
\[\sigma_{\bar X} = \frac{\sigma}{\sqrt n}\sqrt{\frac{N - n}{N - 1}} = \frac{\sqrt 8}{\sqrt 2}\sqrt{\frac{5 - 2}{5 - 1}} = \sqrt 4\sqrt{\frac{3}{4}} = \sqrt 3 \cong 1.73\]

Table 4.1 Theoretical Sampling Distribution of the Mean

Value of the Mean    Possible Outcomes    Probability of Occurrence
2                    2                    0.1
3                    3                    0.1
4                    4, 4                 0.2
5                    5, 5                 0.2
6                    6, 6                 0.2
7                    7                    0.1
8                    8                    0.1
Total                                     1.0

4.7 For the theoretical sampling distribution of the sample mean found in Prob. 4.6(b), (a) find the mean and the standard error of the mean using the formulas for the population mean and standard deviation given in Secs. 2.2 and 2.3.
(b) What do the answers to part a show?

(a)
\[\mu_{\bar X} = \frac{\sum \bar X}{N} = \frac{2 + 3 + 4 + 4 + 5 + 5 + 6 + 6 + 7 + 8}{10} = \frac{50}{10} = 5\]
\[\sigma_{\bar X} = \sqrt{\frac{\sum (\bar X - \mu_{\bar X})^2}{N}} = \sqrt{\frac{9 + 4 + 1 + 1 + 0 + 0 + 1 + 1 + 4 + 9}{10}} = \sqrt{\frac{30}{10}} = \sqrt 3 \cong 1.73\]

(b) The answers to part a confirm the results obtained in Prob. 4.6(c) by the application of theorem 1 (Sec. 4.2), namely, that \(\mu_{\bar X} = \mu\) and \(\sigma_{\bar X} = (\sigma/\sqrt n)\sqrt{(N - n)/(N - 1)}\) for the finite population where n > 0.05N. Note that we took all the possible different samples of size 2 that we could take from our finite population of 5 numbers. Sampling from an infinite parent population (or from a finite parent population with replacement) would have required taking an infinite number of random samples of size n from the parent population (an obviously impossible task). By taking only a limited number of random samples, theorem 1 would hold only approximately (i.e., \(\mu_{\bar X} \approx \mu\) and \(\sigma_{\bar X} \approx \sigma/\sqrt n\)), with the approximation becoming better as the number of random samples taken is increased. In this case, the sampling distribution of the sample mean generated is referred to as the empirical sampling distribution of the mean.

4.8 A population of 12,000 elements has a mean of 100 and a standard deviation of 60. Find the mean and standard error of the sampling distribution of the mean for sample sizes of (a) 100 and (b) 900.

(a) \(\mu_{\bar X} = \mu = 100\) and \(\sigma_{\bar X} = \sigma/\sqrt n = 60/\sqrt{100} = 6\)

(b) \(\mu_{\bar X} = \mu = 100\). Since a sample of 900 is more than 5% of the population size, the finite correction factor must be used in the formula for the standard error:
\[\sigma_{\bar X} = \frac{\sigma}{\sqrt n}\sqrt{\frac{N - n}{N - 1}} = \frac{60}{\sqrt{900}}\sqrt{\frac{12{,}000 - 900}{12{,}000 - 1}} = \frac{60}{30}\sqrt{0.925} = 2(0.962) \cong 1.92\]
Without the correction factor, \(\sigma_{\bar X}\) would have been equal to 2 instead of 1.92.

4.9 (a) What is the shape of the theoretical sampling distribution of the mean if the parent population is normal? If the parent population is not normal? (b) What is the importance of the answer to part a?

(a) If the parent population is normally distributed, the theoretical sampling distributions of the mean are also normally distributed, regardless of sample size.
According to the central-limit theorem, even if the parent population is not normal, the theoretical sampling distributions of the sample mean approach normality as sample size increases (i.e., as \(n \to \infty\)). This approximation is sufficiently good for samples of at least 30.

(b) The central-limit theorem is perhaps the most important theorem in all of statistical inference. It allows us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the parent population. This will be done in this chapter and in Chap. 5.

4.10 (a) How can we calculate the probability that a random sample has a mean that falls within a given interval if the theoretical sampling distribution of the mean is normal or approximately normal? How is this different from the process of finding the probability that a normally distributed random variable assumes a value within a given interval? (b) Draw a normal curve in the \(\bar X\) and z scales and show the percentage of the area under the curve within 1, 2, and 3 standard deviation units of its mean.

(a) If the theoretical sampling distribution of the mean is normal or approximately normal, we can find the probability that a random sample has a mean that falls within a given interval by calculating the corresponding z values in App. 3. This is analogous to what was done in Sec. 3.5, where the normal and the standard normal curves were introduced. The only difference is that now we are dealing with the distribution of the \(\bar X\)'s rather than with the distribution of the X's. In addition, before \(z = (X - \mu)/\sigma\), while now \(z = (\bar X - \mu_{\bar X})/\sigma_{\bar X} = (\bar X - \mu)/\sigma_{\bar X}\), since \(\mu_{\bar X} = \mu\).

(b) In Fig. 4-5, we have a normal curve in the \(\bar X\) scale and a standard normal curve in the z scale. The area under the curve within 1, 2, and 3 standard deviation units is 68.26, 95.44, and 99.74%, respectively.
4.11 Find the probability that the mean of a random sample of 25 elements from a normally distributed population with a mean of 90 and a standard deviation of 60 is larger than 100.

Since the parent population is normally distributed, the theoretical sampling distribution of the mean is also normally distributed and \(\sigma_{\bar X} = \sigma/\sqrt n\) because n < 0.05N. For \(\bar X = 100\),
\[z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \frac{\bar X - \mu}{\sigma/\sqrt n} = \frac{100 - 90}{60/\sqrt{25}} = \frac{10}{12} = 0.83\]
Looking up this value in App. 3, we get
\[P(\bar X > 100) = 1 - (0.5000 + 0.2967) = 1 - 0.7967 = 0.2033, \text{ or } 20.33\%\]
See Fig. 4-6.

4.12 A small local bank has 1450 individual savings accounts with an average balance of $3000 and a standard deviation of $1200. If the bank takes a random sample of 100 accounts, what is the probability that the average savings for these 100 accounts will be below $2800?

Since n = 100, the theoretical sampling distribution of the mean is approximately normal, but since n > 0.05N, the finite correction factor must be used to find \(\sigma_{\bar X}\). For \(\bar X\) = $2800,
\[z = \frac{\bar X - \mu}{\dfrac{\sigma}{\sqrt n}\sqrt{\dfrac{N - n}{N - 1}}} = \frac{2800 - 3000}{\dfrac{1200}{\sqrt{100}}\sqrt{\dfrac{1450 - 100}{1450 - 1}}} = \frac{-200}{120(0.965)} = \frac{-200}{115.8} = -1.73\]
Looking up z = 1.73 in App. 3, we get
\[P(\bar X < 2800) = 1 - (0.5000 + 0.4582) = 0.0418, \text{ or } 4.18\%\]
See Fig. 4-7.

ESTIMATION USING THE NORMAL DISTRIBUTION

4.13 What is meant by (a) A point estimate? (b) Unbiased estimator? (c) An interval estimate?

(a) Because of cost, time, and feasibility, population parameters are frequently estimated from sample statistics. A sample statistic used to estimate a population parameter is called an estimator, and a specific observed value is called an estimate. When the estimate of an unknown population parameter is given by a single number, it is called a point estimate. For example, the sample mean \(\bar X\) is an estimator of the population mean \(\mu\), and a single value of \(\bar X\) is a point estimate of \(\mu\). Similarly, the sample standard deviation s can be used as an estimator of the population standard deviation \(\sigma\), and a single value of s is a point estimate of \(\sigma\).
The sample proportion \(\bar p\) can be used as an estimator for the population proportion p, and a single value of \(\bar p\) is a point estimate of p (i.e., the proportion of the population with a given characteristic).

(b) An estimator is unbiased if in repeated random sampling from the population the corresponding statistic from the theoretical sampling distribution is, on average, equal to the population parameter. Another way of stating this is that an estimator is unbiased if its expected value (see Probs. 3.20 and 3.31) is equal to the population parameter being estimated. For example, \(\bar X\), s [as defined in Eqs. (2.10b) and (2.11b)], and \(\bar p\) are unbiased estimators of \(\mu\), \(\sigma\), and p, respectively. Other important criteria for a good estimator are discussed in Sec. 6.

(c) An interval estimate refers to the range of values used to estimate an unknown population parameter together with the probability, or confidence level, that the interval does include the unknown population parameter. This is known as a confidence interval and is usually centered around the unbiased point estimate. For example, the 95% confidence interval for \(\mu\) is given by
\[P(\bar X - 1.96\sigma_{\bar X} < \mu < \bar X + 1.96\sigma_{\bar X}) = 0.95\]
The two numbers defining a confidence interval are called confidence limits. Because an interval estimate also expresses the degree of accuracy or confidence we have in the estimate, it is superior to a point estimate.

4.14 A random sample of 64 with a mean of 50 and a standard deviation of 20 is taken from a population of 800. (a) Find an interval estimate for the population mean such that we are 95% confident that the interval includes the population mean. (b) What does the result of part a tell us?
(a) Since n > 30, we can use the z value of 1.96 from the standard normal distribution to construct the 95% confidence interval for the unknown population mean, and we can use s as an estimate for the unknown \(\sigma\). Since n > 0.05N, the finite correction factor is also needed. Thus
\[\mu = \bar X \pm z\frac{s}{\sqrt n}\sqrt{\frac{N - n}{N - 1}} = 50 \pm 1.96\frac{20}{\sqrt{64}}\sqrt{\frac{800 - 64}{800 - 1}} = 50 \pm 1.96(2.5)(0.96) = 50 \pm 4.70\]
Thus \(\mu\) is between the lower confidence limit of 45.3 and the upper confidence limit of 54.7 with a 95% level of confidence.

(b) The result of part a tells us that if we take from the population repeated random samples, each of size n = 64, and construct the 95% confidence interval for each of the sample means, 95% of these confidence intervals will contain the true unknown population mean. By assuming that our confidence interval (based on the single random sample that we have actually taken) is one of these 95% confidence intervals that include \(\mu\), we take the calculated risk of being wrong 5% of the time.

4.15 A random sample of 25 with a mean of 80 is taken from a population of 1000 that is normally distributed with a standard deviation of 30. Find (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals for the unknown population mean. (d) What does the difference in the results to parts a, b, and c indicate?

(a)
\[\mu = \bar X \pm 1.64\sigma_{\bar X} \qquad \text{since the population is normally distributed}\]
\[= \bar X \pm 1.64\frac{\sigma}{\sqrt n} \qquad \text{since } n < 0.05N\]
\[= 80 \pm 1.64\frac{30}{\sqrt{25}} = 80 \pm 1.64(6) = 80 \pm 9.84\]
Thus \(\mu\) is between 70.16 and 89.84 with a 90% level of confidence.

(b) \(\mu = 80 \pm 1.96(6) = 80 \pm 11.76\). Thus \(\mu\) is between 68.24 and 91.76 with a 95% level of confidence.

(c) \(\mu = 80 \pm 2.58(6) = 80 \pm 15.48\). Thus \(\mu\) is between 64.52 and 95.48 with a 99% level of confidence.

(d) The results of parts a, b, and c indicate that as we increase the degree of confidence required, the size of the confidence interval increases and the interval estimate becomes more vague (i.e., less precise). However, the degree of confidence associated with a very narrow confidence interval may be so low as to have little meaning.
By convention, the most frequently used confidence interval is 95%, followed by 90 and 99%.

4.16 A random sample of 36 students is taken out of the 500 students from a high school taking the college entrance examination. The mean test score for the sample is 380, and the standard deviation for the entire population of 500 students is 40. Find the 95% confidence interval for the unknown population mean score.

Since n > 30, the theoretical sampling distribution of the mean is approximately normal. Also, since n > 0.05N,
\[\sigma_{\bar X} = \frac{\sigma}{\sqrt n}\sqrt{\frac{N - n}{N - 1}} = \frac{40}{\sqrt{36}}\sqrt{\frac{500 - 36}{500 - 1}} = 6.67(0.96) \cong 6.4\]
Then
\[\mu = \bar X \pm 1.96\sigma_{\bar X} = 380 \pm 1.96(6.4) = 380 \pm 12.54\]
Thus \(\mu\) is between 367.46 and 392.54 with a 95% level of confidence.

4.17 A researcher wishes to estimate the mean weekly wage of the several thousands of workers employed in a plant within plus or minus $20 and with a 99% degree of confidence. From past experience, the researcher knows that the weekly wages of these workers are normally distributed with a standard deviation of $40. What is the minimum sample size required?

\[z\sigma_{\bar X} = \bar X - \mu\]
\[2.58\frac{\sigma}{\sqrt n} = 20 \qquad \text{assuming } n < 0.05N\]
\[\sqrt n = \frac{2.58(40)}{20} = 5.16\]
\[n = 26.63, \text{ or } 27 \text{ (rounded to the nearest higher integer)}\]

4.18 (a) Solve Prob. 4.17 by first getting an expression for n and then substituting the values from the problem into the expression obtained. (b) Why is the question of sample size important? (c) What is the size of the total confidence interval in Prob. 4.17? (d) What would have to be the sample size in Prob. 4.17 if we had not been told that the population was normally distributed? (e) What would have happened if we had not been told the population standard deviation?

(a) Starting with \(z\sigma/\sqrt n = \bar X - \mu\) (see Prob. 4.17), we get \(\sqrt n = z\sigma/(\bar X - \mu)\). Thus
\[n = \left(\frac{z\sigma}{\bar X - \mu}\right)^2\]
Substituting the values from Prob. 4.17, we get
\[n = \left(\frac{2.58(40)}{20}\right)^2 = 5.16^2 = 26.63, \text{ or } 27 \text{ (the same as in Prob. 4.17)}\]

(b) The question of sample size is important because if the sample is too small, we fail to achieve the objectives of the analysis, and if the sample is too large, we waste resources because it is more expensive to collect and evaluate a larger sample.
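The standard error with the finite correction factor (Prob. 4.16) and the sample-size expression of Prob. 4.18(a) can be sketched in Python as follows. This is only an illustration using the standard library; the function names and the choice to apply the correction automatically when n > 0.05N are assumptions made here, not part of the text. Because the code does not round intermediate values, the standard error for Prob. 4.16 comes out as about 6.43 rather than the rounded 6.4 used above.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite correction factor
    when the population size N is given and n > 0.05N (hypothetical helper)."""
    se = sigma / math.sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def min_sample_size(z, sigma, error):
    """Smallest integer n satisfying z * sigma / sqrt(n) <= error,
    assuming n < 0.05N so that no finite correction is needed."""
    return math.ceil((z * sigma / error) ** 2)

# Prob. 4.16: sigma = 40, n = 36, N = 500
print(round(standard_error(40, 36, N=500), 2))   # 6.43

# Prob. 4.17: 99% confidence (z = 2.58), sigma = 40, error = 20
print(min_sample_size(2.58, 40, 20))             # 27

# Example 6: 90% confidence (z = 1.64), sigma = 15, error = 3
print(min_sample_size(1.64, 15, 3))              # 68
```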
(c) The size of the total confidence interval in Prob. 4.17 is $40, or twice \(\bar X - \mu\). Since we are using \(\bar X\) as an estimate of \(\mu\), \(\bar X - \mu\) is sometimes referred to as the error of the estimate. Because in Prob. 4.17 we want the error of the estimate to be "within plus or minus $20," we get \(\bar X - \mu\) = ±$20, or a range of $40 for the total confidence interval.

(d) If we had not been told that the population was normally distributed, we would have had to increase the sample to at least 30 in Prob. 4.17 in order to justify the use of the normal distribution.

(e) If we had not been told the value of \(\sigma\), we could not have solved the problem. (Since we were deciding on what sample size to take in Prob. 4.17, we could not possibly have known s to use as an estimate of \(\sigma\).) The only way we could estimate \(\sigma\) (and thus approximate n) would be if we knew the range of wages from the highest to the lowest. Since \(\pm 3\sigma\) includes 99.7% of all the area under the normal curve, we could have equated \(6\sigma\) with the range of wages and thus estimated \(\sigma\) (and solved the problem).

4.19 With reference to a binomial distribution, indicate the relationship between (a) \(\mu\) and \(\mu_{\bar p}\), (b) p and \(\bar p\), and (c) \(\sigma\) and \(\sigma_{\bar p}\).

(a) \(\mu = np\) = mean number of successes in n trials, where p is the probability of success in any of the trials (see Sec. 3.3).
\(\mu_{\bar p} = \mu/n = p\) = the mean proportion of successes of the sampling distribution of the proportion.

(b) p = the proportion of successes in the population, and \(\bar p\) = the proportion of successes in the sample (and an unbiased estimator of p).

(c) \(\sigma = \sqrt{np(1 - p)}\) = standard deviation of the number of successes in the population, and \(\sigma_{\bar p} = \sqrt{p(1 - p)/n}\) = standard error of the proportion; when n > 0.05N, \(\sigma_{\bar p} = \sqrt{p(1 - p)/n}\,\sqrt{(N - n)/(N - 1)}\).

4.20 For a random sample of 100 workers in a plant employing 1200, 70 prefer providing for their own retirement benefits over belonging to a company-sponsored plan. Find the 95% confidence interval for the proportion of all the workers in the plant who prefer their own retirement plans.

\[\bar p = \frac{70}{100} = 0.7\]
\[p = \bar p \pm z\sigma_{\bar p} \qquad \text{since } n > 30 \text{ and } np > 5 \text{ and } n(1 - p) > 5\]
\[= \bar p \pm z\sqrt{\frac{p(1 - p)}{n}}\sqrt{\frac{N - n}{N - 1}} \qquad \text{since } n > 0.05N\]
\[= 0.7 \pm 1.96\sqrt{\frac{(0.7)(0.3)}{100}}\sqrt{\frac{1200 - 100}{1200 - 1}} \qquad \text{using } \bar p \text{ as an estimate for } p\]
\[\cong 0.7 \pm 1.96(0.05)(0.96) \cong 0.7 \pm 0.09\]
Thus p (the proportion of all the workers in the plant who prefer their own retirement plans) is between 0.61 and 0.79 with a 95% degree of confidence.

4.21 A polling agency wants to estimate with 90% level of confidence the proportion of voters who would vote for a particular candidate within ±0.06 of the true (population) proportion of voters. What is the minimum sample size required if other polls indicate that the proportion voting for this candidate is 0.30?

\[z\sigma_{\bar p} = \bar p - p\]
\[z\sqrt{\frac{p(1 - p)}{n}} = \bar p - p \qquad \text{assuming } n < 0.05N\]
\[1.64\sqrt{\frac{(0.3)(0.7)}{n}} = 0.06\]
\[\frac{(2.6896)(0.3)(0.7)}{n} = 0.0036 \qquad \text{by squaring both sides}\]
\[n = \frac{(2.6896)(0.3)(0.7)}{0.0036} = 156.89, \text{ or } 157\]

4.22 (a) Solve Prob. 4.21 by first getting an expression for n and then substituting the values from the problem into the expression obtained. (b) How could we still have solved Prob. 4.21 if we had not been told that the proportion voting for the candidate was 0.30?

(a) Starting with \(z\sqrt{p(1 - p)/n} = \bar p - p\) (see Prob. 4.21), we get
\[\frac{z^2 p(1 - p)}{n} = (\bar p - p)^2 \qquad \text{and} \qquad n = \frac{z^2 p(1 - p)}{(\bar p - p)^2}\]
Substituting the values from Prob. 4.21, we get
\[n = \frac{(1.64)^2(0.3)(0.7)}{(0.06)^2}\]
\[= \frac{(2.6896)(0.21)}{0.0036} = 156.89, \text{ or } 157 \text{ (the same as in Prob. 4.21)}\]

(b) If we had not been told that the proportion voting for the candidate was 0.30, we could estimate the largest value of n to achieve the precision required no matter what the actual value of p is. This is done by letting p = 0.5 (so that 1 - p = 0.5 also). Since p(1 - p) appears in the numerator of the formula for n (see part a) and this product is greatest when p and 1 - p both equal 0.5, the value of n is greatest. Thus
\[n = \frac{z^2 p(1 - p)}{(\bar p - p)^2} = \frac{1.64^2(0.5)(0.5)}{0.06^2} = \frac{(2.6896)(0.25)}{0.0036} = 186.8, \text{ or } 187\]
(instead of n = 157 when we were told that p = 0.30). In this and similar cases, trying to get an actual estimate of p does not greatly reduce the size of the required sample. When p is taken to be 0.5, the formula for n can be simplified to
\[n = \frac{z^2}{4(\bar p - p)^2} = \left(\frac{z}{2(\bar p - p)}\right)^2\]
Using this, we get
\[n = \left(\frac{1.64}{2(0.06)}\right)^2 = 13.67^2 = 186.8, \text{ or } 187\]

CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

4.23 (a) Under what conditions can we not use the normal distribution but can use the t distribution to find confidence intervals for the unknown population mean? (b) What is the relationship between the t distribution and the standard normal distribution? (c) What is the relationship between the z and t statistics for the theoretical sampling distribution of the mean? (d) What is meant by degrees of freedom?
(a) When the population is normally distributed but the population standard deviation \(\sigma\) is not known and the sample size n is smaller than 30, we cannot use the normal distribution for determining confidence intervals for the unknown population mean, but we can use the Student's t (or simply, the t) distribution.

(b) Like the standard normal distribution, the t distribution is bell-shaped and symmetrical about its zero mean, but it is platykurtic (see Sec. 2.4) or flatter than the standard normal distribution so that more of its area falls within the tails. While there is only one standard normal distribution, there is a different t distribution for each sample size n. However, as n becomes larger, the t distribution approaches the standard normal distribution until, when n ≥ 30, they are approximately equal.

(c) \(z = \dfrac{\bar X - \mu}{\sigma/\sqrt n}\) and is found in App. 3; \(t = \dfrac{\bar X - \mu}{s/\sqrt n}\) and is found in App. 5 for the degrees of freedom involved.

(d) Degrees of freedom (df) refer to the number of values we can choose freely. For example, if we deal with a sample of 2 and we know that the sample mean for these two values is 10, we can freely assign the value to only one of these two numbers. If one number is 8, the other number must be 12 (to get the mean of 10). Then we say that we have n - 1 = 2 - 1 = 1 df. Similarly, if n = 10, this means that we can freely assign a value to only 9 of the 10 values if we want to estimate the population mean, and so we have n - 1 = 10 - 1 = 9 df.

4.24 (a) How can you find the t value for 10% of the area in each tail for 9 df? (b) In what way are t values interpreted differently from z values? (c) Find the t value for 5, 2.5, and 0.5% of the area within each tail for 9 df. (d) Find the t value for 5, 2.5, and 0.5% of the area within each tail for a sample size, n, that is very large or infinite. How do these t values compare with their corresponding z values?

(a) The t value for 10% of the area within each tail is obtained by moving down the column headed 0.10 in App. 5 to 9 df. This gives the t value of 1.383.
By symmetry, 10% of the area under the $t$ distribution with 9 df also lies within the left tail, to the left of $t = -1.383$.
(b) The $t$ values given in App. 5 refer to the areas (probabilities) within the tail(s) of the $t$ distribution indicated by the degrees of freedom. However, the $z$ values given in App. 3 refer to the areas (probabilities) under the standard normal curve from the mean to the specified $z$ values (compare Example 4 with Example 8).
(c) Moving down the columns headed 0.05, 0.025, and 0.005 in App. 5 to 9 df, we get $t$ values of 1.833, 2.262, and 3.250, respectively. Because of symmetry, 5, 2.5, and 0.5% of the area within the left tail of the $t$ distribution for 9 df lie to the left of $t = -1.833$, $t = -2.262$, and $t = -3.250$, respectively.
(d) For sample sizes (and df) that are very large or infinite, $t_{0.05} = 1.645$, $t_{0.025} = 1.960$, and $t_{0.005} = 2.576$ (from the last row of App. 5). These coincide with the corresponding $z$ values in App. 3. Specifically, $t_{0.025} = 1.960$ means that 2.5% of the area under the $t$ distribution with $\infty$ df lies within the right tail, to the right of $t = 1.96$. Similarly, $z = 1.96$ gives (from App. 3) 0.4750 of the area under the standard normal curve from $\mu = 0$ to $z = 1.96$. Thus, for df $= n - 1 = \infty$, the $t$ distribution is identical to the standard normal curve.

4.25 A random sample of 25 with a mean of 80 and a standard deviation of 30 is taken from a population of 1000 that is normally distributed. Find (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals for the unknown population mean. (d) How do these results compare with those in Prob. 4.15?
(a) $t_{0.05} = 1.711$ for 24 df:
$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 80 \pm 1.711\frac{30}{\sqrt{25}} = 80 \pm 10.266$$
Thus $\mu$ is between 69.734 and 90.266 with a 90% level of confidence.
(b) $t_{0.025} = 2.064$ for 24 df:
$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 80 \pm 2.064\frac{30}{\sqrt{25}} = 80 \pm 12.384$$
Thus $\mu$ is between 67.616 and 92.384 with a 95% level of confidence.
(c) $t_{0.005} = 2.797$ for 24 df:
$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 80 \pm 2.797\frac{30}{\sqrt{25}} = 80 \pm 16.782$$
Thus $\mu$ is between 63.218 and 96.782 with a 99% level of confidence.
(d) The 90, 95, and 99% confidence intervals, as anticipated, are larger in this problem, where the $t$ distribution was used, than in Prob. 4.15, where the standard normal distribution was used.
However, the differences are not great because when $n = 25$, the $t$ distribution and the standard normal distribution are fairly similar. Note that in this problem we had to use the $t$ distribution because $s$ was given (and not $\sigma$, as in Prob. 4.15).

4.26 A random sample of $n = 9$ lightbulbs with a mean operating life of 300 h and a standard deviation $s$ of 45 h is picked from a large shipment of lightbulbs known to have a normally distributed operating life. (a) Find the 90% confidence interval for the unknown mean operating life of the entire shipment. (b) Sketch a figure for the results of part a.
(a) $t_{0.05} = 1.860$ for 8 df:
$$\mu = \bar{X} \pm t\frac{s}{\sqrt{n}} = 300 \pm 1.860\frac{45}{\sqrt{9}} = 300 \pm 27.9$$
Thus $\mu$ is approximately between 272 and 328 h with a 90% level of confidence.
(b) See Fig. 4-8.

4.27 A random sample of $n = 25$ with $\bar{X} = 80$ is taken from a population of 1000 with $\sigma = 30$. Suppose that we know that the population from which the sample is taken is not normally distributed. (a) Find the 95% confidence interval for the unknown population mean. (b) How does this result compare with the results of Probs. 4.15(b) and 4.25(b)?
(a) Since we know that the population from which the sample is taken is not normally distributed and $n < 30$, we can use neither the normal nor the $t$ distribution. We can apply Chebyshev's theorem, which states that regardless of the shape of the distribution, the proportion of observations (or area) falling within $K$ standard deviations of the mean is at least $1 - (1/K^2)$, for $K \geq 1$ (see Prob. 3.40). Setting $1 - (1/K^2) = 0.95$ and solving for $K$, we get $1/K^2 = 0.05$, $K^2 = 20$, and $K = 4.47$. Then
$$\mu = \bar{X} \pm K\frac{\sigma}{\sqrt{n}} = 80 \pm 4.47\frac{30}{\sqrt{25}} = 80 \pm 26.82$$
Thus $\mu$ is approximately between 53 and 107 with a 95% level of confidence.
(b) The 95% confidence interval using Chebyshev's theorem is much wider than that found when we could use the normal distribution [Prob. 4.15(b)] or the $t$ distribution [Prob. 4.25(b)]. For this reason, Chebyshev's theorem is seldom used to find confidence intervals for the unknown population mean.
However, it represents the only possibility short of increasing the sample size to at least 30 (so that the normal distribution can be used).

4.28 Under what conditions can we construct confidence intervals for the unknown population mean from a random sample drawn from a population using (a) The normal distribution? (b) The $t$ distribution? (c) Chebyshev's theorem?
(a) We can use the normal distribution (1) if the parent population is normal, $n \geq 30$, and $\sigma$ or $s$ is known; (2) if $n \geq 30$ (by invoking the central-limit theorem) and using $s$ as an estimate for $\sigma$; or (3) if $n < 30$ but $\sigma$ is given and the population from which the random sample is taken is known to be normally distributed.
(b) We can use the $t$ distribution (for the given degrees of freedom) when $n < 30$ but $\sigma$ is not given and the population from which the sample is taken is known to be normally distributed.
(c) If $n < 30$ but the population from which the random sample is taken is not known to be normally distributed, theoretically we should use neither the normal distribution nor the $t$ distribution. In such cases, either we should use Chebyshev's theorem or we should increase the size of the random sample to $n \geq 30$ (so as to be able to use the normal distribution). In reality, however, the $t$ distribution is used even in these cases.

Supplementary Problems

SAMPLING

4.29 (a) What does statistical inference refer to? (b) What are the names of the descriptive characteristics of populations and samples? (c) How can representative samples be obtained?
Ans. (a) Estimation and hypothesis testing (b) Parameters and statistics (c) By random sampling

4.30 (a) Starting from the third column and tenth row of App. 4 and reading horizontally, obtain a sample of 5 from 99 elements. (b) Starting from the seventh column and first row of App. 4 and reading vertically, obtain a sample of 10 from 400 elements.
Ans.
(a) 31, 13, 33, 67, 68 (b) 24, 54, 290, 218, 385, 130, 24, 72, 313, 397

SAMPLING DISTRIBUTION OF THE MEAN

4.31 How can we obtain the theoretical sampling distribution of the mean from a population which is (a) Finite? (b) Infinite?
Ans. (a) By taking all possible different samples of size $n$ from the population and then finding the mean of each sample (b) By (hypothetically) taking an infinite number of samples of size $n$ from the infinite population and then finding the mean of each sample

4.32 What is (a) the mean and (b) the standard error for a theoretical sampling distribution of the mean?
Ans. (a) $\mu_{\bar{X}} = \mu$, where $\mu$ is the mean of the parent population (b) $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the parent population and $n$ is the sample size; for finite populations of size $N$ where $n > 0.05N$, $\sigma_{\bar{X}} = (\sigma/\sqrt{n})\sqrt{(N - n)/(N - 1)}$

4.33 For a population of 1000 items, $\mu = 50$ and $\sigma = 10$. What are the mean and standard error of the theoretical sampling distribution of the mean for sample sizes of (a) 25 and (b) 81?
Ans. (a) $\mu_{\bar{X}} = 50$ units and $\sigma_{\bar{X}} = 2$ (b) $\mu_{\bar{X}} = 50$ units and $\sigma_{\bar{X}} = 1.07$

4.34 What is the shape of the theoretical sampling distribution of the mean for samples of (a) 10 if the parent population is normal? (b) 50 if the parent population is not normal? (c) On what was the answer to part b based?
Ans. (a) Normal (b) Approximately normal (c) The central-limit theorem

4.35 What is the $z$ statistic for (a) Random variable $X$? (b) The theoretical sampling distribution of $\bar{X}$?
Ans. (a) $z = (X - \mu)/\sigma$ (b) $z = (\bar{X} - \mu_{\bar{X}})/\sigma_{\bar{X}}$

4.36 What is the probability of $\bar{X}$ lying between 49 and 50 for a random sample of 36 from a population with $\mu = 48$ and $\sigma = 12$?
Ans. 0.1498, or 14.98%

4.37 What is the probability that the mean for a random sample of 144 accounts receivable drawn from a population of 2000 accounts with a mean of $10,000 and a standard deviation of $4000 will be between $9500 and $10,500?
Ans. 0.8813, or 88.13%

ESTIMATION USING THE NORMAL DISTRIBUTION

4.38 What are unbiased point estimators of $\mu$, $\sigma$, and $p$, respectively?
Ans. $\bar{X}$, $s$ [as defined in Eqs.
(2.10b) and (2.11b)], and $\bar{p}$

4.39 Using the standard normal distribution, state for $\mu$ (a) the 90%, (b) the 95%, and (c) the 99% confidence intervals.
Ans. (a) $\mu = \bar{X} \pm 1.64\sigma_{\bar{X}}$ (b) $\mu = \bar{X} \pm 1.96\sigma_{\bar{X}}$ (c) $\mu = \bar{X} \pm 2.58\sigma_{\bar{X}}$

... operating hours (c) $n$ would have had to be increased to 30 to justify the use of the normal distribution

4.45 For the binomial distribution, write the formula for (a) $\mu$ and $\sigma$, (b) $\mu_{\bar{p}}$ and $\sigma_{\bar{p}}$ when $n < 0.05N$, and (c) $\sigma_{\bar{p}}$ when $n > 0.05N$.
Ans. (a) $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$ (b) $\mu_{\bar{p}} = p$ and $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n}$ (c) $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n} \times \sqrt{(N - n)/(N - 1)}$

4.46 For a random sample of 36 graduate students in economics in a graduate economics program with 580 students, 8 students have an undergraduate degree in mathematics. Find the proportion of all graduate students at this university with an undergraduate major in mathematics at the 90% confidence level.
Ans. 0.11 to 0.33

4.47 A manufacturer of lightbulbs wants to estimate the proportion of defective lightbulbs within $\pm 0.1$ with a 95% degree of confidence. What is the minimum sample size required if previous experience indicates that the proportion of defective lightbulbs produced is 0.2?
Ans. 62

4.48 (a) Write down the expression for $n$ used to solve Prob. 4.47. (b) How could you still have solved Prob. 4.47 if the manufacturer did not know that $p = 0.2$?
Ans. (a) $n = z^2 p(1 - p)/(\bar{p} - p)^2$ (b) By letting $p = 0.5$, which gives $n = 97$

CONFIDENCE INTERVALS FOR THE MEAN USING THE t DISTRIBUTION

4.49 Find the $t$ value for 29 df for the following areas falling within the (right) tail of the $t$ distribution: (a) 10%, (b) 5%, (c) 2.5%, and (d) 0.5%.
Ans. (a) $t_{0.10} = 1.311$ (b) $t_{0.05} = 1.699$ (c) $t_{0.025} = 2.045$ (d) $t_{0.005} = 2.756$

4.50 Find the $z$ value for the following areas falling from the mean to the $z$ value under the standard normal curve: (a) 40%, (b) 45%, (c) 47.5%, and (d) 49.5%. (e) How do these $z$ values compare with the corresponding $t$ values found in Prob. 4.49?
Ans.
(a) $z = 1.28$ (b) $z = 1.65$ (c) $z = 1.96$ (d) $z = 2.58$ (e) Corresponding $z$ and $t$ values are very similar (compare $z = 1.28$ to $t = 1.311$, $z = 1.65$ to $t = 1.699$, $z = 1.96$ to $t = 2.045$, and $z = 2.58$ to $t = 2.756$)

4.51 A random sample of $n = 16$ with $\bar{X} = 50$ and $s = 10$ is taken from a very large population that is normally distributed. (a) Find the 95% confidence interval for the unknown population mean. (b) How would the answer have differed if $\sigma = 10$?
Ans. (a) 44.67 to 55.33 (using the $t$ distribution with 15 df) (b) 45.1 to 54.9 (using the standard normal distribution)

4.52 On a particular test for a very large statistics class, a random sample of $n = 4$ students has a mean grade $\bar{X} = 75$ and $s = 8$. The grades for the entire class are known to be normally distributed. For the unknown population mean of the grades, find (a) the 95% confidence interval and (b) the 99% confidence interval.
Ans. (a) Approximately from 62 to 88 (b) Approximately from 52 to 98

4.53 A random sample of $n = 16$ with $\bar{X} = 50$ and $s = 10$ is taken from a very large population that is not normally distributed. (a) Find the 95% confidence interval for the unknown population mean. (b) How is the answer in part a different from those of Prob. 4.51?
Ans. (a) 38 to 61 (using Chebyshev's theorem and $s$ as a rough estimate of $\sigma$) (b) The 95% confidence interval here is much wider than those found in Prob. 4.51

4.54 Indicate which distribution to use in order to find confidence intervals for the unknown population mean from a random sample taken from the population in the following cases: (a) $n = 36$ and $s = 10$, (b) $n = 20$ and $s = 10$ and the population is normally distributed, and (c) $n = 20$ and $s = 10$ and the population is not normally distributed.
Ans. (a) Normal distribution (invoking the central-limit theorem and using $s$ as an estimate of $\sigma$) (b) The $t$ distribution with 19 df (c) Chebyshev's theorem

Statistical Inference: Testing Hypotheses

5.1
TESTING HYPOTHESES

Testing hypotheses about population characteristics (such as $\mu$ and $\sigma$) is another fundamental aspect of statistical inference and statistical analysis. In testing a hypothesis, we start by making an assumption with regard to an unknown population characteristic. We then take a random sample from the population, and on the basis of the corresponding sample characteristic, we either accept or reject the hypothesis with a particular degree of confidence.
We can make two types of errors in testing a hypothesis. First, on the basis of the sample information, we could reject a hypothesis that is in fact true. This is called a type I error. Second, we could accept a false hypothesis and make a type II error.
We can control or determine the probability of making a type I error, $\alpha$. However, by reducing $\alpha$, we will have to accept a greater probability of making a type II error, $\beta$, unless the sample size is increased. $\alpha$ is called the level of significance, and $1 - \alpha$ is the level of confidence of the test.

EXAMPLE 1. Suppose that a firm producing lightbulbs wants to know if it can claim that its lightbulbs last 1000 burning hours, $\mu$. To do this, the firm can take a random sample of, say, 100 bulbs and find their average lifetime $\bar{X}$. The smaller the difference is between $\bar{X}$ and $\mu$, the more likely is acceptance of the hypothesis that $\mu = 1000$ burning hours at a specified level of significance, $\alpha$. By setting $\alpha$ at 5%, the firm accepts the calculated risk of rejecting a true hypothesis 5% of the time. By setting $\alpha$ at 1%, the firm would face a greater probability of accepting a false hypothesis, $\beta$.

5.2 TESTING HYPOTHESES ABOUT THE POPULATION MEAN AND PROPORTION

The formal steps in testing hypotheses about the population mean (or proportion) are as follows:
1. Assume that $\mu$ equals some hypothetical value $\mu_0$. This is represented by $H_0\colon \mu = \mu_0$ and is called the null hypothesis. The alternative hypotheses are then $H_1\colon \mu \neq \mu_0$ (read "$\mu$ is not equal to $\mu_0$"), $H_1\colon \mu > \mu_0$, or $H_1\colon \mu < \mu_0$,
depending on the problem.
2. Decide on the level of significance of the test (usually 5%, but sometimes 1%) and define the acceptance region and rejection region for the test using the appropriate distribution.
3. Take a random sample from the population and compute $\bar{X}$. If $\bar{X}$ (in standard deviation units) falls in the acceptance region, accept $H_0$; otherwise, reject $H_0$ in favor of $H_1$.

EXAMPLE 2. Suppose that the firm in Example 1 wants to test whether it can claim that the lightbulbs it produces last 1000 burning hours. The firm takes a random sample of $n = 100$ of its lightbulbs and finds that the sample mean $\bar{X} = 980$ h and the sample standard deviation $s = 80$ h. If the firm wants to conduct the test at the 5% level of significance, it should proceed as follows. Since $\mu$ could be equal to, larger than, or smaller than 1000, the firm should set up the null and alternative hypotheses as
$$H_0\colon \mu = 1000 \qquad H_1\colon \mu \neq 1000$$
Since $n > 30$, the sampling distribution of the mean is approximately normal (and we can use $s$ as an estimate of $\sigma$). The acceptance region of the test at the 5% level of significance is within $\pm 1.96$ under the standard normal curve and the rejection region is outside (see Fig. 5-1). Since the rejection region is in both tails, we have a two-tail test. The third step is to find the $z$ value corresponding to $\bar{X}$:
$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{980 - 1000}{80/\sqrt{100}} = \frac{-20}{8} = -2.5$$
Fig. 5-1 (labels: Rejection region, Acceptance region, Rejection region)
Since the calculated $z$ value falls in the rejection region, the firm should reject $H_0$, that $\mu = 1000$, and accept $H_1$, that $\mu \neq 1000$, at the 5% level of significance.

EXAMPLE 3. A firm wants to know with a 95% level of confidence if it can claim that the boxes of detergent it sells contain more than 500 g (about 1.1 lb) of detergent. From past experience the firm knows that the amount of detergent in the boxes is normally distributed. The firm takes a random sample of $n = 25$ and finds that $\bar{X} = 520$ g and $s = 75$ g.
Since the firm is interested in testing if $\mu > 500$ g, we have
$$H_0\colon \mu = 500 \qquad H_1\colon \mu > 500$$
Since the population distribution is normal but $n < 30$ and $\sigma$ is not known, we must use the $t$ distribution (with $n - 1 = 24$ degrees of freedom) to define the critical, or rejection, region of the test at the 5% level of significance. This is found from App. 5 (see Sec. 4.4) and is given in Fig. 5-2. This is a right-tail test. Finally, since
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{520 - 500}{75/\sqrt{25}} = \frac{20}{15} = 1.33$$
and it falls within the acceptance region, we accept $H_0$, that $\mu = 500$ g, at the 5% level of significance (or with a 95% level of confidence).
Fig. 5-2 (labels: Acceptance region, Rejection region)

EXAMPLE 4. In the past, 60% of the students entering a specialized college program received their degrees within 4 years. For the 1980 entering class of 36, only 15 received their degrees by 1984. To test if the 1980 class performed worse than previous classes, we first note that this problem involves the binomial distribution. However, since $n > 30$ and $np$ and $n(1 - p) > 5$, we can use the normal distribution (see Sec. 3.5), with $p$ (the proportion of successes) $= 0.60$. For the 1980 class, the proportion of successes $\bar{p} = 15/36 = 0.42$ and the standard error $\sigma_{\bar{p}} = \sqrt{p(1 - p)/n} = \sqrt{(0.6)(0.4)/36} = 0.08$. Since we would like to test if the 1980 class performed worse, we have
$$H_0\colon p = 0.60 \qquad H_1\colon p < 0.60$$
$$z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.42 - 0.60}{0.08} = -2.25$$
Since this is a left-tail test and 5% of the area under the standard normal curve lies to the left of $-1.64$ (see App. 3), we reject $H_0$ and conclude, at the 5% level of significance, that the 1980 class did perform worse than previous classes. However, if $\alpha = 1\%$, the critical region would be to the left of $z = -2.33$ and we would accept $H_0$.
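The test statistics in Examples 3 and 4 can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names are ours, the critical $t$ value of 1.711 for 24 df is copied from App. 5 rather than computed, and the $z$ critical values come from the standard library's `statistics.NormalDist`. Because the sketch does not round $\bar{p}$ to 0.42 or $\sigma_{\bar{p}}$ to 0.08, its $z$ value differs slightly in the second decimal from the hand calculation, but the decisions are the same.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_statistic(sample_mean, mu0, std_dev, n):
    """Standardized distance of the sample mean from the hypothesized mean."""
    return (sample_mean - mu0) / (std_dev / sqrt(n))


def proportion_z(p_bar, p0, n):
    """z statistic for a sample proportion under H0: p = p0."""
    return (p_bar - p0) / sqrt(p0 * (1 - p0) / n)


# Example 3: right-tail t test with n = 25 (24 df).
t_stat = one_sample_statistic(520, 500, 75, 25)
T_CRIT_24DF_05 = 1.711  # assumption: tabulated value taken from App. 5
reject_example_3 = t_stat > T_CRIT_24DF_05

# Example 4: left-tail z test on a proportion.
z_stat = proportion_z(15 / 36, 0.60, 36)
z_crit_5 = NormalDist().inv_cdf(0.05)  # lower 5% point of the standard normal
z_crit_1 = NormalDist().inv_cdf(0.01)  # lower 1% point of the standard normal
reject_example_4_at_5 = z_stat < z_crit_5
reject_example_4_at_1 = z_stat < z_crit_1

print(round(t_stat, 2), reject_example_3)
print(round(z_stat, 2), reject_example_4_at_5, reject_example_4_at_1)
```

Keeping the critical value for $t$ as a named constant makes it plain that it was read from a table; a statistics library could supply it instead if one is available.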
Problem 5.5 shows how to define the acceptance and rejection regions in the units of the problem instead of in standard deviation units. Problems 5.10 and 5.11 show how to find the operating-characteristic curve (OC curve), which gives the value of $\beta$ for various values of $\mu > \mu_0$. Problem 5.12 then shows how to find the power curve, which gives the value of $(1 - \beta)$ for various values of $\mu > \mu_0$.

5.3 TESTING HYPOTHESES FOR DIFFERENCES BETWEEN TWO MEANS OR PROPORTIONS

In many decision-making situations, it is important to determine whether the means or proportions of two populations are the same or different. To do this, we take a random sample from each population, and only if the difference in the sample means or proportions can be attributed to chance do we accept the hypothesis that the two populations have equal means or proportions.
If the two populations are normally distributed (or if both $n_1$ and $n_2 \geq 30$) and independent, then the sampling distribution of the difference between the sample means or proportions is also normal or approximately normal with standard error given by
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \quad \text{to test if } \mu_1 = \mu_2 \qquad (5.1)$$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}} \quad \text{to test if } p_1 = p_2 \qquad (5.2)$$
where
$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} \quad \text{(a weighted average of } \bar{p}_1 \text{ and } \bar{p}_2) \qquad (5.3)$$

EXAMPLE 5. A manager wants to determine at the 5% level of significance if the hourly wages for semiskilled workers are the same in two cities. In order to do this, she takes a random sample of hourly wages in both cities and finds that $\bar{X}_1 = \$6.00$, $\bar{X}_2 = \$5.40$, $s_1 = \$2.00$, and $s_2 = \$1.80$ for $n_1 = 40$ and $n_2 = 54$. The hypotheses to be tested are
$$H_0\colon \mu_1 = \mu_2 \text{ or } \mu_1 - \mu_2 = 0 \qquad H_1\colon \mu_1 \neq \mu_2 \text{ or } \mu_1 - \mu_2 \neq 0$$
This is a two-tail test and the acceptance region for $H_0$ lies within $\pm 1.96$ under the standard normal curve (see Fig. 5-1).
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{2.00^2}{40} + \frac{1.80^2}{54}} = \sqrt{0.1 + 0.06} = 0.4$$
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{0.6 - 0}{0.4} = 1.5$$
Since the calculated $z$ value falls within the acceptance region, we accept $H_0$, that $\mu_1 = \mu_2$, at the 5% level of significance. However,
if the two populations were known to be normally distributed but both $n_1$ and $n_2$ were less than 30 and it were assumed that $\sigma_1^2 = \sigma_2^2$ (but unknown), then the sampling distribution of the difference between the means would have a $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom (see Prob. 5.14).

EXAMPLE 6. A firm wants to determine at the 1% level of significance if the proportion of acceptable electronic components of a foreign supplier, $p_1$, is greater than for a domestic supplier, $p_2$. The firm takes a random sample from the shipment of each supplier and finds that $\bar{p}_1 = 0.9$ and $\bar{p}_2 = 0.7$ for $n_1 = 100$ and $n_2 = 80$. The firm sets up the following hypotheses:
$$H_0\colon p_1 = p_2 \qquad H_1\colon p_1 > p_2$$
This is a right-tail test and the rejection region for $H_0$ lies to the right of 2.33 under the standard normal curve.
$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{100(0.9) + 80(0.7)}{100 + 80} = \frac{146}{180} \approx 0.8$$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}} = \sqrt{\frac{(0.8)(0.2)}{100} + \frac{(0.8)(0.2)}{80}} = \sqrt{0.0036} = 0.06$$
Since
$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}} = \frac{0.2 - 0}{0.06} = 3.33$$
we reject $H_0$ and accept the hypothesis that $p_1 > p_2$ at the 1% level of significance.

5.4 CHI-SQUARE TEST OF GOODNESS OF FIT AND INDEPENDENCE

The $\chi^2$ (chi-square) distribution is used to test whether (1) the observed frequencies differ "significantly" from expected frequencies when more than two outcomes are possible; (2) the sampled distribution is binomial, normal, or other; and (3) two variables are independent.
The $\chi^2$ statistic calculated from the sample data is given by
$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e} \qquad (5.4)$$
where $f_0$ denotes the observed frequencies and $f_e$ the expected frequencies.
If the calculated $\chi^2$ is greater than the tabular value of $\chi^2$ at the specified level of significance and degrees of freedom (from App. 6), the null hypothesis $H_0$ is rejected in favor of the alternative hypothesis $H_1$.
The degrees of freedom for tests of goodness of fit (1 and 2) are given by
$$\text{df} = c - m - 1 \qquad (5.5)$$
where $c$ represents the number of categories and $m$, the number of population parameters estimated from sample statistics. The degrees of freedom for tests of independence, or contingency-table tests (3), are given by
$$\text{df} = (r - 1)(c - 1) \qquad (5.6)$$
where $r$ indicates the number of rows of the contingency table and $c$, the number of columns. The expected frequency for each cell of a contingency table is
$$f_e = \frac{\sum_r f_0 \sum_c f_0}{n} \qquad (5.7)$$
where $\sum_r$ and $\sum_c$ indicate the sum over the row and column, respectively, of the observed cell and $n$ represents the overall sample size.

EXAMPLE 7. In the past, 30% of the TVs sold by a store were small screen, 40% were medium, and 30% were large. In order to determine the inventory to maintain of each type of TV set, the manager takes a random sample of 100 recent purchases and finds that 20 were small screen, 40 were medium, and 40 were large. To test at the 5% level of significance the hypothesis that the past pattern of sales, $H_0$, still prevails, the manager proceeds as follows (see Table 5.1):
$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e} = \frac{(20 - 30)^2}{30} + \frac{(40 - 40)^2}{40} + \frac{(40 - 30)^2}{30} = 6.67$$
$$\text{df} = c - m - 1 = 3 - 0 - 1 = 2$$
Since no population parameter was estimated, $m = 0$. df $= 2$ means that if we know the value of 2 of the 3 classes (and the total), the third class is not "free" to vary. Since the calculated value of $\chi^2 = 6.67$ is greater than the tabular value of $\chi^2 = 5.99$ with $\alpha = 0.05$ and df $= 2$ (see App. 6), we reject $H_0$, that the past sales pattern still prevails.
When the expected frequency of a category is less than 5, the category should be combined with an adjacent one (see Prob. 5.18). For testing if the sampled distribution is binomial or normal, see Probs. 5.19 and 5.20.
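The goodness-of-fit calculation of Example 7 can be sketched in a few lines of Python. This is a minimal illustration, not a general routine: the function name is ours, the observed counts (20, 40, 40) and past shares (30%, 40%, 30%) are the figures assumed for Example 7, and the critical value 5.99 is copied from App. 6 rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f0 - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


observed = [20, 40, 40]           # small, medium, large screens in the sample
past_shares = [0.30, 0.40, 0.30]  # past pattern of sales (H0)
n = sum(observed)
expected = [share * n for share in past_shares]

chi2 = chi_square_statistic(observed, expected)
m = 0                             # no population parameter was estimated
df = len(observed) - m - 1

CHI2_CRIT_2DF_05 = 5.99           # assumption: tabulated value taken from App. 6
reject_h0 = chi2 > CHI2_CRIT_2DF_05

print(round(chi2, 2), df, reject_h0)
```

The same function applies to a contingency table once the expected cell frequencies have been computed from the row and column totals.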
Table 5.1 Observed and Expected Purchases of TV Sets by Screen Size

Screen Size              | Small | Medium | Large | Total
Observed pattern $f_0$   | 20    | 40     | 40    | 100
Past pattern $f_e$       | 30    | 40     | 30    | 100

EXAMPLE 8. A car dealer has collected the data shown in Table 5.2 on the number of foreign and domestic cars purchased by customers under 30 years old and 30 and above. To test at the 1% level of significance if the type of car bought (foreign or domestic) is independent of the age of the buyer, the dealer constructs a table of expected frequencies (Table 5.3). For the first cell in row 1 and column 1, we obtain
$$f_e = \frac{\sum_r f_0 \sum_c f_0}{n} = \frac{(70)(50)}{170} \approx 21$$
The other three expected frequencies can be obtained by subtraction from the row and column totals. Thus

Table 5.2 Contingency Table for Car Buyers

Age        | Foreign | Domestic | Total
Under 30   | 30      | 40       | 70
30 and up  | 20      | 80       | 100
Total      | 50      | 120      | 170

Table 5.3 Table of Expected Frequencies for the Observed Frequencies in Table 5.2

Age        | Foreign | Domestic | Total
Under 30   | 21      | 49       | 70
30 and up  | 29      | 71       | 100
Total      | 50      | 120      | 170

$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e} = \frac{(30 - 21)^2}{21} + \frac{(40 - 49)^2}{49} + \frac{(20 - 29)^2}{29} + \frac{(80 - 71)^2}{71} = 9.44$$
$$\text{df} = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$
Since the calculated value of $\chi^2$ exceeds the tabular value of $\chi^2$ with $\alpha = 0.01$ and df $= 1$ (see App. 6), we reject $H_0$, that age is not a factor in the type of car bought (and conclude that younger people seem more likely to buy foreign cars). When df $= 1$ but $n < 50$, a correction for continuity is made by using $(|f_0 - f_e| - 0.5)^2$ in the numerator of Eq. (5.4) (see Prob. 5.22).

5.5 ANALYSIS OF VARIANCE

The analysis of variance is used to test the null hypothesis that the means of two or more populations are equal versus the alternative that at least one of the means is different. The populations are assumed to be independently normally distributed and of equal variance. The steps are as follows:
1. Estimate the population variance from the variance between the sample means (MSA in Table 5.4).
2. Estimate the population variance from the variance within the samples (MSE in Table 5.4).
3.
Compute the $F$ ratio (MSA/MSE in Table 5.4):
$$F = \frac{\text{variance between the sample means}}{\text{variance within the samples}}$$
4. If the calculated $F$ ratio is greater than the tabular value of $F$ at the specified level of significance and degrees of freedom (from App. 7), the null hypothesis, $H_0$, of equal population means is rejected in favor of the alternative hypothesis, $H_1$.

Table 5.4 Analysis of Variance Table

Variation                                   | Sum of Squares                                                          | Degrees of Freedom | Mean Square                 | F Ratio
Between the means (explained by factor A)   | $\text{SSA} = r\sum(\bar{X}_j - \bar{\bar{X}})^2$                       | $c - 1$            | $\text{MSA} = \text{SSA}/(c - 1)$    | MSA/MSE
Within the samples (error or unexplained)   | $\text{SSE} = \sum\sum(X_{ij} - \bar{X}_j)^2$                           | $(r - 1)c$         | $\text{MSE} = \text{SSE}/[(r - 1)c]$ |
Total                                       | $\text{SST} = \sum\sum(X_{ij} - \bar{\bar{X}})^2 = \text{SSA} + \text{SSE}$ | $rc - 1$           |                             |

The preceding steps are formalized in Table 5.4, where
$$\bar{X}_j = \text{mean of sample } j \text{ composed of } r \text{ observations} = \frac{\sum_i X_{ij}}{r} \qquad (5.8)$$
$$\bar{\bar{X}} = \text{grand mean of all } c \text{ samples} = \frac{\sum_i \sum_j X_{ij}}{rc} \qquad (5.9)$$
$$\text{SSA} = \text{sum of squares explained by factor A} = r\sum(\bar{X}_j - \bar{\bar{X}})^2 \qquad (5.10)$$
$$\text{SSE} = \text{sum of squares of error unexplained by factor A} = \sum\sum(X_{ij} - \bar{X}_j)^2 \qquad (5.11)$$
$$\text{SST} = \text{total sum of squares} = \text{SSA} + \text{SSE} = \sum\sum(X_{ij} - \bar{\bar{X}})^2 \qquad (5.12)$$
Appendix 7 gives $F$ values for $\alpha = 0.05$ (the top number) and $\alpha = 0.01$ (the bottom or boldface number) for each pair of degrees of freedom:
$$\text{df of numerator} = c - 1 \qquad (5.13)$$
where $c$ is the number of samples, and
$$\text{df of denominator} = (r - 1)c \qquad (5.14)$$
where $r$ is the number of observations in each sample.

EXAMPLE 9. A company sells identical soap in three different wrappings at the same price. The sales for 5 months are given in Table 5.5. Sales data are normally distributed with equal variance. To test at the 5% level of significance whether the mean soap sales for each wrapping are equal or not (i.e., $H_0\colon \mu_1 = \mu_2 = \mu_3$ versus $H_1\colon \mu_1$, $\mu_2$, and $\mu_3$ are not equal), the company proceeds as follows:

Table 5.5 Five-Month Sales of Soap in Wrappings 1, 2, and 3

        | Wrapping 1 | Wrapping 2 | Wrapping 3
        | 87         | 78         | 90
        | 83         | 81         | 91
        | 79         | 79         | 84
        | 81         | 82         | 82
        | 80         | 80         | 88
Total   | 410        | 400        | 435

$$\bar{X}_1 = \frac{410}{5} = 82 \qquad \bar{X}_2 = \frac{400}{5} = 80 \qquad \bar{X}_3 = \frac{435}{5} = 87 \qquad \bar{\bar{X}} = \frac{410 + 400 + 435}{(5)(3)} = 83$$
$$\text{SSA} = 5[(82 - 83)^2 + (80 - 83)^2 + (87 - 83)^2] = 130$$
$\text{SSE} = (87 - 82)^2 + (83 - 82)^2 + (79 - 82)^2 + (81 - 82)^2 + (80 - 82)^2 + (78 - 80)^2 + (81 - 80)^2 + (79 - 80)^2 + (82 - 80)^2$
$+ (80 - 80)^2 + (90 - 87)^2 + (91 - 87)^2 + (84 - 87)^2 + (82 - 87)^2 + (88 - 87)^2 = 110$
$$\text{SST} = (87 - 83)^2 + (83 - 83)^2 + \cdots + (88 - 83)^2 = \text{SSA} + \text{SSE} = 240$$
The preceding data are used to construct Table 5.6 for the analysis of variance (ANOVA).

Table 5.6 ANOVA Table for Soap Wrappings

Variation                                 | Sum of Squares | Degrees of Freedom | Mean Square          | F Ratio
Explained by wrappings (between columns)  | SSA = 130      | $c - 1 = 2$        | MSA = 130/2 = 65     | MSA/MSE = 65/9.17 = 7.09
Error or unexplained (within columns)     | SSE = 110      | $(r - 1)c = 12$    | MSE = 110/12 = 9.17  |
Total                                     | SST = 240      | $rc - 1 = 14$      |                      |

Since the calculated value of $F = 7.09$ (from Table 5.6) exceeds the tabular value of $F = 3.88$ for $\alpha = 0.05$ and 2 and 12 degrees of freedom (see App. 7), we reject $H_0$, that the mean soap sales for each wrapping are the same, and accept $H_1$, that they are not the same.
The preceding procedure is referred to as one-way, or one-factor, analysis of variance. For two-way analysis of variance, see Probs. 5.26 and 5.27.

5.6 NONPARAMETRIC TESTING

Nonparametric testing is used when one or more of the assumptions of the previous tests have not been met. Usually the assumption in question is the normality of the distribution (the distribution of the data is unknown or the sample size is small). Nonparametric tests are often based on counting techniques that are easier to calculate and may be used for ordinal as well as quantitative data. These tests are inefficient if the distribution is known or the sample is large enough for a parametric test.
To test a hypothesis about the median of a population (analogous to a test of the population mean), the Wilcoxon signed rank test may be used:
1. For each observation, calculate the difference between the value and the hypothesized median.
2. Rank values according to the distance from the median, dropping zero differences.
3. The test statistic, $W$ = the sum of the ranks of the positive differences. This is compared to the critical values in App. 9.
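The three steps of the signed rank test can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is ours, tied distances are given the average of the ranks they occupy (a common convention that the steps above do not spell out), and the eight profit figures and the hypothesized median of 5 are illustrative data. The comparison with the critical values of App. 9 is left to the reader.

```python
def signed_rank_statistic(values, hypothesized_median):
    """Return (W, number of nonzero differences) for the signed rank test."""
    # Step 1: differences from the hypothesized median, dropping zeros.
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]

    # Step 2: rank by distance from the median, averaging ranks over ties.
    ordered = sorted(diffs, key=abs)
    rank_of_distance = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i+1 .. j share the same distance; assign their mean rank.
        rank_of_distance[abs(ordered[i])] = (i + 1 + j) / 2
        i = j

    # Step 3: W is the sum of the ranks attached to positive differences.
    w = sum(rank_of_distance[abs(d)] for d in diffs if d > 0)
    return w, len(diffs)


profits = [20, 35, 10, -5, -30, 5, 0, 15]  # illustrative data
w, n_nonzero = signed_rank_statistic(profits, 5)
print(w, n_nonzero)
```

Averaging tied ranks keeps the total of all ranks equal to $n(n + 1)/2$, which is what the tabulated critical values assume.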
The signed rank test can be adjusted slightly to test the equality of medians of more than two samples (analogous to ANOVA, but with no assumption of normality) in the Kruskal-Wallis test:
1. Rank all data as if from a single sample.
2. Add the ranks of each sample, $\sum R_j$.
3. The test statistic is
$$H = \frac{12}{n(n + 1)} \sum \frac{(\sum R_j)^2}{n_j} - 3(n + 1) \qquad (5.15)$$
If all sample sizes are at least 5, chi-square tables (App. 6) can be used with df $= c - 1$.
For a nonparametric test of goodness of fit, the Kolmogorov-Smirnov test compares cumulative probabilities of the data to a hypothesized distribution:
1. Arrange data from smallest value to largest value.
2. The proportion of data below each value is compared with the cumulative probability below that value from the hypothesized distribution.
3. The test statistic is the maximum difference found in step 2, which can be compared to the critical value.

EXAMPLE 10. A corporation has 8 subsidiaries with profits of 20, 35, 10, $-5$, $-30$, 5, 0, 15, respectively (in M$), and wants to know with 95% confidence if the median firm is making a profit of 5 M$. Since we have a small sample ($n < 30$) and no assumption of normality, a $t$ test cannot be used. We set the null and alternative hypotheses as
$$H_0\colon \text{Med} = 5 \qquad H_1\colon \text{Med} \neq 5$$
The steps for the signed rank test are listed in Table 5.7. Since $4 < W < 32$, we accept $H_0\colon \text{Med} = 5$ at the 5% significance level.

EXAMPLE 11. A store owner wants to determine at the 5% significance level whether sales are normally distributed with a mean of 10 units and a standard deviation of 3 units. Sales for a week are observed of 2, 8, 4, 18, 9, 11, and 13 units. The small sample precludes the use of the chi-square goodness-of-fit test, but the nonparametric Kolmogorov-Smirnov test may be used to test
$H_0$: normally distributed with $\mu = 10$, $\sigma = 3$; $H_1$: not normally distributed with $\mu = 10$, $\sigma = 3$
Table 5.7 Signed Rank Test

Profit | Difference | Rank        | Rank for Positive Differences
5      | 0          | N/A         |
10     | 5          | 1.5 (tied)  | 1.5
0      | $-5$       | 1.5 (tied)  |
$-5$   | $-10$      | 3.5 (tied)  |
15     | 10         | 3.5 (tied)  | 3.5
20     | 15         | 5           | 5
35     | 30         | 6           | 6
$-30$  | $-35$      | 7           |
       |            |             | $W = 16$

Ordered data values                 | 2     | 4     | 8     | 9     | 11    | 13    | 18
Proportion below, %                 | 14.29 | 28.57 | 42.86 | 57.14 | 71.43 | 85.71 | 100.00
Normal cumulative probability, %    | 0.38  | 2.27  | 25.24 | 36.94 | 63.06 | 84.13 | 99.62
Difference, %                       | 13.91 | 26.30 | 17.62 | 20.20 | 8.37  | 1.58  | 0.38

The maximum difference is 26.30% (0.2630), which is less than the critical value; therefore we accept the null hypothesis that sales are normally distributed with a mean of 10 and a standard deviation of 3.

Solved Problems

TESTING HYPOTHESES

5.1 (a) What is meant by testing a hypothesis? What is the general procedure? (b) What is meant by type I and type II errors? (c) What is meant by the level of significance? The level of confidence?
(a) Testing a hypothesis refers to the acceptance or rejection of an assumption made about an unknown characteristic of a population, such as a parameter or the shape or form of the population distribution. The first step in testing a hypothesis is to make an assumption about an unknown population characteristic. A random sample is then taken from the population, and on the basis of the corresponding sample characteristic, we accept or reject the hypothesis with a particular degree of confidence.
(b) Type I error refers to the rejection of a true hypothesis. Type II error refers to the acceptance of a false hypothesis. In statistical analysis, we can control or determine the probability of type I or type II errors. The probability of type I error is usually given by the Greek letter alpha ($\alpha$), while the probability of type II error is represented by beta ($\beta$). By specifying a smaller type I error, we increase the probability of a type II error. The only way to reduce both $\alpha$ and $\beta$ is to increase the sample size.
(c) The level of significance refers to the probability of rejecting a true hypothesis or committing a type I error ($\alpha$). The level of confidence (given by $1 - \alpha$) refers to the probability of accepting a true hypothesis.
In statistical work, the level of significance, $\alpha$, is usually set at 5%, so that the level of confidence, $1 - \alpha$, is 95%. Sometimes $\alpha = 1\%$ (so that $1 - \alpha = 99\%$).

5.2 (a) How can we test the hypothesis that a particular coin is balanced? (b) What is the meaning of type I and type II error in this case?
(a) To test the hypothesis that a particular coin is balanced, we can toss the coin a number of times and record the number of heads and tails. For example, we might toss the coin 20 times and get 9 heads instead of the expected 10. This, however, does not necessarily mean that the coin is unbalanced. Indeed, since 9 is "so close" to 10, we are "likely" to be dealing with a balanced coin. If, however, we get only 4 heads in 20 tosses, we are likely to be dealing with an unbalanced coin because the probability of getting 4 heads (and 16 tails) in 20 tosses with a balanced coin is very small indeed (see Sec. 3.3).
(b) Even though 9 heads in 20 tosses indicates in all likelihood a balanced coin, there is always a small probability that the coin is unbalanced. By accepting the hypothesis that the coin is balanced, we could thus be making a type II error. However, 4 heads in 20 tosses is very likely to mean an unbalanced coin. But by accepting the hypothesis that the coin is unbalanced, we must face the small probability that the coin is instead balanced, which would mean that we rejected a true hypothesis and made a type I error. In testing a hypothesis, the investigator can set the probability of rejecting a true hypothesis, $\alpha$, as small as desired. However, by increasing the "region of acceptance" of the hypothesis, the investigator would necessarily increase the probability of accepting a false hypothesis, or of making a type II error, $\beta$.

5.3 How can a producer of steel cables test that the breaking strength of the cables produced is (a) 5000 lb? (b) Greater than 5000 lb?
(c) Less than 5000 lb?

(a) The producer can test if the breaking strength of the steel cables produced is 5000 lb by taking a random sample of the cables and finding their mean breaking strength, $\bar{X}$. The closer $\bar{X}$ is to the hypothesized $\mu = 5000$ lb, the more likely the producer is to accept the hypothesis at the specified level of significance, $\alpha$.

(b) The producer may instead be interested in testing if the breaking strength of the cable exceeds 5000 lb (i.e., $\mu > 5000$ lb). To do this, once again, the producer takes a random sample of the cable produced and tests the mean breaking strength. The more $\bar{X}$ exceeds the hypothesized $\mu = 5000$ lb, the more likely the producer is to accept the hypothesis at the specified level of significance.

(c) To test that the breaking strength of the cable does not exceed 5000 lb, the producer finds the mean breaking strength of a random sample of the steel cables. The more $\bar{X}$ falls short of 5000 lb, the more likely the producer is to accept the hypothesis that the breaking strength of the steel cables is less than 5000 lb (i.e., $\mu < 5000$ lb), with a particular degree of confidence $1 - \alpha$.

TESTING HYPOTHESES ABOUT THE POPULATION MEAN AND PROPORTION

5.4 A producer of steel cables wants to test if the steel cables it produces have a breaking strength of 5000 lb. A breaking strength of less than 5000 lb would not be adequate, and to produce steel cables with breaking strengths of more than 5000 lb would unnecessarily increase production costs. The producer takes a random sample of 64 pieces and finds that the average breaking strength is 5100 lb and the sample standard deviation is 480 lb. Should the producer accept the hypothesis that its steel cable has a breaking strength of 5000 lb at the 5% level of significance?

Since $\mu$ could be equal to, greater than, or smaller than 5000 lb, we set up the null and alternative hypotheses as follows:

$$H_0: \mu = 5000 \text{ lb} \qquad H_1: \mu \neq 5000 \text{ lb}$$

Since $n > 30$, the sampling distribution of the mean is approximately normal (and we can use $s$ as an estimate of $\sigma$).
The acceptance region of the test at the 5% level of significance is within $\pm 1.96$ under the standard normal curve and the rejection or critical region is outside (see Fig. 5-3). Since the rejection region is in both tails, we have a two-tail test. The third step is to find the $z$ value corresponding to $\bar{X}$:

$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{5100 - 5000}{480/\sqrt{64}} = \frac{100}{60} = 1.67$$

Since the calculated value of $z$ falls within the acceptance region, the producer should accept the null hypothesis $H_0$ and reject $H_1$ at the 5% level of significance (or with a 95% level of confidence). Note that this does not "prove" that $\mu$ is indeed equal to 5000 lb. It only "proves" that there is no statistical evidence that $\mu$ is not equal to 5000 lb at the 5% level of significance.

[Fig. 5-3: rejection region | acceptance region | rejection region]

5.5 Define the rejection and acceptance regions for Prob. 5.4 in terms of pounds, the units of the problem.

To find the acceptance region (at the 5% level of significance) in terms of pounds, we proceed as in Sec. 4.4 by finding the 95% confidence interval about $\mu_0$:

$$\mu_0 \pm z\sigma_{\bar{X}} = \mu_0 \pm z\frac{s}{\sqrt{n}} = 5000 \pm 1.96\left(\frac{480}{\sqrt{64}}\right) = 5000 \pm 117.6$$

Thus, to accept $H_0$ at the 5% level of significance, $\bar{X}$ must have a value greater than 4882.4 lb and smaller than 5117.6 lb. The relationship between this and the result obtained in Prob. 5.4 is shown in Fig. 5-4.

[Fig. 5-4: rejection region | acceptance region | rejection region]

5.6 An army recruiting center knows from past experience that the weight of army recruits is normally distributed with a mean $\mu$ of 80 kg (about 176 lb) and a standard deviation $\sigma$ of 10 kg. The recruiting center wants to test, at the 1% level of significance, if the average weight of this year's recruits is above 80 kg. To do this, it takes a random sample of 25 recruits and finds that the average weight for this sample is 85 kg. How can this test be performed?
Since the center is interested in testing that $\mu > 80$ kg, it sets up the following hypotheses:

$$H_0: \mu = 80 \text{ kg} \qquad H_1: \mu > 80 \text{ kg}$$

(Some books state the null hypothesis as $H_0: \mu \leq 80$ kg, but the result is the same.) Since the parent population is normally distributed and $\sigma$ is known, the standard normal distribution can be used to define the critical region, or rejection region, of the test. With $H_1: \mu > 80$ kg, we have a right-tail test with the critical region to the right of $z = 2.33$ at the 1% level of significance (see App. 3 and Fig. 5-5). Then

$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{85 - 80}{10/\sqrt{25}} = \frac{5}{2} = 2.5$$

[Fig. 5-5: acceptance region | rejection region]

Since the calculated value of $z$ falls within the rejection region, we reject $H_0$ and accept $H_1$ (that $\mu > 80$ kg). This means that if $\mu = 80$ kg, the probability of getting a random sample from this population that gives $\bar{X} = 85$ kg is less than 1%. That would be an unusual sample indeed. Thus we reject $H_0$ at the 1% level of significance (i.e., we are 99% confident of making the right decision).

5.7 A government agency receives many consumer complaints that the boxes of detergent sold by a company contain less than the 20 oz of detergent advertised. To check the consumers' complaints, the agency purchases 9 boxes of the detergent and finds that $\bar{X} = 18$ oz and $s = 3$ oz. How can the agency conduct the test at the 5% level of significance if it knows that the amount of detergent in the boxes is normally distributed?

The agency can set up $H_0$ and $H_1$ as follows:

$$H_0: \mu = 20 \text{ oz} \qquad H_1: \mu < 20 \text{ oz}$$

(Some books set up the null hypothesis as $H_0: \mu \geq 20$ oz, but the result is the same.) Since the parent population is normal, $\sigma$ is not known, and $n < 30$, the $t$ distribution (with $n - 1 = 8$ df and $s$ as an estimate of $\sigma$) must be used to define the rejection region for this left-tail test at the 5% level of significance (see Fig. 5-6). Then

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{18 - 20}{3/\sqrt{9}} = \frac{-2}{1} = -2.0$$

Since the calculated $t$ value falls within the rejection region (to the left of $t = -1.860$ for 8 df), the agency should reject $H_0$ and accept the consumers' complaints, $H_1$.
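The decision reached in Prob. 5.7 can be checked numerically. The following is a minimal Python sketch, not part of the original solution: the function name `t_statistic` is hypothetical, and the two critical values for 8 df are assumed to be read from a standard $t$ table rather than computed.

```python
import math


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Data from Prob. 5.7: 9 boxes, sample mean 18 oz, s = 3 oz, advertised 20 oz.
t = t_statistic(sample_mean=18, mu0=20, s=3, n=9)

# Left-tail critical values for 8 df, assumed taken from a t table.
T_CRIT_5PCT = -1.860
T_CRIT_1PCT = -2.896

# Left-tail test: reject H0 when t lies to the left of the critical value.
reject_at_5 = t < T_CRIT_5PCT
reject_at_1 = t < T_CRIT_1PCT

print(f"t = {t:.2f}")
print(f"reject H0 at 5%: {reject_at_5}")
print(f"reject H0 at 1%: {reject_at_1}")
```

Keeping the critical values as named constants makes it easy to see how the same $t$ statistic leads to different decisions at different levels of significance.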
Note that if $\alpha$ had been set at 1%, the rejection region would lie to the left of $t = -2.896$, leading to the acceptance of $H_0$. Thus it is important to specify the level of significance before the test.

[Fig. 5-6: rejection region | acceptance region]

5.8 A hospital wants to test that 90% of the dosages of a drug it purchases contain 100 mg (1/1000 g) of the drug. To do this, the hospital takes a sample of $n = 100$ dosages and finds that only 85 of them contain the appropriate amount. How can the hospital test this at (a) $\alpha = 1\%$? (b) $\alpha = 5\%$? (c) $\alpha = 10\%$?

(a) This problem involves the binomial distribution. However, since $n > 30$ and $np$ and $n(1 - p) > 5$, we can use the normal distribution with $p = 0.90$. For the sample,

$$\bar{p} = \frac{85}{100} = 0.85 \qquad \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.9)(0.1)}{100}} = 0.03$$

$$z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.85 - 0.90}{0.03} = -1.67$$

Since the calculated $z$ falls within the acceptance region at the 1% level of significance (within $\pm 2.58$ standard deviation units), the hospital should accept $H_0$, that $p = 0.90$, at the 1% level of significance.

(b) At the 5% level of significance, the acceptance region for $H_0$ lies within $\pm 1.96$ standard deviation units, and so the hospital should accept $H_0$ and reject $H_1$ at the 5% level of significance as well.

(c) At the 10% level of significance, the acceptance region for $H_0$ lies within $\pm 1.64$ standard deviation units (see App. 3), and thus the hospital should reject $H_0$ and accept $H_1$, that $p \neq 0.90$. Note that greater values of $\alpha$ increase the rejection region for $H_0$ (i.e., increase the probability of acceptance of $H_1$). Furthermore, the greater is the value of $\alpha$ (i.e., the greater is the probability of rejecting $H_0$ when true), the smaller is $\beta$ (the probability of accepting a false hypothesis).

5.9 The government antipollution spokesperson asserts that more than 80% of the plants in the region meet the antipollution standards. An antipollution advocate does not believe the government claim. She takes a random sample of published data on pollution emission for 64 plants in the area and finds that 56 plants meet the pollution standards. (a) Do the sample data support the government claim at the 5% level of significance?
(b) Would the conclusion change if the sample had been 124, but with the sample proportion of the firms meeting the pollution standards the same as before?

(a) Here $H_0: p = 0.80$ and $H_1: p > 0.80$. The rejection region is to the right of 1.64 standard normal deviation units for $\alpha = 5\%$. For the sample,

$$\bar{p} = \frac{56}{64} = 0.88 \qquad \sigma_{\bar{p}} = \sqrt{\frac{(0.8)(0.2)}{64}} = 0.05 \qquad z = \frac{0.88 - 0.80}{0.05} = 1.6$$

Since $z = 1.6$, it falls within the acceptance region for $H_0$. This means that there is no statistical support for the government claim that $p > 0.8$ at the 5% level of significance.

(b) If the sample size had been 124 instead of 64, but $\bar{p} = 0.88$ as before,

$$\sigma_{\bar{p}} = \sqrt{\frac{(0.8)(0.2)}{124}} \approx 0.036 \qquad z = \frac{0.88 - 0.80}{0.036} \approx 2.22$$

and $z$ would fall in the rejection region for $H_0$ (so that there would be no evidence against the government claim that $p > 0.8$). Note that increasing $n$ (and holding everything else the same) increases the probability of accepting the government claim.

5.10 Find the probability of accepting $H_0$ for Prob. 5.6 if (a) $\mu = \mu_0 = 80$ kg, (b) $\mu = 82$ kg, (c) $\mu = 84$ kg, (d) $\mu = 85$ kg, (e) $\mu = 86$ kg, and (f) $\mu = 87$ kg.

(a) If $\mu = \mu_0 = 80$ kg, $\bar{X} = 85$ kg, $\sigma = 10$ kg, and $n = 25$, then

$$z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{85 - 80}{10/\sqrt{25}} = 2.5$$

The probability of accepting $H_0$ when $\mu = \mu_0 = 80$ kg is 0.9938 (by looking up the value of $z = 2.5$ in App. 3 and adding 0.5 to it). Therefore, the probability of rejecting $H_0$ when $H_0$ is in fact true equals $1 - 0.9938$, or 0.0062.

(b) If $\mu = 82$ kg instead of $\mu_0 = 80$ kg, $z = (85 - 82)/2 = 1.5$. Therefore, the probability of accepting $H_0$ when $H_0$ is false equals 0.9332 (by looking up the value of $z = 1.5$ in App. 3 and adding 0.5 to it).

(c) If $\mu = 84$ kg, $z = (85 - 84)/2 = 1/2$ and $\beta = 0.5 + 0.1915 = 0.6915$.

(d) If $\mu = 85$ kg, $z = 0$ and $\beta = 0.5$.

(e) If $\mu = 86$ kg, $z = (85 - 86)/2 = -1/2$ and $\beta = 0.5 - 0.1915 = 0.3085$.

(f) If $\mu = 87$ kg, $z = -1$ and $\beta = 0.5 - 0.3413 = 0.1587$.

5.11 (a) Draw a figure for the answers to Prob. 5.10 showing on the vertical axis the probability of accepting $H_0$ when $\mu = 80$ kg, 84 kg, 85 kg, 86 kg, and 87 kg. (b) What does this show? (c) What is the importance of knowing the value of $\beta$?

(a) See Fig. 5-7.

(b) The operating-characteristic (OC) curve in Fig.
5-7 shows the values of $\beta$ for various values of $\mu > \mu_0$. Note that the more the actual value of $\mu$ exceeds $\mu_0$, the smaller is $\beta$ (i.e., the probability of accepting $H_0$ when false).

(c) Knowing the value of $\beta$ is important if accepting a false hypothesis (type II error) leads to very damaging results, such as, for example, when a drug is accepted as effective when it is not. In such cases, we want to keep $\beta$ low, even if we have to accept a higher $\alpha$ (type I error). The only way to avoid this and reduce both $\alpha$ and $\beta$ is to increase the sample size.

[Fig. 5-7: OC curve]

5.12 (a) Draw a figure for the answers to Prob. 5.10 showing on the vertical axis the probability of rejecting $H_0$ for various values of $\mu > \mu_0$. What does this show? (b) How would the OC curve found in Prob. 5.11(a) and in part a of this problem look if the alternative hypothesis had been $H_1: \mu < \mu_0$?

(a) For each value of $\mu > \mu_0$, the probability of rejecting $H_0$ when $H_0$ is false is given by $1 - \beta$, where $\beta$ was found in Prob. 5.10(b) to part f. Joining these $1 - \beta$ points (starting with the value of $\alpha$), we get the power curve (see Fig. 5-8). The power curve shows the probability of rejecting $H_0$ for various values of $\mu > \mu_0$. Note that the more $\mu$ exceeds $\mu_0$, the greater is the power of the test (i.e., the greater is the probability of rejecting a false hypothesis).

[Fig. 5-8: power curve]

(b) For $H_1: \mu < \mu_0$, the OC curve (for an actual value of $\bar{X}$ and for various alternative values of $\mu$ [...]

Table 5.19 gives the expected frequencies. For the first cell,

$$f_e = \frac{\sum_r f_0 \sum_c f_0}{n} = \frac{(21)(25)}{37} \approx 14$$

Table 5.19 Expected Male and Female Workers over Age 65 in a Town
Age Group | Male | Female | Total
66-70 | 14 | 7 | 21
71 and over | 11 | 5 | 16
Total | 25 | 12 | 37

For the other cells, $f_e$ is found by subtraction from the row and column totals. $\text{df} = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. Since $\text{df} = 1$ and $n < 50$, a correction for continuity must be made to calculate $\chi^2$, as indicated in Eq.
(5.4a):

$$\chi^2 = \sum \frac{(|f_0 - f_e| - 0.5)^2}{f_e}$$

Thus

$$\chi^2 = \frac{(|17 - 14| - 0.5)^2}{14} + \frac{(|4 - 7| - 0.5)^2}{7} + \frac{(|8 - 11| - 0.5)^2}{11} + \frac{(|8 - 5| - 0.5)^2}{5} \approx 3.16$$

Since the calculated $\chi^2$ is larger than the tabular value of $\chi^2$ (2.71) with $\alpha = 0.10$ and $\text{df} = 1$, we reject $H_0$, that males and females over 65 continue to work in this town independently of whether they are above or below 70 years of age. The proportion of workers is significantly higher for males in the 66 to 70 age group and for females in the 71-plus age group. Note that the same adjustment indicated by Eq. (5.4a) is also made for tests of the goodness of fit when $\text{df} = 1$ and $n < 50$.

ANALYSIS OF VARIANCE

5.23 Table 5.20 gives the output for 8 years of an experimental farm that used each of 4 fertilizers. Assume that the outputs with each fertilizer are normally distributed with equal variance. (a) Find the mean output for each fertilizer and the grand mean for all the years and for all four fertilizers. (b) Estimate the population variance from the variance between the means or columns. (c) Estimate the population variance from the variance within the samples or columns. (d) Test the hypothesis that the population means are the same at the 5% level of significance.

[Table 5.20: entries not recoverable]

(a) The mean for each fertilizer is $\bar{X}_j = \sum_i X_{ij}/r$ and the grand mean is $\bar{X} = \sum_i \sum_j X_{ij}/(rc)$ [numerical values not recoverable].

(b)
$$\hat{\sigma}^2 = \frac{r \sum_j (\bar{X}_j - \bar{X})^2}{c - 1}$$

where $\bar{X}_j$ is a sample or column mean, $\bar{X}$ is the grand mean, $r$ is the number of observations in each sample, and $c$ is the number of samples. Then

$$\hat{\sigma}^2 = \frac{112}{3} = 37.33$$

which is an estimate of population variance from the variance between the means or columns.

(c) An estimate of the population variance from the variance within the samples or columns is obtained by averaging the four sample variances:
$$S_j^2 = \frac{\sum_i (X_{ij} - \bar{X}_j)^2}{r - 1} \qquad j = 1, 2, 3, 4$$

$$\hat{\sigma}^2 = \frac{S_1^2 + S_2^2 + S_3^2 + S_4^2}{4} = \frac{20.57 + 29.71 + 30.86 + 22.57}{4} = 25.93$$

A more concise way of expressing the above is

$$\hat{\sigma}^2 = \frac{\sum_i \sum_j (X_{ij} - \bar{X}_j)^2}{(r - 1)c} = \frac{144 + 208 + 216 + 158}{28} = \frac{726}{28} = 25.93$$

(d)
$$F = \frac{\text{variance between sample means}}{\text{variance within samples}} = \frac{37.33}{25.93} = 1.44$$

The value of $F$ from App. 7 for $\alpha = 0.05$ and $c - 1 = 3$ df in the numerator and $(r - 1)c = 28$ df in the denominator is 2.95. Since the calculated value of $F$ is smaller than the tabular value, we accept $H_0$, that the population means are the same.

5.24 (a) From the results obtained in Prob. 5.23, find the value of SSA, SSE, and SST; the degrees of freedom for SSA, SSE, and SST; and MSA, MSE, and the $F$ ratio. (b) From the results in part a, construct an ANOVA table similar to Table 5.4. (c) Conduct the analysis of variance and draw a figure showing the acceptance and rejection regions for $H_0$.

(a)
$$\text{SSA} = r \sum_j (\bar{X}_j - \bar{X})^2 = 112 \quad [\text{from Prob. 5.23(b)}]$$
$$\text{SSE} = \sum_i \sum_j (X_{ij} - \bar{X}_j)^2 = 726 \quad [\text{from Prob. 5.23(c)}]$$
$$\text{SST} = \sum_i \sum_j (X_{ij} - \bar{X})^2 = \text{SSA} + \text{SSE} = 112 + 726 = 838$$

The df of SSA $= c - 1 = 4 - 1 = 3$; df of SSE $= (r - 1)c = (8 - 1)(4) = 28$; and df of SST $= rc - 1 = 32 - 1 = 31$, which is the same as the df of SSA plus the df of SSE.

(b) See Table 5.21.

(c) The hypotheses to be tested are

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \quad \text{versus} \quad H_1: \mu_1, \mu_2, \mu_3, \mu_4 \text{ are not equal}$$

Since the calculated value of $F = 1.44$ is smaller than the tabular value of $F = 2.95$ with $\alpha = 0.05$ and df $= 3$ and 28, we accept $H_0$ (see Fig. 5-13), that is, we accept the null hypothesis that $\mu_1 = \mu_2 = \mu_3 = \mu_4$. Since we were told (in Prob. 5.23) that the populations were normal with equal variance, we could view the four samples as coming from the same population. Note that the
MSE is a good estimate of $\sigma^2$ whether or not $H_0$ is true. However, MSA is about equal to MSE only if $H_0$ is true (so that $F \approx 1$). Note that the $F$ distribution is continuous and is used here for a right-tail test.

Table 5.21 One-Way ANOVA Table for Fertilizer Experiment
Variation | Sum of Squares | Degrees of Freedom | Mean Square | F Ratio
Explained by fertilizer (between columns) | SSA = 112 | c - 1 = 3 | MSA = 37.33 | MSA/MSE = 1.44
Error or unexplained (within columns) | SSE = 726 | (r - 1)c = 28 | MSE = 25.93 |
Total | SST = 838 | rc - 1 = 31 | |

[Fig. 5-13: acceptance region | rejection region]

5.25 Table 5.22 gives the outputs of an experimental farm that used each of four fertilizers and three pesticides such that each plot of land had an equal probability of receiving each fertilizer-pesticide combination (completely randomized design). (a) Find the average output for each fertilizer $\bar{X}_{\cdot j}$, for each pesticide $\bar{X}_{i \cdot}$, and for the sample as a whole $\bar{X}$. (b) Find the total sum of squares, SST, the sum of squares for fertilizer or factor A, SSA, for pesticides or factor B, SSB, and for the error or unexplained residual, SSE. (c) Find the degrees of freedom for SSA, SSB, SSE, and SST. (d) Find MSA, MSB, MSE, MSA/MSE, and MSB/MSE.

[Table 5.22 Output with 4 Fertilizers and 3 Pesticides: entries not recoverable]

(a) The column mean for each fertilizer is given by

$$\bar{X}_{\cdot j} = \frac{\sum_i X_{ij}}{r}$$

The row mean for each pesticide is given by

$$\bar{X}_{i \cdot} = \frac{\sum_j X_{ij}}{c}$$

The grand mean is given by

$$\bar{X} = \frac{\sum_i \sum_j X_{ij}}{rc}$$

The subscripted dots signify that more than one factor is being considered. The results are shown in Table 5.23.

[Table 5.23 Output with 4 Fertilizers and 3 Pesticides (with Row, Column, and Grand Means): cell entries not recoverable; column (fertilizer) means 14, 10, 8, 4; row (pesticide) means 12, 9, 6; grand mean 9]

(b)
$$\text{SST} = \sum_i \sum_j (X_{ij} - \bar{X})^2 = 161 + 11 + 5 + 89 = 266$$

$$\text{SSA} = r \sum_j (\bar{X}_{\cdot j} - \bar{X})^2 \quad \text{(between-column variations)}$$
$$= 3[(14 - 9)^2 + (10 - 9)^2 + (8 - 9)^2 + (4 - 9)^2] = 3(25 + 1 + 1 + 25) = 156$$

$$\text{SSB} = c \sum_i (\bar{X}_{i \cdot} - \bar{X})^2 \quad \text{(between-row variations)}$$
$$= 4[(12 - 9)^2 + (9 - 9)^2 + (6 - 9)^2] = 4(9 + 0 + 9) = 72$$

$$\text{SSE} = \text{SST} - \text{SSA} - \text{SSB} = 266 - 156 - 72 = 38$$

(c) df of SSA $= c - 1 = 3$; df of SSB $= r - 1 = 2$; df of SSE $= (r - 1)(c - 1) = 6$; df of SST $= rc - 1 = 11$.

(d)
$$\text{MSA} = \frac{\text{SSA}}{c - 1} = \frac{156}{3} = 52 \qquad \text{MSB} = \frac{\text{SSB}}{r - 1} = \frac{72}{2} = 36 \qquad \text{MSE} = \frac{\text{SSE}}{(r - 1)(c - 1)} = \frac{38}{6} = 6.33$$

$F$ ratio for factor A (fertilizer) $= \text{MSA}/\text{MSE} = 52/6.33 = 8.21$

$F$ ratio for factor B (pesticides) $= \text{MSB}/\text{MSE} = 36/6.33 = 5.69$

5.26 (a) From the results of Prob. 5.25, construct an ANOVA table similar to Table 5.4. (b) Test at the 1% level of significance the hypothesis that the means for factor A populations (fertilizers) are identical. (c) Test at the 1% level of significance the hypothesis that the means for factor B populations (pesticides) are identical.

(a) See Table 5.24.

Table 5.24 Two-Factor ANOVA Table for Effect of Fertilizers and Pesticides on Output
Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Explained by fertilizer (between columns) | SSA = 156 | c - 1 = 3 | MSA = 52 | MSA/MSE = 8.21
Explained by pesticide (between rows) | SSB = 72 | r - 1 = 2 | MSB = 36 | MSB/MSE = 5.69
Error or unexplained | SSE = 38 | (r - 1)(c - 1) = 6 | MSE = 6.33 |
Total | SST = 266 | rc - 1 = 11 | |

(b) The hypotheses to be tested are

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \quad \text{versus} \quad H_1: \mu_1, \mu_2, \mu_3, \mu_4 \text{ are not all equal}$$

where $\mu$ refers to the various means for factor A (fertilizer) populations. For factor A, $F = 9.78$ (from App. 7 for degrees of freedom 3 (numerator) and 6 (denominator) and $\alpha = 0.01$). Since the calculated value of $F = 8.21$ (from Table 5.24) is less than the tabular value of $F$, we accept $H_0$, that the means for factor A (fertilizer) populations are equal.

(c) The second set of hypotheses to be tested consists of

$$H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{versus} \quad H_1: \mu_1, \mu_2, \mu_3 \text{ are not all equal}$$

but now $\mu$ refers to the various means for factor B (pesticide) populations. For factor B, $F = 10.9$ (from App. 7 for degrees of freedom 2 and 6 and $\alpha = 0.01$). Since the calculated value of $F = 5.69$ (from Table 5.24) is less than the tabular value of $F$, we accept $H_0$, that the means for factor B (pesticide) populations are also equal.
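The two $F$ ratios used in Probs. 5.25 and 5.26 follow mechanically from the sums of squares, which the following minimal Python sketch illustrates. The function name `two_factor_anova` and the dictionary keys are hypothetical, and the tabular $F$ values are assumed to come from a standard $F$ table (approximately 9.78 for 3 and 6 df, and approximately 10.9 for 2 and 6 df, at $\alpha = 0.01$).

```python
def two_factor_anova(ssa, ssb, sst, r, c):
    """Mean squares and F ratios for two-factor ANOVA without replication.

    r is the number of rows (factor B levels), c the number of columns
    (factor A levels). SSE is obtained as the residual SST - SSA - SSB.
    """
    sse = sst - ssa - ssb
    df_a, df_b, df_e = c - 1, r - 1, (r - 1) * (c - 1)
    msa, msb, mse = ssa / df_a, ssb / df_b, sse / df_e
    return {
        "SSE": sse,
        "MSA": msa,
        "MSB": msb,
        "MSE": mse,
        "F_A": msa / mse,
        "F_B": msb / mse,
    }


# Sums of squares from Prob. 5.25: 4 fertilizers (columns), 3 pesticides (rows).
result = two_factor_anova(ssa=156, ssb=72, sst=266, r=3, c=4)

# Tabular F values at alpha = 0.01, assumed read from an F table.
F_CRIT_A = 9.78  # 3 and 6 degrees of freedom
F_CRIT_B = 10.9  # 2 and 6 degrees of freedom

# Right-tail test: accept H0 when the calculated F is below the tabular F.
accept_a = result["F_A"] < F_CRIT_A
accept_b = result["F_B"] < F_CRIT_B

print(f"F_A = {result['F_A']:.2f}, accept H0 for factor A: {accept_a}")
print(f"F_B = {result['F_B']:.2f}, accept H0 for factor B: {accept_b}")
```

Computing SSE as a residual mirrors the hand calculation, so a mistake in SST, SSA, or SSB shows up immediately as an implausible MSE.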
Note that in two-factor analysis of variance (with an ANOVA table similar to Table 5.24) we can test two null hypotheses, one for factor A and one for factor B.

5.27 Table 5.25 gives the first-year earnings (in thousands of dollars) of students with master's degrees from 5 schools and for 3 class rankings at graduation. Test at the 5% level of significance that the means are identical (a) for school populations and (b) for class-ranking populations.

[Table 5.25 First-Year Earnings of MA Graduates of 5 Schools and 3 Class Ranks (in Thousands of Dollars): entries not recoverable]

(a) The hypotheses to be tested are

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \quad \text{versus} \quad H_1: \mu_1, \mu_2, \mu_3, \mu_4, \mu_5 \text{ are not equal}$$

where $\mu$ refers to the various means for factor A (school) populations.

$$\text{SST} = \sum_i \sum_j (X_{ij} - \bar{X})^2 = 194$$

$$\text{SSA} = r \sum_j (\bar{X}_{\cdot j} - \bar{X})^2 \quad \text{(between-column variations)}$$
$$= 3[(19 - 14)^2 + (16 - 14)^2 + (13 - 14)^2 + (12 - 14)^2 + (10 - 14)^2] = 150$$

$$\text{SSB} = c \sum_i (\bar{X}_{i \cdot} - \bar{X})^2 = 5[(16 - 14)^2 + (14 - 14)^2 + (12 - 14)^2] = 40$$

$$\text{SSE} = \text{SST} - \text{SSA} - \text{SSB} = 194 - 150 - 40 = 4$$

These results are summarized in Table 5.26. From App. 7, $F = 3.84$ for degrees of freedom 4 and 8 and $\alpha = 0.05$. Since the calculated $F = 75$, we reject $H_0$ and accept $H_1$, that the population means of first-year earnings for the 5 schools are different.

Table 5.26 Two-Factor ANOVA Table for First-Year Earnings
Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Explained by schools (A) (between columns) | SSA = 150 | c - 1 = 4 | MSA = 37.5 | MSA/MSE = 37.5/0.5 = 75
Explained by ranking (B) (between rows) | SSB = 40 | r - 1 = 2 | MSB = 20 | MSB/MSE = 20/0.5 = 40
Error or unexplained | SSE = 4 | (r - 1)(c - 1) = 8 | MSE = 0.5 |
Total | SST = 194 | rc - 1 = 14 | |

(b) The hypotheses to be tested are $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$:
$\mu_1, \mu_2, \mu_3$ are not equal, where $\mu$ refers to the various means for factor B (class-ranking) populations. From Table 5.26, we get that the calculated value of $F = \text{MSB}/\text{MSE} = 40$. Since this is larger than the tabular value of $F = 4.46$ for df 2 and 8 and $\alpha = 0.05$, we reject $H_0$ and accept $H_1$, that the population means of first-year earnings for the 3 class rankings are different. Thus the type of school and class ranking are both statistically significant at the 5% level in explaining differences in first-year earnings. The preceding analysis implicitly assumes that the effects of the two factors are additive (i.e., that there is no interaction between them).

NONPARAMETRIC TESTING

5.28 (a) What are nonparametric tests? (b) When would one want to use a nonparametric test? (c) What are the advantages and disadvantages of nonparametric tests?

(a) Nonparametric tests require fewer assumptions to establish the validity of their results. Parametric tests involve assumptions about the specific distribution that the data follow, as well as the structure of the data-generating process.
Nonparametric tests allow the researcher to relax the assumptions regarding the distribution of the data and/or the functional form of the underlying process.

(b) Nonparametric tests should be used only when one is uncertain about the assumptions behind the parametric test. The typical situation for using a nonparametric test arises with a small sample size. If the data values are not normally distributed, a small sample would negate the assumption that the sample mean is normally distributed with a mean $\mu$ and a variance $\sigma^2/n$.

(c) A nonparametric test is advantageous because of its ease of calculation and its flexibility. There are nonparametric tests appropriate for most scales of measurement, and for nonstandard functional forms and distributions. Also, the nonparametric goodness-of-fit test does not have the researcher choose class intervals to compare observed and expected values. The chi-square goodness-of-fit test is often not robust to changes in such specifications.

The disadvantages of a nonparametric test center around the loss of information. Nonparametric tests are based on counting rules, such as ranking, and therefore summarize magnitudes into a rank statistic. This uses only the relative position of the ordered data. If the standard assumptions hold, a parametric test will be more efficient, and therefore more powerful, for a given data set.

5.29 A marketing firm is deciding whether food additive B is better tasting than food additive A. A focus group of 10 individuals rate the taste on a scale of 1 to 10. Results of the focus group are listed in Table 5.27. Test at the 5% significance level the null hypothesis that food additive B is no better tasting than food additive A.

Table 5.27 Food Additive Taste Comparison
Individual ID No. | Additive A Rating | Additive B Rating
1 | 5 | 6
2 | 7 | 8
3 | [illegible] | [illegible]
4 | 3 | 6
5 | 6 | 8
6 | 6 | 6
7 | 5 | 4
8 | [illegible] | 8
9 | 7 | 8
10 | [illegible] | [illegible]

This is a small sample with ratings rather than quantitative variables; therefore the usual assumptions do not hold.
We proceed with the nonparametric test. Since we have two samples with data that are paired (two ratings per person), we first take the difference of the two ratings for each person to test the hypotheses

$$H_0: \text{Med}_A - \text{Med}_B \geq 0 \qquad H_1: \text{Med}_A - \text{Med}_B < 0$$

The steps are shown in Table 5.28. Since $W < 11$, we reject $H_0: \text{Med}_A - \text{Med}_B \geq 0$ and accept $H_1: \text{Med}_A - \text{Med}_B < 0$ at the 5% significance level.

[Table 5.28 Ratings Signed Rank Test (ordered differences and ranks): entries not recoverable]

5.30 Data from the World Bank's World Development Indicators reports that for 9 Latin American countries, male illiteracy is as follows (in percent): Argentina 3, Bolivia 8, Brazil 15, Chile 4, Colombia 9, Ecuador 7, Peru 6, Uruguay 3, Venezuela 7. (a) Test the null hypothesis that the median illiteracy rate is 8% at the 10% significance level. (b) Test the null hypothesis that the median illiteracy rate is greater than or equal to 8% at the 10% significance level.

(a) Calculations for the signed rank test are given in Table 5.29 to test $H_0: \text{Med} = 8$ versus $H_1: \text{Med} \neq 8$. The critical values for the signed rank test (App. 9) for a two-tail test at the 10% significance level and $n = 9$ are 9 and 36. Since $9 < W < 36$, we accept the null hypothesis that the median illiteracy rate for South American countries is equal to 8.

(b) To test $H_0: \text{Med} \geq 8$ versus $H_1: \text{Med} < 8$, we would expect a higher value of $W$ for a higher population median. Therefore this is a one-tail test with the rejection region in the left tail. The critical value from App. 9 is 11. Since [...] 11.34, we reject the null hypothesis that the median male illiteracy rates of all three groups are equal.

(b) For testing two samples, one can use the same ranking method, but can compare the sum of ranks of the smallest of the two groups with the critical values in the two-sample section of the Wilcoxon statistics in App. 9.
Since all African rankings fell above the rankings of South American and Asian countries, the rankings from Table 5.12 may be used.

$$W = \sum R = 56.5$$

From the table in App. 9, the critical values at the 5% significance level are 41 and 78. Since $41 < W < 78$, we accept the null hypothesis that the median male illiteracy rates in South America and Asia are equal.

Using the African male illiteracy rates from Prob. 5.30, test the null hypothesis at the 10% significance level that the illiteracy rates in Africa follow the continuous uniform distribution (a) between 25 and 80, (b) between 25 and 100.

(a) The continuous uniform distribution has equal value of the density function at each point between 25 and 80. To calculate the probability of being between values $a$ and $b$, one can take the area under the density function: $P(a < X < b) = (b - a)/(80 - 25)$, where the denominator is the difference between the upper and lower bound. Since we have a small sample size, we will use the Kolmogorov-Smirnov goodness-of-fit test.

[Table: Ordered data values | Proportion below, % | Uniform cumulative probability, % | Difference, %; entries not recoverable]

The maximum difference is 29.5% (0.295), which is less than the critical value of 0.411 (App. 10); therefore we accept the null hypothesis that illiteracy rates in Africa follow the continuous uniform distribution between 25 and 80.

(b) This continuous uniform distribution has equal value of the density function at each point between 25 and 100 under the null: $P(a < X < b) = (b - a)/(100 - 25)$. The calculations are as follows:

[Table: Ordered data values | Proportion below, % | Uniform cumulative probability, % | Difference, %; entries not recoverable]

The maximum difference is 44.8% (0.448), which is greater than the critical value of 0.411; therefore we reject the null hypothesis that illiteracy rates in Africa follow the continuous uniform distribution between 25 and 100.

Repeat the test from Prob. 5.19 with the Kolmogorov-Smirnov goodness-of-fit test to test the $H_0$: data are from the binomial
distribution with probability of acceptance equal to 0.4.

Calculations are given in Table 5.31. The largest difference is 0.058 in absolute value. The critical value from the table for $n = 100$ is 0.136 at the 5% level of significance. Since $0.058 < 0.136$, we accept the hypothesis that the distribution of college acceptances follows the binomial distribution with a probability of acceptance of 0.4.

Table 5.31 Kolmogorov-Smirnov Goodness-of-Fit Test
Number of Acceptances | Frequency | Relative Frequency | Binomial Probabilities | Cumulative Relative Frequency (Observed) | Cumulative Probability (Expected) | Difference
0 | 25 | 0.25 | 0.216 | 0.25 | 0.216 | 0.034
1 | 34 | 0.34 | 0.432 | 0.59 | 0.648 | -0.058
2 | 31 | 0.31 | 0.288 | 0.90 | 0.936 | -0.036
3 | 10 | 0.10 | 0.064 | 1.00 | 1.000 | 0.00

Supplementary Problems

TESTING HYPOTHESES

5.33 (a) What do we call the error of accepting a false hypothesis? Of rejecting a true hypothesis? (b) What symbol is usually used for the probability of type I error? What is another name for this? (c) What is the symbol conventionally used for the probability of type II error? (d) What is the level of confidence? (e) If $\alpha$ is reduced from 5 to 1%, what happens to $\beta$?
Ans. (a) Type II error; type I error (b) $\alpha$; level of significance (c) $\beta$ (d) $1 - \alpha$ (e) $\beta$ increases

5.34 Having set $\alpha = 5\%$, when is a graduate school more likely to accept the hypothesis that the average Graduate Record Examination (GRE) scores of its entering class (a) Equal 600? (b) Are larger than 600? (c) Are smaller than 600?
Ans. (a) The closer the sample mean, $\bar{X}$, is to 600 (b) The more $\bar{X} > 600$ (c) The more $\bar{X} < 600$

TESTING HYPOTHESES ABOUT THE POPULATION MEAN AND PROPORTION

5.35 An aircraft manufacturer needs to buy aluminum sheets of 0.05 in. in thickness. Thinner sheets would not be appropriate, and thicker sheets would be too heavy. The aircraft manufacturer takes a random sample of 100 sheets from a supplier of aluminum sheets and finds that their average thickness is 0.048 in. and their standard deviation is 0.01 in.
Should the aircraft manufacturer buy the aluminum sheets from this supplier in order to make the decision at the 5% level of significance?
Ans. No

5.36 Define the acceptance region for Prob. 5.35 in inches.
Ans. 0.04804 to 0.05196 in.

5.37 A navy recruiting center knows from past experience that the height of recruits is normally distributed with a mean $\mu$ of 180 cm (1 cm = 1/100 m) and a standard deviation $\sigma$ of 10 cm. The recruiting center wants to test at the 1% level of significance the hypothesis that the average height of this year's recruits is above 180 cm. To do this, the recruiting officer takes a random sample of 64 recruits and finds that the average height for this sample is 182 cm. (a) Should the recruiting officer accept the hypothesis? (b) What is the rejection region for the test in centimeters?
Ans. (a) No (b) Greater than 182.9125 cm

5.38 A purchaser of electronic components wants to test the hypothesis that they last less than 100 h. To do this, the purchaser takes a random sample of 16 such components and finds that, on average, they last 96 h, with a standard deviation of 8 h. If the purchaser knows that the lifetime of the components is normally distributed, should the purchaser accept the hypothesis that they last less than 100 h at (a) a 95% level of confidence? (b) a 99% level of confidence?
Ans. (a) Yes (b) No

5.39 In the past, 20% of applicants for admission into a master's program had GRE scores above 680. Of the 88 students applying to be admitted into the program in 1981, 22 had GRE scores above 680. Do the 1981 applicants have greater GRE scores than previous applicants at the 5% level of significance?
Ans. No

5.40 Find the probability of accepting $H_0$ (that $p = 0.20$) for Prob. 5.39 if $\sigma_{\bar{p}} = 0.043$ and (a) $p = 0.20$, (b) $p = 0.22$, (c) $p = 0.24$, (d) $p = 0.25$, (e) $p = 0.26$, and (f) $p = 0.28$.
Ans. (a) 0.877 (b) 0.758 (c) 0.591 (d) 0.5 (e) 0.409 (f) 0.242

5.41 (a) What is the value of $\alpha$ when $p = 0.20$ in Prob. 5.39? (b) How can the OC curve be derived for Prob. 5.39?
Ans.
(a) 0.123 (b) By joining the value of $1 - \alpha$ for $p = 0.20$ with the values of $\beta$ found in Prob. 5.40(b) to (f) for various values of $p > 0.20$

5.42 Find the probability of rejecting $H_0$ (that $p = 0.20$) for Prob. 5.39 if $\sigma_{\bar{p}} = 0.043$ and (a) $p = 0.20$, (b) $p = 0.22$, (c) $p = 0.24$, (d) $p = 0.25$, (e) $p = 0.26$, and (f) $p = 0.28$.
Ans. (a) 0.123 (b) 0.242 (c) 0.409 (d) 0.5 (e) 0.591 (f) 0.758

5.43 How can we get the power curve for Prob. 5.39?
Ans. By joining the values found in Prob. 5.42(a) to (f) for various alternative values of $p > 0.2$

TESTING HYPOTHESES FOR DIFFERENCES BETWEEN TWO MEANS OR PROPORTIONS

5.44 A consulting firm wants to decide at the 5% level of significance if the salaries of construction workers differ between New York and Chicago. A random sample of 100 construction workers in New York has an average weekly salary of $400 with a standard deviation of $100. In Chicago, a random sample of 75 workers has an average weekly salary of $375 with a standard deviation of $80. Is there a significant difference between the salaries of construction workers in New York and Chicago at (a) the 5% level? (b) the 10% level?
Ans. (a) No (b) Yes

5.45 A random sample of 21 AFC football players has a mean weight of 268 lb with a standard deviation of 30 lb, while a random sample of 11 NFC players has a mean weight of 240 lb with a standard deviation of [illegible] lb. Is the mean weight of all AFC football players greater than that for the NFC players at the 1% level of significance?
Ans. Yes

5.46 A random sample of 100 soldiers indicates that 20% are married in year 1, while 30% are married in year 2. Determine whether to accept the hypothesis that the proportion of married soldiers in year 1 is less than that in year 2 (a) at the 5% level of significance and (b) at the 1% level of significance.
Ans. (a) Accept the hypothesis (b) Reject the hypothesis

CHI-SQUARE TEST OF GOODNESS OF FIT AND INDEPENDENCE

5.47 A die is rolled 60 times with the following results: a 1 came up 12 times, a 2 came up 8 times, a 3 came up 13 times, a 4 came up 12 times, a 5 came up 7 times, and a 6 came up 8 times.
Is the die balanced at the 5% level of significance? Ans. Yes

5.48 An urn contains balls of 4 colors: green, white, red, and blue. A ball is picked from the urn and its color is recorded. The ball is then replaced in the urn, the balls are thoroughly mixed, and another ball is picked. The process is repeated 18 times, and the result is that a green ball is picked 8 times, a white ball is picked 7 times, a red ball is picked once, and a blue ball is picked twice. Does the urn contain an equal number of green, white, red, and blue balls? Test the hypothesis at the 5% level of significance. Ans. The hypothesis cannot be accepted at the 5% level of significance that the urn contains an equal number of balls of all four colors

5.49 A random sample of 64 cities in the United States indicates the number of rainy days during the month of June given in Table 5.32. Do rainy days in U.S. cities follow a normal distribution with $\mu = 3$ and $\sigma = 2$ at the 10% level of significance? Ans. No

Table 5.32 Number of Rainy Days during June in U.S. Cities (columns: Number of Rainy Days, Number of Cities)

5.50 Contingency Table 5.33 gives the number of acceptable and nonacceptable electronic components produced at various hours of the morning in a random sample from the output of a plant. Should the hypothesis be accepted or rejected at the 5% level of significance that the production of acceptable items is independent of the hour of the morning in which they are produced? Ans. Accept $H_0$

Table 5.33 Acceptable and Nonacceptable Components Produced Each Hour of the Morning (rows: Acceptable, Nonacceptable; columns: hour of the morning and total)

5.51 The number of people voting Democrat or Republican below the age of 40 and 40 plus in a random sample of 30 voters in a city is given in contingency Table 5.34. Is voting Democrat or Republican independent of the voter being below the age of 40 or 40 plus in this city at the 5% level of significance?
Ans. No

Table 5.34 Democrats and Republicans below and above Age 40 (columns: Democrats, Republicans)

ANALYSIS OF VARIANCE

5.52 Table 5.35 gives the miles per gallon for 4 different octanes of gasoline for 5 days. Assume that the miles per gallon for each octane is normally distributed with equal variance. Should the hypothesis of equal population means be accepted or rejected at the 5% level of significance? Ans. Rejected

Table 5.35 Miles per Gallon with 4 Types of Gasoline for 5 Days (columns: Type 1, Type 2, Type 3, Type 4)

5.53 Table 5.36 gives the miles per gallon for each of 4 different octanes of gasoline and 3 types of car (heavy, medium, and light) in a completely randomized design. Should the hypothesis be accepted at the 1% level of significance that the population means are the same for each (a) Octane of gasoline? (b) Type of car? Ans. (a) Yes (b) No

Table 5.36 Miles per Gallon for Each of 4 Octanes and 3 Types of Car (rows: Heavy, Medium, Light; columns: Octane 1, Octane 2, Octane 3, Octane 4)

5.54 Table 5.37 gives sales data for soap with each of 3 different wrappings and 4 different formulas in a completely randomized design. Should the hypothesis be accepted at the 5% level of significance that the population means are the same for each (a) Wrapping? (b) Formula? Ans. (a) No (b) Yes

Table 5.37 Soap Sales for Each of 3 Wrappings and 4 Formulas (columns: Formula 1, Formula 2, Formula 3, Formula 4)

NONPARAMETRIC TESTING

5.55 Using the data from Table 5.35, would the Wilcoxon signed rank test reject at the 10% level the null hypothesis that the median miles per gallon for type 1 gasoline is (a) 12? (b) 15? Ans. (a) No ($W = 2$) (b) Yes ($W = 0$)

5.56 Repeat the test from Prob. 5.52 using the Kruskal-Wallis rank test. Is the null hypothesis of equality of medians accepted at the 5% level of significance? Ans. No, it is rejected ($H = 14.25$)

5.57 Repeat the test from Prob.
5.49 using the Kolmogorov-Smirnov goodness-of-fit test. Are the data normally distributed with $\mu = 3$ and $\sigma = 2$ at the 10% level of significance? Ans. No (maximum difference = 0.331)

Statistics Examination

1. Table 1 gives the frequency distribution of the rate of unemployment in a sample of 20 large U.S. cities in 1980. (a) Find the mean, median, and mode of the unemployment rate. (b) Find the variance, standard deviation, and coefficient of variation. (c) Find the Pearson's coefficient of skewness and sketch the relative frequency histogram.

Table 1 Frequency Distribution of Unemployment Rate

Unemployment Rate, %    Frequency
7.0-7.4                 2
7.5-7.9                 4
8.0-8.4                 5
8.5-8.9                 4
9.0-9.4                 3
9.5-9.9                 2

2. The lifetime of an electronic component is known to be normally distributed with a mean of 1000 h and a standard deviation of 80 h. What is the probability that a component picked at random from the production line will have a lifetime (a) Between 1120 and 1180 h? (b) Between 955 and 975 h? (c) Below 955 h? (d) Above 975 h? (e) Sketch the normal and the standard normal distribution for this problem and shade the area corresponding to part d.

3. The average IQ of a random sample of 25 students at a college is 110. The distribution of the IQ at the college is known to be normal with a standard deviation of 10. (a) Find the 95% confidence interval for the unknown mean IQ for the entire student body at the college. (b) Answer the same question if the population standard deviation had not been known, but the sample standard deviation was calculated to be 8. (c) Specify all possible cases when the normal distribution, the t distribution, or Chebyshev's inequality can be used.

4. A firm sells detergent packaged in two plants. From past experience it knows that the amount of detergent in the boxes packed in the two plants is normally distributed. The firm takes a random sample of 25 boxes from the output of each plant and finds that the mean weight and standard
deviation of the detergent in the boxes from plant 1 is 1064 g (2.34 lb) and 100 g, respectively. For the sample in plant 2, the mean is 1024 g and the standard deviation is 60 g. (a) Can the firm claim with a 95% level of confidence that the boxes of detergent from plant 1 contain more than 1000 g? (b) Test at the 95% level of confidence that the amount of detergent in the boxes of both plants is the same.

Answers

1. (a) See Table 2. $\bar{X} = \sum fX / n = 168/20 = 8.4\%$ (b) See Table 3.

Table 2 Calculations to Find Sample Mean, Median, and Mode

Unemployment Rate, %    Class Midpoint X    Frequency f    fX
7.0-7.4                 7.2                 2              14.4
7.5-7.9                 7.7                 4              30.8
8.0-8.4                 8.2                 5              41.0
8.5-8.9                 8.7                 4              34.8
9.0-9.4                 9.2                 3              27.6
9.5-9.9                 9.7                 2              19.4
                                            n = 20         $\sum fX = 168.0$

Table 3 Calculations to Find the Variance, Standard Deviation, and Coefficient of Variation (class midpoints 7.2, 7.7, 8.2, 8.7, 9.2, 9.7)

[Figure: relative frequency histogram of the unemployment rate]

2. (a) The problem asks to find $P(1120 < X < 1180)$, where X refers to time measured in hours of lifetime for the electronic component. Given $\mu = 1000$ h and $\sigma = 80$ h and letting $X_1 = 1120$ h and $X_2 = 1180$ h, we get

$z_1 = (1120 - 1000)/80 = 1.5$ and $z_2 = (1180 - 1000)/80 = 2.25$

Subtracting the value of $z_1 = 0.4332$ from the value of $z_2 = 0.4878$ (obtained from the table of the standard normal distribution), we get $P(1120 < X < 1180) = 0.0546$, or 5.46%.

(b) $z_1 = (955 - 1000)/80 = -0.5625$ and $z_2 = (975 - 1000)/80 = -0.3125$

Looking up $z_1 = 0.56$ in the table, we get 0.2123. For $z_2 = 0.31$, we get 0.1217. Thus $P(955 < X < 975) = 0.2123 - 0.1217 = 0.0906$, or 9.06%.

(c) $P(X < 955) = 0.5 - 0.2123 = 0.2877$, or 28.77%

(d) $P(X > 975) = 0.1217 + 0.5 = 0.6217$, or 62.17%

(e) See Fig. 2.

3. (a) Since the population is normally distributed and $\sigma$ is known, the normal distribution can be used:

$\mu = \bar{X} \pm z \sigma / \sqrt{n} = 110 \pm 1.96 (10/\sqrt{25}) = 110 \pm 3.92$

Thus $\mu$ is between 106.08 and 113.92 with 95% confidence.
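The interval in answer 3(a) can be checked numerically. The sketch below is illustrative only: the function name `z_confidence_interval` is hypothetical, and the sample mean of 110 is an assumption inferred from the stated endpoints 106.08 and 113.92.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    # Critical z value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


# Assumed inputs: sample mean 110, sigma 10, n 25 (see the lead-in).
lower, upper = z_confidence_interval(110, 10, 25)
print(round(lower, 2), round(upper, 2))
```

Using the exact critical value rather than the rounded 1.96 changes the endpoints only beyond the second decimal place.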
(b) Since the distribution is normal, $n < 30$, and $\sigma$ is not known, the t rather than the normal distribution must be used, with s as an estimate of $\sigma$:

$\mu = \bar{X} \pm t_{0.025}\, s / \sqrt{n} = 110 \pm 2.064 (8/\sqrt{25}) \approx 110 \pm 3.30$

Thus $\mu$ is between 106.70 and 113.30 with 95% confidence.

(c) The normal distribution can be used (1) if the parent population is normal, $n \geq 30$, and $\sigma$ or s are known; (2) if $n \geq 30$ (by invoking the central-limit theorem) and using s as an estimate for $\sigma$; or (3) ...

4. (a) $H_0$: $\mu = 1000$ and $H_1$: $\mu > 1000$. Since the population distribution is normal, but $n < 30$ and $\sigma$ is not known, we must use the t distribution with $n - 1 = 24$ degrees of freedom:

$t = (\bar{X} - \mu) / (s/\sqrt{n}) = (1064 - 1000) / (100/\sqrt{25}) = 3.2$

The calculated value of t exceeds the tabular value of $t_{0.05} = 1.71$ with 24 degrees of freedom. Thus $H_0$ is rejected and $H_1$ is accepted, so that the firm can claim at the 95% level of confidence that the boxes of detergent from plant 1 contain more than 1000 g of detergent.

(b) $H_0$: $\mu_1 = \mu_2$ and $H_1$: $\mu_1 \neq \mu_2$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_1^2/n_1 + s_2^2/n_2} = \sqrt{100^2/25 + 60^2/25} \approx 23.32$

$t = (1064 - 1024) / 23.32 \approx 1.72$

This is a two-tail test with $n_1 + n_2 - 2 = 48$ degrees of freedom. Since the calculated value of t is smaller than the tabular value of $t_{0.025}$ with 48 df, the firm can accept at the 95% level of confidence the hypothesis that there is no difference in the amount of detergent in the boxes from both plants.

Chapter 6

Simple Regression Analysis

6.1 THE TWO-VARIABLE LINEAR MODEL

The two-variable linear model, or simple regression analysis, is used for testing hypotheses about the relationship between a dependent variable Y and an independent or explanatory variable X and for prediction. Simple linear regression analysis usually begins by plotting the set of XY values on a scatter diagram and determining by inspection if there exists an approximate linear relationship:

$Y_i = b_0 + b_1 X_i$    (6.1)

Since the points are unlikely to fall precisely on the line, the exact linear relationship in Eq. (6.1) must be modified to include a random disturbance, error, or stochastic term, $u_i$ (see Sec. 1.2 and Prob.
1.8):

$Y_i = b_0 + b_1 X_i + u_i$    (6.2)

The error term is assumed to be (1) normally distributed, with (2) zero expected value or mean, and (3) constant variance, and it is further assumed (4) that the error terms are uncorrelated or unrelated to each other, and (5) that the explanatory variable assumes fixed values in repeated sampling (so that $X_i$ and $u_i$ are also uncorrelated).

EXAMPLE 1. Table 6.1 gives the bushels of corn per acre, Y, resulting from the use of various amounts of fertilizer in pounds per acre, X, produced on a farm in each of 10 years from 1971 to 1980. These are plotted in the scatter diagram of Fig. 6-1. The relationship between X and Y in Fig. 6-1 is approximately linear (i.e., the points would fall on or near a straight line).

Table 6.1 Corn Produced with Fertilizer Used

Year    n     Y     X
1971    1     40    6
1972    2     44    10
1973    3     46    12
1974    4     48    14
1975    5     52    16
1976    6     58    18
1977    7     60    22
1978    8     68    24
1979    9     74    26
1980    10    80    32

[Fig. 6-1: scatter diagram of bushels of corn (Y) against fertilizer (X)]

6.2 THE ORDINARY LEAST-SQUARES METHOD

The ordinary least-squares method (OLS) is a technique for fitting the "best" straight line to the sample of XY observations. It involves minimizing the sum of the squared (vertical) deviations of points from the line:

Min $\sum (Y_i - \hat{Y}_i)^2$    (6.3)

where $Y_i$ refers to the actual observations, and $\hat{Y}_i$ refers to the corresponding fitted values, so that $Y_i - \hat{Y}_i = e_i$, the residual. This gives the following two normal equations (see Prob. 6.6):

$\sum Y_i = n \hat{b}_0 + \hat{b}_1 \sum X_i$    (6.4)

$\sum X_i Y_i = \hat{b}_0 \sum X_i + \hat{b}_1 \sum X_i^2$    (6.5)

where n is the number of observations and $\hat{b}_0$ and $\hat{b}_1$ are estimators of the true parameters $b_0$ and $b_1$. Solving simultaneously Eqs. (6.4) and (6.5), we get [see Prob. 6.7(a)]

$\hat{b}_1 = \dfrac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2}$    (6.6)

The value of $\hat{b}_0$ is then given by [see Prob. 6.7(b)]

$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$    (6.7)

It is often useful to use an equivalent formula for estimating $b_1$ [see Prob. 6.10(a)]:

$\hat{b}_1 = \dfrac{\sum x_i y_i}{\sum x_i^2}$    (6.8)

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$. The estimated least-squares regression (OLS) equation is then

$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$    (6.9)

EXAMPLE 2.
Table 6.2 shows the calculations to estimate the regression equation for the corn-fertilizer problem in Table 6.1. Using Eq. (6.8),

$\hat{b}_1 = \sum x_i y_i / \sum x_i^2 = 956/576 \approx 1.66$    (the slope of the estimated regression line)

$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \approx 57 - (1.66)(18) \approx 57 - 29.88 \approx 27.12$    (the Y intercept)

$\hat{Y}_i = 27.12 + 1.66 X_i$    (the estimated regression equation)

Thus, when $X_i = 0$, $\hat{Y}_i = 27.12 = \hat{b}_0$. When $X_i = 18 = \bar{X}$, $\hat{Y}_i = 27.12 + 1.66(18) = 57 = \bar{Y}$. As a result, the regression line passes through point $\bar{X}\bar{Y}$ (see Fig. 6-2).

Table 6.2 Corn Produced with Fertilizer Used: Calculations

n     Y (Corn)    X (Fertilizer)    y      x      xy     x^2
1     40          6                 -17    -12    204    144
2     44          10                -13    -8     104    64
3     46          12                -11    -6     66     36
4     48          14                -9     -4     36     16
5     52          16                -5     -2     10     4
6     58          18                1      0      0      0
7     60          22                3      4      12     16
8     68          24                11     6      66     36
9     74          26                17     8      136    64
10    80          32                23     14     322    196
Sum   570         180               0      0      956    576

$\bar{Y} = 57$, $\bar{X} = 18$

6.3 TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES

In order to test for the statistical significance of the parameter estimates of the regression, the variance of $\hat{b}_0$ and $\hat{b}_1$ is required (see Probs. 6.13 and 6.14):

$\text{Var}\, \hat{b}_0 = \sigma_u^2 \dfrac{\sum X_i^2}{n \sum x_i^2}$    (6.10)

$\text{Var}\, \hat{b}_1 = \sigma_u^2 \dfrac{1}{\sum x_i^2}$    (6.11)

Since $\sigma_u^2$ is unknown, the residual variance $s^2$ is used as an (unbiased) estimate of $\sigma_u^2$:

$s^2 = \hat{\sigma}_u^2 = \dfrac{\sum e_i^2}{n - k}$    (6.12)

where k represents the number of parameter estimates. Unbiased estimates of the variance of $\hat{b}_0$ and $\hat{b}_1$ are then given by

$s_{\hat{b}_0}^2 = \dfrac{\sum e_i^2}{n - k} \cdot \dfrac{\sum X_i^2}{n \sum x_i^2}$    (6.13)

$s_{\hat{b}_1}^2 = \dfrac{\sum e_i^2}{n - k} \cdot \dfrac{1}{\sum x_i^2}$    (6.14)

so that $s_{\hat{b}_0}$ and $s_{\hat{b}_1}$ are the standard errors of the estimates. Since $u_i$ is normally distributed, $Y_i$ and therefore $\hat{b}_0$ and $\hat{b}_1$ are also normally distributed, so that we can use the t distribution with $n - k$ degrees of freedom to test hypotheses about and construct confidence intervals for $b_0$ and $b_1$ (see Secs. 4.4 and 5.2).

EXAMPLE 3. Table 6.3 (an extension of Table 6.2) shows the calculations required to test the statistical significance of $\hat{b}_0$ and $\hat{b}_1$. The values of $\hat{Y}_i$ in Table 6.3 are obtained by substituting the values of $X_i$ into the estimated regression equation found in Example 2.
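(The values of $y_i^2$ are obtained by squaring $y_i$ from Table 6.2 and are to be used in Sec. 6.4.)

$s_{\hat{b}_0}^2 = \dfrac{\sum e_i^2}{n - k} \cdot \dfrac{\sum X_i^2}{n \sum x_i^2} = \dfrac{47.3056}{10 - 2} \cdot \dfrac{3816}{10(576)} \approx 3.92$ and $s_{\hat{b}_0} \approx 1.98$

$s_{\hat{b}_1}^2 = \dfrac{\sum e_i^2}{n - k} \cdot \dfrac{1}{\sum x_i^2} = \dfrac{47.3056}{(10 - 2)(576)} \approx 0.01$ and $s_{\hat{b}_1} \approx 0.10$

Therefore

$t_0 = \hat{b}_0 / s_{\hat{b}_0} \approx 27.12/1.98 \approx 13.7$ and $t_1 = \hat{b}_1 / s_{\hat{b}_1} \approx 1.66/0.10 \approx 16.6$

Table 6.3 Corn-Fertilizer Calculations to Test Significance of Parameters

Year    Y     X     Y-hat    e        e^2        X^2     x^2    y^2
1       40    6     37.08    2.92     8.5264     36      144    289
2       44    10    43.72    0.28     0.0784     100     64     169
3       46    12    47.04    -1.04    1.0816     144     36     121
4       48    14    50.36    -2.36    5.5696     196     16     81
5       52    16    53.68    -1.68    2.8224     256     4      25
6       58    18    57.00    1.00     1.0000     324     0      1
7       60    22    63.64    -3.64    13.2496    484     16     9
8       68    24    66.96    1.04     1.0816     576     36     121
9       74    26    70.28    3.72     13.8384    676     64     289
10      80    32    80.24    -0.24    0.0576     1024    196    529
Sum                          0        47.3056    3816    576    1634

Since both $t_0$ and $t_1$ exceed $t = 2.306$ with 8 df at the 5% level of significance (from App. 5), we conclude that both $b_0$ and $b_1$ are statistically significant at the 5% level.

6.4 TEST OF GOODNESS OF FIT AND CORRELATION

The closer the observations fall to the regression line (i.e., the smaller the residuals), the greater is the variation in Y "explained" by the estimated regression equation. The total variation in Y is equal to the explained plus the residual variation:

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$    (6.15)

Total variation in Y (or total sum of squares) = Explained variation in Y (or regression sum of squares) + Residual variation in Y (or error sum of squares)

TSS = RSS + ESS

Dividing both sides by TSS gives

$1 = \dfrac{\text{RSS}}{\text{TSS}} + \dfrac{\text{ESS}}{\text{TSS}}$

The coefficient of determination, or $R^2$, is then defined as the proportion of the total variation in Y "explained" by the regression of Y on X:

$R^2 = \dfrac{\text{RSS}}{\text{TSS}} = 1 - \dfrac{\text{ESS}}{\text{TSS}}$    (6.16)

$R^2$ can be calculated by

$R^2 = \dfrac{\sum \hat{y}_i^2}{\sum y_i^2} = 1 - \dfrac{\sum e_i^2}{\sum y_i^2}$    (6.17)

where $\sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2$. $R^2$ ranges in value from 0 (when the estimated regression equation explains none of the variation in Y) to 1 (when all points lie on the regression line). The correlation coefficient r is given by (see Prob. 6.22)

$r = \sqrt{R^2} = \sqrt{\hat{b}_1 \dfrac{\sum x_i y_i}{\sum y_i^2}}$    (6.18)

r ranges in value from -1 (for perfect negative linear correlation) to +1 (for perfect positive linear correlation) and does not imply causality or dependence. With qualitative data, the rank or (the Spearman) correlation coefficient r' (see Prob. 6.25) can be used.

EXAMPLE 4. The coefficient of determination for the corn-fertilizer example can be found from Table 6.3:

$R^2 = 1 - \dfrac{\sum e_i^2}{\sum y_i^2} = 1 - \dfrac{47.31}{1634} \approx 1 - 0.0290 = 0.9710$, or 97.10%

Thus the regression equation explains about 97% of the total variation in corn output. The remaining 3% is attributed to factors included in the error term.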
Then $r = \sqrt{R^2} = \sqrt{0.9710} \approx 0.9854$, or 98.54%, and is positive because $\hat{b}_1$ is positive. Figure 6-3 shows the total, the explained, and the residual variation of Y.

[Fig. 6-3: total, explained, and residual variation of Y around the estimated line $\hat{Y}_i = 27.12 + 1.66 X_i$; vertical axis: bushels of corn, Y]

6.5 PROPERTIES OF ORDINARY LEAST-SQUARES ESTIMATORS

Ordinary least-squares (OLS) estimators are best linear unbiased estimators (BLUE). Lack of bias means

$E(\hat{b}) = b$ so that Bias $= E(\hat{b}) - b = 0$

Best unbiased or efficient means smallest variance. Thus OLS estimators are the best among all unbiased linear estimators [see Probs. 6.14(a) and 6.15(b)]. This is known as the Gauss-Markov theorem and represents the most important justification for using OLS.

Sometimes, a researcher may want to trade off some bias for a possibly smaller variance and minimize the mean square error, MSE (see Prob. 6.29):

$\text{MSE}(\hat{b}) = E(\hat{b} - b)^2 = \text{var}(\hat{b}) + (\text{bias}\ \hat{b})^2$

An estimator is consistent if, as the sample size approaches infinity in the limit, its value approaches the true parameter (i.e., it is asymptotically unbiased) and its distribution collapses on the true parameter (see Prob. 6.31).

EXAMPLE 5. OLS estimators $\hat{b}_0$ and $\hat{b}_1$ found in Example 2 are unbiased linear estimators of $b_0$ and $b_1$ because $E(\hat{b}_0) = b_0$ and $E(\hat{b}_1) = b_1$. Var $\hat{b}_0$ and var $\hat{b}_1$ found in Example 3 are also lower than for any other linear unbiased estimators. Therefore $\hat{b}_0$ and $\hat{b}_1$ are BLUE.

Solved Problems

THE TWO-VARIABLE LINEAR MODEL

6.1 What is meant by and what is the function of (a) Simple regression analysis? (b) Linear regression analysis? (c) A scatter diagram? (d) An error term?

(a) Simple regression is used for testing hypotheses about the relationship between a dependent variable Y and an independent or explanatory variable X and for prediction. This is to be contrasted with multiple regression analysis, in which there are not one, but two or more independent or explanatory variables. Multiple regression analysis is discussed in Chap. 7.
(b) Linear regression analysis assumes that there is an approximate linear relationship between X and Y (i.e., the set of random sample values of X and Y fall on or near a straight line). This is to be contrasted with nonlinear regression analysis (discussed in Sec. 8.1).

(c) A scatter diagram is a figure in which each pair of independent-dependent observations is plotted as a point in the XY plane. Its purpose is to determine (by inspection) if there exists an approximate linear relationship between the dependent variable Y and the independent or explanatory variable X.

(d) The error term (also known as the disturbance or stochastic term) measures the deviation of each observed Y value from the true (but unobserved) regression line. These error terms, designated by $u_i$, arise because of (1) numerous explanatory variables with only slight and irregular effects on Y that are omitted from the exact linear relationship given by Eq. (6.1), (2) possible errors of measurement in Y, and (3) random human behavior (see Prob. 1.8).

6.2 The data in Table 6.4 reports the aggregate consumption (Y, in billions of U.S. dollars) and disposable income (X, also in billions of U.S. dollars) for a developing economy for the 12 years from 1988 to 1999. Draw a scatter diagram for the data and determine by inspection if there exists an approximate linear relationship between Y and X.

From Fig. 6-4 it can be seen that the relationship between consumption expenditures Y and disposable income X is approximately linear, as required by the linear regression model.

6.3 State the general relationship between consumption Y and disposable income X in (a) exact linear form and (b) stochastic form. (c) Why would you expect most observed values of Y not to fall exactly on a straight line?

(a) The exact or deterministic general relationship between aggregate consumption expenditures Y and aggregate disposable income X can be written as
Table 6.4 Aggregate Consumption (Y) and Disposable Income (X)

Year    n     Y      X
1988    1     102    114
1989    2     106    118
1990    3     108    126
1991    4     110    130
1992    5     122    136
1993    6     124    140
1994    7     128    148
1995    8     130    156
1996    9     142    160
1997    10    148    164
1998    11    150    170
1999    12    154    178

[Fig. 6-4: scatter diagram of consumption (Y) against disposable income (X)]

$Y_i = b_0 + b_1 X_i$    (6.1)

where i refers to each year in time-series analysis (as with the data in Table 6.4) or to each economic unit (such as a family) in cross-sectional analysis. In Eq. (6.1), $b_0$ and $b_1$ are unknown constants called parameters. Parameter $b_0$ is the constant or Y intercept, while $b_1$ measures $\Delta Y / \Delta X$, which, in the context of Prob. 6.2, refers to the marginal propensity to consume (MPC) (see Sec. 1.3). The specific linear relationship corresponding to the general linear relationship in Eq. (6.1) is obtained by estimating the values of $b_0$ and $b_1$ (represented by $\hat{b}_0$ and $\hat{b}_1$ and read as "b sub zero hat" and "b sub one hat").

(b) The exact linear relationship in Eq. (6.1) can be made stochastic by adding a random disturbance or error term, $u_i$, giving

$Y_i = b_0 + b_1 X_i + u_i$    (6.2)

(c) Most observed values of Y are not expected to fall precisely on a straight line (1) because even though consumption Y is postulated to depend primarily on disposable income X, it also may depend on numerous other omitted variables with only slight and irregular effect on Y (if some of these other variables had instead a significant and regular effect on Y, then they should be included as additional explanatory variables, as in a multiple regression model); (2) because of possible errors in measuring Y; and (3) because of inherent random human behavior, which usually leads to different values of Y for the same value of X under identical circumstances (see Prob.
1.8).

6.4 State each of the five assumptions of the classical regression model (OLS) and give an intuitive explanation of the meaning and need for each of them.

1. The first assumption of the classical linear regression model (OLS) is that the random error term u is normally distributed. As a result, Y and the sampling distribution of the parameters of the regression are also normally distributed, so that tests can be conducted on the significance of the parameters (see Secs. 4.2, 5.2, and 6.3).

2. The second assumption is that the expected value of the error term or its mean equals zero:

$E(u_i) = 0$    (6.19)

Because of this assumption, Eq. (6.1) gives the average value of Y. Specifically, since X is assumed fixed, the value of Y in Eq. (6.2) varies above and below its mean as u exceeds or is smaller than 0. Since the average value of u is assumed to be 0, Eq. (6.1) gives the average value of Y.

3. The third assumption is that the variance of the error term is constant in each period and for all values of X:

$E(u_i^2) = \sigma_u^2$    (6.20)

This assumption ensures that each observation is equally reliable, so that estimates of the regression coefficients are efficient and tests of hypotheses about them are not biased. These first three assumptions about the error term can be summarized as

$u \sim N(0, \sigma_u^2)$    (6.21)

4. The fourth assumption is that the value which the error term assumes in one period is uncorrelated or unrelated to its value in any other period:

$E(u_i u_j) = 0$ for $i \neq j$; $i, j = 1, 2, \ldots, n$    (6.22)

This ensures that the average value of Y depends only on X and not on u, and it is, once again, required in order to have efficient estimates of the regression coefficients and unbiased tests of their significance.

5. The fifth assumption is that the explanatory variable assumes fixed values that can be obtained in repeated samples, so that the explanatory variable is also uncorrelated with the error term:

$E(X_i u_i) = 0$    (6.23)

This assumption is made to simplify the analysis.
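The role of these assumptions can be illustrated with a small repeated-sampling experiment. The sketch below is an illustration under stated assumptions, not part of the original text: the helper names `ols` and `simulate`, the ten fixed X values, the "true" parameters 27.12 and 1.66, and the error standard deviation of 2 are all hypothetical choices. X is held fixed across samples (assumption 5), and the errors are drawn independently from a normal distribution with zero mean and constant variance (assumptions 1 to 4). Under these conditions the average of the estimated slopes should be close to the true slope.

```python
import random


def ols(xs, ys):
    """Return (b0_hat, b1_hat) from the deviation-form OLS formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def simulate(true_b0=27.12, true_b1=1.66, sigma=2.0, reps=2000, seed=0):
    """Average slope estimate over repeated samples with fixed X."""
    rng = random.Random(seed)
    xs = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]  # fixed in repeated sampling
    slopes = []
    for _ in range(reps):
        # u_i ~ N(0, sigma^2), drawn independently for each observation
        ys = [true_b0 + true_b1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(ols(xs, ys)[1])
    return sum(slopes) / reps


mean_slope = simulate()
print(mean_slope)
```

Replacing the error draw with one whose mean is not zero, or whose spread grows with X, is a simple way to see which conclusions depend on which assumption.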
THE ORDINARY LEAST-SQUARES METHOD

6.5 (a) What is meant by the ordinary least-squares (OLS) method of estimating the "best" straight line that fits the sample of XY observations? (b) Why do we take vertical deviations? (c) Why do we not simply take the sum of the deviations without squaring them? (d) Why do we not take the sum of the absolute deviations?

(a) The OLS method gives the best straight line that fits the sample of XY observations in the sense that it minimizes the sum of the squared (vertical) deviations of each observed point on the graph from the straight line.

(b) We take vertical deviations because we are trying to explain or predict movements in Y, which is measured along the vertical axis.

(c) We cannot take the sum of the deviations of each of the observed points from the OLS line because deviations that are equal in size but opposite in sign cancel out, so the sum of the deviations equals 0 (see Table 6.2).

(d) Taking the sum of the absolute deviations avoids the problem of having the sum of the deviations equal to 0. However, the sum of the squared deviations is preferred so as to penalize larger deviations relatively more than smaller deviations.

6.6 Starting from Eq. (6.3) calling for the minimization of the sum of the squared deviations or residuals, derive (a) normal Eq. (6.4) and (b) normal Eq. (6.5). (The reader without knowledge of calculus can skip this problem.)

(a) $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i)^2$

Normal Eq. (6.4) is derived by minimizing $\sum e_i^2$ with respect to $\hat{b}_0$:

$\dfrac{\partial \sum e_i^2}{\partial \hat{b}_0} = -2 \sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i) = 0$

$\sum (Y_i - \hat{b}_0 - \hat{b}_1 X_i) = 0$

$\sum Y_i = n \hat{b}_0 + \hat{b}_1 \sum X_i$    (6.4)

(b) Normal Eq. (6.5) is derived by minimizing $\sum e_i^2$ with respect to $\hat{b}_1$:

$\dfrac{\partial \sum e_i^2}{\partial \hat{b}_1} = -2 \sum X_i (Y_i - \hat{b}_0 - \hat{b}_1 X_i) = 0$

$\sum X_i Y_i = \hat{b}_0 \sum X_i + \hat{b}_1 \sum X_i^2$    (6.5)

6.7 Derive (a) Eq. (6.6) to find $\hat{b}_1$ and (b) Eq. (6.7) to find $\hat{b}_0$. [Hint for part a: Start by multiplying Eq. (6.5) by n and Eq. (6.4) by $\sum X_i$.]

(a) Multiplying Eq. (6.5) by n and Eq. (6.4) by $\sum X_i$, we get

$n \sum X_i Y_i = \hat{b}_0 n \sum X_i + \hat{b}_1 n \sum X_i^2$    (6.24)

$\sum X_i \sum Y_i = \hat{b}_0 n \sum X_i + \hat{b}_1 (\sum X_i)^2$    (6.25)

Subtracting Eq. (6.25) from Eq. (6.24), we get

$n \sum X_i Y_i - \sum X_i \sum Y_i = \hat{b}_1 [n \sum X_i^2 - (\sum X_i)^2]$    (6.26)

Solving Eq.
(6.26) for $\hat{b}_1$, we get

$\hat{b}_1 = \dfrac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2}$    (6.6)

(b) Equation (6.7) is obtained by simply solving Eq. (6.4) for $\hat{b}_0$:

$\sum Y_i = n \hat{b}_0 + \hat{b}_1 \sum X_i$

$\hat{b}_0 = \dfrac{\sum Y_i}{n} - \hat{b}_1 \dfrac{\sum X_i}{n} = \bar{Y} - \hat{b}_1 \bar{X}$    (6.7)

6.8 (a) State the difference between $b_0$ and $b_1$, on one hand, and $\hat{b}_0$ and $\hat{b}_1$ on the other hand. (b) State the difference between $u_i$ and $e_i$. (c) Write the equations for the true and estimated relationships between X and Y. (d) Write the equations for the true and estimated regression lines between X and Y.

(a) $b_0$ and $b_1$ are the parameters of the true but unknown regression line, while $\hat{b}_0$ and $\hat{b}_1$ are the parameters of the estimated regression line.

(b) $u_i$ is the random disturbance, error, or stochastic term in the true but unknown relationship between X and Y. However, $e_i$ is the residual between each observed value of Y and its corresponding fitted value $\hat{Y}$ in the estimated relationship.

(c) The equations for the true and estimated relationships between X and Y are, respectively,

$Y_i = b_0 + b_1 X_i + u_i$    (6.2)

$Y_i = \hat{b}_0 + \hat{b}_1 X_i + e_i$    (6.27)

(d) The equations for the true and estimated regressions between X and Y are, respectively,

$E(Y_i) = b_0 + b_1 X_i$    (6.28)

$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$    (6.9)

6.9 (a) Find the regression equation for the consumption schedule in Table 6.4, using Eq. (6.6) to find $\hat{b}_1$. (b) Plot the regression line and show the deviations of each $Y_i$ from the corresponding $\hat{Y}_i$.

(a) Table 6.5 shows the calculations to find $\hat{b}_1$ and $\hat{b}_0$ for the data in Table 6.4.

$\hat{b}_1 = \dfrac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2} = \dfrac{(12)(225{,}124) - (1740)(1524)}{(12)(257{,}112) - (1740)^2} = \dfrac{2{,}701{,}488 - 2{,}651{,}760}{3{,}085{,}344 - 3{,}027{,}600} = \dfrac{49{,}728}{57{,}744} \approx 0.86$

$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} \approx 127 - (0.86)(145) \approx 127 - 124.70 \approx 2.30$

Thus the equation for the estimated consumption regression is $\hat{Y}_i = 2.30 + 0.86 X_i$.

Table 6.5 Aggregate Consumption and Disposable Income: Calculations (totals: $n = 12$, $\sum Y_i = 1524$, $\sum X_i = 1740$, $\sum X_i Y_i = 225{,}124$, $\sum X_i^2 = 257{,}112$, $\bar{Y} = 127$, $\bar{X} = 145$)

(b) To plot the regression equation, we need to define any two points on the regression line. For example, when $X_i = 114$, $\hat{Y}_i = 2.30 + 0.86(114) = 100.34$. When $X_i = 178$, $\hat{Y}_i = 2.30 + 0.86(178) = 155.38$.

[Fig. 6-5: estimated regression line $\hat{Y}_i = 2.30 + 0.86 X_i$; horizontal axis: disposable income]

The consumption regression line is plotted in Fig. 6-5,
where the positive and negative residuals are also shown. The regression line represents the best fit to the random sample of consumption-disposable income observations in the sense that it minimizes the sum of the squared (vertical) deviations from the line.

6.10 (a) Starting with Eq. (6.6), derive the equation for $\hat{b}_1$ in deviation form for the case where $\bar{X} = \bar{Y} = 0$. (b) What is the value of $\hat{b}_0$ when $\bar{X} = \bar{Y} = 0$?

(a) Starting with Eq. (6.6) for $\hat{b}_1$, we divide numerator and denominator by $n^2$ and get

$\hat{b}_1 = \dfrac{\sum X_i Y_i / n - (\sum X_i / n)(\sum Y_i / n)}{\sum X_i^2 / n - (\sum X_i / n)^2} = \dfrac{\sum X_i Y_i / n - \bar{X}\bar{Y}}{\sum X_i^2 / n - \bar{X}^2} = \dfrac{\sum x_i y_i}{\sum x_i^2}$    (6.8)

since $\bar{X} = \bar{Y} = 0$, so that $X_i = x_i$ and $Y_i = y_i$ (and canceling the n terms).

(b) Starting with Eq. (6.7) for $\hat{b}_0$,

$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$    (6.7)

and substituting 0 for $\bar{X}$ and $\bar{Y}$, we get $\hat{b}_0 = 0 - \hat{b}_1 (0) = 0$.

6.11 With respect to the data in Table 6.4, (a) find the value of $\hat{b}_1$ using Eq. (6.8), and (b) plot the regression line on a graph measuring the variables as deviations from their respective means. How does this regression line compare with the regression line plotted in Fig. 6-5?

(a) Table 6.6 shows the calculations to find $\hat{b}_1$ for the data in Table 6.4. In deviation form (note that $\sum y_i = \sum x_i = 0$),

$\hat{b}_1 = \dfrac{\sum x_i y_i}{\sum x_i^2} = \dfrac{4144}{4812} \approx 0.86$, the same as in Prob. 6.9(a)

Table 6.6 Aggregate Consumption and Disposable Income: Alternative Calculations (totals: $\sum y_i = 0$, $\sum x_i = 0$, $\sum x_i y_i = 4144$, $\sum x_i^2 = 4812$)

(b) From Prob. 6.10(b) we know that the regression line crosses the origin when plotted on a graph with the axes measuring the variables in deviation form, and from part a of this problem we know that this regression line has the same slope as the regression line in Fig. 6-5. See Fig. 6-6.

[Fig. 6-6: regression line in deviation form, passing through the origin with slope 0.86]

6.12 In the context of Prob. 6.9(a), what is the meaning of (a) Estimator $\hat{b}_0$? (b) Estimator $\hat{b}_1$?
(c) Find the income elasticity of consumption.

(a) Estimator $\hat{b}_0 \approx 2.30$ is the Y intercept, or the value of aggregate consumption, in billions of dollars, when disposable income, also in billions of dollars, is 0. The fact that $\hat{b}_0 > 0$ confirms what was anticipated on theoretical grounds in Example 3 in Chap. 1.

(b) Estimator $\hat{b}_1 = dY/dX \approx 0.86$ is the slope of the estimated regression line. It measures the marginal propensity to consume (MPC) or the change in consumption per one-unit change in disposable income. Once again, the fact that $0 < \hat{b}_1 < 1$ confirms what was anticipated on theoretical grounds in Example 3 in Chap. 1.

(c) The income elasticity of consumption $\eta$ measures the percentage change in consumption resulting from a given percentage change in disposable income. Since the elasticity usually changes at every point in the function, it is measured at the means:

$\eta = \hat{b}_1 \dfrac{\bar{X}}{\bar{Y}}$

For the data in Table 6.4,

$\eta = 0.86 \left(\dfrac{145}{127}\right) \approx 0.98$

Note that elasticity, as opposed to the slope, is a pure (unit-free) number.

TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES

6.13 Define (a) $\sigma_u^2$ and $s^2$, (b) var $\hat{b}_0$ and var $\hat{b}_1$, (c) $s_{\hat{b}_0}^2$ and $s_{\hat{b}_1}^2$, and (d) $s_{\hat{b}_0}$ and $s_{\hat{b}_1}$.

(a) $\sigma_u^2$ is the variance of the error term in the true relationship between X and Y. However, $s^2 = \hat{\sigma}_u^2 = \sum e_i^2 / (n - k)$ is the residual variance and is an (unbiased) estimate of $\sigma_u^2$, which is unknown. k is the number of estimated parameters. In simple regression analysis, $k = 2$. Thus $n - k = n - 2$ refers to the degrees of freedom.

(b) Var $\hat{b}_0 = \sigma_u^2 \dfrac{\sum X_i^2}{n \sum x_i^2}$, while var $\hat{b}_1 = \dfrac{\sigma_u^2}{\sum x_i^2}$. The variances of $\hat{b}_0$ and $\hat{b}_1$ (or their estimates) are required to test hypotheses about and construct confidence intervals for $b_0$ and $b_1$.

(c) $s_{\hat{b}_0}^2 = s^2 \dfrac{\sum X_i^2}{n \sum x_i^2}$ and $s_{\hat{b}_1}^2 = \dfrac{s^2}{\sum x_i^2}$. $s_{\hat{b}_0}^2$ and $s_{\hat{b}_1}^2$ are, respectively, (unbiased) estimates of var $\hat{b}_0$ and var $\hat{b}_1$, which are unknown since $\sigma_u^2$ is unknown.

(d) $s_{\hat{b}_0} = \sqrt{s_{\hat{b}_0}^2}$ and $s_{\hat{b}_1} = \sqrt{s_{\hat{b}_1}^2}$ are called the standard errors.
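$s_{\hat{b}_0}$ and $s_{\hat{b}_1}$ are, respectively, the standard deviations of $\hat{b}_0$ and $\hat{b}_1$.

6.14 Prove that (a) mean $\hat{b}_1 = b_1$, (b) var $\hat{b}_1 = \sigma_u^2 / \sum x_i^2$, (c) mean $\hat{b}_0 = b_0$, and (d) var $\hat{b}_0 = \sigma_u^2 \sum X_i^2 / (n \sum x_i^2)$.

(a) $\hat{b}_1 = \dfrac{\sum x_i y_i}{\sum x_i^2} = \dfrac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^2} = \dfrac{\sum x_i Y_i}{\sum x_i^2} = \sum c_i Y_i$

where $c_i = x_i / \sum x_i^2$ is constant because of assumption 5 (Sec. 6.1), and $\bar{Y} \sum x_i = 0$ because $\sum x_i = 0$. Then

$\hat{b}_1 = \sum c_i Y_i = \sum c_i (b_0 + b_1 X_i + u_i) = b_0 \sum c_i + b_1 \sum c_i X_i + \sum c_i u_i = b_1 + \sum c_i u_i = b_1 + \dfrac{\sum x_i u_i}{\sum x_i^2}$

since $\sum c_i = \sum x_i / \sum x_i^2 = 0$ (because $\sum x_i = 0$) and $\sum c_i X_i = \sum x_i (x_i + \bar{X}) / \sum x_i^2 = 1$. Therefore

$E(\hat{b}_1) = b_1 + \dfrac{1}{\sum x_i^2} E\left(\sum x_i u_i\right) = b_1$

since $b_1$ is a constant and $E(\sum x_i u_i) = 0$ because of assumption 5 (Sec. 6.1).

(b) From part a we obtain

$\text{var}\ \hat{b}_1 = \text{var}\left(\sum c_i Y_i\right) = \sum c_i^2\ \text{var}\ Y_i = \sigma_u^2 \sum c_i^2 = \sigma_u^2 \dfrac{\sum x_i^2}{(\sum x_i^2)^2} = \dfrac{\sigma_u^2}{\sum x_i^2}$

since $Y_i$ varies only because of $u_i$, with $X_i$ assumed fixed.

(c) $\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X} = \sum \left(\dfrac{1}{n} - \bar{X} c_i\right) Y_i = \sum \left(\dfrac{1}{n} - \bar{X} c_i\right)(b_0 + b_1 X_i + u_i) = b_0 + \sum \left(\dfrac{1}{n} - \bar{X} c_i\right) u_i$

(from part a, because $\sum c_i = 0$ and $\sum c_i X_i = 1$), so that $E(\hat{b}_0) = b_0$.

(d) We saw in part c that $\hat{b}_0 = \sum (1/n - \bar{X} c_i) Y_i$, so that

$\text{var}\ \hat{b}_0 = \sigma_u^2 \sum \left(\dfrac{1}{n} - \bar{X} c_i\right)^2 = \sigma_u^2 \left(\dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum x_i^2}\right) = \sigma_u^2 \dfrac{\sum x_i^2 + n \bar{X}^2}{n \sum x_i^2} = \sigma_u^2 \dfrac{\sum X_i^2}{n \sum x_i^2}$

since $\sum x_i^2 = \sum X_i^2 - n \bar{X}^2$.

6.15 For the aggregate consumption-income observations in Table 6.4, find (a) $s^2$, (b) $s_{\hat{b}_0}^2$ and $s_{\hat{b}_0}$, and (c) $s_{\hat{b}_1}^2$ and $s_{\hat{b}_1}$.

(a) The calculations required to find $s^2$ are shown in Table 6.7, which is an extension of Table 6.5. The values for $\hat{Y}_i$ in Table 6.7 are obtained by substituting the values for $X_i$ into the regression equation found in Prob. 6.9(a).

$s^2 = \hat{\sigma}_u^2 = \dfrac{\sum e_i^2}{n - k} = \dfrac{115.2789}{12 - 2} \approx 11.53$

(b) $s_{\hat{b}_0}^2 = s^2 \dfrac{\sum X_i^2}{n \sum x_i^2} = \dfrac{(11.53)(257{,}112)}{(12)(4812)} \approx 51.34$ and $s_{\hat{b}_0} \approx 7.17$

(c) $s_{\hat{b}_1}^2 = \dfrac{s^2}{\sum x_i^2} = \dfrac{11.53}{4812} \approx 0.0024$ and $s_{\hat{b}_1} \approx 0.05$

Table 6.7 Consumption Regression: Calculations to Test Significance of Parameters (columns: Year, Y, X, $\hat{Y}$, e, $e^2$, $X^2$, $x^2$; totals: $\sum e_i^2 = 115.2789$, $\sum X_i^2 = 257{,}112$, $\sum x_i^2 = 4812$)

6.16 (a) State the null and alternative hypotheses to test the statistical significance of the parameters of the regression equation estimated in Prob. 6.9(a). (b) What is the form of the sampling distribution of $\hat{b}_0$ and $\hat{b}_1$? (c) Which distribution must we use to test the statistical significance of $b_0$ and $b_1$?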
(d) What are the degrees of freedom?

(a) To test for the statistical significance of $b_0$ and $b_1$, we set the following null hypothesis, $H_0$, and alternative hypothesis, $H_1$ (see Sec. 5.2):

$$H_0: b_0 = 0 \quad \text{versus} \quad H_1: b_0 \neq 0$$
$$H_0: b_1 = 0 \quad \text{versus} \quad H_1: b_1 \neq 0$$

The hope in regression analysis is to reject $H_0$ and to accept $H_1$, that $b_0$ and $b_1 \neq 0$, with a two-tail test.

(b) Since $u_i$ is assumed to be normally distributed (assumption 1 in Sec. 6.1), $Y_i$ is also normally distributed (since $X_i$ is assumed to be fixed, assumption 5). As a result, $\hat{b}_0$ and $\hat{b}_1$ also will be normally distributed.

(c) To test the statistical significance of $b_0$ and $b_1$, the $t$ distribution (from App. 5) must be used because $\hat{b}_0$ and $\hat{b}_1$ are normally distributed, but var $\hat{b}_0$ and var $\hat{b}_1$ are unknown (since $\sigma_u^2$ is unknown) and $n < 30$ (see Sec. 4.4).

(d) The degrees of freedom are $n - k$, where $n$ is the number of observations and $k$ is the number of parameters estimated. Since in simple regression analysis two parameters are estimated ($\hat{b}_0$ and $\hat{b}_1$), df $= n - k = n - 2$.

6.17 Test at the 5% level of significance for (a) $b_0$ and (b) $b_1$ in Prob. 6.9(a).

(a) $$t_0 = \frac{\hat{b}_0 - b_0}{s_{\hat{b}_0}} = \frac{2.30 - 0}{7.17} \cong 0.32$$

Since $t_0$ is smaller than the tabular value of $t = 2.228$ at the 5% level (two-tail test) and with 10 df (from App. 5), we conclude that $b_0$ is not statistically significant at the 5% level (i.e., we cannot reject $H_0$, that $b_0 = 0$).

(b) $$t_1 = \frac{\hat{b}_1 - b_1}{s_{\hat{b}_1}} = \frac{0.86 - 0}{0.05} \cong 17.2$$

So $b_1$ is statistically significant at the 5% (and 1%) level (i.e., we reject $H_0$, that $b_1 = 0$, in favor of $H_1$, that $b_1 \neq 0$).

6.18 Construct the 95% confidence interval for (a) $b_0$ and (b) $b_1$ in Prob. 6.9(a).

(a) The 95% confidence interval for $b_0$ is given by (see Sec. 4.4)

$$b_0 = \hat{b}_0 \pm 2.228\,s_{\hat{b}_0} = 2.30 \pm 2.228(7.17) = 2.30 \pm 15.97$$

So $b_0$ is between $-13.67$ and $18.27$ with 95% confidence. Note how wide (and meaningless) the 95% confidence interval for $b_0$ is, reflecting the fact that $\hat{b}_0$ is highly insignificant.

(b) The 95% confidence interval for $b_1$ is given by

$$b_1 = \hat{b}_1 \pm 2.228\,s_{\hat{b}_1} = 0.86 \pm 2.228(0.05) = 0.86 \pm 0.11$$

So $b_1$ is between 0.75 and 0.97 (i.e., $0.75 \le b_1 \le 0.97$) with 95%
confidence.

TEST OF GOODNESS OF FIT AND CORRELATION

6.19 Derive the formula for $R^2$.

The coefficient of determination $R^2$ is defined as the proportion of the total variation in $Y$ "explained" by the regression of $Y$ on $X$. The total variation in $Y$, or total sum of squares, is TSS $= \sum (Y_i - \bar{Y})^2 = \sum y_i^2$. The explained variation in $Y$, or regression sum of squares, is RSS $= \sum (\hat{Y}_i - \bar{Y})^2 = \sum \hat{y}_i^2$. The residual variation in $Y$, or error sum of squares, is ESS $= \sum (Y_i - \hat{Y}_i)^2 = \sum e_i^2$.

$$\text{TSS} = \text{RSS} + \text{ESS}$$
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$
$$\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$$

Dividing both sides by $\sum y_i^2$, we get

$$1 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} + \frac{\sum e_i^2}{\sum y_i^2}$$

Therefore

$$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$

$R^2$ is unit-free and $0 \le R^2 \le 1$ because $0 \le$ ESS $\le$ TSS.

[…]

$r > 0$ indicates that $X$ and $Y$ change in the same direction, such as the quantity supplied of a commodity and its price. $r = -1$ refers to a perfect negative correlation (i.e., all the sample observations lie on a straight line of negative slope); however, $r = 1$ refers to perfect positive correlation (i.e., all the sample observations lie on a straight line of positive slope). $r = \pm 1$ is seldom found. The closer $r$ is to $\pm 1$, the greater is the degree of positive or negative linear relationship. It should be noted that the sign of $r$ is always the same as that of $\hat{b}_1$. A zero correlation coefficient means that there exists no linear relationship whatsoever between $X$ and $Y$ (i.e., they tend to change with no connection with each other). For example, if the sample observations fall exactly on a circle, there is a perfect nonlinear relationship but a zero linear relationship, and $r = 0$.

(b) Regression analysis implies (but does not prove) causality between the independent variable $X$ and dependent variable $Y$. However, correlation analysis implies no causality or dependence but refers simply to the type and degree of association between two variables. For example, $X$ and $Y$ may be highly correlated because of another variable that strongly affects both.
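Thus correlation analysis is a much less powerful tool than regression analysis and is seldom used by itself in the real world. In fact, the main use of correlation analysis is to determine the degree of association found in regression analysis. This is given by the coefficient of determination, which is the square of the correlation coefficient.

6.21 Derive the equation (a) $r = \sum x_i y_i/(\sqrt{\sum x_i^2}\sqrt{\sum y_i^2})$ (Hint: Start by showing that $\sum x_i y_i$ is a measure of association between $X$ and $Y$.) and (b) $r = \sqrt{\hat{b}_1 \sum x_i y_i/\sum y_i^2}$ (Hint: Start with $R^2 = \sum \hat{y}_i^2/\sum y_i^2$.)

(a) $\sum x_i y_i$ provides a measure of the association between $X$ and $Y$ because if $X$ and $Y$ both rise or fall, $x_i y_i > 0$, while if $X$ rises and $Y$ falls, or vice versa, $x_i y_i < 0$. If all or most sample observations involve a rise or fall in both $X$ and $Y$, $\sum x_i y_i > 0$ and large, implying a large positive correlation. If all or most sample observations involve opposite changes in $X$ and $Y$, then $\sum x_i y_i < 0$ and large in absolute value, implying a large negative correlation. If, however, some $X$ and $Y$ observations move in the same direction, while others move in opposite directions, $\sum x_i y_i$ will be small, indicating a small net positive or negative correlation. However, measuring the degree of association by $\sum x_i y_i$ has two disadvantages. First, the greater is the number of sample observations, the larger is $\sum x_i y_i$, and second, $\sum x_i y_i$ is expressed in the units of the problem. These problems can be overcome by dividing $\sum x_i y_i$ by $n$ (the number of sample observations) and by the standard deviations of $X$ and $Y$ ($\sqrt{\sum x_i^2/n}$ and $\sqrt{\sum y_i^2/n}$). Then

$$r = \frac{\sum x_i y_i/n}{\sqrt{\sum x_i^2/n}\,\sqrt{\sum y_i^2/n}} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}} = \frac{\text{covariance of } X \text{ and } Y}{\sigma_X \sigma_Y}$$

(b) Since $\hat{y}_i = \hat{b}_1 x_i$ and $\hat{b}_1 = \sum x_i y_i/\sum x_i^2$,

$$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{\hat{b}_1^2 \sum x_i^2}{\sum y_i^2} = \frac{\hat{b}_1 \sum x_i y_i}{\sum y_i^2} \qquad \text{so that} \qquad r = \sqrt{R^2} = \sqrt{\hat{b}_1\frac{\sum x_i y_i}{\sum y_i^2}}$$

6.22 Find $R^2$ for the estimated consumption regression of Prob. 6.9 using the equation (a) $R^2 = \sum \hat{y}_i^2/\sum y_i^2$ and (b) $R^2 = 1 - \sum e_i^2/\sum y_i^2$.

(a) From Prob. 6.19, we know that $\sum y_i^2 = \sum \hat{y}_i^2 + \sum e_i^2$, so that $\sum \hat{y}_i^2 = \sum y_i^2 - \sum e_i^2$. Since $\sum y_i^2 = 3684$ (by squaring and adding the $y_i$ terms from Table 6.5) and $\sum e_i^2 = 115.2572$ (from Table 6.7), $\sum \hat{y}_i^2 = 3684 - 115.2572 = 3568.7428$. Thus

$$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{3568.7428}{3684} \cong 0.9687, \text{ or } 96.87\%$$

(b) Using $\sum e_i^2 = 115.2572$ and $\sum y_i^2 = 3684$, we get

$$R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} = 1 - \frac{115.2572}{3684} \cong 1 - 0.0313 = 0.9687, \text{ or } 96.87\% \text{ (as in part a)}$$

6.23 Find $r$ for the estimated consumption regression in Prob.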
6.9 using (a) $\sqrt{R^2}$, (b) $\sum x_i y_i/(\sqrt{\sum x_i^2}\sqrt{\sum y_i^2})$, and (c) $r = \sqrt{\hat{b}_1 \sum x_i y_i/\sum y_i^2}$.

(a) $r = \sqrt{R^2} = \sqrt{0.9687} \cong 0.9842$ and is positive because $\hat{b}_1 > 0$.

(b) Using $\sum x_i y_i = 4144$ and $\sum x_i^2 = 4812$ from Table 6.4 and $\sum y_i^2 = 3684$ from Prob. 6.22(a), we get

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\sqrt{\sum y_i^2}} = \frac{4144}{\sqrt{4812}\sqrt{3684}} \cong \frac{4144}{(69.37)(60.70)} \cong 0.9842$$

(c) Using $\hat{b}_1 = 0.86$ found in Prob. 6.9(a), we obtain

$$r = \sqrt{\hat{b}_1\frac{\sum x_i y_i}{\sum y_i^2}} = \sqrt{\frac{0.86(4144)}{3684}} \cong \sqrt{0.9674} \cong 0.9836$$

The very small difference between the value of $r$ found here and that found in part a results from rounding errors.

6.24 (a) Find the rank or Spearman correlation coefficient between the midterm grade and the IQ ranking of a random sample of 10 students in a large class as given in Table 6.8, using Eq. (6.31). (b) When is the rank correlation used?

[Table 6.8 Midterm Grade and IQ Ranking. Rows: Student (1 to 10), Midterm grade, IQ ranking.]

(a) $$r' = 1 - \frac{6\sum D^2}{n(n^2 - 1)} \qquad (6.31)$$

where $D$ is the difference between ranks of corresponding pairs of the two variables (either in ascending or descending order, with the mean rank assigned to observations of the same value) and $n$ is the number of observations. The calculations to find $r'$ are given in Table 6.9.

[Table 6.9 Calculations to Find the Coefficient of Rank Correlation.]

(b) Rank correlation is used with qualitative data such as profession, education, or sex, when, because of the absence of numerical values, the coefficient of correlation cannot be found. Rank correlation is also used when precise values for all or some of the variables are not available (so that, once again, the coefficient of correlation cannot be found). With a great number of observations of large values, $r'$ can be found as an estimate of $r$ in order to avoid very time-consuming calculations (however, easy access to computers has practically eliminated this reason for using $r'$).

PROPERTIES OF ORDINARY LEAST-SQUARES ESTIMATORS

6.25 (a) What is meant by an unbiased estimator? How is bias defined? (b) Draw a figure showing the sampling distribution of an unbiased and a biased estimator.

(a) An estimator is unbiased if the mean of its sampling distribution equals the true parameter.
The mean of the sampling distribution is the expected value of the estimator. Thus lack of bias means that $E(\hat{b}) = b$, where $\hat{b}$ is the estimator of the true parameter, $b$. Bias is then defined as the difference between the expected value of the estimator and the true parameter; that is, bias $= E(\hat{b}) - b$. Note that lack of bias does not mean that $\hat{b} = b$, but that in repeated random sampling, we get, on average, the correct estimate. The hope is that the sample actually obtained is close to the mean of the sampling distribution of the estimator.

(b) Figure 6-7a shows the sampling distribution of an estimator that is unbiased, and Fig. 6-7b shows one that is biased.

[Fig. 6-7. Panel A: Unbiased estimator. Panel B: Biased estimator.]

6.26 (a) What is meant by the best unbiased or efficient estimator? Why is this important? (b) Draw a figure of the sampling distribution of two unbiased estimators, one of which is efficient.

(a) The best unbiased or efficient estimator refers to the one with the smallest variance among unbiased estimators. It is the unbiased estimator with the most compact or least spread-out distribution. This is very important because the researcher would be more certain that the estimator is closer to the true population parameter being estimated. Another way of saying this is that an efficient estimator has the smallest confidence interval and is more likely to be statistically significant than any other estimator. It should be noted that minimum variance by itself is not very important unless coupled with the lack of bias.

(b) Figure 6-8a shows the sampling distribution of an efficient estimator, while Fig. 6-8b shows an inefficient estimator.

[Fig. 6-8]

6.27 Why is the OLS estimator so widely used? Is it superior to all other estimators?

The OLS estimator is widely used because it is BLUE (best linear unbiased estimator). That is, among all unbiased linear estimators, it has the lowest variance. The BLUE properties of the OLS estimator are often referred to as the Gauss-Markov theorem. However,
nonlinear estimators may be superior to the OLS estimator (i.e., they might be unbiased and have lower variance). Since it is often difficult or impossible to find the variance of unbiased nonlinear estimators, however, the OLS estimator remains by far the most widely used. The OLS estimator, being linear, is also easier to use than nonlinear estimators.

6.28 (a) What is meant by the mean-square error? Why and when is the rule to minimize the mean-square error useful? (b) Prove that the mean-square error equals the variance plus the square of the bias of the estimator.

(a) $$\text{MSE}(\hat{b}) = E(\hat{b} - b)^2 = \text{var}\,\hat{b} + (\text{bias}\,\hat{b})^2$$

The rule to minimize the MSE arises when the researcher faces a slightly biased estimator but with a smaller variance than any unbiased estimator. The researcher is then likely to choose the estimator with the lowest MSE. This rule penalizes equally for the larger variance or for the square of the bias of an estimator. However, this is used only when the OLS estimator has an "unacceptably large" variance.

(b) $$\text{MSE}(\hat{b}) = E(\hat{b} - b)^2 = E[\hat{b} - E(\hat{b}) + E(\hat{b}) - b]^2$$
$$= E[\hat{b} - E(\hat{b})]^2 + [E(\hat{b}) - b]^2 + 2E\{[\hat{b} - E(\hat{b})][E(\hat{b}) - b]\} = \text{var}\,\hat{b} + (\text{bias}\,\hat{b})^2$$

because $E[\hat{b} - E(\hat{b})]^2 = \text{var}\,\hat{b}$, $[E(\hat{b}) - b]^2 = (\text{bias}\,\hat{b})^2$, and $E\{[\hat{b} - E(\hat{b})][E(\hat{b}) - b]\} = 0$, because this expression is equal to

$$E\{\hat{b}E(\hat{b}) - [E(\hat{b})]^2 - b\hat{b} + bE(\hat{b})\} = [E(\hat{b})]^2 - [E(\hat{b})]^2 - bE(\hat{b}) + bE(\hat{b}) = 0$$

6.29 (a) What is meant by consistency? (b) Draw a figure of the sampling distribution of a consistent estimator.

(a) Two conditions are required for an estimator to be consistent: (1) as the sample size increases, the estimator must approach more and more the true parameter (this is referred to as asymptotic unbiasedness); and (2) as the sample size approaches infinity in the limit, the sampling distribution of the
[…] (see Prob. 6.1)

THE ORDINARY LEAST-SQUARES METHOD

6.33 Express mathematically the following statements and formulas: (a) Minimize the sum of the squared deviations of each value of $Y$ from its corresponding fitted value. (b) Minimize the sum of squared residuals. (c) The normal equations. (d) The formulas for estimating $b_0$ and $b_1$.
Ans. (a) Min $\sum (Y_i - \hat{Y}_i)^2$ (b) Min $\sum e_i^2$ (c) $\sum Y_i = n\hat{b}_0 + \hat{b}_1\sum X_i$ and $\sum X_i Y_i = \hat{b}_0\sum X_i + \hat{b}_1\sum X_i^2$ (d) $\hat{b}_1 = (n\sum X_i Y_i - \sum X_i\sum Y_i)/[n\sum X_i^2 - (\sum X_i)^2] = \sum x_i y_i/\sum x_i^2$ and $\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}$

6.34 For the data in Table 6.12, find the value of (a) $\hat{b}_1$ and (b) $\hat{b}_0$. (c) Write the equation for the estimated OLS regression line.
Ans. (a) $\hat{b}_1 \cong 5.94$ (b) $\hat{b}_0 \cong 14.28$ (c) $\hat{Y}_i = 14.28 + 5.94X_i$

6.35 (a) On a set of axes, plot the data in Table 6.12, plot the estimated OLS regression line in Prob. 6.34, and show the residuals. (b) Show algebraically that the regression line goes through point $\bar{X}\bar{Y}$.
Ans. (a) See Fig. 6-11 (b) At $\bar{X} = 5$, $\hat{Y}_i = 14.28 + 5.94(5) = 43.98 \cong \bar{Y} = 44$ (the slight difference is due to rounding)

[Fig. 6-11. Scatter of the data in Table 6.12 with the estimated line $\hat{Y}_i = 14.28 + 5.94X_i$.]

6.36 With reference to the estimated OLS regression line in Prob. 6.34, state (a) the meaning of $\hat{b}_0$, (b) the meaning of $\hat{b}_1$, and (c) the elasticity of $Y$ with respect to $X$ at the means.
Ans. (a) $\hat{b}_0$ is the $Y$ intercept (b) $\hat{b}_1$ is the slope of the estimated OLS regression line (c) $\eta \cong 0.68$

TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES

6.37 For the data in Table 6.12 in Prob. 6.34, find (a) $s^2$, (b) $s_{\hat{b}_0}^2$ and $s_{\hat{b}_0}$, and (c) $s_{\hat{b}_1}^2$ and $s_{\hat{b}_1}$.
Ans. […]
6.38 Test at the 5% level of significance for (a) $b_0$ and (b) $b_1$ in Prob. 6.34.
Ans. (a) $b_0$ is statistically significant at the 5% level (b) $b_1$ is also statistically significant at the 5% level

6.39 Construct the 95% confidence interval for (a) $b_0$ and (b) $b_1$ in Prob. 6.34.
Ans. (a) […]

[…]

Multiple Regression Analysis

7.1 THE THREE-VARIABLE LINEAR MODEL

Multiple regression analysis is used for testing hypotheses about the relationship between a dependent variable $Y$ and two or more independent variables $X$ and for prediction. The three-variable linear regression model can be written as

$$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + u_i \qquad (7.1)$$

The additional assumption (to those of the simple regression model) is that there is no exact linear relationship between the $X$ values. Minimizing the sum of the squared residuals gives the following three normal equations (see Prob. 7.2):

$$\sum Y_i = n\hat{b}_0 + \hat{b}_1\sum X_{1i} + \hat{b}_2\sum X_{2i} \qquad (7.2)$$
$$\sum X_{1i}Y_i = \hat{b}_0\sum X_{1i} + \hat{b}_1\sum X_{1i}^2 + \hat{b}_2\sum X_{1i}X_{2i} \qquad (7.3)$$
$$\sum X_{2i}Y_i = \hat{b}_0\sum X_{2i} + \hat{b}_1\sum X_{1i}X_{2i} + \hat{b}_2\sum X_{2i}^2 \qquad (7.4)$$

which (when expressed in deviation form) can be solved simultaneously for $\hat{b}_1$ and $\hat{b}_2$, giving (see Prob. 7.3)

$$\hat{b}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \qquad (7.5)$$
$$\hat{b}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \qquad (7.6)$$

Then

$$\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}_1 - \hat{b}_2\bar{X}_2 \qquad (7.7)$$

Estimator $\hat{b}_1$ measures the change in $Y$ for a unit change in $X_1$ while holding $X_2$ constant. $\hat{b}_2$ is analogously defined. Estimators $\hat{b}_1$ and $\hat{b}_2$ are called partial regression coefficients. $\hat{b}_0$, $\hat{b}_1$, and $\hat{b}_2$ are BLUE (see Sec. 6.5).

EXAMPLE 1. Table 7.1 extends Table 6.1 and gives the bushels of corn per acre, $Y$, resulting from the use of various amounts of fertilizer $X_1$ and insecticides $X_2$, both in pounds per acre, from 1971 to 1980. Using Eqs. (7.5),
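(7.6), and (7.7), we get

$$\hat{b}_1 = \frac{(956)(504) - (900)(524)}{(576)(504) - (524)^2} \cong 0.65$$
$$\hat{b}_2 = \frac{(900)(576) - (956)(524)}{(576)(504) - (524)^2} \cong 1.11$$
$$\hat{b}_0 = 57 - (0.65)(18) - (1.11)(12) \cong 31.98$$

so that $\hat{Y}_i = 31.98 + 0.65X_{1i} + 1.11X_{2i}$. To estimate the regression parameters with three or more independent or explanatory variables, see Sec. 7.6.

7.2 TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES

In order to test for the statistical significance of the parameter estimates of the multiple regression, the variance of the estimates is required:

$$\text{Var}\,\hat{b}_1 = \sigma_u^2\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} \qquad (7.8)$$
$$\text{Var}\,\hat{b}_2 = \sigma_u^2\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} \qquad (7.9)$$

[$\hat{b}_0$ is usually not of primary concern; see Prob. 7.7(e)]. Since $\sigma_u^2$ is unknown, the residual variance $s^2$ is used as an unbiased estimate of $\sigma_u^2$:

$$s^2 = \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-k}$$

where $k$ = number of parameter estimates. Unbiased estimates of the variance of $\hat{b}_1$ and $\hat{b}_2$ are then given by

$$s_{\hat{b}_1}^2 = \frac{\sum e_i^2}{n-k}\cdot\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$$
$$s_{\hat{b}_2}^2 = \frac{\sum e_i^2}{n-k}\cdot\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$$

so that $s_{\hat{b}_1}$ and $s_{\hat{b}_2}$ are the standard errors of the estimates. Tests of hypotheses about $b_1$ and $b_2$ are conducted as in Sec. 6.3.

EXAMPLE 2. Table 7.2 (an extension of Table 7.1) shows the additional calculations required to test the statistical significance of $b_1$ and $b_2$. The values for $\hat{Y}_i$ in Table 7.2 are obtained by substituting the values for $X_{1i}$ and $X_{2i}$ into the estimated OLS regression equation found in Example 1. (The values for $y_i^2$ are obtained by squaring $y_i$ from Table 7.1 and are to be used in Sec. 7.3.) Using the values from Tables 7.2 and 7.1, we get

$$s_{\hat{b}_1}^2 = \frac{13.6704}{10-3}\cdot\frac{504}{(576)(504) - (524)^2} \cong 0.06 \qquad \text{and} \qquad s_{\hat{b}_1} \cong 0.24$$
$$s_{\hat{b}_2}^2 = \frac{13.6704}{10-3}\cdot\frac{576}{(576)(504) - (524)^2} \cong 0.07 \qquad \text{and} \qquad s_{\hat{b}_2} \cong 0.27$$

Table 7.1 Corn Produced with Fertilizer and Insecticide Used, with Calculations for Parameter Estimation

Year    Y    X1   X2     y    x1   x2   x1y   x2y  x1x2  x1^2  x2^2
1971   40     6    4   -17   -12   -8   204   136    96   144    64
1972   44    10    4   -13    -8   -8   104   104    64    64    64
1973   46    12    5   -11    -6   -7    66    77    42    36    49
1974   48    14    7    -9    -4   -5    36    45    20    16    25
1975   52    16    9    -5    -2   -3    10    15     6     4     9
1976   58    18   12     1     0    0     0     0     0     0     0
1977   60    22   14     3     4    2    12     6     8    16     4
1978   68    24   20    11     6    8    66    88    48    36    64
1979   74    26   21    17     8    9   136   153    72    64    81
1980   80    32   24    23    14   12   322   276   168   196   144
Sum   570   180  120     0     0    0   956   900   524   576   504
Mean   57    18   12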
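The arithmetic of Examples 1 and 2 can be checked with a short script. This is only a sketch under stated assumptions: the function name `three_variable_ols` is hypothetical, and the three data lists are assumed to match the $Y$, $X_1$, and $X_2$ columns of Table 7.1. It applies Eqs. (7.5) to (7.7) in deviation form and then the standard-error formulas of Sec. 7.2.

```python
import math

# Data ASSUMED to match Table 7.1: corn yield Y (bushels per acre),
# fertilizer X1 and insecticide X2 (pounds per acre), 1971-1980.
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X1 = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
X2 = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]


def three_variable_ols(Y, X1, X2):
    """Estimate Y = b0 + b1*X1 + b2*X2 with the deviation-form formulas."""
    n, k = len(Y), 3
    y_bar, x1_bar, x2_bar = sum(Y) / n, sum(X1) / n, sum(X2) / n
    y = [v - y_bar for v in Y]          # deviations from the means
    x1 = [v - x1_bar for v in X1]
    x2 = [v - x2_bar for v in X2]
    s_x1y = sum(a * b for a, b in zip(x1, y))
    s_x2y = sum(a * b for a, b in zip(x2, y))
    s_x1x2 = sum(a * b for a, b in zip(x1, x2))
    s_x1x1 = sum(a * a for a in x1)
    s_x2x2 = sum(a * a for a in x2)
    den = s_x1x1 * s_x2x2 - s_x1x2 ** 2   # zero if X1, X2 are exactly collinear
    b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / den    # Eq. (7.5)
    b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / den    # Eq. (7.6)
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar          # Eq. (7.7)
    sum_e2 = sum((Yi - b0 - b1 * a - b2 * b) ** 2
                 for Yi, a, b in zip(Y, X1, X2))
    s2 = sum_e2 / (n - k)                           # residual variance
    se_b1 = math.sqrt(s2 * s_x2x2 / den)
    se_b2 = math.sqrt(s2 * s_x1x1 / den)
    return {"b0": b0, "b1": b1, "b2": b2, "sum_e2": sum_e2,
            "se_b1": se_b1, "se_b2": se_b2,
            "t1": b1 / se_b1, "t2": b2 / se_b2}


fit = three_variable_ols(Y, X1, X2)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

Since the sketch does not round intermediate results, its standard errors and $t$ ratios can differ in the second decimal from the hand calculations, which round $s_{\hat{b}_1}^2$ and $s_{\hat{b}_2}^2$ before taking square roots.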
Table 7.2 Corn-Fertilizer-Insecticide Calculations to Test Significance of Parameters

Year    Y    X1   X2   Y-hat       e      e^2   y^2
1971   40     6    4   40.32   -0.32   0.1024   289
1972   44    10    4   42.92    1.08   1.1664   169
1973   46    12    5   45.33    0.67   0.4489   121
1974   48    14    7   48.85   -0.85   0.7225    81
1975   52    16    9   52.37   -0.37   0.1369    25
1976   58    18   12   57.00    1.00   1.0000     1
1977   60    22   14   61.82   -1.82   3.3124     9
1978   68    24   20   69.78   -1.78   3.1684   121
1979   74    26   21   72.19    1.81   3.2761   289
1980   80    32   24   79.42    0.58   0.3364   529
Sum                             0.00  13.6704  1634

$t_1 = \hat{b}_1/s_{\hat{b}_1} = 0.65/0.24 \cong 2.70$, and $t_2 = \hat{b}_2/s_{\hat{b}_2} = 1.11/0.27 \cong 4.11$. Since both $t_1$ and $t_2$ exceed $t = 2.365$ with 7 df at the 5% level of significance (from App. 5), both $b_1$ and $b_2$ are statistically significant at the 5% level.

7.3 THE COEFFICIENT OF MULTIPLE DETERMINATION

The coefficient of multiple determination $R^2$ is defined as the proportion of the total variation in $Y$ "explained" by the multiple regression of $Y$ on $X_1$ and $X_2$, and (as shown in Sec. 6.4) it can be calculated by (see Prob. 7.14)

$$R^2 = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2} = \frac{\hat{b}_1\sum y x_1 + \hat{b}_2\sum y x_2}{\sum y^2}$$

Since the inclusion of additional independent or explanatory variables is likely to increase the RSS $= \sum \hat{y}_i^2$ for the same TSS $= \sum y_i^2$ (see Sec. 6.4), $R^2$ increases. To factor in the reduction in the degrees of freedom as additional independent or explanatory variables are added, the adjusted $R^2$, or $\bar{R}^2$, is computed (see Prob. 7.16):

$$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$$

where $n$ is the number of observations and $k$ the number of parameters estimated.

EXAMPLE 3. $R^2$ for the corn-fertilizer-insecticide example can be found from Table 7.2:

$$R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} = 1 - \frac{13.6704}{1634} \cong 0.9916, \text{ or } 99.16\%$$

This compares with an $R^2$ of 97.10% in the simple regression, with fertilizer as the only independent or explanatory variable.

$$\bar{R}^2 = 1 - (1 - 0.9916)\frac{10-1}{10-3} = 1 - 0.0084(1.2857) \cong 0.9892, \text{ or } 98.92\%$$

7.4 TEST OF THE OVERALL SIGNIFICANCE OF THE REGRESSION

The overall significance of the regression can be tested with the ratio of the explained to the unexplained variance. This follows an $F$ distribution (see Sec.
5.5) with $k - 1$ and $n - k$ degrees of freedom, where $n$ is the number of observations and $k$ is the number of parameters estimated:

$$F_{k-1,\,n-k} = \frac{\sum \hat{y}_i^2/(k-1)}{\sum e_i^2/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

If the calculated $F$ ratio exceeds the tabular value of $F$ at the specified level of significance and degrees of freedom (from App. 7), the hypothesis is accepted that the regression parameters are not all equal to zero and that $R^2$ is significantly different from zero.

In addition, the $F$ ratio can be used to test any linear restriction of regression parameters by using the form

$$F_{p,\,n-k} = \frac{(\sum e_{Ri}^2 - \sum e_i^2)/p}{\sum e_i^2/(n-k)}$$

where $p$ is the number of restrictions being tested, $\sum e_{Ri}^2$ indicates the sum of squared residuals for the restricted regression where the restrictions are assumed to be true, and $\sum e_i^2$ indicates the sum of squared residuals for the unrestricted regression (i.e., the usual residuals). The null hypothesis is that the $p$ restrictions are true, in which case the residuals from the restricted and unrestricted models should be identical, and $F$ would take the value of zero. If the restrictions are not true, the unrestricted model will have lower errors, increasing the value of $F$. If $F$ exceeds the tabular value, the null hypothesis is rejected. This test will be used extensively in Sec. 11.6.

EXAMPLE 4. To test the overall significance of the regression estimated in Example 1 at the 5% level, we can use $R^2 = 0.9916$ (from Example 3), so that

$$F_{2,7} = \frac{0.9916/2}{(1 - 0.9916)/7} \cong 413.17$$

Since the calculated value of $F$ exceeds the tabular value of $F = 4.74$ at the 5% level of significance and with df = 2 and 7 (from App. 7), the hypothesis is accepted that $b_1$ and $b_2$ are not both zero and that $R^2$ is significantly different from zero.

7.5 PARTIAL-CORRELATION COEFFICIENTS

The partial-correlation coefficient measures the net correlation between the dependent variable and one independent variable after excluding the common influence of (i.e., holding constant) the other independent variables in the model.
For example, $r_{YX_1 \cdot X_2}$ is the partial correlation between $Y$ and $X_1$, after removing the influence of $X_2$ from both $Y$ and $X_1$ [see Prob. 7.23(a)]:

$$r_{YX_1 \cdot X_2} = \frac{r_{YX_1} - r_{YX_2}\,r_{X_1X_2}}{\sqrt{1 - r_{X_1X_2}^2}\,\sqrt{1 - r_{YX_2}^2}}$$
$$r_{YX_2 \cdot X_1} = \frac{r_{YX_2} - r_{YX_1}\,r_{X_1X_2}}{\sqrt{1 - r_{X_1X_2}^2}\,\sqrt{1 - r_{YX_1}^2}}$$

where $r_{YX_1}$ = simple-correlation coefficient between $Y$ and $X_1$, and $r_{YX_2}$ and $r_{X_1X_2}$ are analogously defined. Partial-correlation coefficients range in value from $-1$ to $+1$ (as do simple-correlation

[…]

simultaneously. It is always possible to calculate $\hat{b}_1$ and $\hat{b}_2$ except if there is an exact linear relationship between $X_1$ and $X_2$ or if the number of observations on each variable of the model is 3 or fewer. Parameter $\hat{b}_0$ can then be calculated by substituting into Eq. (7.7) the values of $\hat{b}_1$ and $\hat{b}_2$ [calculated with Eqs. (7.5) and (7.6)] and $\bar{Y}$, $\bar{X}_1$, and $\bar{X}_2$ (calculated from the given values of the problem).

7.4 With reference to multiple regression analysis with two independent or explanatory variables, indicate the meaning of (a) $b_0$, (b) $b_1$, (c) $b_2$. (d) Are $\hat{b}_0$, $\hat{b}_1$, and $\hat{b}_2$ BLUE?

(a) Parameter $b_0$ is the constant term or intercept of the regression and gives the estimated value of $Y_i$ when $X_{1i} = X_{2i} = 0$.

(b) Parameter $b_1$ measures the change in $Y$ for each one-unit change in $X_1$ while holding $X_2$ constant. Slope parameter $b_1$ is a partial regression coefficient because it corresponds to the partial derivative of $Y$ with respect to $X_1$, or $\partial Y/\partial X_1$.

(c) Parameter $b_2$ measures the change in $Y$ for each one-unit change in $X_2$ while holding $X_1$ constant. Slope parameter $b_2$ is the second partial regression coefficient because it corresponds to the partial derivative of $Y$ with respect to $X_2$, or $\partial Y/\partial X_2$.

(d) Since $\hat{b}_0$, $\hat{b}_1$, and $\hat{b}_2$ are obtained by the OLS method, they are also best linear unbiased estimators (BLUE; see Sec. 6.5). That is, $E(\hat{b}_0) = b_0$, $E(\hat{b}_1) = b_1$, and $E(\hat{b}_2) = b_2$, and $s_{\hat{b}_0}$, $s_{\hat{b}_1}$, and $s_{\hat{b}_2}$ are lower than for any other unbiased linear estimator. Proof of these properties is very cumbersome without the use of matrix algebra, so it is not provided here.

7.5 Table 7.3 gives the real per capita income in thousands of U.S. dollars $Y$ with the percentage of the labor force in agriculture $X_1$ and the average years of schooling of the population over 25 years of age $X_2$ for 15 developed countries in 1981.
(a) Find the least-squares regression equation of $Y$ on $X_1$ and $X_2$. (b) Interpret the results of part a.

Table 7.3 Per Capita Income, Labor Force in Agriculture, and Years of Schooling

n     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Y     6   8   8   7   7  12   9   8   9  10  10  11   9  10  11
X1    9  10   8   7  10   4   5   5   6   8   7   4   9   5   8
X2    8  13  11  10  12  16  10  10  12  14  12  16  14  10  12

(a) Table 7.4 shows the calculations required to estimate the parameters of the OLS regression equation of $Y$ on $X_1$ and $X_2$.

$$\hat{b}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \frac{(-28)(74) - (38)(-12)}{(60)(74) - (-12)^2} = \frac{-1616}{4296} \cong -0.38$$
$$\hat{b}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = \frac{(38)(60) - (-28)(-12)}{(60)(74) - (-12)^2} = \frac{1944}{4296} \cong 0.45$$
$$\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}_1 - \hat{b}_2\bar{X}_2 \cong 9 - (-0.38)(7) - (0.45)(12) = 9 + 2.66 - 5.40 = 6.26$$

Thus the estimated OLS regression equation of $Y$ on $X_1$ and $X_2$ is

$$\hat{Y}_i = 6.26 - 0.38X_{1i} + 0.45X_{2i}$$

(b) The estimated OLS regression equation indicates that the level of real per capita income $Y$ is inversely related to the percentage of the labor force in agriculture $X_1$ but directly related to the years of schooling of the population over 25 years (as might have been anticipated). Specifically, $\hat{b}_1$ indicates that a 1 percentage point decline in the labor force in agriculture is associated with an increase in per capita income of 380 U.S. dollars while holding $X_2$ constant. However, an increase of 1 year of schooling for the population over 25 years of age is associated with an increase in per capita income of 450 U.S. dollars, while holding $X_1$ constant. When $X_{1i} = X_{2i} = 0$, $\hat{Y}_i = \hat{b}_0 = 6.26$.

7.6 Table 7.5 extends Table 6.11 and gives the per capita GDP (gross domestic product) to the nearest \$100 ($Y$), the percentage of the economy represented by agriculture ($X_1$), and the male literacy rate ($X_2$) reported by the World Bank World Development Indicators for 1999 for 15 Latin American countries. (a) Find the least-squares regression equation of $Y$ on $X_1$ and $X_2$. (b) Interpret the results of part a and compare them with those of Prob. 6.30.

(a) Table 7.6 shows the calculations required
to estimate the parameters of the OLS regression equation of $Y$ on $X_1$ and $X_2$.

$$\hat{b}_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \cong -1.95$$
$$\hat{b}_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \cong 0.53$$
$$\hat{b}_0 = \bar{Y} - \hat{b}_1\bar{X}_1 - \hat{b}_2\bar{X}_2 \cong 30.53 - (-1.95)(11) - (0.53)(88.53) \cong 5.06$$

Thus the estimated OLS regression equation of $Y$ on $X_1$ and $X_2$ is

$$\hat{Y}_i = 5.06 - 1.95X_{1i} + 0.53X_{2i}$$

(b) The estimated OLS equation indicates that the level of per capita income $Y$ is inversely related to the percentage of the economy represented by agriculture $X_1$ but directly related to the literacy rate of the male population (as might have been anticipated). Specifically, $\hat{b}_1$ indicates that a 1 point decline in the percentage of the economy represented by agriculture is associated with an increase in per capita income of 195 U.S. dollars while holding $X_2$ constant. However, an increase in the male literacy rate of 1 point is associated with an increase in per capita income of 53 U.S. dollars, while holding $X_1$ constant. When $X_{1i} = X_{2i} = 0$, $\hat{Y}_i = \hat{b}_0 = 5.06$. If $X_2$ is found to be statistically significant [see Prob. 7.12(b)] and should, therefore, be included in the regression, $\hat{b}_1 = -2.60$ found in Prob. 6.30 is not a reliable estimate of $b_1$.

[Table 7.4: calculations for Prob. 7.5, an extension of Table 7.3.]

Table 7.5 Per Capita Income, Agricultural Proportion, and Literacy

[Rows: Country (1 to 15), n, $Y_i$, $X_{1i}$, $X_{2i}$.]

Key: (1) Argentina; (2) Bolivia; (3) Brazil; (4) Chile; (5) Colombia; (6) Dominican Republic; (7) Ecuador; (8) El Salvador; (9) Honduras; (10) Mexico; (11) Nicaragua; (12) Panama; (13) Peru; (14) Uruguay; (15) Venezuela.
Source: World Bank World Development Indicators.

TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES
7.7 Define (a) $\sigma_u^2$ and $s^2$, (b) var $\hat{b}_1$ and var $\hat{b}_2$, (c) $s_{\hat{b}_1}^2$ and $s_{\hat{b}_2}^2$, (d) $s_{\hat{b}_1}$ and $s_{\hat{b}_2}$. (e) Why is $\hat{b}_0$ usually not of primary concern?

(a) $\sigma_u^2$ is the variance of the error term in the true relationship between $X_{1i}$, $X_{2i}$, and $Y_i$. However, $s^2 = \hat{\sigma}_u^2 = \sum e_i^2/(n-k)$ is the residual variance and is an unbiased estimate of $\sigma_u^2$, which is unknown. $k$ is the number of estimated parameters. In the two independent or explanatory variable multiple regression, $k = 3$. Thus $n - k = n - 3$ = df.

(b) $$\text{Var}\,\hat{b}_1 = \sigma_u^2\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} \qquad \text{while} \qquad \text{Var}\,\hat{b}_2 = \sigma_u^2\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$$

The variances of $\hat{b}_1$ and $\hat{b}_2$ (or their estimates) are required to test hypotheses about and construct confidence intervals for $b_1$ and $b_2$.

(c) $$s_{\hat{b}_1}^2 = \frac{\sum e_i^2}{n-k}\cdot\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} \qquad s_{\hat{b}_2}^2 = \frac{\sum e_i^2}{n-k}\cdot\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2}$$

$s_{\hat{b}_1}^2$ and $s_{\hat{b}_2}^2$ are, respectively, unbiased estimates of var $\hat{b}_1$ and var $\hat{b}_2$, which are unknown because $\sigma_u^2$ is unknown.

(d) $s_{\hat{b}_1} = \sqrt{s_{\hat{b}_1}^2}$ and $s_{\hat{b}_2} = \sqrt{s_{\hat{b}_2}^2}$. $s_{\hat{b}_1}$ and $s_{\hat{b}_2}$ are, respectively, the standard deviations of $\hat{b}_1$ and $\hat{b}_2$ and are called the standard errors.

(e) Unless sufficient observations near $X_{1i} = X_{2i} = 0$ are available, intercept parameter $\hat{b}_0$ is usually not of primary concern and a test of its statistical significance can be omitted. Equation (7.15) for var $\hat{b}_0$ is very cumbersome and also for that reason is seldom given and used:

$$\text{Var}\,\hat{b}_0 = \sigma_u^2\cdot\frac{\sum X_1^2\sum X_2^2 - (\sum X_1X_2)^2}{n[\sum X_1^2\sum X_2^2 - (\sum X_1X_2)^2] - \sum X_1(\sum X_1\sum X_2^2 - \sum X_2\sum X_1X_2) + \sum X_2(\sum X_1\sum X_1X_2 - \sum X_2\sum X_1^2)} \qquad (7.15)$$

However, $s_{\hat{b}_0}$ is sometimes given in the computer printout, so tests of the statistical significance of $b_0$ can be conducted easily.

[Table 7.6: calculations for Prob. 7.6, an extension of Table 7.5.]

7.8 For the data in Table 7.3, find (a) $s^2$, (b) $s_{\hat{b}_1}^2$ and $s_{\hat{b}_1}$, and (c) $s_{\hat{b}_2}^2$ and $s_{\hat{b}_2}$.

(a) The calculations required to find $s^2$ are shown in Table 7.7, which is an extension of Table 7.4.
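The values of $\hat{Y}_i$ are obtained by substituting the values of $X_{1i}$ and $X_{2i}$ into the estimated OLS regression equation found in Prob. 7.5(a):

$$s^2 = \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-k} = \frac{12.2730}{15-3} \cong 1.02$$

Table 7.7 Per Capita Income Regression: Calculations to Test Significance of Parameters

Country    Y   X1   X2   Y-hat       e      e^2
   1       6    9    8    6.44   -0.44   0.1936
   2       8   10   13    8.31   -0.31   0.0961
   3       8    8   11    8.17   -0.17   0.0289
   4       7    7   10    8.10   -1.10   1.2100
   5       7   10   12    7.86   -0.86   0.7396
   6      12    4   16   11.94    0.06   0.0036
   7       9    5   10    8.86    0.14   0.0196
   8       8    5   10    8.86   -0.86   0.7396
   9       9    6   12    9.38   -0.38   0.1444
  10      10    8   14    9.52    0.48   0.2304
  11      10    7   12    9.00    1.00   1.0000
  12      11    4   16   11.94   -0.94   0.8836
  13       9    9   14    9.14   -0.14   0.0196
  14      10    5   10    8.86    1.14   1.2996
  15      11    8   12    8.62    2.38   5.6644
Sum                                      12.2730

(b) Using the value of $s^2$ found in part a and the values in Table 7.4, we get

$$s_{\hat{b}_1}^2 = s^2\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} = 1.02\,\frac{74}{(60)(74) - (-12)^2} \cong 0.02 \qquad \text{and} \qquad s_{\hat{b}_1} \cong \sqrt{0.02} \cong 0.14$$

(c) $$s_{\hat{b}_2}^2 = s^2\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} = 1.02\,\frac{60}{(60)(74) - (-12)^2} \cong 0.01 \qquad \text{and} \qquad s_{\hat{b}_2} \cong \sqrt{0.01} \cong 0.10$$

7.9 Test at the 5% level of significance for (a) $b_1$ and (b) $b_2$ in Prob. 7.5(a).

(a) $$t_1 = \frac{\hat{b}_1 - b_1}{s_{\hat{b}_1}} = \frac{-0.38 - 0}{0.14} \cong -2.71$$

Since the absolute value of $t_1$ exceeds the tabular value of $t = 2.179$ (from App. 5) at the 5% level (two-tail test) and with $n - k = 15 - 3 = 12$ df, we conclude that $b_1$ is statistically significant at the 5% level (i.e., we reject $H_0$, that $b_1 = 0$, in favor of $H_1$, that $b_1 \neq 0$).

(b) $$t_2 = \frac{\hat{b}_2 - b_2}{s_{\hat{b}_2}} = \frac{0.45 - 0}{0.10} = 4.50$$

So $b_2$ is statistically significant at the 5% (and 1%) level (i.e., $H_0$, that $b_2 = 0$, is rejected in favor of $H_1$, that $b_2 \neq 0$).

7.10 Construct the 95% confidence interval for (a) $b_1$ and (b) $b_2$ in Prob. 7.5(a).

(a) The 95% confidence interval for $b_1$ is given by

$$b_1 = \hat{b}_1 \pm 2.179\,s_{\hat{b}_1} = -0.38 \pm 2.179(0.14) = -0.38 \pm 0.31$$

So $b_1$ is between $-0.69$ and $-0.07$ (i.e., $-0.69 \le b_1 \le -0.07$) with 95% confidence.

(b) The 95% confidence interval for $b_2$ is given by

$$b_2 = \hat{b}_2 \pm 2.179\,s_{\hat{b}_2} = 0.45 \pm 2.179(0.10) = 0.45 \pm 0.22$$

So $b_2$ is between 0.23 and 0.67 (i.e., $0.23 \le b_2 \le 0.67$) with 95% confidence.

7.11 For the data in Table 7.5, find (a) $s^2$, (b) $s_{\hat{b}_1}^2$ and $s_{\hat{b}_1}$, and (c) $s_{\hat{b}_2}^2$ and $s_{\hat{b}_2}$.

(a) The calculations required to find $s^2$ are shown in Table 7.8, which is an extension of Table 7.6.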
The values of $\hat{Y}_i$ are obtained by substituting the values of $X_{1i}$ and $X_{2i}$ into the estimated OLS regression equation found in Prob. 7.6:

$$s^2 = \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n-k} \qquad \text{with } n - k = 15 - 3 = 12$$

(b) Using the value of $s^2$ found in part a and the values in Table 7.6, we get

$$s_{\hat{b}_1}^2 = s^2\frac{\sum x_2^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} \qquad \text{and} \qquad s_{\hat{b}_1} \cong 1.15$$

(c) $$s_{\hat{b}_2}^2 = s^2\frac{\sum x_1^2}{\sum x_1^2\sum x_2^2 - (\sum x_1 x_2)^2} \qquad \text{and} \qquad s_{\hat{b}_2} \cong 0.73$$

[Table 7.8 Per Capita GDP Regression: Calculations to Test Significance of Parameters. Columns: Country, $Y$, $X_1$, $X_2$, $\hat{Y}$, $e$, $e^2$.]

7.12 Test at the 5% level of significance for (a) $b_1$ and (b) $b_2$ in Prob. 7.6(a).

(a) $$t_1 = \frac{\hat{b}_1 - b_1}{s_{\hat{b}_1}} = \frac{-1.95 - 0}{1.15} \cong -1.70$$

Since the absolute value of $t_1$ does not exceed the tabular value of $t = 2.179$ (from App. 5) at the 5% level (two-tail test) and with $n - k = 15 - 3 = 12$ df, we conclude that $b_1$ is not statistically significant at the 5% level (i.e., we cannot reject $H_0$, that $b_1 = 0$).

(b) $$t_2 = \frac{\hat{b}_2 - b_2}{s_{\hat{b}_2}} = \frac{0.53 - 0}{0.73} \cong 0.73$$

So $b_2$ is also not statistically significant at the 5% level (i.e., $H_0$, that $b_2 = 0$, cannot be rejected).

7.13 Construct the 95% confidence interval for (a) $b_1$ and (b) $b_2$ in Prob. 7.6(a).

(a) The 95% confidence interval for $b_1$ is given by

$$b_1 = \hat{b}_1 \pm 2.179\,s_{\hat{b}_1} = -1.95 \pm 2.179(1.15) = -1.95 \pm 2.51$$

So $b_1$ is between $-4.46$ and $0.56$ (i.e., $-4.46 \le b_1 \le 0.56$) with 95% confidence. Since this interval contains 0, we can see that $b_1$ is not statistically significant.

(b) The 95% confidence interval for $b_2$ is given by

$$b_2 = \hat{b}_2 \pm 2.179\,s_{\hat{b}_2} = 0.53 \pm 2.179(0.73) = 0.53 \pm 1.59$$

So $b_2$ is between $-1.06$ and $2.12$ (i.e.,
[…] $(n-1)/(n-k) > 1$ and $\bar{R}^2 < R^2$. When $n$ is large, for a given $k$, $(n-1)/(n-k)$ is close to unity and $\bar{R}^2$ and $R^2$ will not differ much. When $n$ is small and $k$ is large in relation to $n$, $\bar{R}^2$ will be much smaller than $R^2$ and $\bar{R}^2$ can even be negative (even though $0 \le R^2 \le 1$).

7.17 (a) Find $\bar{R}^2$ for the OLS regression equation estimated in Prob. 7.5(a). (b) How does $\bar{R}^2$ computed in part a compare with $R^2$ from Prob. 7.15(a) and $R^2$ from Prob. 6.31?

(a) Using $R^2 = 0.6932$ found in Prob. 7.15(b), we get

$$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k} = 1 - (1 - 0.6932)\frac{15-1}{15-3} \cong 0.6421$$

(b) $R^2 = 0.33$ in the simple regression, with only the percentage of the labor force in agriculture, $X_1$, as an independent or explanatory variable [see Prob. 6.31(c)]. $R^2 = 0.69$ by adding the years of schooling for the population over 25 years of age, $X_2$, as the second independent or explanatory variable. However, when consideration is taken of the fact that the addition of $X_2$ reduces the degrees of freedom by 1 (from $n - k = 15 - 2 = 13$ in the simple regression of $Y$ on $X_1$ to $n - k = 15 - 3 = 12$ in the multiple regression of $Y$ on $X_1$ and $X_2$), $\bar{R}^2$ is reduced to 0.64. The fact that $b_2$ was found to be statistically significant [in Prob. 7.9(b)] and that $\bar{R}^2 = 0.28$ in the simple regression of $Y$ on $X_1$ and rises to $\bar{R}^2 = 0.64$ in the multiple regression of $Y$ on $X_1$ and $X_2$ justifies the retention of $X_2$ as an additional independent or explanatory variable in the regression equation.

7.18 (a) How can $\sum e_i^2$ (required to conduct tests of significance) be found without first finding $\hat{Y}_i$? (b) Find $\sum e_i^2$ for the data in Table 7.3 without finding $\hat{Y}_i$ (Table 7.7).

(a) Using the estimated values of $\hat{b}_1$ and $\hat{b}_2$ and $\sum y x_1$, $\sum y x_2$, and $\sum y^2$, we first get

$$R^2 = \frac{\hat{b}_1\sum y x_1 + \hat{b}_2\sum y x_2}{\sum y^2}$$

Then $R^2 = 1 - \sum e_i^2/\sum y_i^2$, so that $\sum e_i^2 = (1 - R^2)\sum y_i^2$. This method of finding $\sum e_i^2$ involves far fewer calculations than using $\hat{Y}_i$ (the only additional calculation besides those required to estimate $\hat{b}_1$ and $\hat{b}_2$ is $\sum y_i^2$).

(b) From the value of $R^2 = 0.6935$ found in Prob.
7.15(b) [which utilizes only the estimated values of $\hat{b}_1$ and $\hat{b}_2$ found in Prob. 7.5(a) and the values calculated in Table 7.4] and $\sum y_i^2 = 40$ from Prob. 7.15, we get
$$\sum e_i^2 = (1 - R^2)\sum y_i^2 = (1 - 0.6935)(40) = 12.26$$
This compares with $\sum e_i^2 = 12.2730$ found in Table 7.7. (The small difference in the value of $\sum e_i^2$ found by these two methods is obviously due to rounding errors.) Note, however, that finding $\sum e_i^2$ as done above eliminates entirely the need for Table 7.7.

TEST OF THE OVERALL SIGNIFICANCE OF THE REGRESSION

7.19 (a) State the null and alternative hypotheses in testing the overall significance of the regression. (b) How is the overall significance of the regression tested? What is its rationale? (c) Give the formula for the explained and unexplained or residual variance.

(a) Testing the overall significance of the regression refers to testing the hypothesis that none of the independent variables helps to explain the variation of the dependent variable about its mean. Formally, the null hypothesis is
$$H_0\colon\ b_1 = b_2 = \cdots = b_k = 0$$
against the alternative hypothesis
$$H_1\colon\ \text{not all } b \text{ values are } 0$$
(b) The overall significance of the regression is tested by calculating the F ratio of the explained to the unexplained or residual variance. A "high" value for the F statistic suggests a significant relationship between the dependent and independent variables, leading to the rejection of the null hypothesis that the coefficients of all explanatory variables are jointly zero.
(c) $$\text{Explained variance} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{k - 1} = \frac{RSS}{k - 1}$$
where $k$ is the number of estimated parameters (see Sec. 6.3).
$$\text{Unexplained variance} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n - k} = \frac{ESS}{n - k}$$

7.20 (a) Give the formula for the calculated F ratio or statistic for the case of a simple regression and for a regression with $n = 15$, $k = 3$. (b) Can the calculated F statistic be "large" and yet none of the estimated parameters be statistically significant?

(a) $$F_{1,\,n-2} = \frac{\sum \hat{y}_i^2 / 1}{\sum e_i^2 / (n - 2)}$$
where the subscripts on $F$ denote the degrees of freedom of the numerator and denominator, respectively. In this simple regression case, $F_{1,\,n-2} = t^2_{n-2}$ for the same level of significance.
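The equality $F_{1,\,n-2} = t^2_{n-2}$ in the simple regression case can be checked numerically. The following Python sketch is only an illustration: the data are made-up numbers (an assumption, not taken from the problems above), and the variable names are hypothetical. It follows the text's convention that RSS is the explained (regression) sum of squares and ESS the unexplained (error) sum of squares.

```python
import numpy as np

# Illustrative data (an assumption, not taken from the problems above).
x = np.array([6.0, 10.0, 12.0, 14.0, 16.0, 18.0, 22.0, 24.0, 26.0, 32.0])
y = np.array([40.0, 44.0, 46.0, 48.0, 52.0, 58.0, 60.0, 68.0, 74.0, 80.0])
n, k = len(y), 2                      # k = 2 estimated parameters (b0 and b1)

xd, yd = x - x.mean(), y - y.mean()   # deviations from the means
b1 = (xd @ yd) / (xd @ xd)            # OLS slope
b0 = y.mean() - b1 * x.mean()         # OLS intercept

e = y - (b0 + b1 * x)                 # residuals
ess = e @ e                           # unexplained (error) sum of squares
rss = yd @ yd - ess                   # explained (regression) sum of squares

f_stat = (rss / (k - 1)) / (ess / (n - k))    # F with 1 and n - 2 df
s_b1 = np.sqrt((ess / (n - k)) / (xd @ xd))   # standard error of b1
t_stat = b1 / s_b1                            # t with n - 2 df

print(f"F = {f_stat:.4f}, t^2 = {t_stat**2:.4f}")
```

The two printed numbers should agree up to floating-point error, because in a simple regression the explained sum of squares equals $\hat{b}_1^2 \sum x_i^2$.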
For multiple regression with $n = 15$ and $k = 3$,
$$F_{2,12} = \frac{\sum \hat{y}_i^2 / 2}{\sum e_i^2 / 12}$$
(b) It is possible for the calculated F statistic to be "large" and yet none of the estimated parameters to be statistically significant. This might occur when the independent variables are highly correlated with each other (see Sec. 9.1). The F test is often of limited usefulness because it is likely to reject the null hypothesis regardless of whether the model explains "a great deal" of the variation of Y.

7.21 (a) Prove that
$$\frac{\sum \hat{y}_i^2/(k - 1)}{\sum e_i^2/(n - k)} = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)}$$
(b) In view of the result of part a, what is an alternative way to state the hypothesis for testing the overall significance of the regression?

(a) $$\frac{\sum \hat{y}_i^2/(k - 1)}{\sum e_i^2/(n - k)} = \frac{\sum \hat{y}_i^2}{\sum e_i^2}\cdot\frac{n - k}{k - 1} = \frac{\sum \hat{y}_i^2/\sum y_i^2}{\sum e_i^2/\sum y_i^2}\cdot\frac{n - k}{k - 1} = \frac{R^2}{1 - R^2}\cdot\frac{n - k}{k - 1} = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)}$$
(b) The F ratio, as a test of significance of the explanatory power of all independent variables jointly, is roughly equivalent to testing the significance of the $R^2$ statistic. If the alternative hypothesis is accepted, we would expect $R^2$, and therefore F, to be high.

7.22 Test at the 5% level the overall significance of the OLS regression estimated in Prob. 7.5(a) by using (a) $[\sum \hat{y}_i^2/(k - 1)]/[\sum e_i^2/(n - k)]$ and (b) $[R^2/(k - 1)]/[(1 - R^2)/(n - k)]$.

(a) Using $\sum \hat{y}_i^2 = 27.727$ from Prob. 7.15(a) and $\sum e_i^2 = 12.2730$ from Table 7.7, we get
$$F_{2,12} = \frac{27.727/2}{12.2730/12} \cong 13.56$$
Since the calculated value of the F ratio exceeds the tabular value of $F = 3.88$ at the 5% level of significance and with 2 and 12 degrees of freedom (see App. 7), the alternative hypothesis that not all b's are zero is accepted at the 5% level.
(b) Using $R^2 = 0.6932$ from Prob. 7.15(b), we get
$$F_{2,12} = \frac{0.6932/2}{(1 - 0.6932)/12} \cong 13.56$$
and we accept the hypothesis that $R^2$ is significantly different from zero at the 5% level.

PARTIAL-CORRELATION COEFFICIENTS

7.23 (a) How can the influence of $X_2$ be removed from both $Y$ and $X_1$ in finding $r_{YX_1 \cdot X_2}$? (b) What is the range of values for partial-correlation coefficients? (c) What is the sign of partial-correlation coefficients? (d) What is the use of partial-correlation coefficients?
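Before turning to the answers, the idea behind $r_{YX_1 \cdot X_2}$ can be previewed numerically. The Python sketch below is an illustration under stated assumptions: the data are simulated, and the helper names `residuals` and `corr` are hypothetical. It computes the partial correlation in two ways: by correlating the residuals left after regressing $Y$ and $X_1$ each on $X_2$, and from the standard formula in terms of the simple correlation coefficients.

```python
import numpy as np

def residuals(v, z):
    """Residuals from an OLS regression of v on a constant and z (hypothetical helper)."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def corr(a, b):
    """Simple correlation coefficient between a and b."""
    return np.corrcoef(a, b)[0, 1]

# Simulated data (an assumption, not from the text).
rng = np.random.default_rng(0)
x2 = rng.normal(size=50)
x1 = 0.6 * x2 + rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=50)

# Route 1: correlate what is left of Y and X1 after removing the influence of X2.
r_resid = corr(residuals(y, x2), residuals(x1, x2))

# Route 2: formula based on the three simple correlation coefficients.
r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
r_formula = (r_y1 - r_y2 * r_12) / np.sqrt((1 - r_y2**2) * (1 - r_12**2))

print(round(r_resid, 6), round(r_formula, 6))
```

Both routes give the same number, and that number necessarily lies between -1 and +1.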
(a) In order to remove the influence of $X_2$ on $Y$, we regress $Y$ on $X_2$ and find the residual $e_1 = Y^*$. To remove the influence of $X_2$ on $X_1$, we regress $X_1$ on $X_2$ and find the residual $e_2 = X_1^*$. $Y^*$ and $X_1^*$ then represent the variations in $Y$ and $X_1$, respectively, left unexplained after removing the influence of $X_2$ from both $Y$ and $X_1$. Therefore, the partial correlation coefficient is merely the simple correlation coefficient between the residuals $Y^*$ and $X_1^*$ (that is, $r_{YX_1 \cdot X_2} = r_{Y^* X_1^*}$).
(b) Partial correlation coefficients range in value from $-1$ to $+1$ (just as in the case of simple correlation coefficients). For example, $r_{YX_1 \cdot X_2} = -1$ refers to the case where there is an exact or perfect negative linear relationship between $Y$ and $X_1$ after removing the common influence of $X_2$ from both $Y$ and $X_1$. However, $r_{YX_1 \cdot X_2} = 1$ indicates a perfect positive linear net relationship between $Y$ and $X_1$. And $r_{YX_1 \cdot X_2} = 0$ indicates no linear relationship between $Y$ and $X_1$ when the common influence of $X_2$ has been removed from both $Y$ and $X_1$. As a result, $X_1$ can be omitted from the regression.
(c) The sign of partial correlation coefficients is the same as that of the corresponding estimated parameter. For example, for the estimated regression equation $\hat{Y} = \hat{b}_0 + \hat{b}_1 X_1 + \hat{b}_2 X_2$, $r_{YX_1 \cdot X_2}$ has the same sign as $\hat{b}_1$ and $r_{YX_2 \cdot X_1}$ has the same sign as $\hat{b}_2$.
(d) Partial correlation coefficients are used in multiple regression analysis to determine the relative importance of each explanatory variable in the model. The independent variable with the highest partial correlation coefficient with respect to the dependent variable contributes most to the explanatory power of the model and is entered first in a stepwise multiple regression analysis. It should be noted, however, that partial correlation coefficients give an ordinal, not a cardinal, measure of net correlation, and the sum of the partial correlation coefficients between the dependent and all the independent variables in the model need not add up to 1.

7.24 For the regression estimated in Prob.
7.52), find (a) ryy.x, and (8) ryy.ay+ (0) Bows Xr Ny contribute more to the explanatory power of the model? psf eyayags Wo med 9 Find Bt rps Pry A rye Ung the values from Table 7.4, we got Enya : eos 10.5715) — (0.6964N-.180) Teme = 96331 (6) Using the values of Pry. Frys nd ty, Saleulated fa part a, we got 0.084) — (0.571 {0.180 Vining ft in, t= -arsontyt -t-0s7isF (2) Sines rnin eneeeds the absointe value of reap ME-soncha that Y; comtelbues more than Ys tothe explanatory poner of the modal = 0.8072 MATRIX NOTATION 725 7.26 (a) Why is matrix notation used? (6) What are the advantages? ages? (a) Matrix notation is a mathematical way to representa spstem of several finear equation in.an organized fashion Since, by ote assimptions (Chap. 6), the standard reeressinn is Ena nd centaine mlipe observations of the same linear equation, linear algebra lends ite well to econometrics, (H) Cine advantage of mate notation is eonsisoneie in the notation since one dees not have 46 wete surimations and ellipsss, Also, the mates solution Wosks for any wumbor of indepsndest Variables (om 0 to bp (0) The main disadvantage of matrix sotation is that it requires a more advanced knowledge of linear lgcbra and matrix mathematics, ¢ the OLS solution using matrix natation, Tn matricar, the regression ie written MULTIPLE REGRESHON ANALYSIS vextew We nant o mnie the sum of squared eros, in mati tation Mis woe Min (= Hy (=) Taking the ia Srivtve and sci tau sero anys ie Expanding terms and simplitying wy b=0 Solving for 5 wd = '¥ ay lad = aay er Since any marin times its inverse is equal to the Wemtty matrix away ince any matrix noltiplid by Pie equal to tse seu ey 7.27 For the regression in Prob, 7.6, identify the matrices (a) Nand (#) Yr. ta 1 1 1 1 1 1 1 8 1 1 1 1 1 1 1 2828 lomar. 7 CHAP. 7) MULTIPLE REGRESSION ANALYSIS Ve oy 7.28 For the regression in Prob, 7.6, identify the matrices (0) X°X and (6) ¥UK)*, se ey =| ies m7 snes 1328 14,905 118,666, UTP 03187 2392 (oy ery! 
=| 03187 6.0058 won29 02203 6.0089 0038 SUMMARY PRORLEM 7.29 Table 79 gives the hypothetical quantity demanded of a commodity, ¥, it price, ¥y, and con- sumers’ income, .t3. from 198) to 1999. a) Fit an OLS regression to these observations (6) Test at the 5% level for the statistical significance of the slope parameters. {c) Find the unadjusted and adjusted cowthiien! of multiple correlation. (a) Lest 1or the overall signiieanoe of the regression. (c) Find the partail corretition coefficients and indicate which independent Vanable contributes more to the explanatory power of the model, (7) Hind the eoethicent of price elasticity of demand yp and income clasticity of demand yy at the means. ig) Report all the results in summary and round off all calculations to four decimal places. (a) Table 7.10 gives the calculations required to fit the linear repression, (Lax) (2x5) = (Px vay) _ (= 505102.800.0000 = C107. 500= 11.900), Ea EN)- End (an 2 0 0) — (—11 9007 (La)(Eai) = (LavWE (007,500 NOH) — (50S 11,900), (602,800,000) — (—11 9007 2 Wt 5.1061 46) = (0.01611) 382 266 2666 — 5 1061.8, + 00167: = 4108 2 00167 (by Wo can find Sef hy fat caeaatng B® rom Table 710 176 MULTIPLE REGRESSION ANALYS ‘Table 7.9 Quantity Demanded of a Commodity, Price, and Connamers Tecoma, 1085-1008 {$1604 —208 + (0167 107560) a = 9s e we ES = (1 = 0.9508}4600 = 228.32 a= za SO n-k Pepe 2632 2.300.000) y 15 =3 (60h(2,800,000) = (=11,500) sant and ook Pala (Lal a w Ee == RE, 282 153 (GON R00,000) = 107 B= S:1061 9 Tatas 360 and 2 Therefore, both by and by are statistically significant at the $4% level (©) RE =9.9508 (Found in pare 6). Therefore ae mV ST} 21-11-0950 121 0.96 Re ie- 0 apsogs—1) ; “ Fite “ORB A= a9s09) 715 =m = 1595? 
‘Therefore, A is significantly different from 0 at the 5% level te) To find a tne Fr aye 9 ML feet Fin (From Table 7.10) ca m2 =] = ms 7 a ms] @ |e a aa ie 7 = we} os | oa se | ais | | ae | os a se 2] om | as wo) ow | 9 | oe fw fot | ae] oy 6 ' wos | 8 wer) ow | 6 | om s| oo | -w] 6 © 5 wom | as ee) S| a} ow] 4 We ‘ ee) wa) om | os | om s| a fom] os ae ' wom | 33 vo) vw | os [asm fos | a | ae | ie ¢ | mom |e wer} ow | oo | om |» | a | | Shee ©] aan | be we{ oe | + | im | is | -2 | to © | tom | as lea TN NOSSO WEAN 178 MULTIPLE REGRESSION ANALYSIS lomar. 7 Saw =205 a AIG Ves eee pny = = Ig gas JEEP aan oT enn me) roa = EMEP AOI) HOOTENANNY gay Visine, f= (asian? ft -osery ga fit, = OT) — 0918-0 a go19 VisAnyl-Sy,/1-(-ostet- (996137 ‘Thos ¥, contributes more than W; t0 the explanatory poner of the model wo m= be Ty =82.2668—s106ry, +0.018TK, fvaluce (-2.6006) (2.8602), Supplementary Problems: THE THRER-VARIARLE LINEAR MODEL 720 Table 7.11 extends Table 6.12 und gives observations on ¥, 1, and X: of Yon 4) and %) 4764 529K # 213K) Find the OLS regression equation Table 711 Observations ow ¥, %), and 2 TAH With eference tothe estimated OLS regress @) bcand, (6) by _ Ans. a) by = 4.76 i the constant or ¥ intercept; Fy shy = 4.75, when May Ay) =O (68 Be = indicating that a oneunit create ia Y, (while holding Y; constant) rnuls nits (2) = 2.13, indicating that a One-unitinesease in increase in of 213 units quation of on) and yn Prob. 730 aterpret (a), ae incieae in By of $29 ‘hile holding, sonstant) results in an CHAP. 7) MULTIPLE REGRESSION ANALYSIS 19 TESTS OF SIGNIFICANCE OF PARAMETER ESTIMATES Table TIN, find tak, (Oh of ard sg .and Ce) amd 50 (by s2 = .G and 5, 21.78 4c) STB BS and 4, #435 TAY Test at the $% level of significance for {a} By and) by in Prob, 7.30. 
Ans, (a) By & statistically significant at the 3% level () by is mot statistically significant at the S*% level 7M Comsceuet the 95% confidenos inteval for (a) by andl (6P #y it Prob, 7.90, Ans: (a) 108 $8, 29.80 (8) ~8.165 0; 5 1242 THE COEFFICIENT OF MULTIPLE DETERMINATION TAS For the estimated OLS regression found in Prob, 7.38, find (a) A and) FE. Should included ia the regression? 7 Aes. (a) KOM fasing = 1 Eels Ey] _ th) Ts Gey Since by wat wen fond te be. statisially significant (in Prob. 7.33(09) and A* fell trom &°= A= 0.77 with only) as an iadeyanden variable pee Prd, GAOGH Ws A= 0.73 fabaresh, a should anek be inwlodel in the regression. M6 For R= 1660,n= 10, and k= 1, find #, Ane. E60) 737 Far B® — 060.0 — 10, and 2. find Ans. Pm 055 7.38 For R? = 060 and & = 2 (asin Prob, 7.37) but w = 100, find Ans, = 0.596 7.399 For R= 040,n= 10, and k= 5, find #. des, RE = 0008 (hut is interpreted as being equal 10 0) TEST OF THE OVERALL SIGNIFICANCE OF THE REGRESSION 740 te OLS regression in Prob. 7M, find (a) the explained variance, (b) the unexph ¢. and fc) the # ratio or statistic. Aes a SFM 8) Lede 8 A 298 TAL Test the overall signtficanes of the OLS resression estimated in Prob. 7.10 at (a the 5% level and (6) at the 1% level Ans. (a) Since the calcolated F ratio (12.98) exoscds the tabular oF theoretical alc of F (474) at a = 0.05 and df 2 and 7, we accept the hypothesis that the estimated OLS regression parameters are jointly Senificant at the 2% level. (UY Shae the tabalas value of Fis 9.56 atu 0,01, the alternative yyrtleses is avovpted at the L% level of significance also. PARTIAL CORRELATION COEFFICIENTS: JAE For the estimated OLS regression in Prob, 7.80, find (a) rvy.a, and (8) ry, x.) Which independent ‘arlable contributes more to the exptaratory power of tha mode? Ans. (ah ryga, O78 OB ryayy, = O18 fe) ¥y MATRIX, NOTATION FAS (a) What isthe Fist colon ofthe mais? (8) Whore i he varinse off i the f matrix? ddns. 
{ah acofumm of Ty h sccond tom, scone calvin 180 MULTIPLE REGRESSION ANALYSIS lomar. 7 SUMMARY PROBLEM. ‘Tad Table 7.12 extends Table 6.13 and gives data for a randoor rample of 1? couples on the number of children they had, Y, the muaber of childsen they stated that they wanted al the time of their uarriage. Xa the Jearsof education of the Wife... (a) Find the OLS regression equation of Y om Xj and1;. ¢@) Caleulate ‘values and test at the $% level for the statistical significance of the slope parameters. te} Find the unadjusted and adjusted coefMetent of mubiple correlaion. f) Test for the overall significance of the regression. (e) Find the partial correlation coeflcients and indicate which independent variable contsibutes ore te the explanatory power of the modsl, Carry ott al calculations to two decimal plas, ante 7.82 Number OF Chien Ha ant Wanted amd B:eucatton of YH cum) of 2) 2,4) el*)7)*)°)*@)]"]2 r{a[s[e[+[«[s]e[a[fs][i][s]a ~leto)*)/2!f2]3]*,/3]2]:[3]2 ~lalu)se)wlwlu]lselnlelwlu]s Ans. (a) T= 6:904053%, ~ 0.38 (6) Since 4y = 3.12 and ¢y = —5.57, both by and By are statistically significant at the 3% level. (ch #° = 0.92 and = O90 (ah Since Fra $1.31, AR’ is statistically sig nificant at the 5% level. fe) Pyxony = 0.71 and ry 0.87; thus 1X, eontribotes more thaa.X, t0 the explanatory power of the motel. ase) Further Techniques and Applications in Regression Analysis FUNCTIONAL FORM ‘Theory or the seatter of points (requently suggests nonlinear relationships. 11 is possible to trans- form some noriincar functions iato Tinear ones 60 that the OLS method can still be used. Some of the most common of these and their transformations are shown in Table 8.1. Applying the OLS method to: he transformes near tunctons gives unbuased slope estimates. In Eg. (8.4), Oy 18 Ine elashelly ot 2 with respoct to. 
¥ ‘Table 4.1 Fonetiomal Forms nd this Transformations Funston Transformation Form Equation Yohire PaRshe au oobi to yet hvew Pr mie hitew Semilo Veh+thivitu | Yak bbz tu Reciprocal Yoh hvahvtan| Vm a hv sean | Polynomial BXAMPLE 4. Suppove that we posistate a demand function of the form wpabe Where ¥ = quantity demanded of a commodity ¥\ = its price Xi-m comsumen income 18 ‘Copyright 2002 The McGraw-Hill Companies, Inc, Click Here for Tenms of Use, 12 PURTHER TECHNIQUES AND APPLICATIONS IN REGRESSION ANALYSIS. (CHAP. & Usiicing the o linear fore, we get in Table 7.9 and applying the CHLS method to this demand function transformed into double-fog hy oar 96 = 0.26, + 0.390 X 3s) 6.60 ‘where ~0.26 and 0.39 are, respectively, unbiased extimates af the peice and income elasticity of demand (see Prob. 8.2), The dt hove sess better tha (or the linear form (see Prob, 729) 3.2 DUMMY VAKIABLES Qualitative explanatory variables (such as wartime vs. peacetime. periods of strike vs_nomstrike male vs. females, etc.) can be introduced into regression analysis by assigning the value of | for one shassitication (2 g.. wartime) and for the other (ep. peacetinse) These are called dhamany variables and are treated as any othor variable. Dummy variables can be uscd to eapture changes (shifis) in the intercept [Eq. (8.5)|- changes in slope [Eq, (86)], and changes in both intercept and slope [Eq. (7) Ye yt ht boi oo Pm yh ON OOD ba wa) Vm yt N b baDot baXD tae ro) ‘where 1s | for one classification and otherwise and X is the usual quantitative explanatory variable Dummy variables also can be used to capture differences among mors than two classifications, such as seasons and regions [Eq. (88): PB AI EAD, FB bhaDy ba wo ‘where by isthe intereept for the first season oF region and BD, D;, and D refer. respectively. ta season oF region 2, 5, and, Note that for any number of elassifications &, & ~ | dummiesare required (see Frobs. 
89, 8.26, and 8.27), For qualitative dependent variables, see See, 8.5 BXAMPLE 2 Table $2 ghee grou private domevticintesnent¥ and grows national roduc both in lina i eurene dla, fo the United Stas tfom 195911984 Using Dl for he war yu (982 1848) and DO fore pence yar, eet 258-016" = 20810 = 0.94 (unre «889 ‘Dis aiscay apnea atte Sel. Ths y= -2.8 for peste nd isthe common sope-eocien (For esafa dccac in spe aswell ay ileences mtecpt and pe Se Probe 87 and 32) ble 8.2 Grass Privaic Demesiic Investment and Gross National Product (ia Billions uf Dollars); United States, Year] 1929] 194] 141 [waz | 1983] 1948] 1945] 1946 | 1947] 1940] 1990] 1980] Last | 19% 198 | 198 y | 93] 13a] io] 99] sal 72] toe] aur] 40] aso] 35a] sea] soa] sui] nals ¥ | 908 [1000] 1249] 158.3] 192.0] 210.5] 212.3] 309.3] 232.8] 289.1 [2580] 284.2] 330.2 [347.2] 346 | [366.3 ‘Sees FemowiRapare of the Presson, 11% Government Printing Ofies, Washing, DC, 1900, p. 208 83 DISTRIBUTED LAG MODELS Tis often the case that the eurrent value of the dependent variable is2 funetion of or depends on the ‘weighted sum of present 1 and past values of the independent variable (and the error term), with generally different weights assigned ta variaus time periods: CHAP. #) FURTHER TECHNIQUES AND APPLICATIONS IN REGRESSION ANALYSIS 182 + bgp HON HB Xia tay oy Estimating the distributed dar made! (Eq, (6.2) presents two difficulties: (1) the stata on one observation or time period are lost for each lagged value of 1’ and (2) the X'sare likely to be related to cach other, so that it may be dificult or impossible to isolate the effect of each ¥' on Y. ‘These difficulties can be eliminated by deriving from Eq. (8.9) the Ki which assumes that the weights decline geometrically (ee Prob, 8.11: Mp mal — a) + OM HAR tay (el vk Joe model (Eq. (8.10), where 0 < 3 < 1 and y, = af,— ha... 
However, Eq, (8:10) violates two assumptions of the OLS model and results in biased and inconsistent estimatoys that require adjustment (sce See. 9.31 Alternatively, the Adon lag mou can be used. This allows for a. move flexible lag structure to be approximated empirically by 4 polynomial ofdegeee at east one more than the number of turning points in the functiom (see Prob. $.131, Assuming a three-period lag [Eq. (8.170 king the form of a secomd- degree polynomial [Eq. (8.2). we can dorive Eq. 613) (sce Prob. 8.14 eS OE OA, HO Had HPs Ht un where Bee tedtel AD so that $oy2y Fey tern +My (63 where 7 : and Pex, ‘he values of the 6 terms an Eg. ($.14) are obtained by substituting the estunated values of ¢, from Eq, (8.14) into Eq. (8.12) (see Prob. 8.15) EXAMPLE 3. Table 8.3 gives the lev of ips and the gross domestic peoduct , bath in billions of 1906 Gollars, for the United States from 190 to 1999, Fisting the Koyek model, we get i, ~3.9-405TH HORN» GIN G9 139.99, so that. = ~147.42 0.99 where i= 0.87 and dti 0. ‘Table 8.3 enports snd Gross Dosestic Product (ih Bios of 1996 Dela); United Stats, 1960-1999 vs [i981 | t982 Ce sso] saa) ava yor | wa7a | wos To,a08.6 | 2008.7 | 19,6775 24,asha | 254782 [263073 row” | iver | 1992 1999) wiza| siv2[ ame[ ates 14a. 2631.6] 26,705.7| 27,5204] 28,2506] s0a7sa] 35,503. ‘Source Lati Fedonl Resres (Horeau of Eeimomie ABA), 84 FORECASTING Forecasting refers to the estimation of the value of the dependent variable Fp given the actual or projected value of the indopendent variable Vp The foresast-errne variance a} is given By 1st PURTHER TECHNIQUES AND APPLICATIONS IN REGRESSION ANALYSIS. (CHAP. & 1, (Xp-4P apaet| i424 We“ ¥Y [ = Saar ‘where nis the number of obscrvations and a Is the varlance of u. Since ols seldom known, we uses? as an unbiased estimate of ers, s0 that the estimated forecast-error variance, 5, is aM Pet tuansse where Fp = fy + 6)Xy and refers to the r distribution with m — 2 degrees of freedom. EXAMPLE 4. 
Returning tothe cormfertier example in Chap. 6, recall that f= 2712-416 ¥. 1= 1, Walt, JA, Hy 846 Com Esample 6), and = Seize 2p 2473178 2 $091 (rom Example 63> Projecting for 1981 am amonnt of front ote per ace of N= 38,8 ast 1, 3-18" S712 1.6635) = 45.38 ‘Then the 95% conidense a forecast interval for Yp i 198 {See Prob. 8.19 for foresasting in multiple regression analysis) 945 and 5p = 308. i, 5.38 (2511.08), of between 3827 and 52.49, 85 BINARY CHOICE MODELS. If the dopondont variable is a dummy variable, an OLS regression is not appropriate. An OLS regression could yield incongruous predictions greater than | or less than (Also, the regression would violate the assumption ofno heteroserdastigity boowuse of the diserete nature of the depondont variable To estimate the model, we first set up an underlying model Yi = byt hXiry Here, ¥* is considered an waderlying propsusity for the dummy vasiabls to take the sahue of 1 and isa ble so that Vif YF 2 Oy = oO Of HOw (open), TE" <0, ¥ =0 (not opens robit estimation gives the Following results: 1.9982 +0.0010%), Pe = OS8F, 35, = 0.0008, tn 2 i e867 Totes significance, we can use the usual ¢ test bat since probit uses the standard normal distribution, the = tables can be wos fg ful, = = 1990/0 8047 = 240 « -196 from App 3) therfore sic athe Se eve festa, <9oaigjmoes = 2> 1.96; mereore gala athe 5% el 86 INTERPRETATION OF BINARY CHOICE MODELS ‘The interpretation of by changes in a binary choice model. by istheeflect of Von ¥*. “The margina effect of X on PUY = 1) is easier to interpret and is given by Faby + OE) By Te ‘To test the fit of the model (analogous to 2°}, the maximized logelikelihood value (In L) cam be compared fo the maximized log hkehhood ina model with only a constant (In £y) an the dikelihood ratio index int tera 1 = Another measure of goodness of fit is to compare predicted values of ¥ f0 actual values, Custo= imacily, if Pay & by iN) > 05, them y= 1 EXAMPLE 6. 
Continuing with the interpretation from Sec. 8.5. The marginal effect of W (GDP/ 039 Nigeria wands Singapore Soak Affiea as onl = 099 086 Uganda Uruguay Wencpcla Zimbabwe oo = 099 = pay 020 Predicted

Solved Problems

FUNCTIONAL FORM

8.1 (a) How is the form of the functional relationship decided? (b) What are some of the most useful transformations into linear functions? (c) Are the estimated parameters obtained from the application of the OLS method to transformed linear functions unbiased estimates of the true population parameters?

(a) Economic theory can sometimes suggest the functional form of an economic relationship. For example, microeconomic theory postulates an average short-run cost curve that is U-shaped and an average fixed-cost curve that constantly falls and approaches the quantity axis asymptotically as total fixed costs are spread over more and more units produced. The scatter of points also suggests the appropriate functional form in a two-variable relationship. When neither theory nor the scatter of points is of help, the linear function is usually tried first because of its simplicity.
(b) Some of the most useful and common transformations of functions into linear functions are the double-logarithm or double-log, the semilog, the reciprocal, and the polynomial functions (see Table 8.1). One of the advantages of the double-log form is that the slope parameters represent elasticities (see Prob. 8.2). The semilog function is appropriate when the dependent variable grows at about a constant rate over time, as in the case of the labor force and population (see Prob. 8.4). The reciprocal and polynomial functions are appropriate to estimate average-cost and total-cost curves (see Prob. 8.5).
(c) The estimation of a transformed double-log function by the OLS method results in unbiased slope estimators. However, $\hat{b}_0 = \text{antilog}\ \hat{b}_0^*$ is a biased but consistent estimator of $b_0$.
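The double-log transformation just described can be sketched in Python as follows. This is a minimal illustration on simulated data, not a reproduction of any problem in the text: the "true" parameter values, the sample size, and all variable names are assumptions made for the example.

```python
import numpy as np

# Simulated data from Y = b0 * X^b1 * e^u (all numbers are assumptions).
rng = np.random.default_rng(1)
n = 200
b0_true, b1_true = 3.0, -0.5
x = rng.uniform(1.0, 10.0, size=n)
u = rng.normal(scale=0.1, size=n)
y = b0_true * x**b1_true * np.exp(u)

# Transform to ln Y = ln b0 + b1 ln X + u and apply OLS to the logs.
X = np.column_stack([np.ones(n), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b0_star, b1_hat = coef          # b0_star estimates ln b0; b1_hat is the elasticity

b0_hat = np.exp(b0_star)        # antilog of the estimated intercept

print("estimated elasticity b1:", round(b1_hat, 3))
print("antilog of intercept   :", round(b0_hat, 3))
```

The slope on $\ln X$ is read directly as an elasticity, while the intercept has to be converted back with the antilog, which is the step that introduces the bias mentioned in part c.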
The fact that $\hat{b}_0$ is biased is not of much consequence because the constant is usually not of primary interest [see Prob. 7.Xe-]. In the other transformed functions in Table 8.1, $\hat{b}_0$ also is unbiased. The double-log linear model is appropriate if $\ln Y$ plotted against $\ln X$ lies approximately on a straight line.

8.2 Prove that in the double-log demand function of the form
$$Q = b_0 P^{b_1} Y^{b_2} e^{u}$$
where $Q$ is the quantity demanded, $P$ is the price, and $Y$ is the income, (a) $b_1$ is the price elasticity of demand, or $\eta_P$, and (b) $b_2$ is the income elasticity of demand, or $\eta_Y$. (The reader without knowledge of calculus can skip this problem.)

(a) The definition of price elasticity of demand is
$$\eta_P = \frac{\partial Q}{\partial P}\cdot\frac{P}{Q}$$
The derivative of the $Q$ function with respect to $P$ is
$$\frac{\partial Q}{\partial P} = b_1\left(b_0 P^{b_1 - 1} Y^{b_2} e^{u}\right) = b_1\left(b_0 P^{b_1} Y^{b_2} e^{u}\right)P^{-1} = b_1\frac{Q}{P}$$
Substituting the value of $\partial Q/\partial P$ into the formula for $\eta_P$, we get
$$\eta_P = b_1\frac{Q}{P}\cdot\frac{P}{Q} = b_1$$
(b) The definition of income elasticity of demand is
$$\eta_Y = \frac{\partial Q}{\partial Y}\cdot\frac{Y}{Q}$$
The derivative of the $Q$ function with respect to $Y$ is
$$\frac{\partial Q}{\partial Y} = b_2\left(b_0 P^{b_1} Y^{b_2 - 1} e^{u}\right) = b_2\left(b_0 P^{b_1} Y^{b_2} e^{u}\right)Y^{-1} = b_2\frac{Q}{Y}$$
Substituting the value of $\partial Q/\partial Y$ into the formula for $\eta_Y$, we get
$$\eta_Y = b_2\frac{Q}{Y}\cdot\frac{Y}{Q} = b_2$$

8.3 Table 8.6 gives the output in tons $Q$, the labor input in hours $L$, and capital input in machine-hours $K$, of 14 firms in an industry. Fit the data to the Cobb-Douglas production function
$$Q = b_0 L^{b_1} K^{b_2} e^{u}$$

Table 8.6 Output and Labor and Capital Inputs of 14 Firms in an Industry

The data are first transformed into natural log form, as shown in Table 8.7, and then the OLS method is applied to the transformed variables as explained in Sec. 6.2 (the computer does all of this). The results are
$$\ln \hat{Q} = -23.23 + 1.43 \ln L + 3.05 \ln K$$

Table 8.7 Output and Labor and Capital Inputs in Original and Log Form
Firm | Q | L | K | ln Q | ln L | ln K
av | a0 | is [a0 | Same sateié 2 | 0 | 160) 450 | Sgr 6.10935 3] ouo | ots | ss | atone soni?
4] sa | is | 430 | eres S638 s | so | isin | ss | ssn e178 6 | a0 | issn | aso | easzs 6.10925 2 | 40 | ism | oo | catmas 1944 s | oi@ | 20 | so7R8 to | amo | reso ao | an 613133 | asa | ism] as | sass 6.07538 | sso | rr ] ao | 6302 615273 | sea | 200 ] a8 | 63209 ere | aa | ism] aan | ener oReT? The estimated coefficients 143 and 3.05 seler, respectively, to the output elasticity af Land X. Since LAS-+ 403 =4.48 > 1, tere are ireasing reTUms 10 sale AN Hhis MNDUSKTY (e.g. wheIEAKINg the IMpUES OT oth £ and 4 by 10% causes output to increase by 44.8%). it anilons} ie the United States Table 8.8. BA Table 8.8 gives the smmber of monfasun porsons employed N from 1980 to 1999. Fit an OLS regression Tine to the data ‘Table 8.8. Millions of Pervoas Erglayed in the United States from 1980 io 1999 Year | 1980 | 198 ] ws | 198s | 198 | 1983 | one | 987 | 19Re | Ime ae Year | 1990 | tar | vee [ ras [19a [1993 | 1996 [i997 [ aoee | 199m Ww | 94] 1952 [rose [vie7 [ase | nize | tise | i2z7 fies | ieee Since emplosment tends to grow at about a constant rate aver time 7, we fia semilog function of the form of Eq, (8.2) to the transformed data in Table 8.9. The result InN aah + ONT SNe 6.77) CHAP. #) FURTHER TECHNIQUES AND APPLICATIONS IN REGRESSION ANALYSIS 189 ‘Table 8.9 Millions Eanployed in the ‘United State, TORG-1999: ‘Original aod. Traesformed Data ‘Year W my |? 19 wa | somes |r 9st a2 | asin | 2 1982 ws | aaa | 3 1983 m2 | sso | 4 98 oat | ass | 5 aRs ora | ast |g 988 v3 | soon | 7 sas | 032 | goss | 9 i | ors | assis | to io | tna | aeaso | 1 1991 wos2 | ses | 12 re ee ios | 407 | arog | ta 1904 ameo | 15 995 ames | 16 a6 ame | a7 1997 samt | is 1995 amass | 19 1999 ase | 20 85 Fit a short-run average-cost curve to the data in Table 8.10, which gives average cost AC and output @ fora firm over a I2eweek period. ‘Table 410 Average Coot and Output of Firm over a 12-Weok Period wee] 7] 2] *] 4] s]) ©] 7] *) ®?] 
wo] nye ac fox | me [ef im] fa fo fae | om fous [or [xr ven | 190 | 109 | ass [aoa [asa | iss | amr | 130 | asi
Since microeconomic theory postulates U-shaped short-run cost curves, we fit ALAR =rU-E mW tH where = The result is RE amuis6—2200 40019" aoe a8 942)

DUMMY VARIABLES

8.6 (a) Write an equation for peacetime and one for wartime for Eqs. (8.5) to (8.7) if $C$ = consumption, $Y_d$ = disposable income, and $D = 1$ for war years and $D = 0$ for peace years. (b) Draw a figure for Eqs. (8.5) to (8.7) showing a consumption function for peace years and one for war years. (c) What are the advantages of estimating Eqs. (8.5) to (8.7) as opposed to estimating two regressions, one for peace years and one for war years, in each case?

(a) Letting the a equations refer to peacetime and the b equations refer to wartime, we get
$$C = b_0 + b_1 Y_d + u \qquad (8.5a)$$
$$C = (b_0 + b_2) + b_1 Y_d + u \qquad (8.5b)$$
$$C = b_0 + b_1 Y_d + u \qquad (8.6a)$$
$$C = b_0 + (b_1 + b_2) Y_d + u \qquad (8.6b)$$
$$C = b_0 + b_1 Y_d + u \qquad (8.7a)$$
$$C = (b_0 + b_2) + (b_1 + b_3) Y_d + u \qquad (8.7b)$$
Note that all peacetime equations are identical because $D = 0$. During wartime, consumption is less than in peacetime because of controls, reduced availability of goods and services, and moral suasion. Thus $b_2$ and $b_3$ (the coefficients of the terms in $D$) are expected to be negative for war years, so that the equations for war years have a lower intercept and/or slope than for peacetime.
(b) See Fig. 8-1.

Fig. 8-1

(c) The advantages of estimating Eqs. (8.5) to (8.7) as opposed to estimating a separate regression in each case, one for peacetime and one for wartime, are (1) the degrees of freedom are greater, (2) a variety of hypotheses can easily be tested to see if the differences in constants and/or slopes are statistically significant, and (3) computer time is saved.

8.7 Table 8.11 gives the quantity of milk (in thousands of quarts) supplied by a firm per month $Q$ at various prices $P$ over a 14-month period.
‘The firm faced a strike In some of its plants during the fifth, sixth, and seventh months, Ruma regression of @ on # (a) testing only for a shift in the nercept during periods of strike and nonstrike and (6) testing for a shift in the intercept and slope. ‘Table 8.11 Quantity Supplicd of Milk (in Thowsands of Quaris) at Various Prices Month] 1 @ | fw iefis| w]e] fia | ef spar as] 6] oe » [02 [os os2] 0x3] 008 | 098 [0.96 [ose | ose | 020 | 0.03 | 0.98 | 06 [aor during the months of strike and = 0 otherwise, we pet Ow WAT I6S9TP= 34D Rw 98 186 (2859) CHAP. #) FURTHER TECHNIQUES AND APPLICATIONS IN REGRESSION ANALYSIS 11 88 89 Since 12 js statistically significant at better than the 1% lve, the inteeept is by = -3247 during the period of no else, and it equale fy +, = —32.47 37,68 = —70.11 during the strike period, o G—-2074+ \ersce— 30.620 +2871 R089 CHS) sey AI Donal pL) ave statistically sigaicant at better than the 1% kel. The intercept and slope arc. respectively, 207M and 162.6 doring the period of na csike. Druring the ctrike peri, the intercept I By-t By = 28.74 — 309.26— =339, while the slope is + by — 1286-4 287.14 — 450 [since the fees, prosuinaly. able ta step up the iacesase is sutput in fs nostrikieg Slantsh Table 8.12 gives the consumpiion expenditures C, the disposable income Ty, and the sex of the head of the household. 5 of 12 random families. (a) Regress C om Yy. (6) Test for a different intercept for families with # male or a female as head of the houschold. (¢) Test for a different slope or MPL (marginal propensity to consume) for lamiies with a make oF a female as head of the household.) Test for both different intercept and slope. (e) Which is the “best” result? 
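The mechanics of testing for a shift in both intercept and slope, in the form of Eq. (8.7), can be sketched in Python as follows. This is an illustration only: the data are simulated rather than taken from any table in this chapter, and the "true" coefficients and variable names are assumptions chosen for the example.

```python
import numpy as np

# Simulated data (assumptions): X is a quantitative regressor, D a 0/1 dummy.
rng = np.random.default_rng(2)
n = 40
x = np.linspace(1.0, 20.0, n)
d = (np.arange(n) % 2).astype(float)   # hypothetical dummy: 1 for every second observation
u = rng.normal(scale=0.2, size=n)

# "True" model in the form Y = b0 + b1*X + b2*D + b3*X*D + u.
b0, b1, b2, b3 = 10.0, 2.0, -4.0, 1.5
y = b0 + b1 * x + b2 * d + b3 * x * d + u

# OLS with a constant, X, the intercept dummy D, and the slope dummy X*D.
X = np.column_stack([np.ones(n), x, d, x * d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, b1_hat, b2_hat, b3_hat = coef

print("D = 0: intercept", round(b0_hat, 2), "slope", round(b1_hat, 2))
print("D = 1: intercept", round(b0_hat + b2_hat, 2), "slope", round(b1_hat + b3_hat, 2))
```

For the classification with $D = 1$, the intercept is the sum of the constant and the coefficient of $D$, and the slope is the sum of the coefficients of $X$ and $XD$. Judging whether the shifts are statistically significant would additionally require the standard errors of the coefficients of $D$ and $XD$, which this sketch does not compute.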
‘Table 3.12 Consumption, Disposable Income, and Sex of Head of Howsehald of 12 Ramon Families y J 9 7 3) F 0) of © {1s.s35] 11,350] 12.190]15210] sesa [16760] 13,490] 9680] 17 840] 11,190 | 14.320 Toto 1089 15.040 | ram] vasa [29089] 10,40 | 0, Fon] 22 430 s[ululefl[mu]eluls[rleM[eler lo (a Cm 168.6040.75F, = 0.978 279 (12 (by Lettiog D = 1 for Families beaded by a female and B= 0 otherwise, we get Is6.02 +0827) +85209 6s) Ca 8 198s © FOOTE +0707, +0087. au) este <184.704 0.830) 4 1757.990 = 0067, R= ORS a6 LO) O57) w (e) Since aeither D nor V0 is statistically significant al the $66 level in parts b, ©, and d, there is no dlillewence inthe consumption patieens of households beaded by males or fernales. Thus the best results are those given in part a Table $.13 gives the retail sales (in billions of 1996 doflars) of the United States from the first quarter of 1995 to the fourth quarter of 1999. (a) Preparea table showing sales, a time trend, and dummy variables to take inte account seasonal effects. (6) Using the data from the tabs in part ar un a regression of eas on inventories and the seasonal dummies and interpret the resus 12 FURTHER TECHNIQUES AND APPLICATIONS IN REGRDSSION ANALYSIS ‘Talle 8.13 Retail Sas inthe United States (in Billions of 1996 8) lemar. & some [S08 | 8s | 6 | ots | sor ao | ore | sero | ctor quarter [1 «| mfowfo u | m|wfo mn yer 1998 1980 9 wae Lo? | ie [re [tsa [oes [oma] aos [nies [tu | m9 qurer| om | ow ft u |mf[wift « )m fw var 199 188 Be Sowee: St Louis Federal Reserve (US. 
Department of Commerce, Census Bureau)

(a) Taking the first quarter as the base, and letting $D_1 = 1$ for the second quarter and 0 otherwise, $D_2 = 1$ for the third quarter and 0 otherwise, and $D_3 = 1$ for the fourth quarter and 0 otherwise, we get Table 8.14.

Table 8.14 Sales, Time Trend, and Seasonal Dummies
[Columns: Year, Quarter, Sales, Time Trend (T = 1, 2, ..., 20), $D_1$, $D_2$, $D_3$. The sales values are illegible in the source.]

(b) Using the data from Table 8.14 to regress sales S on the trend T, $D_1$, $D_2$, and $D_3$, we get

$\hat{S} = 526.56 + 6.66T + 61.52D_1 + 53.01D_2 + 96.15D_3 \qquad R^2 = 0.98$

Since all dummy variables are statistically significant at the 5% level, we obtain

$\hat{S} = 526.56 + 6.66T$ in quarter I
$\hat{S} = 588.08 + 6.66T$ in quarter II
$\hat{S} = 579.57 + 6.66T$ in quarter III
$\hat{S} = 622.71 + 6.66T$ in quarter IV

These results remain unchanged when four dummies are used, one for each of the four seasons, but the constant from the regression equation is dropped. Using the four seasonal dummies and the constant together would make it impossible to estimate the OLS regression, because the four dummies sum to the constant (perfect multicollinearity; see Sec. 9.1).

DISTRIBUTED LAG MODELS

8.10 (a) What is meant by a distributed lag model? (b) Write the equation for a general distributed lag model with an infinite number of lags and for one with k lags. (c) What practical difficulties arise in estimating a distributed lag model with k lags?

(a) Often the effect of a policy variable may be distributed over a series of time periods (i.e., the dependent variable may be "sluggish" to respond to a policy change), requiring a series of lagged explanatory variables to account for the full adjustment process through time.
A distributed lag model is one in which the current value of the dependent variable $Y_t$ depends on the weighted sum of present and past values of the independent variables ($X_t$, $X_{t-1}$, $X_{t-2}$, etc.) and the error term, with generally different weights assigned to various time periods (usually declining successively for earlier time periods).

(b) $Y_t = a + b_0X_t + b_1X_{t-1} + b_2X_{t-2} + \cdots + u_t \qquad (8.8)$

$Y_t = a + b_0X_t + b_1X_{t-1} + b_2X_{t-2} + \cdots + b_kX_{t-k} + u_t \qquad (8.8a)$

Note that in Eqs. (8.8) and (8.8a), a is the constant, while $b_0$ is the coefficient of $X_t$. This has been done in order to simplify the algebraic manipulation in Prob. 8.11(a).

(c) In the estimation of a distributed lag model, the inclusion of each lagged term uses up one degree of freedom. When the number of independent lagged terms is small, the model can be estimated with OLS, as done in Chap. 7. However, with k large (in relation to the length of the time series), an inadequate number of degrees of freedom may be left to estimate the model or to be confident in the estimated parameters. Moreover, the lagged explanatory variables in a distributed lag model are likely to be strongly correlated, so it may be difficult to adequately separate their independent effects on the dependent variable.

8.11 (a) Derive the Koyck distributed lag model. (b) What problems arise in the estimation of this model? (Hint for part a: Start with the general distributed lag model and assume that the weights decline geometrically, with $\lambda$ referring to a constant larger than 0 and smaller than 1; then lag the relationship by one period, multiply by $\lambda$, and subtract it from the original relationship.)

(a) Starting with Eq. (8.8), it is assumed that all the usual assumptions of OLS are satisfied (see Prob. 7.1):

$Y_t = a + b_0X_t + b_1X_{t-1} + b_2X_{t-2} + \cdots + u_t \qquad (8.8)$

Geometrically declining weights and $0 < \lambda < 1$ give

$b_i = \lambda^i b_0 \qquad i = 1, 2, \ldots \qquad (8.9)$

Substituting Eq. (8.9) into Eq. (8.8), we obtain

$Y_t = a + b_0X_t + \lambda b_0X_{t-1} + \lambda^2 b_0X_{t-2} + \cdots + u_t$

Lagging by one period, we have

$Y_{t-1} = a + b_0X_{t-1} + \lambda b_0X_{t-2} + \lambda^2 b_0X_{t-3} + \cdots + u_{t-1}$

Multiplying by $\lambda$ yields

$\lambda Y_{t-1} = \lambda a + \lambda b_0X_{t-1} + \lambda^2 b_0X_{t-2} + \lambda^3 b_0X_{t-3} + \cdots + \lambda u_{t-1}$

and subtracting from Eq.
(8.8), we get

$Y_t - \lambda Y_{t-1} = a - \lambda a + b_0X_t + u_t - \lambda u_{t-1}$

$Y_t = a(1 - \lambda) + b_0X_t + \lambda Y_{t-1} + v_t \qquad (8.10)$

where $v_t = u_t - \lambda u_{t-1}$. Note that Eq. (8.10) contains only two regressors, $X_t$ and a single lagged term, $Y_{t-1}$.

(b) Two serious problems arise in the estimation of a Koyck distributed lag model. First, if $u_t$ in Eq. (8.8) satisfies all the OLS assumptions (see Prob. 6.4), then $v_t = u_t - \lambda u_{t-1}$ in Eq. (8.10) does not. Specifically, $E(v_tv_{t-1}) \neq 0$ because $v_t$ and $v_{t-1}$ are both defined with $u_{t-1}$ in common (i.e., $v_t = u_t - \lambda u_{t-1}$ and $v_{t-1} = u_{t-1} - \lambda u_{t-2}$). In addition, $E(v_tY_{t-1}) \neq 0$. Violations of these OLS assumptions result in biased and inconsistent estimators for the Koyck lag model [Eq. (8.10)], requiring elaborate correction procedures (some of which are discussed in Sec. 9.3). The second serious problem is that the Koyck model rigidly assumes geometrically declining weights. This may seldom be the case in the real world, thus requiring a more flexible lag scheme (see Prob. 8.13).

8.12 Table 8.15 gives the level of inventories Y and sales X (in billions of dollars) in U.S. manufacturing from 1981 to 1999. (a) Fit the Koyck model to the data in Table 8.15. (b) What are the values of $\lambda$ and a?

Table 8.15 Inventories and Sales in U.S. Manufacturing, 1981-1999 (in Billions of Dollars)
[Rows: Year, Y, X. The values are illegible in the source.]
Source: St. Louis Federal Reserve (U.S. Department of Commerce, Census Bureau)

(a) $\hat{Y}_t = \hat{a}(1 - \hat{\lambda}) + 0.60X_t + 0.50Y_{t-1}$ [the estimated intercept and the t statistics are illegible in the source]

(b) $\hat{\lambda} = 0.50$, and $\hat{a}$ equals the estimated intercept divided by $1 - \hat{\lambda} = 0.50$ [the value is illegible in the source].

8.13 (a) What is the lag structure in the Almon lag model? (b) What are the advantages and disadvantages of the Almon lag model with respect to the Koyck model?
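Before turning to the Almon model, the mechanics of Probs. 8.11 and 8.12 can be sketched as follows. This is a minimal illustration, not the book's computation: the sales series and the parameter values (`a_true`, `b0_true`, `lam_true`) are hypothetical, the series for Y is built from Eq. (8.10) with no error term, and `ols` is a helper written for this sketch. With real data, the error term $v_t$ makes OLS on Eq. (8.10) biased and inconsistent, as noted in Prob. 8.11(b).

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical structural parameters (chosen for illustration only)
a_true, b0_true, lam_true = 2.0, 0.6, 0.5

# Hypothetical sales series X_t (NOT the data of Table 8.15)
sales = [1.00, 1.12, 1.19, 1.35, 1.41, 1.60, 1.58, 1.71, 1.90, 2.04, 1.99, 2.20]

# Build Y_t from Eq. (8.10) without an error term; the first value is arbitrary
inventories = [a_true + b0_true * sales[0] / (1 - lam_true)]
for x in sales[1:]:
    inventories.append(a_true * (1 - lam_true) + b0_true * x
                       + lam_true * inventories[-1])

# Regress Y_t on a constant, X_t, and Y_{t-1}; one observation is lost to the lag
rows = [[1.0, sales[t], inventories[t - 1]] for t in range(1, len(sales))]
c0, c1, c2 = ols(rows, inventories[1:])

lam_hat = c2                   # coefficient of Y_{t-1}
a_hat = c0 / (1 - lam_hat)     # the intercept of Eq. (8.10) is a(1 - lambda)
weights = [c1 * lam_hat ** i for i in range(4)]   # implied b_i = lambda^i * b_0

print("lambda = %.2f, a = %.2f" % (lam_hat, a_hat))
print("implied lag weights:", [round(w, 3) for w in weights])
```

The last line makes the rigidity of the Koyck scheme visible: once $\hat{b}_0$ and $\hat{\lambda}$ are known, every lag weight is fixed by Eq. (8.9).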
(a) While the Koyck lag model assumes geometrically declining weights, the Almon lag model allows for any lag structure, to be approximated empirically by a polynomial of degree at least one more than the number of turning points in the function. For example, a lag structure of the form of an inverted U (i.e., with $b_1 > b_0$) can be approximated by a polynomial of at least the second degree. This may arise, as in the case of an investment function, when because of delays in recognition and in making decisions, the level of investment in the current period is more responsive to demand conditions in a few earlier periods than in the current period.

(b) The Almon lag model has at least two important advantages with respect to the Koyck lag model. First (and as pointed out earlier), the Almon model has a flexible lag structure as opposed to the rigid lag structure of the Koyck model. Second, since the Almon lag model does not replace the lagged independent variables (the X's) with the lagged dependent variable, it does not violate any of the OLS assumptions (as does the Koyck model). One disadvantage of the Almon model is that the number of coefficients to be estimated is not reduced by as much as in the Koyck model. Another disadvantage is that in actual empirical work, neither the period nor the form of the lag may be suggested by theory or be known a priori.

8.14 Derive the Almon transformation for (a) a three-period lag taking the form of a second-degree polynomial and (b) a four-period lag taking the form of a third-degree polynomial.

(a) Starting with Eqs. (8.11) and (8.12)

$Y_t = a + b_0X_t + b_1X_{t-1} + b_2X_{t-2} + b_3X_{t-3} + u_t \qquad (8.11)$

$b_i = c_0 + c_1i + c_2i^2 \qquad \text{with } i = 0, 1, 2, 3 \qquad (8.12)$

and substituting Eq. (8.12) into Eq. (8.11),
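we get

$Y_t = a + c_0X_t + (c_0 + c_1 + c_2)X_{t-1} + (c_0 + 2c_1 + 4c_2)X_{t-2} + (c_0 + 3c_1 + 9c_2)X_{t-3} + u_t$

Rearranging the terms in the last expression:

$Y_t = a + c_0\left(\sum_{i=0}^{3} X_{t-i}\right) + c_1\left(\sum_{i=1}^{3} iX_{t-i}\right) + c_2\left(\sum_{i=1}^{3} i^2X_{t-i}\right) + u_t$

Letting $Z_{1t} = \sum_{i=0}^{3} X_{t-i}$, $Z_{2t} = \sum_{i=1}^{3} iX_{t-i}$, and $Z_{3t} = \sum_{i=1}^{3} i^2X_{t-i}$, we get

$Y_t = a + c_0Z_{1t} + c_1Z_{2t} + c_2Z_{3t} + u_t \qquad (8.13)$

(b) With a four-period lag taking the form of a third-degree polynomial, we have

$Y_t = a + b_0X_t + b_1X_{t-1} + b_2X_{t-2} + b_3X_{t-3} + b_4X_{t-4} + u_t$

and

$b_i = c_0 + c_1i + c_2i^2 + c_3i^3 \qquad \text{with } i = 0, 1, 2, 3, 4$

Substituting the second into the first, we get

$Y_t = a + c_0X_t + (c_0 + c_1 + c_2 + c_3)X_{t-1} + (c_0 + 2c_1 + 4c_2 + 8c_3)X_{t-2} + (c_0 + 3c_1 + 9c_2 + 27c_3)X_{t-3} + (c_0 + 4c_1 + 16c_2 + 64c_3)X_{t-4} + u_t$

$Y_t = a + c_0\left(\sum_{i=0}^{4} X_{t-i}\right) + c_1\left(\sum_{i=1}^{4} iX_{t-i}\right) + c_2\left(\sum_{i=1}^{4} i^2X_{t-i}\right) + c_3\left(\sum_{i=1}^{4} i^3X_{t-i}\right) + u_t$

and letting the terms in parentheses equal, respectively, $Z_{1t}$, $Z_{2t}$, $Z_{3t}$, and $Z_{4t}$, we get

$Y_t = a + c_0Z_{1t} + c_1Z_{2t} + c_2Z_{3t} + c_3Z_{4t} + u_t$

8.15 Using the data from Table 8.15 and assuming a three-period lag taking the form of a second-degree polynomial, (a) prepare a table with the original variables and the calculated Z values to be used to estimate the Almon lag model. (b) Regress the level of inventories, Y, on the Z values in the table in part a, i.e., estimate regression Eq. (8.13). (c) Find the $\hat{b}$ values and write out estimated Eq. (8.11).

(a) The Z values given in Table 8.16 are calculated as follows:

$Z_{1t} = X_t + X_{t-1} + X_{t-2} + X_{t-3}$
$Z_{2t} = X_{t-1} + 2X_{t-2} + 3X_{t-3}$
$Z_{3t} = X_{t-1} + 4X_{t-2} + 9X_{t-3}$

Table 8.16 Inventories, Sales, and Z Values in U.S. Manufacturing, 1981-1999 (in Billions of Dollars)
[Columns: Year, Y, X, $Z_1$, $Z_2$, $Z_3$. The values are illegible in the source.]

(b) Regressing Y on the Z's, we get

$\hat{Y}_t = 171.80 + 0.44Z_{1t} + 0.27Z_{2t} - 0.15Z_{3t}$

(c) $\hat{a} = 171.80$
$\hat{b}_0 = \hat{c}_0 = 0.44$
$\hat{b}_1 = (\hat{c}_0 + \hat{c}_1 + \hat{c}_2) = (0.44 + 0.27 - 0.15) = 0.56$
$\hat{b}_2 = (\hat{c}_0 + 2\hat{c}_1 + 4\hat{c}_2) = (0.44 + 0.54 - 0.60) = 0.38$
$\hat{b}_3 = (\hat{c}_0 + 3\hat{c}_1 + 9\hat{c}_2) = (0.44 + 0.81 - 1.35) = -0.10$

so that

$\hat{Y}_t = 171.80 + 0.44X_t + 0.56X_{t-1} + 0.38X_{t-2} - 0.10X_{t-3}$

where the standard errors of the lagged values of X have been found by

$\sqrt{\operatorname{var}(\hat{b}_i)} = \sqrt{\operatorname{var}(\hat{c}_0 + \hat{c}_1i + \hat{c}_2i^2)} = \sqrt{\operatorname{var}\hat{c}_0 + i^2\operatorname{var}\hat{c}_1 + i^4\operatorname{var}\hat{c}_2 + 2i\operatorname{cov}(\hat{c}_0,\hat{c}_1) + 2i^2\operatorname{cov}(\hat{c}_0,\hat{c}_2) + 2i^3\operatorname{cov}(\hat{c}_1,\hat{c}_2)}$

FORECASTING

8.16 (a) What is meant by forecasting? Conditional forecast? Prediction? (b) What are the possible sources of errors in forecasting? (c) What is the forecast-error variance? What is an unbiased estimate of the forecast-error variance? What do they depend on?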
(d) How is the value of $\hat{Y}_F$ found? The 95% confidence interval of the forecast, $Y_F$?

(a) Forecasting refers to the estimation of the value of a dependent variable, given the actual or projected value of the independent variable(s). When the forecast is based on an estimated or projected (rather than on an actual) value of the independent variable, we have a conditional forecast. Prediction is often used interchangeably with forecasting. At other times, prediction refers to estimating an intrasample value of the dependent variable. Forecasting, then, refers to estimating a future value of the dependent variable.

(b) Forecasting errors arise because of (1) the random nature of the error term, (2) the fact that estimated unbiased parameters equal the true parameters only on the average, (3) errors in projecting the independent variables, and (4) incorrect model specification.

(c) The forecast-error variance $\sigma_F^2$ is given by

$\sigma_F^2 = \sigma_u^2\left[1 + \frac{1}{n} + \frac{(X_F - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

where n is the number of observations and $\sigma_u^2$ is the variance of u. An unbiased estimate of the forecast-error variance $s_F^2$ is given by

$s_F^2 = s^2\left[1 + \frac{1}{n} + \frac{(X_F - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right]$

where $s^2$ is an unbiased estimate of $\sigma_u^2$ given by

$s^2 = \frac{\sum e_i^2}{n - 2}$

The larger is n, the smaller is $\sigma_F^2$ (or $s_F^2$); the larger are $\sigma_u^2$ (or $s^2$) and the difference between $X_F$ and $\bar{X}$, the larger is $\sigma_F^2$ (or $s_F^2$).

(d) The value of $\hat{Y}_F$ is found by substituting the actual or projected value of $X_F$ into the estimated regression equation:

$\hat{Y}_F = \hat{b}_0 + \hat{b}_1X_F$

The 95% confidence interval of the forecast $Y_F$ is given by

$\hat{Y}_F \pm t_{0.025}s_F$

where t refers to the t distribution with n - 2 degrees of freedom.

8.17 Find the 95% confidence interval of the forecast for Y in Prob. 6.30 for (a) X = 15% and (b) X = 11.5%.

(a) In Prob. 6.30, we found that $\hat{Y}_i = 59.13 - 2.60X_i$, n = 15, $\bar{X} = 11$, $\sum x_i^2 = 442$, and $s^2 = \sum e_i^2/(n - 2) \cong 220.99$. For $X_F = 15\%$, we obtain

$s_F^2 = 220.99\left[1 + \frac{1}{15} + \frac{(15 - 11)^2}{442}\right] \cong 243.72 \qquad \text{and} \qquad s_F \cong 15.61$

$\hat{Y}_F = 59.13 - 2.60(15) = 20.13$

Then the 95% confidence interval for $Y_F$ is 20.13 ± (2.18)(15.61), or between -13.90 and 54.16, where 2.18 = $t_{0.025}$ with df = 13.

(b) For $X_F = 11.5\%$,
$s_F^2 = 220.99\left[1 + \frac{1}{15} + \frac{(11.5 - 11)^2}{442}\right] \cong 235.85 \qquad \text{and} \qquad s_F \cong 15.36$

$\hat{Y}_F = 59.13 - 2.60(11.5) = 29.23$

Then the 95% confidence interval for $Y_F$ is 29.23 ± (2.18)(15.36), or between -4.25 and 62.71. Note that the range of the 95% confidence interval for $Y_F$ is smaller here than in part a because the difference between the projected value of X and $\bar{X}$ is smaller here.

8.18 Draw a graph showing a hypothetical positively sloped estimated OLS regression line, the 95% confidence interval for $Y_F$ for a given $X_F$, and the 95% confidence interval bands for $Y_F$.

See Fig. 8-7. Note that the 95% confidence bands are closest at $X_F = \bar{X}$.

8.19 Find the 95% confidence interval of $Y_F$ for $X_{1F} = 35$ and $X_{2F} = 25$ in 1981, given that $\hat{Y}_i = 31.98 + 0.65X_{1i} + 1.11X_{2i}$, $\bar{X}_1 = 18$, $\bar{X}_2 = 12$ (from Example 7.1), $s^2 = \sum e_i^2/(n - k) = 13.67/7 \cong 1.95$, $s^2_{\hat{b}_1} \cong 0.06$, $s^2_{\hat{b}_2} \cong 0.07$ (from Example 7.2), $s^2_{\hat{b}_0} \cong 2.66$, and $\operatorname{cov}(\hat{b}_1, \hat{b}_2) \cong -0.07$ (from the computer).

Computing the forecast-error variance from these values [the intermediate expression is illegible in the source], we get $s_F^2 \cong 18.31$ and $s_F \cong 4.28$.

$\hat{Y}_F = 31.98 + 0.65(35) + 1.11(25) = 82.48$

The 95% confidence interval for $Y_F$ in 1981 is then 82.48 ± (2.37)(4.28), or between 72.34 and 92.62.

BINARY CHOICE MODELS

8.20 (a) Derive the log-likelihood function for the probit model. (b) Give two alternative representations of the log-likelihood function. (c) How would the log-likelihood function differ for the logit model?

(a) Since this is a probit model, we know that $u_i$ is normally distributed in the model of the underlying propensity of Y:

$Y_i^* = b_0 + b_1X_i + u_i$

but $Y_i = 1$ if $Y_i^* \geq 0$ and $Y_i = 0$ if $Y_i^* < 0$. If we see an observed value of Y = 1, we know that $Y_i^* \geq 0$ or, alternatively, $u_i \geq -b_0 - b_1X_i$. The probability of $u_i$ being in this range is $1 - \Phi(-b_0 - b_1X_i)$, where $\Phi(\cdot)$ is the cumulative probability for the normal distribution.
Since the normal distribution is symmetrical, we can also write this as

$P(Y_i = 1) = \Phi(b_0 + b_1X_i)$

Similarly, the probability of observing Y = 0 for a single observation is

$P(Y_i = 0) = 1 - \Phi(b_0 + b_1X_i)$

[...] therefore significant at the 5% level. The coefficients are proportionally higher in absolute value than in the probit model, but the marginal effects and significance should be similar.

INTERPRETATION OF BINARY CHOICE MODELS

8.24 Explain the difference between the following pairs of terms in the context of binary choice models: (a) coefficient and marginal effect, (b) $R^2$ and likelihood ratio index, (c) predicted Y and observed Y.

(a) The coefficient in a binary choice model gives only the relationship between X and $Y^*$, the unobservable propensity of Y. Therefore, the coefficient has an ambiguous interpretation and cannot be compared across different models, or between probit and logit. The marginal effect is the effect of X on the probability of observing a success for Y. Since Y is observable, the interpretation of the marginal effect is clearer, and the marginal effect should be robust across models.

(b) $R^2$ is the ratio of explained sum of squares to total sum of squares in a regression, which cannot be defined in a model with an unobservable dependent variable. The likelihood ratio index uses the ratio of log-likelihood values to achieve a similar measure, but its interpretation is not as straightforward. It is bounded by 0 and 1, but achieves 1 only in the limit, and rarely takes on large values.

(c) Predicted Y values are successes of Y that are predicted by the binary choice model, usually by having a probability of Y = 1 greater than 50%. Observed Y values are the successes and failures of Y from the data set.

8.25 Find the marginal effects, LRI, and predicted values for Y for the logit model in Prob. 8.23. How do the results compare with Example 8.6?
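One way to carry out calculations of this kind can be sketched as follows. The sketch rests on several assumptions that are not in the text: the ten observations are hypothetical (they are not the data of Prob. 8.23), the logit is fitted by a hand-written Newton-Raphson loop rather than by a statistical package, the marginal effect is evaluated at the mean of X, and the LRI is computed as $1 - \ln L/\ln L_0$, with $\ln L_0$ the log-likelihood of a model containing only a constant.

```python
import math

# Hypothetical data (NOT the data of Prob. 8.23): X could be GDP per capita
# in thousands of dollars, Y = 1 if the country is classified as open
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
n = len(y)

def prob(b0, b1, xi):
    """Logit probability P(Y = 1) for a single observation."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))

def log_likelihood(b0, b1):
    """ln L = sum of Y ln P + (1 - Y) ln(1 - P) over all observations."""
    return sum(yi * math.log(prob(b0, b1, xi))
               + (1 - yi) * math.log(1.0 - prob(b0, b1, xi))
               for xi, yi in zip(x, y))

# Maximize ln L by Newton-Raphson, starting from b0 = b1 = 0
b0, b1 = 0.0, 0.0
for _ in range(25):
    p = [prob(b0, b1, xi) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))                 # score for b0
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))   # score for b1
    w = [pi * (1.0 - pi) for pi in p]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    # Newton step: solve the 2x2 system (-Hessian) * delta = score
    b0, b1 = b0 + (s2 * g0 - s1 * g1) / det, b1 + (s0 * g1 - s1 * g0) / det

# Marginal effect of X on P(Y = 1), evaluated at the mean of X
x_bar = sum(x) / n
p_bar = prob(b0, b1, x_bar)
marginal_effect = b1 * p_bar * (1.0 - p_bar)

# Likelihood ratio index
n1 = sum(y)
ln_l = log_likelihood(b0, b1)
ln_l0 = n1 * math.log(n1 / n) + (n - n1) * math.log((n - n1) / n)
lri = 1.0 - ln_l / ln_l0

# Predicted Y = 1 when the fitted probability exceeds 0.5
predicted = [1 if prob(b0, b1, xi) > 0.5 else 0 for xi in x]
correct = sum(1 for yh, yi in zip(predicted, y) if yh == yi)

print("b0 = %.3f, b1 = %.3f" % (b0, b1))
print("marginal effect at mean X = %.3f, LRI = %.3f" % (marginal_effect, lri))
print("correct predictions: %d of %d" % (correct, n))
```

For a probit model the same steps apply with the normal cumulative distribution in place of the logistic function, which is why the marginal effects and predictions of the two models are usually close even though the coefficients differ.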
The marginal effect of GDP/cap on the probability of a country being open to trade is [the expression and its value are illegible in the source]. This can also be interpreted as the marginal effect of GDP/cap on the expected value of Y. [The LRI calculation is illegible in the source.] Predicted probabilities are listed in Table 8.18.

Table 8.17 [The title and values are illegible in the source.]

The model predicts 18 out of 24 countries correctly, or 75%. The marginal effect and predictions are virtually identical to the probit model, giving an indication of why the logit model was used almost exclusively before computers were readily available.

Table 8.18 Predicted Probabilities for Logit Model
[Rows: Country, predicted probability. The values are illegible in the source.]

Supplementary Problems

FUNCTIONAL FORM

8.26 Transform the following nonlinear functions into linear functions: (a) $Y = b_0e^{b_1X}e^u$, (b) $Y = b_0 + b_1\ln X + u$, (c) $Y = b_0 - b_1/X + u$, and (d) $Y = b_0 + b_1X - b_2X^2 + u$.
Ans. (a) $\ln Y = \ln b_0 + b_1X + u$ (b) $Y = b_0 + b_1R + u$, where $R = \ln X$ (c) $Y = b_0 - b_1Z + u$, where $Z = 1/X$ (d) $Y = b_0 + b_1X - b_2W + u$, where $W = X^2$

8.27 Fit a double-log function to the data in Table 6.12.
Ans. $\ln Y = 2.64 + 0.72\ln X \qquad R^2 = 83.26\%$

8.28 Fit a semilog function of the form
$Y = b_0 + b_1\ln X + u$ to the data in Table 6.12.
Ans. [The estimated equation is illegible in the source.]

8.29 (a) Fit a polynomial function of the form $Y = b_0 + b_1X - b_2X^2 + u$ to the data in Table 6.12. (b) Which gives a better fit for the data in Table 6.12: the linear form of Probs. 6.34, 6.37, 6.39, and 6.40; the semilog form of Prob. 8.28; or the polynomial form of part a?
Ans. (a) [The estimated equation is illegible in the source.] (b) The fit with the semilog function is better than the fit with the linear and polynomial forms.

DUMMY VARIABLES

8.30 For the data in Table 8.2, (a) run regression Eq. (8.6). (b) Is the slope coefficient significantly different in wartime than in peacetime? (c) What is the slope coefficient in peacetime? In wartime?
Ans. (a) $\hat{Y} = \hat{b}_0 + 0.17X - 0.11XD \qquad R^2 = 0.95$ [the intercept is illegible in the source] (b) Yes (c) $\hat{b}_1 = 0.17$ in peacetime and $\hat{b}_1 = 0.17 - 0.11 = 0.06$ in wartime

8.31 For the data in Table 8.2, (a) run regression Eq. (8.7). (b) Is the intercept significantly different in wartime than in peacetime? (c) Is the slope coefficient significantly different in wartime than in peacetime?
Ans. (a) [The estimated equation is illegible in the source; $R^2 = 0.95$.] (b) No (c) No

8.32 Table 8.19 gives the aggregate reserves of U.S. depository institutions R from the first quarter of 1995 to the third quarter of 2000. (a) Test for a linear trend in reserves and for seasonal effects. (b) What is the value of the intercept for each season (use the 10% significance level)?
Ans.
(a) Assigning a trend value T that equals 1, 2, 3, ..., 23 consecutively to each quarter and letting $D_1 = 1$ for the second quarter and 0 otherwise, $D_2 = 1$ for the third quarter and 0 otherwise, and $D_3 = 1$ for the fourth quarter and 0 otherwise, we get

$\hat{R} = \hat{b}_0 + \hat{b}_1T + \hat{b}_2D_1 + \hat{b}_3D_2 + \hat{b}_4D_3$ [the estimates are illegible in the source]

Table 8.19 Aggregate Reserves of U.S. Depository Institutions (in Millions of Dollars)
[Rows: Year, Quarter, R. The values are illegible in the source.]
Source: Federal Reserve Board of Governors

(b) Since only $D_3$ is statistically significant at the 10% level, the intercept is the same in quarters I, II, and III and differs only in quarter IV [the values are illegible in the source].

8.33 Table 8.20 gives the per capita disposable income Y in thousands of dollars and the percentage of college graduates in the population 25 years of age or older X for the eastern United States in 1998. (a) Run a regression of Y on X and on dummies to take regional effects into account. (b) What is the value of the intercept for each region (use the 10% significance level)?
Ans. (a) Taking South Atlantic as the base, $D_1 = 1$ for New England states and 0 otherwise, and $D_2 = 1$ for Mid-Atlantic states and 0 otherwise, we get

$\hat{Y} = \hat{b}_0 + 0.56X + 0.88D_1 + 2.83D_2 \qquad R^2 = 0.86$ [the intercept is illegible in the source]

Table 8.20 Disposable Income and Percent of College Graduates in the East in 1998
[Rows: Disposable income; Percent with college degree; State; Region (New England, Mid-Atlantic, South Atlantic). The values are illegible in the source.]
Source: Statistical Abstract of the United States

(b) The intercept is the same for New England and South Atlantic states, while it is higher by 2.83 for Mid-Atlantic states [the values are illegible in the source].

DISTRIBUTED LAG MODELS

8.34 What are the problems in estimating (a) Equation (8.8)? (b) Equation (8.10)? (c) Equation (8.13)?
Ans. (a)
One observation is lost for each lagged value of X, and the X's are likely to be related to each other. (b) The rigid, geometrically declining lag structure and the violation of two assumptions of OLS, leading to biased and inconsistent estimators. (c) The number of coefficients to be estimated is not reduced as much as in Eq. (8.10), and the period and the form of the lag may not be known.

8.35 Table 8.21 gives the business expenditures for new plant and equipment of public utilities Y and the gross national product X, both in billions of dollars, for the United States from 1960 to 1979. (a) Estimate Eq. (8.10). (b) What are the values of $\lambda$ and a?
Ans. [The estimated equation and parameter values are illegible in the source.]

Table 8.21 Business Expenditures for New Plant and Equipment of Public Utilities and the Gross National Product: United States, 1960-1979 (in Billions of Dollars)
[Rows: Year, Y, X. The values are illegible in the source.]
Source: Economic Report of the President, U.S. Government Printing Office, Washington, DC, 1980, pp. 308, 348.

8.36 Table 8.22 gives the total personal consumption expenditures Y and the total disposable personal income X, both in billions of dollars, for the United States from 1960 to 1979. (a) Estimate the Almon lag model assuming a three-period lag taking the form of a second-degree polynomial. (b) Does this model fit the data well?

Table 8.22 Consumption and Disposable Income (in Billions of Dollars): United States, 1960-1979
[Rows: Year, Y, X. The values are illegible in the source.]
Source: Economic Report of the President, U.S. Government Printing Office, Washington, DC, 1980, p. 229.

Ans. (a) [The estimated equation is illegible in the source.] (b) Since only the coefficient of $X_t$ (i.e., $\hat{b}_0$) is statistically significant at the 5% level and its value exceeds 1, this model does not fit the data well. The Koyck model or another form of the Almon model might be more appropriate.

FORECASTING

8.37 For X = 4 in Prob. 6.44, find (a) $s_F^2$, (b) $\hat{Y}_F$, and (c) the 95% confidence interval for $Y_F$.
Ans. [The values are illegible in the source.]

8.38 For Prob. 7.29 and $X_{1F} = 2$ and $X_{2F} = 1250$ for 2000, find (a) $s_F^2$ and (b) the 95% confidence interval for $Y_F$ [the given values are illegible in the source].
Ans. (a) $s_F^2 \cong 468.61$ (b) 97.05 ± (2.18)(21.65), or between 49.85 and 144.25

BINARY CHOICE MODELS

8.39 Calculate the log-likelihood values for the logit model in Prob. 8.23 for three sets of parameter values [illegible in the source].
Ans. (a) ln L = -36.59 (b) ln L = -9.70 (c) ln L = -6.91

8.40 Calculate the log-likelihood values for the logit model in Prob. 8.23 for $b_1 = 0.0018$ and three values of $b_0$ [illegible in the source].
Ans. (a) ln L = -6.80 (b) [illegible in the source] (c) ln L = -6.82

INTERPRETATION OF BINARY CHOICE MODELS

8.41 Should coefficients be the same between probit and logit models?
Ans. No, logit coefficients should be proportionally greater than probit coefficients.

8.42 Should marginal effects be the same between probit and logit models?
Ans. Yes, marginal effects should differ only slightly.

CHAPTER 9

Problems in Regression Analysis

9.1 MULTICOLLINEARITY

Multicollinearity refers to the case in which two or more explanatory variables in the regression model are highly correlated, making it difficult or impossible to isolate their individual effects on the dependent variable.
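A small numerical sketch may help fix the idea. It uses two hypothetical regressors (not data from this book) in which $X_2$ is almost exactly twice $X_1$, and it relies on a standard result that is assumed here rather than taken from the text: with two regressors, the variance of each OLS slope is proportional to $1/(1 - r_{12}^2)$, where $r_{12}$ is the simple correlation coefficient between the regressors.

```python
import math

# Hypothetical regressors: x2 is almost exactly twice x1
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

def correlation(u, v):
    """Simple (Pearson) correlation coefficient between two series."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

r12 = correlation(x1, x2)

# In deviation form, var(b1) = s^2 / [sum(x1^2) * (1 - r12^2)], so the variance
# of each slope is inflated by this factor relative to the case r12 = 0
inflation = 1.0 / (1.0 - r12 ** 2)

print("r12 = %.4f" % r12)
print("variance of each slope is inflated by a factor of about %.0f" % inflation)
```

The inflated variances are what produce the low t statistics (and occasionally wrong signs) on individual coefficients even when the regression as a whole fits well.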
With multicollinearity, the estimated OLS coefficients may be statistically insignificant (and even have the wrong sign) even though $R^2$ may be "high." Multicollinearity can sometimes be overcome or reduced by collecting more data, by utilizing a priori information, by transforming the functional relationship (see Prob. 9.3), or by dropping one of the highly collinear variables.

EXAMPLE 1. Table 9.1 gives the growth rate of imports Y, gross domestic product $X_1$, and inflation $X_2$ for the United States from 1985 to 1999 (the reason for using growth rates is explained in Chap. 11). It is expected that the level of imports will be greater as GDP and domestic prices increase. Regressing Y on $X_1$ and $X_2$, we get

$\hat{Y} = \hat{b}_0 + 1.39X_1 + 0.09X_2 \qquad R^2 = 0.42 \qquad r_{12} = 0.38$ [the intercept is illegible in the source; the legible t statistics are 1.46 and 1.85]

Table 9.1 Growth Rate of Imports, GDP, and Inflation in the United States from 1985 to 1999
[Rows: Year, Y, $X_1$, $X_2$. The values are illegible in the source.]

Neither $\hat{b}_1$ nor $\hat{b}_2$ is statistically significant at the 5% level, [...] significant at the 10% level, but the $R^2$ indicates that 42% of the variation in Y is explained by the model even though none of the independent variables stand out individually. There is positive correlation between $X_1$ and $X_2$, as indicated by $r_{12}$. Reestimating the regression without either $X_2$ or $X_1$, we get

$\hat{Y} = \hat{b}_0 + 2.06X_1 \qquad R^2 = 0.26$

and a simple regression of Y on $X_2$ alone [the estimates of both intercepts and of the second equation are illegible in the source].

In the simple regressions, $X_1$ is statistically significant at the 5% level and $X_2$ is significant at more than the 5% level, indicating that the original regression exhibited multicollinearity. However,
dropping either variable from the regression leads to biased OLS estimates, because economic theory suggests that both GDP and prices should be included in the import function.

9.2 HETEROSCEDASTICITY

If the OLS assumption that the variance of the error term is constant for all observations does not hold, we face the problem of heteroscedasticity. This leads to unbiased but inefficient (i.e., larger than minimum variance) estimates of the coefficients, as well as biased estimates of the standard errors (and, thus, incorrect statistical tests and confidence intervals).

One test for heteroscedasticity involves arranging the data from small to large values of the independent variable and running two regressions, one for small values of X and one for large values, omitting, say, one-fifth of the middle observations. Then, we test whether the ratio of the error sum of squares (ESS) of the second regression to that of the first regression is significantly greater than 1, using the F table with (n - d - 2k)/2 degrees of freedom in both the numerator and the denominator, where n is the total number of observations, d is the number of omitted observations, and k is the number of estimated parameters.

If the error variance is proportional to $X^2$ (often the case), heteroscedasticity can be overcome by dividing every term of the model by X and then reestimating the regression using the transformed variables.

EXAMPLE 2. Table 9.2 gives average wages Y and the number of workers employed X by 30 firms in an industry. Regressing Y on X for the entire sample gives $R^2 = 0.90$, with t statistics of 40.27 and 16.10 [the coefficient estimates are illegible in the source]. The results of regressing Y on X for the first 12 and for the last 12 observations are, respectively, $R^2 = 0.66$ with $\text{ESS}_1 = 0.507$ and $R^2 = 0.60$ with $\text{ESS}_2 = 3.095$ [the coefficient estimates are illegible in the source].

Table 9.2 Average Wages and Number of Workers Employed
[Columns: Average Wages, Workers Employed. The values are illegible in the source.]
Since $\text{ESS}_2/\text{ESS}_1 = 3.095/0.507 = 6.10$ exceeds $F_{10,10} = 2.97$ at the 5% level of significance (see App. 7), the hypothesis of heteroscedasticity is accepted. Reestimating the transformed model to correct for heteroscedasticity means regressing Y/X on 1/X:

$\frac{Y}{X} = b_0\left(\frac{1}{X}\right) + b_1 + \frac{u}{X}$ [the estimates are illegible in the source]

9.3 AUTOCORRELATION

When the error term in one time period is positively correlated with the error term in the previous time period, we face the problem of (positive first-order) autocorrelation. This is common in time-series analysis and leads to downward-biased standard errors (and, thus, to incorrect statistical tests and confidence intervals).

The presence of first-order autocorrelation is tested by utilizing the table of the Durbin-Watson statistic (App. 8) at the 5 or 1% levels of significance for n observations and k' explanatory variables. If the calculated value of d from Eq. (9.1) is smaller than the tabular value of $d_L$ (lower limit), the hypothesis of positive first-order autocorrelation is accepted:

$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2} \qquad (9.1)$

The hypothesis is rejected if $d > d_U$ (upper limit), and the test is inconclusive if $d_L < d < d_U$. (For negative autocorrelation, see Prob. 9.8.)

One way to correct for autocorrelation is to first estimate $\rho$ (Greek letter rho) from Eq. (9.2)

$Y_t = b_0(1 - \rho) + \rho Y_{t-1} + b_1X_t - b_1\rho X_{t-1} + v_t \qquad (9.2)$

and then reestimate the regression on the transformed variables:

$(Y_t - \hat{\rho}Y_{t-1}) = b_0(1 - \hat{\rho}) + b_1(X_t - \hat{\rho}X_{t-1}) + (u_t - \hat{\rho}u_{t-1}) \qquad (9.3)$

To avoid losing the first observation in the differencing process, $Y_1\sqrt{1 - \hat{\rho}^2}$ and $X_1\sqrt{1 - \hat{\rho}^2}$ are used for the first transformed observations of Y and X, respectively. When $\hat{\rho} \cong 1$, autocorrelation can be corrected by rerunning the regression in difference form and omitting the intercept term (see Prob. 9.12).

EXAMPLE 3. Table 9.3 gives the level of inventories Y and sales X, both in billions of dollars, in U.S. manufacturing from 1979 to 1998. Regressing Y on X, we get

$\hat{Y}_t = 126.06 + 1.03X_t \qquad R^2 = 0.98 \qquad d = 0.58$

Table 9.3 Inventory and Sales (Both in Billions of Dollars) in U.S.
Manufacturing, 1979-1998
[Rows: Year, Y, X. The values are illegible in the source.]
Source: Economic Report of the President

Since $d = 0.58 < d_L = 1.20$ at the 5% level of significance with n = 20 and k' = 1 (from App. 8), there is evidence of autocorrelation. An estimate of $\rho$ is given by the coefficient of $Y_{t-1}$ in the regression of $Y_t$ on $Y_{t-1}$, $X_t$, and $X_{t-1}$ [Eq. (9.2)], which gives $\hat{\rho} = 0.58$ [the remaining estimates are illegible in the source]. Using $\hat{\rho} = 0.58$ to transform the original variables (it is a coincidence here that $\hat{\rho} = d$), and using $Y_1\sqrt{1 - 0.58^2}$ and $X_1\sqrt{1 - 0.58^2} = 117.30$ for the first transformed observations of Y and X, respectively, we rerun the regression on the transformed variables (denoted by the asterisk) and get a regression of $Y_t^*$ on $X_t^*$ with d = 1.78 [the coefficient estimates are illegible in the source].

Since now $d = 1.78 > d_U = 1.41$ (from App. 8), there is no evidence of autocorrelation. Note that the t value remains highly significant but is lower than before.

9.4 ERRORS IN VARIABLES

Errors in variables refer to the case in which the variables in the regression model include measurement errors. Measurement errors in the dependent variable are incorporated into the disturbance term and do not create any special problem. However, errors in the explanatory variables lead to biased and inconsistent parameter estimates.

One method of obtaining consistent OLS parameter estimates is to replace the explanatory variable subject to measurement errors with another variable (called an instrumental variable) that is highly correlated with the original explanatory variable but is independent of the error term. This is often difficult to do and somewhat arbitrary. The simplest instrumental variable is usually the lagged explanatory variable in question (see Example 4). Another method used when only X is subject to measurement errors involves regressing X on Y (inverse least squares; see Prob.
9.15).

EXAMPLE 4. Table 9.4 gives inventories Y, actual sales X, and hypothetical values of X that include measurement error X', all in billions of dollars, in U.S. retail trade from 1979 to 1998. X and Y are assumed to be error-free. Regressing $Y_t$ on $X_t$ and then regressing $Y_t$ on $X'_t$ (as if $X_t$ were not available) gives two sets of estimates [illegible in the source; the second regression has $R^2 = 0.99$].

Table 9.4 Inventories and Sales (in Billions of Dollars) in U.S. Retail Trade, 1979-1998
[Rows: Year, Y, X, X'. The values are illegible in the source.]
Source: Economic Report of the President

Note that $\hat{b}'_1 < \hat{b}_1$; furthermore, $\hat{b}'_1$ falls outside the 95% confidence interval of $\hat{b}_1$ [the interval is partly illegible in the source]. Using $X'_{t-1}$ as an instrumental variable for $X'_t$ (if $X'_t$ is suspected to be correlated with $u_t$), we get a third set of estimates [illegible in the source]. The coefficient on $X'_{t-1}$ is closer to the true one (it falls in the 95% confidence interval of 1.42 to 1.57) and is consistent.

Solved Problems

MULTICOLLINEARITY

9.1 (a) What is meant by perfect multicollinearity? What is its effect? (b) What is meant by high, but not perfect, multicollinearity? What problems may result? (c) How can multicollinearity be detected? (d) What can be done to overcome or reduce the problems resulting from multicollinearity?

(a) Two or more independent variables are perfectly collinear if one or more of the variables can be expressed as a linear combination of the other variable(s). For example, there is perfect multicollinearity between $X_1$ and $X_2$ if $X_1 = 2X_2$ or $X_1 = 5 - (1/3)X_2$. If two or more explanatory variables are perfectly linearly correlated, it will be impossible to calculate OLS estimates of the parameters because the system of normal equations will contain two or more equations that are not independent.

(b) High, but not perfect, multicollinearity refers to the case in which two or more independent variables in the regression model are highly correlated.
This may make it difficult or impossible to isolate the effect that each of the highly collinear explanatory variables has on the dependent variable. However, the OLS estimated coefficients are still unbiased (if the model is properly specified). Furthermore, if the principal aim is prediction, multicollinearity is not a problem if the same multicollinearity pattern persists during the forecasted period.

(c) The classic case of multicollinearity occurs when none of the explanatory variables in the OLS regression is statistically significant (and some may even have the wrong sign), even though $R^2$ may be high (say, between 0.7 and 1.0). In the less clear-cut cases, detecting multicollinearity may be more difficult. High simple or partial correlation coefficients among explanatory variables are sometimes used as a measure of multicollinearity. However, serious multicollinearity can be present even if simple or partial correlation coefficients are relatively low (i.e., less than 0.5).

(d) Serious multicollinearity may sometimes be corrected by (1) extending the size of the sample data, (2) utilizing a priori information (e.g., we may know from a previous study that $b_2 = 0.25b_1$), (3) transforming the functional relationship, or (4) dropping one of the highly collinear variables (however, this may lead to specification bias or error if theory tells us that the dropped variable should be included in the model).

9.2 Table 9.5 gives the output in tons $Q$, the labor input in worker-hours $L$, and the capital input in machine-hours $K$, of 15 firms in an industry. (a) Fit a Cobb-Douglas production function of the form $Q = b_0 L^{b_1} K^{b_2} e^u$ to the data and find $R^2$ and the simple correlation coefficient between $\ln L$ and $\ln K$. (b) Regress $\ln Q$ on $\ln L$ only. (c) Regress $\ln Q$ on $\ln K$ only. (d) What can be concluded from the results with regard to multicollinearity?
Table 9.5 Output, Labor, and Capital Inputs of 15 Firms in an Industry

(a) Transforming the data into natural log form as shown in Table 9.6 and then regressing $\ln Q$ on $\ln L$ and $\ln K$, we get

$$\ln Q = 0.50 + 0.76\ln L + 0.19\ln K \qquad R^2 = 0.969 \qquad r_{\ln L,\,\ln K} = 0.992$$

with a $t$ value of 1.36 on $\ln K$.

(b) $\ln Q = -5.50 + 1.71\ln L$, with $R^2 = 0.964$ and a slope $t$ value of 18.69.

(c) Regressing $\ln Q$ on $\ln K$ only gives $R^2 = 0.966$ and a slope $t$ value of 19.19.

(d) Since neither $\hat{b}_1$ nor $\hat{b}_2$ in part a is statistically significant at the 5% level (i.e., they have unduly large standard errors) while $R^2 = 0.97$, there is a clear indication of serious multicollinearity. Specifically, large firms tend to use both more labor and more capital than do small firms. This is confirmed by the very high value of 0.99 for the simple correlation coefficient between $\ln L$ and $\ln K$. In parts b and c, simple regressions were reestimated with either $\ln L$ or $\ln K$ as the only explanatory variable. In these simple regressions, both $\ln L$ and $\ln K$ are statistically significant at much more than the 1% level with $R^2$ exceeding 0.96. However, dropping either $\ln L$ or $\ln K$ from the multiple regression leads to a biased OLS slope estimate for the retained variable because economic theory postulates that both labor and capital should be included in the production function.

Table 9.6 Output, Labor, and Capital Inputs in Original and Log Form

9.3 How can the multicollinearity difficulty faced in Prob.
9.2 be overcome if it is known that constant returns to scale (i.e., $b_1 + b_2 = 1$) prevail in this industry?

With constant returns to scale, the Cobb-Douglas production function can be rewritten as

$$Q = b_0 L^{b_1} K^{1-b_1} e^u$$

Expressing this production function in double-log form and rearranging it, we get

$$\ln Q = \ln b_0 + b_1 \ln L + (1 - b_1)\ln K + u$$
$$\ln Q - \ln K = \ln b_0 + b_1(\ln L - \ln K) + u$$

Setting $\ln Q^* = \ln Q - \ln K$ and $\ln L^* = \ln L - \ln K$ and then regressing $\ln Q^*$ on $\ln L^*$, we get

$$\ln Q^* = 0.07 + 0.83\ln L^* \qquad R^2 = 0.992$$

with a slope $t$ value of 39.81.

HETEROSCEDASTICITY

9.4 (a) What is meant by heteroscedasticity? (b) Draw a figure showing homoscedastic disturbances and the various forms of heteroscedastic disturbances. (c) Why is heteroscedasticity a problem?

(a) Heteroscedasticity refers to the case in which the variance of the error term is not constant for all values of the independent variable; that is, $E(X_i u_i) \neq 0$, so $E(u_i^2) \neq \sigma_u^2$. This violates the third assumption of the OLS regression model (see Prob. 6.4). It occurs primarily in cross-sectional data. For example, the error variance associated with the expenditures of low-income families is usually smaller than for high-income families because most of the expenditures of low-income families are on necessities, with little room for discretion.

(b) Figure 9-1a shows homoscedastic (i.e., constant variance) disturbances, while Fig. 9-1b, c, and d show heteroscedastic disturbances. In Fig. 9-1b, $\sigma_u^2$ increases with $X_i$. In Fig. 9-1c, $\sigma_u^2$ decreases with $X_i$. In Fig. 9-1d, $\sigma_u^2$ first decreases and then increases as $X_i$ increases. In economics, the heteroscedasticity shown in Fig. 9-1b is the most common, so the discussion that follows refers to that.

(c) With heteroscedasticity, the OLS parameter estimates are still unbiased and consistent, but they are inefficient (i.e., they have larger than minimum variances). Furthermore, the estimated variances of the parameters are biased, leading to incorrect statistical tests for the parameters and biased confidence intervals.

9.5 (a) How is the presence of heteroscedasticity tested?
(b) How can heteroscedasticity be corrected?

(a) The presence of heteroscedasticity can be tested by arranging the data from small to large values of the independent variable $X_i$ and then running two separate regressions, one for small values of $X_i$ and one for large values of $X_i$, omitting some (say, one-fifth) of the middle observations. Then the ratio of the error sum of squares of the second regression to the error sum of squares of the first regression (i.e., $\text{ESS}_2/\text{ESS}_1$) is tested to see if it is significantly different from 1. The $F$ distribution is used for this test with $(n - d - 2k)/2$ degrees of freedom, where $n$ is the total number of observations, $d$ is the number of omitted observations, and $k$ is the number of estimated parameters. This is the Goldfeld-Quandt test for heteroscedasticity and is most appropriate for large samples (i.e., for $n \geq 30$). If no middle observations are omitted, the test is still correct, but it will have a reduced power to detect heteroscedasticity.

(b) If it is assumed (as often is the case) that $\operatorname{var} u_i = CX_i^2$, where $C$ is a nonzero constant, we can correct for heteroscedasticity by dividing (i.e., weighting) every term of the regression by $X_i$ and then reestimating the regression using the transformed variables. In the two-variable case, we have

$$\frac{Y_i}{X_i} = \frac{b_0}{X_i} + b_1 + \frac{u_i}{X_i} \qquad (9.4)$$

The transformed error term is now homoscedastic:

$$\operatorname{var}\left(\frac{u_i}{X_i}\right) = \frac{1}{X_i^2}\operatorname{var} u_i = \frac{CX_i^2}{X_i^2} = C$$

Note that the original intercept has become a variable in Eq. (9.4), while the original slope parameter, $b_1$, is now the new intercept. However, care must be used to correctly interpret the results of the transformed or weighted regression. Since in Eq. (9.4) the errors are homoscedastic, the OLS estimates are not only unbiased and consistent, but also efficient. In the case of a multiple regression, each term of the regression is divided (i.e., weighted)
by the independent variable (say, $X_{2i}$) that is thought to be associated with the error term, so we have

$$\frac{Y_i}{X_{2i}} = \frac{b_0}{X_{2i}} + b_1\frac{X_{1i}}{X_{2i}} + b_2 + \frac{u_i}{X_{2i}} \qquad (9.5)$$

In Eq. (9.5), the original intercept, $b_0$, has become a variable, while $b_2$ has become the new intercept term. We can visually determine whether it is $X_{1i}$ or $X_{2i}$ that is related to the $u_i$ by plotting $X_{1i}$ and $X_{2i}$ against the regression residuals, $e_i$.

9.6 Table 9.7 gives the consumption expenditures $C$ and disposable income $Y_d$ for 30 families. (a) Regress $C$ on $Y_d$ for the entire sample and test for heteroscedasticity. (b) Correct for heteroscedasticity if it is found in part a.

Table 9.7 Consumption and Income Data for 30 Families (in U.S. Dollars)

Consumption | Income
10,600  10,800  11,100 | 12,000
11,400  11,700  12,100 | 13,000
12,300  12,600  13,200 | 14,000
13,000  13,300  13,600 | 15,000
13,800  14,000  14,200 | 16,000
14,400  14,900  15,300 | 17,000
15,000  15,700  16,400 | 18,000
15,900  16,500  16,900 | 19,000
16,900  17,500  18,100 | 20,000
17,200  17,800  18,500 | 21,000

(a) Regressing $C$ on $Y_d$ for the entire sample of 30 observations, we get

$$\hat{C} = 1480.0 + 0.788Y_d \qquad R^2 = 0.97$$

with a slope $t$ value of 29.37. To test for heteroscedasticity, we regress $C$ on $Y_d$ for the first 12 and for the last 12 observations, leaving the middle 6 observations out, and we get $\text{ESS}_1 = 1{,}069{,}000$ and $\text{ESS}_2 = 3{,}344{,}000$. Since $\text{ESS}_2/\text{ESS}_1 = 3{,}344{,}000/1{,}069{,}000 = 3.13$ exceeds $F = 2.97$ with $(30 - 6 - 4)/2 = 10$ degrees of freedom in the numerator and denominator at the 5% level of significance (see App. 7), we accept the hypothesis of heteroscedasticity.

(b) Assuming that the error variance is proportional to $Y_d^2$, we reestimate the regression using the transformed variables of Table 9.8 to correct for heteroscedasticity (in the last column of Table 9.8, 0.833333E-04 = 0.0000833333), regressing $C/Y_d$ on $1/Y_d$. Note that the marginal propensity to consume is now given by the intercept (i.e., 0.792) and is larger than before the adjustment (i.e., 0.788). The statistical significance of both estimated parameters is now even higher than before. The $R^2$ of the weighted regression (i.e., 0.32) is much lower but not directly comparable with the $R^2$ of 0.97 before the transformation because the dependent variables are different ($C/Y_d$ as opposed to $C$).

Table 9.8 Consumption $C$ and Disposable Income $Y_d$ in Original and Transformed Form

Family | C ($) | Y_d ($) | C/Y_d | 1/Y_d
1 | 10,600 | 12,000 | 0.883333 | 0.833333E-04
2 | 10,800 | 12,000 | 0.900000 | 0.833333E-04
3 | 11,100 | 12,000 | 0.925000 | 0.833333E-04
4 | 11,400 | 13,000 | 0.876923 | 0.769231E-04
5 | 11,700 | 13,000 | 0.900000 | 0.769231E-04
6 | 12,100 | 13,000 | 0.930769 | 0.769231E-04
7 | 12,300 | 14,000 | 0.878571 | 0.714286E-04
8 | 12,600 | 14,000 | 0.900000 | 0.714286E-04
9 | 13,200 | 14,000 | 0.942857 | 0.714286E-04
10 | 13,000 | 15,000 | 0.866667 | 0.666667E-04
11 | 13,300 | 15,000 | 0.886667 | 0.666667E-04
12 | 13,600 | 15,000 | 0.906667 | 0.666667E-04
13 | 13,800 | 16,000 | 0.862500 | 0.625000E-04
14 | 14,000 | 16,000 | 0.875000 | 0.625000E-04
15 | 14,200 | 16,000 | 0.887500 | 0.625000E-04
16 | 14,400 | 17,000 | 0.847059 | 0.588235E-04
17 | 14,900 | 17,000 | 0.876471 | 0.588235E-04
18 | 15,300 | 17,000 | 0.900000 | 0.588235E-04
19 | 15,000 | 18,000 | 0.833333 | 0.555556E-04
20 | 15,700 | 18,000 | 0.872222 | 0.555556E-04
21 | 16,400 | 18,000 | 0.911111 | 0.555556E-04
22 | 15,900 | 19,000 | 0.836842 | 0.526316E-04
23 | 16,500 | 19,000 | 0.868421 | 0.526316E-04
24 | 16,900 | 19,000 | 0.889474 | 0.526316E-04
25 | 16,900 | 20,000 | 0.845000 | 0.500000E-04
26 | 17,500 | 20,000 | 0.875000 | 0.500000E-04
27 | 18,100 | 20,000 | 0.905000 | 0.500000E-04
28 | 17,200 | 21,000 | 0.819048 | 0.476190E-04
29 | 17,800 | 21,000 | 0.847619 | 0.476190E-04
30 | 18,500 | 21,000 | 0.880952 | 0.476190E-04

9.7 Table 9.9 gives the level of inventories $I$ and sales $S$, both in millions of dollars, and borrowing rates $R$ for 35 firms in an industry. It is expected that $I$ will be directly related to $S$ but inversely related to $R$. (a) Regress $I$ on $S$ and $R$ for the entire sample and test for heteroscedasticity. (b) Correct for heteroscedasticity if it is found in part a, assuming that the error variance is proportional to $S^2$.

(a) $I$ is first regressed on $S$ and $R$ for the entire sample of 35 firms. To test for heteroscedasticity, we then regress $I$ on $S$ and $R$ for the first 14 and for the last 14 observations, leaving the middle 7 observations out, and we get $\text{ESS}_1 = 0.908$ and $\text{ESS}_2 = 5.114$. Since $\text{ESS}_2/\text{ESS}_1 = 5.114/0.908 = 5.63$ exceeds $F_{11,11} = 2.82$ at the 5% level of significance (see App. 7), we accept the hypothesis of heteroscedasticity.

(b) Assuming that the error variance is proportional to $S^2$ and reestimating the regression using the transformed variables to correct for heteroscedasticity, we get

$$\frac{\hat{I}}{S} = 0.21 - 8.45\left(\frac{1}{S}\right) - 0.18\left(\frac{R}{S}\right) \qquad R^2 = 0.93$$

$\hat{b}_1 = 0.21$ is now the slope coefficient associated with the variable $S$ (instead of 0.16 before the transformation), while $\hat{b}_2 = -0.18$ is the slope coefficient associated with the variable $R$. Both these slope coefficients remain highly significant before and after the transformation, as does $R^2$. The new constant is $-8.45$.

AUTOCORRELATION

9.8 (a) What is meant by autocorrelation? (b) Draw a figure showing positive and negative first-order autocorrelation. (c) Why is autocorrelation a problem?

(a) Autocorrelation or serial correlation refers to the case in which the error term in one time period is correlated with the error term in any other time period. If the error term in one time period is correlated with the error term in the previous time period, there is first-order autocorrelation. Most of the applications in econometrics involve first- rather than second- or higher-order autocorrelation. Even though negative autocorrelation is possible, most economic time series exhibit positive
autocorrelation. Positive first-order serial or autocorrelation means that $E(u_t u_{t-1}) > 0$, thus violating the fourth OLS assumption (see Prob. 6.4). This is common in time-series analysis.

Table 9.9 Inventories, Sales, and Borrowing Rates for 35 Firms

(b) Figure 9-2a shows positive and Fig. 9-2b shows negative first-order autocorrelation. Whenever several consecutive residuals have the same sign, as in Fig. 9-2a, there is positive first-order autocorrelation. However, whenever consecutive residuals change sign frequently, as in Fig. 9-2b, there is negative first-order autocorrelation.

Fig. 9-2 (Panel A: positive autocorrelation; Panel B: negative autocorrelation)

(c) With autocorrelation, the OLS parameter estimates are still unbiased and consistent, but the standard errors of the estimated regression parameters are biased, leading to incorrect statistical tests and biased confidence intervals. With positive first-order autocorrelation, the standard errors of the estimated regression parameters are biased downward, thus exaggerating the precision and statistical significance of the estimated regression parameters.

9.9 (a) How is the presence of positive or negative first-order autocorrelation tested? (b) How can autocorrelation be corrected?

(a) The presence of autocorrelation can be tested by calculating the Durbin-Watson statistic $d$ given by Eq. (9.1). This is routinely given by most computer programs such as SAS:

$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \qquad (9.1)$$

The calculated value of $d$ ranges between 0 and 4, with no autocorrelation when $d$ is in the neighborhood of 2. The values of $d$ indicating the presence or absence of positive or negative first-order autocorrelation, and for which the test is inconclusive, are summarized in Fig. 9-3. When the lagged dependent variable appears as an explanatory variable in the regression,
$d$ is biased toward 2 and its power to detect autocorrelation is hampered.

Fig. 9-3

(b) One method to correct positive first-order autocorrelation (the usual type) involves first regressing $Y$ on its value lagged one period, the explanatory variable of the model, and the explanatory variable lagged one period:

$$Y_t = b_0(1 - \rho) + \rho Y_{t-1} + b_1 X_t - b_1\rho X_{t-1} + v_t \qquad (9.2)$$

(The preceding equation is derived by multiplying each term of the original OLS model lagged one period by $\rho$, subtracting the resulting expression from the original OLS model, transposing the term $\rho Y_{t-1}$ from the left to the right side of the equation, and defining $v_t = u_t - \rho u_{t-1}$.) The second step involves using the value of $\rho$ found in Eq. (9.2) to transform all the variables of the original OLS model as indicated in Eq. (9.3), and then estimating Eq. (9.3):

$$Y_t - \hat{\rho}Y_{t-1} = b_0(1 - \hat{\rho}) + b_1(X_t - \hat{\rho}X_{t-1}) + v_t \qquad (9.3)$$

The error term, $v_t$, in Eq. (9.3) is now free of autocorrelation. This procedure, known as the Durbin two-stage method, is an example of generalized least squares. To avoid losing the first observation in the differencing process, $Y_1\sqrt{1 - \hat{\rho}^2}$ and $X_1\sqrt{1 - \hat{\rho}^2}$ are used for the first transformed observations of $Y$ and $X$, respectively. If the autocorrelation is due to the omission of an important variable, wrong functional form, or improper model specification, these problems should be removed first, before applying the preceding correction procedure for autocorrelation.

9.10 Table 9.10 gives the level of U.S. imports $M$ and GDP (both seasonally adjusted in billions of dollars) from 1980 to 1999. (a) Regress $M$ on GDP and test for autocorrelation at the 5% level of significance. (b) Correct for autocorrelation if it is found in part a.

(a)
$$\hat{M}_t = -201.80 + 0.14\,\text{GDP}_t \qquad R^2 = 0.98 \qquad d = 0.54$$

Since $d = 0.54 < d_L = 1.20$ at the 5% level of significance with $n = 20$ and $k' = 1$ (from App. 8), there is evidence of positive first-order autocorrelation.

Table 9.10 Seasonally Adjusted U.S. Imports and GDP (Both in Billions of Dollars) from 1980 to 1999
(b) To correct for autocorrelation, first $M_t$ is regressed on $M_{t-1}$, $\text{GDP}_t$, and $\text{GDP}_{t-1}$. Then, using $\hat{\rho} = 0.82$ (the coefficient on $M_{t-1}$ in that regression), we transform the original variables as indicated in Eq. (9.3). The original variables ($M$ and GDP) and the transformed variables ($M^*$ and $\text{GDP}^*$) are given in Table 9.11, with $M_{1980}\sqrt{1 - 0.82^2}$ and $\text{GDP}_{1980}\sqrt{1 - 0.82^2}$ used for the first transformed observations.

Table 9.11 U.S. Imports and GDP in Original and Transformed Form

Regressing $M^*$ on $\text{GDP}^*$, we get $R^2 = 0.88$. Since $d$ now exceeds $d_U = 1.41$ at the 5% level of significance with $n = 20$ and $k' = 1$ (from App. 8), there is no evidence of autocorrelation. Note that though $\text{GDP}^*_t$ remains highly significant, its $t$ value is lower than the $t$ value of $\text{GDP}_t$. In addition, $R^2 = 0.88$ now, as opposed to $R^2 = 0.98$ before the correction for autocorrelation.

9.11 Table 9.12 gives gross private domestic investment (GPDI) and GDP, both in seasonally adjusted billions of 1996 dollars, and the GDP deflator price index $P$ for the United States from 1980 to 1999. (a) Regress GPDI on GDP and $P$ and test for autocorrelation at the 5% level of significance. (b) Correct for autocorrelation if it is found in part a.

Table 9.12 U.S. GPDI, GDP (Both in Seasonally Adjusted Billions of 1996 Dollars) and GDP Deflator Price Index, 1982-1999

Source: St.
Louis Federal Reserve (Bureau of Economic Analysis).

(a) Regressing $\text{GPDI}_t$ on $\text{GDP}_t$ and $P_t$, we get $R^2 = 0.97$ and $d = 0.56$. Since $d = 0.56 < d_L = 1.05$ at the 5% level of significance with $n = 18$ and $k' = 2$ (from App. 8), there is evidence of autocorrelation.

(b) To correct for autocorrelation, first $\text{GPDI}_t$ is regressed on $\text{GPDI}_{t-1}$, $\text{GDP}_t$, $\text{GDP}_{t-1}$, $P_t$, and $P_{t-1}$. Then, using $\hat{\rho} = 0.74$ (the coefficient on $\text{GPDI}_{t-1}$ in that regression), we transform the original variables as indicated in Eq. (9.3). The original and the transformed variables (the latter indicated by an asterisk) are given in Table 9.13. For the first observation,

$$\text{GPDI}^* = 571.1\sqrt{1 - 0.74^2} = 384.126 \qquad \text{GDP}^* = 4915.6\sqrt{1 - 0.74^2} = 3306.266 \qquad P^* = 67.44\sqrt{1 - 0.74^2} = 45.361$$

Regressing $\text{GPDI}^*_t$ on $\text{GDP}^*_t$ and $P^*_t$, we get $d = 1.77$. Since $d = 1.77 > d_U = 1.53$ at the 5% level of significance with $n = 18$ and $k' = 2$ (from App. 8), there is no evidence of autocorrelation. Both variables remain highly significant, and $R^2$ falls.

9.12 Table 9.14 gives personal consumption expenditures $C$ and disposable personal income $Y$, both in billions of dollars, for the United States from 1982 to 1999. (a) Regress $C_t$ on $Y_t$ and test for autocorrelation. (b) Correct for autocorrelation if it is found in part a.

(a) Regressing $C_t$ on $Y_t$, we get $R^2 = 0.99$ and $d = 0.58$. Since $d = 0.58$, there is evidence of autocorrelation at both the 5 and 1% levels of significance.
Table 9.13 GPDI, GDP, and P in Original and Transformed Form

Table 9.14 U.S. Consumption Expenditures and Disposable Income (in Billions of Dollars), 1982-1999
Source: Economic Report of the President.

(b) To correct for autocorrelation, first $C_t$ is regressed on $C_{t-1}$, $Y_t$, and $Y_{t-1}$. Since $\hat{\rho} \approx 1$ (the coefficient on $C_{t-1}$ in that regression), we rerun the regression on the first differences of the original variables (i.e., $\Delta C_t$ and $\Delta Y_t$), omitting the intercept. The new value of $d$ indicates no evidence of autocorrelation at the 5% level of significance. (Note: $R^2$ is not well defined in a regression with no intercept and therefore is not comparable with the previous regressions. For a more in-depth study of the procedure when $\hat{\rho} \approx 1$, see Sec. 11.3.)

ERRORS IN VARIABLES

9.13 (a) What is meant by errors in variables? (b) What problems do errors in variables create? (c) Is there any test to detect the presence of errors in variables? (d) How can the problems created by the existence of errors in variables be corrected?

(a) Errors in variables refer to the case in which the variables in the regression model include measurement errors. These are probably very common in view of the way most data are collected and elaborated.

(b) Measurement errors in the dependent variable are incorporated into the disturbance term, leaving unbiased and consistent (although inefficient or larger than minimum variance) OLS parameter estimates. However, with measurement errors in the explanatory variables, the fifth of the OLS assumptions, independence of the explanatory variables and error term, is violated (see Prob. 6.4), leading to biased and inconsistent OLS parameter estimates. In a simple regression, $\hat{b}_1$ is biased downward, while $\hat{b}_0$ is biased upward.
(c) There is no formal test to detect the presence of errors in variables. Only economic theory and knowledge of how the data were gathered can sometimes give some indication of the seriousness of the problem.

(d) One method of obtaining consistent (but still biased and inefficient) OLS parameter estimates is to replace the explanatory variable subject to measurement errors with another variable that is highly correlated with the explanatory variable in question but which is independent of the error term. In the real world, it might be difficult to find such an instrumental variable, and one could never be sure that it would be independent of the error term. The most popular instrumental variable is the lagged value of the explanatory variable in question. Measurement error in the explanatory variable only can also be corrected by inverse least squares. This involves regressing $X$ on $Y$. Then, $\hat{b}_0 = -\hat{b}'_0/\hat{b}'_1$ and $\hat{b}_1 = 1/\hat{b}'_1$, where $\hat{b}'_0$ and $\hat{b}'_1$ are the intercept and slope of the regression of $X_t$ on $Y_t$, and $\hat{b}_0$ and $\hat{b}_1$ are consistent estimates of the intercept and slope parameters of the regression of $Y_t$ on $X_t$.

9.14 Table 9.15 gives inventories $Y$, actual sales $X$, and hypothetical values of $X$ that include measurement errors $X'$, all in billions of dollars, in U.S. manufacturing from 1983 to 1998. $Y$ and $X$ are assumed to be free of measurement errors. (a) Regress $Y_t$ on $X_t$. (b) Regress $Y_t$ on $X'_t$ (on the assumption that $X_t$ is not available). What type of bias results in the estimates in using $X'_t$ instead of $X_t$? (c) Use instrumental variables to obtain consistent parameter estimates, on the assumption that $X'_{t-1}$ is correlated with $X_t$. How do these parameter estimates compare with those obtained in part b?

(a) Regressing $Y_t$ on $X_t$, we get $R^2 = 0.95$, with $t$ values of 11.66 for the intercept and 16.46 for the slope.

Table 9.15 Inventory and Sales (Both in Billions of Dollars) in U.S.
Manufacturing, 1983-1998

Source: Economic Report of the President.

(b) Regressing $Y_t$ on $X'_t$ (if $X_t$ is not available), we get an intercept of 182.50, with $t$ values of 13.38 for the intercept and 15.23 for the slope. Note that $\hat{b}'_1 < \hat{b}_1$; furthermore, $\hat{b}'_1$ falls outside the 95% confidence interval of $b_1$ (0.67 to 0.89).

(c) Using $X'_{t-1}$ as an instrumental variable for $X'_t$ (if $X'_{t-1}$ is believed to be correlated with $X_t$), we get

$$\hat{Y}_t = 187.00 + 0.80X'_{t-1}$$

The coefficient on $X'_{t-1}$ is closer to the true one ($b_1$ falls in the 95% confidence interval of 0.66 to 0.94), and is consistent. Of course, in the real world it is rarely known what error of measurement might be present (otherwise, the errors could be corrected before running the regression). It is also difficult or impossible to establish whether $X'_{t-1}$ is correlated with $X_t$.

9.15 Using the data in Table 9.15, (a) regress $X'_t$ on $Y_t$ in order to overcome errors in measuring $X_t$. (b) How do these results compare with those in Prob. 9.14(c)?
(a) Since only $X_t$ (i.e., the explanatory variable) is subject to measurement errors, inverse least squares is another method for obtaining consistent parameter estimates. Regressing $X'_t$ on $Y_t$ gives $t$ values of 6.68 and 15.20, and the implied $\hat{b}_0$ and $\hat{b}_1$ are consistent (but still biased) estimates of the intercept and slope parameters of the regression of $Y_t$ on $X_t$.

(b) Using inverse least squares gives better results in this case compared to the instrumental-variable method [see Prob. 9.14(c)]. With instrumental variables, both the estimated intercept and slope parameters are farther from the true values. However, the results may very well be different in other cases. In any case, in the real world we seldom know what types of errors are present, what type of adjustment is appropriate, and how close the adjusted parameters are to the true parameter values.

Supplementary Problems

MULTICOLLINEARITY

9.16 Why can the following consumption function not be estimated?

$$C_t = b_0 + b_1 Y_t + b_2 Y_{t-1} + b_3 \Delta Y_t + u_t \qquad \text{where } \Delta Y_t = Y_t - Y_{t-1}$$

Ans. Because there is perfect multicollinearity between $\Delta Y_t$, on the one hand, and $Y_t$ and $Y_{t-1}$, on the other. As a result, there are only three independent normal equations and four coefficients to estimate, and so no unique solution is possible.

9.17 Table 9.16 gives hypothetical data on consumption expenditures $C$, disposable income $Y_d$, and wealth $W$, all in thousands of dollars, for a sample of 15 families. (a) Regress $C$ on $Y_d$ and $W$ and find $R^2$ and $r_{Y_d W}$. (b) Regress $C$ on $Y_d$ only. (c) Regress $C$ on $W$ only. (d) What can you conclude from the preceding with regard to multicollinearity?

Table 9.16 Consumption Expenditures, Disposable Income, and Wealth for 15 Families

Ans.
(a) $R^2 = 0.97$ (d) Serious multicollinearity is present.

9.18 (a) How can the a priori information that $b_2 = 0.25b_1$ be used to overcome the multicollinearity problem in Prob. 9.17? (b) Reestimate the regression of Prob. 9.17, incorporating the a priori information (indicated in part a) to overcome the multicollinearity problem. (c) What is the value of $\hat{b}_1$? Of $\hat{b}_2$?

Ans. (a) By estimating $C = b_0 + b_1 Z$, where $Z = Y_d + 0.25W$ (c) $\hat{b}_1 = 0.39$ and $\hat{b}_2 = 0.25\hat{b}_1$

HETEROSCEDASTICITY

9.19 Table 9.17 gives gross fixed capital formation $Y$ and sales $X$, both in thousands of dollars, for 35 firms in an industry. Regress $Y_i$ on $X_i$ (a) for all the data, (b) for the first 14 observations only and record the error sum of squares ($\text{ESS}_1$), (c) for the last 14 observations only and record the error sum of squares ($\text{ESS}_2$). (d) Test for the presence of heteroscedasticity.

Table 9.17 Gross Fixed Capital Formation and Sales for 35 Firms

Ans. (a) $\hat{Y}_i = 21.637 + 0.079X_i$ (d) Since $\text{ESS}_2/\text{ESS}_1$ exceeds $F = 2.82$ at the 5% level of significance, heteroscedasticity is present.

9.20 Assuming that the error variance is proportional to $X^2$ in Prob. 9.19, (a) correct for heteroscedasticity. (b) What is the value of the new intercept and the new slope parameter associated with the variable $X$? How do they compare with the corresponding values before the transformation?

Ans. (b) The value of the new intercept is 23.187 (instead of 21.637), and the new slope parameter associated with the variable $X_i$ is now 0.074 (instead of 0.079).

9.21 Table 9.18 gives the level of gross fixed capital formation $Y$ and sales $X_1$, both in thousands of dollars, and a productivity index $X_2$, for 35 firms in an industry.
I is expected that ¥ wall be directly slated to boty and J Regress Fon Jy and As for (a) the entice sample, (3) the 14 observations with the smallest valuge of Mj and record ESS), and (c) tho 1 observations with the largest valuer of and record ESS, () Test for the prosonoe of heteroscadat Table 9.18 Grose Fived Capital Formation, Sales, a Productivity in 38 Firme Fim? i f2]2{,4¢]i,el7[,t]o] elas ¥ [soe] ons | a2 [ae | as | ws | es |e fai | om | as? | an [ass [1 | 300 [aes | to [tos | are [ass [vas | 250 | ans | ons % [193 [108 [tea [re | 67 [ oo [ 9 | 19 | me | rae [iat | ass Fim] is | [is | | at [as [ 9 | 2 [20 | 2 [2s |e ¥ [ue m8 | 363 [sta | ws | aaa | ava | ao | wa | asa | wa m | veo [ais | axe | io | a0 | 200 | one | wos | a0 | oo x [ios [ise [9 [es] is | wo] ma [ast] mr] ns] 93 | 99 y | aso} ass | asa] aor | ara | aor | asa | ava | sia | ate | ans a [a0 [sa [ae | 200 | 20 | ao | 250 [ss [ies | aes | i %, [ivs[ io [ars [aso [ies [or | ar] ws] os] sa | aor As. (a) Fa 11009400174 + Le08Y, R= 0.99 cs (8.97 w Fasaasomun a7, 08s G9 (a9) ESS, =M658 ) F = S87840.010.%, + 211502 Ry = 099 (5 Bas) ESS: ~2.126 (i) Since ESS,/ESS, recent 123 enoceds 82 ai the 5% level of significance, hetesoseedasticity is cular. 9 PROBLEMS IN REGRESSION ANALYSIS 2 9.22 (a) Assuring that the error variance is proportional to 3 in Prob, 9.21, (uw) correst for heteroscedastsity. Cee eee el i oN ets Sort ni caps wna ho era si Fo ses sams(Mfoam(h) ata 2 10s, gaasy a, ©» Te tee gr 220 nnd 12009 whieh oy pe me nah sae HET tne Se pm date oa ade vo svroconmeLaTion 421 Bs profiad patna 90 that jt i a eh commercial paper interest rate. for the United States from 1982 t0 1999, (a) Reeress ¥ on. Is there nse strc de nek surat Rea ae hy nt tsi tes apes 1 oi cea tc af al Tw thle la” ey tcetalivsaee se Deter afapaianet Aide level of significance? 
leh fa ed Semmes Ch my tts 15 nape ot Hat att Ta [oe [om] ow [oe | oe [oe [oe | oe | Yor paw [ome Dom [oo oe [ome [ee [om 1 [tora [ne [one [ona [ssa [amo [nos [oar [iesi0 de cf S89 | RT RIT | 443 5 saa 54 443] S12 Say ne Foe Res acl an is F Ry ee est Ans. (ab Fi = 42.95 40.16%, ona Sis 035 iis fats a th and ets (> Tacos frst, nln ep 0-888, 0.8, 040% ( W067 +0.23N), 22 00 Fi ti fattest eit a 8 waco AL UsetheneTa 3%) Rape on an yoann ik Sand 1% lave of cipniticamcs? (bh) [evidence of autocorraation & find in part a, find thewakuo of tr 226 925 PROBLEMS IN REGRESSION ANALYSIS lenar. 9 tae used to transform the variables in onder to adjust for autocorrelation. (c) [f evidence af wutocorretation is found in part o, regrece 97 on Xi and XE to correct for antocorratation, — there any avideace of remaining autocorrelation at the [% level of sigaificance? At the 5% level of sighaicance? ans. (a) Foe 886.284 0194), + 28.62%, as 1340) 12.26) d=0.89 Sines d 4, there is eviderse of autocorrelation at both the $ and 1% levels of significance, % pao © OL 40.5044 OSHS =O 62) Ol) d= 0 Altbough d iscloser to 2 thew is still evidence of autocarrelation at the 5% level, andthe testis inconclusive atthe 1% keel of significance. Using the data in Table 8.19, (a) regress A aethe | and $82 leva jon AY), and AN. (6) Is there evidence of autocorrelation of cignifioanoe? fet Why is this trancformation wali? PO FOATAN, 091 AN, ROT (600.18) dais ans. iw af, (6) There'és now no evidence of autocorrelation at sither the 5% or the 1% level of signifisance. (0) A. regression of AY, on A.Ny, Would be less vali since j is not as close 10 I ERRORS IN VARIABLES 926 Table 9.20 gives inventories Y, actual shipments Vand hypothetical values of that include measurement race 1, al in ils af dae, in ETS sivale-gots ius fen 19K ty 1H Y al Y ae assumed to be fcc measurementereors. 
(a) Regress $Y$ on $X$. (b) Regress $Y$ on $X'$ on the assumption that $X$ is not available. What type of bias results in the estimates in using $X'$ instead of $X$? (c) Use instrumental variables to obtain consistent parameter estimates, on the assumption that $X'$ is correlated with $u$. How do these parameter estimates compare with those of part b?

Table 9.20 Inventories and Shipments (in Billions of Dollars) in the U.S. Durable-Goods Industries, 1983-1998
[table values illegible]
Source: St. Louis Federal Reserve (Department of Commerce, Census Bureau).

Ans. (a) [estimated equation illegible] (b) [estimated equation illegible; t statistics (4.05) and (11.66)] with errors of measurement in the value of shipments [remainder illegible] (c) Using [illegible] as an instrument for $X'$, we get [estimated equation illegible; t statistics (4.93) and (13.51)]. The new parameter estimates are closer to the true ones than those obtained in part b.

9.27 Using the data in Table 9.20, (a) regress $X'$ on $Y$ in order to overcome errors in measuring $X$. When is this method appropriate? (b) How do these results compare with those in Prob. 9.26(c)?

Ans. (a) [estimated equation illegible] Consistent estimates of the parameters of the regression of $Y$ on $X$ are [values illegible]. Inverse least squares is appropriate when only the explanatory variable includes measurement errors. (b) Using inverse least squares gives better results in this case compared to the instrumental-variable method of Prob. 9.26(c).

CHAPTER 10

Simultaneous-Equations Methods

10.1 SIMULTANEOUS-EQUATIONS MODELS

When the dependent variable in one equation is also an explanatory variable in some other equation, we have a simultaneous-equations system or model. The dependent variables in a system of simultaneous equations are called endogenous variables. The variables determined by factors outside the model are called exogenous variables. There is one behavioral or structural equation for each endogenous variable in the system (see Example 1). Using OLS to estimate the structural equations results in biased and inconsistent parameter estimates. This is referred to as simultaneous-equations bias.
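Simultaneous-equations bias can be seen in a small simulation. The sketch below is illustrative only: the two-equation system, its coefficient values, the sample size, and the helper names `simulate_system` and `ols_slope` are assumptions made for this example and are not taken from the text. It generates data in which $y_1$ depends on $y_2$ and $y_2$ depends on $y_1$ and an exogenous variable $x$, and then applies OLS to the first structural equation.

```python
import numpy as np


def simulate_system(n, a0=1.0, a1=0.5, b0=2.0, b1=0.4, b2=1.0, seed=0):
    """Draw n observations from a hypothetical two-equation system:

        y1 = a0 + a1*y2 + u1
        y2 = b0 + b1*y1 + b2*x + u2

    The data are generated by solving the two equations for y1 and y2
    in terms of x and the errors only.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)    # exogenous variable
    u1 = rng.normal(0.0, 1.0, n)   # error of the first equation
    u2 = rng.normal(0.0, 1.0, n)   # error of the second equation
    d = 1.0 - a1 * b1
    y1 = (a0 + a1 * b0 + a1 * b2 * x + u1 + a1 * u2) / d
    y2 = (b0 + a0 * b1 + b2 * x + b1 * u1 + u2) / d
    return x, y1, y2


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


x, y1, y2 = simulate_system(n=100_000)
a1_ols = ols_slope(y1, y2)   # OLS applied directly to the first equation
print("true a1 = 0.5, OLS estimate =", round(a1_ols, 3))
```

Because $y_2$ is correlated with $u_1$ in this setup, the OLS slope stays away from the assumed true value of 0.5 even with 100,000 observations, which is what inconsistency means in practice.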
To obtain consistent parameter estimates, the reduced-form equations of the model must first be obtained. These express each endogenous variable in the system only as a function of the exogenous variables of the model (see Example 2).

EXAMPLE 1. The following two equations represent a simple macroeconomic model:

$$M_t = a_0 + a_1 Y_t + u_{1t}$$
$$Y_t = b_0 + b_1 M_t + b_2 I_t + u_{2t}$$

where $M$ is the money supply, $Y$ is income, and $I$ is investment. Since $M$ depends on $Y$ in the first equation and $Y$ depends on $M$ (and $I$) in the second equation, $M$ and $Y$ are jointly determined, so we have a simultaneous-equations model. $M$ and $Y$ are the endogenous variables, while $I$ is exogenous, or determined outside the model. A change in $u_1$ affects $M$ in the first equation. This, in turn, affects $Y$ in the second equation. As a result, $Y$ and $u_1$ are correlated, leading to biased and inconsistent OLS estimates of the $M$ (and $Y$) equation.

EXAMPLE 2. The first reduced-form equation can be derived by substituting the second equation into the first and rearranging:

$$M_t = a_0 + a_1(b_0 + b_1 M_t + b_2 I_t + u_{2t}) + u_{1t}$$
$$M_t = \frac{a_0 + a_1 b_0}{1 - a_1 b_1} + \frac{a_1 b_2}{1 - a_1 b_1} I_t + \frac{a_1 u_{2t} + u_{1t}}{1 - a_1 b_1}$$

or

$$M_t = \pi_0 + \pi_1 I_t + v_{1t}$$

The second reduced-form equation can be derived by substituting the first equation into the second and rearranging:

$$Y_t = b_0 + b_1(a_0 + a_1 Y_t + u_{1t}) + b_2 I_t + u_{2t}$$
$$Y_t = \frac{b_0 + a_0 b_1}{1 - a_1 b_1} + \frac{b_2}{1 - a_1 b_1} I_t + \frac{b_1 u_{1t} + u_{2t}}{1 - a_1 b_1}$$

or

$$Y_t = \pi_2 + \pi_3 I_t + v_{2t}$$

10.2 IDENTIFICATION

Identification refers to the possibility of calculating the structural parameters of a simultaneous-equations model from the reduced-form parameters. An equation of a system is exactly identified if the number of excluded exogenous variables from the equation is equal to the number of endogenous variables in the equation minus 1. However, an equation of a system is overidentified (or underidentified) if the number of excluded exogenous variables from the equation exceeds (or is smaller than) the number of endogenous variables included in the equation minus 1 (see Example 3). Although this is only a necessary rather than a sufficient condition for identification, it usually gives the correct answer (see Prob. 10.5).
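The counting rule above can be written as a small helper. This is a minimal sketch of the order condition only (a necessary, not a sufficient, condition); the function name `order_condition` and its argument names are hypothetical, chosen for illustration.

```python
def order_condition(excluded_exogenous: int, included_endogenous: int) -> str:
    """Classify one equation of a system by the order condition.

    excluded_exogenous:  number of exogenous variables of the system
                         that do NOT appear in this equation
    included_endogenous: number of endogenous variables that appear
                         in this equation
    """
    needed = included_endogenous - 1
    if excluded_exogenous == needed:
        return "exactly identified"
    if excluded_exogenous > needed:
        return "overidentified"
    return "underidentified"


# An equation with two endogenous variables needs exactly one excluded
# exogenous variable to be exactly identified.
print(order_condition(1, 2))  # exactly identified
print(order_condition(0, 2))  # underidentified
print(order_condition(2, 2))  # overidentified
```

The helper only counts variables; it does not check the rank condition, so it can in principle classify an equation as identified when it is not.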
Unique structural coefficients can be calculated from the reduced-form coefficients only for an exactly identified equation (see Example 4).

EXAMPLE 3. The money supply ($M$) equation of Example 1 is exactly identified because it excludes one exogenous variable ($I$) and includes two endogenous variables ($M$ and $Y$). However, the income, or $Y$, equation is underidentified because it excludes no exogenous variable. If this second equation had included the additional exogenous variable $G$ (government expenditures), the first, or $M$, equation would have been overidentified because the number of excluded exogenous variables would then have exceeded the number of endogenous variables minus 1.

EXAMPLE 4. A unique value of the structural parameters of the exactly identified $M$ equation of Example 1 can be calculated from the reduced-form parameters of Example 2 as follows:

$$a_1 = \frac{\pi_1}{\pi_3} = \frac{a_1 b_2/(1 - a_1 b_1)}{b_2/(1 - a_1 b_1)}$$
$$a_0 = \pi_0 - a_1 \pi_2$$

10.3 ESTIMATION: INDIRECT LEAST SQUARES

Indirect least squares (ILS) is a method of calculating structural-parameter values for exactly identified equations. ILS involves using OLS to estimate the reduced-form equations of the system and then using the estimated coefficients to calculate the structural parameters. However, it is not easy to calculate the standard errors of the structural parameters, nor can ILS be used in cases of overidentification.

EXAMPLE 5. Table 10.1 gives the money supply ($M$ = currency plus demand deposits), GDP $Y$, gross private domestic investment $I$, and government purchases of goods and services $G$, all seasonally adjusted in billions of dollars, for the United States from 1982 to 1999 ($G$ will be used in Example 6). The estimated reduced-form equations of Example 2 are

$$\hat{M}_t = 112.0608 + 0.5693 I_t$$
$$\hat{Y}_t = 452.5000 + 5.4500 I_t \qquad R^2 = 0.93$$

[t statistics, the remaining $R^2$, and the calculation of $\hat{a}_1 = \hat{\pi}_1/\hat{\pi}_3$ and $\hat{a}_0 = \hat{\pi}_0 - \hat{a}_1\hat{\pi}_2$ are illegible]

Table 10.1 Money Supply, GDP, Investments,
and Government Expenditures (Seasonally Adjusted in Billions of Dollars) for the United States, 1982-1999
[table values illegible]

Thus the $M$ equation of Example 1 estimated by ILS is

$$\hat{M}_t = 221.3738 + 0.1060 Y_t$$

The same equation estimated by OLS (inappropriately) is

$$\hat{M}_t = 162.7044 + 0.1159 Y_t$$

[t statistics and $R^2$ illegible]

10.4 ESTIMATION: TWO-STAGE LEAST SQUARES

Two-stage least squares (2SLS) is a method of estimating consistent structural parameters for overidentified equations (for exactly identified equations, 2SLS gives the same results as ILS, but it also gives the standard errors of the estimated structural parameters). 2SLS involves regressing each endogenous variable on all the exogenous variables of the system and then using the predicted values of the endogenous variables to estimate the structural equations of the model.

EXAMPLE 6. If the second, or $Y$, equation of Example 1 now includes $G$ (government expenditures) as an additional explanatory variable, then the first, or $M$, equation is overidentified (see Example 3) and can be estimated by 2SLS. The first stage is

[estimated regression of $Y_t$ on $I_t$ and $G_t$ illegible]

The second stage is

$$\hat{M}_t = 166.5660 + 0.1150 \hat{Y}_t$$

[t statistics and $R^2$ illegible]

$\hat{a}_1 = 0.1150$ is a consistent estimate of $a_1$.

Solved Problems

SIMULTANEOUS-EQUATIONS MODELS

10.1 What is meant by (a) Simultaneous-equations system or model? (b) Endogenous variables? (c) Exogenous variables? (d) Structural equations? (e) Simultaneous-equations bias? (f) Reduced-form equations?

(a) A simultaneous-equations system or model refers to the case in which the dependent variable in one or more equations is also an explanatory variable in some other equation of the system.
Specifically, not only are the $Y$'s determined by the $X$'s, but some of the $X$'s are, in turn, determined by the $Y$'s, so that the $Y$'s and the $X$'s are jointly or simultaneously determined.

(b) The endogenous variables are the dependent variables in the system of simultaneous equations. These are the variables that are determined by the system, even though they also appear as explanatory variables in some other equation of the system.

(c) Exogenous variables are those variables which are determined outside of the model. These also include the lagged endogenous variables, since their values are already known in any given period. The exogenous variables and the lagged endogenous variables are sometimes called predetermined variables.

(d) Structural or behavioral equations describe the structure of an economy or the behavior of some economic agents such as consumers or producers. There is one structural equation for each endogenous variable of the system. The coefficients of the structural equations are called structural parameters and express the direct effect of each explanatory variable on the dependent variable.

(e) Simultaneous-equations bias refers to the overestimation or underestimation of the structural parameters obtained from the application of OLS to the structural equations of a simultaneous-equations model. This bias results because those endogenous variables of the system which are also explanatory variables are correlated with the error terms, thus violating the fifth assumption of OLS (see Prob. 6.4).

(f) Reduced-form equations are obtained by solving the system of structural equations so as to express each endogenous variable of the system as a function only of the exogenous or predetermined variables of the system. Since the exogenous variables of the system are uncorrelated with the error terms, OLS gives consistent reduced-form parameter estimates.
These measure the total direct and indirect effects of a change in the exogenous variables on the endogenous variables and may be used to obtain consistent structural parameters.

10.2 The following two structural equations represent a simple demand-supply model:

Demand: $Q_t = a_0 + a_1 P_t + a_2 Y_t + u_{1t}$, $a_1 < 0$ and $a_2 > 0$
Supply: $Q_t = b_0 + b_1 P_t + u_{2t}$, $b_1 > 0$

where $Q$ is quantity, $P$ is price, and $Y$ is consumers' income. It is assumed that the market is cleared in every year so that $Q_t$ represents both quantity bought and sold in year $t$. (a) Why is this a simultaneous-equations model? (b) Which are the endogenous and exogenous variables of the system? (c) Why would the estimation of the demand and supply functions by OLS give biased and inconsistent parameter estimates?

(a) The given demand-supply model represents a simple simultaneous-equations market system because $Q$ and $P$ are mutually or jointly determined. If price were below equilibrium, the quantity demanded would exceed the quantity supplied, and vice versa. At equilibrium, the (negatively sloped) demand curve crosses the (positively sloped) supply curve, jointly or simultaneously determining (the equilibrium) $Q$ and $P$.

(b) The endogenous variables of the model are $Q$ and $P$. These are the variables determined within the model. $Y$ is the only exogenous variable of the model (i.e., determined outside the model).

(c) Since the endogenous variable $P$ is also an explanatory variable in both the demand and supply equations, $P$ is correlated with $u_{1t}$ in the demand equation and with $u_{2t}$ in the supply equation. This violates the fifth assumption of OLS, which requires that the explanatory variable be uncorrelated with the error term. As a result, estimating the demand and supply functions by OLS results in parameter estimates that are not only biased but also inconsistent (i.e., that do not converge on the true parameters even as the sample size is increased).

10.3 (a) Find the reduced-form equations corresponding to the structural equations of Prob. 10.2.
(b) Why are these reduced-form equations important? What do the reduced-form coefficients measure in this market model?

(a) To find the reduced-form equations, the structural equations of Prob. 10.2 are solved for $Q$ and $P$ (the endogenous variables) as a function of only $Y$ (the exogenous variable). Solving the supply equation for $P$ and substituting into the demand equation, we get

$$P_t = \frac{1}{b_1}(Q_t - b_0 - u_{2t})$$
$$Q_t = a_0 + \frac{a_1}{b_1}(Q_t - b_0 - u_{2t}) + a_2 Y_t + u_{1t}$$
$$Q_t = \frac{a_0 b_1 - a_1 b_0}{b_1 - a_1} + \frac{a_2 b_1}{b_1 - a_1} Y_t + \frac{b_1 u_{1t} - a_1 u_{2t}}{b_1 - a_1}$$

or $Q_t = \pi_0 + \pi_1 Y_t + v_{1t}$. Setting the demand equation equal to the supply equation and solving for $P$, we get

$$a_0 + a_1 P_t + a_2 Y_t + u_{1t} = b_0 + b_1 P_t + u_{2t}$$
$$P_t = \frac{a_0 - b_0}{b_1 - a_1} + \frac{a_2}{b_1 - a_1} Y_t + \frac{u_{1t} - u_{2t}}{b_1 - a_1}$$

or $P_t = \pi_2 + \pi_3 Y_t + v_{2t}$.

(b) The reduced-form equations are important because $Y_t$ is uncorrelated with $v_{1t}$ and $v_{2t}$, so that consistent estimates of the reduced-form parameters $\pi_0$, $\pi_1$, $\pi_2$, and $\pi_3$ can be obtained by applying OLS to the reduced-form equations. $\pi_1$ and $\pi_3$ give, respectively, the total of the direct and indirect effects of a change in $Y$ on $Q$ and $P$. A change in $Y$ causes a shift in the demand curve, which affects both the equilibrium $P$ and $Q$.

10.4 Given the following three-equation system, (a) explain why this is not a simultaneous-equations model. (b) Could OLS be used to estimate each equation of this system? Why?

$$Y_{1t} = a_0 + a_1 X_t + u_{1t}$$
$$Y_{2t} = b_0 + b_1 Y_{1t} + b_2 X_t + u_{2t}$$
$$Y_{3t} = c_0 + c_1 Y_{2t} + c_2 X_t + u_{3t}$$

(a) The preceding system is not simultaneous because although $Y_2$ is a function of $Y_1$, $Y_1$ is not a function of $Y_2$. Similarly, although $Y_3$ is a function of $Y_2$, $Y_2$ is not a function of $Y_3$. Thus the line of causation runs only in one rather than in both directions. Once $Y_1$ has been estimated in the first equation, $Y_1$ can be used (together with $X$) to estimate $Y_2$ in the second equation. Similarly, once $Y_2$ has been
estimated in the second equation, $Y_2$ can be used (together with $X$) to estimate $Y_3$ in the third equation. Models of this nature are recursive rather than simultaneous.

(b) In the first equation, the exogenous variable $X$ is uncorrelated with the error term $u_1$, so that OLS gives unbiased parameter estimates for the first equation. In the second equation, $Y_1$ is correlated with $u_1$ but not with $u_2$, so that OLS gives unbiased parameter estimates for the second equation. The same is true for the third equation. Thus recursive models can be estimated by the sequential application of OLS.

IDENTIFICATION

10.5 (a) What is meant by identification? (b) When is an equation of a system exactly identified? (c) Overidentified? (d) Underidentified? (e) Are these rules sufficient for identification?

(a) Identification refers to the possibility or impossibility of obtaining the structural parameters of a simultaneous-equations system from the reduced-form parameters. An equation of a system can be exactly identified, overidentified, or underidentified. The system as a whole is exactly identified if all its equations are exactly identified.

(b) An equation of a system is exactly identified if the number of excluded exogenous variables from the equation is equal to the number of endogenous variables in the equation minus 1. For an exactly identified equation, a unique value of the structural parameters can be calculated from the reduced-form parameters.

(c) An equation of a system is overidentified if the number of excluded exogenous variables from the equation exceeds the number of endogenous variables in the equation minus 1. For an overidentified equation, more than one numerical value can be calculated for some of the structural parameters of the equation from the reduced-form parameters.
(d) An equation of a system is underidentified or unidentified if the number of excluded exogenous variables from the equation is smaller than the number of endogenous variables included in the equation minus 1. In this case, no structural parameters can be calculated from the reduced-form parameters.

(e) The preceding rules for identification (called the order condition) are necessary but not sufficient. However, since these rules do give the correct result in most cases, they are the only ones actually used here. A sufficient condition for identification is given by the rank condition, which states that in a system of $G$ equations, any particular equation is identified if and only if it is possible to obtain one nonzero determinant of order $G - 1$ from the coefficients of the variables excluded from that particular equation but included in the other equations of the model. When this rank condition is satisfied, the order condition is automatically satisfied. However, the reverse is not true.

10.6 Given the following demand-supply model, (a) determine if the demand and/or supply is exactly identified, overidentified, or underidentified.

Demand: $Q_t = a_0 + a_1 P_t + u_{1t}$, $a_1 < 0$
Supply: $Q_t = b_0 + b_1 P_t + u_{2t}$, $b_1 > 0$

(b) What would a regression of $Q_t$ on $P_t$ indicate?

(a) Since this demand-supply model does not include any exogenous variable, both the demand and supply functions are underidentified. In this case, there are no reduced-form equations, and no structural parameters can be calculated. Each price-quantity observation represents the equilibrium quantity bought and sold at the given price and corresponds to the intersection of an (unknown) demand and supply curve.

(b) Regressing $Q_t$ on $P_t$ gives neither a demand curve nor a supply curve, but rather a hybrid of demand and supply, which should be referred to simply as a regression.

10.7 With reference to the demand-supply model in Prob. 10.2, (a) determine if the demand and/or supply function is exactly identified, overidentified, or underidentified.
(b) Give a graphical interpretation of your answer to part a. (c) Derive the formula for the structural coefficients from the reduced-form coefficients.

[Fig. 10-1: panels (a) and (b); image not recoverable]

(a) The demand function is underidentified because it does not exclude any exogenous variable. However, since there is one excluded exogenous variable from the supply equation (that is, $Y$) and two included endogenous variables (that is, $Q$ and $P$), the supply function is exactly identified.

(b) Changes in $Y$ cause shifts in the demand curve, thus tracing the supply curve. Figure 10-1a shows a hypothetical scatter of points resulting from changes in $Y$ and the error terms, while Fig. 10-1b shows the resulting supply curve that could be generated.

(c) Unique values of the structural coefficients of the supply equation (the exactly identified equation) can be calculated from the reduced-form coefficients in Prob. 10.3 as follows:

$$b_1 = \frac{\pi_1}{\pi_3} \qquad b_0 = \pi_0 - b_1 \pi_2$$

The formula for the structural coefficients of the demand function cannot be derived from the reduced-form coefficients because the demand function in this model is underidentified.

10.8 With reference to the demand-supply model given below, (a) determine if the demand and/or supply functions are exactly identified, overidentified, or underidentified. (b) Find the reduced-form equations. (c) Derive the formula for the structural parameters.

Demand: $Q_t = a_0 + a_1 P_t + a_2 Y_t + u_{1t}$, $a_1 < 0$, $a_2 > 0$
Supply: $Q_t = b_0 + b_1 P_t + b_2 T + u_{2t}$, $b_1 > 0$ [sign restriction on $b_2$ illegible]

where $T$ = trend.

(a) The supply equation is exactly identified (as in Prob. 10.7) because it excludes one exogenous variable ($Y$) and includes two endogenous variables ($P$ and $Q$). The demand equation is now also exactly identified because it excludes one exogenous variable ($T$) and includes two endogenous variables ($P$ and $Q$).

(b) The reduced-form equations can be obtained as in Prob. 10.3(a):
$$Q_t = \frac{a_0 b_1 - a_1 b_0}{b_1 - a_1} + \frac{a_2 b_1}{b_1 - a_1} Y_t - \frac{a_1 b_2}{b_1 - a_1} T + \frac{b_1 u_{1t} - a_1 u_{2t}}{b_1 - a_1}$$
$$P_t = \frac{a_0 - b_0}{b_1 - a_1} + \frac{a_2}{b_1 - a_1} Y_t - \frac{b_2}{b_1 - a_1} T + \frac{u_{1t} - u_{2t}}{b_1 - a_1}$$

or

$$Q_t = \pi_0 + \pi_1 Y_t + \pi_2 T + v_{1t}$$
$$P_t = \pi_3 + \pi_4 Y_t + \pi_5 T + v_{2t}$$

(c) From these definitions, $a_1 = \pi_2/\pi_5$, $a_2 = \pi_1 - a_1\pi_4$, and $a_0 = \pi_0 - a_1\pi_3$ for the demand equation, and $b_1 = \pi_1/\pi_4$, $b_2 = \pi_2 - b_1\pi_5$, and $b_0 = \pi_0 - b_1\pi_3$ for the supply equation.

10.9 With reference to the demand-supply model given below, (a) determine if the demand and/or supply equation is exactly identified, overidentified, or underidentified. (b) Calculate the structural slope parameters.

Demand: $Q_t = a_0 + a_1 P_t + a_2 Y_t + a_3 W_t + u_{1t}$
Supply: $Q_t = b_0 + b_1 P_t + u_{2t}$

where $W_t$ is wealth and the expectation is that $a_3 > 0$.

(a) The demand equation is underidentified because it does not exclude any exogenous variable. However, since there are two excluded exogenous variables from the supply equation (i.e., $Y$ and $W$) and two included endogenous variables (i.e., $Q$ and $P$), the supply function is overidentified.

(b) In order to calculate the structural slope parameters, the reduced-form equations must be found. They are obtained as in Prob. 10.3(a) and are

$$Q_t = \pi_0 + \pi_1 Y_t + \pi_2 W_t + v_{1t}$$
$$P_t = \pi_3 + \pi_4 Y_t + \pi_5 W_t + v_{2t}$$

where $\pi_1 = a_2 b_1/(b_1 - a_1)$, $\pi_2 = a_3 b_1/(b_1 - a_1)$, $\pi_4 = a_2/(b_1 - a_1)$, and $\pi_5 = a_3/(b_1 - a_1)$. The value of $b_1$ can be calculated from

$$b_1 = \frac{\pi_1}{\pi_4} \quad \text{or} \quad b_1 = \frac{\pi_2}{\pi_5}$$

These two estimates of $b_1$ will generally be different, reflecting the fact that the supply equation is now overidentified. As in Prob. 10.7(c), the structural coefficients of the demand function cannot be calculated from the reduced-form coefficients because the demand function in this model is underidentified.

ESTIMATION: INDIRECT LEAST SQUARES

10.10 (a) When can indirect least squares be used? (b) What does it involve? (c) What are some of the shortcomings of using indirect least squares?

(a) Indirect least squares (ILS) is a method of calculating consistent structural parameter values for the exactly identified equations in a system of simultaneous equations.

(b) ILS involves using OLS to estimate the reduced-form equations of the system and then using the estimated reduced-form parameters to calculate unique and consistent structural parameter estimates as indicated in Probs.
10.7(c), 10.8(c), and 10.5.

(c) One disadvantage of using ILS is that it does not give the standard error of the calculated structural parameters, and it is rather complicated (and beyond the scope of this book) to calculate them. Another disadvantage of ILS is that it cannot be used to calculate unique and consistent structural parameter estimates from the reduced-form coefficients for the overidentified equations of a simultaneous-equations model.

10.11 Table 10.2 gives the index of crop output $Q$ (indexed to 1992), crop prices $P$ (an index; base period illegible), and disposable income per capita $Y$ (in 1996 dollars) in the United States from 1975 to 1996. Assume that the market is cleared in every year so that $Q_t$ represents both the quantity bought and sold in year $t$. (a) Estimate by OLS the reduced-form equations given in Prob. 10.3(a). (b) Calculate the supply structural parameters from the reduced-form coefficients. (c) How do these compare with the structural parameters obtained by regressing $Q_t$ on $P_t$ directly?

Table 10.2 Index of Crop Output, Prices, and Disposable Income per Capita in 1996 Dollars: United States, 1975-1996
[table values illegible]
Source: Economic Report of the President, 2000.

(a) The estimated reduced-form equations [from Prob. 10.3(a)] are

$$\hat{Q}_t = 14.2802 + [\text{illegible}]\,Y_t$$
$$\hat{P}_t = 54.1671 + [\text{illegible}]\,Y_t$$

[t statistics and $R^2$ values illegible]

(b) $\hat{b}_1 = \hat{\pi}_1/\hat{\pi}_3 = 1.5000$ [see Prob. 10.7(c)] and $\hat{b}_0 = \hat{\pi}_0 - \hat{b}_1\hat{\pi}_2 = 14.2802 - 1.5000(54.1671) = -66.9705$, where $\hat{b}_0$ and $\hat{b}_1$ are consistent estimators of $b_0$ and $b_1$, respectively. Thus (estimated by ILS) the structural supply equation is

$$\hat{Q}_t = -66.9705 + 1.5000 P_t$$

(c) Regressing $Q_t$ on $P_t$ directly, we get

$$\hat{Q}_t = 38.1584 + 0.5105 P_t$$

[t statistics and $R^2$ illegible]
The values of $\hat{b}_0$ and $\hat{b}_1$ obtained by regressing $Q_t$ on $P_t$ are biased and inconsistent estimates of the supply parameters.

10.12 With reference to the demand-supply model of Prob. 10.8 and using the data in Table 10.2 and trend values $T = 1, 2, 3, \ldots$, (a) calculate consistent structural parameters for the demand equation. (b) How do these compare with the structural parameters obtained by estimating the demand equation directly by OLS?

(a) Since the demand equation is exactly identified [see Prob. 10.8(a)], we can use ILS to obtain consistent demand structural-parameter values. The estimated reduced-form equations [from Prob. 10.8(b)] are

$$\hat{Q}_t = [\text{illegible}] - 0.0024 Y_t + 2.2520 T$$
$$\hat{P}_t = [\text{illegible}] - 0.0087 Y_t + 4.0079 T$$

[t statistics and $R^2$ values illegible]

so that $\hat{a}_1 = \hat{\pi}_2/\hat{\pi}_5 = 2.2520/4.0079 = 0.5619$ and $\hat{a}_2 = \hat{\pi}_1 - \hat{a}_1\hat{\pi}_4 = -0.0024 - 0.5619(-0.0087) = 0.0025$; $\hat{a}_0 = \hat{\pi}_0 - \hat{a}_1\hat{\pi}_3$ [value illegible]. Thus the demand equation estimated by ILS (and showing consistent parameter estimates) is

$$\hat{Q}_t = [\text{illegible}] + 0.5619 P_t + 0.0025 Y_t$$

(b) The OLS estimation of the demand function is

[estimated equation illegible]

The values of $\hat{a}_0$, $\hat{a}_1$, and $\hat{a}_2$ estimated by OLS are biased and inconsistent. Indeed, $\hat{a}_1$ is less than 30% of the ILS estimate, and $\hat{a}_2$ even has the wrong sign (but is not statistically significant).

ESTIMATION: TWO-STAGE LEAST SQUARES

10.13 (a) When can 2SLS be used? (b) What does it involve? (c) What are the advantages of 2SLS with respect to ILS?

(a) Two-stage least squares (2SLS) is a method of estimating consistent structural parameter values for the exactly identified or overidentified equations of a simultaneous-equations system. For exactly identified equations, 2SLS gives the same result as ILS.

(b) 2SLS estimation involves the application of OLS in two stages.
In the first stage, each endogenous variable is regressed on all the predetermined variables of the system. These are now the reduced-form equations. In the second stage, the predicted rather than the actual values of the endogenous variables are used to estimate the structural equations of the model. The predicted values of the endogenous variables are obtained by substituting the observed values of the exogenous variables into the reduced-form equations. The predicted values of the endogenous variables are uncorrelated with the error term, leading to consistent 2SLS structural-parameter estimates.

(c) One advantage of 2SLS over ILS is that 2SLS can be used to obtain consistent structural-parameter estimates for the overidentified as well as for the exactly identified equations in a system of simultaneous equations. Another important advantage is that 2SLS (but not ILS) gives the standard errors of the estimated structural parameters.

[intervening text and table values illegible]
Source: Economic Report of the President, 2000.

(a) Since the supply equation in Prob. 10.9 is overidentified, 2SLS is an appropriate estimating technique to obtain consistent structural parameters. The first stage is

[estimated equation illegible]

[second-stage results illegible]

(b) The (inappropriate) OLS estimation of the supply equation is

[estimated equation illegible]

Supplementary Problems

SIMULTANEOUS-EQUATIONS MODELS

10.16 The following two equations represent a simple wage-price model:

$$W_t = a_0 + a_1 P_t + a_2 Q_t + u_{1t}$$
$$P_t = b_0 + b_1 W_t + u_{2t}$$

where $W_t$ is the wage in time period $t$, $P$ represents prices, and $Q$ is productivity. (a) Why is this a simultaneous-equations model? (b) Which are the endogenous and exogenous variables?
(c) Why would the estimation of the $W$ and $P$ equations by OLS give biased and inconsistent parameter estimates?

Ans. (a) This two-equation model is simultaneous in nature because $W = f(P)$ and $P = f(W)$; thus $W$ and $P$ are jointly determined. (b) The endogenous variables are $W$ and $P$. The exogenous variable is $Q$. (c) The estimation of the $W$ function by OLS gives biased and inconsistent parameter estimates because $P$ is correlated with $u_1$. Similarly, estimating the second, or $P$, equation by OLS also gives biased and inconsistent parameter estimates because $W$ and $u_2$ are correlated.

10.17 (a) Find the reduced-form equations for the model in Prob. 10.16. (b) Why are they important? (c) What do the reduced-form coefficients measure in this macro model?

Ans. (a) [reduced-form equations illegible] (b) The reduced-form equations are important because they express each endogenous variable of the model as a function of the exogenous variable(s) only, so that OLS gives consistent parameter estimates. (c) The reduced-form parameters give the total direct and indirect effects of a change in any exogenous variable of the model on each endogenous variable of the model.

10.18 (a) What type of model is the following? (b) How can the equations of this model be estimated?

$$Y_{1t} = a_0 + a_1 X_{1t} + u_{1t}$$
$$Y_{2t} = b_0 + b_1 Y_{1t} + b_2 X_{2t} + u_{2t}$$
$$Y_{3t} = c_0 + c_1 Y_{1t} + c_2 Y_{2t} + c_3 X_{3t} + u_{3t}$$

Ans. (a) The model is recursive. (b) The equations of the model can be estimated by applying OLS sequentially, starting with the first equation.

IDENTIFICATION

10.19 If the simple macroeconomic model in Prob. 10.16 did not include the variable $Q_t$, (a) would the first equation be exactly identified, overidentified, or underidentified? (b) What about the second equation?

Ans. (a) The first equation would be underidentified. (b) The second equation also would be underidentified.

10.20 For the macro model in Prob. 10.16, determine (a) if the first equation is exactly identified, overidentified, or underidentified. (b) What about the second equation? (c) What are the values of the structural parameters?
Ans. (a) The first equation is underidentified. (b) The second equation is exactly identified. (c) [formulas for $b_1$ and $b_0$ in terms of the reduced-form coefficients illegible]; $a_0$, $a_1$, and $a_2$ cannot be calculated from the reduced-form coefficients because the $W$ equation is underidentified.

10.21 If the second equation of the macro model in Prob. 10.16 included the additional variable $Y$ (GNP), (a) determine if the $W$ and/or $P$ equations are exactly identified, overidentified, or underidentified. (b) Find the reduced-form equations. (c) Derive the formula for the structural parameters.

Ans. (a) Both the first, or $W$, equation and the second, or $P$, equation are now exactly identified. (b) Each endogenous variable is expressed as a linear function of $Q_t$ and $Y_t$ plus an error term [coefficient expressions illegible]. (c) [formulas illegible]

10.22 If the first equation in Prob. 10.16 included the additional variable $P_{t-1}$ (price lagged 1 year), (a) would the equations be exactly identified, overidentified, or underidentified? (b) What is the value of the structural slope parameters?

Ans. (a) The first, or $W$, equation is underidentified, while the second, or $P$, equation is overidentified. (b) $b_1$ can be calculated from either of two ratios of reduced-form coefficients [ratios illegible], reflecting the fact that the $P$ equation is now overidentified; $a_1$, $a_2$, and $a_3$ cannot be calculated because the $W$ equation is underidentified.

ESTIMATION: INDIRECT LEAST SQUARES

10.23 Table 10.4 gives an index of hourly earnings $W$, consumer prices $P$, output per hour in nonfarm businesses $Q$, and GDP in billions of dollars $Y$ in the United States from [year illegible] to 1999. (a) Estimate the reduced-form equations of Prob. 10.17(a). (b) Calculate the structural coefficients of the $P$ equation from the reduced-form coefficients. (c) How do these compare with the structural parameters obtained by regressing $P$ on $W$ directly?

Table 10.4 Earnings, Price Index, Productivity, [remainder of caption illegible]
[table values illegible]
Source: St. Louis Federal Reserve (Bureau of Labor Statistics ([illegible] $Q$ values), Bureau of Economic Analysis ($Y$ values)).

Ans.
(a) [first reduced-form equation illegible], $R^2 = 0.88$; [second reduced-form equation illegible], $R^2 = 0.96$ (b) $\hat{b}_1 = 1.2845$; $\hat{b}_0 = 16.8711$ (c) By OLS, $\hat{b}_1 = 1.2580$ and $\hat{b}_0 = 15.9256$.

10.24 For the model in Prob. 10.21, (a) estimate the reduced-form equations, and (b) calculate the structural coefficients of the $W$ equation from the reduced-form coefficients. (c) How do these compare with the structural coefficients of the $W$ equation obtained by OLS?

Ans. (a) [reduced-form estimates illegible; the second equation has $R^2 = 0.98$] (b) $\hat{a}_0 = 86.6837$, $\hat{a}_1 = 0.4675$, and $\hat{a}_2 = 0.9007$ (c) By OLS, $\hat{a}_0 = 54.2209$, $\hat{a}_1 = 0.4810$, and $\hat{a}_2 = 0.8539$.

10.25 For the model in Prob. 10.21, write the structural equation for the $P$ equation estimated by (a) ILS and (b) OLS.

Ans. (a) [estimated equation partly illegible; intercept 108.7575] (b) [estimated equation partly illegible]

TWO-STAGE LEAST SQUARES

10.26 For the model in Prob. 10.21 and using the data in Table 10.4 to estimate the $W$ equation, (a) show the first-stage results of 2SLS estimation, and (b) show the second-stage results of 2SLS estimation. (c) How do these results compare with ILS estimation of the $W$ equation found in Prob. 10.24?

Ans. (a) [first-stage equation illegible], $R^2 = 0.98$ (b) $\hat{W}_t = 86.32 + 0.468\hat{P}_t + 0.900 Q_t$, $R^2 = 0.99$ (c) They are identical (there is a slight difference due to rounding); we also get the standard errors. The structural parameters estimated by 2SLS and ILS are consistent.

10.27 For the model in Prob. 10.22 and the data in Table 10.4, estimate the $P$ equation by (a) 2SLS and (b) OLS.

Ans. (a) $\hat{P}_t = 16.07 + 1.28\hat{W}_t$, with t statistics (3.26) and (38.63) (b) $\hat{P}_t = 15.93 + 1.25 W_t$, $R^2 = 0.99$ [first t statistic illegible; second is (48.03)]

CHAPTER 11

Time-Series Methods

11.1 ARMA

In Sec. 9.3, we discussed the problem of first-order autocorrelation in time series. Often, variables are exploited solely for their time-series properties to achieve forecasts. These forecasts are not based on a theoretical model, but use past movements to predict future movements. High-frequency data (monthly, daily, etc.)
can follow complex time-series processes that will change the appropriate method of estimation. There are two main types of correlation:

1. Autoregressive of order p [AR(p)]:
$$Y_t = \gamma_1 Y_{t-1} + \gamma_2 Y_{t-2} + \cdots + \gamma_p Y_{t-p} + \varepsilon_t$$
2. Moving average of order q [MA(q)]:
$$Y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$

Combining the two yields the ARMA(p, q) representation
$$Y_t = \gamma_1 Y_{t-1} + \cdots + \gamma_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$
Estimation of AR(p) involves only a lag-dependent variable and can be carried out with OLS for large samples. Inclusion of the moving-average process yields nonlinear equations that can be estimated by computer as shown in Chap. 12.

EXAMPLE 1. Using the observations of $\varepsilon$ in Table 11.1, we generate AR(1) ($\gamma_1 = 0.8$), MA(1) ($\theta_1 = 0.8$), and ARMA(1,1) ($\gamma_1 = \theta_1 = 0.8$) series and graph the results in Fig. 11-1 (with initial values set to 0 to start the processes). As can be seen in Table 11.1 and in Fig. 11-1, the original series, $\varepsilon$, fluctuates around its mean (0). The AR(1) process moves around 0 but retains part of the past values and does not revert back to 0 as quickly. The MA(1) process retains some memory of past values, but only for 1 period, and thus moves away from past values more quickly. The ARMA(1,1) process has some qualities of both AR(1) and MA(1).

Table 11.1 ($\varepsilon_t$ and the generated AR(1), MA(1), and ARMA(1,1) series)

Fig. 11-1 Time-Series Processes

11.2 IDENTIFYING ARMA

An AR process can be distinguished from an MA process by its persistence. Since autoregression is an iterative process, values of the random error fade away slowly as each period feeds into the next. The MA process is correlation of only the random component, so after q periods the random error is no longer in the system.

The persistence of error terms can be examined through the autocorrelation function
$$\mathrm{ACF}_s = \frac{\mathrm{cov}(Y_t, Y_{t-s})}{\mathrm{var}(Y_t)}$$
and the partial autocorrelation function ($\mathrm{PACF}_s$), which is the coefficient on $Y_{t-s}$ in the regression
$$Y_t = \gamma_1 Y_{t-1} + \gamma_2 Y_{t-2} + \cdots + \gamma_s Y_{t-s} + \varepsilon_t$$
Once the degree of correlation is narrowed down, multiple possibilities can be estimated. One way to choose the best specification is to take the one which minimizes Akaike's information criterion (AIC)
$$\mathrm{AIC} = \ln\left(\frac{\mathrm{ESS}}{T}\right) + \frac{2j}{T}$$
where j is the number of parameters estimated and ESS is the sum of squared errors ($\sum e^2$).

To test the presence of correlations, the Box-Pierce statistic $Q = T \sum_s \mathrm{ACF}_s^2$ tests the null hypothesis that there are no correlations. Q follows the chi-square distribution with degrees of freedom equal to the highest lag calculated (usually the minimum of 40 and T/2).

EXAMPLE 2. A company is trying to aid its prediction of sales patterns by looking at the time-series properties of the past 4 years of weekly sales (208 weeks). Table 11.2 shows the ACF and PACF, and Fig. 11-2 shows the correlations plotted against the number of lags, known as a correlogram.

Analyzing the ACF, we see a large positive correlation at 4 lags, and subsequently smaller correlations at intervals of every 4 lags (8, 12, 16 lags). The persistence is consistent with an AR process. Looking at the PACF confirms this. There is a large partial correlation at 4 lags, but after accounting for this, 8, 12, and 16 lags no longer show correlation. Therefore our finding is of an AR(4) process. To see if it is significant, we calculate the Q statistic (here we use only 16 lags for simplicity):
$$Q = T \sum_{s=1}^{16} \mathrm{ACF}_s^2 = 208(0.7994) = 166.28$$
The critical value for the chi-square distribution with 16 df at the 5% level of significance is 26.3. Since 166.28 > 26.3, we reject the null hypothesis that there is no correlation; therefore the AR(4) process is statistically significant.
Table 11.2 ACF and PACF of Sales

Fig. 11-2 ACF and PACF Correlogram

11.3 NONSTATIONARY SERIES

For OLS estimation in general to be valid, the error term must be time-invariant, that is, stationary. A nonstationary series follows the form
$$Y_t = Y_{t-1} + \varepsilon_t$$
which is autoregressive with $\gamma_1 = 1$, also called a unit root, or integrated of order 1 [I(1)].

Since the entire value from the previous period is carried forward to the current period, values of the random error never fade away. The continuous buildup of the errors creates the problem that nonstationary series will tend toward an infinite variance. Furthermore, if the Y and X variables in a regression are both nonstationary, the model will have a spuriously significant result and high $R^2$ even if the two variables are unrelated. Taking first differences will eliminate the autoregressive component, and the unit root:
$$\Delta Y_t = Y_t - Y_{t-1} = \varepsilon_t$$

EXAMPLE 3. The two series in Table 11.3, Y and X, are independently generated variables containing a unit root. There should be no statistical relationship between Y and X.

Table 11.3 Unit-Root Variables and First Differences

Regressing Y on X yields
$$\hat{Y}_t = -1.16 - 0.45 X_t \qquad R^2 = 0.32$$
with a t statistic of $-2.84$ on the slope. If we ignored the unit root of Y and X, we would conclude that X has a statistically significant effect on Y at the 5% significance level. Taking the unit root into account and regressing $\Delta Y$ on $\Delta X$, we get reliable results: the estimated slope on $\Delta X_t$ is about $-0.12$, with a t statistic of only 0.53 in absolute value and $R^2 = 0.02$. Accounting for the unit root lowers the spurious t statistic of $b_1$ and the $R^2$ dramatically.

11.4 TESTING FOR UNIT ROOT

Stationary and nonstationary series can follow different patterns, many of which look similar when graphed. This makes testing for a unit root a tricky proposition.

Stationary
White noise: $Y_t = \mu + \varepsilon_t$
Autoregressive: $Y_t = \mu + \gamma_1 Y_{t-1} + \varepsilon_t$ ($|\gamma_1| < 1$)
Trend stationary: $Y_t = \mu + \beta t + \varepsilon_t$

Nonstationary
Random walk: $Y_t = Y_{t-1} + \varepsilon_t$
Random walk with drift: $Y_t = \mu + Y_{t-1} + \varepsilon_t$

To distinguish a unit root, we can run the regression
$$\Delta Y_t = b_0 + b_1 Y_{t-1} + \sum_j c_j \Delta Y_{t-j} + b_2 t + e_t$$
The regression includes enough lags of $\Delta Y_t$ so that $e_t$ contains no autocorrelation. The model may be run without t if a time trend is not necessary. If there is a unit root, differencing Y should result in a white-noise series (no correlation with $Y_{t-1}$). The augmented Dickey-Fuller (ADF) test takes a unit root as the null hypothesis and tests $H_0$: $b_1 = b_2 = 0$ if there is a trend (F test), and $H_0$: $b_1 = 0$ if there is no trend (t test). If the null is accepted, we assume that there is a unit root and difference the data before running a regression. If the null is rejected, the data are stationary and can be used without differencing. Since a unit root biases the estimation of $b_1$ downward, special tables in App. 11 are used to find the critical value for the ADF test.

EXAMPLE 4. We test Y from Example 3 for a unit root at the 5% level of significance with and without a time trend.

Without trend, the regression of $\Delta Y_t$ on $Y_{t-1}$ gives $t_{b_1} = -0.20$. Since $t_{b_1} = -0.20 > -3.33$ (from App. 11), we fail to reject the null hypothesis that there is a unit root. The correct procedure is then to take first differences of Y before using it in a regression.

With a trend, the F statistic for the joint restriction is 0.66. Since F = 0.66 < 7.24, we again find a unit root.
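The no-trend version of the test in Example 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the book's procedure: it uses simulated data rather than Table 11.3, includes an intercept but no lagged differences and no trend, and the names `df_tstat` and `CRIT` are hypothetical. The value $-3.33$ is the 5% critical value the text quotes from App. 11; the appropriate value depends on the sample size and should be read from the table.

```python
import numpy as np

def df_tstat(y):
    """t statistic on b1 in dY_t = b0 + b1*Y_{t-1} + e_t (no lagged differences, no trend)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                      # dependent variable: first differences
    ylag = y[:-1]                        # regressor: lagged level Y_{t-1}
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    dof = len(dy) - X.shape[1]           # observations minus estimated parameters
    sigma2 = resid @ resid / dof         # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
eps = rng.standard_normal(500)
white_noise = eps                        # stationary by construction
random_walk = np.cumsum(eps)             # Y_t = Y_{t-1} + eps_t, a unit-root series

CRIT = -3.33                             # 5% value quoted in the text from App. 11
for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    t = df_tstat(series)
    verdict = "reject unit root" if t < CRIT else "fail to reject unit root"
    print(f"{name}: t_b1 = {t:.2f} -> {verdict}")
```

For the white-noise series the slope on the lagged level is close to $-1$ and the t statistic is far below the critical value, so the unit-root null is rejected. For the random walk the statistic varies from sample to sample, which is why the adjusted Dickey-Fuller critical values rather than the usual t table are needed.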
11.5 COINTEGRATION AND ERROR CORRECTION

For a series which has a unit root, the best forecast of the next period's value is the current period's value. In some cases, even though two series have a unit root and follow a random walk individually, they move together in the long run. If $Y_t = Y_{t-1} + \varepsilon_{Y,t}$ and $X_t = X_{t-1} + \varepsilon_{X,t}$, we see that Y and X have a unit root. If there is no unit root in the error term from the regression $Y_t = b_0 + b_1 X_t + e_t$, then Y and X are cointegrated.

If Y and X are cointegrated, then it is not enough to simply difference the variables to run a regression. One must also take into account the long-run relationship between the variables. When Y is above the level indicated by X, we would expect Y to fall, and vice versa. Therefore the deviations from the long-run relationship should be included as an explanatory variable in an error-correction model. First the long-run relationship is estimated:
$$e_t = Y_t - \hat{b}_0 - \hat{b}_1 X_t$$
where $e_t$ are the deviations from the long-run relationship. Next, these deviations are included as an additional variable:
$$\Delta Y_t = c_0 + c_1 \Delta X_t + c_2 e_{t-1} + u_t$$
Since all variables in the error-correction model are stationary, OLS may be used.

EXAMPLE 5. A potential investor wishes to model consumption in Korea. Table 11.4 reports log of consumption Y and log of GDP X in Korea from 1953 to 1991 (both measured in 1985 international prices).

Table 11.4 Log of Consumption and GDP in 1985 International Prices in Korea, 1953-1991
Source: Penn World Tables

To ensure the validity of the results, we first test each series for a unit root:
$$\Delta Y_t = 0.03 + 0.01 Y_{t-1} \qquad R^2 = 0.04$$
$$\Delta X_t = 0.02 + 0.01 X_{t-1} \qquad R^2 = 0.08 \quad (t = 1.78)$$
Both accept the null of a unit root. To test that first differences are stationary:
$$\Delta\Delta Y_t = 0.06 - 0.90\,\Delta Y_{t-1} \qquad R^2 = 0.45 \quad (t = -5.36)$$
$$\Delta\Delta X_t = 0.06 - 0.78\,\Delta X_{t-1} \qquad (t = -4.38)$$
Both $\Delta Y$ and $\Delta X$ reject the null of a unit root. This establishes that Y and X both have unit roots; we now test for a long-run relationship (i.e., cointegration). Estimating residuals of the long-run relationship, we obtain
$$e_t = Y_t - 0.13 - 0.88 X_t$$
The unit-root test of $e_t$ yields
$$\Delta e_t = 0.002 - 0.83\, e_{t-1}$$
Since we can reject the null of a unit root for $e_t$ at the 5% level of significance, we conclude that Y and X are cointegrated. Therefore the correct model of consumption and GDP is an error-correction model. In the estimated model, the coefficient on $\Delta X_t$ is 0.73 and the coefficient on $e_{t-1}$ is $-0.55$, with $R^2 = 0.76$.

The results reveal that for a 1% increase in income there is a 0.73% increase in consumption (note that this is a double-log model). The negative coefficient on $e_{t-1}$ indicates that if consumption is above its long-run relationship with GDP, it will decrease to return to equilibrium.

11.6 CAUSALITY

The usual OLS model only identifies the correlation between variables; it does not help in determining the direction of the relationship. While causality is an elusive concept that can never be proved with certainty, time-series econometrics can help sort out the timing issues. If changes in X precede changes in Y, we can rule out Y causing X. Using this logic, we can estimate the regression
$$Y_t = b_0 + \sum_j b_j Y_{t-j} + \sum_j c_j X_{t-j} + u_t$$
If past values of X help determine current values of Y, we say X Granger-causes Y. The test of $H_0$: $c_1 = c_2 = \cdots = 0$ can be carried out with an F test. The number of lags may be chosen using the AIC or adjusted $R^2$, or one may include the highest feasible number of lags. To calculate the magnitude of causality, $\sum c_j$ represents the short-run effect of X. Since there is a feedback effect from lags of Y in the long run, the long-run effect is $\sum c_j / (1 - \sum b_j)$.

EXAMPLE 6. Using the data in Table 11.4, we want to test whether either consumption or GDP leads the other. Since the two series are cointegrated, the correct procedure is Granger causality in an error-correction model. We use one lag of the variables, so a t test can be used to test for Granger causality.

In the equation for $\Delta Y_t$, the coefficient on $\Delta X_{t-1}$ is not significant at the 5% level, so we conclude that X does not Granger-cause Y. We then test for reverse causality. In the equation for $\Delta X_t$, the coefficient on $\Delta Y_{t-1}$ is not significant at the 5% level of significance, so we conclude that Y does not Granger-cause X. Therefore neither variable leads the other, and we conclude that the relationship between Y and X is contemporaneous.

Solved Problems

ARMA

11.1 (a) Explain the difference between an autoregressive and a moving-average process. (b) Why are AR and MA processes referred to as stationary processes?

(a) Autoregression is a process in which a proportion of $y_t$ is carried forward to the next period, then a proportion of $y_{t+1}$ is carried forward to $y_{t+2}$, etc. Since $\varepsilon_t$ is a component of $y_t$, which is carried forward, we say that autoregression is long-lasting. A high observation at time t will be carried forward indefinitely in smaller and smaller proportions. The moving-average process, on the other hand, carries forward only $\varepsilon_t$, the random component of $y_t$, so previous observations are not perpetuated.

(b) Both the AR process (as long as $|\gamma_1| < 1$) and the MA process eventually revert to their initial means after a positive or negative shock. In the AR process, the shock eventually dies out. In the MA process the shock leaves after a number of periods greater than the number of lags in the MA process. Since both of these processes stay around their means, they are stationary.

11.2 Show algebraically that (a) an AR(1) process is equivalent to an MA($\infty$) process and (b) an MA(1) process is equivalent to an AR($\infty$) process.

(a) An AR(1) process is defined as
$$y_t = \gamma_1 y_{t-1} + \varepsilon_t$$
Extending this process to $y_{t-1}$ gives
$$y_{t-1} = \gamma_1 y_{t-2} + \varepsilon_{t-1}$$
Substituting into the equation for $y_t$ yields
$$y_t = \gamma_1(\gamma_1 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \gamma_1^2 y_{t-2} + \gamma_1 \varepsilon_{t-1} + \varepsilon_t$$
Similarly, substituting for $y_{t-2}$, we obtain
$$y_t = \gamma_1^3 y_{t-3} + \gamma_1^2 \varepsilon_{t-2} + \gamma_1 \varepsilon_{t-1} + \varepsilon_t$$
Recursively substituting,
$$y_t = \gamma_1^s y_{t-s} + \gamma_1^{s-1} \varepsilon_{t-s+1} + \cdots + \gamma_1 \varepsilon_{t-1} + \varepsilon_t$$
As can be seen, an AR(1) process contains some part of each previous error term. Since $\gamma_1$ is a fraction, errors farther away are reflected in smaller proportions.
Also, the preceding equation i equivalent 10 fan MAE) process (as 1+ 26) with ~# =f from 5= 1 tor (@) Starting with the MA(1) process and performing a similar manipalation as in part a, we obtain Solving For 4 ‘Subtituting into the equation form. ey te ing foe FEA HA HOB, Recursively substituting for ss FeO OY erg OE He “This inequivalent to.am ARE(2=) process tas # + 28) with 1 = fore tet For the randomly generated error terms in Table 11.5, calculate (a) AR). 94 = 0.5; (8) ARUL y= <0: te) ARC (GARI = 40.35. te) When would one see positive and negative correlations? Table 11,8 Randomly Generated, Standard Normal Distributed Variable Y f 2 3 4 5 6 7 8 9 10 © | nassa | 0278) -o27ia) 2.3897] —1.7548] aor |—osree] ose | 0.7s7e | o.reee: ci nfaeflefuflefle« fo le lw [= «© | 002s J-n29rs) 20288) 0.3801] 0.2191] a.s7o1 |~0.4038] -0.2615| 0.2086 | 0.681 “The calculations for theirs four parts are given in Table 11.6, ‘To camry out the calculations for an AR, process, we Will use a5 an example the calculations fr partw since parts, cand d use the sane method with ‘nly a change of y4. The fortnala for an AR process Ie nennate (for part a, yy ——th5). Staring at ¢— 1 y, — L489 (We assume here that #y — Oo get a starting value Another commonly wsed method to deal witha starting value is to delete the first period after generating the series ince it had no Bag associated with sha yy $0,511 $885) 40.270 = 0.4733 yy = 0.504733) — 0.2714 = 0.0847 vem 0.4 GOLMTs) 22687— 2a4KIaS te (e) Nove that the astorepressivesexies with negative correlation moves around zero to the opposite direc tion of the previouctalue, Natural phenomena that ean eshibit negative corralationeare evershoating, CHAP. 
14) TIME-SERICS METHODS 251 ‘Table 11.6 Autoregressive Sees j@arm | ara) | @ara | a aR ' © no05 | abd t | n=as t Tas Laisa asa 2 “04733 oas7 norst 3 00387 aed | 02361 4 23463 2.3866 2456 5 ~O.5816 1994 6 oes A185 246 1 “0470 0.330 | 1.007 8 o.ss3s alsa au3ir ° 0.3188 ost0r aute 0 0.6283 8698 1.0984 Ww 92910 or) 6223 2 “#1519 wares | aan is ae L901 2316 uw 06922 ossr7 1am Is 0.4270 mates | aaeTs. 16 0.5068 05537 8040 W 065m | 0.4631 | =a3434 | 0.0017 is | wrens nsm | —arrst | -aam | nmr | 2086 ou720 az Ts ont 2 | asset o.enan ness 07086 ost smoothing, and saree resources, Series with postive correlation mows in the same dinection at the previous Yalucs, Examples of positive comelations are herding, larning, and spillovers 114 For the randomly generated error terms in Prob, 11,3, cakulate (a) MA(I),@) = 05; (6) A(T, A= 0.1; 4) MAU), = 0.1; (d) MAUI 8, = 05. Caleuiations for parts though dare fisted in Table 11.7. To carry out the calculations for an MA ross na ills won ceample the calulszons for port aig pare has and d won tho amo method with Only a hang off. The forma For am MA. proces is wee (for part a, Ay =05) Starting at = 1 yy = LARS (We ascomne apain that ¢y = Ma 26t a starting value ¥ yr = 0.2709 0.54884) = 0.8733 ys = “0.2714 — 0,5(0.2708) = ~0.40685 2280 3637 — 05-02 IDENTIFVING ARMA TLS Compare the ACF for Prob. 11.X(d).y4; Prob, 11.4(d), yx: and the random error of Prob. | Caleulate the ACF upto Four lags. Table 11.8 giver the-variablec and the fret Lage. The estimated covariances of the lags aro 3. ‘Table 11.7 Moving Average Series ta MAG) | (MACY | «2 MAG) | Wt MAG) ‘ © a=os | gaat | o=-0 | y= as T Tass | 1a Tass Tae Tae > | arm | aan aa | Lansi 3 | -oama | -asoess 0.2441 | 0.13895 s | -Lrss igi? | 293665 6 | amar | oase orgs | =na6s2 7 | -oaisa | -a32s5 -o.n6%8 | -a3113 8 | gem | oso6s osisz6 | 048% ° nas | osmsa | 0.822: basis wo | a7 | oor? 
e7ias2 | oseas | Liss ny oust ans | 0.10076 12 | 02875 | sa3o0ns | 0.29981 19 ia | zoms | zursss | zasiss | tons te | aaa | ass o.issea | o.seose is | -azier | -ase1s | -o2si1 | -o4sn9 is | asrr | asrss | osmm | osisis vr | -oanss | aos 1.34079 Is | 02613 | -a.05n6 0.30188 wo | azas6 | assers | o2m7s | armas mo | assar | o.sas3 oxerst | 0.70866 Table 11.8 Variables and First Lags r " by on Pus ' wu Lae Lost Lassa | Lage 0.13585 tors | 10151 2404 eas | 0.13305 3665 22436 4908 0.8632 -2a7%6 | ~2.93665 93113 -1azis | 0.8632 wary -iwsn | —osits Lag 35 ean? | 0.87% L168 oss | 108135 oto Lives | Less 0.29595 ps3 | patos RTH nai | —a 2x05 Lams 20316 | Ls760s -onans iam | Lame ons nas7s | 0.04003 8.11875 808 on ss 0.4634 enar7 | ~0.11875 o.0788s 0263 | —o.a6s4 0.7909 nares | 0074s 1406931 CHAP. 14) ME-SERICS METHODS 20. 6 corti Fae 1) = M8001 Gordy Ye 2} = 0.336492 conse rye ab= —DIISISS covlyin rs 2d = -OATITSL ordre Pay) = ARZBSTE Cardy ye -2) = GOIERBI conkyge yy g)= BIOS eovl yy, ye 4) = 0.17 OMe Pay) = ABOTO2 COM ead = DOMED coe Pad = —DIREBDS COM Iyea = 0.299672 S00 Sia) ACF, For the fret we 98001 1/1 dao = 0.6709 ACE, w 0,336192/1 460805 = 02303 ACK; = =0.128358/1 460805 = 0.0879 ACK, = 0471731, 460808 = 0.3028 ‘Thecarrelation is high forthe first lag, dosines forthe second, buts sil positive, and then isclose to zero at the third lag, indicating an AR process, For the second series: 285741 40081 =H, 5880 (056333/1 406831 = an400 0.3081 1471.06 = 02190 0. 517564y/1 206991 = ~0.3679 (Correlation is high for the first Ing, and then close 19 zero For the sscond, indicating an MA process Fr the 6.307029/0,899516 = 03413, =0.034027/0 999516 — —H0381 0,1 34803/0.890516 = 0.1300 ACE, = —0.299672/0 899916 = 0.3981 All correlations are relatively low, indicating white noise Calculate the @ st or the first series: © Far the sera series: Q= FYLACE = 290.5880 + 0.0H00" + (021907 + 0.16797] = 20.5817) = 106 istic for the three series in Prob. 11.5 up to four Tags. LACE! 
= 200.0708" 40.250? + (00878) + (0.0590 For the thd sree: o=rDack! {he critical value for the eni-aquare cistribution with four degrees of teecom 16 YA with a 9% Level af significanos, For the frst 180 series, > 9.49; therefore we reject the mull hypothesis tha there isn time sien wun latin GH = 94%; therefone we ave the mall hypothesis Ua tix white noise 2040.3483" + (0.0381)? + (0.1500)? +4-0.3331))] = 2000-2514) = S05 For the AR(I) scrics in Prob. IL3{d), use the AIC to test between (a) white noise (no correlation), (GV ARCH), (c) AR(2.and_ ta) ARG). Since the ARE process simply involves a Iag-dependent-variable, we use OLS toestimate the four possible rmodale. For the foar modal the etimation gielde 24 ‘TIME-SERIES METHODS fewap. ur te) Rena bss =2922 AIC i eaibed # pF =O0S4 + 06448, (wy 6.65) mn(1535) 22), (Se) pom te) ¥, = 0.0380 + 0.9995y,_, — O.5714y,_, in Gm Le sce (ESS) Zam) 22 ois w Fe BOSE 4 0.86%, OABTy, — O0EISy,, woos 2) GIy (0.8) Os ESS = 1302 anc = nF) +2 ta) +52 = 9209 Since the AIC & at its ninimum for the eiodel in part B, we choose AR(1) a8 the appropriate specification ovc that foreach additional lag, there i one Fewer observation. An alternative method for model slestion is tomake the sample consistent for each model (,e,, 17 observations foreach) so that the same data are weed for each specication NONSTATIONARY SERIES 13 iy What are the problems of a nonstationary series? (by What types af variables are likely 10 be stationary? (ae) A-ponstationary series invalidates the standard statistical tests because it has a ti-varving wartanee. ‘Without a specified variance, test statistics cannot be standardized, Also, nonstationary setiss tend to show a statistically significant sprivas correlation when repressed even if they ane independent (@) Variables quoted in levels rather than growth rates tend to possess a unit root since their nextsperiod value fa function of their current vale plus wzowth. 
Since the fll current wale carries forward in the sock, tis nonstationary, Algebraieally show that the variance of a unit eaot series increases with time. ‘The function of a unit root series is y, Nate ‘Tracing thie arts from ie initial wale yield nen aed Yettecate ahd tof Yehtesqtate a}, = 30! Y= Vytesa tes betes oh, = 40? ee ‘As cam be cen, ths varius of the rh value of variance of Y. of therefore au te time poriod increases, wx does the HAO. (a) Use the random error from Prob. 113 to generate unileraot series (A) Graph the uniteroot series and the original error term on the same axis. (c) Calculate an average for euch series for Y= 5,16 = 10, ¢= 11-15, and s = 16-20. CHAP. 14) TIME-SERICS METHODS 2s (a) The results are listed in Table 11,9, The method for gencrating Visas follows: asta 14884 + 0.2709 + 1.7593 1.7598 — 0.2714 = 1.4879 A879 — 2.3637 = -O.8798 ‘Table 11.9 Unit-Root Series : 2 ¥ Fi 7 1 14s 1s 2 2709 1.7503) 2m nas 0.9758 -i7s8 | —2.6306 5= 0261 ® goa | 26168 7 | massa | -29348 8 ean | =22877 9 ars | =15299 10 arses | 0.7833 Few 204778 | iy = Pm uo | 07202 -az7s | 10077 202k Loot 13381 13652 02191 L161 =a377 | ry 45-= 0.3561 6 asma 17162 7 | ~a4038 1314 ws | 026i L080) 9 12056 1.2868 20 asset Lone ws gy 0.897 | Tig ay = L851 (6) Figure 11-3 graphs the two series, (©) The avernges are shows in Table 119. The avesage forthe statinnary ses (2) stays near zen far al subsets, while the averages For the uniteroot seics, ¥, uctuaic to extreme negative values {~: and extreme postive values (1.4561), giving different inference for dillerent subsets. HLAL Table 11.10 reports the clase of the NYSE (New York Stock Exchange) composite stockmarket index ¥, and the population of Sri Lanka in thousands ¥ for the years 1966 to 1992. (a) Regress Y on X and test the eoellicient on Wat the 5% level of significance, (b) Regress AY on AW and test the coefficient on Yat the 5% level af significance. 
(a) For the initial roproscioa in leva, wo obtain 256 TIME-SERIES METHODS loHap. 11 sae [1989 tae2s | 156.26 | 195.01 | tangy | 22948 e361 | 16887 | iskos | 16993 | 17190 221 i745 ‘Sours: New Vork Sie Bxchange (lex) and Pene-Woeld Tables (Pop 301 4 0.034%, wa0rs omy “There ina positive relationship between Y, and which is significant a the 5% level eritialvalbe — ‘206 with 25d}. Alo, the Ri slatively High. We would conclude that the population of Sei Lanka is ‘an important ingkeator oF the NYSE (6) Taking the unit root inte account, and repressing 4, on A,X, 7 get reliable results: AR =7.M4-400018 4X, ® —3.33 (from App. 11), we fall rset the mull hypothesis that these tsa unit root “The correct procedure ic then to take fir difference: of ¥ before wring it ina raprersion CHAP. 14) TIME-SERICS METHODS 27 oy AAP, = 8.85 = 1.14 AY, , ost 850 Since 1;, = 5.56 < <1. (fram App. 10}, we reject the all hypothesis that there is a unit cont Therefore AY, isa stationary series which can be used in a regression, M43 (a) Test 2, from Prob. 11.11 for a unit oot without a trend at the $% level of significance. (6) Test 4, ftom Prob. 11-11 for a unit root without a trend at the 5% level of significance: 1.8 — WK we =003 098) Since #5, = ~0.93 = —3.3 (from App. 11, we fail to reject the null hypothesis that there is 8 unit roo. ‘Tho corrost procedure ie thon to tale fiat differences of X hefere using ‘a Ate o BA = M68—O9 aN) Os 40 Since ty, = “462 © =339 from App. 11), we reject the mull Aypotmess hat thene He unit root. ‘Thorefoke A, isa stationary series which sam be used in a repression, 114 (a) Test ¥; from Prob. 11.U1 fart unit root using dhe Pest form of the ADF with a (rend. (0) Test X; from Prob, 11.11 for a unit root using the F-test form of the ADF ‘with a trend and two lags of AX, (a). Sines the restristion forthe null hypothesis involves testing if ony soeiciomt is significant the standard test may be used with the Dickcy-Fuller adjusted critical values (App. 11). 
We run the regression: AY, = $27 -010¥, 41.17 R= 8 fast) @.12) Paw Sinee = 4.09 <= 7.24, we cannot rejet the mull of unit root in favor of trend stationary. (6) Recall from Chap. 7 the formula for the F test on a subset of variables is Ae Fy wwhars Rindieates a rsstrieted ssgression wnderthe null pothesis, The Ftsst therefore ques tos regressions to be run Unrestrict: AN, 6922.08 —0.58y, | +1MIT +033 ax, +028 AN,» R=00 2m em as) Can SS = 25,481.48 Restreted AN, = 219574095. 4X, ,-0.02 AX, 5 enol 2) (9.098 bss = 37,225.56 Caleulating the F statistic, we obtain (22tise tatsnae fas 446 Since F 4.46 < 7.24, we accept the null that 1 follow a init root prose 258 ‘TIME-SERIES METHODS COINTEGRATION AND ERROR CORRECTION tas 116. Way (a) What is cointegration? (6) How does eointegration affect the speeification of a regression model” (a) ‘Two variables are cointegratad if they individually follow a unit root process, but jointly move together inthe long ron, Individually, movemonts appear random and unpredictabla, bat the location of one can give information about the other. I the prediction ercorsof Y regressed om X are stationary, there rides af elarcerntinn (})Ieoimtegration exists, the long.run process should be used to explain the dependent variable. 10 ¥ is ssbove (esp. below) its long-vun equilibrium, we would expect ¥ to decrease {vep. inerease} in the mest period. 
Therefore an seror-sorrestion model inclndss deviations from the Fon g-rin relationship as an explanatory varinble Show algebraically that estimating the model ¥, 1X) oN by Vg eay awhen ¥ and Y are cointegtated implies the use ef an error-correetion model Enon woveeation atipulates hut ¥ aia V fullow a log-in sdatoountiy Ysa ta¥+e Talking the original model, ¥,= fy -+0)¥,-+ Bs, + Bs ta, inthe long run fas ¢— 02), we obtain Ya eB boot Bale Pato tity OF (I= Sy) Van = By bby dan Suing Fox aul lvoe the yalesipt sian ip aamlaraymvcnann, we Ive Since ¥ aud ¥ follow the long-non relationship, we know that ly f(l ~ bs) — aye and (by 83)/48 8s) a. Since these paramsists move in a constant ratio, we san solve for by and by in Asem of by, 8, ayy and a}, by n(1 bs) 8) —3.83 (from App. |}, we fail to reject the mull hypothesis that there s.2 unit wot for ¥. 200 ‘TIME-SERIES METHODS fewap. ur Aij=251-001%,, an (937 Since 1, = =0.87 = =4.3% (Feary App 1) fil rejeet the rll ypnthesic that shere ica unit mat for (@) Since both and Ware unit-raot vastablee, we can proceed to tet for eointegeation. Eetimating the long-ran relationship yields = t0sstonsy, Rm ays ese) “esting the residual fr uit root, we sbain ip 003-050e,, A (338 Since #j, = —3.38 < —3.33, we reject the mull hypothesis that there a unit root for ¢,. Therefore ¥" sd. tre ciate rated. 11.19 Estimate the crror-cortection model far the data in Prob. 1.18, ‘Since both variables are unit root an cofntegrated, we run the model in diflerences with the nciusion of the Ing residual of the long-nan mode: AS, 0.04 16a, 4012, we 0.04 Lan) (0.969 CAUSALITY 11.20 How does Granger causality differ from ather types of causality? ‘Granger couvatuy i an econometeic represeetation of tbe timing of causation, Unfortunately, Granger ssausaliiy can never prove causality with certainty, There are neveral thr factors shat auld mimic the results of Granger causality. ¥ coold Granger-cause F bocause ofa third factor causing both. 
This would fot show up inthe model. could mave before Yin unticpation of ¥ moving, ¥ would Granger caine ¥, BUE itis the eoveneat in Y Which is the true cause. Also, the reactions of Y sould be wanstory, indicating that while V may Grangerscaose 9, the effect docs nat last, U2 The data in Table 11.12 report housing starts ¥ in thousands and personal consumption 1 in bilkions of 1996 US dollars. We want to determine if housing, starts is a leading indicator of consumption using Granger causality, What form should variables take in the regression (levels, differences, ete)? Since Granges causality isa time-setics regression, its form will depend om the time-series properties of the variables, spssifically if they powsess a unit rost, and if s0, mhcther they arc esintegrated, Testing for sant root in levels yields AY, = 45802 -028¥, , Reds (-248) Since f,, = 2.48 > ~333 (trom App. 11),we fall to-refect the mull hypothess that there isa unit mot for Since having starts are a flow variable, iti not abrious that it sold follow st wnat root. En fact, bet Statist isclose to the critical value, init-root testing suf from being a. lowesporser test i that it sek rjoets unit reet when itshould, Since a wait sot sauscs many statistical problems, however, we re on the sade of sorrecting for the unit root when we do not have 10. AT, = -9805 40024, Rawls am CHAP. 14) TIME-SERICS METHODS 26 Table 11.12 Housing Starts in Thousands of Units and Real Personal Consurapton in Billions of 1996 Dollars inthe United States, Jan. 1087 ta Dee, 1998 Source: St Louis Federal Reserve (Bureau of Eecnonic Anais) Sings 4, = 1.72 3.83 (from App. 11, we fail torejet the mull hypothesis that thor is unit rast foe. Testing for unit root in ferences yields pal, = 1206-141 AY, 892) Since f,, = B92 < ~3.33 (foot App. 
11), we can reject the aull hypothesis that here is aunt root foe AY, aad, 03 2563-110 AN, (sa) Sine fg = —$ 84 = —22 drome App 1D) fal 1 aejoct the all ypuheis that thew gait ena fe AN, Since both ¥ and W are uniteroot variables, we can proceed to test for cointegration. Estimating the long-run relationship viel’ 3081 1 + L.66%, (ona Tet she vesidual fos unl woe yikds Ab = 201-0336, ® root for, Therefore there isto evidence of sointegration. We can conclude that thegorrest model to use both and Xin frst dliferences with ao error eorrection, 11.22 Caloulate the AIC for the Granger causality model from Prob. 11.21 fave one to six bags sith the figst diflereace of consumption as the dependent variable. What is the optimal specification’ Since we are concerned only with the sum of squared errors confiseat. 5), we omit reporting the regression With one lag cach of the first differenoe of consumption and the first diflerense of housing starts, TSS = 0207.02, T= 24, j= 3 intercept and one lag of eachi 202 ‘TIME-SERIES METHODS fewap. ur ESS) | wr) +P ‘With two lps cach of the at irene f consumption and the fist difence of housing stars Ess = TTT = 3h) 28) Arc a in( PRO) SE) = 5.79 (32 H n ron) 269 Wen) es Win tree lags cach of the lst aiierenee of consumption and the fist dilerenge of Rousing starts ESS = 7554929, T= 3) =7: oeassr arc= (9S) 42 = (eee 20) ‘With four fags each of the first diference of consumption and the fire diffrenes of howsing starts ESS ~ 4617587, T= 31,7 =9, Tt acto 5) 2 8 7) $ ‘With five Ings cach of the frst diference of sonsumption and the first diferense of housing starts W ESS = 3742738, T= 3, = 1: ESS) 25 _ (3042798) | ESS) 44 ass (Foie) ag 85 “With st lags each of the first iference of consumption and the fist sifrense of howsing sans FSS = MRSEM T= MT sn(ES5) 28S) 52) ss alc AIC 7 Since five lags has the lowest AFC, that is the nprimal motel 11.23. 
Determine if housing sturis Granger-cause pervonal consumption at dhe 5% Jevel of significance using the data from Prob. 11.21 and the optimal model found in Probs. 11.21 and 11. ‘We ran the model restricted and unrestricted, then use the F test to test whether housing starts are a stafistcally significant predictor of personal consumption Unrestricted AR, = 42060-0018 Fey OOLAT, » HONEA, 5— GOR Fy #00981, 40.12% TOSTAN, 2 HO0SAN, 50.1340, 4 HO0AN, 5 #052 (G27) is 0.6) GOH) ESS = STATE Restricted S868 —OAZAN, | —OSTAN, 2 OKAY, 5 ~O1GRY, 4 O.IAY, 5 R= 0.28 (1.78) (2.7) 78 (OY 0.58) ESS = S085 ) (2 auate 3 ay Gs) Tera air Fp athe Slot inci Nosing ve dnt Change ces pl ews 108 193 < 2.14, wesonchade that CHAP. 14) TIME-SERICS METHODS 203 Supplementary Problems ARMA HLM Using the random variable from Table 11.13, and an ARQ) process for », with 9) = G4 and jy = —0.3, layeakubate ya (bho dehy td) Ans; (a) HO8SS (8) 0.3618 "(e) 0.7625 (uh 0.818 ‘Table 1113 Random-Frroe Terme ' 1 >7?:])*#i]:]s © [ia [aeoro | oiaae | o1si0 | ose | onze oases [asaas | 0077 fufeloe fw fos fw fw [ow fw fe © | #303 | sens [ 0.9250 | oosas | o-96ar | 0.6877 | o.sze | 0.5996 | u.ai97 | oss © | aser | onan | 02334 | 0.5804 | oa2R6 | oa3s2 | 19s | Ome | arm | OsIHe 11.26 Using the random variable from Table 11.13, and an MA(3) proven for yy mith 0, = 0.2 anit By =—O.S, fal ealewlate yey (610 (6) tne GE 70. rs, (a) AEDS (5) 0.6090 fe) LONSE (dh O.8TIT IDENTIFYING ARMA 11.26 Table 11,14 reports the average temperature in New Vork’s Central Park from 1969 te 1999, Calzulate he aanxooorrelation faction of average temperavare up 19 sk lags Ans. 
ACF1 = -0.0051, ACF2 = -0.0013, ACF3 = -0.2007, ACF4 = 0.2488, ACF5 and ACF6 [illegible]

Table 11.14 Central Park, New York: Average Temperature, 1969-1999
[yearly values illegible]
Source: NASA Goddard Institute for Space Studies.

11.27 (a) Calculate the Q statistic for the autocorrelations in Prob. 11.26. (b) Are there statistically significant correlations at the 5% level of significance?
Ans. (a) 4.22 (b) No

NONSTATIONARY SERIES

11.28 (a) Calculate the t statistic for the ADF test of unit root without a trend and no lags of Δy_t for the temperatures in Table 11.14. (b) Do the temperatures possess a unit root?
Ans. (a) 5.09 (b) No

11.29 (a) Calculate the t statistic for the ADF test of unit root with a trend and no lags of Δy_t for the temperatures in Table 11.14. (b) Do the temperatures possess a unit root?
Ans. (a) [illegible] (b) No

COINTEGRATION AND ERROR CORRECTION

11.30 Table 11.15 reports the value of the Dow Jones Industrial Average (DJIA) Y, the S&P 500 Stock Index X, and the Toronto Stock Exchange 300 Index Z, from Jan. 2 to 30, 2001. (a) Does the DJIA have a unit root? (b) Does the S&P 500 have a unit root? (c) Are Y and X cointegrated?
Ans.
(a) Yes (b) Yes (c) No

Table 11.15 DJIA, S&P 500 Index, and TSE 300 Index: Jan. 2-30, 2001
[daily values of Y, X, and Z illegible]
Source: quote.yahoo.com

11.31 Using the data in Table 11.15, (a) Does the Toronto Stock Exchange have a unit root? (b) Are X and Z cointegrated?
Ans. (a) Yes (b) Yes

CAUSALITY

11.32 Table 11.16 reports monthly first differences of an industrial production index for the United States Y and the S&P 500 Stock Market Index X from February 1998 to December 2000. (a) Using one lag of Y and X, does X Granger-cause Y? (b) If so, what is the short-run magnitude of the causality? (c) What is the long-run magnitude?
Ans. (a) Yes (b) [illegible] (c) [illegible]

Table 11.16 Industrial Production Index and S&P 500 Index: United States, Feb. 1998-Dec.
2000
[monthly values of Y and X illegible]
Source: Federal Reserve Board of Governors (Industrial Production) and quote.yahoo.com (S&P 500)

11.33 Using the data from Table 11.16, (a) What is the F statistic used to test if X Granger-causes Y with six lags? (b) Does X Granger-cause Y with six lags? (c) How would one know the correct number of lags to use?
Ans. (a) [illegible] (b) No (c) Calculate the AIC for different numbers of lags and use the model with the lowest AIC

CHAPTER 12
Computer Applications in Econometrics

12.1 DATA FORMATS

If data are found from an existing source (rather than collected by the researcher), they often come in a text format. Text format is flexible since any statistical package and brand of computer can read it. There are two main types of text formats:

1. Delimited format (also called free format): each variable is separated by a space, tab, or comma.
2. Fixed format: each variable occupies a specific column or group of columns in the text file.

To determine the format, order of the variables, and any codes (e.g., missing value code) one must consult a codebook which accompanies the data set.

EXAMPLE 1. We report the data from Chap.
2, Example 1 as a text file in several formats (comma-delimited, space-delimited, ...).

[...] Area under t distribution (20 df, 2-tail test):   =tdist(A1,20,2)

All functions may be accessed through the toolbar Insert-Function, which includes descriptions of the function. Graphing is done through the toolbar Insert-Chart. More advanced calculations (histogram, t test, ANOVA, regression) are found in the toolbar Tools-Data Analysis. Note that if the Data Analysis option is not present under Tools, then the Analysis Tool Pack has not been installed. To add the option either go to Microsoft Office Setup or Tools-Add-Ins and install Analysis Tool Pack.

EXAMPLE 2. We saved the data from the comma-delimited version of Example 1 to a text file. Using Excel, we can open the data directly into a worksheet with the following steps:

1. File-Open, in the Open dialog box set "Files of type" to "All Files (*.*)," select the desired file, in this case example1.txt.
2. The Text Import Wizard dialog box appears since the selected file is not an Excel file.

We have the option of specifying "Delimited" or "Fixed width" (fixed format). If "Fixed width" is selected, the next box allows the selection of columns. Since our data are delimited, we choose "Delimited" and click "Next." The next box allows the selection of the delimiter. Our data are comma-delimited, so we check the box next to "Comma." For most data purposes, this is enough for Excel to import the data, so we click "Finish."

Our data are now in Excel and may be used in calculations, and saved as an Excel spreadsheet.

12.3 EVIEWS

Eviews is a powerful statistical package designed especially for time-series regression analysis. Eviews is a Windows-based statistical package that works through Windows dialog boxes. All regression options are programmed by checking the desired options. The basic steps to work with data in Eviews are:

Open a workfile (File-New-Workfile).
Since Eviews is written for time series, start and end dates must be specified.

Read in data (File-Import-Read Text, Lotus, Excel). Give variable names, delimiters, sample.

Redefine data if necessary (Quick-Generate Series). Give equation for new variable using usual math symbols (e.g., to define x2 as 2 times x1, the equation would be "x2 = 2*x1").

Perform statistical operations. For example,
Descriptive statistics: histogram, mean, standard deviation, covariance, ACF, ADF (Quick-Series Statistics)
Joint statistics: covariance, correlation, cointegration, Granger causality (Quick-Group Statistics)
Estimation: regression, ARMA corrections (Quick-Estimate Equation)

EXAMPLE 3. Using the text file example1.txt from above, we can import the data into Eviews:

1. To start a new workspace, we click File-New-Workfile. A dialog box queries the period length and dates. Since our data do not constitute a time series, we enter 1 as start date and 10 as end date to clear enough space for 10 observations.
2. We click File-Import-Read Text, Lotus, Excel, to read data from an external text file. The ASCII Text Import dialog box appears. We list the variable names in the order in which they appear in the data set. Data are arranged in columns, comma-delimited, so those options are checked. Checking the box for rectangular file layout indicates that there is one observation per row. Clicking "OK" reads the data into Eviews and sets up an entry in the workfile for each variable.

12.4 SAS

The current version of SAS (we are using version 8) operates in Windows, but is programmed by entering statements rather than checking options. There are three main windows in SAS: the Program Editor, where statements are written; the Log, where comments are stored when a program is submitted for processing (processing time, error messages, etc.);
and the Output window, where results are written on successful processing of a program. The Explorer window, which accesses SAS data sets, and the Results window, which catalogs previous results, are useful for the organization of large projects.

SAS programming involves two distinct parts:

1. The data step, where the data are read in.

   Program                                  Description
   Libname lname 'c:\';                     Gives path where SAS data set will be stored. This can be omitted if the data set will be used once (i.e., temporary data set). lname refers to the user-defined name given to the library. All names in SAS must begin with a letter and be no more than 8 characters.
   data lname.dname;                        Names data set dname to be stored in library lname.
   infile 'path\filename' delimiter=',';    Gives location of text file containing data. The delimiter option may be omitted if the data are space-delimited or in fixed format.
   input var1 var2;                         Reads in variables in order of columns. If data are in fixed format, list variables followed by the column numbers where the data fall (e.g., var 1-2).

After the data step, new variables can be calculated through equations (as with Eviews "Generate"). The usual math notation is used for add, subtract, multiply, and divide (+, -, *, /). Exponents are achieved by two stars (**). Data manipulations must come in the data step. If a procedure has been run, a new data step must be started in order to create new variables. Previous data sets can be called into a data step with the "Set" command. For example

   data recall;
   set lname.dname;

calls back the data set read in above.

2. The procedures, where the estimation routines are called. Procedures are identified by "proc" followed by the specific procedure name and options.
Some commonly used procedures are

   Procedure        Description
   proc means;      Calculates descriptive statistics, i.e., number of observations, mean, standard deviation, minimum, maximum
   proc freq;       Calculates descriptive statistics of discrete variables
   proc corr;       Calculates simple correlation between variables
   proc reg;        Runs linear regression
   proc autoreg;    Runs an autoregression
   proc arima;      Identifies and corrects ARMA processes
   proc probit;     Runs a binary choice regression
   proc syslin;     Estimates simultaneous equations
   proc print;      Prints the data set to the Output window
   proc plot;       Plots a graph
   proc iml;        Matrix language; performs matrix mathematics

All lines of a SAS program are followed by a semicolon. Sections of the program to be processed are followed by the "run;" command; "quit;" designates the end of the program. The program is run by clicking Run-Submit, or clicking the Submit button on the toolbar.

EXAMPLE 4. Using the text file example1.txt from above, we can import the data into SAS through the data step. The data step is as follows:

   data example;
   infile 'c:\example1.txt' delimiter=',';
   input test score;
   run;
   quit;

The SAS Log window reports the following information:

   1  data example;
   2  infile 'c:\example1.txt' delimiter=',';
   3  input test score;
   4  run;
   NOTE: The infile 'c:\example1.txt' is:
         File Name=c:\example1.txt,
         RECFM=V,LRECL=256
   NOTE: 10 records were read from the infile 'c:\example1.txt'.
         The minimum record length was 3.
         The maximum record length was 4.
   NOTE: The data set WORK.EXAMPLE has 10 observations and 2 variables.
   NOTE: DATA statement used:
         real time 1.25 seconds
   5  quit;

The Log window tells us that the file was found, and that 10 records (observations) and 2 variables were read. It also reports the processing time of 1.25 s.

Solved Problems

DATA FORMATS

12.1 (a) Why are computers important in statistics and econometrics? (b) What are common sources of computer-readable data?

(a) Much of statistical theory relies on the large-sample properties of estimators.
As the data set gets larger, standard errors get smaller; therefore confidence intervals get narrower and more precise. The minimum acceptable number of observations for most practical purposes is 30. As data sets get larger, however, calculations get more time-consuming. Without computers, even simple calculations involving large data sets would not be feasible. More complex calculations, such as probit or simultaneous equations, are too computationally demanding even with relatively small data sets. Reading text files into the computer also eliminates typing errors from data entry. What must be remembered is that while the computer is a tool for processing calculations quickly, the researcher still must verify that the model has been specified correctly.

(b) Government agencies have large amounts of public, computer-readable data (Census, Bureau of Labor Statistics, Federal Reserve, etc.). Other sources are college and university research departments, Internet search engines, nonprofit agencies, and political lobbying groups. Financial data may be obtained through securities ratings companies and corporate information services, but usually at a substantial cost. Appendix 12 lists all Internet data sources used in this text.

12.2 (a) What is the difference between delimited and fixed-format data? (b) What are some possible problems with delimited data?

(a) In delimited data sets, variables are separated by a delimiter such as a space, tab, or comma. In fixed-format data sets, data are arranged so that each variable occupies specific columns of the text file.

(b) Tab delimiters can be a problem since some statistical packages do not read tabs well (SAS). Tabs can especially be problems with non-Microsoft Windows programs such as mainframes and DOS. Space delimiters can cause a problem with text variables that contain spaces within them. Consider reading in data of countries for the list "United States of America Hong Kong Italy Germany." Reading this as space-delimited would yield eight variables; the first variable would be "United," the second "States," the third "of," and so on. Comma-delimited data would
solve this problem since "United States of America, Hong Kong, Italy, Germany" would be read correctly as four variables.

12.3 Identify the format of the following July population estimates (in millions) from the U.S. Census Bureau:

   (a)                    (b)                    (c)
   New Mexico      1.7    New Mexico, 1.7        New Mexico; 1.7
   New York       18.2    New York, 18.2         New York; 18.2
   North Carolina  7.7    North Carolina, 7.7    North Carolina; 7.7
   North Dakota    0.6    North Dakota, 0.6      North Dakota; 0.6
   Ohio [value illegible]

(a) Fixed format, state in columns 1 to 14, population in columns 16 to 19. (b) Comma-delimited. (c) Semicolon-delimited.

MICROSOFT EXCEL

12.4 Using the data from Example 1, (a) Use the data analysis tools to graph the histogram and ogive of test scores. (b) Calculate a mean, median, mode, sample variance, sample standard deviation, and coefficient of variation to statistically describe the data. (c) Use Excel functions to standardize each test score.

(a) For a histogram in Excel, choose Tools-Data Analysis. In the resulting dialog box, select "Histogram" and click "OK." We then choose the options we want for our histogram in the following box: Our data are in column B, from row 1 to row 10. The default is a frequency distribution; checking "Chart Output" draws the histogram, and checking "Cumulative Percentage" plots the ogive. Custom class intervals may be typed into Excel and indicated as the "Bin Range."

The results are as follows:
[Figure: histogram and ogive of test scores]

Parts b and c are reported in the image below as both numerical results and Excel formulas. For b, the descriptive statistics can all be performed through functions. The coefficient of variation is simply the standard deviation divided by its mean (dividing by the mean gives a relative measure of variation without units).
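The descriptive statistics listed in part b can also be sketched outside Excel. The following is a minimal Python illustration using only the standard library; the scores are hypothetical values chosen for the sketch and are not the data of Example 1, and the variable names are assumptions rather than anything defined in the text.

```python
import statistics

# Hypothetical test scores, used only for illustration
# (these are NOT the data of Example 1).
scores = [6, 7, 6, 8, 5, 7, 6, 9, 10, 6]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
variance = statistics.variance(scores)  # sample variance (divisor n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Coefficient of variation: standard deviation divided by the mean,
# a relative measure of variation without units.
coeff_var = std_dev / mean

print(mean, median, mode, round(variance, 4), round(std_dev, 4), round(coeff_var, 4))
```

Because the coefficient of variation is a ratio of two quantities in the same units, it can be compared across data sets measured on different scales.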
For part c, note that when formulas are copied and pasted, the cell references adjust to the new location. In standardizing, we want to subtract the same mean and divide by the same standard deviation for all calculations. Including a dollar sign ($) before the column and row reference keeps it from changing when pasted to a new location.

[Worksheet image: descriptive statistics and standardized test scores with the Excel formulas used]

12.5 For the data in Chap. 5, Example 9, (a) perform a t test of the null hypothesis that wrapping 1 has average sales equal to 85. (b) Perform a t test of the null hypothesis that wrapping 1 and wrapping 2 have the same average sales. (c) Perform an ANOVA test of the null hypothesis that all three wrappings have the same average sales.

(a) We calculate the t statistic using the Excel formulas. Since the probability in the tails of the t distribution is greater than 0.05, we accept the null that the average sales are 85 at the 5% level of significance.

(b) The two-sample t test is found in Tools-Data Analysis. There are several options. Since it is specified in Chap. 5, Example 9 that the data have equal variances, we select "t test: Two Sample Assuming Equal Variances." We enter the range for wrapping 1 as variable 1 and for wrapping 2 as variable 2. The hypothesized mean difference is 0 since our null states that the means are equal. Alpha is the desired level of significance. The result fails to reject the null that both means are the same at the 5% level of significance.

[Worksheet image: two-sample t test output]

(c) We enter the entire range of all three variables and select "Grouped By: Columns" since the variables are in separate columns. Again we set the level of significance to be 5%. Since the calculated F value exceeds the critical value ("F crit" in the table), we reject the null hypothesis that all three wrappings have the same average sales.
12.6 In Example 1 of Chap. 6, Table 6.1 reports corn per acre Y and fertilizer used X from 1971 to 1980. (a) Calculate the covariance between X and Y. (b) Use Excel to plot X and Y. (c) Fit a regression line to the graph.

(a) As seen below, the covariance between X and Y is positive.

[Worksheet image: corn and fertilizer data with the covariance formula]

(b) To plot the two variables, we highlight both variables and choose Insert-Chart. We select XY (scatter) plot and click next. The series can be named in the "Series" tab; we also switch the X and Y variables so that Y is on the vertical axis. In the next window the chart and axes can be named. In the next box the location can be determined, and we can click "Finish." The following graph is created:

[Figure: scatter plot of Y against X]

(c) To fit a regression line to the plot, click the right mouse button on a data point and select "Add Trendline" (this may take some practice aiming). We select to add a linear trendline; under the "Options" tab we can select to have the regression equation and R² reported.

12.7 Example 1 of Chap. 7 extends the corn production table to add insecticide use. Run a multiple regression of Y on X1 and X2, reporting the residual error terms.

Regression estimation is under Tools-Data Analysis; we select "Regression." In the dialog box, we give the location of the Y and X variables (this can be done easily by clicking in the desired box and highlighting the variable on the worksheet). It is important that all the independent variables are in a continuous range of columns. We check "Residuals" to report the errors of the regression. Note that checking "Residual Plots" is a valuable diagnostic for autocorrelation and heteroscedasticity. The residuals can be used to calculate additional tests such as a Durbin-Watson statistic.

[Worksheet image: regression output with residuals]
EVIEWS

12.8 Save the variables from Prob. 12.7 in an Excel worksheet, and import the values into Eviews.

Since Eviews can read Excel worksheets, we save the data in Excel format. To make reading the data easier, we eliminate all functions and labels. Below is the Excel worksheet and Eviews import options to read the data.

[Image: Excel worksheet and Eviews import dialog box]

12.9 Using the Eviews workfile from Prob. 12.8, (a) generate a variable for the proportion of fertilizer per bushel of corn. (b) Calculate descriptive statistics for the fertilizer ratio. (c) Graph the correlogram for the fertilizer ratio.

(a) To generate a new variable, we go to Quick-Generate Series. We get the dialog box below. We name the new variable "ratio" and define it by the equation "= fert/corn," and click "OK." Clicking on the ratio variable in the workfile shows the results of the calculation.

(b) For descriptive statistics of a series, choose Quick-Series Statistics-Histogram and Stats. Enter the desired series for the resulting information. (Descriptive statistics of the entire data set can be found in Quick-Group Statistics.)

(c) The correlogram is found in Quick-Series Statistics-Correlogram. After specifying the series name, correlations in levels, and eight lags, we get the following output. Note that the high initial correlation which fades out, together with the large spike at one lag for partial correlations, indicates AR(1).

[Image: Eviews correlogram of ratio]

12.10 Using the data from the Eviews workfile in Prob. 12.8, (a) Estimate the regression of corn on fertilizer and pesticides. (b) Is there evidence of autocorrelation in part a? If so, correct for autocorrelation. (c) Estimate the regression of the fertilizer:corn ratio on only a constant. (d) Is there evidence of autocorrelation in part c? If so, correct for autocorrelation.
(a) To run a regression, select Quick-Estimate Equation. To specify the equation in the dialog box, list the variables to be used in the regression with the dependent variable first, then "c" to include a constant (intercept), and then the independent variables. The "Method" setting allows for different estimation techniques. For OLS, the default setting is correct. "Sample" allows the user to estimate the regression on a subset of the data set. The default setting is to estimate for the entire data set. The specification of the regression equation and the output are listed in the following dialog box.

[Image: Eviews equation specification and regression output for corn]

(b) Eviews automatically calculates many diagnostic statistics, including the R², the F statistic, AIC, log likelihood, and the Durbin-Watson statistic. Since the Durbin-Watson statistic is near 2, there is no evidence of first-order autocorrelation.

(c) For our estimate of the regression for "ratio," however, the Durbin-Watson statistic is near zero in the output below, indicating autocorrelation.

[Image: Eviews regression output for ratio on a constant]

(d) To correct for autocorrelation, the same procedure is used as for the standard regression, except that "ar(1)" is included in the regression equation. This same method can be used to correct for any ARMA process by including "ar(p)" for autoregressive processes, and "ma(q)" for moving average processes (where p and q are the appropriate numbers).
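The Durbin-Watson diagnostic used in parts b and c can be sketched by hand. The following minimal Python illustration assumes the corn and fertilizer values as they appear in the SAS listing of Prob. 12.13, and uses the fact that a regression on a constant alone has the sample mean as its fitted value. The function name durbin_watson is a label chosen for this sketch, not an Eviews or SAS routine.

```python
# Corn (Y) and fertilizer (X) values, 1971-1980, as listed in Prob. 12.13.
corn = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
fert = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]

# Fertilizer:corn ratio, the dependent variable of part c.
ratio = [f / c for f, c in zip(fert, corn)]

# Regressing on a constant only: the fitted value is the sample mean,
# so the residuals are deviations from the mean.
mean_ratio = sum(ratio) / len(ratio)
residuals = [r - mean_ratio for r in ratio]


def durbin_watson(e):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(x ** 2 for x in e)
    return numerator / denominator


dw = durbin_watson(residuals)
print(round(dw, 4))
```

A value far below 2, as obtained here, points to positive first-order autocorrelation, while a value near 2 gives no evidence of it.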
Lags can also be inserted quickly by using (-L), where L is the desired lag length. For example, to insert one lag of ratio as an alternate control for autocorrelation, the equation specification would be "ratio c ratio(-1)." From the resulting output, we can see that first-order autocorrelation is no longer present.

[Image: Eviews regression output for ratio with one lag]

12.11 From the data in Example 2 of Chap. 11, use Eviews to (a) run an ADF test to test the null hypothesis of a unit root in Y. (b) Run an ADF test to test the null hypothesis of a unit root in ΔY.

(a) The ADF test is found in Quick-Series Statistics-Unit Root Test. The resulting dialog box is as follows:

[Image: Eviews Unit Root Test dialog box]

Eviews allows flexibility in the unit root test, allowing choice of intercept, trend, or neither; levels or differences; and different lags of the differenced terms to control for autocorrelation. We choose the test in levels, with an intercept and no lags. The output reports the regression as well as the critical values. Since the ADF statistic is greater than the critical value, we accept the null of unit root.

(b) Running the ADF test in first differences allows us to reject the null of unit root at the 10% significance level but not at the 5% significance level.

[Images: Eviews ADF test output (test statistic, critical values, and Augmented Dickey-Fuller test equation) for Y and for ΔY]

12.12
From the data in Table 11.16 for Prob. 11.33, use Eviews to test if (a) X Granger-causes Y with six lags and (b) Y Granger-causes X with six lags.

(a), (b) Granger causality is found in the toolbar Quick-Group Statistics-Granger Causality Test. We input the data set and in the Granger causality dialog box specify series Y and X, and click "OK." We then specify six lags and click "OK." From the output below, neither variable Granger-causes the other at the 5% level of significance.

[Image: Eviews Pairwise Granger Causality Tests output]

SAS

12.13 (a) Save the variables from Prob. 12.7 in a comma-delimited text file, and import the values into SAS. (b) Create a variable for the fertilizer:corn ratio. (c) Print the ratio variable to the output window. (d) Calculate descriptive statistics for the ratio variable. (e) Calculate the correlogram for the fertilizer:corn ratio.

(a), (b), (c), (d) We start with the Excel file from Prob. 12.7. After deleting all but the variable values, we click "File-Save As," and save the data set as type "CSV (Comma Delimited)" for easy accessibility by SAS. The SAS program to accomplish parts a through d is presented below. Note that in SAS "/* */" enclose comments which are not read by SAS. It is important to annotate programs and give variables descriptive names so the program is easily debugged, if necessary, and others can read your program.

   Libname main 'c:\';                     /* designates the directory to save data */
   data main.corn;                         /* starts the data step */
   infile 'c:\corn.csv' delimiter=',';     /* gives location of text file and delimiter */
   input year n corn fert insect;          /* names variables and gives order */
   ratio=fert/corn;                        /* defines ratio as division of fert and corn */
   proc print;                             /* prints data to output window */
   var ratio;                              /* names variables for print, omit to print all variables */
   proc means;                             /* calculates descriptive stats */
   var ratio;                              /* names variables, omit for stats of all variables */
   run;
   quit;
The output window reports the results. From proc print:

   Obs    n    corn    fert    insect    ratio
     1    1     40       6       4      0.15000
     2    2     44      10       4      0.22727
     3    3     46      12       5      0.26087
     4    4     48      14       7      0.29167
     5    5     52      16       9      0.30769
     6    6     58      18      12      0.31034
     7    7     60      22      14      0.36667
     8    8     68      24      20      0.35294
     9    9     74      26      21      0.35135
    10   10     80      32      24      0.40000

From proc means:

   The MEANS Procedure
   Variable    N          Mean       Std Dev       Minimum       Maximum
   year       10       1975.50     3.0276504       1971.00       1980.00
   n          10     5.5000000     3.0276504     1.0000000    10.0000000
   corn       10    57.0000000    13.4742553    40.0000000    80.0000000
   fert       10    18.0000000     8.0000000     6.0000000    32.0000000
   insect     10    12.0000000     7.4833148     4.0000000    24.0000000
   ratio      10     0.3018805     0.0740907     0.1500000     0.4000000

(e) To diagnose ARMA processes in SAS, there is "proc arima," which has two stages: identify (designated by "i") and estimate (designated by "e"). Calling back up the data set from the previous parts and continuing yields

   Libname main 'c:\';    /* names library and gives location to find data */
   data arma;             /* begins data step and names temporary data set */
   set main.corn;         /* reads previously saved data */
   proc arima;            /* procedure to calculate correlogram */
   i var=ratio;           /* selects variable to identify */
   run;
   quit;

This produces the following output:

[SAS output: The ARIMA Procedure, Name of Variable = ratio, with the mean and standard deviation of the working series, the number of observations, and the table of autocorrelations (lag, covariance, correlation, plot, standard error)]

Since we diagnosed an AR(1) process, we could add an estimate ("e") statement to the arima procedure after the identify statement. More complex processes can be estimated. For example, "e p=(1 8)" estimates an autoregressive process at the first and eighth lags, and "e q=(1 8)" estimates a moving average at the first and eighth lags.

12.14 Using the permanent SAS data set from Prob. 12.13,
(a) Estimate the regression of corn on fertilizer and pesticides. (b) Is there evidence of autocorrelation in part a? If so, correct for autocorrelation. (c) Estimate the regression of the fertilizer:corn ratio on only a constant. (d) Is there evidence of autocorrelation in part c? If so, correct for autocorrelation.

(a), (c) The Durbin-Watson statistic can be calculated in the basic regression procedure, "proc reg," but can also be calculated in "proc autoreg" with the added benefit of a p value, which eliminates the need for supplementary critical value tables, and can be used for longer lags of autoregression. We will use both procedures.

   Libname main 'c:\';              /* names library and gives location to find data */
   data dw;                         /* begins data step and names temporary data set */
   set main.corn;                   /* reads previously saved data */
   proc reg;                        /* starts regression procedure */
   model corn=fert insect /dw;      /* specifies the regression model, SAS automatically includes constant, /dw is omitted for no Durbin-Watson */
   proc autoreg;                    /* starts autoregression procedure */
   model ratio= /dw=1 dwprob;       /* specifies the regression model, /dw=1 calculates Durbin-Watson stat for 1 lag, dwprob calculates significance */
   run;
   quit;

The resulting output is

   The REG Procedure
   Model: MODEL1
   Dependent Variable: corn

   Analysis of Variance
                             Sum of          Mean
   Source             DF    Squares        Square    F Value    Pr > F
   Model               2    1620.32960    810.16480    414.85    <.0001
   Error               7      13.67040      1.95291
   Corrected Total     9    1634.00000

   Root MSE           1.39747    R-Square    0.9916
   Dependent Mean    57.00000    Adj R-Sq    0.9892
   Coeff Var          2.45170
   Parameter Estimates
                    Parameter    Standard
   Variable    DF    Estimate       Error    t Value    Pr > |t|
   Intercept    1    31.98067     1.63180      19.60      <.0001
   fert         1     0.65005     0.25016       2.60      0.0355
   insect       1     1.10987     0.26743       4.15      0.0043

   The REG Procedure
   Model: MODEL1
   Dependent Variable: corn

   Durbin-Watson D                 2.114
   Number of Observations             10
   1st Order Autocorrelation      -0.073

   The AUTOREG Procedure
   Dependent Variable: ratio

   Ordinary Least Squares Estimates
   SSE                 0.04940489    DFE                      9
   MSE                    0.00549    Root MSE           0.07409
   SBC                 -22.421554    AIC             -22.724138
   Regress R-Square        0.0000    Total R-Square      0.0000
   Durbin-Watson           0.2842    Pr < DW             <.0001
   Pr > DW                 1.0000

   NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation.

                                 Standard                Approx
   Variable     DF    Estimate      Error    t Value    Pr > |t|
   Intercept     1      0.3019     0.0234      12.88      <.0001

(b), (d) The Durbin-Watson statistic for the model in part a does not indicate autocorrelation, but the model in part c shows statistically significant autocorrelation since DW is near 0 and Pr < DW is less than a 5% level of significance (0.05). To correct for autocorrelation, we also use "proc autoreg."

   Libname main 'c:\';
   data dw;
   set main.corn;
   proc autoreg;
   model ratio= /dw=1 dwprob nlag=1;    /* nlag=1 corrects for AR(1) */
   run;
   quit;

This gives the following output:

   The AUTOREG Procedure
   Dependent Variable: ratio

   Ordinary Least Squares Estimates
   SSE                 0.04940489    DFE                      9
   MSE                    0.00549    Root MSE           0.07409
   SBC                 -22.421554    AIC             -22.724138
   Regress R-Square        0.0000    Total R-Square      0.0000
   Durbin-Watson           0.2842    Pr < DW             <.0001
   Pr > DW                 1.0000

   NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation.
    Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|
    Intercept    1     0.3019           0.0234     12.88            <.0001

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
    0        0.00494      1.000000
    1        0.00260      0.527000

    Preliminary MSE   0.00357

    Estimates of Autoregressive Parameters
    Lag   Coefficient   Standard Error   t Value
    1       -0.527000         0.300473     -1.75

    Yule-Walker Estimates
    SSE               0.02653769    DFE                     8
    MSE                  0.00332    Root MSE          0.05760
    SBC and AIC: [not legible]
    Regress R-Square      0.0000    Total R-Square     0.4628
    Durbin-Watson         0.9708    Pr < DW            0.0134
    Pr > DW               0.9866

    NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the p-value for testing negative autocorrelation.

    Variable    DF   Estimate   Standard Error   t Value   Approx Pr > |t|
    Intercept    1     0.2970           0.0348      8.53            <.0001

The correction calculates the magnitude of autocorrelation and estimates the corrected regression. The Durbin-Watson statistic indicates that autocorrelation is still present at the 5% level of significance, but not at the 1% level. Note that the results differ from Eviews since SAS uses a different estimation method by default.

12.15 Estimate the binary choice model from Example 5 in Chap. 8 using (a) probit and (b) logit.

(a), (b) Since the logit specification is an option in "proc probit," we will put both parts in one program. We will also show the method of manually inputting data through the "cards" statement to bypass creating a separate data file. The probit procedure in SAS also requires that the data be sorted with successes first for the estimation. This can be done with "proc sort."

    data country;
    input open gdpcap;
    cards;
    [20 observations of open and gdpcap; the values are not legible]
    ;
    proc sort;                       /* calls sort procedure */
    by descending open;              /* sorts data set by open variable; descending option puts larger values first */
    proc probit order=data;          /* probit procedure; order=data specifies that successes are first in data */
    class open;                      /* dependent variable */
    model open=gdpcap;               /* regression model */
    proc probit order=data;
    class open;
    model open=gdpcap /d=logistic;   /* regression model; /d option specifies distribution */
    run; quit;

The output is

    Probit Procedure
    Class Level Information
    Name   Levels   Values
    open        2   1 0

    Model Information
    Data Set                  WORK.COUNTRY
    Dependent Variable        open
    Number of Observations    20
    Name of Distribution      NORMAL
    Log Likelihood            [not legible]

    Response Profile
    Level   Count
    1          10
    2          10

    Algorithm converged.

    Analysis of Parameter Estimates
    Variable    DF    Estimate   Standard Error   Chi-Square   Pr > ChiSq   Label
    Intercept    1    -1.99418          0.82471       5.8470       0.0156   Intercept
    gdpcap       1   0.0010035        0.0004712       4.5347       0.0332

    Probit Model in Terms of Tolerance Distribution
    MU          SIGMA
    1987.2336   996.514769

    Estimated Covariance Matrix for Tolerance Parameters: [not legible]

    Probit Procedure
    Class Level Information
    Name   Levels   Values
    open        2   1 0

    Model Information
    Data Set                  WORK.COUNTRY
    Dependent Variable        open
    Number of Observations    20
    Name of Distribution      LOGISTIC
    Log Likelihood            -8.766185426

    Response Profile
    Level   Count
    1          10
    2          10

    Algorithm converged.

    Analysis of Parameter Estimates
    Variable    DF    Estimate   Standard Error   Chi-Square   Pr > ChiSq   Label
    Intercept    1    -3.60499          1.68107       4.5987       0.0320   Intercept
    gdpcap       1   0.0017958        0.0008999       3.9817       0.0460

    Probit Model in Terms of Tolerance Distribution
    MU              SIGMA
    [not legible]   556.864972

    Estimated Covariance Matrix for Tolerance Parameters: [not legible]

Note that both distributions give similar results.

12.16 Using the data from Chap.
10, Table 10.1, estimate the simultaneous equations model for Money Supply on GDP by two-stage least squares (2SLS) using investment and government expenditure as instrumental variables (Example 6).

    data simul;
    infile 'c:\[file name not legible].csv' delimiter=',';
    input year m y i g;
    proc syslin 2sls;     /* simultaneous equations procedure; 2sls indicates two-stage least squares */
    endogenous m y;       /* designates endogenous variables */
    instruments i g;      /* designates instrumental variables */
    money: model m=y;     /* model to be estimated */
    run; quit;

This gives the output

    The SYSLIN Procedure
    Two-Stage Least Squares Estimation
    Model                 MONEY
    Dependent Variable    m

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model               1           783204        783204     92.50   <.0001
    Error              16           135469       8466.83
    Corrected Total    17         921628.7

    Root MSE            92.0154    R-Square    0.85254
    Dependent Mean    874.72667    Adj R-Sq    0.84332
    Coeff Var           10.5193

    Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1             166.5660         76.75781      2.17     0.0454
    y            1               0.1153         0.011987      9.62     <.0001

12.17 From the data in Table 11.16 for Prob. 11.33, use SAS to test if (a) X Granger-causes Y with six lags and (b) Y Granger-causes X with six lags.

(a), (b) In SAS, the F test can be calculated by adding a "test" line to "proc reg."

    data granger;
    infile 'c:\granger.csv' delimiter=',';
    input y x;
    /* create lagged variables */
    y1=lag1(y); y2=lag2(y); y3=lag3(y); y4=lag4(y); y5=lag5(y); y6=lag6(y);
    x1=lag1(x); x2=lag2(x); x3=lag3(x); x4=lag4(x); x5=lag5(x); x6=lag6(x);
    proc reg;
    model y=y1 y2 y3 y4 y5 y6 x1 x2 x3 x4 x5 x6;    /* model with 6 lags of each */
    grangxy: test x1, x2, x3, x4, x5, x6;           /* test null that all are zero with F test */
    proc reg;
    model x=y1 y2 y3 y4 y5 y6 x1 x2 x3 x4 x5 x6;
    grangyx: test y1, y2, y3, y4, y5, y6;
    run; quit;

This gives the following output:

    The REG Procedure
    Model: MODEL1
    Dependent Variable: y

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              12        244.13721      20.34477      2.98   [not legible]
    Error              16        109.18963       6.82435
    Corrected Total    28        353.32684

    Root MSE            2.61235    R-Square    0.6910
    Dependent Mean      0.56138    Adj R-Sq    0.4592
    Coeff Var         465.34419

    Parameter Estimates: [rows for Intercept, y1 to y6, and x1 to x6; the values are not legible]

    Test GRANGXY Results for Dependent Variable y
    Source         DF   Mean Square   F Value   Pr > F
    Numerator       6      17.71606      2.60   0.0596
    Denominator    16       6.82435

    The REG Procedure
    Model: MODEL1
    Dependent Variable: x

    Analysis of Variance
    Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              12            28986    2415.49262      0.37   0.9544
    Error              16           103157    6447.29086
    Corrected Total    28           132143

    Root MSE            80.29502    R-Square     0.2194
    Dependent Mean        6.8831    Adj R-Sq    -0.3661
    Coeff Var         1166.55262

    Parameter Estimates: [rows for Intercept, y1 to y6, and x1 to x6; the values are not legible]

    Test GRANGYX Results for Dependent Variable x
    Source         DF   Mean Square   F Value   Pr > F
    Numerator       6        3452.6      0.54   0.7736
    Denominator    16    6447.29086

Again, neither variable Granger-causes the other at the 5% level of significance.

Supplementary Problems

DATA FORMATS

12.18 Using the data from the Federal Reserve Board of Governors (the Website is listed in App. 12), what two data formats would be able to read the text file of the interest rate data?
Ans. Space-delimited and fixed format.

12.19 Can all space-delimited data be read in fixed format?
Ans. No, often space-delimited data do not line up into columns if observations are of differing lengths.

MICROSOFT EXCEL

12.20 In Prob. 12.6, a simple regression line was fit to the agricultural data using Excel. From the output (a) What was b0? (b) What was b1? (c) What was the R²?
Ans. (a) 27.125 (b) 1.6597 (c) 0.97

12.21 In Prob. 12.7, a multiple regression was estimated using Excel. From the output (a) What was the sum of squared errors? (b) What was the standard error of b2? (c) What was the R²?
Ans. (a) 13.67 (b) 0.267 (c) 0.9916

EVIEWS

12.22 Using the output from Eviews in Prob. 12.9, (a) What is the t statistic to test the null hypothesis that the population mean of the fertilizer ratio is 0.25? (b) Is this statistically significant at the 5% level?
Ans. (a) 2.21 (b) No

12.23 What is the critical value for the Granger causality F statistic calculated in Prob. 12.12 (a) At the 5% level of significance? (b) At the 1% level of significance?
Ans. (a) 2.74 (b) 4.20

SAS

12.24 From the estimation in Prob. 12.15, (a) What is the log-likelihood value for the logit regression? (b) What is the chi-square statistic for b1 in the logit regression?
Ans. (a) -8.7662 (b) chi-square = 3.98
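As a hand check on the SAS output of Prob. 12.17, each Granger-causality F statistic is the numerator mean square divided by the denominator mean square in the corresponding "Test ... Results" table. The sketch below is only an illustration in Python, not part of the SAS program. The mean squares are copied from the output, with the second numerator truncated where the source is not legible. The critical values are the ones given for 6 and 16 degrees of freedom in Prob. 12.23, on the assumption that the same degrees of freedom apply here. The names `tests`, `granger_f`, and `results` are hypothetical.

```python
# Mean squares from the GRANGXY and GRANGYX test tables in Prob. 12.17.
# The second numerator is truncated to one decimal because the remaining
# digits are not legible in the source.
tests = {
    "X Granger-causes Y": (17.71606, 6.82435),
    "Y Granger-causes X": (3452.6, 6447.29086),
}

# Critical values of F with 6 and 16 degrees of freedom, taken from the
# answers to Prob. 12.23 (assumed to apply to this test as well).
F_CRIT_5 = 2.74
F_CRIT_1 = 4.20


def granger_f(numerator_ms, denominator_ms):
    """Return the F statistic as the ratio of the two mean squares."""
    return numerator_ms / denominator_ms


results = {}
for label, (num_ms, den_ms) in tests.items():
    f_stat = granger_f(num_ms, den_ms)
    reject_5 = f_stat > F_CRIT_5
    results[label] = (round(f_stat, 2), reject_5)
    print(f"{label}: F = {f_stat:.2f}, reject at 5%: {reject_5}")
```

Both ratios fall below 2.74, which agrees with the conclusion that neither variable Granger-causes the other at the 5% level.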
12.25 In Prob. 12.17, we see X Granger-causes Y at the 10% level of significance. From the output (a) What is the short-run effect of X on Y? (b) What is the long-run effect of X on Y?
Ans. (a) 0.02695 (b) -0.00668

Econometrics Examination

1. Table 1 gives the quantity supplied of a commodity Y at various prices X, holding everything else constant. (a) Estimate the regression equation of Y on X. (b) Test for the statistical significance of the parameter estimates at the 5% level of significance. (c) Find $R^2$ and report all previous results in standard summary form. (d) Predict Y and calculate the 95% confidence or prediction interval for X = 10.

Table 1 Quantity Supplied at Various Prices
[Rows n = 1, ..., 8, Y, and X; the values are not legible.]

2. Suppose that from 24 yearly observations on the quantity demanded of a commodity in kilograms per year Y, its price in dollars $X_1$, consumer's income in thousands of dollars $X_2$, and the price of a substitute commodity in dollars $X_3$, the following estimated regression is obtained, where the numbers in parentheses represent standard errors:

$\hat{Y} = 13 - 7X_1 + 2.4X_2 - 4X_3$
              (2)        (0.8)       (18)

(a) Indicate whether the signs of the parameters conform to those predicted by demand theory. (b) Are the estimated slope parameters significant at the 5% level? (c) Find $R^2$ if $\sum y^2 = 40$ [the sums of cross products given with it are not legible] (where small letters indicate deviations from the means). (d) Find $\bar{R}^2$. (e) Is $R^2$ significantly different from zero at the 5% level? (f) Find the standard error of the regression. (g) Find the coefficient of price and income elasticity of demand at the means, given $\bar{Y} = 32$, $\bar{X}_1 = 8$, and $\bar{X}_2 = 16$.

3. When the level of business expenditures for new plants and equipment of nonmanufacturing firms in the United States, $Y_t$, from 1960 to 1979 is regressed on the GNP, $X_{1t}$, and the consumer price index, $X_{2t}$, the following results are obtained:

$\hat{Y}_t = 5.75 + 0.08X_{1t} - 0.58X_{2t}$     $R^2 = 0.98$     $d = 0.77$
[The values in parentheses under the coefficients are not legible.]

(a) How do you know that autocorrelation is present? What is meant by autocorrelation? Why is autocorrelation a problem? (b) How can you estimate $\rho$, the coefficient of autocorrelation? (c) How can the value of $\rho$ be used to transform the variables in order to correct for autocorrelation? How do you find the first value of the transformed variables? (d) Is there any evidence of remaining autocorrelation from the following results obtained by running the regression on the transformed variables (indicated by an asterisk)?

$\hat{Y}^*_t = 3.79 + 0.08X^*_{1t} - [\text{not legible}]\,X^*_{2t}$     $R^2 = 0.96$     $d = 0.88$

What could be the cause of any remaining autocorrelation? How could this be corrected?

4. The following two equations represent a simple macroeconomic model:

$R_t = a_0 + a_1 M_t + a_2 Y_t + u_{1t}$
$Y_t = b_0 + b_1 R_t + u_{2t}$

where R is the interest rate, M is the money supply, and Y is income. (a) Why is this a simultaneous-equations model? Which are the endogenous and exogenous variables? Why would the estimation of the R and Y equations by OLS give biased and inconsistent parameter estimates? (b) Find the reduced form of the model. (c) Is this model underidentified, overidentified, or just identified? Why? What are the values of the structural coefficients? What is an appropriate estimation technique for the model? Explain this technique. (d) If the first, or R, equation included $Y_{t-1}$ as an additional explanatory variable, would this model be identified, overidentified, or underidentified? What are the values of the structural slope coefficients? What would be an appropriate estimation technique? Explain this technique.

5. The ARIMA procedure in SAS gives the following output for a data set of 220 time-series observations. (a) What type of time-series process do the data seem to follow? (b) Calculate the Box-Pierce statistic up to 20 lags. (c) Is there evidence of statistically significant time-series correlations at the 5% level of significance? (d) How would one choose the exact order of correlation to correct for?
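The Box-Pierce statistic asked for in part (b) is the number of observations times the sum of squared sample autocorrelations, compared with a chi-square critical value whose degrees of freedom equal the number of lags. A minimal sketch in Python follows, only to show the arithmetic. The five autocorrelations are made-up illustrative values, not the ones in the ARIMA output, and the names `box_pierce`, `acf`, and `CHI2_5PCT_5DF` are hypothetical.

```python
def box_pierce(autocorrelations, n_obs):
    """Return Q = T * (sum of squared autocorrelations for lags 1..m)."""
    return n_obs * sum(r * r for r in autocorrelations)


# Hypothetical sample autocorrelations for lags 1 to 5 (illustration only).
acf = [0.60, 0.35, 0.20, 0.10, 0.05]
T = 220  # number of observations, as in the examination question

q = box_pierce(acf, T)

# 5% critical value of the chi-square distribution with 5 degrees of
# freedom, read from a standard chi-square table.
CHI2_5PCT_5DF = 11.07

print(f"Q = {q:.2f}; reject the null of no correlation: {q > CHI2_5PCT_5DF}")
```

With 20 lags, as in the question, the list would hold 20 autocorrelations and the critical value would be the one for 20 degrees of freedom.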
    The ARIMA Procedure
    Mean of Working Series    0.003707
    Number of Observations    220
    [Autocorrelation and partial autocorrelation tables and plots; the values are not legible.]

Answers

1. (a) $\hat{Y}_t = 6.44 + 0.82X_t$ [the supporting computations are not legible].
(b) [The standard errors and t ratios are not legible.] $\hat{b}_0$ is significant at the 5% level, and $\hat{b}_1$ is also significant at the 5% level.
(c) $R^2 = 0.6105$, or 61.05% [the summary form is not legible].
(d) $\hat{Y}_F = 6.44 + 0.82(10) = 14.64$, with $s_F^2 = 2.67$ and $s_F = 1.63$. Therefore, the 95% confidence or prediction interval for $Y_F$ is given by $Y_F = 14.64 \pm 2.45(1.63)$, where $t_{0.025} = 2.45$ with $n - k = 8 - 2 = 6$ df, so that we are 95% confident that $10.65 \le Y_F \le 18.63$.

2. (a) Consumer demand theory postulates that the quantity demanded of a commodity is inversely related to its price but directly related to consumers' income (if the commodity is a normal good) and to the price of substitute commodities. Thus the signs of $b_1$ and $b_2$ conform, but the sign of $b_3$ does not conform to that predicted by demand theory.
(b) $t_1 = -7/2 = -3.5$, $t_2 = 2.4/0.8 = 3$, and $t_3 = -4/18 \cong -0.22$. Therefore, $b_1$ and $b_2$ are statistically significant at the 5% level, but $b_3$ is not.
(c) $R^2 = 0.9500$, or 95% [the computation is not legible].
(d) $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k} = 1 - 0.05(1.15) = 0.9425$, or 94.25%.
(e) Since $F_{3,20} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} = \frac{0.95/3}{0.05/20} \cong 126.67$, $R^2$ is significantly different from zero at the 5% level.
(f) Since $R^2 = 1 - \sum e^2/\sum y^2$, it follows that $\sum e^2 = (1 - R^2)\sum y^2 = (0.05)(40) = 2$. Thus $s = \sqrt{\sum e^2/(n-k)} = \sqrt{2/20} \cong 0.32$.
(g) $\eta_{X_1} = b_1(\bar{X}_1/\bar{Y}) = -7(8/32) = -1.75$ and $\eta_{X_2} = b_2(\bar{X}_2/\bar{Y}) = 2.4(16/32) = 1.2$.

3. (a) Evidence of the presence of autocorrelation is given by the very low value of the Durbin-Watson statistic d. Autocorrelation refers to the case in which the error term in one time period is associated with the error term in any other period. The most common form of autocorrelation in time-series data is positive first-order autocorrelation. With autocorrelation, the OLS parameters are still unbiased and consistent, but the standard errors of the estimated regression parameters are biased, leading to incorrect statistical tests and biased confidence intervals.
(b) An estimate of the coefficient of autocorrelation $\rho$ can be obtained from the coefficient of $Y_{t-1}$ in the following regression:

$Y_t = b_0(1-\rho) + \rho Y_{t-1} + b_1X_{1t} - b_1\rho X_{1,t-1} + b_2X_{2t} - b_2\rho X_{2,t-1} + v_t$

(c) The value of the transformed variables to correct for autocorrelation can be found as follows (where the asterisk refers to the transformed variables):

$Y^*_t = Y_t - \hat\rho Y_{t-1}$     $X^*_{1t} = X_{1t} - \hat\rho X_{1,t-1}$     $X^*_{2t} = X_{2t} - \hat\rho X_{2,t-1}$
$Y^*_1 = Y_1\sqrt{1-\hat\rho^2}$     $X^*_{11} = X_{11}\sqrt{1-\hat\rho^2}$     $X^*_{21} = X_{21}\sqrt{1-\hat\rho^2}$

(d) Since d remains very low, evidence of autocorrelation remains even after the adjustment. In this case, autocorrelation is very likely due to the fact that some important explanatory variables were not included in the regression, to improper functional form, or more generally to biased model specification. Therefore, before transforming the variables in an attempt to overcome autocorrelation, [the remainder of this answer and the answers to 4(a) and 4(b) are missing in the source].

4. (c) [The beginning of this answer is missing in the source.] ... cannot be found because the R equation is underidentified. An appropriate technique for estimating the exactly identified Y equation is indirect least squares (ILS). This involves OLS estimation of the $R_t$ reduced-form equation and then use of $\hat{R}_t$ to estimate the structural equation. When this is done, $b_1$ is consistent.
(d) If the first, or R, equation included the additional variable $Y_{t-1}$, the first equation would continue to be underidentified, but the second equation would now be overidentified. Two different values of $b_1$ can be calculated from the reduced-form coefficients, but it would be impossible to calculate any of the structural slope coefficients of the unidentified R equation. An appropriate technique for estimating the overidentified Y equation is two-stage least squares (2SLS). This involves first regressing $R_t$ on $M_t$ and $Y_{t-1}$, and then using $\hat{R}_t$ to estimate the Y structural equation.
When this is done, $b_1$ is consistent.

5. (a) The large correlations at the first and tenth lag indicate the presence of time-series correlations. The spike at one lag fades away slowly, and the partial correlation at one lag leaves quickly, indicating AR(1). The tenth lag is more troublesome since it exhibits features of AR in the correlations, but the partial correlation is not clear. The combination of the two effects makes diagnosis more difficult.
(b) With T = 220, the Box-Pierce statistic is $Q = T\sum ACF^2 = 649.56$.
(c) The critical value of the chi-square distribution with 20 df is 31.41 at the 5% level of significance. Since Q = 649.56 > 31.41, we reject the null of no correlations. Therefore the correlations are statistically significant.
(d) One could try possible specifications and take the one with the lowest AIC. For our case, we try AR(1,10), AR(1) and MA(10), and MA(1) and MA(10), since we have an idea of the lag lengths but not the process. We do this by adding the following procedure in our SAS program:

    proc arima;
    i var=y;
    e p=(1)(10);       /* AR(1) and AR(10) */
    e p=(1) q=(10);    /* AR(1) and MA(10) */
    e q=(1)(10);       /* MA(1) and MA(10) */

The resulting AIC is 670.97, 644.38, and 786.79, respectively, telling us that the second model of AR(1) and MA(10) is the best specification.

APPENDIX 1
Binomial Distribution
[Table of binomial probabilities P(X | n, p), arranged by n, X, and p; the values are not recoverable.]
Example: P(X = 3 | n = 5, p = 0.30) = 0.1323

APPENDIX 2
Poisson Distribution
[Table of values of $e^{-\lambda}$; the values are not recoverable.]

APPENDIX 3
Standard Normal Distribution
Proportions of Area for the Standard Normal Distribution
[Table of areas by z; the values are not recoverable.]
Example: For z = 1.96, shaded area is 0.4750 out of the total area of 1.0.

APPENDIX 4
Table of Random Numbers
[The values are not recoverable.]

APPENDIX 5
Student's t Distribution
Proportions of Area for the t Distributions
[Table of t values by degrees of freedom and tail area; the values are not recoverable.]
Example: For the shaded area to represent 0.05 of the total area of 1.0, value of t with 10 degrees of freedom is 1.812.
Source: From Table III of Fisher and Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed., 1974, published by Longman Group Ltd., London (previously published by Oliver & Boyd, Edinburgh), by permission of the authors and publishers.

APPENDIX 6
Chi-Square Distribution
Proportions of Area for the Chi-Square Distributions
[Table of chi-square values by degrees of freedom and proportion of area; the values are not recoverable.]
Example: For the shaded area to represent 0.05 of the total area of 1.0 under the density function, the value of chi-square is 18.31 when df = 10.
Source: From Table IV of Fisher and Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed., 1974, published by Longman Group Ltd., London (previously published by Oliver & Boyd, Edinburgh), by permission of the authors and publishers.

APPENDIX 7
F Distribution
[Table of F values by numerator and denominator degrees of freedom; the values are not recoverable.]
APPENDIX 8
Durbin-Watson Statistic
[Table of lower and upper critical values of d; the values are not recoverable.]
Source: J. Durbin and G. S. Watson, "Testing for Serial Correlation in Least Squares Regression," Biometrika, vol. 38, pp. 159-177, 1951. Reprinted with the permission of the authors and the Biometrika trustees.

APPENDIX 9
Wilcoxon W
Wilcoxon Signed Rank Test: Left-Tail Critical Values
[Table by n and one-tail or two-tail probability; the values are not recoverable.]
Source: R. L. McCornack, "Extended Tables of the Wilcoxon Matched Pairs Signed Rank Statistics," J. Am. Stat. Assoc., 60 (1965), pp. 864-871. For larger sample sizes, standard normal tables can be used for the test statistic [formula not legible].

Wilcoxon Signed Rank Test: Left- and Right-Tail Critical Values (Two-Sample Test), 5% and 1% significance levels (2.5% and 0.5% for one-tail test); $n_1$ is the smaller sample
[The values are not recoverable.]
Source: F. Wilcoxon and R. A. Wilcox, Some Rapid Approximate Statistical Procedures, American Cyanamid Company, 1964. For larger sample sizes, standard normal tables can be used for the test statistic [formula not legible].

APPENDIX 10
Kolmogorov-Smirnov Critical Values
Kolmogorov-Smirnov Critical Values for Various Significance Levels
[The values are not recoverable.]

APPENDIX 11
ADF Critical Values
Augmented Dickey-Fuller (ADF) Test Left-Hand Critical Values (t test) and Right-Hand Critical Values (F Test): 5% Level of Significance
[Columns: T; No Intercept, No Trend; Intercept, No Trend; Intercept, Trend; F Statistic. The values are not recoverable.]
Source: W. A. Fuller, Introduction to Statistical Time Series, Wiley, New York, 1976; D. A. Dickey and W. A. Fuller, "Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root," Econometrica, vol. 49, 1981, pp. 1057-1072.

APPENDIX 12
Data Sources on the Web
The following are data sources on the Web (the Web addresses are not legible):
Sachs and Warner Openness Dates
World Bank Data and Current World Development Indicators
St. Louis Federal Reserve, FRED: Time-Series Data Base
Bureau of Labor Statistics
Federal Reserve Board of Governors
Statistical Abstract of the United States
Economic Report of the President
Penn-World Tables
NASA Goddard Institute for Space Studies
New York Stock Exchange
Yahoo.com Stock Quotes
Since Websites often change, we will keep an updated list at the text Website.
Bayes' theorem
Behavioral (structural) equations
Best linear unbiased estimators (see BLUE)
Bias
Biased estimators:
  Koyck lag model and
Binary choice models
Binomial distribution:
  as discrete probability distribution
  in estimation
  in hypothesis testing
  normal distribution and
  Poisson distribution and
BLUE (best linear unbiased estimators):
  in multiple regression analysis
  in simple regression analysis
Box-Pierce statistic
Causality
Central-limit theorem
Central tendency
Chebyshev's theorem (inequality)
Chi-square test of goodness of fit and independence
Class boundaries (exact limits)
Classical (a priori) probability
Cluster sampling
Cobb-Douglas production function
Coefficients (See also specific coefficients)
Cointegration
Collection of data
Collinear independent variables
Conditional forecast
Conditional probability
Confidence interval:
  autocorrelation and
  in estimation
  in forecasting
  for the mean using t distribution
  in multiple regression analysis
  in simple regression analysis
Confidence level:
  in estimation
  in hypothesis testing
Confidence limits
Consistency
Consistent estimators
Continuous distribution
Continuous probability distribution
Continuous random variables (see Probability distribution)
Continuous variables
Correlation, coefficient of:
  multicollinearity and
  in simple regression analysis
Correlogram
Counting techniques
Covariance
Critical region (see Rejection region)
Cross-sectional analysis
Cross-section data
Cumulative frequency distribution
Cumulative normal function (probit model)
Data formats
Degrees of freedom:
  in estimation
  in forecasting
  heteroscedasticity and
  in hypothesis testing
  in multiple regression analysis
  in simple regression analysis
Demand function
Dependent events:
  multiplication rule for
Dependent variables:
  autocorrelation and
  in distributed lag models
  endogenous variables as
  and errors in variables
  in forecasting (See also Forecasting)
  in multiple regression analysis (See also Multiple regression analysis)
  in simple regression analysis (See also Simple regression analysis)
  (See also Simultaneous-equations methods)
Descriptive statistics:
  frequency distributions in
  measures of central tendency in
  measures of dispersion in
Determination, coefficient of:
  in autocorrelation
  multicollinearity and
  in multiple regression analysis
  in simple regression analysis
Discrete distributions
Discrete random variables
Disjoint (mutually exclusive) events
Dispersion
Distributed lag models
Distributions (See also specific distributions)
Distribution curve (ogive)
Disturbance (see Error term)
Double-log form (model)
Double-log linear model (form)
Dummy variables
Durbin two-stage method
Durbin-Watson statistic
Econometric criteria
Econometrics:
  methodology of
  statistics and
Econometrics examination
Economic theory
Efficient (best unbiased) estimators
Empirical sampling distribution of the mean
Endogenous variables
Error correction
Error sum of squares (ESS):
  in simple regression analysis
Error term (stochastic term, disturbance):
  autocorrelation and
  in distributed lag models
  and errors in variables
  forecasting errors and
  in multiple regression analysis
  in simple regression analysis
Errors in variables
ESS (see Error sum of squares)
Estimate(s):
  defined
  standard error of the
  in simple regression analysis
Estimated demand function
Estimated parameters:
  functional form and
  in multiple regression analysis
Estimation:
  using t distribution
  indirect least squares
  sampling distribution of the mean
  using normal distribution
  (See also Forecasting)
Estimators:
  defined
  in multiple regression analysis
  in simple regression analysis
  (See also specific types of error and estimator)
Eviews
Exact limits (class boundaries)
Exact linear relationship
Exactly identified equations
Exogenous variables
Expected frequencies
Expected value:
  of binomial distribution
  of continuous probability distribution
  of Poisson distribution
Explained variation (see Regression sum of squares)
Explanatory variables (see Independent variables)
Exponential distribution
F distribution:
  heteroscedasticity and
  in hypothesis testing
  in multiple regression analysis
Fixed format
Forecast-error variance
Forecasting
Frequency distributions (See also Relative frequency distribution)
Frequency polygon
Functional form
Gauss-Markov theorem
Geometric mean
Goldfeld-Quandt test for heteroscedasticity
Goodness of fit:
  test of independence and
  in hypothesis testing
  in simple regression analysis
Granger causality
Grouped data
Harmonic mean
Heteroscedasticity
High multicollinearity
Histogram
Homoscedastic disturbances
Hypergeometric distribution
Hypothesis testing:
  analysis of variance
  goodness of fit and independence
  defined
  for differences between two means or proportions
  of overall significance of regression
  about population mean and proportion
  (See also Multiple regression analysis; Simple regression analysis)
ILS (indirect least squares)
Income elasticity
Independent events:
  multiplication rule for
Independent (explanatory) variables:
  autocorrelation and
  in distributed lag models
  and errors in variables
  exogenous variables as
  in forecasting (See also Forecasting)
  heteroscedasticity and
  hypothesis testing and
  lagged (See also Simultaneous-equations methods)
  multicollinearity and
  in multiple regression analysis (See also Multiple regression analysis)
  qualitative, dummy variables as
  qualitative dependent variables and
  in simple regression analysis (See also Simple regression analysis)
Indirect least squares (ILS)
Inductive reasoning
Inferential statistics (See also Estimation; Hypothesis testing)
Interquartile range
Interval estimate
Inverse least squares
Joint moment
Joint probability
Kolmogorov-Smirnov test
Koyck lag model
Kruskal-Wallis test
Kurtosis
Lack of bias
Lagged variables (See also Simultaneous-equations methods)
Left-tail test
Leptokurtic curve
Likelihood ratio index
Linear regression analysis (See also Regression analysis)
Linear relationship
Log-likelihood function
Logit model (logistic function)
Mathematics
Matrix notation
Maximum likelihood
Mean(s):
  in analysis of variance
  of binomial distribution
  confidence interval for the, using t distribution
  in descriptive statistics
  hypothesis testing for differences between two means or proportions
  of normal distribution as continuous probability distribution
  in Poisson distribution
  sampling distribution of the (see Sampling distribution of the mean)
  in simple regression analysis
  (See also Estimation; Expected value; specific means)
Mean absolute deviation (MAD)
Mean square error (MSE):
  in hypothesis testing
  in simple regression analysis
Median
Microsoft Excel
Mode
Moving average
MSE (see Mean square error)
Multicollinearity
Multiple events
Multiple regression analysis:
  coefficient of multiple determination in
  forecasting in
  partial correlation coefficients in
  tests of significance of parameter estimates in
  three-variable linear model
Multiplication rule:
  for dependent events
  for independent events
Mutually exclusive (disjoint) events
Negative correlation
Negative linear relationship
Negatively skewed distribution
Nonlinear estimators
Nonlinear functions
Nonlinear regression analysis
Nonparametric testing
Normal distribution:
  as continuous probability distribution
  of error term in simple regression analysis
  in estimation
  in hypothesis testing
  in simple regression analysis
  standard
Normal equations
Null hypothesis:
  in hypothesis testing
  in multiple regression analysis
  in simple regression analysis
Observed frequencies
OC (operating-characteristic) curve
Ogive (distribution curve)
OLS (see Ordinary least-squares method)
One-factor (one-way) analysis of variance
One-tail test
One-way (one-factor) analysis of variance
One-way ANOVA table
Operating-characteristic (OC) curve
Order condition
Ordinary least-squares estimators (See also BLUE)
Ordinary least-squares method (OLS):
  Almon lag model and
  autocorrelation and
  distributed lag models and
  errors in variables and
  forecasting and
  functional form and
  heteroscedasticity and
  indirect least squares and
  multicollinearity and
  in multiple regression analysis
  nonlinear functions and
  qualitative dependent variable and
  simultaneous-equations methods and
Overidentified equations
Parameter(s):
  in simple regression analysis
  statistics and
  (See also specific parameters)
Parameter estimates:
  in multiple regression analysis
  tests of significance of, in simple regression analysis
  (See also Estimated parameters)
Partial autocorrelation function (PACF)
Partial correlation coefficients
Pearson's coefficient of skewness (see Skewness, coefficient of)
Percentiles
Perfect linear relationship
Perfect multicollinearity
Permutations
Personalistic (subjective) probability
Platykurtic curve
Polynomial function
Population:
  defined
  grouped
  ungrouped
Population mean:
  in estimation
  in hypothesis testing
Population parameters:
  in simple regression analysis
Positive linear relationship
Positively skewed distribution
Power of test
Prediction and forecasting (See also Forecasting)
Price elasticity
Probability:
  of multiple events
  of single events
Probability distribution:
  continuous
  binomial distribution as discrete (See also Binomial distribution)
  normal distribution as continuous
  Poisson distribution as
Probability theory
Probit model (cumulative normal function)
Qualitative dependent variable
Qualitative explanatory variable
Quartile deviation
Quartiles
R² (see Determination, coefficient of)
Random disturbance (see Error term)
Random sampling:
  in estimation
  in hypothesis testing
  sampling distribution of the mean
  simple, defined
  in simple regression analysis
Random variables:
  in binomial distribution
  continuous
  discrete
Random walk:
  with drift
Randomized design, completely
Range
Rank condition
Rank (Spearman's) correlation coefficient
Reciprocal function
Recursive models
Reduced-form coefficients
Reduced-form equations
Regression analysis:
  autocorrelation as problem in
  dummy variables in
  errors in variables as problems in
  heteroscedasticity as problem in
  multicollinearity as problem in
  multiple regression analysis in (see Multiple regression analysis)
  simple regression analysis in (see Simple regression analysis)
Regression sum of squares (RSS)
Rejection region:
  in autocorrelation
  in hypothesis testing
  in multiple regression analysis
  in simple regression analysis
Relative frequency distribution
Representative sample
Residual variance
RSS (see Regression sum of squares)
Samples
Sampling distribution of the mean
Scatter diagram
Simple regression analysis
Simultaneous-equations methods
Skewness, coefficient of
Standard deviation:
  of continuous probability distribution
  of the estimate
  in estimation
  in hypothesis testing
  in multiple regression analysis
  in normal distribution
  sampling distribution of the mean
  in simple regression analysis
Statistical inference
  (See also Estimation; Hypothesis testing)
Statistics:
  and econometrics
Stepwise multiple regression analysis
Stochastic disturbance (see Error term)
Stochastic equations
Stochastic explanatory variables (see Independent variables)
Stochastic term (see Error term)
Stratified sampling
Structural coefficients
Structural (behavioral) equations
Structural parameters
Student's t distribution (see t distribution)
Subjective (personalistic) probability
Sum of absolute deviations
Sum of deviations
Sum of squared deviations
Sum of squares (SS)
Systematic sampling
t (Student's t) distribution:
  confidence interval for the mean using
  in estimation
  in forecasting
  in hypothesis testing
  proportions of area for
  in simple regression analysis
Text formats
Theorem 1 (sampling distribution of the mean)
Theorem 2 (sampling distribution of the mean)
Theoretical sampling distribution of the mean
Third moment
Three-variable linear model
Time-series analysis
Time-series data
Trend stationary
Total sum of squares (TSS):
  in hypothesis testing
  in multiple regression analysis
  in simple regression analysis
Tree (sequential) diagram
TSS (see Total sum of squares)
Two-factor ANOVA table
Two-stage least squares (2SLS)
Two-tail test
Two-variable linear model (See also Simple regression analysis)
Two-way (two-factor) analysis of variance
Type I error
Type II error
Unbiased estimates:
  in forecasting
  in hypothesis testing
  in simple regression analysis
  qualitative dependent variable and
Unbiased point estimate
Underidentified equations
Unexplained residuals
Ungrouped data
Variables (see specific variables)
Variance:
  analysis of
  best unbiased or efficient
  binomial distribution and
  of error term in simple regression analysis
  defined
  forecast error
  heteroscedasticity and error term
  in multiple regression analysis
  in Poisson distribution
  residual (see Residual variance)
  in simple regression analysis
Variation, coefficient of
Venn diagram
Vertical deviations
Weighted average (mean)
Weighted mean (average)
Weighted regression
White noise
Wilcoxon signed rank test:
  for two samples