
Econometrics 30C00200 SPRING 2012
Antti Saastamoinen, Aalto University School of Economics
Antti.Saastamoinen@aalto.fi

PROBLEM SET 1
24.1. & 26.1.
(To keep you entertained, the problems have titles that are supposed to be funny.)

PROBLEM 1: Not Back to Basics since we have not covered basics yet
Answer or comment on each of the following questions/statements with ONE sentence. One point for each part.
a) What is the difference between the residuals and the disturbances?
b) The true underlying model is generally unknown. (TRUE or FALSE)
c) Controlling for relevant factors in our estimation is important, since ...
d) The data at hand (sample) can be considered to be given (fixed).
e) What is the distinction between an estimator and an estimate?

PROBLEM 2: Regression by means of means
Consider two random variables Y and X taking the values given in Table I below.

TABLE I
Y   2   4   3   6   4   7   6   9
X   1   1   2   2   3   3   4   4

Then consider the following two random variables Y* and X* taking the values in Table II below.

TABLE II
Y*  3   4.5 5.5 7.5
X*  1   2   3   4
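Tables I and II are small enough to type into Stata by hand. The following is a minimal sketch, assuming the lowercase variable names y and x for Table I and the made-up names ystar and xstar for Y* and X*. It only loads the two small data sets and runs one regression on each, and it is not necessarily how the course intends the data to be entered.

```stata
* Table I: eight observations typed in by hand
clear
input y x
2 1
4 1
3 2
6 2
4 3
7 3
6 4
9 4
end
reg y x

* Table II: a separate data set, so clear memory first
* (ystar and xstar are assumed names for Y* and X*)
clear
input ystar xstar
3   1
4.5 2
5.5 3
7.5 4
end
reg ystar xstar
```

Running the two parts one after the other leaves only Table II in memory, so note down the first regression output before the second `clear`.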

Run a regression on each data set, so that Y is explained by X and Y* is explained by X*. What do you observe about the slope coefficients and the intercepts of the two regressions? Explain your findings. Also report the estimation results. I advise you to pay attention only to the column of coefficients in the regression output tables. Do not mind the other numbers yet.

STATA HELP: reg y x

PROBLEM 3: Derivation of an OLS estimator i.e. some Math
Consider next the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The subscript i refers to the ith observation when we have n observations. Do the following.
a) Derive the OLS estimators of $\beta_0$ and $\beta_1$. (3 p)
b) Show that the sample average $(\bar{Y}, \bar{X})$ is on the estimated regression line. (2 p)

PROBLEM 4: Familiarizing with the Data (not the pale Dude in Star Trek) and some regression
Obtain the WAGE1.dta dataset from the course webpage. This dataset has 526 observations (persons), and the variables describe the earnings and the background characteristics of the persons. Obviously, the main purpose of the dataset has been to study what factors contribute to potential wage differences among people. Differences may arise, for instance, from different education levels, or there may be some sectoral differences in wages. This is a dataset from the United States (from the year 1976), so the earnings are in dollars (1976 dollars to be specific, but this does not matter for us). Although we are already eager to do some mystic regression stuff, it is important first to know the characteristics of the data. Thus here we first do a lot of descriptive statistics and then a small regression. 1 p for each part.
a) Type the command describe in Stata to get the description of the variables. Obtain descriptive statistics for the following variables: wage, educ, exper, and tenure. Based on the means of these variables, what does the average person in the sample look like?
b) Obtain the same descriptive statistics separately for males and females. Do males earn more on average? What do you conclude from the differences between males and females in the three other variables?
c) Obtain a histogram of the variable wage. How are the wages distributed in this sample? Is the distribution skewed in some direction?

d) Obtain the correlation coefficient between wage and educ. What does this coefficient tell you?
e) Do a simple regression analysis where the average hourly wage of a person is explained by the years of schooling. Interpret the coefficient for schooling. Also report the regression table.

STATA HELP: commands to apply in each part
a) describe
   sum wage educ exper tenure
b) sort female
   by female: sum wage educ exper tenure
c) histogram wage
d) correlate wage educ
e) reg wage educ

PROBLEM 5: Fertilizing your interest for regression
Just to give you an idea that regression analysis is widely applied in very different contexts, the following example is based on soybean yield data (). Consider a simple regression where the soybean yield is the dependent variable and the amount of fertilizer is the independent variable. That is, the model is:

$yield = \beta_0 + \beta_1 \cdot fertilizer + \varepsilon$

Answer the following questions. Report the results.
a) In what sense can the above model be related to the concept of conditional expected value, that is, $E(yield \mid fertilizer)$? (2 p)
b) What relevant factors are likely to be included in the error term $\varepsilon$? (1 p)
c) Interpret the coefficient estimate $\hat{\beta}_1$ for fertilizer (variables are in levels). Do you think that the estimate for the intercept, i.e. $\hat{\beta}_0$, has a meaningful interpretation in this model (assuming that the model is correct in other respects)? (2 p)

PROBLEM 6: Bad Coefficient for Smoking Moms?
Suppose we are interested in how mothers' smoking might affect the birth weight of the child. We have run the regression below (using BWGHT.dta, if you want to try it for yourselves), where infant birth weight in ounces (bwght) is regressed on cigs (the average number of cigarettes that the mother smoked per day during the pregnancy). The sample size is 1388 (number of births).
$\widehat{bwght} = 119.77 - 0.514 \cdot cigs$

Do the following:
a) What is the predicted birth weight when cigs=0 and when cigs=20? What do you think about this difference? (1 p)
b) Again, do you think that this simple regression captures the causal relationship between birth weight and the smoking habits of a mother, or are there relevant factors which we have left outside the model? (1 p)
c) Below is the histogram of the variable cigs. Comment on the specific nature of this data. The summary statistics for the variable bwght are also provided. Comment on why the above model might have difficulties in predicting the birth weights, using the summary statistics as your guide. Based on the above model, what would be the number of cigarettes smoked if the predicted birth weight is 125 ounces? Does your finding make much sense here? (3 p)

[Figure: histogram of cigs. Horizontal axis: cigs smked per day while preg (0 to 50). Vertical axis: Density (0 to .5).]

                    birth weight, ounces
-------------------------------------------------------------
      Percentiles      Smallest
 1%           61             23
 5%           86             30
10%           93             35       Obs                1388
25%          107             38       Sum of Wgt.        1388

50%          120                      Mean           118.6996
                        Largest       Std. Dev.      20.35396
75%          132            172
90%          143            176       Variance       414.2839
95%          149            192       Skewness      -.1458657
99%          161            271       Kurtosis       6.147639
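If you want to reproduce the Problem 6 output yourself, something along the following lines should work. This is only a sketch: it assumes that BWGHT.dta is in Stata's current working directory, and the prediction line simply plugs the value 20 from part a) into the coefficients stored by the regression.

```stata
* Assumption: BWGHT.dta is in the current working directory
use BWGHT.dta, clear

* Simple regression of birth weight (ounces) on cigarettes per day
reg bwght cigs

* Predicted birth weight at a chosen number of cigarettes, here 20,
* using the intercept and slope stored by the last regression
display _b[_cons] + _b[cigs]*20

* Histogram of cigs and detailed summary statistics of bwght
histogram cigs
summarize bwght, detail
```

Changing the 20 in the display line to another value gives the prediction for that number of cigarettes.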
