Jerome B. Capalac
I. Write the equations for the true and estimated relationships between X and Y.
True equation: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Estimated equation: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
II. Why do we NOT simply take the sum of the deviations without squaring them?
We do not simply sum the deviations because deviations from the sample mean
always sum to zero: the positive deviations above the mean exactly offset the
negative deviations below it. This is a property of the sample mean, so the sum
(and therefore the average) of the raw deviations says nothing about how spread
out the data are. The goal, however, is to capture the magnitude of these
deviations in a summary measure. To keep the deviations from cancelling, we can
either take absolute values or square each deviation from the mean. Both methods
address the problem; squaring the deviations is the more popular of the two.
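The cancellation can be checked numerically. The short sketch below uses a made-up five-number sample (an assumption for illustration, not data from this assignment) and compares the raw, absolute, and squared deviations.

```python
# Hypothetical sample chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
mean = sum(data) / len(data)

deviations = [x - mean for x in data]

sum_dev = sum(deviations)                   # positive and negative parts cancel
sum_abs = sum(abs(d) for d in deviations)   # magnitude via absolute values
sum_sq = sum(d * d for d in deviations)     # magnitude via squares

print("sum of deviations:", sum_dev)
print("sum of absolute deviations:", sum_abs)
print("sum of squared deviations:", sum_sq)
```

Only the last two sums carry information about spread; the first is zero for any sample.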
III. Table 1 gives the bushels of corn per acre, Yt, resulting from the use of the various amounts
of fertilizer in pounds per acre, Xt, produced on a farm in each of 10 years from 1971 to
1980.
Year n Yt Xt
1971 1 40 6
1972 2 44 10
1973 3 46 12
1974 4 48 14
1975 5 52 16
1976 6 58 18
1977 7 60 22
1978 8 68 24
1979 9 74 26
1980 10 80 32
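a. Compute the values of $\hat{\beta}_0$ and $\hat{\beta}_1$.
$\hat{\beta}_1 = \frac{\sum (X_t - \bar{X})(Y_t - \bar{Y})}{\sum (X_t - \bar{X})^2} = 1.6597$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 27.1254$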
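As a cross-check on the hand calculation, the following sketch applies the same two formulas to the Table 1 data. The variable names are illustrative choices, not part of the assignment.

```python
# Table 1 data: fertilizer (pounds per acre) and corn (bushels per acre).
x = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of cross-products and sum of squared deviations of X.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate

print("slope:", round(b1, 4))
print("intercept:", round(b0, 4))
```

With the unrounded slope the intercept comes out as 27.125; the 27.1254 reported here follows from plugging in the slope rounded to 1.6597.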
b. Write the regression equation.
$\hat{Y}_t = 27.1254 + 1.6597 X_t$
c. Interpret the regression result (in terms of the direction of the relationship of the
variable and interpretation of intercept and slope coefficient).
The relationship between bushels of corn per acre and pounds of fertilizer
per acre is positive: the more fertilizer is used, the more corn is
produced per acre.
Intercept: If no fertilizer is used (Xt = 0), predicted output is 27.1254
bushels of corn per acre.
Slope coefficient: Each additional pound of fertilizer per acre raises
predicted output by 1.6597 bushels of corn per acre.
The residuals $\hat{u}_t = Y_t - \hat{Y}_t$, with $\hat{Y}_t = 27.1254 + 1.6597 X_t$, sum to zero.
Year   n   Fertilizer (Xt)  Corn (Yt)  $\hat{Y}_t$  $\hat{u}_t$  (Xt - $\bar{X}$)  (Yt - $\bar{Y}$)
1971   1    6   40   37.0836    2.9164   -12   -17
1972   2   10   44   43.7224    0.2776    -8   -13
1973   3   12   46   47.0418   -1.0418    -6   -11
1974   4   14   48   50.3612   -2.3612    -4    -9
1975   5   16   52   53.6806   -1.6806    -2    -5
1976   6   18   58   57.0000    1.0000     0     1
1977   7   22   60   63.6388   -3.6388     4     3
1978   8   24   68   66.9582    1.0418     6    11
1979   9   26   74   70.2776    3.7224     8    17
1980  10   32   80   80.2358   -0.2358    14    23
Total      180  570  570.0000    0.0000     0     0
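The fitted values and residuals for the corn regression can be recomputed with a few lines of code. This is a sketch under the assumption that the OLS formulas from part a are used with unrounded coefficients, so individual entries may differ from hand-rounded figures in the fourth decimal.

```python
# Table 1 data: fertilizer (X) and corn (Y).
x = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]                   # Y-hat for each year
residuals = [yi - fi for yi, fi in zip(y, fitted)]    # Y minus Y-hat

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"{xi:>3} {yi:>3} {fi:9.4f} {ri:9.4f}")

# For an OLS line with an intercept this is zero up to floating-point error.
print("sum of residuals:", sum(residuals))
```

The fitted values stay within the range of the observed yields, and the residuals cancel, which is the property the table is meant to demonstrate.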
IV. The following chart shows the relationship between two variables: (SALARY = Y-variable,
EDUC = X - variable) annual salary and total years of education.
A. Enter the SALARY and EDUC data into Excel. Make a scatter plot of the data.
n   EDUC (X)  SALARY (Y)
1   11    40000
2   12    37000
3   11    34000
4    8    12000
5   12    45000
6   16    95000
7   18   100000
8   12    42000
9   12    49000
10  17   120000
$\sum X = 129$, $\sum Y = 574{,}000$
$\bar{X} = 12.9$, $\bar{Y} = 57{,}400$
B. The general equation for a linear relationship is SALARY = $\beta_0$ + $\beta_1$ EDUC. Suppose $\beta_0 = 0$ and
$\beta_1 = 10{,}000$. In Excel, show the linear relationship given the values of $\beta_0$, $\beta_1$.
If $\beta_0 = 0$ and $\beta_1 = 10{,}000$, then the equation for $\hat{Y}$ is
SALARY_HAT = 0 + 10,000 EDUC
n   EDUC (X)  SALARY (Y)  SALARY_HAT ($\hat{Y}$)
1   11    40000   110000
2   12    37000   120000
3   11    34000   110000
4    8    12000    80000
5   12    45000   120000
6   16    95000   160000
7   18   100000   180000
8   12    42000   120000
9   12    49000   120000
10  17   120000   170000
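The assignment asks for this step in Excel; as an equivalent sketch, the same column of predicted salaries can be produced in Python. The list names are illustrative.

```python
# EDUC (years of education) for the 10 observations.
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]

b0 = 0        # intercept given in part B
b1 = 10_000   # slope given in part B

# Predicted salary for each observation: SALARY_HAT = b0 + b1 * EDUC.
salary_hat = [b0 + b1 * e for e in educ]

for n, (e, s) in enumerate(zip(educ, salary_hat), start=1):
    print(n, e, s)
```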
D. You now have an estimate of the economic relationship. Calculate the explained portion of
SALARY and the unexplained, or residual, part of SALARY.
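$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
With $\hat{\beta}_1 = 0$: $\hat{\beta}_0 = 57{,}400 - (0 \times 12.9) = 57{,}400$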
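E. Provide a table of fitted and residual values for SALARYi over the 10 observations.
n   X    Y        $\hat{Y}$   $\hat{u}_i = Y_i - \hat{Y}_i$
1   11    40000   110000   -70000
2   12    37000   120000   -83000
3   11    34000   110000   -76000
4    8    12000    80000   -68000
5   12    45000   120000   -75000
6   16    95000   160000   -65000
7   18   100000   180000   -80000
8   12    42000   120000   -78000
9   12    49000   120000   -71000
10  17   120000   170000   -50000
You now want to determine how well the economic theory explains the observed relationship
between SALARY and EDUC. You also want to determine how good your best-fit line is
relative to other options.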
F. Calculate your sum of the residual values over the 10 observations. What is desirable for
this sum? What does your sum of residuals tell you about your line?
n   $\hat{u}_i = Y_i - \hat{Y}_i$
1   -70,000
2   -83,000
3   -76,000
4   -68,000
5   -75,000
6   -65,000
7   -80,000
8   -78,000
9   -71,000
10  -50,000
$\sum \hat{u}_i = -716{,}000$
Ideally, the sum of the residual values is zero, because the negative errors
cancel the positive ones. In this problem the sum is negative, and in fact every
residual is negative, so the line overpredicts SALARY for all 10 observations.
Even so, $\sum \hat{u}_i$ on its own is not a good measure of how well the
estimated line fits the actual relationship between salary and education, since
large positive and negative residuals can offset each other.
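The residual sum for the part B line can be reproduced as follows. This is a minimal sketch using the SALARY and EDUC data from part A and the coefficients 0 and 10,000 given in part B.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

# Residual = observed salary minus salary predicted by 0 + 10,000 * EDUC.
residuals = [y - (0 + 10_000 * x) for x, y in zip(educ, salary)]

total = sum(residuals)
all_negative = all(r < 0 for r in residuals)

print("residuals:", residuals)
print("sum of residuals:", total)
print("every residual negative:", all_negative)
```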
G. Calculate your sum of the squared residual values over the 10 observations. What is
desirable for this sum? What does your sum of residuals tell you about your line?
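One way to obtain the sum of squared residuals for the part B line is sketched below, again assuming the coefficients 0 and 10,000. A smaller sum indicates a line that lies closer to the observations, and squaring prevents positive and negative residuals from cancelling.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

# Square each residual from the line SALARY_HAT = 0 + 10,000 * EDUC, then add.
squared_residuals = [(y - 10_000 * x) ** 2 for x, y in zip(educ, salary)]
ssr_line = sum(squared_residuals)

print("sum of squared residuals:", ssr_line)
```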
The simple-minded alternative to the economic theory is to simply calculate the mean for
SALARY over the sample.
H. Calculate the mean for SALARY over the 10 observations. Does the mean provide as good
an explanation for the behavior of SALARY as your line? Why or why not? Be specific (what
is the sum of squared residuals)?
Salary (Y)   n
 40000    1
 37000    2
 34000    3
 12000    4
 45000    5
 95000    6
100000    7
 42000    8
 49000    9
120000   10
$\sum Y = 574{,}000$
$\bar{Y} = \sum Y / n = 574{,}000 / 10 = 57{,}400$
By definition, the mean is a measure of central tendency for a set of
observations: it summarizes where the data lie, or can be assumed to lie, for
summary purposes.
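To answer the comparison in specific terms, the sum of squared residuals around the mean can be set against the sum of squared residuals around the part B line. The sketch below assumes "your line" refers to SALARY_HAT = 0 + 10,000 EDUC.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

mean_salary = sum(salary) / len(salary)

# Residuals when every observation is predicted by the sample mean.
ssr_mean = sum((y - mean_salary) ** 2 for y in salary)

# Residuals when predictions come from the part B line.
ssr_line = sum((y - 10_000 * x) ** 2 for x, y in zip(educ, salary))

print("mean salary:", mean_salary)
print("SSR around the mean:", ssr_mean)
print("SSR around the part B line:", ssr_line)
```

On these data the mean leaves the smaller sum of squared residuals, so by this criterion it describes SALARY better than the part B line does.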
I. In the context of this example, explain why there exists an error or residual term and exactly
what it represents.
The error (residual) term is the difference between an observed SALARY and the
SALARY that the regression line predicts for that person's EDUC; it accounts for
the gap between the results of the model and the actually observed results. It
exists because years of education alone do not determine salary: influences left
out of the line, along with random variation, keep the observations from falling
exactly on it. The regression line itself summarizes the relationship between one
independent variable (EDUC) and one dependent variable (SALARY).
J. Calculate the ordinary least squares estimates for $\beta_0$ and $\beta_1$. Do not use the Excel regression
function (that's cheating), rather calculate the components and plug them into the OLS
formula. Show your work.
n   $Y_i$   $X_i$   $X_i - \bar{X}$   $Y_i - \bar{Y}$   $(X_i - \bar{X})^2$   $(X_i - \bar{X})(Y_i - \bar{Y})$   $\hat{Y}_i$   $\hat{u}_i = Y_i - \hat{Y}_i$   $\hat{u}_i^2$
1    40000  11  -1.9  -17400   3.61   33,060   36817.03    3182.97   10131290.38
2    37000  12  -0.9  -20400   0.81   18,360   47650.17  -10650.17   113426178.5
3    34000  11  -1.9  -23400   3.61   44,460   36817.03   -2817.03   7935664.78
4    12000   8  -4.9  -45400  24.01  222,460    4317.61    7682.39   59019116.82
5    45000  12  -0.9  -12400   0.81   11,160   47650.17   -2650.17   7023415.34
6    95000  16   3.1   37600   9.61  116,560   90982.74    4017.26   16138388.35
7   100000  18   5.1   42600  26.01  217,260  112649.02  -12649.02   159997754.1
8    42000  12  -0.9  -15400   0.81   13,860   47650.17   -5650.17   31924451.54
9    49000  12  -0.9   -8400   0.81    7,560   47650.17    1349.83   1822033.74
10  120000  17   4.1   62600  16.81  256,660  101815.88   18184.12   330662208.5
Totals: $\sum Y_i = 574{,}000$, $\sum X_i = 129$, $\sum (X_i - \bar{X}) = 0$, $\sum (Y_i - \bar{Y}) = 0$, $\sum (X_i - \bar{X})^2 = 86.9$, $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 941{,}400$, $\sum \hat{u}_i \approx 0$ (rounding), $\sum \hat{u}_i^2 = 738{,}080{,}552$
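$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
= 941,400 / 86.9
$\hat{\beta}_1 = 10{,}833.14$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
= 57,400 - [10,833.14(12.9)]
$\hat{\beta}_0 = -82{,}347.51$
OLS: $\hat{Y}_i = -82{,}347.525 + 10{,}833.1415 X_i$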
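The component-by-component calculation can be checked with a short script. This is a sketch that applies the same OLS formulas to the part A data without calling any regression routine; small differences in the last digits come from rounding the slope by hand.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]

n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(salary) / n

# Components of the OLS slope formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, salary))
s_xx = sum((x - x_bar) ** 2 for x in educ)

b1 = s_xy / s_xx          # slope estimate
b0 = y_bar - b1 * x_bar   # intercept estimate

print("S_xy:", round(s_xy, 2), "S_xx:", round(s_xx, 2))
print("slope:", round(b1, 4))
print("intercept:", round(b0, 4))
```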
K. Show that the OLS estimates are better than the values given in part (B) above.
Explain your answer.
OLS: $\hat{Y}_i = -82{,}347.5254 + 10{,}833.1415 X_i$
Part (B): $\hat{Y}_i = 0 + 10{,}000 X_i$
Under the classical assumptions, the OLS estimates have the following properties:
a. They are unbiased. This means that the OLS estimates of the coefficients are
centered on the true population values of the parameters being estimated.
b. They are minimum variance. This means that no other unbiased estimator has a
lower variance for each estimated coefficient than OLS.
c. They are consistent. This means that as the sample size approaches infinity, the
estimates converge on the true population parameters.
d. They are normally distributed.
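A direct way to show that the OLS line fits this sample better is to compare sums of squared residuals for the two lines. The sketch below assumes the comparison criterion is the sum of squared residuals, which is the quantity OLS minimizes; the helper name `ssr` is an illustrative choice.

```python
educ = [11, 12, 11, 8, 12, 16, 18, 12, 12, 17]
salary = [40000, 37000, 34000, 12000, 45000,
          95000, 100000, 42000, 49000, 120000]


def ssr(b0, b1):
    """Sum of squared residuals for the line b0 + b1 * EDUC."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(educ, salary))


# OLS estimates computed from the sample.
n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(salary) / n
b1_ols = (sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, salary))
          / sum((x - x_bar) ** 2 for x in educ))
b0_ols = y_bar - b1_ols * x_bar

ssr_ols = ssr(b0_ols, b1_ols)   # OLS line
ssr_b = ssr(0, 10_000)          # line from part (B)

print("SSR, OLS line:", round(ssr_ols))
print("SSR, part (B) line:", ssr_b)
```

The OLS sum of squared residuals agrees with the 738,080,552 total from part J and is far below the part (B) figure.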
V. Use the data set in WAGE2.RAW for this problem. As usual, be sure all of the following
regressions contain an intercept.
1. Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde{\delta}_1$.
N = 935
$R^2 = 0.27$
$\tilde{\delta}_1 = 3.5338$
$\widehat{IQ} = 53.69 + 3.53\,educ$
2. Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde{\beta}_1$.
N = 935
$R^2 = 0.10$
$\tilde{\beta}_1 = 0.05984$
$\widehat{\log(wage)} = 5.97 + 0.0598\,educ$
3. Run the multiple regression of log(wage) on educ and IQ, and obtain the slope
coefficients, $\hat{\beta}_1$ and $\hat{\beta}_2$, respectively.
N = 935
$R^2 = 0.13$
$\widehat{\log(wage)} = 5.66 + 0.039\,educ + 0.0059\,IQ$
$\hat{\beta}_1 = 0.03912$
$\hat{\beta}_2 = 0.00586$
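4. Verify that $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$.
$\hat{\beta}_1 = 0.03912$
$\hat{\beta}_2 = 0.0058631$
$\tilde{\beta}_1 = 0.05984$
$\tilde{\delta}_1 = 3.533829$
$\hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1 = 0.03912 + (0.0058631)(3.533829) \approx 0.05984 = \tilde{\beta}_1$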
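The identity can be checked by plugging in the reported estimates. The sketch below only redoes the arithmetic with the coefficients listed above; it does not rerun the regressions on WAGE2.RAW, and the variable names are illustrative.

```python
# Reported estimates from the three regressions.
beta1_hat = 0.03912       # educ slope, log(wage) on educ and IQ
beta2_hat = 0.0058631     # IQ slope, log(wage) on educ and IQ
delta1_tilde = 3.533829   # slope from IQ on educ
beta1_tilde = 0.05984     # slope from log(wage) on educ alone

# Right-hand side of the identity.
implied = beta1_hat + beta2_hat * delta1_tilde

print("implied simple-regression slope:", round(implied, 5))
print("reported simple-regression slope:", beta1_tilde)
print("difference:", abs(implied - beta1_tilde))
```

The two values agree up to the rounding of the reported coefficients.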