
Econometrics 206-3

Exam III: 11.50 AM -1.20 PM, 24 April 2017

In answering these below, paste the Stata output only when it is asked. When
pasting output, use the copy as picture option. When testing a hypothesis, be sure
to mention the distribution of the test statistic, its degrees of freedom, the level of
significance and the associated critical value. DO NOT USE THE STATA test
COMMAND.

It would be easiest if you inserted your answer between the questions below and
returned this document. Rename the document as 'your name.docx' and upload it
on LMS.

You have to do this exam by yourself. You are allowed to consult the textbook and
your notes. You are NOT allowed to consult anybody whether by speaking, by text
messages or email or any other means. Violations will attract penalties as per
Ashoka policy.

1. (a) Regress log of wages on a constant and the female dummy. Paste
output here.

A: On regressing the log of wages on a constant and the female dummy from the
given data set, we obtain the following result:

n=1,000 R-squared= 0.1025

(b) Interpret the coefficient on the female dummy.

A: female or the coefficient of the female dummy in the above regression denotes
the increment on log(wages) over the average male log(wage). This gives the
effect of workers being female on log(wage). In this case the sample coefficient of
the female dummy is reported as -0.6905 implying that the average female wage
is 69.05% less than the average male wage. This estimation seems imply a very
high discrimination towards women. However, we must note that in this
particular regression, since we do not have any other independent variables this
estimated effect is unlikely to be a ceteris paribus effect and more likely to be a
biased effect.
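
As a rough check on the interpretation above, the exact percentage difference
implied by a dummy coefficient in a log-wage regression can be computed as
100*(exp(b)-1). The sketch below is illustrative only: the function name is an
assumption, and the coefficient is the -0.6905 quoted in the answer.

```python
import math

def exact_percent_effect(coef):
    """Exact percentage change in the level of y implied by a dummy
    coefficient in a log(y) regression: 100 * (exp(coef) - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)

female_coef = -0.6905                       # coefficient quoted in the answer
approx = 100.0 * female_coef                # log-point approximation
exact = exact_percent_effect(female_coef)   # exact percentage difference
print(f"approximate: {approx:.2f}%, exact: {exact:.2f}%")
```

The gap between the two numbers grows with the size of the coefficient, which is
why the approximation is usually reserved for small coefficients.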
(c) Test the null hypothesis that the coefficient on female dummy is -0.5 against
the alternative that the coefficient on female dummy is less than -0.5. Show your
workings. [5+5+10]

A: To test, H0: famale= -0.5 against HA: female<-0.5:

According to convention, let us take a 5% level of significance to test this


hypothesis.

From the table given in (a), we know that (hat)female=-0.6906 and


s.e.((hat)female)= 0.0647. Since, the hypothesis is one of a single restriction, we
use the t-test. The t-statistic is given by [{ (hat)female- }/s.e.{ (hat)female}] i.e.
{-0.6906-(-0.5)}/(0.0647)= -2.946.

For the given data set, the degrees of freedom is (n-k-1) = 1000-1-1 = 998.
Since for df >= 120 the t-distribution is well approximated by the standard
normal distribution, we use the z-table to find the critical value for a
one-sided (lower-tail) test at the 5% significance level. This turns out to be
-1.65. Since the t-statistic of -2.946 lies in the critical region, we reject
the null in favor of the alternative at the 5% significance level. This implies
that the coefficient of the female dummy is statistically less than -0.5 at the
5% level.

In fact, we can see that at the 1% level of significance too, the null is rejected. The
1% level critical value from the z-table is approx. -2.33. Even for this level, the t-
statistic lies in the critical region and hence, the null is rejected.
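
The arithmetic of this one-sided test can be reproduced with a minimal sketch
like the one below. It assumes the estimate and standard error quoted in the
answer and uses the standard normal as an approximation to the t-distribution
with 998 degrees of freedom; the helper name t_statistic is illustrative.

```python
from statistics import NormalDist

def t_statistic(estimate, hypothesized, std_error):
    """t-statistic for H0: beta = hypothesized."""
    return (estimate - hypothesized) / std_error

# Values quoted in the answer above.
t_stat = t_statistic(estimate=-0.6906, hypothesized=-0.5, std_error=0.0647)

# Lower-tail critical values from the standard normal (df = 998 is large).
crit_5 = NormalDist().inv_cdf(0.05)
crit_1 = NormalDist().inv_cdf(0.01)

# Reject H0 in favor of HA: beta < -0.5 when the statistic falls below the cutoff.
reject_5 = t_stat < crit_5
reject_1 = t_stat < crit_1
print(f"t = {t_stat:.3f}, 5% cutoff = {crit_5:.3f}, 1% cutoff = {crit_1:.3f}")
print(f"reject at 5%: {reject_5}, reject at 1%: {reject_1}")
```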

2.
(a) Regress log of wages on a constant, the female dummy, age of the individual
and the square of age. Paste your output here.

A: On regressing the log of wages on a constant, the female dummy, age of the
individual and the square of age, we obtain the following result:

n= 1,000 R-squared= 0.1596


(b) Controlling for age and the square of age does not seem to substantially
change the coefficient of the female dummy. Why is that so? [5+5]

A: From the table above, we observe that the estimated coefficient on age is
0.079388 and that on agesq is -0.0008853. Controlling for age and the square of
age does not change the estimated coefficient of the female dummy much because
neither age nor the square of age has a large partial impact on the dependent
variable, i.e. log(wages), as can be seen from the table above.

Moreover, the correlation between the female dummy and age as well as the
correlation between the female dummy and the square of age is minimal. This
can be observed in the tables below.

The small partial impact of the two new variables in our regression and their
minimal correlation with the female dummy go on to tell us that the bias created
by age and the square of age in the impact of the female dummy, reported earlier
was not a very significant one. Hence, there isn’t much difference between the
coefficient of the female dummy in the more restricted model and the one in the
less restricted model including age and the square of age.

3. (a) Regress log of wages on a constant, the female dummy, age of the
individual, the square of age and the social group dummies for scheduled caste,
for scheduled tribe and for other backward caste. Note the omitted category is
the general castes (or forward castes). Paste your output here.
A: On regressing the log of wages on a constant, the female dummy, age of the
individual, the square of age, the dummy for scheduled caste, the scheduled tribe
dummy and the dummy for other backward caste (where the omitted category is
the general castes) we obtain the following result:
n= 1,000 R-squared= 0.1792

(b) Test the null hypothesis that none of the social group dummies matter, i.e.,
controlling for sex, age and square of age, the average of log wages is the same
for all categories: scheduled castes, scheduled tribes, other backward castes and
the general (forward) castes. Do NOT use the Stata test command.
A: To test H0: scd=std=obc=0 against HA: null is not true.

Since the above hypothesis is one of multiple restrictions, we use the F-test.
The F-statistic is given by {(RSSr-RSSu)/J}/{RSSu/(n-k-1)} where RSSr is the
residual sum of squares from the restricted model where scd=std=obc=0, RSSu is
the residual sum of squares from the unrestricted model, J is the number of
restrictions and n-k-1 is the residual degrees of freedom of the unrestricted
model.

From the table of the unrestricted regression in part (a), we know that RSSu=
727.152. We run a separate regression for the restricted model, from which we
find RSSr= 744.585. Here, J=3. Therefore, the F-statistic comes out to be 7.9353.
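
The F-statistic above can be reproduced from the two residual sums of squares
with a short sketch such as the following. The function name is illustrative,
and k = 6 is taken as the number of regressors in the unrestricted model
(female, age, agesq, scd, std, obc). With the rounded RSS values quoted above
the result is about 7.94, which agrees with the reported figure up to rounding.

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_resid):
    """F = [(RSSr - RSSu) / J] / [RSSu / (n - k - 1)]."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / df_resid
    return numerator / denominator

n, k = 1000, 6            # observations and regressors in the unrestricted model
df_resid = n - k - 1      # 993 residual degrees of freedom
f_stat = f_statistic(rss_restricted=744.585,
                     rss_unrestricted=727.152,
                     num_restrictions=3,
                     df_resid=df_resid)
print(f"F({3}, {df_resid}) = {f_stat:.3f}")
```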

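Using the F-table for df=(3, 993) at the 5% significance level (the numerator
degrees of freedom equal the number of restrictions, J=3), we get the critical
value equal to approximately 2.60. Since the F-statistic of 7.9353 exceeds this
value, we reject the null at the 5% level.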

(c) Test the null hypothesis that relative to the general (forward) castes,
scheduled castes and other backward castes suffer the same extent of
discrimination. If this requires new regressions, paste the output in your
answer. [5+15+15]
A: To test H0: scd=obc against HA: null is not true.
4. (a) Regress log of wages on a constant, the female dummy, age of the
individual, the square of age, the social group dummies for scheduled caste, for
scheduled tribe and for other backward caste, and the education dummies for
illiterate, literate, primary, secondary, and higher secondary. Paste the output
here.
A: On regressing the log of wages on a constant, the female dummy, age of the
individual, the square of age, the social group dummies for scheduled caste, for
scheduled tribe and for other backward caste, and the education dummies for
illiterate, literate, primary, secondary, and higher secondary we obtain the
following result:

n=1,000 and R-squared=0.3934

(b) Compare the above regression with the regression in question 3 (without the
education dummies). Does the inclusion of education dummies alter the
discrimination against women, scheduled castes, scheduled tribes and other
backward castes? Why? [5+15]
A: Yes, the inclusion of education dummies DOES alter the discrimination against
women, scheduled castes, scheduled tribes and other backward castes. We can
see that the partial effect of each of these variables diminishes in magnitude:
female goes from -.681 to -.479, scd goes from -.366 to -.133, std goes from
-.163 to -.041 and obc goes from -.243 to -.139.

This is because education has a strong effect on log(wages) and the education
dummies (illiterate, literate, primary, secondary and higher_secondary) are
presumably correlated with the female, scd, std and obc dummies. When education
is omitted, part of its effect is attributed to those dummies. Once the
education dummies are included in the regression, this omitted-variable bias in
the coefficients of the female, scd, std and obc dummies is reduced, which
causes the estimated coefficients to change.
5. (a) To the explanatory variables in the regression in Qn 4(a), add land owned
(LandO) and land possessed (LandP) and re-run the regression. DO NOT paste
the output.
A: On regressing log(wages) on the given explanatory variables including landO
and landP, we obtain the following result:

lwages(hat)= 3.943 - .4768female + .05637age - .00053agesq - .1090scd
- .03277std - .14177obc - 1.5228illiterate - 1.2110literate - 1.0845primary
- .8272secondary - .3401higher_secondary - 9.16e-06landO + .00004landP

(b) Is either of the land variables individually significant at the 5 or 10% level?
A: H0: land0=0 against HA: lando not equal to 0. Degrees of freedom is 986.

t-statistic:

H0: landP=0 against landp not equal to 0.

t-statistic: (.00004/.000046)= 0.869. Critical value at the 5% level of significance


for the two sided test is 2.81 and for the 10% level is 1.65. Since the t-statistic
does not lie in the critical region, landP is not statistically significant at the 5 or
the 10% level. Since its partial impact is also negligible, we would say that it is
also not economically very significant.
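
A two-sided significance check of this kind can be sketched as follows. The
coefficient and standard error are the rounded values quoted for landP, and the
critical values come from the standard normal, used here as an approximation to
the t-distribution with 986 degrees of freedom; the variable names are
illustrative.

```python
from statistics import NormalDist

coef, std_error = 0.00004, 0.000046   # rounded values quoted for landP
t_stat = coef / std_error             # t-statistic for H0: beta = 0

# Two-sided critical values: put alpha/2 in each tail.
crit_5 = NormalDist().inv_cdf(1 - 0.05 / 2)
crit_10 = NormalDist().inv_cdf(1 - 0.10 / 2)

significant_5 = abs(t_stat) > crit_5
significant_10 = abs(t_stat) > crit_10
print(f"|t| = {abs(t_stat):.3f}, 5% cutoff = {crit_5:.3f}, "
      f"10% cutoff = {crit_10:.3f}")
print(f"significant at 5%: {significant_5}, at 10%: {significant_10}")
```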

(c) Now drop land owned (LandO) and re-run the regression. Is the included
land variable significant at the 5 or 10% level?

(d) Explain the pattern of results observed in (b) and (c).

[4+4+7]
