Quantitative Methods for Applied Economics

Question hour Assignment 2

Teresa Bago d'Uva

Erasmus School of Economics
Department of Applied Economics

14 September 2016

Some feedback assignment 1

Just what is relevant for assignment 2, rest will be discussed in question hour for later assignments and exam

Units of variables in descriptives and interpretation

Complete answer: command(s), output, answer/interpretation

Interpretation: significance (and level); ceteris paribus; in case of dummy/categorical variables, mention reference category (and interpret all categories one by one, don't just say "worse sphealth increases expenditures"); don't just mention coefficient but interpret it

Predicted/fitted equation does not include the error term

Question 3: include interaction and understand what interaction means (difference in age effect between males and females)

Conclusion question: Don't just repeat the detailed results.

Now the doubts received about assignment 2

Do file: make sure to open data and log file correctly (and close log file)
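
A minimal do-file skeleton for this point might look as follows. The log file name is hypothetical, and `sysuse auto` (an example dataset shipped with Stata) is only a stand-in for the assignment data, which would be opened with `use` instead.

```stata
* Close any log left open by an earlier run
* (capture suppresses the error if no log is open)
capture log close

* Open a plain-text log; "assignment2.log" is a hypothetical file name
log using "assignment2.log", replace text

* Load the data; sysuse auto is only a stand-in for the assignment data
sysuse auto, clear

* ... analysis commands go here ...
summarize price

* Close the log so that all output is written to the file
log close
```

Starting with `capture log close` is a common habit: it lets the do-file be rerun after a crash without an error about a log that is still open.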

1. Fitted model/estimated equations

Example probit PT use: estimated equation for probability

Probit regression                               Number of obs     =       3000
                                                LR chi2(2)        =      54.65
                                                Prob > chi2       =     0.0000
Log likelihood = -636.96328                     Pseudo R2         =     0.0411

------------------------------------------------------------------------------
       ptuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |  -.1178419   .0782359    -1.51   0.132    -.2711814    .0354976
         age |    .013089   .0018674     7.01   0.000      .009429     .016749
       _cons |   1.201824   .0834919    14.39   0.000     1.038183    1.365465
------------------------------------------------------------------------------

$$\Pr(y = 1 \mid \text{age}, \text{urban}) = \Phi(1.201824 - 0.1178419 \cdot \text{urban} + 0.013089 \cdot \text{age})$$

where $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution
- How to compute? In exercise lecture do file.
- Or statistical table of standard normal distribution.

2. Scatterplot

Difference between black female and black male for two ages
Simply indicate in scatterplot (can even write by hand on printed graph)
Should be clear you understand what the effect is
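
Returning to the fitted probit equation in section 1: as a sketch of how it can be evaluated without a statistical table, Stata's `normal()` function returns the standard normal CDF. The values urban = 1 and age = 50 are chosen here purely for illustration; the coefficients are copied from the probit output.

```stata
* Index for an urban resident aged 50, using the estimated probit coefficients
scalar xb = 1.201824 - 0.1178419*1 + 0.013089*50

* normal() is the standard normal CDF,
* so this is Pr(y = 1 | age = 50, urban = 1)
display normal(xb)
```

The index is about 1.74, so the resulting probability is roughly 0.96. Note that no error term appears anywhere in this calculation.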

3. Linear regression model vs probit

Received one question about interpretation of graph in question 8 (typo?)

Graph with confidence intervals


Exercise lecture

Goal is to talk about effects of variables, based on what you got
in questions 2 and 3
No need for graphs
Effects of all variables
Can only compare what is comparable (think of interpretation)
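
One possible reading of "comparable" is sketched below on simulated data (not the assignment data; the sample size and the coefficients used to generate the data are assumptions for illustration only). Sign and significance can be compared directly between the two models. Magnitudes cannot, because probit coefficients are on the scale of the index inside the CDF, whereas linear regression coefficients are changes in probability.

```stata
* Simulated data for illustration only (not the assignment data)
clear
set seed 12345
set obs 3000
generate age   = 20 + floor(60*runiform())
generate urban = runiform() < 0.5
generate ptuse = (1.2 - 0.12*urban + 0.013*age + rnormal()) > 0

* Linear regression (linear probability model):
* coefficients are changes in probability
regress ptuse i.urban age, vce(robust)

* Probit: coefficients are on the index scale, so compare
* sign and significance with the linear model, not magnitudes
probit ptuse i.urban age

* Average marginal effects put the probit results on the probability scale
margins, dydx(*)
```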

4. Marginal effects

Average marginal effects

Stata commands:

. logit ptuse i.urban age
. margins, dydx(*)

Average marginal effects                        Number of obs     =       3000
Model VCE    : OIM

Expression   : Pr(ptuse), predict()
dy/dx w.r.t. : 1.urban age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      Urban  |  -.0117787   .0084613    -1.39   0.164    -.0283625    .0048051
         age |    .001381   .0002193     6.30   0.000     .0009513    .0018108
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

On average, living in an urban area decreases the probability of PT use by 1.18 pp, compared to living in a rural area (reference category), ceteris paribus. Effect insignificant at conventional levels (p = 0.164).

Marginal effects of categorical/dummy variables at values of your choice

. margins, dydx(*) at(urban=0 age=50)

Conditional marginal effects                    Number of obs     =       3000
Model VCE    : OIM

Expression   : Pr(ptuse), predict()
dy/dx w.r.t. : 1.urban age
at           : urban           =           0
               age             =          50

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      Urban  |  -.0081151   .0058582    -1.39   0.166    -.0195971    .0033668
         age |   .0008388   .0001187     7.07   0.000     .0006062    .0010714
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

In the case of dummy variables, Stata always calculates:

$$\Pr(\text{ptuse} = 1 \mid \text{urban} = 1, \text{age} = 50) - \Pr(\text{ptuse} = 1 \mid \text{urban} = 0, \text{age} = 50) = -0.0081$$

that is, a decrease of 0.8 pp. This is why the urban row is identical whether at() sets urban=0 or urban=1; only the age row changes.

. margins, dydx(*) at(urban=1 age=50)

Conditional marginal effects                    Number of obs     =       3000
Model VCE    : OIM

Expression   : Pr(ptuse), predict()
dy/dx w.r.t. : 1.urban age
at           : urban           =           1
               age             =          50

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      Urban  |  -.0081151   .0058582    -1.39   0.166    -.0195971    .0033668
         age |   .0010318   .0001194     8.64   0.000     .0007978    .0012658
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

For individuals living in an urban area and aged 50, an additional year of age increases the probability of PT use by 0.1 pp. Effect significant at the 1% level.
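
The discrete change that `margins` reports for a dummy can be reproduced by hand. The sketch below uses simulated data (not the assignment data; the data-generating coefficients are assumptions for illustration), fits the same kind of logit, and compares the difference in predicted probabilities at age 50 with what `margins` reports.

```stata
* Simulated data for illustration only (not the assignment data)
clear
set seed 12345
set obs 3000
generate age   = 20 + floor(60*runiform())
generate urban = runiform() < 0.5
generate ptuse = runiform() < invlogit(2 - 0.2*urban + 0.02*age)

logit ptuse i.urban age

* Discrete change computed by hand from the coefficients:
* Pr(ptuse = 1 | urban = 1, age = 50) - Pr(ptuse = 1 | urban = 0, age = 50)
display invlogit(_b[_cons] + _b[1.urban] + _b[age]*50) - invlogit(_b[_cons] + _b[age]*50)

* The two predicted probabilities themselves
margins urban, at(age=50)

* margins reports the same difference for the dummy
margins, dydx(urban) at(age=50)
```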

Results with i.age_categ or list of dummy variables

. logit ptuse i.urban i.age_categ

Logistic regression                             Number of obs     =      3,000
                                                LR chi2(4)        =      92.80
                                                Prob > chi2       =     0.0000
Log likelihood = -617.88621                     Pseudo R2         =     0.0698

------------------------------------------------------------------------------
       ptuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      Urban  |  -.1917626   .1653453    -1.16   0.246    -.5158335    .1323083
             |
   age_categ |
          2  |  -.4105884    .180612    -2.27   0.023    -.7645815   -.0565953
          3  |    1.02875   .2284758     4.50   0.000     .5809455    1.476554
          4  |   2.114608   .4316005     4.90   0.000     1.268687     2.96053
             |
       _cons |   2.559948   .1677366    15.26   0.000      2.23119    2.888706
------------------------------------------------------------------------------

. logit ptuse urban age20_39 age40_59 age60_plus

Logistic regression                             Number of obs     =      3,000
                                                LR chi2(4)        =      92.80
                                                Prob > chi2       =     0.0000
Log likelihood = -617.88621                     Pseudo R2         =     0.0698

------------------------------------------------------------------------------
       ptuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |  -.1917626   .1653453    -1.16   0.246    -.5158335    .1323083
    age20_39 |  -.4105884    .180612    -2.27   0.023    -.7645815   -.0565953
    age40_59 |    1.02875   .2284758     4.50   0.000     .5809455    1.476554
  age60_plus |   2.114608   .4316005     4.90   0.000     1.268687     2.96053
       _cons |   2.559948   .1677366    15.26   0.000      2.23119    2.888706
------------------------------------------------------------------------------

Model output (i.e., estimated coefficients) is exactly the same if the reference category is the same.

Marginal effects are different: those obtained after the model with i.age_categ are correct; those obtained after the model with separate dummies are wrong (without the i. prefix, margins treats each dummy as a continuous variable and does not know that the dummies are mutually exclusive categories).

Different base category?

. logit ptuse i.urban ib2.age_categ

Logistic regression                             Number of obs     =      3,000
                                                LR chi2(4)        =      92.80
                                                Prob > chi2       =     0.0000
Log likelihood = -617.88621                     Pseudo R2         =     0.0698

------------------------------------------------------------------------------
       ptuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      Urban  |  -.1917626   .1653453    -1.16   0.246    -.5158335    .1323083
             |
   age_categ |
          1  |   .4105884    .180612     2.27   0.023     .0565953    .7645815
          3  |   1.439338   .2217098     6.49   0.000     1.004795    1.873882
          4  |   2.525197   .4279787     5.90   0.000     1.686374     3.36402
             |
       _cons |    2.14936    .164105    13.10   0.000      1.82772       2.471
------------------------------------------------------------------------------

. margins, dydx(*)

Average marginal effects                        Number of obs     =      3,000
Model VCE    : OIM

Expression   : Pr(ptuse), predict()
dy/dx w.r.t. : 1.urban 1.age_categ 3.age_categ 4.age_categ

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       urban |
      Urban  |  -.0099922   .0084661    -1.18   0.238    -.0265855    .0066012
             |
   age_categ |
          1  |   .0358715   .0158533     2.26   0.024     .0047996    .0669434
          3  |   .0857047   .0135678     6.32   0.000     .0591122    .1122971
          4  |   .1054683   .0131294     8.03   0.000     .0797352    .1312014
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Implications for interpretation? Interpretation is always compared to the reference category.

Can choose base category:
- With separate dummies, change the one left out
- With i.categ, for example: ib2.age_categ
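
Changing the base category only re-expresses the same model. As a small check, the coefficients with age_categ = 2 as base can be recovered from those with age_categ = 1 as base by subtraction. The scalar names below are hypothetical; the numbers are copied from the first logit output.

```stata
* Coefficients from the model with age_categ = 1 as base category
scalar b2    = -.4105884
scalar b3    = 1.02875
scalar b4    = 2.114608
scalar cons1 = 2.559948

* Re-expressed relative to age_categ = 2 as base category
scalar c1    = -b2
scalar c3    = b3 - b2
scalar c4    = b4 - b2
scalar cons2 = cons1 + b2

display "1 vs 2: " c1
display "3 vs 2: " c3
display "4 vs 2: " c4
display "_cons:  " cons2
```

Up to rounding, these are the values in the ib2.age_categ output (.4105884, 1.439338, 2.525197 and 2.14936). The log likelihood and the urban coefficient are unchanged.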

5. Question 6

Probit models
Same xs as question 5, except that age will enter the model differently:
- One in categories (up to you which and how many categories)
- Other one not in categories but also not (just) age
In both cases, need to create new variables
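
Creating the new variables could look roughly like the sketch below, shown on toy data (not the assignment data). The cut-offs 20/40/60 and the squared term are assumptions for illustration only; the choice of categories and of the non-categorical specification is up to you.

```stata
* Toy data for illustration only (not the assignment data)
clear
set obs 6
* Ages 22, 34, 46, 58, 70, 82
generate age = 10 + 12*_n

* Age in categories 1-4; the cut-offs 20/40/60 are an assumption
generate age_categ = 1 + (age >= 20) + (age >= 40) + (age >= 60) if !missing(age)

* One possible alternative that is not categorical and not just age:
* add a squared term
generate age_sq = age^2

list age age_categ age_sq
```

In the models these would then enter as `i.age_categ` in one case and as `age age_sq` in the other.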

If different goodness of fit measures give contradictory results:
- Mention this in your answer. In practice, model selection is often not black or white.
- If one measure can be considered dominant (think of measures which relate to each other), can also mention this.
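
As one illustration of how fit measures relate to each other, AIC and BIC are both computed from the same log likelihood and differ only in the penalty per estimated parameter. The sketch below uses the log likelihood, number of parameters and sample size from the logit output with i.age_categ shown earlier, purely as example numbers; the scalar names are hypothetical.

```stata
* Example values copied from the logit output with i.age_categ
* (log likelihood, number of estimated parameters, sample size)
scalar ll = -617.88621
scalar k  = 5
scalar N  = 3000

* AIC = -2*lnL + 2*k ; BIC = -2*lnL + k*ln(N)
scalar aic = -2*ll + 2*k
scalar bic = -2*ll + k*ln(N)

display "AIC = " aic
display "BIC = " bic
```

This gives an AIC of about 1245.8 and a BIC of about 1275.8. After estimating a model, `estat ic` reports both measures directly.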

6. Question 8

Cannot see from the data how long a person has smoked, so you need to assume something: they all started at the same time (does not matter when)
=> what do you know about duration of smoking?

Include other explanatory variables in Q8a? Up to you

8b:
Smoking explains results for educated above? In principle, this refers to the results of Question 5, but can also be results of Question 8a if different than Q5.
Need to do Stata analysis

7. Other question(s)
for each explanatory variable (question 5). Show Stata output with
marginal effects for all the variables included in the model. Interpret just
for the two variables mentioned.

Good luck!
