
3SLS

3SLS is the combination of 2SLS and SUR.



It is used in a system of equations which are endogenous, i.e. in each
equation there are endogenous variables on both the left and right hand
sides of the equation. THAT IS THE 2SLS PART.

But the error terms in each equation are also correlated. Efficient
estimation requires we take account of this. THAT IS THE SUR
(SEEMINGLY UNRELATED REGRESSIONS) PART.



Hence in the regression for the ith equation there are endogenous (Y)
variables on the right-hand side AND the error term is correlated with the
error terms in other equations.
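Schematically, a two-equation version of such a system (a generic illustration, not the Klein model used below) is:

```latex
\begin{aligned}
y_{1t} &= \beta_{12}\, y_{2t} + \gamma_{11}\, x_{1t} + u_{1t}\\
y_{2t} &= \beta_{21}\, y_{1t} + \gamma_{22}\, x_{2t} + u_{2t},
\qquad \operatorname{Cov}(u_{1t}, u_{2t}) = \sigma_{12} \neq 0 .
\end{aligned}
```

The endogenous y's on the right-hand sides are the 2SLS part; σ12 ≠ 0 linking the errors is the SUR part.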


3SLS
log using "g:summ1.log"

If you type the above then a log is created on drive g
(on my computer this is the flash drive; on yours you
may need to specify another drive).

The name summ1 can be anything, but the suffix
must be log.

At the end you can close the log by typing:

log close

So open a log now and you will have a record of this
session
3SLS Load Data
clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2

THAT link no longer works. But the following does
webuse klein
In order to get the rest to work, rename the variables:
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc





*generate variables
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
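As an aside, the lag lines p[_n-1] and x[_n-1] have a direct analogue in other tools. A minimal pandas sketch, with illustrative data rather than the Klein dataset:

```python
import pandas as pd

# Stata's generate p1 = p[_n-1] builds a one-period lag; pandas uses shift().
df = pd.DataFrame({"p": [1.0, 2.0, 4.0], "x": [10.0, 20.0, 40.0]})
df["p1"] = df["p"].shift(1)   # first row is missing (NaN), like Stata's p1 in row 1
df["x1"] = df["x"].shift(1)
print(df)
```

Note this assumes the data are already sorted in time order, just as [_n-1] does in Stata.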
OLS Regression
regress c p p1 w

Regresses c on p, p1 and w (what this equation means is not so
important).

Usual output

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------          F(  3,    17) =  292.71
       Model |  923.549937     3  307.849979          Prob > F      =  0.0000
    Residual |  17.8794524    17  1.05173249          R-squared     =  0.9810
-------------+------------------------------          Adj R-squared =  0.9777
       Total |  941.429389    20  47.0714695          Root MSE      =  1.0255

           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .1929343   .0912102     2.12   0.049     .0004977     .385371
          p1 |   .0898847   .0906479     0.99   0.335    -.1013658    .2811351
           w |   .7962188   .0399439    19.93   0.000     .7119444    .8804931
       _cons |    16.2366   1.302698    12.46   0.000     13.48815    18.98506
reg3
By the command reg3, STATA estimates a system of structural
equations, where some equations contain endogenous variables
among the explanatory variables. Estimation is via three-stage
least squares (3SLS). Typically, the endogenous regressors are
dependent variables from other equations in the system.

In addition, reg3 can estimate systems of equations by
seemingly unrelated regression (SURE), multivariate regression
(MVREG), and equation-by-equation ordinary least squares
(OLS) or two-stage least squares (2SLS).
2SLS Regression
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)


Regresses c on p, p1 and w. The instruments (i.e. the predetermined
or exogenous variables in this equation and the rest of the system) are
t wg g yr p1 x1 k1.

This means that p and w (which are not included in the instruments)
are endogenous.

The output is as before, but it confirms
what the exogenous and endogenous
variables are.
Two-stage least-squares regression

----------------------------------------------------------------------
    Equation          Obs  Parms        RMSE    "R-sq"     F-Stat        P
----------------------------------------------------------------------
           c           21      3    1.135659    0.9767     225.93   0.0000
----------------------------------------------------------------------

             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
-------------+----------------------------------------------------------------
Endogenous variables:  c p w
Exogenous variables:   t wg g yr p1 x1 k1
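For intuition, the two stages behind 2SLS can be sketched in Python with numpy. This is a generic illustration with made-up data and variable names (z1, z2, etc.), not the Klein equations:

```python
import numpy as np

# Made-up data: x is endogenous (it shares the error u with y),
# z1 and z2 are instruments correlated with x but not with u.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.normal(size=(2, n))                    # instruments
u = rng.normal(size=n)                              # structural error
x = 0.8*z1 + 0.5*z2 + 0.6*u + rng.normal(size=n)    # endogenous regressor
y = 1.0 + 2.0*x + u                                 # structural equation

Z = np.column_stack([np.ones(n), z1, z2])           # instruments (incl. constant)
X = np.column_stack([np.ones(n), x])                # regressors (incl. constant)

# Stage 1: fitted values of X from a regression on the instruments
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Stage 2: OLS of y on the fitted values
beta_2sls = np.linalg.lstsq(Xhat, y, rcond=None)[0]

# Closed-form IV/2SLS estimator for comparison: (Xhat'X)^(-1) Xhat'y
beta_iv = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
print(beta_2sls, beta_iv)
```

The two computations agree because Xhat'Xhat = Xhat'X when Xhat is a projection of X; OLS of y on x directly would be biased upward here.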
2SLS Regression
ivreg c p1 (p w = t wg g yr p1 x1 k1)

This is an alternative command to do the same thing. Note that the
endogenous variables on the right hand side of the equation are
specified after the opening parenthesis, as (p w,

and the instruments follow the = sign.

The results are identical.
Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------          F(  3,    17) =  225.93
       Model |  919.504138     3  306.501379          Prob > F      =  0.0000
    Residual |  21.9252518    17  1.28972069          R-squared     =  0.9767
-------------+------------------------------          Adj R-squared =  0.9726
       Total |  941.429389    20  47.0714695          Root MSE      =  1.1357

           c |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |   .0173022   .1312046     0.13   0.897    -.2595153    .2941197
           w |   .8101827   .0447351    18.11   0.000        .7158    .9045654
          p1 |   .2162338   .1192217     1.81   0.087    -.0353019    .4677696
       _cons |   16.55476   1.467979    11.28   0.000     13.45759    19.65192
-------------+----------------------------------------------------------------
Instrumented:  p w
Instruments:   p1 t wg g yr x1 k1
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)

This format does two new things. First, it specifies all three
equations in the system. Note it has to do this, because it needs to
calculate the covariances between the error terms, and for this it needs
to know what the equations, and hence the errors, are.

Secondly, it says 3sls not 2sls.
All 3 equations are printed out. This tells us
what these equations look like

Three-stage least-squares regression

----------------------------------------------------------------------
    Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
           c           21      3    .9443305    0.9801     864.59   0.0000
           i           21      3    1.446736    0.8258     162.98   0.0000
          wp           21      3    .7211282    0.9863    1594.75   0.0000
----------------------------------------------------------------------

             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
c            |
           p |   .1248904   .1081291     1.16   0.248    -.0870387    .3368194
          p1 |   .1631439   .1004382     1.62   0.104    -.0337113    .3599992
           w |    .790081   .0379379    20.83   0.000      .715724    .8644379
       _cons |   16.44079   1.304549    12.60   0.000     13.88392    18.99766
-------------+----------------------------------------------------------------
i            |
           p |  -.0130791   .1618962    -0.08   0.936    -.3303898    .3042316
          p1 |   .7557238   .1529331     4.94   0.000     .4559805    1.055467
          k1 |  -.1948482   .0325307    -5.99   0.000    -.2586072   -.1310893
       _cons |   28.17785   6.793768     4.15   0.000     14.86231    41.49339
-------------+----------------------------------------------------------------
wp           |
           x |   .4004919   .0318134    12.59   0.000     .3381388     .462845
          x1 |    .181291   .0341588     5.31   0.000     .1143411    .2482409
          yr |    .149674   .0279352     5.36   0.000      .094922    .2044261
       _cons |   1.797216   1.115854     1.61   0.107    -.3898181    3.984251
-------------+----------------------------------------------------------------
Endogenous variables:  c p w i wp x
Exogenous variables:   t wg g yr p1 x1 k1
Let's compare the three different sets of estimates. Look at the coefficient on
p. In OLS it is significant, in 2SLS it is not, and in 3SLS it is still insignificant
but much closer in size to the OLS estimate. That is odd.

Now I expect that if 2SLS is different because of bias then so should 3SLS be. As it
stands it suggests that OLS is closer to 3SLS than 2SLS is to 3SLS, which does
not make an awful lot of sense.

But we do not have many observations. Perhaps that is partly why.


                 3SLS                 2SLS                 OLS
          coefficient  t stat  coefficient  t stat  coefficient  t stat
  p          0.125      1.16     0.017       0.13     0.193       2.12
  p1         0.163      1.62     0.216       1.81     0.090       0.99
  w          0.790     20.83     0.810      18.11     0.796      19.93
  _cons     16.441     12.60    16.555      11.28    16.237      12.46
  R2         0.980               0.977                0.981
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)

Now this command stores the variances and covariances between the
error terms in a matrix I call sig.

You have used generate to generate variables, scalar to generate scalars.
Similarly matrix produces a matrix.

e(Sigma) stores this variance-covariance matrix from the previous
regression.
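What e(Sigma) holds is the estimated cross-equation error covariance matrix. A rough numpy stand-in, using hypothetical residual series rather than the actual reg3 output:

```python
import numpy as np

# Three hypothetical residual series, one per equation, 21 observations each
rng = np.random.default_rng(2)
resids = rng.normal(size=(3, 21))     # rows = equations, cols = observations

# Cross-equation covariance matrix, dividing by n (see the note on n vs n-k later)
Sigma = resids @ resids.T / 21
print(Sigma.shape)                    # a 3 x 3 symmetric matrix
```

Element [i, j] is the covariance between the residuals of equations i and j; the diagonal holds the error variances.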


3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
display sig[1,1], sig[1,2], sig[1,3]
display sig[2,1], sig[2,2], sig[2,3]
display sig[3,1], sig[3,2], sig[3,3]










. display sig[1,1], sig[1,2], sig[1,3]
1.0440596 .43784767 -.3852272

. display sig[2,1], sig[2,2], sig[2,3]
.43784767 1.3831832 .19260612

. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

Written as a matrix this is:

 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

Here 1.04406 (the [1,1] element) is the variance of the 1st error term,
and 0.192606 (the [2,3] element) is the covariance of the error terms
from equations 2 and 3.
3SLS Regression










 1.04406   0.437848  -0.38523
 0.437848  1.383183   0.192606
-0.38523   0.192606   0.476426

This matrix is the variance-covariance matrix from the lecture. Hence
0.437848 relates to σ_12 and, of course, σ_21 (the matrix is symmetric).
3SLS Regression
display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)

Now this should give the correlation between the error terms from
equations 1 and 2.

It is this formula: Correlation(x, y) = σ_xy / (σ_x σ_y). When we do this
we get:

. display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
.36435149
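We can verify this arithmetic in Python, plugging in the Σ entries printed above:

```python
import math

# Sigma entries from the Stata output above:
# sig[1,1], sig[1,2] and sig[2,2]
s11, s12, s22 = 1.0440596, 0.43784767, 1.3831832

# correlation = covariance divided by the product of the standard deviations
corr12 = s12 / (math.sqrt(s11) * math.sqrt(s22))
print(corr12)   # approximately 0.3644, matching Stata's .36435149
```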
Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

matrix cy= e(b) stores the coefficients from the regression in a
row vector we call cy.

cy[1,1] is the first coefficient (on p) in the first equation
cy[1,4] is the fourth coefficient in the first equation (the constant term)
cy[1,5] is the first coefficient (on p) in the second equation
Note this is cy[1,5], NOT cy[2,1]: e(b) is a single row.




Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc

Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value
from the first equation, and

i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])

is the actual minus the predicted value, i.e. the error term from the 2nd
equation.

correlate ri rc prints out the correlation between the two error terms.
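The same residual-and-correlate computation can be sketched in Python. The coefficient values below are taken from the 3SLS table above, but the data are randomly generated, so the resulting correlation is illustrative only. Note Python's 0-based indexing: Stata's cy[1,1] corresponds to cy[0] here:

```python
import numpy as np

# Hypothetical data standing in for the Klein variables (21 observations)
rng = np.random.default_rng(1)
n = 21
p, p1, w, k1 = rng.normal(size=(4, n))

# Flat coefficient vector, equation 1's coefficients first, then equation 2's,
# mirroring the layout of e(b): p p1 w _cons, then p p1 k1 _cons
cy = np.array([0.125, 0.163, 0.790, 16.441,
               -0.013, 0.756, -0.195, 28.178])

# Fake dependent variables so the example is self-contained
c = cy[0]*p + cy[1]*p1 + cy[2]*w + cy[3] + rng.normal(size=n)
i = cy[4]*p + cy[5]*p1 + cy[6]*k1 + cy[7] + rng.normal(size=n)

rc = c - (cy[0]*p + cy[1]*p1 + cy[2]*w + cy[3])    # residual, equation 1
ri = i - (cy[4]*p + cy[5]*p1 + cy[6]*k1 + cy[7])   # residual, equation 2
print(np.corrcoef(ri, rc)[0, 1])                   # analogue of correlate ri rc
```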




Let's check
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc






The correlation is 0.30, close to what we had before, but not the same.
Now the main purpose of this class is to illustrate commands, so it is not
too important. I think it could be because Stata is not calculating the
e(Sigma) matrix by dividing by n-k, but just by n.





. correlate ri rc
(obs=21)

             |       ri       rc
-------------+------------------
          ri |   1.0000
          rc |   0.3011   1.0000
Let's check
Click on help (on the tool bar at the top of the screen to the right).
Click on Stata command.
In the dialogue box type reg3.

Move down towards the end of the file and you get the following







Saved results

reg3 saves the following in e():

Scalars
  e(N)         number of observations
  e(k)         number of parameters
  e(k_eq)      number of equations
  e(mss_#)     model sum of squares for equation #
  e(df_m#)     model degrees of freedom for equation #
  e(rss_#)     residual sum of squares for equation #
  e(df_r)      residual degrees of freedom (small)
  e(r2_#)      R-squared for equation #
  e(F_#)       F statistic for equation # (small)
  e(rmse_#)    root mean squared error for equation #
  e(dfk2_adj)  divisor used with VCE when dfk2 specified
  e(ll)        log likelihood
  e(chi2_#)    chi-squared for equation #
  e(p_#)       significance for equation #
  e(ic)        number of iterations
  e(cons_#)    1 when equation # has a constant; 0 otherwise
Some important retrievables
e(mss_#) model sum of squares for equation #
e(rss_#) residual sum of squares for equation #
e(r2_#) R-squared for equation #
e(F_#) F statistic for equation # (small)
e(rmse_#) root mean squared error for equation #
e(ll) log likelihood

Where # is a number; e.g. if 2 it means equation 2.

And

Matrices
e(b) coefficient vector
e(Sigma) Sigma hat matrix
e(V) variance-covariance matrix of the estimators






The Hausman Test Again
We looked at this with respect to panel data. But it is a general test to
allow us to compare an equation which has been estimated by two
different techniques. Here we apply the technique to comparing ols
with 3sls.


reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr),ols
est store EQNols

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr) , 3sls inst(t wg g yr p1
x1 k1)
est store EQN3sls

hausman EQNols EQN3sls
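The statistic hausman computes can be sketched directly. The coefficient vectors below are the OLS and 3SLS estimates from the tables above (in the order p, p1, w); the variance matrices are hypothetical, since the slides do not print them:

```python
import numpy as np

# Hausman statistic: H = (b-B)' [V_b - V_B]^(-1) (b-B)
b  = np.array([0.193, 0.090, 0.796])   # OLS  (consistent under Ho and Ha)
B  = np.array([0.125, 0.163, 0.790])   # 3SLS (efficient under Ho)
Vb = np.diag([0.012, 0.011, 0.0017])   # hypothetical Var(b)
VB = np.diag([0.008, 0.009, 0.0014])   # hypothetical Var(B); smaller, as B is efficient

d = b - B
H = d @ np.linalg.inv(Vb - VB) @ d     # compare with chi-squared(3) critical values
print(H)
```

In practice V_b - V_B need not be positive definite in small samples (as the output below shows), in which case a generalized inverse is used.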





The Hausman Test Again
Below we run the three regressions specifying ols and store
the results as EQNols.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr),ols
est store EQNols

Then we run the three regressions specifying 3sls and store
the results as EQN3sls.

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr) , 3sls inst(t wg g yr p1
x1 k1)
est store EQN3sls

Then we do the Hausman test
hausman EQNols EQN3sls





The Results


. hausman EQNols EQN3sls

                     ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |    EQNols       EQN3sls       Difference          S.E.
-------------+-----------------------------------------------------------------
           p |   .1929343      .1248904        .068044             .
          p1 |   .0898847      .1631439       -.0732592            .
           w |   .7962188      .790081         .0061378        .0124993
-------------+-----------------------------------------------------------------
               b = consistent under Ho and Ha; obtained from reg3
               B = inconsistent under Ha, efficient under Ho; obtained from reg3

Test:  Ho:  difference in coefficients not systematic

          chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                  =        0.06
        Prob>chi2 =      0.9963
        (V_b-V_B is not positive definite)
The table prints out the two sets of coefficients and their difference.

The Hausman test statistic is 0.06

The significance level is 0.9963

This is clearly very far from being significant at the 10% level.
The Hausman Test Again
Hence it would appear that the coefficients from the two
regressions are not significantly different.

If OLS were giving biased estimates that 3SLS corrects, they
would be different.

Hence we would conclude that there is no endogeneity which
requires endogenous-variable techniques.

But because the error terms do appear correlated, SUR is
probably the appropriate technique, as it exploits that
correlation to produce more efficient estimates.






Tasks
1. Using the display command, e.g.

display e(mss_2)

print on the screen some of the retrievables from each regression (the
above displays the model sum of squares for the second equation).

2. Let's look at the display command

Type:

display "The model sum of squares =" e(mss_2)

Tasks

display "The model sum of squares =" e(mss_2), "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(50) "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)

display _column(20) "The model sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)




Tasks

Close log:

log close

And have a look at it in Word.

webuse klein
* in order to get the rest to work
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
