*First, change your working directory.

cd "C:\Users\H\Desktop"
*Initiate your log file: save your commands and results under your working directory
log using 24October2016.log
*Note that the data set is in a different folder on your computer. Do not forget
*to adapt the path to your computer.
use "C:\Users\H\Desktop\AppEcon1\wage1.dta", clear
*****STORING REGRESSION RESULTS IN STATA AND CREATING TABLES
reg wage marmale marfem singfem
*After the regression, let's store the results of this regression and name it reg1
estimates store reg1
reg wage marmale marfem singfem educ exper expersq tenursq tenure
estimates store reg2
*Now present the results of these regressions in a table,
*b: coefficient estimates, se: standard errors, t: t statistics, p: p-values.
*b(%8.4f) sets the display format of the coefficients (width 8, 4 decimals).
estimates table reg1 reg2, b(%8.4f) se t p stats(N r2)
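*A minimal sketch (not part of the original session): the star option marks
*significance levels instead of printing se, t, and p. Stata does not allow
*star together with se, t, or p, so it goes in a separate table.
estimates table reg1 reg2, star stats(N r2)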
***LPM (Linear Probability Model)
*Binary dependent variable: y = 1 or y = 0. It may indicate
*whether an adult has a high school education, whether a
*household owns a house, whether an adult is married, owns a
*car, etc.
*The case where y = 1 is called success whereas y = 0 is called
*failure.
*What happens if we regress a 0-1 variable on a set of
*independent variables? How can we interpret the regression
*coefficients?
*In the LPM, each slope coefficient is the change in the probability
*of success for a one-unit change in x_j, holding the other variables fixed:
*Delta P(y=1|x) = beta_j * Delta x_j
use "C:\Users\H\Desktop\AppEcon1\MROZ.dta", clear
*y (inlf - in the labor force) equals 1 if a married woman
*reported working for a wage outside the home in 1975, and 0
*otherwise.
*Definitions of explanatory variables
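*nwifeinc: other household income, including husband's earnings (in $1000),
*kidslt6: number of children less than 6 years old,
*kidsge6: number of children between 6 and 18 years of age,
*educ: years of education; exper: years of labor market experience; age: age in years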
reg inlf nwifeinc kidslt6 kidsge6 age educ exper expersq
*All variables except kidsge6 are individually statistically
*significant. All coefficients have the signs expected from standard
*economic theory and intuition.
*Interpretation of coefficient estimates: For example, the
*coefficient estimate on educ, 0.038, implies that, ceteris
*paribus, an additional year of education increases predicted
*probability of labor force participation by 0.038.
*The coefficient estimate on nwifeinc: if husband's income
*increases by 10 units (ie, $10000), the probability of labor
*force participation falls by 0.034.
*exper has a quadratic relationship with inlf: the effect of
*past experience on the probability of labor force participation
*is diminishing.
*The number of young children has a big impact on labor force
*participation. The coefficient estimate on kidslt6 is -0.262.
*Ceteris paribus, having one additional child less than six years
*old reduces the probability of participation by 0.262.
*In the sample, about 20% of the women have at least one
*child less than six years old.
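*A minimal sketch of the calculations above. It re-runs the model with expersq
*so that the stored coefficients match the interpretation in the comments.
quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6
*Effect of one more year of education on P(inlf=1):
display _b[educ]
*Effect of a $10,000 increase in nwifeinc (10 units):
display _b[nwifeinc]*10
*Effect of one more child under six:
display _b[kidslt6]
*Marginal effect of experience at 0 and at 10 years (illustrative values chosen here):
display _b[exper] + 2*_b[expersq]*0
display _b[exper] + 2*_b[expersq]*10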
*Predicted probability of success is given by yhat and it can have
*values outside the range 0-1. Obviously, this contradicts
*the rules of probability.
*In the example out of 753 observations, 16 have yhat < 0 and
*17 have yhat > 1.
*If these are relatively few, they can be interpreted as 0 and 1,
*respectively.
*Nevertheless, the major shortcoming of LPM is not
*implausible probability predictions. The major problem is that
*a probability cannot be linearly related to the independent
*variables for all their possible values.
*In the example, the model predicts that the effect of going
*from zero children to one young child reduces the probability
*of working by 0.262.
*This is also the predicted drop if the woman goes from having
*one child to 2 or 2 to 3, etc.
*It seems more realistic that the first small child would reduce
*the probability by a large amount, but subsequent children
*would have a smaller marginal effect.
*Thus, the relationship may be nonlinear.
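*A hedged sketch of one way to relax linearity in kidslt6 (an illustration, not
*part of the original analysis): enter kidslt6 as a set of indicator variables
*so that each additional young child can have its own effect.
*This assumes a Stata version with factor-variable notation (i.).
*Coefficients on the indicators are relative to the base category kidslt6 = 0.
reg inlf nwifeinc educ exper expersq age i.kidslt6 kidsge6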
*The LPM is heteroscedastic: assumption MLR.5 (constant error
*variance) is not satisfied.
*We learned that in this case OLS is unbiased and consistent
*but inefficient, so the Gauss-Markov Theorem no longer applies. The usual
*standard errors and inference procedures are not valid.
*It is possible to find more efficient estimators than OLS.
tab kidslt6
quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict inlfh
count if inlfh<0
count if inlfh>1
replace inlfh=1 if inlfh>1
replace inlfh=0 if inlfh<0
***HETEROSCEDASTICITY
*Heteroscedasticity-Robust Standard Errors
use "C:\Users\H\Desktop\AppEcon1\wage1.DTA", clear
reg wage female educ exper tenure
estimates store non_robust
reg wage female educ exper tenure, robust
estimates store robust
estimates table non_robust robust, b se t stats(N r2)
*If the sample size is large enough, use the robust option.
**Breusch-Pagan Heteroscedasticity Test
use "C:\Users\H\Desktop\AppEcon1\hprice1.dta", clear
reg price lotsize sqrft bdrms
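*Save the OLS residuals and square them
predict uhat, residual
g uhat2=uhat^2
*Auxiliary regression of the squared residuals on the regressors
reg uhat2 lotsize sqrft bdrms
*F version of the test: joint significance of all regressors
test lotsize sqrft bdrms
*LM version: n*R2 from the auxiliary regression, chi-squared with 3 df
scalar LMts=e(N)*e(r2)
disp LMts
display chi2tail(3,LMts)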
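*A sketch of the F version of the same test, computed by hand from the
*auxiliary regression (reg uhat2 ...), which is still the active estimation:
*F = (R2/k) / ((1-R2)/(n-k-1)), with k = e(df_m) and n-k-1 = e(df_r).
*The scalar name Fts is made up here.
scalar Fts = (e(r2)/e(df_m)) / ((1-e(r2))/e(df_r))
display Fts
display Ftail(e(df_m), e(df_r), Fts)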
* Stata built-in commands to conduct Breusch-Pagan Heteroscedasticity test
reg price lotsize sqrft bdrms
estat hettest lotsize sqrft bdrms, fstat rhs
estat hettest lotsize sqrft bdrms, iid rhs
*Take the log of the suitable variables and re-conduct test:
reg lprice llotsize lsqrft bdrms
estat hettest llotsize lsqrft bdrms, fstat rhs
estat hettest llotsize lsqrft bdrms, iid rhs
**White Heteroscedasticity Test
reg lprice llotsize lsqrft bdrms
predict lpricehat
gen lprhatsq= lpricehat* lpricehat
*By using rename command, we can change a variable name:
rename lprhatsq lpricehatsq
estat hettest lpricehat lpricehatsq, fstat
estat hettest lpricehat lpricehatsq, iid
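*A sketch of the LM version of this special-case White test done by hand.
*The names uhatw, uhatw2, and LMwhite are made up here to avoid clashing
*with uhat and uhat2 created earlier.
quietly reg lprice llotsize lsqrft bdrms
predict uhatw, residual
gen uhatw2 = uhatw^2
quietly reg uhatw2 lpricehat lpricehatsq
scalar LMwhite = e(N)*e(r2)
display LMwhite
display chi2tail(2, LMwhite)
*Stata also offers the general White test (all squares and cross products):
quietly reg lprice llotsize lsqrft bdrms
estat imtest, white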
**Let's reopen hprice1.dta
**(from the menu: File > Open...)
use "C:\Users\H\Desktop\AppEcon1\hprice1.dta", clear
**Ramsey RESET Test
reg lprice llotsize lsqrft bdrms
predict yhat
gen yhat2=yhat*yhat
gen yhat3=yhat^3
reg lprice llotsize lsqrft bdrms yhat2 yhat3
test yhat2 yhat3
*Or use the built-in Stata command estat ovtest.
*Note that its test regression also includes the 4th power of the fitted values:
reg lprice llotsize lsqrft bdrms
estat ovtest
**At the 5% level, we do not reject the null hypothesis
*that the functional form is correctly specified.
*But if we use the variables in levels rather than logs,
**the RESET test indicates functional form misspecification:
reg price lotsize sqrft bdrms
estat ovtest
*****GLS-WLS-FGLS**********
**WLS (Weighted Least Squares)
*We know that robust inference procedures are only valid if
*the sample size is large enough. An alternative is
*to use Weighted Least Squares (WLS) instead of OLS.
*Under heteroscedasticity WLS estimators are more efficient
*than OLS estimators.
*The WLS method requires knowledge of the form of the
*heteroscedasticity. In most cases, it is assumed that the
*heteroscedasticity is known up to a multiplicative constant.
*We can transform the original model so that the
*transformed model has a constant variance. Since the
*transformed model will not have heteroscedasticity, OLS
*estimation will be efficient. This procedure is called WLS.
*GLS method: applying OLS to the transformed model gives
*the GLS estimators.
*GLS estimators will be different from the OLS estimators in
*the original regression. Parameter estimates are interpreted
*in the context of the original model.
*GLS estimators are also used in the presence of serial
*correlation in the regression analysis of time series data.
*The GLS estimators correcting for heteroscedasticity are called
*WLS estimators. The GLS estimators are BLUE.
*The R2 of the transformed model cannot be used as a
*goodness of fit measure. But it can be used to compute test
*statistics.
use "C:\Users\H\Desktop\AppEcon1\SAVING.dta", clear
reg sav inc
reg sav inc [aw = 1/inc]
reg sav inc size educ age black
reg sav inc size educ age black [aw = 1/inc]
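*A sketch of what the [aw = 1/inc] weighting does, assuming Var(u|inc) is
*proportional to inc: divide every term, including the constant, by sqrt(inc)
*and run OLS without a constant. The t_ variable names are made up here.
gen t_sav = sav/sqrt(inc)
gen t_inc = inc/sqrt(inc)
gen t_cons = 1/sqrt(inc)
*The coefficient on t_inc should match the slope, and the coefficient on t_cons
*the intercept, from: reg sav inc [aw = 1/inc]
reg t_sav t_inc t_cons, noconstant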

**FGLS (Feasible GLS)


*We use the data in SMOKE.dta
*to estimate a demand function for daily cigarette consumption.
*Since most people do not smoke in the sample,
*the dependent variable, cigs, is zero for most observations.
*A linear model is not ideal
*because it can result in negative predicted values.
*Nevertheless, we can still learn something about the determinants of
*cigarette smoking by using a linear model.
use "C:\Users\H\Desktop\AppEcon1\SMOKE.dta", clear
reg cigs lincome lcigpric educ age agesq restaurn
*Neither income nor cigarette price is
* statistically significant, and their effects are
*not practically large.
**Change in cigs if income increases by 10%
display _b[lincome]*10/100
*For example,
*if income increases by 10%, cigs is predicted to increase by
*(0.880/100)(10)=0.088, or less than one-tenth
* of a cigarette per day. The magnitude of the price effect
*is similar.
*An additional year of education reduces the average number of
*cigarettes smoked per day by about one-half of a cigarette, and the effect
*is statistically significant. Cigarette smoking
*is also related to age, in a quadratic fashion.
**Turning point for age
display -_b[age]/(2*_b[agesq])
*Smoking increases with age up until age 0.771/[2(0.009)] = 42.83,
*and then smoking decreases with age.
*Both terms in the quadratic are statistically significant.
*The presence of a restriction on smoking in
*restaurants decreases cigarette smoking
* by almost three cigarettes per day, on average.
*Do the errors underlying the equation contain heteroskedasticity?
*The Breusch-Pagan regression of the squared OLS residuals
*on the independent variables:
qui reg cigs lincome lcigpric educ age agesq restaurn
estat hettest lincome lcigpric educ age agesq restaurn, fstat rhs
estat hettest lincome lcigpric educ age agesq restaurn, iid rhs
* The results indicate very strong evidence of heteroskedasticity.
*Therefore, we estimate the equation using the feasible GLS procedure
qui reg cigs lincome lcigpric educ age agesq restaurn
predict uhat, resid
gen luhat2=log(uhat*uhat)
qui reg luhat2 lincome lcigpric educ age agesq restaurn
predict hhat, xb
gen e_hhat = exp(hhat)
reg cigs lincome lcigpric educ age agesq restaurn [aw=1/e_hhat]
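*A sketch for comparing OLS and FGLS side by side with the estimates table
*approach used earlier. The names cigs_ols and cigs_fgls are made up here.
quietly reg cigs lincome lcigpric educ age agesq restaurn
estimates store cigs_ols
quietly reg cigs lincome lcigpric educ age agesq restaurn [aw=1/e_hhat]
estimates store cigs_fgls
estimates table cigs_ols cigs_fgls, b(%8.4f) se stats(N r2)
*If the estimated variance function may be misspecified, robust standard
*errors can be combined with the weights:
reg cigs lincome lcigpric educ age agesq restaurn [aw=1/e_hhat], robust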
**Linear Probability Model-WLS-OLS-ROBUST
use "C:\Users\H\Desktop\AppEcon1\MROZ.dta", clear
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6
estimates store OLS
predict yhat
replace yhat=0.999 if yhat>1
replace yhat=0.001 if yhat<0
generate h_hat=yhat*(1-yhat)
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6 [aw=1/h_hat]
estimates store WLS
reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6, robust
estimates store ROBUST
estimates table OLS ROBUST WLS, b se stats(N r2)
log close
