
Two-Stage Least-Squares Regression

Diagnosing the independence of error terms in a nonrecursive path analysis model

Two-Stage Least-Squares Regression (2SLS): Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University

Key Concepts: Two-Stage Least-Squares Regression (2SLS) in a Nonrecursive Path Analysis

• Recursive and nonrecursive path models
• Causal loops
• Problems in OLS regression when a predictor variable is related to an error term
    - Regression coefficient (b) will be biased
    - Confidence interval of b will not be efficient
    - There may be a spurious effect present
    - b may be overestimated or underestimated
    - The value of b may be suppressed or have the wrong sign
• How to test whether the error terms in a path model are independent
• The logic of 2SLS regression
• The concept of an instrumental variable
• Criteria of a good instrumental variable
• The concept of a lagged variable
• Construction of the equations in 2SLS
• DITS analysis in 2SLS
• SPSS 2SLS


Lecture Outline
• The problem of non-independent error terms in path analysis
• Recursive and nonrecursive path models
• The concept of two-stage least-squares regression
• The concept of an instrumental variable
• 2SLS regression in phases
• SPSS 2SLS in one step
• DITS analysis of results


The Problem
In linear regression, if a predictor variable Xi is related to the error term ej:
• bij will be a biased estimate of βij, and
• the confidence interval of bij will not be efficient.

In addition, a spurious relationship may be present, with the result that:
• bij may be overestimated, or
• bij may be underestimated, so that the spurious effect acts as a suppressor on bij, or
• bij will have the wrong algebraic sign.

This can happen in path analysis, particularly in nonrecursive path models.
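
A small simulation can make the bias concrete. The sketch below is illustrative only: the true coefficient (2.0), the 0.8 loading of the error term on the predictor, the sample size and the seed are all assumptions, not values from the lecture. It fits the same simple regression twice, once with a predictor that is independent of the error term and once with a predictor that shares part of it.

```python
import random

def slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(42)      # fixed seed so the sketch is repeatable
n = 20000
true_b = 2.0                 # hypothetical population coefficient

e = [rng.gauss(0, 1) for _ in range(n)]             # error term
z = [rng.gauss(0, 1) for _ in range(n)]             # variation unrelated to e
x_clean = z                                         # predictor independent of e
x_dirty = [zi + 0.8 * ei for zi, ei in zip(z, e)]   # predictor related to e

y_clean = [true_b * xi + ei for xi, ei in zip(x_clean, e)]
y_dirty = [true_b * xi + ei for xi, ei in zip(x_dirty, e)]

b_clean = slope(x_clean, y_clean)   # should land near the true value
b_dirty = slope(x_dirty, y_dirty)   # pulled away from the true value
print(round(b_clean, 3), round(b_dirty, 3))
```

Under these assumptions the second estimate is overestimated; with a negative loading it would be underestimated, which is the suppressor or wrong-sign case listed above.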


Recursive & Nonrecursive Path Models


Recursive Model
Causal relationships are unidirectional

[Path diagram: X1, X2, X3 and X4 joined by one-way arrows, with error terms I1 and I4]

Nonrecursive Model
Causal relationships are bi-directional creating causal loops

[Path diagram: X1, X2, X3 and X4 with a two-way relationship forming a causal loop, and error terms I1 and I4]


Nonrecursive Model
Correlated Error Terms

[Path diagram: the nonrecursive model with X1, X2, X3, X4 and the error terms I1 and I4]

Interpretation of relationships
• X1 is nonrecursively & causally related to X4
• X1 may be indirectly related to I4 via X4
• X4 may be indirectly related to I1 via X1
• I1 may be correlated with I4

If these conditions exist, the assumptions of OLS will be violated: OLS regression assumes unidirectional relationships & independence of error terms.
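
Why the loop breaks the OLS assumption can be seen in a stripped-down version of the model. As an illustrative simplification that is not in the slides, drop X2 and X3, assume that I1 and I4 are themselves uncorrelated, and assume that b14·b41 is not equal to 1. Substituting the second equation into the first gives:

```latex
\begin{aligned}
X_1 &= b_{14} X_4 + I_1, \qquad X_4 = b_{41} X_1 + I_4 \\[4pt]
X_1 &= b_{14}\,(b_{41} X_1 + I_4) + I_1
\;\Longrightarrow\;
X_1 = \frac{I_1 + b_{14} I_4}{1 - b_{14} b_{41}} \\[4pt]
\operatorname{Cov}(X_1, I_4) &= \frac{b_{14}\,\operatorname{Var}(I_4)}{1 - b_{14} b_{41}} \neq 0
\quad \text{whenever } b_{14} \neq 0 .
\end{aligned}
```

So even with independent error terms, X1 carries part of I4 as soon as X4 feeds back into X1, and X1 is then a predictor related to the error term of the equation it appears in.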


An Example
Let us assume that we are interested in predicting a state's correctional population (X4, state): the total number of people in prison and on parole.

Assume further that the following measures for the state's largest county are causes of the state's correctional population:
• the index crime rate (X3, crm_indx),
• the number of arrests (X2, arrests), and
• the total correctional population (X1, com_corr).

Finally, assume that the state's correctional population is partly a cause of the county's correctional population, since parolees who are arrested become part of the county's population.


Path Diagram of the Theory


A nonrecursive theory about the relationship between the state's correctional population (X4) and the correctional population in the largest county (X1), its crime index (X3) and its arrests (X2)

[Path diagram: X3 (crime index), X2 (arrests), X1 (com_corr) and X4 (state), with error terms I1 and I4]

An Example (cont.)

Interpretation
• The county's correctional population affects the state's population, and
• the state's population affects the county's population through parole releases.

Therefore
• The county's population may be indirectly related to I4 via the state's population
• The state's population may be indirectly related to I1 via the county's population
• And I1 may be correlated with I4

If this is the case, OLS assumptions will be violated in estimating the model.


Estimation of the Structure Coefficients in the Nonrecursive Path Model


(X1, X2, and X4 are endogenous variables; X3 is an exogenous variable.)

Path Equations (estimated by OLS regression)

1) X2 = a + b23X3
   X2 = 827219 + 0.027X3
2) X1 = a + b12X2 + b13X3 + b14X4
   X1 = -31014 + 0.072X2 - 0.0008X3 - 0.044X4
3) X4 = a + b41X1 + b42X2 + b43X3
   X4 = -145721 - 0.884X1 + 0.214X2 + 0.05X3

The Problem
• Equation #1 may be solved without bias using OLS regression.
• Equations #2 & #3 may not be solvable without the potential for bias using OLS regression.

Since X1 & X4 are nonrecursively related:
• I4 may be indirectly related to X1
• I1 may be indirectly related to X4, and
• I4 may be correlated with I1


Correlational Analysis of the Independence of Error Terms

Variable    I1       I4       X1       X4
I1          1.00
I4          0.198    1.00
X1          0.000   -0.084    1.00
X4         -0.058    0.000    0.231    1.00

Interpretation
• Correlation between I1 and I4 (r = 0.198)
• Slight correlation between I1 and X4 (r = -0.058)
• Slight correlation between I4 and X1 (r = -0.084)

Path Model Correlations
[Path diagram: the nonrecursive model annotated with the correlations I4 with X1 = -0.084, I1 with X4 = -0.058, and I1 with I4 = +0.198]
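
A correlation matrix of this kind can be produced from the saved residuals and the two variables in the loop. The sketch below assumes the residuals of the two path equations have already been saved as columns; the numbers are invented purely for illustration and are not the lecture's data, and the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lower_triangle(columns):
    """Return {(row, col): r} for every pair on or below the diagonal."""
    names = list(columns)
    return {(names[i], names[j]): pearson(columns[names[i]], columns[names[j]])
            for i in range(len(names)) for j in range(i + 1)}

# Hypothetical values, invented for illustration only
columns = {
    "I1": [0.5, -1.2, 0.3, 0.9, -0.5],   # saved residuals, X1 equation
    "I4": [0.2, -0.7, -0.1, 1.1, -0.5],  # saved residuals, X4 equation
    "X1": [20.0, 18.0, 22.0, 21.0, 27.0],
    "X4": [11.0, 16.0, 11.5, 14.0, 13.0],
}

matrix = lower_triangle(columns)
for (row, col), r in matrix.items():
    print(f"{row:>2} x {col:<2} r = {r:+.3f}")
```

A clearly non-zero entry for I1 with I4, I1 with X4, or I4 with X1 is the signal the slide is looking for.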


Solving the Problem with Two-Stage Least-Squares Regression (2SLS)


The logic of 2SLS regression

Step 1  Replace the troublesome variable X1 with an instrumental variable (Ij) which is
        • highly correlated with X1, but
        • not caused by X4 and not correlated with I4
Step 2  Regress the troublesome variable (X1) on the instrumental variable (Ij) and the other variables theorized to cause it recursively:
        X1 = a + bI(Ij) + b2(X2) + b3(X3)
Step 3  Save the predictions of X1 (pre_1), since they cannot be caused by X4 and will be independent of I4
Step 4  Use the predictions (pre_1) of X1, instead of X1, to predict X4, since they are that part of X1 which is not caused by X4 and is independent of I4
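
The four steps can be sketched on simulated data where the true coefficient is known. Everything numeric below is an assumption made for illustration (the loop coefficients 0.6 and -0.5, the other path values, the sample size and the seed), and only one exogenous variable is used to keep the example short. The small `ols` helper solves the normal equations directly and is a hypothetical stand-in for a statistics package.

```python
import random

def ols(columns, y):
    """Return [intercept, b1, ..., bk] from OLS of y on the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gaussian elimination with partial pivoting
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for c in range(p, k + 1):
                A[r][c] -= f * A[p][c]
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]
    return b

def predict(b, columns):
    """Fitted values for coefficients b = [intercept, b1, ..., bk]."""
    return [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns))
            for i in range(len(columns[0]))]

rng = random.Random(7)
n = 5000
b14, b41 = 0.6, -0.5     # hypothetical loop: X4 -> X1 and X1 -> X4

inst = [rng.gauss(0, 1) for _ in range(n)]   # instrumental variable Ij
x3 = [rng.gauss(0, 1) for _ in range(n)]     # exogenous cause of both
e1 = [rng.gauss(0, 1) for _ in range(n)]     # error term of X1
e4 = [rng.gauss(0, 1) for _ in range(n)]     # error term of X4

# Data generated by solving the two structural equations jointly:
#   X1 = 0.8*Ij + 0.5*X3 + b14*X4 + e1
#   X4 = b41*X1 + 0.7*X3 + e4
x1 = [(0.8 * i + (0.5 + 0.7 * b14) * w + b14 * u4 + u1) / (1 - b14 * b41)
      for i, w, u1, u4 in zip(inst, x3, e1, e4)]
x4 = [b41 * a + 0.7 * w + u4 for a, w, u4 in zip(x1, x3, e4)]

b_ols = ols([x1, x3], x4)[1]            # plain OLS, ignores the loop
stage1 = ols([inst, x3], x1)            # Step 2: X1 on instrument and X3
pre_1 = predict(stage1, [inst, x3])     # Step 3: save the predictions
b_2sls = ols([pre_1, x3], x4)[1]        # Step 4: X4 on pre_1 and X3
print(round(b_ols, 3), round(b_2sls, 3))
```

In this setup the plain OLS coefficient for X1 sits well away from the assumed -0.5, while the coefficient on pre_1 lands close to it. The instrument works here only because it was built to satisfy the Step 1 criteria.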


Step 1 Selection of an Instrumental Variable


Finding an instrumental variable (Ij) that meets the criteria set forth in Step 1 is usually very difficult.

An alternative method involves creating an instrumental variable (Ij) by lagging the troublesome variable (X1). This assumes that the nonrecursive relationship between X1 & X4 is contemporaneous and will disappear when X1 is lagged.

The instrumental variable Ij
    Ij = f(X1, lag = +1)
    i.e., shifting the cases of variable X1 by one case


An Example of Lagging the Variable County Corrections (X1), lag = +1


Notice that the newly created instrumental variable (X5 lag_com) is simply X1 lagged by one case.

Case Number   County Corrections X1 (com_corr)   Instrumental Variable X5 (lag_com)
1             20141                              .
2             18134                              20141
3             22148                              18134
4             ..                                 22148
..            ..                                 ..
N             28167                              ..
N+1                                              28167
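
The shift shown in the table can be sketched in a few lines. The helper name `lag` is hypothetical; the three values are the first three cases from the slide. Unlike the table, this sketch keeps the series at its original length, so the value that would fall in case N+1 is dropped.

```python
def lag(values, k=1):
    """Shift a series down by k cases; the first k cases get no lagged value."""
    return [None] * k + list(values[:len(values) - k])

com_corr = [20141, 18134, 22148]   # first three cases of X1 from the slide
lag_com = lag(com_corr)            # instrumental variable X5

for case, (x1, x5) in enumerate(zip(com_corr, lag_com), start=1):
    print(case, x1, x5)
```

The first case has no lagged value, which is why it drops out of any regression that uses lag_com.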


The Variable X1 (com_corr) lagged +1


The new variable lag_com

Case   state        com_corr    arrests      crm_indx      lag_com
 1     110331.66    20141.42    810900.00    2101449.14           .
 2     160400.00    18134.16    840916.52    2901702.94    20141.00
 3     110331.66    22148.32    850921.95    1751322.88    18134.00
 4     140374.17    21144.91    850921.95    2301516.58    22148.00
 5     130360.56    27164.32    880938.08    2051431.78    21145.00
 6     170412.31    26161.25    900948.68    2801673.32    27164.00
 7     140374.17    25158.11    890943.40    2401549.19    26161.00
 8     150387.30    27164.32    940969.54    2501581.14    25158.00
 9     120346.41    30173.21    950974.68    1951396.42    27164.00
10     180424.26    28167.33    990994.99    2601612.45    30173.00
11             .           .            .             .    28167.00


Step 2 Regressing the Troublesome Variable X1 on Its Recursive Causes


The next step involves regressing the troublesome variable X1 on the instrumental variable X5 and the other variables thought to recursively cause it.
[Path diagram: X5 (lag_com), X3 (crime index), X2 (arrests), X1 (com_corr) and X4 (state), with error terms I1 and I4]

Step 2: Regressing the Troublesome Variable X1 on Its Recursive Causes (cont.)

com_corr = f(lag_com, arrests & crm_indx)

X1 = a + b5(X5) + b2(X2) + b3(X3)

X1 = -8567.07 + 0.418(X5) + 0.0376(X2) - 0.00437(X3)


Regression Analysis for Step 2


Regression

Variables Entered/Removed (b)
Model   Variables Entered                 Variables Removed   Method
1       CRM_INDX, ARRESTS, LAG_COM (a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: COM_CORR

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .919 (a)   .844       .750                1917.5058
a. Predictors: (Constant), CRM_INDX, ARRESTS, LAG_COM
b. Dependent Variable: COM_CORR

ANOVA (b)
Model 1       Sum of Squares   df   Mean Square    F       Sig.
Regression    99266152         3    33088717.21    8.999   .019 (a)
Residual      18384142         5    3676828.498
Total         1.18E+08         8
a. Predictors: (Constant), CRM_INDX, ARRESTS, LAG_COM
b. Dependent Variable: COM_CORR

Coefficients (a)
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B            Std. Error       Beta      t         Sig.
(Constant)    -8567.073    20947.224                  -.409     .699
LAG_COM       .418         .418             .431      1.002     .362
ARRESTS       3.761E-02    .030             .506      1.253     .266
CRM_INDX      -4.37E-03    .002             -.442     -2.063    .094
a. Dependent Variable: COM_CORR

Residuals Statistics (a)
                        Minimum     Maximum      Mean         Std. Deviation   N
Predicted Value         18794.91    30026.32     25046.21     3522.5373        9
Residual                -1779.79    2726.8611    8.084E-13    1515.9214        9
Std. Predicted Value    -1.775      1.414        .000         1.000            9
Std. Residual           -.928       1.422        .000         .791             9
a. Dependent Variable: COM_CORR

Step 3 Saving the Predictions From the Regression Model of Step 2


When the regression model in Step 2 is run:
• The unstandardized predictions of X1 (com_corr) are saved.
• This new variable in the SPSS database is labeled pre_1.
• pre_1 will be used as a proxy variable for the troublesome variable X1.

The Predicted Values of X1

Predictions: pre_1

Case   state        com_corr    arrests      crm_indx      lag_com     pre_1
 1     110331.66    20141.42    810900.00    2101449.14           .              .
 2     160400.00    18134.16    840916.52    2901702.94    20141.00    18794.91006
 3     110331.66    22148.32    850921.95    1751322.88    18134.00    23361.31851
 4     140374.17    21144.91    850921.95    2301516.58    22148.00    22634.78247
 5     130360.56    27164.32    880938.08    2051431.78    21145.00    24437.45568
 6     170412.31    26161.25    900948.68    2801673.32    27164.00    24427.47096
 7     140374.17    25158.11    890943.40    2401549.19    26161.00    25381.07001
 8     150387.30    27164.32    940969.54    2501581.14    25158.00    26405.48824
 9     120346.41    30173.21    950974.68    1951396.42    27164.00    30026.31649
10     180424.26    28167.33    990994.99    2601612.45    30173.00    29947.11905
11             .           .            .             .    28167.00              .
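
The saved predictions are simply the Step 2 equation applied to each case. The sketch below plugs the printed coefficients into the data for cases 2 to 10 and compares the result with the pre_1 column. The function name `predict_x1` is hypothetical. Because the printed coefficients are rounded, the hand-computed values should only agree with SPSS to within a few units.

```python
# Step 2 coefficients as printed in the slides (rounded)
A, B_LAG, B_ARRESTS, B_CRM = -8567.07, 0.418, 3.761e-02, -4.37e-03

# (lag_com, arrests, crm_indx, pre_1 saved by SPSS) for cases 2 to 10
rows = [
    (20141.00, 840916.52, 2901702.94, 18794.91006),
    (18134.00, 850921.95, 1751322.88, 23361.31851),
    (22148.00, 850921.95, 2301516.58, 22634.78247),
    (21145.00, 880938.08, 2051431.78, 24437.45568),
    (27164.00, 900948.68, 2801673.32, 24427.47096),
    (26161.00, 890943.40, 2401549.19, 25381.07001),
    (25158.00, 940969.54, 2501581.14, 26405.48824),
    (27164.00, 950974.68, 1951396.42, 30026.31649),
    (30173.00, 990994.99, 2601612.45, 29947.11905),
]

def predict_x1(lag_com, arrests, crm_indx):
    """Apply X1 = a + b5(X5) + b2(X2) + b3(X3) to one case."""
    return A + B_LAG * lag_com + B_ARRESTS * arrests + B_CRM * crm_indx

gaps = []
for case, (lag_com, arrests, crm_indx, pre_1) in enumerate(rows, start=2):
    by_hand = predict_x1(lag_com, arrests, crm_indx)
    gaps.append(abs(by_hand - pre_1))
    print(case, round(by_hand, 2), pre_1)

max_gap = max(gaps)   # largest disagreement caused by coefficient rounding
```

Case 1 has no prediction because its lag_com value is missing.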


Step 4 Using the Variable pre_1 to Predict X4


A new path model with pre_1 as a proxy variable for X1 (com_corr)

[Path diagram: X5 (lag_com), X3 (crime index), X2 (arrests), X1 represented by pre_1, and X4 (state); the error paths for Ipre_1 and I4 are fixed at 1.0]

The regression model
X4 = a + bpre_1(pre_1) + b2(X2) + b3(X3)


Regression Analysis for Step 4


SPSS regression output

Regression

Variables Entered/Removed (b)
Model   Variables Entered                                        Variables Removed   Method
1       CRM_INDX, ARRESTS, Unstandardized Predicted Value (a)    .                   Enter
a. All requested variables entered.
b. Dependent Variable: STATE

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .953 (a)   .909       .854                8792.7193
a. Predictors: (Constant), CRM_INDX, ARRESTS, Unstandardized Predicted Value
b. Dependent Variable: STATE

ANOVA (b)
Model 1       Sum of Squares   df   Mean Square    F        Sig.
Regression    3.85E+09         3    1282269992     16.586   .005 (a)
Residual      3.87E+08         5    77311912.94
Total         4.23E+09         8
a. Predictors: (Constant), CRM_INDX, ARRESTS, Unstandardized Predicted Value
b. Dependent Variable: STATE

Coefficients (a)
                                  Unstandardized Coefficients   Standardized Coefficients
Model 1                           B            Std. Error       Beta      t        Sig.
(Constant)                        -103846      129910.6                   -.799    .460
Unstandardized Predicted Value    -.215        4.578            -.033     -.047    .964
ARRESTS                           .147         .302             .330      .487     .647
CRM_INDX                          5.149E-02    .017             .869      3.097    .027
a. Dependent Variable: STATE

Residuals Statistics (a)
                        Minimum     Maximum     Mean         Std. Deviation   N
Predicted Value         106512.8    169488.2    144823.4     21928.3207       9
Residual                -9758.68    10936.02    1.779E-11    6951.2550        9
Std. Predicted Value    -1.747      1.125       .000         1.000            9
Std. Residual           -1.110      1.244       .000         .791             9
a. Dependent Variable: STATE

Regression model
X4 = -103846 - 0.215(pre_1) + 0.147(X2) + 0.05149(X3)

Reconfigured Path Diagram with Structure Coefficients

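[Path diagram: reconfigured model in which pre_1 stands in for X1 (com_corr). Path coefficients: X5 (lag_com) to X1 +0.418; X3 (crime index) to X1 -0.004; X2 (arrests) to X1 +0.038; X3 to X2 +0.027; pre_1 to X4 (state) -0.215; X3 to X4 +0.052; X2 to X4 +0.147. Error paths for Ipre_1 and I4 are fixed at 1.0; the diagram is annotated r = 0.0.]

Regression equations
X2 = 827219 + 0.027(X3)
X1 = -8567.07 + 0.418(X5) + 0.0376(X2) - 0.00437(X3)
X4 = -103846 - 0.215(pre_1) + 0.147(X2) + 0.05149(X3)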


Interpretation of the Modified Path Diagram

• When the previous year's county corrections population (X5, lag_com) increases by one offender, the proxy variable for the current year's county corrections population (pre_1) increases by 0.418 offenders: a ratio of +1000 to +418.
• When the proxy variable for the current year's county corrections population (pre_1) increases by one offender, the state's correctional population (X4) decreases by 0.215 offenders: a ratio of +1000 to -215.
• The proxy variable for the current year's county corrections population (pre_1):
    - declines by 0.004 for a unit increase in the crime index (X3) (+1000 to -4)
    - increases by 0.038 for a unit increase in arrests (X2) (+1000 to +38)


Is the Problem of Correlated Errors Eliminated?

Correlation of pre_1 and X4 (state): r = 0.0
In the original path model the correlation between X1 (com_corr) and e4 was r = 0.231

[Path diagram: the reconfigured model with structure coefficients, repeated from the preceding slide and annotated r = 0.0]


2SLS Regression in One Step


SPSS has a routine that allows 2SLS regression to be performed in one step. To estimate the regression of state on its causes,

X4 = a + b2(X2) + b3(X3) + bpre_1(pre_1)

the steps are as follows:

Analyze > Regression > 2-Stage Least Squares
    Dependent:      State
    Explanatory:    Crm_indx, Arrests, Com_corr
    Instrumental:   Crm_indx, Arrests, Lag_com
    OK


SPSS Result of 2SLS in One Step


Two-stage Least Squares

MODEL: MOD_1.
Equation number: 1
Dependent variable.. STATE
Listwise Deletion of Missing Data

Multiple R           .95406
R Square             .91023
Adjusted R Square    .85636
Standard Error       8710.97783

Analysis of Variance:
              DF    Sum of Squares    Mean Square
Regression    3     3846809975.3      1282269991.8
Residuals     5     379405674.1       75881134.8

F = 16.89840        Signif F = .0048

------------------ Variables in the Equation ------------------
Variable      B                SE B           Beta        T        Sig T
COM_CORR      -.214591         4.535006       -.035774    -.047    .9641
CRM_INDX      .051487          .016472        .868543     3.126    .0261
ARRESTS       .147136          .299121        .330126     .492     .6436
(Constant)    -103845.71777    128702.8891                -.807    .4564
Correlation Matrix of Parameter Estimates


Comparison of Results
The table below compares the results of the SPSS 2SLS in one step with the previous analysis involving multiple steps.

Variable                      Statistical Indicator   SPSS 2SLS    Previous Analysis
Crm_indx (X3)                 b                       +0.0515      +0.0515
                              SEb                     ±0.0165      ±0.0166
Arrests (X2)                  b                       +0.1471      +0.1472
                              SEb                     ±0.2992      ±0.3019
Com_corr / pre_1 (X1)         b                       -0.2146      -0.2149
                              SEb                     ±4.5350      ±4.5780
Goodness of Fit Indicators    R2                      0.9102       0.9090
                              1 - R2                  0.0898       0.0913
                              SE prediction           ±8710.98     ±8792.72

There are slight differences in the standard errors, since SPSS adjusts the error for the accuracy with which the instrumental variable predicts com_corr (X1). Otherwise the results are identical within rounding error.
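
One common way to make such an adjustment, assumed here because the slides do not spell out the formula SPSS uses, is to keep the second-stage coefficients but recompute the residuals with the actual X1 in place of pre_1. The sketch below uses simulated data with a single instrument and no other predictors; all numeric settings and helper names are hypothetical.

```python
import random
from math import sqrt

def fit(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

def resid_se(x, y, a, b, n_params=2):
    """Standard error of the estimate for the line a + b*x."""
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (len(y) - n_params))

rng = random.Random(11)
n = 5000
b14, b41 = 0.6, -0.5     # hypothetical loop coefficients

inst = [rng.gauss(0, 1) for _ in range(n)]   # instrumental variable
e1 = [rng.gauss(0, 1) for _ in range(n)]
e4 = [rng.gauss(0, 1) for _ in range(n)]     # error of X4, std. dev. 1

# Reduced form of  X1 = 0.8*Ij + b14*X4 + e1  and  X4 = b41*X1 + e4
x1 = [(0.8 * i + b14 * u4 + u1) / (1 - b14 * b41)
      for i, u1, u4 in zip(inst, e1, e4)]
x4 = [b41 * a + u4 for a, u4 in zip(x1, e4)]

a1, c1 = fit(inst, x1)                  # first stage
pre_1 = [a1 + c1 * i for i in inst]     # saved predictions
a2, b2 = fit(pre_1, x4)                 # second-stage coefficients

se_two_step = resid_se(pre_1, x4, a2, b2)   # residuals taken around pre_1
se_one_step = resid_se(x1, x4, a2, b2)      # same coefficients, actual X1
print(round(b2, 3), round(se_two_step, 3), round(se_one_step, 3))
```

The coefficient is the same in both versions; only the error estimate changes, and the size and direction of the gap depend on the data.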

DITS Analysis: Direct, Indirect, Total & Spurious Effects


Direct Effects: D = bij

D41 = -0.215
D42 = +0.147
D43 = +0.052
D12 = +0.038
D13 = -0.004
D15 = +0.418
D23 = +0.027

Indirect Effects: I = Π bij (the product of the coefficients along each indirect path)

I412 = (-0.215)(0.038) = -0.00817
I413 = (-0.215)(-0.004) = +0.00086
I415 = (-0.215)(+0.418) = -0.08987
I423 = (0.147)(0.027) = +0.00397
I4123 = (-0.215)(0.038)(0.027) = -0.00022
I123 = (0.038)(0.027) = +0.00103

Total Effects: T = D + I

T41 = D41 = -0.215
T42 = D42 + I412 = 0.147 - 0.00817 = +0.13883
T43 = D43 + I413 + I423 + I4123 = 0.052 + 0.00086 + 0.00397 - 0.00022 = +0.05661
T45 = I415 = -0.08987
T12 = D12 = +0.038
T13 = D13 + I123 = -0.004 + 0.00103 = -0.00297
T15 = D15 = +0.418
T23 = D23 = +0.027

Spurious Effects: S = D - T = bij - T

S41 = b41 - T41 = (-0.215) - (-0.215) = 0.00
S42 = b42 - T42 = (0.147) - (0.13883) = 0.008 ≈ 0.00
S43 = b43 - T43 = (0.052) - (0.05661) = -0.005 ≈ 0.00
S12 = b12 - T12 = (0.038) - (0.038) = 0.00
S13 = b13 - T13 = (-0.004) - (-0.00297) = -0.001 ≈ 0.00
S15 = b15 - T15 = (0.418) - (0.418) = 0.00
S23 = b23 - T23 = (0.027) - (0.027) = 0.00

Summary of DITS Results

All effects have been multiplied by 100 & indicate the effect of a change of 100 in a predictor variable on the dependent variable

Effects on State Corrections (X4)
Variable         D        I via (Xj)                                T        S
Crm_indx (X3)    +5.2     +0.086 (1), +0.397 (2), -0.022 (1,2)      +5.7     0.00
Arrests (X2)     +14.7    -0.817 (1)                                +13.9    0.00
Com_corr (X1)    -21.5    NA                                        -21.5    0.00

Effects on County Corrections (X1)
Variable         D        I via (Xj)    T        S
Crm_indx (X3)    -0.4     +0.10 (2)     -0.3     0.00
Arrests (X2)     +3.8     NA            +3.8     0.00
Lag_com (X5)     +41.8    NA            +41.8    0.00
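
The DITS bookkeeping can be checked mechanically by walking every directed path in the reconfigured model and multiplying the signed coefficients along it. The sketch below uses the path coefficients from the reconfigured diagram; the function names are hypothetical, and it relies on the reconfigured model having no loops.

```python
# Path coefficients keyed as (effect, cause), from the reconfigured diagram
b = {
    ("X2", "X3"): 0.027,
    ("X1", "X2"): 0.038,
    ("X1", "X3"): -0.004,
    ("X1", "X5"): 0.418,
    ("X4", "X1"): -0.215,
    ("X4", "X2"): 0.147,
    ("X4", "X3"): 0.052,
}

def paths(cause, effect):
    """Yield (route, product of coefficients) for every directed path."""
    for (to, frm), coef in b.items():
        if frm != cause:
            continue
        if to == effect:
            yield [cause, effect], coef
        else:
            for route, prod in paths(to, effect):
                yield [cause] + route, coef * prod

def dits(cause, effect):
    """Return (direct, indirect, total) effects of cause on effect."""
    direct = b.get((effect, cause), 0.0)
    total = sum(prod for _, prod in paths(cause, effect))
    return direct, total - direct, total

for cause in ("X3", "X2", "X1", "X5"):
    d, i, t = dits(cause, "X4")
    print(cause, "-> X4", round(d, 5), round(i, 5), round(t, 5))
    for route, prod in paths(cause, "X4"):
        print("    ", " > ".join(route), round(prod, 5))
```

Listing each route next to its product makes it easy to confirm the sign of every indirect effect before the totals are summed.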


The Case Study


The case study that follows utilizes the database State. It involves four phases analogous to the analysis presented on the previous pages:

Phase 1  Diagnosis of the error terms
Phase 2  Creation of a lagged instrumental variable
Phase 3  Estimation of 2SLS equations
Phase 4  Execution of SPSS 2SLS
