

David Garson
Professor of Public Administration, North Carolina State University, Raleigh, North Carolina.

Path Analysis
Lecture Notes, 2008
Contents
Key concepts and terms
Path coefficients
Path multiplication rule
Effect decomposition
Path analysis with structural equation modeling
Assumptions
Frequently asked questions
Bibliography

Overview
Path analysis is an extension of the regression model, used to test the fit of the correlation
matrix against two or more causal models which are being compared by the researcher. The
model is usually depicted in a circle-and-arrow figure in which single-headed arrows
indicate causation. A regression is done for each variable in the model as a dependent on
others which the model indicates are causes. The regression weights predicted by the model
are compared with the observed correlation matrix for the variables, and a goodness-of-fit
statistic is calculated. The best-fitting of two or more models is selected by the researcher as
the best model for advancement of theory.
Path analysis requires the usual assumptions of regression. It is particularly sensitive to
model specification because failure to include relevant causal variables or inclusion of
extraneous variables often substantially affects the path coefficients, which are used to
assess the relative importance of various direct and indirect causal paths to the dependent
variable. Such interpretations should be undertaken in the context of comparing alternative models, after assessing their goodness of fit, as discussed in the section on structural equation modeling (SEM packages are commonly used today for path analysis in lieu of stand-alone
path analysis programs). When the variables in the model are latent variables measured by
multiple observed indicators, path analysis is termed structural equation modeling, treated
separately. We follow the conventional terminology by which path analysis refers to
modeling single-indicator variables.


Key Concepts and Terms


Estimation. Path estimates may be calculated by OLS regression or by maximum likelihood estimation (MLE), depending on the computer package. Two-Stage Least Squares (2SLS), discussed separately, is another path estimation procedure, designed to extend the OLS regression model to situations where non-recursivity is introduced because the researcher must assume the covariances of some disturbance terms are not 0 (this assumption is discussed below).
o Path model. A path model is a diagram relating independent, intermediary, and dependent variables. Single arrows indicate causation between exogenous or intermediary variables and the dependent(s). Arrows also connect the error terms with their respective endogenous variables. Double-headed arrows indicate correlation between pairs of exogenous variables. Sometimes the arrows in the path model are drawn with widths proportional to the absolute magnitude of the corresponding path coefficients (see below).
o Causal paths to a given variable include (1) the direct paths from arrows leading to it, and (2) correlated paths from exogenous variables correlated with others which have arrows leading to the given variable. Consider this model:

This model has correlated exogenous variables A, B, and C, and endogenous variables D and E. Error terms are not shown. The causal paths relevant to
variable D are the paths from A to D, from B to D, and the paths reflecting
common anteceding causes -- the paths from B to A to D, from C to A to D,
and from C to B to D. Paths involving two correlations (C to B to A to D) are
not relevant. Likewise, paths that go backward (E to B to D, or E to B to A to
D) reflect common effects and are not relevant.

Exogenous and endogenous variables. Exogenous variables in a path model are those with no explicit causes (no arrows going to them, other than the
measurement error term). If exogenous variables are correlated, this is
indicated by a double-headed arrow connecting them. Endogenous variables,
then, are those which do have incoming arrows. Endogenous variables include
intervening causal variables and dependents. Intervening endogenous variables
have both incoming and outgoing causal arrows in the path diagram. The
dependent variable(s) have only incoming arrows.


Path coefficient/path weight. A path coefficient is a standardized regression coefficient (beta) showing the direct effect of an independent variable on a
dependent variable in the path model. Thus when the model has two or more
causal variables, path coefficients are partial regression coefficients which
measure the extent of effect of one variable on another in the path model
controlling for other prior variables, using standardized data or a correlation
matrix as input. Recall that for bivariate regression, the beta weight (the b
coefficient for standardized data) is the same as the correlation coefficient, so
for the case of a path model with a variable as a dependent of a single
exogenous variable (and an error residual term), the path coefficient in this
special case is a zero-order correlation coefficient.
Consider this model, based on Bryman, A. and D. Cramer (1990). Quantitative
data analysis for social scientists, pp. 246-251.

This model is specified by the following path equations:


Equation 1. satisfaction = b11(age) + b12(autonomy) + b13(income) + e1
Equation 2. income = b21(age) + b22(autonomy) + e2
Equation 3. autonomy = b31(age) + e3
where the b's are the regression coefficients and their subscripts are the equation number and variable number (thus b21 is the coefficient in Equation 2 for variable 1, which is age).
Note: In each equation, only (and all of) the direct priors of the endogenous
variable being used as the dependent are considered. The path coefficients,
which are the betas in these equations, are thus the standardized partial
regression coefficients of each endogenous variable on its priors. That is, the
beta for any path (that is, the path coefficient) is a partial weight controlling for
other priors for the given dependent variable.
Formerly called p coefficients, path coefficients are now usually called simply beta weights, following usage in multiple regression models. Bryman and Cramer computed the path coefficients (standardized regression coefficients, or beta weights) to be: age -> satisfaction = -.08, age -> income = .57, age -> autonomy = .28, autonomy -> income = .22, autonomy -> satisfaction = .58, and income -> satisfaction = .47. These are the values used in the effect decomposition example below; a regression sketch of how such coefficients are computed follows.
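As an illustration, the following is a minimal sketch (not Garson's code; the variable names and the data file "jobs.csv" are hypothetical) of estimating these path coefficients by running one OLS regression per endogenous variable on standardized data, which is the procedure the equations above describe:

import numpy as np
import pandas as pd

def standardize(frame):
    # z-score each column so that the OLS slopes are beta weights (path coefficients)
    return (frame - frame.mean()) / frame.std(ddof=0)

def path_coefficients(frame, dependent, priors):
    # One regression per endogenous variable: regress it on its direct priors only
    z = standardize(frame[[dependent] + priors])
    X = np.column_stack([z[p].to_numpy() for p in priors])
    y = z[dependent].to_numpy()
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    r_squared = 1.0 - np.sum((y - X @ betas) ** 2) / np.sum(y ** 2)
    return dict(zip(priors, betas)), r_squared

# Mirroring Equations 1-3 above (hypothetical file "jobs.csv" holding the four variables):
# df = pd.read_csv("jobs.csv")
# eq3 = path_coefficients(df, "autonomy", ["age"])
# eq2 = path_coefficients(df, "income", ["age", "autonomy"])
# eq1 = path_coefficients(df, "satisfaction", ["age", "autonomy", "income"])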


Correlated Exogenous Variables. If exogenous variables are correlated, it is common to label the corresponding double-headed arrow between them with its correlation coefficient.
Disturbance terms. The residual error terms, also called disturbance terms, reflect unexplained variance (the effect of unmeasured variables) plus measurement error. Note that the dependent in each equation is an endogenous variable (in this case, all variables except age, which is exogenous). Note also that the independents in each equation are all the variables with arrows to the dependent.
The effect size of the disturbance term for a given endogenous variable, which reflects unmeasured variables, is (1 - R2), and its variance is (1 - R2) times the variance of that endogenous variable, where R2 is based on the regression in which that endogenous variable is the dependent and the variables with arrows to it are the independents. The path coefficient for the disturbance term is SQRT(1 - R2); see the short sketch after this item.
The correlation between two disturbance terms is the partial correlation
of the two endogenous variables, using as controls all their common
causes (all variables with arrows to both). The covariance estimate is
the partial covariance: the partial correlation times the product of the
standard deviations of the two endogenous variables.
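A minimal sketch of the error-term path computation just described (the R-squared value here is a hypothetical illustration, not taken from the example model):

import math

def disturbance_path(r_squared):
    # Path weight from a disturbance (error) term to its endogenous variable
    return math.sqrt(1.0 - r_squared)

# e.g., if the regression for an endogenous variable gave an unadjusted R-squared of .52,
# the path from its error term would be sqrt(1 - .52), about .69
print(round(disturbance_path(0.52), 2))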

Path multiplication rule: The value of any compound path is the product of
its path coefficients. Imagine a simple three-variable compound path where
education causes income causes conservatism. Let the regression coefficient of
income on education be 1000: for each year of education, income goes up
$1,000. Let the regression coefficient of conservatism on income be .0002: for
every dollar income goes up, conservatism goes up .0002 points on a 5-point scale. Thus if education goes up 1 year, income goes up $1,000, which means conservatism goes up .2 points. This is the same as multiplying the coefficients: 1000*.0002 = .2. The same principle would apply if there were more links in the path. If standardized path coefficients (beta weights) were used, the path multiplication rule would still apply, but the interpretation would be in standardized terms. Either way, the product of the coefficients along the path reflects the weight of that path.
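A minimal sketch of the path multiplication rule, using the coefficients from the education-income-conservatism example above:

from functools import reduce

def compound_path_value(coefficients):
    # The value of a compound path is the product of the coefficients along it
    return reduce(lambda a, b: a * b, coefficients, 1.0)

# education -> income (1000 dollars per year) -> conservatism (.0002 points per dollar)
print(compound_path_value([1000, 0.0002]))   # 0.2 points per year of education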


Effect decomposition. Path coefficients may be used to decompose correlations in the model into direct and indirect effects, corresponding, of
course, to direct and indirect paths reflected in the arrows in the model. This is
based on the rule that in a linear system, the total causal effect of variable i on
variable j is the sum of the values of all the paths from i to j. Considering
"satisfaction" as the dependent in the model above, and considering "age" as
the independent, the indirect effects are calculated by multiplying the path
coefficients for each path from age to satisfaction:
age -> income -> satisfaction is .57 * .47 = .26
age -> autonomy -> satisfaction is .28 * .58 = .16
age -> autonomy -> income -> satisfaction is .28 * .22 * .47 = .03
total indirect effect = .45
That is, the total indirect effect of age on satisfaction is +.45. In comparison, the direct effect is only -.08. The total causal effect of age on satisfaction is (-.08 + .45) = .37.
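A minimal sketch of this decomposition, using the path coefficients quoted above (the products are computed without intermediate rounding, so the totals differ slightly from the rounded figures in the text):

# Standardized path coefficients for the Bryman and Cramer example
paths = {
    ("age", "income"): 0.57,
    ("age", "autonomy"): 0.28,
    ("autonomy", "income"): 0.22,
    ("income", "satisfaction"): 0.47,
    ("autonomy", "satisfaction"): 0.58,
    ("age", "satisfaction"): -0.08,   # direct effect
}

def path_value(chain):
    # Path multiplication rule: multiply coefficients along a chain of variables
    value = 1.0
    for cause, effect in zip(chain, chain[1:]):
        value *= paths[(cause, effect)]
    return value

indirect = (path_value(["age", "income", "satisfaction"])
            + path_value(["age", "autonomy", "satisfaction"])
            + path_value(["age", "autonomy", "income", "satisfaction"]))
direct = paths[("age", "satisfaction")]
print(round(indirect, 2))           # about .46 (the text's .45 rounds each product first)
print(round(direct + indirect, 2))  # about .38 total causal effect (text: .37)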
Effect decomposition is equivalent to effects analysis in regression with one
dependent variable. Path analysis, however, can also handle effect
decomposition for the case of two or more dependent variables.
In general, any bivariate correlation may be decomposed into spurious and
total causal effects, and the total causal effect can be decomposed into a direct
and an indirect effect. The total causal effect is the coefficient in a regression
with all of the model's prior but not intervening variables for x and y controlled
(the beta coefficient for the usual standardized solution, the partial b
coefficient for the unstandardized or raw solution). The spurious effect is the
total effect minus the total causal effect. The direct effect is the partial
coefficient (beta for standardized, b for unstandardized) for y on x controlling
for all prior variables and all intervening variables in the model. The indirect
effect is the total causal effect minus the direct effect, and measures the effect
of the intervening variables. Where effects analysis in regression may use a
variety of coefficients (partial correlation or regression, for instance), effect
decomposition in path analysis is restricted to use of regression.
For instance, imagine a five-variable model in which the exogenous variable
Education is correlated with the exogenous variable Skill Level, and both
Education and Skill Level are correlated with the exogenous variable Job
Status. Further imagine that Education and each of the other two exogenous
variables are modeled to be direct causes of Income and also of Median House
Value, which are the two dependent variables. We might then decompose the
correlation of Education and Income:


1. Direct effect of Education on Income, indicated by the path coefficient of the single-headed arrow from Education to Income.
2. Indirect effect due to Education's correlation with Skill Level, and Skill
Level's direct effect on Income, indicated by multiplying the correlation
of Education and Skill Level by the path coefficient from Skill Level to
Income.
3. Indirect effect due to Education's correlation with Job Status, and Job
Status's direct effect on Income, indicated by multiplying the
correlation of Education and Job Status by the path coefficient from Job
Status to Income.
A second example of decomposition for the same five-variable model is a bit more complex: breaking down the correlation of the two dependent variables, Income and Median House Value. Since here, somewhat implausibly, the model specifies no direct effect from Income to House Value, the true correlation is hypothesized to be zero and all of the observed correlation is spurious.
4. The spurious direct effect of Education as a common anteceding
variable directly causing both dependents, indicated by multiplying the
path coefficient from Education to Income by the path coefficient of
Education to House Value.
5. The spurious direct effect of Skill Level as a common anteceding
variable directly causing both dependents, indicated by multiplying the
path coefficient from Skill Level to Income by the path coefficient of
Skill Level to House Value.
6. The spurious direct effect of Job Status as a common anteceding
variable directly causing both dependents, indicated by multiplying the
path coefficient from Job Status to Income by the path coefficient of
Job Status to House Value.
7. The spurious indirect effect of Education and Skill Level as common anteceding variables directly causing both dependents, indicated by
multiplying the path coefficient from Education to Income by the
correlation of Education and Skill Level by the path from Skill Level to
House Value and adding the product of the path from Skill Level to
Income by the correlation of Education and Skill Level by the path
from Education to Median House Value.
8. The spurious indirect effect of Education and Job Status as common anteceding variables directly causing both dependents, indicated by
multiplying the path coefficient from Education to Income by the
correlation of Education and Job Status by the path from Job Status to
House Value and adding the product of the path from Job Status to
Income by the correlation of Education and Job Status by the path from
Education to Median House Value.


9. The spurious indirect effect of Skill Level and Job Status as common anteceding variables directly causing both dependents, indicated by
multiplying the path coefficient from Skill Level to Income by the
correlation of Skill Level and Job Status by the path from Job Status to
House Value and adding the product of the path from Job Status to
Income by the correlation of Skill Level and Job Status by the path
from Skill Level to Median House Value.
10. The residual effect is the difference between the correlation of Income
and Median House Value and the sum of the spurious direct and
indirect effects.
Correlated exogenous variables. The path weights connecting correlated exogenous variables are equal to the Pearson correlations. When calculating indirect paths, not only the direct arrows but also the double-headed arrows connecting correlated exogenous variables are used in tracing possible indirect paths, with one exception:
Tracing rule: An indirect path cannot enter and exit a variable on an arrowhead. This means, for example, that you cannot form a valid tracing by combining the direct paths from two correlated exogenous variables to a common effect (entering that variable on one arrowhead and exiting on another).


Significance and Goodness of Fit in SEM Path Models


OLS vs. SEM. While a series of OLS regressions may be used to implement path analysis, testing individual path coefficients using the standard t or F test from regression output, today it is far more common to use structural equation modeling (SEM) software. This section uses AMOS with a model based on Ingram et al. (2000), used with the kind permission of Karl Wuensch. Use of AMOS is described more fully in the section on structural equation modeling. A structural equation model with observed (single-indicator) rather than latent variables is a path model.
The SEM path model. The path model is drawn as usual in SEM.
Illustrated below is the model for the Ingram data, which deals with
application to graduate schools. In this model, Attitude, SubNorm, and
PBC all predict Intent, while the ultimate dependent variable, Behavior,
is predicted by Intent and also directly by PBC. As is customary, the straight arrows represent regression paths for presumed causal relationships, while the curved double-headed arrows represent assumed covariances among the exogenous variables. The endogenous variables are depicted with associated error terms.


In this model, Attitude is the individual's attitude toward graduate school; SubNorm is subjective norms, reflected by the attitudes toward graduate school of others around the individual; PBC is perceived behavioral control, the individual's perceived level of control over behaviors related to graduate school. Intent is the individual's intent to go to graduate school. Behavior is applying to graduate school.


Select outputs. Statistical tests and other outputs are selected under View, Analysis Properties, in the AMOS menu system, which opens the Analysis Properties dialog.


Path coefficients in standardized and unstandardized form are generated by AMOS by selecting Analyze, Compute Estimates. Unlike the OLS regression method, all parameters are calculated simultaneously. These coefficients may be displayed on the path diagram, and also appear in the output, which is obtained in the menu system by selecting View, Text Output. For this example, note that the paths from SubNorm and PBC to Intent are not significant. That is, Intent is primarily a function of Attitude.

Overall test of the model. The likelihood ratio chi-square test, also called the model chi-square test or deviance test, assesses the overall fit of the model. A finding of nonsignificance corresponds to an adequate model - one whose model-implied covariance matrix does not differ significantly from the observed covariance matrix. For this example, there is adequate fit:


Result (Default model)


Minimum was achieved
Chi-square = .847
Degrees of freedom = 2
Probability level = .655

However, the likelihood ratio chi-square test cannot be relied upon alone, particularly for large samples, because a finding of significance (rejecting the model) can occur even with very small differences between the model-implied and observed covariance matrices (note that AMOS labels the likelihood ratio chi-square CMIN). Therefore a large variety of other goodness-of-fit measures have been devised. Their use and relative merits are described in the section on structural equation modeling. In the output, the "default model" is the researcher's model. The "saturated model" is the perfectly explanatory but trivial model with all possible arrows. The "independence model" is the model with no regression (straight) arrows. Suffice it to say, these goodness-of-fit measures support the adequacy of the model in the example (for example, RMSEA should be < .05 for a good model and here is .000). Note, however, that in spite of statistically adequate goodness of fit, normally the researcher would drop the non-significant structural arrows noted above in the path coefficients section.


Correlations. Also included in AMOS output are the correlations among the exogenous variables (upper output below) and the squared multiple correlations for the endogenous variables (lower output below). The model explains 34.3% of the variance in the dependent variable, Behavior.

Correlations: (Group number 1 - Default model)


Estimate
SubNorm <--> PBC .505
Attitude <--> SubNorm .472
Attitude <--> PBC .665


Squared Multiple Correlations: (Group number 1 - Default model)

Estimate
Intent      .600
Behavior    .343

Direct and indirect effects. AMOS will use the multiplication rule automatically to partition overall effects into direct and indirect effects for the endogenous variables (for Intent and Behavior in this example).


Modification indexes. Modification indexes (MI) may be used to add arrows to the model. The larger the MI, the more adding the corresponding arrow will improve model fit. As discussed in the section on structural equation modeling, MIs should be interpreted in relation to critical ratios (CR), which are a measure of the change in likelihood ratio chi-square. And as noted above, nonsignificance of path coefficients is used to drop arrows in a model-building and model-trimming process discussed in the section on structural equation modeling. The conservative approach calls for adding or dropping one arrow at a time, as each change will
affect the coefficients. For this example, the MIs are so small that no
addition of arrows is warranted. In fact, all MIs are well below the
usual lower threshold of 4.0.

Assumptions
o Linearity: relationships among variables are linear (though, of course, variables may be nonlinear transforms).
o Additivity: there are no interaction effects (though, of course, variables may be interaction crossproduct terms).
o Interval level data for all variables, if regression is being used to estimate
path parameters. As in other forms of regression modeling, it is common to use
dichotomies and ordinal data in practice. If dummy variables are used to code a
categorical variable, one must be careful that they are represented as a block in
the path diagram (ex., if an arrow is drawn to one dummy it must be drawn to
all others in the set). If an arrow were to be drawn from one dummy variable to
another dummy variable in the same set, this would violate the recursivity
assumption discussed below.
o Residual (unmeasured) variables are uncorrelated with any of the variables
in the model other than the one they cause.

Disturbance terms are uncorrelated with endogenous variables. As a corollary of the previous assumption, path analysis assumes that for any endogenous variable, its disturbance term is uncorrelated with any other endogenous variable in the model. This is a critical assumption, violation of which may make regression inappropriate as a method of estimating path parameters. This assumption may be violated due to measurement error in measuring an endogenous variable; when an endogenous variable is actually a direct or indirect cause of a variable which the model states is a cause of that endogenous variable (reverse causation); or when a variable not in the model is a cause both of an endogenous variable and of a variable the model specifies as a cause of that endogenous variable (spurious causation).
Low multicollinearity (otherwise one will have large standard errors of the b
coefficients used in removing the common variance in partial correlation
analysis).
No underidentification or underdetermination of the model. For underidentified models there are too few structural equations to solve for the unknowns. Overidentification usually provides better estimates of the underlying true values than does just-identification.
Recursivity: all arrows flow one way, with no feedback looping. Also,
it is assumed that disturbance (residual error) terms for the endogenous
variables are uncorrelated. Recursive models are never underidentified.
Proper specification of the model is required for interpretation of path
coefficients. Specification error occurs when a significant causal variable is
left out of the model. The path coefficients will reflect the shared covariance
with such unmeasured variables and will not be accurately interpretable in
terms of direct and indirect effects. In particular, if a variable specified as prior
to a given variable is really consequent to it, "we can do ourselves considerable
damage" (Davis, 1985: 64) because if a variable is consequent it would be
estimated to have no path effect, whereas when it is included as a prior variable
in the model, this erroneously changes the coefficients for other variables in
the model. Note, however, that while interpretation of path coefficients is
inaccurate under specification error, it is still possible to compare the relative
fit of two models, perhaps both with specification error.
Appropriate correlation input. When using a correlation matrix as input, it is
appropriate to use Pearsonian correlation for two interval variables, polychoric
correlation for two ordinals, tetrachoric for two dichotomies, polyserial for an
interval and an ordinal, and biserial for an interval and a dichotomy.
Adequate sample size is needed to assess significance. Kline (1998)
recommends 10 times as many cases as parameters (or ideally 20 times). He
states that 5 times or less is insufficient for significance testing of model
effects.
The same sample is required for all regressions used to calculate the path model. This may require reducing the data set so that there are no missing values for any of the variables included in the model, which might be achieved by listwise deletion of cases or by data imputation.


Frequently Asked Questions



Does path analysis confirm causation in a model?


No, although this is sometimes said. Everitt and Dunn (1991) note,
"However convincing, respectable and reasonable a path diagram...
may appear, any causal inferences extracted are rarely more than a
form of statistical fantasy". The authors are referring to the fact that
ultimately path analysis deals with correlation, not causation of
variables. The arrows in path models do indeed reflect hypotheses
about causation. However, many models may be consistent with a
given dataset. Path analysis merely illuminates which of two or more
competing models, derived from theory, is most consistent with the
pattern of correlations found in the data. The competing theories may
be represented in separate path models with separate path analyses, or
may be combined in a single path diagram, in which case the researcher
is concerned with comparing the relative importance of different paths
within the diagram.

Can path analysis be used for exploratory rather than confirmatory purposes?
Methodologists favor a priori formulation of hypotheses about the
results to be obtained from path analysis, not post factum conclusions
based on the results. That is, the researcher should be seeking to
confirm hypotheses made beforehand. At a minimum, the researcher
should posit the sign of each relationship (arrow) in the model, and
ideally should go further to posit the arrangement by magnitude of the
importance of the independents, or even better yet, the intervals within
which the path coefficients will be expected to lie. Since data can
support multiple models, path analysis should focus on determining
which of two or more theoretically-derived models most conforms to the underlying data.
If used in an exploratory way, note that the more models you test, the more likely you are to confirm one just by chance. The effective confidence level equals the individual test confidence level raised to the power of the number of models tested. Testing 3 models at the .95 level thus means operating at about .95^3 = .86 confidence. At the .99 level, however, effective confidence would still be about .95 even with 5 models.
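A minimal arithmetic sketch of this calculation (it assumes the model tests are independent, as the text's exponent rule implies):

def effective_confidence(per_test_confidence, n_models):
    # Joint confidence when each of n independent model tests is run at the same level
    return per_test_confidence ** n_models

print(round(effective_confidence(0.95, 3), 2))   # about 0.86
print(round(effective_confidence(0.99, 5), 2))   # about 0.95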


How do I know if my model is "underidentified" and what difference does it make? How does this relate to "recursivity"?
A unique path solution cannot be calculated if a model is underidentified. Identification is defined, and steps the researcher can take to deal with underidentification are discussed, in the section on structural equation modeling. How to determine beforehand whether a model is underidentified, other than by running a path analysis program on sample or fictional data and checking for error flags, is also discussed more fully in the section on structural equation modeling.

How does the significance of a path coefficient compare with the significance of the corresponding regression coefficient?
They are identical. The path coefficient is the beta for the regression of
the dependent or other endogenous variable on the other variables with
arrows to it. The significance of the beta and b coefficients will be the
same, and is displayed on the same line in SPSS regression output.
Naturally, all paths in the model should be significant!

How do you assess the significance of the total (direct and indirect) effect
of exogenous variable x on endogenous variable y?
Run a regression with y as dependent and all others as independents,
leaving out any variable which mediates between x and y. The
significance of the b or beta for x in this equation is a test of the
significance of the total effect.
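A minimal sketch of this test for the Bryman and Cramer example (the data file "jobs.csv" is hypothetical; statsmodels is used here, though any OLS routine would do). Age's only priors in that model are its mediators, autonomy and income, so they are omitted:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("jobs.csv")                        # hypothetical data file
fit = smf.ols("satisfaction ~ age", data=df).fit()  # autonomy and income (mediators of age) left out
print(fit.params["age"], fit.pvalues["age"])        # total effect of age and its significance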

Why might the direct effect be zero?


There is a fully controlling mediating effect or fully controlling
anteceding effect. See further discussion in the section on partial
correlation.

How are path coefficients related to the correlation matrix for purposes of
testing a model?
First, recognize that computation of the model-estimated correlations
and their comparison with observed correlations is best done by relying
on a model-estimating program such as LISREL or AMOS. The model
path coefficients can be compared to the predicted path coefficients as
computed from the correlation matrix, following which the model
coefficients can be tested for goodness-of-fit with the predicted
coefficients.


The tracing rule is a rule for identifying all the paths, the sum of effects
of which is the estimated correlation between two variables in the
model. This model-estimated correlation can be compared to the
observed correlation to assess the fit of the model to the data. The
tracing rule is simply that the model-implied correlation between two
variables in a model is the sum of all valid paths (tracings) between the
two variables. These include the total effect (which is the sum of direct
and indirect effects) plus any associational effects due to correlated
exogenous variables. These associational effects are calculated by
multiplying the correlation between the exogenous variable under consideration and a second exogenous variable, by this second exogenous variable's total effect on the target variable under consideration.
For simplicity, consider a simple model in which exogenous variables A and B are correlated (the correlation labeled n) and each has a direct path to endogenous variable D (path p from A, path q from B).

The actual (observed) correlation matrix might look like this:

        A       B       D
A       1
B      .379     1
D     -.652   -.451     1

When running AMOS or another path analysis program, the path coefficients (standardized regression coefficients) would be:

        A       B       D
A       1
B      .379     1
D     -.562   -.238     1

If one had only the path output and wanted to estimate back to the
correlation matrix, one would use these equations, one for each path:

r(AB) = n = .379 (the correlation is the standardized path coefficient, in the bivariate case where the independent is exogenous and the dependent has no other inputs)
r(BD) = q + np = -.238 + .379(-.562) = -.451
r(AD) = p + nq = -.562 + .379(-.238) = -.652
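A minimal sketch of this tracing-rule reconstruction, checking the model-implied correlations against the observed ones:

# Path coefficients from the example: n is the A<->B correlation,
# p the A->D path, q the B->D path
n, p, q = 0.379, -0.562, -0.238

implied = {
    "r(AB)": n,
    "r(BD)": q + n * p,
    "r(AD)": p + n * q,
}
observed = {"r(AB)": 0.379, "r(BD)": -0.451, "r(AD)": -0.652}

for pair, value in implied.items():
    # Small differences between implied and observed correlations indicate good fit
    print(pair, round(value, 3), observed[pair])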


In testing a model, somewhat similar reasoning is followed to compare model-implied covariance matrices with observed covariance matrices, with smaller differences indicating better goodness of fit.

How, exactly, can I compute path coefficients in SPSS?


The recommended method is to enter a path model into AMOS, which
is the SPSS program for structural equation modeling, discussed in that
section. However, in the SPSS regression module, for a recursive
model, let VARA cause VARB and VARC, and let VARB cause
VARC. A series of regressions is conducted with each non-exogenous
variable considered as a dependent in turn. For the foregoing model, the
path coefficient from VARA to VARB is given by this SPSS code:
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT VARB
/METHOD=ENTER VARA.

The path coefficients from VARA to VARC and from VARB to VARC
are given by this second regression command:

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT VARC
/METHOD=ENTER VARA VARB.

How do I compute the value of the path from an error term to an endogenous variable?
The path is the square root of (1 - R-squared), where R-squared is from
the regression equation for the corresponding dependent variable. Do
not use adjusted R-squared.


How can multiple group path analysis determine if the path model differs across groups in my sample?
Multiple group path analysis may be accomplished simply by running a separate path analysis for each group in the sample, then comparing the path estimates. A more sophisticated approach supported by some path analysis and SEM packages involves a second step: impose a cross-group equality constraint on the path estimates, run the analysis separately for each group, and then see if the goodness of fit for the constrained models is as good as for the unconstrained models. If the fit of the constrained model is worse than that of the corresponding unconstrained model, the researcher concludes that model direct effects differ by group.

Could I substitute logistic regression in doing effect decomposition?


No. Forward paths cannot be decomposed accurately with log linear
techniques. See Davis (1985): 48, 59.
Are regression and SEM the only approaches to path analysis?


Partial least squares path analysis is also available, through custom software (not SPSS or SAS, which only support PLS regression). PLS
can support small sample models, even where there are more variables
than observations, but it is lower in power than SEM approaches.


Bibliography
Alwin, Duane F. and Robert M. Hauser (1975). The decomposition of effects in path analysis. American Sociological Review 40(Feb.): 37-47.

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.

Cohen, Jacob, et al. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum Associates. Chapter 12 is a good generalized introduction to path analysis.

Davis, James A. (1985). The logic of causal order. Quantitative Applications in the Social Sciences series, no. 55. Thousand Oaks, CA: Sage Publications. Pp. 44-48 provide a non-technical introduction to effects analysis. Pp. 48-68 discuss path analysis generally.

Everitt, B. S., and G. Dunn (1991). Applied multivariate data analysis. London: Edward Arnold.

Heise, David R. (1975). Causal analysis. NY: Wiley.

Ingram, K. L., Cope, J. G., Harju, B. L., & Wuensch, K. L. (2000). Applying to graduate school: A test of the theory of planned behavior. Journal of Social Behavior and Personality, 15, 215-226.

Kenny, D. (2008). Reflections on mediation. Organizational Research Methods, 11(2), 353.

Kerlinger, F. N. and E. J. Pedhazur (1973). Multiple regression in behavioral research. NY: Holt, Rinehart, and Winston.

Kline, Rex B. (1998). Principles and practice of structural equation modeling. NY: Guilford Press. A very readable introduction to the subject, with good coverage of assumptions and SEM's relation to underlying regression, factor, and other techniques.

Loehlin, J. (1991). Latent variable models: An introduction to factor, path and structural analysis. Hillsdale, NJ: Lawrence Erlbaum. Widely used textbook.

Pedhazur, Elazer J. (1982). Multiple regression in behavioral research, 2nd edition. NY: Holt. Chapter 15 (pp. 577-635) covers path analysis. Widely used textbook.

Schumacker, Randall E. and Richard G. Lomax (2004). A beginner's guide to structural equation modeling, Second edition. Mahwah, NJ: Lawrence Erlbaum Associates. Chapter 7 discusses path models in the context of SEM packages.

Wright, S. (1921). Correlation and causation. Journal of Agricultural Research 20: 557-585.

Wright, S. (1934). The method of path coefficients. Annals of Mathematical Statistics, Vol. 5: 161-215. The seminal article.
