The Use of Structural Equation Modeling in Counseling Psychology Research

Matthew P. Martens
University at Albany, State University of New York
Structural equation modeling (SEM) has become increasingly popular for analyzing data in the social sciences, although several broad reviews of psychology journals suggest that many SEM researchers engage in questionable practices when using the technique. The purpose of this study is to review and critique the use of SEM in counseling psychology research regarding several of these questionable practices. One hundred five studies from 99 separate articles published in the Journal of Counseling Psychology between 1987 and 2003 were reviewed. Results of the review indicate that many counseling psychology studies do not engage in various best practices recommended by SEM experts (e.g., testing multiple a priori theoretical models or reporting all parameter estimates or effect sizes). Results also indicate that SEM practices in counseling psychology seem to be improving in some areas, whereas in other areas no improvements were noted over time. Implications of these results are discussed, and suggestions for SEM use within counseling psychology are provided.

Structural equation modeling (SEM) is a data analytic technique designed to assess relationships among both manifest (i.e., directly measured or observed) and latent (i.e., underlying theoretical construct) variables. When using statistical techniques such as multiple regression or ANOVA, the researcher conducts analyses only on variables that are directly measured, which can be limiting when one is interested in testing underlying theoretical constructs. For example, in an ANOVA design, a researcher interested in studying the construct of depression might include one self-report depression scale as the dependent variable. The researcher may interpret that scale as representative of the entire construct of depression, a dubious conclusion given the complexity of depression. In contrast, a researcher using SEM could explicitly model the latent construct of depression rather than relying on one variable as a proxy for the construct. SEM also provides advantages over other data analytic techniques in that complex theoretical models can be examined in a single analysis.¹
I thank Richard Haase, Tiffany Sanford, and Samuel Zizzi for their work on earlier drafts of this article and Kirsten Corbett, Amanda Ferrier, Melissa Sheehy, and Xuelin Weng for their help in coding the data. A previous version of this article was presented at the 2003 annual meeting of the American Psychological Association. Correspondence concerning this article should be addressed to Matthew P. Martens, Department of Educational and Counseling Psychology, University at Albany, State University of New York, ED220, 1400 Washington Ave, Albany, New York 12222; phone: (518) 442-5039; e-mail: mmartens@uamail.albany.edu.
THE COUNSELING PSYCHOLOGIST, Vol. 33 No. 3, May 2005, 269-298. DOI: 10.1177/0011000004272260. © 2005 by the Society of Counseling Psychology.


A hypothetical example of a structural equation model that illustrates some advantages of SEM is presented in Figure 1.² This model includes five latent constructs, represented by ovals: personality characteristics thought to be associated with alcohol use (personality), familial factors thought to be related to alcohol use (family risk), motivations for using alcohol (drinking motives), strategies that can be used to limit alcohol consumption and related problems (protective behaviors), and problems associated with alcohol consumption (alcohol problems). Each latent variable includes several measured indicator variables, represented by rectangles, that are thought to represent components of the underlying variable. One can therefore see how the researcher can explicitly model the underlying constructs of interest via SEM by directly incorporating the constructs into the model to be tested.

Figure 1 also depicts a relatively complex series of relationships that explain or predict problems associated with alcohol consumption, all of which would be tested in a single analysis. In this model, both personality characteristics and family risk factors are thought to predict or cause motivations for drinking and use of protective behaviors, which in turn are thought to predict or cause alcohol-related problems. These causal paths are indicated by single-headed arrows between the variables in question (such paths also exist between each latent construct and its observed indicator variables, because the latent construct is thought to cause the responses on the observed variables that represent it). Personality characteristics and family risk factors are conceptualized as correlated, but no causal or predictive relationship is specified, so a double-headed curved arrow, representing covariance, indicates the relationship between these two constructs. As Figure 1 illustrates, SEM is well suited for model testing because the researcher can specify causal models that correspond to a theoretical perspective and then test the plausibility of those models on observed data.

SEM has numerous applications within counseling psychology, as research in the field often involves testing or validating theoretical models. For example, SEM is appropriate in scale development research to confirm the factor structure of an instrument. A researcher may wish to test a hypothesized factor structure of an existing instrument with a new population or may have established a tentative factor structure of a new instrument (perhaps via exploratory factor analysis) and wish to confirm it on an independent sample. Counseling psychology researchers are also often interested in testing complex theoretical models in relevant areas (e.g., career development and multicultural development models), which can be accomplished effectively via SEM.
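To make the specification concrete, the following is a minimal, hypothetical sketch of how a model like Figure 1 could be written in lavaan-style syntax with the Python semopy package. The indicator-to-construct assignments and column names are my illustrative reading of the figure, not the author's code, and `df` is assumed to be a pandas DataFrame with one column per measured indicator.

```python
# Hypothetical sketch of the Figure 1 model (not the author's code).
# Assumes the semopy package and a pandas DataFrame `df` with one
# column per measured indicator; names are illustrative stand-ins.
from semopy import Model

MODEL_DESC = """
# Measurement model: each latent construct (=~) causes its indicators
personality =~ impulsivity + sensation_seeking + neuroticism + social_anxiety
family_risk =~ maternal_alcoholism + paternal_alcoholism + family_connectedness
drinking_motives =~ tension_reduction + social_enjoyment + pleasant_feelings
protective_behaviors =~ age_at_first_drink + type_of_drinking + stopping_drinking + peer_support
alcohol_problems =~ binge_drinking + drinks_per_week + social_problems + personal_problems

# Path (structural) model: the single-headed arrows in Figure 1
drinking_motives ~ personality + family_risk
protective_behaviors ~ personality + family_risk
alcohol_problems ~ drinking_motives + protective_behaviors

# Double-headed curved arrow: covariance with no causal direction
personality ~~ family_risk
"""

model = Model(MODEL_DESC)
model.fit(df)           # maximum likelihood estimation by default
print(model.inspect())  # all parameter estimates, including variances
```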


[Figure 1 here. Latent constructs (ovals): Personality, Family Risk, Drinking Motives, Protective Behaviors, Alcohol Problems. Measured indicators (rectangles): Tension Reduction, Social Enjoyment, Pleasant Feelings, Impulsivity, Sensation Seeking, Neuroticism, Social Anxiety, Binge Drinking, Drinks Per Week, Social Problems, Personal Problems, Maternal Alcoholism, Paternal Alcoholism, Age at First Drink, Family Connectedness, Type of Drinking, Stopping Drinking, Peer Support.]

FIGURE 1. Structural Equation Model Predicting Alcohol Problems

Perhaps because of the rapid expansion of SEM software in recent years, SEM has become a popular technique for analyzing data in the social sciences (see Steiger, 2001). Unfortunately, this growth in popularity has coincided with many concerns expressed in the SEM literature about the practices of psychological researchers. Recent reviews of SEM research across various psychology journals (MacCallum & Austin, 2000; McDonald & Ho, 2002) reported many questionable practices at all stages of research, including conceptualization (e.g., not including and testing plausible alternative models), execution (e.g., modifying or generating models based on empirical rather than theoretical criteria), and interpretation (e.g., not reporting all parameter estimates within a model).


Studies from the Journal of Counseling Psychology were included in these previous reviews, but because findings were not categorized by journal, the extent to which the concerns applied specifically to counseling psychology research was impossible to determine. Furthermore, because these reviews covered fairly limited time spans (1993 to 1997 for MacCallum & Austin; 1995 to 1997 for McDonald & Ho), their generalizability is questionable. Finally, these reviews were primarily narrative rather than empirical.

A broad, empirical review and critique of SEM practices specific to counseling psychology could therefore serve several purposes. First, an empirical rather than narrative review allows findings to be presented in a statistical format, from which readers can generate their own conclusions. Second, an empirical review can provide counseling psychologists with some gauge of the quality of SEM research published within the field. Besides the scientific importance of evaluating the methodology used in a portion of counseling psychology research, a practical consideration emerges when one realizes that counseling psychologists often use SEM to develop and refine psychological assessments. If, for example, a pattern of misusing or misinterpreting SEM exists within counseling psychology, then individuals in the field might need to reexamine instruments developed via these procedures before feeling confident about their use. Third, a review covering a reasonably long period (e.g., at least 15 years) would allow one to determine whether practices related to the use of SEM have improved over time. Finally, an empirical review can educate researchers, journal reviewers, and journal editors by highlighting salient concerns about the use of SEM within counseling psychology.

Specific concerns that have been raised about the use of SEM in psychological research include lack of identification of plausible alternative models, failure to assess for multivariate normality before conducting SEM analyses, failure to assess the fit of the path model separately from the measurement model, failure to provide a full report of parameter estimates, and the generation or modification of models on the basis of empirical, rather than theoretical, criteria (Breckler, 1990; MacCallum & Austin, 2000; McDonald & Ho, 2002). Additionally, researchers have expressed concern about the use of certain fit indices in assessing how well a theoretical model fits the data (e.g., Hu & Bentler, 1998, 1999). Each of these issues is addressed below.

Identifying Plausible Alternative Models

According to McDonald and Ho (2002), multiple models that might explain the data can be found in most multivariate data sets.³


[Figure 2 here. Same latent constructs and indicators as Figure 1, with two added direct paths: Personality → Alcohol Problems and Family Risk → Alcohol Problems.]

FIGURE 2. Alternative Model Predicting Alcohol Problems, With Additional Paths Included

Thus, a researcher testing only one model may identify a well-fitting model while ignoring other plausible models that account for the relationships in the data as well or better. By testing alternative a priori models (i.e., specifying multiple models to be tested before conducting the analyses), even when a target model is clearly of greatest interest, researchers can protect themselves against the confirmation bias that can occur when testing only one model (MacCallum & Austin, 2000). For example, Figure 2 illustrates an alternative, yet theoretically plausible, model to that depicted in Figure 1. Note that two causal paths have been added: one between personality and alcohol problems and one between family risk and alcohol problems. Essentially, these paths test whether a direct relationship exists between personality/familial risk factors and alcohol problems in addition to the indirect relationship that is thought to occur via drinking motives and protective behaviors. By testing this model along with the model depicted in Figure 1, researchers could draw conclusions about model fit between two theoretically plausible perspectives and thus be less likely to fall prey to confirmation bias.

A second advantage of testing alternative models is that when one model is nested within another, direct comparisons can be conducted to determine whether one model provides a significantly better fit than the other. Models are considered nested when the model with the smaller number of estimated parameters can be obtained by fixing the values of one or more parameters of the larger model (Bollen, 1989b). For example, one could obtain the model depicted in Figure 1 by constraining the values of the direct paths between personality/familial risk factors and alcohol problems in Figure 2 to zero. Because these models are nested, one could, via the χ² difference test, determine whether the more complex model (i.e., the model in Figure 2) provides a significantly better fit to the data.⁴ Additionally, testing multiple a priori models provides the researcher with alternatives should problems be found with the initial target model, without relying on post hoc, empirically derived model modifications (MacCallum & Austin, 2000). Issues related to empirically derived model modifications are discussed later.
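As a worked sketch of the χ² difference test (the fit statistics here are hypothetical, not drawn from any reviewed study):

```latex
% Nested comparison: Figure 1 is Figure 2 with two paths fixed to zero.
% Hypothetical fits: Figure 1, \chi^2(114) = 210.4; Figure 2, \chi^2(112) = 201.2.
\Delta\chi^2 = 210.4 - 201.2 = 9.2, \qquad \Delta df = 114 - 112 = 2
% Because 9.2 exceeds the critical value \chi^2_{.05}(2) = 5.99, the less
% constrained Figure 2 model fits significantly better at \alpha = .05.
```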


Assessing for Multivariate Normality

The most common estimation method in SEM research, maximum likelihood, requires an assumption of multivariate normality (Bollen, 1989b; McDonald & Ho, 2002; Quintana & Maxwell, 1999). Essentially, maximum likelihood estimation provides the parameter estimates that are most likely (hence the name) to represent the population values, assuming that the sample represents the population from which it was drawn. If SEM is used with data that do not satisfy this requirement, then problems such as biased standard errors, inaccurate test statistics, and inflated Type I error rates can emerge (Chou, Bentler, & Satorra, 1991; Powell & Shafer, 2001; West, Finch, & Curran, 1995). Although the maximum likelihood method may be somewhat robust against this violation, especially with smaller deviations from normality (Amemiya & Anderson, 1990; Browne & Shapiro, 1988; Chou et al., 1991; McDonald & Ho, 2002), it seems prudent that SEM researchers at least note potential issues, concerns, or alternative analytic strategies (e.g., alternative estimation procedures, data transformations, and bootstrapping) related to multivariate normality.
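One concrete option is Mardia's multivariate skewness and kurtosis tests. The article does not prescribe any particular procedure, so the following is only a minimal sketch of one way to screen for multivariate nonnormality, using numpy and scipy and the standard asymptotic reference distributions:

```python
import numpy as np
from scipy import stats

def mardia_tests(X):
    """Mardia's tests of multivariate skewness and kurtosis.

    X is an (n, p) array of n observations on p variables. Returns the
    skewness statistic and kurtosis z statistic with their p-values.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # center each variable
    S = (Xc.T @ Xc) / n                # maximum likelihood covariance
    D = Xc @ np.linalg.solve(S, Xc.T)  # D[i, j] = (x_i - xbar)' S^-1 (x_j - xbar)
    b1 = (D ** 3).sum() / n ** 2       # multivariate skewness coefficient
    b2 = (np.diag(D) ** 2).mean()      # multivariate kurtosis coefficient
    skew_stat = n * b1 / 6.0           # ~ chi-square under normality
    skew_df = p * (p + 1) * (p + 2) / 6.0
    kurt_z = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)  # ~ N(0, 1)
    return (skew_stat, stats.chi2.sf(skew_stat, skew_df),
            kurt_z, 2.0 * stats.norm.sf(abs(kurt_z)))
```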


How Well a Model Fits: The Use of Fit Indices

When using SEM, a major component of the analysis involves evaluating how well the hypothesized model fits the observed data. To assess this fit, researchers generally use various goodness-of-fit measures. The most common measure is the probability of the χ² statistic, which assesses the magnitude of the discrepancy between the fitted (model) and sample (observed) covariance matrices and represents the most stringent exact-fit measure. The null hypothesis for this analysis is that no difference exists between the fitted and sample matrices, so a nonsignificant χ² indicates that the model accurately represents the data (assuming a true model). However, the power of the χ² test, and of the χ² difference test when comparing models, is, like that of all inferential tests, influenced by sample size. Therefore, when samples are large, small differences between the fitted and sample covariance matrices (which would indicate a relatively good fit) may yield a statistically significant χ² (see Bentler & Bonett, 1980; Gerbing & Anderson, 1993; Marsh, Balla, & McDonald, 1988). Furthermore, because SEM analyses typically require fairly large sample sizes, many otherwise well-fitting models may nonetheless yield a statistically significant χ².⁵

To deal with this problem, researchers generally use additional measures of fit, but considerable debate exists regarding which fit indices are appropriate (e.g., Bentler, 1990; Bollen, 1990; Gerbing & Anderson, 1993; McDonald & Marsh, 1990). Several studies have found that some commonly used fit indices, such as the goodness-of-fit index (GFI; Jöreskog & Sörbom, 1981), adjusted goodness-of-fit index (AGFI; Bentler, 1983; Jöreskog & Sörbom, 1981; Tanaka & Huba, 1985), χ²/df ratio, and normed fit index (NFI; Bentler & Bonett, 1980), are substantially affected by factors extrinsic to actual model misspecification (e.g., sample size and number of indicators per factor) and do not generalize well across samples (Anderson & Gerbing, 1984; Hu & Bentler, 1998; Marsh et al., 1988). In contrast, fit indices such as the Tucker-Lewis index (or non-normed fit index; TLI; Bentler & Bonett, 1980; Tucker & Lewis, 1973), incremental fit index (IFI; Bollen, 1989a), comparative fit index (CFI; Bentler, 1990), root mean square error of approximation (RMSEA; Steiger & Lind, 1980), and standardized root mean square residual (SRMR; Bentler, 1995) are much less affected by factors other than model misspecification and tend to generalize relatively well. Based on these and other findings regarding misspecified models, some SEM experts have recommended against the use of the GFI, AGFI, χ²/df ratio, and NFI, while supporting the use of the TLI, IFI, CFI, RMSEA, and SRMR (e.g., Hu & Bentler, 1998, 1999; Steiger, 2000). Although these recommendations are not the only opinion and should not necessarily be considered the gold standard, the research underlying them is some of the most comprehensive and compelling on the topic. Thus, these recommendations were followed for the purposes of this article.
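To make the recommended indices concrete, the following sketch computes the RMSEA, CFI, and TLI from the χ² statistics that SEM programs report. The formulas are the standard ones (note that the RMSEA denominator is shown with the common n − 1 convention, which varies across programs), and the numbers in the example are hypothetical:

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI, and TLI from target-model (m) and baseline (b) fits.

    chi2_b and df_b come from the independence (null) model;
    n is the sample size.
    """
    # RMSEA: per-degree-of-freedom misfit, floored at zero
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    # CFI: improvement over the baseline model, bounded in [0, 1]
    cfi = 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_b - df_b, chi2_m - df_m, 1e-12)
    # TLI (non-normed fit index): baseline-relative fit per df; can exceed 1
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    return rmsea, cfi, tli

# Hypothetical example: model chi2(114) = 210.4, independence model
# chi2(136) = 1480.0, n = 400 -> RMSEA ~ .046, CFI ~ .93, TLI ~ .91
print(fit_indices(210.4, 114, 1480.0, 136, 400))
```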


Assessing the Fit of the Path Model

When analyzing a structural equation model that posits causal relations among latent variables, the researcher is typically most interested in the path portion of the structural model (i.e., the relationships among the latent variables), as opposed to the measurement portion (i.e., the manifest indicators of each latent variable). In the examples provided in Figures 1 and 2, the path portion of the model refers to the causal paths among the latent variables of personality, family risk, drinking motives, protective behaviors, and alcohol problems, while the measurement portion refers to the paths from each latent variable to its observed indicator variables. When most SEM researchers report the fit of their model, they report only the fit of the full structural model (including both the measurement and path components) or first report the fit of the measurement model and then the fit of the full structural model. In their review, however, McDonald and Ho (2002) identified 14 studies in which the fit of the path model could be obtained separately from the fit of the measurement model (the discrepancy function and degrees of freedom can be divided into separate additive components for the measurement and path models; see Steiger, Shapiro, & Browne, 1985). In most of these studies, the fit of the path model itself was poor, even though the fit of the full structural model was generally good. The authors concluded that in many cases the goodness of fit of a full structural model conceals the badness of fit of the actual path model (which is generally of most interest to the researcher), a situation that typically results from a particularly well-fitting measurement model. In such cases researchers might conclude that their overall model demonstrates a good fit, when in fact the relationships among the latent variables in their model are weak. Therefore, McDonald and Ho recommended that researchers report the fit of the measurement and path portions of the model separately.
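A sketch of the decomposition being described, with hypothetical numbers: because the discrepancy and degrees of freedom are additive across the two components, the path portion's fit falls out by subtraction, and an RMSEA can then be formed for the path portion alone.

```latex
% Additive decomposition (Steiger, Shapiro, & Browne, 1985); values hypothetical.
\chi^2_{\text{structural}} = \chi^2_{\text{measurement}} + \chi^2_{\text{path}},
\qquad
df_{\text{structural}} = df_{\text{measurement}} + df_{\text{path}}
% e.g., \chi^2_{\text{path}} = 248.1 - 201.3 = 46.8 on df = 117 - 109 = 8, so
\mathrm{RMSEA}_{\text{path}}
  = \sqrt{\frac{\max(\chi^2_{\text{path}} - df_{\text{path}},\,0)}{df_{\text{path}}\,(n-1)}}
```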


Reporting All Parameter Estimates/Effect Sizes

Another concern about SEM research is incomplete reporting of parameter estimates, in particular the error or disturbance variances associated with endogenous (outcome) variables (Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002). Among other considerations, reporting all parameter estimates (including error variances) allows readers to consider the relationships among the variables in the structural model and the variance in the endogenous variables explained by the exogenous (predictor) variables, rather than simply the fit of the overall model. Alternatively, researchers could simply provide the R² values for the endogenous variables in their model. In Figures 1 and 2, providing the R² values for drinking motives, protective behaviors, and particularly alcohol problems would be useful. Prior reviews have indicated that only about 50% of published SEM studies reported parameter estimates of error and disturbance variances or other measures of effect size (MacCallum & Austin, 2000; McDonald & Ho, 2002).

SEM and Model Modification

The model modification strategy refers to the practice of modifying an initial model, generally by empirical criteria, until it fits the data (MacCallum & Austin, 2000). SEM models that initially display a poor fit can easily be modified to improve fit by adding parameters that will decrease the χ² value (i.e., using modification indices), by simply deleting nonsignificant parameters, or by parceling individual items into groups that are then used as manifest variables. Although the practice of parceling items can sometimes be warranted (see Little, Cunningham, Shahar, & Widaman, 2002), parceling items post hoc primarily to improve fit is best considered a model modification strategy. An example of post hoc model modification would occur if a researcher tested the model displayed in Figure 2, found that the path between personality and protective behaviors was nonsignificant, and then deleted the path and reran the analysis. Another example would occur if the researcher learned that correlating the error terms (not shown in the figures) of the observed variables of impulsivity and sensation seeking would improve model fit, added this parameter, and reran the analysis.

Most SEM experts warn against model modification (e.g., Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002), a practice that has been described as potentially misleading and easily abused (MacCallum & Austin, 2000, p. 216). These concerns stem from the fact that SEM models modified within the same sample to improve fit might be capitalizing on chance and might not cross-validate well, as has been demonstrated in previous research (MacCallum, Roznowski, & Necowitz, 1992). Furthermore, adding paths to an SEM model without removing any paths will generally improve the empirical fit of the model, so researchers might easily obtain a well-fitting model that is not theoretically meaningful (for a discussion of empirical vs. theoretical fit, see Olsson, Troye, & Howell, 1999).


Although reviews of SEM practices have not completely discouraged modifying SEM models that do not initially fit well, they recommend that the modifications be few, theoretically defensible, and cross-validated on an independent sample, or at least that the importance of the modifications be discussed (Boomsma, 2000; MacCallum & Austin, 2000; McDonald & Ho, 2002). Therefore, even though SEM can be used for the exploratory purpose of generating the best-fitting model, and most SEM technical manuals describe such procedures, most SEM experts contend that the technique should be used for confirmatory rather than exploratory purposes (e.g., Bollen, 1989a, 1989b; Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002). This is the point of view that I have adopted for this article.

Purpose of the Study

Given (a) the increasing popularity of SEM analysis (Steiger, 2001), (b) the importance of SEM studies within the field, and (c) the various problems and concerns reported in previous SEM reviews (e.g., Breckler, 1990; MacCallum & Austin, 2000; McDonald & Ho, 2002), the main purpose of this study was to review SEM practices within counseling psychology. More specifically, I sought to assess and critique SEM research regarding the following aspects of the analytic technique: (a) identifying alternative models, (b) addressing the assumption of multivariate normality, (c) using fit indices that are less sensitive to extrinsic factors and that generalize better across samples, (d) assessing path model fit separately from measurement model fit, (e) reporting all parameter estimates, and (f) using SEM for model generation/modification. These aspects were chosen because they are among the most salient concerns expressed in the SEM literature and because they are practices that should be fairly easy for SEM researchers to modify, should modification be necessary. Additionally, I sought to assess longitudinal trends regarding these practices to determine whether researchers have become more likely over time to adhere to various recommendations regarding SEM use (e.g., Boomsma, 2000; Breckler, 1990; Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002).


METHOD

Selection of Studies

Studies published from 1987 to 2003 in the Journal of Counseling Psychology (JCP) were reviewed to assess practices related to SEM research in counseling psychology. JCP was chosen because of its status as the flagship journal for research in the field. The year 1987 was chosen for several reasons. First, it was the year in which Fassinger (1987) published an article in a special issue of JCP that served as an introduction to SEM. Second, a PsycINFO search using the term structural equation modeling revealed only 30 citations before 1987, none of which were published in JCP. Third, 1987 appears to be the year when SEM studies began to be consistently published in JCP. Although a handful of articles published in JCP before 1987 used path analysis (i.e., modeling with measured variables only), most of these articles did not use the statistical procedures for assessing model fit that are now common in SEM research (i.e., the χ² statistic or other fit indices).

Articles were included in the analyses if they used either SEM or path analysis for any portion of their results. Thus, studies that used SEM as the main outcome analysis (e.g., testing several theoretical models) or as a preliminary analysis (e.g., establishing a model and then further testing it using different analytic techniques) were included. Four articles included multiple, distinctly separate studies that used SEM; for these articles, each study was coded separately. A total of 105 studies from 99 separate articles met these criteria and were included in the analyses.

Coding Procedure

Studies were coded independently by the author and one of four advanced graduate students on several variables: (a) year of publication, (b) type of study, (c) specification of multiple a priori models, (d) multivariate normality, (e) choice of fit indices, (f) assessment of path fit separately from measurement fit, (g) report of all parameter estimates or effect sizes, and (h) use of post hoc model modification procedures. Interrater agreement was assessed via the kappa statistic. For all variables, kappa was significant (p < .001) and ranged from .77 to 1.00 (M = .86). Descriptively, agreement percentages ranged from 89% (post hoc model modification procedures) to 100% (specification of multiple a priori models). Any discrepancies were reexamined conjointly by the two coders until a classification was agreed on.
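For readers who want to reproduce this kind of agreement check, a minimal sketch (not the author's actual code) using scikit-learn's implementation of Cohen's kappa, with hypothetical yes/no codes from two raters:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical yes/no codes from two raters on one variable
# (e.g., "specified multiple a priori models") across ten studies.
rater_1 = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_2 = ["yes", "no", "yes", "no",  "no", "no", "yes", "no", "yes", "yes"]

kappa = cohen_kappa_score(rater_1, rater_2)  # chance-corrected agreement
raw = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
print(f"kappa = {kappa:.2f}, raw agreement = {raw:.0%}")
```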


Year of publication. Studies were coded in two ways. For descriptive purposes, they were simply coded by year of publication. For the longitudinal analyses (described below), however, treating each year as a level of the independent variable would be problematic because of the many levels involved and the small cell sizes for some years. Thus, for the longitudinal analyses, year of publication was broken into four relatively equal categories: 1987 to 1995 (28 studies), 1996 to 1998 (23 studies), 1999 to 2001 (29 studies), and 2002 to 2003 (25 studies).⁶

Type of study. Studies were coded as path analysis (i.e., model testing with only manifest, or observed, variables),⁷ confirmatory factor analysis (CFA; i.e., testing a measurement model without positing causal relations among the latent variables), or full SEM (i.e., testing causal relationships among latent variables).

Specifying multiple a priori models. Studies were coded in a yes/no format in terms of whether more than one a priori theoretical model that might explain the data was discussed, meaning that the multiple models to be tested were specified before the analyses were conducted. Studies that tested multiple models only in the context of multigroup analysis (which specifies different constraints on parameter estimates within a model but does not generally involve testing different theoretical models; see Byrne, 2001) were coded as no, as were studies whose comparisons involved models generated post hoc (see below). Additionally, a few studies tested the same conceptual model (i.e., all hypothesized paths remained the same) but with slightly different endogenous constructs (e.g., perceived likelihood that a situation would occur vs. perceived seriousness of the situation should it occur). These studies were coded as testing only a single a priori model.

Addressing multivariate normality. Studies were coded as yes/no in terms of whether issues related to multivariate normality were addressed (e.g., indicating that data were normally distributed, discussing appropriate data transformations, or considering alternative estimation strategies).

Choice of fit indices. For each study, the individual fit indices used to assess model fit were noted.

Assessing path fit separate from measurement fit. Studies were coded as yes/no in terms of whether the fit of the path model was assessed separately from the fit of the measurement model.


Additionally, I calculated the fit of the path model for those studies that provided the necessary information (i.e., fit of the measurement model and the full structural model) but did not calculate the fit of the path model itself. This coding did not apply to path analysis studies (no measurement model exists because only observed variables are included in the analysis) or CFA studies (no causal structural relations are posited among the latent variables).

Reporting all parameter estimates/effect sizes. Studies were coded yes/no in terms of whether all parameter estimates for the model were reported, including parameter estimates for error and disturbance terms, or whether effect sizes for the outcome variables were indicated. For studies that tested multiple models, this criterion was applied only to the final model (e.g., a study was coded yes if the authors provided all parameter estimates for the better fitting of two competing models but did not provide parameter estimates for the other model).

Post hoc model modification procedures. Studies were coded yes/no to indicate whether the authors engaged in empirically derived post hoc model modification or model generation procedures (e.g., analyzing modification indices or deleting nonsignificant paths). Parceling items post hoc to improve fit was also coded yes, but parceling items a priori was coded no.

Data Analysis

Descriptive statistics were calculated for all variables to determine the frequency with which counseling psychology researchers engaged in the various SEM practices. To assess longitudinal trends in each of these practices, logistic regression analyses were conducted in which the four groupings of studies by year were categorically coded from 0 (the oldest set of studies) to 3 (the newest set of studies). Separate logistic regression analyses were conducted for the following dependent variables: specifying multiple a priori models, addressing multivariate normality, choice of fit indices, assessing path fit separately from measurement fit, reporting all parameter estimates, and using post hoc model modification procedures. For comparison purposes, the newest set of studies (2002 to 2003) was used as the reference group.
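The article does not say which software was used for these models, so the following is only a hypothetical sketch of the design in Python with pandas and statsmodels, with the period groups sized as in this review and a randomly generated practice indicator standing in for the real codes:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per study. `period` is the four-category
# year grouping (0 = 1987-1995, ..., 3 = 2002-2003) with the group sizes
# reported in this review; `practice` is a stand-in 0/1 indicator.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "period": np.repeat([0, 1, 2, 3], [28, 23, 29, 25]),
    "practice": rng.integers(0, 2, size=105),
})

# Categorical year grouping with the newest studies (3) as the
# reference group, mirroring the coding described above.
fit = smf.logit("practice ~ C(period, Treatment(reference=3))", data=df).fit()
print(fit.summary())
print(np.exp(fit.params))  # odds ratios relative to 2002-2003
```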

RESULTS

All results are discussed at a broad, general level so that no particular author or study is identified.

TABLE 1: Frequency of SEM Studies in JCP by Year

Year    Full SEM    CFA    Path Analysis
1987        0        1          1
1988        1        2          0
1989        1        1          0
1990        2        1          1
1991        0        0          0
1992        1        0          1
1993        2        0          1
1994        3        1          0
1995        5        3          0
1996        1        1          1
1997        4        3          1
1998        9        1          2
1999        3        9          1
2000        3        3          1
2001        2        3          4
2002        6        3          3
2003        4        7          2

NOTE: SEM = structural equation modeling; CFA = confirmatory factor analysis.

A total of 105 separate studies published in JCP between 1987 and 2003 used either SEM or path analysis. Results indicated that SEM has become an increasingly popular data analytic technique: more than half (51%) of the studies were published between 1999 and 2003. Interest in SEM began to rise in 1995, when eight SEM studies were published in JCP; the most in any single previous year had been four. The largest percentage of studies used full structural modeling (45%), followed by CFA (37%) and path analysis (18%). Frequency and type of study, by year, are presented in Table 1.

Descriptive Statistics

The number of studies, grouped by the four yearly categories, is presented in Table 2, along with the percentage of studies that engaged in each SEM practice. For example, the percentage in the normality category represents the number of studies that addressed this consideration, while the percentage in the modify category represents the number of studies that modified models post hoc based on empirical criteria. The percentage of studies that used each fit index is presented in Table 3; only indices used in at least 10% of the studies are presented. Results for each of these specific categories are summarized below.


TABLE 2: Percentage of Studies Engaging in SEM Practices, Overall and by Year of Publication

                  N    % A Priori Models    % Normality    % Path Fit(a)    % PE/ES    % Modify
Overall          105         47.6               19.0             2.1           46.7        40.0
1987 to 1995      28         53.6                3.6             0.0           50.0        39.3
1996 to 1998      23         52.2               26.1             7.1           56.5        43.5
1999 to 2001      29         41.4               10.3             0.0           34.5        44.8
2002 to 2003      25         44.0               40.0             0.0           48.0        32.0

NOTE: SEM = structural equation modeling; a priori models = specified multiple a priori theoretical models; normality = assessed for multivariate normality; path fit = measured path fit separate from overall model fit; PE/ES = reported either all parameter estimates or effect sizes for outcome variables; modify = engaged in post hoc empirical model modification procedures.
a. Includes only full SEM studies.

Specifying multiple a priori models. Approximately half (47.6%) of the studies specified more than one theoretical model a priori. Additionally, a greater percentage of studies in the older periods reported specifying multiple a priori models compared with the newer periods (53.6% and 52.2% vs. 41.4% and 44.0%).

Addressing multivariate normality. Only 19.0% of the studies mentioned the issue of multivariate normality, although results seem to indicate that more recent studies were more likely to address the consideration. These results are similar to those reported in prior SEM reviews (e.g., Breckler, 1990). Researchers used various ways to assess and deal with multivariate normality, such as deleting outliers, transforming data, and using robust estimation procedures.

Choice of fit indices. When examining researchers' choice of fit indices, one should remember that many fit indices were unavailable during some periods covered by this review. For example, the CFI was not published until 1990, and a common citation for the RMSEA comes from 1993 (Browne & Cudeck, 1993). As expected, the probability of the χ² statistic was the most commonly used fit index (90.5% of the studies, although it is somewhat surprising that it was not reported in all studies), followed by the CFI (63.8%) and GFI (48.6%). In terms of year of publication, results suggest a decrease in use over time of some fit indices that have been identified as problematic (e.g., GFI and AGFI), while the use of other problematic indices seems somewhat consistent (e.g., χ²/df ratio and NFI). Similarly, results suggest an increase in use over time of some indices that have been identified as more accurate at identifying misspecified models (e.g., RMSEA), while the use of other such indices has remained relatively consistent (e.g., SRMR).


TABLE 3: Percentage of Studies Using Selected Fit Indices, Overall and by Year of Publication

                  N    % χ²    % TLI    % CFI    % RMSEA    % SRMR    % χ²/df    % GFI    % AGFI    % NFI
Overall          105   90.5    42.9     63.8      38.1       36.2      37.1       48.6     20.0      25.7
1987 to 1995      28   85.7    21.4     17.9       3.6       46.4      25.0       57.1     28.6      25.0
1996 to 1998      23  100.0    43.5     73.9      17.4       30.4      39.1       56.5     17.4      26.1
1999 to 2001      29   82.8    62.1     86.2      41.4       34.5      44.8       51.7     27.6      31.0
2002 to 2003      25   96.0    44.0     80.0      92.0       32.0      40.0       28.0      4.0      20.0

NOTE: TLI = Tucker-Lewis index (or non-normed fit index); CFI = comparative fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual; χ²/df = χ²/degrees of freedom ratio; GFI = goodness-of-fit index; AGFI = adjusted goodness-of-fit index; NFI = normed fit index.


Assessing path versus measurement fit. Only one study that used full SEM explicitly attempted to assess the fit of the path model separately from the measurement model. This is not surprising, given that this concern is a fairly recent addition to the SEM literature (e.g., McDonald & Ho, 2002). Several studies assessed the fit of the measurement model before assessing the fit of the full structural model but did not assess the fit of the path model itself and generally drew conclusions in terms of the fit of the full structural model. However, 14 studies provided the information necessary to calculate the fit of the path model itself and did not include features that would make such calculations impossible (e.g., statistical equivalency between the measurement and structural models or removal of a variable from the measurement model when testing the structural model). These studies yielded 22 comparisons because several of them assessed model fit separately for different groups (e.g., men and women). Using the RMSEA as the measure of fit (which conceptually measures the degree to which the model would fit the population covariance matrix, if it were known, with smaller values indicating better fit; see Browne & Cudeck, 1993), results indicated relatively equal fit between the path and measurement portions of the models (M RMSEA = .068 and .065, respectively). These results differ from prior reviews of SEM research in psychology (McDonald & Ho, 2002), which reported that the fit of the path model was generally worse than that of the measurement model. The current review, however, did reveal several studies in which the fit of the path model was considerably worse than the fit of the measurement model (e.g., RMSEA of .165 vs. .054; .160 vs. .069), yet, based on the fit of the full structural model, the authors concluded that the model fit the data well. Although specific guidelines vary, an RMSEA of .08 is generally considered an upper bound for indicating adequate model fit (e.g., Hu & Bentler, 1999). Therefore, in these studies the authors interpreted the relationships among their latent variables as meaningful (because the overall model fit fairly well), when in fact the portion of the model that examined only these latent variables did not fit well.

Reporting all parameter estimates/other measures of effect size. Approximately half (46.7%) of all studies either reported all parameter estimates in the model or provided other indications of effect size (e.g., squared multiple correlations) for the outcome variables. These results were somewhat consistent over the years, except for 1999 to 2001, when only 34.5% of the studies reported either all parameter estimates or other measures of effect size.


Modifying models post hoc via empirical criteria. A total of 40.0% of the studies used empirically derived criteria (e.g., modification indices or deletion of nonsignificant parameters) either to improve the fit of a model or to generate a well-fitting model. These numbers were fairly consistent over the four periods, although the newest period had the fewest studies engaging in this practice (32.0%). Of the studies that used empirical model modification or generation procedures, approximately half noted considerations such as (a) the theoretical plausibility of the modifications, (b) the tentative nature of such models, or (c) the importance of (and in some instances actual) cross-validation.

Logistic Regression Analyses

A series of logistic regression analyses was conducted to assess more precisely the changes over time in the SEM practices addressed in this review. For each analysis, the four-category grouping of study year was entered as a categorical independent variable; the use of the specific practice or fit index (yes/no) was entered as the dependent variable; and the newest category of studies (2002 to 2003) was used as the reference group.⁸ Note that for some fit indices that were developed or popularized more recently (e.g., the CFI and RMSEA), the oldest set of studies was not included in the logistic regression analyses, and no analysis was conducted for assessing path versus measurement fit (because only one study did so). Results comparing the yearly categories are summarized in Tables 4 and 5.

Of the SEM practices outside of fit index usage, a significant omnibus effect emerged for addressing multivariate normality, χ²(3, N = 105) = 14.28, p = .003. Comparisons between the yearly categories indicated that studies published in 2002 to 2003 were more likely to assess for multivariate normality than those published in 1987 to 1995 (odds ratio = 17.86, p = .008) or 1999 to 2001 (odds ratio = 5.78, p = .017), but differences between 2002 to 2003 and 1996 to 1998 were not statistically significant.
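To unpack the largest of these odds ratios with the article's own Table 2 percentages (3.6% of 1987 to 1995 studies vs. 40.0% of 2002 to 2003 studies addressed normality):

```latex
\mathrm{OR}
  = \frac{p_{2002\text{-}2003}\,/\,(1 - p_{2002\text{-}2003})}
         {p_{1987\text{-}1995}\,/\,(1 - p_{1987\text{-}1995})}
  = \frac{.400/.600}{.036/.964}
  = \frac{.667}{.037}
  \approx 17.9
```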


TABLE 4: Logistic Regression Analyses Summaries Comparing SEM Practices by Study Year

SEM Practice                          b     Wald Test      OR     95% CI (OR)
A priori models
  1987 to 1995 vs. 2002 to 2003    -0.38       0.48       0.68    0.23 to 2.00
  1996 to 1998 vs. 2002 to 2003    -0.33       0.32       0.72    0.23 to 2.22
  1999 to 2001 vs. 2002 to 2003     0.11       0.04       1.11    0.38 to 3.28
Normality
  1987 to 1995 vs. 2002 to 2003     2.89       6.94      17.86    2.10 to 66.67
  1996 to 1998 vs. 2002 to 2003     0.64       1.03       1.89    0.55 to 6.45
  1999 to 2001 vs. 2002 to 2003     1.75       5.71       5.78    1.37 to 24.39
PE/ES
  1987 to 1995 vs. 2002 to 2003    -0.08       0.02       0.93    0.31 to 2.72
  1996 to 1998 vs. 2002 to 2003    -0.34       0.35       0.71    0.23 to 2.22
  1999 to 2001 vs. 2002 to 2003     0.57       1.01       1.75    0.58 to 5.26
Modify
  1987 to 1995 vs. 2002 to 2003    -0.32       0.30       0.72    0.23 to 2.26
  1996 to 1998 vs. 2002 to 2003    -0.49       0.67       0.61    0.19 to 1.98
  1999 to 2001 vs. 2002 to 2003    -0.55       0.92       0.58    0.19 to 1.76

NOTE: A priori models = specified multiple a priori theoretical models; normality = assessed for multivariate normality; PE/ES = reported either all parameter estimates or effect sizes for outcome variables; modify = engaged in post hoc empirical model modification procedures; OR = odds ratio; CI = confidence interval. Odds ratios greater than 1 indicate that studies from 2002 to 2003 were more likely to engage in the practice, while odds ratios less than 1 indicate that studies from 2002 to 2003 were less likely to engage in the practice.

For the fit indices, a significant omnibus effect emerged for the AGFI, χ²(3, N = 105) = 7.77, p = .05; the RMSEA, χ²(2, N = 77) = 32.20, p < .01; and the Tucker-Lewis index, χ²(3, N = 105) = 10.03, p = .02. For the AGFI, comparisons between the yearly categories indicated that studies published in 2002 to 2003 were less likely than those published in 1987 to 1995 (odds ratio = .10, p = .04) and 1999 to 2001 (odds ratio = .11, p = .05) to use the index. For the RMSEA, studies published in 2002 to 2003 were more likely than those published in 1996 to 1998 (odds ratio = 55.56, p < .01) or 1999 to 2001 (odds ratio = 16.39, p < .01) to use the index. Even though the omnibus test, which examines the overall difference among the categories, was statistically significant for the Tucker-Lewis index, no significant differences emerged between studies published in 2002 to 2003 and any other yearly category. Finally, even though the omnibus test for use of the GFI was not statistically significant, χ²(3, N = 105) = 5.92, p = .12, significant differences existed between the yearly categories: studies published in 2002 to 2003 were less likely to use the GFI than those published in 1987 to 1995 (odds ratio = .29, p = .04) or 1996 to 1998 (odds ratio = .30, p = .05).

DISCUSSION

In analyzing the results of this study, I am reminded of the water glass that can be seen as either half empty or half full. The pessimist might look at the results, see significant cause for concern, and suggest that much counseling psychology research using SEM has been, and continues to be, in a state of disarray.


TABLE 5: Logistic Regression Analyses Summaries Comparing Use of Fit Indices by Study Year

Fit Index                             b     Wald Test      OR     95% CI (OR)
χ²/df
  1987 to 1995 vs. 2002 to 2003     0.69       1.35       2.00    0.62 to 6.45
  1996 to 1998 vs. 2002 to 2003     0.04       0.00       1.04    0.33 to 3.30
  1999 to 2001 vs. 2002 to 2003    -0.20       0.13       0.82    0.28 to 2.43
NFI
  1987 to 1995 vs. 2002 to 2003    -0.29       0.19       0.75    0.20 to 2.75
  1996 to 1998 vs. 2002 to 2003    -0.35       0.25       0.71    0.18 to 2.74
  1999 to 2001 vs. 2002 to 2003    -0.59       0.84       0.56    0.16 to 1.95
GFI
  1987 to 1995 vs. 2002 to 2003    -1.23       4.41       0.29    0.09 to 0.92
  1996 to 1998 vs. 2002 to 2003    -1.21       3.88       0.30    0.09 to 0.99
  1999 to 2001 vs. 2002 to 2003    -1.01       3.05       0.36    0.12 to 1.13
AGFI
  1987 to 1995 vs. 2002 to 2003    -2.26       4.21       0.10    0.01 to 0.90
  1996 to 1998 vs. 2002 to 2003    -1.62       1.95       0.20    0.02 to 1.92
  1999 to 2001 vs. 2002 to 2003    -2.21       4.03       0.11    0.01 to 0.95
SRMR
  1987 to 1995 vs. 2002 to 2003    -0.61       1.14       0.54    0.18 to 1.67
  1996 to 1998 vs. 2002 to 2003     0.08       0.01       1.08    0.32 to 3.65
  1999 to 2001 vs. 2002 to 2003    -0.11       0.04       0.89    0.29 to 2.79
TLI
  1987 to 1995 vs. 2002 to 2003     1.06       2.99       2.88    0.87 to 9.52
  1996 to 1998 vs. 2002 to 2003     0.02       0.00       1.02    0.33 to 3.19
  1999 to 2001 vs. 2002 to 2003    -0.73       1.74       0.48    0.16 to 1.43
RMSEA
  1996 to 1998 vs. 2002 to 2003     4.00      18.92      55.56    9.01 to 333.33
  1999 to 2001 vs. 2002 to 2003     2.79      11.36      16.39    3.22 to 83.33
CFI
  1996 to 1998 vs. 2002 to 2003     0.35       0.25       1.41    0.37 to 5.46
  1999 to 2001 vs. 2002 to 2003    -0.45       0.37       0.64    0.15 to 2.70

NOTE: χ²/df = χ²/degrees of freedom ratio; NFI = normed fit index; GFI = goodness-of-fit index; AGFI = adjusted goodness-of-fit index; SRMR = standardized root mean square residual; TLI = Tucker-Lewis index (or non-normed fit index); RMSEA = root mean square error of approximation; CFI = comparative fit index; OR = odds ratio; CI = confidence interval. Odds ratios greater than 1 indicate that studies from 2002 to 2003 were more likely to use the fit index, while ratios less than 1 indicate that studies from 2002 to 2003 were less likely to use the fit index.

The optimist, however, might conclude that SEM practices within counseling psychology research are improving. I tend to believe that the truth lies somewhere in between, and I address below both the causes for concern and the strengths of SEM research in counseling psychology.


The Glass Is Half Empty

Results from this review revealed several concerns involving the use of SEM within counseling psychology research, four of which are discussed here. First, slightly less than half of the studies tested more than one a priori theoretical model, and the percentage of studies engaging in this practice actually decreased over time (53.6% of the studies between 1987 and 1995 compared with 44.0% between 2002 and 2003). Testing multiple a priori models is generally considered the strongest use of SEM (e.g., Hoyle & Panter, 1995; MacCallum & Austin, 2000), so it is somewhat disheartening that only about 50% of the studies in JCP (and even fewer in recent years) engaged in this practice. Second, slightly less than 50% of the studies either provided all parameter estimates or reported effect sizes for their SEM models, with no improvement noted over time. This result is somewhat surprising in light of the increased attention in the psychological literature to reporting effect sizes (e.g., Cohen, 1994; Kirk, 2001; Wilkinson & APA Task Force on Statistical Inference, 1999) and the ease with which such effects can be reported via SEM (in Windows-based programs, doing so generally involves checking a box). Third, despite several articles providing compelling evidence against the use of certain fit indices (e.g., Hu & Bentler, 1998; Marsh et al., 1988; Steiger, 2000), indices such as the χ²/df ratio and the normed fit index continue to be used in several SEM studies. Fourth, 40% of the studies used post hoc empirical model modification procedures, which have been consistently discouraged in the SEM literature (Hoyle & Panter, 1995; MacCallum et al., 1992; McDonald & Ho, 2002), although approximately half of these studies acknowledged the limitations of this approach, and a few (n = 7) even conducted cross-validation procedures with the empirically developed model. Nevertheless, to summarize the pessimistic point of view, one would conclude that many SEM studies use weak methodological approaches, provide no information regarding effects on outcome variables, and continue to use less-than-desirable measures of fit.

Why, then, do counseling psychology researchers who use SEM often not engage in best practices related to the technique? One explanation could be a disconnect between the journals in which SEM methodology articles tend to be published and the scholarly journals typically read by researchers, reviewers, and editors. Although there are exceptions (e.g., Quintana & Maxwell, 1999; Tomarken & Waller, 2003), such articles are often published in journals read less often by most counseling psychologists (e.g., Multivariate Behavioral Research, Psychological Methods, Structural Equation Modeling). Therefore, many counseling psychologists may not stay current with trends in SEM practice.


Another explanation may be the relative ease of using statistical programs to conduct SEM analyses. Most SEM software (e.g., AMOS and EQS) does not require an in-depth knowledge of SEM. Generally, these programs simply require the user to draw his or her hypothesized model(s), and, assuming that the model is properly (over)identified, the necessary calculations are made automatically. Although the ease with which these programs allow researchers to use SEM certainly has benefits, a potential drawback is that people without a thorough background in SEM theory, statistical assumptions, or best practices may be using the technique (see Steiger, 2001).

A final explanation, one especially relevant to the use of post hoc model modification or generation procedures, could involve a file drawer problem (Rosenthal, 1979) within SEM research. Traditionally, the file drawer problem refers to the practice of publishing only statistically significant results and relegating nonsignificant findings to one's file drawer. In SEM, a file drawer problem would mean publishing only findings about well-fitting models. In all but a handful of studies included in this review, the authors concluded that their model fit the data (e.g., a "well-fitting model" or an "adequate fit to the data"). Although the JCP rejection rate for studies whose final models do not fit well is unknown, it is plausible that researchers perceive they must produce a well-fitting model to improve their chances of publication. Researchers may therefore be motivated to engage in whatever statistical and empirical procedures are available in pursuit of the well-fitting model. If well-designed SEM studies that demonstrate a less-than-good fit are indeed not being considered for publication in JCP, then the overall knowledge in our field may be suffering. One can argue that in science the relationships that do not exist are as important to know as those that do, yet the only SEM studies that seem to appear in JCP are the latter.

The Glass Is Half Full

I now turn to some of the more optimistic findings from this study, most of which relate to improved SEM practices in the most recent set of JCP studies. First, although the overall percentage of studies that acknowledged the importance of multivariate normality when conducting SEM was relatively low even among the most recent studies (32%), results indicated that more recent studies were more likely to address multivariate normality than older studies. Second, newer studies were more likely to use the RMSEA and less likely to use the GFI and AGFI to assess model fit. These results are encouraging because the RMSEA has been shown to be one of the better measures for detecting true model misspecification, while the GFI and AGFI are influenced by factors other than the fit of the model itself (Hu & Bentler, 1998; Marsh et al., 1988; Steiger, 2000).


are influenced by factors other than the fit of the model itself (Hu & Bentler, 1998; Marsh et al., 1988; Steiger, 2000). Finally, results from the studies that provided the necessary information indicated less of a discrepancy between path and measurement fit than has been reported in other reviews of SEM practices (McDonald & Ho, 2002), suggesting that the phenomenon of a well-fitting measurement model masking a poorly fitting path model may not be a general concern within counseling psychology research.

What might explain these encouraging trends within SEM research in counseling psychology? First, it seems that more classes in SEM are being offered as cornerstones, or at least electives, in counseling psychology graduate training, which should have the effect of improving all SEM practices. Second, the improvement in addressing multivariate normality may be a by-product of enhanced overall awareness regarding the importance of data screening (e.g., Farrell, 1999; Wilkinson, 1999). Although assumptions such as normally distributed data for various statistical tests are certainly not a recent phenomenon (e.g., Guilford, 1956), perhaps more researchers are actively aware of the importance of such considerations when designing and reporting their studies, or more editors and reviewers are asking that such information be included. Third, even though many counseling psychology researchers may not read the journals that typically publish SEM methodology articles, some articles become relatively well known outside the methodological community. For example, 40% of the articles published in 2002 to 2003 cited Hu and Bentler's work on fit indices (1998, 1999), which might explain why some of their recommendations are becoming more popular (e.g., using the RMSEA and not using the GFI). Regarding the RMSEA, one should also remember that part of its increase in use is probably an artifact of its relatively recent promotion in the SEM literature (e.g., Browne & Cudeck, 1993), but this alone does not explain why more than 90% of the JCP studies in 2002 to 2003 used the index.

Explaining the relatively equal fit between the path and measurement portions of the full SEM models in JCP, in contrast to findings from other reviews (McDonald & Ho, 2002), is more difficult. One possibility is that the constructs involved in counseling psychology research tend to display stronger relationships with each other than those in other areas of psychology, but such a conclusion should be considered tentative at best. One must remember that (a) less than 30% of the full SEM studies in this review provided the necessary information to calculate both path and measurement fit, (b) most of those studies (57%) used post hoc model modification procedures, which could inflate model fit by capitalizing on sample-specific relationships, and (c) several studies demonstrated considerably worse path fit
when compared with measurement fit. Therefore, more definitive conclusions on this topic await further study.

Although this review covered a broad representation of procedures related to SEM practice, it was not exhaustive. In the course of analyzing studies for this review, I noticed that many JCP studies included other practices that have been questioned in the SEM literature. One such practice involved parceling items to reduce the number of parameters in the study (see Russell, Kahn, Spoth, & Altmaier, 1998), especially in the context of CFA. Although parceling can sometimes be warranted, especially when one is primarily interested in relationships among latent constructs, it is less appropriate when one is most interested in the relationships among specific items (as in CFA; Little et al., 2002) or when the items making up the parcels are not unidimensional. In fact, recent Monte Carlo simulations have found that item parcels often mask misspecified models by yielding acceptable factor loadings and fit indices (Bandalos, 2002; Kim & Hagtvet, 2003). A second practice involved some researchers being overly optimistic in their interpretation of fit indices, a concern that has been addressed in other reviews (e.g., MacCallum & Austin, 2000). For example, a recent JCP study concluded that an RMSEA value of .17 indicated an adequate fit (lower RMSEA values indicate better fitting models), when in fact such a value is well above any recommended criterion (e.g., .08). Third, a few recently published studies engaged in the practice of correlating error terms, often to improve the fit of the model. This practice is generally frowned upon, except in instances such as longitudinal studies where the same measure is used on separate occasions, because it is rarely theoretically defensible (e.g., Boomsma, 2000; Hoyle & Panter, 1995). Finally, only a few studies mentioned the issue of alternative equivalent models, which can be particularly problematic when conceptualizing SEM as causal modeling (see MacCallum et al., 1993). These and other SEM practices (e.g., handling missing data and assessing model identification) were beyond the scope of the present review but would be worthwhile to address in future studies.

This review had limitations. One limitation is that a yes/no coding criterion was used to categorize each study on the various SEM practices. This coding procedure was useful for providing an overall summary of SEM practices within counseling psychology but did not provide information in areas such as (a) the relevance of an SEM practice to the unique context of an individual study (e.g., reporting effect sizes may be more important in studies with clear outcome variables of interest than in studies that primarily involve CFA) or (b) the severity (e.g., deleting one nonsignificant parameter vs. adding multiple parameters post hoc) or specific mechanisms (e.g., the various ways to assess multivariate normality) of some SEM practices. Such information was beyond the scope of the present study.

A second limitation was that most of the recommended practices examined in this review, even those based on empirical findings, contain an inherent degree of subjectivity. Thus, even though most SEM experts might agree with a particular practice (e.g., engaging in no or limited post hoc model modification), one could probably locate dissenters. A third limitation was that the focus on actual SEM practices provided no information regarding the theoretical foundations of the studies reviewed. Although the degree to which a study tested a theoretical foundation would be difficult to quantify, such information would be useful to obtain in future reviews.

Despite these potential limitations, this review provided an important picture of how counseling psychology researchers have used and continue to use SEM in terms of several best practices related to the analytical technique. To summarize, SEM researchers in counseling psychology have a history of not engaging in the best practices related to the technique and in many areas continue to ignore such practices. In other areas, however, such as recognizing the importance of normally distributed data and using more accurate fit indices, the practices of counseling psychologists seem to be improving. Based on this review, I encourage counseling psychology researchers who utilize SEM to pay closer attention to the practices covered here and to follow the recommendations of experts when possible (e.g., Hoyle & Panter, 1995; MacCallum & Austin, 2000; McDonald & Ho, 2002; Tomarken & Waller, 2003). Such recommendations include the following:
• Identifying multiple a priori theoretically derived models to test
• Assessing for multivariate normality and using appropriate procedures (e.g., robust estimation procedures) should non-normality be detected
• When conducting full SEM analyses (i.e., causal paths hypothesized between latent variables), providing some indication of the fit of the path model separate from the measurement model
• Reporting all parameter estimates or other means of determining effect size, especially for endogenous variables; this reporting can be easily performed by including the R² values for each outcome variable or by including all parameter estimates in a path diagram
• Avoiding empirically derived post hoc model modification procedures, or at least engaging in only those modifications that can be theoretically defended and noting the limitations of the procedure
• Using measures of fit that have been shown to be more accurate at rejecting misspecified models (e.g., the RMSEA, SRMR, comparative fit index, Tucker-Lewis index, and incremental fit index; see the sketch following this list)
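To make the last recommendation concrete, the following minimal sketch computes three of the recommended indices from the χ² statistics that any SEM program reports. All numeric values are invented for illustration, and the formulas follow the standard definitions (the RMSEA from Steiger & Lind, 1980, and Browne & Cudeck, 1993; the CFI from Bentler, 1990; the Tucker-Lewis index from Tucker & Lewis, 1973).

```python
import math

def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI, and TLI from a target model (m) and its
    baseline/independence model (b), for sample size n."""
    # RMSEA: estimated misfit per degree of freedom (0 = perfect fit).
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    # CFI: relative improvement in noncentrality over the baseline model.
    cfi = 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_b - df_b, chi2_m - df_m, 1e-12)
    # TLI (nonnormed fit index): compares chi2/df ratios, penalizing complexity.
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    return rmsea, cfi, tli

# Invented values for illustration only.
rmsea, cfi, tli = fit_indices(chi2_m=90.0, df_m=28, chi2_b=900.0, df_b=36, n=250)
print(f"RMSEA = {rmsea:.3f}, CFI = {cfi:.3f}, TLI = {tli:.3f}")
```

For these hypothetical values the RMSEA (.094) exceeds the .08 criterion even though the CFI (.93) approaches conventional cutoffs, which illustrates why reporting more than one index is advisable.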

Although slight inconsistencies might emerge between these recommendations and recommended practices that have been addressed elsewhere,
researchers should find considerable overlap. Beyond the clear research implications of improving SEM practices within counseling psychology, training and practice would benefit as well. Counseling psychology graduate students would at the least become more informed consumers of SEM research and could better evaluate the quality of the work to which they are exposed. Students interested in pursuing research careers would be better grounded in the analytical technique, which would hopefully open more doors in terms of analysis and design options. For psychological practice, the implications of enhancements in any statistical technique are generally indirect, but improving practices related to SEM could improve the science associated with studies that are relevant to the application of psychology. Put another way, practice benefits when the science that is supposed to inform it is improved.

Additionally, I encourage counseling psychology journal reviewers and editors to pay close attention to such recommendations and to require that researchers explicitly address important SEM considerations, regardless of whether the researcher ultimately adheres to a given recommendation. Finally, I encourage all counseling psychologists involved with SEM at any level to move away from what I perceive to be a culture that values only well-fitting models. In effect, we must place more value on analyses that have a solid theoretical foundation and follow sound analytic procedures rather than becoming enamored with reporting a finding that demonstrates a good fit and therefore doing whatever possible to achieve such an outcome.

NOTES
1. In discussing these advantages of structural equation modeling (SEM), I am not suggesting that SEM is inherently superior to other analytical techniques. SEM is, however, particularly useful when testing complex models and/or specific underlying theoretical constructs.

2. Note that this model does not include every parameter or variable necessary to identify and test a structural equation model (e.g., error terms are not included, and specific parameters are not identified). Such information is beyond the scope of this article, and interested readers can consult sources that serve as general introductions to SEM for novices (e.g., Byrne, 2001; Raykov & Marcoulides, 2000).

3. Several authors (e.g., Boomsma, 2000; Hoyle & Panter, 1995; MacCallum, Wegener, Uchino, & Fabrigar, 1993; McDonald & Ho, 2002; Tomarken & Waller, 2003) discuss the issue of assessing equivalent versus nonequivalent a priori models. This topic is beyond the scope of this article, and interested readers can consult these sources.

4. The χ² difference test is conducted by calculating the difference in the χ² values and degrees of freedom of two nested models. The resulting values are examined to determine whether the two models differ significantly in fit. For example, assume that the more restricted model (i.e., the model with fewer freely estimated paths) had a χ² value of 100.00 with 30 degrees of freedom, while the less restricted model had values of 90.00 and 28. The χ² difference would be 10.00, which is statistically significant (p < .05) with two degrees of freedom (30 − 28 = 2). Therefore, one would conclude that the less restricted model (which has the lower χ² value) demonstrates a significantly better fit than the more restricted model. If the less restricted model instead had a χ² value of 95.00, however, the difference would not be considered statistically significant; in such cases, researchers generally accept the simpler (more restricted) of the two models. (A minimal code sketch of this test follows these notes.)

5. The issue of sample size in SEM analysis is somewhat controversial, and a detailed discussion is beyond the scope of this article. Some authors recommend addressing sample size via the ratio of participants to the number of estimated parameters (Jackson, 2003). Others discuss sample size in terms of power (e.g., MacCallum, Browne, & Sugawara, 1996), while still others provide absolute guidelines (e.g., Hatcher, 1994). Nonetheless, most sources indicate that, depending on the model's complexity, a researcher should have at least 200 cases.

6. Studies were not divided into four chronologically equal groups because I did not want to separate studies published in the same year or, in some cases, the same issue of JCP. Therefore, the four groups were created as equally as possible while maintaining this stipulation.

7. When conducting a path analysis, a researcher generally uses the same procedures as in SEM (i.e., causal relationships are specified among multiple variables), except that only observed variables are included; therefore, there is no measurement model to be tested. However, the issues described in this article apply equally to path analytic studies and to SEM studies that include latent variables.

8. One reviewer suggested that the logistic regression analyses be conducted with the four yearly categories conceptualized as a continuous independent variable. I chose to retain a categorical approach for the following reasons: (a) the yearly groupings technically do not meet the criteria for a continuous variable; (b) changes in SEM practices over time should be reflected in significant differences between the newest set of studies and older studies; and (c) the interpretation of odds ratios in logistic regression with continuous independent variables is not as straightforward as with categorical variables (see Pedhazur, 1997). Therefore, I conceptualized the yearly groupings as categorical independent variables.
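The worked example in note 4 translates directly into code. The sketch below is a minimal illustration of the χ² difference test for nested models, assuming only that scipy is available; the numeric values are those from note 4.

```python
from scipy.stats import chi2

def chi2_difference_test(chi2_restricted, df_restricted,
                         chi2_unrestricted, df_unrestricted):
    """Chi-square difference test for two nested models; the restricted
    model is the one with fewer freely estimated paths (more df)."""
    delta_chi2 = chi2_restricted - chi2_unrestricted
    delta_df = df_restricted - df_unrestricted
    p_value = chi2.sf(delta_chi2, delta_df)  # upper-tail probability
    return delta_chi2, delta_df, p_value

# Values from note 4: chi2 = 100.00 (df = 30) vs. chi2 = 90.00 (df = 28).
d, df, p = chi2_difference_test(100.00, 30, 90.00, 28)
print(f"delta chi2 = {d:.2f}, delta df = {df}, p = {p:.4f}")  # p < .05
```

Substituting 95.00 for the less restricted model's χ² gives Δχ² = 5.00 and p ≈ .08, reproducing the note's nonsignificant case.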

REFERENCES
Amemiya, Y., & Anderson, T. W. (1990). Asymptotic chi-square tests for a large class of factor analysis models. Annals of Statistics, 18, 1453-1463.
Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika, 49, 155-173.
Bandalos, D. L. (2002). The effects of item parceling on goodness-of-fit and parameter estimate bias in structural equation modeling. Structural Equation Modeling, 9, 78-102.
Bentler, P. M. (1983). Some contributions to efficient statistics for structural models: Specification and estimation of moment structures. Psychometrika, 48, 493-571.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Bollen, K. A. (1989a). A new incremental fit index for general structural equation models. Sociological Methods & Research, 17, 303-316.
Bollen, K. A. (1989b). Structural equations with latent variables. New York: John Wiley.
Bollen, K. A. (1990). Overall fit in covariance structure models: Two types of sample size effects. Psychological Bulletin, 107, 256-259.
Boomsma, A. (2000). Reporting analyses of covariance structures. Structural Equation Modeling, 7, 461-483.
Breckler, S. J. (1990). Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107, 260-273.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Browne, M. W., & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variate models. British Journal of Mathematical and Statistical Psychology, 41, 193-208.
Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum.
Chou, C., Bentler, P. M., & Satorra, A. (1991). Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: A Monte Carlo study. British Journal of Mathematical and Statistical Psychology, 44, 347-357.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Farrell, A. D. (1999). Statistical methods in clinical research. In P. C. Kendall, J. N. Butcher, & G. N. Holmbeck (Eds.), Handbook of research methods in clinical psychology (2nd ed., pp. 72-106). New York: John Wiley.
Fassinger, R. (1987). Use of structural equation modeling in counseling psychology research. Journal of Counseling Psychology, 34, 425-436.
Gerbing, D. W., & Anderson, J. C. (1993). Monte Carlo evaluations of goodness-of-fit indices for structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 40-65). Newbury Park, CA: Sage.
Guilford, J. P. (1956). Fundamental statistics in psychology and education. New York: McGraw-Hill.
Hatcher, L. (1994). A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. Cary, NC: SAS Institute.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 158-176). Thousand Oaks, CA: Sage.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3, 424-453.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:q hypothesis. Structural Equation Modeling, 10, 128-141.
Jöreskog, K. G., & Sörbom, D. (1981). LISREL V. Mooresville, IN: Scientific Software.
Kim, S., & Hagtvet, K. A. (2003). The impact of misspecified item parceling on representing latent variables in covariance structure modeling: A simulation study. Structural Equation Modeling, 10, 101-127.
Kirk, R. E. (2001). Promoting good statistical practices: Some suggestions. Educational and Psychological Measurement, 61, 213-218.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151-173.
MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation modeling in psychological research. Annual Review of Psychology, 51, 201-226.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130-149.
MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490-504.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indices in confirmatory factor analysis: Effects of sample size. Psychological Bulletin, 103, 391-411.
McDonald, R. P., & Ho, M. R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7, 64-82.
McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin, 107, 247-255.
Olsson, U. H., Troye, S. V., & Howell, R. D. (1999). Theoretic fit and empirical fit: The performance of maximum likelihood versus generalized least squares estimation in structural equation models. Multivariate Behavioral Research, 34, 31-58.
Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Fort Worth, TX: Harcourt Brace.
Powell, D. A., & Schafer, W. D. (2001). The robustness of the likelihood ratio chi-square test for structural equation models: A meta-analysis. Journal of Educational and Behavioral Statistics, 26, 105-132.
Quintana, S. M., & Maxwell, S. E. (1999). Implications of recent developments in structural equation modeling for counseling psychology. The Counseling Psychologist, 27, 485-527.
Raykov, T., & Marcoulides, G. A. (2000). A first course in structural equation modeling. Mahwah, NJ: Lawrence Erlbaum.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638-641.
Russell, D. W., Kahn, J. H., Spoth, R. S., & Altmaier, E. M. (1998). Analyzing data from experimental studies: A latent variable structural equation modeling approach. Journal of Counseling Psychology, 45, 18-29.
Steiger, J. H. (2000). Point estimation, hypothesis testing, and interval estimation using the RMSEA: Some comments and a reply to Hayduk and Glaser. Structural Equation Modeling, 7, 149-162.
Steiger, J. H. (2001). Driving fast in reverse: The relationship between software development, theory, and education in structural equation modeling. Journal of the American Statistical Association, 96, 331-338.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square statistics. Psychometrika, 50, 253-264.
Tanaka, J. S., & Huba, G. J. (1985). A fit index for covariance structural models under arbitrary GLS estimation. British Journal of Mathematical and Statistical Psychology, 42, 233-239.
Tomarken, A. J., & Waller, N. G. (2003). Potential problems with "well fitting" models. Journal of Abnormal Psychology, 112, 578-598.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks, CA: Sage.
Wilkinson, L. (1999). Graphs for research in counseling psychology. The Counseling Psychologist, 27, 384-407.
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.