
Part 1

http://core.ecu.edu/psyc/wuenschk/StatHelp/EFA.htm

Review of Article on Use of Exploratory Factor Analysis


Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299.

I recommend this article to those who are just learning about exploratory factor analysis as well as to those who have used it in their research for many years. The authors discuss several decisions that the factor analyst must make when doing a factor analysis and present the results of reanalyses of three data sets from the literature, illustrating the pitfalls associated with making incorrect decisions.

Selecting Variables/Items for the Analysis. Ideally the researcher will select items that are reliable and that have good communalities. Include enough variables so that each common factor is represented by at least three or four variables. See the work of Velicer and Fava (1998) on this topic, which is summarized near the end of my document Factor Analysis.

Selecting Subjects for the Analysis. See the work of MacCallum, Widaman, Zhang, & Hong (1999) regarding the recommended number of subjects (a summary is available near the end of my document Factor Analysis). Don't make the mistake of sampling from a population of subjects for which there is little variance in the factors you wish to estimate. You might even want to sample in such a way that your subjects vary a great deal with respect to the factors you wish to estimate but little on other attributes.

Principal Components Analysis or Factor Analysis? If your purpose is to reduce the information in many variables into a set of weighted linear combinations of those variables, use Principal Components Analysis (PCA), which does not differentiate between common and unique variance. If your purpose is to identify the latent variables that are contributing to the common variance in a set of measured variables, use Factor Analysis (FA), which will attempt to exclude unique variance from the analysis.

Exploratory or Confirmatory Factor Analysis? If you wish to restrict the number of factors extracted to a particular number and to specify particular patterns of relationship between measured variables and common factors, and this is done a priori (before seeing the data), then the confirmatory procedure is for you. If you have no such well specified a priori restrictions, then use the exploratory procedure.

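To make the PCA-versus-FA distinction concrete, here is a minimal Python sketch (my own illustration, not from Fabrigar et al.). It assumes scikit-learn and NumPy are available; the simulated data, the loading values, and the choice of two components are my own assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 300 cases on 6 measured variables driven by one common
# latent factor plus variable-specific (unique) variance.
n = 300
latent = rng.normal(size=(n, 1))
loadings = np.array([[0.8, 0.7, 0.6, 0.7, 0.8, 0.6]])
unique = rng.normal(scale=0.5, size=(n, 6))
X = latent @ loadings + unique

# PCA: weighted linear combinations of the variables; it redistributes
# ALL of the variance (common + unique) among the components.
pca = PCA(n_components=2).fit(X)
print("PCA component weights:\n", pca.components_.round(2))
print("Proportion of variance explained:", pca.explained_variance_ratio_.round(2))

# FA: models only the common variance; the unique variance is set aside
# in the per-variable noise_variance_ estimates.
fa = FactorAnalysis(n_components=2).fit(X)
print("FA loadings:\n", fa.components_.round(2))
print("Estimated unique variances:", fa.noise_variance_.round(2))
```

With data generated this way, the unique variance ends up in FA's noise_variance_ rather than in the loadings, whereas PCA folds it into the components, which is the distinction the authors draw.
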
Which Factor Extraction Procedure? Maximum Likelihood (ML) extraction allows computation of assorted indices of goodness-of-fit (of the data to the model) and the testing of the significance of loadings and of correlations between factors, but it requires the assumption of multivariate normality. Principal Factors (PF) methods have no distributional assumptions. The authors favor ML extraction. They suggest that one first examine the distributions of the measured variables for normality. Unless there are severe problems (|skew| > 2, kurtosis > 7), they say go with ML. If there are severe problems, consider trying to correct them (by transforming variables, for example) rather than using PF methods.

How Many Factors to Extract? Prefer overfactoring (too many factors) to underfactoring (too few factors). Overfactoring is likely to lead to a solution in which the major factors are well estimated by the obtained loadings but there are also additional, poorly defined factors (with few, if any, variables loading well on them). Underfactoring is likely to lead to factors that are poorly estimated (poor correspondence between the structure of the true factors and that of the estimated factors), a more serious problem. The authors are not very fond of the Kaiser "eigenvalue greater than 1" rule or of Cattell's scree test. With respect to the former, they note that it was intended to be applied to the eigenvalues of the full correlation matrix (that with 1's in the main diagonal), not to the eigenvalues of the reduced correlation matrix (that with estimates of the communalities in that diagonal). The authors spoke kindly of "parallel analysis," in which the obtained eigenvalues are compared to those one would expect to obtain from random data. If the first m eigenvalues are greater than what would be expected from random data, then one adopts a solution with m factors. Regretfully, this method is not available in the major statistical programs (a do-it-yourself sketch is given below). The goodness-of-fit statistics available from ML factor analysis may be helpful in determining the number of factors to retain. The analyst first decides on the maximum number of factors to retain, then fits models with 0, 1, 2, 3, ... up to that number and compares them with respect to goodness-of-fit. The authors also note that "a model that fails to produce a rotated solution that is interpretable and theoretically sensible has little value." This sounds like what I call the "meaningfulness criterion." In addition to the solution that at first seems to have the correct number of factors, I typically examine solutions with one or two more or fewer factors. I then adopt the solution which makes the most sense to me.

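Parallel analysis is easy enough to carry out by hand. Here is a minimal NumPy sketch (my own illustration, not the authors'); the simulated two-factor data, the number of random data sets, and the use of the full correlation matrix rather than the reduced one are all assumptions of mine.

```python
import numpy as np

def parallel_analysis(X, n_random=100, seed=0):
    """Compare the eigenvalues of the observed correlation matrix with the
    mean eigenvalues obtained from random (uncorrelated) data of the same
    shape; retain factors whose observed eigenvalue exceeds the random one."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    obs_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand_eig = np.zeros((n_random, k))
    for i in range(n_random):
        R = rng.normal(size=(n, k))
        rand_eig[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]
    mean_rand = rand_eig.mean(axis=0)
    n_retain = int(np.sum(obs_eig > mean_rand))
    return obs_eig, mean_rand, n_retain

# Toy data: two clusters of variables driven by two latent factors.
rng = np.random.default_rng(1)
f = rng.normal(size=(500, 2))
X = np.hstack([f[:, [0]] * 0.8, f[:, [0]] * 0.7, f[:, [0]] * 0.6,
               f[:, [1]] * 0.8, f[:, [1]] * 0.7, f[:, [1]] * 0.6]) \
    + rng.normal(scale=0.5, size=(500, 6))

obs, rand, m = parallel_analysis(X)
print("Observed eigenvalues:     ", obs.round(2))
print("Mean random eigenvalues:  ", rand.round(2))
print("Number of factors suggested:", m)
```

Some implementations compare the observed eigenvalues to the 95th percentile of the random eigenvalues rather than to their mean, but the logic is the same comparison the authors describe.
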
What Type of Rotation? The authors make a strong argument in favor of oblique rotations rather than orthogonal rotations. They note that the dimensions of interest to psychologists are not often dimensions we would expect to be orthogonal. If the latent variables are, in fact, correlated, then an oblique rotation will produce a better estimate of the true factors and a better simple structure than will an orthogonal rotation -- and if the oblique rotation indicates that the factors have close to zero correlations with one another, then the analyst can go ahead and conduct an orthogonal rotation (which should then give about the same solution as the oblique rotation).

What Do Researchers Actually Do? Based on articles published between 1991 and 1995 in the Journal of Personality and Social Psychology and the Journal of Applied Psychology, about half used a PCA, despite the fact that the primary goal was to identify latent variables, in which case FA should have been employed. The researchers often reported the reliabilities of their variables, but not the communalities (which are more informative). Frequently they did not explain the method used to decide how many factors to retain, and when they did report the method it was most likely to be the eigenvalue-greater-than-one rule. They used varimax rotation. When asked to provide a copy of their data so that Fabrigar et al. could determine whether a better solution would be obtained by making decisions other than those made by the researchers, most researchers failed to provide the data. For those who did provide the data, Fabrigar et al. found that an oblique rotation often produced a slightly better simple structure than did a varimax rotation, but the pattern of loadings was almost always the same with varimax as with oblique rotation.

Why Do Researchers Make These Decisions? That is, why do they elect to do a PCA, retain as many factors as have eigenvalues greater than 1, and use varimax rotation? Well, maybe it is just because these are the defaults for factor analysis in SPSS. You know, one does not have to understand anything about factor analysis to be able to point and click.

How Can We Prevent Researchers From Making These Bad Decisions? The authors suggest that methodologists, in addition to publishing highly technical papers in journals seen only by other methodologists, need to publish less technical papers in the journals that researchers read, and editors must be willing to publish those articles. Regretfully, the editors of the journals that nonmethodologists read have not, in my experience, been very receptive to publishing such articles -- see Frequency of Type I Errors in Professional Journals for one example.

Part 2

Frequency of Type I Errors in Professional Journals


Gasparikova-Krasnec and Ging (1987) expressed the opinion that use of the .05 criterion of statistical significance results in 1 out of every 20 published studies representing a Type I error. I believe that they have grossly overestimated the frequency of published Type I errors and that uncritical acceptance of their opinion may produce more skepticism about the veracity of published studies than is warranted. The frequency of Type I errors in the literature is critically dependent upon the frequency of psychologists' testing true null hypotheses.

To keep my argument very simple, I shall assume that all significant studies are published and that no nonsignificant studies are published. If 50% of the null hypotheses tested by psychologists were true and 50% were false, then for every 1000 null hypotheses tested, 500 would be true and 500(.05) = 25 would produce significant results (using the .05 criterion for p) and thus be published. Let us also assume (unrealistically) that every researcher used methods and sample sizes adequate to hold the Type II error rate (for effect sizes that are nontrivial) at a level equal to that which we consider acceptable for Type I errors, 5%. For every 1000 null hypotheses tested, 500 would be false, and 500(.95) = 475 would produce significant results and thus be published. Of the total of 475 + 25 = 500 studies published, 25/500 = 5% would indeed represent Type I errors.

Let us now assume that only 10% of the null hypotheses tested by psychologists are true. For every 1000 null hypotheses tested, 100 would be true and would lead to .05(100) = 5 published Type I errors; 900 would be false and would lead to .95(900) = 855 published correct rejections of the null hypothesis. Only 5/860 = 0.6% of the published studies would represent Type I errors (this arithmetic is sketched below).

What percentage of the null hypotheses tested by psychologists are likely to be true? I believe that psychologists rarely, if ever, test an absolutely true null hypothesis. Almost all of the null hypotheses tested by psychologists can be reduced to the hypothesis that no correlation exists between (or among) two (or more) variables. The probability of a psychologist picking two variables that are absolutely uncorrelated in the population to which the results are to be generalized is extremely small. For example, I would wager a considerable sum on the hypothesis that, in the population of all humans, mean IQ is associated with the number of letters in the person's last name. That is, were we to have data for the entire population, I seriously doubt that the mean IQ of persons with one-letter last names is exactly equal to that of persons with two-letter last names, and so forth.

Of course, psychologists may indeed often study null hypotheses that, while not absolutely true, are nearly true. That is, they may study effects that are trivial, such as the effect upon IQ of the number of letters in one's last name. I believe that psychologists should pay much more attention to the effects of power upon the probability of a Type II error and upon the probability that a practically trivial effect will be found statistically significant. Most psychologists realize that unacceptably small sample sizes and large error variance make it all too likely that even a relatively large effect will not be found statistically significant, but many forget that large sample sizes and artificially low error variance can allow one to declare a practically trivial effect statistically significant.

Reference

Gasparikova-Krasnec, M., & Ging, N. (1987). Experimental replication and professional cooperation. American Psychologist, 42, 266-267.
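The calculations above are easy to reproduce. Here is a minimal Python sketch (my own illustration); the function name and the particular values assumed for power and for the proportion of true nulls are just those used in the argument above.

```python
def published_type_I_rate(p_true_null, alpha=0.05, power=0.95):
    """Proportion of published significant results that are Type I errors,
    assuming every significant result is published and no nonsignificant
    result is published."""
    false_positives = p_true_null * alpha        # true nulls wrongly rejected
    true_positives = (1 - p_true_null) * power   # false nulls correctly rejected
    return false_positives / (false_positives + true_positives)

# 50% of tested nulls true: 25 / (25 + 475), i.e., 5% of published studies.
print(round(published_type_I_rate(0.50), 3))   # 0.05

# Only 10% of tested nulls true: 5 / (5 + 855), about 0.6%.
print(round(published_type_I_rate(0.10), 3))   # 0.006
```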

The manuscript above was submitted to the American Psychologist in 1987 as a comment on the Gasparikova-Krasnec and Ging article. In 1988 I got a reply from the editor, Leonard D. Goodstein, rejecting the submission. He had received one unfavorable review; the second reviewer never replied, so he gave up waiting for the second reviewer and simply rejected the submission. The one reviewer commented:

the points in the paper are fairly common knowledge

it is fairly common knowledge that the proportion of significant results that are Type I errors is not the definition of alpha

it is neither important or new

Interestingly, while I was waiting for Goodstein's decision, there appeared in the Quantitative Methods section of the Psychological Bulletin an excellent article on the same topic: Pollard, P., & Richardson, J. T. E. (1987). On the probability of making Type I errors. Psychological Bulletin, 102, 159-163. I really enjoyed reading this article, but it is written at a level which will result, IMHO, in it not being read by many, certainly not by those who most need to read it. Below are a few of the points made by Pollard and Richardson in this excellent article. These are direct quotes -- I could not state these points any more eloquently than Pollard and Richardson did. If you find them interesting, please do obtain their article and read the full text of it.

"Our informal inquiries within a wide and varied cross section of our professional colleagues indicated a widespread assumption that the probability of having made a Type I error in rejecting the null hypothesis is the same as the alpha level" "One possible reason for the common assumption among psychologists and their students that the alpha level represents the probability of having made a Type I error is that standard statistical texts promote this fallacy." "Of course, the alpha level does indeed give the probability of making a Type I error when the null hypothesis is true, but these quotations <from such texts> involve an unfortunate shorthand in which the conditional nature of this definition is left unstated." "These problems worsen when the authors in question discuss the frequency of Type I errors. For instance, Christenson (1980) reported 'If the .05 significance level is set, you run the risk of being wrong and committing Type I error five times in 100.'" "The alpha level cannot be used to estimate the proportion of Type I errors in the psychological research literature." "to the extent that most psychologists frame good alternative hypotheses (that is, ones more likely to be true than false), P(H 0) will likely to be low." "there are reasons for believing that the overall number of Type I error in the literature is small."

I wonder if the reviewer of my comment or the editor of the American Psychologist ever got around to reading the article by Pollard and Richardson. More recently, Raymond Nickerson (2000, Null hypothesis significance testing: A review of an old and continuing controversy, Psychological Methods, 5, 241-301) has discussed "Misconceptions Associated with NHST," including the Belief That Alpha Is the Probability That if One Has Rejected the Null Hypothesis One Has Made a Type I Error, the Belief That the Value at Which Alpha Is Set for a Given Experiment Is the Probability That a Type I Error Will Be Made in Interpreting the Results of That Experiment, and the Belief That the Value at Which Alpha Is Set Is the Probability of Type I Error Across a Large Set of Experiments in Which Alpha Is Set at That Value. Nickerson's article is a dandy review of various controversies about NHST. Regretfully, because it appears in Psychological Methods, it may not be read by those who most need to read it - I assume that the readership of Psychological Methods includes relatively few of the many who suffer from the misconceptions which Nickerson reviews.
