Definition
Confirmatory factor analysis (CFA) is a procedure for learning the extent to which k observed
variables might measure m abstract variables, wherein m is less than k. In CFA, we indirectly
measure non-observable behavior by taking measures on multiple observed behaviors.
Conceptually, in using CFA we can assume either nominalist or realist constructs, yet most
applications of CFA in the social sciences assume realist constructs.
CFA differs from EFA in that it specifies a factor structure based upon expected theoretical
relationships. Whereas we might think of EFA as a procedure for inductive theory construction,
CFA is a procedure for testing hypotheses deduced from theory. CFA allows the researcher to
conduct two forms of data analysis not available in EFA:
1. CFA allows for the examination of second-order (i.e., higher-order) latent variables. We
might posit, for example, that marital satisfaction (a latent variable) consists of four sub-
dimensions (each a latent variable), satisfaction with: romance, companionship, family
finances, and child rearing.
2. CFA allows for testing hypotheses related to construct validity. We can test for statistical
significance of the effect of a latent variable on each of the observed variables posited to
measure it.
The web page entitled, "Using CFA to Test Empirical Validity" provides an example of how CFA
can be used to examine construct and predictive validity for a second-order latent variable.
CFA requires one to specify the measurement of and relationships among the factors.
Therefore, it relies upon deductive examination of a theory. Deductive analysis has the
advantage of knowing a priori the factor structure, which allows one to test hypotheses related
to examining the various types of construct validity. However, whereas the EFA model is never
underidentified, the CFA model can be underidentified, requiring one to understand
mathematical identification and the rules for certifying model identification.
Assumptions
1. Typically, realism rather than nominalism: Abstract variables are real in their consequences.
2. Normally distributed observed variables.
3. Continuous-level data.
4. Linear relationships among the observed variables.
5. Content validity of the items used to measure an abstract concept.
6. E(ei) = 0 (random error).
7. Theoretically specified relationships among observed variables and factors.
8. A sample size greater than 100 (more is better).
Note: In CFA:
1. we use the symbol $\xi$ (xi) to refer to an exogenous factor (an independent latent variable).
2. we use the symbol $\eta$ (eta) to refer to an endogenous factor (a dependent latent variable).
3. we use the symbol $\tau$ (tau) to refer to the intercept of the measurement model.
4. we use the symbol $\Phi$ (phi) to refer to the variance/covariance matrix of the factor(s). Note that the
variance of a factor always equals 1 in EFA.
The diagram shown below shows the terminology typically used in CFA and Structural Equation
Modeling (SEM). This course addresses the "measurement" model, meaning the
measurements of and relationships among the exogenous variables. Soc 613 addresses the
"causal" model, referring to relationships among the exogenous and endogenous variables.
The model shown below specifies that a set of three abstract variables related
to locus of control (internal, chance, and powerful others) can be measured with sufficient
validity and reliability by nine observed variables, wherein each latent variable is measured with
three observed variables.
Software Packages for Conducting CFA
The Sociology 512 web site provides examples of conducting CFA using six well known
software packages: LISREL, MPlus, R, SAS, SPSS/AMOS, and Stata. The examples shown in
these notes rely mainly upon the LISREL software package.
We noted above that CFA assumes a sample size of at least 100. Understanding the
consequences of measurement error can explain why we make this assumption.
Single Indicator of $\xi$
$X_1 = \tau_1 + \lambda_{11}\xi_1 + \delta_1$, where:
Recall that this equation cannot be solved because of the linear dependencies in the matrix.
To estimate the parameters, we must make one of the following assumptions:
$X_1 = \tau_1 + \lambda_{11}\xi_1 + \delta_1$
$X_2 = \tau_2 + \lambda_{21}\xi_1 + \delta_2$
A measurement model specifies a structural relationship that connects latent variables to one or
more observed variables. The general linear model for specifying these relationships is:
$\Sigma = \Sigma(\theta) = E(XX')$, where:
1. $\Sigma$ refers to reality.
2. $\Sigma(\theta)$ refers to theory.
3. $E(XX')$ refers to the correlation matrix of observed variables.
Consider the following example of the measurement model:
$X_1 = \tau_1 + \lambda_{11}\xi_1 + \delta_1$
$X_2 = \tau_2 + \lambda_{21}\xi_1 + \delta_2$
or, in general:
$X = \Lambda_x \xi + \delta$
Most latent variables in the social sciences are abstract ones. Abstract variables require an
arbitrary scale. There are two approaches to setting a scale:
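Assume:
1. $E(\delta) = 0$ and $E(\xi\delta') = 0$: factors are not correlated with errors (random errors in measurement).
2. $E(\xi\xi') = \Phi$ is the covariance matrix of latent variables.
3. $E(\delta\delta') = \Theta_\delta$ is the covariance matrix of errors.
4. Therefore: $\Sigma = \Sigma(\theta) = E(XX') = \Lambda_x \Phi \Lambda_x' + \Theta_\delta$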
Model Identification
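[Path diagram with observed variables X1, X2, X3, and X4]
Assume: uncorrelated error terms. This assumption is not necessary in CFA; it is made here to
simplify the presentation regarding model identification.
Then,
$$X = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}, \quad
\Lambda_x = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \end{bmatrix}, \quad
\xi = \begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}$$
$$\Phi = \begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix}, \quad
\Theta_\delta = \begin{bmatrix} var(\delta_1) & 0 & 0 & 0 \\ 0 & var(\delta_2) & 0 & 0 \\ 0 & 0 & var(\delta_3) & 0 \\ 0 & 0 & 0 & var(\delta_4) \end{bmatrix}$$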
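Compute $\Sigma(\theta) = \Lambda_x \Phi \Lambda_x' + \Theta_\delta$:
$$\Lambda_x \Phi = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \end{bmatrix}
\begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix} =
\begin{bmatrix} \phi_{11} & \phi_{21} \\ \lambda_{21}\phi_{11} & \lambda_{21}\phi_{21} \\ \phi_{21} & \phi_{22} \\ \lambda_{42}\phi_{21} & \lambda_{42}\phi_{22} \end{bmatrix}$$
$$\Lambda_x \Phi \Lambda_x' = (\Lambda_x \Phi) \begin{bmatrix} 1 & \lambda_{21} & 0 & 0 \\ 0 & 0 & 1 & \lambda_{42} \end{bmatrix} =
\begin{bmatrix} \phi_{11} & & & \\ \lambda_{21}\phi_{11} & \lambda_{21}^2\phi_{11} & & \\ \phi_{21} & \lambda_{21}\phi_{21} & \phi_{22} & \\ \lambda_{42}\phi_{21} & \lambda_{21}\lambda_{42}\phi_{21} & \lambda_{42}\phi_{22} & \lambda_{42}^2\phi_{22} \end{bmatrix}$$
(lower triangle shown; the matrix is symmetric)
$$\Sigma(\theta) = \Lambda_x \Phi \Lambda_x' + \Theta_\delta =
\begin{bmatrix} \phi_{11} + var(\delta_1) & & & \\ \lambda_{21}\phi_{11} & \lambda_{21}^2\phi_{11} + var(\delta_2) & & \\ \phi_{21} & \lambda_{21}\phi_{21} & \phi_{22} + var(\delta_3) & \\ \lambda_{42}\phi_{21} & \lambda_{21}\lambda_{42}\phi_{21} & \lambda_{42}\phi_{22} & \lambda_{42}^2\phi_{22} + var(\delta_4) \end{bmatrix}$$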
Using $E(XX')$:
$$E(XX') = \begin{bmatrix} var(X_1) & & & \\ cov(X_1 X_2) & var(X_2) & & \\ cov(X_1 X_3) & cov(X_2 X_3) & var(X_3) & \\ cov(X_1 X_4) & cov(X_2 X_4) & cov(X_3 X_4) & var(X_4) \end{bmatrix}$$
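Then:
$\phi_{21} = cov(X_1 X_3)$
$\lambda_{21} = cov(X_2 X_3) / cov(X_1 X_3)$
$\lambda_{42} = cov(X_1 X_4) / cov(X_1 X_3)$
$\phi_{11} = [cov(X_1 X_2) * cov(X_1 X_3)] / cov(X_2 X_3)$
$\phi_{22} = [cov(X_3 X_4) * cov(X_1 X_3)] / cov(X_1 X_4)$
$var(\delta_1) = var(X_1) - \phi_{11}$
$var(\delta_2) = var(X_2) - \lambda_{21}^2\phi_{11}$
$var(\delta_3) = var(X_3) - \phi_{22}$
$var(\delta_4) = var(X_4) - \lambda_{42}^2\phi_{22}$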
Example
Assume the correlation matrix shown below. Calculate the parameter estimates given the
model as identified above.
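$$R = \begin{bmatrix} 1 & & & \\ .305 & 1 & & \\ .233 & .230 & 1 & \\ .216 & .213 & .308 & 1 \end{bmatrix}$$
$\phi_{21} = r_{x1x3} = .233$
$\lambda_{21} = r_{x2x3} / r_{x1x3} = .987$
$\lambda_{42} = r_{x1x4} / r_{x1x3} = .927$
$\phi_{11} = (r_{x1x2} * r_{x1x3}) / r_{x2x3} = .309$
$\phi_{22} = (r_{x3x4} * r_{x1x3}) / r_{x1x4} = .332$
$var(\delta_1) = 1 - .309 = .691$
$var(\delta_2) = 1 - [.987^2 * .309] = .699$
$var(\delta_3) = 1 - .332 = .668$
$var(\delta_4) = 1 - [.927^2 * .332] = .715$
Reliability of X1 = $(1^2)(.309) / 1 = .309$.
Reliability of X2 = $(.987^2)(.309) / 1 = .301$.
Reliability of X3 = $(1^2)(.332) / 1 = .332$.
Reliability of X4 = $(.927^2)(.332) / 1 = .285$.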
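The hand calculations in this example can be checked with a short script. The following is a minimal sketch in Python (not one of the packages used in these notes); the variable names are illustrative, and the formulas are the closed-form solutions from the identification section, applied to the correlation matrix so that every var(Xi) equals 1.

```python
# Correlations among X1..X4, copied from the example matrix above.
r = {
    (1, 2): 0.305,
    (1, 3): 0.233, (2, 3): 0.230,
    (1, 4): 0.216, (2, 4): 0.213, (3, 4): 0.308,
}

# Closed-form solutions from the identification section.
phi21 = r[(1, 3)]
lam21 = r[(2, 3)] / r[(1, 3)]
lam42 = r[(1, 4)] / r[(1, 3)]
phi11 = r[(1, 2)] * r[(1, 3)] / r[(2, 3)]
phi22 = r[(3, 4)] * r[(1, 3)] / r[(1, 4)]

# Error variances: var(Xi) is 1 because the input is a correlation matrix.
error_var = {
    1: 1 - phi11,
    2: 1 - lam21 ** 2 * phi11,
    3: 1 - phi22,
    4: 1 - lam42 ** 2 * phi22,
}

# Item reliability = variance explained by the factor / total variance (1 here).
reliability = {i: 1 - v for i, v in error_var.items()}

# r(X2, X4) was not used to solve for any parameter, so it supplies the
# model's one testable implication: it should be near lam21 * lam42 * phi21.
implied_r24 = lam21 * lam42 * phi21

for name, value in [("lambda21", lam21), ("lambda42", lam42),
                    ("phi11", phi11), ("phi22", phi22), ("phi21", phi21)]:
    print(f"{name} = {value:.3f}")
for i in sorted(error_var):
    print(f"var(delta{i}) = {error_var[i]:.3f}, reliability(X{i}) = {reliability[i]:.3f}")
print(f"implied r(X2, X4) = {implied_r24:.3f}, observed = {r[(2, 4)]:.3f}")
```

Because the script carries unrounded intermediate values, a few results can differ from the hand calculations in the third decimal place.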
Summary
T-rule
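1. $t \le \frac{1}{2}(q)(q + 1) = \frac{1}{2}(4)(5) = 10$.
2. The nine parameters to be estimated are: $\lambda_{21}$, $\lambda_{42}$, $\phi_{11}$, $\phi_{22}$, $\phi_{21}$, $var(\delta_1)$, $var(\delta_2)$, $var(\delta_3)$, and
$var(\delta_4)$.
3. Therefore, the model meets the t-rule. In this case, the model is said to be "overidentified"
because t = 9 < 10.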
Degrees of Freedom
d.f. = [q(q+1) / 2] - t.
That is, the number of potential parameters minus the number of estimated parameters.
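As a small illustration of the t-rule and the degrees-of-freedom formula, the sketch below counts the unique variances and covariances among q observed variables and subtracts the number of estimated parameters. The function name `t_rule` is hypothetical, and the status labels assume, as these notes stress, that the t-rule is only a counting check and does not by itself certify identification.

```python
def t_rule(q, t):
    """Return (unique moments, degrees of freedom, status) for q observed
    variables and t freely estimated parameters."""
    moments = q * (q + 1) // 2      # unique variances and covariances
    df = moments - t                # potential minus estimated parameters
    if df < 0:
        status = "underidentified (fails the t-rule)"
    elif df == 0:
        status = "exactly identified, provided the model is identified"
    else:
        status = "overidentified, provided the model is identified"
    return moments, df, status


# The four-indicator, two-factor example: q = 4 and t = 9.
moments, df, status = t_rule(q=4, t=9)
print(moments, df, status)
```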
Model Evaluation
Theoretical proposition:
$\Sigma = \Sigma(\theta) = E(XX')$, where:
1. $\Sigma$ refers to reality.
2. $\Sigma(\theta)$ refers to theory.
3. $E(XX')$ refers to the correlation matrix of observed variables.
Notation:
$S = E(XX')$, the observed correlation matrix.
$\Sigma(\hat{\theta})$ = the matrix of estimated parameters.
Alternative Hypothesis: The theory fits the data.
$S = \Sigma(\hat{\theta})$
Null Hypothesis: There is no difference between the estimated parameter matrix and the
observed correlation matrix.
$S - \Sigma(\hat{\theta}) = 0$
Note: A relatively small value for a model test statistic, such as chi-square, indicates that the
theory fits the data. Such a finding would indicate support for the theory. Thus, in evaluating
model fit, we look for a low chi-square value relative to the degrees of freedom, showing a
probability greater than .05.
Note: Measures of overall fit are not applicable to exactly identified models because at least one
degree of freedom is required for the hypothesis.
Note: Although evaluation statistics might indicate an overall good fit for the model, the individual
parameter estimates might be theoretically inappropriate or statistically non-significant.
Ho: $S - \Sigma(\hat{\theta}) = 0$
n = sample size.
log refers to the natural log.
Consider the conceptual foundation of chi-square. It equals a summary of the estimated score
minus the observed score in a table. In this same manner, chi-square equals the estimated
parameters plus their item reliabilities (the trace, or diagonal of the observed correlations divided
by the estimated parameters) minus the observed correlation matrix minus the number of
observed variables.
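The verbal description above matches the standard maximum likelihood fit function, which is written out here as an assumption about the formula these notes refer to. In it, q is the number of observed variables, tr denotes the trace, and vertical bars denote the determinant:

```latex
\[
F_{ML} = \log\left|\Sigma(\hat{\theta})\right|
       + \mathrm{tr}\!\left[S\,\Sigma(\hat{\theta})^{-1}\right]
       - \log\left|S\right| - q,
\qquad
\chi^2 = (n - 1)\,F_{ML}
\]
```

When the theory reproduces the data exactly, $\Sigma(\hat{\theta}) = S$, the trace term equals q, the two log-determinants cancel, and chi-square equals zero. Some programs multiply by n rather than n - 1.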
Coefficient of Determination
The coefficient of determination (R-square) calculates the percent of variance explained in the
observed variables (X matrix) by the latent variables ($\xi$ matrix). It equals 1 minus the
determinant of the errors in estimating X (the $\Theta_\delta$ matrix) divided by the determinant of Sigma-hat
(i.e., the input correlation matrix).
$R^2 = 1 - [\,|\Theta_\delta| / |XX'|\,]$
Various goodness of fit indexes have been developed to assess model fit. Ones more
commonly used are the Goodness of Fit Index (GFI) and the Adjusted Goodness of Fit Index
(AGFI). The Residual Mean Square (RMS) and Critical N (CN) also are popular statistics
used to assess model fit. Critical N is equal to "what chi-square would be if the sample size
were 200." Thus, Critical N adjusts chi-square for very large samples, wherein a large sample
size can create a large chi-square statistic even when the "amount of error" is small.
These indexes have the disadvantage of not having ratio scales. Thus, the community of
scholars must arrive at some agreed upon level of the indexes that assures them of adequate
model fit. In general, a GFI or AGFI of .9 or above is considered acceptable. The community of
scholars looks for an RMS of below .05. The community of scholars looks for a CN of above
200, meaning that "a sample size of more than 200 is needed to arrive at a chi-square that
indicates a probability of alpha greater than .05." See related article by Schreiber et al. for a
detailed description of model evaluation for CFA and Structural Equation models.
The t-test at 1 degree of freedom is used to evaluate the statistical significance of a parameter
estimate, wherein t = estimate / standard error of the estimate. A t-ratio of 1.98 or greater
indicates statistical significance at alpha = .05.
Reliability of the Parameter Estimates
Reliability of Xi
The reliability (i.e., communality) of Xi is the magnitude of the direct relationship that all latent
variables have on Xi.
$\lambda_{11} = \lambda_{21}$
$\delta_1 \ne \delta_2$
In the Congeneric Model:
$\lambda_{11} \ne \lambda_{21}$
$\delta_1 \ne \delta_2$
The Reliability of $\xi$
$\rho_\xi$ (the reliability of $\xi$) $= (\sum_{i=1}^{q} \lambda_{xi})^2 / [(\sum_{i=1}^{q} \lambda_{xi})^2 + \sum_{i=1}^{q} var(\delta_i)]$
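The reliability of a latent variable can be computed directly from the loadings and error variances of its indicators. The sketch below assumes the usual reading of the formula above: the squared sum of the loadings divided by that same quantity plus the sum of the error variances, with the factor variance fixed at 1. The loadings are hypothetical values chosen for illustration, not estimates from these notes.

```python
def composite_reliability(loadings, error_variances):
    """(sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    if len(loadings) != len(error_variances):
        raise ValueError("need one error variance per loading")
    true_part = sum(loadings) ** 2
    return true_part / (true_part + sum(error_variances))


# Hypothetical standardized loadings for three indicators of one factor.
loadings = [0.7, 0.8, 0.6]
# With standardized indicators, each error variance is 1 - loading^2.
error_variances = [1 - lam ** 2 for lam in loadings]

rho = composite_reliability(loadings, error_variances)
print(f"reliability of the latent variable = {rho:.3f}")
```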
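$\lambda_{ij}^s = \lambda_{ij} [\phi_{jj} / var(X_i)]^{1/2}$
In matrix format:
$\Lambda_x^s = D_x^{-1} \Lambda_x D_\xi$
$\Phi^s = D_\xi^{-1} \Phi D_\xi^{-1}$
$\Theta_\delta^s = D_x^{-1} \Theta_\delta D_x^{-1}$
where:
$D_x = (diag\ \Sigma)^{1/2}$ and $D_\xi = (diag\ \Phi)^{1/2}$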
Unique Validity Variance
In cases where a measurement model specifies correlated factors or error terms one might want
to know the unique commonality for an observed variable.
$U_{x_i\xi_j}$ (the unique validity variance, or commonality, of the effect of $\xi_j$ on $X_i$) $= R_{x_i}^2 - R_{x_i(\xi_j)}^2$, where:
$R_{x_i}^2$ is the squared multiple correlation coefficient for $X_i$. This is the proportion of variance in $X_i$
explained by all latent variables in the model that have a direct effect on $X_i$.
1. $\rho_{x_i\xi}$ is the vector of correlations of $\xi$ with $X_i$, for all $\xi$ that affect $X_i$ (a 1 x d vector, where d is the
number of $\xi$ with direct effects on $X_i$).
2. $\Phi^*$ = correlation matrix of all $\xi$ with direct effects on $X_i$.
$R_{x_i(\xi_j)}^2$ is the squared multiple correlation coefficient for $X_i$, controlling for the effects of the latent
variable on other observed variables.
1. $\rho_{x_i(\xi_j)}$ is the vector of correlations of $\xi$ with $X_i$, for all $\xi$ that affect $X_i$, except for $\xi_j$, the latent variable of
interest (a 1 x d vector, where d is the number of $\xi$ with direct effects on $X_i$).
2. $\Phi_{(\xi_j)}^*$ = correlation matrix of all $\xi$ with direct effects on $X_i$, except for $\xi_j$, the latent variable of
interest.
Note: The unique validity variance might be relatively low in comparison with Rxi2 because Xi
might depend upon highly correlated latent variables.
Degree of Collinearity
A measurement model with more than one latent variable, wherein the latent variables are
correlated with one another, should be evaluated for its degree of collinearity.
Having found some underlying dimension(s) in the data, the researcher might want to
construct a factor scale. A factor scale is a latent variable derived from two or more
observed variables that have been demonstrated to have content and construct validity,
and which are sufficiently reliable to be used for further analysis.
Factor scales can be used in two ways: 1) to examine observations in terms of their
scores on the latent variables, 2) to use the latent variables in subsequent analysis as
independent and/or dependent variables.
Measurements on factor scales can be constructed in several ways. First, they can be
calculated by simply adding or obtaining the mean of the two or more observed variables
comprising the scale. If the observed variables differ in their item reliabilities, however, the
researcher might want to construct the factor scale based upon weighted observed
variables. Observed variables typically are weighted by their parameter estimates on the
factor. Listed below are three procedures that use different assumptions to create more
refined factor scores.
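The two simple scoring approaches described above, an unweighted mean and a mean weighted by parameter estimates, can be sketched as follows. The function names, the respondent's scores, and the loadings are all hypothetical; dividing by the sum of the loadings is one reasonable normalization, chosen here so that the weighted score stays on the original response scale.

```python
def unit_weighted_score(values):
    """Factor scale score as the simple mean of the observed variables."""
    return sum(values) / len(values)


def loading_weighted_score(values, loadings):
    """Factor scale score with each observed variable weighted by its
    parameter estimate (loading) on the factor."""
    if len(values) != len(loadings):
        raise ValueError("need one loading per observed variable")
    weighted_sum = sum(w * v for w, v in zip(loadings, values))
    return weighted_sum / sum(loadings)


# One hypothetical respondent's scores on three indicators of a factor,
# and hypothetical loadings for those indicators.
respondent = [4.0, 5.0, 3.0]
loadings = [0.7, 0.8, 0.6]

simple = unit_weighted_score(respondent)
weighted = loading_weighted_score(respondent, loadings)
print(f"unit-weighted = {simple:.3f}, loading-weighted = {weighted:.3f}")
```

The weighted score is pulled toward the indicator with the largest loading, which is the point of weighting when item reliabilities differ.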
Bollen's Procedure
Bollen suggests accounting for the correlations among the latent variables:
Factor Score = $(\Phi \Lambda_x' S^{-1}) x$, where S = the observed correlation matrix.
Bartlett's Procedure
Bartlett suggests giving more weight to observed variables with greater item reliability:
Factor Score = [(x'-2)(S-2S)-1]x, where S = the observed correlation matrix.
Hypothesis Testing and Model Comparison
One advantage to theory testing and the subsequent use of CFA is that nested models can be
used to test hypotheses. One can conduct a difference in chi-square test, for example, to
evaluate the extent to which changes in model specification affect model fit.
If the model fits the data, then chi-square will be low and its probability (p-value) will be
over .05 (assuming an assigned Type I error rate of 5%).
If there is no relationship between the model and the data, then chi-square will be high and
its probability (p-value) will be less than .05 (assuming an assigned Type I error rate of
5%).
The approach to testing differences in estimates across two samples, or testing for the
moderating effect of an external variable, is to estimate a baseline model that assumes no
difference in estimates across the two samples. Then, estimate less restricted models, ones
that allow for differences in parameter estimates across levels of the external variable. The chi-
square calculation for each less restricted model will be less than the chi-square value for the
baseline model. And the degrees of freedom for the less restricted model will be less than that
of the baseline model. To determine if a less restricted model fits the data better than the
baseline model, one can calculate a chi-square difference test:
$\chi_r^2 - \chi_u^2$ = chi-square (baseline) - chi-square (less restricted).
This difference score is evaluated at the difference in the degrees of freedom for the two
models: d.f. (baseline) - d.f. (less restricted).
For example, suppose the chi-square for a baseline model that contains three parameters in the
gamma matrix equals 142.691 at 123 d.f. Suppose that a less restricted model is estimated that
allows for the three parameters in the gamma matrix to be estimated separately for the two
groups under consideration. And suppose that the chi-square for this less restricted model
equals 110.527 at 120 d.f. Then the difference in chi-square equals 32.164 at 3 d.f. The critical
value of chi-square at three degrees of freedom for a type-I error rate of 5% equals 7.815.
Therefore, we would conclude that, at a type-I error rate of 5%, the less restricted model fits the
data better than does the baseline model, meaning that the parameter estimates differ
significantly from one another across the two levels of the external variable. The next step
would be to conduct a chi-square difference test for each of the paths in the gamma matrix to
determine which of the three paths has significantly different parameter estimates across the
two levels of the external variable.
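The arithmetic in this example can be reproduced with the sketch below. It uses only the Python standard library, so the chi-square tail probability is computed with a small helper, `chi2_sf`, written for this illustration (a series expansion of the incomplete gamma function) rather than taken from a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df
    degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Values from the example in the text.
chisq_baseline, df_baseline = 142.691, 123
chisq_less_restricted, df_less_restricted = 110.527, 120

chisq_diff = chisq_baseline - chisq_less_restricted
df_diff = df_baseline - df_less_restricted
p_value = chi2_sf(chisq_diff, df_diff)

print(f"chi-square difference = {chisq_diff:.3f} at {df_diff} d.f.")
print(f"p-value = {p_value:.2e}")
print("less restricted model fits better" if p_value < 0.05
      else "no significant improvement in fit")
```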
Typically, one would allow a matrix of estimates, such as the lambda, gamma, beta, or error
matrices (psi, theta-delta, and theta-epsilon), to become less restricted to examine the
possibility of differences in parameters across the levels of the external variable. If the chi-
square difference test indicates that the baseline model and less restricted model contain at
least some significantly different parameter estimates, then one would test each path within a
matrix at a time to locate the ones that differ significantly from one another (they might all be
significantly different from one another).
If one finds a less restricted model that fits the data significantly better than the baseline model,
then this model becomes the new "baseline" model for testing of further differences in
parameter estimates across levels of the external variable.
The Sociology 512 web site includes notes on hypothesis testing using the SAS and LISREL
software packages.
Some latent variables are themselves considered to be composed of multiple latent variables.
The latent variable Locus of Control (LOC), for example, is thought to comprise three sub-
dimensions: internal, chance, and powerful others. The diagram below illustrates a second-
order model of LOC, with the variable "perceived risk" used to assess the predictive validity of
the measure of LOC.
$\eta = \Gamma\xi + \zeta$, where:
A central premise of CFA is that the theory fits the data. Thus, if an observed variable is posited
to measure just one latent variable then it should not also have a significant parameter estimate
on another latent variable. If an observed variable X1 is posited to measure $\xi_1$, for example,
then X1 should not have a significant parameter estimate on $\xi_2$. If it does, then we can question
the construct validity of X1 as an indicator of $\xi_1$ as well as the theory that specifies that X1
measures only $\xi_1$.
Sensitivity analysis examines the extent to which a theory has construct validity: the extent to
which hypotheses of no relationship are supported by the data.
Consider, for example, the Locus of Control CFA model as specified by Sapp and Harrod (see:
http://www.soc.iastate.edu/sapp/Soc512MeasurementRefs.html). Sapp and Harrod posit that 1)
the latent variable Internal is measured with three observed variables: Own Actions, Protect,
and Determine, 2) the latent variable Chance is measured with three observed variables:
Accidental Happenings, Bad Luck Happenings, and Lucky, and 3) the latent variable Powerful
Others is measured with three observed variables: Pressure Groups, Powerful Others, and
Powerful People (see: http://www.soc.iastate.edu/sapp/soc512LOCCFAModel.pdf). Implied by
this model is that Own Actions, for example, which is posited to measure the latent variable
Internal, is not significantly related to either of the remaining latent variables: Chance or
Powerful Others.
Sensitivity analysis examines whether the implied hypotheses of no relationship are supported
by the data. Shown below are examples of sensitivity analysis for the Sapp and Harrod LOC
model conducted in LISREL.
Means and Intercepts for Latent Variables
In CFA with multiple samples, it is possible to estimate means and intercepts for the latent
variables.
where:
$\tau_x$ is the constant intercept term for each Xi. This value is set to be equal across samples (g).
Loadings are listed in the Lambda X matrices. Intercepts are listed in the Tau X matrices.
These matrices are the same for all groups.
Internal 1.121
Internal .869
$\sum_{i=1}^{g} [(x_{ij} - \bar{x}_j)\, n / g] / n_i$, for j = 1, 2, 3 ... k factors
For example: [ (-.218 + .109) 64.5 ] / 62 = -.113
Analysis of Ordinal Variables
The maximum likelihood estimation (MLE) approach relies on the strong assumption of
multivariate normality. In practice, a substantial amount of social science data is non-normal.
Survey responses are often coded as yes/no or as scores on an ordered scale (e.g. strongly
disagree, disagree, neutral, agree, strongly agree). In the presence of categorical or ordinal
data, MLE may not work properly, calling for alternative estimation methods.
Mplus and LISREL employ a multi-step method for ordinal outcome variables that analyzes a
matrix of polychoric correlations rather than covariances. This approach works as follows:
In LISREL, the diagonally weighted least squares (DWLS) method needs to be specified.
Alternatively, the polychoric correlation matrix and asymptotic covariance matrix are estimated
and saved into a LISREL system file (.dsf) using PRELIS before fitting the model.
Mplus automatically follows the above steps when the syntax includes a line identifying observed
variables as categorical.
Instructions
[For those times when you will be using data collected by persons other than those who
graduated from ISU, given that ISU graduates never would be so silly as to collect ordinal-level
data! ]
When conducting CFA with ordinal-level data, use weighted least squares with an asymptotic
covariance matrix. N must be at least 200 if k < 12 and at least 1.5 k(k+1) if k ≥ 12.
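The sample size rule just stated can be written as a small helper. This is a minimal sketch: the function name is hypothetical, k is taken to be the number of observed variables, and the second branch is assumed to apply when k is 12 or more.

```python
import math


def minimum_n_ordinal(k):
    """Minimum sample size for weighted least squares with k ordinal
    observed variables, following the rule of thumb in these notes."""
    if k < 1:
        raise ValueError("k must be a positive number of observed variables")
    if k < 12:
        return 200
    return math.ceil(1.5 * k * (k + 1))


for k in (9, 12, 15):
    print(f"k = {k}: N of at least {minimum_n_ordinal(k)}")
```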
Power Analysis
What is power?
To understand power, it is helpful to review what inferential statistics test. When you conduct
an inferential statistical test, you are often comparing two hypotheses:
1. The null hypothesis: This hypothesis predicts that your program will not have an
effect on your variable of interest. For example, if you are measuring students' level of
concern for the environment before and after a field trip, the null hypothesis is that their
level of concern will remain the same.
2. The alternative hypothesis: This hypothesis predicts that you will find a difference
between groups. Using the example above, the alternative hypothesis is that students'
post-trip level of concern for the environment will differ from their pre-trip level of
concern.
Statistical tests look for evidence that you can reject the null hypothesis and conclude that
your program had an effect. With any statistical test, however, there is always the possibility
that you will find a difference between groups when one does not actually exist. This is
called a Type I error. Likewise, it is possible that when a difference does exist, the test will
not be able to identify it. This type of mistake is called a Type II error.
Power refers to the probability that your test will find a statistically significant difference when
such a difference actually exists. In other words, power is the probability that you will reject
the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted
that power should be .8 or greater; that is, you should have an 80% or greater chance of
finding a statistically significant difference when there is one.
1. Estimate the more specified model and ACOV(a), the covariance matrix of the parameter
estimates for this model.
2. Calculate the added parameter estimates for the more specified model (Ha) under the
assumption that all standardized estimates equal .1.
3. NCP = [(column x row) matrix of the added parameter estimates] * [diagonal matrix of the
variances of the added parameters (inverse)] * [(row * column) matrix of the added
parameter estimates].
4. Calculate the power of the test.
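Steps 3 and 4 can be sketched in Python under stated assumptions. With a diagonal matrix of variances, the NCP in step 3 reduces to the sum of each squared added estimate divided by its variance. Power is then taken to be the probability that a noncentral chi-square variable exceeds the critical value, computed here as a Poisson-weighted sum of central chi-square tail probabilities. The helper names, the variances, and the critical value are illustrative and do not come from these notes or from any package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a central chi-square variable."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:     # series for the incomplete gamma function
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def ncp_from_added(estimates, variances):
    """Step 3 with a diagonal variance matrix: sum of estimate^2 / variance."""
    return sum(e * e / v for e, v in zip(estimates, variances))


def power_from_ncp(ncp, df, critical_value):
    """Step 4: P(noncentral chi-square > critical value)."""
    half = ncp / 2.0
    weight = math.exp(-half)        # Poisson weight for j = 0
    power, j = 0.0, 0
    while True:
        power += weight * chi2_sf(critical_value, df + 2 * j)
        j += 1
        weight *= half / j
        if weight < 1e-16 and j > half:
            break
    return power


# Two added parameters, each with a standardized estimate of .1 (step 2)
# and a hypothetical variance of .002 taken from ACOV(a).
ncp = ncp_from_added([0.1, 0.1], [0.002, 0.002])
power = power_from_ncp(ncp, df=2, critical_value=5.991)  # 5% level, 2 d.f.
print(f"NCP = {ncp:.2f}, power = {power:.3f}")
```

A power value below .8 would suggest that the sample is too small to detect added parameters of this size.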
The Multitrait-Multimethod Matrix
Before you can interpret an MTMM, you have to understand how to identify the
different parts of the matrix. First, you should note that the matrix consists of
nothing but correlations. It is a square, symmetric matrix, so we only need to look at
half of it (the figure shows the lower triangle). Second, these correlations can be
grouped into three kinds of shapes: diagonals, triangles, and blocks. The specific
shapes are:
Heterotrait-Monomethod Triangles
These are the correlations among measures that share the same method
of measurement. For instance, A1-B1 = .51 in the upper left heterotrait-
monomethod triangle. Note that what these correlations share is
method, not trait or concept. If these correlations are high, it is because
measuring different things with the same method results in correlated
measures. Or, in more straightforward terms, you've got a strong
"methods" factor.
Heterotrait-Heteromethod Triangles
These are correlations that differ in both trait and method. For instance,
A1-B2 is .22 in the example. Generally, because these correlations share
neither trait nor method we expect them to be the lowest in the matrix.
Monomethod Blocks
These consist of all of the correlations that share the same method of
measurement. There are as many blocks as there are methods of
measurement.
Heteromethod Blocks
These consist of all correlations that do not share the same methods.
There are (K(K-1))/2 such blocks, where K = the number of methods.
In the example, there are 3 methods and so there are (3(3-1))/2 =
(3(2))/2 = 6/2 = 3 such blocks.
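The block count follows from pairing each method with every other method once. A minimal sketch, using generic method labels as placeholders:

```python
from itertools import combinations

# Placeholder labels for K = 3 methods of measurement.
methods = ["Method 1", "Method 2", "Method 3"]
k = len(methods)

# Each heteromethod block pairs two different methods.
heteromethod_blocks = list(combinations(methods, 2))
# There is one monomethod block per method.
monomethod_blocks = [(m, m) for m in methods]

assert len(heteromethod_blocks) == k * (k - 1) // 2
print(f"{len(monomethod_blocks)} monomethod blocks")
print(f"{len(heteromethod_blocks)} heteromethod blocks: {heteromethod_blocks}")
```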
Now that you can identify the different parts of the MTMM, you can begin to
understand the rules for interpreting it. You should realize that MTMM
interpretation requires the researcher to use judgment. Even though some of the
principles may be violated in an MTMM, you may still wind up concluding that you
have fairly strong construct validity. In other words, you won't necessarily get perfect
adherence to these principles in applied research settings, even when you do have
evidence to support construct validity. To me, interpreting an MTMM is a lot like a
physician's reading of an x-ray. A practiced eye can often spot things that the
neophyte misses! A researcher who is experienced with MTMM can use it to identify
weaknesses in measurement as well as for assessing construct validity.
To help make the principles more concrete, let's make the example a bit more
realistic. We'll imagine that we are going to conduct a study of sixth grade students
and that we want to measure three traits or concepts: Self Esteem (SE), Self
Disclosure (SD) and Locus of Control (LC). Furthermore, let's measure each of
these three different ways: a Paper-and-Pencil (P&P) measure, a Teacher rating, and a
Parent rating. The results are arrayed in the MTMM. As the principles are presented,
try to identify the appropriate coefficients in the MTMM and make a judgement
That is, a trait should be more highly correlated with itself than with
anything else! This is uniformly true in our example.
A validity coefficient should be higher than values lying in its column and row
in the same heteromethod block.
The example clearly meets this criterion. Notice that in all triangles the
SE-SD relationship is approximately twice as large as the relationships
that involve LC.
Despite these advantages, MTMM has received little use since its introduction in
1959. There are several reasons. First, in its purest form, MTMM requires that you
have a fully-crossed measurement design -- each of several traits is measured by each
of several methods. While Campbell and Fiske explicitly recognized that one could
As mentioned above, one of the most difficult aspects of MTMM from an implementation
point of view is that it required a design that included all combinations of both traits and
methods. But the ideas of convergent and discriminant validity do not require the methods
factor. To see this, we have to reconsider what Campbell and Fiske meant by convergent and
discriminant validity.
Discriminant validity is the principle that measures of theoretically different constructs should not correlate highly
with each other. We can see that in the example that shows two constructs --
self-esteem and locus of control -- each measured in two instruments. We would
expect that, because these are measures of different constructs, the cross-construct
correlations would be low, as shown in the figure. These low correlations are
evidence for validity. Finally, we can put this all together to see how we can address
both convergent and discriminant validity simultaneously. Here, we have two
constructs -- self-esteem and locus of control -- each measured with three
instruments. The red and green correlations are within-construct ones. They are a
reflection of convergent validity and should be strong. The blue correlations are
cross-construct and reflect discriminant validity. They should be uniformly lower
than the convergent coefficients.
The important thing to notice about this matrix is that it does not explicitly include a
methods factor as a true MTMM would. The matrix examines both convergent and
discriminant validity (like the MTMM) but it only explicitly looks at construct intra-
and interrelationships. We can see in this example that the MTMM idea really had
two major themes. The first was the idea of looking simultaneously at the pattern of
convergence and discrimination. This idea is similar in purpose to the notions
implicit in the nomological network -- we are looking at the pattern of
interrelationships based upon our theory of the nomological net. The second idea in
MTMM was the emphasis on methods as a potential confounding factor.
While methods may confound the results, they won't necessarily do so in any given
study. And, while we need to examine our results for the potential for methods
factors, it may be that combining this desire to assess the confound with the need to
assess construct validity is more than one methodology can feasibly handle. Perhaps
if we split the two agendas, we will find that the possibility that we can examine
convergent and discriminant validity is greater. But what do we do about methods
factors? One way to deal with them is through replication of research projects, rather
than trying to incorporate a methods test into a single research study. Thus, if we
find a particular outcome in a study using several measures, we might see if that same
outcome is obtained when we replicate the study using different measures and
methods of measurement for the same constructs. The methods issue is considered
more as an issue of generalizability (across measurement methods) rather than one of
construct validity.
When viewed this way, we have moved from the idea of an MTMM to that of the
multitrait matrix that enables us to examine convergent and discriminant validity, and
hence construct validity. We will see that when we move away from the explicit
consideration of methods and when we begin to see convergence and discrimination
as differences of degree, we essentially have the foundation for the pattern matching
approach to assessing construct validity.