
Factor Analysis

Exploratory factor analysis is a statistical approach that can be used to analyze


interrelationships among a large number of variables and to explain these variables in
terms of a smaller number of common underlying dimensions. This involves finding a
way of condensing the information contained in some of the original variables into a
smaller set of implicit variables (called factors) with a minimum loss of information.
For example, suppose you would like to test the observation that customer satisfaction is
based on product knowledge, communication skills and people skills. You develop a new
questionnaire about customer satisfaction with 30 questions: 10 concerning product
knowledge, 10 concerning communication skills and 10 concerning people skills. Before
using the questionnaire on your sample, you pretest it on a group of people similar to
those who will be completing your survey.

You perform a factor analysis to see whether these three factors are really present. If they
are, you will be able to create three separate scales by summing the items on each dimension.

Factor analysis is based on a correlation table. If there are k items in the study
(e.g. k questions in the above example) then the correlation table has k × k entries of
the form rij, where each rij is the correlation coefficient between item i and item j. The main
diagonal consists of entries with value 1.
Closely related to factor analysis is principal component analysis, which creates a
picture of the relationships between the variables useful in identifying common factors.
Factor analysis is based on various concepts from Linear Algebra, in particular
eigenvalues, eigenvectors, orthogonal matrices and the spectral theorem. We review these
concepts first before explaining how principal component analysis and factor analysis
work.

Topics:

 Linear Algebra Background


 Principal Component Analysis (PCA)
 Basic Concepts of Factor Analysis
 Factor Extraction
 Determining the Number of Factors to Retain
 Rotation
 Factor Scores
 Validity of Correlation Matrix and Sample Size
 Principal Axis Method of Factor Extraction
 Real Statistics Functions and Data Analysis Tools
To illustrate Factor Analysis we will use an example. A complete description of this
example is given in the Factor Analysis Example section below.
Linear Algebra Background for Factor Analysis
We now summarize the key concepts from Linear Algebra that are necessary to perform
principal component analysis and factor analysis. Additional details can be found
in Linear Algebra and Advanced Matrix Topics.
Definition 1: Given a square k × k matrix A, an eigenvalue of A is a scalar λ such that
det (A – λI) = 0, where I is the k × k identity matrix. A non-zero k × 1 column vector X is
an eigenvector which corresponds to eigenvalue λ provided AX = λX.
Observation: Since any scalar multiple of an eigenvector is also an eigenvector,
we commonly consider unit eigenvectors (i.e. eigenvectors whose length is 1). If X
= [xi] is an eigenvector corresponding to λ, then X/||X|| is a unit eigenvector
corresponding to λ, where ||X|| = √(x1² + ⋯ + xk²).


Observation: Eigenvalues and eigenvectors of a square matrix can be constructed in
Excel using a variety of approaches. In Excel’s Goal Seek and Solver we show how to find
eigenvalues using Excel’s Solver capability; we can then find the corresponding
eigenvectors using Gaussian Elimination (although there are some limits to this
approach). We also show how to calculate eigenvalues and eigenvectors using QR
Factorization (see Orthogonal Vectors and Matrices and Spectral Decomposition).
When you need to find eigenvalues and/or eigenvectors you can use either of these
techniques within Excel, but because both methods are complicated and time consuming,
we suggest that you use the following supplemental array functions.

Real Statistics Functions: The Real Statistics Resource Pack provides the following
supplemental functions, where R1 is a k × k range in Excel.
eVALUES(R1): Produces a 1 × k array containing the eigenvalues of the matrix in range R1.
These eigenvalues are listed in decreasing absolute value order.
eVECTORS(R1): Produces a row with the eigenvalues as for eVALUES(R1). Below each
eigenvalue is a unit eigenvector corresponding to this eigenvalue. Thus the output of
eVECTORS(R1) is a (k+1) × k array.
Since the calculation of these functions uses iterative techniques, you can optionally
specify the number of iterations used by using eVALUES(R1, iter) and
eVECTORS(R1, iter). If the iter parameter is not used then it defaults to 100 iterations.
The eigenvectors produced by eVECTORS(R1) are all orthogonal, as described in
Definition 8 of Matrix Operations. See Figure 5 of Principal Component Analysis for an
example of the output from the eVECTORS function.
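For readers who want to check these calculations outside of Excel, the following Python/numpy sketch (a minimal illustration, not the Real Statistics implementation) computes the same kind of output as eVALUES and eVECTORS: the eigenvalues of a symmetric matrix sorted by decreasing absolute value together with the corresponding unit eigenvectors. The matrix A below is an arbitrary example.

import numpy as np

def evalues_evectors(R1):
    # Eigenvalues of a symmetric matrix sorted by decreasing absolute value,
    # together with the corresponding unit eigenvectors (one per column).
    vals, vecs = np.linalg.eigh(R1)        # eigh is appropriate for symmetric matrices
    order = np.argsort(-np.abs(vals))      # decreasing |eigenvalue|
    return vals[order], vecs[:, order]     # eigh already returns unit-length eigenvectors

# Example with an arbitrary symmetric 3 x 3 matrix
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, C = evalues_evectors(A)
print(lam)   # eigenvalues, largest magnitude first
print(C)     # matching unit eigenvectors as columns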
Observation: Every square k × k matrix has at most k (real) eigenvalues
(see Eigenvalues and Eigenvectors). If A is symmetric then it has k eigenvalues, although
these don’t need to be distinct (see Symmetric Matrices). It turns out that the eigenvalues
for covariance and correlation matrices are always non-negative (see Positive Definite
Matrices).
Theorem 1 (Spectral Decomposition Theorem): Let A be a
symmetric n × n matrix, then A has a spectral decomposition A = CDCT where C is an n
× n matrix whose columns are unit eigenvectors C1, …, Cn corresponding to the
eigenvalues λ1, …, λn of A and D is the n × n diagonal matrix whose main diagonal consists
of λ1, …, λn.
Observation: This is Theorem 1 found in Spectral Decomposition. We will use this
theorem to carry out principal component analysis and factor analysis. In fact, the key
form of the theorem that we will use is that A can be expressed as
A = λ1C1C1T + ⋯ + λnCnCnT
Principal Component Analysis
Principal component analysis is a statistical technique that is used to analyze the
interrelationships among a large number of variables and to explain these variables in
terms of a smaller number of variables, called principal components, with a minimum
loss of information.
Definition 1: Let X = [xi] be any k × 1 random vector. We now define a k × 1 vector Y
= [yi], where for each i the ith principal component of X is
yi = βi1x1 + βi2x2 + ⋯ + βikxk
for some regression coefficients βij. Since each yi is a linear combination of the xj, Y is a
random vector.
Now define the k × k coefficient matrix β = [βij] whose ith column is the k × 1 vector βi
consisting of the coefficients of the ith principal component. Thus
yi = βiTX      i.e.      Y = βTX
For reasons that will become apparent shortly, we choose to view the βi as column
vectors, and so the coefficients of the ith principal component form the row vector βiT.
Observation: Let Σ = [σij] be the k × k population covariance matrix for X. Then the
covariance matrix for Y is given by
ΣY = βT Σ β
i.e. the population variances and covariances of the yi are given by
var(yi) = βiTΣβi      cov(yi, yj) = βiTΣβj
Observation: Our objective is to choose values for the regression coefficients βij so as to
maximize var(yi) subject to the constraint that cov(yi, yj) = 0 for all i ≠ j. We find such
coefficients βij using the Spectral Decomposition Theorem (Theorem 1 of Linear Algebra
Background). Since the covariance matrix is symmetric, by Theorem 1 of Symmetric
Matrices, it follows that
Σ = β D βT
where β is a k × k matrix whose columns are unit eigenvectors β1, …, βk corresponding to
the eigenvalues λ1, …, λk of Σ and D is the k × k diagonal matrix whose main diagonal
consists of λ1, …, λk. Alternatively, the spectral theorem can be expressed as
Σ = λ1β1β1T + ⋯ + λkβkβkT
Property 1: If λ1 ≥ … ≥ λk are the eigenvalues of Σ with corresponding unit
eigenvectors β1, …, βk, then
Σ = λ1β1β1T + ⋯ + λkβkβkT
and furthermore, for all i and j ≠ i
var(yi) = λi      cov(yi, yj) = 0
Proof: The first statement results from Theorem 1 of Symmetric Matrices as explained
above. Since the column vectors βj are orthonormal, βi · βj = βiTβj = 0 if j ≠ i and βiTβi =
1 if j = i. Thus
var(yi) = βiTΣβi = βiT(λ1β1β1T + ⋯ + λkβkβkT)βi = λi
cov(yi, yj) = βiTΣβj = βiT(λ1β1β1T + ⋯ + λkβkβkT)βj = 0 for j ≠ i
Property 2:
σ1² + ⋯ + σk² = λ1 + ⋯ + λk
Proof: By definition of the covariance matrix, the main diagonal of Σ contains the
values σ1², …, σk², and so trace(Σ) = σ1² + ⋯ + σk². But by Property 1 of Eigenvalues and
Eigenvectors, trace(Σ) = λ1 + ⋯ + λk.
Observation: Thus the total variance for X can be expressed as trace(Σ)
= σ1² + ⋯ + σk² = λ1 + ⋯ + λk, but by Property 1, this is also the total variance for Y.
Thus the portion of the total variance (of X or Y) explained by the ith principal component
yi is λi/(λ1 + ⋯ + λk). Assuming that λ1 ≥ … ≥ λk, the portion of the total variance explained by the
first m principal components is therefore (λ1 + ⋯ + λm)/(λ1 + ⋯ + λk).
Our goal is to find a reduced number of principal components that can explain most of
the total variance, i.e. we seek a value of m that is as low as possible but such that the
ratio (λ1 + ⋯ + λm)/(λ1 + ⋯ + λk) is close to 1.
Observation: Since the population covariance Σ is unknown, we will use the sample
covariance matrix S as an estimate and proceed as above using S in place of Σ. Recall
that S = [sih] is given by the formula
sih = [(xi1 − x̄i)(xh1 − x̄h) + ⋯ + (xin − x̄i)(xhn − x̄h)]/(n − 1)
where we now consider X = [xij] to be a k × n matrix such that for each i, {xij: 1 ≤ j ≤ n} is a
random sample for random variable xi. Since the sample covariance matrix is symmetric,
there is a similar spectral decomposition
S = λ1B1B1T + ⋯ + λkBkBkT
where the Bj = [bij] are the unit eigenvectors of S corresponding to the
eigenvalues λj of S (actually this is a bit of an abuse of notation since these λj are not the
same as the eigenvalues of Σ).
We now use the bij as the regression coefficients, and so the principal components are
yi = BiTX for each i, i.e. Y = BTX, where B is the k × k matrix whose columns are B1, …, Bk
and as above, for all i and j ≠ i
var(yi) = λi      cov(yi, yj) = 0
As before, assuming that λ1 ≥ … ≥ λk, we want to find a value of m so that λ1 + ⋯ + λm explains
as much of the total variance as possible. In this way we reduce the number of principal
components needed to explain most of the variance.
Example 1: The school system of a major city wanted to determine the characteristics of
a great teacher, and so they asked 120 students to rate the importance of each of the
following 9 criteria using a Likert scale of 1 to 10 with 10 representing that a particular
characteristic is extremely important and 1 representing that the characteristic is not
important.
1. Setting high expectations for the students
2. Entertaining
3. Able to communicate effectively
4. Having expertise in their subject
5. Able to motivate
6. Caring
7. Charismatic
8. Having a passion for teaching
9. Friendly and easy-going
Figure 1 shows the scores from the first 10 students in the sample and Figure 2 shows
some descriptive statistics about the entire 120 person sample.

Figure 1 – Teacher evaluation scores

Figure 2 – Descriptive statistics for teacher evaluations

The sample covariance matrix S is shown in Figure 3 and can be calculated directly as
=MMULT(TRANSPOSE(B4:J123-B126:J126),B4:J123-B126:J126)/(COUNT(B4:B123)-1)
Here B4:J123 is the range containing all the evaluation scores and B126:J126 is the range
containing the means for each criterion. Alternatively we can simply use the Real
Statistics formula COV(B4:J123) to produce the same result.

Figure 3 – Covariance Matrix

In practice, we usually prefer to standardize the sample scores. This gives each of the nine
criteria equal weight, and is equivalent to using the correlation matrix. Let R = [rij]
where rij is the correlation between xi and xj, i.e.
rij = sij/(si·sj)
where sij is the sample covariance between xi and xj and si, sj are their sample standard deviations.
The sample correlation matrix R is shown in Figure 4 and can be calculated directly as
=MMULT(TRANSPOSE((B4:J123-B126:J126)/B127:J127),(B4:J123-
B126:J126)/B127:J127)/(COUNT(B4:B123)-1)

Here B127:J127 is the range containing the standard deviations for each criterion.
Alternatively we can simply use the Real Statistics function CORR(B4:J123) to produce
the same result.

Figure 4 – Correlation Matrix
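The same two matrices (Figures 3 and 4) can be computed outside Excel; the sketch below is a minimal Python/numpy illustration in which scores is a hypothetical 120 × 9 array playing the role of range B4:J123 (random placeholder data is used here, not the actual teacher evaluation scores).

import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(1, 11, size=(120, 9)).astype(float)   # placeholder for B4:J123

n = scores.shape[0]
centered = scores - scores.mean(axis=0)          # subtract the column means (B126:J126)
S = centered.T @ centered / (n - 1)              # sample covariance matrix (Figure 3)

std = scores.std(axis=0, ddof=1)                 # column standard deviations (B127:J127)
Z = centered / std                               # standardized scores
R = Z.T @ Z / (n - 1)                            # sample correlation matrix (Figure 4)

# numpy shortcuts giving the same results:
# np.cov(scores, rowvar=False) and np.corrcoef(scores, rowvar=False)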

Note that all the values on the main diagonal are 1, as we would expect since the variances
have been standardized. We next calculate the eigenvalues for the correlation matrix
using the Real Statistics eVECTORS(M4:U12) formula, as described in Linear Algebra
Background. The result appears in range M18:U27 of Figure 5.
Figure 5 – Eigenvalues and eigenvectors of the correlation matrix

The first row in Figure 5 contains the eigenvalues for the correlation matrix in Figure 4.
Below each eigenvalue is a corresponding unit eigenvector. E.g. the largest eigenvalue
is λ1= 2.880437. Corresponding to this eigenvalue is the 9 × 1 column
eigenvector B1 whose elements are 0.108673, -0.41156, etc.
As we described above, the coefficients of the eigenvectors serve as the regression coefficients
of the 9 principal components. For example the first principal component can be
expressed by
y1 = B1TX′
i.e.
y1 = 0.108673x′1 − 0.41156x′2 + ⋯
using the coefficients in the B1 column of Figure 5, where the x′j are the standardized scores.
Thus for any set of scores (for the xj) you can calculate each of the corresponding
principal components. Keep in mind that you need to standardize the values of
the xj first since this is how the correlation matrix was obtained. For the first sample
(row 4 of Figure 1), we can calculate the nine principal components using the matrix
equation Y = BTX′ as shown in Figure 6.

Figure 6 – Calculation of PC1 for first sample

Here B (range AI61:AQ69) is the set of eigenvectors from Figure 5, X (range AS61:AS69)
is simply the transpose of row 4 from Figure 1, X′ (range AU61:AU69) standardizes the
scores in X (e.g. cell AU61 contains the formula =STANDARDIZE(AS61, B126, B127),
referring to Figure 2) and Y (range AW61:AW69) is calculated by the formula
=MMULT(TRANSPOSE(AI61:AQ69),AU61:AU69). Thus the principal components
values corresponding to the first sample are 0.782502 (PC1), -1.9758 (PC2), etc.
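The calculation in Figure 6 can be sketched as follows; this is a minimal numpy illustration in which R is the 9 × 9 correlation matrix, x is one row of raw scores, and means and stds are the columnwise means and standard deviations from Figure 2 (all assumed to be available as arrays).

import numpy as np

def principal_component_scores(x, means, stds, R):
    # Returns Y = B'X', the principal component scores for one observation.
    vals, B = np.linalg.eigh(R)
    order = np.argsort(-vals)          # sort so that PC1 has the largest eigenvalue
    B = B[:, order]
    x_std = (x - means) / stds         # standardize, matching the correlation matrix
    # Note: each eigenvector is only determined up to sign, so individual
    # components may be negated relative to the values shown in Figure 6.
    return B.T @ x_std                 # PC1, PC2, ..., PC9 for this observation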
As observed previously, the total variance for the nine random variables is 9 (since the
variance was standardized to 1 in the correlation matrix), which is, as expected, equal to
the sum of the nine eigenvalues listed in Figure 5. In fact, in Figure 7 we list the
eigenvalues in decreasing order and show the percentage of the total variance accounted
for by that eigenvalue.

Figure 7 – Variance accounted for by each eigenvalue


The values in column M are simply the eigenvalues listed in the first row of Figure 5, with
cell M41 containing the formula =SUM(M32:M40) and producing the value 9 as expected.
Each cell in column N contains the percentage of the variance accounted for by the
corresponding eigenvalue. E.g. cell N32 contains the formula =M32/M41, and so we see
that 32% of the total variance is accounted for by the largest eigenvalue. Column O simply
contains the cumulative percentages, and so we see that the first four eigenvalues account for
72.3% of the variance.

Using Excel’s charting capability, we can plot the values in column N of Figure 7 to obtain
a graphical representation, called a scree plot.

Figure 8 – Scree Plot


We decide to retain the first four eigenvalues, which explain 72.3% of the variance. In
Basic Concepts of Factor Analysis we will explain in more detail how to determine
how many eigenvalues to retain. The portion of Figure 5 that refers to these
eigenvalues is shown in Figure 9. Since all the PC1 coefficients except the one for Expect
are negative, we first decide to negate all of the PC1 values. This is not a problem since
the negative of a unit eigenvector is also a unit eigenvector.

Figure 9 – Principal component coefficients (Reduced Model)


Those values that are sufficiently large, i.e. the values that show a high correlation
between the principal components and the (standardized) original variables, are
highlighted. We use a threshold of ±0.4 for this purpose.

This is done by highlighting the range R32:U40 and selecting Home >
Styles|Conditional Formatting and then choosing Highlight Cell Rules >
Greater Than and inserting the value .4 and then selecting Home >
Styles|Conditional Formatting and then choosing Highlight Cell Rules > Less
Than and inserting the value -.4.
Note that Entertainment, Communications, Charisma and Passion are highly correlated
with PC1, Motivation and Caring are highly correlated with PC3 and Expertise is highly
correlated with PC4. Also Expectation is highly positively correlated with PC2 while
Friendly is negatively correlated with PC2.

Ideally we would like to see that each variable is highly correlated with only one principal
component. As we can see from Figure 9, this is the case in our example. Usually this is
not the case, however, and we will show what to do about this in Basic Concepts of
Factor Analysis when we discuss rotation in Factor Analysis.
In our analysis we retain 4 of the 9 principal components. As noted previously, each of the
principal components can be calculated by
yi = BiTX′
i.e. Y = BTX′, where Y is a k × 1 vector of principal components, B is a k × k matrix
(whose columns are the unit eigenvectors) and X′ is a k × 1 vector of the
standardized scores for the original variables.
If we retain only m principal components, then Y = BTX′ where Y is an m × 1 vector, B is
a k × m matrix (consisting of the m unit eigenvectors corresponding to the m largest
eigenvalues) and X′ is the k × 1 vector of standardized scores as before. The interesting
thing is that if Y is known we can calculate estimates for the standardized values of X using
the fact that X′ ≈ BBTX′ = B(BTX′) = BY (since the columns of B are orthonormal, BBT
approximates the identity matrix, and equals it when all k components are retained).
From X′ it is then easy to calculate X.

Figure 10 – Estimate of original scores using reduced model


In Figure 10 we show how this is done using the four principal components that we
calculated from the first sample in Figure 6. B (range AN74:AQ82) is the reduced set of
coefficients (Figure 9), Y (range AS74:AS77) contains the principal components as calculated
in Figure 6, X′ contains the estimated standardized values for the first sample (range
AU74:AU82) using the formula =MMULT(AN74:AQ82,AS74:AS77) and finally X contains the
estimated scores for the first sample (range AW74:AW82) using the formula
=AU74:AU82*TRANSPOSE(B127:J127)+TRANSPOSE(B126:J126).
As you can see the values for X in Figure 10 are similar, but not exactly the same as the
values for X in Figure 6, demonstrating both the effectiveness as well as the limitations of
the reduced principal component model (at least for this sample data).
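A short numpy sketch of this reconstruction (assuming B_m holds the m retained unit eigenvectors as columns, y holds the corresponding principal component scores for one observation, and means/stds are as in the earlier sketch):

import numpy as np

def reconstruct_scores(y, B_m, means, stds):
    # Approximate the original scores from the m retained principal components.
    x_std_est = B_m @ y                 # X' is approximately B Y, since B'B = I
    return x_std_est * stds + means     # undo the standardization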
Basic Concepts of Factor Analysis
In this model we again consider k independent variables x1, …, xk and observed data for
each of these variables. Our objective is to identify m factors y1, …, ym, preferably
with m ≤ k as small as possible, which explain the observed data more succinctly.
Definition 1: Let X = [xi] be a random k × 1 column vector where each xi represents an
observable trait, and let μ = [μi] be the k × 1 column vector of the population means.
Thus E[Xi] = μi. Let Y = [yi] be an m × 1 vector of unobserved common factors where m
≤ k. These factors play a role similar to the principal components in Principal Component
Analysis.
We next suppose that each xi can be represented as a linear combination of the factors as
follows:
xi = βi0 + βi1y1 + βi2y2 + ⋯ + βimym + εi
where the εi are the components which are not explained by the linear relationship. We
further assume that the mean of each εi is 0 and that the factors are independent with mean 0
and variance 1. We can consider the above equations to be a series of regression equations.
The coefficient βij is called the loading of the ith variable on the jth factor. The
term εi is called the specific factor for the ith variable. Let β = [βij] be
the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of
specific factors.
Define the communality of variable xi to be φi = βi1² + ⋯ + βim², and let ϕi = var(εi) and σi² =
var(xi).
Observation: Since μi = E[xi] = E[βi0 + βi1y1 + ⋯ + βimym + εi] = βi0 + βi1E[y1] + ⋯ + βimE[ym] + E[εi]
= βi0 + 0 + 0 = βi0, it follows that the intercept term βi0 = μi, and so the regression equations
can be expressed as
xi − μi = βi1y1 + ⋯ + βimym + εi
or equivalently
X − μ = βY + ε
From the assumptions stated above it also follows that:

E[xi] = μi for all i


E[εi] = 0 for all i (the specific factors are presumed to be random with mean 0)
cov(yi, yj) = 0 if i ≠ j
cov(εi, εj) = 0 if i ≠ j
cov(yi, εj) = 0 for all i, j
From Property A of Correlation Advanced and Property 3 of Basic Concepts of
Correlation, we get the following:
var(xi) = βi1² + ⋯ + βim² + ϕi = φi + ϕi
cov(xi, xj) = βi1βj1 + ⋯ + βimβjm for i ≠ j
cov(xi, yj) = βij
From these equivalences it follows that the population covariance matrix Σ for X has the
form
Σ = ββT + Φ
where Φ is the k × k diagonal matrix with ϕi in the ith position on the diagonal.
Observation: Let λ1 ≥ … ≥ λk be the eigenvalues of Σ with corresponding unit
eigenvectors γ1, …, γk, where each γj = [γij] is a k × 1 column vector. Now define
the k × k matrix β = [βij] such that βij = √λj·γij for all 1 ≤ i, j ≤ k.
As observed in Linear Algebra Background, all the eigenvalues of Σ are non-negative, and
so the square roots √λj, and hence the βij, are well defined (see Property 8 of Positive Definite
Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem),
it follows that
Σ = ββT
As usual, we will approximate the population covariance matrix Σ by the sample
covariance matrix S (for a given random sample). Using the above logic, it follows that
S = LLT
where λ1 ≥ … ≥ λk are the eigenvalues of S (a slight abuse of notation since these are not
the same as the eigenvalues of Σ) with corresponding unit eigenvectors C1, …, Ck and L =
[bij] is the k × k matrix such that bij = √λj·cij.
As we saw previously
Σ = ββT + Φ
or equivalently
var(xi) = σi² = φi + ϕi
i.e. the variance of each variable is the sum of its communality and its specific variance.
The sample versions of these (for a model with m retained factors) are
S ≈ LLT + Φ      sii ≈ bi1² + ⋯ + bim² + ϕi
where the loadings bij and specific variances ϕi are now estimated from the sample.
We have also seen previously that
cov(xi, yj) = βij
The sample version is therefore
cov(xi, yj) ≈ bij
and so when the data are standardized the loading bij estimates the correlation between
variable xi and factor yj. Similarly, in the standardized case S is the correlation matrix R, and so
Φ ≈ R − LLT
Factor Extraction
A number of methods are available to determine the factor loadings used for factor
analysis. We will start by explaining the principal component method. Another commonly
used method, the principal axis method, is presented in Principal Axis Method of Factor
Extraction.
Using the concepts that are described in Basic Concepts of Factor Analysis, we show how
to carry out factor analysis via the following example.
Example 1: Carry out the factor analysis for evaluating great teachers based on the data
in Example 1 of Principal Component Analysis.
As we saw in Example 1 of Principal Component Analysis, nine criteria are measured. Our
objective is to find a set of fewer than nine factors which reasonably captures what is a
great teacher. In fact we hope to find substantially fewer than nine factors that do the job.
Figure 1 shows the correlation matrix for this data (repeated from Figure 4 of Principal
Component Analysis).

Figure 1 – Correlation Matrix

Figure 2 shows the table of eigenvalues and eigenvectors for the correlation matrix
(repeated from Figure 5 of Principal Component Analysis) using the supplemental
function eVECTORS(B6:J14).

Figure 2 – Eigenvalues and eigenvectors

Using the formula bij = √λj·cij, where C1, …, Ck are the eigenvectors (range B19:J27 in Figure
2) corresponding to the eigenvalues λ1 ≥ ⋯ ≥ λk (range B18:J18 in Figure 2), we calculate
the loading factors for the nine common factors (see Figure 3).
Figure 3 – Loading factors (full model)

For example, the loading factor of the Passion variable on Factor 1 (cell B38) is given by
the formula =B26*SQRT(B$18). Figure 3 also contains the communalities (range
K31:K39). The communality of each variable represents the portion of that variable’s
variance captured by the model. For variable xi this is bi1² + ⋯ + bim². E.g., the communality of
the Passion variable (cell K38) is calculated via the formula =SUMSQ(B38:J38). Since we
are using the full model (where all nine common factors are present) and the variance of
each variable is 1 (remember we standardized the data), it is not surprising that column
K contains all ones.
In fact, if we had used the eigenvalues and eigenvectors exactly as calculated in Figure 2, we would
have seen communalities that are close to 1, but not exactly 1. To get the
communalities to come out to 1 we reran the eigenvector function as eVECTORS(B6:J14,
200), using 200 iterations (instead of the default of 100) to get a more accurate picture of
the eigenvalues and eigenvectors.
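Putting the loading and communality formulas together, here is a minimal numpy sketch of principal component extraction (R is assumed to be the correlation matrix; the signs of individual columns may differ from the figures since eigenvectors are only determined up to sign):

import numpy as np

def pc_loadings(R, m=None):
    # Loading matrix and communalities under principal component extraction.
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(-vals)
    vals, vecs = vals[order], vecs[:, order]
    if m is not None:                       # keep only the m largest eigenvalues
        vals, vecs = vals[:m], vecs[:, :m]
    L = vecs * np.sqrt(vals)                # b_ij = sqrt(lambda_j) * c_ij
    communalities = (L ** 2).sum(axis=1)    # sum of squared loadings in each row
    return L, communalities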
Determining the Number of Factors
As mentioned previously, one of the main objectives of factor analysis is to reduce the
number of parameters. The number of parameters in the original model is equal to the
number of unique elements in the covariance matrix. Given symmetry, there are
k(k+1)/2 such elements. The factor analysis model requires k(m+1) elements; i.e. the
number of parameters in L (namely km) plus the number of specific variances (namely k)
in the model X = μ + LY + ε.
Thus, we desire a value for m such that k(m+1) ≤ k(k+1)/2, i.e. m ≤ (k–1)/2. For Example
1 of Factor Extraction, we are looking for m ≤ (k–1)/2 = (9–1)/2 = 4. Our preference is to
use fewer than 4 factors if possible.
In general, the factors which have a high eigenvalue should be retained, while those with
a low eigenvalue should be eliminated, but what is high and what is low? The general
approach (Kaiser) is to retain factors with eigenvalue ≥ 1 and eliminate factors with
eigenvalue < 1. This may be appropriate for smaller models, but it may be too restrictive
for models with lots of variables.

Another approach is to create a scree plot (Cattell), i.e. a graph of the eigenvalues (y-
axis) of all the factors (x-axis) where the factors are listed in decreasing order of their
eigenvalues (as we did in principal component analysis). The heuristic is to retain all the
factors above (i.e. to the left of) the inflection point (i.e. the point where the curve starts
to level off) and eliminate any factor below (i.e. to the right of) the inflection point. Since
the curve isn’t necessarily smooth there can be multiple inflection points, and so the actual
cutoff point can be subjective.
The scree plot for Example 1 of Factor Analysis Example is shown in Figure 1. The plot
seems to have two inflection points: one at eigenvalue 2 and the other at eigenvalue 5. For
our purposes we choose to keep the factors corresponding to eigenvalues to the left of
eigenvalue 5, i.e. the 4 largest eigenvalues. These four eigenvalues account for 72.3% of
the variance.

Figure 1 – Scree Plot


Figure 2 contains the table of loading factors (from Figure 3 of Factor Extraction) restricted
to only the four highest common factors. Since all the Factor 1 loadings except the one for
Expect are negative, we first decide to negate all the loading factors for Factor 1. This is
not a problem since the negative of a unit eigenvector is also a unit eigenvector.

Figure 2 – Loading factors and communalities for 4 factors

In addition we recalculate the communalities for each of the variables (in column F). We
can think of a communality as something like R2 from regression analysis: the communality
of each variable is the value of R2 that would result from regressing that variable on the
four factors. The sum of the communalities is 6.60747, which represents the total
variance (out of 9) captured by the model (i.e. 72.3%). The
communalities for each of the variables range from 50.2% for Passion to 92.2% for
Expertise. Note that 72.3% of total variance is the same percentage that we saw in Figure
1, found by dividing the sum of the eigenvalues for the highest four factors by the total
variance.
In general we would like to see that the communalities for each variable are at least .5.
Variables with communalities less than .5 should be considered for removal and the
analysis rerun.

Since the variance of each variable is 1, the specific variance is simply 1 minus the communality,
i.e. ϕi = 1 − φi, as summarized in Figure 3. The communalities are the variances
captured by the model and the specific variances are the error variances.
Figure 3 – Communalities and specific variances

As we did in Figure 9 of Principal Component Analysis, we highlight all the loading factors
whose absolute value is greater than .4 (see Figure 2). We see that Entertainment,
Communications, Motivation, Charisma and Passion are highly correlated with Factor 1,
Motivation and Caring are highly correlated with Factor 3 and Expertise is highly
correlated with Factor 4. Also Expectation is highly positively correlated with Factor 2
while Friendly is negatively correlated with Factor 2.
Ideally we would like to see that each variable is highly correlated with only one factor. As
we can see from Figure 2, this is the case in our example, except that Motivation is
correlated with both Factor 1 and 3. We will attempt to clarify the analysis by means of a
rotation, as in Rotation.
Rotation
Let U be any m × m orthogonal matrix, and so by definition UTU = I.
Let L′ = LUT and Y′ = UY. Then L′ is a (k × m) × (m × m) = k × m matrix and Y′ is a
(m × m) × (m × 1) = m × 1 column vector. Also
X = μ + LY + ε = μ + LUTUY + ε = μ + L′Y′ + ε
E[Y′] = E[UY] = U E[Y] = U0 = 0
var(Y′) = var(UY) = U var(Y) UT = UIUT = UUT = I
cov(Y′, ε) = cov(UY, ε) = U cov(Y, ε) = U0 = 0
This shows that if L and Y satisfy the model, then so do L′ and Y′. Since there are an
infinite number of orthogonal matrices U, there are an infinite number of alternative
models.
A rotation of the original axes is determined by an orthogonal matrix U with det U = 1
(Property 6 of Orthogonal Vectors and Matrices). Thus, replacing L and Y by L′ and Y′ is
equivalent to rotating the axes. This won’t change the overall variance explained by the
model (i.e. the communalities), but it will change the distribution of the variance among the
factors.
We seek an m × m rotation matrix U = [uij] such that the rows represent the existing
factors and the columns represent the new factors. The most popular rotation approach
is called Varimax, which maximizes the differences between the loading factors while
maintaining orthogonal axes. Varimax attempts to maximize the value of V, the sum over
the m factors of the variance (taken across the k variables) of the squared loadings:
V = Σj [ (Σi bij⁴)/k − ((Σi bij²)/k)² ]
where the bij denote the rotated loadings, the inner sums run over i = 1, …, k and the
outer sum over j = 1, …, m.
There are also non-orthogonal rotations which do a better job of differentiating the
factors, but at the cost of a loss of orthogonality.
We can carry out the Varimax orthogonal rotation in standard Excel as described
in Varimax. Because the calculation is complicated and time consuming, we suggest that
you use the following supplemental array function.
Real Statistics Functions: The Real Statistics Resource Pack provides the following
supplemental function where R1 is a k × m range in Excel.
VARIMAX(R1): Produces a k × m array containing the loading factor matrix after
applying a Varimax rotation to the loading factor matrix contained in range R1.
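For readers outside Excel, the following is a minimal numpy sketch of a Varimax rotation (the common SVD-based formulation of Kaiser's algorithm; it is not the Real Statistics implementation, but it performs the same kind of rotation on a k × m loading matrix L):

import numpy as np

def varimax(L, iterations=100, prec=1e-5):
    # Rotate a k x m loading matrix with the Varimax criterion.
    k, m = L.shape
    T = np.eye(m)                  # accumulated rotation; plays the role of U transpose
    d_old = 0.0
    for _ in range(iterations):
        Lr = L @ T
        col_ss = (Lr ** 2).sum(axis=0)                     # column sums of squared loadings
        u, s, vt = np.linalg.svd(L.T @ (Lr ** 3 - Lr @ np.diag(col_ss) / k))
        T = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1 + prec):                     # stop when the criterion stops improving
            break
        d_old = d_new
    return L @ T, T                # rotated loadings L' = L @ T and the rotation applied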
Referring to Figure 2 of Determining the Number of Factors, we now use
VARIMAX(B44:E52) to obtain the rotated matrix for Example 1 of Factor Extraction as
shown in Figure 1.
Figure 1 – Loading factors after Varimax rotation
We now see that each of the variables (including Motivation), with the single exception
of Entertainment, correlates highly with only one factor. Also note that the communalities
(column M) are the same as those shown in Figure 2 of Determining the Number of
Factors prior to rotation.
We can also calculate the rotation matrix U that transforms the matrix in Figure 2
of Determining the Number of Factors into that in Figure 1. We do this by using Gaussian
elimination (see Determinants and Linear Equations). In order to avoid the time
consuming steps required in standard Excel, we first create a copy of the original loading
factors (from Figure 2 of Determining the Number of Factors) and put a copy of the
rotated loading factors (from Figure 1) right next to it as shown in Figure 2.

Figure 2 – Preparation for Gaussian elimination


We next apply the supplemental Excel function =ELIM(A57:H65) to get the result shown
in Figure 3.
Figure 3 – Rotation Matrix
The 4 × 4 rotation matrix U is now found in the upper right portion (range E67:H70) of
Figure 3. Note too that U is an orthogonal matrix (i.e. UTU = I) and det(U) = 1.
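Since L′ = LUT, the rotation matrix can also be recovered by least squares instead of Gaussian elimination; a minimal numpy sketch (L and L_rot are assumed to hold the unrotated and rotated loading matrices):

import numpy as np

def recover_rotation(L, L_rot):
    # Solve L @ Ut = L_rot for Ut in the least squares sense; then U = Ut.T.
    Ut, residuals, rank, sv = np.linalg.lstsq(L, L_rot, rcond=None)
    U = Ut.T
    # U should be (approximately) orthogonal with determinant 1
    print(np.allclose(U.T @ U, np.eye(U.shape[0]), atol=1e-6), np.linalg.det(U))
    return U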
Factor Scores
Elsewhere we have shown how to calculate the loading factors L, but we still need to find the
values of the factors, namely Y, which correspond to the values of the explicit variables X.
We show three methods of calculating the factor scores.

Regression Method
If we look at Definition 1 of Basic Concepts of Factor Analysis, we recall that the factor
analysis model is based on the equations:
X − μ = βY + ε
Using the sample instead of the population we have
X − X̄ = LY + E
where X̄ is the vector of sample means and E is the vector of error (residual) terms.
We find the values of the factors using the method of least squares employed in multiple
regression (see Least Square Method of Multiple Regression). In particular, our goal is to
find the value of Y which minimizes ||E|| based on the values in the sample for the explicit
variables X.
The least squares solution (Property 1 of Least Square Method of Multiple Regression) is
Y = (LTL)-1LT(X − X̄)
Note that since this regression doesn’t have a constant term, we don’t need to add a
column of 1’s to L as we did in Property 1 of Least Square Method of Multiple Regression.
Now LTL = D where D is the diagonal matrix whose main diagonal consists of the
eigenvalues λ1, …, λm of S corresponding to the retained factors. Thus (LTL)-1 is the diagonal
matrix whose main diagonal consists of 1/λ1, ⋯, 1/λm.
We define the factor score matrix to be the m × k matrix F = (LTL)-1LT = [fij] where
fij = cji/√λi
and where C1, …, Cm are the orthonormal eigenvectors corresponding to the eigenvalues λ1,
…, λm.
Recall that L = [bij] is the k × m matrix such that bij = √λj·cij. Since CiTCj = 1 when i = j and
0 otherwise, it follows that the factor scores for the sample X satisfy
Y = F(X − X̄)
For example, the factor score matrix and factor scores for the first sample (see Figure 1 or
6 of Principal Component Analysis) for Example 1 of Factor Extraction is shown in Figure
1.
Figure 1 – Factor score matrix using least squares method
Here the factor score matrix (range BV6:BY14) is calculated by the formula
=B19:E27/SQRT(B18:E18) (referring to cells in Figure 2 of Factor Extraction), the sample
scores X (range CA6:CA14) are as in Figure 1 or 6 of Principal Component Analysis, X′
(CC6:CC14) consists of the values in X less the means of each of the variables and is
calculated by the formula =CA6:CA14-TRANSPOSE(B128:J128) (referring to Figure 2
of Principal Component Analysis). Finally, the factor scores Y corresponding to the scores
in X (range CE6:CE9) are calculated by the formula
=MMULT(TRANSPOSE(BV6:BY14),CC6:CC14)

Actually since we reversed the sign of the loadings for factor 1, we need to reverse the sign
for the factor scores for factor 1 (i.e. column BV). This results in a change of sign for factor
1 (i.e. CE6). The result is shown in Figure 2.

Figure 2 – Revised factor score matrix


In a similar fashion we can calculate the factor scores for the entire sample (see Figure 2
of Principal Component Analysis). The result for the first 10 sample items is shown in
Figure 3. Note that we are now showing the X as row vectors (instead of column vectors
as was employed in Figure 2), and so the factor scores are calculated by
Y = (X − X̄)FT
where the rows of X now contain the individual samples.
Figure 3 – Factor scores of the sample using least squares


Here the factor scores for the entire sample are given in range CH19:CK38, and are
calculated by the formula =MMULT(B4:J123-B126:J126,BV19:BY27), referring to cells in
Figure 1 of Principal Component Analysis and Figure 2.
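A minimal numpy sketch of the regression (least squares) factor scores, assuming X is the n × k matrix of raw scores and L the k × m loading matrix:

import numpy as np

def regression_factor_scores(X, L):
    # Least squares factor scores: each row of the result holds (L'L)^-1 L' (x - mean).
    Xc = X - X.mean(axis=0)                  # center each variable
    F = np.linalg.inv(L.T @ L) @ L.T         # m x k factor score matrix
    return Xc @ F.T                          # n x m matrix of factor scores

# With principal component loadings, L'L is the diagonal matrix of retained
# eigenvalues, so the inverse is trivial, as noted above.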
Bartlett’s Method
Bartlett’s method of creating factor scores is similar to the least squares method except
that now the reciprocals of the specific variances are used as weighting factors. This gives
more weight to variables with high communality (and therefore low specific variance).
As before, we seek a Y such that
X − X̄ = LY + E
But this time instead of minimizing
ETE
we try to minimize
ETV-1E
where V is the diagonal matrix whose main diagonal consists of the specific variances.
This produces factor scores satisfying
Y = (LTV-1L)-1LTV-1(X − X̄)
For Bartlett’s method we therefore define the factor score matrix to be the m × k matrix
F = (LTV-1L)-1LTV-1
and so the factor scores for a sample X are given by
Y = F(X − X̄)
For Example 1 of Factor Extraction the factor score matrix and calculation for the first
sample using Bartlett’s method is shown in Figure 4.
Figure 4 – Factor scores using Bartlett’s method
Here LTV-1L (range CN18:CQ21) is calculated by the array formula
=MMULT(TRANSPOSE(B44:E52),MMULT(MINVERSE(DIAGONAL(Q44:Q52)),B44:E52))
The factor score matrix (range CN26:CQ34) is calculated by the formula
=TRANSPOSE(MMULT(MINVERSE(CN18:CQ21),MMULT(TRANSPOSE(B44:E52),MINVERSE(DIAGONAL(Q44:Q52)))))

The rest of the figure is calculated as in Figure 2. In a similar fashion we can calculate the
factor scores for the entire sample (see Figure 2 of Principal Component Analysis). The
result for the first 10 sample items is shown in Figure 5 (note that we are now showing
the X as row vectors instead of column vectors as was employed in Figure 4).
Figure 5 – Factor scores using Bartlett’s method
Here the factor scores for the entire sample are given in range CZ19:DC38, and are calculated
by the formula =MMULT(B4:J123-B126:J126,CN26:CQ34), referring to cells in Figure 1
of Principal Component Analysis and Figure 4.
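A numpy sketch of Bartlett's factor scores, under the same assumptions as the previous sketch plus a vector specific_var holding the specific variances (the main diagonal of V):

import numpy as np

def bartlett_factor_scores(X, L, specific_var):
    # Weighted least squares factor scores: (L' V^-1 L)^-1 L' V^-1 (x - mean).
    Xc = X - X.mean(axis=0)
    Vinv = np.diag(1.0 / specific_var)                 # V is diagonal, so V^-1 is too
    F = np.linalg.inv(L.T @ Vinv @ L) @ L.T @ Vinv     # m x k factor score matrix
    return Xc @ F.T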
Anderson-Rubin’s Method
In this method the factor scores are constructed so that they are uncorrelated. The method
produces factor scores satisfying
Y = (LTV-1RV-1L)-1/2LTV-1(X − X̄)
where R is the correlation matrix and V is the diagonal matrix of specific variances as
before. We define the factor score matrix to be the m × k matrix
F = (LTV-1RV-1L)-1/2LTV-1
and so the factor scores for a sample X are given by
Y = F(X − X̄)
To calculate the factor matrix for Example 1 of Factor Extraction using Anderson-Rubin’s
method, we first find the matrices shown in Figure 6.

Figure 6 – Preliminary calculations for factor matrix using Anderson-Rubin


All these matrices are calculated using standard Excel matrix functions as we have seen
previously (see for example Figure 4), with the exception of (LTV-1RV-1L)-1/2.
This requires finding the square root of a positive semidefinite matrix as described
in Positive Definite Matrices. To avoid carrying out the complicated calculations, the
following array function is provided in the Real Statistics Resource Pack:
Real Statistics Functions: The Real Statistics Resource Pack provides the following
supplemental array function, where R1 is a k × k range in Excel.
MSQRT(R1): Produces a k × k array which is the square root of the matrix represented
by range R1.
Thus (LTV-1RV-1L)-1/2 (range DR12:DU15) is calculated using the formula
=MINVERSE(MSQRT(DF17:DI20))

The factor score matrix and calculation for the first sample using Anderson-Rubin’s
method is shown in Figure 7.

Figure 7 – Factor scores using Anderson-Rubin’s method


The factor score matrix (range DF26:DI34) is calculated using the formula

=TRANSPOSE(MMULT(DR12:DU15,DR5:DZ8))
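A numpy/scipy sketch of the Anderson-Rubin factor scores described above (R is the correlation matrix; the matrix square root is taken with scipy.linalg.sqrtm):

import numpy as np
from scipy.linalg import sqrtm

def anderson_rubin_factor_scores(X, L, specific_var, R):
    # Uncorrelated factor scores: (L'V^-1 R V^-1 L)^(-1/2) L'V^-1 (x - mean).
    Xc = X - X.mean(axis=0)
    Vinv = np.diag(1.0 / specific_var)
    M = L.T @ Vinv @ R @ Vinv @ L                       # m x m positive definite matrix
    F = np.linalg.inv(np.real(sqrtm(M))) @ L.T @ Vinv   # m x k factor score matrix
    return Xc @ F.T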

The factor scores for the first 10 sample items are shown in Figure 8 (note that as before we
are now showing the X as row vectors instead of column vectors as was employed in
Figure 7).
Figure 8 – Factor scores using Anderson-Rubin’s method
The factor scores (using any of the methods described above) can now be used as the data
for subsequent analyses. In some sense they provide similar information as that given in
the original sample (Figure 1 of Principal Component Analysis), but with a reduced
number of variables (as was our original intention).
Note that exploratory factor analysis does not require that the data be multivariate
normally distributed, but many of the analyses that will be done using the reduced factors
(and factor scores) will require multivariate normality.
Validity of Correlation Matrix and Sample Size
Factor analysis doesn’t make sense when there is either too much or too little correlation
between the variables. When reducing the number of dimensions we are leveraging the
inter-correlations. E.g. if we believe that three variables are correlated to some hidden
factor, then these three variables will be correlated to each other. You can test the
significance of the correlations, but with such a large sample size, even small correlations
will be significant, and so a rule of thumb is to consider eliminating any variable which
has many correlations less than 0.3.

We can calculate the Reproduced Correlation Matrix, which is the correlation matrix
reproduced from the reduced loading factor matrix, namely LLT.

Figure 1 – Reproduced Correlation Matrix

Referring to Figure 2 of Determining the Number of Factors, the reproduced correlation
matrix in Figure 1 is calculated by the array formula
=MMULT(B44:E52,TRANSPOSE(B44:E52))

By comparing the reproduced correlation matrix in Figure 1 to the correlation matrix in


Figure 1 of Factor Extraction, we can get an indication of how good the reduced model is.
This is what we will do next.
We can also look at the error terms, which as we observed previously, are given by the
formula
E = R − LLT      i.e.      eij = rij − (bi1bj1 + ⋯ + bimbjm)
Our expectation is that cov(ei, ej) ≈ cov(εi, εj) = 0 for all i ≠ j. If too many of these
covariances are large (say > .05) then this would be an indication that our model is not as
good as we would like.
The error matrix, i.e. R – LLT, for Example 1 of Factor Extraction is calculated by the array
formula,
=B6:J14-MMULT(B44:E52,TRANSPOSE(B44:E52))

(referring to cells shown in Figure 1 of Factor Extraction and Figure 2 of Determining


the Number of Factors) and is shown in Figure 2.
Note that the main diagonal of this table consists of the specific variances (see Figure 3
of Determining the Number of Factors), as we should expect. There are quite a few entries
off the diagonal which look to be significantly different from zero. This should cause us
some concern, perhaps indicating that our sample is too small. One other thing worth
noting is that the same error matrix will be produced if we use the original loading factors
(from Figure 2 of Determining the Number of Factors) or the loading factors after
Varimax rotation (Figure 1 of Rotation).

Figure 2 – Error Matrix

Note too that if overall the variables don’t correlate, signifying that the variables are
independent of one another (and so there aren’t related clusters which will correlate with
a hidden factor), then the correlation matrix would be approximately an identity matrix.
We can test whether a population correlation matrix is approximately an identity matrix
(this is Bartlett’s Test), which can be carried out using Box’s test.
For Example 1 of Factor Extraction, we get the results shown in Figure 3.

Figure 3 – Bartlett’s Test

We first fill in the range L5:M6. Here cell L5 points to the upper left corner of the
correlation matrix (i.e. cell B6 of Figure 1 of Factor Extraction) and cell L6 points to a 9
× 9 identity matrix. The value 120 in cells M5 and M6 refers to the sample size. We next
highlight the 5 × 1 range M8:M12, enter the array formula BOX(L5:M6) and then press
Ctrl-Shift-Enter.
Since p-value < α = .001, we conclude there is a significant difference between the
correlation matrix and the identity matrix.
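One common form of this test is Bartlett's test of sphericity; the sketch below (in Python, using scipy for the p-value) computes the usual chi-square statistic and may differ in detail from the BOX function used above.

import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    # Test whether the population correlation matrix is an identity matrix.
    k = R.shape[0]
    stat = -(n - 1 - (2 * k + 5) / 6.0) * np.log(np.linalg.det(R))
    df = k * (k - 1) / 2
    p_value = chi2.sf(stat, df)
    return stat, df, p_value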
Of course, even if Bartlett’s test shows that the correlation matrix isn’t approximately an
identity matrix, it is possible, especially with a large number of variables and a large
sample, for there to be some variables that don’t correlate very well with the other variables.
We can use the Partial Correlation Matrix and the Kaiser-Meyer-Olkin (KMO) measure of
sample adequacy (MSA) for this purpose, described as follows.
It is not desirable to have two variables which share variance with each other but not with
other variables. As described in Multiple Correlation this can be measured by the partial
correlation between these two variables. To calculate the partial correlation matrix for
Example 1 of Factor Extraction, first we find the inverse of the correlation matrix, as
shown in Figure 4.

Figure 4 – Inverse of the correlation matrix

Range B6:J14 is a copy of the correlation matrix from Figure 1 of Factor Extraction (onto
a different worksheet). Range B20:J28 is the inverse, as calculated by
=MINVERSE(B6:J14). We have also shown the square root of the diagonal of this matrix
in range L20:L28 as calculated by =SQRT(DIAG(B20:J28)), using the DIAG
supplemental array function. The partial correlation matrix is now shown in range
B33:J41 of Figure 5.
Figure 5 – Partial correlation matrix

The partial correlation between variables xi and xj, where i ≠ j, keeping all the
other variables constant is given by the formula
r(xi, xj | Z) = −pij/√(pii·pjj)
where Z = the list of variables x1, …, xk excluding xi and xj, and the inverse of the
correlation matrix is R-1 = [pij]. Thus the partial correlation matrix shown in Figure 5 can
be calculated using the array formula
=-B20:J28/MMULT(L20:L28,TRANSPOSE(L20:L28))

Since this formula results in a matrix whose main diagonal consists of minus ones, we use
the slightly modified form to keep the main diagonal all ones:

=-B20:J28/MMULT(L20:L28,TRANSPOSE(L20:L28))+2*IDENTITY()
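The same partial correlation matrix can be computed directly from the inverse of the correlation matrix; a minimal numpy sketch:

import numpy as np

def partial_correlation_matrix(R):
    # u_ij = -p_ij / sqrt(p_ii * p_jj), where P = R^-1; ones are put back on the diagonal.
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    U = -P / np.outer(d, d)
    np.fill_diagonal(U, 1.0)
    return U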

The Kaiser-Meyer-Olkin (KMO) measure of sample adequacy (MSA) for variable xj is
given by the formula
KMOj = (Σi≠j rij²) / (Σi≠j rij² + Σi≠j uij²)
where the correlation matrix is R = [rij] and the partial correlation matrix is U = [uij]. The
overall KMO measure of sample adequacy is given by the above formula with the sums
taken over all combinations of i and j with i ≠ j.
KMO takes values between 0 and 1. A value near 0 indicates that the sum of the partial
correlations is large compared to the sum of the correlations, i.e. the correlations are
widespread and do not cluster among a few variables, which is a problem for factor
analysis. On the contrary, a value near 1 indicates a good fit for factor analysis.

For Example 1 of Factor Extraction, values of KMO are given in Figure 6.

Figure 6 – KMO measures of sample adequacy


E.g. the KMO measure of adequacy for Entertainment, KMO2 (cell C46), is calculated by
the formula =C15/(C15+C42) where C15 contains the formula =SUMSQ(C6:C14)-1 (one is
subtracted since we are only interested in the correlations of Entertainment with the other
variables, not with itself) and C42 contains the formula =SUMSQ(C33:C41)-1.
Similarly the overall KMO (cell K46) is calculated by the formula =K15/(K15+K42), where
K15 contains the formula =SUM(B15:J15) and K42 contains the formula
=SUM(B42:J42).
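A numpy sketch of the per-variable and overall KMO measures, computed directly from the correlation matrix (this mirrors the spreadsheet formulas above: sums of squared correlations versus sums of squared partial correlations, with the diagonal excluded):

import numpy as np

def kmo(R):
    # Kaiser-Meyer-Olkin measures of sample adequacy.
    k = R.shape[0]
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    U = -P / np.outer(d, d)                 # partial correlation matrix
    np.fill_diagonal(U, 1.0)
    r2 = R ** 2 - np.eye(k)                 # squared correlations, diagonal removed
    u2 = U ** 2 - np.eye(k)                 # squared partial correlations, diagonal removed
    kmo_per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + u2.sum(axis=0))
    kmo_overall = r2.sum() / (r2.sum() + u2.sum())
    return kmo_per_variable, kmo_overall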
The general rules for interpreting the KMO measures are given in the following table

Figure 7 – Interpretations of KMO measure


As can be seen from Figure 6, the Expectation, Expertise and Friendly variables all have
KMO measures less than .5, and so are good candidates for removal. Such variables
should be removed one at a time and the KMO measures recalculated, since these measures
may change significantly after the removal of a variable.
It should be noted that the matrix all of whose non-diagonal entries are equal to the
corresponding entries in the Partial Correlation Matrix and whose main diagonal consists
of the KMO measures of the individual variables is known as the Anti-image
Correlation Matrix.
At the other extreme from testing correlations that are too low is the case where some
variables correlate too well with each other. In this case, the correlation matrix
approximates a singular matrix and the mathematical techniques we typically use break
down. A correlation coefficient between two variables of more than 0.8 is a cause for
concern. Even lower correlation coefficients can be a cause for concern since two variables
correlating at 0.9 might be less of a problem than three variables correlating at 0.6.

Multicollinearity can be detected by looking at det R, where R = the correlation matrix.
If R is singular then det R = 0. A simple heuristic is to make sure that det R >
0.00001. Haitovsky’s significance test provides a way of determining whether the
determinant of the matrix is effectively zero, namely define H as follows and use the fact
that H ~ χ2(m), where
H = [1 + (2k + 5)/6 − n]·ln(1 − det R)
and k = number of variables, n = total sample size and m = k(k – 1)/2.


Figure 8 carries out this test for Example 1 of Factor Extraction.
Figure 8 – Haitovsky’s Test
The result is not significant, and so we may assume that the correlation matrix is
invertible.

In addition to the KMO measures of sample adequacy, various guidelines have been
proposed to determine how big a sample is required to perform exploratory factor
analysis. Some have proposed that the sample size should be at least 10 times the number
of variables and some even recommend 20 times. For Example 1 of Factor Extraction, a
sample size of 120 observations for 9 variables yields a 13:1 ratio. A better indicator of
sample size is summarized in the following table:

Figure 9 – Sample size requirements


The table lists the sample size required based on the largest loading factor for each variable.
Thus if the largest loading factor for some variable is .45, this would indicate that a sample
of at least 150 is needed.
Per [St], a factor is reliable provided
 There are 3 or more variables with loadings of at least .8
 There are 4 or more variables with loadings of .6 or more
 There are 10 or more variables with loadings of .4 or more and the sample size is at
least 150
 Otherwise a sample of at least 300 is required
Principal Axis Method of Factor Extraction
The full principal component extraction model assumes that all the variance is common,
and so the communalities are all equal to 1 (i.e. there is no specific variance). It is only
when we reduce the number of factors that specific variance is introduced into the model.

In the principal axis factoring method, we instead make an initial estimate of the common
variance in which the communalities are less than 1. This initial estimate assumes that
the communality of each variable is equal to the squared multiple correlation coefficient of
that variable with respect to the other variables. The principal axis factoring method is
implemented by replacing the main diagonal of the correlation matrix (which consists of
all ones) by these initial estimates of the communalities. The principal component method
is then applied to this revised version of the correlation matrix, as described above.

In the principal axis method the following iterative approach is used:
R0 = R, the original correlation matrix
Rp+1 = Rp with the main diagonal of Rp replaced by the communalities Cp of Rp
This algorithm is repeated until a predefined maximum number of iterations is reached
or the communalities converge, i.e. there is too little difference
between Cp and Cp+1 (and therefore between Rp and Rp+1). The final version of Rp is then used
as in the principal component method of extraction.
We now discuss how to calculate the communalities Cp. To calculate the initial
communalities C0 for principal axis factoring we use the value of R2 between each variable
and all the other variables. For Example 1 of Factor Extraction, the initial communalities
are given in range V33:V41 of Figure 1.

Figure 1 – Initial Communalities


Referring to the sample data in Figure 1 of Factor Analysis Example, the communality for
the first variable (cell V33) can be computed by the formula =RSquare(B4:J123,U33), which
has the same value as RSquare(C4:J123,B4:B123), and similarly for the other eight
variables.
It turns out that the vector of initial communalities V33:V41 can also be computed by the
array formula
=1-1/DIAG(MINVERSE(M4:U12))
where M4:U12 is the correlation matrix (see Figure 3 of Factor Analysis Example).
For each p we show how to compute the communalities Cp+1 in the next example.
Example 1: Repeat the factor analysis on the data in Example 1 of Factor
Extraction using the principal axis factoring method.
We first calculate the correlation matrix and then the initial communalities as described above.
We next substitute the initial communalities into the main diagonal of the correlation
matrix and calculate the factor matrix as we did in the principal component method of
extraction. This is shown in Figure 2.

Figure 2 – Iteration #1
The revised correlation matrix R1 in range Y6:AG14 is equal to the original correlation
matrix with the entries in the main diagonal replaced by the communalities calculated in
the previous step (i.e. C0 in this case). We can calculate this matrix via the array
formula
=M4:U12-IDENTITY()+DIAGONAL(V33:V41)
where M4:U12 is the original correlation matrix R0 (Figure 3 of Factor Analysis Example)
and V33:V41 are the communalities C0 (from Figure 1).
The eigenvalues and eigenvectors in range Y18:AG28 are calculated by
=eVECTORS(Y6:AG14). The Factor Matrix in range Y33:AG41 is calculated as in
Principal Component extraction, except where the corresponding eigenvalues are not
positive. While this is not possible for Principal Component extraction, it is possible for
Principal Axis extraction. When an eigenvalue is non-positive (as is the case with the final
5 eigenvalues in Figure 2) the corresponding loading factors are set to zero. For example
the formula for calculating the first entry in the Factor Matrix (cell Y33) is
=IF(Y$19>0,Y20*SQRT(Y$19),0)

The new communalities C1 (range AH33:AH41) are now computed as in Principal
Component extraction. E.g. AH33 is computed by the formula =SUMSQ(Y33:AG33).
As explained above we next calculate C2 and R2 in the same manner, and continue in this
manner until a predetermined maximum number of iterations is reached (we will use p =
25 as the default maximum number of iterations) or until Cp and Cp+1 are sufficiently close.
For the latter we test whether the sum of the squares of the differences in the communalities
is less than some predetermined precision amount (we will use .00001 as the default).
For iteration #1 this metric is found in cell AH43 and is calculated by the formula
=SUMXMY2(AH33:AH41,V33:V41)
(referring to Figures 1 and 2).

It turns out that after 19 iterations the convergence goal of .00001 is reached, with a
difference between the communalities C18 and C19 of 8.81E-06. The values of the
communalities after the 19th iteration are given in range IP33:IP41 of Figure 3.
Figure 3 – Iteration #19
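The whole iterative procedure can be sketched in a few lines of numpy (R is the correlation matrix; the iteration limit and precision mirror the defaults stated above, and loadings for non-positive eigenvalues are dropped, as in Figure 2):

import numpy as np

def principal_axis_communalities(R, iterations=25, prec=1e-5):
    # Iteratively re-estimate the communalities as in principal axis factoring.
    comm = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # initial communalities C0
    for _ in range(iterations):
        Rp = R.copy()
        np.fill_diagonal(Rp, comm)                 # replace the main diagonal with Cp
        vals, vecs = np.linalg.eigh(Rp)
        order = np.argsort(-vals)
        vals, vecs = vals[order], vecs[:, order]
        pos = vals > 0                             # ignore non-positive eigenvalues
        L = vecs[:, pos] * np.sqrt(vals[pos])
        new_comm = (L ** 2).sum(axis=1)
        done = ((new_comm - comm) ** 2).sum() < prec
        comm = new_comm
        if done:
            break
    return comm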
The Real Statistics Resource Pack provides a supplemental array function which
automates the process of finding the converged values of the communalities, thus
avoiding the tedious calculations described above.

Real Statistics Function: If R1 is a k × k correlation matrix then


ExtractCommunalities(R1, iter, prec, eigen) = the 1 × k row vector with the
communalities after convergence based on a precision value of prec but with a maximum
of iter iterations. As described above we use .00001 as the default value
of prec and 25 as the default value of iter.
Since the eigenvalues and eigenvectors of the correlation matrix are calculated (using the
eVECTORS supplemental function) in each iteration, a fourth argument eigen can be
used to specify the number of iterations used to calculate these eigenvalues/vectors (with
a default of 100).
Once these values for the communalities are found, the Principal Axis extraction method
proceeds exactly as for the Principal Component extraction method, except that these
communalities are used instead of 1’s in the main diagonal of the correlation matrix. This
is illustrated in Real Statistics Support for Factor Analysis.
Real Statistics Support for Factor Analysis
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack contains
the Factor Analysis data analysis tool, which automates most of the Factor Analysis
capabilities described elsewhere in this website.
To access this data analysis tool, first press Ctrl-m and then choose
the Multivariate Analyses option from the resulting menu. From the dialog box that
appears select the Factor Analysis option and click on the OK button. The dialog box
in Figure 1 will then appear.

Figure 1 – Factor Analysis dialog box

If you click on the Help button the following dialog box will appear.

Figure 2 – Help for Factor Analysis data analysis tool


As seen in Figure 1, you are presented with a choice between using Principal
Component extraction or Principal Axis extraction. You can choose whether or not to
use Varimax rotation. You can also choose to specify the number of factors to use
in the model (# of Factors); if this field is left blank then the Kaiser criterion is used,
namely that all factors whose eigenvalue is 1 or greater are retained.
Principal Component Extraction
If you choose the Principal Component extraction option then the following output will
appear (all the data refers to Example 1 of Factor Extraction):

Figure 3 – Factor Analysis PCA Extraction – part 1


Figure 4 – Factor Analysis PCA Extraction – part 2
Figure 5 – Factor Analysis PCA Extraction – part 3

Figure 6 – Factor Analysis PCA Extraction – part 4


Figure 7 – Factor Analysis PCA Extraction – part 5

In order to display the rotated factor matrix shown in range B114:E122, the VARIMAX
supplemental array function is used. This function is provided in the Real Statistics
Resource Pack.

VARIMAX(R1, iter, prec) = the result of rotating the loading factor matrix defined by range R1
using the Varimax algorithm, where iter is the maximum number of iterations (default
100) and prec is the value that is considered to be sufficiently close to zero (default
0.00001).
In Figure 7, range B114:E122 contains the formula =VARIMAX(M100:P108).
Figure 8 – Factor Analysis PCA Extraction – part 6
Figure 9 – Factor Analysis PCA Extraction – part 7
Figure 10 – Factor Analysis PCA Extraction – part 8

Principal Axis Extraction


If you choose the Principal Axis extraction method then the output is similar to that
described above. In fact, the output starts out identically to that shown in Figures 3 and 4
(except that the title is Factor Analysis – Principal Axis Extraction).

As described in Principal Axis Extraction, the Real Statistics software next calculates the
initial communalities and revised communalities (using the ExtractCommunalities
supplemental function) as described in Figure 11.
Figure 11 – Factor Analysis PAF Extraction – part 3

From this point on the data analysis tool calculates its results exactly as in Principal
Component extraction except that the revised correlation matrix (range M96:104 in
Figure 11) is used as the correlation matrix.
Figure 12 – Factor Analysis PAF Extraction – part 4

Figure 13 – Factor Analysis PAF Extraction – part 5


Figure 14 – Factor Analysis PAF Extraction – part 6

Figure 15 – Factor Analysis PAF Extraction – part 7


Figure 16 – Factor Analysis PAF Extraction – part 8
Figure 17 – Factor Analysis PAF Extraction – part 9

Factor Analysis Example


Example 1: The school system of a major city wanted to determine the characteristics of
a great teacher, and so they asked 120 students to rate the importance of each of the
following 9 criteria using a Likert scale of 1 to 10 with 10 representing that a particular
characteristic is extremely important and 1 representing that the characteristic is not
important.
1. Setting high expectations for the students
2. Entertaining
3. Able to communicate effectively
4. Having expertise in their subject
5. Able to motivate
6. Caring
7. Charismatic
8. Having a passion for teaching
9. Friendly and easy-going
Figure 1 shows the entire 120 person sample and Figure 2 shows some descriptive
statistics about this sample.
Figure 1a – Sample (part 1)
Figure 1b – Sample (part 2)
Figure 1c – Sample (part 3)
Figure 1d – Sample (part 4)

Figure 2 – Descriptive Statistics


Figure 3 – Correlation Matrix
