
DISCRIMINANT

No analysis is done for any subfile group for which the number of non-empty
groups is less than 2 or the number of cases or sum of weights fails to exceed the
number of non-empty groups. An analysis may be stopped if no variables are
selected during variable selection or the eigenanalysis fails.

Notation
The following notation is used throughout this chapter unless otherwise stated:
g        Number of groups
p        Number of variables
q        Number of variables selected
X_ijk    Value of variable i for case k in group j
f_jk     Case weight for case k in group j
m_j      Number of cases in group j
n_j      Sum of case weights in group j
n        Total sum of weights

Basic Statistics
Mean

\bar{X}_{ij} = \frac{\sum_{k=1}^{m_j} f_{jk} X_{ijk}}{n_j} \qquad \text{(variable } i \text{ in group } j\text{)}

\bar{X}_{i\cdot} = \frac{\sum_{j=1}^{g} \sum_{k=1}^{m_j} f_{jk} X_{ijk}}{n} \qquad \text{(variable } i\text{)}


Variances

S_{ij}^2 = \frac{\sum_{k=1}^{m_j} f_{jk} X_{ijk}^2 - n_j \bar{X}_{ij}^2}{n_j - 1} \qquad \text{(variable } i \text{ in group } j\text{)}

S_{i\cdot}^2 = \frac{\sum_{j=1}^{g} \sum_{k=1}^{m_j} f_{jk} X_{ijk}^2 - n \bar{X}_{i\cdot}^2}{n - 1} \qquad \text{(variable } i\text{)}
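As a concrete illustration of the weighted mean and variance formulas, here is a minimal pure-Python sketch on hypothetical data (two groups, one variable; all names and values are illustrative, not SPSS code):

```python
# Sketch of the weighted group mean and variance formulas above,
# using a small hypothetical data set (2 groups, 1 variable).
cases = [  # (group j, weight f_jk, value X_ijk)
    (0, 1.0, 2.0), (0, 1.0, 4.0),
    (1, 2.0, 1.0), (1, 1.0, 3.0),
]

def group_stats(cases, j):
    """Weighted mean and variance of the variable within group j."""
    n_j = sum(f for (g, f, x) in cases if g == j)          # sum of weights
    mean = sum(f * x for (g, f, x) in cases if g == j) / n_j
    ss = sum(f * x * x for (g, f, x) in cases if g == j)   # sum of f * X^2
    var = (ss - n_j * mean ** 2) / (n_j - 1)
    return mean, var

mean0, var0 = group_stats(cases, 0)   # group 0: mean 3.0, variance 2.0
```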

Within-groups Sums of Squares and Cross-product Matrix (W)

w_{il} = \sum_{j=1}^{g} \sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \sum_{j=1}^{g} \frac{\left( \sum_{k=1}^{m_j} f_{jk} X_{ijk} \right)\left( \sum_{k=1}^{m_j} f_{jk} X_{ljk} \right)}{n_j} \qquad i, l = 1, \dots, p

Total Sums of Squares and Cross-product Matrix (T)

t_{il} = \sum_{j=1}^{g} \sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \frac{\left( \sum_{j=1}^{g} \sum_{k=1}^{m_j} f_{jk} X_{ijk} \right)\left( \sum_{j=1}^{g} \sum_{k=1}^{m_j} f_{jk} X_{ljk} \right)}{n}
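The w_il and t_il formulas can be checked on a toy data set. This sketch (hypothetical data, p = 1, so the matrices reduce to scalars; unit weights) verifies that the total sum of squares exceeds the within-groups sum by the between-groups component:

```python
# Tiny check of the W and T formulas above (hypothetical data).
cases = [(0, 1.0, 2.0), (0, 1.0, 4.0), (1, 1.0, 1.0), (1, 1.0, 3.0)]
groups = {0, 1}
n = sum(f for (_, f, _) in cases)

def w_scalar(cases):
    # within-groups SS: sum of f * x^2 minus one correction term per group
    total = sum(f * x * x for (_, f, x) in cases)
    for j in groups:
        s = sum(f * x for (g, f, x) in cases if g == j)
        nj = sum(f for (g, f, x) in cases if g == j)
        total -= s * s / nj
    return total

def t_scalar(cases):
    # total SS: a single overall correction term
    s = sum(f * x for (_, f, x) in cases)
    return sum(f * x * x for (_, f, x) in cases) - s * s / n

w11, t11 = w_scalar(cases), t_scalar(cases)
# t11 - w11 is the between-groups sum of squares, so t11 >= w11
```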

Within-groups Covariance Matrix

C = \frac{W}{n - g} \qquad n > g

Individual Group Covariance Matrices (C^{(j)})

c_{il}^{(j)} = \frac{\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \bar{X}_{ij} \bar{X}_{lj} n_j}{n_j - 1}

Within-groups Correlation Matrix (R)

r_{il} = \begin{cases} \dfrac{w_{il}}{\sqrt{w_{ii} w_{ll}}} & \text{if } w_{ii} w_{ll} > 0 \\ \text{SYSMIS} & \text{otherwise} \end{cases}

Total Covariance Matrix

T' = \frac{T}{n - 1}

Univariate F and Λ for Variable i

F_i = \frac{\left( t_{ii} - w_{ii} \right)(n - g)}{w_{ii}(g - 1)}

with g − 1 and n − g degrees of freedom

\Lambda_i = \frac{w_{ii}}{t_{ii}}

with 1, g − 1, and n − g degrees of freedom.
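A minimal sketch of the univariate statistics, using w_ii and t_ii values taken from a hypothetical 4-case, 2-group example:

```python
# Univariate F and Wilks' lambda for one variable, following the
# formulas above (hypothetical w_ii, t_ii from the SSCP matrices).
def univariate_F(w_ii, t_ii, n, g):
    return (t_ii - w_ii) * (n - g) / (w_ii * (g - 1))

def univariate_lambda(w_ii, t_ii):
    return w_ii / t_ii

F = univariate_F(4.0, 5.0, n=4, g=2)   # df: g - 1 = 1 and n - g = 2
lam = univariate_lambda(4.0, 5.0)
```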



Rules of Variable Selection


Both direct and stepwise variable entry are possible. Multiple inclusion levels may
also be specified.

Method = Direct
For direct variable selection, variables are considered for inclusion in the order in which they appear on the ANALYSIS = list. A variable is included in the analysis if, after its inclusion, no variable in the analysis would have a tolerance less than the specified tolerance limit (default = 0.001).

Stepwise Variable Selection


At each step, the following rules control variable selection:
• Eligible variables with higher inclusion levels are entered before eligible
variables with lower inclusion levels.

• The order of entry of eligible variables with the same even inclusion level
is determined by their order on the ANALYSIS = specification.

• The order of entry of eligible variables with the same odd level of
inclusion is determined by their value on the entry criterion. The variable
with the “best” value for the criterion statistic is entered first.

• When level-one processing is reached, prior to inclusion of any eligible
variables, all already-entered variables which have level-one inclusion
numbers are examined for removal. A variable is considered eligible for
removal if its F-to-remove is less than the F value for variable removal
or, if probability criteria are used, if the significance of its F-to-remove
exceeds the specified probability level. If more than one variable is
eligible for removal, the variable removed is the one that leaves the “best”
value of the criterion statistic for the remaining variables. Variable removal
continues until no more variables are eligible for removal. Sequential
entry of variables then proceeds as described previously, except that after
each step, variables with inclusion numbers of one are also considered for
exclusion as described before.

• A variable with a zero inclusion level is never entered, although some
statistics for it are printed.

Ineligibility for Inclusion


A variable with an odd inclusion number is considered ineligible for inclusion if:
• The tolerance of any variable in the analysis (including its own) drops
below the specified tolerance limit if it is entered, or

• Its F-to-enter is less than the F value required for a variable to enter, or

• If probability criteria are used, the significance level associated with its F-
to-enter exceeds the probability to enter.

A variable with an even inclusion number is ineligible for inclusion if the first
condition above is met.

Computations During Variable Selection


During variable selection, the matrix W is replaced at each step by a new matrix
W∗ using the symmetric sweep operator described by Dempster (1969). If the first
q variables have been included in the analysis, W may be partitioned as:

W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix}

where W_{11} is q \times q. At this stage, the matrix W^* is defined by

W^* = \begin{bmatrix} -W_{11}^{-1} & W_{11}^{-1} W_{12} \\ W_{21} W_{11}^{-1} & W_{22} - W_{21} W_{11}^{-1} W_{12} \end{bmatrix} = \begin{bmatrix} W_{11}^* & W_{12}^* \\ W_{21}^* & W_{22}^* \end{bmatrix}
In addition, when stepwise variable selection is used, T is replaced by the matrix T*, defined similarly.
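The sweep step can be sketched in a few lines. This is an illustrative pure-Python rendering of Dempster's symmetric sweep, not SPSS's implementation; sweeping the first q pivots of W yields the partitioned W* shown above (−W11⁻¹ in the top-left, the Schur complement in the bottom-right):

```python
# Symmetric sweep operator (Dempster, 1969), applied to one pivot.
def sweep(a, k):
    """Sweep symmetric matrix a (list of lists) on pivot k; returns a copy."""
    p = len(a)
    h = [row[:] for row in a]
    d = a[k][k]
    for i in range(p):
        for j in range(p):
            if i == k and j == k:
                h[i][j] = -1.0 / d                        # -W11^{-1} block
            elif i == k:
                h[i][j] = a[k][j] / d                     # W11^{-1} W12 row
            elif j == k:
                h[i][j] = a[i][k] / d                     # W21 W11^{-1} column
            else:
                h[i][j] = a[i][j] - a[i][k] * a[k][j] / d # Schur complement
    return h

W = [[2.0, 1.0], [1.0, 3.0]]
Ws = sweep(W, 0)   # q = 1 variable "in": [[-0.5, 0.5], [0.5, 2.5]]
```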

The following statistics are computed:

Tolerance

\mathrm{TOL}_i = \begin{cases} 0 & \text{if } w_{ii} = 0 \\ w_{ii}^* / w_{ii} & \text{if variable } i \text{ is not in the analysis and } w_{ii} \neq 0 \\ -1 / \left( w_{ii}^* w_{ii} \right) & \text{if variable } i \text{ is in the analysis and } w_{ii} \neq 0 \end{cases}

If a variable’s tolerance is less than or equal to the specified tolerance limit, or its
inclusion in the analysis would reduce the tolerance of another variable in the
equation to or below the limit, the following statistics are not computed for it or
any set including it.

F-to-Remove

F_i = \frac{\left( w_{ii}^* - t_{ii}^* \right)(n - q - g + 1)}{t_{ii}^* (g - 1)}

with degrees of freedom g − 1 and n − q − g + 1.

F-to-Enter

F_i = \frac{\left( t_{ii}^* - w_{ii}^* \right)(n - q - g)}{w_{ii}^* (g - 1)}

with degrees of freedom g − 1 and n − q − g.
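Both the tolerance and the F-to-enter depend only on diagonal elements of the original and swept matrices. A hedged sketch with hypothetical values:

```python
# Tolerance and F-to-enter from original (w_ii, t*_ii) and swept (w*_ii)
# diagonal elements; all input values are hypothetical.
def tolerance(w_ii, ws_ii, in_analysis):
    if w_ii == 0:
        return 0.0
    if not in_analysis:
        return ws_ii / w_ii          # ratio of residual to original SS
    return -1.0 / (ws_ii * w_ii)     # variable already in the analysis

def f_to_enter(ws_ii, ts_ii, n, q, g):
    return (ts_ii - ws_ii) * (n - q - g) / (ws_ii * (g - 1))

tol = tolerance(w_ii=4.0, ws_ii=1.0, in_analysis=False)    # 0.25
F_in = f_to_enter(ws_ii=1.0, ts_ii=2.0, n=20, q=1, g=2)    # 17.0
```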

Wilks’ Lambda for Testing the Equality of Group Means

\Lambda = \frac{\left| W_{11} \right|}{\left| T_{11} \right|}

with degrees of freedom q, g − 1, and n − g.



The Approximate F Test for Lambda (the “overall F”), also known as Rao’s R (Tatsuoka, 1971)

F = \frac{\left( 1 - \Lambda^{1/s} \right)\left( rs + 1 - qh/2 \right)}{\Lambda^{1/s}\, qh}

where

s = \begin{cases} \sqrt{\dfrac{q^2 h^2 - 4}{q^2 + h^2 - 5}} & \text{if } q^2 + h^2 \neq 5 \\ 1 & \text{otherwise} \end{cases}

r = n - 1 - (q + g)/2

h = g - 1

with degrees of freedom qh and rs + 1 − qh/2. The approximation is exact if q or h is 1 or 2.
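A sketch of the approximate F computation (`rao_R` is an illustrative name; the q² + h² = 5 branch avoids the division by zero in the formula for s):

```python
import math

# Rao's R: the approximate F for Wilks' lambda, following the formulas
# above (hypothetical inputs).
def rao_R(lam, q, g, n):
    h = g - 1
    if q * q + h * h != 5:
        s = math.sqrt((q * q * h * h - 4.0) / (q * q + h * h - 5.0))
    else:
        s = 1.0
    r = n - 1 - (q + g) / 2.0
    df1 = q * h
    df2 = r * s + 1 - q * h / 2.0
    F = (1 - lam ** (1 / s)) * df2 / (lam ** (1 / s) * df1)
    return F, df1, df2

# q = 1, h = 2 exercises the q^2 + h^2 = 5 branch (s = 1)
F, df1, df2 = rao_R(lam=0.5, q=1, g=3, n=30)
```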

Rao’s V (Lawley-Hotelling trace) (Rao, 1952; Morrison, 1976)

V = -(n - g) \sum_{i=1}^{q} \sum_{l=1}^{q} w_{il}^* \left( t_{il} - w_{il} \right)

When n − g is large, V, under the null hypothesis, is approximately distributed as χ² with q(g − 1) degrees of freedom. When an additional variable is entered, the change in V, if positive, has approximately a χ² distribution with g − 1 degrees of freedom.

The Squared Mahalanobis Distance (Morrison, 1976) between groups a and b

D_{ab}^2 = -(n - g) \sum_{i=1}^{q} \sum_{l=1}^{q} w_{il}^* \left( \bar{X}_{ia} - \bar{X}_{ib} \right)\left( \bar{X}_{la} - \bar{X}_{lb} \right)

The F value for Testing the Equality of Means of Groups a and b

F_{ab} = \frac{(n - q - g + 1)\, n_a n_b\, D_{ab}^2}{q (n - g)(n_a + n_b)}
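The pairwise F is a direct plug-in; a one-line sketch with hypothetical inputs:

```python
# F test for equality of the means of groups a and b from the squared
# Mahalanobis distance (all input values hypothetical).
def F_pair(D2, n, q, g, n_a, n_b):
    return (n - q - g + 1) * n_a * n_b * D2 / (q * (n - g) * (n_a + n_b))

F_ab = F_pair(D2=2.0, n=12, q=2, g=2, n_a=6, n_b=6)
```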

The Sum of Unexplained Variations (Dixon, 1973)

R = \sum_{a=1}^{g-1} \sum_{b=a+1}^{g} \frac{4}{4 + D_{ab}^2}

Classification Functions
Once a set of q variables has been selected, the classification functions (also known
as Fisher’s linear discriminant functions) can be computed using

b_{ij} = (n - g) \sum_{l=1}^{q} w_{il}^* \bar{X}_{lj} \qquad i = 1, 2, \dots, q; \; j = 1, 2, \dots, g

for the coefficients, and

a_j = \log p_j - \frac{1}{2} \sum_{i=1}^{q} b_{ij} \bar{X}_{ij} \qquad j = 1, 2, \dots, g

for the constants, where p_j is the prior probability of group j.
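A sketch of Fisher's classification-function coefficients for two variables, written with an explicit inverse of W in place of the swept elements w*_il (hypothetical data; `inv2` and all values are illustrative):

```python
import math

# Fisher classification function for one group, p = q = 2 variables.
def inv2(m):
    """Inverse of a 2x2 matrix given as a list of lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def classification_function(W, n, g, xbar_j, prior_j):
    Winv = inv2(W)
    # coefficients: (n - g) * W^{-1} applied to the group mean vector
    b = [(n - g) * sum(Winv[i][l] * xbar_j[l] for l in range(2))
         for i in range(2)]
    # constant: log prior minus half the coefficient-mean inner product
    a = math.log(prior_j) - 0.5 * sum(b[i] * xbar_j[i] for i in range(2))
    return b, a

W = [[4.0, 0.0], [0.0, 2.0]]
b, a = classification_function(W, n=12, g=2, xbar_j=[1.0, 2.0], prior_j=0.5)
```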

Canonical Discriminant Functions


The canonical discriminant function coefficients are determined by solving the
general eigenvalue problem

( T − W ) V = λWV

where V is the unscaled matrix of discriminant function coefficients and λ is a diagonal matrix of eigenvalues. The eigensystem is solved as follows:

The Cholesky decomposition

W = LU

is formed, where L is a lower triangular matrix, and U = L ′ .


The symmetric matrix L^{-1}(T - W)U^{-1} is formed and the system

\left( L^{-1}(T - W)U^{-1} - \lambda I \right)(UV) = 0

is solved using tridiagonalization and the QL method. The result is m eigenvalues, where m = min(q, g − 1), and corresponding orthonormal eigenvectors, UV. The eigenvectors of the original system are obtained as

V = U^{-1}(UV)
For each of the eigenvalues, which are ordered in descending magnitude, the
following statistics are calculated:

Percentage of Between-Groups Variance Accounted for

100 \lambda_k \Big/ \sum_{k=1}^{m} \lambda_k

Canonical Correlation

\sqrt{\lambda_k / \left( 1 + \lambda_k \right)}

Wilks’ Lambda
Testing the significance of all the discriminating functions after the first k:

\Lambda_k = \prod_{i=k+1}^{m} \frac{1}{1 + \lambda_i} \qquad k = 0, 1, \dots, m - 1

The significance level is based on

\chi^2 = -\left( n - \frac{q + g}{2} - 1 \right) \ln \Lambda_k

which is distributed as a χ² with (q − k)(g − k − 1) degrees of freedom.
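A sketch of the successive Wilks' lambda and its chi-square approximation, computed from hypothetical eigenvalues:

```python
import math

# Wilks' lambda after the first k functions, and the chi-square
# approximation, following the formulas above (hypothetical inputs).
def wilks_after_k(eigvals, k):
    lam = 1.0
    for lam_i in eigvals[k:]:
        lam *= 1.0 / (1.0 + lam_i)
    return lam

def chi2_stat(lam_k, n, q, g):
    return -(n - (q + g) / 2.0 - 1.0) * math.log(lam_k)

eig = [1.5, 0.25]                      # hypothetical eigenvalues
lam0 = wilks_after_k(eig, 0)           # test of all m functions
chi2 = chi2_stat(lam0, n=30, q=2, g=3)
```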

The Standardized Canonical Discriminant Coefficient Matrix D


The standardized canonical discriminant coefficient matrix D is computed as

D = S_{11} V

where

S = \operatorname{diag}\left( \sqrt{w_{11}}, \sqrt{w_{22}}, \dots, \sqrt{w_{pp}} \right)

S_{11} = partition containing the first q rows and columns of S

V = matrix of eigenvectors such that V' W_{11} V = I

The Correlations Between the Canonical Discriminant Functions and the Discriminating Variables

The correlations between the canonical discriminant functions and the discriminating variables are given by

R = S_{11}^{-1} W_{11} V

If some variables were not selected for inclusion in the analysis (q < p), the eigenvectors are implicitly extended with zeroes to include the nonselected variables in the correlation matrix. Variables for which w_{ii} = 0 are excluded from S and W for this calculation; p then represents the number of variables with nonzero within-groups variance.

The Unstandardized Coefficients


The unstandardized coefficients are calculated from the standardized ones using

B = \sqrt{n - g}\; S_{11}^{-1} D

The associated constants are

a_k = -\sum_{i=1}^{q} b_{ik} \bar{X}_{i\cdot}

The group centroids are the canonical discriminant functions evaluated at the group means:

\bar{f}_{kj} = a_k + \sum_{i=1}^{q} b_{ik} \bar{X}_{ij}

Tests For Equality Of Variance


Box’s M is used to test for equality of the group covariance matrices.

M = (n - g) \log \left| C' \right| - \sum_{j=1}^{g} \left( n_j - 1 \right) \log \left| C^{(j)} \right|

where

C' = pooled within-groups covariance matrix excluding groups with singular covariance matrices

C^{(j)} = covariance matrix for group j.

Determinants of C' and C^{(j)} are obtained from the Cholesky decomposition. If any diagonal element of the decomposition is less than 10^{-11}, the matrix is considered singular and excluded from the analysis.

\log \left| C^{(j)} \right| = 2 \sum_{i=1}^{p} \log l_{ii} - p \log \left( n_j - 1 \right)

where l_{ii} is the ith diagonal entry of L such that \left( n_j - 1 \right) C^{(j)} = L'L.

Similarly,
\log \left| C' \right| = 2 \sum_{i=1}^{p} \log l_{ii} - p \log \left( n' - g \right)

where

(n' - g)\, C' = L'L

n' = sum of weights of cases in all groups with nonsingular covariance matrices
The significance level is obtained from the F distribution with t1 and t2 degrees of
freedom using (Cooley and Lohnes, 1971):
F = \begin{cases} M / b & \text{if } e_2 > e_1^2 \\[1ex] \dfrac{t_2 M}{t_1 (b - M)} & \text{if } e_2 < e_1^2 \end{cases}

where

e_1 = \left( \sum_{j=1}^{g} \frac{1}{n_j - 1} - \frac{1}{n - g} \right) \frac{2p^2 + 3p - 1}{6(g - 1)(p + 1)}

e_2 = \left( \sum_{j=1}^{g} \frac{1}{\left( n_j - 1 \right)^2} - \frac{1}{(n - g)^2} \right) \frac{(p - 1)(p + 2)}{6(g - 1)}

t_1 = (g - 1)\, p (p + 1) / 2

t_2 = \left( t_1 + 2 \right) \big/ \left| e_2 - e_1^2 \right|

b = \begin{cases} \dfrac{t_1}{1 - e_1 - t_1 / t_2} & \text{if } e_2 > e_1^2 \\[1ex] \dfrac{t_2}{1 - e_2 + 2 / t_2} & \text{if } e_2 < e_1^2 \end{cases}

If e_1^2 − e_2 is zero, or much smaller than e_2, then t_2 cannot be computed, or cannot be computed accurately. If

e_2 + 0.0001 \left( e_2 - e_1^2 \right) = e_2

the program uses Bartlett's χ² statistic rather than the F statistic:

\chi^2 = M \left( 1 - e_1 \right)

with t_1 degrees of freedom.
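The scalar quantities feeding the F approximation can be sketched directly from the group sizes (hypothetical sizes, unit weights, p variables):

```python
# e1, e2, t1, t2 for Box's M, following the formulas above.
def box_f_params(nj_list, p, n):
    g = len(nj_list)
    e1 = (sum(1.0 / (nj - 1) for nj in nj_list) - 1.0 / (n - g)) \
        * (2 * p * p + 3 * p - 1) / (6.0 * (g - 1) * (p + 1))
    e2 = (sum(1.0 / (nj - 1) ** 2 for nj in nj_list) - 1.0 / (n - g) ** 2) \
        * (p - 1) * (p + 2) / (6.0 * (g - 1))
    t1 = (g - 1) * p * (p + 1) / 2.0
    t2 = (t1 + 2.0) / abs(e2 - e1 ** 2)
    return e1, e2, t1, t2

# two groups of 10 cases, two variables (hypothetical)
e1, e2, t1, t2 = box_f_params([10, 10], p=2, n=20)
```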
For testing the group covariance matrices of the canonical discriminant functions, the procedure is similar. The covariance matrices C^{(j)} and C' are replaced by D_j and D', where

D_j = B' C^{(j)} B

is the group covariance matrix of the discriminant functions.


The pooled covariance matrix in this case is an identity, so that

D' = \frac{(n - g) I_m - \sum_j \left( n_j - 1 \right) D_j}{n' - g}

where the summation is only over groups with singular D_j.

Classification
The basic procedure for classifying a case is as follows:
• If X is the 1× q vector of discriminating variables for the case, the 1× m
vector of canonical discriminant function values is
f = XB + a

• A chi-square distance from each centroid is computed

\chi_j^2 = \left( f - \bar{f}_j \right) D_j^{-1} \left( f - \bar{f}_j \right)'

where D_j is the covariance matrix of canonical discriminant functions for
group j and \bar{f}_j is the group centroid vector. If the case is a member of group j,
χ²_j has a χ² distribution with m degrees of freedom. P(X | G_j) is the
significance level of such a χ²_j.

• The classification, or posterior, probability P(G_j | X) is

P(G_j \mid X) = \frac{p_j \left| D_j \right|^{-1/2} e^{-\chi_j^2 / 2}}{\sum_{j=1}^{g} p_j \left| D_j \right|^{-1/2} e^{-\chi_j^2 / 2}}

where p_j is the prior probability for group j. A case is classified into the group
for which P(G_j | X) is highest.

The actual calculation of P(G_j | X) is

g_j = \log p_j - \frac{1}{2}\left( \log \left| D_j \right| + \chi_j^2 \right)

P(G_j \mid X) = \begin{cases} \dfrac{\exp\left( g_j - \max_j g_j \right)}{\sum_{j=1}^{g} \exp\left( g_j - \max_j g_j \right)} & \text{if } g_j - \max_j g_j > -46 \\[1ex] 0 & \text{otherwise} \end{cases}
If individual group covariances are not used in classification, the pooled within-
groups covariance matrix of the discriminant functions (an identity matrix) is
substituted for D j in the above calculation, resulting in considerable simplification.
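The g_j scoring with the max-subtraction and the exp(−46) cutoff is essentially a guarded softmax; a sketch with hypothetical log-priors, log-determinants, and chi-square distances:

```python
import math

# Guarded-softmax computation of the posterior probabilities, following
# the g_j scores above (all input values hypothetical).
def posteriors(log_priors, log_dets, chi2s):
    gvals = [lp - 0.5 * (ld + c2)
             for lp, ld, c2 in zip(log_priors, log_dets, chi2s)]
    gmax = max(gvals)
    # terms smaller than exp(-46) are treated as zero, as in the rule above
    expd = [math.exp(gj - gmax) if gj - gmax > -46 else 0.0 for gj in gvals]
    total = sum(expd)
    return [e / total for e in expd]

post = posteriors(log_priors=[math.log(0.5)] * 2,
                  log_dets=[0.0, 0.0],
                  chi2s=[1.0, 3.0])
```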

If any D_j is singular, a pseudo-inverse of the form

\begin{bmatrix} D_{j11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}

replaces D_j^{-1}, and |D_{j11}| replaces |D_j|. D_{j11} is a submatrix of D_j whose rows and columns correspond to functions not dependent on preceding functions. That is, function 1 will be excluded only if the rank of D_j is 0, function 2 will be excluded only if it is dependent on function 1, and so on. This choice of the pseudo-inverse is not optimal for the numerical stability of D_{j11}^{-1}, but it maximizes the discrimination power of the remaining functions.

Cross-Validation
The following notation is used in this section:

X_{jk} = \left( X_{1jk}, \dots, X_{qjk} \right)^T

M_j = sample mean of the jth group:

M_j = \frac{1}{n_j} \sum_{k=1}^{m_j} f_{jk} X_{jk}

M_{jk} = sample mean of the jth group excluding point X_{jk}:

M_{jk} = \frac{1}{n_j - f_{jk}} \sum_{\substack{l=1 \\ l \neq k}}^{m_j} f_{jl} X_{jl}

Σ = pooled sample covariance matrix
Σ_j = sample covariance matrix of the jth group
Σ_{jk} = pooled sample covariance matrix without point X_{jk}:

\Sigma_{jk}^{-1} = \frac{n - g - f_{jk}}{n - g} \left( \Sigma^{-1} + \frac{n_j\, \Sigma^{-1} \left( X_{jk} - M_j \right)\left( X_{jk} - M_j \right)^T \Sigma^{-1}}{\left( n_j - f_{jk} \right)(n - g) - n_j \left( X_{jk} - M_j \right)^T \Sigma^{-1} \left( X_{jk} - M_j \right)} \right)

d_0^2(a, b) = (a - b)^T \Sigma_{jk}^{-1} (a - b)

Cross-validation applies only to linear discriminant analysis (not quadratic). During cross-validation, SPSS loops over all cases in the data set. Each case, say X_{jk}, is extracted once and treated as test data. The remaining cases are treated as a new data set.

Here we compute d_0^2(X_{jk}, M_{jk}) and d_0^2(X_{jk}, M_i) for i = 1, \dots, g, i \neq j. If there is an i (i ≠ j) that satisfies

\log(P_i) - d_0^2(X_{jk}, M_i)/2 > \log(P_j) - d_0^2(X_{jk}, M_{jk})/2,

then the extracted point X_{jk} is misclassified. The estimate of the prediction error rate is the ratio of the sum of misclassified case weights to the sum of all case weights.

To reduce computation time, the linear discriminant method is used instead of the canonical discriminant method. The theoretical solution is exactly the same for both methods.
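A bare-bones sketch of the leave-one-out loop for one variable with unit weights and equal priors (hypothetical data; the formulas above downdate Σ rather than recomputing from scratch, which this sketch does for simplicity):

```python
# Leave-one-out error rate for a 1-variable, 2-group linear rule.
cases = [(0, 1.0), (0, 2.0), (1, 5.0), (1, 6.0)]   # (group, x), hypothetical

def loo_error_rate(cases):
    mis = 0.0
    for k, (grp, x) in enumerate(cases):
        rest = [c for i, c in enumerate(cases) if i != k]  # hold one case out
        means = {}
        for j in (0, 1):
            xs = [xv for (gv, xv) in rest if gv == j]
            means[j] = sum(xs) / len(xs)
        # pooled variance from the remaining cases (n - g in the denominator)
        s2 = sum((xv - means[gv]) ** 2 for (gv, xv) in rest) / (len(rest) - 2)
        # equal priors: classify by the smaller squared distance d0^2
        j_hat = min((0, 1), key=lambda j: (x - means[j]) ** 2 / s2)
        mis += (j_hat != grp)
    return mis / len(cases)

err = loo_error_rate(cases)   # well-separated groups: no misclassifications
```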

Rotations
Varimax rotations may be performed on either the matrix of canonical discriminant function coefficients or on that of the correlations between the canonical discriminant functions and the discriminating variables (the structure matrix). The actual algorithm for the rotation is described in FACTOR.
For the Kaiser normalization

h_i^2 = \begin{cases} 1 + \dfrac{1}{w_{ii} w_{ii}^*} & \text{(the squared multiple correlation) if coefficients are rotated} \\[1ex] \displaystyle\sum_{k=1}^{m} r_{ik}^2 & \text{if correlations are rotated} \end{cases}

The unrotated structure matrix is

R = S_{11}^{-1} W_{11} V

If the rotation transformation matrix is represented by K, the rotated standardized coefficient matrix D_R is given by

D_R = DK

The rotated matrix of pooled within-groups correlations between the canonical discriminant functions and the discriminating variables, R_R, is

R_R = RK

The eigenvector matrix V satisfies

V'(T - W)V = \Lambda = \operatorname{diag}\left( \lambda_1, \lambda_2, \dots, \lambda_m \right)

where the λ_k are the eigenvalues.

The equivalent matrix for the rotated coefficients, V_R,

\left( V_R \right)'(T - W) V_R

is not diagonal, meaning that the rotated functions, unlike the unrotated ones, are correlated for the original sample, although their within-groups covariance matrix is the identity. The diagonals of the above matrix may still be interpreted as the between-groups variances of the functions. They are the numerators of the proportions of variance printed with the transformation matrix; the denominator is their sum. After rotation, the columns of the transformation matrix are exchanged, if necessary, so that the diagonals of the matrix above are in descending order.

References
Anderson, T. W. 1958. Introduction to multivariate statistical analysis. New York:
John Wiley & Sons, Inc.

Cooley, W. W., and Lohnes, P. R. 1971. Multivariate data analysis. New York:
John Wiley & Sons, Inc.

Dempster, A. P. 1969. Elements of continuous multivariate analysis. Reading, Mass.: Addison-Wesley.

Dixon, W. J., ed. 1973. BMD Biomedical computer programs. Los Angeles:
University of California Press.

Tatsuoka, M. M. 1971. Multivariate analysis. New York: John Wiley & Sons, Inc.
