Discriminant Analysis
Discriminant analysis helps discriminate between two or more sets of objects or people based on knowledge of some of their characteristics. Typical applications:
- Discriminating between the bones or skeletons of males and females
- Dividing people into potential buyers and non-buyers
- Classifying individuals as good or bad credit risks
- Classifying companies as good or bad investment risks
- Classifying consumers as brand loyal or brand switchers
Similarities and Differences between ANOVA, Regression, and Discriminant Analysis

                                       ANOVA         REGRESSION    DISCRIMINANT ANALYSIS
Similarities
  Number of dependent variables        One           One           One
  Number of independent variables      Multiple      Multiple      Multiple
Differences
  Nature of the dependent variables    Metric        Metric        Categorical
  Nature of the independent variables  Categorical   Metric        Metric
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are metric in nature. The objectives of discriminant analysis are as follows:
- Development of discriminant functions, or linear combinations of the predictor or independent variables, that best discriminate between the categories of the criterion or dependent variable (groups).
- Classification of cases into one of the groups based on the values of the predictor variables.
- Evaluation of the accuracy of classification.
When the criterion variable has two categories, the technique is known as two-group discriminant analysis. When three or more categories are involved, the technique is referred to as multiple discriminant analysis.
- Identify the objectives, the dependent variable, and the independent variables.
- The dependent variable must consist of two or more mutually exclusive and collectively exhaustive categories (e.g., gender, credit risk, investment risk).
- The independent variables should be selected based on a theoretical model, previous research, or the experience of the researcher.
- Collect data on the independent variables for each category of the criterion variable.
- One part of the sample, called the estimation or analysis sample, is used to estimate the discriminant function. The other part, called the holdout or validation sample, is reserved for validating the discriminant function.
- Often the distribution of the number of cases in the analysis and validation samples follows the distribution in the total sample.
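One way to keep the group distribution the same in both samples is to split each group separately. The sketch below is only an illustration: the function name `stratified_split`, the fixed seed, and the 42 hypothetical cases (21 per group) are assumptions, not part of the original material.

```python
import random


def stratified_split(cases, group_of, analysis_fraction, seed=0):
    """Split cases into analysis and holdout samples, one group at a time."""
    rng = random.Random(seed)
    by_group = {}
    for case in cases:
        by_group.setdefault(group_of(case), []).append(case)

    analysis, holdout = [], []
    for group in sorted(by_group):
        members = by_group[group][:]
        rng.shuffle(members)
        # The same fraction is taken from every group, so both samples
        # keep the group proportions of the total sample.
        cut = round(len(members) * analysis_fraction)
        analysis.extend(members[:cut])
        holdout.extend(members[cut:])
    return analysis, holdout


# Hypothetical cases: (case_id, group), 21 cases in each of two groups.
cases = [(i, 1 if i < 21 else 2) for i in range(42)]
analysis, holdout = stratified_split(cases, lambda c: c[1], 30 / 42)
print(len(analysis), len(holdout))
```

With equal-sized groups, this gives balanced analysis and holdout samples.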
Example
The aim is to determine the salient characteristics of families that visited a vacation resort during the last two years.
- Data were obtained from a sample of 42 families (households, HHs), of which 30 were included in the analysis sample and 12 in the validation sample.
- HHs that visited the resort were coded as 1 and those that did not as 2.
- Both the analysis and holdout samples were balanced in terms of visits.
Independent variables were:
--- Family income (V1)
--- Attitude towards travel, measured on a 9-point scale (V2)
--- Importance attached to family vacation, measured on a 9-point scale (V3)
--- HH size (V4)
--- Age of the head of the HH (V5)
Analysis sample

No.  Resort  Family   Attitude  Importance   Household  Age of     Amount Spent
     Visit   Income   Toward    Attached to  Size       Head of    on Family
             ($000)   Travel    Family                  Household  Vacation
                                Vacation
 1   1       50.2     5         8            3          43         M (2)
 2   1       70.3     6         7            4          61         H (3)
 3   1       62.9     7         5            6          52         H (3)
 4   1       48.5     7         5            5          36         L (1)
 5   1       52.7     6         6            4          55         H (3)
 6   1       75.0     8         7            5          68         H (3)
 7   1       46.2     5         3            3          62         M (2)
 8   1       57.0     2         4            6          51         M (2)
 9   1       64.1     7         5            4          57         H (3)
10   1       68.1     7         6            5          45         H (3)
11   1       73.4     6         7            5          44         H (3)
12   1       71.9     5         8            4          64         H (3)
13   1       56.2     1         8            6          54         M (2)
14   1       49.3     4         2            3          56         H (3)
15   1       62.0     5         6            2          58         H (3)
16   2       32.1     5
17   2       36.2     4
18   2       43.2     2
19   2       50.4     5
20   2       44.1     6
21   2       38.3     6
22   2       55.0     1
23   2       46.1     3
24   2       35.0     6
25   2       37.3     2
26   2       41.8     5
27   2       57.0     8
28   2       33.4     6
29   2       37.5     3
30   2       41.3     3

Holdout sample

No.  Resort  Family   Attitude  Importance   Household  Age of     Amount Spent
     Visit   Income   Toward    Attached to  Size       Head of    on Family
             ($000)   Travel    Family                  Household  Vacation
                                Vacation
 1   1       50.8     4         7            3          45         M (2)
 2   1       63.6     7         4            7          55         H (3)
 3   1       54.0     6         7            4          58         M (2)
 4   1       45.0     5         4            3          60         M (2)
 5   1       68.0     6         6            6          46         H (3)
 6   1       62.1     5         6            3          56         H (3)
 7   2       35.0     4         3            4          54         L (1)
 8   2       49.6     5         3            5          39         L (1)
 9   2       39.4     6         5            3          44         H (3)
10   2       37.0     2         6            5          51         L (1)
11   2       54.5     7         3            3          37         M (2)
12   2       38.2     2         2            3          49         L (1)
Pooled Within-Groups Correlation Matrix

            INCOME     TRAVEL     VACATION   HSIZE      AGE
INCOME      1.00000
TRAVEL      0.19745    1.00000
VACATION    0.09148    0.08434    1.00000
HSIZE       0.08887   -0.01681               1.00000
AGE        -0.01431   -0.19709              -0.04301    1.00000
Wilks' λ (U-statistic) and univariate F ratio with 1 and 28 degrees of freedom

Variable    Wilks' λ   F         Significance
INCOME      0.45310    33.800    0.0000
TRAVEL      0.92479     2.277    0.1425
VACATION    0.82377     5.990    0.0209
HSIZE       0.65672    14.640    0.0007
AGE         0.95441     1.338    0.2572
Interpretation
When the predictors are considered individually, only income, importance of vacation, and HH size significantly differentiate between those who visited the resort and those who did not (F ratio with k = 1 and n - k - 1 = 30 - 1 - 1 = 28 d.f.). Wilks' λ (also called the U statistic) is the ratio of the within-group sum of squares (SS) to the total SS. Its value varies between 0 and 1. A small value of λ indicates that the group means differ and that the variable has better discriminating power.
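The definition of Wilks' λ as within-group SS over total SS can be checked against the income values of the 30 analysis-sample households in the example data. The sketch below uses only the standard library; the function and variable names are illustrative assumptions. Under the one-way layout described above, the result should agree with the INCOME row of the table (0.45310 and 33.800) up to rounding.

```python
def wilks_lambda_and_f(groups):
    """Return (Wilks' lambda, F ratio) for one predictor measured in several groups."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n

    # Within-group SS: squared deviations from each group's own mean.
    within_ss = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        within_ss += sum((x - group_mean) ** 2 for x in g)

    # Total SS: squared deviations from the grand mean.
    total_ss = sum((x - grand_mean) ** 2 for x in values)
    between_ss = total_ss - within_ss

    lam = within_ss / total_ss
    f_ratio = (between_ss / (k - 1)) / (within_ss / (n - k))
    return lam, f_ratio


# Family income ($000) from the analysis sample: cases 1-15 visited, 16-30 did not.
income_visited = [50.2, 70.3, 62.9, 48.5, 52.7, 75.0, 46.2, 57.0,
                  64.1, 68.1, 73.4, 71.9, 56.2, 49.3, 62.0]
income_not_visited = [32.1, 36.2, 43.2, 50.4, 44.1, 38.3, 55.0, 46.1,
                      35.0, 37.3, 41.8, 57.0, 33.4, 37.5, 41.3]

lam, f_ratio = wilks_lambda_and_f([income_visited, income_not_visited])
print(round(lam, 5), round(f_ratio, 2))
```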
The null hypothesis that, in the population, the means of the discriminant function in both groups are equal can be statistically tested. In SPSS this test is based on Wilks' λ. If the null hypothesis is rejected, indicating significant discrimination, one can proceed to interpret the results.
* marks the 1 canonical discriminant function remaining in the analysis.

Standardized Canonical Discriminant Function Coefficients

            FUNC 1
INCOME      0.74301
TRAVEL      0.09611
VACATION    0.23329
HSIZE       0.46911
AGE         0.20922
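A standardized coefficient is the unstandardized coefficient multiplied by the pooled within-group standard deviation of that predictor. The sketch below checks this for INCOME, using the analysis-sample incomes from the example data and the unstandardized INCOME weight (0.08476710) from the discriminant function reported in the interpretation that follows. Variable names are illustrative assumptions.

```python
import math
import statistics

# Family income ($000) from the analysis sample, by group.
income_visited = [50.2, 70.3, 62.9, 48.5, 52.7, 75.0, 46.2, 57.0,
                  64.1, 68.1, 73.4, 71.9, 56.2, 49.3, 62.0]
income_not_visited = [32.1, 36.2, 43.2, 50.4, 44.1, 38.3, 55.0, 46.1,
                      35.0, 37.3, 41.8, 57.0, 33.4, 37.5, 41.3]

groups = [income_visited, income_not_visited]
n = sum(len(g) for g in groups)
k = len(groups)

# Pooled within-group variance: within-group SS divided by (n - k).
within_ss = sum((len(g) - 1) * statistics.variance(g) for g in groups)
pooled_sd = math.sqrt(within_ss / (n - k))

unstandardized_income = 0.08476710  # INCOME weight from the unstandardized function
standardized_income = unstandardized_income * pooled_sd
print(round(pooled_sd, 4), round(standardized_income, 5))
```

The product can be compared with the INCOME entry in the coefficient table.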
Structure Matrix: Pooled within-groups correlations between discriminating variables and canonical discriminant functions (variables ordered by size of correlation within function)

            FUNC 1
INCOME      0.82202
HSIZE       0.54096
VACATION    0.34607
TRAVEL      0.21337
AGE         0.16354
Classification results for cases selected for use in the analysis

                              Predicted Group Membership
Actual Group   No. of Cases   1             2
Group 1        15             12 (80.0%)     3 (20.0%)
Group 2        15              0 (0.0%)     15 (100.0%)
Interpretation
Since the significance level of 0.0001 is less than 0.05, we reject the null hypothesis of equality of group means, indicating that the discriminant function discriminates significantly between the groups.

The unstandardized discriminant function is:

D = -7.975476
    + 0.08476710 (INCOME)
    + 0.04964455 (TRAVEL)
    + 0.1202813 (VACATION)
    + 0.4273893 (HSIZE)
    + 0.02454380 (AGE)
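To illustrate how the unstandardized function is applied, the sketch below scores one household (no. 1 of the holdout sample). The cutoff of 0 is an assumption, not a value taken from the output: it is the midpoint between the two group centroids when the groups are of equal size. Scores above the cutoff are read as group 1, because the 15 visiting households in the analysis sample have a positive mean score under this function.

```python
INTERCEPT = -7.975476
COEFFICIENTS = {
    "INCOME": 0.08476710,
    "TRAVEL": 0.04964455,
    "VACATION": 0.1202813,
    "HSIZE": 0.4273893,
    "AGE": 0.02454380,
}
CUTOFF = 0.0  # assumption: midpoint between centroids for equal-sized groups


def discriminant_score(case):
    """Evaluate the unstandardized discriminant function for one case."""
    return INTERCEPT + sum(COEFFICIENTS[name] * case[name] for name in COEFFICIENTS)


# Holdout household no. 1 from the example data.
household = {"INCOME": 50.8, "TRAVEL": 4, "VACATION": 7, "HSIZE": 3, "AGE": 45}

score = discriminant_score(household)
predicted_group = 1 if score > CUTOFF else 2
print(round(score, 3), predicted_group)
```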
Classification results for cases not selected for use in the analysis (holdout sample)

                              Predicted Group Membership
Actual Group   No. of Cases   1             2
Group 1        6              4 (66.7%)     2 (33.3%)
Group 2        6              0 (0.0%)      6 (100.0%)
Interpretation of Results
- Find the percentage of cases correctly classified by the model.
- Find the variables that are relatively better at discriminating between the groups.
- Determine how to classify a new subject into one of the groups.
The discriminant weights, estimated by using the analysis sample, are multiplied by the values of the predictor variables in the holdout sample to generate discriminant scores for the cases in the holdout sample. The cases are then assigned to groups based on their discriminant scores and an appropriate decision rule. The hit ratio, or the percentage of cases correctly classified, can then be determined by summing the diagonal elements and dividing by the total number of cases.
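The validation procedure described above can be sketched with the 12 holdout households from the example data and the weights of the unstandardized function. The zero cutoff, and the rule that scores above it go to group 1, are assumptions based on the two groups being of equal size. The names used here are illustrative.

```python
INTERCEPT = -7.975476
# Weights for INCOME, TRAVEL, VACATION, HSIZE, AGE (unstandardized function).
WEIGHTS = (0.08476710, 0.04964455, 0.1202813, 0.4273893, 0.02454380)
CUTOFF = 0.0  # assumption: midpoint between centroids for equal-sized groups

# Holdout sample: (actual group, income, travel, vacation, hsize, age).
HOLDOUT = [
    (1, 50.8, 4, 7, 3, 45), (1, 63.6, 7, 4, 7, 55), (1, 54.0, 6, 7, 4, 58),
    (1, 45.0, 5, 4, 3, 60), (1, 68.0, 6, 6, 6, 46), (1, 62.1, 5, 6, 3, 56),
    (2, 35.0, 4, 3, 4, 54), (2, 49.6, 5, 3, 5, 39), (2, 39.4, 6, 5, 3, 44),
    (2, 37.0, 2, 6, 5, 51), (2, 54.5, 7, 3, 3, 37), (2, 38.2, 2, 2, 3, 49),
]


def score(predictors):
    """Discriminant score: intercept plus weighted sum of the predictors."""
    return INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, predictors))


# Classification matrix keyed by (actual group, predicted group).
confusion = {(a, p): 0 for a in (1, 2) for p in (1, 2)}
for actual, *predictors in HOLDOUT:
    predicted = 1 if score(predictors) > CUTOFF else 2
    confusion[(actual, predicted)] += 1

# Hit ratio: diagonal elements divided by the total number of cases.
hits = confusion[(1, 1)] + confusion[(2, 2)]
hit_ratio = hits / len(HOLDOUT)
print(confusion, round(hit_ratio, 3))
```

The resulting counts can be compared against the holdout classification table. The same diagonal-over-total calculation applies to the analysis sample.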
SPSS Windows
The DISCRIMINANT program performs both two-group and multiple discriminant analysis. To select this procedure using SPSS for Windows click: Analyze>Classify>Discriminant
Eigenvalue = Between SS / Within SS
Wilks' λ = Within SS / Total SS
Canonical R = Correlation between estimated Y and actual Y
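For a single discriminant function, as in the two-group case, these three quantities are linked because the total SS of the discriminant scores is the sum of the between-group and within-group SS. The symbols below are assumptions introduced for illustration: E for the eigenvalue, λ for Wilks' lambda, and R for the canonical correlation.

```latex
E = \frac{SS_{\text{between}}}{SS_{\text{within}}}, \qquad
\lambda = \frac{SS_{\text{within}}}{SS_{\text{total}}}
        = \frac{SS_{\text{within}}}{SS_{\text{between}} + SS_{\text{within}}}
        = \frac{1}{1 + E}, \qquad
R = \sqrt{\frac{SS_{\text{between}}}{SS_{\text{total}}}}
  = \sqrt{\frac{E}{1 + E}}
  = \sqrt{1 - \lambda}
```

A large eigenvalue therefore goes together with a small λ and a canonical correlation close to 1.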