
FOURTH EDITION

Using Multivariate Statistics


Barbara G. Tabachnick
California State University, Northridge
Linda S. Fidell
California State University, Northridge
Allyn and Bacon
Boston London Toronto Sydney Tokyo Singapore
CONTENTS
Preface xxv
1 Introduction 1
1.1 Multivariate Statistics: Why? 1
1.1.1 The Domain of Multivariate Statistics: Numbers of IVs
and DVs 1
1.1.2 Experimental and Nonexperimental Research 2
1.1.2.1 Multivariate Statistics in Nonexperimental Research
1.1.2.2 Multivariate Statistics in Experimental Research 3
1.1.3 Computers and Multivariate Statistics 4
1.1.3.1 Program Updates 4
1.1.3.2 Garbage In, Roses Out? 5
1.1.4 Why Not? 5
1.2 Some Useful Definitions 5
1.2.1 Continuous, Discrete, and Dichotomous Data 5
1.2.2 Samples and Populations 7
1.2.3 Descriptive and Inferential Statistics 7
1.2.4 Orthogonality 8
1.2.5 Standard and Sequential Analyses 9
1.3 Combining Variables 10
1.4 Number and Nature of Variables to Include 11
1.5 Statistical Power 11
1.6 Data Appropriate for Multivariate Statistics 12
1.6.1 The Data Matrix 12
1.6.2 The Correlation Matrix 13
1.6.3 The Variance-Covariance Matrix 14
1.6.4 The Sum-of-Squares and Cross-Products Matrix 14
1.6.5 Residuals 16
1.7 Organization of the Book 16
2 A Guide to Statistical Techniques: Using the Book 17
2.1 Research Questions and Associated Techniques 17
2.1.1 Degree of Relationship among Variables 17
2.1.1.1 Bivariate r 17
2.1.1.2 Multiple R 18
2.1.1.3 Sequential R 18
2.1.1.4 Canonical R 18
2.1.1.5 Multiway Frequency Analysis 19
2.1.2 Significance of Group Differences 19
2.1.2.1 One-Way ANOVA and t Test 19
2.1.2.2 One-Way ANCOVA 19
2.1.2.3 Factorial ANOVA 20
2.1.2.4 Factorial ANCOVA 20
2.1.2.5 Hotelling's T² 20
2.1.2.6 One-Way MANOVA 21
2.1.2.7 One-Way MANCOVA 21
2.1.2.8 Factorial MANOVA 21
2.1.2.9 Factorial MANCOVA 22
2.1.2.10 Profile Analysis 22
2.1.3 Prediction of Group Membership 23
2.1.3.1 One-Way Discriminant Function 23
2.1.3.2 Sequential One-Way Discriminant Function 23
2.1.3.3 Multiway Frequency Analysis (Logit) 24
2.1.3.4 Logistic Regression 24
2.1.3.5 Sequential Logistic Regression 24
2.1.3.6 Factorial Discriminant Function 24
2.1.3.7 Sequential Factorial Discriminant Function 25
2.1.4 Structure 25
2.1.4.1 Principal Components 25
2.1.4.2 Factor Analysis 25
2.1.4.3 Structural Equation Modeling 26
2.1.5 Time Course of Events 26
2.1.5.1 Survival/Failure Analysis 26
2.1.5.2 Time-Series Analysis 26
2.2 A Decision Tree 26
2.3 Technique Chapters 29
2.4 Preliminary Check of the Data 30
3 Review of Univariate and Bivariate Statistics 31
3.1 Hypothesis Testing 31
3.1.1 One-Sample z Test as Prototype 31
3.1.2 Power 34
3.1.3 Extensions of the Model 35
3.2 Analysis of Variance 35
3.2.1 One-Way Between-Subjects ANOVA 36
3.2.2 Factorial Between-Subjects ANOVA 40
3.2.3 Within-Subjects ANOVA 41
3.2.4 Mixed Between-Within-Subjects ANOVA 44
3.2.5 Design Complexity 45
3.2.5.1 Nesting 45
3.2.5.2 Latin-Square Designs 46
3.2.5.3 Unequal n and Nonorthogonality 46
3.2.5.4 Fixed and Random Effects 47
3.2.6 Specific Comparisons 47
3.2.6.1 Weighting Coefficients for Comparisons 48
3.2.6.2 Orthogonality of Weighting Coefficients 48
3.2.6.3 Obtained F for Comparisons 49
3.2.6.4 Critical F for Planned Comparisons 50
3.2.6.5 Critical F for Post Hoc Comparisons 50
3.3 Parameter Estimation 51
3.4 Strength of Association 52
3.5 Bivariate Statistics: Correlation and Regression 53
3.5.1 Correlation 53
3.5.2 Regression 54
3.6 Chi-Square Analysis 55
4 Cleaning Up Your Act: Screening Data
Prior to Analysis 56
4.1 Important Issues in Data Screening 57
4.1.1 Accuracy of Data File 57
4.1.2 Honest Correlations 57
4.1.2.1 Inflated Correlation 57
4.1.2.2 Deflated Correlation 57
4.1.3 Missing Data 58
4.1.3.1 Deleting Cases or Variables 59
4.1.3.2 Estimating Missing Data 60
4.1.3.3 Using a Missing Data Correlation Matrix 64
4.1.3.4 Treating Missing Data as Data 65
4.1.3.5 Repeating Analyses with and without Missing Data 65
4.1.3.6 Choosing among Methods for Dealing
with Missing Data 65
4.1.4 Outliers 66
4.1.4.1 Detecting Univariate and Multivariate Outliers 67
4.1.4.2 Describing Outliers 70
4.1.4.3 Reducing the Influence of Outliers 71
4.1.4.4 Outliers in a Solution 71
4.1.5 Normality, Linearity, and Homoscedasticity 72
4.1.5.1 Normality 73
4.1.5.2 Linearity 77
4.1.5.3 Homoscedasticity, Homogeneity of Variance, and
Homogeneity of Variance-Covariance Matrices 79
4.1.6 Common Data Transformations 80
4.1.7 Multicollinearity and Singularity 82
4.1.8 A Checklist and Some Practical Recommendations 85
4.2 Complete Examples of Data Screening 86
4.2.1 Screening Ungrouped Data 86
4.2.1.1 Accuracy of Input, Missing Data, Distributions,
and Univariate Outliers 87
4.2.1.2 Linearity and Homoscedasticity 90
4.2.1.3 Transformation 92
4.2.1.4 Detecting Multivariate Outliers 92
4.2.1.5 Variables Causing Cases to be Outliers 94
4.2.1.6 Multicollinearity 98
4.2.2 Screening Grouped Data 99
4.2.2.1 Accuracy of Input, Missing Data, Distributions,
Homogeneity of Variance, and Univariate Outliers 99
4.2.2.2 Linearity 102
4.2.2.3 Multivariate Outliers 104
4.2.2.4 Variables Causing Cases to be Outliers 107
4.2.2.5 Multicollinearity 108
5 Multiple Regression 111
5.1 General Purpose and Description 111
5.2 Kinds of Research Questions 112
5.2.1 Degree of Relationship 113
5.2.2 Importance of IVs 113
5.2.3 Adding IVs 113
5.2.4 Changing IVs 113
5.2.5 Contingencies among IVs 114
5.2.6 Comparing Sets of IVs 114
5.2.7 Predicting DV Scores for Members of a New Sample 114
5.2.8 Parameter Estimates 115
5.3 Limitations to Regression Analyses 115
5.3.1 Theoretical Issues 115
5.3.2 Practical Issues 116
5.3.2.1 Ratio of Cases to IVs 117
5.3.2.2 Absence of Outliers among the IVs and on the DV 117
5.3.2.3 Absence of Multicollinearity and Singularity 118
5.3.2.4 Normality, Linearity, Homoscedasticity of Residuals 119
5.3.2.5 Independence of Errors 121
5.3.2.6 Outliers in the Solution 122
5.4 Fundamental Equations for Multiple Regression 122
5.4.1 General Linear Equations 123
5.4.2 Matrix Equations 124
5.4.3 Computer Analyses of Small-Sample Example 128
5.5 Major Types of Multiple Regression 131
5.5.1 Standard Multiple Regression 131
5.5.2 Sequential Multiple Regression 131
5.5.3 Statistical (Stepwise) Regression 133
5.5.4 Choosing among Regression Strategies 138
5.6 Some Important Issues 139
5.6.1 Importance of IVs 139
5.6.1.1 Standard Multiple Regression 140
5.6.1.2 Sequential or Statistical Regression 142
5.6.2 Statistical Inference 142
5.6.2.1 Test for Multiple R 142
5.6.2.2 Test of Regression Components 143
5.6.2.3 Test of Added Subset of IVs 144
5.6.2.4 Confidence Limits around B 145
5.6.2.5 Comparing Two Sets of Predictors 145
5.6.3 Adjustment of R² 147
5.6.4 Suppressor Variables 148
5.6.5 Regression Approach to ANOVA 149
5.6.6 Centering when Interactions and Powers of IVs
Are Included 151
5.7 Complete Examples of Regression Analysis 153
5.7.1 Evaluation of Assumptions 154
5.7.1.1 Ratio of Cases to IVs 154
5.7.1.2 Normality, Linearity, Homoscedasticity,
and Independence of Residuals 154
5.7.1.3 Outliers 157
5.7.1.4 Multicollinearity and Singularity 157
5.7.2 Standard Multiple Regression 159
5.7.3 Sequential Regression 165
5.8 Comparison of Programs 170
5.8.1 SPSS Package 170
5.8.2 SAS System 175
5.8.3 SYSTAT System 176
6 Canonical Correlation 177
6.1 General Purpose and Description 177
6.2 Kinds of Research Questions 178
6.2.1 Number of Canonical Variate Pairs 178
6.2.2 Interpretation of Canonical Variates 178
6.2.3 Importance of Canonical Variates 178
6.2.4 Canonical Variate Scores 178
6.3 Limitations 178
6.3.1 Theoretical Limitations 178
6.3.2 Practical Issues 180
6.3.2.1 Ratio of Cases to IVs 180
6.3.2.2 Normality, Linearity, and Homoscedasticity 180
6.3.2.3 Missing Data 181
6.3.2.4 Absence of Outliers 181
6.3.2.5 Absence of Multicollinearity and Singularity 181
6.4 Fundamental Equations for Canonical Correlation 182
6.4.1 Eigenvalues and Eigenvectors 183
6.4.2 Matrix Equations 185
6.4.3 Proportions of Variance Extracted 189
6.4.4 Computer Analyses of Small-Sample Example 190
6.5 Some Important Issues 198
6.5.1 Importance of Canonical Variates 198
6.5.2 Interpretation of Canonical Variates 199
6.6 Complete Example of Canonical Correlation 199
6.6.1 Evaluation of Assumptions 200
6.6.1.1 Missing Data 200
6.6.1.2 Normality, Linearity, and Homoscedasticity 200
6.6.1.3 Outliers 203
6.6.1.4 Multicollinearity and Singularity 207
6.6.2 Canonical Correlation 216
6.7 Comparison of Programs 216
6.7.1 SAS System 216
6.7.2 SPSS Package 216
6.7.3 SYSTAT System 218
7 Multiway Frequency Analysis 219
7.1 General Purpose and Description 219
7.2 Kinds of Research Questions 220
7.2.1 Associations among Variables 220
7.2.2 Effect on a Dependent Variable 221
7.2.3 Parameter Estimates 221
7.2.4 Importance of Effects 221
7.2.5 Strength of Association 221
7.2.6 Specific Comparisons and Trend Analysis 222
7.3 Limitations to Multiway Frequency Analysis 222
7.3.1 Theoretical Issues 222
7.3.2 Practical Issues 222
7.3.2.1 Independence 222
7.3.2.2 Ratio of Cases to Variables 223
7.3.2.3 Adequacy of Expected Frequencies 223
7.3.2.4 Outliers in the Solution 224
7.4 Fundamental Equations for Multiway Frequency Analysis 224
7.4.1 Screening for Effects 225
7.4.1.1 Total Effect 226
7.4.1.2 First-Order Effects 227
7.4.1.3 Second-Order Effects 228
7.4.1.4 Third-Order Effect 232
7.4.2 Modeling 233
7.4.3 Evaluation and Interpretation 235
7.4.3.1 Residuals 235
7.4.3.2 Parameter Estimates 236
7.4.4 Computer Analyses of Small-Sample Example 241
7.5 Some Important Issues 250
7.5.1 Hierarchical and Nonhierarchical Models 250
7.5.2 Statistical Criteria 251
7.5.2.1 Tests of Models 251
7.5.2.2 Tests of Individual Effects 251
7.5.3 Strategies for Choosing a Model 252
7.5.3.1 SPSS HILOGLINEAR (Hierarchical) 252
7.5.3.2 SPSS GENLOG (General Log-linear) 253
7.5.3.3 SAS CATMOD, SYSTAT LOGLINEAR,
and SYSTAT LOGLIN (General Log-linear) 253
7.6 Complete Example of Multiway Frequency Analysis 253
7.6.1 Evaluation of Assumptions: Adequacy
of Expected Frequencies 253
7.6.2 Hierarchical Log-linear Analysis 254
7.6.2.1 Preliminary Model Screening 254
7.6.2.2 Stepwise Model Selection 256
7.6.2.3 Adequacy of Fit 258
7.6.2.4 Interpretation of the Selected Model 264
7.7 Comparison of Programs 270
7.7.1 SPSS Package 273
7.7.2 SAS System 274
7.7.3 SYSTAT System 274
8 Analysis of Covariance 275
8.1 General Purpose and Description 275
8.2 Kinds of Research Questions 277
8.2.1 Main Effects of IVs 278
8.2.2 Interactions among IVs 278
8.2.3 Specific Comparisons and Trend Analysis 278
8.2.4 Effects of Covariates 278
8.2.5 Strength of Association 279
8.2.6 Parameter Estimates 279
8.3 Limitations to Analysis of Covariance 279
8.3.1 Theoretical Issues 279
8.3.2 Practical Issues 280
8.3.2.1 Unequal Sample Sizes, Missing Data, and Ratio of Cases
to IVs 280
8.3.2.2 Absence of Outliers 281
8.3.2.3 Absence of Multicollinearity and Singularity 281
8.3.2.4 Normality of Sampling Distributions 281
8.3.2.5 Homogeneity of Variance 281
8.3.2.6 Linearity 282
8.3.2.7 Homogeneity of Regression 282
8.3.2.8 Reliability of Covariates 283
8.4 Fundamental Equations for Analysis of Covariance 283
8.4.1 Sums of Squares and Cross-Products 284
8.4.2 Significance Test and Strength of Association 288
8.4.3 Computer Analyses of Small-Sample Example 289
8.5 Some Important Issues 291
8.5.1 Test for Homogeneity of Regression 291
8.5.2 Design Complexity 293
8.5.2.1 Within-Subjects and Mixed Within-Between Designs 293
8.5.2.2 Unequal Sample Sizes 296
8.5.2.3 Specific Comparisons and Trend Analysis 298
8.5.2.4 Strength of Association 301
8.5.3 Evaluation of Covariates 302
8.5.4 Choosing Covariates 302
8.5.5 Alternatives to ANCOVA 303
8.6 Complete Example of Analysis of Covariance 304
8.6.1 Evaluation of Assumptions 305
8.6.1.1 Unequal n and Missing Data 305
8.6.1.2 Normality 305
8.6.1.3 Linearity 305
8.6.1.4 Outliers 305
8.6.1.5 Multicollinearity and Singularity 309
8.6.1.6 Homogeneity of Variance 309
8.6.1.7 Homogeneity of Regression 310
8.6.1.8 Reliability of Covariates 310
8.6.2 Analysis of Covariance 310
8.6.2.1 Main Analysis 310
8.6.2.2 Evaluation of Covariates 313
8.6.2.3 Homogeneity of Regression Run 315
8.7 Comparison of Programs 319
8.7.1 SPSS Package 319
8.7.2 SYSTAT System 319
8.7.3 SAS System 321
9 Multivariate Analysis of Variance and Covariance 322
9.1 General Purpose and Description 322
9.2 Kinds of Research Questions 325
9.2.1 Main Effects of IVs 325
9.2.2 Interactions among IVs 326
9.2.3 Importance of DVs 326
9.2.4 Parameter Estimates 326
9.2.5 Specific Comparisons and Trend Analysis 327
9.2.6 Strength of Association 327
9.2.7 Effects of Covariates 327
9.2.8 Repeated-Measures Analysis of Variance 327
9.3 Limitations to Multivariate Analysis of Variance
and Covariance 328
9.3.1 Theoretical Issues 328
9.3.2 Practical Issues 328
9.3.2.1 Unequal Sample Sizes, Missing Data, and Power 329
9.3.2.2 Multivariate Normality 329
9.3.2.3 Absence of Outliers 330
9.3.2.4 Homogeneity of Variance-Covariance Matrices 330
9.3.2.5 Linearity 330
9.3.2.6 Homogeneity of Regression 331
9.3.2.7 Reliability of Covariates 331
9.3.2.8 Absence of Multicollinearity and Singularity 331
9.4 Fundamental Equations for Multivariate Analysis
of Variance and Covariance 332
9.4.1 Multivariate Analysis of Variance 332
9.4.2 Computer Analyses of Small-Sample Example 339
9.4.3 Multivariate Analysis of Covariance 340
9.5 Some Important Issues 347
9.5.1 Criteria for Statistical Inference 347
9.5.2 Assessing DVs 348
9.5.2.1 Univariate F 348
9.5.2.2 Roy-Bargmann Stepdown Analysis 350
9.5.2.3 Using Discriminant Function Analysis 351
9.5.2.4 Choosing among Strategies for Assessing DVs 351
9.5.3 Specific Comparisons and Trend Analysis 352
9.5.4 Design Complexity 356
9.5.4.1 Within-Subjects and Between-Within Designs 356
9.5.4.2 Unequal Sample Sizes 356
9.5.5 MANOVA vs. ANOVAs 357
9.6 Complete Examples of Multivariate Analysis of Variance
and Covariance 357
9.6.1 Evaluation of Assumptions 358
9.6.1.1 Unequal Sample Sizes and Missing Data 358
9.6.1.2 Multivariate Normality 360
9.6.1.3 Linearity 360
9.6.1.4 Outliers 360
9.6.1.5 Homogeneity of Variance-Covariance Matrices 361
9.6.1.6 Homogeneity of Regression 362
9.6.1.7 Reliability of Covariates 365
9.6.1.8 Multicollinearity and Singularity 365
9.6.2 Multivariate Analysis of Variance 365
9.6.3 Multivariate Analysis of Covariance 376
9.6.3.1 Assessing Covariates 377
9.6.3.2 Assessing DVs 377
9.7 Comparison of Programs 386
9.7.1 SPSS Package 389
9.7.2 SYSTAT System 389
9.7.3 SAS System 390
10 Profile Analysis: The Multivariate Approach
to Repeated Measures 391
10.1 General Purpose and Description 391
10.2 Kinds of Research Questions 392
10.2.1 Parallelism of Profiles 392
10.2.2 Overall Difference among Groups 393
10.2.3 Flatness of Profiles 393
10.2.4 Contrasts Following Profile Analysis 393
10.2.5 Parameter Estimates 393
10.2.6 Strength of Association 394
10.3 Limitations to Profile Analysis 394
10.3.1 Theoretical Issues 394
10.3.2 Practical Issues 394
10.3.2.1 Sample Size, Missing Data, and Power 394
10.3.2.2 Multivariate Normality 395
10.3.2.3 Absence of Outliers 395
10.3.2.4 Homogeneity of Variance-Covariance Matrices 395
10.3.2.5 Linearity 395
10.3.2.6 Absence of Multicollinearity and Singularity 396
10.4 Fundamental Equations for Profile Analysis 396
10.4.1 Differences in Levels 396
10.4.2 Parallelism 398
10.4.3 Flatness 401
10.4.4 Computer Analyses of Small-Sample Example 403
10.5 Some Important Issues 410
10.5.1 Contrasts in Profile Analysis 410
10.5.1.1 Parallelism and Flatness Significant, Levels Not Significant
(Simple-Effects Analysis) 413
10.5.1.2 Parallelism and Levels Significant, Flatness Not Significant
(Simple-Effects Analysis) 414
10.5.1.3 Parallelism, Levels, and Flatness Significant
(Interaction Contrasts) 416
10.5.1.4 Only Parallelism Significant 421
10.5.2 Univariate vs. Multivariate Approach
to Repeated Measures 421
10.5.3 Doubly-Multivariate Designs 423
10.5.4 Classifying Profiles 429
10.5.5 Imputation of Missing Values 429
10.6 Complete Examples of Profile Analysis 430
10.6.1 Profile Analysis of Subscales of the WISC 430
10.6.1.1 Evaluation of Assumptions 431
10.6.1.2 Profile Analysis 435
10.6.2 Doubly-Multivariate Analysis of Reaction Time 442
10.6.2.1 Evaluation of Assumptions 442
10.6.2.2 Doubly-Multivariate Analysis of Slope
and Intercept 446
10.7 Comparison of Programs 453
10.7.1 SPSS Package 453
10.7.2 SAS System 455
10.7.3 SYSTAT System 455
11 Discriminant Function Analysis 456
11.1 General Purpose and Description 456
11.2 Kinds of Research Questions 458
11.2.1 Significance of Prediction 458
11.2.2 Number of Significant Discriminant Functions 458
11.2.3 Dimensions of Discrimination 459
11.2.4 Classification Functions 459
11.2.5 Adequacy of Classification 459
11.2.6 Strength of Association 460
11.2.7 Importance of Predictor Variables 460
11.2.8 Significance of Prediction with Covariates 460
11.2.9 Estimation of Group Means 460
11.3 Limits to Discriminant Function Analysis 461
11.3.1 Theoretical Issues 461
11.3.2 Practical Issues 461
11.3.2.1 Unequal Sample Sizes, Missing Data, and Power 461
11.3.2.2 Multivariate Normality 462
11.3.2.3 Absence of Outliers 462
11.3.2.4 Homogeneity of Variance-Covariance Matrices 462
11.3.2.5 Linearity 463
11.3.2.6 Absence of Multicollinearity and Singularity 463
11.4 Fundamental Equations for Discriminant Function Analysis 463
11.4.1 Derivation and Test of Discriminant Functions 464
11.4.2 Classification 467
11.4.3 Computer Analyses of Small-Sample Example 469
11.5 Types of Discriminant Function Analysis 477
11.5.1 Direct Discriminant Function Analysis 478
11.5.2 Sequential Discriminant Function Analysis 478
11.5.3 Stepwise (Statistical) Discriminant Function Analysis 481
11.6 Some Important Issues 481
11.6.1 Statistical Inference 481
11.6.1.1 Criteria for Overall Statistical Significance 481
11.6.1.2 Stepping Methods 482
11.6.2 Number of Discriminant Functions 482
11.6.3 Interpreting Discriminant Functions 483
11.6.3.1 Discriminant Function Plots 483
11.6.3.2 Loading Matrices 484
11.6.4 Evaluating Predictor Variables 485
11.6.5 Design Complexity: Factorial Designs 488
11.6.6 Use of Classification Procedures 489
11.6.6.1 Cross-Validation and New Cases 489
11.6.6.2 Jackknifed Classification 490
11.6.6.3 Evaluating Improvement in Classification 490
11.7 Complete Example of Discriminant Function Analysis 492
11.7.1 Evaluation of Assumptions 492
11.7.1.1 Unequal Sample Sizes and Missing Data 492
11.7.1.2 Multivariate Normality 492
11.7.1.3 Linearity 493
11.7.1.4 Outliers 493
11.7.1.5 Homogeneity of Variance-Covariance Matrices 493
11.7.1.6 Multicollinearity and Singularity 493
11.7.2 Direct Discriminant Function Analysis 497
11.8 Comparison of Programs 509
11.8.1 SPSS Package 515
11.8.2 SYSTAT System 516
11.8.3 SAS System 516
12 Logistic Regression 517
12.1 General Purpose and Description 517
12.2 Kinds of Research Questions 518
12.2.1 Prediction of Group Membership or Outcome 518
12.2.2 Importance of Predictors 518
12.2.3 Interactions among Predictors 518
12.2.4 Parameter Estimates 520
12.2.5 Classification of Cases 520
12.2.6 Significance of Prediction with Covariates 520
12.2.7 Strength of Association 520
12.3 Limitations to Logistic Regression Analysis 521
12.3.1 Theoretical Issues 521
12.3.2 Practical Issues 521
12.3.2.1 Ratio of Cases to Variables 521
12.3.2.2 Adequacy of Expected Frequencies and Power 522
12.3.2.3 Linearity in the Logit 522
12.3.2.4 Absence of Multicollinearity 522
12.3.2.5 Absence of Outliers in the Solution 523
12.3.2.6 Independence of Errors 523
12.4 Fundamental Equations for Logistic Regression 523
12.4.1 Testing and Interpreting Coefficients 524
12.4.2 Goodness-of-Fit 525
12.4.3 Comparing Models 527
12.4.4 Interpretation and Analysis of Residuals 527
12.4.5 Computer Analyses of Small-Sample Example 527
12.5 Types of Logistic Regression 533
12.5.1 Direct Logistic Regression 533
12.5.2 Sequential Logistic Regression 533
12.5.3 Stepwise (Statistical) Logistic Regression 535
12.5.4 Probit and Other Analyses 535
12.6 Some Important Issues 536
12.6.1 Statistical Inference 536
12.6.1.1 Assessing Goodness-of-Fit of Models 537
12.6.1.2 Tests of Individual Variables 539
12.6.2 Number and Type of Outcome Categories 539
12.6.2.1 Unordered Response Categories with SYSTAT LOGIT 540
12.6.2.2 Ordered Response Categories with SAS LOGISTIC 542
12.6.3 Strength of Association for a Model 545
12.6.4 Coding Outcome and Predictor Categories 546
12.6.5 Classification of Cases 547
12.6.6 Hierarchical and Nonhierarchical Analysis 548
12.6.7 Interpretation of Coefficients using Odds 548
12.6.8 Importance of Predictors 549
12.6.9 Logistic Regression for Matched Groups 550
12.7 Complete Examples of Logistic Regression 550
12.7.1 Evaluation of Limitations 551
12.7.1.1 Ratio of Cases to Variables and Missing Data 551
12.7.1.2 Adequacy of Expected Frequencies 554
12.7.1.3 Linearity in the Logit 558
12.7.1.4 Multicollinearity 558
12.7.1.5 Outliers in the Solution 559
12.7.2 Direct Logistic Regression with Two-Category Outcome 559
12.7.3 Sequential Logistic Regression with Three Categories
of Outcome 563
12.8 Comparisons of Programs 575
12.8.1 SPSS Package 575
12.8.2 SAS System 580
12.8.3 SYSTAT System 581
13 Principal Components and Factor Analysis 582
13.1 General Purpose and Description 582
13.2 Kinds of Research Questions 585
13.2.1 Number of Factors 585
13.2.2 Nature of Factors 586
13.2.3 Importance of Solutions and Factors 586
13.2.4 Testing Theory in FA 586
13.2.5 Estimating Scores on Factors 586
13.3 Limitations 586
13.3.1 Theoretical Issues 586
13.3.2 Practical Issues 587
13.3.2.1 Sample Size and Missing Data 588
13.3.2.2 Normality 588
13.3.2.3 Linearity 588
13.3.2.4 Absence of Outliers among Cases 588
13.3.2.5 Absence of Multicollinearity and Singularity 589
13.3.2.6 Factorability of R 589
13.3.2.7 Absence of Outliers among Variables 589
13.4 Fundamental Equations for Factor Analysis 590
13.4.1 Extraction 591
13.4.2 Orthogonal Rotation 595
13.4.3 Communalities, Variance, and Covariance 596
13.4.4 Factor Scores 597
13.4.5 Oblique Rotation 600
13.4.6 Computer Analyses of Small-Sample Example 603
13.5 Major Types of Factor Analysis 609
13.5.1 Factor Extraction Techniques 609
13.5.1.1 PCA vs. FA 610
13.5.1.2 Principal Components 612
13.5.1.3 Principal Factors 612
13.5.1.4 Image Factor Extraction 612
13.5.1.5 Maximum Likelihood Factor Extraction 613
13.5.1.6 Unweighted Least Squares Factoring 613
13.5.1.7 Generalized (Weighted) Least Squares Factoring 613
13.5.1.8 Alpha Factoring 613
13.5.2 Rotation 614
13.5.2.1 Orthogonal Rotation 614
13.5.2.2 Oblique Rotation 616
13.5.2.3 Geometric Interpretation 616
13.5.3 Some Practical Recommendations 618
13.6 Some Important Issues 619
13.6.1 Estimates of Communalities 619
13.6.2 Adequacy of Extraction and Number of Factors 620
13.6.3 Adequacy of Rotation and Simple Structure 622
13.6.4 Importance and Internal Consistency of Factors 623
13.6.5 Interpretation of Factors 625
13.6.6 Factor Scores 626
13.6.7 Comparisons among Solutions and Groups 627
13.7 Complete Example of FA 627
13.7.1 Evaluation of Limitations 628
13.7.1.1 Sample Size and Missing Data 628
13.7.1.2 Normality 628
13.7.1.3 Linearity 628
13.7.1.4 Outliers 628
13.7.1.5 Multicollinearity and Singularity 633
13.7.1.6 Factorability of R 633
13.7.1.7 Outliers among Variables 633
13.7.2 Principal Factors Extraction with Varimax Rotation 633
13.8 Comparison of Programs 648
13.8.1 SPSS Package 648
13.8.2 SAS System 652
13.8.3 SYSTAT System 652
14 Structural Equation Modeling
by Jodie B. Ullman 653
14.1 General Purpose and Description 653
14.2 Kinds of Research Questions 657
14.2.1 Adequacy of the Model 657
14.2.2 Testing Theory 657
14.2.3 Amount of Variance in the Variables Accounted for
by the Factors 657
14.2.4 Reliability of the Indicators 657
14.2.5 Parameter Estimates 657
14.2.6 Mediation 658
14.2.7 Group Differences 658
14.2.8 Longitudinal Differences 658
14.2.9 Multilevel Modeling 658
14.3 Limitations to Structural Equation Modeling 659
14.3.1 Theoretical Issues 659
14.3.2 Practical Issues 659
14.3.2.1 Sample Size and Missing Data 659
14.3.2.2 Multivariate Normality and Absence of Outliers 660
14.3.2.3 Linearity 660
14.3.2.4 Absence of Multicollinearity and Singularity 660
14.3.2.5 Residuals 661
14.4 Fundamental Equations for Structural Equations Modeling 661
14.4.1 Covariance Algebra 661
14.4.2 Model Hypotheses 663
14.4.3 Model Specification 665
14.4.4 Model Estimation 667
14.4.5 Model Evaluation 672
14.4.6 Computer Analysis of Small-Sample Example 674
14.5 Some Important Issues 691
14.5.1 Model Identification 691
14.5.2 Estimation Techniques 694
14.5.2.1 Estimation Methods and Sample Size 696
14.5.2.2 Estimation Methods and Nonnormality 697
14.5.2.3 Estimation Methods and Dependence 697
14.5.2.4 Some Recommendations for Choice
of Estimation Method 697
14.5.3 Assessing the Fit of the Model 697
14.5.3.1 Comparative Fit Indices 698
14.5.3.2 Absolute Fit Index 700
14.5.3.3 Indices of Proportion of Variance Accounted 700
14.5.3.4 Degree of Parsimony Fit Indices 701
14.5.3.5 Residual-Based Fit Indices 702
14.5.3.6 Choosing among Fit Indices 702
14.5.4 Model Modification 703
14.5.4.1 Chi-Square Difference Test 703
14.5.4.2 Lagrange Multiplier Test (LM) 703
14.5.4.3 Wald Test 713
14.5.4.4 Some Caveats and Hints on Model Modification 715
14.5.5 Reliability and Proportion of Variance 715
14.5.6 Discrete and Ordinal Data 716
14.5.7 Multiple Group Models 717
14.5.8 Mean and Covariance Structure Models 718
14.6 Complete Examples of Structural Equation Modeling Analysis 719
14.6.1 Confirmatory Factor Analysis of the WISC 719
14.6.1.1 Model Specification for CFA 719
14.6.1.2 Evaluation of Assumptions for CFA 719
14.6.1.3 CFA Model Estimation and Preliminary Evaluation 721
14.6.1.4 Model Modification 730
14.6.2 SEM of Health Data 737
14.6.2.1 SEM Model Specification 737
14.6.2.2 Evaluation of Assumptions for SEM 738
14.6.2.3 Model Estimation and Preliminary Evaluation 742
14.6.2.4 Model Modification 745
14.7 Comparison of Programs 764
14.7.1 EQS 764
14.7.2 LISREL 764
14.7.3 SAS 771
14.7.4 AMOS 771
15 Survival/Failure Analysis 772
15.1 General Purpose and Description 772
15.2 Kinds of Research Questions 773
15.2.1 Proportions Surviving at Various Times 773
15.2.2 Group Differences in Survival 774
15.2.3 Survival Time with Covariates 774
15.2.3.1 Treatment Effects 774
15.2.3.2 Importance of Covariates 774
15.2.3.3 Parameter Estimates 774
15.2.3.4 Contingencies among Covariates 774
15.2.3.5 Strength of Association and Power 774
15.3 Limitations to Survival Analysis 775
15.3.1 Theoretical Issues 775
15.3.2 Practical Issues 775
15.3.2.1 Sample Size and Missing Data 775
15.3.2.2 Normality of Sampling Distributions, Linearity,
and Homoscedasticity 775
15.3.2.3 Absence of Outliers 775
15.3.2.4 Differences between Withdrawn and Remaining Cases 776
15.3.2.5 Change in Survival Conditions over Time 776
15.3.2.6 Proportionality of Hazards 776
15.3.2.7 Absence of Multicollinearity 776
15.4 Fundamental Equations for Survival Analysis 776
15.4.1 Life Tables 777
15.4.2 Standard Error of Cumulative Proportion Surviving 778
15.4.3 Hazard and Density Functions 779
15.4.4 Plot of Life Tables 780
15.4.5 Test for Group Differences 781
15.4.6 Computer Analyses of Small-Sample Example 783
15.5 Types of Survival Analysis 791
15.5.1 Actuarial and Product-Limit Life Tables
and Survivor Functions 791
15.5.2 Prediction of Group Survival Times from Covariates 796
15.5.2.1 Direct, Sequential, and Statistical Analysis 796
15.5.2.2 Cox Proportional-Hazards Model 797
15.5.2.3 Accelerated Failure-Time Model 797
15.5.2.4 Choosing a Method 804
15.6 Some Important Issues 805
15.6.1 Proportionality of Hazards 805
15.6.2 Censored Data 807
15.6.2.1 Right-Censored Data 807
15.6.2.2 Other Forms of Censoring 808
15.6.3 Strength of Association and Power 808
15.6.4 Statistical Criteria 809
15.6.4.1 Test Statistics for Group Differences
in Survival Functions 809
15.6.4.2 Test Statistics for Prediction from Covariates 809
15.6.5 Odds Ratios 811
15.7 Complete Example of Survival Analysis 813
15.7.1 Evaluation of Assumptions 814
15.7.1.1 Accuracy of Input, Adequacy of Sample Size, Missing Data,
and Distributions 814
15.7.1.2 Outliers 814
15.7.1.3 Differences between Withdrawn and Remaining Cases 816
15.7.1.4 Change in Survival Experience over Time 819
15.7.1.5 Proportionality of Hazards 820
15.7.1.6 Multicollinearity 821
15.7.2 Cox Regression Survival Analysis 822
15.7.2.1 Effect of Drug Treatment 822
15.7.2.2 Evaluation of Other Covariates 825
15.8 Comparison of Programs 829
15.8.1 SAS System 829
15.8.2 SYSTAT System 829
15.8.3 SPSS Package 836
16 Time-Series Analysis 837
16.1 General Purpose and Description 837
16.2 Kinds of Research Questions 839
16.2.1 Pattern of Autocorrelation 841
16.2.2 Seasonal Cycles and Trends 841
16.2.3 Forecasting 841
16.2.4 Effect of an Intervention 841
16.2.5 Comparing Time Series 841
16.2.6 Time Series with Covariates 842
16.2.7 Strength of Association and Power 842
16.3 Assumptions of Time-Series Analysis 842
16.3.1 Theoretical Issues 842
16.3.2 Practical Issues 842
16.3.2.1 Normality of Distributions of Residuals 842
16.3.2.2 Homogeneity of Variance and Zero Mean of Residuals 843
16.3.2.3 Independence of Residuals 843
16.3.2.4 Absence of Outliers 843
16.4 Fundamental Equations for Time-Series ARIMA Models 843
16.4.1 Identification of ARIMA (p, d, q) Models 844
16.4.1.1 Trend Components, d: Making the Process Stationary 844
16.4.1.2 Auto-Regressive Components 847
16.4.1.3 Moving Average Components 848
16.4.1.4 Mixed Models 848
16.4.1.5 ACFs and PACFs 849
16.4.2 Estimating Model Parameters 854
16.4.3 Diagnosing a Model 855
16.4.4 Computer Analysis of Small-Sample Time-Series Example 855
16.5 Types of Time-Series Analysis 865
16.5.1 Models with Seasonal Components 865
16.5.2 Models with Interventions 869
16.5.2.1 Abrupt, Permanent Effects 870
16.5.2.2 Abrupt, Temporary Effects 870
16.5.2.3 Gradual, Permanent Effects 872
16.5.2.4 Models with Multiple Interventions 877
16.5.3 Adding Continuous Variables 877
16.6 Some Important Issues 878
16.6.1 Patterns of ACFs and PACFs 878
16.6.2 Strength of Association 881
16.6.3 Forecasting 882
16.6.4 Statistical Methods for Comparing Two Models 884
16.7 Complete Example of a Time-Series Analysis 884
16.7.1 Evaluation of Assumptions 884
16.7.1.1 Normality of Sampling Distributions 884
16.7.1.2 Homogeneity of Variance 885
16.7.1.3 Outliers 885
16.7.2 Baseline Model Identification and Estimation 885
16.7.3 Baseline Model Diagnosis 892
16.7.4 Intervention Analysis 893
16.7.4.1 Model Diagnosis 893
16.7.4.2 Model Interpretation 893
16.8 Comparison of Programs 897
16.8.1 SPSS Package 897
16.8.2 SAS System 897
16.8.3 SYSTAT System 900
17 An Overview of the General Linear Model 901
17.1 Linearity and the General Linear Model 901
17.2 Bivariate to Multivariate Statistics and Overview
of Techniques 901
17.2.1 Bivariate Form 901
17.2.2 Simple Multivariate Form 902
17.2.3 Full Multivariate Form 904
17.3 Alternative Research Strategies 907
Appendix A A Skimpy Introduction to Matrix Algebra 908
A.1 The Trace of a Matrix 909
A.2 Addition or Subtraction of a Constant to a Matrix 909
A.3 Multiplication or Division of a Matrix by a Constant 909
A.4 Addition and Subtraction of Two Matrices 910
A.5 Multiplication, Transposes, and Square Roots of Matrices 911
A.6 Matrix "Division" (Inverses and Determinants) 913
A.7 Eigenvalues and Eigenvectors: Procedures
for Consolidating Variance from a Matrix 914
Appendix B Research Designs for Complete Examples 918
B.1 Women's Health and Drug Study 918
B.2 Sexual Attraction Study 919
B.3 Learning Disabilities Data Bank 922
B.4 Reaction Time to Identify Figures 923
B.5 Clinical Trial for Primary Biliary Cirrhosis 923
B.6 Impact of Seat Belt Law 924
Appendix C Statistical Tables 925
C.1 Normal Curve Areas 926
C.2 Critical Values of the t Distribution for α = .05 and .01, Two-Tailed Test 927
C.3 Critical Values of the F Distribution 928
C.4 Critical Values of Chi Square (χ²) 933
C.5 Critical Values for Squared Multiple Correlation (R²) in Forward Stepwise Selection, α = .05 934
C.6 Critical Values for the Fmax (S²max/S²min) Distribution for α = .05 and .01 936
References 937
Index 945
