Multiple regression assumes that the errors are independent and there is no serial correlation. Errors are the residuals or differences between the actual score for a case and the score estimated using the regression equation. No serial correlation implies that the size of the residual for one case has no impact on the size of the residual for the next case. The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals. The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.
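The Durbin-Watson statistic can be computed directly from the residuals. A minimal pure-Python sketch, where the residual values are made-up illustration rather than data from the course:

```python
# Durbin-Watson statistic for a sequence of regression residuals.
# d = sum((e_t - e_{t-1})^2) / sum(e_t^2), which ranges from 0 to 4.

def durbin_watson(residuals):
    """Values near 2 suggest no serial correlation; 1.50-2.50 is the
    acceptable range used in this course."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Illustrative residuals (not from the course data set)
residuals = [0.5, 0.3, -0.2, 0.1, -0.4, 0.2, -0.1]
print(round(durbin_watson(residuals), 3))   # falls inside the 1.50-2.50 range
```

In practice the statistic is read off the SPSS regression output; the sketch only shows what the number summarizes.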
Multicollinearity - 1
Multicollinearity is a problem in regression analysis that occurs when two independent variables are highly correlated, e.g., r = 0.90 or higher. The relationship between the independent variables and the dependent variable is distorted by the very strong relationship between the independent variables, making it likely that our interpretation of relationships will be incorrect. In the worst case, if the variables are perfectly correlated, the regression cannot be computed. SPSS guards against the failure to compute a regression solution by arbitrarily omitting the collinear variable from the analysis.
SW388R7 Data Analysis & Computers II
Multicollinearity - 2
Multicollinearity is detected by examining the tolerance for each independent variable. Tolerance is the amount of variability in one independent variable that is not explained by the other independent variables. Tolerance values less than 0.10 indicate multicollinearity. If we discover multicollinearity in the regression output, we should treat the interpretation of the relationships as false until the issue is resolved. Multicollinearity can be resolved by combining the highly correlated variables through principal component analysis, or by omitting one of the variables from the analysis.
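Tolerance comes from regressing each independent variable on the others: it is 1 minus the R² of that auxiliary regression. In the two-IV case this reduces to 1 - r², which a short sketch can show; the data below are made-up illustrative values, not the course data set:

```python
# Tolerance in the two-IV case: regressing one IV on the other gives
# R^2 = r^2, so tolerance = 1 - r^2. Illustrative data only.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def tolerance(x, y):
    """Share of variance in x not explained by y (1 - R^2)."""
    return 1 - pearson_r(x, y) ** 2

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]   # nearly collinear with x1
print(round(tolerance(x1, x2), 4))      # well below 0.10 -> multicollinearity
```

With more than two IVs, SPSS computes the full auxiliary regression for each variable; the 0.10 cutoff is applied the same way.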
Adjusted R²
The coefficient of determination, R², which measures the strength of a relationship, can usually be increased simply by adding more variables. In fact, if the number of variables equals the number of cases, it is often possible to produce a perfect R² of 1.00.
Adjusted R² is a measure which attempts to reduce the inflation in R² by taking into account the number of independent variables and the number of cases. If there is a large discrepancy between R² and adjusted R², extraneous variables should be removed from the analysis and R² recomputed.
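The standard adjustment penalizes R² for the number of predictors relative to the number of cases. A minimal sketch of the formula, with made-up illustrative values for n, k, and R²:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).
# The n, k, and R^2 values below are illustrative, not course data.

def adjusted_r_squared(r2, n, k):
    """n = number of cases, k = number of independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2, but many predictors relative to cases shrinks the estimate.
print(round(adjusted_r_squared(0.60, n=100, k=3), 4))   # -> 0.5875
print(round(adjusted_r_squared(0.60, n=20, k=10), 4))   # -> 0.1556
```

The second call shows the "large discrepancy" case the slide warns about: with 10 predictors and only 20 cases, an R² of 0.60 adjusts down to about 0.16.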
Influential cases
The degree to which outliers affect the regression solution depends on where the outlier is located relative to the other cases in the analysis. Outliers whose location has a large effect on the regression solution are called influential cases. Whether or not a case is influential is measured by Cook's distance. Cook's distance is an index measure; it is compared to a critical value based on the formula 4 / (n - k - 1), where n is the number of cases and k is the number of independent variables. If a case has a Cook's distance greater than the critical value, it should be examined for exclusion. The criterion for identifying an influential case is a Cook's distance score with a value of 0.0160 or greater.
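The critical value is a one-line computation. A sketch, where the n and k values are illustrative assumptions chosen so that the formula reproduces the 0.0160 cutoff quoted above, not the actual course data set:

```python
# Critical value for Cook's distance: 4 / (n - k - 1).
# n = 253 and k = 2 are made-up illustrative values; they happen to
# give the 0.0160 cutoff mentioned in the text.

def cooks_critical(n, k):
    """n = number of cases, k = number of independent variables."""
    return 4 / (n - k - 1)

print(round(cooks_critical(253, 2), 4))   # -> 0.016
```

Any case whose SPSS-reported Cook's distance exceeds this value would be examined for exclusion.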
[Decision-tree flowchart: overall strategy for a complete regression analysis. Recoverable steps and decision points:]

1. Run the baseline regression, using the method for including variables identified in the research question. Record R² for evaluating transformations and the removal of outliers and influential cases. Record the Durbin-Watson statistic for the assumption of independence of errors.
2. If an assumption is violated for a metric variable, try: (1) a logarithmic transformation, (2) a square root transformation, (3) a square transformation (where applicable), or (4) an inverse transformation. If unsuccessful, add a caution for the violation of regression assumptions.
3. If too many outliers and influential cases would have to be removed, restore them to the data set and add a caution to the findings.
4. Evaluate the impact of the transformations and the removal of outliers by running the regression again, using the method for including variables identified in the research question.
5. If the revised model improves on the baseline, pick the regression with transformations and omitting outliers for interpretation; otherwise, pick the baseline regression for interpretation.
6. Check the assumptions of the chosen model, e.g.: Is the tolerance for all IVs greater than 0.10, indicating no multicollinearity? If not, the answer to the research question is false.
7. Validate the model. Either: set the random seed and compute the split variable, then re-run the regression with split = 0 and with split = 1; or set the first random seed and compute the split1 variable and re-run with split1 = 1, then set the second random seed and compute the split2 variable and re-run with split2 = 1.
8. Is the probability of the ANOVA test <= the level of significance for both validation analyses? If not, the answer to the research question is false.
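The validation step (set the random seed, compute the split variable, re-run the regression for each split) can be sketched outside of SPSS as well. A minimal sketch; the seed value, case count, and 50/50 split rule are illustrative assumptions, not the course's exact procedure:

```python
# Split-sample validation: seed the generator, assign each case a
# split value, then analyze each subsample separately.
import random

random.seed(20070402)          # "set the random seed" (seed value is made up)
cases = list(range(20))        # stand-in for the data set's case IDs
split = [random.randint(0, 1) for _ in cases]   # compute the split variable

group0 = [c for c, s in zip(cases, split) if s == 0]   # re-run with split = 0
group1 = [c for c, s in zip(cases, split) if s == 1]   # re-run with split = 1
print(len(group0), len(group1))
```

Each group would then get its own regression run, and the ANOVA probability for both runs is compared to the level of significance as in the final decision point above.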