Slide 1
Computers II
Slide 2
Slide 3
Multicollinearity - 1
Slide 4
Multicollinearity - 2
Slide 5
Adjusted R²
Slide 6
Influential cases
Slide 7
Slide 8
1.
2.
3.
4.
5.
Problem 1
Slide 9
True
True with caution
False
Inappropriate application of a statistic
Slide 10
Dissecting problem 1 - 1
When we test for influential cases using Cook's distance, we need to compute a critical value for comparison using the formula:
4 / (n - k - 1)
where n is the number of cases and k is the number of independent variables. The correct value (0.0160) is provided in the problem.
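As a quick check of the arithmetic, the critical value can be computed directly. This is a minimal sketch: the function name is illustrative, and the inputs (254 cases, taken from the N reported in the descriptive statistics, and 3 independent variables) are assumptions about which counts the problem used.

```python
def cooks_critical_value(n_cases: int, n_predictors: int) -> float:
    """Critical value for Cook's distance: 4 / (n - k - 1)."""
    denominator = n_cases - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 4 / denominator


# Assumed inputs: 254 cases and 3 independent variables (age, sex, sei).
critical = cooks_critical_value(254, 3)
print(f"{critical:.4f}")  # 0.0160
```

Under these assumptions the result agrees with the 0.0160 stated in the problem.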
Slide 11
Dissecting problem 1 - 2
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data. Use a level of significance of 0.05 for the regression analysis. Use a level of significance of 0.01 for evaluating assumptions. Use 0.0160 as the criterion for identifying influential cases. Validate the results of your regression analysis by splitting the sample in two, using 788035 as the random number seed.
The variables "age" [age], "sex" [sex], and "respondent's socioeconomic index" [sei] have a strong relationship to the variable "how many in family earned money" [earnrs].
Survey respondents who were older had fewer family members earning money. The variables sex and respondent's socioeconomic index did not have a relationship to how many in family earned money.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
When a problem states that there is a relationship between some independent variables and a dependent variable, we do standard multiple regression.
The variables listed first in the problem statement are the independent variables (IVs): "age" [age], "sex" [sex], and "respondent's socioeconomic index" [sei].
The variable that is the target of the relationship is the dependent variable.
Slide 12
Dissecting problem 1 - 3
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data. Use a level of significance of 0.05 for the regression analysis. Use a level of significance of 0.01 for evaluating assumptions. Use 0.0160 as the criterion for identifying influential cases. Validate the results of your regression analysis by splitting the sample in two, using 788035 as the random number seed.
The variables "age" [age], "sex" [sex], and "respondent's socioeconomic index" [sei] have a strong relationship to the variable "how many in family earned money" [earnrs].
Survey respondents who were older had fewer family members earning money. The variables sex and respondent's socioeconomic index did not have a relationship to how many in family earned money.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
In order for a problem to be true, we will have to find that there is a statistically significant relationship between the set of IVs and the DV, and the strength of the relationship stated in the problem must be correct.
In addition, the relationship or lack of relationship between the individual IVs and the DV must be identified correctly, and must be characterized correctly.
Slide 13
LEVEL OF MEASUREMENT
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data. Use a level of significance of 0.05 for the regression analysis. Use a level of significance of 0.01 for evaluating assumptions. Use 0.0160 as the criterion for identifying influential cases. Validate the results of your regression analysis by splitting the sample in two, using 788035 as the random number seed.
The variables "age" [age], "sex" [sex], and "respondent's socioeconomic index" [sei] have a strong relationship to the variable "how many in family earned money" [earnrs].
True
True with caution
False
Inappropriate application of a statistic
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.
"How many in family earned money" [earnrs]
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in multiple regression analysis.
Slide 14
Slide 15
Slide 16
          Mean     Std. Deviation   N
EARNRS    1.47     1.008            254
AGE       46.62    16.642           254
SEX       1.57     .496             254
SEI       48.601   19.1110          254
Slide 17
Slide 18
Slide 19
Slide 20
                                               Statistic   Std. Error
Mean                                           1.43        .061
95% Confidence Interval for Mean, Lower Bound  1.31
95% Confidence Interval for Mean, Upper Bound  1.56
5% Trimmed Mean                                1.37
Median                                         1.00
Variance                                       1.015
Std. Deviation                                 1.008
Minimum                                        0
Maximum                                        5
Range                                          5
Interquartile Range                            1.00
Skewness                                       .742        .149
Kurtosis                                       1.324       .296
Slide 21
The logarithmic transformation improves the normality of "how many in family earned money" [earnrs]. In evaluating normality, the skewness (-0.483) and kurtosis (0.309) were both within the range of acceptable values from -1.0 to +1.0.
The square root transformation also has values of skewness and kurtosis in the acceptable range.
However, by our order of preference for which transformation to use, the logarithm is preferred to the square root or inverse.
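The normality rule and the logarithmic transformation described here can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the transformation follows the LG10(1+EARNRS) expression that appears in the correlation output, and the untransformed skewness and kurtosis (0.742 and 1.324) are assumed to be the values reported for earnrs in the descriptives table.

```python
import math


def within_normal_range(skewness: float, kurtosis: float, limit: float = 1.0) -> bool:
    """True when both statistics fall in the acceptable range -limit..+limit."""
    return -limit <= skewness <= limit and -limit <= kurtosis <= limit


def log_transform(earners: int) -> float:
    """LG10(1 + EARNRS); 1 is added because the variable can be 0."""
    return math.log10(1 + earners)


# Assumed to be the untransformed earnrs statistics from the descriptives table.
original_ok = within_normal_range(0.742, 1.324)
# Values reported on the slide for the log-transformed variable.
logged_ok = within_normal_range(-0.483, 0.309)
print(original_ok, logged_ok)
```

Under that reading, the kurtosis of the untransformed variable is what falls outside the acceptable range, which is why a transformation is tried at all.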
Slide 22
Slide 23
Slide 24
Whenever we add transformed variables to the data set, we should be sure to delete them before starting another analysis.
Slide 25
Slide 26
                                               Statistic   Std. Error
Mean                                           45.99       1.023
95% Confidence Interval for Mean, Lower Bound  43.98
95% Confidence Interval for Mean, Upper Bound  48.00
5% Trimmed Mean                                45.31
Median                                         43.50
Variance                                       282.465
Std. Deviation                                 16.807
Minimum                                        19
Maximum                                        89
Range                                          70
Interquartile Range                            24.00
Skewness                                       .595        .148
Kurtosis                                       -.351       .295
Slide 27
Slide 28
Pearson Correlation
                                      LOGEARN   AGE       LG10(AGE)   (AGE)**2   SQRT(AGE)   -1/(AGE)
Logarithm of EARNRS [LG10(1+EARNRS)]  1         -.493**   -.417**     -.552**    -.457**     -.336**
AGE OF RESPONDENT                     -.493**   1         .979**      .983**     .995**      .916**
Logarithm of AGE [LG10(AGE)]          -.417**   .979**    1           .926**     .994**      .978**
Square of AGE [(AGE)**2]              -.552**   .983**    .926**      1          .960**      .832**
Square Root of AGE [SQRT(AGE)]        -.457**   .995**    .994**      .960**     1           .951**
Inverse of AGE [-1/(AGE)]             -.336**   .916**    .978**      .832**     .951**      1
Sig. (2-tailed) is .000 for every off-diagonal pair. N is 269 for pairs involving the logarithm of EARNRS and 270 otherwise.
The evidence of linearity in the relationship between the independent variable "age" [age] and the dependent variable "log transformation of how many in family earned money" [logearn] was the statistical significance of the correlation coefficient (r = -0.493). The probability for the correlation coefficient was <0.001, less than or equal to the level of significance of 0.01.
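The candidate transformations named in the column headings of the correlation output can be written out explicitly. The sketch below simply restates those four expressions; the function name and dictionary keys are illustrative, not part of the SPSS output.

```python
import math


def age_transformations(age: float) -> dict:
    """Return the four transformations of AGE named in the correlation output."""
    if age <= 0:
        raise ValueError("age must be positive for the log and inverse forms")
    return {
        "logarithm": math.log10(age),     # LG10(AGE)
        "square": age ** 2,               # (AGE)**2
        "square_root": math.sqrt(age),    # SQRT(AGE)
        "inverse": -1 / age,              # -1/(AGE), negated to preserve ordering
    }


print(age_transformations(100.0))
```

Each transformed variable is then correlated with the dependent variable, and the significance of r is compared with the 0.01 level used for evaluating assumptions.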
Slide 29
Slide 30
                                               Statistic   Std. Error
Mean                                           48.710      1.1994
95% Confidence Interval for Mean, Lower Bound  46.348
95% Confidence Interval for Mean, Upper Bound  51.072
5% Trimmed Mean                                47.799
Median                                         39.600
Variance                                       366.821
Std. Deviation                                 19.1526
Minimum                                        19.4
Maximum                                        97.2
Range                                          77.8
Interquartile Range                            31.100
Skewness                                       .585        .153
Kurtosis                                       -.862       .304
Slide 31
Slide 32
Pearson Correlation
                                      LOGEARN   SEI      LG10(SEI)   (SEI)**2   SQRT(SEI)   -1/(SEI)
Logarithm of EARNRS [LG10(1+EARNRS)]  1         .055     .073        .036       .064        .092
RESPONDENT'S SOCIOECONOMIC INDEX      .055      1        .987**      .988**     .997**      .948**
Logarithm of SEI [LG10(SEI)]          .073      .987**   1           .951**     .997**      .986**
Square of SEI [(SEI)**2]              .036      .988**   .951**      1          .973**      .892**
Square Root of SEI [SQRT(SEI)]        .064      .997**   .997**      .973**     1           .970**
Inverse of SEI [-1/(SEI)]             .092      .948**   .986**      .892**     .970**      1
Sig. (2-tailed) for the correlations with the logarithm of EARNRS: SEI .385, logarithm .243, square .563, square root .309, inverse .142. Sig. is .000 for every pair among SEI and its transformations. N is 269 for the logarithm of EARNRS with itself, 254 for its pairs with SEI and the transformations, and 255 otherwise.
The probability for the correlation coefficient was 0.385, greater than the level of significance of 0.01. We cannot reject the null hypothesis that r = 0, and cannot conclude that there is a linear relationship between the variables.
Since none of the transformations were successful, it is an indication that the problem may be a weak relationship, rather than a curvilinear relationship correctable by using a transformation. A weak relationship is not a violation of the assumption of linearity, and does not require a caution.
Slide 33
HomoscedasticityAssumptionAndTransformations.SBS
Slide 34
Slide 35
Slide 36
Slide 37
Slide 38
The variable containing Cook's distances for identifying influential cases has been named coo_1 by SPSS.
Slide 39
Slide 40
Slide 41
Univariate outliers
A score on the dependent variable is considered unusual if its studentized residual is bigger than 3.0.
Slide 42
Multivariate outliers
The combination of scores for the independent variables is an outlier if the probability of the Mahalanobis D² distance score is less than or equal to 0.001.
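The two outlier rules can be expressed as simple checks. This is a sketch under stated assumptions, not the SPSS procedure itself: the function names are hypothetical, "bigger than 3.0" is read as applying to the absolute value of the studentized residual, and the Mahalanobis score is assumed to be referred to a chi-square distribution with 3 degrees of freedom (one per independent variable). The closed-form tail probability used here holds only for 3 degrees of freedom.

```python
import math


def is_univariate_outlier(studentized_residual: float, cutoff: float = 3.0) -> bool:
    """Flag a case whose studentized residual exceeds the cutoff in size (assumed absolute)."""
    return abs(studentized_residual) > cutoff


def chi_square_tail_3df(d_squared: float) -> float:
    """P(X > d_squared) for a chi-square variable with exactly 3 degrees of freedom."""
    if d_squared < 0:
        raise ValueError("a squared distance cannot be negative")
    half = d_squared / 2
    return math.erfc(math.sqrt(half)) + 2 * math.sqrt(half / math.pi) * math.exp(-half)


def is_multivariate_outlier(d_squared: float, alpha: float = 0.001) -> bool:
    """Flag a case whose Mahalanobis probability is less than or equal to alpha."""
    return chi_square_tail_3df(d_squared) <= alpha


print(is_univariate_outlier(-3.2), is_multivariate_outlier(20.0))
```

With a different number of independent variables the tail probability would need a general chi-square routine rather than this special case.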
Slide 43
Influential cases
Slide 44
Slide 45
Slide 46
Slide 47
To complete the request, we click on the OK button.
Slide 48
Slide 49
Slide 50
Slide 51
Slide 52
Third, click on the OK button to complete the specifications.
Slide 53
Slide 54
Third, click on the Continue button to complete the specifications.
Slide 55
Slide 56
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .620   .384       .377                .1457258
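The adjusted R Square in this output can be reproduced approximately from R Square with the standard adjustment formula. The sketch below is an illustration, not SPSS's code; the function name is hypothetical, and n = 248 is an assumption taken from the N reported in the descriptive statistics for this model, with k = 3 independent variables.

```python
def adjusted_r_squared(r_squared: float, n_cases: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Assumed inputs: R Square = .384, 248 cases, 3 independent variables.
print(round(adjusted_r_squared(0.384, 248, 3), 3))
```

Because the input R Square is itself rounded to three decimals, the result can differ from the reported .377 in the last digit.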
Slide 57
SAMPLE SIZE
Descriptive Statistics
          Mean      Std. Deviation   N
LOGEARN   .354289   .1845814         248
AGE       46.70     16.677           248
SEX       1.57      .496             248
SEI       48.819    19.1071          248
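The sample size requirement used in this deck's decision steps is a ratio of cases to independent variables of at least 5 to 1. A minimal sketch of that check, with hypothetical function names, applied to the 248 cases and 3 independent variables shown here:

```python
def cases_per_predictor(n_cases: int, n_predictors: int) -> float:
    """Ratio of valid cases to independent variables."""
    return n_cases / n_predictors


def meets_minimum_sample_size(n_cases: int, n_predictors: int, min_ratio: float = 5.0) -> bool:
    """True when the ratio of cases to predictors is at least min_ratio to 1."""
    return cases_per_predictor(n_cases, n_predictors) >= min_ratio


ratio = cases_per_predictor(248, 3)
print(f"{ratio:.1f} to 1, requirement met: {meets_minimum_sample_size(248, 3)}")
```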
Slide 58
Slide 59
Slide 60
MULTICOLLINEARITY
Coefficients
             B       Std. Error   Beta    t         Sig.   Tolerance   VIF
(Constant)   .626    .048                 12.989    .000
AGE          -.007   .001         -.615   -12.237   .000   .999        1.001
SEX          .024    .019         .065    1.284     .200   .997        1.003
SEI          .000    .000         .018    .354      .724   .997        1.004
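Tolerance and VIF carry the same information, since VIF is the reciprocal of tolerance, and the deck's decision rule asks whether every tolerance is greater than 0.10. The sketch below applies both ideas to the tolerances in the table; the function names are illustrative.

```python
def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1 / tolerance


def has_multicollinearity(tolerances: dict, min_tolerance: float = 0.10) -> bool:
    """True if any independent variable has tolerance at or below the threshold."""
    return any(value <= min_tolerance for value in tolerances.values())


# Tolerances as reported (rounded to three decimals) in the coefficients table.
reported = {"AGE": 0.999, "SEX": 0.997, "SEI": 0.997}
for name, tolerance in reported.items():
    print(name, round(vif_from_tolerance(tolerance), 3))
print("multicollinearity:", has_multicollinearity(reported))
```

Because the tolerances are rounded, a recomputed VIF can differ from the printed one in the third decimal, as with SEI (1.003 recomputed, 1.004 reported).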
Slide 61
Slide 62
Slide 63
Slide 64
Validation analysis: set the random number seed
Slide 65
Slide 66
Validation analysis: compute the split variable
Slide 67
Slide 68
Slide 69
Slide 70
First, scroll down the list of variables and highlight the variable split.
Slide 71
Click on the Rule button to enter a value for split.
Slide 72
Slide 73
Click on the OK button to request the output.
Slide 74
Slide 75
Slide 76
Slide 77
Click on the OK button to request the output.
Slide 78
SPLIT-SAMPLE VALIDATION - 1
In both of the split-sample validation analyses, the relationship between the independent variables and the dependent variable was statistically significant.
ANOVA
             Sum of Squares   df    Mean Square   F        Sig.
Regression   1.692            3     .564          24.220   .000
Residual     2.538            109   .023
Total        4.230            112

             Sum of Squares   df    Mean Square   F        Sig.
Regression   1.500            3     .500          25.062   .000
Residual     2.614            131   .020
Total        4.114            134
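The R Square and F values for each validation analysis follow from the sums of squares and degrees of freedom in the ANOVA tables. The sketch below recomputes them with hypothetical helper names; small differences from the printed F values are expected because the table entries are rounded.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """Proportion of variance explained: regression SS over total SS."""
    return ss_regression / ss_total


def f_statistic(ss_regression: float, df_regression: int,
                ss_residual: float, df_residual: int) -> float:
    """F is the regression mean square divided by the residual mean square."""
    return (ss_regression / df_regression) / (ss_residual / df_residual)


# First ANOVA table on the slide.
print(round(r_squared(1.692, 4.230), 3), round(f_statistic(1.692, 3, 2.538, 109), 2))
# Second ANOVA table on the slide.
print(round(r_squared(1.500, 4.114), 3), round(f_statistic(1.500, 3, 2.614, 131), 2))
```

The recomputed R Square values, 0.400 and 0.365, are the ones compared against the full-sample 0.384 in the validation.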
Slide 79
SPLIT-SAMPLE VALIDATION - 2
Model Summary, Model 1
R (SPLIT = .0000, Selected)                   .632
R (SPLIT ~= .0000, Unselected)                .593
R Square                                      .400
Adjusted R Square                             .383
Std. Error of the Estimate                    .1525916
Durbin-Watson (SPLIT = .0000, Selected)       2.117
Durbin-Watson (SPLIT ~= .0000, Unselected)    1.862
b. Unless noted otherwise, statistics are based only on cases for which SPLIT = .0000.
c. Dependent Variable: LOGEARN
The total proportion of variance in the relationship utilizing the full data set was 38.4% compared to 40.0% for the first split sample validation and 36.5% for the second.
Model Summary, Model 1
R (SPLIT = 1.0000, Selected)                  .604
R (SPLIT ~= 1.0000, Unselected)               .621
R Square                                      .365
Adjusted R Square                             .350
Std. Error of the Estimate                    .1412615
Durbin-Watson (SPLIT = 1.0000, Selected)      1.839
Durbin-Watson (SPLIT ~= 1.0000, Unselected)   2.161
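The decision steps later in the deck treat residuals as independent when the Durbin-Watson statistic lies between 1.5 and 2.5. A minimal sketch of that check applied to the four values reported here; the function name and dictionary labels are illustrative, and treating the bounds as inclusive is an assumption.

```python
def errors_independent(durbin_watson: float, lower: float = 1.5, upper: float = 2.5) -> bool:
    """True when the Durbin-Watson statistic lies in the acceptable range (bounds assumed inclusive)."""
    return lower <= durbin_watson <= upper


# Durbin-Watson values from the two split-sample model summaries.
reported = {
    "split=0 selected": 2.117,
    "split=0 unselected": 1.862,
    "split=1 selected": 1.839,
    "split=1 unselected": 2.161,
}
for label, value in reported.items():
    print(label, errors_independent(value))
```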
Slide 80
Coefficients
             B       Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   .663    .077                 8.603    .000
AGE          -.007   .001         -.628   -8.429   .000   .992        1.008
SEX          .024    .029         .062    .828     .410   .989        1.011
SEI          .000    .001         -.039   -.525    .601   .985        1.015
Slide 81
Coefficients
             B       Std. Error   Beta    t        Sig.   Tolerance   VIF
(Constant)   .595    .062                 9.552    .000
AGE          -.007   .001         -.598   -8.590   .000   .999        1.001
SEX          .022    .025         .063    .907     .366   .999        1.001
SEI          .001    .001         .076    1.098    .274   .999        1.001
Slide 82
Slide 83
Slide 84
Slide 85
Slide 86
                                         Full data set       Split = 0 (Split1 = 1)   Split = 1 (Split2 = 1)
ANOVA significance (sig <= 0.05)         <0.001              <0.001                   <0.001
R²                                       0.384               0.400                    0.365
Significant Coefficients (sig <= 0.05)   Age of respondent   Age of respondent        Age of respondent
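The comparison in this summary amounts to checking that the same predictors are significant in the full analysis and in both validation analyses. The sketch below illustrates that check with the Sig. values from the three coefficients tables. The names are hypothetical, values printed as .000 are entered as 0.0 as a stand-in for "less than 0.001", and which validation table belongs to which split is an assumption that does not affect the result.

```python
def significant_predictors(p_values: dict, alpha: float = 0.05) -> set:
    """Names of predictors whose significance is at or below alpha."""
    return {name for name, p in p_values.items() if p <= alpha}


# Sig. values from the coefficients tables (.000 entered as 0.0, meaning < 0.001).
full_sample = {"age": 0.0, "sex": 0.200, "sei": 0.724}
first_validation = {"age": 0.0, "sex": 0.410, "sei": 0.601}   # assumed split = 0
second_validation = {"age": 0.0, "sex": 0.366, "sei": 0.274}  # assumed split = 1

pattern_replicates = (
    significant_predictors(full_sample)
    == significant_predictors(first_validation)
    == significant_predictors(second_validation)
)
print(significant_predictors(full_sample), pattern_replicates)
```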
Slide 87
True
True with caution
False
Inappropriate application of a statistic
Slide 88
The variables "age" [age], "sex" [sex], and "respondent's socioeconomic index" [sei] have a strong relationship to the variable "how many in family earned money" [earnrs].
Survey respondents who were older had fewer family members earning money. The variables sex and respondent's socioeconomic index did not have a relationship to how many in family earned money.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Slide 89
True
True with caution
False
Inappropriate application of a statistic
Slide 90
No
Inappropriate application of a statistic
Yes
Ratio of cases to independent variables at least 5 to 1?
Yes
Run baseline regression, using method for including variables identified in the research question. Record R² for evaluation of transformations and removal of outliers and influential cases. Record Durbin-Watson statistic for assumption of independence of errors.
No
Inappropriate application of a statistic
Slide 91
No
Try:
1. Logarithmic transformation
2. Square root transformation
3. Inverse transformation
If unsuccessful, add caution for violation of regression assumptions
Yes
Yes
No
Try:
1. Logarithmic transformation
2. Square root transformation
(3. Square transformation)
4. Inverse transformation
If unsuccessful, add caution for violation of regression assumptions
Slide 92
DV is homoscedastic for categories of dichotomous IVs?
No
Yes
Residuals are independent, Durbin-Watson between 1.5 and 2.5?
Yes
No
Slide 93
Yes
No
Ratio of cases to independent variables at least 5 to 1?
Yes
No
Slide 94
Yes
Evaluate impact of transformations and removal of outliers by running regression again, using method for including variables identified in the research question.
Yes
Pick regression with transformations and omitting outliers for interpretation
No
Pick baseline regression for interpretation
Slide 95
No
False
Yes
Tolerance for all IVs greater than 0.10, indicating no multicollinearity?
Yes
No
False
Slide 96
No
Yes
Yes
No
False
Slide 97
No
False
Yes
Change in R² statistically significant in both validation analyses? (Hierarchical only)
No
False
Yes
Yes
No
False
Slide 98
No
Yes
DV is interval level and IVs are interval level or dichotomous?
No
Yes
Yes
True
No
Slide 99
Slide 100
Slide 101
Slide 102
Slide 103