1- Correlation analysis
2- ANOVA test
3- Independent t-test
4- Paired t-test
5- One sample t-test
6- Regression Analysis
7- Binary logistic Analysis
8- Randomness
9- Normality
10- Multinomial Logistic Regression
Correlation analysis (metric-metric)
Correlations (Pearson Correlation; Sig. (2-tailed) = .000 and N = 474 for every pair)

                            Educational     Employment   Current   Beginning
                            Level (years)   Category     Salary    Salary
Educational Level (years)   1               .514**       .661**    .633**
Employment Category         .514**          1            .780**    .755**
Current Salary              .661**          .780**       1         .880**
Beginning Salary            .633**          .755**       .880**    1
A double star (**) means the correlation is significant at the 0.01 level (Sig. < .01).
A single star (*) means the correlation is significant at the 0.05 level (Sig. < .05).
No star means the correlation is not significant, so we cannot conclude that the two variables are related.
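The correlation of .514 shows a moderate, positive and significant relationship between educational level and
employment category. The strongest relationship in the table is between current salary and beginning salary (.880).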
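A minimal sketch, in plain Python rather than SPSS, of the two numbers behind each cell: Pearson's r and the t statistic (df = n - 2) that SPSS turns into the two-tailed Sig. value. The function names are illustrative, not part of any library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic (df = n - 2) for H0: the population correlation is 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# r = .514 between educational level and employment category, N = 474.
t = t_for_r(0.514, 474)
print(f"t = {t:.2f} with df = {474 - 2}")
```

A t of about 13 on 472 degrees of freedom gives a two-tailed p-value far below .01, which is why SPSS prints .000 and a double star.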
To isolate ("purify") the relationship between beginning salary and current salary, use a partial correlation.
Put beginning salary and current salary in the Variables box and all the other variables in the Controlling for box.
Partial Correlations
Control variables: all other variables
                                               Current Salary   Beginning Salary
Current Salary     Correlation                 1.000            .674
                   Significance (2-tailed)                      .000
                   df                                           464
Beginning Salary   Correlation                 .674             1.000
                   Significance (2-tailed)     .000
                   df                          464
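After controlling for the other variables, the correlation between beginning salary and current salary drops from .880 to .674: the pure relationship is still strong, positive and significant (Sig. = .000).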
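The idea of "controlling for" can be seen in the first-order partial correlation formula. A minimal sketch, assuming only ONE control variable (employment category) and taking the three correlations from the Pearson table; the SPSS output above controls for more variables, so its .674 differs from this illustration.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = current salary, y = beginning salary, z = employment category
# (values from the Pearson correlation table).
r = partial_r(0.880, 0.780, 0.755)
print(f"partial r (controlling for employment category only) = {r:.3f}")
```

Removing the part of both salaries that is shared with employment category lowers the correlation from .880 to about .71; controlling for further variables lowers it more.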
ANOVA test (the grouping variable is non-metric, the dependent variable is metric)
ANOVA
Current Salary
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   8.944E10         2     4.472E10      434.481   .000
Within Groups    4.848E10         471   1.029E8
Total            1.379E11         473
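H0: the mean current salary is the same in all three employment categories.
As Sig. = .000 < 0.05, H0 is rejected: the three groups do not all have the same mean. But which groups are alike and which differ?
To find out, check the homogeneous subsets table of the Tukey HSD post hoc test.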
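Current Salary
Tukey HSD
                                 Subset for alpha = 0.05
Employment Category    N         1              2
Clerical               363       $27,838.54
Custodial              27        $30,938.89
Manager                84                       $63,977.80
Sig.                             .227           1.000
Clerical and custodial employees fall in the same subset (Sig. = .227 > 0.05), so their mean salaries are not significantly different. Managers form a separate subset with a much higher mean salary.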
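A small sketch of where F comes from: the first function computes the sums of squares, degrees of freedom and F for raw groups (toy data, not the salary file), and the second recomputes F from the rounded ANOVA table above. Both function names are illustrative; the Tukey HSD post hoc test needs the studentized range distribution and is not reproduced here.

```python
def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f

def f_from_table(ss_between, df_between, ss_within, df_within):
    """F = Mean Square Between / Mean Square Within."""
    return (ss_between / df_between) / (ss_within / df_within)

# Toy data: three groups with means 2, 5 and 8.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# Rounded sums of squares from the SPSS ANOVA table.
print(round(f_from_table(8.944e10, 2, 4.848e10, 471), 1))
```

The table-based F comes out at about 434.5; the small gap from 434.481 is due to the rounding of the printed sums of squares.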
Independent t-test
Levene's test for equality of variances:
H0: the variances are equal
H1: the variances are not equal
Independent Samples Test (Weight)
                              Levene's Test     t-test for Equality of Means
                              F       Sig.      t       df       Sig. (2-tailed)   Mean Difference
Equal variances assumed       .221    .646      .542    13       .597              2.57143
Equal variances not assumed                     .535    11.880   .602              2.57143
Levene's Sig. (.646) is greater than 0.05, so H0 is not rejected and the variances are taken as equal; read the "Equal variances assumed" row.
There, Sig. (2-tailed) = .597 > 0.05, so the difference in mean weight is not significant: the males' mean weight is 2.57 higher than
the females', but this difference is too small to be statistically significant.
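For reference, a minimal sketch of the two rows of the table: Student's t with a pooled variance (equal variances assumed) and Welch's t with the Welch-Satterthwaite df (equal variances not assumed). It uses toy data, not the weight data, and does not compute Levene's F; function names are illustrative.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t-test assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

def welch_t(a, b):
    """Welch's t-test (equal variances not assumed); returns (t, df)."""
    v1, v2 = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df

# Toy data with equal variances, so both versions agree here.
print(pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

When the variances differ, Welch's df becomes fractional, which is why the second row of the SPSS table shows df = 11.880.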
Association analysis (crosstabs, test of independence)
For non-metric/non-metric relations
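Symmetric Measures
                                              Value   Approx. Sig.
Nominal by Nominal   Contingency Coefficient  .229    .000
N of Valid Cases                              474
H0: the two variables are independent. As Approx. Sig. = .000 < 0.05, H0 is rejected: employment category and the No/Yes variable are associated. The contingency coefficient of .229 shows that the association is weak.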
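Employment Category by No/Yes Crosstabulation
                                            No       Yes      Total
Clerical    Count                           276      87       363
            % within Employment Category    76.0%    24.0%    100.0%
            % within column (No/Yes)        74.6%    83.7%    76.6%
            % of Total                      58.2%    18.4%    76.6%
Custodial   Count                           14       13       27
            % within Employment Category    51.9%    48.1%    100.0%
            % within column (No/Yes)        3.8%     12.5%    5.7%
            % of Total                      3.0%     2.7%     5.7%
Manager     Count                           80       4        84
            % within Employment Category    95.2%    4.8%     100.0%
            % within column (No/Yes)        21.6%    3.8%     17.7%
            % of Total                      16.9%    .8%      17.7%
Total       Count                           370      104      474
            % within Employment Category    78.1%    21.9%    100.0%
            % within column (No/Yes)        100.0%   100.0%   100.0%
            % of Total                      78.1%    21.9%    100.0%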
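A minimal sketch of the test of independence behind the Symmetric Measures table: expected counts, Pearson chi-square, its df, and the contingency coefficient C = sqrt(chi2 / (chi2 + N)). The counts are from the crosstab; the Manager/Yes cell (4) is 84 - 80. The closed-form p-value exp(-chi2 / 2) is valid only because this table has df = 2.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square, df and contingency coefficient for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    c = math.sqrt(chi2 / (chi2 + n))  # contingency coefficient
    return chi2, df, c

# Rows: Clerical, Custodial, Manager; columns: No, Yes.
counts = [[276, 87], [14, 13], [80, 4]]
chi2, df, c = chi_square_independence(counts)
# Upper-tail probability of chi-square; exp(-x/2) holds only for df = 2.
p = math.exp(-chi2 / 2) if df == 2 else None
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.6f}, C = {c:.3f}")
```

The computed C rounds to .229, matching the SPSS value, and p is far below 0.05.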
Regression analysis
Linear regression
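Things to report:
1- Acceptance/rejection of the hypothesis
2- R, adjusted R square and the standard error of the estimate
3- Presence of multicollinearity, heteroscedasticity and autocorrelation
4- The regression model (equation)
5- Description (interpretation) of the model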
First of all, check the ANOVA table.
If Sig. < 0.05, the regression model is significant (it fits the data better than a model with no predictors).
ANOVA(b)
Model 1        Sum of Squares   df    Mean Square   F         Sig.
Regression     1.148E11         4     2.869E10      581.575   .000(a)
Residual       2.314E10         469   4.934E7
Total          1.379E11         473
a. Predictors: (Constant), Previous Experience (months), Beginning Salary, Educational Level (years), Employment Category
b. Dependent Variable: Current Salary
Interpretation:
As Sig. < 0.05, H0 is rejected and it is concluded that at least
one of BS, JC, EL and PE affects the current salary of the employees.
BS = Beginning Salary
JC = Job (Employment) Category
EL = Educational Level
PE = Previous Experience
Next, check the Model Summary table:
R shows the strength of the relationship between the dependent variable and the selected
independent variables.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .912   .832       .831                $7,024.152                   1.753
R = .912: there is a strong relationship between current salary and BS, JC, EL and PE
(multiple R ranges from 0 to 1, so it has no sign to interpret).
Report adjusted R square, because it corrects R square for the number of predictors.
Adjusted R square = .831 means the model explains 83.1% of the variation in current salary.
The standard error of the estimate ($7,024.152) is the typical size of a prediction error.
For example, one employee in the data has a current salary of $57,000 and the model
predicts $57,226; the residual of -$226 is well within $7,024, so the model
predicts this case well.
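The Model Summary values can be recomputed from the regression ANOVA table. A minimal sketch using the rounded sums of squares (so the last digit may differ from SPSS): R square = SS regression / SS total, adjusted R square penalises the number of predictors k, and the standard error of the estimate is the square root of the residual mean square. The function name is illustrative.

```python
import math

def model_summary(ss_reg, ss_res, ss_tot, n, k):
    """R, R square, adjusted R square, standard error of the estimate and F
    for a linear regression with k predictors and n cases."""
    df_res = n - k - 1
    r2 = ss_reg / ss_tot
    return {
        "R": math.sqrt(r2),
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / df_res,
        "SEE": math.sqrt(ss_res / df_res),
        "F": (ss_reg / k) / (ss_res / df_res),
    }

# Rounded values from the ANOVA table; k = 4 predictors (BS, JC, EL, PE).
summary = model_summary(1.148e11, 2.314e10, 1.379e11, n=474, k=4)
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```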
To check multicollinearity, look at the Collinearity Statistics in the Coefficients table.
A VIF greater than 10 means multicollinearity exists; a VIF below 10 means it does not
exist.
Coefficients(a)
                               Unstandardized          Standardized                    Collinearity
                               Coefficients            Coefficients                    Statistics
Model 1                        B           Std. Error  Beta           t        Sig.    Tolerance   VIF
(Constant)                     -3068.271   1782.508                   -1.721   .086
Educational Level (years)      601.303     155.934     .102           3.856    .000    .515        1.940
Employment Category            5930.283    640.029     .269           9.266    .000    .426        2.348
Beginning Salary               1.342       .070        .618           19.035   .000    .339        2.950
Previous Experience (months)   -19.031     3.327       -.117          -5.720   .000    .861        1.161
a. Dependent Variable: Current Salary
All VIF values are below 10, which shows that no multicollinearity exists and
all the coefficients show the pure effect of their corresponding
variable on the target variable.
Or
A tolerance value (tolerance = 1/VIF) greater than 0.1 also means no multicollinearity exists.
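VIF and tolerance are two views of the same quantity: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, and tolerance = 1 / VIF. A minimal sketch that checks the table's VIF values and, as a simplified illustration only, computes the VIF that BS would have if JC were the only other predictor (R_j^2 = .755^2 from the correlation table).

```python
def vif_from_r2(r2_j):
    """VIF of predictor j; r2_j is the R square of predictor j on the other predictors."""
    return 1 / (1 - r2_j)

def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of VIF."""
    return 1 / vif

# VIF values from the Coefficients table.
vifs = {"EL": 1.940, "JC": 2.348, "BS": 2.950, "PE": 1.161}
for name, vif in vifs.items():
    tol = tolerance_from_vif(vif)
    print(f"{name}: VIF = {vif:.3f}, tolerance = {tol:.3f}, problem = {vif > 10}")

# Simplified two-predictor illustration (BS with JC only).
print(round(vif_from_r2(0.755 ** 2), 3))
```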
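For autocorrelation, check the Durbin-Watson (D.W) statistic, which ranges from 0 to 4:
D.W close to 2: no autocorrelation.
D.W well below 2: positive autocorrelation; D.W well above 2: negative autocorrelation.
A common rule of thumb treats values between 1.5 and 2.5 as acceptable, so a value of 1.453 would suggest autocorrelation,
while the value in this model, 1.753, is close enough to 2 to conclude that there is no autocorrelation.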
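A minimal sketch of how the Durbin-Watson statistic is computed from the residuals (in time or case order), plus the rule of thumb above as a helper. The residual lists are toy examples, and the 1.5 to 2.5 band is a convention, not a formal test.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def dw_acceptable(d, low=1.5, high=2.5):
    """Common rule of thumb: low <= d <= high means no serious autocorrelation."""
    return low <= d <= high

print(durbin_watson([1, -1, 1, -1]))  # alternating signs push d towards 4
print(durbin_watson([1, 1, 1, 1]))    # identical residuals give d = 0
print(dw_acceptable(1.753), dw_acceptable(1.453))
```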
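Model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
$CS = \beta_0 + \beta_1(BS) + \beta_2(JC) + \beta_3(EL) + \beta_4(PE) + \varepsilon$
$\widehat{CS} = -3068 + 1.342(BS) + 5930(JC) + 601(EL) - 19(PE)$
The error term is dropped in the fitted (final) model.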
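Now check the Sig. values: if a term's Sig. value is greater than 0.05, exclude it from
the model.
As the constant's Sig. value (.086) is greater than 0.05, the constant is removed from
the model (strictly, the model should then be re-estimated without the constant, because the other coefficients change).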
$\widehat{CS} = 1.342(BS) + 5930(JC) + 601(EL) - 19(PE)$
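Holding the other variables constant, CS increases by $1.342 when the beginning salary increases by $1, because both
salaries are measured in dollars.
Because the PE coefficient is negative (-19.031), CS decreases by about $19 for each additional month of previous experience (equivalently, it is about $19 higher for each month less).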
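A minimal sketch of using the fitted equation for prediction. The coefficients are the B values from the Coefficients table; the employee's values (BS = 15000, JC = 1, EL = 12, PE = 24) are hypothetical and chosen only for illustration. Because the B values were estimated with the constant included, the sketch keeps the constant by default; dropping it without refitting shifts every prediction by $3,068.

```python
# Unstandardized coefficients (B) from the Coefficients table.
COEF = {"const": -3068.271, "BS": 1.342, "JC": 5930.283, "EL": 601.303, "PE": -19.031}

def predict_current_salary(bs, jc, el, pe, include_constant=True):
    """Fitted current salary from the estimated regression equation."""
    y = COEF["BS"] * bs + COEF["JC"] * jc + COEF["EL"] * el + COEF["PE"] * pe
    if include_constant:
        y += COEF["const"]
    return y

# Hypothetical employee (illustrative values only).
base = predict_current_salary(bs=15000, jc=1, el=12, pe=24)
d_bs = predict_current_salary(15001, 1, 12, 24) - base  # effect of +$1 beginning salary
d_pe = predict_current_salary(15000, 1, 12, 25) - base  # effect of +1 month experience
print(round(base, 2), round(d_bs, 3), round(d_pe, 3))
```

The two differences reproduce the interpretations above: +$1.342 per dollar of beginning salary and about -$19 per extra month of previous experience.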
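From the standardized coefficients (Beta):
Beta values are not percentages; they express each effect in standard-deviation units, so their absolute sizes show which variable is more effective.
BS (.618)
JC (.269)
EL (.102)
PE (-.117)
BS has the strongest effect on current salary, because it has the largest standardized coefficient
(.618).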
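A minimal sketch of what "standardized" means: Beta = B x (SD of X) / (SD of Y), so every predictor is measured in standard deviations. The conversion is shown on toy numbers (the SDs of the predictors are not in the output); the ranking uses the Beta values from the Coefficients table, ordered by absolute size.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope B into a standardized coefficient (Beta)."""
    return b * sd_x / sd_y

# Beta values from the Coefficients table; rank by absolute size.
betas = {"BS": 0.618, "JC": 0.269, "EL": 0.102, "PE": -0.117}
ranking = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)
print(ranking)
```

Ranked by absolute Beta, PE (|-.117|) has a slightly larger effect than EL (.102), although its effect is negative.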
Binary logistic regression
Chi-square   Sig.
11.297       .185
Omnibus Tests of Model Coefficients
          Chi-square   df   Sig.
Step      252.214      4    .000
Block     252.214      4    .000
Model     252.214      4    .000
When all covariates are entered in one step and one block (Enter method), the Step, Block and Model rows are identical.
Because Sig. = .000 < 0.05, the model with the covariates predicts the target variable (default status) significantly better
than the Block 0 model without them.
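The omnibus chi-square is a likelihood-ratio statistic: the drop in -2 log likelihood from Block 0 to the fitted model. A minimal sketch, assuming (as footnote "No terms in the model" and the printed 970.406 suggest) that Block 0 contains no constant, so every one of the 700 cases gets a predicted probability of 0.5. The -2LL of 718.192 is taken from the Model Summary table.

```python
import math

n = 700  # 517 non-defaulters + 183 defaulters
# Block 0 with no terms: every case has p = 0.5 (assumption stated above).
minus2ll_block0 = -2 * n * math.log(0.5)
minus2ll_model = 718.192  # from the Model Summary table

lr_chi_square = minus2ll_block0 - minus2ll_model
print(round(minus2ll_block0, 3), round(lr_chi_square, 3))
```

Both numbers match the output: 970.406 for Block 0 and 252.214 for the omnibus test.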
Block 0
Classification Table(a,b,c)
                                        Predicted
                                        Previously defaulted
Observed                                No     Yes    Percentage Correct
Step 0   Previously defaulted   No      0      517    .0
                                Yes     0      183    100.0
         Overall Percentage                           26.1
a. No terms in the model.
b. Initial Log-likelihood Function: -2 Log Likelihood = 970.406
c. The cut value is .500
Variables not in the Equation
                              Score     df   Sig.
Step 0   Variables   employ   195.424   1    .000
                     address  148.225   1    .000
                     income   117.972   1    .000
                     age      169.521   1    .000
         Overall Statistics   215.271   4    .000
Variables with Sig. < 0.05 (here all four) should be included, because adding them is expected to improve the prediction of default status.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      718.192(a)          .303                   .403
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.
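Nagelkerke R Square
Nagelkerke R square = .403: considering customers' age, income, address and current employer explains roughly 40.3% of the
variation in default status. It is a pseudo R square, so read it as an approximate measure of fit, not as a percentage of correct predictions.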
Cox & Snell R Square
Cox & Snell R square (.303) is never larger than Nagelkerke R square, because Nagelkerke rescales Cox & Snell so that its maximum possible value is 1.
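Both pseudo R squares follow from the two -2 log-likelihoods and N. A minimal sketch with the values from this output (970.406 for Block 0, 718.192 for the model, N = 700); the function name is illustrative.

```python
import math

def pseudo_r2(minus2ll_null, minus2ll_model, n):
    """Cox & Snell and Nagelkerke R square from -2 log-likelihoods."""
    cox_snell = 1 - math.exp((minus2ll_model - minus2ll_null) / n)
    max_cox_snell = 1 - math.exp(-minus2ll_null / n)  # largest value Cox & Snell can reach
    return cox_snell, cox_snell / max_cox_snell

cs, nag = pseudo_r2(970.406, 718.192, 700)
print(f"Cox & Snell = {cs:.3f}, Nagelkerke = {nag:.3f}")
```

Here the maximum of Cox & Snell is 0.75, so Nagelkerke is Cox & Snell divided by 0.75, which is why it is always the larger of the two.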
Variables in the Equation
                     B       S.E.   Wald     df   Sig.   Exp(B)
Step 1   employ      -.164   .023   51.984   1    .000   .849
         address     -.051   .018   8.242    1    .004   .950
         income      .013    .004   13.148   1    .000   1.013
         age         -.001   .006   .051     1    .821   .999
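A minimal sketch of the two derived columns: Wald = (B / S.E.)^2 with 1 df, and Exp(B) = e^B, the odds ratio for a one-unit increase. Because B and S.E. are printed to only three decimals, the recomputed Wald values match SPSS only roughly (for income, about 10.6 instead of 13.148); Exp(B) matches closely.

```python
import math

def wald_and_odds_ratio(b, se):
    """Wald chi-square (1 df) and odds ratio Exp(B) for a logistic coefficient."""
    return (b / se) ** 2, math.exp(b)

# B and S.E. from the Variables in the Equation table.
coefficients = {
    "employ": (-0.164, 0.023),
    "address": (-0.051, 0.018),
    "income": (0.013, 0.004),
    "age": (-0.001, 0.006),
}
for name, (b, se) in coefficients.items():
    wald, odds_ratio = wald_and_odds_ratio(b, se)
    direction = "raises" if b > 0 else "lowers"
    print(f"{name}: Wald = {wald:.3f}, Exp(B) = {odds_ratio:.3f}; one more unit {direction} the odds of default")
```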
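Classification Table(a)
                                        Predicted
                                        Previously defaulted
Observed                                No     Yes    Percentage Correct
Step 1   Previously defaulted   No      487    30     94.2
                                Yes     153    30     16.4
         Overall Percentage                           73.9
a. The cut value is .500
This model predicts the default status of a customer correctly 73.9% of the time (94.2% of non-defaulters but only 16.4% of defaulters).
In the Variables in the Equation table, a positive B (Exp(B) > 1) means the chances (odds) of default increase as the variable increases, and
a negative B (Exp(B) < 1) means they decrease. Income raises the odds of default, employ and address lower them, and age is not significant (Sig. = .821).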
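A minimal sketch of how the "Percentage Correct" column and the overall percentage are computed from a 2 x 2 classification table, checked against the Step 1 and Block 0 tables. The argument names (tn, fp, fn, tp) are illustrative.

```python
def classification_summary(tn, fp, fn, tp):
    """Percent correct for observed No, observed Yes, and overall.
    tn: observed No predicted No, fp: observed No predicted Yes,
    fn: observed Yes predicted No, tp: observed Yes predicted Yes."""
    pct_no = 100 * tn / (tn + fp)
    pct_yes = 100 * tp / (fn + tp)
    overall = 100 * (tn + tp) / (tn + fp + fn + tp)
    return round(pct_no, 1), round(pct_yes, 1), round(overall, 1)

print(classification_summary(487, 30, 153, 30))  # Step 1
print(classification_summary(0, 517, 0, 183))    # Block 0
```

Note that 73.9% is also the share of non-defaulters (517 of 700), that is, what a rule predicting "No" for every customer would achieve, so the overall percentage should be read together with the low 16.4% for defaulters.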
Runs Test
                  Test       Cases <      Cases >=     Total   Number
Variable          Value(a)   Test Value   Test Value   Cases   of Runs   Z       Asymp. Sig. (2-tailed)
Italy             8.4857     140          160          300     158       .891    .373
(2)               8.8953     125          175          300     154       .853    .394
(3)               8.1063     171          129          300     150       .229    .819
(4)               8.9553     133          167          300     152       .343    .732
(5)               8.0387     163          137          300     154       .481    .631
(6)               8.8367     131          169          300     146       -.305   .760
(7)               8.1533     163          137          300     153       .364    .716
(8)               8.5050     152          148          300     148       -.344   .731
a. Mean
The variables in this output include Italy, France, China and Enthusiast.
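H0: the data are random; H1: the data are not random.
For Italy, Sig. = .373 > 0.05, so H0 is not rejected and the data can be treated as random. All eight Sig. values are above 0.05, so the same conclusion holds for every variable.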
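A minimal sketch of the runs test with the large-sample normal approximation (no continuity correction), which reproduces Italy's Z and Sig. from its counts. The count_runs helper shows how the counts come from raw data split at a cut point such as the mean; both function names are illustrative.

```python
import math

def runs_test_from_counts(n_below, n_above, runs):
    """Runs test Z and two-tailed p-value (normal approximation)."""
    n = n_below + n_above
    expected = 2 * n_below * n_above / n + 1
    var = (2 * n_below * n_above * (2 * n_below * n_above - n)) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value from the standard normal
    return z, p

def count_runs(values, cut):
    """Classify values as below / at-or-above `cut` and count runs."""
    signs = [v >= cut for v in values]
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return signs.count(False), signs.count(True), runs

z, p = runs_test_from_counts(140, 160, 158)  # Italy
print(f"Z = {z:.3f}, p = {p:.3f}")
```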
Normality
Only for metric variables
Tests of Normality
                 Kolmogorov-Smirnov           Shapiro-Wilk
                 Statistic   df    Sig.       Statistic   df    Sig.
Current Salary   .208        474   .000       .771        474   .000
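H0: the variable is normally distributed. If the Sig. value is greater than alpha (0.05), H0 is not rejected and the variable can be treated as normal.
Here both Sig. values are .000 < 0.05, so current salary is not normally distributed.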
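A minimal sketch of the Kolmogorov-Smirnov statistic D against a normal distribution fitted with the sample mean and standard deviation, using only Python's statistics module. It computes the statistic only: SPSS attaches a Lilliefors-corrected Sig. to it, which needs special tables and is not reproduced, and the Shapiro-Wilk test is not shown. The data are a toy example.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Kolmogorov-Smirnov D against a normal distribution with the sample mean and SD."""
    x = sorted(values)
    n = len(x)
    dist = NormalDist(mean(x), stdev(x))
    d_plus = max((i + 1) / n - dist.cdf(v) for i, v in enumerate(x))
    d_minus = max(dist.cdf(v) - i / n for i, v in enumerate(x))
    return max(d_plus, d_minus)

print(round(ks_statistic_normal([1, 2, 3, 4, 5]), 4))
```

The larger D is, the further the empirical distribution is from the fitted normal curve; for current salary SPSS reports D = .208.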
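Descriptives
Current Salary                                   Statistic      Std. Error
Mean                                             $34,419.57     $784.311
95% Confidence Interval for Mean   Lower Bound   $32,878.40
                                   Upper Bound   $35,960.73
5% Trimmed Mean                                  $32,455.19
Median                                           $28,875.00
Variance                                         2.916E8
Std. Deviation                                   $17,075.661
Minimum                                          $15,750
Maximum                                          $135,000
Range                                            $119,250
Interquartile Range                              $13,163
Skewness                                         2.125          .112
Kurtosis                                         5.378          .224
The mean is well above the median and the skewness (2.125) is large and positive, so current salary is skewed to the right, which agrees with the normality tests.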
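The standard errors printed next to skewness and kurtosis depend only on N. A minimal sketch of the usual formulas (they reproduce the printed .112 and .224 for N = 474), and of the common rule of thumb that |statistic / standard error| > 1.96 signals a departure from normality.

```python
import math

def se_skewness(n):
    """Standard error of skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))

n = 474
z_skew = 2.125 / se_skewness(n)
z_kurt = 5.378 / se_kurtosis(n)
print(f"SE skew = {se_skewness(n):.3f}, z = {z_skew:.1f}; SE kurt = {se_kurtosis(n):.3f}, z = {z_kurt:.1f}")
```

Both ratios are far above 1.96, consistent with the Kolmogorov-Smirnov and Shapiro-Wilk results.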
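One sample t-test
One-Sample Test
Test Value = 30000
                                                                    95% Confidence Interval of the Difference
                 t       df    Sig. (2-tailed)   Mean Difference    Lower          Upper
Current Salary   5.635   473   .000              $4,419.568         $2,878.40      $5,960.73
H0: the mean current salary is $30,000. As Sig. = .000 < 0.05, H0 is rejected: the mean current salary is significantly higher than $30,000, by $4,419.57 on average (95% CI: $2,878.40 to $5,960.73).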
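A minimal sketch that recomputes the one-sample test from the Descriptives (mean $34,419.57, SD $17,075.661, N = 474): t = (mean - test value) / (SD / sqrt(N)). The critical t of about 1.965 for 473 df is an assumed approximate value, since the standard library has no t-distribution quantile function.

```python
import math

def one_sample_t(sample_mean, sd, n, test_value):
    """t statistic, df and standard error for H0: population mean = test_value."""
    se = sd / math.sqrt(n)
    return (sample_mean - test_value) / se, n - 1, se

t, df, se = one_sample_t(34419.57, 17075.661, 474, 30000)
t_crit = 1.965  # assumed approximate two-tailed 5% critical value for 473 df
diff = 34419.57 - 30000
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"t = {t:.3f}, df = {df}, 95% CI of the difference = [{lower:.2f}, {upper:.2f}]")
```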
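Paired t-test
Paired Samples Correlations
                                N   Correlation   Sig.
Pair 1   VAR00001 & VAR00002    8   .946          .000

Paired Samples Test
                               Paired Differences
                                          Std.        Std. Error   95% CI of the Difference
                               Mean       Deviation   Mean         Lower      Upper       t       df   Sig. (2-tailed)
Pair 1   VAR00001 - VAR00002   1.87500    3.09089     1.09279      -.70904    4.45904     1.716   7    .130
H0: there is no mean difference between VAR00001 and VAR00002. As Sig. = .130 > 0.05, H0 is not rejected: the mean difference of 1.875 is not significant, and the 95% confidence interval (-.709 to 4.459) includes 0. The two variables are, however, strongly correlated (.946, Sig. = .000).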
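A paired t-test is a one-sample t-test on the differences. A minimal sketch with toy data, plus a check against the printed summary (mean difference 1.875, SD 3.09089, and N = 8, inferred from df = 7). The function name is illustrative.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t-test on the differences x - y; returns (mean_diff, sd, se, t, df)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    md, sd = mean(d), stdev(d)
    se = sd / math.sqrt(n)
    return md, sd, se, md / se, n - 1

print(paired_t([3, 5, 7], [1, 2, 3]))  # toy data: differences 2, 3, 4

# Check against the SPSS summary: t = mean difference / (SD / sqrt(N)).
t_summary = 1.875 / (3.09089 / math.sqrt(8))
print(round(t_summary, 3))
```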
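Multinomial Logistic Regression
Classification
                                Predicted
                                Anorexia                      Atypical Eating   Percent
Observed                        Nervosa    (2)      (3)       Disorder          Correct
Anorexia Nervosa                97         0        0         0                 100.0%
(2)                             0          36       0         0                 100.0%
(3)                             0          0        56        0                 100.0%
Atypical Eating Disorder        0          0        0         28                100.0%
Overall Percentage              44.7%      16.6%    25.8%     12.9%             100.0%
Every case falls on the diagonal, so the model classifies all 217 cases correctly (100.0% in each category).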
ABN, BNA
Ref: AED (reference category: Atypical Eating Disorder)
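A minimal sketch of how the multinomial classification table is summarised: percent correct per observed category, the "Overall Percentage" row (share of cases predicted in each category) and the overall percent correct. The confusion matrix is rebuilt from the diagonal counts 97, 36, 56 and 28, with zeros elsewhere because every category is 100% correct.

```python
def classification_percentages(confusion):
    """confusion[i][j] = cases observed in category i and predicted as category j.
    Returns (percent correct per observed category, predicted-category shares, overall % correct)."""
    total = sum(sum(row) for row in confusion)
    per_class = [100 * row[i] / sum(row) for i, row in enumerate(confusion)]
    shares = [100 * sum(col) / total for col in zip(*confusion)]
    overall = 100 * sum(confusion[i][i] for i in range(len(confusion))) / total
    return per_class, shares, overall

counts = [97, 36, 56, 28]
confusion = [[c if i == j else 0 for j in range(4)] for i, c in enumerate(counts)]
per_class, shares, overall = classification_percentages(confusion)
print([round(s, 1) for s in shares], overall)
```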