
Tests explained

1- Correlation analysis
2- ANOVA test
3- Independent t-test
4- Paired t-test
5- One sample t-test
6- Regression Analysis
7- Binary logistic Analysis
8- Randomness
9- Normality
10- Multinomial Logistic Regression
Correlation analysis (metric-metric)

Correlations
                                                 Educational     Employment   Current   Beginning
                                                 Level (years)   Category     Salary    Salary
Educational Level (years)   Pearson Correlation  1               .514**       .661**    .633**
                            Sig. (2-tailed)                      .000         .000      .000
                            N                    474             474          474       474
Employment Category         Pearson Correlation  .514**          1            .780**    .755**
                            Sig. (2-tailed)      .000                         .000      .000
                            N                    474             474          474       474
Current Salary              Pearson Correlation  .661**          .780**       1         .880**
                            Sig. (2-tailed)      .000            .000                   .000
                            N                    474             474          474       474
Beginning Salary            Pearson Correlation  .633**          .755**       .880**    1
                            Sig. (2-tailed)      .000            .000         .000
                            N                    474             474          474       474
**. Correlation is significant at the 0.01 level (2-tailed).

A double star (**) shows that the correlation is significant at the 0.01 level, i.e. we are 99% confident that the relation is not due to chance.
A single star (*) shows that the correlation is significant at the 0.05 level (95% confidence).
No star means the correlation is insignificant, so no relation between the two variables can be claimed.
Example: .514** shows a significant, moderate and positive relation between educational level and employment category.
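
The stars can be checked by hand. The following Python sketch converts a correlation and its sample size into a t statistic with N - 2 degrees of freedom; `pearson_t` is a hypothetical helper name used only for this illustration. For a sample this large, an absolute t well above roughly 2.6 corresponds to significance at the 0.01 level.

```python
import math

def pearson_t(r: float, n: int) -> float:
    """t statistic for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# r and N for educational level vs. employment category, taken from the table
t_educ_jobcat = pearson_t(0.514, 474)
print(round(t_educ_jobcat, 2))
```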
To isolate the relation between beginning salary and current salary from the influence of the other variables, use partial correlation.
Put beginning salary and current salary in Variables and all the other variables in Controlling for.

Correlations
Control Variables: Employee Code & Date of Birth & Educational Level (years) & Employment Category & Months since Hire & Previous Experience (months) & Minority Classification

                                             Current Salary   Beginning Salary
Current Salary      Correlation              1.000            .674
                    Significance (2-tailed)                   .000
                    df                                        464
Beginning Salary    Correlation              .674             1.000
                    Significance (2-tailed)  .000
                    df                       464

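With the other variables controlled, the pure (partial) relation between beginning salary and current salary is .674 (sig. .000), lower than the uncontrolled correlation of .880.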
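
To show how controlling works, here is a minimal Python sketch of the first-order partial correlation formula, which removes the influence of a single control variable. It uses the three correlations from the first Correlations table and controls for educational level only, so it is not expected to reproduce the .674 above, which controls for seven variables at once. `partial_r` is a hypothetical helper name.

```python
import math

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# x = current salary, y = beginning salary, z = educational level (years)
r_partial = partial_r(r_xy=0.880, r_xz=0.661, r_yz=0.633)
print(round(r_partial, 3))
```

The result is smaller than .880 because part of the raw correlation is shared with educational level.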
ANOVA test (the grouping variable is non-metric)

ANOVA
Current Salary
                  Sum of Squares   df    Mean Square   F         Sig.
Between Groups    8.944E10         2     4.472E10      434.481   .000
Within Groups     4.848E10         471   1.029E8
Total             1.379E11         473

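As the sig. value is .000, H0 is rejected and the three groups are not all the same. But which two are the same?
Check the homogeneous subsets table: groups listed in the same subset do not differ significantly. Clerical and Custodial share subset 1 (sig. .227), so their means are the same, while Manager is alone in subset 2.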

Current Salary
Tukey HSD(a,b)
                               Subset for alpha = 0.05
Employment Category    N       1             2
Clerical               363     $27,838.54
Custodial              27      $30,938.89
Manager                84                    $63,977.80
Sig.                           .227          1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 58.031.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
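
The F value in the ANOVA table and the harmonic mean sample size in footnote a can be re-derived from the printed figures. This Python sketch assumes a between-groups df of 2 (three groups, 473 - 471); small differences from the printed F come from the rounded sums of squares.

```python
from statistics import harmonic_mean

ss_between, df_between = 8.944e10, 2   # df assumed: three groups minus one
ss_within, df_within = 4.848e10, 471

ms_between = ss_between / df_between   # mean square = sum of squares / df
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

group_sizes = [363, 27, 84]            # Clerical, Custodial, Manager
hm_size = harmonic_mean(group_sizes)

print(round(f_stat, 1), round(hm_size, 3))
```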

Independent t-test
Levene's test:
H0: the variances are equal
H1: the variances are not equal

Independent Samples Test

Levene's Test for Equality of Variances (Weight): F = .221, Sig. = .646

t-test for Equality of Means (Weight)
                                                                                            95% Confidence Interval of the Difference
                               t      df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower       Upper
Equal variances assumed        .542   13       .597              2.57143           4.74714                 -7.68414    12.82700
Equal variances not assumed    .535   11.880   .602              2.57143           4.80380                 -7.90694    13.04980

As the sig. value of Levene's test (.646) is greater than 0.05, H0 is accepted and the variances are taken as equal, so read the "Equal variances assumed" row.
The difference in means is insignificant (sig. .597 > 0.05), so it is concluded that the average for males is only insignificantly higher than for females, i.e. the two groups do not really differ.
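
The t value and the confidence interval in the "Equal variances assumed" row follow directly from the mean difference and its standard error. A minimal Python sketch, in which the critical value `t_crit_13` is an assumption taken from a standard t table (two-tailed 95%, df = 13) rather than from the SPSS output:

```python
mean_diff = 2.57143
se_diff = 4.74714        # Std. Error Difference, equal variances assumed
t_crit_13 = 2.160369     # assumed table value of t(0.975, df = 13)

t_stat = mean_diff / se_diff
ci_lower = mean_diff - t_crit_13 * se_diff
ci_upper = mean_diff + t_crit_13 * se_diff

# The interval contains zero, which matches the insignificant result.
print(round(t_stat, 3), round(ci_lower, 3), round(ci_upper, 3))
```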
Association analysis (crosstab or cross measuring, test of independence)
For a non-metric/non-metric relation

Symmetric Measures
                                                Value   Approx. Sig.
Nominal by Nominal    Contingency Coefficient   .229    .000
N of Valid Cases                                474

The contingency coefficient is .229 (a coefficient, not a percentage), which is weak. It is always non-negative, so it shows strength only, not direction. It is significant as the sig. value (.000) is below 0.05.
Reporting:
There exists a significant but weak relation between employment category and minority classification.

Employment Category * Minority Classification Crosstabulation
                                                    Minority Classification
Employment Category                                 No        Yes       Total
Clerical     Count                                  276       87        363
             % within Employment Category           76.0%     24.0%     100.0%
             % within Minority Classification       74.6%     83.7%     76.6%
             % of Total                             58.2%     18.4%     76.6%
Custodial    Count                                  14        13        27
             % within Employment Category           51.9%     48.1%     100.0%
             % within Minority Classification       3.8%      12.5%     5.7%
             % of Total                             3.0%      2.7%      5.7%
Manager      Count                                  80        4         84
             % within Employment Category           95.2%     4.8%      100.0%
             % within Minority Classification       21.6%     3.8%      17.7%
             % of Total                             16.9%     .8%       17.7%
Total        Count                                  370       104       474
             % within Employment Category           78.1%     21.9%     100.0%
             % within Minority Classification       100.0%    100.0%    100.0%
             % of Total                             78.1%     21.9%     100.0%

If employment category is the banner (row percentages), report:

Of the clerical employees, 76.0% do not belong to a minority and 24.0% belong to a minority.
Of the custodial employees, 51.9% do not belong to a minority and 48.1% belong to a minority. And so on.
If minority classification is the banner (column percentages), report:
Of the employees who do not belong to a minority, 74.6% are in the clerical employment category.
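
The contingency coefficient and the row percentages can be recomputed from the crosstab counts. This Python sketch computes the Pearson chi-square from observed and expected counts and then C = sqrt(chi-square / (chi-square + N)). The Manager/Yes count of 4 is an assumption derived as 84 - 80, since that cell is not legible in the output.

```python
import math

# Rows: Clerical, Custodial, Manager; columns: minority No, Yes.
# The Manager/Yes count (4) is derived as 84 - 80.
observed = [[276, 87], [14, 13], [80, 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
contingency_c = math.sqrt(chi_square / (chi_square + n))

# Row percentages for Clerical ("% within Employment Category")
row_pct_clerical = [100 * count / row_totals[0] for count in observed[0]]

print(round(chi_square, 2), round(contingency_c, 3))
```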

Regression analysis
Linear regression
Things to report:
1- Acceptance/rejection of the hypothesis
2- R, adjusted R-square, S.E.
3- Presence of multicollinearity, heteroscedasticity, autocorrelation
4- Model (regression model)
5- Model description
First of all check the ANOVA table.
If the sig. value is less than 0.05, then the regression is fitted.

ANOVA(b)
Model 1       Sum of Squares   df    Mean Square   F         Sig.
Regression    1.148E11         4     2.869E10      581.575   .000(a)
Residual      2.314E10         469   4.934E7
Total         1.379E11         473
a. Predictors: (Constant), Previous Experience (months), Beginning Salary, Educational Level (years), Employment Category
b. Dependent Variable: Current Salary

Interpretation:
As the sig. value is less than 0.05, reject H0 and conclude that at least one of BS, JC, EL, PE affects the current salary of the employees.
BS = Beginning Salary
JC = Job Category (Employment Category)
EL = Educational Level
PE = Previous Experience

Next, the Model Summary table:
R shows the strength of the relationship between the dependent variable and the selected independent variables.

Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .912(a)   .832       .831                $7,024.152                   1.753
a. Predictors: (Constant), Previous Experience (months), Beginning Salary, Educational Level (years), Employment Category
b. Dependent Variable: Current Salary

R = .912: there exists a strong relationship (do not focus on the sign) between current salary and BS, JC, EL, PE.
Adjusted R-square (as it shows unbiased accuracy)
Adjusted R-square is .831, which means the model explains 83.1% of the variation in current salary.
S.E. is the standard error of the estimate, used here as the allowed margin of error.
Go to the data and check a salary value: the actual salary is 57000 and our model predicted 57226, a difference of -226, which is within 7024 (S.E.), so the model is right for that case.
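
R Square and Adjusted R Square can be recovered from the regression ANOVA table, and the residual check described above is simple arithmetic. A minimal Python sketch, assuming n = 474 cases and 4 predictors as listed in the table footnotes:

```python
ss_regression, ss_total = 1.148e11, 1.379e11
n_cases, n_predictors = 474, 4

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n_cases - 1) / (n_cases - n_predictors - 1)

# Residual check from the notes: actual 57000, predicted 57226
std_error_estimate = 7024.152
residual = 57000 - 57226
within_se = abs(residual) <= std_error_estimate

print(round(r_square, 3), round(adj_r_square, 3), residual, within_se)
```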
To check multicollinearity, go to the Coefficients table.
A VIF greater than 10 means multicollinearity exists, and less than 10 means it does not exist.

Coefficients(a)
                                  Unstandardized Coefficients   Standardized Coefficients                      Collinearity Statistics
Model 1                           B            Std. Error       Beta                        t         Sig.    Tolerance   VIF
(Constant)                        -3068.271    1782.508                                     -1.721    .086
Educational Level (years)         601.303      155.934          .102                        3.856     .000    .515        1.940
Employment Category               5930.283     640.029          .269                        9.266     .000    .426        2.348
Beginning Salary                  1.342        .070             .618                        19.035    .000    .339        2.950
Previous Experience (months)      -19.031      3.327            -.117                       -5.720    .000    .861        1.161
a. Dependent Variable: Current Salary

All of the VIF values are less than 10, which shows that no multicollinearity exists and all the coefficients show the pure effect of their corresponding variable on the target variable.
Or
A tolerance value greater than 0.1 means no multicollinearity exists.
For autocorrelation, use Durbin-Watson:
D.W. = 2 means no autocorrelation.
D.W. not equal to 2 means autocorrelation exists.
In practice, values close to 2 are treated as equal to 2: 1.453 is not treated as equal to 2, whereas 1.753 (the value here) is treated as approximately equal to 2, so there is no autocorrelation.
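
Two of these checks are easy to reproduce. VIF is the reciprocal of tolerance, and the Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals. In this Python sketch `durbin_watson` is a hypothetical helper shown on made-up residuals, because the actual residuals are not part of these notes.

```python
# Tolerance values from the Coefficients table
tolerance = {"EL": 0.515, "JC": 0.426, "BS": 0.339, "PE": 0.861}

vif = {name: 1 / tol for name, tol in tolerance.items()}
no_multicollinearity = all(value < 10 for value in vif.values())

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals that alternate in sign: strong negative autocorrelation
dw_example = durbin_watson([1, -1, 1, -1])

print({name: round(value, 3) for name, value in vif.items()}, dw_example)
```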

Model:
Y = α + β1X1 + β2X2 + β3X3 + β4X4 + ε
CS = α + β1 (BS) + β2 (JC) + β3 (EL) + β4 (PE) + ε
CS = -3068 + 1.342 (BS) + 5930 (JC) + 601 (EL) - 19 (PE), removing the error term in the final model.
Now check the sig. values: if a sig. value is greater than 0.05, then exclude that term from the model.

As the sig. value of the constant (.086) is greater than 0.05, remove the constant from the model:
CS = 1.34 (BS) + 5930 (JC) + 601 (EL) - 19 (PE)
CS will increase by $1.34 if the beginning salary increases by $1, because salary is measured in units of $1.
CS will decrease by $19 if the previous experience increases by 1 month.
From the standardized coefficients (Beta):
They show the relative strength of each predictor and tell us which one is more effective.
BS (.618)
JC (.269)
EL (.102)
PE (-.117)
BS has the strongest effect on current salary because its standardized coefficient (.618, about 61%) is the largest in absolute value.
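
The fitted model can be applied to a single employee as follows. This Python sketch uses the unstandardized coefficients from the Coefficients table; the input values (beginning salary 27000, job category 3, 15 years of education, 144 months of experience) are hypothetical and chosen only for illustration, and `predict_cs` is a hypothetical helper name.

```python
coef = {"const": -3068.271, "BS": 1.342, "JC": 5930.283, "EL": 601.303, "PE": -19.031}

def predict_cs(bs, jc, el, pe, with_constant=True):
    """Predicted current salary from the linear regression model."""
    value = coef["BS"] * bs + coef["JC"] * jc + coef["EL"] * el + coef["PE"] * pe
    return value + coef["const"] if with_constant else value

# Hypothetical employee (values not taken from the notes)
predicted = predict_cs(bs=27000, jc=3, el=15, pe=144)
predicted_no_const = predict_cs(bs=27000, jc=3, el=15, pe=144, with_constant=False)

# Rank predictors by the absolute size of the standardized coefficient
betas = {"BS": 0.618, "JC": 0.269, "EL": 0.102, "PE": -0.117}
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

print(round(predicted, 2), round(predicted_no_const, 2), ranked)
```

Dropping the constant shifts every prediction by 3068.271, so the two versions of the model do not give the same fitted values.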

Binary logistic regression

Effect of income, age, education, years at current address and years with current employer on the default status of a customer. File name: bankloan.sav
Dependent variable: default status (yes/no, binary)
Things to report:
1- Goodness of fit
2- Omnibus test of model coefficients (Block 1)
3- Block 0
4- Model Summary
5- Block 1
Goodness of fit comes from the Hosmer and Lemeshow test.
If the sig. value is greater than 0.05, then the model is fit.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
       11.297            .185

The model is supported by the test, as the sig. value is 0.185, which is greater than 0.05, so the model fits the data. If the value is less than 0.05, the model does not fit, so stop the analysis there.
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    252.214           .000
         Block   252.214           .000
         Model   252.214           .000

The sig. value is the same for all three rows (step, block and model) when all the variables are entered in a single step.
In this case the model has a significant ability to explain the target variable from the covariates, because the sig. value is less than 0.05.
Block 0
Classification Table(a,b,c)
                                        Predicted
                                        Previously defaulted     Percentage
Observed                                No        Yes            Correct
Step 0   Previously defaulted   No      0         517            .0
                                Yes     0         183            100.0
         Overall Percentage                                      26.1
a. No terms in the model.
b. Initial Log-likelihood Function: -2 Log Likelihood = 970.406
c. The cut value is .500

The information in this block refers to the baseline (chance) model, i.e. prediction without any predictor.
The cut-off value is .500.
The result shows that the baseline model predicts the default status correctly in only 26.1% of cases.

Variables not in the Equation
                                Score     df   Sig.
Step 0   Variables   employ     195.424        .000
                     address    148.225        .000
                     income     117.972        .000
                     age        169.521        .000
         Overall Statistics     215.271        .000

Variables with a sig. value less than 0.05 must be included in the model in order to better predict the default status.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
       718.192(a)          .303                   .403
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

Nagelkerke R Square
It is a pseudo R-square: it shows that prediction of defaulters may improve by approximately 40.3% by considering their age, income, address and current employer.
Cox & Snell R Square
Cox & Snell R Square is always less than Nagelkerke R Square.
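
The omnibus chi-square and both pseudo R-square values can be re-derived from the two -2 log likelihood figures. This Python sketch assumes n = 700 cases (517 + 183 in the Block 0 classification table) and uses the usual definitions: Cox & Snell = 1 - exp(-chi-square / n), and Nagelkerke = Cox & Snell divided by its maximum possible value.

```python
import math

n_cases = 700            # assumed: 517 + 183 cases in the classification table
neg2ll_null = 970.406    # Block 0, footnote b
neg2ll_model = 718.192   # Model Summary

model_chi_square = neg2ll_null - neg2ll_model
cox_snell = 1 - math.exp(-model_chi_square / n_cases)
max_cox_snell = 1 - math.exp(-neg2ll_null / n_cases)
nagelkerke = cox_snell / max_cox_snell

print(round(model_chi_square, 3), round(cox_snell, 3), round(nagelkerke, 3))
```

Because the maximum of Cox & Snell is below 1, dividing by it is what makes Nagelkerke the larger of the two.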

Variables in the Equation
                       B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)   employ     -.164   .023   51.984        .000   .849
            address    -.051   .018   8.242         .004   .950
            income     .013    .004   13.148        .000   1.013
            age        -.001   .006   .051          .821   .999
a. Variable(s) entered on step 1: employ, address, income, age.

Lf = -0.001 (age) - 0.17 (ed) - 0.165 (emp) - 0.051 (add) + 0.014 (income)

On the basis of the sig. values, include only the significant variables:
Lf = -0.165 (emp) - 0.051 (add) + 0.014 (income)

Classification Table(a)
                                        Predicted
                                        Previously defaulted     Percentage
Observed                                No        Yes            Correct
Step 1   Previously defaulted   No      487       30             94.2
                                Yes     153       30             16.4
         Overall Percentage                                      73.9
a. The cut value is .500

This model can predict the default status of a customer with 73.9% accuracy.
A positive (+) coefficient means the chances of default increase, and a negative (-) coefficient means the chances decrease.
The odds of default are reduced by about 5% for each additional year at the current address (coefficient -0.051, Exp(B) = .950).
The odds of default increase by about 1.4% for each additional thousand of income (coefficient 0.014).
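
The Exp(B) column and the overall percentage can be verified with a few lines. This Python sketch uses the B values exactly as printed in the Variables in the Equation table; `logistic` is a hypothetical helper that turns a logit value Lf into a predicted probability, which is then compared with the .500 cut value.

```python
import math

# B values as printed in the Variables in the Equation table
b = {"employ": -0.164, "address": -0.051, "income": 0.013, "age": -0.001}
odds_ratio = {name: math.exp(value) for name, value in b.items()}  # Exp(B)

def logistic(lf):
    """Predicted probability of default for a given logit value Lf."""
    return 1 / (1 + math.exp(-lf))

# Overall percentage correct from the Step 1 classification table
correct = 487 + 30
total = 487 + 30 + 153 + 30
accuracy = 100 * correct / total

print({name: round(value, 3) for name, value in odds_ratio.items()}, round(accuracy, 1))
```

An Exp(B) below 1 lowers the odds of default and an Exp(B) above 1 raises them, which matches the sign rule in the notes.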
Randomness
H0: the data are random
H1: the data are not random

Runs Test
                 Test Value(a)   Cases < Test Value   Cases >= Test Value   Total Cases   Number of Runs   Z       Asymp. Sig. (2-tailed)
Italy            8.4857          140                  160                   300           158              .891    .373
South Korea      8.8953          125                  175                   300           154              .853    .394
Romania          8.1063          171                  129                   300           150              .229    .819
France           8.9553          133                  167                   300           152              .343    .732
China            8.0387          163                  137                   300           154              .481    .631
United States    8.8367          131                  169                   300           146              -.305   .760
Russia           8.1533          163                  137                   300           153              .364    .716
Enthusiast       8.5050          152                  148                   300           148              -.344   .731
a. Mean

For Italy the sig. value is 0.373, which is greater than 0.05; therefore H0 is accepted (not rejected) and the data are random.
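
The Z and sig. values in the Runs Test table can be reproduced from the counts alone. This Python sketch uses the large-sample normal approximation without a continuity correction, which is an assumption that happens to match the printed figures for Italy and South Korea; `runs_test_z` and `two_tailed_p` are hypothetical helper names.

```python
import math

def runs_test_z(n_below, n_above, runs):
    """Z statistic of the runs test (normal approximation, no continuity correction)."""
    n = n_below + n_above
    expected = 2 * n_below * n_above / n + 1
    variance = 2 * n_below * n_above * (2 * n_below * n_above - n) / (n**2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)

def two_tailed_p(z):
    """Two-tailed p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

z_italy = runs_test_z(n_below=140, n_above=160, runs=158)
p_italy = two_tailed_p(z_italy)
z_south_korea = runs_test_z(n_below=125, n_above=175, runs=154)

print(round(z_italy, 3), round(p_italy, 3), round(z_south_korea, 3))
```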
Normality
Only for metric variables

Tests of Normality
                  Kolmogorov-Smirnov(a)            Shapiro-Wilk
                  Statistic   df    Sig.           Statistic   df    Sig.
Current Salary    .208        474   .000           .771        474   .000
a. Lilliefors Significance Correction

If the sig. value is greater than alpha (0.05), then the data are normal. In this case it is less than 0.05, which means the data are not normally distributed.

Descriptives
Current Salary                                        Statistic      Std. Error
Mean                                                  $34,419.57     $784.311
95% Confidence Interval for Mean    Lower Bound       $32,878.40
                                    Upper Bound       $35,960.73
5% Trimmed Mean                                       $32,455.19
Median                                                $28,875.00
Variance                                              2.916E8
Std. Deviation                                        $17,075.661
Minimum                                               $15,750
Maximum                                               $135,000
Range                                                 $119,250
Interquartile Range                                   $13,163
Skewness                                              2.125          .112
Kurtosis                                              5.378          .224

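If the value of skewness is 0 (or close to it), the distribution is symmetric like a normal distribution; otherwise it is not normal. Here the skewness is 2.125, so current salary is strongly right-skewed.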
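
The Std. Error column can be put to use here. This Python sketch first recomputes the standard error of the mean from the standard deviation and N = 474, then divides skewness and kurtosis by their standard errors. Comparing these ratios with about 2 in absolute value is a common rule of thumb, stated here as an assumption rather than something taken from the SPSS output.

```python
import math

n_cases = 474
std_dev = 17075.661
skewness, se_skewness = 2.125, 0.112
kurtosis, se_kurtosis = 5.378, 0.224

se_mean = std_dev / math.sqrt(n_cases)   # standard error of the mean
z_skewness = skewness / se_skewness
z_kurtosis = kurtosis / se_kurtosis

# Rule-of-thumb assumption: |ratio| above about 2 suggests non-normality
looks_normal = abs(z_skewness) < 2 and abs(z_kurtosis) < 2

print(round(se_mean, 3), round(z_skewness, 2), round(z_kurtosis, 2), looks_normal)
```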


One sample t-test
Claim was 30000

One-Sample Test
                  Test Value = 30000
                                                                      95% Confidence Interval of the Difference
                  t       df    Sig. (2-tailed)   Mean Difference    Lower        Upper
Current Salary    5.635   473   .000              $4,419.568         $2,878.40    $5,960.73

The t-value (5.635) is greater than 2, therefore reject the claim.

After applying the one-sample t-test, it is identified that there exists a difference of $4,419 between the claim value and the sample average, which is significant (sig. 0.000). Therefore it is concluded that the average salary of the employees is significantly higher than the claim value, and the claim is considered wrong.
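
The t value is the mean difference divided by the standard error of the mean. A minimal Python sketch using the sample mean and standard error from the Descriptives table and the "greater than 2" rule from the notes:

```python
claim = 30000
sample_mean = 34419.57   # mean of Current Salary from the Descriptives table
se_mean = 784.311        # Std. Error of the mean from the Descriptives table

mean_difference = sample_mean - claim
t_stat = mean_difference / se_mean
reject_claim = abs(t_stat) > 2   # rule of thumb used in the notes

print(round(mean_difference, 2), round(t_stat, 3), reject_claim)
```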

Paired t-test

Paired Samples Correlations
                                N    Correlation   Sig.
Pair 1   VAR00001 & VAR00002    8    .946          .000

The correlation of .946 is strong, and the sig. value of 0.000 shows that it is significant.

Paired Samples Test
                                 Paired Differences
                                                                           95% Confidence Interval of the Difference
                                 Mean      Std. Deviation   Std. Error Mean   Lower     Upper      t       df   Sig. (2-tailed)
Pair 1   VAR00001 - VAR00002     1.87500   3.09089          1.09279           -.70904   4.45904    1.716   7    .130

Std. error of the mean difference: 1.09279, which is small.

The t-value (1.716) is less than 2, so H0 is accepted (not rejected).
It is identified that the before average is only insignificantly higher than the after average (sig. .130 > 0.05), so we can say that both averages are not significantly different from each other, whereas the two variables have a significant strong relation between them.
The 95% confidence interval of the difference (-.70904 to 4.45904) includes zero, which also shows that the difference is insignificant; the mean difference (1.87500) is also less than the standard deviation of the differences (3.09089).
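
Every figure in the Paired Samples Test row follows from the mean and standard deviation of the differences. In this Python sketch the number of pairs (8) is inferred from df = 7, and the critical value `t_crit_7` is an assumption taken from a standard t table (two-tailed 95%, df = 7).

```python
import math

n_pairs = 8            # inferred from df = 7 (n = df + 1)
mean_diff = 1.87500
sd_diff = 3.09089
t_crit_7 = 2.364624    # assumed table value of t(0.975, df = 7)

se_diff = sd_diff / math.sqrt(n_pairs)
t_stat = mean_diff / se_diff
ci = (mean_diff - t_crit_7 * se_diff, mean_diff + t_crit_7 * se_diff)
ci_contains_zero = ci[0] < 0 < ci[1]   # True means the difference is insignificant

print(round(se_diff, 5), round(t_stat, 3), ci_contains_zero)
```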

Multinomial logistic regression

This is a regression used to predict/model the dependent variable when it is non-metric with more than 2 categories (polychotomous).
As it is a logistic regression, it is based on the GLM (exponential family); therefore there is no restriction on normality, size of data, or number and type of independent variables.
That also means autocorrelation and heteroscedasticity are not a concern in logistic regression.
The target variable must be non-metric.

Dependent: diagnosis/disease, 4 categories, therefore use multinomial logistic regression.

Analyze > Regression > Multinomial Logistic
For the dependent variable a reference category is shown; by default it is the last category, but you can choose any of the four categories of your target variable.
All the independent variables should be entered as covariates.
The likelihood ratio test is used to check whether the model with predictors is better than the model without them (the baseline/chance model). If it is found significant (sig. < 0.05), the M-logit model is better than the baseline model.
The goodness-of-fit test shows that the M-logit model is fitted (sig. > 0.05), i.e. the model is compatible with the data; "fitted" applies when the sig. value is greater than 0.05.
Nagelkerke R-square shows the approximate improvement in accuracy that can be achieved by using the M-logit model. In this case approximately 100% accuracy may be achieved.
In the pseudo R-square table, check the Nagelkerke value.
Only include those variables which are significant and exclude the insignificant ones.
Significant variables help to predict the desired category, while insignificant variables are useless predictors.
That means only two variables (tidi, time) differentiate among the diagnoses.
Findings: there is no significant difference in time and tidi in AN with reference to AED. This means that time and tidi are approximately the same for AN and AED.
In case of significance, the model for AN is:
(Diagnosis = AN) = -113.28 (tidi) + 453 (time)
This shows that a one-unit increase in tidi lowers the log-odds of AN with reference to AED by 113.28, and a one-unit increase in time raises the log-odds of AN with reference to AED by 453.
There are K-1 such models, where K is the number of categories (four categories here, so three models), each comparing one category with the reference category.
These models compute the probabilities of occurrence of each category.
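
The step from K-1 logit models to category probabilities can be sketched as follows. In this Python example the reference category AED has a logit of 0 by construction, and the three logit values passed in are hypothetical numbers for one case, not values from the SPSS output; `category_probabilities` is a hypothetical helper name.

```python
import math

def category_probabilities(logits, reference="AED"):
    """Probabilities for K categories from K-1 logits; the reference logit is 0."""
    exps = {name: math.exp(value) for name, value in logits.items()}
    denominator = 1 + sum(exps.values())
    probs = {name: e / denominator for name, e in exps.items()}
    probs[reference] = 1 / denominator
    return probs

# Hypothetical logit values for one case (not taken from the output)
probs = category_probabilities({"AN": 1.2, "ABN": -0.4, "BNA": 0.3})
predicted_category = max(probs, key=probs.get)

print({name: round(p, 3) for name, p in probs.items()}, predicted_category)
```

The case is assigned to the category with the highest probability, which is how the classification table is built.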

Classification
                                    Predicted
Observed                            AN       ABN      BNA      AED      Percent Correct
Anorexia Nervosa                    97       0        0        0        100.0%
Anorexia with Bulimia Nervosa       0        36       0        0        100.0%
Bulimia Nervosa after Anorexia      0        0        56       0        100.0%
Atypical Eating Disorder            0        0        0        28       100.0%
Overall Percentage                  44.7%    16.6%    25.8%    12.9%    100.0%
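These models are correct, since every case is classified into its observed category.

For practice, use the SPSS files Employee data and breakfast.sav for the multinomial test. Also check the telecom data file, in which you predict customer category (dependent) from some independent variables.

Abbreviations: AN = Anorexia Nervosa, ABN = Anorexia with Bulimia Nervosa, BNA = Bulimia Nervosa after Anorexia, AED = Atypical Eating Disorder (reference category).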
