Research Process?
Planning
Design
Data collection
Analysis
Data entry
Data cleaning
Data management
Data analysis
Reporting
Statistics is used in .
What is statistics?
Collecting
Organizing
Summarizing
Presenting
Interpreting
data
Types of variables
Categorical variables
Continuous variables
Categorical variables
Nominal: unordered data
Death
Gender
Country of birth
Ordinal: ordered data
Education
Satisfaction
Continuous variables
Continuous: Not restricted to integers
Age
Weight
Cholesterol
Blood pressure
Data collection
Database structure
Data entry
Data cleaning
Data management
Data analyses
Data collection
Data collection:
Data collection
Database structure
Database structure:
Structure the database (using SPSS) into which the data will be
entered
Data entry
Data entry:
Data cleaning
Data cleaning:
Data management
Data management:
Such as:
BMI
Recoding
Categorizing age (less than 50 years, and 50 years and above)
Etc.
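The data management steps listed above (deriving BMI, recoding, categorizing age) can be sketched in a few lines. This is only an illustration: the field names `weight_kg` and `height_m`, the two sample patients, and the helper names are assumptions, not part of the original material, which uses SPSS for these steps.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def age_group(age_years: float) -> str:
    """Categorize age with the cut-off used above (50 years)."""
    return "less than 50 years" if age_years < 50 else "50 years and above"


# Hypothetical records, for illustration only
patients = [
    {"age": 43, "weight_kg": 70.0, "height_m": 1.75},
    {"age": 61, "weight_kg": 82.0, "height_m": 1.68},
]

# Derive the new variables and store them next to the raw ones
for patient in patients:
    patient["bmi"] = round(bmi(patient["weight_kg"], patient["height_m"]), 1)
    patient["age_group"] = age_group(patient["age"])

for patient in patients:
    print(patient)
```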
Data analyses
Data analyses:
Data analyses
Data analyses:
Univariate analyses
Bivariate analyses
Multivariate analyses
Bottom line
Frequency distribution
Graphical representation
Frequency distribution
Frequency distribution:
Title
Values
Frequency
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid    Married         266      91.4            94.7                 94.7
         Single           13       4.5             4.6                 99.3
         Widow             2        .7              .7                100.0
         Total           281      96.6           100.0
Missing  System           10       3.4
Total                    291     100.0
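The percentages in the frequency table above can be recomputed from the raw counts. The sketch below assumes only the counts shown in the table (266, 13, 2, and 10 missing); the variable names are illustrative.

```python
# Counts taken from the frequency table above
counts = {"Married": 266, "Single": 13, "Widow": 2}
missing = 10

valid_total = sum(counts.values())   # 281 valid observations
total = valid_total + missing        # 291 observations in all

rows = []
cumulative = 0
for value, freq in counts.items():
    cumulative += freq
    rows.append((
        value,
        freq,
        round(100 * freq / total, 1),              # Percent (of all cases)
        round(100 * freq / valid_total, 1),        # Valid Percent (of non-missing cases)
        round(100 * cumulative / valid_total, 1),  # Cumulative Percent
    ))

for row in rows:
    print(row)
```

Percent uses all 291 cases as the denominator, while Valid Percent and Cumulative Percent use only the 281 non-missing cases.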
Example
Graphical representation
A graph lists, for each value (or small range of values) of a variable,
the number or proportion of times that observation occurs in the
study population
Graphical representation:
Two types
Bar chart
Pie chart
Title
Values
Central tendency
Dispersion
Graphical representation
Central tendency:
Mean
Median
Mode
Mean:
Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Summation sign
The summation sign ($\sum$) is just mathematical shorthand for "add up all of the observations":
$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$
Uniqueness
Simplicity
Median: is the middle number, or the number that cuts the data in
half
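Odd number of observations: 80, 90, 95, 110, 120 (median = 95 mmHg)
Even number of observations: 80, 90, 95, 110, 120, 125 (median = (95 + 110) / 2 = 102.5 mmHg)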
Median: Formula
Properties:
Uniqueness
Simplicity
Not affected by extreme values
Mode: the most frequent value
80, 90, 95, 95, 120, 125
Mode = 95
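The three measures of central tendency can be checked on the small blood pressure examples above. This is a minimal sketch using Python's standard `statistics` module; the list names are illustrative.

```python
import statistics

odd_sample = [80, 90, 95, 110, 120]          # five observations
even_sample = [80, 90, 95, 110, 120, 125]    # six observations
mode_sample = [80, 90, 95, 95, 120, 125]     # 95 occurs twice

mean_odd = statistics.mean(odd_sample)        # sum of observations / n
median_odd = statistics.median(odd_sample)    # middle value
median_even = statistics.median(even_sample)  # average of the two middle values
mode_value = statistics.mode(mode_sample)     # most frequent value

print(mean_odd, median_odd, median_even, mode_value)
```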
Example:
Statistics
Systolic blood pressure
N
Valid
286
Missing
5
Mean
144.13
Median
144.50
Mode
155
Example:
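Data set 1: 21, 22, 23, 23, 23, 24, 24, 25, 28
Mean = 213/9 = 23.7
Median = 23
Data set 2: 15, 18, 21, 21, 23, 25, 25, 32, 33
Mean = 213/9 = 23.7
Median = 23
The two data sets have the same mean and the same median, but the second is much more spread out.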
Measures of dispersion:
Range
Variance
Standard Deviation
Range
Example:
Range = 120 − 80 = 40
X 1 = 120
X 2 = 80
X 3 = 90
X 4 = 110
X 5 = 95
Variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
X 1 = 120
X 2 = 80
X 3 = 90
X 4 = 110
X 5 = 95
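Mean: $\bar{X} = 99$
$\sum_{i=1}^{n} (X_i - \bar{X})^2 = (120 - 99)^2 + (80 - 99)^2 + (90 - 99)^2 + (110 - 99)^2 + (95 - 99)^2 = 1020$
Sample Variance
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{1020}{4} = 255$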
The units of s are the same as the units of the data (for example,
mm Hg)
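The range, variance and standard deviation of the five example values (120, 80, 90, 110, 95) can be reproduced step by step. This is a sketch with illustrative variable names; the last line cross-checks the hand calculation against the standard library.

```python
import math
import statistics

sbp = [120, 80, 90, 110, 95]   # the five example values, in mm Hg

data_range = max(sbp) - min(sbp)                      # 120 - 80
mean = sum(sbp) / len(sbp)                            # 495 / 5
sum_of_squares = sum((x - mean) ** 2 for x in sbp)    # squared deviations from the mean
variance = sum_of_squares / (len(sbp) - 1)            # divide by n - 1 (sample variance)
sd = math.sqrt(variance)                              # same units as the data

print(data_range, mean, sum_of_squares, variance, round(sd, 2))

# The standard library uses the same n - 1 definition
assert variance == statistics.variance(sbp)
```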
Example:
Statistics
Systolic blood pressure
N          Valid      286
           Missing    5
Mean                  144.13
Median                144.50
Mode                  155
Std. Deviation        35.312
Variance              1246.916
Range                 202
Minimum               55
Maximum               257
Graphical representation:
Different types
Histogram
Construct a chart
Title
Values
Shapes of Distributions
Symmetrical and bell shaped: mean, median and mode coincide
Positively skewed (skewed to the right): typically mode < median < mean
Negatively skewed (skewed to the left): typically mean < median < mode
Other shapes: (A) bimodal, (B) reverse J-shaped, (C) uniform
Probability
Probability
Definition:
Frequentist Approach:
Application in medicine
Descriptive
What do we mean?
Out of each 100 patients admitted to the emergency department, 4
will die, whereas 96 will be discharged alive
Associations
                              Death at discharge
Current Cigarette Smoking     Death   Discharged   Total
No                                5          123     128
Yes                               5          154     159
Total                            10          277     287
= 100 / 331
Risk of death among current smokers = 5 / 159 = 3.1%
Risk of death among non-smokers = 5 / 128 = 3.9%
Associations
Relative risk
Risk difference
Attributable risk
Odds ratio
Etc..
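The measures of association listed above can be computed from the smoking by death table. The sketch below uses the counts from that table (5 deaths among 159 smokers, 5 deaths among 128 non-smokers); the variable names are illustrative, and non-smokers are taken as the reference group, which is an assumption.

```python
# Counts from the 2 x 2 table: current smoking (exposure) by death at discharge (outcome)
deaths_smokers, discharged_smokers = 5, 154
deaths_nonsmokers, discharged_nonsmokers = 5, 123

risk_smokers = deaths_smokers / (deaths_smokers + discharged_smokers)              # 5 / 159
risk_nonsmokers = deaths_nonsmokers / (deaths_nonsmokers + discharged_nonsmokers)  # 5 / 128

relative_risk = risk_smokers / risk_nonsmokers     # ratio of the two risks
risk_difference = risk_smokers - risk_nonsmokers   # difference of the two risks
odds_ratio = (deaths_smokers / discharged_smokers) / (deaths_nonsmokers / discharged_nonsmokers)

print(f"risk in smokers      {100 * risk_smokers:.1f}%")
print(f"risk in non-smokers  {100 * risk_nonsmokers:.1f}%")
print(f"relative risk        {relative_risk:.3f}")
print(f"risk difference      {risk_difference:.4f}")
print(f"odds ratio           {odds_ratio:.3f}")
```

A relative risk or odds ratio of 1, or a risk difference of 0, corresponds to no association.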
Bottom line
Probability distributions
Categorical distributions
Continuous distributions
Categorical variables
Frequency distribution
Continuous variables
Continuous distribution
Normal Distribution
Normal Distribution
Mean
Median
Mode
Normal Distribution
Normal Distribution
Age distribution for a specific population
50%
50%
Mean=40
SD=10
Normal Distribution
Age distribution for a specific population
Age = 25
Mean=40
SD=10
Normal distribution
Normal distribution
Thus, for any normal distribution, once we have the mean and sd,
we can calculate the percentage of subjects:
Mean=0
SD=1
Standard normal table: area under the curve between 0 and z (rows: z to one decimal; columns: second decimal of z)
z     0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0   0.00000  0.00399  0.00798  0.01197  0.01595  0.01994  0.02392  0.02790  0.03188  0.03586
0.1   0.03983  0.04380  0.04776  0.05172  0.05567  0.05962  0.06356  0.06749  0.07142  0.07535
0.2   0.07926  0.08317  0.08706  0.09095  0.09483  0.09871  0.10257  0.10642  0.11026  0.11409
0.3   0.11791  0.12172  0.12552  0.12930  0.13307  0.13683  0.14058  0.14431  0.14803  0.15173
0.4   0.15542  0.15910  0.16276  0.16640  0.17003  0.17364  0.17724  0.18082  0.18439  0.18793
0.5   0.19146  0.19497  0.19847  0.20194  0.20540  0.20884  0.21226  0.21566  0.21904  0.22240
0.6   0.22575  0.22907  0.23237  0.23565  0.23891  0.24215  0.24537  0.24857  0.25175  0.25490
0.7   0.25804  0.26115  0.26424  0.26730  0.27035  0.27337  0.27637  0.27935  0.28230  0.28524
0.8   0.28814  0.29103  0.29389  0.29673  0.29955  0.30234  0.30511  0.30785  0.31057  0.31327
0.9   0.31594  0.31859  0.32121  0.32381  0.32639  0.32894  0.33147  0.33398  0.33646  0.33891
1.0   0.34134  0.34375  0.34614  0.34849  0.35083  0.35314  0.35543  0.35769  0.35993  0.36214
1.1   0.36433  0.36650  0.36864  0.37076  0.37286  0.37493  0.37698  0.37900  0.38100  0.38298
1.2   0.38493  0.38686  0.38877  0.39065  0.39251  0.39435  0.39617  0.39796  0.39973  0.40147
1.3   0.40320  0.40490  0.40658  0.40824  0.40988  0.41149  0.41308  0.41466  0.41621  0.41774
1.4   0.41924  0.42073  0.42220  0.42364  0.42507  0.42647  0.42785  0.42922  0.43056  0.43189
1.5   0.43319  0.43448  0.43574  0.43699  0.43822  0.43943  0.44062  0.44179  0.44295  0.44408
1.6   0.44520  0.44630  0.44738  0.44845  0.44950  0.45053  0.45154  0.45254  0.45352  0.45449
1.7   0.45543  0.45637  0.45728  0.45818  0.45907  0.45994  0.46080  0.46164  0.46246  0.46327
1.8   0.46407  0.46485  0.46562  0.46638  0.46712  0.46784  0.46856  0.46926  0.46995  0.47062
1.9   0.47128  0.47193  0.47257  0.47320  0.47381  0.47441  0.47500  0.47558  0.47615  0.47670
2.0   0.47725  0.47778  0.47831  0.47882  0.47932  0.47982  0.48030  0.48077  0.48124  0.48169
2.1   0.48214  0.48257  0.48300  0.48341  0.48382  0.48422  0.48461  0.48500  0.48537  0.48574
2.2   0.48610  0.48645  0.48679  0.48713  0.48745  0.48778  0.48809  0.48840  0.48870  0.48899
2.3   0.48928  0.48956  0.48983  0.49010  0.49036  0.49061  0.49086  0.49111  0.49134  0.49158
2.4   0.49180  0.49202  0.49224  0.49245  0.49266  0.49286  0.49305  0.49324  0.49343  0.49361
2.5   0.49379  0.49396  0.49413  0.49430  0.49446  0.49461  0.49477  0.49492  0.49506  0.49520
2.6   0.49534  0.49547  0.49560  0.49573  0.49585  0.49598  0.49609  0.49621  0.49632  0.49643
2.7   0.49653  0.49664  0.49674  0.49683  0.49693  0.49702  0.49711  0.49720  0.49728  0.49736
2.8   0.49744  0.49752  0.49760  0.49767  0.49774  0.49781  0.49788  0.49795  0.49801  0.49807
2.9   0.49813  0.49819  0.49825  0.49831  0.49836  0.49841  0.49846  0.49851  0.49856  0.49861
3.0   0.49865  0.49869  0.49874  0.49878  0.49882  0.49886  0.49889  0.49893  0.49896  0.49900
3.1   0.49903  0.49906  0.49910  0.49913  0.49916  0.49918  0.49921  0.49924  0.49926  0.49929
3.2   0.49931  0.49934  0.49936  0.49938  0.49940  0.49942  0.49944  0.49946  0.49948  0.49950
3.3   0.49952  0.49953  0.49955  0.49957  0.49958  0.49960  0.49961  0.49962  0.49964  0.49965
3.4   0.49966  0.49968  0.49969  0.49970  0.49971  0.49972  0.49973  0.49974  0.49975  0.49976
3.5   0.49977  0.49978  0.49978  0.49979  0.49980  0.49981  0.49981  0.49982  0.49983  0.49983
3.6   0.49984  0.49985  0.49985  0.49986  0.49986  0.49987  0.49987  0.49988  0.49988  0.49989
3.7   0.49989  0.49990  0.49990  0.49990  0.49991  0.49991  0.49992  0.49992  0.49992  0.49992
3.8   0.49993  0.49993  0.49993  0.49994  0.49994  0.49994  0.49994  0.49995  0.49995  0.49995
3.9   0.49995  0.49995  0.49996  0.49996  0.49996  0.49996  0.49996  0.49996  0.49997  0.49997
4.0   0.49997  0.49997  0.49997  0.49997  0.49997  0.49997  0.49998  0.49998  0.49998  0.49998
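The table entries above are areas under the standard normal curve between 0 and z. As a rough check, they can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `area_0_to_z` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standardized normal: mean 0, SD 1


def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return std_normal.cdf(z) - 0.5


# Row 1.0, column 0.00 and row 1.9, column 0.06 of the table
print(round(area_0_to_z(1.00), 5))
print(round(area_0_to_z(1.96), 5))
```

Doubling the value for z = 1.96 gives the central 95% of the distribution.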
Standardized Normal Distribution (Z)
Normal distribution: Mean = μ, SD = σ
TRANSFORM: $Z = \frac{x - \mu}{\sigma}$
Standardized normal distribution: Mean = 0, SD = 1
Example (Mean = 40, SD = 10):
$Z(40) = \frac{x - \mu}{\sigma} = \frac{40 - 40}{10} = 0$
$Z(30) = \frac{x - \mu}{\sigma} = \frac{30 - 40}{10} = -1$
Normal Distribution
Age distribution for a specific population (Mean = 40, SD = 10):
68% of subjects lie between Mean − 1SD = 30 and Mean + 1SD = 50
95% of subjects lie between Mean − 2SD = 20 and Mean + 2SD = 60
99.7% of subjects lie between Mean − 3SD = 10 and Mean + 3SD = 70
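The standardization and the 68%, 95% and 99.7% figures for the age example (mean 40, SD 10) can be verified numerically. This is a sketch using `statistics.NormalDist`; the helper names are illustrative.

```python
from statistics import NormalDist

age = NormalDist(mu=40, sigma=10)   # age distribution from the example


def z_score(x: float) -> float:
    """Standardize an age: Z = (x - mean) / SD."""
    return (x - age.mean) / age.stdev


def share_between(low: float, high: float) -> float:
    """Proportion of the population with age between low and high."""
    return age.cdf(high) - age.cdf(low)


print(z_score(30))                        # one SD below the mean
print(round(share_between(30, 50), 3))    # mean +/- 1 SD
print(round(share_between(20, 60), 3))    # mean +/- 2 SD
print(round(share_between(10, 70), 3))    # mean +/- 3 SD
print(round(age.cdf(25), 4))              # proportion younger than 25
```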
Practical example
Practical example
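[Figure: normal curve with axis values 83, 97, 111, 125, 139, 153, 167 (mean 125, SD 14); 68% of values lie between 111 and 139, 95% between 97 and 153, and 99.7% between 83 and 167]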
Data analyses
Data analyses:
population
sample
Inference: Drawing
conclusions on certain
questions about a
population from sample data
Inferential statistics
Inferential statistics
What do we do?
286
5
144.13
35.312
Inferential statistics
Inferential statistics
Sample data
N=291
Mean=144
SD=35
Inference
1-
Confidence Interval
Confidence Intervals
A point estimate:
A single numerical value used to estimate a population parameter.
Interval estimate:
Consists of 2 numerical values defining a range of values that with
a specified degree of confidence includes the parameter being
estimated.
(Usually interval estimate with a degree of 95% confidence is
used)
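Example
Select a sample (N = 291)
Point estimate = mean = 144
Interval estimate: $\bar{x} \pm z_{1-\alpha/2} \times SE$
95% confidence interval: $144 \pm 1.96 \times \frac{35}{\sqrt{291}}$
[Figure: sampling distribution of the mean for N = 291; 95% of sample means fall within about 2 SE of the centre]
Standard error
$SE = \frac{sd}{\sqrt{n}}$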
Interpretation
95% Confidence Interval
Thus, if we repeat the sampling procedure 100 times, the calculated interval will be
wrong in about 5 of them (the true parameter is outside the interval). This is also called the α
error.
Interpretation
No
A 99% CI is wider
A 90% CI is narrower
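The interval estimate for the sample above (N = 291, mean 144, SD 35) can be computed directly, along with the 90% and 99% intervals for comparison. This sketch uses the normal quantile from `statistics.NormalDist`; the function name `confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, mean, sd = 291, 144, 35
se = sd / sqrt(n)   # standard error of the mean


def confidence_interval(level: float) -> tuple[float, float]:
    """Normal-based interval: mean +/- z(1 - alpha/2) * SE, with alpha = 1 - level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return mean - z * se, mean + z * se


for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

The printed widths grow with the confidence level, which is the point of the two statements above.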
2-
P-value
Inference
P-value
Hypothesis testing
Hypothesis testing
Hypothesis testing
If the data are consistent with the null hypothesis, then we do not
reject the null hypothesis (conclusion = no difference)
If the sample data are not consistent with the null hypothesis, then
we reject the null (conclusion = difference)
Hypothesis testing
Ho: μ = 120
Ha: μ ≠ 120
Can we consider the sample mean of 144 consistent with the normal value
(120 mm Hg)?
Hypothesis testing
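[Figure: sampling distribution of the mean under Ho: μ = 120 for N = 291; the central 95% lies within about 2 SE of 120, with 2.5% in each tail, and the observed sample mean of 144 falls far outside this range]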
Test statistic
How to decide
Problem!
                     Ho True             Ho False
Do not reject Ho     Correct decision    Type II error
Reject Ho            Type I error        Correct decision
Error
Type I error:
Referred to as α
Type II error:
Referred to as β
Power:
Represented by 1 − β
Significance level
0.05
0.01
0.1
Statistical significance
We carry out a one-sample t-test, which provides a p-value on which we base the decision to reject or not reject the null hypothesis.
One-Sample Statistics
N      Mean      Std. Deviation   Std. Error Mean
286    144.13    35.312           2.088
One-Sample Test (Test Value = 120)
t         df     Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                     Lower      Upper
11.558    285    .000              24.133            20.02      28.24
Since the p-value is less than 0.05, we conclude that the
systolic blood pressure of patients admitted to the emergency
department after an MI is significantly higher than the normal
value of 120 mm Hg
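The test statistic in the one-sample output above can be recomputed from the summary statistics alone. In this sketch the critical value 1.980 (two-sided 5% level, 120 degrees of freedom) is taken from a standard t-table as a slightly conservative stand-in for 285 degrees of freedom; that substitution is an assumption, not part of the SPSS output.

```python
from math import sqrt

n, sample_mean, sd = 286, 144.13, 35.312
test_value = 120                          # value under the null hypothesis

se = sd / sqrt(n)                         # standard error of the mean
t = (sample_mean - test_value) / se       # one-sample t statistic
df = n - 1

critical_value = 1.980                    # assumed: t-table, 97.5% column, df = 120
reject_null = abs(t) > critical_value

print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}, reject Ho: {reject_null}")
```

The small difference from the reported t of 11.558 comes from using the rounded mean 144.13.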
p-values
Small p-values mean that the sample results are unlikely when the
null is true
t-distribution
Critical values of t by cumulative probability (columns) and degrees of freedom (rows)
df     75%     80%     85%     90%     95%     97.5%   99%     99.5%   99.75%  99.9%   99.95%
1      1.000   1.376   1.963   3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
2      0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
3      0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
4      0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5      0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6      0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7      0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8      0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9      0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10     0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11     0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
12     0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
13     0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
14     0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
15     0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
16     0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
17     0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
18     0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
19     0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
20     0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
100    0.677   0.845   1.042   1.290   1.660   1.984   2.364   2.626   2.871   3.174   3.390
120    0.677   0.845   1.041   1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
∞      0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291
Hypothesis Testing
OR = 1
RR = 1
RD = 0
Test of homogeneity
Etc..
Example
                           N Valid   N Missing   Mean     Std. Deviation
Heart Rate at admission    286       5           82.64    22.598
Heart Rate at discharge    77        214         76.99    17.900
Paired t-test
Paired Samples Statistics
Pair 1                     Mean     N     Std. Deviation   Std. Error Mean
Heart Rate at admission    81.16    75    23.546           2.719
Heart Rate at discharge    76.72    75    17.973           2.075
Paired Samples Test
Pair 1: paired differences
Mean     Std. Deviation   Std. Error Mean   95% Confidence Interval of the Difference   t       df    Sig. (2-tailed)
                                            Lower      Upper
4.440    25.302           2.922             -1.381     10.261                           1.520   74    .133
$t = \frac{\text{sample mean} - 0}{SEM} = \frac{4.4}{2.9} = 1.52$
The value t = 1.52 is called the test statistic
Then we can compare the t-value in the table and get the
p-value, or get it from the computer (0.13)
Thus, this probability is large (greater than 0.05), so the difference of 4.4
could be due to chance alone and is not statistically significant
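The paired t statistic above follows from the mean and standard deviation of the 75 paired differences. The sketch below recomputes it; the p-value uses a normal approximation to the t distribution with 74 degrees of freedom, which is an assumption made to stay within the Python standard library and gives a value slightly below the reported .133.

```python
from math import sqrt
from statistics import NormalDist

n_pairs = 75
mean_diff = 4.440    # mean of (admission - discharge) differences
sd_diff = 25.302     # standard deviation of the differences

sem = sd_diff / sqrt(n_pairs)   # standard error of the mean difference
t = (mean_diff - 0) / sem       # null hypothesis: mean difference = 0
df = n_pairs - 1

# Two-sided p-value, normal approximation (assumption; the exact test uses t with df = 74)
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"SEM = {sem:.3f}, t = {t:.2f}, df = {df}, approximate p = {p_approx:.2f}")
```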
Notes
3 scenarios
[Figure: three scenarios, each drawn on an axis running from −15 to 15]
Sex       N      Mean      Std. Deviation   Std. Error Mean
Male      240    145.05    35.162           2.270
Female    44     138.64    35.753           5.390
Null hypothesis:
Alternative hypothesis:
P-value > 0.05: do not reject the null (no evidence that the two means differ)
P-value < 0.05: reject the null (the two means are different)
Independent Samples Test: Systolic blood pressure
Levene's Test for Equality of Variances: F = .044, Sig. = .835
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% Confidence Interval of the Difference
                                                                                                           Lower      Upper
Equal variances assumed       1.109   282      .269              6.409             5.781                   -4.970     17.789
Equal variances not assumed   1.096   59.267   .278              6.409             5.848                   -5.292     18.111
Example
T-test
No significant difference
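Both rows of the independent-samples output above can be reproduced from the group summaries (N, mean, SD). The sketch below computes the pooled ("equal variances assumed") and Welch ("equal variances not assumed") versions; the variable names are illustrative, and small differences from the output come from rounding in the reported means.

```python
from math import sqrt

# Group summaries from the output above
n1, mean1, sd1 = 240, 145.05, 35.162   # male
n2, mean2, sd2 = 44, 138.64, 35.753    # female
diff = mean1 - mean2

# Equal variances assumed: pooled variance, df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: separate variances, Welch-Satterthwaite df
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled: SE = {se_pooled:.3f}, t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  SE = {se_welch:.3f}, t = {t_welch:.3f}, df = {df_welch:.3f}")
```

Both t values are well below 1.96, in line with the conclusion of no significant difference.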
Chi square
Example
            Hypertension
Sex         No      Yes     Total
Male        191     52      243
Female      24      20      44
Total       215     72      287
Example
                             Hypertension
Sex                          No        Yes       Total
Male      Count              191       52        243
          % within Sex       78.6%     21.4%     100.0%
Female    Count              24        20        44
          % within Sex       54.5%     45.5%     100.0%
Total     Count              215       72        287
          % within Sex       74.9%     25.1%     100.0%
Example
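H0: P1 = P2 (P1 − P2 = 0)
Ha: P1 ≠ P2 (P1 − P2 ≠ 0)
$\chi^2 = \sum_{4\ \text{cells}} \frac{(O - E)^2}{E}$
O = observed count, E = expected count = (row total × column total) / grand total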
We can use this to determine how likely it was to get such a big
discrepancy between the observed and expected by chance alone
[Figure: chi-squared distribution with 1 degree of freedom (probability against chi-squared value); $\chi^2 = 3.84$ corresponds to p = 0.05]
Example of Calculations of
Chi-Square 2x2 Contingency Table
Test statistic
$\chi^2 = \sum_{4\ \text{cells}} \frac{(O - E)^2}{E}$
Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             11.471^b   1    .001
Continuity Correction^a        10.227     1    .001
Likelihood Ratio               10.366     1    .001
Fisher's Exact Test                                                    .001                   .001
Linear-by-Linear Association   11.431     1    .001
N of Valid Cases               287
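$\chi^2 = 11.471$
[Figure: chi-squared distribution (probability against chi-squared value, axis from 0 to 20) with the observed value 11.471 far in the right tail]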
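The Pearson chi-square value above can be recomputed from the sex by hypertension table by summing (O − E)²/E over the four cells. In this sketch the p-value line relies on the fact that, for 1 degree of freedom only, the chi-square tail probability equals `erfc(sqrt(x / 2))`; variable names are illustrative.

```python
from math import erfc, sqrt

# Observed counts: rows = sex (male, female), columns = hypertension (no, yes)
observed = [[191, 52],
            [24, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count under independence
        chi2 += (o - e) ** 2 / e

# Tail probability of a chi-square with 1 degree of freedom (valid for df = 1 only)
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

Since the statistic exceeds 3.84, the p-value is below 0.05.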
Example
Chi-square
            MI
Drug        Yes     No
Vioxx       71      52
Placebo     29      48
Ho: RR = 1
Ha: RR ≠ 1
Significant association
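For the drug by MI table above, the relative risk and a chi-square test of Ho: RR = 1 can be sketched as follows. The chi-square here uses the shortcut formula for a 2 x 2 table, N(ad − bc)² divided by the product of the four marginal totals, without continuity correction; whether the original analysis applied a correction is not stated, so treat the exact value as an assumption.

```python
from math import erfc, sqrt

a, b = 71, 52   # Vioxx:   MI yes, MI no
c, d = 29, 48   # Placebo: MI yes, MI no

risk_vioxx = a / (a + b)
risk_placebo = c / (c + d)
relative_risk = risk_vioxx / risk_placebo

# Shortcut chi-square for a 2 x 2 table (no continuity correction)
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_value = erfc(sqrt(chi2 / 2))   # chi-square tail probability, df = 1 only

print(f"RR = {relative_risk:.2f}, chi-square = {chi2:.2f}, p = {p_value:.4f}")
```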
Notes
3 scenarios
0
Example
Example
Chi-square
Example
Sex       Total (Count)   % within Sex
Male      240             100.0%
Female    44              100.0%
Total     284             100.0%
Example
Conclusion
Sex * Hypertension and Diabetes combined Crosstabulation
Sex       Total (Count)   % within Sex
Male      240             100.0%
Female    44              100.0%
Total     284             100.0%
Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             28.691^a   2    .000
Likelihood Ratio               24.336     2    .000
Linear-by-Linear Association   25.341          .000
N of Valid Cases               284
Example
ANOVA
The problem
The problem
The problem
- If 5 groups are available, then 10 two-group t-tests have to be performed.
- The high Type I error rate, resulting from the large number of
comparisons, means that we may draw incorrect conclusions.
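The size of the multiple-comparison problem described above can be quantified. The sketch below counts the pairwise t-tests for 5 groups and computes the chance of at least one Type I error, assuming each test is run at α = 0.05 and, as a simplification, that the tests are independent.

```python
from math import comb

groups = 5
alpha = 0.05                                      # significance level of each t-test

n_tests = comb(groups, 2)                         # number of distinct pairs of groups
familywise_error = 1 - (1 - alpha) ** n_tests     # P(at least one false positive), assuming independence

print(f"{n_tests} pairwise tests, family-wise Type I error about {familywise_error:.0%}")
```

Under these assumptions the overall error rate is roughly 40% rather than 5%, which is the motivation for a single overall test such as ANOVA.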
Assumptions
Analysis of variance requires the following assumptions:
Sources   df      SS           MS                                  F
Factor    k − 1   SS(factor)   MS(factor) = SS(factor) / (k − 1)   MS(factor) / MS(error)
Error     n − k   SS(error)    MS(error) = SS(error) / (n − k)
Total     n − 1   SS(total)
Rationale
Rationale
Under the null hypothesis that the group means are the same,
MS(factor) will be similar to MS(error), so their ratio F will be close to 1.
Example
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid    None                      159      54.6            55.6                 55.6
         Either HT or DM            80      27.5            28.0                 83.6
         Both HT and DM             47      16.2            16.4                100.0
         Total                     286      98.3           100.0
Missing  System                      5       1.7
Total                              291     100.0
Example
Descriptives: Systolic blood pressure
                   N     Mean     Std. Deviation   Std. Error   Minimum   Maximum
None               155   144.52   32.789           2.634        78        248
Either HT or DM    79    142.97   39.634           4.459        56        257
Both HT and DM     47    146.55   36.360           5.304        55        235
Total              281   144.43   35.319           2.107        55        257
ANOVA: Systolic blood pressure
                  Sum of Squares   df    Mean Square   F      Sig.
Between Groups    380.517          2     190.259       .152   .859
Within Groups     348908.2         278   1255.066
Total             349288.8         280
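The mean squares and the F ratio in the ANOVA output above follow directly from the sums of squares and degrees of freedom. A minimal sketch, with illustrative variable names:

```python
# Sums of squares and degrees of freedom from the ANOVA output above
ss_between, df_between = 380.517, 2      # k - 1, with k = 3 groups
ss_within, df_within = 348908.2, 278     # n - k, with n = 281 subjects

ms_between = ss_between / df_between     # MS(factor)
ms_within = ss_within / df_within        # MS(error)
f_ratio = ms_between / ms_within         # F = MS(factor) / MS(error)

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}, F = {f_ratio:.3f}")
```

An F ratio this far below 1 gives no indication that the group means differ.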
Conclusion
Since the p-value (0.859) is greater than 0.05, we do not reject the null hypothesis: there is no
evidence that the average systolic blood pressure differs among the three groups.
Bivariate analyses
                         INDEPENDENT (exposure)
DEPENDENT (outcome)      2 LEVELS                > 2 LEVELS              CONTINUOUS
2 LEVELS                 X2 (chi square test)    X2 (chi square test)    t-test
> 2 LEVELS               X2 (chi square test)    X2 (chi square test)    ANOVA
CONTINUOUS               t-test                  ANOVA                   Correlation, Linear Regression
New scenario
Scatterplot
Correlation coefficient
Correlation
Correlation
Ranges between: −1 and +1
r > 0: Positive association
r < 0: Negative association
r = 0: No linear association
[Figure: example scatterplots with r = 0.01, r = 0.68, r = 0.98, r = −0.9 and r = 0.76]
Correlation
Ha: Correlation ≠ 0
[Figure: scatterplot of systolic blood pressure (BP systolic) against heart rate]
Correlation: r = 0.190
P-value = 0.001
Correlations
                                                 Systolic blood pressure   Heart Rate at admission
Systolic blood pressure    Pearson Correlation   1                         .190**
                           Sig. (2-tailed)                                 .001
                           N                     286                       285
Heart Rate at admission    Pearson Correlation   .190**                    1
                           Sig. (2-tailed)       .001
                           N                     285                       286
Significant correlation
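The significance of the correlation above (r = 0.190, N = 285) can be checked with the usual test statistic t = r√(n − 2) / √(1 − r²). In this sketch the p-value uses a normal approximation to the t distribution with 283 degrees of freedom, an assumption made to stay within the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

r, n = 0.190, 285   # Pearson correlation and number of pairs from the output above

t = r * sqrt(n - 2) / sqrt(1 - r**2)             # test statistic for Ho: correlation = 0
df = n - 2
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))    # two-sided, normal approximation (assumption)

print(f"t = {t:.2f}, df = {df}, approximate p = {p_approx:.3f}")
```

A correlation can be statistically significant and still weak: here r² is under 4%, so heart rate explains little of the variation in systolic blood pressure.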
Problem
r= 0
Correlation Coefficient
r = .7
It is simple in terms of
The Slope
$y = b_0 + b_1 x$
[Figure: regression lines with $\beta_1 > 0$ (rising), $\beta_1 = 0$ (flat) and $\beta_1 < 0$ (falling); $b_0$ is the intercept and $b_1$ the slope]
Model Summary
R         R Square   Adjusted R Square   Std. Error of the Estimate
.054^a    .003       -.001               35.387
Correlation:
R = 0.054
Coefficients^a
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B          Std. Error         Beta                        t        Sig.
(Constant)    136.400    8.812                                          15.479   .000
Age           .148       .162               .054                        .910     .364
Hypothesis test
Ho: $\beta_1 = 0$
Ha: $\beta_1 \neq 0$
SBP = $\beta_0 + \beta_1$ (Age)
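The fitted line and the test of the slope can be read off the coefficients table above. The sketch below plugs in the reported estimates (intercept 136.400, slope 0.148, standard error 0.162); the function name `predict_sbp` and the example age of 50 are illustrative assumptions.

```python
intercept = 136.400   # b0: estimated SBP at age 0
slope = 0.148         # b1: estimated change in SBP per additional year of age
slope_se = 0.162      # standard error of b1


def predict_sbp(age_years: float) -> float:
    """Predicted systolic blood pressure from the fitted line SBP = b0 + b1 * Age."""
    return intercept + slope * age_years


t_slope = slope / slope_se           # test statistic for Ho: slope = 0
significant = abs(t_slope) > 1.96    # large-sample two-sided 5% cut-off

print(f"predicted SBP at age 50: {predict_sbp(50):.1f}")
print(f"t for slope = {t_slope:.2f}, significant at 5%: {significant}")
```

The t value differs slightly from the reported .910 because the coefficients are rounded to three decimals.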
Study the association between Age and SBP while controlling for
gender
              Unstandardized Coefficients   Standardized Coefficients
Model 1       B          Std. Error         Beta                        t        Sig.
(Constant)    143.090    9.742                                          14.688   .000
Age           .216       .171               .080                        1.261    .208
Sex           -8.992     6.123              -.093                       -1.469   .143
Sample characteristics
Inferences to be made
Number of variables
not for entire study, but for the specific question at hand
Type of data
numerical, continuous
Number of groups
Sample type
independent or dependent
Descriptive analyses
Type of variable            Measure
Categorical                 Proportion (%)
Continuous (Normal)         Mean (SD)
Continuous (Not Normal)     Median, Inter-quartile range
Parametric:
More powerful
Non-parametric:
Univariate analyses
Type of variable            Measure
Categorical                 Z proportions
Continuous (Normal)         T-test
Continuous (Not Normal)     n > 30: t-test; n < 30: Kolmogorov-Smirnov Test
Bivariate analyses
Type of variable   2 levels       > 2 levels     Continuous
2 levels           Chi squared    Chi squared    T-test
> 2 levels         Chi squared    Chi squared    Anova
Continuous         T-test         Anova          Correlation, linear regression
Bivariate analyses
Type of variable   2 levels                          > 2 levels                       Continuous
2 levels           Fisher's test, McNemar's test     Fisher's test                    Mann-Whitney, Wilcoxon test
> 2 levels         Fisher's test                     Fisher's test                    Kruskal-Wallis, Friedman test
Continuous         Mann-Whitney, Wilcoxon test       Kruskal-Wallis, Friedman test    Correlation, Regression
Multivariate analyses
Type of variable
Measure
Categorical
Logistic regression
Continuous
(Normal)
Multinomial regression
Continuous
(Not Normal)
Linear regression
Overview
Goal | Measurement (Gaussian) | Ordinal or Measurement (Non-Gaussian) | Binomial | Survival Time
Describe one group | Mean, SD | Median, interquartile range | Proportion |
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test, Chi-square | Log-rank test or Mantel-Haenszel*
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test | Conditional proportional hazards regression*
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox regression
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q** | Conditional proportional hazards regression*
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients** |
Predict value from another measured variable | Simple linear regression | Nonparametric regression** | Simple logistic regression* | Cox regression
Predict value from several variables | Multiple linear regression* | | Multiple logistic regression* | Cox regression
Depends on:
Study design
Types of variables
2 *SD * (z + z )
2
1 - = power
1 - = power
1 - = less or larger N
1 - = more or smaller N
N = to be found
Thank you