
Introduction to Inferential Statistics:

Probabilities, Distribution & Estimates

Lecture Handouts
St. Luke’s College of Nursing
SY. 2017 - 2018
Statistical Inference
The process of generalizing or drawing conclusions
about the target population on the basis of
results obtained from a sample, either by:
-Estimation
-Hypothesis Testing

*Using probability distributions and
estimation to make INFERENCES about the
target population
PARAMETERS
Populations
N = size of the total population
n = size of the sample drawn from the total population
Estimation

The process by which a statistic
computed for a random sample is
used to approximate or estimate the
corresponding parameter.
PARAMETERS
Populations

Summarizing Figure         Total Population    Sample
Numerical Representation   Parameter           Statistic
Mean                       µ                   x̄
Variance                   σ²                  s²
Standard Deviation         σ                   s
Proportion                 Ρ                   ρ
Types of Estimate

Point
- A single numerical value used to approximate the
  population parameter
  i.e., x̄ estimates µ, and ρ estimates Ρ

Interval
- Consists of two numbers, a lower limit and an upper limit,
  which serve as the bounding values within which the
  parameter is expected to lie with a certain
  degree of confidence
  i.e., the mean is found between a lower limit and an upper limit
Mean µ and x̄

Parameter (age in years), N = 18

20 23 19 21 18 22 22 21 23
19 21 22 20 21 19 22 24 21

µ = 21

Statistic (age in years), n = 9

19 18 19 19 20 22 22 20 21

x̄ = 20
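
As a quick check of the two figures above, the Python sketch below recomputes the population mean and the sample mean, reading the listed ages as 18 population values and 9 sample values. The variable names are illustrative and not part of the handout.

```python
from statistics import mean

# Ages (in years) of the whole population, N = 18
population_ages = [20, 23, 19, 21, 18, 22, 22, 21, 23,
                   19, 21, 22, 20, 21, 19, 22, 24, 21]
# Ages (in years) of one sample of n = 9 drawn from that population
sample_ages = [19, 18, 19, 19, 20, 22, 22, 20, 21]

mu = mean(population_ages)   # parameter: population mean
x_bar = mean(sample_ages)    # statistic: sample mean

print(f"N = {len(population_ages)}, population mean = {mu}")
print(f"n = {len(sample_ages)}, sample mean = {x_bar}")
```

The sample mean is close to, but not equal to, the population mean, which is why it is called an estimate.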
Proportion Ρ and ρ

Body Mass Index (Parameter)


31 32.5 17.2
27.5 23.6 31
36.9 28 15.6
24.5 16.1 32
26.3 30.2 31.9

N = 15
Obesity:
7 Yes (BMI > 30.0)
8 No (BMI ≤ 30.0)

Ρ = 7/15 = 0.47 or 47%
Proportion Ρ and ρ

Body Mass Index (Statistic)


27.5 23.6 31
36.9 28 15.6
24.5 16.1 32

n = 9
Obesity:
3 Yes (BMI > 30.0)
6 No (BMI ≤ 30.0)

ρ = 3/9 = 0.33 or 33%
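
The two proportions can be recomputed with a short Python sketch. The cutoff constant and the helper name `proportion_obese` are illustrative choices, not part of the handout; the BMI values are those listed above.

```python
# BMI values for the whole population (N = 15) and for one sample (n = 9)
population_bmi = [31, 32.5, 17.2, 27.5, 23.6, 31, 36.9, 28,
                  15.6, 24.5, 16.1, 32, 26.3, 30.2, 31.9]
sample_bmi = [27.5, 23.6, 31, 36.9, 28, 15.6, 24.5, 16.1, 32]

OBESITY_CUTOFF = 30.0  # a BMI above this value is counted as "Yes"


def proportion_obese(bmi_values):
    """Return the fraction of BMI values above the obesity cutoff."""
    obese = sum(1 for bmi in bmi_values if bmi > OBESITY_CUTOFF)
    return obese / len(bmi_values)


population_proportion = proportion_obese(population_bmi)  # parameter
sample_proportion = proportion_obese(sample_bmi)          # statistic

print(f"Population proportion = {population_proportion:.2f}")
print(f"Sample proportion     = {sample_proportion:.2f}")
```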
Point Estimate

Therefore, x̄ is the point estimate of µ, while
ρ is the point estimate of Ρ.
Player       1    2    3    4    5    6    7
Height (cm)  167  168  168  169  170  170  171

Total Population     Sample
N = 7                n = 2
µ = 169 cm           x̄ = ??

Composition, observations and mean of
all 21 possible samples of size n = 2
from a population of 7 elements (players)
Sample No.  Players (n = 2)  Heights (cm)  Sample mean x̄ (cm)
1           1, 2             167, 168      167.5
2           1, 3             167, 168      167.5
3           1, 4             167, 169      168
4           1, 5             167, 170      168.5
5           1, 6             167, 170      168.5
6           1, 7             167, 171      169
7           2, 3             168, 168      168
8           2, 4             168, 169      168.5
9           2, 5             168, 170      169
10          2, 6             168, 170      169
11          2, 7             168, 171      169.5
12          3, 4             168, 169      168.5
13          3, 5             168, 170      169
14          3, 6             168, 170      169
15          3, 7             168, 171      169.5
16          4, 5             169, 170      169.5
17          4, 6             169, 170      169.5
18          4, 7             169, 171      170
19          5, 6             170, 170      170
20          5, 7             170, 171      170.5
21          6, 7             170, 171      170.5
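
The table of 21 samples can be generated rather than written out by hand. The sketch below assumes sampling without replacement where order does not matter, which is what the table shows; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Player number -> height in cm
heights = {1: 167, 2: 168, 3: 168, 4: 169, 5: 170, 6: 170, 7: 171}

# Every possible sample of 2 players, drawn without replacement
samples = list(combinations(heights, 2))
sample_means = [mean(heights[player] for player in pair) for pair in samples]

for number, (pair, x_bar) in enumerate(zip(samples, sample_means), start=1):
    print(number, pair, x_bar)

# How often each sample mean occurs: the sampling distribution of the mean
frequency = Counter(sample_means)
for value in sorted(frequency):
    print(value, frequency[value])

print("Population mean:", mean(heights.values()))
print("Mean of the sample means:", mean(sample_means))
```

The last two lines print the same value, 169 cm: averaging over every possible sample recovers the population mean.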
Sampling Distribution of the mean x̄

[Figure: Sampling Distribution of the Mean Height of Players. Histogram with Height in cm (167.5 to 170.5) on the horizontal axis and Frequency (0 to 6) on the vertical axis.]
Player       1    2    3    4    5    6    7
Height (cm)  167  168  168  169  170  170  171

Total Population     Sample
N = 7                n = 2
µ = 169 cm           x̄ = depends on the sample drawn

Mean of the 21 sample means x̄ = 169 cm

The mean of the sample means x̄ is exactly
equal to µ.
Sampling Distribution of the mean x̄

The frequency distribution of the sample
means obtained from all possible samples
of size n.

As n becomes large, it approaches a normal
distribution.
Central Limit Theorem
If one repeatedly draws large samples of size n
from a population that has
mean µ and standard deviation σ, then the
distribution of the sample means will
approximate a normal distribution.

The sampling distribution of x̄ is
approximately normal even if x is not
normally distributed, provided n is large.
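
The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (strongly skewed, so clearly not normal), each sample has n = 40, and 5000 samples are drawn. None of these settings come from the handout.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with mean 1 and standard deviation 1
POP_MEAN = 1.0
POP_SD = 1.0
n = 40               # size of each sample
num_samples = 5000   # number of repeated samples

# Draw the samples and keep only the mean of each one
sample_means = [
    sum(rng.expovariate(1 / POP_MEAN) for _ in range(n)) / n
    for _ in range(num_samples)
]

standard_error = POP_SD / sqrt(n)
print("Mean of sample means:", round(mean(sample_means), 3))
print("SD of sample means:  ", round(stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(standard_error, 3))

# Share of sample means within 1.96 standard errors of the population mean;
# a value near 0.95 is what a roughly normal sampling distribution would give
within = sum(
    abs(m - POP_MEAN) <= 1.96 * standard_error for m in sample_means
) / num_samples
print("Share within 1.96 SE:", round(within, 3))
```

Even though the individual values are skewed, the sample means cluster around the population mean with a spread close to σ/√n.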
Normal (Gaussian) Distribution

Named after the German mathematician, physicist and
astronomer Carl Friedrich Gauss (1777-1855)
Characteristics of a Normal Curve
1. It is bell shaped and symmetrical about
the mean
2. The mean, median and mode of the
normal distribution are all equal
3. The total area under the curve is equal
to 1 or 100%; therefore any part of this
area can be thought of as a proportion
or a percentage of the whole.

P = 1 or 100%
4. It has long tapering tails that extend
infinitely in either direction but never
touch the x-axis.
5. It is completely determined by its mean and standard
deviation
6. About 68.3% of the distribution lies within 1 SD of the mean
About 95.4% of the distribution lies within 2 SD of the mean
About 99.7% of the distribution lies within 3 SD of the mean
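
The coverage figures for 1, 2 and 3 standard deviations can be checked with Python's standard library. This is a minimal sketch using `statistics.NormalDist`; the `areas` dictionary is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

areas = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations from the mean
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {areas[k]:.4f}")
```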
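Kurtosis
σ determines the spread of the distribution
– As σ increases, the distribution becomes wider
but shorter
– As σ decreases, the distribution becomes narrower
and taller
Strictly speaking, kurtosis describes how heavy the tails of a
distribution are relative to a normal curve; changing σ alone changes
the spread of a normal curve, not its kurtosis.

[Figure: normal curves with different spreads, e.g. σ = 1.85 and σ = 0.88]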
Skewness
Normally Skewed
Positively Skewed
• Skewed to the Right
• This reflects a frequency distribution which has
lower frequencies as the measurements take on
higher values.
Negatively Skewed
• Skewed to the Left
• Curve with a larger number of observations in the
higher values of the variable being considered
Standard Normal Distribution
The normal distribution is actually a collection or family
of distributions, with each member being characterized
by the values of µ and σ. Any member can be
transformed into the standard normal distribution,

where the mean = 0
and the S.D. = 1.
Standard Normal Distribution
Z-score
Represents the number of standard
deviations a score lies away from the mean
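
To make the transformation concrete, the sketch below converts each value of the age population used earlier in the handout into a z-score and confirms that the transformed values have mean 0 and standard deviation 1. The helper name `z_score` is illustrative.

```python
from statistics import mean, pstdev

# Population of ages (in years) used earlier in the handout
ages = [20, 23, 19, 21, 18, 22, 22, 21, 23,
        19, 21, 22, 20, 21, 19, 22, 24, 21]

mu = mean(ages)        # population mean
sigma = pstdev(ages)   # population standard deviation


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma


z_values = [z_score(x, mu, sigma) for x in ages]

print("z-score of age 24:", round(z_score(24, mu, sigma), 2))
print("Mean of z-scores: ", round(abs(mean(z_values)), 6))
print("SD of z-scores:   ", round(pstdev(z_values), 6))
```

A positive z-score means the value is above the mean; a negative one means it is below.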
Standard Normal Distribution:
Problem Solving

Computation of proportions or
percentages of values that belong to
different categories of the variable of
interest.
Problem Solving
Exercise:
Assume that the final exam scores of students are
normally distributed with an average of 85 and a
standard deviation of 3.

What is the proportion of students who scored
90 and above?
Problem Solving
Sketch the curve to better understand:

[Figure: normal curve centered at 85 with the area to the right of 90 shaded]
What is the proportion (shaded area) of
students who scored 90 and above?
Problem Solving
Given:
x = 90    µ = 85    σ = 3

Solution:
z = (x - µ) / σ = (90 - 85) / 3 = 1.67
Problem Solving
Sketch the curve to better understand:

[Figure: standard normal curve with the area to the right of z = 1.67 shaded]

What is the probability of getting a standard
normal value above 1.67?
Problem Solving
Solution:
Using the Normal Distribution Table, get the
corresponding proportion of the area under the curve
Normal Distribution Table
Normal Distribution Table: How to use

What is the Area under the Curve of the z-deviate 1.67?


Problem Solving
Remember that the area under the curve
is equal to 1.00 or 100%.

[Figure: normal curve centered at 85 with the area to the right of 90 (z = 1.67) shaded]

A = 0.0475
Interpretation:
“The proportion (shaded area) of
students who scored 90 and above is
4.75%.”
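
The table lookup can be cross-checked in Python. This sketch uses `statistics.NormalDist` in place of the printed table and rounds z to two decimals as the table does; it is one possible check, not the handout's method.

```python
from statistics import NormalDist

mu, sigma = 85, 3   # mean and standard deviation of the exam scores
x = 90              # cutoff score

z = round((x - mu) / sigma, 2)          # rounded as for a printed z table
area_above = 1 - NormalDist().cdf(z)    # upper-tail area beyond z

print(f"z = {z}")
print(f"Proportion scoring {x} and above = {area_above:.4f}")
```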
Problem Solving

Exercise:
Assume that the final exam scores of students are
normally distributed with an average of 85 and a
standard deviation of 3.

What is the proportion of students who scored
between 81 and 90?
Problem Solving
Sketch the curve to better understand:

[Figure: normal curve with 81, 85 and 90 marked and the area between 81 and 90 shaded]

What is the proportion (shaded area) of students who
scored between 81 and 90?
Problem Solving
Given:
x1 = 81    x2 = 90    µ = 85    σ = 3

Solution:
z1 = (x1 - µ) / σ = (81 - 85) / 3 = -1.33
Problem Solving
Solution:
z2 = (x2 - µ) / σ = (90 - 85) / 3 = 1.67
Using the Normal Distribution Table, get the
corresponding proportion of the area under the curve
Problem Solving
Sketch the curve to better understand:

[Figure: standard normal curve with the area between z1 = -1.33 and z2 = 1.67 shaded]

What is the probability of getting a standard
normal value between -1.33 and 1.67?
Problem Solving
Again, the area under the curve is equal to
1.00. Therefore, to get the chosen portion
between the two z-scores, subtract the
tail area beyond each z-score from 1.00.

[Figure: normal curve with 81, 85 and 90 marked]

Given:
z1 = -1.33    A1 = 0.0918 (area below z1)
z2 = 1.67     A2 = 0.0475 (area above z2)
A3 = 1 - 0.0918 - 0.0475
A3 = 0.8607
Problem Solving

Interpretation:

[Figure: normal curve with the area between 81 (z1 = -1.33) and 90 (z2 = 1.67) shaded]

“The proportion (shaded area) of
students who scored between
81 and 90 in the
final exam is
86.07%.”
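
The same subtraction of the two tails can be sketched in Python, again with `statistics.NormalDist` standing in for the printed table. Because the table areas are rounded to four decimals, the software result may differ from 0.8607 in the fourth decimal place.

```python
from statistics import NormalDist

mu, sigma = 85, 3       # mean and standard deviation of the exam scores
lower, upper = 81, 90   # score range of interest

standard_normal = NormalDist()
z1 = round((lower - mu) / sigma, 2)   # z-score of the lower bound
z2 = round((upper - mu) / sigma, 2)   # z-score of the upper bound

area_below_z1 = standard_normal.cdf(z1)       # left tail (A1)
area_above_z2 = 1 - standard_normal.cdf(z2)   # right tail (A2)
area_between = 1 - area_below_z1 - area_above_z2

print(f"z1 = {z1}, A1 = {area_below_z1:.4f}")
print(f"z2 = {z2}, A2 = {area_above_z2:.4f}")
print(f"Proportion between {lower} and {upper} = {area_between:.4f}")
```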
Interval Estimate
Referred to as Confidence Interval (CI)

Degree of Confidence    Z-deviate
90%                     1.64
95%                     1.96
99%                     2.58
Interval Estimate
Need to transform the sampling
distribution of x̄ to standard normal using the
general formula for z:

z = (x̄ - µ) / (σ / √n)
Interval Estimate
*About 95% of sample means, x̄, are within ± 1.96
standard errors (σ / √n) of µ, so the interval runs from

x̄ - 1.96 (σ / √n)      to      x̄ + 1.96 (σ / √n)
Lower Limit of the             Upper Limit of the
Interval                       Interval
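
The two limits follow from rearranging the z formula for the sampling distribution of the mean. The derivation below is a sketch of that step, assuming σ is known and the sampling distribution is approximately normal:

```latex
\begin{aligned}
P\!\left(-1.96 \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) &= 0.95 \\
P\!\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{x}-\mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) &= 0.95 \\
P\!\left(\bar{x}-1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x}+1.96\,\frac{\sigma}{\sqrt{n}}\right) &= 0.95
\end{aligned}
```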
Interval Estimate: Problem Solving
Exercise:
A previous study in a hospital found that
the average length of patient stay was 10
days, with a standard deviation of 3 days. If
the researcher’s sample consisted of 25
randomly selected patients, what is the 95%
confidence interval for µ?
Interval Estimate: Problem Solving
Given: n = 25
x̄ = 10    σ = 3    CI = 95% ; z = 1.96

Formula:
LL = x̄ - 1.96 (σ / √n)        UL = x̄ + 1.96 (σ / √n)
Interval Estimate: Problem Solving
Solution:
LL = 10 - 1.96 (3 / √25)       UL = 10 + 1.96 (3 / √25)

   = 10 - 1.96 (0.6)              = 10 + 1.96 (0.6)

   = 10 - 1.176                   = 10 + 1.176

CI.LL = 8.824 days             CI.UL = 11.176 days
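
The worked solution can be reproduced with a small helper. This is a minimal sketch assuming a known σ and a z-based interval; the function name `confidence_interval` and its parameters are illustrative, not part of the handout.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of a z-based confidence interval for mu."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when confidence = 0.95
    margin = z * sigma / sqrt(n)              # z times the standard error
    return x_bar - margin, x_bar + margin


lower, upper = confidence_interval(x_bar=10, sigma=3, n=25)
print(f"95% CI: {lower:.3f} days to {upper:.3f} days")
```

Passing a different `confidence` value changes only the z-deviate, so the interval widens as the degree of confidence rises.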


Interval Estimate: Problem Solving
Interpretation:
“We are 95% confident that the average
length of patient stay is between 8.824
days and 11.176 days.”

Seatwork Exercise:
Using the same problem, apply the 90% CI and
99% CI. What can you say about the results?
Commonly used Degrees of
Confidence

Confidence   Area between     Area in one       z-score
Level        0 and z-score    tail (alpha/2)
50%          0.2500           0.2500            0.674
80%          0.4000           0.1000            1.282
90%          0.4500           0.0500            1.645
95%          0.4750           0.0250            1.960
98%          0.4900           0.0100            2.326
99%          0.4950           0.0050            2.576
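
The z-scores in this table can be regenerated from the confidence levels. The sketch below uses the inverse of the standard normal cumulative distribution (`NormalDist.inv_cdf`); the `rows` list is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist()

rows = []
for level in (0.50, 0.80, 0.90, 0.95, 0.98, 0.99):
    tail = (1 - level) / 2                  # area in one tail (alpha/2)
    between = level / 2                     # area between 0 and the z-score
    z = standard_normal.inv_cdf(1 - tail)   # z-score with `tail` area above it
    rows.append((level, between, tail, z))
    print(f"{level:.0%}  {between:.4f}  {tail:.4f}  {z:.3f}")
```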
Other References
Lecture presentation on Biostat Unit III: Probabilities for
Biostatistics by H. Ho, 2010.

Lecture notes on Statistical Inference: Estimation of µ for
Biostatistics 201 by M.P. Borja, 2013.

Lecture notes on Normal Distribution for Biostatistics 201
by M.C. Palatino, 2013.
