
Chapter Four

Statistical Inference

Objectives
At the end of this session, students will be
able to:
1. Explain the principles of estimation and
differentiate between point and interval
estimation
2. Compute appropriate confidence
intervals for population means and
proportions and interpret the findings

Statistical Inference
Is the process of drawing conclusions about an
entire population based on the data in a sample.
Methods of inference usually fall into one of two broad
categories: Estimation or Hypothesis Testing.
We will focus on using the observations in a sample to
estimate a population parameter.
The purpose of statistical inference is to make
decisions about population characteristics.

Parameters and Statistics

Parameters are summaries of the population, while
statistics are summaries of the sample.
Parameters are unknown; statistics are calculated.
Parameters are hypothetical, whereas statistics are
"real."
Parameters are constants; statistics are random
variables.

Estimation

Is concerned with estimating the values of
specific population parameters based on
sample statistics.
Is about using information in a sample to
make estimates of the characteristics
(parameters) of the source population.

Continued
Example:
A sample survey revealed:
Proportion of smokers among a certain group of the
population aged 15 to 24
Mean systolic blood pressure (SBP) in the sampled population
Prevalence of HIV positivity among people involved in the
study
The next question is: what can we predict about the
characteristics of the population from which the sample was
drawn?

Estimation, Estimator and Estimates

Estimation is the computation of a statistic from
sample data, often yielding a value that is an
approximation (guess) of its target, an unknown true
population parameter value.
The statistic itself is called an estimator and can be
of two types: point or interval.
The value or values that the estimator assumes are
called estimates.

Parameter Estimations
In parameter estimation, we generally
assume that the underlying (unknown)
distribution of the variable of interest is
adequately described by one or more
(unknown) parameters, referred to as
population parameters.
As it is usually not possible to make
measurements on every individual in a
population, parameters cannot usually be
determined exactly.
Instead, we estimate parameters by
calculating the corresponding characteristics
from a random sample (estimates).

Types of estimation
Two methods of estimation are
commonly used:
Point estimation involves the
calculation of a single number to
estimate the population parameter
Interval estimation specifies a
range of reasonable values for the
parameter

Continued
From a single sample we can calculate a sample
statistic to estimate a single parameter (a point
estimate).
A point estimate is a single numerical value used to estimate the
corresponding population parameter.
Point estimate for the population mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Point estimate for the population proportion is given by
$$p = \frac{x}{n}$$
where x is the total number of successes (events).
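The two point estimates above can be sketched in a few lines of Python. This is only an illustration: the function names and the data (blood pressure readings, smoker counts) are hypothetical and do not come from the slides.

```python
from statistics import mean

def point_estimate_mean(values):
    """Sample mean: sum of the observations divided by n."""
    return mean(values)

def point_estimate_proportion(successes, n):
    """Sample proportion: number of successes (events) divided by n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return successes / n

# Hypothetical data, for illustration only
sbp = [118, 125, 132, 110, 140]            # systolic blood pressure (mmHg)
x_bar = point_estimate_mean(sbp)           # estimate of the population mean
p = point_estimate_proportion(12, 80)      # e.g. 12 smokers among 80 sampled

print(x_bar, p)
```

Each call returns a single number, which is what makes it a point estimate; it says nothing yet about how precise that number is.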

Confidence Interval estimate


However, the value of the sample statistic will vary from
sample to sample. Therefore, simply obtaining a single-value
estimate of the parameter is not generally
acceptable.
We also need a measure of how precise our estimate is
likely to be.
We need to take into account the sample-to-sample
variation of the statistic.
A confidence interval defines an interval within which the
true population parameter is likely to fall (interval estimate).

Interval estimation


Continued

Level of Confidence
Denoted by 100(1-α)%.
A relative frequency interpretation:
In the long run, 100(1-α)% of all the
confidence intervals that can be
constructed will contain the unknown
parameter.
A specific interval will either contain or
not contain the parameter.
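The relative frequency interpretation can be checked with a small simulation. The sketch below is an assumption-laden illustration: it draws repeated samples from a normal population with a known mean and standard deviation (all values hypothetical), builds a z-based interval from each sample, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=25, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        # Each individual interval either contains mu or it does not
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should be close to the chosen level, although it will not match it exactly because the number of simulated samples is finite.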

Continued
A confidence interval therefore takes into account the
sample-to-sample variation of the statistic and gives a
measure of precision.
The general formula used to calculate a confidence interval
is Estimate ± k × Standard Error, where k is called the reliability
coefficient.
Confidence intervals express the inherent uncertainty in any
medical study by giving upper and lower bounds for the
anticipated true underlying population parameter.
Most commonly 95% confidence intervals are
calculated; however, 90% and 99% confidence intervals are
sometimes used.

Confidence interval
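A (1-α)100% confidence interval for an
unknown population mean or population
proportion is given as follows:

$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
for estimating the mean. If σ is unknown, it can be estimated by s, the sample standard deviation.

$$\left[p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right]$$
for estimating the proportion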

Confidence intervals
The 95% confidence interval is calculated in such a
way that, under the conditions assumed for the
underlying distribution, the interval will contain the true
population parameter 95% of the time.
Loosely speaking, you might interpret a 95%
confidence interval as one which you are 95%
confident contains the true parameter.
A 90% CI is narrower than a 95% CI since we are only
90% certain that the interval includes the population
parameter.
On the other hand, a 99% CI will be wider than a 95% CI;
the extra width means that we can be more
certain that the interval will contain the population
parameter. But to obtain a higher confidence from
the same sample, we must be willing to accept a
larger margin of error (a wider interval).
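The trade-off between confidence level and width can be illustrated numerically. In this minimal sketch the estimate and standard error are hypothetical values chosen for illustration, and the reliability coefficient is taken from the standard normal distribution using Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def z_value(level):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def z_interval(estimate, std_error, level):
    """Estimate +/- k * standard error, with k taken from the normal curve."""
    k = z_value(level)
    return estimate - k * std_error, estimate + k * std_error

# Hypothetical estimate and standard error, for illustration only
estimate, std_error = 0.41, 0.028

widths = {}
for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(estimate, std_error, level)
    widths[level] = hi - lo
    print(f"{level:.0%} CI: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

With the sample held fixed, only the reliability coefficient changes, so the width grows as the confidence level rises.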

Confidence intervals
For a given confidence level (e.g. 90%, 95%,
99%), the width of the confidence
interval depends on the standard error
of the estimate, which in turn depends on
the:
1. Sample size: The larger the sample size,
the narrower the confidence interval (that
is, the sample statistic will
approach the population parameter) and
the more precise our estimate. Lack of
precision means that in repeated sampling
the values of the sample statistic
are spread out or scattered.

Confidence intervals
- To increase precision (of a simple random sample, SRS), use a
larger sample. You can make the
precision as high as you want by taking
a large enough sample. The margin of
error decreases as √n increases.
2. Standard deviation: The more the
variation among the individual values, the
wider the confidence interval and the less
precise the estimate. As the sample size
increases, the standard error (SD/√n) decreases.
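The effect of sample size on the margin of error can be seen with a short calculation. The standard deviation below is a hypothetical value, and z = 1.96 assumes a 95% confidence level.

```python
from math import sqrt

def margin_of_error(sd, n, z=1.96):
    """Half-width of a z-based confidence interval for a mean."""
    return z * sd / sqrt(n)

sd = 12.0  # hypothetical standard deviation, for illustration only
for n in (25, 100, 400):
    print(n, round(margin_of_error(sd, n), 3))
```

Because n enters through its square root, quadrupling the sample size halves the margin of error, while doubling the standard deviation doubles it.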

Confidence interval for single mean


If the population standard deviation is known
and the sample size is relatively large, the
confidence interval for the population mean is:
$\bar{x} \pm z(\sigma/\sqrt{n})$
$\bar{x}$ is the sample mean
$\sigma$ is the population standard deviation
n is the sample size
z is the value from the standard normal distribution (SND)
90% CI, z=1.645 (often rounded to 1.64 or 1.65)
95% CI, z=1.96
99% CI, z=2.58

Continued

If the population standard deviation is unknown
and the sample size is small (<30), the
formula for the confidence interval for the population
mean is: $\bar{x} \pm t(s/\sqrt{n})$
$\bar{x}$ is the sample mean
s is the sample standard deviation
n is the sample size
t is the value from the t-distribution with (n-1) degrees of freedom
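A minimal sketch of the small-sample interval follows. The data are hypothetical, and the critical value is passed in by hand because the Python standard library has no t-distribution quantile function; 2.306 is the standard t-table value for a 95% two-sided interval with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(values, t_crit):
    """x_bar +/- t * s / sqrt(n), with t read from a t table for n-1 df."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n-1)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of n = 9 measurements, for illustration only
sample = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.7, 4.9]
T_95_DF8 = 2.306  # t-table value: 95% two-sided, df = n - 1 = 8

lo, hi = t_interval(sample, T_95_DF8)
print(round(lo, 3), round(hi, 3))
```

For the same confidence level the t value is larger than the corresponding z value (2.306 versus 1.96 here), so the interval is wider to reflect the extra uncertainty from estimating the standard deviation.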

Degrees of Freedom (df)

df = number of observations that
are free to vary after the sample mean
has been calculated
df = n-1

Confidence interval for single proportion

For a relatively large sample size, the
confidence interval for the population proportion is
given by:
$p \pm z\sqrt{p(1-p)/n}$
p is the sample proportion (r/n)
n is the sample size
z is the value from the standard normal distribution (SND)

Mean Example
Measurements of uric acid excreted in the urine of 16 apparently healthy
subjects yielded the following values (milligrams per day):
0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005, 0.032, 0.04,
0.009, 0.014, 0.011, 0.022, 0.009, 0.008
Compute the point estimate of the population mean.

If $x_1, x_2, \ldots, x_n$ are n observed values, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.295}{16} = 0.01844$$

Construct 90%, 95%, and 98% confidence intervals for the mean (s = 0.0123, √n = 4):
90%: (0.01844 - 1.65 × 0.0123/4, 0.01844 + 1.65 × 0.0123/4) = (0.0134, 0.0235)
95%: (0.01844 - 1.96 × 0.0123/4, 0.01844 + 1.96 × 0.0123/4) = (0.0124, 0.0245)
98%: (0.01844 - 2.33 × 0.0123/4, 0.01844 + 2.33 × 0.0123/4) = (0.0113, 0.0256)
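The arithmetic in this example can be re-run as a check. The sketch below uses the sixteen values listed on the slide and the same z values (1.65, 1.96, 2.33); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# The 16 uric acid values from the example (milligrams per day)
uric = [0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
        0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008]

n = len(uric)
x_bar = mean(uric)   # point estimate of the population mean
s = stdev(uric)      # sample standard deviation (divides by n-1)
se = s / sqrt(n)     # standard error of the mean

# z values as used on the slide
z_values = {"90%": 1.65, "95%": 1.96, "98%": 2.33}
intervals = {label: (x_bar - z * se, x_bar + z * se)
             for label, z in z_values.items()}

print(f"mean = {x_bar:.5f}, s = {s:.4f}")
for label, (lo, hi) in intervals.items():
    print(f"{label} CI: ({lo:.4f}, {hi:.4f})")
```

Rounded to four decimal places, the intervals should agree with those worked out by hand above.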

Example2


Proportion example
In a survey of 300 automobile drivers in one city, 123
reported that they wear seat belts regularly. Estimate
the seat belt use rate of the city and the 95% confidence
interval for the true population proportion.
Answer: p = 123/300 = 0.41 = 41%
n = 300
Estimate of the seat belt use rate of the city with 95%
CI = $p \pm z\sqrt{p(1-p)/n} = 0.41 \pm 1.96\sqrt{0.41 \times 0.59/300}$ = (0.35, 0.47)
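The seat belt calculation can be reproduced with a short function. This is a sketch of the large-sample formula used in the answer; the function name is illustrative, and z = 1.96 assumes a 95% confidence level.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Large-sample interval: p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p, p - z * se, p + z * se

# Survey from the example: 123 of 300 drivers wear seat belts regularly
p, lo, hi = proportion_ci(123, 300)
print(f"p = {p:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Rounded to two decimal places, this gives the same interval as the worked answer.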

Exercise for proportion


Thank you