
05 : Sampling Distributions and Methods of Estimation (1)

05. Sampling and Sampling Distributions


We are often interested in calculating some properties (called parameters) of a
population. For a very large population, the exact calculation of a parameter is typically
prohibitive. A more economical and sensible approach is to take a random sample from the
population of interest, calculate a statistic related to the parameter of interest, and then make an
inference about the parameter based on the value of the statistic. This is called statistical inference.
The distribution of a statistic is called a sampling distribution. The sampling distribution
helps us understand how close a statistic is to its corresponding population parameter.
Typical parameters of interest include:
Mean
Proportion
Variance
The standard statistic that is used to infer about the population mean is the sample mean.
Definitions:
(1) A population is defined as the aggregate of all individuals, elements, or objects under
consideration. These are called statistical units.
Examples
(i). The manager of an automobile agency is interested in the fuel economy of the Suzuki
cars in the company's fleet. Here the population consists of all Suzuki cars in the
fleet. The elements of the population are the individual cars.
(ii). A quality assurance manager wants information about the quality level of the
firm's process for manufacturing light bulbs. Here the population consists of all the bulbs
that could be produced by the process. The elements of the population are the
individual electric bulbs.
(2) A population containing a finite or fixed number of elements is called a finite population;
otherwise it is infinite.
(3) A population which consists of concrete objects is called an existent population; otherwise
it is called a hypothetical population.
(4) A small part of a population is called a sample.
(5) The technique of selecting a representative sample is called sampling. We'll discuss sampling later.
(6) The sampling distribution of the mean is the probability distribution or the relative
frequency distribution of the means \( \bar{X} \) of all possible random samples of the same size
that could be selected from a given population. The mean of this distribution is
represented by \( \mu_{\bar{x}} \) and the standard deviation, which is called the standard error of the
mean, by \( \sigma_{\bar{x}} \).
(7) Sampling is said to be with replacement if the selected unit is returned to the population
before the next unit is selected. Thus a sampling unit can be selected more than once.
Case (i) when sampling is without replacement from a finite population:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} \]

Muhammad Naeem Sandhu, Assistant Professor, Department of Mathematics, University of Engineering and Technology, Lahore
where \( \sqrt{\frac{N-n}{N-1}} \) is called the finite population multiplier or the finite population correction
factor. If the sampling fraction \( n/N \) is less than 0.05, then the finite population multiplier need not be
used. For a large N, this factor, of course, approaches 1 and hence can be ignored. The usual rule of
thumb is to consider N large enough if it is at least 20 times larger than n.
Case (ii) when the sampling is with replacement or from an infinite population:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
(8) An element of a sample is called a sample unit. A complete list of all possible sampling
units is called a sampling frame.
(9) Numerical values computed from the whole population are called parameters, for
example the population mean \( \mu \) and the population standard deviation \( \sigma \).
(10) Numerical values computed from a sample are called statistics. A statistic varies from
sample to sample drawn from the same population. Examples are the sample mean \( \bar{X} \) and the sample
standard deviation S.
(11) The difference between a statistic and the corresponding parameter, which arises because only
a part of the population is observed, is called the sampling
error. It can be reduced by increasing the sample size to a sufficient level.
sampling error = \( \bar{X} - \mu \)
(12) Non-sampling errors are those which arise from a defective sampling frame or from
information not being provided correctly. For example, income, sales, production, age, etc.
are not reported correctly in most cases.
(13) Bias is a cumulative component of error which arises from defective selection of the
sample or negligence of the investigator. Errors due to bias increase with an increase in
the size of the sample.
(14) A population in which every sampling unit has similar characteristics and has an equal
chance of selection in the sample is called a homogeneous population.
Definition (Sampling)
Sampling techniques are used to estimate population parameters on the basis of
sample measures, called statistics. The quantities usually inferred are the mean, variance, standard
deviation, skewness, kurtosis, etc. That's why we discuss sampling distributions here as an
application of these inferences.
Sampling methods
(1) Probability Sampling
When each unit in the population has a known non-zero (not necessarily equal) probability of
being included in the sample, the sampling is said to be probability sampling. It is also
called random sampling, e.g. simple random sampling, stratified sampling, systematic
sampling, cluster sampling, etc.
(2) Non-probability Sampling
Non-probability sampling is a process in which personal judgment determines which
units of the population are selected for the sample. It is also called non-random or
judgment sampling.

Types of Sampling
Random or Probability Sampling
Non-random or Judgment Sampling
In probability sampling or random sampling, all the items in the population have a chance
of being chosen in the sample. In judgment sampling, personal knowledge and opinion are used
to identify the items from the population that are to be included in the sample. Sometimes a
judgment sample is used as a pilot or trial sample to decide how to take a random sample later.
Rigorous statistical analysis can be done only with probability samples.

Types of Random Sampling

(i) Simple Random Sampling


Goldfish Bowl Procedure: In this procedure each unit of the population is
allotted a different serial number from 1 to N, and each number is recorded on a card or
on a slip of paper. The numbered cards or folded slips of paper are placed in a
bowl or a basket and mixed thoroughly. The desired
number of cards or slips is then drawn out blindly, one by one, for the sample.
Using a Random Number Table: Assign a number from 1 to N to each of the N
units in the population. Consult a random number table and read digits in groups of
two, three or more, according to the largest number assigned to a unit in the
population, moving through the table vertically, horizontally or diagonally.

(ii) Systematic Sampling


A sample of size n is defined to be a systematic random sample if, after the N units in
the population have been serially numbered from 1 to N or arranged in a systematic way,
it is obtained by choosing one unit at random from the first k units and thereafter
selecting every kth unit.

(iii) Stratified Sampling


A sample of size n is defined to be a stratified random sample if it is selected from a
population which has been divided into a number of non-overlapping groups
called strata, such that part of the sample is drawn at random from each stratum.

(iv) Cluster Sampling


A random sample is said to be a cluster sample if it is obtained by first selecting at
random groups of individual units, called clusters, into which the population can be
divided, and then including in the sample either all the units from each of the
chosen clusters or a random sample of the units which each chosen cluster
comprises.
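
The four random sampling schemes described above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names, the population of units numbered 1 to 100, and the way it is split into two strata and ten clusters are all assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(units, n, rng):
    """Draw n distinct units so that every subset of size n is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Choose a random start among the first k units, then take every k-th unit."""
    k = len(units) // n           # sampling interval
    start = rng.randrange(k)      # random position within the first k units
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the same size from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep every unit they contain."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)                      # fixed seed, for repeatability only
population = list(range(1, 101))            # hypothetical units numbered 1..N, N = 100
strata = {"low": population[:50], "high": population[50:]}              # assumed strata
clusters = {c: population[c * 10:(c + 1) * 10] for c in range(10)}      # assumed clusters

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
```

Here the cluster sample keeps all units of the chosen clusters; the second variant in the definition would draw a further random sample inside each chosen cluster.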

Sampling Distribution
A frequency distribution of all the means of the samples is called the sampling distribution of the
mean.
Explanation:
Suppose we draw samples from a normally distributed population with mean 100 and a standard
deviation of 25. We draw samples of 5 items each and calculate their mean.
For a normal population, the sampling distribution of the mean is also normal and is centred on
the population mean, but it is less spread out than the population distribution.

Suppose we increase our sample size from 5 to 20. This would increase the effect of averaging in
each sample, and we would expect even less dispersion among the sample means.
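
The effect of the sample size on the dispersion of the sample means can be checked with a small simulation. The sketch below assumes Python's standard library, a fixed seed, and 20000 repeated samples per sample size (both arbitrary choices); it compares the observed standard deviation of the sample means with \( \sigma/\sqrt{n} \).

```python
import random
import statistics

rng = random.Random(42)      # fixed seed, for repeatability only
MU, SIGMA = 100, 25          # population mean and standard deviation from the text

def sample_means(n, repeats=20000):
    """Return the means of `repeats` random samples of size n from N(MU, SIGMA)."""
    return [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
            for _ in range(repeats)]

# Observed standard deviation of the sample means for each sample size.
spread = {n: statistics.pstdev(sample_means(n)) for n in (5, 20)}

for n, observed in spread.items():
    theoretical = SIGMA / n ** 0.5   # standard error of the mean
    print(n, round(observed, 2), round(theoretical, 2))
```

The observed spread for n = 20 should come out at roughly half of that for n = 5, since the standard error shrinks with the square root of the sample size.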

Examples (1)
Consider the data concerning the experience of five motorcycle owners with the life of their tires.

Owner                  Carl  Debbie  Elizabeth  Frank  George  Total
Tire life (in months)     3       3          7      9      14     36

Because only five people are involved, the population is too small to be approximated by a
normal distribution. We will take all of the possible samples of the owners in groups of three.

Compute the sample mean \( \bar{X} \) of each sample, list them, and compute the mean of the sampling distribution, \( \mu_{\bar{x}} \).

Solution
The calculation of the sample means of tire life with n = 3 is given below:

Sample of three   Sample data (tire lives)   Sample mean
EFG               7 + 9 + 14                 10
DFG               3 + 9 + 14                 8 2/3
DEG               3 + 7 + 14                 8
DEF               3 + 7 + 9                  6 1/3
CFG               3 + 9 + 14                 8 2/3
CEG               3 + 7 + 14                 8
CEF               3 + 7 + 9                  6 1/3
CDF               3 + 3 + 9                  5
CDE               3 + 3 + 7                  4 1/3
CDG               3 + 3 + 14                 6 2/3
Total                                        72

\( \mu_{\bar{x}} = 72/10 = 7.2 \)

The calculations show that even though the population is not normal, the mean of the sampling distribution,
\( \mu_{\bar{x}} \), is still equal to the population mean \( \mu = 36/5 = 7.2 \).
A plot of the two distributions shows that the distribution of the population is not normal,
whereas the sampling distribution of the mean looks a little like the bell shape.
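
The enumeration in Examples (1) can be reproduced with a short script. This is a sketch using Python's standard library; the one-letter owner labels follow the table, and the variable names are illustrative. It also checks that the standard deviation of the ten sample means agrees with the finite-population formula of Case (i).

```python
from itertools import combinations
from statistics import fmean, pstdev

tire_life = {"C": 3, "D": 3, "E": 7, "F": 9, "G": 14}   # months, from the table

# All 10 possible samples of three owners, drawn without replacement.
samples = list(combinations(tire_life, 3))
sample_means = [fmean([tire_life[owner] for owner in s]) for s in samples]

values = list(tire_life.values())
mu = fmean(values)                  # population mean
mu_xbar = fmean(sample_means)       # mean of the sampling distribution

# Standard error: direct computation versus the finite-population formula.
N, n = len(values), 3
se_direct = pstdev(sample_means)
se_formula = pstdev(values) / n ** 0.5 * ((N - n) / (N - 1)) ** 0.5

print(mu, mu_xbar, se_direct, se_formula)
```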

As the sample size is increased, the sampling distribution of the mean looks more and more like the bell
shape of the normal distribution.

We now state the central limit theorem, which supports the arguments cited above.
Central Limit Theorem
The central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently
large number of independent random variables, each with a well-defined mean and well-defined
variance, will be approximately normally distributed.
The central limit theorem explains that:
the mean of the sampling distribution of the mean will equal the population mean;
the sampling distribution of the mean approaches normal as the
sample size increases;
there is a relationship between the shape of the population distribution and the shape
of the sampling distribution of the mean.
Examples (2)
A population consists of 5 numbers 2, 3, 6, 8 and 9. Consider all possible samples of size 3 that
can be drawn with replacement from this population. Find (a) the mean of the population, (b) the standard
deviation of the population, (c) the mean of the sampling distribution of means and (d) the standard
deviation of the sampling distribution.
This question may be solved using the Minitab software.
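
As an alternative to a statistics package, the four quantities can be obtained by direct enumeration. The sketch below uses Python's standard library rather than Minitab, and the variable names are illustrative; it lists all \( 5^3 = 125 \) ordered samples drawn with replacement.

```python
from itertools import product
from statistics import fmean, pstdev

population = [2, 3, 6, 8, 9]
n = 3

mu = fmean(population)             # (a) population mean
sigma = pstdev(population)         # (b) population standard deviation

# All 125 ordered samples of size 3 drawn with replacement.
means = [fmean(sample) for sample in product(population, repeat=n)]
mu_xbar = fmean(means)             # (c) mean of the sampling distribution of means
sigma_xbar = pstdev(means)         # (d) standard deviation of the sampling distribution

print(mu, sigma, mu_xbar, sigma_xbar)
```

The results should satisfy \( \mu_{\bar{x}} = \mu \) and \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \), as stated in Case (ii).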


06. POINT ESTIMATION AND INTERVAL ESTIMATION

The sample mean \( \bar{x} \) is the best estimator of the population mean \( \mu \). It is unbiased, consistent, the
most efficient estimator, and, as long as the sample is sufficiently large, its sampling distribution
can be approximated by the normal distribution, as the central limit theorem says.
Definition (Point Estimate)
A point estimate of a population parameter is a single numerical value of a sample statistic.
(1) Point Estimates

(a) The sample mean \( \bar{x} \) is a point estimate of the population mean:
\[ E(\bar{x}) = \mu \]
i.e. \( \bar{x} \) is an unbiased estimate of the population mean \( \mu \).
(b) \( s^2 \) is a point estimate of the population variance \( \sigma^2 \):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \]
i.e. an unbiased estimate of the population variance.
(c) \( s^2 \) computed with divisor n is also a point estimate of the population variance \( \sigma^2 \):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n} \]
i.e. a biased estimate of the population variance.
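
The difference between the two variance estimates is easy to see numerically. In the sketch below, which uses Python's standard `statistics` module, the five values are borrowed from Examples (17) purely for illustration; `variance` divides by n - 1 and `pvariance` divides by n.

```python
from statistics import mean, pvariance, variance

sample = [144, 147, 146, 142, 144]   # illustrative values, taken from Examples (17)
n = len(sample)

x_bar = mean(sample)                 # point estimate of the population mean
s2_unbiased = variance(sample)       # sum of squared deviations divided by n - 1
s2_biased = pvariance(sample)        # sum of squared deviations divided by n

print(x_bar, s2_unbiased, s2_biased)
```

The biased estimate is always the unbiased one multiplied by (n - 1)/n, so the two differ noticeably only for small samples.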

Examples (3)
A bank calculates that its individual savings accounts are normally distributed with a mean
of $2000 and a standard deviation of $600. If the bank takes random samples of 100 accounts,
what is the probability that the sample mean will lie between $1900 and $2050?
Solution
First we calculate the standard error of the mean:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \ \text{(for an infinite population)} = \frac{600}{\sqrt{100}} = \$60 \]
To determine the probability that the sample mean will lie between $1900 and $2050, we find the
corresponding values \( z_1 \) and \( z_2 \) using
\[ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
This converts any normal random variable to a standard normal random variable.

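For \( \bar{x} = \$1900 \): \( z_1 = \frac{1900 - 2000}{60} = -1.67 \)
For \( \bar{x} = \$2050 \): \( z_2 = \frac{2050 - 2000}{60} = 0.83 \)

Using the table, the total area between \( z_1 \) and \( z_2 \) is 0.4525 + 0.2967 = 0.7492,

i.e. \( P[1900 \le \bar{x} \le 2050] = P[-1.67 \le z \le 0.83] = 0.7492 \)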
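
The table lookup can be cross-checked numerically. The following sketch assumes Python 3.8 or later, where `statistics.NormalDist` is available; it works with unrounded z values, so the result differs slightly in the third decimal place from the table-based answer.

```python
from statistics import NormalDist

mu, sigma, n = 2000, 600, 100
se = sigma / n ** 0.5                  # standard error of the mean
xbar_dist = NormalDist(mu, se)         # sampling distribution of the sample mean

z1 = (1900 - mu) / se                  # unrounded z values
z2 = (2050 - mu) / se
p = xbar_dist.cdf(2050) - xbar_dist.cdf(1900)

print(round(z1, 2), round(z2, 2), round(p, 4))
```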
Examples (4)
In a sample of 25 observations from a normal distribution with mean 98.6 and standard
deviation 17.2, (SC 6.5)
(a) what is the standard error of the mean?
(b) what is \( P[92 < \bar{x} < 102] \)?
(c) Find the corresponding probability given a sample of 36.
Solution
(a) n = 25, \( \mu = 98.6 \), \( \sigma = 17.2 \),
\( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 17.2/\sqrt{25} = 3.44 \) (standard error)
(b) \[ P[92 < \bar{x} < 102] = P\left[\frac{92 - 98.6}{3.44} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{102 - 98.6}{3.44}\right] \]
= P[-1.92 < z < 0.99]
= 0.4726 + 0.3389
= 0.8115
(c) n = 36, \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 17.2/\sqrt{36} = 2.87 \)
\[ P[92 < \bar{x} < 102] = P\left[\frac{92 - 98.6}{2.87} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{102 - 98.6}{2.87}\right] \]
= P[-2.30 < z < 1.18]
= 0.4893 + 0.3810
= 0.8703

Examples (5)
Mary Bartel, an auditor for a large credit card company, knows that, on average, the
monthly balance of any given customer is $112 and the standard deviation is $56. If Mary
audits 50 randomly selected accounts, what is the probability that the sample average monthly
balance is (AIOU page 321, SC 6.5)
(i) below $100?
(ii) between $100 and $130?
Solution
n = 50, \( \sigma = 56 \), \( \mu = 112 \), \( \sigma_{\bar{x}} = \sigma/\sqrt{n} = 56/\sqrt{50} = 7.92 \)
(i) \[ P[\bar{x} < 100] = P\left[\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{100 - 112}{7.92}\right] = P[z < -1.52] = 0.5 - 0.4357 = 0.0643 \]
(ii) \[ P[100 < \bar{x} < 130] = P\left[\frac{100 - 112}{7.92} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{130 - 112}{7.92}\right] = P[-1.52 < z < 2.27] \]
= 0.4357 + 0.4884 = 0.9241
Examples (6)
In a sample of 16 observations from a normal distribution with a mean of 150 and a
variance of 256, what is (AIOU p-321 Prob. 6.27)
(i) \( P[\bar{x} < 160] \)?
(ii) \( P[\bar{x} > 142] \)?
If, instead of 16 observations, 9 observations are taken, find
(iii) \( P[\bar{x} < 160] \)
(iv) \( P[\bar{x} > 142] \)
Examples (7)
From a population of 125 items with a mean of 105 and a standard deviation of 17, 64 items
were chosen. (AIOU -327 SC 6.7)
(a) What is the standard error of the mean?
(b) What is \( P(107.5 < \bar{X} < 109) \)?
Solution
N = 125, \( \mu = 105 \), \( \sigma = 17 \) and n = 64
(a) \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}} = \frac{17}{8} \sqrt{\frac{61}{124}} = 1.4904 \]
(b) \[ P(107.5 < \bar{X} < 109) = P\left(\frac{107.5 - 105}{1.4904} < \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} < \frac{109 - 105}{1.4904}\right) \]
= P(1.68 < z < 2.68) = 0.4963 - 0.4535 = 0.0428
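
The standard error with and without the finite population correction can be wrapped in one small helper. This is a sketch only; the function name `standard_error` and its optional argument `N` are illustrative choices, not notation from the notes.

```python
def standard_error(sigma, n, N=None):
    """Standard error of the mean.

    If a population size N is given, the finite population correction
    sqrt((N - n) / (N - 1)) for sampling without replacement is applied.
    """
    se = sigma / n ** 0.5
    if N is not None:
        se *= ((N - n) / (N - 1)) ** 0.5
    return se

se_finite = standard_error(17, 64, N=125)   # values from Examples (7)
se_infinite = standard_error(17, 64)        # same sample, correction ignored

print(round(se_finite, 4), round(se_infinite, 4))
```

Here the sampling fraction is 64/125, far above 0.05, so the correction cannot be ignored: it reduces the standard error from 2.125 to about 1.49.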

Examples (8)
From a population of 75 items with a mean of 364 and a variance of 18, 32 items were
randomly selected without replacement. (AIOU p-327 Prob. 6.40)
(a) What is the standard error of the mean?
(b) What is \( P[363 < \bar{x} < 366] \)?
(c) What would your answer to part (a) be if we sampled with replacement?
Examples (9)
Given a population of size N = 80 with a mean of 8.2 and a standard deviation of 3.2, what
is the probability that a sample of 25 will have a mean between 21 and 23.5?
(AIOU p-327 Prob. 6.41)
Examples (10)
For a population of size N = 80 with a mean of 8.2 and a standard deviation of 2.1, find the
standard error of the mean for the following sample sizes: (a) n = 16, (b) n = 25, (c) n = 49
(AIOU p-327 Prob. 6.42)
Examples (11)
Data on pull-off force (pounds) for connectors used in an automobile engine application
are as follows: (Douglas Montgomery Ch 7 page 228)
79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7,
75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8.
(a) Calculate a point estimate of the mean pull-off force of all connectors in
the population. State which estimator you used and why.
(b) Calculate point estimates of the population variance and the population
standard deviation.
(c) Calculate the standard error of the point estimate found in part (a). Provide
an interpretation of the standard error.
(d) Calculate a point estimate of the proportion of all connectors in the
population whose pull-off force is less than 73 pounds.
Examples (12)
Data on oxide thickness of semiconductors are as follows:
425, 431, 416, 419, 421, 436, 418, 410, 431, 433, 423, 426,
410, 435, 436, 428, 411, 426, 409, 437, 422, 428, 413, 416.
(a) Calculate a point estimate of the mean oxide thickness for all wafers in the
population.
(b) Calculate a point estimate of the standard deviation of oxide thickness for
all wafers in the population.
(c) Calculate the standard error of the point estimate from part (a).
(d) Calculate a point estimate of the proportion of wafers in the population
that have oxide thickness greater than 430 angstrom.
(Douglas Montgomery Ch 7 page 228)


07 INTERVAL ESTIMATION
An interval estimate for a population parameter is called a confidence interval. A
confidence interval is constructed so that we have high confidence that it does contain the
unknown population parameter.
Objective
Construction of confidence intervals on the mean of a normal distribution using either the normal
distribution or the t-distribution method
Motivation
Whenever we use a mathematical approximation formula, we should try to find out how
much the approximate value can at most deviate from the unknown true value. For example, suppose that
in a certain case we obtain 2.47 as an approximate value from a given formula and 0.02 as the
maximum possible deviation from the unknown exact value. Then we are sure that the values
2.47 - 0.02 = 2.45 and 2.47 + 0.02 = 2.49 include the unknown exact value.
In estimating a parameter \( \theta \), the corresponding problem would be the determination of
two numerical quantities \( \theta_1 \) and \( \theta_2 \) that depend on the sample values and include the unknown value of the
parameter with certainty. However, we already know that from a sample we cannot draw
conclusions about the corresponding population that are 100% certain. So we choose a
probability \( 1-\alpha \) close to 1 (for example \( 1-\alpha \) = 95%, 99%). Then we determine two quantities \( \theta_1 \) and
\( \theta_2 \) such that the probability that \( \theta_1 \) and \( \theta_2 \) include the exact unknown value of the parameter \( \theta \)
is equal to \( 1-\alpha \),
i.e. \( P(\theta_1 \le \theta \le \theta_2) = 1-\alpha \)
The number \( 1-\alpha \) is called the confidence coefficient or the confidence level. It represents
the probability associated with the interval. We should choose \( \alpha \) by considering the affordable
risk of making a false decision. The interval \( (\theta_1, \theta_2) \) is called a \( 100(1-\alpha)\% \) confidence interval for
the unknown parameter \( \theta \). If \( \alpha = 0.05 \), then the probability that the interval \( (\theta_1, \theta_2) \) contains \( \theta \) is
0.95.
Confidence Interval of a Population Mean
To compute a confidence interval for the population mean \( \mu \), we have to see whether or not
a. The population is normal
b. The population standard deviation \( \sigma \) is known
c. The sample size is small
(a) Confidence Interval on the Mean of a Normal Distribution with known \( \sigma^2 \)
Let a sample of size n be drawn from a normal population with an unknown mean \( \mu \) and
known variance \( \sigma^2 \). Then the sampling distribution of the mean \( \bar{x} \) will be normal with mean \( \mu \)
and standard deviation \( \sigma/\sqrt{n} \). Then
\[ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
is standard normal, no matter how small the sample size is.

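The probability that Z falls in the interval \( (-Z_{\alpha/2}, Z_{\alpha/2}) \) is \( 1-\alpha \), and the corresponding
interval is:
\[
\begin{gathered}
-Z_{\alpha/2} \le Z \le Z_{\alpha/2} \\
-Z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2} \\
-Z_{\alpha/2}\,\sigma/\sqrt{n} \le \bar{x} - \mu \le Z_{\alpha/2}\,\sigma/\sqrt{n} \\
-\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n} \le -\mu \le -\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n} \\
\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n} \ge \mu \ge \bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n}
\end{gathered}
\]
i.e.
\[ \bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n} \]
which is the \( 100(1-\alpha)\% \) confidence interval for the mean \( \mu \) of a normal distribution with known \( \sigma^2 \).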
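
The interval derived above can be evaluated with a short function. This is a sketch assuming Python 3.8 or later (`statistics.NormalDist`); the function name `z_confidence_interval` is an illustrative choice, and the numbers in the call are those of the first part of Examples (13): n = 100, \( \sigma^2 = 9 \), \( \bar{x} = 5 \), 95% confidence.

```python
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean of a normal population with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value Z_{alpha/2}
    margin = z * sigma / n ** 0.5             # Z_{alpha/2} * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=5, sigma=3, n=100, confidence=0.95)
print(round(low, 3), round(high, 3))
```

Calling the function again with n = 30 or n = 15 shows how the interval widens as the sample size decreases.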


Examples (13)
Determine a 95% confidence interval for the mean of a normal distribution with variance
\( \sigma^2 = 9 \) using a sample of 100 values taken with replacement with mean \( \bar{x} = 5 \). Repeat the
example with n = 30 and n = 15.
Examples (14)
A confidence interval is constructed from a sample of size 25 taken with replacement for
the mean of a normal population with \( \sigma = 50 \). The limits for the interval are 110.2 and 135.8.
Find the confidence coefficient (or the confidence level).
Examples (15)
A sample of size n = 200 selected without replacement from a population of size N =
1000 with \( \sigma = 1.08 \) showed that \( \bar{x} = 69.2 \). Construct a 95% confidence interval for the true mean
of the population.
Examples (16)
Find a 90% confidence interval for the mean of a normal distribution with \( \sigma = 3 \) given the
sample (2.3, -0.2, -0.4, -0.9).
(b) Confidence Intervals for the Mean \( \mu \) of a Normal Distribution with unknown \( \sigma^2 \)
In practice, the population variance \( \sigma^2 \) is usually not known and is estimated from the
sample data. So when the sample size n is small (n < 30), \( \sigma^2 \) is replaced with its unbiased
estimate
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \]
where v = n-1 is called the degrees of freedom, i.e. the number of values we can choose freely.
The statistic
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
is used.

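and the corresponding interval is:
\[
\begin{gathered}
-t_{\alpha/2}(v) \le t \le t_{\alpha/2}(v) \\
-t_{\alpha/2}(v) \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\alpha/2}(v) \\
-t_{\alpha/2}(v)\,s/\sqrt{n} \le \bar{x} - \mu \le t_{\alpha/2}(v)\,s/\sqrt{n} \\
-\bar{x} - t_{\alpha/2}(v)\,s/\sqrt{n} \le -\mu \le -\bar{x} + t_{\alpha/2}(v)\,s/\sqrt{n} \\
\bar{x} + t_{\alpha/2}(v)\,s/\sqrt{n} \ge \mu \ge \bar{x} - t_{\alpha/2}(v)\,s/\sqrt{n}
\end{gathered}
\]
i.e.
\[ \bar{x} - t_{\alpha/2}(v)\,s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2}(v)\,s/\sqrt{n} \]
which is the \( 100(1-\alpha)\% \) confidence interval for the mean \( \mu \) of a normal distribution with unknown \( \sigma^2 \).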
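
The t-based interval has the same structure as the Z-based one, with the critical value read from a t table. Python's standard library has no t quantile function, so the sketch below takes the tabulated value as an argument; the function name `t_confidence_interval` and the parameter `t_crit` are illustrative. The call uses the numbers of Examples (19): n = 16, \( \bar{x} = 14.5 \), s = 5, 90% confidence, for which the table value is \( t_{0.05}(15) = 1.753 \).

```python
def t_confidence_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the tabulated value t_{alpha/2}(v) for v = n - 1 degrees of
    freedom; it must be looked up for the chosen confidence level.
    """
    margin = t_crit * s / n ** 0.5
    return x_bar - margin, x_bar + margin

# Examples (19): n = 16, so v = 15; 90% confidence gives t_{0.05}(15) = 1.753.
low, high = t_confidence_interval(x_bar=14.5, s=5, n=16, t_crit=1.753)
print(round(low, 2), round(high, 2))
```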


Examples (17)
Five independent measurements of the point of inflammation (flash point) of diesel oil
gave the values 144, 147, 146, 142, 144. Assuming normality, determine a 99% confidence interval
for the mean.
Examples (18)
Find a 99% confidence interval for the mean of a normal population from each of the following samples:
(a) 425, 420, 425, 435
(b) length of 20 bolts, with sample mean 20.2 cm and sample variance 0.04 cm^2
(c) Knoop hardness of diamond: 9500, 9800, 9750, 9200, 9400, 9550
(d) copper content (%) of brass: 66, 66, 65, 64, 66, 67, 64, 65, 63, 64
(e) melting point (°C) of aluminium: 660, 667, 654, 663, 662
Examples (19)
A sample of size 16 from a normal population with unknown standard deviation gave
\( \bar{x} = 14.5 \) and s = 5. Find a 90% confidence interval for the mean.


Examples (20)
For the following sample sizes and confidence levels, find the appropriate t value for
constructing confidence intervals:
(i) n = 28, 95%
(ii) n = 8, 98%
(iii) n = 13, 90%
(iv) n = 10, 95%
(v) n = 25, 99%
Examples (21)
Seven homemakers were randomly sampled, and it was determined that the distances
they walked in their housework had an average of 39.2 miles per week and a sample standard
deviation of 3.2 miles per week. Construct a 95% confidence interval for the population mean.

Examples (22)
Given the following sample sizes and t values used to construct confidence intervals, find
the corresponding confidence level:
(i). n = 27, t = 2.056
(ii). n = 5, t = 2.132
Examples (23)
For a sample size of 10 and a confidence level of 99%, find the appropriate t value for
constructing confidence intervals. Given a sample size of 18 and the t value t = 2.898 used to
construct a confidence interval, find the corresponding confidence level.
Practice Problems
(1) If X ~ N (80, 25) Find
a. a point that has 14% area below it
b. a point that has 85.31% area above it
c. a point that has 30.5 % area above it
d. two points symmetrical to mean containing 92% area between them
(2) If X ~ N(24, 16) Find
a. lower and upper quartiles
b. 37th percentile
c. median
d. mode
(3) In a sample of 36 observations taken without replacement from a normal distribution with mean
98.5 and standard deviation 16.5
(i) what is \( P[85 < \bar{x} < 100] \)?
(ii) Find the corresponding probability given a sample of 36.
