
Sampling and Sampling Distributions
Biswo Poudel
KUSOM
Why Take Samples?
To save money, time
To maximize information gleaned out of
limited resource
Sampling may be the only option when access to
the whole population is impossible. (How would
you survey the owners of old Omega watches in
Nepal?)
Census vs sampling
Question: When is taking a census a better
option than taking a sample?
Answer: When omitting any group of the
population is not tolerable for the researcher.
Example: all airplanes are tested thoroughly
because their performance is individually so
important.
Frame
The frame is the list, map, directory or other
source used to represent the target population
from which the sample is taken.
Examples include school lists, trade association
lists, and lists sold by list brokers.
Frames may have overregistration (including
more units than the target population) or
underregistration (missing some of its units).
Random Vs Nonrandom Sampling
Random Sampling: Every unit of the population has the
same probability of being selected into the sample. For
example: lottery outcomes. This is also called
probability sampling.
Nonrandom Sampling: Not every unit of the population
has the same probability of being selected into the
sample. This is also called nonprobability sampling.
In nonrandom sampling, the probability that a given
unit is selected cannot be determined.
Nonrandom sampling data are not amenable to
analysis by most statistical techniques.
Random Sample Techniques
There are four basic random sample
techniques.
1. Simple Random Sampling: this is
the most elementary technique. Number each
unit of the frame from 1 to N. Select n of
those units into the sample by using a random
number generator.
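The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the frame of 500 numbered units, the sample size of 20, and the function name `simple_random_sample` are all hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement; each unit is equally likely."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical frame: units numbered 1..N
N = 500
frame = list(range(1, N + 1))
sample = simple_random_sample(frame, n=20, seed=42)
print(sorted(sample))
```
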
2. Stratified Random Sampling
In this technique, the population is subdivided into
nonoverlapping subpopulations called strata.
The researcher then extracts a random sample
from each subpopulation.
It has the potential to reduce sampling error.
How to choose strata? (a) Each stratum must be internally
homogeneous, and the strata must contrast with each
other. (b) Stratify by demographic
variables such as gender, socioeconomic class,
geographic region, religion and ethnicity.
Stratified random sampling (SRS) can be
either proportionate or disproportionate.
Proportionate SRS occurs when the
percentage of the sample taken from each
stratum equals the percentage
that the stratum makes up of the whole
population. If the sampling is not
proportionate, then it is disproportionate SRS.
Example of proportionate SRS: Suppose we
are sampling the population of Kathmandu.
Kathmandu has 30% Newars. Suppose you
have divided your population into strata
by ethnicity. If you are taking a sample
of 100 people, then you want to make sure 30
Newars are in the sample.
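Proportionate allocation can be sketched as follows. This is an illustrative sketch only: the population of 10,000 people, the two stratum labels, and the function name are hypothetical, and simple rounding is assumed for the per-stratum counts (so in general the counts may not add up to exactly n).

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Take a random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Assumption: round each stratum's share of n to the nearest whole unit
        k = round(n * len(units) / total)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 10,000 people, 30% of them Newar
strata = {
    "Newar": [f"newar_{i}" for i in range(3000)],
    "Other": [f"other_{i}" for i in range(7000)],
}
sample = proportionate_stratified_sample(strata, n=100, seed=1)
print({name: len(units) for name, units in sample.items()})
# -> {'Newar': 30, 'Other': 70}
```
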
3. Systematic Sampling
Used because of its convenience and relative
ease of administration.
Every kth item is selected to produce a sample
of size n from a population of size N.
The value of k, sometimes called the sampling cycle, is
given by
$$k = \frac{N}{n}$$
For this to be useful, the ordering of the population
elements in the source must be random.
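A short sketch of systematic selection is given below. Two details are assumptions that the slides do not state: the starting position is drawn at random from the first k units, and k is rounded down when N/n is not a whole number. The function name and the frame of 100 units are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every kth unit of the frame, where k = N // n."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = N // n  # sampling cycle, rounded down if N/n is not whole
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumption: random start within the first k units
    return frame[start::k][:n]

# Hypothetical frame of 100 numbered units, sample of 10, so k = 10
frame = list(range(1, 101))
sample = systematic_sample(frame, n=10, seed=3)
print(sample)
```
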
4. Cluster (Area) Sampling
Divide the population into nonoverlapping areas
(clusters) that are internally heterogeneous.
Each cluster is, in theory, a microcosm of the
population.
For example, Chitwan could be a cluster, when
thinking of taking a sample of Nepal. Other
cities, districts, metropolitan areas can also
qualify as a cluster.
After choosing clusters, the researcher either selects all
elements from each cluster or randomly selects
individual elements into the sample from the clusters.
Two-stage sampling: when the clusters are too big, a
second, smaller cluster is picked from within each big cluster.
Advantage: cost, convenience. Since the data are collected
within the selected clusters, travel costs are reduced.
Disadvantage: If the elements of a cluster are similar, then
cluster sampling may be less efficient than simple
random sampling. If all elements of a cluster are the same,
then sampling the cluster is no better than sampling one individual.
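The two options described above (take every element of each chosen cluster, or draw elements at random within it) can be sketched like this. The district names, cluster sizes, and function name are hypothetical, and choosing the clusters themselves at random is an assumption of this sketch.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Pick clusters, then take all of each cluster or a random subset of it."""
    rng = random.Random(seed)
    # Assumption: the clusters themselves are chosen at random
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        if per_cluster is None:
            sample[name] = list(units)  # select all elements of the cluster
        else:
            size = min(per_cluster, len(units))
            sample[name] = rng.sample(units, size)  # random elements within the cluster
    return sample

# Hypothetical clusters: four districts with 40 households each
clusters = {
    district: [f"{district}_{i}" for i in range(40)]
    for district in ["Chitwan", "Jhapa", "Kaski", "Lalitpur"]
}
whole = cluster_sample(clusters, n_clusters=2, seed=7)
partial = cluster_sample(clusters, n_clusters=2, per_cluster=10, seed=7)
print({name: len(units) for name, units in partial.items()})
```
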
Nonrandom Sampling techniques
Also called nonprobability techniques since
chance is not used to select elements for the
sample.
Four nonrandom sampling techniques are
presented here.
1. Convenience Sampling: elements for the
sample are selected for the convenience of
the researcher. Researcher chooses samples
that are readily available.
2. Judgment Sampling
Elements selected for the sample are chosen by
the judgment of the researcher.
Researchers often believe they can obtain a representative
sample by using their sound judgment.
Sampling errors are hard to determine because
the samples are put together nonrandomly.
Problems: judgment errors might run in one
direction (introducing bias), and the researcher is
unlikely to include extreme elements.
3. Quota Sampling
In essence, similar to Stratified Random
Sampling (SRS).
Certain population subclasses are used as
strata.
A nonrandom sampling technique is used to gather
data from each stratum.
For example: one may go to a Newar
community (say in Sundhara, Lalitpur) and
interview people there until the quota is filled.
Advantage: low cost, easy to carry out.
Disadvantage: it is essentially a nonrandom
sampling technique.
4. Snowball Sampling
Survey subjects are selected based on
referrals from other survey respondents.
First pick a person who fits the profile of
subject wanted for the study. Then ask this
person to refer others who have similar
profile.
Advantage: survey subjects are identified
cheaply and efficiently.
Disadvantage: this is nonrandom.
Sampling Errors and Nonsampling Errors
Sampling errors: errors that occur when the
sample is not representative of the
population.
Nonsampling errors: all other errors, such as
missing data, recording errors, input
processing errors, analysis errors, response
errors, errors caused by the measurement
instrument, defective questionnaires, and poorly
defined concepts.
Sample Mean and Sample Proportion
Whenever research produces measurable data such as
weight, distance, time and income, the sample mean is often
the statistic of choice. If the research results in countable
items, such as how many people in a sample choose Coca-Cola,
the sample proportion is often the statistic of choice.
Sample Proportion ($\hat{p}$)
$$\hat{p} = \frac{x}{n}$$
where
x = number of items in a sample that have the characteristic
n = number of items in the sample

Distribution of Sample Mean


Central Limit Theorem: If samples of size n
are drawn randomly from a population that
has a mean of $\mu$ and standard deviation $\sigma$,
then the sample means $\bar{x}$ are approximately
normally distributed for sufficiently large
sample sizes (greater than 30) regardless of
the shape of the population distribution. If the
population is normally distributed, the sample
means are normally distributed for any size
sample.
The sample means have mean $\mu$ and standard
deviation $\sigma/\sqrt{n}$, so the z-value of a sample mean is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
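A small simulation can make the theorem concrete. The sketch below is illustrative only: the choice of an exponential population (mean 1, standard deviation 1, strongly skewed), the sample size of 36, and the 5000 repetitions are all assumptions made for the demonstration. It checks that the sample means centre on the population mean with a spread close to $\sigma/\sqrt{n}$.

```python
import random
import statistics

rng = random.Random(0)

n = 36         # size of each sample (hypothetical)
trials = 5000  # number of samples drawn (hypothetical)

# Skewed population: exponential with mean 1 and standard deviation 1
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

print("mean of sample means:", round(mean_of_means, 3))  # expected near 1
print("sd of sample means:  ", round(sd_of_means, 3))    # expected near 1/sqrt(36)
```

A histogram of `means` would look roughly bell-shaped even though the underlying population is far from normal.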

Example:
Suppose during any hour in a large
department store, the average number of
shoppers is 448, with a standard deviation of
21 shoppers. What is the probability that a
random sample of 49 different shopping hours
will yield a sample mean between 441 and
446 shoppers?
Answer: the problem is to determine $P(441 \le \bar{x} \le 446)$.
Notice that
$$z = \frac{441 - 448}{21/\sqrt{49}} = -2.33, \qquad z = \frac{446 - 448}{21/\sqrt{49}} = -0.67$$
The z-table values for these are 0.4901 and 0.2486, respectively.
This leads to the probability of the sample mean being
between 441 and 446 being 0.4901 - 0.2486 = 0.2415,
i.e. 24.15%.
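The same calculation can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed z-table; because it does not round the z-values to two decimals first, its result differs slightly from the table-based 0.2415.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 448, 21, 49  # values from the shopper example
se = sigma / sqrt(n)        # standard deviation of the sample mean

z_low = (441 - mu) / se
z_high = (446 - mu) / se

std_normal = NormalDist()   # standard normal: mean 0, sd 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2))  # -> -2.33 -0.67
print(round(prob, 4))                     # close to the table answer of 0.2415
```
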
Correction for finite population
If the sample is taken from a finite population
of size N, then the z-value for sample size n
has to be calculated using the following
formula:
$$z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}$$
Example
A production company's 350 hourly employees
average 37.6 years of age, with a standard
deviation of 8.3 years. If a random sample of 45
hourly employees is taken, what is the probability
that the sample will have an average age of less
than 40 years?
$$z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}} = \frac{40 - 37.6}{\dfrac{8.3}{\sqrt{45}} \sqrt{\dfrac{350 - 45}{350 - 1}}} = 2.07$$
Associated z-table probability: 0.4808. The probability of
getting an average of less than 40 years is 0.5 + 0.4808 = 0.9808.
If the correction had not been used...
The answer would have been 0.9738 (with the
associated z-value being 1.94).
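Both versions of the employee-age calculation can be reproduced with a short sketch. The function name `z_for_mean` and its optional `N` argument are hypothetical; `statistics.NormalDist` stands in for the z-table, so the probabilities may differ from the table values in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_for_mean(x_bar, mu, sigma, n, N=None):
    """z-value of a sample mean; applies the finite correction when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # finite population correction factor
    return (x_bar - mu) / se

std_normal = NormalDist()

z_corrected = z_for_mean(40, 37.6, 8.3, 45, N=350)
z_uncorrected = z_for_mean(40, 37.6, 8.3, 45)

print(round(z_corrected, 2), round(std_normal.cdf(z_corrected), 4))
print(round(z_uncorrected, 2), round(std_normal.cdf(z_uncorrected), 4))
```

The correction shrinks the standard error, which pushes the z-value up from about 1.94 to about 2.07.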
Sampling Distribution of proportion
The normal distribution approximates the shape
of the distribution of sample proportions if
$n \cdot p > 5$ and $n \cdot q > 5$ (where $q = 1 - p$).

Z-value for proportion
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p \cdot q}{n}}}$$
Example:
Suppose 60% of the electrical contractors in a
region use a particular brand of wire. What is
the probability of taking a random sample of
size 120 from these electrical contractors and
finding that 0.5 or less use that brand of wire?
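Here $p = 0.60$, $\hat{p} = 0.50$ and $n = 120$, so
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p \cdot q}{n}}} = \frac{0.50 - 0.60}{\sqrt{\dfrac{(0.60)(0.40)}{120}}} = -2.24$$
The z-table value associated with -2.24 is 0.4875. Hence
the probability of z falling at or below this value
is 0.5 - 0.4875 = 0.0125.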
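Example 2:
If 10% of a population of parts is defective,
what is the probability of randomly selecting
80 parts and finding that 12 or more parts are
defective?
Here $\hat{p} = 12/80 = 0.15$ and $p = 0.10$, so
$$z = \frac{0.15 - 0.10}{\sqrt{\dfrac{(0.10)(0.90)}{80}}} = 1.49$$
The z-table value associated with 1.49 is 0.4319, so the
probability of 12 or more defective parts is
0.5 - 0.4319 = 0.0681.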
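The two proportion examples (the wire brand and the defective parts) can be verified with the sketch below. The function name `z_for_proportion` is hypothetical, and `statistics.NormalDist` replaces the z-table, so the tail probabilities agree with the table-based 0.0125 and 0.0681 only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_for_proportion(p_hat, p, n):
    """z-value of a sample proportion p_hat when the population proportion is p."""
    q = 1 - p
    return (p_hat - p) / sqrt(p * q / n)

std_normal = NormalDist()

# Wire brand example: P(p_hat <= 0.50) with p = 0.60, n = 120
z_wire = z_for_proportion(0.50, 0.60, 120)
prob_wire = std_normal.cdf(z_wire)

# Defective parts example: P(p_hat >= 12/80) with p = 0.10, n = 80
z_parts = z_for_proportion(12 / 80, 0.10, 80)
prob_parts = 1 - std_normal.cdf(z_parts)

print(round(z_wire, 2), round(prob_wire, 4))
print(round(z_parts, 2), round(prob_parts, 4))
```
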
