
Statistical Inference: Hypothesis Testing
Learning Objectives

• Understand the hypothesis-testing procedure using one-tailed and two-tailed tests

• Understand the concepts of Type I and Type II errors in hypothesis testing

• Understand the procedure of hypothesis testing


Introduction to Hypothesis Testing

• A statistical hypothesis is an assumption about an unknown population parameter.

• Hypothesis testing is a well-defined procedure that helps us decide objectively whether to accept or reject a hypothesis based on the information available from a sample.

• In statistical analysis, probability is used to specify the level at which a researcher concludes that the observed difference between the sample statistic and the population parameter is not due to chance.
Hypothesis Testing Procedure
Seven steps of hypothesis testing
Step 1: Set Null and Alternative Hypotheses
• The null hypothesis, generally denoted H0 (H sub-zero), is the hypothesis that is tested for possible rejection under the assumption that it is true. A null hypothesis typically states "no difference" or the status quo, and it is considered true unless the collected sample data prove it wrong.
• Symbolically, a null hypothesis is represented as $H_0: \mu = \mu_0$, where $\mu$ is the population mean and $\mu_0$ is its hypothesized value.

• The alternative hypothesis, generally denoted H1 (H sub-one), is the logical opposite of the null hypothesis.

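• Symbolically, the alternative hypothesis is represented as $H_1: \mu \neq \mu_0$ (two-tailed test), or as $H_1: \mu < \mu_0$ or $H_1: \mu > \mu_0$ (one-tailed tests).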


Step 2: Determine the Appropriate Statistical Test

• The type, number, and level of measurement of the data help determine which statistical test is appropriate.
Step 3: Set the Level of Significance

• The level of significance, generally denoted by α, is the probability of rejecting the null hypothesis when it is actually true.

• The level of significance is also known as the size of the rejection region or the size of the critical region.

• The levels of significance most commonly used by researchers are 0.01, 0.05, and 0.10.
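As a minimal sketch using Python's standard-library `statistics.NormalDist`, the critical z values that correspond to these common significance levels can be computed instead of read from a table. The helper name `critical_z` is hypothetical:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_z(alpha: float, tails: int = 2) -> float:
    """Return the positive critical z value for significance level alpha.

    For a two-tailed test alpha is split equally between both tails;
    for a one-tailed test all of alpha sits in a single tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return STD_NORMAL.inv_cdf(1 - alpha / tails)


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}  two-tailed ±{critical_z(alpha, 2):.3f}  "
          f"one-tailed {critical_z(alpha, 1):.3f}")
```

Because a two-tailed test splits α between the tails, the two-tailed value at α = 0.10 equals the one-tailed value at α = 0.05 (about 1.645).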
Type I and Type II Errors
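When a researcher tests a statistical hypothesis, there are four possible outcomes:
• H0 is true and is accepted: correct decision.
• H0 is true but is rejected: Type I error (probability α).
• H0 is false but is accepted: Type II error (probability β).
• H0 is false and is rejected: correct decision (probability 1 − β, the power of the test).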
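The meaning of α as the probability of a Type I error can be illustrated with a small simulation. This is a sketch under assumed, hypothetical values (a normal population with mean 50 and standard deviation 10, samples of size 30): every sample is drawn with H0 actually true, so each rejection is a Type I error, and the long-run rejection rate should settle near α.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, mu0=50.0, sigma=10.0, n=30,
                      trials=20_000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis.

    mu0, sigma and n are hypothetical illustration values. Samples come from
    a population whose mean really is mu0, so every rejection is a Type I error.
    """
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


print(f"estimated Type I error rate: {type_i_error_rate():.4f}")
```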
Step 4: Set the Decision Rule
Acceptance and rejection regions of null hypothesis (two-tailed test)

The area under the normal curve is divided into two mutually exclusive regions: the acceptance region (where the null hypothesis is accepted) and the rejection region, also called the critical region (where the null hypothesis is rejected).
Two-Tailed Test of Hypothesis

• Let us consider the null and alternative hypotheses $H_0: \mu = \mu_0$ and $H_1: \mu \neq \mu_0$.

• A two-tailed test has a rejection region in both tails of the sampling distribution of the test statistic. The researcher rejects the null hypothesis if the computed sample statistic is significantly higher or significantly lower than the hypothesized population parameter, that is, if it falls in either the right tail or the left tail.
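The two-tailed decision rule can be sketched as follows, assuming a z statistic has already been computed; the function name is hypothetical. H0 is rejected when |z| exceeds the critical value z(α/2), which is equivalent to the two-tailed p-value falling below α:

```python
from statistics import NormalDist


def two_tailed_z_decision(z: float, alpha: float = 0.05):
    """Apply the two-tailed decision rule to a computed z statistic.

    Returns (reject H0?, critical value, two-tailed p-value).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # e.g. 1.96 at alpha = 0.05
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # area in both tails beyond |z|
    reject = abs(z) > z_crit
    return reject, z_crit, p_value


print(two_tailed_z_decision(2.10))    # falls in the right rejection region
print(two_tailed_z_decision(-1.50))   # falls in the acceptance region
```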
Acceptance and rejection regions (alpha = 0.05)
One-Tailed Test of Hypothesis
Let us consider the null hypothesis $H_0: \mu = \mu_0$ with the alternative $H_1: \mu < \mu_0$ (left-tailed test) or $H_1: \mu > \mu_0$ (right-tailed test).

A one-tailed test has its rejection region in only one tail of the sampling distribution of the test statistic. In a left-tailed test, the researcher rejects the null hypothesis if the computed sample statistic is significantly lower than the hypothesized population parameter.

In a right-tailed test, the researcher rejects the null hypothesis if the computed sample statistic is significantly higher than the hypothesized population parameter.
Acceptance and rejection regions for a one-tailed (left) test (alpha = 0.05)
Acceptance and rejection regions for a one-tailed (right) test (alpha = 0.05)
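A corresponding sketch for one-tailed tests, again assuming a computed z statistic and using a hypothetical helper name. All of α is placed in a single tail, so at α = 0.05 the critical value is about 1.645 rather than 1.96:

```python
from statistics import NormalDist


def one_tailed_z_decision(z: float, alpha: float = 0.05, tail: str = "right"):
    """Apply a left- or right-tailed decision rule to a computed z statistic.

    Returns (reject H0?, one-tailed p-value).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # positive critical value, e.g. 1.645
    if tail == "right":
        reject = z > z_crit                   # rejection region in the right tail
        p_value = 1 - std_normal.cdf(z)
    elif tail == "left":
        reject = z < -z_crit                  # rejection region in the left tail
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return reject, p_value


print(one_tailed_z_decision(1.80, tail="right"))
print(one_tailed_z_decision(1.80, tail="left"))
```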
Step 5: Collect the Sample Data

• In this step, the data are collected and the appropriate sample statistics are computed.

• The first four steps should be completed before the data for the study are collected.

• It is not advisable to collect the data first and then decide on the stages of hypothesis testing.
Step 6: Analyse the Data

• In this step, the researcher computes the test statistic, which involves selecting the appropriate probability distribution for the particular test.
• Commonly used test statistics are z, t, F, and χ².
Step 7: Arrive at a Statistical Conclusion and Business Implication

• In this step, the researcher draws a statistical conclusion, that is, a decision to accept or reject the null hypothesis.
• Statisticians present the information obtained from the hypothesis-testing procedure to decision makers, who make decisions on the basis of this information. Ultimately, the decision maker judges whether a statistically significant result is also substantively important and should be implemented to meet the organization’s goals.
Hypothesis Testing for a Single Population Mean Using the Z Statistic

• When the sample size is greater than or equal to 30.
• When the population is normally distributed.
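For reference, the z statistic for a single population mean takes the standard form below, where $\bar{x}$ is the sample mean, $\mu$ the hypothesized population mean, $\sigma$ the population standard deviation (assumed known, as in the example that follows), and $n$ the sample size:

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
```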

A marketing research firm conducted a survey 10 years ago and found that the average household income in a particular geographic region was Rs 10,000. Mr. Gupta, who recently joined the firm as a vice president, has expressed doubts about the accuracy of these data. To verify the data, the firm takes a random sample of 200 households, which yields a sample mean household income of Rs 11,000. Assume that the population standard deviation of household income is Rs 1200. Verify Mr. Gupta’s doubts using the seven steps of hypothesis testing. Let α = 0.05 (5%).
Example (Solution)
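One way to carry out the computation for this example is sketched below. It assumes a two-tailed test (H0: μ = 10,000 against H1: μ ≠ 10,000), since Mr. Gupta doubts the figure without specifying a direction; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 10_000     # hypothesized mean household income (Rs), from the old survey
x_bar = 11_000   # sample mean household income (Rs)
sigma = 1_200    # population standard deviation (Rs)
n = 200          # sample size
alpha = 0.05

standard_error = sigma / sqrt(n)
z = (x_bar - mu0) / standard_error
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed: H1 is mu != 10,000

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Do not reject H0")
```

The standard error is about 84.85 and z is about 11.79, far beyond ±1.96, so under this two-tailed assumption H0 is rejected at α = 0.05.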
Hypothesis Testing for a Single Population Mean Using the t Statistic (Case of a Small Random Sample, n < 30)
When a researcher draws a small random sample (n < 30) to estimate the population mean μ, the population standard deviation is unknown, and the population is normally distributed, the t test can be applied.
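The t statistic replaces σ with the sample standard deviation s and has n − 1 degrees of freedom. The sketch below computes it with the standard library; the helper name is hypothetical, the tyre lives are invented illustration values (not the data of Table 10.4), and the critical t value still has to be read from a t table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0.

    Uses the sample standard deviation s (n - 1 divisor) in place of the
    unknown population standard deviation.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = stdev(sample)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical tyre lives in km (NOT the data of Table 10.4).
lives = [39_500, 40_200, 38_900, 39_800, 40_100, 39_300, 39_700, 39_600]
t, df = one_sample_t(lives, 40_000)
print(f"t = {t:.3f} with {df} degrees of freedom")
```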
Example
Royal Tyres has launched a new brand of tyres for tractors and claims
that under normal circumstances the average life of the tyres is
40,000 km. A retailer wants to test this claim and has taken a
random sample of 8 tyres. He tests the life of the tyres under normal
circumstance. The results obtained are presented in Table 10.4.
Example (Solution)
Figure: Computed and critical t values for Example 10.4
Let's Do It!

A cable TV network company wants to provide modern facilities to its consumers. The company has five-year-old data which reveal that the average household income is Rs 120,000. Company officials believe that, due to the fast development in the region, the average household income might have increased. The company takes a random sample of 25 households to verify this assumption. From the sample, the average household income is calculated as Rs 125,000. From historical data, the standard deviation is Rs 1200. Use alpha = 0.05 to verify the finding.
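A possible solution sketch for this exercise, assuming a right-tailed test (H0: μ = 120,000 against H1: μ > 120,000, because officials expect an increase) and treating the historical standard deviation of Rs 1200 as the population value:

```python
from math import sqrt
from statistics import NormalDist

mu0 = 120_000    # historical mean household income (Rs)
x_bar = 125_000  # sample mean household income (Rs)
sigma = 1_200    # standard deviation from historical data (Rs), treated as known
n = 25
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))      # 5000 / 240
z_crit = NormalDist().inv_cdf(1 - alpha)    # right-tailed: H1 is mu > 120,000

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```

Here z is about 20.83, well above 1.645, so H0 is rejected. If the 1200 were instead treated as a sample standard deviation, the same value would be a t statistic with 24 degrees of freedom; its critical value is only slightly larger than 1.645, so the conclusion would not change.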
Statistical Inference: Hypothesis Testing for Two Populations
Hypothesis Testing for the Difference Between Two Population Means Using the Z Statistic (Case of a Large Random Sample, n1, n2 > 30, When Population Standard Deviations Are Known)

When the sample sizes are large (n1, n2 > 30), the samples are independent (not related), and the population standard deviations are known, the z statistic can be used to test the hypothesis about the difference between two population means.
For large independent samples with known population standard deviations, the test statistic is $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$.
LET’S DO IT !

The amount of a certain trace element in blood is known to vary with a standard
deviation of 14.1 ppm (parts per million)
for male blood donors and 9.5 ppm for
female donors. Random samples of 75 male
and 50 female donors yield concentration
means of 28 and 33 ppm, respectively.
What is the likelihood that the population
means of concentrations of the element are
the same for men and women?
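The likelihood asked for here can be read as the two-tailed p-value of the z test for H0: μ1 = μ2. A minimal sketch, with a hypothetical helper `two_sample_z` implementing the large-sample formula for known standard deviations:

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for H0: mu1 - mu2 = hypothesized_diff (sigmas known)."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((x1 - x2) - hypothesized_diff) / standard_error


# Male (1) vs female (2) donors, figures taken from the exercise above.
z = two_sample_z(28, 33, 14.1, 9.5, 75, 50)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed: H1 is mu1 != mu2
print(f"z = {z:.3f}, two-tailed p-value = {p_value:.4f}")
```

This gives z ≈ −2.37 and p ≈ 0.018, so equal population means would be rejected at the 5% level but not at the 1% level. The same helper can be reused for the Dominos exercise by changing the inputs.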
Your Turn !
Dominos wants to test its claim about who can eat more slices of pizza at a pizza-eating festival: males or females. For this purpose, it randomly selected 55 males and 50 females. The average number of slices eaten by the males was 450, with a population standard deviation of 25 (from historical data), and the average number of slices eaten by the females was 550, with a population standard deviation of 20 (from historical data). On the basis of the samples taken for the study, test the difference in population means at the 5% level of significance and help Dominos.
Hypothesis Testing for the Difference Between Two Population Means Using the t Statistic (Case of a Small Random Sample, n1, n2 < 30, When Population Standard Deviations Are Unknown)

When the sample sizes are small (n1, n2 < 30), the samples are independent (not related), and the population standard deviations are unknown, the t statistic can be used to test the hypothesis about the difference between two population means.
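Assuming equal population variances, the test statistic is $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance, with $n_1 + n_2 - 2$ degrees of freedom.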
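A sketch of the pooled-variance t statistic, assuming independent samples from normal populations with equal variances. The helper name and the two small samples are hypothetical and are not the data of Table 11.2:

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = hypothesized_diff.

    Assumes equal population variances. Returns (t, degrees of freedom).
    """
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)   # n - 1 divisor
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    standard_error = sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = ((mean(sample1) - mean(sample2)) - hypothesized_diff) / standard_error
    return t, n1 + n2 - 2


# Hypothetical amounts customers would spend (NOT the data of Table 11.2).
city_a = [30, 32, 28, 35, 31, 29]
city_b = [27, 26, 30, 25, 28]
t, df = pooled_t(city_a, city_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```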
LET’S DO IT !
Anmol Constructions is a leading company in the construction sector in India. It wants to construct flats in Raipur and Dehradun, the capitals of the newly formed states of Chhattisgarh and Uttarakhand, respectively. The company wants to estimate the amount that customers are willing to spend on purchasing a flat in the two cities. It randomly selected 25 potential customers from Raipur and 27 potential customers from Dehradun and asked them, “How much are you willing to spend on a flat?” The data collected from the two cities are shown in Table 11.2(a) and Table 11.2(b). The company assumes that the customers’ purchase intention is normally distributed, with an equal variance of 15.39 in the two cities taken for the study. On the basis of the samples taken for the study, estimate the difference in population means at the 95% confidence level.
Example 11.2 (Contd.)
Solution (Example 11.2)
Statistical Inference About the Difference
Between the Means of Two Related Populations
(Matched Samples)

• For dependent (related) samples, the two samples taken in the study must be of the same size, because each observation in one sample is paired with an observation in the other.

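• The t formula to test the difference between the means of two related populations (matched samples) is $t = \frac{\bar{d} - D}{s_d / \sqrt{n}}$, where $d$ is the difference within each pair of observations, $\bar{d}$ and $s_d$ are the mean and standard deviation of these differences, $D$ is the hypothesized mean difference, and $n$ is the number of pairs; the test has $n - 1$ degrees of freedom.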
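A minimal sketch of the matched-samples t statistic, taking each difference as after − before. The helper name and the five pairs of scores are hypothetical, not the data of Table 11.3:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, hypothesized_diff=0.0):
    """t statistic for matched samples, H0: mean difference = hypothesized_diff.

    Differences are taken as after - before. Returns (t, degrees of freedom).
    """
    if len(before) != len(after):
        raise ValueError("matched samples must have the same size")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    t = (mean(d) - hypothesized_diff) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical attitude scores for 5 employees (NOT the data of Table 11.3).
before = [30, 28, 35, 32, 27]
after = [34, 29, 38, 33, 31]
t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```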
Example 11.3

An electronic goods company arranged a special training programme for one segment of its employees. The company wants to measure the change in the attitude of its employees after the training. For this purpose, it used a well-designed questionnaire consisting of 10 questions on a 1 to 5 rating scale (1 is strongly disagree and 5 is strongly agree). The company selected a random sample of 10 employees. The scores obtained by these employees are given in Table 11.3, with an S.D. of 4.44.

Use α = 0.10 to determine whether there is a significant change in the attitude of employees after the training programme.
Solution (Example 11.3)
F Test for the Difference in Two Population Variances
F Distribution
• In the F distribution, degrees of freedom are attached to both the numerator and the denominator, and together they determine the shape of the distribution. The F distribution assumes that the populations from which the samples are drawn are normally distributed.
• The F distribution is neither symmetric nor centred at zero. So the simple procedure of obtaining the upper-tail value and merely placing a minus sign before it to obtain the lower-tail value does not apply here.
• The F value is always positive because it is a ratio of two variances (squared quantities). The lower-tail value is obtained using the reciprocal property of the F distribution.
F Distribution (Contd.)
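The reciprocal property can be stated as $F_{1-\alpha,\,\nu_1,\,\nu_2} = \dfrac{1}{F_{\alpha,\,\nu_2,\,\nu_1}}$, where $F_{\alpha,\,\nu_1,\,\nu_2}$ is the value with area α to its right, $\nu_1$ is the numerator degrees of freedom, and $\nu_2$ is the denominator degrees of freedom.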
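A sketch of the two-tailed F test that uses the reciprocal property to obtain the lower critical value. It assumes the caller supplies upper-tail critical values read from an F table (Python's standard library has no F quantile function); the helper name is hypothetical.

```python
from statistics import variance


def f_test_two_tailed(sample1, sample2, f_upper_12, f_upper_21):
    """Two-tailed F test of H0: sigma1^2 = sigma2^2.

    f_upper_12: upper alpha/2 critical value with (n1 - 1, n2 - 1) df, from an F table.
    f_upper_21: upper alpha/2 critical value with the df swapped, (n2 - 1, n1 - 1).
    Returns (F, lower critical value, upper critical value, reject H0?).
    """
    f = variance(sample1) / variance(sample2)   # ratio of sample variances
    f_lower = 1 / f_upper_21                    # reciprocal property
    reject = f < f_lower or f > f_upper_12
    return f, f_lower, f_upper_12, reject
```

For Example 11.5, with 10 and 13 sampled days, one would pass the upper α/2 points for (9, 12) and (12, 9) degrees of freedom.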
Example 11.5

A plant has installed two machines producing polythene bags. During the installation, the manufacturer of the machines stated that each machine can produce 20 bags in a day. Owing to various factors, such as different operators working on the machines and raw material, there is variation in the number of bags produced at the end of the day. The company researcher has taken a random sample of 10 days for machine 1 and 13 days for machine 2. The following data give the number of units produced on each sampled day by the two machines:

How can the researcher determine whether the two samples come from populations with equal variances or from populations with unequal variances? Take α = 0.05 as the level of significance.
Solution (Example 11.5)
