
INFERENTIAL STATISTICS
BY:
ADIBARAWIAH BT ABDUL RAHIM (D20102040002)
NIK NURHAFIZAH BT NIK DAUD (D20102039968)
NOOR AZIE HARYANIS BT ABDUL AZIZ (D20102039969)

WHAT ARE INFERENTIAL STATISTICS?


Inferential statistics refer to certain procedures that
allow researchers to make inferences about a
population based on data obtained from a sample.

THE LOGIC OF INFERENTIAL STATISTICS

Sampling Error
Definition
The difference between a sample and its population (based on the data obtained).
Arises as a result of taking a sample from the population rather than using the whole population.

Distribution of Sample Means

Also called the sampling distribution.
It has its own mean and standard deviation.
Its mean is the mean of the sample means.
This mean is equal to the mean of the population.

Large collections of random samples

Pattern themselves in such a way that researchers can accurately predict some characteristics of the population from which the samples were selected.
Their means tend to be normally distributed, provided each sample is large (more than 30).

Standard Error Of The Mean (SEM)

The standard deviation of the sampling distribution is called the standard error of the mean (SEM):

SEM = SD / √n

where SD = standard deviation and n = sample size
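A minimal sketch of the computation in Python (the scores are hypothetical, and the sample standard deviation stands in for the population SD):

```python
import math
import statistics

scores = [72, 85, 90, 68, 77, 95, 88, 74, 81, 79]  # hypothetical sample

sd = statistics.stdev(scores)   # sample SD, estimating the population SD
n = len(scores)
sem = sd / math.sqrt(n)         # SEM = SD / sqrt(n)
print(f"SEM = {sem:.2f}")
```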

CONFIDENCE INTERVALS
A confidence interval is a region extending both above and below a sample statistic (such as a sample mean) within which a population parameter (such as the population mean) may be said to fall, with a specified probability of being wrong.
Confidence limits are the boundaries within which the population mean is said to lie.
About 68% of sample means fall within 1 SEM of the mean.
About 95% fall within 2 SEM (more precisely, 1.96 SEM) of the mean.
About 99% fall within 2.58 SEM of the mean (99.7% fall within 3 SEM).
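A sketch of a 95% interval built from the sample mean and SEM, reusing the hypothetical scores above and the 1.96-SEM rule:

```python
import math
import statistics

scores = [72, 85, 90, 68, 77, 95, 88, 74, 81, 79]  # hypothetical sample

mean = statistics.mean(scores)
sem = statistics.stdev(scores) / math.sqrt(len(scores))

# 95% confidence interval: roughly mean +/- 2 SEM (1.96 more precisely)
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"mean = {mean:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```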

The Standard Error Of The Difference Between Sample Means

The standard error of the difference (SED) is the standard deviation of the distribution of differences between sample means.
Formula: SED = √(SEM₁² + SEM₂²), where SEM₁ and SEM₂ are the standard errors of the means of the two samples.
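Assuming the SED is obtained by combining the two groups' SEMs as √(SEM₁² + SEM₂²), a short sketch with hypothetical groups:

```python
import math
import statistics

def sem(sample):
    """Standard error of the mean for one sample."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

group_a = [85, 90, 78, 92, 88, 81, 79, 86]  # hypothetical scores
group_b = [80, 75, 83, 72, 78, 85, 74, 77]

# SED combines the two standard errors
sed = math.sqrt(sem(group_a) ** 2 + sem(group_b) ** 2)
print(f"SED = {sed:.2f}")
```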

HYPOTHESIS TESTING


HYPOTHESIS TESTING

RESEARCH
HYPOTHESIS

NULL
HYPOTHESIS

HYPOTHESIS TESTING
Statistical hypothesis testing is a way of
determining the probability that an obtained sample
statistic will occur, given a hypothetical population
parameter.

RESEARCH HYPOTHESIS
A research hypothesis specifies the nature of the
relationship the researcher thinks exists in the
population.
E.g.:
The population mean of students using method A
is greater than the population mean of students
using method B.

NULL HYPOTHESIS
The null hypothesis typically specifies that there is no
relationship in the population.
E.g.:
There is no difference between the population mean of
students using method A and the population mean of
students using method B.
(This is the same thing as saying the difference between
the means of the two populations is zero.)

Steps in testing a hypothesis

1. State the research hypothesis.


There is a difference between the population mean of students using method A and the
population mean of students using method B

2. State the null hypothesis.


There is no difference between the population mean of students using method A and the
population mean of students using
method B.

3. Determine the sample statistics pertinent to the hypothesis.


The mean of sample A and the mean of sample B.

4. Determine the probability of obtaining the sample results.


That is, the probability of obtaining the observed difference between the mean of sample A and the mean of sample B.

5. If the probability is small, reject the null hypothesis, thus affirming the
research hypothesis. If the probability is large, do not reject the null
hypothesis, which means you cannot affirm the research hypothesis.
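As a concrete sketch of these five steps, the snippet below runs a two-sample t-test on hypothetical scores for methods A and B; Python's scipy library is an assumption here, since the slides name no software:

```python
from scipy import stats

# Hypothetical achievement scores under two teaching methods
method_a = [85, 90, 78, 92, 88, 81, 79, 86, 94, 83]
method_b = [80, 75, 83, 72, 78, 85, 74, 77, 79, 81]

# Steps 1-2: research hypothesis (the means differ) vs. null (no difference)
# Step 3: the pertinent sample statistics are the two sample means
# Step 4: probability of the observed difference if the null were true
t, p = stats.ttest_ind(method_a, method_b)

# Step 5: small probability -> reject the null
alpha = 0.05
if p < alpha:
    print(f"p = {p:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p:.3f} >= {alpha}: do not reject the null hypothesis")
```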

PRACTICAL VS.
STATISTICAL
SIGNIFICANCE


PRACTICAL SIGNIFICANCE
A calculated difference is practically significant if
the actual difference it is estimating will affect a
decision to be made.
Practical significance is more subjective and is
based on other factors like cost, requirements,
program goals, etc.

When determining practical significance, the researcher must consider the following:

The quality of the research questions
The relative size of the effect
The size of the sample
The importance of the finding
Confidence intervals
The link to previous research
The strength of correlation

STATISTICAL SIGNIFICANCE
Statistical significance only means that one's results are likely to occur by chance less than a certain percentage of the time, say 5 percent.
It reflects the degree of risk you are willing to take of rejecting a null hypothesis when it is actually true.
Statistical significance is mathematical: it comes from the data (sample size) and from your confidence (how confident you want to be in your results).

TESTS OF STATISTICAL
SIGNIFICANCE
A one-tailed test of significance involves the use of
probabilities based on one-half of a sampling
distribution because the research hypothesis is a
directional hypothesis.
A two-tailed test, on the other hand, involves the
use of probabilities based on both sides of a
sampling distribution because the research
hypothesis is a nondirectional hypothesis.

TWO TAILED TESTS


If you are using a significance level of 0.05, a two-tailed test allots half of
your alpha to testing the statistical significance in one direction and half of
your alpha to testing statistical significance in the other direction.
This means that .025 is in each tail of the distribution of your test statistic.
When using a two-tailed test, regardless of the direction of the relationship
you hypothesize, you are testing for the possibility of the relationship in
both directions.
For example, we may wish to compare the mean of a sample to a given value x using a t-test. Our null hypothesis is that the mean is equal to x. A two-tailed test will test both whether the mean is significantly greater than x and whether the mean is significantly less than x. The mean is considered significantly different from x if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.
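A sketch of this two-tailed comparison using scipy's one-sample t-test (the sample values and the target value x are hypothetical):

```python
from scipy import stats

sample = [4.8, 5.2, 5.1, 4.9, 5.4, 5.0, 4.7, 5.3]  # hypothetical measurements
x = 5.0                                            # the given value

# Two-tailed by default: tests both mean > x and mean < x
t, p = stats.ttest_1samp(sample, popmean=x)
print(f"t = {t:.2f}, two-tailed p = {p:.3f}")
```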

ONE TAILED TESTS


If you are using a significance level of .05, a one-tailed test allots all of your
alpha to testing the statistical significance in the one direction of interest.
This means that .05 is in one tail of the distribution of your test statistic.
When using a one-tailed test, you are testing for the possibility of the
relationship in one direction and completely disregarding the possibility of a
relationship in the other direction.
Our null hypothesis is that the mean is equal to x. A one-tailed test will test either whether the mean is significantly greater than x or whether the mean is significantly less than x, but not both.
Then, depending on the chosen tail, the mean is significantly greater than or less than x if the test statistic is in the top 5% or bottom 5% of its probability distribution, resulting in a p-value less than 0.05.
The one-tailed test provides more power to detect an effect in one direction
by not testing the effect in the other direction.
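The same comparison as a one-tailed test; the alternative keyword is a scipy feature (available in scipy 1.6 and later) and the data remain hypothetical:

```python
from scipy import stats

sample = [4.8, 5.2, 5.1, 4.9, 5.4, 5.0, 4.7, 5.3]  # same hypothetical data

# All of alpha goes into the upper tail: tests only mean > x
t, p = stats.ttest_1samp(sample, popmean=5.0, alternative='greater')
print(f"t = {t:.2f}, one-tailed p = {p:.3f}")
```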

TYPE I AND TYPE II ERRORS

Type I error: rejecting the null hypothesis when it is true.
The probability of making a Type I error:
- Set by the researcher (the significance level).
- E.g., .05 = 5% chance of rejecting the null when it is true; .01 = 1% chance.
- Not the probability of making one or more Type I errors on multiple tests of the null.

Type II error: not rejecting the null hypothesis when it is false.
The probability of making a Type II error:
- Not directly controlled by the researcher.
- Reduced by increasing sample size.

Making a decision

If you                          When the null hypothesis is actually      Then you have
Reject the null hypothesis      True (there really are no differences)    Made a Type I error
Reject the null hypothesis      False (there really are differences)      Made a correct decision
Accept the null hypothesis      False (there really are differences)      Made a Type II error
Accept the null hypothesis      True (there really are no differences)    Made a correct decision

SIGNIFICANCE LEVELS
The term significance level (or level of significance),
as used in research, refers to the probability of a
sample statistic occurring as a result of sampling
error.
The significance levels most commonly used in educational research are the .05 and .01 levels.
Statistical significance and practical significance are
not necessarily the same. Even if a result is
statistically significant, it may not be practically
(i.e., educationally) significant.

Probability Values
p > .05 (deemed likely to be a result of chance)
p < .05 (not likely to be a result of chance)
p < .01 (less likely to be a result of chance)
p < .001 (even less likely to be a result of chance)
Researchers are increasingly reporting the actual probability value rather than using < or > signs (e.g., p = .063).

INFERENCE TECHNIQUES


Commonly Used Inferential Techniques

Data type      Parametric                                     Nonparametric
Quantitative   t-test for means                               Mann-Whitney U test
               Analysis of variance (ANOVA)                   Kruskal-Wallis one-way analysis of variance
               Analysis of covariance (ANCOVA)                Sign test
               Multivariate analysis of variance (MANOVA)     Friedman two-way analysis of variance
               t-test for r
Categorical    t-test for difference in proportions           Chi-square test

PARAMETRIC TESTS FOR QUANTITATIVE DATA

A parametric statistical test requires various kinds of assumptions about the nature of the population from which the samples involved in the research study were taken.

The t-Test for Means

Used to see whether a difference between the means of two samples is significant.
Produces a value for t (called an obtained t), which is used to determine whether the chosen level of significance (e.g., p = .05) has been reached.
The researcher rejects the null hypothesis and concludes that a real difference exists when the level of significance is reached.
Two forms of t-test:
A t-test for independent means
A t-test for correlated means

t-test for independent means

Used to compare the mean scores of two different (independent) groups.

Example: Two randomly selected groups of eighth graders (31 in each group) were exposed to two different methods of teaching for a semester, and both groups were given the same achievement test at the end of the semester. Their achievement scores could be compared using a t-test.

Null hypothesis: population mean of method A = population mean of method B
Research hypothesis: population mean of method A > population mean of method B

The mean score of the achievement test for method A is 85 and for method B is 80. A one-tailed t-test is conducted on the difference between the two teaching methods (85 − 80 = 5) to conclude whether the difference is statistically significant or not.


t-test for correlated means

Used to compare the mean scores of the same group before and after a treatment of some sort is given.
Used when the same subjects receive two different treatments in a study.

Example: A researcher wants to investigate the effectiveness of relaxation training for reducing the level of anxiety athletes experience and thus improving their performance at the free throw line. She formulates these hypotheses:

Null hypothesis: There will be no change in performance at the free throw line.
Research hypothesis: Performance at the free throw line will improve.
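A sketch of the free-throw example as a paired (correlated-means) t-test; scipy and the before/after scores are assumptions for illustration:

```python
from scipy import stats

# Hypothetical free-throw makes (out of 20) before and after relaxation training
before = [12, 9, 14, 10, 11, 13, 8, 12, 10, 11]
after  = [14, 11, 15, 13, 12, 15, 10, 13, 12, 14]

# One-tailed paired test: the research hypothesis says performance improves
t, p = stats.ttest_rel(after, before, alternative='greater')
print(f"t = {t:.2f}, one-tailed p = {p:.3f}")
```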


Analysis of Variance (ANOVA)

Compares the means of more than two groups by forming a ratio of the variance between the groups to the variance within the groups.
It is more versatile than a t-test and should be used in most cases instead of the t-test.
The analysis allows comparison of the sample means and testing of the null hypothesis that there is no significant difference between the means of the samples.
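A minimal one-way ANOVA sketch with three hypothetical groups, using scipy:

```python
from scipy import stats

# Hypothetical scores under three conditions
a = [85, 90, 78, 92, 88]
b = [80, 75, 83, 72, 78]
c = [88, 91, 86, 94, 90]

# F is a ratio of between-group to within-group variance
f, p = stats.f_oneway(a, b, c)
print(f"F = {f:.2f}, p = {p:.3f}")
```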


Analysis of Covariance (ANCOVA)


Used when groups are given a pretest related in some
way to the dependent variable and their mean scores
on this pretest are found to differ.
Enables the researcher to adjust the posttest mean
scores on the dependent variable for each group to
compensate for the initial differences between the groups on
the pretest. The pretest is called the covariate.
How much the posttest mean scores must be adjusted
depends on how large the difference between the pretest
means is and the degree of relationship between the
covariate and the dependent variable.
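One common way to run an ANCOVA is a regression of the posttest on the pretest (the covariate) plus group membership; this statsmodels sketch, with hypothetical data and column names, assumes that formulation:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pretest/posttest scores for two groups
df = pd.DataFrame({
    "group":    ["A"] * 5 + ["B"] * 5,
    "pretest":  [50, 55, 48, 60, 52, 58, 62, 57, 65, 61],
    "posttest": [70, 74, 68, 78, 71, 73, 77, 72, 80, 75],
})

# Posttest means adjusted for pretest differences (the covariate)
model = smf.ols("posttest ~ pretest + C(group)", data=df).fit()
print(model.summary())
```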


Multivariate analysis of variance (MANOVA)

Differs from ANOVA in only one respect: it incorporates two or more dependent variables in the same analysis, thus permitting a more powerful test of differences among means.
It is justified only when the researcher has reason to believe correlations exist among the dependent variables.
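A MANOVA sketch using statsmodels (an assumption; the slides name no software), with two hypothetical dependent variables believed to be correlated:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: two (presumably correlated) dependent variables
df = pd.DataFrame({
    "group":   ["A"] * 5 + ["B"] * 5,
    "reading": [70, 74, 68, 78, 71, 60, 65, 58, 66, 62],
    "writing": [65, 70, 64, 72, 68, 55, 61, 52, 60, 58],
})

# Both dependent variables enter the same analysis
fit = MANOVA.from_formula("reading + writing ~ group", data=df)
print(fit.mv_test())
```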


The t-Test for r

Used to see whether a correlation coefficient calculated on sample data is significant; that is, whether it represents a nonzero correlation in the population from which the sample was drawn.
The statistic being dealt with is a correlation coefficient (r) rather than a difference between means. The test produces a value for t (again called an obtained t), which the researcher checks in a statistical probability table to see whether it is statistically significant. As with the other parametric tests, the larger the obtained value for t, the greater the likelihood that significance has been achieved.
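A sketch computing r and its obtained t, using the standard formula t = r·√((n − 2)/(1 − r²)); the data are hypothetical:

```python
import math
from scipy import stats

x = [2, 4, 5, 7, 9, 10, 12, 13]  # hypothetical paired measurements
y = [1, 3, 6, 8, 8, 11, 13, 14]

r, p = stats.pearsonr(x, y)      # r and its two-tailed p-value

# Obtained t for r: t = r * sqrt((n - 2) / (1 - r^2))
n = len(x)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(f"r = {r:.2f}, obtained t = {t:.2f}, p = {p:.4f}")
```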

NONPARAMETRIC TESTS FOR QUANTITATIVE DATA

A nonparametric statistical technique makes few, if any, assumptions about the nature of the population from which the samples in the study were taken.
Some of the commonly used nonparametric
techniques for analyzing quantitative data are the
Mann-Whitney U test, the Kruskal-Wallis one-way
analysis of variance, the sign test, and the Friedman
two-way analysis of variance.


The Mann-Whitney U Test

A nonparametric alternative to the t-test, used when a researcher wishes to analyze ranked data. The researcher intermingles the scores of the two groups and then ranks them as if they were all from just one group.
The test produces a value (U), whose probability of occurrence is then checked by the researcher in the appropriate statistical table.
The logic of the test is as follows:
If the parent populations are identical, then the sum of the pooled rankings for each group should be about the same.
If the summed ranks are markedly different, on the other hand, then this difference is likely to be statistically significant.
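A Mann-Whitney U sketch with two hypothetical independent groups:

```python
from scipy import stats

group_1 = [12, 15, 9, 20, 17, 11]  # hypothetical ranked-scale scores
group_2 = [8, 14, 7, 10, 13, 6]

u, p = stats.mannwhitneyu(group_1, group_2, alternative='two-sided')
print(f"U = {u}, p = {p:.3f}")
```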

The Kruskal-Wallis One-Way Analysis of Variance

Used when researchers have more than two independent groups to compare.
The procedure is quite similar to the Mann-Whitney U test, except that the sums of the ranks for each of the separate groups are compared.
This analysis produces a value (H), whose probability of occurrence is checked by the researcher in the appropriate statistical table.
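A Kruskal-Wallis sketch with three hypothetical independent groups:

```python
from scipy import stats

a = [12, 15, 9, 20, 17]  # hypothetical scores for three independent groups
b = [8, 14, 7, 10, 13]
c = [22, 18, 25, 19, 21]

h, p = stats.kruskal(a, b, c)
print(f"H = {h:.2f}, p = {p:.3f}")
```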


The Sign Test

Used when a researcher wants to analyze two related (as opposed to independent) samples. Related samples are connected in some way.
For example, often a researcher will try to equalize groups on IQ, gender, age, or some other variable.
Another example of a related sample is when the same group is both pre- and posttested (that is, tested twice). Each individual, in other words, is tested on two different occasions (as with the t-test for correlated means).
Procedure:
Simply line up the pairs of related subjects and then determine how many times the paired subjects in one group scored higher than those in the other group. If the groups do not differ significantly, the totals for the two groups should be about equal. If there is a marked difference in scoring (such as many more in one group scoring higher), the difference may be statistically significant.
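A sketch of this count-based logic as a binomial test on the number of "higher" pairs (scipy's binomtest, available in scipy 1.7 and later; the paired scores are hypothetical):

```python
from scipy import stats

pre  = [12, 9, 14, 10, 11, 13, 8, 12, 10, 11]   # hypothetical paired scores
post = [14, 11, 15, 13, 12, 15, 10, 13, 12, 10]

# Count pairs where the second score is higher, ignoring ties
higher = sum(b > a for a, b in zip(pre, post))
n_pairs = sum(b != a for a, b in zip(pre, post))

# Under the null, "higher" follows Binomial(n_pairs, 0.5)
result = stats.binomtest(higher, n_pairs, p=0.5)
print(f"{higher}/{n_pairs} pairs higher, p = {result.pvalue:.3f}")
```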

The Friedman Two-Way Analysis of Variance

If more than two related groups are involved, then this test can be used.
Example: This test would be appropriate if a researcher employs four matched groups.
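A Friedman test sketch for three related conditions (hypothetical scores for the same subjects):

```python
from scipy import stats

# Hypothetical scores for the same subjects under three related conditions
cond_1 = [10, 12, 9, 14, 11]
cond_2 = [12, 13, 11, 15, 13]
cond_3 = [9, 10, 8, 12, 10]

chi2, p = stats.friedmanchisquare(cond_1, cond_2, cond_3)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```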


PARAMETRIC TESTS FOR CATEGORICAL DATA

The most common parametric technique for analyzing categorical data is the t-test for differences in proportions.

t-Test for Proportions
Used to analyze whether the proportion in one category (e.g., males) is different from the proportion in another category (e.g., females).
Two forms, similar to the t-test for means for quantitative data:
A t-test for independent proportions
A t-test for correlated proportions
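A common stand-in for the independent-proportions test is the two-sample z-test for proportions in statsmodels (an assumption, with hypothetical counts):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: 40 of 90 males vs. 30 of 95 females in the category
count = [40, 30]
nobs = [90, 95]

z, p = proportions_ztest(count, nobs)  # two-sample test of proportions
print(f"z = {z:.2f}, p = {p:.3f}")
```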

NONPARAMETRIC TESTS FOR CATEGORICAL DATA

The chi-square test is the nonparametric technique most commonly used to analyze categorical data.

The Chi-Square Test
The chi-square statistic can be used to determine the strength of a relationship (i.e., does knowing someone's gender help you predict their outcome score/value?).
The test statistic is:

χ² = Σ (O − E)² / E

χ² = chi-square value
O = observed frequency for each category
E = expected frequency for each category

Chi-square example
We are interested in whether male students vs. female students are more likely to own cats vs. dogs.
Notice that both variables are categorical.
Kind of pet: people are classified as owning cats or dogs (or both or neither). We can count the number of people belonging to each category; we don't scale them along a dimension of pet ownership.
Sex: people are male or female. We count the number of people in each category; we don't scale each person along a sex dimension.

Example Data
Males are more likely to have dogs as opposed to cats; females are more likely to have cats than dogs.

         Cat   Dog   Total
Male      20    30      50
Female    30    20      50
Total     50    50     100

NHST question: Are these differences best accounted for by the null hypothesis or by the hypothesis that there is a real relationship between gender and pet ownership?

To answer this question, we need to know what we would expect to observe if the null hypothesis were true (i.e., that there is no relationship between these two variables, and any observed relationship is due to sampling error).

Example Data
To find the expected value for a cell of the table, multiply the corresponding row total by the column total and divide by the grand total.
For the first cell (and all other cells): (50 × 50)/100 = 25.
Thus, if the two variables are unrelated, we would expect to observe 25 people in each cell.

         Cat   Dog   Total
Male      25    25      50
Female    25    25      50
Total     50    50     100

Example Data
The differences between these expected values and the observed values are aggregated according to the chi-square formula:

χ² = (20 − 25)²/25 + (30 − 25)²/25 + (30 − 25)²/25 + (20 − 25)²/25
   = 25/25 + 25/25 + 25/25 + 25/25
   = 1 + 1 + 1 + 1
   = 4
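The same computation can be checked with scipy; correction=False matches the hand calculation above (Yates' continuity correction turned off):

```python
from scipy import stats

# The observed 2 x 2 table from the example (rows: sex; columns: pet)
observed = [[20, 30],
            [30, 20]]

# Returns the statistic, its p-value, degrees of freedom, and expected counts
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.4f}")  # 4.0, 1, ~.0455
```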

Null Hypothesis Significance Testing (NHST) and chi-square

Once you have the chi-square statistic, it can be evaluated against a chi-square sampling distribution.
The sampling distribution characterizes the range of chi-square values we might observe if the null hypothesis is true but sampling error is giving rise to deviations from the expected values.
You can look up the probability value associated with a chi-square statistic in a table or using a computer.
In our example, in which the chi-square was 4.0, the associated p-value is just under .05: with 1 degree of freedom, the chi-square statistic has to be larger than 3.84 to be significant at the .05 level, and 4.0 is.

POWER OF A STATISTICAL TEST

The power of a statistical test for a particular set of data is the likelihood of identifying a difference between population parameters when in fact one exists.
Parametric tests are generally, but not always, more powerful than nonparametric tests.
