Random sampling
Systematic sampling
Convenience sampling
Stratified sampling
Cluster sampling
Sampling Error: the difference between a sample measure and the corresponding population measure
Distributions
Positively skewed
Symmetric
Negatively skewed
Poisson Distribution
used to represent the number of independent events of a specified type, each with a low probability of occurrence (< 10%), in a specified interval of time or space.
Example: the number of flu cases in a given week
Denoted by P(X = x) = e^(−λ) λ^x / x!, where λ is the mean number of events per interval
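The Poisson probability mass function, P(X = x) = e^(−λ) λ^x / x!, can be sketched in pure Python; the λ = 2 flu-case rate below is an illustrative number, not from the source:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lambda) * lambda^x / x! for a Poisson-distributed count."""
    return exp(-lam) * lam ** x / factorial(x)

# Illustrative: if flu cases average 2 per week, the probability
# of seeing exactly 3 cases in a given week:
p = poisson_pmf(3, 2.0)
```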
Binomial Distribution
An experiment that consists of n independent,
repeated trials, each of which can end in only one
of two ways arbitrarily labeled success or
failure.
The probability that any trial ends in a success
is p (and hence q = 1 − p for a failure).
Denoted by P(X = x) = C(n, x) p^x q^(n−x),
where x is the number of successes in n trials
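The binomial probability mass function, P(X = x) = C(n, x) p^x q^(n−x), can be sketched in pure Python; the coin-toss numbers are an illustrative example:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * q^(n-x), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Illustrative: probability of exactly 7 successes in 10 fair-coin trials
p7 = binomial_pmf(7, 10, 0.5)
```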
Types of Variables
Nominal
Ordinal
Interval
Ratio
Sampling Techniques
Random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Other sampling techniques: convenience
sampling, sequential sampling, double
sampling and multi-stage sampling
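Three of the techniques above can be sketched on a hypothetical population of 100 numbered units using Python's random module; the strata (first vs second half of the population) are chosen arbitrarily for illustration:

```python
import random

random.seed(0)  # reproducible illustration
population = list(range(1, 101))  # hypothetical population of 100 units

# Simple random sampling: every unit has an equal chance of selection.
srs = random.sample(population, 10)

# Systematic sampling: every k-th unit after a random start.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample separately within each stratum
# (two arbitrary strata here: first and second half of the population).
strata = [population[:50], population[50:]]
stratified = [u for s in strata for u in random.sample(s, 5)]
```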
Theory of Probability
Experiment
Outcome
Sample space
Event
Theory of Probability
P[A] = (number of possible outcomes in which event A occurs) / (total number of possible outcomes in the sample space)
Theory of Probability
Definition of Probability
A probability measure is a rule, say P, which associates
with each event contained in a sample space S a number such
that the following properties are satisfied:
1. For any event A, P(A) ≥ 0.
2. P(S) = 1 (since S contains all the outcomes, S always
occurs).
3. P(not A) + P(A) = 1.
4. If A and B are mutually exclusive events (events that cannot
occur simultaneously), then P(A or B) = P(A) + P(B) and
P(A and B) = 0
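These properties can be checked on a concrete sample space; a pure-Python sketch using two fair dice (an illustrative example, not from the source):

```python
from itertools import product
from fractions import Fraction

# Sample space S: all outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """P[A] = outcomes in which A occurs / total outcomes in S."""
    favourable = [o for o in S if event(o)]
    return Fraction(len(favourable), len(S))

A = lambda o: sum(o) == 7          # event A: the two dice total 7
p_A = prob(A)                      # 6 of 36 outcomes
p_not_A = prob(lambda o: not A(o))
```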
Confidence Intervals
The range around any hypothetical value of the
mean (μ) within which 95% of the means of all
samples of size n taken from that population
will occur.
Denoted by x̄ ± 1.96 × SEM, where SEM = σ/√n
Understanding Z-statistic
Confidence Intervals
Distribution of the Z statistic (the difference between the population mean and the
sample mean divided by the standard error of the mean (SEM), obtained by taking
the means of a large number of small samples from a normal distribution). The 95%
confidence interval, obtained by taking the means of a large number of small samples
from a normally distributed population with known statistics, is indicated by the black
horizontal bar enclosed within ±1.96 SEM. By chance, 95% of the sample means
will be within the range −1.96 to +1.96 SEM, with the remaining 5% outside this range
Confidence Intervals
With larger sample sizes,
the 95% confidence
intervals get smaller
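The narrowing of the interval with sample size can be sketched from the formula x̄ ± 1.96 × σ/√n; the mean and σ below are illustrative numbers:

```python
from math import sqrt

def ci95(sample_mean: float, sigma: float, n: int):
    """95% confidence interval for the mean: x-bar +/- 1.96 * sigma/sqrt(n)."""
    sem = sigma / sqrt(n)
    return sample_mean - 1.96 * sem, sample_mean + 1.96 * sem

# Illustrative: same mean and sigma, two sample sizes.
lo_small, hi_small = ci95(100.0, 15.0, 25)    # n = 25
lo_large, hi_large = ci95(100.0, 15.0, 100)   # n = 100: narrower interval
```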
P-Value
It is defined as the probability of getting the
observed result, or a more extreme result, if the
null hypothesis is true. In other words, it is a
measure of how likely the result is given that the
null hypothesis is true, i.e. the statistical
significance of the claim.
P-values range from 0 to 1
P-Value
"P=0.030" is a shorthand way of saying "The
probability of getting 17 or fewer male
chickens out of 48 total chickens, IF the null
hypothesis is true that 50 percent of chickens
are male, is 0.030.
It is a usual convention in biology to use
a critical P-value of 0.05 (often called alpha, )
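The chicken example can be reproduced with an exact binomial tail sum in pure Python, assuming the exact-test reading of the quoted P = 0.030:

```python
from math import comb

def binom_cdf_left(x: int, n: int, p: float) -> float:
    """P(X <= x) for a Binomial(n, p) count, summed exactly."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

# P-value: probability of 17 or fewer males out of 48 chickens,
# if the null hypothesis (50% of chickens are male) is true.
p_value = binom_cdf_left(17, 48, 0.5)  # roughly 0.03
```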
P-Value
This p-value measures how likely it was that
you would have gotten your sample results if
the null hypothesis were true.
The farther out your test statistic is on the tails
of the standard normal distribution, the
smaller the p-value will be, and the more
evidence you have against the null hypothesis
being true.
Interpreting P-value
If the p-value is greater than or equal to α, you
fail to reject H0.
If the p-value is less than α, reject H0.
P-values on the borderline (very close to α)
are treated as marginal results
Interpreting P-value
- Here's how to interpret your results for any
given alpha level:
To make a proper decision about whether or
not to reject H0, you determine your cutoff
probability for your p-value before doing a
hypothesis test; this cutoff is called an alpha
level (α).
Typical values for α are 0.05 or 0.01
Interpreting P-value
- How to interpret your results if you use an alpha level of
0.05:
If the p-value is less than 0.01 (very small), the results are
considered highly statistically significant: reject H0.
If the p-value is between 0.01 and 0.05 (but not close to
0.05), the results are considered statistically significant:
reject H0.
If the p-value is close to 0.05, the results are considered
marginally significant: the decision could go either way.
If the p-value is greater than (but not close to) 0.05, the
results are considered non-significant: don't reject H0
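These conventions can be sketched as a small decision helper; the 0.045/0.055 cutoffs used here for "close to 0.05" are an arbitrary choice for illustration, not from the source:

```python
def interpret_p(p: float) -> str:
    """Graded interpretation at alpha = 0.05, per the conventions above.
    The 0.045/0.055 'close to 0.05' band is an arbitrary illustrative choice."""
    if p < 0.01:
        return "highly significant: reject H0"
    if p < 0.045:
        return "significant: reject H0"
    if p < 0.055:
        return "marginal: decision could go either way"
    return "non-significant: do not reject H0"

decision = interpret_p(0.030)
```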
Statistical Hypothesis
Statistical Hypothesis: a statement about the
probability distribution of populations using one
or more data samples
Hypothesis H0: all data samples originate from
the same population (or the single data sample is
consistent with a given theoretical distribution).
Hypothesis H1: some data samples do not
originate from the same population (or the single
data sample is not consistent with the given
theoretical distribution).
Samples selected at random from very different populations may not necessarily be different. Simply by
chance the samples from populations 1 and 2 are similar, so you might mistakenly conclude the two
populations are also similar
Even a random sample may not necessarily be a good representative of the population. Two samples have been
taken at random from the same population. By chance, sample 1 contains a group of relatively large fish, while
those in sample 2 are relatively small.
Parametric statistics
Also known as classical statistics
Parametric tests are designed for analysing
data from a known distribution
ANOVA (1920s and 30s), Multiple Regression
(1800s), T-tests (1900s), Pearson Correlation
(1880s) are parametric statistical methods
Parametric statistics
General Assumptions of Parametric Statistical
Tests
1. The sample of n subjects is randomly selected
from the population.
2. The variables are continuous and come from a
normal distribution.
3. The measurement of each variable is based
on interval or ratio data.
Non-parametric analogues
Wilcoxon rank sum test (analogue of the two-sample t-test)
Wilcoxon signed rank test (analogue of the paired t-test)
Kruskal-Wallis test (analogue of one-way ANOVA)
Friedman test (analogue of repeated-measures ANOVA)
Sampling Distributions
Major parametric test statistics:
Z distribution
T distribution
Chi-square distribution
F distribution
Z distribution
Represents the probability distribution of a
random variable that is the ratio of the
difference between a sample statistic and its
population value to the standard deviation of
the population statistic; for a sample mean,
Z = (x̄ − μ) / (σ/√n)
Student's t Distribution
Chi-square Distribution
represents the probability distribution of a
variable that is the square of values from a
standard normal distribution
bounded by 0 and infinity
used for interval estimation of population
variances
can also be used to determine the probability
of obtaining a sample difference (or one smaller
or larger) between observed values and those
predicted by a model
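The observed-versus-predicted comparison can be sketched as a goodness-of-fit statistic, Σ (O − E)² / E, in pure Python; the die-roll counts below are invented for illustration:

```python
def chi_square_stat(observed, expected) -> float:
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative: 60 die rolls compared with the 10-per-face
# prediction of a fair die.
stat = chi_square_stat([8, 9, 12, 11, 10, 10], [10] * 6)
```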
F Distribution
represents the probability distribution of a variable
that is the ratio of two independent chi-square
variables, each divided by its df (degrees of freedom)
(Hays 1994).
Because sample variances are distributed as χ², the F
distribution is used for testing hypotheses about ratios
of variances.
bounded by zero and infinity.
Used to determine the probability of obtaining a
sample variance ratio (or one larger) for a specified
value of the true ratio between variances
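The variance-ratio statistic can be sketched with sample variances from Python's statistics module; the data below are made up for illustration:

```python
from statistics import variance

def f_ratio(sample1, sample2) -> float:
    """F statistic for comparing variances: ratio of the two sample
    variances, larger over smaller by convention."""
    v1, v2 = variance(sample1), variance(sample2)
    return max(v1, v2) / min(v1, v2)

# Illustrative: a more variable sample against a less variable one.
F = f_ratio([2, 4, 6, 8], [5, 6, 7, 8])
```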
Hypothesis Testing
Null Hypothesis (H0) & Alternate Hypothesis (H1)
H0: μ = μ0 vs H1: μ ≠ μ0 (two-tailed test)
H0: μ = μ0 vs H1: μ > μ0 (or μ < μ0) (one-tailed test)
Paired samples
Wilcoxon signed ranks test
McNemar's test
Marginal homogeneity test
- K independent samples
Kruskal-Wallis test
Friedman's rank test
Correlation
Pearson's product moment correlation (r)
Used to investigate linear relationships between
two variables
r ranges from −1 to +1
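Pearson's r can be computed directly from its definition; a pure-Python sketch with illustrative data:

```python
from math import sqrt

def pearson_r(x, y) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative: a perfectly linear relationship gives r = 1.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```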
Regression
Prediction is made on the assumption that the
hypothesis is correct
Simple linear regression
Investigates the relationship between a dependent
and an independent variable
A best-fit straight line describes the relation between
X and Y
Regression coefficient / coefficient of
determination (R²)
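The best-fit line can be sketched as a least-squares fit in pure Python; the data are illustrative:

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = a + b*x; returns intercept a and slope b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Illustrative: data generated by y = 1 + 2x are recovered exactly.
a, b = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```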
Regression
Regression lines by gender and parity status for predicting weight at 1 month of age in
term babies
Power of a test
A measure of the likelihood that a test reaches a
correct conclusion; formally, the probability that
the test rejects the null hypothesis when it is
false (power = 1 − β)
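As a sketch not taken from the source, the power of a one-sided z-test with known σ can be computed from the normal CDF; the effect size, σ, and n below are illustrative numbers:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_one_sided_z(effect: float, sigma: float, n: int) -> float:
    """Power of a one-sided z-test at alpha = 0.05 (critical z = 1.645):
    probability of rejecting H0 when the true mean exceeds the null
    value by `effect` (sigma assumed known)."""
    sem = sigma / sqrt(n)
    return 1.0 - phi(1.645 - effect / sem)

# Illustrative numbers: power grows with sample size.
p36 = power_one_sided_z(effect=5.0, sigma=15.0, n=36)
p100 = power_one_sided_z(effect=5.0, sigma=15.0, n=100)
```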