Random sampling
Systematic sampling
Convenience sampling
Stratified sampling
Cluster sampling
Sampling Error: the difference between a sample measure and the corresponding population measure
Distributions
Positively skewed
Symmetric
Negatively skewed
Poisson Distribution
used to represent the number of independent events of a specified type, each with a low probability of occurrence (< 10%), in a specified interval of time or space.
Example: the number of flu cases in a given week
Denoted by P(X = x) = e^(−λ) λ^x / x!, where λ is the mean number of events per interval
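The Poisson probability mass function, P(X = x) = e^(−λ) λ^x / x!, can be sketched in pure Python; the λ = 2 flu-case rate below is an illustrative number, not from the source:

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lambda) * lambda^x / x! for a Poisson-distributed count."""
    return exp(-lam) * lam ** x / factorial(x)

# Illustrative: if flu cases average 2 per week, the probability
# of seeing exactly 3 cases in a given week:
p = poisson_pmf(3, 2.0)
```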
Binomial Distribution
An experiment that consists of n independent,
repeated trials, each of which can end in only one
of two ways arbitrarily labeled success or
failure.
The probability that any trial ends in a success
is p (and hence q = 1 − p for a failure).
Denoted by P(X = x) = C(n, x) p^x q^(n−x),
where x is the number of successes in n trials
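The binomial probability mass function, P(X = x) = C(n, x) p^x q^(n−x), can be sketched in pure Python; the coin-toss numbers are an illustrative example:

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) * p^x * q^(n-x), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Illustrative: probability of exactly 7 successes in 10 fair-coin trials
p7 = binomial_pmf(7, 10, 0.5)
```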
Types of Variables
Nominal
Ordinal
Interval
Ratio
Sampling Techniques
Random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Other sampling techniques: convenience
sampling, sequential sampling, double
sampling and multi-stage sampling
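Three of the techniques above can be sketched on a hypothetical population of 100 numbered units using Python's random module; the strata (first vs second half of the population) are chosen arbitrarily for illustration:

```python
import random

random.seed(0)  # reproducible illustration
population = list(range(1, 101))  # hypothetical population of 100 units

# Simple random sampling: every unit has an equal chance of selection.
srs = random.sample(population, 10)

# Systematic sampling: every k-th unit after a random start.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample separately within each stratum
# (two arbitrary strata here: first and second half of the population).
strata = [population[:50], population[50:]]
stratified = [u for s in strata for u in random.sample(s, 5)]
```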
Theory of Probability
Experiment
Outcome
Sample space
Event
Theory of Probability
P[A] = (number of possible outcomes in which event A occurs) / (total number of possible outcomes in the sample space)
Theory of Probability
Definition of Probability
A probability measure is a rule, say P, which associates
with each event contained in a sample space S a number such
that the following properties are satisfied:
1. For any event A, P(A) ≥ 0.
2. P(S) = 1 (since S contains all the outcomes, S always
occurs).
3. P(not A) + P(A) = 1.
4. If A and B are mutually exclusive events (events that cannot
occur simultaneously), then P(A or B) = P(A) + P(B) and
P(A and B) = 0
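These properties can be checked on a concrete sample space; a pure-Python sketch using two fair dice (an illustrative example, not from the source):

```python
from itertools import product
from fractions import Fraction

# Sample space S: all outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """P[A] = outcomes in which A occurs / total outcomes in S."""
    favourable = [o for o in S if event(o)]
    return Fraction(len(favourable), len(S))

A = lambda o: sum(o) == 7          # event A: the two dice total 7
p_A = prob(A)                      # 6 of 36 outcomes
p_not_A = prob(lambda o: not A(o))
```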
Confidence Intervals
The range around any hypothetical value of the
mean (μ) within which 95% of the means of all
samples of size n taken from that population
will occur.
Denoted by x̄ ± 1.96 × SEM, where SEM = σ/√n
Understanding Z-statistic
Confidence Intervals
Distribution of the Z statistic (the difference between the population mean and the
sample mean divided by the standard error of the mean (SEM), obtained by taking
the means of a large number of small samples from a normal distribution). The 95%
confidence interval, obtained by taking the means of a large number of small samples
from a normally distributed population with known statistics, is indicated by the black
horizontal bar enclosed within ±1.96 SEM. By chance, 95% of the sample means
will be within the range −1.96 to +1.96 SEM, with the remaining 5% outside this range
Confidence Intervals
With larger sample sizes,
the 95% confidence
intervals get smaller
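The narrowing of the interval with sample size can be sketched from the formula x̄ ± 1.96 × σ/√n; the mean and σ below are illustrative numbers:

```python
from math import sqrt

def ci95(sample_mean: float, sigma: float, n: int):
    """95% confidence interval for the mean: x-bar +/- 1.96 * sigma/sqrt(n)."""
    sem = sigma / sqrt(n)
    return sample_mean - 1.96 * sem, sample_mean + 1.96 * sem

# Illustrative: same mean and sigma, two sample sizes.
lo_small, hi_small = ci95(100.0, 15.0, 25)    # n = 25
lo_large, hi_large = ci95(100.0, 15.0, 100)   # n = 100: narrower interval
```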
P-Value
It is defined as the probability of getting the
observed result, or a more extreme result, if the
null hypothesis is true. In other words, it is a
measure of how likely the result is given that the
null hypothesis is true, i.e. the statistical
significance of the claim.
P-values range from 0 to 1
P-Value
"P=0.030" is a shorthand way of saying "The
probability of getting 17 or fewer male
chickens out of 48 total chickens, IF the null
hypothesis is true that 50 percent of chickens
are male, is 0.030.
It is a usual convention in biology to use
a critical P-value of 0.05 (often called alpha, )
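The chicken example can be reproduced with an exact binomial tail sum in pure Python, assuming the exact-test reading of the quoted P = 0.030:

```python
from math import comb

def binom_cdf_left(x: int, n: int, p: float) -> float:
    """P(X <= x) for a Binomial(n, p) count, summed exactly."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

# P-value: probability of 17 or fewer males out of 48 chickens,
# if the null hypothesis (50% of chickens are male) is true.
p_value = binom_cdf_left(17, 48, 0.5)  # roughly 0.03
```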
P-Value
This p-value measures how likely it was that
you would have gotten your sample results if
the null hypothesis were true.
The farther out your test statistic is on the tails
of the standard normal distribution, the
smaller the p-value will be, and the more
evidence you have against the null hypothesis
being true.
Interpreting P-value
If the p-value is greater than or equal to α, you
fail to reject H0.
If the p-value is less than α, reject H0.
P-values on the borderline (very close to α)
are treated as marginal results
Interpreting P-value
- Here's how to interpret your results for any
given alpha level:
To make a proper decision about whether or
not to reject H0, you determine your cutoff
probability for your p-value before doing a
hypothesis test; this cutoff is called an alpha
level (α).
Typical values for α are 0.05 or 0.01
Interpreting P-value
- How to interpret your results if you use an alpha level of
0.05:
If the p-value is less than 0.01 (very small), the results are
considered highly statistically significant: reject H0.
If the p-value is between 0.01 and 0.05 (but not close to
0.05), the results are considered statistically significant:
reject H0.
If the p-value is close to 0.05, the results are considered
marginally significant: the decision could go either way.
If the p-value is greater than (but not close to) 0.05, the
results are considered non-significant: don't reject H0
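These conventions can be sketched as a small decision helper; the 0.045/0.055 cutoffs used here for "close to 0.05" are an arbitrary choice for illustration, not from the source:

```python
def interpret_p(p: float) -> str:
    """Graded interpretation at alpha = 0.05, per the conventions above.
    The 0.045/0.055 'close to 0.05' band is an arbitrary illustrative choice."""
    if p < 0.01:
        return "highly significant: reject H0"
    if p < 0.045:
        return "significant: reject H0"
    if p < 0.055:
        return "marginal: decision could go either way"
    return "non-significant: do not reject H0"

decision = interpret_p(0.030)
```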
Statistical Hypothesis
Statistical Hypothesis: a statement about the
probability distribution of populations using one
or more data samples
Hypothesis H0: all data samples originate from
the same population (or the single data sample is
consistent with a given theoretical distribution).
Hypothesis H1: some data samples do not
originate from the same population (or the single
data sample is not consistent with the given
theoretical distribution).
Samples selected at random from very different populations may not necessarily be different. Simply by
chance the samples from populations 1 and 2 are similar, so you might mistakenly conclude the two
populations are also similar
Even a random sample may not necessarily be a good representative of the population. Two samples have been
taken at random from the same population. By chance, sample 1 contains a group of relatively large fish, while
those in sample 2 are relatively small.
Parametric statistics
Also known as classical statistics
Parametric tests are designed for analysing
data from a known distribution
ANOVA (1920s and 30s), Multiple Regression
(1800s), T-tests (1900s), Pearson Correlation
(1880s) are parametric statistical methods
Parametric statistics
General Assumptions of Parametric Statistical
Tests
1. The sample of n subjects is randomly selected
from the population.
2. The variables are continuous and come from a
normal distribution.
3. The measurement of each variable is based
on interval or ratio data.
Non-parametric analogues
Wilcoxon rank sum test (analogue of the two-sample t-test)
Wilcoxon signed rank test (analogue of the paired t-test)
Kruskal-Wallis test (analogue of one-way ANOVA)
Friedman test (analogue of repeated-measures ANOVA)
Sampling Distributions
Major parametric test statistics:
Z distribution
T distribution
Chi-square distribution
F distribution
Z distribution
Represents the probability distribution of a
random variable that is the ratio of the
difference between a sample statistic and its
population value to the standard deviation of
the population statistic; for a sample mean,
Z = (x̄ − μ) / (σ/√n)
Student's t Distribution
Chi-square Distribution
represents the probability distribution of a
variable that is the square of values from a
standard normal distribution
bounded by 0 and infinity
used for interval estimation of population
variances
can also be used to determine the probability
of obtaining a sample difference (or one smaller
or larger) between observed values and those
predicted by a model
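The observed-versus-predicted comparison can be sketched as a goodness-of-fit statistic, Σ (O − E)² / E, in pure Python; the die-roll counts below are invented for illustration:

```python
def chi_square_stat(observed, expected) -> float:
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative: 60 die rolls compared with the 10-per-face
# prediction of a fair die.
stat = chi_square_stat([8, 9, 12, 11, 10, 10], [10] * 6)
```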
F Distribution
represents the probability distribution of a variable
that is the ratio of two independent chi-square
variables, each divided by its df (degrees of freedom)
(Hays 1994).
Because sample variances are distributed as χ², the F
distribution is used for testing hypotheses about ratios
of variances.
bounded by zero and infinity.
Used to determine the probability of obtaining a
sample variance ratio (or one larger) for a specified
value of the true ratio between variances
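The variance-ratio statistic can be sketched with sample variances from Python's statistics module; the data below are made up for illustration:

```python
from statistics import variance

def f_ratio(sample1, sample2) -> float:
    """F statistic for comparing variances: ratio of the two sample
    variances, larger over smaller by convention."""
    v1, v2 = variance(sample1), variance(sample2)
    return max(v1, v2) / min(v1, v2)

# Illustrative: a more variable sample against a less variable one.
F = f_ratio([2, 4, 6, 8], [5, 6, 7, 8])
```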
Hypothesis Testing
Null Hypothesis (H0) & Alternate Hypothesis (H1)
H0: μ = μ0 vs H1: μ ≠ μ0 (two-tailed test)
H0: μ = μ0 vs H1: μ > μ0 (or μ < μ0) (one-tailed test)
Paired samples
Wilcoxon signed ranks test
McNemar's test
Marginal homogeneity test
- K independent samples
Kruskal-Wallis test
Friedman's rank test
Correlation
Pearson's product moment correlation (r)
Used to investigate linear relationships between
two variables
r ranges from −1 to +1
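Pearson's r can be computed directly from its definition; a pure-Python sketch with illustrative data:

```python
from math import sqrt

def pearson_r(x, y) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative: a perfectly linear relationship gives r = 1.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```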
Regression
Prediction is made on the assumption that the
hypothesis is correct
Simple linear regression
Investigates the relationship between a dependent
and an independent variable
A best-fit straight line describes the relation between
X and Y
Regression coefficient / coefficient of
determination (R²)
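The best-fit line can be sketched as a least-squares fit in pure Python; the data are illustrative:

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = a + b*x; returns intercept a and slope b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Illustrative: data generated by y = 1 + 2x are recovered exactly.
a, b = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
```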
Regression
Regression lines by gender and parity status for predicting weight at 1 month of age in
term babies
Power of a test
A measure of the likelihood that a test reaches a
correct conclusion; formally, the probability that
the test rejects the null hypothesis when it is
false (power = 1 − β)
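As a sketch not taken from the source, the power of a one-sided z-test with known σ can be computed from the normal CDF; the effect size, σ, and n below are illustrative numbers:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_one_sided_z(effect: float, sigma: float, n: int) -> float:
    """Power of a one-sided z-test at alpha = 0.05 (critical z = 1.645):
    probability of rejecting H0 when the true mean exceeds the null
    value by `effect` (sigma assumed known)."""
    sem = sigma / sqrt(n)
    return 1.0 - phi(1.645 - effect / sem)

# Illustrative numbers: power grows with sample size.
p36 = power_one_sided_z(effect=5.0, sigma=15.0, n=36)
p100 = power_one_sided_z(effect=5.0, sigma=15.0, n=100)
```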