Data Analysis

10/6/2011
Samples & Populations
Sample: subset of the population
Problems:
1. Cannot sample every individual
2. Inherent variation among individuals
Randomness
Reduces bias and increases chances of representing population
Simple random, stratified random
Replication
More replicates, better representation of the population
Outline
Background
terminology
Descriptive statistics
Inferential statistics
Graphs & tables
Parameters vs. Statistics
Parameter: number calculated from the population
Statistic: number calculated from a sample
Types of Statistical Analyses
1. Parameter estimation: estimating a population parameter, such as the mean
Ex. meeting target/threshold objectives
Maintain a population of at least 1000 individuals
2. Significance/statistical/hypothesis testing: detecting a change in a parameter
Ex. objective is to detect change over time
The population mean has changed between year one and year two.
Scientific Method
OBSERVATION
QUESTION
HYPOTHESIS
PREDICTION
TEST
Scientific Method: Hypochlora alba and its host plant, Artemisia ludoviciana
OBSERVATION
Hypochlora alba is the same color as its host plant, Artemisia ludoviciana.
QUESTION
Is the color of Hypochlora alba related to its host plant?
HYPOTHESIS
Pigments in the host plant affect the color of Hypochlora alba.
PREDICTION
The color of Hypochlora alba will change if it is fed a host plant of another color.
TEST
Feed Hypochlora alba different host plants.
Hypotheses
Null Hypothesis (Ho)
There is no difference between the means of treatment A and B.
Ho: mean A = mean B
Alternative Hypothesis (Ha)
There is a difference between the means of treatment A and B.
Ha: mean A ≠ mean B
Ex. host plant color
Null Hypothesis
There is no effect of plant host color on Hypochlora alba color.
Alternative Hypothesis
The color of the host plant affects the color of Hypochlora alba.
Ex. change over time
Null Hypothesis
There is no change between the means from year one to year two.
Alternative Hypothesis
There is a change between the means from year one to year two.
Good Hypotheses
State independent and dependent variables!
Exs.
1. There is an effect of plant host color on grasshopper color.
2. There is no difference in species diversity between control and burned treatments.
3. Purple loosestrife density does not differ between years.
Use statistics to test hypotheses!
Types of Experiments
Descriptive:
Does not alter the environment
Natural experiment
Shows patterns, but does not identify the mechanism explaining those patterns
Manipulative:
Alters the environment using treatments
Isolates the variable of interest (identifies the mechanism)
Controls for all other variables
Variables
Variable: thing we measure, control, or manipulate in experiments; a.k.a. factor, effect
Dependent variable
Variables that are measured
Independent variable
Variables that are controlled or manipulated, i.e. treatment
**Dependent variables depend on the independent variables**
Degrees of Freedom
Degrees of freedom (d.f.): number of scores that are free to vary
d.f. = # observations in a sample - # population parameters estimated from sample data
Ex.
Parameter estimated = mean
Sample size = n
d.f. = n - 1
DESCRIPTIVE STATISTICS
Mean:
Average of all the numbers
Median:
Middle number of the group
Mode:
Number that appears the most frequently
Range:
Difference between the largest and smallest number
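The four descriptive statistics defined above can be sketched in a few lines of Python using only the standard library. The `heights` values are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import statistics

# Hypothetical plant heights (cm); illustrative values only.
heights = [12, 15, 15, 18, 20, 22, 24]

mean = statistics.mean(heights)            # average of all the numbers
median = statistics.median(heights)        # middle number of the sorted group
mode = statistics.mode(heights)            # number that appears most frequently
value_range = max(heights) - min(heights)  # largest minus smallest

print(mean, median, mode, value_range)
```

With an even number of values, `statistics.median` averages the two middle numbers.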
Describing Variation
1. Standard Deviation
2. Standard Error of Mean
3. Confidence Intervals
Normal Distribution
[Figure: normal distribution curve centered on the mean]
Symbols
$x$ = sample value
$\bar{x}$ = sample mean
$s^2$ = sample variance
$\mu$ = population mean
$\sigma^2$ = population variance
$n$ = sample size
$\alpha$ = level of significance
Standard Deviation
Sum of Squares
$SS = \sum (x - \bar{x})^2$
Degrees of Freedom
d.f. = n - 1
Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{SS}{d.f.}$
Standard Deviation
$S.D. = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{SS}{d.f.}}$
S.D. measures dispersion of the data
Used in statistical testing
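As a worked illustration of the sum of squares, variance, and standard deviation formulas, the sketch below computes each step by hand and compares the result with Python's built-in sample standard deviation. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample values; illustrative only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
x_bar = sum(values) / n                      # sample mean
ss = sum((x - x_bar) ** 2 for x in values)   # sum of squares, SS
df = n - 1                                   # one parameter (the mean) was estimated
variance = ss / df                           # s^2 = SS / d.f.
sd = math.sqrt(variance)                     # S.D. = sqrt(SS / d.f.)

# statistics.stdev uses the same n - 1 denominator.
print(ss, df, round(variance, 3), round(sd, 3), round(statistics.stdev(values), 3))
```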
Standard Deviation: General Rules
68% of values fall w/in 1 s.d. of the mean
*If normal distribution
Standard Error of Mean
$S.E.M. = \frac{S.D.}{\sqrt{n}} = \frac{\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}}{\sqrt{n}}$
S.E.M. describes uncertainty in estimating the mean
Used in statistical testing
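A minimal sketch of the standard error of the mean, assuming the sample standard deviation (n - 1 denominator) is used. The data and the fixed S.D. of 8.0 are hypothetical; the second part only illustrates how S.E.M. shrinks as sample size grows when S.D. stays the same.

```python
import math
import statistics

def standard_error(values):
    """S.E.M. = S.D. / sqrt(n), using the sample S.D."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample values; illustrative only.
heights = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error(heights)

# Same hypothetical S.D., increasing sample size: S.E.M. halves when n quadruples.
fixed_sd = 8.0
sem_by_n = {n: fixed_sd / math.sqrt(n) for n in (4, 16, 64)}

print(round(sem, 3), sem_by_n)
```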
Confidence Intervals
Confidence Interval (z)
$C.I. = \bar{x} \pm z_{(1-\alpha)} \times \frac{S.D.}{\sqrt{n}}$
where $z_{(1-\alpha)}$ is the two-tailed critical value for confidence level $1 - \alpha$ (1.96 for 95%)
Use:
Large sample size
Normal distribution
Confidence Interval (t)
$C.I. = \bar{x} \pm t_{(\alpha, d.f.)} \times \frac{S.D.}{\sqrt{n}}$
Use:
Small sample size (<30)
Student's t distribution
C.I. describes an interval around the estimated mean and the confidence that the actual mean falls in this interval
Used for parameter estimation
Confidence Intervals Revisited
24.5 ± 3 (95% C.I.)
Mean = 24.5
Confidence interval width = ± 3
Confidence level = 95%
Confidence level = % of the time that intervals calculated this way contain the true mean
95% CI = with repeated sampling, 95% of such intervals contain the true mean
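The sketch below shows how a confidence interval is assembled from a mean, a standard error, and a critical value. The mean and S.E.M. are hypothetical; the z critical value is computed from the standard normal distribution, while the t critical value (2.262 for a two-tailed 95% interval with d.f. = 9) is typed in from a standard t table because the Python standard library has no t distribution.

```python
from statistics import NormalDist

def confidence_interval(mean, sem, critical_value):
    """Return (lower, upper) = mean -/+ critical_value * S.E.M."""
    half_width = critical_value * sem
    return mean - half_width, mean + half_width

# Hypothetical summary statistics; illustrative only.
mean, sem = 24.5, 1.5

# Large sample: two-tailed z critical value for a 95% confidence level.
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)
z_interval = confidence_interval(mean, sem, z_crit)

# Small sample (n = 10, d.f. = 9): critical value taken from a t table.
t_crit = 2.262
t_interval = confidence_interval(mean, sem, t_crit)

print(z_interval, t_interval)
```

For the same mean and S.E.M., the t-based interval is wider than the z-based one, which reflects the extra uncertainty of a small sample.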
Reporting Means
46.5 ± 8.4 cm (S.D.)
46.5 ± 2.9 cm (S.E.M.)
21.34 ± 5.2 cm (95% C.I.)
or
(16.14, 26.54)
Comparing Sample Means
[Figure: comparison of sample means under low, medium, and high variability]
INFERENTIAL STATISTICS
Uses sample statistics to make inferences about population parameters
Inherent variation in population
Are the differences you see due to inherent variation or your treatments?
Must test hypotheses
Use test statistic and P values
Types of Values
Discrete values:
Values are distinct
Categorical data
Ex. presence/absence
Continuous values:
One value flows into the next
There are an infinite number of other possible values in between any two values
Ex. plant height
Different statistical tests are used for different types of values
Probability Values in Statistics
P-values:
Probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true
Value tells you whether you reject or fail to reject the null hypothesis
Standard to reject if P < 0.05 (α = 0.05)
Ex. P = 0.02
Reject Ho; a result this extreme would occur only 2% of the time if Ho were true
Ex. P = 0.88
Fail to reject Ho; a result this extreme would occur 88% of the time if Ho were true
Types of Samples
Independent Samples
Different sampling units each time
Paired Samples
Re-sample already established units
ex. permanent plots
What are your sampling units?
Quadrats, points, transects
Statistical Tests
Depend upon:
Discrete or continuous values
Independent or paired samples
Chi-square test for independence
Independent sample t-test
Paired t-test
Analysis of variance
Repeated measures analysis of variance
Chi-Square
Chi-square test for independence:
Discrete (categorical) data
Determines whether there is a significant association between two categorical variables
Ho: two variables are independent of each other
Ha: two variables are not independent of each other
Chi-square test for independence test statistic:
Create contingency tables (observed values)
Calculate expected values
$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$
where:
O = observed
E = expected
Contingency table
            1999    2009    Total
Present       40      15       55
Absent        20      10       30
Total         60      25       85
Ho: two variables are independent of each other
Ho: frequency of target plant is same in 1999 & 2009
See Elzinga p. 241-244
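As a worked sketch of the chi-square calculation, the code below uses the present/absent counts for 1999 and 2009 from the contingency table. Expected values are computed as (row total × column total) / grand total. No continuity correction is applied, which is an assumption of this sketch, and the P-value shortcut with `math.erfc` holds only when d.f. = 1.

```python
import math

# Observed counts from the contingency table.
# Rows: present, absent. Columns: 1999, 2009.
observed = [[40, 15],
            [20, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the two variables are independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square = sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For d.f. = 1 only: chi-square is the square of a standard normal deviate.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 3), df, round(p_value, 2))
```

Here P is well above 0.05, so this sketch fails to reject the null hypothesis that the frequency of the target plant is the same in both years.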
Independent-sample t-test
t-test:
Continuous data
Tests the difference between sample means of two groups
Ho: there is no difference between the hypothesized population means ($\mu$)
Ha: there is a difference between the hypothesized population means ($\mu$)
Ho: $\mu_1 = \mu_2$
Ha: $\mu_1 \neq \mu_2$
t-test test statistic:
t = difference between two means / standard error of the difference between two means
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
where:
$\bar{x}_1$ = mean of sample 1
$\bar{x}_2$ = mean of sample 2
$s_{\bar{x}_1 - \bar{x}_2}$ = standard error of the difference between two means
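A minimal sketch of the independent-sample t statistic follows. The lecture does not say how the standard error of the difference is calculated, so this sketch assumes the pooled-variance (equal variance) form; the two samples are hypothetical.

```python
import math
import statistics

def independent_t(sample_1, sample_2):
    """Return (t, d.f.) for two independent samples, assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * statistics.variance(sample_1)
        + (n2 - 1) * statistics.variance(sample_2)
    ) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se_difference = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean_1 - mean_2) / se_difference, n1 + n2 - 2

# Hypothetical measurements from two groups of different sampling units.
group_a = [10, 12, 14]
group_b = [13, 15, 17]
t, df = independent_t(group_a, group_b)

print(round(t, 3), df)
```

The test statistic and d.f. are then looked up in a t table or passed to a statistics program to get the P-value.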
Two vs. One-tailed t-test
Two-tailed t-test
Ho: There is no change in population mean.
One-tailed t-test
Ho: There is no increase in population mean.
Note: If fail to reject Ho, this could mean no change or a decrease in population mean.
**More powerful than two-tailed test in detecting true change in the predicted direction.
One-tailed t-test P-value is half of the two-tailed t-test P-value when the change is in the predicted direction
Analysis of Variance
Analysis of Variance (ANOVA):
Continuous data
Tests the difference between sample means of more than two groups
Ho: there is no difference among the hypothesized population means ($\mu$)
Ha: there is a difference among the hypothesized population means ($\mu$)
Ho: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
Ha: $\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$ (at least one mean differs from the others)
Analysis of Variance (ANOVA) test statistic:
$F = \frac{s^2_{\text{between groups}}}{s^2_{\text{within groups}}} = \frac{MS_{\text{between groups}}}{MS_{\text{within groups}}}$
See Elzinga p. 239-241
Multiple comparisons
Tells you which means are different from each other
Ha: $\mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$
1 vs. 2
1 vs. 3
2 vs. 3, etc.
Use two-tailed t-tests
# comparisons < 8-10: Bonferroni
Divide α by # of tests, or
Multiply P value by # of tests
# comparisons > 8-10: Tukey
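The F ratio and the Bonferroni adjustment described above can be sketched as follows. The groups and P-values are hypothetical, and the function names `anova_f` and `bonferroni` are illustrative, not from any library. The sketch covers a one-way ANOVA only.

```python
import statistics

def anova_f(groups):
    """Return (F, d.f. between, d.f. within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

def bonferroni(p_values, alpha=0.05):
    """Return (adjusted alpha, adjusted P-values) for a set of comparisons."""
    k = len(p_values)
    adjusted_alpha = alpha / k                       # divide alpha by # of tests
    adjusted_p = [min(1.0, p * k) for p in p_values]  # or multiply P by # of tests
    return adjusted_alpha, adjusted_p

# Hypothetical data: three treatment groups.
f, df_between, df_within = anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])

# Hypothetical P-values from three pairwise two-tailed t-tests.
adjusted_alpha, adjusted_p = bonferroni([0.01, 0.02, 0.04])
significant = [p <= adjusted_alpha for p in [0.01, 0.02, 0.04]]

print(f, df_between, df_within, adjusted_alpha, adjusted_p, significant)
```

Dividing α and multiplying P are two views of the same adjustment, so only one of them should be applied to a given set of comparisons.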
Statistical Tests
Independent Samples
Chi-square test
Analysis of Variance (ANOVA)
Paired Samples
Paired t-test
Repeated-measures ANOVA
Paired t-test
[Figure: the same means (1990 = 0.44, 1994 = 0.38) analyzed as independent samples and as paired samples; one analysis gives P = 0.55, the other P = 0.0009]
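A minimal sketch of a paired t statistic, assuming the usual form in which the test is run on the per-unit differences between two visits to the same permanent plots. The plot values are hypothetical and the function name `paired_t` is illustrative.

```python
import math
import statistics

def paired_t(first_visit, second_visit):
    """Return (t, d.f.) for paired samples measured on the same units."""
    if len(first_visit) != len(second_visit):
        raise ValueError("paired samples need one value per unit in each visit")
    # Work with the change within each sampling unit.
    differences = [b - a for a, b in zip(first_visit, second_visit)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    se_difference = statistics.stdev(differences) / math.sqrt(n)
    return mean_difference / se_difference, n - 1

# Hypothetical values for four permanent plots sampled in two years.
year_1 = [10, 20, 30, 40]
year_2 = [11, 22, 33, 42]
t, df = paired_t(year_1, year_2)

print(round(t, 3), df)
```

Because the large plot-to-plot variation cancels out in the differences, a small but consistent change can produce a large t value.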
Repeated-measures ANOVA
ANOVA with three or more years of data
Not independent samples
Assumes sphericity
Correlation between data of year 1 and 2 is the same as year 2 and 3, etc.
Does not tell you which years differ from each other
Elzinga recommends using paired t-tests with Bonferroni's adjustment instead of RM ANOVA
See Elzinga p. 244-245
Statistical Assumptions
Parametric statistics assume:
1. Normal distribution
2. Homogeneity of variance
3. Random samples
If assumptions fail:
1. Increase sample size
2. Transform data
3. Use nonparametric statistics
4. Use statistical analyses based on resampling
Statistical Values
Test statistic:
Chi-square test: $\chi^2$
t-test: t
Analysis of Variance (ANOVA): F
NOW WHAT?
Find degrees of freedom (d.f.)
Need test statistic and d.f. to get P-value
Use statistical tables or computer programs to compute P-value
Reporting Statistics
Include values in tables, figure captions or text
Test statistic, d.f., P
Examples:
$\chi^2$ = 28.22, d.f. = 10, P = 0.02
t = 1.3, d.f. = 4, P = 0.26
F = 1.67, d.f. = 1,4, P = 0.27
*Round to the nearest hundredth for most values.
**Report exact P values unless P < 0.0001
Sampling & Statistics
Three items to consider when deciding sampling units:
1. Shape
2. Size
3. Number
Sample Size
Sample size = number of sampling units
n = sample size
Sample size affects statistical power
Why?
Constraints to sample size?
Time
Money
Probability Values in Statistics
P-value:
Value tells you whether you reject or fail to reject the null hypothesis
Standard to reject if P < 0.05 (α = 0.05)
If the null hypothesis is true (no change actually occurred), there is a 5% probability of rejecting it because of chance variation among samples
Statistical Error
False-Change Error = α
No change has taken place but sampling detects change
Controlled by the significance level α, the threshold the P-value is compared against
P-value = probability of observing a difference at least this large if the null hypothesis is true, i.e. if no change actually occurred and the difference was due to chance
Missed-Change Error = β
Real change has taken place but sampling does NOT detect change
Increase statistical power
Power = 1 - β
Ex.
If β = 0.70, power = 0.30
If β = 0.05, power = 0.95
I want to be at least 95% certain of detecting a real change.
Increasing Statistical Power
Power = a function of:
S.D. = standard deviation
n = number of samples
α = false-change error
MDC = minimum detectable change
MDC = size of change you want to detect
Set by observer
Depends on natural history of organism
Is it biologically significant?
Tools for Sample Size
General recommendations/experience
Graphically
Equations
Computer Programs
Resources
See Elzinga p. 74-87 (graphing)
See Elzinga p. 141-154, Appendix 16 & 18 (computer programs)
See Herrick vol. II Appendix C (recommendations & calculations)
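To show how power depends on S.D., n, α, and MDC, the sketch below uses a common normal approximation for a two-tailed comparison of two independent means with n sampling units in each. This approximation is an assumption of the sketch, not a formula from the lecture or from the cited resources, and it ignores the negligible chance of detecting a change in the wrong direction. All input values are hypothetical.

```python
import math
from statistics import NormalDist

def approximate_power(sd, n, alpha, mdc):
    """Approximate power (1 - beta) to detect a change of size mdc.

    Assumes two independent samples of n units each, a common standard
    deviation sd, and a two-tailed test at significance level alpha.
    """
    se_difference = sd * math.sqrt(2 / n)          # S.E. of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    return NormalDist().cdf(mdc / se_difference - z_crit)

# Hypothetical inputs: S.D. = 10, alpha = 0.05, MDC = 5.
for n in (16, 64, 256):
    print(n, round(approximate_power(sd=10, n=n, alpha=0.05, mdc=5), 2))
```

Under these assumptions, power rises with larger n, larger MDC, or larger α, and falls with larger S.D.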
GRAPHS AND TABLES
Tables
Table 3
Mean values (± standard error) of exchangeable cations in soil of control and NaCl-treated soils from experiment #1 after plants were harvested in the second year of the experiment. ** = P ≤ 0.05, *** = P ≤ 0.0001.
           Na (ppm)     Ca (ppm)      Mg (ppm)    K (ppm)
Control    79 ± 4       3690 ± 77     195 ± 9     267 ± 20
NaCl       247 ± 24     3133 ± 133    150 ± 6     204 ± 16
           ***          **            **          **
Newingham and Belnap 2006
Histogram
[Figure: histogram, "Broken Fingers While Running"; x-axis: Age (0-10 through 91-100); y-axis: Number of Cases]
Scatterplot
[Figures: scatterplots; axis labels recovered include Body Size (cm), Population, Plant Total Biomass (g), and Insect Abundance (#)]
Simple Bar Graph
[Figure: bar graph with S.D. error bars; Ambient CO2 vs. Elevated CO2]
Error bars:
Whiskers on means to guide whether bars/points are statistically (significantly) different (ex. standard deviation, standard error, 95% confidence intervals)
Complicated Graph with Error Bars
[Figure: bar graphs with S.E. error bars; panels June 2004, August 2004, September 2004; Ambient CO2 vs. Elevated CO2; Fertilizer vs. No Fertilizer]
Statistics from Graphs
Letters Denote Significant Differences
[Figure: F. idahoensis Total Biomass (g) with S.E. error bars; Clipping vs. No Clipping; Benomyl vs. No Benomyl]
Interesting Results!
