
10/6/2011

Samples & Populations

Data Analysis

Sample: subset of the population


Problems:
1. Cannot sample every individual
2. Inherent variation among individuals
Randomness
Reduces bias and increases chances of
representing population
Simple random, stratified random
Replication
More replicates, better representation of the
population

Outline
Background
terminology
Descriptive statistics
Inferential statistics
Graphs & tables

Parameters vs. Statistics
Parameter: number calculated from a population
Statistic: number calculated from a sample

Types of Statistical Analyses
1. Parameter estimation: estimating the mean of a population
Ex. meeting target/threshold objectives
Maintain a population of at least 1000 individuals
2. Significance/statistical/hypothesis testing: detecting a change in a parameter
Ex. objective is to detect change over time
The population mean has changed between year one and year two.
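The parameter/statistic distinction can be sketched in Python with a simulated population. All values here are hypothetical: the parameter (mu) is computed from every individual, while the statistic (x-bar) comes from a simple random sample and only estimates mu.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1000 individuals (e.g., plant heights in cm)
population = [random.gauss(50, 10) for _ in range(1000)]
mu = statistics.mean(population)         # parameter: uses every individual

sample = random.sample(population, 30)   # simple random sample, n = 30
x_bar = statistics.mean(sample)          # statistic: uses only the sampled individuals

print(f"parameter (mu) = {mu:.2f}; statistic (x-bar) = {x_bar:.2f}")
```

Rerunning with a different seed gives a different x-bar but the same mu. That variability is the sampling variation the rest of the lecture deals with.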

Scientific Method
OBSERVATION
QUESTION
HYPOTHESIS
PREDICTION
TEST

Hypochlora alba
Artemisia ludoviciana

Scientific Method
OBSERVATION
Hypochlora alba is the same color as its host plant, Artemisia ludoviciana.
QUESTION
Is the color of Hypochlora alba related to its host plant?
HYPOTHESIS
Pigments in the host plant affect the color of Hypochlora alba.
PREDICTION
The color of Hypochlora alba will change if it is fed a host plant of another color.
TEST
Feed Hypochlora alba different host plants.

Hypotheses
Null Hypothesis (Ho)
There is no difference between the means of treatments A and B.
Alternative Hypothesis (Ha)
There is a difference between the means of treatments A and B.
Ho: mean A = mean B
Ha: mean A ≠ mean B

Null Hypothesis
There is no effect of host plant color on Hypochlora alba color.
Alternative Hypothesis
The color of the host plant affects the color of Hypochlora alba.

Hypotheses
Null Hypothesis
There is no change between the means from year one to year two.
Alternative Hypothesis
There is a change between the means from year one to year two.
Use statistics to test hypotheses!

Good Hypotheses
State independent and dependent variables!
Exs.
1. There is an effect of plant host color on grasshopper color.
2. There is no difference in species diversity between control and burned treatments.
3. Purple loosestrife density does not differ between years.


Types of Experiments


Descriptive:
Does not alter the environment (natural experiment)
Shows patterns, but does not identify the mechanism explaining those patterns

Manipulative:
Alters the environment using treatments
Isolates the variable of interest (identifies the mechanism)
Controls for all other variables

Variables
Variable: things we measure, control or manipulate in experiments; a.k.a. factor, effect
Dependent variable
Variables that are measured
Independent variable
Variables that are controlled or manipulated, i.e. treatment
**Dependent variables depend on the independent variables**

Degrees of Freedom

Degrees of freedom (d.f.)
Number of scores that are free to vary
# observations in a sample - # population parameters estimated from sample data
Ex.
Parameter estimated = mean
Sample size = n
d.f. = n - 1
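The sketch below shows why estimating the mean uses up one degree of freedom. Once the sample mean is fixed, any n - 1 values can be chosen freely, but the last value is then forced. The function name `last_value_given_mean` is hypothetical.

```python
def last_value_given_mean(free_values, mean, n):
    """Return the one value that is no longer free once the mean of n values is fixed."""
    if len(free_values) != n - 1:
        raise ValueError("supply exactly n - 1 free values")
    # The n values must sum to mean * n, so the remaining value is determined.
    return mean * n - sum(free_values)


# n = 5 observations with a sample mean of 5: four values vary freely, the fifth does not.
last = last_value_given_mean([3, 4, 6, 7], mean=5, n=5)
print(last)  # 25 - (3 + 4 + 6 + 7) = 5
```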

Descriptive Statistics
Mean:
Average of all the numbers
Median:
Middle number of the group when the values are sorted (the average of the two middle numbers when n is even)
Mode:
Number that appears the most frequently
Range:
Difference between the largest and smallest
number
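The four measures above can be computed with Python's standard `statistics` module. The data values in this sketch are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample values

mean = statistics.mean(data)         # average: 30 / 6
median = statistics.median(data)     # even n: average of the two middle values (3 and 5)
mode = statistics.mode(data)         # most frequent value
value_range = max(data) - min(data)  # largest minus smallest

print(mean, median, mode, value_range)
```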

DESCRIPTIVE
STATISTICS


Describing Variation
1. Standard Deviation
2. Standard Error of Mean
3. Confidence Intervals

Normal Distribution
[Figure: normal distribution curve centered on the mean]

Symbols
$x$ = sample value
$\bar{x}$ = sample mean
$s^2$ = sample variance
$\mu$ = population mean
$\sigma^2$ = population variance
$n$ = sample size
$\alpha$ = level of significance

Standard Deviation
Sum of Squares
$SS = \sum (x - \bar{x})^2$
Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Degrees of Freedom
d.f. = n - 1
Standard Deviation
$\text{S.D.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{SS}{\text{d.f.}}}$
S.D. measures dispersion of the data
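Here are the formulas above step by step in Python: sum of squares, then variance with d.f. = n - 1, then S.D. The data are hypothetical. The final asserts check the hand-built values against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

n = len(data)
x_bar = statistics.mean(data)              # sample mean
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squares, SS
df = n - 1                                 # degrees of freedom
variance = ss / df                         # s^2 = SS / d.f.
sd = math.sqrt(variance)                   # S.D. = sqrt(SS / d.f.)

# The standard library's sample variance and S.D. use the same n - 1 denominator.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))
print(ss, df, round(variance, 3), round(sd, 3))
```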

Standard Deviation

Used in statistical testing

General Rules
68% fall w/in 1 s.d.
95% fall w/in 2 s.d.
99.7% fall w/in 3 s.d.
*If normal distribution

Standard Error of Mean
Standard Deviation
$\text{S.D.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{SS}{\text{d.f.}}}$
Standard Error of Mean
$\text{S.E.M.} = \frac{\text{S.D.}}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
S.E.M. describes uncertainty in estimating the mean
Used in statistical testing
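This sketch does two things. It computes the S.E.M. as S.D. / sqrt(n) for a hypothetical sample. It then checks the "general rules" percentages against the standard normal distribution, using `statistics.NormalDist` from the Python standard library.

```python
import math
import statistics
from statistics import NormalDist

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
sd = statistics.stdev(data)
sem = sd / math.sqrt(len(data))   # S.E.M. = S.D. / sqrt(n)


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, S.D. 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # roughly 0.6827, 0.9545, 0.9973
```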

Confidence Intervals
Confidence Interval
$\text{C.I.} = \bar{x} \pm z_{(1-\alpha/2)} \frac{\sigma}{\sqrt{n}}$
Use:
Large sample size
Normal distribution

Confidence Interval
$\text{C.I.} = \bar{x} \pm t_{(\alpha,\,\text{d.f.})} \frac{s}{\sqrt{n}}$
Use:
Small sample size (<30)
Student's t distribution

Confidence Intervals Revisited
Example: 24.5 ± 3 (95% C.I.)
24.5 = mean; ± 3 = half of the confidence interval width; 95% = confidence level
Confidence level = probability that an interval constructed this way contains the true mean
95% CI = if sampling were repeated many times, about 95% of the intervals calculated would contain the true mean

C.I. describes an interval around the estimated mean and how confident we can be that the actual mean falls in this interval
Used for parameter estimation
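Below is a minimal sketch of both interval formulas. Python's standard library has no Student's t distribution, so the two-tailed 95% critical values for d.f. = 1 to 10 are hardcoded from a standard t-table. With SciPy available, `scipy.stats.t.ppf(0.975, df)` would compute them instead. The large-sample version uses s in place of sigma, and the sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Two-tailed 95% critical values of Student's t for d.f. = 1..10 (standard t-table values).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def z_interval_95(data):
    """Large-sample 95% C.I.: mean ± z(0.975) * s / sqrt(n); s stands in for sigma."""
    n = len(data)
    mean = statistics.mean(data)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * statistics.stdev(data) / math.sqrt(n)
    return mean - half_width, mean + half_width


def t_interval_95(data):
    """Small-sample 95% C.I.: mean ± t(0.05, d.f.) * s / sqrt(n), with d.f. = n - 1."""
    n = len(data)
    mean = statistics.mean(data)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(data) / math.sqrt(n)
    return mean - half_width, mean + half_width


sample = [21.0, 23.5, 24.0, 25.5, 28.5]  # hypothetical measurements, mean = 24.5
print(z_interval_95(sample))
print(t_interval_95(sample))  # wider, because t(0.05, 4) = 2.776 > 1.96
```

With only five observations, the t-based interval is noticeably wider than the z-based one. This is why the t distribution is recommended for small samples.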

Comparing Sample Means
[Figure: samples with low, medium and high variability]

Reporting Means
46.5 ± 8.4 cm (S.D.)
46.5 ± 2.9 cm (S.E.M.)
21.34 ± 5.2 cm (95% C.I.)
or
(16.14, 26.54)

Inferential Statistics
Uses sample statistics to make inferences about
population parameters
Inherent variation in population
Are the differences you see due to inherent
variation or your treatments?
Must test hypotheses
Use test statistic and P values

INFERENTIAL
STATISTICS


Types of Values

Probability Values in Statistics

Discrete values:
Values are distinct
Categorical data
Ex. presence/absence
Continuous values:
One value flows into the next
There are an infinite number of other possible
values in between any two values
Ex. plant height

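P-values:
Probability of getting a result at least as extreme as the one observed if the null hypothesis were true
Value tells you whether you reject or fail to reject the null hypothesis
Standard to reject if P < 0.05 ($\alpha$ = 0.05)
Ex. P = 0.02
Reject Ho; if Ho were true, a difference this large would be seen only 2% of the time
Ex. P = 0.88
Fail to reject Ho; if Ho were true, a difference this large would be seen 88% of the time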

Different statistical tests are used for different types of values

Types of Samples

Statistical Tests

Independent Samples
Different sampling units each time
Paired Samples
Re-sample already established units
ex. permanent plots

Depend upon:
Discrete or continuous values
Independent or paired samples
Chi-square test for independence
Independent sample t-test
Paired t-test
Analysis of variance
Repeated measures analysis of variance

What are your sampling units?
Quadrats, points, transects

Chi-Square
Chi-square test for independence:
Discrete (categorical) data
Determines whether there is a significant association between two categorical variables
Ho: two variables are independent of each other
Ha: two variables are not independent of each other

Chi-square test for independence test statistic:
Create contingency tables (observed values)
Calculate expected values
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where:
O = observed
E = expected
See Elzinga p. 241-244

Contingency table
Ho: frequency of target plant is same in 1999 & 2009

           1999   2009   Total
Present      40     15      55
Absent       20     10      30
Total        60     25      85
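Applying the formula above to the 1999/2009 contingency table gives a small worked check. In this minimal Python sketch, each expected count is (row total × column total) / grand total, and d.f. = (rows - 1)(columns - 1). The P-value uses a shortcut that holds only for d.f. = 1: the chi-square tail probability equals erfc(sqrt(χ²/2)). Other d.f. would need a table or a statistics library.

```python
import math

# Observed counts (rows: Present, Absent; columns: 1999, 2009)
observed = [[40, 15],
            [20, 10]]

row_totals = [sum(row) for row in observed]        # [55, 30]
col_totals = [sum(col) for col in zip(*observed)]  # [60, 25]
grand_total = sum(row_totals)                      # 85

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (2 - 1) = 1

# Shortcut valid only for d.f. = 1: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_square / 2))
print(f"chi-square = {chi_square:.2f}, d.f. = {df}, P = {p_value:.2f}")
```

With these counts, χ² ≈ 0.34 and P ≈ 0.56, so Ho would not be rejected at α = 0.05.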

Independent-sample t-test
t-test:
Continuous data
Tests the difference between sample means of two groups
Ho: there is no difference between the hypothesized population means ($\mu$)
Ha: there is a difference between the hypothesized population means ($\mu$)
Ho: $\mu_1 = \mu_2$
Ha: $\mu_1 \neq \mu_2$
See Elzinga p. 236-237

t-test test statistic:
t = difference between two means / standard error of the difference between the two means
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
where:
$\bar{x}_1$ = mean of sample 1
$\bar{x}_2$ = mean of sample 2
$s_{\bar{x}_1 - \bar{x}_2}$ = standard error of the difference between the two means
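Below is a minimal sketch of the t statistic above. It assumes equal variances in the two groups and uses the pooled-variance (Student's) form of the standard error with d.f. = n1 + n2 - 2. Welch's version, which drops that assumption, is an alternative. The data are hypothetical.

```python
import math
import statistics


def independent_t(sample1, sample2):
    """Student's two-sample t with pooled variance. Returns (t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of x̄1 - x̄2
    return (mean1 - mean2) / se_diff, df


t, df = independent_t([4, 5, 6, 7, 8], [6, 7, 8, 9, 10])
print(t, df)
```

Here |t| = 2.0 with d.f. = 8. That falls between the two-tailed t-table values 1.860 (P = 0.10) and 2.306 (P = 0.05), so 0.05 < P < 0.10.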

Two vs. One-tailed t-test
Two-tailed t-test
Ho: There is no change in population mean.
One-tailed t-test
Ho: There is no increase in population mean.
Note: If you fail to reject Ho, this could mean no change or a decrease in population mean.
**More powerful than a two-tailed test in detecting a true change in the predicted direction.
One-tailed t-test P-value is half of the two-tailed t-test P-value (when the observed change is in the predicted direction)
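The halving rule can be illustrated with a sketch that substitutes the standard normal (z) for Student's t, because the Python standard library has no t CDF. The same relationship holds for t. The one-tailed P-value is half the two-tailed value only when the result lies in the predicted direction. In the wrong direction it is large.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, S.D. 1


def two_tailed_p(z):
    """P of a result at least this extreme in either direction, if Ho is true."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def one_tailed_p_increase(z):
    """P of a result at least this large in the predicted (increase) direction, if Ho is true."""
    return 1 - STD_NORMAL.cdf(z)


print(two_tailed_p(1.8), one_tailed_p_increase(1.8))    # one-tailed is half
print(two_tailed_p(-1.8), one_tailed_p_increase(-1.8))  # wrong direction: not half
```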

Analysis of Variance
Analysis of Variance (ANOVA):
Continuous data
Tests the difference among sample means of more than two groups
Ho: there is no difference among the hypothesized population means ($\mu$)
Ha: there is a difference among the hypothesized population means ($\mu$)
Ho: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
Ha: at least one $\mu_i$ differs from the others
See Elzinga p. 239-241

Analysis of Variance (ANOVA) test statistic:
$F = \frac{s^2_{\text{between groups}}}{s^2_{\text{within groups}}} = \frac{MS_{\text{between groups}}}{MS_{\text{within groups}}}$

Multiple comparisons
Tells you which means are different from each other
Ex. $\mu_1$ vs. $\mu_2$, $\mu_2$ vs. $\mu_3$, etc.
Use two-tailed t-tests
# comparisons < 8-10: Bonferroni
Divide $\alpha$ by # of tests, or
Multiply each P value by # of tests
# comparisons > 8-10: Tukey
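The ANOVA F ratio and the Bonferroni adjustment described above can be sketched in Python as follows. The F statistic is MS between groups divided by MS within groups, with d.f. = (k - 1, N - k) for k groups and N observations. The Bonferroni helper returns the per-test α and the adjusted P values, capped at 1. The function names and data are hypothetical, and the P values passed to `bonferroni` stand in for the results of pairwise two-tailed t-tests.

```python
import statistics


def one_way_anova(groups):
    """Return (F, d.f. between, d.f. within) for a one-way ANOVA."""
    k = len(groups)                                  # number of groups
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = statistics.mean([x for g in groups for x in g])
    # Between groups: spread of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of observations around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


def bonferroni(p_values, alpha=0.05):
    """Return the per-test alpha and the Bonferroni-adjusted P values (capped at 1)."""
    m = len(p_values)
    return alpha / m, [min(1.0, p * m) for p in p_values]


f, df_b, df_w = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f"F = {f:.2f}, d.f. = {df_b},{df_w}")

alpha_per_test, adjusted = bonferroni([0.01, 0.02, 0.40])
print(alpha_per_test, adjusted)
```

The two Bonferroni forms give the same decisions. Comparing each raw P value with α / m is equivalent to comparing each adjusted P value with α.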

Paired t-test

Statistical Tests
Independent Samples
Chi-square test
Independent-sample t-test
Analysis of Variance (ANOVA)
Paired Samples
Paired t-test
Repeated-measures ANOVA

Repeated-measures ANOVA

Means
1990 = 0.44
1994 = 0.38
P = 0.55

Means
1990 = 0.44
1994 = 0.38
P = 0.0009
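A paired t-test works on the differences between re-measured units. Its statistic is t = mean of differences / (S.D. of differences / sqrt(n)), with d.f. = number of pairs - 1. The sketch below applies it to hypothetical cover values on five permanent plots in two years (not the data above). For comparison, it also applies a pooled independent-sample t to the same numbers.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t-test on differences between re-measured units. Returns (t, d.f.)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # S.E.M. of the differences
    return statistics.mean(diffs) / se, n - 1


def pooled_t(sample1, sample2):
    """Independent-sample t (pooled variance), for comparison. Returns (t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(sample1)
              + (n2 - 1) * statistics.variance(sample2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / se, df


# Hypothetical cover values on the same five permanent plots in two years
plots_1990 = [0.50, 0.30, 0.60, 0.40, 0.45]
plots_1994 = [0.44, 0.25, 0.52, 0.35, 0.38]

t_pair, df_pair = paired_t(plots_1990, plots_1994)
t_ind, df_ind = pooled_t(plots_1990, plots_1994)
print(f"paired: t = {t_pair:.2f}, d.f. = {df_pair}")
print(f"independent: t = {t_ind:.2f}, d.f. = {df_ind}")
```

The paired t is far larger because the large plot-to-plot differences cancel out when each plot is compared with itself. That is why analyzing re-sampled units as paired data can detect a change that an independent-sample test misses.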


Parametric statistics assume:


1. Normal distribution
2. Homogeneity of variance
3. Random samples
If assumptions fail:
1. Increase sample size
2. Transform data
3. Use nonparametric statistics
4. Use statistical analyses based on resampling

Reporting Statistics

Statistical Values

Include values in tables, figure captions or text


Test statistic, d.f., P
Examples:

Test statistic:
Chi-square test: $\chi^2$
t-test: t
Analysis of Variance (ANOVA): F
NOW WHAT?
Find degrees of freedom (d.f.)
Need test statistic and d.f. to get P-value
Use statistical tables or computer programs to
compute P-value

Paired samples

Statistical Assumptions

ANOVA with three or more years of data


Not independent samples
Assumes sphericity
Correlation between data of year 1 and 2 is
the same as year 2 and 3, etc.
Does not tell you which years differ from each other
Elzinga recommends using paired t-tests with Bonferroni's adjustment instead of RM ANOVA
See Elzinga p. 245-246

Independent samples

See
Elzinga
p. 244-245

$\chi^2$ = 28.22, d.f. = 10, P = 0.02
t = 1.3, d.f. = 4, P = 0.26
F = 1.67, d.f. = 1,4, P = 0.27
*Round to the nearest hundredth for most values.
**Report exact P values unless P < 0.0001
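A small helper can keep reported results in the "test statistic, d.f., P" form shown above. This is a sketch under assumed conventions. The statistic is rounded to the hundredth. P is given to two decimals, but to four decimals when P < 0.01 so small values are not rounded to zero. Below 0.0001 it is reported as "P < 0.0001". The function name `format_result` is hypothetical.

```python
def format_result(symbol, statistic, df, p):
    """Format 'statistic, d.f., P' for reporting (assumed rounding conventions)."""
    if isinstance(df, tuple):  # ANOVA reports two d.f. values, e.g. 1,4
        df_text = ",".join(str(d) for d in df)
    else:
        df_text = str(df)
    if p < 0.0001:
        p_text = "P < 0.0001"
    elif p < 0.01:
        p_text = f"P = {p:.4f}"  # keep small P values exact rather than rounding to 0.00
    else:
        p_text = f"P = {p:.2f}"
    return f"{symbol} = {statistic:.2f}, d.f. = {df_text}, {p_text}"


print(format_result("t", 1.3, 4, 0.26))        # t = 1.30, d.f. = 4, P = 0.26
print(format_result("F", 1.67, (1, 4), 0.27))  # F = 1.67, d.f. = 1,4, P = 0.27
```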

Sampling & Statistics
Three items to consider when deciding sampling units:
1. Shape
2. Size
3. Number

Sample Size
Sample size = number of sampling units
n = sample size
Sample size affects statistical power
Why?
Constraints to sample size?
Time
Money

Statistical Error

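False-Change Error (Type I error) =
No change has taken place but sampling detects change
Controlled by the significance level ($\alpha$): reject Ho only if P < $\alpha$
P-value = probability of getting a difference at least as large as the one observed if the null hypothesis were true
P-value = probability of seeing such a difference if no change actually occurred
A small P-value means the difference is unlikely to be due to chance alone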
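A small simulation makes the role of α concrete. When no change has occurred and tests are run at α = 0.05, about 5% of them still "detect" a change. This sketch assumes normally distributed data with a known S.D. of 1, so a two-sample z-test stands in for the t-test. The sample size, number of trials and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist

random.seed(42)  # fixed seed for a reproducible run
STD_NORMAL = NormalDist()
ALPHA = 0.05
N_PER_YEAR = 10
TRIALS = 4000

false_changes = 0
for _ in range(TRIALS):
    # Both "years" come from the same population (mean 0, S.D. 1), so Ho is true.
    year1 = [random.gauss(0, 1) for _ in range(N_PER_YEAR)]
    year2 = [random.gauss(0, 1) for _ in range(N_PER_YEAR)]
    diff = sum(year2) / N_PER_YEAR - sum(year1) / N_PER_YEAR
    z = diff / math.sqrt(2 / N_PER_YEAR)   # S.D. known to be 1, so a z-test applies
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))   # two-tailed P-value
    if p < ALPHA:
        false_changes += 1                 # a "change" detected where none occurred

false_change_rate = false_changes / TRIALS
print(f"false-change rate = {false_change_rate:.3f}")  # close to ALPHA
```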

Probability Values in Statistics


P-value:
Value tells you whether you reject or fail to reject
the null hypothesis
Standard to reject if P < 0.05 ($\alpha$ = 0.05)
With $\alpha$ = 0.05, there is a 5% probability of rejecting a null hypothesis that is actually true
i.e., a 5% probability of detecting a change when no change actually occurred

Missed-Change Error (Type II error, $\beta$) =
Real change has taken place but sampling does NOT detect change
Increase statistical power
Power = 1 - $\beta$
Ex.
If $\beta$ = 0.70, power = 0.30
If $\beta$ = 0.05, power = 0.95
I want to be at least 95% certain of detecting a real change.


Increasing Statistical Power

Tools for Sample Size

Power = a function of:


S.D. = standard deviation
n = number of samples
$\alpha$ = false-change error
MDC = minimum detectable change

MDC = size of change you want to detect


Set by observer
Depends on natural history of organism
Is it biologically significant?

General recommendations/experience
Graphically
Equations
Computer Programs
Resources
See Elzinga p. 74-87 (graphing)
See Elzinga p. 141-154, Appendix 16 & 18
(computer programs)
See Herrick vol. II Appendix C
(recommendations & calculations)
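As one example of the "equations" route, the sketch below uses a normal approximation to link power to S.D., n, α and MDC. It assumes a one-sample (or paired-differences) design with a two-sided test. It is not one of Elzinga's procedures, which use the t distribution and correction steps, and it slightly underestimates the n needed for small samples. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(sd, n, alpha, mdc):
    """Approximate power (1 - beta) to detect a mean change of size mdc, two-sided test."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sd / math.sqrt(n)
    # Ignores the (negligible) chance of a significant result in the opposite direction.
    return STD_NORMAL.cdf(mdc / se - z_crit)


def approx_sample_size(sd, alpha, power, mdc):
    """Smallest n whose approximate power reaches the target power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)  # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * sd / mdc) ** 2)


n_needed = approx_sample_size(sd=1.0, alpha=0.05, power=0.80, mdc=0.5)
print(n_needed, round(approx_power(1.0, n_needed, 0.05, 0.5), 3))
```

Holding the other inputs fixed, power rises with n, α and MDC, and it falls as S.D. grows. These are the same levers listed above.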

GRAPHS AND TABLES

Tables
Table 3
Mean values (± standard error) of exchangeable cations in soil of control and NaCl-treated soils from experiment #1 after plants were harvested in the second year of the experiment. ** = P ≤ 0.05, *** = P ≤ 0.0001.

Treatment   Na (ppm)        Ca (ppm)         Mg (ppm)       K (ppm)
Control     79 ± 4          3690 ± 77        195 ± 9        267 ± 20
NaCl        247 ± 24 ***    3133 ± 133 **    150 ± 6 **     204 ± 16 **

Newingham and Belnap 2006

Histogram

[Figure: histogram titled "Broken Fingers While Running"; y-axis: Number of Cases; x-axis: Age, in groups from 0-10 through 91-100]

Scatterplot

[Figure: scatterplot; axis labels include Plant Total Biomass (g)]

Simple Bar Graph
[Figure: Insect Abundance (#) for Ambient CO2 vs. Elevated CO2; error bars: S.D.; ** marks a significant difference]

Error bars:
Whiskers on means to guide whether bars/points are statistically (significantly) different (ex. standard deviation, standard error, 95% confidence intervals)

Complicated Graph with Error Bars
[Figure: Insect Abundance (#) under Ambient vs. Elevated CO2 with Fertilizer and No Fertilizer, in June 2004, August 2004 and September 2004; error bars: S.E.]

Statistics from Graphs
Letters Denote Significant Differences
[Figure panels: F. idahoensis Total Biomass (g) for No Clipping vs. Clipping and Benomyl vs. No Benomyl; Body Size (cm) by Population; error bars: S.E.]

Interesting Results!
