
Training Programme

“Statistics for Analytical Chemistry”

PREPARED BY

MOHAMED SALAMA
Training Programme

“Statistics for Analytical Chemistry”

PART I

INTRODUCTION
What is Statistics?

Statistics is the science of
• collecting,
• describing, and
• interpreting data
to make
• predictions, and
• decisions
A Statistical Analysis

…includes
• describing the problem
• gathering data
• summarizing data
• analyzing the data and communicating
meaningful conclusions
Statistics in chemical analysis
Lab chemists are concerned with the chemical analysis
processes that quantify analytes in different matrices.

All these processes are subject to

• systematic variation
(e.g. instrument effects, matrix effects)
• random variation
(e.g. measurement errors)

Statistics is a tool to help us understand the effects of
random variation.
Areas of statistics

DESCRIPTIVE STATISTICS

Methods for summarizing and describing
a given set of data.

E.g. using
• tables
• graphs
• numerical summaries
Areas of statistics II

INFERENTIAL STATISTICS

Methods for drawing conclusions about a
population based on a given data set from that
population.
Population & Sample

POPULATION
The collection, or set, of ALL individuals, items, or
events of interest.

SAMPLE
The subset of the population for which we have
data available.
Example

Suppose we are investigating the daily water
consumption of students at Cairo University.
50 students are selected at random and
interviewed.
• The POPULATION is the set of all
students at Cairo University.
• The SAMPLE is the 50 students
randomly selected.
Population & Sample

POPULATION – use parameters to summarize features.
SAMPLE – use statistics to summarize features.

Inference on the population is drawn from the sample.
Probability and statistics
PROBABILITY:
• Usually assumes knowledge of the population, i.e.
distributions are known.
• The theory behind statistics.
PRINCIPLE OF INFERENTIAL STATISTICS:
• We only have a sample.
• We use it to infer details about the population.
Variable
A characteristic of interest about each individual
element of a population or sample.
In the water consumption study, an example of
a variable is the consumption per day in
units of, e.g., litres.

Other examples for a group of students could
be:
• age, weight, height
• etc.
Parameter
• A piece of numerical information about a population.
• We use statistical inference to estimate the parameter
from the given sample.
Example: Suppose we want to find the average height of
lab staff.
The POPULATION is the set of all lab staff.

The average height is a PARAMETER of the population.

The QC section is an example of a SAMPLE.


Statistic

A statistic is a number calculated from a sample.

Typical examples we will learn about:

• the average (or mean) of a sample of measurements


• the standard deviation
Types of data

Qualitative (non-numerical data)

• NOMINAL –
Names that describe categories, but no order.
Example: eye colour, gender

• ORDINAL –
Ordered categories. Example: year groups (1st
year, 2nd year, 3rd year etc.)
Types of data II

Quantitative (numerical data)

• DISCRETE –
number of possible values can be counted.
E.g. number of children in a family, marks out of
50 in an exam.
• CONTINUOUS –
possible values are in a continuum.
E.g. measurements of height, weight and volume.
Data collection and sampling techniques

Surveys and experiments should use methods to
select a sample that is “representative” of the
population of interest.

Possible methods to obtain “representative” samples:
• deterministic sampling
• random sampling
Data collection and sampling techniques

Deterministic Sample
Elements are selected on the basis of some algorithmic
approach, for example, selecting every 10th member
of the population.
Danger of biased results.
Random Sample
Each member of the population is selected with a
certain probability.
So that!
Statistics is a valuable tool in all sciences.

However, making proper use of statistics to


• select data samples
• analyse these samples
• draw conclusions
is not a simple task.
Why Statistics?
1. Existence of errors
• Gross
• Random
• Systematic

2. Small number of measurements
• Sample
• Population
Definitions (ISO 5725-1:1994)
1. Accuracy
2. Trueness
3. Bias
4. Laboratory bias (Total bias)
5. Bias of measurement method
6. Laboratory component of bias
7. Precision
8. Repeatability (Conditions & Limit)
9. Reproducibility (Conditions & Limit)
Definitions
1. Accuracy
The closeness of agreement between a test result
and the accepted reference value

2. Trueness
The closeness of agreement between the average
value obtained from a large series of test results and
an accepted reference value
Definitions
3. Bias
The difference between the expectation of the test
results and an accepted reference value

4. Laboratory bias (Total bias)


The difference between the expectation of the test
results from a particular lab and an accepted
reference value
Definitions
5. Bias of the measurement method
The difference between the expectation of test
results obtained from all laboratories using that
method and an accepted reference value
6. Laboratory component of bias
The difference between the lab bias and the bias of
the measurement method
7. Precision
The closeness of agreement between independent
test results obtained under stipulated conditions
Definitions
[Figure 7.4.1 – Bias and precision: (a) no bias, high precision; (b) no bias, low precision; (c) biased, high precision; (d) biased, low precision. From Chance Encounters by C.J. Wild and G.A.F. Seber, © John Wiley & Sons, 2000.]
Definitions
8. Repeatability
Precision under repeatability conditions

9. Repeatability conditions
Conditions where independent test results are
obtained with the same method on identical test
items in the same lab by the same operator using
the same equipment within short intervals of time.
Definitions
10. Reproducibility
Precision under reproducibility conditions

11. Reproducibility conditions


Conditions where test results are obtained with the
same method on identical test items in different
laboratories with different operators using
different equipment
Statistical Model
1. Basic Model (ISO 5725-1:1994)

Test Result (Y) = m + B + e

• m is the general mean (expectation);

• B is the laboratory component of bias under repeatability conditions
(between-lab variation – Var(B) = σL²);

• e is the random error occurring in every measurement under
repeatability conditions (within-lab variation – Var(e) = σr²).
Statistical Model
5. Basic Statistical Model & Precision

Repeatability Standard Deviation = σr = √Var(e)

Reproducibility Standard Deviation = σR = √(σL² + σr²)

i.e. the repeatability variance is estimated directly from the random error,
while the reproducibility variance depends on the sum of both the within-
and between-lab variances.
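To make the model concrete, here is a minimal simulation sketch (it is not part of ISO 5725): it generates results Y = m + B + e for many hypothetical laboratories and then recovers the repeatability and reproducibility standard deviations. All numerical values, and the use of numpy, are assumptions for illustration only.

```python
# Hypothetical simulation of the basic model Y = m + B + e.
import numpy as np

rng = np.random.default_rng(1)
m, sigma_L, sigma_r = 100.0, 2.0, 1.0         # assumed true mean and SDs
labs, reps = 200, 10                          # labs, replicates per lab

B = rng.normal(0.0, sigma_L, size=labs)            # between-lab bias component
e = rng.normal(0.0, sigma_r, size=(labs, reps))    # within-lab random error
Y = m + B[:, None] + e                             # simulated test results

s_r = np.sqrt(Y.var(axis=1, ddof=1).mean())        # repeatability SD estimate
s_L2 = Y.mean(axis=1).var(ddof=1) - s_r**2 / reps  # between-lab variance estimate
s_R = np.sqrt(max(s_L2, 0.0) + s_r**2)             # reproducibility SD estimate

print(f"s_r = {s_r:.2f} (true 1.0), s_R = {s_R:.2f} (true {np.hypot(2.0, 1.0):.2f})")
```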
Conclusions

Accuracy = Trueness + Precision

• Trueness – characterized by Bias
• Precision under repeatability conditions (r):
within-lab variation
• Precision under reproducibility conditions (R):
within-lab variation + between-lab variation
Example Problems
Exercises
Training Programme

“Statistics for Analytical Chemistry”

PART II

FUNDAMENTALS OF
STATISTICS
Statistical Fundamentals
1. Measures of Central Tendency
1.1 Average

X̄ = (x1 + x2 + … + xn) / n = (Σ xi) / n
Statistical Fundamentals

1. Measures of Central Tendency


1.2 Median (Md)
which is simply the middle value of the sample
when the measurements are arranged in
numerical order. (If n is even, then the median
is the average of the two middle values of the
ordered sample)
Statistical Fundamentals
1. Measures of Central Tendency
1.3 Mode
The mode is defined as the most frequent
value in a frequency distribution.
Statistical Fundamentals
2. Measures of Dispersion
2.1 Range (R)

2.2 Standard Deviation (s)

2.3 Coefficient of Variation (CV)


Statistical Fundamentals
2. Measures of Dispersion
2.4 Variance (s2)

2.5 Pooled Standard Deviation (sp)
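A short sketch tying the definitions above to code, using Python’s standard statistics module; the data values and the pooled-SD helper are hypothetical illustrations, not a prescribed procedure.

```python
import math
import statistics as st

x = [10.5, 11.5, 10.7, 10.8, 10.8]   # hypothetical replicate measurements

mean   = st.mean(x)          # 1.1 average (x-bar)
median = st.median(x)        # 1.2 middle value of the ordered data
mode   = st.mode(x)          # 1.3 most frequent value
R      = max(x) - min(x)     # 2.1 range
s      = st.stdev(x)         # 2.2 sample standard deviation
cv     = 100.0 * s / mean    # 2.3 coefficient of variation, in %
s2     = st.variance(x)      # 2.4 variance

# 2.5 pooled SD of k samples: sp = sqrt( sum((ni - 1)*si^2) / sum(ni - 1) )
def pooled_sd(samples):
    num = sum((len(g) - 1) * st.variance(g) for g in samples)
    den = sum(len(g) - 1 for g in samples)
    return math.sqrt(num / den)

print(mean, median, mode, R, s, cv, s2)
print(pooled_sd([[10.5, 11.5, 10.7], [9.9, 10.8, 10.8]]))
```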


Example Problems
Exercises
Training Programme

“Statistics for Analytical Chemistry”

PART III

ERRORS IN CLASSICAL
ANALYSIS
Training Programme

“Statistics for Analytical Chemistry”

1. DISTRIBUTION OF
ERRORS
Normal Distribution
• Continuous random variable:
values from an interval of numbers;
absence of gaps.
• Continuous probability distribution:
the distribution of a continuous random
variable.
• The most important continuous
probability distribution is
the normal distribution.
Normal Distribution

“Bell shaped”
f(X)
Symmetrical
Mean, median and
mode are equal
Interquartile range X

equals 1.33 
Random variable Mean
has infinite range Median
Mode
Normal Distribution
• Is symmetric about the mean
• Mean = Median

[Figure 6.2.2: 50% of the area lies below the mean and 50% above.]
Normal Distribution
f(X) = (1 / (σ √(2π))) · e^(−(X − μ)² / (2σ²))

f(X): density of random variable X
π ≈ 3.14159; e ≈ 2.71828
μ: population mean
σ: population standard deviation
X: value of random variable (−∞ < X < ∞)
Normal Distribution

(a) Changing (b) Increasing

shifts the curve alongthe axis increases the spreadandflattens the curve

1 =6
1= 2 =6

2 =12

140 160 180 200 140 160 180 200

1 =160 2=174 1 = 2=170


Normal Distribution

(c) Probabilities and numbers of standard deviations

Shaded area = 0.683 Shaded area = 0.954 Shaded area = 0.997

     
68% chance of falling 95% chance of falling 99.7% chance of falling
between  and  between   and  between   and 
Normal Distribution

Probability is
the area under
the curve!

P(c ≤ X ≤ d) = ?

[Figure: density f(X) with the area between X = c and X = d shaded.]
Normal Distribution

An infinite number of normal distributions


means an infinite number of tables to look up!
Normal Distribution
Cumulative Standardized Normal
Distribution Table (Portion)

μZ = 0, σZ = 1

Z      .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255

For Z = 0.12 the table gives the probability .5478
(shaded area exaggerated on the slide).

Only one table is needed.
Normal Distribution
X  6.2  5
Z   0.12
 10
Normal Distribution Standardized
Normal Distribution
  10
Z 1

6.2 X 0.12 Z
 5 Z  0
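Assuming scipy is available, the slide’s example can be checked in a couple of lines: standardize X = 6.2 for μ = 5, σ = 10 and look up the cumulative area.

```python
from scipy.stats import norm

mu, sigma, x = 5.0, 10.0, 6.2
z = (x - mu) / sigma        # (6.2 - 5) / 10 = 0.12
p = norm.cdf(z)             # cumulative area to the left of Z
print(z, round(p, 4))       # 0.12, 0.5478 - matches the table portion above
```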
Example Problems
Exercises
Training Programme

“Statistics for Analytical Chemistry”

2. CONFIDENCE
INTERVALS
Confidence Intervals
• The problem:
how large are the error bounds when
we use data from a sample to estimate
parameters of the underlying population?

• Compute confidence intervals for μ
– when σ² is known
– when σ² is unknown
Confidence Intervals
Suppose an estimate, e.g. an estimate x̄
for the mean μ, is given.

We want to describe the
precision of the estimate.

We do this by giving a range of likely values
for the parameter.
Such a range is called a confidence interval.
Confidence Intervals

[Figure: from a population whose mean μ is unknown, a random sample with X̄ = 50 is drawn. The analyst states: “I am 95% confident that μ is between 40 & 60.”]
Confidence Intervals

• Provides a range of values,
based on observations from one sample
• Gives information about closeness to the
unknown population parameter
• Stated in terms of probability –
we are never 100% sure
Elements of CI Estimation
A probability that the population parameter
falls somewhere within the interval.

[Diagram: the confidence interval is centred on the sample statistic and extends from the lower confidence limit to the upper confidence limit.]
Confidence Limits for Mean
Parameter = Statistic ± Its Error

μ = X̄ ± Error, i.e. Error = X̄ − μ or μ − X̄

Z = Error / σX̄ = (X̄ − μ) / σX̄

Error = Z · σX̄

μ = X̄ ± Z · σX̄
Confidence Intervals

X  Z  X  X  Z 
n
x_
_
X
  1.645 x   1.645 x
90% Samples

  1.96 x   1.96 x
95% Samples
  2.58 x   2.58 x
99% Samples
Intervals & Level of Confidence
• The proportion of intervals, over many repeated samples, that
contain the unknown population parameter.
• Denoted (1 − α)% = level of confidence, e.g.
90%, 95%, 99%
• α is the probability that the parameter is not within the
interval, over many trials (NOT THIS TRIAL ALONE!)
Intervals & Level of Confidence
[Figure: sampling distribution of the mean X̄, with tail areas α/2 on each side and central area 1 − α. Confidence intervals extend from X̄ − Z·σX̄ to X̄ + Z·σX̄; (1 − α)% of such intervals contain μ, α% do not.]
Confidence Intervals
Factors Affecting Interval Width
• Data variation,
measured by σ
• Sample size n,
since σX̄ = σ / √n
• Level of confidence,
(1 − α)

Intervals extend from X̄ − Z·σX̄ to X̄ + Z·σX̄.
Confidence Intervals
CONCLUDING REMARK

The smaller we choose α, the more
‘confident’ we are that the interval
contains the parameter μ. At the
same time, however, the confidence
interval gets wider and is therefore
less precise.
CI (σ Known – Hardly True)
• Assumptions
– Population Standard Deviation Is Known
– Population Is Normally Distributed
– If Not Normal, use large samples

• Confidence Interval Estimate

X̄ − Zα/2 · σ/√n ≤ μ ≤ X̄ + Zα/2 · σ/√n
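A minimal sketch of this σ-known interval, assuming scipy is available; n, X̄ and σ are made-up values for illustration.

```python
import math
from scipy.stats import norm

n, xbar, sigma = 25, 50.0, 10.0          # hypothetical sample summary
z = norm.ppf(1 - 0.05 / 2)               # 1.96 for 95% confidence
half = z * sigma / math.sqrt(n)
print(xbar - half, xbar + half)          # 46.08 ... 53.92
```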
CI (σ Unknown)
• Assumptions
– Population Standard Deviation Is Unknown
– Sample size must be large enough for the central limit theorem, or
the Population Must Be Normally Distributed
– Use Student’s t Distribution

• Confidence Interval Estimate

X̄ − tα/2, n−1 · S/√n ≤ μ ≤ X̄ + tα/2, n−1 · S/√n
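The σ-unknown counterpart, again as a sketch with assumed data: s replaces σ, and Student’s t with n − 1 degrees of freedom replaces Z.

```python
import statistics as st
from scipy.stats import t

x = [10.5, 11.5, 10.7, 10.8]            # hypothetical replicate results
n, xbar, s = len(x), st.mean(x), st.stdev(x)
tcrit = t.ppf(1 - 0.05 / 2, df=n - 1)   # 3.182 for df = 3
half = tcrit * s / n ** 0.5
print(f"{xbar:.2f} ± {half:.2f}")
```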
Student’s t Distribution

• Bell-shaped
• Symmetric
• ‘Fatter’ tails than the standard normal

[Figure: the standard normal curve compared with t distributions for df = 13 and df = 5; the smaller the df, the fatter the tails.]
Degrees of Freedom (df)
• Number of observations that are free
to vary after the sample mean has been
calculated
• Example: the mean of 3 numbers is 2
X1 = 1 (or any number)
X2 = 2 (or any number)
X3 = 3 (cannot vary)
degrees of freedom = n − 1 = 3 − 1 = 2
Student’s t Table
Assume: n = 3, df = n − 1 = 2,
α = .10, α/2 = .05

Upper-tail area
df     .25     .10     .05
1    1.000   3.078   6.314
2    0.817   1.886   2.920
3    0.765   1.638   2.353

t value (df = 2, upper-tail area .05) = 2.920
Example Problems
Exercises
Training Programme

“Statistics for Analytical Chemistry”

PART IV

SIGNIFICANCE TESTS
Training Programme

“Statistics for Analytical Chemistry”

1. T-TEST
Student’s t Test
When solving probability problems for the sample mean,
one of the steps was to convert the sample mean
values to z-scores using the following formula:

z = (x̄ − μx̄) / σx̄, where μx̄ = μ and σx̄ = σ / √n

What happens if we do not know the population
standard deviation σ? If we substitute the population
standard deviation σ with the sample standard
deviation, s, can we use the standard normal table?
Answer: no.
Student’s t Test
This question was addressed in 1908 when W.S. Gosset
found that if we replace σ with the sample standard
deviation s, the distribution becomes a t-distribution. If

T = (x̄ − μ) / (s / √n)

then T has a t-distribution with n − 1 degrees of freedom.
The t-distribution is similar to the z-curve in that it is bell
shaped, but the shape of the t-distribution changes with
the degrees of freedom.
We will use the t-tables to get the critical t-values at
different levels of α and degrees of freedom.
Student’s t Test
1. One-sample t-test

T = (x̄ − μ) / (s / √n)

When using a t-test TAKE CARE!!! Is it a one-tailed or
a two-tailed test?
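A sketch of the one-sample test with scipy; the data and reference value are hypothetical. The alternative argument is exactly where the one-tailed/two-tailed care comes in.

```python
from scipy.stats import ttest_1samp

x = [3.02, 3.04, 3.07, 3.10]     # hypothetical replicate results
mu0 = 3.00                       # reference (hypothesized) mean

# alternative: 'two-sided' (default), 'greater' or 'less' - choose to match H1
res = ttest_1samp(x, popmean=mu0, alternative='greater')
print(res.statistic, res.pvalue)  # reject H0 at P = 0.05 if pvalue < 0.05
```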
Student’s t Test
2. Independent sample t-test (Equal variances)

Degrees of freedom = (n1 + n2) − 2


Student’s t Test
3. Independent sample t-test (Unequal variances)

In this more complicated case the degrees of freedom are not
(n1 + n2) − 2, but are calculated from the Welch–Satterthwaite
equation:

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
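Both independent-sample variants in one scipy sketch (data hypothetical): equal_var=True performs the pooled test with df = (n1 + n2) − 2, while equal_var=False performs Welch’s test with the Welch–Satterthwaite df above.

```python
from scipy.stats import ttest_ind

a = [10.5, 11.5, 10.7]      # hypothetical results, method 1
b = [9.9, 10.8, 10.8]       # hypothetical results, method 2

print(ttest_ind(a, b, equal_var=True))    # 2. equal variances (pooled)
print(ttest_ind(a, b, equal_var=False))   # 3. unequal variances (Welch)
```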


Student’s t Test
4. Paired sample t-test

The sign of the difference is very important
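A paired-test sketch with scipy, on hypothetical paired results (e.g. two methods applied to the same test items); the test works on the signed differences, which is why the sign matters.

```python
from scipy.stats import ttest_rel

method_1 = [10.5, 9.9, 9.9, 9.2]      # hypothetical paired results
method_2 = [10.1, 9.6, 10.0, 8.9]
print(ttest_rel(method_1, method_2))  # t-test on the differences d_i
```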


Training Programme

“Statistics for Analytical Chemistry”

2. F-TEST
f -Test
The F-test is used to compare the standard deviations of two
samples, i.e. to test whether the populations from which they
come have equal variances:

F = s1² / s2² (the larger variance is placed in the numerator, so F ≥ 1)
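scipy has no single-call variance-ratio F-test, so this sketch (with hypothetical data) builds it from the F distribution: the larger variance goes in the numerator so that F ≥ 1.

```python
import statistics as st
from scipy.stats import f

a = [10.5, 11.5, 10.7]              # hypothetical sample 1
b = [9.9, 10.8, 10.8]               # hypothetical sample 2
v1, v2 = st.variance(a), st.variance(b)

F = max(v1, v2) / min(v1, v2)                # larger variance on top
dfn = (len(a) if v1 >= v2 else len(b)) - 1   # numerator df
dfd = (len(b) if v1 >= v2 else len(a)) - 1   # denominator df
p = 2 * f.sf(F, dfn, dfd)                    # two-tailed p-value
print(F, p)
```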
Training Programme

“Statistics for Analytical Chemistry”

3. HYPOTHESIS TESTING
Hypothesis Testing
A hypothesis is a
claim (assumption)
about a population
parameter.

Example claim: “The mean GPA of
this class is μ = 3.5!”

• Examples of parameters
are the population mean
or proportion
• The parameter must
be identified before
analysis
Hypothesis Testing
Assume the
population
mean age is 50
(H0: μ = 50). Identify the population,
then take a sample.

Is X̄ = 20 likely if μ = 50?

No, not likely!
REJECT the null hypothesis (X̄ = 20).
Hypothesis Testing
Sampling Distribution of X̄

[Figure: if H0 is true, the sampling distribution of X̄ is centred at μ = 50. It is unlikely that we would get a sample mean of 20 if in fact this were the population mean; therefore we reject the null hypothesis that μ = 50.]
Hypothesis Testing
H0: Innocent

Jury Trial:
            The Truth
Verdict     Innocent   Guilty
Innocent    Correct    Error
Guilty      Error      Correct

Hypothesis Test:
                   The Truth
Decision           H0 True             H0 False
Do Not Reject H0   Correct (1 − α)     Type II Error (β)
Reject H0          Type I Error (α)    Power (1 − β)
Hypothesis Testing
If you reduce the probability of one type of
error, the probability of the other increases,
provided everything else stays unchanged.


Hypothesis Testing
Factors that affect β (the probability of a Type II error):
• True value of the population parameter –
β increases when the difference between the
hypothesized parameter and its true value decreases
• Significance level α –
β increases when α decreases
• Population standard deviation σ –
β increases when σ increases
• Sample size n –
β increases when n decreases (σX̄ = σ / √n)
Hypothesis Testing
• Convert the sample statistic (e.g. X̄) to a test statistic
(e.g. a Z, t or F statistic)
• Obtain the critical value(s) for a specified α
from a table or computer
• If the test statistic falls in the critical region, reject
H0
• Otherwise do not reject H0
Hypothesis Testing

H0: 0 H0: 0


H1:  < 0 H1:  > 0
Reject H0 Reject H0
 

0 Z 0 Z
Small values of Z don’t
Z Must Be Significantly
contradict H0
Below 0 to reject H0
Don’t Reject H0 !
Example Problems
Exercises
Training Programme

“Statistics for Analytical Chemistry”

4. ANALYSIS OF VARIANCE
(ANOVA)
Analysis of Variance (ANOVA)
The statistical tests described previously are
used in the comparison of two sets of data, or to
compare a single sample of measurements with
a standard or reference value. Frequently,
however, it is necessary to compare three or
more sets of data, and in that case we can make
use of a very powerful statistical method with
a great range of applications – Analysis of
Variance (ANOVA).
Analysis of Variance (ANOVA)
If there is only one source of
variation apart from the random
measurement error, a one-way
ANOVA calculation is appropriate;
if there are two sources of
variation we use two-way ANOVA
calculations, and so on.
Analysis of Variance (ANOVA)
Example:
A sample of fruit is analysed for its pesticide content by a
liquid chromatographic procedure, but four different
extraction procedures A–D (solvent extraction with different
solvents, solid-phase extraction, etc.) are used, the
concentration in each case being measured three times. The
results (mg kg⁻¹) are shown in the table below. Is there
any evidence that the four different sample preparation
methods yield different results?
Analysis of Variance (ANOVA)

Replicate      A      B      C      D
1            10.5    9.9    9.9    9.2
2            11.5   10.8    9.1    8.5
3            10.7   10.8    8.9    9.0
Average      10.9   10.5    9.3    8.9

Overall average = 9.9


Analysis of Variance (ANOVA)
Solution:
The principle underlying a one-way ANOVA
calculation is that two separate estimates of σ₀² are
made, one from the within-sample variation, the
other from the between-sample variation. If H0 is
correct, then these two estimates of σ₀², which can
be compared using the F-test, should not be
significantly different.
Analysis of Variance (ANOVA)
Solution:
In practice simple spreadsheets and other
statistical software provide the results very
rapidly, but here we demonstrate the
underlying principles by going through the
calculations step by step. At each stage we use
the expression Σ(xi − x̄)²/(n − 1) to calculate the
variances.
Analysis of Variance (ANOVA)
Solution:
1. Within-sample variation:
This is readily found by calculating the variance of
the three measurements for each of the four methods
A-D, and averaging them. The variance for sample
A is given by {(10.5 − 10.9)² + (11.5 − 10.9)² +
(10.7 − 10.9)²}/2 = (0.16 + 0.36 + 0.04)/2 = 0.28.
Analysis of Variance (ANOVA)
Solution:
1. Within-sample variation:
Similarly the variances for B, C and D are 0.27,
0.28 and 0.13. The average of these four variances,
0.24, is the within-sample estimate of σ₀². This
estimate has 8 degrees of freedom, i.e. 2 from each
of the four samples.
Analysis of Variance (ANOVA)
Solution:
2. Between-sample variation:
If the variance of the four average values for the
samples A–D is determined, it is found to be {(10.9
− 9.9)² + (10.5 − 9.9)² + (9.3 − 9.9)² + (8.9 − 9.9)²}/3
= 2.72/3 = 0.907. However this variance is
determined from numbers which are themselves the
means of three replicates, so it estimates σ₀²/n rather
than σ₀².
Analysis of Variance (ANOVA)
Solution:
2. Between-sample variation:
In the present experiment n = 3, so the between-
sample estimate of σ₀² is 0.907 × 3 = 2.72. This
estimate has three degrees of freedom, being
derived from 4 sample means.
Analysis of Variance (ANOVA)
Solution:
3. F-Test:
The two estimates of σ₀² can be compared using the F-test.
Remembering that F > 1, the experimental value of F is
2.72/0.24 = 11.33. When evaluating this result using
statistical tables it is very important to note that the F-test in
ANOVA is used as a one-tailed (one-sided) test. This is
because the only realistic possibility is that the between-
sample variation might be significantly greater than the
within sample variation – the opposite is not feasible.
Analysis of Variance (ANOVA)
Solution:
3. F-Test:
At the P = 0.05 level, the critical one-tailed value of
F3,8 is 4.07. The experimental value is much larger
than this, so the null hypothesis can be rejected: it is
very likely that there are significant differences
between the four mean values.
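As the solution notes, spreadsheets and statistical software give this result directly; assuming scipy is available, a one-call check of the worked example:

```python
from scipy.stats import f_oneway

A = [10.5, 11.5, 10.7]
B = [9.9, 10.8, 10.8]
C = [9.9, 9.1, 8.9]
D = [9.2, 8.5, 9.0]

res = f_oneway(A, B, C, D)           # one-way ANOVA across the four methods
print(res.statistic, res.pvalue)     # F = 11.33 with (3, 8) df; p < 0.05
```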
Analysis of Variance (ANOVA)
EXCEL PRINTOUT FOR ANOVA EXAMPLE
Training Programme

“Statistics for Analytical Chemistry”

5. THE CHI-SQUARED TEST


The CHI-Squared Test
• The significance tests so far described in this course, in
general, assume that the data analyzed:
1. are continuous, interval data comprising a whole
population or sampled randomly from a population;
2. have a normal distribution;
3. do not differ hugely in sample size between the
groups.
In contrast, the chi-squared test is concerned with frequency,
i.e. the number of times a given event occurs.
The CHI-Squared Test
Example:

The numbers of glassware breakages reported by four


laboratory workers over a given period are given below. Is
there any evidence that the workers differ in their
reliability?

Number of breakages: 24, 17, 11, 9

Solution:
H0: the same number of breakages by each worker (no difference
in reliability, assuming that the workers use the lab for an equal length of time)
H1: different numbers of breakages by each worker
The CHI-Squared Test
The null hypothesis implies that, since the total number of
breakages is 61, the expected number of breakages per worker
is 61 / 4 = 15.25. Obviously it is not possible in practice to
have a non-integral number of breakages (this number is only a
mathematical concept). The question to be answered is
whether the difference between the observed and expected
frequencies is so large that the null hypothesis should be
rejected.

The calculation of chi-squared, χ², the quantity used to test for
a significant difference, is shown in the following table:
The CHI-Squared Test
Observed       Expected       O − E     (O − E)²/E
frequency (O)  frequency (E)
24             15.25           8.75       5.020
17             15.25           1.75       0.201
11             15.25          −4.25       1.184
 9             15.25          −6.25       2.561

χ² = Σ(O − E)²/E = 8.966
The CHI-Squared Test
If χ² (at a given probability level and degrees of freedom) exceeds the
critical value, the null hypothesis is rejected.

df    χ² (P = 0.05)
1     3.84
2     5.99
3     7.81
4     9.49
5     11.07
6     12.59
7     14.07
8     15.51
9     16.92
10    18.31

χ²calculated = 8.966
χ²critical (df = 3) = 7.81

χ²calculated > χ²critical

The null hypothesis is rejected at the 5%
significance level, i.e. there is evidence that
the workers do differ in their reliability.
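Assuming scipy, the breakage example can be verified in one call; with no expected frequencies supplied, chisquare() uses equal expected counts (61/4 = 15.25), exactly as in the table above.

```python
from scipy.stats import chisquare

observed = [24, 17, 11, 9]
res = chisquare(observed)           # expected defaults to equal counts
print(res.statistic, res.pvalue)    # chi2 = 8.97 (df = 3), p = 0.030 < 0.05
```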
Training Programme

“Statistics for Analytical Chemistry”

6. CONCLUSIONS FROM
SIGNIFICANCE TESTS
Significance Tests - Conclusions
What conclusions may be drawn from a
significance test?

As we have just explained, a significance test at, for example, the P =
0.05 level involves a 5% risk that the null hypothesis will be rejected
even though it is true. This type of error is known as a Type 1 error: the
risk of such an error can be reduced by altering the significance level of
the test to P = 0.01 or even P = 0.001. This, however, is not the only
possible type of error: it is also possible to retain a null hypothesis even
when it is false. This is called a Type 2 error. In order to calculate the
probability of a Type 2 error it is necessary to postulate an alternative to the
null hypothesis, known as the alternative hypothesis.
Significance Tests - Conclusions
Consider the situation where a certain chemical product is
meant to contain 3% of phosphorus by weight. It is
suspected that this proportion has increased. To test such
increase the composition is analyzed by a standard method
with known standard deviation of 0.03%. Suppose 4
measurements are made and a significance test is
performed at the level of P = 0.05. A one-tailed test is
required, as we are interested only in an increase.
Significance Tests - Conclusions

Ho , µ = 3.0 %
The solid line shows the sampling
distribution of the mean if the null
hypothesis is true. This sampling
distribution has mean 3.0 and s.e.m
= 0.03 / √4 % . If the sample mean lies
above the indicated critical value , Xc ,
the null Hypothesis is rejected.
Thus the shaded region, with area 0.05,
represent the probability of a Type 1
error.
Significance Tests - Conclusions

H1 , µ = 3.05 %
The broken line shows the sampling
distribution of the mean if the
alternative hypothesis is true. Even if
this is the case, The null hypothesis
will be retained if the Sample mean
lies below Xc. The probability of this
type 2 error is Represented by the
hatched area
Significance Tests - Conclusions
The diagram makes clear the
interdependence of the two types of
error. If, for example, the significance
level is changed to P = 0.01 in order
to reduce the risk of a Type 1 error, Xc
will be increased and the risk of a
Type 2 error is also increased.
Conversely, a decrease in the risk of a
Type 2 error can only be achieved at
the expense of an increase in the
probability of a Type 1 error.
Significance Tests - Conclusions
The only way in which both errors can
be reduced is by increasing the sample
size. The effect of increasing n to 9,
for example, is illustrated in the
figure: the resultant decrease
in the standard error of the mean
produces a decrease in both types of
error for a given value of Xc.
Significance Tests - Conclusions
The probability that a false null hypothesis is
rejected is known as the POWER of the test,
that is, the power of a test is (1 − the
probability of a Type 2 error).

In the studied example the POWER is a
function of the mean specified in the
alternative hypothesis, and depends on the
sample size, the significance level of the test,
and whether the test is one- or two-tailed.

IF TWO OR MORE TESTS ARE
AVAILABLE TO TEST THE SAME
HYPOTHESIS, IT MAY BE USEFUL TO
COMPARE THE POWERS OF THE TESTS
TO DECIDE WHICH IS MOST APPROPRIATE.
Training Programme

“Statistics for Analytical Chemistry”

PART V

OUTLIERS
Outliers – Dixon’s Test

Q = |suspect value − nearest value| / |largest value − smallest value|

Example: The nitrate level (mg l⁻¹) in a sample of river
water was measured four times, with the following results:
0.404, 0.400, 0.398, 0.379. Can the last value be rejected
as an outlier?
Outliers – Dixon’s Test
Solution: Using the equation above,
Q is given by (0.398 − 0.379)/(0.404 − 0.379) = 0.019/0.025
= 0.76. The critical value of Q (n = 4, P = 0.05, two-tailed) is
0.831. Since the experimental value does not exceed the
critical Q value, the null hypothesis is retained, i.e. the
measurement 0.379 cannot be rejected.
n             4      5      6      7      8      9      10
Q (P = 0.05)  0.831  0.717  0.621  0.570  0.524  0.492  0.464
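A small illustrative function (not a library routine) implementing the Q test with the P = 0.05 critical values tabulated above:

```python
# Illustrative sketch of Dixon's Q test; Q_CRIT holds the two-tailed
# P = 0.05 critical values from the table above.
Q_CRIT = {4: 0.831, 5: 0.717, 6: 0.621, 7: 0.570,
          8: 0.524, 9: 0.492, 10: 0.464}

def dixon_q(values):
    x = sorted(values)
    gap = max(x[1] - x[0], x[-1] - x[-2])   # gap next to the suspect extreme
    q = gap / (x[-1] - x[0])                # |suspect - nearest| / range
    return q, q > Q_CRIT[len(x)]            # True -> reject the suspect value

q, reject = dixon_q([0.404, 0.400, 0.398, 0.379])
print(round(q, 2), reject)                  # 0.76, False -> retain 0.379
```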
