Probability and Statistics
Notice
This document is published under the conditions of the Creative Commons
(http://en.wikipedia.org/wiki/Creative_Commons) Attribution License
(abbreviated "cc-by"), Version 2.5 (http://creativecommons.org/licenses/by/2.5/).
Table of Contents
I.
II.
III. Time
IV. Materials
V. Module Rationale
VI. Content
III. Time
The total time for this module is 120 study hours.
IV. Material
Students should have access to the core readings specified later. Also, they will need
a computer to gain full access to the core readings. Additionally, students should
be able to install the computer software wxMaxima and use it to practice algebraic
concepts.
V. Module Rationale
Probability and Statistics, besides being a key area in the secondary school teaching
syllabuses, forms an important background to advanced mathematics at tertiary level.
Statistics is a fundamental area of Mathematics that is applied across many academic
subjects and is useful for analysis in industrial production. The study of statistics
produces statisticians who analyse raw data collected from the field to provide useful
insights about a population. Statisticians provide governments and organizations
with a concrete picture of a situation, which helps managers in decision making.
Examples include the rate of spread of diseases, rumours and bush fires, rainfall
patterns, and population changes.
On the other hand, the study of probability helps government agencies and
organizations make decisions based on the theory of chance. Examples include predicting
the numbers of male and female children born within a given period, and projecting the
amount of rainfall that regions can expect to receive based on historical data on rainfall
patterns. Probability has also been used extensively to determine the proportions of high,
middle and low quality products in industrial production, e.g. the number of good and
defective parts expected in an industrial manufacturing process.
VI. Content
6.1 Overview
This module consists of three units:
[Figure: concept map of the module content, with the headings "Data", "Probability" and
"Probability distributions". Topics shown: mean, mode and median; frequency curves;
quartiles, deciles and percentiles; regression and correlation; indicator functions;
Bonferroni inequalities; random vectors; moments and moment generating functions;
Markov and Chebyshev inequalities; joint probability tables; joint, marginal and
conditional distributions; univariate and bivariate distributions; multinomial
distributions; functions of random variables; multivariate distributions; generating
functions, characteristic functions and random samples; convergence and limit theorems;
derived distributions (chi-square, t and F).]
A. 1/6
B. 1/3
C. 1/2
D. 1
2) A single card is drawn at random from a standard deck of cards. Find the probability that it is a queen.
A. 1/13
B. 1/52
C. 4/13
D. 1/2
3) Out of 100 numbers, 20 were 4s, 40 were 5s, 30 were 6s and the remainder
were 7s. Find the arithmetic mean of the numbers.
A. 0.22
B. 0.53
C. 2.20
D. 5.30
Height (cm): 60 - 62, 63 - 65, 66 - 68, 69 - 71, 72 - 74
A. 57.40
B. 62.00
C. 67.45
D. 72.25
4
5
6
8
1
2
12
5
8
6
9
11
7
8.88
H, T and HT
HH, HT, TH, TT
HH, HT, TT
H, T
10) If a letter is selected at random from the word "Mississippi", find the probability
that it is an "i".
A. 1/8
B. 1/2
C. 3/11
D. 4/11
Answer Key
1. B
2.
3.
4.
5.
6. A
7.
8.
9.
10.
Statistical Terms
1. Raw data: Data that have not been organised numerically.
2. Array: An arrangement of raw numerical data in ascending order of magnitude.
3. Range: The difference between the largest and the smallest numbers in a data set.
4. Class intervals: In a range of grouped data, e.g. 21-30, 31-40, etc., 21-30 is
called a class interval.
5. Class limits: In the class interval 21-30, 21 and 30 are called the class limits.
6. Lower class limit (l.c.l): In the class interval 21-30, the lower class limit is 21.
7. Upper class limit (u.c.l): In the class interval 21-30, the upper class limit is 30.
8. Lower and upper class boundaries: In the class interval 21-30, the lower
class boundary is 20.5 and the upper class boundary is 30.5. These boundaries
assume that, theoretically, measurements for the class interval 21-30 include all
the numbers from 20.5 to 30.5.
9. Class interval (width): In the class 21-30, the width of the class interval is the
difference between the upper class boundary and the lower class boundary, i.e.
30.5 - 20.5 = 10. It is also known as the class width or class size.
10. Class mark or mid-point: In the class interval 21-30, the class mark is the average
of 21 and 30, i.e. $\frac{21 + 30}{2} = 25.5$.
11. Frequency distributions: Large masses of raw data may be arranged in classes
in tabular form with their corresponding frequencies, e.g.

Mass (kg)    Number of pupils (f)
10-19        5
20-29        7
30-39        10
40-49        6
Mass (X)    Frequency (f)    Cumulative frequency (C.F)
20-24       4                4
25-29       10               4 + 10 = 14
30-34       16               14 + 16 = 30
35-39       8                30 + 8 = 38
40-44       2                38 + 2 = 40
Hence the cumulative frequency of a value is its frequency plus frequencies of all
smaller values.
The above table is called a Cumulative Frequency table.
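The running totals in the cumulative frequency table can be checked with a short Python sketch. The variable names are illustrative only; the class labels and frequencies are those of the mass table.

```python
from itertools import accumulate

# Class labels and frequencies from the mass table.
classes = ["20-24", "25-29", "30-34", "35-39", "40-44"]
frequencies = [4, 10, 16, 8, 2]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}  f={f:2d}  C.F={cf:2d}")
```

The last cumulative frequency always equals the total frequency, which is a quick way to check a hand-built table.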
13. Relative Frequency Distributions: In a frequency distribution

Mass (X)    Frequency (f)
20-24       4
25-29       10
30-34       16
35-39       8
40-44       2
            Σf = 40

the relative frequency of a class, e.g. 25-29, is the frequency of the class divided by the
total frequency of all classes, and is generally expressed as a percentage.
Example:
Relative frequency of the class 25-29 = $\frac{10}{40} \times 100\% = 25\%$
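As a minimal sketch, the relative frequency of every class in the same table can be computed in Python. The names used here are illustrative, not part of the module.

```python
# Frequencies of the classes 20-24, 25-29, 30-34, 35-39 and 40-44.
classes = ["20-24", "25-29", "30-34", "35-39", "40-44"]
frequencies = [4, 10, 16, 8, 2]

total = sum(frequencies)  # total frequency of all classes

# Relative frequency of each class, expressed as a percentage.
relative = [100 * f / total for f in frequencies]

for label, r in zip(classes, relative):
    print(f"{label}: {r:.1f}%")
```

The relative frequencies of all the classes add up to 100%.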
Mass (X)    Frequency (f)    Cumulative frequency (C.F)
20-24       4                4
25-29       10               4 + 10 = 14
30-34       16               14 + 16 = 30
35-39       8                30 + 8 = 38
40-44       2                38 + 2 = 40
From the above cumulative frequency table, we can draw a graph of cumulative
frequency versus the upper class boundaries.

Upper class boundary    Cumulative frequency
24.5                    4
29.5                    14
34.5                    30
39.5                    38
44.5                    40

[Figure: Ogive. Cumulative frequency (vertical axis, 0 to 45) plotted against the
upper class boundaries (horizontal axis, 20 to 45).]

Note: From the cumulative frequency data, the first plotting point is (24.5, 4). If
we started our graph at this point, the curve would be left hanging without reaching
the horizontal axis. We therefore create another point (19.5, 0) as a starting point,
where 19.5 is the projected upper class boundary of the preceding class.
[Figure: types of frequency curves: J-shaped, reverse J-shaped, U-shaped, bimodal and multimodal.]
Resource #2 Graph
Complete reference: A copy of Graph is on the disc accompanying this course.
Abstract: It is difficult to draw graphs of functions by hand, especially complicated
functions and, most of all, functions in 3 dimensions. The learners, being distance
learners, will inevitably encounter situations that need mathematical graphing. This
course is accompanied by a software package called Graph to help learners with graphing.
Learners, however, need to familiarise themselves with the Graph software to be able to use it.
Rationale: Graph is open-source dynamic graphing software that learners can
access on the given CD. It helps mathematics learners to graph what would otherwise
be a nightmare for them. It is simple to use once a learner invests time in learning
how to use it. Learners should take advantage of the Graph software because it can
assist them with graphing in other subjects during the course and afterwards. Learners
will find it extremely useful when teaching mathematics at secondary school level.
Useful Link #2
Title : Mathsguru
URL : http://en.wikipedia.org/wiki/Probability
Description: Mathsguru is a website that helps learners to understand various branches
of number theory module. It is easy to access through Google search and provides
very detailed information on various probability questions. It offers explanations and
examples that learners can understand easily.
Rationale: Mathsguru gives alternative ways of accessing other subject related topics,
hints and solutions that can be quite handy to learners who encounter frustrations of
getting relevant books that help solve learners problems in Probability. It gives a
helpful approach in computation of probabilities by looking at the various branches
of the probability module.
Useful Link #3
Title : Mathworld Wolfram
URL : http://mathworld.wolfram.com/Probability
Description: Mathworld Wolfram is a distinctive website full of Probability solutions. Learners should access this website quite easily through Google search for
easy reference. Wolfram also leads learners to other useful websites that cover the
same topic to enhance the understanding of the learners.
Rationale: Wolfram is a useful site that provides insights in number theory while
providing new challenges and methodology in number theory. The site comes handy
in mathematics modelling and is highly recommended for learners who wish to study
number theory and other branches of mathematics. It gives aid in linking other webs
thereby furnishing learners with a vast amount of information that they need to comprehend in Probability and Statistics.
40 Hours
76, 85, 62, 71, 85, 53, 63, 78, 68, 60,
82, 81, 67, 80, 75, 88, 68, 73, 75, 53,
95, 71, 85, 74, 73, 62, 75, 61, 71, 68,
69, 83, 95, 94, 87, 78, 82, 66, 60, 83,
60, 68, 77, 75, 75, 78, 89, 96, 72, 71,
76, 63, 62, 78, 61, 65, 67, 79, 75, 53,
62, 85, 93, 88, 97, 79, 73, 65, 93, 85,
76, 76, 90, 72, 57, 84, 73, 86
2. Weights of goats in kg
Weight
(kg)
No. of
goats
118-126
127-135
12
462
98
480
75
498
56
516
42
534
30
552
21
570
15
588
11
606
6
624
2
Time in minutes    No. of days
90-100             9
80-89              32
70-79              43
60-69              21
50-59              11
40-49              3
30-39              1
CASE 1:
A local firm dealing with agricultural extension services visits the farmer. She proudly
produces her records. The agricultural officer is very impressed by her good records
but clearly realises that the farmer needs some skills in data management to enable
her to make informed decisions based on her farm outputs.
The agricultural officer designs a short course on data processing for all the rural
farmers.
During the course planning stage, the following terms are defined and designed for
lesson one to the farmers.
a)
b)
c)
d)
e)
1. We want to choose a statistic that shows how different units seem similar.
Statistical textbooks call the solution to this objective, a measure of central
tendency.
2. We want to choose another statistic that shows how they differ. This kind of
statistic is often called a measure of statistical variability.
When we are summarizing a quantity like length or weight or age, it is common to
answer the first question with the arithmetic mean, the median, or the mode. Sometimes, we choose specific values from the cumulative distribution function called
quartiles.
The most common measures of variability for quantitative data are the variance; its
square root, the standard deviation; the statistical range; interquartile range; and the
absolute deviation.
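The measures named above can be tried out with Python's standard `statistics` module. This is only a sketch: the sample is the data set 1, 3, 4, 4, 5, 6, 3, 7 used in the farmers' lessons, and the variance and standard deviation are the population versions (dividing by N), which is an assumption made to match the formulas used in this module.

```python
import statistics

# Sample used in the farmers' lessons.
data = [1, 3, 4, 4, 5, 6, 3, 7]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)   # every value tied for most frequent

# Measures of variability (population formulas, dividing by N)
data_range = max(data) - min(data)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(mean, median, modes, data_range, variance, round(std_dev, 3))
```

`statistics.multimode` returns a list because a data set may have more than one mode.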
Farmers' lessons
The farmers are taught how to compute the
a) Mean or average of a data set, as follows:
Mean of a data set = sum total of the data divided by the number of items in the data set.
Example:
Calculate the mean of the following data:
1) 1, 3, 4, 4, 5, 6, 3, 7
Solution: Mean = $\frac{1 + 3 + 4 + 4 + 5 + 6 + 3 + 7}{8} = \frac{33}{8} = 4.125$
Mean = $\frac{650 + 675 + 700 + 725 + 800 + 900 + 1050 + 1125 + 1200 + 575}{10} = \frac{8400}{10} = 840$
Lesson Two
Mean Of Discrete Data
Example:
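1) Find the mean of the following data:

X    22    24    25    33    36    37    41
f    5     7     8     4     6     9     11

Solution:
Mean = $\frac{\sum fX}{\sum f} = \frac{1610}{50} = 32.2$

x    220    250    300    350    375
f    12     15     18     20     5

Solution:
Mean = $\frac{\sum fx}{\sum f} = \frac{20665}{70} = \$295.214$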
49, 45, 36, 47, 50, 45, 42, 46, 46, 41,
39, 45, 48, 46, 52, 35, 42, 37, 46, 44,
39, 46, 43, 45, 47, 47, 51, 46, 42, 43,
46, 40, 51, 33, 54, 47
Solution
Frequency / Tally table

Class    Tally                     Frequency (f)    Mid-point (x)    fx
33-37    ////                      4                35               140
38-42    ///// ///                 8                40               320
43-47    ///// ///// ///// ////    19               45               855
48-52    ///// //                  7                50               350
53-57    //                        2                55               110
Total                              40                                1775

The mid-point of the first class is $\frac{33 + 37}{2} = 35$, so its $fx$ value is $4 \times 35 = 140$.

Mean = $\frac{\sum fx}{\sum f} = \frac{1775}{40} = 44.375$
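The grouped-mean calculation above can be sketched in Python as follows. The names are illustrative, and the second class is assumed to be 38-42 so that its mid-point is 40, as in the worked solution.

```python
# Class limits and frequencies from the tally table
# (second class taken as 38-42, an assumption consistent with mid-point 40).
classes = [(33, 37), (38, 42), (43, 47), (48, 52), (53, 57)]
frequencies = [4, 8, 19, 7, 2]

# The mid-point (class mark) is the average of the two class limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Mean of grouped data = sum of f*x divided by sum of f.
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / sum(frequencies)

print(midpoints, sum_fx, mean)
```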
DO THIS
2).
x       1     2     3    4    5
f(x)    11    10    5    3    1

3).
Weight (x)    Frequency
4-8           2
9-13          4
14-18         7
19-23         14
24-28         8
29-33         5

61    5
64    18
67    42
70    27
73    8

6).
Weight (x)    Frequency
30.5-36.5     4
36.5-42.5     10
42.5-48.5     14
48.5-54.5     27
54.5-60.5     45

Answer Key
1). 66.4
2) 2.1
3). 20.6
4) 80
5) 76.45
6) 51.44
Lesson Three
Mode
Example
1) Find the mode of the following data: 1, 3, 4, 4, 5, 6, 1, 3, 3, 2, 2, 3, 3, 5
Solution:
The mode of a data set is the item that appears the most times. In this data, 3 occurs
most frequently, i.e. 5 times. Therefore the mode is 3.
2) Find the mode of the following data: 22, 24, 25, 22, 27, 22, 25, 30, 25, 31
Solution
22 and 25 occur three times each. Therefore the modes are 22 and 25. Such a data set
is called bimodal.
3) Find the mode of the data:

Observation (X)    0    1    2     3     4
Frequency (f)      3    7    10    16    11

Solution
The most frequently occurring observation is 3, i.e. 3 occurs 16 times.
4) Find the modal class of the following data

Weight (X)    Frequency (f)
50-54         3
55-59         6
60-64         8
65-69         5
70-74         15
75-79         9
80-84         13

Solution
The modal class is 70-74 because it has the highest frequency of occurrence.
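The mode examples in this lesson can be checked with a short Python sketch. It uses `statistics.multimode` from the standard library for the raw data; the dictionary `grouped` and the other names are illustrative only.

```python
import statistics

# Examples 1 and 2: raw data
data1 = [1, 3, 4, 4, 5, 6, 1, 3, 3, 2, 2, 3, 3, 5]
data2 = [22, 24, 25, 22, 27, 22, 25, 30, 25, 31]

modes1 = statistics.multimode(data1)   # a single mode
modes2 = statistics.multimode(data2)   # two modes, so the data set is bimodal

# Example 4: grouped data, class label -> frequency
grouped = {"50-54": 3, "55-59": 6, "60-64": 8, "65-69": 5,
           "70-74": 15, "75-79": 9, "80-84": 13}

# The modal class is the class with the highest frequency.
modal_class = max(grouped, key=grouped.get)

print(modes1, modes2, modal_class)
```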
DO THIS
3)
Weight (x)    Frequency
4-8           2
9-13          4
14-18         7
19-23         14
24-28         8
29-33         5

4)
Weight (x)    Frequency
30.5-36.5     4
36.5-42.5     10
42.5-48.5     14
48.5-54.5     27
54.5-60.5     45

Answer key
1) 5
2) 23.4
3) 19-23
4) 54.5-60.5
Lesson Four
Median
The median is the value in the middle of a distribution, e.g. in 1, 2, 3, 4, 5 the median is 3,
i.e. it comes exactly in the middle of the distribution. For the data 1, 2, 2, 3, 4, 5, 6, 7, 7, 8
there are 10 terms and no middle number. In such a case, the median is the average
of the two numbers bordering the centre line,
e.g. for 1, 2, 2, 3, 4 | 5, 6, 7, 7, 8 the median is $\frac{4 + 5}{2} = 4.5$.
Example
Find the median of the following grouped data

Mass (X)    Frequency (f)
20-24       4
25-29       10
30-34       16
35-39       8
40-44       2

Solution
There are 40 observations, so the median is the average of the 20th and 21st terms,
i.e. the $\frac{20 + 21}{2} = 20.5$th term.

Mass (X)    Frequency (f)    Cumulative frequency (C.F)
20-24       4                4
25-29       10               4 + 10 = 14
30-34       16               14 + 16 = 30
35-39       8                30 + 8 = 38
40-44       2                38 + 2 = 40

Procedure for Calculation of the Median
Step 1: The median occurs in the class interval 30-34, because the 20.5th term lies
between the cumulative frequencies 14 and 30.
Step 2: The lower and upper class boundaries of 30-34 (written L.C.L and U.C.L in this
procedure) are 29.5 and 34.5.
Step 3: Work out the Cumulative Frequency (C.F).
Step 4: Work out the class interval as U.C.L - L.C.L = 34.5 - 29.5 = 5.
Step 5: To get the 20.5th term:
20.5th term = L.C.L of class with median + $\frac{\text{summation difference}}{\text{class frequency}} \times$ class interval,
i.e. the summation difference is 20.5 - 14 = 6.5, where 14 is the C.F of the class interval
25-29.
Step 6: The median = $29.5 + \frac{6.5}{16} \times 5 = 31.53125$.
Note that the denominator 16 is the class frequency in the class interval 30-34.
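The procedure for the grouped median can be sketched in Python. The function name `grouped_median` is hypothetical, and the sketch follows this module's convention of locating the (N + 1)/2-th term (20.5 for N = 40); other texts interpolate at N/2 instead.

```python
def grouped_median(boundaries, frequencies):
    """Median of grouped data by linear interpolation.

    boundaries  -- list of (lower, upper) class boundaries
    frequencies -- class frequencies, in the same order
    """
    n = sum(frequencies)
    position = (n + 1) / 2          # 20.5 when n = 40
    cumulative = 0                  # C.F of the classes before the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= position:
            # summation difference / class frequency * class interval
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no observations")


boundaries = [(19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5), (39.5, 44.5)]
frequencies = [4, 10, 16, 8, 2]

median = grouped_median(boundaries, frequencies)
print(median)
```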
Range of a Data
The range of a data set is simply the difference between the highest and the lowest score
in the data.
Example: For 23, 26, 34, 47, 63 the range is 63 - 23 = 40, and for 121, 65, 78, 203, 298, 174
the range is 298 - 65 = 233.
The semi-interquartile range is $Q = \frac{Q_3 - Q_1}{2}$, where $Q_1$ and $Q_3$ are the first and third quartiles.
3) Deciles
If data arranged in order of magnitude are subdivided into 10 equal portions (10%
each), then each portion constitutes a decile. The deciles are denoted by D1, D2,
D3, ..., D9.
4) Percentiles
If data arranged in order of magnitude are subdivided into 100 equal portions
(1% each), then each portion constitutes a percentile. Percentiles are denoted by P1,
P2, P3, ..., P99.
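Mean deviation (MD) = $\frac{\sum_{j=1}^{N} |X_j - \bar{X}|}{N} = \frac{\sum |X - \bar{X}|}{N} = \overline{|X - \bar{X}|}$, where $\bar{X}$ is the
arithmetic mean of the numbers and $|X_j - \bar{X}|$ is the absolute value of the deviation
of $X_j$ from $\bar{X}$.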
Example
Find the mean deviation of the set 3, 4, 6, 8, 9.
Solution
Arithmetic mean = $\frac{3 + 4 + 6 + 8 + 9}{5} = \frac{30}{5} = 6$
Mean deviation = $\frac{|3 - 6| + |4 - 6| + |6 - 6| + |8 - 6| + |9 - 6|}{5} = \frac{3 + 2 + 0 + 2 + 3}{5} = \frac{10}{5} = 2$
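A minimal Python sketch of the mean deviation calculation, using the same five numbers. The variable names are illustrative.

```python
data = [3, 4, 6, 8, 9]

# Arithmetic mean of the numbers
mean = sum(data) / len(data)

# Absolute deviation of each number from the mean
abs_deviations = [abs(x - mean) for x in data]

# Mean deviation = average of the absolute deviations
mean_deviation = sum(abs_deviations) / len(data)

print(mean, abs_deviations, mean_deviation)
```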
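Values         X1    X2    X3    ...    Xm
Frequencies    f1    f2    f3    ...    fm

Mean deviation = $\frac{\sum_{j=1}^{m} f_j |X_j - \bar{X}|}{N} = \frac{\sum f |X - \bar{X}|}{N} = \overline{|X - \bar{X}|}$, where $N = \sum_{j=1}^{m} f_j$.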
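The standard deviation is
$s = \sqrt{\frac{\sum_{j=1}^{N} (X_j - \bar{X})^2}{N}} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\overline{(X - \bar{X})^2}}$, where $x = X - \bar{X}$.
It follows that the standard deviation is the root mean square of the deviations
from the mean.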
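Values         X1    X2    X3    ...    Xm
Frequencies    f1    f2    f3    ...    fm

$s = \sqrt{\frac{\sum_{j=1}^{m} f_j (X_j - \bar{X})^2}{N}} = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\overline{(X - \bar{X})^2}}$,
where $N = \sum_{j=1}^{m} f_j = \sum f$.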
The Variance
The variance of a set of data is defined as the square of the standard deviation, i.e.
variance = $s^2$. We sometimes use $s$ to denote the standard deviation of a sample of a
population and $\sigma$ (Greek letter sigma) to denote the standard deviation of a
population. Thus $\sigma^2$ can represent the variance of a population and $s^2$ the
variance of a sample of a population.
Examples
Find the mean and range of the following data: 5, 5, 4, 4, 4, 4, 2, 2, 2
Solutions
Mean = $\bar{x} = \frac{5 + 5 + 4 + 4 + 4 + 4 + 2 + 2 + 2}{9} = \frac{32}{9} = 3.56$
Range = 5 - 2 = 3.
n + 1 14
7
=
= 607
2
2
14
= 7th position. The median is 5
2
n+1
2
10) Example
1, 1, 2, 2, 3, 4, 4, 5, 6, 8, 10, 14, 15, 17
Median = $\frac{4 + 5}{2} = 4.5$
Group Work
Definition
$s^2 = \frac{\sum (x - \bar{x})^2}{N}$
$s^2$ is the variance and $\sqrt{s^2} = s$ is the standard deviation.
Example
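Given the data 2, 4, 5, 8, 11, find the variance and the standard deviation.

$x$     $x - \bar{x}$    $(x - \bar{x})^2$
2       -4               16
4       -2               4
5       -1               1
8       2                4
11      5                25

$\sum x = 30$ and $\sum (x - \bar{x})^2 = 50$
So $\bar{x} = \frac{30}{5} = 6$
Variance = $s^2 = \frac{50}{5} = 10$
Standard deviation = $s = \sqrt{\frac{50}{5}} = \sqrt{10} \approx 3.16$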
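The table of deviations in this example can be reproduced with a few lines of Python. This is a sketch using the population formula (dividing by N) given in the definition; the names are illustrative.

```python
import math

data = [2, 4, 5, 8, 11]
n = len(data)

mean = sum(data) / n                       # x-bar
deviations = [x - mean for x in data]      # x - x-bar
squared = [d ** 2 for d in deviations]     # (x - x-bar)^2

variance = sum(squared) / n                # s^2
std_dev = math.sqrt(variance)              # s

print(mean, deviations, squared, variance, round(std_dev, 3))
```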
1,1,1,2,2,3,3,3,4,5
Skewness
Definition: Skewness is the degree of departure from symmetry of a distribution.
( Check positive and negative skewness above)
For skewed distributions, the mean tends to lie on the same side of the mode as the
longer tail.
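Skewness = $\frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{X} - \text{mode}}{s}$
Skewness = $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\bar{X} - \text{median})}{s}$
Quartile coefficient of skewness = $\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
10-90 percentile coefficient of skewness = $\frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{P_{90} - P_{10}} = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$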
(n + 1)x0.25
9(.25) = 22.5( percentile)
2nd = 2
3rd = 3
4th = 4
5th = 5
Group Work
DO THIS
Find the 25th percentile, the 50th percentile, and 90th percentile
46,21,89,42,35,36,67,53,42,75,42,75,47,85,40,73,48,32,41,20,75,48,48,32,52,61,
49,50,69,59,30,40,31,25,43,52,62,50
Answer Key
a) 36
b) 48
c) 73
Kurtosis
Definition: Kurtosis is the degree of peakedness of a distribution, as compared to the
normal distribution.
Examples
1) Leptokurtic Distribution
2) Platykurtic Distribution
DO THIS
Rate: 9.3, 9.5, 9.7, 10.4, 10.6, 10.6, 10.6, 10.9, 10.8, 10.5, 10.0
3) Number of deaths per 1000 for the years 1960 and 1965 to 1975

Year    Deaths per 1000
1960    9.5
1965    9.4
1966    9.5
1967    9.4
1968    9.7
1969    9.5
1970    9.5
1971    9.3
1972    9.4
1973    9.3
1974    9.1
1975    8.8
Solutions
1. 3
2. 10.6
3. 9.5
READ:
1) An Introduction to Probability by Charles M. Grinstead, pages 247-263
Exercise on pp. 263-267, Nos. 4, 7, 8, 9
Probability
1) Sample Space and Events
Terminology
a) A probability experiment
When you toss a coin, pick a card from a deck of playing cards or roll a die, the
act constitutes a probability experiment. In a probability experiment, the possible
outcomes are well defined. For example, there are only two possible outcomes in
tossing a coin: you get either a head or a tail. For a fair coin, the head and
the tail have equal chances of occurrence.
b) An outcome
This is defined as the result of a single trial of a probability experiment, e.g. when
you toss a coin once, you get either a head or a tail.
c) A trial
This refers to a single performance of an experiment, like picking a card from a
deck of cards or rolling a die or dice.
d) Sample space
This refers to the set of all possible outcomes of a probability experiment, e.g. in tossing
a coin, the outcomes are either head (H) or tail (T), i.e. there are only two possible
outcomes in tossing a coin. The chances of obtaining a head or a tail are equal.
e) Simple and compound events
In a probability experiment, an event with only one outcome is called a simple
event. If an event has two or more outcomes, it is called a compound event.
2) Definition of Probability
Probability can be defined as the mathematics of chance. There are mainly four
approaches to probability;
1)
2)
3)
4)
p = Pr(N) = $\frac{N}{M}$. Probability refers to the ratio of the number of favourable
outcomes, N, to the number of all possible outcomes, M.
The probability of non-occurrence of the same event is given by 1 - p(occurrence).
The probability of occurrence plus the probability of non-occurrence is equal to one.
If the probability of occurrence is p(O) and the probability of non-occurrence is p(O'), then
p(O) + p(O') = 1.
Empirical Probability ( Relative Frequency Probability)
Empirical probability arises when frequency distributions are used.
For example:

Observation (X)    0    1    2     3     4
Frequency (f)      3    7    10    16    11

$P(2) = \frac{\text{frequency of } 2}{\text{sum of frequencies}} = \frac{f(2)}{\sum f} = \frac{10}{3 + 7 + 10 + 16 + 11} = \frac{10}{47}$
3) Properties of Probability
a) The probability of any event lies between 0 and 1, i.e. $0 \le p(O) \le 1$. It follows that
a probability can be neither negative nor greater than 1.
b) Probability of an impossible event ( an event that cannot occur ) is always
zero(0)
c) Probability of an event that will certainly occur is 1.
d) The total sum of probabilities of all the possible outcomes in a sample space
is always equal to one(1).
e) If the probability of occurrence is p(o)= A, then the probability of non-occurrence is 1-A.
Counting Rules
1) Factorials
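Definition: Factorial 4! = 4 x 3 x 2 x 1 and 7! = 7 x 6 x 5 x 4 x 3 x 2 x 1
2) Permutation Rules
Definition: $_nP_r = \frac{n!}{(n - r)!}$
Examples
$_5P_3 = \frac{5!}{(5 - 3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = 5 \times 4 \times 3 = 60$
$_8P_5 = \frac{8!}{(8 - 5)!} = \frac{8!}{3!} = \frac{8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 8 \times 7 \times 6 \times 5 \times 4 = 6720$
3) Combinations
Definition: $_nC_r = \frac{n!}{(n - r)!\, r!}$
Examples
$_5C_2 = \frac{5!}{(5 - 2)!\, 2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{3!\, 2!} = \frac{5 \times 4}{2 \times 1} = 10$
$_{10}C_6 = \frac{10!}{(10 - 6)!\, 6!} = \frac{10!}{4!\, 6!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{4 \times 3 \times 2 \times 1 \times 6!} = \frac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1} = 210$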
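The counting rules can be checked in Python, which provides `math.factorial`, `math.perm` and `math.comb` in its standard library (the last two from Python 3.8). The helper names `permutations_count` and `combinations_count` are illustrative; they simply spell out the factorial definitions given above.

```python
import math


def permutations_count(n, r):
    """nPr = n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


def combinations_count(n, r):
    """nCr = n! / ((n - r)! r!)"""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))


# The worked examples, computed from the definitions ...
print(permutations_count(5, 3), permutations_count(8, 5))    # 5P3, 8P5
print(combinations_count(5, 2), combinations_count(10, 6))   # 5C2, 10C6

# ... and with the built-in functions, which give the same values.
print(math.perm(5, 3), math.perm(8, 5), math.comb(5, 2), math.comb(10, 6))
```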
DO THIS
Evaluate:
1) 8P3
2) 8C3
3) 15C10
4) 6C3
5) 15P4
6) 9C3
7) 10C8
8) 7P4
Answer key
1) 336
2) 56
3) 3003
4) 20
5) 32 760
6) 84
7) 45
8) 840
Rules of Probability
Addition Rules
1) Rule 1: If A and B are two mutually exclusive events, then
P(A or B) = P(A) + P(B).
2) Rule 2: If A and B are two events that are NOT mutually exclusive, then
P(A or B) = P(A) + P(B) - P(A and B), where P(A and B) is the probability of the
outcomes that events A and B have in common.
Example: When a card is drawn from a pack of 52 cards, find the probability that
the card is a 10 or a heart.
Solution
P(10) = 4/52 and P(heart) = 13/52
P(10 that is a heart) = 1/52
P(A or B) = P(A) + P(B) - P(A and B) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13.
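The addition rule for events that are not mutually exclusive can be verified by listing all 52 cards and counting. This is a sketch only; the helper `probability` and the rank and suit labels are illustrative names.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))          # 52 equally likely (rank, suit) cards


def probability(event):
    """Number of cards satisfying the event divided by the size of the deck."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


p_ten = probability(lambda c: c[0] == "10")
p_heart = probability(lambda c: c[1] == "hearts")
p_both = probability(lambda c: c[0] == "10" and c[1] == "hearts")
p_either = probability(lambda c: c[0] == "10" or c[1] == "hearts")

# Direct count and addition rule agree.
print(p_either, p_ten + p_heart - p_both)
```

`Fraction` keeps the results exact and reduces 16/52 to its lowest terms automatically.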
Multiplication Rules
1) Rule 1: For two independent events A and B, P(A and B) = P(A) x P(B).
Example: Determine the probability of obtaining a 5 on a die and a tail on a coin
in one throw.
Solution: P(5) = 1/6 and P(T) = 1/2.
P(5 and T) = P(5) x P(T) = 1/6 x 1/2 = 1/12.
2) Rule 2: When two events are dependent, the probability of both events occurring
is P(A and B) = P(A) x P(B|A), where P(B|A) is the probability that event B occurs
given that event A has already occurred.
Example: Find the probability of obtaining two aces from a pack of 52 cards without
replacement.
Solution: P(Ace) = 4/52 and P(second Ace if NO replacement) = 3/51.
Therefore P(Ace and Ace) = P(Ace) x P(second Ace) = 4/52 x 3/51 = 1/221.
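Both multiplication rules can be checked with exact fractions in Python. The second calculation also counts pairs of cards directly with `math.comb` as an independent check; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Rule 1 (independent events): a 5 on a die and a tail on a coin
p_five_and_tail = Fraction(1, 6) * Fraction(1, 2)

# Rule 2 (dependent events): two aces drawn without replacement
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)   # one ace and one card fewer
p_two_aces = p_first_ace * p_second_ace_given_first

# Independent check: pairs of aces out of all pairs of cards
p_two_aces_by_counting = Fraction(comb(4, 2), comb(52, 2))

print(p_five_and_tail, p_two_aces, p_two_aces_by_counting)
```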
Conditional Probability
The conditional probability of event A given event B is
$P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$,
where P(A and B) means the probability of the outcomes that events A and B have
in common.
Example: When a die is rolled once, find the probability of getting a 4 given that
an even number occurred.
Solution: P(4 and an even number) = 1/6, i.e. P(A and B) = 1/6. P(even number) = 3/6
= 1/2.
$P(A|B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$
Examples
1) A bag contains 3 orange, 3 yellow and 2 white marbles. Three marbles are selected
without replacement. Find the probability of selecting two yellow marbles and then a
white marble, in that order.
Solution: P(1st Y) = 3/8, P(2nd Y) = 2/7 and P(W) = 2/6.
P(Y and Y and W) = P(Y) x P(Y) x P(W) = 3/8 x 2/7 x 2/6 = 1/28.
2) In a class, there are 8 girls and 6 boys. If three students are selected at random
for debating, find the probability that all are girls.
Solution: P(1st G) = 8/14, P(2nd G) = 7/13 and P(3rd G) = 6/12.
P(three girls) = 8/14 x 7/13 x 6/12 = 2/13.
3) In how many ways can 3 drama officials be selected from 8 members?
Solution: 8C3 = 56 ways.
4) A box has 12 bulbs, of which 3 are defective. If 4 bulbs are sold, find the probability
that exactly one will be defective.
Solution
The number of ways of choosing one defective bulb is 3C1 and the number of ways of
choosing three non-defective bulbs is 9C3.
$_3C_1 \times {}_9C_3 = \frac{3!}{(3 - 1)!\,1!} \times \frac{9!}{(9 - 3)!\,3!} = 3 \times 84 = 252$
The total number of ways of choosing 4 bulbs from 12 is 12C4 = 495, so
P(exactly one defective) = 252/495 = 28/55 = 0.509 (3 d.p.).
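The bulb example can be sketched in Python with `math.comb`. The sketch assumes that all selections of 4 bulbs from the 12 are equally likely, so the favourable count is divided by the total number of selections; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

defective, good, sold = 3, 9, 4

# Ways to pick exactly one defective bulb and three good bulbs
favourable = comb(defective, 1) * comb(good, sold - 1)

# All ways to pick 4 bulbs from the 12 in the box
total = comb(defective + good, sold)

p_exactly_one_defective = Fraction(favourable, total)

print(favourable, total, p_exactly_one_defective, float(p_exactly_one_defective))
```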
Answer Key
1) 5040
2) 220
3) 0.013
READ:
An Introduction to Probability & Random Processes, by Kenneth B & Gian-Carlo R,
pages
1. 1.20-1.22
Exercise Chapter 1: Sets, Events & Probability, pp. 1.23-1.28, Nos. 1-12 & 14-20
2. 2.1-2.33
Exercise Chapter 2: Finite Processes, p. 2.33, Nos. 1, 2, 3, 13-20, 22-27
3. Introduction to Probability, by Charles M. Grinstead, pages 139-141
Random Variables
Random Variables ( r.v)
Definition: A random variable is a function that assigns a real number to every possible result of a random experiment.
(Harry Frank & Steve C Althoen,CUP, 1994, pg 155)
A random variable is a variable in the sense that it can be used as a placeholder for
a number in equations and inequalities. Its randomness is completely described by
its cumulative distribution function which can be used to determine the probability
it takes on particular values.
Formally, a random variable is a measurable function from a probability space to the
real numbers. For example, a random variable can be used to describe the process
of rolling a fair die and the possible outcomes { 1, 2, 3, 4, 5, 6 }. The most obvious
representation is to take this set as the sample space, the probability measure to be
uniform measure, and the function to be the identity function.
Random variable
Some consider the expression random variable a misnomer, as a random variable is
not a variable but rather a function that maps outcomes (of an experiment) to numbers.
Let A be a $\sigma$-algebra and $\Omega$ the space of outcomes relevant to the experiment being
performed. In the die-rolling example, the space of outcomes is the set $\Omega$ = { 1, 2,
3, 4, 5, 6 }, and A would be the power set of $\Omega$. In this case, an appropriate random
variable might be the identity function $X(\omega) = \omega$, such that if the outcome is a 1,
then the random variable is also equal to 1. An equally simple but less trivial example
is one in which we might toss a coin: a suitable space of possible outcomes is $\Omega$ = {
H, T } (for heads and tails), and A equal again to the power set of $\Omega$. One among the
many possible random variables defined on this space is
is said to conver-
is said to
, and variances
converges in distribution to a standard normal random variable.
be independent
Then the
Example
Let X be a real-valued, continuous random variable and let Y = X2. Then,
If y 0, then
So
Probability Distributions
Certain random variables occur very often in probability theory due to many natural
and physical processes. Their distributions therefore have gained special importance in
probability theory. Some fundamental discrete distributions are the discrete uniform,
Bernoulli, binomial, negative binomial, Poisson and geometric distributions. Important continuous distributions include the continuous uniform, normal, exponential,
gamma and beta distributions.
Distribution Functions
If a random variable $X: \Omega \to \mathbb{R}$ defined on the probability space $(\Omega, A, P)$ is given,
we can ask questions like "How likely is it that the value of X is bigger than 2?".
This is the same as the probability of the event $\{\omega \in \Omega : X(\omega) > 2\}$, which
is often written as P(X > 2) for short.
Modern definition: The modern definition starts with a set called the sample
space, which relates to the set of all possible outcomes in the classical sense, denoted by
$\Omega$. It is then assumed that for each element $x \in \Omega$, an intrinsic
"probability" value $f(x)$ is attached, which satisfies the following properties:
1. $f(x) \in [0, 1]$ for all $x \in \Omega$
2. $\sum_{x \in \Omega} f(x) = 1$
So, the probability of the entire sample space is 1, and the probability of the null
event is 0.
The function $f(x)$ mapping a point in the sample space to the "probability" value
is called a probability mass function, abbreviated as pmf. The modern definition
does not try to answer how probability mass functions are obtained; instead it builds
a theory that assumes their existence.
2.
3.
If
is defined as
Whereas the pdf exists only for continuous random variables, the cdf exists for all
random variables (including discrete random variables) that take values on $\mathbb{R}$.
These concepts can be generalized for multidimensional cases on $\mathbb{R}^n$.
Continuous Distribution
Suppose X is a continuous random variable. A continuous random variable X is specified
by its probability density function, which is written f(x), where $f(x) \ge 0$ throughout
the range of values for which x is valid. This probability density function can be
represented by a curve, and the probabilities are given by the area under the curve.
The total area under the curve is equal to 1. The area under the curve between the
lines x = a and x = b (shaded) gives the probability that X lies between a and b, which
can be denoted by P(a < X < b). f(x) is called a probability density function and the
variable X is often called a continuous random variable.
Since the total area under the curve is equal to 1, it follows that the probability between
a range space a and b is given by
$P(a \le X \le b) = \int_a^b f(x)\,dx$,
( and ) and (< and >) inequalities. We assume the lines at a and b have no
Evaluate
a). The value of the constant k
b). The probability of range space P(1 < X < 2)
c). The probability $P(X \ge 3)$
Solution
[Figure: graph of f(x) against x, with a and b marked on the x-axis.]
Any function f(x) such that
$f(x) \ge 0$ for $a \le x \le b$
and
$\int_a^b f(x)\,dx = 1$
may be taken as the probability density function (p.d.f) of a continuous random variable
in the range space $a \le x \le b$.
Procedure
Step 1: In general, if X is a continuous random variable (r.v) with p.d.f f(x) valid
over the range $a \le x \le b$, then
$\int_{\text{all } x} f(x)\,dx = 1$, i.e. $\int_a^b f(x)\,dx = 1$
Step 2
a). To determine k, we use the fact that $f(x) = kx(16 - x^2)$ for 0 < x < 4, so
$\int_0^4 kx(16 - x^2)\,dx = 1$
$k \int_0^4 (16x - x^3)\,dx = 1$
$k \left[ 8x^2 - \frac{x^4}{4} \right]_0^4 = 64k = 1$
$k = \frac{1}{64}$
Step 3
b). Find P(1 < X < 2)
Solution
$P(1 < X < 2) = \int_1^2 f(x)\,dx = \frac{1}{64} \int_1^2 (16x - x^3)\,dx = \frac{81}{256}$
Step 4
c). To find $P(X \ge 3)$
$P(X \ge 3) = \frac{1}{64} \int_3^4 (16x - x^3)\,dx = \frac{49}{256}$
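The three integrals in this procedure can be checked exactly in Python. The helper `integrate_poly` is a hypothetical name introduced here; it integrates a polynomial term by term using exact fractions, which is enough for the density $kx(16 - x^2) = k(16x - x^3)$.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of sum(coeffs[i] * x**i)."""
    def antiderivative(x):
        # Each term c*x^i integrates to c*x^(i+1)/(i+1).
        return sum(Fraction(c) * Fraction(x) ** (i + 1) / (i + 1)
                   for i, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)


# 16x - x^3, written as coefficients of x^0, x^1, x^2, x^3
shape = [0, 16, 0, -1]

# a) The total area must be 1, so k is 1 over the integral from 0 to 4.
k = 1 / integrate_poly(shape, 0, 4)

# b) and c) Probabilities are areas under k * (16x - x^3).
p_1_to_2 = k * integrate_poly(shape, 1, 2)
p_at_least_3 = k * integrate_poly(shape, 3, 4)

print(k, p_1_to_2, p_at_least_3)
```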
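Example 2
2). X is the continuous random variable "the mass of a substance, in kg, per
minute in an industrial production process", where
$f(x) = \begin{cases} \frac{1}{12} x(6 - x) & 0 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$
[Figure: sketch of $f(x) = \frac{1}{12} x(6 - x)$ against x.]
$P(X > 2) = \int_2^3 \frac{1}{12} x(6 - x)\,dx = \frac{1}{12} \int_2^3 (6x - x^2)\,dx = \frac{1}{12} \left[ 3x^2 - \frac{x^3}{3} \right]_2^3 = 0.722$ (3 d.p.)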
Worked example
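3). A continuous random variable has p.d.f f(x) where
$f(x) = kx^2$, $0 \le x \le 6$.
a). Find the value of the constant k
b). Find $P(2 \le X \le 4)$
Solution
a). $\int_{\text{all } x} f(x)\,dx = 1$
$\int_0^6 kx^2\,dx = 1$
$\left[ \frac{kx^3}{3} \right]_0^6 = 1$
$\frac{216k}{3} = 1$
$k = \frac{3}{216}$
Therefore $f(x) = \frac{3}{216} x^2 = \frac{1}{72} x^2$, $0 \le x \le 6$
b). [Figure: sketch of $f(x) = \frac{1}{72} x^2$ against x.]
$P(2 \le X \le 4) = \int_2^4 \frac{1}{72} x^2\,dx = \left[ \frac{1}{216} x^3 \right]_2^4 = 0.259$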
f (x) = k(2x 3)
0
0x<2
2x5
otherwise
a). Find the value of the constant k
b). Sketch y=f(x)
c). Find P(X 1)
d). Find P(X>2.5)
Solution
a). Since X is a r.v, then
f (x)dx = 1
all x
Therefore
kx
2
0
+ k x 2 3x 2
2k + 19k = 1
1
k=
21
1
21
1
(2x 3)
f (x) = 21
0x<2
2x5
otherwise
Sketch
1
3
1
21
2.5
1 1
=
= 0.048
21 21
=(
1
1
1
2
11
x 2 ) + ( {0.5}{ + } =
= 0.131
21
2
21 21 84
Reflection : Teachers may find graph drawing software useful in
the teaching of statistics.
If you have computer access, download graph and explore its statistical features.
DO THIS
1). The continuous random variable X has p.d.f f(x) where f(x) = k, $0 \le x \le 3$.
a) Sketch y = f(x)
2). The continuous random variable has p.d.f f(x) where $f(x) = kx^2$, $1 \le x \le 4$.
f (x) k(2x 1)
0
0x<2
2x3
otherwise
b) Sketch y=f(x)
c) Find P(X 2 )
Expectation
Definition
If X is a continuous random variable (r.v) with probability density function (p.d.f) f(x), then
the expectation of X is E(X), where
$E(X) = \int_{\text{all } x} x f(x)\,dx$
1). If X is a continuous random variable (r.v) with p.d.f $f(x) = \frac{x^2}{16}$, $0 \le x \le 3$,
find E(X).
Solution
$E(X) = \int_{\text{all } x} x f(x)\,dx = \frac{1}{16} \int_0^3 x \cdot x^2\,dx = \frac{1}{16} \left[ \frac{x^4}{4} \right]_0^3 = \frac{81}{64} = 1.266$
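If $f(x) = \frac{2}{5}(3 + x)(x - 1)$, $1 \le x \le 3$, find E(X).
$E(X) = \int_{\text{all } x} x f(x)\,dx = \frac{2}{5} \int_1^3 x(3 + x)(x - 1)\,dx = \frac{2}{5} \left[ \frac{x^4}{4} + \frac{2x^3}{3} - \frac{3x^2}{2} \right]_1^3 = \frac{608}{60} = 10.13$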
Generalisation
If g(x) is any function of the continuous random variable X having p.d.f f(x),
then
$E[g(X)] = \int_{\text{all } x} g(x) f(x)\,dx$
and in particular
$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$
4. $E[f_1(X) + f_2(X)] = E[f_1(X)] + E[f_2(X)]$
Example
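1). The continuous random variable X has p.d.f f(x) where $f(x) = \frac{1}{2} x$, $0 \le x \le 3$.
Find
a). E(X)
b). E(X^2)
c). E(2X + 3)
Solution
a) $E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^3 \frac{1}{2} x^2\,dx = \frac{1}{2} \left[ \frac{x^3}{3} \right]_0^3 = 4.5$
b) $E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \frac{1}{2} \int_0^3 x^3\,dx = \frac{1}{2} \left[ \frac{x^4}{4} \right]_0^3 = \frac{81}{8} = 10.125$
c). E(2X + 3) = E(2X) + 3
= 2E(X) + 3
= 2(4.5) + 3 = 12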
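The integrals in parts a) and b) can be approximated numerically as a check. The sketch below uses a simple midpoint rule; the helper name `integrate` and the number of steps are illustrative choices, and part c) applies the rule E(2X + 3) = 2E(X) + 3 in the same way as the worked solution.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x):
    """The function f(x) = x / 2 given in the example, for 0 <= x <= 3."""
    return x / 2


# a) E(X) = integral of x f(x)
e_x = integrate(lambda x: x * f(x), 0, 3)

# b) E(X^2) = integral of x^2 f(x)
e_x2 = integrate(lambda x: x ** 2 * f(x), 0, 3)

# c) E(2X + 3) = 2 E(X) + 3, as in the worked solution
e_2x_plus_3 = 2 * e_x + 3

print(round(e_x, 3), round(e_x2, 3), round(e_2x_plus_3, 3))
```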
DO THIS
$f(x) = \begin{cases} kx & 0 \le x < 1 \\ k & 1 \le x < 3 \\ k(4 - x) & 3 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$
a). Find k
b). Calculate E(X)
2) The continuous random variable has p.d.f f(x) where $f(x) = \frac{1}{10}(x + 3)$, $0 \le x \le 5$.
a). Find E(X)
b). Find E(2X + 4)
c). Find E(X^2)
d). Find E(X^2 + 2X - 1)
Bernoulli Distribution
In probability theory and statistics, the Bernoulli distribution, named after the Swiss
scientist Jakob Bernoulli, is a discrete probability distribution which takes value 1
with success probability p and value 0 with failure probability q = 1 - p. So if X is a
random variable with this distribution, we have:
$P(X = 1) = 1 - P(X = 0) = 1 - q = p$
The probability mass function f of this distribution is
$f(k; p) = \begin{cases} p & \text{if } k = 1 \\ 1 - p & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases}$
The expected value of a Bernoulli random variable X is $E(X) = p$, and its
variance is $\operatorname{var}(X) = p(1 - p)$.
The kurtosis goes to infinity for high and low values of p, but for p = 1 / 2 the Bernoulli
distribution has a lower kurtosis than any other probability distribution, namely -2.
The Bernoulli distribution is a member of the exponential family.
Binomial Distribution
In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no
experiments, each of which yields success with probability p. Such a success/failure
experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n =
1, the binomial distribution is a Bernoulli distribution. The binomial distribution is
the basis for the popular binomial test of statistical significance.
Examples
An elementary example is this: roll a die ten times and count the number of 1s as
outcome. Then this random number follows a binomial distribution with n = 10 and
p = 1/6.
For example, assume 5% of the population is green-eyed. You pick 500 people
randomly. The number of green-eyed people you pick is a random variable X which
follows a binomial distribution with n = 500 and p = 0.05 (when picking the people
with replacement).
Examples
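1) A coin is tossed 3 times. Find the probability of getting 2 heads and a tail in any
given order.
Formula
We can use the formula
\[ P(X = x) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x} \]
whose three factors are:
1st) ${}^{n}C_{x}$, the number of ways of choosing which x of the n trials are successes
2nd) $p^{x}$, the probability of x successes
3rd) $(1 - p)^{n - x}$, the probability of n - x failures
Solution
Tossing 3 times means n = 3
Two heads means x = 2
P(H) = 1/2; P(T) = 1/2
\[ P(\text{2 heads}) = {}^{3}C_{2} \left(\tfrac{1}{2}\right)^{2} \left(1 - \tfrac{1}{2}\right)^{3 - 2} = 3 \cdot \tfrac{1}{4} \cdot \tfrac{1}{2} = \tfrac{3}{8} \]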
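DO THIS
1). P(one 5) $= {}^{3}C_{1} \left(\tfrac{1}{6}\right)^{1} \left(\tfrac{5}{6}\right)^{2} = 25/72 \approx 0.347$, i.e. n = 3, x = 1, p = 1/6
2). $ {}^{8}C_{3} \left(\tfrac{1}{2}\right)^{3} \left(\tfrac{1}{2}\right)^{5} = 7/32 \approx 0.219$, i.e. n = 8, x = 3, p = 1/2
3). $ {}^{4}C_{3} \left(\tfrac{2}{3}\right)^{3} \left(\tfrac{1}{3}\right)^{1} = 32/81 \approx 0.395$, i.e. n = 4, x = 3, p = 2/3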
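The binomial probabilities above can be reproduced with a few lines of code. This is a minimal sketch using exact fractions; the function name `binomial_pmf` is an illustrative choice, not something defined in the module.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, x, p):
    """P(X = x) for X ~ Binomial(n, p), kept exact by using Fraction."""
    p = Fraction(p)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


two_heads = binomial_pmf(3, 2, Fraction(1, 2))     # coin tossed 3 times
one_five = binomial_pmf(3, 1, Fraction(1, 6))      # n = 3, x = 1, p = 1/6
three_of_eight = binomial_pmf(8, 3, Fraction(1, 2))
three_of_four = binomial_pmf(4, 3, Fraction(2, 3))

for label, value in [("two heads", two_heads), ("one 5", one_five),
                     ("n=8, x=3", three_of_eight), ("n=4, x=3", three_of_four)]:
    print(label, value, round(float(value), 3))
```

Working with `Fraction` avoids rounding until the final display, which makes it easier to compare against hand calculations.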
READ:
1. Lectures on Statistics, By Robert B. Ash, pages 1-4
Exercise Nos.1, 2 and 3 on pg 4.
2. An Introduction to Probability & Random Processes By
Kenneth B & Gian-Carlo R, pages 3.1-3.63
Exercise Chapter 3: Random Variables pg 3.64-3.82
Nos. 1-7, 11-17, 20-24, 34-36
3. An Introduction to Probability By Charles M. Grinstead
pages 96-107, & 184
Exercise on pages 113-118
Nos. 1,2,3,4,5,8,9,10,19,20
Ref: http://en.wikipedia.org/wiki/measurable_space
Ref: http://en.wikipedia.org/wiki/Probability_theory
Ref: http://en.wikipedia.org/wiki/Bernoulli_distribution
Poisson Distribution
In probability theory and statistics, the Poisson distribution is a discrete probability
distribution that expresses the probability of a number of events occurring in a fixed
period of time if these events occur with a known average rate, and are independent
of the time since the last event.
The distribution was discovered by Siméon-Denis Poisson (1781-1840).
The Poisson distribution is sometimes called a Poissonian, analogous to the term
Gaussian for a Gauss or normal distribution.
The Poisson distribution is used when the variable occurs over a period of time,
volume, area, etc. It can be used for the arrival of airplanes at airports, the number
of phone calls per hour for a station, or the number of white blood cells on a certain
area.
The probability of x successes is
\[ P(X = x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \]
where $\lambda$ is the mean number of occurrences per unit (of time, volume, area, etc.).
Group Work
Example
If there are 100 typographical errors randomly distributed in a 500-page manuscript,
find the probability that any given page has exactly 4 errors.
Solution
Find the mean number of errors: $\lambda$ = 100/500 = 1/5 = 0.2
In other words, there is an average of 0.2 errors per page. In this case x = 4, so the
probability of selecting a page with exactly 4 errors is
\[ P(X = 4) = \frac{e^{-\lambda} \lambda^{x}}{x!} = \frac{(2.7183)^{-0.2} (0.2)^{4}}{4!} \approx 0.0000546 \]
which is about 0.005%.
Worked Example
A hot line with a toll-free number receives an average of 3 calls per hour. For any
given hour, find the probability that it will receive exactly 5 calls.
\[ P(X = 5) = \frac{e^{-\lambda} \lambda^{x}}{x!} = \frac{(2.7183)^{-3} (3)^{5}}{5!} \approx 0.1008 \]
which is about 10%.
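Both Poisson calculations can be checked with a short script. This is a minimal sketch; `poisson_pmf` is an illustrative helper, and the hot-line rate is taken as 3 calls per hour, which is the value the quoted result of roughly 10% corresponds to.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# Manuscript example: 100 errors over 500 pages gives lam = 0.2 errors per page.
p_page = poisson_pmf(4, 100 / 500)

# Hot-line example: exactly 5 calls in an hour (rate of 3 per hour assumed).
p_calls = poisson_pmf(5, 3)

print(f"P(4 errors on a page) = {p_page:.7f}")
print(f"P(5 calls in an hour) = {p_calls:.4f}")
```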
DO THIS
READ:
1. An Introduction to Probability & Random Processes By
Kenneth B & Gian-Carlo R, pages187-192
2. Robert B. Ash, Lectures on Statistics, page 1 and Answer
problems 1,2,3 on pg 15.
Ref: http://en.wikipedia.org/wiki/Normal_distribution
Geometric Distribution
In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:
• the probability distribution of the number X of Bernoulli trials needed to get
one success, supported on the set { 1, 2, 3, ...}, or
• the probability distribution of the number Y = X - 1 of failures before the first
success, supported on the set { 0, 1, 2, 3, ... }.
Which of these one calls "the" geometric distribution is a matter of convention and
convenience.
If the probability of success on each trial is p, then the probability that k trials are
needed to get one success is
\[ \Pr(X = k) = (1 - p)^{k - 1} p \]
for k = 1, 2, 3, ....
Equivalently, if the probability of success on each trial is p, then the probability that
there are k failures before the first success is
\[ \Pr(Y = k) = (1 - p)^{k} p \]
for k = 0, 1, 2, 3, ....
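The probability that the first success occurs on the nth trial is $(1 - p)^{n - 1} p$.
1) With n = 3 and p = 1/2, multiplying the probabilities of two failures followed by one success gives
\[ \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} \]
Or by the formula
\[ \left(1 - \tfrac{1}{2}\right)^{3 - 1} \cdot \tfrac{1}{2} = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} \]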
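2) A die is rolled; find the probability of getting the first 3 on the fourth roll.
Solution
n = 4
p = 1/6
\[ \left(1 - \tfrac{1}{6}\right)^{4 - 1} \cdot \tfrac{1}{6} = \left(\tfrac{5}{6}\right)^{3} \cdot \tfrac{1}{6} = \tfrac{125}{1296} \approx 0.096 \]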
Example
If cards are selected from a deck and replaced, how many trials would it take on
average to get two clubs?
P(Club) = 13/52 = 1/4
\[ \text{Expected number of trials} = \frac{2}{1/4} = 2 \times 4 = 8 \]
DO THIS
1. A card is selected from an ordinary deck of cards and then replaced, another
card is selected, and so on. Find the probability that the first club will occur on the fourth
draw.
2. A die is tossed until 5 or 6 is obtained. Find the expected number of tosses.
Answer Key
1) $(3/4)^{3} (1/4) = 27/256 \approx 0.105$
2) 3
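The geometric examples above follow directly from the formula $(1 - p)^{n - 1} p$ and from the mean number of trials per success, 1/p. The sketch below uses exact fractions; the helper names are illustrative only.

```python
from fractions import Fraction


def geometric_pmf(n, p):
    """P(first success occurs on trial n) = (1 - p)**(n - 1) * p."""
    p = Fraction(p)
    return (1 - p) ** (n - 1) * p


def expected_trials(p, successes=1):
    """Mean number of trials needed to collect the given number of successes."""
    return successes / Fraction(p)


first_success_third_trial = geometric_pmf(3, Fraction(1, 2))
first_three_fourth_roll = geometric_pmf(4, Fraction(1, 6))
trials_for_two_clubs = expected_trials(Fraction(1, 4), successes=2)
tosses_until_5_or_6 = expected_trials(Fraction(2, 6))

print(first_success_third_trial, first_three_fourth_roll,
      trials_for_two_clubs, tosses_until_5_or_6)
```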
Hypergeometric Distribution
In probability theory and statistics, the hypergeometric distribution is a discrete
probability distribution that describes the number of successes in a sequence of n
draws from a finite population without replacement.
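A typical example: there is a shipment
of N objects in which D are defective. The hypergeometric distribution describes the
probability that in a sample of n distinct objects drawn from the shipment exactly
k objects are defective.
In general, if a random variable X follows the hypergeometric distribution with parameters N, D and n, then the probability of getting exactly k successes is given by
\[ \Pr(X = k) = \frac{\binom{D}{k} \binom{N - D}{n - k}}{\binom{N}{n}} \]
The formula can be understood as follows: there are $\binom{N}{n}$ possible samples (without replacement), $\binom{D}{k}$ ways to obtain k defective objects, and $\binom{N - D}{n - k}$ ways to fill out the rest of the sample with non-defective
objects.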
When the population size is large compared to the sample size (i.e., N is much larger
than n) the hypergeometric distribution is approximated reasonably well by a binomial distribution with parameters n (number of trials) and p = D / N (probability of
success in a single trial).
Hypergeometric Formula
When there are two groups of items such that there are a items in the first group
and b items in the second group, so that the total number of items is (a + b), the
probability of selecting x items from the first group and (n - x) items from the second
group is
\[ \frac{{}^{a}C_{x} \cdot {}^{b}C_{n - x}}{{}^{a + b}C_{n}} \]
where n is the total number of items selected without replacement.
Examples
1. A bag contains 3 blue chips and 3 green chips. If two chips are selected at random, find the probability that both are blue.
Solution
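From the formula
\[ \frac{{}^{a}C_{x} \cdot {}^{b}C_{n - x}}{{}^{a + b}C_{n}} \]
with a = 3, b = 3, x = 2, n = 2, n - x = 2 - 2 = 0:
\[ \frac{{}^{3}C_{2} \cdot {}^{3}C_{0}}{{}^{6}C_{2}} = \frac{3 \times 1}{15} = \frac{1}{5} = 0.2 \]
Solution
a = 6, b = 3, so a + b = 6 + 3 = 9
n = 3
x = 2
n - x = 3 - 2 = 1
\[ \Pr = \frac{{}^{6}C_{2} \cdot {}^{3}C_{1}}{{}^{9}C_{3}} = \frac{15 \times 3}{84} = \frac{15}{28} \approx 0.536 \]
3 are defective: a = 3
7 are good: b = 7
Pr (one to be defective): n = 4, x = 1, n - x = 4 - 1 = 3
\[ \frac{{}^{3}C_{1} \cdot {}^{7}C_{3}}{{}^{10}C_{4}} = \frac{105}{210} = 0.5 \]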
DO THIS
1. In a box of 10 shirts there are five (5) defective ones. If 5 shirts are sold at random
find the probability that exactly two are defective.
Answer
2) 0.255
Group Work
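\[ P(\text{choosing 5}) = \frac{1}{{}^{15}C_{5}} = \frac{1}{3003} \]
3)
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
P(Ace) = 4/52
P(spade) = 13/52
\[ \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]
\[ P(A) = \frac{1}{51}, \qquad P(A') = 1 - \frac{1}{51} = \frac{50}{51} \]
\[ \left(\frac{50}{51}\right)^{5} = \text{use calculator} \]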
                drawn                 not drawn                               total
white marbles   4 (k)                 1 = 5 - 4 (D - k)                       5 (D)
black marbles   6 = 10 - 4 (n - k)    39 = 50 + 4 - 10 - 5 (N + k - n - D)    45 (N - D)
total           10 (n)                40 (N - n)                              50 (N)
So, the probability of drawing exactly 4 white marbles is quite low (approximately
0.004) and the event is very unlikely. It means, if you repeated your random experiment (drawing 10 marbles from the urn of 50 marbles without replacement) 1000
times you just would expect to obtain such a result 4 times.
But what about the probability of drawing all 5 white marbles? You will intuitively agree that this is even more unlikely than drawing 4 white marbles.
Let us calculate the probability for such an extreme event.
The contingency table is as follows:
                drawn                 not drawn                               total
white marbles   5 (k)                 0 = 5 - 5 (D - k)                       5 (D)
black marbles   5 = 10 - 5 (n - k)    40 = 50 + 5 - 10 - 5 (N + k - n - D)    45 (N - D)
total           10 (n)                40 (N - n)                              50 (N)
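And we can calculate the probability as follows (notice that the denominator always
stays the same):
\[ \Pr(k = 5) = \frac{\binom{5}{5} \binom{45}{5}}{\binom{50}{10}} = \frac{1 \cdot 1221759}{10272278170} \approx 0.0001189375 \]
As expected, the probability of drawing 5 white marbles is much lower than that of
drawing 4 white marbles.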
Conclusion
Consequently, one could expand the initial question as follows: If you draw 10 marbles from an urn (containing 5 white and 45 black marbles), what's the probability
of drawing at least 4 white marbles? Or, what's the probability of drawing 4 white
marbles or a more extreme outcome (such as drawing 5)? This corresponds to calculating the cumulative probability Pr(k ≥ 4) and can be calculated by the cumulative
distribution function (cdf). Since the hypergeometric distribution is a discrete
probability distribution, the cumulative probability can be calculated easily by
adding all corresponding single probability values.
In our example you just have to sum up Pr (k = 4) and Pr (k = 5):
Pr (k ≥ 4) = 0.003964583 + 0.0001189375 = 0.004083520
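The marble probabilities can be verified directly from the hypergeometric formula. The following is a minimal sketch; the function name `hypergeom_pmf` and its argument order are illustrative choices that mirror the symbols N, D, n and k used in the contingency tables.

```python
from math import comb


def hypergeom_pmf(k, N, D, n):
    """P(exactly k successes) when drawing n items without replacement
    from a population of N items that contains D successes."""
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)


# Urn with 5 white and 45 black marbles, 10 marbles drawn.
p4 = hypergeom_pmf(4, N=50, D=5, n=10)
p5 = hypergeom_pmf(5, N=50, D=5, n=10)
p_at_least_4 = p4 + p5  # cumulative probability Pr(k >= 4)

print(f"Pr(k = 4)  = {p4:.9f}")
print(f"Pr(k = 5)  = {p5:.10f}")
print(f"Pr(k >= 4) = {p_at_least_4:.9f}")
```

The same function covers the earlier chip and defective-item examples by setting N = a + b and D = a.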
READ:
1. An Introduction to Probability & Random Processes By
Kenneth B & Gian-Carlo R, pages 184-195
                    Under 30    31-90    Over 90    Totals
Under $50,000       0.06        0.05     0.01       0.13
$50,000-99,999      0.03        0.19     0.10       0.31
$100,000-150,000    0.03        0.35     0.13       0.50
Over $150,000       0.01        0.04     0.01       0.06
Totals              0.13        0.63     0.25       1.00
Marginal Probabilities
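Let S be partitioned into $r \times s$ disjoint sets $E_i$ and $F_j$, where the general subset is
denoted $E_i \cap F_j$. Then the marginal probability of $E_i$ is
\[ P(E_i) = \sum_{j=1}^{s} P(E_i \cap F_j) \]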
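A marginal probability is obtained by summing a joint table along one dimension. The sketch below applies this to the joint table above; the dictionary layout and variable names are illustrative assumptions, and because each printed cell is itself rounded, the row sums computed here can differ from the printed row totals by 0.01.

```python
# Joint probabilities from the table; columns are "Under 30", "31-90", "Over 90".
joint = {
    "Under $50,000":    [0.06, 0.05, 0.01],
    "$50,000-99,999":   [0.03, 0.19, 0.10],
    "$100,000-150,000": [0.03, 0.35, 0.13],
    "Over $150,000":    [0.01, 0.04, 0.01],
}
columns = ["Under 30", "31-90", "Over 90"]

# Column marginals: sum each column over all rows.
col_marginals = {
    name: round(sum(row[j] for row in joint.values()), 2)
    for j, name in enumerate(columns)
}

# Row marginals: sum each row over all columns.
row_marginals = {label: round(sum(row), 2) for label, row in joint.items()}

print(col_marginals)
print(row_marginals)
```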
READ:
1. An Introduction to Probability & Random Processes
By Kenneth B & Gian-Carlo R, pages 142-150
2. Exercise pg 150 Nos. 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 16, 17, 26.
REFLECTION: ICT resources are difficult
to access!! The link opens up avenue
for Mathematics teachers to access ICT
resources.
http://www.tsm-resources.com/suppl.html
Unit 2
( 40 Hours)
Equality in distribution
Two random variables X and Y are equal in distribution if they have the same distribution functions:
\[ P(X \le x) = P(Y \le x) \quad \text{for all } x. \]
Two random variables having equal moment generating functions have the same
distribution.
Equality in mean
Two random variables X and Y are equal in p-th mean if the pth moment of |X - Y|
is zero, that is,
\[ E(|X - Y|^{p}) = 0. \]
Equality in pth mean implies equality in qth mean for all q<p. As in the previous case,
there is a related distance between the random variables, namely
\[ d_p(X, Y) = E(|X - Y|^{p}). \]
Equality
Finally, the two random variables X and Y are equal if they are equal as functions on
their probability space, that is,
\[ X(\omega) = Y(\omega) \quad \text{for all } \omega. \]
Moment-generating Function
In probability theory and statistics, the moment-generating function of a random
variable X is
\[ M_X(t) = E\left(e^{tX}\right), \quad t \in \mathbb{R}, \]
wherever this expectation exists. The moment-generating function generates the
moments of the probability distribution.
For vector-valued random variables X with real components, the moment-generating
function is given by
\[ M_X(\mathbf{t}) = E\left(e^{\langle \mathbf{t}, X \rangle}\right) \]
where t is a vector and $\langle \mathbf{t}, X \rangle$ is the dot product.
Provided the moment-generating function exists in an interval around t=0, the nth
moment is given by
\[ E(X^{n}) = M_X^{(n)}(0) = \left. \frac{d^{n} M_X(t)}{dt^{n}} \right|_{t=0}. \]
If X has a continuous probability density function f(x) then the moment generating
function is given by
\[ M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx = \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^{2} x^{2}}{2!} + \cdots \right) f(x)\, dx = 1 + t m_1 + \frac{t^{2} m_2}{2!} + \cdots \]
where $m_i$ is the ith moment. $M_X(-t)$ is just the two-sided Laplace transform of f(x).
Regardless of whether the probability distribution is continuous or not, the moment-generating function is given by the Riemann-Stieltjes integral
\[ M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, dF(x) \]
where F is the cumulative distribution function.
If X1, X2, ..., Xn is a sequence of independent (and not necessarily identically distributed) random variables, and
\[ S_n = \sum_{i=1}^{n} a_i X_i \]
where the $a_i$ are constants, then the probability density function for $S_n$ is the convolution of the probability density functions of each of the $X_i$ and the moment-generating
function for $S_n$ is given by
\[ M_{S_n}(t) = M_{X_1}(a_1 t)\, M_{X_2}(a_2 t) \cdots M_{X_n}(a_n t). \]
Related to the moment-generating function are a number of other transforms that are
common in probability theory, including the characteristic function and the probability-generating function.
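As a small worked illustration (an added example, not taken from the readings), consider a Bernoulli random variable X that takes value 1 with probability p and value 0 with probability q = 1 - p. Differentiating its moment-generating function at t = 0 recovers the mean and variance of the Bernoulli distribution:

```latex
\begin{align*}
M_X(t)  &= E\left(e^{tX}\right) = q\,e^{0} + p\,e^{t} = q + p\,e^{t} \\
M_X'(t) &= p\,e^{t} \quad\Rightarrow\quad E(X) = M_X'(0) = p \\
M_X''(t) &= p\,e^{t} \quad\Rightarrow\quad E(X^{2}) = M_X''(0) = p \\
\operatorname{var}(X) &= E(X^{2}) - \left[E(X)\right]^{2} = p - p^{2} = pq
\end{align*}
```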
Markov's Inequality
Markov's inequality gives an upper bound for the probability that X lies within
the set $\{ x \in X \mid f(x) \ge \varepsilon \}$.
In probability theory, Markov's inequality gives an upper bound for the probability
that a non-negative function of a random variable is greater than or equal to some positive constant. It is named after the Russian mathematician Andrey Markov, although
it appeared earlier in the work of Pafnuty Chebyshev (Markov's teacher).
Markov's inequality (and other similar inequalities) relate probabilities to expectations,
and provide (frequently) loose but still useful bounds for the cumulative distribution
function of a random variable.
Chebyshev's Inequality
In probability theory, Chebyshev's inequality (also known as Tchebysheff's inequality, Chebyshev's theorem, or the Bienaymé-Chebyshev inequality), named
after Pafnuty Chebyshev, who first proved it, states that in any data sample or probability distribution, nearly all the values are close to the mean value, and provides
a quantitative description of "nearly all" and "close to". For example, no more than
1/4 of the values are more than 2 standard deviations away from the mean, no more
than 1/9 are more than 3 standard deviations away, no more than 1/25 are more than
5 standard deviations away, and so on.
Probabilistic statement
Let X be a random variable with expected value $\mu$ and finite variance $\sigma^{2}$. Then for
any real number k>0,
\[ \Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^{2}}. \]
Only the cases k > 1 provide useful information.
As an example, using $k = \sqrt{2}$ shows that at least half of the values lie in the interval
$(\mu - \sqrt{2}\,\sigma,\ \mu + \sqrt{2}\,\sigma)$.
Typically, the theorem will provide rather loose bounds. However, the bounds provided by Chebyshev's inequality cannot, in general (remaining sound for variables of
arbitrary distribution), be improved upon. For example, for any k>1, the following
example (where $\sigma = 1/k$) meets the bounds exactly:
\[ \Pr(X = -1) = \frac{1}{2k^{2}}, \qquad \Pr(X = 0) = 1 - \frac{1}{k^{2}}, \qquad \Pr(X = 1) = \frac{1}{2k^{2}}. \]
The theorem can be useful despite loose bounds because it applies to random variables
of any distribution, and because these bounds can be calculated knowing no more
about the distribution than the mean and variance.
Chebyshev's inequality is used for proving the weak law of large numbers.
Example application
For illustration, assume we have a large body of text, for example articles from a
publication. Assume we know that the articles are on average 1000 characters long
with a standard deviation of 200 characters. From Chebyshev's inequality we can
then deduce that at least 75% of the articles have a length between 600 and 1400
characters (k = 2).
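The article-length illustration amounts to evaluating 1 - 1/k² and the interval mean ± k standard deviations. A minimal sketch follows; the function name `chebyshev_lower_bound` is illustrative only.

```python
def chebyshev_lower_bound(k):
    """Lower bound on P(|X - mu| < k * sigma) given by Chebyshev's inequality."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the inequality gives no information, so the bound is 0.
    return max(0.0, 1.0 - 1.0 / k**2)


mu, sigma, k = 1000, 200, 2          # figures from the article-length example
interval = (mu - k * sigma, mu + k * sigma)
bound = chebyshev_lower_bound(k)

print(f"At least {bound:.0%} of articles lie between {interval[0]} and {interval[1]} characters")
```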
Probabilistic proof
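Markov's inequality states that for any real-valued random variable Y and any positive
number a, we have Pr(|Y| > a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality
is to apply Markov's inequality to the random variable $Y = (X - \mu)^{2}$ with $a = (k\sigma)^{2}$.
It can also be proved directly. For any event A, let $I_A$ be the indicator random variable
of A, i.e. $I_A$ equals 1 if A occurs and 0 otherwise. Then
\[ \Pr(|X - \mu| \ge k\sigma) = E\left(I_{|X - \mu| \ge k\sigma}\right) = E\left(I_{[(X - \mu)/(k\sigma)]^{2} \ge 1}\right) \le E\left(\left[\frac{X - \mu}{k\sigma}\right]^{2}\right) = \frac{1}{k^{2}} \frac{E\left((X - \mu)^{2}\right)}{\sigma^{2}} = \frac{1}{k^{2}}. \]
The direct proof shows why the bounds are quite loose in typical cases: the number 1
to the left of "≤" is replaced by $[(X - \mu)/(k\sigma)]^{2}$ to the right of "≤" whenever the latter
exceeds 1. In some cases it exceeds 1 by a very wide margin.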
READ:
1. An Introduction to Probability & Random Processes
By Kenneth B & Gian-Carlo R, pages 305-318
Correlation Types
Correlation is a measure of association between two variables. The variables are not
designated as dependent or independent. The two most popular correlation coefficients are Spearman's correlation coefficient rho and Pearson's product-moment
correlation coefficient.
When calculating a correlation coefficient for ordinal data, select Spearman's technique. For interval or ratio-type data, use Pearson's technique.
The value of a correlation coefficient can vary from minus one to plus one. A minus
one indicates a perfect negative correlation, while a plus one indicates a perfect positive correlation. A correlation of zero means there is no linear relationship between the
two variables. When there is a negative correlation between two variables, as the
value of one variable increases, the value of the other variable decreases, and vice
versa. In other words, for a negative correlation, the variables work opposite each
other. When there is a positive correlation between two variables, as the value of
one variable increases, the value of the other variable also increases. The variables
move together.
The standard error of a correlation coefficient is used to determine the confidence
intervals around a true correlation of zero. If your correlation coefficient falls outside
of this range, then it is significantly different from zero. The standard error can be
calculated for interval or ratio-type data (i.e., only for Pearson's product-moment
correlation).
The significance (probability) of the correlation coefficient is determined from the
t-statistic. The probability of the t-statistic indicates whether the observed correlation
coefficient occurred by chance if the true correlation is zero. In other words, it asks
if the correlation is significantly different from zero. When the t-statistic is calculated for Spearman's rank-difference correlation coefficient, there must be at least 30
cases before the t-distribution can be used to determine the probability. If there are
fewer than 30 cases, you must refer to a special table to find the probability of the
correlation coefficient.
Example
A company wanted to know if there is a significant relationship between the total number of salespeople and the total number of sales. They collect data for five months.
Variable 1    Variable 2
207           6907
180           5991
220           6810
205           6553
190           6190
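Since both variables are ratio-type data, Pearson's product-moment coefficient applies here. The sketch below computes it from first principles for the five months of data, together with the t-statistic on n - 2 degrees of freedom; the function name `pearson_r` is an illustrative choice and the t formula used is the standard one for testing a zero correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


variable_1 = [207, 180, 220, 205, 190]       # number of salespeople
variable_2 = [6907, 5991, 6810, 6553, 6190]  # number of sales

r = pearson_r(variable_1, variable_2)
n = len(variable_1)
# t-statistic for testing whether the true correlation is zero (n - 2 d.f.).
t = r * sqrt((n - 2) / (1 - r**2))

print(f"r = {r:.3f}, t({n - 2}) = {t:.2f}")
```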
Another Example
Respondents to a survey were asked to judge the quality of a product on a four-point
Likert scale (excellent, good, fair, poor). They were also asked to judge the reputation
of the company that made the product on a three-point scale (good, fair, poor). Is
there a significant relationship between respondents' perceptions of the company and
their perceptions of quality of the product?
Since both variables are ordinal, Spearman's method is chosen. The first variable is
the rating for the quality of the product. Responses are coded as 4=excellent, 3=good,
2=fair, and 1=poor. The second variable is the perceived reputation of the company
and is coded 3=good, 2=fair, and 1=poor.
Variable 1    Variable 2
4             3
2             2
1             2
3             3
4             3
1             1
2             1
Regression
Simple regression is used to examine the relationship between one dependent and
one independent variable. After performing an analysis, the regression statistics can
be used to predict the dependent variable when the independent variable is known.
Regression goes beyond correlation by adding prediction capabilities.
People use regression on an intuitive level every day. In business, a well-dressed
man is thought to be financially successful. A mother knows that more sugar in her
children's diet results in higher energy levels. The ease of waking up in the morning
often depends on how late you went to bed the night before. Quantitative regression
adds precision by developing a mathematical formula that can be used for predictive
purposes.
For example, a medical researcher might want to use body weight (independent
variable) to predict the most appropriate dose for a new drug (dependent variable).
The purpose of running the regression is to find a formula that fits the relationship
between the two variables. Then you can use that formula to predict values for the
dependent variable when only the independent variable is known. A doctor could
prescribe the proper dose based on a person's body weight.
The regression line (known as the least squares line) is a plot of the expected value
of the dependent variable for all values of the independent variable. Technically, it
is the line that minimizes the squared residuals. The regression line is the one that
best fits the data on a scatterplot.
Using the regression equation, the dependent variable may be predicted from the independent variable. The slope of the regression line (b) is defined as the rise divided by
the run. The y intercept (a) is the point on the y axis where the regression line would
intercept the y axis. The slope and y intercept are incorporated into the regression
equation. The intercept is usually called the constant, and the slope is referred to as
the coefficient. Since the regression model is usually not a perfect predictor, there is
also an error term in the equation.
In the regression equation, y is always the dependent variable and x is always the
independent variable. Here are three equivalent ways to mathematically describe a
linear regression model.
y = intercept + (slope × x) + error
y = constant + (coefficient × x) + error
y = a + bx + e
The significance of the slope of the regression line is determined from the t-statistic.
Its p-value is the probability of observing a slope at least this far from zero by chance if
the true slope is zero. Some researchers prefer to report the F-ratio instead of
the t-statistic. In simple regression, the F-ratio is equal to the t-statistic squared.
The t-statistic for the significance of the slope is essentially a test to determine if the
regression model (equation) is usable. If the slope is significantly different than zero,
then we can use the regression model to predict the dependent variable for any value
of the independent variable.
On the other hand, take an example where the slope is zero. It has no prediction ability
because for every value of the independent variable, the prediction for the dependent
variable would be the same. Knowing the value of the independent variable would
not improve our ability to predict the dependent variable. Thus, if the slope is not
significantly different from zero, do not use the model to make predictions.
The coefficient of determination (r-squared) is the square of the correlation coefficient. Its value may vary from zero to one. It has the advantage over the correlation
coefficient in that it may be interpreted directly as the proportion of variance in the
dependent variable that can be accounted for by the regression equation. For example,
an r-squared value of .49 means that 49% of the variance in the dependent variable
can be explained by the regression equation. The other 51% is unexplained.
The standard error of the estimate for regression measures the amount of variability
in the points around the regression line. It is the standard deviation of the data points
as they are distributed around the regression line. The standard error of the estimate
can be used to develop confidence intervals around a prediction.
Example
A company wants to know if there is a significant relationship between its advertising
expenditures and its sales volume. The independent variable is advertising budget and
the dependent variable is sales volume. A lag time of one month will be used because
sales are expected to lag behind actual advertising expenditures. Data was collected
for a six month period. All figures are in thousands of dollars. Is there a significant
relationship between advertising budget and sales volume?
Independent Variable    Dependent Variable
4.2                     27.1
6.1                     30.4
3.9                     25.0
5.7                     29.7
7.3                     40.1
5.9                     28.8
You might make a statement in a report like this: "A simple linear regression was
performed on six months of data to determine if there was a significant relationship
between advertising expenditures and sales volume. The t-statistic for the slope was
significant at the .05 critical alpha level, t(4)=4.10, p=.015. Thus, we reject the null
hypothesis and conclude that there was a positive significant relationship between
advertising expenditures and sales volume. Furthermore, 80.7% of the variability in
sales volume could be explained by advertising expenditures."
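The least squares line, r-squared and the t-statistic for the slope can be computed by hand from the sums of squares. The sketch below does this for the six data pairs exactly as tabulated; the function name is illustrative, and because the pairing and rounding behind the reported figures are not shown in the text, the output need not reproduce the quoted t and r-squared values exactly.

```python
from math import sqrt


def simple_linear_regression(xs, ys):
    """Return (intercept a, slope b, r_squared, t_statistic_for_slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope of the least squares line
    a = mean_y - b * mean_x            # y intercept
    r_squared = sxy**2 / (sxx * syy)   # coefficient of determination
    residual_ss = syy - b * sxy        # unexplained sum of squares
    std_err_b = sqrt(residual_ss / (n - 2) / sxx)
    return a, b, r_squared, b / std_err_b


advertising = [4.2, 6.1, 3.9, 5.7, 7.3, 5.9]    # independent variable
sales = [27.1, 30.4, 25.0, 29.7, 40.1, 28.8]    # dependent variable

a, b, r_squared, t = simple_linear_regression(advertising, sales)
print(f"y = {a:.2f} + {b:.2f}x, r-squared = {r_squared:.3f}, t(4) = {t:.2f}")
```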
READ:
1) An Introduction to Probability & Random Processes By
Kenneth B & Gian-Carlo R, pages 18-30, 212-215, 300-303
2) Robert B. Ash, Lectures on Statistics, page 28-29.
Ref: http://en.wikipedia.org/wiki/Correlation
Ref: http://en.wikipedia.org/wiki/Regression
Chi-square Test
A chi-square test is any statistical hypothesis test in which the test statistic has a chi-square distribution when the null hypothesis is true, or any in which the probability
distribution of the test statistic (assuming the null hypothesis is true) can be made
to approximate a chi-square distribution as closely as desired by making the sample
size large enough.
Specifically, a chi-square test for independence evaluates statistically significant
differences between proportions for two or more groups in a data set.
Examples of chi-square tests include:
• Pearson's chi-square test, also known as the chi-square goodness-of-fit test
• Yates' chi-square test, also known as Yates' correction for continuity
• Mantel-Haenszel chi-square test
• Linear-by-linear association chi-square test
If $X_i$ are k independent, normally distributed random variables with mean 0 and variance 1, then the random variable
\[ Q = \sum_{i=1}^{k} X_i^{2} \]
is distributed according to the chi-square distribution. This is usually written
\[ Q \sim \chi^{2}_{k}. \]
The chi-square distribution has one parameter: k, a positive integer that specifies the
number of degrees of freedom (i.e. the number of $X_i$).
The chi-square distribution is a special case of the gamma distribution.
The best-known situations in which the chi-square distribution is used are the common
chi-square tests for goodness of fit of an observed distribution to a theoretical one,
and of the independence of two criteria of classification of qualitative data. However,
many other statistical tests lead to a use of this distribution.
Characteristic Function
The characteristic function of the chi-square distribution is
\[ \varphi(t; k) = (1 - 2it)^{-k/2}. \]
Properties
The chi-square distribution has numerous applications in inferential statistics, for
instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating
the slope of a regression line via its role in Student's t-distribution. It enters all analysis
of variance problems via its role in the F-distribution, which is the distribution of the
ratio of two independent chi-squared random variables divided by their respective
degrees of freedom.
READ:
Ref: http://en.wikipedia.org/wiki/pearson%chi-square_test
Ref: http://en.wikipedia.org/wiki/Chi-Square_test
Student's T-test
A t test is any statistical hypothesis test for two groups in which the test statistic has
a Student's t distribution if the null hypothesis is true.
History
The t statistic was introduced by William Sealy Gosset for cheaply monitoring the
quality of beer brews. "Student" was his pen name. Gosset was a statistician for the
Guinness brewery in Dublin, Ireland, and was hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply
biochemistry and statistics to Guinness's industrial processes. Gosset published the t test
in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded
the fact that they were using statistics as a trade secret. In fact, Gosset's identity was
unknown not only to fellow statisticians but to his employer; the company insisted
on the pseudonym so that it could turn a blind eye to the breach of its rules.
Today, it is more generally applied to the confidence that can be placed in judgments
made from small samples.
Use
Among the most frequently used t tests are:
• A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard
deviation and number of data points, we can use some kind of t test to determine
whether the means are distinct, provided that the underlying distributions can
be assumed to be normal. All such tests are usually called Student's t tests,
though strictly speaking that name should only be used if the variances of the
two populations are also assumed to be equal; the form of the test used when
this assumption is dropped is sometimes called Welch's t test. There are different
versions of the t test depending on whether the two samples are
o independent of each other (e.g., individuals randomly assigned into two
groups), or
o paired, so that each member of one sample has a unique relationship with a
particular member of the other sample (e.g., the same people measured before
and after an intervention, or IQ test scores of a husband and wife).
If the t value that is calculated is above the threshold chosen for statistical significance
(usually the 0.05 level), then the null hypothesis that the two groups do not differ is
rejected in favor of an alternative hypothesis, which typically states that the groups
do differ.
Once a t value is determined, a P value can be found using a table of values from
Student's t-distribution.
For samples of small size, instead of calculating the z value
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
and using probabilities based on the normal distribution, calculate the t value
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
where s is the sample standard deviation.
The probability that the t value is within a particular interval may be found using the
t distribution. The sample's degrees of freedom are the number of data that need to
be known before the rest of the data can be calculated.
e.g.
A random sample of things have weights
30.02, 29.99, 30.11, 29.97, 30.01, 29.99
Calculate a 95% confidence interval for the population's mean weight.
Assume the population ~ $N(\mu, \sigma^{2})$
The sample's mean weight is 30.015 with a sample standard deviation (n - 1 divisor) of 0.0497. With the mean
and the first five weights it is possible to calculate the sixth weight. Consequently
there are five degrees of freedom.
The t distribution tells us that, for five degrees of freedom, the probability that t >
2.571 is 0.025. Also, the probability that t < -2.571 is 0.025. Using the formula for
t with t = ±2.571, a 95% confidence interval for the population's mean may be found
by making $\mu$ the subject of the equation.
i.e.
\[ 30.015 - 2.571 \cdot \frac{0.0497}{\sqrt{6}} < \mu < 30.015 + 2.571 \cdot \frac{0.0497}{\sqrt{6}} \quad\Rightarrow\quad 29.96 < \mu < 30.07 \]
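The confidence interval in this example can be reproduced with the standard library. This is a minimal sketch: the critical value 2.571 is taken from the text rather than computed, and `stdev` uses the n - 1 divisor, which is the sample standard deviation the t formula calls for.

```python
from math import sqrt
from statistics import mean, stdev

weights = [30.02, 29.99, 30.11, 29.97, 30.01, 29.99]

# Two-sided 95% critical value of Student's t for 5 degrees of freedom,
# as quoted in the text (not computed here).
t_crit = 2.571

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)                    # sample standard deviation (n - 1 divisor)
half_width = t_crit * s / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

print(f"mean = {x_bar:.3f}, s = {s:.4f}")
print(f"95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```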
READ:
1. Introduction to Probability By Charles M. Grinstead, pages
18-30, 212-215, 300-303
2. Robert B. Ash, Lectures on Statistics, page 23-29.
Answer problems 1- 6 on pg 23.
Ref: http://en.wikipedia.org/wiki/Statistical_Hypothesis_testing
Ref: http://en.wikipedia.org/wiki/Null_hypothesis
Reflection
http://www.ncaction.org.uk/subjects/maths/ict-lrn.htm
( 40 Hours)
Indicator Function
In mathematics, an indicator function or a characteristic function is a function
defined on a set X that indicates membership of an element in a subset A of X.
The indicator function of a subset A of a set X is a function
\[ \mathbf{1}_A : X \to \{0, 1\} \]
defined as
\[ \mathbf{1}_A(x) = \begin{cases} 1, & x \in A \\ 0, & x \notin A \end{cases} \]
The indicator function of A is sometimes denoted
$\chi_A(x)$ or $I_A(x)$
or even A(x).
Bonferroni Inequality
Let $P(E_i)$ be the probability that $E_i$ is true, and $P\left(\bigcup_{i=1}^{n} E_i\right)$ be the probability that at
least one of $E_1$, $E_2$, ..., $E_n$ is true. Then the Bonferroni inequality, also known as
Boole's inequality, states that
\[ P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i) \]
where $\bigcup$ denotes the union. If $E_i$ and $E_j$ are disjoint sets for all i and j, then the inequality becomes an equality. A beautiful theorem that expresses the exact relationship
between the probability of unions and probabilities of individual events is known as
the inclusion-exclusion principle.
A slightly wider class of inequalities are also known as Bonferroni inequalities.
Generating Function
In mathematics a generating function is a formal power series whose coefficients
encode information about a sequence $a_n$ that is indexed by the natural numbers.
There are various types of generating functions, including ordinary generating
functions, exponential generating functions, Lambert series, Bell series, and
Dirichlet series; definitions and examples are given below. Every sequence has a
generating function of each type. The particular generating function that is most
useful in a given context will depend upon the nature of the sequence and the details
of the problem being addressed.
Generating functions are often expressed in closed form as functions of a formal
argument x. Sometimes a generating function is evaluated at a specific value of x.
However, it must be remembered that generating functions are formal power series,
and they will not necessarily converge for all values of x.
If $a_n$ is the probability mass function of a discrete random variable, then its ordinary
generating function is called a probability-generating function.
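As an illustration, the probability-generating function of a fair die can be evaluated exactly. The sketch below is an assumed example (the die and the helper names `pgf` and `pgf_derivative` are not from the module); it uses the standard facts that G(1) = 1 and that G'(1) equals the mean.

```python
from fractions import Fraction

# a_n = P(X = n) for n = 0..6, for a fair die (assumed example)
pmf = [Fraction(0)] + [Fraction(1, 6)] * 6

def pgf(x):
    """Ordinary generating function of the pmf: sum of a_n * x**n."""
    return sum(a * x**n for n, a in enumerate(pmf))

def pgf_derivative(x):
    """Term-by-term derivative: sum of n * a_n * x**(n - 1)."""
    return sum(n * a * x**(n - 1) for n, a in enumerate(pmf) if n > 0)

print(pgf(1))             # total probability
print(pgf_derivative(1))  # mean of the die
```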
The ordinary generating function can be generalised to sequences with multiple
indexes. For example, the ordinary generating function of a sequence $a_{m,n}$ (where n
and m are natural numbers) is
\[ G(a_{m,n}; x, y) = \sum_{m,n=0}^{\infty} a_{m,n}\, x^m y^n. \]
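Characteristic Function
The characteristic function of a random variable X is $\varphi_X(t) = E\left(e^{itX}\right)$, where t is a real number.
If X is a vector-valued random variable, one takes the argument t to be a vector and
tX to be a dot product.
Every probability distribution on $\mathbb{R}$ or on $\mathbb{R}^n$ has a characteristic function, because
one is integrating a bounded function over a space whose measure is finite.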
The continuity theorem
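If the sequence of characteristic functions of distributions $F_n$ converges to the characteristic function of a distribution F, then $F_n(x)$ converges to F(x) at every value of
x at which F is continuous.
Characteristic functions are particularly useful for dealing with functions of independent random variables. For example, if $X_1, X_2, \ldots, X_n$ is a sequence of independent (and not necessarily identically distributed) random variables, and
\[ S_n = \sum_{i=1}^{n} a_i X_i, \]
where the $a_i$ are constants, then the characteristic function for $S_n$ is given by
\[ \varphi_{S_n}(t) = \varphi_{X_1}(a_1 t)\,\varphi_{X_2}(a_2 t) \cdots \varphi_{X_n}(a_n t). \]
In particular, $\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$ for independent X and Y. To see this, write out the definition
of characteristic function:
\[ \varphi_{X+Y}(t) = E\left(e^{it(X+Y)}\right) = E\left(e^{itX} e^{itY}\right) = E\left(e^{itX}\right) E\left(e^{itY}\right) = \varphi_X(t)\,\varphi_Y(t). \]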
Observe that the independence of X and Y is required to establish the equality of the
third and fourth expressions.
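The product rule for independent variables can be checked numerically. The sketch below assumes the standard definition of the characteristic function, E(e^{itX}), and uses two independent fair dice as an illustrative example; the function names are hypothetical.

```python
import cmath

faces = range(1, 7)   # a fair die (assumed example)

def cf_die(t):
    """Characteristic function E[exp(itX)] of one fair die."""
    return sum(cmath.exp(1j * t * x) for x in faces) / 6

def cf_sum_of_two_dice(t):
    """Characteristic function of X + Y, computed from the joint distribution."""
    return sum(cmath.exp(1j * t * (x + y)) for x in faces for y in faces) / 36

t = 0.7   # arbitrary test point
lhs = cf_sum_of_two_dice(t)
rhs = cf_die(t) ** 2

# For independent X and Y the two values agree up to rounding error.
print(abs(lhs - rhs) < 1e-12)
```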
Because of the continuity theorem, characteristic functions are used in the most
frequently seen proof of the central limit theorem.
Characteristic functions can also be used to find moments of a random variable. Provided
that the nth moment exists, the characteristic function can be differentiated n times and
\[ E\left(X^n\right) = i^{-n}\, \varphi_X^{(n)}(0). \]
READ:
1. Robert B. Ash, Lectures on Statistics, page 32 of 45:
Ref: http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29
Statistical Independence
In probability theory, to say that two events are independent intuitively means that
the occurrence of one event makes it neither more nor less probable that the other
occurs. For example:
The event of getting a "6" the first time a die is rolled and the event of getting a
"6" the second time are independent.
By contrast, the event of getting a "6" the first time a die is rolled and the event
that the sum of the numbers seen on the first and second trials is "8" are dependent.
If two cards are drawn with replacement from a deck of cards, the event of
drawing a red card on the first trial and that of drawing a red card on the second
trial are independent.
By contrast, if two cards are drawn without replacement from a deck of cards,
the event of drawing a red card on the first trial and that of drawing a red card
on the second trial are dependent.
Similarly, two random variables are independent if the conditional probability distribution of either, given the observed value of the other, is the same as if the other's
value had not been observed.
Independent Events
The standard definition says:
Two events A and B are independent if and only if $\Pr(A \cap B) = \Pr(A)\Pr(B)$.
Here $A \cap B$ is the intersection of A and B, that is, the event that both events A
and B occur.
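The definition can be applied to the die examples given earlier. The following minimal sketch enumerates two rolls of a fair die; the helpers `pr` and `independent` are hypothetical names introduced for illustration.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # (first roll, second roll)

def pr(event):
    """Probability of an event given as a predicate on a pair of rolls."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def independent(a, b):
    """Check Pr(A and B) == Pr(A) * Pr(B) exactly."""
    return pr(lambda r: a(r) and b(r)) == pr(a) * pr(b)

first_six = lambda r: r[0] == 6
second_six = lambda r: r[1] == 6
sum_eight = lambda r: r[0] + r[1] == 8

print(independent(first_six, second_six))   # a 6 first and a 6 second
print(independent(first_six, sum_eight))    # a 6 first and a total of 8
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.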
More generally, any collection of events (possibly more than just two of them) is
mutually independent if and only if for any finite subset $A_1, \ldots, A_n$ of the collection
we have
\[ \Pr\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} \Pr(A_i). \]
This is called the multiplication rule for independent events.
If two events A and B are independent, then the conditional probability of A given B
is the same as the unconditional (or marginal) probability of A, that is,
\[ \Pr(A \mid B) = \Pr(A). \]
There are at least two reasons why this statement is not taken to be the definition
of independence: (1) the two events A and B do not play symmetrical roles in this
statement, and (2) problems arise with this statement when events of probability 0
are involved.
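When one recalls that the conditional probability Pr(A | B) is given by
\[ \Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} \]
(so long as $\Pr(B) \ne 0$), one sees that the statement above is equivalent to $\Pr(A \cap B) = \Pr(A)\Pr(B)$, which is the standard definition.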
Random Sample
A sample is a subset chosen from a population for investigation. A random sample is
one chosen by a method involving an unpredictable component. Random sampling can
also refer to taking a number of independent observations from the same probability
distribution, without involving any real population. A probability sample is one in
which each item has a known probability of being in the sample.
The sample will usually not be completely representative of the population from
which it was drawn; this random variation in the results is known as sampling
error. In the case of random samples, mathematical theory is available to assess the
sampling error. Thus, estimates obtained from random samples can be accompanied
by measures of the uncertainty associated with the estimate. This can take the form
of a standard error or, if the sample is large enough for the central limit theorem to
take effect, confidence intervals may be calculated.
A simple random sample is selected so that every possible sample has an equal
chance of being selected.
Cluster sampling involves selecting the sample units in groups. For example, a
sample of telephone calls may be collected by first taking a collection of telephone
lines and collecting all the calls on the sampled lines. The analysis of cluster
samples must take into account the intra-cluster correlation which reflects the
fact that units in the same cluster are likely to be more similar than two units
picked at random.
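The two sampling schemes can be sketched with Python's standard `random` module. The population, the telephone-line names and the sample sizes below are made-up illustrations, and the fixed seed is only there so that the sketch is repeatable.

```python
import random

rng = random.Random(42)   # fixed seed: an assumption for repeatability only

# Simple random sample: every subset of size 10 is equally likely.
population = list(range(1, 101))            # hypothetical population of 100 units
simple_sample = rng.sample(population, 10)  # drawn without replacement

# Cluster sample: pick whole telephone lines, then keep every call on them.
lines = {
    f"line-{i}": [f"line-{i}/call-{j}" for j in range(5)]   # hypothetical calls
    for i in range(20)
}
sampled_lines = rng.sample(sorted(lines), 3)
cluster_sample = [call for line in sampled_lines for call in lines[line]]

print(simple_sample)
print(sampled_lines, len(cluster_sample))
```

In the cluster sample, calls from the same line enter or leave the sample together, which is the source of the intra-cluster correlation mentioned above.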
Multinomial Distribution
In probability theory, the multinomial distribution is a generalization of the binomial distribution.
The binomial distribution is the probability distribution of the number of successes
in n independent Bernoulli trials, with the same probability of success on each trial.
In a multinomial distribution, each trial results in exactly one of some fixed finite
number k of possible outcomes, with probabilities $p_1, \ldots, p_k$ (so that $p_i \ge 0$ for
$i = 1, \ldots, k$ and $\sum_{i=1}^{k} p_i = 1$), and
there are n independent trials. Then let the random variables $X_i$ indicate the number
of times outcome number i was observed over the n trials. The vector $X = (X_1, \ldots, X_k)$
follows a multinomial distribution with parameters n and p, where $p = (p_1, \ldots, p_k)$.
For k = 3 outcomes,
\[ P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\; p_1^{x_1}\, p_2^{x_2}\, p_3^{x_3}, \]
where $x_1 + x_2 + x_3 = n$ and $p_1 + p_2 + p_3 = 1$.
Example
1) In a large city, 60% of the workers drive to work, 30% take the bus, and 10%
take the train. If 5 workers are selected at random, find the probability that 2 will
drive, 2 will take the bus, and 1 will take the train.
Solution
n = 5, $x_1 = 2$, $x_2 = 2$, $x_3 = 1$ and $p_1 = 0.6$, $p_2 = 0.3$, $p_3 = 0.1$.
Hence, the probability that 2 workers will drive, 2 will take the bus, and 1 will take the train
is
\[ \frac{5!}{2!\, 2!\, 1!}\, (0.6)^2 (0.3)^2 (0.1)^1 = 0.0972. \]
2) A box contains 5 red balls, 3 blue balls, and 2 white balls. If 4 balls are selected
with replacement, find the probability of getting 2 red balls, one blue ball, and
one white ball.
Solution
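n = 4, $x_1 = 2$, $x_2 = 1$, $x_3 = 1$, and $p_1 = \frac{5}{10}$, $p_2 = \frac{3}{10}$, $p_3 = \frac{2}{10}$.
Hence, the probability of getting 2 red balls, one blue ball, and one white ball is
\[ \frac{4!}{2!\, 1!\, 1!} \left(\frac{5}{10}\right)^2 \left(\frac{3}{10}\right) \left(\frac{2}{10}\right) = 12 \times \frac{3}{200} = \frac{9}{50} = 0.18. \]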
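Both worked examples can be reproduced with a few lines of Python. This is a minimal sketch; `multinomial_pmf` is a hypothetical helper written for this illustration, not a library function.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) independent trials."""
    coefficient = factorial(sum(counts))
    for x in counts:
        coefficient //= factorial(x)    # n! / (x_1! x_2! ... x_k!)
    return coefficient * prod(p**x for p, x in zip(probs, counts))

# Example 1: 2 drive, 2 take the bus, 1 takes the train.
print(round(multinomial_pmf([2, 2, 1], [0.6, 0.3, 0.1]), 4))   # 0.0972

# Example 2: 2 red, 1 blue, 1 white, drawn with replacement.
print(round(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]), 4))   # 0.18
```

With k = 2 the same function reduces to the binomial probability.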
Order Statistic
[Figure: probability distributions for the n = 5 order statistics of an exponential distribution with parameter 3.]
In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest
value. Together with rank statistics, order statistics are among the most fundamental
tools in non-parametric statistics and inference.
Important special cases of the order statistics are the minimum and maximum value
of a sample, and (with some qualifications discussed below) the sample median and
other sample quantiles.
When using probability theory to analyse order statistics of random samples from
a continuous distribution, the cumulative distribution function is used to reduce the
analysis to the case of order statistics of the uniform distribution.
READ:
1. Robert B. Ash, Lectures on Statistics, page 25 -26 and
Answer problems 1-4 on pg 26/27.
Ref: http://en.wikipedia.org/wiki/probability_distribution
Ref: http://en.wikipedia.org/wiki/Ranking
Ref: http://en.wikipedia.org/wiki/non-parametric_Statistics
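For example, suppose the observed sample values are 6, 9, 3, 8, denoted $x_1 = 6$, $x_2 = 9$, $x_3 = 3$, $x_4 = 8$, where the subscript records only the order of observation. The order statistics are then $x_{(1)} = 3$, $x_{(2)} = 6$, $x_{(3)} = 8$, $x_{(4)} = 9$.
The first order statistic (or smallest order statistic) is always the minimum of the
sample, that is,
\[ X_{(1)} = \min\{X_1, \ldots, X_n\}, \]
where, following a common convention, we use upper-case letters to refer to random
variables, and lower-case letters (as above) to refer to their actual observed values.
Similarly, for a sample of size n, the nth order statistic (or largest order statistic)
is the maximum, that is,
\[ X_{(n)} = \max\{X_1, \ldots, X_n\}. \]
The sample range is the difference between the maximum and minimum. It is clearly
a function of the order statistics:
\[ \operatorname{Range}\{X_1, \ldots, X_n\} = X_{(n)} - X_{(1)}. \]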
A similar important statistic in exploratory data analysis that is simply related to the
order statistics is the sample interquartile range.
The sample median may or may not be an order statistic, since there is a single middle value only when the number n of observations is odd. More precisely, if n = 2m
+ 1 for some m, then the sample median is $X_{(m+1)}$ and so is an order statistic. On the
other hand, when n is even, n = 2m and there are two middle values, $X_{(m)}$ and $X_{(m+1)}$,
and the sample median is some function of the two (usually the average) and hence
not an order statistic. Similar remarks apply to all sample quantiles.
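For the sample 6, 9, 3, 8 used above, the order statistics and the quantities derived from them can be computed as follows. This is a minimal sketch using only the Python standard library.

```python
from statistics import median

sample = [6, 9, 3, 8]

order_stats = sorted(sample)    # x_(1) <= x_(2) <= ... <= x_(n)
minimum = order_stats[0]        # first order statistic
maximum = order_stats[-1]       # nth order statistic
sample_range = maximum - minimum

print(order_stats)       # [3, 6, 8, 9]
print(sample_range)      # 6
print(median(sample))    # 7.0
```

Here n = 4 is even, so the median is the average of the two middle order statistics, 6 and 8, and is not itself an order statistic.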
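For a multivariate normal random vector $X = (X_1, \ldots, X_N)$ with mean vector $\mu$ and covariances $\sigma_{ij}$, the kth-order moments are defined by
\[ \mu_{r_1, \ldots, r_N}(X) = E\left[\prod_{j=1}^{N} X_j^{r_j}\right], \]
where $r_1 + r_2 + \cdots + r_N = k$. The kth-order central moments are as follows.
(a) If k is odd, $\mu_{1, \ldots, N}(X - \mu) = 0$.
(b) If k is even with $k = 2\lambda$, then
\[ \mu_{1, \ldots, 2\lambda}(X - \mu) = \sum \left(\sigma_{ij}\, \sigma_{k\ell} \cdots \sigma_{XZ}\right), \]
where the sum is taken over all allocations of the set $\{1, \ldots, 2\lambda\}$ into $\lambda$ (unordered) pairs, giving $(2\lambda - 1)! / (2^{\lambda - 1} (\lambda - 1)!)$ terms in the sum, each being the product
of $\lambda$ covariances. The covariances are determined by replacing the terms of the list
$[1, \ldots, 2\lambda]$ by the corresponding terms of the list consisting of $r_1$ ones, then $r_2$
twos, etc., after each of the possible allocations of the former list into pairs.
In particular, the fourth-order moments of four zero-mean variables are
\[ E[X_i X_j X_k X_n] = \sigma_{ij}\sigma_{kn} + \sigma_{ik}\sigma_{jn} + \sigma_{in}\sigma_{jk}. \]
For fourth-order moments (four variables) there are three terms. For sixth-order moments there are 3 × 5 = 15 terms, and for eighth-order moments there are 3 × 5 × 7
= 105 terms.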
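The term counts quoted above (three, 15 and 105) are the numbers of ways of splitting a list of 4, 6 or 8 items into unordered pairs. A minimal Python sketch, using a hypothetical recursive helper `pairings`, enumerates them:

```python
def pairings(items):
    """Yield every way of splitting a list of even length into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

for size in (4, 6, 8):
    print(size, len(list(pairings(list(range(1, size + 1))))))

# The three pairings of four indices, one per product of two covariances.
print(list(pairings([1, 2, 3, 4])))
```

Fixing the partner of the first item and recursing on the rest gives (2m - 1)(2m - 3)...(3)(1) pairings for 2m items, which is where the products 3 × 5 and 3 × 5 × 7 come from.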
128  157  144  135  165  161  138  146  146  168
135  150  140  142  138  142  147  176  142  147
145  140  154  149  152  156  125  148  119  153
150  144  163  134  136  145  173  164  158  126
Find:
a). the highest weight
b). the least weight
c). the range
d). construct a frequency distribution table starting with a class of 118-126
e). calculate the mean of the data
f). calculate the standard deviation
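The parts of this question can be checked with a short Python sketch using the 40 values listed above. The class width of 9 follows from the first class 118-126. The text does not say whether the population or the sample standard deviation is intended, so the use of `pstdev` (population form) is an assumption.

```python
from statistics import mean, pstdev

weights = [
    128, 157, 144, 135, 165, 161, 138, 146, 146, 168,
    135, 150, 140, 142, 138, 142, 147, 176, 142, 147,
    145, 140, 154, 149, 152, 156, 125, 148, 119, 153,
    150, 144, 163, 134, 136, 145, 173, 164, 158, 126,
]

highest, lowest = max(weights), min(weights)
data_range = highest - lowest

# Classes 118-126, 127-135, ..., 172-180 (inclusive limits, width 9).
classes = [(low, low + 8) for low in range(118, 173, 9)]
frequencies = [sum(low <= w <= high for w in weights) for low, high in classes]

print(highest, lowest, data_range)
for (low, high), f in zip(classes, frequencies):
    print(f"{low}-{high}: {f}")
print(mean(weights))
print(pstdev(weights))   # assumption: population standard deviation
```

Replacing `pstdev` with `statistics.stdev` gives the sample form with divisor n - 1.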
Question 2: General Probability
2) A). A coin and a die are thrown together. Draw a possibility space diagram and
find the probability of obtaining:
a). a head
b). a number greater than 4
c). a head and a number greater than 4
d). a head or a number greater than 4
B). $P(M) = \frac{19}{30}$, $P(N) = \frac{2}{5}$ and $P(M \cup N) = \frac{4}{5}$. Find $P(M \cap N)$.
i). no misprints
c). find the probability that pages 427 and 428 will contain no misprints
Question 4: Continuous Random Variable
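4) A continuous random variable (r.v.) X has a probability density function (p.d.f.)
f(x), where
\[ f(x) = \begin{cases} k(x + 2)^2 & -2 \le x < 0, \\ 4k & 0 \le x \le 1\tfrac{1}{3}, \\ 0 & \text{otherwise.} \end{cases} \]
a) Find the value of the constant k
b) Sketch y = f(x)
c) Find $P(-1 \le X \le 1)$
d) Find $P(X > 1)$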
Probability of an event
5). Given that $P(A \cup B) = 7/8$, $P(A \cap B) = 1/4$ and $P(A') = 5/8$, find the values of
a) P(A)
b) P(B)
c) $P(A \cap B)$
d) $P(A \cup B)$
Expected Value
6). The continuous random variable (r.v.) X has the p.d.f.
\[ f(x) = x + \tfrac{1}{2}, \quad 0 \le x \le 1. \]
Find:
a).E(X)
b).E(24X +6)
c). $E(\sqrt{1 - X})$
7). The masses, to the nearest kg, of 50 boys are recorded below.
Mass (kg)    Frequency (f)
60-64        2
65-69        6
70-74        12
75-79        14
80-84        10
85-89        6
i) Median
a) 176
b) 119
c) 176 - 119 = 57
d)
Weight (kg)   Tally             Frequency
118-126       ///               3
127-135       ////              4
136-144       ///// /////       10
145-153       ///// ///// //    12
154-162       /////             5
163-171       ////              4
172-180       //                2
Total                           40
Coin / Die   1    2    3    4    5    6
H            H1   H2   H3   H4   H5   H6
T            T1   T2   T3   T4   T5   T6
The sample space has 12 equally likely outcomes.
a). 6/12 = 1/2
b). 4/12 = 1/3
c). 2/12 = 1/6
d). 8/12 = 2/3
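The possibility space and the four probabilities can be reproduced by enumeration. This is a minimal Python sketch; the helper `pr` and the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", range(1, 7)))   # 12 equally likely (coin, die) outcomes

def pr(event):
    """Probability of an event given as a predicate on (coin, die) outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

head = lambda o: o[0] == "H"
greater_than_4 = lambda o: o[1] > 4

print(pr(head))                                       # a) 1/2
print(pr(greater_than_4))                             # b) 1/3
print(pr(lambda o: head(o) and greater_than_4(o)))    # c) 1/6
print(pr(lambda o: head(o) or greater_than_4(o)))     # d) 2/3
```

Part d) also follows from the addition rule: 1/2 + 1/3 - 1/6 = 2/3.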
\[ \frac{4}{5} = \frac{19}{30} + \frac{2}{5} - P(M \cap N) \]
\[ P(M \cap N) = \frac{19}{30} + \frac{12}{30} - \frac{24}{30} = \frac{7}{30} \]
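i). $P(X = 0) = e^{-1.5} = 0.2231$
ii). $P(X = 4) = e^{-1.5}\, \frac{(1.5)^4}{4!} = 0.0471$
c). We expect 1.5 misprints on each page, and so on the two pages 427 and 428 we
expect 1.5 + 1.5 = 3 misprints. Hence P(no misprints on the two pages) = $e^{-3}$ = 0.0498.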
= 0.4421
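The Poisson probabilities in this answer can be recomputed as follows. The sketch assumes, as the answer does, a mean of 1.5 misprints per page and independent pages; `poisson_pmf` is a hypothetical helper. Small differences in the last decimal place can arise from rounding.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

per_page = 1.5   # mean number of misprints per page, taken from the answer above

print(round(poisson_pmf(0, per_page), 4))       # no misprints on a page: 0.2231
print(round(poisson_pmf(4, per_page), 4))       # exactly 4 misprints: 0.0471
print(round(poisson_pmf(0, 2 * per_page), 4))   # none on two pages: 0.0498
```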
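4) a) Since $\int_{\text{all } x} f(x)\,dx = 1$,
\[ \int_{-2}^{0} k(x + 2)^2\,dx + \int_{0}^{1\frac{1}{3}} 4k\,dx = 1 \]
\[ \frac{k}{3}\left[(x + 2)^3\right]_{-2}^{0} + 4k\left[x\right]_{0}^{1\frac{1}{3}} = 1 \]
\[ \frac{k}{3}(8) + 4k\left(\frac{4}{3}\right) = 1 \]
\[ 8k = 1 \]
Therefore $k = \frac{1}{8}$.
The p.d.f. of X is $f(x) = \frac{1}{8}(x + 2)^2$ for $-2 \le x < 0$, $f(x) = \frac{1}{2}$ for $0 \le x \le 1\frac{1}{3}$, and 0 otherwise.
b) [Figure: sketch of y = f(x), rising along the curve $y = \frac{1}{8}(x + 2)^2$ from 0 at x = -2 to $\frac{1}{2}$ at x = 0, then constant at $y = \frac{1}{2}$ from x = 0 to x = $1\frac{1}{3}$.]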
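c) $P(-1 \le X \le 0) = \int_{-1}^{0} \frac{1}{8}(x + 2)^2\,dx = \frac{7}{24}$
and $P(0 \le X \le 1)$ = area of rectangle = $\frac{1}{2}$.
Therefore $P(-1 \le X \le 1) = \frac{7}{24} + \frac{1}{2} = \frac{19}{24}$.
d) $P(X > 1)$ = area of rectangle = $\frac{1}{3} \times \frac{1}{2} = \frac{1}{6}$.
Therefore $P(X > 1) = \frac{1}{6}$.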
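The values of k and of the two probabilities can be checked with exact arithmetic. The sketch below integrates the two pieces of the density in closed form; the helper names are hypothetical, and the upper limit 4/3 is the value used in the working above.

```python
from fractions import Fraction

def integral_quadratic(k, a, b):
    """Exact integral of k*(x + 2)**2 from a to b."""
    return k * (Fraction(b + 2) ** 3 - Fraction(a + 2) ** 3) / 3

def integral_constant(k, a, b):
    """Exact integral of the constant 4*k from a to b."""
    return 4 * k * (Fraction(b) - Fraction(a))

upper = Fraction(4, 3)   # the density is 4k on 0 <= x <= 4/3

# Total area with k = 1; the density must integrate to 1, so k is its reciprocal.
area_per_unit_k = integral_quadratic(1, -2, 0) + integral_constant(1, 0, upper)
k = 1 / area_per_unit_k
print(k)

p_c = integral_quadratic(k, -1, 0) + integral_constant(k, 0, 1)   # P(-1 <= X <= 1)
p_d = integral_constant(k, 1, upper)                              # P(X > 1)
print(p_c, p_d)
```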
5) a) $P(A) = 1 - P(A') = 1 - 5/8 = 3/8$
b) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, so 7/8 = 3/8 + P(B) - 1/4
P(B) = 3/4
= 1/8 + 1/2 = 5/8
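6). a). $E(X) = \int_0^1 x\left(x + \tfrac{1}{2}\right)dx = \frac{7}{12}$
b). $E(24X + 6) = 24E(X) + 6 = 20$
c). $E(\sqrt{1 - X}) = \int_0^1 (1 - x)^{1/2} \left(x + \tfrac{1}{2}\right)dx = \frac{3}{5}$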
c). Estimate of the 7th decile: $\frac{7}{10} \times 50$ = 35th value
d). Estimate of the 60th percentile: $\frac{60}{100} \times 50$ = 30th value
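Once the rank of the required value is known, the value itself can be estimated from the grouped table by linear interpolation within its class. The sketch below rests on two assumptions not stated in the answer: class boundaries lie half a kilogram either side of the class limits (masses are recorded to the nearest kg), and the rank is taken as the stated fraction of 50. The helper `grouped_value` is hypothetical.

```python
# (lower limit, upper limit, frequency) for the masses of the 50 boys
classes = [(60, 64, 2), (65, 69, 6), (70, 74, 12),
           (75, 79, 14), (80, 84, 10), (85, 89, 6)]

def grouped_value(position):
    """Estimate the value at a given rank by interpolating within its class."""
    cumulative = 0
    for low, high, freq in classes:
        if position <= cumulative + freq:
            lower_boundary = low - 0.5    # assumed class boundary
            width = high - low + 1        # 5 kg for every class here
            return lower_boundary + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("position exceeds the total frequency")

print(grouped_value(25))   # median: the 25th of 50 values
print(grouped_value(35))   # 35th value
print(grouped_value(30))   # 30th value
```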
XVII. References
http://en.wikipedia.org/wiki/Statistics
A Concise Course in A-Level Statistics, By J. Crawshaw and J. Chambers, Stanley
Thornes Publishers, 1994
http://en.wikipedia.org/wiki/Probability
Business Calculation and Statistics Simplified, By N.A. Saleemi, 2000
http://microblog.routed.net/wp-content/uploads/2007/01/onlinebooks.html
Statistics: concepts and applications, By Harry Frank and Steven C Althoen, Cambridge University Press, 2004
http://mathworld.wolfram.com/Statistics
http://mathworld.wolfram.com/Probability
Probability Demystified, By Allan G. Bluman, McGraw Hill, 2005.
http://directory.fsf.org/math/
http://microblog.routed.net/wp-content/uploads/2007/01/onlinebooks.html
Lectures on Statistics, By Robert B. Ash, 2005.
Introduction to Probability, By Charles M. Grinstead and J. Laurie Snell, Swarthmore
College.
http://directory.fsf.org/math/
Simple Statistics, By Frances Clegg, Cambridge University Press 1982.
Statistics for Advanced Level Mathematics, By I. Gwyn Evans University College
of Wales, 1984.