Statistical measures
Chapter Goals
After completing this chapter, you should be able
to:
Compute and interpret the mean, median, and mode for
a set of data
Compute the range, variance, and standard deviation
and know what these values mean
Construct and interpret a box and whiskers plot
Compute and explain the coefficient of variation and
z scores
Use numerical measures along with graphs, charts, and
tables to describe data
Chapter Topics
Coefficient of Variation
Measures of Center and Location
Overview
Center and Location
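Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population)
Weighted mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$ (sample), $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$ (population)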
Mean (Arithmetic Average)
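◦ Sample mean
n = Sample Size
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
◦ Population mean
N = Population Size
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$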
Mean (Arithmetic Average)
(continued)
[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
Weighted Mean
Is the arithmetic mean appropriate to a simple frequency distribution? Why?

xi      fi
10      2
12      8
13      17
14      5
16      1

Formula:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$
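As a quick check of the weighted mean formula, the sketch below applies it to the five (xi, fi) pairs from the table. The function name `weighted_mean` is only illustrative.

```python
# Weighted mean of a simple frequency distribution: sum(x_i * f_i) / sum(f_i).
values = [10, 12, 13, 14, 16]      # x_i
frequencies = [2, 8, 17, 5, 1]     # f_i


def weighted_mean(xs, fs):
    """Return sum(x_i * f_i) / sum(f_i)."""
    if len(xs) != len(fs):
        raise ValueError("values and frequencies must have the same length")
    total = sum(fs)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(x * f for x, f in zip(xs, fs)) / total


mean = weighted_mean(values, frequencies)
print(round(mean, 3))  # 423 / 33, about 12.818
```

Each value counts as many times as it occurs, so the result is pulled towards 13, the most frequent value.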
Example
(x): Number of newspapers/magazines/journals a student reads in a week
(f): Number of students

x       f       xf
0       12
1       18
2       30
3       20
4       15
5       5
Total   100
Weighted mean of a grouped
frequency distribution
Example: The following data relates to the
productivity of workers in a factory:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$
Where:
- x: mid-point as representative value of
each class
- f: frequency of each class
Weighted mean of a grouped
frequency distribution
Productivity (items/h)    Number of workers (fi)    xi    xifi
0-9                       15
10-19                     25
20-29                     30
30-39                     35
40-49                     28
50-59                     17
Total
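One way to fill in the xi and xifi columns is sketched below. It assumes each class mid-point is the average of the stated lower and upper limits (for example, 4.5 for the class 0-9). That convention is an assumption here, and a different one would shift the result slightly.

```python
# Grouped-data mean: use class mid-points as representative values.
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
workers = [15, 25, 30, 35, 28, 17]  # f_i

# Assumed convention: mid-point = (lower limit + upper limit) / 2
midpoints = [(low + high) / 2 for low, high in classes]

total_workers = sum(workers)
weighted_sum = sum(x * f for x, f in zip(midpoints, workers))
grouped_mean = weighted_sum / total_workers

print(midpoints)      # [4.5, 14.5, 24.5, 34.5, 44.5, 54.5]
print(total_workers)  # 150
print(weighted_sum)   # 4545.0
print(round(grouped_mean, 2))  # 30.3
```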
3. Geometric mean
$g_m = \sqrt[\sum f_i]{\prod_{i=1}^{n} (1 + p_i)^{f_i}}$
GEOMETRIC MEAN Example
Geometric mean
For example, suppose you have an investment which earns 10% the first year, 60% the second year, and 20% the third year. What is its average rate of return? (You cannot simply add the rates and divide, because each rate acts as a multiplying factor.)
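The investment example can be worked through as a sketch: convert each yearly return to a growth factor, multiply the factors, take the cube root, and subtract 1. The variable names are illustrative.

```python
import math

# Yearly returns from the example: 10%, 60%, 20%.
annual_returns = [0.10, 0.60, 0.20]

# Each return r acts as a growth factor (1 + r).
growth_factors = [1 + r for r in annual_returns]
compound_growth = math.prod(growth_factors)  # 1.1 * 1.6 * 1.2 = 2.112

# Geometric mean of the factors, converted back to a rate.
geometric_rate = compound_growth ** (1 / len(annual_returns)) - 1
arithmetic_rate = sum(annual_returns) / len(annual_returns)

print(f"geometric mean rate:  {geometric_rate:.1%}")   # 28.3%
print(f"arithmetic mean rate: {arithmetic_rate:.1%}")  # 30.0%
```

The arithmetic mean of 30% overstates the growth: three years at 30% would multiply the investment by 2.197, not by the actual 2.112.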
Median
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
Median for a simple
frequency distribution
Step 1: Find the middle item(s)
Step 2: Find the value(s) that correspond to the middle item(s)
$M_e = x_{m+1}$ (for an odd number of items, $n = 2m + 1$)
Example
xi      fi      Fi
1       8
2       15
3       20
4       13
5       9
Total   65

Median?
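The two steps can be sketched as follows: build the cumulative frequencies Fi, locate the middle item, and read off the value whose cumulative frequency first reaches that position. The helper `value_at` is a hypothetical name.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]          # x_i
frequencies = [8, 15, 20, 13, 9]  # f_i

# Cumulative frequencies F_i
cumulative = list(accumulate(frequencies))
n = cumulative[-1]


def value_at(position):
    """Return the value of the item at a 1-based position in the ordered data."""
    for x, cum_freq in zip(values, cumulative):
        if cum_freq >= position:
            return x
    raise ValueError("position is beyond the last item")


if n % 2 == 1:
    # Odd n: a single middle item at position (n + 1) / 2
    median = value_at((n + 1) // 2)
else:
    # Even n: average the two middle items
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(cumulative)  # [8, 23, 43, 56, 65]
print(median)      # 3
```

With n = 65 the middle item is the 33rd, which falls among the 20 items equal to 3 (positions 24 to 43).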
Median for a grouped
frequency distribution
Step 1: Find the middle item(s)
Step 2: Find the class(es) containing the
middle item(s)
Step 3: Estimating the median by formula
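$M_e = L_{M_e} + c_{M_e} \cdot \frac{\frac{\sum_{i=1}^{n} f_i}{2} - F_{M_e - 1}}{f_{M_e}}$
$L_{M_e}$: Lower limit of the median class
$c_{M_e}$: Width of the median class
$F_{M_e - 1}$: Cumulative frequency of the class before the median class
$f_{M_e}$: Frequency of the median class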
400-500 10
500-600 30
600-700 45
700-800 80
800-900 30
900-1000 5
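The sketch below applies the three steps to this table. It estimates the median as the lower limit of the median class, plus the class width times (half the total frequency minus the cumulative frequency before the class), divided by the class frequency. Variable names are illustrative.

```python
from itertools import accumulate

classes = [(400, 500), (500, 600), (600, 700),
           (700, 800), (800, 900), (900, 1000)]
frequencies = [10, 30, 45, 80, 30, 5]

cumulative = list(accumulate(frequencies))  # [10, 40, 85, 165, 195, 200]
half = sum(frequencies) / 2                 # position of the middle item

# Median class: first class whose cumulative frequency reaches n / 2
idx = next(i for i, cum_freq in enumerate(cumulative) if cum_freq >= half)
lower, upper = classes[idx]
width = upper - lower
cum_before = cumulative[idx - 1] if idx > 0 else 0

median = lower + width * (half - cum_before) / frequencies[idx]

print(classes[idx])  # (700, 800)
print(median)        # 718.75
```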
Mode
[Figure: two dot plots; left (0 to 14 axis): Mode = 5; right (0 to 6 axis): No Mode]
The mode of a simple
frequency distribution
400-500 10
500-600 30
600-700 45
700-800 80
800-900 30
900-1000 5
Mode?
Review Example
House prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
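The three measures of center for the five house prices can be checked with Python's standard `statistics` module, as a minimal sketch:

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # sum / count
median_price = statistics.median(house_prices)  # middle of the ordered values
mode_price = statistics.mode(house_prices)      # most frequent value

print(f"mean   = ${mean_price:,.0f}")    # $600,000
print(f"median = ${median_price:,.0f}")  # $300,000
print(f"mode   = ${mode_price:,.0f}")    # $100,000
```

The single $2,000,000 house pulls the mean well above the median, which illustrates how sensitive the mean is to extreme values.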
Summary Statistics
Left-skewed: Mean < Median < Mode (longer tail extends to left)
Symmetric: Mean = Median = Mode
Right-skewed: Mode < Median < Mean (longer tail extends to right)
Other Location Measures
Other Measures
of Location
Percentiles Quartiles
Q1 Q2 Q3
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position
so use the value half way between the 2nd and 3rd values,
so Q1 = 12.5
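The position rule used in this example, (p/100)(n + 1) with linear interpolation between neighbouring values, can be sketched as a small function. The name `percentile` is illustrative. Other software may use a different percentile convention and return slightly different values.

```python
def percentile(data, p):
    """Value at the (p/100)*(n+1) position of the ordered data,
    interpolating linearly between the two neighbouring values."""
    ordered = sorted(data)
    n = len(ordered)
    position = p / 100 * (n + 1)  # 1-based position in the ordered array
    below = int(position)         # whole part of the position
    fraction = position - below   # how far towards the next value
    if below < 1:
        return ordered[0]
    if below >= n:
        return ordered[-1]
    return ordered[below - 1] + fraction * (ordered[below] - ordered[below - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile(data, 25)
print(q1)  # 12.5
```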
Box and Whisker Plot
Example:
[Figure: three box-and-whisker plots, each marking Q1, Q2, and Q3]
Box-and-Whisker Plot Example
[Figure: box-and-whisker plot with a long right whisker extending to 27]
These data are strongly right-skewed, as the plot shows
NOTE
Variation
Sample Variance
Sample Standard Deviation
Variation
Measures of variation give
information on the spread or
variability of the data values.
Same center,
different variation
Range
Example:
[Figure: dot plot on a 0 to 14 axis; smallest value 1, largest value 14]
Range = 14 - 1 = 13
Disadvantages of the Range
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
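A short sketch of the same comparison: the two data sets differ only in their last value, yet the range jumps from 4 to 119.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)


base = [1] * 11 + [2] * 8 + [3] * 4 + [4]  # the 24 values both sets share
without_outlier = base + [5]
with_outlier = base + [120]

print(value_range(without_outlier))  # 4
print(value_range(with_outlier))     # 119
```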
Interquartile Range
Example:
X minimum    Q1    Median (Q2)    Q3    X maximum
12           30    45             57    70
(25% of the data lie in each of the four segments)
Interquartile range = 57 – 30 = 27
2. The mean deviation
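$\bar{d} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Formula for a frequency distribution:
$\bar{d} = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{\sum_{i=1}^{k} f_i}$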
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42
$\bar{d}_A = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
$\bar{d}_B = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
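As a sketch, the mean deviation of each group can be computed directly from the definition: the average absolute distance from the group mean. The function name is illustrative.

```python
def mean_deviation(data):
    """Average absolute deviation from the arithmetic mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

d_a = mean_deviation(group_a)
d_b = mean_deviation(group_b)

print(d_a)  # 12.0
print(d_b)  # 1.2
```

Both groups have a mean of 40, but the values in Group A lie ten times further from it on average.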
Example
The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory. Mean deviation?

Productivity (kg/person)    Number of workers
<10                         7
10 – 20                     18
20 – 30                     25
30 – 35                     20
35 – 40                     18
≥ 40                        12
Total                       100
Characteristics of the mean
deviation
A better measure of dispersion than the
range
Useful for comparing the variability
between distributions
Can be complicated to calculate in
practice if the mean is anything other
than a whole number.
Variance
◦ Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
◦ Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
For a frequency distribution
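$\sigma^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}$
or
$\sigma^2 = \frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2 = \overline{x^2} - (\bar{x})^2$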
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42
$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
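A minimal sketch of this calculation for the two groups, dividing the sum of squared deviations by n (not n - 1), as in the formula for this example:

```python
def variance_divide_by_n(data):
    """Sum of squared deviations from the mean, divided by n."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

var_a = variance_divide_by_n(group_a)
var_b = variance_divide_by_n(group_b)

print(var_a)  # 200.0
print(var_b)  # 2.0
```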
Example
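Standard Deviation
◦ Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
◦ Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$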
For a frequency distribution
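$\sigma = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}}$
or
$\sigma = \sqrt{\frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2} = \sqrt{\overline{x^2} - (\bar{x})^2}$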
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42
$\sigma = \sqrt{\sigma^2}$
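For the two groups, the standard deviation is the square root of the variance. The sketch below uses `statistics.pstdev` from the Python standard library, which divides by n. Treating each group as a complete population is an assumption made here for illustration.

```python
import statistics

group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

# pstdev divides by n (population form); stdev would divide by n - 1.
sd_a = statistics.pstdev(group_a)
sd_b = statistics.pstdev(group_b)

print(round(sd_a, 3))  # 14.142 (square root of 200)
print(round(sd_b, 3))  # 1.414 (square root of 2)
```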
Example
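Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$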
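The sample standard deviation for these eight values can be verified with a short sketch. The squared deviations from the mean of 16 are 36, 16, 4, 1, 1, 4, 4, and 64, which sum to 130.

```python
import statistics

sample = [10, 12, 14, 15, 17, 18, 18, 24]

mean = statistics.mean(sample)
sum_sq_dev = sum((x - mean) ** 2 for x in sample)

# stdev divides by n - 1 (sample form)
s = statistics.stdev(sample)

print(f"mean = {mean:.0f}")                             # 16
print(f"sum of squared deviations = {sum_sq_dev:.0f}")  # 130
print(f"s = {s:.4f}")                                   # 4.3095
```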
Comparing Standard Deviations
[Figure: three dot plots on a common 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Comparing Coefficient
of Variation
Stock A:
◦ Average price last year = $50
◦ Standard deviation = $5
$CV_A = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
Stock B:
◦ Average price last year = $100
◦ Standard deviation = $5
$CV_B = \frac{s}{\bar{x}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
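The stock comparison can be reproduced with a small helper. The name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(f"CV_A = {cv_a:.0f}%")  # 10%
print(f"CV_B = {cv_b:.0f}%")  # 5%
```

Because the result is a percentage with no units, it can also be used to compare variables measured on different scales.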
Standardized Data Values
A standardized data value refers to
the number of standard deviations a
value is from the mean
Standardized data values are
sometimes referred to as z-scores
Useful to compare data from two or
more distributions when data scales
for these distributions are different.
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
where:
x = original data value
μ = population mean
σ = population standard deviation
z = standard score
(number of standard deviations x is from μ)
Standardized Sample Values
$z = \frac{x - \bar{x}}{s}$
where:
x = original data value
$\bar{x}$ = sample mean
s = sample standard deviation
z = standard score
(number of standard deviations x is from $\bar{x}$)
Example
1. What percentage (or number) of students scored higher than Sarah, and what percentage (or number) scored lower? To recap, Sarah scored 70 out of 100, the mean score was 60, and the standard deviation was 15.

                      Score (X)    Mean (µ)    Standard Deviation (s)
English Literature    70           60          15

Sarah's z-score is $z = \frac{70 - 60}{15} \approx 0.67$. We now need to work out the percentage (or number) of students who scored higher and lower than Sarah.
From the z-score table, the upper-tail area for z = 0.67 is 0.2514, which means about 25% of the class got a better mark than Sarah (see z table). So, to answer question 1: Sarah performed better than a large proportion of the other students, because 74.86% of the class scored lower than her. The key finding, however, is that Sarah's score of 70 marks was not one of the best, because about 25% of the class got higher marks than her.
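The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch that assumes the marks are normally distributed (the same assumption the z table relies on) and rounds z to two decimals, as a printed table does.

```python
from statistics import NormalDist

score, mean, sd = 70, 60, 15

z = (score - mean) / sd  # 0.666...
z_rounded = round(z, 2)  # 0.67, as read from a printed z table

standard_normal = NormalDist()  # mean 0, standard deviation 1
share_lower = standard_normal.cdf(z_rounded)
share_higher = 1 - share_lower

print(f"z = {z_rounded:.2f}")
print(f"scored lower:  {share_lower:.4f}")   # 0.7486
print(f"scored higher: {share_higher:.4f}")  # 0.2514
```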
Z-SCORE
2. Which students came in the top 10% of the class? To rephrase: what mark needs to be achieved to be in the top 10% and thus qualify for the advanced class?
From the z-score table, the z-score that leaves 10% of the class in the upper tail is about 1.28 (more precisely, 1.282).

Score (X)    Mean (µ)    Standard Deviation (s)    z-score (z)
?            60          15                        1.282

To find the corresponding score, we rearrange the z-score formula:
$X = \mu + z \cdot s = 60 + 1.282 \times 15 = 79.23$
Therefore, students who scored above 79.23 marks out of 100 came in the top 10% of the class and qualified for the advanced class.
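As a sketch, the cut-off mark can also be obtained from the inverse of the normal cumulative distribution, again assuming normally distributed marks. The unrounded z-score gives a cut-off of about 79.22, while the table value of 1.282 gives 79.23.

```python
from statistics import NormalDist

mean, sd = 60, 15
top_share = 0.10  # top 10% of the class

# z-score with 90% of the distribution below it
z = NormalDist().inv_cdf(1 - top_share)

# Rearranged z-score formula: X = mean + z * sd
cutoff = mean + z * sd
cutoff_from_table = mean + 1.282 * sd  # using the rounded table value

print(f"z = {z:.3f}")                                         # 1.282
print(f"cut-off = {cutoff:.2f}")                              # 79.22
print(f"cut-off (table value) = {cutoff_from_table:.2f}")     # 79.23
```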
Using Excel
Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Chapter Summary