C 3

Lesson 3
Statistical measures
Chapter Goals
After completing this chapter, you should be able
to:
 Compute and interpret the mean, median, and mode for
a set of data
 Compute the range, variance, and standard deviation
and know what these values mean
 Construct and interpret a box and whiskers plot
 Compute and explain the coefficient of variation and
z scores
 Use numerical measures along with graphs, charts, and
tables to describe data
Chapter Topics
 Measures of Center and Location

◦ Mean, median, mode, geometric mean,
midrange, weighted mean
 Other measures of Location (position)
◦ percentiles, quartiles
 Measures of Variation
◦ Range, interquartile range, variance and
standard deviation, coefficient of variation
Summary Measures
Describing Data Numerically
• Center and Location: Mean, Median, Mode, Weighted Mean
• Other Measures of Location: Percentiles, Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
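Measures of Center and Location
Overview
Center and Location: Mean, Median, Mode, Weighted Mean
• Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population)
• Weighted Mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$ (sample), $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$ (population)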
Mean (Arithmetic Average)
• The Mean is the arithmetic average of data values
◦ Sample mean (n = Sample Size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
◦ Population mean (N = Population Size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Mean (Arithmetic Average)
(continued)
 The most common measure of central tendency

 Mean = sum of values divided by the number of
values
 Affected by extreme values (outliers)
[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$ and $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
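A minimal Python sketch of the outlier effect, using the two five-value data sets from this slide. The variable names are illustrative only.

```python
from statistics import mean

values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, with 5 replaced by the extreme value 10

mean_plain = mean(values)          # 15 / 5
mean_outlier = mean(with_outlier)  # 20 / 5
print(mean_plain, mean_outlier)
```

Changing a single value moves the mean from 3 to 4, which is what "affected by extreme values" means in practice.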
Weighted Mean
 Used when values are grouped by

frequency or relative importance
Example: Sample of 26 Repair Projects

Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31$ days
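The repair-project calculation can be sketched in Python as follows. The helper name `weighted_mean` is an assumption made for illustration, not part of the course material.

```python
days = [5, 6, 7, 8]          # days to complete
frequency = [4, 12, 8, 2]    # number of projects (the weights)

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

xw = weighted_mean(days, frequency)
print(round(xw, 2))  # 164 / 26
```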
2. Weighted mean
 Simple frequency distribution

 Grouped frequency distribution
Weighted mean of a simple frequency distribution
xi    fi
10    2
12    8
13    17
14    5
16    1
• Is the arithmetic mean appropriate to a simple frequency distribution?
• Why?
• Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$
Example
(x): Number of newspapers/magazines/journals a student read a week
(f): Number of students
x        f      xf
0        12
1        18
2        30
3        20
4        15
5        5
Total    100
Weighted mean of a grouped frequency distribution
• Example: The following data relates to the productivity of workers in a factory:
Productivity (items/h)    0-9    10-19    20-29    30-39    40-49    50-59
Number of workers         15     25       30       35       28       17
• Formula: $\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$
• Where:
- x: mid-point as representative value of each class
- f: frequency of each class
Productivity (items/h)    Number of workers    xi    xifi
0-9                       15
10-19                     25
20-29                     30
30-39                     35
40-49                     28
50-59                     17
Total
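A short Python sketch of the grouped-data calculation for the productivity table. It assumes each class is represented by the midpoint of its stated limits (for example 4.5 for the class 0-9); a different midpoint convention would shift the result slightly.

```python
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
workers = [15, 25, 30, 35, 28, 17]

# Assumption: the class midpoint stands in for every worker in the class.
midpoints = [(low + high) / 2 for low, high in classes]

grouped_mean = sum(x * f for x, f in zip(midpoints, workers)) / sum(workers)
print(midpoints)
print(grouped_mean)
```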
3. Geometric mean
 Applicable when the products of data

values are meaningful
 Proportional increases and multipliers:
 Example:
The number of students attending the
music class last Tuesday was 160. This
Tuesday, the number is expected to
increase by 15%.
How many of them are likely to attend
this Tuesday?
3. Geometric mean
 A specialized measure, used to average

proportional increases.
 Formula:
- Step 1: Express the proportional increases
(p) as proportional multipliers (1+p)
3. Geometric mean (continued)
- Step 2: Calculate the geometric mean multiplier
(i) Simple geometric mean multiplier: applied when each proportional increase appears once only
$gmm = \sqrt[n]{(1 + p_1)(1 + p_2) \cdots (1 + p_n)}$
(ii) Weighted geometric mean multiplier: applied when a proportional increase appears more than once (with frequency $f_i$)
$gmm = \sqrt[\sum_{i=1}^{n} f_i]{(1 + p_1)^{f_1} (1 + p_2)^{f_2} \cdots (1 + p_n)^{f_n}} = \sqrt[\sum_{i=1}^{n} f_i]{\prod_{i=1}^{n} (1 + p_i)^{f_i}}$
- Step 3: Subtract 1 from the gm multiplier to obtain the average proportional increase
average proportional increase = gm multiplier - 1

Example
• The number of bankers of a small bank over the period 2000-2006 is presented in the table below:
Year                        2000    2001    2002    2003    2004    2005    2006
No of bankers               200     220     250     262     284     300     312
Proportional multipliers
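One way to fill in the multipliers row and apply the three steps is sketched below in Python. Each multiplier is the ratio of a year's head count to the previous year's, and since each appears once the simple geometric mean multiplier is used. Variable names are illustrative.

```python
from math import prod

bankers = [200, 220, 250, 262, 284, 300, 312]  # 2000 to 2006

# Step 1: proportional multipliers (1 + p) between consecutive years
multipliers = [later / earlier for earlier, later in zip(bankers, bankers[1:])]

# Step 2: simple geometric mean multiplier (nth root of the product)
gmm = prod(multipliers) ** (1 / len(multipliers))

# Step 3: average proportional increase
avg_increase = gmm - 1
print([round(m, 3) for m in multipliers])
print(round(gmm, 4), round(avg_increase * 100, 1))
```

The product of the six multipliers collapses to 312/200, so the result depends only on the first and last values and the number of periods.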
Geometric Mean Example
For example, suppose you have an investment which earns 10% the first year, 60% the second year, and 20% the third year. What is its average rate of return? (You cannot just add and divide, because each rate acts as a multiplicative factor.)
It is not the arithmetic mean, because what these numbers mean is that in the first year your investment was multiplied (not added to) by 1.10, in the second year it was multiplied by 1.60, and in the third year it was multiplied by 1.20.
The relevant quantity is the geometric mean of these three numbers:
$(1.10 \times 1.60 \times 1.20)^{1/3}$ (the power of 1/3 is the cube root)
If you calculate this geometric mean you get approximately 1.283, so the average rate of return is about 28% (not 30%, which is what the arithmetic mean of 10%, 60%, and 20% would give you).
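The investment example can be checked with Python's standard library. This is only a sketch; `statistics.geometric_mean` requires Python 3.8 or later.

```python
from statistics import geometric_mean, mean

rates = [0.10, 0.60, 0.20]
growth_factors = [1 + r for r in rates]   # 1.10, 1.60, 1.20

gm = geometric_mean(growth_factors)       # cube root of 1.10 * 1.60 * 1.20
average_return = gm - 1
naive_average = mean(rates)               # arithmetic mean, shown for contrast

print(round(gm, 3), round(average_return * 100, 1), round(naive_average * 100, 1))
```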
Median
• Not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis, both with Median = 3]
• In an ordered array, the median is the “middle” number
Notes
• When a data set contains an even number of items (2m), there are two middle items: the mth one and the (m+1)th one
• When a data set contains an odd number of items (2m+1), the middle item is the (m+1)th one
Median for a simple frequency distribution
• Step 1: Find the middle item(s)
• Step 2: Find the value(s) that correspond to the middle item(s)
- For an even-number data set (2m), the median is the average value of the mth item and the (m+1)th item: $M_e = \frac{x_m + x_{m+1}}{2}$
- For an odd-number data set (2m+1), the median is the value of the (m+1)th item: $M_e = x_{m+1}$
Example
xi       fi    Fi
1        8
2        15
3        20
4        13
5        9
Total    65
Median?
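A possible Python sketch of the two steps for this table: build the cumulative frequencies Fi, locate the middle item, and read off the value whose cumulative frequency first reaches it. The helper `value_at` is hypothetical.

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5]
f = [8, 15, 20, 13, 9]
F = list(accumulate(f))   # cumulative frequencies
n = F[-1]                 # total number of items

def value_at(position):
    """Value of the item at a 1-based position in the ordered data."""
    for xi, Fi in zip(x, F):
        if position <= Fi:
            return xi

if n % 2 == 1:            # odd: n = 2m + 1, middle item is the (m+1)th
    median = value_at((n + 1) // 2)
else:                     # even: n = 2m, average the mth and (m+1)th items
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(F, median)
```

With 65 items the middle item is the 33rd, which falls in the row where the cumulative frequency first reaches 33.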
Median for a grouped frequency distribution
• Step 1: Find the middle item(s)
• Step 2: Find the class(es) containing the middle item(s)
• Step 3: Estimate the median by the formula
$M_e = L_{M_e} + c_{M_e} \cdot \frac{\frac{1}{2}\sum_{i=1}^{n} f_i - F_{M_e - 1}}{f_{M_e}}$
where
$L_{M_e}$: Lower limit of the median class
$F_{M_e - 1}$: Cumulative frequency of the class immediately prior to the median class
$f_{M_e}$: Actual frequency of the median class
$c_{M_e}$: Median class width
Example
 Amount of food per person in province A
Amount of food (kg/person) Number of people
400-500 10
500-600 30
600-700 45
700-800 80
800-900 30
900-1000 5
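The grouped-median formula applied to the food table can be sketched in Python as below. It assumes all classes have the same width of 100 kg, as in the table, and the variable names are illustrative.

```python
lower_limits = [400, 500, 600, 700, 800, 900]
width = 100                       # every class spans 100 kg/person
people = [10, 30, 45, 80, 30, 5]

half = sum(people) / 2            # position of the middle of the distribution
cumulative = 0                    # cumulative frequency before the current class
for lower, freq in zip(lower_limits, people):
    if cumulative + freq >= half:             # this is the median class
        median = lower + width * (half - cumulative) / freq
        break
    cumulative += freq

print(median)
```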
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two dot plots, one on a 0 to 14 axis with Mode = 5 and one on a 0 to 6 axis with No Mode]
The mode of a simple frequency distribution
• Mode is the value which has the largest frequency
xi    fi
0     10
1     15
2     17
3     20
4     18
5     9
Mode?
The mode of a grouped frequency distribution
• Step 1: Find the modal class
• Step 2: Estimate the mode by the formula
$M_0 = L_{M_0} + c_{M_0} \cdot \frac{f_{M_0} - f_{M_0 - 1}}{(f_{M_0} - f_{M_0 - 1}) + (f_{M_0} - f_{M_0 + 1})}$
where
$L_{M_0}$: Lower limit of the modal class
$c_{M_0}$: Modal class width
$f_{M_0}$: Frequency (or distribution density) of the modal class
$f_{M_0 - 1}$: Frequency (or distribution density) of the class immediately prior to the modal class
$f_{M_0 + 1}$: Frequency (or distribution density) of the class immediately following the modal class
Example
 Amount of food per person in province A
Amount of food (kg/person) Number of people
400-500 10
500-600 30
600-700 45
700-800 80
800-900 30
900-1000 5
Mode?
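A Python sketch of the grouped-mode estimate for the same food table. Because the classes have equal width, plain frequencies are used here; with unequal widths the distribution density would be needed instead, which this sketch does not handle.

```python
lower_limits = [400, 500, 600, 700, 800, 900]
width = 100
people = [10, 30, 45, 80, 30, 5]

k = people.index(max(people))     # index of the modal class
before = people[k - 1] if k > 0 else 0
after = people[k + 1] if k < len(people) - 1 else 0

d_prev = people[k] - before       # excess over the preceding class
d_next = people[k] - after        # excess over the following class
mode = lower_limits[k] + width * d_prev / (d_prev + d_next)

print(lower_limits[k], round(mode, 2))
```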
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
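The three summary statistics for the house prices can be reproduced with Python's `statistics` module; a minimal sketch:

```python
from statistics import mean, median, mode

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

price_mean = mean(prices)      # pulled upward by the $2,000,000 house
price_median = median(prices)  # middle value of the ranked data
price_mode = mode(prices)      # most frequent value

print(price_mean, price_median, price_mode)
```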
Which measure of location is the “best”?
• Mean is generally used, unless extreme values (outliers) exist
• Then the median or mode is often used, since neither is sensitive to extreme values
◦ Example: Median home prices may be reported for a region because the median is less sensitive to outliers
Shape of a Distribution
 Describes how data is distributed

 Symmetric or skewed
Left-Skewed: Mean < Median < Mode (longer tail extends to left)
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean (longer tail extends to right)
Other Location Measures
Other Measures of Location: Percentiles, Quartiles
Percentiles
The pth percentile in a data array (where 0 ≤ p ≤ 100):
• p% are less than or equal to this value
• (100 – p)% are greater than or equal to this value
Quartiles
• 1st quartile = 25th percentile
• 2nd quartile = 50th percentile = median
• 3rd quartile = 75th percentile
Percentiles
• The pth percentile in an ordered array of n values is the value in the ith position, where
$i = \frac{p}{100}(n + 1)$
• Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$
QUESTION
• Suppose you received a score of 91 on your statistics test. Does this mean you scored in the 91st percentile?
• Suppose your test score put you in the 80th percentile. What percentage of students had a better test score than you?
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc.
Quartiles
• Quartiles split the ranked data into 4 equal groups
[Figure: ranked data split into four segments of 25% each by Q1, Q2 and Q3]
• Example: Find the first quartile
Sample Data in Ordered Array (n = 9): 11 12 13 16 16 17 18 21 22
Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
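The position rule i = p(n + 1)/100 can be sketched as a small Python function. The name `percentile` and the use of linear interpolation for any fractional position are assumptions; the slides only show the halfway case. Note that library routines such as spreadsheet or NumPy percentile functions may use other conventions and give slightly different values.

```python
def percentile(ordered, p):
    """Value at position i = p(n + 1)/100 in an ordered list (1-based),
    interpolating linearly between neighbours for fractional positions."""
    n = len(ordered)
    i = p * (n + 1) / 100
    lo = int(i)                   # whole part of the position
    if lo < 1:
        return ordered[0]
    if lo >= n:
        return ordered[-1]
    return ordered[lo - 1] + (i - lo) * (ordered[lo] - ordered[lo - 1])

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile(sample, 25)       # position 2.5

nineteen = list(range(1, 20))     # 19 ordered values: 1, 2, ..., 19
p60 = percentile(nineteen, 60)    # position 12
print(q1, p60)
```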
Box and Whisker Plot
• A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: box and whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile and Maximum, with 25% of the data in each of the four segments]
Shape of Box and Whisker Plots
 The Box and central line are centered between

the endpoints if data is symmetric around the
median
 A Box and Whisker plot can be shown in either

vertical or horizontal format
Distribution Shape and Box and Whisker Plot
[Figure: box and whisker plots with Q1, Q2 and Q3 marked for Left-Skewed, Symmetric and Right-Skewed distributions]
Box-and-Whisker Plot Example
• Below is a Box-and-Whisker plot for the following data:
0 2 2 2 3 3 4 5 5 10 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
[Figure: box-and-whisker plot with markers at 0, 2, 3, 5 and 27]
• This data is very right skewed, as the plot depicts
NOTE
• The lower limit for boxplots is Q1 – 1.5(Q3 – Q1) and the upper limit is Q3 + 1.5(Q3 – Q1).
• Any data values outside these limits are referred to as outliers and marked with an asterisk (*)
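The limits in this note can be applied to the eleven-value data set from the box-and-whisker example. The sketch below assumes quartiles are located with the p(n + 1)/100 position rule from the percentiles slide; the helper `percentile` is hypothetical.

```python
def percentile(ordered, p):
    """Value at position i = p(n + 1)/100, interpolating between neighbours."""
    n = len(ordered)
    i = p * (n + 1) / 100
    lo = int(i)
    if lo < 1:
        return ordered[0]
    if lo >= n:
        return ordered[-1]
    return ordered[lo - 1] + (i - lo) * (ordered[lo] - ordered[lo - 1])

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]   # already in order
q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q3, lower_limit, upper_limit, outliers)
```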

Measures of Variation
Variation:
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation
Variation
 Measures of variation give
information on the spread or
variability of the data values.
Same center,
different variation
Range
 Simplest measure of variation

 Difference between the largest and the
smallest observations:
Range = $x_{maximum} - x_{minimum}$
Example:
[Figure: dot plot on a 0 to 14 axis with smallest value 1 and largest value 14]
Range = 14 - 1 = 13
Disadvantages of the Range
 Ignores the way in which data are

distributed
[Figure: two differently distributed data sets on a 7 to 12 axis, each with Range = 12 - 7 = 5]
 Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile Range
 Can eliminate some outlier problems by

using the interquartile range
 Eliminate some high-and low-valued

observations and calculate the range from
the remaining values.
 Interquartile range = 3rd quartile – 1st

quartile
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the data in each segment)
Interquartile range = 57 – 30 = 27
2. The mean deviation
• The mean deviation is a measure of dispersion that gives the average absolute difference (i.e. ignoring ‘-’ signs) between each item and the mean.
• Formula:
- For a data set: $d = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
- For a frequency distribution: $d = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{\sum_{i=1}^{k} f_i}$
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42
$d_A = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$, $d_B = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
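A short Python sketch of the mean deviation for the two groups. The function name `mean_deviation` is illustrative.

```python
def mean_deviation(data):
    """Average absolute difference between each item and the mean."""
    centre = sum(data) / len(data)
    return sum(abs(x - centre) for x in data) / len(data)

group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

d_a = mean_deviation(group_a)
d_b = mean_deviation(group_b)
print(d_a, d_b)
```

Both groups have the same mean of 40, yet Group A's mean deviation is ten times that of Group B.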
Example
• The data in the table below relates to the productivity (kg/person) of 100 workers in a small factory
Productivity (kg/person)    Number of workers
<10                         7
10 – 20                     18
20 – 30                     25
30 – 35                     20
35 – 40                     18
≥ 40                        12
Total                       100
Mean deviation?
Characteristics of the mean
deviation
 A better measure of dispersion than the
range
 Useful for comparing the variability
between distributions
 Can be complicated to calculate in
practice if the mean is anything other
than a whole number.
Variance
• Average of squared deviations of values from the mean
◦ Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
◦ Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
For a frequency distribution
$\sigma^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}$
or $\sigma^2 = \frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2 = \overline{x^2} - (\bar{x})^2$
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42
$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
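The variance of the two groups can be sketched in Python using the divide-by-n form shown in this example, computed both from the definition and from the shortcut "mean of squares minus square of the mean". Function names are illustrative.

```python
def variance_definition(data):
    """Average squared deviation from the mean (divide by n)."""
    centre = sum(data) / len(data)
    return sum((x - centre) ** 2 for x in data) / len(data)

def variance_shortcut(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

var_a = variance_definition(group_a)
var_b = variance_definition(group_b)
print(var_a, var_b, variance_shortcut(group_a), variance_shortcut(group_b))
```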
Example
• The data in the table below relates to the productivity (kg/person) of 100 workers in a small factory
Productivity (kg/person)    Number of workers
< 10                        7
10 – 20                     18
20 – 30                     25
30 – 35                     20
35 – 40                     18
≥ 40                        12
Total                       100
Variance?
Characteristics of the variance
• A better measure of dispersion than the range
• Complicated, since it squares the deviations
• The unit of the variance (the square of the data's unit) is not meaningful
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
◦ Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
◦ Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
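For a frequency distribution
$\sigma = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}}$
or $\sigma = \sqrt{\frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2} = \sqrt{\overline{x^2} - (\bar{x})^2}$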
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42
$\sigma = \sqrt{\sigma^2}$
Example
• The data in the table below relates to the productivity (kg/person) of 100 workers in a small factory
Productivity (kg/person)    Number of workers
< 10                        7
10 – 20                     18
20 – 30                     25
30 – 35                     20
35 – 40                     18
≥ 40                        12
Total                       100
Standard deviation?
Characteristics of Standard Deviation
• Can be regarded as one of the most useful and appropriate measures of dispersion.
• For distributions that are not too skewed (roughly bell-shaped):
- about 99.7% of the data items should lie within three standard deviations of the mean
- about 95% of the data items should lie within two standard deviations of the mean
- about 68% of the data items should lie within one standard deviation of the mean
Calculation Example: Sample Standard Deviation
Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$
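As a cross-check of this calculation, here is a sketch using `statistics.stdev`, which divides by n - 1. The squared deviations of the eight values from the mean of 16 are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130.

```python
from statistics import mean, stdev

sample = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(sample)
squared_deviations = [(x - x_bar) ** 2 for x in sample]
s = stdev(sample)   # sample standard deviation (n - 1 in the denominator)

print(x_bar, sum(squared_deviations), round(s, 4))
```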
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Comparing Coefficient of Variation
• Stock A:
◦ Average price last year = $50
◦ Standard deviation = $5
$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock B:
◦ Average price last year = $100
◦ Standard deviation = $5
$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
Standardized Data Values
 A standardized data value refers to
the number of standard deviations a
value is from the mean
 Standardized data values are
sometimes referred to as z-scores
 Useful to compare data from two or
more distributions when data scales
for these distributions are different.
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
where:
 x = original data value
 μ = population mean
 σ = population standard deviation
 z = standard score
(number of standard deviations x is from μ)
Standardized Sample Values
$z = \frac{x - \bar{x}}{s}$
where:
• x = original data value
• $\bar{x}$ = sample mean
• s = sample standard deviation
• z = standard score
(number of standard deviations x is from $\bar{x}$)
Example
 A national achievement test is

administered annually to 3rd graders. The
test has a mean score of 100 and a
standard deviation of 15. If Jane's z-score
is 1.20, what was her score on the test?

EXAMPLE 1
• Mary receives a score of 43 on a statistics test. The teacher says Mary’s test score is equivalent to a z-score of 2.1 compared to her classmates. How well did Mary do?
EXAMPLE 2
• Consider a company that uses placement exams as part of its hiring process. The company will accept scores from either of two tests: AIMS Hiring and BHS Screen. The problem is that the AIMS Hiring test has an average score of 2,000 and a standard deviation of 200, whereas the BHS Screen test has an average score of 80 with a standard deviation of 12. How can the company compare two applicants, John and Mary, if John took the AIMS Hiring test and scored 2,344 and Mary took the BHS Screen and scored 95?
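A possible Python sketch for the examples above: it converts Jane's z-score back to a raw score and puts John's and Mary's scores on a common scale. The helper names `z_score` and `raw_score` are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev

def raw_score(z, mean, std_dev):
    """Invert the z-score formula: x = mean + z * std_dev."""
    return mean + z * std_dev

jane = raw_score(1.20, mean=100, std_dev=15)     # national achievement test

z_john = z_score(2344, mean=2000, std_dev=200)   # AIMS Hiring
z_mary = z_score(95, mean=80, std_dev=12)        # BHS Screen
print(round(jane, 2), z_john, z_mary)
```

Because the z-scores are unit-free, the applicant with the larger z-score did better relative to the other people taking the same test.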
EXAMPLE 3
• A teacher sets homework for 50 students. The mean score is 60 out of 100 and the standard deviation is 15 marks. Sarah asks the teacher whether, by scoring 70 out of 100, she has done well.
• The teacher also has another dilemma. He must choose which of his students have performed well enough to be entered into an advanced class. He decides that only those students who are in the top 10% of the class should be entered into the advanced class. Which students (scores) came in the top 10% of the class?
Z-SCORE
1. How well did Sarah perform compared to the other students in the class of 50?
In other words, what percentage (or number) of students scored higher than Sarah, and what percentage (or number) scored lower? To recap, Sarah scored 70 out of 100, the mean score was 60 and the standard deviation was 15.

                      Score (X)    Mean (µ)    Standard Deviation (s)
English Literature    70           60          15

In terms of z-scores, this gives us:
$z = \frac{X - \mu}{s} = \frac{70 - 60}{15} = 0.67$
With the z-score of 0.67, we now need to work out the percentage (or number) of students that scored higher and lower than Sarah.
From the z-score table, the upper-tail area for z = 0.67 is 0.2514, which means about 25% of the class got a better mark than Sarah (see z table). So, to answer question 1, Sarah performed better than a large proportion of the other students, because 74.86% of the class scored lower than her. However, the key finding is that Sarah’s score of 70 marks was not one of the best, because about 25% of the class got higher marks than her.
Z-SCORE
2. Which students came in the top 10% of the class? Or, to rephrase: what mark needs to be achieved to be in the top 10% and thus qualify for the advanced class?
From the z table, the top 10% begins at a z-score of about 1.28 (1.282).

Score (X)    Mean (µ)    Standard Deviation (s)    z-score (z)
?            60          15                        1.282

To find the relevant score, we rearrange the z-score formula:
$X = \mu + z s = 60 + 1.282 \times 15 = 79.23$
Therefore, students who scored above 79.23 marks out of 100 came in the top 10% of the class, qualifying for the advanced class as a result.
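Both questions can be reproduced without a printed z table, assuming (as the z-table approach already does) that the scores are approximately normally distributed. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); small differences from the table values come from rounding z to two decimals.

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=15)   # assumed normal model of the class

# Question 1: share of the class above and below Sarah's score of 70
z_sarah = (70 - 60) / 15
share_below = scores.cdf(70)
share_above = 1 - share_below

# Question 2: mark that separates the top 10% from the rest
cutoff = scores.inv_cdf(0.90)

print(round(z_sarah, 2), round(share_above, 4), round(cutoff, 2))
```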
Using Microsoft Excel
 Descriptive Statistics are easy to

obtain from Microsoft Excel
◦ Use menu choice:
tools / data analysis / descriptive
statistics
◦ Enter details in dialog box

Using Excel
(continued)
 Enter dialog box

details
 Check box for

summary
statistics
 Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Chapter Summary
 Described measures of center and

location
◦ Mean, median, mode, geometric mean,
midrange
 Discussed percentiles and quartiles
 Described measure of variation
◦ Range, interquartile range, variance,
standard deviation, coefficient of variation
 Created Box and Whisker Plots
Chapter Summary
(continued)
 Illustrated distribution shapes

◦ Symmetric, skewed
 Calculated standardized data values
TEST
• What happens to measures of central tendency and measures of variation when you add a constant to, or multiply by a constant, each value in the data set?

