
Lesson 3

Statistical measures
Chapter Goals
After completing this chapter, you should be able
to:
 Compute and interpret the mean, median, and mode for
a set of data
 Compute the range, variance, and standard deviation
and know what these values mean
 Construct and interpret a box and whiskers plot
 Compute and explain the coefficient of variation and
z scores
 Use numerical measures along with graphs, charts, and
tables to describe data
Chapter Topics

 Measures of Center and Location


◦ Mean, median, mode, geometric mean,
midrange, weighted mean
 Other measures of Location (position)
◦ percentiles, quartiles
 Measures of Variation
◦ Range, interquartile range, variance and
standard deviation, coefficient of variation
Summary Measures

Describing Data Numerically

 Center and Location: Mean, Median, Mode, Weighted Mean
 Other Measures of Location: Percentiles, Quartiles
 Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Center and Location
Overview
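Center and Location: Mean, Median, Mode, Weighted Mean

Mean
◦ Sample: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
◦ Population: \( \mu = \frac{\sum_{i=1}^{N} x_i}{N} \)

Weighted Mean
◦ Sample: \( \bar{x}_W = \frac{\sum w_i x_i}{\sum w_i} \)
◦ Population: \( \mu_W = \frac{\sum w_i x_i}{\sum w_i} \)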
Mean (Arithmetic Average)

 The mean is the arithmetic average of data values
◦ Sample mean (n = sample size):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
◦ Population mean (N = population size):
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]
Mean (Arithmetic Average)
(continued)

 The most common measure of central tendency
 Mean = sum of values divided by the number of values
 Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis]
◦ Values 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
◦ Values 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
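
The effect of a single extreme value on the mean can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean

values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, but the largest value is replaced by 10

print(mean(values))        # 3
print(mean(with_outlier))  # 4
```

Changing one value out of five shifts the mean from 3 to 4, which is the sensitivity to outliers noted above.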
Weighted Mean

 Used when values are grouped by frequency or relative importance

Example: Sample of 26 Repair Projects

Days to Complete   Frequency
       5               4
       6              12
       7               8
       8               2

Weighted Mean Days to Complete:
\[ \bar{x}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days} \]
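
The repair-project calculation can be reproduced with a small helper. This is a sketch only; `weighted_mean` is a hypothetical function name, not something defined in the lesson.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

days = [5, 6, 7, 8]        # days to complete
frequency = [4, 12, 8, 2]  # number of projects for each value

result = weighted_mean(days, frequency)
print(round(result, 2))  # 6.31
```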
2. Weighted mean

 Simple frequency distribution


 Grouped frequency distribution
Weighted mean of a simple
frequency distribution

xi   fi
10    2
12    8
13   17
14    5
16    1

 Is the arithmetic mean appropriate to a simple frequency distribution?
 Why?
 Formula:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} \]
Example
(x): Number of newspapers/magazines/journals a student reads in a week
(f): Number of students

x       f     xf
0      12
1      18
2      30
3      20
4      15
5       5
Total 100
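
One way to fill in the xf column and apply the frequency-weighted formula is sketched below. The list names are illustrative; the data come from the table above.

```python
x = [0, 1, 2, 3, 4, 5]        # publications read per week
f = [12, 18, 30, 20, 15, 5]   # number of students

# The xf column: each value multiplied by its frequency
xf = [xi * fi for xi, fi in zip(x, f)]
mean_x = sum(xf) / sum(f)

print(xf)      # [0, 18, 60, 60, 60, 25]
print(mean_x)  # 2.23
```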
Weighted mean of a grouped
frequency distribution
 Example: The following data relate to the productivity of workers in a factory:

Productivity (items/h)   0-9   10-19   20-29   30-39   40-49   50-59
Number of workers         15     25      30      35      28      17
Weighted mean of a grouped
frequency distribution
 Formula:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} \]
 Where:
- x_i: midpoint, used as the representative value of each class
- f_i: frequency of each class
Weighted mean of a grouped
frequency distribution
Productivity (items/h)   Number of workers (fi)   xi   xifi
0-9                              15
10-19                            25
20-29                            30
30-39                            35
40-49                            28
50-59                            17
Total
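
The empty columns of the table can be worked through as follows. This sketch assumes each class midpoint is the average of the stated class limits (for example, 4.5 for the class 0-9); a different midpoint convention would change the result slightly.

```python
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
workers = [15, 25, 30, 35, 28, 17]

# Assumption: midpoint = (lower limit + upper limit) / 2
midpoints = [(lo + hi) / 2 for lo, hi in classes]
xf = [x * f for x, f in zip(midpoints, workers)]
grouped_mean = sum(xf) / sum(workers)

print(midpoints)               # [4.5, 14.5, 24.5, 34.5, 44.5, 54.5]
print(sum(workers), sum(xf))   # 150 4545.0
print(round(grouped_mean, 2))  # 30.3
```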
3. Geometric mean

 Applicable when the products of data values are meaningful
 Proportional increases and multipliers:
 Example:
The number of students attending the music class last Tuesday was 160. This Tuesday, the number is expected to increase by 15%. How many of them are likely to attend this Tuesday?
3. Geometric mean

 A specialized measure, used to average proportional increases.
 Formula:
- Step 1: Express the proportional increases (p) as proportional multipliers (1 + p)
3. Geometric mean

- Step 2: Calculate the geometric mean multiplier
(i) Simple geometric mean multiplier: applied when each proportional increase appears only once

\[ gmm = \sqrt[n]{(1 + p_1)(1 + p_2) \cdots (1 + p_n)} \]


3. Geometric mean
- Step 2: Calculate the geometric mean multiplier
(ii) Weighted geometric mean multiplier: applied when a proportional increase appears more than once

\[ gmm = \sqrt[\sum_{i=1}^{n} f_i]{(1 + p_1)^{f_1} (1 + p_2)^{f_2} \cdots (1 + p_n)^{f_n}} = \sqrt[\sum f_i]{\prod_{i=1}^{n} (1 + p_i)^{f_i}} \]
3. Geometric mean

- Step 3: Subtract 1 from the geometric mean multiplier to obtain the average proportional increase

average proportional increase = gm multiplier - 1


Example
 The number of bankers at a small bank over the period 2000-2006 is presented in the table below:

Year                       2000   2001   2002   2003   2004   2005   2006
No. of bankers              200    220    250    262    284    300    312
Proportional multipliers
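
The three steps can be sketched for the bankers data as below. The approach (each year's count divided by the previous year's count as the proportional multiplier) follows Steps 1 to 3; the variable names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

bankers = [200, 220, 250, 262, 284, 300, 312]  # 2000 to 2006

# Step 1: proportional multipliers (1 + p) between consecutive years
multipliers = [later / earlier for earlier, later in zip(bankers, bankers[1:])]

# Step 2: simple geometric mean multiplier (each increase appears once)
gmm = math.prod(multipliers) ** (1 / len(multipliers))

# Step 3: average proportional increase
average_increase = gmm - 1

print([round(m, 4) for m in multipliers])
print(round(gmm, 4), round(average_increase, 4))
```

Because the multipliers telescope, their product equals 312/200 = 1.56, so the geometric mean multiplier is the sixth root of 1.56, about 1.077, or roughly a 7.7% average increase per year.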
GEOMETRIC MEAN Example

For example, suppose you have an investment which earns 10% the first year, 60% the second year, and 20% the third year. What is its average rate of return? (You cannot just add and divide, because each return acts as a multiplying factor.)

It is not the arithmetic mean, because these numbers mean that in the first year your investment was multiplied (not added to) by 1.10, in the second year it was multiplied by 1.60, and in the third year it was multiplied by 1.20.

The relevant quantity is the geometric mean of these three numbers:

\[ (1.10 \times 1.60 \times 1.20)^{1/3} \]
(the power of 1/3 is the cube root)

If you calculate this geometric mean you get approximately 1.283, so the average rate of return is about 28% (not 30%, which is what the arithmetic mean of 10%, 60%, and 20% would give you).
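
The comparison between the geometric and arithmetic averages can be verified numerically. A minimal sketch, with illustrative variable names:

```python
import math

returns = [0.10, 0.60, 0.20]            # yearly rates of return
multipliers = [1 + r for r in returns]  # 1.10, 1.60, 1.20

geometric = math.prod(multipliers) ** (1 / len(multipliers))
arithmetic = sum(returns) / len(returns)

print(round(geometric, 3))   # 1.283
print(round(arithmetic, 2))  # 0.3
```

Compounding at the geometric rate for three years reproduces the actual growth factor of 1.10 x 1.60 x 1.20 = 2.112, whereas compounding at 30% would overstate it.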
Median
 Not affected by extreme values

[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]

 In an ordered array, the median is the “middle” number
Notes

 When a data set contains an even number of items (2m), there are two middle items: the mth one and the (m+1)th one
 When a data set contains an odd number of items (2m+1), the middle item is the (m+1)th one
Median for a simple
frequency distribution
 Step 1: Find the middle item(s)
 Step 2: Find the value(s) that correspond
to the middle item(s)

- For a data set with an even number of items (2m), the median is the average of the mth item and the (m+1)th item:
\[ M_e = \frac{x_m + x_{m+1}}{2} \]
Median for a simple
frequency distribution
 Step 2: Find the value(s) that correspond
to the middle item(s)

- For a data set with an odd number of items (2m+1), the median is the value of the (m+1)th item:
\[ M_e = x_{m+1} \]
Example

xi      fi    Fi
1        8
2       15
3       20
4       13
5        9
Total   65

Median?
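
The two steps (find the middle item, then find the value it corresponds to) can be sketched with cumulative frequencies, which is what the Fi column is for. `value_at` is a hypothetical helper introduced only for this illustration.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5]
freqs = [8, 15, 20, 13, 9]

cumulative = list(accumulate(freqs))  # the Fi column
n = cumulative[-1]

def value_at(position):
    """Return the value of the item at a 1-based position in the ordered data."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position out of range")

if n % 2 == 1:
    # odd number of items (2m+1): the median is the (m+1)th item
    median = value_at((n + 1) // 2)
else:
    # even number of items (2m): average the mth and (m+1)th items
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

print(cumulative)  # [8, 23, 43, 56, 65]
print(median)      # 3
```

With 65 items the middle item is the 33rd, which falls in the group whose cumulative frequency first reaches 43, so the median is 3.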
Median for a grouped
frequency distribution
 Step 1: Find the middle item(s)
 Step 2: Find the class(es) containing the
middle item(s)
 Step 3: Estimate the median by the formula
\[ M_e = L_{M_e} + c_{M_e} \cdot \frac{\frac{1}{2}\sum_{i=1}^{n} f_i - F_{M_e - 1}}{f_{M_e}} \]
where
◦ \( L_{M_e} \): lower limit of the median class
◦ \( F_{M_e - 1} \): cumulative frequency of the class immediately prior to the median class
◦ \( f_{M_e} \): actual frequency of the median class
◦ \( c_{M_e} \): median class width
Example
 Amount of food per person in province A
Amount of food (kg/person) Number of people

400-500 10
500-600 30
600-700 45
700-800 80
800-900 30
900-1000 5
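
Applying the three steps to the food data might look like the following sketch. It takes half the total frequency as the position of the middle item, as in the formula; the variable names are illustrative.

```python
classes = [(400, 500), (500, 600), (600, 700), (700, 800), (800, 900), (900, 1000)]
people = [10, 30, 45, 80, 30, 5]

half = sum(people) / 2  # position of the middle item: 200 / 2 = 100
cum_before = 0          # cumulative frequency before the current class

for (lower, upper), f in zip(classes, people):
    if cum_before + f >= half:
        # this is the median class: interpolate inside it
        median = lower + (upper - lower) * (half - cum_before) / f
        break
    cum_before += f

print(cum_before)  # 85
print(median)      # 718.75
```

The median class is 700-800, since the cumulative frequency first reaches 100 there, giving 700 + 100 x (100 - 85)/80 = 718.75 kg/person.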
Mode

 A measure of central tendency


 Value that occurs most often
 Not affected by extreme values
 Used for either numerical or categorical
data
 There may be no mode
 There may be several modes

[Figure: two dot plots; left, on a 0 to 14 axis: Mode = 5; right, on a 0 to 6 axis: No Mode]
The mode of a simple
frequency distribution

 Mode is the value which has the largest frequency

xi   fi
0    10
1    15
2    17
3    20
4    18
5     9

Mode?
The mode of a grouped
frequency distribution
 Step 1: Find the modal class
 Step 2: Estimate the mode by the formula
\[ M_0 = L_{M_0} + c_{M_0} \cdot \frac{f_{M_0} - f_{M_0 - 1}}{(f_{M_0} - f_{M_0 - 1}) + (f_{M_0} - f_{M_0 + 1})} \]
where
◦ \( L_{M_0} \): lower limit of the modal class
◦ \( c_{M_0} \): modal class width
◦ \( f_{M_0} \): frequency (or distribution density) of the modal class
◦ \( f_{M_0 - 1} \): frequency (or distribution density) of the class immediately prior to the modal class
◦ \( f_{M_0 + 1} \): frequency (or distribution density) of the class immediately following the modal class
Example
 Amount of food per person in province A
Amount of food (kg/person)   Number of people
400-500                             10
500-600                             30
600-700                             45
700-800                             80
800-900                             30
900-1000                             5

Mode?
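
A sketch of the grouped-mode estimate for the food data follows. Since all classes here have the same width, plain frequencies are used rather than distribution densities; treating a missing neighbouring class as frequency 0 is an assumption of this sketch.

```python
classes = [(400, 500), (500, 600), (600, 700), (700, 800), (800, 900), (900, 1000)]
people = [10, 30, 45, 80, 30, 5]

# Step 1: the modal class is the one with the largest frequency
k = people.index(max(people))
lower, upper = classes[k]
f_mode = people[k]
f_prev = people[k - 1] if k > 0 else 0                # assumption: 0 if no prior class
f_next = people[k + 1] if k < len(people) - 1 else 0  # assumption: 0 if no following class

# Step 2: estimate the mode inside the modal class
mode = lower + (upper - lower) * (f_mode - f_prev) / ((f_mode - f_prev) + (f_mode - f_next))

print(classes[k])      # (700, 800)
print(round(mode, 2))  # 741.18
```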
Review Example

 Five houses on a hill by the beach

House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Summary Statistics

House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000

 Mean: ($3,000,000/5) = $600,000
 Median: middle value of ranked data = $300,000
 Mode: most frequent value = $100,000
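
The three summary statistics for the house prices can be checked with Python's standard `statistics` module. This is a minimal sketch; the list name is illustrative.

```python
from statistics import mean, median, mode

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(mean(prices))    # 600000
print(median(prices))  # 300000
print(mode(prices))    # 100000
```

The single $2,000,000 house pulls the mean well above both the median and the mode.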
Which measure of location
is the “best”?
 The mean is generally used, unless extreme values (outliers) exist
 Then the median or the mode is often used; the median, in particular, is not sensitive to extreme values
◦ Example: Median home prices may be reported for a region, because the median is less sensitive to outliers
Shape of a Distribution

 Describes how data are distributed
 Symmetric or skewed

◦ Left-Skewed: Mean < Median < Mode (longer tail extends to left)
◦ Symmetric: Mean = Median = Mode
◦ Right-Skewed: Mode < Median < Mean (longer tail extends to right)
Other Location Measures

Other Measures of Location: Percentiles, Quartiles

Percentiles
The pth percentile in a data array (where 0 ≤ p ≤ 100):
 p% are less than or equal to this value
 (100 – p)% are greater than or equal to this value

Quartiles
 1st quartile = 25th percentile
 2nd quartile = 50th percentile = median
 3rd quartile = 75th percentile
Percentiles
 The pth percentile in an ordered array of n values is the value in the ith position, where
\[ i = \frac{p}{100}(n + 1) \]
 Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
\[ i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12 \]
QUESTION

 Suppose you received a score of 91 on your statistics test. Does this mean you scored in the 91st percentile?
 Suppose your test score put you in the 80th percentile. What percentage of students had a better test score than you?

Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. (Chap 3-42)
Quartiles
 Quartiles split the ranked data into 4 equal groups:
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
 Example: Find the first quartile
Sample Data in Ordered Array (n = 9): 11 12 13 16 16 17 18 21 22
Q1 = 25th percentile, so find the \( \frac{25}{100}(9 + 1) = 2.5 \) position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
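
The position rule i = (p/100)(n + 1) can be turned into a small function. This is a sketch under one stated assumption: when the position is fractional, the value is linearly interpolated between the two neighbouring items (the lesson only shows the halfway case). The function name `percentile` is hypothetical.

```python
def percentile(sorted_values, p):
    """Value at position i = p/100 * (n + 1) in an ordered array (1-based).

    Assumption: fractional positions are linearly interpolated.
    """
    n = len(sorted_values)
    position = p * (n + 1) / 100
    if position <= 1:
        return sorted_values[0]
    if position >= n:
        return sorted_values[-1]
    lower_index = int(position)        # 1-based index of the item just below
    fraction = position - lower_index  # how far towards the next item
    lower = sorted_values[lower_index - 1]
    upper = sorted_values[lower_index]
    return lower + fraction * (upper - lower)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile(data, 25))  # 12.5
```

For an ordered array of 19 values, `percentile(values, 60)` lands exactly on the 12th position, matching the percentile example.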
Box and Whisker Plot

 A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum

Example:
[Figure: box and whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, and Maximum, with 25% of the data in each of the four segments]
Shape of Box and Whisker Plots

 The box and central line are centered between the endpoints if the data are symmetric around the median
 A box and whisker plot can be shown in either vertical or horizontal format
Distribution Shape and
Box and Whisker Plot

[Figure: three box and whisker plots, each marked Q1, Q2, Q3, for Left-Skewed, Symmetric, and Right-Skewed distributions]
Box-and-Whisker Plot Example

 Below is a box-and-whisker plot for the following data:

0 2 2 2 3 3 4 5 5 10 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27

[Figure: box-and-whisker plot with markers at 0, 2, 3, 5, and 27]
 These data are very right skewed, as the plot depicts
NOTE

 The lower limit for boxplots is Q1 – 1.5(Q3 – Q1) and the upper limit is Q3 + 1.5(Q3 – Q1).
 Any data values outside these limits are referred to as outliers and are marked with an asterisk (*)
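
The limits can be applied to the eleven values from the box-and-whisker example (0 2 2 2 3 3 4 5 5 10 27). In this sketch the quartiles Q1 = 2 and Q3 = 5 are taken from that example's five-number summary rather than recomputed.

```python
data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]
q1, q3 = 2, 5  # quartiles from the five-number summary of this data set

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(lower_limit, upper_limit)  # -2.5 9.5
print(outliers)                  # [10, 27]
```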

Measures of Variation

Variation
 Range
 Interquartile Range
 Variance: Population Variance, Sample Variance
 Standard Deviation: Population Standard Deviation, Sample Standard Deviation
 Coefficient of Variation
Variation
 Measures of variation give information on the spread or variability of the data values.

[Figure: distributions with the same center but different variation]
Range

 Simplest measure of variation
 Difference between the largest and the smallest observations:
Range = x_maximum – x_minimum

Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range

 Ignores the way in which data are distributed
[Figure: two differently shaped dot plots on a 7 to 12 axis; Range = 12 - 7 = 5 in both]

 Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile Range

 Can eliminate some outlier problems by using the interquartile range
 Eliminate some high- and low-valued observations and calculate the range from the remaining values
 Interquartile range = 3rd quartile – 1st quartile
Interquartile Range

Example:

X minimum    Q1    Median (Q2)    Q3    X maximum
    12       30        45         57       70
(25% of the data lie in each of the four segments)

Interquartile range = 57 – 30 = 27
2. The mean deviation

 The mean deviation is a measure of dispersion that gives the average absolute difference (i.e. ignoring ‘-’ signs) between each item and the mean.
 Formula:
- For a data set
\[ d = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \]
Formula

- For a frequency distribution
\[ d = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{\sum_{i=1}^{k} f_i} \]
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42

\[ d_A = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \qquad d_B = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \]
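
The mean deviations of the two groups can be computed directly from the data-set formula. A minimal sketch; `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(values):
    """Average absolute difference between each item and the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

print(mean_deviation(group_a))  # 12.0
print(mean_deviation(group_b))  # 1.2
```

Both groups have a mean of 40, but Group A is ten times as dispersed as Group B by this measure.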
Example
 The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory

Productivity (kg/person)   Number of workers
< 10                                7
10 – 20                            18
20 – 30                            25
30 – 35                            20
35 – 40                            18
≥ 40                               12
Total                             100

Mean deviation?
Characteristics of the mean
deviation
 A better measure of dispersion than the
range
 Useful for comparing the variability
between distributions
 Can be complicated to calculate in
practice if the mean is anything other
than a whole number.
Variance

 Average of squared deviations of values from the mean
◦ Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
◦ Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
For a frequency distribution
\[ \sigma^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i} \]
or
\[ \sigma^2 = \frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2 = \overline{x^2} - (\bar{x})^2 \]
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42

\[ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \]
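
The variance of the two groups, dividing by n as in the formula above, can be sketched as follows. The square root is included to show the corresponding standard deviation; the function name is illustrative.

```python
import math

def population_variance(values):
    """Average squared deviation from the mean (divides by n)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

var_a = population_variance(group_a)
var_b = population_variance(group_b)

print(var_a, round(math.sqrt(var_a), 3))  # 200.0 14.142
print(var_b, round(math.sqrt(var_b), 3))  # 2.0 1.414
```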
Example

 The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory

Productivity (kg/person)   Number of workers
< 10                                7
10 – 20                            18
20 – 30                            25
30 – 35                            20
35 – 40                            18
≥ 40                               12
Total                             100

Variance?
Characteristics of the variance

 A better measure of dispersion than the range
 More complicated to calculate, since the deviations are squared
 The unit of the variance is the square of the original unit, so it is not directly meaningful
Standard Deviation
 Most commonly used measure of variation
 Shows variation about the mean
 Has the same units as the original data

◦ Sample standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
◦ Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
For a frequency distribution
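\[ \sigma = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}} \]
or
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2} = \sqrt{\overline{x^2} - (\bar{x})^2} \]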
Example
Group A: 20 30 40 50 60
Group B: 38 39 40 41 42

\[ \sigma = \sqrt{\sigma^2} \]
Example

 The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory

Productivity (kg/person)   Number of workers
< 10                                7
10 – 20                            18
20 – 30                            25
30 – 35                            20
35 – 40                            18
≥ 40                               12
Total                             100

Standard deviation?
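
The productivity table has two open-ended classes, so any numerical answer depends on how they are closed. The sketch below is one possible working, under assumptions that are not part of the lesson: "< 10" is treated as 0-10 and "≥ 40" as 40-45 (the same width as the neighbouring class). It applies the frequency-distribution formulas for the mean deviation, variance, and standard deviation.

```python
import math

# Class limits; the first and last classes are closed BY ASSUMPTION:
# "< 10" is treated as 0-10 and ">= 40" as 40-45.
classes = [(0, 10), (10, 20), (20, 30), (30, 35), (35, 40), (40, 45)]
workers = [7, 18, 25, 20, 18, 12]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
total = sum(workers)

mean_x = sum(x * f for x, f in zip(midpoints, workers)) / total
mean_dev = sum(f * abs(x - mean_x) for x, f in zip(midpoints, workers)) / total
variance = sum(f * (x - mean_x) ** 2 for x, f in zip(midpoints, workers)) / total
std_dev = math.sqrt(variance)

print(f"mean = {mean_x:.2f}")
print(f"mean deviation = {mean_dev:.2f}")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```

Choosing different limits for the open-ended classes changes the midpoints and therefore all four results.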
Characteristics of Standard
Deviation
 Can be regarded as one of the most useful and appropriate measures of dispersion.
 For distributions that are not too skewed (approximately bell-shaped), roughly:
- 99.7% of the data items should lie within three standard deviations of the mean
- 95% of the data items should lie within two standard deviations of the mean
- 68% of the data items should lie within one standard deviation of the mean
Calculation Example:
Sample Standard Deviation

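Sample Data (x_i): 10 12 14 15 17 18 18 24
n = 8, Mean = \( \bar{x} \) = 16

\[ s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} \]
\[ = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \]
\[ = \sqrt{\frac{130}{7}} = 4.3095 \]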
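
Computing directly from the eight data values confirms the intermediate quantities: the squared deviations from the mean of 16 are 36, 16, 4, 1, 1, 4, 4, and 64, which sum to 130. A minimal sketch, cross-checked against `statistics.stdev`:

```python
import math
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = sum(data) / len(data)
squared_deviations = [(x - x_bar) ** 2 for x in data]
ss = sum(squared_deviations)
s = math.sqrt(ss / (len(data) - 1))  # sample standard deviation (divides by n - 1)

print(x_bar, ss, round(s, 4))  # 16.0 130.0 4.3095
```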
Comparing Standard Deviations

[Figure: three dot plots on an 11 to 21 axis, all with the same mean but different spread]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation

 Measures relative variation


 Always in percentage (%)
 Shows variation relative to mean
 Is used to compare two or more sets of
data measured in different units

Population:
\[ CV = \left( \frac{\sigma}{\mu} \right) \times 100\% \]
Sample:
\[ CV = \left( \frac{s}{\bar{x}} \right) \times 100\% \]
Comparing Coefficient
of Variation
 Stock A:
◦ Average price last year = $50
◦ Standard deviation = $5
\[ CV_A = \left( \frac{s}{\bar{x}} \right) \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\% \]
 Stock B:
◦ Average price last year = $100
◦ Standard deviation = $5
\[ CV_B = \left( \frac{s}{\bar{x}} \right) \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\% \]

Both stocks have the same standard deviation, but stock B is less variable relative to its price
Standardized Data Values
 A standardized data value refers to
the number of standard deviations a
value is from the mean
 Standardized data values are
sometimes referred to as z-scores
 Useful to compare data from two or
more distributions when data scales
for these distributions are different.
Standardized Population Values

\[ z = \frac{x - \mu}{\sigma} \]
where:
 x = original data value
 μ = population mean
 σ = population standard deviation
 z = standard score
(number of standard deviations x is from μ)
Standardized Sample Values

\[ z = \frac{x - \bar{x}}{s} \]
where:
 x = original data value
 \( \bar{x} \) = sample mean
 s = sample standard deviation
 z = standard score
(number of standard deviations x is from \( \bar{x} \))
Example

 A national achievement test is administered annually to 3rd graders. The test has a mean score of 100 and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on the test?
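
One way to approach this question is to rearrange the z-score formula to x = μ + zσ. The sketch below is an illustration of that rearrangement, with illustrative variable names.

```python
mean_score = 100
std_dev = 15
z = 1.20

# Rearranging z = (x - mean) / sd gives x = mean + z * sd
score = mean_score + z * std_dev

print(round(score, 2))  # 118.0
```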

EXAMPLE 1

 Mary receives a score of 43 on a statistics test. The teacher says Mary’s test score is equivalent to a z-score of 2.1 compared with her classmates. How well did Mary do?
EXAMPLE 2

 Consider a company that uses placement exams as part of its hiring process. The company will accept scores from either of two tests: AIMS Hiring and BHS Screen. The problem is that the AIMS Hiring test has an average score of 2,000 and a standard deviation of 200, whereas the BHS Screen test has an average score of 80 with a standard deviation of 12. How can the company compare two applicants, John and Mary, if John took the AIMS Hiring test and scored 2,344 and Mary took the BHS Screen and scored 95?
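
Standardizing both scores puts them on a common scale. A sketch of that comparison follows; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev

z_john = z_score(2344, 2000, 200)  # AIMS Hiring test
z_mary = z_score(95, 80, 12)       # BHS Screen test

print(z_john)  # 1.72
print(z_mary)  # 1.25
```

On this basis John's result sits further above his test's average (1.72 standard deviations) than Mary's does above hers (1.25).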
EXAMPLE 3
 A teacher sets homework for 50 students. The mean score is 60 out of 100 and the standard deviation is 15 marks. Sarah asks the teacher whether, by scoring 70 out of 100, she has done well.
 The teacher also has another dilemma. He must choose which of his students have performed well enough to be entered into an advanced class. He decides that only those students who are in the top 10% of the class should be entered into the advanced class. Which students (scores) came in the top 10% of the class?
Z-SCORE
1. How well did Sarah perform compared to the other 50 students?

In other words, what percentage (or number) of students scored higher than Sarah, and what percentage (or number) scored lower? To recap, Sarah scored 70 out of 100, the mean score was 60, and the standard deviation was 15.

                     Score (X)   Mean (µ)   Standard Deviation (s)
English Literature       70         60               15

In terms of z-scores, this gives us:
\[ z = \frac{X - \mu}{s} = \frac{70 - 60}{15} = 0.67 \]

With the z-score of 0.67, we now need to work out the percentage (or number) of students who scored higher and lower than Sarah.

From the z table, the area above z = 0.67 is 0.2514, which means about 25% of the class got a better mark than Sarah (this assumes the scores are approximately normally distributed). To answer question 1: Sarah performed better than a large proportion of the other students, because 74.86% of the class scored lower than her.

However, the key finding is that Sarah’s score of 70 marks was not one of the best, because about 25% of the class got higher marks than her.
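
The z-table lookup can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). This sketch assumes the scores are approximately normally distributed and rounds z to two decimals, as a printed z table does.

```python
from statistics import NormalDist

score, mean_score, std_dev = 70, 60, 15

z = round((score - mean_score) / std_dev, 2)  # rounded as when reading a z table
below = NormalDist().cdf(z)                   # proportion scoring lower than Sarah
above = 1 - below                             # proportion scoring higher

print(z, round(below, 4), round(above, 4))  # 0.67 0.7486 0.2514
```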
Z-SCORE
2. Which students came in the top 10% of the class? Or, to rephrase: what mark needs to be achieved to be in the top 10% and thus qualify for the advanced class?

From the z table, the top 10% of a normal distribution lies above z = 1.282.

Score (X)   Mean (µ)   Standard Deviation (s)   z-score (z)
    ?          60               15                 1.282

To find the relevant score, we rearrange the z-score formula:
\[ X = \mu + z \cdot s = 60 + (1.282 \times 15) = 79.23 \]

Therefore, students who scored above 79.23 marks out of 100 came in the top 10% of the class, qualifying for the advanced class as a result.
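
The cutoff can also be obtained without a printed table, using the inverse of the standard normal distribution. This is a sketch that assumes approximately normal scores; `NormalDist.inv_cdf` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist

mean_score, std_dev = 60, 15

z = NormalDist().inv_cdf(0.90)    # z value with 90% of the distribution below it
cutoff = mean_score + z * std_dev

print(round(z, 3), round(cutoff, 2))  # 1.282 79.22
```

The unrounded z gives a cutoff of about 79.22; using the table value z = 1.282 gives 79.23, so the two agree up to rounding.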
Using Microsoft Excel

 Descriptive statistics are easy to obtain from Microsoft Excel
◦ Use menu choice: tools / data analysis / descriptive statistics
◦ Enter details in dialog box


Using Excel
(continued)

 Enter dialog box details
 Check box for summary statistics
 Click OK
Excel output

Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:

$2,000,000
500,000
300,000
100,000
100,000
Chapter Summary

 Described measures of center and location
◦ Mean, median, mode, geometric mean, midrange
 Discussed percentiles and quartiles
 Described measures of variation
◦ Range, interquartile range, variance, standard deviation, coefficient of variation
 Created Box and Whisker Plots
Chapter Summary
(continued)

 Illustrated distribution shapes
◦ Symmetric, skewed
 Calculated standardized data values
TEST

 What happens to measures of central tendency and measures of variation when you add a constant to each value in the data set, or multiply each value by a constant?
