
Chapter 1: Describing Data
Lesson 7: Measures of Variation
TIME FRAME: 1 hour session

OVERVIEW OF LESSON
In this lesson, students see that measures of central tendency alone are not enough to describe
a data set. They compare two data sets representing the returns on two stocks. Here, the means
are the same, and the measures of spread (range, variance and standard deviation) of the two data
sets are also the same. The standard deviation can be viewed as a measure of risk. The main
learning here is that if we hold a mixture of the two stocks (and thus earn the average of their
returns), we get less risk, because the standard deviation of the returns of this mixture is smaller.
LEARNING OUTCOME(S): At the end of the lesson, the learner is able to
calculate some measures of variation,
think of the strengths and limitations of these measures, and
provide a sound interpretation of these measures in a business setting.

LESSON OUTLINE:
1. Introduction
2. Case Study: Returns on Stocks
3. Analysis and Comments on Case

DEVELOPMENT OF THE LESSON


(A) Introduction
Discuss with students the importance of thinking of their future, of saving, and of wealth
generation. Explain that a number of people invest money in the stock market as an
alternative financial instrument to generate wealth from savings. (Explanatory Note: Stocks
are shares of ownership in a company. When people buy stocks they become part owners of
the company, and share in both the profits and the losses of the company.)
Mention to students that the history of performance of a particular stock may be a useful
guide to what may be expected of its performance in the foreseeable future. (This is, of
course, a very big assumption, but we have to make it for this lesson.)

(B) Case Study: Returns on Stocks


Provide the following data to students representing the rates of return for two stocks, which
we will call stock A and stock B.

Chapter 1 Describing Data Lesson 7 Page 1


Year    Stock A    Stock B
2005    0.081      0.214
2006    0.231      0.193
2007    0.214      0.133
2008    0.214      0.071
2009    0.181      0.066
2010    0.241      0.081
2011    0.193      0.181
2012    0.133      0.231
2013    0.071      0.214
2014    0.066      0.241

Inform students that the rate of return is defined as the increase in value of the portfolio
(including any dividends or other distributions) during the year divided by its value at the
beginning of the year. For instance, if the parents of Juana dela Cruz invest 50,000 pesos in
a stock at the beginning of the year, and the value of the stock goes up to 60,000 pesos, an
increase in value of 10,000 pesos, then the rate of return is 10,000/50,000 = 0.20.
Explain to students that the rate of return may be positive or negative. It represents the
fraction by which your wealth would have changed had it been invested in that particular
combination of securities.
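The definition above can be sketched in a few lines of Python. The function name `rate_of_return` is an assumption made for illustration, and the losing year is a hypothetical example that is not part of the lesson data.

```python
def rate_of_return(start_value, end_value):
    """Increase in value over the year divided by the value at the start of the year."""
    return (end_value - start_value) / start_value


# Example from the lesson: 50,000 pesos grows to 60,000 pesos.
gain_example = rate_of_return(50_000, 60_000)

# Hypothetical losing year (not from the lesson): 50,000 pesos falls to 45,000 pesos.
loss_example = rate_of_return(50_000, 45_000)

print(round(gain_example, 2))  # 0.2
print(round(loss_example, 2))  # -0.1
```

A negative result simply means the investment lost value during the year.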
Divide students into groups of three, and ask them to obtain the average return for each of the
two stocks and the standard deviations of the rates of return.
Tell them to use the historical performance of the stocks as a guide to making an investment
decision. Instruct them to look at summary measures of variability (such as the range and the
standard deviation) of the rates of return, and use these as measures of risk associated with
investing in a given security. Discuss whether it would make any difference if we decide to
invest wholly in stock A, wholly in stock B, or half of our investments in stock A and half in
stock B. Ask them why this is so.
Notes on Calculating Measures of Variation

(i) A simple measure of variation is the range, the difference between the maximum and
minimum values.

The range is the difference between the largest and smallest values in a data set.

While the range is simple, it only depends on the extremes; it ignores information
about what goes on between the smallest (minimum) and largest (maximum) values
in a data set.
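As a minimal sketch, the range of the stock A returns from the case study can be computed as below. The helper name `value_range` and the two short lists `clustered` and `spread_out` are hypothetical, added only to show that two data sets with the same extremes share the same range no matter what happens in between.

```python
def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


# Rates of return for stock A, 2005 to 2014, from the case study.
stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
print(round(value_range(stock_a), 3))  # 0.175

# Hypothetical lists with the same extremes but different behavior in between.
clustered = [1, 5, 5, 5, 5, 5, 9]
spread_out = [1, 2, 3, 5, 7, 8, 9]
print(value_range(clustered), value_range(spread_out))  # 8 8
```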



(ii) Explain that a more popularly used measure of variation is the standard deviation,
although its calculation is more complex. To justify its calculation, consider a list of
data. Each of the data entries deviates from the average by some amount,
deviation from the average = entry - average

We may want to have a measure of spread based on all these deviations. The mean of
these deviations is always zero, whatever the data, because the positive and negative
deviations cancel out. However, the average of the absolute values of these deviations
is nontrivial. It is called the mean absolute deviation and is useful for measuring
spread. This measure, alas, is not as convenient to work with mathematically.

An alternative to the mean absolute deviation is the variance, formed by taking the
mean of the squared deviations from the average. Unlike the mean absolute deviation,
the variance has some interesting mathematical properties; but, we omit discussion of
these properties here. The variance has the square of the units of the data. If we take
its square root, we get the standard deviation.

The variance is the mean of the squared deviations from the average. The standard
deviation is the square root of the variance.
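The quantities described above can be traced step by step in Python. This is only a sketch: the list `data` is a hypothetical example chosen so the arithmetic is easy to follow, and the variance here divides by the number of data, as in the definition above.

```python
import math

data = [2, 4, 4, 6, 9]  # hypothetical data, not from the lesson
n = len(data)

mean = sum(data) / n
deviations = [x - mean for x in data]

# The plain mean of the deviations is always zero, so it cannot measure spread.
mean_deviation = sum(deviations) / n

# Mean absolute deviation: average of the absolute deviations.
mean_abs_deviation = sum(abs(d) for d in deviations) / n

# Variance: mean of the squared deviations; standard deviation: its square root.
variance = sum(d ** 2 for d in deviations) / n
standard_deviation = math.sqrt(variance)

print(mean, mean_deviation, mean_abs_deviation, variance, round(standard_deviation, 3))
# 5.0 0.0 2.0 5.6 2.366
```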

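TECHNICAL NOTES

Given a data set $x_1, x_2, \ldots, x_N$, denote the mean as

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

(a) the variance, denoted as $\sigma^2$, is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

(b) the population standard deviation, denoted as $\sigma$, is the square root of the variance:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$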

If a list of data has no variation, that is, all the data items are constant, then the mean
is that constant, and the variance and standard deviation are both zero. Furthermore, if
for a given list of data, the variance and standard deviation turn out to be zero, then
all the deviations from the average must be zero, which means that all the members of
the list are equal. Note that if a data set were rescaled, i.e. if the data were multiplied
by some constant, then the standard deviation of the new data set is the absolute value of
the scaling factor multiplied by the standard deviation of the original data set.
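The rescaling rule can be checked directly from the definitions in the Technical Notes. The short derivation below is a sketch; the constants $c$ and $b$ and the symbols $y_i$, $\mu_y$ and $\sigma_y$ are notation introduced here for illustration. Suppose each entry is transformed as $y_i = c\,x_i + b$. Then

$$\mu_y = \frac{\sum_{i=1}^{N} (c\,x_i + b)}{N} = c\,\mu + b, \qquad y_i - \mu_y = c\,(x_i - \mu)$$

$$\sigma_y^2 = \frac{\sum_{i=1}^{N} c^2 (x_i - \mu)^2}{N} = c^2 \sigma^2, \qquad \sigma_y = |c|\,\sigma$$

Multiplying the data by $c$ therefore multiplies the standard deviation by $|c|$, while adding a constant $b$ moves the mean but leaves the standard deviation unchanged.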

The variance and standard deviation are based on all items in the list, and each item is
given a proper weight. They are extremely useful measures of variability as they
measure the typical scattering of the data around the average, i.e. how far the data
lie above or below the average. The variance and standard deviation increase when the
deviations about the mean increase, and decrease when these deviations decrease. A
small standard deviation (and variance) means a high degree of uniformity in the
observations and of homogeneity in a series.

The variance is most suitable for algebraic manipulations but, as was pointed out
earlier, its computation results in squared units. On the other hand, the standard
deviation has a value in the original units of the data. Thus, it serves as the primary
measure of variation, just as the mean is the primary measure of central location.

(C) Analysis and Comments on Case


This case introduces the idea of expected return and risk, as measured by the average and
standard deviation of the rates of return to a stock. The main point in the case is to measure
risk of stocks with the standard deviation.
Students should discover that even if two stocks have a common standard deviation, a
mixture of the two can have a lower standard deviation than either one alone. This
provides an empirical basis for the advice in making investments: do not put all your eggs in
one basket!
The means, ranges and standard deviations for stocks A and B are identical (mean 0.1625,
range 0.175 and standard deviation 0.065291).
Some students may notice that the rates of return in B are the same as in A except for some
rearrangement in the order, and at this point, they might stop and say it does not make any
difference which one to invest in: A or B (but what about the mixture of A and B?).
There may be other recommendations and observations made by students. The main
discovery of students should be that when we compute the returns that would have been
received for the 50/50 mix (as 0.5*A + 0.5*B), the mean, range and standard deviation of
that sequence are 0.1625, 0.0885 and 0.024862, respectively, so that the mixture gives the
same return with less risk.
This case is an effective way of showing that intuition (that the mix would also not make a
difference) does not always yield something right, and that it is important to have solid data
analytics.
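The figures quoted in this section can be reproduced with a short script. The sketch below uses Python's standard `statistics` module, with `pstdev` as the population standard deviation (dividing by the number of data). The helper `summarize` and the covariance calculation are additions for illustration; the covariance is included because, by the standard identity Var(0.5A + 0.5B) = 0.25 Var(A) + 0.25 Var(B) + 0.5 Cov(A, B), a negative covariance pulls the risk of the mix below that of either stock.

```python
import statistics

# Rates of return, 2005 to 2014, from the case study table.
stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
stock_b = [0.214, 0.193, 0.133, 0.071, 0.066, 0.081, 0.181, 0.231, 0.214, 0.241]

# Year-by-year return of a portfolio split half and half between A and B.
mix = [0.5 * a + 0.5 * b for a, b in zip(stock_a, stock_b)]


def summarize(returns):
    """Return (mean, range, population standard deviation) of a list of returns."""
    return (
        statistics.mean(returns),
        max(returns) - min(returns),
        statistics.pstdev(returns),
    )


for name, returns in [("Stock A", stock_a), ("Stock B", stock_b), ("50/50 mix", mix)]:
    mean, value_range, sd = summarize(returns)
    print(f"{name:10s} mean={mean:.4f} range={value_range:.4f} sd={sd:.6f}")

# Population covariance of A and B: negative here, since A tends to do well
# in years when B does poorly, and vice versa.
mean_a = statistics.mean(stock_a)
mean_b = statistics.mean(stock_b)
covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(stock_a, stock_b)) / len(stock_a)
print(f"covariance={covariance:.6f}")
```

The two stocks print identical summaries, while the mix keeps the same mean with a smaller range and standard deviation.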



Recommended: It may be helpful to illustrate fast calculation of the standard deviation (and
other summary measures) with a spreadsheet application (such as Microsoft Excel), as
indicated in the Handout for Students.






ASSESSMENT
1. The mean number of letters in ten students' names is 8 letters. The range of the data set is 5
letters. Make a line plot showing a data set that fits this description.

One possible answer is

x x x

x x x x x

6 7 8 9 10

x = number of letters in one student's name

Another is
x x x x x

x x x x x

6 7 8 9 10

x = number of letters in one student's name

2. Gerald, Carmina, and Rodolfo obtained the prices (in pesos) of a jar of peanut butter at
several grocery stores. Below is the data they have collected:
100.80 197.60 158.00 131.60 184.40 149.20
136.00 109.60 360.40 122.80 131.60

After analyzing the data, Gerald said, "The prices of peanut butter are pretty similar. The range
is only PHP 30.80." Carmina said, "You are mistaken! The prices are very different. The range
is PHP 259.60. That is a big difference in terms of peanut butter." Rodolfo said, "I think you are
both mistaken. The range isn't a useful measure to describe this set of data."

a) Explain what you think each person was thinking.



ANSWER:

Gerald did not order the data set from smallest to largest, and erroneously subtracted the first
value (100.80) from the last value (131.60) in the data set.

Carmina found the range correctly by subtracting the smallest value (100.80) from the largest
value (360.40).

Rodolfo noticed that the maximum, 360.40, is an outlier in this set of data. As a result, the range of
PHP 259.60 does not describe the variation of the set of data well, as it was unduly increased
by the extreme value.

b) Who should we agree with? Why?

ANSWER:

We can agree with both Carmina and Rodolfo.

Carmina correctly calculated the range;

Rodolfo astutely observed that while Carmina was correct in her calculations, the range is not
very useful in describing the variability of this set of data, as the range would only be PHP 96.80
if the outlier were removed from the data set.
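The three figures in this discussion can be reproduced as follows. This is a minimal sketch; the variable names are chosen here for illustration, and the outlier is removed simply by dropping the largest price.

```python
# Peanut butter prices (in pesos), in the order they were collected.
prices = [100.80, 197.60, 158.00, 131.60, 184.40, 149.20,
          136.00, 109.60, 360.40, 122.80, 131.60]

# Gerald's mistake: last value minus first value of the unsorted list.
gerald_value = prices[-1] - prices[0]

# Carmina's correct range: maximum minus minimum.
carmina_range = max(prices) - min(prices)

# Rodolfo's point: the range after dropping the outlier (the largest price).
without_outlier = [p for p in prices if p != max(prices)]
range_without_outlier = max(without_outlier) - min(without_outlier)

print(f"{gerald_value:.2f} {carmina_range:.2f} {range_without_outlier:.2f}")
# 30.80 259.60 96.80
```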

3. Three hundred students taking a first course in Statistics are provided a common final
examination. The following histogram shows the distribution of the final scores.

[Histogram of final examination scores: bars labeled A to H over a horizontal axis marked 40, 60, 80, 100 and 120]
Suppose the professor will give 30% weight to the Final Examination. What effect
would multiplying all the Final Scores by 30% have on the mean of the Final
Exam Scores? On the standard deviation of the Final Exam Scores?

Answer: The mean is rescaled by 30%, and so is the standard deviation; both are multiplied by 0.30.

Suppose the professor wants to inflate the Final Examination Scores. What effect
would adding 5 points to all the Final Scores have on the mean of the Final Exam
Scores? On the standard deviation of the Final Exam Scores?

Answer: The mean also goes up by 5 points; the standard deviation stays the same.
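Both answers can be checked numerically. In the sketch below the list `scores` is hypothetical, since the actual 300 examination scores are not given; the checks use `math.isclose` to allow for floating-point rounding.

```python
import math
import statistics

# Hypothetical final scores; the real data behind the histogram are not available.
scores = [52, 68, 75, 81, 94, 110]

weighted = [0.30 * s for s in scores]  # every score multiplied by 30%
shifted = [s + 5 for s in scores]      # 5 points added to every score

# Multiplying by 0.30 rescales both the mean and the standard deviation by 0.30.
print(math.isclose(statistics.mean(weighted), 0.30 * statistics.mean(scores)))
print(math.isclose(statistics.pstdev(weighted), 0.30 * statistics.pstdev(scores)))

# Adding 5 raises the mean by 5 but leaves the standard deviation unchanged.
print(math.isclose(statistics.mean(shifted), statistics.mean(scores) + 5))
print(math.isclose(statistics.pstdev(shifted), statistics.pstdev(scores)))
```

Each line should print `True` for any list of scores, not only for this one.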

4. In a fitness center, the weights of a certain group of students were taken, and every student was
found to weigh 140 pounds. What would be the standard deviation of the distribution of weights?

Answer:
Zero (since the data do not vary).

5. Determine whether each of the following five statements is true or false, and explain briefly.

a. The average and median of any list of data are always close together. (Answer: False)

b. Half of a list of data is always below the average. (Answer: False; this is a property of the median, not the average)

c. If entries in a list are doubled, then the average is doubled. (Answer: True)

d. If entries in a list are doubled, then the standard deviation is doubled. (Answer: True)

e. If in a set of data, positive numbers are changed to negative, while negative are changed
to positive, the standard deviation changes sign as well. (Answer: False; the standard
deviation is always nonnegative)

Explanatory Note:

Teachers have the option to just ask this assessment orally to the entire class to either
introduce or recall the notions of computing the range and of computing the standard
deviation, or to group students and ask them to identify answers, or to give this as
homework, or to use some questions/items here for a chapter examination.



HANDOUT FOR STUDENTS
Computing the Standard Deviation

Consider the following five data representing the difference in scores of two players in a
computer game:

-47, 28, 37, -47, 34

To compute the standard deviation of these data (dividing by the number of data, as in the
Technical Note), five steps are needed:

(a) compute the mean, i.e. sum the values in the first column of the table below and
divide by the number of items, thus yielding: mean = 5/5 = 1

(b) subtract the mean from each of the items (yielding the deviations from
the average) and thus obtain the second column in the table below.

Difference in scores    Deviations from    Squared
of two players          the Average        Deviations
-47                     -48                2304
 28                      27                 729
 37                      36                1296
-47                     -48                2304
 34                      33                1089

(c) square the deviations from the average (the items in the second column), and
thus obtain the third column of the table above.

(d) sum the values in the third column and divide by the number of data, thus yielding
Variance = 7722/5 = 1544.4

(e) take the square root of the result from (d): Standard Deviation is nearly 39.3

In practice, the sum of the values in the third column may be obtained in a much faster
and more efficient way. This calculation involves (a*) summing the squared values of the
first column, 7727, and (b*) subtracting from (a*) the product of the number of data and the
square of the mean, thus yielding 7727 - (5)(1)^2 = 7722. This is the so-called
computational scheme for obtaining the variance (and standard deviation).
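The five steps and the computational scheme can be followed line by line in Python. This is a sketch of the same arithmetic; the variable names are chosen here for illustration.

```python
import math

scores = [-47, 28, 37, -47, 34]  # first column of the table
n = len(scores)

# (a) mean of the data
mean = sum(scores) / n                        # 5 / 5 = 1.0

# (b) deviations from the average (second column)
deviations = [x - mean for x in scores]       # -48, 27, 36, -48, 33

# (c) squared deviations (third column)
squared_deviations = [d ** 2 for d in deviations]

# (d) variance: sum of the third column divided by the number of data
sum_squared_deviations = sum(squared_deviations)   # 7722.0
variance = sum_squared_deviations / n              # 1544.4

# (e) standard deviation: square root of the variance
standard_deviation = math.sqrt(variance)           # about 39.3

# Computational scheme: sum of squared values minus n times the squared mean.
sum_of_squares = sum(x ** 2 for x in scores)       # 7727
shortcut = sum_of_squares - n * mean ** 2          # 7727 - 5 = 7722.0

print(mean, sum_squared_deviations, variance, round(standard_deviation, 1), shortcut)
# 1.0 7722.0 1544.4 39.3 7722.0
```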

If the first column of the table were entered as an Excel spreadsheet as in Figure 6.1,
we merely have to enter the command

= STDEVP(A2:A6)

in an empty cell (such as A9) and obtain the population standard deviation as
approximately 39.3.
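For classes without a spreadsheet, a similar one-line calculation is available in Python's standard `statistics` module. In this sketch, `pstdev` divides by the number of data and so matches the population standard deviation above, while `stdev` divides by one less than the number of data and gives the larger sample standard deviation; it is shown only for contrast.

```python
import statistics

scores = [-47, 28, 37, -47, 34]

# Population standard deviation: divides by N, as in the handout.
population_sd = statistics.pstdev(scores)

# Sample standard deviation: divides by N - 1, shown only for comparison.
sample_sd = statistics.stdev(scores)

print(round(population_sd, 1))  # 39.3
print(round(sample_sd, 1))      # 43.9
```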

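TECHNICAL NOTE:

Given a data set $x_1, x_2, \ldots, x_N$, denote the mean as

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

The variance, denoted as $\sigma^2$, is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

while the population standard deviation, denoted as $\sigma$, is the square root of the
variance, i.e.,

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$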
