
Quantitative Methods for Management
Descriptive Statistics: Numerical Measures
Day 3
Recap
Day 1: Introduction, types of statistics, data and its types
Definition of statistics; terminology: population, sample, parameter, statistic, qualitative and quantitative data; levels of measurement: nominal, ordinal, interval, and ratio; sources of data: primary and secondary; applications of statistics in various functions of management; data mining and data warehousing

Day 2: Classification of data: qualitative, quantitative, geographical, and chronological
Presentation of data: frequency distribution, relative and cumulative frequencies; bivariate distributions
Diagrammatic: bar diagram, pie diagram
Graphical: histogram, frequency polygon, ogive
Exploratory data analysis: scatter diagram, stem-and-leaf plot
Summarization of data
Measures of central tendencies
AM, WM, GM
Positional averages median, percentiles, quartiles
Mode
Empirical formula
Measures of dispersion
Range
Quartile deviation
Mean deviation
Standard deviation
Variance
Coefficient of variation RAW DATA
Measures of shapes & Kurtosis
Exploratory data analysis (box plot, five-number summary, Chebyshev's inequality)
Measures of association (covariance, correlation)
Summary Definitions
The central tendency is the extent to which all the data values group around a typical or central value.

The variation is the amount of dispersion, or scattering, of values.

The shape is the pattern of the distribution of values from the lowest value to the highest value.


Chap 3-4
Arithmetic Mean
Commonly called the mean
is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including
extreme values
Computed by summing all values in the data set and
dividing the sum by the number of values in the data
set
The mean can be computed from the total and the number of items alone; the individual values need not be known
Measures of Central Tendency:
The Mean
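The arithmetic mean (often just called the "mean") is the most common measure of central tendency.
For a sample of size n:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $\bar{X}$ (pronounced "x-bar") is the sample mean, n is the sample size, and $X_i$ is the ith observed value.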
Measures of Central Tendency:
The Mean (continued)

The most common measure of central tendency


Mean = sum of values divided by the number of values
Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis, each with its mean marked.]
Data 1, 2, 3, 4, 5: Mean = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = 20/5 = 4
Properties of AM
Sum of deviations from AM is ZERO
Sum of squares of deviation taken from AM
will be minimum
Combined mean
It is affected by change of scale and change of
origin
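The properties above can be checked numerically. The following is a minimal sketch, assuming the small data set 5, 9, 16, 17, 18 that appears later in these slides; the helper names `sum_sq_about` and `combined_mean`, the two group sizes and means, and the constants 2 and 3 are illustrative choices, not part of the slides.

```python
from statistics import mean

data = [5, 9, 16, 17, 18]
am = mean(data)  # arithmetic mean of the data

# Property 1: the deviations from the AM sum to zero.
sum_of_deviations = sum(x - am for x in data)

# Property 2: the sum of squared deviations is smallest when taken about the AM.
def sum_sq_about(center):
    """Sum of squared deviations of the data about an arbitrary center."""
    return sum((x - center) ** 2 for x in data)

# Property 3: combined mean of two groups, weighted by group size.
def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Property 4: a change of origin (add 2) and scale (multiply by 3)
# changes the mean in the same way: mean(2 + 3x) = 2 + 3 * mean(x).
transformed = [2 + 3 * x for x in data]

print(sum_of_deviations)
print(sum_sq_about(am), sum_sq_about(am - 1), sum_sq_about(am + 1))
print(combined_mean(5, 13, 5, 15))
print(mean(transformed), 2 + 3 * am)
```

Moving the center one unit in either direction away from the mean increases the sum of squared deviations, which is what the second property claims.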
Median
Middle value in an ordered array of numbers.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Unaffected by extremely large and extremely
small values.
Median: Computational Procedure
First Procedure
Arrange the observations in an ordered array.
If there is an odd number of terms, the median is
the middle term of the ordered array.
If there is an even number of terms, the median is
the average of the middle two terms.
Second Procedure
The median's position in an ordered array is given by (n+1)/2.
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

There are 17 terms in the ordered array.


Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, 15.

If the 22 is replaced by 100, the median is 15.

If the 3 is replaced by -103, the median is 15.


Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21

There are 16 terms in the ordered array.


Position of median = (n+1)/2 = (16+1)/2 = 8.5
The median is between the 8th and 9th terms,
14.5.
If the 21 is replaced by 100, the median is 14.5.
If the 3 is replaced by -88, the median is 14.5.
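The two median examples can be reproduced with a short sketch of the (n+1)/2 position rule. The function name `median_by_position` is an illustrative choice, and the function assumes the list is already sorted.

```python
def median_by_position(ordered):
    """Median of an already ordered list, using the (n + 1) / 2 position rule."""
    n = len(ordered)
    position = (n + 1) / 2           # 1-based position of the median
    lower = ordered[int(position) - 1]
    if position == int(position):    # odd n: the position is a whole number
        return lower
    upper = ordered[int(position)]   # even n: average the two middle terms
    return (lower + upper) / 2

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]          # the 16-term example

print(median_by_position(odd_array))
print(median_by_position(even_array))

# Replacing an extreme value leaves the median unchanged.
print(median_by_position(odd_array[:-1] + [100]))
print(median_by_position([-88] + even_array[1:]))
```

Only the middle position matters, so changing the largest or smallest value has no effect as long as the ordering of the middle terms is preserved.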
Percentiles
Measures of central tendency that divide a group
of data into 100 parts
At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data lie
above the nth percentile

Example: 90th percentile indicates that at least


90% of the data lie below it, and at most 10% of
the data lie above it
The median and the 50th percentile have the same
value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Percentiles: Computational Procedure
Organize the data into an ascending ordered
array.
Calculate the percentile location:

$$i = \frac{P}{100}(n)$$

where P is the percentile of interest and n is the number of observations.
Determine the percentile's location and its value.

If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.

If i is not a whole number, the percentile is at the position given by the whole-number part of (i+1) in the ordered array.
Percentiles: Example
Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
Location of 30th percentile:

$$i = \frac{30}{100}(8) = 2.4$$

The location index, i, is not a whole number; i + 1 = 2.4 + 1 = 3.4; the whole-number portion is 3; the 30th percentile is at the 3rd location of the array; the 30th percentile is 13.
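The location procedure can be sketched in code. This is a minimal illustration of the rule described in these slides, not the interpolation method used by most statistical software; the function name `percentile_by_location` is hypothetical, and the sketch assumes P is an integer from 1 to 99.

```python
def percentile_by_location(data, p):
    """P-th percentile using the location rule i = (P / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)           # ascending ordered array
    n = len(ordered)
    if (p * n) % 100 == 0:
        # i is a whole number: average the values at positions i and i + 1.
        i = (p * n) // 100
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not a whole number: take the whole-number part of i + 1.
    position = (p * n) // 100 + 1
    return ordered[position - 1]

raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile_by_location(raw, 30))
```

Software packages often interpolate between neighbouring values instead, so their percentiles can differ slightly from the values this rule produces.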
Quartiles
Measures of central tendency that divide a group of
data into four subgroups

Q1: 25% of the data set is below the first quartile


Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile

Q1 is equal to the 25th percentile


Q2 is located at 50th percentile and equals the
median
Q3 is equal to the 75th percentile
Quartile values are not necessarily members of the
data set
[Figure: Q1, Q2, and Q3 split the ordered data into four segments, each holding 25% of the values.]


Quartiles: Example
Ordered array: 106, 109, 114, 116, 121, 122,
125, 129
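Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$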
Mode
The most frequently occurring value in a data
set
Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)

Bimodal -- Data sets that have two modes


Multimodal -- Data sets that contain more
than two modes
Measures of Central Tendency:
The Mode
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes

[Figure: two dot plots. On the 0 to 14 axis, Mode = 9; on the 0 to 6 axis, there is no mode.]
Mode -- Example
The mode is 44. There are more 44s than any other value.

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Problem
The costs of consumer purchases such as single-family housing, gasoline, internet services, tax preparation, and hospitalization were reported in The Wall Street Journal. Sample data typical of the cost of tax return preparation by services such as H&R Block are shown below.
120 230 110 115 160 130 150 105
195 155 105 360 120 120 140 100
115 180 235 255
- Compute the mean, median and mode
- Compute the first and third quartiles
- Compute and interpret the 90th percentile
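One way to check answers to this problem is sketched below. It uses Python's standard `statistics` module for the mean, median, and mode, and applies the percentile location rule from the earlier slides for the quartiles and the 90th percentile. The helper name `percentile_by_location` is illustrative and assumes an integer percentile from 1 to 99.

```python
from statistics import mean, median, mode

costs = [120, 230, 110, 115, 160, 130, 150, 105,
         195, 155, 105, 360, 120, 120, 140, 100,
         115, 180, 235, 255]

def percentile_by_location(data, p):
    """P-th percentile using the location rule i = (P / 100) * n."""
    ordered = sorted(data)
    n = len(ordered)
    if (p * n) % 100 == 0:           # whole-number location: average two values
        i = (p * n) // 100
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[(p * n) // 100]   # whole-number part of i + 1, 0-based index

print("mean  ", mean(costs))
print("median", median(costs))
print("mode  ", mode(costs))
print("Q1    ", percentile_by_location(costs, 25))
print("Q3    ", percentile_by_location(costs, 75))
print("P90   ", percentile_by_location(costs, 90))
```

The 90th percentile is read as the cost that roughly 90% of the sampled tax-preparation fees fall at or below.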
Measures of Central Tendency:
Review Example
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
Sum: $3,000,000
Mean: $3,000,000 / 5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
Measures of Central Tendency:
Which Measure to Choose?

The mean is generally used, unless extreme values (outliers) exist.
The median is often used instead because it is not sensitive to extreme values. For example, median home prices may be reported for a region.
In some situations it makes sense to report both the mean and the median.
Measures of Central Tendency:
Summary
Central Tendency
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value
Measures of Variability
It is often desirable to consider measures of variability
(dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
Variability
[Figure: cash flows plotted around the mean, contrasting no variability in cash flow with variability in cash flow.]
Measures of Variation
Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of variation give information on the spread or variability or dispersion of the data values.

[Figure: two distributions with the same center but different variation.]
Measures of Variability:
Ungrouped Data
Measures of variability describe the spread or the
dispersion of a set of data.
Common Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Z scores
Coefficient of Variation
Range
The difference between the largest and the smallest values in a set of data
Simple to compute
Ignores all data points except the two extremes
Example: Range = Largest - Smallest = 48 - 35 = 13

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Measures of Variation:
The Range
Simplest measure of variation
Difference between the largest and the smallest values:

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example:

[Figure: dot plot on a 0 to 14 axis.]

Range = 13 - 1 = 12
Measures of Variation:
Why The Range Can Be Misleading

Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 axis with differently distributed values.]
Range = 12 - 7 = 5 for both data sets

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119


Interquartile Range

Range of values between the first and third quartiles
Range of the middle half
Less influenced by extremes

$$\text{Interquartile Range} = Q_3 - Q_1$$
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean:

$$\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$$

Deviations from the mean: -8, -4, +3, +4, +5

[Figure: the five deviations plotted on a 0 to 20 axis.]


Mean Absolute Deviation
Average of the absolute deviations from the mean

$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$

X      X - μ    |X - μ|
5      -8       +8
9      -4       +4
16     +3       +3
17     +4       +4
18     +5       +5
Sum    0        24
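The mean absolute deviation calculation can be sketched in a few lines, using the same data set (5, 9, 16, 17, 18). The variable names are illustrative.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                 # mean of the data set
deviations = [x - mu for x in data]        # signed deviations, which sum to zero
abs_deviations = [abs(d) for d in deviations]
mad = sum(abs_deviations) / len(data)      # mean absolute deviation

print(mu, deviations, sum(abs_deviations), mad)
```

Taking absolute values is what keeps the positive and negative deviations from cancelling out.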
Measures of Variation:
The Standard Deviation
Steps for Computing Standard Deviation

1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get
the sample standard deviation.

Measures of Variation:
The Variance
Average (approximately) of squared deviations of values from the mean

Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
Measures of Variation:
The Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Is the square root of the variance
Has the same units as the original data

Sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Measures of Variation:
Sample Standard Deviation:
Calculation Example
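Sample data ($X_i$): 10 12 14 15 17 18 18 24

n = 8, Mean = $\bar{X}$ = 16

$$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$$

$$= \sqrt{\frac{130}{7}} = 4.3095$$

This is a measure of the average scatter around the mean.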
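The five steps for computing the sample standard deviation can be traced in code with the sample data from this example (10, 12, 14, 15, 17, 18, 18, 24). This is a minimal sketch; the variable names are illustrative.

```python
import math

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)
x_bar = sum(sample) / n

differences = [x - x_bar for x in sample]   # step 1: value minus mean
squared = [d ** 2 for d in differences]     # step 2: square each difference
total = sum(squared)                        # step 3: add the squared differences
sample_variance = total / (n - 1)           # step 4: divide by n - 1
sample_sd = math.sqrt(sample_variance)      # step 5: take the square root

print(x_bar, total, round(sample_variance, 4), round(sample_sd, 4))
```

Dividing by n - 1 rather than n is what distinguishes the sample variance from the population variance.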
Population Variance
Average of the squared deviations from the arithmetic mean

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

X      X - μ    (X - μ)²
5      -8       64
9      -4       16
16     +3       9
17     +4       16
18     +5       25
Sum    0        130
Population Standard Deviation
Square root of the variance

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$

X      X - μ    (X - μ)²
5      -8       64
9      -4       16
16     +3       9
17     +4       16
18     +5       25
Sum    0        130
Sample Variance
Average of the squared deviations from the arithmetic mean, using n - 1 as the divisor

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

X        X - X̄    (X - X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866    (column sums)
Sample Standard Deviation
Square root of the sample variance

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$

X        X - X̄    (X - X̄)²
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866    (column sums)
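The population and sample versions differ only in the divisor (N versus n - 1). As a quick cross-check of the four worked examples, the sketch below uses Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N, and `variance` and `stdev` divide by n - 1.

```python
from statistics import pstdev, pvariance, stdev, variance

population = [5, 9, 16, 17, 18]       # treated as a whole population
sample = [2398, 1844, 1539, 1311]     # treated as a sample

print("population variance:", pvariance(population))
print("population std dev :", round(pstdev(population), 1))
print("sample variance    :", round(variance(sample), 2))
print("sample std dev     :", round(stdev(sample), 2))
```

Applying the sample functions to the first data set would give a larger variance (130 / 4 instead of 130 / 5), which shows why it matters which formula is chosen.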
Uses of Standard Deviation
Indicator of financial risk
Quality Control
construction of quality control charts
process capability studies
Comparing populations
household incomes in two cities
employee absenteeism at two plants
Measures of Variation:
Comparing Standard Deviations

[Figure: two distributions, one with a smaller standard deviation and one with a larger standard deviation.]
Standard Deviation as an
Indicator of Financial Risk

Annualized Rate of Return

Financial Security    μ      σ
A                     15%    3%
B                     15%    7%
Measures of Variation:
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, all with the same mean but different spread.]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
Measures of Variation:
Summary Characteristics
The more the data are spread out, the greater the range,
variance, and standard deviation.

The more the data are concentrated, the smaller the range,
variance, and standard deviation.

If the values are all the same (no variation), all these
measures will be zero.

None of these measures are ever negative.

Measures of Variation:
The Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Can be used to compare the variability of two or more sets
of data measured in different units

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$
Measures of Variation:
Comparing Coefficients of Variation
Stock A:
Average price last year = $50
Standard deviation = $5

$$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B:
Average price last year = $100
Standard deviation = $5

$$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
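The stock comparison can be reproduced with a one-line function. This is a minimal sketch; the function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: S = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: S = $5, mean = $100

print(f"CV of stock A: {cv_a:.0f}%")
print(f"CV of stock B: {cv_b:.0f}%")
```

Because the units cancel in the ratio, the same function can compare data sets measured in different units.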
Coefficient of Variation
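Ratio of the standard deviation to the mean, expressed as a percentage
Measurement of relative dispersion

$$C.V. = \frac{\sigma}{\mu}(100)$$

Coefficient of Variation

$\mu_1 = 29$, $\sigma_1 = 4.6$
$\mu_2 = 84$, $\sigma_2 = 10$

$$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$$

$$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$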
A home theatre in a box is the easiest and cheapest way to provide surround
sound for a home entertainment centre. A sample of prices is shown here
(Consumer Reports Buying Guide, 2013). The prices are for models with a
DVD player and for models without a DVD player.

Models with DVD Player    Price    Models without DVD Player    Price
Sony HT-1800DP            $450     Pioneer HTP-230              $300
Pioneer HTD-330DV         $300     Sony HT-DDW750               $300
Sony HT-C800DP            $400     Kenwood HTB-306              $360
Panasonic SC-HT900        $500     RCA RT-2600                  $290
Panasonic SC-MTI          $400     Kenwood HTB-206              $300

Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theatre unit?
Compute the range, variance, and standard deviation for the two samples. What
does this information tell you about the prices for models with and without a DVD
player?
Statistic             Price with DVD player    Price without DVD player
Mean                  410                      310
Standard Error        33.1662479               12.64911064
Median                400                      300
Mode                  400                      300
Standard Deviation    74.16198487              28.28427125
Sample Variance       5500                     800
Kurtosis              0.867768595              4.578125
Skewness              -0.551618069             2.099223257
Range                 200                      70
Minimum               300                      290
Maximum               500                      360
Sum                   2050                     1550
Count                 5                        5
The following data were used to construct the histograms of the number of days required to fill orders for Dawson Supply, Inc., and J.C. Clark Distributors.

Dawson Supply Days for Delivery: 11 10 9 10 11 11 10 11 10 10
Clark Distributors Days for Delivery: 8 10 13 7 10 11 10 7 15 12

Use the range and standard deviation to support that Dawson Supply
provides the more consistent and reliable delivery times.
Statistic                       Dawson          Clark
Mean                            10.3            10.3
Standard Error                  0.213437475     0.817176711
Median                          10              10
Mode                            10              10
Standard Deviation              0.674948558     2.584139659
Sample Variance                 0.455555556     6.677777778
Kurtosis                        -0.282994816    -0.350865189
Skewness                        -0.433637384    0.359288855
Range                           2               8
Minimum                         9               7
Maximum                         11              15
Sum                             103             103
Count                           10              10
Coefficient of Variation (%)    6.552898619     25.08873455
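The delivery-time comparison can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the helper name `summarize` and the dictionary keys are illustrative choices.

```python
from statistics import mean, stdev

dawson = [11, 10, 9, 10, 11, 11, 10, 11, 10, 10]
clark = [8, 10, 13, 7, 10, 11, 10, 7, 15, 12]

def summarize(days):
    """Range, sample standard deviation, and coefficient of variation (%)."""
    s = stdev(days)
    return {
        "mean": mean(days),
        "range": max(days) - min(days),
        "stdev": s,
        "cv": 100 * s / mean(days),
    }

for name, days in (("Dawson", dawson), ("Clark", clark)):
    stats = summarize(days)
    print(name, stats["mean"], stats["range"],
          round(stats["stdev"], 3), round(stats["cv"], 2))
```

Both suppliers average 10.3 days, so the smaller range, standard deviation, and coefficient of variation for Dawson are what support the claim of more consistent delivery times.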
Practice
The following times were recorded by the quarter-mile and mile runners of
a university track team (times are in minutes).
Quarter-Mile Times: .92 .98 1.04 .90 .99
Mile Times: 4.52 4.35 4.60 4.70 4.50
After viewing this sample of running times, one of the coaches commented that the quarter-milers turned in the more consistent times. Use the standard deviation and the coefficient of variation to summarize the variability in the data. Does the use of the coefficient of variation indicate that the coach's statement should be qualified?
General Descriptive Stats Using
Microsoft Excel
1. Select Tools.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Enter the cell range.
5. Check the Summary Statistics box.
6. Click OK.
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:

House Prices:

$2,000,000
500,000
300,000
100,000
100,000

Minitab Output

Descriptive Statistics: House Price


Total
Variable Count Mean SE Mean StDev Variance Sum Minimum
House Price 5 600000 357771 800000 6.40000E+11 3000000 100000

N for
Variable Median Maximum Range Mode Skewness Kurtosis
House Price 300000 2000000 1900000 100000 2.01 4.13

Numerical Descriptive Measures for a
Population
Descriptive statistics discussed previously described a sample, not the population.

Summary measures describing a population, called parameters, are denoted with Greek letters.

Important population parameters are the population mean, variance, and standard deviation.
Numerical Descriptive Measures
for a Population: The mean
The population mean is the sum of the values in the population divided by the population size, N:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

where μ = population mean
N = population size
$X_i$ = ith value of the variable X
Numerical Descriptive Measures for a
Population: The Variance σ²
Average of squared deviations of values from the mean

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where μ = population mean
N = population size
$X_i$ = ith value of the variable X
Numerical Descriptive Measures For A Population:
The Standard Deviation

Most commonly used measure of variation


Shows variation about the mean
Is the square root of the population variance
Has the same units as the original data

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
Sample statistics versus population
parameters

Measure               Population Parameter    Sample Statistic
Mean                  $\mu$                   $\bar{X}$
Variance              $\sigma^2$              $S^2$
Standard Deviation    $\sigma$                $S$