
Descriptive Statistics

(Chapter 3, continued)
Distribution Shape, Relative Location,
and Outliers
Distribution Shape
Skewness:
Skewed to the left: skewness is negative
Symmetric: skewness = 0
Skewed to the right: skewness is positive

Skewness is complicated to calculate by hand but is easily computed with a statistical package.
Skewness is reported automatically along with the other descriptive statistics.
[Figure: example distributions with skewness = 0 (symmetric), negative skewness (skewed left), and positive skewness (skewed right)]
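To show what a statistical package is doing behind the scenes, here is a minimal Python sketch of the adjusted sample skewness, n/((n - 1)(n - 2)) Σ((x_i - x̄)/s)³, which is the formula Excel's SKEW function uses. Whether your own package applies exactly this adjustment is an assumption worth checking; the function name `sample_skewness` and the small data sets are hypothetical.

```python
from math import sqrt


def sample_skewness(data):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    # Sample standard deviation: n - 1 in the denominator.
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed


# Hypothetical data sets, for illustration only.
print(sample_skewness([1, 2, 3, 4, 5]))            # symmetric: essentially 0
print(sample_skewness([1, 1, 1, 2, 10]) > 0)       # long right tail: True
print(sample_skewness([-10, -2, -1, -1, -1]) < 0)  # long left tail: True
```

Cubing the standardized deviations keeps their sign, so a long tail on one side dominates the sum and sets the sign of the skewness.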
Z-Scores
Tell us the relative location of a value within the data set (i.e., how far it is from the mean)
Z-scores are standardized values
$z_i$ can be interpreted as the number of standard deviations $x_i$ is from the mean $\bar{x}$

$z_i = \dfrac{x_i - \bar{x}}{s}$

z > 0: observation is greater than the mean
z < 0: observation is less than the mean
Let's try it
Continue with the StartSalary data
Mean = 3540
Std. dev = 165.65

What is the z-score for the 6th observation?

$z_i = \dfrac{x_i - \bar{x}}{s}$

$z_6 = \dfrac{3310 - 3540}{165.65} = -1.39$
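The z-score calculation above is one line of arithmetic; this small Python sketch reproduces it from the StartSalary summary values on the slide. The helper name `z_score` is hypothetical.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / s


# StartSalary summary values from the slide: mean = 3540, s = 165.65.
z6 = z_score(3310, 3540, 165.65)
print(round(z6, 2))  # -1.39: the 6th observation is below the mean
```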
Chebyshev's Theorem
Allows us to make statements about the proportion of
data values that fall within a specified number of
standard deviations around the mean
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

E.g., at least 75% of the data are within 2 standard deviations ($1 - 1/2^2 = 0.75$).

E.g., at least 89% of the data are within 3 standard deviations ($1 - 1/3^2 \approx 0.89$).
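A minimal Python sketch of the Chebyshev bound, assuming nothing beyond the formula on the slide; the function name `chebyshev_min_proportion` is hypothetical. It refuses z ≤ 1 because the theorem only applies for z greater than 1.

```python
def chebyshev_min_proportion(z):
    """Lower bound on the share of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem applies only for z > 1")
    return 1 - 1 / z ** 2


for z in (1.5, 2, 2.5, 3):
    print(z, round(chebyshev_min_proportion(z), 3))
# z = 2 gives 0.75 and z = 3 gives 0.889, matching the 75% and 89% above.
```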
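So what?
Chebyshev's theorem gives a guaranteed lower bound for any data set, whatever the shape of its distribution
e.g., Exam 1: N = 100, $\mu$ = 72, $\sigma$ = 6
How many students scored between 60 and 84?
60 is 2 standard deviations below the mean: (60 - 72)/6 = -2
84 is 2 standard deviations above the mean: (84 - 72)/6 = 2

Thus, applying Chebyshev's theorem, at least 75% of the students (at least 75 of the 100) scored between 60 and 84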
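The exam example can be sketched in Python as follows, using only the values on the slide (N = 100, mean 72, standard deviation 6). Using the smaller |z| of the two endpoints is an assumption added here so the sketch also works for intervals that are not symmetric about the mean: the symmetric interval with that z lies inside the requested one, so the bound still holds.

```python
import math

# Exam 1 values from the slide: N = 100, mean = 72, standard deviation = 6.
N, mu, sigma = 100, 72, 6
low, high = 60, 84

z_low = (low - mu) / sigma    # -2.0
z_high = (high - mu) / sigma  # 2.0

# Chebyshev's bound uses an interval symmetric about the mean: take the smaller |z|.
k = min(abs(z_low), abs(z_high))
min_share = 1 - 1 / k ** 2               # 0.75
min_students = math.ceil(min_share * N)  # a count of students must be a whole number
print(z_low, z_high, min_share, min_students)  # -2.0 2.0 0.75 75
```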
Empirical Rule
APPLIES TO BELL-SHAPED (NORMAL) DISTRIBUTIONS
About 68% of the values of a normal random variable are within +/- 1 standard deviation of its mean.

About 95% of the values of a normal random variable are within +/- 2 standard deviations of its mean.

About 99.7% (almost all) of the values of a normal random variable are within +/- 3 standard deviations of its mean.

MORE ON THIS LATER


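How this works
Look at the z table in the front of the text.

[Figure: standard normal curve with the area between z = -2 and z = 2 shaded]

The shaded area is what you need to find, but the table gives the area to the left of each z value:
Area to the left of z = -2: .0228

Area to the left of z = 2: .9772

Shaded area = .9772 - .0228 = .9544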
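The table lookup can be mimicked with Python's standard-library `statistics.NormalDist`, whose `cdf` method returns the area to the left of a value, just like the z table. This is a sketch, not the textbook's procedure. The unrounded difference is about .9545; the slide's .9544 comes from subtracting table values that were already rounded to four decimals. Repeating the subtraction for 1, 2, and 3 standard deviations reproduces the empirical rule percentages.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Like the z table, cdf() gives the cumulative area to the LEFT of a value.
left_of_minus2 = std_normal.cdf(-2)  # about .0228
left_of_plus2 = std_normal.cdf(2)    # about .9772
print(round(left_of_plus2 - left_of_minus2, 4))  # 0.9545

# Area within k standard deviations of the mean: about 68%, 95%, 99.7%.
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))
```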
Empirical Rule
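[Figure: normal curve centered at $\mu$; about 68% of values lie between $\mu - 1\sigma$ and $\mu + 1\sigma$, 95% between $\mu - 2\sigma$ and $\mu + 2\sigma$, and 99.7% between $\mu - 3\sigma$ and $\mu + 3\sigma$]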
Outliers and Chebyshev
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be:
an incorrectly recorded data value (fix it)
a data value that was incorrectly included in the data set (remove it)
a correctly recorded data value (keep it)
Let's continue with the StartSalary data

All of the values are within 3 standard deviations of the mean, so there are no outliers.
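The |z| > 3 screening rule can be sketched in Python with the standard-library `statistics` module. The function name `find_outliers`, its `cutoff` parameter, and the data set are hypothetical; the StartSalary values are not listed here, so a made-up sample with one suspicious value is used instead.

```python
from statistics import mean, stdev


def find_outliers(data, cutoff=3.0):
    """Return the values whose z-score is below -cutoff or above +cutoff."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 denominator)
    return [x for x in data if abs((x - x_bar) / s) > cutoff]


# Hypothetical data: twenty values of 10 and one suspicious value of 100.
sample = [10] * 20 + [100]
print(find_outliers(sample))  # [100]
```

Flagging only identifies candidates; whether to fix, remove, or keep each one is still the judgment call described above.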
You try it
#18
Measures of Association
between Two Variables

Covariance
Correlation
Covariance
measure of the linear association between two variables
Positive numbers = positive relationship
Negative numbers = negative relationship

Population: $\sigma_{xy} = \dfrac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample: $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
NOTE: covariance is affected by the units of measurement
Let's continue with the Stereo Data
Calculate the covariance using Excel

Week    Commercials (xi)   Sales Volume (yi)   xi - x̄   yi - ȳ   (xi - x̄)(yi - ȳ)
1       2                  50                  -1       -1        1
2       5                  57                   2        6       12
3       1                  41                  -2      -10       20
4       3                  54                   0        3        0
5       4                  54                   1        3        3
6       1                  38                  -2      -13       26
7       5                  63                   2       12       24
8       3                  48                   0       -3        0
9       4                  59                   1        8        8
10      2                  46                  -1       -5        5
Total   30                 510                  0        0       99
Average 3                  51

Sample covariance: $s_{xy} = 99/(10 - 1) = 11$
Excel's COVAR function reports the population value: 99/10 = 9.9
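The covariance table can be checked with a short Python sketch using the Stereo Data values above. The helper `covariance` and its `sample` flag are hypothetical; the standard-library `statistics.covariance` (Python 3.10+) computes the sample version directly if you prefer a built-in.

```python
# Stereo data from the table: x = number of commercials, y = sales volume.
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]


def covariance(x, y, sample=True):
    """Sum of cross-products of deviations, divided by n - 1 (sample) or n (population)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 99 here
    return cross / (n - 1) if sample else cross / n


print(covariance(commercials, sales))                # 11.0 (sample, s_xy)
print(covariance(commercials, sales, sample=False))  # 9.9 (population, as COVAR reports)
```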
Interpreting covariance
Positive $s_{xy}$: positive linear association

Negative $s_{xy}$: negative linear association

Values close to zero: no linear relationship


Correlation
measure of linear association (not impacted by
measurement units)
Correlation does not mean causation!

Sample: $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$
Population: $\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$
The coefficient can take on values between -1 and +1.

Values close to -1: strong negative linear relationship

Values close to +1: strong positive linear relationship
Continuing with the Stereo Data
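For the Stereo Data, the sample correlation r_xy = s_xy/(s_x s_y) can be sketched in plain Python; the helper name `pearson_r` is hypothetical, and Python 3.10+ also offers `statistics.correlation` for the same quantity. The n - 1 denominators cancel in the ratio, which is why correlation does not depend on the units of measurement.

```python
from math import sqrt

# Stereo data: x = number of commercials, y = sales volume.
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]


def pearson_r(x, y):
    """Sample correlation r_xy = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


print(round(pearson_r(commercials, sales), 2))  # 0.93: strong positive linear relationship
```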
You try it
# 39
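Of course, Excel will do all of that for you
Use the COVAR function
NOTE: COVAR returns the population covariance; to get the sample covariance, multiply by n/(n - 1):
sample covariance = [n/(n - 1)] × population covariance (for the Stereo Data, (10/9)(9.9) = 11)
(Newer versions of Excel also provide COVARIANCE.S, which returns the sample covariance directly.)

Use the CORREL function

Let's do #46 in Excel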
The Weighted Mean and Working with Grouped Data
Weighted mean: gives each observation a weight that reflects its importance
E.g., in the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value (e.g., pounds, dollars, ...)
Computing a Weighted Mean

$\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$

where:
xi = value of observation i
wi = weight for observation i
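A minimal Python sketch of the weighted-mean formula. The function name `weighted_mean` and the GPA numbers are hypothetical, chosen to mirror the credit-hour example above.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical GPA: grade points 4.0, 3.0, 2.0 earned in 3-, 4-, and 3-credit courses.
print(weighted_mean([4.0, 3.0, 2.0], [3, 4, 3]))  # 3.0, i.e. (12 + 12 + 6) / 10
```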
Let's do an example
#56
Grouped Data
Used when data are only available in grouped or
frequency distribution form
Used to find approximations of the mean, variance, and
standard deviation for the grouped data
Treat the midpoint of each class as though it were the mean of all items in the class.
Compute a weighted mean, variance, and standard deviation of the class midpoints, using the class frequencies as weights.
Computing Weighted Mean for
Grouped Data

$\bar{x} = \dfrac{\sum f_i M_i}{n}$

where:
$f_i$ = frequency of class i
$M_i$ = midpoint of class i
$n = \sum f_i$ = total number of observations
Example

Rent ($) Frequency


420-439 8
440-459 17
460-479 12
480-499 8
500-519 7
520-539 4
540-559 2
560-579 4
580-599 2
600-619 6
Sample Mean for Grouped Data

Rent ($)   fi   Mi      fiMi
420-439     8   429.5   3436.0
440-459    17   449.5   7641.5
460-479    12   469.5   5634.0
480-499     8   489.5   3916.0
500-519     7   509.5   3566.5
520-539     4   529.5   2118.0
540-559     2   549.5   1099.0
560-579     4   569.5   2278.0
580-599     2   589.5   1179.0
600-619     6   609.5   3657.0
Total      70           34525.0

$\bar{x} = \dfrac{\sum f_i M_i}{n} = \dfrac{34{,}525}{70} = 493.21$
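The grouped-data mean can be sketched in Python from the rent frequency table above. Storing each class as a (lower limit, upper limit, frequency) tuple is an assumption of this sketch, as is the helper name `grouped_mean`; the midpoint is taken as (lower + upper)/2, which gives the Mi column (e.g., 429.5 for 420-439).

```python
# Rent data from the slide: (lower limit, upper limit, frequency) for each class.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_mean(classes):
    """Approximate mean: sum of f_i * M_i divided by n, with M_i the class midpoint."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # 34525.0 here
    return total / n


print(round(grouped_mean(classes), 2))  # 493.21
```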
Weighted Variance and Standard
Deviation

Sample: $s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
Population: $\sigma^2 = \dfrac{\sum f_i (M_i - \mu)^2}{N}$
Rent ($)   fi   Mi      Mi - x̄   (Mi - x̄)²   fi(Mi - x̄)²
420-439     8   429.5   -63.7    4058.96     32471.71
440-459    17   449.5   -43.7    1910.56     32479.59
460-479    12   469.5   -23.7     562.16      6745.97
480-499     8   489.5    -3.7      13.76       110.11
500-519     7   509.5    16.3     265.36      1857.55
520-539     4   529.5    36.3    1316.96      5267.86
540-559     2   549.5    56.3    3168.56      6337.13
560-579     4   569.5    76.3    5820.16     23280.66
580-599     2   589.5    96.3    9271.76     18543.53
600-619     6   609.5   116.3   13523.36     81140.18
Total      70                                208234.29
(Computed with x̄ = 493.21; the Mi - x̄ column is shown rounded to one decimal.)

Continued
Sample Variance
$s^2 = \dfrac{208{,}234.29}{70 - 1} = 3{,}017.89$

Sample standard deviation

$s = \sqrt{3{,}017.89} = 54.94$
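Continuing the rent example, this Python sketch computes the grouped sample variance and standard deviation. The (lower, upper, frequency) tuples and the helper name `grouped_variance` are assumptions of the sketch. It uses the unrounded mean (493.2142...) rather than 493.21, which does not change the results at two decimals.

```python
from math import sqrt

# Rent classes from the slide: (lower limit, upper limit, frequency).
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]


def grouped_variance(classes, sample=True):
    """Sum of f_i * (M_i - mean)^2 divided by n - 1 (sample) or N (population)."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]  # (midpoint, frequency)
    mean = sum(f * m for m, f in mids) / n
    ss = sum(f * (m - mean) ** 2 for m, f in mids)  # about 208,234.29
    return ss / (n - 1) if sample else ss / n


s2 = grouped_variance(classes)
print(round(s2, 2), round(sqrt(s2), 2))  # 3017.89 54.94
```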
Summary of Formulas
Pages 156 and 157 give a good summary of notation and formulas
