(Chapter 3, cont'd)
Distribution Shape, Relative Location,
and Outliers
Distribution Shape
Skewness:
Skewed to left = negative skewness
Symmetric = skewness of 0
Skewed to right = positive skewness
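The sign convention above can be checked numerically. The sketch below assumes the adjusted sample skewness formula, n/((n-1)(n-2)) times the sum of ((xi - x̄)/s)^3, which is the formula Excel's SKEW function uses; the slides themselves do not state a formula, and all three data sets are made up for illustration.

```python
from statistics import mean, stdev

def sample_skewness(data):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in data)

# Hypothetical data sets (not from the slides)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 10]           # long tail to the right
left_skewed = [-x for x in right_skewed]  # mirror image: long tail to the left

print(sample_skewness(symmetric))     # approximately 0
print(sample_skewness(right_skewed))  # positive
print(sample_skewness(left_skewed))   # negative
```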
Z-Scores
Tell us relative location of a value within the data set
(i.e., how far it is from the mean)
Z-scores are standardized values
z_i can be interpreted as the number of standard deviations x_i
is from the mean x̄:
$$z_i = \frac{x_i - \bar{x}}{s}$$
z > 0: observation is greater than the mean
z < 0: observation is less than the mean
Let's try it
Continue with the StartSalary data
Mean = 3540
Std. dev = 165.65
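As a rough illustration of the z-score calculation with the mean and standard deviation above, the sketch below standardizes one salary. The salary value 3850 is a hypothetical input chosen for the example, not a value taken from the StartSalary data.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

mean_salary = 3540        # from the slide
std_dev_salary = 165.65   # from the slide
hypothetical_salary = 3850  # assumed value, for illustration only

z = z_score(hypothetical_salary, mean_salary, std_dev_salary)
print(round(z, 2))  # 1.87, i.e. about 1.87 standard deviations above the mean
```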
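[Figure: standard normal curve with the area between z = -2 and z = +2 shaded]
The shaded area is what you need to find, but the table gives you cumulative areas to the left of z:
Area to the left of z = -2: .0228
Area to the left of z = +2: .9772
Shaded area = .9772 - .0228 = .9544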
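The table lookup and subtraction can be reproduced without a printed table. This is a minimal sketch that assumes the areas come from the standard normal distribution and builds its cumulative area from `math.erf`; the function name is just an illustrative choice.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

lower = standard_normal_cdf(-2)
upper = standard_normal_cdf(2)

print(round(lower, 4))          # 0.0228
print(round(upper, 4))          # 0.9772
print(round(upper - lower, 4))  # 0.9545
```

The unrounded result is 0.9545; the .9544 on the slide comes from subtracting table values that were already rounded to four decimals.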
Empirical Rule
[Figure: bell-shaped curve over x, with bands marked on the axis]
μ - 1σ to μ + 1σ: 68%
μ - 2σ to μ + 2σ: 95%
μ - 3σ to μ + 3σ: 99.7%
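To make the three bands concrete, the sketch below applies them to the StartSalary mean and standard deviation given earlier. It assumes the salary data are roughly bell-shaped, which the empirical rule requires, and it uses the sample values 3540 and 165.65 in place of μ and σ.

```python
mean_salary = 3540       # from the slides
std_dev_salary = 165.65  # from the slides

# Approximate share of values within k standard deviations of the mean
coverage = {1: "68%", 2: "95%", 3: "99.7%"}

intervals = {}
for k, share in coverage.items():
    low = mean_salary - k * std_dev_salary
    high = mean_salary + k * std_dev_salary
    intervals[k] = (low, high)
    print(f"about {share} of values between {low:.2f} and {high:.2f}")
```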
Outliers and Chebyshev
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier
It might be:
incorrectly recorded data value (fix)
data value that was incorrectly included in the data set (remove)
correctly recorded data value (keep)
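The z-score screen for outliers can be sketched as follows. The data set is hypothetical (it is not the StartSalary data), and the helper name `find_outliers` is an illustrative choice. A flagged value is only a candidate: deciding whether to fix, remove, or keep it still takes a look at the record itself.

```python
from statistics import mean, stdev

def find_outliers(data, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - x_bar) / s) > threshold]

# Hypothetical data: eleven values near 50 and one unusually large value
values = [48, 50, 51, 49, 52, 50, 47, 53, 50, 51, 49, 120]

print(find_outliers(values))  # [120]
```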
Let's continue with the StartSalary data
Note that all of the values are within 3 standard deviations of the mean, so there are no outliers.
You try it
#18
Measures of Association
between Two Variables
Covariance
Correlation
Covariance
measure of the linear association between two variables
Positive numbers = positive relationship
Negative numbers = negative relationship
Population:
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$
Sample:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
NOTE: covariance is impacted by the measurement units
Let's continue with the Stereo data
Calculate covariance using Excel
Week     No. of Commercials   Sales Volume   (xi - x̄)   (yi - ȳ)   (xi - x̄)(yi - ȳ)
1                 2                50            -1         -1              1
2                 5                57             2          6             12
3                 1                41            -2        -10             20
4                 3                54             0          3              0
5                 4                54             1          3              3
6                 1                38            -2        -13             26
7                 5                63             2         12             24
8                 3                48             0         -3              0
9                 4                59             1          8              8
10                2                46            -1         -5              5
Total            30               510             0          0             99
Average           3                51
cov = 99/10 = 9.9
Note: 9.9 divides by N = 10 (the population formula). The sample formula divides by n - 1 and gives 99/9 = 11.
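For readers working outside Excel, here is a minimal sketch of the same calculation using the Stereo data from the table. The function name and the `sample` flag are illustrative choices. It shows where the 9.9 comes from: dividing the total of 99 by N = 10 is the population formula, while the sample formula divides by n - 1 = 9.

```python
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def covariance(xs, ys, sample=True):
    """Covariance of paired data; divides by n - 1 (sample) or n (population)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (n - 1 if sample else n)

print(covariance(commercials, sales, sample=False))  # 9.9
print(covariance(commercials, sales, sample=True))   # 11.0
```

In Excel, `COVARIANCE.P` (and the older `COVAR`) uses the population divisor, while `COVARIANCE.S` uses the sample divisor.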
Interpreting covariance
Positive Sxy = positive linear association
Correlation coefficient
Population:
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Sample:
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
The coefficient can take on values between -1
and +1.
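As an illustration, the sample correlation coefficient for the Stereo data (number of commercials and sales volume) can be computed directly from the definition. This is a sketch using the sample covariance and the sample standard deviations; the variable names are assumptions made for the example.

```python
from statistics import stdev

commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

n = len(commercials)
x_bar = sum(commercials) / n
y_bar = sum(sales) / n

# Sample covariance: sum of cross-products divided by n - 1
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(commercials, sales)) / (n - 1)

# Sample correlation: covariance scaled by both sample standard deviations
r_xy = s_xy / (stdev(commercials) * stdev(sales))

print(round(r_xy, 2))  # 0.93, a strong positive linear association
```

Unlike the covariance, this value does not change if sales are recorded in different units.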
Let's do #46 in Excel
The Weighted Mean and Working with
Grouped Data
Weighted mean: a mean computed by giving each observation a
weight that reflects its importance
E.g. In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
xi = value of observation i
wi = weight for observation i
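The GPA case mentioned above makes a convenient sketch of the formula. The grade points and credit hours below are hypothetical values chosen so the arithmetic is easy to follow.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade points earned and credit hours per course
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.0, from (12 + 12 + 6) / 10
```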
Let's do an example
#56
Grouped Data
Used when data are only available in grouped or
frequency distribution form
Used to find approximations of the mean, variance, and
standard deviation for the grouped data
Treat the midpoint of each class as though it were the
mean of all items in the class.
Compute a weighted mean, variance, and standard
deviation of the class midpoints, using the class
frequencies as weights.
Computing Weighted Mean for
Grouped Data
$$\bar{x} = \frac{\sum f_i M_i}{n}$$
where:
fi = frequency of class i
Mi = midpoint of class i
Example
Rent ($)     fi      Mi       fiMi
420-439       8     429.5    3436.0
440-459      17     449.5    7641.5
460-479      12     469.5    5634.0
480-499       8     489.5    3916.0
500-519       7     509.5    3566.5
520-539       4     529.5    2118.0
540-559       2     549.5    1099.0
560-579       4     569.5    2278.0
580-599       2     589.5    1179.0
600-619       6     609.5    3657.0
Total        70             34525.0
$$\bar{x} = \frac{34{,}525}{70} = 493.21$$
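The grouped mean for the rent example can be reproduced with a few lines. This sketch takes the class frequencies and midpoints straight from the table; the variable names are illustrative.

```python
frequencies = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]

n = sum(frequencies)                                         # total observations
total = sum(f * m for f, m in zip(frequencies, midpoints))   # sum of f_i * M_i
grouped_mean = total / n

print(n, total, round(grouped_mean, 2))  # 70 34525.0 493.21
```

Because every rent in a class is replaced by the class midpoint, the result approximates the mean of the raw data rather than reproducing it exactly.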
Weighted Variance and Standard
Deviation
Sample:
$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$
Population:
$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$
Rent ($)     fi      Mi      Mi - x̄    (Mi - x̄)²    fi(Mi - x̄)²
420-439       8     429.5    -63.7      4058.96       32471.71
440-459      17     449.5    -43.7      1910.56       32479.59
460-479      12     469.5    -23.7       562.16        6745.97
480-499       8     489.5     -3.7        13.76         110.11
500-519       7     509.5     16.3       265.36        1857.55
520-539       4     529.5     36.3      1316.96        5267.86
540-559       2     549.5     56.3      3168.56        6337.13
560-579       4     569.5     76.3      5820.16       23280.66
580-599       2     589.5     96.3      9271.76       18543.53
600-619       6     609.5    116.3     13523.36       81140.18
Total        70                                      208234.29
Continued
Sample Variance
$$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$
Sample Standard Deviation
$$s = \sqrt{3{,}017.89} = 54.94$$
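The grouped variance and standard deviation can be checked the same way. This sketch reuses the frequencies and midpoints from the rent table and applies the sample formula (divisor n - 1). It keeps the mean unrounded, so intermediate values can differ slightly from the table's rounded columns while the final results agree.

```python
from math import sqrt

frequencies = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]

n = sum(frequencies)
x_bar = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Weighted sum of squared deviations of the midpoints from the grouped mean
sum_sq = sum(f * (m - x_bar) ** 2 for f, m in zip(frequencies, midpoints))

sample_variance = sum_sq / (n - 1)
sample_std_dev = sqrt(sample_variance)

print(round(sum_sq, 2))           # 208234.29
print(round(sample_variance, 2))  # 3017.89
print(round(sample_std_dev, 2))   # 54.94
```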
Summary of Formulas
Pages 156 and 157 give a good summary of
notation and formulas