Ch3 - BOTH 2nd Part-Descriptive Statistics - Distribution Shape

Descriptive Statistics
(Chapter 3, continued)
Distribution Shape, Relative Location, and Outliers
Distribution Shape
Skewness:
Skewed to the left = negative skewness
Symmetric = 0
Skewed to the right = positive skewness
Skewness is complicated to calculate by hand but easily done with a
statistical package
Skewness is automatically calculated with the descriptive
statistics.
[Figure: three histograms, labeled Skewness = 0, Skewness = negative, and Skewness = positive]
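As a rough illustration of what a statistical package does, the sketch below computes an adjusted sample skewness, n / ((n-1)(n-2)) * sum(((x - mean) / s)^3). The slides do not say which formula the package uses, so this choice is an assumption, and the three small data sets are made up for the example.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted sample skewness (assumed formula, not taken from the slides)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)


# Hypothetical data sets, chosen only to show the sign of the result.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 10]
left_tail = [-10, -3, -2, -2, -1]

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    print(name, round(sample_skewness(data), 3))
```

The symmetric set gives a skewness of zero, the set with a long right tail gives a positive value, and its mirror image gives a negative value.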
Z-Scores
Tell us the relative location of a value within the data set
(i.e., how far it is from the mean)
Z-scores are standardized values
z_i can be interpreted as the number of standard deviations x_i
is from the mean \bar{x}
$z_i = \dfrac{x_i - \bar{x}}{s}$
z > 0: observation is greater than the mean
z < 0: observation is less than the mean
Let's try it
Continue with the StartSalary data
Mean = 3540
Std. dev. = 165.65
What is the z-score for the 6th observation (x_6 = 3310)?
$z_i = \dfrac{x_i - \bar{x}}{s}$
$z_6 = (3310 - 3540)/165.65 = -1.39$
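The same calculation can be checked in a few lines of Python. The function name `z_score` is just an illustrative choice; the numbers are the ones given on the slide.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Values from the StartSalary example above.
z6 = z_score(3310, mean=3540, std_dev=165.65)
print(round(z6, 2))  # -1.39
```

The negative sign says the 6th observation is below the mean.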
Chebyshev's Theorem
Allows us to make statements about the proportion of
data values that fall within a specified number of
standard deviations around the mean
At least (1 - 1/z^2) of the items in any data set will be
within z standard deviations of the mean, where z is any
value greater than 1.
E.g., at least 75% of the data are within 2 standard deviations of the mean (1 - 1/2^2 = 0.75).
E.g., at least 89% of the data are within 3 standard deviations of the mean (1 - 1/3^2 = 0.89).
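A minimal sketch of the bound as a function, so it can be evaluated for any z > 1. The function name is hypothetical.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    print(z, round(chebyshev_lower_bound(z), 4))
```

For z = 2 and z = 3 this reproduces the 75% and 89% figures above. Note that the result is a lower bound: the actual proportion in a given data set can be higher.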
So What?
Chebyshev allows us to make extrapolations
e.g., Exam 1: N = 100, mean = 72, standard deviation = 6
How many students scored between 60 and 84?
60 is 2 standard deviations below the mean
(60 - 72)/6 = -2
84 is 2 standard deviations above the mean
(84 - 72)/6 = +2
Thus, applying Chebyshev, at least 75% of students (at least 75 of the 100)
scored between 60 and 84
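The exam example can be worked step by step as follows. This is only a sketch; the variable names are illustrative, and it assumes the interval is symmetric around the mean, as it is here.

```python
# Exam 1 figures from the slide.
mean, std_dev, n_students = 72, 6, 100
low, high = 60, 84

# Convert both interval limits to z-scores.
z_low = (low - mean) / std_dev
z_high = (high - mean) / std_dev

# For a symmetric interval both z-scores have the same magnitude.
z = min(abs(z_low), abs(z_high))

# Chebyshev's theorem: at least 1 - 1/z^2 of the data lie within z std. devs.
min_share = 1 - 1 / z ** 2
min_students = min_share * n_students

print(z_low, z_high, min_share, min_students)
```

The result is a guaranteed minimum of 75 students, whatever the shape of the score distribution.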
Empirical Rule
APPLIES TO BELL-SHAPED (NORMAL)
DISTRIBUTIONS
68% of the values of a normal random variable
are within +/- 1 standard deviation of its mean.
95% of the values of a normal random variable
are within +/- 2 standard deviations of its mean.
99.7% of the values of a normal random variable
are within +/- 3 standard deviations of its mean.
MORE ON THIS LATER

How this works
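Look at the z table in the front of the text.
[Figure: standard normal curve with the area between z = -2 and z = +2 shaded]
The shaded area is what you need to find, but the table gives cumulative areas:
Area to the left of z = -2: .0228
Area to the left of z = +2: .9772
.9772 - .0228 = .9544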
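The table lookup can be reproduced with Python's standard library, where `statistics.NormalDist` (Python 3.8+) provides the cumulative area to the left of a value. The helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


for z in (1, 2, 3):
    print(z, round(area_within(z), 4))
```

For z = 2 the unrounded answer is about .9545; the table gives .9544 because each lookup is rounded to four decimals before subtracting.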
Empirical Rule
[Figure: normal curve over x, centered at μ, with bands μ ± 1σ (68%), μ ± 2σ (95%), and μ ± 3σ (99.7%)]
Outliers and Chebyshev
An outlier is an unusually small or unusually large
value in a data set.
A data value with a z-score less than -3 or greater
than +3 might be considered an outlier
It might be:
an incorrectly recorded data value (fix)
a data value that was incorrectly included in the data set
(remove)
a correctly recorded data value (keep)
Let's continue with the StartSalary data
Note that all of the values are within 3 standard deviations of the mean, so there are no outliers.
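A sketch of the z-score screen for outliers. The twelve salary values are not listed in these slides; the list below is an assumed reconstruction that reproduces the stated mean (3540), standard deviation (165.65), and 6th observation (3310). The second data set is purely hypothetical and exists only to show a flagged value.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - x_bar) / s) > threshold]


# ASSUMED StartSalary data (not given in the slides; matches mean and std. dev.).
start_salary = [3450, 3550, 3650, 3480, 3355, 3310,
                3490, 3730, 3540, 3925, 3520, 3480]

# HYPOTHETICAL data with one suspicious entry.
with_bad_entry = [10] * 19 + [100]

print(find_outliers(start_salary))
print(find_outliers(with_bad_entry))
```

A flagged value still needs a human decision: fix it, remove it, or keep it, as listed above.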
You try it
#18
Measures of Association
between Two Variables
Covariance
Correlation
Covariance
measure of the linear association between two variables
Positive numbers = positive relationship
Negative numbers = negative relationship
Population: $\sigma_{xy} = \dfrac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample: $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
NOTE: covariance is impacted by the measurement units
Let's continue with the Stereo Data
Calculate covariance using Excel
Week      No. of Commercials (x)   Sales Volume (y)   (xi - x̄)   (yi - ȳ)   (xi - x̄)(yi - ȳ)
1         2                        50                 -1          -1          1
2         5                        57                  2           6         12
3         1                        41                 -2         -10         20
4         3                        54                  0           3          0
5         4                        54                  1           3          3
6         1                        38                 -2         -13         26
7         5                        63                  2          12         24
8         3                        48                  0          -3          0
9         4                        59                  1           8          8
10        2                        46                 -1          -5          5
Total     30                       510                 0           0         99
Average   3                        51
Covariance shown by Excel = 99/10 = 9.9 (this is the population covariance)
Sample covariance = 99/(10 - 1) = 11
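For readers working outside Excel, the table can be reproduced with a short Python sketch. The function name and the `sample` flag are illustrative choices; the data are the ten weeks from the table.

```python
# Stereo data from the table above.
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]


def covariance(x, y, sample=True):
    """Covariance of paired data; divides by n - 1 (sample) or n (population)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1 if sample else n)


print(covariance(commercials, sales, sample=False))  # population version
print(covariance(commercials, sales, sample=True))   # sample version
```

Dividing the cross-product total of 99 by n = 10 gives 9.9, and dividing by n - 1 = 9 gives 11.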
Interpreting covariance
Positive s_xy = positive linear association
Negative s_xy = negative linear association
Values close to zero = no linear relationship

Correlation
measure of linear association (not impacted by
measurement units)
Correlation does not mean causation!
Sample: $r_{xy} = \dfrac{s_{xy}}{s_x s_y}$
Population: $\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$
The coefficient can take on values between -1
and +1.
Values close to -1 = strong negative linear relationship
Values close to +1 = strong positive linear relationship
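A sketch of the sample correlation coefficient, applied to the Stereo data (weekly commercials and sales volume). The function name is illustrative; it follows the sample formula r_xy = s_xy / (s_x s_y).

```python
from math import sqrt

# Stereo data: number of commercials and sales volume for ten weeks.
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]


def correlation(x, y):
    """Sample correlation: sample covariance over the product of sample std. devs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)


r = correlation(commercials, sales)
print(round(r, 2))  # 0.93
```

A value of about +0.93 indicates a strong positive linear relationship. Unlike covariance, it would not change if sales were recorded in different units.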
Continuing with the Stereo Data
You try it
# 39
Of course Excel will do all that for
you
Use the COVAR function
NOTE: COVAR returns the population covariance; adjust it to get the sample covariance:
sample cov = {n/(n-1)} * pop cov
Use the CORREL function
Let's do #46 in Excel
The Weighted Mean and Working with
Grouped Data
Weighted mean: giving each observation a
weight that reflects its importance
E.g. In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value (e.g., pounds, dollars)
Computing a Weighted Mean
$\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$
where:
xi = value of observation i
wi = weight for observation i
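A minimal sketch of the weighted mean, using the GPA idea mentioned earlier. The grade points and credit hours are hypothetical values invented for the illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# HYPOTHETICAL transcript: grade points (x_i) and credit hours (w_i).
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.0
```

Here the weighted sum is 12 + 12 + 6 = 30 over 10 credit hours, so the GPA is 3.0.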
Let's do an example
#56
Grouped Data
Used when data are only available in grouped or
frequency distribution form
Used to find approximations of the mean, variance, and
standard deviation for the grouped data
Treat the midpoint of each class as though it were the
mean of all items in the class.
Compute a weighted mean, variance, and standard
deviation of the class midpoints, using the class
frequencies as weights.
Computing Weighted Mean for
Grouped Data
$\bar{x} = \dfrac{\sum f_i M_i}{n}$
where:
fi = frequency of class i
Mi = midpoint of class i
Example
Rent ($)    Frequency
420-439      8
440-459     17
460-479     12
480-499      8
500-519      7
520-539      4
540-559      2
560-579      4
580-599      2
600-619      6
Sample Mean for Grouped Data
Rent ($)    fi    Mi       fi Mi
420-439      8    429.5     3436.0
440-459     17    449.5     7641.5
460-479     12    469.5     5634.0
480-499      8    489.5     3916.0
500-519      7    509.5     3566.5
520-539      4    529.5     2118.0
540-559      2    549.5     1099.0
560-579      4    569.5     2278.0
580-599      2    589.5     1179.0
600-619      6    609.5     3657.0
Total       70             34525.0
$\bar{x} = \dfrac{34{,}525}{70} = 493.21$
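The grouped-data mean can be checked with a short sketch that derives each midpoint from the class limits and weights it by the class frequency. The variable names are illustrative; the classes and frequencies come from the rent table.

```python
# (lower limit, upper limit, frequency) for each rent class.
rent_classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

# Midpoint M_i of each class and the frequency-weighted total.
midpoints = [(lower + upper) / 2 for lower, upper, _ in rent_classes]
frequencies = [f for _, _, f in rent_classes]

n = sum(frequencies)
weighted_total = sum(f * m for f, m in zip(frequencies, midpoints))
grouped_mean = weighted_total / n

print(n, weighted_total, round(grouped_mean, 2))
```

This gives n = 70, a weighted total of 34,525, and a mean of about 493.21. The result is an approximation, because every rent in a class is treated as if it equaled the class midpoint.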
Weighted Variance and Standard
Deviation
Sample: $s^2 = \dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
Population: $\sigma^2 = \dfrac{\sum f_i (M_i - \mu)^2}{N}$
Rent ($)    fi    Mi       Mi - x̄    (Mi - x̄)^2    fi (Mi - x̄)^2
420-439      8    429.5    -63.7       4058.96       32471.71
440-459     17    449.5    -43.7       1910.56       32479.59
460-479     12    469.5    -23.7        562.16        6745.97
480-499      8    489.5     -3.7         13.76         110.11
500-519      7    509.5     16.3        265.36        1857.55
520-539      4    529.5     36.3       1316.96        5267.86
540-559      2    549.5     56.3       3168.56        6337.13
560-579      4    569.5     76.3       5820.16       23280.66
580-599      2    589.5     96.3       9271.76       18543.53
600-619      6    609.5    116.3      13523.36       81140.18
Total       70                                      208234.29
Continued
Sample Variance
$s^2 = 208{,}234.29/(70 - 1) = 3{,}017.89$
Sample standard deviation
$s = \sqrt{3{,}017.89} = 54.94$
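A sketch of the same variance and standard deviation calculation in Python, using the class midpoints and frequencies from the rent table. The names are illustrative, and the sample (n - 1) formula is used, as on the slide.

```python
from math import sqrt

# Class midpoints (M_i) and frequencies (f_i) from the rent table.
midpoints = [429.5, 449.5, 469.5, 489.5, 509.5,
             529.5, 549.5, 569.5, 589.5, 609.5]
frequencies = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]

n = sum(frequencies)
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Frequency-weighted sum of squared deviations from the grouped mean.
sum_sq = sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints))

sample_variance = sum_sq / (n - 1)
sample_std_dev = sqrt(sample_variance)

print(round(sum_sq, 2), round(sample_variance, 2), round(sample_std_dev, 2))
```

The totals match the slide: 208,234.29 for the weighted squared deviations, 3,017.89 for the variance, and 54.94 for the standard deviation. Individual table rows can differ in the last digit depending on how the mean is rounded.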
Summary of Formulas
Pages 156 and 157 give a good summary of
notations and formulas
