Chapter Outline
1. Measuring center: the mean
2. Measuring center: the median
3. Comparing the mean and the median
4. Measuring spread: the quartiles
5. The five-number summary and boxplots
6. Measuring spread: the standard deviation
7. Choosing measures of center and spread
How to get $\bar{x}$?
$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$
Example 2.1 (P.33)
$\bar{x} = \frac{\sum x_i}{n}$
How to find M?
1. Sort all observations in increasing order. (This step is important!)
2. If n is odd, the $\left(\frac{n+1}{2}\right)$th observation in the ordered list is M. If n is even, M is the average of the two center values.
Note that $\frac{n+1}{2}$ is the location of the median in the ordered list, not the median value.
Examples
Case 1. 11, 21, 13, 24, 15, 26, 17
Case 2. 11, 21, 13, 24, 15, 26
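The two cases can be worked through with a short Python sketch of the sort-then-locate procedure. The function name `median_by_location` is an illustrative choice and does not come from the text.

```python
def median_by_location(observations):
    """Return the median M by sorting and locating the center of the list."""
    ordered = sorted(observations)  # Step 1: sort in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    middle = n // 2
    if n % 2 == 1:
        # n odd: the (n + 1)/2-th observation (1-based) sits at index n // 2 (0-based)
        return ordered[middle]
    # n even: average the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2


case_1 = [11, 21, 13, 24, 15, 26, 17]  # n = 7, location (7 + 1)/2 = 4
case_2 = [11, 21, 13, 24, 15, 26]      # n = 6, location (6 + 1)/2 = 3.5

print(median_by_location(case_1))  # 17
print(median_by_location(case_2))  # 18.0
```

In Case 1 the sorted list is 11, 13, 15, 17, 21, 24, 26, so the 4th observation, 17, is M. In Case 2 the location 3.5 falls between the 3rd and 4th sorted values, 15 and 21, so M is their average, 18.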
The median is more resistant than the mean: a few extreme observations can pull the mean a long way but barely move the median. The mean and median of a symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is farther out in the long tail than is the median.
Example
1, 2, 3, 4, 5, 6, 10000
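A quick check of this example, using Python's standard `statistics` module, shows how the single large value affects each measure. Dropping the last observation for comparison is an addition made here for illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 6, 10000]
without_outlier = data[:-1]  # the same data with 10000 left out, for comparison

# Without the extreme value, mean and median agree.
print(mean(without_outlier), median(without_outlier))  # 3.5 3.5

# With it, the mean jumps to roughly 1431.57 while the median only moves to 4.
print(mean(data), median(data))
```

The mean is pulled far into the long right tail, while the median stays among the typical values.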
Inference :
Strongly skewed distributions are usually reported with the median rather than the mean.
Note: (1) It is important to sort the data before finding the quartiles! (2) The quartiles are resistant.
The five-number summary: Minimum, Q1, M, Q3, Maximum. A boxplot is a graph of the five-number summary.
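A minimal sketch of computing the five-number summary follows. The quartile rule is an assumption, since these notes do not spell it out: Q1 and Q3 are taken as the medians of the observations below and above the location of the overall median, with the median itself left out of both halves when n is odd. The function names are illustrative, and the data are the Case 1 values from the median examples.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum).

    Assumed rule: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, excluding the overall median when n is odd.
    """
    ordered = sorted(observations)  # sort first, as the note requires
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],
        median_of_sorted(lower),
        median_of_sorted(ordered),
        median_of_sorted(upper),
        ordered[-1],
    )


print(five_number_summary([11, 21, 13, 24, 15, 26, 17]))  # (11, 13, 17, 24, 26)
```

Software packages use several slightly different quartile rules, so a library routine may return somewhat different values for Q1 and Q3 on small data sets.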
Boxplot
1. A boxplot is a graph of the five-number summary.
2. A central box spans the quartiles.
3. A line in the box marks the median.
4. Lines extend from the box out to the minimum and maximum.
5. Range = maximum - minimum
Figure 2.2 (P.39): side-by-side boxplots comparing the distributions of earnings for two levels of education.
Inference :
In a symmetric distribution, Q1 and Q3 are equally distant from the median. In a right-skewed distribution, the third quartile is farther above the median than the first quartile is below it.
The standard deviation says how far the observations are from their mean. The variance $s^2$ of a set of observations is an average of the squares of the deviations of the observations from their mean. The standard deviation s is the square root of the variance. Notation: $s^2$ for the variance and s for the standard deviation.
$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$
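As a sketch, both forms of the variance formula (squared deviations, and the shortcut with the sum of squares) can be computed directly and compared with the standard library's `statistics.variance`. The function names are illustrative, and the data are borrowed from Case 1 of the median examples.

```python
import math
from statistics import variance


def sample_variance(observations):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(observations)
    x_bar = sum(observations) / n
    return sum((x - x_bar) ** 2 for x in observations) / (n - 1)


def sample_variance_shortcut(observations):
    """Shortcut form: (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(observations)
    x_bar = sum(observations) / n
    return (sum(x * x for x in observations) - n * x_bar ** 2) / (n - 1)


data = [11, 21, 13, 24, 15, 26, 17]
s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation is the square root of the variance

print(s2, sample_variance_shortcut(data), variance(data))  # all about 32.14
print(s)  # about 5.67
```

The shortcut form is algebraically identical but can lose precision in floating-point arithmetic when the values are large relative to their spread, so the deviation form is usually the safer one to code.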
Why (n-1) ?
Because the sum of the deviations $(x_i - \bar{x})$ always equals 0, knowing (n-1) of them determines the last one. Only (n-1) of the squared deviations can vary freely, so we average by dividing the total by (n-1). The number (n-1) is called the degrees of freedom of the variance or standard deviation.
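The zero-sum property can be checked with exact arithmetic. This sketch uses Python's `fractions.Fraction` to avoid rounding error and borrows the Case 2 data from the median examples; the variable names are illustrative.

```python
from fractions import Fraction

data = [11, 21, 13, 24, 15, 26]
n = len(data)
x_bar = Fraction(sum(data), n)  # exact mean, 55/3

# Deviations of each observation from the mean
deviations = [x - x_bar for x in data]
total = sum(deviations)
print(total)  # 0

# Knowing the first n - 1 deviations pins down the last one:
recovered_last = -sum(deviations[:-1])
print(recovered_last == deviations[-1])  # True
```

Since the last deviation carries no new information, only n - 1 of them are free to vary, which is the sense in which the variance has n - 1 degrees of freedom.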
Properties of $s^2$ and s
s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
$s \ge 0$, and s = 0 only when all observations have the same value.
s is not resistant.
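These properties can be illustrated with `statistics.stdev` from the Python standard library. The list of identical values is a made-up example; the other data reuse the values 1 through 6 and 10000 from the earlier mean and median example.

```python
from statistics import stdev

identical = [5, 5, 5, 5]           # hypothetical data with no spread at all
typical = [1, 2, 3, 4, 5, 6]
with_outlier = typical + [10000]   # one extreme observation added

print(stdev(identical))     # 0.0, since every observation equals the mean
print(stdev(typical))       # about 1.87
print(stdev(with_outlier))  # several thousand: one outlier dominates s
```

A single extreme observation inflates s by three orders of magnitude here, which is what "not resistant" means in practice.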
With a skewed distribution or a distribution with extreme outliers, the five-number summary is better.
With a symmetric distribution (without outliers), the mean and standard deviation are better.