
The sample mean $\bar{x}$ and the sample standard deviation $s$ of a data set have been introduced as measures of its location (center) and its variability (spread). In fact, these two numbers give a very compressed summary of a data set. The link that lets us construct a picture of the whole data set from its mean and standard deviation is called the Empirical Rule:

Empirical Rule
about 68% of a data set lies in the range $\bar{x} - s$ to $\bar{x} + s$
about 95% of a data set lies in the range $\bar{x} - 2s$ to $\bar{x} + 2s$
almost all of a data set lies in the range $\bar{x} - 3s$ to $\bar{x} + 3s$

This rule was established by observing many data sets, hence the name. Here is an example based on the data: 50.6, 50.9, 49.1, 51.3, 50.5, 49.7, 51.5, 49.8, 51.1, 48.9, 50.3, 49.2, 51.2, 50.4, 52.8.

Statistics: $\bar{x} = 50.5$, $s = 1.05$, $n = 15$

| Interval | Numerical interval | Fraction of data in interval | Empirical Rule |
|---|---|---|---|
| $\bar{x} - s$ to $\bar{x} + s$ | 49.45 to 51.55 | 11/15 or 73% | 68% |
| $\bar{x} - 2s$ to $\bar{x} + 2s$ | 48.4 to 52.6 | 14/15 or 93% | 95% |
| $\bar{x} - 3s$ to $\bar{x} + 3s$ | 47.35 to 53.65 | 15/15 or 100% | ~100% |
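The counts in the example can be checked with a short script. The following is a minimal sketch in Python using only the standard library; the function name `empirical_rule_fractions` is a hypothetical helper introduced here for illustration, and it works from the unrounded mean and standard deviation rather than the rounded values 50.5 and 1.05.

```python
from statistics import mean, stdev

# Data from the worked example.
data = [50.6, 50.9, 49.1, 51.3, 50.5, 49.7, 51.5, 49.8,
        51.1, 48.9, 50.3, 49.2, 51.2, 50.4, 52.8]


def empirical_rule_fractions(values):
    """Fraction of values within 1, 2 and 3 sample standard deviations of the mean.

    Hypothetical helper for illustration; returns a dict {k: fraction}.
    """
    xbar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    fractions = {}
    for k in (1, 2, 3):
        lo, hi = xbar - k * s, xbar + k * s
        inside = sum(1 for v in values if lo <= v <= hi)
        fractions[k] = inside / len(values)
    return fractions


fractions = empirical_rule_fractions(data)
for k, frac in fractions.items():
    print(f"within {k} standard deviation(s) of the mean: {frac:.0%}")
```

For this data set the unrounded statistics give the same counts as the example: 11, 14 and 15 of the 15 values fall in the three ranges.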


The Empirical Rule suggests that the mean $\bar{x}$ of a data set is a natural center or origin for the values, and the standard deviation $s$ is a natural scale. This leads to the idea of transforming the data according to the rule

$$x \mapsto z = \frac{x - \bar{x}}{s}$$

($z$ is called the z-score or normal score of $x$). The Empirical Rule then states that most values in a data set have z-scores between $-3$ and $+3$. A data point whose z-score is beyond $+3$ or $-3$ is called an outlier. Notice that the z-score is dimensionless.

Example. In the data of the preceding example, the smallest data item is 48.9, and its z-score is $(48.9 - 50.5)/1.05 = -1.52$; the largest data item is 52.8, and its z-score is $(52.8 - 50.5)/1.05 = 2.19$.

Exercises

1. Find the fraction of the data within each of the three ranges $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$, and compare to the prediction of the Empirical Rule. The statistics are $\bar{x} = 92.26$, $s = 2.39$. Data: 91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21.

2. Find the z-scores of the largest and smallest items in the data set of Exercise 1.

3. In a data set containing 10 items, it is actually impossible for any item to have a z-score larger than 3 in absolute value. How close can you come? Try to make up a data set of 10 numbers (small integers will be OK) that has an item with a large z-score.

4. A very rough approximation for the standard deviation of a large data set is (max - min)/6. Explain how the Empirical Rule justifies this.
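The z-score transformation used in the worked example earlier in this section can also be sketched in code. This is an illustrative Python snippet, not part of the original text; `z_score` is a hypothetical helper name, and the rounded statistics 50.5 and 1.05 from the example are passed in explicitly so that the output matches the hand calculation.

```python
def z_score(x, xbar, s):
    """Standardize x: subtract the mean xbar, then divide by the standard deviation s."""
    return (x - xbar) / s


# Data from the worked example, with the rounded statistics quoted in the text.
data = [50.6, 50.9, 49.1, 51.3, 50.5, 49.7, 51.5, 49.8,
        51.1, 48.9, 50.3, 49.2, 51.2, 50.4, 52.8]
xbar, s = 50.5, 1.05

z_min = z_score(min(data), xbar, s)
z_max = z_score(max(data), xbar, s)
print(f"smallest item {min(data)}: z = {z_min:.2f}")
print(f"largest item  {max(data)}: z = {z_max:.2f}")

# Items whose z-score lies beyond +3 or -3 would be flagged as outliers.
outliers = [x for x in data if abs(z_score(x, xbar, s)) > 3]
print("outliers:", outliers)
```

Because the numerator and the denominator carry the same units, the result is a pure number, which is what "dimensionless" means here. Using the unrounded mean and standard deviation instead of 50.5 and 1.05 shifts the z-scores slightly in the second decimal place.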
