Statistical Lingo
Virtually everyone has a pretty good understanding of what is meant by the word average.
A golfer who shot rounds of 78, 84, and 87 could compute her average, or what statisticians would call her mean (μ or x̄).
NOTE: In general, a Greek letter (μ) is used when an entire population's data is being summarized; for a sample, the regular letter (x̄) is used.
Her mean is computed like this:
  78 + 84 + 87 = 249
  249 / 3 = 83
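The same arithmetic can be sketched in a few lines of Python. This is only an illustration; the variable names are not from the slides.

```python
# The golfer's three rounds from the example.
scores = [78, 84, 87]

total = sum(scores)          # 78 + 84 + 87
mean = total / len(scores)   # divide by the number of rounds

print(total, mean)
```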
Each of these scores deviates from the average (83) by some amount.
These deviations can be combined to calculate what is called a standard deviation.
  Score   Deviation from mean (83)
  78      -5
  84      +1
  87      +4
But if we want to calculate the standard deviation, we can't simply add the deviations up: they'll cancel each other out and we'll get zero (-5 + 1 + 4 = 0).
Squaring the deviations prevents that problem.
  Score   Deviation   Squared deviation
  78      -5          25
  84      +1          1
  87      +4          16
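A short Python sketch of this step, using the golfer's scores, shows the cancellation and how squaring avoids it. The variable names are illustrative only.

```python
scores = [78, 84, 87]
mean = sum(scores) / len(scores)

# Signed distance of each score from the mean.
deviations = [x - mean for x in scores]

# Squaring makes every term non-negative, so nothing cancels.
squared = [d ** 2 for d in deviations]

print(deviations)        # the signed deviations
print(sum(deviations))   # these cancel out
print(squared)           # the squared deviations
```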
Then we can add the squares up. This gives an estimate of how much variation is present. (Adding up squared differences like this is called the sum of squares.)
  25 + 1 + 16 = 42
Then we divide this sum by the number of scores in the list (N) minus 1. (This is because we only have a sample of this person's golf scores; if we had all of her golf scores, we would simply divide by N.)
  42 / (3 - 1) = 42 / 2 = 21
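As a rough sketch, the sum of squares and the two possible divisors look like this in Python. The names `sample_variance` and `population_variance` are chosen here for illustration.

```python
scores = [78, 84, 87]
n = len(scores)
mean = sum(scores) / n

# Sum of the squared deviations from the mean.
sum_of_squares = sum((x - mean) ** 2 for x in scores)

# A sample of her scores: divide by N - 1.
sample_variance = sum_of_squares / (n - 1)

# Only appropriate if these were ALL of her scores: divide by N.
population_variance = sum_of_squares / n

print(sum_of_squares, sample_variance, population_variance)
```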
If we just leave it like this, it's called the variance (σ² or s²). If we take the square root (which undoes the squaring of the deviations earlier), we'll get the standard deviation (σ or s). (The divisor is 2 because it's the number of data points in the sample minus 1.)
  variance: 42 / 2 = 21
  standard deviation: √21 ≈ 4.6
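Python's standard `statistics` module can be used to check these figures. In this sketch, `variance` and `stdev` are the sample versions (dividing by N - 1), while `pstdev` is the population version (dividing by N).

```python
import statistics

scores = [78, 84, 87]

s2 = statistics.variance(scores)    # sample variance, divides by N - 1
s = statistics.stdev(scores)        # sample standard deviation, sqrt of s2
sigma = statistics.pstdev(scores)   # population standard deviation, divides by N

print(s2, round(s, 1), round(sigma, 1))
```

The population figure comes out smaller than the sample figure because it divides the same sum of squares by a larger number.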
Another common term is the median. It's the middle value of the data when sorted, and it is insensitive to how extreme the other values in the set are. Real estate folks might refer to a median income level for an area: it's virtually unaffected by Bill Gates moving into (or out of) the neighborhood.
For the golfer's rounds of 78, 84, and 87, the median is 84.
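The neighborhood example can be illustrated with a small Python sketch. The income figures below are made up purely for illustration; they are not from the slides.

```python
import statistics

# Hypothetical household incomes for a small neighborhood.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same neighborhood after one extremely high earner moves in.
with_outlier = incomes + [1_000_000_000]

print(statistics.median(incomes), statistics.median(with_outlier))
print(statistics.mean(incomes), statistics.mean(with_outlier))
```

The median moves only from 50,000 to 52,500, while the mean jumps by several orders of magnitude.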
In a few short slides, we've covered a number of the most frequently used statistical terms.
  mean (x̄): 249 / 3 = 83
  deviation: -5, +1, +4
  variance (s²): 42 / 2 = 21
  standard deviation (s): √21 ≈ 4.6
  median: the middle value (84)
Of course, if you had to manually compute:
an average
a deviation for each data point
a square of each deviation
a sum of the squares
a variance
a standard deviation
a median
every time you got some data, things could get crazy, especially if there's a lot of data. Thankfully, we have Minitab.
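To give a feel for what such software automates, here is a minimal Python sketch that bundles the quantities listed above into one call. The `summarize` function is hypothetical and is not meant to reproduce Minitab's output.

```python
import statistics

def summarize(data):
    """Return the basic summary statistics for a sample (hypothetical helper)."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "variance": statistics.variance(data),   # sample variance (N - 1)
        "stdev": statistics.stdev(data),         # sample standard deviation
        "median": statistics.median(data),
    }

summary = summarize([78, 84, 87])
for name, value in summary.items():
    print(name, value)
```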
Minitab will provide a summary of the data that looks something like this.
We'll break this down in pieces to explain all the information displayed.
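[Figure: example Minitab summary output. Labels in the figure: Normal, Not Normal, Not Normal. The mean is annotated x̄ (sample) or μ (population); the standard deviation is annotated s (sample) or σ (population).]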
Mean: The average value of all the data points. (If calculated using a sample of data from a population, it may be written x̄; if calculated using all the data in the population, it may be written μ.)
StDev: The standard deviation of all the data points. It can be thought of as roughly the typical distance of the data points from the mean: the larger the standard deviation, the greater the variation. (If calculated using a sample of data from a population, it's usually written s; if calculated using all the data in the population, it's usually written σ.)
Variance: Equal to the standard deviation squared. (Written s² for a sample, or σ² for a population.)
Skewness: A measure of asymmetry: the further from zero, the more skewed the data. For example, if a distribution has a long tail at its upper end, skewness will likely be positive. Typically, the skewness value will range from -3 to +3.
Kurtosis: A number reflecting how closely the shape of the sample data resembles a normal distribution. A very negative kurtosis indicates a distribution that is flatter than a normal distribution; a very positive kurtosis indicates one that is more peaked. The kurtosis value is approximately zero for a normal distribution.
N: The number of data points used in the creation of this summary.
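To make skewness and kurtosis more concrete, here is a small Python sketch using the simple moment-based definitions. This is an assumption for illustration: the slides do not state which formula Minitab uses, and statistical packages often apply small-sample adjustments, so their numbers can differ. The helper names and the data are hypothetical.

```python
def central_moment(data, k):
    """Average of (x - mean) ** k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: zero for perfectly symmetric data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about zero."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical data, evenly spread
right_tailed = [1, 1, 1, 2, 10]    # hypothetical data with a long upper tail

print(skewness(symmetric))         # symmetric data: no skew
print(skewness(right_tailed))      # positive: the tail is at the upper end
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```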
The vertical line part way through each of the red boxes is the calculated mean (top) and median (bottom) for the sample of data entered.
Around these points, Minitab calculates a 95% confidence interval: a range within which it is 95% confident that the population mean or median actually lies.
For example, in the case of the top red bar, the vertical line in the middle of the red bar shows a mean of about 50.6. While this is probably not the EXACT mean for the population, the number of data points and the amount of variation they exhibit allow us to estimate with good confidence (95%) that the mean for the population falls somewhere between 48.9 and 52.3.
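As a rough sketch of how such an interval can be built from the sample size and variation, the Python below uses a normal approximation. This is an assumption: the slides do not say how Minitab computes its interval, and for small samples a t critical value would give a somewhat wider range. The function name and the data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    n = len(data)
    mean = statistics.mean(data)
    # Standard error: sample standard deviation scaled by the sample size.
    std_err = statistics.stdev(data) / math.sqrt(n)
    # Critical value from the standard normal curve (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

sample = [48, 50, 52, 49, 51, 53, 47, 50]   # hypothetical measurements
low, high = mean_confidence_interval(sample)
print(round(low, 2), round(high, 2))
```

More data points or less variation both shrink the standard error, which narrows the interval.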
[Figure: normal curve centered on 83, with tick marks at 69.5, 74, 78.5, 83, 87.5, 92, and 96.5. About 68% of the area lies between 78.5 and 87.5 (34% on each side of the center), about 95% between 74 and 92, and about 99.73% between 69.5 and 96.5.]
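The percentages in the figure can be checked with Python's `statistics.NormalDist`. This sketch assumes a normal distribution with mean 83 and standard deviation 4.5; both values are inferred from the tick marks in the figure rather than stated in the slides. The `share_within` helper is hypothetical.

```python
from statistics import NormalDist

MEAN = 83    # assumed center of the curve
STEP = 4.5   # assumed standard deviation (spacing of the tick marks)

dist = NormalDist(mu=MEAN, sigma=STEP)

def share_within(k):
    """Fraction of the distribution within k standard deviations of the mean."""
    return dist.cdf(MEAN + k * STEP) - dist.cdf(MEAN - k * STEP)

for k in (1, 2, 3):
    low, high = MEAN - k * STEP, MEAN + k * STEP
    print(f"{low} to {high}: {share_within(k):.2%}")
```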