Standard deviation
From Wikipedia, the free encyclopedia
Standard deviation is a measure of the distribution of data about a mean
value. It describes the dispersion of data on either side of the mean. A low
standard deviation indicates that the data set is clustered around the mean
value, whereas a high standard deviation indicates that the data are widely
spread, with figures significantly higher or lower than the mean. The mean is
the arithmetic average.

Formulated by Francis Galton in the late 1860s,[1] the standard deviation
remains the most common measure of statistical dispersion, measuring how
widely spread the values in a data set are. If many data points are close to
the mean, then the standard deviation is small; if many data points are far
from the mean, then the standard deviation is large. If all data values are
equal, then the standard deviation is zero. A useful property of standard
deviation is that, unlike variance, it is expressed in the same units as the
data.

When only a sample of data from a population is available, the population
standard deviation can be estimated by a modified standard deviation of the
sample, explained below.
1 of 16 02/02/2009 10:16 AM
Standard deviation - Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Standard_deviation
Contents
1 Definition and calculation
  1.1 Probability distribution or random variable
  1.2 Continuous random variable
  1.3 Discrete random variable or data set
    1.3.1 Example
    1.3.2 Simplification of the formula
  1.4 Estimating population standard deviation from sample standard deviation
  1.5 Properties of standard deviation
2 Interpretation and application
  2.1 Application examples
    2.1.1 Weather
    2.1.2 Sports
    2.1.3 Finance
  2.2 Geometric interpretation
  2.3 Chebyshev's inequality
  2.4 Rules for normally distributed data
3 Relationship between standard deviation and mean
4 Rapid calculation methods
5 See also
6 References
7 External links

[Figure: A data set with a mean of 50 (shown in blue) and a standard
deviation (σ) of 20.]
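The standard deviation σ of a random variable X is defined as
\[ \sigma = \sqrt{\operatorname{E}\left[(X - \operatorname{E}(X))^2\right]} = \sqrt{\operatorname{E}(X^2) - (\operatorname{E}(X))^2}, \]
where E(X) is the expected value of X. E(X) is another name for the mean,
and it is often indicated with the Greek letter μ.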
Not all random variables have a standard deviation, since these expected
values need not exist. For example, the standard deviation of a random
variable which follows a Cauchy distribution is undefined because its E(X) is
undefined.
Continuous distributions usually give a formula for calculating the standard
deviation as a function of the parameters of the distribution. In general, the
standard deviation of a continuous real-valued random variable X with
probability density function p(x) is
\[ \sigma = \sqrt{\int (x - \mu)^2 \, p(x) \, dx}, \]
where
\[ \mu = \int x \, p(x) \, dx, \]
and where the integrals are definite integrals taken for x ranging over the
range of X.
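
As a rough numerical check of the integral definition, the sketch below
approximates both integrals with the midpoint rule. The function name
continuous_std, the step count, and the choice of a uniform density on
[0, 12] are illustrative assumptions, not part of the article; for that
density the closed form (b − a)/√12 is available for comparison.

```python
import math

def continuous_std(pdf, lower, upper, steps=10_000):
    """Approximate the standard deviation of a density with the midpoint rule."""
    width = (upper - lower) / steps
    # Midpoints of the integration sub-intervals.
    xs = [lower + (i + 0.5) * width for i in range(steps)]
    # mu = integral of x * p(x) dx
    mu = sum(x * pdf(x) for x in xs) * width
    # sigma^2 = integral of (x - mu)^2 * p(x) dx
    var = sum((x - mu) ** 2 * pdf(x) for x in xs) * width
    return math.sqrt(var)

# Uniform density on [0, 12] (assumed example); closed form is (b - a) / sqrt(12).
uniform_sigma = continuous_std(lambda x: 1 / 12, 0.0, 12.0)
closed_form = 12 / math.sqrt(12)
print(uniform_sigma, closed_form)
```

The approach only works when the density is negligible outside the chosen
integration limits, so a heavy-tailed density needs much wider limits or a
different method.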
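If not all values have equal probability, but the probability of value x_i
equals p_i (with the p_i summing to 1), the standard deviation can be
computed by:
\[ \sigma = \sqrt{\sum_{i=1}^{N} p_i (x_i - \mu)^2}, \]
where
\[ \mu = \sum_{i=1}^{N} p_i x_i. \]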
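
A minimal sketch of the unequal-probability case, assuming the usual
definition σ = sqrt(Σ p_i (x_i − μ)²) with μ = Σ p_i x_i and probabilities
that sum to 1. The helper name discrete_std and the loaded-die
probabilities are made up for illustration.

```python
import math

def discrete_std(values, probs):
    """Standard deviation of a discrete random variable.

    Assumes probs are non-negative and sum to 1.
    """
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return math.sqrt(var)

faces = [1, 2, 3, 4, 5, 6]

# Fair die: every face has probability 1/6.
fair_sigma = discrete_std(faces, [1 / 6] * 6)

# Hypothetical loaded die: face 6 comes up half the time.
loaded_sigma = discrete_std(faces, [0.1, 0.1, 0.1, 0.1, 0.1, 0.5])

print(fair_sigma, loaded_sigma)
```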
Example
Suppose we wished to find the standard deviation of the data set consisting
of the values 3, 7, 7, and 19.
Step 3: square each of the deviations, which amplifies large deviations and
makes negative values positive,
\[ (-6)^2 = 36, \quad (-2)^2 = 4, \quad (-2)^2 = 4, \quad 10^2 = 100. \]
Step 5: take the non-negative square root of the quotient (converting
squared units back to regular units),
\[ \sqrt{36} = 6. \]
So, the standard deviation of the set is 6. This example also shows that, in
general, the standard deviation is different from the mean absolute deviation
(which is 5 in this example).

Note that if the above data set represented only a sample from a greater
population, a modified standard deviation would be calculated (explained
below) to estimate the population standard deviation, which would give 6.93
for this example.
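
The whole calculation for the data set 3, 7, 7, 19 can be retraced in a few
lines. This is only an illustrative sketch; the variable names are
arbitrary, and the last two lines cross-check the hand calculation against
Python's standard statistics module.

```python
import math
import statistics

data = [3, 7, 7, 19]

mean = sum(data) / len(data)                        # 9.0
deviations = [x - mean for x in data]               # [-6.0, -2.0, -2.0, 10.0]
squared = [d ** 2 for d in deviations]              # [36.0, 4.0, 4.0, 100.0]

# Divide by N for a complete population, by N - 1 for a sample.
population_sd = math.sqrt(sum(squared) / len(data))         # 6.0
sample_sd = math.sqrt(sum(squared) / (len(data) - 1))       # about 6.93

# Mean absolute deviation, for comparison with the standard deviation.
mean_abs_dev = sum(abs(d) for d in deviations) / len(data)  # 5.0

# Cross-check with the standard library.
library_population_sd = statistics.pstdev(data)
library_sample_sd = statistics.stdev(data)
```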
In the real world, finding the standard deviation of an entire population is
unrealistic except in certain cases, such as standardized testing, where every
member of a population is sampled. In most cases, the standard deviation is
estimated by examining a random sample taken from the population. Using
the definition given above for a data set and applying it to a small or
moderately sized sample results in an estimate that tends to be too low. The
most common measure used is an adjusted version, the sample standard
deviation, which is defined by
\[ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}, \]
where x_1, ..., x_N are the sample values and x̄ is the sample mean.
The quantity s² is an unbiased estimator for the variance σ² of the
underlying population, if that variance exists and the sample values are
drawn independently with replacement. However, s is not an unbiased
estimator for the standard deviation σ; it tends to underestimate the
population standard deviation. Although an unbiased estimator for σ is known
when the random variable is normally distributed, the formula is complicated
and amounts to a minor correction: see Unbiased estimation of standard
deviation. Moreover, unbiasedness, in this sense of the word, is not always
desirable; see bias of an estimator.
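
The difference between estimating the variance and estimating the standard
deviation can be checked exactly on a tiny case. The sketch below is an
illustration under assumed inputs: it treats 3, 7, 7, 19 as the whole
population and enumerates every ordered sample of size 2 drawn with
replacement, so no random simulation is needed.

```python
import itertools
import math
import statistics

population = [3, 7, 7, 19]
sigma = statistics.pstdev(population)      # population standard deviation, 6.0

# Every ordered sample of size 2 drawn with replacement (16 in total).
samples = list(itertools.product(population, repeat=2))

# Average of the N - 1 corrected sample variance over all samples.
mean_s2 = sum(statistics.variance(s) for s in samples) / len(samples)

# Average of the sample standard deviation over all samples.
mean_s = sum(statistics.stdev(s) for s in samples) / len(samples)

print(mean_s2, sigma ** 2)   # the two agree: s^2 is unbiased for sigma^2
print(mean_s, sigma)         # mean_s is smaller: s underestimates sigma
```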
The uncorrected form, which applies the data-set definition directly to the
sample, has a uniformly smaller mean squared error than does the unbiased
estimator, and is the maximum-likelihood estimate when the population is
normally distributed.
For example, each of the three data sets {0, 0, 14, 14}, {0, 6, 8, 14} and
{6, 6, 8, 8} has a mean of 7. Their standard deviations are 7, 5, and 1,
respectively. The third set has a much smaller standard deviation than the
other two because its values are all close to 7. In a loose sense, the standard
deviation tells us how far from the mean the data points tend to be. It will
have the same units as the data points themselves. If, for instance, the data
set {0, 6, 8, 14} represents the ages of four siblings in years, the standard
deviation is 5 years.
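
The three data sets can be checked directly with Python's statistics module,
treating each set as a complete population (pstdev divides by N). The
variable names are illustrative.

```python
import statistics

data_sets = [[0, 0, 14, 14], [0, 6, 8, 14], [6, 6, 8, 8]]

# One (mean, population standard deviation) pair per data set.
summary = [(statistics.mean(d), statistics.pstdev(d)) for d in data_sets]

for data, (mean, sd) in zip(data_sets, summary):
    print(data, "mean:", mean, "standard deviation:", sd)
```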
As another example, the data set {1000, 1006, 1008, 1014} may represent
the distances traveled by four athletes, measured in meters. It has a mean of
1007 meters, and a standard deviation of 5 meters.
Application examples
Weather
As a simple example, consider average temperatures for cities. While two
cities may each have an average temperature of 15 °C, it's helpful to
understand that the range for cities near the coast is smaller than for cities
inland, which clarifies that, while the average is similar, the chance for
variation is greater inland than near the coast.

So, an average of 15 °C occurs for one city with highs of 25 °C and lows of
5 °C, and also occurs for another city with highs of 18 °C and lows of 12 °C.
The standard deviation allows us to recognize that the average for the city
with the wider variation, and thus a higher standard deviation, will not offer
as reliable a prediction of temperature as the city with the smaller variation
and lower standard deviation.
Sports
categories will have a low standard deviation. A team that is consistently good
in most categories will also have a low standard deviation. However, a team
with a high standard deviation might be the type of team that scores a lot
(strong offense) but also concedes a lot (weak defense), or, vice versa, that
might have a poor offense but compensates by being difficult to score on.

Trying to predict which teams, on any given day, will win, may include looking
at the standard deviations of the various team "stats" ratings, in which
anomalies can match strengths vs. weaknesses to attempt to understand
what factors may prevail as stronger indicators of eventual scoring
outcomes.
Finance
For example, let's assume an investor had to choose between two stocks.
Stock A over the last 20 years had an average return of 10%, with a standard
deviation of 20%, and Stock B, over the same period, had average returns of
12%, but a higher standard deviation of 30%. On the basis of risk and return,
an investor may decide that Stock A is the safer choice, because Stock B's
additional 2 percentage points of return are not worth the additional 10
percentage points of standard deviation (greater risk or uncertainty of the
expected return). Stock B is likely to fall short of the initial investment
(but also to exceed the initial investment) more often than Stock A under the
same circumstances, and is estimated to return only 2 percentage points more
on average. In this example, if returns are roughly normally distributed,
Stock A is expected to earn about 10%, plus or minus 20% (a range of 30% to
-10%), in about two-thirds of future years. When considering more extreme
possible returns or outcomes in future, an investor should expect results of
up to 10% plus or minus 60%, or a range from 70% to -50%, which includes
outcomes for three standard deviations from the average return (about
99.7% of probable returns).
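
The bands quoted for the two stocks are simply the mean plus or minus a
multiple of the standard deviation; for Stock A, three standard deviations
is 3 × 20% = 60%. The helper return_range below is a hypothetical name used
only to make that arithmetic explicit.

```python
def return_range(mean_pct, sd_pct, k):
    """Return the (low, high) band of mean +/- k standard deviations, in percent."""
    return (mean_pct - k * sd_pct, mean_pct + k * sd_pct)

stock_a_1sd = return_range(10, 20, 1)   # (-10, 30)
stock_a_3sd = return_range(10, 20, 3)   # (-50, 70)
stock_b_1sd = return_range(12, 30, 1)   # (-18, 42)
stock_b_3sd = return_range(12, 30, 3)   # (-78, 102)
```

Reading the one-sigma band as "about two-thirds of years" and the
three-sigma band as "about 99.7%" assumes returns are approximately normally
distributed.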
Calculating the average return (or arithmetic mean) of a security over a given
number of periods will generate an expected return on the asset. For each
period, subtracting the expected return from the actual return results in
the deviation from the mean. Square the deviation in each period to find the
effect of the result on the overall risk of the asset. The larger the deviation
in a period, the greater risk the security carries. Taking the average of the
squared deviations results in the variance, a measure of the overall units of
risk associated with the asset. Finding the square root of this variance will
result in the standard deviation of the investment tool in question.
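
The procedure for a security can be sketched as follows. The yearly returns
are hypothetical numbers chosen so that the arithmetic is easy to follow;
they are not real market data.

```python
import math

# Hypothetical yearly returns in percent (illustrative, not real market data).
returns = [12.0, -4.0, 20.0, 8.0, 14.0]

# Expected return: the arithmetic mean of the period returns.
expected_return = sum(returns) / len(returns)              # 10.0

# Deviation of each period's return from the expected return.
deviations = [r - expected_return for r in returns]        # [2.0, -14.0, 10.0, -2.0, 4.0]

# Variance: the average of the squared deviations.
variance = sum(d ** 2 for d in deviations) / len(returns)  # 64.0

# Standard deviation: the square root of the variance, back in percent.
standard_deviation = math.sqrt(variance)                   # 8.0
```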
Geometric interpretation
To gain some geometric insights, we will start with a population of three
values, x1, x2, x3. This defines a point P = (x1, x2, x3) in ℝ³. Consider the
line L = {(r, r, r) : r in ℝ}. This is the "main diagonal" going through the
origin. If our three given values were all equal, then the standard deviation
would be zero and P would lie on L. So it is not unreasonable to assume that
the standard deviation is related to the distance of P to L. And that is indeed
the case. Moving orthogonally from P to the line L, one hits the point
\[ R = (\bar{x}, \bar{x}, \bar{x}), \]
whose coordinates are the mean of the values we started out with. A little
algebra shows that the distance between P and R (which is the same as the
distance between P and the line L) is given by σ√3. An analogous formula
(with 3 replaced by N) is also valid for a population of N values; we then have
to work in ℝ^N.
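
The geometric claim is easy to test numerically. In this sketch the
function name distance_to_diagonal and the sample points are arbitrary
choices; the distance from a point to the main diagonal is compared with the
population standard deviation times √N.

```python
import math
import statistics

def distance_to_diagonal(point):
    """Distance from a point to the line {(r, ..., r)}."""
    mean = sum(point) / len(point)
    # The orthogonal projection onto the diagonal is (mean, ..., mean).
    return math.sqrt(sum((x - mean) ** 2 for x in point))

p3 = (2, 4, 9)        # a point in three dimensions
p4 = (3, 7, 7, 19)    # a point in four dimensions

dist3 = distance_to_diagonal(p3)
dist4 = distance_to_diagonal(p4)

# sigma * sqrt(N), using the population standard deviation.
expected3 = statistics.pstdev(p3) * math.sqrt(len(p3))
expected4 = statistics.pstdev(p4) * math.sqrt(len(p4))

print(dist3, expected3)
print(dist4, expected4)
```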
Chebyshev's inequality
An observation is rarely more than a few standard deviations away from the
mean. Chebyshev's inequality entails the following bounds for all distributions
for which the standard deviation is defined.
At least 50% of the values are within √2 standard deviations from the mean.
At least 75% of the values are within 2 standard deviations from the mean.
At least 89% of the values are within 3 standard deviations from the mean.
At least 94% of the values are within 4 standard deviations from the mean.
At least 96% of the values are within 5 standard deviations from the mean.
At least 97% of the values are within 6 standard deviations from the mean.
At least 98% of the values are within 7 standard deviations from the mean.
And in general:
At least (1 − 1/k²) × 100% of the values are within k standard deviations
from the mean.
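
The listed percentages all follow from the general bound 1 − 1/k². A short
sketch (the function name chebyshev_bound is illustrative) reproduces them
after rounding to whole percents:

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations (meaningful for k > 1)."""
    return 1 - 1 / k ** 2

ks = [math.sqrt(2), 2, 3, 4, 5, 6, 7]
percents = [round(100 * chebyshev_bound(k)) for k in ks]

for k, pct in zip(ks, percents):
    print(f"k = {k:.3f}: at least {pct}%")
```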
For normally distributed data, the percentage of values within zσ of the
mean is:

zσ         Percentage of values within mean ± zσ
1σ         68.27%
1.645σ     90%
1.960σ     95%
2σ         95.450%
2.576σ     99%
3σ         99.7300%
3.2906σ    99.9%
4σ         99.993666%
5σ         99.99994267%
6σ         99.9999998027%
7σ         99.9999999997440%
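
For a normal distribution, the fraction of values within z standard
deviations of the mean equals erf(z/√2), so the percentages can be
recomputed with the standard library. The function name normal_coverage is
an illustrative choice.

```python
import math

def normal_coverage(z):
    """Probability that a normal variable lies within z standard deviations of its mean."""
    return math.erf(z / math.sqrt(2))

for z in (1, 1.645, 1.960, 2, 2.576, 3, 3.2906, 4):
    print(f"{z}σ: {100 * normal_coverage(z):.6f}%")
```

Double-precision floats cannot resolve the last digits quoted for 6σ and 7σ,
so those rows are better checked with the complementary function math.erfc.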
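If we want to obtain the mean by sampling the distribution, then the standard
deviation of the mean is related to the standard deviation of the distribution
by:
\[ \sigma_{\text{mean}} = \frac{\sigma}{\sqrt{N}}, \]
where N is the number of independent samples used to estimate the mean.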
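
The usual form of this relationship is σ_mean = σ/√N for N independent
draws. The sketch below checks it exactly on an assumed toy population by
enumerating every ordered sample of size 3 drawn with replacement; the
population and sample size are illustrative.

```python
import itertools
import math
import statistics

population = [3, 7, 7, 19]
sigma = statistics.pstdev(population)     # 6.0
n = 3                                     # sample size (assumed for the example)

# Mean of every ordered sample of size n drawn with replacement (4**3 = 64 samples).
sample_means = [sum(s) / n for s in itertools.product(population, repeat=n)]

# Standard deviation of the sample mean, taken over all possible samples.
sd_of_mean = statistics.pstdev(sample_means)

print(sd_of_mean, sigma / math.sqrt(n))
```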
A slightly faster (significantly for running standard deviation) way to compute
the population standard deviation is given by the following formula (though
considerations must be made for round-off error, arithmetic overflow, and
arithmetic underflow conditions):
\[ \sigma = \sqrt{\frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right)} \]
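
A common single-pass approach keeps only the count, the sum, and the sum of
squares, and uses the identity σ² = (Σx²)/N − (Σx/N)². The sketch below
assumes that identity is the formula meant here; the function name
pstdev_one_pass is hypothetical.

```python
import math

def pstdev_one_pass(values):
    """Population standard deviation from running power sums (single pass)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    # max() guards against a tiny negative result caused by round-off.
    return math.sqrt(max(total_sq / n - mean * mean, 0.0))

one_pass_result = pstdev_one_pass([3, 7, 7, 19])   # 6.0
```

When the mean is large compared with the spread, the subtraction cancels
most of the significant digits, which is the round-off concern raised above.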
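\[ A_1 = x_1, \qquad A_i = A_{i-1} + \frac{x_i - A_{i-1}}{i} \]
\[ Q_1 = 0, \qquad Q_i = Q_{i-1} + \frac{(i-1)(x_i - A_{i-1})^2}{i} \]
for i = 2, ..., n.
sample variance:
\[ s_n^2 = \frac{Q_n}{n-1} \]
standard (population) variance:
\[ \sigma_n^2 = \frac{Q_n}{n} \]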
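
One well-known running update that starts from A1 = x1 and Q1 = 0 is
Welford's method, in which A tracks the running mean and Q the running sum
of squared deviations. The sketch below assumes that recurrence; the class
and method names are illustrative.

```python
class RunningStats:
    """Running mean and variance using a Welford-style update (assumed recurrence)."""

    def __init__(self):
        self.n = 0      # number of values seen so far
        self.a = 0.0    # running mean, A_n
        self.q = 0.0    # running sum of squared deviations, Q_n

    def push(self, x):
        self.n += 1
        delta = x - self.a                               # x_n - A_(n-1)
        self.a += delta / self.n                         # A_n
        self.q += (self.n - 1) * delta * delta / self.n  # Q_n

    def sample_variance(self):
        return self.q / (self.n - 1)

    def population_variance(self):
        return self.q / self.n


stats = RunningStats()
for value in [3, 7, 7, 19]:
    stats.push(value)

print(stats.a, stats.population_variance(), stats.sample_variance())
```

Because each update works with deviations from the current mean rather than
with raw sums of squares, it is far less sensitive to round-off error.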
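For values x_i with weights w_i, let W_i = w_1 + ... + w_i. Then
\[ A_1 = x_1, \qquad A_i = A_{i-1} + \frac{w_i}{W_i}\,(x_i - A_{i-1}) \]
\[ Q_1 = 0, \qquad Q_i = Q_{i-1} + \frac{w_i W_{i-1}}{W_i}\,(x_i - A_{i-1})^2 \]
\[ s_n^2 = \frac{n'}{n'-1} \cdot \frac{Q_n}{W_n}, \qquad \sigma_n^2 = \frac{Q_n}{W_n} \]
where n is the total number of elements, and n' is the number of elements
with non-zero weights. The above formulas become equal to the simpler
formulas given above if we take all weights equal to 1.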
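
A weighted running update can be sketched as below. The recurrence used
here (a weighted form of Welford's update, with the sample variance scaled
by n'/(n' − 1)) is an assumption chosen because it reduces to the unweighted
case when all weights are 1; the function name weighted_variances is
hypothetical.

```python
def weighted_variances(values, weights):
    """Return (mean, population variance, sample variance) for weighted data.

    Assumed recurrence: with W_i the running sum of weights,
      A_i = A_(i-1) + (w_i / W_i) * (x_i - A_(i-1))
      Q_i = Q_(i-1) + (w_i * W_(i-1) / W_i) * (x_i - A_(i-1))**2
    """
    w_sum = 0.0
    a = 0.0
    q = 0.0
    nonzero = 0                    # n': number of elements with non-zero weight
    for x, w in zip(values, weights):
        if w == 0:
            continue               # zero-weight elements contribute nothing
        nonzero += 1
        w_prev = w_sum
        w_sum += w
        delta = x - a
        a += (w / w_sum) * delta
        q += (w * w_prev / w_sum) * delta * delta
    population = q / w_sum
    sample = population * nonzero / (nonzero - 1)
    return a, population, sample


# All weights equal to 1: reduces to the unweighted results.
unit_mean, unit_pop, unit_sample = weighted_variances([3, 7, 7, 19], [1, 1, 1, 1])

# The value 7 carries weight 2 instead of appearing twice.
w_mean, w_pop, w_sample = weighted_variances([3, 7, 19], [1, 2, 1])
```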
See also
References
1. ^ Sir Francis Galton discovered the standard deviation
(http://www.sciencetimeline.net/1866.htm)
External links
A Guide to Understanding & Calculating Standard Deviation
(http://www.stats4students.com/Essentials/Measures-Of-Spread/Overview_3.php)
Interactive Demonstration and Standard Deviation Calculator
(http://www.usablestats.com/tutorials/StandardDeviation)
Standard Deviation - an explanation without maths
(http://www.techbookreport.com/tutorials/stddev-30-secs.html)
Standard Deviation, an elementary introduction
(http://davidmlane.com/hyperstat/A16252.html)
Standard Deviation, a simpler explanation for writers and journalists
(http://www.robertniles.com/stats/stdev.shtml)
Standard Deviation Calculator (http://invsee.asu.edu/srinivas/stdev.html)
Texas A&M Standard Deviation and Confidence Interval Calculators
(http://www.stat.tamu.edu/~jhardin/applets/)