$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

$$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2$$

where $\sigma$ is the standard deviation of the $N$ data values $y_i$ from the average value $\bar{y}$. $\sigma$ represents the RMS (root mean square) deviation of the values from the average and provides an indicator of the uncertainty or the error in the values.
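As a concrete illustration, here is a minimal Python sketch of these two formulas; the data values are made up for the example:

```python
import math

# Hypothetical repeated measurements of the same quantity (made-up values).
y = [9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2]

N = len(y)
y_bar = sum(y) / N                                             # average value
sigma = math.sqrt(sum((yi - y_bar)**2 for yi in y) / (N - 1))  # sample std. dev.

print(f"average = {y_bar:.3f}, sigma = {sigma:.3f}")
```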
expected value: This is the most likely value to occur as indicated by the probability distribution representing some random variable. It is often referred to as the average since it usually corresponds to the arithmetic mean of the values, as in

$$\bar{s} = \sum_s s\, P(s)$$

for a collection of discrete values, where $P(s)$ represents the probability distribution function for $s$ and is normalized so

$$\sum_s P(s) = 1$$

and where the sum is over all possible values of $s$. If the possible values are continuous the expected value is given by

$$\bar{s} = \int s'\, P(s')\, ds' .$$
The variance of the distribution follows directly from these definitions:

$$\begin{aligned}
\sigma^2 &= \sum_s (s - \bar{s})^2\, P(s)\\
&= \sum_s (s^2 - 2 s \bar{s} + \bar{s}^2)\, P(s)\\
&= \sum_s s^2 P(s) - 2\bar{s} \sum_s s\, P(s) + \bar{s}^2 \sum_s P(s)\\
&= \sum_s s^2 P(s) - 2\bar{s}^2 + \bar{s}^2\\
&= \sum_s s^2 P(s) - \bar{s}^2 .
\end{aligned}$$
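These sums translate directly into code. A short sketch, assuming a small made-up discrete distribution $P(s)$:

```python
# A made-up discrete probability distribution P(s), normalized to sum to 1.
P = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

assert abs(sum(P.values()) - 1.0) < 1e-12                  # normalization check

s_bar = sum(s * p for s, p in P.items())                   # expected value
variance = sum(s**2 * p for s, p in P.items()) - s_bar**2  # <s^2> - s_bar^2

print(f"expected value = {s_bar}, variance = {variance}")
```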
The binomial distribution gives the probability of obtaining exactly $s$ successes in $n$ independent trials, each with success probability $p$:

$$B_{n,p}(s) = \frac{n!}{s!\,(n-s)!}\, p^s q^{n-s} = \binom{n}{s} p^s q^{n-s}$$

where $q = 1 - p$. The distribution is normalized so that

$$\sum_{s=0}^{n} B_{n,p}(s) = 1 .$$
The expected value is

$$\begin{aligned}
\bar{s} &= \sum_{s=0}^{n} s\, B_{n,p}(s)\\
&= \sum_{s=0}^{n} s\, \frac{n!}{s!\,(n-s)!}\, p^s q^{n-s}\\
&= \sum_{s=1}^{n} \frac{n!}{(s-1)!\,(n-s)!}\, p^s q^{n-s}\\
&= np \sum_{s=1}^{n} \frac{(n-1)!}{(s-1)!\,(n-s)!}\, p^{s-1} q^{n-s}\\
&= np \sum_{j=0}^{n-1} \frac{(n-1)!}{j!\,(n-1-j)!}\, p^j q^{n-1-j}\\
&= np \sum_{j=0}^{n-1} B_{n-1,p}(j)
\end{aligned}$$

where $j = s - 1$, so

$$\bar{s} = n\, p .$$
Because $B$ is properly normalized, the sum in the next-to-last line evaluates to one (1). The variance is
$$\begin{aligned}
\sigma^2 &= \sum_{s=0}^{n} s^2\, B_{n,p}(s) - \bar{s}^2\\
&= \sum_{s=0}^{n} s^2\, \frac{n!}{s!\,(n-s)!}\, p^s q^{n-s} - \bar{s}^2\\
&= \sum_{s=1}^{n} s\, \frac{n!}{(s-1)!\,(n-s)!}\, p^s q^{n-s} - \bar{s}^2\\
&= np \sum_{s=1}^{n} s\, \frac{(n-1)!}{(s-1)!\,(n-s)!}\, p^{s-1} q^{n-s} - \bar{s}^2\\
&= np \sum_{j=0}^{n-1} (j+1)\, \frac{(n-1)!}{j!\,(n-1-j)!}\, p^j q^{n-1-j} - \bar{s}^2\\
&= np \left[ \sum_{j=0}^{n-1} j\, \frac{(n-1)!}{j!\,(n-1-j)!}\, p^j q^{n-1-j} + \sum_{j=0}^{n-1} \frac{(n-1)!}{j!\,(n-1-j)!}\, p^j q^{n-1-j} \right] - \bar{s}^2\\
&= np \left[ \sum_{j=1}^{n-1} j\, B_{n-1,p}(j) + \sum_{j=0}^{n-1} B_{n-1,p}(j) \right] - \bar{s}^2\\
&= n p \left[ (n-1)\, p + 1 \right] - (n p)^2\\
&= n p\, (1 - p)
\end{aligned}$$
since the first sum represents the expected value $\bar{j} = (n-1)\,p$ and the second sum is again equal to one (1) since $B$ is normalized.
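These two results are easy to verify numerically. A sketch using Python's standard library, with arbitrary example values $n = 20$ and $p = 0.3$:

```python
from math import comb

def binomial_pmf(s, n, p):
    """B_{n,p}(s) = C(n,s) p^s (1-p)^(n-s)."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

n, p = 20, 0.3
probs = [binomial_pmf(s, n, p) for s in range(n + 1)]

mean = sum(s * P for s, P in enumerate(probs))
var = sum(s**2 * P for s, P in enumerate(probs)) - mean**2

print(f"sum P = {sum(probs):.12f}")                 # ~1 (normalized)
print(f"mean  = {mean:.6f}  (n*p       = {n*p})")
print(f"var   = {var:.6f}  (n*p*(1-p) = {n*p*(1-p):.2f})")
```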
If the expected number of successes is held fixed at $\mu = n\,p$, the binomial distribution can be rewritten as

$$B_{n,p}(s) = \frac{n!}{s!\,(n-s)!} \left(\frac{\mu}{n}\right)^{s} \left(1 - \frac{\mu}{n}\right)^{n-s} .$$

If we now take the limit of this equation as $n \to \infty$, and with a little help from the identity $\lim_{n\to\infty}(1 - \mu/n)^n = e^{-\mu}$, we arrive at the Poisson distribution

$$P(s) = \frac{\mu^s\, e^{-\mu}}{s!}$$
where $P(s)$ represents the probability that exactly $s$ events occur in the specified interval and $\mu$ is the mean number of events expected in that interval.
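The limit can be checked numerically. A sketch that holds $\mu = np$ fixed (at the arbitrary value 4) while increasing $n$, comparing the binomial and Poisson probabilities:

```python
from math import comb, exp, factorial

def binomial_pmf(s, n, p):
    return comb(n, s) * p**s * (1 - p)**(n - s)

def poisson_pmf(s, mu):
    return mu**s * exp(-mu) / factorial(s)

mu = 4.0   # fixed mean; arbitrary example value
for n in (10, 100, 1000):
    p = mu / n
    # Largest pointwise difference over the first few values of s.
    diff = max(abs(binomial_pmf(s, n, p) - poisson_pmf(s, mu)) for s in range(11))
    print(f"n = {n:5d}: max |B - P| over s=0..10 is {diff:.2e}")
```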
The expected value is found by

$$\begin{aligned}
\bar{s} &= \sum_{s=0}^{\infty} s\, P(s)\\
&= \sum_{s=1}^{\infty} s\, e^{-\mu}\, \frac{\mu^s}{s!}\\
&= \sum_{s=1}^{\infty} e^{-\mu}\, \frac{\mu^s}{(s-1)!}\\
&= \mu\, e^{-\mu} \sum_{s=1}^{\infty} \frac{\mu^{s-1}}{(s-1)!}\\
&= \mu\, e^{-\mu} \sum_{j=0}^{\infty} \frac{\mu^j}{j!}
\end{aligned}$$
where $j = s - 1$. Note that the normalization of the distribution provides the relationship

$$\sum_{j=0}^{\infty} e^{-\mu}\, \frac{\mu^j}{j!} = 1 \qquad\Longrightarrow\qquad \sum_{j=0}^{\infty} \frac{\mu^j}{j!} = e^{\mu}$$

so the expected value is $\bar{s} = \mu\, e^{-\mu}\, e^{\mu} = \mu$.
One of the surprising, but very useful, results for this distribution is that the variance is equal to the mean:

$$\begin{aligned}
\sigma^2 &= \sum_{s=0}^{\infty} s^2\, P(s) - \bar{s}^2\\
&= \sum_{s=1}^{\infty} s^2\, e^{-\mu}\, \frac{\mu^s}{s!} - \bar{s}^2\\
&= \sum_{s=1}^{\infty} s\, e^{-\mu}\, \frac{\mu^s}{(s-1)!} - \bar{s}^2\\
&= \mu \sum_{s=1}^{\infty} s\, e^{-\mu}\, \frac{\mu^{s-1}}{(s-1)!} - \bar{s}^2\\
&= \mu \sum_{j=0}^{\infty} (j+1)\, e^{-\mu}\, \frac{\mu^j}{j!} - \bar{s}^2\\
&= \mu \left[ \sum_{j=0}^{\infty} j\, P(j) + \sum_{j=0}^{\infty} P(j) \right] - \bar{s}^2\\
&= \mu\, [\mu + 1] - \mu^2\\
&= \mu
\end{aligned}$$
where $j = s - 1$ again, the first sum gives the expected value of $j$ (which is $\mu$), and the second is equal to one (1) because this distribution is also normalized.
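A quick numerical check of both results, truncating the infinite sums at a point where the tail is negligible (the value $\mu = 4$ is arbitrary):

```python
from math import exp, factorial

mu = 4.0       # arbitrary example mean
s_max = 100    # truncation point; the tail beyond this is negligible for mu = 4

P = [mu**s * exp(-mu) / factorial(s) for s in range(s_max + 1)]

mean = sum(s * p for s, p in enumerate(P))
var = sum(s**2 * p for s, p in enumerate(P)) - mean**2

print(f"normalization = {sum(P):.12f}")
print(f"mean = {mean:.6f}, variance = {var:.6f}")   # both ~= mu
```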
Another common case is the uniform distribution,

$$P(s) = \begin{cases} 1/c & (\bar{s} - c/2) \le s \le (\bar{s} + c/2)\\ 0 & \text{otherwise.} \end{cases}$$

The variance is

$$\sigma^2 = \int_{\bar{s}-c/2}^{\bar{s}+c/2} \frac{1}{c}\, s^2\, ds - \bar{s}^2 = \left( \frac{c^2}{12} + \bar{s}^2 \right) - \bar{s}^2 = \frac{c^2}{12} .$$
This distribution is very common when recording values with modern digital equipment, since we can only record discrete values but we are usually measuring a continuous function. Here $c$ is the minimum spacing between the possible values on a digital voltmeter, or the spacing between levels returned from an analog-to-digital converter. As an example of this distribution, if a digital voltmeter is used to measure a voltage and it reads 0.001 V, the actual value could lie anywhere within the range of 0.0005 to 0.0015 V with equal likelihood. The average or expected value is 0.001 V and the standard deviation is $c/\sqrt{12} = (0.001\ \text{V})/\sqrt{12} \approx 0.0003$ V.
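A simulation sketch of this quantization effect; the sample count and the assumption that the true voltages are uniformly spread across one step are choices made for the example:

```python
import math
import random

c = 0.001           # quantization step (V), matching the voltmeter example
reading = 0.001     # value shown by the meter (V)

# Simulate actual voltages uniformly distributed across one quantization step.
samples = [reading + c * (random.random() - 0.5) for _ in range(100_000)]

mean = sum(samples) / len(samples)
sigma = math.sqrt(sum((s - mean)**2 for s in samples) / (len(samples) - 1))

print(f"mean  = {mean:.6f} V (expected {reading} V)")
print(f"sigma = {sigma:.6f} V (expected c/sqrt(12) = {c/math.sqrt(12):.6f} V)")
```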
The Gaussian (normal) distribution is

$$P(s) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(s-\bar{s})^2/(2\sigma^2)} .$$

It is normalized, its expected value is

$$\int s\, P(s)\, ds = \bar{s}$$

[OpenStax College, Introductory Statistics, OpenStax College, 19 September 2013, p. 339, available at http://cnx.org/content/col11562/latest/.]

and its variance is

$$\int s^2\, P(s)\, ds - \bar{s}^2 = \sigma^2 .$$
Many measurements of continuous variables in physics will exhibit a Gaussian distribution. If you are uncertain whether your signal has a Gaussian dependence, you can collect a large number of measurements. You then bin the values, plot a histogram of the number of times the value falls within the appropriate bin, and compare this to a Gaussian curve.
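A sketch of that comparison, assuming numpy and matplotlib are available; the data here are synthetic stand-ins for real measurements:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up data: replace with your own measurements.
rng = np.random.default_rng(0)
data = rng.normal(loc=4.0, scale=2.0, size=5000)

mean, sigma = data.mean(), data.std(ddof=1)

# Histogram of the measurements, normalized to unit area.
counts, edges, _ = plt.hist(data, bins=50, density=True, alpha=0.5,
                            label="measurements")

# Overlay the Gaussian with the same mean and standard deviation.
s = np.linspace(edges[0], edges[-1], 400)
gauss = np.exp(-(s - mean)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
plt.plot(s, gauss, label="Gaussian")
plt.xlabel("value")
plt.ylabel("probability density")
plt.legend()
plt.show()
```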
If your measurements are truly Gaussian, the meaning of $\sigma$ is very specific. If you integrate the distribution over some range $\pm\Delta$ centered on $\bar{s}$, you get the fraction of measurements that will typically fall in that range:

$$f = \frac{1}{\sigma\sqrt{2\pi}} \int_{\bar{s}-\Delta}^{\bar{s}+\Delta} e^{-(s-\bar{s})^2/(2\sigma^2)}\, ds .$$
Substituting $t = (s - \bar{s})/(\sigma\sqrt{2})$, so that $ds = \sigma\sqrt{2}\, dt$, gives

$$f = \frac{1}{\sqrt{\pi}} \int_{-\Delta/(\sigma\sqrt{2})}^{\Delta/(\sigma\sqrt{2})} e^{-t^2}\, dt .$$
Since the integrand is symmetrical about zero (0) and the integral limits are also
symmetrical about zero we can just use twice the integral from zero to the upper
limit:
$$f = \frac{2}{\sqrt{\pi}} \int_{0}^{\Delta/(\sigma\sqrt{2})} e^{-t^2}\, dt .$$

For a range of $n$ standard deviations, $\Delta = n\sigma$, this becomes

$$f = \frac{2}{\sqrt{\pi}} \int_{0}^{n/\sqrt{2}} e^{-t^2}\, dt = \operatorname{erf}(n/\sqrt{2})$$

where $\operatorname{erf}(x)$ is the well-known Gaussian error function. Most mathematical packages include this as one of the standard functions.
If we look at the fraction of the values that fall within a given number of standard
deviations we find
$$\begin{aligned}
\Delta = \sigma:&\quad f = \operatorname{erf}(1/\sqrt{2}) = 0.6826894921370858\\
\Delta = 2\sigma:&\quad f = \operatorname{erf}(2/\sqrt{2}) = 0.954499736103642\\
\Delta = 3\sigma:&\quad f = \operatorname{erf}(3/\sqrt{2}) = 0.99730020393674\\
\Delta = 4\sigma:&\quad f = \operatorname{erf}(4/\sqrt{2}) = 0.999936657516334\\
\Delta = 5\sigma:&\quad f = \operatorname{erf}(5/\sqrt{2}) = 0.999999426696856\\
\Delta = 6\sigma:&\quad f = \operatorname{erf}(6/\sqrt{2}) = 0.999999998026825
\end{aligned}$$
As you can see, virtually all the measurements of a signal that exhibits Gaussian behavior should fall within the range of $\bar{s} \pm 6\sigma$.
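These values can be reproduced with the standard error function found in most languages; in Python it is math.erf:

```python
import math

# Fraction of Gaussian measurements within n standard deviations of the mean.
for n in range(1, 7):
    f = math.erf(n / math.sqrt(2))
    print(f"within {n} sigma: f = {f:.15f}")
```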
Figure 2: A graph of the Poisson distribution (circles) overlaid on the normal distribution (solid line). Both have been calculated to have a mean value of 4 and a variance of 4. The vertical dotted line shows the mean value and the vertical dashed lines show the mean plus or minus $\sqrt{4}$. You will notice that the Poisson distribution has about the same peak value and that the peak is shifted toward lower values of $s$. On the right side of the peak (higher values of $s$) the Poisson distribution is lower than the normal distribution, consistent with the peak being shifted to the left. But the Poisson goes to zero more slowly as $s$ increases. At $s = 12$ the Poisson and normal distribution values are $6.415 \times 10^{-4}$ and $6.692 \times 10^{-5}$ respectively. At $s = 16$ the values are $3.760 \times 10^{-6}$ and $3.038 \times 10^{-9}$ respectively.
Figure 3: A graph of the normal distribution. This was calculated to have a mean value of 4 and a variance of 4 to match the values in Figure 2. The vertical dotted line shows the mean value and the vertical dashed lines show the mean plus or minus $\sqrt{4}$.
If a value $y$ is computed from measured quantities $x_i$ through a function $f$, the uncertainties $\sigma_i$ in the $x_i$ propagate to an uncertainty in $y$ given by

$$\sigma_y^2 = \sum_{i=1}^{n} \sigma_i^2 \left( \frac{\partial f}{\partial x_i} \right)^2 .$$
In this equation, we have assumed that the $x_i$ are independent and there is no cross-correlation between different $x_i$. If this is not true, we will have to add one or more terms to the sum of the form
X
n
X
i=1 j=1(j6=i)
f f
ij
xi xj
where $\sigma_{ij}$ is called the covariance of $x_i$ and $x_j$ and arises from correlations between those variables, where a change in one of them causes a change in the other. If the $x_i$ values are uncorrelated we don't have the extra terms in the equation.
There are also some pitfalls in this equation if the function $f$ is nonlinear or the uncertainties $\sigma_i$ are large in some sense. This is because the propagation equation is based on a first-order Taylor expansion of the equation for $y$, under the assumption that we only keep the lowest-order powers of $\sigma_i$.
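One way to see both the formula and its first-order nature is to compare it against a Monte Carlo estimate. A sketch for a made-up nonlinear function $f(a,b) = a\,b^2$ with independent uncertainties (all values are arbitrary):

```python
import math
import random

# Made-up example: y = f(a, b) = a * b**2 with independent uncertainties.
a, sigma_a = 2.0, 0.05
b, sigma_b = 3.0, 0.04

# First-order propagation: sigma_y^2 = (df/da)^2 sa^2 + (df/db)^2 sb^2
df_da = b**2
df_db = 2 * a * b
sigma_y = math.sqrt((df_da * sigma_a)**2 + (df_db * sigma_b)**2)

# Monte Carlo check: sample a and b, look at the spread of f.
samples = [random.gauss(a, sigma_a) * random.gauss(b, sigma_b)**2
           for _ in range(200_000)]
mean = sum(samples) / len(samples)
mc_sigma = math.sqrt(sum((s - mean)**2 for s in samples) / (len(samples) - 1))

print(f"first-order sigma_y = {sigma_y:.4f}")
print(f"Monte Carlo sigma_y = {mc_sigma:.4f}")
```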
If we take $N$ measurements $y_j$ of a signal, the standard deviation of those values is

$$\sigma_y = \sqrt{ \frac{1}{N-1} \sum_{j=1}^{N} (y_j - \bar{y})^2 }$$
(recall that this is known as the sample standard deviation). If we took a single sample $y_i$ we would expect it to have a standard deviation of $\sigma_y$, as long as the number of points, $N$, in the original data set is large enough to get a good sample of the signal and any noise present.
As a reminder, the standard deviation, $\sigma$, of a sequence with a Gaussian distribution provides a specific estimate of the uncertainty in each of the values. An uncertainty of $\sigma$ about the mean value indicates that if we take many measurements, 68.3% of them will fall in the range between $\bar{y} - \sigma$ and $\bar{y} + \sigma$. An uncertainty of $2\sigma$ about the mean value indicates that 95.4% of the values will fall in the given range, and using $3\sigma$ will result in 99.7% of the values falling within the given range.
If we then take an average of N samples (which conveniently can be the same set of
samples we used in determining the standard deviation of each of the samples) we
would find
$$\bar{y} = \frac{1}{N} \sum_{j=1}^{N} y_j$$

$$\begin{aligned}
\sigma_{\bar{y}} &= \sqrt{ \sum_{j=1}^{N} \left( \frac{\partial \bar{y}}{\partial y_j} \right)^2 \sigma_y^2 }\\
&= \sqrt{ \sum_{j=1}^{N} \left( \frac{1}{N} \right)^2 \sigma_y^2 }\\
&= \sqrt{ \frac{\sigma_y^2}{N^2} \sum_{j=1}^{N} 1 }\\
&= \sqrt{ \frac{\sigma_y^2}{N} }\\
\sigma_{\bar{y}} &= \frac{\sigma_y}{\sqrt{N}} .
\end{aligned}$$
Averaging a slowly-varying signal can therefore significantly reduce the uncertainty in the resulting value.
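A simulation sketch of this $1/\sqrt{N}$ behavior, using made-up Gaussian noise:

```python
import math
import random

sigma_y = 1.0      # std. dev. of a single sample (made-up)
N = 100            # samples per average
trials = 10_000    # number of independent averages

# Each trial: average N noisy samples; then look at the spread of the averages.
averages = [sum(random.gauss(0.0, sigma_y) for _ in range(N)) / N
            for _ in range(trials)]

m = sum(averages) / trials
spread = math.sqrt(sum((a - m)**2 for a in averages) / (trials - 1))

print(f"std. dev. of the averages = {spread:.4f}")
print(f"sigma_y / sqrt(N)         = {sigma_y / math.sqrt(N):.4f}")
```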
You can combine other errors in the same way to arrive at a total error in your final value. Care in accounting for the possible errors in a measurement will significantly improve your understanding of the quality and usefulness of that measurement.