Preliminaries of Probability
1. Transformation of densities
Exercise 1. If $X$ has cdf $F_X(x)$ and $g$ is increasing and continuous, then $Y = g(X)$ has cdf
$$F_Y(y) = F_X\left(g^{-1}(y)\right)$$
for all $y$ in the image of $g$. If $g$ is decreasing and continuous, the formula is
$$F_Y(y) = 1 - F_X\left(g^{-1}(y)\right).$$
Exercise 2. If $X$ has continuous pdf $f_X(x)$ and $g$ is increasing and differentiable, then $Y = g(X)$ has pdf
$$f_Y(y) = \frac{f_X\left(g^{-1}(y)\right)}{g'\left(g^{-1}(y)\right)} = \left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}$$
for all $y$ in the image of $g$. If $g$ is decreasing and differentiable, the formula is
$$f_Y(y) = -\left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}.$$
Remark 1. Under proper assumptions, when $g$ is not injective the formula generalizes to
$$f_Y(y) = \sum_{x:\, y=g(x)} \frac{f_X(x)}{|g'(x)|}.$$
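As a quick numerical sanity check (our own example, not from the text), the following sketch verifies both formulas for $X$ standard normal: the increasing map $g(x) = e^x$, and the non-injective map $g(x) = x^2$, where the sum runs over the two preimages $\pm\sqrt{y}$. Each formula is compared with a numerical derivative of the cdf of $Y$.

```python
import math

# Sketch (names f_X, cdf_X, f_Y_exp, f_Y_square are our own, not the text's).

def f_X(x):            # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf_X(x):          # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Increasing g(x) = e^x: f_Y(y) = f_X(log y) / y, since g'(x) = e^x = y
# at x = g^{-1}(y) = log y.
def f_Y_exp(y):
    x = math.log(y)
    return f_X(x) / math.exp(x)

# Non-injective g(x) = x^2: sum over the two preimages x = +-sqrt(y),
# each weighted by 1/|g'(x)| = 1/(2 sqrt(y)).
def f_Y_square(y):
    r = math.sqrt(y)
    return (f_X(r) + f_X(-r)) / (2 * r)

h = 1e-6
for y in (0.5, 1.0, 2.0):
    # F_Y(y) = F_X(log y) for g(x) = e^x
    num = (cdf_X(math.log(y + h)) - cdf_X(math.log(y - h))) / (2 * h)
    assert abs(num - f_Y_exp(y)) < 1e-5
    # F_Y(y) = F_X(sqrt y) - F_X(-sqrt y) for g(x) = x^2
    num = (cdf_X(math.sqrt(y + h)) - cdf_X(-math.sqrt(y + h))
           - cdf_X(math.sqrt(y - h)) + cdf_X(-math.sqrt(y - h))) / (2 * h)
    assert abs(num - f_Y_square(y)) < 1e-5

print("transformation formulas check out")
```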
Remark 2. A second proof of the previous formula comes from the following characterization of the density: $f$ is the density of $X$ if and only if
$$E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\, dx$$
for all continuous bounded functions $h$. Let us use this fact to prove that $f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ is the density of $Y = g(X)$, by computing $E[h(Y)]$ for a generic continuous bounded function $h$.
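As a numerical illustration of this characterization (a sketch with our own choices of $g$ and $h$, not the text's), one can check that integrating $h(g(x)) f_X(x)$ over $x$ agrees with integrating $h(y) f_Y(y)$ over $y$:

```python
import math

# Our own setup: X standard normal, Y = e^X with f_Y(y) = f_X(log y)/y,
# and the bounded continuous test function h(y) = 1/(1+y).

def f_X(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_Y(y):
    return f_X(math.log(y)) / y

def h(y):
    return 1.0 / (1.0 + y)

def trapezoid(func, a, b, n):
    dx = (b - a) / n
    s = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        s += func(a + i * dx)
    return s * dx

# E[h(Y)] computed two ways: as E[h(g(X))] integrating against f_X,
# and directly integrating h against the transformed density f_Y.
lhs = trapezoid(lambda x: h(math.exp(x)) * f_X(x), -12.0, 12.0, 100000)
rhs = trapezoid(lambda y: h(y) * f_Y(y), 1e-9, 60.0, 100000)
assert abs(lhs - rhs) < 1e-4
print(round(lhs, 3), round(rhs, 3))
```

(By the symmetry $x \mapsto -x$, both integrals equal $1/2$ here, which makes the agreement easy to spot.)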
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. Since the covariance matrix $Q$ is also non-negative definite, we have
$$\lambda_i \ge 0, \qquad i = 1, \dots, n.$$
Remark 4. To understand this theorem better, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle \cdot, \cdot \rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \dots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \dots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j \rangle$, $j = 1, \dots, n$. A linear map $L$ in $\mathbb{R}^n$, given the basis $u_1, \dots, u_n$, can be represented by a matrix with components $\langle L u_i, u_j \rangle$. We shall write $y^T x$ for $\langle x, y \rangle$ (or $\langle y, x \rangle$).
Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \dots, u_n$, the matrix $Q$ defines a linear transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$. The spectral theorem states that there is a new orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that, if $Q_e$ represents the linear transformation $L$ in this new basis, then $Q_e$ is diagonal.
Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \dots, u_n$, which we call the canonical or original basis. Let $e_1, \dots, e_n$ be another orthonormal basis. The vector $u_1$, in
and so on; namely, $U$ represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \dots, e_n$. This is an orthogonal transformation:
$$U^{-1} = U^T.$$
Indeed, $U^{-1}$ maps $e_1, \dots, e_n$ into the canonical basis (by the above property of $U$), and $U^T$ does the same:
$$U^T e_1 = \begin{pmatrix} e_1^T e_1 \\ e_2^T e_1 \\ \vdots \\ e_n^T e_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on.
Remark 7. Let us now go back to the covariance matrix $Q$ and the matrix $Q_e$ given by the spectral theorem: $Q_e$ is a diagonal matrix which represents the same linear transformation $L$ in a new basis $e_1, \dots, e_n$. Assume we do not know anything else, except that they describe the same map $L$ and that $Q_e$ is diagonal, namely of the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Then
$$Q_e = U^T Q U.$$
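These relations can be checked concretely. A minimal sketch with our own $2 \times 2$ example (the matrix $Q$ and its spectral data are chosen by hand, not taken from the text): $Q = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ has eigenvalues $3$ and $1$ with orthonormal eigenvectors $e_1 = (1,1)/\sqrt{2}$, $e_2 = (1,-1)/\sqrt{2}$.

```python
import math

# Our own 2x2 illustration of U^{-1} = U^T and Q_e = U^T Q U.
Q = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
U = [[s, s],      # columns of U are e1, e2 written in the original basis
     [s, -s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

# U is orthogonal: U^T U = I, i.e. U^{-1} = U^T.
I = matmul(transpose(U), U)
assert abs(I[0][0] - 1) < 1e-12 and abs(I[0][1]) < 1e-12

# Q_e = U^T Q U is diagonal, with the eigenvalues on the diagonal.
Qe = matmul(transpose(U), matmul(Q, U))
assert abs(Qe[0][0] - 3) < 1e-12   # lambda_1 = 3
assert abs(Qe[1][1] - 1) < 1e-12   # lambda_2 = 1
assert abs(Qe[0][1]) < 1e-12 and abs(Qe[1][0]) < 1e-12
print("Qe = diag(3, 1)")
```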
3. Gaussian vectors
Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{|x - \mu|^2}{2\sigma^2}\right).$$
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If $Z$ is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$.
We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.
Lemma 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$ (namely $v^T Q v > 0$ for all $v \ne 0$), consider the function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x - \mu)^T Q^{-1} (x - \mu)}{2}\right)$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x - \mu)^T Q^{-1} (x - \mu)$ is a non-negative quantity, and $\det(Q)$ is a positive number. Then:
i) $f(x)$ is a probability density;
ii) if $X = (X_1, \dots, X_n)$ is a random vector with such joint probability density, then $\mu$ is the vector of mean values, namely
$$\mu_i = E[X_i],$$
and $Q$ is the covariance matrix:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j).$$
Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We have recalled above that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ where $Q$ takes the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let $U$ be the matrix introduced there, such that $U^{-1} = U^T$. Recall the relation $Q_e = U^T Q U$.
Since $v^T Q v > 0$ for all $v \ne 0$, we deduce
$$v^T Q_e v = (Uv)^T Q \,(Uv) > 0$$
for all $v \ne 0$ (since $Uv \ne 0$). Taking $v = e_i$, we get $\lambda_i > 0$.
Therefore the matrix $Q_e$ is invertible, with inverse given by
$$Q_e^{-1} = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n^{-1} \end{pmatrix}.$$
Since
$$(y - \mu_e)^T Q_e^{-1} (y - \mu_e) = \sum_{i=1}^n \frac{(y_i - (\mu_e)_i)^2}{\lambda_i}$$
and $\det(Q_e) = \lambda_1 \cdots \lambda_n$, the function $f_e$ (defined as $f$, but with $\mu_e$ and $Q_e$ in place of $\mu$ and $Q$) factorizes as
$$f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i}\right).$$
Namely, $f_e(y)$ is the product of $n$ Gaussian densities $N((\mu_e)_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f_e(y)$ is a probability density, and therefore $\int_{\mathbb{R}^n} f_e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that $f$ is a probability density.
Step 3. Let $X = (X_1, \dots, X_n)$ be a random vector with joint probability density $f$, when written in the original basis. Let $Y = U^T X$. Then (Exercise 3) $Y$ has density $f_Y(y)$ given by $f_Y(y) = f(Uy)$. Thus
$$f_Y(y) = f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i}\right).$$
Thus $(Y_1, \dots, Y_n)$ are independent $N((\mu_e)_i, \lambda_i)$ r.v.'s and therefore
$$E[Y_i] = (\mu_e)_i, \qquad \mathrm{Cov}(Y_i, Y_j) = \delta_{ij} \lambda_i.$$
Hence $X = UY$ has mean $\mu_X = U \mu_Y$ and covariance
$$Q_X = U Q_Y U^T.$$
Since $\mu_Y = \mu_e$ and $\mu_e = U^T \mu$, we readily deduce $\mu_X = U U^T \mu = \mu$. Since $Q_Y = Q_e$ and $Q = U Q_e U^T$, we get $Q_X = Q$. The proof is complete.
Definition 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$, we call Gaussian vector of mean $\mu$ and covariance $Q$ a random vector $X = (X_1, \dots, X_n)$ having joint probability density function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x - \mu)^T Q^{-1} (x - \mu)}{2}\right)$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.
The only drawback of this definition is the restriction to strictly positive definite matrices $Q$. It is sometimes useful to have the notion of Gaussian vector also in the case when $Q$ is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v.'s are Gaussian.
Definition 2. i) The standard $d$-dimensional Gaussian vector is the random vector $Z = (Z_1, \dots, Z_d)$ with joint probability density
$$f(z_1, \dots, z_d) = \prod_{i=1}^d p(z_i), \qquad p(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}.$$
ii) All other Gaussian vectors $X = (X_1, \dots, X_n)$ (in any dimension $n$) are obtained from standard ones by affine transformations:
$$X = AZ + b.$$
[Figure: graph of the standard two-dimensional Gaussian density $z = f(x, y)$.]
The graph of the other Gaussian vectors can be guessed by linear deformations of the base plane $xy$ (deformations defined by $A$) and a shift (by $b$). For instance, if
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$$
[Figure: graph of the density of $X = AZ$ for this $A$, stretched by a factor 2 along the $x$ direction.]
First, let us compute the mean and covariance matrix of a vector of the form X = AZ + b, with
Z of standard type. From exercises 4 and 5 we readily have:
Proposition 3. The mean $\mu$ and covariance matrix $Q$ of a vector $X$ of the previous form are given by
$$\mu = b, \qquad Q = AA^T.$$
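A sketch of this proposition in code, with our own choice of $A$ and $b$ (the Monte Carlo part is only an approximate check of the exact identities $\mu = b$, $Q = AA^T$):

```python
import random

# Our own numbers: A = [[2,0],[0,1]] (the matrix from the figure above),
# b = (1, -1). For X = AZ + b with Z standard, mean = b and cov = A A^T.
A = [[2.0, 0.0], [0.0, 1.0]]
b = [1.0, -1.0]

# A A^T computed directly:
Q = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
     for i in range(2)]
assert Q == [[4.0, 0.0], [0.0, 1.0]]

random.seed(0)
n = 100000
samples = []
for _ in range(n):
    z = [random.gauss(0, 1), random.gauss(0, 1)]
    samples.append([A[i][0] * z[0] + A[i][1] * z[1] + b[i] for i in range(2)])

mean = [sum(x[i] for x in samples) / n for i in range(2)]
cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / n
        for j in range(2)] for i in range(2)]
assert abs(mean[0] - 1) < 0.05 and abs(mean[1] + 1) < 0.05
assert abs(cov[0][0] - 4) < 0.15 and abs(cov[1][1] - 1) < 0.05
assert abs(cov[0][1]) < 0.05
print("empirical mean and covariance match b and AA^T")
```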
When two different definitions are given for the same object, one has to prove their equivalence. If $Q$ is positive definite, the two definitions aim to describe the same object; but for $Q$ non-negative definite and not strictly positive definite, we have only the last definition, so we do not have to check any compatibility.
Proposition 4. If $Q$ is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance $Q$ in the sense of Definition 1, then there exist a standard Gaussian random vector $Z = (Z_1, \dots, Z_n)$ and an $n \times n$ matrix $A$ such that
$$X = AZ + \mu.$$
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then $X$ is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance $Q$ given by the previous proposition.
Proof. Let us prove the first claim. Let us define
$$\sqrt{Q} = U \sqrt{Q_e}\, U^T$$
where $\sqrt{Q_e}$ is simply defined as
$$\sqrt{Q_e} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{pmatrix}.$$
We have
$$\left(\sqrt{Q}\right)^T = U \left(\sqrt{Q_e}\right)^T U^T = U \sqrt{Q_e}\, U^T = \sqrt{Q}$$
and
$$\left(\sqrt{Q}\right)^2 = U \sqrt{Q_e}\, U^T U \sqrt{Q_e}\, U^T = U \sqrt{Q_e} \sqrt{Q_e}\, U^T = U Q_e U^T = Q$$
because $\sqrt{Q_e} \sqrt{Q_e} = Q_e$. Set
$$Z = \sqrt{Q}^{-1} (X - \mu),$$
where we notice that $\sqrt{Q}$ is invertible, from its definition and the strict positivity of the $\lambda_i$. Then $Z$ is Gaussian. Indeed, from the formula for the transformation of densities,
$$f_Z(z) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{z=g(x)}$$
where $g(x) = \sqrt{Q}^{-1}(x - \mu)$; hence $\det Dg(x) = \det \sqrt{Q}^{-1} = \frac{1}{\sqrt{\lambda_1} \cdots \sqrt{\lambda_n}}$; therefore
$$f_Z(z) = \left(\prod_{i=1}^n \sqrt{\lambda_i}\right) \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{\left(\sqrt{Q} z + \mu - \mu\right)^T Q^{-1} \left(\sqrt{Q} z + \mu - \mu\right)}{2}\right)$$
$$= \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{\left(\sqrt{Q} z\right)^T Q^{-1} \sqrt{Q} z}{2}\right) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{z^T z}{2}\right),$$
p
which is the density of a standard Gaussian vector. From the de…nition of Z we get X = QZ + ,
so the …rst claim is proved.
The proof of the second claim is a particular case of the next exercise, which we leave to the reader.
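The construction of $\sqrt{Q}$ used in the proof can be illustrated on a small example (our own $2 \times 2$ matrix; the spectral data are hard-coded by hand, not computed):

```python
import math

# Our own example: Q = [[2,1],[1,2]] has eigenvalues 3 and 1 with
# eigenvectors e1 = (1,1)/sqrt(2), e2 = (1,-1)/sqrt(2);
# sqrt(Q) = U sqrt(Q_e) U^T.
s = 1 / math.sqrt(2)
U = [[s, s], [s, -s]]
sqrtQe = [[math.sqrt(3), 0.0], [0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

sqrtQ = matmul(U, matmul(sqrtQe, transpose(U)))

# sqrt(Q) is symmetric and (sqrt(Q))^2 = Q, as claimed in the proof.
assert abs(sqrtQ[0][1] - sqrtQ[1][0]) < 1e-12
Q2 = matmul(sqrtQ, sqrtQ)
assert abs(Q2[0][0] - 2) < 1e-12 and abs(Q2[0][1] - 1) < 1e-12
print("sqrt(Q)^2 = Q")
```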
Exercise 6. Let $X = (X_1, \dots, X_n)$ be a Gaussian random vector, $B$ an $m \times n$ matrix, $c$ a vector of $\mathbb{R}^m$. Then
$$Y = BX + c$$
is a Gaussian random vector of dimension $m$. The relations between the means and covariances are
$$\mu_Y = B \mu_X + c, \qquad Q_Y = B Q_X B^T.$$
Remark 8. We see from the exercise that we may start with a non-degenerate vector X and get
a degenerate one Y , if B is not a bijection. This always happens when m > n.
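A sketch of this degeneracy with our own numbers: take $n = 2$, $m = 3$, $Q_X = I$, and any $3 \times 2$ matrix $B$. Then $Q_Y = B Q_X B^T = B B^T$ has rank at most $2$, hence it is singular and the density of Definition 1 does not exist.

```python
# Our own 3x2 example of a degenerate image covariance.
B = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]   # third row = sum of the first two

# Q_Y = B B^T (Q_X = I), entry (i,j) = <row_i, row_j>.
QY = [[sum(B[i][k] * B[j][k] for k in range(2)) for j in range(3)]
      for i in range(3)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Singular: the third row of QY is the sum of the first two.
assert abs(det3(QY)) < 1e-12
print("QY =", QY)
```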
Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance
matrix. This fundamental fact will be used below when we study stochastic processes.
Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have a prescribed mean $\mu$ and covariance $Q$, $n$-dimensional, and want to generate a random sample $(x_1, \dots, x_n)$ from such an $N(\mu, Q)$. Then we may generate $n$ independent samples $z_1, \dots, z_n$ from the standard one-dimensional Gaussian law and compute
$$\sqrt{Q}\, z + \mu$$
where $z = (z_1, \dots, z_n)$. In order to obtain the entries of the matrix $\sqrt{Q}$, if the software does not provide them directly (some software does), we may use the formula $\sqrt{Q} = U \sqrt{Q_e}\, U^T$. The matrix $\sqrt{Q_e}$ is obvious. In order to get the matrix $U$, recall that its columns are the vectors $e_1, \dots, e_n$ written in the original basis, and such vectors are an orthonormal basis of eigenvectors of $Q$. Thus one needs at least software that computes the spectral decomposition of a matrix, to get $e_1, \dots, e_n$.
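The whole recipe can be sketched in code. This is a toy 2-dimensional example with the spectral decomposition hand-computed (all names and numbers are our own; the sample-statistics checks are approximate by nature):

```python
import math
import random

# Our running example: Q = [[2,1],[1,2]], mu = (1, 2). Hard-coded spectral
# data: lambda = (3, 1), e1 = (1,1)/sqrt(2), e2 = (1,-1)/sqrt(2).
mu = [1.0, 2.0]
s = 1 / math.sqrt(2)
U = [[s, s], [s, -s]]
lam = [3.0, 1.0]

# sqrt(Q)[i][j] = sum_k sqrt(lambda_k) U[i][k] U[j][k], i.e. U sqrt(Q_e) U^T.
sqrtQ = [[sum(math.sqrt(lam[k]) * U[i][k] * U[j][k] for k in range(2))
          for j in range(2)] for i in range(2)]

random.seed(1)
n = 100000
xs = []
for _ in range(n):
    # standard one-dimensional Gaussian samples, then x = sqrt(Q) z + mu
    z = [random.gauss(0, 1), random.gauss(0, 1)]
    xs.append([sqrtQ[i][0] * z[0] + sqrtQ[i][1] * z[1] + mu[i]
               for i in range(2)])

mean = [sum(x[i] for x in xs) / n for i in range(2)]
cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs) / n
        for j in range(2)] for i in range(2)]
assert abs(mean[0] - 1) < 0.03 and abs(mean[1] - 2) < 0.03
assert abs(cov[0][0] - 2) < 0.08 and abs(cov[1][1] - 2) < 0.08
assert abs(cov[0][1] - 1) < 0.08
print("samples have mean ~ mu and covariance ~ Q")
```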