
CHAPTER 1

Preliminaries of Probability

1. Transformation of densities
Exercise 1. If X has cdf F_X(x) and g is increasing and continuous, then Y = g(X) has cdf

F_Y(y) = F_X(g^{-1}(y))

for all y in the image of g. If g is decreasing and continuous, the formula is

F_Y(y) = 1 - F_X(g^{-1}(y)).
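As a quick numerical illustration (ours, not part of the original exercise), one can check the increasing case with X standard normal and g(x) = e^x, so that Y = e^X has the lognormal cdf; the particular choice of g and of the sample size below is ours.

```python
# Sketch: check F_Y(y) = F_X(g^{-1}(y)) for X ~ N(0,1) and g(x) = exp(x).
import numpy as np
from scipy import stats

y = np.linspace(0.1, 5.0, 50)

# Empirical cdf of Y = g(X) from simulated samples.
rng = np.random.default_rng(0)
y_samples = np.exp(rng.standard_normal(100_000))
lhs = np.array([(y_samples <= t).mean() for t in y])

# F_X(g^{-1}(y)) = Phi(log y).
rhs = stats.norm.cdf(np.log(y))

print(np.max(np.abs(lhs - rhs)))   # small (Monte Carlo error only)
```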
Exercise 2. If X has continuous pdf f_X(x) and g is increasing and differentiable, then Y = g(X) has pdf

f_Y(y) = f_X(g^{-1}(y)) / g'(g^{-1}(y)) = [ f_X(x) / g'(x) ]_{y=g(x)}

for all y in the image of g. If g is decreasing and differentiable, the formula is

f_Y(y) = - [ f_X(x) / g'(x) ]_{y=g(x)}

(note that g'(x) < 0 in this case, so the right-hand side is positive).

Thus, in general, we have the following result.


Proposition 1. If g is monotone and differentiable, the transformation of densities is given by

f_Y(y) = [ f_X(x) / |g'(x)| ]_{y=g(x)}.
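A minimal numerical sketch of this formula (our own illustration, with g(x) = e^x and X standard normal): the right-hand side reproduces the lognormal density available in scipy.

```python
# Sketch: f_Y(y) = f_X(x)/|g'(x)| at x = g^{-1}(y), for g(x) = exp(x), X ~ N(0,1).
import numpy as np
from scipy import stats

y = np.linspace(0.1, 5.0, 50)
x = np.log(y)                                   # x = g^{-1}(y)
f_Y = stats.norm.pdf(x) / np.abs(np.exp(x))     # f_X(x)/|g'(x)|, with g'(x) = exp(x)

# scipy's lognorm with shape s=1 and scale=1 is the density of exp(N(0,1)).
print(np.max(np.abs(f_Y - stats.lognorm.pdf(y, s=1.0))))   # ~0 up to rounding
```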

Remark 1. Under proper assumptions, when g is not injective the formula generalizes to

f_Y(y) = Σ_{x : y = g(x)} f_X(x) / |g'(x)|.
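For the non-injective case, a brief sketch (again our own choice of example): with X standard normal and g(x) = x^2, every y > 0 has the two preimages ±√y, and the sum of the two contributions reproduces the chi-square density with one degree of freedom.

```python
# Sketch: f_Y(y) = sum over x with g(x) = y of f_X(x)/|g'(x)|, for g(x) = x^2, X ~ N(0,1).
import numpy as np
from scipy import stats

y = np.linspace(0.05, 6.0, 60)
preimages = [np.sqrt(y), -np.sqrt(y)]                 # the two solutions of g(x) = y
f_Y = sum(stats.norm.pdf(x) / np.abs(2 * x) for x in preimages)

print(np.max(np.abs(f_Y - stats.chi2.pdf(y, df=1))))  # ~0 up to rounding
```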

Remark 2. A second proof of the previous formula comes from the following characterization of the density: f is the density of X if and only if

E[h(X)] = ∫_R h(x) f(x) dx

for all continuous bounded functions h. Let us use this fact to prove that f_Y(y) = [ f_X(x) / |g'(x)| ]_{y=g(x)} is the density of Y = g(X). Let us compute E[h(Y)] for a generic continuous bounded function h. We have, from the definition of Y and from the characterization applied to X,


E[h(Y)] = E[h(g(X))] = ∫_R h(g(x)) f(x) dx.
Let us change variable y = g(x), under the assumption that g is monotone, bijective and differentiable. We have x = g^{-1}(y) and dx = 1/|g'(g^{-1}(y))| dy (we put the absolute value since we do not change the extremes of integration, but just rewrite ∫_R), so that
∫_R h(g(x)) f(x) dx = ∫_R h(y) f(g^{-1}(y)) · 1/|g'(g^{-1}(y))| dy.
If we set f_Y(y) := [ f_X(x) / |g'(x)| ]_{y=g(x)}, we have proved that
E[h(Y)] = ∫_R h(y) f_Y(y) dy
for every continuous bounded function h. By the characterization, this implies that f_Y(y) is the density of Y. This proof is thus based on the change of variable formula.
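The characterization itself can also be probed numerically (a sketch under our own choices of g and h, not part of the notes): a Monte Carlo estimate of E[h(g(X))] should match a quadrature of ∫ h(y) f_Y(y) dy.

```python
# Sketch: E[h(Y)] computed two ways for Y = exp(X), X ~ N(0,1), h continuous and bounded.
import numpy as np
from scipy import stats

h = np.tanh                                    # a continuous bounded test function

# Monte Carlo estimate of E[h(g(X))].
rng = np.random.default_rng(1)
mc = h(np.exp(rng.standard_normal(200_000))).mean()

# Riemann-sum approximation of the integral of h(y) f_Y(y) dy, with f_Y lognormal.
y = np.linspace(1e-6, 60.0, 400_000)
integral = np.sum(h(y) * stats.lognorm.pdf(y, s=1.0)) * (y[1] - y[0])

print(mc, integral)   # the two numbers agree up to Monte Carlo/truncation error
```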
Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of dy = g'(x) dx one has to use dy = |det Dg(x)| dx, where Dg is the Jacobian (the matrix of first derivatives) of the transformation g : R^n → R^n. In fact we need the inverse transformation, so we use the corresponding formula

dx = |det Dg^{-1}(y)| dy = 1/|det Dg(g^{-1}(y))| dy.

With the same passages performed above, one gets the following result.
Proposition 2. If g is a differentiable bijection and Y = g(X), then

f_Y(y) = [ f_X(x) / |det Dg(x)| ]_{y=g(x)}.

Exercise 3. If X (in R^n) has density f_X(x) and Y = UX, where U is an orthogonal linear transformation of R^n (it means that U^{-1} = U^T), then Y has density

f_Y(y) = f_X(U^T y).
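A short sketch of exercise 3 (our own construction): take X Gaussian with independent components of different variances, build an orthogonal U from a plane rotation, and compare the density of Y = UX with f_X(U^T y) using scipy's multivariate normal (the fact that Y has covariance U Q_X U^T anticipates exercise 5 below).

```python
# Sketch: for Y = U X with U orthogonal, f_Y(y) = f_X(U^T y).
import numpy as np
from scipy import stats

theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # a rotation, hence orthogonal

Q_X = np.diag([1.0, 4.0])                            # X ~ N(0, Q_X)
f_X = stats.multivariate_normal(mean=[0.0, 0.0], cov=Q_X).pdf

# Y = U X is Gaussian with covariance U Q_X U^T.
f_Y = stats.multivariate_normal(mean=[0.0, 0.0], cov=U @ Q_X @ U.T).pdf

y = np.array([0.3, -1.2])
print(f_Y(y), f_X(U.T @ y))                          # the two values coincide
```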
1.1. Linear transformation of moments. The solutions of the following exercises are based on the linearity of the expected value (and thus of the covariance in each argument).
Exercise 4. Let X = (X_1, ..., X_n) be a random vector, A a d × n matrix, and Y = AX. Let μ^X = (μ^X_1, ..., μ^X_n) be the vector of mean values of X, namely μ^X_i = E[X_i]. Then

μ^Y := A μ^X

is the vector of mean values of Y, namely μ^Y_i = E[Y_i].
Exercise 5. Under the same assumptions, if Q_X and Q_Y are the covariance matrices of X and Y, then

Q_Y = A Q_X A^T.
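A minimal sketch of exercises 4 and 5 (ours, with an arbitrary matrix A): empirical means and empirical covariances transform in exactly the same way under a linear map, so the identities can be checked exactly on any data set.

```python
# Sketch: for Y = A X, mu^Y = A mu^X and Q_Y = A Q_X A^T,
# checked on sample moments (which transform exactly the same way).
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 1000)) + np.array([[1.0], [2.0], [3.0]])   # 3 x N data matrix
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                                   # a 2 x 3 matrix
Y = A @ X

mu_X, mu_Y = X.mean(axis=1), Y.mean(axis=1)
Q_X, Q_Y = np.cov(X), np.cov(Y)

print(np.allclose(mu_Y, A @ mu_X))        # True
print(np.allclose(Q_Y, A @ Q_X @ A.T))    # True
```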

2. About covariance matrices


The covariance matrix Q of a vector X = (X_1, ..., X_n), defined as Q_ij = Cov(X_i, X_j), is symmetric:

Q_ij = Cov(X_i, X_j) = Cov(X_j, X_i) = Q_ji

and non-negative definite:

x^T Q x = Σ_{i,j=1}^n Q_ij x_i x_j = Σ_{i,j=1}^n Cov(X_i, X_j) x_i x_j = Σ_{i,j=1}^n Cov(x_i X_i, x_j X_j)
        = Cov( Σ_{i=1}^n x_i X_i , Σ_{j=1}^n x_j X_j ) = Var[W] ≥ 0

where W = Σ_{i=1}^n x_i X_i.
The spectral theorem states that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis e_1, ..., e_n of R^n in which Q takes the form

Q_e = diag(λ_1, ..., λ_n).

Moreover, the numbers λ_i are eigenvalues of Q, and the vectors e_i are corresponding eigenvectors. Since the covariance matrix Q is also non-negative definite, we have

λ_i ≥ 0,   i = 1, ..., n.
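As a concrete sketch (ours, not in the notes), numpy's symmetric eigensolver computes exactly this decomposition: numpy.linalg.eigh returns the eigenvalues λ_i and an orthogonal matrix U whose columns are the eigenvectors e_i.

```python
# Sketch: spectral decomposition Q = U Q_e U^T of a covariance matrix,
# with Q_e = diag(lambda_1, ..., lambda_n) and lambda_i >= 0.
import numpy as np

Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])           # symmetric, positive definite

lam, U = np.linalg.eigh(Q)                # eigenvalues (ascending) and eigenvectors
Q_e = np.diag(lam)

print(np.all(lam >= 0))                   # eigenvalues are non-negative
print(np.allclose(U @ U.T, np.eye(3)))    # U is orthogonal: U^{-1} = U^T
print(np.allclose(U @ Q_e @ U.T, Q))      # Q = U Q_e U^T
```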

Remark 4. To better understand this theorem, recall a few facts of linear algebra. R^n is a vector space with a scalar product ⟨·,·⟩, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers with respect to a given basis. A vector x ∈ R^n is an intrinsic object; but we can write it as a sequence of numbers (x_1, ..., x_n) in infinitely many ways, depending on the basis we choose. Given an orthonormal basis u_1, ..., u_n, the components of a vector x ∈ R^n in this basis are the numbers ⟨x, u_j⟩, j = 1, ..., n. A linear map L in R^n, given the basis u_1, ..., u_n, can be represented by the matrix of components ⟨L u_i, u_j⟩. We shall write y^T x for ⟨x, y⟩ (or ⟨y, x⟩).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of R^n, which we shall denote by u_1, ..., u_n, the matrix Q defines a linear transformation L from R^n to R^n. The spectral theorem states that there is a new orthonormal basis e_1, ..., e_n of R^n such that, if Q_e represents the linear transformation L in this new basis, then Q_e is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis u_1, ..., u_n, that we call canonical or original basis. Let e_1, ..., e_n be another orthonormal basis. The vector u_1, in the canonical basis, has components

u_1 = (1, 0, ..., 0)^T

and so on for the other vectors. Each vector e_j has certain components. Denote by U the matrix whose first column has the same components as e_1 (written in the canonical basis), and so on for the other columns. We could write U = (e_1, ..., e_n). Also, U_ij = e_j^T u_i. Then
U (1, 0, ..., 0)^T = e_1

and so on, namely U represents the linear map which maps the canonical (original) basis of R^n into e_1, ..., e_n. This is an orthogonal transformation:

U^{-1} = U^T.

Indeed, U^{-1} maps e_1, ..., e_n into the canonical basis (by the above property of U), and U^T does the same:
U^T e_1 = (e_1^T e_1, e_2^T e_1, ..., e_n^T e_1)^T = (1, 0, ..., 0)^T

and so on.

Remark 7. Let us now go back to the covariance matrix Q and the matrix Q_e given by the spectral theorem: Q_e is a diagonal matrix which represents the same linear transformation L in a new basis e_1, ..., e_n. Assume we do not know anything else, except that they describe the same map L and that Q_e is diagonal, namely of the form

Q_e = diag(λ_1, ..., λ_n).

Let us deduce a number of facts:


i) from basic linear algebra we know the relation

Q_e = U^T Q U;

ii) the diagonal elements λ_j are eigenvalues of L, with eigenvectors e_j;

iii) λ_j ≥ 0, j = 1, ..., n.
To prove (ii), let us write the vector L e_1 in the basis e_1, ..., e_n: in this basis e_1 is the vector (1, 0, ..., 0)^T, the map L is represented by Q_e, hence L e_1 is equal to
Q_e (1, 0, ..., 0)^T = (λ_1, 0, ..., 0)^T = λ_1 (1, 0, ..., 0)^T
which is λ_1 e_1 in the basis e_1, ..., e_n. We have checked that L e_1 = λ_1 e_1, namely that λ_1 is an eigenvalue and e_1 is a corresponding eigenvector. The proof for λ_2, etc. is the same. To prove (iii), just see that, in the basis e_1, ..., e_n,

e_j^T Q_e e_j = λ_j.

But

e_j^T Q_e e_j = e_j^T U^T Q U e_j = v^T Q v ≥ 0

where v = U e_j, having used the property that Q is non-negative definite. Hence λ_j ≥ 0.

3. Gaussian vectors
Recall that a Gaussian, or Normal, r.v. N(μ, σ^2) is a r.v. with probability density

f(x) = 1/√(2πσ^2) · exp( -|x - μ|^2 / (2σ^2) ).

We have shown that μ is the mean value and σ^2 the variance. The standard Normal is the case μ = 0, σ^2 = 1. If Z is a standard normal r.v., then μ + σZ is N(μ, σ^2).
We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that μ + σZ is N(μ, σ^2). Let us start with a lemma.
Lemma 1. Given a vector μ = (μ_1, ..., μ_n) and a symmetric positive definite n × n matrix Q (namely v^T Q v > 0 for all v ≠ 0), consider the function

f(x) = 1/√((2π)^n det(Q)) · exp( -(x - μ)^T Q^{-1} (x - μ) / 2 )

where x = (x_1, ..., x_n) ∈ R^n. Notice that the inverse Q^{-1} is well defined for positive definite matrices, (x - μ)^T Q^{-1} (x - μ) is a non-negative quantity (positive for x ≠ μ), and det(Q) is a positive number. Then:
i) f(x) is a probability density;
ii) if X = (X_1, ..., X_n) is a random vector with such joint probability density, then μ is the vector of mean values, namely

μ_i = E[X_i],

and Q is the covariance matrix:

Q_ij = Cov(X_i, X_j).

Proof. Step 1. In this step we explain the meaning of the expression f(x). We have recalled above that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis e_1, ..., e_n of R^n in which Q takes the form

Q_e = diag(λ_1, ..., λ_n).

Moreover, the numbers λ_i are eigenvalues of Q, and the vectors e_i are corresponding eigenvectors. See above for more details. Let U be the matrix introduced there, such that U^{-1} = U^T. Recall the relation Q_e = U^T Q U.
Since v^T Q v > 0 for all v ≠ 0, we deduce

v^T Q_e v = (U v)^T Q (U v) > 0

for all v ≠ 0 (since U v ≠ 0). Taking v = e_i, we get λ_i > 0.
Therefore the matrix Q_e is invertible, with inverse given by

Q_e^{-1} = diag(λ_1^{-1}, ..., λ_n^{-1}).

It follows that Q too, being equal to U Q_e U^T (the relation Q = U Q_e U^T comes from Q_e = U^T Q U), is invertible, with inverse Q^{-1} = U Q_e^{-1} U^T. Easily one gets (x - μ)^T Q^{-1} (x - μ) > 0 for x ≠ μ. Moreover,

det(Q) = det(U) det(Q_e) det(U^T) = λ_1 ··· λ_n

because

det(Q_e) = λ_1 ··· λ_n

and det(U) det(U^T) = 1. The latter property comes from

1 = det(I) = det(U^T U) = det(U^T) det(U) = det(U)^2,

so that |det(U)| = 1 (a fact to be used also in exercise 3). Therefore det(Q) > 0. The formula for f(x) is meaningful and defines a positive function.
Step 2. Let us prove that f(x) is a density. By the theorem of change of variables in multidimensional integrals, with the change of variables x = U y,

∫_{R^n} f(x) dx = ∫_{R^n} f(U y) dy

because |det U| = 1 (and the Jacobian of a linear transformation is the linear map itself). Now, since U^T Q^{-1} U = Q_e^{-1}, f(U y) is equal to the following function:
f_e(y) = 1/√((2π)^n det(Q_e)) · exp( -(y - μ_e)^T Q_e^{-1} (y - μ_e) / 2 )

where

μ_e = U^T μ.

Since

(y - μ_e)^T Q_e^{-1} (y - μ_e) = Σ_{i=1}^n (y_i - (μ_e)_i)^2 / λ_i

and det(Q_e) = λ_1 ··· λ_n, we get

f_e(y) = ∏_{i=1}^n 1/√(2πλ_i) · exp( -(y_i - (μ_e)_i)^2 / (2λ_i) ).

Namely, f_e(y) is the product of n Gaussian densities N((μ_e)_i, λ_i). We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence f_e(y) is a probability density. Therefore ∫_{R^n} f_e(y) dy = 1. This proves ∫_{R^n} f(x) dx = 1, so that f is a probability density.
Step 3. Let X = (X_1, ..., X_n) be a random vector with joint probability density f, when written in the original basis. Let Y = U^T X. Then (exercise 3) Y has density f_Y(y) given by f_Y(y) = f(U y). Thus

f_Y(y) = f_e(y) = ∏_{i=1}^n 1/√(2πλ_i) · exp( -(y_i - (μ_e)_i)^2 / (2λ_i) ).

Thus (Y_1, ..., Y_n) are independent N((μ_e)_i, λ_i) r.v. and therefore

E[Y_i] = (μ_e)_i,    Cov(Y_i, Y_j) = δ_ij λ_i.

From exercises 4 and 5 we deduce that X = U Y has mean

μ^X = U μ^Y

and covariance

Q_X = U Q_Y U^T.

Since μ^Y = μ_e and μ_e = U^T μ, we readily deduce μ^X = U U^T μ = μ. Since Q_Y = Q_e and Q = U Q_e U^T, we get Q_X = Q. The proof is complete.
Definition 1. Given a vector μ = (μ_1, ..., μ_n) and a symmetric positive definite n × n matrix Q, we call Gaussian vector of mean μ and covariance Q a random vector X = (X_1, ..., X_n) having joint probability density function

f(x) = 1/√((2π)^n det(Q)) · exp( -(x - μ)^T Q^{-1} (x - μ) / 2 )

where x = (x_1, ..., x_n) ∈ R^n. We write X ~ N(μ, Q).
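As a sanity check of this formula (our own sketch), one can evaluate f(x) directly and compare it with scipy.stats.multivariate_normal, which implements the same density.

```python
# Sketch: evaluate the N(mu, Q) density from the formula and compare with scipy.
import numpy as np
from scipy import stats

mu = np.array([1.0, -2.0])
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])                # symmetric positive definite

def gaussian_density(x, mu, Q):
    # f(x) = (2 pi)^{-n/2} det(Q)^{-1/2} exp(-(x - mu)^T Q^{-1} (x - mu) / 2)
    n = len(mu)
    d = x - mu
    quad_form = d @ np.linalg.solve(Q, d)
    return np.exp(-quad_form / 2) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Q))

x = np.array([0.5, -1.0])
print(gaussian_density(x, mu, Q))
print(stats.multivariate_normal(mean=mu, cov=Q).pdf(x))   # same value
```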
The only drawback of this definition is the restriction to strictly positive definite matrices Q. It is sometimes useful to have the notion of Gaussian vector also in the case when Q is only non-negative definite (sometimes called degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v. are Gaussian.

Definition 2. i) The standard d-dimensional Gaussian vector is the random vector Z = (Z_1, ..., Z_d) with joint probability density f(z_1, ..., z_d) = ∏_{i=1}^d p(z_i), where p(z) = 1/√(2π) · e^{-z^2/2}.
ii) All other Gaussian vectors X = (X_1, ..., X_n) (in any dimension n) are obtained from standard ones by affine transformations:

X = AZ + b

where A is a matrix and b is a vector. If X has dimension n, we require A to be n × d and b to have dimension n (but n can be different from d).

The graph of a standard 2-dimensional Gaussian vector is shown below.

[Figure: surface plot of the standard 2-dimensional Gaussian density, plotted as z = f(x, y) over the (x, y) plane.]

The graph of the other Gaussian vectors can be guessed by linear deformations of the base plane xy (deformations defined by A) and shifts (by b). For instance, if

A = ( 2  0
      0  1 ),

a matrix which stretches the x axis by a factor 2, we get the graph below.

[Figure: surface plot of the Gaussian density with covariance A A^T, stretched by a factor 2 along the x axis.]

First, let us compute the mean and covariance matrix of a vector of the form X = AZ + b, with
Z of standard type. From exercises 4 and 5 we readily have:

Proposition 3. The mean μ and covariance matrix Q of a vector X of the previous form are given by

μ = b,
Q = A A^T.
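A brief simulation sketch of this proposition (with our own choice of A and b): generate a standard Z, form X = AZ + b, and compare the sample mean and sample covariance with b and A A^T.

```python
# Sketch: for X = A Z + b with Z standard Gaussian, mu = b and Q = A A^T.
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
b = np.array([1.0, -1.0])

Z = rng.standard_normal((2, 200_000))     # standard Gaussian samples as columns
X = A @ Z + b[:, None]

print(X.mean(axis=1))                     # approximately b = [1, -1]
print(np.cov(X))                          # approximately A A^T = [[4, 2], [2, 2]]
print(A @ A.T)
```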
When two different definitions are given for the same object, one has to prove their equivalence. If Q is positive definite, the two definitions aim to describe the same object; but for Q non-negative definite and not strictly positive definite, we have only the second definition, so there is no compatibility to check.
Proposition 4. If Q is positive definite, then definitions 1 and 2 are equivalent. More precisely, if X = (X_1, ..., X_n) is a Gaussian random vector with mean μ and covariance Q in the sense of definition 1, then there exists a standard Gaussian random vector Z = (Z_1, ..., Z_n) and an n × n matrix A such that

X = AZ + μ.

One can take A = √Q, as described in the proof. Vice versa, if X = (X_1, ..., X_n) is a Gaussian random vector in the sense of definition 2, of the form X = AZ + b, then X is Gaussian in the sense of definition 1, with mean μ and covariance Q given by the previous proposition.
Proof. Let us prove the first claim. Let us define

√Q = U √Q_e U^T

where √Q_e is simply defined as

√Q_e = diag(√λ_1, ..., √λ_n).

We have

(√Q)^T = U (√Q_e)^T U^T = U √Q_e U^T = √Q

and

(√Q)^2 = U √Q_e U^T U √Q_e U^T = U √Q_e √Q_e U^T = U Q_e U^T = Q

because √Q_e √Q_e = Q_e. Set

Z = (√Q)^{-1} (X - μ),
where notice that √Q is invertible, from its definition and the strict positivity of the λ_i. Then Z is Gaussian. Indeed, from the formula for the transformation of densities,

f_Z(z) = [ f_X(x) / |det Dg(x)| ]_{z=g(x)}

where g(x) = (√Q)^{-1}(x - μ); hence det Dg(x) = det((√Q)^{-1}) = 1/(√λ_1 ··· √λ_n); therefore

f_Z(z) = (∏_{i=1}^n √λ_i) · 1/√((2π)^n det(Q)) · exp( -(√Q z + μ - μ)^T Q^{-1} (√Q z + μ - μ) / 2 )

       = 1/√((2π)^n) · exp( -(√Q z)^T Q^{-1} (√Q z) / 2 ) = 1/√((2π)^n) · exp( -z^T z / 2 )

which is the density of a standard Gaussian vector. From the definition of Z we get X = √Q Z + μ, so the first claim is proved.
The proof of the second claim is a particular case of the next exercise, that we leave to the
reader.
Exercise 6. Let X = (X_1, ..., X_n) be a Gaussian random vector, B an m × n matrix, and c a vector of R^m. Then

Y = BX + c

is a Gaussian random vector of dimension m. The relations between the means and the covariances are

μ^Y = B μ^X + c

and

Q_Y = B Q_X B^T.
Remark 8. We see from the exercise that we may start with a non-degenerate vector X and get a degenerate one Y, if B is not surjective (i.e. if rank B < m). This always happens when m > n.
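A tiny sketch of the degenerate case (our own example, with n = 1 and m = 2): for B = (1, 1)^T the vector Y = BX is concentrated on the diagonal of R^2, its covariance matrix is singular, and so Y has no density on R^2, although it is Gaussian in the sense of definition 2.

```python
# Sketch: a degenerate Gaussian vector Y = B X with m = 2 > n = 1.
import numpy as np

B = np.array([[1.0],
              [1.0]])                     # maps R^1 onto the diagonal of R^2
Q_X = np.array([[1.0]])                   # X ~ N(0, 1)

Q_Y = B @ Q_X @ B.T                       # [[1, 1], [1, 1]]
print(np.linalg.det(Q_Y))                 # 0.0: Q_Y is singular, no density on R^2
```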
Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance
matrix. This fundamental fact will be used below when we study stochastic processes.
Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have a prescribed mean μ and covariance Q, n-dimensional, and want to generate a random sample (x_1, ..., x_n) from such an N(μ, Q). Then we may generate n independent samples z_1, ..., z_n from the standard one-dimensional Gaussian law and compute

√Q z + μ

where z = (z_1, ..., z_n). In order to obtain the entries of the matrix √Q, if the software does not provide them (some software does), we may use the formula √Q = U √Q_e U^T. The matrix √Q_e is obvious. In order to get the matrix U, recall that its columns are the vectors e_1, ..., e_n written in the original basis, and such vectors are an orthonormal basis of eigenvectors of Q. Thus one has to use software that computes the spectral decomposition of a matrix to get e_1, ..., e_n.
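The recipe of this remark takes only a few lines (a sketch with our own μ and Q; in practice numpy.random.Generator.multivariate_normal, or a Cholesky factor of Q in place of √Q, would do the same job).

```python
# Sketch: sample from N(mu, Q) via sqrt(Q) = U sqrt(Q_e) U^T, as in remark 10.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0, 0.0])
Q = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, U = np.linalg.eigh(Q)                  # eigenvalues and orthonormal eigenvectors
sqrt_Q = U @ np.diag(np.sqrt(lam)) @ U.T    # symmetric square root of Q

z = rng.standard_normal((3, 100_000))       # independent standard Gaussian samples
x = sqrt_Q @ z + mu[:, None]                # samples from N(mu, Q)

print(x.mean(axis=1))                       # approximately mu
print(np.cov(x))                            # approximately Q
```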
