
MIMO Information Theory

Robert W. Heath Jr.

Wireless Networking and Communications Group (WNCG)


Dept. of Electrical and Computer Engineering
The University of Texas at Austin
rheath@ece.utexas.edu
www.ece.utexas.edu/~rheath

Outline
Background on information theory
- Mutual information
- Channel capacity
Derivation of capacity with channel state information at the transmitter and receiver
Derivation of the ergodic capacity for Rayleigh fading channels


Objectives
Define mutual information and capacity
Calculate the capacity of the narrowband AWGN
channel with knowledge of the channel at the
transmitter and the receiver
Calculate the ergodic capacity of the narrowband
AWGN Rayleigh channel with distribution information
at the transmitter and channel state information at the
receiver


Reading for Lecture
The following paper:
- I. E. Telatar, "Capacity of Multi-antenna Gaussian Channels," European Transactions on Telecommunications, 10(6):585-595, Nov. 1999.
Chapter 4 (Sections 4.1-4.5, 4.7) of Introduction to Space-Time Wireless Communications
Elements of Information Theory by Cover and Thomas is a useful reference for this lecture.


Review Questions
Why can we ignore the carrier frequency in the equivalent channel model?
What is the main assumption that allows for the convenient discrete-time representation?
What is the flat-fading MIMO channel model?


Review of Information Theory
Information theory was founded by Claude Shannon
It studies two key problems that pertain to the communication of information
- What is the maximum data compression possible for an information source?
- What is the maximum transmission rate supported by a communication channel?
Entropy is a measure of the randomness of a source and defines the compressibility limit
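
For reference, these two quantities have the standard definitions below (a sketch in Cover and Thomas's notation; X is the source, p its distribution):

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x), \qquad
I(X;Y) = H(X) - H(X \mid Y)
```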


Brief Intro to Channel Capacity 1/2
Channel capacity is essentially the supremum of achievable data rates that a channel can support without error
- Remember that the supremum is the least upper bound (you can think of it as a maximum)
The channel in this context is the mechanism that transfers the input to the output
- Includes the matrix channel as well as the additive noise
Rates R < C are achievable with arbitrarily small probability of error
Rates R > C are not achievable with arbitrarily small probability of error


Brief Intro to Channel Capacity 2/2
Capacity is a rate
- The units are bits per second or bits/s
Capacity is often normalized by the bandwidth
- The units are bits per second per Hz (b/s/Hz)
- This is called spectral efficiency
Computing the capacity requires
- Determining the mutual information as a function of the channel
- Maximizing the mutual information over all possible input distributions of transmitted signals under appropriate constraints, like power and bandwidth
We focus on the capacity of a single-user channel
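
As a compact statement of the maximization just described (standard notation, not from the original slide; P denotes the transmit power constraint):

```latex
C = \max_{p(\mathbf{s})\,:\,\operatorname{tr}(\mathbf{R}_s)\le P} I(\mathbf{s};\mathbf{y})
```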

Capacity of MIMO Channels

Narrowband Signal Model

y = H s + v

y: Mr x 1 received signal vector
s: Mt x 1 transmitted signal vector, assumed to be zero mean with covariance Rs = E[s s^H]
v: Mr x 1 AWGN vector with zero mean and covariance No I_Mr
- This means the noise is uncorrelated across different antennas
H: Mr x Mt channel matrix
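
A minimal sketch of drawing one sample from this signal model, assuming i.i.d. Rayleigh entries for H; the function name and parameter values are illustrative:

```python
import numpy as np

def sample_mimo_channel(Mt=2, Mr=2, P=1.0, No=0.1, rng=np.random.default_rng(0)):
    """Draw one realization of the narrowband model y = H s + v."""
    # Rayleigh channel: i.i.d. circularly symmetric CN(0, 1) entries
    H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
    # Zero-mean CSCG transmit vector with Rs = (P / Mt) I (equal power per antenna)
    s = np.sqrt(P / (2 * Mt)) * (rng.standard_normal(Mt) + 1j * rng.standard_normal(Mt))
    # AWGN with covariance No * I_Mr (uncorrelated across receive antennas)
    v = np.sqrt(No / 2) * (rng.standard_normal(Mr) + 1j * rng.standard_normal(Mr))
    return H @ s + v, H

y, H = sample_mimo_channel()
print(y.shape, H.shape)  # (2,) (2, 2)
```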


Channel State Information
[Figure: Tx-Rx link annotated with what each side knows]
CSI: channel state information
CDI: channel distribution information
Relevant scenarios
- Channel known at the receiver but only the channel distribution known at the transmitter: CSIR / CDIT
- Channel known instantaneously at the transmitter and receiver: CSIR / CSIT
- Channel unknown (noncoherent case or capacity with training)
Usually need at least distribution information at the transmitter, otherwise it is difficult to optimize the transmission

CSI Known at Both Transmitter and Receiver


Consider
- CSI (H) known perfectly at the transmitter
- CSI (H) known perfectly at the receiver
- Proper white complex Gaussian noise
How do we realize CSI?
- @ receiver -> channel estimation
- @ transmitter -> feedback or channel reciprocity
This will be an upper bound for the case where only
distribution information is available at the transmitter


Capacity Derivation
Channel capacity (with CSIR & CSIT) is given by

C = max_{Rs : tr(Rs) ≤ P} I(s; y)

Mutual information

I(s; y) = H(y) - H(y | s) = H(y) - H(v)

where H(x) denotes the differential entropy of x

Entropy for Complex Gaussian Distributions
We use the complex Gaussian distribution in most calculations. A circularly symmetric complex Gaussian (CSCG) vector has a distribution fully determined by its mean mx = E[x] and covariance Rx = E[(x - mx)(x - mx)^H] (we don't need to treat the real and imaginary parts separately)

Telatar provides a number of useful results about complex Gaussian distributions


Properties
(1) For a complex Gaussian x ~ Nc(mx, Rx): H(x) = log2 det(πe Rx)

(2) For any zero-mean x with covariance Rx:
H(x) ≤ log2 det(πe Rx)
Gaussians are maximum entropy!

(3) Let y = A x where x is Nc(mx, Rx)
-> Then y is Nc(A mx, A Rx A^H)

(4) Let y = x + v, both independent CSCG
-> my = mx + mv, Ry = Rx + Rv
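
A quick Monte Carlo sanity check of property (3), that y = A x has covariance A Rx A^H (my sketch; A, L, and the dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 200_000
A = rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))
L = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Rx = L @ L.conj().T                       # covariance of x = L w when w is white CSCG
w = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
y = A @ (L @ w)                           # property (3): y should have Ry = A Rx A^H
Ry_emp = (y @ y.conj().T) / N             # empirical covariance of y
print(np.max(np.abs(Ry_emp - A @ Rx @ A.conj().T)))   # small (Monte Carlo error)
```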


Capacity Derivation
Can show that the optimum s is circularly symmetric complex Gaussian (see Telatar)

H(v) is the differential entropy of the Gaussian noise, which is

H(v) = log2 det(πe No I_Mr) = Mr log2(πe No)

H(y) is the differential entropy of y. Since s and v are CSCG, y is CSCG with Ry = H Rs H^H + No I_Mr, so

H(y) = log2 det(πe (H Rs H^H + No I_Mr))

Capacity Derivation (cont.)
The capacity can be computed as follows

C = max_{Rs : tr(Rs) ≤ P} [H(y) - H(v)] = max_{Rs : tr(Rs) ≤ P} log2 det(I_Mr + (1/No) H Rs H^H)

How do we solve the above?


Capacity Derivation (cont.)
A simplification
Perform an SVD on H, i.e., H = U Σ V^H, where U and V are unitary and Σ is diagonal with the singular values σ_i of H
Rewrite the log det using the decomposition (see the sketch below)
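
The rewrite the slide refers to is presumably the standard identity det(I + AB) = det(I + BA) applied to the SVD (my reconstruction):

```latex
\log_2 \det\!\Big(\mathbf{I} + \tfrac{1}{N_o}\,\mathbf{H}\mathbf{R}_s\mathbf{H}^H\Big)
= \log_2 \det\!\Big(\mathbf{I} + \tfrac{1}{N_o}\,\boldsymbol{\Sigma}\,\mathbf{V}^H\mathbf{R}_s\mathbf{V}\,\boldsymbol{\Sigma}^H\Big)
```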


Capacity Derivation (cont.)
Make Rs = V D V^H where D is diagonal with entries d_i ≥ 0

This simplifies to

C = max_{d_i ≥ 0, Σ_i d_i ≤ P} Σ_i log2(1 + σ_i^2 d_i / No)


Water Filling Solution
Using the water-filling theorem (see Larsson & Stoica or Cover & Thomas), the optimal solution is

d_i = (μ - No / σ_i^2)^+

where μ is chosen such that Σ_i d_i = P


Water Filling Algorithm
First assume all p = min(Mt, Mr) modes are used (σ_k sorted in decreasing order)
Compute

μ = (1/p) (P + Σ_{k=1}^{p} No / σ_k^2)

For k = 1, 2, ..., p

d_k = μ - No / σ_k^2

If any d_k are negative, recompute for p = p - 1 (dropping the weakest mode)
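
A sketch of this iterative water-filling in Python (assuming the singular values are sorted in decreasing order; variable names are mine):

```python
import numpy as np

def waterfill(sigma, P, No):
    """Water-filling power allocation over channel modes.

    sigma: singular values of H, sorted in decreasing order
    P: total transmit power, No: noise variance
    Returns d, the per-mode powers (diagonal of D).
    """
    sigma = np.asarray(sigma, dtype=float)
    p = len(sigma)
    while p > 0:
        # Water level for the p strongest modes
        mu = (P + np.sum(No / sigma[:p] ** 2)) / p
        d = mu - No / sigma[:p] ** 2
        if d[-1] >= 0:          # weakest active mode still gets nonnegative power
            out = np.zeros(len(sigma))
            out[:p] = d
            return out
        p -= 1                  # drop the weakest mode and recompute

d = waterfill([2.0, 1.0, 0.1], P=1.0, No=0.5)
print(d, d.sum())               # e.g. [0.6875 0.3125 0.], allocations sum to P
```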


Capacity Derivation Summary
Capacity with CSIT / CSIR for a given narrowband matrix channel is

C = Σ_{i=1}^{p} log2(1 + σ_i^2 d_i / No)

where d_i = (μ - No / σ_i^2)^+ and μ is chosen such that Σ_i d_i = P


Intuition 1/2
[Figure: s is scaled by D^{1/2}, precoded by V, sent through the channel, and combined with U^H to give y]
Send information on the eigenvectors of the channel
[Figure: the end-to-end channel from D^{1/2} s to y is diagonal]
Makes the channel look like a diagonal matrix
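
A small numerical check of this picture (my sketch, using numpy's SVD): precoding with V at the transmitter and combining with U^H at the receiver leaves only the diagonal Σ.

```python
import numpy as np

rng = np.random.default_rng(2)
Mr, Mt = 3, 2
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
U, sigma, Vh = np.linalg.svd(H, full_matrices=False)
# Effective channel seen by the data after precoding (V) and combining (U^H)
H_eff = U.conj().T @ H @ Vh.conj().T
print(np.round(np.abs(H_eff), 6))   # diagonal; entries equal the singular values
print(np.round(sigma, 6))
```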


Intuition 2/2
The optimum signal covariance implies the best transmission strategy is to send information on the eigenmodes of the channel
Waterpouring determines how many modes we use and how much power is given to each mode
MIMO channels have at most min(Mt, Mr) modes
A SISO channel has only one mode


Capacity Derivation Summary
Capacity with CSIT / CSIR for a given narrowband matrix channel is

C = Σ_{i=1}^{p} log2(1 + σ_i^2 d_i / No)

where for the optimum solution Rs = V D V^H with D diagonal
Elements of D are chosen as

d_i = (μ - No / σ_i^2)^+

where μ satisfies Σ_i d_i = P

Key Ideas from Telatar's Proof
Define the capacity
Use complex notation / distributions
Gaussian transmit signaling is optimal
Transmit covariance Rs has a special form
- Eigenvectors are the right singular vectors of H
- Eigenvalues of Rs chosen by waterfilling over the singular values of H


Review Questions
What is waterfilling?
What is the intuition about waterfilling for
- Low rank channels?
- High rank channels?


Scenario: CDIT / CSIR
[Figure: Tx has channel distribution information (CDI) only; Rx knows the channel]
Consider
- AWGN noise
- Rayleigh fading channel (entries of H are Nc(0,1))
- Block fading model


Narrowband Signal Model

y = H s + v

y: Mr x 1 received signal vector
s: Mt x 1 transmitted signal vector, assumed to be zero mean with covariance Rs = E[s s^H]
v: Mr x 1 AWGN vector with zero mean and covariance No I_Mr
- This means the noise is uncorrelated across different antennas
H: Mr x Mt channel matrix


What if the Transmitter has No Channel Info?
Cannot optimize the mutual information over the transmit signal covariance matrix Rs. No solution!
Most researchers use the following capacity definition, but notice that this is not the true capacity (the true capacity is not defined in this case)

C(H) = log2 det(I_Mr + (P / (Mt No)) H H^H)

Effectively they assume that Rs = (P/Mt) I_Mt (equal power, uncorrelated across antennas)


Example: CSIT / CSIR
SIMO (Mt = 1)
- CSIT/CSIR: C = log2(1 + (P/No) ||h||^2)
- CSIR: C = log2(1 + (P/No) ||h||^2) (the same; there is nothing to optimize at the transmitter)
MISO (Mr = 1): power penalty!
- CSIT/CSIR: C = log2(1 + (P/No) ||h||^2) (beamforming along h)
- CSIR: C = log2(1 + (P/(Mt No)) ||h||^2)
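
A quick numerical illustration of the MISO power penalty using the formulas reconstructed above (for Mt = 2 the CSIR-only scheme loses 3 dB of effective SNR; the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
Mt, P, No = 2, 1.0, 0.1
h = (rng.standard_normal(Mt) + 1j * rng.standard_normal(Mt)) / np.sqrt(2)
g = np.linalg.norm(h) ** 2
C_csit = np.log2(1 + (P / No) * g)            # beamforming along h
C_csir = np.log2(1 + (P / (Mt * No)) * g)     # equal power, no transmit knowledge
print(f"CSIT/CSIR: {C_csit:.3f} b/s/Hz, CSIR only: {C_csir:.3f} b/s/Hz")
```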


CSIR / CDIT Case
How does distribution information help?
We can optimize the transmitter distribution based on the channel distribution
- Depends on the channel model!
Today we assume the Rayleigh channel model for H
- Means that the entries of H are circularly symmetric complex Gaussian with zero mean and unit total variance, Nc(0,1)
- Models a fading channel with sufficient scattering, large transmit antenna spacing, and large receive antenna spacing
- Somewhat unrealistic (we discuss other models in the next lecture)


Useful Theorem from Telatar
If the entries of H are circularly symmetric complex Gaussian with zero mean and unit total variance Nc(0,1), then for any unitary Mr x Mr matrix U and unitary Mt x Mt matrix V,

U H V^H is equal in distribution to H
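
A rough Monte Carlo check of this invariance, comparing first and second moments of the entries of U H V^H and H (my sketch; the unitaries come from QR decompositions of random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
Mr, Mt, N = 2, 3, 100_000
# Fixed unitary matrices obtained from QR decompositions
U, _ = np.linalg.qr(rng.standard_normal((Mr, Mr)) + 1j * rng.standard_normal((Mr, Mr)))
V, _ = np.linalg.qr(rng.standard_normal((Mt, Mt)) + 1j * rng.standard_normal((Mt, Mt)))
H = (rng.standard_normal((N, Mr, Mt)) + 1j * rng.standard_normal((N, Mr, Mt))) / np.sqrt(2)
G = U @ H @ V.conj().T                  # broadcasting applies U (.) V^H per realization
# Entries of both should look CN(0, 1): mean near 0, average |entry|^2 near 1
print(np.abs(G.mean()), np.abs(H.mean()))
print((np.abs(G) ** 2).mean(), (np.abs(H) ** 2).mean())
```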


Ergodic Capacity Derivation
Here's the trick -> since the receiver knows the channel realization, the channel output is (y, H)
Compute

I(s; (y, H)) = I(s; H) + I(s; y | H) = 0 + E_H[ log2 det(I_Mr + (1/No) H Rs H^H) ]

where I(s; H) = 0 because s is independent of H

The last step follows from the fact that the source should be complex Gaussian

Capacity Derivation (cont.)
Capacity is given by solving

C = max_{Rs : tr(Rs) ≤ P} E_H[ log2 det(I_Mr + (1/No) H Rs H^H) ]

Using the previous theorem we can equivalently solve

C = max_{D : tr(D) ≤ P} E_H[ log2 det(I_Mr + (1/No) H D H^H) ]

where D is diagonal!!


Capacity Derivation (concluded)
Using a somewhat technical permutation argument, it can be shown that the optimal D = (P/Mt) I_Mt

Theorem (Telatar 95): The capacity of the channel is achieved when s is a proper complex Gaussian vector with zero mean and covariance (P/Mt) I_Mt.


Observations
Thus the ergodic capacity for Rayleigh channels is given by

C = E_H[ log2 det(I_Mr + (P / (Mt No)) H H^H) ]

Note that for fixed Mr, as Mt -> ∞, (1/Mt) H H^H -> I_Mr, thus

C -> Mr log2(1 + P/No)
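
A Monte Carlo sketch of this ergodic capacity and of the large-Mt limit Mr log2(1 + P/No); the SNR value and trial count are illustrative:

```python
import numpy as np

def ergodic_capacity(Mt, Mr, snr, trials=20_000, rng=np.random.default_rng(5)):
    """Monte Carlo estimate of E[log2 det(I + (snr/Mt) H H^H)], snr = P/No."""
    c = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
        M = np.eye(Mr) + (snr / Mt) * (H @ H.conj().T)
        c += np.log2(np.linalg.det(M).real)
    return c / trials

snr = 10.0  # P/No = 10 dB
for Mt in (1, 2, 8, 32):
    print(Mt, round(ergodic_capacity(Mt, Mr=2, snr=snr), 3))
print("limit:", round(2 * np.log2(1 + snr), 3))  # Mr log2(1 + P/No) as Mt -> infinity
```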


Ergodic Capacity
We call this the ergodic capacity because it turns out
that H does not need to be independent from
realization to realization, only generated from an
ergodic process.
In contrast, consider a non-ergodic channel. This is
one in which the channel is randomly chosen but fixed
for all time. The Shannon capacity of this channel is
zero.


What about Just CSIR (no CDIT)?
The instantaneous capacity is

C(H) = log2 det(I_Mr + (P / (Mt No)) H H^H)

The average (or ergodic) capacity is

C = E_H[ C(H) ]

This is the origin of the ad hoc definition of capacity we discussed before. Basically, Rs = (P/Mt) I_Mt is optimal in the Rayleigh case but is used for other channels as well.


MIMO: Ergodic Capacity


MIMO Outage Capacity
Ergodic capacity is the (time) average of the mutual information and in some cases is the true capacity.

Another useful measure is the outage capacity: the largest rate that the instantaneous mutual information exceeds in a specified fraction (e.g., 90%) of channel realizations, i.e., the rate whose outage probability equals the allowed threshold.
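
A sketch of estimating the 10% outage capacity by Monte Carlo (equal-power signaling assumed; parameter values are illustrative):

```python
import numpy as np

def outage_capacity(Mt, Mr, snr, q=0.10, trials=20_000, rng=np.random.default_rng(6)):
    """q-outage capacity: the rate exceeded by the instantaneous mutual
    information with probability 1 - q, under equal power (snr = P/No)."""
    rates = np.empty(trials)
    for i in range(trials):
        H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
        rates[i] = np.log2(np.linalg.det(np.eye(Mr) + (snr / Mt) * (H @ H.conj().T)).real)
    return np.quantile(rates, q)   # 10th percentile of the mutual information

print(round(outage_capacity(Mt=2, Mr=2, snr=10.0), 3), "b/s/Hz at 10% outage")
```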


MIMO 10% Outage Capacity


SIMO (Mt = 1), Ergodic Capacity


SIMO 10% outage capacity


SIMO Comments
No benefit of multiple data streams
At high SNRs, capacity increases logarithmically with
SNR
Ergodic capacity increases only slightly with increasing
number of antennas


MISO (Mr = 1), Ergodic Capacity


MISO (Mr = 1), 10% Outage Capacity


Full channel knowledge, 10% outage capacity


Comments
No benefit of multiple data streams
At high SNRs, capacity increases logarithmically with
SNR
No array gain effect (versus SIMO case)
Ergodic capacity increases only slightly with increasing
number of antennas


Summary
Capacity with no channel state or distribution information at the transmitter is often assumed to be

C(H) = log2 det(I_Mr + (P / (Mt No)) H H^H)

Ergodic capacity for Rayleigh channels is

C = E_H[ log2 det(I_Mr + (P / (Mt No)) H H^H) ]

Note: it can be calculated in closed form or approximated, but it is messy
