
EE 376B Information Theory                                    Prof. T. Cover
Handout #7                                          Thursday, April 21, 2011

Solutions to Homework Set #2

1. Multiple layer waterfilling
   Let C(x) = (1/2) log(1 + x) denote the channel capacity of a Gaussian channel with signal-to-noise ratio x. Show

       C(P_1/N) + C(P_2/(P_1 + N)) = C((P_1 + P_2)/N).

   This suggests that 2 independent users can send information as well as if they had pooled their power.

   Solution: Multiple layer waterfilling

       C((P_1 + P_2)/N) = (1/2) log(1 + (P_1 + P_2)/N)
                        = (1/2) log( (N + P_1 + P_2)/N )
                        = (1/2) log( (N + P_1 + P_2)/(N + P_1) * (N + P_1)/N )
                        = (1/2) log( (N + P_1 + P_2)/(N + P_1) ) + (1/2) log( (N + P_1)/N )
                        = C(P_2/(P_1 + N)) + C(P_1/N).
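   A quick numerical sanity check of this identity (a minimal sketch; the test values and the base-2 logarithm are assumptions, not part of the problem):

```python
import numpy as np

def C(x):
    """Capacity (in bits) of a Gaussian channel with signal-to-noise ratio x."""
    return 0.5 * np.log2(1 + x)

P1, P2, N = 3.0, 5.0, 2.0                    # arbitrary positive test values
lhs = C(P1 / N) + C(P2 / (P1 + N))           # two users, layered decoding
rhs = C((P1 + P2) / N)                       # one user with the pooled power
print(lhs, rhs)                              # both print the same value (~1.1610 bits)
```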

2. Parallel channels and waterfilling
   Consider a pair of parallel Gaussian channels, i.e.,

       (Y_1, Y_2) = (X_1, X_2) + (Z_1, Z_2),

   where

       (Z_1, Z_2) ~ N( 0, [ \sigma_1^2   0
                             0    \sigma_2^2 ] ),

   and there is a power constraint E(X_1^2 + X_2^2) \le P. Assume that \sigma_1^2 > \sigma_2^2.

   (a) At what power does the channel stop behaving like a single channel with noise variance \sigma_2^2, and begin behaving like a pair of channels, i.e., at what power does the worst channel become useful?

   (b) What is the capacity C(P) for large P?

   Solution: Parallel channels and waterfilling

   (a) By the result of Section 9.5 of Cover and Thomas, it follows that we will put all the signal power into the channel with less noise until the total power of noise + signal in that channel equals the noise power in the other channel. After that, we will split any additional power evenly between the two channels. Thus the combined channel begins to behave like a pair of parallel channels when the signal power is equal to the difference of the two noise powers, i.e., when

           P = \sigma_1^2 - \sigma_2^2.
   (b) Let E(X_1^2) = P_1 and E(X_2^2) = P_2. Therefore

           P = P_1 + P_2.                                               (1)

       From waterfilling we know

           P_2 = P_1 + \sigma_1^2 - \sigma_2^2.                         (2)

       From equations (1) and (2) we get

           P_1 = ( P - (\sigma_1^2 - \sigma_2^2) ) / 2,
           P_2 = ( P + (\sigma_1^2 - \sigma_2^2) ) / 2.

       Hence

           C(P) = (1/2) log( 1 + ( P - (\sigma_1^2 - \sigma_2^2) ) / (2\sigma_1^2) )
                + (1/2) log( 1 + ( P + (\sigma_1^2 - \sigma_2^2) ) / (2\sigma_2^2) ).
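   The closed-form allocation above is easy to evaluate numerically; a minimal sketch (here s1 and s2 stand for \sigma_1^2 and \sigma_2^2, and the base-2 logarithm and test values are assumptions):

```python
import numpy as np

def two_channel_capacity(P, s1, s2):
    """Capacity (bits/transmission) of two parallel Gaussian channels with
    noise variances s1 > s2, using the water-filling allocation derived above."""
    if P <= s1 - s2:                          # below threshold: only the quieter channel is used
        return 0.5 * np.log2(1 + P / s2)
    P1 = (P - (s1 - s2)) / 2                  # allocation for the noisier channel
    P2 = (P + (s1 - s2)) / 2                  # allocation for the quieter channel
    return 0.5 * np.log2(1 + P1 / s1) + 0.5 * np.log2(1 + P2 / s2)

print(two_channel_capacity(0.5, 2.0, 1.0))    # P below sigma_1^2 - sigma_2^2 = 1: one channel
print(two_channel_capacity(3.0, 2.0, 1.0))    # above the threshold: both channels active
```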

3. Vector channel
   Consider the 3-input, 3-output Gaussian channel

       Y = X + Z,    Z ~ N_3(0, K),

   where X, Y, Z \in R^3, E||X||^2 = E(X_1^2 + X_2^2 + X_3^2) \le P. Find the capacity for

       K = [ 1    0     0
             0    1   \rho
             0  \rho    1  ].

   Solution: Vector channel
   We know that

       C = (1/2) log( |K_Y| / |K| ),

   where K_Y is the covariance matrix of the channel output. We can calculate the eigenvalues of the K matrix to be \lambda_1 = 1, \lambda_2 = 1 - \rho, and \lambda_3 = 1 + \rho. Hence |K| = 1 - \rho^2. We now need to maximize |K_Y| = |K_X + K|. From Section 9.5 of Cover and Thomas we see that

       sup_{K_X} |K_X + K| = \prod_{i=1}^{3} (A_i + \lambda_i),

   where A_i = (\nu - \lambda_i)^+ and A_1 + A_2 + A_3 = P. We will first look at the case where \rho > 0. Hence we have

       A_1 = 0               if P < \rho
             (P - \rho)/2    if \rho \le P < 3\rho
             P/3             if P \ge 3\rho

       A_2 = P               if P < \rho
             (P + \rho)/2    if \rho \le P < 3\rho
             P/3 + \rho      if P \ge 3\rho

       A_3 = 0               if P < 3\rho
             P/3 - \rho      if P \ge 3\rho

   Therefore,

       sup_{K_X} |K_X + K| = (P + 1 - \rho)(1 + \rho)             if P < \rho
                             ((P - \rho)/2 + 1)^2 (1 + \rho)      if \rho \le P < 3\rho
                             (P/3 + 1)^3                          if P \ge 3\rho

   and

       C = (1/2) log( (P + 1 - \rho)(1 + \rho) / (1 - \rho^2) )            if P < \rho
           (1/2) log( ((P - \rho)/2 + 1)^2 (1 + \rho) / (1 - \rho^2) )     if \rho \le P < 3\rho
           (1/2) log( (P/3 + 1)^3 / (1 - \rho^2) )                         if P \ge 3\rho

   Since, given a channel with parameter \rho < 0, we can negate Y_3 and get the same channel with a new parameter \rho' = -\rho > 0, the capacity for a channel with parameter \rho must be the same as for the channel with parameter -\rho. Therefore

       C = (1/2) log( (P + 1 - |\rho|)(1 + |\rho|) / (1 - \rho^2) )            if P < |\rho|
           (1/2) log( ((P - |\rho|)/2 + 1)^2 (1 + |\rho|) / (1 - \rho^2) )     if |\rho| \le P < 3|\rho|
           (1/2) log( (P/3 + 1)^3 / (1 - \rho^2) )                             if P \ge 3|\rho|

   Note that if the noise were white, i.e., \rho = 0, then

       C = (1/2) log( (P/3 + 1)^3 ) = (3/2) log(1 + P/3),

   as expected.
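   The piecewise formula can be checked against a brute-force water-filling over the eigenvalues of K; a minimal sketch (base-2 logarithms, the bisection tolerance, and the test values of P and \rho are assumptions):

```python
import numpy as np

def capacity_formula(P, rho):
    """Piecewise capacity (bits) derived above, valid for rho != 0."""
    r = abs(rho)
    if P < r:
        top = (P + 1 - r) * (1 + r)
    elif P < 3 * r:
        top = ((P - r) / 2 + 1) ** 2 * (1 + r)
    else:
        top = (P / 3 + 1) ** 3
    return 0.5 * np.log2(top / (1 - r ** 2))

def capacity_waterfill(P, rho):
    """Same capacity via numerical water-filling on the eigenvalues of K."""
    lam = np.array([1.0, 1 - abs(rho), 1 + abs(rho)])
    lo, hi = lam.min(), lam.max() + P
    for _ in range(100):                      # bisect on the water level nu
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - lam, 0).sum() > P:
            hi = nu
        else:
            lo = nu
    A = np.maximum(nu - lam, 0)               # A_i = (nu - lambda_i)^+
    return 0.5 * np.log2(np.prod(A + lam) / np.prod(lam))

for P in (0.1, 0.5, 2.0):                     # one value in each regime for rho = 0.3
    print(P, capacity_formula(P, 0.3), capacity_waterfill(P, 0.3))
```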

4. Filtered noise
   We consider the previous vector channel where Z is filtered white noise, Z = BU, where U ~ N_3(0, I) is white Gaussian noise and

       B = [ 1  2  3
             4  5  6
             5  7  9 ].

   (a) Find the capacity.
   (b) How would you signal over this channel?

   Solution: Filtered noise

   (a) First observe that |B| = 0 (the third row is the sum of the first two) and therefore |K| = |BB^t| = 0. From the capacity equation

           C = (1/2) log( |K_Y| / |K| ),

       it is clear that C = \infty.

   (b) Observe that Z_3 = Z_1 + Z_2. One way to exploit this structure in the noise is to send nothing on the first two components, i.e., X_1 = X_2 = 0. The receiver then knows Z_1 = Y_1 and Z_2 = Y_2, and hence also knows Z_3 = Z_1 + Z_2. The receiver can then recover X_3 perfectly by noting that X_3 = Y_3 - Z_1 - Z_2.
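   A small numerical check of the structure exploited above, using B as reconstructed here (the exact entries of B are an assumption of this sketch):

```python
import numpy as np

B = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [5., 7., 9.]])                  # third row = sum of the first two (assumed B)
K = B @ B.T                                   # covariance of Z = BU when U ~ N(0, I)
print(np.linalg.det(K))                       # ~ 0 up to round-off: K is singular
print(np.allclose(B[2], B[0] + B[1]))         # True, so Z3 = Z1 + Z2
```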

5. A mutual information game
   Consider the channel

       Y = X + Z.

   Throughout this problem we shall constrain the signal power

       EX = 0,    EX^2 = P,

   and the noise power

       EZ = 0,    EZ^2 = N,

   and assume that X and Z are independent. The channel capacity is given by I(X; X + Z).

   Now for the game. The noise player chooses a distribution on Z to minimize I(X; X + Z), while the signal player chooses a distribution on X to maximize I(X; X + Z). Letting X^* ~ N(0, P), Z^* ~ N(0, N), show that Gaussian X^* and Z^* satisfy the saddlepoint conditions

       I(X; X + Z^*) \le I(X^*; X^* + Z^*) \le I(X^*; X^* + Z).

   Thus

       min_Z max_X I(X; X + Z) = max_X min_Z I(X; X + Z) = (1/2) log(1 + P/N),

   and the game has a value. In particular, a deviation from normal for either player worsens the mutual information from that player's standpoint. Can you discuss the implications of this?

   Note: Part of the proof hinges on the entropy power inequality from Chapter 16, which states that if X and Y are independent random n-vectors with densities, then

       e^{(2/n) h(X+Y)} \ge e^{(2/n) h(X)} + e^{(2/n) h(Y)}.

   Solution: A mutual information game
   Let X and Z be random variables with EX = 0, EX^2 = P, EZ = 0 and EZ^2 = N. Let X^* ~ N(0, P) and Z^* ~ N(0, N). Then, as proved in class,

       I(X; X + Z^*) = h(X + Z^*) - h(X + Z^*|X)
                     = h(X + Z^*) - h(Z^*)
                     \le h(X^* + Z^*) - h(Z^*)
                     = I(X^*; X^* + Z^*),

   where the inequality follows from the fact that, given the variance (here Var(X + Z^*) = P + N = Var(X^* + Z^*)), the entropy is maximized by the normal.

   To prove the other inequality, we use the entropy power inequality

       2^{2h(X+Z)} \ge 2^{2h(X)} + 2^{2h(Z)},

   with equality if X and Z are independent normals. Now we have

       I(X^*; X^* + Z) = h(X^* + Z) - h(X^* + Z|X^*)
                       = h(X^* + Z) - h(Z)
                       \ge (1/2) log( 2^{2h(X^*)} + 2^{2h(Z)} ) - h(Z)
                       = (1/2) log( 1 + 2^{2h(X^*)} / 2^{2h(Z)} )
                       \ge (1/2) log( 1 + 2^{2h(X^*)} / 2^{2h(Z^*)} )
                       = (1/2) log( 2^{2h(X^*)} + 2^{2h(Z^*)} ) - h(Z^*)
                       = h(X^* + Z^*) - h(Z^*)
                       = I(X^*; X^* + Z^*),

   where the first inequality follows from the entropy power inequality, and the second inequality follows from the fact that g(\xi) = (1/2) log(1 + 2^{2h(X^*)} / 2^{2\xi}) is a nonincreasing function and h(Z) \le h(Z^*).

   Combining the two inequalities, we have

       I(X; X + Z^*) \le I(X^*; X^* + Z^*) \le I(X^*; X^* + Z).

   Hence, using these inequalities, it follows directly that

       min_Z max_X I(X; X + Z) \le max_X I(X; X + Z^*)
                                = I(X^*; X^* + Z^*)
                                = min_Z I(X^*; X^* + Z)
                                \le max_X min_Z I(X; X + Z).                 (3)

   We have shown an inequality relationship in one direction between min_Z max_X I(X; X + Z) and max_X min_Z I(X; X + Z). We will now prove that the inequality in the other direction is a general result for all functions of two variables. For any function f(a, b) of two variables, for all b, for any a_0,

       f(a_0, b) \ge min_a f(a, b).

   Hence

       max_b f(a_0, b) \ge max_b min_a f(a, b).

   Taking the minimum over a_0, we have

       min_{a_0} max_b f(a_0, b) \ge max_b min_a f(a, b),

   or

       min_a max_b f(a, b) \ge max_b min_a f(a, b).

   From this result,

       min_Z max_X I(X; X + Z) \ge max_X min_Z I(X; X + Z).                  (4)

   From (3) and (4), we have

       min_Z max_X I(X; X + Z) = max_X min_Z I(X; X + Z) = (1/2) log(1 + P/N).

   These inequalities imply that we have a saddlepoint in the game, which is the value of the game. If the signal player chooses X^*, the noise player cannot do any better than choosing Z^*. Similarly, any deviation by the signal player from X^* will make him do worse, if the noise player has chosen Z^*. Any deviation by either player will make him do worse. Another implication of this result is that not only is the normal the best possible signal distribution, it is also the worst possible noise distribution.
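   The right-hand saddlepoint inequality can be illustrated numerically by pitting the Gaussian signal against a non-Gaussian noise of the same variance; the sketch below uses uniform noise and computes the differential entropies by numerical integration (the uniform choice, grid parameters, and base-2 units are assumptions of this sketch):

```python
import numpy as np
from scipy.stats import norm

P, N = 1.0, 1.0
a = np.sqrt(3 * N)                            # Uniform[-a, a] has variance a^2/3 = N

y = np.linspace(-15, 15, 300001)
dy = y[1] - y[0]
# Density of X* + Z_unif with X* ~ N(0, P): Gaussian cdf difference over the uniform width.
f = (norm.cdf((y + a) / np.sqrt(P)) - norm.cdf((y - a) / np.sqrt(P))) / (2 * a)
h_sum = -np.sum(np.where(f > 0, f * np.log2(f), 0.0)) * dy    # h(X* + Z) in bits
h_noise = np.log2(2 * a)                                      # h(Z) for uniform noise
print(h_sum - h_noise)                        # I(X*; X* + Z_unif): larger than the Gaussian value
print(0.5 * np.log2(1 + P / N))               # I(X*; X* + Z*) = 0.5 bit
```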

6. Additive noise channel
   This problem has an instructive answer. Consider the channel Y = X + Z, where X is the transmitted signal with power constraint P, Z is independent additive noise, and Y is the received signal. Let

       Z = 0      with prob. 1/10,
           Z^*    with prob. 9/10,

   where Z^* ~ N(0, N). Thus Z has a mixture distribution which is the mixture of a Gaussian distribution and a degenerate distribution with mass 1 at 0.

   (a) What is the capacity of this channel?
   (b) How would you signal in such a manner as to achieve capacity?

   Solution: Additive noise channel

   (a) The capacity of the channel is in fact infinite. Since Z has a discrete component, the differential entropy h(Z) = -\infty. Now choose any distribution for X such that Y = X + Z has no atoms; e.g., X ~ U[0, \sqrt{P}] will suffice. Since h(Y) > -\infty,

           C \ge h(Y) - h(Y|X) = h(Y) - h(Z) = \infty.

   (b) Many different signalling schemes are possible. A rather simple one is for the transmitter to pick a rational number between -\sqrt{P} and \sqrt{P} and transmit it. Since there are countably many rational numbers in (-\sqrt{P}, \sqrt{P}), P(x + Z^* \in Q) = 0, where Z^* is the Gaussian component of the noise. Therefore, if the receiver gets a rational number, she can correctly conclude that Z = 0 w.p. 1. Now the transmitter transmits the same rational number repeatedly n times, so that with probability 1 - (9/10)^n the receiver gets a rational number at least once and hence decodes the message correctly. This immediately implies that the achievable rate of this signalling scheme is infinite, since there are infinitely many rational numbers that can be transmitted in this way.

       Note: This scheme is impractical since there is no finite-time algorithm to determine whether a real number is rational or not. We can, however, modify the above scheme to one in which the signal set is an arbitrary finite subset of the rational numbers. It is easy to see that the achievable rate is unbounded and we still have infinite capacity.
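   The finite-codebook variant of the scheme is easy to simulate; a minimal sketch (the codebook values, P = 1, N = 1, and the block length are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1.0
codebook = np.array([-0.75, -0.25, 0.25, 0.75])   # hypothetical finite set of rationals, |x| <= sqrt(P) = 1
n, trials = 20, 20000
correct = 0
for _ in range(trials):
    x = rng.choice(codebook)
    noisy = rng.random(n) >= 0.1                  # True where Z = Z* (prob 9/10), False where Z = 0
    z = np.where(noisy, rng.normal(0.0, np.sqrt(N), n), 0.0)
    y = x + z
    clean = np.isin(y, codebook)                  # noiseless uses: y equals a codebook point exactly
    if clean.any() and y[clean][0] == x:
        correct += 1
print(correct / trials, 1 - 0.9 ** n)             # empirical success rate vs 1 - (9/10)^n ~ 0.878
```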

7. Time varying channel
   A train pulls out of the station at constant velocity. The received signal energy thus falls off with time as 1/i^2. The total received signal at time i is

       Y_i = (1/i) X_i + Z_i,

   where Z_1, Z_2, ... are i.i.d. ~ N(0, N). The transmitter constraint for block length n is

       (1/n) \sum_{i=1}^{n} x_i^2(w) \le P,    w \in {1, 2, ..., 2^{nR}}.

   Using Fano's inequality, show that the capacity C is equal to zero for this channel.

   Solution: Time varying channel
   Just as in the proof of the converse for the Gaussian channel,

       nR = H(W) = I(W; \hat{W}) + H(W|\hat{W})                                   (5)
          \le I(W; \hat{W}) + n\epsilon_n                                         (6)
          \le I(X^n; Y^n) + n\epsilon_n                                           (7)
          = h(Y^n) - h(Y^n|X^n) + n\epsilon_n                                     (8)
          = h(Y^n) - h(Z^n) + n\epsilon_n                                         (9)
          \le \sum_{i=1}^{n} h(Y_i) - h(Z^n) + n\epsilon_n                        (10)
          = \sum_{i=1}^{n} h(Y_i) - \sum_{i=1}^{n} h(Z_i) + n\epsilon_n           (11)
          = \sum_{i=1}^{n} I(X_i; Y_i) + n\epsilon_n.                             (12)

   Now let P_i be the average power of the i-th column of the codebook, i.e.,

       P_i = (1 / 2^{nR}) \sum_{w} x_i^2(w).                                      (13)

   Then, since Y_i = (1/i) X_i + Z_i and since X_i and Z_i are independent, the average power of Y_i is P_i/i^2 + N. Hence, since entropy is maximized by the normal distribution,

       h(Y_i) \le (1/2) log 2\pi e ( P_i/i^2 + N ).                               (14)

   Continuing with the inequalities of the converse, we obtain

       nR \le \sum_i ( h(Y_i) - h(Z_i) ) + n\epsilon_n                            (15)
           \le \sum_i ( (1/2) log( 2\pi e (P_i/i^2 + N) ) - (1/2) log 2\pi e N ) + n\epsilon_n   (16)
           = \sum_i (1/2) log( 1 + P_i/(i^2 N) ) + n\epsilon_n.                   (17)

   Since each of the codewords satisfies the power constraint, so does their average, and hence

       (1/n) \sum_i P_i \le P.                                                    (18)

   This corresponds to a set of parallel channels with increasing noise powers. Using waterfilling, the optimal solution is to put power into the first few channels, which have the lowest noise power. Since the noise power in channel i is N_i = i^2 N, we put power only into the channels whose noise power lies below the water level. The height of the water level is less than N + nP, and hence for every channel into which we put power, i^2 N < nP + N, so power goes into only O(\sqrt{n}) channels. The average rate is then less than (1/n) \sqrt{n} (1/2) log(1 + nP/N), which goes to 0 as n \to \infty. Hence the capacity of this channel is 0.
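   A numerical illustration of this argument (a sketch; the water-filling helper, the bisection tolerance, base-2 units, and the values P = N = 1 are assumptions): the per-transmission rate of the parallel-channel bound shrinks toward zero as n grows.

```python
import numpy as np

def waterfill_rate(noise, budget):
    """Total rate (bits) of parallel Gaussian channels under water-filling."""
    lo, hi = noise.min(), noise.max() + budget
    for _ in range(200):                      # bisect on the water level
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - noise, 0).sum() > budget:
            hi = nu
        else:
            lo = nu
    p = np.maximum(nu - noise, 0)
    return np.sum(0.5 * np.log2(1 + p / noise))

P, N = 1.0, 1.0
for n in (10, 100, 1000, 10000, 100000):
    noise = (np.arange(1, n + 1) ** 2) * N    # channel i has noise power i^2 * N
    print(n, waterfill_rate(noise, n * P) / n)   # bits per transmission -> 0 as n grows
```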
8. Feedback capacity for n = 2
   Let Z = (Z_1, Z_2) ~ N(0, K),

       K = [ 1    \rho
             \rho   1  ].

   Suppose X = (X_1, X_2), with input power constraint tr(K_X) = E(X_1^2 + X_2^2) \le 2P.

   (a) Find (1/2) log( |K_{X+Z}| / |K_Z| ) without feedback.
   (b) Find (1/2) log( |K_{X+Z}| / |K_Z| ) with feedback.

   Solution: Feedback capacity
   Without feedback, the solution is based on waterfilling. The eigenvalues of the matrix are 1 - \rho and 1 + \rho, and therefore if P < \rho we would use only one of the channels and achieve capacity

       C = (1/2) log( 1 + 2P/(1 - \rho) ).

   For P \ge \rho, we would use both eigenvalues, and the water level for waterfilling would be obtained by distributing the remaining power equally across both eigenvalues. Thus the water level would be (1 + \rho) + (2P - 2\rho)/2 = 1 + P, and the capacity would be

       C = (1/2) log( (1 + P)/(1 + \rho) ) + (1/2) log( (1 + P)/(1 - \rho) ).

   With feedback, the solution is a little more complex. From (9.102), we have

       C_{n,FB} = max (1/2n) log( |(B + I) K_Z^{(n)} (B + I)^t + K_V| / |K_Z^{(n)}| ),        (19)

   where the maximum is taken over all nonnegative definite K_V and strictly lower triangular B such that

       tr( B K_Z^{(n)} B^t + K_V ) \le nP.                                                    (20)

   In the case when n = 2,

       (B + I) K_Z^{(n)} (B + I)^t + K_V
           = [ 1  0 ] [ 1    \rho ] [ 1  b ]  +  [ P_1   0
             [ b  1 ] [ \rho   1  ] [ 0  1 ]      0    P_2 ]                                  (21)
           = [ 1 + P_1                \rho + b
               \rho + b     1 + P_2 + 2b\rho + b^2 ]                                          (22)

   subject to the constraint that

       tr( [ P_1     0
             0    P_2 + b^2 ] ) \le 2P.                                                       (23)

   Expanding this, we obtain the mutual information as

       I(X; Y) = (1/2) log( (1 + P_1 + P_2 + P_1 P_2 + P_1 b^2 + 2 P_1 b \rho - \rho^2) / (1 - \rho^2) ),   (24)

   subject to

       P_1 + P_2 + b^2 = 2P.                                                                  (25)

   Setting up the functional and differentiating with respect to the variables, we obtain the following relationships:

       P_1 = P_2 + b^2 + 2 b \rho                                                             (26)

   and

       b = \rho P_1.                                                                          (27)
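   The stationarity conditions (26) and (27) can be checked by a brute-force search over the constraint set; a minimal sketch (the grid resolution and the test values P = 1, \rho = 0.5 are assumptions):

```python
import numpy as np

# Brute-force search over (P1, P2, b) with P1 + P2 + b^2 = 2P and P1, P2 >= 0.
P, rho = 1.0, 0.5
best_det, best_arg = -np.inf, None
for b in np.linspace(-2.0, 2.0, 401):
    rem = 2 * P - b ** 2                          # power remaining for P1 + P2
    if rem < 0:
        continue
    for P1 in np.linspace(0.0, rem, 401):
        P2 = rem - P1
        det = (1 + P1) * (1 + P2 + 2 * b * rho + b ** 2) - (b + rho) ** 2
        if det > best_det:
            best_det, best_arg = det, (P1, P2, b)
print(0.5 * np.log2(best_det / (1 - rho ** 2)))   # (1/2) log |K_{X+Z}| / |K_Z| with feedback
print(best_arg)                                   # maximizer: roughly P1 = P2 + b^2 + 2*b*rho and b = rho*P1
```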
