
EE595A Introduction to Information Theory Winter 2004

University of Washington Dept. of Electrical Engineering

Handout 8: Problem Set 4: Solutions


Prof: Jeff A. Bilmes <bilmes@ee.washington.edu> Lecture 10, March 10, 2004

Book Problems: Do problems 7.4(abcd), 7.6, 7.9, 8.8, 8.10, 8.12, 9.1, 10.2. For problem 7.4d, compute the probability of the sequence $\omega_1 \omega_2 \ldots \omega_n$ which is then followed by any arbitrary sequence, where $\Omega = \sum_{p :\, p \text{ halts}} 2^{-l(p)}$ and $\omega_i$ is the $i$th bit in the binary expansion of $\Omega$. Also, do not do 7.4e. (Note: a good break in this problem set is to be done with all Chapter 7 problems, Problems 1 and 2 below, and 1/2 of the Chapter 8 problems by Friday.)

Problem 7.4 Monkeys on a computer. Suppose a random program is typed into a computer. Give a rough estimate of the probability that the computer prints the following sequences:

1. $0^n$ (a string of $n$ zeros) followed by any arbitrary sequence.

2. $\pi_1 \pi_2 \ldots \pi_n$ followed by any arbitrary sequence, where $\pi_i$ is the $i$th bit in the expansion of $\pi$.

3. $0^n 1$ followed by any arbitrary sequence.

4. (Optional) $\omega_1 \omega_2 \ldots \omega_n$ followed by any arbitrary sequence, where $\omega_i$ is the $i$th bit of $\Omega = \sum_{p :\, p \text{ halts}} 2^{-l(p)}$.
Solution 7.4 Monkeys on a computer. The probability that a computer with a random input will print out the string $x_1 x_2 \ldots x_n$ followed by any arbitrary sequence is the sum of the probabilities over all programs whose output starts with that string:

$P(x_1 x_2 \ldots x_n \ldots) = \sum_{p :\, \mathcal{U}(p) = x_1 x_2 \ldots x_n \ldots} 2^{-l(p)},$   (8.1)

where $l(p)$ is the length of program $p$. This sum is lower bounded by its largest term, which corresponds to the shortest program that prints a sequence beginning with $x_1 x_2 \ldots x_n$.

1. The simplest program to print a sequence that starts with 0s is "Print 0s forever." This program has some constant length $c_1$, independent of $n$, so the probability of strings starting with $n$ zeros is bounded below by the probability of the program that prints 0s forever. Hence

$P(0^n \ldots) \ge 2^{-c_1}.$   (8.2)

2. Just as in part 1, there is a short program (of some constant length $c_2$) to print the bits of $\pi$, so

$P(\pi_1 \pi_2 \ldots \pi_n \ldots) \ge 2^{-c_2}.$   (8.3)

3. A program to print out $n$ 0s followed by a 1 must in general specify $n$. Since most integers $n$ have a complexity $K(n) \approx \log n$, and given $n$, the program to print out $0^n 1$ is simple, we have

$P(0^n 1 \ldots) \approx 2^{-\log n - c_3} = \frac{2^{-c_3}}{n}.$   (8.4)


4. We know that the bits of $\Omega$ are essentially incompressible, i.e., their complexity satisfies $K(\omega_1 \omega_2 \ldots \omega_n) \ge n - c$. Hence, the shortest program to print out $n$ bits of $\Omega$ followed by anything must have a length of at least $n - c$, and hence

$P(\omega_1 \omega_2 \ldots \omega_n \ldots) \approx 2^{-n}.$   (8.5)
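As a rough numerical illustration (not part of the original solution), the sketch below evaluates the four bounds above, assuming a purely illustrative program-overhead constant of $c = 100$ bits and prefix length $n = 1000$:

    import math

    c = 100   # assumed constant length of a short "print ... forever" program (illustrative only)
    n = 1000  # length of the required prefix

    p_zeros = 2.0 ** (-c)              # part 1: does not depend on n
    p_pi    = 2.0 ** (-c)              # part 2: same form as part 1
    p_0n1   = 2.0 ** (-c) / n          # part 3: the program must also specify n
    p_omega = 2.0 ** (-(n + c))        # part 4: bits of Omega are incompressible
    print(p_zeros, p_pi, p_0n1, p_omega)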

Problem 7.6 Do computers reduce entropy? Let $X = \mathcal{U}(P)$, where $P$ is a Bernoulli(1/2) sequence. Here the binary sequence $X$ is either undefined or is in $\{0, 1\}^*$. Let $H(X)$ be the Shannon entropy of $X$. Argue that $H(X) = \infty$. Thus although the computer turns nonsense into sense, the output entropy is still infinite.

Solution 7.6 Do computers reduce entropy? The output probability distribution on strings $x$ is $P_{\mathcal{U}}(x)$, the universal probability of the string $x$. Thus, by the arguments following equation (7.65) in the text, the output distribution includes a mixture of all computable probability distributions. Consider the following distribution on binary finite-length strings:

$P_1(x) = \begin{cases} \dfrac{A}{k (\log k)^2} & \text{if } x = \underbrace{11\ldots1}_{k \text{ 1s}}, \; k \ge 2, \\ 0 & \text{otherwise,} \end{cases}$   (8.6)

where the constant $A$ is chosen to ensure that $\sum_x P_1(x) = 1$. Then $P_1$ is a computable probability distribution, and by Problem 9 in Chapter 2, it has an infinite entropy.

By (7.65) in the text, $P_{\mathcal{U}}(x) \ge c_0 P_1(x)$ for some constant $c_0 > 0$ that does not depend on $x$. Let

$Q(x) = \dfrac{P_{\mathcal{U}}(x) - c_0 P_1(x)}{1 - c_0}.$   (8.7)

It is easy to see that $Q(x) \ge 0$ and $\sum_x Q(x) = 1$, and therefore $Q$ is a probability distribution. Also,

$P_{\mathcal{U}}(x) = c_0 P_1(x) + (1 - c_0) Q(x),$   (8.8)

$H(X) = -\sum_x P_{\mathcal{U}}(x) \log P_{\mathcal{U}}(x).$   (8.9)

By the results of Chapter 2, $-t \log t$ is a concave function of $t$, and therefore

$-P_{\mathcal{U}}(x) \log P_{\mathcal{U}}(x) \ge -c_0 P_1(x) \log P_1(x) - (1 - c_0)\, Q(x) \log Q(x).$   (8.10)

Summing this over all $x$, we obtain

$H(X) \ge c_0 H(P_1) + (1 - c_0) H(Q) \ge c_0 H(P_1) = \infty.$   (8.11)

Thus the entropy at the output of a universal computer fed in Bernoulli(1/2) sequences is infinite.
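The following small numerical check (not part of the original solution) illustrates why $P_1$ has infinite entropy: the mass series $\sum_k 1/(k (\log k)^2)$ converges, while $\sum_k 1/(k \log k)$, which lower bounds the entropy terms $-P_1 \log P_1$ up to a constant factor, keeps growing:

    import math

    # Partial sums: the first (the normalizing mass) converges; the second, which lower
    # bounds the entropy sum of P1 up to a constant factor, grows without bound.
    for K in (10**3, 10**4, 10**5, 10**6):
        mass = sum(1.0 / (k * math.log(k) ** 2) for k in range(2, K))
        entropy_like = sum(1.0 / (k * math.log(k)) for k in range(2, K))
        print(K, round(mass, 4), round(entropy_like, 4))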

Problem 7.9 Random program. Suppose that a random program (symbols i.i.d. uniform over the symbol set) is fed into the nearest available computer. To our surprise the first $n$ bits of the binary expansion of $\sqrt{2}$ are printed out. Roughly what would you say the probability is that the next output bit will agree with the corresponding bit in the expansion of $\sqrt{2}$?

Solution 7.9 Random program. The arguments parallel the argument in Section 7.10, and we will not repeat them. Thus the probability that the next bit printed out will be the next bit of the binary expansion of $\sqrt{2}$ is approximately 1.

Problem 8.8 Cascade of Binary Symmetric Channels. Show that a cascade of $n$ identical binary symmetric channels,

$X_0 \to \mathrm{BSC}\ \#1 \to X_1 \to \cdots \to X_{n-1} \to \mathrm{BSC}\ \#n \to X_n,$

each with raw error probability $p$, is equivalent to a single BSC with error probability $\tfrac{1}{2}\left(1 - (1 - 2p)^n\right)$, and hence that $\lim_{n \to \infty} I(X_0; X_n) = 0$ if $p \neq 0, 1$. No encoding or decoding takes place at the intermediate terminals $X_1, \ldots, X_{n-1}$. Thus the capacity of the cascade tends to zero.


Solution 8.8 Cascade of binary symmetric channels. There are many ways to solve this problem. One way is to use the singular value decomposition of the transition probability matrix for a single BSC. Let

$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$

be the transition probability matrix for our BSC. Then the transition probability matrix for the cascade of $n$ of these BSCs is given by $P_n = P^n$. Now check that

$P = Q \Lambda Q^{-1},$

where

$Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 1-2p \end{pmatrix}.$

Using this we have

$P_n = P^n = Q \Lambda^n Q^{-1} = \begin{pmatrix} \frac{1}{2}\left(1 + (1-2p)^n\right) & \frac{1}{2}\left(1 - (1-2p)^n\right) \\ \frac{1}{2}\left(1 - (1-2p)^n\right) & \frac{1}{2}\left(1 + (1-2p)^n\right) \end{pmatrix}.$

From this we see that the cascade of $n$ BSCs is also a BSC with probability of error

$p_n = \frac{1 - (1-2p)^n}{2}.$

The matrix $Q$ is simply the matrix of eigenvectors of $P$. This problem can also be solved by induction on $n$. Probably the simplest way to solve the problem is to note that the probability of error for the cascade channel is simply the probability of an odd number of bit flips across the $n$ component channels. This is the sum of the odd terms of the binomial expansion of $(q + p)^n$ with $q = 1 - p$, and can be written as

$\sum_{k \text{ odd}} \binom{n}{k} p^k q^{n-k} = \frac{(q+p)^n - (q-p)^n}{2} = \frac{1 - (1-2p)^n}{2}.$
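A quick numerical check (not part of the original solution) that the $n$-fold matrix power of the BSC transition matrix agrees with the claimed cascade error probability, for illustrative values of $p$ and $n$:

    import numpy as np

    p, n = 0.1, 5                              # illustrative crossover probability and cascade length
    P = np.array([[1 - p, p], [p, 1 - p]])     # single BSC transition matrix
    cascade = np.linalg.matrix_power(P, n)     # transition matrix of the cascade
    p_n = (1 - (1 - 2 * p) ** n) / 2           # claimed cascade error probability
    print(cascade)
    print(p_n)                                 # matches the off-diagonal entries of `cascade`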

Problem 8.10 Suboptimal codes. For the Z channel of the previous problem, assume that we choose a code at random, where each codeword is a sequence of fair coin tosses. This will not achieve capacity. Find the maximum rate $R$ such that the probability of error $P_e^{(n)}$, averaged over the randomly generated codes, tends to zero as the block length $n$ tends to infinity.

Solution 8.10 Suboptimal codes. From the proof of the channel coding theorem, it follows that using a random code with codewords generated according to a probability distribution $p(x)$, we can send information at any rate up to the $I(X;Y)$ corresponding to that $p(x)$ with an arbitrarily low probability of error. For the Z channel described in the previous problem, we can calculate $I(X;Y)$ for a uniform distribution on the input. The distribution on $Y$ is $(3/4, 1/4)$, and therefore

$R_{\max} = I(X;Y) = H(Y) - H(Y \mid X) = H\!\left(\tfrac{1}{4}\right) - \tfrac{1}{2} = 0.8113 - 0.5 = 0.3113 \text{ bits}.$   (8.12)
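A short numerical check (not part of the original handout) of the rate computed in (8.12):

    from math import log2

    def H(p):                     # binary entropy in bits
        return -p * log2(p) - (1 - p) * log2(1 - p)

    # Z channel with uniform input: Y ~ (3/4, 1/4) and H(Y|X) = (1/2) * H(1/2) = 1/2
    print(H(0.25) - 0.5)          # ~0.3113 bits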
Problem 8.12 Time-varying channels. Consider a time-varying discrete memoryless channel. Let $Y_1, Y_2, \ldots, Y_n$ be conditionally independent given $X_1, X_2, \ldots, X_n$, with conditional distribution given by $p(y \mid x) = \prod_{i=1}^{n} p_i(y_i \mid x_i)$, where the $i$th channel is a binary symmetric channel with crossover probability $p_i$:

$p_i(y_i \mid x_i): \qquad \begin{array}{c|cc} & y_i = 0 & y_i = 1 \\ \hline x_i = 0 & 1 - p_i & p_i \\ x_i = 1 & p_i & 1 - p_i \end{array}$

Let $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$. Find $\max_{p(x)} I(X; Y)$.

Solution 8.12 Time-varying channels. We can use the same chain of inequalities as in the proof of the converse to the channel coding theorem. Hence

$I(X; Y) = H(Y) - H(Y \mid X)$   (8.13)

$= H(Y) - \sum_{i=1}^{n} H(Y_i \mid Y_1, \ldots, Y_{i-1}, X)$   (8.14)

$= H(Y) - \sum_{i=1}^{n} H(Y_i \mid X_i),$   (8.15)

since by the definition of the channel, $Y_i$ depends only on $X_i$ and is conditionally independent of everything else. Continuing the series of inequalities, we have

$I(X; Y) = H(Y) - \sum_{i=1}^{n} H(Y_i \mid X_i)$   (8.16)

$\le \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(p_i)$   (8.17)

$\le \sum_{i=1}^{n} \left(1 - H(p_i)\right),$   (8.18)

with equality if $X_1, X_2, \ldots, X_n$ is chosen i.i.d. $\sim$ Bern(1/2). Hence

$\max_{p(x)} I(X; Y) = \sum_{i=1}^{n} \left(1 - H(p_i)\right).$   (8.19)
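A small numerical evaluation (not from the original handout) of (8.19) for some illustrative crossover probabilities $p_i$:

    from math import log2

    def H(p):                                # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    p_list = [0.01, 0.1, 0.25, 0.5]          # illustrative crossover probabilities p_1, ..., p_4
    print(sum(1 - H(p) for p in p_list))     # max I(X;Y) over the n = 4 channel uses, in bits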

Problem 9.1 Differential entropy. Evaluate the differential entropy $h(X) = -\int f \ln f$ for the following:

1. The exponential density, $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$.

2. The Laplace density, $f(x) = \frac{1}{2} \lambda e^{-\lambda |x|}$.

3. The sum of $X_1$ and $X_2$, where $X_1$ and $X_2$ are independent normal random variables with means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$.


Solution 9.1 Differential Entropy.

1. Exponential distribution.

$h(X) = -\int_0^\infty \lambda e^{-\lambda x} \left[\ln \lambda - \lambda x\right] dx$   (8.20)

$= -\ln \lambda + 1$   (8.21)

$= \ln \frac{e}{\lambda}$ nats $= \log_2 \frac{e}{\lambda}$ bits.   (8.22)

2. Laplace density.

$h(X) = -\int_{-\infty}^{\infty} \tfrac{1}{2} \lambda e^{-\lambda |x|} \left[\ln \tfrac{1}{2} + \ln \lambda - \lambda |x|\right] dx$   (8.23)

$= -\ln \tfrac{1}{2} - \ln \lambda + 1$   (8.24)

$= \ln \frac{2e}{\lambda}$ nats   (8.25)

$= \log_2 \frac{2e}{\lambda}$ bits.   (8.26)

3. Sum of two normal distributions. The sum of two independent normal random variables is also normal, so applying the result derived in class for the normal distribution, since $X_1 + X_2 \sim \mathcal{N}(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$,

$h(X_1 + X_2) = \frac{1}{2} \log_2 2\pi e \left(\sigma_1^2 + \sigma_2^2\right)$ bits.   (8.27)
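A quick numerical sanity check (not part of the original handout) of the exponential-density formula in (8.22), using a simple Riemann sum with an illustrative rate $\lambda = 2$:

    import math

    lam = 2.0                                  # illustrative rate parameter
    dx, x, h_num = 1e-4, 1e-4 / 2, 0.0
    while x < 20.0:                            # the tail beyond x = 20 is negligible for lam = 2
        f = lam * math.exp(-lam * x)
        h_num -= f * math.log(f) * dx          # Riemann sum for -integral of f ln f
        x += dx
    print(h_num, 1 - math.log(lam))            # both ~ ln(e/lam) nats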

Problem 10.2 A channel with two independent looks at Y. Let $Y_1$ and $Y_2$ be conditionally independent and conditionally identically distributed given $X$.

1. Show $I(X; Y_1, Y_2) = 2 I(X; Y_1) - I(Y_1; Y_2)$.

2. Conclude that the capacity of the channel $X \to (Y_1, Y_2)$ is less than twice the capacity of the channel $X \to Y_1$.

Solution 10.2 A channel with two independent looks at Y.

1.

$I(X; Y_1, Y_2) = H(Y_1, Y_2) - H(Y_1, Y_2 \mid X)$   (8.28)

$= H(Y_1, Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X)$   (8.29)

(since $Y_1$ and $Y_2$ are conditionally independent given $X$)

$= H(Y_1) + H(Y_2) - I(Y_1; Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X)$   (8.30)

$= I(X; Y_1) + I(X; Y_2) - I(Y_1; Y_2)$   (8.31)

$= 2 I(X; Y_1) - I(Y_1; Y_2)$   (8.32)

(since $Y_1$ and $Y_2$ are conditionally identically distributed given $X$, so $I(X; Y_1) = I(X; Y_2)$).


2. The capacity of the single-look channel $X \to Y_1$ is

$C_1 = \max_{p(x)} I(X; Y_1).$   (8.33)

The capacity of the channel $X \to (Y_1, Y_2)$ is

$C_2 = \max_{p(x)} I(X; Y_1, Y_2)$   (8.34)

$= \max_{p(x)} \left[\, 2 I(X; Y_1) - I(Y_1; Y_2) \,\right]$   (8.35)

$\le \max_{p(x)} 2 I(X; Y_1)$   (8.36)

$= 2 C_1.$   (8.37)

Hence, two independent looks cannot be more than twice as good as one look.
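The identity of part 1 is easy to verify numerically; the following sketch (not part of the original handout) draws a random input distribution and a random per-look channel, forms two conditionally i.i.d. looks, and compares both sides of (8.32):

    import numpy as np

    rng = np.random.default_rng(0)
    px = rng.dirichlet(np.ones(3))             # random input distribution on a 3-symbol alphabet
    W = rng.dirichlet(np.ones(4), size=3)      # random per-look channel p(y|x) with 4 outputs

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    pxy1y2 = px[:, None, None] * W[:, :, None] * W[:, None, :]  # p(x, y1, y2), looks cond. i.i.d. given X
    py1y2 = pxy1y2.sum(axis=0)
    pxy1 = pxy1y2.sum(axis=2)
    py1 = pxy1.sum(axis=0)

    I_x_y1 = H(px) + H(py1) - H(pxy1.ravel())                   # I(X; Y1)
    I_x_y1y2 = H(px) + H(py1y2.ravel()) - H(pxy1y2.ravel())     # I(X; Y1, Y2)
    I_y1_y2 = 2 * H(py1) - H(py1y2.ravel())                     # I(Y1; Y2) (Y1 and Y2 share a marginal)
    print(I_x_y1y2, 2 * I_x_y1 - I_y1_y2)                       # the two values agree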

Other Problems

Problem 1: (The Ternary Confusion Channel): Consider a discrete memoryless channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, where $\mathcal{X} = \{0, 1, 2\}$ and $\mathcal{Y} = \{0, 1\}$. The stochastic matrix is given by $p(y \mid x) = 1$ if $x = y = 0$ or if $x = y = 1$, and $p(y \mid x) = 1/2$ if $x = 2, y = 0$ or if $x = 2, y = 1$. Compute the capacity of this channel, and determine the maximizing mass function over the input alphabet.

Solution 1:

It is clear that if we never use the symbol 2, we will obtain a capacity of 1 bit, since there is no confusion between input and output in this case. Therefore, with $p(0) = p(1) = 1/2$ and $p(2) = 0$, we get a capacity of 1. The question becomes: suppose we have some non-zero probability for the symbol 2, can the capacity increase? No. The reason is seen by looking at the mutual information. Let $p_0 = p(0)$, $p_1 = p(1)$, and $p_2 = p(2)$. Then

$I(X; Y) = H(Y) - H(Y \mid X) = H(Y) - p_2 H(1/2) = H(Y) - p_2 \le 1 - p_2.$

Setting $p_2 = 0$ and $p_0 = p_1 = 1/2$ gives us the capacity of 1 mentioned above. Clearly, the maximum that $H(Y)$ can be is 1, so any $p_2 > 0$ will reduce the capacity from (at most) 1 to something smaller.
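A brief numerical check (not part of the original handout) that mass placed on the confusable symbol 2 only lowers the mutual information:

    import numpy as np

    def I_xy(p0, p1, p2):
        # ternary confusion channel: 0 -> 0 and 1 -> 1 with certainty, 2 -> 0 or 1 with prob 1/2 each
        W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
        px = np.array([p0, p1, p2])
        py = px @ W
        joint = px[:, None] * W
        nz = joint > 0
        return float((joint[nz] * np.log2(joint[nz] / (px[:, None] * py[None, :])[nz])).sum())

    print(I_xy(0.5, 0.5, 0.0))    # 1.0 bit: the capacity, achieved with p2 = 0
    print(I_xy(0.4, 0.4, 0.2))    # 0.8 bits: equals H(Y) - p2, so any p2 > 0 strictly hurts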

Problem 2: In class and above, we defined the amazing, incredible, and unknowable number $\Omega$. In this problem, you are to choose a normally unsolvable problem (it can be one from mathematics, or even any general world problem you wish you could solve), and show that if you have $\Omega$ available to you, you can compute a (guaranteed halting) solution to this problem. Be as precise as possible, in that you give an explicit algorithm for how, when $\Omega$ is given, you can compute an answer to your problem. Argue why your algorithm is correct, and why the problem you chose is normally unsolvable.

Solution 2: This was a fun problem that everyone did well on. Clearly, there are an enormous number of problems one can solve given $\Omega$. All such solutions take the form of a program that enumerates and tests all possible solutions, and halts if a solution is found. Given $\Omega$, we can decide whether that program halts.
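As one concrete illustration (a sketch in the spirit of the submitted solutions, not any particular student's answer), take Goldbach's conjecture as the normally unsolvable target. The searcher below halts if and only if a counterexample exists, so deciding whether it halts, which knowledge of $\Omega$ makes possible via dovetailing, settles the conjecture:

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def goldbach_searcher():
        # Halts (returning a counterexample) iff some even number >= 4 is not a sum of two primes.
        n = 4
        while True:
            if not any(is_prime(p) and is_prime(n - p) for p in range(2, n)):
                return n
            n += 2

    # Given enough bits of Omega, one dovetails all programs, accumulating 2^(-l(p)) for each
    # program p that halts, until the accumulated sum exceeds the known prefix of Omega. At that
    # point every sufficiently short program that will ever halt has already halted, so we can
    # tell whether (a suitable encoding of) goldbach_searcher is among them, answering the question.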
