
2.

INFORMATION THEORY

"Measure what is measurable, and make measurable what is not


so.
-Galileo

The outcomes of this chapter are to enable the reader:

To define the information content of a message.
To emphasize the need for a measure of information and for the average information content of a source.
To develop a mathematical model to measure the information content of a message.
To detail the properties of entropy.
To develop a Markoff model to measure the entropy of a dependent source.

2.1 Introduction
Today we are living in the era of information. With modern communication being one of
the key areas of interest, efficient exchange of information is the need of the day.
Until 1948, when Claude E. Shannon published his pioneering work "A Mathematical Theory
of Communication" in the Bell System Technical Journal, information was considered to be an
abstract entity that could not be measured. Shannon provided a mathematical model for
measuring information.

2.2 Measure of Information


An information source generates information, but the amount of information
conveyed by the source differs from symbol to symbol. A mathematical model
can be formulated to quantify the amount of information conveyed by a symbol.
Consider the following statements.

(A) It rained heavily in Cherrapunji yesterday.

(B) There was heavy rainfall in Rajasthan last night.

Although both statements report the occurrence of heavy rainfall, the amounts of
information conveyed by them are different. The first statement does not create any surprise,
as heavy rainfall is a common event in Cherrapunji. However, the second statement will
certainly create some degree of surprise, as heavy rainfall in Rajasthan is much less likely.
From the above example, it can be concluded that the amount of information
conveyed by a message is inversely proportional to the probability of its occurrence.
Therefore, the amount of information conveyed by message k, or its self-information $I_k$, is inversely proportional to its probability of occurrence $p_k$:

$I_k \propto \frac{1}{p_k}$    (2.1)
Also, there are some other facts about information:

(i) The information conveyed by a message cannot be negative; it has to be at least zero.

$I_k \geq 0$    (2.2)

(ii) If the event is certain, i.e. if the probability of occurrence of the event is 1, then the corresponding information conveyed by the event is 0.

If $p_k = 1$, then $I_k = 0$    (2.3)

(iii) The information conveyed by a composite statement made up of independent messages is simply the sum of the individual self-information contents.

$I(m_1, m_2) = I(m_1) + I(m_2)$    (2.4)

The only mathematical function that satisfies relations (2.2), (2.3) and (2.4) is the
LOGARITHM. Therefore the self-information content of a message is given by

$I_k = \log_r\left(\frac{1}{p_k}\right)$ units    (2.5)

where $p_k$ is the probability of occurrence of the kth message.


The unit of self-information depends on the base r of the logarithm used.
If r = 2, the unit is bits
If r = e, the unit is nats
If r = 10, the unit is Hartleys or decits
In our entire discussion, unless otherwise specified, the unit of self-information is
taken to be bits.
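As a quick numerical illustration of these units, the short MATLAB sketch below (our own, using an assumed example probability p = 0.25 that is not taken from the text) evaluates the self-information of one symbol in bits, nats and Hartleys; the three values differ only by constant conversion factors.

% Self-information of a symbol with probability p in three units.
% p = 0.25 is an assumed example value, not taken from the text.
p  = 0.25;
Ib = log2(1/p);          % bits     (base-2 logarithm)
In = log(1/p);           % nats     (natural logarithm)
Ih = log10(1/p);         % Hartleys/decits (base-10 logarithm)
fprintf('I = %.4f bits = %.4f nats = %.4f Hartleys\n', Ib, In, Ih);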

Example 2.1 Consider a binary system emitting two symbols {0,1} with probabilities 0.6
and 0.4 respectively. Find the information conveyed by a bit 0 and a bit 1.
Solution:
The self-information content of a symbol is given by

$I_k = \log_2\left(\frac{1}{p_k}\right)$ bits

The probability of occurrence of 0 at the source output is P(0) = 0.6, and the probability
of occurrence of 1 at the source output is P(1) = 0.4.

$I_0 = \log_2\left(\frac{1}{0.6}\right) = 0.7368$ bits

Similarly,

$I_1 = \log_2\left(\frac{1}{0.4}\right) = 1.3219$ bits

Example 2.2 Consider a source emitting two symbols s0 and s1 with probabilities 3/4 and 1/4 respectively. Find the self-information of the symbols in
(i) bits
(ii) decits
(iii) nats

Solution:

(i) $I_k = \log_2\left(\frac{1}{p_k}\right)$ bits

$I_{s0} = \log_2\left(\frac{1}{3/4}\right) = \log_2\left(\frac{4}{3}\right) = 0.415$ bits

$I_{s1} = \log_2\left(\frac{1}{1/4}\right) = \log_2 4 = 2$ bits

(ii) $I_k = \log_{10}\left(\frac{1}{p_k}\right)$ decits

$I_{s0} = \log_{10}\left(\frac{4}{3}\right) = 0.124$ decits

$I_{s1} = \log_{10} 4 = 0.602$ decits

(iii) $I_k = \ln\left(\frac{1}{p_k}\right)$ nats

$I_{s0} = \ln\left(\frac{4}{3}\right) = 0.287$ nats

$I_{s1} = \ln 4 = 1.386$ nats

Claude Elwood Shannon (April 30, 1916 to February 24,
2001), the Father of Information Theory, was an American
engineer and mathematician. At the age of 21, while doing his
master's at MIT, he wrote a thesis showing how Boolean algebra
could be applied to the design of electromechanical switching
circuits. He also contributed significantly to the field of
cryptanalysis during World War II. He held two bachelor's
degrees, one in electrical engineering and the other in
mathematics. His paper "A Mathematical Theory of Communication",
published in two parts in the Bell System Technical Journal, is
considered one of the most influential works in the field of
information theory.

Example 2.3 A pair of dice is rolled simultaneously. The outcome of the first die is
considered to be x and that of the second as y. Three events are defined as

P= {(x,y) such that (x+y) is exactly divisible by 3}


Q= {(x,y) such that (x+y) is an even number}
R= {(x,y) such that (x+y)=7}

Which event conveys more information? Justify your answer with mathematical
calculation.

Solution:
To find which event conveys more information, we need to find the probabilities of the
events. The less probable the event, the more information it conveys.
To find the probabilities, list the sample space S and the events P, Q and R.
S

= {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)


(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)

(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)


(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
P

= {(1,2), (1,5), (2,1), (2,4), (3,3), (3,6), (4,2), (4,5), (5,1), (5,4), (6,3), (6,6)}

Q = {(1,1), (1,3), (1,5), (2,2), (2,4), (2,6), (3,1), (3,3), (3,5), (4,2), (4,4), (4,6), (5,1), (5,3),
(5,5), (6,2), (6,4), (6,6)}
R

= {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}

$P(P) = \frac{12}{36}$

$P(Q) = \frac{18}{36}$

$P(R) = \frac{6}{36}$

As the probability of occurrence of event R is the least, it conveys the most information. This
statement can be justified by considering the self-information of the events P, Q and R.

$I_k = \log_2\left(\frac{1}{p_k}\right)$ bits

$I_P = \log_2\left(\frac{36}{12}\right) = 1.584$ bits

Similarly,

$I_Q = \log_2\left(\frac{36}{18}\right) = 1$ bit

$I_R = \log_2\left(\frac{36}{6}\right) = 2.584$ bits

As $I_R > I_P$ and $I_R > I_Q$, event R conveys the most information.
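The probabilities and self-information values in this example can also be checked by brute-force enumeration of the 36 outcomes. The following MATLAB sketch is our own (the variable names are arbitrary):

% Enumerate the 36 equally likely outcomes of rolling two dice and
% compute the probabilities and self-information of events P, Q and R.
[x, y] = meshgrid(1:6, 1:6);        % all ordered pairs (x, y)
s = x + y;                          % sum of the two faces
pP = mean(mod(s(:), 3) == 0);       % (x+y) divisible by 3
pQ = mean(mod(s(:), 2) == 0);       % (x+y) even
pR = mean(s(:) == 7);               % (x+y) = 7
I  = log2(1 ./ [pP pQ pR]);         % self-information in bits
fprintf('I_P = %.3f, I_Q = %.3f, I_R = %.3f bits\n', I(1), I(2), I(3));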

2.3 Average Information Content (Entropy) of a Zero Memory Source


A zero memory source, or Discrete Memoryless Source, is one in which the
emission of the current symbol does not depend on the emission of previous symbols.
Consider a source emitting symbols
S = {s1, s2, s3, . . ., sN} with respective probabilities
P = {p1, p2, p3, . . ., pN}
Now consider a long message of length L emitted by the source. On average it contains
p1L symbols of s1,
p2L symbols of s2,
...
pNL symbols of sN.

The self-information of s1 is given by

$I_{s1} = \log_2\left(\frac{1}{p_1}\right)$ bits

In other words, each occurrence of s1 conveys an information of $\log_2(1/p_1)$ bits, and on
average $p_1 L$ such s1 symbols are present in a message of L symbols.

Therefore the total information conveyed by symbols of type s1 is $p_1 L \log_2\left(\frac{1}{p_1}\right)$ bits.

Similarly, the total information conveyed by symbols of type s2 is $p_2 L \log_2\left(\frac{1}{p_2}\right)$ bits.
...
The total information conveyed by symbols of type sN is $p_N L \log_2\left(\frac{1}{p_N}\right)$ bits.

Therefore the total information conveyed by the source is simply the sum of all these
information contents:

$I_{TOTAL} = p_1 L \log_2\left(\frac{1}{p_1}\right) + p_2 L \log_2\left(\frac{1}{p_2}\right) + \ldots + p_N L \log_2\left(\frac{1}{p_N}\right)$

The average information conveyed by the source per emitted symbol is denoted by its entropy H(S), which is given by

$H(S) = \frac{I_{TOTAL}}{L} = \left[p_1 L \log_2\left(\frac{1}{p_1}\right) + p_2 L \log_2\left(\frac{1}{p_2}\right) + \ldots + p_N L \log_2\left(\frac{1}{p_N}\right)\right]/L$

$= p_1 \log_2\left(\frac{1}{p_1}\right) + p_2 \log_2\left(\frac{1}{p_2}\right) + \ldots + p_N \log_2\left(\frac{1}{p_N}\right)$

$H(S) = \sum_{k=1}^{N} p_k \log_2\left(\frac{1}{p_k}\right)$ bits/sym    (2.6)
Equation (2.6) gives the expression for the average information content of a source S.

The average rate of information can also be defined for an information system if its
symbol rate (baud rate) is known. If the symbol rate of the system is rs sym/sec, then the
average rate of information is given by

$R_S = H(S) \cdot r_s$ bits/sec    (2.7)

Example 2.4 A discrete memoryless source emits one of five symbols every 2 ms. The symbol
probabilities are {0.5, 0.25, 0.125, 0.0625, 0.0625}. Find the average information rate of
the source.

Solution:

The entropy, or average information content, of the source is given by

$H(S) = \sum_{k=1}^{5} p_k \log_2\left(\frac{1}{p_k}\right)$ bits/sym

Substituting the given probabilities,

$H(S) = 0.5\log_2\frac{1}{0.5} + 0.25\log_2\frac{1}{0.25} + 0.125\log_2\frac{1}{0.125} + 0.0625\log_2\frac{1}{0.0625} + 0.0625\log_2\frac{1}{0.0625}$

= 1.875 bits/sym

The average rate of information is given by

$R_S = H(S) \cdot r_s$ bits/sec

where rs is the symbol rate. In the problem it is given that a symbol is emitted every 2 ms.

$r_s = \frac{1}{2\times10^{-3}} = 500$ sym/sec

$R_S = 1.875 \times 500 = 937.5$ bits/sec
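The same steps can be scripted. The MATLAB sketch below reproduces Example 2.4; the probability vector and the 2 ms symbol interval come from the example, while the rest is our own scaffolding:

% Entropy and average information rate of a discrete memoryless source.
p  = [0.5 0.25 0.125 0.0625 0.0625];   % symbol probabilities (Example 2.4)
rs = 1/(2e-3);                         % one symbol every 2 ms -> 500 sym/s
H  = sum(p .* log2(1 ./ p));           % H(S) = sum pk*log2(1/pk), Eq. (2.6)
Rs = H * rs;                           % average information rate, Eq. (2.7)
fprintf('H(S) = %.4f bits/sym, Rs = %.1f bits/s\n', H, Rs);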

Example 2.5 A discrete source emits one of the following 5 symbols in every 1s. The
symbol probabilities are {1/4, 1/8, 1/8, 3/16, 5/16}. Find the average information content
of the source in nats/sym and Hartley/sym.

Solution:
The average information content of the source is nothing but the source entropy, which is
given by

$H(S) = \sum_{k=1}^{5} p_k \ln\left(\frac{1}{p_k}\right)$ nats/sym

$= \frac{1}{4}\ln 4 + \frac{1}{8}\ln 8 + \frac{1}{8}\ln 8 + \frac{3}{16}\ln\frac{16}{3} + \frac{5}{16}\ln\frac{16}{5}$

= 1.543 nats/sym

Similarly, H(S) in Hartleys/sym is given by

$H(S) = \sum_{k=1}^{5} p_k \log_{10}\left(\frac{1}{p_k}\right)$ Hartleys/sym

$= \frac{1}{4}\log_{10} 4 + \frac{1}{8}\log_{10} 8 + \frac{1}{8}\log_{10} 8 + \frac{3}{16}\log_{10}\frac{16}{3} + \frac{5}{16}\log_{10}\frac{16}{5}$

= 0.6704 Hartleys/sym

Example 2.6 The output of an information source contains 160 symbols, 128 of which
occur with a probability of 1/256 each and the remaining with a probability of 1/64 each. Find
the average information rate of the source if the source emits 10,000 sym/sec.

Solution:
The source entropy is given by

$H(S) = \sum_{k=1}^{160} p_k \log_2\left(\frac{1}{p_k}\right)$

$= 128 \times \frac{1}{256}\log_2 256 + 32 \times \frac{1}{64}\log_2 64$

= 7 bits/sym

The average rate of information is given by

$R_S = H(S) \cdot r_s$ bits/sec

where rs is the symbol rate, given as 10,000 sym/sec in the problem.

$R_S = 7 \times 10000 = 70,000$ bits/sec

Example 2.7 The international Morse code uses a sequence of dots and dashes to transmit
the letters of the English alphabet. A dash is represented by a current pulse of duration 2 ms
and a dot by one of 1 ms. The probability of a dash is half that of a dot. A gap of 1 ms is left
between symbols. Calculate

(i) the self-information of a dot and of a dash,

(ii) the average information content of the dot-dash code,

(iii) the average rate of information.

Solution:
Let Pdot and Pdash be the probabilities of a dot and a dash respectively.

Given $P_{dash} = \frac{1}{2}P_{dot}$

Also $P_{dot} + P_{dash} = 1$

$P_{dot} + \frac{1}{2}P_{dot} = 1$

$P_{dot} = \frac{2}{3}$ and therefore $P_{dash} = \frac{1}{2}P_{dot} = \frac{1}{3}$

(i) $I_{dot} = \log_2\left(\frac{1}{P_{dot}}\right) = \log_2\left(\frac{3}{2}\right) = 0.5849$ bits

$I_{dash} = \log_2\left(\frac{1}{P_{dash}}\right) = \log_2 3 = 1.5849$ bits

(ii) $H(S) = \sum_{k=1}^{2} p_k \log_2\left(\frac{1}{p_k}\right) = P_{dot}\log_2\frac{1}{P_{dot}} + P_{dash}\log_2\frac{1}{P_{dash}}$

= 0.9182 bits/sym

(iii) From the probabilities of dot and dash, it is clear that for every three symbols
transmitted there will be, on average, one dash and two dots. The duration of a dash is
2 ms, that of a dot is 1 ms, and a 1 ms gap follows each symbol:

dot (1 ms) + gap (1 ms) + dot (1 ms) + gap (1 ms) + dash (2 ms) + gap (1 ms)

Therefore a total of 7 ms is required to transmit 3 symbols, and the symbol rate is

$r_s = \frac{3}{7\times10^{-3}} = \frac{3000}{7}$ sym/sec

$R_s = H(S) \cdot r_s = 0.9182 \times \frac{3000}{7} = 393.51$ bps
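The timing argument in part (iii) can also be phrased as an average symbol duration: each symbol lasts on average (2/3)(1 ms) + (1/3)(2 ms) plus the 1 ms gap, i.e. 7/3 ms, which gives the same 3000/7 sym/s. A short MATLAB check (our own sketch):

% Example 2.7 re-worked numerically: dot-dash statistics and rate.
pdot  = 2/3;  pdash = 1/3;                 % symbol probabilities
Idot  = log2(1/pdot);                      % 0.5849 bits
Idash = log2(1/pdash);                     % 1.5849 bits
H     = pdot*Idot + pdash*Idash;           % 0.9182 bits/sym
Tavg  = pdot*1e-3 + pdash*2e-3 + 1e-3;     % mean symbol duration + 1 ms gap = 7/3 ms
rs    = 1/Tavg;                            % 3000/7 ~ 428.6 sym/s
Rs    = H*rs;                              % ~ 393.5 bits/s
fprintf('H = %.4f bits/sym, rs = %.2f sym/s, Rs = %.2f bits/s\n', H, rs, Rs);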

2.3.1 Properties of Entropy


The following are the properties of entropy:

Entropy is a continuous function of probability

Entropy is a symmetric function of its arguments. In other words, the entropy of a


system is the same irrespective of the order in which the symbols are arranged in the
list.

Upper bound on Entropy:

In order to establish the condition for the upper bound on entropy, consider the following
two scenarios for the declaration of an election result:

(A) Two equally strong candidates have contested in a constituency.

(B) One contestant is very strong and the other is least expected to win.

Of the two scenarios, the result in the first case gives maximum information, as the
winning probabilities of both candidates are the same. In the second case, as one of the
candidates is very strong and expected to win, the election result conveys very little
information.
In other words, the entropy of a source is maximum when all its symbol probabilities
are equal. Let us derive the expression for the maximum entropy of a source.
Consider the term

$\log_2 N - H(S) = \log_2 N - \sum_{k=1}^{N} p_k \log_2\left(\frac{1}{p_k}\right)$    (2.8)

The sum of all symbol probabilities emitted by a source is unity, i.e.

$\sum_{k=1}^{N} p_k = 1$

Multiplying the first term on the RHS of equation (2.8) by this sum does not alter the
equality.

$\log_2 N - H(S) = \sum_{k=1}^{N} p_k \log_2 N - \sum_{k=1}^{N} p_k \log_2\left(\frac{1}{p_k}\right)$

$= \sum_{k=1}^{N} p_k \left[\log_2 N - \log_2\left(\frac{1}{p_k}\right)\right]$

$= \sum_{k=1}^{N} p_k \log_2(N p_k)$

$\log_2 N - H(S) = \log_2 e \sum_{k=1}^{N} p_k \ln(N p_k)$    (2.9)

From mathematics we have the following relation for logarithms:

$\ln\left(\frac{1}{x}\right) \geq 1 - x$    (2.10)

The above relation becomes an equality when x = 1. Let us reproduce the RHS of equation
(2.9) starting from the relation given in (2.10); comparing with the LHS of equation (2.9),
we can then deduce the upper bound on H(S).

Substitute $x = \frac{1}{N p_k}$:

$\ln(N p_k) \geq 1 - \frac{1}{N p_k}$

Multiplying both sides by $p_k \log_2 e$ and summing over k,

$\log_2 e \sum_{k=1}^{N} p_k \ln(N p_k) \geq \log_2 e \sum_{k=1}^{N} p_k \left[1 - \frac{1}{N p_k}\right]$

The LHS of the above relation is $\log_2 N - H(S)$ from equation (2.9).

$\log_2 N - H(S) \geq \log_2 e \left[\sum_{k=1}^{N} p_k - \sum_{k=1}^{N} \frac{1}{N}\right]$

But $\sum_{k=1}^{N} p_k = 1$ and $\sum_{k=1}^{N} \frac{1}{N} = \frac{1}{N} + \frac{1}{N} + \ldots + \frac{1}{N}$ (N times) $= 1$

$\log_2 N - H(S) \geq \log_2 e\,[1 - 1]$

$\log_2 N - H(S) \geq 0$

$H(S) \leq \log_2 N$

Therefore the upper bound on the entropy is given by

$H(S)_{max} = \log_2 N$ bits/sym    (2.11)

Equality occurs when x = 1, i.e. $N p_k = 1$ or $p_k = \frac{1}{N}$ for all k.
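As a numerical sanity check of this bound (our own sketch, not part of the original derivation), the entropy of randomly generated probability vectors over N symbols never exceeds log2 N, and the uniform distribution attains it:

% Verify H(S) <= log2(N) for random probability vectors over N symbols.
N = 8;
Hmax = log2(N);
for trial = 1:5
    p = rand(1, N);  p = p / sum(p);          % random probability distribution
    H = -sum(p .* log2(p));                   % entropy in bits/sym
    fprintf('H = %.4f  <=  log2(N) = %.4f\n', H, Hmax);
end
pu = ones(1, N) / N;                          % uniform distribution
fprintf('Uniform: H = %.4f = log2(N)\n', -sum(pu .* log2(pu)));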

2.3.2 Source Efficiency


The efficiency of an information source is defined as the ratio of the average
information conveyed by the source to the maximum average information.

$\eta_S = \frac{H(S)}{H(S)_{max}} \times 100\ \%$    (2.12)

The redundancy of the source is defined as

$R_S = 1 - \eta_S = \left[1 - \frac{H(S)}{H(S)_{max}}\right] \times 100\ \% = \frac{H(S)_{max} - H(S)}{H(S)_{max}} \times 100\ \%$    (2.13)

Example 2.8 Consider a system emitting one of three symbols A, B and C with respective
probabilities 0.7, 0.15 and 0.15. Calculate its efficiency and redundancy.

Solution:

$\eta_S = \frac{H(S)}{H(S)_{max}} \times 100\ \%$

$H(S) = \sum_{k=1}^{3} p_k \log_2\left(\frac{1}{p_k}\right)$ bits/sym

Substituting the given probabilities,

$H(S) = 0.7\log_2\frac{1}{0.7} + 0.15\log_2\frac{1}{0.15} + 0.15\log_2\frac{1}{0.15}$

= 1.1812 bits/sym

$H(S)_{max} = \log_2 N = \log_2 3 = 1.5849$ bits/sym

$\eta_S = \frac{1.1812}{1.5849} \times 100\ \% = 74.52\ \%$

$R_S = 1 - \eta_S = 25.48\ \%$
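The efficiency calculation of Example 2.8 takes only a few MATLAB lines (a sketch; the probabilities are those of the example):

% Source efficiency and redundancy, Eqs. (2.12)-(2.13), for Example 2.8.
p    = [0.7 0.15 0.15];            % symbol probabilities
H    = sum(p .* log2(1 ./ p));     % 1.1812 bits/sym
Hmax = log2(numel(p));             % 1.5849 bits/sym
eta  = H / Hmax * 100;             % efficiency in percent
red  = 100 - eta;                  % redundancy in percent
fprintf('eta = %.2f %%, redundancy = %.2f %%\n', eta, red);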

Example 2.9 A certain digital frame consists of 15 fields. The first and the last fields of each
frame are always the same. The remaining 13 fields can be filled by any of 16 symbols with
equal probability. Find the average information conveyed by a frame. Also find the average
rate of information if 100 frames are transmitted every second.

Solution:
Consider the frame with 15 fields:

| Field 1 | Field 2 | . . . | Field 14 | Field 15 |

The total average information conveyed by the source is simply the sum of the individual
entropies of the fields in the frame, i.e.

$H(S)_{Total} = H_1 + H_2 + H_3 + \ldots + H_{15}$

As the first and last fields are the same for all frames, they convey no information, i.e.
$H_1 = H_{15} = 0$.
Each of the remaining fields can be filled with one of 16 symbols with equal probability.
Therefore the corresponding entropy is the maximum entropy $H(S)_{max}$:

$H_2 = H_3 = \ldots = H_{14} = H(S)_{max} = \log_2 N = \log_2 16 = 4$ bits/field

$H(S)_{Total} = H_1 + H_2 + H_3 + \ldots + H_{15} = 13 \times 4 = 52$ bits/frame

$R_s = H(S) \cdot r_s$

Given rs = 100 frames/sec,

$R_s = 52 \times 100 = 5{,}200$ bps

Example 2.10 In a facsimile transmission of a picture there are 4 × 10^6 pixels/frame. For
a good reconstruction of the image at least 8 brightness levels are necessary. Assuming
all these levels are equally likely to occur, find the average information rate if 1 picture
is transmitted every 4 seconds.

Solution:
Total number of pixels = 4 × 10^6
Total number of brightness levels = 8
Total number of different frames $N = 8^{4\times10^6}$
As all brightness levels are equally likely, all these frames are equiprobable.

$H(S) = H(S)_{max} = \log_2 N = \log_2\left(8^{4\times10^6}\right) = 12\times10^6$ bits/frame

It is given that each picture frame is transmitted in 4 sec, so

$r_s = \frac{1}{4}$ frames/sec

$R_s = H(S) \cdot r_s = 12\times10^6 \times \frac{1}{4} = 3\times10^6$ bps

2.4 Extension of Zero Memory Source


Consider a zero memory source emitting two symbols s1 and s2 with probabilities
p1 and p2 respectively.
Obviously p1 + p2 = 1.
The entropy of the source S is given by

$H(S) = p_1\log_2\left(\frac{1}{p_1}\right) + p_2\log_2\left(\frac{1}{p_2}\right)$    (2.14)

Now consider the second order extension of the source S, denoted S^2. The source S^2 has
four symbols, viz. s1s1, s1s2, s2s1 and s2s2. The corresponding probabilities are given by

$P(s_1s_1) = p(s_1)\,p(s_1) = p_1 \times p_1 = p_1^2$
$P(s_1s_2) = p(s_1)\,p(s_2) = p_1 \times p_2 = p_1 p_2$
$P(s_2s_1) = p(s_2)\,p(s_1) = p_2 \times p_1 = p_2 p_1$
$P(s_2s_2) = p(s_2)\,p(s_2) = p_2 \times p_2 = p_2^2$

The entropy of the second order extension of the source is given by

$H(S^2) = \sum_{k=1}^{4} p_k \log_2\left(\frac{1}{p_k}\right)$

$= p_1^2\log_2\frac{1}{p_1^2} + p_1p_2\log_2\frac{1}{p_1p_2} + p_2p_1\log_2\frac{1}{p_2p_1} + p_2^2\log_2\frac{1}{p_2^2}$

$= 2p_1^2\log_2\frac{1}{p_1} + 2p_1p_2\log_2\frac{1}{p_1} + 2p_1p_2\log_2\frac{1}{p_2} + 2p_2^2\log_2\frac{1}{p_2}$

$= 2p_1(p_1+p_2)\log_2\frac{1}{p_1} + 2p_2(p_1+p_2)\log_2\frac{1}{p_2}$

But we have p1 + p2 = 1, so

$H(S^2) = 2p_1\log_2\frac{1}{p_1} + 2p_2\log_2\frac{1}{p_2} = 2\left[p_1\log_2\frac{1}{p_1} + p_2\log_2\frac{1}{p_2}\right]$

$H(S^2) = 2\,H(S)$

Similarly we can show that

$H(S^3) = 3\,H(S)$
$H(S^4) = 4\,H(S)$

and so on. In general, for an nth order extension of S,

$H(S^n) = n\,H(S)$    (2.15)

Example 2.11 Consider a zero memory source emitting three symbols X, Y and Z with
respective probabilities {0.6, 0.3, 0.1}. Calculate

(i)

Entropy of the source

(ii)

All symbols and the corresponding probabilities of the second order


extension of the source

(iii)

Find the entropy of the second order extension of the source

(iv)

Show that H(S2)= 2*H(S)

Solution:
(i) $H(S) = \sum_{k=1}^{3} p_k \log_2\left(\frac{1}{p_k}\right)$ bits/sym

Substituting the given probabilities,

$H(S) = 0.6\log_2\frac{1}{0.6} + 0.3\log_2\frac{1}{0.3} + 0.1\log_2\frac{1}{0.1}$

= 1.2954 bits/sym

(ii) Consider the second order extension:

Symbol    Probability
XX        P(X)·P(X) = 0.6 × 0.6 = 0.36
XY        P(X)·P(Y) = 0.6 × 0.3 = 0.18
XZ        P(X)·P(Z) = 0.6 × 0.1 = 0.06
YX        P(Y)·P(X) = 0.3 × 0.6 = 0.18
YY        P(Y)·P(Y) = 0.3 × 0.3 = 0.09
YZ        P(Y)·P(Z) = 0.3 × 0.1 = 0.03
ZX        P(Z)·P(X) = 0.1 × 0.6 = 0.06
ZY        P(Z)·P(Y) = 0.1 × 0.3 = 0.03
ZZ        P(Z)·P(Z) = 0.1 × 0.1 = 0.01

(iii) $H(S^2) = \sum_{k=1}^{9} p_k \log_2\left(\frac{1}{p_k}\right)$ bits/sym

Substituting the probabilities from the table above,

$H(S^2)$ = 2.5909 bits/sym

(iv) 2 · H(S) = 2 × 1.2954 = 2.5909 bits/sym = H(S^2)
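The relation H(S^2) = 2·H(S) verified above can also be checked by building the second-order extension explicitly. The sketch below is our own and uses the probabilities of Example 2.11:

% Build the second-order extension of the source of Example 2.11 and
% verify that H(S^2) = 2*H(S).
p   = [0.6 0.3 0.1];               % P(X), P(Y), P(Z)
H1  = sum(p .* log2(1 ./ p));      % entropy of S
p2  = p' * p;                      % 3x3 matrix of pair probabilities
p2  = p2(:);                       % the nine symbols of S^2
H2  = sum(p2 .* log2(1 ./ p2));    % entropy of S^2
fprintf('H(S) = %.4f, H(S^2) = %.4f, 2*H(S) = %.4f\n', H1, H2, 2*H1);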

2.5 Entropy of a Source with Memory


In the previous sections we assumed the information source to be memoryless
for our analysis. In other words, we considered the emission of the current symbol from
the source to be independent of the previous emissions. However, most sources we encounter
in practice do have memory. For instance, in English, if the first letter of a word is "Q" it is
very likely that the succeeding letter will be "U", and so on. Thus, in practice, most sources
are dependent, and the model we discussed for a memoryless source does not suffice.
Therefore a dependent probabilistic model is required to model such sources. One such
model is the Markoff model; hence a Markoff model is used to represent a source with memory.
In general, for an nth order Markoff source, the emission of a symbol at the current instant
depends on the previous n symbols.

2.5.1 Markoff Model


A system can be represented using a state diagram. A state diagram represents all
possible states of a system along with the transition probabilities. Also the symbols emitted
by the source in each of the transitions are depicted in a state diagram. From the state diagram
one can construct a tree diagram. From the tree diagram the probabilities of the symbols
emitted by the source can be determined. The probabilities of messages of length n can be
determined by constructing a tree diagram of n stages.

2.5.2 Entropy and Information Rate


Let the entropy of state k be denoted by Hk. It is obtained by considering all
outgoing transition probabilities of state k:

$H_k = \sum_{l=1}^{M} p_{lk} \log_2\left(\frac{1}{p_{lk}}\right)$ bits/sym    (2.16)

where M is the total number of states and $p_{lk}$ is the probability of the transition from
state k to state l.

The entropy of the source is given by

$H = \sum_{k=1}^{M} p_k H_k$ bits/sym    (2.17)

where $p_k$ is the probability of the kth state.
The average information rate of the source is given by

$R_s = r_s\,H$ bits/sec    (2.18)

2.5.3 Average Information per Symbol
The average information content per symbol in a message of length L is given by

$G_L = \frac{1}{L}\sum_{i} p(m_i) \log_2\left(\frac{1}{p(m_i)}\right)$ bits/sym    (2.19)

where $p(m_i)$ is the probability of message $m_i$ of length L.
The average amount of information per symbol in a long message decreases with
increasing L and is always at least equal to H(S), i.e.

$G_1 \geq G_2 \geq G_3 \geq \ldots \geq H(S)$

In other words,

$\lim_{L\to\infty} G_L = H(S)$    (2.20)
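These definitions translate directly into a short computation: given the matrix of transition probabilities of the source, the state probabilities follow from the state equations and the source entropy from Eqs. (2.16) and (2.17). The MATLAB sketch below is our own and assumes all transition probabilities are non-zero; the example matrix is that of the two-state source solved in Example 2.12 below.

% Entropy of a Markoff source from its transition matrix.
% T(k,l) = probability of moving from state k to state l (rows sum to 1).
% The matrix below is that of Example 2.12; any row-stochastic matrix
% with non-zero entries works.
T = [1/3 2/3;
     2/3 1/3];
M = size(T, 1);
% Stationary state probabilities: solve P = P*T together with sum(P) = 1.
A = [(T' - eye(M)); ones(1, M)];
P = (A \ [zeros(M, 1); 1])';            % row vector of state probabilities
Hk = sum(T .* log2(1 ./ T), 2);         % state entropies, Eq. (2.16)
H  = sum(P(:) .* Hk);                   % source entropy, Eq. (2.17)
fprintf('P = [%s ], H = %.4f bits/sym\n', num2str(P, ' %.4f'), H);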

Example 2.12 Consider the following Markoff source shown in Fig. 2.1.
Find
(i)

State probabilities

(ii)

State entropies

(iii)

Source entropy

(iv)

G1, G2

(v) Show that G1 > G2 > H

Figure 2.1 Markoff Source

Solution:
(i) To find the state probabilities we first write the state equations. The state
equations are written by considering all incoming transition probabilities of each state.
For state 1:

$P(1) = \frac{1}{3}P(1) + \frac{2}{3}P(2)$    (2.21)

Similarly, for state 2:

$P(2) = \frac{2}{3}P(1) + \frac{1}{3}P(2)$    (2.22)

Also, the sum of all state probabilities is equal to one, i.e.

$P(1) + P(2) = 1$    (2.23)

From Eq. (2.21),

$P(1) - \frac{1}{3}P(1) = \frac{2}{3}P(2)$

$\frac{2}{3}P(1) = \frac{2}{3}P(2)$

$P(1) = P(2)$

Substituting this in Eq. (2.23),

$P(1) + P(1) = 1$

$P(1) = \frac{1}{2} = P(2)$

(ii) To find the state entropies we consider all outgoing transition probabilities of each
state. We have

$H_k = \sum_{l=1}^{M} p_{lk} \log_2\left(\frac{1}{p_{lk}}\right)$ bits/sym

For state 1,

$H_1 = \frac{1}{3}\log_2 3 + \frac{2}{3}\log_2\frac{3}{2} = 0.918$ bits/sym

Similarly, for state 2,

$H_2 = \frac{2}{3}\log_2\frac{3}{2} + \frac{1}{3}\log_2 3 = 0.918$ bits/sym

(iii) The source entropy is given by

$H = \sum_{k=1}^{M} p_k H_k = P(1)H_1 + P(2)H_2$

$= \frac{1}{2}\times 0.918 + \frac{1}{2}\times 0.918 = 0.918$ bits/sym


(iv) To find G1 and G2 we need to construct the tree diagram, shown in Figure 2.2.

Figure 2.2 Tree Diagram for the Markoff Source shown in Fig. 2.1

From the tree diagram, the probabilities of all messages of lengths 1 and 2 are found by
considering the outputs of stages 1 and 2.

There are two possible messages of length 1, viz. A and B. The corresponding probabilities
are

$P(A) = \frac{1}{2}$,  $P(B) = \frac{1}{2}$

The corresponding information per symbol G1 is given by

$G_L = \frac{1}{L}\sum_{i} p(m_i)\log_2\left(\frac{1}{p(m_i)}\right)$ bits/sym, with L = 1:

$G_1 = \frac{1}{1}\left[\frac{1}{2}\log_2 2 + \frac{1}{2}\log_2 2\right] = 1$ bit/sym

Similarly, from the second stage of the tree we get the probabilities of all messages of
length 2:

$P(AA) = \frac{1}{6}$,  $P(AB) = \frac{1}{3}$,  $P(BA) = \frac{1}{3}$,  $P(BB) = \frac{1}{6}$

$G_2 = \frac{1}{2}\sum_{i} p(m_i)\log_2\left(\frac{1}{p(m_i)}\right) = \frac{1}{2}\left[\frac{1}{6}\log_2 6 + \frac{1}{3}\log_2 3 + \frac{1}{3}\log_2 3 + \frac{1}{6}\log_2 6\right]$

= 0.959 bits/sym

(v) Obviously $G_1 > G_2 > H$.

Example 2.13 For the following Markoff source shown in Fig.2.3

Figure 2.3 Markoff Source

Find
(i)

State probabilities

(ii)

State entropies

(iii)

Source entropy

(iv)

G1, G2

(v)

Show that G1>G2>H

Solution:
(i) Consider the equation for state 1:

$P(1) = \frac{5}{6}P(1) + \frac{3}{5}P(2)$    (2.24)

Similarly, for state 2:

$P(2) = \frac{1}{6}P(1) + \frac{2}{5}P(2)$    (2.25)

Also, $P(1) + P(2) = 1$    (2.26)

From Eq. (2.24),

$P(1) = \frac{18}{5}P(2)$

Substituting this in Eq. (2.26),

$\frac{18}{5}P(2) + P(2) = 1$

$P(2) = \frac{5}{23}$,  $P(1) = \frac{18}{23}$
(ii) We have

$H_k = \sum_{l=1}^{M} p_{lk}\log_2\left(\frac{1}{p_{lk}}\right)$ bits/sym

For state 1,

$H_1 = \frac{5}{6}\log_2\frac{6}{5} + \frac{1}{6}\log_2 6 = 0.65$ bits/sym

Similarly, for state 2,

$H_2 = \frac{3}{5}\log_2\frac{5}{3} + \frac{2}{5}\log_2\frac{5}{2} = 0.9709$ bits/sym

(iii) The source entropy is given by

$H = \sum_{k=1}^{M} p_k H_k = P(1)H_1 + P(2)H_2 = 0.7197$ bits/sym

(iv) To find G1 and G2 we need to construct the tree diagram, shown in Figure 2.4.

Figure 2.4 Tree Diagram for the Markoff Source shown in Fig. 2.3

There are three possible messages of length 1, with corresponding probabilities

$P(X) = \frac{15}{23}$,  $P(Y) = \frac{2}{23}$,  $P(Z) = \frac{6}{23}$

The corresponding information per symbol G1 is given by

$G_L = \frac{1}{L}\sum_i p(m_i)\log_2\left(\frac{1}{p(m_i)}\right)$ bits/sym, with L = 1:

$G_1 = \frac{1}{1}\sum_i p(m_i)\log_2\left(\frac{1}{p(m_i)}\right) = 1.2142$ bits/sym

Similarly, from the second stage of the tree we get the probabilities of all messages of
length 2:

$P(XX) = \frac{25}{46}$,  $P(XZ) = \frac{5}{46}$,  $P(ZZ) = \frac{1}{10}$,  $P(ZY) = \frac{6}{115}$,  $P(ZX) = \frac{5}{46}$,  $P(YZ) = \frac{6}{115}$,  $P(YY) = \frac{4}{115}$

$G_2 = \frac{1}{2}\sum_i p(m_i)\log_2\left(\frac{1}{p(m_i)}\right) = 1.0597$ bits/sym

(v) Obviously $G_1 > G_2 > H$.

Example 2.14 Consider the Markoff source shown in Fig. 2.5.

Figure 2.5 Markoff Source

Find
(i)

State probabilities

(ii)

State entropies

(iii)

Source entropy

Solution:
(i)

We can write the state equations for 4 states as follows


P(A)= 0.6 P(A) + 0.5 P(D)
(2.27)
P(B)= 0.4 P(A) + 0.5 P(D)
(2.28)
P(C)= 0.5 P(B) + 0.6 P(C)
(2.29)
P(D)= 0.5 P(B) + 0.4 P(C)
(2.30)
Also P(A)+P(B) +P(C) +P(D)=1
(2.31)
From Eq. (2.27),

$P(A) = \frac{5}{4}P(D)$

From Eq. (2.28),

$P(B) = P(D)$

From Eq. (2.29),

$P(C) = \frac{5}{4}P(D)$

Substituting these relations in Eq. (2.31),

$\frac{5}{4}P(D) + P(D) + \frac{5}{4}P(D) + P(D) = 1$

$P(D) = \frac{2}{9} = P(B)$

$P(A) = P(C) = \frac{5}{4}P(D) = \frac{5}{18}$

(ii) We have

$H_k = \sum_{l=1}^{M} p_{lk}\log_2\left(\frac{1}{p_{lk}}\right)$ bits/sym

For state A,

$H_A = 0.6\log_2\frac{1}{0.6} + 0.4\log_2\frac{1}{0.4} = 0.9709$ bits/sym

For state B,

$H_B = 0.5\log_2\frac{1}{0.5} + 0.5\log_2\frac{1}{0.5} = 1$ bit/sym

For state C,

$H_C = 0.6\log_2\frac{1}{0.6} + 0.4\log_2\frac{1}{0.4} = 0.9709$ bits/sym

Finally, for state D,

$H_D = 0.5\log_2\frac{1}{0.5} + 0.5\log_2\frac{1}{0.5} = 1$ bit/sym

(iii) The source entropy is given by

$H = \sum_{k=1}^{M} p_k H_k = P(A)H_A + P(B)H_B + P(C)H_C + P(D)H_D = 0.9838$ bits/sym

Example 2.15 Design a system to report the heading of a collection of 400 cars. The
heading levels are: heading straight (S), turning left (L) and turning right (R). This
information is to be transmitted every second. Construct a model based on the test data
given below.
(i)

On an average, during a reporting interval, 200 cars were heading straight, 100
were turning left and the remaining were turning right.

(ii)

Out of the 200 cars that reported heading straight, 100 reported going
straight during the next reporting period, 50 turned left and the remaining
turned right during the next period.

(iii)

Out of the 100 cars that reported turning during a signaling period, 50
continued the turn and the remaining headed straight during the next signaling
interval.

(iv)

The dynamics of the cars did not allow them to change their heading from
left to right or right to left during subsequent reporting periods.
Find the entropy of each state, the source entropy and the rate of information.

Solution:
There are three states in the model:
Heading straight (S)
Turning left (L)
Turning right (R)
The Markoff model can be constructed based on the data given in the problem.

It is given that, on average, 200 cars were heading straight, 100 were turning left and the
remaining 100 were turning right in any reporting interval, out of 400 cars in total.

Therefore the state probabilities are

$P(S) = \frac{200}{400} = \frac{1}{2}$,  $P(L) = \frac{100}{400} = \frac{1}{4}$,  $P(R) = \frac{100}{400} = \frac{1}{4}$

From the second and third statements the transition probabilities can be determined.
The model constructed from the given data is shown in Fig. 2.6.

Figure 2.6 Markoff Source

The state entropies can be calculated as follows:

$H_k = \sum_{l=1}^{M} p_{lk}\log_2\left(\frac{1}{p_{lk}}\right)$ bits/sym

For state S,

$H_S = 0.5\log_2\frac{1}{0.5} + 0.25\log_2\frac{1}{0.25} + 0.25\log_2\frac{1}{0.25} = 1.5$ bits/sym

For state L,

$H_L = 0.5\log_2\frac{1}{0.5} + 0.5\log_2\frac{1}{0.5} = 1$ bit/sym

For state R,

$H_R = 0.5\log_2\frac{1}{0.5} + 0.5\log_2\frac{1}{0.5} = 1$ bit/sym

The source entropy is given by

$H = \sum_{k=1}^{M} p_k H_k = P(S)H_S + P(L)H_L + P(R)H_R = 1.25$ bits/sym

The average information rate of the source is given by

$R_s = r_s\,H$ bits/sec

Given rs = 1 sym/sec,

$R_s = 1 \times 1.25 = 1.25$ bits/sec

MATLAB PROGRAMS

% Design an information system which gives information every
% year for around 200 students of the E & C branch of NITK. The
% students can get into one of the three fields as given below:
% (i)   Go abroad for higher studies - A
% (ii)  Join MBA or IAS              - B
% (iii) Join industry in India       - C
% Based on the data given below, construct the state diagram of
% the source model and find the source entropy.
% (a) On the average 100 students are going abroad.
% (b) Out of 100 going abroad this year, 50 were reported going
%     abroad next year while 25 each went to MBA & IAS or joined
%     industries in India.
% (c) Out of 100 remaining in India this year, 50 continued to
%     do so while 50 went abroad next year.
% (d) Those joining MBA & IAS or industry could not swap the two
%     fields next year.

%source model
clc
paa = 0.5;
pab = 0.25;
pac = 0.25;
pba = 0.5;
pbb = 0.5;
pbc = 0;
pca = 0.5;
pcb = 0;
pcc = 0.5;
pa = [paa pab pac];
pa1 = -log(pa)/log(2);
pb = [pba pbb ];
pb1 = -log2(pb);
pc = [pca pcc];
pc1 = -log2(pc);
% State equations in matrix form: P * a = b, where a = [PA; PB; PC].
% Row 1:  PA + PB + PC = 1
% Row 2: -0.5*PA + 0.5*PB + 0.5*PC = 0   (state A equation)
% Row 3: 0.25*PA - 0.5*PB          = 0   (state B equation)
P = [ 1     1    1;
     -0.5   0.5  0.5;
      0.25 -0.5  0  ];
b = [1; 0; 0];
a = P\b;

PA = a(1);
PB = a(2);
PC = a(3);
ha = pa.*pa1;
hb = pb.*pb1;
hc = pc.*pc1;
HA = sum(ha);
HC = sum(hc);
HB = sum(hb);
h = [HA HB HC];
A = a';
H = sum(h.*A)
fprintf('FOR SOURCE A\n\n');
fprintf('PA = %f\n\n',PA);
fprintf('Paa = %f\n',paa);
fprintf('Pab = %f\n',pab);
fprintf('Pac = %f\n\n',pac);
fprintf('The entropy of the source A is %f bit/message-symbol \n\n', HA);
fprintf('FOR SOURCE B\n\n');
fprintf('PB = %f\n\n',PB);
fprintf('Pba = %f\n',pba);
fprintf('Pbb = %f\n',pbb);
fprintf('Pbc = %f\n\n',pbc);
fprintf('The entropy of the source B is %f bit/message-symbol\n\n', HB);
fprintf('FOR SOURCE C\n\n');
fprintf('PC = %f\n\n',PC);
fprintf('Pca = %f\n',pca);
fprintf('Pcb = %f\n',pcb);
fprintf('Pcc = %f\n\n',pcc);
fprintf('The entropy of the source C is %f bit/message-symbol\n\n', HC);
fprintf('The total entropy of the source is %f bits/message-symbol', H);

ESSENCE

The amount of information conveyed by a message is inversely proportional to its
probability of occurrence.

The self-information content of a message is given by

$I_k = \log_r\left(\frac{1}{p_k}\right)$ units

The entropy of a source is given by

$H(S) = \sum_{k=1}^{N} p_k\log_2\left(\frac{1}{p_k}\right)$ bits/sym

The average rate of information is given by

$R_S = H(S)\cdot r_s$ bits/sec

The upper bound on the entropy is given by

$H(S)_{max} = \log_2 N$ bits/sym

For an nth order extended source,

$H(S^n) = n\,H(S)$

Sources with memory can be modeled using the Markoff model.

The entropy of state k, denoted by Hk, is obtained by considering all outgoing
transition probabilities of state k:

$H_k = \sum_{l=1}^{M} p_{lk}\log_2\left(\frac{1}{p_{lk}}\right)$ bits/sym

The entropy of the source is given by

$H = \sum_{k=1}^{M} p_k H_k$ bits/sym

The average information per symbol in a message of length L is

$G_L = \frac{1}{L}\sum_i p(m_i)\log_2\left(\frac{1}{p(m_i)}\right)$ bits/sym

EXERCISE
1. Define information.
2. Explain how information is measured. Justify the use of the logarithmic function to
measure information.
3. Consider a system emitting three symbols {X, Y, Z} with probabilities 0.5, 0.3 and
0.2 respectively. Find the information conveyed by each of these symbols.
4. Establish a relationship between Hartley and nats
5. Establish a relationship between nats and bits
6. Consider a binary system emitting symbols with probabilities 0.7 and 0.3. Find the
self information of each of the symbols in nats and Hartley.
7. A pair of dice is rolled simultaneously. The outcome of the first die is considered to
be x and that of the second as y. Three events are defined as
A = {(x,y) such that (x+y) is exactly divisible by 4}
B = {(x,y) such that 3 ≤ (x+y) ≤ 8}
C = {(x,y) such that (x+y) is a prime number}
Which event conveys more information? Justify your answer with mathematical
calculation.
8. A source emits one of 6 symbols every 4 ms. The symbol statistics are {0.3, 0.22, 0.20,
0.12, 0.10, 0.06}. Find the average information rate of the source.
9. The output of an information source consists of 100 symbols, 60 of which occur with
a probability of 1/120 each and the remaining with a probability of 1/80 each. Find the average
information rate of the source if the source emits 2100 sym/sec.
10. A Morse code uses a sequence of dots and dashes to transmit information. The dash
is represented by a pulse of duration 3 ms and the dot by one of 1 ms. The probability
of a dash is one third that of a dot. A gap of 1 ms is left between the symbols. Calculate
(i)

Self information of a dot and a dash

(ii)

Average information content of a dot dash code

(iii)

Average rate of information

11. Consider a binary source emitting two symbols X and Y. Let the probability of
emission of X be p. Plot the function H(S) as a function of p.

12. Consider a system emitting four symbols A, B, C and D with respective probabilities
0.5, 0.3, 0.15 and 0.05. Calculate its efficiency and redundancy.
13. A data frame consists of 10 fields. The first field in each frame is always the same, for
synchronization purposes. The remaining fields can be filled by any of 32 symbols with
equal probability. Find the average rate of information if 500 frames are transmitted
every second.
14. In a facsimile transmission of a picture there are 2.6 × 10^6 pixels/frame. For a good
reconstruction of the image at least 12 brightness levels are necessary. Assuming all
these levels are equally likely to occur, find the average information rate if 1 picture is
transmitted every 4 seconds.
15. Consider a zero memory source emitting two symbols A and B with respective
probabilities {0.6, 0.4}. Calculate
(i)

Entropy of the source

(ii)

All symbols and the corresponding probabilities of the third order


extension of the source. Find the entropy of the third order extension of
the source

(iii)

Show that H(S3)= 3*H(S)

16. For the following Markoff source shown in Fig. 2.7

Figure 2.7 Markoff Source

Find
(i)

State probabilities

(ii)

State entropies

(iii)

Source entropy

(iv)

G1, G2

(v)

Show that G1>G2>H

17. For the following Markoff source shown in Fig. 2.8

Figure 2.8 Markoff Source

Find
(i)

State probabilities

(ii)

State entropies

(iii)

Source entropy

(iv)

G1, G2

(v)

Show that G1>G2>H

18. Design a system to report the speed of a collection of 200 cars on a highway. The speed
levels can be High (H), Medium (M) and Slow (S). This information is to be
transmitted every 2 seconds. Construct a model based on the test data given below.
(i)

On an average, during a reporting interval, 100 cars were at Medium
speed, 50 were at High speed and the remaining were at Slow pace.

(ii)

Out of the 100 cars that reported Medium speed, 50 continued at the
same pace during the next reporting period, 25 went to High and the
remaining went to Slow during the next period.
(iii)

Out of the 50 cars reported as going at High or Slow pace during a
signaling period, 25 continued at the same pace and the remaining
went to Medium pace during the next signaling interval.

(iv)

The rules of the highway authority restrict any change from High to
Slow or Slow to High pace during subsequent reporting periods.

Find the entropy of each state, the source entropy and the rate of information.

REFERENCES

[1] K. Sam Shanmugam, Digital and Analog Communication Systems, John Wiley, 1996.
[2] Simon Haykin, Digital Communication, John Wiley, 2003.
[3] Ranjan Bose, Information Theory Coding and Cryptography, Tata McGraw Hill, 2007.
[4] Simon Haykin, Michael Moher, Modern Wireless Communications, Pearson Education
2007.
[5] C E Shannon, A Mathematical Theory of Communication, Vol 27, Bell System
Technical Journal, 1948.
[6] Ian A Glover, Peter M Grant, Digital Communications, Pearson Education, 2004.
[7] Bernard Sklar, Pabitra Kumar Ray, Digital Communications: Fundamentals and
Applications, Pearson Education.
[8] Andrea Goldsmith, Wireless Communications, Cambridge University Press, 2005.
[9] John G Proakis, Masoud Salehi, Contemporary Communication System using
MATLAB, PWS Publishing Company.
