What is Information?
From 1995 onwards we have been witnessing a transformation, or revolution, in the way we communicate,
and the process is still under way. It includes the ever-growing use of the Internet, the explosion of mobile
networks, and video conferencing. Information, i.e. text, audio, pictures, and video, is transmitted
over networks. Data compression is one of the enabling technologies for each aspect of this multimedia
revolution.
Information is treated in various disciplines:
physics: a basic property, like energy and matter
biology: the senses
neurophysiology: brain processes
psychology: behavior and perception
cognitive science: cognition
telecommunication and computer science: signals and bits
philosophy: knowledge
The discussion here centers on information as energy and as signals/bits.
Define the scope of information theory and coding.
Information theory is a branch of applied mathematics, electrical engineering, and computer
science involving the quantification of information.
Information theory studies the transmission, processing, utilization, and extraction of
information. In the case of communication of information over a noisy channel, this abstract
concept was made concrete in 1948 by Claude Shannon in A Mathematical Theory of
Communication, in which "information" is thought of as a set of possible messages, where the
goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the
message with low probability of error, in spite of the channel noise. Shannon's main result, the
noisy-channel coding theorem, showed that, in the limit of many channel uses, the rate of
information that is asymptotically achievable is equal to the channel capacity, a quantity that
depends only on the statistics of the channel over which the messages are sent.
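As a concrete, illustrative instance of channel capacity (not worked out in these notes), the binary symmetric channel with crossover probability p has capacity C = 1 - H_b(p) bits per channel use, where H_b is the binary entropy function. The short Python sketch below, with function names of our own choosing, evaluates it.

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits; 0 by convention when p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity C = 1 - H_b(p) of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

# A channel that flips each transmitted bit with probability 0.1:
print(bsc_capacity(0.1))   # ~0.531 bits per channel use
```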
Coding theory is concerned with finding explicit methods, called codes, for increasing the
efficiency and reducing the error rate of data communication over noisy channels to near the
Channel capacity. These codes can be roughly subdivided into data compression (source coding)
and error-correction (channel coding) techniques. A third class of information theory codes are
cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding
theory and information theory are widely used in cryptography and cryptanalysis.
The entropy of a source A whose symbols occur with probabilities p_i is
H(A) = -\sum_{i} p_i \log_2 p_i  bits,
where the sum runs over all source symbols.
Entropy is typically measured in bits, nats, or bans.
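As an illustrative sketch (not part of the original notes), the entropy formula above can be evaluated directly once the symbol probabilities are known; the small helper below assumes they are given.

```python
import math

def entropy(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# The distribution used in the a/b/c/d example later in this section:
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits per symbol
```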
For example, the human ear can detect sound across the frequency range of 20 Hz to 20 kHz.
According to the sampling theorem, one should sample sound signals at least at 40 kHz in order
for the reconstructed sound signal to be acceptable to the human ear. Components higher than
20 kHz cannot be detected, but they can still pollute the sampled signal through aliasing.
Therefore, frequency components above 20 kHz are removed from the sound signal before
sampling by an analog low-pass (anti-aliasing) filter. Practically speaking, the sampling rate is
typically set at 44.1 kHz (rather than 40 kHz) in order to leave room for the filter roll-off and
avoid signal contamination.
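The following is a minimal NumPy sketch (our own illustration) of the aliasing effect described above: a 30 kHz tone sampled at 44.1 kHz without an anti-aliasing filter shows up at the alias frequency of 14.1 kHz instead of its true frequency.

```python
import numpy as np

fs = 44100.0                  # sampling rate in Hz (typical audio rate)
t = np.arange(0, 0.05, 1 / fs)

f_in = 30000.0                # 30 kHz tone, above the Nyquist frequency fs/2 = 22.05 kHz
x = np.sin(2 * np.pi * f_in * t)

# The sampled tone folds back to |f_in - fs| = 14.1 kHz.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
print(freqs[np.argmax(spectrum)])   # ~14100.0 Hz, not 30 kHz
```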
Oversampling
In practice, signals are oversampled: fs is chosen significantly higher than the Nyquist rate to
avoid aliasing.
5. Entropy Coding
The process of entropy coding (EC) can be split into two parts: modeling and coding. Modeling
assigns probabilities to the symbols, and coding produces a bit sequence from these probabilities.
As established in Shannon's source coding theorem, there is a relationship between a symbol's
probability and its corresponding bit sequence: a symbol with probability p gets a bit sequence
of length -log2(p) bits.
In order to achieve a good compression rate, an exact probability estimate is needed. Since the
model is responsible for the probability of each symbol, modeling is one of the most important
tasks in data compression.
Entropy coding can be done with a coding scheme that uses a discrete number of bits for each
symbol, for example Huffman coding, or with a coding scheme that uses a discrete number of
bits for a group of symbols. In the latter case we get arithmetic coding when the group coded
together is the entire message to be encoded.
For example, suppose we want to transmit messages composed of the four letters a, b, c, and d. A
straightforward scheme for coding these messages in bits would be to represent a by 00, b by
01, c by 10 and d by 11. However, suppose we know that for any letter of the message
(independent of all other letters), a occurs with probability .5, b occurs with probability .25, and
c or d occur with probability .125 each. Then we might choose a shorter representation for a, at
the necessary cost of accepting longer representations for the other letters. We could represent a
by 0, b by 10, c by 110, and d by 111. This representation is more compact on average
than the first one; indeed, it is the most compact representation possible (though not uniquely
so). In this simple example, the modeling part of the problem is determining the probabilities for
each possible symbol value; the entropy-coding part of the problem is determining the
representations in bits from those probabilities; let us emphasize that the probabilities associated
with the symbol values play a fundamental role in entropy coding.
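To tie the example to an actual entropy coder, here is a minimal Huffman-coding sketch (our own illustration, not from the notes). It produces a prefix code with the same codeword lengths as the one above, and its average length of 1.75 bits per symbol equals the source entropy for this distribution.

```python
import heapq

def huffman_code(prob):
    """Build a binary prefix code from {symbol: probability} with Huffman's algorithm."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)     # one valid result: {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (up to bit swaps)
print(avg_len)  # 1.75 bits per symbol, equal to the entropy of this source
```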
6. Problems on Entropy Calculation
1) aaabbbccdbbddc
2) 11213444543323
3) B D C B C E C C C A D C B D D A A E C E E A
4) A B B D A E E C A C E E B A E E C B C E A D
5) Calculate the entropy for the following events:
a) From a pack of 52 cards, a card is drawn at random; the independent outcome set
= { queen of clubs, a face card (i.e. Jack, Queen, or King), an 8 of red color }
b) A fair coin is tossed ten times; outcome sequence: T T H T T H H H T H T
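For problems 1) to 4), one reasonable reading is to estimate each symbol's probability from its relative frequency in the given sequence and then apply the entropy formula; the sketch below follows that assumption.

```python
import math
from collections import Counter

def sequence_entropy(seq):
    """Empirical entropy in bits/symbol, with probabilities taken as relative frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(sequence_entropy("aaabbbccdbbddc"))   # problem 1: ~1.96 bits/symbol
print(sequence_entropy("11213444543323"))   # problem 2: ~2.18 bits/symbol
```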