Mathematically, an image is a 2D array of pixels; statistically, a suitable transformation yields uncorrelated data
Data compression refers to the process of reducing the
amount of data required to represent a given quantity
of information
Data ≠ Information
Varying amounts of data can be used to represent the
same information
Data may contain elements that provide no relevant
information: data redundancy
Data compression is based on data redundancy
Image Compression for a Communication System
Fidelity Criteria
• The error between the original f(x, y) and the reconstruction f̂(x, y) is
e(x, y) = f̂(x, y) − f(x, y)
• Root-mean-square error:
e_rms = [ (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ f̂(x, y) − f(x, y) ]² ]^{1/2}
• Mean-square signal-to-noise ratio:
SNR_ms = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f̂(x, y)² / Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ f̂(x, y) − f(x, y) ]²
Let n1 and n2 denote the number of information-carrying
units in two data sets that represent the same
information
The compression ratio is CR = n1 / n2
The relative redundancy RD is defined as
RD = 1 − 1 / CR
Types of Redundancy
Coding redundancy: the probability that a pixel takes the value rk is
p(rk) = nk / n
where nk is the number of pixels with value rk and n is the total number of pixels
If the number of bits used to represent rk is l(rk), then the average code length is
Lavg = Σ_{k=0}^{L−1} l(rk) p(rk)
Variable-Length Coding reduces Lavg by assigning shorter codes to the more probable values
Inter-pixel Redundancy
Classification
• Lossless
• Reconstructed image matches the original with no loss of information
• Compression ratio is lower
• Preferred in the medical field
• Huffman codes, LZW, arithmetic coding, 1-D and 2-D run-length encoding, lossless predictive coding, and bit-plane coding
• Lossy
• Quality of the reconstructed image is inferior
• Compression ratio is higher
• Used in multimedia applications
Basic Compression Methods
• 1-D run-length
• 2-D run-length
• Shannon-Fano
• Huffman (binary, non-binary, adaptive)
• Arithmetic
• Dictionary based (LZ77, LZ78)
• Predictive (delta encoding, delta modulation, DPCM)
• Transform based
Data Compression Standards
Huffman Coding
Original source and its successive source reductions:
a6 0.3
a1 0.1
a4 0.1
a3 0.06
a5 0.04
At each reduction the two least probable symbols are combined (a5 and a3 merge into a symbol of probability 0.1), and the process repeats until two symbols remain; code words are then assigned back through the reductions, giving for example the code 01011 for a5
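The construction above can be checked with a minimal Huffman coder in Python. The symbol set follows the table above; the leading symbol a2 with probability 0.4 is an assumption added here so the probabilities sum to 1 (the extracted table appears truncated). Heap tie-breaking may assign different code words than the slides, but any optimal tree gives the same average length, 2.2 bits/symbol.

```python
import heapq
import itertools

def huffman_codes(probs):
    """Build a binary Huffman code for a {symbol: probability} source."""
    tie = itertools.count()  # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # the two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

# Source from the slides; a2 = 0.4 is an assumed symbol so probabilities sum to 1
source = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
codes = huffman_codes(source)
avg_len = sum(source[s] * len(codes[s]) for s in source)  # optimal average: 2.2 bits
```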
Basis Functions
Approximation of Images
Using an 8×8 mask and 50% of the coefficients
Selection of Subimage Size
• Transform coding error and computational complexity increase with subimage size
• Correlation between adjacent subimages should be as low as possible
Effect of Subimage Size
Block Processing
Processing the entire image at once: the DCT of an entire image requires large memory
Block Processing
Block artifacts: discontinuities appear due to the rectangular windows
Block artifacts can be minimized by
• Overlapping blocks
• Lowpass filtering of boundary pixels
Overlapping increases the bit rate and hence lowers the compression ratio
Lowpass filtering results in blurring
Zonal Coding
Transform coefficients of maximum variance carry the most picture information
The locations of the coefficients with the k largest variances are indicated by a zonal mask
The locations are the same for all blocks
Transform coefficients inside the zone are retained; the others are set to zero
Variances are calculated from a global image model
Threshold Coding
Threshold and Zonal Coding: truncating, quantizing and coding coefficients
[Figure: retaining the 8 largest coefficients of each subimage gives e = 4.5; retaining the 8 coefficient locations of largest variance for each subimage gives e = 6.5]
Zonal Coding Implementation
Ways to threshold the transformed image: global, local and dynamic
[Figure: approximation using DCT and normalization array: C=19, e=3.83; C=30, e=4.93; C=49, e=6.62; C=85, e=9.35; C=85, e=13.94; C=182, e=22.46]
JPEG: for still photographic images
The JPEG standard is a collaboration among:
International Telecommunication Union (ITU)
International Organization for Standardization (ISO)
International Electrotechnical Commission (IEC)
Lossy compression technique
Is based on DCT
General image compression technique
independent of
Image resolution
Image and pixel aspect ratio
Color system
Image complexity
A scheme for video compression based on JPEG called
Motion JPEG (MJPEG) exists
Effective because:
1. Image data usually changes slowly across an image, especially within an 8×8 block; therefore images contain much redundancy
2. The human eye is not very sensitive to high-frequency image data; therefore some data can be removed after transform coding
3. The human eye is much more sensitive to brightness (luminance) information than to color (chrominance); therefore JPEG uses chroma subsampling (4:2:0)
For JPEG compatibility, a system must include support for the baseline system
Three types:
1. Lossy baseline coding system
Based on the DCT; adequate for most compression applications
2. Extended coding system
Greater compression and higher precision
3. Lossless independent coding system for reversible compression
Compression is performed in 3 steps:
1. DCT computation
2. Quantization
3. Variable-length code assignment
Input and output data precision is 8 bits
Quantized DCT values are up to 11 bits
Encoding an 8×8 block:
1. Input image
2. Divide into subimages
3. Level shift by subtracting 2^(k−1) from each pixel (for 2^k gray levels)
4. Compute the 2-D DCT
5. Quantize
6. Reorder using the zigzag pattern to generate a 1-D sequence
7. Encode nonzero AC coefficients using variable-length codes
8. Encode the difference between the DC coefficients of the current and previous subimages
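Steps 3 to 5 (level shift, 2-D DCT, quantization) can be sketched in Python with NumPy. The quantization table below is the widely used luminance table from Annex K of the JPEG standard; the slides do not state their Z(x,y) explicitly, so treat it as an illustrative choice.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: row u, column x."""
    c = np.array([np.sqrt(1.0 / n)] + [np.sqrt(2.0 / n)] * (n - 1))
    x = np.arange(n)
    u = np.arange(n)[:, None]
    return c[:, None] * np.cos((2 * x[None, :] + 1) * u * np.pi / (2 * n))

# Luminance quantization table from Annex K of the JPEG standard
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99]])

def encode_block(block, k=8):
    """Level shift, 2-D DCT, and quantization of one 8x8 block."""
    C = dct_matrix()
    shifted = block.astype(float) - 2 ** (k - 1)   # step 3: subtract 128
    T = C @ shifted @ C.T                          # step 4: 2-D DCT
    return np.round(T / Q).astype(int)             # step 5: quantize

def decode_block(N, k=8):
    """Dequantize, inverse 2-D DCT, and undo the level shift."""
    C = dct_matrix()
    return C.T @ (N * Q) @ C + 2 ** (k - 1)
```

A flat block survives the round trip exactly; textured blocks come back with small quantization error, which is the lossy part of JPEG.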
The image is divided up into 8x8 blocks
2D DCT is performed on each block
The DCT is performed independently for each block
When a high degree of compression is requested,
JPEG gives a “blocky” image result
Quantization in JPEG aims at reducing the total
number of bits in the compressed image
Divide each entry in the frequency space block by an
integer, then round
Use a quantization matrix Q(u, v)
• Sensitivity of the eye varies with spatial frequency
• The amplitude threshold below which the eye cannot detect a spatial variation also varies with frequency
• Threshold values differ for each of the 64 DCT coefficients
• The quantization table stores these threshold values
• The choice of threshold values is a compromise between the required level of compression and the acceptable information loss
JPEG standard has two quantization tables for
the luminance and the chrominance
coefficients.
However, customized tables are allowed and
can be sent with the compressed image
Multiple quantization matrices can be used (by
scaling the defaults), allowing the user to
choose how much compression to use
Trades off quality vs. compression ratio
More compression means larger entries in Q
Use larger entries in Q for the higher spatial
frequencies
High frequency entries are at the lower right
part of the matrix
Default Q(u, v) values for luminance and chrominance are based on psychophysical studies intended to maximize compression ratios while minimizing perceptual distortion
After division the entries are smaller, so fewer bits can be used to encode them
Example 8×8 subimage f(x, y) (pixel values before level shifting):
52 55 61 66 70 61 64 73
63 59 66 90 109 85 69 72
62 59 68 113 144 104 66 73
63 58 71 122 154 106 70 69
67 61 68 104 126 88 68 70
79 65 60 70 77 63 58 75
85 71 64 59 55 61 65 83
87 79 69 68 65 76 78 94
The normalization matrix Z(x, y) is applied entry-by-entry after the DCT
N(x, y) = round{T(x, y)/Z(x, y)}; for example N(0, 0) = round(−415/16) = −26
-26 -3 -6 2 2 0 0 0
1 -2 -4 0 0 0 0 0
-3 1 5 -1 -1 0 0 0
-4 1 2 -1 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
The remaining values are zeroes
Reordered using the zigzag pattern:
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]
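The zigzag reordering above can be sketched in a few lines; `zigzag` is an illustrative helper name. The scan walks the anti-diagonals of the block, alternating direction with the parity of i + j.

```python
def zigzag(block):
    """Return the 64 entries of an 8x8 block in JPEG zigzag order."""
    n = len(block)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # anti-diagonals in turn; traversal direction alternates with parity of i+j
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]
```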
Range                               DC difference category   AC category
0                                   0                        N/A
-1, 1                               1                        1
-3, -2, 2, 3                        2                        2
-7, ..., -4, 4, ..., 7              3                        3
-15, ..., -8, 8, ..., 15            4                        4
-31, ..., -16, 16, ..., 31          5                        5
:                                   :                        :
-511, ..., -256, 256, ..., 511      9                        9
-1023, ..., -512, 512, ..., 1023    A                        A
:                                   :                        :
-32767, ..., -16384, 16384, ..., 32767   F                   N/A
Category  Base code  Length      Category  Base code  Length
0         010        3           6         1110       10
1         011        4           7         11110      12
2         100        5           8         111110     14
3         00         5           9         1111110    16
4         101        7           A         11111110   18
5         110        8           B         111111110  20
Run/Category  Base code          Length
0/0 (EOB)     1010               4
0/1           00                 3
0/2           01                 4
:             :                  :
0/A           1111111110000011   26
1/1           1100               5
1/2           111001             8
:             :                  :
1/A           1111111110001000   26
2/1           11011              6
:             :                  :
2/A           1111111110001111   26
:             :                  :
F/A           1111111111111110   26
• The reordered sequence is
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]
• The first nonzero AC coefficient is -3
• Its category is 2
• It is preceded by 0 zeroes, so the run is 0
• The code for 0/2 from the table is 01, with total length 4
• -3 is the zeroth element of the category-2 array, so the last 2 bits are 00
• The array of code words so far is
• [1010110 0100, ...]
• The next AC coefficient is 1
• Its category is 1
• It is preceded by 0 zeroes, so the run is 0
• The code for 0/1 from the table is 00, with total length 3
• 1 is the second element of the category-1 array, so the last bit is 1
• The complete array of code words is
• [1010110 0100 001 0100 0101 100001 0110 100011 001 100011 001 001 100101 11100110 110110 0110 11110100 000 1010]
• The block requires 92 bits
• C = 8×8×8/92 ≈ 5.6:1
Decoding: a lookup table of Huffman codes is used to decode the data and regenerate the array of quantized coefficients
-26 -3 -6 2 2 0 0 0
1 -2 -4 0 0 0 0 0
-3 1 5 -1 -1 0 0 0
-4 1 2 -1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
The array is multiplied entry-by-entry by the normalization matrix Z(x, y), giving the denormalized coefficients:
-416 -33 -60 32 48 0 0 0
12 -24 -56 0 0 0 0 0
-42 13 80 -24 -40 0 0 0
-56 17 44 -29 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
The inverse 2-D DCT and a level shift by +2^(k−1) then give the reconstructed subimage:
58 64 67 64 59 62 70 78
56 55 67 89 98 88 74 69
60 50 70 119 141 116 80 64
69 51 71 128 149 115 77 68
74 53 64 105 115 84 65 72
76 57 56 74 75 57 57 74
83 69 59 60 61 61 67 78
93 81 67 62 69 80 84 84
JPEG Encoder
[Block diagram] Color components (Y, Cb, or Cr) → level shift (centering) → FDCT → quantizer (quantization table) → DC coefficients: difference encoding, AC coefficients: zig-zag reordering → Huffman coding (Huffman table) → JPEG bit-stream
Golomb Coding
For nonnegative integers
Based on the assumption that the larger an integer, the lower its probability of occurrence
n is a nonnegative integer
The simplest form is the unary code: n 1s followed by a 0
For n = 4 the code is 11110
The Golomb code of n with respect to m is Gm(n), m > 0 and n ≥ 0
The unary code is G1(n)
Construction of Gm(n):
n = qm + r, with r = n mod m, r = 0, 1, ..., m−1
Form the unary code of q
k = ⌈log2 m⌉, c = 2^k − m
r' = r truncated to k−1 bits for 0 ≤ r < c
r' = r + c truncated to k bits for the other values of r
Concatenate the codes of q and r'
For m = 2^k, c = 0 and r is truncated to k bits; these are called Golomb-Rice codes
Example: m = 5, n = 7
7 = 1×5 + 2, so q = 1 and r = 2
k = ⌈log2 5⌉ = 3
c = 2^k − m = 3
q is coded as 10
Since r < c, r = 2 is coded using k−1 = 2 bits: 10
For n = 7 the code is 1010
For G4(9), m = 4 and n = 9
n = qm + r, r = 0, 1, ..., m−1
9 = 2×4 + 1, so q = 2 and r = 1
q is coded as 110
k = log2 4 = 2 bits
c = 2^2 − 4 = 0
Since r ≥ c, r + c = 1 is coded using k = 2 bits: 01
The code is 11001
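Both worked examples can be reproduced with a small Python sketch of the construction above (`golomb` is an illustrative helper name, not a standard API):

```python
import math

def golomb(m, n):
    """Golomb code G_m(n) for nonnegative integer n, m > 0."""
    q, r = divmod(n, m)
    code = "1" * q + "0"              # unary code of the quotient q
    if m == 1:
        return code                   # G_1(n) is the plain unary code
    k = math.ceil(math.log2(m))
    c = (1 << k) - m
    if r < c:
        code += format(r, "b").zfill(k - 1)   # r in k-1 bits
    else:
        code += format(r + c, "b").zfill(k)   # r + c in k bits
    return code
```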
Exponential Golomb code of order k:
1. Find an integer i ≥ 0 such that
Σ_{j=0}^{i−1} 2^{j+k} ≤ n < Σ_{j=0}^{i} 2^{j+k}
and form the unary code of i.
If k = 0 then i = ⌊log2(n + 1)⌋
2. Truncate the binary representation of
n − Σ_{j=0}^{i−1} 2^{j+k}
to the k + i least significant bits
3. Concatenate the results of steps 1 and 2
Example: n = 8, k = 0
1. i = ⌊log2(8 + 1)⌋ = 3; i is coded as 1110
2. 8 − Σ_{j=0}^{2} 2^{j} = 8 − 7 = 1, truncated to k + i = 3 LSBs: 001
3. Concatenating gives the code 1110 001
Arithmetic Coding
The interval 0 to 1 is divided according to the probabilities of occurrence of the intensities
Does not generate a code for each character; performs arithmetic operations on a block of data
Makes it possible to encode characters with a fractional number of bits, thus approaching the theoretical optimum
Works better than Huffman for sources with low entropy
A source emits 4 symbols {a, b, c, d} with probabilities 0.4, 0.2, 0.1 and 0.3 respectively. Encode the word 'dad'
Symbol       a         b           c           d
Probability  0.4       0.2         0.1         0.3
Subrange     [0, 0.4)  [0.4, 0.6)  [0.6, 0.7)  [0.7, 1)
Encoding 'dad':
d: the interval becomes [0.7, 1), R = 1 − 0.7 = 0.3
a: the interval becomes [0.7, 0.7 + 0.3×0.4) = [0.7, 0.82), R = 0.82 − 0.7 = 0.12
d: the interval becomes [0.7 + 0.12×0.7, 0.82) = [0.784, 0.82)
Any tag in the final interval, e.g. 0.802, encodes 'dad'
Decoding the tag T = 0.802:
Symbol       a         b           c           d
Probability  0.4       0.2         0.1         0.3
Subrange     [0, 0.4)  [0.4, 0.6)  [0.6, 0.7)  [0.7, 1)
0.802 lies in [0.7, 1) → d, then T' = (T − 0.7)/0.3 = 0.34
0.34 lies in [0, 0.4) → a, then T'' = (T' − 0)/0.4 = 0.85
0.85 lies in [0.7, 1) → d
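The interval-subdivision procedure above can be sketched in Python. This floating-point version is for illustration only; practical arithmetic coders use integer arithmetic with renormalization to avoid precision loss.

```python
# Subranges from the table above
SUBRANGE = {"a": (0.0, 0.4), "b": (0.4, 0.6), "c": (0.6, 0.7), "d": (0.7, 1.0)}

def arith_encode(msg, sub=SUBRANGE):
    """Shrink [0,1) once per symbol; any tag inside the final interval encodes msg."""
    lo, hi = 0.0, 1.0
    for s in msg:
        r = hi - lo
        lo, hi = lo + r * sub[s][0], lo + r * sub[s][1]
    return (lo + hi) / 2  # midpoint of the final interval

def arith_decode(tag, length, sub=SUBRANGE):
    """Invert the encoding: locate the symbol, then rescale the tag."""
    out = []
    for _ in range(length):
        for s, (a, b) in sub.items():
            if a <= tag < b:
                out.append(s)
                tag = (tag - a) / (b - a)
                break
    return "".join(out)
```

Encoding 'dad' lands in [0.784, 0.82), whose midpoint is the tag 0.802 from the slides, and decoding 0.802 recovers 'dad'.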
Lossless Predictive Coding
Low computational overhead
Achieves good compression
Error free or lossy
Eliminates redundancies of closely spaced pixels
Extracts the new information in each pixel: the difference between the actual and predicted values
f̂(x, y) = round( Σ_{i=1}^{m} α_i f(x, y − i) )
e(x, y) = f(x, y) − f̂(x, y)
For a first-order predictor, f̂(x, y) = round( α f(x, y − 1) )
The histogram of the prediction error is far more peaked than the histogram of the image
Maximum compression achieved is 8:3.99 ≈ 2:1
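A sketch of the first-order predictor above along one image row (names are illustrative; α = 1 by default). The first pixel passes through unchanged and the rest become prediction errors, which concentrate near zero for smooth rows.

```python
import numpy as np

def predict_encode(row, alpha=1.0):
    """First-order lossless predictive encoder along one image row."""
    row = np.asarray(row, dtype=int)
    err = row.copy()
    err[1:] = row[1:] - np.round(alpha * row[:-1]).astype(int)
    return err

def predict_decode(err, alpha=1.0):
    """Rebuild the row by adding each error to the rounded prediction."""
    out = np.zeros_like(err)
    out[0] = err[0]
    for i in range(1, len(err)):
        out[i] = err[i] + int(round(alpha * out[i - 1]))
    return out
```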
Two views of the moving earth
Temporal predictor: f̂(x, y, t) = round( α f(x, y, t − 1) )
The standard deviation of the error is 3.76 bits/pixel, much less than the 15.58 bits/pixel standard deviation of one frame
The entropy of the error is 2.59 bits/pixel, less than the 3.99 bits/pixel entropy of one frame
Maximum compression achieved is 8:2.59 ≈ 3.1:1
Video Compression
Based on the temporal component, since video consists of a time sequence of images
Takes advantage of temporal correlation
There are different situations in which video compression becomes necessary, each requiring a solution specific to its conditions
Video compression algorithms and standards are therefore developed for different video communications applications
Video Compression
• View video as a sequence of correlated images
• Most of the video compression algorithms make use of
the temporal correlation to remove redundancy
• Previous reconstructed frame is used to generate a
prediction for the current frame
• Difference between the prediction and the current frame,
the prediction error or residual, is encoded and
transmitted to the receiver
• Previous reconstructed frame is also available at the
receiver
Video Compression
• For compression algorithms designed for two-way
communication, the coding delay should be minimal
• Compression and decompression should have about the
same level of complexity
• The complexity can be unbalanced in a broadcast
application with one transmitter and many receivers and
one-way communication
• In this case, the encoder can be much more complex than
the receiver
• There is also more tolerance for encoding delays
• For personal computers, the decoding complexity has to
be low for the decoder to decode a sufficient number of
images to give the illusion of motion
Motion Compensation
In most video sequences there is little change in the
contents of the image from one frame to the next
There is a significant portion of the image that does not
change from one frame to the next
Most video compression schemes take advantage of
this redundancy by using the previous frame to
generate a prediction for the current frame
Prediction based on pixel to pixel comparison may lead
to large data
The object in one frame that provides the pixel at a
certain location (i0, j0) with its intensity value may
provide the same intensity value in the next frame to a
pixel at location (i1, j1)
Block based Motion Compensation
Mean Absolute Distortion (MAD)
• The motion of the object is measured and encoded into motion vectors
• The search for a motion vector may be based on maximum correlation or minimum error between macroblock pixels and predicted pixels
MAD(x, y) = (1/mn) Σ_{i=1}^{m} Σ_{j=1}^{n} | f(x + i, y + j) − p(x + i + dx, y + j + dy) |
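A brute-force block-matching search that minimizes MAD over a small window might look like this; it is an exhaustive-search sketch with illustrative names, whereas real coders use faster search patterns.

```python
import numpy as np

def mad(block, cand):
    """Mean absolute distortion between a block and a candidate region."""
    return np.mean(np.abs(block.astype(float) - cand.astype(float)))

def best_motion_vector(prev, block, x, y, search=4):
    """Full search of a +/- search window in the previous frame around (x, y),
    returning the displacement (dx, dy) that minimizes MAD."""
    m, n = block.shape
    best, best_mv = np.inf, (0, 0)
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            i, j = x + dx, y + dy
            if 0 <= i <= prev.shape[0] - m and 0 <= j <= prev.shape[1] - n:
                d = mad(block, prev[i:i + m, j:j + n])
                if d < best:
                    best, best_mv = d, (dx, dy)
    return best_mv
```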
Motion Vector
Motion vector = (upper-left corner of the best matching block) − (upper-left corner of the block to be encoded)
The 8×8 block to be encoded has corner coordinates (10,9) and (17,16)
The block that shows the best match has corner coordinates (6,23) and (13,30)
The motion vector is (−4, 14)
A positive x component indicates that the best matching block is to the right of the block being encoded
A positive y component indicates that the best matching block is below the block being encoded
Motion Compensated Predictive Coding
[Figure: SD = 12.73, E = 4.17, C = 8/4.17 ≈ 1.92; with motion compensation SD = 5.62, E = 3.04, C = 8/3.04 ≈ 2.63]
Sub-pixel Motion Compensated Predictive Coding
[Figure: SD = 12.7, E = 4.17; with sub-pixel compensation SD = 4.4, E = 3.35; SD = 4, E = 3.34; SD = 3.8, E = 3.35]
Video Signal
• The composite color signal consists of the luminance component Y, the black-and-white signal:
Y = 0.299R + 0.587G + 0.114B
where R is the red component, G the green component, and B the blue component
• The two chrominance signals are
Cb = B − Y and
Cr = R − Y
• The three signals can be used by a color television set to generate the red, blue, and green signals
• The luminance signal can be used directly by black-and-white televisions
• The eye is much less sensitive to changes of chrominance in an image
Translation from RGB to YCbCr:
[ Y  ]   [  0.299000   0.587000   0.114000 ] [ R ]   [   0 ]
[ Cb ] = [ -0.168736  -0.331264   0.500002 ] [ G ] + [ 128 ]
[ Cr ]   [  0.500000  -0.418688  -0.081312 ] [ B ]   [ 128 ]
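The matrix form translates directly into code; a per-pixel sketch:

```python
import numpy as np

# Matrix and offsets from the RGB -> YCbCr translation above
M = np.array([[0.299000, 0.587000, 0.114000],
              [-0.168736, -0.331264, 0.500002],
              [0.500000, -0.418688, -0.081312]])
OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Convert one [R, G, B] triple to [Y, Cb, Cr]."""
    return M @ np.asarray(rgb, dtype=float) + OFFSET
```

For a gray pixel (R = G = B) the chrominance channels stay at the 128 midpoint, which is why chroma subsampling costs so little on low-color content.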
ITU-T Recommendation H.261
Motion Compensation
Size of block to reduce the no. of computations
• For a block of 2×2 squares, it is possible to find a block that exactly matches the 2×2 block that contains the circle
• For 4×4 squares, the block that contains the circle also contains the upper part of the octagon
• A similar 4×4 block cannot be found in the previous frame
• Thus, there is a trade-off between block size and prediction accuracy
Size of block for H.261
Loop Filter
Example: (110×1/4) + (218×1/2) + (116×1/4) = 165 (integer truncation of 165.5)
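The 1-2-1 filter in the example can be sketched along one row (integer arithmetic with truncation, matching the 165 above; the helper name is illustrative):

```python
def loop_filter_row(row):
    """Apply the 1-2-1 loop filter along one row; edge pixels pass through."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = (row[i - 1] + 2 * row[i] + row[i + 1]) // 4
    return out
```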
Quantization and coding
• In an intra block, the DC coefficient takes on much larger values than the other coefficients
• With little motion from frame to frame, the motion vectors have small values, leading to small values for the coefficients
• The H.261 algorithm allows 32 different quantizers
• Different quantizers may be used for different macroblocks
• One quantizer is reserved for the intra DC coefficient; the remaining 31 are used for the other coefficients
• The intra DC quantizer is a uniform quantizer with a step size of 8
• The other quantizers use an even step size between 2 and 62
A GOB with 33 blocks
Rate Control
Asymmetric applications
• The ITU-T H.261 algorithm is designed for videophone and videoconferencing applications
• Therefore, the algorithm operates with minimal coding delay (less than 150 milliseconds)
• There are a number of applications in which additional computation at the encoder is cost effective
• In multimedia applications where a video sequence is stored on a CD-ROM, decompression must be performed in real time
• However, the compression is performed only once, and there is no need for it to be in real time
• Thus, the encoding algorithm can be significantly more complex
The MPEG-1 Video Standard
I-frames
• I-frames resemble JPEG-encoded images
• Ideal starting points for the generation of prediction residuals
• Provide a high degree of random access, ease of editing, and resistance to propagation of transmission errors
• All standards require periodic insertion of I-frames into the compressed video stream
• I-frames do not use temporal correlation, so their compression ratio is low compared with frames that use temporal correlation for prediction
• Thus, the number of frames between two consecutive I-frames is a trade-off between compression efficiency and convenience
Motion Compensated Predictive Coding
for MPEG
B frames
• To compensate for the reduction in the amount of
compression due to the frequent use of I frames, the MPEG
standard uses B frames
• B frames achieve a high level of compression by using
motion-compensated prediction from the most recent
anchor frame and the closest future anchor frame
• Example: Video sequence in which there is a sudden
change between one frame and the next
• This is a common occurrence in TV advertisements
• In this situation, prediction based on past frames may be useless
• However, prediction based on future frames has a high probability of being accurate
Group of Pictures
• Different frames are organized together in a group of
pictures (GOP)
• A GOP is the smallest random access unit in the video
sequence
• Structure is set up as a trade-off between the high
compression efficiency of motion-compensated coding
and the fast picture acquisition capability of periodic
intra-only processing
• Must contain at least one I frame
• First I frame in a GOP is either the first frame of the
GOP, or is preceded by B frames that use motion-
compensated prediction only from this I frame
A typical group of pictures:
• The first frame is an I-frame, which is compressed without reference to any previous frame
• The next frame to be compressed is the fourth frame, which is compressed using motion-compensated prediction from the first frame
• Then frame two is compressed, using motion-compensated prediction from frames 1 and 4
A typical sequence of frames in display order
• The third frame is also compressed using motion-compensated prediction from the first and fourth frames
• The next frame to be compressed is frame seven, which uses motion-compensated prediction from frame four
• This is followed by frames 5 and 6, which are compressed using motion-compensated predictions from frames 4 and 7
• The processing order (the bitstream order) is thus different from the display order
Rate control in MPEG
MPEG Encoder
Specifications of MPEG
MPEG 1 to MPEG 2
Image compression using wavelet coding
• Subimage processing is not required
• Eliminates blocking artifacts
• Wavelet transforms are computationally efficient and have multiresolution capability
[Encoder] Input image → wavelet transform → quantizer → symbol encoder → compressed image
[Decoder] Compressed image → symbol decoder → inverse wavelet transform → decompressed image
Selection of wavelets
JPEG 2000
Used for still images and video
Uses the wavelet transform
Better compression than JPEG
A portion of the compressed image can be extracted for retransmission, storage, display and/or editing
Coefficient quantization depends on the scale and subband
Coefficients are arithmetically coded on a bit-plane basis
Bit-plane slicing isolates particular bits of the intensity value and shows the contribution of each bit
Higher-order bits usually contain most of the significant visual information
Lower-order bits contain subtle details
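Bit-plane slicing can be sketched with NumPy shifts and masks (the helper names are illustrative):

```python
import numpy as np

def bit_planes(img, bits=8):
    """Split an integer image into its bit planes, LSB first."""
    img = np.asarray(img, dtype=np.uint8)
    return [(img >> b) & 1 for b in range(bits)]

def from_planes(planes):
    """Reassemble the image from its bit planes."""
    return sum(p.astype(np.uint8) << b for b, p in enumerate(planes))
```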
1. Take an image with k-bit intensity levels
2. Level shift by −2^(k−1)
3. For a color image, perform level shifting on each of the three color components, e.g. R, G and B
• Decorrelate the level-shifted components (optional):
Y(x, y) = 0.299R(x, y) + 0.587G(x, y) + 0.114B(x, y)
Cb(x, y) = −0.16875R(x, y) − 0.33126G(x, y) + 0.5B(x, y)
Cr(x, y) = 0.5R(x, y) − 0.41869G(x, y) − 0.08131B(x, y)
• Histograms of the resulting components are peaked around zero
• The image / color components can be subdivided into tiles (optional)
1. Compute the 1-D DWT for the rows and columns of each tile
2. The transformation generates 4 subbands (WD, WV, WH, WØ)
3. The transformation can be repeated NL times to produce NL-scale coefficients
4. The lowest-scale approximation coefficient is the approximation of the original image
[NL-scale transform subband layout: the approximation a2LL(u,v) with detail subbands a2HL(u,v), a2LH(u,v), a2HH(u,v), nested inside a1HL(u,v), a1LH(u,v), a1HH(u,v)]
The coefficients are then coded: bit-plane slicing → three coding passes → arithmetic coding → code arranged into packets → layers
The decoder inverts the operations of the encoder
Reconstruct the subbands of the tile components from the arithmetically coded packets
Decode the subbands: out of Mb bit planes, Nb bit planes can be decoded
This is equivalent to quantizing the coefficients of the code block using a step size of 2^(Mb − Nb) · Δb
Uncoded bits are set to zero
The resulting nonzero coefficients q_b(u, v) are inverse quantized:
Rq_b(u, v) = ( q_b(u, v) + r · 2^(Mb − Nb(u,v)) ) · Δb   if q_b(u, v) > 0
Rq_b(u, v) = ( q_b(u, v) − r · 2^(Mb − Nb(u,v)) ) · Δb   if q_b(u, v) < 0
Rq_b(u, v) = 0                                           if q_b(u, v) = 0
where Rq_b(u, v) denotes the inverse-quantized transform coefficient and Nb(u, v) is the number of decoded bit planes for q_b(u, v)
The reconstruction parameter r is chosen by the decoder to produce the best visual quality; generally 0 ≤ r < 1, typically r = 1/2
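The inverse quantization rule translates directly into a scalar sketch (`dequantize` is an illustrative name, not part of any JPEG 2000 API):

```python
def dequantize(qb, Mb, Nb, delta, r=0.5):
    """Inverse-quantize one JPEG 2000 code-block coefficient qb when only
    Nb of its Mb bit planes were decoded; delta is the quantizer step size."""
    if qb == 0:
        return 0.0
    step = 2 ** (Mb - Nb)             # effective step-size multiplier
    sign = 1 if qb > 0 else -1
    return (qb + sign * r * step) * delta
```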
Compute the inverse FWT
Assemble the component tiles
Compute the inverse component transformation:
R(x, y) = Y(x, y) + 1.402Cr(x, y)
G(x, y) = Y(x, y) − 0.34413Cb(x, y) − 0.71414Cr(x, y)
B(x, y) = Y(x, y) + 1.772Cb(x, y)
A buffer smooths the output of the compression algorithm
A high-activity region of the video generates more than the average number of bits per second; to prevent the buffer from overflowing, this must be followed by the generation of fewer bits per second than the average
This is done by increasing the quantizer step size, or by dropping coefficients or entire frames
This may reduce the image quality
If an ATM network is not congested, it can accommodate the variable rate generated by the compression algorithm
Otherwise the compression algorithm must operate at a reduced rate
If the network is well designed, the compression algorithm need not operate at a lower rate, and the video coder can provide uniform quality
Congestion may cause long delays, so some packets arrive too late to be of any use
That is, the frame they were supposed to be part of may already have been reconstructed
To avoid these problems the video compression algorithm provides information in a layered fashion
A low-rate, high-priority layer is used to reconstruct the video, even though the reconstruction is poor
Low-priority enhancement layers enhance the quality of the reconstruction
[Figure: the base layer has the highest priority and the lowest data rate]
Encode the difference between the current frame and the prediction for the current frame using a 16×16 DCT
Transmit the DC coefficient and the three lowest-order AC coefficients to the receiver
These coded coefficients make up the highest-priority layer
At the transmitter the reconstructed frame is subtracted from the original
The sum of squared errors is calculated for each 16×16 block
Blocks with squared error greater than a prescribed threshold are subdivided into four 8×8 blocks, and the coding process is repeated using an 8×8 DCT
The coded coefficients of the 8×8 blocks make up the next layer
Since blocks that fail to meet the threshold test are subdivided, this information is transmitted as side information
The process is repeated with 4×4 blocks, to make the third layer, and 2×2 blocks, to make the fourth layer
The data rate for the first layer is constant; it is variable for the other layers
To remove the effect of delayed packets from the prediction, only the reconstruction from the higher-priority layers is used for prediction