
 Savings in storage and transmission

 Multimedia data (especially image and video) have large data volumes
 It is difficult to send real-time uncompressed video over current networks

 Accommodate relatively slow storage devices


 they do not allow playing back uncompressed
multimedia data in real time
 1x CD-ROM transfer rate ~ 150 kB/s
 320 x 240, 24-bit color video at 24 fps has a data rate of ~ 5.5 MB/s
 2-hour Standard Definition TV (SDTV) requires
(30x720x480x3 = 31,104,000 bytes/second) x (60x60) x 2
= 2.24x10^11 bytes
27 DVDs (8.5 GB each) are needed to store it
 HDTV has a resolution of 1920x1080x24 bits/image
 A broadband Internet connection of 12 Mbps will take about 0.03 sec to transfer a 128x128x24-bit image
 Televideo conferencing, remote sensing, document
imaging, and facsimile transmission require compressed data
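The raw-data arithmetic above is easy to reproduce. The short Python sketch below recomputes the SDTV storage figure and the 128x128 transfer time under the frame size, frame rate and link speed assumed on this slide; the function names are illustrative only.

```python
import math

def sdtv_two_hour_bytes(width=720, height=480, bytes_per_pixel=3, fps=30, hours=2):
    """Raw bytes for uncompressed SDTV: pixels/frame x bytes/pixel x frames/s x seconds."""
    return width * height * bytes_per_pixel * fps * hours * 3600

def transfer_time_seconds(width, height, bits_per_pixel, link_bps):
    """Seconds to push one uncompressed image through a link of link_bps bits per second."""
    return width * height * bits_per_pixel / link_bps

total = sdtv_two_hour_bytes()
print(f"2-hour SDTV: {total:.2e} bytes, about {math.ceil(total / 8.5e9)} DVDs of 8.5 GB")
print(f"128x128x24-bit image over 12 Mbps: {transfer_time_seconds(128, 128, 24, 12e6):.3f} s")
```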
 Remove redundancy from the data
 Mathematically: a 2D array of pixels is transformed into statistically uncorrelated data
 Data compression refers to the process of reducing the
amount of data required to represent a given quantity
of information
 Data ≠ Information
 Various amounts of data can be used to represent the
same information
 Data might contain elements that provide no relevant
information : data redundancy
 Data compression is based on data redundancy
Image Compression for a communication
System

6
Fidelity Criteria
• The error between the original image f(x,y) and the reconstructed image \hat{f}(x,y) is

  e(x, y) = \hat{f}(x, y) - f(x, y)

• The total error between the two images is

  \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x, y) - f(x, y)]

• The root-mean-square error averaged over the whole image is

  e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x, y) - f(x, y)]^2 \right]^{1/2}

• Expressed in the form of a mean-square signal-to-noise ratio,

  SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x, y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} [\hat{f}(x, y) - f(x, y)]^2}
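These fidelity criteria map directly onto a few lines of NumPy. A minimal sketch, with function names of my own choosing, not from the slides:

```python
import numpy as np

def rms_error(f, f_hat):
    """Root-mean-square error between original f and reconstruction f_hat."""
    diff = f_hat.astype(np.float64) - f.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

def snr_ms(f, f_hat):
    """Mean-square signal-to-noise ratio of the reconstruction."""
    f_hat = f_hat.astype(np.float64)
    noise = f_hat - f.astype(np.float64)
    return np.sum(f_hat ** 2) / np.sum(noise ** 2)

# Example with a small random 8-bit image and a slightly perturbed "reconstruction".
rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(64, 64))
f_hat = np.clip(f + rng.integers(-3, 4, size=f.shape), 0, 255)
print(rms_error(f, f_hat), snr_ms(f, f_hat))
```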
 Let n1 and n2 denote the number of information
carrying units in two data sets that represent the same
information
 The relative redundancy RD is define as :

  R_D = 1 - \frac{1}{C_R},   where   C_R = \frac{n_1}{n_2}   is the compression ratio

 If n1 = n2, C_R = 1 and R_D = 0: no redundancy
 If n1 >> n2, C_R → ∞ and R_D → 1: high redundancy
 If n1 << n2, C_R → 0 and R_D → -∞: undesirable
 A compression ratio of 10 (10:1) means that the first data
set has 10 information carrying bits for every 1 bit in the
compressed data set.

Types of Redundancy

• Statistical Redundancy
  • Interpixel Redundancy
    • Spatial Redundancy
    • Temporal Redundancy
  • Coding Redundancy
• Psychovisual Redundancy

11
  p(r_k) = \frac{n_k}{n}

where p(r_k) is the probability that a pixel has the value r_k and n_k is the number of times r_k occurs in the image of n pixels.
If the number of bits used to represent r_k is l(r_k), then the average code length is

  L_{avg} = \sum_{k=0}^{L-1} l(r_k) \, p(r_k)

Example (a variable-length code versus a natural 3-bit code):

  L_{avg} = 2(0.19) + 2(0.25) + 3(0.16) + ... + 6(0.02) = 2.7 bits

  C_R = \frac{3}{2.7} = 1.11

  R_D = 1 - \frac{1}{1.11} = 0.099
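The averaging is a one-liner in code. In the sketch below the middle probability/length pairs, which the slide elides with "...", are assumed for illustration and chosen so the total matches the slide's 2.7 bits against a 3-bit natural code.

```python
# Sketch: coding-redundancy arithmetic. The full table is assumed for illustration;
# the slide only shows the first and last terms of the sum.
probs   = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
lengths = [2,    2,    2,    3,    4,    5,    6,    6]

l_avg = sum(l * p for l, p in zip(lengths, probs))
c_r = 3 / l_avg            # baseline: natural code with 3 bits per symbol
r_d = 1 - 1 / c_r
print(f"L_avg = {l_avg:.2f} bits, C_R = {c_r:.2f}, R_D = {r_d:.3f}")
```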
Coding Redundancy

Variable
Length
Coding
Inter-pixel Redundancy

 Different images may have approximately the same histogram
 Pixel dependence can be exploited
 Each pixel can be estimated from its neighbors
Functional Block Diagram of
Image Compression

16
Classification
• Lossless
• Reconstructed image matches the original image without loss of information
• Compression ratio is lower
• Preferred in the medical field
• Huffman codes, LZW, arithmetic coding, 1-D and 2-D run-length encoding, lossless predictive coding, and bit-plane coding
• Lossy
• Quality of the reconstructed image is inferior
• Compression ratio is higher
• Used in multimedia applications

17
Basic Compression Methods
• 1-D run-length
• 2-D run-length
• Shannon-Fano
• Huffman (binary, non-binary, adaptive)
• Arithmetic
• Dictionary based (LZ77, LZ78)
• Predictive (delta encoding, delta modulation, DPCM)
• Transform based

18
Data Compression Standards
Huffman Coding

• Yields the smallest possible number of code bits per source symbol, among codes that encode symbols one at a time
• Symbols must be coded one at a time
• Symbols can be pixel intensities or the output of a mapper

23
Huffman Coding – Source Reduction
Combine the two lowest-probability symbols at each step until only two probabilities remain.

Symbol  prob   1     2     3     4
a2      0.4    0.4   0.4   0.4   0.6
a6      0.3    0.3   0.3   0.3   0.4
a1      0.1    0.1   0.2   0.3
a4      0.1    0.1   0.1
a3      0.06   0.1
a5      0.04

Huffman Coding – Code Assignment
Work backwards from the last reduction (0.6 → 0, 0.4 → 1), appending 0 and 1 each time a combined probability is split.

Symbol  prob   code     1          2          3         4
a2      0.4    1        0.4  1     0.4  1     0.4  1    0.6  0
a6      0.3    00       0.3  00    0.3  00    0.3  00   0.4  1
a1      0.1    011      0.1  011   0.2  010   0.3  01
a4      0.1    0100     0.1  0100  0.1  011
a3      0.06   01010    0.1  0101
a5      0.04   01011

Lavg = (0.4×1) + (0.3×2) + (0.1×3) + (0.1×4) + (0.06×5) + (0.04×5) = 2.2 bits/symbol

Entropy = -[(0.4×log2 0.4) + (0.3×log2 0.3) + 2(0.1×log2 0.1) + (0.06×log2 0.06) + (0.04×log2 0.04)]
        ≈ 2.14 bits/symbol
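The same construction can be automated. The sketch below builds a Huffman code for this six-symbol source using Python's heapq; Huffman codes are not unique, so the generated bit patterns may differ from the table above, but the code lengths and the 2.2 bits/symbol average will match.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code; probs is {symbol: probability}. Returns {symbol: bitstring}."""
    # Each heap entry: (probability, tie-breaker, {symbol: code-so-far})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a2": 0.4, "a6": 0.3, "a1": 0.1, "a4": 0.1, "a3": 0.06, "a5": 0.04}
code = huffman_code(probs)
l_avg = sum(probs[s] * len(code[s]) for s in probs)
print(code, f"L_avg = {l_avg:.2f} bits/symbol")
```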
Block Transform Coding
Transform Selection

• Based on the amount of reconstruction error that can be tolerated (after quantization) and on the available computational resources

42
Basis Functions
Approximation of images using 8x8 subimages and 50% of the coefficients
Rms errors: 2.32, 1.78, 1.13


Selection of Transforms
• The information packing ability of the DCT is superior to that of the DFT and WHT for almost all images
• The KL transform is the optimal transform
• The KLT minimises MSE for any input image and any number of retained coefficients
• However, the KLT is data dependent, i.e., its basis functions must be computed for each data block
• The DFT, DCT, WHT, … have fixed basis images
• The WHT is the simplest to implement
• The DFT and DCT more closely approximate the information packing ability of the KLT

45
Selection of Subimage Size
• Computational complexity increases, while reconstruction error decreases, as the subimage size increases
• Correlation (redundancy) between adjacent subimages should be as low as possible

46
Effect of Subimage Size
Block Processing
Processing of entire image
 DCT of the entire image requires large memory
 Block-based DCT compression is effective because statistics vary spatially within an image
Block processing
 Small blocks are much easier to compute than the complete image
 Usually, pixel correlation does not exceed 16 or 32 pixels

48
Block Processing
 Block artifacts caused by the discontinuities
appear due to rectangular windows
 Block artifacts can be minimized by
• Overlapping blocks
• Lowpass filtering of boundary pixels
 Overlapping increases the bit rate and hence lowers the compression ratio
 Lowpass filtering results in blurring

49
Zonal coding
 Transform coefficients of maximum variance
carry the most picture information
 Locations of coefficients with the k largest
variances are indicated by zonal mask
 Locations are same for all blocks
 Transform coefficients in the zone are retained
 Others are converted to zero
 Variances are calculated based on a global
image model

50
Threshold coding

 Each transform coefficient is compared with a threshold
 A coefficient is set to zero or retained depending on the comparison
 The threshold can be global or local

51
Threshold and Zonal coding
Truncating, quantizing and coding coefficients
Retaining the 8 largest coefficients of each subimage (threshold coding): e = 4.5
Retaining the 8 coefficients of largest variance for each subimage (zonal coding): e = 6.5
Zonal Coding Implementation
Ways to threshold the transformed image: global, local and dynamic
Approximations using the DCT and normalization array, at different compression ratios C and rms errors e:
C=19, e=3.83; C=30, e=4.93; C=49, e=6.62; C=85, e=9.35; C=85, e=13.94; C=182, e=22.46
 For still photographic images
 JPEG standard is a collaboration among :
 International Telecommunication Union (ITU)
 International Organization for Standardization (ISO)
 International Electrotechnical Commission (IEC)

57
 Lossy compression technique
 Is based on DCT
 General image compression technique
independent of
 Image resolution
 Image and pixel aspect ratio
 Color system
 Image complexity
 A scheme for video compression based on JPEG called
Motion JPEG (MJPEG) exists
 Effective because of
1. Image data usually changes slowly across an image, especially within an 8x8 block
 Therefore images contain much redundancy
2. The human eye is not very sensitive to high-frequency content in images
 Therefore some data can be removed after transform coding
3. The human eye is much more sensitive to brightness (luminance) information than to color (chrominance)
 Therefore JPEG uses chroma subsampling (4:2:0)
 For JPEG compatibility, a system must include support for the baseline system
 Three types
1. Lossy baseline coding system
based on the DCT
Adequate for most compression applications
2. Extended coding system
Greater compression and higher precision
3. Lossless independent coding system for reversible compression
 Compression is performed in 3 steps
1. DCT computation
2. Quantization
3. Variable-length code assignment
 Input and output data precision is 8 bits
 Quantized DCT values are up to 11 bits
8x8 block
1. Image
2. Divide into 8x8 subimages
3. Level shift by subtracting 2^(k-1) from each pixel (with 2^k intensity levels)
4. Compute the 2-D DCT
5. Quantize
6. Reorder using the zigzag pattern to generate a 1-D sequence
7. Encode nonzero AC coefficients using variable-length codes
8. Encode the difference between the DC coefficients of the current and previous subimage
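Steps 3-5 of this pipeline can be sketched in a few lines of NumPy. The code below level-shifts an 8x8 block, applies a 2-D DCT built from an explicit DCT-II matrix, and quantizes with a matrix Z; the uniform Z used here is a placeholder, not the standard JPEG luminance table.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that the 2-D DCT of block B is C @ B @ C.T."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_block(block, z, k_bits=8):
    """Level shift, 2-D DCT and quantization of one 8x8 block (JPEG steps 3-5)."""
    c = dct_matrix(block.shape[0])
    shifted = block.astype(np.float64) - 2 ** (k_bits - 1)   # step 3: level shift
    t = c @ shifted @ c.T                                     # step 4: 2-D DCT
    return np.round(t / z).astype(int)                        # step 5: quantize

# Placeholder quantization matrix (uniform step of 16), not the JPEG default table.
z = np.full((8, 8), 16)
block = np.random.default_rng(1).integers(0, 256, size=(8, 8))
print(encode_block(block, z))
```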
 The image is divided up into 8x8 blocks
 2D DCT is performed on each block
 The DCT is performed independently for each block
 When a high degree of compression is requested,
JPEG gives a “blocky” image result
 Quantization in JPEG aims at reducing the total
number of bits in the compressed image
 Divide each entry in the frequency space block by an
integer, then round
 Use a quantization matrix Q(u, v)
• Sensitivity of the eye varies with spatial frequency
• The amplitude threshold below which the eye cannot detect a spatial frequency also varies
• Threshold values differ for each of the 64 DCT coefficients
• The quantization table implements these different threshold values
• The choice of threshold values is a compromise between the required level of compression and the information loss that is acceptable

68
 JPEG standard has two quantization tables for
the luminance and the chrominance
coefficients.
 However, customized tables are allowed and
can be sent with the compressed image
 Multiple quantization matrices can be used (by
scaling the defaults), allowing the user to
choose how much compression to use
 Trades off quality vs. compression ratio
 More compression means larger entries in Q
 Use larger entries in Q for the higher spatial
frequencies
 High frequency entries are at the lower right
part of the matrix
 Default Q(u, v) values for luminance and
chrominance are
 Based on psychophysical studies intended to maximize
compression ratios while minimizing perceptual
distortion
 After division the entries are smaller, fewer bits can be
used to encode them
52 55 61 66 70 61 64 73
63 59 66 90 109 85 69 72
62 59 68 113 144 104 66 73
63 58 71 122 154 106 70 69
67 61 68 104 126 88 68 70
79 65 60 70 77 63 58 75
85 71 64 59 55 61 65 83
87 79 69 68 65 76 78 94

• Has 2^8 possible intensity levels
• k = 8
• Level shift by -2^(k-1) = -2^7 = -128
72
Subimage, f(x,y):

 52  55  61  66  70  61  64  73
 63  59  66  90 109  85  69  72
 62  59  68 113 144 104  66  73
 63  58  71 122 154 106  70  69
 67  61  68 104 126  88  68  70
 79  65  60  70  77  63  58  75
 85  71  64  59  55  61  65  83
 87  79  69  68  65  76  78  94

Level shifted, f(x,y) - 128:

-76 -73 -67 -62 -58 -67 -64 -55
-65 -69 -62 -38 -19 -43 -59 -56
-66 -69 -60 -15 -16 -24 -62 -55
-65 -70 -57  -6  26 -22 -58 -59
-61 -67 -60 -24  -2 -40 -60 -58
-49 -63 -68 -58 -51 -65 -70 -53
-43 -57 -64 -69 -73 -67 -63 -45
-41 -49 -59 -60 -63 -52 -50 -34

73
DCT of the level-shifted block, rounded to the nearest integer, T(u,v):

-415  -29  -62   25   55  -20   -1   -3
   7  -21  -62    9   11   -7   -6    6
 -46    8   77  -25  -30   10    7   -5
 -50   13   35  -15   -9    6    0    3
  11   -8  -13   -2   -1    1   -4    1
 -10    1    3   -3   -1    0    2   -1
  -4   -1    2   -1    2   -3    1   -2
  -1   -1   -1   -2   -1   -1    0   -1

Normalization matrix, Z(u,v): the JPEG default luminance quantization table

75
Quantized coefficients, N(u,v) = round{T(u,v)/Z(u,v)}; e.g. N(0,0) = round(-415/16) = -26:

-26  -3  -6   2   2   0   0   0
  1  -2  -4   0   0   0   0   0
 -3   1   5  -1  -1   0   0   0
 -4   1   2  -1   0   0   0   0
  1   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

A large number of coefficients have zero value.

Reordered using the zigzag pattern (the remaining values are zeroes):
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]

77
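The zigzag reordering of step 6 can be generated programmatically. A minimal sketch: sort the 64 (row, column) index pairs by anti-diagonal, alternating the traversal direction on every other diagonal. Applied to the quantized block above it reproduces the bracketed sequence (before run-length/EOB coding).

```python
def zigzag_indices(n=8):
    """(row, col) pairs in JPEG zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                       # s indexes the anti-diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate direction per diagonal
    return order

def zigzag(block):
    return [block[i][j] for i, j in zigzag_indices(len(block))]
```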
Range                                    DC difference category   AC category
0                                        0                        N/A
-1, 1                                    1                        1
-3, -2, 2, 3                             2                        2
-7, ..., -4, 4, ..., 7                   3                        3
-15, ..., -8, 8, ..., 15                 4                        4
-31, ..., -16, 16, ..., 31               5                        5
:                                        :                        :
-511, ..., -256, 256, ..., 511           9                        9
-1023, ..., -512, 512, ..., 1023         A                        A
:                                        :                        :
-32767, ..., -16384, 16384, ..., 32767   F                        N/A

78

Category   Base code   Length      Category   Base code    Length
0          010         3           6          1110         10
1          011         4           7          11110        12
2          100         5           8          111110       14
3          00          5           9          1111110      16
4          101         7           A          11111110     18
5          110         8           B          111111110    20

79
Reordered using zigzag pattern
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]

• Assume DC coefficient of previous subimage is -17


• DC difference is -26-(-17) = -9
• -9 is in category 4
• The default Huffman difference code for category 4 is 101, with total length 7
• The remaining 4 bits are the LSBs of the sequence number of -9
• -9 is the 7th entry in the sequence (-15, -14, ..., -8, 8, 9, ..., 15)
• Sequence numbering starts from 0, so the 7th entry has sequence number 6 = 0110
• The complete code word is 1010110

80
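The category/appended-bits logic above is simple to express in code. The sketch below computes the category of a DC difference and its appended bits exactly as the slide describes (sequence position within the category, numbered from 0); the base-code dictionary is abbreviated to the entries listed in the earlier table.

```python
# Sketch of the JPEG DC-difference coding described above.
DC_BASE = {0: "010", 1: "011", 2: "100", 3: "00", 4: "101", 5: "110",
           6: "1110", 7: "11110", 8: "111110", 9: "1111110"}

def dc_category(diff):
    """Smallest c with |diff| <= 2^c - 1 (category 0 is used only for diff == 0)."""
    c = 0
    while abs(diff) > (1 << c) - 1:
        c += 1
    return c

def dc_appended_bits(diff, c):
    """c-bit sequence number of diff within its category; negative values are numbered first."""
    if c == 0:
        return ""
    index = diff + (1 << c) - 1 if diff < 0 else diff
    return format(index, f"0{c}b")

diff = -26 - (-17)                      # current DC minus previous DC = -9
c = dc_category(diff)                   # category 4
print(DC_BASE[c] + dc_appended_bits(diff, c))   # -> 1010110
```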
Run/Category   Base code           Length
0/0 (EOB)      1010                4
0/1            00                  3
0/2            01                  4
:              :                   :
0/A            1111111110000011    26
1/1            1100                5
1/2            111001              8
:              :                   :
1/A            1111111110001000    26
2/1            11011               6
:              :                   :
2/A            1111111110001111    26
:              :                   :
F/A            1111111111111110    26

81
• Reordered sequence is
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]
• The first nonzero AC coefficient is -3
• Its category is 2
• It is preceded by 0 zeroes, therefore the run is 0
• The code for 0/2 from the table is 01, with total length 4
• -3 is the zeroth element of its category sequence, therefore the last 2 bits are 00
• The array of code words so far is
• [1010110 0100, ...]

82
• Reordered sequence is
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]
• The next AC coefficient is 1
• Its category is 1
• It is preceded by 0 zeroes, therefore the run is 0
• The code for 0/1 from the table is 00, with total length 3
• 1 is the second element of its category sequence (-1, 1), therefore the last bit is 1
• The array of code words so far is
• [1010110 0100 001 ...]

83
• Reordered sequence is
[-26 -3 1 -3 -2 -6 2 -4 1 -4 1 1 5 0 2 0 0 -1 2 0 0 0 0 0 -1 -1 EOB]
• Complete Array of code words is
• [1010110 0100 001 0100 0101 100001 0110 100011
001 100011 001 001 100101 11100110 110110 0110
11110100 000 1010]
• Image requires 92 bits
• C = 8x8x8/92 = 5.6:1

84
 A lookup table of Huffman codes is used to decode the data
 Regenerate the array of quantized coefficients
-26 -3 -6 2 2 0 0 0
1 -2 -4 0 0 0 0 0
-3 1 5 -1 -1 0 0 0
-4 1 2 -1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Decode denormalize IDCT levelshift by +128errorrms error


85
 Multiply by normalization matrix

-26 -3 -6 2 2 0 0 0
1 -2 -4 0 0 0 0 0
-3 1 5 -1 -1 0 0 0
-4 1 2 -1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Decode denormalize IDCT levelshift by +128errorrms error


86
 denormalized matrix

-416 -33 -60 32 48 0 0 0

12 -24 -56 0 0 0 0 0
-42 13 80 -24 -40 0 0 0
-56 17 44 -29 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0

Decode denormalize IDCT levelshift by +128errorrms error


87
-70 -64 -61 -64 -69 -66 -58 -50
-72 -73 -61 -39 -30 -40 -54 -59
-68 -78 -58 -9 13 -12 -48 -64
-59 -77 -57 0 22 -13 -51 -60
-54 -75 -64 -23 -13 -44 -63 -56
-52 -71 -72 -54 -54 -71 -71 -54
-45 -59 -70 -68 -67 -67 -61 -50
-35 -47 -61 -66 -60 -48 -44 -44

Decode denormalize IDCT levelshift by +128errorrms error

88
58 64 67 64 59 62 70 78
56 55 67 89 98 88 74 69
60 50 70 119 141 116 80 64
69 51 71 128 149 115 77 68
74 53 64 105 115 84 65 72
76 57 56 74 75 57 57 74
83 69 59 60 61 61 67 78
93 81 67 62 69 80 84 84

Decode denormalize IDCT levelshift by +128errorrms error


89
-6 -9 -6 2 11 -1 -6 -5
7 4 -1 1 11 -3 -5 3
2 9 -2 -6 -3 -12 -14 9
-6 7 0 -4 -5 -9 -7 1
-7 8 4 -1 6 4 3 -2
3 8 4 -4 2 6 1 1
2 2 5 -1 -6 0 -2 5
-6 -2 2 6 -4 -4 -6 10

Root mean square error is 5.8

Decode denormalize IDCT levelshift by +128errorrms error


90
JPEG encoder block diagram:
Color components (Y, Cb, or Cr) → centering (level shift) → FDCT → quantizer (quantization table)
AC coefficients → zig-zag reordering → Huffman coding (Huffman table)
DC coefficients → difference encoding → Huffman coding (Huffman table)
→ JPEG bit-stream

97
 For nonnegative integers
 Based on the assumption that the larger an integer, the lower its probability of occurrence
 n is a nonnegative integer
 The simplest form is the unary code
 The unary code of n is n 1s followed by a '0'
 For n = 4, the code is 11110
 The Golomb code of n with respect to m is Gm(n), with m > 0 and n ≥ 0
 The unary code is G1(n)
98
 Write n = qm + r, so that r = n - qm = n mod m, with r = 0, 1, ..., m-1
 Form the unary code of q
 k = ceil(log2 m)
 c = 2^k - m
 r' = r truncated to k-1 bits for 0 <= r < c
 r' = r + c truncated to k bits for the other values of r
 Concatenate the codes of q and r'
 For m = 2^k, c = 0 and r is truncated to k bits; these are called Golomb-Rice codes
99
 Ex: m = 5, n = 7
 7 = 1x5 + 2, so q = 1 and r = 2
 k = ceil(log2 5) = 3
 c = 2^3 - 5 = 3
 q is coded as 10 (unary)
 Since r = 2 < c, r is truncated to k-1 = 2 bits: 10
 For n = 7, the code is 1010

100
 For G4(9), m = 4 and n = 9
 n = qm + r, r = 0, 1, ..., m-1
 9 = 2x4 + 1, so q = 2 and r = 1
 q is coded as 110 (unary)
 k = ceil(log2 4) = 2 bits
 c = 2^2 - 4 = 0
 Since c = 0, r = 1 is truncated to k = 2 bits: 01
 The code is 11001

101
1. Find an integer i >= 0 such that
   \sum_{j=0}^{i-1} 2^{j+k} \le n < \sum_{j=0}^{i} 2^{j+k}
   and form the unary code of i.
   If k = 0 then i = floor(log2(n + 1))
2. Truncate the binary representation of
   n - \sum_{j=0}^{i-1} 2^{j+k}
   to its k + i least significant bits
3. Concatenate the results of steps 1 and 2

102
Example: the order-0 exponential Golomb code of n = 8 (k = 0)
1. i = floor(log2(n + 1)) = floor(log2 9) = 3
   The unary code of i is 1110
2. Truncate the binary representation of
   n - \sum_{j=0}^{i-1} 2^{j} = 8 - 7 = 1
   to k + i = 3 least significant bits: 001
3. Concatenating the results of steps 1 and 2, the code is 1110 001

103
n G1(n) G2(n) G4(n) Gexp0(n)
0 0 00 000 0
1 10 01 001 100
2 110 100 010 101
3 1110 101 011 11000
4 11110 1100 1000 11001
5 111110 1101 1001 11010
6 1111110 11100 1010 11011
7 11111110 11101 1011 1110000
8 111111110 111100 11000 1110001
9 1111111110 111101 11001 1110010

104
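The Golomb construction is compact enough to check against the table above. The sketch below implements Gm(n) exactly as described (unary quotient plus truncated remainder) and can be compared with the G1, G2 and G4 columns; the exponential Golomb column is not covered here.

```python
import math

def unary(q):
    return "1" * q + "0"

def golomb(n, m):
    """Golomb code G_m(n) for nonnegative n and m > 0; G_1(n) is the unary code."""
    q, r = divmod(n, m)
    if m == 1:
        return unary(q)
    k = math.ceil(math.log2(m))
    c = (1 << k) - m
    if r < c:
        rem = format(r, f"0{k - 1}b")        # r fits in k-1 bits
    else:
        rem = format(r + c, f"0{k}b")        # otherwise code r + c in k bits
    return unary(q) + rem

for n in range(10):
    print(n, golomb(n, 1), golomb(n, 2), golomb(n, 4))
```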
 Interval 0 to 1 is divided according to the
probabilities of the occurrence of intensities
 Does not generate codes for each character
 Performs arithmetic operation on a block of
data
 Possible to encode characters with a fractional
number of bits thus approaching the theoretical
optimum
 Works better than Huffman coding for sources with highly skewed (low-entropy) probabilities
105
 A source emits 4 symbols {a, b, c, d} with the
probabilities 0.4, 0.2, 0.1 and 0.3 respectively.
 Encode the word 'dad'

Symbol a b c d
Probability 0.4 0.2 0.1 0.3
Subrange (0-0.4) (0.4-0.6) (0.6-0.7) (0.7-1)

a b c d

0 0.4 0.6 0.7 1

106
Encoding 'd': the current interval becomes [0.7, 1), with range R = 1 - 0.7 = 0.3.
Subdividing [0.7, 1) in proportion to the symbol probabilities (new L = L + R x L_symbol, new H = L + R x H_symbol):

a: L = 0.7 + 0.3x0   = 0.7,   H = 0.7 + 0.3x0.4 = 0.82
b: L = 0.7 + 0.3x0.4 = 0.82,  H = 0.7 + 0.3x0.6 = 0.88
c: L = 0.7 + 0.3x0.6 = 0.88,  H = 0.7 + 0.3x0.7 = 0.91
d: L = 0.7 + 0.3x0.7 = 0.91,  H = 0.7 + 0.3x1   = 1

110
Encoding 'a': the interval becomes [0.7, 0.82), with range R = 0.82 - 0.7 = 0.12.
Subdividing [0.7, 0.82):

a: L = 0.7 + 0.12x0   = 0.7,    H = 0.7 + 0.12x0.4 = 0.748
b: L = 0.7 + 0.12x0.4 = 0.748,  H = 0.7 + 0.12x0.6 = 0.772
c: L = 0.7 + 0.12x0.6 = 0.772,  H = 0.7 + 0.12x0.7 = 0.784
d: L = 0.7 + 0.12x0.7 = 0.784,  H = 0.7 + 0.12x1   = 0.82

111
Encoding the final 'd' restricts the interval to [0.784, 0.82).

• Tag = (0.784 + 0.82)/2 = 0.802
• The tag and the symbol probabilities are sent to the receiver

113
Symbol a b c d
Probability 0.4 0.2 0.1 0.3
Subrange (0-0.4) (0.4-0.6) (0.6-0.7) (0.7-1)

0 0.802 1

Tag value corresponds to the subrange of d

114
Symbol a b c d
Probability 0.4 0.2 0.1 0.3
Subrange (0-0.4) (0.4-0.6) (0.6-0.7) (0.7-1)

Decoding:
• T = 0.802 lies in the subrange (0.7-1), so the first symbol is d
• T' = (T - 0.7)/0.3 = 0.34 lies in (0-0.4), so the next symbol is a
• T' = (0.34 - 0)/0.4 = 0.85 lies in (0.7-1), so the next symbol is d

115
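The interval arithmetic above can be scripted directly. The sketch below encodes 'dad' with the slide's probability table and then decodes the tag; plain floats are adequate at this tiny scale, though a real coder would use scaled integer arithmetic.

```python
# Sketch of the arithmetic-coding example; subranges as in the slide's table.
subrange = {"a": (0.0, 0.4), "b": (0.4, 0.6), "c": (0.6, 0.7), "d": (0.7, 1.0)}

def encode(message):
    low, high = 0.0, 1.0
    for s in message:
        r = high - low
        s_low, s_high = subrange[s]
        low, high = low + r * s_low, low + r * s_high
    return (low + high) / 2            # tag = midpoint of the final interval

def decode(tag, n_symbols):
    out = []
    for _ in range(n_symbols):
        for s, (s_low, s_high) in subrange.items():
            if s_low <= tag < s_high:
                out.append(s)
                tag = (tag - s_low) / (s_high - s_low)   # rescale the tag
                break
    return "".join(out)

tag = encode("dad")
print(f"{tag:.3f}", decode(tag, 3))    # 0.802 dad
```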
 Low computational overhead
 Achieves good compression
 Error free or lossy
 Eliminates redundancies of closely spaced
pixels
 Extracts new information in each pixel
 New information is difference between actual
and predicted
Loss-less Predictive Encoding

 The prediction is a rounded linear combination of m previous pixels:

  \hat{f}(x, y) = round\left[ \sum_{i=1}^{m} \alpha_i \, f(x, y - i) \right]

 The previous pixels can come from the same line (1-D), the previous scan line (2-D) or the previous frame (3-D)
 If m = 1 it is the differential or previous-pixel predictor, and the predictor has order 1:

  \hat{f}(x, y) = round[\alpha f(x, y - 1)]
  e(x, y) = f(x, y) - \hat{f}(x, y)
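A first-order previous-pixel predictor is a one-liner per row. The sketch below forms the prediction error image e(x, y) with α = 1 and compares the entropy of the image with that of the error, the same kind of comparison made on the next slides; the synthetic test image and the entropy helper are my own, not from the slides.

```python
import numpy as np

def prediction_error(image, alpha=1.0):
    """First-order predictor f_hat(x, y) = round(alpha * f(x, y-1)); the first column is left unpredicted."""
    f = image.astype(np.int32)
    e = f.copy()
    e[:, 1:] = f[:, 1:] - np.round(alpha * f[:, :-1]).astype(np.int32)
    return e

def entropy_bits(values):
    """First-order entropy of the value histogram, in bits per pixel."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Synthetic smooth test image: strong correlation between horizontal neighbours.
x = np.arange(256)
img = (np.add.outer(x, x) // 2 % 256).astype(np.uint8)
e = prediction_error(img)
print(entropy_bits(img), entropy_bits(e))    # the error image has much lower entropy
```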
First order Predictive Encoding
Image f(x,y) and its histogram; prediction error image e(x,y) and its histogram, where

  \hat{f}(x, y) = round[\alpha f(x, y - 1)],   e(x, y) = f(x, y) - \hat{f}(x, y)

The error image is displayed with an offset of 128, so that an error of 0 appears as mid-gray; its mean value before the offset is removed is 128.26
First order Predictive Encoding

 The standard deviation of the prediction error is much less than that of the original image
 The entropy of the prediction error (3.99 bits/pixel) is less than that of the image, which is 7.25 bits/pixel
 Encoding the error directly requires k+1 bits
 The maximum compression achievable is 8/3.99 ≈ 2:1
Two views of the moving Earth (temporal prediction between frames):

  \hat{f}(x, y, t) = round[\alpha f(x, y, t - 1)]

 The standard deviation of the prediction error (3.76) is much less than the standard deviation of one frame (15.58)
 The entropy of the error is 2.59 bits/pixel, which is less than the 3.99 bits/pixel entropy of one frame
 The maximum compression achieved is 8/2.59 ≈ 3.1:1
Video Compression
 Based on the temporal component, since video consists of a
time sequence of images
 Take advantage of temporal correlation
 There are different situations in which video
compression becomes necessary
 Each requires a solution specific to its peculiar
conditions
 video compression algorithms and standards are
developed for different video communications
applications

124
Video Compression

• Video produces the largest amount of data


• For a video sequence generated using the CCIR 601 format, each image frame is made up of more than a quarter million pixels
• At 30 frames per second and 16 bits per pixel
• This corresponds to a data rate of about 21 Mbytes or 168 Mbits per second
• Speech coding uses data rates of 2.4, 4.8, and 16 kbits per
second
• Video compression can be viewed as image compression
with a temporal component

125
Video Compression
• View video as a sequence of correlated images
• Most of the video compression algorithms make use of
the temporal correlation to remove redundancy
• Previous reconstructed frame is used to generate a
prediction for the current frame
• Difference between the prediction and the current frame,
the prediction error or residual, is encoded and
transmitted to the receiver
• Previous reconstructed frame is also available at the
receiver

126
Video Compression
• For compression algorithms designed for two-way communication, the coding delay should be minimal
• Compression and decompression should have about the
same level of complexity
• The complexity can be unbalanced in a broadcast
application with one transmitter and many receivers and
one-way communication
• In this case, the encoder can be much more complex than
the receiver
• There is also more tolerance for encoding delays
• For personal computers, the decoding complexity has to
be low for the decoder to decode a sufficient number of
images to give the illusion of motion

127
Video Compression

• Encoding is generally not done in real time


• Therefore encoder can be quite complex
• When the video is to be transmitted over packet
networks, the effects of packet loss have to be taken into
account
• Thus, each application presents unique requirements

128
Video Compression

• Receiver uses information to generate the prediction


values and add them to the prediction error to generate
the reconstruction
• The prediction operation in video coding is based on the
motion of the objects in the frame
• Known as motion compensation

129
Motion Compensation
 In most video sequences there is little change in the
contents of the image from one frame to the next
 There is a significant portion of the image that does not
change from one frame to the next
 Most video compression schemes take advantage of
this redundancy by using the previous frame to
generate a prediction for the current frame
 Prediction based on pixel to pixel comparison may lead
to large data
 The object in one frame that provides the pixel at a
certain location (i0, j0) with its intensity value may
provide the same intensity value in the next frame to a
pixel at location (i1, j1)

130
Motion Compensation

Take the motion of objects in the image


into account

131
Block based Motion Compensation

 Frame is divided into blocks of size MxM.


 For each block, search previous frame for the block of
size MxM that most closely matches the block of
current frame
 Measure closeness/distance by the sum of absolute
differences between corresponding pixels in two blocks
 Can use square of differences instead of absolute
difference

132
Mean Absolute Distortion (MAD)
• Motion of the object is measured and encoded
into motion vectors
• Search for motion vector may be based on
maximum correlation or minimum error
between macroblock pixels and predicted pixels
  MAD(x, y) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left| f(x + i, y + j) - p(x + i + dx, y + j + dy) \right|

• Sum of Absolute Distortions (SAD)

  SAD(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{n} \left| f(x + i, y + j) - p(x + i + dx, y + j + dy) \right|
133
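Exhaustive block matching with the SAD criterion can be sketched as below: for every candidate displacement inside a ±search_range window, compare the current block with the displaced block in the previous frame and keep the displacement with the smallest SAD. Function and parameter names are illustrative. Here (dx, dy) is the position of the best match relative to the block being encoded, matching the motion-vector definition used on the following slides.

```python
import numpy as np

def sad(block, candidate):
    return int(np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum())

def best_motion_vector(current, previous, x, y, block=8, search_range=7):
    """Full search: returns (dx, dy) minimising SAD for the block whose top-left corner is (x, y)."""
    cur = current[x:x + block, y:y + block]
    best = (0, 0)
    best_cost = sad(cur, previous[x:x + block, y:y + block])
    h, w = previous.shape
    for dx in range(-search_range, search_range + 1):
        for dy in range(-search_range, search_range + 1):
            px, py = x + dx, y + dy
            if 0 <= px and 0 <= py and px + block <= h and py + block <= w:
                cost = sad(cur, previous[px:px + block, py:py + block])
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost
```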
Block based Motion Compensation

 If distance > threshold then block is uncompensable.


Encode it without the benefit of prediction
 Otherwise transmit motion vector
 Motion vector is relative location of two blocks
 Motion vector = upper left corner of best matching
block - upper left corner of block to be encoded

134
Motion Vector
 upper left corner of best matching block - upper left
corner of block to be encoded
 8x8 block to be encoded has coordinates (10,9) and
(17,16)
 Block that shows best matching has coordinates (6,23)
and (13,30)
 Motion vector is (-4, 14)
 Positive x component shows best matching block is to
the right of the block being encoded
 Positive y component shows best matching block is
below the block being encoded

135
Motion Compensation

• Transmit only motion vector.


• Current frame can be completely predicted by the previous frame

136
Motion Compensation

• Displacement between the block being encoded and


the best matching block is an integer number of
pixels in the horizontal and vertical directions
• There are algorithms in which the displacement is
measured in half pixels
• Pixels of the coded frame being searched are
interpolated to obtain twice as many pixels as in the
original frame
• Image with double size is then searched for the best
matching block

137
Motion Compensation

138
Motion Compensated Predictive Coding

SD=12.73
E=4.17
C=8/4.17
= 1.92
SD=5.62
E=3.04
C=8/3.04
= 2.63.
Sub pixel Motion Compensated Predictive Coding

SD=12.7
E=4.17

SD=4.4
E=3.35

SD=4
E=3.34

SD=3.8
E=3.35
Video Signal
• The composite color signal consists of luminance
component, Y, black-and-white signal
Y = 0.299R+0.587G+0.114B
where R is the red component, G is the green component,
and B is the blue component.
• Two chrominance signals are
Cb = B−Y and
Cr = R−Y
• Three signals can be used by the color television set to
generate the red, blue, and green signals
• The luminance signal can be used directly by the black-
and-white televisions
• Eye is much less sensitive to changes of the chrominance
in an image
141
(a) Translation from RGB to YCbCr:

  [ Y  ]   [  0.299000   0.587000   0.114000 ] [ R ]   [   0 ]
  [ Cb ] = [ -0.168736  -0.331264   0.500002 ] [ G ] + [ 128 ]
  [ Cr ]   [  0.500000  -0.418688  -0.081312 ] [ B ]   [ 128 ]

(b) Translation from YCbCr to RGB:

  [ R ]   [ 1.0   0.0       1.40210 ] [ Y        ]
  [ G ] = [ 1.0  -0.34414  -0.71414 ] [ Cb - 128 ]
  [ B ]   [ 1.0   1.77180   0.0     ] [ Cr - 128 ]

142
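The two matrices above are inverses of each other up to rounding, which is easy to check numerically. A minimal sketch, assuming NumPy:

```python
import numpy as np

RGB_TO_YCBCR = np.array([[ 0.299000,  0.587000,  0.114000],
                         [-0.168736, -0.331264,  0.500002],
                         [ 0.500000, -0.418688, -0.081312]])
YCBCR_TO_RGB = np.array([[1.0,  0.0,      1.40210],
                         [1.0, -0.34414, -0.71414],
                         [1.0,  1.77180,  0.0]])

def rgb_to_ycbcr(rgb):
    return RGB_TO_YCBCR @ rgb + np.array([0.0, 128.0, 128.0])

def ycbcr_to_rgb(ycbcr):
    return YCBCR_TO_RGB @ (ycbcr - np.array([0.0, 128.0, 128.0]))

pixel = np.array([200.0, 120.0, 50.0])          # an arbitrary RGB pixel
print(ycbcr_to_rgb(rgb_to_ycbcr(pixel)))        # ~ [200, 120, 50] up to rounding
```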
ITU-T Recommendation H.261

• The earliest DCT-based video coding standard is the


ITU-T H.261 standard
• An input image is divided into blocks of 8×8 pixels
• For a given 8×8 block generate prediction error using
the previous frame
• If there is no previous frame or the previous frame is
very different from the current frame, the prediction
might be zero
• The difference between the block being encoded and
the prediction is transformed using a DCT
• The transform coefficients are quantized and the
quantization label is encoded using a variable-length
code

143
ITU-T Recommendation H.261

144
Motion Compensation

• Motion compensation requires a large amount of


computation
• To find a matching block for an 8×8 block, each
comparison requires taking 64 differences and
summation of the absolute value of the differences
• Assume that the closest block in the previous frame is
located within 20 pixels in either the horizontal or
vertical direction of the block to be encoded
• Then up to (2x20+1)x(2x20+1) = 41x41 = 1681 comparisons must be performed

145
Size of block to reduce the no. of computations

• One way is to increase the size of the block.


• More computations per comparison
• Fewer blocks per frame, so motion compensation is performed fewer times
• Different objects in a frame may be moving in different
directions within a block

146
• For a block of 2×2 squares, it is possible to find a block
that exactly matches the 2×2 block that contains the circle
• For 4×4 squares, the block that contains the circle also
contains the upper part of the octagon
• Similar 4×4 block in the previous frame can not be found
• Thus, there is a trade-off
147
Size of block to reduce the no. of computations

• Another way to reduce the number of computations is to reduce the size of the search region
• This reduces the search space
• If size of the region in which search for a match is
reduced then the number of computations reduce
• However, reduction in the search region increases the
probability of missing a match
• There is a trade-off between computation and the
amount of compression.

148
Size of block for H.261

• The H.261 standard has balanced the trade-offs


• The 8×8 blocks of luminance and chrominance pixels are
organized into macroblocks
• Macroblocks consist of four luminance blocks, and one
each of the two types of chrominance blocks
• The motion compensated prediction (or motion
compensation) operation is performed on the macroblock
level
• For each macroblock, search the previous reconstructed
frame for the macroblock that most closely matches the
macroblock being encoded
• To reduce the amount of computations, only the
luminance blocks are considered in this matching
operation.
149
Macroblock for H.261

• The motion vector for the prediction of the chrominance


blocks is obtained by halving the component values of
the motion vector for the luminance macroblock.
• If the motion vector for the luminance blocks is (−3 10),
then the motion vector for the chrominance blocks is
(−1, 5)
• The search area is restricted to ±15 pixels of the
macroblock being encoded in the horizontal and vertical
directions

150
Loop Filter

• Sharp edges in the block used for prediction results in


the generation of sharp changes in the prediction error
• Can cause high values for the high-frequency
coefficients in the transforms
• Which can increase the transmission rate
• To avoid increase in the transmission rate, prior to
taking the difference, the prediction block is smoothened
by using a two-dimensional spatial filter
• The filter coefficients are {1/4, 1/2, 1/4}
• Block boundaries remain unchanged by the filtering
operation.

151
Example: Loop Filter

Filtering the second element of the first row: (110 x 1/4) + (218 x 1/2) + (116 x 1/4) = 165

Prediction block:
110 218 116 112
108 210 110 114
110 218 210 112
112 108 110 116

After row processing:         After column processing:
110 165 140 112               110 165 140 112
108 159 135 114               108 167 148 113
110 188 187 112               110 161 154 113
112 109 111 116               112 109 111 116

This filter is either switched on or off for each macroblock.
The conditions for turning the filter on or off are not specified by the recommendations.

152
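A minimal sketch of the separable {1/4, 1/2, 1/4} smoothing, applied along rows and then along columns of the prediction block. The rounding rule and exact boundary treatment are assumptions here, since the worked example above does not state them, so individual filtered values may differ slightly from the slide.

```python
import numpy as np

def loop_filter(block):
    """Separable {1/4, 1/2, 1/4} smoothing; the first and last sample of each pass are left unchanged."""
    out = block.astype(np.float64)
    out[:, 1:-1] = (out[:, :-2] + 2 * out[:, 1:-1] + out[:, 2:]) / 4   # row pass
    out = np.floor(out)                                                # assumed rounding
    out[1:-1, :] = (out[:-2, :] + 2 * out[1:-1, :] + out[2:, :]) / 4   # column pass
    return np.floor(out).astype(int)

block = np.array([[110, 218, 116, 112],
                  [108, 210, 110, 114],
                  [110, 218, 210, 112],
                  [112, 108, 110, 116]])
print(loop_filter(block))
```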
The Transform
• DCT on an 8×8 block of pixels or pixel differences
• If the motion compensation operation does not provide a
close match, then the transform operation is performed on
an 8×8 block of pixels
• Either a block or its predicted value is quantized and
transmitted to the receiver
• The receiver operation is also simulated at the transmitter,
where the reconstructed images are obtained and stored
in a frame store
• Encoder is said to be in intra mode if it operates directly
on the input image without the use of motion
compensation
• Otherwise, it is said to be in inter mode.

153
Quantization and coding
• In the case of an intra block, the DC coefficients take on
much larger values than the other coefficients
• When there is little motion from frame to frame, the motion vectors and the prediction differences are small, leading to small values for the coefficients
• For H.261 algorithm 32 different quantizers can be used
• Different quantizers may be used for macroblocks
• One quantizer is reserved for the intra DC coefficient
• Remaining 31 quantizers are used for the other
coefficients
• The intra DC quantizer is a uniform quantizer with a step
size of 8
• The other quantizers use a step size of an even value
between 2 and 62

154
Quantization and coding

• Each macroblock is preceded by a header


• The quantizer can be identified by a code as part of the
header
• When the amount of activity or motion in the sequence is
relatively constant, same quantizer is used for a large
number of macroblocks
• The macroblocks are organized into groups of blocks
(GOBs), each of which consist of three rows of 11
macroblocks
• The header preceding each GOB contains a 5-bit field for
identifying the quantizer

155
A GOB with 33 macroblocks: a 3x11 matrix of macroblocks

156
Rate Control

• The binary codewords generated by the transform coder


form the input to a transmission buffer
• The function of the transmission buffer is to keep the
output rate of the encoder fixed
• If the buffer starts filling up faster than the transmission
rate, it sends a message back to the transform coder to
reduce the output from the quantization
• If the buffer is going to be empty because the transform
coder is providing bits at a rate lower than the
transmission rate, the transmission buffer can request a
higher rate from the transform coder

157
Asymmetric applications
• The ITU-T H.261 algorithm is designed for videophone
and videoconferencing applications
• Therefore, the algorithm operates with minimal coding
delay (less than 150 milliseconds)
• There are a number of applications in which it is cost
effective for more computations at the encoder
• In multimedia applications where a video sequence is stored on a CD-ROM, decompression must be performed in real time
• However, the compression is performed only once, and
there is no need for it to be in real time
• Thus, the encoding algorithms can be significantly more
complex

158
The MPEG-1 Video Standard

• The basic structure of MPEG is similar to that of ITU-T


H.261
• Blocks (8×8 in size) of either an original frame or the
difference between a frame and the motion-compensated
prediction are transformed using the DCT
• The blocks are organized in macroblocks, which are
defined in the same manner as in the H.261 algorithm
• Motion compensation is performed at the macroblock
level
• The transform coefficients are quantized and transmitted
to the receiver
• A buffer is used to smooth delivery of bits from the
encoder and also for rate control

159
The MPEG-1 Video Standard

• The H.261 standard has videophone and videoconferencing


as the main application areas
• MPEG standard initially had applications that require
digital storage and retrieval
• MPEG-1 has the provision for random access
• Frames are accessed periodically and are coded without
any reference to past frames
• These frames are referred to as I frames

160
I-frames
• I-frames resemble JPEG-encoded images
• Ideal for starting points for generation of prediction
residuals
• Provide high degree of random access, ease of editing and
resistance to propagation of transmission error
• All standards require periodic insertion of I-frames into
compressed video code stream
• I frames do not use temporal correlation, therefore
compression rate is low compared to the frames that
make use of the temporal correlations for prediction
• Thus, the number of frames between two consecutive I
frames is a trade-off between compression efficiency and
convenience

161
Motion Compensated Predictive Coding
for MPEG

• Forward prediction (Predictive frame or P-frame)


• Backward prediction (Bidirectional frame or B-frame)
P frames and B frames

• To improve compression efficiency, the MPEG-1


algorithm contains two types of frames
• predictive coded, P frames and the bidirectionally
predictive coded, B frames
• P frames are coded using motion-compensated
prediction from the last I or P frame, whichever is closest
• Compression efficiency of P frames is substantially
higher than I frames
• The I and P frames are sometimes called anchor frames

163
B frames
• To compensate for the reduction in the amount of
compression due to the frequent use of I frames, the MPEG
standard uses B frames
• B frames achieve a high level of compression by using
motion-compensated prediction from the most recent
anchor frame and the closest future anchor frame
• Example: Video sequence in which there is a sudden
change between one frame and the next
• This is a common occurrence in TV advertisements
• In this situation, prediction based on the past frames may
be useless
• However, predictions based on future frames has a high
probability of being accurate

164
B frames

• A B frame can only be generated after the future anchor


frame has been generated
• B frame is not used for predicting any other frame
• This means that B frames can tolerate more error because
this error is not propagated by the prediction process.

165
Group of Pictures
• Different frames are organized together in a group of
pictures (GOP)
• A GOP is the smallest random access unit in the video
sequence
• Structure is set up as a trade-off between the high
compression efficiency of motion-compensated coding
and the fast picture acquisition capability of periodic
intra-only processing
• Must contain at least one I frame
• First I frame in a GOP is either the first frame of the
GOP, or is preceded by B frames that use motion-
compensated prediction only from this I frame

166
group of pictures.
• The first frame is an I frame, which is compressed
without reference to any previous frame
• The next frame to be compressed is the fourth frame
• This frame is compressed using motion-compensated prediction from the first frame
• Frame two is then compressed using motion-compensated prediction from frames 1 and 4

167
A typical sequence of frames in display order
• Third frame is also compressed using motion-compensated
prediction from the first and fourth frames
• The next frame to be compressed is frame seven, which uses
motion-compensated prediction from frame four
• This is followed by frames 5 and 6, which are compressed
using motion-compensated predictions from frames 4 and 7
• Processing order that is different from the display order,
bitstream order

168
A typical sequence of frames in display order

A typical sequence of frames in bitstream


order

169
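The reordering described here, display order I B B P B B P ... versus bitstream order in which each anchor frame precedes the B frames that depend on it, can be illustrated with a small helper. The GOP pattern below is the one walked through on these slides; the function itself is illustrative, not part of any standard.

```python
def bitstream_order(display_frames):
    """Reorder display-order frame labels so each anchor (I or P) precedes the B frames that use it."""
    out, pending_b = [], []
    for frame in display_frames:
        if frame[0] in ("I", "P"):          # anchor frame
            out.append(frame)               # emit the anchor first ...
            out.extend(pending_b)           # ... then the B frames that referenced it
            pending_b = []
        else:
            pending_b.append(frame)         # B frames wait for their future anchor
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(bitstream_order(display))             # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```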
sequence of frames

• Unlike the ITU-T H.261 algorithm, the frame being


predicted and the frame upon which the prediction is based
are not necessarily adjacent
• Number of frames between the frame being encoded and
the frame upon which the prediction is based is variable
• Search for the best matching block in a neighboring frame
and the region of search depends on assumptions about the
amount of motion
• More motion will lead to larger search areas than a small
amount of motion
• MPEG recommends large search area for such cases

170
sequence of frames

• When the number of frames between the frame being


encoded and the prediction frame is variable, the search
area is a function of the distance between the two frames
• MPEG standard does not specify the method used for
motion compensation
• Once motion compensation has been performed, the block
of prediction errors is transformed using the DCT and
quantized, and the quantization labels are encoded

171
Rate control in MPEG

• Rate control in the MPEG standard can be performed at


the sequence level or at the level of individual frames.
• At the sequence level, any reduction in bit rate first
occurs with the B frames because they are not essential
for the encoding of other frames
• At the level of the individual frames, rate control takes
place in two steps
• First, as in the case of the H.261 algorithm, the
quantizer step sizes are increased
• If this is not sufficient, then the higher-order frequency
coefficients are dropped until the need for rate
reduction is over

172
MPEG Encoder
Specifications of MPEG

• The format for MPEG is flexible


• However, the MPEG committee has provided some
suggested values for the various parameters
• For MPEG-1 suggested values are called the constrained
parameter bitstream (CPB)
• Horizontal picture size <= 768 pixels
• Vertical size <= 576 pixels
• At most 396 macroblocks per frame if the frame rate is 25 frames per second
• Or at most 330 macroblocks per frame if the frame rate is 30 frames per second or less

174
Specifications of MPEG

• The definition of a macroblock is the same as in the ITU-T


H.261 recommendations
• Frame size of 352×288 pixels at the 25-frames-per-second
rate, or a frame size of 352×240 pixels at the 30-frames-per-
second rate
• Fixed size frames allow the algorithm to achieve bit rates
of between 1 and 1.5 Mbits per second

175
MPEG 1 to MPEG 2

• MPEG-1 algorithm provides reconstructed images of


VHS quality for moderate to low-motion video
sequences, and worse than VHS quality for high-
motion sequences at rates of around 1.2 Mbits per
second
• MPEG-1 is used for the applications like CD-ROM
• There is no consideration of interlaced video in
MPEG-1
• MPEG committee has provided some additional
recommendations, the MPEG-2 recommendations
• MPEG-2 is designed to handle interlaced video

176
Image compression using wavelet coding
• Subimage processing is not required
• Eliminates blocking artifacts
• Wavelet transforms are computationally efficient and
have capability of multiresolution
Encoder: input image → wavelet transform → quantizer → symbol encoder → compressed image
Decoder: compressed image → symbol decoder → inverse wavelet transform → decompressed image

177
Selection of wavelets

• Based on computational complexity


• Less effect on compression efficiency and
reconstruction
• The most widely used wavelet functions are the Daubechies and biorthogonal wavelets

Wavelet        Filter taps   Zeroed coefficients
Haar           4             33.8%
Daubechies     16            40.9%
Symlet         16            41.2%
Biorthogonal   28            42.1%

Transform coefficients with values less than 1.5 are truncated


178
Decomposition level

• The number of decomposition levels decides the computational complexity

Decomposition   Approximation       Truncated            Reconstruction
level           coefficient image   coefficients (%)     error
1               256x256             74.7                 3.27
2               128x128             91.7                 4.23
3               64x64               95.1                 4.54
4               32x32               95.6                 4.61
5               16x16               95.5                 4.63

• Quantization levels also decide the number of computations

179
 Used for still images and video
 Uses wavelet transform
 Better compression than JPG
 Portion of the compressed image can be
extracted for retransmission, storage, display
and/or editing
 Coefficient quantization is dependent on the
scale and subbands
 Coefficients are arithmetically coded on
bitplane basis
 Isolate particular bits of
intensity value
 Shows contribution of
each bit
 Higher-order bits
usually contain most of
the significant visual
information
 Lower-order bits
contain subtle details
1. Take an image with k-bit intensity levels
2. Level shift by -2^(k-1)
3. For a color image, perform level shifting on each of the three color components, e.g. R, G and B
• Decorrelate level shifted components
• 𝑌 𝑥, 𝑦 = 0.299𝑅 𝑥, 𝑦 + 0.587𝐺 𝑥, 𝑦 + 0.114𝐵 𝑥, 𝑦
• 𝐶𝑏 𝑥, 𝑦 = −0.16875𝑅 𝑥, 𝑦 − 0.33126𝐺 𝑥, 𝑦 + 0.5𝐵 𝑥, 𝑦
• 𝐶𝑟 𝑥, 𝑦 = 0.5𝑅 𝑥, 𝑦 − 0.41869𝐺 𝑥, 𝑦 − 0.08131𝐵 𝑥, 𝑦
• Decorrelation is optional
• Histogram of resulting components are peaked
around zero
• Image/ color components can be subdivided into
tiles (optional)
1. Compute 1-D DWT for rows and columns of
each tile
2. Transformation generates 4 subbands (WD, WV,
WH, WØ)
3. Transformation can be repeated NL number of
times to produce NL -scale coefficients.
4. Lowest scale approximation coefficient is the
approximation of original image
NL-scale transform: subband layout for NL = 2, with second-scale subbands a2LL(u,v), a2HL(u,v), a2LH(u,v), a2HH(u,v) nested inside the first-scale subbands a1HL(u,v), a1LH(u,v), a1HH(u,v)

Contains 3NL + 1 subbands

185
2-scale transform
Number of DWT coefficients = number of pixels
Each subband has an associated number of analysis gain bits (0 for the approximation subband, 1 or 2 for the detail subbands in the figure)
Contains 3NL + 1 = 7 subbands

186
 Important information is contained in few
coefficients
 Quantize the coefficients a_b(u,v) of subband b:

  q_b(u, v) = sign(a_b(u, v)) \cdot floor( |a_b(u, v)| / \Delta_b )

 Quantization step:  \Delta_b = 2^{R_b - \varepsilon_b} (1 + \mu_b / 2^{11})
 R_b is the nominal dynamic range of subband b
 R_b is the sum of the number of bits used to represent the original image and the analysis gain bits
 \varepsilon_b and \mu_b are the numbers of bits allotted to the exponent and mantissa of the coefficients
 For error-free compression, R_b = \varepsilon_b, \mu_b = 0 and \Delta_b = 1
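The quantizer above is a one-line computation once Δ_b is known. A minimal sketch, with ε_b and μ_b as inputs; the parameter values in the example call are assumptions for illustration, not values from the slides.

```python
import math

def step_size(r_b, eps_b, mu_b):
    """JPEG 2000 quantization step for subband b: 2^(R_b - eps_b) * (1 + mu_b / 2^11)."""
    return 2.0 ** (r_b - eps_b) * (1.0 + mu_b / 2.0 ** 11)

def quantize(a, delta_b):
    """Deadzone quantizer: sign(a) * floor(|a| / delta)."""
    return int(math.copysign(math.floor(abs(a) / delta_b), a)) if a != 0 else 0

delta = step_size(r_b=9, eps_b=8, mu_b=512)     # example parameter values only
print(delta, quantize(-37.6, delta))            # 2.5  -15
```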
 For lossy compression, quantization step size is
not specified in the standard
 Number of exponent and mantissa bits is provided
for all the subbands
 or number of exponent and mantissa bits is
provided for the NLLL only
 If \varepsilon_0 and \mu_0 bits are allocated to the N_L LL subband, the extrapolated parameters for the other subbands are
  \mu_b = \mu_0 and \varepsilon_b = \varepsilon_0 + n_b - N_L
 where n_b denotes the number of subband decomposition levels from the original image tile component to subband b
 Coefficients of each tile component are arranged
in a rectangular blocks, called code blocks
 Code block is encoded using bit-plane coding
starting with MSB
 Each bit of a bitplane is encoded in 3 passes,
called significance propagation, magnitude
refinement or cleanup
 Bit planes are arithmetically coded
 Coded data is grouped with similar passes from
other code blocks to form layers
 Layers are partitioned into packets
 Packets allow extraction of a region from the total
code stream
JPEG 2000 encoding pipeline:
Level shift → convert to Y, Cb and Cr → divide into tiles → DWT of each tile up to NL levels → quantize DWT coefficients →
arrange in code blocks → bit-plane slicing → three coding passes → arithmetic coding → layers → packets

190
 Invert the operations of encoder
 Reconstruct the subbands of the tile components
from arithmetically coded packets
 Decode the subbands
 Out of M_b bit planes, only N_b bit planes may actually be decoded
 This is equivalent to quantizing the coefficients of the code block with a step size of 2^{M_b - N_b} \cdot \Delta_b
 Uncoded bits are set to zero
 The resulting coefficients are then inverse quantized:

  Rq_b(u, v) = ( q_b(u, v) + r \cdot 2^{M_b - N_b(u, v)} ) \Delta_b   for q_b(u, v) > 0
  Rq_b(u, v) = ( q_b(u, v) - r \cdot 2^{M_b - N_b(u, v)} ) \Delta_b   for q_b(u, v) < 0
  Rq_b(u, v) = 0                                                      for q_b(u, v) = 0

 Rq_b(u, v) denotes the inverse quantized transform coefficient and N_b(u, v) is the number of decoded bit planes for q_b(u, v)
 The reconstruction parameter r is chosen by the decoder to produce the best visual quality
 Generally 0 <= r < 1; typically r = 1/2
 Compute inverse FWT
 Assemble component tiles
 Compute inverse component transformation
𝑅 𝑥, 𝑦 = 𝑌 𝑥, 𝑦 + 1.402𝐶𝑟 𝑥, 𝑦
𝐺 𝑥, 𝑦 = 𝑌 𝑥, 𝑦 − 0.34413𝐶𝑏 𝑥, 𝑦 - 0.71414𝐶𝑟 𝑥, 𝑦
𝐵 𝑥, 𝑦 = 𝑌 𝑥, 𝑦 + 1.772𝐶𝑏 𝑥, 𝑦

 The inverse transformed components are shifted back by +2^(k-1)
 video compression schemes for use over
networks
 For asynchronous transfer mode (ATM),
information is divided into packets, which are
transmitted over channels that can be used by
more than one user
 If at a given time there is very little traffic on
the network, the available capacity is high
 If there is congestion on the network, the
available capacity is low
 If alternate routes are used, then some of the packets may encounter congestion, leading to a variable amount of delay through the network
 To prevent congestion networks prioritize the
traffic, with higher-priority traffic being
permitted to move ahead of lower-priority
traffic

196
 The buffer smooths the output of the compression algorithm
 During a high-activity region of the video, the coder generates more than the average number of bits per second
 To prevent the buffer from overflowing, this must be followed by the generation of fewer bits per second than the average
 This is done by increasing the step size or by dropping coefficients, or entire frames
 This may reduce the quality of the image

197
 If the ATM network is not congested, it can accommodate the variable rate generated by the compression algorithm
 Otherwise the compression algorithm operates at a reduced rate
 If the network is well designed, the compression algorithm rarely needs to operate at a lower rate
 Therefore the video coder can provide uniform quality

198
 Congestion might cause long delays, so some packets arrive too late to be of any use
 That is, the frame they were supposed to be a
part of might have already been reconstructed
 To avoid these problems video compression
algorithm provides information in a layered
fashion
 Low-rate high-priority layer is used to
reconstruct the video, even though the
reconstruction is poor
 Low-priority enhancement layers enhance the
quality of the reconstruction
199
Layered coding using an analysis filter bank; the highest-priority layer carries the lowest data rate

200
 Encode the difference between the current
frame and the prediction for the current frame
using a 16×16 DCT
 Transmit the DC coefficient and the three
lowest-order AC coefficients to the receiver
 The coded coefficients make up the highest-
priority layer

201
 At the transmitter reconstructed frame is
subtracted from the original
 The sum of squared errors is calculated for
each 16×16 block
 Blocks with squared error greater than a
prescribed threshold are subdivided into four
8×8 blocks, and the coding process is repeated
using an 8×8 DCT
• The coded coefficients of these 8×8 blocks make up the next layer

202
 Since blocks that fail to meet the threshold test
are subdivided this information is transmitted
as side information
 Repeated with 4×4 blocks, to make the third
layer, and 2×2 blocks to make the fourth layer
 Data rate for the first layer is constant and is
variable for other layers
 To remove the effect of delayed packets from
the prediction, only the reconstruction from the
higher-priority layers is used for prediction.

203
