International Standards for Video Encoding

PERCEPTUAL VIDEO COMPRESSION
International Standardized Video Encoders
Compression Techniques for Multimedia
Universitatea Politehnica din București, Prof. dr. ing. Cristian Negrescu

International Standards for Video Encoding
The development philosophy for an international video encoder:
Large R&D resources are needed to develop a new encoder
Multiple contributors
Standardisation committee and procedure
The standard defines: the architecture, the models, the meaning and representation of the model parameters, the bit-stream syntax
Encoder/decoder compatibility, high flexibility, room for future improvements
ITU-T and MPEG:
Application | Bitrate | Standards
Videoconference, video telephony | 20-320 kbps | H261, H263, MPEG-4/H264 AVC
Video broadcast (digital television) | 2-5 Mbps (10-20 Mbps for HD) | MPEG-2/H264 AVC
Video delivery (DVD video, HD DVD, Blu-Ray Disk) | 4-8 Mbps (10-20 Mbps for HD) | MPEG-2/H264 AVC, VC-1
Internet streaming | 20-600 kbps | Proprietary encoders, H263, MPEG-4/H264 AVC, VC-1
Video over 3G radio networks | 20-200 kbps | H263, MPEG-4/H264 AVC, VC-1
Timeline (axis: 1988, 1992, 1996, 2000, 2004):
ITU-T: H261 V1, H261 V2, H263, H263+, H263++
Joint ITU-T / MPEG: H262/MPEG2, H264/MPEG4 AVC
MPEG: MPEG1, MPEG4 V1
H261 Video Encoder
Destination: video telephony and videoconferencing:
Developed by CCITT (now ITU-T) during 1988-1990
Intended for use on ISDN lines as part of the H320 protocol
Delivers encoded streams with a bitrate of p x 64 kbps
Low-latency encoder
CBR encoder (constant bitrate for the encoded stream)
Accepted image formats: CIF (352x288) and QCIF (176x144)
Characteristics of the encoder:
Predictive video encoder
Based on motion compensation
Compression unit: the macroblock (MB)
Used in stand-alone or PC-based videoconferencing systems
[Figure: block diagrams of the predictive encoder and decoder. Encoder: input frame, prediction error (innovation) E, transform T, quantizer Q, coding C; local decoding through the inverse quantizer and inverse transform; frame memory R, prediction by motion compensation (MC) and motion estimation (ME); auxiliary information V (MV, CR, etc.). Decoder: decoding, inverse quantizer, inverse transform, prediction by motion compensation from the frame memory, decoded frame (display).]
H261 Video Encoder
Structure of the H261 video sequence:
Level 1: Frame
Level 2: Group of blocks (GOB)
Helpful in synchronization recovery
1 CIF frame= 2x6 = 12 GOB
1 QCIF frame = 1x3= 3 GOB
Level 3: The macroblock
1 GOB = 11x3=33MB
Represents the compression unit
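The frame, GOB and macroblock counts above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `h261_layout` and the returned dictionary keys are illustrative choices, not part of the standard.

```python
MB_SIZE = 16                 # a macroblock covers 16x16 luminance pixels
GOB_COLS, GOB_ROWS = 11, 3   # 1 GOB = 11 x 3 = 33 MB


def h261_layout(width, height):
    """Return macroblock and GOB counts for a frame of the given size."""
    mb_cols = width // MB_SIZE
    mb_rows = height // MB_SIZE
    gob_cols = mb_cols // GOB_COLS
    gob_rows = mb_rows // GOB_ROWS
    return {
        "macroblocks": mb_cols * mb_rows,
        "gob_grid": (gob_cols, gob_rows),
        "gobs": gob_cols * gob_rows,
    }


cif = h261_layout(352, 288)    # 396 MB arranged as 2 x 6 = 12 GOB
qcif = h261_layout(176, 144)   # 99 MB arranged as 1 x 3 = 3 GOB
```

In both formats the macroblock count equals 33 times the GOB count, which is consistent with the structure listed above.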

H261 Video Encoder
The macroblock characteristics:
Each macroblock corresponds to a 16x16 pixel image area
Color space: YCbCr
Format 4:2:0
16x16 pixels for Y'
1 block of 8x8 pixels for CB + 1 block of 8x8 pixels for CR
The macroblock - encoding:
Skipped MB: nothing is transmitted if the MB in the current frame is the same as the one in the previous frame
Intra MB: DCT, quantization, zigzag scan, RLE and Huffman coding (similar to JPEG)
Inter MB:
Uses motion estimation (ME) to indicate the corresponding block in the previous frame
Performs motion compensation (MC) and encodes the difference between the two macroblocks
Same DCT/quantization/coding procedure as for Intra MB (but applied to the difference)
[Figure: H261 encoder block diagram, intra MB coding path: DCT, quantizer Q, variable length coding (run length + Huffman), buffer, error correction; inverse quantizer, IDCT and frame memory for the decoded current frame (display).]
Intra-MB Coding:
The coding is applied to the current frame
8x8 DCT
Uniform quantization:
Step size = 8 for DC coefficients
Step size = 2, 4, ..., 62 for AC coefficients. Unlike JPEG, the same step size is applied to all the AC coefficients
Zig-zag scanning:
RLE in preparation for Huffman coding
Symbols (run-length, value)
Huffman entropy coding for non-zero coefficients
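The zig-zag scan and the (run-length, value) symbols can be sketched as follows. This is an illustrative sketch, not the normative H261 procedure: the names `ZIGZAG` and `run_length_symbols` are made up for the example, the DC coefficient is assumed to be handled separately, and the final Huffman stage is omitted.

```python
# Zig-zag order for an 8x8 block: walk the anti-diagonals, alternating direction.
ZIGZAG = sorted(
    ((i, j) for i in range(8) for j in range(8)),
    key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
)


def run_length_symbols(block):
    """Return (dc, symbols) where symbols are (run of zeros, non-zero value) pairs.

    Trailing zeros produce no symbol; a real coder signals them with an
    end-of-block code.
    """
    scan = [block[i][j] for i, j in ZIGZAG]
    dc, symbols, run = scan[0], [], 0
    for value in scan[1:]:
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    return dc, symbols
```

After quantization most high-frequency coefficients are zero, so the scan typically ends in a long run of zeros and only a few symbols need to be entropy coded.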
[Figure: H261 encoder block diagram, inter MB coding path: prediction error (innovation), DCT, quantizer Q, variable length coding, buffer, error correction; inverse quantizer, IDCT, frame memory, prediction by motion compensation (MC) and motion estimation (ME).]
Inter-MB Coding:
The coding is applied to the prediction error (MCME)
8x8 DCT
Uniform quantization:
Step size = 2, 4, ..., 62 for AC coefficients. Unlike JPEG, the same step size is applied to all the coefficients
Zig-zag scanning:
Huffman entropy coding for non-zero coefficients
Motion estimation:
The search algorithm operates at pixel level
The largest search window is -15 ... +15 pixels
H261 operates between 64 kbps and 1984 kbps
The search window is application dependent
Motion over small areas (head, shoulders, ...)
Smaller, rhomboidal search windows
The final perceptual quality is similar
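A pixel-level search inside a window can be illustrated with an exhaustive (full-search) block matching sketch. The slides do not prescribe a search strategy or a matching criterion, so the full search, the sum of absolute differences (SAD) and the function names below are assumptions made for illustration.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def full_search(cur, ref, cy, cx, size=16, search=15):
    """Find the displacement (dy, dx) in [-search, +search] that minimises SAD.

    Returns (dy, dx, cost). The zero vector wins ties because a candidate
    must be strictly better to replace the current best.
    """
    height, width = len(ref), len(ref[0])
    best = (sad(cur, ref, cy, cx, cy, cx, size), 0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cy, cx, ry, rx, size)
            if cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]
```

The cost grows with the square of the window size, which is one reason a smaller window is attractive when the motion is limited to small areas.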

[Figure: H261 encoder block diagram with the optional loop filter placed in the prediction path, between the motion-compensated prediction and the subtraction from the input frame.]
Motion compensation and MV coding:
Optional
Only one motion vector for each macroblock
The motion vector (MV) is the same for all the luminance blocks of a macroblock
The MVs for the chrominance blocks are derived from the luminance MV
Loop filter (optional smoothing filter)
It reduces the prediction error by low-pass filtering the estimated image
It uses a two-dimensional separable filter applied on 8x8 blocks (two 1D filters with 3 coefficients)
The filter coefficients are [0 1 0] at the block edges and [1/4 1/2 1/4] elsewhere
It can be activated/de-activated for each MB
Usually, it is applied at very low bitrates, for MBs that use MC
MVs are differentially Huffman encoded
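The separable loop filter described above can be sketched as two passes of a 3-tap filter over an 8x8 block. This sketch works in floating point and ignores the integer rounding an actual codec applies; the function names are illustrative.

```python
def filter_1d(values):
    """Apply [1/4 1/2 1/4] inside the vector and [0 1 0] at its two ends."""
    out = list(values)  # the end samples are copied unchanged
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + 2 * values[i] + values[i + 1]) / 4
    return out


def loop_filter(block):
    """Filter the rows, then the columns, of a square block."""
    rows = [filter_1d(row) for row in block]
    cols = [filter_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

A flat block passes through unchanged, while an isolated spike inside the block is spread over its 3x3 neighbourhood, which is the smoothing effect that reduces the high-frequency content of the prediction.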
H261 Video Encoder
Designed for videophone and videoconferencing applications:
The encoder has low latency (each frame is encoded immediately after its arrival)
The buffer used to implement the CBR mechanism is small, and therefore the additional delay it introduces is small.
The number of MBs encoded as intra is small
The quantization step is controlled dynamically by the fill level of the buffer.
A buffer that is filling up forces the encoder to work in "skipped" mode.
To interrupt the recursive chain of predictive coding, I frames are inserted.
In an I frame all the MBs are coded as intra
The remaining frames are named P frames. In a P frame most blocks are usually encoded as skipped or inter, and only a small number of MBs are coded as intra
The H261 encoder is not optimised for storage or playback applications
In its native form the H261 encoder is not suitable for direct access, fast forward (FF) or fast rewind (FRW)
The H261 encoder has limited robustness to errors
Errors or loss of sync make it impossible to decode the remaining MBs of the current GOB. Error detection is done by identifying a state that is prohibited for the decoder.
The error handling mechanism involves stopping the decoding, seeking the start of a new GOB and then resuming the decoding. Lost macroblocks are treated as "skipped" MBs
Due to the predictive coding mechanism, errors typically propagate for a few seconds, until the first I frame arrives (I frames are rare - at least one I frame to 132 P frames)
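The buffer-driven control of the quantization step can be illustrated with a toy rule. Everything specific here is an assumption made for illustration: the linear mapping from fill level to step size, the 0.9 skip threshold and the function names are not taken from H261, which leaves rate control to the encoder designer. Only the step range 2, 4, ..., 62 comes from the slides.

```python
def choose_quant_step(buffer_bits, buffer_size):
    """Map the buffer fill level to an AC quantization step in 2, 4, ..., 62.

    Hypothetical linear rule: an empty buffer gives the finest step (2),
    a full buffer gives the coarsest step (62).
    """
    fill = min(max(buffer_bits / buffer_size, 0.0), 1.0)
    index = 1 + int(fill * 30)   # 1 .. 31
    return 2 * index


def should_skip(buffer_bits, buffer_size, threshold=0.9):
    """Hypothetical rule: force 'skipped' macroblocks when the buffer is nearly full."""
    return buffer_bits / buffer_size >= threshold
```

A coarser step produces fewer bits per macroblock, so the buffer drains at the constant channel rate and the step can become finer again, which is how the CBR behaviour is maintained with a small buffer.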
H261 Video Encoder
Encoded bitstream structure:

MPEG-1 Video Encoder
Designed for interactive multimedia applications:
Developed by MPEG in the period 1988-1993
It refers to and reuses many of the techniques developed in H261, facilitating the coexistence of both standards in the same equipment
Its primary objective is the development of multimedia applications using the CD as a storage medium. It is optimized to achieve a bitrate of about 1.2 Mbps at a quality similar to the analog VHS format
Unlike H261, it provides additional functionality for encoders and decoders, based on a number of basic features:
Fast (random) access to various places in the video sequence
Fast forward (FF) and fast backward (FR) to facilitate the search for a desired scene
Reverse playback
Compressed-domain editing capabilities (concatenation, segmentation, reordering, insertion and extraction of a sequence)
Wide range of applications (from video CD to interactive applications, storage and video streaming via telecommunications networks)
MPEG-1 is a generic encoder
It includes synchronization with audio streams
It has a degree of robustness to errors
The encoder can be implemented in real time
MPEG-1 was originally designed for image sizes up to CIF (352x288) at a maximum of 30 fps, without interlacing.
Later, extensions were specified for increasing the resolution and for accepting TV signals at the input
To ensure interoperability (given the high variability of the input conditions), the standard introduces a set of minimum requirements for MPEG-1 decoders
The decoder must be able to decode at least the video sequences whose parameters are lower than or equal to those corresponding to standard television signals
Frame resolution: at least 720x576
Number of frames per second: up to 30 fps
Bitrate for the encoded MPEG-1 stream: up to 1.86 Mb/s
Unlike standard television, which uses interlaced video sequences (fields with odd lines and fields with even lines), the only format accepted by the MPEG-1 standard is the progressive one (the video sequence contains successive frames that are scanned completely, on all lines).
There are no mechanisms to effectively exploit interlaced scanning (odd/even fields)
To use TV signals, the interlaced video signal must be converted (outside the MPEG-1 standard, and at additional cost) into a progressive video sequence.

CBR encoder
Coding type:
Video predictive coding
Based on motion compensation
The compression unit is the MB
[Figure: MCME predictive coding loop: DCT, quantizer Q, variable length coding (run length + Huffman), buffer, error correction; inverse quantizer, IDCT, frame memory, prediction by motion compensation (MC) and motion estimation (ME).]
Enhanced hierarchical structure:
Level 1: Video sequence
Level 2: Group of pictures
Level 3: Frame (picture)
Level 4: Slice
Represents a collection of adjacent macroblocks
Used for synchronisation
It helps the decoder to recover gracefully from missing blocks
This structure assists the encoding process. The spatial references to macroblocks are relevant only when the macroblocks are placed in the same slice. The slice segmentation is content dependent
The bit rate compensation can be done at the slice level
Level 5: Macroblock
Level 6: Sample

The group of pictures (GOP) is a collection of successive frames that stand in a certain mutual relationship from the standpoint of the processing performed by the encoder
A GOP consists of I, P and B frames:
The I frames (see H261) are frames encoded in intra mode, with no references to blocks from other previous or later frames
Editing a sequence that contains only I frames is similar to cinematographic film editing (simple, straightforward, fast and efficient).
The coding efficiency (compression rate) for the I frames is low (the temporal redundancy is not exploited).
It is recommended to insert I frames wherever there are major changes in the video content (scene changes, changes in the lighting conditions, etc.)
The decoding of the I frames is straightforward and fast.
The frame encoding order is 1, 2, 3, ...

MPEG-1 Video Encoding
The P frames (see H261) are frames to which predictive coding is applied (with or without motion compensation). The difference between the current frame and its estimate based on the reference frame is coded. The reference frame is the closest previous frame of type I or P.
Compression efficiency is greater than for the I frames.
Requires memory for storing the reference frame (I or P)
The coding order is 1, 2, 3, 4, 5, ...
Good for normal playback (1x speed, from left to right)
For random access or 1x reverse playback, the sequence must be decoded starting from the nearest previous I frame.
Difficulties for FF, FRW, editing

MPEG-1 Video Encoding
The P frames (see H261) are frames to which forward predictive coding is applied (with or without motion compensation).
The B frames are frames to which both forward and backward predictive coding are applied (with or without motion compensation). The difference between the current frame and its estimate based on two reference frames (anchors) is encoded. The reference frames (one before and one after the current frame) are the closest frames of type I or P.
Compression efficiency is greater than for the P frames
Requires memory to store two reference frames (I or P)
Introduces an algorithmic delay equal to the time elapsed until the second reference frame is received
The coding order is 1, 5, 2, 3, 4, ...
Good for normal playback (1x speed, from left to right)
Offers support for 1x reverse playback, FF, FRW
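The coding order 1, 5, 2, 3, 4 follows from the rule that both anchors of a B frame must be coded before the B frame itself. A minimal sketch of this reordering is shown below; the function name and the string representation of frame types are illustrative, and trailing B frames (which wait for the next anchor) are simply flushed at the end.

```python
def coding_order(display_types):
    """Turn display-order frame types (e.g. "IBBBP") into 1-based coding order."""
    order, pending_b = [], []
    for number, frame_type in enumerate(display_types, start=1):
        if frame_type == "B":
            pending_b.append(number)   # must wait for the following anchor
        else:
            order.append(number)       # I or P anchor is coded first
            order.extend(pending_b)    # then the B frames that precede it
            pending_b.clear()
    order.extend(pending_b)            # simplification for trailing B frames
    return order
```

For the display sequence I B B B P this gives 1, 5, 2, 3, 4, and the wait for frame 5 is the algorithmic delay mentioned above.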

Unidirectional forward and backward prediction
Forward prediction (Ref. 1 frame, current frame): f(A)=A
Bidirectional prediction (Ref. 1 frame, current frame, Ref. 2 frame), for example: f(A,B)=A or f(A,B)=B or f(A,B)=(A+B)/2
Pros:
Increased coding efficiency
The frame rate (FPS) can be increased with only a slight increase of the bitrate
It limits the error propagation
Cons:
It requires more memory
Coding latency increases
Coding complexity increases
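The three candidate predictions f(A,B)=A, f(A,B)=B and f(A,B)=(A+B)/2 can be compared per block, keeping the one with the smallest error. The selection criterion (sum of absolute differences) and the names below are assumptions for illustration; A and B stand for the already motion-compensated blocks from the two reference frames.

```python
def block_sad(x, y):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for rx, ry in zip(x, y) for p, q in zip(rx, ry))


def best_prediction(cur, a, b):
    """Pick forward (A), backward (B) or bidirectional ((A+B)/2) prediction."""
    average = [[(p + q) / 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
    candidates = {"forward": a, "backward": b, "bidirectional": average}
    mode = min(candidates, key=lambda m: block_sad(cur, candidates[m]))
    return mode, candidates[mode], block_sad(cur, candidates[mode])
```

Averaging helps when the current block lies "between" the two references, for example during a fade or when noise in the two references partly cancels out.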

MPEG-1 Video Encoder
[Figure: MPEG-1 encoder block diagram: input video sequence, pre-processing, frame reordering, DCT, weighting W, quantizer Q, variable length coding (run length + Huffman), buffer; inverse quantizer, inverse weighting, IDCT; two reference memories (Ref 1, Ref 2), each with motion compensation (MC) and motion estimation (ME) producing the motion vectors V1 and V2; averaging (1/2) of the two predictions for B frames.]
Rules for GOP
The I and P frames from a GOP are named anchors (references)
1 GOP contains a single I frame
In a GOP an I frame can be followed by any number of P frames
Between the anchors there can be any number of B frames
B frames can be placed even in front of the I frame
In a GOP with B frames, the order of the frames for encoding/decoding can differ from the order of the frames in the original sequence or from the presentation order
Before the coding/decoding process, the frames must be ordered in such a way that, for each B frame, the anchors are available
The GOP parameters
N: number of frames in the GOP
M: 1 + number of B frames between anchors (I or P)
Default values: N = 18, 15, 12, 7
Usual: N=15, M=3
GOP types and examples
General form: I BB BBP|I BB BBP|
Closed GOP: all the B frames from the GOP have references inside the current GOP (the GOP ends with a P frame)
Examples: I B P| I B P| and I B B B B B B B B B P| I B B B B B B B B B P|
Open GOP: some of the B frames have references outside the current GOP (the GOP ends with a B frame)
Example: B I B P| B I B P|
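The role of the N and M parameters can be illustrated by generating the frame-type pattern of a GOP. This sketch assumes a regular GOP in display order that starts with the I frame; the function names are illustrative, and the closed/open test simply applies the rule stated above (a GOP that ends with a B frame is open).

```python
def gop_pattern(n, m):
    """Frame types of a regular GOP with N frames and anchor spacing M."""
    return "".join(
        "I" if i == 0 else "P" if i % m == 0 else "B"
        for i in range(n)
    )


def is_closed(pattern):
    """A GOP ending with an anchor is closed; one ending with B is open."""
    return not pattern.endswith("B")
```

With the usual values N=15 and M=3 the pattern is IBBPBBPBBPBBPBB, which ends with B frames whose backward reference is the I frame of the next GOP.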
The MPEG-1 encoder, macroblock characteristics:
Each macroblock corresponds to a 16x16 pixel image area
Color space: YCbCr
Format 4:1:1 (4:2:0)
I frames encoding:
The coding is applied to the current frame
8x8 DCT
A weighting can be applied to all AC coefficients. To avoid large artefacts, the DC coefficient remains un-weighted
Bit rate control: it monitors the output buffer fill level and changes the value of the weighting coefficient
Uniform quantisation of the weighted DCT coefficients. There is a standard quantization table (for MPEG-1 and MPEG-2), but other optional tables can be used
Differential coding for DC coefficients
Zigzag scanning
Non-zero coefficients are coded using Huffman coding
Standard quantization table for I frames:
(8) 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83
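The combination of a per-coefficient weighting table and a single scale factor driven by the rate control can be sketched as follows. This is a simplified illustration: the exact integer arithmetic and rounding rules of the standard are not reproduced, the DC coefficient is assumed to use a fixed step of 8 (the bracketed table entry), and `scale` is an illustrative name for the weighting coefficient changed by the bit rate control.

```python
INTRA_WEIGHTS = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)


def quantize_intra(block, scale):
    """Quantize an 8x8 block of DCT coefficients (simplified sketch)."""
    out = [[0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            if i == 0 and j == 0:
                out[i][j] = round_half_away(block[i][j] / 8)  # DC: fixed step
            else:
                step = INTRA_WEIGHTS[i][j] * scale / 8        # AC: weighted step
                out[i][j] = round_half_away(block[i][j] / step)
    return out
```

With `scale = 8` each AC step equals its table entry; doubling `scale` doubles every AC step, which is how one number can trade quality for bitrate while the table keeps the coarser steps at high frequencies.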
P frames coding:
It encodes the difference between the current frame and the motion compensated reference frame (the closest previous I or P frame)
The motion compensation has subpixel accuracy (1/2 sample). It uses a bilinear motion model
For each macroblock, a single motion vector is encoded
8x8 DCT, weighting for bitrate control, uniform quantization, scanning, entropy coding (similar to the I frames)
There is a standard quantization table (different from the table used for I frames): an 8x8 table with all 64 entries equal to 16. Other optional quantization tables are allowed.
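Half-sample accuracy means that a motion vector may point between pixels of the reference frame, where the value has to be interpolated. A common reading of the bilinear model is averaging the 2 or 4 neighbouring integer-position samples with rounding, which is what this sketch does; the function name and the convention of passing coordinates in half-sample units are assumptions for illustration, and coordinates are assumed non-negative and inside the frame.

```python
def sample_half_pel(frame, y2, x2):
    """Read the reference frame at position (y2/2, x2/2).

    y2 and x2 are given in half-sample units: even values are integer
    positions, odd values lie halfway between two samples.
    """
    y, x = y2 // 2, x2 // 2
    half_y, half_x = y2 % 2, x2 % 2
    if not half_y and not half_x:
        return frame[y][x]
    if not half_y:   # halfway between horizontal neighbours
        return (frame[y][x] + frame[y][x + 1] + 1) // 2
    if not half_x:   # halfway between vertical neighbours
        return (frame[y][x] + frame[y + 1][x] + 1) // 2
    # centre of four samples
    return (frame[y][x] + frame[y][x + 1]
            + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
```

For the 2x2 frame [[10, 20], [30, 40]] the half-sample positions evaluate to 15 (horizontal), 20 (vertical) and 25 (centre).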

B frames coding:
It uses reference frames (I or P), from which the prediction is obtained by motion compensation
2 motion vectors/MB are used for the bidirectional prediction and a single motion vector/MB is used for the forward or backward prediction
It encodes the difference between the current frame and its bidirectionally or unidirectionally predicted estimate
The motion compensation has subpixel accuracy (1/2 sample). It uses a bilinear motion model
8x8 DCT, weighting for bitrate control, uniform quantization, scanning, entropy coding (similar to the P frames)
Standard quantization table: an 8x8 table with all 64 entries equal to 16
Comparison MPEG-1 / H261
MPEG-1 | H261
Allows multiple spatial resolutions (CIF, SIF, QCIF, SQCIF, etc) | Allows only QCIF and CIF
Variable aspect ratio (defined in the header) | Aspect ratio: only 4:3
It introduces the GOP concept | No GOP
Macroblocks I, P, B | Macroblocks I, P
Optimized for 1.2 Mbps | Optimized for 384 kbps
No restrictions regarding the number of successive skipped frames | Only 1, 2 or 3 successive skipped frames are allowed
Algorithm for motion estimation with sub-pixel accuracy | Algorithm for motion compensation with pixel accuracy
Usual search window (for motion compensation): 15 pixels | Usual search window (for motion compensation): 7 pixels
Table-based uniform quantization (similar to JPEG) | Uniform quantization, the same quantization step for all AC coefficients
For applications where the end-to-end latency is not critical | For applications where the end-to-end latency is critical
MPEG-2 Video Encoder
A backward-compatible superset of the MPEG-1 encoder:
Delivers video with better quality (at least conforming to the CCIR 601 quality standard)
Larger image formats
Imposes restrictions at the slice level
Offers support for operating with interlaced video sequences (TV)
Introduces frame pictures/field pictures
Allows other chroma formats (4:4:4, 4:2:2)
Introduces new prediction modes
Allows alternate scanning of the DCT coefficients
Allows applying the DCT to frames/fields
Allows motion compensation on blocks of 16x8 pixels
Enhances the entropy coding
Offers support for film-to-video transfer
Video scalability (allows efficient delivery of simultaneous SD and HD versions of the same content, or optimisation of the perceived quality when error-prone transmission channels are used)
Support for coding of interlaced video sequences (TV):
Frame Mode
Each 8x8 DCT block contains both phases of the motion (the two fields are combined into one single frame and subsequently coded)
Artefacts for scenes containing fast motion (the vertical edges become jagged, which produces large HF components and a low compression rate)
Field Mode
Each 8x8 DCT block contains only one single phase of motion (each field is encoded separately)
More efficient for the compression of static images or scenes containing slow motion
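The difference between the two modes comes down to which lines end up in the same 8x8 block. A minimal sketch of separating an interlaced frame into its two fields is given below; the function names are illustrative, the frame is represented as a list of lines, and the top field is taken to hold the odd lines (1st, 3rd, ...) as in these slides.

```python
def split_fields(frame):
    """Return (top, bottom): odd-numbered and even-numbered lines of the frame."""
    return frame[0::2], frame[1::2]


def merge_fields(top, bottom):
    """Interleave two fields back into one frame (equal line counts assumed)."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame
```

In frame mode the DCT sees the interleaved lines, so an object that moved between the two field instants shows up as line-to-line alternation; in field mode each block is built from one field only.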

Support for coding of interlaced video sequences (TV):
New modes for prediction
For P frames, the Top field (odd lines) is estimated by prediction, using both fields from the reference frame
For P frames, the Bottom field (even lines) is estimated by prediction, using the Bottom field from the reference frame and the current Top field (estimated in the previous step)
For B frames, the Top field (odd lines) is estimated by prediction, using all the fields from both reference frames
For B frames, the Bottom field (even lines) is estimated by prediction, using all the fields from the reference frames
Alternate scanning for the DCT coefficients

MPEG-2 Video Encoder
Profiles and Levels for MPEG-2:
A complete MPEG-2 encoder is quite complex, therefore several video encoding variants exist. Not all of them support the maximal specifications. In this context, the notions of Profile and Level are used. With their help one can define different compatible configurations.
The Profile describes the basic functionalities or the coding tools that are used. Each new profile defines a new set of algorithms and adds them to the previous profile
Levels specify the range of values for the coding parameters supported by the respective implementation
Profile | Algorithms (functionalities)
HIGH | Supports all the functionalities of the Spatial Scalable profile plus: allows 3 levels for SNR and spatial scalability; format YUV 4:2:2, (YUV 4:4:4)
Spatial Scalable | Supports all the functionalities of the SNR Scalable profile plus: spatial scalability (allows 2 levels); format YUV 4:2:0
SNR Scalable | Supports all the functionalities of the MAIN profile plus: SNR scalability (allows 2 levels); format YUV 4:2:0
Main | Non-scalable encoder with coding tools for: interlaced video coding; random access; bidirectional prediction (B frames); format YUV 4:2:0
Simple | Includes the MAIN functionalities, but B frames are not supported; format YUV 4:2:0
Level | Parameters
HIGH | 1920x1152; 60 fps; 100(80) Mbps
High 1440 | 1440x1152; 60 fps; 80(60) Mbps
Main | 720x576; 30 fps; 20(15) Mbps
Low | 352x288; 30 fps; 4(3) Mbps
SNR Scalability
Spatial/temporal Scalability
