Video Coding
Pham Van Tien, Dr. rer. nat. , Embedded Networking Research Group Faculty of Elec. and Telecom, Hanoi University of Science and Technology
Agenda
Introduction (1/2)
Why is video compression important? Consider one movie stored without compression:
720 x 480 pixels per frame
30 frames per second
90 minutes in total
Full color (3 bytes per pixel)
Total data = 720 x 480 x 3 bytes x 30 frames/s x 5400 s = 167.96 GB
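The figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "full color" means 24 bits (3 bytes) per pixel with no chroma subsampling, and that a gigabyte is 10^9 bytes; the variable names are illustrative only.

```python
# Assumption: "full color" = 3 bytes per pixel, no chroma subsampling.
WIDTH, HEIGHT = 720, 480
BYTES_PER_PIXEL = 3
FPS = 30
DURATION_S = 90 * 60  # 90 minutes in seconds

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = bytes_per_frame * FPS * DURATION_S
total_gb = total_bytes / 1e9  # decimal gigabytes (assumption)

print(f"{total_gb:.2f} GB")  # prints "167.96 GB"
```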
Introduction (2/2)
Interframe Coding
Remove temporal redundancy
Desired Features
Better compression
Improved quality
Interactivity and manipulation of content
Error resilience
Processing of content in the compressed domain
Identification and selective coding/decoding of the object of interest
Facilitated search and indexing (MPEG-7)
Time table
H.264
MPEG
MPEG-1:
Video CD
MPEG-2:
.vob, .m2v, and (rarely) .mpg files
Anything to do with DVD
Camcorders, DVD players, DVD recorders
Digital TV (DVB)
MPEG-4:
High-quality AVI files
Video phones
DivX
Some advanced audio players support MPEG-4 Advanced Audio Coding (AAC)
Where used?
H.263/+/++:
NetMeeting and similar video chat
Network streaming applications, video phones
H.264:
Video conferencing over different networks
Multimedia streaming: live and on-demand
Multimedia Messaging Services (MMS)
Blu-ray, Digital Video Broadcasting, iPod Video, HD DVD
VC-1, VC-2:
Video on the Internet, HDTV broadcast, UHDTV
[Figure: luminance PSNR, PSNR (Y), on a vertical axis from 32 to 48, plotted against a horizontal axis from 350 to 1050 (axis label not recoverable), with one curve each for MPEG-1, MPEG-2, MPEG-4 and H.264]
Questions
What are video/audio codecs? Name some popular codecs that your media players support.
What are the disadvantages of using specific codecs?
What is a container format? Name some examples.
Codecs and formats
Compression...
movie picture 1
movie picture 2
Horse ride
Motion estimation
Motion Prediction
Motion vector: a two-dimensional pointer that tells the decoder how far left/right and up/down to look in the reference picture for the prediction macroblock
Motion estimation: the process, performed by the encoder, of finding the motion vector that points to the best prediction macroblock in a reference frame or field
Motion compensation: the prediction obtained by applying the motion vector to the reference frame
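The three terms above can be illustrated with a toy full-search block matcher. This is only a sketch under stated assumptions: frames are plain lists of luma rows, the cost measure is the sum of absolute differences (SAD), and the block size of 4 and search range of 2 are chosen for brevity rather than taken from any standard. The function names are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at (cx, cy)
    and the n x n block of ref at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def estimate_motion(cur, ref, cx, cy, n=4, search=2):
    """Motion estimation by full search within +/- search pixels.
    Returns ((dx, dy), cost) for the best-matching reference block."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def compensate(ref, cx, cy, mv, n=4):
    """Motion compensation: fetch the predicted block from the reference frame."""
    dx, dy = mv
    return [row[cx + dx: cx + dx + n] for row in ref[cy + dy: cy + dy + n]]


# Toy 8x8 frames: a textured 4x4 square moves one pixel right and one pixel down.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j in range(4):
    for i in range(4):
        ref[2 + j][2 + i] = 100 + 10 * j + i
        cur[3 + j][3 + i] = 100 + 10 * j + i

mv, cost = estimate_motion(cur, ref, cx=3, cy=3)
prediction = compensate(ref, 3, 3, mv)
print(mv, cost)  # prints "(-1, -1) 0"
```

The motion vector points from the current block back to where its content sat in the reference frame, so an object moving right and down yields a vector pointing left and up. With a perfect match the residual to be coded is zero.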
Motion Estimation
Motion Compensation
Picture type
Slice
One or more "contiguous" macroblocks. The order of the macroblocks within a slice is from left to right and top to bottom.
Macroblock
A 16-pixel by 16-line section of luminance components and the corresponding 8-pixel by 8-line section of the two chrominance components.
Block
A block is an 8-pixel by 8-line set of values of a luminance or a chrominance component.
CODEC Design
Coding functions
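Goal: achieve high compression performance while keeping good picture quality
Theory: each kind of redundancy has its own removal tools
Spatial redundancy: DCT, DFT, subband, wavelet
Temporal redundancy: motion compensation/motion estimation (MC/ME)
Statistical redundancy: variable-length coding (VLC), entropy coding
Perceptual redundancy: vector quantization (VQ)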
DCT
3D DCT transform ?
DCT Transformation
Steps
Spatial-to-DCT domain transformation: 8 x 8 DCT
Discard unimportant DCT domain samples: quantization
Lossless coding of DCT domain samples: entropy coding
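The three steps can be sketched end to end on a single 8 x 8 block. This is a hedged illustration, not the pipeline of any particular standard: the DCT is the textbook orthonormal DCT-II in direct form, the quantizer uses one uniform step (an assumption made for brevity), and a zigzag scan with run-length pairs stands in for a real entropy coder. All function names are hypothetical.

```python
import math

N = 8  # block size used on the slide


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct form, not optimized)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def quantize(coeffs, step):
    """Uniform quantization: small coefficients collapse to zero."""
    return [[round(c / step) for c in row] for row in coeffs]


def zigzag_indices(n=N):
    """Visit coefficients from low to high frequency along anti-diagonals."""
    return sorted(
        ((u, v) for u in range(n) for v in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def run_length(levels):
    """Stand-in for entropy coding: (zero_run, level) pairs, then an end-of-block tag."""
    pairs, run = [], 0
    for u, v in zigzag_indices():
        if levels[u][v] == 0:
            run += 1
        else:
            pairs.append((run, levels[u][v]))
            run = 0
    pairs.append("EOB")
    return pairs


flat_block = [[128] * N for _ in range(N)]  # toy input: a uniformly gray block
levels = quantize(dct_2d(flat_block), step=16)
symbols = run_length(levels)
print(symbols)  # prints "[(0, 64), 'EOB']"
```

A uniform block has energy only in the DC coefficient, so 64 samples shrink to a single symbol plus the end-of-block tag. Real pictures leave more nonzero coefficients, but the same concentration toward low frequencies is what makes the scheme effective.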
Image
Quantization
Quantization
The eye is less sensitive to high-frequency components
A larger quantizer step means greater loss
Lower-frequency components get a smaller quantizer step; higher-frequency components get a larger one
The quantization tables in the encoder and decoder are the same
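A small sketch of frequency-dependent quantization makes these points concrete. The table here is hypothetical (step = 8 + 4 x (u + v)) and is not taken from JPEG or MPEG; it only reproduces the property that the step grows with frequency. The same table is passed to both the quantizer and the dequantizer, mirroring the shared encoder/decoder tables.

```python
def make_table(base=8, slope=4, n=8):
    """Hypothetical table: the step grows with spatial frequency (u + v)."""
    return [[base + slope * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, table):
    """Encoder side: divide each coefficient by its step and round."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply each level by the same step."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


table = make_table()
coeffs = [[90.0] * 8 for _ in range(8)]  # toy input: every coefficient equals 90
levels = quantize(coeffs, table)
recon = dequantize(levels, table)

err_low = abs(coeffs[0][0] - recon[0][0])   # lowest frequency, step 8
err_high = abs(coeffs[7][7] - recon[7][7])  # highest frequency, step 64
print(err_low, err_high)  # prints "2.0 26.0"
```

The identical input value is reconstructed almost exactly at the lowest frequency and with a large error at the highest frequency, which is the loss the eye is least likely to notice.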
Picture type
Picture type
Intra picture
Coded using only information present in the picture itself
I-pictures provide potential random access points into the compressed video data
Picture type
Predicted picture
Coded with respect to the nearest previous I- or P-picture
P-pictures use motion compensation
Unlike I-pictures, P-pictures can propagate coding errors
Picture type
Bidirectional picture
Coded using both a past and a future picture as references
B-pictures provide the most compression and, because MPEG-1/2 never uses them as references, do not propagate errors
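Because a B-picture needs a future reference, the encoder has to transmit that reference first. The sketch below reorders a sequence from display order to coding order under the classic MPEG-1/2 assumption that only I- and P-pictures serve as references; the function name and the example pattern "IBBPBBP" are illustrative.

```python
def coding_order(display_order):
    """Reorder picture types so each B-picture follows both of its references.
    Assumes only I- and P-pictures are used as references."""
    out, pending_b = [], []
    for idx, ptype in enumerate(display_order):
        if ptype == "B":
            pending_b.append((ptype, idx))  # wait for the next reference picture
        else:
            out.append((ptype, idx))        # the reference (I or P) is sent first
            out.extend(pending_b)           # then the B-pictures displayed before it
            pending_b = []
    out.extend(pending_b)  # trailing B-pictures with no later reference in this list
    return out


order = coding_order("IBBPBBP")
labels = [f"{t}{i}" for t, i in order]
print(" ".join(labels))  # prints "I0 P3 B1 B2 P6 B4 B5"
```

The decoder reverses the reordering before display, which is one reason B-pictures add delay and buffering.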
Picture type
Intra-frame: encoded without prediction
Inter-frame: predictively encoded; the quantized (reconstructed) frames serve as references for computing the residue
Defines the decoder but not the encoder (note A21)
Frames (pictures) (note A22)
Intra-coded using JPEG
Inter-coded using (interpolated) ME & MC, with JPEG for the residuals
Macroblocks (MBs)
16 x 16-pixel block
Rate control
Buffer at each end
Test Model 5 (TM5)
Note A22: Intra coding of MBs in MPEG is the same as described for JPEG, except that (1) unless otherwise specified in the sequence header, MPEG defines two quantization tables, one used for intra coding and the other used to code any residuals after prediction by motion estimation; and (2) the quantization scale factor, MQuant, is different.
Note A21: MPEG does not define the encoder. A valid encoder produces a syntactically correct bit stream that yields the desired output when fed to a compliant decoder. An MPEG-1 compliant decoder, however, is required to decode all valid MPEG-1 bit streams.
MPEG-2 = MPEG-1 +
Improvements
Color space: can support 4:2:2 and 4:4:4 coding
Quantization: can have 9- or 10-bit precision for DC coefficients
Concealment motion vectors: used when an intra-MB is lost
Pan and scan: supports display at different aspect ratios, e.g., 16:9
MPEG-2 = MPEG-1 +
Interlace tools
Scalable coding profiles
System layer: defines two bit stream constructs
Program stream (PS): modeled on MPEG-1 (backward compatibility)
Transport stream (TS): more robust, does not need a common time base, designed for use in error-prone environments
Data partitioning
Partitions the data in a video packet into a motion part and a texture part separated by a motion boundary marker (MBM)
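[Figure: layout of a video packet with data partitioning. I-VOP video packet: VP header (resync. marker, MB No., QP, HEC, repeated header info.), DC DCT data, AC DCT data. P-VOP video packet: VP header, motion data, MBM, texture (DCT) data. The figure also marks which data parts are labeled "use" and which "discard".]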
Multi-mode, multi-reference MC
Motion vectors can point outside the image border
1/4- and 1/8-pixel motion vector precision
B-frame prediction weighting
4 x 4 integer transform
Multi-mode intra prediction
In-loop deblocking filter
UVLC (Universal Variable Length Coding)
NAL (Network Abstraction Layer)
SP-slices
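Of the tools listed, the 4 x 4 integer transform is the easiest to show in a few lines. The sketch below applies the forward core transform matrix of H.264 using integer arithmetic only. It deliberately leaves out the scaling that the standard folds into quantization, so the output is the unscaled core transform, and the helper names are hypothetical.

```python
# Forward core transform matrix of the H.264 4x4 integer transform.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_transform(block):
    """Y = CF * X * CF^T, integers only (post-scaling omitted in this sketch)."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[10] * 4 for _ in range(4)]  # toy input: a uniform 4x4 residual block
coeffs = forward_core_transform(flat)
print(coeffs[0])  # prints "[160, 0, 0, 0]"
```

Because every entry of the matrix is a small integer, encoder and decoder compute exactly the same values, which avoids the mismatch that floating-point DCT implementations can introduce.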
11 levels
Each level bounds resolution, capability, bit rate, buffer size, and number of reference frames
Built to match popular international production and emission formats
From QCIF to D-Cinema
[Figure: H.264 codec block diagram; surviving block labels: Transform/Scaling/Quantization, Decoder, Intra/Inter]
A fixed block size may not suit all moving objects
Improve the flexibility of block matching
Reduce the matching error
Neighboring frames are not always the most similar ones
A B-frame can be a reference frame
A B-frame is close to the target frame in many situations
[Figure: intra prediction, 4 modes; surviving labels: H, Mean (H, V)]
Deblocking filter
The picture is filtered using an adaptive deblocking filter. The filter removes visible block structures at the edges of the 4 x 4 blocks caused by block-based transform coding and motion estimation.
Deblocking Filters
A boundary-strength (BS) parameter is assigned to every 4 x 4 block
Block modes and conditions, with the resulting BS:
BS = 4: one of the blocks is intra-coded and the edge is an MB edge
BS = 3: one of the blocks is intra-coded
BS = 2: one of the blocks has coded residuals
BS = 1: difference of block motion >= one luma sample distance
BS = 1: motion compensation from different reference frames
BS = 0: else
Thresholds α and β depend on the average quantization parameter (QP)
The deblocking filter accounts for about 1/3 of the computational complexity of a decoder.
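The boundary-strength decision is a priority-ordered chain of tests, which a short function makes explicit. This is a simplified sketch: the `Block4x4` record and its field names are hypothetical, motion vectors are assumed to be stored in quarter-luma-sample units (so one luma sample equals 4 units), and the threshold tests that follow the BS decision are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Block4x4:
    """Hypothetical per-block record holding only what the BS decision needs."""
    intra: bool = False
    has_coded_residual: bool = False
    mv: tuple = (0, 0)   # assumption: quarter-luma-sample units
    ref_frame: int = 0


def boundary_strength(p, q, on_mb_edge):
    """Return BS (0..4) for the edge between neighboring blocks p and q."""
    if (p.intra or q.intra) and on_mb_edge:
        return 4
    if p.intra or q.intra:
        return 3
    if p.has_coded_residual or q.has_coded_residual:
        return 2
    # One luma sample corresponds to 4 quarter-sample units.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    if p.ref_frame != q.ref_frame:
        return 1
    return 0  # edge is not filtered


print(boundary_strength(Block4x4(intra=True), Block4x4(), on_mb_edge=True))  # prints "4"
print(boundary_strength(Block4x4(mv=(4, 0)), Block4x4(), on_mb_edge=False))  # prints "1"
```

Stronger filtering is reserved for edges where blocking is most likely, and BS = 0 skips the filter entirely, which saves work on edges inside smoothly predicted regions.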
SP and SI-frames
Allow identical reconstruction when coded using different references
Subtract the reference in the coder and add it back in the decoder
Bitstream switching
In previous coding standards: perfect (mismatch-free) switching only happens at Intra-frames.
[Figure: bitstream switching with SP-frames. Stream 2: P2,n-2, P2,n-1, SP2,n, P2,n+1, P2,n+2; switching picture between the streams: SP12,n]
Other applications
Bitstream splicing Error recovery/resilience Video redundancy coding
Transformation
Network friendliness
H.264 structure
Video coding layer (VCL)
Network abstraction layer (NAL)
H.264 Over IP
Network Abstraction Layer Unit (NALU)
A byte stream of variable length
1-byte header:
NALU type (T), 5 bits
NALU importance (R), 2 bits
Error indication (F), 1 bit
Bit layout, most significant bit first: F, R, T
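The one-byte header can be unpacked with shifts and masks. The sketch below assumes the H.264 bit layout, most significant bit first: F in 1 bit, R in 2 bits, T in 5 bits. The function name and the dictionary keys are illustrative.

```python
def parse_nalu_header(first_byte):
    """Split the 1-byte NALU header into its F, R and T fields."""
    return {
        "F": (first_byte >> 7) & 0x01,  # error indication, 1 bit
        "R": (first_byte >> 5) & 0x03,  # NALU importance, 2 bits
        "T": first_byte & 0x1F,         # NALU type, 5 bits
    }


hdr = parse_nalu_header(0x65)
print(hdr)  # prints "{'F': 0, 'R': 3, 'T': 5}"
```

The byte 0x65 is a common sight at the start of an IDR slice: no error flagged, highest importance, and type 5. A network element can read R without decoding any video, which is what lets it drop the least important units first under congestion.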
Protocols and specifications for H.264
RTP (Real-Time Transport Protocol)
Header size: IP/UDP/RTP = 20 + 8 + 12 = 40 bytes
Media-unaware RTP payload specifications to reduce the loss rates observed by the decoder: packet duplication, packet-based FEC, audio redundancy coding
Control protocols: H.245, SIP (Session Initiation Protocol), SDP (Session Description Protocol), RTSP (Real-Time Streaming Protocol)
UDP (User Datagram Protocol) (note A1)
IP: best-effort service
RTP packetization
Simple packetization
One NALU in one RTP packet
The NALU header doubles as the RTP payload header
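Simple packetization amounts to prepending the 12-byte RTP fixed header to one complete NALU. The sketch below builds such a packet and adds up the per-packet overhead. The payload type 96, the SSRC, the timestamp and the 100-byte NALU are placeholder assumptions, and padding, extensions and CSRC entries are left out.

```python
import struct

RTP_VERSION = 2
IP_HEADER, UDP_HEADER = 20, 8  # bytes, as on the slide


def build_rtp_packet(nalu, seq, timestamp, ssrc, payload_type=96, marker=False):
    """12-byte RTP fixed header followed by exactly one NALU."""
    byte0 = RTP_VERSION << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + nalu


nalu = bytes([0x65]) + b"\x00" * 99                 # hypothetical 100-byte NALU
packet = build_rtp_packet(nalu, seq=1, timestamp=90000, ssrc=0x1234)

rtp_header_len = len(packet) - len(nalu)
overhead = IP_HEADER + UDP_HEADER + rtp_header_len
print(rtp_header_len, overhead)  # prints "12 40"
```

The first payload byte is the NALU header itself, so no extra payload header is needed in this mode. For small NALUs the fixed 40 bytes of IP/UDP/RTP headers become a significant share of each packet.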
Note A1: The IP header is 20 bytes in size and protected by a checksum. No protection of the payload is performed.
Comparison
H.265 outlook
Bit rate reduced by half compared with H.264
Tree-structured prediction and residual difference block segmentation
Extended prediction block sizes (up to 64x64)
Tile and slice picture segmentations for loss resilience and parallelism
Wavefront processing structure for decoder parallelism
Mode-dependent sine/cosine transform type switching
Adaptive motion vector predictor selection
Temporal motion vector prediction
3D video coding
Left and right eye views
Depth sensation
Resolving 2D viewing ambiguity
Additional features:
Free viewpoints
Depth-controlled object insertion
Homework 1
Download the open-source tool x264 from the VideoLAN website
Capture a video sequence via webcam or from the Internet
Experiment with FFmpeg to encode and transcode the video sequence with different standards (MPEG-2, MPEG-4, H.263, H.264, etc.) and parameters
Play back the encoded video and comment on the results
Package the encoded video sequence in MP4 format
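One possible starting point for the FFmpeg step is a small helper that assembles the command line for each standard. This sketch only builds and prints the command; it does not run FFmpeg. The file names and the 500k bit rate are placeholders, `-c:v` and `-b:v` are FFmpeg's video codec and video bit rate options, and `libx264` is available only if the FFmpeg build includes x264.

```python
import shlex

# Video encoder names as passed to FFmpeg's -c:v option.
ENCODERS = {
    "mpeg2": "mpeg2video",
    "mpeg4": "mpeg4",
    "h263": "h263",
    "h264": "libx264",  # requires an FFmpeg build with libx264
}


def ffmpeg_command(src, dst, standard, bitrate="500k"):
    """Build (but do not run) an FFmpeg transcoding command as an argument list."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:v", ENCODERS[standard], "-b:v", bitrate, dst]


cmd = ffmpeg_command("capture.avi", "capture_h264.mp4", "h264")
print(shlex.join(cmd))
# prints "ffmpeg -y -i capture.avi -c:v libx264 -b:v 500k capture_h264.mp4"
```

The list can be passed to `subprocess.run(cmd, check=True)` to perform the encode. Note that the H.263 encoder accepts only certain picture sizes, such as 352x288, so the input may need to be scaled first.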
Homework 2
Computer vision
Game Graphics
Multimedia retrieval
Segmentation Search (Google)
Multi-camera system
3D cinema Realistic broadcasting