AT MMC Lecture10

BITS Pilani, Hyderabad Campus
Multimedia Computing
Abhishek Thakur, CSIS, BITS-Pilani, Hyderabad Campus
Research trends and Review of Key Terms
Module 10 (of 10)
References: Online materials
Modules
Module: Coverage
1. Introduction: Overview of Multimedia Applications, Systems and Tools
2. Data Compression: Lossless and Lossy compression
3. Image, Graphics and Colors: Graphics and Image Data Representation, Colour Science and JPEG
4. Video / Audio Fundamentals: Basics of Audio and Video as they evolved
5. Video Compression: H261, H264, MPEG and High Efficiency Video Coding
6. Audio and Synchronization: Audio compression techniques and synchronization between audio, video, text, graphics etc.
7. Storage and Communication Basics: Overview of secondary storage and networks, latency, buffering / queuing, interrupts etc.
8. Multimedia Communication: Real time communication protocols, Quality of Service, MPEG transport, Wireless communication
9. Modern Multimedia: New protocols (VoIP, DASH) and applications (YouTube, Content Distribution)
10. Research trends and course recap: Video Search, object detection and tracking
10/4/16
10.1 HEVC
High Efficiency Video Coding
Aims for a 50% bit-rate reduction at similar video quality compared to Advanced Video Coding (AVC / H.264 / MPEG-4 Part 10)
Or a 25% reduction in bit rate with a 50% reduction in complexity for the same quality
Feasible because of faster and more plentiful processing, larger buffers, and more GPU cores
Ref: Wikipedia and Gary Sullivan's "Overview of the High Efficiency Video Coding (HEVC) Standard" (http://goo.gl/6aXHK3 and http://goo.gl/SoLBEY)
Quick overview of Video Compression (till H.264)
Macro-blocks: typically 16x16
Intra coding: spatial redundancy within a frame / picture
2D transforms on blocks (DCT/DWT): typically 8x8
Motion prediction: predicted frames (P) and bi-directionally predicted frames (B)
Quantization: compression after the transform (DCT/DWT)
Lossless compression: run length, arithmetic coding (CABAC) etc.
Prediction of values: DC components, AC components, motion vectors
Group of Blocks (GOB) or slices: error resilience and parallelism
Scalability in multiple manners (spatial, temporal, quality)
Object-based coding for graphical output
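The transform and quantization steps listed above can be sketched on a single 8x8 block. This is a minimal illustration under stated assumptions, not any codec's actual pipeline: the names `dct_2d`, `idct_2d`, `quantize` and `dequantize` and the uniform step size of 16 are made up for this example, whereas real codecs use integer transforms and per-frequency quantization.

```python
import math

N = 8  # block size used in this sketch


def _alpha(k):
    # Normalization factor that makes the DCT orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (direct, unoptimized form)."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse of dct_2d."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    # Uniform quantization: this is where information is lost.
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]


# A smooth gradient block: most of its energy lands in a few low frequencies.
block = [[16 * x + 2 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
levels = quantize(coeffs, 16)
nonzero = sum(1 for row in levels for level in row if level != 0)
recon = idct_2d(dequantize(levels, 16))
max_err = max(abs(recon[x][y] - block[x][y]) for x in range(N) for y in range(N))
```

For a smooth block like this one, only a handful of the 64 quantized levels are non-zero, which is what the later run-length and entropy coding stages exploit; `max_err` shows the price paid in reconstruction error.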
HEVC Needs
Higher-resolution video content and displays [8K]
Higher frame rates [60 or 120 fps, for super slow motion etc.]
Medical and other fields with higher color depth (say 10 bit instead of 8 bit on luminance)
More CPU/GPU cores, which need better parallelization of logic
Multi-view coding (multiple video feeds / 3D, though partially done in AVC)
Version 1 of HEVC was formally ratified in April 2013
Key changes in HEVC
Larger Coding Tree Unit (CTU, up to 64x64) for motion prediction etc., in place of 16x16 macro-blocks
Rectangular tiles as an option: prediction within a picture does not cross tile boundaries, so tiles can be processed independently
A CTU can be split into smaller Transform Units (TUs of 4x4, 8x8, 16x16 or 32x32)
Each 64x64 CTU has 4 or more TUs
In-loop filters may be used to minimize blocking effects around block boundaries
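The effect of the larger block size can be seen by counting how many top-level blocks cover one frame. The sketch below is simple arithmetic; the function name `blocks_needed` and the 1920x1080 frame size are assumptions chosen for illustration.

```python
import math


def blocks_needed(width, height, block_size):
    """Number of block_size x block_size blocks needed to cover a frame.

    Partial blocks at the right and bottom edges still count as one block.
    """
    return math.ceil(width / block_size) * math.ceil(height / block_size)


macroblocks = blocks_needed(1920, 1080, 16)  # 120 x 68 = 8160
ctus = blocks_needed(1920, 1080, 64)         # 30 x 17 = 510
```

A 1080p frame therefore needs 8160 macro-blocks of 16x16 but only 510 CTUs of 64x64, so far fewer top-level units have to be signalled when large areas of the picture are uniform.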
Other changes
Wavefront parallel processing (WPP)
Slices are divided into rows of CTUs.
The first few CTUs of a row must be decoded before decoding of the next row can start.
This allows some level of parallelism.
Other improvements were made to CABAC, motion vector prediction, loop filters etc.
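The row dependency in WPP can be sketched as a small scheduling exercise. The model below is an idealization under stated assumptions: every CTU takes one time step, there is one worker per CTU row, and a row may start a CTU only after the row above is `lag` CTUs ahead (assumed here to be two). The function name `wpp_schedule` is hypothetical.

```python
def wpp_schedule(rows, cols, lag=2):
    """Earliest start step of each CTU with one worker per CTU row.

    CTU (r, c) waits for its left neighbour (r, c - 1) and for the CTU in
    the row above that is lag - 1 columns to its right (clamped to the
    last column).
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            after_left = start[r][c - 1] + 1 if c > 0 else 0
            if r > 0:
                dep_col = min(c + lag - 1, cols - 1)
                after_above = start[r - 1][dep_col] + 1
            else:
                after_above = 0
            start[r][c] = max(after_left, after_above)
    return start


# A 1920x1080 frame with 64x64 CTUs has 17 rows of 30 CTUs.
schedule = wpp_schedule(17, 30)
parallel_steps = schedule[-1][-1] + 1  # steps until the last CTU finishes
sequential_steps = 17 * 30
```

In this idealized model the frame finishes in 62 steps instead of 510, because each row trails the one above it by two CTUs rather than waiting for the whole row.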
10.2 Video Search
Order of complexity is much higher than for text search or image search.
If the content is well organized during storage (e.g. a DVD library or Electronic Program Guide, EPG), search can be as simple as indexing on title (file name) or on tags added during storage, e.g. actors, channel number (or name) etc. Such explicit metadata can be used directly for search.
Sometimes internal (implicit) metadata can be used, e.g. title and other details within the container (say file header fields), time and location of capture / edit, tools used to capture / edit.
The next level is search for content beyond the headers, e.g. subtitles (transcripts or closed-caption info), audio search, story search, image search, scene search etc.
How the search engine is driven is also important, e.g. is it pure text search, sample-image-based search, interpretation of implicit user intent etc.
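The simplest case above, search over explicit metadata, can be sketched as an inverted index from words to video identifiers. The catalog layout (a `title` string and a `tags` list per video) and the function names `build_index` and `search` are assumptions made for this example.

```python
from collections import defaultdict


def build_index(catalog):
    """Map each lowercase word in a title or tag to the videos containing it."""
    index = defaultdict(set)
    for video_id, meta in catalog.items():
        for text in [meta["title"]] + list(meta["tags"]):
            for word in text.lower().split():
                index[word].add(video_id)
    return index


def search(index, query):
    """Return the videos whose metadata contains every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


catalog = {
    "v1": {"title": "Evening News", "tags": ["channel 5", "news"]},
    "v2": {"title": "Cooking Show", "tags": ["channel 7", "food"]},
    "v3": {"title": "Morning News", "tags": ["channel 7", "news"]},
}
index = build_index(catalog)
hits = search(index, "channel 7 news")
```

Searching beyond the headers follows the same pattern once subtitles, recognized speech or on-screen text have been turned into words; the hard part is producing that text, not indexing it.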
Advanced Video Search
Speech recognition: say, search for where "Welcome to New York" is spoken in a video. Research area: audio to text and beyond.
Text within video: search for where "M G Road" or "Mahatma Gandhi Road" is present within some scene in a video. Research area: optical character recognition and beyond.
Frame analysis (visual descriptors): analyze each frame to extract information that can later be searched for, e.g. color, texture, shape, motion, situation etc.
Search against given input: search across videos to match the image of a missing person or pet
Video Analysis
There are three key steps in video analysis:
detection of interesting moving objects,
tracking of such objects from frame to frame, and
analysis of object tracks to recognize their behaviour.
10.3 Object Detection and Tracking
Ref: Yilmaz, Alper, Omar Javed, and Mubarak Shah. "Object tracking: A survey." ACM Computing Surveys (CSUR) 38, no. 4 (2006): 13.
Some applications:
motion-based recognition, that is, human identification based on gait, automatic object detection, etc.;
automated surveillance, that is, monitoring a scene to detect suspicious activities or unlikely events;
video indexing, that is, automatic annotation and retrieval of the videos in multimedia databases;
human-computer interaction, that is, gesture recognition, eye gaze tracking for data input to computers, etc.;
traffic monitoring, that is, real-time gathering of traffic statistics to direct traffic flow;
vehicle navigation, that is, video-based path planning and obstacle avoidance capabilities.
Tracking - phases
Tracking task:
In the simplest form, tracking can be defined as the problem of
estimating the trajectory of an object in the image plane as it
moves around a scene. In other words, a tracker assigns
consistent labels to the tracked objects in different frames of a
video. Additionally, depending on the tracking domain, a tracker can
also provide object centric information, such as orientation, area, or
shape of an object.
How - Two subtasks:
Build some model of what you want to track
Use what you know about where the object was in the previous
frame(s) to make predictions about the current frame and restrict the
search
Repeat the two subtasks, possibly updating the model
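The two subtasks can be sketched for the simplest representation, a single point per object. In this minimal example the "model" is only the last position and velocity, the prediction assumes constant velocity, and the search is restricted to detections within a gating radius of the prediction. The function name `track_point`, the `radius` parameter and the sample detections are all assumptions made for illustration.

```python
import math


def track_point(initial, detections_per_frame, radius=5.0):
    """Follow one object through per-frame lists of candidate (x, y) detections."""
    trajectory = [initial]
    velocity = (0.0, 0.0)
    for detections in detections_per_frame:
        last = trajectory[-1]
        # Predict the current position from the previous frame(s).
        pred = (last[0] + velocity[0], last[1] + velocity[1])
        # Restrict the search to candidates near the prediction.
        near = [d for d in detections
                if math.hypot(d[0] - pred[0], d[1] - pred[1]) <= radius]
        if near:
            best = min(near, key=lambda d: math.hypot(d[0] - pred[0], d[1] - pred[1]))
        else:
            best = pred  # nothing found (e.g. occlusion): coast on the prediction
        # Update the model for the next frame.
        velocity = (best[0] - last[0], best[1] - last[1])
        trajectory.append(best)
    return trajectory


frames = [
    [(2, 0), (50, 50)],  # a far-away distractor is ignored
    [(4, 0), (2, 1)],
    [],                  # object not detected in this frame
    [(8, 0)],
]
path = track_point((0, 0), frames)
```

A practical tracker would also carry an appearance model and handle several objects at once; this sketch only shows the predict, search, update cycle.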

Tracking : Complexity
Tracking objects can be complex due to:
loss of information caused by projection of 3D world on 2D image
noise in images
complex object shapes / motion
non-rigid or articulated nature of objects
partial and full object occlusions
scene illumination changes
real-time processing requirements
Constraints to Simplify:
Almost all tracking algorithms assume that the object motion is smooth with no
abrupt changes
The object motion is assumed to be of constant velocity
Prior knowledge about the number and the size of objects, or the object
appearance and shape
Object Representation - Shape
Figure: object shape representations. (a) Centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) complete object contour, (h) control points on object contour, (i) object silhouette.
Object Representation - Appearance
Template based: e.g. face description parameters; pose should not vary much
Probability density based: e.g. texture within the ellipse or segment, modelled with a Gaussian or histogram-based approach
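A histogram-based appearance model can be sketched as follows. The pixel values inside the object region are reduced to a normalized intensity histogram, and two histograms are compared with the Bhattacharyya coefficient. The choice of similarity measure, the bin count, and the names `histogram` and `bhattacharyya` are assumptions made for this example.

```python
import math


def histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of the pixels inside an object region."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]


def bhattacharyya(p, q):
    """Similarity of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


target = histogram([0, 10, 250, 255])          # model built when the object is detected
same = histogram([5, 20, 240, 251])            # similar mix of dark and bright pixels
different = histogram([100, 120, 130, 140])    # mid-grey region

score_same = bhattacharyya(target, same)
score_different = bhattacharyya(target, different)
```

Because the histogram discards pixel positions, this kind of model tolerates some change in pose or shape that a fixed template would not.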
Features for Tracking

Color: RGB, Luv, Lab, HSV, etc. There is no last word on
which color space is more effective; a variety of color spaces
have been used
Edges: less sensitive to illumination changes compared to color
features. Algorithms that track the object boundary usually
use edges as features. Because of its simplicity and
accuracy, the most popular edge detection approach is the
Canny Edge detector.
Texture: measure of the intensity variation of a surface which
quantifies properties such as smoothness and regularity
Optical Flow: Look at uniformity of displacement across frames
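The claim that edges are less sensitive to illumination than color can be checked with a small gradient computation. The sketch below uses the Sobel operator, which is only the gradient stage of an edge detector and not the Canny detector itself; the function name `sobel_magnitude` and the tiny test image are assumptions made for illustration.

```python
import math


def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image given as a list of rows.

    Border pixels are left at 0.0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            out[y][x] = math.hypot(gx, gy)
    return out


# Dark left half, bright right half: one vertical edge between columns 2 and 3.
image = [[10, 10, 10, 100, 100, 100] for _ in range(5)]
# The same scene under uniformly brighter illumination.
brighter = [[p + 50 for p in row] for row in image]

edges = sobel_magnitude(image)
edges_brighter = sobel_magnitude(brighter)
```

The raw pixel values differ everywhere between the two images, but the gradient maps are identical, since a uniform brightness offset cancels out in the differences.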
Object Detection
Either at beginning or when an object first appears in the video
Point detectors: find interest points in images which have an
expressive texture in their respective localities
Segmentation: partition the image into perceptually similar
regions
Background subtraction: build a representation of the scene called the background model, then find deviations from the model for each incoming frame.
Supervised Classifiers: Prior training based on sample images
which are pre-classified (SVM, Neural networks etc.)
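Background subtraction is the easiest of these detectors to sketch. The example below assumes one particular background model, a per-pixel running average, with a fixed threshold; the learning rate `alpha`, the threshold of 30, and the function names are assumptions, and practical systems typically use richer per-pixel models.

```python
def update_background(background, frame, alpha=0.1):
    """Blend the new frame into the background model (running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that deviate from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


background = [[100.0] * 3 for _ in range(3)]  # model of the empty scene
frame = [[100, 100, 100],
         [100, 200, 100],                     # a bright object enters at the centre
         [100, 100, 100]]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

Only the centre pixel is flagged as foreground. The update step lets the model absorb slow changes such as lighting drift, at the cost of gradually absorbing objects that stop moving.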
Detection Examples
Object Segmentation
Mean Shift Clustering

Graph Cut e.g. Normalized cuts
Contour based approaches
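Mean shift clustering can be sketched in one dimension. Each point is moved repeatedly to the mean of its neighbours within a window until it stops moving, and points that converge to the same mode form one cluster. The flat window, the `bandwidth` parameter, the 1D intensity data and the function names are assumptions made for this example; image segmentation usually applies the same idea in a joint space of position and color.

```python
def shift_to_mode(p, points, bandwidth, max_iters=100, tol=1e-9):
    """Move p uphill to the nearest density mode using a flat window."""
    m = float(p)
    for _ in range(max_iters):
        window = [q for q in points if abs(q - m) <= bandwidth]
        new_m = sum(window) / len(window)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


def mean_shift_1d(points, bandwidth):
    """Cluster points by the mode each one converges to."""
    centers, labels = [], []
    for p in points:
        mode = shift_to_mode(p, points, bandwidth)
        for i, c in enumerate(centers):
            if abs(mode - c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


# Pixel intensities from a dark region and a bright region.
intensities = [1, 2, 3, 10, 11, 12]
centers, labels = mean_shift_1d(intensities, bandwidth=2)
```

Unlike k-means, the number of clusters is not given in advance; it falls out of the data and the chosen bandwidth.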
Object Tracking
Fig. 7. Taxonomy of tracking methods.

Some Examples of object tracking
Fig. 8. Different tracking approaches. (a) Multipoint correspondence, (b) parametric transformation of a rectangular patch, (c, d) two examples of contour evolution.
Fig. 11. Results of two point correspondence algorithms. (a) Tracking using the algorithm proposed by Veenman et al. [2001] in the rotating dish sequence; color segmentation was used to detect black dots on a white dish (© 2001 IEEE). (b) Tracking birds using the algorithm proposed by Shafique and Shah [2003]; birds are detected using background subtraction.
10.4 Course Recap: Key Topics (1/3)
Intro: Evolution of multimedia and research challenges; authoring and editing tools
Compression
Lossless compression: entropy, Run Length Coding, Variable Length Coding, Adaptive Huffman, dictionary-based coding (LZW), arithmetic coding, differential coding in images
Lossy compression: quantization (linear/non-linear); transforms (DCT, DWT); Embedded Zerotree Wavelet coefficients (successive approximation)
Color Science: human vision, RGB, YUV, CMYK, gamma correction
Image file format and compression: 1 bit per pixel ... 32 bpp, palette, Color Lookup Tables
Audio Video Fundamentals: Analog video: connectivity (separate color and audio etc.), scan lines, interlacing, color and audio subcarrier; Digital video: chroma subsampling, screen resolution, frame rate; Digitization of sound: sampling and quantization, aliasing and Nyquist sampling rate, SNR, logarithmic amplitudes (dB/dBm), non-linear transform before encoding, human voice vs. music, band-limiting / low pass for voice, synthetic sound, PCM, DPCM, Adaptive Delta Modulation, Adaptive DPCM
Course Recap: Key modules (2/3)
Video Compression: macro-blocks, motion prediction / motion vectors, GOB; MPEG, bi-directional prediction, slices, scalable video coding (SNR, temporal, spatial, hybrid, data partitioning); MPEG-4: object-based coding, Video Object Sequence ... Video Object Plane (VOP), bounding box and boundary block in VOP, padding, shape-adaptive DCT, shape coding (intro), global vs. local motion compensation, triangulation for synthetic objects, face/body animation/definition parameters; H264/AVC; MPEG-7 (description).
Audio Compression: telecom-driven conferencing approaches (ADPCM), vocoders (channel bands, discard phase), linear predictive coding with vocoders, Code Excited Linear Prediction (long term / short term); psychoacoustics, frequency and temporal masking; MPEG audio: time-to-frequency transform, audio frames (12 or 36 samples across 32 bands), stereo/multi-channel audio.
Synchronization: temporal sync between audio, video, pointer etc., Logical Data Units (LDU), compensating for loss of sync; synchronization specification models: interval based, timeline axis based, flow control - hierarchical (serial / parallel), reference point based flow control, event based.
Course Recap: Key modules (3/3)
Multimedia Systems: challenges of scalability and portability [host CPU/RAM/disk, network, capture interface or device, render interface or device], intro to storage and scheduling challenges
Multimedia Communications: broad overview of networks and their evolution, circuit vs. packet switched, OSI vs. TCP/IP layers, physical layers, mobility and wireless, TCP and UDP; FDM or WDM, TDM, Ethernet, access networks (fiber, terrestrial, satellite); QoS, jitter, MPLS; multimedia on UDP/IP (Multicast, RTP, RTCP, RSVP, RTSP, H323, SIP), set top box, broadcast schemes for Media on Demand, buffer management; wireless networks: cellular evolution GSM/CDMA, 3G and beyond, wireless LAN, Bluetooth, wireless propagation and loss, error resiliency in multimedia for wireless communication [FEC, burst errors, Error Resilient Entropy Coding (blocks are at fixed distance)]
Modern Multimedia Systems: adaptive streaming, DASH (MPD: Media Presentation Description), segment indexing, Content Distribution Networks, scalable file systems, Hadoop FS, GFS, BigTable
Multimedia Search, Detection and Tracking: HEVC, search, detection and tracking
Thank You !!