
BITS Pilani, Hyderabad Campus

Multimedia Computing
Abhishek Thakur
CSIS, BITS-Pilani, Hyderabad Campus


Multimedia Computing
Research trends and Review of Key Terms
Module 10 (of 10)


References: Online materials

Modules

Module | Coverage
1. Introduction | Overview of Multimedia Applications, Systems and Tools
2. Data Compression | Lossless and Lossy compression
3. Image, Graphics and Colors | Graphics and Image Data Representation, Colour Science and JPEG
4. Video / Audio Fundamentals | Basics of Audio and Video as they evolved
5. Video Compression | H261, H264, MPEG and High Efficiency Video Coding
6. Audio and Synchronization | Audio compression techniques and synchronization between audio, video, text, graphics etc.
7. Storage and Communication Basics | Overview of secondary storage and networks, latency, buffering / queuing, interrupts etc.
8. Multimedia Communication | Real time communication protocols, Quality of Service, MPEG transport, Wireless communication
9. Modern Multimedia | New protocols (VoIP, DASH) and applications (YouTube, Content Distribution)
10. Research trends and course | Video Search, object detection and tracking

Multimedia Computing
10/4/16
BITS Pilani, Hyderabad Campus

10.1 HEVC
High Efficiency Video Coding
Aims for a 50% bit-rate reduction at similar video quality compared to Advanced Video Coding (AVC / H264 / MPEG-4 Part 10)
25% reduction in bit rate with 50% reduction in complexity for the same quality
Feasible because of faster and more processing power, larger buffers, more GPU cores
Ref: Wikipedia + Gary Sullivan's "Overview of the High Efficiency Video Coding (HEVC) Standard" (http://goo.gl/6aXHK3 and http://goo.gl/SoLBEY)

Quick overview of Video Compression (Till H264)

Macro-blocks: typically 16x16
Intra coding: spatial redundancy within frame / picture
2D transforms on blocks (DCT/DWT): typically 8x8
Motion prediction: Predicted frames (P), Bi-directional prediction (B)
Quantization: compression after transform (DCT/DWT)
Lossless compression: Run length, Arithmetic, CABAC etc.
Prediction of values: DC components, AC components, motion vectors
Group of Blocks (GOB) or Slices: error resilience & parallelism
Scalability in multiple manners (spatial, temporal, quality, etc.)
Object based coding for graphical output
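
The transform and quantization steps listed above can be sketched in a few lines. This is a minimal illustration, assuming an 8x8 block, a naive 2D DCT-II and a single uniform quantization step (the step value 16 and the synthetic gradient block are assumptions made for the example; real codecs use per-frequency quantization tables and fast transforms).

```python
import math

N = 8  # block size; the slide says transform blocks are "typically 8x8"

def dct_2d(block):
    """Naive orthonormal 2D DCT-II of an N x N block (O(N^4), illustration only)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization: this is the lossy step (assumed single step size)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

# Synthetic block (assumption): a smooth horizontal gradient, identical rows.
block = [[100 + 2 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
quantized = quantize(coeffs, step=16)

# Most quantized coefficients are zero, which is what run-length and
# entropy coding exploit afterwards.
nonzero = sum(1 for row in quantized for c in row if c != 0)
print("non-zero coefficients after quantization:", nonzero, "of", N * N)
```

Because the block is smooth, the energy concentrates in a handful of low-frequency coefficients and the rest quantize to zero.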

HEVC Needs
Higher resolution video content and displays [8K]
Higher frame rates [60 or 120 fps for super slow motion etc.]
Medical and other fields with higher color depth (say 10 bit instead of 8 bit on luminance)
More CPU/GPU cores need better parallelization of logic
Multi-view coding (multiple video feeds / 3D, though partially done in AVC)
Version 1 of HEVC formally ratified in April 2013

Key changes in HEVC

Larger Coding Tree Unit (CTU, up to 64x64) for motion prediction etc. in place of 16x16 macro-blocks
Rectangular tiles as an option, with no intra/inter prediction across tiles in I slices and no intra prediction in P/B slices
A CTU can be split into smaller Transform Units (TU of 4x4, 8x8, 16x16)
Each CTU can have 4 or more TUs
Loop filters may be used to minimize blocking effects around CTUs
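
The idea of splitting a 64x64 CTU into smaller units can be sketched as a recursive quadtree. This is only an illustration: the variance threshold used to decide a split is an assumption made for the example (real encoders typically decide by rate-distortion cost), and the function name `split_quadtree` and the synthetic CTU are hypothetical.

```python
def split_quadtree(block, x, y, size, min_size, threshold):
    """Recursively split a square region; returns leaf blocks as (x, y, size).

    Split rule (assumed for illustration): split while the pixel variance
    of the region exceeds `threshold` and the region is larger than `min_size`.
    """
    vals = [block[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    if size <= min_size or var <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_quadtree(block, x + dx, y + dy, half, min_size, threshold)
    return leaves

CTU = 64
# Synthetic CTU (assumption): flat everywhere except a textured 16x16 corner.
ctu = [[0] * CTU for _ in range(CTU)]
for r in range(16):
    for c in range(16):
        ctu[r][c] = (r * 7 + c * 13) % 50

leaves = split_quadtree(ctu, 0, 0, CTU, min_size=4, threshold=1.0)
print(len(leaves), "leaf blocks")
```

Flat regions stay as large blocks while the textured corner is divided down to 4x4, so detail is spent only where the content needs it.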

Other changes
Wavefront parallel processing (WPP)
Slices are divided into rows of CTUs.
Some initial CTUs of the row above need to be decoded before the next row can be decoded.
Allows some level of parallelism
Other improvements made on CABAC, motion vector prediction, loop filters etc.
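
The row dependency in WPP can be illustrated with a small scheduling sketch. It assumes each CTU takes one time step and that a row may start once the row above is two CTUs ahead (the `lag=2` default is an assumption for this example; the slide only says "some initial CTUs"). The function name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(rows, cols, lag=2):
    """Earliest start step of each CTU when every row has its own worker.

    A CTU waits for its left neighbour and for the CTU `lag - 1` columns
    ahead in the row above (clamped at the end of the row).
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = start[r][c - 1] + 1 if c > 0 else 0
            if r > 0:
                dep_col = min(c + lag - 1, cols - 1)
                above = start[r - 1][dep_col] + 1
            else:
                above = 0
            start[r][c] = max(left, above)
    return start

sched = wavefront_schedule(rows=4, cols=8)
total_steps = sched[-1][-1] + 1
print("parallel steps:", total_steps, "vs sequential:", 4 * 8)
```

With 4 rows of 8 CTUs the wavefront finishes in 14 steps instead of 32, which is the "some level of parallelism" the slide refers to.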


10.2 Video Search

Order of complexity is much higher than for Text Search or Image Search.
If the content is well organized during storage (e.g. DVD Library or Electronic Program Guide - EPG), search can be as simple as indexing on title (file name) or on tags added during storage, e.g. actors, channel number (or name) etc. Explicit metadata can be used for such search.
Sometimes internal (implicit) metadata can be used, e.g. title and other details within the container (say file header fields), time and location of capture / edit, tools used to capture / edit.
The next level is search for content beyond the headers, e.g. subtitles (transcripts or closed-caption info), audio search, story search, image search, scene search etc.
How the search engine is driven is also important, e.g. is it pure text search, sample-image-based search, interpretation of implicit user intent etc.
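
Search on explicit metadata, the simplest case above, amounts to an inverted index over the stored tags. A minimal sketch follows; the catalog entries, field names and the `build_index` / `search` helpers are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

def build_index(catalog):
    """Map every lower-cased metadata token to the set of video ids containing it."""
    index = defaultdict(set)
    for video_id, meta in catalog.items():
        for value in meta.values():
            for token in str(value).lower().split():
                index[token].add(video_id)
    return index

def search(index, query):
    """Return ids of videos whose metadata contains ALL query tokens."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical catalog: title, actor and channel tags entered at storage time.
catalog = {
    "v1": {"title": "Harbour Lights", "actors": "A. Rao", "channel": "101"},
    "v2": {"title": "Harbour Nights", "actors": "B. Iyer", "channel": "102"},
    "v3": {"title": "Mountain Lights", "actors": "A. Rao", "channel": "101"},
}
index = build_index(catalog)
print(search(index, "harbour"))       # videos tagged with "harbour"
print(search(index, "lights rao"))    # videos matching both tokens
```

Content-based search (speech, on-screen text, visual descriptors) can feed the same kind of index once the content has been turned into tokens or descriptors.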

Advanced Video Search

Speech recognition: say, search for where "Welcome to New York" is spoken in a video. Research area of Audio to Text and beyond.
Text within video: search for where "M G Road" or "Mahatma Gandhi Road" is present within some scene in a video. Research area of Optical Character Recognition and beyond.
Frame analysis (visual descriptors): analyze each frame to extract information that can later be searched for, e.g. color, texture, shape, motion, situation etc.
Search against given input: search across videos to match the image of a missing person or pet
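
Frame analysis and search against a given input can be combined in a very small sketch: compute a descriptor per frame and rank frames by similarity to the query image. The example below assumes grayscale frames stored as nested lists, a 4-bin intensity histogram as the descriptor and histogram intersection as the similarity measure; these choices and the tiny 2x2 "frames" are assumptions for illustration, not a production matcher.

```python
def gray_histogram(frame, bins=4, max_val=256):
    """Normalized intensity histogram used as a simple visual descriptor."""
    hist = [0] * bins
    count = 0
    for row in frame:
        for p in row:
            hist[p * bins // max_val] += 1
            count += 1
    return [h / count for h in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def best_match(query, frames):
    """Index of the frame most similar to the query, plus all scores."""
    qh = gray_histogram(query)
    scores = [intersection(qh, gray_histogram(f)) for f in frames]
    best = max(range(len(frames)), key=lambda i: scores[i])
    return best, scores

# Tiny hypothetical frames (2x2 pixels) standing in for video frames.
bright = [[200, 220], [240, 250]]
mixed = [[10, 250], [20, 240]]
dark = [[10, 20], [30, 40]]
query = [[15, 25], [35, 45]]   # the sample image being searched for

idx, scores = best_match(query, [bright, mixed, dark])
print("best frame:", idx, "scores:", scores)
```

In practice the descriptors would be precomputed and indexed per frame or per shot, so that a query only needs comparisons rather than a pass over the raw video.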

Video Analysis
There are three key steps in video analysis:
detection of interesting moving objects,
tracking of such objects from frame to frame, and
analysis of object tracks to recognize their behaviour.


10.3 Object Detection and Tracking
Ref: Yilmaz, Alper, Omar Javed, and Mubarak Shah. "Object tracking: A survey." ACM Computing Surveys (CSUR) 38, no. 4 (2006): 13.
Some applications:
motion-based recognition, that is, human identification based on gait, automatic object detection, etc.;
automated surveillance, that is, monitoring a scene to detect suspicious activities or unlikely events;
video indexing, that is, automatic annotation and retrieval of the videos in multimedia databases;
human-computer interaction, that is, gesture recognition, eye gaze tracking for data input to computers, etc.;
traffic monitoring, that is, real-time gathering of traffic statistics to direct traffic flow;
vehicle navigation, that is, video-based path planning and obstacle avoidance capabilities.


Tracking - phases
Tracking task:
In the simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. In other words, a tracker assigns consistent labels to the tracked objects in different frames of a video. Additionally, depending on the tracking domain, a tracker can also provide object-centric information, such as orientation, area, or shape of an object.
How - Two subtasks:
Build some model of what you want to track
Use what you know about where the object was in the previous frame(s) to make predictions about the current frame and restrict the search
Repeat the two subtasks, possibly updating the model
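
The two subtasks can be sketched as a toy tracker. The "model" here is a fixed 2x2 intensity template, the prediction assumes the object keeps its last observed displacement, and the search is restricted to a small window around the predicted position, scored by sum of squared differences (SSD). All of these choices, the synthetic frames and the helper names are assumptions for illustration; the model is not updated in this sketch.

```python
def make_frame(top, left, size=8):
    """Synthetic frame: black background with a 2x2 bright object."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 9
    return frame

def ssd(frame, template, top, left):
    """Sum of squared differences between the template and a frame patch."""
    return sum((frame[top + r][left + c] - template[r][c]) ** 2
               for r in range(len(template)) for c in range(len(template[0])))

def track(frames, template, start, radius=2):
    """Predict from the last displacement, then search a (2*radius+1)^2 window."""
    th, tw = len(template), len(template[0])
    path = [start]
    velocity = (0, 0)
    for frame in frames[1:]:
        max_r, max_c = len(frame) - th, len(frame[0]) - tw
        # Prediction: previous position plus last displacement, clamped to the frame.
        pred_r = min(max(path[-1][0] + velocity[0], 0), max_r)
        pred_c = min(max(path[-1][1] + velocity[1], 0), max_c)
        best, best_cost = (pred_r, pred_c), ssd(frame, template, pred_r, pred_c)
        # Restricted search around the prediction.
        for r in range(max(0, pred_r - radius), min(max_r, pred_r + radius) + 1):
            for c in range(max(0, pred_c - radius), min(max_c, pred_c + radius) + 1):
                cost = ssd(frame, template, r, c)
                if cost < best_cost:
                    best, best_cost = (r, c), cost
        velocity = (best[0] - path[-1][0], best[1] - path[-1][1])
        path.append(best)
    return path

frames = [make_frame(1, 1), make_frame(2, 3), make_frame(3, 5)]
template = [[9, 9], [9, 9]]
path = track(frames, template, start=(1, 1))
print(path)
```

Restricting the search to a window around the prediction is what keeps the per-frame cost low; a larger radius tolerates more abrupt motion at a higher cost.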



Tracking : Complexity
Tracking objects can be complex due to:
loss of information caused by projection of 3D world on 2D image
noise in images
complex object shapes / motion
non-rigid or articulated nature of objects
partial and full object occlusions
scene illumination changes
real-time processing requirements
Constraints to Simplify:
Almost all tracking algorithms assume that the object motion is smooth with no abrupt changes
The object motion is often further assumed to be of constant velocity
Prior knowledge about the number and the size of objects, or the object appearance and shape

Object Representation - Shape

(a) Centroid, (b) Multiple points, (c) Rectangular patch, (d) Elliptical patch, (e) Part-based multiple patches, (f) Object skeleton, (g) Complete object contour, (h) Control points on object contour, (i) Object silhouette.

Object Representation - Appearance

Template based, e.g. face description parameters; pose should not vary much
Probability density based, e.g. texture within the ellipse or segment, modelled with a Gaussian or histogram based approach
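
A probability-density appearance model can be sketched with a single Gaussian over pixel intensities: fit mean and variance on the object region, then score candidate patches by their average log-likelihood. The pixel values, the single-channel assumption and the helper names below are hypothetical; real trackers often use multi-channel or mixture models.

```python
import math

def fit_gaussian(pixels):
    """Mean and variance of the object's pixel intensities (the appearance model)."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero

def avg_log_likelihood(pixels, mean, var):
    """Average Gaussian log-likelihood of a candidate patch under the model."""
    const = -0.5 * math.log(2 * math.pi * var)
    return sum(const - (p - mean) ** 2 / (2 * var) for p in pixels) / len(pixels)

# Hypothetical intensities sampled inside the object's ellipse or segment.
object_pixels = [118, 122, 120, 121, 119, 120, 123, 117]
mean, var = fit_gaussian(object_pixels)

similar = [119, 121, 120, 122]     # candidate patch that looks like the object
different = [30, 35, 200, 210]     # candidate patch that does not

print(avg_log_likelihood(similar, mean, var))
print(avg_log_likelihood(different, mean, var))
```

The candidate with the higher score is the better match for the modelled appearance.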


Features for Tracking


Color: RGB, Luv, Lab, HSV, etc. There is no last word on
which color space is more effective; a variety of color spaces
have been used
Edges: less sensitive to illumination changes compared to color
features. Algorithms that track the object boundary usually
use edges as features. Because of its simplicity and
accuracy, the most popular edge detection approach is the
Canny Edge detector.
Texture: measure of the intensity variation of a surface which
quantifies properties such as smoothness and regularity
Optical Flow: Look at uniformity of displacement across frames
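
To illustrate why edge features are less sensitive to illumination than raw color, the sketch below computes a Sobel gradient magnitude and thresholds it. This is only the gradient stage; it is not the Canny detector, which also involves smoothing, non-maximum suppression and hysteresis thresholding. The test image, the threshold of 100 and the uniform brightness offset of 40 are assumptions for the example.

```python
def sobel_magnitude(img):
    """Gradient magnitude with 3x3 Sobel kernels (border pixels left at 0)."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(kx[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def edge_map(img, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds the threshold."""
    return [[1 if m > threshold else 0 for m in row] for row in sobel_magnitude(img)]

# Vertical step edge: dark left half, brighter right half.
img = [[0, 0, 0, 50, 50, 50] for _ in range(5)]
# Same scene under a uniform brightness offset (assumed illumination change).
shifted = [[p + 40 for p in row] for row in img]

edges = edge_map(img, threshold=100)
edges_shifted = edge_map(shifted, threshold=100)
print(edges[2])
print(edges == edges_shifted)
```

The pixel values change under the offset but the edge map does not, because the gradient depends only on differences between neighbouring pixels.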


Object Detection
Either at beginning or when an object first appears in the video
Point detectors: find interest points in images which have an
expressive texture in their respective localities
Segmentation: partition the image into perceptually similar
regions
Background subtraction: build a representation of the scene called the background model and then find deviations from the model for each incoming frame.
Supervised Classifiers: Prior training based on sample images
which are pre-classified (SVM, Neural networks etc.)
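
Background subtraction is the easiest of these detectors to sketch. The example below assumes grayscale frames as nested lists, a running-average background model with learning rate `alpha`, and a fixed deviation threshold; the parameter values and the synthetic frames are assumptions for illustration.

```python
def update_background(background, frame, alpha=0.1):
    """Running-average background model: B <- (1 - alpha) * B + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that deviate from the background model by more than the threshold."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

# Learn the background from a few frames of the empty scene (synthetic data).
empty = [[50] * 4 for _ in range(3)]
background = [[float(p) for p in row] for row in empty]
for _ in range(5):
    background = update_background(background, empty)

# A new frame in which one pixel is covered by a bright object.
frame = [row[:] for row in empty]
frame[1][2] = 200
mask = foreground_mask(background, frame)
for row in mask:
    print(row)
```

A small `alpha` makes the model adapt slowly, so moving objects stand out; a large `alpha` absorbs them into the background quickly.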


Detection Examples


Object Segmentation

Mean Shift Clustering
Graph Cut, e.g. Normalized cuts
Contour based approaches
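
Mean shift clustering can be sketched in one dimension: each point repeatedly moves to the mean of the points within a bandwidth until it settles on a mode, and points that reach the same mode form one segment. The flat kernel, the bandwidth of 20 and the intensity values are assumptions for this example; image segmentation typically applies the same idea in a joint spatial and color space.

```python
def mean_shift_1d(points, bandwidth, iters=50, tol=1e-3):
    """Shift every point to the mean of its neighbours until it stops moving."""
    modes = []
    for p in points:
        x = float(p)
        for _ in range(iters):
            window = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes

def label_modes(modes, bandwidth):
    """Group points whose modes lie close together into the same cluster."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

# Hypothetical pixel intensities: a dark region and a bright region.
intensities = [10, 12, 11, 13, 200, 202, 198, 201]
modes = mean_shift_1d(intensities, bandwidth=20)
labels, centers = label_modes(modes, bandwidth=20)
print(labels, centers)
```

Unlike k-means, the number of clusters is not given in advance; it follows from the bandwidth.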


Object Tracking

Fig. 7. Taxonomy of tracking methods.



Some Examples of object tracking

Fig. 8. Different tracking approaches. (a) Multipoint correspondence, (b) parametric transformation of a rectangular patch, (c, d) two examples of contour evolution.
Fig. 11. Results of two point correspondence algorithms. (a) Tracking using the algorithm proposed by Veenman et al. [2001] in the rotating dish sequence; color segmentation was used to detect black dots on a white dish (© 2001 IEEE). (b) Tracking birds using the algorithm proposed by Shafique and Shah [2003]; birds are detected using background

10.4 Course Recap: Key Topics (1/3)
Intro: Evolution of Multimedia and Research challenges; Authoring and Editing tools
Compression
Lossless Compression: Entropy, Run Length Coding, Variable Length Coding, Adaptive Huffman, Dictionary based coding (LZW), Arithmetic, Differential coding in Images
Lossy Compression: Quantization (linear/non-linear), Transforms (DCT, DWT), Embedded Zerotree Wavelet coefficients (successive approximation)
Color Science: Human vision, RGB, YUV, CMYK, Gamma correction
Image file format and compression: 1 bit per pixel, 32 bpp.., palette, Color Lookup tables
Audio Video Fundamentals: Analog video: connectivity (separate color and audio etc.), scan lines, interlacing, color and audio subcarrier; Digital Video: chroma subsampling, screen resolution, frame rate; Digitization of Sound: sampling and quantization, aliasing and Nyquist sampling rate, SNR, logarithmic amplitudes (dB/dBm), non-linear transform before encoding, human voice vs. music, band-limiting / low pass for voice, synthetic sound, PCM, DPCM, Adaptive Delta Modulation, Adaptive DPCM

Course Recap: Key modules (2/3)
Video Compression: Macro-blocks, Motion prediction / motion vectors, GOB; MPEG, bi-directional prediction, Slices, Scalable video coding (SNR, Temporal, Spatial, Hybrid, data partitioning); MPEG-4: Object based coding, Video object Sequence .. Video object plane (VOP), bounding box and boundary block in VOP, padding, shape adaptive DCT, shape coding (intro), global vs. local motion compensation, triangulation for synthetic objects, face/body animation/definition parameters; H264/AVC; MPEG-7 (description).
Audio Compression: telecom driven conferencing approaches (ADPCM), Vocoders (channel bands, discard phase), Linear predictive coding with vocoders, Code Excited Linear Prediction (long term / short term); Psychoacoustics, Frequency and temporal masking; MPEG audio: time to frequency transform, audio frames (12 or 36 samples across 32 bands), stereo/multi-channel audio
Synchronization: Temporal sync between audio, video, pointer etc., Logical data units (LDU), compensating for loss of sync; Synchronization specification models: Interval based, Timeline axis based, Flow Control - Hierarchical (serial / parallel), Reference point based Flow control, Event based.

Course Recap: Key modules (3/3)
Multimedia Systems: challenges of scalability and portability [host CPU/RAM/Disk, network, capture interface or device, render interface or device], intro to storage and scheduling challenges
Multimedia Communications: broad overview of networks and their evolution, circuit vs. packet switched, OSI vs. TCP/IP layers, physical layers, Mobility and wireless, TCP and UDP; FDM or WDM, TDM, Ethernet, access networks (fiber, terrestrial, satellite); QoS, Jitter, MPLS; Multimedia on UDP/IP (Multicast, RTP, RTCP, RSVP, RTSP, H323, SIP), Set top box, Broadcast schemes for Media on Demand, Buffer management; Wireless networks: Cellular evolution (GSM/CDMA, 3G and beyond), wireless LAN, Bluetooth, wireless propagation and loss, error resiliency in multimedia for wireless communication [FEC, burst errors, Error Resilient Entropy Coding: blocks are at fixed distance]
Modern Multimedia Systems: Adaptive streaming, DASH (MPD: Media Presentation Description), Segment indexing, Content Distribution Networks, Scalable file systems, Hadoop FS, GFS, BigTable
Multimedia Search, Detection and Tracking: HEVC, Search, Detection and Tracking

Thank You !!
